Using Bash to badly monitor your servers
April 15th, 2007 - Charles
I was surfing dzone and I came across two links regarding site monitoring. The first link was “using Twitter for server-monitoring” and the second link was, in a nutshell, making a PHP poor man’s ping without using Net_Ping.
Both of these methods take a bit of a square-peg-in-a-round-hole approach, but I definitely understand where they are coming from. My secret embarrassing project has been to build a scalable multi-server monitoring service using the bare minimum of utilities. I got pretty close, then put it down to pursue more worthwhile items. I will eventually finish it with a little divine inspiration. Personally, the primary purpose of the project is to scratch the MacGyver itch when it arises.
The plan:
Have a continuously running monitor script which will check mail, database, and web services on a variable number of servers.
If something is wrong, the monitoring server is supposed to check with another server to confirm that a problem does exist. The second server will respond with the results, and the first server will act accordingly.
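The confirm-before-acting flow in the plan might look roughly like the sketch below. This is not the finished project, just one way to arrange the logic. The function name confirm_failure and the two check commands passed into it are hypothetical; each check is assumed to print 1 for "up" and 0 for "down".

```shell
#!/bin/bash
# Sketch only: confirm_failure, local_check and remote_check are
# hypothetical names. Each check command is assumed to take a host
# and print 1 (service up) or 0 (service down).

confirm_failure() {
    local local_check=$1 remote_check=$2 host=$3

    # First opinion: the monitoring server's own check.
    if [ "$("$local_check" "$host")" = "1" ]; then
        echo "ok"
        return 0
    fi

    # Second opinion: ask the other server before raising an alarm.
    if [ "$("$remote_check" "$host")" = "1" ]; then
        echo "unconfirmed"   # only this monitor sees a problem
        return 0
    fi

    echo "down"              # both monitors agree
    return 1
}
```

The point of the middle branch is that a single failed check may just mean the monitor itself has a network problem, so it is reported separately from a confirmed outage.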
The challenge:
Do all this with a bare minimum of tools. I ended up using Bash and Netcat.
The results:
I got the mail and web monitoring scripts taken care of. To check the database, I would have a keyword pulled from the database and placed on a web page, so a web check that finds the keyword also confirms the database is answering.
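A minimal sketch of the keyword idea, assuming the page body arrives on standard input. The function name page_has_keyword and the keyword db-alive are made up for illustration.

```shell
#!/bin/bash
# Sketch only: page_has_keyword is a hypothetical helper.
# Reads a page on stdin and prints 1 if the keyword (a fixed string,
# not a regex) appears anywhere in it, otherwise prints 0.

page_has_keyword() {
    local keyword=$1
    if grep -qF -- "$keyword"; then
        echo 1
    else
        echo 0
    fi
}
```

Fed from Netcat, it could be used along these lines, where dbcheck.php is an assumed page that prints the keyword straight from the database:

```shell
printf 'GET /dbcheck.php HTTP/1.0\r\nHost: %s\r\n\r\n' "$host" | nc "$host" 80 | page_has_keyword "db-alive"
```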
The hitch is when the monitoring script goes to alert the other server/monitoring script that something is wrong. Since Netcat isn’t secure, I have to verify that the request is coming from a white-listed IP address. Lsof doesn’t work, because by the time I make the check, the connection is already terminated. Checking with Netstat works, but because the connection stays listed for a while after it closes, it would leave a loophole for someone to inject their own data without having to pass the white-list process.
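The white-list lookup itself is the easy half. A sketch, assuming the peer address has somehow already been obtained (which is exactly the unsolved part); the WHITELIST variable, the is_whitelisted name, and the addresses are placeholders.

```shell
#!/bin/bash
# Sketch only: WHITELIST and is_whitelisted are hypothetical names.
# The addresses come from the 192.0.2.0/24 documentation range.
WHITELIST="192.0.2.10 192.0.2.11"

# Succeeds (exit 0) only on an exact match, so 192.0.2.1 does not
# slip through just because it is a prefix of 192.0.2.10.
is_whitelisted() {
    local candidate=$1 allowed
    for allowed in $WHITELIST; do
        [ "$candidate" = "$allowed" ] && return 0
    done
    return 1
}
```

Even with this in place, a source address is only a weak form of identity, so it narrows the loophole rather than closing it.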
What follows are the tiny scripts that do the monitoring. You could easily throw in a line like
echo "uh, you broke it" | mail -s "Server Down" foo@bar.com
to have these scripts actually DO something.
mailCheck.sh
#!/bin/bash
# requires netcat
# host is first argument
# look for a valid reply to QUIT (221) to show that mail is paying attention
stcode=$( (sleep 1; echo QUIT) | nc "$1" 25 | awk '/^221/{print $1; exit}' )
if [ "$stcode" == "221" ]
then
    echo 1
else
    echo 0
fi
webCheck.sh
#!/bin/bash
# requires netcat
# do not use it on a page where it will match HTTP...my awk skills aren't that good
# host is first argument
# page is second argument
# HTTP wants CRLF line endings, so build the request with printf
printf 'GET /%s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$2" "$1" | nc "$1" 80 | awk '/HTTP\/1.1/{print $2}'
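To tie the two scripts to the mail one-liner, a wrapper could look something like the sketch below. The function names are invented, the scripts are assumed to sit in the current directory, and treating only a 200 status as healthy is my assumption, not a rule.

```shell
#!/bin/bash
# Sketch only: alert, is_healthy and check_host are hypothetical names.

# Send the alert using the mail one-liner, with the host in the subject.
alert() {
    echo "uh, you broke it" | mail -s "Server Down: $1" foo@bar.com
}

# $1 = output of mailCheck.sh (1 or 0)
# $2 = HTTP status printed by webCheck.sh
# Assumption: anything other than 200 counts as a failure.
is_healthy() {
    [ "$1" = "1" ] && [ "$2" = "200" ]
}

# Run both checks against one host and alert if either looks wrong.
check_host() {
    local host=$1 page=$2
    local mail_ok web_status
    mail_ok=$(./mailCheck.sh "$host")
    web_status=$(./webCheck.sh "$host" "$page")
    is_healthy "$mail_ok" "$web_status" || alert "$host"
}
```

Something like `check_host example.com index.html` in a loop or a cron job would then cover one server.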
I don’t know if these scripts are useful (yet). This was more a post of solidarity to those who enjoy the DIY approach.
I just need to stress that if site monitoring is a professional requirement, skip the fun and do it right.