Blessed are the Geeks, for they shall internet the earth

You Gotta Stop, Look And Listen, Baby...
Greg Bromage: Australia


It sneaks up on you, when you least expect it. Around about 3:15pm. Probably on a Thursday.


It begins with a phone call at your desk. "I've got this strange message on my screen. It says that I can't access my mail server. Why can't I get my mail?" While you're going through the usual troubleshooting steps, your mobile rings. "I've been working on this document for, like, 5 hours and my machine has frozen."


A quick ping tells you the cause of the problem - the server has crashed. So you start it up and the reason is obvious.


Only 150 KB of disk space free.


This story is repeated hundreds of times every day, all around the world, whether it's running out of disk space, database servers straining under load, or forgetting to update the anti-virus software until it's too late.


System Administrators are continually being asked (or told) to do more with less, and amongst all the additional duties piled onto us, the one thing that is always pushed to the bottom of the list is perhaps the most essential of all: System Monitoring.


Remember the days when you used to actually look at the log files? That was back in the days when you knew the configuration of each and every machine from memory and could guess, to within a few megabytes, the available capacity. Those halcyon days can be yours again, with just a few simple steps.


System Monitoring needs to be a priority. Block it out in your calendar if you have to. Lock the door. Forward your phone to someone else. Do whatever it takes but make it known to everyone (especially your boss) that for, say, one afternoon a week you are incommunicado. They won't be happy about it. Convince them that having 1 person unavailable for 2 hours a week is better and cheaper than having the entire network offline for 7 or 8 hours whilst you fix a simple problem that could have been avoided.


Make a checklist of what you need to check so that you go through the same routine every time. Include disk space, memory and CPU utilisation for every server, plus traffic statistics, either as gross levels or broken down by protocol. Just how much e-mail does your site receive per day? If you don't know, then how can you tell at what point your internet link can't cope?
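As a rough illustration of what one pass through such a checklist might look like on a single machine, here is a minimal Python sketch. The function name collect_snapshot and the row layout are my own inventions, and it only covers disk space and (on Unix-like systems) load average; memory and traffic figures would need platform-specific tools that are not shown here.

```python
import os
import shutil
import socket
from datetime import datetime, timezone


def collect_snapshot(paths=("/",)):
    """Return one row of checklist figures for this machine (hypothetical layout)."""
    row = {
        "host": socket.gethostname(),
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        row[f"disk_free_mb:{path}"] = usage.free // (1024 * 1024)
        row[f"disk_used_pct:{path}"] = (
            round(100 * usage.used / usage.total, 1) if usage.total else 0.0
        )
    try:
        # Load average is only available on Unix-like systems.
        row["load_1min"] = round(os.getloadavg()[0], 2)
    except (AttributeError, OSError):
        pass
    return row
```

Calling collect_snapshot(("/", "/var")) would return a dictionary you can print, log, or paste into the spreadsheet; the paths are whatever mount points matter on your servers.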


Keep a spreadsheet of the numbers you find. This is a great way to justify not only the time you're taking to do the monitoring, but also to justify your budget. Management people (especially accounts) like colourful graphs. Learn phrases like: "Based on the current trend, we'll need to buy more disk space in September next year." Knowing that sort of thing, and having documentation to back it up, also makes for a better working relationship come budget time.
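The "based on the current trend" sentence is just a straight line drawn through your recorded numbers. A spreadsheet will do this for you, but as a sketch of the arithmetic, assuming usage grows roughly linearly (which it may not), it could look like this in Python. The function name days_until_full and the sample figures are made up for illustration.

```python
def days_until_full(samples, capacity):
    """Estimate days from the last sample until usage reaches capacity.

    samples:  list of (day_number, used) pairs, oldest first
    capacity: total size, in the same unit as 'used'
    Returns None if the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must be taken on different days")
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity - intercept) / slope
    return full_day - samples[-1][0]


# Hypothetical weekly readings: (day number, GB used) on a 200 GB volume.
weekly = [(0, 100), (7, 107), (14, 114), (21, 121)]
print(days_until_full(weekly, 200))
```

With only a handful of readings the estimate is crude, so treat it as a conversation starter for budget time, not a promise.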


Establish a baseline. How are you going to know what's out of the ordinary until you know what "normal" is? Get to know your network traffic.
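One simple way to put a number on "normal" is to take the average and spread of your past readings and flag anything that falls well outside them. The sketch below assumes that approach; the function name is_unusual and the three-standard-deviation threshold are my own choices, not a rule, and the right threshold depends on how noisy your site is.

```python
import statistics


def is_unusual(history, value, threshold=3.0):
    """Return True if value is more than 'threshold' standard deviations
    away from the mean of the historical readings."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # needs at least two readings
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold
```

For example, with a history of daily mail counts you would call is_unusual(history, todays_count) and take a closer look whenever it returns True.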


Automate it where possible: Remember how computers were going to make life easier for us humans? Yeah, me neither... But consider the virtues of the humble command scheduler (at or cron, depending on whether you're in an MS or *nix world). Why go to each server to collect the data when you can have each server automatically collect the statistics and e-mail them to you?
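To give a flavour of the "collect and e-mail" idea, here is a minimal Python sketch of the mailing half. It assumes the statistics have already been gathered into a dictionary and that a mail server is listening on the machine itself; the function names, the addresses, the script path and the 6 a.m. schedule are all placeholders for illustration.

```python
import smtplib
import socket
from email.message import EmailMessage


def build_report(stats, sender, recipient):
    """Turn a dict of {name: value} statistics into a plain-text e-mail."""
    msg = EmailMessage()
    msg["Subject"] = f"Daily stats for {socket.gethostname()}"
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(f"{name}: {value}" for name, value in stats.items())
    msg.set_content(body + "\n")
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (assumed to be on localhost)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


# A real script would finish with something like:
#   send_report(build_report(stats, "monitor@example.com", "admin@example.com"))
#
# Example crontab entry (hypothetical path), run at 06:00 every day:
#   0 6 * * * /usr/bin/python3 /usr/local/bin/daily_stats.py
```

Keeping the message-building separate from the sending means you can test the report format on your own desk before letting cron loose with it.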


Do you have any thoughts on the subject? Or some time-saving scripts to automate your monitoring? Send them along to me at


