Monitoring Windows NT/2000/XP/2003 is important even in small environments. With automated monitoring, critical failures can often be avoided. But how can you monitor a system without too much effort? The basic idea behind a successful monitoring and alerting system is to centralize all system events at a single monitoring station. Once the information is centralized, it can be used to build an alerting system or even to carry out corrective actions.
What is a Monitoring System made up of?
Successful monitoring systems usually have many components. These are typically loosely coupled so that new requirements can be added easily. Keep in mind how often systems and environments change: flexibility in a monitoring system is nowadays a “must have”. Typically, a system consists of:
- data collector processes
- storage engine
- analysis console
- background processes
In this scenario, the data collectors run on the monitored systems. They should be lightweight processes because they shouldn’t put too much burden on the host system. This is especially important if high-performance systems like web servers are to be monitored. The data collector picks up “interesting” events and forwards them to the central storage engine.
The storage engine then writes the received event notifications to persistent storage. That way, the data is safe from manipulation or technical problems on the monitored systems. The storage engine typically runs on a limited number of machines. Often, there is only a single storage engine in a whole network. That’s really not a bad idea, as the whole concept of monitoring is to have all information in one central place. Multiple storage engines, on the other hand, are typically used in complex scenarios, mostly with WAN links in between. There, a local storage engine serves as a hub for one location and forwards the information to the central system.
Finally, the analysis console is used by the system administrators. It is the interface that lets them look at consolidated reports and drill down into more specific topics. Ideally, the console supports multiple concurrent users and provides some hints for fixing detected problems. Integrated links to vendor knowledge bases or public search and discussion services are a valuable help here.
Of course, data collectors and the storage engine are background processes themselves. But there can also be background processes that consolidate and monitor the storage engine’s data on a schedule, for example daily. Administrators then receive either an activity overview report or an exception report (for important and urgent matters).
What about Windows?
Windows NT/2000/XP/2003 does not come with a built-in monitoring solution, so you need some tools to get going.
Windows logs the most important state data to the event log. Third-party vendors are also encouraged to log their events to the event log. For example, most anti-virus products log caught viruses here. The event log is definitely the place to look if you’d like to monitor a Windows system’s health. The only built-in tool is the Windows Event Viewer (part of the Computer Management MMC under Windows 2000 and XP). That tool allows interactive display of current events but was never meant to be part of an automated monitoring solution.
What we need is a data collector that can run in the background. For this, we use EventReporter. That product monitors the event log in near real time and forwards all new messages to the storage engine via the syslog protocol. Why did I say “near real time”? Well, EventReporter by design does not rely on Windows event notifications, which have proven not to be fully reliable under extreme conditions. Instead, it polls the event logs on a pre-set schedule. Resource usage is very moderate, so the schedule can be set to run every 30 seconds, or even more often in very security-sensitive environments. EventReporter not only forwards the logs but also checks whether someone truncates them (via the Windows Event Viewer or an API call). If that happens, a notification is sent to the storage engine. This functionality is important, as log truncation can be a good indication of an intruder. EventReporter is installed on each system that is to be monitored. It runs on all flavors of NT (even Alpha), so really all systems can be monitored.
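To make the polling idea more concrete, here is a minimal Python sketch. It is not EventReporter's actual implementation. The `PollingCollector` class, the `read_log` and `forward` callbacks and the sample messages are all hypothetical, and the log is modeled as a plain list of numbered records. The sketch assumes that a cleared log shows up as the newest record number falling below the last one forwarded, which is an assumption of this toy model rather than a statement about the Windows event log API.

```python
from typing import Callable, List, Tuple

Record = Tuple[int, str]  # (record number, message text) -- hypothetical model


class PollingCollector:
    """Forwards records not seen before and reports a truncated log."""

    def __init__(self, read_log: Callable[[], List[Record]],
                 forward: Callable[[str], None]) -> None:
        self.read_log = read_log      # returns the current log contents
        self.forward = forward        # sends one message to the storage engine
        self.last_seen = 0            # highest record number forwarded so far

    def poll_once(self) -> None:
        records = self.read_log()
        newest = max((num for num, _ in records), default=0)
        if newest < self.last_seen:
            # Records we already forwarded have vanished: treat as truncation.
            self.forward("ALERT: event log was truncated")
            self.last_seen = 0
        for num, message in sorted(records):
            if num > self.last_seen:
                self.forward(message)
                self.last_seen = num


# Simulated event log and a list that stands in for the storage engine.
log: List[Record] = [(1, "service started"), (2, "logon failure")]
sent: List[str] = []

collector = PollingCollector(lambda: list(log), sent.append)
collector.poll_once()                 # forwards both existing records
log.append((3, "disk nearly full"))
collector.poll_once()                 # forwards only the new record
log.clear()                           # someone clears the log
collector.poll_once()                 # forwards a truncation alert
```

In a real collector, `poll_once` would be called in a loop with a pause between runs, for example the 30 seconds mentioned above.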
Why syslog?
I mentioned that we forward the messages via the syslog protocol. This is in fact a big plus. Syslog is a standard protocol that originated on Unix. Nowadays, it is supported by nearly all major devices. For example, most routers and network printers can provide diagnostic information via syslog. So a syslog-based monitoring solution is able to gather data from a variety of sources. While this is not really within the scope of this article, it is nice to know that syslog can help us out when we need to monitor the whole network. This gives us additional flexibility as our needs grow.
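For readers who have never seen a syslog message, the following Python sketch builds one in the traditional BSD layout (priority value, timestamp, host name, tag, text) and shows how it could be sent as a single UDP datagram to port 514. How EventReporter formats its messages is not covered in this article, so treat this only as an illustration of the protocol. The host name `web01`, the tag `EvntLog` and the function names are made up for the example.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_syslog_message(facility: int, severity: int, hostname: str,
                         tag: str, text: str, when: datetime) -> str:
    """Build a message in the traditional BSD syslog layout."""
    pri = facility * 8 + severity          # priority value sent as <PRI>
    # The day of month is padded with a space, not a zero.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {hostname} {tag}: {text}"


def send_syslog(message: str, server: str, port: int = 514) -> None:
    """Send one message as a UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii", "replace"), (server, port))


# Facility 16 is local0, severity 4 is "warning".
example = build_syslog_message(16, 4, "web01", "EvntLog",
                               "disk nearly full",
                               datetime(2003, 5, 7, 14, 3, 9))
print(example)
```

Because plain syslog over UDP has no acknowledgement, a sender cannot tell whether a message arrived. That is one reason to keep the storage engine close to the collectors.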
Storing the Events…
Now we need something to store the events collected by EventReporter. We use WinSyslog for this. This enhanced syslog daemon works much like its Unix counterpart. But besides writing to flat files, it can also log to a database and carry out flexible actions.
In our monitoring system, we use it for two functions. First of all, it stores all events. In our case, events are written both to a flat file and to the database. We use this approach because bulk analysis is fastest on flat files, while viewing event details works best with a database. So we’ve taken the route of simply writing to both stores to have the best of both worlds. A large hard disk is of course helpful here…
Besides storing events, WinSyslog also acts as an alerting engine. It can be configured to detect important message fragments or high-priority messages and to forward these to an email account. If your paging provider supports an email-to-pager interface, this is also the way to trigger a page in case of an emergency.
Typically, only a single instance of WinSyslog is needed. However, it also supports syslog cascading, which is used when a reporting hierarchy is built. This is most often done in corporate networks involving WAN links, where only messages of higher importance should be sent to a central data store while less important messages are stored locally at the individual sites. That way, complete data is available for drill-down, but it is not necessarily transmitted over the WAN. WinSyslog can also forward only selected messages based on rules.
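The “write to both stores” idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: SQLite stands in for the database (the article does not depend on a particular product), an in-memory text buffer stands in for the flat file, and the `DualStore` class and its column names are invented for the example.

```python
import io
import sqlite3
from typing import TextIO


class DualStore:
    """Writes every event to a flat file and to a database table."""

    def __init__(self, flat_file: TextIO, db: sqlite3.Connection) -> None:
        self.flat_file = flat_file
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(received TEXT, host TEXT, severity INTEGER, message TEXT)"
        )

    def store(self, received: str, host: str,
              severity: int, message: str) -> None:
        # Flat file: one line per event, convenient for bulk processing.
        self.flat_file.write(f"{received} {host} <{severity}> {message}\n")
        # Database: structured columns, convenient for detailed queries.
        self.db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                        (received, host, severity, message))
        self.db.commit()


flat = io.StringIO()                      # stand-in for a real log file
store = DualStore(flat, sqlite3.connect(":memory:"))
store.store("2003-05-07 14:03:09", "web01", 4, "disk nearly full")
store.store("2003-05-07 14:05:40", "db01", 2, "service stopped")

# Detail lookup through the database: everything at "error" level or worse.
urgent = store.db.execute(
    "SELECT host, message FROM events WHERE severity <= 3"
).fetchall()
```

The price of this design is that every event is stored twice, which is why disk space matters.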
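As a rough illustration of such an alerting rule, the Python sketch below checks a message against a severity threshold and a list of text fragments, then prepares an email for it. It does not reflect WinSyslog's real rule engine or configuration. The `AlertRule` class, the `build_alert_mail` helper and the address `pager@example.com` are hypothetical, and the mail is only built here, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Tuple


@dataclass(frozen=True)
class AlertRule:
    """Matches urgent severities or messages containing given fragments."""
    fragments: Tuple[str, ...] = ()
    max_severity: int = 2   # syslog severities: 0 is most urgent, 7 is debug

    def matches(self, severity: int, message: str) -> bool:
        if severity <= self.max_severity:
            return True
        lowered = message.lower()
        return any(fragment.lower() in lowered for fragment in self.fragments)


def build_alert_mail(host: str, message: str, recipient: str) -> EmailMessage:
    """Prepare the notification; sending it (e.g. via smtplib) is left out."""
    mail = EmailMessage()
    mail["To"] = recipient
    mail["Subject"] = f"ALERT from {host}"
    mail.set_content(message)
    return mail


rule = AlertRule(fragments=("virus", "truncated"))
hit_by_text = rule.matches(6, "Virus detected in file")      # fragment match
hit_by_severity = rule.matches(1, "power supply failed")     # urgent severity
no_hit = rule.matches(6, "user logged on")                   # routine event

if hit_by_text:
    mail = build_alert_mail("web01", "Virus detected in file",
                            "pager@example.com")
```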
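The cascading idea boils down to “store everything locally, pass only the important messages upstream”. The following Python sketch shows that filter under simple assumptions: importance is taken to be the syslog severity (lower number means more important), and the `SiteRelay` class with its threshold parameter is a hypothetical stand-in, not WinSyslog's configuration model.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # (syslog severity, message text)


class SiteRelay:
    """Local storage engine that forwards only important events upstream."""

    def __init__(self, forward_upstream: Callable[[int, str], None],
                 forward_max_severity: int = 3) -> None:
        self.local_store: List[Event] = []
        self.forward_upstream = forward_upstream
        self.forward_max_severity = forward_max_severity

    def receive(self, severity: int, message: str) -> None:
        # Everything is kept at the site for later drill-down.
        self.local_store.append((severity, message))
        # Only severities up to the threshold cross the WAN link.
        if severity <= self.forward_max_severity:
            self.forward_upstream(severity, message)


central: List[Event] = []   # stands in for the central data store
relay = SiteRelay(lambda sev, msg: central.append((sev, msg)))

relay.receive(6, "user logged on")
relay.receive(2, "service stopped")
relay.receive(5, "print job finished")
```

After these three calls, the site holds all three events while the central store has received only the “service stopped” message.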
Analyzing the Events
Now we come to the analysis part. In most cases, administrators don’t like to be bothered with routine information. They want to be notified when things go terribly wrong (hopefully a bit before it really hurts), and otherwise to receive a regular confirmation that all is well.
In our system, we have MoniLog running to provide daily reports. These MoniLog reports take the wealth of information available and condense it considerably into a summary of the events that happened. So a typical report is just a short HTML page, even for a system with a large number of servers. The color-coded reports are stored on the intranet web server and are accessible to every administrator. They allow a quick look at the system state. Even better, the reports include links to EventID.NET, an online resource for information on Windows events. With eventid.net, solutions can often be found very quickly.
Of course, the MoniLog reports might be too condensed or too limited in scope to fully analyze an issue indicated in the daily report. In that case, the administrator can use the MoniLog client to dig into the stored event base. WinSyslog also comes with a web interface to the raw event data, which can be used to view the details of each single event.
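To give a feel for what “condensing” means, here is a small Python sketch that reduces a list of events to one line per host and event ID with a count. This is my own simplified illustration, not MoniLog's actual algorithm or report format. The `summarize` function and the sample hosts and event IDs are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(events: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Turn (host, event_id) pairs into (host, event_id, count) rows.

    Rows are sorted by descending count so frequent events come first.
    """
    counts = Counter(events)
    rows = [(host, event_id, count)
            for (host, event_id), count in counts.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


# Sample data only; the values are not taken from a real system.
day_events = [
    ("web01", 529), ("web01", 529), ("db01", 7036), ("web01", 529),
]
report = summarize(day_events)
for host, event_id, count in report:
    print(f"{host}: event {event_id} occurred {count} time(s)")
```

Four raw events become two report lines here. On a real network with many servers the reduction is far larger.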
The integrated Solution
As you can see, the system is made up of three main components. Each of these has specific duties to perform. The modular approach provides the flexibility needed in today’s environments. For example, if Cisco information is to be integrated into the system, you simply need to point the Cisco boxes to the WinSyslog server. The storage engine then saves the new events. Even though MoniLog does not (yet) pick up and analyze the Cisco events, they can be viewed with the WinSyslog web interface, which might be very helpful during analysis.
Also, an administrator has the option to add his or her own custom scripts to be executed on the stored event data. The open system architecture provides unlimited flexibility to do so.
It is also easy to integrate Unix and Linux machines into the scenario. They support syslog natively and as such can both send and consume syslog messages. In fact, the EventReporter product alone is often used as a tool to integrate Windows events into Unix based management systems.
Conclusion
An effective monitoring solution can save the administrator a lot (and I mean a lot) of work. It can also help prevent major system breakdowns, as critical situations can be detected early and, hopefully, solved before any damage occurs. This is especially true if you think about security monitoring.
As I have outlined, a monitoring system need not be very complex or hard to set up. Just use some ready-to-run tools, integrate them and enjoy the benefits of the system.
Tools used
The following tools were used to build the monitoring system:
- EventReporter – data collector
- WinSyslog – storage engine and alert notification
- MoniLog – console and reporting tool
If you’d like to build your own system, you can download free evaluation copies from the respective web sites. Detailed installation instructions are available in the additional article “How To setup Windows NT centralized Monitoring”.
I hope this article is helpful. If you have any questions or remarks, please do not hesitate to contact me at rgerhards@adiscon.com.
Rainer Gerhards works for Adiscon, which offers software for server monitoring. Visit http://www.monitorware.com for more information and free downloads.