In this project, we designed and implemented an automated monitoring and
trouble-ticketing system for a large NT, Novell, and UNIX network
(over 100,000 nodes planned). The system allows client nodes to detect
certain problems themselves and generate Help Desk trouble tickets
automatically. This reduces the need for end users to identify and report
problems and, in many instances, enables the Help Desk to
start fixing a problem before the user is even aware of
it.
The solution uses off-the-shelf software agents and
custom configurations to detect system events, report
them via SNMP traps, and then turn those traps into
Help Desk trouble tickets. The off-the-shelf
components used in this case were Microsoft SMS, HP
OpenView, Seagate NerveCenter Pro, and Remedy Action
Request System. An architectural overview of this system is shown
below:
The types of events currently detected and ticketed by this system include:
- Viruses detected during auto-scan
- RAID disk failures
- Login attempt limits exceeded
- System backup job failures
- UPS detected power failures
- NT service startup failures
- Disk utilization limits exceeded
- CPU utilization limits exceeded
- Microsoft SMS inventory process failures
- Software package distribution failures
At the conclusion of our role in the project, the system was already monitoring and
generating trouble tickets for over 3,000 nodes.
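
The event-to-trap-to-ticket flow described above can be illustrated with a
minimal sketch. It is not the configuration used in the project, which relied
on the commercial products named earlier. The event names, priorities, the
Trap and Ticket records, and the TicketGateway class are all hypothetical, and
the suppression of repeated traps for the same node and event is an assumption
rather than something stated in this summary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event names and priorities. The summary lists the event
# types but not how they were encoded or prioritized.
PRIORITY = {
    "virus_detected": "high",
    "raid_disk_failure": "high",
    "backup_job_failure": "medium",
    "disk_utilization_exceeded": "low",
}


@dataclass(frozen=True)
class Trap:
    """Simplified stand-in for an SNMP trap sent by a client node."""
    node: str
    event: str
    detail: str


@dataclass(frozen=True)
class Ticket:
    """Simplified stand-in for a Help Desk trouble ticket."""
    node: str
    summary: str
    priority: str


class TicketGateway:
    """Turns incoming traps into tickets, suppressing repeats (assumed behavior)."""

    def __init__(self) -> None:
        # (node, event) pairs that already have an open ticket.
        self._open = set()

    def handle(self, trap: Trap) -> Optional[Ticket]:
        key = (trap.node, trap.event)
        if key in self._open:
            # A ticket is already open for this node and event.
            return None
        self._open.add(key)
        # Unknown event types fall back to the lowest priority.
        priority = PRIORITY.get(trap.event, "low")
        return Ticket(trap.node, f"{trap.event}: {trap.detail}", priority)


gateway = TicketGateway()
first = gateway.handle(Trap("srv01", "raid_disk_failure", "disk 2 offline"))
repeat = gateway.handle(Trap("srv01", "raid_disk_failure", "disk 2 offline"))
```

In this sketch the first trap produces a ticket and the identical repeat
produces none, so a node that keeps reporting the same fault does not flood
the Help Desk queue.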