Check MK

Check_MK
Multisite Python GUI displaying an overview of monitored systems
Developer(s): Mathias Kettner GmbH
Initial release: 2008
Stable release: 1.2.8p2[1] / May 27, 2016
Operating system: Linux as monitoring host; other Unix, Microsoft Windows and VMS as monitored systems
License: GNU GPL v2
Website: mathias-kettner.com/check_mk.html

Check_MK is an extension to the Nagios monitoring system. It allows rule-based configuration written in Python and offloads work from the Nagios core so that it scales better, allowing more systems to be monitored from a single Nagios server.

It comes with a set of system checks, a web user interface based on mod_python and JavaScript, and a module that allows fast access to the Nagios core. It also adds further features on top of Nagios.[2]

Version history

The first public versions were available in 2008. In April 2009 it was released under the GPL.[3] Since 2009, the releases have been tracked in git.[4]

"Stable" releases are labeled with a major version, a "p" for production, and a build number. For example, 1.1.12p6 is a stable 1.1.12 version and the 6th public release of it. These releases are ABI compatible within their version, so a 1.1.12p5 config will work mostly unchanged for 1.1.12p6.
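
As an illustration of this labeling scheme, a stable label such as 1.1.12p6 can be split into its base version and its "p" number. The following Python sketch is not part of Check_MK; the names parse_stable and same_stable_line are hypothetical, and the pattern only covers the stable form described here.

```python
import re
from typing import NamedTuple, Optional


class StableRelease(NamedTuple):
    base: str   # e.g. "1.1.12"
    patch: int  # the number after "p", e.g. 6


# Hypothetical pattern: dotted numeric version, a literal "p", then a number.
_STABLE_RE = re.compile(r"^(\d+(?:\.\d+)+)p(\d+)$")


def parse_stable(label: str) -> Optional[StableRelease]:
    """Return the parts of a stable label, or None if it does not match."""
    match = _STABLE_RE.match(label)
    if match is None:
        return None
    return StableRelease(base=match.group(1), patch=int(match.group(2)))


def same_stable_line(a: str, b: str) -> bool:
    """True if both labels are stable releases of the same base version."""
    ra, rb = parse_stable(a), parse_stable(b)
    return ra is not None and rb is not None and ra.base == rb.base
```

With this sketch, same_stable_line("1.1.12p5", "1.1.12p6") is true, which corresponds to the case where a configuration is expected to carry over mostly unchanged.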

"Innovation" releases are specially marked versions based on the development branch that are intended for public testing. Check_MK keeps the interfaces stable during the lifetime of a "p" release, but they may change between stable releases. For example, there are changes between 1.1.10p<num> and 1.1.12p<num> that require users to adjust their configuration. The same applies from 1.1.12 to 1.2, since this is a new major release.

Uses

It can be used as a front-end and extension of a Nagios, Icinga[5] or Shinken[6] monitoring system, for monitoring performance and health of networking devices, servers and infrastructure systems.

  • Autodetection of configuration of data points in a monitored system (inventory)
  • Special checks in addition to standard Nagios plugins
  • Rule-based configuration
  • Agentless (SNMP-based) monitoring
  • Scalability tuning for setups that could normally not be monitored using Nagios
  • Replacement of standard Nagios GUI and centralized monitoring
  • Nagios configuration management via text files holding Python expressions (where the rules go) or via the web interface (which writes those text files)
  • Graphical administration of the monitoring system
  • Filtering, viewing and alerting for logfiles and event data like SNMP traps

Technology

Check_MK includes a combination of multiple components:

  • Delivery of multiple "passive" checks through a single "active" check (Nagios only processes passive checks and does not execute them, which is considerably faster[citation needed])
  • Modules to unify configuration handling and connections to monitored systems. This makes TCP or SNMP access transparent to the user and authors of check plugins
  • Configuration handling for PNP4Nagios, a graphing tool for Nagios and compatible systems
  • An agent for host operating systems. The relatively small agent only runs the commands that gather the data needed for the checks and avoids local processing. By design it is also not allowed to accept any external input. There are agents for different operating systems such as Linux, Unix, Windows and OpenVMS. The agents are made to be modifiable and extensible by the user.
  • Checks that consist of agent-side and server-side parts. Check_MK gives them a framework for handling connections, talking to Nagios and handling internal errors. Fairly strict design standards for writing checks are meant to make the plugins more uniform than standard Nagios plugins. The checks handle the detection of supported devices and are then called automatically to compare each component found earlier against its expected (good) status. Currently there are about 640 plugins in the official distribution, plus 100 on the community exchange. A larger number of checks can be found on GitHub.
  • Livestatus is a module that provides direct access to the core of Nagios. It can be queried using a query language and is used as a backend. Nagios addons that use Livestatus to access Nagios data include JasperReports, NagiosBP, Thruk, NagstaMon, NagVis and Multisite.
  • Multisite is a GUI component that can run in parallel with or instead of the standard Nagios GUI. It uses Livestatus to access one or more Nagios servers directly and can build reports from the available data. There are also plugins for Multisite:
    • Check_MK BI - a business process / impact analysis tool (rule-based: if you define a rule for "all servers" and then add a new server, the rule immediately applies to that server too)
    • WATO - a web administration frontend to the Check_MK (and Nagios) configuration (rule-based)
    • Event Console - a rule-based event processing interface that handles, for example, data arriving from SNMP traps or syslog. This data can be processed further by applying rules ("if this message occurred more than 5 times this hour, then...") and finally turned into services monitored by Nagios. It is not primarily a browser for unstructured logs but is similar to event processing in a classic NMS.
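
The first item in the list above, one active check feeding many passive results, can be sketched as follows. The sketch assumes that results are handed to Nagios as PROCESS_SERVICE_CHECK_RESULT external commands, which is a standard Nagios mechanism for passive results; whether Check_MK submits results this way is not stated here. The function name and the sample services are hypothetical.

```python
import time
from typing import Dict, List, Optional, Tuple

# Nagios state codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
ServiceResult = Tuple[int, str]  # (state code, plugin output)


def passive_result_commands(
    host: str,
    results: Dict[str, ServiceResult],
    now: Optional[int] = None,
) -> List[str]:
    """Format one external command line per service result.

    A single active check would gather data for the whole host once and
    then emit one such line for every component it evaluated.
    """
    timestamp = int(time.time()) if now is None else now
    return [
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{state};{output}"
        for service, (state, output) in sorted(results.items())
    ]
```

In a real setup, each returned line would be written, followed by a newline, to the Nagios command file; that step is left out because the file location depends on the installation.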

It is possible to use some of the components on their own. Check_MK can be used to define a configuration that consists only of standard Nagios checks. Another option is to add Livestatus to an existing Nagios server without any further modifications, so that a user can use the newer web interfaces such as Multisite or Thruk. There is also a Livestatus-based tool to replace NSCA, which transfers both status information and valid Nagios configuration to a remote server (with normal NSCA, the handling of remote configuration can be complex).
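
To give an impression of what querying Livestatus on its own might look like, here is a minimal Python sketch. It assumes the text-based query format with "GET", "Columns:", "Filter:" and "OutputFormat:" lines sent over a Unix socket; the helper names and any socket path passed in are hypothetical and depend on the installation.

```python
import json
import socket
from typing import Any, List, Sequence


def build_query(table: str, columns: Sequence[str],
                filters: Sequence[str] = ()) -> str:
    """Assemble a query string; a blank line terminates the query."""
    lines = [f"GET {table}"]
    if columns:
        lines.append("Columns: " + " ".join(columns))
    lines.extend(f"Filter: {condition}" for condition in filters)
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path: str, query: str) -> List[Any]:
    """Send a query to a Livestatus Unix socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

For example, build_query("services", ["host_name", "description", "state"], ["state = 2"]) produces a query for all services in the CRITICAL state, which could then be passed to query_livestatus together with the socket path of the local installation.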

Differences from standard Nagios installations

  • Higher total number of service checks, as one service is generated per monitored component - a server can have over 1000 services which are all monitored (and can be grouped)
  • Usage of RRD databases for historical data with almost every service, set up and displayed automatically based on the check and validity of data
  • Standard check interval of 1 minute (Nagios defaults to 5 minutes)
  • In SNMP monitoring, avoidance of traps in favour of status polling (for extra performance data)
  • Smaller, fully scriptable configuration
  • Rare use of high-maintenance Nagios config "tricks"
  • Focus on passive services, which solves Nagios check latency problems
  • No use of databases; commonly used data is held in RAM or fetched as live data from Nagios
  • Consistent preference for rule-based configuration ("my most important disks should be no fuller than 90%, and anything else can be up to 95%") over explicit configuration statements ("this disk here and this disk there")
  • Scalability (users connect 100 Nagios servers into one UI; source: list archive)
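
The difference between rule-based and explicit configuration can be illustrated with the disk example from the list above. The following Python sketch is not Check_MK's actual configuration syntax; the rule structure, the mount point patterns and the function disk_limit are all hypothetical. It assumes that rules are evaluated in order and that the first match wins.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Sequence


class Rule(NamedTuple):
    max_used_percent: float
    mount_patterns: Sequence[str]  # shell-style patterns for mount points


# Hypothetical rule set: important disks first, then a catch-all rule.
RULES = [
    Rule(90.0, ["/data/*", "/var/lib/db"]),
    Rule(95.0, ["*"]),
]


def disk_limit(mount_point: str, rules: Sequence[Rule] = RULES) -> float:
    """Return the usage limit from the first rule matching the mount point."""
    for rule in rules:
        if any(fnmatchcase(mount_point, p) for p in rule.mount_patterns):
            return rule.max_used_percent
    raise LookupError(f"no rule matches {mount_point!r}")
```

A newly added disk mounted at /data/new would pick up the 90% limit without any further configuration, whereas an explicit configuration would need a new statement for it.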

Use in other projects

The agent portion of Check_MK is used in some other projects as a "data source" for Unix/Linux systems. One example is OpenNMS.

Observium rebrands the Check_MK agent as the "Observium Unix Agent".

References

  1. http://mathias-kettner.de/check_mk_werks.php?edition_id=raw&branch=1.2.8&version=today
  2. [citation text missing in source]
  3. [citation text missing in source]
  4. Git releases
  5. [citation text missing in source]
  6. [citation text missing in source]
