Zabbix in the Subway: Munich Transport Corporation Case Study

Munich Transport Corporation (MVG) is a division of an integrated service provider that supplies electricity, gas and water, as well as transportation services. Because its service is used daily by thousands of Munich residents and tourists, the company needs to detect errors as quickly as possible and deal with them efficiently.

Thanks to IntelliTrend IT-Services GmbH and Wolfgang Alper, who supported this project, Zabbix now monitors not only the IT infrastructure of MVG but also its stations and moving vehicles (trams and subways).

PROBLEM

The main challenge was to find a tool that reliably detects errors in components installed in moving vehicles. These vehicles can spend some time in radio dead zones, so logic had to be developed to check whether a device is only temporarily unavailable or actually defective.
The critical moment came when the number of devices, along with the complexity and scope of the setup, increased and pushed the existing monitoring system to its limits. In addition, more diagnostic data was needed for a finer analysis of errors.

REQUIREMENTS

The monitoring solution had to be compatible with a large number of different protocols and manufacturers and provide an extensible scripting environment. Finally, it was important that the software supported data visualization.

APPROACH

Zabbix monitors the network components that transmit infotainment data (displayed on screens inside the vehicles), as well as the video components, making sure they function correctly so that video surveillance remains secure and reliable.

OUTCOME

After Zabbix was implemented, a number of problems surfaced that the previous monitoring solution had not detected. Failures can now be identified much faster and fixed immediately by accessing the relevant devices directly from the UI. Furthermore, a few external scripts simplify the workflow and add several convenience functions to the Zabbix maps.

The most important reasons why MVG decided to use Zabbix:

  • Support for many protocols, which allows monitoring of many different devices from various manufacturers.
  • Extensibility through scripts.
  • A high degree of data visualization.
  • High scalability.
  • Agentless monitoring.
  • Separation of functions: data collection, data evaluation, problem identification and alerting.
  • Large user community.
  • Powerful API.

“When the number and complexity of the devices to be monitored, and the resulting additional costs, became too high, MVG started looking for a new monitoring solution in 2015. At first, other monitoring solutions were investigated and tested, but they were not pursued further because they lacked flexibility and extensibility. More importantly, these tools did not offer reliable error detection for components installed in moving vehicles.

Thanks to the design of Zabbix, and especially the division into “items” and “triggers”, it became possible to meet this specific requirement by means of an individual logic pattern within the trigger condition.

Because the components are highly homogeneous, a solution that improves manageability through templates was sought. The requirement for simple integration of external scripts was met as well. All these positive features, together with the large community behind Zabbix, led to the decision to choose Zabbix.” Sandro Gehlhaar, network and system administrator.
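The quote does not spell out the trigger logic MVG uses to tell a dead zone from a defective device. As a rough illustration only, the Python sketch below models one plausible rule, assuming each device is checked periodically for reachability: a failed device is only reported as defective once its current outage has lasted longer than a grace period covering a typical dead zone. In Zabbix itself this would be expressed declaratively in a trigger condition over the availability item; the class `PingSample`, the function `classify_device` and the 600-second grace period are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical grace period: the longest time a vehicle is expected to
# stay in a radio dead zone (e.g. a tunnel section). Tune per network.
DEAD_ZONE_GRACE_S = 600


@dataclass
class PingSample:
    timestamp: int    # Unix time of the availability check
    reachable: bool   # True if the device answered


def classify_device(samples: List[PingSample], now: int,
                    grace_s: int = DEAD_ZONE_GRACE_S) -> str:
    """Classify a device as 'ok', 'unreachable', 'defective' or 'unknown'.

    'unreachable' means the current outage is still within the grace
    period and may simply be a dead zone; 'defective' means it has lasted
    longer than that.
    """
    if not samples:
        return "unknown"
    ordered = sorted(samples, key=lambda s: s.timestamp)
    if ordered[-1].reachable:
        return "ok"
    # The current outage starts at the first failed check after the most
    # recent successful one (or at the oldest sample if none succeeded).
    outage_start = ordered[0].timestamp
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.reachable and not cur.reachable:
            outage_start = cur.timestamp
    if now - outage_start > grace_s:
        return "defective"
    return "unreachable"


if __name__ == "__main__":
    # Reachable for the first 5 minutes, then silent; checked every minute.
    history = [PingSample(t, t < 300) for t in range(0, 1200, 60)]
    so_far = [s for s in history if s.timestamp <= 600]
    print(classify_device(so_far, now=600))    # unreachable
    print(classify_device(history, now=1140))  # defective
```

Measuring the outage from the last successful check, rather than counting failed samples, keeps the rule independent of the polling interval.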

Implementation

Currently, the Zabbix server monitors 3,796 devices; the same server also hosts the MySQL database and the web frontend. It runs as a virtual machine with 8 CPUs and 32 GB of RAM, backed by a SAS storage system. The server queries 105,818 items from these devices, and 23,820 triggers detect whether individual items deviate from their target state. This results in 298.48 new values per second (NVPS), with an average of approximately 7 people actively using the system at the same time. General overview:

  • Each device inside a tram/subway is treated as a host and is monitored for availability.
  • Each tram/subway is managed as a host group.
  • Host groups are nested and organized by the tram/subway lines (using the host group nesting feature introduced in Zabbix 3.2).
  • All devices depend on the connectivity of the MRCU (Mobile Radio Control Unit in subways) or the LTE router (4G LTE connectivity in trams).
  • Maps are automatically created for each tram/subway (using the Zabbix API).
  • Maps use sub-maps to link to a specific tram/subway view.
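The overview combines two Zabbix concepts: nested host groups, whose levels are separated by "/" in the group name since Zabbix 3.2, and the dependency of every device on its vehicle's gateway. The Python sketch below models both under an assumed naming scheme such as `Subway/U3/Train 601`, and it reports only the gateway (MRCU or LTE router) when the gateway itself is down, which is roughly the effect trigger dependencies have in Zabbix. The class `Vehicle`, the function `reportable_problems`, and all line names, vehicle IDs and host names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vehicle:
    kind: str                 # "Subway" or "Tram"
    line: str                 # line name, e.g. "U3" (hypothetical)
    vehicle_id: str           # hypothetical vehicle identifier
    gateway: str              # host name of the MRCU or LTE router
    devices: List[str] = field(default_factory=list)

    def host_group(self) -> str:
        # Nested host groups are expressed with "/" in the group name.
        return f"{self.kind}/{self.line}/{self.vehicle_id}"


def reportable_problems(vehicle: Vehicle, down: Dict[str, bool]) -> List[str]:
    """Return the hosts whose problems should be reported.

    If the gateway is down, only the gateway is reported: the devices
    behind it are unreachable as a consequence, not necessarily faulty.
    """
    if down.get(vehicle.gateway, False):
        return [vehicle.gateway]
    return [d for d in vehicle.devices if down.get(d, False)]


if __name__ == "__main__":
    train = Vehicle("Subway", "U3", "Train 601", "mrcu-601",
                    ["display-601-1", "camera-601-1"])
    print(train.host_group())  # Subway/U3/Train 601
    # Gateway and camera down: only the gateway is reported.
    print(reportable_problems(train, {"mrcu-601": True, "camera-601-1": True}))
    # ['mrcu-601']
    # Gateway up, camera down: the camera itself is reported.
    print(reportable_problems(train, {"camera-601-1": True}))
    # ['camera-601-1']
```

Modelling the gateway as the single point every device depends on means one radio outage produces one alert instead of one alert per device in the vehicle.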

The following examples show Zabbix maps that display each subway as a host group together with its state:


Detailed Zabbix maps per tram/subway show components and their state

Operations executed by context menus from the map – installation point

The installation point of the device in the tram/subway is shown directly from the Zabbix map

Operations executed by context menus from the map – vehicle location

Tram location is shown directly from the Zabbix map
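Since the per-vehicle maps and their sub-map links are generated through the Zabbix API, a sketch of the request body may help. The Python code below only builds a `map.create` JSON-RPC request (it does not send it; in practice it would be POSTed to the frontend's `api_jsonrpc.php`). Assumptions: it uses the `elements` list format of Zabbix 3.4 and later (Zabbix 3.2 used a single `elementid` field per map element), it passes the session token in the `auth` field, which newer releases deprecate in favor of an Authorization header, and the helper names, layout, host IDs, icon ID and token are hypothetical placeholders.

```python
import json
from typing import Dict, List

# Map element types from the Zabbix API map element object.
ELEMENT_HOST = 0
ELEMENT_MAP = 1


def host_element(host_id: str, icon_id: str, x: int, y: int) -> Dict:
    """Map element showing one host (one device in the vehicle)."""
    return {"elementtype": ELEMENT_HOST, "elements": [{"hostid": host_id}],
            "iconid_off": icon_id, "x": x, "y": y}


def submap_element(sysmap_id: str, icon_id: str, x: int, y: int) -> Dict:
    """Map element linking to another map, e.g. a single vehicle's view."""
    return {"elementtype": ELEMENT_MAP, "elements": [{"sysmapid": sysmap_id}],
            "iconid_off": icon_id, "x": x, "y": y}


def map_create_request(name: str, selements: List[Dict], auth_token: str,
                       request_id: int = 1) -> Dict:
    """Build (but do not send) a map.create JSON-RPC request body."""
    width = max(300, 100 * len(selements) + 100)
    return {
        "jsonrpc": "2.0",
        "method": "map.create",
        "params": {"name": name, "width": width, "height": 250,
                   "selements": selements},
        "auth": auth_token,
        "id": request_id,
    }


def vehicle_map_request(vehicle_name: str, host_ids: List[str],
                        icon_id: str, auth_token: str) -> Dict:
    """One map per vehicle: hosts laid out left to right in one row."""
    elements = [host_element(h, icon_id, 50 + 100 * i, 100)
                for i, h in enumerate(host_ids)]
    return map_create_request(vehicle_name, elements, auth_token)


if __name__ == "__main__":
    # Host IDs, icon ID and token are placeholders for a real installation.
    req = vehicle_map_request("Subway U3 Train 601", ["10501", "10502"],
                              icon_id="2", auth_token="<token>")
    print(json.dumps(req, indent=2))
```

A line overview map can be built the same way from `submap_element` entries, one per vehicle map, which gives the click-through from a line to a specific tram or subway.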

“While most monitoring projects focus on standard IT and application infrastructure, this project was really different and exciting. The difference was not only the idea to monitor trams, subways and stations in a very dynamic environment, but also the vision to visualize these entities in a user-friendly way and to make it part of their daily work. Zabbix proved to be the right choice and it was a great experience to see how these challenges could be mastered with Zabbix and the usage of the Zabbix-API.” – summarized Wolfgang Alper.


About the company:

Company: Stadtwerke München GmbH
Location: Munich
Founded: 1998
Employees: ~9067
Operating budget: 7.224 billion Euro


Author: Jekaterina Petruhina

Marketing Specialist at Zabbix
