Learn how to build a monitoring system for Ceph Storage using Zabbix, improving visibility into the health of your storage solution and working proactively to identify failure events and performance issues before they impact your applications and even business continuity.
Introduction
Storage prices have been decreasing, business demands are growing fast, and companies are storing more data than ever before. Following this growth, a demand is emerging for monitoring and data protection around software-defined storage solutions. Downtime has a high cost: it can directly impact business continuity and cause irreversible damage to organizations. Some after-effects are loss of assets and information, interruption of services and operations, and violation of laws, regulations, or contracts. Beyond the direct financial impact, downtime can also cost you customers and damage a company’s reputation. Gartner estimates that a minute of downtime costs enterprises $5,600, and an hour more than $300,000. On the other hand, in a DevOps context, it is essential to think about Continuous Monitoring: a proactive approach to monitoring throughout the entire application life cycle and its components. This helps to identify the root cause of possible problems and to work quickly and proactively to prevent performance degradation or future outages. In this article, you will see how to implement monitoring of your storage solution (Ceph) using an enterprise-grade open source tool (Zabbix).
What’s Ceph Storage?
Ceph Storage is an open source, software-defined, petabyte-scale distributed storage system, designed mainly for cloud workloads. While some traditional NAS or SAN storage solutions are based on expensive proprietary hardware, software-defined storage is usually designed to run on commodity hardware, which can make these newer systems less expensive than traditional storage appliances. Ceph is designed primarily for the following use cases:
- Storing images and virtual block device storage for an OpenStack environment (using Glance, Cinder, and Nova)
- Applications that use standard APIs to access object-based storage
- Persistent storage for containers
According to the Ceph documentation, whether you want to provide object storage and/or block device services to cloud platforms, deploy a file system, or use Ceph for another purpose, all storage cluster deployments begin with setting up each Ceph node, your network, and the storage cluster itself. A Ceph storage cluster requires at least one Monitor (ceph-mon), Manager (ceph-mgr), and Object Storage Daemon (ceph-osd). The Metadata Server (ceph-mds) is also required when running Ceph File System clients. These are some of the many components that will be monitored by Zabbix. To learn more about what each component does, check the product documentation.
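As a quick sanity check, you can ask each of these daemon types for its status from any node that has the client.admin keyring available. The commands below are standard Ceph CLI calls, shown only as a reference (the prompt assumes one of the monitor nodes of the lab described later in this article):
[user@mons-0 ~]$ sudo ceph mon stat
[user@mons-0 ~]$ sudo ceph mgr stat
[user@mons-0 ~]$ sudo ceph osd stat
[user@mons-0 ~]$ sudo ceph mds stat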
Here we are proposing a lab, but if you are planning to do this in production, you should review the hardware and operating system recommendations.
What’s Zabbix and how can it help?
Zabbix is an enterprise-class, open source, distributed monitoring solution. It monitors numerous network parameters and the health and integrity of servers. Zabbix uses a flexible notification mechanism that allows users to configure e-mail based alerts for virtually any event, which allows a fast reaction to server problems. It offers excellent reporting and data visualization features based on the stored data, which makes it ideal for capacity planning. It supports both polling and trapping, and all reports and statistics, as well as configuration parameters, are accessed through a web-based frontend. The web-based frontend ensures that the status of your network and the health of your servers can be assessed from any location. Properly configured, Zabbix can play an important role in monitoring IT infrastructure. This is equally true for small organizations with a few servers and for large companies with a multitude of servers. I won’t be covering the Zabbix installation here, but there is a great guide and a video in the official documentation.
How did this all start?
Starting with Red Hat Ceph Storage 3, also known as Luminous, the Ceph Manager daemon (ceph-mgr) is required for normal operations and runs alongside the monitor daemons to provide additional monitoring and interfaces to external monitoring and management systems. You can also create modules to extend ceph-mgr with new features. Here we will use this extension capability through a Zabbix Python module, which is responsible for exporting overall cluster status and performance to the Zabbix server: the central process that performs monitoring, interacts with Zabbix proxies and agents, calculates triggers, sends notifications, and acts as a central repository of data. You can, of course, still actively collect traditional metrics about your operating systems, but the Zabbix module will start to gather storage-specific metrics and performance data and send them to the Zabbix server.
Here we have some examples of available metrics:
- Ceph performance: I/O operations, bandwidth, latency …
- Storage utilization and overview
- OSD status and how many are IN or UP
- Number of Mons and OSDs
- Number of Pools and Placement groups
- Overall Ceph status and much more!
What about my lab environment?
The Ceph cluster installation will not be covered here, but you can find more information about how to do that in the Ceph documentation. My storage cluster was installed using ceph-ansible.
The computing resources used were 12 instances, all with the same configuration (2 CPU cores and 4 GB RAM), distributed as follows:
- 3 Monitors and Managers nodes (collocated)
- 3 OSDs nodes with 3 disks per node (9 OSDs in total)
- 2 MDS nodes
- 2 RADOS Gateway nodes
- 1 Ansible Mgmt node
- 1 Zabbix server node collocated (Zabbix server, MariaDB server and Zabbix frontend)
Figure 1 – Lab topology
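For reference, a minimal ceph-ansible inventory matching this topology could look like the sketch below. The OSD node names (osds-0, osds-1, osds-2) are assumptions for illustration; only the mons-*, mdss-* and rgws-* names appear in the command outputs later in this article:
[mons]
mons-0
mons-1
mons-2

[mgrs]
mons-0
mons-1
mons-2

[osds]
osds-0
osds-1
osds-2

[mdss]
mdss-0
mdss-1

[rgws]
rgws-0
rgws-1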
The software resources used:
- Base OS for all instances: Red Hat Enterprise Linux 7.7
- Cluster Storage nodes: Red Hat Ceph Storage 4.0
- Management & Automation: Ansible 2.8
- Monitoring: Zabbix 4.4
Considering my cluster is installed and ready, here are the health, services, and task statuses:
[user@mons-0 ~]$ sudo ceph -s
  cluster:
    id:     7f528221-4110-40d7-84ff-5fbf939dd451
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mons-1,mons-2,mons-0 (age 37m)
    mgr: mons-0(active, since 3d), standbys: mons-1, mons-2
    mds: cephfs:1 {0=mdss-0=up:active} 1 up:standby
    osd: 9 osds: 9 up (since 35m), 9 in (since 3d)
    rgw: 2 daemons active (rgws-0.rgw0, rgws-1.rgw0)

  task status:

  data:
    pools:   8 pools, 312 pgs
    objects: 248 objects, 6.1 KiB
    usage:   9.1 GiB used, 252 GiB / 261 GiB avail
    pgs:     312 active+clean
How to enable the Zabbix module?
The Zabbix module is included in the ceph-mgr package, and your Ceph cluster must be deployed with the Manager service enabled. You can enable the Zabbix module with a single command on one of the ceph-mgr nodes:
[user@mons-0 ~]$ sudo ceph mgr module enable zabbix
You can check whether the Zabbix module is enabled with the following command:
[user@mons-0 ~]$ sudo ceph mgr module ls | head -5
{
    "enabled_modules": [
        "dashboard",
        "prometheus",
        "zabbix"
Sending data from Ceph cluster to Zabbix
This solution uses the Zabbix sender utility (zabbix_sender), a command-line tool that sends performance data to the Zabbix server for processing. The utility is usually used in long-running user scripts to periodically send availability and performance data. It can be installed on most distributions using the package manager. For high availability, you should install the zabbix_sender executable on all machines running ceph-mgr.
Let’s enable the Zabbix repositories and install zabbix_sender on all Ceph manager nodes:
[user@mons-0 ~]$ sudo rpm -Uvh https://repo.zabbix.com/zabbix/4.4/rhel/7/x86_64/zabbix-release-4.4-1.el7.noarch.rpm
[user@mons-0 ~]$ sudo yum clean all
[user@mons-0 ~]$ sudo yum install zabbix-sender -y
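Once zabbix_sender is installed, you can optionally confirm that the manager node can reach the Zabbix trapper port. The sketch below sends a dummy value for a made-up item key (ceph.test) against the server and host identifier that will be configured later in this article; since no such trapper item exists yet, the server will answer with something like "processed: 0; failed: 1", which is enough to prove connectivity:
[user@mons-0 ~]$ zabbix_sender -z zabbix.lab.example -p 10051 -s "ceph4-cluster-example" -k ceph.test -o 1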
Alternatively, you can automate this and use Ansible to run these commands at once on all three manager nodes:
[user@mgmt ~]$ ansible mgrs -m command -a "sudo rpm -Uvh https://repo.zabbix.com/zabbix/4.4/rhel/7/x86_64/zabbix-release-4.4-1.el7.noarch.rpm"
[user@mgmt ~]$ ansible mgrs -m command -a "sudo yum clean all"
[user@mgmt ~]$ ansible mgrs -m command -a "sudo yum install zabbix-sender -y"
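As a side note, the same result can be achieved more idempotently with Ansible's yum module instead of raw commands. This is only a sketch, assuming the mgrs inventory group from your ceph-ansible setup and privilege escalation with -b (disable_gpg_check is set because the Zabbix repository key has not been imported yet):
[user@mgmt ~]$ ansible mgrs -b -m yum -a "name=https://repo.zabbix.com/zabbix/4.4/rhel/7/x86_64/zabbix-release-4.4-1.el7.noarch.rpm state=present disable_gpg_check=yes"
[user@mgmt ~]$ ansible mgrs -b -m yum -a "name=zabbix-sender state=present"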
Configuring the module
Now that you understand how everything works, you need just a bit of configuration to make this module work properly:
- zabbix_host: This is the Zabbix server hostname or IP address to which zabbix_sender will send the items as traps.
- identifier: This is the Ceph cluster identifier in Zabbix. It controls the identifier/hostname used as the source when sending items to Zabbix, and it should match the name of the host in your Zabbix server. If you don’t configure the identifier parameter, the ceph-<fsid> of the cluster will be used when sending data to Zabbix, for example ceph-c6d33a98-8e90-790f-bd3a-1d22d8a7d354.
Optionally, there are several other configuration keys that can be set; their default values are:
- zabbix_port: 10051 – The TCP port on which the Zabbix server listens
- zabbix_sender: /usr/bin/zabbix_sender – The default path to the zabbix_sender binary
- interval: 60 – The interval, in seconds, at which zabbix_sender sends data to the Zabbix server
Configuring your keys
Configuration keys can be set on any server with the proper cephx credentials; these are usually the Monitor nodes, where the client.admin key is available.
[user@mons-0 ~]$ sudo ceph zabbix config-set zabbix_host zabbix.lab.example
[user@mons-0 ~]$ sudo ceph zabbix config-set identifier ceph4-cluster-example
[user@mons-0 ~]$ sudo ceph zabbix config-set interval 120
The current configuration of the module can also be shown using the following command:
[user@mons-0 ~]$ sudo ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "zabbix.lab.example", "identifier": "ceph4-cluster-example", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 120}
Exploring Zabbix: Templates, Host creation and Dashboard
First of all, it’s time to import your template. In the Zabbix world, a template is a set of entities that can be conveniently applied to multiple hosts. The entities may be items, triggers, graphs, discovery rules, and so on. Your base will be the items: keep in mind that an item is a particular piece of data you want to receive from a host, a single metric. When a template is linked to a host, all entities of the template are added to the host. Templates are assigned to each individual host directly.
Download the Zabbix template for Ceph, which is available in the source directory as an XML file. It’s important to download the template file locally in its raw form, or you will have problems importing it in the next step.
[user@mylaptop ~]$ curl https://raw.githubusercontent.com/ceph/ceph/master/src/pybind/mgr/zabbix/zabbix_template.xml -o zabbix_template.xml
To import the template, do the following:
- Go to: Configuration → Templates
- Click on Import at the top right
- Select the import file
- Click on the Import button
Figure 2 – Importing a Zabbix template
A success or failure message of the import will be displayed in the frontend.
Configure a host in the Zabbix frontend and link it to the newly imported template:
- Go to: Configuration → Hosts
- Click on Create host button to the right
- Enter Hostname and Group
- Link the Ceph template
Figure 3 – Creating your Ceph cluster host and adding to a group
Hostname and groups are required fields. Make sure the host has the same name as the identifier configured in the Ceph config-key parameter. Many groups are available, and you can choose an existing one or create a new one; choose Linux servers for this lab.
In the Templates tab, choose the ceph-mgr Zabbix module template that you imported earlier, click Select, and then click the Add button. (If you prefer to automate the host creation, see the API sketch after Figure 4.)
Figure 4 – Linking Ceph template to the host
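The same host can also be created through the Zabbix API, which is handy if you manage monitoring configuration as code. The sketch below is only an illustrative example against Zabbix 4.4 using curl: the URL reuses the zabbix.lab.example host from earlier, Admin/zabbix are the frontend defaults, and <AUTH>, <GROUP_ID> and <TEMPLATE_ID> are placeholders you would look up first (for example with the hostgroup.get and template.get methods):
[user@mylaptop ~]$ curl -s -H 'Content-Type: application/json-rpc' -d '{"jsonrpc":"2.0","method":"user.login","params":{"user":"Admin","password":"zabbix"},"id":1}' http://zabbix.lab.example/api_jsonrpc.php
[user@mylaptop ~]$ curl -s -H 'Content-Type: application/json-rpc' -d '{"jsonrpc":"2.0","method":"host.create","params":{"host":"ceph4-cluster-example","groups":[{"groupid":"<GROUP_ID>"}],"templates":[{"templateid":"<TEMPLATE_ID>"}],"interfaces":[{"type":1,"main":1,"useip":1,"ip":"127.0.0.1","dns":"","port":"10050"}]},"auth":"<AUTH>","id":2}' http://zabbix.lab.example/api_jsonrpc.php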
Configuration is done. After a few minutes, data should start to appear in the Zabbix web interface under the Monitoring → Latest data menu, and graphs will start to populate for the host. Many triggers are already configured in the template and will send out notifications, provided you configure your actions and operations.
Figure 5 – Latest data collected by Zabbix
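If nothing shows up, two quick checks can help. The module supports forcing an immediate send, and the ceph-mgr log usually explains why a send failed (the systemd unit name below assumes a non-containerized, daemon-based installation where the manager id matches the hostname):
[user@mons-0 ~]$ sudo ceph zabbix send
[user@mons-0 ~]$ sudo journalctl -u ceph-mgr@mons-0 -n 50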
After the data is collected, you can easily create Ceph dashboards like the one below and have fun with Zabbix:
Figure 6 – Zabbix Ceph Dashboard
Kudos:
Renato Puccini and Rodney Beauclair from Red Hat, for their review and insights.
Bio:
Alessandro Silva works at Red Hat as a Senior Cloud Success Architect, where he is responsible for supporting strategic customers in Latin America with cloud adoption. He is a Red Hat Certified Architect, an LPIC-3 Security specialist, and one of the first Zabbix Certified Specialists in Brazil. He is a Zabbix advocate and has given many presentations at conferences, including Zabbix Conference Latam 2016, where he presented the Zabbix Security Insights solution. Alessandro is available for connection through his LinkedIn: https://linkedin.com/in/alessandro-silva-236b4b42