The new Zabbix proxy groups give us a way to build both redundancy and load balancing into our Zabbix proxy setups. However, one major limitation arises when we want to use SNMP traps with proxy groups – this isn’t natively supported at the moment. One of our customers asked me to find a solution to that problem, so here’s how I went about it.
Getting to grips with the problem
As mentioned, many of us are now facing a dilemma. Either we use proxy groups and go without SNMP traps, or we use proxy groups and move SNMP traps to a single proxy. Unfortunately, neither option is acceptable in the many environments where SNMP traps are an essential part of monitoring. The problem stems from how snmptrapd works in combination with Zabbix reading the trapper file. Improvements have already been made that leave more room for building our own solutions like this one.
Other Zabbix users have also been proposing solutions and I’m sure Zabbix is looking into improvements. Here’s an example case to vote on.
However, that doesn’t solve our issues right now. The problem starts when we send SNMP traps to a single proxy (Proxy 1, for example) while the Zabbix host that sent them (let’s say Zabbix host 2) is assigned to another proxy in the proxy group (Proxy 2, for example). In this situation, the trap arrives at the wrong proxy and Zabbix won’t be able to read it. The trap is simply ignored and never added to the Zabbix database.
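To make the failure mode concrete, here is a toy model in Python. It is not Zabbix code: the `Proxy` class, the `handle_trap` method and the addresses are hypothetical, and the model assumes that a proxy only accepts a trap when the sender's address belongs to a host currently assigned to it.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Toy stand-in for a Zabbix proxy (hypothetical, for illustration only)."""
    name: str
    assigned_hosts: set = field(default_factory=set)  # IPs of hosts assigned to this proxy
    stored: list = field(default_factory=list)        # traps this proxy accepted

    def handle_trap(self, source_ip: str, payload: str) -> bool:
        # Assumption: a trap from a host this proxy does not monitor is dropped.
        if source_ip not in self.assigned_hosts:
            return False
        self.stored.append((source_ip, payload))
        return True


# Hypothetical addresses from the documentation range 192.0.2.0/24.
proxy1 = Proxy("Proxy 1", {"192.0.2.1"})
proxy2 = Proxy("Proxy 2", {"192.0.2.2"})

# Zabbix host 2 (192.0.2.2) is assigned to Proxy 2, but its trap lands on Proxy 1.
accepted_by_proxy1 = proxy1.handle_trap("192.0.2.2", "linkDown")
```

In this model `accepted_by_proxy1` is `False` and nothing is stored, which mirrors the silent loss described above.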
The obvious fix is simple – we can configure our monitoring target, such as a switch or a router, to send each SNMP trap to multiple destinations. However, this means every trap crosses the network multiple times, increasing the load on our network. This is acceptable for smaller setups, but we were dealing with a setup that sends hundreds of traps every second.
Finding a solution
With the problem laid out for us, we came up with a simple duplication setup that included these requirements:
- Simple and easy to maintain/troubleshoot
- Traps are sent over the network only once
- Recovers quickly after a failover
- Works with both redundancy and load balancing
- Minimal extra packages
- No easily corruptible shared file systems
What we came up with in the end is visible in the image below:
It’s a simple setup that requires us to install two extra packages (keepalived and Docker) and run one container.
First, we added a VIP to our proxy setup using keepalived, to provide our monitoring targets with a single SNMP trap destination. The VIP is active on only one proxy at a time, regardless of whether there are 2, 10 or more proxies in the proxy group. Our switches, routers, or any other SNMP trap host can now be configured to send traps to this VIP.
Second, we needed a way to duplicate our traps. Only the proxy holding the VIP receives traps from the network, but the other proxies still need to see them. Without duplication, and with the VIP on Proxy 1, Zabbix host 2 would still lose its trap. We installed Docker and created a tiny, lightweight container on our hosts to duplicate each SNMP trap from one proxy to all other proxies in the group. Admittedly, this goes slightly against our second requirement, as we are now sending the trap over the network between proxies. This all happens within our own, more localized infrastructure, however, instead of over a longer network path.
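The article does not say what runs inside the container, so the following is only a minimal sketch of the idea in Python: receive a UDP datagram and send an identical copy to every other proxy. The function names and all addresses are hypothetical. One caveat to keep in mind: a plain socket relay like this re-sends the datagram from the relay's own address, so the original sender address is not preserved. If trap-to-host matching depends on that address, a real duplicator would need to handle it, which this sketch does not attempt.

```python
import socket


def fan_out(payload: bytes, targets, sock) -> int:
    """Send one datagram to every target; return how many sends succeeded."""
    sent = 0
    for host, port in targets:
        try:
            sock.sendto(payload, (host, port))
            sent += 1
        except OSError:
            # UDP is best effort: one unreachable peer must not block the rest.
            continue
    return sent


def relay_once(listen_sock, out_sock, targets, bufsize: int = 65535) -> int:
    """Receive a single datagram and duplicate it to all targets."""
    payload, _sender = listen_sock.recvfrom(bufsize)
    return fan_out(payload, targets, out_sock)


def serve(listen_addr, targets) -> None:
    """Run the relay forever (hypothetical entry point for the container)."""
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listen_sock.bind(listen_addr)
    out_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        relay_once(listen_sock, out_sock, targets)
```

A call such as `serve(("0.0.0.0", 162), [("192.0.2.12", 162), ("192.0.2.13", 162)])` would listen on the standard trap port and copy each trap to two peer proxies. Both peer addresses are placeholders, and binding to port 162 normally requires elevated privileges.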
That’s it! Whenever Proxy 1 receives a trap, it now duplicates it to Proxy 2. The proxy that monitors the sending host parses the trap and passes it on to Zabbix, while the other proxies ignore it. Even if a proxy restarts, fails over, or suddenly goes down, the trap will not be read twice.
The only thing to keep in mind is that it can take some time for keepalived to fail over the VIP. Because SNMP traps are UDP-based and are not retransmitted, any traps sent to the VIP while snmptrapd is down won’t be parsed. However, it’s definitely better to lose a few traps during a failover than to lose all of them during an outage!
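As a rough way to size that loss window, the sketch below estimates how many traps are at risk when the proxy holding the VIP dies abruptly. It assumes keepalived follows the VRRP master-down interval from RFC 5798 (three advertisement intervals plus a skew time derived from the backup's priority). With a 1-second interval, the VRRPv2 value is the same. The function names, the priority and the trap rate are hypothetical, and the result ignores any extra delay before snmptrapd on the new VIP holder actually sees traffic.

```python
def master_down_interval(advert_int: float = 1.0, backup_priority: int = 100) -> float:
    """Seconds a backup waits without advertisements before taking over the VIP."""
    skew_time = (256 - backup_priority) * advert_int / 256
    return 3 * advert_int + skew_time


def traps_at_risk(trap_rate: float, advert_int: float = 1.0, backup_priority: int = 100) -> float:
    """Estimate of traps sent to the VIP while no proxy is answering on it."""
    return trap_rate * master_down_interval(advert_int, backup_priority)
```

With a 1-second advertisement interval and a backup priority of 100, the window is about 3.6 seconds, so a hypothetical rate of 300 traps per second puts roughly a thousand traps at risk per hard failover.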