Zabbix performance is constantly being improved, and 1.8 already brought significant gains. Pretty much every release in the 1.8 series then added more benefits, reduced database access and so on. More performance improvements are expected with the 2.0 release, but there has been so little time to gather information… luckily, some users do provide us with empirical evidence 🙂
The promise of 2.0
The Zabbix 2.0 “What’s new” page lists several changes that should improve performance: general configuration caching, improved user macro caching, and improvements to the history syncer and escalator internal processes. The biggest benefit, however, will most likely come from the trigger cache.
Before 2.0, trigger configuration had to be retrieved from the database as triggers were evaluated, which could mean quite a lot of work on a large system. Zabbix now caches almost all of the configuration needed to decide whether received values should be considered a problem.
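To illustrate why caching trigger configuration should cut database reads, here is a minimal sketch that counts SELECT queries under two strategies: looking up a trigger's configuration for every incoming value, versus loading all trigger configuration once and serving it from memory. This is not Zabbix code (the server is written in C and its cache is far more involved); `Trigger`, `FakeDatabase`, `evaluate_uncached` and `TriggerCache` are hypothetical names, and a trigger is simplified to "value above a threshold".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """Hypothetical, simplified trigger: a problem when the item's value exceeds a threshold."""
    item_id: int
    threshold: float


class FakeDatabase:
    """Hypothetical stand-in for the configuration database; counts SELECT queries."""

    def __init__(self, triggers):
        self._triggers = {t.item_id: t for t in triggers}
        self.selects = 0

    def select_trigger(self, item_id):
        # One query per lookup, as in the uncached approach.
        self.selects += 1
        return self._triggers.get(item_id)

    def select_all_triggers(self):
        # One query that loads every trigger, as when filling a cache.
        self.selects += 1
        return dict(self._triggers)


def evaluate_uncached(db, values):
    """Illustrative pre-2.0 style: fetch trigger configuration for every received value."""
    problems = []
    for item_id, value in values:
        trigger = db.select_trigger(item_id)
        if trigger is not None and value > trigger.threshold:
            problems.append(item_id)
    return problems


class TriggerCache:
    """Illustrative 2.0 style: load trigger configuration once, then evaluate from memory."""

    def __init__(self, db):
        self._triggers = db.select_all_triggers()

    def evaluate(self, values):
        problems = []
        for item_id, value in values:
            trigger = self._triggers.get(item_id)
            if trigger is not None and value > trigger.threshold:
                problems.append(item_id)
        return problems


# 100 items with one trigger each; 1000 received values, every tenth one above threshold.
triggers = [Trigger(item_id=i, threshold=90.0) for i in range(100)]
values = [(i % 100, 95.0 if i % 10 == 0 else 50.0) for i in range(1000)]

db_old = FakeDatabase(triggers)
old_problems = evaluate_uncached(db_old, values)

db_new = FakeDatabase(triggers)
new_problems = TriggerCache(db_new).evaluate(values)

print(db_old.selects, db_new.selects)  # 1000 1
```

Both strategies find the same problems, but the uncached one issues a query per value while the cached one issues a single query. A real cache also has to be refreshed when the configuration changes, which this sketch ignores.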
The real world impact
Now that’s great, but how do we know whether this has any impact on real systems? Thanks to Robert Hau from West Highland, we have a couple of graphs showing the impact of the 2.0 upgrade on a production system.
First, we can see how database queries changed from before the upgrade to after it (the post-upgrade data is on the rightmost side of the graph, after the gap; the gap itself is the upgrade).
Most query types don’t seem to change significantly… except selects. They went from just below 3000 to 500 right after server startup, and to around 260 some time later. Apparently this system has a fair number of triggers, and reducing the need to retrieve trigger configuration from the database has helped quite a lot here.
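To put those select numbers in perspective, a quick calculation of the relative reduction, using the values read off the graph above. The "before" value is rounded up to 3000 from "just below 3000", so the percentages are approximate (and slightly optimistic); `percent_reduction` is a helper introduced here for illustration.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Approximate select rates read off the graph (same units as the graph).
before_upgrade = 3000      # "just below 3000", rounded up
right_after_startup = 500
some_time_later = 260

print(f"{percent_reduction(before_upgrade, right_after_startup):.1f}%")  # 83.3%
print(f"{percent_reduction(before_upgrade, some_time_later):.1f}%")      # 91.3%
```

In other words, once the server had settled, the select rate dropped by roughly an order of magnitude.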
So much for database access. What about the Zabbix server itself: is it any happier?
Apparently it is. Again, the rightmost side shows the effect of the upgrade. The most notable change seems to be the poller processes’ busy rate dropping from 11-15% to 6-7%, which also became more even. The history syncer processes had some spikes before the upgrade, but we would need more data to tell whether these spikes are gone with 2.0.
2.0 does seem to improve performance, mostly by reducing the select rate significantly. Note that this most likely depends on having a large number of triggers: on a system with lots of items but no triggers, the change could be much smaller or nonexistent. It is also likely that the difference between 1.8.14 and 2.0.0 would be slightly smaller, as releases after 1.8.5 brought some minor performance improvements as well.
Nevertheless, such an improvement should be good news for everyone who has already upgraded to 2.0 or is planning to. If you have already upgraded, do you see a real performance gain? If so, let us know (preferably with shiny graphs 🙂).