There are many design choices to consider when we build our monitoring environment for high-frequency monitoring. How to minimize performance impact? What are the data retention policies with storage space in mind? What are the available out-of-the-box features to solve these potential problems?
In this blog post, we will discuss when you should use preprocessing and when it is better to use the “Do not keep history” option for your metrics, and what are the pros and cons for both of these approaches.

Throttling and other preprocessing steps

We’ve discussed throttling previously as the go-to approach for high-frequency monitoring. Indeed, with throttling, you can discard repeated values and do so with a heartbeat. This is extremely useful with metrics that come as discrete values – services states, network port statuses, and so on.
Example of throttling with and without heartbeat
In addition, since starting from Zabbix 4.2 all preprocessing is also performed by Zabbix proxies. This means we can discard the repeated values before they reach the Zabbix server. This can help us both with the performance (fewer metrics to insert in the Zabbix server DB) and reduce the DB size (Fewer metrics stored in the DB. This also helps with improving overall Zabbix performance)
There are a few caveats with this approach – since metrics get discarded before they reach the Zabbix server, the triggers will not react on these metrics (This is where having a heartbeat is useful) and, since trends are calculated by Zabbix server based on the received history data, there could be a lack of trend information for these metrics. Keep in mind that this applies not only to throttling preprocessing rules – any preprocessing can be done on the proxy and any preprocessing rules can be used to transform your data.

Understanding “Do not keep history” option

The behavior of “Do not keep history” which we can define when configuring an item is a bit different though. If we collect an item by a Proxy and configure the item with “Do not keep history”, the history won’t always get discarded! There are a couple of reasons for this.
  • First off, let’s not forget that some of our values can populate host inventory! If the particular item is configured to populate an inventory field – it will be forwarded to the Zabbix server, but it will not get stored in the history tables.
  • If the item does not populate an inventory field – the text data such as character, log and text will indeed get discarded before reaching the Zabbix server, but Numeric values – both float and integer, will get forwarded to the server. The reason for that is deriving trend information from the numeric values. Mind that the numeric data will still not get stored in the history tables, only trends will be available for these items.

Note: This behavior has been properly implemented starting from Zabbix 5.2. See ZBX-17548

Setting the “Do not keep history” option for an item

Using trend functions with high-frequency monitoring

With the specifics of “Do not keep history” in mind, we should now recall that starting from Zabbix 5.2 we have trend functions available at our disposal!
History functions such as trendavg, trendcount, trendmax, trendmin, trendsum allow us to perform different kinds of trend calculations – from counting the number of trend values to retrieving min/max/avg trend values for a time period.
This means, that if we require only the metric trend for specific time periods (hours, days, weeks, etc) we can use these trend functions together with “Do not keep history” option, thus discarding unnecessary data and improving our Zabbix server performance!
There are two approaches two using trend functions:
  • If you wish to collect and display the trend data, you need to create the item which will collect the metrics (say, a net.if.in Agent item for collecting incoming network traffic) and then create a separate calculated item that uses the trend function to calculate the avg/min/max value for the trend over a time period. The original item can then have “Do not keep history” option selected for it.

trendavg item for calculating hourly trends from the net.if.in[ifHCInOctets.5] item

 

  • If you wish to simply define triggers and react on long-term trends and are not required to collect the trend values, then we can skip the creation of the calculated item and simply use the trend function on the original item in the trigger.

This trigger fires if the hourly average trend value exceeds 100M.
Note: In this case only the original item is required.

By combining these approaches in our environment – using preprocessing when we wish to discard or transform the data and also implementing opting out of storing the history data, whenever this is appropriate, we can minimize the performance impact on our Zabbix instance. Add a layer of distributed Zabbix proxies on top of this and you can truly achieve a large, scalable Zabbix infrastructure optimized for high-frequency ingestion and processing of your data.

Subscribe
Notify of
2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Ian Bromehead
Ian Bromehead
2 years ago

Thanks Artur, nice discussion to help us especially in large configurations. One thing I’d suggest to help clarification is when you speak about those 5.2 onwards history functions, is to explain where they are executed. Before that section you explain well how to use options to optimise data transmission to the server and reduce unnecessary load on it. If I understand correctly trend analysis and execution of these functions is all performed on the server, so their adoption has no impact on the amount of data transmitted, right?
So the only economy gained is DB space, or am I missing something?
As the data used isn’t stored by the server, wouldn’t it be logical to have an option to perform these functions on the proxy, at least in certain cases?
Thanks
Ian

ArtursL
ArtursL
2 years ago
Reply to  Ian Bromehead

Thanks for the comment, Ian!

The obvious benefit is the saved storage space – that’s correct.
But, there are also less DB writes ( With “Do not keep history” the history data does not get written to the DB in most cases, as per the blog post)
Also, if we were to use History trigger functions, they would get re-analyzed every time we pick up new history data (say, every 3 seconds if the update interval is 3s) while trend trigger re-calculation rate depends on the trend period (hourly/daily/weekly, etc)

So in the end, there are a few benefitt both for storage and performance.

As for why the proxy doesn’t do trend/trigger calculation – we really wish to avoid delegating extra functionality to proxies, as that would further bloat them. But I cannot really comment on the plans about further proxy functionality and design decisions, so this isn’t necessarily set in stone.

2
0
Would love your thoughts, please comment.x
()
x