Zabbix trigger expressions provide an incredibly flexible way of defining problem conditions. If you can express your problem in plain English or any other human language, there is a good chance it can be represented using triggers.

I’ve noticed that even experienced Zabbix users are not always aware of the true power of triggers. This article is about defining problems in a smart way so that the alerts generated by Zabbix are about real issues: no flapping, no more false alarms. Interested?

 

Let us start with some definitions first. According to the Zabbix documentation, a trigger is a logical expression that defines a problem threshold and is used to “evaluate” received data.

Triggers are not limited to a single item (metric) or host; you are free to create triggers that analyze performance and availability information from different hosts.

Simple thresholds.

A simple trigger expression may look like this:

{MySQL DB1:vfs.fs.size[/var/lib/mysql,pfree].last(0)} < 10

The first part of the expression, MySQL DB1:vfs.fs.size[/var/lib/mysql,pfree], is a unique reference to the item we process data from: the host name (MySQL DB1) followed by the item key. In this case it is the percentage of free disk space on the MySQL DB1 host.

last(0) is a function that returns the most recent value.

Therefore the whole expression means that if the percentage of free disk space on /var/lib/mysql volume goes below 10% we have a problem.

A few more examples:

CPU load is too high

{MySQL DB1:system.cpu.load.last(0)} > 2

MySQL is overloaded, too many transactions per second

{MySQL DB1:mysql.tps.last(0)} > 10000

Incoming traffic is more than 50Mbps

{Firewall:net.if.in[eth0].last(0)} > 50M

No Apache processes running

{MySQL DB1:proc.num[apache2].last(0)} = 0

Do you see any issues? Think again. Right, such triggers may lead to flapping: when values jump above and below the threshold, every isolated performance or availability blip opens and then closes a problem.
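To make the flapping visible, here is a minimal Python sketch (a simulation, not Zabbix code) that applies a last() > 2 style check to a series of CPU load samples and counts how often the state flips. The sample values and the helper name count_state_changes are assumptions made up for this illustration.

```python
def count_state_changes(values, threshold):
    """Count how often a simple 'last value > threshold' check flips state."""
    changes = 0
    state = False  # False = OK, True = PROBLEM
    for value in values:
        new_state = value > threshold
        if new_state != state:
            changes += 1
            state = new_state
    return changes


# Hypothetical CPU load samples hovering around the threshold of 2.
samples = [1.8, 2.1, 1.9, 2.2, 1.7, 2.3, 1.9]
print(count_state_changes(samples, 2))  # 6 state changes in 7 samples
```

Seven samples produce six state changes, and each of them could become a notification.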

Note that Zabbix comes with templates that use simple thresholds. We did it for simplicity’s sake. Simple trigger expressions are easy to understand, especially for beginners.

This is probably why our users sometimes say that Zabbix is too sensitive, that it generates too many alarms, or that it has no flapping detection.

Making it less sensitive.

This is where more advanced trigger functions come in handy. Our “CPU load is too high” trigger expression can take advantage of the min() function. Look:

{Oracle DB1:system.cpu.load.min(5m)} > 2

Now we are calculating the minimum of all values for the last 5 minutes. This expression means that CPU load stayed above 2 for the last 5 minutes, i.e. every value collected during that period was greater than 2.

Great! Now the trigger is much less sensitive: it will not alert us every time the CPU load jumps above 2.
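The effect of min() can be sketched in a few lines of Python. This is only a simulation under assumptions: the history is a list of hypothetical (timestamp in seconds, value) pairs collected once a minute, the function names are invented, and the exact handling of the window boundary is a guess rather than a statement about Zabbix internals.

```python
def min_over_window(history, now, window):
    """Return the smallest value sampled in the last `window` seconds, or None."""
    recent = [value for ts, value in history if now - window < ts <= now]
    return min(recent) if recent else None


def cpu_load_too_high(history, now, window=300, threshold=2):
    """Mimic 'min(5m) > 2': true only if every recent sample exceeds the threshold."""
    lowest = min_over_window(history, now, window)
    return lowest is not None and lowest > threshold


# Hypothetical data: a single spike versus a sustained high load.
spike = [(0, 1.0), (60, 1.2), (120, 3.5), (180, 1.1), (240, 1.0), (300, 1.3)]
sustained = [(0, 1.0), (60, 2.4), (120, 3.5), (180, 2.8), (240, 2.6), (300, 2.2)]

print(cpu_load_too_high(spike, now=300))      # False: the spike alone is ignored
print(cpu_load_too_high(sustained, now=300))  # True: load never dropped to 2 or below
```

A single spike to 3.5 no longer raises a problem, while five minutes of load above 2 does.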

Eliminating flapping and false alarms – hysteresis.

Hysteresis is an extremely useful but often overlooked feature. It allows us to define different conditions for the problem and recovery states. Our triggers suddenly become much smarter when powered by hysteresis.

How does it work? Zabbix supports a {TRIGGER.VALUE} macro, which returns the current trigger status as an integer (0 – ok, 1 – problem) and can be used directly in trigger expressions.

Let’s have a look at this example:

({TRIGGER.VALUE}=0 & {Oracle DB1:system.cpu.load.last()} > 2)
|
({TRIGGER.VALUE}=1 & {Oracle DB1:system.cpu.load.last()} > 1)

The {Oracle DB1:system.cpu.load.last()} > 2 part defines when a problem starts, while the second part of the expression: {Oracle DB1:system.cpu.load.last()} > 1 defines the condition to stay in the problem state.

The problem definition is much smarter now. We have a problem if CPU load is more than 2, while recovery happens only when the CPU load drops to 1 or below.
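If the logic feels abstract, the following Python sketch replays the same expression as a tiny state machine. It is an illustration only: the function names, the default thresholds (taken from the example above) and the sample values are assumptions, and trigger_value plays the role of {TRIGGER.VALUE}.

```python
def next_trigger_value(trigger_value, load, problem_above=2, stay_above=1):
    """Evaluate the hysteresis expression once; return the new trigger value (0 or 1)."""
    starts = trigger_value == 0 and load > problem_above  # OK -> PROBLEM
    stays = trigger_value == 1 and load > stay_above      # remain in PROBLEM
    return 1 if (starts or stays) else 0


def replay(values):
    """Feed samples through the trigger one by one and record each state."""
    state = 0
    states = []
    for value in values:
        state = next_trigger_value(state, value)
        states.append(state)
    return states


# Hypothetical CPU load samples.
samples = [1.5, 2.1, 1.9, 2.2, 1.2, 0.9, 1.5]
print(replay(samples))  # [0, 1, 1, 1, 1, 0, 0]
```

The trigger fires once at 2.1, stays in the problem state while the load wanders between 1 and 2, and recovers only at 0.9. The final 1.5 does not reopen the problem because it is not above 2.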

A few more examples; note the use of different trigger functions.

CPU load is too high

({TRIGGER.VALUE}=0 & {Oracle DB1:system.cpu.load.min(5m)} > 2)
|
({TRIGGER.VALUE}=1 & {Oracle DB1:system.cpu.load.min(10m)} > 0.5)

Lack of free disk space on /var/lib/mysql

({TRIGGER.VALUE}=0 & {MySQL:vfs.fs.size[/var/lib/mysql,pfree].last(0)} < 10)
|
({TRIGGER.VALUE}=1 & {MySQL:vfs.fs.size[/var/lib/mysql,pfree].last(0)} < 30)

Best practices

  • Do not start writing trigger expressions before you know precisely what problem you are trying to describe; define it and state it in plain words first.
  • Do not rely blindly on standard templates; review everything: the data you are collecting, the data collection frequency, trigger expressions, thresholds. Remember that you know your environment better than we do.
  • Define problem conditions wisely. Use advanced trigger functions and hysteresis.
  • Use global, template- and host-level macros instead of fixed values in trigger expressions. You will be able to tune the thresholds of thousands of triggers with two or three mouse clicks this way.
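To illustrate the last point, here is a rough Python sketch of how a threshold macro could be resolved, with host-level values overriding template-level ones, which in turn override global ones. The macro names {$CPU_LOAD_MAX} and {$FS_PFREE_MIN}, their values and the resolve helper are all hypothetical; this models the idea, not the actual Zabbix implementation.

```python
from collections import ChainMap

# Hypothetical macro definitions at three levels.
global_macros = {"{$CPU_LOAD_MAX}": "2", "{$FS_PFREE_MIN}": "10"}
template_macros = {"{$CPU_LOAD_MAX}": "4"}
host_macros = {"{$FS_PFREE_MIN}": "20"}


def resolve(expression, *levels):
    """Substitute user macros; earlier levels take precedence over later ones."""
    macros = ChainMap(*levels)
    for name, value in macros.items():
        expression = expression.replace(name, value)
    return expression


expr = "{Oracle DB1:system.cpu.load.min(5m)} > {$CPU_LOAD_MAX}"
print(resolve(expr, host_macros, template_macros, global_macros))
# {Oracle DB1:system.cpu.load.min(5m)} > 4
```

Changing one macro value at the global or template level adjusts every trigger that references it, while a host-level macro lets a single host deviate from the default.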

Additional reading