An audit improves the security of a product, specifically the “non-repudiation” aspect of threat models (risk is reduced when threat agents cannot deny having performed malicious activity). Zabbix 5.0 already had audit functionality, which received a major rewrite in 6.0 and several updates since then. In this blog post, we will go through these changes and get an overall picture of what has changed (and why).
The server-side work was mostly done in 6.0 and further improved in 7.0. Front-end work is still ongoing (due to its larger scope). The main goal of the Zabbix audit is to track all configuration and settings changes: who, when, and what. This is an enterprise-level requirement, but non-enterprise users can also benefit.
The situation before Zabbix 6.0
When a host or template is added, only its name is recorded, without any information about its items, triggers, tags, etc. The linking of a template to a host is not audited. All audit entries are generated by the front-end, except for script execution. Zabbix server itself actually makes a lot of configuration changes, including adding and updating hosts and updating items (during LLD or when linking templates during auto-registration or network discovery), but there is no audit for that at all. There are also non-configuration changes (events) we want to audit, including:
- Script execution (already audited in 5.0)
- Reloading passive proxy config data (ZBXNEXT-1580), added in 6.2
- HA node status change (ZBXNEXT-6923), added in 6.0
- History push API requests, that is, sending data to Zabbix server via API (ZBXNEXT-8541), added in 7.0
Audit overview
Most Zabbix server audit logic is in:
a) Linking of templates (as a result of auto-registration or network discovery) with updates to:
- Hosts
- Items
- Triggers
- Graphs
- Discovery Rules (and prototypes of everything above)
- Web Scenarios
b) LLD, with the following entities created from prototypes:
- Hosts
- Items
- Triggers
- Graphs
New audit goals
In addition to the main functional requirement to “track all configuration and settings changes,” there are additional requirements aimed at making all audits faster and easier to manage:
- All audits are now stored in a single table (simpler and faster SQL queries)
- Bulk SQL inserts and efficient ID generation
- The audit of a particular entity outlives the entity itself. If an entity (a host or a user) is deleted, the audit for it stays
- The audit has an independent housekeeping schedule
- It is still possible to disable the audit
CUID
Zabbix uses an ids table to generate ids:
When new IDs are needed (for items, triggers, etc.), the related row in the ids table gets locked. This is a problem for generating audit rows, because audit entries can be generated independently by the server and the front-end.
So, we could end up in a situation where a user cannot create an item because the server is holding a lock on the ids table while generating thousands of new LLD items. That is why a new method for generating IDs, CUID, is used for audits.
Thanks to it, the front-end and server can independently generate ids for audit entries without locks. The chance of collision is astronomically low.
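To make the idea concrete, here is a minimal sketch of a CUID-style generator. It is not Zabbix's implementation: the layout (a "c" prefix, a millisecond timestamp, a per-process counter, a process fingerprint, and a random block, all in base 36) is a simplified assumption, and the function names are hypothetical. The point it illustrates is that every component is available locally, so no shared table or lock is needed.

```python
import os
import secrets
import time

BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"
BLOCK = 4          # width of the counter and fingerprint blocks
_counter = 0       # per-process counter (not thread-safe in this sketch)


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(BASE36[r])
    return "".join(reversed(digits))


def pad(s: str, size: int) -> str:
    """Left-pad with zeros, then keep only the last `size` characters."""
    return s.rjust(size, "0")[-size:]


def cuid_like() -> str:
    """Build a 25-character ID from purely local information."""
    global _counter
    timestamp = pad(to_base36(int(time.time() * 1000)), 8)
    counter = pad(to_base36(_counter), BLOCK)
    _counter = (_counter + 1) % (36 ** BLOCK)
    fingerprint = pad(to_base36(os.getpid()), BLOCK)  # assumption: PID only
    random_part = "".join(secrets.choice(BASE36) for _ in range(2 * BLOCK))
    return "c" + timestamp + counter + fingerprint + random_part


if __name__ == "__main__":
    print(cuid_like())
```

Two processes can call such a function at the same moment without coordinating: the fingerprint and random block differ between them, and the counter separates IDs generated within the same millisecond by one process.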
System user
When it is not clear under which user an audit entry should be generated, it is recorded under “System user.” Most of the audits done by the server are recorded under “System user.” One exception is “script execution,” since it is clear which user clicked on the script execution button. However, under which user should the server record audit entries when new items are generated during LLD? We could track down which user created the LLD rule, but what if the LLD rule was then modified by another user? For such cases, “System user” is used.
RecordSetID
From the spec: “To have the ability to recognize that some set of audit log records was created during the processing of a separate operation, a new column ‘Recordset ID’ for audit log records will be provided. Each audit log record of the separate operation will have the same recordset ID. The recordset ID will be generated using the CUID algorithm.”
We can see that 2 graphs were created in a single operation (e.g. during the linking of one template with 2 graphs).
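The grouping described in the spec can be sketched as follows. This is an illustration only: the field names other than auditid, recordsetid, and username are hypothetical, and uuid4 is used as a stand-in for the CUID algorithm to keep the example self-contained.

```python
import uuid
from collections import defaultdict


def new_id() -> str:
    """Stand-in ID generator (the real audit uses CUID, not UUID)."""
    return uuid.uuid4().hex


def audit_operation(username, changes):
    """Create one audit row per change, all sharing one recordset ID.

    `changes` is a list of (action, resource_name) tuples; both field
    names are hypothetical and exist only for this sketch.
    """
    recordsetid = new_id()  # generated once per operation
    return [
        {
            "auditid": new_id(),          # unique per row
            "recordsetid": recordsetid,   # shared by the whole operation
            "username": username,
            "action": action,
            "resourcename": name,
        }
        for action, name in changes
    ]


def group_by_recordset(rows):
    """Reassemble operations from a flat list of audit rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["recordsetid"]].append(row)
    return dict(groups)


if __name__ == "__main__":
    rows = audit_operation("System", [("add", "Graph A"), ("add", "Graph B")])
    rows += audit_operation("Admin", [("update", "Host 1")])
    for rsid, members in group_by_recordset(rows).items():
        print(rsid, [m["resourcename"] for m in members])
```

Filtering the audit log by one recordset ID then returns everything that a single operation touched, such as both graphs created when one template was linked.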
Audit details
A new audit entry contains much more detail about what was changed:
Upgrade patch
Warning! The old auditlog and auditlog_details tables are removed by the upgrade patch to 6.0. A new auditlog table is created with the following schema changes:
- auditid is now CUID
- userid can be NULL (no more foreign reference on users table)
- username is added
- resource_cuid is added (alternative to resource, only for HA)
- recordsetid is added
- note and other auditlog_details table data is now stored in details (JSON)
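The effect of dropping the foreign key and storing the username directly can be shown with a small SQLite sketch. The table name auditlog_sketch, the column types, and the JSON payload are assumptions made for the example; only the column names auditid, userid, username, recordsetid, and details come from the list above, and the real table has more columns than shown here.

```python
import json
import sqlite3


def demo_audit_outlives_user():
    """Delete a user and show that the audit row for it is still readable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT)")
    # No FOREIGN KEY on userid: the audit row does not depend on the user row.
    conn.execute(
        "CREATE TABLE auditlog_sketch ("
        " auditid TEXT PRIMARY KEY,"
        " userid INTEGER,"      # may be NULL
        " username TEXT,"       # copied in, so it survives user deletion
        " recordsetid TEXT,"
        " details TEXT)"        # JSON stored as text
    )
    conn.execute("INSERT INTO users VALUES (?, ?)", (7, "alice"))
    conn.execute(
        "INSERT INTO auditlog_sketch VALUES (?, ?, ?, ?, ?)",
        (
            "c-example-1",
            7,
            "alice",
            "c-example-set",
            json.dumps({"example.field": "example value"}),  # hypothetical payload
        ),
    )
    conn.execute("DELETE FROM users WHERE userid = 7")

    remaining_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    username, details = conn.execute(
        "SELECT username, details FROM auditlog_sketch"
    ).fetchone()
    conn.close()
    return remaining_users, username, json.loads(details)


if __name__ == "__main__":
    print(demo_audit_outlives_user())
```

Because the username is denormalized into the audit row, the question "who did this?" can still be answered after the account itself is gone.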
Bulk SQL
It is much more efficient to execute SQL queries in bulk. Zabbix already relies on bulk SQL queries:
Inserting and/or updating thousands of new items in one query is much faster than running thousands of individual queries. There are many reasons for this, but the most basic answer is that databases are designed this way. For example, for a single large query, PostgreSQL needs to start the planner/optimizer only once, and it can then analyze the whole query and create an efficient execution plan. When running thousands of separate queries, the planner/optimizer has to be started for each one, and every time it analyzes a small query and decides there is not much it can do.
When the server makes configuration changes, such as LLD or template linking during auto-registration or network discovery, it inserts/updates/deletes items and triggers, and also auditlog entries, in one large query.
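As a rough illustration of the bulk approach (not the server's actual code, which is not shown in this post), the sketch below builds one multi-row INSERT per chunk of rows instead of one statement per row. SQLite is used only to keep the example self-contained; the helper name, the table, and the chunk size are assumptions, and the table and column names are interpolated into the SQL on the assumption that they are trusted constants.

```python
import sqlite3


def bulk_insert(conn, table, columns, rows, chunk_size=100):
    """Insert `rows` using one multi-row INSERT statement per chunk.

    Returns the number of SQL statements executed.
    """
    col_list = ", ".join(columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        sql = (
            f"INSERT INTO {table} ({col_list}) VALUES "
            + ", ".join([one_row] * len(chunk))
        )
        # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
        params = [value for row in chunk for value in row]
        conn.execute(sql, params)
        statements += 1
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audit_demo (auditid TEXT, details TEXT)")
    rows = [(f"id-{i}", "{}") for i in range(250)]
    print(bulk_insert(conn, "audit_demo", ["auditid", "details"], rows))
    conn.commit()
```

Here 250 rows are written with 3 statements rather than 250, which is the saving the paragraph above describes: the database parses and plans far fewer queries.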
Performance impact
Quick performance tests showed that the audit slows the server by at most 4-5%. The larger the setup, the smaller the impact will be.
Storage impact and administration
Zabbix audits can generate a lot of data. If your setup generates a lot of configuration changes, audit records can eventually fill up the storage space. In this case, there are several audit settings that could be helpful.
First of all, the audit can be disabled for all of Zabbix, including the front-end.
Disabling audit is not advised, however – this option exists mostly as a possible workaround. Audit is enabled by default and Zabbix is developed and tested with audit enabled.
Log system actions button:
Turning this option off disables the audit done by Zabbix server during auto-registration, network discovery, and LLD. On some systems, these can generate a lot of configuration changes and audit entries, for example when LLD discovers hundreds of new devices every minute.
This could help reduce the storage impact while preserving all other audit functionality.
Housekeeping schedule:
If a host, trigger, or graph is deleted (by housekeeper or manually), the audit generated for it stays (as it exists in a separate table).
A Zabbix audit has its own independent housekeeping schedule, and it can be adjusted to suit your environment.
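Conceptually, an independent housekeeping schedule means that audit rows are removed based only on their own age, regardless of whether the audited entity still exists. The sketch below shows that idea with SQLite; it is not the Zabbix housekeeper, and the table name auditlog_sketch, the clock column (a Unix timestamp), and the function name are assumptions for the example.

```python
import sqlite3


def purge_old_audit(conn, now, keep_seconds):
    """Delete audit rows older than the retention period.

    `now` is passed in explicitly so the behaviour is easy to test.
    Returns the number of rows removed.
    """
    cutoff = now - keep_seconds
    cur = conn.execute("DELETE FROM auditlog_sketch WHERE clock < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auditlog_sketch (auditid TEXT, clock INTEGER)")
    conn.executemany(
        "INSERT INTO auditlog_sketch VALUES (?, ?)",
        [("a", 0), ("b", 50), ("c", 100)],
    )
    # Keep the last 60 seconds relative to now=100: only row "a" is removed.
    print(purge_old_audit(conn, now=100, keep_seconds=60))
```

The retention period is the only input, so it can be tuned for the audit separately from the retention of history, trends, or events.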