How to use the remote command functionality in Zabbix and what to keep in mind. This is a real-life scenario of proactive monitoring of Windows services. Zabbix automatically starts services that were stopped for any reason, and escalations then notify people if the remote command did not help.
Contents
- Introduction (00:08)
- Zabbix remote commands: the idea behind it and what we can achieve with it (01:08)
- Using Zabbix remote commands to fix problems (05:00)
- Testing Zabbix remote commands (10:21)
Watch the video now.
Introduction
Today's topic is remote commands in Zabbix. There was a recent request to cover this fairly old but still useful feature, so here we go.
Note. My installation runs version 4.2.1, deployed from the Docker Compose files as usual. Don't worry: this is not a new feature. It is available even in 3.0, so you can use it even if you are not on the latest Zabbix release.
What I have here is:
- the default Zabbix server host, which is not really important for this article;
- a Zabbix agent I've installed on my Windows host, which is the machine I'm working on for this article;
- a host called localhost, which is a Zabbix agent running in my Docker container, just like the server, the frontend and the other components.
Zabbix remote commands: the idea behind it and what we can achieve with it
User parameters
As you know, agent items are responsible for collecting metrics. Usually these are fixed item keys that return the system CPU load, free disk space, the state of a service and so on. In previous articles we saw that the functionality of the Zabbix agent can be extended with user parameters.
To do this, we open the agent configuration file, add the new user parameter and restart the Zabbix agent service (or binary). After that, the new key is available. It can do whatever we want, depending on the command we've specified in the configuration file.
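The following is a minimal sketch of that workflow and was not part of the original demo. The shell function appends a `UserParameter=<key>,<command>` line to an agent config file unless that key is already defined. The function name and the key `custom.echo` are hypothetical. The config path is the one used later in this article. You still need to restart the agent afterwards.

```sh
# Append "UserParameter=<key>,<command>" to an agent config file, unless an
# uncommented definition for the same key is already there.
# Usage: add_userparameter CONF KEY COMMAND  (function name is hypothetical)
add_userparameter() {
    conf=$1 key=$2 cmd=$3
    # A line starting with "UserParameter=<key>," means the key already exists.
    if awk -v prefix="UserParameter=$key," \
        'index($0, prefix) == 1 { found = 1 } END { exit !found }' "$conf"; then
        echo "UserParameter for $key already defined in $conf" >&2
        return 1
    fi
    printf 'UserParameter=%s,%s\n' "$key" "$cmd" >> "$conf"
}

# Example with the hypothetical key custom.echo. The agent must be restarted
# before the new key can be queried:
#   add_userparameter /etc/zabbix/zabbix_agentd.conf custom.echo 'echo 123'
#   systemctl restart zabbix-agent
#   zabbix_get -s 127.0.0.1 -k custom.echo
```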
Remote commands, in turn, are controlled by parameters in the agent configuration file. In this article I demonstrate them on my localhost host, in zabbix_agentd.conf.
There are two parameters:
1. The first is EnableRemoteCommands, which is '0' by default. To allow remote commands, set it to '1':
# Mandatory: no
# Default:
# EnableRemoteCommands=0
EnableRemoteCommands=1
'0' is the default value, so remote commands are not allowed when you first install the Zabbix agent. Even if you try to use one from the frontend, it will not work, and you will get the error message 'Remote commands are not allowed on that host'.
2. The next parameter is LogRemoteCommands, which is also '0' (turned off) by default.
# Mandatory: no
# Default:
# LogRemoteCommands=0
LogRemoteCommands=1
If you enable it, every remote command executed by this agent is also written to the agent's log file. This gives you a kind of audit log, so you can later follow what was executed.
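To check what a given agent actually has configured, a small helper like the one below can read the effective value of either parameter. It is a sketch that assumes each parameter is set on at most one uncommented line; if there are several, it reports the last one. It falls back to the default you pass in, which is 0 for both parameters.

```sh
# Print the effective value of PARAM in a zabbix_agentd.conf-style file,
# or DEFAULT when the parameter is only commented out (or absent).
# Usage: zbx_conf_value CONF PARAM DEFAULT  (function name is hypothetical)
zbx_conf_value() {
    awk -v p="$2" -v d="$3" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                # not a PARAM=value line
            k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == p) { val = v; found = 1 }
        }
        END { print (found ? val : d) }
    ' "$1"
}

# Both parameters discussed above default to 0 (disabled):
#   zbx_conf_value /etc/zabbix/zabbix_agentd.conf EnableRemoteCommands 0
#   zbx_conf_value /etc/zabbix/zabbix_agentd.conf LogRemoteCommands 0
```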
system.run
First of all, the Zabbix agent has an item called system.run. I can use it from the CLI with zabbix_get -s localhost -k 'system.run[<command>]'. Inside the square brackets [] you can put any command you wish.
[root@localhost ~]# zabbix_get -s 127.0.0.1 -k 'system.run["echo 123"]'
Here’s what you should have on screen.
The response is '123', so we are basically running a remote command on the agent. Now, if you edit the agent config file
[root@localhost ~]# vim /etc/zabbix/zabbix_agentd.conf
and comment out EnableRemoteCommands (then save with :wq)
# Mandatory: no
# Default:
# EnableRemoteCommands=0
# EnableRemoteCommands=1
then restart the agent and try to execute the command again
[root@localhost ~]# systemctl restart zabbix-agent
[root@localhost ~]# zabbix_get -s 127.0.0.1 -k 'system.run["echo 123"]'
you will get an error message.
So to use the system.run key on an agent, you must enable remote commands in the agent's configuration file.
LogRemoteCommands is up to you. Enable it if you want the commands recorded in the log file; otherwise leave it off, since it is not mandatory.
Why use system.run instead of user parameters? To add a user parameter, you need to log in to that host, modify the configuration and restart the Zabbix agent. For system.run, it is enough to add an item in the frontend, so you are not limited to the CLI. Create it like any other item: go to the host's Items, click 'Create item', set the type to Zabbix agent and the key to system.run, and enter the command in the brackets.
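Item key syntax matters here. A command containing spaces, commas or `]` has to be passed as a quoted key parameter, and any embedded double quotes must be escaped as `\"`. The hypothetical helper below builds such a key. The optional second argument is the `system.run` mode: `wait` (the default) or `nowait`.

```sh
# Build a system.run item key with the command as a quoted key parameter.
# Embedded double quotes are escaped as \" per Zabbix item-key syntax.
# Usage: zbx_system_run_key COMMAND [wait|nowait]  (name is hypothetical)
zbx_system_run_key() {
    cmd=$(printf '%s' "$1" | sed 's/"/\\"/g')
    if [ -n "$2" ]; then
        printf 'system.run["%s",%s]\n' "$cmd" "$2"
    else
        printf 'system.run["%s"]\n' "$cmd"
    fi
}

# Passing the generated key as a single shell word keeps the inner quotes:
#   zabbix_get -s 127.0.0.1 -k "$(zbx_system_run_key 'echo 123')"
```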
Using Zabbix remote commands to fix problems
That is only the smallest part of remote commands. In the Zabbix agent items above, we used a remote command to collect a metric. But remote commands can also fix problems after they happen. We combine them with actions, so instead of just sending an email, the action executes a remote command to fix the problem.
For demonstration, I have configured an example on my Windows host, which is my own computer that I'm monitoring. I've linked the default Windows operating system template, which also includes Windows service discovery.
Windows Host Discovery
Discovery > Items
So I discover all services on my Windows machine that have the startup type automatic or automatic delayed. Discovery then creates the service state monitoring items. They are named like 'Windows service discovery: State of service', followed by the service name and, in brackets, its description.
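Here is a sketch of what "startup type automatic or automatic delayed" means for the discovery filter. It is an illustration only. The input format is an assumption: one `name<TAB>startup type` pair per line, mirroring the `{#SERVICE.NAME}` and `{#SERVICE.STARTUPNAME}` discovery macros. The regular expression is also an assumption, not the template's actual filter definition.

```sh
# Keep only services whose startup type is "automatic" or "automatic delayed".
# Input: one "NAME<TAB>STARTUPNAME" pair per line (assumed format).
# Output: the matching service names, one per line.
filter_auto_services() {
    awk -F '\t' '$2 ~ /^automatic( delayed)?$/ { print $1 }'
}

# Example with made-up services:
#   printf 'SvcA\tautomatic\nSvcB\tmanual\nSvcC\tautomatic delayed\n' | filter_auto_services
```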
It also has triggers. Scroll up to see Triggers.
Discovery > Triggers
The triggers fire when the service state is anything other than running: stopped, not started, hanging or some kind of error.
By default, a problem appears in your frontend if the last three checks all report the service as not running.
We can create an action that tries to start a Windows service every time it stops. These triggers can produce a lot of noise, and it is not always necessary to involve somebody from the support team just to log in to the machine and start a service.
Instead, when the service is not running, we can teach Zabbix to try to start it automatically. If the service is still not running after, say, 10 minutes, we can send an email to our support team.
How do we do that? Under Configuration > Actions, we create an action responsible for our Windows services. We must make sure this action applies only to Windows services. If the problem is, for example, low free disk space, trying to start or restart a service would obviously be a bad idea.
Configuration > Actions > Windows restart service > Operations
You also need to write the command. Since the example is on Windows, the command is simple. Click 'Edit' as in the picture above. After the page refreshes, scroll down to Commands on the left. In Commands, type net start followed by the Windows service name:
net start {TRIGGER.DESCRIPTION}
Here's how it should look on screen.
Remote command
Of course, the service name is dynamic. We don't know it in advance, because it depends on which trigger goes into the problem state.
We can solve this in the Windows service discovery rule. Go to Configuration → Hosts → Windows Host Discovery → Windows Service discovery.
On your Windows host, you need to make a few changes to the Windows service discovery rule. First, the trigger prototypes: click Trigger prototypes for Windows service discovery.
In the trigger prototype, fill the Description field with the low-level discovery macro {#SERVICE.NAME}, which returns the discovered service name.
Trigger prototypes > Template OS Windows: Service "{#SERVICE.NAME}" ({#SERVICE.DISPLAYNAME}) is not running (startup type {#SERVICE.STARTUPNAME}) > scroll down to Description on the left and type {#SERVICE.NAME}
Then, in the action's remote command, put the built-in {TRIGGER.DESCRIPTION} macro after net start. The macro resolves to the actual service name.
net start {TRIGGER.DESCRIPTION}
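A small sketch can trace the two macro substitutions. First, low-level discovery replaces `{#SERVICE.NAME}` in the prototype's description. Later, the action replaces `{TRIGGER.DESCRIPTION}` in the command. The `expand_macro` helper is hypothetical and does plain literal string replacement; Zabbix's own macro resolution is more involved.

```sh
# Replace every literal occurrence of MACRO (non-empty) in TEMPLATE with VALUE.
# Usage: expand_macro TEMPLATE MACRO VALUE  (helper is hypothetical)
expand_macro() {
    awk -v t="$1" -v m="$2" -v v="$3" 'BEGIN {
        out = ""
        while ((i = index(t, m)) > 0) {
            out = out substr(t, 1, i - 1) v   # text before the macro, then value
            t = substr(t, i + length(m))      # continue after the macro
        }
        print out t
    }'
}

# Stage 1 (discovery): trigger prototype description -> trigger description
#   desc=$(expand_macro '{#SERVICE.NAME}' '{#SERVICE.NAME}' BITS)
# Stage 2 (action): remote command -> command sent to the agent
#   expand_macro 'net start {TRIGGER.DESCRIPTION}' '{TRIGGER.DESCRIPTION}' "$desc"
```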
Next, add tags so the action reacts only to Windows service problems, since those are the only ones for which this remote command should run. Scroll up and click Tags.
Add a tag to the trigger prototype with the name 'type' and the value 'WinService'.
Now all the needed information is configured in the low-level discovery rule, and you can proceed with the action. Let's go back to 'Actions'.
Configuration > Actions > Windows restart service.
In the conditions you have: ‘Tag type equals WinService‘.
Following our logic, condition A matches only triggers with the tag value 'WinService', that is, Windows services. As an extra safeguard, you can add another condition.
This condition checks the host group and matches only if it equals Windows, because such a command obviously would not work on Linux machines.
Let’s get to operations now. Configuration > Actions > Windows restart service > Operations > Edit > scroll down to see Commands. In the Commands you will see
net start {TRIGGER.DESCRIPTION}
In every trigger created automatically from the trigger prototype, the description contains the actual service name.
Testing Zabbix remote commands
Now you can test this. Let's go to the dashboard.
There will probably already be some problems from services that are not running. Open the Services console in Windows, find Background Intelligent Transfer Service (BITS) and stop it.
Now go to Monitoring > Latest data.
Select 'Windows host', search for the BITS service and click Apply.
You will see that the last check is "stopped", because you stopped it seconds ago. You will probably need to wait for two more checks before the trigger fires. Then you can see whether the action was executed and whether the service actually started.
While you're waiting, let's talk about the system.run key. Go to Configuration > Hosts > localhost > Items, scroll down and find the item named 'System run-agent version' (change the page if necessary).
Once you click on it, you will see that the name is 'system.run agent version', the type is 'Zabbix agent' and the key is 'system.run', which means we will be executing a remote command. Let's copy the command from the key and run it in the CLI:
[root@localhost ~]# /usr/sbin/zabbix_agentd -V
This command returns the version of our Zabbix agent, and the item's value is the same as the output you get in the CLI.
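If you only need the version number rather than the full output, you can post-process the first line. This sketch assumes the first line of `zabbix_agentd -V` has the form `zabbix_agentd (daemon) (Zabbix) 4.2.1`, so check it against your own agent's output. The function name is hypothetical.

```sh
# Read `zabbix_agentd -V` output on stdin and print only the version number
# found on the first line after "(Zabbix) ".
zbx_agent_version() {
    sed -n '1s/.*(Zabbix) \([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   /usr/sbin/zabbix_agentd -V | zbx_agent_version
```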
Now let's go back to the Zabbix dashboard. In my example there is a new flashing problem, the latest one, at 15:04.
This new problem is about a different service, not BITS, so I did not cause it; the service stopped by itself on Windows. The duration is 5 seconds, the service is not running, and one action has been executed: 'Remote command - Executed'.
This is the evidence that Zabbix successfully started this Windows biometric service. We also see the tag type: WinService.
Once you refresh the page you will see a new flashing problem for the BITS.
The problem says: Service 'BITS' (Background Intelligent Transfer Service) is not running (startup type automatic or delayed). The problem duration is 2 seconds, and the tag is type: WinService.
Refresh the page again and you'll see that the action was executed.
The problem shows one action executed: a remote command with the status 'executed' in green. You can then verify this in Services > Background Intelligent Transfer Service, where the service status is 'Running'.
So Zabbix can automatically start a service that is not running, which saves time for our support team.
If you still want to notify somebody when the automatic fix fails, just add another step.
Configuration > Actions > Windows Restart service > Operations
So the first step, which runs immediately, is a remote command that tries to start the service.
To create another step, click 'New'. The second step could be 'send a message to some user groups after 10 minutes'.
Together, the steps mean: immediately try to start the service and keep monitoring. If the problem is still active after 10 minutes, notify our support team.
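The timing of the two steps can be modelled as shown below. This is a simplified sketch that assumes the action's default operation step duration is 10 minutes, so step 2 starts 10 minutes after the problem began. It only models a problem that is still open. If the problem resolves earlier, Zabbix stops the escalation and step 2 never runs.

```sh
# List the escalation steps already started for a problem that is still open.
# Args: minutes since the problem started, optional step duration (default 10).
escalation_ops() {
    minutes=$1
    step_duration=${2:-10}   # assumed default operation step duration, minutes
    echo "step 1: remote command (net start)"
    if [ "$minutes" -ge "$step_duration" ]; then
        echo "step 2: message to support user group"
    fi
}
```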
Note. Remote commands require a passive connection to the host. Suppose only active checks are possible, for example because the firewall does not allow incoming connections to the agent. In that case, the Zabbix server will not be able to execute a remote command on the agent.
In previous articles we talked about active and passive agent modes. You can use either mode or both. However, if your environment (for example, a cloud environment) does not allow any incoming connections, you won't be able to use this functionality.
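On the agent side, a quick sanity check is whether its configuration permits passive checks at all. The sketch below relies on documented agent parameters. `StartAgents=0` disables passive checks, and the agent then listens on no TCP port. `Server` lists the addresses allowed to connect, and `ListenPort` defaults to 10050. The check cannot see firewall rules, so passing it does not guarantee the server can reach the agent. The function names are hypothetical.

```sh
# Value of the last uncommented PARAM=... line in CONF, else DEFAULT.
conf_get() {
    awk -v p="$2" -v d="$3" '
        /^[ \t]*#/ { next }
        index($0, p "=") == 1 { val = substr($0, length(p) + 2); found = 1 }
        END { print (found ? val : d) }' "$1"
}

# Succeed if the agent config allows passive checks (needed for remote commands).
passive_checks_possible() {
    conf=$1
    start_agents=$(conf_get "$conf" StartAgents 3)   # default is 3
    server=$(conf_get "$conf" Server "")
    if [ "$start_agents" = 0 ]; then
        echo "StartAgents=0: passive checks (and remote commands) are disabled" >&2
        return 1
    fi
    if [ -z "$server" ]; then
        echo "Server is not set: no host is allowed to connect" >&2
        return 1
    fi
    echo "passive checks allowed from $server on port $(conf_get "$conf" ListenPort 10050)"
}
```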
Windows services are just an example. You can use this approach with other problems, triggers and monitoring across your servers and infrastructure.
I hope this article has taught you something new. Go ahead, do your own tests and make Zabbix even smarter in your environment.
We use a method that’s very similar to yours!
The only difference is that we use an even more general rule to fire "restart operations".
We set one single rule for everything, matching a specific trigger name that contains, for example, "Autorestart". Then we simply execute what's written in {TRIGGER.DESCRIPTION}, and so on, defining the whole escalation process. This is a VERY POWERFUL tool.
That's why we propose a small improvement: a specific NEW FIELD on the "Triggers" configuration page (we might call it "Trigger.action") where you can write down the SPECIFIC REMOTE ACTION to fire to recover from that specific problem.
That’s just because we wouldn’t like to misuse the “Description” field.
Thank you
This article is very interesting and useful.
Instead of starting a service, I need to start a Windows EXE as the USER who is logged in and has the shell open.
I created the action and it is correctly executed by Zabbix on the remote Windows host, but the application does not appear to the user because it is executed by SYSTEM instead of the user.
For my experiments I tried to open Notepad using c:\windows\system32\notepad.exe
Is it possible?