As some point you'll likely find yourself writing a script which needs to run all the time - a "long running script". These are scripts that shouldn't fail if there's an error, or ones that should restart when the system reboots.
To accomplish this, we need something to watch these scripts. Such tools are process watchers. They watch processes and restart them if they fail, and ensure they start on system boot.Let's see how to monitor long-running scripts in order to keep them running on bootup and after errors.
The Script
Here's an example long-running script - we'll make a quick service in Node. This NodeJS script will live at /srv/http.js
:
var http = require('http');
function serve(ip, port)
{
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.write("\nSome Secrets:");
res.write("\n"+process.env.SECRET_PASSPHRASE);
res.write("\n"+process.env.SECRET_TWO);
res.end("\nThere's no place like "+ip+":"+port+"\n");
}).listen(port, ip);
console.log('Server running at http://'+ip+':'+port+'/');
}
// Create a server listening on all networks
serve('0.0.0.0', 9000);
Note that the service prints out two environmental variables: "SECRET_PASSPHRASE" and "SECRET_TWO". We'll see how we can pass these into a watched process.
Supervisord
Supervisord is a simple and popular choice for process monitoring. Let's check out the package on Ubuntu:
$ apt-cache show supervisor
Package: supervisor
Priority: extra
Section: universe/admin
Installed-Size: 1485
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Qijiang Fan <fqj1994@gmail.com>
Architecture: all
Version: 3.0b2-1
Depends: python, python-meld3, python-pkg-resources (>= 0.6c7)
Filename: pool/universe/s/supervisor/supervisor_3.0b2-1_all.deb
Size: 313972
MD5sum: 1e5ee03933451a0f4fc9ff391404f292
SHA1: d9dc47366e99e77b6577a9a82abd538c4982c58e
SHA256: f83f89a439cc8de5f2a545edbf20506695e4b477c579a5824c063fbaf94127c1
Description: A system for controlling process state
Description-md5: b18ffbeaa3a697e8ccaee9cc104ec380
Homepage: http://supervisord.org/
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
We can see that we'll get version 3.0b2. That latest is version 3.1, but 3.0b2 is good enough. We can get a newer version by installing manually or by using Python's Pip, but then we'd lose out on making sure all the dependencies are met, along with the Upstart setup so that Supervisord works as a service and starts on system boot.
Installation
To install Supervisord, we can simply run the following:
sudo apt-get update
sudo apt-get install -y supervisor
Installing it as a package gives us the ability to treat it as a service:
sudo service supervisor start
Configuration
Configuration for Supervisord is found in /etc/supervisor/supervisord.conf
:
[include]
files = /etc/supervisor/conf.d/*.conf
So, any files found in /etc/supervisor/conf.d
and ending in .conf
will be included.
Now we need to tell Supervisord how to run and monitor our Node script. Let's create a configuration for it called webhooks.conf
.
This file will be created at /etc/supervisor/conf.d/webhooks.conf
:
[program:nodehook]
command=/usr/bin/node /srv/http.js
directory=/srv
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/webhook/nodehook.err.log
stdout_logfile=/var/log/webhook/nodehook.out.log
user=www-data
environment=SECRET_PASSPHRASE='this is secret',SECRET_TWO='another secret'
As usual, we'll cover the options set here:
[program:nodehook]
- Define the program to monitor. We'll call it "nodehook".command
- This is the command to run that kicks off the monitored process. We use "node" and run the "http.js" file. If you needed to pass any command line arguments or other data, you could do so here.directory
- Set a directory for Supervisord to "cd" into for before running the process, useful for cases where the process assumes a directory structure relative to the location of the executed script.autostart
- Setting this "true" means the process will start when Supervisord starts (essentially on system boot).autorestart
- If this is "true", the program will be restarted if it exits unexpectedly.startretries
- The number of retries to do before the process is considered "failed"stderr_logfile
- The file to write any errors output.stdout_logfile
- The file to write any regular output.user
- The user the process is run as.environment
- Environment variables to pass to the process.
Note that we've specified some log files to be created inside of the /var/log/webhook
directory. Supervisord won't create a directory for logs if they do not exist; We need to create them before running Supervisord:
sudo mkdir /var/log/webhook
Controlling Processes
Now that we've configured Supervisord to monitor our Node process, we can read the configuration in and then reload Supervisord, using the supervisorctl
tool:
supervisorctl reread
supervisorctl update
Our Node process should be running now. We can check this by simply running supervisorctl
:
$ supervisorctl
nodehook RUNNING pid 444, uptime 0:02:45
We can double check this with the ps
command:
$ ps aux | grep node
www-data 444 0.0 2.0 659620 10520 ? Sl 00:57 0:00 /usr/bin/node /srv/http.js
It's running! If we check our localhost at port 9000, we'll see the output written out by the NodeJS script, including the environment variables.
If your process is not running, try explicitly telling Supervisord to start process "nodehook" via
supervisorctl start nodehook
There's other things we can do with supervisorctl
as well. Enter the controlling tool using supervisorctl
:
$ supervisorctl
nodehook RUNNING pid 444, uptime 0:15:42
We can try some more commands:
Get a menu of available commands:
supervisor> help
# Available commands output here
Let's stop the process:
supervisor> stop nodehook
nodehook: stopped
Then we can start it back up
supervisor> start nodehook
nodehook: started
We can use <ctrl+c> or type "exit" to get out of the supervisorctl tool.
These commands can also be run directly:
$ supervisorctl stop nodebook
$ supervisorctl start nodebook
Web Interface
We can configure a web interface which comes with Supervisord.
Inside of /etc/supervisord.conf
, add this:
[inet_http_server]
port = 9001
username = user # Basic auth username
password = pass # Basic auth password
If we access our server in a web browser at port 9001, we'll see the web interface:
Clicking into the process name ("nodehook" in this case) will show the logs for that process.
Resource
- Also check out Python Circus, a more featured process monitor
- Check out sample Supervisord config for the popular Ghost blogging system