March 05, 2016

Running Ansible 2 Programmatically

Ansible 2 is out, and that means it's time to update the previous article on Running Ansible Programmatically for Ansible 2, which has significant API changes under the hood.

Use Case

At work, we are spinning up hosted trials for a historically on-premise product (no multi-tenancy).

To ensure things run smoothly, we need logging and reporting of Ansible runs while these trials spin up or are updated.

Each server instance (installation of the application) has unique data (license, domain configuration, etc).

Running Ansible programmatically gives us the most flexibility and has proven to be a reliable way to go about this.

At the cost of some code complexity, we avoid generating host and variable files on the system (although a dynamic inventory might also have let us do this, so this is certainly not THE WAY™ to solve this problem).

There are ways to accomplish all of this without diving into Ansible's internal API. The trade-off seems to be control vs "running a CLI command from another program", which always feels dirty.

Learning some of Ansible's internals was fun, so I went ahead and did it.

Overall, there's just more control when calling the Ansible API programmatically.

Install Dependencies

Ansible 2 is the latest stable, so we don't need to do anything fancy to get it. We can get a virtual environment (Python 2.7, because I live in the stone-age) up and running and install dependencies into it. I'm using an Ubuntu server in this case:

# Get pip
sudo apt-get install -y python-pip

# Get/update pip and virtualenv
sudo pip install -U pip virtualenv

# Create virtualenv
cd /path/to/runner/script
virtualenv ./.env
source ./.env/bin/activate

# Install Ansible into the virtualenv
pip install ansible

Then, with the virtual environment active, we call upon Ansible from our Python scripts.

Ansible Config

In the Ansible 1 series, I first tried and eventually gave up on trying to set the path to the ansible.cfg file in code. I didn't even try again in Ansible 2, opting instead to (again) set the environmental variable ANSIBLE_CONFIG.

That variable looks something like ANSIBLE_CONFIG=/path/to/ansible.cfg
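If you'd rather not depend on the shell exporting it, one option is to set the variable from Python before anything from Ansible is imported. This is a minimal sketch, assuming Ansible picks up ANSIBLE_CONFIG when its modules are first imported (worth verifying against your version). The helper name use_ansible_config is made up for this example and is not part of Ansible:

```python
import os


def use_ansible_config(cfg_path, environ=os.environ):
    """Point ANSIBLE_CONFIG at cfg_path and return the absolute path.

    Hypothetical helper: call it BEFORE the first "import ansible...",
    on the assumption that Ansible reads the variable at import time.
    """
    cfg_path = os.path.abspath(os.path.expanduser(cfg_path))
    if not os.path.isfile(cfg_path):
        # Fail early instead of letting Ansible silently use another config
        raise IOError("ansible.cfg not found: %s" % cfg_path)
    environ['ANSIBLE_CONFIG'] = cfg_path
    return cfg_path


# Usage (the path is a placeholder):
# use_ansible_config('/path/to/ansible.cfg')
# from ansible.executor import playbook_executor
```

Failing early on a missing file is a deliberate choice here: a wrong path would otherwise just mean the log and callback settings are quietly ignored.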

The ansible.cfg file looks something like this:

[defaults]
log_path = /var/log/ansible/ansible.log
callback_plugins = /path/to/project/callback_plugins:~/.ansible/plugins/callback_plugins/:/usr/share/ansible_plugins/callback_plugins

[ssh_connection]
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s
control_path = /home/your_username/.ansible/cp/ansible-ssh-%%h-%%p-%%r

Here's what is set here:

  • log_path - Uses the internal log plugin to log all actions to a file. I may turn this off in production in favor of our own log plugin, which will send logs to another source (database or a log aggregator).
  • callback_plugins - The file path a custom logging plugin (or any plugin!) would be auto-loaded from, if any. I add the project's path in first, and then append the default paths after.
  • ssh_args - Sets SSH options as you would normally set with the -o flag. This controls how Ansible connects to the host server. The ones I used will prevent the prompt that asks if this is a trusted host (be cautious with doing that!) and ensure Ansible uses our private key to access the server. Check out the video on Logging in with SSH to see some examples of those options used with the SSH command.
  • control_path - I set the SSH control path to a directory writable by the user running the script/Ansible code (and thus using SSH to connect to the remote server). This is likely optional in reality.

The Script(s)

Like in our previous article, we'll dive right into the main script.

import os
from tempfile import NamedTemporaryFile
from ansible.inventory import Inventory
from ansible.vars import VariableManager
from ansible.parsing.dataloader import DataLoader
from ansible.executor import playbook_executor
from ansible.utils.display import Display

class Options(object):
    """
    Options class to replace Ansible OptParser
    """
    def __init__(self, verbosity=None, inventory=None, listhosts=None, subset=None, module_paths=None, extra_vars=None,
                 forks=None, ask_vault_pass=None, vault_password_files=None, new_vault_password_file=None,
                 output_file=None, tags=None, skip_tags=None, one_line=None, tree=None, ask_sudo_pass=None, ask_su_pass=None,
                 sudo=None, sudo_user=None, become=None, become_method=None, become_user=None, become_ask_pass=None,
                 ask_pass=None, private_key_file=None, remote_user=None, connection=None, timeout=None, ssh_common_args=None,
                 sftp_extra_args=None, scp_extra_args=None, ssh_extra_args=None, poll_interval=None, seconds=None, check=None,
                 syntax=None, diff=None, force_handlers=None, flush_cache=None, listtasks=None, listtags=None, module_path=None):
        self.verbosity = verbosity
        self.inventory = inventory
        self.listhosts = listhosts
        self.subset = subset
        self.module_paths = module_paths
        self.extra_vars = extra_vars
        self.forks = forks
        self.ask_vault_pass = ask_vault_pass
        self.vault_password_files = vault_password_files
        self.new_vault_password_file = new_vault_password_file
        self.output_file = output_file
        self.tags = tags
        self.skip_tags = skip_tags
        self.one_line = one_line
        self.tree = tree
        self.ask_sudo_pass = ask_sudo_pass
        self.ask_su_pass = ask_su_pass
        self.sudo = sudo
        self.sudo_user = sudo_user
        self.become = become
        self.become_method = become_method
        self.become_user = become_user
        self.become_ask_pass = become_ask_pass
        self.ask_pass = ask_pass
        self.private_key_file = private_key_file
        self.remote_user = remote_user
        self.connection = connection
        self.timeout = timeout
        self.ssh_common_args = ssh_common_args
        self.sftp_extra_args = sftp_extra_args
        self.scp_extra_args = scp_extra_args
        self.ssh_extra_args = ssh_extra_args
        self.poll_interval = poll_interval
        self.seconds = seconds
        self.check = check
        self.syntax = syntax
        self.diff = diff
        self.force_handlers = force_handlers
        self.flush_cache = flush_cache
        self.listtasks = listtasks
        self.listtags = listtags
        self.module_path = module_path

class Runner(object):

    def __init__(self, hostnames, playbook, private_key_file, run_data, become_pass, verbosity=0):

        self.run_data = run_data

        self.options = Options()
        self.options.private_key_file = private_key_file
        self.options.verbosity = verbosity
        self.options.connection = 'ssh'  # Need a connection type "smart" or "ssh"
        self.options.become = True
        self.options.become_method = 'sudo'
        self.options.become_user = 'root'

        # Set global verbosity
        self.display = Display()
        self.display.verbosity = self.options.verbosity
        # Executor appears to have its own
        # verbosity object/setting as well
        playbook_executor.verbosity = self.options.verbosity

        # Become Pass Needed if not logging in as user root
        passwords = {'become_pass': become_pass}

        # Gets data from YAML/JSON files
        self.loader = DataLoader()
        self.loader.set_vault_password(os.environ['VAULT_PASS'])

        # All the variables from all the various places
        self.variable_manager = VariableManager()
        self.variable_manager.extra_vars = self.run_data

        # Parse hosts, I haven't found a good way to
        # pass hosts in without using a parsed template :(
        # (Maybe you know how?)
        self.hosts = NamedTemporaryFile(delete=False)
        self.hosts.write("""[run_hosts]
%s
""" % hostnames)
        self.hosts.close()

        # This was my attempt to pass in hosts directly.
        # 
        # Also Note: In py2.7, "isinstance(foo, str)" is valid for
        #            latin chars only. Luckily, hostnames are 
        #            ascii-only, which overlaps latin charset
        ## if isinstance(hostnames, str):
        ##     hostnames = {"customers": {"hosts": [hostnames]}}

        # Set inventory, using most of above objects
        self.inventory = Inventory(loader=self.loader, variable_manager=self.variable_manager, host_list=self.hosts.name)
        self.variable_manager.set_inventory(self.inventory)

        # Playbook to run. Assumes it is
        # local to this python file
        pb_dir = os.path.dirname(__file__)
        playbook = "%s/%s" % (pb_dir, playbook)

        # Setup playbook executor, but don't run until run() called
        self.pbex = playbook_executor.PlaybookExecutor(
            playbooks=[playbook], 
            inventory=self.inventory, 
            variable_manager=self.variable_manager,
            loader=self.loader, 
            options=self.options, 
            passwords=passwords)

    def run(self):
        # Results of PlaybookExecutor
        self.pbex.run()
        stats = self.pbex._tqm._stats

        # Test if success for record_logs
        run_success = True
        hosts = sorted(stats.processed.keys())
        for h in hosts:
            t = stats.summarize(h)
            if t['unreachable'] > 0 or t['failures'] > 0:
                run_success = False

        # Dirty hack to send callback to save logs with data we want
        # Note that function "record_logs" is one I created and put into
        # the playbook callback file
        self.pbex._tqm.send_callback(
            'record_logs', 
            user_id=self.run_data['user_id'], 
            success=run_success
        )

        # Remove created temporary files
        os.remove(self.hosts.name)

        return stats

We'll cover what all this is doing - in particular, what that huge, ugly Options class is doing there.

Some Assumptions

I have a directory called roles in the same directory as this script. However, you can set the roles path in your ansible.cfg file.

Playbooks are assumed to be in the same directory as this script as well. That's hard-coded above via pb_dir = os.path.dirname(__file__).

Lastly, as noted, when this code is run, ensure the ANSIBLE_CONFIG environment variable is set with the full path to the ansible.cfg file.

Now let's start from the top to cover the file.

Imports

We import the Ansible objects that we'll use to run everything. The only non-Ansible imports are os (to get environment variables and the current file path of this script) and NamedTemporaryFile, useful for generating files with dynamic content for when Ansible expects a file.

Note that Ansible 2 refactored a lot of code, so responsibilities like data loading and variable handling each live in a single class (DataLoader and VariableManager), making things like variable loading precedence much more consistent.

Options

We made a monster Options class, which is basically a glorified dict. During a regular CLI call to Ansible, an option parser gets available options (via optparse). These are options like "ask vault password", "become user", "limit hosts" and other common options we would pass to Ansible.

Since we're not calling it via CLI, we need something to provide options. In its place, we provide an object with the same properties, painfully taken from the GitHub code and some experimentation.

Note that this is similar to how Ansible itself runs Ansible programmatically in their Python API docs. They use options = Options(connection='local', module_path='/path/to/mymodules', forks=100, ... ).

This sets almost all the options (and more!) that you might pass a CLI call to ansible or ansible-playbook.

We'll later fill out some of these options (not all; some just need to exist, even if they have a None value) and pass our Options object to the PlaybookExecutor.
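If the long constructor bothers you, the same "every option exists and defaults to None" idea can be sketched more compactly. This is not taken from the Ansible code base. OPTION_NAMES is a name invented for this example, holding the same attribute names as the class above:

```python
class Options(object):
    """Compact stand-in for the parsed CLI options.

    Every name in OPTION_NAMES becomes an attribute that defaults to None.
    """

    OPTION_NAMES = (
        'verbosity', 'inventory', 'listhosts', 'subset', 'module_paths',
        'extra_vars', 'forks', 'ask_vault_pass', 'vault_password_files',
        'new_vault_password_file', 'output_file', 'tags', 'skip_tags',
        'one_line', 'tree', 'ask_sudo_pass', 'ask_su_pass', 'sudo',
        'sudo_user', 'become', 'become_method', 'become_user',
        'become_ask_pass', 'ask_pass', 'private_key_file', 'remote_user',
        'connection', 'timeout', 'ssh_common_args', 'sftp_extra_args',
        'scp_extra_args', 'ssh_extra_args', 'poll_interval', 'seconds',
        'check', 'syntax', 'diff', 'force_handlers', 'flush_cache',
        'listtasks', 'listtags', 'module_path',
    )

    def __init__(self, **overrides):
        # Reject typos early rather than silently creating a stray attribute
        unknown = set(overrides) - set(self.OPTION_NAMES)
        if unknown:
            raise TypeError("unknown option(s): %s" % ", ".join(sorted(unknown)))
        for name in self.OPTION_NAMES:
            setattr(self, name, overrides.get(name))


options = Options(connection='ssh', become=True,
                  become_method='sudo', become_user='root')
```

The trade-off is that the accepted names no longer show up in the function signature, so you lose a bit of editor help in exchange for far fewer lines.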

Runner

Here's the magic - I made a Runner class, responsible for collecting needed data and running the Ansible Playbook executor.

The runner needs a few bits of information, which of course you can customize to your needs. As noted, it uses an instance of the Options object and sets the important bits to it, such as the become options, verbosity, private_key_file location and more.

In this case, we can pass our desired verbosity to the Display object, which will set how much is output to stdout when we run this.

We create a DataLoader instance, which will load in YAML/JSON data from our roles. This gets passed a Vault password as well, in case you are encrypting any variable data using Ansible Vault. Note we have a second environmental variable, VAULT_PASS. You may want to pass that in instead of using an environmental variable - whatever works for you.
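One thing to watch with os.environ['VAULT_PASS'] is that it raises a bare KeyError when the variable is missing. A small sketch of a friendlier lookup follows. The helper require_env is hypothetical and not part of Ansible:

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of a required environment variable.

    Raises RuntimeError with a readable message when the variable
    is unset or empty.
    """
    value = environ.get(name)
    if not value:
        raise RuntimeError("Environment variable %s must be set" % name)
    return value


# In the Runner this could replace the direct lookup:
# self.loader.set_vault_password(require_env('VAULT_PASS'))
```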

Then the script creates a VariableManager object, which is responsible for adding in all variables from the various sources, and keeping variable precedence consistent.

We pass in any extra_vars to this object. This is the main way in which I've chosen to add in variable data for the roles to use. It skips using Jinja2 or other methods of passing in host variables, although those methods are also available to you. Since our use case was to have a lot of custom data per customer, this method of passing variables to Ansible made sense to us.

After that, we create a NamedTemporaryFile and create a small hosts file entry (I'm assuming one host at a time, you don't have to!). I avoided using Jinja2 there, but you could easily do that just like in the previous article on running Ansible 1 programmatically.
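For more than one host, the temporary inventory could be written along these lines. It's a sketch only: write_inventory is a name made up here, and it assumes the simple INI-style layout used in the Runner (one group header followed by one host per line):

```python
from tempfile import NamedTemporaryFile

try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3


def write_inventory(hostnames, group='run_hosts'):
    """Write an INI-style inventory to a temp file and return its path.

    Accepts a single hostname or an iterable of hostnames. The caller
    is responsible for deleting the file afterwards.
    """
    if isinstance(hostnames, string_types):
        hostnames = [hostnames]
    lines = ['[%s]' % group] + list(hostnames)
    tmp = NamedTemporaryFile(mode='w', delete=False)
    try:
        tmp.write('\n'.join(lines) + '\n')
    finally:
        tmp.close()
    return tmp.name
```

The returned path is what would be handed to the inventory as host_list, and it should be removed with os.remove once the run is finished, just like the Runner does.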

Next, we create an Inventory object and pass it the items it needs.

Finally we create an instance of PlaybookExecutor with all our objects. That's then ready to run!

The actual execution of the playbook is in a run method, so we can call it when we need to. The __init__ method just sets everything up for us.

This should run your roles against your hosts! It will still output the usual data to Stderr/Stdout.
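The success check inside run() can be pulled out into a small function, which makes it easy to try without a real playbook run. In this sketch, FakeStats is a hypothetical stand-in that only mimics the two things the check uses (processed and summarize); it is not an Ansible class:

```python
def run_succeeded(stats):
    """Return True when no processed host was unreachable or had failures."""
    for host in sorted(stats.processed.keys()):
        summary = stats.summarize(host)
        if summary['unreachable'] > 0 or summary['failures'] > 0:
            return False
    return True


class FakeStats(object):
    """Hypothetical stand-in for the stats object, for experimenting."""

    def __init__(self, summaries):
        self.processed = dict((host, 1) for host in summaries)
        self._summaries = summaries

    def summarize(self, host):
        return self._summaries[host]


good = FakeStats({'192.168.10.233': {'ok': 5, 'changed': 2, 'unreachable': 0,
                                     'skipped': 1, 'failures': 0}})
bad = FakeStats({'192.168.10.233': {'ok': 1, 'changed': 0, 'unreachable': 1,
                                    'skipped': 0, 'failures': 0}})
```

Here run_succeeded(good) is True and run_succeeded(bad) is False, which is the same value the Runner passes along as success.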

Callback Module

I needed a callback module to log Ansible runs to a database. Here's how to do that!

First, the callback module's path is set in the ansible.cfg file. The following callback module is in that defined directory location.

Second, a note: You may have noticed that on the bottom of the Runner object, we reach deep into the PlaybookExecutor's Task Queue Manager object and tell it to send a (custom) callback. This object is meant to be a private property of the PlaybookExecutor, but Python's "We're all adults here" philosophy makes adding a custom callback possible.
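To see why a custom method name can work at all, here is a toy model of name-based callback dispatch. It is a simplification for illustration and not Ansible's actual implementation. TinyDispatcher, DemoPlugin and OtherPlugin are invented names, and unlike the real thing this version returns the results so they are easy to inspect:

```python
class TinyDispatcher(object):
    """Toy model of name-based callback dispatch (not Ansible's real code)."""

    def __init__(self, plugins):
        self.plugins = plugins

    def send_callback(self, method_name, *args, **kwargs):
        results = []
        for plugin in self.plugins:
            # Look the method up by name; plugins without it are skipped
            method = getattr(plugin, method_name, None)
            if callable(method):
                results.append(method(*args, **kwargs))
        return results


class DemoPlugin(object):
    """Plugin that implements the custom callback."""

    def __init__(self):
        self.saved = []

    def record_logs(self, user_id, success=False):
        self.saved.append((user_id, success))
        return True


class OtherPlugin(object):
    """Plugin that knows nothing about record_logs."""


demo = DemoPlugin()
dispatcher = TinyDispatcher([demo, OtherPlugin()])
results = dispatcher.send_callback('record_logs', user_id=12345, success=True)
```

Because the lookup is by name, any plugin that happens to define record_logs gets called, and plugins that don't are simply ignored.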

Here's the callback module:

from datetime import datetime
from ansible.plugins.callback import CallbackBase

from some_project.storage import Logs # A custom object to store to the database

class PlayLogger:
    """Store log output in a single object.
    We create a new object per Ansible run
    """
    def __init__(self):
        self.log = ''
        self.runtime = 0

    def append(self, log_line):
        """append to log"""
        self.log += log_line+"\n\n"

    def banner(self, msg):
        """Output Trailing Stars"""
        width = 78 - len(msg)
        if width < 3:
            width = 3
        filler = "*" * width
        return "\n%s %s " % (msg, filler)

class CallbackModule(CallbackBase):
    """
    Reference: https://github.com/ansible/ansible/blob/v2.0.0.2-1/lib/ansible/plugins/callback/default.py
    """

    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'stored'
    CALLBACK_NAME = 'database'

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.logger = PlayLogger()
        self.start_time = datetime.now()

    def v2_runner_on_failed(self, result, ignore_errors=False):
        delegated_vars = result._result.get('_ansible_delegated_vars', None)

        # Catch an exception
        # This may never be called because default handler deletes
        # the exception, since Ansible thinks it knows better
        if 'exception' in result._result:
            # Extract the error message and log it
            error = result._result['exception'].strip().split('\n')[-1]
            self.logger.append(error)

            # Remove the exception from the result so it's not shown every time
            del result._result['exception']

        # Else log the reason for the failure
        if result._task.loop and 'results' in result._result:
            self._process_items(result)  # item_on_failed, item_on_skipped, item_on_ok
        else:
            if delegated_vars:
                self.logger.append("fatal: [%s -> %s]: FAILED! => %s" % (result._host.get_name(), delegated_vars['ansible_host'], self._dump_results(result._result)))
            else:
                self.logger.append("fatal: [%s]: FAILED! => %s" % (result._host.get_name(), self._dump_results(result._result)))

    def v2_runner_on_ok(self, result):
        self._clean_results(result._result, result._task.action)
        delegated_vars = result._result.get('_ansible_delegated_vars', None)
        if result._task.action == 'include':
            return
        elif result._result.get('changed', False):
            if delegated_vars:
                msg = "changed: [%s -> %s]" % (result._host.get_name(), delegated_vars['ansible_host'])
            else:
                msg = "changed: [%s]" % result._host.get_name()
        else:
            if delegated_vars:
                msg = "ok: [%s -> %s]" % (result._host.get_name(), delegated_vars['ansible_host'])
            else:
                msg = "ok: [%s]" % result._host.get_name()

        if result._task.loop and 'results' in result._result:
            self._process_items(result)  # item_on_failed, item_on_skipped, item_on_ok
        else:
            self.logger.append(msg)

    def v2_runner_on_skipped(self, result):
        if result._task.loop and 'results' in result._result:
            self._process_items(result)  # item_on_failed, item_on_skipped, item_on_ok
        else:
            msg = "skipping: [%s]" % result._host.get_name()
            self.logger.append(msg)

    def v2_runner_on_unreachable(self, result):
        delegated_vars = result._result.get('_ansible_delegated_vars', None)
        if delegated_vars:
            self.logger.append("fatal: [%s -> %s]: UNREACHABLE! => %s" % (result._host.get_name(), delegated_vars['ansible_host'], self._dump_results(result._result)))
        else:
            self.logger.append("fatal: [%s]: UNREACHABLE! => %s" % (result._host.get_name(), self._dump_results(result._result)))

    def v2_runner_on_no_hosts(self, task):
        self.logger.append("skipping: no hosts matched")

    def v2_playbook_on_task_start(self, task, is_conditional):
        self.logger.append("TASK [%s]" % task.get_name().strip())

    def v2_playbook_on_play_start(self, play):
        name = play.get_name().strip()
        if not name:
            msg = "PLAY"
        else:
            msg = "PLAY [%s]" % name

        self.logger.append(msg)

    def v2_playbook_item_on_ok(self, result):
        delegated_vars = result._result.get('_ansible_delegated_vars', None)
        if result._task.action == 'include':
            return
        elif result._result.get('changed', False):
            if delegated_vars:
                msg = "changed: [%s -> %s]" % (result._host.get_name(), delegated_vars['ansible_host'])
            else:
                msg = "changed: [%s]" % result._host.get_name()
        else:
            if delegated_vars:
                msg = "ok: [%s -> %s]" % (result._host.get_name(), delegated_vars['ansible_host'])
            else:
                msg = "ok: [%s]" % result._host.get_name()

        msg += " => (item=%s)" % (result._result['item'])

        self.logger.append(msg)

    def v2_playbook_item_on_failed(self, result):
        delegated_vars = result._result.get('_ansible_delegated_vars', None)
        if 'exception' in result._result:
            # Extract the error message and log it
            error = result._result['exception'].strip().split('\n')[-1]
            self.logger.append(error)

            # Remove the exception from the result so it's not shown every time
            del result._result['exception']

        if delegated_vars:
            self.logger.append("failed: [%s -> %s] => (item=%s) => %s" % (result._host.get_name(), delegated_vars['ansible_host'], result._result['item'], self._dump_results(result._result)))
        else:
            self.logger.append("failed: [%s] => (item=%s) => %s" % (result._host.get_name(), result._result['item'], self._dump_results(result._result)))

    def v2_playbook_item_on_skipped(self, result):
        msg = "skipping: [%s] => (item=%s) " % (result._host.get_name(), result._result['item'])
        self.logger.append(msg)

    def v2_playbook_on_stats(self, stats):
        run_time = datetime.now() - self.start_time
        self.logger.runtime = run_time.seconds  # returns an int, unlike run_time.total_seconds()

        hosts = sorted(stats.processed.keys())
        for h in hosts:
            t = stats.summarize(h)

            msg = "PLAY RECAP [%s] : %s %s %s %s %s" % (
                h,
                "ok: %s" % (t['ok']),
                "changed: %s" % (t['changed']),
                "unreachable: %s" % (t['unreachable']),
                "skipped: %s" % (t['skipped']),
                "failed: %s" % (t['failures']),
            )

            self.logger.append(msg)

    def record_logs(self, user_id, success=False):
        """
        Special callback added to this callback plugin
        Called by Runner object
        :param user_id:
        :return:
        """

        log_storage = Logs()
        return log_storage.save_log(user_id, self.logger.log, self.logger.runtime, success)

One thing not shown here is the some_project.storage.Logs object, which has some boilerplate for saving log output to a database.
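To make the shape of that object concrete, here is a purely hypothetical stand-in built on SQLite. The real Logs class is not shown in this article, so the table name, column names and db_path default below are all assumptions. Only the save_log(user_id, log, runtime, success) call matches what the callback module uses:

```python
import sqlite3


class Logs(object):
    """Hypothetical stand-in for some_project.storage.Logs (SQLite-backed)."""

    def __init__(self, db_path='ansible_runs.db'):
        self.conn = sqlite3.connect(db_path)
        # Assumed schema: one row per Ansible run
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS run_logs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id INTEGER NOT NULL, "
            "log TEXT NOT NULL, "
            "runtime_seconds INTEGER NOT NULL, "
            "success INTEGER NOT NULL)"
        )
        self.conn.commit()

    def save_log(self, user_id, log, runtime, success):
        """Insert one run and return the new row id."""
        cursor = self.conn.execute(
            "INSERT INTO run_logs (user_id, log, runtime_seconds, success) "
            "VALUES (?, ?, ?, ?)",
            (user_id, log, runtime, int(bool(success))),
        )
        self.conn.commit()
        return cursor.lastrowid
```

Passing ':memory:' as db_path gives a throwaway database, which is handy for trying the callback without touching a real one.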

We have a PlayLogger object that does two things:

  1. Concatenates log output string together
  2. Times how long the Ansible run takes

Then we have the callback module object, which is basically copy-and-paste boilerplate from the default callback module with a few tweaks. In particular, I edit how Exceptions are handled (ignoring verbosity settings) and remove calls to "Display", since we're saving output to a log string rather than outputting data to Stderr/Stdout.

The most interesting part here is the added method record_logs. This is the custom callback method we call from the Runner object. It Just Works™ and is amazing! In that method, we collect a user_id to give this Ansible run context (that's specific to our use case and we pass it a bunch more information in reality, including the ID of the server it was run on).

Running It

Here's how to use it, assuming we have a playbook called run.yaml and the Runner class lives in task.py.

from task import Runner

# You may want this to run as user root instead
# or make this an environmental variable, or
# a CLI prompt. Whatever you want!
become_user_password = 'foo-whatever'
run_data = {
    'user_id': 12345,
    'foo': 'bar',
    'baz': 'cux-or-whatever-this-one-is'
}

runner = Runner(
    hostnames='192.168.10.233',
    playbook='run.yaml',
    private_key_file='/home/user/.ssh/id_whatever',
    run_data=run_data,
    become_pass=become_user_password,
    verbosity=0
)

stats = runner.run()

# Maybe do something with stats here? If you want!

That's it! You can run Ansible 2 programmatically now!
