Infrastructure@Home: Using Ansible to Install the Service Monitoring Software Consul

This is the 4th article in my series about infrastructure monitoring. Let’s recap briefly. I discussed the motivation for automating my infrastructure and defined sets of requirements, including infrastructure management, application management and service discovery. The first two articles explained my infrastructure, the steps to install the basic OS, and the use of Ansible for system management and package installation.

This article is a tutorial about writing a complex Ansible role that installs the service monitoring and discovery software HashiCorp Consul. Consul will fulfill the following application management requirement:

  • AM4: An interface in which I see the status of each application (health, exposed endpoints)

This article originally appeared at my blog.

Ansible Roles

Roles that should be executed on hosts are named inside a playbook. For example, if we want to apply the role new_role to all hosts, we can define the playbook as follows [2].

- name: Install new role
  hosts: all
  roles:
    - new_role
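
To illustrate the execution order that the footnote describes, here is a minimal sketch of a play that combines pre_tasks, roles, and tasks. The two debug tasks and their messages are hypothetical and only serve to show when each section runs.

```yaml
# Sketch only: the debug tasks are illustrative, not part of the original playbook.
- name: Install new role
  hosts: all
  pre_tasks:
    # Runs before any role listed below
    - name: Print hostname before the role
      ansible.builtin.debug:
        msg: "About to configure {{ inventory_hostname }}"
  roles:
    - new_role
  tasks:
    # Runs after all roles have finished
    - name: Print hostname after the role
      ansible.builtin.debug:
        msg: "Finished configuring {{ inventory_hostname }}"
```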

Roles have a basic directory structure. By executing ansible-galaxy role init new_role, you will get the following directory layout.

new_role
├── README.md
├── defaults
├── files
├── handlers
├── meta
├── tasks
├── templates
├── tests
└── vars

What goes into these directories? The tasks and handlers directories contain the concrete tasks that your role executes. In files, you put static configuration files, and in templates, you put files that contain variables. The vars directory holds the dynamic values that your role needs, and defaults holds variables that other roles or playbooks might override. If you want to share your role on Ansible Galaxy, include the relevant information in meta.
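
As an illustration of the defaults directory, the variables that the Consul role in this article refers to could be declared as in the sketch below. The variable names are the ones used in the tasks and templates later on. Every value is an assumed placeholder, not the configuration actually used for this setup.

```yaml
# defaults/main.yml (sketch; all values are assumed placeholders)
consul_version: "1.7.1"                    # assumed; pick the release you want to install
consul_base_directory: /opt/consul         # assumed install and data path
consul_datacenter: infrastructure_at_home  # datacenter name as seen in the agent output
consul_main_server_ip: 192.168.2.202       # assumed; address of one server node
consul_role: client                        # server hosts would override this with 'server'
```

Because these live in defaults, an inventory or playbook can override any of them per host or group.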

Now let’s apply this knowledge to write an Ansible role that installs Consul on all our nodes.

Consul

Install Consul with a Playbook

  • Determine the correct Linux distribution and hardware
  • Create consul group, user and directories
  • Download and link the consul binary
  • Copy configuration files
  • Enable Consul with systemd

All these steps will be fully automated with Ansible!

Step 1 — Determine the correct Linux Distribution and Hardware

Here is the corresponding Ansible task. The consul version is configured as a role variable. Then, for each host, I define the src file name and the src url to use them in later tasks.

- name: Set vars when architecture is armv7l
  set_fact:
    consul_src_file: "consul_{{ consul_version }}_linux_armhfv6.zip"
    consul_src_url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_armhfv6.zip"
    consul_src_hash: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_SHA256SUMS"
  when: ansible_facts.architecture == 'armv7l'
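
The same pattern extends to other hardware. As a sketch, assuming some hosts report the architecture x86_64, a second task could select the amd64 archive. This task is not part of the role shown in this article.

```yaml
# Sketch: equivalent facts for 64-bit Intel/AMD hosts (assumed addition)
- name: Set vars when architecture is x86_64
  set_fact:
    consul_src_file: "consul_{{ consul_version }}_linux_amd64.zip"
    consul_src_url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
    consul_src_hash: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_SHA256SUMS"
  when: ansible_facts.architecture == 'x86_64'
```

Only one of the conditional tasks matches on a given host, so later tasks can use the three facts without caring about the architecture.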

Step 2 — Create Consul Group, User and Directories

- name: Create consul group
  group:
    name: consul
    state: present
    system: true

- name: Create consul user
  user:
    name: consul
    group: consul
    shell: /bin/false
    home: /etc/consul/
    state: present

- name: Create consul dir
  file:
    state: directory
    path: "{{ consul_base_directory }}"
    owner: consul
    mode: "0755"

Step 3 — Download and Link the Consul Binary

- name: Get consul binary
  get_url:
    url: "{{ hostvars[inventory_hostname].consul_src_url }}"
    dest: /tmp
    checksum: "sha256:{{ hostvars[inventory_hostname].consul_src_hash }}"

- name: Unzip consul binary
  unarchive:
    src: "/tmp/{{ hostvars[inventory_hostname].consul_src_file }}"
    dest: "{{ consul_base_directory }}"
    remote_src: true
    mode: "0744"

- name: Create symlink
  file:
    src: "{{ consul_base_directory }}/consul"
    dest: /usr/local/bin/consul
    state: link

Step 4 — Copy Configuration Files

{
  "datacenter": "{{ consul_datacenter }}",
  "data_dir": "{{ consul_base_directory }}",
  "log_level": "INFO",
  "enable_local_script_checks": true,
  "enable_syslog": true,
  "node_name": "{{ ansible_facts.hostname }}",
  "retry_join": ["{{ consul_main_server_ip }}"]
}

The template above holds the basic configuration. Consul reads all configuration files from a directory in alphabetical order. We will copy the basic configuration config.json to all nodes, and additionally server.json to the server nodes.

- name: Copy consul config
  template:
    src: config.json.j2
    dest: "{{ consul_base_directory }}/config.json"
    mode: "0644"
  notify: Restart consul

- name: Copy consul server config
  template:
    src: server.json.j2
    dest: "{{ consul_base_directory }}/server.json"
    mode: "0644"
  when: consul_role == 'server'
  notify: Restart consul
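
The condition on consul_role requires that each host knows its role. One way to provide it, sketched here as an assumption, is a YAML inventory that sets the variable per group. The group names consul_servers and consul_clients are hypothetical, and the host names follow the cluster members listed later in this article.

```yaml
# inventory.yml (sketch; group names are hypothetical)
all:
  children:
    consul_servers:
      hosts:
        raspi-3-2:
        raspi-4-1:
        raspi-4-2:
      vars:
        consul_role: server
    consul_clients:
      hosts:
        raspi-3-1:
      vars:
        consul_role: client
```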

Step 5 — Enable Consul with Systemd

- name: Copy consul service file
  template:
    src: consul.service.j2
    dest: /etc/systemd/system/consul.service
    mode: "0644"
  notify: Restart consul

- name: Set file permissions
  file:
    path: "{{ consul_base_directory }}"
    owner: consul
    group: consul
    recurse: true

- name: Enable consul service
  systemd:
    service: consul.service
    enabled: true
  notify: Restart consul
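
Several tasks above notify a handler named Restart consul, which is not shown in this article. A minimal sketch of such a handler, assuming it lives in the role's handlers/main.yml, could look like this:

```yaml
# handlers/main.yml (sketch; the original handler is not shown)
- name: Restart consul
  systemd:
    name: consul.service
    state: restarted
    daemon_reload: true  # reread unit files in case consul.service changed
```

Handlers run once at the end of the play, so Consul restarts a single time even if several tasks reported a change.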

Running the Playbook

Consul will be automatically installed on all nodes. The very first time I ran this playbook, I understood the power of Ansible to apply complex installation steps to an arbitrary number of nodes! My infrastructure only consists of 5 nodes, but the playbook could configure 50 nodes too.
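
For reference, a playbook that applies the role could be as short as the sketch below. The file name consul.yml and the role name consul are assumptions, and become is enabled on the assumption that the tasks need root privileges to write to /etc/systemd/system and /usr/local/bin.

```yaml
# consul.yml (hypothetical file name; the role name is also assumed)
- name: Install Consul
  hosts: all
  become: true  # assumed: tasks write to system directories
  roles:
    - consul
```

It would then be started with ansible-playbook, passing the inventory file with the -i option.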

Using Consul

Consul offers a rich set of command line commands. For now, I will only show how nodes can connect to each other to form the cluster.

Let’s manually start a Consul agent on the command line. We see important properties like the node name, datacenter, and IP address.

$ consul agent
==> Starting Consul agent…
    Version: 'v1.5.2'
    Node ID: '094163f6-9215-6f2c-c930-9e84600029da'
    Node name: 'raspi-3-1+'
    Datacenter: 'infrastructure_at_home' (Segment: '<all>')
    Server: true (Bootstrap: false)
    Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
    Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-E

Another command shows all members of the same cluster.

$ consul members
Node       Address             Status  Type    Build  Protocol  DC                      Segment
raspi-3-2  192.168.2.202:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-4-1  192.168.2.203:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-4-2  192.168.2.204:8301  alive   server  1.7.1  2         infrastructure_at_home  <all>
raspi-3-1  192.168.2.201:8301  alive   client  1.7.1  2         infrastructure_at_home  <default>

Because of the config file that we used, all nodes are already connected. Nodes can join a cluster manually by using consul join <ipaddress>.
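
The membership check can also be run through Ansible. The following two tasks are a sketch and not part of the role described above. They run consul members on a host and print the result.

```yaml
# Sketch: verify cluster membership from Ansible (assumed addition)
- name: List cluster members
  command: consul members
  register: consul_members
  changed_when: false  # read-only command, never reports a change

- name: Show cluster members
  debug:
    var: consul_members.stdout_lines
```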

Finally, Consul also provides a Web UI that shows the nodes in the cluster, their health status, and the services running on them.

Conclusion

Consul provides the management view of all nodes in my cluster. So far, we have barely scratched the surface of what is possible. In the next post, we will continue to set up the infrastructure by installing Docker and HashiCorp Nomad, a job scheduler.

Footnotes

[2]: Important note: Roles are executed before any other tasks in your playbook! So, if a play lists both roles and tasks, the tasks run after the roles. If you need tasks to run beforehand, use the pre_tasks declaration.
