At my new(ish) job, I’ve taken on a DevOps role in addition to development. We’ve been working with the excellent configuration management tool Ansible to configure and maintain our servers.
One of the specific tasks I was working on was to set up backup and replication from our primary database server to a remote recovery server. This required setting up known_hosts and authorized keys on each server so they could talk to each other over ssh without a password, in either direction.
A lot of the other documentation and tutorials on Ansible didn’t really have a great way to do this. The most common approach I saw was people putting the private keys for servers into ansible-vault, which required pre-generating a key for each server locally, or using a shared key for all servers. I figured there had to be a way to generate the private keys on the servers themselves, and then copy the public keys between them.
I did manage to create a solution I’m pretty happy with, though it required thinking about Ansible a little differently than a lot of the tutorials and documentation encourage. For the most part, Ansible best practices encourage you to think about your configuration goals in terms of roles. Playbooks are almost always shown as a simple composition of roles, and rarely have a significant number of tasks of their own. This is fine when you’re thinking about configuration tasks that impact a single server in isolation, but doesn’t work so well when you’re working with tasks that have to touch multiple servers, like copying public keys between two boxes.
Fortunately, there are a number of other ways to organize Ansible’s work besides the typical example of one file, one play, multiple roles. For starters, you can actually have multiple plays in a single yaml file. When you run ansible-playbook against a file like that, it will run each play successively. That allowed me to do something like this:
- Play 1: Gather information about all the servers
- Play 2: Generate ssh keys on the first group of servers and then fetch the public key for each
- Play 3: Generate ssh keys on the second group of servers and then fetch the public key for each
- Play 4: Add the fetched public keys to the first group of servers
- Play 5: Add the fetched public keys to the second group of servers
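As a minimal sketch of the multi-play structure itself (the group and message names here are just placeholders, not part of the real playbook):

```yaml
# Two plays in one file; ansible-playbook runs them in order,
# completing the first play on all of its hosts before starting the second.
- name: first play
  hosts: group-a
  tasks:
    - debug: msg="runs against group-a first"

- name: second play
  hosts: group-b
  tasks:
    - debug: msg="runs against group-b after the first play finishes"
```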
This has a couple of key advantages over the other methods I’ve seen.
1. Each server gets a unique private key, for better control of your infrastructure. Single servers can be removed from the trust relationship without affecting any others.
2. Each unique private key never leaves the server it was generated on. There’s no risk of leaking a key in source control, or storing it in an insecure location.
We’ll take a look at each of those steps in isolation and then look at the playbook as a whole.
Play 1
Play 1 is a simple dummy play that ensures we gather facts about all the servers involved first. That way we have everything we need for known_hosts entries. It targets an inventory group named db, which should contain every database server involved in the trust relationships you are trying to set up. The only task it has will never run.
```yaml
- hosts: db
  name: gather facts about all dbs
  tasks:
    - fail: msg=""
      when: false
```
Play 2
Play 2 is where I create the keys for all our replica database servers. We create the keys through a standard user task. The magic, though, is that we subsequently use the fetch module to download a copy of each public key and store it in a temporary directory. This will be important later.
The second important piece is that we add the primary as a known host. There are two steps to this. First we use ssh-keygen to check if the primary is already a known host. If it’s not, we use ssh-keyscan to read the key and add it to the known_hosts file. In my case, we’re dealing with a single server representing the primary, so I can store that server in the variable postgresql_primary. However, if you’re working with multiple machines with dynamic IP addresses, you can also use a loop with the built-in groups dictionary to loop through a group containing all the servers that need to be added as known hosts.
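For that multi-server case, a rough sketch of the loop variant might look like the following. The group name `primary-dbs` is hypothetical, and this assumes the first play has already gathered facts so `hostvars` is populated:

```yaml
# Assumed sketch: scan every host in the (hypothetical) primary-dbs group.
# hostvars[item].ansible_default_ipv4.address comes from gathered facts.
- name: Make each primary a known host
  shell: ssh-keyscan -H {{ hostvars[item].ansible_default_ipv4.address }} >> ~/.ssh/known_hosts
  with_items: groups['primary-dbs']
```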
```yaml
- name: create keys for replicas
  hosts: replicas
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root
    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/replicas/{{ ansible_hostname }}/id_rsa.tmp flat=yes
      changed_when: False
    - name: check if primary is already a known host
      shell: ssh-keygen -H -F {{ postgresql_primary }}
      register: know_host
      ignore_errors: true
      changed_when: False
    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory
    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch
    - name: Make primary a known host
      shell: ssh-keyscan -H {{ postgresql_primary }} >> ~/.ssh/known_hosts
      when: know_host|failed
```
Play 3
Play 3 is almost identical, but now we’re creating keys for our primary database server. The process is the same, though: create the keys, fetch them to a temporary folder, and add the other servers as known hosts.
```yaml
- name: create keys for primary
  hosts: primary-db
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root
    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/primary/{{ ansible_hostname }}/id_rsa.tmp flat=yes
      changed_when: False
    - name: check if backup is already a known host
      shell: ssh-keygen -H -F {{ postgresql_backup }}
      register: know_host
      ignore_errors: true
      changed_when: False
    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory
    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch
    - name: Make backup a known host
      shell: ssh-keyscan -H {{ postgresql_backup }} >> ~/.ssh/known_hosts
      when: know_host|failed
```
Play 4
Play 4 is where we start to use those saved public keys. On the primary database we make sure our trusted user exists, then we loop through the local temporary directory and add all those keys as authorized keys for that user. Finally, we delete the local copies of the public keys.
```yaml
- name: add keys to primary
  hosts: primary-db
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: root
  tasks:
    - name: Create postgres user
      user: name=postgres group=postgres
    - name: add keys
      authorized_key: user=postgres key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/replicas/*/id_rsa.tmp
    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/replicas state=absent
      changed_when: False
      sudo: no
```
Play 5
Play 5 is just the reverse of play 4. It copies the other set of keys from your local folders up to the server as authorized keys.
```yaml
- name: add keys to backup
  hosts: backup-dbs
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: root
  tasks:
    - name: ensure replication user
      user: name=replication group=replication
    - name: add keys
      authorized_key: user=replication key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/primary/*/id_rsa.tmp
    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/primary state=absent
      changed_when: False
      sudo: no
```
Wrapping up
Here’s the full playbook, written out:
```yaml
- hosts: db
  name: gather facts about all dbs
  tasks:
    - fail: msg=""
      when: false

- name: create keys for replicas
  hosts: replicas
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root
    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/replicas/{{ ansible_hostname }}/id_rsa.tmp flat=yes
      changed_when: False
    - name: check if primary is already a known host
      shell: ssh-keygen -H -F {{ postgresql_primary }}
      register: know_host
      ignore_errors: true
      changed_when: False
    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory
    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch
    - name: Make primary a known host
      shell: ssh-keyscan -H {{ postgresql_primary }} >> ~/.ssh/known_hosts
      when: know_host|failed

- name: create keys for primary
  hosts: primary-db
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres
  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root
    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/primary/{{ ansible_hostname }}/id_rsa.tmp flat=yes
      changed_when: False
    - name: check if backup is already a known host
      shell: ssh-keygen -H -F {{ postgresql_backup }}
      register: know_host
      ignore_errors: true
      changed_when: False
    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory
    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch
    - name: Make backup a known host
      shell: ssh-keyscan -H {{ postgresql_backup }} >> ~/.ssh/known_hosts
      when: know_host|failed

- name: add keys to primary
  hosts: primary-db
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: root
  tasks:
    - name: Create postgres user
      user: name=postgres group=postgres
    - name: add keys
      authorized_key: user=postgres key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/replicas/*/id_rsa.tmp
    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/replicas state=absent
      changed_when: False
      sudo: no

- name: add keys to backup
  hosts: backup-dbs
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: root
  tasks:
    - name: ensure replication user
      user: name=replication group=replication
    - name: add keys
      authorized_key: user=replication key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/primary/*/id_rsa.tmp
    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/primary state=absent
      changed_when: False
      sudo: no
```
Save this in a yml file and you can run it with ansible-playbook. You can also include it in other playbooks where you need to ensure both groups of servers can ssh between each other before your configuration really begins.
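For example, if you save the playbook as ssh-trust.yml (a name picked just for this example) next to an inventory file named hosts:

```
# Run all five plays in order against the inventory
ansible-playbook -i hosts ssh-trust.yml
```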