SSH key management

Background

I was asked to comment on the following:

Our software is built and maintained by about 20 engineers, and runs in production on dozens of servers in a remote data center. One problem we frequently face is identity and access management on these servers – our engineers occasionally need to directly access the servers (via SSH) to debug an issue or perform maintenance, but often find that their key is not installed where they need it. Additionally, any time an engineer leaves the company we need to scramble to make sure their key is removed from all servers, and that they didn’t create any backdoor access channels for themselves that would present a security risk.

Describe how you would solve this problem. If you think outside tools are appropriate, explain which ones and why. If you think this problem is better solved by rolling something yourself, include as much detail as possible. Keep in mind that we are a small startup with limited time and money – so we place a lot of value on solutions that are cheap, simple, and easy-to-implement.

The best solutions to this problem will be thorough, thoughtful, and well presented. Use this as an opportunity to showcase your areas of expertise, like scripting, security, networking, etc.

Outline of problem

Key management is an issue whenever access to servers must be controlled. Keys must be added when new employees come on board. Old keys must be removed when employees leave. Keys must be updated when someone forgets a passphrase.

Proposed solution

We will not allow individual users to control their own authorized_keys file. Instead, we will use the AuthorizedKeysFile option in sshd_config to point sshd at root-owned files under the /etc/ssh/authorized_keys directory. This prevents users from adding or changing their own ssh keys, and it prevents an intruder who compromises a non-root account from adding a key of their own. This approach centralizes the control and location of all ssh keys using standard sshd configuration. We can also put the authorized_keys files under version control.

In addition to this, we will use an automation tool, such as Ansible, to manage the servers. It will be able to:

add authorized_keys files for new users
disable existing users
maintain authorized_keys file for existing users

Although Ansible is suggested here, other tools such as Puppet, Chef, or Salt can do the same thing; Ansible is just one example.
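All of these tools work the same basic way: compare the desired state with what is actually on a server, and change only the difference. As a rough illustration, here is a Python sketch of that comparison for per-user key files. It assumes both sides are already loaded as a dict of username to file contents; plan_changes is a hypothetical helper, not part of Ansible.

```python
from typing import Dict, List, Tuple


def plan_changes(desired: Dict[str, str], current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare desired key files (user -> contents) with what a server has.

    Returns (action, user) pairs:
      "add"    - the user has no key file on the server yet
      "update" - the server's file differs from the desired contents
      "stale"  - the server has a file for a user we do not manage
    """
    actions = []
    for user in sorted(desired):
        if user not in current:
            actions.append(("add", user))
        elif current[user] != desired[user]:
            actions.append(("update", user))
    # Files nobody asked for are worth investigating (possible backdoor).
    for user in sorted(set(current) - set(desired)):
        actions.append(("stale", user))
    return actions
```

Disabling a user (emptying their file) shows up as an update, while a file that exists on a server but not in the desired set is flagged as stale.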

Assumptions

Here are a few assumptions for this proposal:

each user has their own login
creation of new users on the server is outside scope, but can also be scripted via Ansible
deleting users from the system is outside scope
each user uses sudo
each user is a member of a specified group
root login is disabled
PasswordAuthentication in sshd_config is set to No
all users have access to all servers (the solution described can be expanded to have server-specific or server-group scenarios)

sudo will be configured so that any user in the group can execute any command. In this case, the group is wheel:

%wheel ALL=(ALL) ALL

This will enable any upgrades they need to perform. We’ll assume our users are trusted enough to behave nicely with sudo. If that is not the case, we can lock that down with the sudoers file, but that seems to be outside the scope of this project.
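Because every member of wheel gets full root access through that sudoers line, the wheel member list is effectively the list of root-equivalent accounts, and it is worth checking during audits. A minimal Python sketch that reads it from /etc/group-formatted text; group_members is a hypothetical helper name.

```python
from typing import List


def group_members(group_text: str, group_name: str = "wheel") -> List[str]:
    """Return the supplementary members of group_name.

    Expects /etc/group format: name:password:gid:member1,member2,...
    Users whose *primary* group is wheel (gid field in /etc/passwd) are
    not listed in /etc/group and need a separate check.
    """
    for line in group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# $FreeBSD$"
        fields = line.split(":")
        if len(fields) == 4 and fields[0] == group_name:
            return [m for m in fields[3].split(",") if m]
    return []
```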

In the context of this post, we will deal only with public ssh keys for users who need ssh access to our servers. We will not deal with server (host) keys, although the proposed solution can be extended to that task.

The proposed solution is based on tests and proofs of concept run on FreeBSD 9.2 servers with Ansible 1.5, but the results should transfer easily to any platform on which Ansible can run.

Why Ansible?

In late 2013 I was looking for a configuration management tool and did a proof of concept for my personal servers. I settled on Ansible after asking other sysadmins for recommendations and doing some research online. I had previous exposure to Puppet and CFEngine, so I tried Salt and Ansible based on recommendations from people I had worked with in the past. I tried Salt for a few days, then tried Ansible, and found Ansible much faster to get going with, for less effort.

I’ve found that Ansible is undergoing active development and has a thriving user community which maintains various playbooks upon which we can build.

I’m also basing much of what I present here on a post (managing sshd with Ansible) by Michael W Lucas, author of SSH Mastery. I have used similar techniques for deploying my own servers.

The post by Mr Lucas allows remote root login via ssh-key only. I prefer not to permit that, and use a slightly different approach described in the ansible user section.

Control machine requirements

From Ansible Installation:

The control machine (i.e. Ansible server) needs to have Python 2.6 installed.

Managed Node Requirements

Each managed node (i.e. our servers / Ansible clients) will need Python 2.4 or later (if using Python < 2.5, we also need python-simplejson).

ansible user

The Ansible configuration tool needs ssh access to each managed node. By its nature, this user needs root privileges; in our case, that will be achieved via sudo. The ansible user will log in via ssh key, and the passphrase for that private key must be protected and entrusted to only a few individuals. Access by the ansible user can be restricted to connections originating from a predetermined IP address (via the from= option in the authorized_keys file and/or Match blocks in sshd_config). The ansible user has unrestricted sudo access (which can be narrowed via the sudoers file).

This is the most vulnerable part of the solution, and appropriate consideration must be given to this point.
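The authorized_keys restriction mentioned above uses the from= option, which makes sshd accept the key only from matching client addresses. A small Python sketch that builds such a line; the address 192.0.2.10 and the key text in the example are placeholders, and restrict_key is a hypothetical helper.

```python
from typing import List


def restrict_key(public_key: str, allowed_sources: List[str]) -> str:
    """Prefix an authorized_keys line with a from="..." option.

    allowed_sources is a list of host or address patterns; sshd will only
    accept this key when the client address matches one of them.
    """
    if not allowed_sources:
        raise ValueError("at least one source pattern is required")
    patterns = ",".join(allowed_sources)
    if '"' in patterns:
        raise ValueError("patterns must not contain double quotes")
    return f'from="{patterns}" {public_key.strip()}'
```

For example, restrict_key("ssh-ed25519 AAAAexamplekey ansible@control", ["192.0.2.10"]) returns from="192.0.2.10" ssh-ed25519 AAAAexamplekey ansible@control, which is the line that would go into the ansible user's key file.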

sshd configuration

Here are a few of the key items from the sshd_config file which are central to this solution:

AuthorizedKeysFile /etc/ssh/authorized_keys/%u
PasswordAuthentication no
ChallengeResponseAuthentication no

Each user’s authorized keys are in a file named after the username, in the directory /etc/ssh/authorized_keys. Our sshd_config will direct sshd to look in that directory.
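With StrictModes on (the default), sshd refuses key files whose ownership or permissions look unsafe, so a file pushed with the wrong mode locks the user out (sshd logs the refusal). The sketch below is a rough Python approximation of those checks for a single file and its parent directory; the real sshd also walks further up the path, and key_file_path only expands the %u token used in our pattern. Both function names are hypothetical.

```python
import os
import stat
from typing import List


def key_file_path(user: str, pattern: str = "/etc/ssh/authorized_keys/%u") -> str:
    """Expand %u in our AuthorizedKeysFile pattern (other tokens not handled)."""
    return pattern.replace("%u", user)


def check_key_file(path: str, user_uid: int) -> List[str]:
    """Approximate StrictModes checks for one key file and its directory.

    Each should be owned by root (uid 0) or the user, and must not be
    writable by group or others. Returns a list of problems found.
    """
    if not os.path.isfile(path):
        return [f"{path}: missing or not a regular file"]
    problems = []
    for p in (path, os.path.dirname(path)):
        st = os.stat(p)
        if st.st_uid not in (0, user_uid):
            problems.append(f"{p}: owned by uid {st.st_uid}")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{p}: writable by group or others")
    return problems
```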

Simple examples

As mentioned above, these examples are based on managing sshd with Ansible.

I will skip over the installation of the required software and start with examples of how we can maintain users.

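Here are the users who can access our servers via ssh. We list them in /etc/ansible/group_vars/ssh_users. (Ansible applies a group_vars file only to hosts in the inventory group of the same name, so the hosts the playbook targets must also belong to an ssh_users group; otherwise, name the file after the target group or use group_vars/all.)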

---

#users who get SSH access to these machines
sshusers:
  - mwlucas
  - ansible
  - john
  - harry
  - mandy
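Before running the playbook, it is worth checking that sshusers and the key files on the control machine agree: a listed user without a file makes the copy task fail, and a file nobody lists is never pushed. A Python sketch, assuming the sshusers list has already been loaded (e.g. with a YAML parser, not shown) and keys_dir is the per-user key directory on the ansible server; preflight is a hypothetical name.

```python
import os
from typing import Dict, List


def preflight(sshusers: List[str], keys_dir: str) -> Dict[str, List[str]]:
    """Compare the sshusers list with the key files in keys_dir.

    missing:  users in sshusers with no key file (the copy would fail)
    unlisted: key files that no sshusers entry refers to (never pushed)
    """
    files = {f for f in os.listdir(keys_dir)
             if os.path.isfile(os.path.join(keys_dir, f))}
    listed = set(sshusers)
    return {
        "missing": sorted(listed - files),
        "unlisted": sorted(files - listed),
    }
```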

With the sshd_config file mentioned in the previous section, here is what /etc/ansible/configs/etc/ssh/sshd_config.j2 will contain:

#$Id$
#{{ ansible_managed }}
ListenAddress {{ ansible_ssh_host }}
PermitRootLogin no
AuthorizedKeysFile /etc/ssh/authorized_keys/%u
PasswordAuthentication no
ChallengeResponseAuthentication no

The following is an Ansible playbook and will manage both sshd configuration and authorized_keys:

---
- hosts: freebsd
  user: ansible
  sudo: yes

  tasks:
  # the directory named by AuthorizedKeysFile in sshd_config
  - name: create key directory
    action: file path=/etc/ssh/authorized_keys state=directory
      owner=0 group=0 mode=0755

  # one root-owned file per entry in sshusers
  - name: upload user key
    action: copy src=/home/ansible/crossplatform/etc/ssh/authorized_keys/{{ item }}
      dest=/etc/ssh/authorized_keys/
      owner=0 group=0 mode=0644
    with_items: "{{ sshusers }}"

  # sshd -T must accept the rendered file before it replaces the live one
  - name: sshd configuration file update
    template: src=/etc/ansible/configs/etc/ssh/sshd_config.j2
      dest=/etc/ssh/sshd_config
      backup=yes
      owner=0 group=0 mode=0644
      validate='/usr/sbin/sshd -T -f %s'
    notify:
    - restart sshd

  handlers:
    - name: restart sshd
      service: name=sshd state=restarted

In this case, we will have a directory on the ansible server, /home/ansible/crossplatform/etc/ssh/authorized_keys/, which contains one file per user; each file is that user’s authorized_keys. The with_items line loops over the sshusers list defined in group_vars, and {{ item }} holds the current username on each pass.
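For readers less familiar with Ansible, the first two tasks amount to roughly the following on each host. This is a local Python stand-in written for illustration (push_keys is a hypothetical name); it leaves out file ownership, which needs root, and everything Ansible does over ssh.

```python
import os
import shutil
from typing import List


def push_keys(sshusers: List[str], src_dir: str, dest_dir: str) -> None:
    """Copy one key file per user into dest_dir, mirroring the playbook."""
    # "create key directory": make sure it exists with mode 0755
    os.makedirs(dest_dir, exist_ok=True)
    os.chmod(dest_dir, 0o755)
    # "upload user key": with_items runs the copy once per username
    for user in sshusers:
        dest = os.path.join(dest_dir, user)
        shutil.copyfile(os.path.join(src_dir, user), dest)
        os.chmod(dest, 0o644)
        # owner=0 group=0 would need os.chown(dest, 0, 0) running as root
```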

Our sshd configuration is stored at /etc/ansible/configs/etc/ssh/sshd_config.j2 and is a template. The variables in that template are filled in per host: ansible_ssh_host comes from the inventory, and ansible_managed is a string set in the Ansible configuration.
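Ansible renders the template with Jinja2. To show what each host ends up with, here is a deliberately tiny Python stand-in that only handles plain {{ name }} placeholders (it is not Jinja2, and render is a hypothetical helper); the address in the test is a placeholder.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, host_vars: Dict[str, str]) -> str:
    """Replace each {{ name }} with host_vars[name]; fail on unknown names."""
    def lookup(match):
        name = match.group(1)
        if name not in host_vars:
            raise KeyError(f"undefined template variable: {name}")
        return host_vars[name]

    return PLACEHOLDER.sub(lookup, template)
```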

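The validate option runs /usr/sbin/sshd -T -f against the rendered file before it is installed; if sshd rejects it, the task fails and the existing sshd_config is left in place. If the file does change, notify triggers the restart sshd handler, which runs at the end of the play.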
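The validate-then-replace pattern is what keeps a typo from taking down sshd on every server at once. A Python sketch of the same idea, assuming validate is given as an argument list in which "%s" stands for the candidate file (install_validated is a hypothetical name, and this is not how Ansible implements it internally):

```python
import os
import subprocess
import tempfile
from typing import List


def install_validated(content: str, dest: str, validate: List[str]) -> bool:
    """Install content at dest only if the validate command exits with 0.

    Returns True if dest was replaced, False if validation failed, in which
    case dest is left untouched.
    """
    # Write the candidate next to dest so os.replace is an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.chmod(tmp, 0o644)
        argv = [tmp if arg == "%s" else arg for arg in validate]
        if subprocess.run(argv).returncode != 0:
            return False
        os.replace(tmp, dest)
        tmp = None  # renamed into place, nothing left to clean up
        return True
    finally:
        if tmp is not None and os.path.exists(tmp):
            os.remove(tmp)
```

On a managed node the call would look like install_validated(text, "/etc/ssh/sshd_config", ["/usr/sbin/sshd", "-T", "-f", "%s"]).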

With this configuration, users can log in only via ssh-key and those public keys are centrally controlled.

Procedures

The following are routine maintenance tasks and how to perform them:

Adding a new user

When we add a new user, we add them to sshusers and create their authorized_keys file in /home/ansible/crossplatform/etc/ssh/authorized_keys/ on our ansible server. We then invoke the playbook, and the file is copied to /etc/ssh/authorized_keys on all servers.
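Since a malformed key file simply means the user cannot log in, a quick sanity check on a submitted public key before adding it can save a round trip. A Python sketch that checks the basic OpenSSH public key layout (type, base64 blob, optional comment) and that the type name embedded in the blob matches the first field; check_public_key and the KNOWN_TYPES set are assumptions for illustration, and running ssh-keygen -l -f on the file is a simpler alternative.

```python
import base64
import binascii
import struct

# Hypothetical allow-list: adjust to the key types our sshd version accepts.
KNOWN_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256",
               "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521"}


def check_public_key(line: str) -> str:
    """Return the key type if line looks like a well-formed public key.

    Raises ValueError otherwise. This checks structure only; it does not
    verify the key material itself.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, b64 = fields[0], fields[1]
    if key_type not in KNOWN_TYPES:
        raise ValueError(f"unexpected key type: {key_type}")
    try:
        blob = base64.b64decode(b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"bad base64: {exc}") from None
    if len(blob) < 4:
        raise ValueError("key blob too short")
    # The blob starts with a big-endian uint32 length, then the type name.
    (n,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + n].decode("ascii", "replace") != key_type:
        raise ValueError("type inside blob does not match first field")
    return key_type
```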

Disabling a user

To disable a user, empty their file in /home/ansible/crossplatform/etc/ssh/authorized_keys/, then invoke the playbook.

We could easily go a step further and lock the account (e.g. pw lock FOO on FreeBSD).

We do not remove the user from sshusers yet. That will not be done until we delete the user from the system.

We could also create a playbook to remove a given user from the /etc/ssh/authorized_keys directory on each server. Presumably that would be part of the ‘Remove User’ process, which is outside scope.
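Because the worry is that a departing engineer leaves a backdoor, it helps to search every key file, not just their own, for that engineer’s keys. A Python sketch that computes SHA256 fingerprints in the format printed by ssh-keygen -l on OpenSSH 6.8 and later, and reports where a matching key appears. Collecting the key files from the servers (for example with an Ansible fetch task) is left out, the function names are hypothetical, and option parsing is simplified: it looks for the first valid base64 token after something that looks like a key type.

```python
import base64
import binascii
import hashlib
from typing import Dict, List, Optional, Set, Tuple

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_blob(line: str) -> Optional[bytes]:
    """Return the decoded key blob from an authorized_keys line, if any."""
    fields = line.split()
    for i, tok in enumerate(fields[:-1]):
        if tok.startswith(KEY_TYPE_PREFIXES):
            try:
                return base64.b64decode(fields[i + 1], validate=True)
            except binascii.Error:
                continue  # not a key after all; keep looking
    return None


def fingerprint(blob: bytes) -> str:
    """SHA256 fingerprint as printed by newer ssh-keygen -l."""
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def find_keys(wanted: Set[str], key_files: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (file name, line number) for each line whose key is in wanted."""
    hits = []
    for name, text in sorted(key_files.items()):
        for lineno, line in enumerate(text.splitlines(), 1):
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            blob = key_blob(line)
            if blob is not None and fingerprint(blob) in wanted:
                hits.append((name, lineno))
    return hits
```

A hit in any file other than the departing user’s own is exactly the kind of backdoor the original problem statement worries about.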

Updating keys

To update the authorized_keys for a user, update the file at /home/ansible/crossplatform/etc/ssh/authorized_keys/, then invoke the playbook.
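Before pushing an updated file, it is useful to see exactly which keys are being added or removed; if the key directory is under version control, git diff gives the same answer. A Python sketch (key_changes is a hypothetical helper) that compares two versions of a user’s file:

```python
from typing import List, Tuple


def key_changes(old: str, new: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) key lines between two file versions.

    Lines are compared whole, so a key whose comment or options changed
    appears as removed and re-added. Blank lines and comments are ignored.
    """
    def keys(text: str) -> List[str]:
        return [l.strip() for l in text.splitlines()
                if l.strip() and not l.lstrip().startswith("#")]

    old_keys, new_keys = keys(old), keys(new)
    added = [k for k in new_keys if k not in old_keys]
    removed = [k for k in old_keys if k not in new_keys]
    return added, removed
```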

Audit

It would be fairly straightforward to create another playbook that inspects /etc/passwd for any rogue users that have been added. We could take that a step further and maintain the users via an existing Ansible module.
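The core of such an audit is small. A Python sketch that flags suspicious entries in /etc/passwd-formatted text (rogue_accounts is a hypothetical name). Assumptions: regular accounts start at uid 1000, the nologin shell list matches the platform, and uid 0 is allowed only for root and toor (FreeBSD ships a toor account with uid 0).

```python
from typing import List, Set, Tuple

# Shells that do not permit interactive login (adjust for the platform).
NOLOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false", "/usr/bin/false"}


def rogue_accounts(passwd_text: str, approved: Set[str], min_uid: int = 1000,
                   root_names: Tuple[str, ...] = ("root", "toor")) -> List[str]:
    """Flag accounts that deserve a closer look.

    Flagged: any uid-0 account not named in root_names, and any account
    with uid >= min_uid and a login shell whose name is not in approved.
    """
    flagged = []
    for line in passwd_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # not a passwd entry we understand
        name, uid, shell = fields[0], int(fields[2]), fields[6]
        if uid == 0:
            if name not in root_names:
                flagged.append(name)
        elif uid >= min_uid and shell not in NOLOGIN_SHELLS and name not in approved:
            flagged.append(name)
    return flagged
```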

Notes

Ansible has a module for maintaining ssh keys (authorized_key), but as Mr Lucas pointed out, it has a problem with quotes in restricted keys.

Other interesting stuff

I also found the RevokedKeys option in man sshd_config. It tells sshd to reject a given key for any user, regardless of whether or not it appears in an authorized_keys file. Given the proposal above, it’s not really relevant to our needs, but I found it interesting nonetheless. If we used it, we would still want to lock the user’s account.

Recommendation

I have been very brief on some of the configuration items, but the basic concepts are sound. This will work, and it is fairly straightforward to do. Configuration changes are pushed to the servers as required. That can be automated, but I’ve always preferred to be sitting at the keyboard when such changes are pushed to the nodes.

I think we should try a quick proof of concept and see how easy it is to get this running. Adding and removing users should be easy, making the whole process straightforward and quick.