Cloud-init: What is it and Why it’s Important to Your Cloud Environment

Have you ever wondered why, when you start up an instance in the cloud, the instance just ‘happens’ to come up with the correct software installed and ready for use? The answer lies in a piece of software that nearly everyone uses, cloud-init, and in this post we’ll discuss what it is and how it’s implemented.

Cloud-init Capabilities

Cloud-init is the service that is installed inside the instance, and cloud-config is the set of scripts that are executed as soon as the instance is started. In other words, cloud-config is the language of the scripts that cloud-init knows how to execute.

Cloud-init runs on Linux workloads; for Microsoft Windows workloads, the equivalent is Cloudbase-Init, which supports the majority of cloud-config parameters. The service starts early in the boot process and retrieves metadata provided by the cloud platform’s metadata service, or user data supplied directly by the user, such as through OpenStack Nova.

For each instance that you start in OpenStack with Nova or through Heat, you have the option of passing what is called user-data.

nova boot <instance_name> --user-data ......
or
heat stack-create <stack_name> --user-data ......

You can pass a regular script written for almost any interpreter that you wish to use, provided that the interpreter is already installed on your instance.

For example, passing a bash script to basically any instance will succeed, because practically all Linux instances have bash installed by default. On the other hand, passing a Python or Ruby script to cloud-init as a file will most probably not work, unless Python or Ruby is already installed in your instance as part of your image or template.
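To make that concrete, here is a minimal, hypothetical user-data script (the file name and message are illustrative, not part of any real deployment). The shebang line tells cloud-init which interpreter to hand the script to; because bash ships on virtually every image, this runs on first boot with no extra setup:

```shell
#!/bin/bash
# Hypothetical user-data script passed via --user-data.
# Writes a marker file so later tooling can tell that the
# instance was bootstrapped, and when.
echo "bootstrapped on $(date)" > /tmp/bootstrap-marker.txt
```

If this were a Python script instead (`#!/usr/bin/env python3`), it would fail on any image that does not already include a Python interpreter.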

Cloud-config Syntax

Cloud-config has its own syntax, based on YAML, a human-readable data serialization language. There are a number of basic rules that you should be aware of when using YAML:

  • Indentation with white spaces (not tabs) defines the relationship and structure of the resources within the script.

  • Members of a list are identified by a leading dash on the line.

  • Blocks of text are indented.

  • Text that is to be carried over literally, with its line breaks preserved, is preceded by a pipe character (“|”).
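A short, hypothetical cloud-config fragment shows all of these rules at once (the package names and file path are illustrative only):

```yaml
#cloud-config
# Nesting is expressed with spaces (never tabs).
packages:            # each list member starts with a leading dash
  - httpd
  - mod_ssl
write_files:
  - path: /etc/motd
    content: |       # the pipe keeps the text below exactly as written
      Welcome to this instance.
      Provisioned by cloud-init.
```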

Cloud-config Examples

Users and Groups

When provisioning your instances, a good practice is to ensure that you have a default group and a user defined within the instance. This allows you to ensure that you have a standard configuration of all your provisioned instances and also allows you to comply with requirements as needed.

#cloud-config
users:
  - name: strato
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4PyG0doiBQYV7OlOxbRjle026hJPBWDAaaaaxcdxd+eKHWuVXIpAiQlSElEBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV8+NI5xPB/QT3Um2Zi7GRkIwIgNPN5uqUtXvjgAaaaaaaffcdc+i1CS0Ku4ld8vndXvr504jV9BMQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4XHDEOjbSMRX

Let’s explain what the code snippet above does in more detail:

The first line #cloud-config tells cloud-init that this is a cloud config file and should be treated as such.

We then have a top level object – users with the following characteristics:

  • A new user with the name of ‘strato’

  • Part of the sudo group with a bash shell, with the correct privileges in the sudoers file which allow this user to run all privileged commands without having to provide a password

  • The specified public key is added to the user’s ~/.ssh/authorized_keys file when the user is created

Injecting Key-Value Pairs

When deploying at scale, you are probably already using some kind of configuration management system such as Puppet, Salt or Chef to customize your instances. Managing each instance individually does not scale.

How can cloud-config help?

When you provision an instance, you would like to be able to identify the instance to be provisioned in a certain way.

For example, you would like the instance to be installed as a web server, to provide static content for your web application. The flow would be as follows:

  1. Provision an instance with cloud-config

  2. Configure the instance to point to your configuration management server

  3. Identify the instance as a web server

This can be accomplished by defining a custom variable as part of your user-data which will be parsed by cloud-init and then used further for the provisioning of the instance.
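As a rough sketch (the file path, role value, and Puppet server name below are all assumptions, not part of the original example), the custom variable could be written out by cloud-config and then picked up by the configuration management agent on its first run:

```yaml
#cloud-config
# Sketch only: persist the role as an external Facter fact so a later
# Puppet run can branch on it. Fact paths vary between Puppet versions.
write_files:
  - path: /etc/facter/facts.d/role.txt
    content: |
      role=web
runcmd:
  - puppet agent --test --server puppet.example.com   # hypothetical server name
```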

Following is an abbreviated example (this is not a full solution) of how this would be accomplished:

#!/bin/sh
set -e -x
# Standard role
defaultrole=web
#
# get role from commandline or if absent from hiera
#
if [ $# -gt 0 ]; then
  role=$1
else
  role="`hiera -c role $defaultrole 2>&1`"
fi
#
# Run puppet.
#
if [ -f /etc/puppet/manifests/$role.pp ]; then
  puppet apply /etc/puppet/manifests/$role.pp
else
  echo "/etc/puppet/manifests/$role.pp was not found!"
  exit 1
fi

Here we have defined the instance with the role of a web server, and instructed the instance to apply the appropriate modules according to the role for which the instance was defined. This could also be accomplished by setting specific metadata on the instance through your cloud platform upon instantiation, and then configuring the instance to retrieve that information and act upon it.
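A minimal sketch of that metadata variant, assuming the OpenStack metadata service is reachable at its usual link-local address (a follow-up script, not shown, would parse the JSON for a role property and act on it):

```yaml
#cloud-config
# Fetch the instance metadata at boot and stash it for later processing.
runcmd:
  - curl -s http://169.254.169.254/openstack/latest/meta_data.json -o /var/tmp/meta_data.json
```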

Adding Specific Yum Repositories

To install your software, you need to configure a specific yum or apt repository on the instance so that it knows where to pull the correct packages from. This can be accomplished by adding the following lines to your cloud-config file:

#cloud-config
#
# Add yum repository configuration to the system
#
# The following example adds the file /etc/yum.repos.d/epel_testing.repo
# which can then subsequently be used by yum for later operations.
yum_repos:
  # The name of the repository
  epel-testing:
    baseurl: http://download.fedoraproject.org/pub/epel/testing/5/$basearch
    enabled: false
    failovermethod: priority
    gpgcheck: true
    gpgkey: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
    name: Extra Packages for Enterprise Linux 5 - Testing
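For Debian or Ubuntu images, the apt equivalent looks roughly like this (the repository URL and signing-key ID are illustrative, not a real repository):

```yaml
#cloud-config
# Adds /etc/apt/sources.list.d/my-repo.list on Debian/Ubuntu images.
apt:
  sources:
    my-repo:
      source: "deb http://repo.example.com/ubuntu focal main"
      keyid: 0123456789ABCDEF   # hypothetical signing-key ID
```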


Cloud-init is a proven method of bootstrapping your instances in the cloud, creating a standard, manageable environment for your workloads to run on.