Discovering AWS with the CLI Part 1: Networking and Virtual Machines
Recently, I started moving an application that had been deployed manually to an AWS EC2 instance to a more modern, infrastructure-as-code setup. This gave me the chance to dive deeper into AWS concepts and play around with the various services. There are numerous ways to use the AWS API: on top of the standard tools offered by Amazon, such as the web GUI, the CLI client, client packages for a number of languages and CloudFormation, there are various third-party tools, such as Terraform and Ansible. Pretty much every other tutorial or book on AWS is a click-through in the web UI, but neither the pedagogic effect nor the resulting programmatic output is optimal: the steps cannot be reproduced, and when you want to go over them again, you need to recall where the hell you clicked, and which values had to be the same or related to each other. I found the CLI client to be a much better alternative, because you can linearly follow what has to happen when, and how things connect to each other. You can also use the resulting code for actual production orchestration work. This tutorial documents what I found out about getting the most out of the CLI client, and how one can use it to understand and discover AWS concepts.
If you don’t want to copy-paste all the commands, you can check out the samples
repository, which I will refer to extensively in part 2, and use the
checkpoints-part-1.sh file, which bundles all examples into one script. This
script notifies the user of the current location at the different checkpoints
and then pauses execution. You can then inspect the state on the AWS console, or
run commands in another shell.
Installation and Configuration of awscli
The AWS CLI client is delivered as a Python package named awscli. As such, the
easiest way to install it is to use pip, with pip install awscli. Once you have
installed it, you need to register your access keys, which can be done with the
aws configure command. You can add additional profiles with the --profile
argument, and you can also rerun the command if you want to change something,
such as the default region. When you use the command and want to specify a
certain profile, you can either use the --profile argument, or set the
environment variable AWS_DEFAULT_PROFILE. The same thing is valid for the
region; you can either pass the argument --region, or export it as
AWS_DEFAULT_REGION. If you are ever in doubt of who you are logged in as, you
can simply issue the command aws iam get-user, which will show you your username
and user ARN.
General usage
The AWS CLI accepts combinations of commands, with the first command being
something like the namespace. The default output format is JSON, and you can
manipulate this output using JMESPath notation. It is also possible to print the
output as a table or in plain text; the former is rarely used, but the latter is
necessary if you want to use the output as input for other commands. A really
useful feature is autocompletion, which provides a quick means to search among
the many namespaces and subcommands. In order to enable autocompletion, you need
to specify that the command aws_completer should be used to complete the command
aws, which can be done with the following:
complete -C "$(which aws_completer)" aws
Now, tabbing should help you find stuff. More details can be found here.
Uploading an SSH key
We will be creating EC2 instances in the following sections, and you will need an SSH key to access them. The way this works on AWS is that you upload your public key under a name, and then specify, on creation, that a VM should be accessible with that key. Creating an SSH key is very easy, as famously documented in the GitHub documentation. Once you have created one, you can add it to the available keys on AWS with the following command:
aws ec2 import-key-pair --key-name brand-new-key \
--public-key-material file://~/.ssh/id_rsa.pub
You can later refer to this key as brand-new-key and use it to SSH into your
VMs.
Resource groups and tags
It is possible to gather AWS resources under resource groups, which enables
certain bulk features such as monitoring costs or gathering logs. Unfortunately,
deleting resources is not among those features, at least not using the web
console or the CLI. A resource group is created by specifying a query that will
match resources based on tags. If we want a resource group to be based on the
value of the Environment tag, for example (tag names are by convention
capitalized), we can create the resource group DemoEnvironment, which matches
all resources whose Environment tag has the value Demo, with the following
command:
aws resource-groups create-group \
--name DemoEnvironment \
--resource-query '{"Type":"TAG_FILTERS_1_0", "Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Environment\", \"Values\":[\"Demo\"]}]}"}'
As you can see, the format is really god-awful: the query is a JSON document
that has to be passed as an escaped string inside another JSON document. The use
of tags on the command line (and also on the web console, for that matter) is
complicated considerably by the non-uniform application of tags to resources.
Some resources (such as EC2 instances) accept a tag on creation, whereas others
can be tagged only once they are created; you can see a detailed list here.
Resource groups are still rather useful, however, which is why we will tag all
the resources we create. We will see later how to create resources as a part of
DemoEnvironment.
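The nested, escaped JSON of the --resource-query argument is easier to get right when it is generated rather than typed. The following is a minimal sketch in POSIX shell; build_tag_query is a hypothetical helper name (not part of the AWS CLI), and it assumes that the tag key and value contain no double quotes or backslashes.

```shell
# Hypothetical helper: build the value of --resource-query for a single tag.
# Assumes key and value contain no double quotes or backslashes.
build_tag_query() {
  key=$1
  value=$2
  # The inner query is a JSON document of its own ...
  inner=$(printf '{"ResourceTypeFilters":["AWS::AllSupported"],"TagFilters":[{"Key":"%s","Values":["%s"]}]}' "$key" "$value")
  # ... which is embedded as a string, so its double quotes must be escaped.
  escaped=$(printf '%s' "$inner" | sed 's/"/\\"/g')
  printf '{"Type":"TAG_FILTERS_1_0","Query":"%s"}\n' "$escaped"
}

# Possible usage (not executed here):
#   aws resource-groups create-group --name DemoEnvironment \
#     --resource-query "$(build_tag_query Environment Demo)"
```

Printing the result of build_tag_query Environment Demo before passing it to the CLI is a cheap way to check the escaping.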
VM and VPC, the heart of AWS
The heart of AWS is the EC2 service, the Elastic Compute Cloud. It provides the means to create, organize, access and interface with scalable computing infrastructure, the infamous EC2 instances. Under the covers, AWS uses EC2 to run the rest of its own services. EC2 instances are just a part of the puzzle, though. You will be dealing even more often with the networking components of EC2, especially with virtual private clouds (VPCs). Nearly every resource on AWS is connected to a VPC and a subnet, either directly or with at most one hop. A VPC is a logically isolated network which separates your AWS resources from the rest of AWS, while subnets are tools for finer control of how these resources communicate with each other, and with the internet. Your account comes with a default VPC; if you don’t supply the VPC argument when creating a resource, it will be created in this default VPC. Here is how to get the default VPC’s ID:
DEFAULTVPCID="$(aws ec2 describe-vpcs \
--filters "Name=isDefault,Values=true" \
--query "Vpcs[0].VpcId" --output text)"
As you can see, there is no separate namespace for VPC subcommands; they are in
the EC2 namespace. Also, we used the --query argument, which can be added to any
command to print a specific part of the response JSON. Here we use it to print
the ID of the default VPC; we also pass the option --output text to get the ID
as simple text instead of a JSON string. Talking of the default VPC is a bit
misleading; it’s more like the default networking infrastructure, as there are a
couple of other things attached to this VPC that make it special. The first part
of this structure is the subnets. We can print the subnets of the default VPC
with the following query:
aws ec2 describe-subnets --filter \
"Name=vpc-id,Values=$DEFAULTVPCID"
This should print a number of subnets; in my case it’s 3. One field of
significance is AvailabilityZone. It should be easy to see that each subnet has
a different value, but they are all in the same region (for my region
eu-central-1, the availability zones are eu-central-1a to eu-central-1c). A VPC
created in a region will logically span all the availability zones (AZs) in that
region. A subnet, on the other hand, is specific to a single AZ. You can also
list the network interfaces, which are the entities through which computational
resources connect to the network, with the following command:
aws ec2 describe-network-interfaces --filter \
"Name=vpc-id,Values=$DEFAULTVPCID"
One situation where this command comes in handy is figuring out which resources to delete first when you are trying to delete a VPC. When it comes to the dependency graph, VPCs are pretty much at the top of the (top-down) tree. You cannot delete them unless all the other, non-default resources are also removed or detached.
Creating, connecting and instantiating resources in VPC and subnets
If you want to create multiple isolated resource groups, keep control over which resources can access which others, and generally understand how to connect various other AWS things like RDS databases, you will need to deal with VPCs. Let’s begin this process by creating one such VPC:
VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
--query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo
The required --cidr-block argument specifies what IP range will be valid within
the VPC. This uses the CIDR format, with the suffix /16 specifying how many bits
from the beginning constitute the network prefix; our VPC will be able to hand
out and route between IPs from 10.0.0.0 to 10.0.255.255, that is,
256*256 = 65536 IPs in total. Once this command runs, you should see two results
in the output of aws ec2 describe-vpcs: the default VPC, and the new one you
created just now. You can also see the new VPC in the list of resources for our
new resource group with the command aws resource-groups list-group-resources
--group-name DemoEnvironment. A VPC is not enough information for AWS to figure
out the networking topology, however: we need a subnet. The subnet needs to have
a CIDR block that’s a subset of the VPC’s. Now let’s create one with the
following command:
SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID \
--cidr-block 10.0.1.0/24 \
--query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
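The sizes of the two CIDR blocks used so far can be double-checked with plain shell arithmetic. This is only a sketch of the calculation; cidr_size is a hypothetical helper name, and it takes just the prefix length (the number after the slash).

```shell
# Hypothetical helper: number of IPv4 addresses in a CIDR block,
# given its prefix length (the number after the slash).
cidr_size() {
  prefix=$1
  # 32 address bits minus the prefix bits are left for hosts.
  echo $(( 1 << (32 - prefix) ))
}

cidr_size 16   # the VPC block 10.0.0.0/16
cidr_size 24   # the subnet block 10.0.1.0/24
```

Note that AWS reserves the first four addresses and the last address of every subnet, so slightly fewer addresses than the raw block size can actually be assigned to instances.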
As you can see in the --cidr-block argument, this subnet covers IPs in the range
from 10.0.1.0 to 10.0.1.255, which is a part of the IPs covered by the VPC. Once
we have the subnet, we can go ahead and create our first EC2 instance attached
to it. In order to do so, we first need the ID of a proper AMI. I used the
following command to list the official Ubuntu AMIs, and picked the newest one:
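# 099720109477 is the AWS account ID of Canonical, the publisher of Ubuntu;
# restricting the owner excludes look-alike images published by third parties
AMIID=$(aws ec2 describe-images \
--owners 099720109477 \
--filters "Name=root-device-type,Values=ebs" \
"Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*" \
"Name=architecture,Values=x86_64" \
--query "reverse(sort_by(Images, &CreationDate)) | [?! ProductCodes] | [0].ImageId" \
--output text)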
The reason for the complicated query argument is that we don’t want the AMIs
that are in the AMI marketplace, which one needs to pay for or accept a license
for. Now let’s start an EC2 instance with the AMI the above command picked (as
of 11.08.2019, this is ami-0ac05733838eabc06):
aws ec2 run-instances --image-id $AMIID --count 1 \
--instance-type t2.micro --key-name brand-new-key \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' \
--subnet-id $SUBNETID
This instance gets loaded with the SSH key that we uploaded earlier, named
brand-new-key. It also gets the same environment tag, but with the convenience
of adding it in the creation command, making a second command unnecessary. The
--subnet-id argument specifies which subnet the network interface should connect
to. If we hadn’t specified this, a subnet in the default VPC would have been
picked. We now have a functioning VM, whose ID we can look up by filtering the
list of instances on the Environment tag:
INSTANCEID=$(aws ec2 describe-instances \
--filter "Name=tag:Environment,Values=Demo" \
--query "reverse(sort_by(Reservations, &Instances[0].LaunchTime)) | [0].Instances[0].InstanceId" \
--output text)
The query part of this command is again relatively complicated. The reason is
that, if you create a couple of VMs and terminate them, they will still appear
in the list of VMs when searched by tag. That’s the reason we pick the VM that
was last launched. When you create an instance and would like to know when it is
actually running, you can use the handy wait feature, as follows:
aws ec2 wait instance-running --instance-ids $INSTANCEID
Now if we run the command to list group resources, we should see three entries:
a VPC, a subnet and an instance. If you inspect the EC2 instance with aws ec2
describe-instances --instance-ids $INSTANCEID, you can see a couple of fields
that are interesting. There’s the ID of course, and PrivateDnsName, but
peculiarly no public IP or DNS. This is because the subnet was not configured to
give its instances a public IP address on launch; you can see that this is so in
the MapPublicIpOnLaunch field of the subnet we created, which is false. The
instance we created is in a vacuum, as far as we are concerned, and cannot be
contacted from anywhere. You can also see this by right-clicking on the instance
in the web GUI, and clicking connect. AWS will ask you to pick a method out of
SSH client, web SSH client, or Java SSH client. Interestingly, the first of
these shows the private IP of this instance (something like 10.0.1.12), which is
in the reserved private range and cannot be reached from the internet. If you
pick the second option, you will see an error message telling you that the
instance does not have a public IP.
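The MapPublicIpOnLaunch setting mentioned above can also be read directly with a --query expression. The following is a small sketch; subnet_maps_public_ip is a hypothetical wrapper name, while the describe-subnets call and the field name are the ones already used in this post.

```shell
# Hypothetical helper: print whether instances launched into the given subnet
# automatically receive a public IP address.
subnet_maps_public_ip() {
  aws ec2 describe-subnets --subnet-ids "$1" \
    --query "Subnets[0].MapPublicIpOnLaunch" --output text
}

# Possible usage (not executed here):
#   subnet_maps_public_ip "$SUBNETID"
```

For the subnet created above, this should report a false value at this point in the tutorial.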
Opening a subnet to the outer world
We need to modify and extend our basic subnet in two ways in order for the instances connected to it to communicate with the internet. The first is a gateway. An internet gateway acts as a target for internet-routable traffic, and takes care of NAT (Network Address Translation). You should not confuse an internet gateway with a NAT gateway: The latter is used to connect instances in private subnets to the internet, while they are still unavailable to traffic from the outside. The default VPC has an internet gateway, as you would expect:
DEFAULTGATEWAY=$(aws ec2 describe-internet-gateways \
--filters "Name=attachment.vpc-id,Values=$DEFAULTVPCID" \
--query "InternetGateways[0].InternetGatewayId" --output text)
echo $DEFAULTGATEWAY
This should print the ID of the gateway used by the default VPC. A gateway is not automatically created for a VPC, however. Our new VPC is lacking one, which we can see using the following command:
aws ec2 describe-internet-gateways --filters \
"Name=attachment.vpc-id,Values=$VPCID"
This should return an empty list. We can create a brand new gateway for our VPC with the following commands:
GATEWAYID=$(aws ec2 create-internet-gateway --query \
"InternetGateway.InternetGatewayId" --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID \
--internet-gateway-id $GATEWAYID
Now we have a gateway that is attached to our VPC. The next thing we need is a means for the networking logic to route the requests that are meant for the internet through this gateway. This is the job of the route table. Every VPC comes with a default route table (see here for details). We can see how these rules look by first looking at the settings for the default VPC and its subnets:
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$DEFAULTVPCID"
In the Routes entry of the resulting output, you should be able to see two
entries. The first of these has the field DestinationCidrBlock set to
172.31.0.0/16, which is the CIDR of the VPC itself (you can verify this with the
command aws ec2 describe-vpcs --vpc-ids $DEFAULTVPCID --query
"Vpcs[0].CidrBlock"). The GatewayId of this rule is local, meaning that it will
route traffic locally. The second rule has 0.0.0.0/0 as DestinationCidrBlock,
and its GatewayId is equal to DEFAULTGATEWAY. Since the rules in a route table
take precedence in order of specificity, this second rule will apply to all
requests that are not meant for the VPC IP range. Since, as mentioned above,
every VPC has a route table, we do not need to create a new one, and can instead
modify the existing route table:
ROUTETABLEID=$(aws ec2 describe-route-tables \
--filter "Name=vpc-id,Values=$VPCID" \
--query "RouteTables[0].RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID \
--tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID \
--destination-cidr-block 0.0.0.0/0 \
--gateway-id $GATEWAYID
With the last create-route command, we are telling the network to route requests
that are not addressed to an interface in the VPC to the gateway identified by
GATEWAYID. As we are modifying the default route table, there is no need to
explicitly associate the route table with the subnets of the VPC which we want
to make public, because in the absence of explicit associations, subnets use the
default route table. This implicit association is also not displayed in the
result of aws ec2 describe-route-tables, which is the reason we cannot demo it
for the default network. If we were creating a new route table, however, the
following command would have been necessary for such an association:
aws ec2 associate-route-table --subnet-id $SUBNETID \
--route-table-id $ROUTETABLEID
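Before moving on, it can be worth checking that the catch-all route really points at the new gateway. The sketch below filters the Routes list with a JMESPath expression; default_route_gateway is a hypothetical helper name, and the field names are the ones shown in the describe-route-tables output discussed above.

```shell
# Hypothetical helper: print the gateway ID that the 0.0.0.0/0 route of the
# given route table points to (prints nothing useful if no such route exists).
default_route_gateway() {
  aws ec2 describe-route-tables --route-table-ids "$1" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].GatewayId" \
    --output text
}

# Possible usage (not executed here); the output should equal $GATEWAYID:
#   default_route_gateway "$ROUTETABLEID"
```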
One last step is necessary to make sure that the instances we start in the subnet are getting public IPs. The following will modify the subnet to make sure that is the case:
aws ec2 modify-subnet-attribute --subnet-id $SUBNETID \
--map-public-ip-on-launch
Normally (as in, in most cases, and definitely for the default VPC), an instance
that gets a public IP address is also given a public DNS name; this public DNS
name of an instance can be queried through the PublicDnsName field. Sometimes,
however (the documentation is not clear on when and how), the relevant fields on
the VPC are not set properly on creation. In order to make sure that your
instance gets not only an IP address but also a DNS name, you should set the
proper configuration values with the following commands:
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support
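Whether the two attributes are now set can be checked with aws ec2 describe-vpc-attribute, which reports one attribute per call. A minimal sketch follows; vpc_dns_settings is a hypothetical wrapper name.

```shell
# Hypothetical helper: print the DNS-related attributes of a VPC.
# describe-vpc-attribute accepts a single attribute per call, hence the loop.
vpc_dns_settings() {
  for attr in enableDnsSupport enableDnsHostnames; do
    aws ec2 describe-vpc-attribute --vpc-id "$1" --attribute "$attr"
  done
}

# Possible usage (not executed here):
#   vpc_dns_settings "$VPCID"
```

Both responses should contain a Value of true after the two modify-vpc-attribute commands above have run.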
As far as I can understand from the documentation, it is not possible to attach a public IP to a running instance from the subnet pool. You can use an Elastic IP, but that’s out of scope for this post. Instead, we will simply delete the running instance, and create a new one:
aws ec2 terminate-instances --instance-ids $INSTANCEID
INSTANCEID=$(aws ec2 run-instances --image-id $AMIID --count 1 \
--instance-type t2.micro --key-name brand-new-key \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' \
--subnet-id $SUBNETID --query "Instances[0].InstanceId" --output text)
aws ec2 wait instance-running --instance-ids $INSTANCEID
Let’s check whether our instance now has a public IP address and DNS:
IPADDRESS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
--query "Reservations[0].Instances[0].PublicIpAddress" --output text)
PUBLICDNS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
--query "Reservations[0].Instances[0].PublicDnsName" --output text)
IPADDRESS should now be a proper IP address, and PUBLICDNS should be a hostname
that resolves to that IP address. Since we already waited for the instance to
start, you can, at least in principle, contact it via SSH with ssh
ubuntu@$IPADDRESS or ssh ubuntu@$PUBLICDNS. If you try this now, however, you
will again face an empty line, without a response from the new server. The
reason for this silence is that the default security rules do not allow inbound
traffic to this instance. AWS security groups are the means of controlling the
traffic between EC2 instances and the internet. A new VPC has a default security
group, which also has default rules. These default rules allow all outgoing
connections (and the incoming responses these cause), and all connections
between instances in the same security group, but nothing else. Since we did not
create a new security group (which we could have done with aws ec2
create-security-group), the new instance has been automatically connected to the
default security group of the VPC. All is not lost, though: if we change the
rules of the security group, the change is applied immediately to any new
connections. Let’s modify the security group rules, and allow TCP connections
from all IP addresses on the default SSH port:
SECURITYGROUPID=$(aws ec2 describe-security-groups \
--filters Name=vpc-id,Values=$VPCID \
--query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
--protocol tcp --port 22 --cidr 0.0.0.0/0
See here for more on security groups. Now you should be able to access the VM on the public IP address or DNS.
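Opening port 22 to 0.0.0.0/0 is fine for a short-lived demo, but a narrower rule is easy to write. The sketch below allows SSH from a single address only; to_host_cidr and allow_ssh_from are hypothetical helper names, and the usage line assumes that curl is installed and that the checkip.amazonaws.com service is reachable.

```shell
# Hypothetical helper: turn a single IPv4 address into a /32 CIDR block.
to_host_cidr() {
  printf '%s/32\n' "$1"
}

# Hypothetical helper: allow SSH into a security group from one address only.
# $1 = security group ID, $2 = IPv4 address
allow_ssh_from() {
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 22 --cidr "$(to_host_cidr "$2")"
}

# Possible usage (not executed here; assumes curl and network access):
#   allow_ssh_from "$SECURITYGROUPID" "$(curl -s https://checkip.amazonaws.com)"
```

If your own public IP changes, the rule has to be replaced, which is the price for the tighter restriction.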
Cleanup
Cleaning up is relatively straightforward if you have access to the shell session with the variables that store the resource IDs. Remove all the AWS resources we created with the following commands:
aws ec2 terminate-instances --instance-ids $INSTANCEID
# the subnet cannot be deleted while the instance is still shutting down
aws ec2 wait instance-terminated --instance-ids $INSTANCEID
aws ec2 delete-key-pair --key-name brand-new-key
aws ec2 detach-internet-gateway --internet-gateway-id $GATEWAYID \
--vpc-id $VPCID
aws ec2 delete-internet-gateway --internet-gateway-id $GATEWAYID
aws ec2 delete-subnet --subnet-id $SUBNETID
aws ec2 delete-vpc --vpc-id $VPCID
aws resource-groups delete-group --group-name DemoEnvironment
You have to delete resources in this order, otherwise AWS will tell you that
dependencies are being violated. If you don’t have access to the IDs, you can
either query the individual elements via the CLI using the Environment tag, or
copy the IDs from the result of aws resource-groups list-group-resources.
Unfortunately, as mentioned above, there is no easy command to delete all
resources in a resource group. Even worse, there is no way to delete resources
by ARN, which is the identifier output of this last command.
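Both ways of recovering lost IDs can be sketched in a few lines of shell. arn_to_id and demo_instance_ids are hypothetical helper names; the first relies on the assumption that the EC2 ARNs in question end in a slash followed by the resource ID, and the sample ARN uses a made-up account number and instance ID.

```shell
# Hypothetical helper: extract the resource ID from an ARN by dropping
# everything up to and including the last slash.
arn_to_id() {
  echo "${1##*/}"
}

# Hypothetical helper: IDs of all instances tagged Environment=Demo that have
# not been terminated yet.
demo_instance_ids() {
  aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=Demo" \
              "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query "Reservations[].Instances[].InstanceId" --output text
}

# Made-up ARN, for illustration only:
arn_to_id "arn:aws:ec2:eu-central-1:123456789012:instance/i-0123456789abcdef0"
```

The extracted IDs can then be passed to the matching delete or terminate command for each resource type.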
Conclusion
The AWS CLI client is, as one would expect from the company that builds AWS, a solid piece of software. As you might have noticed from the command examples, there are some inconsistencies, such as differing names for the same kinds of arguments, or the issue with tags, but I think this is the least one would expect from a client that has to cover such a massive base of functionality. In the second part of this tutorial, we will be looking at creating a Fargate cluster using the CLI. The requirements will get more complicated as we try to create a scalable, decoupled application, and we will use many other AWS services to tackle them.