
Discovering AWS with the CLI Part 1: Networking and Virtual Machines


Recently, I started moving an application that had been deployed manually to an AWS EC2 instance to a more modern, infrastructure-as-code setup. This gave me the chance to dive deeper into AWS concepts and play around with the various services. There are numerous ways to use the AWS API: on top of the standard tools offered by Amazon, such as the web GUI, the CLI client, client packages for a number of languages and CloudFormation, there are various third-party tools, such as Terraform and Ansible. Pretty much every other tutorial or book on AWS is a click-through in the web UI, but neither the pedagogic effect nor the resulting programmatic output is optimal: the process cannot be reproduced, and when you want to go over it again, you need to recall where the hell you clicked, and which values had to be the same or related to each other. I found the CLI client to be a much better alternative, because you can follow linearly what has to happen when, and how things connect to each other. You can also use the resulting code for actual productive orchestration work. This tutorial documents what I found out about getting the most out of the CLI client, and how one can use it to understand and discover AWS concepts.

If you don’t want to copy-paste all the commands, you can check out the samples repository, which I will refer to extensively in part 2, and use the checkpoints-part-1.sh file, which bundles all examples into one script. This script notifies the user of the current location at the different checkpoints and pauses execution. You can then inspect the state on the AWS console, or run commands in another shell.

Installation and Configuration of awscli

The AWS CLI client is delivered as a Python package named awscli. As such, the easiest way to install it is to use pip, with pip install awscli. Once you have installed it, you need to register your access keys, which can be done with the aws configure command. You can add additional profiles with the --profile argument, and you can also rerun the command if you want to change something, such as the default region. When you use the command and want to specify a certain profile, you can either use the --profile argument, or set the environment variable AWS_DEFAULT_PROFILE. The same goes for the region; you can either pass the --region argument, or export it as AWS_DEFAULT_REGION. If you are ever in doubt about who you are logged in as, you can simply issue the command aws iam get-user, which will show you your username and user ARN.
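To illustrate both mechanisms, here is a short sketch; myprofile is a placeholder for whatever name you gave your profile during aws configure:

```shell
# Pass the profile and region explicitly for a single command...
aws iam get-user --profile myprofile --region eu-central-1

# ...or export them once for the rest of the shell session.
export AWS_DEFAULT_PROFILE=myprofile
export AWS_DEFAULT_REGION=eu-central-1
aws iam get-user
```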

General usage

The AWS CLI accepts combinations of commands, with the first command being something like the namespace. The default output format is JSON, and you can manipulate this output using JMESPath notation. It is also possible to print the output as a table or in plain text; the former is rarely used, but the latter is necessary if you want to use the output as input for other commands. A really useful feature is autocompletion, which provides a quick means to search among the many namespaces and subcommands. In order to enable autocompletion, you need to specify that the command aws_completer needs to be used to complete the command aws, which can be done with the following:

complete -C "$(which aws_completer)" aws

Now, tabbing should help you find stuff. More details can be found here.
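As a quick taste of the query syntax, the following command lists every instance in the current region together with its state, one tab-separated line per instance (InstanceId and State.Name are actual fields of the describe-instances response):

```shell
# Print instance ID and state as plain text, one instance per line.
aws ec2 describe-instances \
  --query "Reservations[].Instances[].[InstanceId,State.Name]" \
  --output text
```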

Uploading an SSH key

We will be creating EC2 instances in the following, and you will need an SSH key to access them. The way this works on AWS is that you upload your public key under a name, and then specify, on creation, that a VM should be accessible with that key. Creating an SSH key is very easy, as famously documented in GitHub’s guides. Once you have created one, you can add it to the available keys on AWS with the following command:

aws ec2 import-key-pair --key-name brand-new-key \
    --public-key-material file://~/.ssh/id_rsa.pub

You can later refer to this key as brand-new-key and use it to SSH into your VMs.
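If you don’t have a key pair yet, generating one is a single command; the following sketch writes a fresh key to ./demo_key (the path and key size are arbitrary choices here, not something AWS requires):

```shell
# Generate a new RSA key pair with no passphrase, saved as demo_key / demo_key.pub.
ssh-keygen -t rsa -b 4096 -f ./demo_key -N "" -q

# The public half is what gets uploaded to AWS.
cat ./demo_key.pub
```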

Resource groups and tags

It is possible to gather AWS resources under resource groups, which enables certain bulk features such as monitoring costs or gathering logs. Unfortunately, deleting resources is not among those features, at least not using the web console or the CLI. A resource group is created by specifying a query that will match resources based on tags. If we want resource groups to be based on the value of the Environment tag, for example (tag names are by convention capitalized), we can create the resource group DemoEnvironment with the following command:

aws resource-groups create-group \
    --name DemoEnvironment \
    --resource-query '{"Type":"TAG_FILTERS_1_0", "Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Environment\", \"Values\":[\"Demo\"]}]}"}'

As you can see, the format is really god awful. The use of tags on the command line (and also on the web console, for that matter) is complicated considerably by the non-uniform application of tags to resources. Some resources (such as EC2 instances) accept tags on creation, whereas others can be tagged only once they are created; you can see a detailed list here. Resource groups are still rather useful, however, which is why we will tag all the resources we create. We will see later how to create resources as part of the DemoEnvironment group.

VM and VPC, the heart of AWS

The heart of AWS is the EC2 service, the Elastic Compute Cloud. It provides the means to create, organize, access and interface with scalable computing infrastructure, the infamous EC2 instances. Under the covers, AWS uses EC2 to run the rest of its own services. EC2 instances are just a part of the puzzle, though. You will be dealing even more often with the networking components of EC2, especially with virtual private clouds (VPCs). Nearly every resource on AWS is connected to a VPC and a subnet, either directly or with at most one hop. A VPC is a logically isolated network which separates your AWS resources from the rest of AWS, while subnets are tools for finer control of how these resources communicate with each other, and with the internet. Your account comes with a default VPC; if you don’t supply the VPC argument, resources will be created in this default VPC. Here is how to get the default VPC’s ID:

DEFAULTVPCID="$(aws ec2 describe-vpcs \
    --filter "Name=isDefault, Values=true" \
    --query "Vpcs[0].VpcId" --output text)"

As you can see, there is no separate namespace for VPC subcommands; they live in the EC2 namespace. We also used the --query argument, which can be added to any command to print a specific part of the response JSON. Here we use it to print the ID of the default VPC; we also pass the option --output text to get the ID as plain text instead of a JSON string. Speaking of the default VPC is a bit misleading; it’s more like the default networking infrastructure, as there are a couple of other things attached to this VPC that make it special. The first part of this structure is the subnets. We can print the subnets of the default VPC with the following query:

aws ec2 describe-subnets --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"

This should print a number of subnets; in my case it’s 3. One field of significance is the AvailabilityZone. It should be easy to see that each subnet has a different value, but they are all in the same region (for my region eu-central-1, the availability zones are eu-central-1a to 1c). A VPC created in a region will logically span all the availability zones (AZ) in that region. A subnet, on the other hand, is specific to a single AZ. You can also list the network interfaces, which are the entities through which computational resources connect to the network, with the following command:

aws ec2 describe-network-interfaces --filter \
    "Name=vpc-id,Values=$DEFAULTVPCID"

One situation where this command comes in handy is figuring out which resources to first delete when you are trying to delete a VPC. When it comes to the dependency graph, VPCs are pretty much at the top of the (top-down) tree. You cannot delete them unless all the other, non-default resources are also removed or detached.
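For a more compact overview than the full JSON, the filter can be combined with a --query; the following (assuming DEFAULTVPCID is set as above) prints just the ID, availability zone, and CIDR block of each subnet:

```shell
# One tab-separated line per subnet: ID, availability zone, CIDR block.
aws ec2 describe-subnets \
  --filter "Name=vpc-id,Values=$DEFAULTVPCID" \
  --query "Subnets[].[SubnetId,AvailabilityZone,CidrBlock]" \
  --output text
```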

Creating, connecting and instantiating resources in VPC and subnets

If you want to create multiple isolated resource groups, keep control over which resources can access which others, and generally understand how to connect various other AWS services like RDS databases, you will need to deal with VPCs. Let’s begin by creating one such VPC:

VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo

The required --cidr-block argument specifies which IP range will be valid within the VPC. This uses the CIDR format, with the suffix /16 specifying how many bits from the beginning constitute the network mask; our VPC will be able to hand out and route between IPs from 10.0.0.0 to 10.0.255.255, that is, 256 * 256 = 65536 IPs in total. Once this command runs, you should see two results in the output of aws ec2 describe-vpcs: the default VPC, and the one you just created. You can also see the new VPC in the list of resources for our new resource group with the command aws resource-groups list-group-resources --group-name DemoEnvironment. A VPC is not enough information for AWS to figure out the networking topology, however: we need a subnet. The subnet needs to have a CIDR block that’s a subset of the VPC’s. Let’s create one with the following command:

SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID \
  --cidr-block 10.0.1.0/24 \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
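As a sanity check of the CIDR arithmetic, Python’s standard ipaddress module can confirm both the size of the VPC’s range and that the subnet’s block is contained in it:

```shell
# Verify the VPC range size and the subnet/VPC containment locally.
python3 -c '
import ipaddress
vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")
print(vpc.num_addresses)       # number of IPs in the VPC block
print(subnet.subnet_of(vpc))   # the subnet must lie within the VPC
'
```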

As you can see in the --cidr-block argument, this subnet covers the IP range from 10.0.1.0 to 10.0.1.255, which is a part of the range covered by the VPC. Once we have the subnet, we can go ahead and create our first EC2 instance attached to it. In order to do so, we first need the ID of a proper AMI. I used the following command to list the official Ubuntu AMIs, and picked the newest one:

AMIID=$(aws ec2 describe-images \
  --filters "Name=root-device-type,Values=ebs" \
  "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*" \
  "Name=architecture,Values=x86_64" \
  --query "reverse(sort_by(Images, &CreationDate)) | [?! ProductCodes] | [0].ImageId" \
  --output text)

The reason for the complicated query argument is that we want to exclude the AMIs that are in the AMI marketplace, which one has to pay for or agree to a license for. Now let’s start an EC2 instance with the AMI the above command picked (as of 11.08.2019, this is ami-0ac05733838eabc06):

aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' --subnet-id $SUBNETID

This instance gets loaded with the SSH key that we uploaded earlier, named brand-new-key. It also gets the same environment tag, but with the convenience of adding it in the creation command, making a second command unnecessary. The --subnet-id argument specifies which subnet the networking interface should connect to. If we hadn’t specified this, a subnet in the default VPC would have been picked. We now have a functioning VM, whose status we can query by listing through the resource group, and querying for the instance ID:

INSTANCEID=$(aws ec2 describe-instances \
  --filter "Name=tag:Environment,Values=Demo" \
  --query "reverse(sort_by(Reservations, &Instances[0].LaunchTime)) | [0].Instances[0].InstanceId" \
  --output text)

The query part of this command is again relatively complicated. The reason is that, if you create a couple of VMs and terminate them, they will still appear in the list of VMs when searched by tag. That’s the reason we pick the VM that was last launched. When you create an instance and would like to know when it is actually running, you can use the handy wait feature, as follows:

aws ec2 wait instance-running --instance-ids $INSTANCEID

Now if we run the command to list group resources, we should see three entries: a VPC, a subnet and an instance. If you inspect the EC2 instance with aws ec2 describe-instances --instance-ids $INSTANCEID, you can see a couple of interesting fields. There’s the ID of course, and PrivateDnsName, but peculiarly no public IP or DNS. This is because the subnet was not configured to give instances an IP address on launch; you can see that this is so in the MapPublicIpOnLaunch field of the subnet we created, which is false. The instance we created is in a vacuum, as far as we are concerned, and cannot be contacted from anywhere. You can also see this by right-clicking on the instance in the web GUI, and clicking connect. AWS will ask you to pick a method from among an SSH client, a web SSH client, or a Java SSH client. Interestingly, the first of these shows the private IP of this instance (something like 10.0.1.12), which is in a reserved range and cannot be used for internetworking. If you pick the second option, you will see an error message telling you that the instance does not have a public IP.
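The missing public IP can also be confirmed from the CLI; with INSTANCEID set as above, the following should print None, since the field is absent from the response:

```shell
# Query the instance's public IP; for our isolated instance this prints "None".
aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicIpAddress" \
  --output text
```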

Opening a subnet to the outer world

We need to modify and extend our basic subnet in two ways in order for the instances connected to it to communicate with the internet. The first is a gateway. An internet gateway acts as a target for internet-routable traffic, and takes care of NAT (Network Address Translation). You should not confuse an internet gateway with a NAT gateway: The latter is used to connect instances in private subnets to the internet, while they are still unavailable to traffic from the outside. The default VPC has an internet gateway, as you would expect:

DEFAULTGATEWAY=$(aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=$DEFAULTVPCID" \
  --query "InternetGateways[0].InternetGatewayId" --output text)
echo $DEFAULTGATEWAY

This should print the ID of the gateway used by the default VPC. A gateway is not automatically created for a VPC, however. Our new VPC is lacking one, which we can see using the following command:

aws ec2 describe-internet-gateways --filters \
  "Name=attachment.vpc-id,Values=$VPCID"

This should return an empty list. We can create a brand new gateway for our VPC with the following commands:

GATEWAYID=$(aws ec2 create-internet-gateway --query \
  "InternetGateway.InternetGatewayId" --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID \
  --internet-gateway-id $GATEWAYID

Now we have a gateway that is attached to our VPC. The next thing we need is a means for the networking logic to route the requests that are meant for the internet through this gateway. This is the job of the route table. Every VPC comes with a default route table (see here for details). We can see how these rules look by first looking at the settings for the default VPC and its subnets:

aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$DEFAULTVPCID"

In the Routes entry of the resulting output, you should be able to see two entries. The first of these has the field DestinationCidrBlock set to 172.31.0.0/16, which is the CIDR of the VPC itself (you can verify this with the command aws ec2 describe-vpcs --vpc-ids $DEFAULTVPCID --query "Vpcs[0].CidrBlock"). The GatewayId of this rule is local, meaning that it will route traffic locally. The second rule has 0.0.0.0/0 as DestinationCidrBlock, and its GatewayId is equal to DEFAULTGATEWAY. Since the rules in a routing table take precedence in order of specificity, this second rule applies to all requests that are not meant for the VPC’s IP range. Since, as mentioned above, every VPC has a route table, we do not need to create a new one, and can instead modify the existing route table:

ROUTETABLEID=$(aws ec2 describe-route-tables \
  --filter "Name=vpc-id,Values=$VPCID" \
  --query "RouteTables[0].RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID \
  --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID

With the last create-route command, we are telling the network to route requests that are not addressed to an interface in the VPC to the gateway identified by GATEWAYID. As we are modifying the default route table, there is no need to explicitly associate it with the subnets of the VPC which we want to make public, because in the absence of explicit associations, subnets use the default route table. This association is also not displayed in the result of aws ec2 describe-route-tables, which is why we cannot demo it for the default network. If we were creating a new route table, however, the following command would be necessary for such an association:

aws ec2 associate-route-table  --subnet-id $SUBNETID \
  --route-table-id $ROUTETABLEID
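To double-check that the new route has actually been added, one can print the destination/target pairs of our route table (assuming ROUTETABLEID is set as above); the 0.0.0.0/0 entry should point at our gateway:

```shell
# Compact view of the route table's rules: destination CIDR and target gateway.
aws ec2 describe-route-tables --route-table-ids $ROUTETABLEID \
  --query "RouteTables[0].Routes[].[DestinationCidrBlock,GatewayId]" \
  --output text
```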

One last step is necessary to make sure that the instances we start in the subnet are getting public IPs. The following will modify the subnet to make sure that is the case:

aws ec2 modify-subnet-attribute --subnet-id $SUBNETID \
  --map-public-ip-on-launch

Normally (as in, in most cases, and definitely for the default VPC), an instance that gets a public IP address is also given a public DNS name; this public DNS name of an instance can be queried through the PublicDnsName field. Sometimes, however (the documentation is not clear on when and how), the relevant fields on the VPC are not set properly on creation. In order to make sure that your instance gets not only an IP address but also a DNS name, you should set the proper configuration values with the following commands:

aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support

As far as I can understand from the documentation, it is not possible to attach a public IP to a running instance from the subnet pool. You can use an Elastic IP, but that’s out of scope for this post. Instead, we will simply delete the running instance, and create a new one:

aws ec2 terminate-instances --instance-ids $INSTANCEID
INSTANCEID=$(aws ec2 run-instances --image-id $AMIID --count 1 \
    --instance-type t2.micro --key-name brand-new-key \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=Demo}]' \
    --subnet-id $SUBNETID --query "Instances[0].InstanceId" --output text)
aws ec2 wait instance-running --instance-ids $INSTANCEID

Let’s check whether our instance now has a public IP address and DNS:

IPADDRESS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicIpAddress" --output text)
PUBLICDNS=$(aws ec2 describe-instances --instance-ids $INSTANCEID \
  --query "Reservations[0].Instances[0].PublicDnsName" --output text)

IPADDRESS should now be a proper IP address, and PUBLICDNS a hostname that resolves to it. Since we already waited for the instance to start, you can, at least in principle, contact it via SSH with ssh ubuntu@$IPADDRESS or ssh ubuntu@$PUBLICDNS. If you try this now, however, you will again face an empty line, without a response from the new server.

The reason for this silence is that the default security rules do not allow inbound traffic to this instance. AWS security groups are the means of controlling traffic between EC2 instances and the internet. A new VPC has a default security group, which in turn has default rules. These default rules allow all outgoing connections (and the incoming responses they cause), and all connections between instances in the same security group, but nothing else. Since we did not create a new security group (which we could have done with aws ec2 create-security-group), the new instance was automatically attached to the default security group of the VPC. All is not lost, though: if we change the rules of the security group, the change is instantly applied to any new requests. Let’s modify the security group rules, and allow TCP connections from all IP addresses on the default SSH port:

SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 22 --cidr 0.0.0.0/0

See here for more on security groups. Now you should be able to access the VM on the public IP address or DNS.
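As a final smoke test, a single command over SSH should now succeed (this assumes your SSH client picks up the private half of brand-new-key by default; otherwise add -i with its path):

```shell
# Run one command on the new instance over SSH to confirm connectivity.
ssh -o StrictHostKeyChecking=accept-new ubuntu@$IPADDRESS 'echo connected'
```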

Cleanup

Cleaning up is relatively straightforward if you have access to the shell session with the variables that store the resource IDs. Remove all the AWS resources we created with the following commands:

aws ec2 terminate-instances --instance-ids $INSTANCEID
aws ec2 delete-key-pair --key-name brand-new-key
aws ec2 detach-internet-gateway --internet-gateway-id $GATEWAYID \
  --vpc-id $VPCID
aws ec2 delete-internet-gateway --internet-gateway-id $GATEWAYID
aws ec2 delete-subnet --subnet-id $SUBNETID
aws ec2 delete-vpc --vpc-id $VPCID
aws resource-groups delete-group --group-name DemoEnvironment

You have to delete resources in this order, otherwise AWS will tell you that dependencies are being violated. If you don’t have access to the IDs, you can either query the individual elements via the CLI using the Environment tag, or copy the IDs from the result of aws resource-groups list-group-resources. Unfortunately, as mentioned above, there is no easy command to delete all resources in a resource group. Even worse, there is no way to delete resources by ARN, which is the identifier output of this last command.
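One caveat from running this sequence: terminate-instances only starts the termination, so delete-subnet or delete-vpc may initially fail with a DependencyViolation error while the instance’s network interface still exists. Waiting for the instance to disappear first avoids this:

```shell
# Block until the instance is fully terminated before deleting its subnet and VPC.
aws ec2 wait instance-terminated --instance-ids $INSTANCEID
```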

Conclusion

The AWS CLI client is, as one would expect from the company that builds AWS, a solid piece of software. As you might have noticed from the command examples, there are some inconsistencies, such as differing names for the same kinds of arguments, or the issue with tags, but I think this is to be expected from a client that has to cover such a massive base of functionality. In the second part of this tutorial, we will look at creating a Fargate cluster using the CLI. The requirements will get more complicated as we try to create a scalable, decoupled application, and we will use many other AWS services to tackle them.