
Discovering AWS with the CLI Part 2: ECS and Fargate


In the first part of this tutorial, we looked at provisioning AWS EC2 resources using the CLI client, and delved into the details of how various networking components function. In this second part, we will look at using containers instead of virtual machines to deploy applications. In recent years, containers have become the predominant form of delivering server-side software, due to their versatility and limited resource use. Docker in particular has made it possible to package services and online applications so that they can be distributed from a central repository and replicated with very little effort. ECS (Elastic Container Service) is AWS’s entry into the container orchestration space, alongside alternatives such as Kubernetes and Mesos. There are two different ways to use ECS: the old way, where you have to provision the computing resources manually, and the new way, where AWS is responsible for running the infrastructure. We will use the latter method, which is named Fargate.

As in the first part of the tutorial, we will be using the AWS CLI; you should install it and set up the necessary credentials and environment variables using the tips from the first post. In order to build the container images that will be deployed on Fargate, you will need Docker; it can be installed by following the standard installation instructions. The necessary files for the container images and the demo applications are in this sample repo. Finally, as with the first part, you can find all the commands in this tutorial in a bash script in the same repository.

Organization of Fargate

As mentioned above, Fargate is a launch type, i.e. a method of deploying containers on ECS, the Elastic Container Service. In ECS, applications are deployed as tasks, which are collections of containers working together, similar to pods in Kubernetes. Tasks run on clusters, groups of container and networking infrastructure that can span multiple AZs in a region (see here for a diagram of how ECS clusters are organized). A set of tasks that are scheduled according to a scaling strategy, and on which load is distributed, is a service. There are two launch types on ECS. Fargate, the one we will use, offloads the management of computational resources to AWS, and leaves only the work of defining tasks and services, in addition to networking, to the user. The EC2 launch type, on the other hand, requires the user to create and manage the VMs on which the containers run.

An important component of ECS is the container agent. This agent is installed on the EC2 instances on which the tasks run, and is responsible for pulling, running and stopping the containers. When using the EC2 launch type, it’s the user’s duty to install and run the agent; Fargate absolves the user of this task by automating it. You nevertheless need to be aware that this agent is doing work for you in the background, as we will see later.

Preliminary commands

There are two things we need to take care of at the start. The first is picking a region. Many commands, such as the creation of subnets or VPC endpoints, require the explicit specification of a region, which we would like to simplify by putting it into a variable, as in REGION=eu-central-1. The second preliminary is a bit more complicated. ECS is moving to a longer ARN format, and tagging services is possible only with the new format, which for some reason you have to opt in to. You can opt in either using the web console (the Account Settings tab in the ECS service view), or by running the following command:

aws ecs put-account-setting-default --name serviceLongArnFormat --value enabled

Warning: This will set the option as the default for all the IAM users on the account. If you don’t want to do this, you should change it on the web console for the specific IAM user and use the API keys of that user in the rest of the tutorial.
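
If you want to check what is currently in effect for your user, the following query should work (a quick sketch; the exact output shape may differ between CLI versions):

aws ecs list-account-settings --name serviceLongArnFormat --effective-settings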

Creating a repository

Containers are distributed by building an image and uploading it to a container repository, from which they can then be downloaded. Repositories on AWS are provided by ECR, the Elastic Container Registry (not Repository, since a registry is a collection of repositories). Each AWS account has a single registry, which can house many repositories; you can’t delete this registry, or add any new registries. If you want to push images for a service, you need to create a repository for it. Let’s go ahead and create a repository for the static-app (which is just Nginx with an index page, but I named it app for some reason, and now it’s too late to change) in the sample code repo:

aws ecr create-repository --repository-name static-app \
  --tags Key=Environment,Value=Demo

STATICAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names static-app \
  --query "repositories[0].repositoryUri" --output text)

As you can see, we are sticking to the habit of setting the Environment tag to Demo for all our resources, as in the previous installment. The CLI also lets you log into the repository you just created without having to deal with a complicated process, using the ecr get-login subcommand. The output of this command is itself a command you can use to log your Docker client into the registry. You can avoid the extra copy-paste by executing the return value of this command, as follows:

$(aws ecr get-login --region $REGION --no-include-email)

Be mindful of the --no-include-email option, as the command returned without it is not valid. Now it’s time to build and push a container to this registry. In the directory of the static-app, there is a Dockerfile that you can use to create an image. Once you have checked out the sample repository, run the following commands from its root directory:

docker build -t $STATICAPPREPOURL:0.1 static-app/
docker push $STATICAPPREPOURL:0.1
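
If you want to confirm that the push went through, you can list the image tags now stored in the repository; a small sanity check:

aws ecr describe-images --repository-name static-app \
  --query "imageDetails[].imageTags" --output text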

We now need to deploy this image on Fargate. The first resource we need to create is a cluster. We will use the name demo-cluster for our cluster:

aws ecs create-cluster --cluster-name demo-cluster --tags key=Environment,value=Demo

If you now run aws ecs list-clusters, it should show your brand new cluster as the only entry.
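
You can also query the cluster’s status directly, which should come back as ACTIVE:

aws ecs describe-clusters --clusters demo-cluster \
  --query "clusters[0].status" --output text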

IAM role for the ECS agent

The ECS agent mentioned above needs to carry out certain operations in order to orchestrate the task containers. Among these are checking for images in the registry, downloading these images, and creating and piping to log streams (see here for details). In order to give it the right permissions, we need to create the appropriate IAM role and allow the ECS agent to assume it. Let’s first create a role named ecsTaskExecutionRole, with a trust policy that gives the ECS tasks service the right to take on this role:

ROLEARN=$(aws iam create-role --role-name ecsTaskExecutionRole \
  --assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":[\"ecs-tasks.amazonaws.com\"]},\"Action\":[\"sts:AssumeRole\"]}]}" \
  --query "Role.Arn" --output text)

We will later use this role’s ARN in our task definitions. We now need to attach the right policy to the role. Fortunately, we don’t have to manually create the policy, or attach the individual permissions one by one, since there is a policy managed by AWS that contains all of them. We will now get the ARN of this policy, named AmazonECSTaskExecutionRolePolicy, and attach it to the role we just created:

POLICYARN=$(aws iam list-policies \
  --query 'Policies[?PolicyName==`AmazonECSTaskExecutionRolePolicy`].{ARN:Arn}' \
  --output text)
aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn $POLICYARN
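
To verify that the policy is now attached to the role, the following listing should include AmazonECSTaskExecutionRolePolicy (not strictly required, just a quick check):

aws iam list-attached-role-policies --role-name ecsTaskExecutionRole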

Registering a task definition

Having pushed a container image, created a cluster, and given the cluster agent the right permissions, what we need to do next is create a task definition. A task definition is a JSON file that specifies which containers have to be deployed together as a unit, and on which ports these containers are listening. Here is a template that will serve as the base of our task definition for the static-app container (the file static-app/task-definition.json.tmpl in the sample repository):

{
  "family": "static-app",
  "networkMode": "awsvpc",
  "executionRoleArn": "$ROLEARN",
  "containerDefinitions": [
    {
      "name": "static-app",
      "image": "$STATICAPPREPOURL:0.1",
      "portMappings": [
    {
      "containerPort": 8080,
      "hostPort": 8080,
      "protocol": "tcp"
    }
      ],
      "essential": true
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}

In this template, you need to either manually replace $ROLEARN and $STATICAPPREPOURL with the actual values, or use the file as a template, by first exporting the necessary values with export ROLEARN STATICAPPREPOURL on the command line, and then substituting them with envsubst < static-app/task-definition.json.tmpl > task-definition.json. Now we are ready to create a task definition with the following command:

TASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)

A couple of things worth pointing out in this task definition:

  • The networkMode is awsvpc, which is an AWS-native implementation of container networking. awsvpc enables tasks to connect to the AWS networking infrastructure just like VMs, over an elastic network interface (ENI), with the ability to give them private IPs and DNS entries. When using Fargate, the networkMode has to be specified as awsvpc.

  • containerPort and the hostPort have to match because we are using awsvpc; see the section Port mappings in this part of the documentation.

  • You can’t use arbitrary values for cpu and memory. See here for the combinations of values that are allowed.

  • The family field is used to generate an index for the task definition versions. When the task definition is first created, it starts at revision 1. Every request to register a task definition with the same family field increments this number by one, and this revision number can be used when a service is created or updated. This numbering is also the reason we are saving the new task revision in a variable, so that we don’t accidentally deploy old versions of our tasks. You can list the existing revisions as shown in the sketch below.
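
To see which revisions have been registered for a family, and which one is currently the latest, something like the following should do (describe-task-definition without an explicit revision returns the most recent one):

aws ecs list-task-definitions --family-prefix static-app

aws ecs describe-task-definition --task-definition static-app \
  --query "taskDefinition.revision" --output text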

Creating a service

A service is a group of tasks managed by the container orchestration system (in our case Fargate). The tasks sit behind a common interface, and incoming requests are distributed among them based on load and availability. Fargate, similar to other container orchestration systems, makes it easy to scale the number of tasks and to dedicate resources to them. In order to turn our static-app into a service, we need to use the previously created task definition, specify how the service should scale, and route requests to it.

If you thought we would be able to navigate around the networking stuff from the first post, I’m sorry to disappoint you. The first thing we need to deal with to create an online service is networking infrastructure. Let’s start with the VPC and its subnets:

VPCID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query "Vpc.VpcId" --output text)
aws ec2 create-tags --resources $VPCID --tags Key=Environment,Value=Demo

# We will need this later when we deploy services with DNS to our VPC
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id $VPCID --enable-dns-support

SUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.1.0/24 \
  --availability-zone "${REGION}b" \
  --query "Subnet.SubnetId" --output text)
SUBNET2ID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.2.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
PRIVATESUBNETID=$(aws ec2 create-subnet --vpc-id $VPCID --cidr-block 10.0.3.0/24 \
  --availability-zone "${REGION}c" \
  --query "Subnet.SubnetId" --output text)
aws ec2 create-tags --resources $SUBNETID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $SUBNET2ID --tags Key=Environment,Value=Demo
aws ec2 create-tags --resources $PRIVATESUBNETID --tags Key=Environment,Value=Demo

Here, we are laying down the networking infrastructure for the rest of the tutorial; this is the reason for creating three subnets. As we will see later, application load balancers require at least two subnets, hence the two public ones. These two subnets also need to be in different availability zones for reliability; this is why we are distinguishing them using a single extra letter, as explained in the AWS documentation on regions and AZs. The private subnet will be used to host an internal service. Now let’s create an internet gateway, which we need for the communication between the services on the VPC and the rest of the Internet, as explained in the first part of this post:

GATEWAYID=$(aws ec2 create-internet-gateway --query "InternetGateway.InternetGatewayId" \
  --output text)
aws ec2 create-tags --resources $GATEWAYID --tags Key=Environment,Value=Demo
aws ec2 attach-internet-gateway --vpc-id $VPCID --internet-gateway-id $GATEWAYID

Once we have the gateway, we need to create a route table that routes traffic through it, associate this table with the public subnets, and allow ingress on the VPC’s default security group:

ROUTETABLEID=$(aws ec2 create-route-table --vpc-id $VPCID \
  --query "RouteTable.RouteTableId" --output text)
aws ec2 create-tags --resources $ROUTETABLEID --tags Key=Environment,Value=Demo
aws ec2 create-route --route-table-id $ROUTETABLEID --destination-cidr-block 0.0.0.0/0 \
  --gateway-id $GATEWAYID
aws ec2 associate-route-table  --subnet-id $SUBNETID --route-table-id $ROUTETABLEID
aws ec2 associate-route-table  --subnet-id $SUBNET2ID --route-table-id $ROUTETABLEID
SECURITYGROUPID=$(aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPCID \
  --query "SecurityGroups[0].GroupId" --output text)
aws ec2 authorize-security-group-ingress --group-id $SECURITYGROUPID \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

Now that we have the necessary networking elements and security rules, we can go ahead and create our first service, based on the static-app task definition:

aws ecs create-service --cluster demo-cluster --service-name static-app-service \
  --task-definition static-app:$TASKREVISION --desired-count 1 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}'\
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$SUBNETID],securityGroups=[$SECURITYGROUPID],assignPublicIp=\"ENABLED\"}"

aws ecs wait services-stable --cluster demo-cluster --services static-app-service

Let’s go through some of the arguments:

  • The launch type is FARGATE, which we also specified as a required compatibility in the task definition.

  • The scheduling-strategy argument lets us specify how tasks are instantiated and maintained. The REPLICA strategy tells Fargate to keep desired-count (another argument) instances of the task running. We can increase or decrease this number as need be, and Fargate will take care of starting, stopping and (together with a load balancer, which we will see later) routing traffic to these tasks.

  • An important aspect of a container orchestration platform is how new containers are deployed. The deployment-controller and deployment-configuration arguments are how we specify the deployment strategy. The ECS deployment controller is used for rolling deployments, in which new containers are started, and depending on whether these reach running state, old ones are stopped after draining connections to them. The numbers in deployment-configuration specify the percentage of new containers to start and old ones to stop at the same time. Refer to the documentation for details.

  • The network configuration options, required for the awsvpc infrastructure, specify that the service should attach to one of the public subnets, run under the default security group, and receive a public IP.

Once the service is created, we use the wait command, which we previously used to wait for an EC2 instance (there under ec2 wait rather than ecs wait), to wait for the service to become stable, i.e. for the number of running tasks to equal the number of desired tasks. Once this command returns, we can fetch the IP address of the service task with the following command:

aws ec2 describe-network-interfaces --filters "Name=subnet-id,Values=$SUBNETID" \
  --query 'NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp' --output text

You should now be able to access this task at the resulting IP address. We can’t yet call it a day, however. The way we are using ECS is suboptimal for a number of reasons. Because each task gets a separate IP address, clients will need to know which task has which IP to make a request (assuming that our service does something useful, of course). Load balancing between multiple tasks of a service will be difficult, as the clients need to keep track of the IPs of the tasks. There is also a clear security risk, as all tasks would have public interfaces. We will address these issues in the next section.
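
For a quick check from the terminal, you can capture the public IP in a variable and curl it. This is only a sketch; the port you need to hit depends on which port the Nginx in the sample repo actually listens on and on what the security group allows:

STATICAPPIP=$(aws ec2 describe-network-interfaces --filters "Name=subnet-id,Values=$SUBNETID" \
  --query 'NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp' --output text)
curl -s "http://$STATICAPPIP/"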

Microservices on Fargate

What we want to achieve in this section is being able to use Fargate as a microservices platform. This involves the following features that are missing from our primitive, one-public-IP-per-task setup:

  • Ingress configuration: Based on the request path, we want to be able to route requests to different services.

  • Load balancing: Both for public and private services, we want to distribute requests between the tasks in a manner independent of the client.

  • Internal DNS to implement service discovery.

Ingress and load balancing with ELB

Thanks to awsvpc networking, it is very easy to connect an ELB instance to a subnet, and assign task containers to it. The kind of load balancer we will use is called an application load balancer (ALB), which allows only HTTP and HTTPS traffic. Let’s first scale down our static-app service to zero tasks and delete it, as it is too basic for this demonstration:

aws ecs update-service --service static-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service static-app-service --cluster demo-cluster
aws ecs wait services-inactive --services static-app-service --cluster demo-cluster

An ALB is configured through three entities: Load balancer, target group and listener. The load balancer is the point of contact for the clients, and the target group gathers the target units (in our case tasks) that receive the requests. Listeners connect these two to each other, and are used to specify which conditions are used to route requests to which target groups. Now let’s create these:

LBARN=$(aws elbv2 create-load-balancer --name demo-balancer \
  --type application --subnets $SUBNETID $SUBNET2ID --security-groups $SECURITYGROUPID \
  --tags Key=Environment,Value=Demo \
  --query "LoadBalancers[0].LoadBalancerArn" --output text)

TGARN=$(aws elbv2 create-target-group --name hostname-app-tg \
  --protocol HTTP --port 80 --target-type ip --vpc-id $VPCID \
  --query "TargetGroups[0].TargetGroupArn" --output text)

aws elbv2 add-tags --resource-arns $TGARN --tags Key=Environment,Value=Demo

LISTENERARN=$(aws elbv2 create-listener --load-balancer-arn $LBARN --protocol HTTP \
  --port 80 --default-actions Type=forward,TargetGroupArn=$TGARN \
  --query "Listeners[0].ListenerArn" --output text)

We are not adding tags to the listener, as tagging listeners is not supported. As already mentioned, load balancers require at least two subnets from different zones on creation, for reasons of reliability; we are using the two subnets we created in different AZs here. The target group we create is empty, and will be populated later by a new service. We will be using a different service for demo purposes in this section; you can find it in the samples repo. This service is called hostname-app because it displays the value of the HOSTNAME environment variable; we will see why this is relevant later. Another thing we will need is a security group for internal services, through which we can control traffic between the various parts of our setup and the internet. We will allow traffic between this security group and any interfaces on the VPC network:

PRIVATESECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name private-security-group --description "Private SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --cidr 10.0.0.0/16

Finally, we need to create a new container repository for this service, push an image, and create a task description:

aws ecr create-repository --repository-name hostname-app \
  --tags Key=Environment,Value=Demo

HOSTNAMEAPPREPOURL=$(aws ecr describe-repositories \
  --repository-names hostname-app \
  --query "repositories[0].repositoryUri" --output text)

docker build -t $HOSTNAMEAPPREPOURL:0.1 hostname-app/
docker push $HOSTNAMEAPPREPOURL:0.1
export ROLEARN HOSTNAMEAPPREPOURL
envsubst < hostname-app/task-definition.json.tmpl > task-definition.json

HNTASKREVISION=$(aws ecs register-task-definition --cli-input-json file://task-definition.json \
  --tags key=Environment,value=Demo --query "taskDefinition.revision" --output text)

VPC Endpoints

We can now create a service for the hostname app, which, unfortunately, is not going to be particularly successful. Let’s go ahead and see why. Here is the command we need to create the service:

aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}'\
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo

We will go through the new arguments to the create-service command later, but first let’s query the state of the tasks that the Fargate agent starts for this service, with the following commands:

TASKARNS=$(aws ecs list-tasks --cluster demo-cluster \
  --service-name hostname-app-service --query "taskArns" --output text)
aws ecs describe-tasks --tasks $TASKARNS --cluster demo-cluster

If you do this a short time after the service is created, you will see an error message similar to the following in the field tasks[0].containers[0].reason:

"CannotPullContainerError: Error response from daemon: Get https://$REPOID.ecr.eu-central-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)"

This error is caused by Fargate not being able to fetch the container images required for the task, because there is no network path to the ECR repository. When we deployed static-app, our tasks could communicate with the rest of the Internet in a straightforward manner, as they had public IPs. In the new layout, the tasks are on a private subnet, and can be contacted only through the load balancer. It is possible to solve this issue using a NAT (Network Address Translation) gateway, but NAT gateways are relatively expensive, and require an elastic IP address. A better solution can be achieved using VPC endpoints. What VPC endpoints essentially provide is that AWS services can be reached as if they were part of a private subnet. There are two kinds of VPC endpoints: interfaces and gateways. Interface endpoints function by creating an endpoint network interface in the specified subnets. Gateway endpoints, on the other hand, function by manipulating the route tables of a VPC. Although there are two different kinds of endpoints, you as a user do not have much choice as to which to use for which service, since gateway endpoints have to be used for S3 and DynamoDB, and interface endpoints for the other services. We will therefore go ahead and create an interface VPC endpoint for ECR, and a gateway endpoint for S3, as container image layers are downloaded from S3. But first let’s delete the existing service:

aws ecs update-service --service hostname-app-service --cluster demo-cluster --desired-count 0
aws ecs delete-service --service hostname-app-service --cluster demo-cluster
# This takes some time
aws ecs wait services-inactive --services hostname-app-service --cluster demo-cluster

In order to make sure we can isolate different pieces of our cluster security-wise, let’s also create a separate security group for the endpoints, and authorize ingress and egress between the private security group and this new group:

ENDPOINTSECURITYGROUPID=$(aws ec2 create-security-group \
  --group-name endpoint-security-group --description "VPC Endpoint SG" \
  --vpc-id $VPCID --query "GroupId" --output text)

aws ec2 authorize-security-group-ingress --group-id $ENDPOINTSECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $PRIVATESECURITYGROUPID

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
  --protocol tcp --port 0-65535 --source-group $ENDPOINTSECURITYGROUPID

Through these rules, we are allowing requests into the endpoints from the private services (on all ports here, but allowing port 80 for HTTP and 443 for HTTPS should be enough).

And now let’s create the ECR and S3 VPC endpoints:

ECRENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Interface" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.ecr.dkr" \
  --security-group-ids $ENDPOINTSECURITYGROUPID --subnet-ids $PRIVATESUBNETID \
  --private-dns-enabled --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $ECRENDPOINTID --tags Key=Environment,Value=Demo
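
The S3 endpoint command below refers to $DEFAULTRTID, which we have not set anywhere yet. It is meant to be the main route table of the VPC, the one the private subnet implicitly uses since we never associated that subnet with our custom route table. A sketch of how to look it up (the association.main filter and the variable name are my choices, not something the post prescribes):

DEFAULTRTID=$(aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=$VPCID Name=association.main,Values=true \
  --query "RouteTables[0].RouteTableId" --output text)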

S3ENDPOINTID=$(aws ec2 create-vpc-endpoint --vpc-endpoint-type "Gateway" \
  --vpc-id $VPCID --service-name "com.amazonaws.${REGION}.s3" \
  --route-table-ids $DEFAULTRTID $ROUTETABLEID \
  --query "VpcEndpoint.VpcEndpointId" --output text)

aws ec2 create-tags --resources $S3ENDPOINTID --tags Key=Environment,Value=Demo

The ECR endpoint accepts a security group ID argument, for which we use the endpoint security group we just created. The S3 endpoint, on the other hand, does not accept such an argument. The question now is, how do we specify that requests from our private subnet to S3 are allowed? We can’t use IP addresses, as we don’t know which private IPs the S3 gateway will be given. Security groups are not an option either, as the gateway does not have one. The solution is to use what are called prefix lists to specify a group of IP prefixes that cover the S3 endpoints the gateway will route to. In the following, we first get the ID of the prefix list we are interested in using aws ec2 describe-prefix-lists, and then we allow requests to these IP addresses from our services using the --ip-permissions option of authorize-security-group-egress:

S3PREFIXLISTID=$(aws ec2 describe-prefix-lists --region $REGION \
  --query "PrefixLists[?PrefixListName == 'com.amazonaws.${REGION}.s3'].PrefixListId" \
  --output text)

aws ec2 authorize-security-group-egress --group-id $PRIVATESECURITYGROUPID \
    --ip-permissions IpProtocol=tcp,FromPort=0,ToPort=65535,PrefixListIds="[{Description=\"Why isnt this in the docs\",PrefixListId=${S3PREFIXLISTID}}]"

Afterwards, let’s try to create the service once more, with the command repeated here for ease of reference:

aws ecs create-service --cluster demo-cluster --service-name hostname-app-service \
  --task-definition hostname-app:$HNTASKREVISION --desired-count 2 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}'\
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --load-balancers targetGroupArn=$TGARN,containerName=hostname-app,containerPort=8080 \
  --tags key=Environment,value=Demo

aws ecs wait services-stable --cluster demo-cluster --services hostname-app-service

Let’s now go through the arguments to this command that differ from the previous one that created static-app:

  • The desired count is this time 2. The service will create two tasks for us, and incoming requests will be distributed between them by the load balancer we created.

  • The network configuration this time around specifies that the network interface should be placed on the private subnet, run under the private security group, and receive no public IP. Our containers cannot make or receive requests to/from the rest of the internet, except for the AWS services for which we created VPC endpoints.

  • The additional argument --load-balancers binds the service to the load balancer target group created earlier. Here we are specifying that the containers named hostname-app (this should align with the name field in the task definition) should be contacted on port 8080, which is the port our app listens on.

Once again, we are waiting for the service to reach a stable state where all tasks are running. Once this command has run through, we can fetch the URL of the load balancer, at which we can access the service, with the following command:

aws elbv2 describe-load-balancers  --load-balancer-arns $LBARN \
  --query "LoadBalancers[0].DNSName" --output text

You should now see a page that displays the hostname of the task that responds to the request. If you reload the page, you should see the displayed hostname alternate between two values, as consecutive requests are rotated between the two targets by the round-robin algorithm. We can scale our service by changing the number of tasks using aws ecs update-service. New tasks will be added to the service, or old ones removed, with the load balancer target group draining the connections from the removed tasks and routing traffic to the new ones. Here is an example for reducing the number of tasks to one:

aws ecs update-service --service hostname-app-service --cluster demo-cluster \
  --desired-count 1

Health Checks

One thing you have to pay attention to when creating the task and the load balancer is the health check configuration of the load balancer target group. Health checks are used by load balancers to determine which targets (in our case containers, but they could also be VMs) are healthy and should receive requests. The default health check for ALBs is whether a GET request to the index (i.e. /) path of the target returns a 200 response code. If your app does not respond to such a request in the expected manner, you can use the health check options of the create-target-group subcommand to specify a more suitable one. A tricky issue to debug is when the app is configured to bind to localhost or 127.0.0.1 instead of 0.0.0.0. When this is the case, the app will not respond to requests on the interface it is given by the Fargate agent, thus failing the health checks. New instances of the same task will be created in a loop, without the service ever reaching a stable status. So make sure that your app binds to the general 0.0.0.0 interface instead of the loopback interface.
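
As an illustration of those health check options, here is how the target group above could have been created with a custom health check. This is only a sketch; the /health path is a hypothetical endpoint, so substitute whatever your app actually exposes:

aws elbv2 create-target-group --name hostname-app-tg \
  --protocol HTTP --port 80 --target-type ip --vpc-id $VPCID \
  --health-check-path /health --health-check-interval-seconds 15 \
  --health-check-timeout-seconds 5 --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 --matcher HttpCode=200-299 \
  --query "TargetGroups[0].TargetGroupArn" --output text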

Internal DNS and Service Discovery

If we want to use Fargate as a microservice platform, we need a means to contact the tasks of a service on a private subnet under a single name, for easy server-side service discovery. To give an example, Kubernetes achieves this functionality by giving each service a DNS name that resolves to a cluster IP. This cluster IP is used to proxy connection requests for a service to one of the service pods at the node level. The way to implement similar functionality on Fargate is through the ECS service discovery API, which uses Route 53 to create VPC-local DNS entries for services. In our demo of this functionality, we will use yet another app, the random-quote-app, which returns a random quote on programming as JSON. The random-quote-app will not have a public endpoint, in order to simulate microservices. The hostname-app service has a /random-quote route that queries the random-quote-app and displays the result.

The commands for creating the random-quote-app container repository and task definition are only marginally different from those for the previous two services, so I will not repeat them here, and will instead focus on service discovery. The resources we need for DNS-based service discovery on Fargate are a namespace and a “service discovery service”, a terrible name for a straightforward concept. A service discovery service represents an ECS service in the service discovery mechanism under a name. This name, plus the namespace, is used to resolve DNS queries to the IP address of a task that belongs to the service. In the following, we first create the namespace, and then the service discovery service that attaches to it:

OPERATIONID=$(aws servicediscovery create-private-dns-namespace --name "local" \
 --vpc $VPCID --region $REGION --query "OperationId" --output text)

NAMESPACEID=$(aws servicediscovery get-operation --operation-id $OPERATIONID \
  --query "Operation.Targets.NAMESPACE" --output text)

RQSERVICEID=$(aws servicediscovery create-service --name random-quote \
  --dns-config "NamespaceId=\"${NAMESPACEID}\",DnsRecords=[{Type=\"A\",TTL=\"300\"}]" \
  --health-check-custom-config FailureThreshold=1 --region $REGION \
  --query "Service.Id" --output text)

The --name argument we supply to the aws servicediscovery create-private-dns-namespace command will be the top-level domain of the cluster DNS. Once we fetch the ID of the namespace with the second command, we can use it to create a DNS entry for our service with aws servicediscovery create-service. The --name argument to this command determines how the service is referred to. Once this second command has run, any DNS queries to random-quote.local from within the VPC will resolve to up to eight instances of random-quote-app. You should now be able to go to the load balancer’s DNS name followed by /random-quote/ and see a random quote on programming. As you can see in the app code, hostname-app uses the URL http://random-quote.local:8080 to contact the random-quote-app and fetch the quote. The port has to be included in the request, because the task to which the DNS resolves is contacted directly, without a load balancer in between.
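
One piece that is easy to miss, since the random-quote-app commands are not repeated here, is that the service discovery service has to be attached to the ECS service when the latter is created. The following is only a sketch of what that create-service call would look like, assuming a task definition family named random-quote-app whose revision is stored in $RQTASKREVISION (both of these names are my assumptions, not taken from the post):

RQSERVICEARN=$(aws servicediscovery get-service --id $RQSERVICEID \
  --query "Service.Arn" --output text)

aws ecs create-service --cluster demo-cluster --service-name random-quote-app-service \
  --task-definition random-quote-app:$RQTASKREVISION --desired-count 1 --launch-type "FARGATE" \
  --scheduling-strategy REPLICA --deployment-controller '{"type": "ECS"}' \
  --deployment-configuration minimumHealthyPercent=100,maximumPercent=200 \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATESUBNETID],securityGroups=[$PRIVATESECURITYGROUPID],assignPublicIp=\"DISABLED\"}" \
  --service-registries registryArn=$RQSERVICEARN \
  --tags key=Environment,value=Demo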

Conclusion

As mentioned in the introduction to the first part of this tutorial, the command line client for AWS can be quite useful for discovering what AWS has to offer. Once the going gets tough, however, and numerous AWS services and complicated security and network resources are involved, it gets quite difficult to keep track of the various commands and the minute ways in which they differ from each other. In another context, I have had the opportunity to implement a very similar microservice architecture using Terraform, a tool much better suited to provisioning dependent and highly connected cloud resources. It was a much better experience, and I would say that beyond simple things, and the occasional tricky feature that cannot be implemented with another tool, the CLI should be limited to discovery and prototyping. That said, I hope this tutorial helped you understand Fargate and the other relevant AWS components better.

Resources

  • This blog post gives an overview of the advantages of Fargate over ECS.

  • This blog post from the AWS team explains the nitty gritty details of container networking in Fargate.

  • Another blog post from AWS, this one explaining how to create a service registry for a Fargate cluster.

  • A detailed tutorial on connecting ECR to Fargate using VPC endpoints.

  • Deep Dive into AWS Fargate is a talk from 2018 that contains a nice overview of Fargate as compared to ECS and standard EC2, with a demo that uses CloudFormation.