April 21, 2018


Simple load balancing with Nginx and AWS

This time we’re going to configure a load balancer with Nginx. Before we start, though, we need to cover the basics of load balancing. Let’s get down to it.

A load balancer is an entity in the network that delegates traffic to backend services. The most basic ones work at the transport layer: they inspect the source and destination IP addresses and ports of the incoming packets and route them to a backend based on a set of rules. Load balancers are even more common at the application layer, usually when dealing with HTTP requests. At this layer the load balancer uses information from the HTTP request, such as the URI, content type, or destination host, to decide where the request should be routed.
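To make the transport-layer case a bit more concrete: Nginx itself, which we’ll use later at the application layer, can also balance raw TCP connections through its stream module (available when Nginx is built with it). The following is only a minimal sketch; the addresses and port are made up for illustration:

stream {
    upstream backend_tcp {
        # hypothetical backend addresses
        server 192.168.0.10:5432;
        server 192.168.0.11:5432;
    }

    server {
        # accept TCP connections on port 5432 and forward
        # each one to a server from the group above
        listen 5432;
        proxy_pass backend_tcp;
    }
}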

Strategies

There are different strategies with which a load balancer can be configured, but the most common ones are these:

Round Robin

In this strategy the requests are sent sequentially, in turn, to the backend servers. For example, suppose we have three backend servers and we receive 5 requests; a load balancer working in round-robin fashion will route the requests like this (a small Nginx sketch of this strategy follows the table):

Request #    Backend server
1            Server 1
2            Server 2
3            Server 3
4            Server 1
5            Server 2
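As a preview of the Nginx configuration we’ll build later in this post: round robin is Nginx’s default strategy, and each server can optionally be given a weight to skew the rotation. This is just a sketch with placeholder addresses:

upstream webapp {
    # round robin is the default, no extra directive needed
    server localhost:8000;
    # with a weight, this server receives roughly twice as
    # many requests as each of the others
    server localhost:8001 weight=2;
    server localhost:8002;
}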

Least connections

Under this configuration the load balancer monitors the backend servers so that it knows how much load each one is handling; it then simply routes new requests to the server with the least amount of traffic. The load of a backend server is typically measured by the number of open connections it has or by its average CPU usage.

IP Hash

For this load balancing strategy, the load balancer is usually configured so that requests coming from a certain range of IPs are routed to a set of specific backend servers based on some administrative policies. For example, some countries may require companies to route requests received from within the country to a server that is also located at a data center in the same country where the request originated. Another example could be VIP access: requests made by certain known IPs are routed to a backend service with much better performance than the one average consumers use; streaming companies could use this kind of setup, for example.

Statefulness and sticky sessions

When our application is replicated among different back-end servers behind a load balancer, one tricky thing to deal with is sessions. A session is usually created so that, during a series of request-response interactions, we know we’re talking to the same client. However, if one instance of our application at back-end server #1 handles the first request, and the load balancer then forwards the next request to back-end server #2, we would have effectively lost the user’s session. To avoid this we can use a common shared data store, for example a Redis instance where we store all the sessions; then, if a later request is handled by another back-end server, we just look up the user’s session in the Redis database and restore it from there.

But that’s not the only way to deal with this problem: we can configure the load balancer so that it keeps state about where previous requests from the same IP have been sent. This is usually achieved by having the load balancer hash the IP of the incoming request and keep a map-like data structure whose values identify the back-end server (usually by host name or IP address) that previously handled requests from that IP. Under this configuration the load balancer guarantees that requests received from the same IP will be handled by the same back-end server that processed the initial request, at least for some time, since the load balancer usually keeps a TTL after which requests may be forwarded to another back-end server, causing the session to expire.

Amazon Auto-Scaling

A load balancer can be considered a single point of failure, so it’s never a good idea to have just one instance; luckily for us, nowadays it’s pretty fast and cheap to spin up new instances.

I don’t like commercials so I will not give one, but just to comment: if our back-end server is an EC2 instance on Amazon Web Services (AWS), we can create an image of it (called an AMI) and then use the auto-scaling feature provided by AWS to configure which AMI to deploy when the load on our servers increases; it can also scale down so that idle instances are shut down. I won’t go deep into the configuration details for AWS, but I will write down the basics so that in the future I don’t forget how it’s done.

First, a launch configuration needs to be created. In it, one specifies the performance characteristics of the machines to be deployed in the auto-scaling group as well as the SSH key pair used to manage the instances (if one needs to). Then we define the auto-scaling group and point it to the launch configuration we just created; in this phase we can also set the conditions under which new instances must be spawned, for example when the average CPU usage (across all the servers in the group) goes beyond a threshold and stays there for some time.

At this point we don’t have a load balancer created, as it is a separate entity from the auto-scaling group. At the time of writing, Amazon offers the classic load balancer and the application load balancer; the difference is that the first just distributes traffic equally among all servers belonging to our auto-scaling group, whereas the application load balancer uses information from the request to determine where it should be routed.

In the AWS console we then spin up a load balancer and define the targets that it will forward requests to. After creating the load balancer we create the target groups that will then be associated with the auto-scaling group we made. Under the auto-scaling section we edit the auto-scaling group we want to link to the load balancer and specify the target groups to attach to it. In the load balancer section there’s an area where we can configure the routing rules, that is, the circumstances under which requests will be routed to a specific target group.

Using Nginx as a load balancer

The AWS route is fine, but I love the CLI and configuring tools myself; of course this is not always the path one must take, but it’s certainly the most entertaining. Let’s configure Nginx as a load balancer.

Usually a simple Nginx server configuration looks like this:

http {
    server {
        listen 80;

        location / {
            proxy_pass http://localhost:8000/;
        }
    }
}

This is a simple case where our application server, for example Gunicorn, could be listening on localhost port 8000. Now, to use Nginx as a load balancer we need to specify our back-end servers as a group in an upstream block outside of the server block, like this:

http {
    upstream webapp {
        server localhost:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://webapp/;
        }
    }
}

The interesting part is the upstream block: there we can define the other instances of our application server that will receive requests from the Nginx load balancer:

upstream webapp {
    server localhost:8000;
    server localhost:8001;
    server localhost:8002;
}

Just by doing this, Nginx will employ a round-robin load balancing strategy, but we can change that. For example, to implement sticky sessions we use the ip_hash directive (inside the upstream block) so that Nginx keeps track of where previous requests from a given IP have been sent:

upstream webapp {
    ip_hash;
    server localhost:8000;
    server localhost:8001;
    server localhost:8002;
}

We can also use least_conn to employ the least connections strategy instead, and use max_conns on each server to limit the number of open connections to that back-end server. There are other useful server directives, such as backup to indicate that a back-end server should receive requests only when the primary servers are unavailable, or down to take a server out of the back-end pool (or just comment that server line 😅).
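Putting those directives together, a sketch (with arbitrary ports) could look like this:

upstream webapp {
    # pick the server with the fewest active connections
    least_conn;
    # cap the number of concurrent connections to this server
    server localhost:8000 max_conns=100;
    # temporarily taken out of rotation
    server localhost:8001 down;
    # only used when the other servers are unavailable
    server localhost:8002 backup;
}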

Of course, this is a simple example where the services are hosted on the same host as the Nginx server; in a real-world scenario the back-end servers will live on other hosts, but regardless of that the configuration is the same.
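For instance, assuming (hypothetically) that our application servers are reachable at 10.0.0.11 and 10.0.0.12 on our private network, the upstream block simply points at those addresses instead of localhost:

upstream webapp {
    server 10.0.0.11:8000;
    server 10.0.0.12:8000;
    # host names work too, e.g. server app1.internal:8000;
}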

More information