
Load Balancing

  • Distributes incoming requests amongst multiple servers to optimise resource utilisation
  • Why is it needed?
    • It distributes requests among multiple servers so that no single system is overloaded, which results in high availability.
    • It reduces response time since no single system is overworked.
    • It provides the ability to dynamically add or remove servers, which helps in making systems fault tolerant and scalable.

Load Balancing Algorithms

  • There are 2 types of load balancing algorithms:
    1. Dynamic Load Balancing:
      • This considers the current state of each server when distributing incoming traffic.
      • These are complex compared to static ones but provide better fault tolerance and performance.
      • These are most efficient when certain tasks or requests take longer to execute than others.
      • Ex. dynamic round robin, least connections, weighted response time.
    2. Static Load Balancing:
      • These don't consider the current state of the servers.
      • These algorithms are very simple and efficient when each task takes the same time to execute.
      • Ex. round robin, consistent hashing, weighted round robin.
Simple Hashing:
  • There are N servers, each labelled from 0 to N-1.
  • Create a hash function H(x) which takes a request x and returns a uniformly random value.
  • Every time a request comes in we compute H(x) % N. This gives us a value y (between 0 and N-1). We then route the request to server y.
  • As the output of the hash function is uniformly random, if we have M requests then each server will get M/N requests, so the load factor (a number indicating the amount of time needed to execute a service or request) will be 1/N.
  • Limitations:
    • For the same input the hash function will always give the same output, so the same client request is redirected to the same server. To improve performance each server uses a cache to store relevant information. For e.g., if client x is directed to server y then y will cache some data for x.
    • But what happens when we add a new server? Now we need to compute H(x) % (N+1). This causes a lot of client requests to be routed to different servers, so almost all servers have to recompute their cached data.
    • It is very inefficient to recompute the cache every time we add or remove servers. Therefore this method does not provide the flexibility to add or remove servers.
    • On average, number of remappings = number of requests.
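The scheme above can be sketched in a few lines of Python (the server names and the MD5-based hash are illustrative assumptions, not part of any particular load balancer):

```python
import hashlib

def route(request_key: str, servers: list) -> str:
    # H(x): hash the request key to a (roughly uniform) integer.
    h = int(hashlib.md5(request_key.encode()).hexdigest(), 16)
    # Route the request to server H(x) % N.
    return servers[h % len(servers)]

servers = [f"server-{i}" for i in range(3)]  # N = 3

# The same request always lands on the same server (good for caching).
assert route("client-42", servers) == route("client-42", servers)

# But adding one server (N -> N+1) remaps most keys, invalidating caches.
keys = [f"client-{i}" for i in range(1000)]
moved = sum(route(k, servers) != route(k, servers + ["server-3"]) for k in keys)
print(f"{moved} of 1000 keys remapped")  # roughly N/(N+1), i.e. ~3/4 of them
```

The last two lines make the limitation concrete: going from 3 to 4 servers changes the modulus, so most keys land on a different server.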
Consistent Hashing:

Ref: https://www.youtube.com/watch?v=zaRkONvyGr8

  • This algorithm improves upon simple hashing by reducing the reallocation of requests to servers when a server is added or removed. This provides the flexibility to add or remove servers.
  • In simple hashing we map each client request to an array ranging from 0 to N-1. In consistent hashing we have a circular array ranging from 0 to M-1.
  • Each client and server has a unique ID. We hash these IDs and place the clients and servers on the circular array. For each client request we go clockwise, find the nearest server, and route the request to that server, so the (average) load factor in this case is 1/N.
  • Limitations:
    • Although the load factor is 1/N in the average case, in practice we may get a skewed distribution. This might cause a few servers to be overloaded with lots of requests while other servers sit idle.
    • This approach won't work if servers have different capacities, since it distributes the load evenly.
  • How we can improve:
    • To prevent this skewed distribution we can use the concept of virtual servers.
    • We want to increase the number of servers, but getting actual servers is expensive. So instead of generating 1 ID we create K different IDs for a single server, hash these K IDs, and place them on the circular array. This reduces the chances of the load being skewed.
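A minimal consistent-hash ring with virtual servers might look like the sketch below (the MD5 hash, K = 100 virtual IDs per server, and the server names are assumptions made for illustration):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Place K virtual IDs per server on the circular array, sorted by position.
        self._ring = sorted((_hash(f"{s}#vnode{i}"), s)
                            for s in servers for i in range(vnodes))

    def get(self, request_key: str) -> str:
        # Walk clockwise from the request's position to the nearest (virtual) server.
        idx = bisect.bisect(self._ring, (_hash(request_key),))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["a", "b", "c"])
keys = [f"user-{i}" for i in range(1000)]

# Adding a server remaps only ~1/(N+1) of the keys, not almost all of them.
bigger = ConsistentHashRing(["a", "b", "c", "d"])
moved = sum(ring.get(k) != bigger.get(k) for k in keys)
print(f"{moved} of 1000 keys remapped")  # roughly a quarter, vs ~3/4 with simple hashing
```

The `idx % len(...)` makes the array circular: a request hashed past the last virtual server wraps around to the first one.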
Round Robin:
  • Iterate through the servers and send requests to them one by one. Once it reaches the last server it starts again from the first server.
  • E.g., Request 1 is routed to Server 1. Request 2 is routed to Server 2. Request 3 is routed to Server 3. Request 4 is routed to Server 1 and so on.
  • Benefits:
    • It is easy to implement
  • Limitations:
    • This algorithm assumes that servers are similar enough to handle equivalent loads. This might cause servers with less capacity to overload and fail more quickly while other servers remain idle.
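The cycle described above is trivial to implement, e.g. with Python's `itertools.cycle` (server names are placeholders):

```python
from itertools import cycle

class RoundRobin:
    """Send requests to servers one by one, wrapping around at the end."""
    def __init__(self, servers):
        self._servers = cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)

rr = RoundRobin(["server-1", "server-2", "server-3"])
order = [rr.next_server() for _ in range(4)]
print(order)  # ['server-1', 'server-2', 'server-3', 'server-1']
```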
Weighted Round Robin:
  • This algorithm is used when we need to route requests to servers having different capacities.
  • In this we assign each server some weight, this weight is directly proportional to the capacity of the server. It then allocates requests in a cyclic manner similar to round robin.
  • e.g., Server X is assigned weight 4. Server Y is assigned weight 1. Server Z is assigned weight 2. Suppose we receive 15 requests.
    • Over two full cycles, Server X is routed 8 requests, Server Y 2 requests and Server Z 4 requests; the remaining request is routed to Server X.
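One simple way to implement the weighted cycle is to expand each server into the schedule `weight` times (a sketch; production balancers often interleave more smoothly):

```python
from collections import Counter

class WeightedRoundRobin:
    def __init__(self, weights: dict):
        # e.g. weight 4 means the server appears 4 times per cycle.
        self._schedule = [s for s, w in weights.items() for _ in range(w)]
        self._i = 0

    def next_server(self) -> str:
        server = self._schedule[self._i % len(self._schedule)]
        self._i += 1
        return server

wrr = WeightedRoundRobin({"X": 4, "Y": 1, "Z": 2})
counts = Counter(wrr.next_server() for _ in range(15))
print(counts)  # X: 9, Y: 2, Z: 4 — matching the 15-request example above
```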
Geo Based:
  • This routes requests to servers in a specified geographic locality.
  • Limitations:
    • This doesn't give an even distribution when the load is geographically skewed.
Least Connections:
  • This is a dynamic load balancing algorithm.
  • Track the connections each server currently has and distribute the next request to the server with the fewest active connections.
  • If there are multiple servers with the fewest connections then it follows the round robin approach.
  • E.g., Server A has 3 active connections, B has 2 active connections and C has 1 active connection. Suppose we receive 3 client requests.
    • Request 1 is routed to server C. (Now C has 2 connections.) Request 2 is routed to server C. (Now C has 3 connections.) Request 3 is routed to server B. (Now B has 3 connections.)
  • Benefits:
    • This helps in distributing the load evenly, without causing skewed allocation.
    • This helps even when different requests take different amounts of time to complete.
  • Limitations:
    • It has a complex algorithm compared to static load balancing.
    • It needs more processing power.
    • It only considers the number of active connections, not the capacities of the servers.
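The bookkeeping can be sketched as below (for brevity this sketch breaks ties by insertion order rather than true round robin, which is an assumption on my part):

```python
class LeastConnections:
    def __init__(self, servers):
        # Track active connections per server.
        self.active = dict.fromkeys(servers, 0)

    def route(self) -> str:
        # Pick the server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str):
        # Call this when a connection finishes.
        self.active[server] -= 1

lb = LeastConnections(["A", "B", "C"])
lb.active.update({"A": 3, "B": 2, "C": 1})  # the starting state from the example
first = lb.route()
print(first)      # 'C' — it had the fewest connections
lb.route(); lb.route()
print(lb.active)  # every server ends up with 3 active connections
```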
Weighted Least Connection:
  • This is similar to the least connection algorithm except each server has some weight which is directly proportional to the capacity of the server.
  • If there are multiple servers with the fewest connections then it routes the request to the server with the largest weight.
  • e.g., Server A is assigned weight 7. Server B is assigned weight 4. Server C is assigned weight 3.
    • Initially, Server A has 3 active connections. Server B has 2 active connections. Server C has 2 active connections. Suppose we receive 3 client requests.
    • Request 1 is routed to server B. (It has the fewest connections and weight(B) > weight(C).)
    • Request 2 is routed to server C. (It has the fewest connections.)
    • Request 3 is routed to server A. (All servers are tied and weight(A) > weight(B) > weight(C).)
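The tie-breaking rule above can be expressed as a single sort key: fewest active connections first, then largest weight (a sketch using the example's numbers):

```python
class WeightedLeastConnections:
    def __init__(self, weights: dict):
        self.weights = weights                   # capacity-proportional weights
        self.active = dict.fromkeys(weights, 0)  # active connections per server

    def route(self) -> str:
        # Fewest active connections wins; ties go to the larger weight.
        server = min(self.active,
                     key=lambda s: (self.active[s], -self.weights[s]))
        self.active[server] += 1
        return server

lb = WeightedLeastConnections({"A": 7, "B": 4, "C": 3})
lb.active.update({"A": 3, "B": 2, "C": 2})  # the starting state from the example
order = [lb.route() for _ in range(3)]
print(order)  # ['B', 'C', 'A'], as in the example above
```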

Sticky vs Non-Sticky sessions

  • In the case of sticky sessions, all your requests are directed to the same physical web server, while a non-sticky load balancer may choose any web server to serve your requests.
  • Explanation:
    • When your website is served by only one web server, for each client-server pair, a session object is created and remains in the memory of the web server. All the requests from the client go to this web server and update this session object. If some data needs to be stored in the session object over the period of interaction, it is stored in this session object and stays there as long as the session exists.
    • However, if your website is served by multiple web servers which sit behind a load balancer, the load balancer decides which actual (physical) web server each request should go to. For example, if there are 3 web servers A, B and C behind the load balancer, different requests to www.mywebsite.com may be served from server A, server B or server C.
    • Now, if the requests are being served from (physically) 3 different servers, each server has created a session object for you and because these session objects sit on three independent boxes, there's no direct way of one knowing what is there in the session object of the other. In order to synchronize between these server sessions, you may have to write/read the session data into a layer which is common to all - like a DB. Now writing and reading data to/from a db for this use-case may not be a good idea. Now, here comes the role of sticky-session.
    • If the load balancer is instructed to use sticky sessions, all of your interactions will happen with the same physical server, even though other servers are present. Thus, your session object will be the same throughout your entire interaction with this website.

Database Sharding:

  • This is a process of dividing data into partitions which can be stored in multiple database instances.
  • It uses some key to partition data (e.g. userId, Geo Locations etc.). This key is an attribute of data that we are storing.
  • How it works?
    • Suppose, there are 1000 users in your database and you have 5 database servers. You want to shard the data on userID. So you can partition the data in the following manner
      • userID 000 - 199 -> database 1.
      • userID 200 - 399 -> database 2.
      • userID 400 - 599 -> database 3.
      • userID 600 - 799 -> database 4.
      • userID 800 - 999 -> database 5.
    • Now if userID 546 wants to perform read/write operations, it will only connect to database instance 3. And since each instance holds only 200 userIDs, query processing will be fast.
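The routing logic for this layout is a single arithmetic step (the database names and the 200-ID range size follow the example above):

```python
def shard_for(user_id: int, range_size: int = 200, num_shards: int = 5) -> str:
    """Map a userID to its database by range: 0-199 -> database-1, 200-399 -> database-2, ..."""
    if not 0 <= user_id < range_size * num_shards:
        raise ValueError(f"userID {user_id} out of range")
    return f"database-{user_id // range_size + 1}"

print(shard_for(546))  # database-3, as in the example above
```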

Types of Sharding Architectures:

Range based sharding
  • In this we partition data based on ranges of the key.
  • This is easy to implement, but data may not be evenly distributed across shards.
Key based sharding or Hash based sharding
  • In this method we generate a hash value of the key (an attribute of the data). This hash value determines which shard is used to store or access the data.
  • Using a simple hash function to distribute data can cause a skewed distribution. To overcome this we can use consistent hashing.
Directory based sharding:
  • In this we create a lookup table that maps each shard key to the shard holding its data.
  • It is more flexible than range- and key-based sharding, but the lookup table is a single point of failure.
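A toy version of the lookup table (the shard-key values and shard names below are made up for illustration):

```python
class DirectoryShardRouter:
    def __init__(self):
        self._directory = {}  # shard key -> shard (the lookup table)

    def assign(self, key, shard):
        # Flexibility: any key can be assigned (or reassigned) to any shard.
        self._directory[key] = shard

    def shard_for(self, key):
        # Every lookup goes through this table — hence the single point of failure.
        return self._directory[key]

router = DirectoryShardRouter()
router.assign("EU", "shard-1")
router.assign("US", "shard-2")
print(router.shard_for("EU"))  # shard-1
router.assign("EU", "shard-3")  # rebalancing is just a table update
print(router.shard_for("EU"))  # shard-3
```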

Difference between Horizontal Partitioning and Sharding

  • In horizontal partitioning we split the table into multiple tables in the same database instance whereas in sharding we split the table into multiple tables across multiple database instances.
  • In horizontal partitioning we use the same database instance, so the names of the partitioned tables have to be different. In sharding, since the tables are stored in different database instances, the table names can be the same.

Advantages of Sharding:

  • High availability - Even if one shard fails, other shards are still functioning and can process queries, so the database as a whole remains partially functional.
  • Security - Users can be restricted to certain shards, so you can implement different access control mechanisms for different shards.
  • Faster query processing - Since the size of the database on each server is small, the size of the index is also small. This results in faster query processing.
  • Increased read and write throughput - Both read and write capacity increase as long as operations are done on one shard.
  • High scalability - Partitioning the data and storing it in different shards provides scalability in terms of data and memory (because the load is spread across multiple machines, memory usage on each shard is lower and the network bandwidth does not saturate).

Disadvantages of Sharding:

  • Complexity - The server has to know how to route a query to the appropriate shard. If we add the code for finding the shard to the server, it makes the server more complex.
  • Transactions and rollback - We can't process queries spanning two tables that live in different shards, so transactions across shards are not possible, and therefore neither are rollbacks.
  • Joins across shards - If we want to join tables from two different shards, the query needs to go to both shards, pull the data, and join it across the network. This is a very expensive operation.
  • Infra cost - Sharding requires more machines and computing power than a single database server. Without proper optimization the increase in cost can be significant.
  • Fixed number of shards - it is hard to change the number of shards later.
    • A solution to this is consistent hashing.
    • Or a memcached database.

Improved sharding techniques:

Hierarchical Sharding
  • It is very difficult to increase/decrease the number of shards, so we can only have a fixed number of shards.
  • Since the number of shards is fixed, one shard could grow too big. To solve this we can shard the large shard itself. Every shard has a manager which maps requests to the correct mini shard. This is known as hierarchical sharding.
Master Slave architecture
  • If we want to query data from a shard even when a database instance goes offline, we can use a master-slave architecture.
  • There are multiple slaves which copy the master. Write requests always go to the master, while read requests are distributed evenly across the slaves. If the master fails, the slaves elect a new master among themselves.

Best Practices:

  • On each shard, create an index on the attribute you query by, even when it is different from the attribute used as the shard key.

Single Point of failure

  • A single point of failure (SPOF) in computing is a critical point in the system whose failure can take down the whole system. A lot of resources and time are spent on removing single points of failure in an architecture.
  • Single points of failure often pop up when setting up coordinators and proxies. These services help distribute load and discover services as they come and leave the system. Because of the critical centralized tasks of these services, they are more prone to being SPOFs.
  • How to avoid/remove a single point of failure?
    • One way to mitigate the problem is to add more nodes/instances of every component in the service.
    • Another approach is to use a master-slave architecture where we have backups that allow a quick switchover on failure. These backups are useful for components dealing with data, like database servers.
    • If we have multiple nodes for servers we need a load balancer to divert requests efficiently, but then the load balancer itself can become the single point of failure. In that case we can have multiple load balancers which are registered through DNS.
    • Even if we follow all of the above, there is still a chance of physical damage to the systems where they are housed. In that case we can deploy such systems across multiple regions.
  • It is important to note that the CAP theorem does not allow removing SPOFs if perfect consistency is required.

Service discovery and Heartbeats

  • In the real world, availability should take precedence over efficiency.
  • We can have a health service which periodically checks all servers to see if they respond. If a server doesn't respond, it marks that server as critical; if two consecutive calls get no response, it marks the server as dead and triggers a workflow to restart it.
  • The health service also tells the load balancer that the server is dead so incoming requests are transferred to some other server.
  • There is a chance the server is still up but the service deployed on it has stopped working. We can't determine this just by checking the server, so we make the check two-way: the health service checks the server, and the service inside the server also sends a signal (heartbeat) to the health service. That way we know both the server and the service are running correctly.
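The critical/dead escalation described above can be sketched as a small state machine (the status names and the restart hook are illustrative assumptions, not a real health-check API):

```python
class HealthService:
    def __init__(self, servers):
        self.misses = dict.fromkeys(servers, 0)
        self.status = dict.fromkeys(servers, "healthy")

    def record_check(self, server: str, responded: bool):
        if responded:
            # Any successful response (or heartbeat) resets the server to healthy.
            self.misses[server] = 0
            self.status[server] = "healthy"
            return
        self.misses[server] += 1
        if self.misses[server] >= 2:
            # Two consecutive missed checks: mark dead and trigger recovery.
            self.status[server] = "dead"
            self.on_dead(server)
        else:
            self.status[server] = "critical"

    def on_dead(self, server: str):
        # Hook: start the restart workflow and tell the load balancer
        # to drain traffic from this server (stubbed out in this sketch).
        pass

hs = HealthService(["s1"])
hs.record_check("s1", responded=False)
print(hs.status["s1"])  # critical
hs.record_check("s1", responded=False)
print(hs.status["s1"])  # dead
hs.record_check("s1", responded=True)
print(hs.status["s1"])  # healthy
```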