Design A Rate Limiter

I am a full-stack .NET & React developer. Till now, I have over 5 years of experience working as a professional. Currently, I am working at Vivasoft Ltd. as a Software engineer.
In a network system, a rate limiter is used to control the rate of traffic sent by a client or service. Rate limiting can be implemented using different algorithms. The algorithm to be used depends on the application's rate-limiting requirements. There are many algorithms, each has its pros & cons. I have discussed popular algorithms on my blog:- Rate-limiting algorithms
Design Requirements:
Design a server-side rate limiter.
Accurately limit excessive requests.
It will support a different set of throttle rules.
It must be able to handle a large number of requests.
Inform users if they are throttled.
It should be highly available and fault-tolerant.
Distributed rate limiting. Rate limit multiple servers.
High-Level Design:

From a very abstract concept, the rate limiter will function as described above picture. If a new request arrives and it is under the limit, the request will be successful; otherwise, it will be rate-limited.
The basic idea of rate limiting is simple. We need a counter to keep tract of the number of requests from a user, IP address, etc. If the counter is larger than the limit, the request is dropped.
The rate limiter needs to store the counters. A better choice is Redis in-memory database. It is faster & supports time-based expiration. It has handy commands that greatly help in implementation.
INCR => It increase the stored counter by 1
EXPIRE => It sets a timeout for the counter

When a request arrives, the rate limiter will fetch the corresponding bucket from Redis and check if the limit is reached or not.
If the limit is reached, the request is rejected.
If limit is not reached, the rate limiter will send the request to the Server. Meanwhile the system increments the counter and saves it back to Redis.
The high-level design does not show:
How are rate-limiting rules created & stored?
How to handle requests that are rate-limited.
Usually, the rules are written in configuration files and saved on disks. We have to pull the rules from that storage.
Rule examples:
domain: messaging
descriptors:
- key: message_type
value: marketing
rate_limit:
unit: day
requests_per_unit: 5
domain: auth
descriptors:
- key: auth_type
value: login
rate_limit:
unit: minute
requests_per_unit: 5
In the above examples, the first one is configured to allow a maximum of 5 marketing messages per day. In the second example, a client is not allowed to log in 5 times in a minute.
Detailed Design:

Rules are stored on the disk. Workers frequently pull the rules and store them in the Cache. When a client sends a request to the server, the request is sent to the rate limiter first.
Rate limiter loads the rules from cache. It fetches counters and the last request timestamp from Redis. Based on the data, it decides:
If the request is not rate-limited, it is forwarded to the server.
If the request is rate-limited, the rate-limiter returns ‘429 too many requests‘ error to the client. In the meantime, the request is either dropped or forwarded to the Queue for further processing.
Rate limiting in a distributed environment:
Till now our design will work fine for a single server. It is difficult for scalling the rate-limiting system to support multiple servers. Again handling concurrent thread is another challenge.
For designing a distributed rate limiter, 2 main challenges come first.
Race condition:
In a distributed environment race condition is an obvious issue. Locks are the simplest solution. But looks will significantly slow down the system. Using Lua script and sorted set data structure in Redis, it can be solved without locks.
Synchronization:
For handling millions of users, one rate limiter server is not enough. We need multiple servers. When multiple servers are used, synchronization is a must.
Suppose, client’s first request goes to server 1, and the second request goes to server 2. Each server knows that the client made 1 request, but actually 2. So we need a synchronization for the counter.
One possible solution is using a sticky session that allows a client to send traffic to the same rate limiter. This solution is not scalable and flexible. Hence, it is not advised.
A better approach is to use a centralized data store like Redis. All rate limiters will store state in the central Redis store.
Final Design:

Here we have multiple rate limiters. Since the rate limiter should be fault-tolerant, we will use multiple Redis replicas and multiple rule cache replicas.
Improvements:
Multi-data centers:
The system should perform well. So we have to optimize performance. We need multiple data centers because users who are located far away from the data center will observe high latency. Cloudflare has more than 200 geographically distributed edge servers.
Monitoring:
The rate limiter should be monitored thoroughly. We have to gather analytics to ensure that
The rate-limiting algorithm is effective.
The rate-limiting rules are effective.
According to business needs, we have to decide what to do with the request that is rate-limited. If we need to process them later, we can put them in the message queue. Otherwise, simply drop the request.

