Rate Limiting Algorithms
The 5 Most Effective Rate Limiting Strategies Explained

I am a full-stack .NET & React developer. Till now, I have over 5 years of experience working as a professional. Currently, I am working at Vivasoft Ltd. as a Software engineer.
Rate Limit:
The rate limit is a restriction that controls the number of requests, actions, or operations a user, system, or application can perform within a specific time period.
It is commonly used in computer systems, APIs, and networks to:
Prevent abuse (e.g., stopping bots or DDOS attacks)
Ensure fairness (so resources are shared evenly)
Protect system stability (avoiding overloads)
Popular Algorithms:
Leaking bucket
Token bucket
Fixed window counter
Sliding window log
Sliding window counter
Leaking Bucket:

It can be seen as a bucket with a hole in the bottom—water leaks from the bottom at a fixed rate. If water pours in at a higher rate, then it creates overflow naturally.

This is what happens in this algorithm. In a computer system, the bucket is implemented using a First In First Out data structure (QUEUE). The queue has a fixed size. The requests are processed by taking from the queue. If the queue is full and the next request comes in, then the request is discarded.
Pros:
Stable traffic control
Easy to understand & implement
Cons:
- Hard to handle a sudden traffic spike
Token Bucket:
It solves the issue with high traffic in the previous algorithm.

A token bucket is a container that contains tokens with limited capacity. There is a refiller that periodically puts tokens in the bucket. It can not add more tokens if the bucket is already full. When a request is received, if a token is available, the request is forwarded for processing; otherwise, it is dropped. Each request consumes one token.
In the picture above, we want to process 3 requests per minute. Then the bucket has a capacity of 3 tokens. The refiller puts 3 tokens in the bucket every minute. Here we have 5 concurrent requests. Since we have 3 tokens in the bucket, only 3 requests can be served, and 2 of them are dropped.
Pros:
Flexible in handling short-term burst requests.
Easy to understand and implement.
Cons:
Memory consumption issue at the user level. Needs a separate token bucket for each user.
Doesn’t guarantee a smooth traffic rate.
Complex parameter tuning. (bucket capacity, token generation rate)
Fixed Window Counter:
Here window is defined as a fixed time frame. The number of requests within a window is fixed.

In the picture,
window size = 1 minute
request/window = 5
The request counter resets to 0 every minute. Here we are allowing 5 requests per minute. In the first minute, 7 requests come & 5 of them are processed & 2 are rejected.
Pretty simple, right? But can you see there is a problem in the algorithm? From 1:45s to 2:15s, the time period is less than 1 minute, but requests were served 8, which is more than allowed (5). This can create an unwanted spike.
Pros:
Simple & easy to implement.
Allow short-term peek requests.
Cons:
- Windows boundary issue: requests can exceed the threshold. Potentially 2x the requests allowed.
Sliding Window Log:
It fixes the window boundary spike issue of the fixed window. It uses a dynamic window that moves with the time change.
The system logs the timestamp of the accepted requests. Each new request is checked against the number of requests in the recent window.

window size = 10s
requests/window = 3
Here we can see, 4 requests come within the first window(10 seconds). Timestamps (1, 3, 8) are logged since they are accepted. When the 4th request comes at the 9th second, the current window’s request count is 4, but the allowed count is 3. So the request is dropped.

New request comes at 12th second, and the window moves to [2, 12]. So the older log is removed, and the latest request is accepted and logged.
Pros:
- Fine-grained traffic control & no window boundary issue.
Cons:
- Increase complexity in processing.
Sliding Window Counter:
This provides a balance between the Fixed window counter & Sliding window log algorithm.
It uses a formula to accept or reject requests.
weight = (1 - X%) * last_window_requests + current_window_requests
1 + weight <= Limit then accept otherwise reject

According to the above situation, 3.4 > 3, so the request is rejected. But the first 3 requests are accepted.
Even if the current window’s request count (counter_2 = 0) is below the threshold, the system rejects the new request, considering the load of the previous window.

The current picture shows that the system is accepting more than the limit.
Pros:
Requires less memory than a sliding window log
Smooth window boundary traffic
Cons:
Not as accurate as the Sliding window log algorithm
The threshold can be exceeded
References:
System Design Interview V1, Book by Alex Xu

