Rate Limiting Algorithms

Rate Limit:

The rate limit is a restriction that controls the number of requests, actions, or operations a user, system, or application can perform within a specific time period.

It is commonly used in computer systems, APIs, and networks to:

Prevent abuse (e.g., stopping bots or DDOS attacks)
Ensure fairness (so resources are shared evenly)
Protect system stability (avoiding overloads)

Popular Algorithms:

Leaking bucket

Token bucket

Fixed window counter

Sliding window log

Sliding window counter

Leaking Bucket:

Rate Limiting Algorithms Explained with Code

It can be seen as a bucket with a hole in the bottom—water leaks from the bottom at a fixed rate. If water pours in at a higher rate, then it creates overflow naturally.

Design a Distributed Scalable API Rate Limiter

This is what happens in this algorithm. In a computer system, the bucket is implemented using a First In First Out data structure (QUEUE). The queue has a fixed size. The requests are processed by taking from the queue. If the queue is full and the next request comes in, then the request is discarded.

Pros:

Stable traffic control
Easy to understand & implement

Cons:

Hard to handle a sudden traffic spike

Token Bucket:

It solves the issue with high traffic in the previous algorithm.

Alex Xu on X: "Token Bucket The Token Bucket algorithm is widely used for rate limiting. It is simple, well understood and commonly used by large tech companies. Both Amazon and Stripe

A token bucket is a container that contains tokens with limited capacity. There is a refiller that periodically puts tokens in the bucket. It can not add more tokens if the bucket is already full. When a request is received, if a token is available, the request is forwarded for processing; otherwise, it is dropped. Each request consumes one token.

In the picture above, we want to process 3 requests per minute. Then the bucket has a capacity of 3 tokens. The refiller puts 3 tokens in the bucket every minute. Here we have 5 concurrent requests. Since we have 3 tokens in the bucket, only 3 requests can be served, and 2 of them are dropped.

Pros:

Flexible in handling short-term burst requests.
Easy to understand and implement.

Cons:

Memory consumption issue at the user level. Needs a separate token bucket for each user.
Doesn’t guarantee a smooth traffic rate.
Complex parameter tuning. (bucket capacity, token generation rate)

Fixed Window Counter:

Here window is defined as a fixed time frame. The number of requests within a window is fixed.

In the picture,

window size = 1 minute
request/window = 5

The request counter resets to 0 every minute. Here we are allowing 5 requests per minute. In the first minute, 7 requests come & 5 of them are processed & 2 are rejected.

Pretty simple, right? But can you see there is a problem in the algorithm? From 1:45s to 2:15s, the time period is less than 1 minute, but requests were served 8, which is more than allowed (5). This can create an unwanted spike.

Pros:

Simple & easy to implement.
Allow short-term peek requests.

Cons:

Windows boundary issue: requests can exceed the threshold. Potentially 2x the requests allowed.

Sliding Window Log:

It fixes the window boundary spike issue of the fixed window. It uses a dynamic window that moves with the time change.

The system logs the timestamp of the accepted requests. Each new request is checked against the number of requests in the recent window.

window size = 10s
requests/window = 3

Here we can see, 4 requests come within the first window(10 seconds). Timestamps (1, 3, 8) are logged since they are accepted. When the 4th request comes at the 9th second, the current window’s request count is 4, but the allowed count is 3. So the request is dropped.

New request comes at 12th second, and the window moves to [2, 12]. So the older log is removed, and the latest request is accepted and logged.

Pros:

Fine-grained traffic control & no window boundary issue.

Cons:

Increase complexity in processing.

Sliding Window Counter:

This provides a balance between the Fixed window counter & Sliding window log algorithm.

It uses a formula to accept or reject requests.

weight = (1 - X%) * last_window_requests + current_window_requests
1 + weight <= Limit then accept otherwise reject

According to the above situation, 3.4 > 3, so the request is rejected. But the first 3 requests are accepted.

Even if the current window’s request count (counter_2 = 0) is below the threshold, the system rejects the new request, considering the load of the previous window.

The current picture shows that the system is accepting more than the limit.

Pros:

Requires less memory than a sliding window log
Smooth window boundary traffic

Cons:

Not as accurate as the Sliding window log algorithm
The threshold can be exceeded

References:

A video explanation with animation
System Design Interview V1, Book by Alex Xu
A better blog than this by Asish Pratap Singh

Rate Limiting Algorithms

Rate Limit:

Popular Algorithms:

Leaking Bucket:

Token Bucket:

Fixed Window Counter:

Sliding Window Log:

Sliding Window Counter:

References:

Comments

More from this blog

Design A Rate Limiter

Command Palette

Rate Limit:

Popular Algorithms:

Leaking Bucket:

Token Bucket:

Fixed Window Counter:

Sliding Window Log:

Sliding Window Counter:

References:

Comments

More from this blog