Rate Limits
Learn about how fulfillmenttools handles load and how your client should be designed to allow for seamless scaling
Currently the fulfillmenttools APIs have a rate limit of roughly 1,000 requests per second. This limit can be raised on demand. We also reserve the right to impose a limit lower than 1,000 requests per second in the future.
Consider a system that provides APIs as the primary way of interacting with it. API calls ("requests") have to be issued from the client side in order to create, update, or read data at the platform. When the number of requests rises during usage, the underlying infrastructure either needs to block calls (e.g. impose "rate limits" so that a fixed set of resources is never overwhelmed) or it needs to scale with the load. fulfillmenttools decided for the latter: the platform scales with the load imposed by the clients using the API.
In general this means the usage of the API can scale with the business of our customers. In order to provide the necessary service level, the platform transparently scales the required services horizontally, which means new containers that can answer requests need to be created. This is where the HTTP response code 429: Too Many Requests comes into play, potentially far earlier than at 1,000 requests per second, as you will learn in the following.
Scaling behavior under load
Low Load Phase
When there is low load the provided resources are at a minimum. In the example above one instance is enough to handle all the calls in the depicted "Low Load Phase". That makes sense: when there is nothing to do, why provide resources that are idling?
Entering a High Load Phase
At the beginning of a "High Load Phase" we also enter the "Scale Up Phase". On the server side, one or more instances are started in order to provide the resources needed to handle the current load and the load anticipated for the near future. This happens fully automatically and without any manual effort.
When entering a scale up phase the API may respond to some requests with HTTP response code 429 (Too Many Requests). This does not necessarily mean that you have reached a rate limit; it means that the current call could not be processed due to a lack of resources.
When you receive this response, an instance is already starting. However, you need to re-issue the request in order to have it processed.
Over time the needed resources become available and take over part of the load. In this example, from now on two instances are able to handle incoming requests.
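As a minimal illustration of re-issuing a call, a client can check for status 429 and retry once after a short pause. The function name and URL below are placeholders, not part of the fulfillmenttools API; a production client should use exponential backoff as described in the Best Practice section below.

```typescript
// Minimal sketch: re-issue a call once if it is answered with 429.
// A production client should use exponential backoff instead of a single fixed delay.
async function getWithSingleRetry(url: string): Promise<Response> {
  const first = await fetch(url);
  if (first.status !== 429) {
    return first;
  }
  // An additional instance is already starting; wait briefly and try again.
  await new Promise((resolve) => setTimeout(resolve, 1000));
  return fetch(url);
}
```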
"Will all my requests fail with '429: Too many requests' during a scale up phase?!"
Of course not. Since the typical usage pattern of an API does not show sudden load increases that double or even quadruple the number of requests, the better part of all requests is answered successfully by the existing instance(s).
However, best practice is that a client should definitely be able to cope with the situation depicted above (see Best Practice).
During a High Load Phase
During a period of high traffic and high request counts, the provided resources handle the incoming calls. If the load rises further, additional instances are provided and the behavior is similar to the one described in Entering a High Load Phase. However, the percentage of affected calls decreases further as the ratio of operationally available instances rises over time.
Leaving a High Load Phase
When the usage of the API drops again and the load lowers, instances that are no longer needed are shut down to save resources. This happens completely transparently to the clients issuing requests. After the load is gone, in the example above a single instance is again enough to handle the load.
"When does the system scale / add more instances?"
There is no definitive answer to this question. It depends on multiple parameters, such as the complexity of the call, the required CPU and memory, the number and type of parallel calls that need to be processed, and the number of currently available instances.
There is also a maximum number of concurrent requests that a single instance is able to handle. Whenever an instance is about to reach this value, another instance is created.
So the true answer is (unfortunately): it depends. The good news is that, following our Best Practice, this should not be an issue for any client.
Best Practice: Use Exponential Backoffs
In order to issue retries in case of the described HTTP response code 429: Too Many Requests, we suggest implementing a retry mechanism based on the Exponential Backoff algorithm.
tl;dr: Exponential Backoff in the context of web calls is the idea of re-issuing failed requests with increasing wait times between attempts.
One example would be: A request is issued and is answered with 429: Too Many Requests.
The first retry is then done after 1 second. If the answer is still 429, the next retry is issued after 3 seconds. If the answer to that call is still 429, the next request is done after 10 seconds, and so on.
This pattern continues until either the service answers with a positive response code or a (client-induced) time limit is reached, which would trigger fail-over strategies.
The example above is admittedly an extreme one. There is a good chance that a request reaches an instance with available resources on the first retry already. However, it serves as an illustration of exponential backoff in action.
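The following TypeScript sketch shows what such a hand-rolled retry loop could look like. The delays mirror the 1 s / 3 s / 10 s example above; the function name and the fail-over behavior after the last attempt are illustrative assumptions, not part of the fulfillmenttools API.

```typescript
// Sketch of an exponential backoff retry loop for a single HTTP call.
// Delays follow the example above: 1 s, 3 s, 10 s, then give up.
const RETRY_DELAYS_MS = [1_000, 3_000, 10_000];

async function fetchWithBackoff(url: string, init?: RequestInit): Promise<Response> {
  let response = await fetch(url, init);
  for (const delay of RETRY_DELAYS_MS) {
    if (response.status !== 429) {
      return response; // success or a non-retryable error
    }
    // Wait before the next attempt; resources are being scaled up in the meantime.
    await new Promise((resolve) => setTimeout(resolve, delay));
    response = await fetch(url, init);
  }
  if (response.status === 429) {
    // Client-induced limit reached: hand over to a fail-over strategy.
    throw new Error(`Request to ${url} is still rate limited after retries`);
  }
  return response;
}
```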
This approach might sound like over-complicating things. In fact, it is not.
On the contrary: implementing such mechanisms adds substantial resilience to the system, as your client is aware of and able to cope with the situation that, for a brief period of time, the functionality of a remote service is not available. This can and will - rarely, but still - happen in any distributed system!
Luckily, you do not need to implement exponential backoff yourself. There are powerful libraries out there that do a close to perfect job for this problem. Here are some libraries that we found helpful, but of course you are free to choose another one or roll your own implementation:
| Library | Programming Language | Link |
| --- | --- | --- |
| Resilience4J | Java | |
| exponential-backoff | Node | |
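For example, with the exponential-backoff package for Node, the retry loop sketched above shrinks to a few lines. The option names below reflect our reading of that library's documentation; please verify them against the version you use. The URL and function name are placeholders.

```typescript
import { backOff } from "exponential-backoff";

// Retry the call up to 5 times, starting at 1 s and roughly tripling the
// delay each time; a thrown error triggers the next retry attempt.
async function fetchOrders(): Promise<Response> {
  return backOff(
    async () => {
      const res = await fetch("https://api.example.com/orders"); // placeholder URL
      if (res.status === 429) {
        throw new Error("rate limited"); // signal the library to retry
      }
      return res;
    },
    { numOfAttempts: 5, startingDelay: 1_000, timeMultiple: 3 }
  );
}
```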
Best practice: Apply retries for other HTTP status codes
Once you have an exponential retry mechanism in place, it makes sense to also apply it to other HTTP status codes. Among these codes are:
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
408 Request Timeout
This again adds resilience to the system and allows for automated recovery from error states caused by connectivity issues or temporary downtime.
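Building on the sketch above, the set of retryable status codes can be expressed as a small predicate. This is an illustrative helper, not part of the fulfillmenttools API:

```typescript
// Status codes for which a retry with exponential backoff is worthwhile.
const RETRYABLE_STATUS_CODES = new Set([408, 429, 500, 502, 503, 504]);

function isRetryable(response: Response): boolean {
  return RETRYABLE_STATUS_CODES.has(response.status);
}
```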