Limits and scaling
Responsible API use
To ensure reliable service, platform integrity, and fair access for all users, fulfillmenttools expects all API consumers to follow a set of basic usage principles. While many of these should be self-evident to experienced developers, we state them explicitly to establish a shared understanding:
Use the API responsibly: Avoid excessive or abusive request patterns, including unnecessary retries, polling loops, or scripted load tests against production endpoints.
Store only relevant data: Only data that's necessary and appropriate within the context of the Order Management System (OMS) may be stored on the platform. Storing unrelated or excessive data isn't permitted.
Respect rate limits: Don't attempt to bypass or manipulate rate limits. These limits are in place to protect platform stability and ensure equitable access.
Avoid misuse of credentials: API keys and tokens must be kept secure and used only by authorized systems. Sharing credentials across unrelated systems or teams is prohibited.
Don't scrape or replicate data: The API is designed for transactional and operational use. Bulk extraction or replication of data for purposes outside the intended scope isn't permitted.
Follow retry guidelines: Implement exponential backoff and respect Retry-After headers when handling transient errors such as 429: Too many requests or 408: Request timeout. See Retry policies for more information.
Report issues responsibly: If you encounter unexpected behavior, performance degradation, or suspect a bug, contact fulfillmenttools support rather than attempting workarounds that could impact system integrity.
Don't attempt to crash or destabilize the platform: Such behavior is strictly forbidden and will result in immediate suspension; it may lead to permanent revocation of access.
Load testing must be controlled: Simulations should reflect realistic daily traffic and expected peak loads. Sudden request spikes are prohibited. A controlled ramp-up phase is essential. See the Load testing article for more information.
By integrating with our API, you agree to adhere to these principles. Violations may result in throttling, temporary suspension, or revocation of access.
Retry policies
In the event of an HTTP request failure, clients may implement a retry mechanism. fulfillmenttools recommends using an exponential backoff strategy to avoid overwhelming the system and increase the likelihood of a successful retry.
General guidance
Retries should be avoided for most client-side errors (HTTP 4xx), as these typically indicate issues that won't resolve without changes to the request. Retrying with identical parameters can result in unnecessary traffic and degraded performance.
However, there are some notable exceptions where retries might be appropriate:
404: Not found
This status might indicate that the requested resource doesn't exist yet. If the resource is expected to be created shortly (for example, due to asynchronous processing), a retry after a delay might be reasonable. Otherwise, repeated retries will return the same result.
408: Request timeout
The server timed out waiting for the request. A retry might be feasible, especially if the client is confident the request was valid and the timeout was due to transient network issues.
409: Conflict
The request couldn't be completed due to a conflict with the resource's current state. A retry might be feasible, but only after resolving the conflict (see the Optimistic-Locking section for more information).
429: Too many requests
This status indicates that the client has exceeded the allowed request rate, either per endpoint or across the API. This can occur during platform scale-up phases or due to rate limits designed to protect system stability. Clients should respect the Retry-After header (if provided) and delay subsequent requests accordingly. If this error occurs frequently, contact fulfillmenttools support to discuss potential adjustments to your rate limits.
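This guidance can be sketched in a small helper. The names below (`RETRYABLE_STATUSES`, `backoff_delay`) are illustrative assumptions, not part of any fulfillmenttools SDK; 409 is excluded because it's only retryable after the caller resolves the conflict:

```python
import random

# Status codes worth retrying per the guidance above. 409 is deliberately
# excluded: it requires resolving the conflict first.
RETRYABLE_STATUSES = {404, 408, 429}

def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Delay in seconds before the next retry: honor a Retry-After value
    if the server sent one, otherwise use exponential backoff with jitter."""
    if retry_after is not None:
        return float(retry_after)
    # Exponential backoff: base * 2^attempt, capped, with random jitter
    # so that many clients don't retry in lockstep.
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

With this shape, a client checks the status against `RETRYABLE_STATUSES`, sleeps for `backoff_delay(attempt, retry_after=...)`, and re-issues the request.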
Rate limits and quotas
Global API rate limits and quotas
To ensure platform stability and fair usage across clients, fulfillmenttools enforces a default rate limit of 1,000 requests per second (RPS) across all API endpoints. This global limit applies cumulatively to all requests made by a client, regardless of the specific endpoint.
If your integration requires higher throughput — for example, during peak operational periods or batch processing — rate limits can be adjusted upon request. Contact fulfillmenttools support to discuss your requirements and initiate a quota increase.
Clients exceeding the allowed rate might receive a 429: Too many requests response. In such cases, it's recommended to implement exponential backoff and respect the Retry-After header if present.
Endpoint-specific rate limits (planned)
In addition to the global rate limit of 1,000 requests per second, fulfillmenttools will introduce endpoint-specific rate limits to better reflect the operational characteristics and business relevance of individual API resources.
These limits will be defined based on the assumed usage patterns per business functionality. For example, endpoints supporting high-volume read operations, such as /api/promises/checkoutoptions/* , may allow higher throughput. Endpoints involved in transactional or write-heavy operations, such as /api/facilities/{facilityId}/listings or /api/users, might have stricter limits to ensure consistency and system stability.
Rate limit information will be communicated through standard HTTP headers in each response:
X-RateLimit-Limit: [max requests]
X-RateLimit-Remaining: [remaining requests]
X-RateLimit-Reset: [timestamp]
Clients are encouraged to monitor these headers and implement adaptive throttling strategies. If your application requires higher limits for specific endpoints, contact fulfillmenttools support to request a tailored quota.
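One adaptive strategy is to pause when the remaining budget runs low. The sketch below assumes the header semantics described above (limit, remaining count, and a reset timestamp in epoch seconds); the helper name and threshold are illustrative assumptions:

```python
import time

def pause_if_near_limit(headers, threshold=0.1):
    """Sleep until the rate-limit window resets when the remaining request
    budget drops below `threshold` (a fraction of the total limit).
    Returns the number of seconds waited."""
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_at = float(headers.get("X-RateLimit-Reset", 0))  # epoch seconds
    if limit and remaining < limit * threshold:
        # Wait out the rest of the current window before sending more.
        wait = max(0.0, reset_at - time.time())
        time.sleep(wait)
        return wait
    return 0.0
```

Calling this after each response keeps the client under the limit proactively instead of reacting to 429 responses.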
Further details, including endpoint-specific thresholds and usage tiers, will be published in upcoming releases of the API documentation.
API limits
Rate limits
fulfillmenttools APIs enforce a rate limit of approximately 1,000 requests per second.
This limit applies across the entire API.
The limit can be increased on demand.
A lower limit may be imposed in the future.
The platform uses APIs as the primary interaction method. Clients issue API calls to create, update, or read data. When request volume increases, infrastructure must either block calls or scale with the load. fulfillmenttools applies the latter approach: the platform scales horizontally by creating additional containers to handle requests.
If the system cannot scale quickly enough, the API responds with HTTP 429: Too many requests. This response may occur before reaching 1,000 requests per second, depending on infrastructure load.
List size limits
REST read and write operations support a maximum of 500 items per call.
GraphQL queries, subscriptions, and mutations support a maximum of 100 items per call.
Requests exceeding these limits return an error.
To handle larger datasets, use multiple calls together with the paging functionality of the APIs.
Scaling
Scaling behavior under load
The diagram below shows the different phases.
Scaling up phase
When load is low, the provided resources are minimal. In the example above, one instance is enough to handle all calls during the depicted "low load phase."
At the beginning of a "high load phase," the platform also enters the "scale-up phase": on the server side, one or more instances are started to provide the resources needed for the current and anticipated load. This happens fully automatically.
These new instances begin processing requests as soon as they are available, distributing the load across the newly available resources.
During a scale-up phase, the API might respond with 429: Too many requests. This doesn't necessarily indicate that a rate limit has been reached. Instead, the request couldn't be processed due to insufficient resources.
When this response occurs, a new instance is already starting. The request must be re-issued to be processed successfully.
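This re-issue behavior can be sketched as a simple retry loop; `send_request` is a hypothetical stand-in for an actual API call, not a fulfillmenttools function:

```python
import time

def call_with_retries(send_request, max_attempts=5, base_delay=0.5):
    """Re-issue a request while the platform answers 429 during a scale-up
    phase. `send_request` is a hypothetical callable returning an HTTP
    status code."""
    status = None
    for attempt in range(max_attempts):
        status = send_request()
        if status != 429:
            return status
        # Back off exponentially while new instances start up.
        time.sleep(base_delay * (2 ** attempt))
    return status

# Simulated scale-up: the first two calls hit 429, then an instance is ready.
responses = iter([429, 429, 200])
result = call_with_retries(lambda: next(responses), base_delay=0.01)
```

Once a new instance is available, the re-issued request succeeds and `result` holds the 200 response.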
High load phase
During periods of high traffic, the available resources handle incoming requests. If the load continues to rise, additional instances are started, and the behavior resembles the initial scale-up phase. As more instances become operational, the percentage of calls affected by 429 responses decreases.
Scaling down
When API usage decreases, unneeded instances are shut down to conserve resources. This process is fully transparent to clients. After the load subsides, a single instance may again be sufficient to handle requests.
When does the system scale?
There's no definitive answer to this question. The exact point at which the system scales depends on several factors:
Complexity of the call
Required CPU or memory
Number and type of parallel calls
Currently available number of instances