Rate limits & scaling

fulfillmenttools APIs have a rate limit of ~1000 requests per second

This limit applies across the entire API and can be raised on demand. We also reserve the right to impose a limit lower than 1000 requests per second in the future.

Consider a system that provides APIs as the primary way to interact with it: API calls have to be issued from the client side to create, update, or read data on the platform. When the number of requests rises, the underlying infrastructure either has to block calls (e.g. enforce a rate limit while keeping the resource at a single instance) or has to scale with the load. fulfillmenttools chose the latter: the platform scales with the load imposed by the clients using the API.

This means that, in general, API usage can scale with our customers' businesses. To provide the necessary service level, however, the platform transparently scales the required services horizontally, which means that new containers able to answer requests have to be created. During such a scale-up, the HTTP status code 429 Too Many Requests comes into play, potentially well below 1000 requests per second, as described in the following section.
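Because 429 responses can occur well below the nominal limit during a scale-up, it can help to smooth request bursts on the client side. The following is a minimal sketch of such a throttle; the budget of 50 requests per second, the helper name, and the shared state object are illustrative assumptions, not values prescribed by fulfillmenttools.

```typescript
// Client-side throttle sketch: keeps a sliding one-second window of request
// timestamps and delays a call when the window is full (simplified for
// sequential callers). Requires an environment with a global fetch, e.g. Node.js 18+.

type ThrottleState = { timestamps: number[] };

async function throttledFetch(
  url: string,
  init: RequestInit,
  state: ThrottleState,
  maxPerSecond = 50 // arbitrary example budget, not an official value
): Promise<Response> {
  // Keep only the timestamps of requests sent within the last second.
  const now = Date.now();
  state.timestamps = state.timestamps.filter((t) => now - t < 1000);

  if (state.timestamps.length >= maxPerSecond) {
    // Wait until the oldest request leaves the one-second window.
    const waitMs = 1000 - (now - state.timestamps[0]);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }

  state.timestamps.push(Date.now());
  return fetch(url, init);
}
```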

Scaling behavior under load

When entering a scale-up phase, the API may respond to some requests with HTTP status code 429 (Too Many Requests). This does not necessarily mean that you have reached a rate limit; it can also mean that the current call could not be processed due to a temporary lack of resources.

When you receive this response to a call, a new instance is already being started. However, you need to re-issue the request to have it processed.
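A minimal sketch of this re-issue pattern, assuming Node.js 18+ with the built-in fetch; the endpoint URL, bearer token, and backoff parameters are placeholders, not official values.

```typescript
// Retry a call when the platform answers with 429 while it is still scaling up.
// Any other status (success or a non-retryable error) is returned to the caller.

async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 5
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, init);

    if (response.status !== 429) {
      return response;
    }

    // Exponential backoff with jitter: 200 ms, 400 ms, 800 ms, ... plus up to 100 ms of noise.
    const delayMs = 200 * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }

  throw new Error(`Request to ${url} still returned 429 after ${maxRetries} retries`);
}

// Usage (host, path, and token are placeholders):
// const res = await fetchWithRetry("https://<your-api-host>/api/orders", {
//   headers: { Authorization: "Bearer <token>" },
// });
```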

Scaling up

When the load is low, the provided resources are minimal: a single instance may be enough to handle all calls during such a "low load phase."

At the beginning of a "high load phase," the platform also enters a "scale-up phase": on the server side, one or more additional instances are started to provide the resources needed to handle the current load and the load anticipated in the near future. This happens fully automatically.

Once the additional resources are available, they take over part of the load; from that point on, for example, two instances handle incoming requests instead of one.

High load phase

During phases of high traffic, the provided resources handle the incoming calls. If the load rises further, additional instances are provided, and the behavior is similar to the one described for entering a high load phase. However, the percentage of affected calls decreases as the ratio of operationally available instances rises over time.

Scaling down

When API usage drops again and the load decreases, unneeded instances are shut down to save resources. This happens entirely transparently to the clients issuing requests. After the load is gone, a single instance may again be enough to handle the remaining traffic.

When does the system scale?

There is no definitive answer to this question. It depends on multiple parameters, such as the complexity of the call, the required CPU and memory, the number and type of parallel calls that need to be processed, and the number of instances currently available.
