Rate Limiting and Security Policies

When your MCP servers are deployed — whether in the Cequence cloud or in a private pool — the AI Gateway enforces security policies on every request. This guide covers how to configure rate limits and understand the enforcement pipeline.

Quick Start

| I want to... | Jump to |
| --- | --- |
| Limit how often a specific tool can be called | Per-tool rate limits |
| Understand the default limits | Default rate limits |
| Know what happens when a request is blocked | Enforcement pipeline |
| Understand circuit breakers | Circuit breaking |

Rate Limiting

Rate limiting protects your upstream applications from excessive usage. You can control how frequently each tool is called, with sensible defaults that apply automatically.

Enabling Rate Limits

Rate limiting is configured per MCP server:

  1. Open the MCP server from the MCP Registry.
  2. Select the Tools tab.
  3. Toggle Rate Limiting to enabled.

When enabled, every tool on the MCP server receives default rate limits based on its HTTP method. You can then customize limits for individual tools.

Default Rate Limits by HTTP Method

When you enable rate limiting, these defaults apply automatically:

| HTTP Method | Default Limit | Time Window |
| --- | --- | --- |
| GET, HEAD, OPTIONS | 1,000 requests | 1 hour |
| POST, PUT, PATCH | 100 requests | 1 hour |
| DELETE | 10 requests | 1 hour |

These defaults are designed for typical usage patterns — read-heavy operations get higher limits, while destructive operations are more restricted.
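The method-to-default mapping above can be sketched as a simple lookup. This is an illustrative sketch only, not the gateway's internal code; the function name and tuple shape are assumptions:

```python
# Hypothetical sketch: the default limits table above as a lookup.
# Each entry is (max_requests, window_seconds); 3600 seconds = 1 hour.
DEFAULT_LIMITS = {
    "GET": (1_000, 3600), "HEAD": (1_000, 3600), "OPTIONS": (1_000, 3600),
    "POST": (100, 3600), "PUT": (100, 3600), "PATCH": (100, 3600),
    "DELETE": (10, 3600),
}

def default_limit(method: str) -> tuple[int, int]:
    """Return (max_requests, window_seconds) for an HTTP method."""
    return DEFAULT_LIMITS[method.upper()]
```

A per-tool override, as described in the next section, would simply replace the tuple this lookup returns.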

Per-Tool Rate Limits

You can override the default limits for any individual tool:

  1. On the Tools tab, find the tool you want to customize.
  2. Select the tool to expand its details.
  3. Under Rate Limit, adjust:
    • Max Requests — The maximum number of requests allowed in the time window (1 to 1,000,000)
    • Time Window — The rolling window for counting requests (1 second to 24 hours)

Example configurations:

| Tool | Max Requests | Time Window | Why |
| --- | --- | --- | --- |
| list_customers (GET) | 5,000 | 1 hour | High-frequency read operation |
| create_ticket (POST) | 50 | 1 hour | Prevent accidental mass creation |
| delete_record (DELETE) | 5 | 1 hour | Extra protection for destructive operations |
| send_message (POST) | 200 | 6 hours | Generous but bounded messaging limit |

What Happens When a Limit Is Reached

When a tool call exceeds its rate limit:

  1. The request is rejected with an HTTP 429 (Too Many Requests) status
  2. The AI agent receives an error message indicating the limit was exceeded
  3. An audit event is logged with the reason rate_limit_exceeded
  4. The agent can retry after the time window resets
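Because the window is rolling, a blocked agent only has to wait for older requests to age out. A client-side retry loop might look like the following sketch; the function name, backoff policy, and response shape are illustrative assumptions, not part of the gateway API:

```python
import time

def call_tool_with_retry(call, max_attempts=3, backoff_seconds=60):
    """Illustrative retry loop: back off when the gateway returns HTTP 429.

    `call` is any zero-argument function returning an object with a
    `.status` attribute. The linear backoff here is a simple placeholder
    policy, not a recommendation from the gateway itself.
    """
    response = None
    for attempt in range(max_attempts):
        response = call()
        if response.status != 429:
            return response
        # Wait for capacity to free up in the rolling window before retrying.
        time.sleep(backoff_seconds * (attempt + 1))
    return response
```

In practice the backoff interval should be sized relative to the tool's configured time window.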

Enforcement Pipeline

Every request that reaches the AI Gateway passes through a series of security checks. Understanding this pipeline helps you troubleshoot when requests are blocked.

The checks run in this order:

| Step | What it checks | Blocked response |
| --- | --- | --- |
| 1. Routing | Does the request path match a known MCP server or persona? | 404 Not Found |
| 2. Authentication | Is the agent's credential (API key, OAuth token, JWT) valid? | 401 Unauthorized |
| 3. Authorization | Does this agent have permission to call this tool? | 403 Forbidden |
| 4. Rate Limiting | Has the agent exceeded the configured rate limit for this tool? | 429 Too Many Requests |
| 5. Security Interceptors | Do any configured security policies (DLP, behavioral rules) flag this request? | Varies |
| 6. Upstream Call | Forward the request to the MCP server and upstream application | 502/503 on failure |

If a request fails at any step, subsequent steps are skipped and the agent receives the corresponding error. Every enforcement decision is logged in the audit trail.
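The short-circuiting behavior of the pipeline can be sketched as an ordered list of checks, where the first check that returns a status code blocks the request and skips the rest. This is a minimal illustration under assumed names (`enforce`, the check lambdas, the request dict keys), not the gateway's actual implementation:

```python
def enforce(request, checks, audit):
    """Run ordered checks; stop at the first block and log the decision.

    Each check returns None to pass or an HTTP status code to block.
    """
    for name, check in checks:
        status = check(request)
        if status is not None:           # blocked: skip all later checks
            audit.append((name, status))
            return status
    audit.append(("upstream", 200))      # stand-in for forwarding upstream
    return 200

# Illustrative checks mirroring steps 1-4 of the pipeline above.
checks = [
    ("routing",        lambda r: None if r.get("path") == "/mcp/crm" else 404),
    ("authentication", lambda r: None if r.get("token") else 401),
    ("authorization",  lambda r: None if r.get("allowed") else 403),
    ("rate_limit",     lambda r: None if r.get("under_limit") else 429),
]
```

For example, a request with a valid path but no credential never reaches the authorization or rate-limit checks; only the `authentication` decision is logged.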


Circuit Breaking

Circuit breaking protects your deployment when an upstream application becomes unresponsive. Instead of sending requests to a failing service and waiting for timeouts, the gateway temporarily stops forwarding traffic to that backend.

How it works:

  1. If an upstream service fails repeatedly, the circuit "opens" and requests are immediately rejected with a 503 status
  2. After a cooldown period, the gateway tries sending a test request to the upstream
  3. If the test succeeds, the circuit "closes" and normal traffic resumes
  4. If the test fails, the circuit stays open for another cooldown period
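The open/half-open/closed cycle above can be sketched as a small state machine. The thresholds, class name, and injectable clock below are all illustrative assumptions; the gateway's actual parameters are not configurable and are not documented here:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the open -> half-open -> closed cycle."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None      # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True            # closed: traffic flows normally
        # After the cooldown, allow a test request through (half-open).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None      # test succeeded: close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

While the circuit is open, rejecting immediately (a 503 in the gateway's case) is what makes failures fast instead of timeout-bound.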

What you see:

  • Requests are rejected faster (no waiting for slow timeouts)
  • Audit logs show events with reason circuit_breaker_open
  • Once the upstream recovers, traffic resumes automatically

Circuit breaking is applied automatically — no configuration is needed.


Real-World Examples

Example 1: "We want to protect our CRM from agent overuse"

| Setting | Value |
| --- | --- |
| MCP Server | Salesforce CRM |
| Rate Limiting | Enabled |
| list_contacts (GET) | 2,000 requests / 1 hour |
| create_lead (POST) | 50 requests / 1 hour |
| delete_contact (DELETE) | 5 requests / 1 hour |

Your sales team can query contacts freely, but lead creation and deletion are tightly controlled.

Example 2: "A CI/CD persona should have higher limits than interactive users"

Create two personas, each backed by its own MCP server configuration so each can carry its own rate limits:

Interactive Persona (for team members in Cursor/Claude Desktop):

  • create_issue: 20 requests / 1 hour
  • post_comment: 50 requests / 1 hour

Automation Persona (for CI/CD pipelines with access keys):

  • create_issue: 500 requests / 1 hour
  • post_comment: 2,000 requests / 1 hour

Since rate limits are configured per MCP server, you can create separate MCP servers with different limits for each use case.


What to Ask Your IT Team

| If you need... | Ask for... |
| --- | --- |
| Rate limit guidance | Expected request volumes per tool per hour for your team's workload |
| SIEM integration | SIEM endpoint and credentials (see Observability) |
| Network policies | Confirmation that the Armor gateway can reach your upstream applications |
| TLS certificates | Certificates for your pool's ingress hostname |

Tips

  • Start with defaults. The built-in rate limits by HTTP method are reasonable for most workloads. Only customize when you see issues.
  • Monitor before restricting. Check the audit logs to understand actual usage patterns before tightening limits.
  • Use different MCP servers for different use cases. If interactive users and automation need different limits, create separate MCP servers rather than trying to share one.
  • Watch for circuit breaker events. Repeated circuit_breaker_open events in your audit logs may indicate an upstream service is struggling and needs attention.
  • Rate limits reset on a rolling window. The time window is rolling, not fixed — a limit of 100/hour means 100 requests in any 60-minute sliding window.
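A rolling window of this kind is commonly implemented by keeping the timestamps of recent requests and discarding ones older than the window. The sketch below illustrates the idea under assumed names (`SlidingWindowLimiter`, the injectable clock); it is not the gateway's implementation:

```python
from collections import deque
import time

class SlidingWindowLimiter:
    """Rolling-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()  # times of requests still inside the window

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

This is why a limit of 100/hour never admits more than 100 requests in any 60-minute span, rather than resetting at the top of each hour.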