Rate Limiting and Security Policies

When your MCP servers are deployed, whether in the Cequence cloud or in a private pool, the AI Gateway enforces security policies on every request. This guide explains how to configure rate limits, how the enforcement pipeline processes each request, and how circuit breaking protects your upstream applications.

Quick Start

I want to...	Jump to
Limit how often a specific tool can be called	Per-tool rate limits
Understand the default limits	Default rate limits
Know what happens when a request is blocked	Enforcement pipeline
Understand circuit breakers	Circuit breaking

Rate Limiting

Rate limiting protects your upstream applications from excessive usage. You can control how frequently each tool is called, and once rate limiting is enabled, sensible defaults apply automatically.

Enabling Rate Limits

Rate limiting is configured per MCP server:

1. Open the MCP server from the MCP Registry.
2. Select the Tools tab.
3. Toggle Rate Limiting to enabled.

When enabled, every tool on the MCP server receives default rate limits based on its HTTP method. You can then customize limits for individual tools.

Default Rate Limits by HTTP Method

When you enable rate limiting, these defaults apply automatically:

HTTP Method	Default Limit	Time Window
`GET`, `HEAD`, `OPTIONS`	1,000 requests	1 hour
`POST`, `PUT`, `PATCH`	100 requests	1 hour
`DELETE`	10 requests	1 hour

These defaults follow typical usage patterns: read operations get the highest limits, write operations get moderate limits, and destructive operations are the most restricted.
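To make the mapping concrete, here is a minimal Python sketch of the lookup the table describes. The `RateLimit` type, `DEFAULT_LIMITS`, and `default_limit` are hypothetical names for illustration, not part of the AI Gateway API; the gateway applies these defaults itself once rate limiting is enabled.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class RateLimit:
    """A request budget: at most max_requests within window_seconds."""
    max_requests: int
    window_seconds: int


# Default limits from the table above, keyed by HTTP method.
DEFAULT_LIMITS = {
    "GET": RateLimit(1_000, HOUR),
    "HEAD": RateLimit(1_000, HOUR),
    "OPTIONS": RateLimit(1_000, HOUR),
    "POST": RateLimit(100, HOUR),
    "PUT": RateLimit(100, HOUR),
    "PATCH": RateLimit(100, HOUR),
    "DELETE": RateLimit(10, HOUR),
}


def default_limit(http_method: str) -> RateLimit:
    """Return the default limit for a tool's HTTP method.

    Raising on unlisted methods is an assumption of this sketch; this guide
    does not document defaults for other methods.
    """
    try:
        return DEFAULT_LIMITS[http_method.upper()]
    except KeyError:
        raise ValueError(f"no documented default for {http_method!r}") from None


print(default_limit("get"))  # RateLimit(max_requests=1000, window_seconds=3600)
```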

Per-Tool Rate Limits

You can override the default limits for any individual tool:

1. On the Tools tab, find the tool you want to customize.
2. Select the tool to expand its details.
3. Under Rate Limit, adjust:
   - Max Requests: the maximum number of requests allowed in the time window (1 to 1,000,000)
   - Time Window: the rolling window over which requests are counted (1 second to 24 hours)

Example configurations:

Tool	Max Requests	Time Window	Why
`list_customers` (GET)	5,000	1 hour	High-frequency read operation
`create_ticket` (POST)	50	1 hour	Prevent accidental mass creation
`delete_record` (DELETE)	5	1 hour	Extra protection for destructive operations
`send_message` (POST)	200	6 hours	Generous but bounded messaging limit
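The allowed ranges for the Rate Limit fields, and the example configurations above, can be checked with a small Python sketch. `ToolRateLimit` and `per_hour` are hypothetical names, not a real gateway configuration format; the console enforces these ranges for you.

```python
from dataclasses import dataclass

MIN_REQUESTS, MAX_REQUESTS = 1, 1_000_000
MIN_WINDOW_S, MAX_WINDOW_S = 1, 24 * 3600  # 1 second to 24 hours


@dataclass(frozen=True)
class ToolRateLimit:
    tool: str
    max_requests: int
    window_seconds: int

    def __post_init__(self) -> None:
        # Reject values outside the documented field ranges.
        if not MIN_REQUESTS <= self.max_requests <= MAX_REQUESTS:
            raise ValueError(f"{self.tool}: max_requests must be 1 to 1,000,000")
        if not MIN_WINDOW_S <= self.window_seconds <= MAX_WINDOW_S:
            raise ValueError(f"{self.tool}: window must be 1 second to 24 hours")

    def per_hour(self) -> float:
        """Long-run sustained rate this limit allows, in requests per hour."""
        return self.max_requests * 3600 / self.window_seconds


# The example configurations from the table above.
overrides = [
    ToolRateLimit("list_customers", 5_000, 3600),
    ToolRateLimit("create_ticket", 50, 3600),
    ToolRateLimit("delete_record", 5, 3600),
    ToolRateLimit("send_message", 200, 6 * 3600),
]

for o in overrides:
    print(f"{o.tool}: {o.per_hour():.1f}/hour sustained")
```

A longer window permits bursts: `send_message` can use all 200 requests within a few minutes, but over time it averages about 33 requests per hour.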

What Happens When a Limit Is Reached

When a tool call exceeds its rate limit:

- The request is rejected with HTTP 429 (Too Many Requests).
- The AI agent receives an error message indicating that the limit was exceeded.
- An audit event is logged with the reason `rate_limit_exceeded`.
- The agent can retry once enough earlier requests have aged out of the rolling time window.
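Agents and scripts that call tools through the gateway can treat a 429 as "wait and retry" rather than a hard failure. The following minimal Python sketch shows that client-side pattern. `call_with_retry`, the `(status, headers, body)` response shape, and the use of a `Retry-After` header are assumptions for illustration; this guide does not specify the error format the gateway returns.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry while the response is 429; otherwise return the response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        # Honor a numeric Retry-After header if present (an assumption: this
        # guide does not say the gateway sends one). Lookup is case-sensitive.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Exponential backoff with "full jitter".
            delay = random.uniform(0, base_delay_s * 2 ** attempt)
        sleep(min(delay, max_delay_s))
    raise RuntimeError("unreachable")


# Usage with a fake sender: two 429 responses, then success.
replies = iter([(429, {}, ""), (429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
delays: list = []
status, _, body = call_with_retry(lambda: next(replies), sleep=delays.append)
print(status, body)  # 200 ok
```

Backing off, rather than retrying in a tight loop, gives earlier requests time to age out of the rolling window before the next attempt.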

Enforcement Pipeline

Every request that reaches the AI Gateway passes through a series of security checks. Understanding this pipeline helps you troubleshoot when requests are blocked.

The checks run in this order:

Step	What it checks	Blocked response
1. Routing	Does the request path match a known MCP server or persona?	404 Not Found
2. Authentication	Is the agent's credential (API key, OAuth token, JWT) valid?	401 Unauthorized
3. Authorization	Does this agent have permission to call this tool?	403 Forbidden
4. Rate Limiting	Has the agent exceeded the configured rate limit for this tool?	429 Too Many Requests
5. Security Interceptors	Do any configured security policies (DLP, behavioral rules) flag this request?	Varies
6. Upstream Call	Forward the request to the MCP server and upstream application	502/503 on failure

If a request fails at any step, subsequent steps are skipped and the agent receives the corresponding error. Every enforcement decision is logged in the audit trail.
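The ordering and short-circuit behavior can be modeled in a few lines of Python, which may help when reasoning about which error an agent will see. This is a sketch of the documented sequence, not the gateway's implementation: the `Request` fields, `KNOWN_PATHS`, and `enforce` are hypothetical, and each check is reduced to a precomputed flag. The interceptor's status is left as a parameter because the guide says it varies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Request:
    # Hypothetical, precomputed facts about a request, for illustration only.
    path: str
    tool: str
    credential_valid: bool = True
    allowed_tools: Set[str] = field(default_factory=set)
    over_rate_limit: bool = False
    policy_block_status: Optional[int] = None  # set if an interceptor flags it


KNOWN_PATHS = {"/mcp/crm"}  # hypothetical MCP server or persona routes

# Each check returns None to pass, or the HTTP status to block with.
Check = Callable[[Request], Optional[int]]
PIPELINE: List[Tuple[str, Check]] = [
    ("routing", lambda r: None if r.path in KNOWN_PATHS else 404),
    ("authentication", lambda r: None if r.credential_valid else 401),
    ("authorization", lambda r: None if r.tool in r.allowed_tools else 403),
    ("rate_limiting", lambda r: 429 if r.over_rate_limit else None),
    ("security_interceptors", lambda r: r.policy_block_status),
]


def enforce(req: Request, forward: Callable[[Request], int],
            audit: List[Tuple[str, str]]) -> int:
    """Run checks in order; the first failure skips all later steps."""
    for step, check in PIPELINE:
        status = check(req)
        audit.append((step, "passed" if status is None else f"blocked:{status}"))
        if status is not None:
            return status
    # All checks passed: call upstream, which may itself fail with 502/503.
    status = forward(req)
    audit.append(("upstream_call", str(status)))
    return status


audit: List[Tuple[str, str]] = []
req = Request("/mcp/crm", "delete_contact", allowed_tools={"delete_contact"},
              over_rate_limit=True)
print(enforce(req, forward=lambda r: 200, audit=audit))  # 429
print([step for step, _ in audit])  # ['routing', 'authentication', 'authorization', 'rate_limiting']
```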

Circuit Breaking

Circuit breaking protects your deployment when an upstream application becomes unresponsive. Instead of sending requests to a failing service and waiting for timeouts, the gateway temporarily stops forwarding traffic to that backend.

How it works:

- If an upstream service fails repeatedly, the circuit "opens" and requests are immediately rejected with a 503 status.
- After a cooldown period, the gateway sends a test request to the upstream.
- If the test succeeds, the circuit "closes" and normal traffic resumes.
- If the test fails, the circuit stays open for another cooldown period.

What you see:

- Requests are rejected quickly instead of waiting for slow timeouts.
- Audit logs show events with the reason `circuit_breaker_open`.
- Once the upstream recovers, traffic resumes automatically.

Circuit breaking is applied automatically — no configuration is needed.
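The state transitions described above follow the common closed / open / half-open circuit breaker pattern. The Python sketch below illustrates that pattern only. The gateway's actual failure threshold and cooldown are not documented here, so `failure_threshold`, `cooldown_s`, and treating "fails repeatedly" as consecutive failures are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal closed / open / half-open circuit breaker."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # hypothetical value
        self.cooldown_s = cooldown_s                # hypothetical value
        self.clock = clock
        self.failures = 0                       # consecutive failures
        self.opened_at: Optional[float] = None  # None means the circuit is closed
        self.trial_in_flight = False

    def allow_request(self) -> bool:
        """False means reject immediately (the gateway would answer 503)."""
        if self.opened_at is None:
            return True
        cooled_down = self.clock() - self.opened_at >= self.cooldown_s
        if cooled_down and not self.trial_in_flight:
            self.trial_in_flight = True  # let exactly one test request through
            return True
        return False

    def record_success(self) -> None:
        # Any success, including a successful test request, closes the circuit.
        self.failures = 0
        self.opened_at = None
        self.trial_in_flight = False

    def record_failure(self) -> None:
        if self.opened_at is not None:
            # The test request failed: stay open for another cooldown period.
            self.opened_at = self.clock()
            self.trial_in_flight = False
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open the circuit


# Usage with a fake clock so the timeline is explicit.
now = [0.0]
cb = CircuitBreaker(failure_threshold=3, cooldown_s=10.0, clock=lambda: now[0])
for _ in range(3):
    cb.record_failure()
print(cb.allow_request())  # False: open and still cooling down
now[0] = 10.0
print(cb.allow_request())  # True: the single test request
print(cb.allow_request())  # False: test request still in flight
cb.record_success()
print(cb.allow_request())  # True: circuit closed again
```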

Real-World Examples

Example 1: "We want to protect our CRM from agent overuse"

Setting	Value
MCP Server	Salesforce CRM
Rate Limiting	Enabled
`list_contacts` (GET)	2,000 requests / 1 hour
`create_lead` (POST)	50 requests / 1 hour
`delete_contact` (DELETE)	5 requests / 1 hour

Your sales team can query contacts at high volume, while lead creation and deletion remain tightly controlled.

Example 2: "A CI/CD persona should have higher limits than interactive users"

Because rate limits are configured per MCP server, create two personas, each backed by its own MCP server with its own limits:

Interactive Persona (for team members in Cursor/Claude Desktop):

- `create_issue`: 20 requests / 1 hour
- `post_comment`: 50 requests / 1 hour

Automation Persona (for CI/CD pipelines with access keys):

- `create_issue`: 500 requests / 1 hour
- `post_comment`: 2,000 requests / 1 hour

What to Ask Your IT Team

If you need...	Ask for...
Rate limit guidance	Expected request volumes per tool per hour for your team's workload
SIEM integration	SIEM endpoint and credentials (see Observability)
Network policies	Confirm the Armor gateway can reach your upstream applications
TLS certificates	Certificates for your pool's ingress hostname

Tips

Start with defaults. The built-in rate limits by HTTP method are reasonable for most workloads. Customize them only when you see issues.
Monitor before restricting. Check the audit logs to understand actual usage patterns before tightening limits.
Use different MCP servers for different use cases. If interactive users and automation need different limits, create separate MCP servers rather than trying to share one.
Watch for circuit breaker events. Repeated `circuit_breaker_open` events in your audit logs may indicate that an upstream service is struggling and needs attention.
Rate limits use a rolling window. The window slides rather than resetting at fixed intervals: a limit of 100/hour allows at most 100 requests in any 60-minute period.
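This guide does not say which algorithm the gateway uses for its rolling window. One standard way to enforce "at most N requests in any window" is a sliding log of request timestamps, sketched below in Python. `SlidingWindowLimiter` is a hypothetical name used only to illustrate the semantics.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most max_requests in any window_s-second period (sliding log)."""

    def __init__(self, max_requests: int, window_s: float) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.timestamps: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False  # over the limit: the gateway would answer 429


limiter = SlidingWindowLimiter(max_requests=100, window_s=3600)
allowed = sum(limiter.allow(now=0.0) for _ in range(100))
print(allowed)                    # 100
print(limiter.allow(now=1800.0))  # False: all 100 are still inside the window
print(limiter.allow(now=3600.0))  # True: the t=0 requests have aged out
```

A sliding log stores one timestamp per allowed request, so memory grows with the limit; large-scale limiters often approximate a rolling window with per-interval counters instead.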
