Rate Limiting and Security Policies
When your MCP servers are deployed, whether in the Cequence cloud or in a private pool, the AI Gateway enforces security policies on every request. This guide covers how to configure rate limits and explains the enforcement pipeline that each request passes through.
Quick Start
| I want to... | Jump to |
|---|---|
| Limit how often a specific tool can be called | Per-tool rate limits |
| Understand the default limits | Default rate limits |
| Know what happens when a request is blocked | Enforcement pipeline |
| Understand circuit breakers | Circuit breaking |
Rate Limiting
Rate limiting protects your upstream applications from excessive usage. You can control how frequently each tool is called, with sensible defaults that apply automatically.
Enabling Rate Limits
Rate limiting is configured per MCP server:
- Open the MCP server from the MCP Registry.
- Select the Tools tab.
- Toggle Rate Limiting to enabled.
When enabled, every tool on the MCP server receives default rate limits based on its HTTP method. You can then customize limits for individual tools.
Default Rate Limits by HTTP Method
When you enable rate limiting, these defaults apply automatically:
| HTTP Method | Default Limit | Time Window |
|---|---|---|
| GET, HEAD, OPTIONS | 1,000 requests | 1 hour |
| POST, PUT, PATCH | 100 requests | 1 hour |
| DELETE | 10 requests | 1 hour |
These defaults are designed for typical usage patterns — read-heavy operations get higher limits, while destructive operations are more restricted.
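As a minimal sketch, the defaults in the table can be expressed as a lookup keyed by HTTP method. The names `DEFAULT_LIMITS` and `default_limit` are illustrative and are not part of the gateway; the source does not say what happens for methods outside the table, so this sketch simply raises an error for them.

```python
# Defaults copied from the table above, stored as (max_requests, window_seconds).
# DEFAULT_LIMITS and default_limit are hypothetical names for illustration only.
HOUR = 60 * 60

DEFAULT_LIMITS = {
    "GET": (1000, HOUR),
    "HEAD": (1000, HOUR),
    "OPTIONS": (1000, HOUR),
    "POST": (100, HOUR),
    "PUT": (100, HOUR),
    "PATCH": (100, HOUR),
    "DELETE": (10, HOUR),
}


def default_limit(http_method: str) -> tuple[int, int]:
    """Return (max_requests, window_seconds) for a method listed in the table.

    Raises KeyError for any method the table does not cover.
    """
    return DEFAULT_LIMITS[http_method.upper()]
```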
Per-Tool Rate Limits
You can override the default limits for any individual tool:
- On the Tools tab, find the tool you want to customize.
- Select the tool to expand its details.
- Under Rate Limit, adjust:
  - Max Requests: The maximum number of requests allowed in the time window (1 to 1,000,000)
  - Time Window: The rolling window for counting requests (1 second to 24 hours)
Example configurations:
| Tool | Max Requests | Time Window | Why |
|---|---|---|---|
| `list_customers` (GET) | 5,000 | 1 hour | High-frequency read operation |
| `create_ticket` (POST) | 50 | 1 hour | Prevent accidental mass creation |
| `delete_record` (DELETE) | 5 | 1 hour | Extra protection for destructive operations |
| `send_message` (POST) | 200 | 6 hours | Generous but bounded messaging limit |
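
The override behavior can be sketched as follows: a per-tool limit, when present, replaces the method default, and values are checked against the documented ranges (1 to 1,000,000 requests, 1 second to 24 hours). The `RateLimit` class, the `overrides` mapping, and `effective_limit` are assumptions made for illustration; the gateway is configured through the Tools tab, not through code like this.

```python
from dataclasses import dataclass

HOUR = 60 * 60
MAX_REQUESTS_CAP = 1_000_000   # documented upper bound for Max Requests
MAX_WINDOW_SECONDS = 24 * HOUR  # documented upper bound for Time Window


@dataclass(frozen=True)
class RateLimit:
    """Hypothetical container for one limit; validates the documented ranges."""
    max_requests: int
    window_seconds: int

    def __post_init__(self):
        if not 1 <= self.max_requests <= MAX_REQUESTS_CAP:
            raise ValueError("max_requests must be between 1 and 1,000,000")
        if not 1 <= self.window_seconds <= MAX_WINDOW_SECONDS:
            raise ValueError("window_seconds must be between 1 second and 24 hours")


# The example configurations from the table above.
overrides = {
    "list_customers": RateLimit(5000, HOUR),
    "create_ticket": RateLimit(50, HOUR),
    "delete_record": RateLimit(5, HOUR),
    "send_message": RateLimit(200, 6 * HOUR),
}


def effective_limit(tool: str, method_default: RateLimit) -> RateLimit:
    """Use the per-tool override if one exists, otherwise the method default."""
    return overrides.get(tool, method_default)
```

For example, `effective_limit("delete_record", RateLimit(10, HOUR))` yields the tighter 5-per-hour override, while a tool with no override keeps the default passed in.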
What Happens When a Limit Is Reached
When a tool call exceeds its rate limit:
- The request is rejected with an HTTP 429 (Too Many Requests) status
- The AI agent receives an error message indicating the limit was exceeded
- An audit event is logged with the reason `rate_limit_exceeded`
- The agent can retry after the time window resets
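On the client side, an agent or integration can treat a 429 as a signal to wait and retry. The sketch below uses exponential backoff; that strategy, the function name `call_with_retry`, and its parameters are assumptions, since this guide does not specify whether the gateway returns a retry hint with the 429 response.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    send is a caller-supplied function returning (http_status, body).
    The delay doubles after each 429 (1s, 2s, 4s, ... by default).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on any non-429 result, or on the last attempt regardless.
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(delay)
        delay *= 2
    raise AssertionError("unreachable")
```

Keeping the number of attempts small matters here: every retry that is rejected is still a request, so aggressive retry loops only prolong the time spent over the limit.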
Enforcement Pipeline
Every request that reaches the AI Gateway passes through a series of security checks. Understanding this pipeline helps you troubleshoot when requests are blocked.
The checks run in this order:
| Step | What it checks | Blocked response |
|---|---|---|
| 1. Routing | Does the request path match a known MCP server or persona? | 404 Not Found |
| 2. Authentication | Is the agent's credential (API key, OAuth token, JWT) valid? | 401 Unauthorized |
| 3. Authorization | Does this agent have permission to call this tool? | 403 Forbidden |
| 4. Rate Limiting | Has the agent exceeded the configured rate limit for this tool? | 429 Too Many Requests |
| 5. Security Interceptors | Do any configured security policies (DLP, behavioral rules) flag this request? | Varies |
| 6. Upstream Call | Forward the request to the MCP server and upstream application | 502/503 on failure |
If a request fails at any step, subsequent steps are skipped and the agent receives the corresponding error. Every enforcement decision is logged in the audit trail.
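
The short-circuit behavior can be illustrated with a small sketch in which each check either passes or returns the blocking status, and the first failure ends processing. All names, the request fields, and `KNOWN_PATHS` are hypothetical. Step 5 (security interceptors) is omitted because its blocked response varies.

```python
from typing import Callable, Optional

KNOWN_PATHS = {"/mcp/crm"}  # hypothetical registered MCP server path

# Each check returns None to pass, or the HTTP status used to block.
Check = Callable[[dict], Optional[int]]


def routing(req: dict) -> Optional[int]:
    return None if req["path"] in KNOWN_PATHS else 404


def authentication(req: dict) -> Optional[int]:
    return None if req["credential_valid"] else 401


def authorization(req: dict) -> Optional[int]:
    return None if req["tool"] in req["allowed_tools"] else 403


def rate_limiting(req: dict) -> Optional[int]:
    return None if req["calls_in_window"] < req["limit"] else 429


CHECKS: list[tuple[str, Check]] = [
    ("routing", routing),
    ("authentication", authentication),
    ("authorization", authorization),
    ("rate_limiting", rate_limiting),
]


def run_pipeline(req: dict) -> tuple[str, Optional[int]]:
    """Return (step, status) for the first failing check.

    Returns ("upstream_call", None) when every check passes and the
    request would be forwarded.
    """
    for name, check in CHECKS:
        status = check(req)
        if status is not None:
            return name, status  # later checks are skipped
    return "upstream_call", None


request = {
    "path": "/mcp/crm",
    "credential_valid": True,
    "tool": "delete_contact",
    "allowed_tools": {"list_contacts"},
    "calls_in_window": 99,
    "limit": 5,
}
print(run_pipeline(request))  # ('authorization', 403)
```

The example request is both unauthorized and over its limit, yet the agent sees only the 403, because authorization runs before rate limiting. When troubleshooting, fixing one blocked response can therefore reveal the next one in the order.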
Circuit Breaking
Circuit breaking protects your deployment when an upstream application becomes unresponsive. Instead of sending requests to a failing service and waiting for timeouts, the gateway temporarily stops forwarding traffic to that backend.
How it works:
- If an upstream service fails repeatedly, the circuit "opens" and requests are immediately rejected with a 503 status
- After a cooldown period, the gateway tries sending a test request to the upstream
- If the test succeeds, the circuit "closes" and normal traffic resumes
- If the test fails, the circuit stays open for another cooldown period
What you see:
- Requests are rejected faster (no waiting for slow timeouts)
- Audit logs show events with reason `circuit_breaker_open`
- Once the upstream recovers, traffic resumes automatically
Circuit breaking is applied automatically — no configuration is needed.
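The open, cooldown, and test-request cycle described above follows the general circuit breaker pattern, which can be sketched as below. This is not the gateway's implementation: the failure threshold and cooldown values are assumptions (the guide does not document them), and the class and method names are illustrative.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker sketch with assumed threshold and cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,      # assumed value
        cooldown_s: float = 30.0,        # assumed value
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        """Closed: always allow. Open: allow a test request after the cooldown."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Open the circuit at the threshold; a failed test restarts the cooldown."""
        self.failures += 1
        if self.opened_at is not None or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before forwarding and reject with 503 when it returns `False`. This sketch does not limit how many test requests pass once the cooldown has elapsed; a production breaker would typically admit only one at a time.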
Real-World Examples
Example 1: "We want to protect our CRM from agent overuse"
| Setting | Value |
|---|---|
| MCP Server | Salesforce CRM |
| Rate Limiting | Enabled |
| `list_contacts` (GET) | 2,000 requests / 1 hour |
| `create_lead` (POST) | 50 requests / 1 hour |
| `delete_contact` (DELETE) | 5 requests / 1 hour |
Your sales team can query contacts freely, but lead creation and deletion are tightly controlled.
Example 2: "A CI/CD persona should have higher limits than interactive users"
Create two personas with different rate limit configurations:
Interactive Persona (for team members in Cursor/Claude Desktop):
- `create_issue`: 20 requests / 1 hour
- `post_comment`: 50 requests / 1 hour
Automation Persona (for CI/CD pipelines with access keys):
- `create_issue`: 500 requests / 1 hour
- `post_comment`: 2,000 requests / 1 hour
Since rate limits are configured per MCP server, you can create separate MCP servers with different limits for each use case.
What to Ask Your IT Team
| If you need... | Ask for... |
|---|---|
| Rate limit guidance | Expected request volumes per tool per hour for your team's workload |
| SIEM integration | SIEM endpoint and credentials (see Observability) |
| Network policies | Confirm the Armor gateway can reach your upstream applications |
| TLS certificates | Certificates for your pool's ingress hostname |
Tips
- Start with defaults. The built-in rate limits by HTTP method are reasonable for most workloads. Only customize when you see issues.
- Monitor before restricting. Check the audit logs to understand actual usage patterns before tightening limits.
- Use different MCP servers for different use cases. If interactive users and automation need different limits, create separate MCP servers rather than trying to share one.
- Watch for circuit breaker events. Repeated `circuit_breaker_open` events in your audit logs may indicate an upstream service is struggling and needs attention.
- Rate limits reset on a rolling window. The time window is rolling, not fixed: a limit of 100/hour means 100 requests in any 60-minute sliding window.
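To make the rolling-window behavior concrete, here is a minimal sliding-window counter. It is one common way to implement a rolling limit and is offered only as an illustration; the guide does not describe the gateway's internal algorithm, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests in any window_seconds-long sliding window."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.timestamps: deque = deque()  # times of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop accepted requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False  # rejected requests are not counted
        self.timestamps.append(now)
        return True
```

With a rolling window, capacity returns gradually as individual requests age out, instead of all at once at the top of the hour. A burst that uses the whole limit at once therefore blocks further calls for the full window length.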