Observability and SIEM Integration
The Cequence AI Gateway provides full visibility into what your AI agents are doing — every tool call, every authentication decision, and every policy enforcement action. This guide covers how to monitor activity, export events to your SIEM, and track pool health.
Quick Start
| I want to... | Jump to |
|---|---|
| Understand what gets logged | What gets logged |
| Export events to Splunk, Datadog, or another SIEM | SIEM export configuration |
| Monitor private pool health | Pool health monitoring |
| Troubleshoot blocked or failed requests | Using audit events for troubleshooting |
What Gets Logged
The AI Gateway captures two categories of events:
Tool Activity Events
Emitted for every MCP tool invocation — whether it was allowed, blocked, or failed. These answer the question: "What did this agent do, and was it allowed?"
Each event includes:
| Field | Description |
|---|---|
| Tool name | Which tool was called (for example, list_contacts, send_message) |
| MCP server | Which MCP server handled the call |
| User identity | Email or user ID of the authenticated agent (from JWT/SSO) |
| Client IP | IP address of the calling agent |
| Status | success, error, blocked, or rate_limited |
| Action | allowed or blocked |
| Reason | Why the request was blocked (for example, rate_limit_exceeded, authz_denied) |
| Duration | Total round-trip time and upstream processing time (in milliseconds) |
| Timestamp | When the event occurred |
| Request ID | Unique identifier for correlating related events |
| Session ID | MCP session identifier |
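Putting the fields above together, a single tool activity event might look like the sketch below. The field names follow the reference tables later in this guide; the values and the flat, dotted-key shape are illustrative assumptions, not a wire-format guarantee:

```python
# Hypothetical tool activity event using the documented field names.
# Values are illustrative, not taken from a real deployment.
event = {
    "timestamp": "2024-05-01T12:34:56Z",
    "request.id": "req-7f3a",
    "tool.name": "list_contacts",
    "mcp.server_name": "crm-server",
    "user.email": "agent@example.com",
    "client.ip": "10.0.0.12",
    "status": "blocked",
    "action": "blocked",
    "action.reason": "rate_limit_exceeded",
    "duration.total_ms": 12,
    "duration.upstream_ms": 0,
    "http.status_code": 429,
}

# A blocked event answers "what was attempted, and why was it denied?"
print(event["tool.name"], event["status"], event["action.reason"])
```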
Operational Events
Infrastructure and security events from the gateway and Operator. These answer the question: "Is the data plane healthy? What changed?"
| Event Category | Examples |
|---|---|
| Authentication | Login failures, expired tokens, invalid credentials |
| Authorization | Access denied for tool or persona |
| Rate Limiting | Limit exceeded with current count and configured limit |
| Routing | No matching route, host mismatch |
| Upstream Errors | Backend failures, connection timeouts |
| Circuit Breaker | Circuit opened/closed for a backend |
| Configuration | Config updated from control plane, version changes |
| Component Lifecycle | Armor deployed, Redis ready, SIEM exporter started |
| Scaling | Replica count mismatches detected |
| Reconciliation | Operator reconciliation cycle completed with summary |
SIEM Export Configuration
You can forward audit events to your existing SIEM or observability platform. The AI Gateway supports multiple export destinations simultaneously.
Supported Destinations
| Destination | Protocol | Use case |
|---|---|---|
| Splunk | HTTP Event Collector (HEC) | Enterprise SIEM with Splunk |
| Datadog | Datadog API | Monitoring with Datadog |
| OTLP | OpenTelemetry Protocol (gRPC or HTTP) | Any OTLP-compatible backend |
| Syslog | TCP, UDP, or TLS | Traditional syslog infrastructure |
Setting Up SIEM Export
SIEM export is configured at the pool level for private cloud deployments. Contact your Cequence administrator or use the pool configuration API to set up export destinations.
What you'll need from your IT team:
| Destination | Required information |
|---|---|
| Splunk | HEC endpoint URL, HEC token, index name, source/sourcetype |
| Datadog | Datadog site (for example, datadoghq.com), API key |
| OTLP | Endpoint URL, protocol (gRPC or HTTP), any required headers |
| Syslog | Server endpoint, port, protocol (TCP/UDP), TLS certificates if applicable |
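Setup itself happens through the pool configuration API, but as a sketch of what the Splunk destination consumes: Splunk's HTTP Event Collector accepts JSON envelopes like the one built below, POSTed to the HEC endpoint with an `Authorization: Splunk <token>` header. The index and sourcetype names here are hypothetical placeholders, not values the gateway prescribes:

```python
import json

def build_hec_envelope(event: dict, index: str, sourcetype: str) -> dict:
    """Wrap a gateway audit event in the envelope Splunk HEC expects.

    HEC accepts POSTs to https://<splunk-host>:8088/services/collector/event;
    the host, token, index, and sourcetype are deployment-specific values
    supplied by your IT team.
    """
    return {
        "event": event,            # the audit event payload itself
        "index": index,            # e.g. an index dedicated to gateway audit logs
        "sourcetype": sourcetype,  # hypothetical sourcetype naming
        "source": "ai-gateway",
    }

envelope = build_hec_envelope(
    {"tool.name": "send_message", "status": "success"},
    index="ai_gateway_audit",
    sourcetype="cequence:ai_gateway:tool_activity",
)
print(json.dumps(envelope))
```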
Routing Events by Type
You can route different event types to different destinations. For example:
- Tool activity events → Splunk (for compliance and audit)
- Operational events → Datadog (for infrastructure monitoring)
- Metrics → Your Prometheus-compatible backend
This lets you send high-volume tool activity to your SIEM for compliance while routing operational alerts to your monitoring stack.
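Conceptually, the routing above is a mapping from event category to destination. The sketch below models it as a plain lookup table; the category and destination names are illustrative, since actual routing is configured through the pool configuration API:

```python
# Hypothetical routing table: event category -> export destination.
ROUTES = {
    "tool_activity": "splunk",   # compliance and audit trail
    "operational": "datadog",    # infrastructure monitoring
    "metrics": "prometheus",     # Prometheus-compatible backend
}

def destination_for(event_category: str) -> str:
    # Fall back to the SIEM for unknown categories so nothing is dropped.
    return ROUTES.get(event_category, "splunk")

print(destination_for("operational"))  # -> datadog
```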
Per-MCP Server Overrides
For specific MCP servers that require different logging rules (for example, a high-security financial integration), you can override the default SIEM settings:
- Route events to a specific Splunk index
- Enable or disable logging for that MCP server
- Send events to different destinations than the pool default
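The override semantics amount to layering per-server settings on top of the pool defaults. A minimal sketch, assuming hypothetical server names and setting keys:

```python
# Pool-level defaults for SIEM export (keys and values are illustrative).
POOL_DEFAULTS = {"destination": "splunk", "index": "ai_gateway_audit", "enabled": True}

# Hypothetical per-MCP-server overrides: a high-security finance integration
# routes to its own index; another server disables logging entirely.
OVERRIDES = {
    "finance-mcp": {"index": "ai_gateway_finance"},
    "scratch-mcp": {"enabled": False},
}

def siem_settings(server_name: str) -> dict:
    """Pool defaults, with any per-MCP-server overrides applied on top."""
    return {**POOL_DEFAULTS, **OVERRIDES.get(server_name, {})}

print(siem_settings("finance-mcp")["index"])  # -> ai_gateway_finance
```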
Pool Health Monitoring
For private cloud deployments, the AI Gateway provides several ways to monitor pool health.
Portal Dashboard
The Private Cloud page in the portal shows health at a glance:
| Status | What it means | Action needed |
|---|---|---|
| Active | Operator connected, heartbeat recent | None |
| Pending | Operator not yet installed or connecting | Complete deployment (guide) |
| Stale | No heartbeat in 30+ minutes | Check Operator pod health and network connectivity |
| Error | Operator reporting unhealthy state | Check Operator logs for errors |
Select a pool to see detailed status including:
- Operator version and Kubernetes cluster version
- MCP server count — total deployed and how many are healthy
- Last heartbeat — when the Operator last reported
- Component health — status of Armor, Redis, and other services
Pool Operations
When health checks reveal issues, you can take action:
| Operation | When to use | How |
|---|---|---|
| Force sync | Config changes aren't being picked up | Select Force Sync on the pool detail page |
| Reseed | Pool config is out of sync with control plane | Select Reseed on the pool detail page |
| View Armor config | Verify what's actually running | Select View Config on the pool detail page |
Troubleshooting with Audit Events
Audit events are your primary tool for understanding why requests succeed or fail. Here are common scenarios and what to look for:
"My agent can't access a tool"
Look for events with status blocked:
| Reason | Meaning | Fix |
|---|---|---|
| authz_denied | Agent doesn't have permission for this tool | Check persona tool assignments and team membership |
| rate_limit_exceeded | Rate limit hit for this tool | Wait for the window to reset, or increase the limit |
| auth_denied | Agent credential is invalid or expired | Re-authenticate or generate a new access key |
| interceptor_denied | A security policy blocked the request | Review the configured security interceptors |
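When many requests are being blocked, counting events by reason shows the dominant cause at a glance. A minimal triage sketch, assuming events with the flat dotted-key field names from the reference tables below:

```python
from collections import Counter

# Suggested next step for each block reason, per the table above.
REMEDIATION = {
    "authz_denied": "check persona tool assignments and team membership",
    "rate_limit_exceeded": "wait for the window to reset or raise the limit",
    "auth_denied": "re-authenticate or generate a new access key",
    "interceptor_denied": "review the configured security interceptors",
}

def triage(events: list) -> Counter:
    """Count blocked events by reason so the most common cause stands out."""
    return Counter(
        e["action.reason"] for e in events if e.get("status") == "blocked"
    )

# Hypothetical sample of audit events:
events = [
    {"status": "blocked", "action.reason": "authz_denied"},
    {"status": "blocked", "action.reason": "authz_denied"},
    {"status": "success"},
]
counts = triage(events)
for reason, n in counts.most_common():
    print(f"{n}x {reason}: {REMEDIATION[reason]}")
```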
"Requests are slow or timing out"
Look for events with high duration.total_ms values:
| Indicator | Meaning | Fix |
|---|---|---|
| High duration.upstream_ms | The upstream app is slow | Check the upstream application's health |
| circuit_breaker_open events | Gateway stopped forwarding to a failing backend | Upstream is down; wait for recovery or check the service |
| upstream_error events | Backend returning errors | Check upstream service logs and connectivity |
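A quick way to tell gateway latency apart from upstream latency is to subtract the two duration fields. This sketch assumes events carry the flat dotted-key field names from the reference tables below:

```python
def gateway_overhead_ms(event: dict) -> float:
    """Time spent outside the upstream app: total minus upstream processing.

    If upstream_ms accounts for nearly all of total_ms, the upstream app is
    the bottleneck; a large remainder points at the gateway or the network
    in between.
    """
    return event["duration.total_ms"] - event["duration.upstream_ms"]

# Hypothetical slow request: upstream dominates the total.
slow = {"duration.total_ms": 2400, "duration.upstream_ms": 2350}
print(gateway_overhead_ms(slow))  # -> 50
```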
"The pool shows as stale"
The Operator hasn't reported a heartbeat in over 30 minutes:
- Check if the Operator pod is running: `kubectl get pods -n <namespace> -l app=ai-gateway-operator`
- Check Operator logs for connectivity errors: `kubectl logs -n <namespace> -l app=ai-gateway-operator`
- Verify that outbound HTTPS access to the Cequence control plane is not blocked
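The staleness rule itself is simple: no heartbeat for more than 30 minutes. If you want to alert on it from your own monitoring, a sketch of the check (heartbeat timestamps here are hypothetical; your SIEM or monitoring system supplies the real ones):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=30)  # threshold behind the portal's Stale status

def is_stale(last_heartbeat, now=None):
    """True if the Operator has not reported a heartbeat in over 30 minutes."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > STALE_AFTER

# Hypothetical heartbeat 45 minutes in the past:
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(minutes=45), now))  # -> True
```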
Event Fields Reference
Tool Activity Events — Key Fields
| Field | Type | Description |
|---|---|---|
| timestamp | datetime | When the event occurred |
| request.id | string | Unique request identifier for correlation |
| tool.name | string | Name of the tool called |
| mcp.server_name | string | MCP server that handled the call |
| user.email | string | Authenticated user's email |
| client.ip | string | Client IP address |
| status | string | success, error, blocked, rate_limited |
| action | string | allowed or blocked |
| action.reason | string | Reason for blocking (if blocked) |
| duration.total_ms | number | Total request duration in milliseconds |
| duration.upstream_ms | number | Upstream processing time in milliseconds |
| http.status_code | number | HTTP status code returned |
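Since every event carries a request.id, reconstructing the full chain for one request is a grouping operation. A sketch, assuming events with the flat dotted-key field names above:

```python
from collections import defaultdict

def correlate(events: list) -> dict:
    """Group audit events by request.id to reconstruct each request's chain."""
    chains = defaultdict(list)
    for e in events:
        chains[e["request.id"]].append(e)
    return dict(chains)

# Hypothetical events from two different requests:
events = [
    {"request.id": "req-1", "status": "success"},
    {"request.id": "req-2", "status": "blocked"},
    {"request.id": "req-1", "status": "error"},
]
chains = correlate(events)
print(len(chains["req-1"]))  # -> 2
```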
Common Operational Events
| Event | Severity | Description |
|---|---|---|
| auth_denied | WARN | Authentication failed (invalid/expired credential) |
| access_control_denied | WARN | Authorization check failed (wrong team/role) |
| rate_limit_exceeded | WARN | Rate limit policy exceeded |
| circuit_breaker_open | WARN | Backend circuit breaker tripped |
| upstream_error | ERROR | Backend returned an error |
| outbound_auth_failed | ERROR | Failed to obtain upstream credentials |
| config_change | INFO | Configuration updated from control plane |
| component_lifecycle | INFO | Component deployed, ready, or degraded |
| reconciliation_complete | INFO | Operator completed a reconciliation cycle |
Tips
- Start with tool activity events. These give you the most immediately useful information — who's using what, and whether anything is being blocked.
- Set up alerts for `circuit_breaker_open` events. These indicate an upstream service is failing, which often requires attention.
- Use request IDs for correlation. When troubleshooting a specific failure, search your SIEM for the `request.id` to find all related events in the chain.
- Monitor `stale` pool status. Set up an alert in your monitoring system for pools that go stale; this usually indicates a networking issue.
- Route different event types to different systems. Send security-relevant events (auth failures, rate limits) to your SIEM for compliance, and operational events to your monitoring stack for alerting.
- Check pool health after config changes. After updating pool configuration, use force-sync and verify the pool returns to `Active` status.