
Observability and SIEM Integration

The Cequence AI Gateway provides full visibility into what your AI agents are doing — every tool call, every authentication decision, and every policy enforcement action. This guide covers how to monitor activity, export events to your SIEM, and track pool health.

Quick Start

| I want to... | Jump to |
| --- | --- |
| Understand what gets logged | What gets logged |
| Export events to Splunk, Datadog, or another SIEM | SIEM export configuration |
| Monitor private pool health | Pool health monitoring |
| Troubleshoot blocked or failed requests | Troubleshooting with audit events |

What Gets Logged

The AI Gateway captures two categories of events:

Tool Activity Events

Emitted for every MCP tool invocation — whether it was allowed, blocked, or failed. These answer the question: "What did this agent do, and was it allowed?"

Each event includes:

| Field | Description |
| --- | --- |
| Tool name | Which tool was called (for example, `list_contacts`, `send_message`) |
| MCP server | Which MCP server handled the call |
| User identity | Email or user ID of the authenticated agent (from JWT/SSO) |
| Client IP | IP address of the calling agent |
| Status | `success`, `error`, `blocked`, or `rate_limited` |
| Action | `allowed` or `blocked` |
| Reason | Why the request was blocked (for example, `rate_limit_exceeded`, `authz_denied`) |
| Duration | Total round-trip time and upstream processing time (in milliseconds) |
| Timestamp | When the event occurred |
| Request ID | Unique identifier for correlating related events |
| Session ID | MCP session identifier |
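As an illustration, a blocked tool call might produce an event like the following. The field names follow the Event Fields Reference later in this guide; the exact JSON layout and values here are illustrative and may differ in your deployment.

```python
# Illustrative tool activity event for a rate-limited call.
# Field names come from the Event Fields Reference section;
# the values are placeholders.
event = {
    "timestamp": "2025-01-15T10:32:07Z",
    "request.id": "req-7f3a9c",
    "tool.name": "list_contacts",
    "mcp.server_name": "crm-server",
    "user.email": "agent@example.com",
    "client.ip": "203.0.113.10",
    "status": "rate_limited",
    "action": "blocked",
    "action.reason": "rate_limit_exceeded",
    "duration.total_ms": 12,
    "duration.upstream_ms": 0,   # blocked before reaching the upstream
    "http.status_code": 429,
}
```

Note that `status` and `action` answer different questions: `status` describes the outcome of the call, while `action` records the gateway's allow/block decision and `action.reason` explains it.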

Operational Events

Infrastructure and security events from the gateway and Operator. These answer the question: "Is the data plane healthy? What changed?"

| Event Category | Examples |
| --- | --- |
| Authentication | Login failures, expired tokens, invalid credentials |
| Authorization | Access denied for tool or persona |
| Rate Limiting | Limit exceeded, with current count and configured limit |
| Routing | No matching route, host mismatch |
| Upstream Errors | Backend failures, connection timeouts |
| Circuit Breaker | Circuit opened/closed for a backend |
| Configuration | Config updated from control plane, version changes |
| Component Lifecycle | Armor deployed, Redis ready, SIEM exporter started |
| Scaling | Replica count mismatches detected |
| Reconciliation | Operator reconciliation cycle completed with summary |

SIEM Export Configuration

You can forward audit events to your existing SIEM or observability platform. The AI Gateway supports multiple export destinations simultaneously.

Supported Destinations

| Destination | Protocol | Use case |
| --- | --- | --- |
| Splunk | HTTP Event Collector (HEC) | Enterprise SIEM with Splunk |
| Datadog | Datadog API | Monitoring with Datadog |
| OTLP | OpenTelemetry Protocol (gRPC or HTTP) | Any OTLP-compatible backend |
| Syslog | TCP, UDP, or TLS | Traditional syslog infrastructure |

Setting Up SIEM Export

SIEM export is configured at the pool level for private cloud deployments. Contact your Cequence administrator or use the pool configuration API to set up export destinations.

What you'll need from your IT team:

| Destination | Required information |
| --- | --- |
| Splunk | HEC endpoint URL, HEC token, index name, source/sourcetype |
| Datadog | Datadog site (for example, datadoghq.com), API key |
| OTLP | Endpoint URL, protocol (gRPC or HTTP), any required headers |
| Syslog | Server endpoint, port, protocol (TCP/UDP), TLS certificates if applicable |
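For Splunk, the exporter wraps each audit event in the standard HEC JSON envelope before sending it to your HEC endpoint. A sketch of that envelope follows; the index, sourcetype, and source values are placeholders, and the gateway builds this payload for you once export is configured.

```python
import json

def build_hec_payload(event: dict, index: str, sourcetype: str) -> dict:
    """Wrap an audit event in the standard Splunk HEC JSON envelope.
    Shown for illustration only; in practice you just supply the
    index and sourcetype in the pool's export configuration."""
    return {
        "event": event,                  # the audit event itself
        "index": index,                  # target Splunk index
        "sourcetype": sourcetype,        # e.g. a per-event-type sourcetype
        "source": "cequence-ai-gateway", # hypothetical source name
    }

payload = build_hec_payload(
    {"tool.name": "send_message", "status": "success"},
    index="ai_gateway_audit",
    sourcetype="cequence:tool_activity",
)
print(json.dumps(payload))
```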

Routing Events by Type

You can route different event types to different destinations. For example:

  • Tool activity events → Splunk (for compliance and audit)
  • Operational events → Datadog (for infrastructure monitoring)
  • Metrics → Your Prometheus-compatible backend

This lets you send high-volume tool activity to your SIEM for compliance while routing operational alerts to your monitoring stack.
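The routing model can be pictured as a simple mapping from event type to destination. The event-type names and destination labels below are placeholders for illustration, not gateway configuration syntax.

```python
# Hypothetical routing table mirroring the example above: tool activity
# to Splunk for compliance, operational events to Datadog for alerting,
# metrics to a Prometheus-compatible backend.
ROUTES = {
    "tool_activity": "splunk",
    "operational": "datadog",
    "metrics": "prometheus",
}

def destination_for(event_type: str) -> str:
    # Fall back to the pool's default SIEM for unknown event types.
    return ROUTES.get(event_type, "splunk")

print(destination_for("operational"))  # datadog
```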

Per-MCP Server Overrides

For specific MCP servers that require different logging rules (for example, a high-security financial integration), you can override the default SIEM settings:

  • Route events to a specific Splunk index
  • Enable or disable logging for that MCP server
  • Send events to different destinations than the pool default

Pool Health Monitoring

For private cloud deployments, the AI Gateway provides several ways to monitor pool health.

Portal Dashboard

The Private Cloud page in the portal shows health at a glance:

| Status | What it means | Action needed |
| --- | --- | --- |
| Active | Operator connected, heartbeat recent | None |
| Pending | Operator not yet installed or connecting | Complete the deployment (see the deployment guide) |
| Stale | No heartbeat in 30+ minutes | Check Operator pod health and network connectivity |
| Error | Operator reporting an unhealthy state | Check Operator logs for errors |

Select a pool to see detailed status including:

  • Operator version and Kubernetes cluster version
  • MCP server count — total deployed and how many are healthy
  • Last heartbeat — when the Operator last reported
  • Component health — status of Armor, Redis, and other services

Pool Operations

When health checks reveal issues, you can take action:

| Operation | When to use | How |
| --- | --- | --- |
| Force sync | Config changes aren't being picked up | Select Force Sync on the pool detail page |
| Reseed | Pool config is out of sync with the control plane | Select Reseed on the pool detail page |
| View Armor config | Verify what's actually running | Select View Config on the pool detail page |

Troubleshooting with Audit Events

Audit events are your primary tool for understanding why requests succeed or fail. Here are common scenarios and what to look for:

"My agent can't access a tool"

Look for events with status `blocked`:

| Reason | Meaning | Fix |
| --- | --- | --- |
| `authz_denied` | Agent doesn't have permission for this tool | Check persona tool assignments and team membership |
| `rate_limit_exceeded` | Rate limit hit for this tool | Wait for the window to reset, or increase the limit |
| `auth_denied` | Agent credential is invalid or expired | Re-authenticate or generate a new access key |
| `interceptor_denied` | A security policy blocked the request | Review the configured security interceptors |
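Once you've pulled the blocked events out of your SIEM, a quick tally by `action.reason` usually points straight at the fix. A minimal sketch, assuming events use the flat field names from the Event Fields Reference below:

```python
from collections import Counter

def blocked_reasons(events):
    """Count action.reason across blocked events: a quick first pass
    when an agent reports it can't access a tool."""
    return Counter(
        e.get("action.reason", "unknown")
        for e in events
        if e.get("action") == "blocked"
    )

events = [
    {"action": "blocked", "action.reason": "authz_denied"},
    {"action": "blocked", "action.reason": "rate_limit_exceeded"},
    {"action": "allowed"},
    {"action": "blocked", "action.reason": "authz_denied"},
]
print(blocked_reasons(events))
# Counter({'authz_denied': 2, 'rate_limit_exceeded': 1})
```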

"Requests are slow or timing out"

Look for events with high `duration.total_ms` values:

| Indicator | Meaning | Fix |
| --- | --- | --- |
| High `duration.upstream_ms` | The upstream app is slow | Check the upstream application's health |
| `circuit_breaker_open` events | Gateway stopped forwarding to a failing backend | Upstream is down; wait for recovery or check the service |
| `upstream_error` events | Backend returning errors | Check upstream service logs and connectivity |
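Because each event carries both `duration.total_ms` and `duration.upstream_ms`, you can split total latency into upstream time and everything else (gateway processing plus the network path in front of the backend):

```python
def gateway_overhead_ms(event: dict) -> float:
    """Time spent outside the upstream app. A high upstream share
    points at the backend; a high overhead share points at the
    gateway or the network path in front of it."""
    return event["duration.total_ms"] - event["duration.upstream_ms"]

# Illustrative slow request: almost all time is upstream processing.
event = {"duration.total_ms": 950, "duration.upstream_ms": 900}
print(gateway_overhead_ms(event))  # 50
```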

"The pool shows as stale"

The Operator hasn't reported a heartbeat in over 30 minutes:

  1. Check whether the Operator pod is running: `kubectl get pods -n <namespace> -l app=ai-gateway-operator`
  2. Check Operator logs for connectivity errors: `kubectl logs -n <namespace> -l app=ai-gateway-operator`
  3. Verify that outbound HTTPS access to the Cequence control plane is not blocked
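If you export heartbeat timestamps to your own monitoring system, the same 30-minute threshold can be checked programmatically. A minimal sketch, assuming only that your monitoring stack records the Operator's last heartbeat time:

```python
from datetime import datetime, timedelta, timezone

# Mirrors the portal's "Stale" threshold: no heartbeat in 30+ minutes.
STALE_AFTER = timedelta(minutes=30)

def is_stale(last_heartbeat, now=None):
    """Return True if the Operator's last heartbeat is older than
    the staleness threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > STALE_AFTER

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2025, 1, 15, 11, 20, tzinfo=timezone.utc), now))  # True (40 min ago)
print(is_stale(datetime(2025, 1, 15, 11, 50, tzinfo=timezone.utc), now))  # False (10 min ago)
```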

Event Fields Reference

Tool Activity Events — Key Fields

| Field | Type | Description |
| --- | --- | --- |
| `timestamp` | datetime | When the event occurred |
| `request.id` | string | Unique request identifier for correlation |
| `tool.name` | string | Name of the tool called |
| `mcp.server_name` | string | MCP server that handled the call |
| `user.email` | string | Authenticated user's email |
| `client.ip` | string | Client IP address |
| `status` | string | `success`, `error`, `blocked`, or `rate_limited` |
| `action` | string | `allowed` or `blocked` |
| `action.reason` | string | Reason for blocking (if blocked) |
| `duration.total_ms` | number | Total request duration in milliseconds |
| `duration.upstream_ms` | number | Upstream processing time in milliseconds |
| `http.status_code` | number | HTTP status code returned |

Common Operational Events

| Event | Severity | Description |
| --- | --- | --- |
| `auth_denied` | WARN | Authentication failed (invalid or expired credential) |
| `access_control_denied` | WARN | Authorization check failed (wrong team/role) |
| `rate_limit_exceeded` | WARN | Rate limit policy exceeded |
| `circuit_breaker_open` | WARN | Backend circuit breaker tripped |
| `upstream_error` | ERROR | Backend returned an error |
| `outbound_auth_failed` | ERROR | Failed to obtain upstream credentials |
| `config_change` | INFO | Configuration updated from the control plane |
| `component_lifecycle` | INFO | Component deployed, ready, or degraded |
| `reconciliation_complete` | INFO | Operator completed a reconciliation cycle |

Tips

  • Start with tool activity events. These give you the most immediately useful information — who's using what, and whether anything is being blocked.
  • Set up alerts for `circuit_breaker_open` events. These indicate an upstream service is failing, which often requires attention.
  • Use request IDs for correlation. When troubleshooting a specific failure, search your SIEM for the `request.id` to find all related events in the chain.
  • Monitor stale pool status. Set up an alert in your monitoring system for pools that go stale — this usually indicates a networking issue.
  • Route different event types to different systems. Send security-relevant events (auth failures, rate limits) to your SIEM for compliance, and operational events to your monitoring stack for alerting.
  • Check pool health after config changes. After updating pool configuration, use force sync and verify the pool returns to Active status.
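As a sketch of the correlation tip above: grouping exported events by `request.id` reconstructs the full chain for a single call (the sample events below are placeholders).

```python
from collections import defaultdict

def by_request_id(events):
    """Group events by request.id so one failed call's tool activity
    and operational events can be read together as a single chain."""
    chains = defaultdict(list)
    for e in events:
        chains[e.get("request.id", "unknown")].append(e)
    return dict(chains)

events = [
    {"request.id": "req-1", "status": "blocked"},
    {"request.id": "req-1", "event": "rate_limit_exceeded"},
    {"request.id": "req-2", "status": "success"},
]
chains = by_request_id(events)
print(len(chains["req-1"]))  # 2
```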