
Observability and SIEM Integration

The Cequence AI Gateway provides full visibility into what your AI agents are doing — every tool call, every authentication decision, and every policy enforcement action. This guide covers how to monitor activity, export events to your SIEM, and track pool health.

Quick Start

| I want to... | Jump to |
| --- | --- |
| Understand what gets logged | What gets logged |
| Export events to Splunk, Datadog, or another SIEM | SIEM export configuration |
| Monitor private pool health | Pool health monitoring |
| Troubleshoot blocked or failed requests | Troubleshooting with audit events |

What Gets Logged

The AI Gateway captures two categories of events:

Tool Activity Events

Emitted for every MCP tool invocation — whether it was allowed, blocked, or failed. These answer the question: "What did this agent do, and was it allowed?"

Each event includes:

| Field | Description |
| --- | --- |
| Tool name | Which tool was called (for example, `list_contacts`, `send_message`) |
| MCP server | Which MCP server handled the call |
| User identity | Email or user ID of the authenticated agent (from JWT/SSO) |
| Client IP | IP address of the calling agent |
| Status | `success`, `error`, `blocked`, or `rate_limited` |
| Action | `allowed` or `blocked` |
| Reason | Why the request was blocked (for example, `rate_limit_exceeded`, `authz_denied`) |
| Duration | Total round-trip time and upstream processing time (in milliseconds) |
| Timestamp | When the event occurred |
| Request ID | Unique identifier for correlating related events |
| Session ID | MCP session identifier |
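As an illustration, a blocked tool call might produce an event like the following. The field names follow the Event Fields Reference later in this guide; the exact JSON layout and values here are illustrative and may differ in your deployment.

```python
# Illustrative tool activity event for a rate-limited call.
# Field names come from the Event Fields Reference section;
# the values are placeholders.
event = {
    "timestamp": "2025-01-15T10:32:07Z",
    "request.id": "req-7f3a9c",
    "tool.name": "list_contacts",
    "mcp.server_name": "crm-server",
    "user.email": "agent@example.com",
    "client.ip": "203.0.113.10",
    "status": "rate_limited",
    "action": "blocked",
    "action.reason": "rate_limit_exceeded",
    "duration.total_ms": 12,
    "duration.upstream_ms": 0,   # blocked before reaching the upstream
    "http.status_code": 429,
}
```

Note that `status` and `action` answer different questions: `status` describes the outcome of the call, while `action` records the gateway's allow/block decision and `action.reason` explains it.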

Operational Events

Infrastructure and security events from the gateway and Operator. These answer the question: "Is the data plane healthy? What changed?"

| Event Category | Examples |
| --- | --- |
| Authentication | Login failures, expired tokens, invalid credentials |
| Authorization | Access denied for tool or persona |
| Rate Limiting | Limit exceeded, with current count and configured limit |
| Routing | No matching route, host mismatch |
| Upstream Errors | Backend failures, connection timeouts |
| Circuit Breaker | Circuit opened/closed for a backend |
| Configuration | Config updated from control plane, version changes |
| Component Lifecycle | Armor deployed, Redis ready, SIEM exporter started |
| Scaling | Replica count mismatches detected |
| Reconciliation | Operator reconciliation cycle completed with summary |

SIEM Export Configuration

You can forward audit events to your existing SIEM or observability platform. The AI Gateway supports multiple export destinations simultaneously.

Supported Destinations

| Destination | Protocol | Use case |
| --- | --- | --- |
| Splunk | HTTP Event Collector (HEC) | Enterprise SIEM with Splunk |
| Datadog | Datadog API | Monitoring with Datadog |
| OTLP | OpenTelemetry Protocol (gRPC or HTTP) | Any OTLP-compatible backend |
| Syslog | TCP, UDP, or TLS | Traditional syslog infrastructure |

Setting Up SIEM Export

SIEM export is configured at the pool level for private cloud deployments. Contact your Cequence administrator or use the pool configuration API to set up export destinations.

What you'll need from your IT team:

| Destination | Required information |
| --- | --- |
| Splunk | HEC endpoint URL, HEC token, index name, source/sourcetype |
| Datadog | Datadog site (for example, datadoghq.com), API key |
| OTLP | Endpoint URL, protocol (gRPC or HTTP), any required headers |
| Syslog | Server endpoint, port, protocol (TCP/UDP), TLS certificates if applicable |
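For Splunk, the exporter wraps each audit event in the standard HEC JSON envelope before sending it to your HEC endpoint. A sketch of that envelope follows; the index, sourcetype, and source values are placeholders, and the gateway builds this payload for you once export is configured.

```python
import json

def build_hec_payload(event: dict, index: str, sourcetype: str) -> dict:
    """Wrap an audit event in the standard Splunk HEC JSON envelope.
    Shown for illustration only; in practice you just supply the
    index and sourcetype in the pool's export configuration."""
    return {
        "event": event,                  # the audit event itself
        "index": index,                  # target Splunk index
        "sourcetype": sourcetype,        # e.g. a per-event-type sourcetype
        "source": "cequence-ai-gateway", # hypothetical source name
    }

payload = build_hec_payload(
    {"tool.name": "send_message", "status": "success"},
    index="ai_gateway_audit",
    sourcetype="cequence:tool_activity",
)
print(json.dumps(payload))
```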

Routing Events by Type

You can route different event types to different destinations. For example:

  • Tool activity events → Splunk (for compliance and audit)
  • Operational events → Datadog (for infrastructure monitoring)
  • Metrics → Your Prometheus-compatible backend

This lets you send high-volume tool activity to your SIEM for compliance while routing operational alerts to your monitoring stack.
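The routing model can be pictured as a simple mapping from event type to destination. The event-type names and destination labels below are placeholders for illustration, not gateway configuration syntax.

```python
# Hypothetical routing table mirroring the example above: tool activity
# to Splunk for compliance, operational events to Datadog for alerting,
# metrics to a Prometheus-compatible backend.
ROUTES = {
    "tool_activity": "splunk",
    "operational": "datadog",
    "metrics": "prometheus",
}

def destination_for(event_type: str) -> str:
    # Fall back to the pool's default SIEM for unknown event types.
    return ROUTES.get(event_type, "splunk")

print(destination_for("operational"))  # datadog
```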

Per-MCP Server Overrides

For specific MCP servers that require different logging rules (for example, a high-security financial integration), you can override the default SIEM settings:

  • Route events to a specific Splunk index
  • Enable or disable logging for that MCP server
  • Send events to different destinations than the pool default

Pool Health Monitoring

For private cloud deployments, the AI Gateway provides several ways to monitor pool health.

Portal Dashboard

The Private Cloud page in the portal shows health at a glance:

| Status | What it means | Action needed |
| --- | --- | --- |
| Active | Operator connected, heartbeat recent | None |
| Pending | Operator not yet installed or connecting | Complete the deployment (see the deployment guide) |
| Stale | No heartbeat in 30+ minutes | Check Operator pod health and network connectivity |
| Error | Operator reporting an unhealthy state | Check Operator logs for errors |

Select a pool to see detailed status including:

  • Operator version and Kubernetes cluster version
  • MCP server count — total deployed and how many are healthy
  • Last heartbeat — when the Operator last reported
  • Component health — status of Armor, Redis, and other services

Pool Operations

When health checks reveal issues, you can take action:

| Operation | When to use | How |
| --- | --- | --- |
| Force sync | Config changes aren't being picked up | Select Force Sync on the pool detail page |
| Reseed | Pool config is out of sync with the control plane | Select Reseed on the pool detail page |
| View Armor config | Verify what's actually running | Select View Config on the pool detail page |

Troubleshooting with Audit Events

Audit events are your primary tool for understanding why requests succeed or fail. Here are common scenarios and what to look for:

"My agent can't access a tool"

Look for events with status `blocked`:

| Reason | Meaning | Fix |
| --- | --- | --- |
| `authz_denied` | Agent doesn't have permission for this tool | Check persona tool assignments and team membership |
| `rate_limit_exceeded` | Rate limit hit for this tool | Wait for the window to reset, or increase the limit |
| `auth_denied` | Agent credential is invalid or expired | Re-authenticate or generate a new access key |
| `interceptor_denied` | A security policy blocked the request | Review the configured security interceptors |
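Once you've pulled the blocked events out of your SIEM, a quick tally by `action.reason` usually points straight at the fix. A minimal sketch, assuming events use the flat field names from the Event Fields Reference below:

```python
from collections import Counter

def blocked_reasons(events):
    """Count action.reason across blocked events: a quick first pass
    when an agent reports it can't access a tool."""
    return Counter(
        e.get("action.reason", "unknown")
        for e in events
        if e.get("action") == "blocked"
    )

events = [
    {"action": "blocked", "action.reason": "authz_denied"},
    {"action": "blocked", "action.reason": "rate_limit_exceeded"},
    {"action": "allowed"},
    {"action": "blocked", "action.reason": "authz_denied"},
]
print(blocked_reasons(events))
# Counter({'authz_denied': 2, 'rate_limit_exceeded': 1})
```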

"Requests are slow or timing out"

Look for events with high `duration.total_ms` values:

| Indicator | Meaning | Fix |
| --- | --- | --- |
| High `duration.upstream_ms` | The upstream app is slow | Check the upstream application's health |
| `circuit_breaker_open` events | Gateway stopped forwarding to a failing backend | Upstream is down; wait for recovery or check the service |
| `upstream_error` events | Backend returning errors | Check upstream service logs and connectivity |
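Because each event carries both `duration.total_ms` and `duration.upstream_ms`, you can split total latency into upstream time and everything else (gateway processing plus the network path in front of the backend):

```python
def gateway_overhead_ms(event: dict) -> float:
    """Time spent outside the upstream app. A high upstream share
    points at the backend; a high overhead share points at the
    gateway or the network path in front of it."""
    return event["duration.total_ms"] - event["duration.upstream_ms"]

# Illustrative slow request: almost all time is upstream processing.
event = {"duration.total_ms": 950, "duration.upstream_ms": 900}
print(gateway_overhead_ms(event))  # 50
```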

"The pool shows as stale"

The Operator hasn't reported a heartbeat in over 30 minutes:

  1. Check whether the Operator pod is running: `kubectl get pods -n <namespace> -l app=ai-gateway-operator`
  2. Check Operator logs for connectivity errors: `kubectl logs -n <namespace> -l app=ai-gateway-operator`
  3. Verify that outbound HTTPS access to the Cequence control plane is not blocked
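If you export heartbeat timestamps to your own monitoring system, the same 30-minute threshold can be checked programmatically. A minimal sketch, assuming only that your monitoring stack records the Operator's last heartbeat time:

```python
from datetime import datetime, timedelta, timezone

# Mirrors the portal's "Stale" threshold: no heartbeat in 30+ minutes.
STALE_AFTER = timedelta(minutes=30)

def is_stale(last_heartbeat, now=None):
    """Return True if the Operator's last heartbeat is older than
    the staleness threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > STALE_AFTER

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2025, 1, 15, 11, 20, tzinfo=timezone.utc), now))  # True (40 min ago)
print(is_stale(datetime(2025, 1, 15, 11, 50, tzinfo=timezone.utc), now))  # False (10 min ago)
```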

Event Fields Reference

Tool Activity Events — Key Fields

| Field | Type | Description |
| --- | --- | --- |
| `timestamp` | datetime | When the event occurred |
| `request.id` | string | Unique request identifier for correlation |
| `tool.name` | string | Name of the tool called |
| `mcp.server_name` | string | MCP server that handled the call |
| `user.email` | string | Authenticated user's email |
| `client.ip` | string | Client IP address |
| `status` | string | `success`, `error`, `blocked`, or `rate_limited` |
| `action` | string | `allowed` or `blocked` |
| `action.reason` | string | Reason for blocking (if blocked) |
| `duration.total_ms` | number | Total request duration in milliseconds |
| `duration.upstream_ms` | number | Upstream processing time in milliseconds |
| `http.status_code` | number | HTTP status code returned |

Common Operational Events

| Event | Severity | Description |
| --- | --- | --- |
| `auth_denied` | WARN | Authentication failed (invalid or expired credential) |
| `access_control_denied` | WARN | Authorization check failed (wrong team/role) |
| `rate_limit_exceeded` | WARN | Rate limit policy exceeded |
| `circuit_breaker_open` | WARN | Backend circuit breaker tripped |
| `upstream_error` | ERROR | Backend returned an error |
| `outbound_auth_failed` | ERROR | Failed to obtain upstream credentials |
| `config_change` | INFO | Configuration updated from the control plane |
| `component_lifecycle` | INFO | Component deployed, ready, or degraded |
| `reconciliation_complete` | INFO | Operator completed a reconciliation cycle |

Tips

  • Start with tool activity events. These give you the most immediately useful information — who's using what, and whether anything is being blocked.
  • Set up alerts for `circuit_breaker_open` events. These indicate an upstream service is failing, which often requires attention.
  • Use request IDs for correlation. When troubleshooting a specific failure, search your SIEM for the `request.id` to find all related events in the chain.
  • Monitor stale pool status. Set up an alert in your monitoring system for pools that go stale — this usually indicates a networking issue.
  • Route different event types to different systems. Send security-relevant events (auth failures, rate limits) to your SIEM for compliance, and operational events to your monitoring stack for alerting.
  • Check pool health after config changes. After updating pool configuration, use force sync and verify the pool returns to Active status.
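As a sketch of the correlation tip above: grouping exported events by `request.id` reconstructs the full chain for a single call (the sample events below are placeholders).

```python
from collections import defaultdict

def by_request_id(events):
    """Group events by request.id so one failed call's tool activity
    and operational events can be read together as a single chain."""
    chains = defaultdict(list)
    for e in events:
        chains[e.get("request.id", "unknown")].append(e)
    return dict(chains)

events = [
    {"request.id": "req-1", "status": "blocked"},
    {"request.id": "req-1", "event": "rate_limit_exceeded"},
    {"request.id": "req-2", "status": "success"},
]
chains = by_request_id(events)
print(len(chains["req-1"]))  # 2
```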