Datadog MCP Server
Create a Model Context Protocol (MCP) server for Datadog with the Cequence AI Gateway. This guide walks you through generating Datadog credentials, configuring the server with API key authentication, and deploying it.
About Datadog API
Datadog is a monitoring and observability platform for cloud-scale applications that provides visibility across your technology stack. The Datadog API enables programmatic access to:
- Infrastructure Monitoring: Metrics from servers, containers, and cloud services
- Application Performance Monitoring (APM): End-to-end request tracing
- Log Management: Centralized log aggregation and analysis
- Real User Monitoring (RUM): Frontend performance tracking
- Security Monitoring: Threat detection and compliance
- Synthetic Monitoring: Proactive testing and alerting
- Network Performance Monitoring: Traffic flow analysis
- Incident Management: Alert orchestration and response
Key Features
- REST API v2: Modern API with enhanced capabilities
- Multi-Region Support: US1, EU1, US3, US5, AP1, US1-FED (FedRAMP)
- Rate Limiting: 300-1000 requests/hour depending on endpoint; the limit in effect is reported in the X-RateLimit-* response headers
- Real-time Metrics: Sub-minute metric resolution
- Log Streaming: Real-time log ingestion
- Custom Metrics: Business-specific KPIs
- Service Map: Automatic dependency discovery
- Anomaly Detection: AI-powered insights
What You Can Do with Datadog MCP Server
The MCP server exposes Datadog's API through a natural language interface. AI agents can then handle requests such as:
Metrics & Monitoring
Metric Operations
- "Show CPU usage for production servers in the last hour"
- "Alert when memory exceeds 80% on any database server"
- "Track custom business metrics like checkout conversion rate"
- "Compare performance metrics between releases"
Dashboard Management
- "Create a dashboard for API performance metrics"
- "Clone the production dashboard for staging environment"
- "Update SLO dashboard with new service endpoints"
- "Generate executive dashboard with key business metrics"
Anomaly Detection
- "Find anomalies in request latency over the past week"
- "Detect unusual spikes in error rates"
- "Identify outliers in database query times"
- "Alert on abnormal user behavior patterns"
Log Management
Log Analysis
- "Search for all 500 errors in the payment service"
- "Find logs containing user ID 12345 from yesterday"
- "Show authentication failures in the last 24 hours"
- "Correlate error logs with deployment events"
Log Patterns
- "Identify common error patterns in application logs"
- "Group similar log messages automatically"
- "Extract fields from unstructured log data"
- "Create parsing rules for custom log formats"
Log Archives
- "Archive logs older than 30 days to S3"
- "Rehydrate logs from last month for investigation"
- "Set retention policies by log source"
- "Calculate log storage costs by service"
Alerting & Incidents
Alert Configuration
- "Create alert when API response time exceeds 500ms"
- "Set up composite alerts for multi-condition scenarios"
- "Configure anomaly-based alerts for traffic patterns"
- "Build SLA alerts for critical services"
Incident Management
- "Create incident for current production outage"
- "Assign incident to on-call engineer"
- "Update incident status and add timeline events"
- "Generate post-mortem report with metrics"
Alert Routing
- "Route database alerts to DBA team"
- "Escalate P1 alerts to management after 15 minutes"
- "Suppress alerts during maintenance windows"
- "Configure alert fatigue reduction rules"
APM & Tracing
Service Performance
- "Show slowest endpoints in the user service"
- "Trace requests through microservice architecture"
- "Identify bottlenecks in database queries"
- "Compare latency across different regions"
Error Tracking
- "Find most frequent errors by service"
- "Track error rates after deployments"
- "Identify error patterns by user segment"
- "Correlate errors with infrastructure issues"
Dependency Mapping
- "Show all services dependent on the auth service"
- "Identify critical path for checkout flow"
- "Map database connections by service"
- "Visualize API call patterns"
Security Monitoring
Threat Detection
- "Show all security signals from the last hour"
- "Detect brute force login attempts"
- "Monitor for suspicious API usage patterns"
- "Track compliance violations"
Security Posture
- "Audit cloud resource configurations"
- "Find publicly exposed S3 buckets"
- "Check for unencrypted databases"
- "Monitor IAM permission changes"
Compliance Reporting
- "Generate PCI compliance report"
- "Track GDPR data access requests"
- "Monitor SOC2 control effectiveness"
- "Audit security group changes"
Automation & Integration
Workflow Automation
- "Auto-scale services based on metrics"
- "Trigger remediation scripts on alerts"
- "Update CMDB with discovered services"
- "Sync monitors with infrastructure changes"
Synthetic Monitoring
- "Create API tests for critical endpoints"
- "Monitor multi-step user journeys"
- "Test from multiple global locations"
- "Alert on availability drops"
Service Level Objectives
- "Define SLO for 99.9% uptime"
- "Track error budget consumption"
- "Alert when burning through error budget"
- "Generate SLO compliance reports"
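The SLO prompts above rest on simple error-budget arithmetic: a 99.9% target leaves 0.1% of the window as allowable downtime. The sketch below works that out; `errorBudgetMinutes` and `budgetConsumed` are illustrative names, and the 30-day window is an assumption rather than a Datadog default.

```javascript
// Allowed downtime, in minutes, for an availability target over a window.
function errorBudgetMinutes(targetPercent, windowDays = 30) {
  if (!(targetPercent > 0 && targetPercent < 100)) {
    throw new RangeError('targetPercent must be between 0 and 100 (exclusive)');
  }
  const windowMinutes = windowDays * 24 * 60;
  return ((100 - targetPercent) / 100) * windowMinutes;
}

// Fraction of the error budget already used by the observed downtime.
function budgetConsumed(downtimeMinutes, targetPercent, windowDays = 30) {
  return downtimeMinutes / errorBudgetMinutes(targetPercent, windowDays);
}

console.log(errorBudgetMinutes(99.9).toFixed(1)); // 43.2
console.log(budgetConsumed(10.8, 99.9).toFixed(2)); // 0.25
```

A burn alert would fire when the consumed fraction grows faster than the elapsed fraction of the window.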
Prerequisites
- Access to Cequence AI Gateway
- Datadog account with appropriate permissions
- API Key and Application Key from Datadog
- Your Datadog region (US1, EU1, etc.)
Step 1: Generate Datadog API Credentials
Before setting up the MCP server, you need to create API credentials in Datadog.
1.1 Access Datadog Organization Settings
- Log in to your Datadog account
- Navigate to Organization Settings (bottom left menu)
- Select API Keys under the Access section
1.2 Create API Key
- Click New Key
- Provide a descriptive name, for example "AI Gateway MCP Integration"
- Copy the generated API key (you'll need it in Step 7)
- Store it securely; you may not be able to view it again
1.3 Create Application Key
- Go to the Application Keys tab
- Click New Key
- Provide a descriptive name, for example "AI Gateway MCP App Key"
- Copy the generated application key
- Store it securely
1.4 Note Your Datadog Region
Your Datadog region determines the API endpoint:
- US1: api.datadoghq.com (default)
- EU1: api.datadoghq.eu
- US3: api.us3.datadoghq.com
- US5: api.us5.datadoghq.com
- AP1: api.ap1.datadoghq.com
- US1-FED: api.ddog-gov.com (FedRAMP)
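The region list above can be captured in a small lookup so the base URL is resolved in one place. This is a minimal sketch: `DATADOG_SITES` and `resolveBaseUrl` are illustrative names, not part of any Datadog or AI Gateway SDK.

```javascript
// Region-to-host table copied from the list above.
const DATADOG_SITES = {
  'US1': 'api.datadoghq.com',
  'EU1': 'api.datadoghq.eu',
  'US3': 'api.us3.datadoghq.com',
  'US5': 'api.us5.datadoghq.com',
  'AP1': 'api.ap1.datadoghq.com',
  'US1-FED': 'api.ddog-gov.com',
};

// Hypothetical helper: returns the HTTPS base URL for a region code.
// Falls back to US1 when no region is given, matching the documented default.
function resolveBaseUrl(region = 'US1') {
  const host = DATADOG_SITES[region.trim().toUpperCase()];
  if (!host) {
    throw new Error(`Unknown Datadog region: ${region}`);
  }
  return `https://${host}`;
}

console.log(resolveBaseUrl('eu1')); // https://api.datadoghq.eu
```

Failing loudly on an unknown region is deliberate: a request sent to the wrong site typically surfaces later as a 403, which is harder to diagnose.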
Step 2: Access AI Gateway Apps
- Log in to your Cequence AI Gateway dashboard
- Navigate to Apps in the left sidebar
- You'll see the list of available third-party applications
Step 3: Find and Select Datadog API
- In the Apps section, browse through the Third-party category
- Look for Datadog or use the search function
- Click on the Datadog API card to view details
The Datadog API card shows:
- Number of available endpoints
- Integration capabilities
- Quick description of functionality
Step 4: Create MCP Server
- Click the Create MCP Server button on the Datadog API card
- You'll be redirected to the MCP Server creation wizard
Step 5: Configure API Endpoints
In the App Configuration step:
- Base URL: Select your Datadog region endpoint (default: https://api.datadoghq.com, or choose the endpoint for your region from Step 1.4)
- Select the API endpoints to expose through your MCP server based on your needs
- Click Next to proceed
Step 6: MCP Server Basic Setup
Configure your MCP server details:
MCP Server Name: Enter a descriptive name
- Example: "Datadog Observability Platform"
- This name will identify your server in the dashboard
Description (Optional): Add details about the server's purpose
- Example: "Comprehensive monitoring and observability for production infrastructure"
Production Mode: Toggle based on your needs
- ON for production environments
- OFF for development/testing
Click Next to continue
Step 7: Configure Authentication
This is where you'll use your Datadog API credentials:
Authentication Type: Select API Key
Fill in the authentication details:
- API Key: Paste your Datadog API key
- Application Key: Paste your Datadog Application key
Additional Headers: The system automatically configures the following:
DD-API-KEY: {your-api-key}
DD-APPLICATION-KEY: {your-application-key}
Click Next to continue
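For reference, these two headers are the same ones Datadog expects on a direct API call. The sketch below builds, but does not send, a request to Datadog's key-validation endpoint (`GET /api/v1/validate`), which can help confirm a key before you paste it into the wizard. `buildAuthHeaders`, `buildValidateRequest`, and the environment variable names are assumptions for illustration.

```javascript
// Hypothetical helper: assembles the headers the gateway configures for you.
function buildAuthHeaders(apiKey, appKey) {
  if (!apiKey || !appKey) {
    throw new Error('Both an API key and an Application key are required');
  }
  return {
    'DD-API-KEY': apiKey,
    'DD-APPLICATION-KEY': appKey,
    'Accept': 'application/json',
  };
}

// Describes a key-validation request without sending it.
// baseUrl is your region endpoint, e.g. https://api.datadoghq.com.
function buildValidateRequest(baseUrl, apiKey, appKey) {
  return {
    method: 'GET',
    url: new URL('/api/v1/validate', baseUrl).toString(),
    headers: buildAuthHeaders(apiKey, appKey),
  };
}

const request = buildValidateRequest(
  'https://api.datadoghq.com',
  process.env.DD_API_KEY || 'example-api-key', // placeholder fallback for the demo
  process.env.DD_APP_KEY || 'example-app-key',
);
console.log(request.method, request.url);
```

On Node 18 or later the descriptor can be sent with the built-in `fetch(request.url, { method: request.method, headers: request.headers })`.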
Available Datadog API Capabilities
The Datadog MCP server provides access to comprehensive monitoring capabilities:
Metrics API
Query Metrics
- Retrieve time series data
- Aggregate metrics across tags
- Calculate rollups and transformations
- Access custom metrics
Submit Metrics
- Send custom metrics
- Batch metric submission
- Update metric metadata
- Configure metric units
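To make the query capabilities concrete, here is a sketch of how a Datadog metric query string is composed: an aggregator, a metric name, a tag scope, an optional grouping, and an optional rollup. `buildMetricQuery` and `lastHourWindow` are hypothetical helpers, not functions exposed by the MCP server.

```javascript
// Hypothetical helper: builds a Datadog metric query string of the form
//   <aggregator>:<metric>{<tags>} by {<groups>}.rollup(<method>, <seconds>)
function buildMetricQuery({ aggregator = 'avg', metric, tags = [], groupBy = [], rollup }) {
  if (!metric) throw new Error('metric is required');
  const scope = tags.length > 0 ? tags.join(',') : '*'; // '*' means all sources
  let query = `${aggregator}:${metric}{${scope}}`;
  if (groupBy.length > 0) query += ` by {${groupBy.join(',')}}`;
  if (rollup) query += `.rollup(${rollup.method}, ${rollup.seconds})`;
  return query;
}

// Time window covering the last hour, in Unix seconds.
function lastHourWindow(nowMs = Date.now()) {
  const to = Math.floor(nowMs / 1000);
  return { from: to - 3600, to };
}

console.log(buildMetricQuery({
  metric: 'system.cpu.user',
  tags: ['service:web-app'],
  groupBy: ['host'],
  rollup: { method: 'avg', seconds: 60 },
}));
// avg:system.cpu.user{service:web-app} by {host}.rollup(avg, 60)
```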
Logs API
Search Logs
- Query log events
- Aggregate log data
- Access log archives
- Configure log pipelines
Log Management
- Create parsing rules
- Manage indexes
- Configure retention
- Set up archives
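As an illustration of log search, the sketch below assembles a request body in the shape used by Datadog's v2 log search endpoint (`POST /api/v2/logs/events/search`): a `filter` with a query and a time range, a sort order, and a page size. `buildLogSearchBody` is a hypothetical helper, and the 1000-event ceiling reflects my understanding of the documented maximum page size.

```javascript
// Hypothetical helper: builds a body for POST /api/v2/logs/events/search.
// `from` and `to` accept Datadog date math such as 'now-1h' or ISO-8601 timestamps.
function buildLogSearchBody({ query, from = 'now-15m', to = 'now', limit = 50 }) {
  if (!query) throw new Error('query is required');
  if (!Number.isInteger(limit) || limit < 1 || limit > 1000) {
    throw new Error('limit must be an integer between 1 and 1000');
  }
  return {
    filter: { query, from, to },
    sort: '-timestamp', // newest events first
    page: { limit },
  };
}

const body = buildLogSearchBody({
  query: 'service:payment-api status:error',
  from: 'now-1h',
  limit: 100,
});
console.log(JSON.stringify(body, null, 2));
```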
Monitors API
Monitor Operations
- Create and update monitors
- Manage alert conditions
- Configure notifications
- Schedule downtimes
Monitor Groups
- Group related monitors
- Bulk operations
- Template management
- Tag-based organization
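One detail that trips people up when creating monitors is that the threshold in the query string has to agree with the critical threshold in the options, and a warning threshold has to trigger before the critical one. The following sketch, with the hypothetical helper `buildThresholdMonitor`, builds a metric-alert payload in the shape accepted by `POST /api/v1/monitor` and enforces both rules.

```javascript
// Hypothetical helper: builds a metric-alert monitor payload.
function buildThresholdMonitor({
  name, metricQuery, window = 'last_5m', comparator = '>',
  critical, warning, message = '', tags = [],
}) {
  if (!name || !metricQuery) throw new Error('name and metricQuery are required');
  if (typeof critical !== 'number') throw new Error('critical must be a number');
  if (warning !== undefined) {
    // For '>' alerts the warning sits below critical; for '<' alerts, above it.
    const ordered = comparator.startsWith('>') ? warning < critical : warning > critical;
    if (!ordered) throw new Error('warning threshold must trigger before critical');
  }
  const thresholds = { critical };
  if (warning !== undefined) thresholds.warning = warning;
  return {
    name,
    type: 'metric alert',
    // The number at the end of the query is the critical threshold.
    query: `avg(${window}):${metricQuery} ${comparator} ${critical}`,
    message,
    tags,
    options: { thresholds },
  };
}

const cpuMonitor = buildThresholdMonitor({
  name: 'High CPU on production hosts',
  metricQuery: 'avg:system.cpu.user{env:prod} by {host}',
  critical: 80,
  warning: 70,
  message: 'CPU is above 80% on {{host.name}}',
  tags: ['team:infrastructure'],
});
console.log(cpuMonitor.query);
// avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 80
```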
Dashboards API
Dashboard Management
- Create custom dashboards
- Clone and modify templates
- Share dashboards
- Schedule reports
Widget Configuration
- Time series graphs
- Query value displays
- Heat maps and distributions
- Service maps
Events API
Event Tracking
- Submit custom events
- Query event stream
- Correlate with metrics
- Tag and filter events
Service Management
APM Services
- Service dependencies
- Performance metrics
- Error tracking
- SLO management
Synthetic Tests
- API tests
- Browser tests
- Multi-step journeys
- Global locations
Step 8: Configure Security
Set up API protection features:
API Protection: Toggle ON to enable
- Protects against bot attacks, DDoS, and threats
- Monitors for suspicious activity
- Rate limiting and anomaly detection
Protection Features (when enabled):
- Auto-scaling protection
- Managed infrastructure
- Built-in monitoring
- Zero maintenance required
Click Next to continue
Step 9: Choose Deployment Method
Select your deployment preference:
Option A: Deploy to Cequence Cloud (Recommended)
- Fully managed deployment
- Automatic scaling and monitoring
- Built-in high availability
- Features included:
- Auto-scaling
- Managed infrastructure
- Built-in monitoring
- Zero maintenance
Option B: Deploy with Helm Chart
- Self-managed Kubernetes deployment
- Full control over infrastructure
- Requires:
- Kubernetes cluster
- Helm 3.x installed
- Container registry access
Click Next after selecting your deployment method.
Step 10: Review and Deploy
Review your MCP server configuration:
- MCP Server Name: Your chosen name
- Base URL: Your Datadog region endpoint
- Selected Endpoints: Number of endpoints selected
- Authentication: API Key (Configured)
- API Protection: Enabled/Disabled
- Deployment: Cequence Cloud or Helm
Click Create & Deploy to finalize the setup.
Step 11: Post-Deployment Setup
After successful deployment:
Note the MCP Server URL provided
Test the connection:
- Click "Test Connection"
- Confirm that it reports successful authentication
- Verify access to selected endpoints
Configure AI Agents:
- The MCP server is now available for AI agent connections
- Use the provided server URL in your AI agent configuration
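If you want to check the server from your own tooling rather than the dashboard, it helps to know what an MCP client sends first. MCP messages are JSON-RPC 2.0; a session opens with `initialize`, and `tools/list` then returns the tools the server exposes. The sketch below only constructs those two messages. The `jsonRpcRequest` helper, the client name, and the protocol version string are assumptions, and how the messages are transported to the gateway is not covered here.

```javascript
let nextId = 1;

// Hypothetical helper: wraps a method call in a JSON-RPC 2.0 request envelope.
function jsonRpcRequest(method, params) {
  const message = { jsonrpc: '2.0', id: nextId++, method };
  if (params !== undefined) message.params = params;
  return message;
}

// First message of an MCP session.
const initialize = jsonRpcRequest('initialize', {
  protocolVersion: '2024-11-05', // assumption: use a version your server supports
  capabilities: {},
  clientInfo: { name: 'example-client', version: '0.1.0' },
});

// Asks the server which tools (here, Datadog endpoints) it exposes.
const listTools = jsonRpcRequest('tools/list');

console.log(JSON.stringify(initialize));
console.log(JSON.stringify(listTools));
```

A `tools/list` response that includes the endpoints you selected in Step 5 is a reasonable sign that the deployment is wired up correctly.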
Using Your Datadog MCP Server
With Claude Desktop
Open Claude Desktop settings
Add your MCP server:
{
  "servers": {
    "datadog": {
      "url": "your-mcp-server-url",
      "auth": {
        "type": "api_key",
        "api_key": "your-encrypted-key"
      }
    }
  }
}
Start using natural language commands:
- "Show me CPU usage for web servers in the last hour"
- "Create an alert for high memory usage on database servers"
- "Find all error logs from the payment service today"
- "Generate a dashboard for API performance metrics"
- "What services are experiencing high latency right now?"
API Integration Example
// MCPClient is a placeholder for the MCP client library you use; import it as
// that library documents. Top-level await requires an ES module.

// Initialize MCP client
const mcpClient = new MCPClient({
  serverUrl: 'your-mcp-server-url',
  auth: {
    type: 'api_key',
    headers: {
      'DD-API-KEY': process.env.DD_API_KEY,
      'DD-APPLICATION-KEY': process.env.DD_APP_KEY
    }
  }
});

// Query metrics for the last hour (Datadog expects Unix timestamps in seconds)
const now = Math.floor(Date.now() / 1000);
const cpuMetrics = await mcpClient.datadog.metrics.query({
  query: 'avg:system.cpu.user{service:web-app}',
  from: now - 3600, // 1 hour ago
  to: now
});

// Search logs (time range in Datadog date-math syntax)
const errorLogs = await mcpClient.datadog.logs.search({
  query: 'service:payment-api status:error',
  time: {
    from: 'now-1h',
    to: 'now'
  },
  limit: 100
});

// Create monitor. system.mem.pct_usable is the fraction of usable memory,
// so a value below 0.1 means memory usage is above 90%.
const monitor = await mcpClient.datadog.monitors.create({
  type: 'metric alert',
  query: 'avg(last_5m):avg:system.mem.pct_usable{*} by {host} < 0.1',
  name: 'High Memory Usage Alert',
  message: 'Memory usage is above 90% on {{host.name}}',
  tags: ['team:infrastructure', 'severity:high'],
  options: {
    thresholds: {
      critical: 0.1, // must match the threshold in the query
      warning: 0.2
    },
    notify_no_data: true,
    notify_audit: false
  }
});

// Update dashboard
await mcpClient.datadog.dashboards.update({
  id: 'dashboard-id',
  title: 'Application Performance Dashboard',
  layout_type: 'ordered', // required by the Datadog dashboards API
  widgets: [{
    definition: {
      type: 'timeseries',
      requests: [{
        q: 'avg:trace.servlet.request.duration{*}',
        display_type: 'line'
      }],
      title: 'API Response Time'
    }
  }]
});
Common Use Cases
Infrastructure Monitoring
- Server health tracking
- Container orchestration metrics
- Cloud resource utilization
- Network performance analysis
- Cost optimization insights
Application Performance
- API latency monitoring
- Error rate tracking
- Database query optimization
- Service dependency mapping
- User journey analysis
Log Intelligence
- Centralized log analysis
- Error pattern detection
- Security event correlation
- Compliance auditing
- Troubleshooting workflows
Incident Response
- Automated alert creation
- On-call rotation management
- Incident timeline tracking
- Post-mortem generation
- SLA compliance reporting
Security Best Practices
API Key Security:
- Store keys in a secure vault
- Rotate keys regularly
- Use separate keys per environment
- Monitor key usage
Access Control:
- Limit API key permissions
- Use role-based access
- Audit API activity
- Implement IP allowlists
Rate Limiting:
- Monitor API usage
- Implement caching strategies
- Use batch operations
- Handle rate limit errors
Data Privacy:
- Mask sensitive data in logs
- Implement data retention policies
- Use log archives for compliance
- Encrypt data in transit
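Masking applies to your own client-side logging too, since request headers carry the Datadog keys. Below is a minimal sketch of two redaction helpers. The names `maskHeaders` and `maskSecrets` are illustrative, and the pattern assumes keys appear as 32- or 40-character hexadecimal strings, which may not hold for every credential type.

```javascript
const SENSITIVE_HEADERS = new Set(['dd-api-key', 'dd-application-key']);

// Returns a copy of the headers that is safe to write to a log.
function maskHeaders(headers) {
  const safe = {};
  for (const [name, value] of Object.entries(headers)) {
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '****' : value;
  }
  return safe;
}

// Assumption: keys appear as 32- or 40-character hexadecimal strings.
const KEY_PATTERN = /\b[a-f0-9]{32}(?:[a-f0-9]{8})?\b/gi;

// Replaces anything that looks like a key, keeping the first four characters
// so different keys can still be told apart in logs.
function maskSecrets(text) {
  return text.replace(KEY_PATTERN, (match) => `${match.slice(0, 4)}****`);
}

console.log(maskHeaders({ 'DD-API-KEY': 'secret', Accept: 'application/json' }));
console.log(maskSecrets('key=0123456789abcdef0123456789abcdef status=403'));
// key=0123**** status=403
```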
Troubleshooting
Common Issues
403 Forbidden
- Verify API key is active
- Check application key permissions
- Ensure correct region endpoint
- Validate IP allowlist settings
429 Rate Limited
- Check rate limit headers
- Implement exponential backoff
- Use more efficient queries
- Consider caching responses
400 Bad Request
- Validate query syntax
- Check time range format
- Verify tag formatting
- Review API documentation
No Data Returned
- Verify metric/log exists
- Check time range
- Validate tag filters
- Confirm the data is still within its retention period
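For the 429 case, the two suggestions about rate limit headers and exponential backoff combine naturally: double the wait on each retry, but never retry sooner than the server says the limit resets. The sketch below assumes a fetch-style response object and reads Datadog's `X-RateLimit-Reset` header (seconds until the limit resets). `backoffDelayMs` and `withRateLimitRetry` are hypothetical helpers, and the base delay, cap, and retry count are arbitrary starting points.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): exponential growth with a cap.
// If the server reported seconds-until-reset, wait at least that long.
function backoffDelayMs(attempt, resetSeconds, baseMs = 500, maxMs = 30000) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const reset = Number(resetSeconds);
  const fromHeader = Number.isFinite(reset) && reset > 0 ? reset * 1000 : 0;
  return Math.max(exponential, fromHeader);
}

// Retries `send` (any function returning a fetch-style Response) on HTTP 429.
// Returns the last response once it succeeds or the retries are exhausted.
async function withRateLimitRetry(send, { maxRetries = 4, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    const reset = response.headers.get('x-ratelimit-reset');
    await sleep(backoffDelayMs(attempt, reset));
  }
}

console.log(backoffDelayMs(0)); // 500
console.log(backoffDelayMs(3)); // 4000
console.log(backoffDelayMs(0, '12')); // 12000
```

In production you would usually add random jitter to the delay so that several clients do not retry in lockstep.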
Getting Help
- Documentation: AI Gateway Docs
- Support: support@cequence.ai
- Community: AI Gateway Forum
- Datadog Docs: docs.datadoghq.com
- Datadog Community: community.datadoghq.com