Datadog MCP Server

Create a Model Context Protocol (MCP) server for Datadog in minutes with the Cequence AI Gateway. This guide walks you through setting up the Datadog integration, including API key authentication and the gateway's security options.

About Datadog API

Datadog is a monitoring and observability platform for cloud-scale applications that provides visibility across your technology stack. The Datadog API enables programmatic access to:

  • Infrastructure Monitoring: Metrics from servers, containers, and cloud services
  • Application Performance Monitoring (APM): End-to-end request tracing
  • Log Management: Centralized log aggregation and analysis
  • Real User Monitoring (RUM): Frontend performance tracking
  • Security Monitoring: Threat detection and compliance
  • Synthetic Monitoring: Proactive testing and alerting
  • Network Performance Monitoring: Traffic flow analysis
  • Incident Management: Alert orchestration and response

Key Features

  • REST API v2: Modern API with enhanced capabilities
  • Multi-Region Support: US1, EU1, US3, US5, AP1, FedRAMP
  • Rate Limiting: Roughly 300-1000 requests/hour depending on endpoint; the rate limit headers on each response report the exact limit
  • Real-time Metrics: Sub-minute metric resolution
  • Log Streaming: Real-time log ingestion
  • Custom Metrics: Business-specific KPIs
  • Service Map: Automatic dependency discovery
  • Anomaly Detection: AI-powered insights

What You Can Do with Datadog MCP Server

The MCP server transforms Datadog's API into a natural language interface, enabling AI agents to:

Metrics & Monitoring

  • Metric Operations

    • "Show CPU usage for production servers in the last hour"
    • "Alert when memory exceeds 80% on any database server"
    • "Track custom business metrics like checkout conversion rate"
    • "Compare performance metrics between releases"
  • Dashboard Management

    • "Create a dashboard for API performance metrics"
    • "Clone the production dashboard for staging environment"
    • "Update SLO dashboard with new service endpoints"
    • "Generate executive dashboard with key business metrics"
  • Anomaly Detection

    • "Find anomalies in request latency over the past week"
    • "Detect unusual spikes in error rates"
    • "Identify outliers in database query times"
    • "Alert on abnormal user behavior patterns"

Log Management

  • Log Analysis

    • "Search for all 500 errors in the payment service"
    • "Find logs containing user ID 12345 from yesterday"
    • "Show authentication failures in the last 24 hours"
    • "Correlate error logs with deployment events"
  • Log Patterns

    • "Identify common error patterns in application logs"
    • "Group similar log messages automatically"
    • "Extract fields from unstructured log data"
    • "Create parsing rules for custom log formats"
  • Log Archives

    • "Archive logs older than 30 days to S3"
    • "Rehydrate logs from last month for investigation"
    • "Set retention policies by log source"
    • "Calculate log storage costs by service"

Alerting & Incidents

  • Alert Configuration

    • "Create alert when API response time exceeds 500ms"
    • "Set up composite alerts for multi-condition scenarios"
    • "Configure anomaly-based alerts for traffic patterns"
    • "Build SLA alerts for critical services"
  • Incident Management

    • "Create incident for current production outage"
    • "Assign incident to on-call engineer"
    • "Update incident status and add timeline events"
    • "Generate post-mortem report with metrics"
  • Alert Routing

    • "Route database alerts to DBA team"
    • "Escalate P1 alerts to management after 15 minutes"
    • "Suppress alerts during maintenance windows"
    • "Configure alert fatigue reduction rules"

APM & Tracing

  • Service Performance

    • "Show slowest endpoints in the user service"
    • "Trace requests through microservice architecture"
    • "Identify bottlenecks in database queries"
    • "Compare latency across different regions"
  • Error Tracking

    • "Find most frequent errors by service"
    • "Track error rates after deployments"
    • "Identify error patterns by user segment"
    • "Correlate errors with infrastructure issues"
  • Dependency Mapping

    • "Show all services dependent on the auth service"
    • "Identify critical path for checkout flow"
    • "Map database connections by service"
    • "Visualize API call patterns"

Security Monitoring

  • Threat Detection

    • "Show all security signals from the last hour"
    • "Detect brute force login attempts"
    • "Monitor for suspicious API usage patterns"
    • "Track compliance violations"
  • Security Posture

    • "Audit cloud resource configurations"
    • "Find publicly exposed S3 buckets"
    • "Check for unencrypted databases"
    • "Monitor IAM permission changes"
  • Compliance Reporting

    • "Generate PCI compliance report"
    • "Track GDPR data access requests"
    • "Monitor SOC2 control effectiveness"
    • "Audit security group changes"

Automation & Integration

  • Workflow Automation

    • "Auto-scale services based on metrics"
    • "Trigger remediation scripts on alerts"
    • "Update CMDB with discovered services"
    • "Sync monitors with infrastructure changes"
  • Synthetic Monitoring

    • "Create API tests for critical endpoints"
    • "Monitor multi-step user journeys"
    • "Test from multiple global locations"
    • "Alert on availability drops"
  • Service Level Objectives

    • "Define SLO for 99.9% uptime"
    • "Track error budget consumption"
    • "Alert when burning through error budget"
    • "Generate SLO compliance reports"
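The error budget behind prompts like "Define SLO for 99.9% uptime" is simple arithmetic, sketched below. The function names are hypothetical and are not part of Datadog or the gateway; the sketch only shows how an availability target translates into allowed downtime.

```javascript
// Allowed downtime, in minutes, for an availability target over a window.
function errorBudgetMinutes(sloPercent, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return ((100 - sloPercent) / 100) * windowMinutes;
}

// Fraction of the error budget consumed by the downtime observed so far.
function budgetConsumed(downtimeMinutes, sloPercent, windowDays) {
  return downtimeMinutes / errorBudgetMinutes(sloPercent, windowDays);
}

// A 99.9% target over 30 days leaves about 43.2 minutes of downtime.
console.log(errorBudgetMinutes(99.9, 30).toFixed(1)); // 43.2
// 10.8 minutes of downtime uses a quarter of that budget.
console.log(budgetConsumed(10.8, 99.9, 30).toFixed(2)); // 0.25
```

An alert on "burning through error budget" would typically fire when the consumed fraction grows faster than the elapsed fraction of the window.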

Prerequisites

  • Access to Cequence AI Gateway
  • Datadog account with appropriate permissions
  • API Key and Application Key from Datadog
  • Knowledge of which Datadog region hosts your account (US1, EU1, etc.)

Step 1: Generate Datadog API Credentials

Before setting up the MCP server, you need to create API credentials in Datadog.

1.1 Access Datadog Organization Settings

  1. Log in to your Datadog account
  2. Navigate to Organization Settings (bottom left menu)
  3. Select API Keys under the Access section

1.2 Create API Key

  1. Click New Key
  2. Provide a descriptive name:
    • Example: "AI Gateway MCP Integration"
  3. Copy the generated API key (you'll need this later)
  4. Store it securely, since you may not be able to view it again

1.3 Create Application Key

  1. Go to the Application Keys tab
  2. Click New Key
  3. Provide a descriptive name:
    • Example: "AI Gateway MCP App Key"
  4. Copy the generated application key
  5. Store it securely

1.4 Note Your Datadog Region

Your Datadog region determines the API endpoint:

  • US1: api.datadoghq.com (default)
  • EU1: api.datadoghq.eu
  • US3: api.us3.datadoghq.com
  • US5: api.us5.datadoghq.com
  • AP1: api.ap1.datadoghq.com
  • US1-FED: api.ddog-gov.com (FedRAMP)
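If you script against several Datadog organizations, the region list above can be kept as a lookup table. The following is a minimal sketch; `DATADOG_API_HOSTS` and `datadogBaseUrl` are illustrative names, not part of any Datadog or Cequence library.

```javascript
// Region identifiers and API hosts, copied from the list above.
const DATADOG_API_HOSTS = {
  US1: 'api.datadoghq.com',
  EU1: 'api.datadoghq.eu',
  US3: 'api.us3.datadoghq.com',
  US5: 'api.us5.datadoghq.com',
  AP1: 'api.ap1.datadoghq.com',
  'US1-FED': 'api.ddog-gov.com',
};

// Hypothetical helper: returns the HTTPS base URL for a region,
// falling back to US1 (the default) when no region is given.
function datadogBaseUrl(region = 'US1') {
  const host = DATADOG_API_HOSTS[region.toUpperCase()];
  if (!host) {
    throw new Error(`Unknown Datadog region: ${region}`);
  }
  return `https://${host}`;
}

console.log(datadogBaseUrl('eu1')); // https://api.datadoghq.eu
```

Throwing on an unknown region surfaces a typo immediately, instead of sending credentials to the wrong endpoint.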

Step 2: Access AI Gateway Apps

  1. Log in to your Cequence AI Gateway dashboard
  2. Navigate to Apps in the left sidebar
  3. You'll see the list of available third-party applications

Step 3: Find and Select Datadog API

  1. In the Apps section, browse through the Third-party category
  2. Look for Datadog or use the search function
  3. Click on the Datadog API card to view details

The Datadog API card shows:

  • Number of available endpoints
  • Integration capabilities
  • Quick description of functionality

Step 4: Create MCP Server

  1. Click the Create MCP Server button on the Datadog API card
  2. You'll be redirected to the MCP Server creation wizard

Step 5: Configure API Endpoints

In the App Configuration step:

  1. Base URL: Select your Datadog region endpoint
    • Default: https://api.datadoghq.com
    • Or choose your specific region
  2. Select API endpoints to expose to your MCP server based on your needs
  3. Click Next to proceed

Step 6: MCP Server Basic Setup

Configure your MCP server details:

  1. MCP Server Name: Enter a descriptive name

    • Example: "Datadog Observability Platform"
    • This name will identify your server in the dashboard
  2. Description (Optional): Add details about the server's purpose

    • Example: "Comprehensive monitoring and observability for production infrastructure"
  3. Production Mode: Toggle based on your needs

    • ON for production environments
    • OFF for development/testing
  4. Click Next to continue

Step 7: Configure Authentication

This is where you'll use your Datadog API credentials:

  1. Authentication Type: Select API Key

  2. Fill in the authentication details:

    • API Key: Paste your Datadog API key
    • Application Key: Paste your Datadog Application key
  3. Additional Headers:

    • The system will automatically configure:
      • DD-API-KEY: {your-api-key}
      • DD-APPLICATION-KEY: {your-application-key}
  4. Click Next to continue
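Before pasting the keys into the wizard, it can help to confirm that they work. The sketch below builds the two headers listed above and calls Datadog's key validation endpoint (`GET /api/v1/validate`), which checks the API key only, not the Application key. The function names are illustrative, and the code assumes Node 18 or later for the global `fetch`.

```javascript
// Builds the two headers named above. Throws early if a key is missing
// so a misconfigured environment fails before any request is sent.
function buildDatadogHeaders(apiKey, appKey) {
  if (!apiKey || !appKey) {
    throw new Error('Both the API key and the Application key are required');
  }
  return {
    'DD-API-KEY': apiKey,
    'DD-APPLICATION-KEY': appKey,
    Accept: 'application/json',
  };
}

// Calls Datadog's key validation endpoint for the given region base URL.
// Resolves to true when Datadog accepts the API key.
async function validateApiKey(baseUrl, headers) {
  const response = await fetch(`${baseUrl}/api/v1/validate`, { headers });
  return response.ok;
}
```

A typical call would be `validateApiKey('https://api.datadoghq.com', buildDatadogHeaders(process.env.DD_API_KEY, process.env.DD_APP_KEY))`, with the base URL swapped for your region.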

Available Datadog API Capabilities

The Datadog MCP server provides access to comprehensive monitoring capabilities:

Metrics API

  • Query Metrics

    • Retrieve time series data
    • Aggregate metrics across tags
    • Calculate rollups and transformations
    • Access custom metrics
  • Submit Metrics

    • Send custom metrics
    • Batch metric submission
    • Update metric metadata
    • Configure metric units
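For reference, this is roughly what a custom metric submission looks like when sent straight to Datadog's v2 series endpoint (`POST /api/v2/series`) instead of through the MCP server. It is a sketch under assumptions: the helper names and the `checkout.conversion_rate` metric are made up for illustration, and Node 18 or later is assumed for `fetch`.

```javascript
// Metric type codes used by the v2 series endpoint.
const METRIC_TYPE = { unspecified: 0, count: 1, rate: 2, gauge: 3 };

// Builds a single-point gauge payload. Timestamps are epoch seconds.
function buildSeriesPayload(metric, value, tags = [], nowMs = Date.now()) {
  return {
    series: [
      {
        metric,
        type: METRIC_TYPE.gauge,
        points: [{ timestamp: Math.floor(nowMs / 1000), value }],
        tags,
      },
    ],
  };
}

// Sends the payload. Metric submission needs only the API key.
async function submitMetric(baseUrl, apiKey, payload) {
  const response = await fetch(`${baseUrl}/api/v2/series`, {
    method: 'POST',
    headers: { 'DD-API-KEY': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`Metric submission failed: ${response.status}`);
  }
  return response.json();
}
```

Batch submission amounts to putting several entries in the `series` array of one request.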

Logs API

  • Search Logs

    • Query log events
    • Aggregate log data
    • Access log archives
    • Configure log pipelines
  • Log Management

    • Create parsing rules
    • Manage indexes
    • Configure retention
    • Set up archives
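A log search against Datadog's v2 endpoint (`POST /api/v2/logs/events/search`) takes a JSON body with a filter, a sort order, and a page size. The sketch below shows one way to build that body; the helper names are illustrative, and the relative times (`now-1h`, `now`) use Datadog's date math syntax.

```javascript
// Builds the request body for a log search, newest events first.
function buildLogSearchBody(query, { from = 'now-1h', to = 'now', limit = 100 } = {}) {
  return {
    filter: { query, from, to },
    sort: '-timestamp',
    page: { limit },
  };
}

// Runs the search. Requires both the API key and the Application key.
// Assumes Node 18+ for global fetch.
async function searchLogs(baseUrl, headers, body) {
  const response = await fetch(`${baseUrl}/api/v2/logs/events/search`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Log search failed: ${response.status}`);
  }
  return response.json();
}

console.log(JSON.stringify(buildLogSearchBody('service:payment-api status:error')));
```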

Monitors API

  • Monitor Operations

    • Create and update monitors
    • Manage alert conditions
    • Configure notifications
    • Schedule downtimes
  • Monitor Groups

    • Group related monitors
    • Bulk operations
    • Template management
    • Tag-based organization

Dashboards API

  • Dashboard Management

    • Create custom dashboards
    • Clone and modify templates
    • Share dashboards
    • Schedule reports
  • Widget Configuration

    • Time series graphs
    • Query value displays
    • Heat maps and distributions
    • Service maps

Events API

  • Event Tracking
    • Submit custom events
    • Query event stream
    • Correlate with metrics
    • Tag and filter events

Service Management

  • APM Services

    • Service dependencies
    • Performance metrics
    • Error tracking
    • SLO management
  • Synthetic Tests

    • API tests
    • Browser tests
    • Multi-step journeys
    • Global locations

Step 8: Configure Security

Set up API protection features:

  1. API Protection: Toggle ON to enable

    • Protects against bot attacks, DDoS, and threats
    • Monitors for suspicious activity
    • Rate limiting and anomaly detection
  2. Protection Features (when enabled):

    • Auto-scaling protection
    • Managed infrastructure
    • Built-in monitoring
    • Zero maintenance required
  3. Click Next to continue

Step 9: Choose Deployment Method

Select your deployment preference:

Option A: Deploy on Cequence Cloud

  • Fully managed deployment
  • Automatic scaling and monitoring
  • Built-in high availability
  • Features included:
    • Auto-scaling
    • Managed infrastructure
    • Built-in monitoring
    • Zero maintenance

Option B: Deploy with Helm Chart

  • Self-managed Kubernetes deployment
  • Full control over infrastructure
  • Requires:
    • Kubernetes cluster
    • Helm 3.x installed
    • Container registry access

Click Next after selecting your deployment method.

Step 10: Review and Deploy

Review your MCP server configuration:

  • MCP Server Name: Your chosen name
  • Base URL: Your Datadog region endpoint
  • Selected Endpoints: Number of endpoints selected
  • Authentication: API Key (Configured)
  • API Protection: Enabled/Disabled
  • Deployment: Cequence Cloud or Helm

Click Create & Deploy to finalize the setup.

Step 11: Post-Deployment Setup

After successful deployment:

  1. Note the MCP Server URL provided

  2. Test the connection:

    • Click "Test Connection"
    • The test should report successful authentication
    • Verify access to selected endpoints
  3. Configure AI Agents:

    • The MCP server is now available for AI agent connections
    • Use the provided server URL in your AI agent configuration
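If you want to check the server outside the dashboard, MCP messages are JSON-RPC 2.0 requests, and a session starts with an `initialize` call. The sketch below only builds the message bodies; how they are transported and authenticated against your gateway URL is not covered in this guide, so treat that part as something to confirm for your deployment. The protocol version and client name are example values.

```javascript
// JSON-RPC 2.0 envelope used by MCP messages.
function jsonRpcRequest(id, method, params = {}) {
  return { jsonrpc: '2.0', id, method, params };
}

// First message of an MCP session. The protocol version and client
// info below are example values; use what your client supports.
function buildInitializeRequest(id = 1) {
  return jsonRpcRequest(id, 'initialize', {
    protocolVersion: '2024-11-05',
    capabilities: {},
    clientInfo: { name: 'datadog-mcp-smoke-test', version: '0.1.0' },
  });
}

// Asks the server which tools (here, Datadog endpoints) it exposes.
function buildListToolsRequest(id = 2) {
  return jsonRpcRequest(id, 'tools/list');
}

console.log(JSON.stringify(buildInitializeRequest(), null, 2));
```

Per the MCP specification, the client sends a `notifications/initialized` notification after a successful `initialize` response and before requests such as `tools/list`.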

Using Your Datadog MCP Server

With Claude Desktop

  1. Open Claude Desktop settings

  2. Add your MCP server:

    {
      "servers": {
        "datadog": {
          "url": "your-mcp-server-url",
          "auth": {
            "type": "api_key",
            "api_key": "your-encrypted-key"
          }
        }
      }
    }
  3. Start using natural language commands:

    • "Show me CPU usage for web servers in the last hour"
    • "Create an alert for high memory usage on database servers"
    • "Find all error logs from the payment service today"
    • "Generate a dashboard for API performance metrics"
    • "What services are experiencing high latency right now?"

API Integration Example

// Initialize MCP client (MCPClient comes from your MCP client library; import it before use)
const mcpClient = new MCPClient({
  serverUrl: 'your-mcp-server-url',
  auth: {
    type: 'api_key',
    headers: {
      'DD-API-KEY': process.env.DD_API_KEY,
      'DD-APPLICATION-KEY': process.env.DD_APP_KEY
    }
  }
});

// Query metrics
const cpuMetrics = await mcpClient.datadog.metrics.query({
  query: 'avg:system.cpu.user{service:web-app}',
  from: Date.now() - 3600000, // 1 hour ago
  to: Date.now()
});

// Search logs
const errorLogs = await mcpClient.datadog.logs.search({
  query: 'service:payment-api status:error',
  time: {
    from: '1 hour ago',
    to: 'now'
  },
  limit: 100
});

// Create monitor
// system.mem.pct_usable is the fraction of usable memory (0 to 1), so
// "usage above 90%" means less than 0.1 of memory is still usable.
const monitor = await mcpClient.datadog.monitors.create({
  type: 'metric alert',
  query: 'avg(last_5m):avg:system.mem.pct_usable{*} by {host} < 0.1',
  name: 'High Memory Usage Alert',
  message: 'Memory usage is above 90% on {{host.name}}',
  tags: ['team:infrastructure', 'severity:high'],
  options: {
    thresholds: {
      critical: 0.1, // less than 10% usable
      warning: 0.2   // less than 20% usable
    },
    notify_no_data: true,
    notify_audit: false
  }
});

// Update dashboard
await mcpClient.datadog.dashboards.update({
  id: 'dashboard-id',
  title: 'Application Performance Dashboard',
  widgets: [{
    definition: {
      type: 'timeseries',
      requests: [{
        q: 'avg:trace.servlet.request.duration{*}',
        display_type: 'line'
      }],
      title: 'API Response Time'
    }
  }]
});
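For comparison, the metric query above maps onto Datadog's REST endpoint `GET /api/v1/query`, which expects `from` and `to` as epoch seconds, not milliseconds. The sketch below builds that URL from millisecond timestamps; `buildMetricQueryUrl` is an illustrative helper, not part of the MCP client used above.

```javascript
// Builds the URL for a timeseries query against the Datadog REST API.
// Accepts JavaScript millisecond timestamps and converts them to seconds.
function buildMetricQueryUrl(baseUrl, query, fromMs, toMs) {
  const url = new URL('/api/v1/query', baseUrl);
  url.searchParams.set('from', String(Math.floor(fromMs / 1000)));
  url.searchParams.set('to', String(Math.floor(toMs / 1000)));
  url.searchParams.set('query', query);
  return url.toString();
}

const now = Date.now();
console.log(
  buildMetricQueryUrl(
    'https://api.datadoghq.com',
    'avg:system.cpu.user{service:web-app}',
    now - 3600000, // 1 hour ago
    now
  )
);
```

Passing milliseconds where seconds are expected is a common cause of empty results or 400 responses, so the conversion is worth keeping in one place.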

Common Use Cases

Infrastructure Monitoring

  • Server health tracking
  • Container orchestration metrics
  • Cloud resource utilization
  • Network performance analysis
  • Cost optimization insights

Application Performance

  • API latency monitoring
  • Error rate tracking
  • Database query optimization
  • Service dependency mapping
  • User journey analysis

Log Intelligence

  • Centralized log analysis
  • Error pattern detection
  • Security event correlation
  • Compliance auditing
  • Troubleshooting workflows

Incident Response

  • Automated alert creation
  • On-call rotation management
  • Incident timeline tracking
  • Post-mortem generation
  • SLA compliance reporting

Security Best Practices

  1. API Key Security:

    • Store keys in a secure vault
    • Rotate keys regularly
    • Use separate keys per environment
    • Monitor key usage
  2. Access Control:

    • Limit API key permissions
    • Use role-based access
    • Audit API activity
    • Implement IP allowlists
  3. Rate Limiting:

    • Monitor API usage
    • Implement caching strategies
    • Use batch operations
    • Handle rate limit errors
  4. Data Privacy:

    • Mask sensitive data in logs
    • Implement data retention policies
    • Use log archives for compliance
    • Encrypt data in transit
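One concrete way to keep keys out of your own logs is to mask credential headers before printing a request. This is a minimal sketch; the helper names and the choice to keep the last four characters visible are assumptions, not a Datadog or Cequence convention.

```javascript
// Header names whose values should never be logged in full.
const SENSITIVE_HEADERS = new Set(['dd-api-key', 'dd-application-key', 'authorization']);

// Replaces all but the last `visible` characters with asterisks.
function maskSecret(secret, visible = 4) {
  if (secret.length <= visible) {
    return '*'.repeat(secret.length);
  }
  return '*'.repeat(secret.length - visible) + secret.slice(-visible);
}

// Returns a copy of the headers that is safe to log.
function redactHeaders(headers) {
  const safe = {};
  for (const [name, value] of Object.entries(headers)) {
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase())
      ? maskSecret(String(value))
      : value;
  }
  return safe;
}

console.log(redactHeaders({ 'DD-API-KEY': 'abcdef1234', Accept: 'application/json' }));
```

Keeping the last few characters visible makes it possible to tell keys apart during rotation without exposing them.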

Troubleshooting

Common Issues

  1. 403 Forbidden

    • Verify API key is active
    • Check application key permissions
    • Ensure correct region endpoint
    • Validate IP allowlist settings
  2. 429 Rate Limited

    • Check rate limit headers
    • Implement exponential backoff
    • Use more efficient queries
    • Consider caching responses
  3. 400 Bad Request

    • Validate query syntax
    • Check time range format
    • Verify tag formatting
    • Review API documentation
  4. No Data Returned

    • Verify that the metric or log exists
    • Check the time range
    • Validate tag filters
    • Ensure the data is still within its retention period
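For the 429 case above, exponential backoff can be sketched as follows. The wrapper retries any function that resolves to a fetch-style response and prefers Datadog's `X-RateLimit-Reset` header (seconds until the limit resets) when it is present. The function names, retry count, and delays are illustrative defaults, not recommended values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls `send` until it returns something other than HTTP 429,
// or until maxRetries retries have been used.
async function withBackoff(send, { maxRetries = 5, baseMs = 500, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Prefer the server's hint when present; otherwise back off exponentially.
    const resetSeconds = Number(response.headers.get('x-ratelimit-reset'));
    const delay = resetSeconds > 0 ? resetSeconds * 1000 : backoffDelayMs(attempt, baseMs);
    await wait(delay);
  }
}
```

A call such as `withBackoff(() => fetch(url, { headers }))` wraps a single request; the `wait` option exists so the delay can be replaced in tests.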

Getting Help