Databricks MCP Server
Create a Model Context Protocol (MCP) server for Databricks in minutes with our AI Gateway. This guide walks you through setting up data lakehouse automation with enterprise-grade security and OAuth authentication.
About Databricks API
The Databricks REST API provides programmatic access to manage clusters, jobs, notebooks, data, and ML workflows in your Databricks workspace. It enables powerful automation for data engineering, analytics, and machine learning operations across your organization.
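For orientation, here is a minimal sketch of the kind of direct REST call the MCP server issues on your behalf: listing clusters in a workspace. The DATABRICKS_HOST and DATABRICKS_TOKEN environment variable names are illustrative placeholders, not values the gateway requires.

```python
# Minimal sketch: list clusters via the Databricks REST API.
# DATABRICKS_HOST (e.g. https://dbc-....cloud.databricks.com) and
# DATABRICKS_TOKEN are placeholder names for your own workspace URL and token.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{host}/api/2.1/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["cluster_name"], cluster["state"])
```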
Key Capabilities
- Cluster Management: Create, start, stop, and configure clusters
- Job Orchestration: Schedule and run data pipelines
- Notebook Operations: Import, export, and execute notebooks
- DBFS Management: Store and manage files in Databricks File System
- ML Workflows: Track experiments and manage models
- SQL Warehouses: Query and analyze data
- User Management: Configure workspace users and groups
- Secrets Management: Secure credentials and tokens
API Features
- REST API v2.0/2.1/2.2: Comprehensive workspace operations
- OAuth 2.0: Secure authentication with token refresh
- Workspace API: Manage notebooks and folders
- Clusters API: Full cluster lifecycle management
- Jobs API: Automated workflow orchestration
- DBFS API: File system operations
- SQL API: Data warehousing operations
- SCIM API: User and group provisioning
What You Can Do with Databricks MCP Server
The MCP server transforms the Databricks API into a natural language interface, enabling AI agents to:
Cluster Management
Cluster Operations
- "Create a new cluster with 8 workers"
- "Start the analytics cluster"
- "Stop all idle clusters"
- "List available Spark versions"
Cluster Configuration
- "Update cluster to use latest Spark"
- "Resize cluster to 16 workers"
- "Configure autoscaling for cluster"
- "Set auto-termination to 30 minutes"
Cluster Monitoring
- "Check cluster status"
- "View cluster events"
- "List all running clusters"
- "Get cluster metrics"
Job Orchestration
Job Management
- "Create ETL job for daily processing"
- "Schedule job to run at 2 AM"
- "Update job configuration"
- "Delete completed jobs"
Job Execution
- "Run data pipeline now"
- "Check job run status"
- "Cancel running job"
- "Get job output logs"
Job Monitoring
- "List all job runs"
- "View failed jobs today"
- "Get job execution history"
- "Monitor job performance"
Workspace Management
Notebook Operations
- "Import notebook from repository"
- "Export notebook to Python script"
- "List all notebooks in workspace"
- "Create new notebook folder"
Folder Organization
- "Create project folder structure"
- "Move notebooks to archive"
- "List workspace contents"
- "Get notebook status"
Collaboration
- "Share notebook with team"
- "Get notebook metadata"
- "Track notebook changes"
- "Export workspace backup"
DBFS Operations
File Management
- "Upload data file to DBFS"
- "List files in /FileStore/"
- "Download analysis results"
- "Delete old data files"
Directory Operations
- "Create data directory"
- "Move files to archive folder"
- "Check file status"
- "List directory contents"
Data Transfer
- "Upload CSV to DBFS"
- "Download processed data"
- "Copy files between locations"
- "Verify file integrity"
SQL Warehouses
Warehouse Management
- "List available warehouses"
- "Start SQL warehouse"
- "Stop warehouse after query"
- "Get warehouse configuration"
Query Operations
- "List saved SQL queries"
- "View query history"
- "Get query results"
- "Manage query dashboards"
Analytics
- "Create data dashboard"
- "Schedule report generation"
- "Export query results"
- "Monitor warehouse usage"
User & Group Management
User Operations
- "List workspace users"
- "Create service account"
- "Update user permissions"
- "Deactivate user access"
Group Management
- "Create data engineering group"
- "Add users to group"
- "List group members"
- "Update group permissions"
Access Control
- "Grant cluster access"
- "Review user permissions"
- "Manage workspace roles"
- "Audit access logs"
Secrets Management
Secret Operations
- "Create secret scope"
- "Store API credentials"
- "List secrets in scope"
- "Delete expired secrets"
Security
- "Manage secret permissions"
- "Rotate access tokens"
- "Audit secret usage"
- "Configure secret ACLs"
Prerequisites
- Access to Cequence AI Gateway
- Databricks workspace account
- Workspace admin or appropriate permissions
- Account-level access (for OAuth setup)
Step 1: Configure OAuth in Databricks
1.1 Access Databricks Account Console
- Go to https://accounts.cloud.databricks.com/
- Sign in with your Databricks account credentials
- Select your account from the list
1.2 Create OAuth App Connection
- Navigate to Settings in the left sidebar
- Click the App connections tab
- Click the Add connection button
- Configure the application:
Application Details:
- Name: "AI Gateway MCP Server"
- Redirect URLs: Add the following:
https://auth.aigateway.cequence.ai/v1/outbound/oauth/callback
1.3 Configure OAuth Scopes
Select the required scopes based on your needs:
Essential Scopes:
- all-apis - Access to all Databricks REST APIs
- offline_access - Refresh token support
Optional Scopes (for specific features):
- sql - SQL Analytics access
- clusters - Cluster management
- jobs - Job orchestration
- workspace - Notebook operations
1.4 Get OAuth Credentials
- Click Create to generate the app connection
- Copy the Client ID (format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
- Click Generate Secret and copy the Client Secret
- Note your Account ID from the URL or account settings
1.5 Important OAuth URLs
Based on your Databricks account:
- Authorization URL: https://accounts.cloud.databricks.com/oidc/accounts/{accountId}/v1/authorize
- Token URL: https://accounts.cloud.databricks.com/oidc/accounts/{accountId}/v1/token
Replace {accountId} with your actual Databricks Account ID.
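The AI Gateway performs the OAuth exchange against these URLs for you during setup. For reference only, a token request looks roughly like the sketch below; the account ID, client credentials, and authorization code are placeholders, and depending on your app connection settings a PKCE code_verifier may also be required.

```python
# Sketch of the OAuth 2.0 authorization-code exchange against the account
# token endpoint. All values below are placeholders; the AI Gateway normally
# performs this step automatically.
import requests

ACCOUNT_ID = "your-account-id"  # placeholder
TOKEN_URL = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "authorization_code",
        "code": "authorization-code-from-redirect",  # placeholder
        "redirect_uri": "https://auth.aigateway.cequence.ai/v1/outbound/oauth/callback",
        "client_id": "your-client-id",               # placeholder
        "client_secret": "your-client-secret",       # placeholder
    },
    timeout=30,
)
resp.raise_for_status()
tokens = resp.json()
print("access_token expires in", tokens.get("expires_in"), "seconds")
print("refresh_token present:", "refresh_token" in tokens)
```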
Step 2: Access AI Gateway
- Log in to your Cequence AI Gateway portal
- Navigate to API Catalog or Integrations
- Search for "Databricks" in the catalog
Step 3: Find Databricks API
- Locate Databricks REST API in the search results
- Review available API categories:
- Clusters API
- Jobs API
- Workspace API
- DBFS API
- SQL Warehouses API
- SCIM API (Users & Groups)
- Secrets API
- Click Create MCP Server or Configure
Step 4: Create MCP Server
- Click Create New MCP Server for Databricks
- Review the MCP server creation wizard
- Click Start Configuration
Step 5: Configure API Endpoints
- Base URL: https://{workspace_url}
- Example: https://dbc-abc12345-6789.cloud.databricks.com
- Get this from your workspace URL (a quick verification sketch follows this step)
- Select Endpoint Categories:
- ✓ Clusters API (v2.1)
- ✓ Jobs API (v2.2)
- ✓ Workspace API (v2.0)
- ✓ DBFS API (v2.0)
- ✓ SQL Warehouses API (v2.0)
- ✓ SCIM API (v2.0)
- ✓ Secrets API (v2.0)
- Review selected endpoints and click Next
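If you want to sanity-check the base URL before continuing, a read-only call such as the following sketch works. The workspace URL shown is the example value from above, and DATABRICKS_TOKEN is an illustrative variable holding a valid personal access token or OAuth access token.

```python
# Quick base-URL check: list available Spark versions (read-only call).
# WORKSPACE_URL and DATABRICKS_TOKEN are placeholders for your own values.
import os
import requests

WORKSPACE_URL = "https://dbc-abc12345-6789.cloud.databricks.com"  # example value
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/clusters/spark-versions",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for version in resp.json().get("versions", []):
    print(version["key"], "-", version["name"])
```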
Step 6: MCP Server Configuration
- Server Name: "Databricks Lakehouse Automation"
- Description: "Data engineering and ML workflow automation"
- Environment: Select production or development
- Enable Features:
- ✓ Request logging
- ✓ Error handling
- ✓ Rate limiting
- ✓ Token refresh
- Click Next
Step 7: Configure Authentication
OAuth 2.0 Configuration
- Authentication Type: Select OAuth 2.0
- Authorization URL: https://accounts.cloud.databricks.com/oidc/accounts/{your-account-id}/v1/authorize (replace {your-account-id} with your actual Account ID)
- Token URL: https://accounts.cloud.databricks.com/oidc/accounts/{your-account-id}/v1/token (replace {your-account-id} with your actual Account ID)
- Client ID: Paste the Client ID from Step 1.4
- Client Secret: Paste the Client Secret from Step 1.4
- OAuth Scopes: Enter the required scopes (space-separated): all-apis offline_access
- Token Refresh: Enable automatic token refresh (a refresh sketch follows this list)
- Click Validate Credentials to test the configuration
- Click Next after successful validation
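Token refresh is handled by the gateway when offline_access is granted. If you ever need to renew a token yourself, the request is a standard refresh_token grant, sketched below with placeholder values.

```python
# Sketch: renew an access token with a refresh token (requires offline_access).
# The AI Gateway does this automatically; all values below are placeholders.
import requests

ACCOUNT_ID = "your-account-id"  # placeholder
TOKEN_URL = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "refresh_token",
        "refresh_token": "stored-refresh-token",  # placeholder
        "client_id": "your-client-id",            # placeholder
        "client_secret": "your-client-secret",    # placeholder
    },
    timeout=30,
)
resp.raise_for_status()
new_access_token = resp.json()["access_token"]
```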
Available Databricks OAuth Scopes
Core Access Scopes
all-apis
- Access to all Databricks REST APIs
- Cluster management
- Job orchestration
- Workspace operations
- DBFS access
- User management
- Recommended for full automation
offline_access
- Enables refresh token
- Long-lived sessions
- Automatic token renewal
- Required for production use
Feature-Specific Scopes
sql
- SQL Analytics access
- Query warehouses
- Dashboard operations
- Data exploration
clusters
- Cluster lifecycle management
- Configuration updates
- Event monitoring
jobs
- Job creation and management
- Run orchestration
- Pipeline automation
workspace
- Notebook operations
- Folder management
- Import/export
Recommended Scope Combinations
- For Full Automation: all-apis offline_access
- For SQL Analytics Only: sql offline_access
- For Data Engineering: all-apis offline_access
- For Read-Only Monitoring: all-apis
(Note: Most operations require the all-apis scope)
Step 8: Security Configuration
- API Key Management:
- ✓ Enable key rotation
- Set expiration policies
- Configure access logs
- Rate Limiting:
- Set requests per minute: 100
- Configure burst limits
- Enable throttling alerts
- IP Restrictions (optional):
- Add allowed IP ranges
- Configure firewall rules
- Audit Logging:
- ✓ Enable request logging
- ✓ Track API usage
- Configure retention period
- Click Next
Step 9: Choose Deployment Option
Option A: Cloud Deployment (Recommended)
- Fully managed by AI Gateway
- Automatic scaling
- High availability
- No infrastructure management
Option B: Self-Hosted
- Deploy in your infrastructure
- Full control over resources
- Custom security policies
- Manual scaling
Select your preferred option and click Next
Step 10: Deploy MCP Server
- Review all configurations:
- Workspace URL
- Selected APIs
- OAuth settings
- Security policies
- Click Deploy MCP Server
- Wait for deployment (typically 1-2 minutes)
Using Your Databricks MCP Server
With Claude Desktop, Cursor, or Windsurf
Add to your MCP client configuration (e.g., claude_desktop_config.json):
{
  "mcpServers": {
    "databricks": {
      "command": "npx",
      "args": [
        "-y",
        "@cequenceai/mcp-remote",
        "<your-mcp-url>"
      ]
    }
  }
}
Natural Language Commands
Try these commands with your AI assistant:
Cluster Management:
- "List all Spark versions available"
- "Create a cluster with 4 workers using latest Spark"
- "Show me all running clusters"
- "Stop the cluster named 'analytics-cluster'"
Job Operations:
- "Create a daily ETL job"
- "Run the data-pipeline job now"
- "Show me failed jobs from today"
- "Get the output from job run 12345"
Workspace Management:
- "List all notebooks in the Shared folder"
- "Import notebook from GitHub repository"
- "Export the analysis notebook as Python"
- "Create a project folder structure"
DBFS Operations:
- "List files in /FileStore/data/"
- "Upload local CSV to DBFS"
- "Download processed results from DBFS"
- "Create a data directory in FileStore"
User Management:
- "List all workspace users"
- "Create a new service account group"
- "Show me user permissions"
- "Add user to data-engineering group"
Common Use Cases
Data Engineering
- ETL Pipelines: Automated data transformation
- Cluster Management: Dynamic resource allocation
- Job Scheduling: Orchestrated workflows (see the job-creation sketch after this list)
- Data Quality: Validation and monitoring
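As a concrete example of what "create a daily ETL job" translates to behind the scenes, the sketch below creates a scheduled notebook job through the Jobs API. The notebook path, node type, and Spark version are illustrative placeholders, and the env var names are assumptions carried over from earlier sketches.

```python
# Sketch: create a notebook job that runs daily at 2 AM via the Jobs API.
# Paths, node type, and Spark version below are illustrative placeholders.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "daily-etl",
    "tasks": [
        {
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Shared/etl/daily_transform"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 2 AM daily
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",  # Jobs API 2.2 accepts a similar payload
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
    timeout=30,
)
resp.raise_for_status()
print("Created job_id:", resp.json()["job_id"])
```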
Machine Learning
- Experiment Tracking: ML lifecycle management
- Model Deployment: Production pipelines
- Feature Engineering: Data preparation
- Training Jobs: Distributed computing
Analytics & BI
- SQL Queries: Ad-hoc analysis
- Dashboard Creation: Visual reporting
- Data Exploration: Interactive queries
- Report Scheduling: Automated insights
DevOps & Automation
- Infrastructure as Code: Workspace configuration
- CI/CD Integration: Deployment automation
- Monitoring: Health checks and alerts
- Cost Optimization: Resource management
Security Best Practices
- OAuth Security:
- Use offline_access for token refresh
- Rotate client secrets regularly
- Monitor OAuth app usage
- Revoke unused connections
- Implement token expiration policies
- Access Control:
- Follow principle of least privilege
- Use service accounts for automation
- Implement SCIM for user provisioning
- Regular access reviews
- Enable audit logging
- Workspace Security:
- Protect sensitive notebooks
- Encrypt secrets at rest
- Use secret scopes for credentials (see the sketch after this section)
- Configure IP access lists
- Enable workspace isolation
- Data Protection:
- Encrypt data in transit
- Use Unity Catalog for governance
- Implement fine-grained ACLs
- Monitor DBFS access
- Regular backup procedures
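As a companion to the "use secret scopes for credentials" practice above, a scope and secret can be created through the Secrets API. This is a minimal sketch with placeholder scope and key names; the env var names are the same assumptions as in earlier sketches.

```python
# Sketch: create a secret scope and store a credential via the Secrets API.
# Scope and key names are placeholders; host and token come from env vars.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Create a Databricks-backed secret scope (no idempotence handling here).
requests.post(
    f"{host}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "etl-credentials"},
    timeout=30,
).raise_for_status()

# Store an API key inside the scope.
requests.post(
    f"{host}/api/2.0/secrets/put",
    headers=headers,
    json={"scope": "etl-credentials", "key": "warehouse-api-key", "string_value": "REDACTED"},
    timeout=30,
).raise_for_status()
```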
API Rate Limits
The Databricks API has the following limits:
- Default: 100 requests per minute per user
- Burst: Short bursts up to 200 RPM
- Cluster APIs: May have lower limits
- Best Practice: Implement exponential backoff
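A minimal retry wrapper along these lines keeps clients under the limits; the retry count and base delay below are arbitrary choices, not values mandated by Databricks.

```python
# Sketch: exponential backoff for 429/5xx responses. Retry count and base
# delay are arbitrary; tune them for your workload.
import time
import requests

def get_with_backoff(url, headers, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code not in (429, 500, 502, 503, 504):
            resp.raise_for_status()
            return resp
        # Honor Retry-After if provided, otherwise back off exponentially.
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        time.sleep(delay)
    raise RuntimeError(f"Gave up after {max_retries} attempts: {url}")
```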
Troubleshooting
Common Issues
1. Authentication Errors
- Error: "Invalid OAuth credentials"
- Solution:
- Verify Client ID and Secret
- Check Account ID in URLs
- Ensure app connection is active
- Validate redirect URI matches
2. Token Expiration
- Error: "Token has expired"
- Solution:
- Enable offline_access scope
- Configure automatic token refresh
- Check token TTL settings
3. Permission Errors
- Error: "Insufficient permissions"
- Solution:
- Verify OAuth scopes include all-apis
- Check workspace user permissions
- Review cluster/job ACLs
- Ensure admin access for management ops
4. Cluster Creation Fails
- Error: "Spark version not supported"
- Solution:
- Use Databricks Runtime 13.3 LTS or higher
- Check if legacy features are disabled
- Verify node type availability
- Review workspace policies
5. DBFS Access Denied
- Error: "Public DBFS root is disabled"
- Solution:
- Use the /FileStore/ path instead of /tmp/
- Check DBFS permissions
- Verify workspace security settings
- Use allowed storage paths
6. Workspace Import Fails
- Error: "Folder is protected"
- Solution:
- Use the /Shared/ path instead of /Users/
- Check folder permissions
- Verify notebook format
- Use correct content encoding (see the sketch below)
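The Workspace import endpoint expects base64-encoded content, which is the most common cause of format errors. Here is a minimal sketch with placeholder file and workspace paths, reusing the illustrative env var names from earlier.

```python
# Sketch: import a Python source file into /Shared/ with the required
# base64 content encoding. File and workspace paths are placeholders.
import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

with open("analysis.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers=headers,
    json={
        "path": "/Shared/analysis",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": False,
    },
    timeout=30,
)
resp.raise_for_status()
```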
Getting Help
- Documentation: AI Gateway Docs
- Databricks API Docs: docs.databricks.com/api
- API Reference: api-reference.cloud.databricks.com