Databricks MCP Server
Create a Model Context Protocol (MCP) server for Databricks in minutes with our AI Gateway. This guide walks you through setting up data lakehouse automation with enterprise-grade security and OAuth authentication.
About Databricks API
The Databricks REST API provides programmatic access to manage clusters, jobs, notebooks, data, and ML workflows in your Databricks workspace. It enables powerful automation for data engineering, analytics, and machine learning operations across your organization.
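For orientation, here is a minimal sketch of the kind of direct REST call the MCP server issues on your behalf: listing clusters in a workspace. The DATABRICKS_HOST and DATABRICKS_TOKEN environment variable names are illustrative placeholders, not values the gateway requires.

```python
# Minimal sketch: list clusters via the Databricks REST API.
# DATABRICKS_HOST (e.g. https://dbc-....cloud.databricks.com) and
# DATABRICKS_TOKEN are placeholder names for your own workspace URL and token.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{host}/api/2.1/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["cluster_name"], cluster["state"])
```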
Key Capabilities
- Cluster Management: Create, start, stop, and configure clusters
- Job Orchestration: Schedule and run data pipelines
- Notebook Operations: Import, export, and execute notebooks
- DBFS Management: Store and manage files in Databricks File System
- ML Workflows: Track experiments and manage models
- SQL Warehouses: Query and analyze data
- User Management: Configure workspace users and groups
- Secrets Management: Secure credentials and tokens
API Features
- REST API v2.0/2.1/2.2: Comprehensive workspace operations
- OAuth 2.0: Secure authentication with token refresh
- Workspace API: Manage notebooks and folders
- Clusters API: Full cluster lifecycle management
- Jobs API: Automated workflow orchestration
- DBFS API: File system operations
- SQL API: Data warehousing operations
- SCIM API: User and group provisioning
What You Can Do with Databricks MCP Server
The MCP server transforms the Databricks API into a natural language interface, enabling AI agents to:
Cluster Management
Cluster Operations
- "Create a new cluster with 8 workers"
- "Start the analytics cluster"
- "Stop all idle clusters"
- "List available Spark versions"
Cluster Configuration
- "Update cluster to use latest Spark"
- "Resize cluster to 16 workers"
- "Configure autoscaling for cluster"
- "Set auto-termination to 30 minutes"
Cluster Monitoring
- "Check cluster status"
- "View cluster events"
- "List all running clusters"
- "Get cluster metrics"
Job Orchestration
Job Management
- "Create ETL job for daily processing"
- "Schedule job to run at 2 AM"
- "Update job configuration"
- "Delete completed jobs"
Job Execution
- "Run data pipeline now"
- "Check job run status"
- "Cancel running job"
- "Get job output logs"
Job Monitoring
- "List all job runs"
- "View failed jobs today"
- "Get job execution history"
- "Monitor job performance"
Workspace Management
Notebook Operations
- "Import notebook from repository"
- "Export notebook to Python script"
- "List all notebooks in workspace"
- "Create new notebook folder"
Folder Organization
- "Create project folder structure"
- "Move notebooks to archive"
- "List workspace contents"
- "Get notebook status"
Collaboration
- "Share notebook with team"
- "Get notebook metadata"
- "Track notebook changes"
- "Export workspace backup"
DBFS Operations
File Management
- "Upload data file to DBFS"
- "List files in /FileStore/"
- "Download analysis results"
- "Delete old data files"
Directory Operations
- "Create data directory"
- "Move files to archive folder"
- "Check file status"
- "List directory contents"
Data Transfer
- "Upload CSV to DBFS"
- "Download processed data"
- "Copy files between locations"
- "Verify file integrity"
SQL Warehouses
Warehouse Management
- "List available warehouses"
- "Start SQL warehouse"
- "Stop warehouse after query"
- "Get warehouse configuration"
Query Operations
- "List saved SQL queries"
- "View query history"
- "Get query results"
- "Manage query dashboards"
Analytics
- "Create data dashboard"
- "Schedule report generation"
- "Export query results"
- "Monitor warehouse usage"
User & Group Management
User Operations
- "List workspace users"
- "Create service account"
- "Update user permissions"
- "Deactivate user access"
Group Management
- "Create data engineering group"
- "Add users to group"
- "List group members"
- "Update group permissions"
Access Control
- "Grant cluster access"
- "Review user permissions"
- "Manage workspace roles"
- "Audit access logs"
Secrets Management
Secret Operations
- "Create secret scope"
- "Store API credentials"
- "List secrets in scope"
- "Delete expired secrets"
Security
- "Manage secret permissions"
- "Rotate access tokens"
- "Audit secret usage"
- "Configure secret ACLs"
Prerequisites
- Access to Cequence AI Gateway
- Databricks workspace account
- Workspace admin or appropriate permissions
- Account-level access (for OAuth setup)
Step 1: Configure OAuth in Databricks
1.1 Access Databricks Account Console
- Go to https://accounts.cloud.databricks.com/
- Sign in with your Databricks account credentials
- Select your account from the list
1.2 Create OAuth App Connection
- Navigate to Settings in the left sidebar
- Click the App connections tab
- Click the Add connection button
- Configure the application:
Application Details:
- Name: "AI Gateway MCP Server"
- Redirect URLs: Add the following:
https://auth.aigateway.cequence.ai/v1/outbound/oauth/callback
1.3 Configure OAuth Scopes
Select the required scopes based on your needs:
Essential Scopes:
- all-apis - Access to all Databricks REST APIs
- offline_access - Refresh token support
Optional Scopes (for specific features):
- sql - SQL Analytics access
- clusters - Cluster management
- jobs - Job orchestration
- workspace - Notebook operations
1.4 Get OAuth Credentials
- Click Create to generate the app connection
- Copy the Client ID (format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
- Click Generate Secret and copy the Client Secret
- Note your Account ID from the URL or account settings
1.5 Important OAuth URLs
Based on your Databricks account:
- Authorization URL: https://accounts.cloud.databricks.com/oidc/accounts/{accountId}/v1/authorize
- Token URL: https://accounts.cloud.databricks.com/oidc/accounts/{accountId}/v1/token
Replace {accountId} with your actual Databricks Account ID.
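The AI Gateway performs the OAuth exchange against these URLs for you during setup. For reference only, a token request looks roughly like the sketch below; the account ID, client credentials, and authorization code are placeholders, and depending on your app connection settings a PKCE code_verifier may also be required.

```python
# Sketch of the OAuth 2.0 authorization-code exchange against the account
# token endpoint. All values below are placeholders; the AI Gateway normally
# performs this step automatically.
import requests

ACCOUNT_ID = "your-account-id"  # placeholder
TOKEN_URL = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "authorization_code",
        "code": "authorization-code-from-redirect",  # placeholder
        "redirect_uri": "https://auth.aigateway.cequence.ai/v1/outbound/oauth/callback",
        "client_id": "your-client-id",               # placeholder
        "client_secret": "your-client-secret",       # placeholder
    },
    timeout=30,
)
resp.raise_for_status()
tokens = resp.json()
print("access_token expires in", tokens.get("expires_in"), "seconds")
print("refresh_token present:", "refresh_token" in tokens)
```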
Step 2: Access AI Gateway
- Log in to your Cequence AI Gateway portal
- Navigate to API Catalog or Integrations
- Search for "Databricks" in the catalog
Step 3: Find Databricks API
- Locate Databricks REST API in the search results
- Review available API categories:
- Clusters API
- Jobs API
- Workspace API
- DBFS API
- SQL Warehouses API
- SCIM API (Users & Groups)
- Secrets API
- Click Create MCP Server or Configure
Step 4: Create MCP Server
- Click Create New MCP Server for Databricks
- Review the MCP server creation wizard
- Click Start Configuration
Step 5: Configure API Endpoints
- Base URL: https://{workspace_url}
- Example: https://dbc-abc12345-6789.cloud.databricks.com
- Get this from your workspace URL (a quick verification sketch follows this step)
- Select Endpoint Categories:
- ✓ Clusters API (v2.1)
- ✓ Jobs API (v2.2)
- ✓ Workspace API (v2.0)
- ✓ DBFS API (v2.0)
- ✓ SQL Warehouses API (v2.0)
- ✓ SCIM API (v2.0)
- ✓ Secrets API (v2.0)
- Review selected endpoints and click Next
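If you want to sanity-check the base URL before continuing, a read-only call such as the following sketch works. The workspace URL shown is the example value from above, and DATABRICKS_TOKEN is an illustrative variable holding a valid personal access token or OAuth access token.

```python
# Quick base-URL check: list available Spark versions (read-only call).
# WORKSPACE_URL and DATABRICKS_TOKEN are placeholders for your own values.
import os
import requests

WORKSPACE_URL = "https://dbc-abc12345-6789.cloud.databricks.com"  # example value
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/clusters/spark-versions",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for version in resp.json().get("versions", []):
    print(version["key"], "-", version["name"])
```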
Step 6: MCP Server Configuration
- Server Name: "Databricks Lakehouse Automation"
- Description: "Data engineering and ML workflow automation"
- Environment: Select production or development
- Enable Features:
- ✓ Request logging
- ✓ Error handling
- ✓ Rate limiting
- ✓ Token refresh
- Click Next
Step 7: Configure Authentication
OAuth 2.0 Configuration
- Authentication Type: Select OAuth 2.0
- Authorization URL: https://accounts.cloud.databricks.com/oidc/accounts/{your-account-id}/v1/authorize (replace {your-account-id} with your actual Account ID)
- Token URL: https://accounts.cloud.databricks.com/oidc/accounts/{your-account-id}/v1/token (replace {your-account-id} with your actual Account ID)
- Client ID: Paste the Client ID from Step 1.4
- Client Secret: Paste the Client Secret from Step 1.4
- OAuth Scopes: Enter the required scopes (space-separated): all-apis offline_access
- Token Refresh: Enable automatic token refresh (a refresh sketch follows this list)
- Click Validate Credentials to test the configuration
- Click Next after successful validation
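Token refresh is handled by the gateway when offline_access is granted. If you ever need to renew a token yourself, the request is a standard refresh_token grant, sketched below with placeholder values.

```python
# Sketch: renew an access token with a refresh token (requires offline_access).
# The AI Gateway does this automatically; all values below are placeholders.
import requests

ACCOUNT_ID = "your-account-id"  # placeholder
TOKEN_URL = f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "refresh_token",
        "refresh_token": "stored-refresh-token",  # placeholder
        "client_id": "your-client-id",            # placeholder
        "client_secret": "your-client-secret",    # placeholder
    },
    timeout=30,
)
resp.raise_for_status()
new_access_token = resp.json()["access_token"]
```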
Available Databricks OAuth Scopes
Core Access Scopes
all-apis
- Access to all Databricks REST APIs
- Cluster management
- Job orchestration
- Workspace operations
- DBFS access
- User management
- Recommended for full automation
offline_access
- Enables refresh token
- Long-lived sessions
- Automatic token renewal
- Required for production use
Feature-Specific Scopes
sql
- SQL Analytics access
- Query warehouses
- Dashboard operations
- Data exploration
clusters
- Cluster lifecycle management
- Configuration updates
- Event monitoring
jobs
- Job creation and management
- Run orchestration
- Pipeline automation
workspace
- Notebook operations
- Folder management
- Import/export
Recommended Scope Combinations
- For Full Automation: all-apis offline_access
- For SQL Analytics Only: sql offline_access
- For Data Engineering: all-apis offline_access
- For Read-Only Monitoring: all-apis
(Note: Most operations require the all-apis scope)
Step 8: Security Configuration
- API Key Management:
- ✓ Enable key rotation
- Set expiration policies
- Configure access logs
- Rate Limiting:
- Set requests per minute: 100
- Configure burst limits
- Enable throttling alerts
- IP Restrictions (optional):
- Add allowed IP ranges
- Configure firewall rules
- Audit Logging:
- ✓ Enable request logging
- ✓ Track API usage
- Configure retention period
- Click Next
Step 9: Choose Deployment Option
Option A: Cloud Deployment (Recommended)
- Fully managed by AI Gateway
- Automatic scaling
- High availability
- No infrastructure management
Option B: Self-Hosted
- Deploy in your infrastructure
- Full control over resources
- Custom security policies
- Manual scaling
Select your preferred option and click Next
Step 10: Deploy MCP Server
- Review all configurations:
- Workspace URL
- Selected APIs
- OAuth settings
- Security policies
- Click Deploy MCP Server
- Wait for deployment (typically 1-2 minutes)
Using Your Databricks MCP Server
With Claude Desktop, Cursor, or Windsurf
Add to your MCP client configuration (e.g., claude_desktop_config.json):
{
  "mcpServers": {
    "databricks": {
      "command": "npx",
      "args": [
        "-y",
        "@cequenceai/mcp-remote",
        "<your-mcp-url>"
      ]
    }
  }
}
Natural Language Commands
Try these commands with your AI assistant:
Cluster Management:
- "List all Spark versions available"
- "Create a cluster with 4 workers using latest Spark"
- "Show me all running clusters"
- "Stop the cluster named 'analytics-cluster'"
Job Operations:
- "Create a daily ETL job"
- "Run the data-pipeline job now"
- "Show me failed jobs from today"
- "Get the output from job run 12345"
Workspace Management:
- "List all notebooks in the Shared folder"
- "Import notebook from GitHub repository"
- "Export the analysis notebook as Python"
- "Create a project folder structure"
DBFS Operations:
- "List files in /FileStore/data/"
- "Upload local CSV to DBFS"
- "Download processed results from DBFS"
- "Create a data directory in FileStore"
User Management:
- "List all workspace users"
- "Create a new service account group"
- "Show me user permissions"
- "Add user to data-engineering group"
Common Use Cases
Data Engineering
- ETL Pipelines: Automated data transformation
- Cluster Management: Dynamic resource allocation
- Job Scheduling: Orchestrated workflows (see the job-creation sketch after this list)
- Data Quality: Validation and monitoring
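As a concrete example of what "create a daily ETL job" translates to behind the scenes, the sketch below creates a scheduled notebook job through the Jobs API. The notebook path, node type, and Spark version are illustrative placeholders, and the env var names are assumptions carried over from earlier sketches.

```python
# Sketch: create a notebook job that runs daily at 2 AM via the Jobs API.
# Paths, node type, and Spark version below are illustrative placeholders.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "daily-etl",
    "tasks": [
        {
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Shared/etl/daily_transform"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 2 AM daily
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",  # Jobs API 2.2 accepts a similar payload
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
    timeout=30,
)
resp.raise_for_status()
print("Created job_id:", resp.json()["job_id"])
```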
Machine Learning
- Experiment Tracking: ML lifecycle management
- Model Deployment: Production pipelines
- Feature Engineering: Data preparation
- Training Jobs: Distributed computing
Analytics & BI
- SQL Queries: Ad-hoc analysis
- Dashboard Creation: Visual reporting
- Data Exploration: Interactive queries
- Report Scheduling: Automated insights
DevOps & Automation
- Infrastructure as Code: Workspace configuration
- CI/CD Integration: Deployment automation
- Monitoring: Health checks and alerts
- Cost Optimization: Resource management
Security Best Practices
- OAuth Security:
- Use offline_access for token refresh
- Rotate client secrets regularly
- Monitor OAuth app usage
- Revoke unused connections
- Implement token expiration policies
- Access Control:
- Follow principle of least privilege
- Use service accounts for automation
- Implement SCIM for user provisioning
- Regular access reviews
- Enable audit logging
- Workspace Security:
- Protect sensitive notebooks
- Encrypt secrets at rest
- Use secret scopes for credentials (see the sketch after this section)
- Configure IP access lists
- Enable workspace isolation
- Data Protection:
- Encrypt data in transit
- Use Unity Catalog for governance
- Implement fine-grained ACLs
- Monitor DBFS access
- Regular backup procedures
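As a companion to the "use secret scopes for credentials" practice above, a scope and secret can be created through the Secrets API. This is a minimal sketch with placeholder scope and key names; the env var names are the same assumptions as in earlier sketches.

```python
# Sketch: create a secret scope and store a credential via the Secrets API.
# Scope and key names are placeholders; host and token come from env vars.
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Create a Databricks-backed secret scope (no idempotence handling here).
requests.post(
    f"{host}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "etl-credentials"},
    timeout=30,
).raise_for_status()

# Store an API key inside the scope.
requests.post(
    f"{host}/api/2.0/secrets/put",
    headers=headers,
    json={"scope": "etl-credentials", "key": "warehouse-api-key", "string_value": "REDACTED"},
    timeout=30,
).raise_for_status()
```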
API Rate Limits
The Databricks API has the following limits:
- Default: 100 requests per minute per user
- Burst: Short bursts up to 200 RPM
- Cluster APIs: May have lower limits
- Best Practice: Implement exponential backoff
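A minimal retry wrapper along these lines keeps clients under the limits; the retry count and base delay below are arbitrary choices, not values mandated by Databricks.

```python
# Sketch: exponential backoff for 429/5xx responses. Retry count and base
# delay are arbitrary; tune them for your workload.
import time
import requests

def get_with_backoff(url, headers, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code not in (429, 500, 502, 503, 504):
            resp.raise_for_status()
            return resp
        # Honor Retry-After if provided, otherwise back off exponentially.
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        time.sleep(delay)
    raise RuntimeError(f"Gave up after {max_retries} attempts: {url}")
```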
Troubleshooting
Common Issues
1. Authentication Errors
- Error: "Invalid OAuth credentials"
- Solution:
- Verify Client ID and Secret
- Check Account ID in URLs
- Ensure app connection is active
- Validate redirect URI matches
2. Token Expiration
- Error: "Token has expired"
- Solution:
- Enable offline_access scope
- Configure automatic token refresh
- Check token TTL settings
3. Permission Errors
- Error: "Insufficient permissions"
- Solution:
- Verify OAuth scopes include all-apis
- Check workspace user permissions
- Review cluster/job ACLs
- Ensure admin access for management ops
4. Cluster Creation Fails
- Error: "Spark version not supported"
- Solution:
- Use Databricks Runtime 13.3 LTS or higher
- Check if legacy features are disabled
- Verify node type availability
- Review workspace policies
5. DBFS Access Denied
- Error: "Public DBFS root is disabled"
- Solution:
- Use the /FileStore/ path instead of /tmp/
- Check DBFS permissions
- Verify workspace security settings
- Use allowed storage paths
6. Workspace Import Fails
- Error: "Folder is protected"
- Solution:
- Use the /Shared/ path instead of /Users/
- Check folder permissions
- Verify notebook format
- Use correct content encoding (see the sketch below)
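The Workspace import endpoint expects base64-encoded content, which is the most common cause of format errors. Here is a minimal sketch with placeholder file and workspace paths, reusing the illustrative env var names from earlier.

```python
# Sketch: import a Python source file into /Shared/ with the required
# base64 content encoding. File and workspace paths are placeholders.
import base64
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

with open("analysis.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers=headers,
    json={
        "path": "/Shared/analysis",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": False,
    },
    timeout=30,
)
resp.raise_for_status()
```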
Getting Help
- Documentation: AI Gateway Docs
- Databricks API Docs: docs.databricks.com/api
- API Reference: api-reference.cloud.databricks.com