Perplexity AI MCP server

Perplexity AI is an AI-powered search and answer engine that provides web-grounded responses with inline citations, real-time web search, text embeddings, and multi-provider agent capabilities. With this MCP server, AI agents can run web-grounded chat completions using Sonar models, perform real-time web searches with domain and date filtering, generate embeddings for semantic search and RAG, and orchestrate responses across third-party models from OpenAI, Anthropic, Google, and xAI.

Setting up an MCP server

This article covers the standard steps for creating an MCP server in AI Gateway and connecting it to an AI client. The steps are the same for every integration — application-specific details (API credentials, OAuth endpoints, and scopes) are covered in the individual application pages.

Before you begin

You'll need:

  • Access to AI Gateway with permission to create MCP servers
  • API credentials for the application you're connecting (see the relevant application page for what to collect)

Create an MCP server

Find the API in the catalog

  1. Sign in to AI Gateway and select MCP Servers from the left navigation.
  2. Select New MCP Server.
  3. Search for the application you want to connect, then select it from the catalog.

Configure the server

  1. Enter a Name for your server — something descriptive that identifies both the application and its purpose.
  2. Enter a Description so your team knows what the server is for.
  3. Set the Timeout value. 30 seconds works for most APIs; increase to 60 seconds for APIs that return large payloads.
  4. Toggle Production mode on if this server will be used in a live workflow.
  5. Select Next.

Configure authentication

Enter the authentication details for the application. This varies by service — see the Authentication section of the relevant application page for the specific credentials, OAuth URLs, and scopes to use.

Configure security

  1. Set any Rate limits appropriate for your use case and the API's own limits.
  2. Enable Logging if you want AI Gateway to record requests and responses for auditing.
  3. Select Next.

Deploy

Review the summary, then select Deploy. AI Gateway provisions the server and provides a server URL you'll use when configuring your AI client.


Connect to an AI client

Once your server is deployed, add it to the AI client your team uses. Follow the setup instructions for your specific client to register the server URL.

Tips

  • You can create multiple MCP servers for the same application — for example, a read-only server for reporting agents and a read-write server for automation workflows.
  • If you're unsure which OAuth scopes to request, start with the minimum read-only set and add write scopes only when needed. Most application pages include scope recommendations.
  • You can edit a server's name, description, timeout, and security settings after deployment without redeploying.

Authentication

Perplexity AI uses API key authentication via a Bearer token in the Authorization header. Generate your API key from the Perplexity console.
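A minimal sketch of the required headers, assuming the key is stored in an environment variable (the variable name `PPLX_API_KEY` is a placeholder, not mandated by Perplexity):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the headers Perplexity expects: the API key as a Bearer token."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# PPLX_API_KEY is an illustrative environment variable name.
headers = auth_headers(os.environ.get("PPLX_API_KEY", "pplx-example-key"))
```

The same headers are sent on every request, regardless of which tool or endpoint is called.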

Available tools

The Perplexity AI MCP server exposes web-grounded chat completions, asynchronous queries, multi-provider agent responses, real-time web search, and text embedding APIs.

  • Chat Completion (Sonar): Web-grounded chat completions with citations using Sonar models (sonar, sonar-pro, sonar-deep-research, sonar-reasoning-pro)
  • Async Chat Completion: Submit, list, and retrieve long-running chat completions asynchronously; ideal for deep research queries
  • Agent: Generate responses using third-party models (OpenAI, Anthropic, Google, xAI) with web search tools, function calling, and model fallback chains
  • Web Search: Real-time ranked web search with domain allowlist/denylist, language, country, date range, and recency filtering
  • Embeddings: Standard text embeddings for semantic search, clustering, and RAG, with Matryoshka dimensionality reduction
  • Contextualized Embeddings: Context-aware embeddings for document chunks that share document-level context for improved retrieval quality
  • Auth Token Management: Generate and revoke API authentication tokens programmatically

Tips

Choose the right Sonar model for your use case — sonar for fast general queries, sonar-pro for higher quality, sonar-deep-research for comprehensive multi-step research, and sonar-reasoning-pro for complex reasoning tasks.
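As a sketch, model selection can be a simple switch on the kind of query; the helper below builds a chat completion payload (the function name and the `deep` flag are illustrative, not part of the API):

```python
def sonar_payload(question: str, deep: bool = False) -> dict:
    """Build a chat completion payload, picking a Sonar model by task type."""
    # "sonar" for fast general queries; "sonar-deep-research" for
    # comprehensive multi-step research (see the model list above).
    model = "sonar-deep-research" if deep else "sonar"
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = sonar_payload("What changed in the EU AI Act this year?", deep=True)
```

The payload is then POSTed to the chat completions endpoint with the Bearer-token headers described under Authentication.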

Use async completions for deep research queries that may take longer to process — submit the request, then poll for results by request ID instead of waiting on a synchronous call.
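The submit-then-poll pattern can be sketched transport-agnostically; here `fetch` is any injected callable that retrieves the status JSON for a request ID (the `status` field value `"COMPLETED"` is an assumption about the response shape — check the async API reference for the exact schema):

```python
import time

def poll_async_result(fetch, request_id: str, interval: float = 2.0,
                      max_tries: int = 30) -> dict:
    """Poll an async chat completion until it finishes.

    `fetch` is a callable that takes a request ID and returns the parsed
    status JSON; it is injected so this sketch stays transport-agnostic.
    """
    for _ in range(max_tries):
        result = fetch(request_id)
        if result.get("status") == "COMPLETED":  # assumed status value
            return result
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} did not finish in time")
```

In practice `fetch` would issue a GET for the stored request ID against the async completions endpoint, letting the agent do other work between polls.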

Filter web searches by domain using search_domain_filter to restrict results to trusted sources like arxiv.org or scholar.google.com, and use search_recency_filter to limit results by time (day, week, month, year).
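Both filters ride along in the request body next to the model and messages; a minimal sketch (the example query and domain choices are illustrative):

```python
def filtered_search_payload(query: str) -> dict:
    """Chat completion payload restricted to trusted domains and recent results."""
    return {
        "model": "sonar",
        "messages": [{"role": "user", "content": query}],
        # Allowlist trusted sources; a leading "-" denylists a domain.
        "search_domain_filter": ["arxiv.org", "scholar.google.com"],
        # Only consider results from the past week.
        "search_recency_filter": "week",
    }

payload = filtered_search_payload("recent advances in quantum error correction")
```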

Request structured JSON output with response_format: { type: "json_schema" } to get machine-readable responses that fit directly into downstream workflows.
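A sketch of the request shape, assuming a JSON Schema wrapped under a `json_schema` key as in OpenAI-compatible APIs (the schema itself is an example, not a required format):

```python
# Example schema: force the answer into a fixed machine-readable shape.
answer_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer"],
}

payload = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Who won the 2022 World Cup?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"schema": answer_schema},
    },
}
```

The model's reply can then be parsed with an ordinary JSON parser and validated against the same schema before being handed to downstream steps.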

Leverage the Agent API to route queries across multiple LLM providers with automatic model fallback chains — specify a list of models and the API will try each in order if one fails.
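A hypothetical request sketch based on the fallback behavior described above; the field names and model identifier strings here are assumptions, so consult the Agent API reference for the real schema:

```python
def agent_payload(prompt: str, models: list) -> dict:
    """Build an Agent API request with a model fallback chain (assumed shape)."""
    return {
        # Models are tried in order; if one fails, the next is used.
        # The identifier format "provider/model" is illustrative only.
        "model": models,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = agent_payload(
    "Summarize today's developments in AI regulation",
    ["openai/gpt-4o", "anthropic/claude-sonnet-4"],  # illustrative IDs
)
```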

Use contextualized embeddings instead of standard embeddings when embedding document chunks — chunks from the same document share context, which improves retrieval accuracy in RAG pipelines.
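A hypothetical payload sketch: all chunks from one document are submitted together so the service can share document-level context across them. The nested-list input shape and the model name are placeholders, not confirmed API details:

```python
def contextualized_embedding_payload(document_chunks: list) -> dict:
    """Embed all chunks of one document together (assumed request shape)."""
    return {
        "model": "<contextualized-embedding-model>",  # placeholder name
        # One inner list per document, so chunks share document context.
        "input": [document_chunks],
    }

payload = contextualized_embedding_payload(
    ["Introduction to the study.", "Methods and experimental setup."]
)
```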

Store API keys in environment variables or secure vaults, never in client-side code.

Rotate API keys periodically and revoke unused tokens through the Auth Token Management API.