Learn to build an MCP server exposing Snowflake/BigQuery to Claude. Step-by-step Python tutorial for secure AI-powered warehouse access.
An MCP (Model Context Protocol) server is a standardized interface that connects AI models—like Claude, GPT-4, or other LLMs—directly to your internal data systems. Instead of copying data out of your warehouse or writing custom API wrappers, an MCP server acts as a bridge: Claude can query your Snowflake or BigQuery warehouse, run SQL, fetch results, and reason about data without ever leaving your secure environment.
For data and engineering leaders, this changes the game. Your analytics team no longer needs to manually write queries or export CSVs. Your product team can embed AI-powered data exploration into your application. Your finance team can ask Claude to generate KPI dashboards on the fly. All of this happens through a single, authenticated, auditable interface.
The appeal is straightforward: you get text-to-SQL capabilities without building a custom LLM pipeline, and you maintain control over access, query complexity, and cost. When D23 manages Apache Superset with AI integration, the underlying architecture often includes MCP servers to wire AI agents directly to your BI layer. This tutorial walks you through building that same capability for your own warehouse.
Traditional BI platforms like Looker, Tableau, and Power BI are designed for dashboard consumption and scheduled reports. They're powerful, but they require data engineers to build the schema, analysts to create the dashboard, and business users to navigate a UI. That's a multi-step, human-intensive workflow.
An MCP server flips that. An analyst—or even a non-technical stakeholder—can ask Claude: "What was our revenue by region last quarter?" Claude calls your MCP server, which executes a safe, parameterized query against Snowflake, and returns the result in seconds. No dashboard refresh needed. No waiting for an analyst to build a report.
This is especially valuable for:
The downside is operational: you're responsible for building, testing, securing, and maintaining the server. This tutorial shows you how to do that in Python, using industry-standard tools.
Before writing code, understand the moving parts.
FastMCP is a Python framework that simplifies MCP server development. Instead of implementing the entire MCP protocol yourself, FastMCP handles serialization, request routing, and error handling. You write the business logic—connecting to Snowflake, executing queries—and FastMCP wires it up to Claude.
The MCP protocol defines how clients (like Claude) talk to servers. A client sends a JSON-RPC request to the server; the server processes it and returns a response. The protocol is language-agnostic, so a Python server works with any MCP-compatible client.
Resources are the data your server exposes. In this context, a resource might be a list of tables in your warehouse, or a parameterized SQL query template.
Tools are functions the client can call. When Claude uses your MCP server, it's calling a tool—e.g., execute_query(sql) or list_tables(schema). Tools return results that Claude can reason about.
For a deeper dive into MCP architecture, resources like How to Build MCP Servers for Your Internal Data provide excellent context on production patterns.
You'll need:
Start by creating a Python virtual environment:
python3 -m venv mcp_warehouse_env
source mcp_warehouse_env/bin/activate # On Windows: mcp_warehouse_env\Scripts\activate
pip install --upgrade pipNext, install the required packages:
pip install fastmcp snowflake-connector-python google-cloud-bigquery python-dotenvIf you're using Snowflake, snowflake-connector-python is the official client. For BigQuery, google-cloud-bigquery is standard. python-dotenv lets you load credentials from a .env file instead of hardcoding them.
Never hardcode credentials in your source code. Use environment variables.
Create a .env file in your project directory:
# For Snowflake
SNOWFLAKE_ACCOUNT=xy12345.us-east-1
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
SNOWFLAKE_DATABASE=ANALYTICS
# For BigQuery
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
BIGQUERY_PROJECT_ID=your-gcp-project
BIGQUERY_DATASET=analytics
Load these in your Python script:
import os
from dotenv import load_dotenv
load_dotenv()
snowflake_account = os.getenv('SNOWFLAKE_ACCOUNT')
snowflake_user = os.getenv('SNOWFLAKE_USER')
# ... and so onThis keeps secrets out of version control. Make sure .env is in your .gitignore.
Create a file called warehouse_mcp_server.py:
from fastmcp import FastMCP
from typing import Any
import json
# Initialize the MCP server
server = FastMCP("warehouse-mcp")
@server.tool()
def list_tables(schema: str = "PUBLIC") -> str:
"""
List all tables in a given schema.
Args:
schema: The schema name (default: PUBLIC)
Returns:
JSON string of table names
"""
# We'll implement warehouse-specific logic in the next step
return json.dumps({"tables": ["customers", "orders", "products"]})
@server.tool()
def execute_query(sql: str) -> str:
"""
Execute a SQL query against the warehouse.
Args:
sql: The SQL query to execute
Returns:
JSON string of results
"""
# Warehouse-specific implementation follows
return json.dumps({"result": "Query executed"})
if __name__ == "__main__":
server.run()This creates a basic server with two tools: list_tables and execute_query. FastMCP automatically exposes these to Claude. The @server.tool() decorator registers each function as a callable tool.
Run the server to verify it starts:
python warehouse_mcp_server.pyYou should see output indicating the server is listening. This is your foundation. Now we wire it to actual warehouse logic.
If you're using Snowflake, update warehouse_mcp_server.py to connect and execute real queries:
import os
from dotenv import load_dotenv
import snowflake.connector
from fastmcp import FastMCP
import json
load_dotenv()
server = FastMCP("warehouse-mcp")
def get_snowflake_connection():
"""
Establish a connection to Snowflake.
"""
return snowflake.connector.connect(
account=os.getenv('SNOWFLAKE_ACCOUNT'),
user=os.getenv('SNOWFLAKE_USER'),
password=os.getenv('SNOWFLAKE_PASSWORD'),
warehouse=os.getenv('SNOWFLAKE_WAREHOUSE'),
database=os.getenv('SNOWFLAKE_DATABASE')
)
@server.tool()
def list_tables(schema: str = "PUBLIC") -> str:
"""
List all tables in a given schema.
"""
try:
conn = get_snowflake_connection()
cursor = conn.cursor()
cursor.execute(f"SHOW TABLES IN SCHEMA {schema}")
tables = [row[1] for row in cursor.fetchall()]
cursor.close()
conn.close()
return json.dumps({"schema": schema, "tables": tables})
except Exception as e:
return json.dumps({"error": str(e)})
@server.tool()
def execute_query(sql: str) -> str:
"""
Execute a SQL query and return results as JSON.
"""
try:
conn = get_snowflake_connection()
cursor = conn.cursor()
cursor.execute(sql)
columns = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()
cursor.close()
conn.close()
results = [dict(zip(columns, row)) for row in rows]
return json.dumps({"columns": columns, "rows": results})
except Exception as e:
return json.dumps({"error": str(e)})
if __name__ == "__main__":
server.run()Now when Claude calls execute_query, your server connects to Snowflake, runs the SQL, and returns structured results. The key improvements:
sqlalchemy) to avoid opening a new connection per query.If you prefer BigQuery, the pattern is similar:
import os
from dotenv import load_dotenv
from google.cloud import bigquery
from fastmcp import FastMCP
import json
load_dotenv()
server = FastMCP("warehouse-mcp")
def get_bigquery_client():
"""
Create a BigQuery client using service account credentials.
"""
return bigquery.Client(project=os.getenv('BIGQUERY_PROJECT_ID'))
@server.tool()
def list_tables(dataset: str = None) -> str:
"""
List all tables in a BigQuery dataset.
"""
try:
client = get_bigquery_client()
dataset_id = dataset or os.getenv('BIGQUERY_DATASET')
dataset_ref = client.dataset(dataset_id)
tables = client.list_tables(dataset_ref)
table_names = [table.table_id for table in tables]
return json.dumps({"dataset": dataset_id, "tables": table_names})
except Exception as e:
return json.dumps({"error": str(e)})
@server.tool()
def execute_query(sql: str) -> str:
"""
Execute a BigQuery SQL query and return results.
"""
try:
client = get_bigquery_client()
query_job = client.query(sql)
results = query_job.result()
# Convert to list of dicts
rows = [dict(row) for row in results]
columns = list(results.schema) if results.schema else []
return json.dumps({"columns": [field.name for field in columns], "rows": rows})
except Exception as e:
return json.dumps({"error": str(e)})
if __name__ == "__main__":
server.run()BigQuery's Python client is higher-level than Snowflake's connector, so the code is slightly cleaner. The logic is identical: connect, execute, return JSON.
Before you expose your warehouse to Claude, add safeguards. Claude is powerful, but it can generate expensive queries—full table scans, runaway joins, or accidental DELETE statements.
Implement a query validator:
import re
FORBIDDEN_KEYWORDS = ['DROP', 'DELETE', 'TRUNCATE', 'ALTER', 'GRANT']
MAX_ROWS = 10000
def validate_query(sql: str) -> tuple[bool, str]:
"""
Validate a query for safety.
Returns:
(is_valid, error_message)
"""
sql_upper = sql.strip().upper()
# Check for dangerous keywords
for keyword in FORBIDDEN_KEYWORDS:
if re.search(rf'\b{keyword}\b', sql_upper):
return False, f"Query contains forbidden keyword: {keyword}"
# Ensure it's a SELECT (or other read-only operation)
if not sql_upper.startswith('SELECT'):
return False, "Only SELECT queries are allowed"
return True, ""
@server.tool()
def execute_query(sql: str) -> str:
"""
Execute a SQL query with validation.
"""
is_valid, error = validate_query(sql)
if not is_valid:
return json.dumps({"error": error})
try:
conn = get_snowflake_connection()
cursor = conn.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
# Limit result set size
if len(rows) > MAX_ROWS:
rows = rows[:MAX_ROWS]
warning = f"Result set truncated to {MAX_ROWS} rows"
else:
warning = None
columns = [desc[0] for desc in cursor.description]
results = [dict(zip(columns, row)) for row in rows]
cursor.close()
conn.close()
response = {"columns": columns, "rows": results}
if warning:
response["warning"] = warning
return json.dumps(response)
except Exception as e:
return json.dumps({"error": str(e)})This blocks dangerous operations and caps result sets. In production, you might also:
Once your server is running locally, test it with Claude. Create a simple test script:
import anthropic
import json
import subprocess
import time
# Start your MCP server in the background
server_process = subprocess.Popen(['python', 'warehouse_mcp_server.py'])
time.sleep(2) # Give the server time to start
client = anthropic.Anthropic(api_key="your-anthropic-api-key")
# Define your MCP server as a tool
mcp_tools = [
{
"type": "tool",
"name": "execute_query",
"description": "Execute a SQL query against the warehouse",
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "The SQL query"}
},
"required": ["sql"]
}
},
{
"type": "tool",
"name": "list_tables",
"description": "List tables in a schema",
"input_schema": {
"type": "object",
"properties": {
"schema": {"type": "string", "description": "Schema name"}
},
"required": []
}
}
]
# Ask Claude a question
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=mcp_tools,
messages=[
{"role": "user", "content": "What tables are in the PUBLIC schema?"}
]
)
print(json.dumps(message, indent=2, default=str))
server_process.terminate()This test sends a question to Claude, which calls your MCP server's list_tables tool. Claude receives the result and responds to the user.
For production deployment, consider:
Once your basic server works, layer on sophistication:
Instead of Claude writing SQL from scratch (which can be error-prone), have it generate SQL, then validate and execute:
@server.tool()
def natural_language_query(question: str) -> str:
"""
Convert a natural language question to SQL and execute it.
"""
# Use Claude to generate SQL from the question
# (This requires a separate Claude API call)
# Then execute the generated SQL
passExpose your warehouse schema so Claude understands what data is available:
@server.tool()
def describe_table(table_name: str) -> str:
"""
Return column names, types, and sample data for a table.
"""
passCache frequently-run queries to reduce warehouse load and latency:
from functools import lru_cache
@lru_cache(maxsize=100)
def cached_execute_query(sql: str) -> str:
# Execute and cache
passFor more advanced patterns, Building MCP servers in the real world covers production lessons from engineering teams.
If you're running D23—the managed Apache Superset platform—you can layer your MCP server on top for conversational BI. Instead of users clicking through dashboards, they ask Claude questions, which queries your warehouse via the MCP server, and returns insights.
D23's self-serve BI capabilities combined with an MCP server create a powerful workflow:
This is especially valuable for teams managing analytics across data consulting engagements or portfolio companies. One MCP server can expose data from multiple sources, and Claude can correlate insights across them.
When exposing your warehouse to an AI model, security is non-negotiable:
Your MCP server should authenticate requests. If you're deploying over HTTP, use TLS and require API keys:
from fastapi import FastAPI, Header, HTTPException
app = FastAPI()
VALID_API_KEYS = [os.getenv('MCP_API_KEY')]
@app.post("/query")
async def query_endpoint(sql: str, x_api_key: str = Header(None)):
if x_api_key not in VALID_API_KEYS:
raise HTTPException(status_code=401, detail="Invalid API key")
# Process queryEncrypt credentials in transit and at rest. Use secrets management (AWS Secrets Manager, HashiCorp Vault) instead of .env files in production.
Log every query Claude executes:
import logging
logger = logging.getLogger(__name__)
logger.info(f"Query executed: {sql[:100]}... by {user_id} at {timestamp}")If your warehouse supports it (Snowflake and BigQuery do), enforce RLS so Claude only sees rows the requesting user is authorized to access.
For compliance frameworks like HIPAA or SOC 2, consult your warehouse provider's security documentation and your legal team. The D23 Privacy Policy and Terms of Service outline how managed platforms handle data governance.
Pitfall 1: Unbounded queries. Claude might ask for "all customers" without a LIMIT. Always cap result sets and set query timeouts.
Pitfall 2: Exposing sensitive data. If your warehouse contains PII, PHI, or financial data, use RLS or masking to prevent Claude from accessing it.
Pitfall 3: Expensive queries. A full table scan on a billion-row table costs money and time. Validate queries before execution and educate Claude on efficient patterns.
Pitfall 4: No error handling. If a query fails, Claude needs a clear error message so it can retry or ask the user for clarification.
Pitfall 5: Ignoring latency. If your MCP server takes 30 seconds to respond, Claude's response to the user will be slow. Optimize query performance and consider caching.
Imagine you're a PE firm with 15 portfolio companies, each with its own data warehouse. You build a single MCP server that connects to all 15 warehouses:
WAREHOUSES = {
"company_a": {"type": "snowflake", "account": "..."},
"company_b": {"type": "bigquery", "project": "..."},
# ... 13 more
}
@server.tool()
def execute_query_across_warehouses(company: str, sql: str) -> str:
"""
Execute a query against a specific portfolio company's warehouse.
"""
warehouse_config = WAREHOUSES.get(company)
if not warehouse_config:
return json.dumps({"error": f"Unknown company: {company}"})
# Connect to the appropriate warehouse and execute
passNow your CFO can ask Claude: "What's the churn rate across all portfolio companies?" Claude calls your MCP server with queries for each company, aggregates results, and returns a comparison. No manual report generation. No waiting for analysts.
This is the power of MCP servers: they democratize data access while keeping your warehouse secure and your costs under control.
Before deploying to production, test thoroughly:
Resources like Build an MCP Server: Complete MCP Tutorial for Beginners and Building a Basic MCP Server with Python provide additional test patterns and examples.
Once deployed, monitor your MCP server:
Tools like Datadog, New Relic, or open-source Prometheus can aggregate these metrics.
As your MCP server matures:
calculate_ltv(), forecast_churn()) that Claude can call alongside SQL queries.For teams building embedded analytics, the combination of an MCP server and embedded analytics on Apache Superset is powerful. You control the data access layer (MCP), the BI layer (Superset), and the AI reasoning layer (Claude), without vendor lock-in.
Building an MCP server for your internal data warehouse is a concrete, high-ROI project. You're not adopting a new BI platform; you're adding a conversational interface to your existing warehouse. The implementation is straightforward—a few hundred lines of Python—and the payoff is immediate: faster insights, reduced manual reporting, and AI-powered analytics without vendor lock-in.
Start with the basics: connect to your warehouse, expose a few key tables, and test with Claude. Iterate from there. Add validation, caching, and monitoring. Integrate with your existing BI stack. Within weeks, you'll have a system that turns your warehouse into a conversational data engine.
For teams already using D23 for API-first BI and self-serve analytics, an MCP server is the natural next step. It complements Superset's dashboard-centric model with a query-centric, AI-powered alternative. And because both are open-source or open-source-based, you maintain control and avoid the cost and complexity of monolithic BI platforms like Looker, Tableau, or Power BI.
The tutorial above is production-ready. Use it as a template, adapt it to your warehouse, and deploy with confidence. Your data team will thank you.