From Chatbots to Agents: The Evolution of AI with LLMs
AI has moved fast—really fast. Just a year ago, most people were getting excited about chatbots that could write emails or summarize articles. Now we’re talking about autonomous agents that can plan vacations, write code, schedule your meetings, and even collaborate with other agents. So what changed?
This post walks you through the natural evolution of AI systems built with large language models (LLMs), from simple use cases to tool-augmented workflows, and finally to full-blown autonomous agents.
Phase 1: Simple LLM Applications
The journey begins with basic uses of LLMs — what many people first encounter when using tools like ChatGPT, Claude, or Gemini.
These applications are typically stateless, meaning they don’t remember previous interactions beyond a single session (unless explicitly designed to). They’re also reactive: they wait for a human to provide input, and then respond.
Typical use cases
Chatbots: Answering product questions, FAQs, or guiding users through scripted workflows.
Text Transformation: Summarizing articles, rewriting content, changing tone or format.
Coding Assistants: Auto-completing functions, translating code, or explaining snippets.
Language Tasks: Translating text, correcting grammar, extracting entities or keywords.
Knowledge Lookup (with baked-in info): "What is quantum entanglement?" or "Who won the 2014 World Cup?"
These tasks are surprisingly powerful — and for many users, this is more than enough. With the right prompt engineering, simple LLM apps can appear smart, fast, and even creative. Here is a minimal Python script demonstrating such a simple LLM application:
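This is only a sketch: it assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable, but any chat-completion client works the same way. The model receives one prompt, returns one completion, and keeps no state between calls.

```python
# A minimal, stateless LLM application.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set;
# any chat-completion client follows the same pattern.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("Summarize the plot of Hamlet in two sentences."))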
These early LLM applications amazed users with their apparent intelligence, but their simplicity also revealed clear constraints. Understanding these limitations is crucial to appreciating why developers began moving beyond simple chatbots toward more powerful, integrated systems.
Limitations
No long-term memory or personalization.
No access to live or private data (unless fine-tuned on it or augmented with retrieval).
Cannot perform real actions (e.g., open files, call APIs, send emails).
Highly dependent on user prompting to get specific results.
Even well-written prompts don’t give the model true understanding — just the illusion of it. That’s why these systems can:
Repeat themselves
Miss obvious facts
“Hallucinate” confidently wrong answers
Still, this phase laid the groundwork. But as impressive as LLMs were, one key limitation became obvious: the model didn’t know your data — your policies, documents, or systems. The next question was inevitable: how can we give it access to that knowledge, and eventually, the power to use it?
Retrieval-Augmented Generation (RAG)
What if you want the LLM to answer questions using your own data — things it wasn’t trained on, like internal documents, customer support logs, engineering specs, or legal contracts? That’s where Retrieval-Augmented Generation (RAG) comes in.
RAG is a method for dynamically injecting external information into the prompt at runtime. Instead of hoping the model knows something, you retrieve relevant content and feed it directly to the model as context.
Let’s say your company has a 60-page procurement manual. You want to ask:
“Under what conditions can a department make a purchase over $10,000 without VP approval?”
This is a perfect RAG use case:
It’s too specific for the LLM to know by default.
The answer exists somewhere in internal documentation.
The user shouldn’t have to dig manually — the system should do it.
In a RAG system, when a user asks a question, the workflow typically looks like this:
Embed the query – The user’s question is converted into a vector using an embedding model.
Search the knowledge base – The query vector is compared against vectors of your documents (stored in a vector database like Pinecone, Weaviate, or FAISS) to find the most relevant chunks of text.
Retrieve top results – The most relevant sections (e.g., paragraphs or bullet points) are selected.
Construct the augmented prompt – The retrieved text snippets are inserted into the LLM prompt alongside the user’s question, forming a rich context window.
Generate the answer – The LLM uses this context to craft a response grounded in your actual data.
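Here is a minimal sketch of that workflow. The embed and llm_complete functions are hypothetical stand-ins for your embedding model and LLM client; a production system would pre-compute chunk embeddings and store them in a vector database rather than embedding every chunk per query.

```python
# Minimal RAG sketch. `embed` and `llm_complete` are hypothetical stand-ins
# for an embedding model and an LLM client; in production, chunk vectors
# would be pre-computed and stored in a vector database (e.g., FAISS).
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_with_rag(question, chunks, embed, llm_complete, top_k=3):
    q_vec = embed(question)                                    # 1. embed the query
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])                      # 2–3. retrieve top chunks
    prompt = (                                                 # 4. augmented prompt
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm_complete(prompt)                                # 5. grounded answer
```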
This approach turns an LLM from a generic, public knowledge assistant into a domain-specific expert tuned to your organization’s private information — without retraining the model itself.
Why RAG is a breakthrough:
Live knowledge – It connects your LLM to a constantly updated knowledge base, reflecting the latest policies, procedures, or data.
Reduced hallucinations – Grounding the model in authoritative documents helps reduce confidently wrong answers.
No retraining needed – Updating your knowledge base instantly updates what your assistant knows, without retraining or fine-tuning the model.
Scalable – You can expand coverage simply by adding new documents or data to your retrieval index.
Challenges to watch for:
Chunking strategy – If documents are split poorly (e.g., too small or too large), retrieval accuracy suffers.
Quality of embeddings – Better embedding models yield more relevant search results.
Latency – Combining retrieval and generation can add delay; optimizing retrieval speed and prompt size is key for a smooth user experience.
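To make the first point concrete, a common baseline is fixed-size chunking with overlap. This is only a sketch (splitting on whitespace for simplicity); real pipelines often split on sentences, headings, or semantic boundaries instead.

```python
# Naive fixed-size chunking with overlap, a common baseline.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]
```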
RAG systems are now at the heart of many advanced AI applications: internal chatbots that answer questions about company SOPs, customer support assistants that read ticket histories, and research assistants that ground answers in thousands of pages of regulations or research for knowledge workers. By bridging the gap between static model knowledge and dynamic, proprietary information, RAG enables language models to truly “know what you know.”
But even with RAG, we’re still fundamentally in the realm of one-shot responses: everything the model needs must be packed into a single prompt at inference time. While retrieval can dynamically inject up-to-date knowledge, the LLM can’t do anything beyond generating text. It can’t perform calculations, interact with APIs, or carry out multi-step processes.
For example, if your question requires both retrieving a document and checking a live system (like an inventory database or weather API), RAG alone isn’t enough — because LLMs without tools can’t call external functions or get live data beyond what’s stuffed into the prompt.
To move beyond static, context-only interactions, we need something more sophisticated: a way for LLMs to use tools, extending their capabilities from pure language generation to actively interfacing with other systems. This next evolution turns the LLM into a reasoning engine that can decide when to call APIs, perform calculations, or fetch real-time information — opening the door to far more powerful and interactive AI systems.
Phase 2: LLMs + Tools
Retrieval made it possible for LLMs to answer questions using your data, but there was still a fundamental limit: they could only respond with text. What if the best response requires action, like calculating a complex formula, querying a database, or booking a meeting?
This is where tool use changes everything.
Instead of trying to cram every possible piece of data or functionality into the prompt, we give the LLM access to external tools it can call on demand. Tools can be anything from calculators and web browsers to your internal APIs, databases, or third-party services.
Think of it like upgrading the LLM from a smart librarian — who can find information — to a digital assistant — who can find information and take meaningful action.
With tool use, the LLM no longer needs every detail packed into the prompt. Instead, it can recognize when it’s missing information, decide what steps to take, and call external tools to gather data or perform actions.
How it works in practice:
We (the developers) implement and host the tools: these can be functions, APIs, calculators, or any service your LLM can use.
When we send a prompt to the LLM, we also describe the tools we’ve made available — including each tool’s name, what it does, and what arguments it accepts.
The LLM interprets the prompt, decides whether it needs to use any of the tools, and if so, generates a structured request (like a function call) specifying which tool to invoke and with what arguments.
We, as the system orchestrator, receive this tool call, execute it in the real world, and send the results back to the LLM.
The LLM then incorporates the tool’s output into its final response to the user.
To see how this works in practice, imagine we send the LLM a pseudo prompt like this:
What’s the weather in San Francisco today? By the way, I have a tool getWeather which delivers weather for a given city and date.
Of course, this isn’t how a real prompt would look in an actual implementation — it’s just a simplified example to demonstrate the concept of tool use. In reality, tools are described to the LLM in structured formats (like function schemas), not casual text.
In this example, the LLM interprets the user’s question and realizes it needs live weather data to answer accurately. Based on the available tools we’ve told it about, it decides to call the getWeather tool with “San Francisco” and “today” as inputs.
Our system then executes this tool, fetches the current weather, and sends the result back to the LLM. The model incorporates this real-time information into a final response for the user, such as:
The current temperature in San Francisco is 66°F with clear skies.
This workflow highlights the key innovation: we provide the tools and describe them to the LLM, and the LLM determines when and how to use them — turning it from a static text generator into a dynamic assistant capable of real-world actions.
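Here is a sketch of that loop, using the OpenAI Python SDK’s function-calling interface as one concrete example (getWeather is a stub, and the details differ slightly across providers, but the pattern is the same):

```python
# Tool-use sketch with the OpenAI Python SDK; `getWeather` is a local stub.
import json
from openai import OpenAI

client = OpenAI()

def getWeather(city: str, date: str) -> dict:
    # Stub: a real implementation would call a weather API here.
    return {"city": city, "date": date, "temp_f": 66, "conditions": "clear skies"}

# Describe the tool to the LLM as a structured schema, not casual text.
tools = [{
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the weather for a given city and date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string"},
            },
            "required": ["city", "date"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in San Francisco today?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:                              # the model decided it needs the tool
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = getWeather(**args)                 # we execute the call in the real world
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)     # answer grounded in the tool result
else:
    print(msg.content)                          # no tool needed; answer directly
```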
But even with tools, each request is still fundamentally a single-turn interaction: the LLM receives the prompt, decides whether it needs a tool, calls it if necessary, and produces an answer. Once the response is sent, the conversation effectively resets — there’s no memory of past attempts, and no ability to adapt or plan multiple steps.
To move beyond one-off questions and actions, we need something more sophisticated: systems that can break a high-level goal into sub-tasks, make decisions over multiple steps, handle errors, and remember context throughout a process. That’s where the concept of agents comes in.
Phase 3: Agents
Now we’re getting to the fun part.
Agents combine everything above — retrieval, tool use, and LLM reasoning — with one key idea: autonomy. Instead of handling a single prompt-response cycle, agents can work toward a high-level goal by planning, executing, and adapting over multiple steps.
An AI agent doesn’t just generate an answer — it can:
Receive a complex objective, like “Book me a flight to New York next Friday under $300, and then schedule a meeting with the team the day after I arrive.”
Break it down into sub-tasks, such as searching flights, comparing options, booking tickets, checking calendars, and sending invites.
Use tools or APIs repeatedly as needed to carry out each step.
Decide what to do next based on intermediate results (e.g., if no flights under $300 are available, look for nearby airports).
Handle errors or retries (e.g., if a payment fails or an API times out).
Optionally store memory, so it can refer back to previous decisions or actions as the task progresses.
This multi-step, adaptive behavior makes agents feel less like question-answering bots and more like collaborative digital workers capable of handling real-world workflows end-to-end.
What makes agents different from tool-augmented LLMs?
Whereas tool use lets an LLM perform one action in response to one prompt, agents can build a plan, adjust it dynamically, and execute a sequence of actions — often involving multiple tool calls across many turns — to achieve the final outcome.
Agents don’t just respond, they reason over time. For example:
If an agent tasked with “Find three laptops under $1,000 with good battery life” sees the first two search results exceed the budget, it can keep searching, compare specs, and update its plan until it finds suitable options — instead of giving up after one tool call.
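At its core, an agent replaces the single prompt-response call with a loop: decide the next step, act, observe, repeat until the goal is met. Here is a rough sketch of that loop; llm_decide_next_step and run_tool are hypothetical stand-ins for the model call and the tool executor, not any particular framework’s API.

```python
# Minimal agent-loop sketch. `llm_decide_next_step` and `run_tool` are
# hypothetical stand-ins for the LLM call and the tool executor.
def run_agent(goal, tools, llm_decide_next_step, run_tool, max_steps=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = llm_decide_next_step(history, tools)      # plan the next action
        if step["type"] == "final_answer":               # goal reached
            return step["content"]
        observation = run_tool(step["tool"], step["arguments"])   # act
        history.append({"role": "assistant", "content": str(step)})
        history.append({"role": "tool", "content": str(observation)})  # observe
    return "Stopped: reached the step limit before completing the goal."
```

The history list doubles as the agent’s short-term memory, which is what lets it adapt its plan based on intermediate results instead of resetting after every turn.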
Why agents matter
Agents represent a major leap forward because they let you move from simple, single-turn interactions to powerful, goal-driven systems. With agents, you can automate workflows that previously required human coordination — from booking travel and compiling reports to troubleshooting technical issues or orchestrating multi-step customer support tasks.
By combining planning, reasoning, and tool use, agents don’t just answer questions — they deliver outcomes. This shift transforms LLMs from reactive assistants into proactive collaborators capable of executing complex, real-world processes.
Trust, Transparency, and Control
One of the biggest concerns around AI agents is control. People worry about giving too much autonomy to systems that can hallucinate facts, generate faulty code, or make decisions based on incorrect assumptions. And these fears are valid—LLMs can still confidently produce incorrect outputs.
However, this is also where well-designed agents start to shine. Unlike a one-shot LLM response that might spit out an answer with no explanation, agents are explicit about their reasoning. They:
Break tasks into steps
Show you intermediate decisions
Generate and reveal tool inputs like SQL queries, API calls, or code
Justify why a certain action is being taken
This makes agents inherently more transparent than simple LLM apps. You don’t just get an answer—you see how the agent arrived at it, and you can inspect, approve, or intervene as needed. Instead of hiding complexity, agents give you a traceable process.
Think of it less like talking to a "magic box" and more like collaborating with a very capable (and very verbose) intern.
As agent frameworks mature, expect to see better UIs for reviewing steps, controls for restricting actions, and logs that help you debug or audit agent behavior—making it easier to trust these systems, not just for what they say, but for how they think.
Introducing GATE/0: A Gateway for Your LLM and Agentic Applications
As we move from simple LLM prompts to retrieval, tool use, and fully autonomous agents, one thing becomes clear: these systems are incredibly powerful — but also complex. Once agents start reasoning over multiple steps, calling tools, and interacting with live systems, it becomes critical to understand what they’re doing, why they’re doing it, and how much it costs.
That’s where GATE/0 comes in.
GATE/0 provides a unified gateway for all your LLM and agentic applications, giving you deep visibility into every prompt, tool call, and decision your agents make. It lets you trace your AI workflows just like you would trace traditional software, so you can:
Analyze costs — Understand how your LLM usage translates to spending, down to individual prompts or tool invocations.
Debug behavior — See exactly what your agents did at each step, including tool inputs, retrieved documents, and reasoning chains.
Audit decisions — Trace how agents arrived at their conclusions, making it easier to spot errors or validate outputs.
Improve reliability — Identify and fix failure points in complex multi-step processes.
As AI systems become more agentic, observability and control aren’t optional — they’re essential. GATE/0 helps you move confidently into this new era, where your LLM-powered applications aren’t just reactive bots, but intelligent, proactive agents driving real business outcomes.