The MCP Game: An AI-Powered Escape Game
Itai Reingold-Nutman · August 6, 2025


Introduction

Since the launch of the Model Context Protocol (MCP), thousands of companies have adopted it. Common use cases include empowering internal teams with quick access to company data or providing end users with seamless, natural-language interfaces to their software.

In our spare time, however, we've played around with using the MCP in a less conventional way: to power an escape room game. This “MCP Game” was enjoyable to create and play, and it offered interesting lessons about the evolving role of MCPs in the world.

What Exactly is an "MCP Game"?

The core idea for an MCP Game sparked from a realization: any game, when powered by a large language model (LLM), can now be infused with new life and capabilities thanks to the MCP. Previously, creating an LLM-based game that could interact with a dynamic world, execute actions, and change states required individual integrations. The MCP fundamentally alters this and simplifies the development of these games.

Our specific creation is an escape room. What better way to demonstrate the thrill of this new genre of “MCP Games” than by trapping players in a virtual room, challenging them to prompt the LLM to take actions that lead to their escape? It's a fun, immersive experience where you're truly in control.

Example Gameplay

How It Works

Here's a high-level pipeline of how the open-source game works:

  1. Player Input: A user provides a query in a specific room, for example, "open the door."
  2. Client-Side Tool Selection (LLM1): That query is sent to an initial LLM residing within the client. Along with a system prompt and a list of available tools (e.g., open_door, look_under_rug), this LLM decides which tool/action is most relevant. (We've also incorporated tools like impossible_action and multiple_actions to handle edge cases where a single, valid action isn't requested by the user.)
  3. Server-Side Execution: The client then executes this tool call, which triggers the server to perform the necessary changes to its internal game state. The server then generates a new image reflecting the updated game state and returns that image along with a factual description of what occurred (e.g., "You discovered a set of bars behind the door").
  4. Client-Side Response Enhancement (LLM2): The client receives these factual changes. It then sends these changes, along with another system prompt and the user's initial query, to a second LLM. This LLM's role is to craft a natural-language sentence that summarizes the changes in an engaging and atmospheric way for the user.
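
To make the flow concrete, here's a rough client-side sketch of that pipeline using the official MCP Python SDK over Streamable HTTP. The server URL, the `/mcp` path, and the two LLM helper functions are placeholders of our own for illustration, not the game's actual code:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

SERVER_URL = "http://localhost:8000/mcp"  # assumed endpoint; point this at your MCP server


def pick_tool(user_query: str, tool_names: list[str]) -> str:
    """LLM1 (placeholder): choose the single most relevant tool for the query."""
    # Replace with a real model call; returning the first tool is just a stand-in.
    return tool_names[0]


def narrate(user_query: str, tool_name: str, factual_result: str) -> str:
    """LLM2 (placeholder): rewrite the factual state change as an atmospheric sentence."""
    # Replace with a real model call given a short system prompt.
    return f"As you try to '{user_query}', you find that {factual_result.lower()}"


async def play_turn(user_query: str) -> str:
    async with streamablehttp_client(SERVER_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Steps 1-2: LLM1 selects one tool from the server's advertised tools.
            tools = await session.list_tools()
            tool_name = pick_tool(user_query, [t.name for t in tools.tools])

            # Step 3: the server executes the tool, updates the room state, and
            # returns a factual description (plus an updated image, omitted here).
            result = await session.call_tool(tool_name, {})
            factual = result.content[0].text  # assumes the first content item is text

            # Step 4: LLM2 turns the factual change into an engaging response.
            return narrate(user_query, tool_name, factual)


if __name__ == "__main__":
    print(asyncio.run(play_turn("open the door")))
```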

Both the client and server are written in Python. The server leverages FastAPI, which was transformed into an MCP server in just a few lines of code using the FastAPI-MCP open-source library. Communication between the client and server utilizes the Streamable HTTP transport.
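
On the server side, that transformation really is just a few lines, following FastAPI-MCP's documented pattern. The endpoint, toy game state, and return value below are invented for illustration, and depending on your fastapi-mcp version the Streamable HTTP transport may be mounted via a different method:

```python
from fastapi import FastAPI
from fastapi_mcp import FastApiMCP

app = FastAPI()
state = {"door_open": False}  # toy stand-in for the real room state


@app.post("/open_door", operation_id="open_door")
def open_door() -> dict:
    """Open the door in the current room."""
    state["door_open"] = True
    return {"description": "You discovered a set of bars behind the door"}


# Expose the existing FastAPI endpoints as MCP tools. Newer fastapi-mcp releases
# also provide a dedicated mount for the Streamable HTTP transport.
mcp = FastApiMCP(app)
mcp.mount()
```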

To Expose or Not to Expose: The LLM Context Dilemma

One of the most intriguing takeaways was the constant tension over what information to expose to the LLM. We want the LLM to be as context-aware as possible so its responses feel natural and smooth. So, we tried "dumping" everything into both LLM calls: available tools, full conversation history, LLM role context, room/state details, and even the path to success. We assumed that with a sufficiently stringent prompt discouraging hints or tool suggestions, the LLM would use this context appropriately to make the game more engaging and realistic, yet still challenging.

Unfortunately, that wasn't the case. We quickly learned that as long as the LLM possessed even basic information like available tools, let alone the solution path, it would inevitably lean towards offering a "helping hand," even when unrequested (e.g., "that didn't work, perhaps try X").

Since no prompt was strong enough, we had to carefully curate the LLM's input; limiting its knowledge and context was the only way to curb its helpfulness. At one point, we went too far in this direction, stripping the LLM's role down to tool selection alone (LLM1) and replacing LLM2 with predetermined outputs. However, with such rigid responses, the presence of the LLM vanished, and the game was frustrating to play.

Our current iteration strikes a balance: LLM2 still enhances responses after state changes, but its power is dramatically limited. It receives only the user's input, the selected tool, the factual state change, and a simple system prompt for engaging output. No game context, history, past actions, or tool/room state is provided.
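
In code, that restriction amounts to building LLM2's messages from only those four pieces. The prompt wording below is our own illustration, not the game's actual prompt; it's just meant to show how little context LLM2 sees:

```python
def build_llm2_messages(user_query: str, tool_name: str, factual_change: str) -> list[dict]:
    """Assemble LLM2's input from only this turn's facts -- no history, no room state."""
    system = (
        "You are the narrator of an escape room. Rewrite the factual result of the "
        "player's action as one engaging, atmospheric sentence. Do not offer hints, "
        "mention tools, or invent details beyond the facts given."
    )
    user = (
        f"Player input: {user_query}\n"
        f"Action taken: {tool_name}\n"
        f"What happened: {factual_change}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```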

When companies expose their software as an MCP and choose which endpoints to expose or hide, they must consider all the unintended ways LLMs might interpret and use that information. Deciding what tools, resources, prompts, and general context to provide an LLM is arguably the most significant challenge companies are currently facing. You want LLMs to have enough information to be helpful, but not so much that they cause unintended damage, become overwhelmed by irrelevant data, or leak sensitive information. This game serves as a microcosm of a much larger set of problems that companies and developers will need to solve in the coming years.

The Use of Custom Clients

Another fascinating challenge in building this project was the development of our own MCP client. Since the release of the MCP, most of the industry's focus has been on creating servers that perform cool functions. The assumption is that a pre-existing client (like Claude's desktop app or Cursor) will always provide a meaningful way for users to interact with these servers.

Often, this assumption is true. But in unique cases, like building a game, a custom client becomes important. Our client is built around the typical MCP client responsibilities: managing the LLM and executing the necessary MCP tool calls. However, by building our own client, we gained more freedom. We could write specific prompts for the LLM calls (in fact, the current iteration of the client has three different prompts for LLM2, depending on which tool was called). We could also add restrictions, like allowing just one tool call per user input. Building our own MCP client was essential to truly control the game flow and user experience.
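
As a sketch of what that freedom looks like in practice (the prompt texts here are our own illustrations, not the game's actual prompts):

```python
# Vary LLM2's system prompt by which tool was selected, and cap each player turn
# at a single tool call. All strings here are illustrative.
LLM2_PROMPTS = {
    "impossible_action": "Explain, in character, why that action cannot be done here.",
    "multiple_actions": "Ask the player, in character, to choose a single action.",
}
DEFAULT_PROMPT = "Describe the result of the player's action in one atmospheric sentence."

MAX_TOOL_CALLS_PER_TURN = 1  # the client simply refuses to chain tool calls in one turn


def system_prompt_for(tool_name: str) -> str:
    return LLM2_PROMPTS.get(tool_name, DEFAULT_PROMPT)
```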

Again, this relates to an interesting broader question. As companies begin to build MCPs that expose their software to LLMs, they will naturally grapple with questions like:

  • What information do we provide to the LLMs?
  • How will the LLMs use this information?
  • What guardrails do we need to implement?

Much of the answer lies on the server side, in choosing which endpoints to expose and building those endpoints in a robust and secure manner. But this project highlights the utility of the other end of the Model Context Protocol. For example, if you wish to limit the LLM to specific sequences of tool calls, or to add extra confirmation for sensitive actions, a custom client can be your friend. This does come with added complications, though, and for many use cases, Claude or Cursor will do the job just fine!
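
For instance, a custom client might gate every tool call through a check like the one below; the allowed sequences, "sensitive" tool names, and confirmation flow are all hypothetical examples, not prescriptions:

```python
SENSITIVE_TOOLS = {"delete_record", "send_invoice"}  # hypothetical examples
ALLOWED_NEXT = {
    "authenticate": {"fetch_report"},
    "fetch_report": {"summarize_report"},
}


def approve_call(previous_tool: str | None, tool_name: str) -> bool:
    """Client-side guardrail: enforce tool-call order and confirm sensitive actions."""
    # Only allow tool calls that follow a permitted sequence.
    if previous_tool is not None and tool_name not in ALLOWED_NEXT.get(previous_tool, set()):
        return False
    # Require explicit human confirmation before anything sensitive.
    if tool_name in SENSITIVE_TOOLS:
        return input(f"Allow '{tool_name}'? [y/N] ").strip().lower() == "y"
    return True
```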

Conclusion

In the process of using MCP servers and clients to create this game, we've tackled some very interesting problems related to the Model Context Protocol, and we've come away with valuable insights. Beyond engaging in a new genre of smooth and powerful LLM-powered games, we've also gained a deeper understanding of the roles of clients and the careful consideration required for information flow within the protocol. This is all just the beginning.

  • The MCP Game: check it out if you're interested in playing our game, exploring its code in more depth, or expanding on it with new levels.
  • FastAPI-MCP: The open-source repo we used to develop this game; if you're developing anything MCP-based yourself, it may prove helpful.