When Cursor forwards 40+ raw MCP tools, the LLM burns context window tokens just deciding which tool to call, so latency goes up and selection precision drops. Disabling tools by hand is tedious and error-prone.
There is a better way: the paper “RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation” (https://arxiv.org/abs/2505.03275) embeds every tool’s metadata into a vector index and retrieves only the K most relevant entries before prompting. The authors report prompt-token reductions of over 50% and roughly 3× higher tool-selection accuracy.
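For intuition, here is a minimal sketch of that pre-selection step in Go. The tool names, the `embed()` stub, and the choice of K are illustrative assumptions, not the paper’s or any real registry’s implementation; a real setup would call an actual embedding model and a vector index instead of the toy word hashing used here.

```go
// tool_rag.go — a minimal sketch of RAG-MCP-style tool pre-selection.
// Everything below (tool names, embed() stub, K) is illustrative.
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
	"strings"
)

type Tool struct {
	Name, Description string
	Vec               []float64 // embedding of Name + Description
}

// embed is a stand-in for a real embedding-model call; it hashes words
// into a fixed-size vector so the example runs offline.
func embed(text string) []float64 {
	v := make([]float64, 64)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(w))
		v[h.Sum32()%64]++
	}
	return v
}

func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k tools closest to the query — the only tool metadata
// the LLM prompt then needs to see, instead of the full registry.
func topK(tools []Tool, query string, k int) []Tool {
	q := embed(query)
	sort.Slice(tools, func(i, j int) bool {
		return cosine(tools[i].Vec, q) > cosine(tools[j].Vec, q)
	})
	if k > len(tools) {
		k = len(tools)
	}
	return tools[:k]
}

func main() {
	registry := []Tool{
		{Name: "github_create_issue", Description: "Create an issue in a GitHub repository"},
		{Name: "postgres_query", Description: "Run a read-only SQL query against Postgres"},
		{Name: "slack_post_message", Description: "Post a message to a Slack channel"},
	}
	for i := range registry {
		registry[i].Vec = embed(registry[i].Name + " " + registry[i].Description)
	}
	for _, t := range topK(registry, "create an issue for this bug", 2) {
		fmt.Println(t.Name)
	}
}
```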
Two Practical Paths
- Raise the cap – simplest to ship but keeps prompts fat and costly.
- Retrieval proxy – place an external proxy in front of your MCP servers that performs RAG-based pre-selection and exposes only a few high-level endpoints to the client (a hypothetical endpoint shape is sketched below). This flow is already implemented in the open-source mcpproxy-go (https://github.com/smart-mcp-proxy/mcpproxy-go/) if you need relief today.
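To make “high-level endpoints” concrete, here is a hypothetical sketch of the surface such a proxy could expose: one endpoint to search the tool registry and one to dispatch a selected call. The routes, payload shapes, and port are assumptions for illustration, not mcpproxy-go’s actual API.

```go
// proxy_sketch.go — a hypothetical proxy surface (NOT mcpproxy-go's real API):
// the client sees two generic endpoints while the proxy keeps the full
// tool registry and performs the retrieval internally.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type searchRequest struct {
	Query string `json:"query"`
	K     int    `json:"k"`
}

type callRequest struct {
	Tool string          `json:"tool"`
	Args json.RawMessage `json:"args"`
}

func main() {
	// POST /search_tools: embed the query and return only the top-K
	// tool descriptions (e.g. via topK from the earlier sketch).
	http.HandleFunc("/search_tools", func(w http.ResponseWriter, r *http.Request) {
		var req searchRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Placeholder result; a real proxy would run retrieval here.
		json.NewEncoder(w).Encode([]string{"github_create_issue"})
	})

	// POST /call_tool: forward one selected tool call to the upstream
	// MCP server that actually owns it.
	http.HandleFunc("/call_tool", func(w http.ResponseWriter, r *http.Request) {
		var req callRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Placeholder echo; a real proxy would dispatch req.Tool upstream.
		json.NewEncoder(w).Encode(map[string]string{"dispatched": req.Tool})
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The point of this shape is that the client (and the LLM prompt) only ever sees these two generic tools, no matter how many MCP servers and raw tools sit behind the proxy.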