# MCP Catalog API: Full Reference

> Search and discovery engine for 23,000+ MCP servers indexed from multiple registries (GitHub, Smithery, mcpservers.org, MCP Registry, npm, PyPI). Provides hybrid semantic search with BM25 + dense vector + RRF fusion + tool embedding boost, trust scoring, and vulnerability tracking via NVD.

## Overview

- **What it is:** A REST API catalog of MCP (Model Context Protocol) servers and AI agents.
- **What it indexes:** 23,000+ MCP servers collected from GitHub, Smithery, mcpservers.org, MCP Registry, npm, and PyPI via scheduled scrapers.
- **Database:** TimescaleDB (PostgreSQL 16) with pgvector extension for vector similarity search.
- **Auth:** None required. All endpoints are public, read-only GETs.
- **Response format:** JSON (application/json). All list endpoints return paginated responses.
- **Rate limits:** 15 req/min (search), 10 req/min (list), 30 req/min (detail), per IP.

---

## API Endpoints

Base URL: `/api/v1` (except `/health`, which is at root)

---

### GET /health

Health check with database connectivity probe.

**Response:**

```json
{"status": "ok", "db": "connected"}
```

---

### GET /api/v1/servers

List MCP servers with pagination, filtering, and sorting.

**Query Parameters:**

| Param | Type | Default | Constraints | Description |
|-------|------|---------|-------------|-------------|
| limit | int | 20 | 1-20 | Results per page |
| offset | int | 0 | 0-40 | Pagination offset |
| sort_by | string | popularity | popularity, stars, downloads, name, newest | Sort order |
| source_registry | string | null | | Filter by registry |
| status | string | null | active, archived, deprecated | Filter by status |
| transport_type | string | null | stdio, sse, streamable-http | Filter by transport |
| license | string | null | MIT, Apache-2.0, etc. | Filter by license |
| protocol | string | null | mcp, a2a, rest, openapi | Filter by protocol |

**Example:**

```bash
curl "https://knyazevai.work/api/v1/servers?limit=10&sort_by=stars"
```

---

### GET /api/v1/servers/search

Hybrid semantic search across all indexed MCP servers.

**Query Parameters:**

| Param | Type | Default | Constraints | Description |
|-------|------|---------|-------------|-------------|
| q | string | **required** | 1-200 chars | Search query in natural language |
| limit | int | 20 | 1-20 | Results per page |
| offset | int | 0 | 0-40 | Pagination offset |
| min_similarity | float | 0.01 | 0-1 | Minimum cosine similarity threshold |
| transport_type | string | null | | Filter by transport |
| protocol | string | null | | Filter by protocol |
| requires_api_key | bool | null | | Filter by API key requirement |
| has_docker | bool | null | | Filter by Docker availability |
| min_stars | int | null | | Minimum GitHub stars |

**Response fields (per server):**

- `cosine_similarity` (float, 0.0-1.0): cosine similarity between query embedding and server description embedding. **Use this to evaluate match quality.** 0.85+ = excellent, 0.70-0.85 = good, <0.70 = weak.
- `similarity_score` (float): combined RRF ranking score with boosts (internal, used for ordering)
- `trust_score` (int, 0-100): composite trust score
- `readme_summary` (string): AI-generated 2-3 sentence summary
- `transport_type` (list[string]): supported transports
- `protocol` (string): mcp, a2a, rest, openapi
- `pricing_model` (string): free, paid, freemium, per_call
- `requires_api_key` (bool): whether API key is needed
- `install_command` (string): ready-to-run install command

**Example:**

```bash
curl "https://knyazevai.work/api/v1/servers/search?q=database%20postgresql&limit=5"
```

**Example response:**

```json
{
  "items": [
    {
      "id": "582ac966-...",
      "title": "MCP-PostgreSQL-Ops",
      "description": "Professional MCP server for PostgreSQL database operations...",
      "cosine_similarity": 0.852,
      "similarity_score": 0.029,
      "trust_score": 68,
      "protocol": "mcp",
      "transport_type": ["stdio", "streamable-http"],
      "pricing_model": "free",
      "requires_api_key": false,
      "github_stars": 142,
      "popularity_score": 1346,
      "vulnerability_count": 0,
      "readme_summary": "MCP-PostgreSQL-Ops is a server for PostgreSQL database operations..."
    }
  ],
  "total": 34,
  "limit": 5,
  "offset": 0,
  "relevant_count": 12
}
```

---

### GET /api/v1/servers/{server_id}

Full detail for a single server including tools, packages, categories, metrics, and trust breakdown.

**Response:** `ServerDetailResponse`, which includes all fields from search plus:

- `tools`: list of MCP tools with names, descriptions, and input schemas
- `packages`: npm/PyPI package distributions
- `categories`: server categories/tags
- `latest_metrics`: recent usage metrics (stars, forks, downloads)
- `trust_breakdown`: per-dimension trust scores (source, popularity, completeness, freshness, security, user_signals)
- `config_example`: example config for claude_desktop_config.json
- `client_compatibility`: compatible clients (claude_desktop, cursor, vs_code_copilot, etc.)
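
As a rough illustration of consuming a detail response, the sketch below builds the detail URL and pulls a few values out of an already-decoded JSON payload. The helper names (`detail_url`, `weakest_trust_dimension`, `tool_names`) are hypothetical. It also assumes that `trust_breakdown` is a flat mapping of dimension name to numeric score and that each tool object carries a `name` key; the reference only states that `trust_breakdown` is a dict or null and that tools have names.

```python
from typing import Optional
from urllib.parse import quote

# Host taken from the curl examples in this reference.
BASE_URL = "https://knyazevai.work/api/v1"


def detail_url(server_id: str) -> str:
    """Build the detail endpoint URL for a server UUID (hypothetical helper)."""
    return f"{BASE_URL}/servers/{quote(server_id, safe='')}"


def weakest_trust_dimension(detail: dict) -> Optional[str]:
    """Return the lowest-scoring trust dimension, or None if unavailable.

    Assumption: trust_breakdown maps dimension names to numbers.
    Non-numeric values are ignored rather than treated as errors.
    """
    breakdown = detail.get("trust_breakdown") or {}
    numeric = {k: v for k, v in breakdown.items() if isinstance(v, (int, float))}
    if not numeric:
        return None
    return min(numeric, key=numeric.get)


def tool_names(detail: dict) -> list:
    """Collect tool names, skipping entries without a 'name' key (assumed key)."""
    return [t["name"] for t in detail.get("tools") or [] if "name" in t]


# Made-up payload for illustration only; not real catalog data.
sample = {
    "tools": [{"name": "run_query"}, {"name": "list_tables"}],
    "trust_breakdown": {"source": 80, "freshness": 35, "security": 60},
}
print(weakest_trust_dimension(sample), tool_names(sample))
```

Keeping the parsing separate from the HTTP call makes it easy to plug in any client library; the payload passed to these helpers is whatever the endpoint returns after JSON decoding.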

**Example:**

```bash
curl "https://knyazevai.work/api/v1/servers/582ac966-15df-403f-ae40-4c4c54dccb38"
```

---

### GET /api/v1/servers/{server_id}/vulnerabilities

Known vulnerabilities (CVE/GHSA) for a server, newest first.

**Example:**

```bash
curl "https://knyazevai.work/api/v1/servers/{id}/vulnerabilities"
```

---

### GET /api/v1/categories

List all server categories with counts.

---

## Search Algorithm

The `/servers/search` endpoint uses a multi-stage hybrid retrieval pipeline:

### Stage 1: Candidate Retrieval (Hybrid RRF)

Two parallel retrieval paths, each returning top-50 candidates:

1. **BM25 full-text search:** PostgreSQL `ts_rank_cd` over weighted tsvectors (`title` weight A, `description` weight B, `registry_name` weight C). GIN index.
2. **Dense vector search:** pgvector HNSW index with cosine distance over 1024-dim embeddings. Model: `intfloat/multilingual-e5-large`. `hnsw.ef_search=40`.

Fused via **Reciprocal Rank Fusion (RRF)** with k=60:

```
rrf_score = 1/(60 + semantic_rank) + 1/(60 + fulltext_rank)
```

### Stage 2: Tool Score Boost

For each candidate, the best-matching tool embedding adds 0.3x weight:

```
boosted = rrf_score + (best_tool_similarity * 0.3)
```

### Stage 3: Popularity & Trust Adjustments

```
+ ln(1 + popularity_score) * 0.002
+ (trust_score / 100) * 0.05
+ rating_avg * 0.05 * min(rating_count, 20) / 20
```

### Output

- `similarity_score` = final boosted RRF score (used for ranking)
- `cosine_similarity` = raw cosine similarity between query and description embeddings (0-1, use for evaluating match quality)

---

## Pydantic Schemas

### ServerResponse

```
id: UUID
registry_name: str
title: str | None
description: str | None
current_version: str | None
repository_url: str | None
homepage_url: str | None
license: str | None
transport_type: list[str] | None
status: str | None
source_registry: str
first_seen: datetime | None
last_updated: datetime | None
similarity_score: float | None    # RRF ranking score (search only)
cosine_similarity: float | None   # Cosine similarity 0-1 (search only)
vulnerability_count: int | None
data_age_seconds: int | None
provider: str | None
popularity_score: int | None
github_stars: int | None
npm_downloads_weekly: int | None
trust_score: int | None           # 0-100 composite trust
install_command: str | None
readme_summary: str | None
requires_api_key: bool
protocol: str                     # mcp, a2a, rest, openapi
pricing_model: str | None         # free, paid, freemium, per_call
rating_avg: float | None
rating_count: int | None
```

### ServerDetailResponse (extends ServerResponse)

```
tools: list[ToolResponse]
packages: list[PackageResponse]
categories: list[str]
latest_metrics: list[MetricResponse]
trust_breakdown: dict | None
config_example: dict | str | None
client_compatibility: list[str]
```

### PaginatedResponse[T]

```
items: list[T]
total: int
limit: int
offset: int
relevant_count: int | None
```

---

## MCP Server Interface

The catalog is also available as an MCP server for programmatic use.

**Remote SSE (for any MCP client):**

```
https://knyazevai.work/sse
```

**MCP tools available:**

| Tool | Description |
|------|-------------|
| search_servers | Semantic search across 23,000+ servers by natural language query |
| list_servers | Browse catalog with filters (registry, status, sort) |
| get_server | Full detail for a server (tools, packages, categories, metrics) |
| discover | Top recommendation(s) for a task; opinionated, agent-friendly |
| get_server_vulnerabilities | CVE/GHSA security data for a server |
| get_catalog_stats | Aggregate statistics (totals, registry breakdown) |

---

## Discovery Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/llms.txt` | Concise API documentation for LLMs |
| GET | `/llms-full.txt` | Complete API reference (this file) |
| GET | `/.well-known/agent.json` | A2A agent card |
| GET | `/docs` | Interactive Swagger UI |
| GET | `/openapi.json` | OpenAPI 3.1 specification |
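
---

## Appendix: Scoring Sketch

The three scoring stages from the Search Algorithm section can be combined into a single function. The following is a minimal sketch built only from the formulas given there, not the service's actual code. The function names are hypothetical, ranks are assumed to be 1-based, and a candidate missing from one retrieval path is assumed to contribute nothing for that path (the reference does not say how this case is handled).

```python
import math
from typing import Optional

RRF_K = 60  # fusion constant stated in Stage 1


def rrf_score(semantic_rank: Optional[int], fulltext_rank: Optional[int],
              k: int = RRF_K) -> float:
    """Reciprocal Rank Fusion over the two retrieval paths.

    Assumption: ranks are 1-based; None means the candidate was not
    returned by that path and adds 0 to the fused score.
    """
    score = 0.0
    for rank in (semantic_rank, fulltext_rank):
        if rank is not None:
            score += 1.0 / (k + rank)
    return score


def final_score(semantic_rank: Optional[int], fulltext_rank: Optional[int],
                best_tool_similarity: float = 0.0, popularity_score: int = 0,
                trust_score: int = 0, rating_avg: float = 0.0,
                rating_count: int = 0) -> float:
    """Apply the Stage 2 and Stage 3 adjustments on top of the RRF score."""
    score = rrf_score(semantic_rank, fulltext_rank)
    score += best_tool_similarity * 0.3                       # Stage 2: tool boost
    score += math.log1p(popularity_score) * 0.002             # ln(1 + popularity)
    score += (trust_score / 100) * 0.05                       # trust adjustment
    score += rating_avg * 0.05 * min(rating_count, 20) / 20   # capped rating weight
    return score


# Illustrative inputs only: ranked first by both paths, no boosts.
print(round(final_score(1, 1), 4))
```

Note how small the RRF term is compared with the tool boost: two first-place ranks give at most 2/61, while a tool similarity of 0.5 alone adds 0.15. This is why `similarity_score` is useful only for ordering and `cosine_similarity` is the better signal of match quality.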