2026-02-21 03:10:39 +00:00
# Task ID: 22
**Title:** Implement MCP Registry Client
**Status:** pending
**Dependencies:** None
**Priority:** high
**Description:** Build a multi-source registry client that queries the Official MCP Registry, Glama.ai, and Smithery.ai APIs to search, discover, and retrieve MCP server metadata with deduplication, ranking, and caching.
**Details:**
Create src/cli/src/registry/ directory with the following structure:
```
registry/
├── client.ts          # Main RegistryClient facade
├── sources/
│   ├── base.ts        # Abstract RegistrySource interface
│   ├── official.ts    # Official MCP Registry (registry.modelcontextprotocol.io)
│   ├── glama.ts       # Glama.ai registry
│   └── smithery.ts    # Smithery.ai registry
├── types.ts           # RegistryServer, SearchOptions, etc.
├── cache.ts           # TTL-based result caching
├── dedup.ts           # Deduplication logic
├── ranking.ts         # Result ranking algorithm
└── index.ts           # Barrel export
```
**Strategy Pattern Implementation:**
```typescript
// types.ts
// Shared unions so RegistryServer and SearchOptions cannot drift apart.
export type Transport = 'stdio' | 'sse' | 'websocket';
export type RegistryName = 'official' | 'glama' | 'smithery';

export interface EnvVar {
  name: string;
  description: string;
  isSecret: boolean;
  setupUrl?: string;
}

export interface RegistryServer {
  name: string;
  description: string;
  packages: {
    npm?: string;
    pypi?: string;
    docker?: string;
  };
  envTemplate: EnvVar[];
  transport: Transport;
  repositoryUrl?: string;
  popularityScore: number;
  verified: boolean;
  sourceRegistry: RegistryName;
  lastUpdated?: Date;
}

export interface SearchOptions {
  query: string;
  limit?: number;
  registries?: RegistryName[];
  verified?: boolean;
  transport?: Transport; // same union as RegistryServer.transport
  category?: string;
}

// base.ts
import type { RegistryServer } from '../types';

export abstract class RegistrySource {
  abstract readonly name: string;
  abstract search(query: string, limit: number): Promise<RegistryServer[]>;
  // Maps one raw API record onto the shared RegistryServer shape.
  protected abstract normalizeResult(raw: unknown): RegistryServer;
}
```
**Official MCP Registry Source (GET /v0/servers):**
- Base URL: https://registry.modelcontextprotocol.io/v0/servers
- Query params: ?search=<query>&limit=100&cursor=<cursor>
- No authentication required
- Pagination via cursor
- Response includes: name, description, npm package, env vars, transport
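
The cursor pagination described above can be sketched as follows. This is a minimal sketch, not the registry's documented client: the `ServerPage` shape (in particular the `nextCursor` field name) and the injected `fetchPage` function are assumptions made for illustration, and the real response should be validated before use.

```typescript
// Hypothetical page shape: the field names are assumptions, not the documented API.
interface ServerPage<T> {
  servers: T[];
  nextCursor?: string;
}

// The HTTP call is injected so the loop can be unit-tested without a network.
type PageFetcher<T> = (url: string) => Promise<ServerPage<T>>;

const OFFICIAL_BASE_URL = 'https://registry.modelcontextprotocol.io/v0/servers';

// Build the request URL from the query params listed above.
export function buildOfficialUrl(query: string, cursor?: string, pageSize = 100): string {
  const params = new URLSearchParams({ search: query, limit: String(pageSize) });
  if (cursor) params.set('cursor', cursor);
  return `${OFFICIAL_BASE_URL}?${params.toString()}`;
}

// Follow cursors until `limit` results are collected or the pages run out.
export async function fetchAllPages<T>(
  query: string,
  limit: number,
  fetchPage: PageFetcher<T>,
): Promise<T[]> {
  const results: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(buildOfficialUrl(query, cursor));
    results.push(...page.servers);
    cursor = page.nextCursor;
  } while (cursor && results.length < limit);
  return results.slice(0, limit);
}
```

Injecting `fetchPage` keeps the pagination logic independent of the HTTP client, which fits the mocked-HTTP unit tests called for in the test strategy.
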
**Glama.ai Source:**
- Base URL: https://glama.ai/api/mcp/v1/servers
- No authentication required
- Cursor-based pagination
- Response includes env var JSON schemas
**Smithery.ai Source:**
- Base URL: https://registry.smithery.ai/servers
- Query params: ?q=<query>
- Uses a free API key from config when one is present; the key is optional, and the source falls back gracefully without it
- Has verified badges, usage analytics
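
One way to handle the optional Smithery key is sketched below. The bearer-token `Authorization` header and the helper names (`buildSmitheryRequest`, `isAuthFailure`) are assumptions for illustration; confirm the actual auth scheme against Smithery's documentation.

```typescript
const SMITHERY_BASE_URL = 'https://registry.smithery.ai/servers';

export interface SmitheryRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the request; the Authorization header is added only when a key is configured.
export function buildSmitheryRequest(query: string, apiKey?: string): SmitheryRequest {
  const url = `${SMITHERY_BASE_URL}?${new URLSearchParams({ q: query }).toString()}`;
  const headers: Record<string, string> = { Accept: 'application/json' };
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`; // assumption: bearer-token auth
  }
  return { url, headers };
}

// Graceful fallback: a caller can treat 401/403 as "no results from this source"
// instead of failing the whole search.
export function isAuthFailure(status: number): boolean {
  return status === 401 || status === 403;
}
```

A source built this way would return an empty list when `isAuthFailure` is true, so the other registries still contribute results.
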
**Caching Implementation:**
```typescript
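// cache.ts
import { createHash } from 'crypto';
// Adjust the import path/extension to the project's module resolution.
import type { RegistryServer, SearchOptions } from './types';

interface CacheEntry {
  data: RegistryServer[];
  expires: number; // epoch milliseconds
}

export class RegistryCache {
  private cache = new Map<string, CacheEntry>();
  private hits = 0;
  private misses = 0;

  constructor(private readonly defaultTTL = 3600000) {} // 1 hour default

  // Sort top-level option keys so equivalent searches hash to the same key.
  private getKey(query: string, options: SearchOptions): string {
    const sorted = Object.fromEntries(
      Object.entries(options).sort(([a], [b]) => a.localeCompare(b)),
    );
    return createHash('sha256').update(JSON.stringify({ query, options: sorted })).digest('hex');
  }

  get(query: string, options: SearchOptions): RegistryServer[] | null {
    const key = this.getKey(query, options);
    const entry = this.cache.get(key);
    if (entry && entry.expires > Date.now()) {
      this.hits++;
      return entry.data;
    }
    this.cache.delete(key); // drop the expired entry, if any
    this.misses++;
    return null;
  }

  set(query: string, options: SearchOptions, data: RegistryServer[]): void {
    const key = this.getKey(query, options);
    this.cache.set(key, { data, expires: Date.now() + this.defaultTTL });
  }

  getHitRatio(): { hits: number; misses: number; ratio: number } {
    const total = this.hits + this.misses;
    return { hits: this.hits, misses: this.misses, ratio: total === 0 ? 0 : this.hits / total };
  }
}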
```
**Deduplication Logic:**
- Match by npm package name first (exact match)
- Fall back to GitHub repository URL comparison
- Keep the result with highest popularity score
- Merge envTemplate data from multiple sources
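
The matching rules above could be implemented along these lines. This is a sketch under assumptions: the trimmed `Server` type stands in for `RegistryServer`, `normalizeRepoUrl` is a hypothetical helper that only understands plain GitHub `owner/repo` URLs, and a record that matches two different existing entries is merged into the first one only.

```typescript
interface EnvVar { name: string; description: string; isSecret: boolean; }
interface Server {
  name: string;
  packages: { npm?: string };
  envTemplate: EnvVar[];
  repositoryUrl?: string;
  popularityScore: number;
}

// Reduce https://, git@ and .git variants of a GitHub URL to "owner/repo".
export function normalizeRepoUrl(url: string): string | undefined {
  const match = /github\.com[/:]([^/\s]+)\/([^/\s#?]+?)(?:\.git)?\/?$/i.exec(url.trim());
  return match ? `${match[1]}/${match[2]}`.toLowerCase() : undefined;
}

// Union of env vars by name; entries from `primary` win on conflict.
function mergeEnv(primary: EnvVar[], secondary: EnvVar[]): EnvVar[] {
  const byName = new Map<string, EnvVar>();
  for (const v of [...primary, ...secondary]) {
    if (!byName.has(v.name)) byName.set(v.name, v);
  }
  return Array.from(byName.values());
}

export function deduplicateResults(results: Server[]): Server[] {
  const byNpm = new Map<string, Server>();
  const byRepo = new Map<string, Server>();
  const unique: Server[] = [];

  for (const server of results) {
    const npm = server.packages.npm;
    const repo = server.repositoryUrl ? normalizeRepoUrl(server.repositoryUrl) : undefined;
    // npm name is checked first; the repository URL is the fallback.
    const existing = (npm ? byNpm.get(npm) : undefined) ?? (repo ? byRepo.get(repo) : undefined);

    let entry: Server;
    if (existing) {
      const serverWins = server.popularityScore > existing.popularityScore;
      const winner = serverWins ? server : existing;
      const loser = serverWins ? existing : server;
      const envTemplate = mergeEnv(winner.envTemplate, loser.envTemplate);
      const packages = { ...loser.packages, ...winner.packages };
      // Update the stored copy in place so both index maps keep pointing at it.
      entry = Object.assign(existing, winner, { envTemplate, packages });
    } else {
      // Copy so the caller's input objects are never mutated.
      entry = { ...server, envTemplate: [...server.envTemplate], packages: { ...server.packages } };
      unique.push(entry);
    }
    if (npm) byNpm.set(npm, entry);
    if (repo) byRepo.set(repo, entry);
  }
  return unique;
}
```

Indexing each merged entry under both its npm name and its repository key lets a later record match on either, which a single composite key would miss.
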
**Ranking Algorithm:**
1. Relevance score (text match quality) - weight: 40%
2. Popularity/usage count (Smithery analytics) - weight: 30%
3. Verified status - weight: 20%
4. Recency (last updated) - weight: 10%
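
The weighted score above can be sketched as a single function. Only the four weights come from this task; the relevance tiers (1.0 / 0.7 / 0.4), the one-year linear recency decay, the normalization of popularity against the most popular result, and the name `scoreServer` are illustrative assumptions. Plain substring matching stands in here for real fuzzy matching.

```typescript
interface Rankable {
  name: string;
  description: string;
  popularityScore: number;
  verified: boolean;
  lastUpdated?: Date;
}

const WEIGHTS = { relevance: 0.4, popularity: 0.3, verified: 0.2, recency: 0.1 };
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed tiers: exact name match, name contains query, description contains query.
function relevance(server: Rankable, query: string): number {
  const q = query.trim().toLowerCase();
  if (!q) return 0;
  const name = server.name.toLowerCase();
  if (name === q) return 1;
  if (name.includes(q)) return 0.7;
  return server.description.toLowerCase().includes(q) ? 0.4 : 0;
}

// Assumed decay: 1 when updated now, falling linearly to 0 after a year.
function recency(lastUpdated: Date | undefined, now: Date): number {
  if (!lastUpdated) return 0;
  const ageDays = (now.getTime() - lastUpdated.getTime()) / DAY_MS;
  return Math.min(1, Math.max(0, 1 - ageDays / 365));
}

export function scoreServer(
  server: Rankable,
  query: string,
  maxPopularity: number,
  now: Date = new Date(),
): number {
  const popularity = maxPopularity > 0 ? server.popularityScore / maxPopularity : 0;
  return (
    WEIGHTS.relevance * relevance(server, query) +
    WEIGHTS.popularity * popularity +
    WEIGHTS.verified * (server.verified ? 1 : 0) +
    WEIGHTS.recency * recency(server.lastUpdated, now)
  );
}

export function rankResults<T extends Rankable>(results: T[], query: string, now: Date = new Date()): T[] {
  const maxPopularity = Math.max(0, ...results.map((r) => r.popularityScore));
  // Sort a copy so the caller's array order is left untouched.
  return [...results].sort(
    (a, b) => scoreServer(b, query, maxPopularity, now) - scoreServer(a, query, maxPopularity, now),
  );
}
```

Because every component is scaled to the range 0 to 1 before weighting, the final score also stays between 0 and 1, which keeps the weights directly comparable.
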
**Rate Limiting & Retry:**
```typescript
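// Runs `fn` up to `maxRetries` times in total (the first call counts as an attempt),
// waiting with exponential backoff plus up to 1 s of random jitter between attempts.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of attempts: surface the last error to the caller.
      if (attempt === maxRetries - 1) throw error;
      const delay = baseDelay * Math.pow(2, attempt) + Math.random() * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  // Only reachable when maxRetries < 1, because the loop body never runs.
  throw new Error('withRetry: maxRetries must be at least 1');
}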
```
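
The helper above retries every error alike. For the rate-limiting half of this section, a source might retry only on HTTP 429 and 5xx responses and honor a `Retry-After` header when the server sends one. The helpers below are a hypothetical sketch of that policy, and none of the three registries is confirmed here to send `Retry-After`.

```typescript
// Retry only on rate limiting (429) and server errors (5xx).
export function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Retry-After carries either a number of seconds or an HTTP date.
export function parseRetryAfterMs(header: string | null, now: number = Date.now()): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}

// Wait at least as long as the server asked, and never less than the exponential backoff.
export function backoffDelayMs(attempt: number, baseDelay = 1000, retryAfterMs?: number): number {
  const exponential = baseDelay * 2 ** attempt;
  return Math.max(exponential, retryAfterMs ?? 0);
}
```

Jitter is left out of `backoffDelayMs` so the value is deterministic in tests; a caller can add it on top, as `withRetry` does.
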
**Security Requirements:**
- Validate all API responses with Zod schemas
- Sanitize descriptions to prevent terminal escape sequence injection
- Never log API keys (Smithery key)
- Support HTTP_PROXY/HTTPS_PROXY environment variables
- Support NODE_EXTRA_CA_CERTS for custom CA certificates
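
Two of these requirements, escape-sequence sanitization and keeping keys out of logs, can be sketched with plain string handling. The function names `sanitizeForTerminal` and `redactSecret` are hypothetical, and the patterns cover CSI and OSC sequences plus stray control characters rather than every terminal protocol. Zod validation is not shown.

```typescript
// CSI sequences: ESC [ parameters, intermediates, final byte (colors, cursor moves, clears).
const CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;
// OSC sequences: ESC ] ... terminated by BEL or ESC \ (window titles, hyperlinks).
const OSC = /\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;
// Remaining C0/C1 control characters and DEL; tab and newline are kept.
const CONTROL = /[\x00-\x08\x0b-\x1f\x7f-\x9f]/g;

// Strip terminal control sequences from untrusted registry text before printing it.
export function sanitizeForTerminal(text: string): string {
  return text.replace(OSC, '').replace(CSI, '').replace(CONTROL, '');
}

// Replace every occurrence of a secret before a message reaches logs or errors.
export function redactSecret(message: string, secret?: string): string {
  return secret ? message.split(secret).join('[REDACTED]') : message;
}
```

Removing carriage returns along with the escape sequences also blocks the simpler trick of overwriting an already printed line.
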
**SRE Metrics (expose via shared metrics module):**
- registry_query_latency_ms (histogram by source)
- registry_cache_hit_ratio (gauge)
- registry_error_count (counter by source, error_type)
**Test Strategy:**
TDD approach - write tests BEFORE implementation:
1. **Unit tests for each registry source:**
   - Mock HTTP responses for official, glama, smithery APIs
   - Test normalization of raw API responses to RegistryServer type
   - Test pagination handling (cursor-based)
   - Test error handling (network errors, invalid responses, rate limits)
2. **Cache tests:**
   - Test cache hit returns data without API call
   - Test cache miss triggers API call
   - Test TTL expiration correctly invalidates entries
   - Test cache key generation is deterministic
   - Test hit ratio metrics accuracy
3. **Deduplication tests:**
   - Test npm package name matching
   - Test GitHub URL matching with different formats (https vs git@)
   - Test keeping highest popularity score
   - Test envTemplate merging from multiple sources
4. **Ranking tests:**
   - Test relevance scoring for exact vs partial matches
   - Test popularity weight contribution
   - Test verified boost
   - Test overall ranking order
5. **Integration tests:**
   - Test full search flow with mocked HTTP
   - Test parallel queries to all registries
   - Test graceful degradation when one registry fails
6. **Security tests:**
   - Test Zod validation rejects malformed responses
   - Test terminal escape sequence sanitization
   - Test no API keys in error messages or logs
Run: `pnpm --filter @mcpctl/cli test:run -- --coverage registry/`
## Subtasks
### 22.1. Define Registry Types, Zod Schemas, and Base Abstract Source Interface
**Status:** pending
**Dependencies:** None
Create the foundational types, validation schemas, and abstract base class for all registry sources following TDD and strategy pattern principles.
**Details:**
Create src/cli/src/registry/ directory structure. Implement types.ts with RegistryServer, SearchOptions, EnvVar interfaces. Define Zod schemas for validating all API responses (OfficialRegistryResponseSchema, GlamaResponseSchema, SmitheryResponseSchema) to ensure security validation. Create base.ts with abstract RegistrySource class including name property, search() method, and normalizeResult() protected method. Include terminal escape sequence sanitization utility in types.ts. Write comprehensive Vitest tests BEFORE implementation: test type guards, Zod schema validation with valid/invalid inputs, sanitization of malicious strings with ANSI escape codes. Add category tags including data platform categories (bigquery, snowflake, dbt). Export everything via index.ts barrel file.
### 22.2. Implement Individual Registry Sources with HTTP Client and Proxy Support
**Status:** pending
**Dependencies:** 22.1
Implement the three concrete registry source classes (OfficialRegistrySource, GlamaRegistrySource, SmitheryRegistrySource) with proper HTTP handling, proxy support, and response normalization.
**Details:**
Create sources/official.ts for https://registry.modelcontextprotocol.io/v0/servers - implement cursor-based pagination, normalize responses to RegistryServer type. Create sources/glama.ts for https://glama.ai/api/mcp/v1/servers - handle JSON schema env vars, cursor pagination. Create sources/smithery.ts for https://registry.smithery.ai/servers - optional API key from config, graceful fallback if unauthorized, handle verified badges and analytics. Implement shared HTTP client utility supporting HTTP_PROXY/HTTPS_PROXY environment variables and NODE_EXTRA_CA_CERTS for custom CA certificates. Add exponential backoff retry logic with jitter (withRetry function). Never log API keys in error messages or debug output. Use structured logging with appropriate log levels. Write tests BEFORE implementation using mock HTTP responses.
### 22.3. Implement TTL-Based Caching with Metrics and Hit Ratio Tracking
**Status:** pending
**Dependencies:** 22.1
Build the RegistryCache class with TTL-based expiration, SHA-256 cache keys, hit/miss metrics, and integration with the SRE metrics module.
**Details:**
Create cache.ts with RegistryCache class. Use SHA-256 hash of query+options JSON for cache keys. Implement TTL-based expiration with configurable defaultTTL (default 1 hour). Track hits/misses with getHitRatio() method returning { hits, misses, ratio }. Integrate with shared metrics module to expose registry_cache_hit_ratio gauge. Implement cache.clear() for testing and manual invalidation. Add cache size limits with LRU eviction if needed. Ensure the cache behaves correctly under concurrent async callers; Node.js runs this code on a single thread, so the concern is overlapping in-flight requests for the same key rather than data races. Write comprehensive Vitest tests BEFORE implementation covering cache behavior.
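
If a size limit turns out to be needed, LRU eviction can be layered on the same `Map` by relying on its insertion order. The class below is a generic sketch, not the planned `RegistryCache`: the name `LruTtlCache`, the default of 500 entries, and the injectable `now` parameter (added to make expiry testable) are all assumptions.

```typescript
export class LruTtlCache<V> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private entries = new Map<string, { value: V; expires: number }>();

  constructor(private readonly maxEntries = 500, private readonly ttlMs = 3600000) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expires <= now) return undefined; // expired: stays deleted
    this.entries.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expires: now + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Deleting and re-inserting on every read costs two map operations per hit, which is usually acceptable for a CLI-sized cache and avoids a separate linked list.
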
### 22.4. Implement Deduplication Logic and Ranking Algorithm
**Status:** pending
**Dependencies:** 22.1
Create the deduplication module to merge results from multiple registries and the ranking algorithm to sort results by relevance, popularity, verification, and recency.
**Details:**
Create dedup.ts with deduplicateResults(results: RegistryServer[]): RegistryServer[] function. Match duplicates by npm package name (exact match) first, then fall back to GitHub repositoryUrl comparison. Keep the result with highest popularityScore when merging duplicates. Merge envTemplate arrays from multiple sources, deduplicating by env var name. Create ranking.ts with rankResults(results: RegistryServer[], query: string): RegistryServer[] function. Implement weighted scoring: text match relevance 40%, popularity/usage 30%, verified status 20%, recency 10%. Text relevance uses fuzzy matching on name and description. Write tests BEFORE implementation with sample datasets.
### 22.5. Build Main RegistryClient Facade with Parallel Queries and SRE Metrics
**Status:** pending
**Dependencies:** 22.1, 22.2, 22.3, 22.4
Create the main RegistryClient facade class that orchestrates parallel queries across all sources, applies caching, deduplication, ranking, and exposes SRE metrics for observability.
**Details:**
Create client.ts with RegistryClient class implementing the facade pattern. Constructor accepts optional config for enabling/disabling specific registries, cache TTL, and Smithery API key. Implement search(options: SearchOptions): Promise<RegistryServer[]> that queries all enabled registries in parallel using Promise.allSettled, applies caching, deduplication, and ranking. Expose SRE metrics via shared metrics module: registry_query_latency_ms histogram labeled by source, registry_error_count counter labeled by source and error_type. Use structured logging for all operations. Handle partial failures gracefully (return results from successful sources). Create index.ts barrel export for clean public API. Include comprehensive JSDoc documentation.
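
The parallel query step with partial-failure handling might look like the sketch below. `Source`, `SearchOutcome`, and `searchAllSources` are illustrative names; caching, deduplication, ranking, and metrics are left out so the `Promise.allSettled` handling stays visible.

```typescript
interface Source<T> {
  name: string;
  search(query: string, limit: number): Promise<T[]>;
}

interface SearchOutcome<T> {
  results: T[];
  failures: { source: string; reason: string }[];
}

// Query every source in parallel; a failing source is recorded, not rethrown.
export async function searchAllSources<T>(
  sources: Source<T>[],
  query: string,
  limit = 20,
): Promise<SearchOutcome<T>> {
  const settled = await Promise.allSettled(sources.map((s) => s.search(query, limit)));
  const outcome: SearchOutcome<T> = { results: [], failures: [] };
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      outcome.results.push(...result.value);
    } else {
      // allSettled preserves input order, so index i maps back to the source.
      outcome.failures.push({ source: sources[i]?.name ?? 'unknown', reason: String(result.reason) });
    }
  });
  return outcome;
}
```

Returning the failures alongside the results gives the facade what it needs to increment `registry_error_count` per source while still answering the search.
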