10,000 GPU nodes query your registry simultaneously. DNS caching means 1 origin request. Not 10,000.
Your ML platform serves 50 models across 3 regions. vLLM workers spin up, query the registry for endpoint metadata, and die. Ray clusters sync configs across 1,000 nodes. Kubernetes operators poll for version changes. Every cold start hits your registry. Every config sync hammers the same API. At scale, your registry becomes the bottleneck—rate limits, connection pools exhausted, slower than inference itself.
ResolveDB publishes model metadata as DNS TXT records. When a vLLM worker queries get.llama-70b.models.inference.v1.resolvedb.net, the response includes endpoint, tensor parallelism, and health status. That response is cached by every resolver in the path. The next 9,999 workers behind your corporate DNS get the answer from cache. Your registry sees 1 request per TTL.
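The record's TTL is what drives that fan-in. As a minimal client-side sketch (assuming dnspython; `resolve_txt_cached` is a hypothetical helper, not part of any ResolveDB SDK), a worker can honor the TTL locally so it re-queries at most once per TTL, mirroring what every resolver on the path already does:

```python
import time
import dns.resolver

_cache: dict[str, tuple[float, str]] = {}

def resolve_txt_cached(fqdn: str) -> str:
    """Resolve a TXT record, reusing the answer until its TTL expires."""
    now = time.monotonic()
    hit = _cache.get(fqdn)
    if hit and hit[0] > now:
        return hit[1]  # still fresh: no query leaves the process
    answer = dns.resolver.resolve(fqdn, 'TXT')
    text = b"".join(answer[0].strings).decode()
    _cache[fqdn] = (now + answer.rrset.ttl, text)  # honor the record's TTL
    return text
```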
Inference workers find which endpoint serves which model version. Thousands of workers cold-start during traffic spikes without hammering a central registry.
```
get.llama-70b.models.inference.v1.resolvedb.net → {"endpoint":"http://vllm:8000","tp":4,"maxBatch":32}
```
Ray/DeepSpeed jobs across 1,000+ GPUs need identical hyperparameters. Config drift causes training divergence. DNS-cached config means all nodes get the same response.

```
get.run-abc123.config.training.v1.resolvedb.net → {"lr":1e-4,"batchSize":2048,"fsdp":true}
```
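As a minimal sketch (assuming dnspython and the v=/s=/d= TXT envelope shown in the dig output below; `get_run_config` is a hypothetical helper), each rank resolves the same cached record at startup, so hyperparameters match across nodes:

```python
import json
import dns.resolver

def get_run_config(run_id: str) -> dict:
    """Resolve the run's hyperparameters from the cached TXT record."""
    fqdn = f"get.{run_id}.config.training.v1.resolvedb.net"
    answer = dns.resolver.resolve(fqdn, 'TXT')
    txt = b"".join(answer[0].strings).decode()
    fields = dict(p.split('=', 1) for p in txt.split(';') if '=' in p)
    return json.loads(fields['d'])

cfg = get_run_config("run-abc123")
# Same cached answer on every node, so lr / batchSize / fsdp cannot drift
lr, global_batch = cfg["lr"], cfg["batchSize"]
```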
Roll out new model versions without updating configs everywhere. Change routing in DNS, caches expire, all clients pick up new routing. No deployments.

```
get.codegen.routing.prod.v1.resolvedb.net → {"v2.3":90,"v2.4-canary":10}
```
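One way a client might act on those weights (the weighted-choice logic is illustrative, not prescribed ResolveDB behavior; it assumes the d= payload has already been parsed into a dict as in the examples below):

```python
import random

def pick_version(weights: dict[str, float]) -> str:
    """Weighted random choice over the versions published in the routing record."""
    versions = list(weights)
    return random.choices(versions, weights=[weights[v] for v in versions], k=1)[0]

# Parsed from get.codegen.routing.prod.v1.resolvedb.net
routing = {"v2.3": 90, "v2.4-canary": 10}
print(pick_version(routing))  # ~90% "v2.3", ~10% "v2.4-canary"
```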
Serve 100+ fine-tuned LoRA adapters on shared base models. Clients discover which adapters are available and their base model compatibility.

```
list.base-llama-7b.adapters.serving.v1.resolvedb.net → ["sql-expert","medical-qa","code-review"]
```
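A minimal sketch of adapter discovery (assuming dnspython, and assuming list. responses use the same v=/s=/d= TXT envelope as get. records; `list_adapters` is a hypothetical helper):

```python
import json
import dns.resolver

def list_adapters(base_model: str) -> list[str]:
    """Discover LoRA adapters published for a base model via the list. record."""
    fqdn = f"list.{base_model}.adapters.serving.v1.resolvedb.net"
    answer = dns.resolver.resolve(fqdn, 'TXT')
    txt = b"".join(answer[0].strings).decode()
    fields = dict(p.split('=', 1) for p in txt.split(';') if '=' in p)
    return json.loads(fields['d'])  # d= carries a JSON array for list. records

print(list_adapters("base-llama-7b"))  # ["sql-expert", "medical-qa", "code-review"]
```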
Schedulers need cluster topology—NVLink, InfiniBand, NUMA zones. Topology metadata cached globally, no central inventory API.

```
get.node-042.topology.cluster.v1.resolvedb.net → {"gpus":8,"nvlink":true,"ib":"400Gbps"}
```

```bash
# Get model endpoint for vLLM worker
dig TXT get.llama-70b.models.inference.v1.resolvedb.net +short
# "v=rdb1;s=ok;ttl=3600;d={\"endpoint\":\"http://vllm:8000\",\"tp\":4}"
# Get training run config
dig TXT get.run-abc123.config.training.v1.resolvedb.net +short
# Check model routing weights
dig TXT get.codegen.routing.prod.v1.resolvedb.net +short
```

```python
import dns.resolver
import json


def get_model_endpoint(model: str) -> dict:
    """Get model endpoint from the DNS registry."""
    fqdn = f"get.{model}.models.inference.v1.resolvedb.net"
    answers = dns.resolver.resolve(fqdn, 'TXT')
    # Join the TXT record's strings; avoids the escaped quotes of presentation format
    response = b"".join(answers[0].strings).decode()
    parts = dict(p.split('=', 1) for p in response.split(';') if '=' in p)
    return json.loads(parts['d'])


# vLLM worker startup
model = get_model_endpoint("llama-70b")
print(f"Endpoint: {model['endpoint']}, TP: {model['tp']}")
```

```typescript
import dns from 'dns/promises';

async function getModelEndpoint(model: string) {
  const fqdn = `get.${model}.models.inference.v1.resolvedb.net`;
  const records = await dns.resolveTxt(fqdn);
  const response = records.flat().join('');
  const parts: Record<string, string> = {};
  for (const seg of response.split(';')) {
    const [k, v] = seg.split('=', 2);
    if (k && v) parts[k] = v;
  }
  return JSON.parse(parts.d);
}

// Worker startup
const model = await getModelEndpoint('llama-70b');
console.log(`Endpoint: ${model.endpoint}`);
```

```go
import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

func getModelEndpoint(model string) (map[string]any, error) {
	fqdn := fmt.Sprintf("get.%s.models.inference.v1.resolvedb.net", model)
	records, err := net.LookupTXT(fqdn)
	if err != nil { return nil, err }
	// Join all TXT strings, then pull the JSON payload out of the d= field
	for _, seg := range strings.Split(strings.Join(records, ""), ";") {
		if strings.HasPrefix(seg, "d=") {
			var result map[string]any
			if err := json.Unmarshal([]byte(seg[2:]), &result); err != nil { return nil, err }
			return result, nil
		}
	}
	return nil, fmt.Errorf("no d= field in TXT record for %s", fqdn)
}
```

| Feature | ResolveDB | Central registry API |
|---|---|---|
| Cold start discovery | <5ms (cached) | 50-200ms (API) |
| 10K simultaneous workers | 1 origin request | 10K requests |
| Global distribution | DNS infra (free) | CDN required |
| Single point of failure | No (DNS hierarchy) | Yes (registry) |
| Works air-gapped | Yes (local DNS) | No |
Use the HTTP API: PUT /api/v1/namespaces/inference/resources/models/llama-70b with a JSON body containing the endpoint, tensor parallelism, and max batch size. The data is immediately available via DNS.
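A hedged sketch of that publish call (the base URL https://api.resolvedb.net and the bearer-token header are placeholders; only the path and the JSON fields come from the description above):

```python
import json
import urllib.request

# Placeholder host and token; substitute your actual API endpoint and credentials
url = "https://api.resolvedb.net/api/v1/namespaces/inference/resources/models/llama-70b"
body = json.dumps({"endpoint": "http://vllm:8000", "tp": 4, "maxBatch": 32}).encode()

req = urllib.request.Request(url, data=body, method="PUT")
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer <token>")  # assumed auth scheme

with urllib.request.urlopen(req) as resp:
    # On success the metadata resolves at get.llama-70b.models.inference.v1.resolvedb.net
    print(resp.status)
```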
Use authenticated queries with JWT tokens for private endpoints. The response can include a routing token instead of the actual endpoint, which clients exchange with your load balancer.
Update the routing record via the API and wait for the TTL to expire (or use a short TTL, e.g. 60s, during rollouts); all clients automatically pick up the new weights. No deployments, no client changes.
Yes. DNS queries work from any language. Parse the UQRP response and use the endpoint. The examples show the pattern—it's just string parsing.
Create an account and start storing data in under a minute.