10,000 GPU nodes query your registry simultaneously. DNS caching means 1 origin request. Not 10,000.
Your ML platform serves 50 models across 3 regions. vLLM workers spin up, query the registry for endpoint metadata, and die. Ray clusters sync configs across 1,000 nodes. Kubernetes operators poll for version changes. Every cold start hits your registry. Every config sync hammers the same API. At scale, your registry becomes the bottleneck—rate limits, connection pools exhausted, slower than inference itself.
ResolveDB publishes model metadata as DNS TXT records. When a vLLM worker queries get.llama-70b.models.inference.v1.resolvedb.net, the response includes endpoint, tensor parallelism, and health status. That response is cached by every resolver in the path. The next 9,999 workers behind your corporate DNS get the answer from cache. Your registry sees 1 request per TTL.
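The record's TTL is what drives that fan-in. As a minimal client-side sketch (assuming dnspython; `resolve_txt_cached` is a hypothetical helper, not part of any ResolveDB SDK), a worker can honor the TTL locally so it re-queries at most once per TTL, mirroring what every resolver on the path already does:

```python
import time
import dns.resolver

_cache: dict[str, tuple[float, str]] = {}

def resolve_txt_cached(fqdn: str) -> str:
    """Resolve a TXT record, reusing the answer until its TTL expires."""
    now = time.monotonic()
    hit = _cache.get(fqdn)
    if hit and hit[0] > now:
        return hit[1]  # still fresh: no query leaves the process
    answer = dns.resolver.resolve(fqdn, 'TXT')
    text = b"".join(answer[0].strings).decode()
    _cache[fqdn] = (now + answer.rrset.ttl, text)  # honor the record's TTL
    return text
```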
Inference workers find which endpoint serves which model version. Thousands of workers cold-start during traffic spikes without hammering a central registry.
```
get.llama-70b.models.inference.v1.resolvedb.net → {"endpoint":"http://vllm:8000","tp":4,"maxBatch":32}
```
Ray/DeepSpeed jobs across 1,000+ GPUs need identical hyperparameters. Config drift causes training divergence. DNS-cached config means all nodes get the same response.

```
get.run-abc123.config.training.v1.resolvedb.net → {"lr":1e-4,"batchSize":2048,"fsdp":true}
```
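As a minimal sketch (assuming dnspython and the v=/s=/d= TXT envelope shown in the dig output below; `get_run_config` is a hypothetical helper), each rank resolves the same cached record at startup, so hyperparameters match across nodes:

```python
import json
import dns.resolver

def get_run_config(run_id: str) -> dict:
    """Resolve the run's hyperparameters from the cached TXT record."""
    fqdn = f"get.{run_id}.config.training.v1.resolvedb.net"
    answer = dns.resolver.resolve(fqdn, 'TXT')
    txt = b"".join(answer[0].strings).decode()
    fields = dict(p.split('=', 1) for p in txt.split(';') if '=' in p)
    return json.loads(fields['d'])

cfg = get_run_config("run-abc123")
# Same cached answer on every node, so lr / batchSize / fsdp cannot drift
lr, global_batch = cfg["lr"], cfg["batchSize"]
```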
Roll out new model versions without updating configs everywhere. Change routing in DNS, caches expire, all clients pick up new routing. No deployments.

```
get.codegen.routing.prod.v1.resolvedb.net → {"v2.3":90,"v2.4-canary":10}
```
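One way a client might act on those weights (the weighted-choice logic is illustrative, not prescribed ResolveDB behavior; it assumes the d= payload has already been parsed into a dict as in the examples below):

```python
import random

def pick_version(weights: dict[str, float]) -> str:
    """Weighted random choice over the versions published in the routing record."""
    versions = list(weights)
    return random.choices(versions, weights=[weights[v] for v in versions], k=1)[0]

# Parsed from get.codegen.routing.prod.v1.resolvedb.net
routing = {"v2.3": 90, "v2.4-canary": 10}
print(pick_version(routing))  # ~90% "v2.3", ~10% "v2.4-canary"
```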
Serve 100+ fine-tuned LoRA adapters on shared base models. Clients discover which adapters are available and their base model compatibility.

```
list.base-llama-7b.adapters.serving.v1.resolvedb.net → ["sql-expert","medical-qa","code-review"]
```
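A minimal sketch of adapter discovery (assuming dnspython, and assuming list. responses use the same v=/s=/d= TXT envelope as get. records; `list_adapters` is a hypothetical helper):

```python
import json
import dns.resolver

def list_adapters(base_model: str) -> list[str]:
    """Discover LoRA adapters published for a base model via the list. record."""
    fqdn = f"list.{base_model}.adapters.serving.v1.resolvedb.net"
    answer = dns.resolver.resolve(fqdn, 'TXT')
    txt = b"".join(answer[0].strings).decode()
    fields = dict(p.split('=', 1) for p in txt.split(';') if '=' in p)
    return json.loads(fields['d'])  # d= carries a JSON array for list. records

print(list_adapters("base-llama-7b"))  # ["sql-expert", "medical-qa", "code-review"]
```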
Schedulers need cluster topology—NVLink, InfiniBand, NUMA zones. Topology metadata cached globally, no central inventory API.

```
get.node-042.topology.cluster.v1.resolvedb.net → {"gpus":8,"nvlink":true,"ib":"400Gbps"}
```

```bash
# Get model endpoint for vLLM worker
dig TXT get.llama-70b.models.inference.v1.resolvedb.net +short
# "v=rdb1;s=ok;ttl=3600;d={\"endpoint\":\"http://vllm:8000\",\"tp\":4}"
# Get training run config
dig TXT get.run-abc123.config.training.v1.resolvedb.net +short
# Check model routing weights
dig TXT get.codegen.routing.prod.v1.resolvedb.net +short
```

```python
import dns.resolver
import json


def get_model_endpoint(model: str) -> dict:
    """Get model endpoint from the DNS registry."""
    fqdn = f"get.{model}.models.inference.v1.resolvedb.net"
    answers = dns.resolver.resolve(fqdn, 'TXT')
    # Join the TXT record's strings; avoids the escaped quotes of presentation format
    response = b"".join(answers[0].strings).decode()
    parts = dict(p.split('=', 1) for p in response.split(';') if '=' in p)
    return json.loads(parts['d'])


# vLLM worker startup
model = get_model_endpoint("llama-70b")
print(f"Endpoint: {model['endpoint']}, TP: {model['tp']}")
```

```typescript
import dns from 'dns/promises';

async function getModelEndpoint(model: string) {
  const fqdn = `get.${model}.models.inference.v1.resolvedb.net`;
  const records = await dns.resolveTxt(fqdn);
  const response = records.flat().join('');
  const parts: Record<string, string> = {};
  for (const seg of response.split(';')) {
    const [k, v] = seg.split('=', 2);
    if (k && v) parts[k] = v;
  }
  return JSON.parse(parts.d);
}

// Worker startup
const model = await getModelEndpoint('llama-70b');
console.log(`Endpoint: ${model.endpoint}`);
```

```go
import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

func getModelEndpoint(model string) (map[string]any, error) {
	fqdn := fmt.Sprintf("get.%s.models.inference.v1.resolvedb.net", model)
	records, err := net.LookupTXT(fqdn)
	if err != nil { return nil, err }
	// Join all TXT strings, then pull the JSON payload out of the d= field
	for _, seg := range strings.Split(strings.Join(records, ""), ";") {
		if strings.HasPrefix(seg, "d=") {
			var result map[string]any
			if err := json.Unmarshal([]byte(seg[2:]), &result); err != nil { return nil, err }
			return result, nil
		}
	}
	return nil, fmt.Errorf("no d= field in TXT record for %s", fqdn)
}
```

| Feature | ResolveDB | Central registry API |
|---|---|---|
| Cold start discovery | <5ms (cached) | 50-200ms (API) |
| 10K simultaneous workers | 1 origin request | 10K requests |
| Global distribution | DNS infra (free) | CDN required |
| Single point of failure | No (DNS hierarchy) | Yes (registry) |
| Works air-gapped | Yes (local DNS) | No |
Use the HTTP API: PUT /api/v1/namespaces/inference/resources/models/llama-70b with a JSON body containing the endpoint, tensor parallelism, and max batch size. The data is immediately available via DNS.
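A hedged sketch of that publish call (the base URL https://api.resolvedb.net and the bearer-token header are placeholders; only the path and the JSON fields come from the description above):

```python
import json
import urllib.request

# Placeholder host and token; substitute your actual API endpoint and credentials
url = "https://api.resolvedb.net/api/v1/namespaces/inference/resources/models/llama-70b"
body = json.dumps({"endpoint": "http://vllm:8000", "tp": 4, "maxBatch": 32}).encode()

req = urllib.request.Request(url, data=body, method="PUT")
req.add_header("Content-Type", "application/json")
req.add_header("Authorization", "Bearer <token>")  # assumed auth scheme

with urllib.request.urlopen(req) as resp:
    # On success the metadata resolves at get.llama-70b.models.inference.v1.resolvedb.net
    print(resp.status)
```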
Use authenticated queries with JWT tokens for private endpoints. The response can include a routing token instead of the actual endpoint, which clients exchange with your load balancer.
Update the routing record via the API and wait for the TTL to expire (or use a short TTL, e.g. 60s, during rollouts); all clients automatically pick up the new weights. No deployments, no client changes.
Yes. DNS queries work from any language. Parse the UQRP response and use the endpoint. The examples show the pattern—it's just string parsing.
Create an account and start storing data in under a minute.