Caching gold prices and staying inside rate limits

Match cache TTLs to data freshness, read the X-RateLimit-* headers, back off gracefully on 429, and decide when polling beats streaming.

The free tier gives you 1000 calls per month and 30 requests per minute. That sounds like a lot until you add caching as an afterthought and realize your serverless function calls the API on every page load. If you haven't fetched a price yet, the quickstart guide is the right starting point; come back here when you're ready to handle rate limits properly.

A few hours of planning saves you from either hitting the limit mid-month or upgrading before you need to.

Read the rate-limit headers first

Every response from api.goldprice.dev includes these headers:

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1718784000

Limit is your per-minute ceiling. Remaining tells you how many calls are left in the current window. Reset is a Unix timestamp for when the window resets.

Read these before you cache anything. They report the server's own count for the current window, which is the ground truth. A counter you keep locally can drift from it when multiple processes or serverless instances make calls.
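A minimal sketch of reading those three headers into one object, assuming the header names shown above. `RateLimitState` and `from_headers` are hypothetical names for illustration. Any mapping with key lookup works, including `resp.headers` from `requests`, which is case-insensitive.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    """Snapshot of the server's per-minute quota, taken from one response."""
    limit: int      # X-RateLimit-Limit: per-minute ceiling
    remaining: int  # X-RateLimit-Remaining: calls left in the current window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp when the window resets

    @classmethod
    def from_headers(cls, headers: Mapping[str, str]) -> Optional["RateLimitState"]:
        """Return None if any header is missing or not an integer."""
        try:
            return cls(
                limit=int(headers["X-RateLimit-Limit"]),
                remaining=int(headers["X-RateLimit-Remaining"]),
                reset_at=int(headers["X-RateLimit-Reset"]),
            )
        except (KeyError, ValueError):
            return None

    def seconds_until_reset(self, now: float) -> float:
        """Seconds from `now` (a Unix timestamp) until the window resets, never negative."""
        return max(0.0, self.reset_at - now)
```

After any real fetch, `state = RateLimitState.from_headers(resp.headers)` gives you something to log. Returning `None` instead of raising keeps a response that lacks these headers from breaking the fetch path.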

How fresh does the price need to be?

Gold trades in a liquid market during weekday hours. The spot price moves continuously, but most use cases do not need sub-second freshness:

  • A price display on a content site: 60 seconds is fine.
  • A user checking before a purchase decision: 15–30 seconds is reasonable.
  • An automated system making time-sensitive decisions: use the stream endpoint (Pro tier), not polling.

Match your TTL to what users actually need. With a 60-second TTL, a page that receives 1000 visits per hour triggers at most one API call per minute (60 per hour), not 1000.
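The arithmetic behind that bound, as a small sketch: each cache miss refills the cache for `ttl` seconds, so misses are at least `ttl` seconds apart and at most ceil(3600 / ttl) fit in an hour. A miss also needs a visit, so the bound never exceeds the visit count. The function name is hypothetical, and the result is an upper bound for one cache, not a measurement.

```python
import math


def max_calls_per_hour(ttl_seconds: float, visits_per_hour: int) -> int:
    """Upper bound on API calls per hour for one cache with the given TTL."""
    if ttl_seconds <= 0:
        return visits_per_hour  # no caching: every visit is an API call
    # Misses are >= ttl_seconds apart, and each one needs a visit.
    return min(visits_per_hour, math.ceil(3600 / ttl_seconds))


print(max_calls_per_hour(60, 1000))  # 60: one call per minute
print(max_calls_per_hour(0, 1000))   # 1000: caching as an afterthought
```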

A simple in-process cache (Python)

For a single-process application, an in-memory cache with a TTL is enough:

import time
import os
import requests

API_KEY = os.environ["GOLDPRICE_API_KEY"]
BASE = "https://api.goldprice.dev"
TTL_SECONDS = 60

_cache: dict = {}

def get_spot_price(symbol: str = "XAU-USD-SPOT") -> dict:
    now = time.monotonic()
    cached = _cache.get(symbol)

    if cached and (now - cached["fetched_at"]) < TTL_SECONDS:
        return cached["data"]

    resp = requests.get(
        f"{BASE}/v1/spot/{symbol}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()

    # Log remaining quota on every real fetch
    remaining = resp.headers.get("X-RateLimit-Remaining")
    reset_ts = resp.headers.get("X-RateLimit-Reset")
    print(f"Quota remaining: {remaining}, resets at {reset_ts}")

    data = resp.json()
    _cache[symbol] = {"data": data, "fetched_at": now}
    return data

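This cache only helps within a single process. In multi-process or serverless environments each instance has its own _cache, so every instance fetches on its own schedule and the API calls multiply. In those cases, cache in Redis or another shared store.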

Shared cache with Redis

If you have multiple workers or serverless functions, use Redis as the shared cache layer:

import json
import redis
import requests
import os

API_KEY = os.environ["GOLDPRICE_API_KEY"]
redis_client = redis.Redis.from_url(os.environ["REDIS_URL"])
CACHE_KEY_PREFIX = "goldprice:"
TTL_SECONDS = 60

def get_spot_price(symbol: str = "XAU-USD-SPOT") -> dict:
    cache_key = CACHE_KEY_PREFIX + symbol
    cached = redis_client.get(cache_key)

    if cached:
        return json.loads(cached)

    resp = requests.get(
        f"https://api.goldprice.dev/v1/spot/{symbol}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    redis_client.setex(cache_key, TTL_SECONDS, json.dumps(data))
    return data

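setex sets the value and the TTL atomically. All workers share the same cache, so the fleet makes roughly one fetch per TTL window instead of one per instance. That is not a strict guarantee: if several workers miss at the same moment right after the key expires, each of them fetches.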
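If simultaneous misses at expiry (a cache stampede) matter for your quota, a short lock can limit the refill to one worker. A minimal sketch, assuming redis-py's `set(name, value, nx=True, ex=seconds)`, which only sets the key if it does not already exist. The Redis client and the fetch function are passed in so the logic can be tested without a server. `get_spot_price_guarded`, the `lock:` key prefix, and the timing constants are hypothetical.

```python
import json
import time
from typing import Any, Callable

LOCK_TTL_SECONDS = 10    # hypothetical: longer than one API call should take
WAIT_STEP_SECONDS = 0.1  # hypothetical: how often waiting workers re-check the cache
MAX_WAIT_SECONDS = 5.0   # hypothetical: stop waiting and fetch directly


def get_spot_price_guarded(
    client: Any,                   # a redis.Redis client, or anything with get/set/setex/delete
    fetch: Callable[[str], dict],  # performs the real API call
    symbol: str = "XAU-USD-SPOT",
    ttl_seconds: int = 60,
) -> dict:
    cache_key = "goldprice:" + symbol
    lock_key = "lock:" + cache_key
    deadline = time.monotonic() + MAX_WAIT_SECONDS

    while True:
        cached = client.get(cache_key)
        if cached:
            return json.loads(cached)

        # SET NX EX: only one worker gets the lock, and it expires on its own
        # if that worker crashes, so nobody waits forever.
        if client.set(lock_key, "1", nx=True, ex=LOCK_TTL_SECONDS):
            try:
                cached = client.get(cache_key)  # another worker may have just filled it
                if cached:
                    return json.loads(cached)
                data = fetch(symbol)
                client.setex(cache_key, ttl_seconds, json.dumps(data))
                return data
            finally:
                # Simplified release: if fetch outlives LOCK_TTL_SECONDS,
                # this could delete a lock another worker has since taken.
                client.delete(lock_key)

        if time.monotonic() >= deadline:
            # Lock holder is slow or gone; fetch without caching rather than fail.
            return fetch(symbol)
        time.sleep(WAIT_STEP_SECONDS)
```

Workers that lose the race poll the cache for a few seconds instead of calling the API, which trades a little latency at expiry for fewer duplicate fetches.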

Handling 429 correctly

When you exceed the rate limit, the API returns HTTP 429. Do not retry immediately; that burns more quota. Back off and wait until the window resets.

import os
import time

import requests

API_KEY = os.environ["GOLDPRICE_API_KEY"]
MAX_ATTEMPTS = 3

def get_spot_price_with_backoff(symbol: str = "XAU-USD-SPOT") -> dict:
    for attempt in range(MAX_ATTEMPTS):
        resp = requests.get(
            f"https://api.goldprice.dev/v1/spot/{symbol}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=10,
        )

        if resp.status_code == 429:
            if attempt == MAX_ATTEMPTS - 1:
                break  # last attempt: don't sleep only to give up afterwards
            # Reset is a Unix timestamp; wait until then, plus 1s of buffer.
            reset_ts = int(resp.headers.get("X-RateLimit-Reset", 0))
            wait = max(0, reset_ts - int(time.time())) + 1
            print(f"Rate limited. Waiting {wait}s before retry.")
            time.sleep(wait)
            continue

        resp.raise_for_status()
        return resp.json()

    raise RuntimeError("Rate limit exceeded after retries")

The wait time is the difference between the reset timestamp and now, plus one second of buffer. This is more precise than a fixed exponential backoff because the server tells you exactly when the window opens.
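When a 429 arrives without an X-RateLimit-Reset header, the loop above falls back to a one-second wait, and a malformed value would raise. A sketch of a delay function that prefers the server's reset time and falls back to capped exponential backoff when the header is missing or unparsable. `retry_delay` and its `base`/`cap` defaults are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(
    attempt: int,                # 0-based retry number
    headers: Mapping[str, str],  # response headers from the 429
    now: float,                  # current Unix time, e.g. time.time()
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before the next retry after a 429."""
    reset_raw: Optional[str] = headers.get("X-RateLimit-Reset")
    if reset_raw is not None:
        try:
            # The server said when the window opens: wait until then, +1s buffer.
            return max(0.0, int(reset_raw) - now) + 1.0
        except ValueError:
            pass  # malformed header: fall through to exponential backoff
    # No usable reset time: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * (2 ** attempt))
```

Inside the retry loop, `time.sleep(retry_delay(attempt, resp.headers, time.time()))` could replace the lines that compute and sleep on `wait`.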

Polling vs streaming

The stream endpoint (/v1/prices/stream) is a persistent SSE connection. The server pushes events; you receive them. This is efficient for dashboards and real-time displays because it eliminates polling overhead entirely.

Polling makes more sense for background jobs, scheduled tasks, and any case where the connection would be idle most of the time. For a cron job that runs every 15 minutes, opening an SSE connection only to close it right away is worse than a single GET request.

A rough decision tree:

  • Persistent UI that users watch: stream.
  • Background job or infrequent check: poll with cache.
  • Updates every 15 to 60 seconds, single process: poll with in-process cache.
  • Multi-process or serverless with shared traffic: poll with shared cache (Redis or equivalent).
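The same tree as a function, to make the precedence explicit: streaming only wins for a watched UI on Pro, and everything else polls through whichever cache matches your process model. The function and its parameters are hypothetical, and the branch order is one reasonable reading of the list, not a rule from the API.

```python
def choose_strategy(
    persistent_ui: bool,  # users keep a page open and watch it update
    multi_process: bool,  # several workers or serverless instances share traffic
    on_pro_tier: bool,    # the stream endpoint is a Pro tier feature
) -> str:
    if persistent_ui and on_pro_tier:
        return "stream"
    # Background jobs, infrequent checks, and UIs on Free all poll with a cache;
    # the process model decides which cache.
    if multi_process:
        return "poll with shared cache (Redis or equivalent)"
    return "poll with in-process cache"
```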

The stream is a Pro tier feature. If you are on Free, polling with a sensible TTL is the correct approach. See Stream live gold prices with Server-Sent Events for the SSE implementation once you're ready to upgrade.

Monthly quota math

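The free tier allows 1000 calls per month and 30 requests per minute, and the monthly number is the one that bites. Polling once per minute is far below the per-minute limit, yet it adds up to 1440 calls per day and exhausts the free quota in under 17 hours (1000 minutes) of continuous polling. With a 60-second TTL and caching, you make at most one call per minute, and only when the cache misses, so the real count depends on traffic. Budget against the month, though: 1000 calls averages out to about 33 per day, so even 50 cache misses a day adds up to roughly 1500 calls over a 30-day month, half again the free quota. To stay under the cap regardless of traffic, a single shared cache needs a TTL of at least about 43 minutes (2,592,000 seconds in a 30-day month divided by 1000 calls).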

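The TTL is the dial. Adjust it based on actual traffic, then log how many real (cache-miss) fetches you make per day to confirm the monthly total stays under budget. X-RateLimit-Remaining is still worth logging, but it only covers the current per-minute window.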
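The monthly budget check as a back-of-the-envelope sketch, assuming a 30-day month; the function names are hypothetical. Feed it the cache-miss count from your own logs rather than a guess.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60  # 2,592,000


def projected_monthly_calls(misses_per_day: float, days: int = 30) -> float:
    """Calls per month if the observed daily cache-miss rate holds."""
    return misses_per_day * days


def min_ttl_for_quota(monthly_quota: int, seconds_per_month: int = SECONDS_PER_30_DAY_MONTH) -> float:
    """Smallest TTL (seconds) at which one cache cannot exceed the quota.

    With at most one miss per TTL window, a month holds at most
    seconds_per_month / ttl misses; setting that equal to the quota gives this.
    """
    return seconds_per_month / monthly_quota


print(projected_monthly_calls(50))   # 1500, over a 1000-call quota
print(min_ttl_for_quota(1000) / 60)  # minimum TTL in minutes, about 43.2
```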

See plan limits and pricing to pick the right tier for your polling cadence.
