# Caching gold prices and staying inside rate limits

The free tier gives you 1000 calls per month and 30 requests per minute. That sounds like a lot until you add caching as an afterthought and realize your serverless function calls the API on every page load. If you haven't fetched a price yet, the [quickstart guide](/docs/quickstart) is the right starting point; come back here when you're ready to handle rate limits properly.

A few hours of planning saves you from either hitting the limit mid-month or upgrading before you need to.

## Read the rate-limit headers first

Every response from `api.goldprice.dev` includes these headers:

```
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1718784000
```

`Limit` is your per-minute ceiling. `Remaining` tells you how many calls are left in the current window. `Reset` is a Unix timestamp for when the window resets.

Read these before you cache anything. They tell you the actual state of your quota, which is ground truth. Your local counter and the server's counter can diverge if you have multiple processes or serverless instances.
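As a minimal sketch of reading those headers, the helper below parses them into a small structure and computes how long remains until the window resets. The names `RateLimitState` and `parse_rate_limit` are illustrative and not part of any client library, and the sketch assumes all three headers carry integer values, as in the example above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int      # per-minute ceiling
    remaining: int  # calls left in the current window
    reset_at: int   # Unix timestamp when the window resets

    def seconds_until_reset(self, now: Optional[float] = None) -> int:
        """Whole seconds until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0, self.reset_at - int(current))


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Return the quota state, or None if a header is missing or malformed."""
    try:
        return RateLimitState(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

With `requests`, you can pass `resp.headers` directly, since it is a case-insensitive mapping. A plain `dict` needs the header names spelled exactly as shown.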

## How fresh does the price need to be?

Gold trades in a liquid market during weekday hours. The spot price moves continuously, but most use cases do not need sub-second freshness:

- A price display on a content site: 60 seconds is fine.
- A user checking before a purchase decision: 15–30 seconds is reasonable.
- An automated system making time-sensitive decisions: use the stream endpoint (Pro tier), not polling.

Match your TTL to what users actually need. A 60-second TTL on a page that receives 1000 visits per hour means at most 60 API calls per hour, not 1000.
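The arithmetic behind that claim can be sketched as a small helper. It assumes a single shared cache where TTL expiry is the only thing that triggers a refetch, and it gives an approximate upper bound rather than an exact count. The function name is made up for illustration.

```python
import math


def max_upstream_calls_per_hour(visits_per_hour: int, ttl_seconds: int) -> int:
    """Approximate upper bound on API calls per hour behind one TTL cache."""
    refreshes_per_hour = math.ceil(3600 / ttl_seconds)
    # A cache can never cause more fetches than there are visits.
    return min(visits_per_hour, refreshes_per_hour)


print(max_upstream_calls_per_hour(1000, 60))  # 60
print(max_upstream_calls_per_hour(1000, 15))  # 240
print(max_upstream_calls_per_hour(10, 60))    # 10
```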

## A simple in-process cache (Python)

For a single-process application, an in-memory cache with a TTL is enough:

```python
import time
import os
import requests

API_KEY = os.environ["GOLDPRICE_API_KEY"]
BASE = "https://api.goldprice.dev"
TTL_SECONDS = 60

_cache: dict = {}

def get_spot_price(symbol: str = "XAU-USD-SPOT") -> dict:
    now = time.monotonic()
    cached = _cache.get(symbol)

    if cached and (now - cached["fetched_at"]) < TTL_SECONDS:
        return cached["data"]

    resp = requests.get(
        f"{BASE}/v1/spot/{symbol}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()

    # Log remaining quota on every real fetch
    remaining = resp.headers.get("X-RateLimit-Remaining")
    reset_ts = resp.headers.get("X-RateLimit-Reset")
    print(f"Quota remaining: {remaining}, resets at {reset_ts}")

    data = resp.json()
    _cache[symbol] = {"data": data, "fetched_at": now}
    return data
```

This is not safe for multi-process or serverless environments: each instance has its own `_cache`. In those cases, cache in Redis or another shared store.

## Shared cache with Redis

If you have multiple workers or serverless functions, use Redis as the shared cache layer:

```python
import json
import redis
import requests
import os

API_KEY = os.environ["GOLDPRICE_API_KEY"]
redis_client = redis.Redis.from_url(os.environ["REDIS_URL"])
CACHE_KEY_PREFIX = "goldprice:"
TTL_SECONDS = 60

def get_spot_price(symbol: str = "XAU-USD-SPOT") -> dict:
    cache_key = CACHE_KEY_PREFIX + symbol
    cached = redis_client.get(cache_key)

    if cached:
        return json.loads(cached)

    resp = requests.get(
        f"https://api.goldprice.dev/v1/spot/{symbol}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    redis_client.setex(cache_key, TTL_SECONDS, json.dumps(data))
    return data
```

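`setex` sets the value and the TTL atomically. All workers share the same cache, so in the common case only one fetch happens per TTL window regardless of how many instances are running. Workers that miss at the same instant can still each call the API, so treat this as approximately one fetch per window, not a guarantee.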
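To narrow the window in which several workers refetch at once after the key expires, a common mitigation is a short-lived lock so that only one worker calls the API. The following is a sketch under assumptions, not a production-ready lock: `get_with_lock`, its parameters, and the `:lock` key suffix are illustrative, and `client` is assumed to be a redis-py client (the sketch uses `get`, `set` with `nx`/`ex`, `setex`, and `delete`).

```python
import json
import time
from typing import Callable


def get_with_lock(
    client,
    cache_key: str,
    fetch: Callable[[], dict],
    ttl_seconds: int = 60,
    lock_seconds: int = 15,
    wait_seconds: float = 0.2,
    max_waits: int = 25,
) -> dict:
    cached = client.get(cache_key)
    if cached:
        return json.loads(cached)

    lock_key = cache_key + ":lock"
    # SET with NX and EX: only one caller creates the key; it expires on its
    # own if the holder crashes before releasing it.
    if client.set(lock_key, "1", nx=True, ex=lock_seconds):
        try:
            data = fetch()
            client.setex(cache_key, ttl_seconds, json.dumps(data))
            return data
        finally:
            # Simplification: this could delete a lock that already expired
            # and was re-acquired by another worker.
            client.delete(lock_key)

    # Another worker holds the lock: wait for it to fill the cache.
    for _ in range(max_waits):
        time.sleep(wait_seconds)
        cached = client.get(cache_key)
        if cached:
            return json.loads(cached)

    # The lock holder never filled the cache, so fetch directly.
    return fetch()
```

Here `fetch` would be a function that performs the HTTP request and returns the parsed JSON, such as the request portion of `get_spot_price` above.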

## Handling 429 correctly

When you exceed the rate limit, the API returns HTTP 429. Do not retry immediately, because that burns more quota. Back off and wait until the window resets.

```python
import os
import time

import requests

API_KEY = os.environ["GOLDPRICE_API_KEY"]
MAX_ATTEMPTS = 3

def get_spot_price_with_backoff(symbol: str = "XAU-USD-SPOT") -> dict:
    for attempt in range(MAX_ATTEMPTS):
        resp = requests.get(
            f"https://api.goldprice.dev/v1/spot/{symbol}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=10,
        )

        if resp.status_code == 429:
            if attempt == MAX_ATTEMPTS - 1:
                break  # no retries left, so don't sleep for nothing
            # Wait until the server-reported reset time, plus a one-second buffer.
            reset_ts = int(resp.headers.get("X-RateLimit-Reset", 0))
            wait = max(0, reset_ts - int(time.time())) + 1
            print(f"Rate limited. Waiting {wait}s before retry.")
            time.sleep(wait)
            continue

        resp.raise_for_status()
        return resp.json()

    raise RuntimeError("Rate limit exceeded after retries")
```

The wait time is the difference between the reset timestamp and now, plus one second of buffer. This is more precise than a fixed delay or an exponential backoff schedule because the server tells you exactly when the window opens.

## Polling vs streaming

The stream endpoint (`/v1/prices/stream`) is a persistent SSE connection. The server pushes events; you receive them. This is efficient for dashboards and real-time displays because it eliminates polling overhead entirely.

Polling makes more sense for background jobs, scheduled tasks, and any case where the connection would be idle most of the time. Opening an SSE connection from a cron job that runs every 15 minutes and immediately closes is worse than a single GET request.

A rough decision tree:

- Persistent UI that users watch: stream.
- Background job or infrequent check: poll with cache.
- Between 15 seconds and 60 seconds per update, single process: poll with in-process cache.
- Multi-process or serverless with shared traffic: poll with shared cache (Redis or equivalent).
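The same tree can be written down as a small function, which is handy if you want the choice recorded in configuration or tests. This is only a sketch: the function name, parameters, and return strings are illustrative. The stream branch applies only on the Pro tier, so on Free the function falls through to polling.

```python
def choose_strategy(persistent_ui: bool, shared_traffic: bool, pro_tier: bool) -> str:
    """Map the rough decision tree onto a single recommendation."""
    if persistent_ui and pro_tier:
        return "stream"
    if shared_traffic:
        # Multiple workers or serverless instances need a shared store.
        return "poll with shared cache"
    return "poll with in-process cache"


print(choose_strategy(persistent_ui=True, shared_traffic=False, pro_tier=True))
print(choose_strategy(persistent_ui=True, shared_traffic=True, pro_tier=False))
print(choose_strategy(persistent_ui=False, shared_traffic=False, pro_tier=False))
```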

The stream is a Pro tier feature. If you are on Free, polling with a sensible TTL is the correct approach. See [Stream live gold prices with Server-Sent Events](/blog/stream-gold-prices-sse) for the SSE implementation once you're ready to upgrade.

## Monthly quota math

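The free tier allows 1000 calls per month, which averages about 33 calls per day. If you poll once per minute continuously, that is 1440 calls per day, which exhausts the free quota in under 17 hours. With a 60-second TTL, you make at most one call per minute, and only when a request arrives after the cache has expired. On a low-traffic site that might be 50–200 calls per day, which still adds up to 1,500–6,000 calls per month and overshoots the free budget. To stay under 1000 calls with steady traffic, the TTL needs to be around 45 minutes, or traffic needs to be sparse enough that most TTL windows pass without a visit.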

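The key is that the TTL is the dial. Adjust it based on actual traffic, then check the `X-RateLimit-Remaining` header in logs to verify you have headroom in the per-minute window. That header does not cover the monthly total, so count the real fetches in your logs as well.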
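As a rough worked example of the quota math, the helpers below compute the smallest TTL that fits a monthly budget and how quickly a fixed polling interval uses it up. They assume a 30-day month and traffic steady enough that every TTL expiry triggers a refetch. The function names are illustrative.

```python
import math


def min_ttl_seconds(monthly_quota: int, days_in_month: int = 30) -> int:
    """Smallest TTL that keeps a continuously hit cache within the quota."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    return math.ceil(seconds_in_month / monthly_quota)


def hours_until_exhausted(monthly_quota: int, poll_interval_seconds: int) -> float:
    """Hours of continuous polling before the monthly quota runs out."""
    return monthly_quota * poll_interval_seconds / 3600


print(min_ttl_seconds(1000))                      # 2592 (about 43 minutes)
print(round(hours_until_exhausted(1000, 60), 1))  # 16.7
```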

See [plan limits and pricing](/pricing) to pick the right tier for your polling cadence.
