
Cache and the edge layer: CDN, edge cache, ETag, SWR, edge functions (Vercel), and their limits

ChatGPT Apps · Level 16, Lesson 4

1. Why even think about cache and edge in a ChatGPT App

In a classic web app you also care about speed, but at least the user can see a spinner. In a ChatGPT App the situation is more interesting. The user talks to the model, and the model sometimes decides to call your App. The widget should pop up and show something useful fairly quickly.

Practice is pretty unambiguous: latency = money. The longer you take to respond, the higher the chance the user leaves, and extra LLM/backend calls are direct costs for models and infrastructure. Caching reduces both.

Plus, the specifics of ChatGPT Apps:

  • Requests from ChatGPT to your App travel through the network and various layers. Every millisecond at each hop adds up.
  • MCP/HTTP endpoints have real timeouts (including Vercel serverless functions and edge functions). If you don’t make it in time, ChatGPT sees an error and might even start to hallucinate an answer.
  • Many data points in GiftGenius don’t change every second: the structure of the gift catalog, “top ideas” collections for different segments, feature flags. It’s silly to hit the database or an external API every time.

And that’s exactly where these come in:

  • CDN and edge cache to serve static assets and cacheable JSON quickly.
  • HTTP cache with Cache-Control/ETag/SWR so repeat requests are faster and cheaper.
  • Vercel edge functions to run light logic as close as possible to ChatGPT and the user, without turning them into a “mini‑backend.”

2. Latency anatomy in GiftGenius and caching points

It’s useful to start by honestly sketching where the latency actually comes from.

sequenceDiagram
    participant User as User
    participant ChatGPT as ChatGPT
    participant App as ChatGPT App (Apps SDK)
    participant GW as MCP Gateway / Edge
    participant GiftAPI as Gift REST API / gift microservice
    participant DB as Catalog/DB

    User->>ChatGPT: "Find a gift for my brother"
    ChatGPT->>App: Tool call + render widget
    App->>GW: HTTP / MCP request (categories, collections)
    GW->>GiftAPI: HTTP (REST)
    GiftAPI->>DB: Catalog/recommendations query
    DB-->>GiftAPI: Response
    GiftAPI-->>GW: Response (JSON)
    GW-->>App: Response (JSON)
    App-->>ChatGPT: Widget with results
    ChatGPT-->>User: Message + UI

Where can we “cut corners” here?

  1. Between ChatGPT and your perimeter — a CDN/edge cache (Vercel CDN/Edge Network) that can serve immutable widget assets and cacheable JSON without hitting your origin server.
  2. Between the Gateway and internal REST/HTTP services (Gift REST API, Commerce REST API, etc.) and the database — an application cache (Redis/in‑memory/DB‑backed) to avoid repeating the same requests (e.g., “list of gift categories”) ten times.

In this lecture we focus on the HTTP/edge layer specifically, because it’s closer to ChatGPT and Vercel.

3. Types of cache in our architecture

Since our architecture is a “layer cake,” there are multiple caches.

| Cache type | Where it lives | Best for |
| --- | --- | --- |
| Browser cache | Inside the ChatGPT client (browser/desktop) | Widget static assets, icons, fonts (limited control) |
| CDN / edge cache | On Vercel/Cloudflare edge nodes | Static assets + shared JSON (categories, configs, common collections) |
| Application cache | Inside your MCP Gateway or backend services (Redis, in‑memory) | Results of heavy DB/external API requests |
| DB cache/materialization | In the DB itself (materialized views, etc.) | Precomputed aggregates, analytics |

We’ll now concentrate on the first two: HTTP cache + CDN/edge.

4. HTTP cache: Cache-Control, max-age and s-maxage

HTTP caching is controlled primarily by the Cache-Control header. It determines whether the browser/ChatGPT client and/or the CDN may cache your response, and for how long.

Key pieces:

  • max-age — how many seconds the browser may cache the response.
  • s-maxage — how many seconds a shared cache (CDN/proxy) may cache the response.
  • public — the response may be cached in a shared cache.
  • private — the response is only for a specific client; the CDN must not cache it.

In GiftGenius, for example:

  • Widget JS/CSS/fonts are versioned files (with a hash in the filename). You can safely serve them with Cache-Control: public, max-age=31536000, immutable.
  • JSON with the list of gift categories — the same for all users — is a good fit for public, s-maxage=60 (or more).

A simple Next.js Route Handler for GET /api/gifts/categories that is cached on the CDN for 60 seconds:

// app/api/gifts/categories/route.ts
import { NextResponse } from "next/server";

export const runtime = "nodejs"; // regular serverless function

export async function GET() {
  // could query a DB/external API here
  const categories = [
    { id: "for_brother", title: "Gifts for a brother" },
    { id: "for_mom", title: "Gifts for mom" },
  ];

  return NextResponse.json(categories, {
    headers: {
      // allow the CDN to cache for 60 seconds
      "Cache-Control": "public, s-maxage=60",
    },
  });
}

Vercel’s CDN will store the response for 60 seconds, and all ChatGPT requests for this JSON within that window won’t reach your function at all. It’s instantaneous and cheap.
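The split of responsibilities between max-age (the browser) and s-maxage (the CDN) is easy to get wrong, so it can help to keep the decision in one small helper. This is a hypothetical sketch — cacheControlFor and its resource kinds are not part of any SDK:

```typescript
// Hypothetical helper: pick a Cache-Control value per kind of resource.
type ResourceKind = "immutable-asset" | "shared-json" | "personal";

function cacheControlFor(kind: ResourceKind): string {
  switch (kind) {
    case "immutable-asset":
      // hashed filenames never change, so browsers and CDNs may keep them for a year
      return "public, max-age=31536000, immutable";
    case "shared-json":
      // identical for all users: let the shared cache (CDN) hold it for 60 s
      return "public, s-maxage=60";
    case "personal":
      // per-user data must never land in a shared cache
      return "private, no-store";
  }
}
```

You would then pass the returned string as the Cache-Control header in a route handler, as in the example above.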

5. ETag: content fingerprint and 304 Not Modified

ETag is a “fingerprint” of a resource, usually a hash of its content, used for conditional requests. The flow:

  1. The server returns a response with ETag: "v1-abc123".
  2. Next time the client sends If-None-Match: "v1-abc123".
  3. If the server decides the content hasn’t changed, it responds with 304 Not Modified and no body.

Important: ETag saves bandwidth, but it doesn’t necessarily reduce latency, because you still need a round trip to the server. In the context of ChatGPT Apps it helps for heavy JSON responses, but don’t expect miraculous speedups from ETag alone — use SWR and edge caching for that.

A simple ETag example in a Next.js handler (no crypto hashes to keep it simple):

// app/api/gifts/config/route.ts
import { NextRequest, NextResponse } from "next/server";

const CONFIG = { version: 1, showExperimentalIdeas: true };
const ETAG = `"v${CONFIG.version}"`;

export async function GET(req: NextRequest) {
  const ifNoneMatch = req.headers.get("if-none-match");
  if (ifNoneMatch === ETAG) {
    // Content has not changed — return 304
    return new NextResponse(null, { status: 304, headers: { ETag: ETAG } });
  }

  return NextResponse.json(CONFIG, {
    headers: {
      ETag: ETAG,
      "Cache-Control": "public, s-maxage=300",
    },
  });
}

In real life, you’ll compute the ETag from a data hash or use a record version from the DB.
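As a sketch of that “real life” variant — assuming Node.js’s built‑in crypto module is available in your runtime — an ETag can be derived from a hash of the serialized payload:

```typescript
import { createHash } from "node:crypto";

// Derive a strong ETag from the serialized payload.
function etagFor(payload: unknown): string {
  const hash = createHash("sha256")
    .update(JSON.stringify(payload))
    .digest("hex")
    .slice(0, 16); // a 16-hex-char prefix keeps the header short
  return `"${hash}"`; // ETag values are quoted strings
}
```

The same payload always produces the same ETag, so the If-None-Match comparison in the handler above stays valid across deployments.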

6. Stale‑While‑Revalidate (SWR): fast and fresh enough

SWR is the “show stale now, refresh in the background” approach. You can implement it:

  • At the HTTP header level using Cache-Control with stale-while-revalidate.
  • At the UI level using libraries like swr/react-query, which keep a local cache and do background refetches.

SWR in the HTTP header

Typical header:

Cache-Control: public, s-maxage=60, stale-while-revalidate=300

Meaning:

  • For the first 60 seconds the CDN serves the cached response as fresh.
  • From second 60 to second 360 the CDN may return the stale copy instantly while triggering a background request to the origin for a fresh version.
  • After 360 seconds the copy is too stale: the next request blocks on the origin.

The user (and ChatGPT) gets a response instantly even at peak load, while you gently refresh the cache in the background. For GiftGenius this is ideal, for example, for “top gift picks for the New Year” — they don’t change every second.

Example:

// app/api/gifts/top/route.ts
import { NextResponse } from "next/server";

export async function GET() {
  const topGifts = [
    { id: "coffee_mug", title: "Mug with a caption" },
    { id: "smart_led", title: "Smart lamp" },
  ];

  return NextResponse.json(topGifts, {
    headers: {
      "Cache-Control": "public, s-maxage=60, stale-while-revalidate=300",
    },
  });
}

SWR in the UI widget (React)

The GiftGenius widget lives in ChatGPT’s sandbox and can use any React code. You already know how to call your API with window.fetch. Let’s add the swr library and organize a cache on the widget side:

// widget/GiftTopList.tsx
import useSWR from "swr";

const fetcher = (url: string) => fetch(url).then((r) => r.json());

export function GiftTopList() {
  const { data, isLoading } = useSWR(
    "https://api.giftgenius.com/api/gifts/top",
    fetcher,
    { revalidateOnFocus: false } // focus is odd inside chat, disable it
  );

  if (isLoading && !data) return <div>Loading ideas...</div>;

  return (
    <ul>
      {data?.map((gift: any) => (
        <li key={gift.id}>{gift.title}</li>
      ))}
    </ul>
  );
}

How it works:

  • On first render it requests our API.
  • The result is stored in swr’s cache inside the widget.
  • On subsequent renders (or new answers where ChatGPT embeds this widget again with the same key) the data is taken from the cache. The user doesn’t see flicker or spinners, and a background refresh may run.

Thus we combine two SWR levels:

  • At the CDN/HTTP level — to take load off the origin.
  • In the UI — to keep the experience smooth for the user.

Putting it all together:

  • Simple Cache-Control (max-age/s-maxage) — the base layer: allow CDNs and clients to cache responses and reduce load.
  • ETag + If-None-Match — add when bandwidth savings for heavy JSON matter, while accepting the network round trip.
  • stale-while-revalidate — enable when instant delivery of slightly stale data matters (catalogs, top picks).
  • SWR in the UI (the swr/react-query library) — a separate layer to smooth widget re-renders and maintain a local cache in ChatGPT’s sandbox.

7. What to cache in GiftGenius and for how long

Let’s classify GiftGenius data by “cacheability layers.”

Safe to cache at the CDN/edge level

Anything that’s the same for everyone (or large segments) and changes infrequently:

  • Widget static assets: JS/CSS, fonts, icons — “forever” (a year) with immutable.
  • Gift catalog structure: categories, sections, filters — minutes/hours.
  • Common collections (“best ideas for coworkers under $50”) — minutes/tens of minutes, especially in peak seasons.

Here, public, s-maxage + stale-while-revalidate is ideal.
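One way to keep these per-route decisions in one place is a declarative policy table that defaults to “never share-cache.” This is a hypothetical config for GiftGenius, not a Vercel feature:

```typescript
// Hypothetical per-route cache policies; anything unlisted is treated as personal.
const cachePolicies: Record<string, string> = {
  "/assets": "public, max-age=31536000, immutable",
  "/api/gifts/categories": "public, s-maxage=300, stale-while-revalidate=3600",
  "/api/gifts/top": "public, s-maxage=60, stale-while-revalidate=600",
};

function policyFor(path: string): string {
  for (const prefix of Object.keys(cachePolicies)) {
    if (path.startsWith(prefix)) return cachePolicies[prefix];
  }
  // safe default: per-user data never lands in a shared cache
  return "private, no-store";
}
```

The important design choice is the direction of the default: routes must opt in to shared caching, so forgetting to list a new endpoint yields a slow response, not a data leak.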

Better cached in the application/Redis

More dynamic yet still repetitive data:

  • Results of heavy external APIs (e.g., exchange rates, current prices from an external store).
  • Frequently requested recommendation segments (by gender/age/occasion).

A CDN won’t always be suitable here because the data may depend on token/organization/tenant. Cache it at the MCP Gateway level or inside internal REST services: you control it completely and avoid mixing data between users.

Must not be cached (in shared caches)

Anything tied to a specific user:

  • Personal orders and their statuses.
  • Payment information, addresses, email.
  • Personalized recommendations based on private order history (if sensitive).

You can cache this only at the application level with careful semantics (and absolutely no cross‑user leakage), but definitely not in a public CDN cache.

8. The edge layer: CDN vs. edge functions

Don’t confuse two similar but different beasts:

  • CDN / edge cache — stores precomputed responses; there’s almost no logic there.
  • Edge functions (Vercel Edge / Cloudflare Workers) — small pieces of code that run on edge nodes.

Experience shows: Edge ≠ Serverless. Many developers try to cram heavy business logic, LLM calls, and BLOB processing in there and then get surprised by timeouts and limits. Edge functions:

  • Start very fast (near‑zero cold start).
  • But are strongly limited in CPU, execution time, and available APIs (often without full Node.js, no long‑lived sockets, etc.).

When an edge function is a good idea

In the context of GiftGenius and a ChatGPT App, edge functions are useful for:

  • Lightweight routing: based on headers like locale, x-openai-user-location, or a tenant ID, decide which regional backend cluster to hit.
  • Adding simple headers, feature flags, A/B routing.
  • Fast read‑only endpoints that read from an edge KV or the CDN cache and do almost no computation.

When an edge function is a bad idea

  • Long external API calls.
  • Calls to LLM models.
  • Complex checkout logic.
  • MCP tools with heavy business logic.

For all of that you have regular Next.js serverless functions (e.g., runtime = "nodejs") or separate services/clusters.

Edge function example in Next.js 16

Let’s make a small route GET /api/geo-router that returns which regional cluster to use based on the x-openai-user-location header (hypothetical).

// app/api/geo-router/route.ts
import { NextRequest, NextResponse } from "next/server";

export const runtime = "edge"; // run on the edge

export function GET(req: NextRequest) {
  const userLocation = req.headers.get("x-openai-user-location") ?? "US";
  const cluster =
    userLocation.startsWith("EU") ? "eu-gift-api" : "us-gift-api";

  return NextResponse.json({ cluster }, {
    headers: {
      "Cache-Control": "public, s-maxage=300",
    },
  });
}

Such an endpoint:

  • Works very fast (edge).
  • Does nothing complex.
  • Can be cached by the CDN.

9. Edge and cache in the overall GiftGenius architecture

Let’s put it all into one picture.

flowchart TD
    ChatGPT[(ChatGPT / User)]
    CDN["CDN / Edge Cache (Vercel)"]
    EdgeFn["Edge Functions (routing, feature flags)"]
    GW[MCP Gateway]
    GiftAPI["Gift REST API Cluster"]
    CommerceAPI["Commerce REST API Cluster"]
    DB[(DB/External APIs)]
    
    ChatGPT --> CDN
    CDN -->|cache hit| ChatGPT
    CDN -->|cache miss| EdgeFn
    EdgeFn --> GW
    GW --> GiftAPI
    GW --> CommerceAPI
    GiftAPI --> DB
    CommerceAPI --> DB

A typical scenario:

  1. The ChatGPT widget requests /api/gifts/categories.
  2. The CDN checks the cache. If it has a fresh or “stale yet still acceptable” version — it returns it immediately, without touching EdgeFn/GW.
  3. If there’s no cache — the request falls through to EdgeFn (if enabled) and/or straight to the GW.
  4. The GW may use an internal Redis cache for heavy operations or call internal REST services and then the DB.
  5. The response comes back, lands in the CDN/edge cache, and is served to other users.

This setup:

  • Reduces latency for the widget and ChatGPT.
  • Reduces load on the MCP Gateway and backend clusters.
  • Lowers the cost of LLM/DB calls (fewer repeat requests).

10. Small practical snippets for GiftGenius

Category cache + Next.js revalidate

So far we’ve been talking about API endpoints only. But Next.js provides similar mechanisms for pages themselves — via ISR (revalidate).

A server component example that fetches the category list with revalidate = 60:

// app/(widget)/categories/page.tsx
export const revalidate = 60; // ISR: re-generate every 60 s

async function fetchCategories() {
  const res = await fetch("https://api.giftgenius.com/api/gifts/categories");
  return res.json();
}

export default async function CategoriesPage() {
  const categories = await fetchCategories();
  return (
    <ul>
      {categories.map((c: any) => (
        <li key={c.id}>{c.title}</li>
      ))}
    </ul>
  );
}

In production, Vercel will generate and cache the HTML output of this page, which is useful when your widget/interface is open not only via ChatGPT but also as a regular web page (e.g., a debug panel or landing).

A simple application cache in a backend service

This is not the edge layer anymore but an application cache (Redis/in‑memory inside your Gift REST API or another backend service). It’s helpful to show how it looks in the simplest form:

// pseudo-code inside Gift REST API
const cache = new Map<string, any>();

async function getGiftCategories() {
  const key = "gift_categories_v1";
  const cached = cache.get(key);
  if (cached && Date.now() - cached.ts < 60_000) {
    return cached.data; // 60 seconds cache
  }
  const data = await fetchRealCategories();
  cache.set(key, { ts: Date.now(), data });
  return data;
}

In production you’ll replace Map with Redis/Memcached, but the idea is the same: fewer trips to the DB/external API.
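As an intermediate step before Redis, the same idea can be wrapped into a reusable TTL cache. This is a sketch; the injected now function is only there to make expiry deterministic in tests:

```typescript
type Entry<T> = { data: T; ts: number };

// Minimal TTL cache; swap the Map for Redis/Memcached in production.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const store = new Map<string, Entry<T>>();
  return {
    get(key: string): T | undefined {
      const entry = store.get(key);
      if (!entry) return undefined;
      if (now() - entry.ts >= ttlMs) {
        store.delete(key); // expired: drop it and force a refetch
        return undefined;
      }
      return entry.data;
    },
    set(key: string, data: T): void {
      store.set(key, { data, ts: now() });
    },
  };
}
```

getGiftCategories then shrinks to “check the cache, otherwise fetch and set,” and the TTL logic lives in one tested place.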

If we compress all this into one thesis: first decide clearly what can be cached and where (CDN, edge, Redis, DB), and only then flip the platform’s “magic” switches. Cache isn’t a checkbox in config; it’s part of the architecture — it affects speed, stability, and cost.

11. Common mistakes when working with cache and the edge layer

Mistake #1: “Cache everything, as long as it’s faster.”
The classic: a developer sets Cache-Control: public, s-maxage=3600 on all JSON responses. A couple of hours later it turns out one user sees another user’s orders, and ChatGPT starts operating with stale stock data. For personal or sensitive data you need either a private cache or to disable the CDN cache entirely and keep caching at the application level with careful isolation.

Mistake #2: Confusing max-age and s-maxage.
Some set only max-age and expect the CDN to cache for the same duration. In reality max-age primarily applies to the browser, and for a shared cache you need s-maxage. The result: the browser caches, but the CDN doesn’t, and the origin keeps choking under load even though “we set a cache.” The right way is to specify s-maxage explicitly for the CDN.

Mistake #3: Expecting ETag to speed up everything.
ETag is great for saving bandwidth, especially for large JSON files, but the network round trip remains. In the world of ChatGPT Apps this means the model still waits for your server’s response, even if it’s a 304 with no body. If you care about latency, you need edge cache + SWR, with ETag as a supporting mechanism.

Mistake #4: Shoving heavy business logic into edge functions.
“Let’s call an external LLM, compute complex selections, and hit three external APIs right from Vercel Edge — it’s fast there!” Then the pain starts: execution time limits, no full Node.js, weird errors. The edge is great for lightweight routing and A/B, while heavy work should go to regular serverless functions or separate backend clusters.

Mistake #5: No cache invalidation strategy.
You set the cache to “one hour,” everything flies. Then the business says: “we changed prices/categories/constraints, why does ChatGPT still show the old stuff?” Developers start pulling levers manually, purging caches, and restarting services. For important data, plan ahead: how will you purge the cache (via a webhook from the admin panel, via a version, by key) instead of relying on “it will auto‑refresh in an hour.”

Mistake #6: Ignoring the cache ↔ cost relationship.
Sometimes developers think about cache only in terms of speed. In the LLM ecosystem it’s also about money: every extra call to a model or external API costs money. Without caching, an MCP server may start hammering an external service/model so often that the monthly bill will be unpleasant. Proper caching reduces both latency and the bill.

Mistake #7: Mixing data for different locales/regions in one cache.
GiftGenius operates in multiple countries but uses a single key top_gifts in the cache. Result: a user from Germany sees dollars and US stores, while a user from the US sees euros and European shops. When caching, always include dimensions like locale, currency, and tenant in the cache key or in the route (e.g., /api/{locale}/gifts/top).
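A simple guard against this is to build every cache key from all the dimensions the data depends on. The helper below is illustrative; the point is the sorted, explicit key:

```typescript
// Build a stable cache key from a base name plus every dimension
// the cached data depends on (locale, currency, tenant, ...).
function cacheKey(base: string, dims: Record<string, string>): string {
  const parts = Object.keys(dims)
    .sort() // stable order regardless of call-site argument order
    .map((k) => `${k}=${dims[k]}`);
  return [base, ...parts].join(":");
}
```

For example, cacheKey("top_gifts", { locale: "de-DE", currency: "EUR" }) yields "top_gifts:currency=EUR:locale=de-DE", so German and US users can never collide on a bare top_gifts key.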

Mistake #8: Fully relying on Next.js/platform “magic.”
ISR, revalidate, automatic CDN — all of this is great. But if you don’t understand what happens under the hood, you can get surprises. For example, a page shows old content while the API returns new content; ChatGPT sees one thing, and browser users see another. Spend time to understand how Cache-Control, ETag, and the SWR pattern work, and use Next.js as a convenient wrapper, not a black box.

Mistake #9: No difference between dev/staging/production in caching.
In the dev environment, cache often gets in the way (“I changed the data, why does ChatGPT still see the old selections?”). It’s useful to have a config that almost disables caching in dev (or sets TTL to a few seconds) and enables aggressive caching in production. Otherwise, you’ll either go crazy during development or accidentally ship to prod without caching and get a storm of requests hitting internal backend clusters behind the MCP Gateway.
