
How GPT decides to call a tool: the tool-call model and the role of descriptions

ChatGPT Apps
Level 4, Lesson 0

1. Why even bother understanding tool-call

To simplify, a typical web app works like this: “the user clicks a button — we call a function.” In the world of ChatGPT Apps it’s different: the user says something, the model thinks, and, if it deems it necessary, it produces a structured tool invocation (tool-call).

So you do not write:

onClick={() => callSuggestGiftsApi(formData)}

but instead:

  1. Describe a tool suggest_gifts (name, description, argument schema).
  2. Explain in the system prompt how this tool is useful.
  3. Hand control to the model: it decides when and how to call it.

From this, it’s important to grasp two things early:

  1. GPT does not see your backend code. It only sees the tool’s “surface”: the name, description, and parameter schema.
  2. How “smartly” the model will use your app almost directly depends on how you write those descriptions. Good descriptions are your “prompt for the tool”.

Today’s lecture is about this very “brain” between the user and your server.

2. The mental model of tool-call: what actually happens

Let’s start with the big picture. A typical scenario for GiftGenius:

  1. User: “Pick a gift for a 30‑year‑old friend, budget 100 dollars, he loves video games.”
  2. GPT reads the message and checks what tools exist. In our app, for example, there is suggest_gifts.
  3. GPT decides: “To answer well, I need to call this tool.”
  4. Instead of a regular text reply, it generates a structure: tool name + JSON arguments.
  5. The ChatGPT client sees: “Aha, this is a tool-call,” and sends it to your MCP/server.
  6. Your server executes business logic and returns structured output.
  7. GPT receives the result, reads it, and based on the tool’s answer composes a clear reply for the user and/or updates a widget.

From the OpenAI API standpoint, this is the same mechanism as LLM function calling: instead of normal text, the model’s response contains an object with the tool’s name and arguments, and (in the Chat Completions API) finish_reason is set to tool_calls. The model does not execute code itself: it only proposes which tool to call, and the actual call is made by the client (ChatGPT/Apps SDK).

It looks roughly like this (simplified sequence):

sequenceDiagram
    participant U as User
    participant G as GPT (model)
    participant C as ChatGPT client
    participant S as Your MCP/Backend

    U->>G: "Pick a gift for a friend..."
    G->>C: tool-call: { name: "suggest_gifts", args: {...} }
    C->>S: HTTP /mcp tools/call (suggest_gifts, args)
    S-->>C: Result (JSON with a list of gifts)
    C-->>G: tool result
    G-->>U: Response + updated widget

The key takeaway: you do not write if (userAskedAboutGifts) callSuggestGifts(). You create a tool and its description, and the model makes the decision.
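Steps 5–6 of the scenario above can be sketched as ordinary code. Everything here is illustrative: a hypothetical local handler registry stands in for your MCP server, and the ToolCall shape mirrors the structure the model emits.

```typescript
// Sketch of what the client does with a tool-call (names are illustrative,
// not actual ChatGPT internals).
type ToolCall = { id: string; name: string; arguments: string };

// Hypothetical local registry standing in for "your MCP/server".
const handlers: Record<string, (args: any) => unknown> = {
  suggest_gifts: ({ age, budget }) => ({
    suggestions: [`ideas for a ${age}-year-old, under ${budget}`],
  }),
};

function dispatchToolCall(call: ToolCall): unknown {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  // Arguments arrive as a JSON string and must be parsed before the call.
  return handler(JSON.parse(call.arguments));
}
```

Note that the dispatcher matches purely on the tool name string: whatever the model emits must match a registered name exactly.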

3. What the model sees: system prompt + tool list

To understand how GPT decides what to do, you need a clear picture of what information it has at decision time.

Simplified, the model sees:

  • the app’s system prompt (we’ll cover this in detail in module 5);
  • the conversation history: user messages, its own replies, and results of past tool calls;
  • the list of available tools (tools) with their names, descriptions, and parameter schemas;
  • additional annotations for tools (readOnly/destructive, etc.).

It does not see:

  • function implementations;
  • SQL queries;
  • your table structure;
  • the contents of a private repository with the service.

We’ll discuss MCP in detail later. For now, it’s enough to know that at the MCP level, tools are declared as descriptors: each has a name, description, and inputSchema (JSON Schema). During the handshake, ChatGPT requests the tool list from the MCP server and begins to treat them as available “actions”.

Here’s an example of such a descriptor for GiftGenius (simplified JSON):

{
  "name": "suggest_gifts",
  "description": "Picks gift ideas based on age, interests, and budget",
  "inputSchema": {
    "type": "object",
    "properties": {
      "age": { "type": "integer" },
      "budget": { "type": "number" }
    },
    "required": ["age", "budget"]
  }
}

The model only “reads” the text and structure here: what age is, what budget is, and what the tool does overall. The next lecture will be about how to properly describe inputSchema. For now — how this description turns into the decision “let’s call suggest_gifts.”

4. What a tool-call looks like via the API

ChatGPT calls your MCP server’s tools roughly the same way an OpenAI agent calls functions on your backend. In the ChatGPT Apps SDK it’s a bit more wrapped, but the basic mechanics are the same.

Imagine that on our backend we make a normal request to the OpenAI API and pass a tool suggest_gifts that the model can invoke in its response:

const response = await openai.chat.completions.create({
  model: 'gpt-5-mini',
  messages: [
    {
      role: 'user',
      content: 'Need a gift for a 30-year-old friend, budget 100 dollars'
    }
  ],
  tools: [ // here we pass the list of functions that the LLM can "call"
    {
      type: 'function',
      function: {
        name: 'suggest_gifts',
        description: 'Picks gifts by age, budget, and interests',
        parameters: {
          type: 'object',
          properties: {
            age: { type: 'integer' },
            budget: { type: 'number' }
          },
          required: ['age', 'budget']
        }
      }
    }
  ]
});

If the model decides to call the tool, you’ll receive an assistant message in response instead of text, something like:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "suggest_gifts",
        "arguments": "{\"age\":30,\"budget\":100}"
      }
    }
  ]
}

This is how the LLM tells your backend it needs to call suggest_gifts(30,100).

Three things matter here:

  1. The tool name (name) — the model emits exactly the string you declared in tools in the original request.
  2. The arguments (arguments) — a JSON string assembled based on parameters/inputSchema.
  3. No regular text reply (yet) — instead you get a structure for invoking the tool.

In a ChatGPT app, it’s the same: the model returns “I want to call suggest_gifts with these parameters,” and the client (ChatGPT) makes an HTTP request to your MCP/server: tools/call with the tool name and the arguments.
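To close the loop in the plain-API version: after your code executes suggest_gifts itself, the result goes back to the model as a tool message in a second request. Here is a sketch of just the message plumbing; the actual openai call is omitted, and appendToolResult is our own helper name, not part of the SDK.

```typescript
// After executing the tool locally, the result is appended as a "tool"
// message, tied to the originating call by tool_call_id, and the extended
// history is sent back in a second chat.completions request.
// (history should already contain the assistant message with tool_calls.)
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_call_id?: string;
};

function appendToolResult(
  history: ChatMessage[],
  toolCallId: string,
  result: unknown
): ChatMessage[] {
  return [
    ...history,
    {
      role: 'tool',
      tool_call_id: toolCallId,
      content: JSON.stringify(result), // tool output travels as plain text
    },
  ];
}
```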

5. How the model decides: tool or text

Now the interesting part: when does GPT even consider your tools?

The mechanics, simplified:

  1. The model sees the user’s new message and the current context.
  2. Internally it has a “layer” that generates the next assistant message, but instead of always producing regular text, the model can choose one of the completion types:
    • a regular text reply (finish_reason: "stop");
    • one or more tool-calls (finish_reason: "tool_calls");
    • sometimes other variants (for example, generation cut off by a length limit).
  3. This choice is influenced by:
    • how similar the user’s request is to the tasks described by your tools;
    • how explicitly your tool description says “use me in this exact case”;
    • data from the app system prompt, which in the Apps SDK is configured in settings.

Put simply, the model “tries your tool on” for the current request. If the description says: “Selects gifts by age and interests,” and the user asks for “analysis of a government budget,” the model won’t even attempt to call it. If the description is too vague — “does cool stuff” — the model won’t understand when it should be used at all.

An interesting nuance: the model is not obliged to call a tool even if you described it. GPT may decide: “Everything is clear here; I’ll answer myself without a tool call.” Later in the course we’ll actively practice writing tool descriptions that make using the tool as obvious and beneficial as possible for the model.

6. Tool name: why tool1 is a bad idea

The tool’s name is essentially the identifier the model will use in its calls. It might seem like a purely technical field, but in practice the name heavily affects model behavior.

If you name a tool tool1, the model won’t infer anything from it. It’s just a sequence of characters. If you name it suggest_gifts, search_products, or fetch_user_orders, the name itself gives a strong signal about what the tool does.

Think about how you read unfamiliar code. Seeing a function calculateCartTotal, you roughly know what to expect. The model needs the same “semantic anchor.”

For GiftGenius, sensible tool names could be:

suggest_gifts
search_products
get_product_details
create_order

Good if a name is:

  • short but meaningful;
  • consistent (snake_case, Latin letters, verb_noun);
  • reflecting one specific action.

It’s a bad idea to mix multiple actions in one tool, like do_all_gift_stuff. It’s harder for the model to understand when to use it, and in later lectures we’ll see how this breaks the argument schema and complicates debugging.
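These naming rules can even be checked mechanically. A toy lint, using a convention of our own (not an SDK requirement):

```typescript
// Accepts lowercase snake_case names of two or three words (verb_noun or
// verb_noun_noun), e.g. "suggest_gifts" or "get_product_details".
// Rejects "tool1" (digit), "doStuff" (camelCase), and long catch-alls
// like "do_all_gift_stuff" (too many words as a crude one-action proxy).
function isGoodToolName(name: string): boolean {
  return /^[a-z]+(_[a-z]+){1,2}$/.test(name);
}
```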

7. Tool description: your prompt for the model

If the name is a title, then the description is mini‑documentation, not for a human developer but for GPT. A developer will read the code; the model will not. It relies on the description text when deciding when to call the tool and which arguments to provide.

It’s important to write the description as “instructions for use”:

  • when to use the tool;
  • what its limitations are;
  • what it must not do.

Let’s take our suggest_gifts. Here are three description variants.

Too broad:

"Selects gifts."

The model won’t know for whom, on what occasion, and with which parameters. This tool can “compete” with the model’s general knowledge about gifts, and it will often decide to answer with text instead.

Too narrow:

"Selects gifts only for younger brothers for a birthday."

Here we’ve effectively forbidden using the tool almost all the time. Any other scenario — mom, colleague, anniversary — “doesn’t fit,” and the model will avoid calling it.

Optimal:

"Use this tool when you need to select gifts for a person based on age, relationship type (friend, partner, colleague, etc.), budget, and interests.
Do not call it for questions unrelated to gifts (e.g., politics or weather)."

This clearly states what the tool does, which parameters it has for that, and when to call it, and adds a negative condition — for which requests it should not be used.

The model “likes” such clear boundaries. The more clearly you spell out for which user phrasings (intents) the tool is appropriate, the more predictable your app’s behavior will be.

Mini exercise

Right now, take your future app (maybe not about gifts) and come up with three descriptions for one of its tools: very broad, very narrow, and balanced. Then test how GPT behaves with different versions.

8. Argument schema: how it helps the decision

We’ll talk about JSON Schema in detail in the next lecture, but to understand tool-calls you need at least a high‑level feel.

When the model decides to call a tool, it needs to:

  1. Understand which arguments the tool expects at all.
  2. Extract those values from the user’s text (or from context).
  3. Assemble JSON with those arguments.

For that, the tool’s description has a parameter schema (inputSchema) that tells the model:

  • which fields exist (age, budget, relationship_type, interests, etc.);
  • which fields are required (required);
  • what the types are (integer, number, string, arrays, etc.);
  • sometimes — which values are allowed (enum) and field explanations (description).

A minimal TypeScript interface for suggest_gifts parameters could look like this:

interface SuggestGiftsParams {
  age: number;
  relationship_type: 'friend' | 'partner' | 'colleague';
  budget: number;
  interests?: string[];
}

At the model level this becomes JSON Schema, and from each field’s name and description the model infers that it should:

  • take age from phrases like “30 years old,” “for a teenager,” etc.;
  • take budget from “budget 100 dollars,” “up to 50 euros”;
  • relationship_type from “friend,” “colleague”;
  • interests from “loves video games.”

If you provide a schema without descriptions and with abstract field names (a, b, c), the model will make far more mistakes when filling arguments. We’ll return to this in the module on localization and UX hints. The key idea: the schema is not only backend validation — it is primarily a hint to the model about where to put what.
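For a concrete picture, here is how the SuggestGiftsParams interface above might be spelled out as JSON Schema inside the descriptor; the field descriptions are illustrative:

```typescript
// JSON Schema counterpart of SuggestGiftsParams, as it would sit in the
// tool descriptor's inputSchema. The union type becomes an enum, the
// optional field is simply left out of "required".
const suggestGiftsInputSchema = {
  type: 'object',
  properties: {
    age: { type: 'integer', description: 'Recipient age in years' },
    relationship_type: {
      type: 'string',
      enum: ['friend', 'partner', 'colleague'],
      description: 'Who the gift is for',
    },
    budget: { type: 'number', description: 'Maximum budget for the gift' },
    interests: {
      type: 'array',
      items: { type: 'string' },
      description: 'Hobbies mentioned by the user, e.g. "video games"',
    },
  },
  required: ['age', 'relationship_type', 'budget'], // interests stays optional
};
```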

We’ve discussed how the schema helps the model assemble arguments correctly. But beyond “what and how to call,” there is also “can it be called right now and how safe is it.” This is where permissions and tool meta‑information come into play.

9. Permissions and context: not every tool is available all the time

Beyond name, description, and argument schema, tools have another important dimension — safety and access. Tools in a real app differ greatly in “risk level.” One thing is searching gifts in a public catalog; another is charging the user’s card.

The Apps SDK and MCP let you reflect this in tool descriptions and annotations — for example, marking them as read-only or destructive.

The idea is:

  • Tools that only read public data (search_products, get_weather) can be called without extra confirmations.
  • Tools that modify something (create_order, cancel_order, charge_user) are marked destructive. The ChatGPT UI can prompt the user for additional confirmation (“Are you sure you want to place the order?”), and the model itself will suggest them less often without an explicit request.

In future modules, when configuring MCP, you’ll see how these annotations (_meta, destructiveHint, readOnlyHint) look in real JSON descriptors, how they influence UX, and how ChatGPT forms “Are you sure?” dialogs before invoking. For now, understand:

  • GPT takes into account not only the description text but also safety meta‑information.
  • A tool requiring authentication won’t be used until the user is logged in (or the app has received the required token).

This is another factor influencing the “call a tool or not” decision: even if a tool fits semantically, it may be unavailable due to permissions, and the model will choose another path.
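As a small preview of those annotations, here is a sketch of two descriptors carrying them; the hint names follow the MCP spec, everything else is illustrative:

```typescript
// Safety annotations on MCP tool descriptors. readOnlyHint/destructiveHint
// are advisory for the client UI and the model; the server must still
// enforce real checks before performing the operation.
const searchProductsTool = {
  name: 'search_products',
  description: 'Searches the public gift catalog.',
  annotations: { readOnlyHint: true },
};

const createOrderTool = {
  name: 'create_order',
  description: 'Places an order. Use only after explicit user confirmation.',
  annotations: { readOnlyHint: false, destructiveHint: true },
};
```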

10. Where tools in ChatGPT come from

Architecturally, a tool can reach the model via two main paths.

First, from your ChatGPT App configuration. When you register an app, you specify which MCP servers (and their tools) are linked to it, or which built‑in tools the app itself has. At session start, ChatGPT receives this configuration and understands which tools are available at all.

Second, directly from MCP. MCP (Model Context Protocol) defines a standard way for the client (in our case, ChatGPT/Apps SDK) to learn what your server can do: it makes a tools/list request, receives JSON with tool descriptions, and stores them as capabilities. We’ll cover the details in a separate MCP module; for now, keep the general idea in mind.

Schematically:

flowchart LR
  A[ChatGPT Client] -->|handshake| B[MCP Server]
  B -->|tools/list| A
  A -->|passes the list| G[GPT Model]

After that, the tool list becomes part of the model’s context. If you change a tool’s schema or description on the server and restart the app, the new descriptor will reach ChatGPT at the next handshake, and the model will start making different decisions about calling it.

And an important practical thought: when you only change the backend (the tool implementation), the model doesn’t know about it. But when you change the name/description/schema, you are actually changing the app’s “brain.” Sometimes it’s more effective to tweak one line in the description than to write 200 lines of heuristics.
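The tools/list exchange itself is plain JSON-RPC. Roughly, with payloads trimmed:

```typescript
// The tool-discovery step of the handshake as JSON-RPC payloads.
// Real responses carry full descriptors; this one is abbreviated.
const toolsListRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/list',
};

const toolsListResponse = {
  jsonrpc: '2.0',
  id: 1,
  result: {
    tools: [
      {
        name: 'suggest_gifts',
        description: 'Picks gift ideas based on age, interests, and budget',
        inputSchema: { type: 'object' }, // schema trimmed for brevity
      },
    ],
  },
};
```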

11. Apply to GiftGenius: build a tool the model will want to call

Let’s connect all of this to our training app, GiftGenius. Suppose we already have an MCP server or backend layer where we register tools. Let’s register a suggest_gifts tool using server.registerTool(...).

Primitive TypeScript sketch (without real logic yet):

// pseudo-mcp-server/tools/suggestGifts.ts
// NB: in the real MCP TypeScript SDK, inputSchema is usually given as a Zod
// schema; plain JSON Schema is shown here to keep the example readable.
server.registerTool(
  'suggest_gifts', // tool name
  {
    title: 'Gift selection',
    description:
      'Use this tool to suggest gift ideas by age, ' +
      'relationship type, and budget. Do not call it for questions unrelated to gifts.',
    inputSchema: { // tool parameter description
      type: 'object',
      properties: {
        age: { type: 'integer', description: 'Recipient age in years' },
        relationship_type: {
          type: 'string',
          description: 'Relationship type: friend, partner, colleague'
        },
        budget: {
          type: 'number',
          description: 'Maximum gift budget in the user’s currency'
        }
      },
      required: ['age', 'budget']
    }
  },
  async ({ age, relationship_type, budget }) => { // function/tool code
    // Real logic will come later
    return { suggestions: [] };
  }
);

Note the details we’ve considered at this stage, even though the logic is still a stub:

  • Name: suggest_gifts, not tool1.
  • Description: explicitly explains when to call the tool and when not to.
  • Field descriptions: help the model correctly map user text to arguments.

As a result, when a user writes “Pick a gift for a colleague for 50 dollars,” the model will see that:

  • there’s a tool named suggest_gifts with a description about gift selection;
  • it has fields age, relationship_type, budget;
  • budget is “maximum budget for the gift,” relationship_type is “relationship type: friend, partner, colleague.”

Even if users are imprecise (“up to fifty,” “for a project teammate”), the model will have enough context to assemble a JSON of arguments reasonably well.

When our tool starts working for real (in the backend and MCP module), you’ll already be well‑oriented in the topic: GPT will call it predictably because we designed the interface and description well.

12. A small practice for you

So this doesn’t stay purely theoretical, I recommend a small experiment right after the lecture.

First, take one of your GiftGenius scenarios or come up with a new app. Write down on paper or in an editor one function you clearly want to give the model — something like search_products, find_hotels, calculate_shipping.

Then come up with three “name + description” pairs for the same tool:

  1. Very abstract name and description.
  2. Too specific (almost a special case).
  3. A well‑balanced name + description that clearly states when to call the tool and what it must not do.

Next, optionally, using the regular OpenAI SDK, make a simple request with each variant and see how the model’s behavior changes: whether the tool gets called and how it fills the arguments. This is exactly the kind of exercise we walked through for suggest_gifts.

13. Common mistakes when designing tool-call and descriptions

Mistake #1: Naming tools tool1, handler, doStuff.
Such naming is useless for the model. GPT does not infer “developer intent” from a filename; it needs a semantically meaningful name. If you provide a set of tool1, tool2, tool3 without descriptions, the tool will hardly ever be called: the model simply won’t understand what each one does and will either ignore them or pick randomly.

Mistake #2: Treating description as comments for humans.
Many people write something formal like “Function for selecting gifts,” assuming details are known from the code anyway. But the model doesn’t see the code; it only sees the description text and the argument schema. A vague description becomes a source of hallucinations: GPT will either try to answer itself when the tool should have been called, or it will call the tool in odd situations.

Mistake #3: Making the description too broad or too narrow.
If you write “Does cool things,” the model doesn’t know the bounds of applicability. If you write “Selects a gift only for a younger brother’s 18th birthday,” you effectively forbid using the tool almost always. An optimal description sets a clear task area (gift selection by several parameters), lists key parameters (age, relationship, budget, interests), and states which classes of questions the tool should not be used for.

Mistake #4: Ignoring the argument schema as part of the “prompt.”
Some developers view JSON Schema only as server‑side validation. In reality, the model actively analyzes field names, their types, and descriptions to understand what data to extract from user text. If you name a field x without a description and make it optional, GPT will start filling it chaotically or leave it empty. A proper schema with clear names and brief descriptions greatly reduces the number of invalid tool-calls.

Mistake #5: Expecting the model to be “obliged” to call a tool.
Developers sometimes ask: “Why didn’t GPT call my tool, since it exists?” The answer is almost always the same: neither the description nor the system prompt implies that the tool is needed for that question, or the request lands in a zone where the model believes it can answer more easily by itself.

Mistake #6: Mixing several different actions in one tool.
Sometimes you want a universal manage_orders that searches orders, creates new ones, and cancels old ones. A human could live with that, but for the model it’s a vague tool without clear boundaries. GPT understands more poorly when exactly to call it, and it’s harder to fill arguments — there will be lots of optional fields. It’s better to split such actions into several narrow tools (get_order, create_order, cancel_order) with clear descriptions and schemas.

Mistake #7: Not accounting for permissions and safety in tool design.
If you describe a tool that can perform destructive actions (charging funds, deleting data) but don’t mark it destructive and don’t restrict its usage area in the description, you create risk. The ChatGPT UI won’t ask for extra confirmation, and the model may decide to call the tool even in “borderline” scenarios. Proper annotations and careful description (“use only after explicit user consent”) help reduce such risks at the tool‑call level.

Tasks (ChatGPT Apps, level 4, lesson 0):

  1. Two intents — tool or text (description role)
  2. Compare without "MUST call tool" (trigger by description)