GPT-5.4 Tool Search: Load Only What You Need — ContentBuffer guide

Kodetra Technologies · 4 min read · Beginner

Summary

Use OpenAI's tool search to dynamically load tools at runtime, cutting token usage by 47% in large tool ecosystems.

What Is GPT-5.4 Tool Search?

When you build AI agents with 50+ tools, every tool definition eats context tokens. GPT-5.4 introduces tool search — the model loads only the 3–8 tools relevant to the current request instead of all definitions upfront. Result: 47% fewer tokens, faster responses, and better tool selection accuracy.

Only gpt-5.4 and later models support this feature. It works with both functions and MCP servers.
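To get a feel for how much context the definitions alone consume, a rough sketch like the following compares the serialized size of a full tool list with a small loaded subset. The make_tool helper, the 50 synthetic tools, and the 4-characters-per-token ratio are all assumptions for illustration, not figures from OpenAI; a real tokenizer will give different numbers.

```python
import json

CHARS_PER_TOKEN = 4  # assumed rough ratio; real tokenizers vary


def make_tool(index):
    """Build a synthetic function tool (hypothetical, for size comparison only)."""
    return {
        "type": "function",
        "function": {
            "name": f"tool_{index}",
            "description": f"Synthetic tool number {index} used to estimate schema size",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }


def estimate_tokens(tool_definitions):
    """Estimate tokens as serialized JSON length divided by CHARS_PER_TOKEN."""
    return len(json.dumps(tool_definitions)) // CHARS_PER_TOKEN


all_tools = [make_tool(i) for i in range(50)]
loaded_subset = all_tools[:5]  # stand-in for the few tools a search would load

print("all definitions:", estimate_tokens(all_tools))
print("loaded subset:  ", estimate_tokens(loaded_subset))
```

The absolute numbers matter less than the ratio: the cost of sending every definition grows with the size of the catalog, while the cost of a loaded subset does not.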


Prerequisites

  • OpenAI API key with GPT-5.4 access
  • Python 3.9+ with openai SDK installed
  • Basic understanding of OpenAI function calling

pip install openai --upgrade

How Tool Search Works

Without tool search, every tool schema is loaded into the prompt. With tool search, you mark tools as defer_loading: true and add a tool_search entry. The model then searches for relevant tools at runtime.
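If you already have a list of function tools, the conversion described above can be sketched as a small helper. The with_tool_search name and the always_loaded parameter are hypothetical; the dictionary shapes simply mirror the ones used in this guide and are not checked against the API.

```python
import copy


def with_tool_search(function_tools, always_loaded=()):
    """Return a new tools list: a tool_search entry first, then every tool,
    with defer_loading set on all tools not named in always_loaded."""
    result = [{"type": "tool_search", "execution": "server"}]
    for tool in function_tools:
        tool = copy.deepcopy(tool)  # leave the caller's definitions untouched
        if tool["function"]["name"] not in always_loaded:
            tool["defer_loading"] = True
        result.append(tool)
    return result


# Two stub definitions (hypothetical, parameters omitted for brevity)
base_tools = [
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city"}},
    {"type": "function", "function": {"name": "book_hotel", "description": "Book a hotel room in a city"}},
]

prepared = with_tool_search(base_tools, always_loaded={"get_weather"})
for entry in prepared:
    print(entry.get("function", {}).get("name", entry["type"]), entry.get("defer_loading", False))
```

Copying each definition keeps the original list reusable for requests that do not use tool search.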

Feature                  Without Tool Search   With Tool Search
Token usage (50 tools)   ~12K tokens           ~6.3K tokens
Tool selection accuracy  Degrades with scale   Consistent
Response latency         Higher                Lower (cache preserved)
Setup complexity         None                  Minimal
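As a quick sanity check, the approximate token figures in the table line up with the 47% saving quoted earlier. The variable names below are only for illustration.

```python
baseline_tokens = 12_000    # ~12K tokens without tool search (from the table)
with_search_tokens = 6_300  # ~6.3K tokens with tool search (from the table)

saved_tokens = baseline_tokens - with_search_tokens
saved_percent = saved_tokens * 100 / baseline_tokens

print(f"{saved_tokens} tokens saved ({saved_percent:.1f}%)")
# prints: 5700 tokens saved (47.5%)
```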

Option 1: Hosted Tool Search (Easiest)

OpenAI handles the search logic. You declare all tools upfront but mark them as deferred. The API decides which ones to load.

Step 1 — Define Your Tools with defer_loading

import openai

client = openai.OpenAI()

tools = [
    # Tool search entry — tells the model to search
    {"type": "tool_search", "execution": "server"},

    # Deferred tool — NOT loaded until searched
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "search_flights",
            "description": "Search available flights between cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "book_hotel",
            "description": "Book a hotel room in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "checkin": {"type": "string"},
                    "checkout": {"type": "string"}
                },
                "required": ["city", "checkin", "checkout"]
            }
        }
    }
]

Step 2 — Make the API Call

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools
)

print(response.choices[0].message)

Step 3 — Inspect the Response

The model first emits a tool_search_call (internal search step), then a tool_search_output (loaded tools), then the actual function call. You only need to handle the function call:

# Example output — model loaded only get_weather
# {
#   "role": "assistant",
#   "tool_calls": [
#     {
#       "id": "call_abc123",
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "arguments": "{\"city\": \"Tokyo\"}"
#       }
#     }
#   ]
# }
# search_flights and book_hotel were NOT loaded — saving tokens
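Handling the function call can be sketched as a small dispatcher. This is a minimal example that assumes the assistant message has been converted to a plain dict shaped like the example output above; the get_weather body, the HANDLERS table, and run_tool_calls are hypothetical stand-ins, not part of the SDK.

```python
import json


def get_weather(city):
    """Hypothetical local implementation; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names to the local functions that implement them
HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Run every function call in an assistant message and return tool-role replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        if call["type"] != "function":
            continue  # ignore anything that is not a plain function call
        name = call["function"]["name"]
        handler = HANDLERS.get(name)
        if handler is None:
            output = {"error": f"unknown tool: {name}"}
        else:
            arguments = json.loads(call["function"]["arguments"])
            output = handler(**arguments)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return replies


example_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}

tool_replies = run_tool_calls(example_message)
print(tool_replies)
```

Returning an error payload for unknown names, instead of raising, lets the model see the failure and recover in the next turn.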

Option 2: Client-Executed Tool Search

You control the search logic. The model tells you what it needs; your code decides which tools to provide. Use this when tools depend on user context, tenant config, or external registries.
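As one illustration of context-dependent tools, the sketch below filters a candidate list by a per-tenant allowlist before anything is returned to the model. The tenant names, the TENANT_ALLOWLIST table, and the stub tool definitions are all hypothetical.

```python
# Hypothetical per-tenant allowlists; in practice these might live in a config store
TENANT_ALLOWLIST = {
    "acme": {"run_deploy", "rollback_deploy"},
    "globex": {"run_deploy"},
}


def make_stub(name, description):
    """Build a minimal function tool definition (parameters omitted for brevity)."""
    return {"type": "function", "function": {"name": name, "description": description}}


def tools_for_tenant(tenant_id, candidates):
    """Keep only the candidate tools whose names the tenant is allowed to use."""
    allowed = TENANT_ALLOWLIST.get(tenant_id, set())
    return [tool for tool in candidates if tool["function"]["name"] in allowed]


candidates = [
    make_stub("run_deploy", "Deploy to staging or production"),
    make_stub("rollback_deploy", "Roll back the most recent deployment"),
]

print([tool["function"]["name"] for tool in tools_for_tenant("globex", candidates)])
# prints: ['run_deploy']
```

An unknown tenant gets an empty list, so the safe default is to expose nothing.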

Step 1 — Configure Client-Side Search

tools_client = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search project-specific tools",
        "parameters": {
            "type": "object",
            "properties": {
                "goal": {"type": "string"}
            },
            "required": ["goal"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"}
    ],
    tools=tools_client
)

Step 2 — Handle the Search Call

import json

# Model returns a tool_search_call with a call_id
search_call = response.choices[0].message.tool_calls[0]
goal = json.loads(search_call.function.arguments)["goal"]
print(f"Model wants tools for: {goal}")
# Example output (the model chooses the exact wording): "Model wants tools for: deploy staging server"

# Your logic: look up relevant tools from your registry
def find_tools(goal):
    registry = {
        "deploy": [
            {
                "type": "function",
                "function": {
                    "name": "run_deploy",
                    "description": "Deploy to staging or production",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "env": {
                                "type": "string",
                                "enum": ["staging", "production"]
                            }
                        },
                        "required": ["env"]
                    }
                }
            }
        ]
    }
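    # Return the first group whose keyword appears in the goal (simple substring match)
    for keyword, group in registry.items():
        if keyword in goal.lower():
            return group
    return []  # no match: no extra tools are offered to the model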

matched_tools = find_tools(goal)
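The substring lookup above returns the first matching group only. A slightly more flexible alternative, sketched here under the assumption that a flat list of tool definitions is available, ranks tools by how many words they share with the goal. The rank_tools and tokenize names and the default limit of 8 are hypothetical choices, and word overlap is a deliberately simple stand-in for a real search index.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_tools(goal, candidates, limit=8):
    """Return up to `limit` tools, ordered by word overlap between the goal
    and each tool's name plus description. Tools with no overlap are dropped."""
    goal_words = tokenize(goal)
    scored = []
    for tool in candidates:
        fn = tool["function"]
        tool_words = tokenize(fn["name"] + " " + fn["description"])
        score = len(goal_words & tool_words)
        if score:
            scored.append((score, fn["name"], tool))
    # Highest score first; ties broken alphabetically by name for stable output
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]


catalog = [
    {"type": "function", "function": {"name": "run_deploy", "description": "Deploy to staging or production"}},
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city"}},
]

ranked = rank_tools("deploy staging server", catalog)
print([tool["function"]["name"] for tool in ranked])
# prints: ['run_deploy']
```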

Step 3 — Return Tools and Continue

# Send matched tools back with the same call_id
follow_up = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"},
        response.choices[0].message,
        {
            "role": "tool",
            "tool_call_id": search_call.id,
            "content": json.dumps({
                "type": "tool_search_output",
                "tools": matched_tools
            })
        }
    ],
    tools=tools_client + matched_tools
)

# The model should now call run_deploy with env="staging"
print(follow_up.choices[0].message.tool_calls)

Best Practices

  1. Group tools into namespaces: use MCP servers or logical groups instead of many individually deferred functions, which is more token-efficient.
  2. Keep namespaces under 10 functions: smaller groups improve search accuracy.
  3. Write clear descriptions: the model relies on descriptions to decide which tools to load, so vague ones cause poor matches.
  4. Defer selectively: keep always-needed tools non-deferred and defer only the optional ones.
  5. Rely on cache preservation: loaded tools are appended to the end of the context, so the KV cache for the existing prefix stays intact across requests.
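Two of these practices, group size and description quality, are easy to check automatically. The following is a minimal sketch of such a check; lint_tool_groups, the four-word description threshold, and the sample groups are assumptions for illustration, not rules from OpenAI.

```python
def lint_tool_groups(groups, max_functions=10, min_description_words=4):
    """Return warnings for groups with max_functions or more tools and for
    tools whose description has fewer than min_description_words words."""
    warnings = []
    for group_name, tools in groups.items():
        if len(tools) >= max_functions:
            warnings.append(
                f"{group_name}: {len(tools)} functions; keep it under {max_functions}"
            )
        for tool in tools:
            fn = tool["function"]
            word_count = len(fn.get("description", "").split())
            if word_count < min_description_words:
                warnings.append(f"{group_name}.{fn['name']}: description may be too vague")
    return warnings


def stub(name, description):
    """Build a minimal function tool definition (hypothetical helper)."""
    return {"type": "function", "function": {"name": name, "description": description}}


groups = {
    "travel": [
        stub("get_weather", "Get current weather for a city"),
        stub("helper", "Does stuff"),
    ],
}

for warning in lint_tool_groups(groups):
    print(warning)
# prints: travel.helper: description may be too vague
```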

When to Use Each Approach

Scenario                                     Approach
All tools known at request time              Hosted (server)
Tools depend on user/tenant context          Client-executed
External tool registry (MCP, plugin store)   Client-executed
Simple agent with 10–50 tools                Hosted (server)
Platform with 100+ tools per user            Client-executed
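The table can be read as a simple decision rule. The sketch below encodes it; choose_execution and its parameters are hypothetical, and defaulting to hosted search for the range the table does not cover (between 50 and 100 tools) is an assumption.

```python
def choose_execution(tool_count, depends_on_user_context=False, uses_external_registry=False):
    """Return "client" or "server" following the scenarios in the table.
    Cases the table does not cover fall back to "server" (an assumption)."""
    if depends_on_user_context or uses_external_registry:
        return "client"
    if tool_count >= 100:
        return "client"
    return "server"


print(choose_execution(30))                                # server
print(choose_execution(30, depends_on_user_context=True))  # client
print(choose_execution(250))                               # client
```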

Quick Reference

# Minimal hosted tool search setup
tools = [
    {"type": "tool_search", "execution": "server"},
    {"type": "function", "defer_loading": True, "function": {...}},
    {"type": "function", "defer_loading": True, "function": {...}},
]

# Minimal client tool search setup
tools = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search available tools",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}}
        }
    }
]

Key Takeaway

Tool search is essential for any agent with more than a dozen tools. Hosted search is the simplest starting point — just add defer_loading: true and a tool_search entry. Switch to client-executed search when you need dynamic, context-aware tool discovery. Either way, you get significant token savings and better tool selection out of the box.
