
GPT-5.4 Tool Search: Load Only What You Need
Summary
Use OpenAI's tool search to dynamically load tools at runtime, cutting token usage by 47% in large tool ecosystems.
What Is GPT-5.4 Tool Search?
When you build AI agents with 50+ tools, every tool definition consumes context tokens on every request. GPT-5.4 introduces tool search: the model loads only the 3–8 tools relevant to the current request instead of all definitions upfront. In the 50-tool comparison below, that cuts prompt tokens by about 47% (roughly 12K to 6.3K), with faster responses and more consistent tool selection as the tool count grows.
Only gpt-5.4 and later models support this feature. It works with both functions and MCP servers.
Prerequisites
- OpenAI API key with GPT-5.4 access
- Python 3.9+ with the openai SDK installed
- Basic understanding of OpenAI function calling
pip install openai --upgrade
How Tool Search Works
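Without tool search, every tool schema is loaded into the prompt on every request. With tool search, you mark each optional tool with defer_loading: true (True in Python) and add a tool_search entry to the tools list. The model then searches for relevant tools at runtime and loads only the matching definitions.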
| Feature | Without Tool Search | With Tool Search |
|---|---|---|
| Token usage (50 tools) | ~12K tokens | ~6.3K tokens |
| Tool selection accuracy | Degrades with scale | Consistent |
| Response latency | Higher | Lower (cache preserved) |
| Setup complexity | None | Minimal |
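The token figures in the table line up with the headline number. The following is a quick arithmetic check, not a measurement: the helper name `token_savings` is illustrative, and the counts for your own tool set will differ.

```python
def token_savings(baseline_tokens: int, deferred_tokens: int) -> float:
    """Return the fractional reduction in prompt tokens."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - deferred_tokens) / baseline_tokens


# Figures from the table: ~12K tokens without tool search, ~6.3K with it.
savings = token_savings(12_000, 6_300)
print(f"{savings:.1%}")  # 47.5%
```

To get a real number for your agent, compare the prompt token counts reported in the API usage data for the same request with and without deferred tools.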
Option 1: Hosted Tool Search (Easiest)
OpenAI handles the search logic. You declare all tools upfront but mark them as deferred. The API decides which ones to load.
Step 1 — Define Your Tools with defer_loading
import openai

client = openai.OpenAI()

tools = [
    # Tool search entry: tells the model to search
    {"type": "tool_search", "execution": "server"},
    # Deferred tool: NOT loaded until searched
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "search_flights",
            "description": "Search available flights between cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "book_hotel",
            "description": "Book a hotel room in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "checkin": {"type": "string"},
                    "checkout": {"type": "string"}
                },
                "required": ["city", "checkin", "checkout"]
            }
        }
    }
]
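With dozens of tools, writing the wrapper dictionary by hand for each one gets repetitive. The sketch below builds the same list shape from plain function specs. The helper name `with_hosted_tool_search` is hypothetical, and the `tool_search`, `execution`, and `defer_loading` fields are copied from the example above rather than checked against the API reference.

```python
from typing import Any


def with_hosted_tool_search(function_specs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wrap plain function specs as deferred tools behind a tool_search entry."""
    wrapped: list[dict[str, Any]] = [{"type": "tool_search", "execution": "server"}]
    for spec in function_specs:
        # Field names follow the article's example; treat them as assumptions.
        wrapped.append({"type": "function", "defer_loading": True, "function": spec})
    return wrapped


specs = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]
deferred_tools = with_hosted_tool_search(specs)
print(len(deferred_tools))  # 2
```

Tools that every request needs can be appended afterwards without the defer_loading flag.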
Step 2 — Make the API Call
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools
)
print(response.choices[0].message)
Step 3 — Inspect the Response
The model first emits a tool_search_call (internal search step), then a tool_search_output (loaded tools), then the actual function call. You only need to handle the function call:
# Example output: the model loaded only get_weather
# {
#     "role": "assistant",
#     "tool_calls": [
#         {
#             "id": "call_abc123",
#             "type": "function",
#             "function": {
#                 "name": "get_weather",
#                 "arguments": "{\"city\": \"Tokyo\"}"
#             }
#         }
#     ]
# }
# search_flights and book_hotel were NOT loaded, which saves tokens
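Handling the function call usually means mapping the tool name to local code. Here is a minimal sketch that works on a plain dictionary shaped like the example output above. The `HANDLERS` table, the `dispatch` helper, and the placeholder `get_weather` body are all illustrative assumptions, not part of the SDK.

```python
import json


def get_weather(city: str) -> dict:
    # Placeholder implementation; a real one would call a weather service.
    return {"city": city, "forecast": "unknown"}


# Map tool names to the local functions that implement them.
HANDLERS = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run the local function named in a tool call and return a JSON string."""
    name = tool_call["function"]["name"]
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    arguments = json.loads(tool_call["function"]["arguments"])
    return json.dumps(handler(**arguments))


example_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
}
print(dispatch(example_call))
```

The returned string is what you would send back in a "tool" role message with the matching tool_call_id.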
Option 2: Client-Executed Tool Search
You control the search logic. The model tells you what it needs; your code decides which tools to provide. Use this when tools depend on user context, tenant config, or external registries.
Step 1 — Configure Client-Side Search
tools_client = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search project-specific tools",
        "parameters": {
            "type": "object",
            "properties": {
                "goal": {"type": "string"}
            },
            "required": ["goal"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"}
    ],
    tools=tools_client
)
Step 2 — Handle the Search Call
import json

# The model returns a tool_search_call with a call_id
search_call = response.choices[0].message.tool_calls[0]
goal = json.loads(search_call.function.arguments)["goal"]
print(f"Model wants tools for: {goal}")
# Example output (exact wording depends on the model):
# "Model wants tools for: deploy staging server"


# Your logic: look up relevant tools from your registry
def find_tools(goal):
    registry = {
        "deploy": [
            {
                "type": "function",
                "function": {
                    "name": "run_deploy",
                    "description": "Deploy to staging or production",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "env": {
                                "type": "string",
                                "enum": ["staging", "production"]
                            }
                        },
                        "required": ["env"]
                    }
                }
            }
        ]
    }
    # Return the first group whose key appears in the goal text
    for key, group in registry.items():
        if key in goal.lower():
            return group
    return []


matched_tools = find_tools(goal)
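The substring match in find_tools is enough for a demo, but it returns nothing when the goal is phrased differently from the registry key. One possible replacement, sketched below, ranks tools by how many words the goal shares with each tool's name and description. This is a swapped-in technique, not what the API does internally; `rank_tools`, `tokenize`, and the `STOP_WORDS` set are hypothetical names.

```python
import re

# Common words that should not count as a match (illustrative, not exhaustive).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of non-stop-word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def rank_tools(goal: str, catalog: list[dict], limit: int = 5) -> list[dict]:
    """Return up to `limit` tools that share words with the goal, best match first."""
    goal_words = tokenize(goal)
    scored = []
    for tool in catalog:
        fn = tool["function"]
        score = len(goal_words & tokenize(fn["name"] + " " + fn["description"]))
        if score > 0:
            scored.append((score, fn["name"], tool))
    # Highest score first; ties are broken alphabetically by tool name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]


catalog = [
    {"type": "function", "function": {
        "name": "run_deploy", "description": "Deploy to staging or production"}},
    {"type": "function", "function": {
        "name": "get_weather", "description": "Get current weather for a city"}},
]
matches = rank_tools("Deploy the staging server", catalog)
print([tool["function"]["name"] for tool in matches])  # ['run_deploy']
```

For larger registries, an embedding-based search would likely match paraphrased goals better, at the cost of an extra dependency.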
Step 3 — Return Tools and Continue
# Send matched tools back with the same call_id
follow_up = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"},
        response.choices[0].message,
        {
            "role": "tool",
            "tool_call_id": search_call.id,
            "content": json.dumps({
                "type": "tool_search_output",
                "tools": matched_tools
            })
        }
    ],
    tools=tools_client + matched_tools
)

# Model now calls run_deploy with env="staging"
print(follow_up.choices[0].message.tool_calls)
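Because run_deploy can touch production, it is worth checking the model's call against the tools you actually offered before executing anything. The sketch below validates a call expressed as a plain dictionary. `validate_call` is a hypothetical helper, and it checks only the tool name, required keys, and enum values, not the full JSON Schema.

```python
import json


def validate_call(tool_call: dict, offered_tools: list[dict]) -> dict:
    """Return the parsed arguments, or raise ValueError if the call is not allowed."""
    offered = {
        tool["function"]["name"]: tool["function"]
        for tool in offered_tools
        if tool.get("type") == "function"
    }
    name = tool_call["function"]["name"]
    if name not in offered:
        raise ValueError(f"model called a tool that was not offered: {name}")
    args = json.loads(tool_call["function"]["arguments"])
    schema = offered[name].get("parameters", {})
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        allowed = schema.get("properties", {}).get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {allowed}")
    return args


offered = [{
    "type": "function",
    "function": {
        "name": "run_deploy",
        "description": "Deploy to staging or production",
        "parameters": {
            "type": "object",
            "properties": {"env": {"type": "string", "enum": ["staging", "production"]}},
            "required": ["env"],
        },
    },
}]
good_call = {"function": {"name": "run_deploy", "arguments": "{\"env\": \"staging\"}"}}
print(validate_call(good_call, offered))  # {'env': 'staging'}
```

The SDK returns tool calls as objects rather than dictionaries, so convert them first or adapt the attribute access to match.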
Best Practices
- Group tools into namespaces: use MCP servers or logical groups instead of individual deferred functions for better token efficiency
- Keep namespaces under 10 functions: smaller groups improve search accuracy
- Write clear descriptions: the model uses descriptions to decide which tools to load, so vague ones cause poor matches
- Decide deliberately what to defer: keep always-needed tools non-deferred and mark only optional tools as deferred
- Rely on cache preservation: loaded tools are appended to the end of the context, so the KV cache for the earlier context stays intact across requests
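Two of these practices are easy to check automatically before deployment. The sketch below flags namespaces with 10 or more functions and descriptions that are too short to search on. `lint_namespaces` is a hypothetical helper, and the four-word minimum is an arbitrary assumption rather than a documented limit.

```python
def lint_namespaces(
    namespaces: dict[str, list[dict]],
    max_functions: int = 10,
    min_description_words: int = 4,
) -> list[str]:
    """Return human-readable warnings for oversized groups and thin descriptions."""
    warnings = []
    for namespace, functions in namespaces.items():
        if len(functions) >= max_functions:
            warnings.append(
                f"{namespace}: {len(functions)} functions; aim for fewer than {max_functions}"
            )
        for fn in functions:
            if len(fn.get("description", "").split()) < min_description_words:
                warnings.append(
                    f"{namespace}.{fn['name']}: description is too short to search on"
                )
    return warnings


namespaces = {
    "travel": [
        {"name": "search_flights", "description": "Search available flights between cities"},
        {"name": "book_hotel", "description": "Book hotel"},
    ]
}
for warning in lint_namespaces(namespaces):
    print(warning)
```

Word count is a crude proxy for clarity; a description can be long and still vague, so treat the output as a prompt for review.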
When to Use Each Approach
| Scenario | Approach |
|---|---|
| All tools known at request time | Hosted (server) |
| Tools depend on user/tenant context | Client-executed |
| External tool registry (MCP, plugin store) | Client-executed |
| Simple agent with 10-50 tools | Hosted (server) |
| Platform with 100+ tools per user | Client-executed |
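The table can be read as a small decision rule. The sketch below encodes it, with `choose_execution` as a hypothetical helper. The table says nothing about agents with 50 to 100 tools, so defaulting those to hosted search is an assumption.

```python
def choose_execution(
    tool_count: int,
    depends_on_user_context: bool = False,
    uses_external_registry: bool = False,
) -> str:
    """Return "client" or "server" following the table's rules of thumb."""
    if depends_on_user_context or uses_external_registry:
        return "client"
    if tool_count >= 100:
        return "client"
    # Everything else, including the unspecified 50-100 range, defaults to hosted.
    return "server"


print(choose_execution(tool_count=30))  # server
print(choose_execution(tool_count=30, depends_on_user_context=True))  # client
```

The return value matches the "execution" field used in the tool_search entries in this guide.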
Quick Reference
# Minimal hosted tool search setup
tools = [
    {"type": "tool_search", "execution": "server"},
    {"type": "function", "defer_loading": True, "function": {...}},
    {"type": "function", "defer_loading": True, "function": {...}},
]

# Minimal client tool search setup
tools = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search available tools",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}}
        }
    }
]
Key Takeaway
Tool search pays off for any agent with more than a dozen tools. Hosted search is the simplest starting point: add defer_loading: true to each optional tool and include a tool_search entry. Switch to client-executed search when you need dynamic, context-aware tool discovery. Either way, you load fewer tool definitions per request, which saves tokens and keeps tool selection consistent as the tool count grows.