Prompt Caching
Portkey makes Anthropic's prompt caching work on our OpenAI-compliant universal API. Just pass Anthropic's anthropic-beta header in your request, and set the cache_control param in your message body:
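As a minimal sketch, the request might be assembled like this. The header values follow Anthropic's prompt-caching beta convention, but the exact endpoint, header names, and message shape Portkey expects may differ from this illustration; check your Portkey configuration.

```python
# Sketch: building a prompt-caching request for an OpenAI-compliant gateway.
# Header names and body shape are illustrative, not a definitive reference.

def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Return headers and body for a request whose system prompt is cached.

    The system prompt carries a cache_control block so Anthropic writes it
    to the cache (it must meet the minimum token length noted below).
    """
    headers = {
        # Route the request to Anthropic through the gateway (assumed header).
        "x-portkey-provider": "anthropic",
        # Anthropic's beta flag that enables prompt caching.
        "anthropic-beta": "prompt-caching-2024-07-31",
        "Content-Type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": system_prompt,
                        # Mark this block as cacheable.
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_message},
        ],
    }
    return {"headers": headers, "body": body}
```

Sending this body (for example with your HTTP client of choice) caches the system prompt on the first call, so subsequent calls that reuse it read from the cache instead of reprocessing it.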
Anthropic currently places certain restrictions on prompt caching:
- Cache TTL is fixed at 5 minutes and cannot be changed
- The message you are caching must meet a minimum length to enable this feature:
  - 1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
  - 2048 tokens for Claude 3 Haiku
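A quick pre-check against these minimums can save a wasted cache attempt. The sketch below approximates token count at roughly 4 characters per token, which is only a heuristic; use a real tokenizer for accurate counts.

```python
# Rough pre-check that a prompt is long enough to be cacheable.
# The ~4 chars/token estimate is an assumption, not Anthropic's tokenizer.

MIN_CACHE_TOKENS = {
    "claude-3-5-sonnet": 1024,
    "claude-3-opus": 1024,
    "claude-3-haiku": 2048,
}

def likely_cacheable(text: str, model: str) -> bool:
    """Return True if `text` probably meets the model's caching minimum."""
    approx_tokens = len(text) // 4  # crude heuristic
    return approx_tokens >= MIN_CACHE_TOKENS[model]
```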
For more, refer to Anthropic’s prompt caching documentation here.
Seeing Cache Results in Portkey
Portkey automatically calculates the correct pricing for your prompt caching requests & responses based on Anthropic's calculations here.
In the individual log for any request, you can also see the exact status of your request and verify whether it created a cache entry or was delivered from the cache, via two usage parameters:
- cache_creation_input_tokens: Number of tokens written to the cache when creating a new entry.
- cache_read_input_tokens: Number of tokens retrieved from the cache for this request.
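Those two fields make it easy to classify a response programmatically. A minimal sketch, assuming the usage object is available as a plain dict:

```python
# Classify a response's cache behaviour from its usage fields.
# `usage` is assumed to be the usage object from the response, as a dict.

def cache_status(usage: dict) -> str:
    """Return a human-readable cache status for one request."""
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "served from cache"
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "cache entry created"
    return "not cached"
```

For example, a first request with a long cacheable prompt reports a nonzero cache_creation_input_tokens, while a repeat within the 5-minute TTL reports nonzero cache_read_input_tokens instead.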