The Portkey x LlamaIndex integration brings advanced AI gateway capabilities, full-stack observability, and prompt management to apps built on LlamaIndex.
In a nutshell, Portkey extends the familiar OpenAI schema to make Llamaindex work with 200+ LLMs without the need for importing different classes for each provider or having to configure your code separately. Portkey makes your Llamaindex apps reliable, fast, and cost-efficient.
Getting Started
1. Install the Portkey SDK
pipinstall-Uportkey-ai
2. Import the necessary classes and functions
Import the OpenAI class in Llamaindex as you normally would, along with Portkey's helper functions createHeaders and PORTKEY_GATEWAY_URL
from llama_index.llms.openai import OpenAIfrom portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
3. Configure model details
Configure your model details using Portkey's Config object schema. This is where you can define the provider and model name, model parameters, set up fallbacks, retries, and more.
Here are basic integrations examples on using the complete and chat methods with streaming on & off.
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"openai","api_key":"YOUR_OPENAI_API_KEY","override_params":{"model":"gpt-4o","max_tokens":64}}#### You can also reference a saved Config ######## config = "pc-anthropic-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)##### Streaming Mode #####resp = portkey.stream_chat(messages)for r in resp:print(r.delta, end="")
assistant: Arrr, matey! They call me Captain Barnacle Bill, the most colorful pirate to ever sail the seven seas! With a parrot on me shoulder and a treasure map in me hand, I'm always ready for adventure! What be yer name, landlubber?
from llama_index.llms.openai import OpenAIfrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"openai","api_key":"YOUR_OPENAI_API_KEY","override_params":{"model":"gpt-4o","max_tokens":64}}#### You can also reference a saved Config ######## config = "pc-anthropic-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))resp=portkey.complete("Paul Graham is ")print(resp)##### Streaming Mode #####resp=portkey.stream_complete("Paul Graham is ")for r in resp:print(r.delta, end="")
a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham is also a prolific writer and has published essays on a wide range of topics, including startups, technology, and education.
import asynciofrom llama_index.llms.openai import OpenAIfrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"openai","api_key":"YOUR_OPENAI_API_KEY","override_params":{"model":"gpt-4o","max_tokens":64}}#### You can also reference a saved Config ######## config = "pc-anthropic-xx"asyncdefmain(): portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="PORTKEY_API_KEY", config=config ) ) resp =await portkey.acomplete("Paul Graham is ")print(resp)##### Streaming Mode ##### resp =await portkey.astream_complete("Paul Graham is ")asyncfor delta in resp:print(delta.delta, end="")asyncio.run(main())
Enabling Portkey Features
By routing your LlamaIndex requests through Portkey, you get access to the following production-grade features:
Interoperability: Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
Caching: Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
Reliability: Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, and request timeouts.
Observability: Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.
Prompt Management: Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and seamlessly retrieve them in your LlamaIndex app for easy integration.
Continuous Improvement: Improve your LlamaIndex app by capturing qualitative & quantitative user feedback on your requests.
Security & Compliance: Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
Much of these features are driven by Portkey's Config architecture. On the Portkey app, we make it easy to help you create, manage, and version your Configs so that you can reference them easily in Llamaindex.
Saving Configs in the Portkey App
Head over to the Configs tab in Portkey app where you can save various provider Configs along with the reliability and caching features. Each Config has an associated slug that you can reference in your Llamaindex code.
Overriding a Saved Config
If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify certain parts of it before making a request, you can easily achieve this using Portkey's Configs API. This approach allows you to leverage the convenience of saved Configs while still having the flexibility to adapt them to your specific needs.
Here's an example of how you can fetch a saved Config using the Configs API and override the model parameter:
We define a helper function get_customized_config that takes a config_slug and a model as parameters.
Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided config_slug.
We extract the config object from the API response.
We update the model parameter in the override_params section of the Config with the provided custom_model.
Finally, we return the customized Config.
We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific model override is applied to the saved Config.
For more details on working with Configs in Portkey, refer to the Config documentation.
1. Interoperability - Calling Anthropic, Gemini, Mistral, and more
Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs - we'll show Anthropic, Gemini, and Mistral. For the full list of providers & LLMs supported, check out this doc.
Switching providers just requires changing 3 lines of code:
Change the provider name
Change the API key, and
Change the model name
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"anthropic","api_key":"YOUR_ANTHROPIC_API_KEY","override_params":{"model":"claude-3-opus-20240229","max_tokens":64}}#### You can also reference a saved Config ######## config = "pc-anthropic-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"google","api_key":"YOUR_GOOGLE_GEMINI_API_KEY","override_params":{"model":"gemini-1.5-flash-latest","max_tokens":64}}#### You can also reference a saved Config instead ######## config = "pc-gemini-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"mistral-ai","api_key":"YOUR_MISTRAL_AI_API_KEY","override_params":{"model":"codestral-latest","max_tokens":64}}#### You can also reference a saved Config instead ######## config = "pc-mistral-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
Calling Azure, Google Vertex, AWS Bedrock
We recommend saving your cloud details to Portkey vault and getting a corresponding Virtual Key.
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"virtual_key":"AZURE_OPENAI_PORTKEY_VIRTUAL_KEY"}#### You can also reference a saved Config instead ######## config = "pc-azure-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"virtual_key":"AWS_BEDROCK_PORTKEY_VIRTUAL_KEY"}#### You can also reference a saved Config instead ######## config = "pc-bedrock-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
Vertex AI uses OAuth2 to authenticate its requests, so you need to send the access token additionally along with the request - you can do this while by sending it as the api_key in the OpenAI client.
Run gcloud auth print-access-token in your terminal to get your Vertex AI access token.
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"virtual_key":"VERTEX_AI_PORTKEY_VIRTUAL_KEY"}#### You can also reference a saved Config instead ######## config = "pc-vertex-xx"portkey =OpenAI( api_key="YOUR_VERTEX_AI_ACCESS_TOKEN", # Get by running gcloud auth print-access-token in terminal api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
Calling Local or Privately Hosted Models like Ollama
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ={"provider":"ollama","custom_host":"https://7cc4-3-235-157-146.ngrok-free.app",# Your Ollama ngrok URL"override_params":{"model":"llama3"}}#### You can also reference a saved Config instead ######## config = "pc-azure-xx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, or set request timeouts - all set through Configs.
Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more.
Using Portkey, you can also send custom metadata with each of your requests to further segment your logs for better analytics. Similarly, you can also trace multiple requests to a single trace ID and filter or view them separately in Portkey logs.
Custom Metadata and Trace ID information is sent in default_headers.
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ="pc-xxxx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config, metadata={"_user": "USER_ID","environment": "production","session_id": "1729" } ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
from llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeadersconfig ="pc-xxxx"portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key="YOUR_PORTKEY_API_KEY", config=config, trace_id="YOUR_TRACE_ID_HERE" ))messages = [ChatMessage(role="system", content="You are a pirate with a colorful personality"),ChatMessage(role="user", content="What is your name"),]resp = portkey.chat(messages)print(resp)
Portkey shows these details separately for each log:
Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:
Store Prompts with Access Control and Version Control: Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.
Parameterize Prompts: Define variables and mustache-approved tags within your prompts, allowing for dynamic value insertion when calling LLMs. This enables greater flexibility and reusability of your prompts.
Experiment in a Sandbox Environment: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.
Here's how you can leverage Portkey's Prompt Management in your LlamaIndex application:
Create your prompt template on the Portkey app, and save it to get an associated Prompt ID
Before making a Llamaindex request, render the prompt template using the Portkey SDK
Transform the retrieved prompt to be compatible with LlamaIndex and send the request!
Example: Using a Portkey Prompt Template in LlamaIndex
import jsonimport osfrom llama_index.llms.openai import OpenAIfrom llama_index.core.llms import ChatMessagefrom portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey### Initialize Portkey client with API keyclient =Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))### Render the prompt template with your prompt ID and variablesprompt_template = client.prompts.render( prompt_id="pp-prompt-id", variables={ "movie":"Dune 2" }).data.dict()config ={"virtual_key":"GROQ_VIRTUAL_KEY",# You need to send the virtual key separately"override_params":{"model":prompt_template["model"],# Set the model name based on the value in the prompt template"temperature":prompt_template["temperature"]# Similarly, you can also set other model params}}portkey =OpenAI( api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders( api_key=os.environ.get("PORTKEY_API_KEY"), config=config ))### Transform the rendered prompt into LlamaIndex-compatible formatmessages = [ChatMessage(content=msg["content"], role=msg["role"])for msg in prompt_template["messages"]]resp = portkey.chat(messages)print(resp)
Now that you know how to trace & log your Llamaindex requests to Portkey, you can also start capturing user feedback to improve your app!
You can append qualitative as well as quantitative feedback to any trace ID with the portkey.feedback.create method:
from portkey_ai import Portkeyportkey =Portkey( api_key="PORTKEY_API_KEY")feedback = portkey.feedback.create( trace_id="YOUR_LLAMAINDEX_TRACE_ID", value=5, # Integer between -10 and 10 weight=1, # Optional metadata={# Pass any additional context here like comments, _user and more })print(feedback)
When you onboard more team members to help out on your Llamaindex app - permissioning, budgeting, and access management can become a mess! Using Portkey, you can set budget limits on provide API keys and implement fine-grained user roles and permissions to:
Control access: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.
Manage costs: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.
Ensure compliance: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.
Simplify onboarding: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.
Monitor usage: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.