Multimodal Capabilities
Speech-to-Text
Portkey’s AI gateway supports speech-to-text (STT) models such as OpenAI’s Whisper.
Transcription & Translation Usage
Portkey supports both Transcription and Translation methods for STT models and follows the OpenAI signature, where you send the file (in flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm format) as part of the API request.
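Because only the formats listed above are accepted, it can help to check a file before sending it. The following is a minimal sketch, not part of Portkey's SDK: the names `SUPPORTED_AUDIO_FORMATS` and `isSupportedAudioFile` are hypothetical, and the check looks only at the file extension, not at the actual audio encoding.

```javascript
// Formats listed above as accepted for STT requests.
const SUPPORTED_AUDIO_FORMATS = new Set([
  "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
]);

// Hypothetical helper: true when the file name's extension is in the list.
// It inspects the name only; it does not verify the audio content.
function isSupportedAudioFile(filePath) {
  // Take the last path segment, handling both "/" and "\" separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".mp3"
  return SUPPORTED_AUDIO_FORMATS.has(fileName.slice(dot + 1).toLowerCase());
}

console.log(isSupportedAudioFile("/path/to/file.mp3"));  // true
console.log(isSupportedAudioFile("/path/to/notes.txt")); // false
```

A check like this catches obvious mistakes on the client side; the provider still makes the final decision on whether a file is valid.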
Here’s an example using the OpenAI NodeJS SDK:
import fs from "fs";
import OpenAI from "openai";
import { PORTKEY_GATEWAY_URL, createHeaders } from "portkey-ai";

const openai = new OpenAI({
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    apiKey: "PORTKEY_API_KEY",
    virtualKey: "OPENAI_VIRTUAL_KEY"
  })
});

// Transcription
async function transcribe() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("/path/to/file.mp3"),
    model: "whisper-1",
  });
  console.log(transcription.text);
}
transcribe();

// Translation
async function translate() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("/path/to/file.mp3"),
    model: "whisper-1",
  });
  console.log(translation.text);
}
translate();
On completion, the request is logged in the logs UI, where you can see the transcribed or translated text along with the cost and latency incurred.
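Since the gateway follows the OpenAI signature, optional OpenAI transcription parameters such as `language` and `response_format` can presumably be sent in the same request. The sketch below assumes they are forwarded unchanged, which this page does not confirm. The wrapper `transcribeWithOptions` and its `options` fields are hypothetical names, and `client` stands for an OpenAI client configured with the Portkey base URL and headers as in the example above.

```javascript
// Sketch: pass optional OpenAI-style parameters alongside the required ones.
// `client` is any object exposing audio.transcriptions.create, for example
// the `openai` instance configured with Portkey headers.
async function transcribeWithOptions(client, file, options = {}) {
  const params = { file, model: options.model ?? "whisper-1" };

  // Include optional fields only when the caller set them.
  if (options.language) {
    params.language = options.language; // ISO-639-1 code, e.g. "en"
  }
  if (options.responseFormat) {
    params.response_format = options.responseFormat; // e.g. "verbose_json"
  }

  return client.audio.transcriptions.create(params);
}

// Usage (assumes `fs` and `openai` are set up as in the example above):
// transcribeWithOptions(openai, fs.createReadStream("/path/to/file.mp3"), {
//   language: "en",
//   responseFormat: "verbose_json",
// }).then((result) => console.log(result.text));
```

Taking the client as an argument keeps the helper independent of how the gateway headers are configured.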
Supported Providers and Models
The following providers are supported for speech-to-text, with more being added soon. To have a model or provider added to the AI gateway, please raise a request or open a PR.
Provider | Models | Functions |
---|---|---|
OpenAI | whisper-1 | Transcription, Translation |