Step: Speech to Text

Key: speech-to-text
Category: AI & Language Models
Description: Transcribe base64-encoded audio to text.

Inputs

base64Key (select, optional): Context key containing audio as a data URL or raw base64 string.
base64 (long-text, optional): Inline audio as a data URL or raw base64 string.

Provide either base64Key or base64.
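
Both inputs accept either form, so it can help to see how the two differ. The following is a minimal sketch, not the runtime's actual parsing logic: decode_audio_input is a hypothetical helper name, and the assumption is that a data URL carries a ";base64" marker before the comma while a raw string is just the base64 payload.

```python
import base64
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration; the step's real parsing may differ.
_DATA_URL = re.compile(
    r"^data:(?P<mime>[^;,]*)(?P<params>(?:;[^,]*)*),(?P<payload>.*)$",
    re.DOTALL,
)


def decode_audio_input(value: str) -> Tuple[Optional[str], bytes]:
    """Return (mime_type_or_None, audio_bytes) for a data URL or raw base64."""
    value = value.strip()
    match = _DATA_URL.match(value)
    if match:
        # Only base64-encoded data URLs make sense for binary audio.
        if ";base64" not in match.group("params"):
            raise ValueError("data URL must be base64-encoded")
        mime = match.group("mime") or None
        payload = match.group("payload")
    else:
        # Raw base64: no MIME type is available from the string itself.
        mime, payload = None, value
    return mime, base64.b64decode(payload, validate=True)
```

A raw base64 string carries no MIME type, so a data URL is the safer choice when the audio format matters.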

Outputs

language (string): Detected language when returned by the transcription provider.
duration (number): Audio duration in seconds when returned by the provider.
text (long-text): Transcribed text.

Notes

The runtime currently uses the configured Together transcription provider.
Data URL inputs can include MIME types such as audio/mpeg, audio/wav, audio/webm, audio/ogg, or audio/mp4.
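
To prepare a value for base64 or for the context key named in base64Key, audio bytes can be wrapped in a data URL. This is a sketch under stated assumptions: to_data_url and MIME_BY_SUFFIX are hypothetical names, and the suffix mapping covers only the MIME types listed above, paired with their common file extensions.

```python
import base64
from pathlib import Path

# Assumed mapping from common file extensions to the MIME types listed above.
MIME_BY_SUFFIX = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
    ".ogg": "audio/ogg",
    ".m4a": "audio/mp4",
}


def to_data_url(audio: bytes, mime_type: str) -> str:
    """Wrap raw audio bytes in a base64 data URL."""
    payload = base64.b64encode(audio).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def file_to_data_url(path: str) -> str:
    """Read an audio file and build a data URL based on its extension."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"unrecognized audio extension: {suffix!r}")
    return to_data_url(file_path.read_bytes(), MIME_BY_SUFFIX[suffix])
```

The resulting string can be pasted into base64 directly or stored in the context under the key that base64Key points to.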
Pair this step with an agent, ask-to-llm, or classifier step to process the transcript.