## Install

```bash
bun add -g @crafter/trx
```
## Setup

Run `trx init` to install dependencies and download a Whisper model:

```bash
trx init
```

This will:

- Check and install `whisper-cli`, `yt-dlp`, and `ffmpeg`
- Let you choose a Whisper model size
- Optionally install the Claude Code agent skill
## Transcribe

Paste a URL or a path to a local file:

```bash
# YouTube video
trx "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Local file
trx recording.mp4

# With language override
trx podcast.mp3 --language es
```

Output: `.txt` (plain text) and `.srt` (subtitles with timestamps).
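For reference, an `.srt` file pairs each numbered cue with a timestamp range (the content below is illustrative, not actual trx output):

```text
1
00:00:00,000 --> 00:00:03,200
Welcome to the show.

2
00:00:03,200 --> 00:00:07,500
Today we're talking about transcription.
```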
## OpenAI API (optional)

For faster transcription without local models:

```bash
export OPENAI_API_KEY="sk-..."
trx init --backend openai
trx recording.mp4 -b openai
```

## trx transcribe
Transcribe audio/video from a URL or a local file.

```bash
trx transcribe <input> [flags]
```

The `transcribe` subcommand is optional: `trx <input>` works the same way.
### Flags

| Flag | Description | Default |
|---|---|---|
| `-b, --backend` | Transcription backend (`local` or `openai`) | from config |
| `-l, --language` | ISO 639-1 language code | `auto` |
| `-m, --model` | Override model size | from config |
| `-w, --words` | Word-level timestamps in SRT | `false` |
| `--output-dir` | Directory for output files | `.` |
| `--fields` | Limit output: `text`, `srt`, `metadata`, `files` | all |
| `--dry-run` | Show execution plan without running | `false` |
| `--no-download` | Skip yt-dlp (input must be local) | `false` |
| `--no-clean` | Skip ffmpeg audio cleaning | `false` |
| `--json` | Raw JSON payload for agents | — |
| `-o, --output` | Output format: `json`, `table`, `auto` | `auto` |
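A minimal sketch of consuming `--output json` results downstream. The field names here (`text`, `files`) mirror the `--fields` values above, but the exact payload shape is an assumption, not a documented contract:

```python
import json

# Hypothetical payload as trx might emit with `--output json --fields text,files`.
# Field names follow the --fields values; the actual shape may differ.
raw = '{"text": "Welcome to the show.", "files": ["recording.txt", "recording.srt"]}'
payload = json.loads(raw)

transcript = payload["text"]    # plain transcript text
outputs = payload["files"]      # paths to generated output files

print(transcript)
print(outputs)
```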
### Models

Local (`whisper-cli`):

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny` | ~75 MB | Fastest | Lowest |
| `base` | ~142 MB | Fast | Decent |
| `small` | ~466 MB | Balanced | Good (recommended) |
| `medium` | ~1.5 GB | Slow | High |
| `large` | ~3 GB | Slowest | Best |
| `large-v3-turbo` | ~1.6 GB | Fast | Near-large |
OpenAI API:

| Model | Cost | Notes |
|---|---|---|
| `gpt-4o-transcribe` | $2.50/hr | Best accuracy |
| `gpt-4o-mini-transcribe` | $0.60/hr | Fastest, cheapest |
| `whisper-1` | $0.36/hr | Legacy, segment timestamps |
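The per-hour prices above make cost estimates straightforward. A rough calculator, with prices hard-coded from the table (verify against current OpenAI pricing before relying on it):

```python
# Per-hour prices (USD) copied from the table above; subject to change.
PRICES_PER_HOUR = {
    "gpt-4o-transcribe": 2.50,
    "gpt-4o-mini-transcribe": 0.60,
    "whisper-1": 0.36,
}

def estimate_cost(model: str, duration_minutes: float) -> float:
    """Estimated transcription cost in USD for a given audio duration."""
    return round(PRICES_PER_HOUR[model] * duration_minutes / 60, 2)

print(estimate_cost("gpt-4o-mini-transcribe", 90))  # 90-minute file -> 0.9
```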
### Examples

```bash
# Transcribe a YouTube video
trx "https://youtube.com/watch?v=abc"

# Spanish podcast with word timestamps
trx podcast.mp3 -l es -w

# OpenAI API with a specific model
trx meeting.m4a -b openai -m gpt-4o-mini-transcribe

# JSON output for piping
trx video.mp4 --output json --fields text

# Dry run to preview
trx video.mp4 --dry-run --output json
```
## trx init

Install dependencies and configure the transcription backend.

```bash
trx init [flags]
```

### Flags

| Flag | Description | Default |
|---|---|---|
| `-b, --backend` | Backend: `local` or `openai` | `local` |
| `-m, --model` | Model to download/configure | `small` |
| `-l, --language` | Default language | `auto` |
### What it does

Local backend:

- Installs `whisper-cli`, `yt-dlp`, and `ffmpeg` via your OS package manager
- Downloads the selected Whisper model from Hugging Face
- Saves config to `~/.trx/config.json`

OpenAI backend:

- Validates that `OPENAI_API_KEY` is set
- Installs `yt-dlp` and `ffmpeg` (still needed for download/clean)
- Saves config with the selected OpenAI model
## trx doctor

Health check for all dependencies and configuration.

```bash
trx doctor [--output json]
```

Shows: installed dependencies, versions, config path, model status, backend, API key.
## trx schema

Runtime introspection for agents. Returns the JSON schema of any command.

```bash
trx schema transcribe
trx schema init
```

Agents use this to discover available flags and their types without reading docs.
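A sketch of how an agent might enumerate flags from `trx schema transcribe` output. The schema shape shown here (an object with a `flags` array) is an assumption for illustration, not the documented format:

```python
import json

# Hypothetical schema payload; the real shape returned by `trx schema` may differ.
raw = """
{
  "command": "transcribe",
  "flags": [
    {"name": "--backend", "type": "string", "default": "local"},
    {"name": "--words", "type": "boolean", "default": false}
  ]
}
"""
schema = json.loads(raw)

# List each flag with its type and default, as an agent would before composing a call.
for flag in schema["flags"]:
    print(f'{flag["name"]}: {flag["type"]} (default: {flag["default"]})')
```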
## What is the agent skill?

trx ships with a `SKILL.md` that teaches Claude Code how to use the CLI and post-process transcription results. When installed, agents can:
- Transcribe URLs and files autonomously
- Fix common Whisper mistakes (proper nouns, technical terms)
- Extract structured data from transcripts
- Generate summaries, translations, and subtitles
### Install

```bash
npx skills add crafter-station/trx -g
```

Or accept the skill installation prompt during `trx init`.
### How it works

- Agent calls `trx schema transcribe` to discover available flags
- Agent runs `trx <input> --output json` to get structured output
- Agent reads the `.txt` file and applies corrections
- Agent can chain with other tools (translation, summarization)
### Example agent workflow

User: "Transcribe this video and fix any technical terms"

Agent:

```text
1. trx schema transcribe             # discover flags
2. trx "https://..." --output json   # transcribe
3. Read output .txt file             # get raw text
4. Fix "reakt" → "React", etc.       # post-process
5. Write corrected file              # save result
```
### Self-correction patterns

The skill teaches agents to watch for:
- Proper nouns: Brand names, people, places
- Technical terms: Programming languages, frameworks, APIs
- Homophones: “their/there/they’re”, “your/you’re”
- Filler removal: “um”, “uh”, “like” (optional)
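The post-processing step can be as simple as a substitution table. A minimal sketch, where the correction entries are illustrative (real entries would come from the transcript's context):

```python
import re

# Illustrative corrections an agent might apply after transcription.
CORRECTIONS = {
    "reakt": "React",
    "java script": "JavaScript",
    "get hub": "GitHub",
}

def fix_terms(text: str) -> str:
    """Apply whole-word, case-insensitive replacements to a transcript."""
    for wrong, right in CORRECTIONS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    return text

print(fix_terms("We built the app with reakt and java script."))
# We built the app with React and JavaScript.
```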
## Config file

trx stores its configuration at `~/.trx/config.json`, created automatically by `trx init`.

```json
{
  "backend": "local",
  "modelPath": "~/.trx/models/ggml-small.bin",
  "modelSize": "small",
  "language": "auto",
  "threads": 8,
  "wordTimestamps": false,
  "openai": {
    "model": "gpt-4o-transcribe"
  },
  "whisperFlags": {
    "suppressNst": true,
    "noFallback": true,
    "entropyThold": 2.8,
    "logprobThold": -1.0,
    "maxContext": 0
  }
}
```
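CLI flags take precedence over these config values at runtime. A sketch of that precedence as a merge of the parsed config with any non-empty flags (the helper name is hypothetical, not trx's actual implementation):

```python
# Hypothetical helper illustrating flag-over-config precedence.
def resolve_options(config: dict, cli_flags: dict) -> dict:
    """CLI flags win over config values; unset flags (None) fall through."""
    resolved = dict(config)
    resolved.update({k: v for k, v in cli_flags.items() if v is not None})
    return resolved

config = {"backend": "local", "modelSize": "small", "language": "auto"}
flags = {"backend": "openai", "language": None}  # e.g. `trx video.mp4 -b openai`

print(resolve_options(config, flags))
# {'backend': 'openai', 'modelSize': 'small', 'language': 'auto'}
```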
### Fields

| Field | Type | Description |
|---|---|---|
| `backend` | `"local" \| "openai"` | Active transcription backend |
| `modelPath` | `string` | Path to the local Whisper model file |
| `modelSize` | `string` | Model size identifier |
| `language` | `string` | Default language (`"auto"` for detection) |
| `threads` | `number` | CPU threads for local transcription |
| `wordTimestamps` | `boolean` | Enable word-level SRT by default |
| `openai.model` | `string` | Default OpenAI model |
| `whisperFlags` | `object` | Advanced whisper-cli flags |
### Environment variables

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | For the OpenAI backend | Your OpenAI API key |
### Models directory

Downloaded models are stored in `~/.trx/models/`. Each model is a `.bin` file downloaded from Hugging Face.
### Override per-command

Any config value can be overridden via CLI flags:

```bash
# Override backend
trx video.mp4 --backend openai

# Override model
trx video.mp4 --model large-v3-turbo

# Override language
trx video.mp4 --language es
```