Install

bun add -g @crafter/trx

Setup

Run trx init to install dependencies and download a Whisper model:

trx init

This will:

  1. Check and install whisper-cli, yt-dlp, and ffmpeg
  2. Let you choose a Whisper model size
  3. Optionally install the Claude Code agent skill
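
If you'd rather skip the prompts, the same choices can be passed up front via the init flags documented below (assuming the flags preset the corresponding prompt answers):

# non-interactive setup: local backend, small model, auto language detection
trx init -b local -m small -l auto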

Transcribe

Pass a URL or a local file path:

# YouTube video
trx "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Local file
trx recording.mp4

# With language override
trx podcast.mp3 --language es

Output: .txt (plain text) and .srt (subtitles with timestamps).
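
Assuming output files are named after the input (not confirmed here) and written to the output directory (the current directory by default), a run would produce something like:

trx recording.mp4
# hypothetical result, assuming input-based naming:
#   ./recording.txt
#   ./recording.srt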

OpenAI API (optional)

For faster transcription without local models:

export OPENAI_API_KEY="sk-..."
trx init --backend openai
trx recording.mp4 -b openai

trx transcribe

Transcribe audio/video from a URL or local file.

trx transcribe <input> [flags]

The transcribe subcommand is optional — trx <input> works the same way.

Flags

Flag            Description                                Default
-b, --backend   Transcription backend (local or openai)   from config
-l, --language  ISO 639-1 language code                    auto
-m, --model     Override model size                        from config
-w, --words     Word-level timestamps in SRT               false
--output-dir    Directory for output files                 .
--fields        Limit output: text,srt,metadata,files      all
--dry-run       Show execution plan without running        false
--no-download   Skip yt-dlp (input must be local)          false
--no-clean      Skip ffmpeg audio cleaning                 false
--json          Raw JSON payload for agents
-o, --output    Output format: json, table, auto           auto
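
Flags combine as expected; for example, to emit only the plain-text transcript into a separate directory and skip the cleaning pass:

trx talk.mp4 --fields text --output-dir ./transcripts --no-clean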

Models

Local (whisper-cli):

Model           Size     Speed     Accuracy
tiny            ~75 MB   Fastest   Lowest
base            ~142 MB  Fast      Decent
small           ~466 MB  Balanced  Good (recommended)
medium          ~1.5 GB  Slow      High
large           ~3 GB    Slowest   Best
large-v3-turbo  ~1.6 GB  Fast      Near-large
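
To change the local model after setup, re-run init with -m to download and set a new default, or override a single run:

# download and set large-v3-turbo as the default
trx init -m large-v3-turbo

# or use it once
trx video.mp4 -m large-v3-turbo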

OpenAI API:

Model                   Cost      Notes
gpt-4o-transcribe       $2.50/hr  Best accuracy
gpt-4o-mini-transcribe  $0.60/hr  Fastest, cheapest
whisper-1               $0.36/hr  Legacy, segment timestamps
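
At these rates, a two-hour recording costs about $5.00 with gpt-4o-transcribe and about $1.20 with gpt-4o-mini-transcribe. Select a model per run with -m:

# legacy model, segment-level timestamps only
trx meeting.m4a -b openai -m whisper-1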

Examples

# Transcribe YouTube video
trx "https://youtube.com/watch?v=abc"

# Spanish podcast with word timestamps
trx podcast.mp3 -l es -w

# OpenAI API with specific model
trx meeting.m4a -b openai -m gpt-4o-mini-transcribe

# JSON output for piping
trx video.mp4 --output json --fields text

# Dry run to preview
trx video.mp4 --dry-run --output json

trx init

Install dependencies and configure the transcription backend.

trx init [flags]

Flags

Flag            Description                  Default
-b, --backend   Backend: local or openai     local
-m, --model     Model to download/configure  small
-l, --language  Default language             auto
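
For example, a one-step OpenAI setup (assuming -m accepts an OpenAI model name when the backend is openai):

trx init -b openai -m gpt-4o-mini-transcribe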

What it does

Local backend:

  1. Installs whisper-cli, yt-dlp, and ffmpeg via your OS package manager
  2. Downloads the selected Whisper model from Hugging Face
  3. Saves config to ~/.trx/config.json

OpenAI backend:

  1. Validates OPENAI_API_KEY is set
  2. Installs yt-dlp and ffmpeg (still needed for download/clean)
  3. Saves config with selected OpenAI model

trx doctor

Health check for all dependencies and configuration.

trx doctor [--output json]

Shows installed dependencies and their versions, the config path, model status, the active backend, and API key status.
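
The JSON output lends itself to scripting; the field name below is illustrative, not a documented part of the payload:

# read the active backend ("backend" key is an assumption)
trx doctor --output json | jq -r '.backend'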


trx schema

Runtime introspection for agents. Returns the JSON schema of any command.

trx schema transcribe
trx schema init

Agents use this to discover available flags and their types without reading docs.
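
A sketch of how that discovery step might look from a shell, assuming the returned schema exposes its flags under a top-level key:

# ".flags" is an assumed key, shown for illustration
trx schema transcribe | jq '.flags'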

What is the agent skill?

trx ships with a SKILL.md that teaches Claude Code how to use the CLI and post-process transcription results. When installed, agents can:

  • Transcribe URLs and files autonomously
  • Fix common Whisper mistakes (proper nouns, technical terms)
  • Extract structured data from transcripts
  • Generate summaries, translations, and subtitles

Install

npx skills add crafter-station/trx -g

Or during trx init, accept the skill installation prompt.

How it works

  1. Agent calls trx schema transcribe to discover available flags
  2. Agent runs trx <input> --output json to get structured output
  3. Agent reads the .txt file and applies corrections
  4. Agent can chain with other tools (translation, summarization)

Example agent workflow

User: "Transcribe this video and fix any technical terms"

Agent:
1. trx schema transcribe          # discover flags
2. trx "https://..." --output json # transcribe
3. Read output .txt file           # get raw text
4. Fix "reakt" → "React", etc.    # post-process
5. Write corrected file            # save result

Self-correction patterns

The skill teaches agents to watch for:

  • Proper nouns: Brand names, people, places
  • Technical terms: Programming languages, frameworks, APIs
  • Homophones: “their/there/they’re”, “your/you’re”
  • Filler removal: “um”, “uh”, “like” (optional)
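
A minimal sketch of such a correction pass, assuming simple string replacements on the .txt output are enough (GNU sed syntax):

# hypothetical fixes: a misheard proper noun and a filler word
sed -e 's/\breakt\b/React/g' -e 's/\bum, //g' video.txt > video.fixed.txt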

Config file

trx stores configuration at ~/.trx/config.json. Created automatically by trx init.

{
  "backend": "local",
  "modelPath": "~/.trx/models/ggml-small.bin",
  "modelSize": "small",
  "language": "auto",
  "threads": 8,
  "wordTimestamps": false,
  "openai": {
    "model": "gpt-4o-transcribe"
  },
  "whisperFlags": {
    "suppressNst": true,
    "noFallback": true,
    "entropyThold": 2.8,
    "logprobThold": -1.0,
    "maxContext": 0
  }
}
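
Because the file is plain JSON, it can also be edited with standard tools rather than re-running trx init; for example, with jq:

# raise the CPU thread count (writes via a temp file)
jq '.threads = 12' ~/.trx/config.json > /tmp/trx-config.json && mv /tmp/trx-config.json ~/.trx/config.json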

Fields

Field           Type                Description
backend         "local" | "openai"  Active transcription backend
modelPath       string              Path to the local Whisper model file
modelSize       string              Model size identifier
language        string              Default language ("auto" for detection)
threads         number              CPU threads for local transcription
wordTimestamps  boolean             Enable word-level SRT by default
openai.model    string              Default OpenAI model
whisperFlags    object              Advanced whisper-cli flags

Environment variables

Variable        Required            Description
OPENAI_API_KEY  For OpenAI backend  Your OpenAI API key

Models directory

Downloaded models are stored at ~/.trx/models/. Each model is a .bin file downloaded from Hugging Face.
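
Going by the modelPath in the sample config above, a small-model install would look like:

ls ~/.trx/models/
# ggml-small.bin   (name from the sample config; other sizes presumably follow the same ggml-<size>.bin pattern)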

Override per-command

Any config value can be overridden via CLI flags:

# Override backend
trx video.mp4 --backend openai

# Override model
trx video.mp4 --model large-v3-turbo

# Override language
trx video.mp4 --language es