Install

bun add -g @crafter/trx

Setup

Run trx init to install dependencies and download a Whisper model:

trx init

This will:

  1. Check and install whisper-cli, yt-dlp, and ffmpeg
  2. Let you choose a Whisper model size
  3. Optionally install the Claude Code agent skill
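
If you'd rather skip the prompts, the same choices can be passed up front via the init flags documented below (assuming the flags preset the corresponding prompt answers):

# non-interactive setup: local backend, small model, auto language detection
trx init -b local -m small -l auto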

Transcribe

Pass a URL or a local file path:

# YouTube video
trx "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Local file
trx recording.mp4

# With language override
trx podcast.mp3 --language es

Output: .txt (plain text) and .srt (subtitles with timestamps).
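
Assuming output files are named after the input (not confirmed here) and written to the output directory (the current directory by default), a run would produce something like:

trx recording.mp4
# hypothetical result, assuming input-based naming:
#   ./recording.txt
#   ./recording.srt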

OpenAI API (optional)

For faster transcription without local models:

export OPENAI_API_KEY="sk-..."
trx init --backend openai
trx recording.mp4 -b openai

trx transcribe

Transcribe audio/video from a URL or local file.

trx transcribe <input> [flags]

The transcribe subcommand is optional — trx <input> works the same way.

Flags

Flag            Description                                Default
-b, --backend   Transcription backend (local or openai)   from config
-l, --language  ISO 639-1 language code                    auto
-m, --model     Override model size                        from config
-w, --words     Word-level timestamps in SRT               false
--output-dir    Directory for output files                 .
--fields        Limit output: text,srt,metadata,files      all
--dry-run       Show execution plan without running        false
--no-download   Skip yt-dlp (input must be local)          false
--no-clean      Skip ffmpeg audio cleaning                 false
--json          Raw JSON payload for agents
-o, --output    Output format: json, table, auto           auto
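
Flags combine as expected; for example, to emit only the plain-text transcript into a separate directory and skip the cleaning pass:

trx talk.mp4 --fields text --output-dir ./transcripts --no-clean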

Models

Local (whisper-cli):

Model           Size     Speed     Accuracy
tiny            ~75 MB   Fastest   Lowest
base            ~142 MB  Fast      Decent
small           ~466 MB  Balanced  Good (recommended)
medium          ~1.5 GB  Slow      High
large           ~3 GB    Slowest   Best
large-v3-turbo  ~1.6 GB  Fast      Near-large
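
To change the local model after setup, re-run init with -m to download and set a new default, or override a single run:

# download and set large-v3-turbo as the default
trx init -m large-v3-turbo

# or use it once
trx video.mp4 -m large-v3-turbo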

OpenAI API:

Model                   Cost      Notes
gpt-4o-transcribe       $2.50/hr  Best accuracy
gpt-4o-mini-transcribe  $0.60/hr  Fastest, cheapest
whisper-1               $0.36/hr  Legacy, segment timestamps
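
At these rates, a two-hour recording costs about $5.00 with gpt-4o-transcribe and about $1.20 with gpt-4o-mini-transcribe. Select a model per run with -m:

# legacy model, segment-level timestamps only
trx meeting.m4a -b openai -m whisper-1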

Examples

# Transcribe YouTube video
trx "https://youtube.com/watch?v=abc"

# Spanish podcast with word timestamps
trx podcast.mp3 -l es -w

# OpenAI API with specific model
trx meeting.m4a -b openai -m gpt-4o-mini-transcribe

# JSON output for piping
trx video.mp4 --output json --fields text

# Dry run to preview
trx video.mp4 --dry-run --output json

trx init

Install dependencies and configure the transcription backend.

trx init [flags]

Flags

Flag            Description                  Default
-b, --backend   Backend: local or openai     local
-m, --model     Model to download/configure  small
-l, --language  Default language             auto
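
For example, a one-step OpenAI setup (assuming -m accepts an OpenAI model name when the backend is openai):

trx init -b openai -m gpt-4o-mini-transcribe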

What it does

Local backend:

  1. Installs whisper-cli, yt-dlp, and ffmpeg via your OS package manager
  2. Downloads the selected Whisper model from Hugging Face
  3. Saves config to ~/.trx/config.json

OpenAI backend:

  1. Validates OPENAI_API_KEY is set
  2. Installs yt-dlp and ffmpeg (still needed for download/clean)
  3. Saves config with selected OpenAI model

trx doctor

Health check for all dependencies and configuration.

trx doctor [--output json]

Shows installed dependencies and their versions, the config path, model status, the active backend, and API key status.
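
The JSON output lends itself to scripting; the field name below is illustrative, not a documented part of the payload:

# read the active backend ("backend" key is an assumption)
trx doctor --output json | jq -r '.backend'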


trx schema

Runtime introspection for agents. Returns the JSON schema of any command.

trx schema transcribe
trx schema init

Agents use this to discover available flags and their types without reading docs.
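
A sketch of how that discovery step might look from a shell, assuming the returned schema exposes its flags under a top-level key:

# ".flags" is an assumed key, shown for illustration
trx schema transcribe | jq '.flags'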

What is the agent skill?

trx ships with a SKILL.md that teaches Claude Code how to use the CLI and post-process transcription results. When installed, agents can:

  • Transcribe URLs and files autonomously
  • Fix common Whisper mistakes (proper nouns, technical terms)
  • Extract structured data from transcripts
  • Generate summaries, translations, and subtitles

Install

npx skills add crafter-station/trx -g

Or during trx init, accept the skill installation prompt.

How it works

  1. Agent calls trx schema transcribe to discover available flags
  2. Agent runs trx <input> --output json to get structured output
  3. Agent reads the .txt file and applies corrections
  4. Agent can chain with other tools (translation, summarization)

Example agent workflow

User: "Transcribe this video and fix any technical terms"

Agent:
1. trx schema transcribe          # discover flags
2. trx "https://..." --output json # transcribe
3. Read output .txt file           # get raw text
4. Fix "reakt" → "React", etc.    # post-process
5. Write corrected file            # save result

Self-correction patterns

The skill teaches agents to watch for:

  • Proper nouns: Brand names, people, places
  • Technical terms: Programming languages, frameworks, APIs
  • Homophones: “their/there/they’re”, “your/you’re”
  • Filler removal: “um”, “uh”, “like” (optional)
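
A minimal sketch of such a correction pass, assuming simple string replacements on the .txt output are enough (GNU sed syntax):

# hypothetical fixes: a misheard proper noun and a filler word
sed -e 's/\breakt\b/React/g' -e 's/\bum, //g' video.txt > video.fixed.txt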

Config file

trx stores configuration at ~/.trx/config.json. Created automatically by trx init.

{
  "backend": "local",
  "modelPath": "~/.trx/models/ggml-small.bin",
  "modelSize": "small",
  "language": "auto",
  "threads": 8,
  "wordTimestamps": false,
  "openai": {
    "model": "gpt-4o-transcribe"
  },
  "whisperFlags": {
    "suppressNst": true,
    "noFallback": true,
    "entropyThold": 2.8,
    "logprobThold": -1.0,
    "maxContext": 0
  }
}
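
Because the file is plain JSON, it can also be edited with standard tools rather than re-running trx init; for example, with jq:

# raise the CPU thread count (writes via a temp file)
jq '.threads = 12' ~/.trx/config.json > /tmp/trx-config.json && mv /tmp/trx-config.json ~/.trx/config.json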

Fields

Field           Type                Description
backend         "local" | "openai"  Active transcription backend
modelPath       string              Path to the local Whisper model file
modelSize       string              Model size identifier
language        string              Default language ("auto" for detection)
threads         number              CPU threads for local transcription
wordTimestamps  boolean             Enable word-level SRT by default
openai.model    string              Default OpenAI model
whisperFlags    object              Advanced whisper-cli flags

Environment variables

Variable        Required            Description
OPENAI_API_KEY  For OpenAI backend  Your OpenAI API key

Models directory

Downloaded models are stored at ~/.trx/models/. Each model is a .bin file downloaded from Hugging Face.
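
Going by the modelPath in the sample config above, a small-model install would look like:

ls ~/.trx/models/
# ggml-small.bin   (name from the sample config; other sizes presumably follow the same ggml-<size>.bin pattern)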

Override per-command

Any config value can be overridden via CLI flags:

# Override backend
trx video.mp4 --backend openai

# Override model
trx video.mp4 --model large-v3-turbo

# Override language
trx video.mp4 --language es