# speak v1.1 Plan

## Completed in this session

### 1. Streaming Audio Fix (Short Content)
**Status:** DONE

Fixed audio cutting off at the beginning when using `--stream` mode with short content.

**Root cause:** `Buffer.from(chunk.buffer, ...)` creates a view, not a copy. When the Float32Array was reused for the next read, it corrupted data still queued in the speaker's buffer.

**Files changed:**
- `src/audio/stream-player.ts` - Allocate new buffer per push instead of reusing view
- `src/bridge/binary-reader.ts` - Copy socket buffers immediately to avoid Bun reuse
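
The view-versus-copy behaviour behind this bug can be reproduced in isolation. The sketch below is illustrative only: `viewOf`, `copyOf`, and `scratch` are hypothetical names, not identifiers from `stream-player.ts`, and reading the sample back with `readFloatLE` assumes a little-endian host.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: wrap the samples WITHOUT copying (shares memory).
function viewOf(samples: Float32Array): Buffer {
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
}

// Hypothetical helper: copy the samples into freshly allocated memory.
function copyOf(samples: Float32Array): Buffer {
  const out = Buffer.alloc(samples.byteLength);
  viewOf(samples).copy(out);
  return out;
}

const scratch = new Float32Array([0.25, 0.5]);
const queuedView = viewOf(scratch);
const queuedCopy = copyOf(scratch);

// Simulate the reader reusing the same Float32Array for the next chunk.
scratch[0] = -1;

console.log(queuedView.readFloatLE(0)); // -1: the queued view was overwritten
console.log(queuedCopy.readFloatLE(0)); // 0.25: the queued copy is intact
```

Allocating per push costs one extra copy per chunk, which is the price of keeping queued audio safe from later reads.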

### 2. Streaming Hang Fix (Long Content) - GitHub Issue #13
**Status:** DONE

Fixed streaming mode hanging indefinitely on long content (2.5KB+, ~150s audio).

**Root cause:** The `for await` loop reading from the socket was blocked while `handleChunk()` waited for the buffer to drain. This deadlocked the stream: the socket could not be read while a chunk was being processed, which led to a socket timeout.

**Solution:** Implemented producer-consumer pattern:
- Producer: Reads chunks from socket into a queue (runs concurrently)
- Consumer: Processes chunks from queue (can block on buffer writes)
- Socket reading and chunk processing now happen in parallel
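
A minimal sketch of this producer-consumer split, assuming an unbounded in-memory queue. `AsyncQueue`, `pump`, and `fakeSocket` are hypothetical names used for illustration; the actual code in `orchestrator.ts` may be structured differently.

```typescript
// Hypothetical unbounded queue; `null` signals "closed and drained".
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T | null) => void> = [];
  private closed = false;

  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);
    else this.items.push(item);
  }

  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) waiter(null);
  }

  async take(): Promise<T | null> {
    if (this.items.length > 0) return this.items.shift() as T;
    if (this.closed) return null;
    return new Promise<T | null>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}

async function pump<T>(
  source: AsyncIterable<T>,
  handleChunk: (chunk: T) => Promise<void>,
): Promise<void> {
  const queue = new AsyncQueue<T>();

  // Producer: keeps reading the source even while the consumer is blocked.
  const producer = (async () => {
    try {
      for await (const chunk of source) queue.push(chunk);
    } finally {
      queue.close();
    }
  })();

  // Consumer: may block on each chunk without stalling the producer.
  const consumer = (async () => {
    for (let c = await queue.take(); c !== null; c = await queue.take()) {
      await handleChunk(c);
    }
  })();

  await Promise.all([producer, consumer]);
}

// Demo: a fast fake socket and a slow (20 ms per chunk) handler.
async function runDemo(): Promise<string[]> {
  const events: string[] = [];
  async function* fakeSocket(): AsyncGenerator<number> {
    for (const n of [1, 2, 3]) {
      events.push(`read ${n}`);
      yield n;
    }
  }
  await pump(fakeSocket(), async (n) => {
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    events.push(`played ${n}`);
  });
  return events;
}

runDemo().then((events) => console.log(events.join(", ")));
// read 1, read 2, read 3, played 1, played 2, played 3
```

All three reads finish before the first chunk is done processing, which is the property that prevents the socket from timing out. An unbounded queue trades memory for simplicity; a bounded one would need back-pressure on the producer.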

**Files changed:**
- `src/streaming/orchestrator.ts` - Refactored to use concurrent producer-consumer pattern

**Test results:**
- Short (21 chars, 0.9s audio): ✓ Works
- Medium (329 chars, 17s audio): ✓ Works
- Long (2.5KB, 149s audio): ✓ Works - completes in ~160s

### 3. Server Idle Auto-Shutdown
**Status:** DONE

Server now automatically shuts down after 1 hour of no TTS inference requests.

- Health checks and `list-models` don't reset the timer
- Only `generate` and `stream-binary` (actual TTS) reset the timer
- Uses `select()` with a 60s timeout to check idle state periodically
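
The timer-reset rule can be sketched independently of the socket loop. The server itself is Python; the TypeScript below only models the rule. `IdleTracker` is a hypothetical class, `"health"` is an assumed command name, and shutting down at exactly 60 minutes (`>=` rather than `>`) is also an assumption.

```typescript
// Constant name taken from server.py as described above; value is the stated 1 hour.
const IDLE_TIMEOUT_SECONDS = 60 * 60;

// Only these two commands count as TTS inference.
const INFERENCE_COMMANDS = new Set<string>(["generate", "stream-binary"]);

class IdleTracker {
  private lastInferenceAt: number;

  constructor(nowSeconds: number) {
    this.lastInferenceAt = nowSeconds;
  }

  // Called for every request; only inference commands move the timestamp.
  record(command: string, nowSeconds: number): void {
    if (INFERENCE_COMMANDS.has(command)) this.lastInferenceAt = nowSeconds;
  }

  // Called on each periodic wake-up (the 60s select() timeout).
  shouldShutDown(nowSeconds: number): boolean {
    return nowSeconds - this.lastInferenceAt >= IDLE_TIMEOUT_SECONDS;
  }
}

const tracker = new IdleTracker(0);
tracker.record("generate", 600); // inference at t = 10 min resets the timer
tracker.record("health", 3000);  // assumed health-check command: no reset

console.log(tracker.shouldShutDown(4140)); // false: 59 min since last inference
console.log(tracker.shouldShutDown(4200)); // true: 60 min since last inference
```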

**Files changed:**
- `src/python/server.py` - Added `IDLE_TIMEOUT_SECONDS` and idle tracking logic

### 4. SKILL.md Updates
**Status:** DONE

- Corrected references to the non-existent `daemon start`/`daemon stop` commands
- Made `--stream` the recommended option
- Added Performance section with timing expectations
- Documented actual server behavior (auto-start, auto-shutdown)

### 5. Version Bump
**Status:** DONE

- Updated version from 0.1.0 to 1.1.0 in `package.json` and `src/index.ts`

---

## Known Limitations

### Streaming Latency for Long Text
The `--stream` flag enables streaming playback, but there's still latency before audio starts:

- **Short text (<250 chars)**: ~3-8 seconds to first audio (full generation before playback)
- **Long text (>250 chars)**: Text is split into chunks; each chunk is generated in full and then streamed

This is because mlx-audio's `generate_audio()` generates the complete audio for a text segment before returning. True token-by-token streaming would require changes to the underlying TTS library.

**Current behavior:**
1. Text split into ~250 char chunks
2. Each chunk: generate (3-8s) → send → play
3. Next chunk starts generating while previous plays
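
Step 1 can be sketched as a greedy splitter. The plan does not say where the real splitter breaks the text, so breaking on whitespace is an assumption, and `splitIntoChunks` is a hypothetical name.

```typescript
// Greedily pack whole words into chunks of at most `maxChars` characters.
// A single word longer than `maxChars` becomes its own oversized chunk.
function splitIntoChunks(text: string, maxChars = 250): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// 120 five-letter words separated by single spaces: 719 characters in total.
const sample = Array(120).fill("hello").join(" ");
const chunks = splitIntoChunks(sample);

console.log(chunks.length);               // 3
console.log(chunks.map((c) => c.length)); // [ 245, 245, 227 ]
```

Splitting on sentence boundaries instead would likely give more natural prosody at chunk joins, at the cost of less even chunk sizes.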

**Workaround:** For long content, the chunking provides pseudo-streaming - audio starts playing after the first chunk generates, while subsequent chunks generate in parallel with playback.
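
The overlap between generation and playback can be sketched as a one-chunk lookahead. `speakPipelined`, `generate`, and `play` are hypothetical names standing in for the real TTS and speaker calls, and error handling is omitted.

```typescript
// Generate chunk i+1 while chunk i is playing (lookahead of one chunk).
async function speakPipelined<A>(
  chunks: string[],
  generate: (text: string) => Promise<A>,
  play: (audio: A) => Promise<void>,
): Promise<void> {
  const [first, ...rest] = chunks;
  if (first === undefined) return;

  let pending = generate(first);
  for (const next of [...rest, null]) {
    const audio = await pending;
    // Start the next generation before blocking on playback.
    if (next !== null) pending = generate(next);
    await play(audio);
  }
}

// Demo with fake generate/play that record when each call starts.
async function runDemo(): Promise<string[]> {
  const events: string[] = [];
  const sleep = (ms: number) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

  await speakPipelined(
    ["a", "b", "c"],
    async (text) => {
      events.push(`gen:${text}`);
      await sleep(10);
      return text.toUpperCase(); // stand-in for generated audio
    },
    async (audio) => {
      events.push(`play:${audio}`);
      await sleep(30);
    },
  );
  return events;
}

runDemo().then((events) => console.log(events.join(" ")));
// gen:a gen:b play:A gen:c play:B play:C
```

Only the first chunk's generation time is paid as startup latency, as long as generating a chunk takes less time than playing the previous one.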

---

## Future Improvements (v1.2+)

### True Streaming (Requires mlx-audio changes)
- mlx-audio has a `stream=True` parameter, but it does not currently work well
- Would need to investigate or contribute an upstream fix
- Goal: Start playing audio within 1-2 seconds regardless of text length

### Voice Cloning Improvements
- Voice cloning already works with `--voice sample.wav`
- Future: Cache voice embeddings for reuse
- Future: Voice presets directory

### Daemon Commands
- Implement `speak daemon start` (explicit warm-up)
- Implement `speak daemon stop` (graceful shutdown)
- Consider `speak daemon status` for detailed info
