
##### Support this project

> If this was helpful to you, please buy me a beer on PayPal: [Click Here to Buy Me a Beer](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=WXQKYYKPHWXHS)

-----------------

# m3u8-grabber

HLS archiver that downloads m3u8 playlists, video segments, encryption keys, and init segments to Google Cloud Storage. Preserves the original directory structure as-is.

## Supported Formats

| Format | Extensions | Status |
|--------|-----------|--------|
| MPEG-TS | `.ts` | Supported |
| Fragmented MP4 (fMP4) | `.m4s`, `.m4v`, `.mp4`, `.cmfv`, `.cmfa` | Supported |
| CMAF | `.m4s`, `.mp4` | Supported |
| Init segments | `#EXT-X-MAP` URI | Supported |
| Encryption keys | `#EXT-X-KEY` URI | Supported |
| Master playlists | Multi-variant with `#EXT-X-STREAM-INF` | Supported |
| Audio playlists | `#EXT-X-MEDIA` URI | Supported |

## Install

```bash
npm install m3u8-grabber
```

Requires Node.js >= 18.

## Usage

```typescript
import { Storage } from '@google-cloud/storage';
import { downloadM3u8 } from 'm3u8-grabber';

const storage = new Storage({
  keyFilename: './google-credentials.json', // path to a service account JSON key
});

await downloadM3u8(
  storage,
  'https://example.com/hls/master.m3u8',
  'my-gcs-bucket',
  'output',          // local temp dir for m3u8 files
  'user-1234',       // GCS path prefix
);
```

### With progress callbacks

```typescript
await downloadM3u8(
  storage,
  'https://example.com/hls/master.m3u8',
  'my-gcs-bucket',
  'output',
  'user-1234',
  null,              // baseUrl (auto-detected)
  {
    onSegmentComplete: (file, completed, total) => {
      console.log(`[${completed}/${total}] ${file}`);
    },
    onSegmentError: (file, error) => {
      console.error(`Failed: ${file}`, error.message);
    },
  },
);
```

## How It Works

1. Downloads the m3u8 manifest file
2. Parses it using a state-machine parser (not regex) to extract:
   - Child m3u8 playlists (variant streams, audio tracks)
   - Media segments (`.ts`, `.m4s`, `.m4v`, `.mp4`, etc.)
   - Init segments (`#EXT-X-MAP` URIs)
   - Encryption keys (`#EXT-X-KEY` URIs)
3. Recursively processes child playlists
4. Streams all segments directly to GCS (no local buffering of large files)
5. Skips a file that already exists in GCS only when its size matches the expected `Content-Length`; zero-byte or partially uploaded files are re-downloaded
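
The parsing step above can be sketched as a small line-by-line state machine. This is only an illustration under stated assumptions, not the parser shipped in this package: `parseManifest`, `readUriAttribute`, and the `ManifestEntries` shape are hypothetical names, and the sketch handles only the tags listed in this README.

```typescript
// Hypothetical sketch, not m3u8-grabber's actual parser.
type ManifestEntries = {
  playlists: string[];    // child m3u8 playlists (variants, audio tracks)
  segments: string[];     // media segments
  initSegments: string[]; // #EXT-X-MAP URIs
  keys: string[];         // #EXT-X-KEY URIs
};

// Read the quoted URI="..." attribute from a tag line, if present.
function readUriAttribute(line: string): string | null {
  const marker = 'URI="';
  const start = line.indexOf(marker);
  if (start === -1) return null;
  const end = line.indexOf('"', start + marker.length);
  return end === -1 ? null : line.slice(start + marker.length, end);
}

function parseManifest(content: string): ManifestEntries {
  const out: ManifestEntries = { playlists: [], segments: [], initSegments: [], keys: [] };
  // State: how the next non-tag (URI) line should be classified.
  let nextUri: 'segment' | 'playlist' | null = null;

  for (const raw of content.split('\n')) {
    const line = raw.trim(); // also drops a trailing '\r'
    if (line === '') continue;

    // The trailing ':' matters: '#EXT-X-MEDIA:' must not match '#EXT-X-MEDIA-SEQUENCE:'.
    if (line.startsWith('#EXT-X-STREAM-INF:')) {
      nextUri = 'playlist';
    } else if (line.startsWith('#EXTINF:')) {
      nextUri = 'segment';
    } else if (line.startsWith('#EXT-X-MEDIA:')) {
      const uri = readUriAttribute(line);
      if (uri !== null) out.playlists.push(uri);
    } else if (line.startsWith('#EXT-X-MAP:')) {
      const uri = readUriAttribute(line);
      if (uri !== null) out.initSegments.push(uri);
    } else if (line.startsWith('#EXT-X-KEY:')) {
      const uri = readUriAttribute(line); // METHOD=NONE carries no URI
      if (uri !== null) out.keys.push(uri);
    } else if (!line.startsWith('#')) {
      if (nextUri === 'playlist') out.playlists.push(line);
      else if (nextUri === 'segment') out.segments.push(line);
      nextUri = null;
    }
  }
  return out;
}
```

Tracking the pending tag rather than matching on file extensions means a segment URI is recognized even when it has no extension or carries a query string.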

## Exported Functions

| Function | Description |
|----------|-------------|
| `downloadM3u8(storage, url, bucket, localDir, gcsDir, baseUrl?, options?)` | Download a full HLS stream to GCS |
| `streamDownloadToGcp(bucket, url, output, storage?, credsPath?, skip?)` | Stream a single file to GCS |
| `uploadLocalFileToGCS(storage, bucket, src, dest, skip?, retry?)` | Upload a local file to GCS with retry |
| `downloadFile(url, outputPath)` | Download a file to local disk |
| `extractTsUrls(m3u8Content)` | Extract segment URLs (`.ts` + fMP4) from manifest |
| `extractM3u8Urls(m3u8Content)` | Extract child playlist URLs from manifest |
| `extractKeyUrls(m3u8Content)` | Extract encryption key URLs from `#EXT-X-KEY` tags |
| `isValidManifestPath(url)` | Check if a string is a valid file path (not a tag line) |
| `extractBaseUrl(url)` | Get the base URL (everything before the filename) |
| `extractFilename(url)` | Get the filename from a URL (strips query params) |
| `extractParentFolder(url, offset?)` | Get a parent directory name from a URL |
| `createStorageClient(credsPath)` | Create a GCS Storage client from a credentials file |
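
To make the URL helpers in the table more concrete, here is a rough sketch of the behavior their descriptions suggest. The names `baseUrlOf` and `filenameOf` are hypothetical stand-ins, not the package's `extractBaseUrl` and `extractFilename`, and details such as the trailing slash on the base URL are assumptions.

```typescript
// Hypothetical stand-ins for illustration; the package's own helpers may differ.

// Drop everything from the first '?' or '#' onward.
function stripQueryAndFragment(url: string): string {
  const cut = url.search(/[?#]/);
  return cut === -1 ? url : url.slice(0, cut);
}

// Everything up to and including the last '/' (assumed to keep the trailing slash).
function baseUrlOf(url: string): string {
  const clean = stripQueryAndFragment(url);
  return clean.slice(0, clean.lastIndexOf('/') + 1);
}

// The last path component, without query parameters.
function filenameOf(url: string): string {
  const clean = stripQueryAndFragment(url);
  return clean.slice(clean.lastIndexOf('/') + 1);
}

const example = 'https://example.com/hls/720p/seg0.ts?token=abc';
console.log(baseUrlOf(example));  // https://example.com/hls/720p/
console.log(filenameOf(example)); // seg0.ts
```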

## Retry & Resilience

- **HTTP downloads**: 5 retries with linear backoff (axios-retry)
- **GCS uploads**: 5 retries with size verification after each attempt
- **GCS client**: `IdempotencyStrategy.RetryAlways`, max 10 retries, 500s total timeout
- **Resume support**: the size-aware skip compares the source's `Content-Length` header with the size recorded in the GCS object's metadata, so partial uploads are detected and re-downloaded
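
The skip rule and linear backoff described above could look roughly like the following. It is a minimal sketch, not the package's code: `shouldSkipUpload`, `linearBackoffMs`, and `withRetries` are hypothetical names, the 1000 ms step is an assumed value, and a plain loop stands in for axios-retry. Treating a missing `Content-Length` as "do not skip" is also an assumption.

```typescript
// Hypothetical helpers for illustration only.

// Skip only when the object exists, is non-empty, and matches the expected size.
function shouldSkipUpload(expectedBytes: number | null, existingBytes: number | null): boolean {
  if (existingBytes === null) return false; // object is not in the bucket
  if (existingBytes === 0) return false;    // zero-byte leftover
  if (expectedBytes === null) return false; // no Content-Length to compare (assumption)
  return existingBytes === expectedBytes;   // a partial upload has a smaller size
}

// Linear backoff: the delay grows by a fixed step per attempt (1 -> 1s, 2 -> 2s, ...).
function linearBackoffMs(attempt: number, stepMs: number = 1000): number {
  return attempt * stepMs;
}

// Run a task, retrying up to `retries` times with linear backoff between attempts.
async function withRetries<T>(
  task: () => Promise<T>,
  retries: number = 5,
  stepMs: number = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delay = linearBackoffMs(attempt + 1, stepMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Comparing sizes is cheaper than hashing and is enough to catch truncated uploads, though it cannot detect content that changed while keeping the same length.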

## Running Tests

**Local:** Place `google-creds.json` at project root (GCP service account JSON key).

**CI/CD:** Set `GOOGLE_SERVICE_ACCOUNT_CREDS` (JSON or base64-encoded JSON) and `NODE_ENV=test` as pipeline secrets.

```bash
npm test              # watch mode
npm run test:no-reloading  # single run
```

Unit tests (parsing) run without GCS credentials. Integration tests (download + upload) require valid credentials and create/delete a temporary GCS bucket.
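
Since `GOOGLE_SERVICE_ACCOUNT_CREDS` may hold either raw JSON or base64-encoded JSON, a loader has to handle both. The sketch below shows one way to do that; `parseServiceAccountCreds` is a hypothetical helper, not part of this package, and the "starts with `{`" check is an assumed heuristic.

```typescript
// Hypothetical helper: accept raw JSON or base64-encoded JSON credentials.
function parseServiceAccountCreds(raw: string): Record<string, unknown> {
  const trimmed = raw.trim();
  // A raw JSON object starts with '{'; anything else is treated as base64.
  const json = trimmed.startsWith('{')
    ? trimmed
    : Buffer.from(trimmed, 'base64').toString('utf8');

  const parsed: unknown = JSON.parse(json); // throws on malformed input
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Service account credentials must be a JSON object');
  }
  return parsed as Record<string, unknown>;
}
```

In a pipeline, the result of `parseServiceAccountCreds(process.env.GOOGLE_SERVICE_ACCOUNT_CREDS ?? '')` could then be passed as the `credentials` option when constructing the `Storage` client.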

