The file commonWords.json in this package contains data derived from
OpenSubtitles, obtained via OPUS (https://opus.nlpl.eu/).

Source dataset: OpenSubtitles 2024
Provider: Helsinki-NLP / OPUS

License: ODC-BY (Open Data Commons Attribution License)

Original data is attributed to subtitle contributors and rights holders.
This package contains a processed derivative of that data, produced by:
- tokenization
- frequency aggregation
- normalization and filtering

Any redistribution of this package must retain this notice.