# Open Chinese Convert 開放中文轉換

[![CMake](https://github.com/BYVoid/OpenCC/actions/workflows/cmake.yml/badge.svg)](https://github.com/BYVoid/OpenCC/actions/workflows/cmake.yml)
[![Bazel](https://github.com/BYVoid/OpenCC/actions/workflows/bazel.yml/badge.svg)](https://github.com/BYVoid/OpenCC/actions/workflows/bazel.yml)
[![MSVC](https://github.com/BYVoid/OpenCC/actions/workflows/msvc.yml/badge.svg)](https://github.com/BYVoid/OpenCC/actions/workflows/msvc.yml)
[![Node.js CI](https://github.com/BYVoid/OpenCC/actions/workflows/nodejs.yml/badge.svg)](https://github.com/BYVoid/OpenCC/actions/workflows/nodejs.yml)
[![Python CI](https://github.com/BYVoid/OpenCC/actions/workflows/python.yml/badge.svg)](https://github.com/BYVoid/OpenCC/actions/workflows/python.yml)
[![AppVeyor](https://img.shields.io/appveyor/ci/Carbo/OpenCC.svg)](https://ci.appveyor.com/project/Carbo/OpenCC)

[![GitHub downloads](https://img.shields.io/github/downloads/BYVoid/OpenCC/total)](https://github.com/BYVoid/OpenCC/releases)
[![WinGet](https://img.shields.io/winget/v/BYVoid.OpenCC)](https://winstall.app/apps/BYVoid.OpenCC)
[![npm package badge](https://img.shields.io/npm/v/opencc)](https://www.npmjs.com/package/opencc)
[![PyPI version](https://img.shields.io/pypi/v/opencc.svg)](https://pypi.org/project/opencc/)
[![Debian package](https://img.shields.io/debian/v/opencc/unstable)](https://packages.debian.org/search?keywords=opencc)
[![latest packaged version(s)](https://repology.org/badge/latest-versions/opencc.svg)](https://repology.org/project/opencc/versions)

## Introduction 介紹

![OpenCC](https://opencc.byvoid.com/img/opencc.png)

Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for converting between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant handling, and regional vocabulary differences across Mainland China, Taiwan and Hong Kong. It does not translate between Mandarin and Cantonese or other varieties of Chinese.

中文簡繁轉換開源項目，支持詞彙級別的轉換、異體字轉換和地區習慣用詞轉換（中國大陸、台灣、香港）及日本新字體轉換。不提供普通話與粵語之間的轉換。

Discussion (Telegram): https://t.me/open_chinese_convert

### Features 特點

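* 嚴格區分「一簡對多繁」和「一簡對多異」。
* 完全兼容異體字，可以實現動態替換。
* 嚴格審校一簡對多繁詞條，原則爲「能分則不合」。
* 支持中國大陸、台灣、香港異體字和地區習慣用詞轉換，如「裏」「裡」、「鼠標」「滑鼠」。
* 詞庫和函數庫完全分離，可以自由修改、導入、擴展。

- Strictly distinguishes "one Simplified character, multiple Traditional characters" from "one Simplified character, multiple variants".
- Fully supports variant characters, which can be substituted dynamically.
- Entries where one Simplified form maps to several Traditional forms are carefully reviewed, on the principle of "keep separate whatever can be separated" (能分則不合).
- Converts variant characters and regional vocabulary for Mainland China, Taiwan and Hong Kong, e.g. 「裏」/「裡」 and 「鼠標」/「滑鼠」.
- Dictionaries are fully separated from the library, so they can be freely modified, imported and extended.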

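For details, see the [OpenCC design principles](./DESIGN_PRINCIPLES.md) and the [criteria for including regional phrases](doc/regional-phrase-criteria.md).

詳情參閱[OpenCC 設計思想](./DESIGN_PRINCIPLES.md)及[地區詞收錄標準](doc/regional-phrase-criteria.md)。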

## Installation 安裝

### Package Managers 包管理器

* [Debian](https://tracker.debian.org/pkg/opencc)
* [Ubuntu](https://launchpad.net/ubuntu/+source/opencc)
* [Fedora](https://packages.fedoraproject.org/pkgs/opencc/opencc/)
* [Arch Linux](https://archlinux.org/packages/extra/x86_64/opencc/)
* [macOS (Homebrew)](https://formulae.brew.sh/formula/opencc)
* [WinGet](https://github.com/microsoft/winget-pkgs/tree/master/manifests/b/BYVoid/OpenCC)
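    * Run `winget install opencc` to install the opencc.exe application, including the Jieba segmentation plugin / 使用 `winget install opencc` 命令可直接安裝 opencc.exe 應用程式，含 Jieba 分詞插件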
* [Bazel](https://registry.bazel.build/modules/opencc)
* [Node.js](https://npmjs.org/package/opencc)
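    * Run `npm install -g opencc` to install the OpenCC Node.js CLI / 使用 `npm install -g opencc` 命令可安裝 OpenCC Node.js CLI
    * Run `npm install -g opencc opencc-jieba` to install both the OpenCC Node.js CLI and the Jieba segmentation plugin / 使用 `npm install -g opencc opencc-jieba` 命令可同時安裝 OpenCC Node.js CLI 及 Jieba 分詞插件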
* [Python](https://pypi.org/project/OpenCC/)
    * Run `pip install opencc` to install the Python API and the Python CLI / 使用 `pip install opencc` 命令可安裝 Python API 及 Python CLI
* [More (Repology)](https://repology.org/project/opencc/versions)

### Prebuilt binaries 預編譯二進位檔

* Windows (x86_64): [OpenCC-1.3.2](https://github.com/BYVoid/OpenCC/releases/download/ver.1.3.2/OpenCC-1.3.2-windows-x64-portable.zip) ([SHA-256](https://github.com/BYVoid/OpenCC/releases/download/ver.1.3.2/OpenCC-1.3.2-windows-x64-portable.zip.sha256))
    * This Windows release is available from WinGet. For details, see [doc/windows-winget-release.md](doc/windows-winget-release.md).
    * Requires Microsoft Visual C++ Redistributable for Visual Studio 2015-2026. Download the latest version from [Microsoft](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170#latest-supported-redistributable-version).
* Debian/Ubuntu (amd64):
    * [opencc_1.3.2_amd64.deb](https://github.com/BYVoid/OpenCC/releases/download/ver.1.3.2/opencc_1.3.2_amd64.deb)
    * [opencc-jieba_1.3.2_amd64.deb](https://github.com/BYVoid/OpenCC/releases/download/ver.1.3.2/opencc-jieba_1.3.2_amd64.deb)

## Usage 使用

### Online 線上轉換

https://opencc.js.org/converter?config=s2t

### Node.js

`npm install opencc`

The npm package supports Node.js `>=20.17`. It uses bundled Node-API
prebuilds when available and falls back to a local `node-gyp` build when the
current platform does not have a matching prebuild.

To install the npm CLI:

```sh
npm install -g opencc
opencc -c s2t.json -i input.txt -o output.txt
```

The npm CLI supports basic text conversion. Plugins, `--inspect`, and
`--segmentation` require the native OpenCC CLI.

```ts
import { OpenCC } from 'opencc';

async function main(): Promise<void> {
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result);  // 漢字
}

main().catch(console.error);
```

See [demo.js](https://github.com/BYVoid/OpenCC/blob/master/node/demo.js) and [ts-demo.ts](https://github.com/BYVoid/OpenCC/blob/master/node/ts-demo.ts).

### Python

`pip install opencc` (Windows, Linux, macOS)

```python
import opencc
converter = opencc.OpenCC('s2t.json')
converter.convert('汉字')  # 漢字
```

The Python package also installs a basic CLI:

```sh
pip install opencc
opencc -c s2t.json -i input.txt -o output.txt
```

The Python CLI supports basic text conversion, `--include-tofu-risk-dictionaries`,
and `--resource-zip`. Diagnostic modes such as `--inspect` and `--segmentation`
still require the native OpenCC CLI.

### C++

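```c++
#include <iostream>
#include <string>

#include "opencc.h"

int main() {
  const opencc::SimpleConverter converter("s2t.json");
  const std::string converted = converter.Convert("汉字");  // 漢字
  std::cout << converted << std::endl;
  return 0;
}
```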

[Full example with Bazel](https://github.com/BYVoid/opencc-bazel-example)

When OpenCC is embedded in a server binary or self-contained application, the
JSON config can stay small while dictionary resources are loaded from explicit
resource directories:

```c++
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "SimpleConverter.hpp"

int main() {
  // Directories are searched in the order listed.
  auto resources = std::make_shared<opencc::FilesystemResourceProvider>(
      std::vector<std::string>{
          "/opt/my-app/opencc",
          "/opt/my-app/plugins/opencc-jieba",
          "/usr/share/opencc",
      });
  const opencc::SimpleConverter converter("s2t.json", resources);
  std::cout << converter.Convert("汉字") << std::endl;  // 漢字
  return 0;
}
```

`FilesystemResourceProvider` searches its directories in the order given. The
existing `SimpleConverter("s2t.json")` constructor and the CLI are unchanged:
they still look up resources in the config file's location, the current
directory, explicit paths, and the installed OpenCC data directory.
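To make "searches directories in order" concrete, here is a small Python model of the lookup in which the first directory containing the file wins. It is an illustrative sketch, not OpenCC's implementation: `find_resource` is a hypothetical helper, and it assumes a resource is located by joining its file name onto each directory in turn.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_resource(search_dirs: Sequence[str], name: str) -> Optional[Path]:
    """Hypothetical model of an ordered resource search.

    Directories are checked in the order given; the first one that
    contains a regular file called `name` is returned, so earlier
    entries shadow later ones.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None  # not found in any directory
```

Under this model, an `STPhrases.ocd2` placed in `/opt/my-app/opencc` would be picked up before a copy in `/usr/share/opencc`, while files that exist only in a later directory are still found.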

### C

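```c
#include <stdio.h>
#include <string.h>

#include "opencc.h"

int main(void) {
  opencc_t opencc = opencc_open("s2t.json");
  if (opencc == (opencc_t)-1) {  /* opencc_open signals failure with (opencc_t)-1 */
    fprintf(stderr, "opencc_open failed: %s\n", opencc_error());
    return 1;
  }
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  // 漢字
  if (converted != NULL) {
    printf("%s\n", converted);
    opencc_convert_utf8_free(converted);
  }
  opencc_close(opencc);
  return 0;
}
```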

[Full Document 完整文檔](https://opencc.byvoid.com/docs/)

### Command Line

Unless otherwise noted, this section describes the native OpenCC CLI built from
the C++ toolchain. The Python and npm CLIs support basic file/stdin conversion
only, plus `--include-tofu-risk-dictionaries`; the Python CLI also supports
`--resource-zip`.

* `opencc --help`
* `opencc_dict --help`

#### Segmentation and Inspection Modes

OpenCC CLI supports two diagnostic modes that output JSON instead of converted text:

**`--segmentation`** — Output segmentation result only (no conversion):

```bash
echo "他只看了几行日志，就一叶知秋，猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --segmentation
# {"input":"他只看了几行日志，就一叶知秋，猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志","，就","一叶知秋","，猜到","整个","系统","是","数据库","连接池","出了","问题"]}
```
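The segments above are consistent with greedy forward maximum matching against a phrase dictionary, where a run of characters that starts no known phrase is kept together as one segment (which would explain segments such as `了几行` and `，猜到`). The Python sketch below models that reading. It is an assumption about how the built-in segmentation behaves, not its actual code, and the toy phrase set passed to `max_match_segment` (a hypothetical helper) stands in for the real dictionary.

```python
from typing import Iterable, List


def max_match_segment(text: str, phrases: Iterable[str]) -> List[str]:
    """Greedy forward maximum matching with merged unmatched runs."""
    vocab = set(phrases)
    max_len = max((len(p) for p in vocab), default=0)
    segments: List[str] = []
    buffer = ""  # consecutive characters that started no phrase
    pos = 0
    while pos < len(text):
        match = None
        # Try the longest candidate first and shrink towards one character.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            if text[pos:pos + length] in vocab:
                match = text[pos:pos + length]
                break
        if match is None:
            buffer += text[pos]
            pos += 1
        else:
            if buffer:  # flush the unmatched run as its own segment
                segments.append(buffer)
                buffer = ""
            segments.append(match)
            pos += len(match)
    if buffer:
        segments.append(buffer)
    return segments
```

With the toy vocabulary `{"只看", "日志", "一叶知秋", "整个", "系统", "数据库", "连接池", "出了", "问题"}`, this model reproduces the 14 segments shown in the example output.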

**`--inspect`** — Output full inspection result (segmentation + per-stage conversion + final output):

```bash
echo "他只看了几行日志，就一叶知秋，猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect
# {"input":"他只看了几行日志，就一叶知秋，猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志","，就","一叶知秋","，猜到","整个","系统","是","数据库","连接池","出了","问题"],"stages":[{"index":1,"segments":["他","只看","了幾行","日誌","，就","一葉知秋","，猜到","整個","系統","是","數據庫","連接池","出了","問題"]},{"index":2,"segments":["他","只看","了幾行","日誌","，就","一葉知秋","，猜到","整個","系統","是","資料庫","連線池","出了","問題"]},{"index":3,"segments":["他","只看","了幾行","日誌","，就","一葉知秋","，猜到","整個","系統","是","資料庫","連線池","出了","問題"]}],"output":"他只看了幾行日誌，就一葉知秋，猜到整個系統是資料庫連線池出了問題"}

# Pretty-print with jq:
echo "他只看了几行日志，就一叶知秋，猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect | jq .
```

These modes are useful for diagnosing conversion issues:

1. Use `--segmentation` to verify that the input is segmented as expected.
2. Use `--inspect` to see which conversion stage produces an unexpected result.

Rules:
- `--segmentation` and `--inspect` are mutually exclusive.

### Official / Recommended Ports

The following ports are maintained within the OpenCC ecosystem and are generally up to date with current configuration and dictionary data.

* Data package: [opencc-data](https://www.npmjs.com/package/opencc-data)
* Pure JavaScript: [opencc-js](https://www.npmjs.com/package/opencc-js)
    * See [notes about different OpenCC NPM packages](#links-%E7%9B%B8%E9%97%9C%E9%80%A3%E7%B5%90) below.
* WebAssembly: [opencc-wasm](https://www.npmjs.com/package/opencc-wasm) ([website](https://opencc.js.org/))
* Pure Python: [opencc-py](https://pypi.org/project/opencc-py/) (pre-release)

### Other Ports (Unofficial)

These ports are community-maintained and may not always track upstream updates.

* Swift (iOS): [SwiftyOpenCC](https://github.com/XQS6LB3A/SwiftyOpenCC)
* iOSOpenCC (pod): [iOSOpenCC](https://github.com/swiftdo/OpenCC)
* Java: [opencc4j](https://github.com/houbb/opencc4j)
* Android: [android-opencc](https://github.com/qichuan/android-opencc)
* PHP: [opencc4php](https://github.com/nauxliu/opencc4php)
* WebAssembly: [wasm-opencc](https://github.com/oyyd/wasm-opencc)
* Browser Extension: [opencc-extension](https://github.com/tnychn/opencc-extension)
* Go (Pure): [OpenCC for Go](https://github.com/longbridge/opencc)
* Dart (native-assets): [opencc-dart](https://github.com/lindeer/opencc-dart)

### Configurations 配置文件

#### Default Configurations 預設配置文件

* `s2t.json` **Simplified Chinese** to **Traditional Chinese (OpenCC Standard)** / **簡體** 到 **OpenCC 標準繁體**
* `t2s.json` **Traditional Chinese (OpenCC Standard)** to **Simplified Chinese** / **OpenCC 標準繁體** 到 **簡體**
* `s2tw.json` **Simplified Chinese** to **Traditional Chinese (Taiwan Standard)** / **簡體** 到 **台灣正體**
* `tw2s.json` **Traditional Chinese (Taiwan Standard)** to **Simplified Chinese** / **台灣正體** 到 **簡體**
* `s2hk.json` **Simplified Chinese** to **Traditional Chinese (Hong Kong variant)** / **簡體** 到 **香港繁體**
* `hk2s.json` **Traditional Chinese (Hong Kong variant)** to **Simplified Chinese** / **香港繁體** 到 **簡體**
* `s2twp.json` **Simplified Chinese** to **Traditional Chinese (Taiwan Standard, with Taiwan Phrases)** / **簡體** 到 **台灣正體（含台灣常用詞彙）**
* `tw2sp.json` **Traditional Chinese (Taiwan Standard)** to **Simplified Chinese (Mainland China Phrases)** / **台灣正體** 到 **簡體（含中國大陸常用詞彙）**
* `t2tw.json` **Traditional Chinese (OpenCC Standard)** to **Traditional Chinese (Taiwan Standard)** / **OpenCC 標準繁體** 到 **台灣正體**
* `tw2t.json` **Traditional Chinese (Taiwan Standard)** to **Traditional Chinese (OpenCC Standard)** / **台灣正體** 到 **OpenCC 標準繁體**
* `t2hk.json` **Traditional Chinese (OpenCC Standard)** to **Traditional Chinese (Hong Kong variant)** / **OpenCC 標準繁體** 到 **香港繁體**
* `hk2t.json` **Traditional Chinese (Hong Kong variant)** to **Traditional Chinese (OpenCC Standard)** / **香港繁體** 到 **OpenCC 標準繁體**

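The following configurations are still under development; contributions of new phrases are welcome / 下列配置文件仍在開發中，歡迎貢獻新詞組：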

* `s2hkp.json` **Simplified Chinese** to **Traditional Chinese (Hong Kong variant, with Hong Kong Phrases)** / **簡體** 到 **香港繁體（香港常用詞彙）**
* `hk2sp.json` **Traditional Chinese (Hong Kong variant)** to **Simplified Chinese (Mainland China Phrases)** / **香港繁體** 到 **簡體（含中國大陸常用詞彙）**

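The following configurations are for exploratory research only and are not recommended for production use / 下列配置文件僅供探索性研究，不建議用於生產環境：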

* `t2jp.json` **Old Japanese Kanji (Kyūjitai)** to **New Japanese Kanji (Shinjitai)** / **日文舊字體** 到 **日文新字體**
* `jp2t.json` **New Japanese Kanji (Shinjitai)** to **Old Japanese Kanji (Kyūjitai)**, also converting a small number of Japanese phrases into their Chinese equivalents / **日文新字體** 到 **日文舊字體**，並將少量日文詞組轉換爲對應中文

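#### Specifying the Data Directory 指定配置文件

Set the `OPENCC_DATA_DIR` environment variable to load configuration files from a specific directory / 通過環境變量 `OPENCC_DATA_DIR` 加載指定路徑下的配置文件：
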
```sh
OPENCC_DATA_DIR=/path/to/your/config/dir opencc --help
```

#### Inline Dictionary 內聯字典

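A dictionary in a config file can use `type: "inline"` to define a small custom
vocabulary directly in the JSON, without modifying external dictionary files.
The example below adds override rules at the front of `group.dicts`.

配置檔中的字典可使用 `type: "inline"`，直接在 JSON 裡定義小型自訂詞彙，
不必修改外部字典檔。例如在 `group.dicts` 最前面加入覆寫規則：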

```json
{
  "conversion_chain": [
    {
      "dict": {
        "type": "group",
        "dicts": [
          {
            "type": "inline",
            "entries": {
              "麦旋风": "冰炫風",
              "服务器": "伺服器"
            }
          },
          {
            "type": "ocd2",
            "file": "STPhrases.ocd2"
          },
          {
            "type": "ocd2",
            "file": "STCharacters.ocd2"
          }
        ]
      }
    }
  ]
}
```
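To see why placing the inline dictionary first turns it into an override, here is a simplified Python model of one `group` dictionary. It assumes, in line with the ordering rule listed below but not taken from OpenCC's source, that at each position the dictionaries are consulted in `group.dicts` order, each proposes its longest key starting there, and the first dictionary with any match supplies the replacement. It ignores segmentation (which real configs usually run first) and the rest of the conversion chain; the helper names and toy dictionaries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Dict_ = Dict[str, str]


def longest_prefix(d: Dict_, text: str, pos: int) -> Optional[Tuple[str, str]]:
    """Longest (key, value) in d whose key starts at text[pos] (linear scan)."""
    best = None
    for key, value in d.items():
        if text.startswith(key, pos) and (best is None or len(key) > len(best[0])):
            best = (key, value)
    return best


def group_match(dicts: List[Dict_], text: str, pos: int) -> Optional[Tuple[str, str]]:
    """Match from the first dictionary, in order, that has any match at pos."""
    for d in dicts:
        match = longest_prefix(d, text, pos)
        if match is not None:
            return match
    return None


def convert(dicts: List[Dict_], text: str) -> str:
    """Left-to-right conversion; characters no dictionary matches are copied."""
    out: List[str] = []
    pos = 0
    while pos < len(text):
        match = group_match(dicts, text, pos)
        if match is None:
            out.append(text[pos])
            pos += 1
        else:
            key, value = match
            out.append(value)
            pos += len(key)
    return "".join(out)
```

With toy dictionaries `inline = {"服务器": "伺服器"}`, `phrases = {"服务": "服務"}` and `chars = {"这": "這", "个": "個", "务": "務"}`, `convert([inline, phrases, chars], "这个服务器")` gives `這個伺服器`, while `convert([phrases, inline, chars], "这个服务器")` gives `這個服務器` because the phrase dictionary matches first. Under this model, that is why overrides belong at the front of `group.dicts`.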

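規則與限制：

- `entries` 必須是 JSON 物件。
- `entries` 的 key/value 必須是非空字串。
- 重複 key 不受支援；如包含，載入會直接失敗（拋出錯誤）。
- key/value 會按解析結果原樣使用，不做 trim、大小寫折疊或 Unicode normalization。
- 內聯字典與普通字典行為一致，優先級由 `group.dicts` 的順序決定。
- 內聯字典輸出仍會繼續經過後續 `conversion_chain` 步驟，不提供鎖定最終輸出。

Rules and limitations:

- `entries` must be a JSON object.
- Every key and value in `entries` must be a non-empty string.
- Duplicate keys are not supported; if any are present, loading fails with an error.
- Keys and values are used exactly as parsed, with no trimming, case folding or Unicode normalization.
- Inline dictionaries behave like regular dictionaries; their priority is determined by their position in `group.dicts`.
- The output of an inline dictionary still passes through the later `conversion_chain` steps; there is no way to lock the final output.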
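If you generate inline dictionaries programmatically, a pre-flight check can catch violations of the rules above before OpenCC loads the config. The sketch below is a hypothetical helper (`load_inline_entries` is not part of OpenCC) built on Python's standard `json` module; OpenCC's own loader remains the authority. Because `json` accepts strict JSON only, comments or trailing commas are rejected here.

```python
import json
from typing import Dict, List, Tuple


def _reject_duplicate_keys(pairs: List[Tuple[str, object]]) -> dict:
    """object_pairs_hook: fail on a repeated key instead of keeping the last value."""
    obj: dict = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def load_inline_entries(dict_json: str) -> Dict[str, str]:
    """Parse one {"type": "inline", "entries": {...}} object and check `entries`."""
    # json.JSONDecodeError is a subclass of ValueError, so bad JSON also raises ValueError.
    spec = json.loads(dict_json, object_pairs_hook=_reject_duplicate_keys)
    if not isinstance(spec, dict) or spec.get("type") != "inline":
        raise ValueError('expected an object with "type": "inline"')
    entries = spec.get("entries")
    if not isinstance(entries, dict):
        raise ValueError("entries must be a JSON object")
    for key, value in entries.items():
        # JSON object keys are always strings; values may be any JSON type.
        if not key or not isinstance(value, str) or not value:
            raise ValueError(f"invalid entry {key!r}: {value!r}")
    # Returned exactly as parsed: no trimming, case folding or normalization.
    return entries
```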

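備註：OpenCC 1.3.2+ 解析器支援有限 JSONC 語法（`//`、`/* */` 註解與尾逗號）。
若需跨實作相容，建議使用嚴格 JSON，不依賴 JSONC 擴充。

Note: since OpenCC 1.3.2, the config parser accepts a limited JSONC syntax
(`//` and `/* */` comments and trailing commas). For compatibility across
implementations, prefer strict JSON and do not rely on JSONC extensions.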

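更多完整示例可見 `examples/config/`。該目錄僅供學習與自訂參考，不屬於官方內建
配置列表。

More complete examples are available in `examples/config/`. That directory is
meant for learning and as a customization reference only; it is not part of the
official list of built-in configurations.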

### Experimental Plugins 試驗性插件

OpenCC 現已支援外部 C++ 分詞插件。當前第一個插件為 `opencc-jieba`，
可通過 `s2t_jieba.json`、`s2tw_jieba.json`、`s2hk_jieba.json`、
`s2twp_jieba.json`、`tw2sp_jieba.json` 等插件配置啓用。

OpenCC now supports external C++ segmentation plugins. The first plugin is
`opencc-jieba`, which can be enabled through plugin-backed configs such as
`s2t_jieba.json`, `s2tw_jieba.json`, `s2hk_jieba.json`,
`s2twp_jieba.json`, and `tw2sp_jieba.json`.

注意：

- 該插件機制目前仍為試驗性功能。
- `jieba` 插件是可選組件，預設 OpenCC 構建、Python 套件和 Node.js 套件都不要求它。
- `opencc-jieba` 額外依賴 `cppjieba` 及其配套詞典資源，這些依賴僅在構建或分發該插件時需要。
- 在下一次正式發布版本之前，插件 ABI 仍可能發生變化，不應視為穩定介面。
- 我們預計從下一次正式發布版本開始，將插件 ABI 視為穩定介面。
- Windows 下插件必須與宿主 OpenCC 二進位使用 ABI 相容的工具鏈／執行時構建；MSVC 與 MinGW 產物不支援混用。

Notes:

- The plugin mechanism is currently experimental.
- The `jieba` plugin is optional and is not required for the default OpenCC
  build, Python package, or Node.js package.
- `opencc-jieba` additionally depends on `cppjieba` and its dictionary
  resources. These dependencies are only needed when building or distributing
  the plugin itself.
- The plugin ABI may still change before the next formal OpenCC release and
  should not yet be treated as stable.
- We expect to treat the plugin ABI as stable starting with the next formal
  OpenCC release.
- On Windows, plugins must be built with an ABI-compatible toolchain/runtime as
  the host OpenCC binary. Mixing MSVC-built hosts with MinGW-built plugins, or
  the reverse, is unsupported.

## Build 編譯

### Build with CMake

#### Linux & macOS

g++ 4.6+ or clang 3.2+ is required.

```bash
make
```

#### Windows Visual Studio

```bash
build.cmd
```

### Build with Bazel

```bash
bazel build //:opencc
```

### Test 測試

#### Linux & macOS

```bash
make test
```

#### Windows Visual Studio

```bash
test.cmd
```

#### Test with Bazel

```bash
bazel test --test_output=all //src/... //data/... //python/... //test/...
```

### Benchmark 基準測試

```bash
make benchmark
```

For details, see [doc/benchmark.md](doc/benchmark.md) / 詳情見該檔案。

## Projects using OpenCC 使用 OpenCC 的項目

If your project uses OpenCC, please add it to this list.

* [ibus-pinyin](https://github.com/ibus/ibus-pinyin)
* [fcitx](https://github.com/fcitx/fcitx)
* [rimeime](https://rime.im/)
* [libgooglepinyin](http://code.google.com/p/libgooglepinyin/)
* [ibus-libpinyin](https://github.com/libpinyin/ibus-libpinyin)
* [alfred-chinese-converter](https://github.com/amowu/alfred-chinese-converter)
* [GoldenDict](https://github.com/goldendict/goldendict)
* [China Biographical Database Project (CBDB)](https://cbdb.hsites.harvard.edu/)

## License 許可協議

Apache License 2.0

## Third Party Libraries 第三方庫

* [darts-clone](https://github.com/s-yata/darts-clone) BSD License
* [marisa-trie](https://github.com/s-yata/marisa-trie) BSD License
* [tclap](http://tclap.sourceforge.net/) MIT License
* [rapidjson](https://github.com/Tencent/rapidjson) MIT License
* [Google Test](https://github.com/google/googletest) BSD License
* [cppjieba](https://github.com/yanyiwu/cppjieba) MIT License
  - Optional dependency used by the experimental `opencc-jieba` plugin.
  - 試驗性 `opencc-jieba` 插件使用的可選依賴。

## Change History 版本歷史

* [NEWS](https://github.com/BYVoid/OpenCC/blob/master/NEWS.md)
  - See also / 另見: https://opencc.byvoid.com/news/

## Links 相關連結

* [Publications Using OpenCC](https://github.com/BYVoid/OpenCC/blob/master/PUBLICATIONS.md) - selected recent research papers that use OpenCC / 近年來使用了 OpenCC 的研究論文選錄
* [現代漢語常用繁簡轉換匹配辨析表](https://github.com/BYVoid/OpenCC/blob/master/doc/characters-easy-to-misuse.md) (reference table of easily confused Simplified/Traditional mappings in modern Chinese)
* 關於 [`opencc`](https://www.npmjs.com/package/opencc)、[`opencc-js`](https://www.npmjs.com/package/opencc-js) 與 [`opencc-wasm`](https://www.npmjs.com/package/opencc-wasm) 三個 NPM 套件區別的說明 (differences between these three npm packages):
  https://github.com/nk2028/opencc-js/blob/HEAD/README-zh-TW.md#%E8%88%87-opencc-npm-package-%E7%9A%84%E5%8D%80%E5%88%A5

## Contributors 貢獻者

* [BYVoid](http://www.byvoid.com/)
* [佛振](https://github.com/lotem)
* [Peng Huang](https://github.com/phuang)
* [LI Daobing](https://github.com/lidaobing)
* [Kefu Chai](https://github.com/tchaikov)
* [Kan-Ru Chen](http://kanru.info/)
* [Ma Xiaojun](https://twitter.com/damage3025)
* [Jiang Jiang](http://jjgod.org/)
* [Ruey-Cheng Chen](https://github.com/rueycheng)
* [Paul Meng](http://home.mno2.org/)
* [Lawrence Lau](https://github.com/ktslwy)
* [瑾昀](https://github.com/kunki)
* [內木一郎](https://github.com/SyaoranHinata)
* [Marguerite Su](https://www.marguerite.su/)
* [Brian White](http://mscdex.net)
* [Qijiang Fan](https://fqj.me/)
* [LEOYoon-Tsaw](https://github.com/LEOYoon-Tsaw)
* [Steven Yao](https://github.com/stevenyao)
* [Pellaeon Lin](https://github.com/pellaeon)
* [stony](https://github.com/stony-shixz)
* [steelywing](https://github.com/steelywing)
* [吕旭东](https://github.com/lvxudong)
* [Weng Xuetian](https://github.com/wengxt)
* [Ma Tao](https://github.com/iwater)
* [Heinz Wiesinger](https://github.com/pprkut)
* [J.W](https://github.com/jakwings)
* [Amo Wu](https://github.com/amowu)
* [Mark Tsai](https://github.com/mxgit1090)
* [Zhe Wang](https://github.com/0x1997)
* [sgqy](https://github.com/sgqy)
* [Qichuan (Sean) ZHANG](https://github.com/qichuan)
* [Flandre Scarlet](https://github.com/XadillaX)
* [宋辰文](https://github.com/songchenwen)
* [iwater](https://github.com/iwater)
* [Xpol Wan](https://github.com/xpol)
* [Weihang Lo](https://github.com/weihanglo)
* [Cychih](https://github.com/pi314)
* [kyleskimo](https://github.com/kyleskimo)
* [Ryuan Choi](https://github.com/bunhere)
* [Prcuvu](https://github.com/Prcuvu)
* [Tony Able](https://github.com/TonyAble)
* [Xiao Liang](https://github.com/yxliang01)
* [Frank Lin](https://github.com/frankslin)

Please feel free to update this list if you have contributed to OpenCC.
