Harman Patil (Editor)

CEDICT


The CEDICT project was started by Paul Denisowski in 1997 and is now maintained by MDBG under the name CC-CEDICT. Its aim is to provide a complete Chinese-to-English dictionary with pinyin pronunciations for the Chinese characters.

Content

CEDICT is a text file, so a separate program (or simply a text editor such as Notepad, or a search tool such as egrep or an equivalent) is needed to search and display it. The project is considered a standard Chinese-English reference on the Internet and is used by several other Chinese-English projects. The Unihan Database uses CEDICT data for most of its information about character compounds, but this data is auxiliary and is explicitly not part of the main Unicode database [1].

Features:

  • Traditional Chinese and Simplified Chinese
  • Pinyin (several pronunciations where applicable)
  • American English definitions (several per entry where applicable)
  • As of 14 February 2016, it had 114,087 entries [2], encoded in UTF-8.

The basic format of a CEDICT entry is:

    Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/
    漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/
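
The entry format above can be split into its fields with a short script. The following is a minimal sketch in Python, not part of the CEDICT project itself. The names `Entry`, `ENTRY_RE`, and `parse_entry` are hypothetical, and skipping lines that begin with `#` is an assumption about how comment lines are marked in the file.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: List[str]


# Traditional, Simplified, [pinyin], then slash-delimited definitions.
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one entry line; return None for blank, comment, or malformed lines."""
    line = line.strip()
    # Assumption: lines starting with '#' are comments, not entries.
    if not line or line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, body = match.groups()
    # Definitions are separated by '/'; drop any empty fragments.
    definitions = [d for d in body.split("/") if d]
    return Entry(traditional, simplified, pinyin, definitions)


entry = parse_entry("漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/")
print(entry)
```

Returning `None` for lines that do not match keeps the caller in control of how to treat malformed input, which seems a reasonable choice for a community-edited text file.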

Example of a simple egrep search:

    $ egrep -i 有勇無謀 cedict.txt
    有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/
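
Where egrep is not available, the same kind of case-insensitive line search can be approximated in a few lines of Python. This is only a sketch: `search_lines` and `sample` are hypothetical names, and the sample list stands in for the contents of `cedict.txt`.

```python
import re
from typing import Iterable, Iterator


def search_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that matches the pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            yield line


# Hypothetical in-memory sample standing in for the contents of cedict.txt.
sample = [
    "漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/",
    "有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/",
]

for hit in search_lines("有勇無謀", sample):
    print(hit)
```

To search a real file, an open file object can be passed in place of the list, for example `open("cedict.txt", encoding="utf-8")`, since the file is encoded in UTF-8.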

CEDICT has shown the way to some other projects:

  • HanDeDict, a free Chinese-German dictionary (127,000 Chinese entries)
  • CFDICT (200,000 entries) for French
  • Some older CEDICT data is also found in the Adsotrans dictionary.
  • ChE-DICC, a free Spanish-Chinese dictionary, started in February 2012 (in beta)
  • CC-Canto is Pleco Software's addition of Cantonese language readings in Jyutping transcription to CC-CEDICT
  • Cantonese CEDICT features Cantonese language readings in Yale transcription and has Cantonese-specific words, many of which were taken from "A Dictionary of Cantonese Slang" (possible copyright infringement)

References

CEDICT Wikipedia