Data
Composition data for thousands of Chinese characters, constantly updated as new evidence emerges.
The Kanjimori database contains composition data for thousands of Chinese characters with an emphasis on those used in Japanese (Kanji). It's a "living database," meaning it's constantly being updated and refined as new evidence for a character's origin emerges. Check out the database report below for insights based on our data!
Dataset Completion Progress
Character Complexity Variation Across Kanken Levels
Distribution of Roots and Phono-Semantic Compounds Across Kanken Levels
Dataset Breakdown by Character Type
Sem = Semantic, Ph-Sem = Phono-Semantic
*Phono-semantic roots are characters that were historically phono-semantic compounds but were corrupted beyond component recognition.
The Kanjimori database is in many ways a culmination of community efforts to better understand the origins of Chinese characters. Some key sources used in constructing the database include:
We sincerely thank the authors and contributors of these sources for their help in making Kanjimori possible!
Public distributions of the data are currently unavailable while Kanjimori is under development, but the data will be made public with regular releases in the near future. Check back later for updates!