Jim Rose's
kradfile-u (Unicode) License Agreement
Radical Decomposition of 13,108 Japanese Characters

A merger of kradfile, kradfile2, and 952 new decompositions.
(License applies to both kradfile-u & kradfile2).



DOWNLOADS:

EUC-JP Encoded KRADFILE2:
kradfile2.gz

UTF-8 Encoded KRADFILE-U:
kradfile-u.gz


MODIFICATION OF GRANT OF LICENSE (Effective January 1, 2012)

The (5) ADDITIONAL LIMITATIONS attached to the original license below are no longer requirements, but are suggestions.


ORIGINAL EFFECTIVE DATE:

June 10, 2009


GRANT OF LICENSE

This Grant of License fully covers the kanji radical decompositions of the 5,801 JIS x 0212 characters contained in the file "kradfile2", and also applies to those portions of the file "kradfile-u" consisting of the radical decompositions of the same 5,801 JIS x 0212 kanji contained in kradfile2, and the radical decompositions of an additional 952 JIS x 0213 kanji. This license also applies to the radkfile2 and radkfile-u derivatives of kradfile2 and kradfile-u respectively.

License is hereby granted to use the file known as "kradfile2", and / or those portions of the file known as "kradfile-u" which are derived from "kradfile2" and the additional 952 JIS x 0213 kanji radical decompositions therein, such that this Grant of License shall in form and function mirror the Grant of License contained in the GENERAL DICTIONARY LICENCE STATEMENT established by Professor Jim Breen and the ELECTRONIC DICTIONARY RESEARCH AND DEVELOPMENT GROUP at Monash University, Australia, in each and every way that said license pertains to Jim Breen's "kradfile" SUBJECT to the following (5) ADDITIONAL LIMITATIONS:

  1. You must provide James Rose, at email address Jim(at)KanjiCafe.com, the NAME of your project which will use either kradfile2 and / or kradfile-u, and...
  2. You must provide in the same message the general LOCATION of the project which will use either kradfile2 or kradfile-u, and...
  3. You must provide in the same message YOUR NAME, and...
  4. You must provide in the same message notification of WHICH FILE (kradfile2 or kradfile-u) you intend to use, and...
  5. Your license shall be considered legal and valid ONLY once this information is posted on this page in the next paragraph, recognizing you as a fully compliant and licensed user of the files:

You may optionally provide an additional link to a personal page about yourself.


COMMERCIAL LICENSE NOTICE

As of January 1, 2012, no special commercial license is required for use of either kradfile-u or kradfile2.



kradfile-u INTRODUCTION


KRADFILE - Unicode
(kradfile-u)

Radical Decomposition of 13,108 Japanese Characters

A merger of kradfile, kradfile2, and 952 new decompositions.

Copyright 2001 / 2007 / 2009:

952 JIS x 0213 kanji radical decompositions
Copyright 2009 James Rose and the KanjiCafe.com

The 5,801 JIS x 0212 kanji radical decompositions
Copyright 2007 James Rose and the KanjiCafe.com

The 6,355 JIS x 0208 kanji radical decompositions
Copyright 2001/2007 Michael Raine, James Breen and the
Electronic Dictionary Research & Development Group at Monash University.

A Grant of License detailing legal use of this file can be found at:
http://www.kanjicafe.com/kradfile_license.htm

Jim Rose:
In the CJK Unified Ideographs range of Unicode, Japanese Kanji were assigned code points corresponding to each of the characters in the 2,965 most common kanji of the JIS x 0208 Level 1, the 3,390 next most common kanji of the JIS x 0208 Level 2, and the 5,801 kanji of the JIS x 0212 standard intended to supplement and extend the JIS x 0208.

CJK Unified Ideographs Extension B of Unicode version 3.2 allocated code points to the 3,695 kanji defined in the 2004 JIS x 0213, which was also intended to supplement and extend the JIS x 0208. 952 kanji defined in the JIS x 0213 do not occur in the JIS x 0212 (which I reckon means that 3,058 kanji in the JIS x 0212 were not included in the JIS x 0213).

CJK Unified Ideographs Extension A added to its list of encoded Japanese characters the Unified Japanese IT Vendors Contemporary Ideographs. We do not believe these characters are of any practical value to users of current computing platforms and they are ignored.

Personal Note:
When I first created my version of a kanji selection by-multi-radical interface on the ICE MOCHA tool at Kanjicafe.com, I thought that the radical selection interface would make a good starting point for a tool to both glean errors and improve kradfile, and help build a bigger "kradfile" which added the JIS x 0212 kanji. Jim Breen had mentioned that he would like to see the JIS x 0213 set decomposed by radical, but my interface was designed to handle Extended Unix Code Japanese, which included the JIS x 0208 and JIS x 0212, but did not include the newer JIS x 0213 standard. Rather than deal with learning how to cope with Unicode, I plowed ahead and developed kradfile2 as a JIS x 0212 extension and companion to the JIS x 0208 based kradfile.

But since we all know that we are slowly migrating our tools and systems to Unicode, the idea of at least converting my little radical decomposition tool over kept gnawing at me. So after pestering Jim Breen some more about it, on June 1, 2009 Professor Breen sent me a file with the 952 JIS x 0213 kanji needed to make a "complete" radical decomposition of all the Japanese kanji defined in the CJK section of Unicode, and I commenced to finish this project to a higher state of "doneness".

Converting the tool over to Unicode opens up the possibility of using the same code base to develop radical decompositions in Chinese or Korean (the C & K of CJK), and if there is anyone interested in pursuing this, please contact me.

There are some noteworthy changes to the new file vis-a-vis the kradfile and kradfile2 legacy data.

1) The encoding scheme is no longer EUC-JP, with its convenient 2 bytes for each JIS x 0208 kanji and 3 bytes for each JIS x 0212 kanji. The encoding of this file is now UTF-8, and as such, the byte length of each character varies (from one to four bytes). Processing Unicode properly requires that your software does not rely on a fixed byte length. The primary reason for the change of encoding method is that the JIS x 0213 standard kanji are not defined in the Extended Unix Code Japanese encoding scheme (EUC-JP), which predates it.

2) UTF-8 is a Unicode encoding, but keep in mind that Unicode itself is not. There may come a time and place when you are using Unicode, but not UTF-8. I doubt it, but I thought I would just throw that out for clarity.

3) The original kradfile used JIS x 0208 kanji to represent radicals. In several instances there were no JIS x 0208 kanji which were also representative of the radical alone, so a JIS x 0208 kanji containing the radical was used as a kind of radical "place holder". When I developed kradfile2, I maintained this convention so that kradfile2 would be simple to integrate with existing tools already using kradfile.

The following legacy JIS x 0208 kanji "place holders" are now replaced by the radical/element itself:

Unicode's inclusion of the JIS x 0212 and JIS x 0213 kanji allows us to replace most of the "place holder" kanji with the actual radical. In fact, Unicode also defines all 214 Kangxi radicals from Mei Yingzuo's Zihui, or "Character Collection/Categorization" published in 1615, so we can do away with all but two JIS x 0208 representative "place holder" characters. One of these is a two stroke radical defined by Andrew Nelson in his 1962 "The Original Modern Reader's Japanese-English Character Dictionary". I'm not sure where the other, an 11 stroke radical, came from, but Jim can edit this sentence for me. These are represented instead by 并 (5E76) and 滴 (6EF4).
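The hexadecimal values quoted above are Unicode code points. A minimal sketch for checking them, assuming Python 3; the helper name code_point is made up for this illustration, and the Kangxi lookup relies only on the standard unicodedata module:

```python
import unicodedata

def code_point(ch: str) -> str:
    """Return the Unicode code point of one character as uppercase hex."""
    return f"{ord(ch):04X}"

# The two remaining "place holder" characters named in the text.
for ch in ("并", "滴"):
    print(ch, code_point(ch))

# The Kangxi radicals have their own Unicode block, separate from the
# CJK Unified Ideographs; U+2F00 is the first character in it.
print(unicodedata.name("\u2f00"))
```

Comparing code points rather than glyph shapes avoids confusing a radical with a kanji that merely looks the same in a given font.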

Other than the encoding change, the file is still in the same basic format as the legacy kradfile and kradfile2.
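To make the point about variable byte length concrete, here is a small sketch, assuming Python 3. The sample characters and the helper utf8_len are chosen for illustration only, and the sample line merely follows the documented line shape; it is not copied from the file.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    ":": "ASCII separator",
    "話": "kanji in the Basic Multilingual Plane",
    "\U0002000B": "kanji in CJK Unified Ideographs Extension B",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} ({label}): {utf8_len(ch)} byte(s) in UTF-8")

# Decode the bytes first, then index by character, never by byte offset.
raw = "話 : 口 舌 言\n".encode("utf-8")
line = raw.decode("utf-8")
kanji = line[0]  # first character, regardless of how many bytes it took
print(kanji)
```

Software that sliced the first 2 or 3 bytes of each line, as was possible with EUC-JP, would cut some of these characters in half.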

Decomposition of the JIS x 0213:
Two fonts were used in the decomposition of the JIS x 0213 so as to include as much variation in the appearance of the kanji as possible. There were several instances when one of the two fonts used (HiraMinPro-W3 and IPAMincho) showed a particular stroke more distinctly than the other, and vice versa.

Thus despite the numerical paucity of fonts which reach into the JIS x 0213, using two fonts provided enough variety to add valuable clarity when distinguishing strokes and choosing radicals / elements.

The useable portion of the file consists of 13,108 lines of text; one for each of the:

  - 6,355 kanji defined in the JIS x 0208-1997 standard
  - 5,801 kanji defined in the JIS x 0212-1990 standard
  - 952 kanji defined in the JIS x 0213-2004 standard
          and not found in the JIS x 0212

Each line is as follows:
  - the kanji itself,
  - a space followed by a colon (:) followed by a space,
  - one or more radicals/elements which can be seen in the kanji.
  - the radical/elements are themselves separated by a space

The decomposition is based on what can be seen in typical kanji glyphs. Elements themselves can be further subdivided.
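The line format described above can be read with a few lines of code. This is a sketch only, assuming Python 3, and it makes two assumptions that the text does not state: that the non-usable lines of the file are comments starting with '#', and that the download is a plain gzip of the text file. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List

def parse_krad_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each kanji to the list of radicals/elements on its line."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        # Assumption: header and comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        # Documented shape: kanji, " : ", then space-separated elements.
        kanji, sep, rest = line.partition(" : ")
        if not sep:
            continue  # not a decomposition line
        table[kanji] = rest.split()
    return table

def load_kradfile_u(path: str) -> Dict[str, List[str]]:
    """Hypothetical helper: read the gzip download as UTF-8 text."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_krad_lines(fh)

# Illustrative input only, not copied from the file.
sample = ["# a comment line", "話 : 口 舌 言"]
print(parse_krad_lines(sample))
```

If those assumptions hold, len(load_kradfile_u("kradfile-u.gz")) should equal the 13,108 usable lines, which makes a convenient sanity check.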

You can contact Jim Rose at Jim(at)Kanjicafe.com.

Jim Rose, Christiansted, United States Virgin Islands
June 2009


KRADFILE INTRODUCTION

##########################################################



K R A D F I L E
Copyright 2001/2007 Michael Raine, James Breen and the
Electronic Dictionary Research & Development Group at Monash University.
See: http://www.csse.monash.edu.au/~jwb/edrdg/licence.html
for permissions for use and redistribution.

This is the data file from which the "radkfile" is made, which in turn drives the multi-radical lookup method in XJDIC, WWWJDIC and possibly other dictionary and related software.
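The relationship between the two files is an inversion: kradfile lists elements per kanji, while a lookup tool needs kanji per element. A rough sketch of that inversion follows, assuming Python 3. It does not reproduce the actual radkfile format, and the function name and toy data are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

def invert(decompositions: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn kanji -> elements into element -> set of kanji."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for kanji, elements in decompositions.items():
        for element in elements:
            index[element].add(kanji)
    return dict(index)

# Toy data for illustration, not copied from the file.
toy = {
    "話": ["口", "舌", "言"],
    "舌": ["口", "舌"],
}
index = invert(toy)
for element in sorted(index):
    print(element, sorted(index[element]))
```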

The file is based on work done in 1994/1995 by Michael Raine in which he analyzed all the JIS1/2 kanji and identified the constituent radicals and other common elements, with the intention of facilitating the selection of kanji within a dictionary program by identifying multiple elements. The file was revised by Jim Breen in September 1995. Further revisions were done in 1998/9 at the suggestion of Wolfgang Conrath, then a revision was carried out in 2001 using suggestions from Yutaka Ohno based on a similar decomposition made by Kobayashi. Further amendments were made in July 2001 after suggestions from Hendrik.

The file consists of 6,355 lines of text; one for each of the JIS x 0208-1997 kanji. Each line is as follows:
  - the kanji itself,
  - a space followed by a colon (:) followed by a space,
  - one or more radicals/elements which can be seen in the kanji. These
    are drawn from JIS x 0208-1997. Where the element alone is not in JIS x 0208, a kanji which contains the element is used instead.

The decomposition is based on what can be seen in typical kanji glyphs. Elements themselves can be further subdivided. For example, 舌 is an element and so is 口, so the elements in 話 are <口 舌 言>.
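A multi-radical lookup then amounts to finding every kanji whose element list contains all of the selected elements. The sketch below, assuming Python 3, is one simple way to do that and is not taken from XJDIC or WWWJDIC; the function name and toy data are hypothetical.

```python
from typing import Dict, Iterable, List

def kanji_with_all(decompositions: Dict[str, List[str]],
                   selected: Iterable[str]) -> List[str]:
    """Return kanji whose elements include every selected element."""
    wanted = set(selected)
    return [kanji for kanji, elements in decompositions.items()
            if wanted <= set(elements)]

# Toy data for illustration; only the 話 entry follows the example above.
toy = {
    "話": ["口", "舌", "言"],
    "舌": ["口", "舌"],
    "言": ["言"],
}
print(kanji_with_all(toy, ["口", "舌"]))
print(kanji_with_all(toy, ["言"]))
```

Selecting more elements narrows the result, which is what makes the method practical for kanji whose reading is unknown.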

Jim Breen, Tokyo, January 2001
    Melbourne, July 2001
    Melbourne, Dec 2004

##########################################################
Nov 2004 - 八 replaced by ハ and 并
Aug 2005 - added 斉; replaced 薺 with 齊
Jan 2006 - added 一 to 今
Apr 2006 - changed 坐, 座 and 挫 from 入 to 人
Aug 2006 - added 卩 to 危 and 卵, dropped 刈 from 唖
Sep 2006 - added 刀 and 氏 to 齊 and derivatives
Nov 2006 - added 巛 as an indexer, replacing 川 for many kanji
Jan 2007 - revised 春榛奏泰椿俸奉捧棒湊輳 adding 人 and removing ノ
Sep 2007 - made sure all the 糸 indices also had 幺 and 小
Apr 2008 - added 廾 to all cases of 齊
Dec 2008 - added ハ to 詮,粉; 一 and | to 置; | and 丶 to 否
##########################################################



kradfile2 INTRODUCTION


K R A D F I L E - 2


Copyright 2007 James Rose and the KanjiCafe.com.

Special GRANT OF LICENSE is hereby given to James Breen and the Electronic Dictionary Research & Development Group at Monash University such that said licensees may maintain, modify, use, and redistribute this file. Derivatives should maintain this notice. All other rights reserved.

A Grant of License detailing legal use of this file can be found at:
http://www.kanjicafe.com/kradfile_license.htm

Kradfile - 2 was created by James Rose by means of analysis of all 5,801 JIS x 0212 Kanji and identification of the constituent radicals and other common elements, with the goal of extending the capability of current kanji selection by-multi-radical tools in this range. Care has been exercised to maintain the same format as the original kradfile by Michael Raine and Jim Breen to aid in integration with existing electronic dictionary programs.

Two fonts were used in decomposition so as to include as many glyphs as possible: one apparently based on the JIS x 0212 standard itself, and one based on Unicode. Each JIS x 0212 kanji is represented by 3 bytes in EUC-JP encoding, as opposed to the two bytes used in the JIS x 0208 range, so adjust your software accordingly if necessary.
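One way to "adjust your software" is to let a codec handle the 2-byte and 3-byte sequences instead of counting bytes by hand. A minimal sketch, assuming Python 3 and its built-in euc_jp codec; the helper names are hypothetical, and the choice of U+4E02 as a JIS x 0212 sample character is my assumption rather than something stated in the text:

```python
import gzip
from typing import Iterator

def euc_jp_len(ch: str) -> int:
    """Number of bytes one character occupies in EUC-JP."""
    return len(ch.encode("euc_jp"))

def read_kradfile2(path: str) -> Iterator[str]:
    """Hypothetical helper: yield decoded lines from the gzip download."""
    with gzip.open(path, "rt", encoding="euc_jp") as fh:
        for line in fh:
            yield line.rstrip("\n")

print(euc_jp_len("話"))       # a JIS x 0208 kanji
print(euc_jp_len("\u4e02"))   # assumed to be a JIS x 0212 kanji
```

Once the lines are decoded, each kanji is a single character no matter how many bytes it occupied on disk.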

The useable portion of the file consists of 5,801 lines; one for each of the JIS x 0212 kanji. Each line is as follows:
  - the kanji itself,
  - a space followed by a colon (:) followed by a space,
  - one or more radicals/elements which can be seen in the kanji. These
    are drawn from JIS x 0208-1997. Where the element alone is not in
    JIS x 0208, a kanji which contains the element is used instead.

The decomposition is based on what can be seen in typical kanji glyphs. Elements themselves can be further subdivided.

You can contact Jim Rose at Jim(at)Kanjicafe.com.

Jim Rose, Christiansted, United States Virgin Islands
September 2007
##########################################################


Here is an example of how different the same kanji may appear in different fonts. The image below was taken during the creation of kradfile2 in 2007:
Two alternate glyphs of the same kanji in kradfile2