|ICEMOCHA 「氷モカ」 Credits|
|And a Brief Record of its Evolution|
The Historical Story of Ice Mocha's Evolution
- with Attributions Given to Those Who Assisted in Making a Dream into a Reality
It all began, oddly enough, in Australia. In 1991, Professor Jim Breen at Monash University began compiling EDICT and KANJIDIC, a project that has now run for two decades. EDICT is an electronic Japanese-English dictionary file project, and KANJIDIC is a collection of information pertaining to 6,353 JIS X 0208 kanji. Modern forms of these files are now the basis of countless Japanese language learning tools all over the world - none of which could have existed without Professor Breen's two-decade-long devotion to the project.
I left MIT (2002) a year after I founded Rolomail Trading, which gave me an income and the ability to focus on this project for days on end up until I got married in 2004. Rolomail was a company devoted to selling Japanese language learning products, and it paid for me to pursue my dream of developing the online tools to exploit the potential Jim Breen's work had opened up. Ice Mocha itself began with the creation of the supporting files necessary to build a Japanese text imaging proxy server, which is fully integrated into every function of the application that I've ever added. Proxy server development slowly crawled along in 2002, whereupon I began the restructuring of EDICT's format in the fall of 2003 to make it a better fit for what Ice Mocha would actually do. I decided early on to build a file which would contain the position of the yomigana for each of EDICT's then 106,000+ words, the calculation of which, through computation, is no trivial problem. Euphonic variations in kanji readings were theorized from KANJIDIC, and EDICT's reading errors were exhaustively rooted out and repaired using tools built to find mismatched readings with words. Most of these errors were passed on to Professor Breen in Australia - to give future generations of tool builders a cleaner EDICT to work with. Two attempts were made to derive a file containing EDICT's yomigana with some help from KANJIDIC. The first one was an extensive tour-de-force of predictive and retroactive logic. It failed miserably, only correctly parsing the reading of about 85% of EDICT. Then in a flash of caffeine-induced genius, I conceived of a kind of "extending string theory" to derive the yomigana from the reading, which when combined with a euphony hunting algorithm, correctly parsed about 99.5% of the mammoth file. Having received this deep revelation while drinking an Ice Mocha from Brügger's Bagels, a tradition of naming .
Most people now have fairly simple access to Japanese fonts on their computer. When I first wrote this program the proxy server feature on Ice Mocha was more important than now. Much of this technology requires the Perl module GD.pm written by Lincoln Stein at the Cold Spring Harbor Laboratory. GD.pm is a Perl 5.x interface to Thomas Boutell's gd library that allows you to generate PNG and JPEG images on the fly.
In the first week of May 2004, Jim Rose added the ability to view and edit the three study lists, and to shuffle the order of words on the lists.
On August 11, 2004, Jim Rose rewrote the way Ice handles the kanji information file so as to end its former practice of speedy, but RAM-wasting "file slurping". Part of the reason is that Jim found converting Ice from Perl to mod_perl unfruitful - primarily because he doesn't really understand the whole multi-threaded processes concept in Unix machines and why they just have to screw up all the variable values. So Jim thought he had better conserve some RAM inside of regular ole Perl in preparation for swarms of users making hundreds of page loads on the server. His Perl-only modification will allow for the eventual expansion of the file to cover kanji etymology etc., as well as enhance and preserve the application's performance a little better when there are multiple people using the application over the web at the same time (despite not running Ice more directly as an Apache module as it would in mod_perl). Only the most minor of files are now slurped into memory - and at the moment, the mod_perl conversion has been abandoned. The application's speed does not seem to be diminished at all from this upgrade, and mod_perl conversion can probably be avoided as long as there are fewer than 10,000 users, right? Right? As of this date, there are nearly 1,400 Ice Mocha users... this despite virtual invisibility on the web.
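The slurping trade-off can be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl that Ice itself uses; the tab-separated file layout and the function names (`lookup_slurped`, `lookup_streamed`, `make_sample_file`) are assumptions made for the example, not Ice's actual code or file format.

```python
import os
import tempfile


def lookup_slurped(path, key):
    """Slurp the whole file into a list first, then search it.

    Simple and quick for one process, but every concurrent request
    holds its own full copy of the file in RAM.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()  # entire file in memory at once
    for line in lines:
        if line.split("\t", 1)[0] == key:
            return line.rstrip("\n")
    return None


def lookup_streamed(path, key):
    """Scan the file one line at a time, keeping only the current line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:  # the file object yields lines lazily
            if line.split("\t", 1)[0] == key:
                return line.rstrip("\n")
    return None


def make_sample_file():
    """Write a tiny stand-in for the kanji information file (hypothetical format)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write("日\tday; sun\n月\tmonth; moon\n氷\tice\n")
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        print(lookup_streamed(sample, "氷"))
    finally:
        os.remove(sample)
```

Both functions return the same record; the difference is only in how much of the file sits in memory while the search runs, which is what matters when many page loads arrive at once.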
On October 15, 2004, Jim Rose completed the ability to add vocabulary words from either the JLPT 4, 3, 2, or 1 (Japanese Language Proficiency Test) that were not already on one of your study lists (a, b, or c) - idea suggested by Alanna. He also cleaned up the controls a little, placed a letter by the word being studied to tell you which study list it is on, and put in a safeguard against changing a word's emphasis if it's not the current word being examined. Alas, he wanted to do more, but grew tired from lack of sleep.
On November 30, 2004, the very first Stroke Order Diagrams (SODs) created by the SOD Editor-Retrographer (SODER v.I) were uploaded into the Ice Mocha application. The SODs display whenever more information is requested for a given word's kanji. SODER is one of the first Internet tools Jim (Rose) ever developed after doing actual research and experimentation - in fact several years' worth. That either means it's really cool, or he's a really slow programmer. It has many little subsystems which make it tick, and even though it is certainly neither optimized nor pretty, we're all quite proud of it. A list of the 5 top contributing volunteers is now on the KanjiCafe.com homepage. So far, several hundred SODs of 10 strokes or fewer are available to Ice users, with more added every day. Improved versions of SODER are required to complete the entire 6,000+ kanji used by Ice, and are on the way.
As of July 23, 2005, there are about 3,000 registered users. Jim is working out the details of a kanji "look-up by radical" function, and will begin programming shortly.
On August 23, 2005, Jim (Rose) adds the "radicals" feature to Ice, allowing the user to perform kanji lookup-by-multi-radical. This feature is made possible by the incorporation of an improved version of "RADKFILE" (Copyright 2001 Michael Raine, J. W. Breen) into the Ice Mocha brain. According to Professor Breen, Ice is now one of only two websites to support this feature. The first was WWWJDIC. RADKFILE is based on work performed in 1994/1995 by Michael Raine in which he analyzed all the JIS1/2 kanji and identified the constituent radicals and other common elements, with the intention of facilitating the selection of kanji within a dictionary program by identifying multiple elements. The file was revised by Jim Breen in September 1995. Further revisions were carried out in 1998/1999 at the suggestion of Wolfgang Conrath, then a revision was carried out in 2001 using suggestions from Yutaka Ohno based on a similar decomposition made by Kobayashi. Further amendments were made in July 2001 after suggestions from Hendrik. Jim Rose made several corrections to the file, and organized its kanji by stroke count before wrapping a library of Mocha code around it.
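For the curious, lookup-by-multi-radical boils down to intersecting sets: each radical maps to the kanji that contain it, and picking several radicals keeps only the kanji common to all of them. The sketch below is a guess at the general shape, not Ice's Perl library. The tiny `RADICAL_TO_KANJI` and `STROKES` tables are hand-made toy data rather than the real RADKFILE, and `lookup_by_radicals` is a hypothetical name.

```python
# Toy radical -> kanji table (hand-made for illustration, not RADKFILE itself).
RADICAL_TO_KANJI = {
    "日": {"明", "時", "朝", "早"},
    "月": {"明", "朝", "期"},
    "十": {"朝", "早"},
}

# Stroke counts for the toy kanji, used to order the results.
STROKES = {"明": 8, "時": 10, "朝": 12, "早": 6, "期": 12}


def lookup_by_radicals(radicals):
    """Return kanji containing every selected radical, ordered by stroke count."""
    if not radicals:
        return []
    # Unknown radicals contribute an empty set, so the intersection is empty.
    sets = [RADICAL_TO_KANJI.get(r, set()) for r in radicals]
    candidates = set.intersection(*sets)
    return sorted(candidates, key=lambda k: (STROKES[k], k))


if __name__ == "__main__":
    print(lookup_by_radicals(["日", "月"]))
    print(lookup_by_radicals(["日", "月", "十"]))
```

Each extra radical can only shrink the candidate list, which is why selecting two or three elements is usually enough to narrow thousands of kanji down to a handful.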
September 6, 2005, You can now have a SODA with your Ice Mocha. At some God-awful hour in the very early morning, Jim (Rose) uploaded the first 500 "light" version Stroke Order Diagram Animations (SODAs). With the flick of the return key on his Macintosh Terminal, SODs from the SODER project were whisked down from the web server, turned into animations, and then promptly and automatically shuffled back up to the server from the Mac. (Actual conversion only took 12 seconds.) Each time new SODs are created by the volunteer corps of the SODER project, the diagrams are downloaded, and the animations are created and uploaded. Too tired to appreciate the magnitude of his accomplishment, Jim promptly went to bed. He also found time to completely rewrite the way Ice handles the login registration file and believes that some people have permanently lost their account because of hasty programming decisions made in December 2003 to hurry the release of the multi-user version of the Ice Mocha application - never thinking there would be 3,000 accounts in just a year-and-a-half pushing Jim's stop-gap, band-aid code to its limits. This problem should now be a thing of Ice Mocha's primordial past - if you have lost access to your account for no apparent reason, and the ERROR messages DO NOT say that you have already registered an account under that email, you are urged to register a new account. Work now continues on a "heavy" version of Animated Stroke Order Diagrams (ASODs). Whereas "light" SODAs show one stroke per frame, "heavy" ASOD animations will depict the ink of the brush pen being laid in smooth brush motions. If you don't have the bandwidth for the coming ASODs, stick to consuming SODAs. All of these visual media are derived from the same SODER group: SOD => SODA => ASOD.
On December 16, 2005, Jim (Rose) released SODER v.III, an all new point-n-click version of the web-based graphics editing application used by volunteers over the Internet to create Stroke Order Diagrams (SODs) for Ice Mocha. There are 652 SODs at this point, from 1 to 12 strokes, with the goal of creating at least 6,353 in total. So just as the best tool in existence for the project is released, we're more than 10% through the project. Everyone is working hard to try and finish 1,000 diagrams of up to 20 strokes each by early 2006.
As of December 21, 2005, kanji look-up will now display the character's associated radicals and meanings used for lookup-by-multi-radical. These can often provide an etymological insight into the kanji's meaning. Some of the 250 radicals currently used by Ice Mocha are in fact kanji, and often kanji which are not in the JIS X 0208 standard, but perhaps in the JIS X 0212 set. Testing has shown that Firefox, Netscape, and Safari are capable of JIS X 0212 display, and in non-proxy mode, Ice sends some of these radicals as JIS X 0212 kanji if it detects any of these three browsers.
January 30, 2006 Ice Mocha's dictionary expanded. Ice Mocha's dictionary was originally based on a 2003 version of EDICT. The January 19, 2006 version of EDICT was proofed and error reports sent to Australia. The yomigana parsing engine was tweaked and over 116,000 words were parsed in just over 104 minutes. The new basis files add some 5,000 new priority words and relegate 2,000 formerly priority words to the non-priority category; in all, the new dictionary adds about 10,000 words to Ice Mocha. A brute force algorithm mapped all old dictionary indices stored on users' a, b, and c lists to the new dictionary. The mapping required 2 days of computation time, but future upgrades hope to exploit similarities between EDICT versions to dramatically shorten mapping time, with a target of 20 minutes. A starting point codebase has been created to make updating Ice Mocha to new versions of EDICT less painful. In the future it is hoped such upgrades will only require a few hours of computation with further enhancements and optimizations of the code base. New account creation has been CGI wrapped to move Ice Mocha closer to this goal. This means nothing to you unless you've built a similar application, so don't sweat it. Ice's dictionary now stores its yomigana data in compressed format. JLPT 4 through 1, suggested words, etc., have all been updated to the new indices.
Also on February 14, 2006, Jim Rose restored some sanity to Ice Mocha by implementing some common sense features like the 'OK' button, which allows you to turn off lists of suggested words, JLPT 1-4 words, lists of words in your 'a', 'b', and 'c' lists, words discovered by searching the dictionary, and the radical search table and/or results. On an HTML4 compliant browser you can also toggle the 'OK' feature by depressing the letter 'o'. You can also conjure up the radical table on said HTML4 compliant browser now by depressing the 'b' (think bushu) button on your keyboard. And most importantly, you can resume working on the ABC study list word after selecting other words from one of the above lists by pressing the 'return' button, or by depressing the 'r' key in an HTML4 compliant manner. The absence of these tiny features created a real headache for those who sought temporary adventures in the midst of pounding through their study list regimes.
On February 14, 2006 Jim Rose completes a 1st attempt at integrating Ice Mocha with the Tanaka Corpus. This huge body of sentence pairs was compiled by the late Professor Yasuhito Tanaka at Hyogo University, who had released the corpus into the public domain several years ago. Over the years, each of Tanaka sensei's students was tasked with collecting 300 Japanese-English sentence pairs. Many appear to be derived from textbooks, the Bible, children's stories, and songs. Some are a bit long. Jim Breen later standardized the entries, added some gender-specific clues, removed duplicates, corrected many of the errors (with a little help from the sci.lang.japan gang), and processed the Japanese sentences with the Nara Institute of Science and Technology's ChaSen morphological analysis program. The end result was a pretty handy, although often incomplete, glossary of Japanese words for each sentence. There is still much work to do on the Corpus, and it has many flaws, but it nevertheless constitutes a fundamental alteration to Ice Mocha's capabilities. You can now spend hours learning to read, all the while keeping on track with an emphasis on your vocabulary words. Happy Valentine's Day!
On February 15, 2006 Jim Rose added keyboard commands for viewing kanji information with HTML4 compliant browsers. If the kanji is the 1st one listed after the word being studied, simply press '1' on your keyboard. If it is the 2nd kanji, press '2', etc. The key, as always, to truly enjoy Ice Mocha, is to make sure you are NOT using Microsoft Internet Explorer. It gives the worst interpretation of the application, and cannot process key commands. The 'radicals' function, or as it is often called 'kanji lookup-by-multi-radical', has been overhauled to allow you to return to the same 'state' after leaving the radical selection table for any reason. This way you can look up one kanji, get lost exploring links, and simply return to the same batch of similar kanji without replugging in the radical choices. Some other minor details you will not notice involve how the stroke order diagrams and animations are stored on the server.
On February 17, 2006 2:30 AM Ice Mocha began keeping track of which example sentence was last viewed for any and all words on your study list. This feature will help keep you from seeing the same sentences over and over.
March 10, 2006 Added a toggle to turn a list of keyboard commands on and off. Some improvement of the interface.
March 16, 2006 While frustrated over the complexity of trying to overhaul Ice's entire infrastructure to byte offsets, I took a break and started adding both Japanese names and English nicknames to the radical decompositions on the kanji pages. Did anyone notice?
March 17, 2006 Do you want the bad news first, or the good news first? The bad news is that I had to reset everyone's account to day 1. This will naturally anger a few, frustrate some, but hopefully most of you will simply roll like an aikido master with the punches and use this as an opportunity to be more picky about which words to add to your list this time around. And why did this happen? Why, Jim, have you inconvenienced us? I didn't want to, it just worked out that I had to. The good news is this: The reason is that I've completely re-engineered a good bit of the file system under the hood. After incorporating Ice with the Tanaka Corpus (TC), it struck me and several other people that Ice Mocha had fundamentally changed. What was in its infancy a glorified flash card system par excellence had suddenly matriculated into the realm of the supernatural. It has actually become an almost magical teaching tool that mesmerizes the user, who without much effort at all finds himself actually learning to read Japanese. Killing time with Ice Mocha is like taking hyper-linked adventures into the Matrix - a universe of interconnectedness. So Jim Breen's monumental task of cleaning up Professor Tanaka's corpus deserved my full attention, and whatever cooperation I could muster from my aging life force.
With error reports flooding the Internet, and SLJ regulars like Paul Blay streaming in fixes by the thousands, the TC and its glossary are evolving at a record pace, and in order to keep up, EDICT itself is making evolutionary jumps. Ice Mocha was originally built on a 2003 version of EDICT. The next upgrade didn't happen until 2006. For Ice Mocha users to assist in making a better Corpus, Ice needed to reflect the corrections in quasi-real time.
But alas, Ice Mocha was quite the high-maintenance spirit. In keeping with her demanding architecture, the time needed to process EDICT and the TC was measured in days, and the resulting file system weighed in at nearly 400 MB. To contribute meaningful error reports to Jim in Australia, Ice's underlying file system would have to be rebuilt with freshly corrected files on a nightly basis. Achieving that goal required about a month of programming to streamline and automate the downloading, processing, and uploading of the file system. Extensive use of bifurcated binary sorting algorithms intermingled with Schwartzian transforms and fuzzy logic had to be applied to normalize processing speeds. Unix installations were made. A twin Macintosh system took shape so that two processors could share duties. Tools like wget and Perl modules like Time::HiRes and Encode were pressed into service. Ice had become a "system" which can in theory now upgrade itself in 6 hours with very little human interference. So in the midst of all this massive engineering, I decided I might as well upgrade how Ice accesses the dictionary files too. Instead of searching files by index, I've shifted to using byte offsets. She's become less of a hack. Instead of starting the file pointer at the beginning of a file and slowly moving through it, counting how many records are passed until reaching the word you're looking for, this technique places the file pointer directly at the very byte on the hard drive where your record starts. That meant that not only did Ice and its dozen or so function libraries have to be rebuilt, but indices of the files had to be created and sorted, and the type of indices used to store words on your study lists had to change as well. The payoff is that byte offset indexing is much faster. Hopefully you will notice!
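The difference between counting records and jumping to a byte offset can be sketched like this. It is a minimal Python illustration, not Ice's Perl code: the one-record-per-line EUC-JP sample file and the names `build_offset_index`, `fetch_by_counting`, and `fetch_by_offset` are all assumptions made for the example.

```python
import os
import tempfile

# Hypothetical sample records in an EDICT-like layout.
RECORDS = ["氷 [こおり] /ice/", "猫 [ねこ] /cat/", "犬 [いぬ] /dog/"]


def make_sample_file():
    """Write the sample records as EUC-JP, one per line; returns the path."""
    fd, path = tempfile.mkstemp(suffix=".euc")
    with os.fdopen(fd, "wb") as fh:
        for record in RECORDS:
            fh.write(record.encode("euc_jp") + b"\n")
    return path


def build_offset_index(path):
    """One pass over the file, remembering the byte where each record starts."""
    offsets = []
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            if not fh.readline():
                break
            offsets.append(pos)
    return offsets


def fetch_by_counting(path, n):
    """Old style: start at the top and count records until the n-th one."""
    with open(path, "rb") as fh:
        for i, line in enumerate(fh):
            if i == n:
                return line.decode("euc_jp").rstrip("\n")
    return None


def fetch_by_offset(path, offset):
    """New style: put the file pointer straight on the record's first byte."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.readline().decode("euc_jp").rstrip("\n")


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        index = build_offset_index(sample)
        print(fetch_by_offset(sample, index[2]))
    finally:
        os.remove(sample)
```

Counting costs time proportional to how deep the record sits in the file, while a seek costs about the same wherever the record is. The price is that a stored offset is only valid for the exact file it was computed from, which fits with the study lists having to change when the indices did.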
So now that Ice is on the verge of being able to update itself to versions of EDICT and the TC scarcely hours old, I'm striving to incorporate an error/suggestion feature into Ice Mocha to help clean up the Tanaka Corpus even further (with YOUR help): the missing glosses, the premature parses, etc. Given how enormous the TC is, errors are bound to be plentiful for some time. Hopefully you will see new buttons for suggesting errors in the near future (hopefully before Jim returns from Europe in 6 weeks).
March 26, 2006 Fixed a bug first noticed by Kim Desmond where dictionary searches ignored exact matches if the end of the match was also the end of the dictionary entry. Exact searches for words like "cat" gathered lots of cat entries, but ignored "cat" itself. This bug doesn't seem to have existed before the March 17 update, so I must have inadvertently removed the EOL regex condition thinking it wasn't necessary. So much for programming late at night with your eyes half shut.
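A bug of this kind is easy to reproduce. The sketch below is only a reconstruction of the general shape, in Python rather than Perl; it assumes, purely for illustration, that glosses within an entry are separated by ";" with no delimiter after the last one. The entry list and the `exact_search` function are hypothetical, not Ice's real data or code.

```python
import re

# Hypothetical gloss strings; ";" separates glosses, nothing follows the last one.
ENTRIES = ["cat", "cat;feline", "tomcat;male cat", "catalog", "wild cat"]


def exact_search(word, entries, allow_eol=True):
    """Find entries that have `word` as one complete gloss.

    With allow_eol=False the pattern demands a ";" after the word, which
    mimics the bug: a gloss sitting at the very end of an entry is missed.
    """
    tail = r"(?:;|$)" if allow_eol else r";"
    pattern = re.compile(r"(?:^|;)" + re.escape(word) + tail)
    return [entry for entry in entries if pattern.search(entry)]


if __name__ == "__main__":
    print(exact_search("cat", ENTRIES, allow_eol=False))  # buggy variant
    print(exact_search("cat", ENTRIES, allow_eol=True))   # with the EOL condition
```

Without the end-of-line alternative the bare entry "cat" is the one result that goes missing, which matches the symptom Kim reported.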
June 1, 2006 Jim & Arlene Rose nearly killed in a freak accident when their Jeep flips back over front 7.5 times at 80 mph. The laptop containing software which converted SODER project diagrams into animations was ejected from the vehicle. The SODER project, as well as all Ice Mocha development comes to a complete halt.
October 14, 2006 Jim successfully installs modern GD.pm on a Mac Mini running OS X 10.4.8 and the SODER project resumes. Ice Mocha has well over 1,100 SODs and SODAs at this point. Tracy Gittins emerged as a major driving force in submitting the most diagrams for me to edit and became an important person to this project.
May 3, 2007 This may not actually be worthy of mention, but there is now a tiny 10 X 5 pixel image which says "SOD" if the Stroke Order Diagram for a given kanji in the word being examined exists. Most of Jim Rose's current efforts at Ice Mocha development are devoted to an overhaul of the radical decomposition system, and editing the backlog of SOD submissions. 1,445 SODs and SODAs are edited and available for use.
July 24, 2007 The 3rd public release of Stroke Order Diagrams (SODs) and Stroke Order Diagram Animations (SODAs) hits the Internet today. 1,500 kanji depicted. The archive is now the 2nd largest on the Internet, and the largest that you can download to your own website or home computer. The next public release will be the largest set on the Internet.
November 23, 2008 fixed a bug discovered by Alex Pankhurst that affected a single word in the dictionary that could neither be added, moved, nor removed from the a list. There are 1,643 SODs and SODAs created by the SODER project and available.
May 26, 2009 removed kanji selection by keyboard whenever the radical table was displayed as it was causing some browsers to show kanji information when you tried to change the minimum or maximum stroke count of the character you were searching for. There are 1,663 edited SODs and SODAs online. A new project to create kanji etymologies is underway.
May 30, 2009 fixed a bug in which after doing a dictionary search, ICE no longer told you which list (a, b, or c) the words in the example sentence might be in. This had a side effect of corrupting the word lists if you tried to add a word already on the list but didn't realize it because you had looked up a dictionary word on the same page. To correct the list corruption, delete any word from your list and the corruption should be repaired (I think). You can recognize the corruption because you may ask the application to display 25 of your 100 "a" list words, and only 8 appear for example. Deleting a word seems to remove the problem. ICE now correctly displays the list status of words in the example sentences when also showing dictionary search results. There are 1,664 edited SODs and SODAs, and 15 kanji etymologies online.
February 17, 2010 Cyberknight Massao Kawata sent me a note concerning a style attribute in SODER's image which would fix the scaling problem created by Firefox's decision to abandon nearest-neighbor scaling. The attribute triggers the browser to revert to the "fastest" rendering speed, which has the effect of restoring nearest-neighbor scaling and the visual integrity of the SODER editing tool. I actually implemented this on March 20, 2010. SODER is, however, still not producing diagrams because my hard drive died, and the new one, of course, does not have GD.pm working. Why Apple refuses to ship Darwin with a working version of GD/GD.pm with Perl I cannot begin to fathom.
January 20, 2012 Allowed the server to die because it was too expensive. Never bothered to download the site, never thinking it would be resurrected. Found hosting that seems capable of the same level of performance, and decided to see if I could locate all the necessary files after 4 successive hard-drive crashes. Can't find all of the 48 pixel proxy plates used to make SODs and SODAs. Will have to attempt to rebuild the plates. Slowly uploading resurrected site. Ice Mocha is about 95% of the site. Lots of work.
February 1, 2012 Completely restored the old proxy server, and extended it up into the 64 pixel size. Rebuilt the page structure, and upgraded the horizontal word's font size from 48 pixels to 64 pixels, since there is now the capability of sending that size as an image. Fixed an old, undiscovered bug that would not allow you to add example words from the example sentences if it was the only word in the list not already on your study lists. Am now able to develop code offline on my Macintosh Apache server, and directly upload to the server without code change. Have the potential to actually serve the site from my Mac, and will experiment with that soon. Hope to restore the SODER project soon, at least on my own machine. I seem to have a renewed love for Ice Mocha as I now live in Hawaii and have the chance to speak Japanese every day. Expanded the number of functions that could be controlled by the keyboard, although it appears Firefox is no longer controllable via keystrokes. "auto", "fade", "proxy", and "search" can all now be summoned by depressing "a", "f", "p", and "s".
February 26, 2012 Fixed a bug, so you can now use the regex start-of-string anchor "^" in your dictionary searches. Several regular expressions can be used to locate a list of specific words for study, using "|" to separate the query criteria. "$" can be used as the end-of-string anchor. Most words can be identified by their kanji form followed by their reading without a space between. Kana-only words can be narrowly searched for in such groupings by use of the "^" and "$" string anchors.
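As a rough illustration of how such queries behave, here is a small Python sketch (Ice itself is Perl). It assumes each word is matched against a key made of its kanji form immediately followed by its reading, or just the reading for kana-only words. That key layout, the sample word list, and the names `search_key` and `search` are guesses for the sake of the example, not a description of Ice's internals.

```python
import re

# Hypothetical (kanji, reading) pairs; kana-only words have no kanji form.
WORDS = [
    ("猫", "ねこ"),
    ("犬", "いぬ"),
    ("子犬", "こいぬ"),
    (None, "これ"),
    (None, "これから"),
]


def search_key(kanji, reading):
    """Kanji form followed directly by the reading, with no space between."""
    return kanji + reading if kanji else reading


def search(query, words):
    """Return the words whose key matches the user's regular expression."""
    pattern = re.compile(query)
    return [w for w in words if pattern.search(search_key(*w))]


if __name__ == "__main__":
    # Two specific words at once, each pinned with ^ and $.
    print(search("^猫ねこ$|^犬いぬ$", WORDS))
    # Unanchored, the kana query also picks up longer words containing it.
    print(search("これ", WORDS))
    # Anchored, only the exact kana-only word remains.
    print(search("^これ$", WORDS))
```

The anchors matter most for short kana words, which otherwise turn up inside the readings of a great many longer entries.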
February 27, 2012 In preparing the way for future versions of Ice Mocha, it occurred to me that converting Ice's logic to process UTF-8 encoded Japanese was unnecessary. I simply needed to change the encoding of the display. Today I converted Ice's code base to emit UTF-8 Unicode but continue to process the convenient EUC-JP encoded text files with their predictable byte size. The first small benefit was allowing the replacement of all but two small images in the select kanji by multi-radical box with text. Most of the radicals used by the bushu search box are defined as characters if you can display all of the JIS X 0212 and JIS X 0213 kanji. In Unicode, life is easier since both are therein defined. The next immediate bonus will be to allow etymologies which use very rare kanji to tell the story of character evolution, and shortly after that I expect the benefit of kradfile-u's 13,108 radical decompositions and being able to display all 13,108 kanji. I will probably build a new proxy server that can handle about 14,000 characters, as I'm sure many people will not be prepared with the correct fonts to handle all that power.
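The idea of processing in EUC-JP and only transcoding at the moment of output can be sketched as below. This is a Python illustration under stated assumptions, not the Perl in Ice: `euc_to_utf8` and `nth_char_euc` are hypothetical helper names, and the fixed two-byte slicing only holds for text made up entirely of JIS X 0208 kana and kanji.

```python
def euc_to_utf8(euc_bytes):
    """Transcode EUC-JP bytes to UTF-8 bytes just before they are sent out."""
    return euc_bytes.decode("euc_jp").encode("utf-8")


def nth_char_euc(euc_bytes, n, width=2):
    """Slice the n-th character out of an EUC-JP byte string by arithmetic.

    Assumes every character is `width` bytes, which is true for JIS X 0208
    kana and kanji in EUC-JP (ASCII is 1 byte and JIS X 0212 is 3 bytes,
    so mixed text would need real decoding instead).
    """
    return euc_bytes[n * width:(n + 1) * width]


if __name__ == "__main__":
    word_euc = "日本語".encode("euc_jp")   # internal form: 2 bytes per kanji
    second = nth_char_euc(word_euc, 1)     # plain byte arithmetic, no decoding
    print(euc_to_utf8(second).decode("utf-8"))
    print(len(word_euc), len(euc_to_utf8(word_euc)))
```

The same three kanji take 6 bytes in EUC-JP and 9 in UTF-8, and in UTF-8 the width per character varies, so byte-position logic written for EUC-JP can stay untouched as long as the conversion happens only at the display boundary.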
© 2003-2012 James Linden Rose, Kingdom of Hawai'i