Cannot edit Japanese strings in Contents

From SuperMemopedia

Problem

SuperMemo 2006 can display Unicode in the Contents window. However, the local Contents editor is not Unicode-enabled. This causes problems when editing Asian-language strings in Contents.

Report

Japanese characters do not always display correctly in Contents, although some of them do. It appears to be an all-or-nothing issue for the entire item: a content list entry is either entirely correct or entirely wrong, never only partially incorrect.

I can easily reproduce the broken display. If you enter hiragana and the number of characters is odd, the display in the content list breaks; if the number is even, the entry displays correctly. This appears to apply per word token, or something of that nature. (I'm not sure if you know much about Japanese, but in the examples below each character is written as two English letters, so "ni" is one character and "nu" is another.)

Here are some examples:


ninu = ok (two characters)
ninuninu = ok (even number of characters)
ni = broken (odd)
nu = broken (odd)
ninuni = broken (odd)
ninuni ninuni = broken (two 3-character words)
ninu ninu = ok (5 characters including the space, but the Unicode words are in groups of two characters)
ninu abc = ok (groupings of non-Unicode letters don't appear to affect anything)
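The reported pattern can be restated as a simple rule: an entry displays correctly only when every run of non-ASCII characters has an even length. The sketch below encodes that rule and checks it against the examples above, rewritten in actual hiragana (に for "ni", ぬ for "nu"). It models the observation only. `predicted_ok` is a hypothetical helper and makes no claim about how SuperMemo actually generates titles.

```python
import re

# Maximal runs of non-ASCII characters (e.g. hiragana); ASCII letters and
# spaces end a run, which matches "per word token" in the report.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def predicted_ok(title: str) -> bool:
    """Hypothetical model of the report: OK only if every non-ASCII run is even.

    A title with no non-ASCII runs is predicted OK (all() of nothing is True).
    """
    return all(len(run) % 2 == 0 for run in NON_ASCII_RUN.findall(title))


# The reported examples, written in hiragana, with the reported outcome.
REPORTED = {
    "にぬ": True,
    "にぬにぬ": True,
    "に": False,
    "ぬ": False,
    "にぬに": False,
    "にぬに にぬに": False,
    "にぬ にぬ": True,
    "にぬ abc": True,
}

for title, reported_ok in REPORTED.items():
    verdict = "ok" if predicted_ok(title) else "broken"
    match = "matches" if predicted_ok(title) == reported_ok else "DIFFERS from"
    print(f"{title!r}: predicted {verdict}, {match} the report")
```

Every reported example fits the rule, but the rule says nothing about the cause; the later "Answer?" section points at the Windows code-page setting instead.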

Additional comments

  1. I never edit the content list itself. I think this may somehow be related to a typo I made when entering data: I had vocabulary in both Japanese and Chinese for the English word "sneeze", and I might have entered the same question text for both, which somehow resulted in lost data and confused entries.
  2. Just to clarify, I'm not editing the content list itself. I'm editing the "question" portion of the item in the regular HTML editor. When I finish editing, the software preprocesses the question text and automatically transfers it to the content list, and this is the phase in which the data gets damaged.

Solution

Edit titles in the element window.

Use HTML components to edit and display Unicode in the element window. Once the texts are complete, press Alt+T to convert the HTML text to a title that displays Unicode in Contents. You need Alt+T only if the previously edited title differs from the element contents. If you let SuperMemo generate titles automatically, they will show correctly as Unicode in the Contents window.

In short: instead of editing titles, focus on editing texts and let SuperMemo generate the titles automatically.

Technical

The tree component used in the Contents window is Unicode-enabled. The primitive mini-editor that is hooked to that component at edit time is a separate component, and it does not support Unicode. The only components capable of editing Unicode are the RTF (outdated) and HTML (default) components in the element window. HTML components and the Contents window communicate via the SuperMemo database, so editing in the element window is the easiest (and the only) way to edit element titles that contain Unicode. The mini-editor hooked to the Contents tree is suitable only for correcting typos in Latin-script languages in cases where the element text and the title differ. Otherwise, the typos should be corrected in the element window, since editing the title does not affect the text in the element window.
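As a rough illustration of why a non-Unicode editor is a problem: on Windows, a non-Unicode (ANSI) component typically exchanges text through the system ANSI code page, so any character that code page cannot represent is lost. The sketch below simulates that round trip. `through_ansi` is a hypothetical helper. Python's `errors="replace"` substitutes "?", much as Windows does by default, although Windows may also apply "best fit" mappings that this sketch ignores.

```python
def through_ansi(text: str, codepage: str) -> str:
    """Simulate handing Unicode text to a non-Unicode component.

    The text is encoded to an ANSI code page and decoded back; characters the
    code page cannot represent come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


title = "くしゃみ (sneeze)"  # Japanese for "sneeze"
print(through_ansi(title, "cp1252"))  # Western European code page: "???? (sneeze)"
```

The Latin part of the title survives and the Japanese part does not. This is the same "?" symptom reported later on this page for English-language systems.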

Counter evidence

I am actually editing the text in the element window. It's just that certain groupings of Unicode letters are not translated correctly between the two. The problem lies in the automatic generation of the content list title from the element window. I can manually fix the content list by retyping, or by cutting and pasting between the element window and the content list title editor. As far as I can tell, every part of SuperMemo can ultimately display the correct Unicode characters once the text has been manually synchronized.

Answer?

I can reproduce it now. In Control Panel, go to Regional and Language Options > Advanced > Language for non-Unicode programs, and choose "Chinese (PRC)". With this setting on, the content list entries break. If I set it to English, the problem goes away.
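The "Language for non-Unicode programs" setting selects the system ANSI code page: English (United States) uses code page 1252, and Chinese (PRC) uses code page 936 (GBK). The sketch below shows only what each code page does to hiragana. Code page 1252 replaces each character with "?", while code page 936 keeps every hiragana character as a two-byte sequence. It makes no claim about how SuperMemo then handles those bytes and does not by itself explain the odd/even pattern. `ansi_bytes` is a hypothetical helper.

```python
def ansi_bytes(text: str, codepage: str) -> bytes:
    """Encode text as a non-Unicode program would under an ANSI code page.

    Characters missing from the code page become b'?'.
    """
    return text.encode(codepage, errors="replace")


# Python codec names for the two settings discussed above.
CODEPAGES = {"English": "cp1252", "Chinese (PRC)": "cp936"}

for setting, codepage in CODEPAGES.items():
    for text in ("に", "にぬ", "にぬに"):
        raw = ansi_bytes(text, codepage)
        print(f"{setting:14} {text}: {len(raw)} byte(s) {raw!r}")
```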

Once the language for non-Unicode programs is set to English, XP seems to behave exactly like Vista. This includes the behavior when importing Unicode Q&A text files. Previously on XP, I could import Japanese characters into the HTML element, albeit with broken content list entries. Now all non-English characters are replaced with "?" in both the content list and the HTML element. However, if I save the Unicode text file as UTF-8, I can import the real Japanese characters into the HTML element, and only the content list entries contain the "?" marks. This isn't so bad: the transfer from the HTML element to the content list via Alt+T now works (only if you choose "use current text", which I assume regenerates the title text), so I can update all the entries fairly quickly.
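Re-saving the file as UTF-8 can also be scripted when there are many files. The sketch below assumes the original was saved as "Unicode" in the Notepad sense (UTF-16 with a byte-order mark). By default it writes UTF-8 with a BOM, which is what Notepad's UTF-8 option produces. It is not confirmed here whether SuperMemo's Q&A import also accepts UTF-8 without a BOM, so `bom` is left as a parameter. `reencode_qa_file` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def reencode_qa_file(src: Path, dst: Path, bom: bool = True) -> None:
    """Re-save a UTF-16 ("Unicode") Q&A text file as UTF-8.

    newline="" on both ends keeps the original line endings (e.g. CRLF).
    """
    with open(src, encoding="utf-16", newline="") as f:  # BOM picks byte order
        text = f.read()
    with open(dst, "w", encoding="utf-8-sig" if bom else "utf-8", newline="") as f:
        f.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "qa_utf16.txt"  # hypothetical input file
        dst = Path(tmp) / "qa_utf8.txt"  # hypothetical output file
        # A sample item in Q:/A: form, saved as UTF-16 with a BOM.
        src.write_bytes("Q: くしゃみ\r\nA: sneeze\r\n".encode("utf-16"))
        reencode_qa_file(src, dst)
        print(dst.read_bytes()[:3])  # b'\xef\xbb\xbf' (UTF-8 BOM)
```

This addresses only the file encoding. As described above, the titles may still come out as "?" and need regenerating.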

The conversion of titles to ANSI on Q&A import is a known bug (not yet fixed; hopefully it will be).

Report

Just a user note: Alt+T is not really a feasible workaround for this bug. When importing hundreds of items or dealing with thousands of old elements, it would take ages to fix them all manually with Alt+T.

(SuperMemo does not generate titles properly most of the time when importing Q&A text, but it does work occasionally. Having non-Unicode characters at the start of the title seems to help.)