File corruption in HTML component
Problem
kuba ch. reported:
I was learning with Advanced English 2014 pronunciation. While experimenting with XML synchronization, I noticed that SuperMemo produces a corrupted XML file (it does not open in Internet Explorer). I narrowed down the problem to the item with pronunciation of romance. The answer was this:
˙ű�
This is very strange because the text is correct in the original collection, and I have never learned or edited that item.
Suggestions
Please submit the HTML file using the View Source option on the HTML component menu.
Diagnosis
This looks like file corruption. Your answer is not even stored as HTML; its plain-text version is corrupted. Because all plain texts are stored in the text.rtx file, the extent of the damage might be greater than just a single item, so you will need to examine your collection for further corrupted texts. You can submit your text.rtx for analysis if you wish.
Possible causes of corruption are similar to file corruption in other applications (disk errors, virus, software crashes, etc.).
Diagnostic tools
You can use View : Encoding on the text registry menu to inspect the contents of texts that seem corrupted or hard to read. In your case, you will notice that the text contains zeros, which terminate the string. This makes it look like your text is just 3 characters long, while it is actually 3552 characters long. If you examine your file further, you might suspect that the corrupted content was taken from an MP3 file. If your computer crashed while copying MP3 files, a disk information mix-up might have left part of your text.rtx file pointing to a sector with MP3 file data.
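The effect of the embedded zeros can be illustrated with a minimal Python sketch. It assumes a reader that treats the first zero byte as the end of the string; the helper name `c_string_view` and the sample buffer are hypothetical stand-ins, not SuperMemo code or the actual file contents.

```python
def c_string_view(buf: bytes) -> bytes:
    """Return only the bytes a NUL-terminated reader would see."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


# Hypothetical stand-in for the damaged member: a few leading bytes,
# a run of zeros, more data, then padding up to 3552 bytes.
head = bytes([0xFF, 0xFB, 0x90, 0x04])
buf = head + b"\x00" * 32 + b"Info"
buf = buf + b"\x00" * (3552 - len(buf))

visible = c_string_view(buf)
print(len(buf), len(visible))  # the full buffer is far longer than the visible prefix
```

Everything after the first zero is still in the buffer, but a reader that stops at the terminator never shows it.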
This is what your Encoding shows:
____________________ Name (Position=6258)(Len=3552 chars) ____________________ "˙ű�Info�!7������������� �,�‹�‚@Ш�20ac�I‚Ęťă§!źW¬ů)‘Ř ..." ____________________ Unicode Data ____________________ ˙=729 ű=369 =144 Special=4 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 I=73 n=110 f=102 o=111 Zero=0 Zero=0 Zero=0 Special=15 Zero=0 Zero=0 Zero=0 !=33 Zero=0 Zero=0 7=55 =129 Zero=0 Special=7 Special=7 Special=7 Special=15 Special=15 Special=15 Special=23 Special=23 Special=23 Special=31 Special=31 Special=31 Special=30 Special=10 Special=28 ,=44 Special=3 ‹=8249 Special=5 ‚=8218 @=64 Đ=272 ¨=168 Special=21 2=50 0=48 a=97 c=99 Special=2 I=73 ‚=8218 Ę=280 ť=357 ă=259 §=167 !=33 ź=378 W=87 ¬=172 ů=367 )=41 ‘=8216 Ř=344 Space=32 .=46 .=46 .=46 ____________________ UTF8 Codes ____________________ Ë=203 Control=153 Å=197 ±=177 Â=194 Control=144 Special=4 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 Zero=0 I=73 n=110 f=102 o=111 Zero=0 Zero=0 Zero=0 Special=15 Zero=0 Zero=0 Zero=0 !=33 Zero=0 Zero=0 7=55 Â=194 Control=129 Zero=0 Special=7 Special=7 Special=7 Special=15 Special=15 Special=15 Special=23 Special=23 Special=23 Special=31 Special=31 Special=31 Special=30 Special=10 Special=28 ,=44 Special=3 â=226 Control=128 ¹=185 Special=5 â=226 Control=128 Control=154 @=64 Ä=196 Control=144 Â=194 ¨=168 Special=21 2=50 0=48 a=97 c=99 Special=2 I=73 â=226 Control=128 Control=154 Ä=196 Control=152 Å=197 ¥=165 Ä=196 Control=131 Â=194 §=167 !=33 Å=197 º=186 W=87 Â=194 ¬=172 Å=197 ¯=175 )=41 â=226 Control=128 Control=152 Å=197 Control=152 Space=32 .=46 .=46 .=46 ____________________ UTF8 Encoding ____________________ "˙űÂ�
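The suspicion that the data came from an MP3 file can be checked against the leading code points in the dump above (729, 369, 144, 4). The sketch below assumes the bytes were displayed through the Windows-1250 code page, which is a guess based on the Central European characters in the dump; the helper names are hypothetical. An MPEG audio frame starts with 11 set bits, followed by version and layer fields.

```python
def to_byte(code_point: int) -> int:
    """Map a displayed code point back to a byte, assuming Windows-1250."""
    try:
        return chr(code_point).encode("cp1250")[0]
    except UnicodeEncodeError:
        if code_point < 256:
            # Assumption: undefined cp1250 slots were passed through unchanged.
            return code_point
        raise


def looks_like_mp3_frame(data: bytes) -> bool:
    """Check for an MPEG-1 Layer III frame header at the start of data."""
    if len(data) < 2:
        return False
    sync = data[0] == 0xFF and (data[1] & 0xE0) == 0xE0  # 11 sync bits
    version = (data[1] >> 3) & 0b11                      # 0b11 = MPEG-1
    layer = (data[1] >> 1) & 0b11                        # 0b01 = Layer III
    return sync and version == 0b11 and layer == 0b01


# First four code points from the "Unicode Data" section of the dump.
header = bytes(to_byte(cp) for cp in (729, 369, 144, 4))
print(header.hex(), looks_like_mp3_frame(header))

# The codes 73, 110, 102, 111 further on spell "Info", a tag that
# MP3 encoders commonly write into the first frame.
print(bytes([73, 110, 102, 111]))
```

Under these assumptions the text begins with the bytes FF FB, which is consistent with an MP3 frame header, though it does not prove where the data came from.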
Corrupted XML
The newest version of SuperMemo 17 for Windows prevents XML corruption by encoding control characters as HTML. As a result, instead of the 3-character string shown in SuperMemo 16, you can see the whole corrupted text in the HTML component.
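How SuperMemo performs this encoding is not described here. As a rough illustration of the idea only, the Python sketch below rewrites characters that XML 1.0 forbids as HTML-style numeric references and then escapes the result as ordinary element text. The element name `Answer`, the helper names, and the sample string are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed in an XML 1.0 document."""
    cp = ord(ch)
    return (cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def encode_controls_as_html(text: str) -> str:
    """Replace characters XML cannot hold with visible &#N; markers."""
    return "".join(ch if is_xml10_char(ch) else f"&#{ord(ch)};" for ch in text)


def parses(xml: str) -> bool:
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


raw = "\u02d9\u0171\x90\x04\x00\x00Info"  # sample resembling the damaged text
unsafe_xml = f"<Answer>{raw}</Answer>"
safe_xml = f"<Answer>{escape(encode_controls_as_html(raw))}</Answer>"

print(parses(unsafe_xml), parses(safe_xml))
print(ET.fromstring(safe_xml).text)
```

The raw control characters make the document unparsable, while the encoded version parses and keeps every character visible as text.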
Cause of damage
Having examined your text.rtx file, we can safely conclude that the file as a whole has not been corrupted. Only that single item had its text converted to a binary string. This is rather unusual. It means that SuperMemo must have received that binary string somewhere in the process of editing, or the corruption occurred when writing the text to the file (e.g. MP3 data overwriting SuperMemo's protected text buffer in a crash). The neighboring registry members are all stored correctly and in contiguous positions, so only the area belonging to the damaged member is corrupted. This makes random file damage highly unlikely (wild estimate: 1 : 70,000).
In theory, you might have pasted that string of characters into the registry member text editor. However, you could not paste an MP3 file directly (the editor accepts only text), so you would first need to somehow copy that file in text format. If you do a lot of registry editing, pasting some random data may occur by accident. Only you will know if this was a likely scenario.
There are many signs of registry editing around the area of damage. Many members have had their texts replaced with longer ones. This leaves strings of #02 characters that fill up the vacated space and are ignored in low-level search. That space is reused only after Repair collection. This means that you have edited many members and made their texts longer, but the corrupted one had to be written back at its full length of 3552 bytes.
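Runs of #02 filler of this kind are easy to locate programmatically. The following Python sketch assumes only what is stated above, namely that vacated space is filled with bytes of value 2; the helper names and the sample buffer are hypothetical, and the real layout of text.rtx is not modeled.

```python
import re


def filler_runs(buf: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for every run of #02 filler bytes."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(rb"\x02+", buf)]


def strip_filler(buf: bytes) -> bytes:
    """Drop filler bytes, as a search that ignores them would."""
    return buf.replace(b"\x02", b"")


# Hypothetical fragment: an old text slot left as filler after the
# member was rewritten elsewhere with a longer text.
sample = b"short text" + b"\x02" * 6 + b"a longer replacement text"

print(filler_runs(sample))
print(strip_filler(sample))
```

Many such runs close to a damaged member would suggest heavy editing in that area, which matches the signs described above.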
SuperMemo 17
SuperMemo 17 encodes such texts better and should make this problem instantly visible. Instead of a mysterious short string, this is how your text shows in SuperMemo 17 (initial fragment only):
˙ű��������������������������������Info������!��7�
,‹‚@Ш20acI‚Ęťă§!źW¬ů)‘Ř TŰšVÓŃ{ś÷ĆÚMZmŃZľ«ţČFÄ]ú�„i¶2AA“F•¨$¤8$ Ĺb#ç•ěúeř[ \˝9™xŞ´NśÇÎeí˝1¤cĹh»ƒ¬I‘c‰˛v‡Z¤PK ééĹwŮąYf4+ú¨o꼊Ё˙ű’Z€Â(ÉKLŔ]#ii†8 ř),¤nAo ç5‡˜äŕJü˝}K=A 7:!ŁŰQg\7ľ¬ÍËOµUShc4]Ť˛yůéWEç3(pô$öA˛TĐ.u¦Ţ˘´‹=ˆ3[]eëe+DQôě˙Vĺ/—ý-¶ĆšŹ‰M€"Ú!ŁŰääe˝RĄĄ3y°"r*E*
Q•%ɘˆ·=°ş[HŚž ď0'ńšÁŽLŰŠÉĄÂÁ ąŻ 4ŁŔÎ\şo3’׍hS™fĐ@Ppŕ`ÝsȐ kŔĂe�4±^ď׿éţéµ%�`?iv—ƒˆLćY´Yƒ