Cloze deletion on duplicate extracts leads to a wrong Q&A pair in items

From SuperMemopedia

Environment: SuperMemo 2004

Problem

I have language-learning material for incremental reading and extracted the part containing the vocabulary. To create a cloze deletion for each word, I extracted the (same) part several times and then did a cloze deletion on the answer in each extract. Then something strange happened: every item created from that part shows the question part of the last cloze deletion. So only the last item is a correct Q&A pair; all the others have the wrong question part.

Ideas & workarounds

Please describe the problem in more detail. This does not sound like correct incremental reading: "I extracted the (same) part several times and then did a cloze deletion on the answer"

  1. there should be no need to "extract the same part" more than once
  2. cloze deletion is executed on short topics, so "cloze deletion on the answer" sounds like an error in technique

Analysis of the actual collection: Korean test collection

The collection listed individual syllables and their pronunciation. Each Korean symbol was represented as a picture:

[Picture] = Pronunciation

Upon extracting individual pairs, the user generated cloze deletions of this form:

[Picture] = [...]

The problem

There is a design limitation in SuperMemo. It uses text comparisons for distinguishing between HTML texts in components. Pictures are not analyzed. HTML tags are not analyzed. As a result:

[Picture 1] = [...]

and

[Picture 2] = [...]

are seen by SuperMemo as the same text registry member named

= [...]

As there is only one file per registry member, all of these cloze deletions end up displaying the content of the last cloze generated.
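The collision described above can be illustrated with a minimal Python sketch. It is not SuperMemo's actual code: the `registry_key` function, the dictionary standing in for the text registry, and the image file names are all assumptions made for illustration. The only behavior taken from this page is that tags (including pictures) are ignored when texts are compared, and that one registry member holds one file.

```python
import re

# Hypothetical model: the key of a registry member is the HTML with
# every tag removed (pictures are tags, so they vanish as well).
TAG_RE = re.compile(r"<[^>]+>")

def registry_key(html: str) -> str:
    """Return the plain text used to compare two HTML texts (assumed)."""
    text = TAG_RE.sub("", html)
    return " ".join(text.split())  # collapse leftover whitespace

# Stand-in for the text registry: one stored file per member.
registry: dict[str, str] = {}

def store(html: str) -> None:
    """Store an HTML text; a later text with the same key overwrites it."""
    registry[registry_key(html)] = html

# Two cloze deletions that differ only in the picture (file names invented).
cloze_1 = '<img src="picture1.gif"> = [...]'
cloze_2 = '<img src="picture2.gif"> = [...]'

store(cloze_1)
store(cloze_2)

print(registry_key(cloze_1))  # = [...]
print(len(registry))          # 1
print(registry["= [...]"])    # the HTML of cloze_2, the last one stored
```

Under these assumptions both clozes map to the member `= [...]`, so every item that refers to that member shows the last cloze stored.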

Workaround

Make sure your texts are always unique. For example:

[Picture 3] = [...](3)

[Picture 4] = [...](4)

will not result in confusion as these will be represented in the registry as:

= [...](3) and = [...](4)
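A short sketch of why the suffix helps, under the same assumption that the registry compares texts with all tags removed. The `registry_key` and `make_unique` helpers and the image file names are hypothetical and only illustrate the idea; in SuperMemo the suffix is simply typed into the text.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")

def registry_key(html: str) -> str:
    """Assumed comparison key: the HTML with all tags removed."""
    return " ".join(TAG_RE.sub("", html).split())

def make_unique(html: str, number: int) -> str:
    """Append a visible marker such as (3) so the plain text differs."""
    return f"{html}({number})"

cloze_3 = make_unique('<img src="picture3.gif"> = [...]', 3)
cloze_4 = make_unique('<img src="picture4.gif"> = [...]', 4)

print(registry_key(cloze_3))  # = [...](3)
print(registry_key(cloze_4))  # = [...](4)
```

Because the marker is part of the plain text, the two keys differ and each cloze keeps its own registry member.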

Similar problems

Similar problems have been pestering users who embed short mathematical formulas in short texts, users who make heavy use of subscripts and superscripts (these are represented by HTML tags and are not distinguished at the registry level), and users of other pictorial representations.

Future

SuperMemo cannot distinguish between all forms of formatting of a single text, as this would bloat the size of collections (for example, there might be a thousand uses of the word "yes" with many variations of formatting, such as different fonts). In addition, pure texts are used in fast binary-level searches through the collection. Including HTML tags in the search would require dedicated low-level encoding to distinguish them from tags included in the learning material itself. Testing for and skipping over such encodings would substantially slow down the search.

Currently, the proposed solution is to allow including subscript/superscript tags and/or image names in short texts to minimize the impact of the problem, perhaps as an option (to prevent "unclean" registry members for those who do not need this distinction).
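As a rough illustration of that proposal, the following Python sketch builds a comparison key that keeps subscript/superscript tags and image names while still dropping all other formatting. It is only a guess at how such a key might look: the `extended_key` function, the bracketed image-name notation, and the file names are assumptions, not a description of any SuperMemo version.

```python
import re

# Matches any tag: optional slash, tag name, remaining attributes.
TAG_RE = re.compile(r"<(/?)(\w+)([^>]*)>")
SRC_RE = re.compile(r'src="([^"]*)"', re.IGNORECASE)

def _keep_or_drop(match: re.Match) -> str:
    closing, name, attrs = match.group(1), match.group(2).lower(), match.group(3)
    if name in ("sub", "sup"):
        return f"<{closing}{name}>"          # keep sub/superscript tags
    if name == "img" and not closing:
        src = SRC_RE.search(attrs)
        return f"[{src.group(1)}]" if src else ""  # keep only the image name
    return ""                                 # drop every other tag

def extended_key(html: str) -> str:
    """Hypothetical key that distinguishes images and sub/superscripts."""
    return " ".join(TAG_RE.sub(_keep_or_drop, html).split())

print(extended_key('<img src="picture1.gif"> = [...]'))  # [picture1.gif] = [...]
print(extended_key('<img src="picture2.gif"> = [...]'))  # [picture2.gif] = [...]
print(extended_key('<b>x</b><sup>2</sup>'))              # x<sup>2</sup>
print(extended_key('x<sub>2</sub>'))                     # x<sub>2</sub>
```

With such a key, font changes and other formatting would still collapse into one member, while picture-based clozes and sub/superscript variants would stay separate. The cost is the "unclean" member names mentioned above.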