Recovering from damage to files in a big collection
Summary
- In this case, extensive damage to collection file data could not be remedied because there was no correct backup. As a side effect, the recovery process revealed an unrelated bug in SuperMemo that has already been fixed.
- Keep your collections well backed up! It is not enough to keep backing up. From time to time, you need to check if your backups work. For example, a user kept backing up the KNO file without the associated folder in the collection. After a crash, 2 years later, he discovered that the KNO file keeps just a few pieces of data about the collection. This resulted in losing 2-3 months of work!
- If your collection is huge (above 100,000 elements) and SuperMemo keeps reporting Cannot update A-Factor distribution, you may need to update to the newest version of SuperMemo 16 (dated May 9, 2014 or later). All SuperMemos released before that date include a bug in which the A-Factor distribution is not protected from overflow when the number of elements having a given A-Factor surpasses 65,535.
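A backup check along the lines recommended above could be sketched as follows. This is only a minimal illustration, not a SuperMemo tool: the function name `check_backup` is hypothetical, and the layout it assumes (a `<name>.kno` file sitting next to a folder called `<name>`) is an assumption based on the case described in the summary. It confirms that both parts exist and are not empty; it does not verify that the data inside is valid.

```python
from pathlib import Path


def check_backup(backup_dir: str, collection_name: str) -> list[str]:
    """Return a list of problems found in a backup copy (empty if none).

    Assumed layout (hypothetical): <backup_dir>/<name>.kno plus a folder
    <backup_dir>/<name>/ holding the rest of the collection.
    """
    root = Path(backup_dir)
    kno_file = root / f"{collection_name}.kno"
    data_folder = root / collection_name
    problems = []

    # The KNO file alone holds only a small part of the collection data.
    if not kno_file.is_file():
        problems.append(f"missing KNO file: {kno_file.name}")
    elif kno_file.stat().st_size == 0:
        problems.append(f"empty KNO file: {kno_file.name}")

    # The associated folder must be backed up as well.
    if not data_folder.is_dir():
        problems.append(f"missing collection folder: {data_folder.name}")
    elif not any(data_folder.iterdir()):
        problems.append(f"empty collection folder: {data_folder.name}")

    return problems
```

An empty result only means that both pieces are present. Opening the backup copy in SuperMemo from time to time remains the real test.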
Problem
havate wrote:
I started to use Supermemo 16 because I had some old knowledge database which hadn't worked very well. The database was incomplete as a result of hardware malfunction that took place earlier. Even now some sound files are missing, but it generally works. The errors occur when I try Mercy or when I try to reset the collection. Mercy simply freezes the software at some point, and resetting the collection causes the same problem. Reset only works when I manually forget the items and then run reset. Is there a way to check which individual items cause problems? I'd simply like to delete them or fix manually. I've already tried forgetting all items and memorising them back again to check if mercy works. The report is in the attachment. I think that all the problems are caused by some erroneous items, although these can't be found even by using detailed repair. I'd like to send you the database but it's full of things I wouldn't like to share, so I'd rather find some way to find the damaged items and fix/delete them. The problem with dividing the database into smaller chunks is that it has 184,374 items. When I run basic repair after memorize I get the following results: see recover.zip. After running it again I get the same number of errors. I've also tried to split it into a number of source code files but they are too big and it's hard to eliminate the wrong items this way. When I try to reset the collection with all those errors I get a lot of yellow warning boxes:
- Wrong A-Factor distribution in category #7
- Wrong interval distribution
- Wrong repetition distribution for category #1
- Wrong lapses distribution
- Cannot update workload statistics for day 2
- Wrong A-Factor distribution in category #7
I'd have to press o and hold for a good 30 minutes until all the screens are gone. I'd be really grateful if you could help me in any way.
Analysis
- quick analysis of errors in your recovery reports seems to indicate that you have damaged two files:
- the KNO file (possibly it has been reset to all zeros)
- the elinfo.dat file (possibly it has also been reset to all zeros)
- please send the above files for inspection to confirm
- if you do not have a backup of the entire collection, and the above suspicion is correct, you will most likely lose your learning process (unless SuperMemo could be armed with procedures for restoring item info from other data, like repetition history; it is not clear if this is even possible)
- as you have already run Reset and Forget, you probably care more about the data than about the learning process; in such a case, recovery should be possible, and a substitute learning process could be engineered to enable fast recovery within a few months of learning (180 thousand items are enough to take a year to review, let alone to put right in the learning schedule using artificial methods)
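The "reset to all zeros" suspicion above can be checked before sending any files. The sketch below is a generic file check, not part of SuperMemo. The helper name `is_all_zeros` is hypothetical, and the paths you pass to it depend on where your collection is stored.

```python
def is_all_zeros(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file is non-empty and every byte in it is zero."""
    seen_data = False
    with open(path, "rb") as f:
        # Read in chunks so that large files do not need to fit in memory.
        while chunk := f.read(chunk_size):
            seen_data = True
            # bytes(n) is a run of n zero bytes.
            if chunk != bytes(len(chunk)):
                return False
    return seen_data
```

Note that a file that has already been refilled by Reset or Repair will no longer show as all zeros, so the check is only meaningful on the files as they were right after the damage.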
Elinfo.dat and KNO
havate wrote:
I've attached the file. There's elinfo and kno file. I don't have any data about viruses there, but yes, there has been a major hardware failure. Even though I perform backups regularly, the backup software didn't copy the database properly and after the HDD died I had a backup which wasn't fully functional. Anyway, I hope that you'll be able to see what could have gone wrong. What's weird is that if I transfer a branch of elements to separate databases they seem to work fine, but when I start merging pieces together, there are errors in the database. The same ones I've already described. The elinfo file attached comes from the most complete database of around 180k elements.
More feedback needed
Unfortunately, the original hypothesis cannot be verified as your KNO and elinfo files have already been filled with data (probably during Reset or Repair). This problem might be solved remotely in an interactive way if you keep reporting which integrity errors disrupt your work and cannot be fixed with the Repair procedure. You could download debug versions of SuperMemo for diagnostics and the repair procedure might be enhanced to address your specific case. There is little evidence of damage going beyond the files involved in the learning process. As such, the collection should be recoverable.
Feedback
havate wrote:
details
I'm happy that the files contain at least some data. The thing is that it's impossible to reset the database without holding o button for an extensive amount of time. There are many A-factor and other errors. It would be really useful if there was some "Answer OK to all" button.
My collection appears to work fine to the point where you try to use mercy. Then you'll get lots of errors.
Where can I get those debug versions of Supermemo? Currently I'm using SM with debugging switched on.
what happens when I use memorize on the collection
With a database completely reset - 0 memorized elements and after full repair - no errors. I try to memorize the database and this is the error count. At least there's no need to OK each error individually. I suppose that tomorrow when I try to mercy those elements, it won't be possible.
SuperMemo Report
Activity: Operation: memorize Collection: Y:\BAZA ZINTEGROWANA 2014-04-26 (COPY) Date: Wednesday, May 07, 2014, 11:25:22 AM SuperMemo 16 (Build 16.03, Apr 17, 2014)
++++++++++++++++++++++ ERROR #1 ++++++++++++++++++++++
Cannot update A-Factor distribution
++++++++++++++++++++++ ERROR #2 ++++++++++++++++++++++
Cannot upldate workload statistics for day 2
++++++++++++++++++++++ ERROR #3 ++++++++++++++++++++++
++++++++++++++++++++++ ERROR #206397 ++++++++++++++++++++++
Cannot update A-Factor distribution
++++++++++++++++++++++ ERROR #206398 ++++++++++++++++++++++
Cannot upldate workload statistics for day 2
Process completed at 11:35:08 AM in 00:09:46 sec (Wednesday, May 07, 2014)
206398 ERRORS (Operation: memorize)
____________________________________________________________
mercy rescheduling
After I memorized entire collection I needed to use Mercy on it. There were 168734+15670 elements to reschedule. I decided to spread them at 300 per day. The process freezes after it's completed 79200 elements. SM16 stops responding which is also reflected in procmon.
I suppose that those A-factor errors cause the mercy malfunction.
Recover.txt
The error Cannot update A-Factor distribution shows up 58245 times. There are many warnings like 251 children at Topic #24330: Self-Discipline in 10 days. And there are two errors in the schedule:
++++++++++++++++++++++ ERROR #58249 ++++++++++++++++++++++
Wrong item count on 4/24/2014 (Day #2)
Expected: 65535
Found: 123783
++++++++++++++++++++++ ERROR #58250 ++++++++++++++++++++++
Error verifying scheduled elements
comments
- Cannot update A-Factor distribution tells you that the A-Factor distribution in the KNO file is wrong (e.g. set to all zeros, or includes 65535 entries). The error will show as many times as there is a need to update the distribution, but should result in a new correct distribution (it is computed from scratch at recovery). This problem is harmless.
- the warning 251 children at Topic #24330 will appear each time you add more elements to a branch than your set children limit. This is harmless.
- Wrong item count on 4/24/2014 (Day #2) is caused by your Reset/Memorize attempts. The schedule cannot hold more than 65,535 (2^16 - 1) elements in a single day, which is why 65535 was expected while 123783 elements were found. You can Mercy those elements over a week to solve the problem. This is harmless.
- Error verifying scheduled elements has not been documented. You will need to wait a few days for more information.
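With hundreds of thousands of entries, counts such as "shows up 58245 times" are easier to obtain with a script than by reading the report. The following is a minimal sketch under the assumption that the report follows the layout quoted earlier: a line of plus signs with `ERROR #n`, followed by the message on the next non-empty line. The function name `tally_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches header lines such as "++++++ ERROR #58249 ++++++".
ERROR_HEADER = re.compile(r"^\++ ERROR #(\d+) \++$")


def tally_errors(report_text: str) -> Counter:
    """Count how often each message follows an ERROR header line."""
    counts = Counter()
    expect_message = False
    for line in report_text.splitlines():
        line = line.strip()
        if ERROR_HEADER.match(line):
            expect_message = True
        elif expect_message and line:
            # Only the first non-empty line after a header is counted.
            counts[line] += 1
            expect_message = False
    return counts
```

Calling `tally_errors(text).most_common()` on the contents of a report file lists the messages from the most to the least frequent, which makes rare errors such as Error verifying scheduled elements stand out among the repeated harmless ones.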
Debug Version of SuperMemo
This collection seems to have the learning material unaffected, while the learning data is lost and/or corrupted. Without access to the collection, which is huge and private, it is hard to say what steps SuperMemo might take to fill in some substitute data that would, at least, allow relearning the material. The purpose of the debug version is to lower the level of severity for warning and error reporting and enhance diagnostics for a specific set of procedures. Those procedures and operations need to be identified first. This is why it would be helpful if you could list errors or misbehaviors occurring in a typical learning process. Long-drawn processes (e.g. Mercy, Reset, etc.) may freeze when the program is overwhelmed with the size of data structures. For example, if false data indicates that a billion-element array needs to be allocated, this may exceed the capacity of even a strong PC and/or Windows.
Debugging version vs Debug Build
- Debugging version option in SuperMemo increases the number of warnings and reports. Those are not targeted at a specific problem. The name is confusing because the version of SuperMemo is the same. The name Debug mode would be more appropriate.
- Debug Build is the name you will see in the about box of the debug version of SuperMemo once you download it. This version can include custom diagnostics that help arm SuperMemo with tools to recover from various data damage/loss scenarios.
"Yes to All"
There are nearly 3000 error traps in SuperMemo. Providing a "Yes to All" option for all of them is not realistic as each case needs to be implemented individually. Providing a universal "Yes to All" would be very dangerous as it may tempt the user to ignore severe errors that should rather lead to a collection shutdown/backup/repair.
Debug Build diagnostics
The purpose of Debug Build is to add diagnostic reports. The A-Factor error might be set to be ignored, but this would only speed up going through the mercy procedure. It would not bring this case closer to resolution as it would not fix the data which has been corrupted and is causing problems. However, it might be possible to trigger specific diagnostics around that error at mercy time. Naturally, this would increase the number of errors showing up (rather than reduce it).
Collection Download
havate: Although the data in my collection is private, I will try to delete the private parts and provide you with a download location by email. I hope that you will be able to help me recover the database having complete access to it. Will you also need the elements folder? It's a large part of the collection.
Collection Download Details
Yes. You can provide a link for a download. As long as you make sure the link is not intercepted by a third party, you do not need to be thorough in slimming down the collection. It will be used solely for diagnostic purposes, which will most likely involve only two files. The \elements folder will probably not be vital for that job, but including it might make the whole process simpler (and faster). Please use the following steps:
- send the link to bug[year](@)supermemo(.)org
- ask for confirmation that should arrive within 24h (please resend that mail if you do not get the confirmation in time)
- delete the collection file once you get a successful download confirmation
Solution
Thank you for submitting your full collection for testing. The data does not look damaged. All you need is a patched up SuperMemo that can handle collections with 180,000 pending elements.
If you still have the version of this collection with the learning process, start by trying that older version with the new SuperMemo!
Memorize/Mercy
You can restore your learning process by executing Learning : Remember : Intervals in a chosen range in the browser. For example, the range 1..365 would let you spread your elements in portions of 300-400 elements per day. This will produce a spread that will take you a year to recover from (if you can handle 300-400 repetitions per day). You will need the patched up version for this to work. In the meantime, you can just memorize elements one by one as you continue your review.
Technical
SuperMemo uses a number of statistical parameters for monitoring the learning process. Distributions tell you how many elements fall into a given category (e.g. for interval length, number of repetitions, etc.). All distributions have a 2-byte limit. This means they can show only up to 65535 elements per category. All distributions have been protected from overflow in that they ignore element counts above 65535. Due to an oversight, the A-Factor distribution does not have this protection. This bug has never been spotted because it is highly unlikely for a collection to include more than 65535 elements with identical A-Factors. In your case, the collection behaved ok until all A-Factors had been reset to the same value (probably due to collection damage). The bug requires only a minor fix and the problem will not re-occur in future updates. In the meantime, you will receive a dedicated fix-only build of SuperMemo to resume your work.
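The difference between a protected and an unprotected 2-byte counter can be illustrated with a short sketch. This is not SuperMemo's code: the function names are hypothetical, and the wrap-around shown for the unprotected case is just one way an overflow can show up (in SuperMemo it surfaced as the error Cannot update A-Factor distribution). The protected variant simply stops counting at 65535, which corresponds to ignoring element counts above that value.

```python
UINT16_MAX = 0xFFFF  # 65535, the largest value a 2-byte counter can hold


def increment_wrapping(count: int) -> int:
    """Unprotected increment: 65535 + 1 wraps around to 0."""
    return (count + 1) & UINT16_MAX


def increment_saturating(count: int) -> int:
    """Protected increment: the counter stops at 65535."""
    return min(count + 1, UINT16_MAX)


def count_elements(n: int, increment) -> int:
    """Count n elements that all fall into the same distribution category."""
    count = 0
    for _ in range(n):
        count = increment(count)
    return count
```

For 70,000 elements sharing one A-Factor, `count_elements(70000, increment_wrapping)` yields 4464 (70000 - 65536), while `count_elements(70000, increment_saturating)` stays at 65535. The saturated value is no longer exact, but it remains within the valid range and does not mislead the rest of the program.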