Binary components are not designed to hold HTML files

From SuperMemopedia
Revision as of 15:03, 16 March 2020 by SuperMemoHelp (talk | contribs) (→‎MHT test)

Environment

  • Version: 18.02, 18.03, 18.04
  • Operating system: Windows 10, Windows 7, and Linux, all 64-bit (the operating system seems irrelevant)

Description of the use case

I saved a large web article as a single HTML file with the SingleFile browser extension, which, as the name suggests, bundles everything (text, images, frames, etc.) into a single "fat" HTML file for offline viewing. For several reasons I opted not to import it as an HTML article (e.g. through Edit : Web import), as I want to decide incrementally what to import in the first place. For this I set up a binary component to hold the single HTML file beside an HTML component, along the lines of Extracting pictures while reading PDF (SuperMemo 17). The idea is that launching the binary would open the "fat" HTML file in the system's default browser, and that the binary would be kept until the article is fully processed (and then eventually removed from the Binary registry).

Problem description

Upon import of an HTML file to a binary component, the following warning is shown:

Program registry file is obscured by an extra filespace file:
s:\colls\knowledge\elements\1\14.htm

The following two methods of file import were tried:

  • Component menu : File : Import file (answer to "Leave the imported file in its original location?" was irrelevant to the outcome).
  • Component menu : Links : Registry member, then Adding the file and clicking Accept in the Binary registry window.

I can confirm that s:\colls\knowledge\elements\1\14.htm corresponds to the "fat" HTML file that is the offline source of the article, presumably imported into the filespace. It is perhaps worth mentioning that the original file's extension (both .htm and .html were tried) was changed by SuperMemo to .htm when the file was copied to the collection.

The following issues were experienced:

  • Clicking Run triggers the same warning, and the operation is aborted (the system's default browser is not launched)
  • Making an extract on the HTML component beside the binary component triggers the same warning. Yet, the child element is created, and the binary component is carried over.
    • Navigating to the child element triggers the warning automatically (the component is set to Display at browsing by default)
    • Activating the binary component in the child element has the same behavior as in its parent.
  • Selecting the entry in the Binary registry window triggered the same warning.
  • [intermittent outcome] Deleting the registry entry through the Binary registry window yields a 0-byte .del file with the same file name (14.htm becomes zero-byte 14.del), which disappears after Collection repair.
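
Because the zero-byte .del leftover is intermittent, it can help to check the collection's elements folder before and after deleting the registry entry. The following is a minimal sketch, assuming only what is reported above (a 0-byte file with the .del extension somewhere under the elements folder); the function name find_empty_del_files is hypothetical and not part of SuperMemo.

```python
from pathlib import Path
from typing import List


def find_empty_del_files(elements_dir: str) -> List[Path]:
    """Return all zero-byte *.del files found under elements_dir, sorted by path."""
    root = Path(elements_dir)
    return sorted(
        p for p in root.rglob("*.del")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Path taken from the warning message above; adjust to your own collection.
    for leftover in find_empty_del_files(r"s:\colls\knowledge\elements"):
        print(leftover)
```

Running it once after deleting the entry and once after Collection repair should show whether the leftover file was created and whether the repair removed it.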

It seems that the filespace treats .htm files solely as the source of an HTML component, and that the warnings are triggered when there is no such match.

Workarounds tried

  • Set the binary components not to display at browsing. Not only is this impractical, but triggering their display recreates the same problems described above.
  • Override the binary file with an external reference via Component menu : Links : External file. It didn't present the problems described (the system's default browser was launched without any warnings), but this path is not ideal because of the intent to keep the file in the filespace for backup and reference purposes for as long as it is needed.

Planned workarounds

Create a new file type for the single HTML file, tied to a new extension (e.g. .fathtml), associate it with the system's default browser, and rename the file to use that extension.
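
The renaming step of this plan can be sketched as follows. This is only an illustration under stated assumptions: the helper name copy_with_extension is hypothetical, .fathtml is the example extension proposed above, and the sketch copies rather than renames so that the original file is preserved. Associating the new extension with the default browser is a separate operating-system step that is not covered here.

```python
import shutil
from pathlib import Path


def copy_with_extension(src: str, new_ext: str = ".fathtml") -> Path:
    """Copy src next to itself under a different extension and return the new path."""
    source = Path(src)
    if not new_ext.startswith("."):
        new_ext = "." + new_ext
    target = source.with_suffix(new_ext)
    if target.exists():
        # Refuse to overwrite an earlier copy.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)  # the original .htm/.html file is left untouched
    return target
```

The resulting file would then be the one imported into the binary component, in place of the original .htm or .html file.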

Hypothesis

This workaround might work on the assumption that all the problems stem from the .htm extension, which is subject to dedicated processing.

Quick comment from SMW

This looks like a bug or a "special case", which should not be hard to resolve. However, it may be impossible to figure out by just using or testing SM18. A programmer's point of view may be needed to know what is happening.

MHT test

Using the same trick on an *.mht file with a binary component seems to work. The extension is most likely the source of trouble. Renaming *.mht to *.htm would result in the same problems as above.