PDF to HTML converters

From SuperMemopedia

Summary

The best approach to incremental processing of PDF files is to convert them first to HTML.

For a summary of other methods see: PDF in SuperMemo

Converters

The best: Investintech

  • address: http://www.pdf.investintech.com/
  • free on-line PDF -> HTML converter
  • converts a 1 MB file in seconds
  • messes up some mathematical formulas
  • sometimes conversion fails with "There was an error with conversion"
  • lots of extra HTML code for absolute positioning
  • HTML difficult to process in incremental reading without splits and filtering
  • articles may be split with a custom split string (see below)
  • to split articles, use Ctrl+Enter and Split: Split the article
  • separated pages are easier to filter and to illustrate with downloadable images
  • important improvement in the quality of texts comes with removing absolute positioning (see below)
  • filter with F6
  • add pictures with Ctrl+F8
  • incremental reading extracts work well on the final product, i.e. individual filtered pages

Custom string for article splits

Exemplary custom split string for separating pages in Split the article:

<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px">
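The same split can be approximated outside SuperMemo when preparing files in bulk. The following is a minimal sketch, not SuperMemo's own splitting routine: it assumes the converter output contains the split string exactly as quoted above, and the function name split_pages is hypothetical. Keeping the marker at the start of each chunk is also an assumption made here so that no HTML is lost.

```python
# Assumption: the split string quoted above; real converter output may differ.
SPLIT_STRING = '<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px">'


def split_pages(html: str, marker: str = SPLIT_STRING) -> list[str]:
    """Split converted HTML into chunks, one per occurrence of the marker."""
    head, *rest = html.split(marker)
    # Re-attach the marker so each chunk still starts with its page DIV.
    pages = [marker + chunk for chunk in rest]
    # Keep any non-empty text that precedes the first marker as its own chunk.
    if head.strip():
        pages.insert(0, head)
    return pages


if __name__ == "__main__":
    sample = (
        "<HTML><BODY>"
        + SPLIT_STRING + "<P>Page 1</P></DIV>"
        + SPLIT_STRING + "<P>Page 2</P></DIV>"
        + "</BODY></HTML>"
    )
    for number, page in enumerate(split_pages(sample), start=1):
        print(number, page)
```

Joining the returned chunks reproduces the input unchanged, so the sketch only cuts the text and never rewrites it.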

Code to remove to avoid absolute positioning

Exemplary code to remove to improve text flow in SuperMemo:

style="POSITION: absolute;

For example, use Ctrl+Shift+F6 to view the HTML source, then use Search&Replace to remove that code from the source.
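For many files, the same replacement can be scripted before import. This is a minimal sketch that mirrors the manual Search&Replace: it deletes the fragment quoted above and touches nothing else. The function name remove_absolute_positioning is hypothetical, and matching regardless of letter case is an added assumption, since converters may not capitalize the property consistently.

```python
import re

# Assumption: the fragment quoted above; real converter output may differ.
POSITIONING = 'style="POSITION: absolute;'


def remove_absolute_positioning(html: str, fragment: str = POSITIONING) -> str:
    """Delete every occurrence of the fragment, ignoring letter case."""
    # re.escape makes the fragment match literally, not as a regex pattern.
    return re.sub(re.escape(fragment), "", html, flags=re.IGNORECASE)


if __name__ == "__main__":
    page = '<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px"><P>text</P></DIV>'
    print(remove_absolute_positioning(page))
```

As with the manual replacement, the rest of the tag (here the LEFT and TOP values and the closing quote) is left in place; only the positioning fragment is removed.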

On-line converters

Downloads

Converting to MS Word

There do not seem to be any tools that reliably convert PDF to MS Word or HTML format. Pasting from Acrobat to MS Word yields poor results (e.g. texts in columns lose or mix portions of text).

Links to investigate

Rather not useful

See also