PDF to HTML converters
Summary
The best approach to incremental processing of PDF files is to convert them first to HTML.
For a summary of other methods see: PDF in SuperMemo
Converters
The best: Investintech
- address: http://www.pdf.investintech.com/
- free on-line PDF -> HTML converter
- converts 1 MB file in seconds
- messes up some mathematical formulas
- sometimes conversion fails with "There was an error with conversion"
- lots of extra HTML code for absolute positioning
- HTML difficult to process in incremental reading without splits and filtering
- articles may be split with a custom split string (see below)
- to split articles, use Ctrl+Enter and Split: Split the article
- separated pages are easier to filter and to illustrate with downloadable images
- removing absolute positioning substantially improves the quality of texts (see below)
- filter with F6
- add pictures with Ctrl+F8
- incremental reading extracts work well on the final product, i.e. individual filtered pages
Custom string for article splits
Exemplary custom split string for separating pages in Split the article:
<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px">
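For files too large to split comfortably by hand, the same split can be sketched outside SuperMemo. The Python snippet below is only an illustration, assuming the converter emits the exact marker shown above at the start of every page; the function name split_pages and the sample HTML are hypothetical and not part of SuperMemo or Investintech.

```python
# Marker copied from the custom split string above; assumed to start each page.
SPLIT_STRING = '<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px">'


def split_pages(html, marker=SPLIT_STRING):
    """Split converted HTML into chunks, one per occurrence of the marker."""
    head, *rest = html.split(marker)
    # Re-attach the marker so each chunk still begins with its page DIV.
    pages = [marker + chunk for chunk in rest]
    # Keep any non-empty content before the first marker as its own chunk.
    if head.strip():
        pages.insert(0, head)
    return pages


# Hypothetical two-page sample standing in for real converter output.
sample = (
    "<HTML><BODY>"
    + SPLIT_STRING + "Page one</DIV>"
    + SPLIT_STRING + "Page two</DIV>"
    + "</BODY></HTML>"
)

pages = split_pages(sample)
print(len(pages))
```

Each chunk can then be saved or pasted as a separate article. If a converter writes the marker with different spacing or letter case, the literal match above will not find it and the marker would have to be adjusted.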
Code to remove to avoid absolute positioning
Exemplary code to remove to improve text flow in SuperMemo:
style="POSITION: absolute;
For example, use Ctrl+Shift+F6 to view the source HTML, and use Search&Replace to remove that code from it.
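The same removal can be sketched in Python for batch use before import. This is a minimal illustration that mimics the manual Search&Replace on the literal code shown above; the function name remove_absolute_positioning and the sample input are hypothetical, and the case-insensitive match is an added assumption in case a converter varies the letter case.

```python
import re

# Literal code to remove, copied from the example above.
ABSOLUTE = 'style="POSITION: absolute;'


def remove_absolute_positioning(html):
    """Delete every occurrence of the literal code, ignoring letter case."""
    # re.escape makes the quote, colon, and spaces match literally.
    # Like the manual Search&Replace, this leaves the rest of the
    # attribute (e.g. LEFT/TOP values) in place.
    return re.sub(re.escape(ABSOLUTE), "", html, flags=re.IGNORECASE)


# Hypothetical sample standing in for real converter output.
sample = '<DIV style="POSITION: absolute; LEFT: 0px; TOP: 0px">Text</DIV>'
cleaned = remove_absolute_positioning(sample)
print(cleaned)
```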
On-line converters
- Investintech: http://www.pdf.investintech.com/ (probably the best of them all)
- PDF online: http://www.pdfonline.com/ (2 MB size limit, converts to MS Word only?, desktop version $20)
- https://www.easypdfcloud.com/ (converts to MS Word only? - keeps hanging up? - requires signing in for files above 2MB)
Downloads
- Adobe Acrobat: http://www.adobe.com/products/acrobatpro.html (probably the best tool, $15 per month in subscription?)
- Wondershare: http://www.wondershare.net/ad/pdf-editor/converter.html
- Some PDF to HTML Converter (website subscription is needed to use it)
- Boxoft Free PDF to HTML
- PDF Mate: http://www.pdfmate.com/download.html
- Tipard: http://www.tipard.com/pdf-to-html-converter/
- Joboshare: http://www.joboshare.com/pdf-to-html-converter.html
- FlipPDF: http://www.flippdf.com/flip-pdf-to-html/index.html
- Convert Zone: http://www.convertzone.com/pdftohtm/index.htm ($40)
- ABC Amber: http://www.thebeatlesforever.com/processtext/abcpdf.html (link to buy it now goes to ... testimonials?)
- Clickcat: http://www.pdf-to-html.com/products.html
Converting to MS Word
There do not seem to be any tools that reliably convert PDF to MS Word or HTML format. Pasting from Acrobat to MS Word yields poor results (e.g. texts in columns lose or mix portions of text).
Links to investigate
- http://www.intrapdf.com/ ($50, probably works in Windows, i.e. not on-line)
- http://www.anypdftools.com/ (PDF -> MS Word, which should later be converted to RTF or WordPad subset to minimize Word-specific garbage)
- http://www.pdfpdf.com/pdfconverter.html ($50, works in Windows)
- http://www.zamzar.com/ (works on-line, but requires typing in e-mail address)
- http://www.pdftohtml.net/ (asks for a file on a local drive, sends the results via e-mail, i.e. requires typing in e-mail before providing any results)
- http://www.pdfonline.com/convert-pdf-to-html/ (on-line converter rejects a 3 MB file as too big)
- http://atechguide.com/online-pdf-to-html-converter/
- http://webdesign.about.com/od/pdf/tp/tools-for-converting-pdf-to-html.htm
- http://labnol.blogspot.com/2005/12/convert-doc-xls-ppt-rtf-pdf-to-html.html
Rather not useful
- http://www.html-to-pdf.net/free-online-pdf-converter.aspx (converts from HTML to PDF, not the other way round)
- http://www.web2pdfconvert.com/ (converts from HTML to PDF, not the other way round)
- http://www.htmlpdf.com/ (converts from HTML to PDF, not the other way round)