FDF Text Extractor

Created on Thursday, November 28, 2013.
Filed under Software, Productivity.

Annotations made in PDFs can be exported as FDF files, but FDFs are full of junk formatting and are not directly usable. This tool pulls text annotations from FDFs and presents them as plain text for use in other programs.

The Extractor also tidies the output: it collapses line breaks, capitalises the beginning of each annotation, replaces HTML entities with their plain-text equivalents, and converts HTML text formatting into Markdown text formatting.
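The clean-up steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Extractor's actual code: the function name `tidy_annotation` is hypothetical, and the set of HTML tags handled (bold and italic only) is an assumption.

```python
import html
import re


def tidy_annotation(raw: str) -> str:
    """Clean one annotation string (illustrative sketch, not the tool's code)."""
    text = raw
    # Convert simple HTML formatting to Markdown.
    # Assumed tag set: <b>/<strong> for bold, <i>/<em> for italic.
    text = re.sub(r"</?(?:b|strong)\s*>", "**", text, flags=re.IGNORECASE)
    text = re.sub(r"</?(?:i|em)\s*>", "*", text, flags=re.IGNORECASE)
    # Replace HTML entities such as &amp; with their plain-text equivalents.
    text = html.unescape(text)
    # Collapse line breaks (and whitespace around them) into single spaces.
    text = re.sub(r"\s*[\r\n]+\s*", " ", text).strip()
    # Capitalise the first character and leave the rest untouched.
    return text[:1].upper() + text[1:]


print(tidy_annotation("this is <b>very</b>\r\nimportant &amp; urgent"))
# prints: This is **very** important & urgent
```

Tags are converted before entities are unescaped so that an escaped literal such as `&lt;b&gt;` is not mistaken for real formatting.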

Compatibility

Supported formats

  • .FDF (generated by Adobe Reader only)

Supported Adobe Reader annotations

  • Note highlights (highlights that contain text)
  • Sticky notes
  • Floating text
  • Text boxes
  • Callout boxes

Usage

There are two ways to use this extractor:

  1. Drag and drop .FDF files onto FDF Note Extractor.exe. You can convert multiple files at once by dragging and dropping them together. A .TXT file containing your text annotations will appear in the same folder as the original .FDF.
  2. Call "FDF Note Extractor.exe" $1 from a batch file, where $1 stands for the path to an .FDF file. The converted .TXT files are placed in the same directories as the original .FDF files.
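For converting a whole folder from a script rather than by hand, something like the following Python sketch could work. It assumes the executable accepts the path to one .FDF file as its only argument, as in the batch-file usage above. The install path `C:\Tools\...` and the function names are hypothetical, and the exit code is not checked because the tool's exit-code behaviour is not documented here.

```python
import subprocess
from pathlib import Path

# Assumed install location; adjust to wherever the executable actually lives.
EXTRACTOR = Path(r"C:\Tools\FDF Note Extractor.exe")


def build_command(fdf_path: Path, extractor: Path = EXTRACTOR) -> list[str]:
    """Return the argument list for converting one .FDF file."""
    return [str(extractor), str(fdf_path)]


def convert_folder(folder: Path) -> list[Path]:
    """Run the extractor on every .FDF in a folder; return the files passed to it."""
    fdf_files = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".fdf")
    for fdf in fdf_files:
        # Passing a list avoids quoting problems with spaces in the file names.
        subprocess.run(build_command(fdf))
    return fdf_files
```

Calling `convert_folder(Path(r"C:\Users\me\Documents\annotations"))` (a made-up folder) would then leave one .TXT file next to each .FDF, as described in the list above.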

Caveats

  • Only .FDFs generated by Adobe Reader will reliably work. The output produced by other programs varies widely and is sometimes even messier.
  • Only annotations containing text are extracted. Plain highlights are not extracted because FDFs don’t store the highlighted text, only the location of the highlight within the document.
    • Use notes instead of highlights (highlight a block, right-click it, and choose ‘Add Note to text’), then copy and paste the highlighted text into the note’s popup box. You get the visual effect of a highlight, plus a stored copy of the highlighted text that is exported with the .FDF.
  • Faulty OCR in some documents may produce ‘null’ characters in the exported .FDF. A null character makes the annotation that contains it unreadable: AutoHotkey uses null characters to signal the end of a string, so it stops reading there because it believes there is nothing beyond it. If this happens, the best solution is to re-OCR the document using something like PDF-XChange Viewer, which has an excellent OCR tool that is free to use.
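To check whether an exported .FDF is affected by the null-character problem before converting it, a small diagnostic along these lines might help. It is a sketch with hypothetical function names, and it only counts raw null bytes. A file can contain null bytes for other reasons (for example, because of its text encoding), so a hit is a hint rather than proof of faulty OCR.

```python
from pathlib import Path


def count_null_bytes(data: bytes) -> int:
    """Return how many null (0x00) bytes appear in raw FDF data."""
    return data.count(b"\x00")


def first_null_offset(data: bytes) -> int:
    """Return the offset of the first null byte, or -1 if there is none."""
    return data.find(b"\x00")


def report(fdf_path: Path) -> str:
    """Summarise the null bytes found in one .FDF file."""
    data = fdf_path.read_bytes()
    count = count_null_bytes(data)
    if count == 0:
        return f"{fdf_path.name}: no null bytes found"
    offset = first_null_offset(data)
    return f"{fdf_path.name}: {count} null byte(s), first at offset {offset}"
```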

Let me download it!

Download FDF Text Extractor (Windows).

Changelog

v1 (28 Nov 2013)

That's all there is, there isn't any more.
© Desi Quintans, 2002 – 2016.