Problem description:

I have an old Linux version (0.12.4) of pdftotext that runs without problems, but I would like to run it on a Windows 7 machine.

I downloaded the Windows installer for what appears to be the latest version, xpdf-2.03-bin.exe, from http://gnuwin32.sourceforge.net/packages/xpdf.htm.

I accepted all the installer defaults. When I ran the Windows 7 pdftotext on a PDF file that the Linux version handled correctly, I got the following series of error messages:

 - Error <0>: PDF file is damaged - attempting to reconstruct xref table ...
 - Error: Couldn't find trailer dictionary
 - Error: Couldn't read xref table

I did a web search on these error messages, but none of the issues I found associated with these errors seemed related to the problem I'm having.

Has anyone encountered this problem with pdftotext on Windows 7, or does anyone know how to resolve it?

Answer:

Guessing from the version numbers:

  • 0.12.4 on Linux
  • 2.03 on Windows

you seem to be using two very different beasts, both containing a utility named pdftotext:

  • Version 0.12.4 is a Poppler-based version of pdftotext, released in Feb 2010. At almost 5 years old, it is rather outdated. Poppler is a 'fork' of the original XPDF code base, which happened in 2005. Since the fork, Poppler has been developed faster than the "mother" code and has acquired many additional, useful features. It is difficult to find pre-compiled Poppler binaries for Windows, though. The latest release is 0.30.0 (Jan 2015).

  • Version 2.03 is an XPDF-based version of pdftotext, released in Oct 2003. At more than 11 years old, it is ancient. XPDF is the original software that provides the pdftotext utility; it was first released in 1995. It is still being developed, albeit more slowly than the Poppler fork. Its most recent release is version 3.04 (May 2014), which can be downloaded from the official XPDF website. This may be of particular interest to you: that release contains a new text extractor!
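Since the guess above rests on version numbers alone, here is a minimal Python sketch for checking which code base a given pdftotext binary actually comes from. It assumes that `pdftotext -v` prints a banner mentioning "Poppler" for Poppler builds, and only the "Glyph & Cog" copyright line for XPDF builds; the function names are hypothetical, and the banner format may differ between releases or builds.

```python
import shutil
import subprocess


def classify_pdftotext(version_text: str) -> str:
    """Guess which code base a pdftotext banner comes from.

    Assumption: Poppler banners mention "Poppler" (and usually also the
    Glyph & Cog copyright), while XPDF banners only carry "Glyph & Cog".
    """
    text = version_text.lower()
    if "poppler" in text:
        return "poppler"
    if "glyph & cog" in text:
        return "xpdf"
    return "unknown"


def pdftotext_banner(exe: str = "pdftotext") -> str:
    """Run `<exe> -v` and return its combined stdout/stderr text."""
    path = shutil.which(exe)  # on Windows this also finds pdftotext.exe
    if path is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    # The version banner is typically written to stderr, so merge both
    # streams; no check=True because some builds exit non-zero after -v.
    result = subprocess.run(
        [path, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        banner = pdftotext_banner()
    except FileNotFoundError as exc:
        print(exc)
    else:
        print(banner.strip())
        print("Looks like:", classify_pdftotext(banner))
```

Running the same script on both the Linux and the Windows machine should make it obvious whether you are comparing a Poppler build against an XPDF build. Matching on the banner text rather than on the version number is a deliberate choice: version numbering schemes can change over time, while the project name in the copyright line is a more direct signal.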
