Ade Malsasa Akbar
Senior author, Open Source enthusiast.
Sunday, January 31, 2016 at 12:30

This guide shows how to convert a PDF into a TXT file with the pdftotext utility. pdftotext ships with Ubuntu as part of the poppler-utils package. Thanks to the Poppler Project and Glyph & Cog for providing this utility.
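
Before trying the commands in this guide, it may help to confirm that pdftotext is actually available. The lines below are a minimal sketch: they only test whether the command is on the PATH and, if it is not, suggest installing poppler-utils with apt-get (an assumption that you are on Ubuntu or another Debian-based system).

```shell
# Check whether pdftotext is on the PATH; suggest the package if it is not.
if command -v pdftotext >/dev/null 2>&1; then
    status="pdftotext found at $(command -v pdftotext)"
else
    status="pdftotext not found; try: sudo apt-get install poppler-utils"
fi
echo "$status"
```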

Converting As Is


pdftotext <pdf_file_name> <txt_file_name>
Explanation: this command converts every page of pdf_file_name into a single text file, txt_file_name.
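
If you have many PDF files, the same command can be wrapped in a loop. This is only a sketch: convert_all is a hypothetical helper name (not part of poppler-utils), and it assumes you want each TXT file saved next to its PDF with the same base name.

```shell
# convert_all is a hypothetical helper, not part of poppler-utils.
# It converts every .pdf file in a directory (default: current directory)
# to a .txt file with the same base name.
convert_all() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue          # skip when no PDF matches
        pdftotext "$pdf" "${pdf%.pdf}.txt"
    done
}
```

For example, convert_all ~/Documents would convert every PDF in that folder, assuming the folder exists and contains PDF files.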

Converting While Preserving the Original Text Layout


pdftotext -layout <pdf_file_name> <txt_file_name>
Explanation: the -layout option makes pdftotext keep the physical layout of the original PDF (columns, indentation, spacing) as closely as it can. Without it, pdftotext undoes the layout and writes the text in reading order.
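
To see what -layout changes for a particular document, you can convert it both ways and compare the results. The following is a rough sketch: compare_layout is a hypothetical helper name, and the temporary file names are chosen only for illustration.

```shell
# compare_layout is a hypothetical helper: it converts the same PDF twice,
# once with default settings and once with -layout, then prints the differences.
compare_layout() {
    pdf="$1"
    out=$(mktemp -d)
    pdftotext "$pdf" "$out/plain.txt"
    pdftotext -layout "$pdf" "$out/layout.txt"
    # diff exits non-zero when the files differ, which is expected here
    diff "$out/plain.txt" "$out/layout.txt" || true
    rm -r "$out"
}
```

Run it as compare_layout <pdf_file_name>. If nothing is printed, both conversions produced the same text.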

Converting PDF to HTML


pdftotext -htmlmeta <pdf_file_name> <html_file_name>
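Explanation: with the -htmlmeta option, pdftotext writes a simple HTML file: the extracted text is wrapped in <pre> tags and the PDF's meta information is added as headers. The result is still plain text inside an HTML page; it does not reproduce the visual formatting of the PDF.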

Converting Only Particular Pages


pdftotext -f <number> -l <number> <pdf_file_name> <txt_file_name>
Explanation: -f sets the first page to convert and -l sets the last page. Only the pages in that range, both ends included, are written to txt_file_name.
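
The same two options can also be used to save each page into its own file, by setting -f and -l to the same number inside a loop. This is a sketch under assumptions: extract_pages is a hypothetical helper name, and the page-N.txt naming is just an example.

```shell
# extract_pages is a hypothetical helper: it writes pages FIRST..LAST of a PDF
# into separate files named page-N.txt in the current directory.
extract_pages() {
    pdf="$1"; first="$2"; last="$3"
    n="$first"
    while [ "$n" -le "$last" ]; do
        pdftotext -f "$n" -l "$n" "$pdf" "page-$n.txt"
        n=$((n + 1))
    done
}
```

For example, extract_pages <pdf_file_name> 3 5 would create page-3.txt, page-4.txt and page-5.txt, assuming the PDF has at least five pages.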

Adjusting End-of-Line Characters for Another OS


pdftotext -eol dos <pdf_file_name> <txt_file_name>
pdftotext -eol unix <pdf_file_name> <txt_file_name>
pdftotext -eol mac <pdf_file_name> <txt_file_name>
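Explanation: the -eol option sets the end-of-line convention of the output. The first command (dos) is suitable if you want to read the TXT file in Windows. The second (unix) is suitable for GNU/Linux and other members of the UNIX family. The third (mac) uses the classic Mac OS convention of a single carriage return; current versions of macOS use the same line endings as UNIX, so unix is normally the right choice there.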
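
To check which end-of-line convention a TXT file actually uses, you can dump its first bytes with od. This is a small sketch: show_eol is a hypothetical helper name, and it simply relies on od -c printing carriage returns as \r and line feeds as \n.

```shell
# show_eol is a hypothetical helper: it dumps the first bytes of a text file
# so the line terminators become visible.
#   \r \n  = dos
#   \n     = unix
#   \r     = mac
show_eol() {
    od -c "$1" | head -n 4
}
```

Run it as show_eol <txt_file_name> and look at what follows the last character of the first line.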