Numerous .txt to PDF (best options)
December 6, 2009 6:11 PM

What's the best way to make a PDF of a bunch of plain text files?

I have 260+ .txt files I would like to convert into one PDF. Each .txt file is a page or so. I would like each file to be its own page/story. I use a Mac.

Presume I have access to most programs. Presume I have a passable knowledge of terminal.

I would rather not import each file one at a time. I would prefer not to just combine them into one big file to import and have to flow each page.

My end goal: I want a singular PDF with each .txt file as its own page (so I can reorder as I prefer). What's the best way to do this?

If it matters, each of these pages will be styled identically, but there may be some commentary inserted between pages.

I am willing to buy an app if need be, but the cheaper the better.
posted by cjorgensen to Computers & Internet (11 answers total)
txt2pdf should work for you; there's a compiled OS X executable linked on the site. It's a command-line app, so you'll have to run it from the terminal, but it should be as simple as running "txt2pdf *.txt" once you have it installed.
posted by reptile at 6:14 PM on December 6, 2009


I would guess if you run txt2pdf above you'll end up with 260 pdf files. If you want to merge them into one big pdf there are several programs that can do this. I've used 'pdf toolkit' in the past; see here. Run the following and you're done:

pdftk *.pdf cat output combined.pdf
posted by PercussivePaul at 6:55 PM on December 6, 2009


The enscript command can do this, if it's on your Mac (I'm not sure if it's included). Downside: it can be hard to make it look nice.
posted by miyabo at 7:12 PM on December 6, 2009


Here's how I would do it:

1. Install pdftk.

2. Get enscript if you don't have it (I think Macs ship with it).

3. Get ps2pdf if you don't have it.

4. Get all the files in one directory, and move to that directory in a terminal.

5. Execute this in the terminal:

pdfs=() # list of per-file PDFs, kept as an array so names with spaces survive
for f in *.txt
do
    fbase="${f%.txt}" # strip the file extension
    enscript -B -p "$fbase.ps" "$f" # convert the text file to PostScript
    ps2pdf "$fbase.ps" "$fbase.pdf" # convert the PostScript to PDF
    pdfs+=("$fbase.pdf") # add the PDF filename to the list
done
pdftk "${pdfs[@]}" cat output FINALPRODUCT.pdf # concatenate all the PDFs into FINALPRODUCT.pdf

NB: All the intermediate ps and pdf files will still be there afterwards, so you may want to clean them up.
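For the cleanup, something like the following might do. It is only a sketch: cleanup_intermediates is a made-up helper name, and it assumes the intermediate files sit next to the .txt files and share their base names. It only touches files that correspond to a .txt file, so unrelated PDFs in the directory are left alone.

```shell
# Remove the .ps and .pdf intermediates that belong to each .txt file in the
# current directory, keeping the merged PDF named in $1.
# cleanup_intermediates is a hypothetical helper name, not part of any tool.
cleanup_intermediates() {
    final="$1"                        # merged PDF to keep, e.g. FINALPRODUCT.pdf
    for f in *.txt
    do
        [ -e "$f" ] || continue       # the glob matched nothing
        fbase="${f%.txt}"             # strip the file extension
        rm -f "$fbase.ps"             # PostScript intermediate
        if [ "$fbase.pdf" != "$final" ]; then
            rm -f "$fbase.pdf"        # per-file PDF intermediate
        fi
    done
}
```

After the merge has finished you would run: cleanup_intermediates FINALPRODUCT.pdf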

Also note that the "-B" option for enscript suppresses the header for each text file in the pdf documents. If you want a header, don't use the -B option.

Disclaimer: This is all bash, and the invoked programs claim to be compatible with OS X, but I'm familiar with Linux, not Mac OS, so I would check with a Mac person that this works first.
posted by Salvor Hardin at 7:28 PM on December 6, 2009


Oh, and I don't know if you want a particular order. The code I posted above will probably do it alphabetically by file name. If you want some other order, you'll have to replace the for loop with some other loop. Also, if you need to insert commentary, I think the best way would probably be to add it to one of the text files. Or you could make a separate text file for each piece of commentary, and name it so it will be alphabetized into the correct place.
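One way to get a custom order, sketched under some assumptions: keep a plain list file (called order.txt here, a made-up name) with one .txt filename per line in the order you want, and build the pdftk argument list from it. merge_in_order is a hypothetical helper, and it assumes the per-file PDFs already exist with the same base names as the text files.

```shell
# Merge per-file PDFs in the order given by a list file.
# $1: list file with one .txt filename per line (blank lines and lines
#     starting with # are skipped)
# $2: name of the merged PDF to write
# merge_in_order is a hypothetical helper name.
merge_in_order() {
    order_file="$1"
    output="$2"
    set --                                    # start with an empty argument list
    while IFS= read -r line || [ -n "$line" ]
    do
        case "$line" in
            ''|'#'*) continue ;;              # skip blanks and comments
        esac
        set -- "$@" "${line%.txt}.pdf"        # story.txt -> story.pdf
    done < "$order_file"
    pdftk "$@" cat output "$output"
}
```

Running merge_in_order order.txt FINALPRODUCT.pdf would then merge in list order. Commentary pages can be slotted in by adding their filenames to the list wherever they belong, and reordering means editing the list rather than renaming files.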
posted by Salvor Hardin at 7:37 PM on December 6, 2009


One more thing - enscript is pretty flexible, and I think you can do a lot of formatting with it, but you'll have to read the man page. I know you can specify a font with the -f option.
posted by Salvor Hardin at 7:38 PM on December 6, 2009


You can convert a .txt to .pdf by selecting "Save as PDF" in the Print... dialog box. Combine PDFs is a freeware program that will--drum roll, please--combine PDFs.
posted by neuron at 8:52 PM on December 6, 2009


Presume I have access to most programs.

If that includes Acrobat, then "File>Create PDF>From multiple files"
posted by camcgee at 9:44 PM on December 6, 2009


If you want pretty-printing features, check out a2ps. It has a ton of features and options, plus stylesheets for many formats and languages. You need the --file-align=sheet option in particular to make each file start on a different page. Like the name implies, it generates ps, not pdf, but I'm guessing you already know how to make the final conversion.
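As a rough sketch of that route: the helper below runs a2ps once over all the files and then hands the result to ps2pdf. a2ps_to_pdf is a made-up name, and the -1 option (one page per sheet) is my assumption about the layout you want; without it a2ps normally prints two pages per sheet.

```shell
# Convert a set of text files to one PDF via a2ps and ps2pdf, starting each
# file on a new sheet.
# $1: name of the PDF to write; remaining arguments: the text files.
# a2ps_to_pdf is a hypothetical helper name.
a2ps_to_pdf() {
    output="$1"                # final PDF name, e.g. stories.pdf
    shift                      # the remaining arguments are the text files
    ps="${output%.pdf}.ps"     # intermediate PostScript file
    a2ps --file-align=sheet -1 -o "$ps" "$@" &&
        ps2pdf "$ps" "$output"
}
```

Usage would be along the lines of: a2ps_to_pdf stories.pdf *.txt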
posted by Dr Dracator at 10:13 PM on December 6, 2009


Another option, if you are super concerned about the text files ending up looking professional, might be to write a sed or perl script to convert the text files to sections or chapters in a TeX/LaTeX document.

I don't think you'll find massaging these into LaTeX too difficult if (a) you understand regular expressions, (b) your text files use a fairly simple and homogeneous layout, and (c) you want the final results to all use identical layout, spacing, etc., i.e. look professional. If your control file uses \include, or if individual files are chapters, then you'll automatically get each file beginning on a new page.

For example:
If the only formatting is centering the title, then maybe:
s/^ +([A-Z][A-Za-z0-9 ]+)/\\chapter\{\1\}/
If section headings begin with a number, then:
s/^ ? ? ?[0-9]\. +([A-Z][A-Za-z0-9 ]+)/\\section\{\1\}/
If paragraphs are not separated by blank lines, but are indented, then :
s/^ /\n\n/
You'll obviously need to escape any preexisting \ and % characters first, using s/\\/\\textbackslash{}/g and s/%/\\%/g (a doubled \\ would be read by LaTeX as a line break, not a literal backslash).
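Put together, a minimal sketch of such a filter might look like this. It assumes sed with the -E option (extended regular expressions), titles that are indented and alone on their line, and that a literal backslash should come out as \textbackslash{}. txt_to_latex is a made-up name, and the rules will need adjusting to your files.

```shell
# Read plain text on stdin and write a LaTeX fragment on stdout.
# txt_to_latex is a hypothetical helper name.
txt_to_latex() {
    # The escaping rules run first so that the backslashes added by the
    # \chapter and \section rules are not escaped again.
    sed -E \
        -e 's/\\/\\textbackslash{}/g' \
        -e 's/%/\\%/g' \
        -e 's/^ +([A-Z][A-Za-z0-9 ]+)$/\\chapter{\1}/' \
        -e 's/^ ? ? ?[0-9]\. +([A-Z][A-Za-z0-9 ]+)$/\\section{\1}/'
}
```

You would run it once per file, e.g. txt_to_latex < story1.txt > story1.tex, and then check a few outputs by eye before converting the rest.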

You'd need slightly more complex regular expressions if the documents are letters, use differing title representations, contain numerous source code examples, etc. If the files use accented characters, you'll want to look up the inputenc and babel packages.

You'll obviously never get reasonable LaTeX representations from ASCII art, btw.
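For the \include approach mentioned above, the control file can be generated rather than typed. This is a sketch only: make_master is a made-up name, the book class is my assumption, and it expects one already-converted .tex file per story. \include takes the filename without the .tex extension and starts each file on a new page; filenames containing spaces are likely to cause trouble.

```shell
# Print a LaTeX control file that pulls in one .tex file per story.
# Arguments: the per-story .tex files, in the order they should appear.
# make_master is a hypothetical helper name; the book class is an assumption.
make_master() {
    printf '%s\n' '\documentclass{book}' '\begin{document}'
    for f in "$@"
    do
        printf '\\include{%s}\n' "${f%.tex}"   # name without the .tex extension
    done
    printf '%s\n' '\end{document}'
}
```

Something like make_master story*.tex > book.tex, followed by running pdflatex on book.tex, should then give a single PDF. Reordering stories means reordering the \include lines.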
posted by jeffburdges at 4:17 AM on December 7, 2009


It is possible to produce PDFs at the command-line without installing any additional software.
posted by James Scott-Brown at 5:45 AM on December 15, 2009

