PDFs created from screencap PNGs shrink far more than ones from scanned JPGs. Why?
December 24, 2019 7:38 PM

I've only recently started making PDFs (using Photoshop CS2 on an old Mac) from loose image files. The 200 DPI grayscale scanned JPGs, when collected together to make a PDF, produced a single PDF only somewhat smaller than the total size of the JPGs included in it. (For example, a 28.5 MB folder of JPGs resulted in a PDF of 25.5 MB; one of around 85 MB resulted in a PDF around 81 MB.)

When I created a PDF from 72 DPI color screencap PNGs (black text on solid color background), however, the resulting PDF was roughly ten percent the size of the original (13 MB vs 110 MB). Why such a difference between the two results? (I suspect it was because the only color in the color PNGs was in the solid background, and the PDF compression algorithm was able to reduce the space that data took up quite dramatically. If this is indeed the case, can one safely expect the quality of the actual black/gray text in these PDFs to be equal in quality to that of the black/gray text in the original color screencaps, with only the solid color background suffering degradation?)
posted by tenderly to Technology (6 answers total) 1 user marked this as a favorite
 
Best answer: PNG compresses solid colors quite efficiently. However, gradients and continuous-tone content compress much less efficiently than they would under a lossy algorithm like JPEG. When you convert images into PDFs, the software is probably JPEG-compressing them, apparently at a slightly more aggressive setting than whatever made your existing JPEGs used. And your PNG screenshots probably have antialiased text: if they were truly two colors with no smoothing, you'd need an enormous number of 72 dpi screenshots to get up to 110 MB.
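
To make the solid-color point concrete, here is a rough sketch using Deflate, the compression PNG is built on. It is not a real PNG encoder (it skips PNG's row filtering), and the page size, gray levels and noise range are assumptions picked only for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 792, 936  # 11 x 13 inches at 72 dpi, one byte per pixel

# A page that is one solid colour everywhere.
flat = bytes([200]) * (WIDTH * HEIGHT)

# A page where every pixel varies a little, standing in for scan grain
# or other continuous-tone content (assumed noise range, not measured).
rng = random.Random(0)
noisy = bytes(rng.randrange(180, 221) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat, 9))
noisy_size = len(zlib.compress(noisy, 9))
print(f"solid colour: {flat_size:,} bytes, noisy: {noisy_size:,} bytes")
```

The flat page collapses to almost nothing, while the noisy one barely shrinks, which is why lossless formats do badly on scans and well on clean screenshots.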
posted by aubilenon at 7:46 PM on December 24, 2019 [1 favorite]


Best answer: PNG is a lossless compression format, which means that it will exactly reproduce the input image. JPG is a lossy compression format, which means that while the result may still "look good", the pixels produced when you uncompress the JPG will not be identical to the original. Likely what is happening is the PDF is using a lossy compression format (probably JPG) and this is leading to the large decrease in size.

So you are losing some information in the process; whether that is acceptable depends on your needs. For typical pictures of people, places and things, JPG usually does a great job. For art and text with sharp edges that need to remain sharp, PNG is usually better, because the assumptions JPG makes about what is and isn't important fail in these situations, introducing artifacts that are not in the original image.
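
A small sketch of the lossless/lossy distinction, for anyone who wants to see it in numbers. The lossy half is a stand-in only: it rounds pixel values to a coarse step, which is far cruder than what JPEG really does, and the pixel values and `STEP` are made up for the example.

```python
import zlib

# One row of pixels: a dark text stroke (20) on a light background (235),
# with in-between values where the edge is antialiased.
row = bytes([235, 235, 180, 90, 20, 20, 90, 180, 235, 235])

# Lossless: Deflate (the compression inside PNG) gives back the exact bytes.
lossless = zlib.decompress(zlib.compress(row))

# Lossy stand-in: round every pixel to the nearest multiple of STEP.
STEP = 32  # hypothetical coarseness, chosen only for illustration
lossy = bytes(min(255, round(p / STEP) * STEP) for p in row)

worst_error = max(abs(a - b) for a, b in zip(row, lossy))
print("lossless round trip identical:", lossless == row)
print("lossy round trip identical:", lossy == row)
print("largest pixel error after lossy step:", worst_error)
```

The lossy version still looks like the same stroke, but no amount of decoding will bring the original values back.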
posted by NoDef at 7:50 PM on December 24, 2019 [3 favorites]


Response by poster: note: Almost all these screenshots are of black text on solid color background only (i.e. books); my grayscale personal scans (also of books) are of black text on a white background.

aubilenon: my (11 x 13") 72 dpi screenshot PNGs are around 990k each (so 225 page book @ two pages/screenshot= 110 MB)

NoDef: Likely what is happening is the PDF is using a lossy compression format (probably JPG) and this is leading to the large decrease in size.
Are you saying that Adobe PDF creator is applying (probably) JPEG compression to the PNGs, making them roughly 90% smaller, but not to the (already compressed) JPGs (where the resulting file is very close in size to the original scans)?
posted by tenderly at 8:17 PM on December 24, 2019


Best answer: No, Creator will be uncompressing everything you feed to it, then recompressing the final composite result. And the file sizes you're getting suggest strongly that it's using JPEG (lossy) compression to do that last step.
posted by flabdablet at 8:20 PM on December 24, 2019 [1 favorite]


Best answer: If you are using Adobe Pro, dig into the Create PDF preferences: under Advanced Settings, use Color to tweak color handling and Images to examine image compression. Then only use Create PDF, rather than Save As or Print to PDF.
posted by Riverine at 5:03 AM on December 25, 2019


Best answer: PDF is a metafile format: that is, it can contain (and more importantly, describe how to display) other file formats. PDF can store JPEG data unchanged, and PNG pixel data losslessly, so the times you're getting PDFs close in size to your JPEG input are probably when Creator is storing your JPEG files as-is.
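
One way to check what actually ended up inside a PDF is to look at the filter names on its streams: `DCTDecode` means JPEG data, `FlateDecode` means lossless Deflate. The helper below is a rough sketch with a made-up name (`count_stream_filters`). It scans the raw bytes rather than parsing the PDF properly, counts every stream (page content as well as images), and only sees the first filter when several are chained.

```python
import re
from collections import Counter

def count_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count the filter names that follow /Filter in raw PDF bytes."""
    names = re.findall(rb"/Filter\s*\[?\s*/(\w+)", pdf_bytes)
    return Counter(name.decode("ascii") for name in names)

# Tiny hand-written fragment standing in for a real file's contents.
sample = (
    b"1 0 obj << /Subtype /Image /Filter /DCTDecode /Length 10 >>\n"
    b"2 0 obj << /Subtype /Image /Filter [/FlateDecode] /Length 10 >>\n"
)
print(count_stream_filters(sample))
```

For a real file you would pass in `open("book.pdf", "rb").read()`, where `book.pdf` is a placeholder name. A PDF made from screenshots that reports mostly `DCTDecode` has been JPEG-compressed along the way.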

If your screencaps are truly just two colours, then PNG should be the smallest file format available to you. PNG has variable compression and bit-depths, so maybe your software is saving the PNGs as 24-bit RGB when you really only need a 2-colour paletted PNG. If the screencaps are using anti-aliased text (and almost all modern displays anti-alias using methods like ClearType), then you'll have to use more colours in the palette.
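
To put rough numbers on the bit-depth point, this sketch works out the uncompressed pixel data for one 11 x 13 inch screenshot at 72 dpi (the size mentioned earlier in the thread) at three bit depths. It ignores PNG headers and compression, so treat the figures as upper bounds on the pixel data rather than real file sizes.

```python
WIDTH_IN, HEIGHT_IN, DPI = 11, 13, 72

width_px = WIDTH_IN * DPI
height_px = HEIGHT_IN * DPI

def raw_bytes(bits_per_pixel: int) -> int:
    """Uncompressed size of the pixel data, rounding each row up to whole bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

for label, bits in [("24-bit RGB", 24), ("8-bit palette", 8), ("1-bit palette", 1)]:
    print(f"{label}: {raw_bytes(bits):,} bytes before compression")
```

Even with no compression at all, a 1-bit version of that page is well under a tenth of the 990k the screenshots currently take, which suggests they are being saved with far more colour depth than two-colour text needs.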

JPEG can be a bit disappointing for text. Hard transitions between colours (and that's what makes text crisp) cause "ringing" noise around the text. Counterintuitively, if you add a tiny bit of noise/dithering to the image you can get JPEG compression to much higher levels before the ringing becomes really unpleasant. But even these images are likely to be larger than a well-tuned PNG.
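
The ringing can be reproduced in a few lines. JPEG works on a discrete cosine transform (DCT) of 8x8 pixel blocks; the sketch below does a one-dimensional version on a single row of eight pixels and simply throws away the higher-frequency half of the coefficients. That is a simplification: real JPEG quantises coefficients rather than dropping them, and `KEEP` is an arbitrary cut-off picked for the demonstration.

```python
import math

def dct(samples):
    """Unnormalised DCT-II of a list of samples."""
    n = len(samples)
    return [
        sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, s in enumerate(samples))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + (2 / n) * sum(coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(1, n))
        for i in range(n)
    ]

# Eight pixels crossing a hard edge: black text meeting a white page.
edge = [0, 0, 0, 0, 255, 255, 255, 255]

coeffs = dct(edge)
KEEP = 4  # hypothetical cut-off: keep only the four lowest frequencies
truncated = coeffs[:KEEP] + [0.0] * (len(coeffs) - KEEP)

restored = [round(v) for v in idct(truncated)]
print("original:", edge)
print("restored:", restored)
```

The restored row no longer steps cleanly from black to white: it dips below 0 and climbs above 255 on either side of the edge, and after clamping to the valid range those ripples are what show up as grey halos around letters.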

Most interactive software has built-in assumptions, so Riverine's suggestion of digging into the advanced options is worth doing. Personally, when I have control over the input format, a combination of pngcrush, the IJG programs and the almost magic img2pdf command line tool is what I use. img2pdf can be truly lossless: even the metadata fields from input JPEG images are kept, so you can use it as a kind of portable viewable archive for your photos. You can extract the source images with a tool like pdfimages.
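
img2pdf can also be driven from Python, which is handy for a folder full of page images. The following is a minimal sketch, not a recommended workflow: the function names, the folder layout and the list of file extensions are assumptions, and it needs the third-party `img2pdf` package installed. The `natural_key` helper is there because plain alphabetical sorting would put `page10.png` before `page2.png`.

```python
import re
from pathlib import Path

def natural_key(path: Path):
    """Sort key that orders page2.png before page10.png."""
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]

def images_to_pdf(folder: str, out_file: str) -> None:
    """Collect the PNG/JPEG files in `folder` into one PDF, in page order."""
    import img2pdf  # third-party package: pip install img2pdf

    pages = sorted(
        (p for p in Path(folder).iterdir() if p.suffix.lower() in {".png", ".jpg", ".jpeg"}),
        key=natural_key,
    )
    with open(out_file, "wb") as f:
        f.write(img2pdf.convert([str(p) for p in pages]))
```

Something like `images_to_pdf("screenshots", "book.pdf")` (both names are placeholders) would then produce a PDF whose images have not been recompressed with a lossy codec.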

(In the past, the recommended image format for truly black and white scans was G4 [as used in fax machines] but a PNG image can often manage the same or slightly better compression with less hassle. Another option supported by PDF for B&W images is JBIG2, but its lossy mode can make your text illegible [or worse]. For true continuous-colour images, JPEG 2000 can result in stunningly small files, but they can be very slow to display and fall apart into blurry goo if you get the compression settings wrong. I don't really recommend any of these parenthetical formats, but they might be offered as part of your software's advanced options.)
posted by scruss at 6:20 AM on December 26, 2019

