Why Scanned PDFs Are So Much Larger Than Digital Ones

A ten-page letter typed in Word and exported to PDF might be 200KB. The same ten pages scanned and saved as a PDF might be 30MB, 150 times larger, even though the content is identical. This comes up constantly when people scan documents and then wonder why they can't email the result. The explanation is straightforward once you understand how each type of PDF stores its content.

Text Data vs Image Data: A Fundamental Size Difference

A digital PDF stores text as character data. The letter "A" in a PDF is stored as a reference to the character "A" in a specific font — a few bytes of information that tells the viewer what to draw and where. An entire page of text might occupy 5-10KB because each character is just a small reference, not a picture.
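To make the "few bytes per character" point concrete, here is a minimal Python sketch that builds the kind of content stream a PDF page uses to draw text, then measures it. It assumes a hypothetical page of 50 lines × 80 characters and a hypothetical font resource named `/F1`, and it ignores the embedded font file, which a PDF stores once and shares across pages. The operators (`BT`, `Tf`, `TL`, `Td`, `Tj`, `T*`, `ET`) are standard PDF text operators; Python's `zlib` stands in for the Flate compression PDFs commonly apply to streams.

```python
import zlib

def text_page_content_stream(lines: list[str], font_size: int = 11) -> bytes:
    """Build a PDF content stream that draws each line with standard text operators."""
    leading = font_size + 3  # vertical distance between baselines, in points
    # BT begins a text object; /F1 is a hypothetical font resource name.
    ops = ["BT", f"/F1 {font_size} Tf", f"{leading} TL", "72 770 Td"]
    for line in lines:
        # Escape characters that are special inside a PDF literal string.
        escaped = line.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        ops.append(f"({escaped}) Tj T*")  # show the string, then move to the next line
    ops.append("ET")  # end the text object
    return "\n".join(ops).encode("latin-1")

# Hypothetical page: 50 lines of 80 characters each (4,000 characters total).
sentence = "Each character is stored as a code in a font, not as a picture of the letter. "
page_lines = [(sentence * 2)[:80] for _ in range(50)]

stream = text_page_content_stream(page_lines)
compressed = zlib.compress(stream)
print(f"raw content stream: {len(stream):,} bytes")
print(f"Flate-compressed:   {len(compressed):,} bytes")
```

Real prose compresses less well than this repeated line, but even the raw stream for a full page of text stays well under 10KB.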

A scanned PDF stores each page as a photograph. That same page of text, scanned at 300 DPI in color, is a grid of roughly 2,500 × 3,500 pixels: nearly 9 million individual colored dots, each requiring data to describe its exact color. Even after compression, a single page of scanned text is typically 1-5MB, so ten pages means 10-50MB.

The Math Behind the Size Difference

An A4 page scanned at 300 DPI produces an image of 2,480 × 3,508 pixels. That's approximately 8.7 million pixels. In full color (RGB), each pixel requires 3 bytes of data — one each for red, green, and blue values. Uncompressed, that's roughly 26MB per page.

JPEG compression reduces this dramatically — a typical scanned page compresses to 1-3MB. But even compressed, it's orders of magnitude larger than the few KB needed to store the same content as actual text characters. The content is the same; the storage method is completely different.
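The arithmetic in this section can be checked with a short Python sketch. It assumes A4 is 210 × 297 mm and uses 25.4 mm per inch; the 1-3MB compressed range is the typical figure quoted above, not a measurement.

```python
MM_PER_INCH = 25.4

def scan_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)

width_px, height_px = scan_dimensions(210, 297, 300)  # A4 at 300 DPI
pixels = width_px * height_px
raw_rgb_bytes = pixels * 3  # one byte each for red, green, and blue

print(f"{width_px} x {height_px} = {pixels:,} pixels")
print(f"uncompressed RGB: {raw_rgb_bytes / 1e6:.1f} MB")

# Compression ratios implied by the typical 1-3MB JPEG page size quoted above.
for jpeg_mb in (1, 3):
    print(f"{jpeg_mb} MB JPEG page -> about {raw_rgb_bytes / (jpeg_mb * 1e6):.0f}x smaller than raw")
```

JPEG buys roughly a 9-26x reduction, which is impressive but still leaves each page hundreds of times larger than its text.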

Color vs Grayscale vs Black and White

Not all scanned PDFs are the same size. The color mode chosen at scanning time has a major impact:

Color (RGB): 3 bytes per pixel. The largest files. Necessary for documents with color content; wasteful for black text on white paper.
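Grayscale: 1 byte per pixel. The raw image data is 1/3 the size of a color scan, and the finished files are substantially smaller too. Ideal for typed documents, forms, and anything without meaningful color.
Black and white (1-bit): each pixel is either black or white, which takes 1 bit of data, so the raw data is 1/24 the size of a color scan. Files are extremely small, especially with bilevel compression such as CCITT Group 4 or JBIG2. Best for printed text documents where no gray shading is needed, but harsh on anything with gradients or photographs.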
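The differences between these modes come straight from bits per pixel. A minimal Python sketch, assuming an A4 page at 300 DPI (2,480 × 3,508 pixels) and that each image row is padded to a whole byte, as PDF image data is:

```python
BITS_PER_PIXEL = {"color (RGB)": 24, "grayscale": 8, "black and white": 1}

def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size of an image whose rows are each padded to a whole byte."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # round each row up to a byte
    return bytes_per_row * height_px

WIDTH, HEIGHT = 2480, 3508  # A4 at 300 DPI
color = raw_image_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL["color (RGB)"])
for mode, bpp in BITS_PER_PIXEL.items():
    size = raw_image_bytes(WIDTH, HEIGHT, bpp)
    print(f"{mode:16} {size / 1e6:6.2f} MB raw  ({color / size:.0f}x smaller than color)")
```

These are raw sizes only; JPEG (for color and grayscale) and CCITT Group 4 or JBIG2 (for black and white) shrink all three further.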

For most document scanning — letters, contracts, forms, invoices — grayscale at 150-200 DPI produces files that are readable, compact, and appropriate for email and digital submission.

What to Do About Large Scanned PDFs

If the scan is already done and the file is too large, compression is the fastest fix. PDF compression often shrinks scanned PDFs by 60-80%, typically by downsampling the page images and re-encoding them at a lower JPEG quality. WukongPDF at www.wukongpdf.com handles this: upload the scanned PDF, apply medium or high compression, and download a file small enough to email.
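One common way compression tools shrink a scan is downsampling: storing fewer pixels per page. WukongPDF's internals aren't documented here, so this is a generic, standard-library-only Python sketch of the idea rather than any tool's actual method. It halves the resolution of an 8-bit grayscale image (for example, 300 DPI to 150 DPI) by averaging each 2 × 2 block of pixels; a real tool would also re-encode the result, typically as JPEG.

```python
def downsample_2x(pixels: bytes, width: int, height: int) -> tuple[bytes, int, int]:
    """Halve both dimensions of an 8-bit grayscale image by averaging 2x2 blocks.

    An odd trailing row or column is dropped.
    """
    new_w, new_h = width // 2, height // 2
    out = bytearray(new_w * new_h)
    for y in range(new_h):
        top = 2 * y * width      # index of the first pixel in the upper source row
        bottom = top + width     # index of the first pixel in the lower source row
        for x in range(new_w):
            i = 2 * x
            total = pixels[top + i] + pixels[top + i + 1] + pixels[bottom + i] + pixels[bottom + i + 1]
            out[y * new_w + x] = (total + 2) // 4  # rounded mean of the four pixels
    return bytes(out), new_w, new_h

# A 4x4 test image: each 2x2 block becomes one output pixel.
image = bytes([
    0, 0, 255, 255,
    0, 0, 255, 255,
    100, 100, 50, 50,
    100, 100, 50, 51,
])
small, w, h = downsample_2x(image, 4, 4)
print(list(small), w, h)  # [0, 255, 100, 50] 2 2
```

Each halving of resolution keeps a quarter of the pixels, so going from 300 to 150 DPI removes 75% of the raw image data before any re-encoding happens.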

If you can rescan, adjust the settings first: switch from color to grayscale, reduce DPI from 300 to 150 or 200, and enable any built-in PDF compression in the scanner software. These changes at the source produce a much smaller file without the quality tradeoffs of aggressive post-scan compression.
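The effect of these rescan settings can be estimated before touching the scanner: raw image size scales with the square of the DPI and linearly with bits per pixel. A small Python sketch using exact fractions; the settings are the ones suggested above, and the estimate covers raw image data only, since compression ratios vary with page content.

```python
from fractions import Fraction

def raw_size_factor(old_dpi: int, new_dpi: int, old_bpp: int, new_bpp: int) -> Fraction:
    """Fraction of the original raw image size left after changing DPI and bits per pixel."""
    return Fraction(new_dpi, old_dpi) ** 2 * Fraction(new_bpp, old_bpp)

# Original scan: 300 DPI color (24 bits per pixel).
for new_dpi in (200, 150):
    factor = raw_size_factor(300, new_dpi, 24, 8)  # switch to 8-bit grayscale
    print(f"{new_dpi} DPI grayscale: {factor} of the raw size ({float(1 - factor):.0%} smaller)")
```

Both suggested settings leave well under a fifth of the original raw data: 4/27 at 200 DPI and 1/12 at 150 DPI.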

The OCR Approach: Smaller and More Useful

Running a scanned PDF through OCR doesn't just make it searchable; it can also reduce file size. Some OCR tools replace high-resolution page images with lower-resolution versions after extracting the text: search and copy now come from the text layer, so the image only needs to stay legible on screen. The result is a smaller file that's also searchable and copyable, a better outcome than just compressing the image-only scan.
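OCR output typically keeps the page image and lays an invisible text layer over it. Here is a minimal Python sketch of such a page's content stream, assuming hypothetical resource names `/Im1` (the scan) and `/F1` (the font) and made-up word positions. Text rendering mode 3 (`3 Tr`) is the standard PDF way to place text that can be searched and selected but is not drawn.

```python
def ocr_page_content_stream(page_w: float, page_h: float,
                            words: list[tuple[str, float, float]],
                            font_size: float = 10) -> bytes:
    """Draw the scanned image full-page, then overlay invisible OCR text."""
    ops = [
        "q",                               # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",  # scale the unit image square to the page
        "/Im1 Do",                         # paint the scanned page image (hypothetical name)
        "Q",                               # restore graphics state
        "BT",
        "3 Tr",                            # render mode 3: invisible text
        f"/F1 {font_size} Tf",             # hypothetical font resource
    ]
    for text, x, y in words:
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        # Tm sets an absolute text position, so each word is placed independently.
        ops.append(f"1 0 0 1 {x} {y} Tm ({escaped}) Tj")
    ops.append("ET")
    return "\n".join(ops).encode("latin-1")

# A4 is about 595 x 842 points; the word positions are made up for illustration.
stream = ocr_page_content_stream(595, 842, [("Invoice", 72, 760), ("Total:", 72, 700)])
print(stream.decode("latin-1"))
```

Because what the reader sees is the image, shrinking that image is what makes the file smaller; the text layer itself adds comparatively little.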
