joeldueckdotcom

Paper document archival

For the past decade I’ve scanned any paper documents that might be worth saving into my laptop as PDFs, and then shredded them.

When I push the blue button on the scanner, the document gets scanned, run through optical character recognition (OCR), automatically renamed to YYYY-MM-DD - «Document Type».pdf, and moved to a Paper Records folder.
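The naming scheme is easy to reproduce outside Hazel. As a minimal sketch in Python (the function name `archive_name` is made up for illustration and is not part of my setup), this builds a name in that format:

```python
from datetime import date


def archive_name(doc_date: date, doc_type: str) -> str:
    """Build a 'YYYY-MM-DD - Document Type.pdf' file name."""
    # isoformat() gives YYYY-MM-DD, so names sort chronologically as plain text
    return f"{doc_date.isoformat()} - {doc_type}.pdf"
```

Because the date comes first and is zero-padded, sorting the Paper Records folder by name also sorts it by date.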

Setup

Here’s how I set it up on macOS.

My scanner is a Fujitsu ScanSnap (model S1300i; I’m not sure whether it’s still sold).

I created an Incoming Scans folder on my desktop and configured the ScanSnap software to save all new scanned PDFs directly into that folder.

I also created a Paper Records folder. The PDFs get moved here after being properly renamed and tagged.

I bought Hazel for automated file processing, and PDFpen Pro so I can run automated OCR on scans.

OCR

I created a Hazel rule for the Incoming Scans folder called “OCR all PDFs”.

The AppleScript is below. I’ve stripped it down from other versions you might find online: it doesn’t try to detect whether PDFpen was already open or whether the document has already been OCR’d.

Note: Make sure you configure PDFpen not to prompt you to OCR when opening new documents. Otherwise the prompt will interfere with the AppleScript automation and Hazel will throw errors.

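-- Hazel supplies theFile (the matched PDF) to embedded scripts
tell application "PDFpen"
	open theFile as alias
	tell document 1
		ocr
		-- poll once a second until PDFpen reports that OCR has finished
		repeat while performing ocr
			delay 1
		end repeat
		-- brief pause before saving
		delay 1
		close with saving
	end tell
	quit
end tell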

Auto-rename and -tag common documents

Set up additional Hazel rules for the Incoming Scans folder, one rule for each type of document you want auto-filed.

Example rule:

The «…» in the match pattern refers to Hazel’s “Anything” pattern element, and «Billing Date» refers to a custom date element.
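Hazel builds these patterns in its rule editor, but the idea maps closely onto a regular expression. The Python below is only a rough sketch of that idea and not what Hazel runs internally: the label text "Billing Date", the "March 5, 2021" date format, and the name `find_billing_date` are all assumptions chosen for illustration. The lazy `.*?` plays the role of «…», and the capture group plays the role of the custom date element.

```python
import re
from datetime import datetime

# Literal label, then any run of text (like Hazel's «…»), then a date such as
# "March 5, 2021". Both the label and the date format are assumed examples.
BILLING_DATE = re.compile(r"Billing Date.*?([A-Z][a-z]+ \d{1,2}, \d{4})", re.DOTALL)


def find_billing_date(ocr_text):
    """Return the first billing date found in the OCR text, or None."""
    match = BILLING_DATE.search(ocr_text)
    if match is None:
        return None
    try:
        # %B expects a full English month name under the default locale
        return datetime.strptime(match.group(1), "%B %d, %Y").date()
    except ValueError:
        # Text looked like a date but was not one (for example "Foo 5, 2021")
        return None
```

A rule of this shape only fires when both the label and a parseable date are present, which is what keeps one document type’s rule from renaming another type’s scans.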

Manually renamed documents

I don’t have a rule for every type of document I scan; there are always going to be one-offs, and Hazel can’t inspect and rename those for me automatically. But I have an extra rule (“Autofile manually renamed files”) that files them into Paper Records once I’ve renamed them myself:
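The rule itself lives in Hazel’s interface. Purely to illustrate the logic, here is a hedged Python sketch that assumes the rule keys off the same YYYY-MM-DD - «Document Type».pdf naming scheme; the function names and the folder arguments are hypothetical.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

# Assumed condition: the name already follows "YYYY-MM-DD - Something.pdf"
NAME_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) - .+\.pdf$")


def is_manually_renamed(filename: str) -> bool:
    """True if the file name matches the archive scheme with a real date."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False  # right shape, impossible date (e.g. month 13)
    return True


def autofile(incoming: Path, records: Path) -> list:
    """Move correctly named PDFs from incoming to records; return their names."""
    moved = []
    for pdf in sorted(incoming.glob("*.pdf")):
        if is_manually_renamed(pdf.name):
            shutil.move(str(pdf), str(records / pdf.name))
            moved.append(pdf.name)
    return moved
```

Files that still carry the scanner’s default name fail the check and stay in the incoming folder until they are renamed.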