A print shop publishes one address. Customers email files in. A worker reads the inbox, checks each PDF for the things that ruin a print run, and replies in plain English. No human in the loop, no portal to log into.
Flow
Why these checks?
QR cross-reference
Rasterise each page, decode any QR codes, and compare each encoded URL to the URLs in the card's text. A misspelt domain in the QR (exarnple.com vs example.com) is easy for a human eye to miss but fatal once printed.
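The comparison step can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it assumes the QR payloads have already been decoded from the rasterised pages (for example by a zbar binding), and the names `hosts_in_text` and `qr_mismatches` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Loose matcher for hostnames in running text (URLs and email domains alike).
HOST_RE = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.IGNORECASE)


def normalise_host(host):
    """Lower-case a hostname and drop a leading 'www.'."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def hosts_in_text(text):
    """Collect every hostname that appears in the card's extracted text."""
    return {normalise_host(m.group(1)) for m in HOST_RE.finditer(text)}


def qr_mismatches(qr_payloads, card_text):
    """Return QR payloads whose host does not appear anywhere in the text."""
    known = hosts_in_text(card_text)
    bad = []
    for payload in qr_payloads:
        # Assumption: a payload without a scheme is still meant as a URL.
        url = payload if "://" in payload else "https://" + payload
        host = normalise_host(urlsplit(url).hostname or "")
        if host and host not in known:
            bad.append(payload)
    return bad
```

Note that a card with no printed URL at all would have every QR code flagged under this sketch; whether that should be a warning or a note is a policy choice the source does not settle.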
Placeholders & typos
Regex finds obvious leftovers (Lorem ipsum); a small LLM pass catches fuzzier mistakes — URL/email domain mismatches, near-miss spellings.
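A rough sketch of the regex half of this check. Only the "Lorem ipsum" pattern comes from the description above; the other two patterns and the name `find_placeholders` are illustrative assumptions. The LLM pass is not shown.

```python
import re

# Only the first pattern is named in the text; the others are assumed examples.
PLACEHOLDER_PATTERNS = [
    re.compile(r"lorem\s+ipsum", re.IGNORECASE),
    re.compile(r"\byour\s+(?:name|title|company|text)\s+here\b", re.IGNORECASE),
    re.compile(r"\b(?:xxx+|tbd|todo)\b", re.IGNORECASE),
]


def find_placeholders(text):
    """Return every placeholder-looking snippet found in the extracted text."""
    hits = []
    for pattern in PLACEHOLDER_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```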
Bleed
Compares MediaBox against TrimBox / BleedBox. If the page sits at trim size and ink runs to the edge, the cut leaves a white sliver. Pure-white edges downgrade to a note.
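The box comparison reduces to a little arithmetic. The sketch below assumes boxes arrive as `(x0, y0, x1, y1)` tuples in PDF points and that the shop wants 3 mm of bleed; the 3 mm figure, the `edges_are_white` input, and the function names are assumptions, not values taken from the project.

```python
MM_PER_PT = 25.4 / 72  # one PDF point is 1/72 inch


def bleed_mm(outer_box, trim_box):
    """Smallest margin, in mm, by which outer_box extends past trim_box."""
    ox0, oy0, ox1, oy1 = outer_box
    tx0, ty0, tx1, ty1 = trim_box
    margin_pt = min(tx0 - ox0, ty0 - oy0, ox1 - tx1, oy1 - ty1)
    return max(margin_pt, 0.0) * MM_PER_PT


def classify_bleed(media_box, trim_box=None, bleed_box=None,
                   edges_are_white=False, required_mm=3.0):
    """Return 'ok', 'note', or 'warning' for one page."""
    # A missing TrimBox means the page itself is the trim size.
    trim = trim_box or media_box
    # A missing BleedBox means any bleed must come from the MediaBox.
    outer = bleed_box or media_box
    if bleed_mm(outer, trim) >= required_mm:
        return "ok"
    # No ink at the edge means no white sliver, so downgrade to a note.
    return "note" if edges_are_white else "warning"
```

Deciding `edges_are_white` would need a look at the rasterised page border, which this sketch leaves to the caller.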
Colour space
Press wants CMYK. Any RGB image embedded in the PDF is flagged so the shop can convert before they output plates.
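One way to find RGB images is to read the table printed by `pdfimages -list` from poppler-utils. Whether the project does it this way is an assumption. The sketch parses already-captured output rather than invoking the tool, and `SAMPLE_LISTING` is made-up data laid out like that table.

```python
# Illustrative listing in the column layout of `pdfimages -list` (not real output).
SAMPLE_LISTING = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1050   600  cmyk    4   8  jpeg   no         8  0   300   300  210K 8.5%
   1     1 image     400   400  rgb     3   8  image  no         9  0   150   150 61.2K  13%
"""


def parse_image_listing(listing):
    """Turn the text table into one dict per embedded image."""
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header and the dashed ruler; data rows start with a page number.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "num": int(fields[1]),
            "type": fields[2],
            "color": fields[5],
        })
    return rows


def rgb_images(listing):
    """Return the rows whose colour column reads 'rgb'."""
    return [row for row in parse_image_listing(listing) if row["color"] == "rgb"]
```

Images reported as `icc` may also be RGB underneath; this sketch does not inspect the embedded profile, so it would miss those.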
Image DPI
For each embedded raster, divide the source pixel dimensions by the placed size in inches. Anything under ~250 DPI will look soft on press.
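The arithmetic, as a short sketch. It assumes the placed size is known in PDF points (72 per inch); the function names are hypothetical, and the 250 threshold is the approximate figure quoted above.

```python
def effective_dpi(pixels, placed_pt):
    """Pixels per inch once the image is scaled to its placed size."""
    return pixels / (placed_pt / 72.0)


def is_soft(px_w, px_h, placed_w_pt, placed_h_pt, threshold=250):
    """True if either axis falls under the DPI threshold."""
    worst = min(effective_dpi(px_w, placed_w_pt),
                effective_dpi(px_h, placed_h_pt))
    return worst < threshold
```

For example, a 600 px wide image placed 2 inches (144 pt) wide works out to 300 DPI; stretched to 4 inches it drops to 150 DPI and would be flagged.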
Text size & strokes
Type under 6pt and strokes under 0.25pt sit at the edge of what offset can hold cleanly. Both get flagged.
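The thresholds translate directly into a filter. This sketch assumes font sizes and stroke widths have already been pulled out of the PDF as plain numbers in points; how the project extracts them is not stated, and `flag_fine_detail` is a hypothetical name.

```python
MIN_TEXT_PT = 6.0     # threshold quoted in the text
MIN_STROKE_PT = 0.25  # threshold quoted in the text


def flag_fine_detail(text_sizes_pt, stroke_widths_pt):
    """Return one message per text run or stroke under its threshold."""
    flags = []
    for size in text_sizes_pt:
        if size < MIN_TEXT_PT:
            flags.append(f"text at {size:g}pt is under {MIN_TEXT_PT:g}pt")
    for width in stroke_widths_pt:
        # A PDF line width of 0 means "thinnest renderable", so it is caught too.
        if width < MIN_STROKE_PT:
            flags.append(f"stroke at {width:g}pt is under {MIN_STROKE_PT:g}pt")
    return flags
```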
What is and isn't in the loop
The bot writes and sends the reply itself, with no draft review and no human approval. The relevance filter ahead of the analysis is the safety net: anything that doesn't look like a printing request gets read and ignored, so spam and newsletters never receive an auto-reply.
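The source doesn't say how the relevance filter decides. As one possible cheap pre-check, and purely as an assumption, a worker could refuse to reply to bulk or automated mail and require a PDF attachment before running any analysis. `looks_like_print_request` is a hypothetical name.

```python
from email.message import EmailMessage


def looks_like_print_request(msg):
    """Heuristic gate: skip bulk/automated mail, require a PDF attachment."""
    # Mailing-list headers mark newsletters and list traffic.
    if msg.get("List-Unsubscribe") or msg.get("List-Id"):
        return False
    # Auto-Submitted other than "no" marks machine-generated mail.
    if msg.get("Auto-Submitted", "no").strip().lower() != "no":
        return False
    if msg.get("Precedence", "").strip().lower() in {"bulk", "junk", "list"}:
        return False
    # Without a PDF there is nothing to check.
    return any(part.get_content_type() == "application/pdf"
               for part in msg.iter_attachments())
```

A header check like this only covers the obvious cases; a fuller filter would still have to judge the message body.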
Replies are threaded under the original message, so the conversation looks like a normal back-and-forth with the print shop's address rather than a bot UI.
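Threading in mail clients is driven by the `In-Reply-To` and `References` headers. A minimal sketch of building such a reply with Python's standard `email` package follows; `build_reply` and its parameters are hypothetical, and sending the message is left out.

```python
from email.message import EmailMessage


def build_reply(original, body, from_addr):
    """Build a reply that mail clients will thread under the original."""
    reply = EmailMessage()
    subject = str(original.get("Subject", ""))
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    reply["From"] = from_addr
    # Prefer Reply-To when the sender set one.
    reply["To"] = str(original.get("Reply-To") or original["From"])
    msg_id = original.get("Message-ID")
    if msg_id:
        reply["In-Reply-To"] = str(msg_id)
        # References carries the whole chain: earlier ids plus the parent's id.
        earlier = str(original.get("References", ""))
        reply["References"] = (earlier + " " + str(msg_id)).strip()
    reply.set_content(body)
    return reply
```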
poppler-utils, libzbar0) reproducible.