Homemade Document Imager

By , June 16, 2010 1:12 pm

Recently I needed to digitize a few bankers boxes' worth of old documents. I would usually use a Canon LiDE scanner to scan a few pages, but this project required capturing a few thousand pages, which would take forever with a normal document scanner. After looking around on the internet to see how other people had solved this type of problem, I decided to build my own document imager.

I converted an old overhead projector into a copy stand by removing the projector head and adapting the arm and bracket to take a 1/4-inch camera thread mount. Then I spray-painted a plywood board matte black for the imager table surface. The overhead projector works well because the table and arm are already aligned, so the camera is held centered above the document. The arm adjusts easily with a large knob: it can be racked in to focus on something small like a Post-it note or pulled back to capture a full newspaper page. I made a few registration marks on the table surface to keep documents aligned. You can often find an old overhead projector out on garbage day or at a yard sale.
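A rough way to plan the arm height for different document sizes is the pinhole-camera relation: a lens with horizontal field of view θ covers a width of 2·d·tan(θ/2) at distance d. The sketch below just inverts that to get d. The 55° field of view and the 5% margin are hypothetical values, not measurements from my setup; substitute your own camera's figure for the zoom setting you use, and treat the results as starting points.

```python
import math


def camera_distance(subject_width: float, hfov_degrees: float, margin: float = 1.05) -> float:
    """Lens-to-table distance at which `subject_width` (times a small margin)
    just fills the horizontal field of view.

    Pinhole model: covered_width = 2 * distance * tan(hfov / 2).
    The result is in the same units as `subject_width`.
    """
    if not 0 < hfov_degrees < 180:
        raise ValueError("field of view must be between 0 and 180 degrees")
    half_angle = math.radians(hfov_degrees) / 2
    return subject_width * margin / (2 * math.tan(half_angle))


if __name__ == "__main__":
    HFOV_DEGREES = 55.0  # hypothetical; look up your own lens at your zoom setting
    # Long-side widths in inches; the newspaper figure is approximate.
    subjects = [("Post-it note", 3.0), ("letter page", 11.0), ("newspaper page", 22.0)]
    for name, width_in in subjects:
        d = camera_distance(width_in, HFOV_DEGREES)
        print(f"{name:>15}: about {d:.1f} in above the table")
```

Zooming in changes the field of view as well, so in practice you adjust both the arm and the zoom and let the TV preview confirm the framing.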

Two old desk lamps were mounted next to the table for illumination. I hooked my Canon PowerShot camera up to a TV through its NTSC video output to preview the document imager's output. To keep focus and brightness constant during a capture session, I used the AEL (Auto Exposure Lock) and AFL (Auto Focus Lock) modes. The camera ran off wall power using a Canon ACK-DC10 AC Adapter Kit.

I have been extremely happy with the results: capturing a few hundred pages at a time takes a fraction of the time a normal scanner would need.

Document Imager

Camera Bracket

Here is a sample page that was captured with this homemade document imager and then cropped in Photoshop. It is from an old, circa-1995 Apple Computer sales flyer for the Macintosh Performa 580CD.

Apple Performa 580CD flyer
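The matte black table also makes cropping easy to automate, because the page is the only bright region in the frame. Here is a minimal sketch of that idea (not what I used for the sample above, which was cropped by hand in Photoshop): it finds the bounding box of pixels brighter than a threshold in a grayscale image stored as a list of rows. The threshold of 60 on a 0 to 255 scale and the padding are assumptions to tune against your own lighting.

```python
Gray = list[list[int]]  # rows of 0-255 grayscale values


def page_bbox(gray: Gray, threshold: int = 60) -> tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of pixels brighter than `threshold`.

    `bottom` and `right` are exclusive, so they work directly as slice ends.
    Assumes a light page on a dark, matte background.
    """
    bright_rows = [y for y, row in enumerate(gray) if any(v > threshold for v in row)]
    if not bright_rows:
        raise ValueError("no pixels above threshold; is the page in the frame?")
    width = len(gray[0])
    # Any bright pixel lies in a bright row, so only those rows need scanning.
    bright_cols = [x for x in range(width) if any(gray[y][x] > threshold for y in bright_rows)]
    return bright_rows[0], bright_rows[-1] + 1, bright_cols[0], bright_cols[-1] + 1


def crop_page(gray: Gray, threshold: int = 60, pad: int = 0) -> Gray:
    """Crop to the detected page, keeping up to `pad` pixels of border."""
    top, bottom, left, right = page_bbox(gray, threshold)
    top, left = max(top - pad, 0), max(left - pad, 0)
    bottom, right = min(bottom + pad, len(gray)), min(right + pad, len(gray[0]))
    return [row[left:right] for row in gray[top:bottom]]
```

A single stray bright speck (dust, a glare spot) will widen the box, so a real pipeline would filter noise first, and for full 12-megapixel frames the same logic runs much faster with NumPy.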

After about 2 weeks of work in my spare time, I had captured 8 bankers boxes' worth of documents totaling 13,084 pages of paper and weighing in around 154 pounds, an average of 19.25 pounds per box. Stacked vertically, the boxes would come in at around 9 feet high.
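The box figures are easy to sanity-check. The snippet below recomputes them from the totals; taking "about 2 weeks" as exactly 14 days is my assumption, and the per-page weight is simply total weight divided by page count.

```python
BOXES = 8
PAGES = 13_084
TOTAL_WEIGHT_LB = 154.0
DAYS = 14                 # assumption: "about 2 weeks" taken as 14 days
GRAMS_PER_LB = 453.59237

avg_weight_per_box = TOTAL_WEIGHT_LB / BOXES
avg_pages_per_box = PAGES / BOXES
grams_per_page = TOTAL_WEIGHT_LB * GRAMS_PER_LB / PAGES
pages_per_day = PAGES / DAYS

print(f"average weight per box: {avg_weight_per_box:.2f} lb")  # 19.25 lb
print(f"average pages per box:  {avg_pages_per_box:.1f}")      # 1635.5
print(f"weight per page:        {grams_per_page:.2f} g")       # 5.34 g
print(f"pages per day:          {pages_per_day:.0f}")          # 935
```

For comparison, a standard 75 g/m² letter sheet weighs about 4.5 g.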

I sorted the documents by year and categorized the content with searchable metadata. I used Carbon Copy Cloner on my Mac to clone the data onto 3 hard disks for redundancy, and I keep one drive at an off-site location for safe storage.
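Carbon Copy Cloner handles the cloning itself. As an extra, independent check that the copies really match, one option is to hash every file on each drive and compare the results. The sketch below does that with SHA-256 from Python's standard library; the function names are my own, and skipping macOS `.DS_Store` files is an assumption about what counts as part of the archive.

```python
import hashlib
from pathlib import Path

IGNORED_NAMES = {".DS_Store"}  # macOS Finder metadata, not part of the archive


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name not in IGNORED_NAMES:
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large image files never sit fully in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original: Path, copy: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different in `copy`."""
    src, dst = tree_digests(original), tree_digests(copy)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Running `compare_trees` against each backup volume (the mount paths are up to you) and getting three empty lists back means that copy matches the original file for file.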

After the digitization project was over, I decided to incinerate the old paper in a wood stove because my paper shredder couldn't handle that volume. It generated an incredible amount of heat: my rough estimate is that burning the paper put out around 976,937 BTUs. While incinerating the paper, I put a screen mesh over the stove pipe to reduce the risk of fly ash.
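The BTU figure is just mass times heating value. Working backwards from the 154 pounds of paper, the estimate implies a heating value of about 6,344 BTU per pound; the snippet below shows that arithmetic and the metric equivalent. Note that it gives the heat released by combustion, not the heat delivered to the room, which depends on the stove's efficiency.

```python
GRAMS_PER_LB = 453.59237
JOULES_PER_BTU = 1055.05585  # International Table BTU


def heat_released_btu(mass_lb: float, heating_value_btu_per_lb: float) -> float:
    """Heat of combustion = mass x heating value (stove efficiency ignored)."""
    return mass_lb * heating_value_btu_per_lb


paper_lb = 154.0
quoted_btu = 976_937.0

# Back out the heating value that the quoted total implies.
implied_btu_per_lb = quoted_btu / paper_lb
# BTU/lb * J/BTU / (g/lb) = J/g = kJ/kg; divide by 1000 for MJ/kg.
implied_mj_per_kg = implied_btu_per_lb * JOULES_PER_BTU / GRAMS_PER_LB / 1000

print(f"implied heating value: {implied_btu_per_lb:,.0f} BTU/lb "
      f"({implied_mj_per_kg:.1f} MJ/kg)")
# → implied heating value: 6,344 BTU/lb (14.8 MJ/kg)
```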


11 Responses to “Homemade Document Imager”

  1. Tobias says:

    Was this done using tethered shooting? Using PTP protocol magic in CHDK? I wasn’t aware that there was a build for the IXUS 100.

  2. Andrew says:

    Actually, I used the live video output cable to preview the alignment of the documents on a TV as I worked. I also locked the focus and exposure on the first page to keep them consistent. Then I would transfer the images to my desktop once I had captured a few folders' worth of documents.

    Andrew

  3. Mahmoud says:

    Had you looked at something like the ScanSnap?

  4. akamarkman says:

    Ahhh! Digitization does not equal preservation! This cannot be emphasized enough. At the very least please tell me you are storing one of those hard drives off-site.

    BTW, what are you doing in terms of organization? Full-text search or something else?

  5. Andrew says:

    The data is cloned onto three hard drives using the Mac program Carbon Copy Cloner. One hard drive is stored off site. The data is sorted and categorized by year and content and has searchable metadata.

  6. Greg says:

    How large are the images? Do you have to do any post-processing (cropping, etc.)? I've thought about using this technique before but wondered if there is any distortion (parallax) of the scanned image. Any problems with that?

  7. Andrew says:

    The images are about 2 to 3 MB each. With my camera, a Canon PowerShot SD780 IS, the images are 4000x3000 pixels. This works out to about 363 dpi x 352 dpi for an 8.5x11 page. The nice thing about using a modded overhead projector is that you can raise and lower the arm for framing the image, so I can easily capture anything from a Post-it note to a newspaper page. I don't usually shoot at the widest zoom setting on my camera, so lens distortion isn't too bad. If lens distortion is an issue for you, it can be removed in post-processing using Photoshop or ImageMagick.
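The 363 x 352 dpi figures are the pixel counts divided by the page's dimensions along each axis. Because the camera's pixels are square and its 4:3 frame is a little wider than a letter page's 11:8.5 proportions, the whole page fits only when the short side fills the frame, so the effective resolution is about 352 to 353 dpi in both directions (a bit less once you leave a margin). A small sketch of that arithmetic, with function names of my own choosing:

```python
def per_axis_dpi(sensor_px: tuple[int, int], page_in: tuple[float, float]) -> tuple[float, float]:
    """Pixels per inch along the long and short axes if each axis were filled exactly."""
    long_px, short_px = sorted(sensor_px, reverse=True)
    long_in, short_in = sorted(page_in, reverse=True)
    return long_px / long_in, short_px / short_in


def fitted_page_dpi(sensor_px: tuple[int, int], page_in: tuple[float, float]) -> float:
    """Uniform resolution with the whole page in frame (square pixels assumed).

    The tighter axis limits how far you can zoom in, so it sets the resolution.
    """
    return min(per_axis_dpi(sensor_px, page_in))


if __name__ == "__main__":
    sensor = (4000, 3000)   # SD780 IS full-resolution frame
    letter = (8.5, 11.0)    # inches
    print(per_axis_dpi(sensor, letter))                   # about (363.6, 352.9)
    print(f"{fitted_page_dpi(sensor, letter):.0f} dpi")   # 353 dpi
```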

  8. Tom Halligan says:

    This is VERY cool.

  9. Just Vecht says:

    Good to see this post. Even with a simple handheld camera useful results are possible with some care.

    I have built a simple bookscanner using the same principles. It takes about 30 minutes to scan a 200-page book and about the same time in post-processing.

    Have a look at diybookscanner.org!

    Now my girl goes to university with just an ereader. No big load of books anymore.

  10. Len says:

    How do the pictures of the pages work with ABBYY?
    Does it do the recognition OK?

    I use that Plustek book scanner and it's like racing to the surface
    for air in a pool of molasses. Does a perfect job but is slow.

    Thanks for a great article and idea.

  11. Andrew says:

    I don't have a copy of the ABBYY OCR program to test the images with. Just note that if your OCR program doesn't have the accuracy you want when processing images shot with your digital camera, you can photograph a test grid pattern and use it as the basis for calculating and correcting lens distortion. If you have a series of pictures taken in a batch, you can also apply perspective correction for any misalignment. Also, consistent, even lighting is important when using a camera to image a document.
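Here is a minimal sketch of the test-grid idea. It assumes the simplest one-coefficient radial distortion model centered on the image, and that you already have matched pairs of ideal and detected grid-corner positions (the ideal ones come from the grid's known spacing, with the camera square to the table). Full calibration tools such as OpenCV's `calibrateCamera` fit more coefficients plus the lens center; this only shows the principle. With this model a negative `k1` means barrel distortion.

```python
Point = tuple[float, float]


def normalize(px: float, py: float, width: int, height: int) -> Point:
    """Pixel -> coordinates centered on the image, scaled by the half-diagonal."""
    half_diag = ((width / 2) ** 2 + (height / 2) ** 2) ** 0.5
    return (px - width / 2) / half_diag, (py - height / 2) / half_diag


def distort(p: Point, k1: float) -> Point:
    """One-coefficient radial model: p_d = p_u * (1 + k1 * r_u^2)."""
    x, y = p
    s = 1 + k1 * (x * x + y * y)
    return x * s, y * s


def fit_k1(ideal: list[Point], observed: list[Point]) -> float:
    """Least-squares k1 from matched ideal/observed grid points (normalized coords)."""
    num = den = 0.0
    for (xu, yu), (xd, yd) in zip(ideal, observed):
        r2 = xu * xu + yu * yu
        for u, d in ((xu, xd), (yu, yd)):
            a = u * r2  # the model says d - u = k1 * a
            num += a * (d - u)
            den += a * a
    if den == 0:
        raise ValueError("need grid points away from the image center")
    return num / den


def undistort(p: Point, k1: float, iterations: int = 20) -> Point:
    """Invert the model by fixed-point iteration (fine for mild distortion)."""
    xd, yd = p
    xu, yu = xd, yd
    for _ in range(iterations):
        s = 1 + k1 * (xu * xu + yu * yu)
        xu, yu = xd / s, yd / s
    return xu, yu
```

You would fit `k1` once from the grid shot and reuse it for every page taken at the same zoom and arm height. Correcting a whole image works in the other direction: for each output pixel, map it through `distort()` to find where to sample the original photo and interpolate; that step is left out here.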