<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ian M Hart &#187; digital camera</title>
	<atom:link href="http://www.ianmhart.net/tag/digital-camera/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ianmhart.net</link>
	<description>Postdoc historian</description>
	<lastBuildDate>Fri, 19 Mar 2010 18:32:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>How to quickly convert printed material into indexable PDF files with a digital camera</title>
		<link>http://www.ianmhart.net/2008/01/17/converting-printed-pages-into-pdfs-with-a-digital-camera/</link>
		<comments>http://www.ianmhart.net/2008/01/17/converting-printed-pages-into-pdfs-with-a-digital-camera/#comments</comments>
		<pubDate>Thu, 17 Jan 2008 23:02:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[General academic interest]]></category>
		<category><![CDATA[capture]]></category>
		<category><![CDATA[digital camera]]></category>
		<category><![CDATA[document processing]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[recognise text]]></category>
		<category><![CDATA[text recognition]]></category>

		<guid isPermaLink="false">http://www.ianmhart.net/2008/01/17/converting-printed-pages-into-pdfs-with-a-digital-camera/</guid>
		<description><![CDATA[I describe below a method of quickly capturing digital images of printed material straight onto a laptop, which can then be easily converted to recognised, indexable text. This should prove useful to anyone who has to wade through and record vast amounts of printed material. As part of my research I have had to go [...]]]></description>
			<content:encoded><![CDATA[<p><em>I describe below a method of quickly capturing digital images of printed material</em><img src="http://www.ianmhart.net/wp-content/uploads/2008/01/for_ocr_article.jpg" alt="camera laptop and tripod" align="right" /><em> straight onto a laptop, which can then be easily converted to recognised, indexable text. This should prove useful to anyone who has to wade through and record vast amounts of printed material.</em></p>
<p>As part of my research I have had to go through hundreds of pages of archival material a day &#8211; material which is often very time-consuming to photocopy because of the bureaucratic procedures one has to navigate at restricted-access libraries. On a trip to the LBJ Library in Austin, Texas I was introduced to the idea of using a digital camera to take pictures of pages, which could be transferred to a computer later as images. I realised that I could use an optical character recognition (OCR) program to recognise the text, making it fully searchable and allowing me to copy and paste quotes from memos etc. straight into my work. The tedious part was transferring the images from the camera to the laptop while keeping track of the correct folder structure. I knew it would be much easier if I could save the images straight from the camera to the laptop, and that is what I eventually found a way to do&#8230;</p>
<p>For this setup you will need:</p>
<ol>
<li>A portable tripod small enough to take into libraries with you. I used a &#8220;Hama Mini Tripod Traveller Compact&#8221; I got from Amazon for £13.</li>
<li>A Canon PowerShot digital camera which is compatible with the PSRemote camera-controlling software (see below). I bought an old Canon PowerShot A85, again through Amazon.co.uk, for £50. It takes pictures at a maximum resolution of 4MP, which is perfectly adequate. Most recent models are compatible with the PSRemote software, but it has to be a Canon digital camera.</li>
<li>PSRemote software, available here: <a href="http://www.breezesys.com/purchase_psr.htm" title="http://www.breezesys.com/purchase_psr.htm">http://www.breezesys.com/purchase_psr.htm</a> for $49. The list of compatible Canon cameras is available at the above link.</li>
<li>A laptop computer running some version of Microsoft Windows.</li>
<li>Some kind of OCR program on your computer. The best is ABBYY FineReader, now in version 9 (£90 for the downloadable edition), but there are free programs available which, in my experience, perform almost as well.</li>
</ol>
<p><strong>Taking the photos</strong></p>
<p>Connect the camera to your laptop using the USB cable that should have come with the Canon camera. Switch the camera to the play setting. Once you have installed the PSRemote software, start it up. Click the menu item to &#8220;register for camera events&#8221; in PSRemote, then turn on the camera. PSRemote should recognise your camera; you can now turn on the viewfinder feature in PSRemote and see the viewfinder on the computer screen. Press the F8 key to take photos straight from your laptop. Position the camera on the tripod with the documents to be photographed below it. Before you start taking pictures, adjust PSRemote&#8217;s options to take them using the &#8220;tungsten&#8221; filter (I have found this works best). Finally, adjust the PSRemote preferences to choose the folder on your laptop in which to save the photos. You can now begin taking photos of the documents. It takes about 5 seconds to download each picture from the camera to the computer, which is normally just enough time to turn the page before the next photo.</p>
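<p>To get a feel for how long a session at the library will take, the figure of about 5 seconds per picture can be turned into a rough estimate. The following is only a minimal sketch in Python: the function name <code>capture_minutes</code> and the page counts are illustrative assumptions, and the real pace will vary with the camera and with how quickly you turn the pages.</p>

```python
def capture_minutes(pages, seconds_per_page=5.0):
    """Rough capture time in minutes.

    Assumes every picture takes the same time to download
    (about 5 seconds each, as described above).
    """
    return pages * seconds_per_page / 60.0


# Illustrative page counts only; substitute your own.
for pages in (100, 300, 500):
    print(f"{pages} pages: about {capture_minutes(pages):.0f} minutes")
```

<p>At that rate, 300 pages come to roughly 25 minutes of photographing, not counting the time spent setting up the tripod and camera.</p>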
<p><strong>Recognising the text</strong></p>
<p>You should now have a lot of JPEG files in a folder on your laptop. Open ABBYY FineReader (or another OCR program) and drag and drop the JPEG files into it; this will add them to the &#8216;batch&#8217; to be processed. Select all, then read (text recognise) them. This usually takes about 10 seconds per image. When the OCR program has recognised the text, choose the format in which to save it. ABBYY FineReader has an export to Microsoft Word feature, but arguably the best method is to save all the images and all the recognised text in a single PDF file. In ABBYY FineReader you can use the Save Wizard to select Acrobat (PDF) format, then in the further formatting options choose to keep the original image and place the recognised text under the images (see screenshot below). Here is a sample of a resulting PDF file saved using the method described here: <a href="http://www.ianmhart.net/images/recognised_text_sample.pdf" title="a newspaper clipping from July 1970" target="_blank">newspaper clipping from July 1970</a>.</p>
<p><a href="http://www.ianmhart.net/wp-content/uploads/2008/01/pdfsettings.jpg" title="PDF export settings in ABBYY FineReader 8"><img src="http://www.ianmhart.net/wp-content/uploads/2008/01/pdfsettings.thumbnail.jpg" alt="PDF export settings in ABBYY FineReader 8" /></a></p>
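<p>If you would rather use one of the free OCR programs, the same batch step can be scripted. The sketch below is not the ABBYY FineReader workflow described above. It assumes the open-source Tesseract OCR engine is installed and available as the <code>tesseract</code> command, whose <code>pdf</code> output option writes a PDF with the recognised text placed under the page image. The function names and the folder paths are hypothetical, and this version produces one PDF per photo rather than a single combined file.</p>

```python
import subprocess
from pathlib import Path

# File extensions treated as camera JPEGs (cameras often use upper case).
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def build_ocr_commands(image_dir, output_dir):
    """Build one Tesseract command per JPEG, in filename order.

    Each command has the form: tesseract INPUT OUTPUT_BASE pdf
    and would write OUTPUT_BASE.pdf when run.
    """
    images = sorted(
        path for path in Path(image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in JPEG_SUFFIXES
    )
    commands = []
    for image in images:
        output_base = Path(output_dir) / image.stem
        commands.append(["tesseract", str(image), str(output_base), "pdf"])
    return commands


def run_ocr(commands):
    """Run each command in turn; stop if Tesseract reports an error."""
    for command in commands:
        subprocess.run(command, check=True)


# Example usage (hypothetical folders; the output folder must already exist):
#   run_ocr(build_ocr_commands("C:/archive/photos", "C:/archive/pdf"))
```

<p>Building the list of commands separately from running them makes it easy to check which files will be processed, and in what order, before committing to a long OCR run.</p>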

]]></content:encoded>
			<wfw:commentRss>http://www.ianmhart.net/2008/01/17/converting-printed-pages-into-pdfs-with-a-digital-camera/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

