PDF scans

It is becoming more common for me to scan a document to PDF and upload it to my server. In some cases I have opened the PDF in Google Docs so that I can extract its text to incorporate into a webpage. In other cases I just want to have online access to the PDF itself.

While I have no objection to others accessing these files and doing whatever they want with them, I have no means of tracking these visits apart from putting links to them on a separate page. Even then I cannot easily track whether the links have been followed. I know that there is a way of doing this with the Google tools, but that possibly covers only visits that arrive via Google and not those from Bing and other search engines, or referrals from other websites.
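One possible way to count these visits without the Google tools is to read the web server's own access log, which records every request regardless of which search engine or website it came from. The sketch below is only an illustration under assumptions that are not stated on this page: that the server writes its log in the common "combined" format used by Apache and nginx, and that the PDFs are requested with a path ending in .pdf. The function name count_pdf_hits is made up for the example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_pdf_hits(lines):
    """Return a Counter keyed by (pdf_path, referring_host)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match["path"].split("?", 1)[0]  # drop any query string
        if not path.lower().endswith(".pdf"):
            continue  # only PDF requests are of interest here
        if match["status"] not in ("200", "206"):
            continue  # count only successful or partial downloads
        referer = match["referer"]
        if referer in ("", "-"):
            source = "(direct)"  # no referrer was sent by the browser
        else:
            source = urlparse(referer).netloc or "(unknown)"
        hits[(path, source)] += 1
    return hits
```

Passing the lines of a log file to count_pdf_hits gives a tally per PDF and per referring site, so visits from Bing or from another website appear alongside those from Google. Visits where the browser sends no referrer are grouped as "(direct)", and the log will also include search engine crawlers unless they are filtered out by user agent.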

Scanning Newspaper Articles

This is not easy! Because of the way that newspapers are formatted, with pictures interspersed with text in columns, the scanning involves a lot of manual intervention.

I was thinking of subscribing to the British Newspaper Archive, but I was a little put off as the sample texts that they give on their website often did not make sense. At first I thought this was because they were trying to get you to subscribe so that you could see the "un-corrupted" article. Now I am not so sure, as it appears that it is still up to the subscriber to correct the scans and then upload a corrected version.

How to open a PDF in Google Docs

For copyright reasons this is not something that Google promotes, as it lets you get around the fact that someone has saved a document as a PDF so that you cannot copy and paste from it.

OCR - Optical Character Recognition

The British Newspaper Archive website says:

...... Although OCR makes it possible to search large quantities of full text information it is not 100% accurate. The accuracy depends on a variety of factors: condition of the original newspaper or microfilm, quality of the paper, size and style of the font and column layouts, for example.

When viewing an image, the OCR text can be viewed via the left navigation column 'All Articles' option. You can select an individual article (either from the image in the Viewer or from the 'All Articles' dropdown). Then select the 'Edit Article Text' option in the left navigation column. How to correct the text - This option can be accessed by simply clicking the list of sections displayed and applying your own corrections. By correcting the text, you will be adding to the quality of the data that can be searched by others.

There is no mention of copying the article to incorporate into your own document, but I presume that you can. In any case I would prefer to do my editing in my own environment, not theirs!
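If the OCR text is copied out and corrected locally, it is possible to get a rough idea of how accurate the original OCR was by comparing it with the corrected version. The sketch below uses Python's standard difflib module for this. The function names ocr_similarity and changed_spans are invented for the example, and the ratio it reports is a general text similarity measure, not the accuracy figure that the British Newspaper Archive or any OCR product uses.

```python
from difflib import SequenceMatcher


def _normalise(text):
    """Collapse runs of whitespace so column line breaks do not count as errors."""
    return " ".join(text.split())


def ocr_similarity(ocr_text, corrected_text):
    """Return a similarity ratio between 0 and 1 (1 means identical text)."""
    a = _normalise(ocr_text)
    b = _normalise(corrected_text)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def changed_spans(ocr_text, corrected_text):
    """List (ocr_fragment, corrected_fragment) pairs where the texts differ."""
    a = _normalise(ocr_text)
    b = _normalise(corrected_text)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling changed_spans on the raw and corrected text lists each fragment that was altered, which is a quick way to see which characters the OCR most often gets wrong in a particular newspaper.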

I see on closer inspection that the BNA (British Newspaper Archive) is part of Find My Past, i.e. a commercial enterprise.

Links/Scans

PDF files are in the images/ directory

More research performed by John Marlow

Link to PDF Disabled

If you require a copy please email tempusfugit.me.uk

External Links - references

  • New Organ - British Newspaper Archive - a search for Belchamp Walter

Site design by Tempusfugit Web Design