Do you have a PDF document or image that you want to convert to text? Someone recently mailed me a document that I needed to edit and send back with corrections. The person could not find a digital copy, so I was tasked with converting all that text into digital format.
I didn’t intend to spend hours typing everything back in, so I ended up taking a nice high-quality photo of the document and then running it through a bunch of online OCR services to see which one gave me the best results.
In this article, I’ll go over a few of my favorite free OCR sites. It’s worth noting that most of these sites provide a basic free service and then have paid options if you need additional features like large images, multi-page PDF documents, different input languages, etc.
It’s also good to know ahead of time that most of these services won’t be able to match the formatting of your original document. They are mainly designed to extract text, and that’s it. If you need everything to be in a specific layout or format, you have to do it manually once you get all the text from OCR.
In addition, you will get the best text extraction results from documents with a resolution of 200 to 400 DPI. If you have a low-resolution image, the results will not be as good.
Finally, there were a lot of sites that I tested that just didn’t work. If you Google “free online OCR”, you will see a bunch of sites, but several of the sites in the top ten results never even finished the conversion. Some timed out, others returned errors, and some just got stuck on the conversion page, so I haven’t bothered to mention those sites.
For each site, I tested two documents to see how good the result would be. For my tests, I just used my iPhone 5S to take a photo of both documents and then uploaded them directly to the websites for conversion.
If you want to see what the images I used for my test looked like, I’ve attached them here: Test1 and Test2. Please note that these are not the full-resolution versions of the images taken by the phone. When uploading to the sites, I used the full-resolution images.
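As a rough way to check whether a phone photo lands in that 200 to 400 DPI range, you can divide the pixel dimensions by the physical page size. The sketch below is only an estimate under stated assumptions: the function names are made up for illustration, the photo is assumed to be an 8-megapixel portrait shot (2448 x 3264 pixels), and the page is assumed to be US Letter (8.5 x 11 inches) filling the whole frame.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Estimate the DPI of a photographed page.

    Assumes the page fills the entire frame, so the real value
    will be lower if there is a border around the document.
    """
    # The smaller of the two ratios is the limiting resolution.
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def in_recommended_range(dpi, low=200, high=400):
    """Check the estimate against the 200 to 400 DPI guideline."""
    return low <= dpi <= high


# Assumed values: 8 MP portrait photo of a US Letter page.
dpi = effective_dpi(2448, 3264, 8.5, 11.0)
print(round(dpi), in_recommended_range(dpi))
```

If the estimate comes out well under 200, moving the camera closer so the page fills more of the frame is usually the simplest fix.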
OnlineOCR
OnlineOCR.net is a clean and simple site that performed very well in my test. The main thing I love about it is that it doesn’t have a lot of ads, as is usually the case with such niche service sites.
First, select your file and wait for the upload to finish. The maximum upload size for this site is 100 MB. If you sign up for a free account, you get a few extra features like a larger upload size, multi-page PDFs, different input languages, more conversions per hour, etc.
Next, select the input language and the output format. You can choose Word, Excel, or plain text. Click the “Convert” button and you will see the text displayed in the box below along with a download link.
If you only want text, just copy and paste it from the box. However, I suggest you download the Word document because it does a surprisingly great job of preserving the layout of the original document.
For example, when I opened the Word document for the second test, I was surprised to find that it contained a three-column table, just like in the image.
Of all the sites, this one was by far the best. If you need a lot of conversions, it’s worth signing up.
For completeness, I’ll also link to the output files generated by each service so you can see the results for yourself. Here are the OnlineOCR results: Test1 Doc and Test2 Doc.
Note that when you open these Word documents on your computer, you will receive a message in Word that they are from the Internet and that editing is disabled. This is fine because Word doesn’t trust documents from the Internet, and you really don’t need to allow editing if you just want to view the document.
i2OCR
Another site that gave good results is i2OCR. The process is very similar: select a language, select a file, and click Extract Text.
Here you will have to wait a minute or two because this site takes a little longer. Also, in step 2, make sure your image is displayed right-side up in the preview, otherwise you’ll end up with a bunch of gibberish in the output. For some reason, images from my iPhone were showing in portrait mode on my computer but in landscape mode when I uploaded them to this site.
I had to manually open the image in a photo editing application, rotate it 90 degrees, rotate it back to portrait, and then save it again. Once the conversion is complete, scroll down and you will see a text preview along with a download button.
This site did a pretty good job with the first test, but failed on the second test with its column layout. Here are the i2OCR results: Test1 Doc and Test2 Doc.
FreeOCR
Free-OCR.com converts your images to plain text. It has no way to export to Word format. Select the file, select the language and click “Start”.
The site is fast and you get results pretty quickly. Just click on the link to download the text file to your computer.
As with NewOCR, mentioned below, this site capitalizes every letter T in the document. I have no idea why it does that, but for some strange reason both this site and NewOCR did it. It is not difficult to fix, but it is a tedious and unnecessary step.
Here are the FreeOCR results: Test1 Doc and Test2 Doc.
ABBYY FineReader Online
To use FineReader Online, you need to register for an account, which gives you a 15-day free trial to OCR up to 10 pages. If you only need to do a one-time OCR for a couple of pages, you can use this service. Make sure you click the verification link in the confirmation email after registering.
Click “Recognize” at the top and then click “Upload” to select the file. Select your language and output format, then click “Recognize” at the bottom. This site has a clean interface and no ads.
In my tests, this site was able to get the text from the first test document, but when I opened the Word document it was absolutely huge, so I ended up doing it again and choosing plain text as the output format.
In the second test with columns, the Word document was empty and I couldn’t even find the text. I’m not sure what happened there, but it looks like it can’t handle anything other than simple paragraphs. Here are the FineReader results: Test1 Doc and Test2 Doc.
NewOCR
The next site, NewOCR.com, was OK, but not as good as the first. For one thing, there are ads, but luckily not a ton. To start, select the file and then click the Preview button.
Then you can rotate the image and adjust the area where you want to scan the text. This is very similar to how the scanning process works on a computer with a scanner attached.
If your document has multiple columns, you can click the Analyze Page Layout button and it will try to break the text into columns. Press the OCR button, wait a few seconds for it to complete, then scroll down when the page refreshes.
In the first test, it got all the text correct, but for some reason every T in the document was capitalized! I have no idea why it did that, but it did. In the second test with page layout analysis enabled, it got most of the text, but the layout was completely off.
Here are the NewOCR results: Test1 Doc and Test2 Doc.
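Cleaning up the stray capital T’s by hand is tedious, so a short script can handle the obvious cases. This is only a heuristic sketch, and the function name is made up for illustration: it lowercases a capital T (or a run of them) that directly follows a lowercase letter, which covers T’s in the middle or at the end of a word. It deliberately leaves a T at the start of a word alone, because there is no reliable way to tell a wrongly capitalized “To” from a legitimate “The” at the start of a sentence.

```python
import re


def lowercase_stray_t(text: str) -> str:
    """Lowercase capital T's that directly follow a lowercase letter.

    Word-initial T's and all-caps acronyms such as HTML are left
    untouched, so some manual cleanup may still be needed.
    """
    return re.sub(r"(?<=[a-z])T+", lambda m: m.group(0).lower(), text)


sample = "I wenT To The sTore wiTh my HTML noTes and a leTTer."
print(lowercase_stray_t(sample))
```

The remaining word-initial T’s are easier to catch with a spell checker or a quick find-and-replace for common words.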
Conclusion
As you can see, in most cases the free services unfortunately do not give great results. The first site mentioned is by far the best because it not only recognized all the text perfectly, but also retained the formatting of the original document.
However, if you just want text, most of the websites listed above can do it for you. If you have any questions, do not hesitate to comment. Enjoy!