From: Yawar Amin <yawar.amin@gmail.com>
Cc: linux-newbie@vger.kernel.org
Subject: Re: Converting .gif to .txt
Date: Thu, 09 Feb 2006 20:14:07 +0600
Message-ID: <43EB4E2F.3070501@gmail.com>
In-Reply-To: <380-2200624964217314@M2W030.mail2web.com>

heisspf@skyinet.net wrote:
> Hi,
> 
> How can one convert a text document which has been scanned and therefore
> has become a
> .gif file back to a .txt file in order that one can copy and paste text
> from it.

This is a job for OCR (optical character recognition) software, as
already mentioned in this thread.

> I was told it can be done with some software in windows. Is there such
> software in
> Linux?

Yes, but from what I found it isn't up to the mark. I did some
research on Freshmeat some time ago. If this is a one-time thing, I
recommend getting the OCR done at a shop.

> I was able to convert a .gif file in question to .pdf and open it with
> acroread. Acroread has an option to convert to text, however, trying to do
> it I get an empty file.

Basically, the GIF-to-PDF conversion embeds the image, pixel by pixel,
in a PDF file. The conversion program doesn't understand that the
image shows English (?) text, and neither does Acroread. Acroread's
convert-to-text feature usually works because most PDF files contain
actual text rather than pictures of text. You can verify this by
opening a sampling of ordinary PDF files with `less'. Then try opening
your scanned image's PDF file with `less'. You won't see the scanned
text, but you will see a truckload of gibberish data -- more or less
the pixel-by-pixel description of the image.
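
If you'd rather not eyeball the output of `less', the same check can
be roughly automated. What follows is only a sketch in Python, not a
real PDF parser: the function name classify_pdf is made up for this
example, and it assumes that the PDF's dictionary keys (/Type /Font,
/Subtype /Image) are stored uncompressed, which is not true of every
file. Treat its answer as a hint, nothing more.

```python
import re
import sys


def classify_pdf(data: bytes) -> str:
    """Guess whether a PDF carries real text or only a scanned image.

    Heuristic only: it searches the raw bytes for dictionary keys, so a
    PDF that packs its objects into compressed streams can be misjudged.
    """
    if not data.startswith(b"%PDF-"):
        return "not a PDF"
    # Font objects are needed to draw real text on a page.
    has_font = re.search(rb"/Type\s*/Font\b", data) is not None
    # Image XObjects are how a scanned page gets embedded.
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None
    if has_font:
        return "has fonts, probably contains extractable text"
    if has_image:
        return "images only, probably a scan that needs OCR"
    return "unclear"


if __name__ == "__main__":
    # Usage: python classify_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", classify_pdf(f.read()))
```

A PDF made from a scanned GIF should land in the "images only"
category, which is why asking Acroread to convert it to text gives
you an empty file.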

-- 
Yawar
Malaysia +60 (12) 918 6642
Bangladesh +880 (174) 614 754 or +880 (2) 882 1848 or +880 (175) 003
706 or +880 (189) 250 170
OpenPGP key ID 8B6B0839
Fingerprint EFB0 5050 6F27 AFC2 42B2 3B40 FD9C B344 8B6B 0839


Thread overview: 6+ messages
2006-02-09  6:42 Converting .gif to .txt heisspf
2006-02-09  8:08 ` Andrew
2006-02-09 13:21   ` James Miller
2006-02-10  6:46   ` Peter
2006-02-09 14:08 ` joy merwin monteiro
2006-02-09 14:14 ` Yawar Amin [this message]