Re: Linux kernel - Libata bad block error handling to user mode program

public inbox for linux-kernel@vger.kernel.org

From: Mike Hayward <hayward@loup.net>
To: foosaa@gmail.com
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-ide@vger.kernel.org, jens.axboe@oracle.com,
	linux-mm@kvack.org
Subject: Re: Linux kernel - Libata bad block error handling to user mode program
Date: Fri, 5 Mar 2010 09:31:37 -0700
Message-ID: <201003051631.o25GVbsD010752@alien.loup.net>
In-Reply-To: <f875e2fe1003041823o507ecb36qfd7af7d27de7683d@mail.gmail.com> (message from s ponnusa on Thu, 4 Mar 2010 21:23:09 -0500)

 > The data written through linux cannot be read back by any other means.
 > Does that prove any data corruption? I wrote a signature on to a bad
 > drive. (With all the before mentioned permutation and combinations).
 > The program returned 0 (zero) errors and said the data was
 > successfully written to all the sectors of the drive and it had taken
 > 5 hrs (The sample size of the drive is 20 GB). And I tried to verify
 > it using another program on linux. It produced read errors across a
 > couple of million sectors after almost 13 hours of grinding the
 > hdd.

It is normal, although improbable, for what we call a 'stable'
storage device to lose data for any number of reasons.  The drive
detects the loss when a sector's checksum doesn't match and reports
it by returning an I/O error.  An I/O error is not data corruption;
it is what we would call data loss or unavailability.

 > I can understand the slow remapping process during the write
 > operations. But what if the drive has used up all the available
 > sectors for mapping and is slowly dying. The SMART data displays
 > thousands of seek, read, crc errors and still linux does not notify
 > the program which has asked it to write some data. ????

SMART data is not really all that standardized, and it is quite normal
to see the drive correcting errors with rereads, reseeks, ECC, etc.,
so determining drive health really is manufacturer and model specific.

If the drive remaps a block, whether after its own retry or after the
operating system retries, it should of course return a successful
write even if that takes a minute or two.  Once it is out of spare
blocks to remap to, it must return an I/O error or time out.
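
From the application's side, that contract can be checked with
something like the sketch below.  It assumes buffered I/O on an
ordinary file descriptor, in which case a device error may only
surface at fsync() rather than at write().  The helper name
write_and_sync is made up for illustration.

```python
import errno
import os


def write_and_sync(fd, data):
    """Write all of data to fd and flush it; return None or an errno name.

    Hypothetical helper: with buffered I/O the kernel may accept the write
    into the page cache and report a device error only at fsync() time.
    """
    view = memoryview(data)
    try:
        while view:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        os.fsync(fd)  # pushes the data to the device; EIO can show up here
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

A return value of None means every layer reported success; a name
such as "EIO" means the failure did reach the program.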

All that being said, if a drive returns success after writing, and you
read different data than you "successfully wrote", as opposed to an
error, this is data corruption.  My number 1 rule of storage is "thou
shalt not silently corrupt data".  With a sufficiently strong
checksum, silent corruption should be incredibly unlikely.  If you
are detecting it this frequently, clearly something is not working as
intended.  This means the storage system is not sufficiently "stable"
to rely upon its own checksums and return codes for correctness.
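
To keep the two cases apart when verifying, a read-back check might
look like this minimal sketch.  The function name and the three labels
are my own, not anything the kernel defines.  The posix_fadvise() call
is only a hint to drop cached pages, so the read may still be served
from the page cache; O_DIRECT would be the stricter choice.

```python
import errno
import os


def classify_readback(path, offset, expected):
    """Compare on-disk bytes with what was written earlier.

    Returns "ok", "io_error" (detected loss: the read failed with EIO) or
    "corruption" (the read succeeded but returned different bytes, which
    in this sketch also covers a short read).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Advisory only: ask the kernel to drop cached pages so the
            # read is more likely to hit the device.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        except (AttributeError, OSError):
            pass  # not available on this platform or filesystem
        try:
            actual = os.pread(fd, len(expected), offset)
        except OSError as exc:
            if exc.errno == errno.EIO:
                return "io_error"
            raise
    finally:
        os.close(fd)
    return "ok" if actual == expected else "corruption"
```

Millions of "io_error" results mean the drive is losing data but
saying so.  Even a handful of "corruption" results would mean something
in the path is lying.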

This is why some apps may resort to replication or to adding
additional checksums or ECC at a higher layer, but this should
generally be unnecessary.  I would use such techniques primarily to
prove corruption defects in kernels, drivers, or hardware, or if, as
Alan mentioned, I were storing an extremely large amount of data.  For
performance reasons, my software (which does store huge amounts of
data) relies primarily upon replication (to work around both
unavailability and corruption) as opposed to parity techniques.
Writing a signature and reading it back, as you are doing to prove
data corruption here, is effectively the same thing.
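
A rough illustration of a checksum at a higher layer combined with
replication, not the code from my software: the record layout (a
little-endian CRC32 in front of the payload) and the names seal,
unseal and read_replicated are assumptions made up for this sketch.

```python
import struct
import zlib


def seal(payload):
    """Prefix payload with its CRC32 so corruption is detectable later."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def unseal(record):
    """Return the payload, or None if the record fails its checksum."""
    if len(record) < 4:
        return None
    (stored,) = struct.unpack("<I", record[:4])
    payload = record[4:]
    return payload if zlib.crc32(payload) == stored else None


def read_replicated(replicas):
    """Return the first replica whose checksum verifies, else None.

    A replica of None stands for an unavailable copy (for example a read
    that failed with an I/O error).
    """
    for record in replicas:
        if record is None:
            continue
        payload = unseal(record)
        if payload is not None:
            return payload
    return None
```

An unavailable copy and a silently corrupted copy are both skipped, so
one good replica is enough to get the data back.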

Hopefully you haven't found high probability data corruption :-) Can
you reproduce the problem with different manufacturers or models of
drives?  If so, the problem is most likely not in the drive.  I'd say
that's job number one and it's easy to try.  Short of doing a white
box inspection of the kernel, you could narrow the problem down by
swapping out kernels (try a much older or newer Linux kernel, and
try another OS) and various pieces of hardware.

If everything points to the Linux kernel, then you'll have to start
instrumenting the kernel to track down where, exactly, it returns
success after having logged ATA errors.  If the write didn't
eventually succeed after retries, but the kernel returned success to
your app, you'll have your kernel bug and be famous :-)

Or you could start there if you are confident it isn't the hardware or
your program.  Thankfully you are using Linux and have an open kernel
data path to work with.

If you prove the drive is lying, which manufacturer makes it?  You
could call up the manufacturer with your reproducible problem.  They
would probably like to know if their controller is corrupting data.

- Mike
