Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)

public inbox for linux-kernel@vger.kernel.org

From: "Norman Diamond" <ndiamond@wta.att.ne.jp>
To: "Hans Reiser" <reiser@namesys.com>,
	"Wes Janzen" <superchkn@sbcglobal.net>,
	"Rogier Wolff" <R.E.Wolff@BitWizard.nl>,
	"John Bradford" <john@grabjohn.com>,
	<linux-kernel@vger.kernel.org>, <nikita@namesys.com>,
	"Pavel Machek" <pavel@ucw.cz>
Subject: Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
Date: Fri, 17 Oct 2003 20:11:42 +0900
Message-ID: <126d01c3949f$91bdecc0$3eee4ca5@DIAMONDLX60>
In-Reply-To: 3F8FBADE.7020107@namesys.com

Replying first to Hans Reiser; below to Russell King and Pavel Machek.

> Instead of recording the bad blocks, just write to them.

If writes are guaranteed to force reallocations then this is potentially
part of a solution.

I still remain suspicious because the first failed read was milliseconds or
minutes after the preceding write.  I think the odds are very high that the
sector was already bad at the time of the write but reallocation did not
occur.  It is possible but I think very unlikely that the sector was
reallocated to a different physical sector which went bad milliseconds after
being written after reallocation, and equally unlikely that the sector
wasn't reallocated because it really hadn't been bad but went bad
milliseconds later.  In other words, I think it is overwhelmingly likely
that the write failed but was not detected as such and did not result in
reallocation.

Now, maybe there is a technique to force it anyway.  When a partition is
newly created and is being formatted with the intention of writing data a
few minutes later, do writes that "should" have a better chance of being
detected.  The way to start this is to simply write every block, but this is
obviously insufficient because my block did get written shortly after the
partition was formatted and that write didn't cause the block to be
reallocated.  So in addition to simply writing every block, also read every
block.  For each read that fails, proceed to do another write which "should"
force reallocation.

Mr. Reiser, when I created a partition of your design, that technique was
not offered.  Why?  And will it soon start being offered?

Also, I remain highly suspicious that for each read that fails, when the
formatting program proceeds to do another write which "should" force
reallocation, the drive might not do it.  The formatter will have to proceed
to yet another read.  And if the block is still bad, then figure that the
drive is refusing to reallocate the bad block.  And then yes, the formatter
will still have to make a list of known bad blocks and do something to
prevent ordinary file system operations from ever seeing those blocks.
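
To make the proposed format-time procedure concrete, here is a minimal
sketch in Python.  It is only an illustration under stated assumptions:
SimulatedDrive is a hypothetical toy model, not any real drive's firmware
or any real mkfs tool, and the names format_scan, write_block and
read_block are invented for this example.  The toy drive reallocates a
sector only when it is written after a read error has been reported, which
is the behaviour I am hoping for but do not trust.

```python
BLOCK_SIZE = 512


class SimulatedDrive:
    """Hypothetical toy drive, for illustration only.

    Blocks in `weak` fail reads until they are rewritten after a reported
    read error (the drive then reallocates them).  Blocks in `stuck` keep
    failing reads no matter how often they are rewritten.
    """

    def __init__(self, nblocks, weak=(), stuck=()):
        self.nblocks = nblocks
        self.data = {}
        self.stuck = set(stuck)
        self.unreadable = set(weak) | self.stuck
        self.pending = set()  # blocks that have reported a read error

    def write_block(self, n, data):
        self.data[n] = data
        # Assumed policy: reallocate only on a write that follows a failed read.
        if n in self.pending and n not in self.stuck:
            self.unreadable.discard(n)
            self.pending.discard(n)

    def read_block(self, n):
        if n in self.unreadable:
            self.pending.add(n)
            raise OSError(5, f"I/O error reading block {n}")
        return self.data[n]


def readable(dev, n, expected):
    """True if block n reads back without error and matches `expected`."""
    try:
        return dev.read_block(n) == expected
    except OSError:
        return False


def format_scan(dev, nblocks, pattern=b"\0" * BLOCK_SIZE):
    """Write every block, read every block, rewrite failures, read again.

    Returns the blocks that are still unreadable after the rewrite, i.e.
    the ones the drive apparently refused to reallocate.
    """
    for n in range(nblocks):          # pass 1: write everything
        dev.write_block(n, pattern)
    still_bad = []
    for n in range(nblocks):          # pass 2: read everything back
        if readable(dev, n, pattern):
            continue
        dev.write_block(n, pattern)   # "should" force reallocation
        if not readable(dev, n, pattern):
            still_bad.append(n)       # drive did not reallocate
    return still_bad
```

With SimulatedDrive(16, weak={3}, stuck={7}), block 3 is recovered by the
rewrite and only block 7 ends up in the returned list.  On real hardware
the reads would also have to bypass the page cache and the drive's own
cache, otherwise the read-back proves nothing.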

Russell King replied to me:

> > When a drive tries to read a block, if it detects errors, it retries up
> > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
>
> This is perfectly reasonable.  If the drive can't recover your old data
> to reallocate it to a new block, then leaving the error present until you
> write new data to that bad block is the correct thing to do.

Only if the subsequent write is guaranteed to result in reallocation.  I
remain suspicious that the drive does not guarantee this.  Suppose the
contents of the next write happen to get stored close enough to correct that
the block doesn't get reallocated and the data survive for another 100
milliseconds before becoming corrupt again?

> Think about what would happen if it did get reallocated.  What data would
> the drive return when requested to read the bad block?

Why does it matter?  The drive already reported a read failure.  Maybe Linux
programs aren't all smart enough to inform the user when a read operation
results in an I/O error, but drivers could be smarter.  I think there's
probably a bit of room in an inode to add a flag saying that the file has
been detected to be partially unreadable.  Sorry for the digression.
Anyway, it is 100% true that the data in that block are gone.  The block
should be reallocated and the new physical block can either be zeroed or
randomized or anything, and that's what subsequent reads will get until the
block gets written again.

> If the error persists during a write to the bad block, then yes, I'd
> expect it to be reallocated at that point - but only because the drive has
> the correct data for that block available.

We agree in our moral expectations and our technical analysis that correct
data will be available at that time.  But if your word "expect" means you
have confidence that the drive will perform correctly, I do not share your
confidence (I think it is possible but highly unlikely that the drive did
its job correctly during the previous write).

> Your description of the way Toshiba's drive works seems perfectly sane.
> In fact, I'd consider a drive to be broken if it behaved in any other way
> - capable of almost silent data loss.

I think it would not be silent.  Even if the system log had one entry
instead of fifty repetitions, the loss would not be silent.  I don't know
which application silently ignored the error, and that irritates me.  (dd
wasn't silent when I tried copying the entire partition to /dev/null.)

Pavel Machek wrote:

> Well, this behaviour makes sense.
>
> "If we can't read this, leave it in place, perhaps we can read it in
> future (when temperature drops below 80 Celsius or something)". "If we
> can't write this, bad, but we can reallocate without losing
> anything".

Well, consider the two extremes we've seen in this thread now.  Mr. Bradford
felt that the entire drive should be discarded on account of having one bad
block.  Mr. Machek feels that we should preserve the possibility of reusing
the bad block because in the future it might appear not to be bad.  I take
the middle road.  The drive should not be discarded until errors become more
frequent or numerous, but known bad blocks should be acted on so that those
physical blocks have no chance of being used again.

Suppose the block becomes readable when the temperature drops (this one
didn't but I believe some can).  What happens when the block becomes
readable, and then a program writes new data to that block, and the block
temporarily appears good?  At that time it will get written and will not get
reallocated, right?  And a few milliseconds later, what?  I do not want that
block reused.  I want it reallocated.

And when a drive doesn't guarantee reallocation, I want the driver to remove
the sector from the file system.
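
As a rough illustration of what "remove the sector from the file system"
could mean, here is a small Python sketch of a free-block map that never
hands out a block once it is known to be bad.  BlockAllocator and its
methods (allocate, release, retire) are hypothetical names invented for
this example; this is not how reiserfs or any other existing file system
is implemented.

```python
class BlockAllocator:
    """Hypothetical free-block map that keeps known bad blocks out of use."""

    def __init__(self, nblocks, bad_blocks=()):
        self.bad = set(bad_blocks)
        # A bad block is simply never marked free.
        self.free = [n not in self.bad for n in range(nblocks)]

    def allocate(self):
        """Return the lowest-numbered free block, which is never a bad one."""
        for n, is_free in enumerate(self.free):
            if is_free:
                self.free[n] = False
                return n
        raise OSError(28, "No space left on device")

    def release(self, n):
        """Give a block back, unless it has been retired as bad."""
        if n in self.bad:
            return
        self.free[n] = True

    def retire(self, n):
        """Record block n as bad so it can never be allocated again."""
        self.bad.add(n)
        self.free[n] = False
```

A formatter or driver that finds a block the drive refuses to reallocate
would call retire() on it; after that, neither allocate() nor release()
can bring the block back into use.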
