From: Ric Wheeler <ric@emc.com>
To: Toby Thain <toby@telegraphics.com.au>
Cc: Zan Lynx <zlynx@acm.org>, Jeff Mahoney <jeffm@suse.com>,
	Christian Kujau <lists@nerdbynature.de>,
	kgp <jyotiv@tataelxsi.co.in>,
	reiserfs-devel@vger.kernel.org
Subject: Re: bad block management
Date: Sat, 05 Apr 2008 11:08:13 -0400
Message-ID: <47F795DD.1050508@emc.com>
In-Reply-To: <9BF9A69F-F5D1-4A58-B3AA-719799E225A9@telegraphics.com.au>

Toby Thain wrote:
> 
> On 5-Apr-08, at 8:31 AM, Ric Wheeler wrote:
>> Toby Thain wrote:
>>> On 4-Apr-08, at 2:58 PM, Ric Wheeler wrote:
>>>>
>>>> Toby Thain wrote:
>>>>> On 3-Apr-08, at 8:14 PM, Zan Lynx wrote:
>>>>>> On Tue, 2008-04-01 at 15:51 -0400, Jeff Mahoney wrote:
>>>>>>
>>>>>>> Ric's right about disk drives, though. They'll remap the bad sectors
>>>>>>> automatically at the hardware level. When you start to see bad sectors
>>>>>>> at the file system level, it means that the sectors reserved for
>>>>>>> remapping have been exhausted and you should replace the disk.
>>>>>>
>>>>>> There are a couple of cases where you can see bad block errors on a good
>>>>>> drive.
>>>>>>
>>>>>> If a block is written with a bad CRC for some reason...the write head
>>>>>> got a freak blip or it lost power as it was writing, or the data went
>>>>>> corrupt while sitting on disk, then it will read as a bad block, but
>>>>>> rewriting would fix it.
>>>>>>
>>>>>> A RAID media verify or a badblocks -n run can usually fix these.
>>>>> Only if your RAID uses CRCs (most don't).
>>>>> ZFS is the real answer to undetected corruption.
>>>>> --Toby
>>>>
>>>> Zan is right - even on a local drive, a write can repair some 
>>>> sectors with bad protection bits. All disks have per sector data 
>>>> protection (reed solomon encoding, etc) and there are lots of those 
>>>> bits per sector.
>>> That does not protect against writing bad data, only some errors 
>>> internal to drive. There is a long way to travel between CPU and 
>>> drive. Cable, controller, RAM, etc, etc, etc. ZFS protects the entire 
>>> data path.
>>> --Toby
>>
>> If you want to protect the entire data path, you are looking at 
>> something like DIF which protects even more of the data path than ZFS 
>> since it adds a check from application space to the IO stack ;-)
>>
>> ZFS does not export its protection bits up the stack.
> 
> Correct, but it protects everything up to the system call. RAID does not 
> even get close, even with perfect error reporting (which doesn't really 
> exist anyway). :)
> 
> --Toby
> 

When you look in detail at how data is lost in working systems, it is
always worth identifying the big buckets of common failures and making
sure that we balance the complexity and cost (in money or in
performance) of a protection scheme against the improvement it actually
delivers.

What RAID does well is protect against the leading, and very common,
error case: errors in a single sector (or a few sectors) on a disk drive.
Those errors are almost always reported as IO errors, and RAID systems
(including our MD software RAID) will do the right thing, rebuilding the
data from redundancy, when only one sector in a stripe has a media error.
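A minimal sketch of why one reported bad sector is recoverable, assuming a RAID-5 style stripe with a single XOR parity chunk. The helper names (`xor_bytes`, `make_parity`, `rebuild_missing`) are hypothetical and this is not the MD driver's code: because the drive reports which read failed, XORing the parity with every surviving chunk yields the lost one.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_chunks: List[bytes]) -> bytes:
    """Parity for one stripe: the XOR of all data chunks."""
    return reduce(xor_bytes, data_chunks)


def rebuild_missing(chunks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover the single chunk whose read was reported as a media error (None).

    parity ^ (all surviving chunks) == the missing chunk, because every
    surviving chunk cancels itself out of the parity.
    """
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one bad chunk per stripe")
    survivors = [c for c in chunks if c is not None]
    recovered = reduce(xor_bytes, survivors, parity)
    repaired = list(chunks)
    repaired[missing[0]] = recovered
    return repaired


# Usage: the middle drive reports an IO error for its chunk of the stripe.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(stripe)
print(rebuild_missing([b"AAAA", None, b"CCCC"], parity))  # [b'AAAA', b'BBBB', b'CCCC']
```

The sketch relies on the error being reported: a chunk that comes back silently wrong is never marked as missing, and single parity on its own cannot tell which chunk is the bad one.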

The interesting question is which failure is the second most common.

From what I see, that is normally application/software errors. These
can be bugs in the file system or IO stack, but it is also common to
lose data to buggy applications.
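To illustrate why corruption introduced inside the IO stack needs a check computed above it, here is a toy model. It assumes a drive that computes its own per-sector CRC over whatever bytes it receives, plus a hypothetical faulty layer between the application and the drive. `Drive`, `flaky_path`, `app_write` and `app_read` are illustrative names only, and the record layout is not how ZFS or DIF actually store their checksums.

```python
import zlib


class Drive:
    """Toy drive: keeps its own per-sector CRC over whatever bytes it is handed."""

    def __init__(self) -> None:
        self.sectors = {}

    def write(self, lba: int, data: bytes) -> None:
        self.sectors[lba] = (data, zlib.crc32(data))

    def read(self, lba: int) -> bytes:
        data, crc = self.sectors[lba]
        if zlib.crc32(data) != crc:
            raise OSError(f"media error at LBA {lba}")  # reported, like a bad sector
        return data


def flaky_path(data: bytes) -> bytes:
    """Hypothetical bug in the cable/controller/IO stack: flips one bit in transit."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


def app_write(drive: Drive, lba: int, payload: bytes, path=lambda d: d) -> None:
    # End-to-end: the checksum is computed in application space and stored with the data.
    record = zlib.crc32(payload).to_bytes(4, "big") + payload
    drive.write(lba, path(record))


def app_read(drive: Drive, lba: int) -> bytes:
    record = drive.read(lba)  # the drive's own check passes: it never saw the good bytes
    stored, payload = int.from_bytes(record[:4], "big"), record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError(f"end-to-end checksum mismatch at LBA {lba}")
    return payload


d = Drive()
app_write(d, 0, b"important data")
print(app_read(d, 0))  # b'important data'
app_write(d, 1, b"important data", path=flaky_path)
d.read(1)  # no error: the per-sector CRC matches the already-corrupted bytes
try:
    app_read(d, 1)
except ValueError as err:
    print(err)  # end-to-end checksum mismatch at LBA 1
```

What this does not catch: an application that writes the wrong data computes a perfectly valid checksum over it, so end-to-end checks do not help with the application bugs mentioned above.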

I don't have first-hand experience with ZFS, but any complicated system
runs the risk of increasing the error rate (certainly for early
adopters ;-)) while the developers figure out how their implementation
differs from their pristine design (or what the design concept missed).

My measurements of reiserfs (v3) reliability over a really large
population show that we do quite well (when you use barriers or disable
the write cache).

It will be interesting to read the first ZFS field study (along the
lines of the CMU paper by Bianca Schroeder on disk failures, the Google
paper on disk failures and the recent NetApp/UWisc papers on IO stack
failures) and see how ZFS does in the wild.

ric


Thread overview: 19+ messages
2008-04-01  5:03 bad block management kgp
2008-04-01 18:55 ` Christian Kujau
2008-04-01 19:32   ` Ric Wheeler
2008-04-01 19:51     ` Jeff Mahoney
2008-04-01 22:11       ` Edward Shishkin
2008-04-02  4:50         ` jyotiv
2008-04-02 10:43           ` Ric Wheeler
2008-04-02 11:22             ` jyotiv
2008-04-02 13:31               ` Ric Wheeler
2008-04-02 13:14           ` Jeff Mahoney
2008-04-04  0:14       ` Zan Lynx
2008-04-04  4:21         ` Toby Thain
2008-04-04 16:12           ` Zan Lynx
2008-04-04 22:41             ` Toby Thain
2008-04-04 18:58           ` Ric Wheeler
2008-04-04 22:42             ` Toby Thain
2008-04-05 12:31               ` Ric Wheeler
2008-04-05 14:07                 ` Toby Thain
2008-04-05 15:08                   ` Ric Wheeler [this message]