Why does the md/raid subsystem not remap bad sectors in a raid array?

* Why does the md/raid subsystem does not remap bad sectors in a raid array?
@ 2008-11-23  0:02 Justin Piszcz
  2008-11-23  0:13 ` Jon Nelson
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Justin Piszcz @ 2008-11-23  0:02 UTC
  To: linux-raid; +Cc: linux-kernel

I asked before but it was kind of clobbered in the velociraptor mess:

On a colleague's box:

Aug 02, 2008 12:15.30AM(0x04:0x0023): Sector repair completed: port=7, LBA=0x4A0387F5

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       305         1241745397

Even though this disk has a bad sector:
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
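
The controller log gives the LBA in hex and the SMART self-test log
gives it in decimal, so it is worth checking that both point at the
same sector. A quick check, using only the two values shown above (the
variable names are mine):

```python
# Values copied from the two logs above.
controller_lba_hex = "0x4A0387F5"   # 3ware "Sector repair completed" entry
smart_first_error_lba = 1241745397  # SMART LBA_of_first_error column

controller_lba = int(controller_lba_hex, 16)
same_sector = controller_lba == smart_first_error_lba

print(controller_lba, same_sector)  # 1241745397 True
```

So the sector the 3ware card reports as repaired is the same one the
extended self-test stopped on.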

The controller does not drop the drive from the array when it hits an 
error; the 3ware card "takes care of it" and the user need not worry about 
it. With md/raid, on the other hand, every time it hits a bad sector it 
breaks the raid and the array goes degraded. Is this correct?  Is something 
like what 3ware does possible in a sw-raid based configuration, or is a HW 
raid card required?

Justin.


* Re: Why does the md/raid subsystem does not remap bad sectors in a raid array?
  2008-11-23  0:02 Why does the md/raid subsystem does not remap bad sectors in a raid array? Justin Piszcz
@ 2008-11-23  0:13 ` Jon Nelson
  2008-11-23  1:44 ` Robert Hancock
  2008-11-25  8:17 ` Luca Berra
  2 siblings, 0 replies; 6+ messages in thread
From: Jon Nelson @ 2008-11-23  0:13 UTC
  To: Justin Piszcz; +Cc: linux-raid, linux-kernel

There are a few reasons but my guess is this: md tries to use the
entire available storage (of the smallest element) in a given set of
devices, which means there is no room for remappery.

However, if MD could be told to set aside some percentage of this
value, or some fixed amount (like, say, 10MB), then remapping blocks
becomes possible.

However, to add this functionality one would have to consider the following:

1. how much to set aside?
2. where? beginning, end, middle, staggered in chunks?
3. how to tell MD that block A maps to block B on device C? Should it
be done as an exception list (all blocks not in list X refer to their
actual block, otherwise they refer to a redirected block)? or as a
direct map (or something else)?
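
To make the exception-list option in question 3 concrete, here is a
minimal sketch. It is not md code: the class name, the method names,
and the choice to reserve the spare area at the end of the device (one
possible answer to question 2) are all assumptions made for
illustration.

```python
class ExceptionListRemap:
    """Toy exception-list remapper for one member device (hypothetical)."""

    def __init__(self, total_blocks, spare_blocks):
        if not 0 < spare_blocks < total_blocks:
            raise ValueError("spare area must be smaller than the device")
        # Assumption: the spare area sits at the end of the device.
        self.data_blocks = total_blocks - spare_blocks
        self._free_spares = list(range(self.data_blocks, total_blocks))
        self._exceptions = {}  # logical block -> spare block

    def _check(self, block):
        if not 0 <= block < self.data_blocks:
            raise IndexError("block outside the usable area")

    def resolve(self, block):
        """Return the physical block that currently backs `block`."""
        self._check(block)
        # Blocks not in the exception list refer to themselves.
        return self._exceptions.get(block, block)

    def remap(self, block):
        """Redirect `block` to the next free spare and return its new home."""
        self._check(block)
        if not self._free_spares:
            raise RuntimeError("spare area exhausted")
        spare = self._free_spares.pop(0)
        self._exceptions[block] = spare
        return spare


table = ExceptionListRemap(total_blocks=1000, spare_blocks=10)
print(table.resolve(42))  # 42, not in the exception list
print(table.remap(42))    # 990, first block of the spare area
print(table.resolve(42))  # 990
```

The list only grows with the number of bad blocks, which is the main
attraction over a direct map; the cost is an extra lookup on every I/O,
and the list itself would have to be stored persistently somewhere.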

Perhaps an alternative would be to add a new block layer which takes
an existing block device X and exposes a new, automatic block
remapper-y block device Y (bad reads might continue to return errors
but writes to a previous bad read might go to a new block and so would
subsequent reads) and so on.
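
A rough sketch of that layering, with an in-memory stand-in for device
X. Everything here (FlakyDevice, RemappingDevice, the spare count) is
made up for illustration, and bounds checking is left out to keep it
short:

```python
class ReadError(Exception):
    """Raised when the underlying device cannot read a block."""


class FlakyDevice:
    """Stand-in for device X: blocks in `bad_blocks` always fail to read."""

    def __init__(self, nblocks, bad_blocks=()):
        self.nblocks = nblocks
        self.bad_blocks = set(bad_blocks)
        self.data = {}

    def read(self, block):
        if block in self.bad_blocks:
            raise ReadError(block)
        return self.data.get(block, b"")

    def write(self, block, payload):
        self.data[block] = payload


class RemappingDevice:
    """Device Y: exposes X minus a spare area and redirects on write."""

    def __init__(self, lower, spare_blocks):
        self.lower = lower
        self.size = lower.nblocks - spare_blocks
        self._spares = list(range(self.size, lower.nblocks))
        self._map = {}         # exposed block -> spare block on X
        self._suspect = set()  # blocks whose last read failed

    def read(self, block):
        try:
            return self.lower.read(self._map.get(block, block))
        except ReadError:
            # The read still fails, but remember it for the next write.
            self._suspect.add(block)
            raise

    def write(self, block, payload):
        # A write to a block that failed a read goes to a new block.
        if block in self._suspect and self._spares:
            self._map[block] = self._spares.pop(0)
            self._suspect.discard(block)
        self.lower.write(self._map.get(block, block), payload)
```

A RAID layer on top of Y would see the read error, rebuild the data
from the other members, and write it back; that write lands on a spare
block and later reads of the same block succeed.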

Perhaps the easiest way to test this would be to hack NBD or AoE and
build a raid out of such devices.

Just ramblin' here.

-- 
Jon


* Re: Why does the md/raid subsystem does not remap bad sectors in a raid array?
  2008-11-23  0:02 Why does the md/raid subsystem does not remap bad sectors in a raid array? Justin Piszcz
  2008-11-23  0:13 ` Jon Nelson
@ 2008-11-23  1:44 ` Robert Hancock
  2008-11-23  4:34   ` Brad Campbell
  2008-11-25  8:17 ` Luca Berra
  2 siblings, 1 reply; 6+ messages in thread
From: Robert Hancock @ 2008-11-23  1:44 UTC
  To: linux-raid; +Cc: linux-kernel

Justin Piszcz wrote:
> I asked before but it was kind of clobbered in the velociraptor mess:
> 
> On a colleague's box:
> 
> Aug 02, 2008 12:15.30AM(0x04:0x0023): Sector repair completed: port=7, 
> LBA=0x4A0387F5
> 
> SMART Self-test log structure revision number 0
> Warning: ATA Specification requires self-test log structure revision 
> number = 1
> Num  Test_Description    Status                  Remaining 
> LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%       305 
> 1241745397
> 
> Even though this disk has a bad sector:
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   
> Offline -       1
> 
> The controller does not drop the drive from the array when it hits an 
> error, the 3ware card "takes care of it" and the user need not worry 
> about it, whereas with md/raid every time it hits a bad sector, it 
> breaks the raid and it goes degraded, is this correct?  Will/can 
> something like what 3ware does be possible in a sw-raid based 
> configuration or is a HW raid card required?

Presumably all it's doing is writing that sector's contents back from 
the other drive(s) in the array when the read error is detected. This is 
something that software could do just as well. Drives only remap bad 
sectors when they are written over, as a read failure doesn't 
necessarily mean that the sector is entirely unreadable; it could be 
due to environmental factors such as high temperature, vibration, etc.
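
A toy model of that drive behaviour, as a sketch only. The class and
its counters are invented for illustration, and it assumes the drive
always reallocates on the first write to a failed sector; real firmware
may instead rewrite the sector in place and reallocate only if that
fails.

```python
class UnrecoverableReadError(Exception):
    """The drive gave up reading a sector."""


class ToyDrive:
    """Hypothetical drive that only remaps a bad sector when it is written."""

    def __init__(self, nsectors, spares):
        self.sectors = [b""] * nsectors
        self.pending = set()        # failed a read, not yet remapped
        self.spares_left = spares
        self.reallocated_count = 0

    def mark_unreadable(self, lba):
        """Simulate a sector going bad (test hook, not a drive command)."""
        self.pending.add(lba)

    def read(self, lba):
        if lba in self.pending:
            # A read error does not trigger a remap: the old contents
            # might still be recoverable on a later attempt.
            raise UnrecoverableReadError(lba)
        return self.sectors[lba]

    def write(self, lba, data):
        if lba in self.pending:
            # New data makes the old contents irrelevant, so the drive
            # can now safely move the sector to a spare.
            if self.spares_left == 0:
                raise IOError("no spare sectors left")
            self.spares_left -= 1
            self.reallocated_count += 1
            self.pending.discard(lba)
        self.sectors[lba] = data
```

This is why writing the reconstructed data back is enough: the write
itself is what gives the firmware permission to remap.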

Just rewriting the sector seems a bit questionable though: if a drive 
in your array is growing read errors, that's not really a good thing.

> 
> Justin.
> 



* Re: Why does the md/raid subsystem does not remap bad sectors in a raid array?
  2008-11-23  1:44 ` Robert Hancock
@ 2008-11-23  4:34   ` Brad Campbell
  2008-11-23 12:20     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 6+ messages in thread
From: Brad Campbell @ 2008-11-23  4:34 UTC
  To: Robert Hancock; +Cc: linux-raid, linux-kernel

Robert Hancock wrote:
>> The controller does not drop the drive from the array when it hits an 
>> error, the 3ware card "takes care of it" and the user need not worry 
>> about it, whereas with md/raid every time it hits a bad sector, it 
>> breaks the raid and it goes degraded, is this correct?  Will/can 
>> something like what 3ware does be possible in a sw-raid based 
>> configuration or is a HW raid card required?
> 
> Presumably all it's doing is writing that sector's contents back from 
> the other drive(s) in the array when the read error is detected, this is 
> something that software could do just as well. Drives only remap bad 
> sectors when they are written over, as a read failure doesn't 
> necessarily mean that the sector is entirely unreadable, but could be 
> due to environmental factors such as high temperature, vibration, etc.
> 
> Just rewriting the sector seems a bit questionable though, as if a drive 
> in your array is growing read errors that's not really a good thing..

md has done this for a while now though. If it encounters a read error
in the array, it will attempt to write the reconstructed data back to
that disk to force a reallocation. I've seen it work quite well here on
disks that have the occasional grown defect.
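
The idea can be sketched for a single-parity stripe as follows. This is
not md's actual code, just an illustration of "reconstruct, then write
back"; Member, xor_chunks and read_with_repair are names made up here,
and the sketch assumes the rewrite always makes the sector readable
again.

```python
from functools import reduce


class ReadError(Exception):
    """A member disk failed to read its chunk of a stripe."""


class Member:
    """In-memory stand-in for one member disk (hypothetical)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.unreadable = set()

    def read(self, stripe):
        if stripe in self.unreadable:
            raise ReadError(stripe)
        return self.chunks[stripe]

    def write(self, stripe, data):
        self.chunks[stripe] = data
        # Assumption: the write lets the drive remap the sector.
        self.unreadable.discard(stripe)


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def read_with_repair(members, index, stripe):
    """Read one chunk; on error rebuild it from the others and rewrite it."""
    try:
        return members[index].read(stripe)
    except ReadError:
        others = [m.read(stripe) for i, m in enumerate(members) if i != index]
        data = xor_chunks(others)           # data chunks XOR parity
        members[index].write(stripe, data)  # gives the drive a chance to remap
        return data
```

If a second member also fails to read the same stripe, the list
comprehension raises and the caller sees the error, which is the point
where a real array would have to fail the request or the disk.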

It's certainly _much_ nicer than having the disk booted from the array on a single read error.

If the disk is haemorrhaging sectors then you will find out about it sooner or later through other 
means.

Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.


* Re: Why does the md/raid subsystem does not remap bad sectors in a raid array?
  2008-11-23  4:34   ` Brad Campbell
@ 2008-11-23 12:20     ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 6+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-11-23 12:20 UTC
  To: Brad Campbell; +Cc: Robert Hancock, linux-raid, linux-kernel

On Sun, 23 Nov 2008, Brad Campbell wrote:
> md has done this for a while now though. If it encounters a read error in 
> the array it will make an attempt to write the reconstructed data back to 
> that disk attempting to force a reallocation. I've seen it work quite 
> well here on disks that have the occasional grown defect.

Indeed, but it does so in the "check array" mode (which distros like
Debian are now enabling once a month or so; I always up that to once a
week :p)
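
For reference, a check pass is requested through md's sysfs interface
by writing "check" to the array's sync_action file. A minimal sketch,
assuming an array named md0 (an example name) and root privileges; the
helper functions are mine, and the sysfs_root parameter exists only so
the code can be tried against a scratch directory:

```python
from pathlib import Path

# Actions a user would normally write; md reports others (e.g. resync).
VALID_ACTIONS = {"check", "repair", "idle"}


def request_sync_action(array="md0", action="check", sysfs_root="/sys/block"):
    """Write `action` to <sysfs_root>/<array>/md/sync_action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    target = Path(sysfs_root) / array / "md" / "sync_action"
    target.write_text(action + "\n")
    return target


def mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Read the mismatch counter md maintains for check/repair passes."""
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text())
```

Calling request_sync_action("md0", "check") from a weekly cron job is
roughly what the distro scripts do, minus their locking and scheduling
niceties.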

Does md repair bitrotten sectors ALSO outside of check mode?  That's
what is being asked in this thread...

> If the disk is haemorrhaging sectors then you will find out about it 
> sooner or later through other means.

Like a weekly SMART long test.   That's what our maintenance windows are
for :)  Everything is kept on-line, but allowed to run in degraded
performance mode, so we kick in SMART offline and long tests, RAID array
scrubbing, etc (not at the same time, though!).

That reminds me to file a bug against smartmontools to DISABLE auto
offline mode on disks, and enable them one disk at a time at a random
interval with at least one hour between them.  Otherwise, the disks all
enter auto-offline-testing SMART mode at the same time.
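
Something along these lines would do for the scheduling part. This is
only a sketch of the staggering logic: the function name and the jitter
bound are my own choices, and it computes start offsets without
touching the disks (the enabling itself would go through smartctl's
automatic offline testing switch).

```python
import random


def stagger_offsets(disks, min_gap_s=3600, max_jitter_s=1800, rng=None):
    """Return {disk: start offset in seconds}, at least min_gap_s apart."""
    rng = rng or random.Random()
    order = list(disks)
    rng.shuffle(order)  # do not always start with the same disk
    schedule = {}
    start = 0
    for disk in order:
        start += rng.randint(0, max_jitter_s)  # random extra delay
        schedule[disk] = start
        start += min_gap_s                     # guaranteed minimum spacing
    return schedule


plan = stagger_offsets(["sda", "sdb", "sdc", "sdd"])
for disk, offset in sorted(plan.items(), key=lambda item: item[1]):
    print(f"{disk}: enable auto-offline at +{offset}s")
```

The disk names are examples. With four disks the last one is enabled
somewhere between three and five hours after the first.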

Hmm, it would be good to teach md to measure disk throughput using a
sliding window (of say, 5 minutes) and reduce read priority of disks
that are slow...
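
Roughly what I have in mind, as a userspace sketch rather than anything
resembling md internals. The class, the 300 second default and the
selection function are all illustrative assumptions:

```python
from collections import deque


class ThroughputWindow:
    """Bytes completed by one disk over the last `window_s` seconds."""

    def __init__(self, window_s=300.0):
        self.window_s = window_s
        self._samples = deque()  # (completion time, bytes)
        self._total = 0

    def _expire(self, now):
        # Drop samples that have slid out of the window.
        while self._samples and self._samples[0][0] <= now - self.window_s:
            _, old_bytes = self._samples.popleft()
            self._total -= old_bytes

    def record(self, now, nbytes):
        self._samples.append((now, nbytes))
        self._total += nbytes
        self._expire(now)

    def rate(self, now):
        """Average bytes per second over the window ending at `now`."""
        self._expire(now)
        return self._total / self.window_s


def pick_read_disk(windows, now):
    """Prefer the member with the highest recent throughput."""
    return max(windows, key=lambda name: windows[name].rate(now))
```

One catch: a disk that gets fewer reads also records less throughput,
so a real policy would have to keep sending it some reads (or measure
per-request latency instead) to notice when it has recovered.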

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: Why does the md/raid subsystem does not remap bad sectors in a raid array?
  2008-11-23  0:02 Why does the md/raid subsystem does not remap bad sectors in a raid array? Justin Piszcz
  2008-11-23  0:13 ` Jon Nelson
  2008-11-23  1:44 ` Robert Hancock
@ 2008-11-25  8:17 ` Luca Berra
  2 siblings, 0 replies; 6+ messages in thread
From: Luca Berra @ 2008-11-25  8:17 UTC
  To: linux-raid

Because all modern hard drives have sector remapping capabilities in
firmware.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \
