* Joys of spare disks!
@ 2005-02-28 14:24 Robin Bowes
2005-02-28 15:04 ` Jon Lewis
2005-03-02 2:48 ` Robin Bowes
0 siblings, 2 replies; 22+ messages in thread
From: Robin Bowes @ 2005-02-28 14:24 UTC (permalink / raw)
To: linux-raid
Hi,
I run a RAID5 array built from six 250GB Maxtor Maxline II SATA disks.
After several problems with Maxtor disks I decided to configure a hot
spare, i.e. 5 active + 1 spare.
Well, *another* disk failed last week. The spare disk was brought into
play seamlessly:
[root@dude ~]# mdadm --detail /dev/md5
/dev/md5:
Version : 00.90.01
Creation Time : Thu Jul 29 21:41:38 2004
Raid Level : raid5
Array Size : 974566400 (929.42 GiB 997.96 GB)
Device Size : 243641600 (232.35 GiB 249.49 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Mon Feb 28 14:00:54 2005
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
UUID : a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1
Events : 0.6941488
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
3 8 82 3 active sync /dev/sdf2
4 8 66 4 active sync /dev/sde2
5 8 50 - faulty /dev/sdd2
I've done a quick test of /dev/sdd2:
[root@dude ~]# dd if=/dev/sdd2 of=/dev/null bs=64k
dd: reading `/dev/sdd2': Input/output error
50921+1 records in
50921+1 records out
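(If it helps narrow things down, dd can be pointed straight at the region
where it gave up - the block numbers below are just read off the output
above:)

  # re-read a small window around the failure point (64k blocks, so ~block 50921)
  dd if=/dev/sdd2 of=/dev/null bs=64k skip=50900 count=50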
So, I guess it's time to raise another return with Maxtor <sigh>.
/dev/sdd1 is used in /dev/md0. So, just to confirm, is this what I need
to do to remove bad disk/add new disk:
Remove faulty partition:
mdadm --manage /dev/md5 --remove /dev/sdd2
Remove "good" from RAID1 array:
mdadm --manage /dev/md0 --fail /dev/sdd1
mdadm --manage /dev/md0 --remove /dev/sdd1
[pull out bad disk, install replacement]
Partition new disk (will be /dev/sdd) (All six disks are partitioned the
same):
fdisk -l /dev/sda | fdisk /dev/sdd
(I seem to remember having a problem with this when I did it last time.
Something about a bug in fdisk that won't partition brand-new disks
correctly? Or was it sfdisk?)
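(For what it's worth, the usual trick for cloning a partition table -
assuming sfdisk is installed and behaves on a blank disk - is:

  # copy sda's partition table onto the new disk
  sfdisk -d /dev/sda | sfdisk /dev/sdd

but I'd double-check the result with fdisk -l afterwards.)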
Add new partitions to arrays:
mdadm --manage /dev/md0 --add /dev/sdd1
mdadm --manage /dev/md5 --add /dev/sdd2
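Finally, keep an eye on the md0 resync and the overall state, e.g.:
  watch cat /proc/mdstat
  mdadm --detail /dev/md0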
Thanks,
R.
--
http://robinbowes.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 14:24 Joys of spare disks! Robin Bowes
@ 2005-02-28 15:04 ` Jon Lewis
2005-02-28 15:23 ` Robin Bowes
2005-03-02 2:48 ` Robin Bowes
1 sibling, 1 reply; 22+ messages in thread
From: Jon Lewis @ 2005-02-28 15:04 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid
On Mon, 28 Feb 2005, Robin Bowes wrote:
> Hi,
>
> I run a RAID5 array built from six 250GB Maxtor Maxline II SATA disks.
> After having several problems with Maxtor disks I decided to use a spare
> disk, i.e. 5+1 spare.
Are you aware Maxtor had a bad batch(es?) of large SATA drives? We have a
few machines with 6 Maxtor 6Y200M0 drives and got a couple bad drives.
They have a utility program (powermax) you can run that will tell you if
any of your drives are bad. The bad ones will appear to work...but not
entirely reliably. i.e. the system I installed that had 1 bad drive could
not resync the RAID5. It would always either crash or lock up during the
resync.
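(If you can't conveniently run the vendor tool, a long SMART self-test is a
rough substitute - the device name below is just an example, and on SATA
through libata it currently needs the smartmontools/kernel passthrough
patch plus "-d ata":)

  smartctl -t long /dev/sda       # start the drive's own surface scan
  smartctl -l selftest /dev/sda   # check the result when it finishes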
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 15:04 ` Jon Lewis
@ 2005-02-28 15:23 ` Robin Bowes
2005-02-28 15:54 ` Nicola Fankhauser
0 siblings, 1 reply; 22+ messages in thread
From: Robin Bowes @ 2005-02-28 15:23 UTC (permalink / raw)
To: linux-raid
Jon Lewis wrote:
> Are you aware Maxtor had a bad batch(es?) of large SATA drives? We have a
> few machines with 6 Maxtor 6Y200M0 drives and got a couple bad drives.
> They have a utility program (powermax) you can run that will tell you if
> any of your drives are bad. The bad ones will appear to work...but not
> entirely reliably. i.e. the system I installed that had 1 bad drive could
> not resync the RAID5. It would always either crash or lock up during the
> resync.
I experienced just those symptoms with some of the drives I've had. I've
used powermax to identify bad disks and RMA'd them. I can't remember if
any of the replacement drives have failed as well - I seem to think they
have but I could be imagining that.
I've had loads of problems with MaxLine II 250GB SATA drives and
wouldn't buy them again. But as long as they keep RMA'ing them I'm happy
to stick with them for now.
R.
--
http://robinbowes.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 15:23 ` Robin Bowes
@ 2005-02-28 15:54 ` Nicola Fankhauser
2005-02-28 17:04 ` Robin Bowes
0 siblings, 1 reply; 22+ messages in thread
From: Nicola Fankhauser @ 2005-02-28 15:54 UTC (permalink / raw)
To: linux-raid
hi robin
nice to meet you here again - I'd been subscribed to this list for only
1.5 hours today when your post made it into my inbox! :)
Robin Bowes wrote:
> I've had loads of problems with MaxLine II 250GB SATA drives and
> wouldn't buy them again.
I bought 9 (8+1 for RAID5) 6B300S0 300GB Maxtor DM10 a month ago and
they seem to run fine - though I want to test them with powermax this
week to see if there could be problems.
I'll also need to upgrade the kernel to 2.6.10 and patch it with the
libata patch to make smartctl happy again (it doesn't work with SATA
drives yet).
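Once that's in place, something along these lines ought to work (exact
flags depend on the smartmontools version, so treat it as a sketch):

  smartctl -d ata -i /dev/sda   # identify info through the libata passthrough
  smartctl -d ata -a /dev/sda   # full SMART report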
> But as long as they keep RMA'ing them I'm happy
> to stick with them for now.
I'm afraid that the warranty period won't be extended by the replacement
drive(s).
regards from snowy and cold switzerland
nicola
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 15:54 ` Nicola Fankhauser
@ 2005-02-28 17:04 ` Robin Bowes
2005-02-28 18:58 ` Nicola Fankhauser
0 siblings, 1 reply; 22+ messages in thread
From: Robin Bowes @ 2005-02-28 17:04 UTC (permalink / raw)
To: linux-raid
Nicola Fankhauser wrote:
> hi robin
>
> nice to meet you here again - I've been subscribed to this list for 1.5
> hour today when your post made it into my inbox! :)
Small world, eh?!
> Robin Bowes wrote:
>
>> I've had loads of problems with MaxLine II 250GB SATA drives and
>> wouldn't buy them again.
>
>
> I bought 9 (8+1 for RAID5) 6B300S0 300GB Maxtor DM10 a month ago and
> they seem to run fine - though I want to test them with powermax this
> week to see if there could be problems.
Based on my experience, I would strongly recommend running them 7+1+1,
i.e. configuring a hot spare disk.
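For a fresh array that would look roughly like this (the device names are
just placeholders):

  mdadm --create /dev/md0 --level=5 --raid-devices=8 --spare-devices=1 \
      /dev/sd[a-i]1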
> I'll need as well to upgrade the kernel to 2.6.10 and patch it with the
> libata patch to make smartctl happy again (it won't work with SATA
> drives yet).
I haven't done that yet - I prefer to stick with release FC3 kernels
(don't want to get back into rolling my own - takes up too much time!)
>
>> But as long as they keep RMA'ing them I'm happy to stick with them for
>> now.
>
>
> I'm afraid that the warranty period won't be extended by the replacement
> drive(s).
Ah, that could be a problem. I'll have to review the situation in a
couple of years.
> regards from snowy and cold switzerland
Well, it's not too warm in the UK at the moment!
R.
--
http://robinbowes.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 17:04 ` Robin Bowes
@ 2005-02-28 18:58 ` Nicola Fankhauser
2005-02-28 19:25 ` Robin Bowes
0 siblings, 1 reply; 22+ messages in thread
From: Nicola Fankhauser @ 2005-02-28 18:58 UTC (permalink / raw)
To: linux-raid
hi robin
Robin Bowes wrote:
>> I bought 9 (8+1 for RAID5) 6B300S0 300GB Maxtor DM10 a month ago and
>> they seem to run fine - though I want to test them with powermax this
>> week to see if there could be problems.
>
> Based on my experience, I would strongly recommend running them 7+1+1,
> i.e. configuring a hot spare disk.
the RAID 5 array has 8 drives. the "server" is in my basement and thus
accessible all the time. the (tested :) 9th drive is lying in my drawer
and will serve as replacement as soon as a drive dies. I hope it's enough.
regards
nicola
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 18:58 ` Nicola Fankhauser
@ 2005-02-28 19:25 ` Robin Bowes
0 siblings, 0 replies; 22+ messages in thread
From: Robin Bowes @ 2005-02-28 19:25 UTC (permalink / raw)
To: linux-raid
Nicola Fankhauser wrote:
> the RAID 5 array has 8 drives. the "server" is in my basement and thus
> accessible all the time. the (tested :) 9th drive is lying in my drawer
> and will serve as replacement as soon as a drive dies. I hope it's enough.
Is that because you can't fit it in the case with the other 8?
R.
--
http://robinbowes.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-02-28 14:24 Joys of spare disks! Robin Bowes
2005-02-28 15:04 ` Jon Lewis
@ 2005-03-02 2:48 ` Robin Bowes
2005-03-02 2:59 ` Neil Brown
` (2 more replies)
1 sibling, 3 replies; 22+ messages in thread
From: Robin Bowes @ 2005-03-02 2:48 UTC (permalink / raw)
To: linux-raid
Robin Bowes wrote:
> Hi,
>
> I run a RAID5 array built from six 250GB Maxtor Maxline II SATA disks.
> After having several problems with Maxtor disks I decided to use a spare
> disk, i.e. 5+1 spare.
>
> Well, *another* disk failed last week. The spare disk was brought into
> play seamlessly:
Thanks to some advice from Guy the "failed" disk is now back up and running.
To fix it I did the following:
Removed the bad partition from the array:
mdadm --manage /dev/md5 --remove /dev/sdd2
Wrote to the whole disk, causing bad blocks to be re-located:
[root@dude test]# dd if=/dev/zero of=/dev/sdd2 bs=64k
dd: writing `/dev/sdd2': No space left on device
3806903+0 records in
3806902+0 records out
Verified the disk:
[root@dude test]# dd if=/dev/sdd2 of=/dev/null bs=64k
3806902+1 records in
3806902+1 records out
Added the partition back to the array:
[root@dude test]# mdadm /dev/md5 --add /dev/sdd2
mdadm: hot added /dev/sdd2
A quick look at the array configuration to make sure:
[root@dude test]# mdadm --detail /dev/md5
/dev/md5:
Version : 00.90.01
Creation Time : Thu Jul 29 21:41:38 2004
Raid Level : raid5
Array Size : 974566400 (929.42 GiB 997.96 GB)
Device Size : 243641600 (232.35 GiB 249.49 GB)
Raid Devices : 5
Total Devices : 6
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Wed Mar 2 02:01:24 2005
State : clean
Active Devices : 5
Working Devices : 6
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 128K
UUID : a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1
Events : 0.7036368
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
3 8 82 3 active sync /dev/sdf2
4 8 66 4 active sync /dev/sde2
5 8 50 - spare /dev/sdd2
This raises the question: why can't md do this automatically? Not for
the whole disk/partition, but just for a bad block when encountered? I
envisage something like:
md attempts read
one disk/partition fails with a bad block
md re-calculates correct data from other disks
md writes correct data to "bad" disk
- disk will re-locate the bad block
Of course, if you encounter further bad blocks when reading from the
other disks then you're screwed and it's time to get the backup tapes out!
Is there any sound reason why this is not feasible? Is it just that
someone needs to write the code to implement it?
R.
--
http://robinbowes.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 2:48 ` Robin Bowes
@ 2005-03-02 2:59 ` Neil Brown
2005-03-02 3:50 ` Molle Bestefich
2005-03-02 4:57 ` Brad Campbell
2 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-03-02 2:59 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid
On Wednesday March 2, robin-lists@robinbowes.com wrote:
>
> Is there any sound reason why this is not feasible? Is it just that
> someone needs to write the code to implement it?
Exactly (just needs to be implemented).
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 2:48 ` Robin Bowes
2005-03-02 2:59 ` Neil Brown
@ 2005-03-02 3:50 ` Molle Bestefich
2005-03-02 3:52 ` Molle Bestefich
2005-03-02 5:52 ` Guy
2005-03-02 4:57 ` Brad Campbell
2 siblings, 2 replies; 22+ messages in thread
From: Molle Bestefich @ 2005-03-02 3:50 UTC (permalink / raw)
To: linux-raid
Robin Bowes wrote:
> I envisage something like:
>
> md attempts read
> one disk/partition fails with a bad block
> md re-calculates correct data from other disks
> md writes correct data to "bad" disk
> - disk will re-locate the bad block
Probably not that simple, since sometimes multiple blocks will go
bad, and you wouldn't want the entire system to come to a screeching
halt whenever that happens.
A more consistent and risk-free way of doing it would probably be to
do the above partial resync in a background thread or so?..
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 3:50 ` Molle Bestefich
@ 2005-03-02 3:52 ` Molle Bestefich
2005-03-02 5:52 ` Guy
1 sibling, 0 replies; 22+ messages in thread
From: Molle Bestefich @ 2005-03-02 3:52 UTC (permalink / raw)
To: linux-raid
Molle Bestefich wrote:
> [...] come to a screeching halt [...]
... because of the disk spending a couple of seconds on each write,
retrying it, moving the head to the spare area, whatever.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 2:48 ` Robin Bowes
2005-03-02 2:59 ` Neil Brown
2005-03-02 3:50 ` Molle Bestefich
@ 2005-03-02 4:57 ` Brad Campbell
2005-03-02 5:53 ` Guy
2 siblings, 1 reply; 22+ messages in thread
From: Brad Campbell @ 2005-03-02 4:57 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid
Robin Bowes wrote:
> Thanks to some advice from Guy the "failed" disk is now back up and
> running.
>
> To fix it I did the following;
>
<snip>
Just watch that disk like a hawk. I had two disks fail recently in the same way; I did exactly what
you did, and 2 days later they both started to grow defects again and got kicked out of the array. I
RMA'd both of them last week as they would not stay stable for more than a day or two after
re-allocation.
Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Joys of spare disks!
2005-03-02 3:50 ` Molle Bestefich
2005-03-02 3:52 ` Molle Bestefich
@ 2005-03-02 5:52 ` Guy
2005-03-02 12:05 ` Molle Bestefich
1 sibling, 1 reply; 22+ messages in thread
From: Guy @ 2005-03-02 5:52 UTC (permalink / raw)
To: 'Molle Bestefich', linux-raid
I think the overhead related to fixing the bad blocks would be insignificant
compared to the overhead of degraded mode.
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Molle Bestefich
Sent: Tuesday, March 01, 2005 10:51 PM
To: linux-raid@vger.kernel.org
Subject: Re: Joys of spare disks!
Robin Bowes wrote:
> I envisage something like:
>
> md attempts read
> one disk/partition fails with a bad block
> md re-calculates correct data from other disks
> md writes correct data to "bad" disk
> - disk will re-locate the bad block
Probably not that simple, since some times multiple blocks will go
bad, and you wouldn't want the entire system to come to a screeching
halt whenever that happens.
A more consistent and risk-free way of doing it would probably be to
do the above partial resync in a background thread or so?..
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Joys of spare disks!
2005-03-02 4:57 ` Brad Campbell
@ 2005-03-02 5:53 ` Guy
[not found] ` <eaa6dfe0503080915276466a1@mail.gmail.com>
0 siblings, 1 reply; 22+ messages in thread
From: Guy @ 2005-03-02 5:53 UTC (permalink / raw)
To: 'Brad Campbell', 'Robin Bowes'; +Cc: linux-raid
That has not been my experience, but I have Seagate drives!
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Brad Campbell
Sent: Tuesday, March 01, 2005 11:57 PM
To: Robin Bowes
Cc: linux-raid@vger.kernel.org
Subject: Re: Joys of spare disks!
Robin Bowes wrote:
> Thanks to some advice from Guy the "failed" disk is now back up and
> running.
>
> To fix it I did the following;
>
<snip>
Just watch that disk like a hawk. I had two disks fail recently in the same
way, I did exactly what
you did and 2 days later they both started to grow defects again and got
kicked out of the array. I
RMA'd both of them last week as they would not stay stable for more than a
day or two after
re-allocation.
Regards,
Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 5:52 ` Guy
@ 2005-03-02 12:05 ` Molle Bestefich
2005-03-02 16:16 ` Guy
0 siblings, 1 reply; 22+ messages in thread
From: Molle Bestefich @ 2005-03-02 12:05 UTC (permalink / raw)
To: Guy; +Cc: linux-raid
Hm.. I said partial resync, because a full resync would be a waste of
time if it's just a thousand sectors or so that need to be relocated.
Anyhow.
There's no overhead to the application with the (theoretically
"partial") degraded mode, since it happens in parallel.
The latency of doing it while the read operation is ongoing would be,
say, 3 seconds or so per bad sector on a standard disk? Imagine a
thousand bad sectors, and any sane person would quickly pull the plug
from the dead box and have it resync when it boots instead of staring
at a hung system. When that happens there's even the risk that the
resync fails completely, if md decides to pull one of the disks other
than the one with bad blocks on it from the array before it resyncs.
I prefer the first scenario (the system keeps running, the array isn't
potentially destroyed), even if it means a slightly lower I/O rate and
thus a minor overhead if and only if running applications utilize the
I/O subsystem 100%..
Am I wrong?
Guy wrote:
> I think the overhead related to fixing the bad blocks would be insignificant
> compared to the overhead of degraded mode.
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Molle Bestefich
> Sent: Tuesday, March 01, 2005 10:51 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: Joys of spare disks!
>
> Robin Bowes wrote:
> > I envisage something like:
> >
> > md attempts read
> > one disk/partition fails with a bad block
> > md re-calculates correct data from other disks
> > md writes correct data to "bad" disk
> > - disk will re-locate the bad block
>
> Probably not that simple, since some times multiple blocks will go
> bad, and you wouldn't want the entire system to come to a screeching
> halt whenever that happens.
>
> A more consistent and risk-free way of doing it would probably be to
> do the above partial resync in a background thread or so?..
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Joys of spare disks!
2005-03-02 12:05 ` Molle Bestefich
@ 2005-03-02 16:16 ` Guy
2005-03-03 9:37 ` Molle Bestefich
0 siblings, 1 reply; 22+ messages in thread
From: Guy @ 2005-03-02 16:16 UTC (permalink / raw)
To: 'Molle Bestefich'; +Cc: linux-raid
This is what you said:
"A more consistent and risk-free way of doing it would probably be to
do the above partial resync in a background thread or so?.."
This sounded like Neil's current plan. But if I understand the plan, the
drive would be kicked out of the array. A log would track which stripes
were effected by the bad block and other writes to the array. A partial
re-sync would be done and the disk put back into the array. I think the
array would be degraded during the re-sync. This is why I made my comments.
On a related subject, I don't like Neil's plan if it causes the array to be
degraded, since while degraded, if another disk has a bad block the array
then goes off-line. Not a good plan IMO.
I also never said a full re-sync, I was referring to correcting the bad
block(s). And 1000 bad blocks! I have never had 2 on the same disk at the
same time. AFAIK. I would agree that 1000 would put a strain on the
system! Normally only 1 block is bad at a time IMO. However, I guess it is
possible to have a full track go bad. Sometime in the past I have said
there should be a threshold on the number of bad blocks allowed. Once the
threshold is reached, the disk should be assumed bad, or at least failing,
and should be replaced. I would also say the bad block repair should have a
speed limit so the effect on the overall system is minimized. Maybe only
allow it to repair 1 bad block per minute (a configurable parameter). But
your 1000 bad block example would take almost 17 hours.
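For comparison, md's resync speed is already tunable from userspace, so
presumably a bad block repair limit would be exposed the same way:

  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
  echo 1000 > /proc/sys/dev/raid/speed_limit_min   # KB/sec per device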
I think 1000 bad blocks at one time is an indication you have a head
failure. In that case, the disk is bad.
Does anyone know how many spare blocks are on a disk?
My worst disk has 28 relocated bad blocks.
Guy
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Molle Bestefich
Sent: Wednesday, March 02, 2005 7:05 AM
To: Guy
Cc: linux-raid@vger.kernel.org
Subject: Re: Joys of spare disks!
Hm.. I said partial resync, because a full resync would be a waste of
time if it's just a thousand sectors or so that needs to be relocated.
Anyhow.
There's no overhead to the application with the (theoretically
"partial") degraded mode, since it happens in parallel.
The latency of doing it while the read operation is ongoing would be,
say, 3 seconds or so per bad sector on a standard disk? Imagine a
thousand bad sectors, and any sane person would quickly pull the plug
from the dead box and have it resync when it boots instead of staring
at a hung system. When that happens there's even the risk that the
resync fails completely, if md decides to pull one of the disks other
than the one with bad blocks on it from the array before it resyncs.
I prefer the first scenario (the system keeps running, the array isn't
potentially destroyed), even if it means a slightly lower I/O rate and
thus a minor overhead if and only if running applications utilize the
I/O subsystem 100%..
Am I wrong?
Guy wrote:
> I think the overhead related to fixing the bad blocks would be
insignificant
> compared to the overhead of degraded mode.
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Molle Bestefich
> Sent: Tuesday, March 01, 2005 10:51 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: Joys of spare disks!
>
> Robin Bowes wrote:
> > I envisage something like:
> >
> > md attempts read
> > one disk/partition fails with a bad block
> > md re-calculates correct data from other disks
> > md writes correct data to "bad" disk
> > - disk will re-locate the bad block
>
> Probably not that simple, since some times multiple blocks will go
> bad, and you wouldn't want the entire system to come to a screeching
> halt whenever that happens.
>
> A more consistent and risk-free way of doing it would probably be to
> do the above partial resync in a background thread or so?..
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-02 16:16 ` Guy
@ 2005-03-03 9:37 ` Molle Bestefich
0 siblings, 0 replies; 22+ messages in thread
From: Molle Bestefich @ 2005-03-03 9:37 UTC (permalink / raw)
To: Guy; +Cc: linux-raid
Guy <bugzilla@watkins-home.com> wrote:
I generally agree with you, so I'm just gonna cite / reply to the
points where we don't :-).
> This sounded like Neil's current plan. But if I understand the plan, the
> drive would be kicked out of the array.
Yeah, sounds bad.
Although it should be marked as "degraded" in mdstat, since there's
basically no redundancy until the failed blocks have been reassigned
somehow.
> And 1000 bad blocks! I have never had 2 on the same disk at the
> same time. AFAIK. I would agree that 1000 would put a strain on the
> system!
Well, it happened to me on a Windows system, so I don't think that
that is far-fetched.
This was a desktop system with the case open, so it was bounced about a lot.
Every time the disk reached one of the faulty areas, it recalibrated
the head and then moved it out to try and read again. It retried the
operation 5 times before giving up. While this was ongoing, Windows
was frozen. It took at least 3 seconds each time I hit a bad area,
and I think even more.
If MD could read from a disk while a similar scenario occurred, and
just mark the bad blocks for "rewriting" in some "bad block rewrite
bitmap" or whatever, a system hang could be avoided. Trying to
rewrite every failed sector sequentially in the code that also reads
the data would incur a system hang. That's what I tried to say
originally, though I probably didn't do a good job (I know little of
linux md, guess it shows =)).
Of course, the disks would, in the case of IDE, probably have to _not_
be in master/slave configurations, since the disk with failing blocks
could perhaps hog the bus. Of course I know as little of ATA/IDE as I
do of linux MD, so I'm basically just guessing here ;-).
> Sometime in the past I have said there should be a threshold on the number
> of bad blocks allowed. Once the threshold is reached, the disk should be
> assumed bad, or at least failing, and should be replaced.
Hm. Why?
If a re-write on the block succeeds and then a read on the block
returns the correct data, the block has been fixed. I can see your
point on old disks where it might be a magnetic problem that was
causing the sector to fail, but on a modern disk, it has probably been
relocated to the spare area. I think the disk should just be failed
when a rewrite-and-verify cycle still fails. The threshold suggestion
adds complexity and user-configurability (error-prone) to an area
where it's not really needed, doesn't it?
Another note. I'd like to see MD being able to have a
user-specifiable "bad block relocation area", just like modern disks
have. It could use this when the disks spare area filled up. I even
thought up a use case at one time that wasn't insane like, "my disks
is really beginning to show up a lot of failures now, but I think I'll
keep it running a bit more", but I can't quite recall what it was.
> Does anyone know how many spare blocks are on a disk?
It probably varies?
Ie. crappy disks probably have a much too small area ;-).
In this case it would be very cute if MD had an option to specify its
own relocation area (and perhaps even a recommendation for the user on
how to set it wrt. specific harddisks).
But OTOH, it sucks to implement features in MD that would be much
easier to solve in the disks by just expanding the spare area (when
present).
> My worse disk has 28 relocated bad blocks.
Doesn't sound bad.
Isn't there a SMART value that will show you how big a percentage of
spare is used (0-255)?
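The closest thing I know of on ATA is the reallocated/pending sector
attributes - the normalized value counts down as the spare pool gets used,
though how it maps to a percentage is vendor-specific:

  smartctl -A /dev/hda | grep -i -E 'realloc|pending'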
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
@ 2005-03-07 16:36 LinuxRaid
2005-03-07 17:09 ` Peter T. Breuer
0 siblings, 1 reply; 22+ messages in thread
From: LinuxRaid @ 2005-03-07 16:36 UTC (permalink / raw)
To: linux-raid
With error correcting RAID, where the whole idea is to do everything possible
to maintain data reliability, it seems to me the correct behavior of the RAID
subsystem is to attempt to re-write ECC-failed data blocks whenever possible.
This is especially true where Software Controlled Timeouts are being
implemented on ATA/SATA drives.
I'm running several RAID-5 arrays against mixed PATA/SATA systems, and I am
amazed at how fragile Linux Software RAID-5 really is. It makes no sense to
me that one soft ECC error would kick out an entire volume of data, causing a
rebuild or a run in "degraded" mode, with the inherent risk that another
event on another disk results in the loss of all data on the storage system.
And from what I can tell, Linux software RAID never gives the drive the
chance to perform reallocation on "weak" sectors...
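(Today that re-write has to be done by hand, roughly the dd trick used
earlier in this thread but aimed at the one failing sector - the sector
number here is made up:

  # overwrite a single 512-byte sector so the drive gets a chance to remap it
  dd if=/dev/zero of=/dev/sdd bs=512 count=1 seek=12345678

which of course throws away whatever was in that sector.)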
What should be happening:
1) Drive has a read error or does not deliver the data within the command
timeout parameters that have been issued to the drive.
2) RAID driver collects the blocks from the "working" drives, generates the
missing data from the problem drive.
3) RAID driver both returns the data to the calling process, and issues a
re-write of the bad block on the disk drive in question.
4) RAID driver generates a log message tracking the problem
5) When the number of "event messages" for block re-writes exceeds a certain
threshold, alert the sys-admin that a specific drive is unreliable.
I've been going through the MD driver source, and to tell the truth, can't
figure out where the read error is detected and how to "hook" that event and
force a re-write of the failing sector. I would very much appreciate it if
someone out there could send me some hints/tips/pointers on how to
implement this. I'm not a Linux / kernel hacker (yet), but this should not be
hard to fix....
John Suykerbuyk
At Wed, 2 Mar 2005 13:05:04 +0100, you wrote
>Hm.. I said partial resync, because a full resync would be a waste of
>time if it's just a thousand sectors or so that needs to be relocated.
> Anyhow.
>
>There's no overhead to the application with the (theoretically
>"partial") degraded mode, since it happens in parallel.
>
>The latency of doing it while the read operation is ongoing would be,
>say, 3 seconds or so per bad sector on a standard disk? Imagine a
>thousand bad sectors, and any sane person would quickly pull the plug
>from the dead box and have it resync when it boots instead of staring
>at a hung system. When that happens there's even the risk that the
>resync fails completely, if md decides to pull one of the disks other
>than the one with bad blocks on it from the array before it resyncs.
>
>I prefer the first scenario (the system keeps running, the array isn't
>potentially destroyed), even if it means a slightly lower I/O rate and
>thus a minor overhead if and only if running applications utilize the
>I/O subsystem 100%..
>
>Am I wrong?
>
>Guy wrote:
>>I think the overhead related to fixing the bad blocks would be insignificant
>> compared to the overhead of degraded mode.
>>
>> Guy
>>
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org
>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Molle Bestefich
>> Sent: Tuesday, March 01, 2005 10:51 PM
>> To: linux-raid@vger.kernel.org
>> Subject: Re: Joys of spare disks!
>>
>> Robin Bowes wrote:
>> > I envisage something like:
>> >
>> > md attempts read
>> > one disk/partition fails with a bad block
>> > md re-calculates correct data from other disks
>> > md writes correct data to "bad" disk
>> > - disk will re-locate the bad block
>>
>> Probably not that simple, since some times multiple blocks will go
>> bad, and you wouldn't want the entire system to come to a screeching
>> halt whenever that happens.
>>
>> A more consistent and risk-free way of doing it would probably be to
>> do the above partial resync in a background thread or so?..
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
2005-03-07 16:36 LinuxRaid
@ 2005-03-07 17:09 ` Peter T. Breuer
0 siblings, 0 replies; 22+ messages in thread
From: Peter T. Breuer @ 2005-03-07 17:09 UTC (permalink / raw)
To: linux-raid
LinuxRaid@suykerbuyk.org wrote:
> I've been going through the MD driver source, and to tell the truth, can't
> figure out where the read error is detected and how to "hook" that event and
> force a re-write of the failing sector. I would very much appreciate it if
I did that for RAID1, or at least most of it (look for "robust read
patch" in this group's archive).
I also looked at RAID5, and my reaction was about the same as yours. I
made some progress, then decided that valour was not my colour that day.
> someone out there could send me some some hints/tips/pointers on how to
> impliment this. I'm not a Linux / kernel hacker (yet), but this should not be
> hard to fix....
Peter
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
@ 2005-03-07 20:15 LinuxRaid
0 siblings, 0 replies; 22+ messages in thread
From: LinuxRaid @ 2005-03-07 20:15 UTC (permalink / raw)
To: linux-raid
Well,
With this much interest, I will tear back into the bowels of Raid-5. Again,
anyone else reading this with a shred of a clue as to where to start, please
chime in!
- John "S"
At Mon, 07 Mar 2005 10:18:14 -0800, you wrote
>
>
>
>LinuxRaid@Suykerbuyk.org wrote:
>
>> I'm running several RAID-5 arrays against mixed PATA/SATA systems, and I am
>>amazed at how fragile Linux Software RAID-5 really is. It makes no sense to
>
>Amen!
>
>> What should be happening:
>> 1) Drive has a read error or does not deliver the data within the command
>> timeout parameters that have been issued to the drive.
>> 2) RAID driver collects the blocks from the "working" drives, generates the
>> missing data from the problem drive.
>> 3) RAID driver both returns the data to the calling process, and issues a
>> re-write of the bad block on the disk drive in question.
>> 4) RAID drive generates a log message tracking the problem
>>5) When the number of "event messages" for block re-writes exceeds a certain
>> threshold, alert the sys-admin that a specific drive is unreliable.
>
>Absolutely
>
>>impliment this. I'm not a Linux / kernel hacker (yet), but this should not be
>> hard to fix....
>
>You will have a very willing tester in me if you generate any patches. I
>haven't played with device mapper yet (though that is apparently the way
>to get fake "faulty" devices for testing), but I have created a quick
>script to create/destroy a loopback-mounted set of files and raid5 array
>on top of it. Its in the archives and may or may not help as a test rig
>as you're hacking on the code
>
>There's lots more people than just me interested too, if you've got the
>motivation
>
>-Mike
>
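P.S. For anyone wanting a similar test rig, here is a rough (untested)
sketch using loopback devices - the file names, sizes and /dev/md9 are all
just placeholders:

  # create four small backing files and attach them to loop devices
  for i in 0 1 2 3; do
      dd if=/dev/zero of=/tmp/raidtest$i bs=1M count=100
      losetup /dev/loop$i /tmp/raidtest$i
  done
  # build a throwaway RAID5 on top of them
  mdadm --create /dev/md9 --level=5 --raid-devices=4 /dev/loop[0-3]
  cat /proc/mdstat
  # tear it down again
  mdadm --stop /dev/md9
  for i in 0 1 2 3; do losetup -d /dev/loop$i; done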
^ permalink raw reply [flat|nested] 22+ messages in thread
* Joys of spare disks!
[not found] ` <eaa6dfe0503080915276466a1@mail.gmail.com>
@ 2005-03-08 17:15 ` Derek Piper
[not found] ` <200503091704.j29H4l517152@www.watkins-home.com>
1 sibling, 0 replies; 22+ messages in thread
From: Derek Piper @ 2005-03-08 17:15 UTC (permalink / raw)
To: linux-raid
Heh.. I too have Seagate drives and have RMA'd even the drive they
sent back to replace the one I RMA'd a month or so ago. Normally
they're great, so I'm hoping the next replacement will fare better.
The one they sent back had more than 10000 unrecoverable sector errors
(!) according to smartctl by the time I RMA'd it (the first one, my
original drive, that I sent back had 3). It failed its self-tests and
all of Seagate's SeaTools tests too. The drive wasn't mishandled by
me, although I thought Seagate's own packaging looked a bit crap: two
pieces of black plastic suspending the drive within a plastic 'shell'
(the so-called 'SeaShell' :>) within the box. No peanuts or bubble
wrap or anything else. Still, the drive worked at first and then
within 48hrs had problems.
I have other Seagate drives that have run for over 15,000 hours without
even a single reallocated sector. I might have to try Peter's
patch though, since I am running them in RAID-1.
Derek
/trying to un-lurk on the list (i.e. read the 120 threads that have
appeared since I forgot about checking my gmail account)
On Wed, 2 Mar 2005 00:53:51 -0500, Guy <bugzilla@watkins-home.com> wrote:
> That has not been my experience, but I have Seagate drives!
>
> Guy
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Brad Campbell
> Sent: Tuesday, March 01, 2005 11:57 PM
> To: Robin Bowes
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Joys of spare disks!
>
> Robin Bowes wrote:
>
> > Thanks to some advice from Guy the "failed" disk is now back up and
> > running.
> >
> > To fix it I did the following;
> >
> <snip>
>
> Just watch that disk like a hawk. I had two disks fail recently in the same
> way, I did exactly what
> you did and 2 days later they both started to grow defects again and got
> kicked out of the array. I
> RMA'd both of them last week as they would not stay stable for more than a
> day or two after
> re-allocation.
>
> Regards,
> Brad
> --
> "Human beings, who are almost unique in having the ability
> to learn from the experience of others, are also remarkable
> for their apparent disinclination to do so." -- Douglas Adams
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Derek Piper - derek.piper@gmail.com
http://doofer.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Joys of spare disks!
[not found] ` <200503091704.j29H4l517152@www.watkins-home.com>
@ 2005-03-10 19:24 ` Derek Piper
0 siblings, 0 replies; 22+ messages in thread
From: Derek Piper @ 2005-03-10 19:24 UTC (permalink / raw)
To: linux-raid
Hmmm, yea.. I'm hoping I get a better one next time. I'll bore you to
tears, I mean, let you know when it comes in :D
Derek
PS: Make sure the 'saveauto' is set to on for SMART data to be saved
automatically through power-cycles
i.e.
> smartctl --saveauto=on /dev/hda
you might want to do this too:
> smartctl --offlineauto=on /dev/hda
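If smartd is running it can do the same thing permanently; the directives
below are only an example of what /etc/smartd.conf might contain:

  # monitor the drive, keep attribute autosave and offline testing on,
  # and mail warnings to root
  /dev/hda -a -o on -S on -m root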
On Wed, 9 Mar 2005 12:04:41 -0500, Guy <bugzilla@watkins-home.com> wrote:
> My drives reset the "power on time" when they power on. So I can't be sure
> how long my drives have been on. But at least 6 of them have been in use
> 24/7 for 2.5 years. I have powered them down on occasion. sdh was replaced
> a few months ago. But the problem was with the power cable, not the drive.
> But I did not determine this until after I had swapped the drives. On Jan 18 sdg
> had only 19 entries. Anyway, 2 of them still have zero bad blocks.
>
> And for what it's worth, I have had many problems with re-built disks.
> Once, out of 5 disks, 3 were bad, out of the box. I have learned that
> repaired disks suck, even Seagate.
>
> Guy
>
> Status of my 2.5 year old disks:
> /dev/sdd - 5 entries (40 bytes) in grown table.
> /dev/sde - 0 entries (0 bytes) in grown table.
> /dev/sdf - 12 entries (96 bytes) in grown table.
> /dev/sdg - 21 entries (168 bytes) in grown table.
> /dev/sdh - 0 entries (0 bytes) in grown table.
> /dev/sdi - 6 entries (48 bytes) in grown table.
> /dev/sdj - 0 entries (0 bytes) in grown table.
>
> -----Original Message-----
> From: Derek Piper [mailto:derek.piper@gmail.com]
> Sent: Tuesday, March 08, 2005 12:15 PM
> To: Guy
> Subject: Re: Joys of spare disks!
>
> Heh.. I too have Seagate drives and have RMA'd even the drive they
> sent back to replace the one I RMA'd a month or so ago. Normally
> they're great, so I'm hoping the next replacement will fair better.
>
> The one they sent back had more than 10000 unrecoverable sector errors
> (!) according to smartctl by the time I RMA'd it (the first one, my
> original drive, that I sent back had 3). It failed its self-tests and
> all of Seagate's SeaTools tests too. The drive wasn't mishandled by
> me, although I thought Seagate's own packaging looked a bit crap (two
> pieces of black plastic suspending the drive within a plastic 'shell'
> (the so called 'SeaShell' :>) within the box. No peanuts or bubble
> wrap or anything else. Still, the drive worked at first and then
> within 48hrs had problems.
>
> I have other Seagate drives that have been run for over 15000 without
> a single even reallocated bad sector. I might have to try Peter's
> patch though, since I am running them in RAID-1.
>
> Derek
>
> /trying to un-lurk on the list (i.e. read the 120 threads that have
> appeared since I forgot about checking my gmail account)
>
> On Wed, 2 Mar 2005 00:53:51 -0500, Guy <bugzilla@watkins-home.com> wrote:
> > That has not been my experience, but I have Seagate drives!
> >
> > Guy
> >
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org
> > [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Brad Campbell
> > Sent: Tuesday, March 01, 2005 11:57 PM
> > To: Robin Bowes
> > Cc: linux-raid@vger.kernel.org
> > Subject: Re: Joys of spare disks!
> >
> > Robin Bowes wrote:
> >
> > > Thanks to some advice from Guy the "failed" disk is now back up and
> > > running.
> > >
> > > To fix it I did the following;
> > >
> > <snip>
> >
> > Just watch that disk like a hawk. I had two disks fail recently in the
> same
> > way, I did exactly what
> > you did and 2 days later they both started to grow defects again and got
> > kicked out of the array. I
> > RMA'd both of them last week as they would not stay stable for more than a
> > day or two after
> > re-allocation.
> >
> > Regards,
> > Brad
> > --
> > "Human beings, who are almost unique in having the ability
> > to learn from the experience of others, are also remarkable
> > for their apparent disinclination to do so." -- Douglas Adams
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> Derek Piper - derek.piper@gmail.com
> http://doofer.org/
>
>
--
Derek Piper - derek.piper@gmail.com
http://doofer.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2005-03-10 19:24 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-28 14:24 Joys of spare disks! Robin Bowes
2005-02-28 15:04 ` Jon Lewis
2005-02-28 15:23 ` Robin Bowes
2005-02-28 15:54 ` Nicola Fankhauser
2005-02-28 17:04 ` Robin Bowes
2005-02-28 18:58 ` Nicola Fankhauser
2005-02-28 19:25 ` Robin Bowes
2005-03-02 2:48 ` Robin Bowes
2005-03-02 2:59 ` Neil Brown
2005-03-02 3:50 ` Molle Bestefich
2005-03-02 3:52 ` Molle Bestefich
2005-03-02 5:52 ` Guy
2005-03-02 12:05 ` Molle Bestefich
2005-03-02 16:16 ` Guy
2005-03-03 9:37 ` Molle Bestefich
2005-03-02 4:57 ` Brad Campbell
2005-03-02 5:53 ` Guy
[not found] ` <eaa6dfe0503080915276466a1@mail.gmail.com>
2005-03-08 17:15 ` Derek Piper
[not found] ` <200503091704.j29H4l517152@www.watkins-home.com>
2005-03-10 19:24 ` Derek Piper
-- strict thread matches above, loose matches on Subject: below --
2005-03-07 16:36 LinuxRaid
2005-03-07 17:09 ` Peter T. Breuer
2005-03-07 20:15 LinuxRaid