linux-raid.vger.kernel.org archive mirror
* Raid6 array crashed-- 4-disk failure...(?)
@ 2008-09-15  9:04 Maarten
  2008-09-15 10:16 ` Neil Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Maarten @ 2008-09-15  9:04 UTC (permalink / raw)
  To: linux-raid


This weekend I promoted my new 6-disk raid6 array to production use and 
was busy copying data to it overnight. The next morning the machine had 
crashed, and the array is down with an (apparent?) 4-disk failure, as 
witnessed by this info:

md5 : inactive sdj1[2](S) sdb1[5](S) sda1[4](S) sdf1[3](S) sdc1[1](S) 
sdk1[0](S)
       2925435648 blocks

apoc ~ # mdadm --assemble /dev/md5 /dev/sd[abcfjk]1
mdadm: /dev/md5 assembled from 2 drives - not enough to start the array.

apoc log # fdisk -l|grep 4875727
/dev/sda1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdb1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdc1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdf1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdj1        1       60700   487572718+  fd  Linux raid autodetect
/dev/sdk1        1       60700   487572718+  fd  Linux raid autodetect

apoc log # mdadm --examine /dev/sd[abcfjk]1|grep Events
          Events : 0.1057345
          Events : 0.1057343
          Events : 0.1057343
          Events : 0.1057343
          Events : 0.1057345
          Events : 0.1057343

Note: the array was built half-degraded, ie. it misses one disk. This is 
how it was displayed when it was still OK yesterday:

md5 : active raid6 sdk1[0] sdj1[2] sdf1[3] sdc1[1] sdb1[5] sda1[4]
       2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]


By these event counters, one would maybe assume that 4 disks failed 
simultaneously, however weird this may be. But when looking at the other 
info of the examine command, this seems unlikely: all drives report (I 
think) that they were online until the end, except for two drives. The 
first drive of those two is the one that reports it has failed. The 
second is the one that 'sees' that that first drive did fail. All the 
others seem oblivious to that...  I included that data below at the end.

My questions...

1) Is my analysis correct so far ?
2) Can/should I try to assemble --force, or is that very bad in these 
circumstances?
3) Should I say farewell to my ~2400 GB of data ? :-(
4) If it was only a one-drive failure, why did it kill the array ?
5) Any insight as to how this happened / can be prevented in future ?

Thanks in advance !
Maarten



apoc log # mdadm --examine /dev/sd[abcfjk]1
/dev/sda1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:17:07 2008
           State : active
  Active Devices : 5
Working Devices : 5
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c5374ca - correct
          Events : 0.1057345

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     4       8        1        4      active sync   /dev/sda1

    0     0       0        0        0      removed
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed
/dev/sdb1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:16:06 2008
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c53748e - correct
          Events : 0.1057343

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     5       8       17        5      active sync   /dev/sdb1

    0     0       8      161        0      active sync   /dev/sdk1
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed
/dev/sdc1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:16:06 2008
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c537496 - correct
          Events : 0.1057343

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

    0     0       8      161        0      active sync   /dev/sdk1
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed
/dev/sdf1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:16:06 2008
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c5374ca - correct
          Events : 0.1057343

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8       81        3      active sync   /dev/sdf1

    0     0       8      161        0      active sync   /dev/sdk1
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed
/dev/sdj1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:17:07 2008
           State : active
  Active Devices : 5
Working Devices : 5
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c537556 - correct
          Events : 0.1057345

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     2       8      145        2      active sync   /dev/sdj1

    0     0       0        0        0      removed
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed
/dev/sdk1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 999c61f3:c632ab84:b78500dd:1e5b1429
   Creation Time : Sun Jan 13 18:10:14 2008
      Raid Level : raid6
   Used Dev Size : 487572608 (464.99 GiB 499.27 GB)
      Array Size : 2437863040 (2324.93 GiB 2496.37 GB)
    Raid Devices : 7
   Total Devices : 6
Preferred Minor : 5

     Update Time : Mon Sep 15 05:16:06 2008
           State : active
  Active Devices : 6
Working Devices : 6
  Failed Devices : 1
   Spare Devices : 0
        Checksum : 8c537514 - correct
          Events : 0.1057343

      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8      161        0      active sync   /dev/sdk1

    0     0       8      161        0      active sync   /dev/sdk1
    1     1       8       33        1      active sync   /dev/sdc1
    2     2       8      145        2      active sync   /dev/sdj1
    3     3       8       81        3      active sync   /dev/sdf1
    4     4       8        1        4      active sync   /dev/sda1
    5     5       8       17        5      active sync   /dev/sdb1
    6     6       0        0        6      faulty removed




* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15  9:04 Raid6 array crashed-- 4-disk failure...(?) Maarten
@ 2008-09-15 10:16 ` Neil Brown
  2008-09-15 16:32   ` Maarten
  2008-09-15 11:03 ` Peter Grandi
  2008-09-15 12:59 ` Andre Noll
  2 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2008-09-15 10:16 UTC (permalink / raw)
  To: Maarten; +Cc: linux-raid

On Monday September 15, maarten@ultratux.net wrote:
> 
> This weekend I promoted my new 6-disk raid6 array to production use and 
> was busy copying data to it overnight. The next morning the machine had 
> crashed, and the array is down with an (apparent?) 4-disk failure, as 
> witnessed by this info:

Pity about that crash.  I don't suppose there are any useful kernel
logs leading up to it.  Maybe the machine needs more burn-in testing
before going into production?


> 
> md5 : inactive sdj1[2](S) sdb1[5](S) sda1[4](S) sdf1[3](S) sdc1[1](S) 
> sdk1[0](S)
>        2925435648 blocks

That suggests that the kernel tried to assemble the array, but failed
because it was too degraded.

> 
> apoc ~ # mdadm --assemble /dev/md5 /dev/sd[abcfjk]1
> mdadm: /dev/md5 assembled from 2 drives - not enough to start the array.
> 
> apoc log # fdisk -l|grep 4875727
> /dev/sda1        1       60700   487572718+  fd  Linux raid autodetect
> /dev/sdb1        1       60700   487572718+  fd  Linux raid autodetect
> /dev/sdc1        1       60700   487572718+  fd  Linux raid autodetect
> /dev/sdf1        1       60700   487572718+  fd  Linux raid autodetect
> /dev/sdj1        1       60700   487572718+  fd  Linux raid autodetect
> /dev/sdk1        1       60700   487572718+  fd  Linux raid autodetect
> 
> apoc log # mdadm --examine /dev/sd[abcfjk]1|grep Events
>           Events : 0.1057345
>           Events : 0.1057343
>           Events : 0.1057343
>           Events : 0.1057343
>           Events : 0.1057345
>           Events : 0.1057343
> 

So sda1 and sdj1 are newer, but not by much.
Looking at the full --examine output below, the time difference
between 1057343 and 1057345 is 61 seconds.  That is probably one or
two device timeouts.

'a' and 'j' think that 'k' failed and was removed.  Everyone else
thinks that the world is a happy place.

So I suspect that an IO to k failed, and the attempt to update the
metadata worked on 'a' and 'j' but not anywhere else.  So then the
array just stopped.  When md tried to update 'a' and 'j' with the new
failure information, it failed on them as well.
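
In case it helps to see how I read that: comparing the Events and
Update Time fields across all the members shows which superblocks got
the last update and when.  Something along these lines should do it,
with whatever device list applies on your side:

  mdadm --examine /dev/sd[abcfjk]1 | grep -E 'Update Time|Events'

The two members at event 1057345 carry an Update Time 61 seconds later
than the rest.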

> Note: the array was built half-degraded, ie. it misses one disk. This is 
> how it was displayed when it was still OK yesterday:
> 
> md5 : active raid6 sdk1[0] sdj1[2] sdf1[3] sdc1[1] sdb1[5] sda1[4]
>        2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
> 
> 
> By these event counters, one would maybe assume that 4 disks failed 
> simultaneously, however weird this may be. But when looking at the other 
> info of the examine command, this seems unlikely: all drives report (I 
> think) that they were online until the end, except for two drives. The 
> first drive of those two is the one that reports it has failed. The 
> second is the one that 'sees' that that first drive did fail. All the 
> others seem oblivious to that...  I included that data below at the end.

Not quite.  'k' is reported as failed, 'a' and 'j' know this.


> 
> My questions...
> 
> 1) Is my analysis correct so far ?

Not exactly, but fairly close.

> 2) Can/should I try to assemble --force, or is that very bad in these 
> circumstances?

Yes, you should assemble with --force.  The evidence is strong that
nothing was successfully written after 'k' failed, so all the data
should be consistent.  You will need to sit through a recovery which
probably won't make any changes, but it is certainly safest to let it
try.
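
For reference, the shape of the command would be something like the
following (same device list as in your report - do double-check it
before running):

  mdadm --assemble --force /dev/md5 /dev/sd[abcfjk]1

mdadm should then bump the stale event counts on the out-of-date
members and start the array degraded.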


> 3) Should I say farewell to my ~2400 GB of data ? :-(

Not yet.

> 4) If it was only a one-drive failure, why did it kill the array ?

It wasn't just one drive.  Maybe it was a controller/connector
failure.  Maybe when one drive failed it did bad things to the bus.
It is hard to know for sure.
Are these drives SATA or SCSI or SAS or ???

> 5) Any insight as to how this happened / can be prevented in future ?

See above.
You need to identify the failing component and correct it - either
replace or re-seat or whatever is needed.
Finding the failing component is not easy.   Lots of burn-in testing
and catching any kernel logs if/when it crashes is your best bet.
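
Checking each drive's own SMART error log is also cheap and sometimes
telling.  Assuming smartmontools is installed, something like

  smartctl -a /dev/sdk        (full report, including the error log)
  smartctl -t long /dev/sdk   (start a long offline self-test)

won't prove a controller or cable is at fault, but it can at least
rule the drive itself in or out.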

Good luck.

NeilBrown


* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15  9:04 Raid6 array crashed-- 4-disk failure...(?) Maarten
  2008-09-15 10:16 ` Neil Brown
@ 2008-09-15 11:03 ` Peter Grandi
  2008-09-15 16:57   ` Maarten
  2008-09-15 12:59 ` Andre Noll
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Grandi @ 2008-09-15 11:03 UTC (permalink / raw)
  To: Linux RAID


> This weekend I promoted my new 6-disk raid6 array to
> production use and was busy copying data to it overnight. The
> next morning the machine had crashed, and the array is down
> with an (apparent?) 4-disk failure, [ ... ]

Multiple drive failures are far more common than people expect,
and the problem lies in people's expectations, because they don't
do common mode analysis (what's what? many will think).

They typically happen all at once at power up, or in short
succession (e.g. 2nd drive fails while syncing to recover from
1st failure).

The typical RAID has N drives from the same manufacturer, of the
same model, with nearly contiguous serial numbers, from the same
shipping carton, in an enclosure where they all are started and
stopped at the same time, run on the same power circuit, at the
same temperature, on much the same load, attached to the same
host adapter or N of the same type. Expecting as many do to have
uncorrelated failures is rather comical.

1) Is my analysis correct so far ?

Not so sure :-). Consider this interesting discrepancy:

  /dev/sda1:
  [ ... ]
      Raid Devices : 7
     Total Devices : 6
  [ ... ]
    Active Devices : 5
  Working Devices : 5

  /dev/sdb1:
  [ ... ]
      Raid Devices : 7
     Total Devices : 6
  [ ... ]
    Active Devices : 6
  Working Devices : 6

Also note that member 0, 'sdk1' is listed as "removed", but not
faulty, in some member statuses. However you have been able to
actually get the status out of all members, including 'sdk1',
which reports itself as 'active', like all other drives as of
5:16. Then only 2 drives report themselves as 'active' as of
5:17, and those think that the array has 5 'active'/'working'
devices at that time. What happened between 5:16 and 5:17?

You should look at your system log to figure out what really
happened to your drives and then assess what the cause of the
failure was and its impact.
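
A quick way to pull the relevant lines out (the path and patterns are
only an example, adjust them to your syslog setup):

  grep -iE 'ata[0-9]|sd[a-l]|md[0-9:]' /var/log/messages | less

Any link resets, timeouts or "kicking non-fresh" messages around
5:16-5:17 would tell you a lot.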

3) Should I say farewell to my ~2400 GB of data ? :-(

Surely not -- you have a backup of those 2400GB, as obvious from
"busy copying data to it". RAID is not backup anyhow :-).

4) If it was only a one-drive failure, why did it kill the array ?

The MD subsystem marked more than one drive as bad. Anyhow, doing
a 5+2 RAID6 and then loading it with data, with a checksum drive
missing and while it is syncing at the same time, seems a bit too
clever to me. Right now the array is in effect running in RAID0 mode,
so I would not trust it even if you are able to restart it.


* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15  9:04 Raid6 array crashed-- 4-disk failure...(?) Maarten
  2008-09-15 10:16 ` Neil Brown
  2008-09-15 11:03 ` Peter Grandi
@ 2008-09-15 12:59 ` Andre Noll
  2008-09-15 17:14   ` Maarten
  2 siblings, 1 reply; 15+ messages in thread
From: Andre Noll @ 2008-09-15 12:59 UTC (permalink / raw)
  To: Maarten; +Cc: linux-raid


On 11:04, Maarten wrote:
> 
> This weekend I promoted my new 6-disk raid6 array to production use and 
> was busy copying data to it overnight. The next morning the machine had 
> crashed, and the array is down with an (apparent?) 4-disk failure, as 
> witnessed by this info:

Believe it or not: The same thing (6-disk raid6, 4 disks failed)
happened also to me during this weekend.

> 4) If it was only a one-drive failure, why did it kill the array ?

As others have already pointed out, this was not a one-drive
failure. In my case, the two SATA disks which are still functional
are connected to a 3ware controller while the four failed disks use
the onboard SATA controller [1]. Therefore I'm confident that this
is just a problem with the onboard SATA chip and that the array can
be assembled again after a reboot. I'll have to wait until the end
of the week to reboot that machine though.

Are you also using an Intel-based SATA chip (please send the output
of lspci -v)?  Also, which kernel version are you using?

> 5) Any insight as to how this happened / can be prevented in future ?

Don't use cheap hardware (Fast, cheap, good. Pick two) ;)

Andre

[1] 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA
Storage Controller AHCI (rev 09)
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 10:16 ` Neil Brown
@ 2008-09-15 16:32   ` Maarten
  2008-09-15 20:57     ` Maarten
  0 siblings, 1 reply; 15+ messages in thread
From: Maarten @ 2008-09-15 16:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> On Monday September 15, maarten@ultratux.net wrote:
>> This weekend I promoted my new 6-disk raid6 array to production use and 
>> was busy copying data to it overnight. The next morning the machine had 
>> crashed, and the array is down with an (apparent?) 4-disk failure, as 
>> witnessed by this info:
> 
> Pity about that crash.  I don't suppose there are any useful kernel
> logs leading up to it.  Maybe the machine needs more burn-in testing
> before going into production?

The thing is, I tested the array for months on a new install that was 
running on spare hardware. Then this weekend I swapped the new OS 
together with the new disks to the fileserver. The fileserver was 
running well on the old OS. So indeed, maybe there is a mismatch between 
the new kernel and the hardware... But I did test-drive the raid-6 code 
for a couple of months.

>> md5 : inactive sdj1[2](S) sdb1[5](S) sda1[4](S) sdf1[3](S) sdc1[1](S) 
>> sdk1[0](S)
>>        2925435648 blocks
> 
> That suggests that the kernel tried to assemble the array, but failed
> because it was too degraded.
> 
>> apoc ~ # mdadm --assemble /dev/md5 /dev/sd[abcfjk]1
>> mdadm: /dev/md5 assembled from 2 drives - not enough to start the array.
>>
>> apoc log # fdisk -l|grep 4875727
>> /dev/sda1        1       60700   487572718+  fd  Linux raid autodetect
>> /dev/sdb1        1       60700   487572718+  fd  Linux raid autodetect
>> /dev/sdc1        1       60700   487572718+  fd  Linux raid autodetect
>> /dev/sdf1        1       60700   487572718+  fd  Linux raid autodetect
>> /dev/sdj1        1       60700   487572718+  fd  Linux raid autodetect
>> /dev/sdk1        1       60700   487572718+  fd  Linux raid autodetect
>>
>> apoc log # mdadm --examine /dev/sd[abcfjk]1|grep Events
>>           Events : 0.1057345
>>           Events : 0.1057343
>>           Events : 0.1057343
>>           Events : 0.1057343
>>           Events : 0.1057345
>>           Events : 0.1057343
>>
> 
> So sda1 and sdj1 are newer, but not by much.
> Looking at the full --examine output below, the time difference
> between 1057343 and 1057345 is 61 seconds.  That is probably one or
> two device timeouts.

Ah. How can you tell, I did not know this...

> 'a' and 'j' think that 'k' failed and was removed.  Everyone else
> thinks that the world is a happy place.
> 
> So I suspect that an IO to k failed, and the attempt to update the
> metadata worked on 'a' and 'j' but not anywhere else.  So then the
> array just stopped.  When md tried to update 'a' and 'j' with the new
> failure information, it failed on them as well.
> 
>> Note: the array was built half-degraded, ie. it misses one disk. This is 
>> how it was displayed when it was still OK yesterday:
>>
>> md5 : active raid6 sdk1[0] sdj1[2] sdf1[3] sdc1[1] sdb1[5] sda1[4]
>>        2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
>>
>>
>> By these event counters, one would maybe assume that 4 disks failed 
>> simultaneously, however weird this may be. But when looking at the other 
>> info of the examine command, this seems unlikely: all drives report (I 
>> think) that they were online until the end, except for two drives. The 
>> first drive of those two is the one that reports it has failed. The 
>> second is the one that 'sees' that that first drive did fail. All the 
>> others seem oblivious to that...  I included that data below at the end.
> 
> Not quite.  'k' is reported as failed, 'a' and 'j' know this.
> 
> 
>> My questions...
>>
>> 1) Is my analysis correct so far ?
> 
> Not exactly, but fairly close.
> 
>> 2) Can/should I try to assemble --force, or is that very bad in these 
>> circumstances?
> 
> Yes, you should assemble with --force.  The evidence is strong that
> nothing was successfully written after 'k' failed, so all the data
> should be consistent.  You will need to sit through a recovery which
> probably won't make any changes, but it is certainly safest to let it
> try.
> 
> 
>> 3) Should I say farewell to my ~2400 GB of data ? :-(
> 
> Not yet.
> 
>> 4) If it was only a one-drive failure, why did it kill the array ?
> 
> It wasn't just one drive.  Maybe it was a controller/connector
> failure.  Maybe when one drive failed it did bad things to the bus.
> It is hard to know for sure.
> Are these drives SATA or SCSI or SAS or ???

Eh, SATA. The machine has 4 4-port SATA controllers on 33MHz PCI busses.
Yes, that kills performance, but what can you do. It still outperforms 
the network.
Re-seating the PCI cards may be a good idea. However, I think (am sure) 
the drives were not on the same controllers: a thru d are on card #1, e 
thru h on the second card, etc.
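
(If I want to double-check that mapping I can read it straight out of
sysfs; the exact paths may vary per kernel, but something like

  ls -l /sys/block/sd*/device

shows which PCI address, i.e. which SiI card, each sdX hangs off.)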

>> 5) Any insight as to how this happened / can be prevented in future ?
> 
> See above.
> You need to identify the failing component and correct it - either
> replace or re-seat or whatever is needed.
> Finding the failing component is not easy.   Lots of burn-in testing
> and catching any kernel logs if/when it crashes is your best bet.

Ok, I'll read up on using the MagicSysRQ, too. The logs were completely 
empty at the time of the crash and the keyboard was unresponsive, so it 
was a full kernel panic.
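
(Note to self: enabling it should just be a matter of a kernel built
with CONFIG_MAGIC_SYSRQ plus

  echo 1 > /proc/sys/kernel/sysrq

(or kernel.sysrq = 1 in /etc/sysctl.conf), so that Alt+SysRq+t and
Alt+SysRq+w can dump task state to the console on the next hang,
assuming the console still gets anything out at that point.)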

> Good luck.

Thanks for your help Neil !

Maarten

> NeilBrown



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 11:03 ` Peter Grandi
@ 2008-09-15 16:57   ` Maarten
  2008-09-16 19:06     ` Bill Davidsen
  0 siblings, 1 reply; 15+ messages in thread
From: Maarten @ 2008-09-15 16:57 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID

Peter Grandi wrote:
>> This weekend I promoted my new 6-disk raid6 array to
>> production use and was busy copying data to it overnight. The
>> next morning the machine had crashed, and the array is down
>> with an (apparent?) 4-disk failure, [ ... ]
> 
> Multiple drive failures are far more common than people expect,
> and the problem lies in people's expectations, because they don't
> do common mode analysis (what's what? many will think).

It IS more common indeed. I'm on my seventh or eighth raid-5 array now, 
the first was a 4-disk raid5 40(120) GB array. I've had 4 or 5 two-disk 
failures happen to me over the years, invariably during rebuild, indeed.
This is why I'm switching over to raid-6, by the way.

I did not, at any point, lose the array with the two-disk failures 
though. I intelligently cloned bad drives with dd_rescue and reassembled 
those degraded arrays using the new disks and thus got my data back.
But still, such events tend to keep me busy for a whole weekend, which 
is not too pleasant.

> They typically happen all at once at power up, or in short
> succession (e.g. 2nd drive fails while syncing to recover from
> 1st failure).
> 
> The typical RAID has N drives from the same manufacturer, of the
> same model, with nearly contiguous serial numbers, from the same
> shipping carton, in an enclosure where they all are started and
> stopped at the same time, run on the same power circuit, at the
> same temperature, on much the same load, attached to the same
> host adapter or N of the same type. Expecting as many do to have
> uncorrelated failures is rather comical.

This is true. However, since I know this fact I tend to take care to not 
make it too vulnerable; the system is incredibly well cooled, it has 8 
80mm fans that cool the 16(!) disks, I buy disks in batches of 2, from 
different brands and vendors. It indeed has just one PSU, but I chose a 
good one, I think it's a Tagan 550 Watt unit.

In fact -this is my home system- since I cannot afford a DLT drive for 
this much data I practically have no backup, so I really spend a lot of 
effort making sure the array stays ok. Yes, I know, this not a good 
idea, but how do I economically backup 3 TB ?
In practice I have older disks and/or decommissioned arrays with 
"backups" but this is of course not up to date at all.

> 1) Is my analysis correct so far ?
> 
> Not so sure :-). Consider this interesting discrepancy:
> 
>   /dev/sda1:
>   [ ... ]
>       Raid Devices : 7
>      Total Devices : 6
>   [ ... ]
>     Active Devices : 5
>   Working Devices : 5
> 
>   /dev/sdb1:
>   [ ... ]
>       Raid Devices : 7
>      Total Devices : 6
>   [ ... ]
>     Active Devices : 6
>   Working Devices : 6
> 
> Also note that member 0, 'sdk1' is listed as "removed", but not
> faulty, in some member statuses. However you have been able to
> actually get the status out of all members, including 'sdk1',
> which reports itself as 'active', like all other drives as of
> 5:16. Then only 2 drives report themselves as 'active' as of
> 5:17, and those think that the array has 5 'active'/'working'
> devices at that time. What happened between 5:16 and 5:17?

Don't know, I was asleep ;-)
Seriously, the system experienced a hard crash. Not even the keyboard 
responded to the capslock key/led anymore. Logs are empty.

> You should look at your system log to figure out what really
> happened to your drives and then assess what the cause of the
> failure was and its impact.

Syslogs are empty. Not one line nor even a hint at that time.

> 3) Should I say farewell to my ~2400 GB of data ? :-(
> 
> Surely not -- you have a backup of those 2400GB, as obvious from
> "busy copying data to it". RAID is not backup anyhow :-).

Yes I have most of the data. What I'd lose is ~20 GB, which is less than 
one percent ;-).  But still, it's a lot of bytes...

> 4) If it was only a one-drive failure, why did it kill the array ?
> 
> The MD subsystem marked more than one drive as bad. Anyhow, doing
> a 5+2 RAID6 and then loading it with data, with a checksum drive
> missing and while it is syncing at the same time, seems a bit too
> clever to me. Right now the array is in effect running in RAID0 mode,
> so I would not trust it even if you are able to restart it.

Just bought a seventh/replacement disk...  But if the array is lost that 
is of little use.  I'll try to reassemble later tonight...

Thanks,
Maarten




* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 12:59 ` Andre Noll
@ 2008-09-15 17:14   ` Maarten
  2008-09-16  8:25     ` Andre Noll
  0 siblings, 1 reply; 15+ messages in thread
From: Maarten @ 2008-09-15 17:14 UTC (permalink / raw)
  To: Andre Noll; +Cc: linux-raid

Andre Noll wrote:
> On 11:04, Maarten wrote:
>> This weekend I promoted my new 6-disk raid6 array to production use and 
>> was busy copying data to it overnight. The next morning the machine had 
>> crashed, and the array is down with an (apparent?) 4-disk failure, as 
>> witnessed by this info:
> 
> Believe it or not: The same thing (6-disk raid6, 4 disks failed)
> happened also to me during this weekend.

Hehe. It doesn't get more scary than this.... ;-)

>> 4) If it was only a one-drive failure, why did it kill the array ?
> 
> As others have already pointed out, this was not a one-drive
> failure. In my case, the two SATA disks which are still functional
> are connected to a 3ware controller while the four failed disks use
> the onboard SATA controller [1]. Therefore I'm confident that this
> is just a problem with the onboard SATA chip and that the array can
> be assembled again after a reboot. I'll have to wait until the end
> of the week to reboot that machine though.
> 
> Are you also using an Intel-based SATA chip (please send the output
> of lspci -v)?  Also, which kernel version are you using?

No, my chipset is a VIA one. Because the VIA SATA chips/drivers are 
terrible, I use only SATA PCI cards with Sil chipsets.  Believe it or 
not I have good/excellent experiences with these. The driver is quite 
stable, better than everything else I tried.

apoc log # lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] 
Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:07.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:08.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:09.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0a.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial 
ATA Controller (rev 02)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 
Gigabit Ethernet (rev 10)
00:0f.0 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge 
[KT600/K8T800/K8T890 South]
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP 2X 
(rev 65)

Linux apoc 2.6.23-gentoo-r3 #2 Fri Apr 25 11:09:37 CEST 2008 i686 AMD 
Sempron(tm) 2200+ AuthenticAMD GNU/Linux


>> 5) Any insight as to how this happened / can be prevented in future ?
> 
> Don't use cheap hardware (Fast, cheap, good. Pick two) ;)

How true. In this case I think (or hope) I went for "cheap, good"... 
Sixteen disks on 4 PCI slots (but still a single PCI bus!) is far from 
fast indeed. ;-) I get a rebuild speed of 20436K/sec on a 5-disk raid5 
array (SATA 250 GB disks), which is not terrible, but not fast either.

I'm considering a 8/12/16 port Areca controller but a few practicalities 
hold me back: the price, and the fact I would need a PCI-X slot unless I 
want to kill performance by a factor of 10. Also, the fact that I then 
cannot use software raid anymore tends to scare me a little: You never 
know how firmware reacts in the more 'interesting' circumstances, and 
you lose control over it...

Maarten

> Andre
> 
> [1] 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA
> Storage Controller AHCI (rev 09)



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 16:32   ` Maarten
@ 2008-09-15 20:57     ` Maarten
  2008-09-16 13:12       ` Andre Noll
  0 siblings, 1 reply; 15+ messages in thread
From: Maarten @ 2008-09-15 20:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Replying to myself...

Maarten wrote:
> Neil Brown wrote:
>> On Monday September 15, maarten@ultratux.net wrote:
>>> This weekend I promoted my new 6-disk raid6 array to production use 
>>> and was busy copying data to it overnight. The next morning the 
>>> machine had crashed, and the array is down with an (apparent?) 4-disk 
>>> failure, as witnessed by this info:

>> So sda1 and sdj1 are newer, but not by much.
>> Looking at the full --examine output below, the time difference
>> between 1057343 and 1057345 is 61 seconds.  That is probably one or
>> two device timeouts.
> 
> Ah. How can you tell, I did not know this...

(Duh...)

>>> 2) Can/should I try to assemble --force, or is that very bad in these 
>>> circumstances?
>>
>> Yes, you should assemble with --force.  The evidence is strong that
>> nothing was successfully written after 'k' failed, so all the data
>> should be consistent.  You will need to sit through a recovery which
>> probably won't make any changes, but it is certainly safest to let it
>> try.

I did some rewiring, verified the PCI connections, rearranged the order 
of the drives, changed the Realtek Gbit card to an Intel one (to be safe -- 
I did experience an earlier crash, very possibly due to the eth card...), 
updated the kernel, added the seventh drive and booted.
I did the --assemble --force thingy and all seems to be well, as far as 
one can see at this point that is:

apoc ~ # mdadm --assemble --force /dev/md5  /dev/sd[fhijkl]1
mdadm: forcing event count in /dev/sdl1(0) from 1057343 upto 1057345
mdadm: forcing event count in /dev/sdj1(1) from 1057343 upto 1057345
mdadm: forcing event count in /dev/sdf1(3) from 1057343 upto 1057345
mdadm: forcing event count in /dev/sdi1(5) from 1057343 upto 1057345
mdadm: /dev/md5 has been started with 6 drives (out of 7).

md5 : active raid6 sdl1[0] sdi1[5] sdh1[4] sdf1[3] sdk1[2] sdj1[1]
       2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]

apoc ~ # pvscan;vgscan;lvscan;vgchange -a y
apoc ~ # xfs_check /dev/volume/video
ERROR: The filesystem has valuable metadata changes in a log which needs 
to be replayed.  Mount the filesystem to replay the log, and unmount it 
before re-running xfs_check.  If you are unable to mount the filesystem, 
then use the xfs_repair -L option to destroy the log and attempt a 
repair. Note that destroying the log may cause corruption -- please 
attempt a mount of the filesystem before doing this.
apoc ~ # mount /dev/volume/video  /video/
apoc ~ # umount /dev/volume/video
apoc ~ # xfs_check /dev/volume/video
apoc ~ #

But: the array did not resync. I think this may be correct, but my 
understanding of raid-6 is still a bit flaky. It is degraded, but not 
fully degraded; that would mean two drives missing, as it is raid-6. So 
there is indeed parity information now. Do I have to force some resync ?
Or did you mean to --assemble five disks instead of six, and hot-add the 
sixth ? If so, is it still useful to do that, or is it either too late 
or pointless ?

I'm planning to add the last drive to it, to make it fully synced. 
However, since these are 500GB SATA drives and a resync of my smaller 
raid-5 array with just 5 250GB disks takes 180 mins, my guesstimate 
would be that it takes at least 7 hours to resync. Maybe this is not the 
best time to do that kind of stress test, what with the possible 
instabilities still in there... But do I first have to make sure it is 
consistent as is...?
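
(When I do get to it, my understanding is that the hot-add itself is
just something like

  mdadm --add /dev/md5 /dev/sdX1

with sdX1 being whatever the partition on the new seventh drive turns
out to be, and that the rebuild rate can be throttled while the machine
is still suspect via /proc/sys/dev/raid/speed_limit_max (in KB/sec).
Corrections welcome if I have that wrong.)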

I think the hardware, as it stands, is at least fairly stable; probably 
due to the same crash my smaller 5-disk array resynced today (in 180 
mins) and there have been no errors or malfunctions during this process 
this morning.

>>> 3) Should I say farewell to my ~2400 GB of data ? :-(
>>
>> Not yet.

Indeed. :-)

>> NeilBrown

Thanks again.

For completeness sake: this is the current kernel as of now:
Linux apoc 2.6.25-gentoo-r7 #1 Mon Sep 15 20:35:31 CEST 2008 i686 AMD 
Sempron(tm) 2200+ AuthenticAMD GNU/Linux

...and lspci:
apoc ~ # lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] 
Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:07.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:08.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:09.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0a.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial 
ATA Controller (rev 02)
00:0b.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller
00:0f.0 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge 
[KT600/K8T800/K8T890 South]
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP 2X 
(rev 65)

Regards, and thanks everyone for the timely and perfect help !
Maarten




* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 17:14   ` Maarten
@ 2008-09-16  8:25     ` Andre Noll
  2008-09-16 17:50       ` Maarten
  0 siblings, 1 reply; 15+ messages in thread
From: Andre Noll @ 2008-09-16  8:25 UTC (permalink / raw)
  To: Maarten; +Cc: linux-raid


On 19:14, Maarten wrote:
> >Are you also using an Intel-based SATA chip (please send the output
> >of lspci -v)?  Also, which kernel version are you using?
> 
> No, my chipset is a VIA one. Because the VIA SATA chips/drivers are 
> terrible, I use only SATA PCI cards with Sil chipsets.  Believe it or 
> not I have good/excellent experiences with these. The driver is quite 
> stable, better than everything else I tried.
> 
> apoc log # lspci
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] 
> Host Bridge (rev 80)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
> 00:07.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
> [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 00:08.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
> [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 00:09.0 RAID bus controller: Silicon Image, Inc. SiI 3114 
> [SATALink/SATARaid] Serial ATA Controller (rev 02)
> 00:0a.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial 
> ATA Controller (rev 02)
> 00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 
> Gigabit Ethernet (rev 10)
> 00:0f.0 IDE interface: VIA Technologies, Inc. 
> VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge 
> [KT600/K8T800/K8T890 South]
> 01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP 2X 
> (rev 65)
> 
> Linux apoc 2.6.23-gentoo-r3 #2 Fri Apr 25 11:09:37 CEST 2008 i686 AMD 
> Sempron(tm) 2200+ AuthenticAMD GNU/Linux

My machine is running vanilla 2.6.25.4, i.e. we're using different
SATA drivers and different kernels.

While looking at the logs I found plenty of these:

	set_rtc_mmss: can't update from 150 to 43

And indeed this machine started to have serious problems with its
clock since last weekend. I found it off by 12 hours yesterday and it
is still running much too fast so that ntp is not working any more. I'm
currently setting the time with a script in 10min intervals...

Were you also seeing such messages during/after the hard disk failures?

> I'm considering a 8/12/16 port Areca controller but a few practicalities 
> hold me back: the price, and the fact I would need a PCI-X slot unless I 
> want to kill performance by a factor of 10. Also, the fact that I then 
> cannot use software raid anymore tends to scare me a little: You never 
> know how firmware reacts in the more 'interesting' circumstances, and 
> you lose control over it...

You could use jbod mode (or create single-disk "raid arrays") with
Areca or 3ware controllers and use software raid on top of that.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 20:57     ` Maarten
@ 2008-09-16 13:12       ` Andre Noll
  0 siblings, 0 replies; 15+ messages in thread
From: Andre Noll @ 2008-09-16 13:12 UTC (permalink / raw)
  To: Maarten; +Cc: Neil Brown, linux-raid


On 22:57, Maarten wrote:

> apoc ~ # mdadm --assemble --force /dev/md5  /dev/sd[fhijkl]1
> mdadm: forcing event count in /dev/sdl1(0) from 1057343 upto 1057345
> mdadm: forcing event count in /dev/sdj1(1) from 1057343 upto 1057345
> mdadm: forcing event count in /dev/sdf1(3) from 1057343 upto 1057345
> mdadm: forcing event count in /dev/sdi1(5) from 1057343 upto 1057345
> mdadm: /dev/md5 has been started with 6 drives (out of 7).

So four of the six disks were not up to date. As Neil said, the
difference might just be the sb updates, so I'd guess the array
is fine.

> But: The array did not resync. I think this may be correct but my 
> understanding of raid-6 is still a bit flaky. It is degraded, but not 
> fully degraded, that would mean two drives missing as it is raid-6. So 
> there is indeed parity information now.

Yes, there is parity information. Raid6 uses two parities, frequently
called P and Q, where P is the ordinary xor parity that is also used
in raid5. So each 6-tuple of corresponding disk blocks has exactly
one of the following possible contents:

	D, D, D, D, D, P (Q is missing)
	D, D, D, D, D, Q (P is missing)
	D, D, D, D, P, Q (one of the D's is missing)

Only in the last case, one of the five data blocks is missing. In
this case the raid code uses the P parity to compute the contents of
the missing data block, which is cheap.
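
Sketching it for the five data blocks D_0..D_4 of one stripe: P is the
plain xor,

	P = D_0 xor D_1 xor D_2 xor D_3 xor D_4

while Q is a weighted sum over the Galois field GF(2^8),

	Q = g^0*D_0 + g^1*D_1 + g^2*D_2 + g^3*D_3 + g^4*D_4

with g a generator of that field. The different weights are what let
the code solve for two unknown blocks when two devices are missing.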

IOW, you're as safe as with raid5 ATM. However, there are other reasons
for not running a degraded raid6 array for longer than absolutely
necessary. One of them being that writes to a degraded raid6 array
are significantly slower.

Before adding the seventh disk you might want to check the parities
using mdadm --update=resync. From the mdadm man page:

	The resync option will cause the array to be marked dirty
	meaning that any redundancy in the array (e.g. parity for
	raid5, copies for raid1) may be incorrect.  This will cause
	the raid system to perform a "resync" pass to make sure that
	all redundant information is correct.

I wonder why raid6 isn't mentioned there. Neil, are there any reasons
for this?

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-16  8:25     ` Andre Noll
@ 2008-09-16 17:50       ` Maarten
  2008-09-16 18:12         ` Maarten
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Maarten @ 2008-09-16 17:50 UTC (permalink / raw)
  To: Andre Noll; +Cc: linux-raid

Andre Noll wrote:
> On 19:14, Maarten wrote:

>> Linux apoc 2.6.23-gentoo-r3 #2 Fri Apr 25 11:09:37 CEST 2008 i686 AMD 
>> Sempron(tm) 2200+ AuthenticAMD GNU/Linux
> 
> My machine is running vanilla 2.6.25.4, i.e. we're using different
> SATA drivers and different kernels.
> 
> While looking at the logs I found plenty of these:
> 
> 	set_rtc_mmss: can't update from 150 to 43
> 
> And indeed this machine started to have serious problems with its
> clock since last weekend. I found it off by 12 hours yesterday and it
> is still running much too fast so that ntp is not working any more. I'm
> currently setting the time with a script in 10min intervals...
> 
> Were you also seeing such messages during/after the hard disk failures?

Nope, nothing there.
Strange that your clock is so erratic. Maybe the BIOS battery is dead; 
then again, I would not be surprised if it runs off mains when powered up.


>> I'm considering a 8/12/16 port Areca controller but a few practicalities 
>> hold me back: the price, and the fact I would need a PCI-X slot unless I 
>> want to kill performance by a factor of 10. Also, the fact that I then 
>> cannot use software raid anymore tends to scare me a little: You never 
>> know how firmware reacts in the more 'interesting' circumstances, and 
>> you lose control over it...
> 
> You could use jbod mode (or create single-disk "raid arrays") with
> Areca or 3ware controllers and use software raid on top of that.

True, but then that would kind of defeat the whole purpose of the fairly 
expensive card. In that case a better investment might be a sort of 
servergrade motherboard which has 6 good onboard controllers and at 
least two separate PCI buses for the add-on cards. (For the price of one 
12 port Areca you can already buy a bare bones low-range server...).

The main selling point of most hardware raid cards is that they seem to 
be doing a much better job predicting failure of a drive than software 
raid can. I don't know how they do that, but a fact is I've never even 
heard of a two-disk failure with hardware raid. Which of course doesn't 
say it cannot happen, but it does seem to be a lot less likely somehow.

> Andre

I'm saving the 15 GB data I did not have any backup of elsewhere now and 
will subsequently start a forced resync and then hot-add the 7th drive.

Maarten


* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-16 17:50       ` Maarten
@ 2008-09-16 18:12         ` Maarten
  2008-09-17  8:25         ` Andre Noll
  2008-09-19 14:55         ` John Stoffel
  2 siblings, 0 replies; 15+ messages in thread
From: Maarten @ 2008-09-16 18:12 UTC (permalink / raw)
  To: linux-raid

Maarten wrote:
> Andre Noll wrote:
>> On 19:14, Maarten wrote:

> 
> I'm saving the 15 GB data I did not have any backup of elsewhere now and 
> will subsequently start a forced resync and then hot-add the 7th drive.

Hm... I'm having some trouble here. In short, I have to stop the array 
to get it to resync, and when I try it like that it doesn't want to 
anyhow. Peculiar. Maybe my mdadm or manpage is out of date, I'll google.


apoc ~ # mdadm --update=resync /dev/md5
mdadm: --update does not set the mode, and so cannot be the first option.
apoc ~ # man mdadm
apoc ~ # mdadm --assemble --update=resync /dev/md5
mdadm: device /dev/md5 already active - cannot assemble it

apoc ~ # mdadm -S /dev/md5
mdadm: stopped /dev/md5
apoc ~ # mdadm --assemble --update=resync /dev/md5
mdadm: failed to RUN_ARRAY /dev/md5: Input/output error
apoc ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]

md5 : inactive sdl1[0] sdi1[5] sdh1[4] sdf1[3] sdk1[2] sdj1[1]
       2925435648 blocks

apoc ~ # mdadm --assemble --update=resync /dev/md5
mdadm: device /dev/md5 already active - cannot assemble it
apoc ~ # mdadm -S /dev/md5
mdadm: stopped /dev/md5
apoc ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]

unused devices: <none>
apoc ~ # mdadm --assemble --update=resync /dev/md5
mdadm: /dev/md5 assembled from 6 drives - not enough to start the array 
while not clean - consider --force.
apoc ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]

md5 : inactive sdl1[0](S) sdi1[5](S) sdh1[4](S) sdf1[3](S) sdk1[2](S) 
sdj1[1](S)
       2925435648 blocks

unused devices: <none>
apoc ~ # mdadm --assemble --update=resync --force /dev/md5
mdadm: /dev/md5 has been started with 6 drives (out of 7).
apoc ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]

md5 : active raid6 sdl1[0] sdi1[5] sdh1[4] sdf1[3] sdk1[2] sdj1[1]
       2437863040 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]


So, I'm back at square one with that.
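
One thing I might try instead, assuming the md sysfs interface in
2.6.25 behaves the way I think it does and is willing to do this on a
degraded raid6 at all, is to trigger a parity check on the running
array rather than fiddling with --update at assembly time:

apoc ~ # echo check > /sys/block/md5/md/sync_action
apoc ~ # cat /proc/mdstat
apoc ~ # cat /sys/block/md5/md/mismatch_cnt

(the last one should read 0 afterwards if the parity was consistent).
I'd appreciate confirmation from someone who knows whether that is
sane here.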

Regards,
Maarten


* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-15 16:57   ` Maarten
@ 2008-09-16 19:06     ` Bill Davidsen
  0 siblings, 0 replies; 15+ messages in thread
From: Bill Davidsen @ 2008-09-16 19:06 UTC (permalink / raw)
  To: Maarten; +Cc: Peter Grandi, Linux RAID

Maarten wrote:
> Peter Grandi wrote:
>>> This weekend I promoted my new 6-disk raid6 array to
>>> production use and was busy copying data to it overnight. The
>>> next morning the machine had crashed, and the array is down
>>> with an (apparent?) 4-disk failure, [ ... ]
>>
>> Multiple drive failures are far more common than people expect,
>> and the problem lies in people's expectations, because they don't
>> do common mode analysis (what's what? many will think).
>
> It IS more common indeed. I'm on my seventh or eighth raid-5 array now, 
> the first was a 4-disk raid5 40(120) GB array. I've had 4 or 5 
> two-disk failures happen to me over the years, invariably during 
> rebuild, indeed.
> This is why I'm switching over to raid-6, by the way.
>
> I did not, at any point, lose the array with the two-disk failures 
> though. I intelligently cloned bad drives with dd_rescue and 
> reassembled those degraded arrays using the new disks and thus got my 
> data back.
> But still, such events tend to keep me busy for a whole weekend, which 
> is not too pleasant.
>
>> They typically happen all at once at power up, or in short
>> succession (e.g. 2nd drive fails while syncing to recover from
>> 1st failure).
>>
>> The typical RAID has N drives from the same manufacturer, of the
>> same model, with nearly contiguous serial numbers, from the same
>> shipping carton, in an enclosure where they all are started and
>> stopped at the same time, run on the same power circuit, at the
>> same temperature, on much the same load, attached to the same
>> host adapter or N of the same type. Expecting as many do to have
>> uncorrelated failures is rather comical.
>
> This is true. However, since I know this fact I tend to take care to 
> not make it too vulnerable; the system is incredibly well cooled, it 
> has 8 80mm fans that cool the 16(!) disks, I buy disks in batches of 
> 2, from different brands and vendors. It indeed has just one PSU, but 
> I chose a good one, I think it's a Tagan 550 Watt unit.
>
> In fact -this is my home system- since I cannot afford a DLT drive for 
> this much data I practically have no backup, so I really spend a lot 
> of effort making sure the array stays ok. Yes, I know, this not a good 
> idea, but how do I economically backup 3 TB ?
> In practice I have older disks and/or decommissioned arrays with 
> "backups" but this is of course not up to date at all.
Given the low cost of USB connected TB drives, I would say "look there" 
rather than expect to be able to keep any system totally reliable.
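
Even a dumb nightly rsync to such a drive would cover the "last 20 GB"
scenario; as a rough sketch, with the mount point only an example:

  rsync -a --delete /video/ /mnt/usb-backup/video/

It is not a real backup with history, but it beats having nothing
newer than a decommissioned array.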

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-16 17:50       ` Maarten
  2008-09-16 18:12         ` Maarten
@ 2008-09-17  8:25         ` Andre Noll
  2008-09-19 14:55         ` John Stoffel
  2 siblings, 0 replies; 15+ messages in thread
From: Andre Noll @ 2008-09-17  8:25 UTC (permalink / raw)
  To: Maarten; +Cc: linux-raid


On 19:50, Maarten wrote:
> >While looking at the logs I found plenty of these:
> >
> >	set_rtc_mmss: can't update from 150 to 43
> >
> >And indeed this machine started to have serious problems with its
> >clock since last weekend. I found it off by 12 hours yesterday and it
> >is still running much too fast so that ntp is not working any more. I'm
> >currently setting the time with a script in 10min intervals...
> >
> >Were you also seeing such messages during/after the hard disk failures?
> 
> Nope, nothing there.
> Strange your clock is so erratic. Maybe the BIOS battery is dead, then 
> again I would not be surprised it runs off mains when powered up.

I don't think it's the battery because this machine is only one year
old. Moreover, AFAIK, the battery-powered clock is only used during
system boot and I haven't rebooted the box yet. It's up for 113 days
now with no problems at all until last weekend. So I'd rather guess
it's a software-related problem, perhaps something like an integer
overflow in the timer code.

> >You could use jbod mode (or create single-disk "raid arrays") with
> >Areca or 3ware controllers and use software raid on top of that.
> 
> True, but then that would kind of defeat the whole purpose of the fairly 
> expensive card.

You'd still profit from the battery unit contained in such cards and
the (hopefully) better driver.

> In that case a better investment might be a sort of servergrade
> motherboard which has 6 good onboard controllers and at least two
> separate PCI buses for the add-on cards. (For the price of one 12 port
> Areca you can already buy a bare bones low-range server...).

Well, the box I have troubles with contains such a servergrade board
with 6 onboard SATA controllers ;) Fortunately, we have a couple
of unused 3ware cards lying around, so I prefer to use these in
the future.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: Raid6 array crashed-- 4-disk failure...(?)
  2008-09-16 17:50       ` Maarten
  2008-09-16 18:12         ` Maarten
  2008-09-17  8:25         ` Andre Noll
@ 2008-09-19 14:55         ` John Stoffel
  2 siblings, 0 replies; 15+ messages in thread
From: John Stoffel @ 2008-09-19 14:55 UTC (permalink / raw)
  To: Maarten; +Cc: Andre Noll, linux-raid

>>>>> "Maarten" == Maarten  <maarten@ultratux.net> writes:

>> You could use jbod mode (or create single-disk "raid arrays") with
>> Areca or 3ware controllers and use software raid on top of that.

Maarten> True, but then that would kind of defeat the whole purpose of
Maarten> the fairly expensive card. In that case a better investment
Maarten> might be a sort of servergrade motherboard which has 6 good
Maarten> onboard controllers and at least two separate PCI buses for
Maarten> the add-on cards. (For the price of one 12 port Areca you can
Maarten> already buy a bare bones low-range server...).

Maarten> The main selling point of most hardware raid cards is that
Maarten> they seem to be doing a much better job predicting failure of
Maarten> a drive than software raid can. I don't know how they do
Maarten> that, but a fact is I've never even heard of a two-disk
Maarten> failure with hardware raid. Which of course doesn't say it
Maarten> cannot happen, but it does seem to be a lot less likely
Maarten> somehow.

Hah!  I'll trump that.  In a previous job we had a bunch of Netapp
Filers.  Nice boxes, really nice and reliable.  We had a two disk
failure in one volume, so they do happen.

Managed to get it back by swapping the disk drive controller board
between the really failed drive and the second failed drive.  Took
quite a few hours of mucking about, but it was certainly better than
losing the data and doing large restores.

Nowadays Netapp has their double-parity Raid6-like setup for data,
since with 1TB disks, a second failure isn't all that unlikely.

Pretty soon I suspect disk^Wstorage will be cheap enough that we'll
just mirror and snapshot and try NOT to write to offline media if at
all possible.  Except for data which hasn't been accessed recently.

HSM anyone?

John

