linux-raid.vger.kernel.org archive mirror
* RAID5 - 2nd drive died whilst waiting for RMA
@ 2004-11-12 10:07 David Greaves
  2004-11-12 12:17 ` David Greaves
  2004-11-15 20:56 ` Robin Bowes
  0 siblings, 2 replies; 26+ messages in thread
From: David Greaves @ 2004-11-12 10:07 UTC (permalink / raw)
  To: linux-raid

Hi
Remember I had a disk fail a couple of weeks back - and being a normal 
person and not a business I don't have hot spares lying about...
Well, it got back from RMA today.
And before I could plug it in, another drive got a bad block 
reallocation error.

So my RAID5 has 2 dead drives and is toasted :(
I had a few smaller disks on another machine which I lvm'ed together to 
do a backup - but I could only fit about a quarter of my data there. I'd 
*really* like not to have lost all this stuff.

However I do now have a 'good' drive.
Can I dd the newly dead drive (bear in mind it probably only has a bad 
block or two) onto the new drive and come back up in degraded mode?

Any other suggestions?

Then RMA *this* Maxtor and hope to resync in a couple of weeks (well, 
actually - these drives seem so damned unreliable I guess I'm going to 
*have* to buy a spare)

FYI these are 250Gb Maxtor SATA disks.

David

Nov 12 09:45:40 cu kernel: scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 0d 3b 56 97 00 00 08 00
Nov 12 09:45:40 cu kernel: Current sda: sense key Medium Error
Nov 12 09:45:40 cu kernel: Additional sense: Unrecovered read error - auto reallocate failed
Nov 12 09:45:40 cu kernel: end_request: I/O error, dev sda, sector 221992599
Nov 12 09:45:40 cu kernel: RAID5 conf printout:
Nov 12 09:45:40 cu kernel:  --- rd:5 wd:3 fd:2
Nov 12 09:45:40 cu kernel:  disk 0, o:0, dev:sda1
Nov 12 09:45:40 cu kernel:  disk 1, o:1, dev:sdc1
Nov 12 09:45:40 cu kernel:  disk 2, o:1, dev:sdb1
Nov 12 09:45:40 cu kernel:  disk 4, o:1, dev:hdb1
Nov 12 09:45:40 cu kernel: RAID5 conf printout:
Nov 12 09:45:40 cu kernel:  --- rd:5 wd:3 fd:2
Nov 12 09:45:40 cu kernel:  disk 1, o:1, dev:sdc1
Nov 12 09:45:40 cu kernel:  disk 2, o:1, dev:sdb1
Nov 12 09:45:40 cu kernel:  disk 4, o:1, dev:hdb1
Nov 12 09:45:41 cu kernel: lost page write due to I/O error on dm-0



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 10:07 RAID5 - 2nd drive died whilst waiting for RMA David Greaves
@ 2004-11-12 12:17 ` David Greaves
  2004-11-12 16:22   ` Dick Streefland
                     ` (2 more replies)
  2004-11-15 20:56 ` Robin Bowes
  1 sibling, 3 replies; 26+ messages in thread
From: David Greaves @ 2004-11-12 12:17 UTC (permalink / raw)
  To: linux-raid

David Greaves wrote:

> So my RAID5 has 2 dead drives and is toasted :(
> I had a few smaller disks on another machine which I lvm'ed together 
> to do a backup - but I could only fit about a quarter of my data 
> there. I'd *really* like not to have lost all this stuff.
>
> However I do now have a 'good' drive.
> Can I dd the newly dead drive (bear in mind it probably only has a bad 
> block or two) onto the new drive and come back up in degraded mode?
>
I've had a think and this is my plan.... comments appreciated.

Currently:
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat Jun  5 18:13:04 2004
     Raid Level : raid5
     Array Size : 980446208 (935.03 GiB 1003.98 GB)
    Device Size : 245111552 (233.76 GiB 250.99 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Nov 12 09:46:53 2004
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8       17        2      active sync   /dev/sdb1
       3       0        0       -1      removed
       4       3       65        4      active sync   /dev/hdb1
       5       8        1       -1      faulty   /dev/sda1
           UUID : 19779db7:1b41c34b:f70aa853:062c9fe5
         Events : 0.4443578

so, the plan in order to try and extract data:
* insert new drive as /dev/sdd1
* dd if=/dev/sda1 of=/dev/sdd1
* mdadm /dev/md0 --remove /dev/sda1
* physically swap /dev/sda and /dev/sdd so the copy becomes /dev/sda
* mdadm /dev/md0 --add /dev/sda1
* fsck filesystem and expect to lose files where there were bad blocks
* wait for new drive (special delivery - tomorrow morning)
* insert new drive as /dev/sdd
* mdadm /dev/md0 --add /dev/sdd1

or am I wasting my time?

David


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 12:17 ` David Greaves
@ 2004-11-12 16:22   ` Dick Streefland
  2004-11-12 17:31     ` David Greaves
  2004-11-12 17:48   ` Guy
  2004-11-12 18:09   ` Guy
  2 siblings, 1 reply; 26+ messages in thread
From: Dick Streefland @ 2004-11-12 16:22 UTC (permalink / raw)
  To: linux-raid

David Greaves <david@dgreaves.com> wrote:
| so, the plan in order to try and extract data:
| * insert new drive as /dev/sdd1
| * dd if=/dev/sda1 of=/dev/sdd1
| * mdadm /dev/md0 --remove /dev/sda1
| * physically swap /dev/sda and /dev/sdd so the copy becomes /dev/sda
| * mdadm /dev/md0 --add /dev/sda1
| * fsck filesystem and expect to lose files where there were bad blocks
| * wait for new drive (special delivery - tomorrow morning)
| * insert new drive as /dev/sdd
| * mdadm /dev/md0 --add /dev/sdd1

You might want to check out "ddrescue", which is a version of "dd"
that is designed to read from a disk with bad sectors.
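For example, a minimal sketch (the log file name is just an example;
exact options differ between GNU ddrescue and Garloff's dd_rescue, so
check whichever man page you have):

  # copy the failing partition, keeping a log so the run can be resumed
  ddrescue /dev/sda1 /dev/sdd1 /root/rescue.log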

-- 
Dick Streefland                      ////                      Altium BV
dick.streefland@altium.nl           (@ @)          http://www.altium.com
--------------------------------oOO--(_)--OOo---------------------------



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 16:22   ` Dick Streefland
@ 2004-11-12 17:31     ` David Greaves
  2004-11-12 17:49       ` Guy
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-12 17:31 UTC (permalink / raw)
  To: Dick Streefland; +Cc: linux-raid

Dick Streefland wrote:

>David Greaves <david@dgreaves.com> wrote:
>| so, the plan in order to try and extract data:
>| * insert new drive as /dev/sdd1
>| * dd if=/dev/sda1 of=/dev/sdd1
>| * mdadm /dev/md0 --remove /dev/sda1
>| * physically swap /dev/sda and /dev/sdd so the copy becomes /dev/sda
>| * mdadm /dev/md0 --add /dev/sda1
>| * fsck filesystem and expect to lose files where there were bad blocks
>| * wait for new drive (special delivery - tomorrow morning)
>| * insert new drive as /dev/sdd
>| * mdadm /dev/md0 --add /dev/sdd1
>
>You might want to check out "ddrescue", which is a version of "dd"
>that is designed to read from a disk with bad sectors.
>  
>
That was it - thanks :)

I was googling 'dd recover' and various 'badblocks' etc... not 'dd rescue'

Also found dd_rhelp which looks sensible.

fingers crossed that it works...

David



* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 12:17 ` David Greaves
  2004-11-12 16:22   ` Dick Streefland
@ 2004-11-12 17:48   ` Guy
  2004-11-12 18:09   ` Guy
  2 siblings, 0 replies; 26+ messages in thread
From: Guy @ 2004-11-12 17:48 UTC (permalink / raw)
  To: 'David Greaves', linux-raid

NO!!!!!

Bad plan, but almost correct.

I am creating a complete response, so don't do anything for 10 minutes or
so!

Guy


* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 17:31     ` David Greaves
@ 2004-11-12 17:49       ` Guy
  2004-11-12 18:13         ` David Greaves
  0 siblings, 1 reply; 26+ messages in thread
From: Guy @ 2004-11-12 17:49 UTC (permalink / raw)
  To: 'David Greaves', 'Dick Streefland'; +Cc: linux-raid

Let me know what you have done so far.

Guy


* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 12:17 ` David Greaves
  2004-11-12 16:22   ` Dick Streefland
  2004-11-12 17:48   ` Guy
@ 2004-11-12 18:09   ` Guy
  2004-11-12 18:30     ` David Greaves
  2 siblings, 1 reply; 26+ messages in thread
From: Guy @ 2004-11-12 18:09 UTC (permalink / raw)
  To: 'David Greaves', linux-raid

NO!!!!!

Bad plan, but almost correct.
If you "--add" a disk, md would think it is a new disk.  And none of the
existing data would be used.  The array would not start anyway.  But if you
use the instructions below, you should get your array back to normal.

+ Stop array:
+ "mdadm /dev/md0 -S"
* insert new drive as /dev/sdd1
+ I have never used ddrescue, so verify its usage.
+ copy failed disk to new disk:
+ "ddrescue /dev/sda1 /dev/sdd1"
+ physically swap /dev/sda and /dev/sdd, now sda is the new disk
+ start the array:
+ "mdadm --assemble --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1"
Verify this; all working disks should be listed.

At this point your array should be up with 1 disk missing.
Your disk with the bad block should be /dev/sdd1, if so, add it to the
array.  Or wait for the new drive and add it.

+ "mdadm /dev/md0 --add /dev/sdd1"

It is normal for a disk to get some bad blocks over time.  Md does not
handle bad blocks very well; it just fails the drive.  Since disk drives can
relocate bad blocks, when you get one, just remove the drive and add it
back.  Or test the disk first with dd.

Assume /dev/sdd is the bad disk:

Test write to the disk to correct bad blocks (you should get no errors;
note this overwrites everything on the disk, including the partition table):
dd if=/dev/zero of=/dev/sdd bs=1024k

Test read from the disk (you should get no errors):
dd if=/dev/sdd of=/dev/null bs=1024k

If the above gave no errors, then add the disk to the array.

Remove the failed disk from the array:
mdadm /dev/md0 -r /dev/sdd1

Add the repaired disk back to the array:
mdadm /dev/md0 -a /dev/sdd1

I hope this helps!

Guy




* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 17:49       ` Guy
@ 2004-11-12 18:13         ` David Greaves
  0 siblings, 0 replies; 26+ messages in thread
From: David Greaves @ 2004-11-12 18:13 UTC (permalink / raw)
  To: Guy; +Cc: linux-raid

Thanks Guy :)

Nothing at all.
I haven't rebooted; I haven't attempted any dd's

The only thing I've done is stop the monitor:
  /etc/init.d/mdadm stop

and see what's up
  cat /proc/mdstat
and run
  mdadm --detail /dev/md0

I'm preparing linux-2.6.9 (currently running 2.6.6).

I've ordered another 250Gb disk and a basic Silicon Image 2-port SATA
controller that should be here in the morning, so I'm not planning on
doing anything until then.

I've been reading up on dd_rescue and dd_rhelp too.
They're now installed and waiting.

David



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 18:09   ` Guy
@ 2004-11-12 18:30     ` David Greaves
  2004-11-12 18:47       ` Guy
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-12 18:30 UTC (permalink / raw)
  To: Guy; +Cc: linux-raid

Guy wrote:

>NO!!!!!
>
>Bad plan, but almost correct.
>If you "--add" a disk, md would think it is a new disk.  And none of the
>existing data would be used.  The array would not start anyway.  But if you
>use the instructions below, you should get your array back to normal.
>  
>
OK, that's encouraging :)

>+ Stop array:
>+ "mdadm /dev/md0 -S"
>  
>
mdadm: fail to stop array /dev/md0: Device or resource busy
Basically they're still mounted and can't unmount 'cos the nfsds are 
running.

Is a reboot the right thing? (back into single user)

David


* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 18:30     ` David Greaves
@ 2004-11-12 18:47       ` Guy
  2004-11-13 19:48         ` David Greaves
  0 siblings, 1 reply; 26+ messages in thread
From: Guy @ 2004-11-12 18:47 UTC (permalink / raw)
  To: 'David Greaves'; +Cc: linux-raid

Yes a reboot is needed to correct the problem.  Unless you can figure out
how to stop the array.
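If you do want to avoid the reboot, the usual sequence is something like
this (a sketch - the init script name and mount point here are guesses,
and the dm-0 in your log suggests LVM is holding md0 busy too):

  /etc/init.d/nfs-kernel-server stop   # or whatever starts nfsd on your distro
  umount /mnt/raid                     # hypothetical mount point
  vgchange -an                         # deactivate the LVM volumes on top of md0
  mdadm -S /dev/md0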

Guy


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 18:47       ` Guy
@ 2004-11-13 19:48         ` David Greaves
  2004-11-13 20:01           ` Måns Rullgård
  2004-11-13 20:39           ` Guy
  0 siblings, 2 replies; 26+ messages in thread
From: David Greaves @ 2004-11-13 19:48 UTC (permalink / raw)
  To: Guy; +Cc: linux-raid

OK, not great, but nothing has broken either.

So the reboot happened, the disks went in and got moved around:
new drive->/dev/sdd1
the machine hung complaining 'ata4 timed out' during dd if=sda of=sdd :(
reboot, did it again and it worked fine (no errors? maybe it was
transient? anyway...)

powerdown
move sickly drive (sda) to new controller (new:sde)
move new copy (sdd) to position 1 (new:sda)
insert new spare disk (new:sdd)

of course, the partitions are 'fd' so boot produces:
md: Autodetecting RAID arrays.
md: invalid raid superblock magic on sdd1
md: sdd1 has invalid sb, not importing!
# that's fair, I just made a partition, never used, no sb
md: autorun ...
md: considering sde1 ...
# hmm, ok, it's fd and has a superblock but should be marked faulty
md: adding sde1 ...
md: adding sdc1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: adding hdb1 ...
md: created md0
md: bind<hdb1>
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md: export_rdev(sde1)
# what does this mean? does it mean 'sb is marked faulty'?
md: running: <sdc1><sdb1><sda1><hdb1>
md: kicking non-fresh sda1 from array!
# OK, so you recognise it as an md device but think it's dirty - good
md: unbind<sda1>
md: export_rdev(sda1)
raid5: device sdc1 operational as raid disk 1
raid5: device sdb1 operational as raid disk 2
raid5: device hdb1 operational as raid disk 4
raid5: not enough operational devices for md0 (2/5 failed)
# as expected


Now, Guy, your next step:
# mdadm --assemble --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
mdadm: /dev/sda1 does not appear to be an md device

Oh.

So the first thing I'm doing is to cleanly redo the dd_rescue from (now) 
sde to sda

Comments?

David

PS Guy, my ISP has blocked comcast.net due to excessive spam - he's 
reconsidered and will unblock them 'soon' but for now I can only 
apologise :(



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 19:48         ` David Greaves
@ 2004-11-13 20:01           ` Måns Rullgård
  2004-11-13 20:28             ` David Greaves
  2004-11-13 20:39           ` Guy
  1 sibling, 1 reply; 26+ messages in thread
From: Måns Rullgård @ 2004-11-13 20:01 UTC (permalink / raw)
  To: linux-raid

David Greaves <david@dgreaves.com> writes:

> Now, Guy, your next step:
> # mdadm --assemble --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
> mdadm: /dev/sda1 does not appear to be an md device

Sorry to butt in, but that command should, assuming the components are
correct, read as

mdadm --assemble <raiddevice> --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1

with <raiddevice> replaced by something like /dev/md0.

-- 
Måns Rullgård
mru@inprovide.com


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 20:01           ` Måns Rullgård
@ 2004-11-13 20:28             ` David Greaves
  2004-11-13 20:32               ` Måns Rullgård
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-13 20:28 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-raid

Måns Rullgård wrote:

>David Greaves <david@dgreaves.com> writes:
>
>>Now, Guy, your next step:
>># mdadm --assemble --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
>>mdadm: /dev/sda1 does not appear to be an md device
>
>Sorry to butt in,
>
Feel free - and thanks :)

> but that command should, assuming the components are
>correct, read as
>
>mdadm --assemble <raiddevice> --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
>
>with <raiddevice> replaced by something like /dev/md0.
>
Indeed it should.
I don't know if that's why mdadm gave that message.

I'm still re-copying the partition so I'll try it again when I'm done 
(an hour or so left)

Thanks
David


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 20:28             ` David Greaves
@ 2004-11-13 20:32               ` Måns Rullgård
  0 siblings, 0 replies; 26+ messages in thread
From: Måns Rullgård @ 2004-11-13 20:32 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

David Greaves <david@dgreaves.com> writes:

>> but that command should, assuming the components are
>>correct, read as
>>
>>mdadm --assemble <raiddevice> --force /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
>>
>>with <raiddevice> replaced by something like /dev/md0.
>>
> indeed it should.
> I don't know if that's why mdadm gave that message.

That's what caused the message.  The message means that the first
device mentioned on the command line wasn't an md device.  This has
nothing to do with the contents of the device, mdadm needs an md
device to operate on.

-- 
Måns Rullgård
mru@inprovide.com

* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 19:48         ` David Greaves
  2004-11-13 20:01           ` Måns Rullgård
@ 2004-11-13 20:39           ` Guy
  2004-11-13 21:54             ` David Greaves
  1 sibling, 1 reply; 26+ messages in thread
From: Guy @ 2004-11-13 20:39 UTC (permalink / raw)
  Cc: linux-raid

Sorry, the command should be:
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1

Hope that helps.

Guy


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 20:39           ` Guy
@ 2004-11-13 21:54             ` David Greaves
  2004-11-15 16:55               ` David Greaves
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-13 21:54 UTC (permalink / raw)
  To: Guy; +Cc: linux-raid

Guy wrote:

>Sorry, the command should be:
>mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/hdb1
>
>Hope that helps.
>
>Guy
>  
>
Yep.

My fault though - no excuses - I should know this stuff and should have 
checked more carefully.

It's been a fraught day and coming back to take the server apart, screw 
it back together etc etc made me sloppy.

Anyway. The good news is that I let the 2nd dd_rescue finish - typed in 
the command *with* a raid device to work on (doh!) and it worked.

Then I started lvm and that worked.

Then I tried to run xfs_check on the smaller (350Gb) partition -
oom-killer time.
<Sigh>, nothing's easy, is it?
I have 256Mb of RAM and 512Mb swap - I've stopped all other daemons - 
and still it dies...

And to think one of my tasks in all this was to merge the two 
filesystems into one and use xfs rather than reiserfs3.6 - but xfs won't 
even fsck!!!

OK, off to yank some RAM from another machine...

Thanks for the help guys - I appreciate it.

Is there a wiki for mdadm, raid around? I'd like to contribute to it (so 
I can use it next time <grin>)

David


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-13 21:54             ` David Greaves
@ 2004-11-15 16:55               ` David Greaves
  2004-11-16  6:13                 ` Brad Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-15 16:55 UTC (permalink / raw)
  To: Guy, Måns Rullgård; +Cc: linux-raid

Thanks for your help guys.

I'm back up with a full complement of disks and data.

It turns out that despite the 'auto reallocate failed' in this message:

Nov 12 09:45:40 cu kernel: Current sda: sense key Medium Error
Nov 12 09:45:40 cu kernel: Additional sense: Unrecovered read error - 
auto reallocate failed
Nov 12 09:45:40 cu kernel: end_request: I/O error, dev sda, sector 221992599

The disk is actually good according to 'badblocks -w -v -s -c 1024 
/dev/sde 222000000 221800000'
(I did a full scan too).

Checking for bad blocks in read-write mode
 From block 221800000 to 222000000
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.

Now all I need is SMART through libata.

David



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-12 10:07 RAID5 - 2nd drive died whilst waiting for RMA David Greaves
  2004-11-12 12:17 ` David Greaves
@ 2004-11-15 20:56 ` Robin Bowes
  2004-11-15 21:24   ` Guy
  1 sibling, 1 reply; 26+ messages in thread
From: Robin Bowes @ 2004-11-15 20:56 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

David Greaves wrote:
> Then RMA *this* Maxtor and hope to resync in a couple of weeks (well,
> actually - these drives seem so damned unreliable I guess I'm going to 
> *have* to buy a spare)
> 
> FYI these are 250Gb Maxtor SATA disks.

David,

I use the same drives; I've had a *terrible* failure rate with them.

I bought 6 drives - 2 from separate vendors on eBay (d1, d2) plus 4 more 
from another vendor (d3-d6). d3 turned out to be faulty so I asked the 
vendor to replace it. He said I could, or I could RMA it direct with 
Maxtor. This I did as it meant I could get a replacement quicker.

However, when I provided the drive serial no. for the RMA it turned out 
to be stolen - as did the other 3 drives from the same vendor (d4-d6) - 
so I returned all 4 to the vendor and got 4 more (d7-d10), after first 
checking the serial nos to make sure they weren't stolen!

Anyway, I checked the drives when I got them with the Maxtor PowerMax 
utility - 3 out of the 4 were faulty (d7-d9). I RMAd all three back to 
Maxtor and got 3 more (d11-d13). So far (touch wood) all six are still 
working OK.

Let's look at a summary:

d1	OK
d2	OK
d3	Failed
d4	Returned untested
d5	Returned untested
d6	Returned untested
d7	Failed
d8	Failed
d9	Failed
d10	OK
d11	OK
d12	OK
d13	OK

So, of the 10 drives I tested, four failed - that's a 40% failure rate.

Needless to say I decided to configure my RAID5 array with a spare:

[root@dude geeklog]# mdadm --detail /dev/md5
/dev/md5:
         Version : 00.90.01
   Creation Time : Thu Jul 29 21:41:38 2004
      Raid Level : raid5
      Array Size : 974566400 (929.42 GiB 997.96 GB)
     Device Size : 243641600 (232.35 GiB 249.49 GB)
    Raid Devices : 5
   Total Devices : 6
Preferred Minor : 5
     Persistence : Superblock is persistent

     Update Time : Mon Nov 15 20:55:23 2004
           State : clean
  Active Devices : 5
Working Devices : 6
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 128K

            UUID : a4bbcd09:5e178c5b:3bf8bd45:8c31d2a1
          Events : 0.1716573

     Number   Major   Minor   RaidDevice State
        0       8        2        0      active sync   /dev/sda2
        1       8       18        1      active sync   /dev/sdb2
        2       8       34        2      active sync   /dev/sdc2
        3       8       50        3      active sync   /dev/sdd2
        4       8       66        4      active sync   /dev/sde2

        5       8       82        -      spare   /dev/sdf2
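For reference, an array laid out like this could be created with
something along these lines (a sketch reconstructed from the listing
above, not my exact command; the sixth device becomes the spare):

  mdadm --create /dev/md5 --level=5 --raid-devices=5 --spare-devices=1 \
        --chunk=128 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2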

R.
-- 
http://robinbowes.com


* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-15 20:56 ` Robin Bowes
@ 2004-11-15 21:24   ` Guy
  2004-11-15 21:30     ` Robin Bowes
  2004-11-15 21:39     ` Gordon Henderson
  0 siblings, 2 replies; 26+ messages in thread
From: Guy @ 2004-11-15 21:24 UTC (permalink / raw)
  To: 'Robin Bowes', 'David Greaves'; +Cc: linux-raid

I learned years ago that Maxtor drives have a high failure rate.  I had
hoped they improved over the years, I guess not.  My last Maxtor drive was
800 Meg.  Way overkill.  :)  Those were the days.

Most of the time Maxtor drives have the best price.
You get what you pay for.

I like Seagate myself.  But they have had some lemons - though those were
one model.

Something to consider: a bad block does not indicate a failed drive.
However, this point is debatable.  There is a reason they have spare blocks.
Most or all drives can relocate a bad block to a spare.

Guy


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-15 21:24   ` Guy
@ 2004-11-15 21:30     ` Robin Bowes
  2004-11-15 21:39     ` Gordon Henderson
  1 sibling, 0 replies; 26+ messages in thread
From: Robin Bowes @ 2004-11-15 21:30 UTC (permalink / raw)
  To: Guy; +Cc: 'David Greaves', linux-raid

Guy wrote:

> Something to consider.  A bad block does not indicate a failed drive.
> However, this point is debatable.  There is a reason they have spare blocks.
> Most or all drives can re-locate a bad block to a spare.

When I say "failed" I mean "diagnosed as faulty by the Maxtor PowerMax 
utility". Some drives that were diagnosed as faulty would appear to be 
working OK but would cause the server to crash when synchronising the 
array for the first time.

I suspect the guy I got these drives from had no idea how to handle 
computer gear - the drives arrived packed tightly in a rigid box with a 
thin layer of bubble wrap. I suggested to him that this was sub-optimal 
and he claimed that because the box was really hard it would protect the 
drives more. Obviously not heard of g-force then!

R.
-- 
http://robinbowes.com


* RE: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-15 21:24   ` Guy
  2004-11-15 21:30     ` Robin Bowes
@ 2004-11-15 21:39     ` Gordon Henderson
  1 sibling, 0 replies; 26+ messages in thread
From: Gordon Henderson @ 2004-11-15 21:39 UTC (permalink / raw)
  To: linux-raid

On Mon, 15 Nov 2004, Guy wrote:

> I learned years ago that Maxtor drives have a high failure rate.  I had
> hoped they improved over the years, I guess not.  My last Maxtor drive was
> 800 Meg.  Way overkill.  :)  Those were the days.
>
> Most of the time Maxtor drives have the best price.
> You get what you paid for.
>
> I like SeaGate myself.  But they have had some lemons.  But those were 1
> model.

Some of the Seagates are offering 5-year warranties now too - that
certainly puts them head to head with one of the old reasons to get SCSI.
Eg:

http://www.scan.co.uk/Products/ProductInfo.asp?WebProductID=60962

This is £71.10 for 160GB/7200/8MB cache.

The Maxtor equiv. is £63.86 with no warranty mentioned on that vendor's
page. http://www.scan.co.uk/Products/ProductInfo.asp?WebProductID=147139

> Something to consider.  A bad block does not indicate a failed drive.
> However, this point is debatable.  There is a reason they have spare blocks.
> Most or all drives can re-locate a bad block to a spare.

PITA to deal with at the next level up though.

Gordon

* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-15 16:55               ` David Greaves
@ 2004-11-16  6:13                 ` Brad Campbell
  2004-11-17 11:21                   ` David Greaves
  0 siblings, 1 reply; 26+ messages in thread
From: Brad Campbell @ 2004-11-16  6:13 UTC (permalink / raw)
  To: David Greaves; +Cc: Guy, Måns Rullgård, linux-raid

David Greaves wrote:

> Now all I need is SMART through libata.

If you are using a UP system, grab the patches from Jeff's libata-dev tree and have at it.
I have been beating it hard on 14 drives here on a loaded working server for weeks and have not 
managed to toast anything yet. It's great being able to keep tabs on the remaining whirly bits and 
hopefully get an early warning before they expire.

(Fingers crossed. 12 of them are Maxtor Maxline-II drives but all sit between 35 & 40 deg C)

Brad


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-16  6:13                 ` Brad Campbell
@ 2004-11-17 11:21                   ` David Greaves
  2004-11-17 11:24                     ` Måns Rullgård
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-11-17 11:21 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Guy, Måns Rullgård, linux-raid, linux-ide

Brad Campbell wrote:

> David Greaves wrote:
>
>> Now all I need is SMART through libata.
>
>
> If you are using a UP system, grab the patches from Jeff's libata-dev 
> tree and have at it.
> I have been beating it hard on 14 drives here on a loaded working 
> server for weeks and have not managed to toast anything yet. It's 
> great being able to keep tabs on the remaining whirly bits and 
> hopefully get an early warning before they expire.
>
> (Fingers crossed. 12 of them are Maxtor Maxline-II drives but all sit 
> between 35 & 40 deg C)
>
> Brad

Thanks Brad.
I've been tracking this on the ide list and I've been nervous of 
applying anything since I don't have proper backups. I'm relying on 
redundancy.

I also found this comment by Jeff on some patches:
  As I noted in another email, be careful...  that patch bypasses the 
SCSI command synchronization, so you could potentially send a SMART 
command to the hardware while another command is still in progress.
He follows up by saying this *will* result in corruption.

And although I *think* he was referring to an earlier incarnation...

I looked to see what I'd need to do and given I'm running 2.6.9 I take 
it I'd have to apply:
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.9-libata1-dev1.patch.bz2
which depends on
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.9-libata1.patch.bz2

That seems like quite a lot and I'm not clear on the level of stability 
impact on my system.

Now although I track linux-ide I doubt I'm fully informed!!
So if anyone would like to inform me of a reasonable course of action 
then I'm actually reasonably happy to apply the SMART patch.

David



* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-17 11:21                   ` David Greaves
@ 2004-11-17 11:24                     ` Måns Rullgård
  2004-11-17 11:44                       ` Brad Campbell
  2004-11-17 12:04                       ` David Greaves
  0 siblings, 2 replies; 26+ messages in thread
From: Måns Rullgård @ 2004-11-17 11:24 UTC (permalink / raw)
  To: David Greaves; +Cc: Brad Campbell, Guy, linux-raid, linux-ide

David Greaves <david@dgreaves.com> writes:

> I've been tracking this on the ide list and I've been nervous of
> applying anything since I don't have proper  backups. I'm relying on
> redundancy.

RAID can *never* replace proper backups.  RAID only protects against
low-level disk failures.  Filesystem corruption caused by bugs,
accidental deleting of files, etc. are happily allowed.  Get a DVD
burner, you'll thank yourself one day.

-- 
Måns Rullgård
mru@inprovide.com


* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-17 11:24                     ` Måns Rullgård
@ 2004-11-17 11:44                       ` Brad Campbell
  2004-11-17 12:04                       ` David Greaves
  1 sibling, 0 replies; 26+ messages in thread
From: Brad Campbell @ 2004-11-17 11:44 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: David Greaves, Guy, linux-raid, linux-ide

Måns Rullgård wrote:
> David Greaves <david@dgreaves.com> writes:
> 
> 
>>I've been tracking this on the ide list and I've been nervous of
>>applying anything since I don't have proper  backups. I'm relying on
>>redundancy.
> 
>
> RAID can *never* replace proper backups.  RAID only protects against
> low-level disk failures.  Filesystem corruption caused by bugs,
> accidental deleting of files, etc. are happily allowed.  Get a DVD
> burner, you'll thank yourself one day.
> 

Heh. I have 2.5TB here. That's a lot of DVDs.
I have the whole lot set chmod a-w and rely on RAID to keep me alive. 
(Having said that, it is a home entertainment system and would not cause 
mega-dollar industrial damage if I lost it.) All my original media is 
about 9,000km away but I could survive until I re-ripped it.

I slide my array drives out and slide in some throwaway spares to beat on for testing before I 
deploy a new kernel.

David Greaves wrote:

> 
> I looked to see what I'd need to do and given I'm running 2.6.9 I take it I'd have to apply:
>  
> http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.9-libata1-dev1.patch.bz2
> which depends on
>  
> http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/2.6.9-libata1.patch.bz2
> 
> That seems like quite a lot and I'm not clear on the level of stability impact on my system.
> 
> Now although I track linux-ide I doubt I'm fully informed!!
> So if anyone would like to inform me of a reasonable course of action then I'm actually reasonably happy to apply the SMART patch.

That is pretty close. I actually just cloned the kernel bk tree and pulled 
both of Jeff's trees, but then I ran about 10 hours of super-intensive 
tests with 5 spare drives spread across my 3 controllers before I slid my 
13 raid disks back in and let it loose. I guess my real kernel version 
is somewhere around 2.6.10-rc1-bk3.

Have had good luck with it over the last couple of weeks and I have been hitting it pretty hard. 
But, as I said before, I'm on a UP machine. Andy pointed out some possible issues on SMP so beware 
there.

Brad

* Re: RAID5 - 2nd drive died whilst waiting for RMA
  2004-11-17 11:24                     ` Måns Rullgård
  2004-11-17 11:44                       ` Brad Campbell
@ 2004-11-17 12:04                       ` David Greaves
  1 sibling, 0 replies; 26+ messages in thread
From: David Greaves @ 2004-11-17 12:04 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: Brad Campbell, Guy, linux-raid, linux-ide

Måns Rullgård wrote:

>David Greaves <david@dgreaves.com> writes:
>
>>I've been tracking this on the ide list and I've been nervous of
>>applying anything since I don't have proper backups. I'm relying on
>>redundancy.
>
>RAID can *never* replace proper backups.  RAID only protects against
>low-level disk failures.  Filesystem corruption caused by bugs,
>accidental deleting of files, etc. are happily allowed.  Get a DVD
>burner, you'll thank yourself one day.
>
Got one.
I have 935Gb of data (soon to be 1.2Tb).
Each DVD holds 4.7Gb and takes 20-30 mins to burn - that's roughly 200 
discs and 70-100 hours of burning.
Any other ideas? ;)

David

PS Seriously - the data is films, my CDs, DVDs, TV shows and the like. 
It's a bummer to lose but in the grand scheme of things it's only the telly.
My _real_ personal data (photos etc) is mirrored onto physically 
separate disks each night. The 'backups' are remounted read-only. It's 
the 'rsync snapshot' method.

See: http://www.mikerubel.org/computers/rsync_snapshots/
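In essence it works something like this (a sketch of Rubel's scheme; the
paths are hypothetical):

  # rotate: hard-link yesterday's snapshot, then sync changes into the newest copy
  rm -rf /backup/snap.2
  mv /backup/snap.1 /backup/snap.2
  cp -al /backup/snap.0 /backup/snap.1
  rsync -a --delete /home/photos/ /backup/snap.0/

Unchanged files are shared between snapshots as hard links, so each extra
snapshot costs almost nothing; rsync replaces (rather than overwrites)
changed files, so older snapshots stay intact.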

Thanks for your concern though :)


