(unknown),

linux-raid.vger.kernel.org archive mirror

* (unknown), 
@ 2011-09-26  4:23 Kenn
  2011-09-26  4:52 ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Kenn @ 2011-09-26  4:23 UTC
  To: linux-raid; +Cc: neilb

I have a raid5 array that had a drive drop out, and resilvered the wrong
drive when I put it back in, corrupting and destroying the raid.  I
stopped the array at less than 1% resilvering and I'm in the process of
making a dd-copy of the drive to recover the files.

(1) Is there anything diagnostic I can contribute to add more
wrong-drive-resilvering protection to mdadm?  I have the command history
showing everything I did, I have the five drives available for reading
sectors, I haven't touched anything yet.

(2) Can I suggest improvements to resilvering?  Can I contribute code to
implement them?  Such as resilver from the end of the drive back to the
front, so if you notice the wrong drive resilvering, you can stop and not
lose the MBR and the directory format structure that's stored in the first
few sectors?  I'd also like to take a look at adding a raid mode where
there's checksum in every stripe block so the system can detect corrupted
disks and not resilver.  I'd also like to add a raid option where a
resilvering need will be reported by email and needs to be started
manually.  All to prevent what happened to me from happening again.

Thanks for your time.

Kenn Frank

P.S.  Setup:

# uname -a
Linux teresa 2.6.26-2-686 #1 SMP Sat Jun 11 14:54:10 UTC 2011 i686 GNU/Linux

# mdadm --version
mdadm - v2.6.7.2 - 14th November 2008

# mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Thu Sep 22 16:23:50 2011
     Raid Level : raid5
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Thu Sep 22 20:19:09 2011
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
         Events : 0.6

    Number   Major   Minor   RaidDevice State
       0      33        1        0      active sync   /dev/hde1
       1      56        1        1      active sync   /dev/hdi1
       2       0        0        2      removed
       3      57        1        3      active sync   /dev/hdk1
       4      34        1        4      active sync   /dev/hdg1
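
A quick consistency check on the sizes reported above, as a minimal sketch. It assumes the usual RAID5 layout, where one device's worth of space in the array holds parity, and that mdadm reports both sizes in KiB. The variable names are illustrative only.

```shell
# RAID5 usable capacity: (number of raid devices - 1) * per-device size.
raid_devices=5                # "Raid Devices" from the --detail output
used_dev_size_kib=732571904   # "Used Dev Size" from the --detail output

array_size_kib=$(( (raid_devices - 1) * used_dev_size_kib ))
echo "expected Array Size: $array_size_kib KiB"
```

The result can be compared against the "Array Size" line to confirm the array was created with the expected geometry.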


* Re:
  2011-09-26  4:23 (unknown), Kenn
@ 2011-09-26  4:52 ` NeilBrown
  2011-09-26  7:03   ` Re: Roman Mamedov
  2011-09-26  7:42   ` Kenn
  0 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2011-09-26  4:52 UTC
  To: kenn; +Cc: linux-raid


On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:

> I have a raid5 array that had a drive drop out, and resilvered the wrong
> drive when I put it back in, corrupting and destroying the raid.  I
> stopped the array at less than 1% resilvering and I'm in the process of
> making a dd-copy of the drive to recover the files.

I don't know what you mean by "resilvered".

> 
> (1) Is there anything diagnostic I can contribute to add more
> wrong-drive-resilvering protection to mdadm?  I have the command history
> showing everything I did, I have the five drives available for reading
> sectors, I haven't touched anything yet.

Yes, report the command history, and any relevant kernel logs, and the output
of "mdadm --examine" on all relevant devices.

NeilBrown
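
Gathering the "mdadm --examine" output requested above could be scripted roughly as follows. This is only a sketch: the function name `examine_all` and the `MDADM` override variable are hypothetical conveniences, not part of mdadm, and the device list has to come from the actual system. Running `mdadm --examine` normally requires root.

```shell
# Print a header and the superblock details for each device given.
# MDADM can be overridden (for example with "echo") for a dry run.
examine_all() {
    for dev in "$@"; do
        echo "=== --examine $dev ==="
        ${MDADM:-mdadm} --examine "$dev"
    done
}

# Example call, using the member devices listed in the original post:
#   examine_all /dev/hde1 /dev/hdi1 /dev/hdk1 /dev/hdg1 > examine.txt
```

Redirecting to a file keeps the output for all devices together, ready to paste into a reply.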


> 
> (2) Can I suggest improvements to resilvering?  Can I contribute code to
> implement them?  Such as resilver from the end of the drive back to the
> front, so if you notice the wrong drive resilvering, you can stop and not
> lose the MBR and the directory format structure that's stored in the first
> few sectors?  I'd also like to take a look at adding a raid mode where
> there's checksum in every stripe block so the system can detect corrupted
> disks and not resilver.  I'd also like to add a raid option where a
> resilvering need will be reported by email and needs to be started
> manually.  All to prevent what happened to me from happening again.
> 
> Thanks for your time.
> 
> Kenn Frank
> 
> P.S.  Setup:
> 
> # uname -a
> Linux teresa 2.6.26-2-686 #1 SMP Sat Jun 11 14:54:10 UTC 2011 i686 GNU/Linux
> 
> # mdadm --version
> mdadm - v2.6.7.2 - 14th November 2008
> 
> # mdadm --detail /dev/md3
> /dev/md3:
>         Version : 00.90
>   Creation Time : Thu Sep 22 16:23:50 2011
>      Raid Level : raid5
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 3
>     Persistence : Superblock is persistent
> 
>     Update Time : Thu Sep 22 20:19:09 2011
>           State : clean, degraded
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>          Events : 0.6
> 
>     Number   Major   Minor   RaidDevice State
>        0      33        1        0      active sync   /dev/hde1
>        1      56        1        1      active sync   /dev/hdi1
>        2       0        0        2      removed
>        3      57        1        3      active sync   /dev/hdk1
>        4      34        1        4      active sync   /dev/hdg1
> 
> 



* Re:
  2011-09-26  4:52 ` NeilBrown
@ 2011-09-26  7:03   ` Roman Mamedov
  2011-09-26 23:23     ` Re: Kenn
  2011-09-26 23:46     ` Recovering from a Bad Resilver / Rebuild Kenn
  2011-09-26  7:42   ` Kenn
  1 sibling, 2 replies; 10+ messages in thread
From: Roman Mamedov @ 2011-09-26  7:03 UTC
  To: NeilBrown; +Cc: kenn, linux-raid


On Mon, 26 Sep 2011 14:52:48 +1000
NeilBrown <neilb@suse.de> wrote:

> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
> 
> > I have a raid5 array that had a drive drop out, and resilvered the wrong
> > drive when I put it back in, corrupting and destroying the raid.  I
> > stopped the array at less than 1% resilvering and I'm in the process of
> > making a dd-copy of the drive to recover the files.
> 
> I don't know what you mean by "resilvered".

At first I thought the initial poster just invented some peculiar funny word of his own, but it looks like it's from the ZFS circles:
https://encrypted.google.com/search?q=resilver+zfs
@Kenn: you probably mean 'resync' or 'rebuild'. No one calls those processes 'resilver' here, so you'll get no Google results and blank/unknowing/funny looks from people when using that term in relation to mdadm.

-- 
With respect,
Roman


* Re:
  2011-09-26  7:03   ` Re: Roman Mamedov
@ 2011-09-26 23:23     ` Kenn
  2011-09-26 23:46     ` Recovering from a Bad Resilver / Rebuild Kenn
  1 sibling, 0 replies; 10+ messages in thread
From: Kenn @ 2011-09-26 23:23 UTC
  To: linux-raid, rm

> On Mon, 26 Sep 2011 14:52:48 +1000
> NeilBrown <neilb@suse.de> wrote:
>
>> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
>>
>> > I have a raid5 array that had a drive drop out, and resilvered the
>> wrong
>> > drive when I put it back in, corrupting and destroying the raid.  I
>> > stopped the array at less than 1% resilvering and I'm in the process
>> of
>> > making a dd-copy of the drive to recover the files.
>>
>> I don't know what you mean by "resilvered".
>
> At first I thought the initial poster just invented some peculiar funny
> word of his own, but it looks like it's from the ZFS circles:
> https://encrypted.google.com/search?q=resilver+zfs
> @Kenn; you probably mean 'resync' or 'rebuild', but no one ever calls
> those processes 'resilver' here, you'll get no google results and
> blank/unknowing/funny looks from people when using that term in relation
> to mdadm.

Good point.  I am a very old Unix user and my RAID terminology hasn't been
properly updated since college.  "Resilver" is mentioned in Wikipedia's
disk mirroring article (http://en.wikipedia.org/wiki/Disk_mirroring) and
I've always used the word, but it's not on the RAID page, so I'll switch
to "rebuilding".

Thanks,
Kenn


>
> --
> With respect,
> Roman
>




* Re: Recovering from a Bad Resilver / Rebuild
  2011-09-26  7:03   ` Re: Roman Mamedov
  2011-09-26 23:23     ` Re: Kenn
@ 2011-09-26 23:46     ` Kenn
  2011-09-27  9:27       ` David Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Kenn @ 2011-09-26 23:46 UTC
  To: linux-raid; +Cc: david.brown

>> On Mon, 26 Sep 2011 14:52:48 +1000
>> NeilBrown <neilb@suse.de> wrote:
>>
>> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
>>
>> So that brings up another point -- I've been reading through your blog,
>> and I acknowledge your thoughts on not having much benefit to checksums on
>> every block (http://neil.brown.name/blog/20110227114201), but sometimes
>> people like having that extra lock on their door even though it takes
>> more effort to go in and out of their home.  In my five-drive array, if
>> the last five words were the checksums of the blocks on every drive, the
>> checksums off each drive could vote on trusting the blocks of every other
>> drive during the rebuild process, and prevent an idiot (me) from killing
>> his data.  It would force wasteful sectors on the drive, perhaps harm
>> performance by squeezing 2+n bytes out of each sector, but if someone
>> wants to protect their data as much as possible, it would be a welcome
>> option where performance is not a priority.
>>
>> Also, the checksums do provide some protection: first, against
>> partial media failure, which is a major flaw in raid 456 design according
>> to http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt , and checksum
>> voting could protect against the Atomicity/write-in-place flaw outlined in
>> http://en.wikipedia.org/wiki/RAID#Problems_with_RAID .
>>
>> What do you think?
>>
>> Kenn
> On Sun, 26 Sep 2011 19:56:50 -0700 "David Brown" <david.brown@hesbynett.no> wrote:
>
> /raid/ protects against partial media flaws.  If one disk in a raid5
> stripe has a bad sector, that sector will be ignored and the missing
> data will be re-created from the other disks using the raid recovery
> algorithm.  If you want to have such protection even when doing a resync
> (as many people do), then use raid6 - it has two parity blocks.
>
> As Neil points out in his blog, it is impossible to fully recover from a
> failure part way through a write - checksum voting or majority voting
> /may/ give you the right answer, but it may not.  If you need protection
> against that, you have to have filesystem level control (data logging
> and journalling as well as metafile journalling), or perhaps use raid
> systems with battery backed write caches.

From what I understand of basic RAID theory, the "If one disk in a raid5
stripe has a bad sector," is the part that's based on too much faith in
the hardware.  RAID trusts the hardware to send it errors when there are
read failures, and it's helpless when the drive reads garbage without an
error and returns it as a good read.  During a rebuild this will destroy a
good array.  This is the argument against RAID in the articles I listed,
and why checksums in the blocks would be helpful as they get around this
blind spot.  And they give early warning on reads that something is dying.
 Having each block's checksums in all the other blocks in the stripe lets
md detect a previously failed atomic write and give another early warning.

I think for people coming from the "can't be too safe" mindset, these
checksums would be welcome, and basically, anyone who signs up for RAID5/6
already is choosing safety over performance.

Kenn
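
The two positions in this message can be illustrated with a toy stripe. This is a sketch only: small integers stand in for whole blocks, the values are made up, and plain XOR parity is assumed (as in RAID5). It is not md's actual code path.

```shell
# One stripe: four data blocks and their XOR parity.
d0=17; d1=42; d2=99; d3=7
parity=$(( $d0 ^ $d1 ^ $d2 ^ $d3 ))

# Normal rebuild: the device holding d2 is missing and every other read is good.
rebuilt=$(( $d0 ^ $d1 ^ $d3 ^ $parity ))
echo "rebuilt d2 = $rebuilt (original $d2)"

# Silent corruption: the drive returns 43 instead of 42 for d1, with no error.
bad_d1=43
rebuilt_bad=$(( $d0 ^ $bad_d1 ^ $d3 ^ $parity ))
echo "rebuilt d2 from bad read = $rebuilt_bad (original $d2)"
```

With good reads the missing block comes back exactly. With one unreported bad read the rebuild completes without complaint and writes a wrong block, which is the blind spot described above.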



* Re: Recovering from a Bad Resilver / Rebuild
  2011-09-26 23:46     ` Recovering from a Bad Resilver / Rebuild Kenn
@ 2011-09-27  9:27       ` David Brown
  0 siblings, 0 replies; 10+ messages in thread
From: David Brown @ 2011-09-27  9:27 UTC
  To: linux-raid

On 27/09/2011 01:46, Kenn wrote:
>>> On Mon, 26 Sep 2011 14:52:48 +1000
>>> NeilBrown<neilb@suse.de>  wrote:
>>>
>>> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn"<kenn@kenn.us>  wrote:
>>>
>>> So that brings up another point -- I've been reading through your blog,
>>> and I acknowledge your thoughts on not having much benefit to checksums on
>>> every block (http://neil.brown.name/blog/20110227114201), but sometimes
>>> people like having that extra lock on their door even though it takes
>>> more effort to go in and out of their home.  In my five-drive array, if
>>> the last five words were the checksums of the blocks on every drive, the
>>> checksums off each drive could vote on trusting the blocks of every other
>>> drive during the rebuild process, and prevent an idiot (me) from killing
>>> his data.  It would force wasteful sectors on the drive, perhaps harm
>>> performance by squeezing 2+n bytes out of each sector, but if someone
>>> wants to protect their data as much as possible, it would be a welcome
>>> option where performance is not a priority.
>>>
>>> Also, the checksums do provide some protection: first, against
>>> partial media failure, which is a major flaw in raid 456 design according
>>> to http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt , and checksum
>>> voting could protect against the Atomicity/write-in-place flaw outlined in
>>> http://en.wikipedia.org/wiki/RAID#Problems_with_RAID .
>>>
>>> What do you think?
>>>
>>> Kenn
>> On Sun, 26 Sep 2011 19:56:50 -0700 "David Brown" <david.brown@hesbynett.no> wrote:
>>
>> /raid/ protects against partial media flaws.  If one disk in a raid5
>> stripe has a bad sector, that sector will be ignored and the missing
>> data will be re-created from the other disks using the raid recovery
>> algorithm.  If you want to have such protection even when doing a resync
>> (as many people do), then use raid6 - it has two parity blocks.
>>
>> As Neil points out in his blog, it is impossible to fully recover from a
>> failure part way through a write - checksum voting or majority voting
>> /may/ give you the right answer, but it may not.  If you need protection
>> against that, you have to have filesystem level control (data logging
>> and journalling as well as metafile journalling), or perhaps use raid
>> systems with battery backed write caches.
>
>  From what I understand of basic RAID theory, the "If one disk in a raid5
> stripe has a bad sector," is the part that's based on too much faith in
> the hardware.  RAID trusts the hardware to send it errors when there are
> read failures, and it's helpless when the drive reads garbage without an
> error and returns it as a good read.  During a rebuild this will destroy a
> good array.  This is the argument against RAID in the articles I listed,
> and why checksums in the blocks would be helpful as they get around this
> blind spot.  And they give early warning on reads that something is dying.
>   Having each block's checksums in all the other blocks in the stripe lets
> md detect a previously failed atomic write and give another early warning.
>
> I think for people coming from the "can't be too safe" mindset, these
> checksums would be welcome, and basically, anyone who signs up for RAID5/6
> already is choosing safety over performance.
>

I think you have to be very clear on the difference between
/unrecoverable/ read errors and /undetected/ read errors.  An unrecoverable
read error means the disk controller has seen more bit errors on the
disk surface than it is able to correct.  These are not a problem for
raid, because the disk controller returns an error message - the raid
system then re-creates the missing data from the rest of the stripe.
This is one of the main reasons for using raid in the first place.  It
/is/ a problem if such a URE occurs while you are already resyncing a
missing disk - and is therefore a major motivation behind raid6 (and
also Neil's "hotsync" plans).

/Undetected/ read errors occur when the disk controller reads bad data from
the disk surface, and the incorrect data passes the disk's RS and CRC
checksums.  The chances of this happening are absurdly small unless 
there are faults in the drive electronics or firmware (in which case all 
bets are off anyway).  Higher level checksums are one way to detect such 
errors, as is regular data scrubbing.

It is correct that there is always a chance of incorrect data getting
through somewhere with undetected read errors.  I don't have any figures
on me, but I suspect that before these become a realistic worry you will
have bigger concerns about memory bit errors going undetected despite
ECC RAM, and undetected network errors despite Ethernet and IP checksums.
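
As a rough illustration of the "higher level checksums" idea: record a checksum when a block is written and compare it on every read. The sketch below uses the POSIX `cksum` utility (a CRC-32). The helper names `block_sum` and `verify_block` are hypothetical, and short strings stand in for disk blocks. Nothing here reflects what md itself does.

```shell
# Checksum of a block's contents (first field of cksum's output is the CRC).
block_sum() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

# verify_block CONTENT STORED_SUM: succeeds only if CONTENT still matches.
verify_block() { [ "$(block_sum "$1")" = "$2" ]; }

# Recorded at write time, kept alongside (or apart from) the data.
stored=$(block_sum "block written at time T")

# A good read matches the stored checksum.
if verify_block "block written at time T" "$stored"; then
    echo "good read"
fi

# A read that returns altered data without any error no longer matches.
if ! verify_block "block written at time X" "$stored"; then
    echo "undetected-by-drive corruption caught by checksum"
fi
```

A mismatch only says that the data and checksum disagree. Deciding which copy to trust, and repairing it, still needs redundancy such as parity or a mirror.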



* Re:
  2011-09-26  4:52 ` NeilBrown
  2011-09-26  7:03   ` Re: Roman Mamedov
@ 2011-09-26  7:42   ` Kenn
  2011-09-26  8:04     ` Re: NeilBrown
  1 sibling, 1 reply; 10+ messages in thread
From: Kenn @ 2011-09-26  7:42 UTC
  To: NeilBrown; +Cc: linux-raid

Replying.  I realize, and apologize, that I didn't include a subject.  I hope
this doesn't confuse majordomo.

> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
>
>> I have a raid5 array that had a drive drop out, and resilvered the wrong
>> drive when I put it back in, corrupting and destroying the raid.  I
>> stopped the array at less than 1% resilvering and I'm in the process of
>> making a dd-copy of the drive to recover the files.
>
> I don't know what you mean by "resilvered".

Resilvering -- Rebuilding the array.  Lesser-used term, sorry!

>
>>
>> (1) Is there anything diagnostic I can contribute to add more
>> wrong-drive-resilvering protection to mdadm?  I have the command history
>> showing everything I did, I have the five drives available for reading
>> sectors, I haven't touched anything yet.
>
> Yes, report the command history, and any relevant kernel logs, and the
> output
> of "mdadm --examine" on all relevant devices.
>
> NeilBrown

Awesome!  I hope this is useful.  It's really long, so I edited down the
logs and command history to what I thought were the important bits.  If
you want more, please let me know and I can post unedited versions.

### Command History ###

# The start of the sequence, removing sde from array
mdadm --examine /dev/sde
mdadm --detail /dev/md3
cat /proc/mdstat
mdadm /dev/md3 --remove /dev/sde1
mdadm /dev/md3 --remove /dev/sde
mdadm /dev/md3 --fail /dev/sde1
cat /proc/mdstat
mdadm --examine /dev/sde1
fdisk -l | grep 750
mdadm --examine /dev/sde1
mdadm --remove /dev/sde
mdadm /dev/md3 --remove /dev/sde
mdadm /dev/md3 --fail /dev/sde
fdisk /dev/sde
ls
vi /var/log/syslog
reboot
vi /var/log/syslog
reboot
mdadm --detail /dev/md3
mdadm --examine /dev/sde1
# Wiping sde
fdisk /dev/sde
newfs -t ext3 /dev/sde1
mkfs -t ext3 /dev/sde1
mkfs -t ext3 /dev/sde2
fdisk /dev/sde
mdadm --stop /dev/md3
# Putting sde back into array
mdadm --examine /dev/sde
mdadm --help
mdadm --misc --help
mdadm --zero-superblock /dev/sde
mdadm --query /dev/sde
mdadm --examine /dev/sde
mdadm --detail /dev/sde
mdadm --detail /dev/sde1
fdisk /dev/sde
mdadm --assemble --no-degraded /dev/md3  /dev/hde1 /dev/hdi1 /dev/sde1 /dev/hdk1 /dev/hdg1
cat /proc/mdstat
mdadm --stop /dev/md3
mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 missing /dev/hdk1 /dev/hdg1
mount -o ro /raid53
ls /raid53
umount /raid53
mdadm --stop /dev/md3
# The command that did the bad rebuild
mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 /dev/sde1 /dev/hdk1 /dev/hdg1
cat /proc/mdstat
mdadm --examine /dev/md3
mdadm --query /dev/md3
mdadm --detail /dev/md3
mount /raid53
mdadm --stop /dev/md3
# Trying to get the corrupted disk back up
mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 missing /dev/hdk1 /dev/hdg1
cat /proc/mdstat
mount /raid53
fsck -n /dev/md3



### KERNEL LOGS ###

# Me messing around with fdisk and mdadm creating new partitions to wipe
out sde
Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] 1465149168
512-byte hardware sectors (750156 MB)
Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write
Protect is off
Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Mode
Sense: 00 3a 00 00
Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 22 15:56:39 teresa kernel: [ 7897.778204]  sde: sde1 sde2
Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] 1465149168
512-byte hardware sectors (750156 MB)
Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write
Protect is off
Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Mode
Sense: 00 3a 00 00
Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 22 15:56:41 teresa kernel: [ 7899.848026]  sde: sde1 sde2
Sep 22 16:01:49 teresa kernel: [ 8207.733821] sd 5:0:0:0: [sde] 1465149168
512-byte hardware sectors (750156 MB)
Sep 22 16:01:49 teresa kernel: [ 8207.733919] sd 5:0:0:0: [sde] Write
Protect is off
Sep 22 16:01:49 teresa kernel: [ 8207.733943] sd 5:0:0:0: [sde] Mode
Sense: 00 3a 00 00
Sep 22 16:01:49 teresa kernel: [ 8207.734039] sd 5:0:0:0: [sde] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 22 16:01:49 teresa kernel: [ 8207.734083]  sde: sde1
Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] 1465149168
512-byte hardware sectors (750156 MB)
Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write
Protect is off
Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Mode
Sense: 00 3a 00 00
Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 22 16:01:51 teresa kernel: [ 8209.777260]  sde: sde1
Sep 22 16:02:09 teresa mdadm[2694]: DeviceDisappeared event detected on md
device /dev/md3
Sep 22 16:02:09 teresa kernel: [ 8227.781860] md: md3 stopped.
Sep 22 16:02:09 teresa kernel: [ 8227.781908] md: unbind<hde1>
Sep 22 16:02:09 teresa kernel: [ 8227.781937] md: export_rdev(hde1)
Sep 22 16:02:09 teresa kernel: [ 8227.782261] md: unbind<hdg1>
Sep 22 16:02:09 teresa kernel: [ 8227.782292] md: export_rdev(hdg1)
Sep 22 16:02:09 teresa kernel: [ 8227.782561] md: unbind<hdk1>
Sep 22 16:02:09 teresa kernel: [ 8227.782590] md: export_rdev(hdk1)
Sep 22 16:02:09 teresa kernel: [ 8227.782855] md: unbind<hdi1>
Sep 22 16:02:09 teresa kernel: [ 8227.782885] md: export_rdev(hdi1)
Sep 22 16:15:32 teresa smartd[2657]: Device: /dev/hda, Failed SMART usage
Attribute: 194 Temperature_Celsius.
Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/hdk, SMART Usage
Attribute: 194 Temperature_Celsius changed from 110 to 111
Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdb, SMART Usage
Attribute: 194 Temperature_Celsius changed from 113 to 116
Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdc, SMART Usage
Attribute: 190 Airflow_Temperature_Cel changed from 52 to 51
Sep 22 16:17:01 teresa /USR/SBIN/CRON[2965]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep 22 16:18:42 teresa kernel: [ 9220.400915] md: md3 stopped.
Sep 22 16:18:42 teresa kernel: [ 9220.411525] md: bind<hdi1>
Sep 22 16:18:42 teresa kernel: [ 9220.411884] md: bind<sde1>
Sep 22 16:18:42 teresa kernel: [ 9220.412577] md: bind<hdk1>
Sep 22 16:18:42 teresa kernel: [ 9220.413162] md: bind<hdg1>
Sep 22 16:18:42 teresa kernel: [ 9220.413750] md: bind<hde1>
Sep 22 16:18:42 teresa kernel: [ 9220.413855] md: kicking non-fresh sde1
from array!
Sep 22 16:18:42 teresa kernel: [ 9220.413887] md: unbind<sde1>
Sep 22 16:18:42 teresa kernel: [ 9220.413915] md: export_rdev(sde1)
Sep 22 16:18:42 teresa kernel: [ 9220.477393] raid5: device hde1
operational as raid disk 0
Sep 22 16:18:42 teresa kernel: [ 9220.477420] raid5: device hdg1
operational as raid disk 4
Sep 22 16:18:42 teresa kernel: [ 9220.477438] raid5: device hdk1
operational as raid disk 3
Sep 22 16:18:42 teresa kernel: [ 9220.477456] raid5: device hdi1
operational as raid disk 1
Sep 22 16:18:42 teresa kernel: [ 9220.478236] raid5: allocated 5252kB for md3
Sep 22 16:18:42 teresa kernel: [ 9220.478265] raid5: raid level 5 set md3
active with 4 out of 5 devices, algorithm 2
Sep 22 16:18:42 teresa kernel: [ 9220.478294] RAID5 conf printout:
Sep 22 16:18:42 teresa kernel: [ 9220.478309]  --- rd:5 wd:4
Sep 22 16:18:42 teresa kernel: [ 9220.478324]  disk 0, o:1, dev:hde1
Sep 22 16:18:42 teresa kernel: [ 9220.478339]  disk 1, o:1, dev:hdi1
Sep 22 16:18:42 teresa kernel: [ 9220.478354]  disk 3, o:1, dev:hdk1
Sep 22 16:18:42 teresa kernel: [ 9220.478369]  disk 4, o:1, dev:hdg1
# Me stopping md3
Sep 22 16:18:53 teresa mdadm[2694]: DeviceDisappeared event detected on md
device /dev/md3
Sep 22 16:18:53 teresa kernel: [ 9231.572348] md: md3 stopped.
Sep 22 16:18:53 teresa kernel: [ 9231.572394] md: unbind<hde1>
Sep 22 16:18:53 teresa kernel: [ 9231.572423] md: export_rdev(hde1)
Sep 22 16:18:53 teresa kernel: [ 9231.572728] md: unbind<hdg1>
Sep 22 16:18:53 teresa kernel: [ 9231.572758] md: export_rdev(hdg1)
Sep 22 16:18:53 teresa kernel: [ 9231.572988] md: unbind<hdk1>
Sep 22 16:18:53 teresa kernel: [ 9231.573015] md: export_rdev(hdk1)
Sep 22 16:18:53 teresa kernel: [ 9231.573243] md: unbind<hdi1>
Sep 22 16:18:53 teresa kernel: [ 9231.573270] md: export_rdev(hdi1)
# Me creating md3 with sde1 missing
Sep 22 16:19:51 teresa kernel: [ 9289.621646] md: bind<hde1>
Sep 22 16:19:51 teresa kernel: [ 9289.665268] md: bind<hdi1>
Sep 22 16:19:51 teresa kernel: [ 9289.695676] md: bind<hdk1>
Sep 22 16:19:51 teresa kernel: [ 9289.726906] md: bind<hdg1>
Sep 22 16:19:51 teresa kernel: [ 9289.809030] raid5: device hdg1
operational as raid disk 4
Sep 22 16:19:51 teresa kernel: [ 9289.809057] raid5: device hdk1
operational as raid disk 3
Sep 22 16:19:51 teresa kernel: [ 9289.809075] raid5: device hdi1
operational as raid disk 1
Sep 22 16:19:51 teresa kernel: [ 9289.809093] raid5: device hde1
operational as raid disk 0
Sep 22 16:19:51 teresa kernel: [ 9289.809821] raid5: allocated 5252kB for md3
Sep 22 16:19:51 teresa kernel: [ 9289.809850] raid5: raid level 5 set md3
active with 4 out of 5 devices, algorithm 2
Sep 22 16:19:51 teresa kernel: [ 9289.809877] RAID5 conf printout:
Sep 22 16:19:51 teresa kernel: [ 9289.809891]  --- rd:5 wd:4
Sep 22 16:19:51 teresa kernel: [ 9289.809907]  disk 0, o:1, dev:hde1
Sep 22 16:19:51 teresa kernel: [ 9289.809922]  disk 1, o:1, dev:hdi1
Sep 22 16:19:51 teresa kernel: [ 9289.809937]  disk 3, o:1, dev:hdk1
Sep 22 16:19:51 teresa kernel: [ 9289.809953]  disk 4, o:1, dev:hdg1
Sep 22 16:20:20 teresa kernel: [ 9318.486512] kjournald starting.  Commit
interval 5 seconds
Sep 22 16:20:20 teresa kernel: [ 9318.486512] EXT3-fs: mounted filesystem
with ordered data mode.
# Me stopping md3 again
Sep 22 16:20:42 teresa mdadm[2694]: DeviceDisappeared event detected on md
device /dev/md3
Sep 22 16:20:42 teresa kernel: [ 9340.300590] md: md3 stopped.
Sep 22 16:20:42 teresa kernel: [ 9340.300639] md: unbind<hdg1>
Sep 22 16:20:42 teresa kernel: [ 9340.300668] md: export_rdev(hdg1)
Sep 22 16:20:42 teresa kernel: [ 9340.300921] md: unbind<hdk1>
Sep 22 16:20:42 teresa kernel: [ 9340.300950] md: export_rdev(hdk1)
Sep 22 16:20:42 teresa kernel: [ 9340.301183] md: unbind<hdi1>
Sep 22 16:20:42 teresa kernel: [ 9340.301211] md: export_rdev(hdi1)
Sep 22 16:20:42 teresa kernel: [ 9340.301438] md: unbind<hde1>
Sep 22 16:20:42 teresa kernel: [ 9340.301465] md: export_rdev(hde1)
# This is me doing the fatal create, that recovers the wrong disk
Sep 22 16:21:39 teresa kernel: [ 9397.609864] md: bind<hde1>
Sep 22 16:21:39 teresa kernel: [ 9397.652426] md: bind<hdi1>
Sep 22 16:21:39 teresa kernel: [ 9397.673203] md: bind<sde1>
Sep 22 16:21:39 teresa kernel: [ 9397.699373] md: bind<hdk1>
Sep 22 16:21:39 teresa kernel: [ 9397.739372] md: bind<hdg1>
Sep 22 16:21:39 teresa kernel: [ 9397.801729] raid5: device hdk1
operational as raid disk 3
Sep 22 16:21:39 teresa kernel: [ 9397.801756] raid5: device sde1
operational as raid disk 2
Sep 22 16:21:39 teresa kernel: [ 9397.801774] raid5: device hdi1
operational as raid disk 1
Sep 22 16:21:39 teresa kernel: [ 9397.801793] raid5: device hde1
operational as raid disk 0
Sep 22 16:21:39 teresa kernel: [ 9397.802531] raid5: allocated 5252kB for md3
Sep 22 16:21:39 teresa kernel: [ 9397.802559] raid5: raid level 5 set md3
active with 4 out of 5 devices, algorithm 2
Sep 22 16:21:39 teresa kernel: [ 9397.802586] RAID5 conf printout:
Sep 22 16:21:39 teresa kernel: [ 9397.802600]  --- rd:5 wd:4
Sep 22 16:21:39 teresa kernel: [ 9397.802615]  disk 0, o:1, dev:hde1
Sep 22 16:21:39 teresa kernel: [ 9397.802631]  disk 1, o:1, dev:hdi1
Sep 22 16:21:39 teresa kernel: [ 9397.802646]  disk 2, o:1, dev:sde1
Sep 22 16:21:39 teresa kernel: [ 9397.802661]  disk 3, o:1, dev:hdk1
Sep 22 16:21:39 teresa kernel: [ 9397.838429] RAID5 conf printout:
Sep 22 16:21:39 teresa kernel: [ 9397.838454]  --- rd:5 wd:4
Sep 22 16:21:39 teresa kernel: [ 9397.838471]  disk 0, o:1, dev:hde1
Sep 22 16:21:39 teresa kernel: [ 9397.838486]  disk 1, o:1, dev:hdi1
Sep 22 16:21:39 teresa kernel: [ 9397.838502]  disk 2, o:1, dev:sde1
Sep 22 16:21:39 teresa kernel: [ 9397.838518]  disk 3, o:1, dev:hdk1
Sep 22 16:21:39 teresa kernel: [ 9397.838533]  disk 4, o:1, dev:hdg1
Sep 22 16:21:39 teresa mdadm[2694]: RebuildStarted event detected on md
device /dev/md3
Sep 22 16:21:39 teresa kernel: [ 9397.841822] md: recovery of RAID array md3
Sep 22 16:21:39 teresa kernel: [ 9397.841848] md: minimum _guaranteed_ 
speed: 1000 KB/sec/disk.
Sep 22 16:21:39 teresa kernel: [ 9397.841868] md: using maximum available
idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Sep 22 16:21:39 teresa kernel: [ 9397.841908] md: using 128k window, over
a total of 732571904 blocks.
Sep 22 16:22:33 teresa kernel: [ 9451.640192] EXT3-fs error (device md3):
ext3_check_descriptors: Block bitmap for group 3968 not in group (block
0)!
Sep 22 16:22:33 teresa kernel: [ 9451.750241] EXT3-fs: group descriptors
corrupted!
Sep 22 16:22:39 teresa kernel: [ 9458.079151] md: md_do_sync() got signal
... exiting
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: md3 stopped.
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdg1>
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdg1)
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdk1>
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdk1)
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<sde1>
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(sde1)
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdi1>
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdi1)
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hde1>
Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hde1)
Sep 22 16:22:39 teresa mdadm[2694]: DeviceDisappeared event detected on md
device /dev/md3
# Me trying to recreate md3 without sde
Sep 22 16:23:50 teresa kernel: [ 9529.065477] md: bind<hde1>
Sep 22 16:23:50 teresa kernel: [ 9529.107767] md: bind<hdi1>
Sep 22 16:23:50 teresa kernel: [ 9529.137743] md: bind<hdk1>
Sep 22 16:23:50 teresa kernel: [ 9529.177990] md: bind<hdg1>
Sep 22 16:23:51 teresa mdadm[2694]: RebuildFinished event detected on md device /dev/md3
Sep 22 16:23:51 teresa kernel: [ 9529.240814] raid5: device hdg1 operational as raid disk 4
Sep 22 16:23:51 teresa kernel: [ 9529.241734] raid5: device hdk1 operational as raid disk 3
Sep 22 16:23:51 teresa kernel: [ 9529.241752] raid5: device hdi1 operational as raid disk 1
Sep 22 16:23:51 teresa kernel: [ 9529.241770] raid5: device hde1 operational as raid disk 0
Sep 22 16:23:51 teresa kernel: [ 9529.242520] raid5: allocated 5252kB for md3
Sep 22 16:23:51 teresa kernel: [ 9529.242547] raid5: raid level 5 set md3 active with 4 out of 5 devices, algorithm 2
Sep 22 16:23:51 teresa kernel: [ 9529.242574] RAID5 conf printout:
Sep 22 16:23:51 teresa kernel: [ 9529.242588]  --- rd:5 wd:4
Sep 22 16:23:51 teresa kernel: [ 9529.242603]  disk 0, o:1, dev:hde1
Sep 22 16:23:51 teresa kernel: [ 9529.242618]  disk 1, o:1, dev:hdi1
Sep 22 16:23:51 teresa kernel: [ 9529.242633]  disk 3, o:1, dev:hdk1
Sep 22 16:23:51 teresa kernel: [ 9529.242649]  disk 4, o:1, dev:hdg1
# And me trying a fsck -n or a mount
Sep 22 16:24:07 teresa kernel: [ 9545.326343] EXT3-fs error (device md3): ext3_check_descriptors: Block bitmap for group 3968 not in group (block 0)!
Sep 22 16:24:07 teresa kernel: [ 9545.369071] EXT3-fs: group descriptors corrupted!


### EXAMINES OF PARTITIONS ###

=== --examine /dev/hde1 ===
/dev/hde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
  Creation Time : Thu Sep 22 16:23:50 2011
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

    Update Time : Sun Sep 25 22:11:22 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b7f6a3c0 - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0      33        1        0      active sync   /dev/hde1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      56        1        1      active sync   /dev/hdi1
   2     2       0        0        2      faulty removed
   3     3      57        1        3      active sync   /dev/hdk1
   4     4      34        1        4      active sync   /dev/hdg1

=== --examine /dev/hdi1 ===
/dev/hdi1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
  Creation Time : Thu Sep 22 16:23:50 2011
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

    Update Time : Sun Sep 25 22:11:22 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b7f6a3d9 - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1      56        1        1      active sync   /dev/hdi1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      56        1        1      active sync   /dev/hdi1
   2     2       0        0        2      faulty removed
   3     3      57        1        3      active sync   /dev/hdk1
   4     4      34        1        4      active sync   /dev/hdg1

=== --examine /dev/sde1 ===
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : e6e3df36:1195239f:47f7b12e:9c2b2218 (local to host teresa)
  Creation Time : Thu Sep 22 16:21:39 2011
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 3

    Update Time : Thu Sep 22 16:22:39 2011
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 4e69d679 - correct
         Events : 8

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      56        1        1      active sync   /dev/hdi1
   2     2       8       65        2      active sync   /dev/sde1
   3     3      57        1        3      active sync   /dev/hdk1
   4     4       0        0        4      faulty removed
   5     5      34        1        5      spare   /dev/hdg1

=== --examine /dev/hdk1 ===
/dev/hdk1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
  Creation Time : Thu Sep 22 16:23:50 2011
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

    Update Time : Sun Sep 25 22:11:22 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b7f6a3de - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3      57        1        3      active sync   /dev/hdk1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      56        1        1      active sync   /dev/hdi1
   2     2       0        0        2      faulty removed
   3     3      57        1        3      active sync   /dev/hdk1
   4     4      34        1        4      active sync   /dev/hdg1

=== --examine /dev/hdg1 ===
/dev/hdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
  Creation Time : Thu Sep 22 16:23:50 2011
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

    Update Time : Sun Sep 25 22:11:22 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : b7f6a3c9 - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4      34        1        4      active sync   /dev/hdg1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      56        1        1      active sync   /dev/hdi1
   2     2       0        0        2      faulty removed
   3     3      57        1        3      active sync   /dev/hdk1
   4     4      34        1        4      active sync   /dev/hdg1
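
To make the comparison across the five dumps above easier to see, here is a
minimal sketch (not part of mdadm) that pulls the UUID and Events fields out
of each --examine dump and reports any device that disagrees with the
majority.  The function names parse_examine and stale_members are made up
for illustration, and the dumps are abbreviated to the two fields being
compared; the values are the ones shown above.

```python
import re
from collections import Counter


def parse_examine(text):
    """Pull the UUID and Events fields out of one `mdadm --examine` dump."""
    uuid = re.search(r"^\s*UUID : (\S+)", text, re.MULTILINE).group(1)
    events = int(re.search(r"^\s*Events : (\d+)", text, re.MULTILINE).group(1))
    return uuid, events


def stale_members(dumps):
    """Return devices whose (UUID, Events) pair differs from the majority."""
    parsed = {dev: parse_examine(text) for dev, text in dumps.items()}
    majority, _count = Counter(parsed.values()).most_common(1)[0]
    return sorted(dev for dev, key in parsed.items() if key != majority)


# Abbreviated dumps: only the two fields compared here, values from above.
CURRENT_UUID = "ed1e6357:74e32684:47f7b12e:9c2b2218"
SDE1_UUID = "e6e3df36:1195239f:47f7b12e:9c2b2218"
dumps = {
    "/dev/hde1": f"  UUID : {CURRENT_UUID} (local to host teresa)\n  Events : 10\n",
    "/dev/hdi1": f"  UUID : {CURRENT_UUID} (local to host teresa)\n  Events : 10\n",
    "/dev/sde1": f"  UUID : {SDE1_UUID} (local to host teresa)\n  Events : 8\n",
    "/dev/hdk1": f"  UUID : {CURRENT_UUID} (local to host teresa)\n  Events : 10\n",
    "/dev/hdg1": f"  UUID : {CURRENT_UUID} (local to host teresa)\n  Events : 10\n",
}

odd_ones_out = stale_members(dumps)
print(odd_ones_out)
```

With the values above, only /dev/sde1 is reported: it still carries the
superblock written by the 16:21:39 create, while the other four carry the
one written by the 16:23:50 create.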




>
>
>>
>> (2) Can I suggest improvements into resilvering?  Can I contribute code
>> to
>> implement them?  Such as resilver from the end of the drive back to the
>> front, so if you notice the wrong drive resilvering, you can stop and
>> not
>> lose the MBR and the directory format structure that's stored in the
>> first
>> few sectors?  I'd also like to take a look at adding a raid mode where
>> there's checksum in every stripe block so the system can detect
>> corrupted
>> disks and not resilver.  I'd also like to add a raid option where a
>> resilvering need will be reported by email and needs to be started
>> manually.  All to prevent what happened to me from happening again.
>>
>> Thanks for your time.
>>
>> Kenn Frank
>>
>> P.S.  Setup:
>>
>> # uname -a
>> Linux teresa 2.6.26-2-686 #1 SMP Sat Jun 11 14:54:10 UTC 2011 i686
>> GNU/Linux
>>
>> # mdadm --version
>> mdadm - v2.6.7.2 - 14th November 2008
>>
>> # mdadm --detail /dev/md3
>> /dev/md3:
>>         Version : 00.90
>>   Creation Time : Thu Sep 22 16:23:50 2011
>>      Raid Level : raid5
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>    Raid Devices : 5
>>   Total Devices : 4
>> Preferred Minor : 3
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Thu Sep 22 20:19:09 2011
>>           State : clean, degraded
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host
>> teresa)
>>          Events : 0.6
>>
>>     Number   Major   Minor   RaidDevice State
>>        0      33        1        0      active sync   /dev/hde1
>>        1      56        1        1      active sync   /dev/hdi1
>>        2       0        0        2      removed
>>        3      57        1        3      active sync   /dev/hdk1
>>        4      34        1        4      active sync   /dev/hdg1
>>
>>
>
>




* Re:
  2011-09-26  7:42   ` Kenn
@ 2011-09-26  8:04     ` NeilBrown
  2011-09-26 18:04       ` Re: Kenn
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2011-09-26  8:04 UTC (permalink / raw)
  To: kenn; +Cc: linux-raid


On Mon, 26 Sep 2011 00:42:23 -0700 "Kenn" <kenn@kenn.us> wrote:

> Replying.  I realize and I apologize I didn't create a subject.  I hope
> this doesn't confuse majordomo.
> 
> > On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
> >
> >> I have a raid5 array that had a drive drop out, and resilvered the wrong
> >> drive when I put it back in, corrupting and destroying the raid.  I
> >> stopped the array at less than 1% resilvering and I'm in the process of
> >> making a dd-copy of the drive to recover the files.
> >
> > I don't know what you mean by "resilvered".
> 
> Resilvering -- Rebuilding the array.  Lesser used term, sorry!

I see.

I guess that looking-glass mirrors have a silver backing and when it becomes
tarnished you might re-silver the mirror to make it better again.
So the name works as a poor pun for RAID1.  But I don't see how it applies
to RAID5....
No matter.

Basically you have messed up badly.
Recreating arrays should only be done as a last-ditch attempt to get data
back, and preferably with expert advice...

When you created the array with all devices present it effectively started
copying the corruption that you had deliberately (why??) placed on device 2
(sde) onto device 4 (counting from 0).
So now you have two devices that are corrupt in the early blocks.
There is not much you can do to fix that.
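
As a rough illustration of why that is, here is a toy sketch of one RAID5
stripe in Python.  It is not the md code: the 4-byte chunks, the contents,
and the xor_blocks helper are invented for the example, and parity is simply
placed on device 4 for this one stripe.  It only shows that recovery fills
the missing chunk with the XOR of the other four, so a wiped device 2 feeds
straight into what gets written to device 4.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a hypothetical 5-device array: four data chunks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stripe = data + [xor_blocks(data)]  # devices 0..4, parity on device 4 here

# Degraded but intact: the chunk on device 2 can be recomputed from the rest.
assert xor_blocks([stripe[i] for i in (0, 1, 3, 4)]) == stripe[2]

# Device 2 is overwritten (say by mkfs), then device 4 is "recovered" from 0..3.
damaged = list(stripe)
damaged[2] = b"\x00\x00\x00\x00"
rebuilt_4 = xor_blocks(damaged[:4])
assert rebuilt_4 != stripe[4]  # device 4 now matches the wiped device 2

# After that, recomputing device 2 only reproduces the wiped contents.
damaged[4] = rebuilt_4
recovered_2 = xor_blocks([damaged[i] for i in (0, 1, 3, 4)])
assert recovered_2 == damaged[2] and recovered_2 != stripe[2]
```

In this sketch, once device 4 has been rewritten the original contents of
device 2 for that stripe cannot be derived from anything left in the array.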

There is some chance that 'fsck' could find a backup superblock somewhere and
try to put the pieces back together.  But the 'mkfs' probably made a
substantial mess of important data structures so I don't consider your chances
very high.
Keeping sde out and just working with the remaining 4 is certainly your best
bet.

What made you think it would be a good idea to re-create the array when all
you wanted to do was trigger a resync/recovery??

NeilBrown


> 
> >
> >>
> >> (1) Is there anything diagnostic I can contribute to add more
> >> wrong-drive-resilvering protection to mdadm?  I have the command history
> >> showing everything I did, I have the five drives available for reading
> >> sectors, I haven't touched anything yet.
> >
> > Yes, report the command history, and any relevant kernel logs, and the
> > output
> > of "mdadm --examine" on all relevant devices.
> >
> > NeilBrown
> 
> Awesome!  I hope this is useful.  It's really long, so I edited down the
> logs and command history to what I thought were the important bits.  If
> you want more, I can post unedited versions, please let me know.
> 
> ### Command History ###
> 
> # The start of the sequence, removing sde from array
> mdadm --examine /dev/sde
> mdadm --detail /dev/md3
> cat /proc/mdstat
> mdadm /dev/md3 --remove /dev/sde1
> mdadm /dev/md3 --remove /dev/sde
> mdadm /dev/md3 --fail /dev/sde1
> cat /proc/mdstat
> mdadm --examine /dev/sde1
> fdisk -l | grep 750
> mdadm --examine /dev/sde1
> mdadm --remove /dev/sde
> mdadm /dev/md3 --remove /dev/sde
> mdadm /dev/md3 --fail /dev/sde
> fdisk /dev/sde
> ls
> vi /var/log/syslog
> reboot
> vi /var/log/syslog
> reboot
> mdadm --detail /dev/md3
> mdadm --examine /dev/sde1
> # Wiping sde
> fdisk /dev/sde
> newfs -t ext3 /dev/sde1
> mkfs -t ext3 /dev/sde1
> mkfs -t ext3 /dev/sde2
> fdisk /dev/sde
> mdadm --stop /dev/md3
> # Putting sde back into array
> mdadm --examine /dev/sde
> mdadm --help
> mdadm --misc --help
> mdadm --zero-superblock /dev/sde
> mdadm --query /dev/sde
> mdadm --examine /dev/sde
> mdadm --detail /dev/sde
> mdadm --detail /dev/sde1
> fdisk /dev/sde
> mdadm --assemble --no-degraded /dev/md3  /dev/hde1 /dev/hdi1 /dev/sde1
> /dev/hdk1 /dev/hdg1
> cat /proc/mdstat
> mdadm --stop /dev/md3
> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1
> missing /dev/hdk1 /dev/hdg1
> mount -o ro /raid53
> ls /raid53
> umount /raid53
> mdadm --stop /dev/md3
> # The command that did the bad rebuild
> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1
> /dev/sde1 /dev/hdk1 /dev/hdg1
> cat /proc/mdstat
> mdadm --examine /dev/md3
> mdadm --query /dev/md3
> mdadm --detail /dev/md3
> mount /raid53
> mdadm --stop /dev/md3
> # Trying to get the corrupted disk back up
> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1
> missing /dev/hdk1 /dev/hdg1
> cat /proc/mdstat
> mount /raid53
> fsck -n /dev/md3
> 
> 
> 
> ### KERNEL LOGS ###
> 
> # Me messing around with fdisk and mdadm creating new partitions to wipe
> out sde
> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] 1465149168
> 512-byte hardware sectors (750156 MB)
> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write
> Protect is off
> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Mode
> Sense: 00 3a 00 00
> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write
> cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 22 15:56:39 teresa kernel: [ 7897.778204]  sde: sde1 sde2
> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] 1465149168
> 512-byte hardware sectors (750156 MB)
> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write
> Protect is off
> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Mode
> Sense: 00 3a 00 00
> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write
> cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 22 15:56:41 teresa kernel: [ 7899.848026]  sde: sde1 sde2
> Sep 22 16:01:49 teresa kernel: [ 8207.733821] sd 5:0:0:0: [sde] 1465149168
> 512-byte hardware sectors (750156 MB)
> Sep 22 16:01:49 teresa kernel: [ 8207.733919] sd 5:0:0:0: [sde] Write
> Protect is off
> Sep 22 16:01:49 teresa kernel: [ 8207.733943] sd 5:0:0:0: [sde] Mode
> Sense: 00 3a 00 00
> Sep 22 16:01:49 teresa kernel: [ 8207.734039] sd 5:0:0:0: [sde] Write
> cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 22 16:01:49 teresa kernel: [ 8207.734083]  sde: sde1
> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] 1465149168
> 512-byte hardware sectors (750156 MB)
> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write
> Protect is off
> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Mode
> Sense: 00 3a 00 00
> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write
> cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Sep 22 16:01:51 teresa kernel: [ 8209.777260]  sde: sde1
> Sep 22 16:02:09 teresa mdadm[2694]: DeviceDisappeared event detected on md
> device /dev/md3
> Sep 22 16:02:09 teresa kernel: [ 8227.781860] md: md3 stopped.
> Sep 22 16:02:09 teresa kernel: [ 8227.781908] md: unbind<hde1>
> Sep 22 16:02:09 teresa kernel: [ 8227.781937] md: export_rdev(hde1)
> Sep 22 16:02:09 teresa kernel: [ 8227.782261] md: unbind<hdg1>
> Sep 22 16:02:09 teresa kernel: [ 8227.782292] md: export_rdev(hdg1)
> Sep 22 16:02:09 teresa kernel: [ 8227.782561] md: unbind<hdk1>
> Sep 22 16:02:09 teresa kernel: [ 8227.782590] md: export_rdev(hdk1)
> Sep 22 16:02:09 teresa kernel: [ 8227.782855] md: unbind<hdi1>
> Sep 22 16:02:09 teresa kernel: [ 8227.782885] md: export_rdev(hdi1)
> Sep 22 16:15:32 teresa smartd[2657]: Device: /dev/hda, Failed SMART usage
> Attribute: 194 Temperature_Celsius.
> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/hdk, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 110 to 111
> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdb, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 113 to 116
> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdc, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 52 to 51
> Sep 22 16:17:01 teresa /USR/SBIN/CRON[2965]: (root) CMD (   cd / &&
> run-parts --report /etc/cron.hourly)
> Sep 22 16:18:42 teresa kernel: [ 9220.400915] md: md3 stopped.
> Sep 22 16:18:42 teresa kernel: [ 9220.411525] md: bind<hdi1>
> Sep 22 16:18:42 teresa kernel: [ 9220.411884] md: bind<sde1>
> Sep 22 16:18:42 teresa kernel: [ 9220.412577] md: bind<hdk1>
> Sep 22 16:18:42 teresa kernel: [ 9220.413162] md: bind<hdg1>
> Sep 22 16:18:42 teresa kernel: [ 9220.413750] md: bind<hde1>
> Sep 22 16:18:42 teresa kernel: [ 9220.413855] md: kicking non-fresh sde1
> from array!
> Sep 22 16:18:42 teresa kernel: [ 9220.413887] md: unbind<sde1>
> Sep 22 16:18:42 teresa kernel: [ 9220.413915] md: export_rdev(sde1)
> Sep 22 16:18:42 teresa kernel: [ 9220.477393] raid5: device hde1
> operational as raid disk 0
> Sep 22 16:18:42 teresa kernel: [ 9220.477420] raid5: device hdg1
> operational as raid disk 4
> Sep 22 16:18:42 teresa kernel: [ 9220.477438] raid5: device hdk1
> operational as raid disk 3
> Sep 22 16:18:42 teresa kernel: [ 9220.477456] raid5: device hdi1
> operational as raid disk 1
> Sep 22 16:18:42 teresa kernel: [ 9220.478236] raid5: allocated 5252kB for md3
> Sep 22 16:18:42 teresa kernel: [ 9220.478265] raid5: raid level 5 set md3
> active with 4 out of 5 devices, algorithm 2
> Sep 22 16:18:42 teresa kernel: [ 9220.478294] RAID5 conf printout:
> Sep 22 16:18:42 teresa kernel: [ 9220.478309]  --- rd:5 wd:4
> Sep 22 16:18:42 teresa kernel: [ 9220.478324]  disk 0, o:1, dev:hde1
> Sep 22 16:18:42 teresa kernel: [ 9220.478339]  disk 1, o:1, dev:hdi1
> Sep 22 16:18:42 teresa kernel: [ 9220.478354]  disk 3, o:1, dev:hdk1
> Sep 22 16:18:42 teresa kernel: [ 9220.478369]  disk 4, o:1, dev:hdg1
> # Me stopping md3
> Sep 22 16:18:53 teresa mdadm[2694]: DeviceDisappeared event detected on md
> device /dev/md3
> Sep 22 16:18:53 teresa kernel: [ 9231.572348] md: md3 stopped.
> Sep 22 16:18:53 teresa kernel: [ 9231.572394] md: unbind<hde1>
> Sep 22 16:18:53 teresa kernel: [ 9231.572423] md: export_rdev(hde1)
> Sep 22 16:18:53 teresa kernel: [ 9231.572728] md: unbind<hdg1>
> Sep 22 16:18:53 teresa kernel: [ 9231.572758] md: export_rdev(hdg1)
> Sep 22 16:18:53 teresa kernel: [ 9231.572988] md: unbind<hdk1>
> Sep 22 16:18:53 teresa kernel: [ 9231.573015] md: export_rdev(hdk1)
> Sep 22 16:18:53 teresa kernel: [ 9231.573243] md: unbind<hdi1>
> Sep 22 16:18:53 teresa kernel: [ 9231.573270] md: export_rdev(hdi1)
> # Me creating md3 with sde1 missing
> Sep 22 16:19:51 teresa kernel: [ 9289.621646] md: bind<hde1>
> Sep 22 16:19:51 teresa kernel: [ 9289.665268] md: bind<hdi1>
> Sep 22 16:19:51 teresa kernel: [ 9289.695676] md: bind<hdk1>
> Sep 22 16:19:51 teresa kernel: [ 9289.726906] md: bind<hdg1>
> Sep 22 16:19:51 teresa kernel: [ 9289.809030] raid5: device hdg1
> operational as raid disk 4
> Sep 22 16:19:51 teresa kernel: [ 9289.809057] raid5: device hdk1
> operational as raid disk 3
> Sep 22 16:19:51 teresa kernel: [ 9289.809075] raid5: device hdi1
> operational as raid disk 1
> Sep 22 16:19:51 teresa kernel: [ 9289.809093] raid5: device hde1
> operational as raid disk 0
> Sep 22 16:19:51 teresa kernel: [ 9289.809821] raid5: allocated 5252kB for md3
> Sep 22 16:19:51 teresa kernel: [ 9289.809850] raid5: raid level 5 set md3
> active with 4 out of 5 devices, algorithm 2
> Sep 22 16:19:51 teresa kernel: [ 9289.809877] RAID5 conf printout:
> Sep 22 16:19:51 teresa kernel: [ 9289.809891]  --- rd:5 wd:4
> Sep 22 16:19:51 teresa kernel: [ 9289.809907]  disk 0, o:1, dev:hde1
> Sep 22 16:19:51 teresa kernel: [ 9289.809922]  disk 1, o:1, dev:hdi1
> Sep 22 16:19:51 teresa kernel: [ 9289.809937]  disk 3, o:1, dev:hdk1
> Sep 22 16:19:51 teresa kernel: [ 9289.809953]  disk 4, o:1, dev:hdg1
> Sep 22 16:20:20 teresa kernel: [ 9318.486512] kjournald starting.  Commit
> interval 5 seconds
> Sep 22 16:20:20 teresa kernel: [ 9318.486512] EXT3-fs: mounted filesystem
> with ordered data mode.
> # Me stopping md3 again
> Sep 22 16:20:42 teresa mdadm[2694]: DeviceDisappeared event detected on md
> device /dev/md3
> Sep 22 16:20:42 teresa kernel: [ 9340.300590] md: md3 stopped.
> Sep 22 16:20:42 teresa kernel: [ 9340.300639] md: unbind<hdg1>
> Sep 22 16:20:42 teresa kernel: [ 9340.300668] md: export_rdev(hdg1)
> Sep 22 16:20:42 teresa kernel: [ 9340.300921] md: unbind<hdk1>
> Sep 22 16:20:42 teresa kernel: [ 9340.300950] md: export_rdev(hdk1)
> Sep 22 16:20:42 teresa kernel: [ 9340.301183] md: unbind<hdi1>
> Sep 22 16:20:42 teresa kernel: [ 9340.301211] md: export_rdev(hdi1)
> Sep 22 16:20:42 teresa kernel: [ 9340.301438] md: unbind<hde1>
> Sep 22 16:20:42 teresa kernel: [ 9340.301465] md: export_rdev(hde1)
> # This is me doing the fatal create, that recovers the wrong disk
> Sep 22 16:21:39 teresa kernel: [ 9397.609864] md: bind<hde1>
> Sep 22 16:21:39 teresa kernel: [ 9397.652426] md: bind<hdi1>
> Sep 22 16:21:39 teresa kernel: [ 9397.673203] md: bind<sde1>
> Sep 22 16:21:39 teresa kernel: [ 9397.699373] md: bind<hdk1>
> Sep 22 16:21:39 teresa kernel: [ 9397.739372] md: bind<hdg1>
> Sep 22 16:21:39 teresa kernel: [ 9397.801729] raid5: device hdk1
> operational as raid disk 3
> Sep 22 16:21:39 teresa kernel: [ 9397.801756] raid5: device sde1
> operational as raid disk 2
> Sep 22 16:21:39 teresa kernel: [ 9397.801774] raid5: device hdi1
> operational as raid disk 1
> Sep 22 16:21:39 teresa kernel: [ 9397.801793] raid5: device hde1
> operational as raid disk 0
> Sep 22 16:21:39 teresa kernel: [ 9397.802531] raid5: allocated 5252kB for md3
> Sep 22 16:21:39 teresa kernel: [ 9397.802559] raid5: raid level 5 set md3
> active with 4 out of 5 devices, algorithm 2
> Sep 22 16:21:39 teresa kernel: [ 9397.802586] RAID5 conf printout:
> Sep 22 16:21:39 teresa kernel: [ 9397.802600]  --- rd:5 wd:4
> Sep 22 16:21:39 teresa kernel: [ 9397.802615]  disk 0, o:1, dev:hde1
> Sep 22 16:21:39 teresa kernel: [ 9397.802631]  disk 1, o:1, dev:hdi1
> Sep 22 16:21:39 teresa kernel: [ 9397.802646]  disk 2, o:1, dev:sde1
> Sep 22 16:21:39 teresa kernel: [ 9397.802661]  disk 3, o:1, dev:hdk1
> Sep 22 16:21:39 teresa kernel: [ 9397.838429] RAID5 conf printout:
> Sep 22 16:21:39 teresa kernel: [ 9397.838454]  --- rd:5 wd:4
> Sep 22 16:21:39 teresa kernel: [ 9397.838471]  disk 0, o:1, dev:hde1
> Sep 22 16:21:39 teresa kernel: [ 9397.838486]  disk 1, o:1, dev:hdi1
> Sep 22 16:21:39 teresa kernel: [ 9397.838502]  disk 2, o:1, dev:sde1
> Sep 22 16:21:39 teresa kernel: [ 9397.838518]  disk 3, o:1, dev:hdk1
> Sep 22 16:21:39 teresa kernel: [ 9397.838533]  disk 4, o:1, dev:hdg1
> Sep 22 16:21:39 teresa mdadm[2694]: RebuildStarted event detected on md
> device /dev/md3
> Sep 22 16:21:39 teresa kernel: [ 9397.841822] md: recovery of RAID array md3
> Sep 22 16:21:39 teresa kernel: [ 9397.841848] md: minimum _guaranteed_ 
> speed: 1000 KB/sec/disk.
> Sep 22 16:21:39 teresa kernel: [ 9397.841868] md: using maximum available
> idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Sep 22 16:21:39 teresa kernel: [ 9397.841908] md: using 128k window, over
> a total of 732571904 blocks.
> Sep 22 16:22:33 teresa kernel: [ 9451.640192] EXT3-fs error (device md3):
> ext3_check_descriptors: Block bitmap for group 3968 not in group (block
> 0)!
> Sep 22 16:22:33 teresa kernel: [ 9451.750241] EXT3-fs: group descriptors
> corrupted!
> Sep 22 16:22:39 teresa kernel: [ 9458.079151] md: md_do_sync() got signal
> ... exiting
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: md3 stopped.
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdg1>
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdg1)
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdk1>
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdk1)
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<sde1>
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(sde1)
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdi1>
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdi1)
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hde1>
> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hde1)
> Sep 22 16:22:39 teresa mdadm[2694]: DeviceDisappeared event detected on md
> device /dev/md3
> # Me trying to recreate md3 without sde
> Sep 22 16:23:50 teresa kernel: [ 9529.065477] md: bind<hde1>
> Sep 22 16:23:50 teresa kernel: [ 9529.107767] md: bind<hdi1>
> Sep 22 16:23:50 teresa kernel: [ 9529.137743] md: bind<hdk1>
> Sep 22 16:23:50 teresa kernel: [ 9529.177990] md: bind<hdg1>
> Sep 22 16:23:51 teresa mdadm[2694]: RebuildFinished event detected on md
> device /dev/md3
> Sep 22 16:23:51 teresa kernel: [ 9529.240814] raid5: device hdg1
> operational as raid disk 4
> Sep 22 16:23:51 teresa kernel: [ 9529.241734] raid5: device hdk1
> operational as raid disk 3
> Sep 22 16:23:51 teresa kernel: [ 9529.241752] raid5: device hdi1
> operational as raid disk 1
> Sep 22 16:23:51 teresa kernel: [ 9529.241770] raid5: device hde1
> operational as raid disk 0
> Sep 22 16:23:51 teresa kernel: [ 9529.242520] raid5: allocated 5252kB for md3
> Sep 22 16:23:51 teresa kernel: [ 9529.242547] raid5: raid level 5 set md3
> active with 4 out of 5 devices, algorithm 2
> Sep 22 16:23:51 teresa kernel: [ 9529.242574] RAID5 conf printout:
> Sep 22 16:23:51 teresa kernel: [ 9529.242588]  --- rd:5 wd:4
> Sep 22 16:23:51 teresa kernel: [ 9529.242603]  disk 0, o:1, dev:hde1
> Sep 22 16:23:51 teresa kernel: [ 9529.242618]  disk 1, o:1, dev:hdi1
> Sep 22 16:23:51 teresa kernel: [ 9529.242633]  disk 3, o:1, dev:hdk1
> Sep 22 16:23:51 teresa kernel: [ 9529.242649]  disk 4, o:1, dev:hdg1
> # And me trying a fsck -n or a mount
> Sep 22 16:24:07 teresa kernel: [ 9545.326343] EXT3-fs error (device md3):
> ext3_check_descriptors: Block bitmap for group 3968 not in group (block
> 0)!
> Sep 22 16:24:07 teresa kernel: [ 9545.369071] EXT3-fs: group descriptors
> corrupted!
> 
> 
> ### EXAMINES OF PARTITIONS ###
> 
> === --examine /dev/hde1 ===
> /dev/hde1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>   Creation Time : Thu Sep 22 16:23:50 2011
>      Raid Level : raid5
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 3
> 
>     Update Time : Sun Sep 25 22:11:22 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : b7f6a3c0 - correct
>          Events : 10
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0      33        1        0      active sync   /dev/hde1
> 
>    0     0      33        1        0      active sync   /dev/hde1
>    1     1      56        1        1      active sync   /dev/hdi1
>    2     2       0        0        2      faulty removed
>    3     3      57        1        3      active sync   /dev/hdk1
>    4     4      34        1        4      active sync   /dev/hdg1
> 
> === --examine /dev/hdi1 ===
> /dev/hdi1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>   Creation Time : Thu Sep 22 16:23:50 2011
>      Raid Level : raid5
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 3
> 
>     Update Time : Sun Sep 25 22:11:22 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : b7f6a3d9 - correct
>          Events : 10
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     1      56        1        1      active sync   /dev/hdi1
> 
>    0     0      33        1        0      active sync   /dev/hde1
>    1     1      56        1        1      active sync   /dev/hdi1
>    2     2       0        0        2      faulty removed
>    3     3      57        1        3      active sync   /dev/hdk1
>    4     4      34        1        4      active sync   /dev/hdg1
> 
> === --examine /dev/sde1 ===
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : e6e3df36:1195239f:47f7b12e:9c2b2218 (local to host teresa)
>   Creation Time : Thu Sep 22 16:21:39 2011
>      Raid Level : raid5
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 3
> 
>     Update Time : Thu Sep 22 16:22:39 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : 4e69d679 - correct
>          Events : 8
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>    0     0      33        1        0      active sync   /dev/hde1
>    1     1      56        1        1      active sync   /dev/hdi1
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3      57        1        3      active sync   /dev/hdk1
>    4     4       0        0        4      faulty removed
>    5     5      34        1        5      spare   /dev/hdg1
> 
> === --examine /dev/hdk1 ===
> /dev/hdk1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>   Creation Time : Thu Sep 22 16:23:50 2011
>      Raid Level : raid5
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 3
> 
>     Update Time : Sun Sep 25 22:11:22 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : b7f6a3de - correct
>          Events : 10
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3      57        1        3      active sync   /dev/hdk1
> 
>    0     0      33        1        0      active sync   /dev/hde1
>    1     1      56        1        1      active sync   /dev/hdi1
>    2     2       0        0        2      faulty removed
>    3     3      57        1        3      active sync   /dev/hdk1
>    4     4      34        1        4      active sync   /dev/hdg1
> 
> === --examine /dev/hdg1 ===
> /dev/hdg1:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>   Creation Time : Thu Sep 22 16:23:50 2011
>      Raid Level : raid5
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 3
> 
>     Update Time : Sun Sep 25 22:11:22 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : b7f6a3c9 - correct
>          Events : 10
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4      34        1        4      active sync   /dev/hdg1
> 
>    0     0      33        1        0      active sync   /dev/hde1
>    1     1      56        1        1      active sync   /dev/hdi1
>    2     2       0        0        2      faulty removed
>    3     3      57        1        3      active sync   /dev/hdk1
>    4     4      34        1        4      active sync   /dev/hdg1
> 
> 
> 
> 
> >
> >
> >>
> >> (2) Can I suggest improvements into resilvering?  Can I contribute
> >> code to implement them?  Such as resilver from the end of the drive
> >> back to the front, so if you notice the wrong drive resilvering, you
> >> can stop and not lose the MBR and the directory format structure
> >> that's stored in the first few sectors?  I'd also like to take a look
> >> at adding a raid mode where there's checksum in every stripe block so
> >> the system can detect corrupted disks and not resilver.  I'd also like
> >> to add a raid option where a resilvering need will be reported by
> >> email and needs to be started manually.  All to prevent what happened
> >> to me from happening again.
> >>
> >> Thanks for your time.
> >>
> >> Kenn Frank
> >>
> >> P.S.  Setup:
> >>
> >> # uname -a
> >> Linux teresa 2.6.26-2-686 #1 SMP Sat Jun 11 14:54:10 UTC 2011 i686 GNU/Linux
> >>
> >> # mdadm --version
> >> mdadm - v2.6.7.2 - 14th November 2008
> >>
> >> # mdadm --detail /dev/md3
> >> /dev/md3:
> >>         Version : 00.90
> >>   Creation Time : Thu Sep 22 16:23:50 2011
> >>      Raid Level : raid5
> >>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
> >>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
> >>    Raid Devices : 5
> >>   Total Devices : 4
> >> Preferred Minor : 3
> >>     Persistence : Superblock is persistent
> >>
> >>     Update Time : Thu Sep 22 20:19:09 2011
> >>           State : clean, degraded
> >>  Active Devices : 4
> >> Working Devices : 4
> >>  Failed Devices : 0
> >>   Spare Devices : 0
> >>
> >>          Layout : left-symmetric
> >>      Chunk Size : 64K
> >>
> >>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
> >>          Events : 0.6
> >>
> >>     Number   Major   Minor   RaidDevice State
> >>        0      33        1        0      active sync   /dev/hde1
> >>        1      56        1        1      active sync   /dev/hdi1
> >>        2       0        0        2      removed
> >>        3      57        1        3      active sync   /dev/hdk1
> >>        4      34        1        4      active sync   /dev/hdg1
> >>
> >>
> >
> >
> 



* Re:
  2011-09-26  8:04     ` Re: NeilBrown
@ 2011-09-26 18:04       ` Kenn
  2011-09-26 19:56         ` Re: David Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Kenn @ 2011-09-26 18:04 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb

> On Mon, 26 Sep 2011 00:42:23 -0700 "Kenn" <kenn@kenn.us> wrote:
>
>> Replying.  I realize and I apologize I didn't create a subject.  I hope
>> this doesn't confuse majordomo.
>>
>> > On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn" <kenn@kenn.us> wrote:
>> >
>> >> I have a raid5 array that had a drive drop out, and resilvered the
>> >> wrong drive when I put it back in, corrupting and destroying the
>> >> raid.  I stopped the array at less than 1% resilvering and I'm in the
>> >> process of making a dd-copy of the drive to recover the files.
>> >
>> > I don't know what you mean by "resilvered".
>>
>> Resilvering -- Rebuilding the array.  Lesser used term, sorry!
>
> I see..
>
> I guess that looking-glass mirrors have a silver backing and when it
> becomes tarnished you might re-silver the mirror to make it better again.
> So the name works as a poor pun for RAID1.  But I don't see how it applies
> to RAID5....
> No matter.
>
> Basically you have messed up badly.
> Recreating arrays should only be done as a last-ditch attempt to get data
> back, and preferably with expert advice...
>
> When you created the array with all devices present it effectively started
> copying the corruption that you had deliberately (why??) placed on device 2
> (sde) onto device 4 (counting from 0).
> So now you have two devices that are corrupt in the early blocks.
> There is not much you can do to fix that.
>
> There is some chance that 'fsck' could find a backup superblock somewhere
> and try to put the pieces back together.  But the 'mkfs' probably made a
> substantial mess of important data structures so I don't consider your
> chances very high.
> Keeping sde out and just working with the remaining 4 is certainly your
> best bet.
>
> What made you think it would be a good idea to re-create the array when
> all you wanted to do was trigger a resync/recovery??
>
> NeilBrown

Originally I had failed and removed sde from the array and then added it
back in, but no resilvering happened: it was just placed as raid device
#5, as an active (faulty?) spare, with no rebuilding.  So I thought I'd
have to recreate the array to get it to rebuild.
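
For reference, the --examine dumps quoted elsewhere in this thread show
where this ended up: sde1 carries a different array UUID and a lower event
count than the other four members.  Below is a minimal Python sketch of a
check that flags such a member before anything is re-created.  It assumes
only the two superblock fields shown (values copied from those dumps), and
`parse` and `odd_members` are hypothetical helper names, not part of mdadm.

```python
import re
from collections import Counter

# Two fields per member, copied from the --examine dumps in this thread.
GOOD = "UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)\nEvents : 10\n"
EXAMINE = {
    "/dev/hde1": GOOD,
    "/dev/hdi1": GOOD,
    "/dev/sde1": "UUID : e6e3df36:1195239f:47f7b12e:9c2b2218 (local to host teresa)\nEvents : 8\n",
    "/dev/hdk1": GOOD,
    "/dev/hdg1": GOOD,
}


def parse(text):
    """Pull the array UUID and event count out of one --examine excerpt."""
    uuid = re.search(r"UUID : ([0-9a-f:]+)", text).group(1)
    events = int(re.search(r"Events : (\d+)", text).group(1))
    return uuid, events


def odd_members(dumps):
    """Return the devices whose (UUID, events) pair differs from the majority."""
    parsed = {dev: parse(text) for dev, text in dumps.items()}
    majority, _count = Counter(parsed.values()).most_common(1)[0]
    return sorted(dev for dev, fields in parsed.items() if fields != majority)


print(odd_members(EXAMINE))  # ['/dev/sde1']
```

A member that shows up in this list is one whose contents should not be
trusted as a source for a rebuild without a closer look.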

Because my sde disk was only questionably healthy (the problem may just
have been the loose cable), I wanted to test it by having a complete
rebuild written onto it.  I was confident in all the other drives because
when I mounted the array without sde, I ran a complete md5sum scan and
every checksum was correct.  So I wanted to force a complete rebuild of
the array onto sde, and the --zero-superblock was supposed to render sde
"new" to the array to force that rebuild.  I just did the fsck and mkfs
for good measure instead of spending the time to zero every byte on the
drive with dd.  At the time I thought that if --zero-superblock went
wrong, md would reject a blank drive as a data source for rebuilding and
so prevent a bad resilver.
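
What the create with all five members then did, as Neil describes it, can
be sketched with plain XOR parity.  This is a Python illustration of the
arithmetic only, not md's code: one stripe, made-up four-byte chunks, and
the assumption that member 4 holds the parity for this stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, as RAID5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across five members: four data chunks plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

# Intended case: member 2 is the missing one and is rebuilt from the rest.
rebuilt_2 = xor_blocks([c for i, c in enumerate(stripe) if i != 2])
print(rebuilt_2 == b"CCCC")  # True

# Case in this thread: member 2 holds unrelated bytes (wiped by mkfs) and
# member 4 is the one being "recovered" from members 0-3.
stripe[2] = b"\x00\x00\x00\x00"
recovered_4 = xor_blocks(stripe[:4])
print(recovered_4 == parity)  # False: good parity is replaced by garbage
```

Parity has no notion of which input is trustworthy, so recovery faithfully
computes a new member 4 from whatever members 0-3 contain.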

So that brings up another point -- I've been reading through your blog,
and I acknowledge your thoughts on there not being much benefit to
checksums on every block (http://neil.brown.name/blog/20110227114201), but
sometimes people like having that extra lock on their door even though it
takes more effort to go in and out of their home.  In my five-drive array,
if the last five words were the checksums of the blocks on every drive,
the checksums off each drive could vote on trusting the blocks of every
other drive during the rebuild process, and prevent an idiot (me) from
killing his data.  It would waste some sectors on the drive, and perhaps
harm performance by squeezing 2+n bytes out of each sector, but for
someone who wants to protect their data as much as possible, it would be a
welcome option where performance is not a priority.
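
A rough Python sketch of the voting idea, to make the proposal concrete.
It is not how md works: the per-member CRC32 table, the dict layout, and
the names `make_stripe` and `distrusted` are all assumptions made up for
this illustration.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_chunks):
    """Build one stripe; every member also stores a CRC32 of every chunk."""
    chunks = list(data_chunks) + [xor_blocks(data_chunks)]
    table = [zlib.crc32(c) for c in chunks]
    return [{"chunk": c, "table": list(table)} for c in chunks]


def distrusted(stripe):
    """Return indexes of members that a majority of the others vote against."""
    bad = []
    for i, member in enumerate(stripe):
        actual = zlib.crc32(member["chunk"])
        voters = [m for j, m in enumerate(stripe) if j != i]
        against = sum(1 for m in voters if m["table"][i] != actual)
        if against > len(voters) / 2:
            bad.append(i)
    return bad


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
# Member 2 gets overwritten wholesale, chunk and checksum table alike.
stripe[2] = {"chunk": b"mkfs", "table": [0] * 5}

suspects = distrusted(stripe)
print(suspects)  # [2]

# Rebuild only from the members that survived the vote.
repaired = xor_blocks([m["chunk"] for j, m in enumerate(stripe) if j not in suspects])
print(repaired)  # b'CCCC'
```

With a table like this, a rebuild could refuse to use member 2 as a source
instead of recovering another member from it.  The cost is the extra space
and the extra writes needed to keep every table current.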

Also, the checksums do provide some protection: first, against partial
media failure, which is a major flaw in the raid 4/5/6 design according to
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt , and second,
checksum voting could protect against the atomicity/write-in-place flaw
outlined in http://en.wikipedia.org/wiki/RAID#Problems_with_RAID .

What do you think?

Kenn

>
>
>>
>> >
>> >>
>> >> (1) Is there anything diagnostic I can contribute to add more
>> >> wrong-drive-resilvering protection to mdadm?  I have the command
>> >> history showing everything I did, I have the five drives available
>> >> for reading sectors, I haven't touched anything yet.
>> >
>> > Yes, report the command history, and any relevant kernel logs, and the
>> > output of "mdadm --examine" on all relevant devices.
>> >
>> > NeilBrown
>>
>> Awesome!  I hope this is useful.  It's really long, so I edited down the
>> logs and command history to what I thought were the important bits.  If
>> you want more, I can post unedited versions, please let me know.
>>
>> ### Command History ###
>>
>> # The start of the sequence, removing sde from array
>> mdadm --examine /dev/sde
>> mdadm --detail /dev/md3
>> cat /proc/mdstat
>> mdadm /dev/md3 --remove /dev/sde1
>> mdadm /dev/md3 --remove /dev/sde
>> mdadm /dev/md3 --fail /dev/sde1
>> cat /proc/mdstat
>> mdadm --examine /dev/sde1
>> fdisk -l | grep 750
>> mdadm --examine /dev/sde1
>> mdadm --remove /dev/sde
>> mdadm /dev/md3 --remove /dev/sde
>> mdadm /dev/md3 --fail /dev/sde
>> fdisk /dev/sde
>> ls
>> vi /var/log/syslog
>> reboot
>> vi /var/log/syslog
>> reboot
>> mdadm --detail /dev/md3
>> mdadm --examine /dev/sde1
>> # Wiping sde
>> fdisk /dev/sde
>> newfs -t ext3 /dev/sde1
>> mkfs -t ext3 /dev/sde1
>> mkfs -t ext3 /dev/sde2
>> fdisk /dev/sde
>> mdadm --stop /dev/md3
>> # Putting sde back into array
>> mdadm --examine /dev/sde
>> mdadm --help
>> mdadm --misc --help
>> mdadm --zero-superblock /dev/sde
>> mdadm --query /dev/sde
>> mdadm --examine /dev/sde
>> mdadm --detail /dev/sde
>> mdadm --detail /dev/sde1
>> fdisk /dev/sde
>> mdadm --assemble --no-degraded /dev/md3  /dev/hde1 /dev/hdi1 /dev/sde1 /dev/hdk1 /dev/hdg1
>> cat /proc/mdstat
>> mdadm --stop /dev/md3
>> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 missing /dev/hdk1 /dev/hdg1
>> mount -o ro /raid53
>> ls /raid53
>> umount /raid53
>> mdadm --stop /dev/md3
>> # The command that did the bad rebuild
>> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 /dev/sde1 /dev/hdk1 /dev/hdg1
>> cat /proc/mdstat
>> mdadm --examine /dev/md3
>> mdadm --query /dev/md3
>> mdadm --detail /dev/md3
>> mount /raid53
>> mdadm --stop /dev/md3
>> # Trying to get the corrupted disk back up
>> mdadm --create /dev/md3 --level=5 --raid-devices=5  /dev/hde1 /dev/hdi1 missing /dev/hdk1 /dev/hdg1
>> cat /proc/mdstat
>> mount /raid53
>> fsck -n /dev/md3
>>
>>
>>
>> ### KERNEL LOGS ###
>>
>> # Me messing around with fdisk and mdadm creating new partitions to wipe out sde
>> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
>> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write Protect is off
>> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
>> Sep 22 15:56:39 teresa kernel: [ 7897.778204] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Sep 22 15:56:39 teresa kernel: [ 7897.778204]  sde: sde1 sde2
>> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
>> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write Protect is off
>> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
>> Sep 22 15:56:41 teresa kernel: [ 7899.848026] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Sep 22 15:56:41 teresa kernel: [ 7899.848026]  sde: sde1 sde2
>> Sep 22 16:01:49 teresa kernel: [ 8207.733821] sd 5:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
>> Sep 22 16:01:49 teresa kernel: [ 8207.733919] sd 5:0:0:0: [sde] Write Protect is off
>> Sep 22 16:01:49 teresa kernel: [ 8207.733943] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
>> Sep 22 16:01:49 teresa kernel: [ 8207.734039] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Sep 22 16:01:49 teresa kernel: [ 8207.734083]  sde: sde1
>> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB)
>> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write Protect is off
>> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00
>> Sep 22 16:01:51 teresa kernel: [ 8209.777260] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> Sep 22 16:01:51 teresa kernel: [ 8209.777260]  sde: sde1
>> Sep 22 16:02:09 teresa mdadm[2694]: DeviceDisappeared event detected on md device /dev/md3
>> Sep 22 16:02:09 teresa kernel: [ 8227.781860] md: md3 stopped.
>> Sep 22 16:02:09 teresa kernel: [ 8227.781908] md: unbind<hde1>
>> Sep 22 16:02:09 teresa kernel: [ 8227.781937] md: export_rdev(hde1)
>> Sep 22 16:02:09 teresa kernel: [ 8227.782261] md: unbind<hdg1>
>> Sep 22 16:02:09 teresa kernel: [ 8227.782292] md: export_rdev(hdg1)
>> Sep 22 16:02:09 teresa kernel: [ 8227.782561] md: unbind<hdk1>
>> Sep 22 16:02:09 teresa kernel: [ 8227.782590] md: export_rdev(hdk1)
>> Sep 22 16:02:09 teresa kernel: [ 8227.782855] md: unbind<hdi1>
>> Sep 22 16:02:09 teresa kernel: [ 8227.782885] md: export_rdev(hdi1)
>> Sep 22 16:15:32 teresa smartd[2657]: Device: /dev/hda, Failed SMART usage Attribute: 194 Temperature_Celsius.
>> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/hdk, SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 111
>> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 116
>> Sep 22 16:15:33 teresa smartd[2657]: Device: /dev/sdc, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 52 to 51
>> Sep 22 16:17:01 teresa /USR/SBIN/CRON[2965]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
>> Sep 22 16:18:42 teresa kernel: [ 9220.400915] md: md3 stopped.
>> Sep 22 16:18:42 teresa kernel: [ 9220.411525] md: bind<hdi1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.411884] md: bind<sde1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.412577] md: bind<hdk1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.413162] md: bind<hdg1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.413750] md: bind<hde1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.413855] md: kicking non-fresh sde1 from array!
>> Sep 22 16:18:42 teresa kernel: [ 9220.413887] md: unbind<sde1>
>> Sep 22 16:18:42 teresa kernel: [ 9220.413915] md: export_rdev(sde1)
>> Sep 22 16:18:42 teresa kernel: [ 9220.477393] raid5: device hde1 operational as raid disk 0
>> Sep 22 16:18:42 teresa kernel: [ 9220.477420] raid5: device hdg1 operational as raid disk 4
>> Sep 22 16:18:42 teresa kernel: [ 9220.477438] raid5: device hdk1 operational as raid disk 3
>> Sep 22 16:18:42 teresa kernel: [ 9220.477456] raid5: device hdi1 operational as raid disk 1
>> Sep 22 16:18:42 teresa kernel: [ 9220.478236] raid5: allocated 5252kB for md3
>> Sep 22 16:18:42 teresa kernel: [ 9220.478265] raid5: raid level 5 set md3 active with 4 out of 5 devices, algorithm 2
>> Sep 22 16:18:42 teresa kernel: [ 9220.478294] RAID5 conf printout:
>> Sep 22 16:18:42 teresa kernel: [ 9220.478309]  --- rd:5 wd:4
>> Sep 22 16:18:42 teresa kernel: [ 9220.478324]  disk 0, o:1, dev:hde1
>> Sep 22 16:18:42 teresa kernel: [ 9220.478339]  disk 1, o:1, dev:hdi1
>> Sep 22 16:18:42 teresa kernel: [ 9220.478354]  disk 3, o:1, dev:hdk1
>> Sep 22 16:18:42 teresa kernel: [ 9220.478369]  disk 4, o:1, dev:hdg1
>> # Me stopping md3
>> Sep 22 16:18:53 teresa mdadm[2694]: DeviceDisappeared event detected on md device /dev/md3
>> Sep 22 16:18:53 teresa kernel: [ 9231.572348] md: md3 stopped.
>> Sep 22 16:18:53 teresa kernel: [ 9231.572394] md: unbind<hde1>
>> Sep 22 16:18:53 teresa kernel: [ 9231.572423] md: export_rdev(hde1)
>> Sep 22 16:18:53 teresa kernel: [ 9231.572728] md: unbind<hdg1>
>> Sep 22 16:18:53 teresa kernel: [ 9231.572758] md: export_rdev(hdg1)
>> Sep 22 16:18:53 teresa kernel: [ 9231.572988] md: unbind<hdk1>
>> Sep 22 16:18:53 teresa kernel: [ 9231.573015] md: export_rdev(hdk1)
>> Sep 22 16:18:53 teresa kernel: [ 9231.573243] md: unbind<hdi1>
>> Sep 22 16:18:53 teresa kernel: [ 9231.573270] md: export_rdev(hdi1)
>> # Me creating md3 with sde1 missing
>> Sep 22 16:19:51 teresa kernel: [ 9289.621646] md: bind<hde1>
>> Sep 22 16:19:51 teresa kernel: [ 9289.665268] md: bind<hdi1>
>> Sep 22 16:19:51 teresa kernel: [ 9289.695676] md: bind<hdk1>
>> Sep 22 16:19:51 teresa kernel: [ 9289.726906] md: bind<hdg1>
>> Sep 22 16:19:51 teresa kernel: [ 9289.809030] raid5: device hdg1 operational as raid disk 4
>> Sep 22 16:19:51 teresa kernel: [ 9289.809057] raid5: device hdk1 operational as raid disk 3
>> Sep 22 16:19:51 teresa kernel: [ 9289.809075] raid5: device hdi1 operational as raid disk 1
>> Sep 22 16:19:51 teresa kernel: [ 9289.809093] raid5: device hde1 operational as raid disk 0
>> Sep 22 16:19:51 teresa kernel: [ 9289.809821] raid5: allocated 5252kB for md3
>> Sep 22 16:19:51 teresa kernel: [ 9289.809850] raid5: raid level 5 set md3 active with 4 out of 5 devices, algorithm 2
>> Sep 22 16:19:51 teresa kernel: [ 9289.809877] RAID5 conf printout:
>> Sep 22 16:19:51 teresa kernel: [ 9289.809891]  --- rd:5 wd:4
>> Sep 22 16:19:51 teresa kernel: [ 9289.809907]  disk 0, o:1, dev:hde1
>> Sep 22 16:19:51 teresa kernel: [ 9289.809922]  disk 1, o:1, dev:hdi1
>> Sep 22 16:19:51 teresa kernel: [ 9289.809937]  disk 3, o:1, dev:hdk1
>> Sep 22 16:19:51 teresa kernel: [ 9289.809953]  disk 4, o:1, dev:hdg1
>> Sep 22 16:20:20 teresa kernel: [ 9318.486512] kjournald starting. Commit interval 5 seconds
>> Sep 22 16:20:20 teresa kernel: [ 9318.486512] EXT3-fs: mounted filesystem with ordered data mode.
>> # Me stopping md3 again
>> Sep 22 16:20:42 teresa mdadm[2694]: DeviceDisappeared event detected on md device /dev/md3
>> Sep 22 16:20:42 teresa kernel: [ 9340.300590] md: md3 stopped.
>> Sep 22 16:20:42 teresa kernel: [ 9340.300639] md: unbind<hdg1>
>> Sep 22 16:20:42 teresa kernel: [ 9340.300668] md: export_rdev(hdg1)
>> Sep 22 16:20:42 teresa kernel: [ 9340.300921] md: unbind<hdk1>
>> Sep 22 16:20:42 teresa kernel: [ 9340.300950] md: export_rdev(hdk1)
>> Sep 22 16:20:42 teresa kernel: [ 9340.301183] md: unbind<hdi1>
>> Sep 22 16:20:42 teresa kernel: [ 9340.301211] md: export_rdev(hdi1)
>> Sep 22 16:20:42 teresa kernel: [ 9340.301438] md: unbind<hde1>
>> Sep 22 16:20:42 teresa kernel: [ 9340.301465] md: export_rdev(hde1)
>> # This is me doing the fatal create, that recovers the wrong disk
>> Sep 22 16:21:39 teresa kernel: [ 9397.609864] md: bind<hde1>
>> Sep 22 16:21:39 teresa kernel: [ 9397.652426] md: bind<hdi1>
>> Sep 22 16:21:39 teresa kernel: [ 9397.673203] md: bind<sde1>
>> Sep 22 16:21:39 teresa kernel: [ 9397.699373] md: bind<hdk1>
>> Sep 22 16:21:39 teresa kernel: [ 9397.739372] md: bind<hdg1>
>> Sep 22 16:21:39 teresa kernel: [ 9397.801729] raid5: device hdk1 operational as raid disk 3
>> Sep 22 16:21:39 teresa kernel: [ 9397.801756] raid5: device sde1 operational as raid disk 2
>> Sep 22 16:21:39 teresa kernel: [ 9397.801774] raid5: device hdi1 operational as raid disk 1
>> Sep 22 16:21:39 teresa kernel: [ 9397.801793] raid5: device hde1 operational as raid disk 0
>> Sep 22 16:21:39 teresa kernel: [ 9397.802531] raid5: allocated 5252kB for md3
>> Sep 22 16:21:39 teresa kernel: [ 9397.802559] raid5: raid level 5 set md3 active with 4 out of 5 devices, algorithm 2
>> Sep 22 16:21:39 teresa kernel: [ 9397.802586] RAID5 conf printout:
>> Sep 22 16:21:39 teresa kernel: [ 9397.802600]  --- rd:5 wd:4
>> Sep 22 16:21:39 teresa kernel: [ 9397.802615]  disk 0, o:1, dev:hde1
>> Sep 22 16:21:39 teresa kernel: [ 9397.802631]  disk 1, o:1, dev:hdi1
>> Sep 22 16:21:39 teresa kernel: [ 9397.802646]  disk 2, o:1, dev:sde1
>> Sep 22 16:21:39 teresa kernel: [ 9397.802661]  disk 3, o:1, dev:hdk1
>> Sep 22 16:21:39 teresa kernel: [ 9397.838429] RAID5 conf printout:
>> Sep 22 16:21:39 teresa kernel: [ 9397.838454]  --- rd:5 wd:4
>> Sep 22 16:21:39 teresa kernel: [ 9397.838471]  disk 0, o:1, dev:hde1
>> Sep 22 16:21:39 teresa kernel: [ 9397.838486]  disk 1, o:1, dev:hdi1
>> Sep 22 16:21:39 teresa kernel: [ 9397.838502]  disk 2, o:1, dev:sde1
>> Sep 22 16:21:39 teresa kernel: [ 9397.838518]  disk 3, o:1, dev:hdk1
>> Sep 22 16:21:39 teresa kernel: [ 9397.838533]  disk 4, o:1, dev:hdg1
>> Sep 22 16:21:39 teresa mdadm[2694]: RebuildStarted event detected on md device /dev/md3
>> Sep 22 16:21:39 teresa kernel: [ 9397.841822] md: recovery of RAID array md3
>> Sep 22 16:21:39 teresa kernel: [ 9397.841848] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>> Sep 22 16:21:39 teresa kernel: [ 9397.841868] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
>> Sep 22 16:21:39 teresa kernel: [ 9397.841908] md: using 128k window, over a total of 732571904 blocks.
>> Sep 22 16:22:33 teresa kernel: [ 9451.640192] EXT3-fs error (device md3): ext3_check_descriptors: Block bitmap for group 3968 not in group (block 0)!
>> Sep 22 16:22:33 teresa kernel: [ 9451.750241] EXT3-fs: group descriptors corrupted!
>> Sep 22 16:22:39 teresa kernel: [ 9458.079151] md: md_do_sync() got signal ... exiting
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: md3 stopped.
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdg1>
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdg1)
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdk1>
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdk1)
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<sde1>
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(sde1)
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hdi1>
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hdi1)
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: unbind<hde1>
>> Sep 22 16:22:39 teresa kernel: [ 9458.114590] md: export_rdev(hde1)
>> Sep 22 16:22:39 teresa mdadm[2694]: DeviceDisappeared event detected on md device /dev/md3
>> # Me trying to recreate md3 without sde
>> Sep 22 16:23:50 teresa kernel: [ 9529.065477] md: bind<hde1>
>> Sep 22 16:23:50 teresa kernel: [ 9529.107767] md: bind<hdi1>
>> Sep 22 16:23:50 teresa kernel: [ 9529.137743] md: bind<hdk1>
>> Sep 22 16:23:50 teresa kernel: [ 9529.177990] md: bind<hdg1>
>> Sep 22 16:23:51 teresa mdadm[2694]: RebuildFinished event detected on md device /dev/md3
>> Sep 22 16:23:51 teresa kernel: [ 9529.240814] raid5: device hdg1 operational as raid disk 4
>> Sep 22 16:23:51 teresa kernel: [ 9529.241734] raid5: device hdk1 operational as raid disk 3
>> Sep 22 16:23:51 teresa kernel: [ 9529.241752] raid5: device hdi1 operational as raid disk 1
>> Sep 22 16:23:51 teresa kernel: [ 9529.241770] raid5: device hde1 operational as raid disk 0
>> Sep 22 16:23:51 teresa kernel: [ 9529.242520] raid5: allocated 5252kB for md3
>> Sep 22 16:23:51 teresa kernel: [ 9529.242547] raid5: raid level 5 set md3 active with 4 out of 5 devices, algorithm 2
>> Sep 22 16:23:51 teresa kernel: [ 9529.242574] RAID5 conf printout:
>> Sep 22 16:23:51 teresa kernel: [ 9529.242588]  --- rd:5 wd:4
>> Sep 22 16:23:51 teresa kernel: [ 9529.242603]  disk 0, o:1, dev:hde1
>> Sep 22 16:23:51 teresa kernel: [ 9529.242618]  disk 1, o:1, dev:hdi1
>> Sep 22 16:23:51 teresa kernel: [ 9529.242633]  disk 3, o:1, dev:hdk1
>> Sep 22 16:23:51 teresa kernel: [ 9529.242649]  disk 4, o:1, dev:hdg1
>> # And me trying a fsck -n or a mount
>> Sep 22 16:24:07 teresa kernel: [ 9545.326343] EXT3-fs error (device md3): ext3_check_descriptors: Block bitmap for group 3968 not in group (block 0)!
>> Sep 22 16:24:07 teresa kernel: [ 9545.369071] EXT3-fs: group descriptors corrupted!
>>
>>
>> ### EXAMINES OF PARTITIONS ###
>>
>> === --examine /dev/hde1 ===
>> /dev/hde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>>   Creation Time : Thu Sep 22 16:23:50 2011
>>      Raid Level : raid5
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>    Raid Devices : 5
>>   Total Devices : 4
>> Preferred Minor : 3
>>
>>     Update Time : Sun Sep 25 22:11:22 2011
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 1
>>   Spare Devices : 0
>>        Checksum : b7f6a3c0 - correct
>>          Events : 10
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     0      33        1        0      active sync   /dev/hde1
>>
>>    0     0      33        1        0      active sync   /dev/hde1
>>    1     1      56        1        1      active sync   /dev/hdi1
>>    2     2       0        0        2      faulty removed
>>    3     3      57        1        3      active sync   /dev/hdk1
>>    4     4      34        1        4      active sync   /dev/hdg1
>>
>> === --examine /dev/hdi1 ===
>> /dev/hdi1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>>   Creation Time : Thu Sep 22 16:23:50 2011
>>      Raid Level : raid5
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>    Raid Devices : 5
>>   Total Devices : 4
>> Preferred Minor : 3
>>
>>     Update Time : Sun Sep 25 22:11:22 2011
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 1
>>   Spare Devices : 0
>>        Checksum : b7f6a3d9 - correct
>>          Events : 10
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     1      56        1        1      active sync   /dev/hdi1
>>
>>    0     0      33        1        0      active sync   /dev/hde1
>>    1     1      56        1        1      active sync   /dev/hdi1
>>    2     2       0        0        2      faulty removed
>>    3     3      57        1        3      active sync   /dev/hdk1
>>    4     4      34        1        4      active sync   /dev/hdg1
>>
>> === --examine /dev/sde1 ===
>> /dev/sde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : e6e3df36:1195239f:47f7b12e:9c2b2218 (local to host teresa)
>>   Creation Time : Thu Sep 22 16:21:39 2011
>>      Raid Level : raid5
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>    Raid Devices : 5
>>   Total Devices : 5
>> Preferred Minor : 3
>>
>>     Update Time : Thu Sep 22 16:22:39 2011
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 5
>>  Failed Devices : 1
>>   Spare Devices : 1
>>        Checksum : 4e69d679 - correct
>>          Events : 8
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     2       8       65        2      active sync   /dev/sde1
>>
>>    0     0      33        1        0      active sync   /dev/hde1
>>    1     1      56        1        1      active sync   /dev/hdi1
>>    2     2       8       65        2      active sync   /dev/sde1
>>    3     3      57        1        3      active sync   /dev/hdk1
>>    4     4       0        0        4      faulty removed
>>    5     5      34        1        5      spare   /dev/hdg1
>>
>> === --examine /dev/hdk1 ===
>> /dev/hdk1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>>   Creation Time : Thu Sep 22 16:23:50 2011
>>      Raid Level : raid5
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>    Raid Devices : 5
>>   Total Devices : 4
>> Preferred Minor : 3
>>
>>     Update Time : Sun Sep 25 22:11:22 2011
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 1
>>   Spare Devices : 0
>>        Checksum : b7f6a3de - correct
>>          Events : 10
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     3      57        1        3      active sync   /dev/hdk1
>>
>>    0     0      33        1        0      active sync   /dev/hde1
>>    1     1      56        1        1      active sync   /dev/hdi1
>>    2     2       0        0        2      faulty removed
>>    3     3      57        1        3      active sync   /dev/hdk1
>>    4     4      34        1        4      active sync   /dev/hdg1
>>
>> === --examine /dev/hdg1 ===
>> /dev/hdg1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>>   Creation Time : Thu Sep 22 16:23:50 2011
>>      Raid Level : raid5
>>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>>    Raid Devices : 5
>>   Total Devices : 4
>> Preferred Minor : 3
>>
>>     Update Time : Sun Sep 25 22:11:22 2011
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 1
>>   Spare Devices : 0
>>        Checksum : b7f6a3c9 - correct
>>          Events : 10
>>
>>          Layout : left-symmetric
>>      Chunk Size : 64K
>>
>>       Number   Major   Minor   RaidDevice State
>> this     4      34        1        4      active sync   /dev/hdg1
>>
>>    0     0      33        1        0      active sync   /dev/hde1
>>    1     1      56        1        1      active sync   /dev/hdi1
>>    2     2       0        0        2      faulty removed
>>    3     3      57        1        3      active sync   /dev/hdk1
>>    4     4      34        1        4      active sync   /dev/hdg1
>>
>>
>>
>>
>> >
>> >
>> >>
>> >> (2) Can I suggest improvements into resilvering?  Can I contribute
>> >> code to implement them?  Such as resilver from the end of the drive
>> >> back to the front, so if you notice the wrong drive resilvering, you
>> >> can stop and not lose the MBR and the directory format structure
>> >> that's stored in the first few sectors?  I'd also like to take a look
>> >> at adding a raid mode where there's checksum in every stripe block so
>> >> the system can detect corrupted disks and not resilver.  I'd also like
>> >> to add a raid option where a resilvering need will be reported by
>> >> email and needs to be started manually.  All to prevent what happened
>> >> to me from happening again.
>> >>
>> >> Thanks for your time.
>> >>
>> >> Kenn Frank
>> >>
>> >> P.S.  Setup:
>> >>
>> >> # uname -a
>> >> Linux teresa 2.6.26-2-686 #1 SMP Sat Jun 11 14:54:10 UTC 2011 i686 GNU/Linux
>> >>
>> >> # mdadm --version
>> >> mdadm - v2.6.7.2 - 14th November 2008
>> >>
>> >> # mdadm --detail /dev/md3
>> >> /dev/md3:
>> >>         Version : 00.90
>> >>   Creation Time : Thu Sep 22 16:23:50 2011
>> >>      Raid Level : raid5
>> >>      Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
>> >>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>> >>    Raid Devices : 5
>> >>   Total Devices : 4
>> >> Preferred Minor : 3
>> >>     Persistence : Superblock is persistent
>> >>
>> >>     Update Time : Thu Sep 22 20:19:09 2011
>> >>           State : clean, degraded
>> >>  Active Devices : 4
>> >> Working Devices : 4
>> >>  Failed Devices : 0
>> >>   Spare Devices : 0
>> >>
>> >>          Layout : left-symmetric
>> >>      Chunk Size : 64K
>> >>
>> >>            UUID : ed1e6357:74e32684:47f7b12e:9c2b2218 (local to host teresa)
>> >>          Events : 0.6
>> >>
>> >>     Number   Major   Minor   RaidDevice State
>> >>        0      33        1        0      active sync   /dev/hde1
>> >>        1      56        1        1      active sync   /dev/hdi1
>> >>        2       0        0        2      removed
>> >>        3      57        1        3      active sync   /dev/hdk1
>> >>        4      34        1        4      active sync   /dev/hdg1
>> >>
>> >>
>> >
>> >
>>
>
>




* Re:
  2011-09-26 18:04       ` Re: Kenn
@ 2011-09-26 19:56         ` David Brown
  0 siblings, 0 replies; 10+ messages in thread
From: David Brown @ 2011-09-26 19:56 UTC (permalink / raw)
  To: linux-raid

On 26/09/11 20:04, Kenn wrote:
>> On Mon, 26 Sep 2011 00:42:23 -0700 "Kenn"<kenn@kenn.us>  wrote:
>>
>>> Replying.  I realize and I apologize I didn't create a subject.  I hope
>>> this doesn't confuse majordomo.
>>>
>>>> On Sun, 25 Sep 2011 21:23:31 -0700 "Kenn"<kenn@kenn.us>  wrote:
>>>>
>>>>> I have a raid5 array that had a drive drop out, and resilvered the wrong
>>>>> drive when I put it back in, corrupting and destroying the raid.  I
>>>>> stopped the array at less than 1% resilvering and I'm in the process of
>>>>> making a dd-copy of the drive to recover the files.
>>>>
>>>> I don't know what you mean by "resilvered".
>>>
>>> Resilvering -- Rebuilding the array.  Lesser used term, sorry!
>>
>> I see..
>>
>> I guess that looking-glass mirrors have a silver backing and when it becomes
>> tarnished you might re-silver the mirror to make it better again.
>> So the name works as a poor pun for RAID1.  But I don't see how it applies
>> to RAID5....
>> No matter.
>>
>> Basically you have messed up badly.
>> Recreating arrays should only be done as a last-ditch attempt to get data
>> back, and preferably with expert advice...
>>
>> When you created the array with all devices present it effectively started
>> copying the corruption that you had deliberately (why??) placed on device 2
>> (sde) onto device 4 (counting from 0).
>> So now you have two devices that are corrupt in the early blocks.
>> There is not much you can do to fix that.
>>
>> There is some chance that 'fsck' could find a backup superblock somewhere and
>> try to put the pieces back together.  But the 'mkfs' probably made a
>> substantial mess of important data structures so I don't consider your
>> chances very high.
>> Keeping sde out and just working with the remaining 4 is certainly your best
>> bet.
>>
>> What made you think it would be a good idea to re-create the array when all
>> you wanted to do was trigger a resync/recovery??
>>
>> NeilBrown
>
> Originally I had failed & removed sde from the array and then added it
> back in, but no resilvering happened, it was just placed as raid device #
> 5 as an active (faulty?) spare, no rebuilding.  So I thought I'd have to
> recreate the array to get it to rebuild.
>
> Because my sde disk was only questionably healthy, if the problem was the
> loose cable, I wanted to test the sde disk by having a complete rebuild
> put onto it.   I was confident in all the other drives because when I
> mounted the array without sde, I ran a complete md5sum scan and
> everything's checksum was correct.  So I wanted to force a complete
> rebuilding of the array on sde and the --zero-superblock was supposed to
> render sde "new" to the array to force the rebuild onto sde.  I just did
> the fsck and mkfs for good measure instead of spending the time using
> dd to zero every byte on the drive.  At the time I thought that if
> --zero-superblock went wrong, md would reject a blank drive as a data
> source for rebuilding and prevent resilvering.
>
> So that brings up another point -- I've been reading through your blog,
> and I acknowledge your view that checksums on every block offer little
> benefit (http://neil.brown.name/blog/20110227114201), but sometimes
> people like having that extra lock on their door even though it takes
> more effort to go in and out of their home.  In my five-drive array, if
> the last five words were the checksums of the blocks on every drive, the
> checksums off each drive could vote on trusting the blocks of every other
> drive during the rebuild process, and prevent an idiot (me) from killing
> his data.  It would force wasteful sectors on the drive, perhaps harm
> performance by squeezing 2+n bytes out of each sector, but if someone
> wants to protect their data as much as possible, it would be a welcome
> option where performance is not a priority.
>
> Also, the checksums do provide some protection: first, against
> partial media failure, which is a major flaw in raid 456 design according
> to http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt , and checksum
> voting could protect against the Atomicity/write-in-place flaw outlined in
> http://en.wikipedia.org/wiki/RAID#Problems_with_RAID .
>
> What do you think?
>
> Kenn

/raid/ protects against partial media flaws.  If one disk in a raid5
stripe has a bad sector, that sector will be ignored and the missing
data will be re-created from the other disks using the raid recovery
algorithm.  If you want to have such protection even when doing a resync
(as many people do), then use raid6 - it has two parity blocks per stripe.
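
To make the raid5 case concrete, here is a minimal sketch of rebuilding one
unreadable block from the others in the same stripe.  It is only an
illustration of the XOR parity idea, not md's actual code; the block
contents and the helper name xor_blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe on a 5-disk raid5: four data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Pretend the disk holding data[2] reports a bad sector for this stripe.
# XOR of the surviving data blocks and the parity gives the missing block back.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[2])
```

This only works while every other block in the stripe is readable, which is
why a second bad sector during a raid5 rebuild is a problem and why a second
parity block helps.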

As Neil points out in his blog, it is impossible to fully recover from a
failure part way through a write - checksum voting or majority voting
/may/ give you the right answer, but it may not.  If you need protection
against that, you have to have filesystem level control (data logging
and journalling as well as metadata journalling), or perhaps use raid
systems with battery-backed write caches.
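
A small sketch of why an interrupted write is hard to repair from parity
alone.  This is a toy model under assumed block contents, not md's
behaviour: it assumes the new data block reached its disk but the matching
parity update did not.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe before the write: four data blocks and their parity.
old = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
old_parity = xor_blocks(old)

# Power fails mid-write: block 1 holds new data, parity is still the old one.
on_disk = [b"aaaa", b"BBBB", b"cccc", b"dddd"]

# Parity no longer matches, so the inconsistency is detectable...
stripe_consistent = xor_blocks(on_disk) == old_parity

# ...but "repairing" any one block from the rest yields a consistent stripe,
# and nothing in the stripe says which of these candidates is the right one.
candidates = []
for i in range(len(on_disk)):
    others = on_disk[:i] + on_disk[i + 1:] + [old_parity]
    candidates.append(xor_blocks(others))

print(stripe_consistent)
print(candidates)
```

Only the candidate for block 1 matches anything that was ever written (the
old contents); the rest are garbage that happens to satisfy the parity
equation.  Deciding between them needs information from outside the stripe,
such as a journal.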




end of thread, other threads:[~2011-09-27  9:27 UTC | newest]

Thread overview: 10+ messages
2011-09-26  4:23 (unknown), Kenn
2011-09-26  4:52 ` NeilBrown
2011-09-26  7:03   ` Re: Roman Mamedov
2011-09-26 23:23     ` Re: Kenn
2011-09-26 23:46     ` Recovering from a Bad Resilver / Rebuild Kenn
2011-09-27  9:27       ` David Brown
2011-09-26  7:42   ` Kenn
2011-09-26  8:04     ` Re: NeilBrown
2011-09-26 18:04       ` Re: Kenn
2011-09-26 19:56         ` Re: David Brown
