Re-assembling faulty array

* Re-assembling faulty array
@ 2012-05-11 19:41 C.J. Adams-Collier KF7BMP
  2012-05-15  0:09 ` NeilBrown
  2012-05-15 22:59 ` NeilBrown
  0 siblings, 2 replies; 4+ messages in thread
From: C.J. Adams-Collier KF7BMP @ 2012-05-11 19:41 UTC (permalink / raw)
  To: linux-raid

Hey all,

I've got an array that seems to have failed while I was re-synchronizing
one of the disks.  sde fell out when I moved six disks from one chassis
to another.  I re-added it and it was 98.8% done with 300 minutes left
in the process when I went to sleep last night.  When I woke up, the
array was in a FAILED state, sdg was marked failed and sde was marked
spare.  I removed sdg from the array and re-booted and now the array
won't start.

Is there a way to re-add sdg back in to slot 5 rather than having it
added as a spare?  AFAICT, no writes have been made to sdg or md0 since
I removed it from the array, so it should be pretty close to its active
state.  sde must be nearly ready to be added in as an active participant
in the array, too.

Is there anything I can do to re-build the array at this point?

Cheers,

C.J.

* Re: Re-assembling faulty array
  2012-05-11 19:41 Re-assembling faulty array C.J. Adams-Collier KF7BMP
@ 2012-05-15  0:09 ` NeilBrown
  2012-05-15 22:59 ` NeilBrown
  1 sibling, 0 replies; 4+ messages in thread
From: NeilBrown @ 2012-05-15  0:09 UTC (permalink / raw)
  To: C.J. Adams-Collier KF7BMP; +Cc: linux-raid

On Fri, 11 May 2012 12:41:16 -0700 "C.J. Adams-Collier KF7BMP"
<cjac@colliertech.org> wrote:

> Hey all,
> 
> I've got an array that seems to have failed while I was re-synchronizing
> one of the disks.  sde fell out when I moved six disks from one chassis
> to another.  I re-added it and it was 98.8% done with 300 minutes left
> in the process when I went to sleep last night.  When I woke up, the
> array was in a FAILED state, sdg was marked failed and sde was marked
> spare.  I removed sdg from the array and re-booted and now the array
> won't start.
> 
> Is there a way to re-add sdg back in to slot 5 rather than having it
> added as a spare?  AFAICT, no writes have been made to sdg or md0 since
> I removed it from the array, so it should be pretty close to its active
> state.  sde must be nearly ready to be added in as an active participant
> in the array, too.
> 
> Is there anything I can do to re-build the array at this point?
> 
>
Maybe.  However I cannot quite intuit the state of the arrays from your
description.
Could you report the output of "mdadm -E" on each member device please?

Thanks,
NeilBrown

* Re: Re-assembling faulty array
  2012-05-11 19:41 Re-assembling faulty array C.J. Adams-Collier KF7BMP
  2012-05-15  0:09 ` NeilBrown
@ 2012-05-15 22:59 ` NeilBrown
  2012-05-16  5:16   ` C.J. Adams-Collier
  1 sibling, 1 reply; 4+ messages in thread
From: NeilBrown @ 2012-05-15 22:59 UTC (permalink / raw)
  To: C.J. Adams-Collier KF7BMP; +Cc: linux-raid

On Fri, 11 May 2012 12:41:16 -0700 "C.J. Adams-Collier KF7BMP"
<cjac@colliertech.org> wrote:

> Hey all,
> 
> I've got an array that seems to have failed while I was re-synchronizing
> one of the disks.  sde fell out when I moved six disks from one chassis
> to another.  I re-added it and it was 98.8% done with 300 minutes left
> in the process when I went to sleep last night.  When I woke up, the
> array was in a FAILED state, sdg was marked failed and sde was marked
> spare.  I removed sdg from the array and re-booted and now the array
> won't start.
> 
> Is there a way to re-add sdg back in to slot 5 rather than having it
> added as a spare?  AFAICT, no writes have been made to sdg or md0 since
> I removed it from the array, so it should be pretty close to its active
> state.  sde must be nearly ready to be added in as an active participant
> in the array, too.
> 
> Is there anything I can do to re-build the array at this point?
> 
> Cheers,
> 
> C.J.
> 

From the "mdadm -E" output you sent me separately:

        Version : 0.90.00
     Raid Level : raid5
  Used Dev Size : 972848128 (927.78 GiB 996.20 GB)
     Array Size : 4864240640 (4638.90 GiB 4980.98 GB)
   Raid Devices : 6

and "grep this" shows:

this     3       8       18        3      active sync   /dev/sdb2
this     4       8       34        4      active sync   /dev/sdc2
this     2       8       50        2      active sync   /dev/sdd2
this     6       8       66        6      spare   /dev/sde2
this     1       8       82        1      active sync   /dev/sdf2
this     6       8       98        6      spare   /dev/sdg2

"grep Events" shows:

         Events : 34795
         Events : 34795
         Events : 34795
         Events : 34795
         Events : 34795
         Events : 34794

So you are missing devices '0' and '5'.
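
As a rough sketch of how the missing slots fall out of the "this" lines
above: the helper below assumes the 0.90 "mdadm -E" layout shown here, where
the fifth column is the RaidDevice slot and the sixth is the state. The name
missing_slots is invented for this illustration and is not part of mdadm.

```shell
# Hypothetical helper, not part of mdadm: print the RAID slots that no
# device claims as "active".
#   $1    = value of "Raid Devices" (6 in this thread)
#   stdin = the "this" lines from "mdadm -E" (0.90 superblock layout assumed)
missing_slots() {
    awk -v n="$1" '
        # Columns: this Number Major Minor RaidDevice State... Device
        $1 == "this" && $6 == "active" { seen[$5] = 1 }
        END {
            out = ""
            for (i = 0; i < n; i++)
                if (!(i in seen)) out = out (out == "" ? "" : " ") i
            print out
        }
    '
}

# Possible usage (device glob is an assumption based on the names above):
#   mdadm -E /dev/sd[b-g]2 | grep this | missing_slots 6
```

Spares report slot 6 here, which is outside the 0..5 range, so they never
mark a slot as occupied.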

Presumably sdg reported an error before sde finished recovery, so sde
remains a spare.  I cannot see why sdg is marked as a spare, though.
It should still be marked as a member of the array.  Maybe you tried to add
it after removing it?

What you need to do is decide which of 'e' and 'g' you trust more (probably
g, but I don't know the full history) and which slot it should be in (0 or 5;
you might be able to tell from a recent "RAID conf printout" in kernel logs).
Then:
 mdadm -S /dev/md0
 mdadm -C /dev/md0 -l5 -n6 -e 0.90 -c 64 /dev/sdg2 /dev/sdf2 /dev/sdd2 \
        /dev/sdb2 /dev/sdc2 missing

The order of devices is important.  This puts 'g2' in slot 0 and 'missing'
in slot 5.
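
To make the ordering rule concrete, here is a minimal sketch that turns a
slot-to-device map into the device list for the create command, filling any
unclaimed slot with 'missing'. The helper name device_order is hypothetical,
and putting sdg2 in slot 0 simply mirrors the example command above; it is
not a confirmed slot assignment.

```shell
# Hypothetical helper: build the ordered device list for "mdadm -C".
#   $1    = number of RAID devices
#   stdin = one "slot device" pair per line, in any order
device_order() {
    awk -v n="$1" '
        { dev[$1] = $2 }
        END {
            out = ""
            for (i = 0; i < n; i++) {
                d = (i in dev) ? dev[i] : "missing"
                out = out (i ? " " : "") d
            }
            print out
        }
    '
}

# Slots 1-4 come from the "this" lines; slot 0 for sdg2 is the assumed choice.
printf '%s\n' '3 /dev/sdb2' '4 /dev/sdc2' '2 /dev/sdd2' \
              '1 /dev/sdf2' '0 /dev/sdg2' | device_order 6
```

If the trusted device really belongs in slot 5, the map would carry
'5 /dev/sdg2' instead and 'missing' would land first in the list.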

Then run 'fsck -n /dev/md0', or whatever check is appropriate for the sort
of data you have on md0.  If that is happy, add the other device (g2 or e2)
and let it recover.
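
The check-then-add step might be wrapped roughly as follows. This is only a
sketch: check_then_add and the RUN variable are invented for illustration,
and it assumes fsck is the right checker for whatever lives on md0.

```shell
# Hypothetical wrapper: run a read-only filesystem check and, only if it
# succeeds, add the remaining device so md can rebuild onto it.
#   $1 = array device, $2 = device to add
# RUN is an assumption of this sketch: set RUN=echo to print the commands
# instead of executing them.
check_then_add() {
    md="$1"
    dev="$2"
    ${RUN:-} fsck -n "$md" && ${RUN:-} mdadm "$md" --add "$dev"
}

# Dry run, printing what would be executed:
RUN=echo
check_then_add /dev/md0 /dev/sde2
```

Because of the '&&', a failing 'fsck -n' stops the sequence before anything
is added to the array.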

NeilBrown

* Re: Re-assembling faulty array
  2012-05-15 22:59 ` NeilBrown
@ 2012-05-16  5:16   ` C.J. Adams-Collier
  0 siblings, 0 replies; 4+ messages in thread
From: C.J. Adams-Collier @ 2012-05-16  5:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

Thanks a million.  I really appreciate your help.

Sent from my PDP-11
