recover short-time-power-failure of raid 5?

linux-raid.vger.kernel.org archive mirror

* recover short-time-power-failure of raid 5?
From: krause @ 2002-11-24 20:03 UTC
  To: linux-raid

hi folks!

i am using a small 3-disk software raid on a linux redhat 7.3 system for
my home directory.
i never had any problems until today, when the power cable to two of
my disks somehow came loose (a fan in the case that was connected to the
same cable had come loose!). only a few seconds later i shut down the
system, switched it off and reconnected the disks. the raid device /dev/md0
was not in use at the time of the power failure (i was not logged in, and
/dev/md0 only contains /home), so there should actually be no loss of data.
but how do i tell the system to just reuse the disks?
i read some howtos and articles in the mailing list archive but found either
no instructions for this case or (completely?) different approaches (one said
to just use "mkraid --create", others "raidhotadd").
so what is the correct way? the data on /home is quite important to me,
and the last backup, made a week ago, unfortunately does not include some
relevant changes! because of this i did not dare to just try things out.

attached are the contents/output of /etc/raidtab, mdadm --examine and
/var/log/messages.

i hope you can help me to reuse my data!

thanks in advance for your hints!

   markus


my config:
---8<---
[root@merlin root]# cat /etc/raidtab
raiddev             /dev/md0
raid-level                  5
nr-raid-disks               3
chunk-size                  64k
persistent-superblock       1
nr-spare-disks              0
    device          /dev/sda1
    raid-disk     0
    device          /dev/sdb1
    raid-disk     1
    device          /dev/sdc2
    raid-disk     2
[root@merlin root]#
--->8---

some other output:
---8<---
[root@merlin root]# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
  Creation Time : Thu Jun 20 17:13:01 2002
     Raid Level : raid5
    Device Size : 4192832 (3.100 GiB 4.34 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Nov 24 12:30:33 2002
          State : clean, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 5687c1bf - correct
         Events : 0.254

         Layout : left-asymmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1
   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      faulty   /dev/sdb1
   2     2       8       34        2      faulty   /dev/sdc2
[root@merlin root]#
[root@merlin root]# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
  Creation Time : Thu Jun 20 17:13:01 2002
     Raid Level : raid5
    Device Size : 4192832 (3.100 GiB 4.34 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Nov 24 12:27:53 2002
          State : dirty, no-errors
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 5687c139 - correct
         Events : 0.251

         Layout : left-asymmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1
   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       34        2      active sync   /dev/sdc2
[root@merlin root]#
[root@merlin root]# mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
  Creation Time : Thu Jun 20 17:13:01 2002
     Raid Level : raid5
    Device Size : 4192832 (3.100 GiB 4.34 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Nov 24 12:27:53 2002
          State : dirty, no-errors
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 5687c14c - correct
         Events : 0.251

         Layout : left-asymmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2
   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       34        2      active sync   /dev/sdc2
[root@merlin root]#
--->8---


the info in /var/log/messages when running "raidstart /dev/md0":
---8<---
Nov 24 21:02:08 merlin kernel:  [events: 000000fe]
Nov 24 21:02:08 merlin kernel:  [events: 000000fb]
Nov 24 21:02:08 merlin kernel:  [events: 000000fb]
Nov 24 21:02:08 merlin kernel: md: autorun ...
Nov 24 21:02:08 merlin kernel: md: considering sdc2 ...
Nov 24 21:02:08 merlin kernel: md:  adding sdc2 ...
Nov 24 21:02:08 merlin kernel: md:  adding sdb1 ...
Nov 24 21:02:08 merlin kernel: md:  adding sda1 ...
Nov 24 21:02:08 merlin kernel: md: created md0
Nov 24 21:02:08 merlin kernel: md: bind<sda1,1>
Nov 24 21:02:08 merlin kernel: md: bind<sdb1,2>
Nov 24 21:02:08 merlin kernel: md: bind<sdc2,3>
Nov 24 21:02:08 merlin kernel: md: running: <sdc2><sdb1><sda1>
Nov 24 21:02:08 merlin kernel: md: sdc2's event counter: 000000fb
Nov 24 21:02:08 merlin kernel: md: sdb1's event counter: 000000fb
Nov 24 21:02:08 merlin kernel: md: sda1's event counter: 000000fe
Nov 24 21:02:08 merlin kernel: md: superblock update time inconsistency -- using the most recent one
Nov 24 21:02:08 merlin kernel: md: freshest: sda1
Nov 24 21:02:08 merlin kernel: md: kicking non-fresh sdc2 from array!
Nov 24 21:02:08 merlin kernel: md: unbind<sdc2,2>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sdc2)
Nov 24 21:02:08 merlin kernel: md: kicking non-fresh sdb1 from array!
Nov 24 21:02:08 merlin kernel: md: unbind<sdb1,1>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sdb1)
Nov 24 21:02:08 merlin kernel: md0: removing former faulty sdb1!
Nov 24 21:02:08 merlin kernel: md0: removing former faulty sdc2!
Nov 24 21:02:08 merlin kernel: md0: max total readahead window set to 512k
Nov 24 21:02:08 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 24 21:02:08 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 24 21:02:08 merlin kernel: raid5: not enough operational devices for md0 (2/3 failed)
Nov 24 21:02:08 merlin kernel: RAID5 conf printout:
Nov 24 21:02:08 merlin kernel:  --- rd:3 wd:1 fd:2
Nov 24 21:02:08 merlin kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 24 21:02:08 merlin kernel:  disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
Nov 24 21:02:08 merlin kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 24 21:02:08 merlin kernel: raid5: failed to run raid set md0
Nov 24 21:02:08 merlin kernel: md: pers->run() failed ...
Nov 24 21:02:08 merlin kernel: md :do_md_run() returned -22
Nov 24 21:02:08 merlin kernel: md: md0 stopped.
Nov 24 21:02:08 merlin kernel: md: unbind<sda1,0>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sda1)
Nov 24 21:02:08 merlin kernel: md: ... autorun DONE.
---8<---
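
A note on reading the two outputs above together: the "Events : 0.254" field printed by mdadm --examine and the "[events: 000000fe]" value in the kernel log appear to be the same counter, shown once in decimal and once in hexadecimal (254 = 0xfe, 251 = 0xfb). The sketch below assumes the 0.90 superblock output format shown above; the helper name is made up for illustration and is not part of mdadm.

```shell
#!/bin/sh
# events_to_hex: hypothetical helper, not part of mdadm.
# Reads `mdadm --examine` output on stdin and prints the event counter
# the way the kernel log above shows it (8 hex digits).
events_to_hex() {
    # "Events : 0.254" -> keep the part after the dot, i.e. 254
    ev=$(sed -n 's/^ *Events : [0-9]*\.\([0-9][0-9]*\).*$/\1/p')
    [ -n "$ev" ] && printf '%08x\n' "$ev"
}

# Sample lines copied from the --examine output above.
printf '         Events : 0.254\n' | events_to_hex   # sda1
printf '         Events : 0.251\n' | events_to_hex   # sdb1 and sdc2
```

On a live system the same function could be fed directly, e.g. "mdadm --examine /dev/sda1 | events_to_hex". The device with the highest counter is the one the kernel calls "freshest".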

* Re: recover short-time-power-failure of raid 5?
From: Neil Brown @ 2002-11-24 23:45 UTC
  To: krause; +Cc: linux-raid

On Sunday November 24, krause@mogli-soft.de wrote:
> hi folks!
> 
> i am using a small 3-disk software raid on a linux redhat 7.3 system for
> my home directory.
> i never had any problems until today, when the power cable to two of
> my disks somehow came loose (a fan in the case that was connected to the
> same cable had come loose!). only a few seconds later i shut down the
> system, switched it off and reconnected the disks. the raid device /dev/md0
> was not in use at the time of the power failure (i was not logged in, and
> /dev/md0 only contains /home), so there should actually be no loss of data.
> but how do i tell the system to just reuse the disks?


 mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2

NeilBrown

* Re: recover short-time-power-failure of raid 5?
From: krause @ 2002-11-25  9:42 UTC
  To: linux-raid

Neil Brown <neilb@cse.unsw.edu.au> wrote on 25.11.2002, 00:45:22:
> On Sunday November 24, krause@mogli-soft.de wrote:
> > hi folks!
> > 
> > i am using a small 3-disk software raid on a linux redhat 7.3 system for
> > my home directory.
> > i never had any problems until today, when the power cable to two of
> > my disks somehow came loose (a fan in the case that was connected to the
> > same cable had come loose!). only a few seconds later i shut down the
> > system, switched it off and reconnected the disks. the raid device /dev/md0
> > was not in use at the time of the power failure (i was not logged in, and
> > /dev/md0 only contains /home), so there should actually be no loss of data.
> > but how do i tell the system to just reuse the disks?
> 
> 
>  mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2
> 
> NeilBrown

hello neilbrown, 

thanks a lot for your fast response!
i tried the command as you suggested but i am not sure if it really
worked, it seems that /dev/sdc2 is not yet used (but of course i may be
wrong! ;-) )
after i ran "mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2" i
got in my /var/log/messages:

--->8---
Nov 25 10:09:31 merlin kernel:  [events: 000000fe]
Nov 25 10:09:31 merlin kernel: md: bind<sdb1,1>
Nov 25 10:09:31 merlin kernel:  [events: 000000fe]
Nov 25 10:09:31 merlin kernel: md: bind<sda1,2>
Nov 25 10:09:31 merlin kernel: md: sda1's event counter: 000000fe
Nov 25 10:09:31 merlin kernel: md: sdb1's event counter: 000000fe
Nov 25 10:09:31 merlin kernel: md0: removing former faulty sdc2!
Nov 25 10:09:31 merlin kernel: md0: max total readahead window set to 512k
Nov 25 10:09:31 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 25 10:09:31 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 25 10:09:31 merlin kernel: raid5: device sdb1 operational as raid disk 1
Nov 25 10:09:31 merlin kernel: raid5: md0, not all disks are operational -- trying to recover array
Nov 25 10:09:31 merlin kernel: raid5: allocated 3291kB for md0
Nov 25 10:09:31 merlin kernel: raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 0
Nov 25 10:09:31 merlin kernel: RAID5 conf printout:
Nov 25 10:09:31 merlin kernel:  --- rd:3 wd:2 fd:1
Nov 25 10:09:31 merlin kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:09:31 merlin kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:09:31 merlin kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:09:31 merlin kernel: RAID5 conf printout:
Nov 25 10:09:31 merlin kernel:  --- rd:3 wd:2 fd:1
Nov 25 10:09:31 merlin kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:09:31 merlin kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:09:31 merlin kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:09:31 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:09:31 merlin kernel: md: sda1 [events: 000000ff]<6>(write) sda1's sb offset: 4192960
Nov 25 10:09:31 merlin kernel: md: recovery thread got woken up ...
Nov 25 10:09:31 merlin kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Nov 25 10:09:31 merlin kernel: md: recovery thread finished ...
Nov 25 10:09:31 merlin kernel: md: sdb1 [events: 000000ff]<6>(write) sdb1's sb offset: 4192832
---8<---

then i ran "raidstop /dev/md0" because i thought i had done something wrong!
/var/log/messages said:
--->8---
Nov 25 10:09:54 merlin kernel: md: marking sb clean...
Nov 25 10:09:54 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:09:54 merlin kernel: md: sda1 [events: 00000100]<6>(write) sda1's sb offset: 4192960
Nov 25 10:09:54 merlin kernel: md: sdb1 [events: 00000100]<6>(write) sdb1's sb offset: 4192832
Nov 25 10:09:55 merlin kernel: md: md0 stopped.
Nov 25 10:09:55 merlin kernel: md: unbind<sda1,1>
Nov 25 10:09:55 merlin kernel: md: export_rdev(sda1)
Nov 25 10:09:55 merlin kernel: md: unbind<sdb1,0>
Nov 25 10:09:55 merlin kernel: md: export_rdev(sdb1)
---8<---

but then i tried another "mdadm ..." just to see if maybe it takes two
steps/runs to get back two drives, but again /var/log/messages was not
encouraging (for me):

--->8---
Nov 25 10:10:15 merlin kernel:  [events: 00000100]
Nov 25 10:10:15 merlin kernel: md: bind<sdb1,1>
Nov 25 10:10:15 merlin kernel:  [events: 00000100]
Nov 25 10:10:15 merlin kernel: md: bind<sda1,2>
Nov 25 10:10:15 merlin kernel: md: sda1's event counter: 00000100
Nov 25 10:10:15 merlin kernel: md: sdb1's event counter: 00000100
Nov 25 10:10:15 merlin kernel: md0: max total readahead window set to 512k
Nov 25 10:10:15 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 25 10:10:15 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 25 10:10:15 merlin kernel: raid5: device sdb1 operational as raid disk 1
Nov 25 10:10:15 merlin kernel: raid5: md0, not all disks are operational -- trying to recover array
Nov 25 10:10:15 merlin kernel: raid5: allocated 3291kB for md0
Nov 25 10:10:15 merlin kernel: raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 0
Nov 25 10:10:15 merlin kernel: RAID5 conf printout:
Nov 25 10:10:15 merlin kernel:  --- rd:3 wd:2 fd:1
Nov 25 10:10:15 merlin kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:10:15 merlin kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:10:15 merlin kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:10:15 merlin kernel: RAID5 conf printout:
Nov 25 10:10:15 merlin kernel:  --- rd:3 wd:2 fd:1
Nov 25 10:10:15 merlin kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:10:15 merlin kernel:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:10:15 merlin kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:10:15 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:10:15 merlin kernel: md: sda1 [events: 00000101]<6>(write) sda1's sb offset: 4192960
Nov 25 10:10:15 merlin kernel: md: recovery thread got woken up ...
Nov 25 10:10:15 merlin kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Nov 25 10:10:15 merlin kernel: md: recovery thread finished ...
Nov 25 10:10:15 merlin kernel: md: sdb1 [events: 00000101]<6>(write) sdb1's sb offset: 4192832
---8<---
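
For readers unsure how to interpret the log above: the message "raid level 5 set md0 active with 2 out of 3 devices" together with the summary "--- rd:3 wd:2 fd:1" indicates that the array is running, but degraded. The sketch below classifies such a summary line. It assumes that rd, wd and fd stand for raid disks, working disks and failed disks (a reading of the kernel output that is not stated in this thread), and the function name is made up for illustration.

```shell
#!/bin/sh
# conf_status: hypothetical helper that classifies the summary line of a
# "RAID5 conf printout". Assumption: rd = raid disks, wd = working disks,
# fd = failed disks.
conf_status() {
    # field NAME LINE -> prints the number following "NAME:" in LINE
    field() { printf '%s\n' "$2" | sed -n "s/.*$1:\([0-9][0-9]*\).*/\1/p"; }
    rd=$(field rd "$1"); wd=$(field wd "$1"); fd=$(field fd "$1")
    if [ "$fd" -eq 0 ]; then
        echo "healthy: $wd of $rd disks working"
    elif [ "$fd" -eq 1 ]; then
        # raid 5 keeps running with exactly one disk missing, without redundancy
        echo "degraded: $wd of $rd disks working"
    else
        echo "not startable: $wd of $rd disks working"
    fi
}

conf_status ' --- rd:3 wd:2 fd:1'   # after mdadm -A --force
conf_status ' --- rd:3 wd:1 fd:2'   # the earlier failed raidstart attempt
```

The first call reports "degraded", which matches the "continuing in degraded mode" message in the log; the second reports "not startable", which matches the earlier "not enough operational devices" failure.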

what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
ext2 partition) and the data there was correct, so it seems that the
disk is accessible and was not damaged by the short power failure.

how can i get the third disk back into the raid system? or do i have to
backup all reconstructed data from the two disks and reinit the raid
from the start?

thanks again in advance!
(i hope my surely stupid questions don't bother you!)

   markus

* Re: recover short-time-power-failure of raid 5?
From: Neil Brown @ 2002-11-25  9:59 UTC
  To: krause; +Cc: linux-raid

On Monday November 25, krause@mogli-soft.de wrote:
> 
> hello neilbrown, 
> 
> thanks a lot for your fast response!
> i tried the command as you suggested but i am not sure if it really
> worked, it seems that /dev/sdc2 is not yet used (but of course i may be
> wrong! ;-) )
.....
> 
> what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
> ext2 partition) and the data there was correct, so it seems that the
> disk is accessible and was not damaged by the short power failure.
> 
> how can i get the third disk back into the raid system? or do i have to
> backup all reconstructed data from the two disks and reinit the raid
> from the start?

mdadm -A --force
will only 'force' into the array enough drives to make it work.  For
your 3 drive raid5 array, it only needs to force in 2 drives, so it
takes the two most recent drives and uses them.  They will have all
the data on them, but no redundancy.
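
The selection described above can be illustrated with a small sketch. This is not mdadm's actual code, only a model of "keep the N-1 devices with the highest event count"; the helper name and the tie-breaking rule (input order) are assumptions.

```shell
#!/bin/sh
# pick_freshest: illustration only, NOT mdadm's actual selection code.
# stdin: one "device event-count" pair per line; $1: number of devices to keep.
pick_freshest() {
    # Sort by event count, highest first; -s keeps input order for ties.
    sort -s -k2,2nr | head -n "$1" | cut -d' ' -f1
}

# Event counts taken from the --examine output earlier in the thread.
# A 3-disk raid5 needs 3 - 1 = 2 devices to run.
printf '%s\n' '/dev/sda1 254' '/dev/sdb1 251' '/dev/sdc2 251' | pick_freshest 2
```

With these numbers the sketch prints /dev/sda1 and /dev/sdb1. The two stale devices are tied at 251, so which one is chosen depends on the tie-breaking rule; in the kernel log posted in this thread, sda1 and sdb1 were the two devices that ended up in the array.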

I suggest that after assembling the array, you 'fsck' the filesystem
on md0 just to make sure that the data is fine and then simply hot-add
the third device:
  mdadm /dev/md0 -a /dev/sdc2

If fsck reports lots of errors, maybe try forcing the assembly from a
different pair of drives, e.g.
   mdadm -A /dev/md0 --force /dev/sda1 /dev/sdc2

and then do the fsck.

NeilBrown
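
The sequence suggested in this reply can be put together as a short script. It is a sketch under stated assumptions: the device names are the ones from this thread, the member to re-add is taken to be the partition /dev/sdc2 listed in /etc/raidtab, and the RUN variable and function name are made up. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: every command is printed, not executed.
# Set RUN to an empty string (as root) to really execute the commands.
RUN=${RUN-echo}
MD=/dev/md0

recover_raid5() {
    # 1. Force-assemble; enough of the freshest members are used to start
    #    the array in degraded mode.
    $RUN mdadm -A "$MD" --force /dev/sda1 /dev/sdb1 /dev/sdc2 &&
    # 2. Read-only filesystem check (-n answers "no" to every repair prompt).
    $RUN fsck -n "$MD" &&
    # 3. Hot-add the member that was left out, so redundancy gets rebuilt.
    $RUN mdadm "$MD" -a /dev/sdc2
}

recover_raid5
```

The steps are chained with && so that the hot-add is only attempted if the assembly and the read-only check both succeed. Which partition has to be hot-added depends on which one was left out during assembly; in this thread that was /dev/sdc2.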

* Re: recover short-time-power-failure of raid 5?
From: Markus Krause @ 2002-11-28  9:26 UTC
  To: Neil Brown; +Cc: linux-raid

On Mon, 2002-11-25 at 10:59, Neil Brown wrote:
> On Monday November 25, krause@mogli-soft.de wrote:
> > 
> > hello neilbrown, 
> > 
> > thanks a lot for your fast response!
> > i tried the command as you suggested but i am not sure if it really
> > worked, it seems that /dev/sdc2 is not yet used (but of course i may be
> > wrong! ;-) )
> .....
> > 
> > what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
> > ext2 partition) and the data there was correct, so it seems that the
> > disk is accessible and was not damaged by the short power failure.
> > 
> > how can i get the third disk back into the raid system? or do i have to
> > backup all reconstructed data from the two disks and reinit the raid
> > from the start?
> 
> mdadm -A --force
> will only 'force' into the array enough drives to make it work.  For
> your 3 drive raid5 array, it only needs to force in 2 drives, so it
> takes the two most recent drives and uses them.  They will have all
> the data on them, but no redundancy.
> 
> I suggest that after assembling the array, you 'fsck' the filesystem
> on md0 just to make sure that the data is fine and then simply hot-add
> the third device:
>   mdadm /dev/md0 -a /dev/sdc2
> 
> If fsck reports lots of errors, maybe try forcing the assembly from a
> different pair of drives, e.g.
>    mdadm -A /dev/md0 --force /dev/sda1 /dev/sdc2
> 
> and then do the fsck.
> 
> NeilBrown
hi again and sorry for the delayed answer, i had to work a few days
"on the road" and could not test your suggestion until a few minutes ago!
and it worked exactly as you described! great! thanks a lot, that's really
cool!

what i did exactly (maybe someone else is interested): 
(i snipped the output) 

  [root@merlin root]# mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2

  [root@merlin root]# fsck.ext3 -n /dev/md0 

  [root@merlin root]# mdadm /dev/md0 -a /dev/sdc2

and everything worked, all the data is back again!

well, except for the fact that it is now an ext2 file system and not ext3, but
right now (until the current project is done in about two weeks) that's ok
for me, i can (must!) live without a journaling file system.

thanks again for all your help!

   markus
