* recover short-time-power-failure of raid 5?
@ 2002-11-24 20:03 krause
2002-11-24 23:45 ` Neil Brown
0 siblings, 1 reply; 5+ messages in thread
From: krause @ 2002-11-24 20:03 UTC (permalink / raw)
To: linux-raid
hi folks!
i am using a small 3-disk software raid on a linux redhat 7.3 system for
my home directory.
never had any problems until today, when somehow the power cable to two of
my disks dropped out (a fan in the case, connected to the same cable, came
loose!). only some seconds later i shut down the system, switched
it off and reconnected the disks. the raid device /dev/md0 was not in use
at the time of the power failure (i was not logged in, and /dev/md0
only contains /home), so actually there should be no loss of data.
but how do i tell the system just to reuse the disks?
i read some howtos and articles in the mailing list archive but found either
no instructions for this case or (completely?) different approaches (one said
just use "mkraid --create", others "raidhotadd").
so what is the correct way? the data on /home is quite important to me
and the last backup, from a week ago, unfortunately does not include some
relevant changes! because of this i did not dare to try things out.
attached are the contents/output of /etc/raidtab, mdadm --examine and
/var/log/messages.
i hope you can help me to reuse my data!
thanks in advance for your hints!
markus
my config:
---8<---
[root@merlin root]# cat /etc/raidtab
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/sda1
raid-disk 0
device /dev/sdb1
raid-disk 1
device /dev/sdc2
raid-disk 2
[root@merlin root]#
--->8---
some other output:
---8<---
[root@merlin root]# mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
Creation Time : Thu Jun 20 17:13:01 2002
Raid Level : raid5
Device Size : 4192832 (3.100 GiB 4.34 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Nov 24 12:30:33 2002
State : clean, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : 5687c1bf - correct
Events : 0.254
Layout : left-asymmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     0       8        1        0        active sync   /dev/sda1
   0     0       8        1        0        active sync   /dev/sda1
   1     1       8       17        1        faulty        /dev/sdb1
   2     2       8       34        2        faulty        /dev/sdc2
[root@merlin root]#
[root@merlin root]# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
Creation Time : Thu Jun 20 17:13:01 2002
Raid Level : raid5
Device Size : 4192832 (3.100 GiB 4.34 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Nov 24 12:27:53 2002
State : dirty, no-errors
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 5687c139 - correct
Events : 0.251
Layout : left-asymmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     1       8       17        1        active sync   /dev/sdb1
   0     0       8        1        0        active sync   /dev/sda1
   1     1       8       17        1        active sync   /dev/sdb1
   2     2       8       34        2        active sync   /dev/sdc2
[root@merlin root]#
[root@merlin root]# mdadm --examine /dev/sdc2
/dev/sdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : 396dcf37:a40fe0d4:c6d72718:8dd3f5d6
Creation Time : Thu Jun 20 17:13:01 2002
Raid Level : raid5
Device Size : 4192832 (3.100 GiB 4.34 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Update Time : Sun Nov 24 12:27:53 2002
State : dirty, no-errors
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 5687c14c - correct
Events : 0.251
Layout : left-asymmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     2       8       34        2        active sync   /dev/sdc2
   0     0       8        1        0        active sync   /dev/sda1
   1     1       8       17        1        active sync   /dev/sdb1
   2     2       8       34        2        active sync   /dev/sdc2
[root@merlin root]#
--->8---
the info in /var/log/messages when running "raidstart /dev/md0":
---8<---
Nov 24 21:02:08 merlin kernel: [events: 000000fe]
Nov 24 21:02:08 merlin kernel: [events: 000000fb]
Nov 24 21:02:08 merlin kernel: [events: 000000fb]
Nov 24 21:02:08 merlin kernel: md: autorun ...
Nov 24 21:02:08 merlin kernel: md: considering sdc2 ...
Nov 24 21:02:08 merlin kernel: md: adding sdc2 ...
Nov 24 21:02:08 merlin kernel: md: adding sdb1 ...
Nov 24 21:02:08 merlin kernel: md: adding sda1 ...
Nov 24 21:02:08 merlin kernel: md: created md0
Nov 24 21:02:08 merlin kernel: md: bind<sda1,1>
Nov 24 21:02:08 merlin kernel: md: bind<sdb1,2>
Nov 24 21:02:08 merlin kernel: md: bind<sdc2,3>
Nov 24 21:02:08 merlin kernel: md: running: <sdc2><sdb1><sda1>
Nov 24 21:02:08 merlin kernel: md: sdc2's event counter: 000000fb
Nov 24 21:02:08 merlin kernel: md: sdb1's event counter: 000000fb
Nov 24 21:02:08 merlin kernel: md: sda1's event counter: 000000fe
Nov 24 21:02:08 merlin kernel: md: superblock update time inconsistency -- using the most recent one
Nov 24 21:02:08 merlin kernel: md: freshest: sda1
Nov 24 21:02:08 merlin kernel: md: kicking non-fresh sdc2 from array!
Nov 24 21:02:08 merlin kernel: md: unbind<sdc2,2>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sdc2)
Nov 24 21:02:08 merlin kernel: md: kicking non-fresh sdb1 from array!
Nov 24 21:02:08 merlin kernel: md: unbind<sdb1,1>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sdb1)
Nov 24 21:02:08 merlin kernel: md0: removing former faulty sdb1!
Nov 24 21:02:08 merlin kernel: md0: removing former faulty sdc2!
Nov 24 21:02:08 merlin kernel: md0: max total readahead window set to 512k
Nov 24 21:02:08 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 24 21:02:08 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 24 21:02:08 merlin kernel: raid5: not enough operational devices for md0 (2/3 failed)
Nov 24 21:02:08 merlin kernel: RAID5 conf printout:
Nov 24 21:02:08 merlin kernel: --- rd:3 wd:1 fd:2
Nov 24 21:02:08 merlin kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 24 21:02:08 merlin kernel: disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
Nov 24 21:02:08 merlin kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 24 21:02:08 merlin kernel: raid5: failed to run raid set md0
Nov 24 21:02:08 merlin kernel: md: pers->run() failed ...
Nov 24 21:02:08 merlin kernel: md :do_md_run() returned -22
Nov 24 21:02:08 merlin kernel: md: md0 stopped.
Nov 24 21:02:08 merlin kernel: md: unbind<sda1,0>
Nov 24 21:02:08 merlin kernel: md: export_rdev(sda1)
Nov 24 21:02:08 merlin kernel: md: ... autorun DONE.
---8<---
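A note on reading the two outputs above side by side: the "Events : 0.254" and "Events : 0.251" values printed by mdadm --examine appear to be the same counters as the hexadecimal [events: 000000fe] and [events: 000000fb] values in the kernel log. The sketch below only does that conversion. It assumes the number after the dot is the low word of the counter printed in decimal and that the kernel message shows that word in hex; the function name is made up for illustration.

```python
def examine_events_to_kernel_hex(events: str) -> str:
    """Convert an mdadm 'Events : hi.lo' value to the 8-digit hex form
    seen in the kernel log (assumption: the log shows only the low word)."""
    _hi, lo = (int(part) for part in events.split("."))
    return f"{lo:08x}"


# Event values copied from the mdadm --examine output in this message.
examine_output = {
    "/dev/sda1": "0.254",
    "/dev/sdb1": "0.251",
    "/dev/sdc2": "0.251",
}

for device, events in examine_output.items():
    print(device, events, "->", examine_events_to_kernel_hex(events))
```

Read this way, sda1 is three events ahead of sdb1 and sdc2, which matches the "kicking non-fresh" messages for those two devices.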
* Re: recover short-time-power-failure of raid 5?
2002-11-24 20:03 recover short-time-power-failure of raid 5? krause
@ 2002-11-24 23:45 ` Neil Brown
2002-11-25 9:42 ` krause
0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2002-11-24 23:45 UTC (permalink / raw)
To: krause; +Cc: linux-raid
On Sunday November 24, krause@mogli-soft.de wrote:
> hi folks!
>
> i am using a small 3-disk software raid on a linux redhat 7.3 system for
> my home directory.
> never had any problems until today, when somehow the power cable to two of
> my disks dropped out (a fan in the case, connected to the same cable, came
> loose!). only some seconds later i shut down the system, switched
> it off and reconnected the disks. the raid device /dev/md0 was not in use
> at the time of the power failure (i was not logged in, and /dev/md0
> only contains /home), so actually there should be no loss of data.
> but how do i tell the system just to reuse the disks?

mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2

NeilBrown
* Re: recover short-time-power-failure of raid 5?
2002-11-24 23:45 ` Neil Brown
@ 2002-11-25 9:42 ` krause
2002-11-25 9:59 ` Neil Brown
0 siblings, 1 reply; 5+ messages in thread
From: krause @ 2002-11-25 9:42 UTC (permalink / raw)
To: linux-raid
Neil Brown <neilb@cse.unsw.edu.au> wrote on 25.11.2002, 00:45:22:
> On Sunday November 24, krause@mogli-soft.de wrote:
> > hi folks!
> >
> > i am using a small 3-disk software raid on a linux redhat 7.3 system for
> > my home directory.
> > never had any problems until today, when somehow the power cable to two of
> > my disks dropped out (a fan in the case, connected to the same cable, came
> > loose!). only some seconds later i shut down the system, switched
> > it off and reconnected the disks. the raid device /dev/md0 was not in use
> > at the time of the power failure (i was not logged in, and /dev/md0
> > only contains /home), so actually there should be no loss of data.
> > but how do i tell the system just to reuse the disks?
>
>
> mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2
>
> NeilBrown

hello neilbrown,

thanks a lot for your fast response!
i tried the command as you suggested but i am not sure if it really
worked, it seems that /dev/sdc2 is not yet used (but of course i may be
wrong! ;-) )

after i ran "mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2" i
got in my /var/log/messages:
--->8---
Nov 25 10:09:31 merlin kernel: [events: 000000fe]
Nov 25 10:09:31 merlin kernel: md: bind<sdb1,1>
Nov 25 10:09:31 merlin kernel: [events: 000000fe]
Nov 25 10:09:31 merlin kernel: md: bind<sda1,2>
Nov 25 10:09:31 merlin kernel: md: sda1's event counter: 000000fe
Nov 25 10:09:31 merlin kernel: md: sdb1's event counter: 000000fe
Nov 25 10:09:31 merlin kernel: md0: removing former faulty sdc2!
Nov 25 10:09:31 merlin kernel: md0: max total readahead window set to 512k
Nov 25 10:09:31 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 25 10:09:31 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 25 10:09:31 merlin kernel: raid5: device sdb1 operational as raid disk 1
Nov 25 10:09:31 merlin kernel: raid5: md0, not all disks are operational -- trying to recover array
Nov 25 10:09:31 merlin kernel: raid5: allocated 3291kB for md0
Nov 25 10:09:31 merlin kernel: raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 0
Nov 25 10:09:31 merlin kernel: RAID5 conf printout:
Nov 25 10:09:31 merlin kernel: --- rd:3 wd:2 fd:1
Nov 25 10:09:31 merlin kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:09:31 merlin kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:09:31 merlin kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:09:31 merlin kernel: RAID5 conf printout:
Nov 25 10:09:31 merlin kernel: --- rd:3 wd:2 fd:1
Nov 25 10:09:31 merlin kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:09:31 merlin kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:09:31 merlin kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:09:31 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:09:31 merlin kernel: md: sda1 [events: 000000ff]<6>(write) sda1's sb offset: 4192960
Nov 25 10:09:31 merlin kernel: md: recovery thread got woken up ...
Nov 25 10:09:31 merlin kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Nov 25 10:09:31 merlin kernel: md: recovery thread finished ...
Nov 25 10:09:31 merlin kernel: md: sdb1 [events: 000000ff]<6>(write) sdb1's sb offset: 4192832
---8<---

then i ran "raidstop /dev/md0" because i thought i did something wrong!
/var/log/messages said:
--->8---
Nov 25 10:09:54 merlin kernel: md: marking sb clean...
Nov 25 10:09:54 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:09:54 merlin kernel: md: sda1 [events: 00000100]<6>(write) sda1's sb offset: 4192960
Nov 25 10:09:54 merlin kernel: md: sdb1 [events: 00000100]<6>(write) sdb1's sb offset: 4192832
Nov 25 10:09:55 merlin kernel: md: md0 stopped.
Nov 25 10:09:55 merlin kernel: md: unbind<sda1,1>
Nov 25 10:09:55 merlin kernel: md: export_rdev(sda1)
Nov 25 10:09:55 merlin kernel: md: unbind<sdb1,0>
Nov 25 10:09:55 merlin kernel: md: export_rdev(sdb1)
---8<---

but then i tried another "mdadm ..." just to see if maybe it takes two
steps/runs to get back two drives, but again /var/log/messages was not
encouraging (for me):
--->8---
Nov 25 10:10:15 merlin kernel: [events: 00000100]
Nov 25 10:10:15 merlin kernel: md: bind<sdb1,1>
Nov 25 10:10:15 merlin kernel: [events: 00000100]
Nov 25 10:10:15 merlin kernel: md: bind<sda1,2>
Nov 25 10:10:15 merlin kernel: md: sda1's event counter: 00000100
Nov 25 10:10:15 merlin kernel: md: sdb1's event counter: 00000100
Nov 25 10:10:15 merlin kernel: md0: max total readahead window set to 512k
Nov 25 10:10:15 merlin kernel: md0: 2 data-disks, max readahead per data-disk: 256k
Nov 25 10:10:15 merlin kernel: raid5: device sda1 operational as raid disk 0
Nov 25 10:10:15 merlin kernel: raid5: device sdb1 operational as raid disk 1
Nov 25 10:10:15 merlin kernel: raid5: md0, not all disks are operational -- trying to recover array
Nov 25 10:10:15 merlin kernel: raid5: allocated 3291kB for md0
Nov 25 10:10:15 merlin kernel: raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 0
Nov 25 10:10:15 merlin kernel: RAID5 conf printout:
Nov 25 10:10:15 merlin kernel: --- rd:3 wd:2 fd:1
Nov 25 10:10:15 merlin kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:10:15 merlin kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:10:15 merlin kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:10:15 merlin kernel: RAID5 conf printout:
Nov 25 10:10:15 merlin kernel: --- rd:3 wd:2 fd:1
Nov 25 10:10:15 merlin kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Nov 25 10:10:15 merlin kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb1
Nov 25 10:10:15 merlin kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Nov 25 10:10:15 merlin kernel: md: updating md0 RAID superblock on device
Nov 25 10:10:15 merlin kernel: md: sda1 [events: 00000101]<6>(write) sda1's sb offset: 4192960
Nov 25 10:10:15 merlin kernel: md: recovery thread got woken up ...
Nov 25 10:10:15 merlin kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Nov 25 10:10:15 merlin kernel: md: recovery thread finished ...
Nov 25 10:10:15 merlin kernel: md: sdb1 [events: 00000101]<6>(write) sdb1's sb offset: 4192832
---8<---

what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
ext2 partition) and the data there was correct, so it seems that the
disk is accessible and did not get damaged by the short power failure.

how can i get the third disk back into the raid system? or do i have to
back up all reconstructed data from the two disks and reinit the raid
from the start?

thanks again in advance! (i hope my surely stupid questions don't bother
you!)
markus
* Re: recover short-time-power-failure of raid 5?
2002-11-25 9:42 ` krause
@ 2002-11-25 9:59 ` Neil Brown
2002-11-28 9:26 ` Markus Krause
0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2002-11-25 9:59 UTC (permalink / raw)
To: krause; +Cc: linux-raid
On Monday November 25, krause@mogli-soft.de wrote:
>
> hello neilbrown,
>
> thanks a lot for your fast response!
> i tried the command as you suggested but i am not sure if it really
> worked, it seems that /dev/sdc2 is not yet used (but of course i may be
> wrong! ;-) )
.....
>
> what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
> ext2 partition) and the data there was correct, so it seems that the
> disk is accessible and did not get damaged by the short power failure.
>
> how can i get the third disk back into the raid system? or do i have to
> back up all reconstructed data from the two disks and reinit the raid
> from the start?

mdadm -A --force
will only 'force' into the array enough drives to make it work. For
your 3 drive raid5 array, it only needs to force in 2 drives, so it
takes the two most recent drives and uses them. They will have all
the data on them, but no redundancy.

I suggest that after assembling the array, you 'fsck' the filesystem
on md0 just to make sure that the data is fine and then simply hot-add
the third device:
mdadm /dev/md0 -a /dev/sdc

If fsck reports lots of errors..... maybe try forcing the assembly from a
different pair of drives. e.g.
mdadm -A --force /dev/sda1 /dev/sdc2

and then do the fsck.

NeilBrown
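The selection rule described in the reply above (force in only as many drives as the array needs, preferring the most recent ones) can be illustrated with a small sketch. This is a simplified model, not mdadm's actual code: the function name is hypothetical, the event counts are the ones from the first kernel log in this thread, and breaking the tie between sdb1 and sdc2 by listing order is an assumption that merely happens to match what the later log shows.

```python
from typing import Dict, List


def pick_freshest(events: Dict[str, int], needed: int) -> List[str]:
    """Return the `needed` devices with the highest event counts.

    Ties keep the order in which the devices were listed, because
    Python's sort is stable (an assumption about tie-breaking, not a
    statement about mdadm internals).
    """
    ranked = sorted(events, key=lambda dev: -events[dev])
    return ranked[:needed]


# Event counters from the kernel log before the forced assembly.
events = {"/dev/sda1": 0xFE, "/dev/sdb1": 0xFB, "/dev/sdc2": 0xFB}

# A 3-drive raid5 can run degraded on 2 drives.
chosen = pick_freshest(events, needed=2)
left_out = [dev for dev in events if dev not in chosen]

print("assembled from:", chosen)
print("left out, to be hot-added later:", left_out)
```

In this model the array comes up on two drives and the third is the one that still has to be hot-added and resynced, which is the degraded state the log in the previous message shows.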
* Re: recover short-time-power-failure of raid 5?
2002-11-25 9:59 ` Neil Brown
@ 2002-11-28 9:26 ` Markus Krause
0 siblings, 0 replies; 5+ messages in thread
From: Markus Krause @ 2002-11-28 9:26 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Mon, 2002-11-25 at 10.59, Neil Brown wrote:
> On Monday November 25, krause@mogli-soft.de wrote:
> >
> > hello neilbrown,
> >
> > thanks a lot for your fast response!
> > i tried the command as you suggested but i am not sure if it really
> > worked, it seems that /dev/sdc2 is not yet used (but of course i may be
> > wrong! ;-) )
> .....
> >
> > what about disk /dev/sdc2 ? i have tried to access /dev/sdc1 (a small
> > ext2 partition) and the data there was correct, so it seems that the
> > disk is accessible and did not get damaged by the short power failure.
> >
> > how can i get the third disk back into the raid system? or do i have to
> > back up all reconstructed data from the two disks and reinit the raid
> > from the start?
>
> mdadm -A --force
> will only 'force' into the array enough drives to make it work. For
> your 3 drive raid5 array, it only needs to force in 2 drives, so it
> takes the two most recent drives and uses them. They will have all
> the data on them, but no redundancy.
>
> I suggest that after assembling the array, you 'fsck' the filesystem
> on md0 just to make sure that the data is fine and then simply hot-add
> the third device:
> mdadm /dev/md0 -a /dev/sdc
>
> If fsck reports lots of errors..... maybe try forcing the assembly from a
> different pair of drives. e.g.
> mdadm -A --force /dev/sda1 /dev/sdc2
>
> and then do the fsck.
>
> NeilBrown

hi again and sorry for the delayed answer, i had to work a few days "on
the road" and could not test your suggestion until some minutes ago!

and it worked exactly as you described! great! thanks a lot, that's
really cool!

what i did exactly (maybe someone else is interested):
(i snipped the output)
[root@merlin root]# mdadm -A /dev/md0 --force /dev/sda1 /dev/sdb1 /dev/sdc2
[root@merlin root]# fsck.ext3 -n /dev/md0
[root@merlin root]# mdadm /dev/md0 -a /dev/sdc

and everything worked, all the data is back again! well, except for the
fact that it is now an ext2 file system and not ext3, but right now (till
the current project is done in about two weeks) that's ok for me, i can
(must!) live without a journaling file system.

thanks again for all your help!
markus