* why won't this RAID5 device start?
@ 2021-03-28 2:14 David T-G
2021-03-28 4:36 ` Wols Lists
2021-03-28 19:27 ` Wols Lists
0 siblings, 2 replies; 4+ messages in thread
From: David T-G @ 2021-03-28 2:14 UTC (permalink / raw)
To: Linux RAID list
Hi, all --
I recently migrated our disk farm to a new box with a new OS build
(openSuSE from KNOPPIX). Aside from the usual challenges of setting
up the world again, I have a 3-device RAID5 volume that won't start.
The other metadevice is fine, though; I think we can say that the md
system is running. Soooooo ... Where do I start?
diskfarm:~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[3] sdd1[4] sdb1[0]
      11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

md127 : inactive sdl2[0](S) sdj2[3](S) sdf2[1](S)
      2196934199 blocks super 1.2

unused devices: <none>
[No, I don't know why md127 was first a moment ago!] I tried reassembling
the device, but mdadm doesn't like it.
diskfarm:~ # mdadm --stop /dev/md127
mdadm: stopped /dev/md127
diskfarm:~ # mdadm --assemble --scan
mdadm: /dev/md/750Graid5md assembled from 1 drive - not enough to start the array.
mdadm: Found some drive for an array that is already active: /dev/md/0
mdadm: giving up.
mdadm: No arrays found in config file or automatically
But ... what's with just 1 drive?
diskfarm:~ # for D in /dev/sd[fjl] ; do parted $D print ; done
Model: ATA WDC WD7500BPKX-7 (scsi)
Disk /dev/sdf: 750GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name              Flags
 2      1049kB  750GB  750GB  ntfs         Linux RAID        raid
 3      750GB   750GB  134MB  ext3         Linux filesystem

Model: ATA WDC WD7500BPKX-7 (scsi)
Disk /dev/sdj: 750GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name              Flags
 2      1049kB  750GB  750GB  ntfs         Linux RAID        raid
 3      750GB   750GB  134MB  xfs          Linux filesystem

Model: ATA Hitachi HDE72101 (scsi)
Disk /dev/sdl: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  4227MB  4226MB  primary  ntfs         diag, type=27
 2      4227MB  754GB   750GB   primary  reiserfs     raid, type=fd
 3      754GB   754GB   134MB   primary  reiserfs     type=83
 4      754GB   1000GB  246GB   primary  reiserfs     type=83
Slice 2 on each is the RAID partition, slice 3 on each is a little
filesystem for bare-bones info, and slice 4 on sdl is a normal basic
filesystem for scratch content.
diskfarm:~ # mdadm --examine /dev/sd[fjl]2 | egrep '/dev|Name|Role|State|Checksum|Events|UUID'
/dev/sdf2:
Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
Name : diskfarm:750Graid5md (local to host diskfarm)
State : clean
Device UUID : e916fc67:b8b7fc59:51440134:fa431d02
Checksum : 43f9e7a4 - correct
Events : 720
Device Role : Active device 1
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdj2:
Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
Name : diskfarm:750Graid5md (local to host diskfarm)
State : clean
Device UUID : 0b847f84:83e80a3d:a0dc11e7:60bffc9f
Checksum : 9522782b - correct
Events : 177792
Device Role : Active device 2
Array State : A.A ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl2:
Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
Name : diskfarm:750Graid5md (local to host diskfarm)
State : clean
Device UUID : cc53440e:cb9180e4:be4c38d4:88a676eb
Checksum : fef95256 - correct
Events : 177794
Device Role : Active device 0
Array State : A.. ('A' == active, '.' == missing, 'R' == replacing)
Slice f2 looks great, but slices j2 & l2 seem to be missing -- even though
they are present. Worse, the Events counter on sdf2 is frighteningly
small. Where did it go?!? So maybe I consider sdf2 failed and reassemble
from the other two [only] and then put f2 back in?
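[Ed.: a quick way to see the mismatch is to line the Events counters up side by side. A minimal sketch, with the counts copied from the --examine output above; on a live system they would come from `mdadm --examine` itself, which needs root:]

```shell
# Events counters copied from the mdadm --examine output above; live, you
# would extract them with e.g.
#   mdadm --examine /dev/sdX2 | awk '/Events/ {print $3}'
events='/dev/sdf2 720
/dev/sdj2 177792
/dev/sdl2 177794'

# Sort numerically on the counter; the last line is the freshest member.
freshest=$(printf '%s\n' "$events" | sort -k2,2n | tail -n1 | cut -d' ' -f1)
echo "$freshest"    # prints /dev/sdl2
```

[Ed.: sdj2 and sdl2 are only two events apart, so they still agree with each other; sdf2, stuck at 720, dropped out of the array long ago.]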
Definitely time to stop, take a deep breath, and ask for help :-)
TIA & HAND
:-D
--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: why won't this RAID5 device start?
  2021-03-28  2:14 why won't this RAID5 device start? David T-G
@ 2021-03-28  4:36 ` Wols Lists
  2021-03-28 19:27 ` Wols Lists
  0 siblings, 2 replies; 4+ messages in thread
From: Wols Lists @ 2021-03-28 4:36 UTC (permalink / raw)
To: David T-G, Linux RAID list

On 28/03/21 03:14, David T-G wrote:
> Hi, all --
>
> I recently migrated our disk farm to a new box with a new OS build
> (openSuSE from KNOPPIX). Aside from the usual challenges of setting
> up the world again, I have a 3-device RAID5 volume that won't start.
> The other metadevice is fine, though; I think we can say that the md
> system is running. Soooooo ... Where do I start?

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

Sorry I don't have time to look deeper right now. If someone else
doesn't chime in first, I'll take a look when I get back from work.

Cheers,
Wol
* Re: why won't this RAID5 device start?
  2021-03-28  2:14 why won't this RAID5 device start? David T-G
  2021-03-28  4:36 ` Wols Lists
@ 2021-03-28 19:27 ` Wols Lists
  2021-03-30  3:13 ` David T-G
  1 sibling, 1 reply; 4+ messages in thread
From: Wols Lists @ 2021-03-28 19:27 UTC (permalink / raw)
To: David T-G, Linux RAID list

On 28/03/21 03:14, David T-G wrote:
> Hi, all --
>
> I recently migrated our disk farm to a new box with a new OS build
> (openSuSE from KNOPPIX). Aside from the usual challenges of setting
> up the world again, I have a 3-device RAID5 volume that won't start.
> The other metadevice is fine, though; I think we can say that the md
> system is running. Soooooo ... Where do I start?
>
> diskfarm:~ # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdc1[3] sdd1[4] sdb1[0]
>       11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

This looks wrong - is this supposed to be a four-drive array? U_UU
implies a missing drive ... as does [4/3]

>
> md127 : inactive sdl2[0](S) sdj2[3](S) sdf2[1](S)
>       2196934199 blocks super 1.2
>
> unused devices: <none>
>
> [No, I don't know why md127 was first a moment ago!] I tried reassembling
> the device, but mdadm doesn't like it.
>
> diskfarm:~ # mdadm --stop /dev/md127
> mdadm: stopped /dev/md127
> diskfarm:~ # mdadm --assemble --scan
> mdadm: /dev/md/750Graid5md assembled from 1 drive - not enough to start the array.
> mdadm: Found some drive for an array that is already active: /dev/md/0
> mdadm: giving up.
> mdadm: No arrays found in config file or automatically
>
> But ... what's with just 1 drive?
>
> diskfarm:~ # for D in /dev/sd[fjl] ; do parted $D print ; done
> Model: ATA WDC WD7500BPKX-7 (scsi)
> Disk /dev/sdf: 750GB
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
> Disk Flags:
>
> Number  Start   End    Size   File system  Name              Flags
>  2      1049kB  750GB  750GB  ntfs         Linux RAID        raid
>  3      750GB   750GB  134MB  ext3         Linux filesystem
>
> Model: ATA WDC WD7500BPKX-7 (scsi)
> Disk /dev/sdj: 750GB
> Sector size (logical/physical): 512B/4096B
> Partition Table: gpt
> Disk Flags:
>
> Number  Start   End    Size   File system  Name              Flags
>  2      1049kB  750GB  750GB  ntfs         Linux RAID        raid
>  3      750GB   750GB  134MB  xfs          Linux filesystem
>
> Model: ATA Hitachi HDE72101 (scsi)
> Disk /dev/sdl: 1000GB
> Sector size (logical/physical): 512B/512B
> Partition Table: msdos
> Disk Flags:
>
> Number  Start   End     Size    Type     File system  Flags
>  1      1049kB  4227MB  4226MB  primary  ntfs         diag, type=27
>  2      4227MB  754GB   750GB   primary  reiserfs     raid, type=fd
>  3      754GB   754GB   134MB   primary  reiserfs     type=83
>  4      754GB   1000GB  246GB   primary  reiserfs     type=83
>
> Slice 2 on each is the RAID partition, slice 3 on each is a little
> filesystem for bare-bones info, and slice 4 on sdl is a normal basic
> filesystem for scratch content.
>
> diskfarm:~ # mdadm --examine /dev/sd[fjl]2 | egrep '/dev|Name|Role|State|Checksum|Events|UUID'
> /dev/sdf2:
> Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
> Name : diskfarm:750Graid5md (local to host diskfarm)
> State : clean
> Device UUID : e916fc67:b8b7fc59:51440134:fa431d02
> Checksum : 43f9e7a4 - correct
> Events : 720
> Device Role : Active device 1
> Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdj2:
> Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
> Name : diskfarm:750Graid5md (local to host diskfarm)
> State : clean
> Device UUID : 0b847f84:83e80a3d:a0dc11e7:60bffc9f
> Checksum : 9522782b - correct
> Events : 177792
> Device Role : Active device 2
> Array State : A.A ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdl2:
> Array UUID : 88575f01:592167fd:bd9f9ba1:a61fafc4
> Name : diskfarm:750Graid5md (local to host diskfarm)
> State : clean
> Device UUID : cc53440e:cb9180e4:be4c38d4:88a676eb
> Checksum : fef95256 - correct
> Events : 177794
> Device Role : Active device 0
> Array State : A.. ('A' == active, '.' == missing, 'R' == replacing)
>
> Slice f2 looks great, but slices j2 & l2 seem to be missing -- even though
> they are present. Worse, the Events counter on sdf2 is frighteningly
> small. Where did it go?!? So maybe I consider sdf2 failed and reassemble
> from the other two [only] and then put f2 back in?

Yes I'm afraid so. I'd guess you've been running a failed raid-5 for
ages, and because something hiccuped when you shut it down, the two good
drives drifted apart, and now they won't start ...

>
> Definitely time to stop, take a deep breath, and ask for help :-)
>
Read up about overlays on the web site, use an overlay to force-assemble
the two good drives, run fsck etc to check everything's good (you might
lose a bit of data, hopefully very little), then if everything looks
okay do it for real. Ie force-assemble the two good drives, then re-add
the third. I'd just do a quick smartctl health check on all three drives
first, just to make sure nothing is obviously wrong with them - a
problem could kill your array completely! Then add the third drive back
in (whether you use --add or --re-add probably won't make much difference).

Oh - and fix whatever is wrong with md0, too, before that dies on you!

Cheers,
Wol
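[Ed.: the overlay trick mentioned above boils down to putting a device-mapper snapshot in front of each member, so writes from the force-assembly land in a throwaway copy-on-write store instead of on the disk. A minimal sketch of the table line involved, assuming a loop device as the COW store; the paths and sector count here are illustrative, not taken from this thread:]

```shell
# Build a dm-snapshot table line of the form the wiki's overlay recipe
# feeds to dmsetup:
#   <start> <length-in-sectors> snapshot <origin> <COW device> P <chunk>
# 'P' makes the snapshot persistent; the chunk size is in 512-byte sectors.
make_snapshot_table() {
  origin=$1 cow=$2 sectors=$3
  echo "0 $sectors snapshot $origin $cow P 8"
}

table=$(make_snapshot_table /dev/sdj2 /dev/loop0 1464845056)
echo "$table"
# As root, one would then run something like
#   dmsetup create sdj2-overlay --table "$table"
# and force-assemble /dev/mapper/sdj2-overlay instead of /dev/sdj2.
```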
* Re: why won't this RAID5 device start?
  2021-03-28 19:27 ` Wols Lists
@ 2021-03-30  3:13 ` David T-G
  0 siblings, 0 replies; 4+ messages in thread
From: David T-G @ 2021-03-30 3:13 UTC (permalink / raw)
To: Linux RAID list

Wol, et al --

...and then Wols Lists said...
%
% On 28/03/21 03:14, David T-G wrote:
% > ...
% > system is running. Soooooo ... Where do I start?
% >
% > diskfarm:~ # cat /proc/mdstat
% > Personalities : [raid6] [raid5] [raid4]
% > md0 : active raid5 sdc1[3] sdd1[4] sdb1[0]
% >       11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
%
% This looks wrong - is this supposed to be a four-drive array? U_UU
% implies a missing drive ... as does [4/3]

Yep :-(  I sent a separate note about that one.

% >
% > md127 : inactive sdl2[0](S) sdj2[3](S) sdf2[1](S)
% >       2196934199 blocks super 1.2
% > ...
% > diskfarm:~ # mdadm --examine /dev/sd[fjl]2 | egrep '/dev|Name|Role|State|Checksum|Events|UUID'
% > /dev/sdf2:
...
% > Events : 720
% > Device Role : Active device 1
% > Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
% > /dev/sdj2:
...
% > Events : 177792
% > Device Role : Active device 2
% > Array State : A.A ('A' == active, '.' == missing, 'R' == replacing)
% > /dev/sdl2:
...
% > Events : 177794
% > Device Role : Active device 0
% > Array State : A.. ('A' == active, '.' == missing, 'R' == replacing)
% >
% > Slice f2 looks great, but slices j2 & l2 seem to be missing -- even though
% > they are present. Worse, the Events counter on sdf2 is frighteningly
% > small. Where did it go?!? So maybe I consider sdf2 failed and reassemble
% > from the other two [only] and then put f2 back in?
%
% Yes I'm afraid so. I'd guess you've been running a failed raid-5 for
% ages, and because something hiccuped when you shut it down, the two good
% drives drifted apart, and now they won't start ...

Arrrgh :-(  I was afraid of that ...

% >
% > Definitely time to stop, take a deep breath, and ask for help :-)
% >
% Read up about overlays on the web site, use an overlay to force-assemble

Ouch.  I tried that with a hiccup a few years ago and never really got
it down.  This is gonna take some reading again ...

% the two good drives, run fsck etc to check everything's good (you might
% lose a bit of data, hopefully very little), then if everything looks
% okay do it for real. Ie force-assemble the two good drives, then re-add

It looks like I want

  mdadm --assemble --force /dev/md127 /dev/sd[jl]2

for that based on the Go Wrong page and Go Wrogn collection.  Does that
sound right?

% the third. I'd just do a quick smartctl health check on all three drives
% first, just to make sure nothing is obviously wrong with them - a

That looks good, although perhaps quite simple:

diskfarm:~ # for D in /dev/sd[fjl] ; do echo "### $D" ; smartctl --health $D ; done
### /dev/sdf
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.63-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

### /dev/sdj
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.63-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

### /dev/sdl
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.63-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

But --info is longer and still doesn't give THAT much info, so maybe
this is fine :-)

% problem could kill your array completely! Then add the third drive back
% in (whether you use --add or --re-add probably won't make much difference).

OK.

%
% Oh - and fix whatever is wrong with md0, too, before that dies on you!

Yeah, about that ...  Haven't seen any replies there yet :-/

%
% Cheers,
% Wol

Thanks again & HANN
:-D

--
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
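[Ed.: pulling the thread's advice into one place, the proposed order of operations looks roughly like the following. A sketch only: the commands are printed rather than executed, since assembly needs root and the real disks, and per the advice above the force-assembly would first be rehearsed against overlays, with an fsck before trusting the result.]

```shell
# Dry run: emit the recovery sequence discussed in this thread instead of
# executing it. Device names are the ones from the thread.
recovery_plan() {
  cat <<'EOF'
mdadm --stop /dev/md127
mdadm --assemble --force /dev/md127 /dev/sdj2 /dev/sdl2
fsck -n /dev/md127
mdadm /dev/md127 --re-add /dev/sdf2
EOF
}
recovery_plan
```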