* Trying to start dirty, degraded RAID6 array
@ 2006-04-26 23:37 Christopher Smith
2006-04-27 0:06 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Christopher Smith @ 2006-04-26 23:37 UTC (permalink / raw)
To: linux-raid
The short version:
I have a 12-disk RAID6 array that has lost a device and now whenever I
try to start it with:
mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1
I get:
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
And in dmesg:
md: bind<sdk1>
md: bind<sdi1>
md: bind<sdj1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdg1>
md: bind<sdb1>
md: bind<sdd1>
md: bind<sda1>
md: bind<sdc1>
md: bind<sdl1>
md: md0: raid array is not clean -- starting background reconstruction
raid6: device sdl1 operational as raid disk 0
raid6: device sdc1 operational as raid disk 11
raid6: device sda1 operational as raid disk 10
raid6: device sdd1 operational as raid disk 9
raid6: device sdb1 operational as raid disk 8
raid6: device sdg1 operational as raid disk 6
raid6: device sdf1 operational as raid disk 5
raid6: device sde1 operational as raid disk 4
raid6: device sdj1 operational as raid disk 3
raid6: device sdi1 operational as raid disk 2
raid6: device sdk1 operational as raid disk 1
raid6: cannot start dirty degraded array for md0
RAID6 conf printout:
--- rd:12 wd:11 fd:1
disk 0, o:1, dev:sdl1
disk 1, o:1, dev:sdk1
disk 2, o:1, dev:sdi1
disk 3, o:1, dev:sdj1
disk 4, o:1, dev:sde1
disk 5, o:1, dev:sdf1
disk 6, o:1, dev:sdg1
disk 8, o:1, dev:sdb1
disk 9, o:1, dev:sdd1
disk 10, o:1, dev:sda1
disk 11, o:1, dev:sdc1
raid6: failed to run raid set md0
md: pers->run() failed ...
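The conf printout above lists every slot except one. As a rough sketch (the helper name missing_slots is made up for illustration, and it assumes that "rd:" is the configured number of raid disks and that each "disk N," line names an occupied slot), the empty slot can be picked out of saved dmesg text like this:

```shell
#!/bin/sh
# missing_slots: read md "conf printout" text on stdin and print every
# slot number below rd: that has no matching "disk N," line.
# Hypothetical helper, not part of mdadm or the kernel.
missing_slots() {
    awk '
        /rd:[0-9]+/ {
            # e.g. "--- rd:12 wd:11 fd:1": take the number after "rd:"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^rd:[0-9]+$/) rd = substr($i, 4) + 0
        }
        /disk [0-9]+,/ {
            # e.g. "disk 10, o:1, dev:sda1": "10," + 0 evaluates to 10
            for (i = 1; i < NF; i++)
                if ($i == "disk") present[$(i + 1) + 0] = 1
        }
        END {
            for (s = 0; s < rd; s++)
                if (!(s in present)) print s
        }'
}
```

Fed the printout above (for example `dmesg | missing_slots`), this should report slot 7, which matches the "faulty removed" entry in the --examine output further down.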
I'm 99% sure the data is ok and I'd like to know how to force the array
online.
Longer version:
A couple of days ago I started having troubles with my fileserver
mysteriously hanging during boot (I was messing with trying to get Xen
running at the time, so lots of reboots were involved). I finally
nailed it down to the autostarting of the RAID array.
After several hours of pulling CPUs, SATA cards, RAM (not to mention
some scary problems with memtest86+ that turned out to be because "USB
Legacy" was enabled) I finally managed to figure out that one of my
drives would simply stop transferring data after about the first gig
(tested with dd, monitoring with iostat). About 30 seconds after the
drive "stops", the rest of the machine also hangs.
Interestingly, there are no error messages anywhere I could find
indicating the drive was having problems. Even its SMART test (smartctl
-t long) says it's ok. This made the problem substantially more
difficult to figure out.
I then tried to start the array without the broken disk and had the
problem mentioned in the short version above - the array wouldn't start,
presumably because its rebuild had been started and (uncleanly) stopped
about a dozen times since it last succeeded. I finally managed to get
the array online by starting it with all the disks, then immediately
knocking the one I knew to be bad offline with 'mdadm /dev/md0 -f
/dev/sdh1' before it hit the point where it would hang. After that the
rebuild completed without error (I didn't touch the machine at all while
it was rebuilding).
However, a few hours after the rebuild completed, a power failure killed
the machine again and now I can't start the array, as outlined in the
"short version" above. I must admit I find it a bit weird that the
array is "dirty and degraded" after it had successfully completed a rebuild.
Unfortunately the original failed drive (/dev/sdh) is no longer
available, so I can't do my original trick again. I'm pretty sure,
based on the rebuild completing previously, that the data will be fine
if I can just get the array back online. Is there some sort of
--really-force switch to mdadm? Can the array be brought back online
*without* triggering a rebuild, so I can get as much data as possible
off and then start from scratch again?
CS
Here is the 'mdadm --examine /dev/sdX' output for each of the remaining
drives, if it is helpful:
/dev/sda1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ebfc - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 10 8 1 10 active sync /dev/sda1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec08 - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 8 8 17 8 active sync /dev/sdb1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec1e - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 11 8 33 11 active sync /dev/sdc1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec2a - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 9 8 49 9 active sync /dev/sdd1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec30 - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 4 8 65 4 active sync /dev/sde1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec42 - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 5 8 81 5 active sync /dev/sdf1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdg1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec54 - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 6 8 97 6 active sync /dev/sdg1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdi1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec6c - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 2 8 129 2 active sync /dev/sdi1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdj1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec7e - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 3 8 145 3 active sync /dev/sdj1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdk1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec8a - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 1 8 161 1 active sync /dev/sdk1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
/dev/sdl1:
Magic : a92b4efc
Version : 00.90.02
UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
Creation Time : Wed Feb 1 01:09:11 2006
Raid Level : raid6
Device Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
Raid Devices : 12
Total Devices : 11
Preferred Minor : 0
Update Time : Wed Apr 26 22:30:01 2006
State : active
Active Devices : 11
Working Devices : 11
Failed Devices : 1
Spare Devices : 0
Checksum : 1685ec98 - correct
Events : 0.11176511
Number Major Minor RaidDevice State
this 0 8 177 0 active sync /dev/sdl1
0 0 8 177 0 active sync /dev/sdl1
1 1 8 161 1 active sync /dev/sdk1
2 2 8 129 2 active sync /dev/sdi1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 97 6 active sync /dev/sdg1
7 7 0 0 7 faulty removed
8 8 8 17 8 active sync /dev/sdb1
9 9 8 49 9 active sync /dev/sdd1
10 10 8 1 10 active sync /dev/sda1
11 11 8 33 11 active sync /dev/sdc1
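Reading eleven superblock dumps by eye is error-prone, so here is a small sketch that tallies the "Events" and "State" fields instead. The name summarize_examine is invented for illustration, and the parsing assumes the "Field : value" layout shown in the output above:

```shell
#!/bin/sh
# summarize_examine: read concatenated `mdadm --examine` text on stdin
# and print each distinct Events and State value together with the
# number of members reporting it. Hypothetical helper, not part of mdadm.
summarize_examine() {
    awk -F'[ \t]*:[ \t]*' '
        # $2 "" forces the value to be used as a string key
        /^[ \t]*Events[ \t]*:/ { events[$2 ""]++ }
        /^[ \t]*State[ \t]*:/  { state[$2 ""]++ }
        END {
            for (e in events) printf "Events %s x%d\n", e, events[e]
            for (s in state)  printf "State %s x%d\n", s, state[s]
        }'
}
```

Something like `for d in /dev/sd[abcdefgijkl]1; do mdadm --examine "$d"; done | summarize_examine` would feed it. For the dumps above it should show a single Events value (0.11176511) and a single State (active), each reported by all 11 members, i.e. the superblocks agree with each other and the array is simply marked active (not clean) while degraded.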
Cheers,
CS
* Re: Trying to start dirty, degraded RAID6 array
2006-04-26 23:37 Trying to start dirty, degraded RAID6 array Christopher Smith
@ 2006-04-27 0:06 ` Neil Brown
2006-04-27 0:22 ` Christopher Smith
0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2006-04-27 0:06 UTC (permalink / raw)
To: Christopher Smith; +Cc: linux-raid
On Thursday April 27, csmith@nighthawkrad.net wrote:
> The short version:
>
> I have a 12-disk RAID6 array that has lost a device and now whenever I
> try to start it with:
>
> mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1
>
> I get:
>
> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
>
...
> raid6: cannot start dirty degraded array for md0
The '-f' is meant to make this work. However it seems there is a bug.
Could you please test this patch? It isn't exactly the right fix, but
it definitely won't hurt.
Thanks,
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./super0.c | 1 +
1 file changed, 1 insertion(+)
diff ./super0.c~current~ ./super0.c
--- ./super0.c~current~ 2006-03-28 17:10:51.000000000 +1100
+++ ./super0.c 2006-04-27 10:03:40.000000000 +1000
@@ -372,6 +372,7 @@ static int update_super0(struct mdinfo *
         if (sb->level == 5 || sb->level == 4 || sb->level == 6)
             /* need to force clean */
             sb->state |= (1 << MD_SB_CLEAN);
+        rv = 1;
     }
     if (strcmp(update, "assemble")==0) {
         int d = info->disk.number;
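The context lines of the patch show what "force clean" amounts to: setting one bit in the superblock's state word. A toy illustration of that bit arithmetic follows. It assumes MD_SB_CLEAN is bit 0 (my reading of the kernel's md_p.h, not something stated in this thread); force_clean and is_clean are made-up names, and nothing here touches a real superblock.

```shell
#!/bin/sh
# Assumed bit position of the "clean" flag in the 0.90 superblock state.
MD_SB_CLEAN=0

# force_clean STATE: print STATE with the clean bit set, mirroring
# "sb->state |= (1 << MD_SB_CLEAN)" from the patch context.
force_clean() {
    echo $(( $1 | (1 << MD_SB_CLEAN) ))
}

# is_clean STATE: succeed when the clean bit is set in STATE.
is_clean() {
    [ $(( ($1 >> MD_SB_CLEAN) & 1 )) -eq 1 ]
}
```

For example, `force_clean 0` prints 1, after which `is_clean 1` succeeds. The added "rv = 1" line presumably tells the caller that the superblock was changed, but that is an inference from the patch, not something the message spells out.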
* Re: Trying to start dirty, degraded RAID6 array
2006-04-27 0:06 ` Neil Brown
@ 2006-04-27 0:22 ` Christopher Smith
2006-04-27 0:52 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Christopher Smith @ 2006-04-27 0:22 UTC (permalink / raw)
To: linux-raid
Neil Brown wrote:
> The '-f' is meant to make this work. However it seems there is a bug.
>
> Could you please test this patch? It isn't exactly the right fix, but
> it definitely won't hurt.
Thanks, Neil, I'll give this a go when I get home tonight.
Is there any way to start an array without kicking off a rebuild ?
CS
* Re: Trying to start dirty, degraded RAID6 array
2006-04-27 0:22 ` Christopher Smith
@ 2006-04-27 0:52 ` Neil Brown
0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2006-04-27 0:52 UTC (permalink / raw)
To: Christopher Smith; +Cc: linux-raid
On Thursday April 27, csmith@nighthawkrad.net wrote:
> Neil Brown wrote:
> > The '-f' is meant to make this work. However it seems there is a bug.
> >
> > Could you please test this patch? It isn't exactly the right fix, but
> > it definitely won't hurt.
>
> Thanks, Neil, I'll give this a go when I get home tonight.
>
> Is there any way to start an array without kicking off a rebuild ?
echo 1 > /sys/module/md_mod/parameters/start_ro
If you do this, then arrays will be read-only when they are started,
and so will not do a rebuild. The first write request to the array
(e.g. if you mount a filesystem) will cause a switch to read/write and
any required rebuild will start.
echo 0 > ....
will revert the effect.
This requires a reasonably recent kernel.
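A hedged sketch of wrapping that toggle so it is not left switched on by accident. The helper name with_start_ro and the START_RO override variable are invented for illustration; the sysfs path is the one given above. It assumes the setting only matters at the moment an array is started, which is how the description above reads.

```shell
#!/bin/sh
# Path from the message above; overridable so the helper can be tried
# against an ordinary file instead of the real sysfs parameter.
START_RO=${START_RO:-/sys/module/md_mod/parameters/start_ro}

# with_start_ro CMD [ARGS...]: remember the current start_ro value,
# set it to 1, run CMD, then write the old value back.
# Hypothetical helper, not part of mdadm. Needs root on a real system.
with_start_ro() {
    old=$(cat "$START_RO") || return 1
    echo 1 > "$START_RO" || return 1
    "$@"
    status=$?
    echo "$old" > "$START_RO"
    return "$status"
}
```

Usage would be along the lines of `with_start_ro mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1`. As described above, the first write to the array would still switch it to read/write and start any required rebuild, so anything mounted from it should be mounted read-only if the goal is to copy data off first.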
NeilBrown