* raid6 rebuild not starting
@ 2011-12-11 7:03 Anssi Hannula
2011-12-12 3:01 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Anssi Hannula @ 2011-12-11 7:03 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 742 bytes --]
Hi!
After I rebooted during a raid6 rebuild, the rebuild didn't start again.
Instead, there is a flood of "RAID conf printout"s that seemingly happen
on array activity.
All the devices show up properly in --detail and two devices are marked
as "spare rebuilding", and I can access the contents of the array just
fine, but the rebuild doesn't actually start. Is this a bug or am I
missing something? :)
I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
the same issue. mdadm is 3.1.5.
I'm not using start_ro and writing to the array doesn't trigger a
rebuild either.
Attached are --examine outputs before assembly, kernel log output on
assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
--
Anssi Hannula
[-- Attachment #2: detail-md0.txt --]
[-- Type: text/plain, Size: 1561 bytes --]
/dev/md0:
Version : 1.2
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Array Size : 15628095488 (14904.11 GiB 16003.17 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 10
Total Devices : 13
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Dec 11 08:12:18 2011
State : active, degraded
Active Devices : 8
Working Devices : 13
Failed Devices : 0
Spare Devices : 5
Layout : left-symmetric
Chunk Size : 512K
Name : delta.onse.fi:delta (local to host delta.onse.fi)
UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Events : 293646
Number Major Minor RaidDevice State
5 8 225 0 active sync /dev/sdo1
2 65 1 1 active sync /dev/sdq1
11 8 161 2 active sync /dev/sdk1
17 8 113 3 spare rebuilding /dev/sdh1
10 8 145 4 active sync /dev/sdj1
9 8 177 5 active sync /dev/sdl1
8 8 241 6 active sync /dev/sdp1
18 8 81 7 spare rebuilding /dev/sdf1
12 8 193 8 active sync /dev/sdm1
13 8 209 9 active sync /dev/sdn1
14 8 97 - spare /dev/sdg1
15 8 65 - spare /dev/sde1
16 8 129 - spare /dev/sdi1
[-- Attachment #3: dmesg-assembled.txt --]
[-- Type: text/plain, Size: 4028 bytes --]
[ 763.062742] md: md0 stopped.
[ 763.070923] md: bind<sdq1>
[ 763.072029] md: bind<sdk1>
[ 763.072567] md: bind<sdh1>
[ 763.073404] md: bind<sdj1>
[ 763.074199] md: bind<sdl1>
[ 763.074752] md: bind<sdp1>
[ 763.075234] md: bind<sdf1>
[ 763.076543] md: bind<sdm1>
[ 763.077348] md: bind<sdn1>
[ 763.078020] md: bind<sdi1>
[ 763.078356] md: bind<sdg1>
[ 763.079611] md: bind<sde1>
[ 763.080159] md: bind<sdo1>
[ 763.082687] md/raid:md0: device sdo1 operational as raid disk 0
[ 763.082765] md/raid:md0: device sdn1 operational as raid disk 9
[ 763.082840] md/raid:md0: device sdm1 operational as raid disk 8
[ 763.082914] md/raid:md0: device sdp1 operational as raid disk 6
[ 763.082988] md/raid:md0: device sdl1 operational as raid disk 5
[ 763.083063] md/raid:md0: device sdj1 operational as raid disk 4
[ 763.083137] md/raid:md0: device sdk1 operational as raid disk 2
[ 763.083211] md/raid:md0: device sdq1 operational as raid disk 1
[ 763.084250] md/raid:md0: allocated 10592kB
[ 763.084522] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 763.084617] RAID conf printout:
[ 763.084685] --- level:6 rd:10 wd:8
[ 763.087189] disk 0, o:1, dev:sdo1
[ 763.087259] disk 1, o:1, dev:sdq1
[ 763.087328] disk 2, o:1, dev:sdk1
[ 763.087397] disk 3, o:1, dev:sdh1
[ 763.087465] disk 4, o:1, dev:sdj1
[ 763.087534] disk 5, o:1, dev:sdl1
[ 763.087603] disk 6, o:1, dev:sdp1
[ 763.087672] disk 7, o:1, dev:sdf1
[ 763.087741] disk 8, o:1, dev:sdm1
[ 763.087810] disk 9, o:1, dev:sdn1
[ 763.088062] created bitmap (15 pages) for device md0
[ 763.508492] md0: bitmap initialized from disk: read 1/1 pages, set 254 of 29809 bits
[ 763.953649] md0: detected capacity change from 0 to 16003169779712
[ 763.953810] RAID conf printout:
[ 763.953813] --- level:6 rd:10 wd:8
[ 763.953816] disk 0, o:1, dev:sdo1
[ 763.953818] disk 1, o:1, dev:sdq1
[ 763.953820] disk 2, o:1, dev:sdk1
[ 763.953822] disk 3, o:1, dev:sdh1
[ 763.953824] disk 4, o:1, dev:sdj1
[ 763.953826] disk 5, o:1, dev:sdl1
[ 763.953828] disk 6, o:1, dev:sdp1
[ 763.953830] disk 7, o:1, dev:sdf1
[ 763.953832] disk 8, o:1, dev:sdm1
[ 763.953834] disk 9, o:1, dev:sdn1
[ 763.953835] RAID conf printout:
[ 763.953837] --- level:6 rd:10 wd:8
[ 763.953838] disk 0, o:1, dev:sdo1
[ 763.953840] disk 1, o:1, dev:sdq1
[ 763.953843] disk 2, o:1, dev:sdk1
[ 763.953844] disk 3, o:1, dev:sdh1
[ 763.953846] disk 4, o:1, dev:sdj1
[ 763.953848] disk 5, o:1, dev:sdl1
[ 763.953849] disk 6, o:1, dev:sdp1
[ 763.953851] disk 7, o:1, dev:sdf1
[ 763.953853] disk 8, o:1, dev:sdm1
[ 763.953854] disk 9, o:1, dev:sdn1
[ 763.953856] RAID conf printout:
[ 763.953857] --- level:6 rd:10 wd:8
[ 763.953859] disk 0, o:1, dev:sdo1
[ 763.953861] disk 1, o:1, dev:sdq1
[ 763.953862] disk 2, o:1, dev:sdk1
[ 763.953864] disk 3, o:1, dev:sdh1
[ 763.953866] disk 4, o:1, dev:sdj1
[ 763.953867] disk 5, o:1, dev:sdl1
[ 763.953869] disk 6, o:1, dev:sdp1
[ 763.953871] disk 7, o:1, dev:sdf1
[ 763.953872] disk 8, o:1, dev:sdm1
[ 763.953874] disk 9, o:1, dev:sdn1
[ 763.953875] RAID conf printout:
[ 763.953876] --- level:6 rd:10 wd:8
[ 763.953878] disk 0, o:1, dev:sdo1
[ 763.953880] disk 1, o:1, dev:sdq1
[ 763.953882] disk 2, o:1, dev:sdk1
[ 763.953883] disk 3, o:1, dev:sdh1
[ 763.953885] disk 4, o:1, dev:sdj1
[ 763.953887] disk 5, o:1, dev:sdl1
[ 763.953888] disk 6, o:1, dev:sdp1
[ 763.953890] disk 7, o:1, dev:sdf1
[ 763.953892] disk 8, o:1, dev:sdm1
[ 763.953893] disk 9, o:1, dev:sdn1
[ 763.953895] RAID conf printout:
[ 763.953896] --- level:6 rd:10 wd:8
[ 763.953898] disk 0, o:1, dev:sdo1
[ 763.953900] disk 1, o:1, dev:sdq1
[ 763.953901] disk 2, o:1, dev:sdk1
[ 763.953903] disk 3, o:1, dev:sdh1
[ 763.953905] disk 4, o:1, dev:sdj1
[ 763.953906] disk 5, o:1, dev:sdl1
[ 763.953908] disk 6, o:1, dev:sdp1
[ 763.953910] disk 7, o:1, dev:sdf1
[ 763.953911] disk 8, o:1, dev:sdm1
[ 763.953913] disk 9, o:1, dev:sdn1
[-- Attachment #4: examine-sdf1.txt --]
[-- Type: text/plain, Size: 933 bytes --]
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x3
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 289089800 sectors
State : clean
Device UUID : 775affd2:7cedaf16:d29518a2:9ebda6ab
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 397ee09a - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #5: examine-sdh1.txt --]
[-- Type: text/plain, Size: 932 bytes --]
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x3
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 289089800 sectors
State : clean
Device UUID : 01c10562:b17a29e4:f554a3bd:bf9ad501
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 7b8709c - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #6: examine-sdj1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 46cf1470:7fba6892:5646483b:687df054
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 838b69a7 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #7: examine-sdk1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 8f58934b:e8c3b310:77280e93:c6dd7f1e
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : feaa3ed8 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #8: examine-sdl1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 7c742911:a4aa0b69:75767d03:d78f19aa
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 18a1418f - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #9: examine-sdm1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdm1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5da5f2c1:27900682:eed98123:d1dd7683
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : dbc70969 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #10: examine-sdn1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdn1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a3a9b4ad:cab2a5c4:fa4a83d2:6839a473
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : a956fd07 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #11: examine-sdo1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdo1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f17591b8:8012317d:ded224ad:8054aee9
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : bd6acbef - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #12: examine-sdp1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdp1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a0a2c206:5ce49b5c:f5a735f1:62352734
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 79908075 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #13: examine-sdq1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdq1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3cdd3e34:252a6077:8a1a5828:27528598
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 5d51902e - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #14: proc_mdstat.txt --]
[-- Type: text/plain, Size: 435 bytes --]
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid6 sdo1[5] sde1[15](S) sdg1[14](S) sdi1[16](S) sdn1[13] sdm1[12] sdf1[18] sdp1[8] sdl1[9] sdj1[10] sdh1[17] sdk1[11] sdq1[2]
15628095488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/8] [UUU_UUU_UU]
bitmap: 7/15 pages [28KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
116170128 blocks super 1.2 [2/2] [UU]
unused devices: <none>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-11 7:03 raid6 rebuild not starting Anssi Hannula
@ 2011-12-12 3:01 ` NeilBrown
2011-12-12 5:22 ` Anssi Hannula
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-12-12 3:01 UTC (permalink / raw)
To: Anssi Hannula; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2745 bytes --]
On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> Hi!
>
> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
> Instead, there is a flood of "RAID conf printout"s that seemingly happen
> on array activity.
>
> All the devices show up properly in --detail and two devices are marked
> as "spare rebuilding", and I can access the contents of the array just
> fine, but the rebuild doesn't actually start. Is this a bug or am I
> missing something? :)
>
> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
> the same issue. mdadm is 3.1.5.
>
> I'm not using start_ro and writing to the array doesn't trigger a
> rebuild either.
>
> Attached are --examine outputs before assembly, kernel log output on
> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>
Thank you for the very detailed problem report.
Unfortunately it is a complete mystery to me what is happening.
The repeated "RAID conf printout" messages are almost certainly coming from
the end of raid5_remove_disk.
It is being called from remove_and_add_spares for each of the two devices
that are being rebuilt. raid5_remove_disk declines to remove them because it
can keep rebuilding them.
remove_and_add_spares then counts them and notes there are 2.
md_check_recovery notes that this is > 0, so it should create a thread to run
md_do_sync.
md_do_sync should then print out a message like
md: recovery of RAID array md0
but it doesn't. So something went wrong.
There are three reasons that md_do_sync might not print a message:
1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
unlikely, and in any case md_check_recovery clears it.
2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
then we would be able to see that in /proc/mdstat as a "(readonly)"
status. But we don't.
3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
get set if kthread_should_stop() returns 'true', but that should only
happen if kthread_stop() was called. That is only called by
md_unregister_thread and I cannot see any way that could be call.
So. No idea.
Are you compiling these kernels yourself?
If so, could you:
- put a printk in the top of md_do_sync to report the values of
mddev->recovery and mddev->ro
- print a message whenever md_unregister_thread is called
- in md_check_recovery, in the
if (mddev->ro) {
/* Only thing we do on a ro array is remove
* failed devices.
*/
mdk_rdev_t *rdev;
in statement, print the value of mddev->ro.
Then see which of those printk's fire, and what they tell us.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 3:01 ` NeilBrown
@ 2011-12-12 5:22 ` Anssi Hannula
2011-12-12 5:42 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Anssi Hannula @ 2011-12-12 5:22 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3213 bytes --]
On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>
>> Hi!
>>
>> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> Instead, there is a flood of "RAID conf printout"s that seemingly happen
>> on array activity.
>>
>> All the devices show up properly in --detail and two devices are marked
>> as "spare rebuilding", and I can access the contents of the array just
>> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> missing something? :)
>>
>> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> the same issue. mdadm is 3.1.5.
>>
>> I'm not using start_ro and writing to the array doesn't trigger a
>> rebuild either.
>>
>> Attached are --examine outputs before assembly, kernel log output on
>> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>>
>
> Thank you for the very detailed problem report.
Thanks for the quick response :)
> Unfortunately it is a complete mystery to me what is happening.
>
> The repeated "RAID conf printout" messages are almost certainly coming from
> the end of raid5_remove_disk.
> It is being called from remove_and_add_spares for each of the two devices
> that are being rebuilt. raid5_remove_disk declines to remove them because it
> can keep rebuilding them.
>
> remove_and_add_spares then counts them and notes there are 2.
> md_check_recovery notes that this is > 0, so it should create a thread to run
> md_do_sync.
>
> md_do_sync should then print out a message like
> md: recovery of RAID array md0
>
> but it doesn't. So something went wrong.
> There are three reasons that md_do_sync might not print a message:
>
> 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
> unlikely, and in any case md_check_recovery clears it.
> 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
> then we would be able to see that in /proc/mdstat as a "(readonly)"
> status. But we don't.
> 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
> get set if kthread_should_stop() returns 'true', but that should only
> happen if kthread_stop() was called. That is only called by
> md_unregister_thread and I cannot see any way that could be call.
>
> So. No idea.
>
> Are you compiling these kernels yourself?
Nope (used Mageia kernels), but I did now (3.1.5).
> If so, could you:
> - put a printk in the top of md_do_sync to report the values of
> mddev->recovery and mddev->ro
> - print a message whenever md_unregister_thread is called
> - in md_check_recovery, in the
> if (mddev->ro) {
> /* Only thing we do on a ro array is remove
> * failed devices.
> */
> mdk_rdev_t *rdev;
>
> in statement, print the value of mddev->ro.
>
> Then see which of those printk's fire, and what they tell us.
Only the last one does, and mddev->ro == 0.
For reference, attached is the used patch and resulting log output.
--
Anssi Hannula
[-- Attachment #2: dmesg-dbg.txt --]
[-- Type: text/plain, Size: 7847 bytes --]
[ 46.356559] md: md0 stopped.
[ 46.363773] md: bind<sdq1>
[ 46.363920] md: bind<sdk1>
[ 46.364035] md: bind<sdh1>
[ 46.364170] md: bind<sdj1>
[ 46.364567] md: bind<sdl1>
[ 46.364972] md: bind<sdp1>
[ 46.365184] md: bind<sdf1>
[ 46.365603] md: bind<sdm1>
[ 46.366025] md: bind<sdn1>
[ 46.366219] md: bind<sdi1>
[ 46.366464] md: bind<sdg1>
[ 46.366751] md: bind<sde1>
[ 46.367222] md: bind<sdo1>
[ 46.374687] async_tx: api initialized (async)
[ 46.547228] raid6: int64x1 2723 MB/s
[ 46.717043] raid6: int64x2 3533 MB/s
[ 46.886928] raid6: int64x4 2398 MB/s
[ 47.056795] raid6: int64x8 2482 MB/s
[ 47.226617] raid6: sse2x1 7368 MB/s
[ 47.396465] raid6: sse2x2 8686 MB/s
[ 47.566333] raid6: sse2x4 9614 MB/s
[ 47.566336] raid6: using algorithm sse2x4 (9614 MB/s)
[ 47.573931] xor: automatically using best checksumming function: generic_sse
[ 47.616567] generic_sse: 12849.600 MB/sec
[ 47.616569] xor: using function: generic_sse (12849.600 MB/sec)
[ 47.625852] md: raid6 personality registered for level 6
[ 47.625854] md: raid5 personality registered for level 5
[ 47.625855] md: raid4 personality registered for level 4
[ 47.626275] md/raid:md0: device sdo1 operational as raid disk 0
[ 47.626277] md/raid:md0: device sdn1 operational as raid disk 9
[ 47.626279] md/raid:md0: device sdm1 operational as raid disk 8
[ 47.626280] md/raid:md0: device sdp1 operational as raid disk 6
[ 47.626281] md/raid:md0: device sdl1 operational as raid disk 5
[ 47.626283] md/raid:md0: device sdj1 operational as raid disk 4
[ 47.626284] md/raid:md0: device sdk1 operational as raid disk 2
[ 47.626285] md/raid:md0: device sdq1 operational as raid disk 1
[ 47.626899] md/raid:md0: allocated 10592kB
[ 47.627049] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 47.627150] RAID conf printout:
[ 47.627152] --- level:6 rd:10 wd:8
[ 47.627155] disk 0, o:1, dev:sdo1
[ 47.627157] disk 1, o:1, dev:sdq1
[ 47.627163] disk 2, o:1, dev:sdk1
[ 47.627165] disk 3, o:1, dev:sdh1
[ 47.627167] disk 4, o:1, dev:sdj1
[ 47.627169] disk 5, o:1, dev:sdl1
[ 47.627171] disk 6, o:1, dev:sdp1
[ 47.627173] disk 7, o:1, dev:sdf1
[ 47.627175] disk 8, o:1, dev:sdm1
[ 47.627177] disk 9, o:1, dev:sdn1
[ 47.627337] created bitmap (15 pages) for device md0
[ 48.028349] md0: bitmap initialized from disk: read 1/1 pages, set 257 of 29809 bits
[ 48.476099] md0: detected capacity change from 0 to 16003169779712
[ 48.476739] md0: in md_check_recovery(), mddev->ro = 0
[ 48.476742] RAID conf printout:
[ 48.476744] --- level:6 rd:10 wd:8
[ 48.476746] disk 0, o:1, dev:sdo1
[ 48.476748] disk 1, o:1, dev:sdq1
[ 48.476749] disk 2, o:1, dev:sdk1
[ 48.476751] disk 3, o:1, dev:sdh1
[ 48.476753] disk 4, o:1, dev:sdj1
[ 48.476754] disk 5, o:1, dev:sdl1
[ 48.476756] disk 6, o:1, dev:sdp1
[ 48.476757] disk 7, o:1, dev:sdf1
[ 48.476758] disk 8, o:1, dev:sdm1
[ 48.476760] disk 9, o:1, dev:sdn1
[ 48.476761] RAID conf printout:
[ 48.476762] --- level:6 rd:10 wd:8
[ 48.476763] disk 0, o:1, dev:sdo1
[ 48.476764] disk 1, o:1, dev:sdq1
[ 48.476766] disk 2, o:1, dev:sdk1
[ 48.476767] disk 3, o:1, dev:sdh1
[ 48.476768] disk 4, o:1, dev:sdj1
[ 48.476770] disk 5, o:1, dev:sdl1
[ 48.476771] disk 6, o:1, dev:sdp1
[ 48.476772] disk 7, o:1, dev:sdf1
[ 48.476773] disk 8, o:1, dev:sdm1
[ 48.476775] disk 9, o:1, dev:sdn1
[ 48.476775] RAID conf printout:
[ 48.476777] --- level:6 rd:10 wd:8
[ 48.476778] disk 0, o:1, dev:sdo1
[ 48.476779] disk 1, o:1, dev:sdq1
[ 48.476780] disk 2, o:1, dev:sdk1
[ 48.476781] disk 3, o:1, dev:sdh1
[ 48.476783] disk 4, o:1, dev:sdj1
[ 48.476784] disk 5, o:1, dev:sdl1
[ 48.476785] disk 6, o:1, dev:sdp1
[ 48.476786] disk 7, o:1, dev:sdf1
[ 48.476788] disk 8, o:1, dev:sdm1
[ 48.476789] disk 9, o:1, dev:sdn1
[ 48.476790] RAID conf printout:
[ 48.476791] --- level:6 rd:10 wd:8
[ 48.476792] disk 0, o:1, dev:sdo1
[ 48.476793] disk 1, o:1, dev:sdq1
[ 48.476795] disk 2, o:1, dev:sdk1
[ 48.476796] disk 3, o:1, dev:sdh1
[ 48.476798] disk 4, o:1, dev:sdj1
[ 48.476799] disk 5, o:1, dev:sdl1
[ 48.476800] disk 6, o:1, dev:sdp1
[ 48.476802] disk 7, o:1, dev:sdf1
[ 48.476803] disk 8, o:1, dev:sdm1
[ 48.476804] disk 9, o:1, dev:sdn1
[ 48.476805] RAID conf printout:
[ 48.476806] --- level:6 rd:10 wd:8
[ 48.476807] disk 0, o:1, dev:sdo1
[ 48.476809] disk 1, o:1, dev:sdq1
[ 48.476810] disk 2, o:1, dev:sdk1
[ 48.476811] disk 3, o:1, dev:sdh1
[ 48.476812] disk 4, o:1, dev:sdj1
[ 48.476813] disk 5, o:1, dev:sdl1
[ 48.476815] disk 6, o:1, dev:sdp1
[ 48.476816] disk 7, o:1, dev:sdf1
[ 48.476817] disk 8, o:1, dev:sdm1
[ 48.476819] disk 9, o:1, dev:sdn1
[ 48.488210] md0: unknown partition table
[ 49.328595] md1: in md_check_recovery(), mddev->ro = 0
[ 49.329692] EXT4-fs (dm-0): re-mounted. Opts: (null)
[ 49.534829] md1: in md_check_recovery(), mddev->ro = 0
[ 49.792196] md0: in md_check_recovery(), mddev->ro = 0
[ 50.349944] RAID conf printout:
[ 50.349949] --- level:6 rd:10 wd:8
[ 50.349952] disk 0, o:1, dev:sdo1
[ 50.349955] disk 1, o:1, dev:sdq1
[ 50.349957] disk 2, o:1, dev:sdk1
[ 50.349959] disk 3, o:1, dev:sdh1
[ 50.349961] disk 4, o:1, dev:sdj1
[ 50.349964] disk 5, o:1, dev:sdl1
[ 50.349966] disk 6, o:1, dev:sdp1
[ 50.349968] disk 7, o:1, dev:sdf1
[ 50.349970] disk 8, o:1, dev:sdm1
[ 50.349972] disk 9, o:1, dev:sdn1
[ 50.349974] RAID conf printout:
[ 50.349976] --- level:6 rd:10 wd:8
[ 50.349978] disk 0, o:1, dev:sdo1
[ 50.349980] disk 1, o:1, dev:sdq1
[ 50.349982] disk 2, o:1, dev:sdk1
[ 50.349984] disk 3, o:1, dev:sdh1
[ 50.349986] disk 4, o:1, dev:sdj1
[ 50.349988] disk 5, o:1, dev:sdl1
[ 50.349990] disk 6, o:1, dev:sdp1
[ 50.349992] disk 7, o:1, dev:sdf1
[ 50.349994] disk 8, o:1, dev:sdm1
[ 50.349996] disk 9, o:1, dev:sdn1
[ 50.349997] RAID conf printout:
[ 50.349999] --- level:6 rd:10 wd:8
[ 50.350001] disk 0, o:1, dev:sdo1
[ 50.350003] disk 1, o:1, dev:sdq1
[ 50.350005] disk 2, o:1, dev:sdk1
[ 50.350007] disk 3, o:1, dev:sdh1
[ 50.350009] disk 4, o:1, dev:sdj1
[ 50.350011] disk 5, o:1, dev:sdl1
[ 50.350013] disk 6, o:1, dev:sdp1
[ 50.350015] disk 7, o:1, dev:sdf1
[ 50.350017] disk 8, o:1, dev:sdm1
[ 50.350019] disk 9, o:1, dev:sdn1
[ 50.350020] RAID conf printout:
[ 50.350022] --- level:6 rd:10 wd:8
[ 50.350024] disk 0, o:1, dev:sdo1
[ 50.350026] disk 1, o:1, dev:sdq1
[ 50.350028] disk 2, o:1, dev:sdk1
[ 50.350030] disk 3, o:1, dev:sdh1
[ 50.350032] disk 4, o:1, dev:sdj1
[ 50.350034] disk 5, o:1, dev:sdl1
[ 50.350036] disk 6, o:1, dev:sdp1
[ 50.350038] disk 7, o:1, dev:sdf1
[ 50.350040] disk 8, o:1, dev:sdm1
[ 50.350042] disk 9, o:1, dev:sdn1
[ 50.350043] RAID conf printout:
[ 50.350045] --- level:6 rd:10 wd:8
[ 50.350047] disk 0, o:1, dev:sdo1
[ 50.350049] disk 1, o:1, dev:sdq1
[ 50.350051] disk 2, o:1, dev:sdk1
[ 50.350053] disk 3, o:1, dev:sdh1
[ 50.350055] disk 4, o:1, dev:sdj1
[ 50.350057] disk 5, o:1, dev:sdl1
[ 50.350059] disk 6, o:1, dev:sdp1
[ 50.350061] disk 7, o:1, dev:sdf1
[ 50.350063] disk 8, o:1, dev:sdm1
[ 50.350065] disk 9, o:1, dev:sdn1
[ 50.941625] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
[ 50.966825] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null)
[ 50.979025] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[ 50.991736] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 51.006856] loop: module loaded
[ 51.033690] md1: in md_check_recovery(), mddev->ro = 0
[-- Attachment #3: dbg.patch --]
[-- Type: text/x-patch, Size: 1013 bytes --]
--- md-orig/md.c 2011-10-24 10:10:05.000000000 +0300
+++ 3.1.4-MD/md.c 2011-12-12 06:26:43.388401596 +0200
@@ -6442,6 +6442,7 @@ mdk_thread_t *md_register_thread(void (*
void md_unregister_thread(mdk_thread_t **threadp)
{
mdk_thread_t *thread = *threadp;
+ printk(KERN_INFO "md_unregister_thread()\n");
if (!thread)
return;
dprintk("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
@@ -6996,6 +6997,8 @@ void md_do_sync(mddev_t *mddev)
mdk_rdev_t *rdev;
char *desc;
+ printk(KERN_INFO "%s: md_do_sync(), recovery=0x%lx, ro=%d\n",
+ mdname(mddev), mddev->recovery, mddev->ro);
/* just incase thread restarts... */
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
return;
@@ -7445,6 +7452,8 @@ void md_check_recovery(mddev_t *mddev)
if (mddev_trylock(mddev)) {
int spares = 0;
+ printk(KERN_INFO "%s: in md_check_recovery(), mddev->ro = %d\n",
+ mdname(mddev), mddev->ro);
if (mddev->ro) {
/* Only thing we do on a ro array is remove
* failed devices.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 5:22 ` Anssi Hannula
@ 2011-12-12 5:42 ` NeilBrown
2011-12-12 6:02 ` Anssi Hannula
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-12-12 5:42 UTC (permalink / raw)
To: Anssi Hannula; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3790 bytes --]
On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >
> >> Hi!
> >>
> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
> >> on array activity.
> >>
> >> All the devices show up properly in --detail and two devices are marked
> >> as "spare rebuilding", and I can access the contents of the array just
> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
> >> missing something? :)
> >>
> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
> >> the same issue. mdadm is 3.1.5.
> >>
> >> I'm not using start_ro and writing to the array doesn't trigger a
> >> rebuild either.
> >>
> >> Attached are --examine outputs before assembly, kernel log output on
> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
> >>
> >
> > Thank you for the very detailed problem report.
>
> Thanks for the quick response :)
>
> > Unfortunately it is a complete mystery to me what is happening.
> >
> > The repeated "RAID conf printout" messages are almost certainly coming from
> > the end of raid5_remove_disk.
> > It is being called from remove_and_add_spares for each of the two devices
> > that are being rebuilt. raid5_remove_disk declines to remove them because it
> > can keep rebuilding them.
> >
> > remove_and_add_spares then counts them and notes there are 2.
> > md_check_recovery notes that this is > 0, so it should create a thread to run
> > md_do_sync.
> >
> > md_do_sync should then print out a message like
> > md: recovery of RAID array md0
> >
> > but it doesn't. So something went wrong.
> > There are three reasons that md_do_sync might not print a message:
> >
> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
> > unlikely, and in any case md_check_recovery clears it.
> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
> > then we would be able to see that in /proc/mdstat as a "(readonly)"
> > status. But we don't.
> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
> > get set if kthread_should_stop() returns 'true', but that should only
> > happen if kthread_stop() was called. That is only called by
> > md_unregister_thread and I cannot see any way that could be call.
> >
> > So. No idea.
> >
> > Are you compiling these kernels yourself?
>
> Nope (used Mageia kernels), but I did now (3.1.5).
>
> > If so, could you:
> > - put a printk in the top of md_do_sync to report the values of
> > mddev->recovery and mddev->ro
> > - print a message whenever md_unregister_thread is called
> > - in md_check_recovery, in the
> > if (mddev->ro) {
> > /* Only thing we do on a ro array is remove
> > * failed devices.
> > */
> > mdk_rdev_t *rdev;
> >
> > in statement, print the value of mddev->ro.
> >
> > Then see which of those printk's fire, and what they tell us.
>
> Only the last one does, and mddev->ro == 0.
>
> For reference, attached is the used patch and resulting log output.
>
Thanks.
So it isn't running md_do_sync at all. Odd.
Could please add:
- call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
see who is calling it.
- print the value that remove_and_add_spares is going to return.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 5:42 ` NeilBrown
@ 2011-12-12 6:02 ` Anssi Hannula
2011-12-12 6:24 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Anssi Hannula @ 2011-12-12 6:02 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3928 bytes --]
On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>
>> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >
>> >> Hi!
>> >>
>> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
>> >> on array activity.
>> >>
>> >> All the devices show up properly in --detail and two devices are marked
>> >> as "spare rebuilding", and I can access the contents of the array just
>> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> >> missing something? :)
>> >>
>> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> >> the same issue. mdadm is 3.1.5.
>> >>
>> >> I'm not using start_ro and writing to the array doesn't trigger a
>> >> rebuild either.
>> >>
>> >> Attached are --examine outputs before assembly, kernel log output on
>> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>> >>
>> >
>> > Thank you for the very detailed problem report.
>>
>> Thanks for the quick response :)
>>
>> > Unfortunately it is a complete mystery to me what is happening.
>> >
>> > The repeated "RAID conf printout" messages are almost certainly coming from
>> > the end of raid5_remove_disk.
>> > It is being called from remove_and_add_spares for each of the two devices
>> > that are being rebuilt. raid5_remove_disk declines to remove them because it
>> > can keep rebuilding them.
>> >
>> > remove_and_add_spares then counts them and notes there are 2.
>> > md_check_recovery notes that this is > 0, so it should create a thread to run
>> > md_do_sync.
>> >
>> > md_do_sync should then print out a message like
>> > md: recovery of RAID array md0
>> >
>> > but it doesn't. So something went wrong.
>> > There are three reasons that md_do_sync might not print a message:
>> >
>> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
>> > unlikely, and in any case md_check_recovery clears it.
>> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
>> > then we would be able to see that in /proc/mdstat as a "(readonly)"
>> > status. But we don't.
>> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
>> > get set if kthread_should_stop() returns 'true', but that should only
>> > happen if kthread_stop() was called. That is only called by
>> > md_unregister_thread and I cannot see any way that could be call.
>> >
>> > So. No idea.
>> >
>> > Are you compiling these kernels yourself?
>>
>> Nope (used Mageia kernels), but I did now (3.1.5).
>>
>> > If so, could you:
>> > - put a printk in the top of md_do_sync to report the values of
>> > mddev->recovery and mddev->ro
>> > - print a message whenever md_unregister_thread is called
>> > - in md_check_recovery, in the
>> > if (mddev->ro) {
>> > /* Only thing we do on a ro array is remove
>> > * failed devices.
>> > */
>> > mdk_rdev_t *rdev;
>> >
>> > in statement, print the value of mddev->ro.
>> >
>> > Then see which of those printk's fire, and what they tell us.
>>
>> Only the last one does, and mddev->ro == 0.
>>
>> For reference, attached is the used patch and resulting log output.
>>
>
> Thanks.
>
> So it isn't running md_do_sync at all. Odd.
>
> Could please add:
> - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
> see who is calling it.
> - print the value that remove_and_add_spares is going to return.
Attached. As you can see, remove_and_add_spare returns 0.
--
Anssi Hannula
[-- Attachment #2: dmesg-dbg2.txt --]
[-- Type: text/plain, Size: 20695 bytes --]
[ 44.270890] md: md0 stopped.
[ 44.276951] md: bind<sdq1>
[ 44.277373] md: bind<sdo1>
[ 44.277504] md: bind<sdh1>
[ 44.278088] md: bind<sdk1>
[ 44.278816] md: bind<sdp1>
[ 44.282215] md: bind<sdn1>
[ 44.282377] md: bind<sdf1>
[ 44.282558] md: bind<sdm1>
[ 44.282741] md: bind<sdl1>
[ 44.282948] md: bind<sdi1>
[ 44.283085] md: bind<sdg1>
[ 44.283249] md: bind<sdd1>
[ 44.283449] md: bind<sdj1>
[ 44.288946] async_tx: api initialized (async)
[ 44.465572] raid6: int64x1 2753 MB/s
[ 44.637492] raid6: int64x2 3533 MB/s
[ 44.805245] raid6: int64x4 2396 MB/s
[ 44.975134] raid6: int64x8 2491 MB/s
[ 45.146107] raid6: sse2x1 7311 MB/s
[ 45.314810] raid6: sse2x2 8670 MB/s
[ 45.484662] raid6: sse2x4 9630 MB/s
[ 45.484665] raid6: using algorithm sse2x4 (9630 MB/s)
[ 45.491202] xor: automatically using best checksumming function: generic_sse
[ 45.534618] generic_sse: 12819.200 MB/sec
[ 45.534623] xor: using function: generic_sse (12819.200 MB/sec)
[ 45.543401] md: raid6 personality registered for level 6
[ 45.543402] md: raid5 personality registered for level 5
[ 45.543403] md: raid4 personality registered for level 4
[ 45.543961] md/raid:md0: device sdj1 operational as raid disk 0
[ 45.543963] md/raid:md0: device sdl1 operational as raid disk 9
[ 45.543964] md/raid:md0: device sdm1 operational as raid disk 8
[ 45.543966] md/raid:md0: device sdn1 operational as raid disk 6
[ 45.543967] md/raid:md0: device sdp1 operational as raid disk 5
[ 45.543968] md/raid:md0: device sdk1 operational as raid disk 4
[ 45.543969] md/raid:md0: device sdo1 operational as raid disk 2
[ 45.543970] md/raid:md0: device sdq1 operational as raid disk 1
[ 45.544667] md/raid:md0: allocated 10592kB
[ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 45.544907] ------------[ cut here ]------------
[ 45.544913] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 45.544920] Hardware name: System Product Name
[ 45.544922] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 45.544994] Pid: 848, comm: mdadm Tainted: G C 3.1.5 #5
[ 45.544995] Call Trace:
[ 45.545000] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 45.545002] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 45.545004] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 45.545007] [<ffffffffa067e2ba>] run+0x2aa/0x750 [raid456]
[ 45.545010] [<ffffffff811908a1>] ? bioset_create+0x1c1/0x280
[ 45.545012] [<ffffffff8133f3eb>] md_run+0x36b/0x8a0
[ 45.545014] [<ffffffff8134060e>] ? add_new_disk+0x2ee/0x550
[ 45.545016] [<ffffffff8133f93e>] do_md_run+0x1e/0xd0
[ 45.545017] [<ffffffff81341223>] md_ioctl+0x9b3/0x1230
[ 45.545021] [<ffffffff8121d260>] blkdev_ioctl+0x230/0x740
[ 45.545023] [<ffffffff81191a31>] block_ioctl+0x41/0x50
[ 45.545026] [<ffffffff8116f867>] do_vfs_ioctl+0x97/0x4f0
[ 45.545028] [<ffffffff8116fd51>] sys_ioctl+0x91/0xa0
[ 45.545031] [<ffffffff8143cd92>] system_call_fastpath+0x16/0x1b
[ 45.545032] ---[ end trace 8bf0f462f72ecc2c ]---
[ 45.545033] RAID conf printout:
[ 45.545034] --- level:6 rd:10 wd:8
[ 45.545035] disk 0, o:1, dev:sdj1
[ 45.545036] disk 1, o:1, dev:sdq1
[ 45.545037] disk 2, o:1, dev:sdo1
[ 45.545038] disk 3, o:1, dev:sdh1
[ 45.545039] disk 4, o:1, dev:sdk1
[ 45.545040] disk 5, o:1, dev:sdp1
[ 45.545041] disk 6, o:1, dev:sdn1
[ 45.545042] disk 7, o:1, dev:sdf1
[ 45.545043] disk 8, o:1, dev:sdm1
[ 45.545044] disk 9, o:1, dev:sdl1
[ 45.545192] created bitmap (15 pages) for device md0
[ 45.967725] md0: bitmap initialized from disk: read 1/1 pages, set 258 of 29809 bits
[ 46.420406] md0: detected capacity change from 0 to 16003169779712
[ 46.421264] md0: in md_check_recovery(), mddev->ro = 0
[ 46.421270] ------------[ cut here ]------------
[ 46.421279] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.421283] Hardware name: System Product Name
[ 46.421285] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.421387] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5
[ 46.421389] Call Trace:
[ 46.421396] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.421398] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.421402] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.421405] [<ffffffffa0675ff7>] raid5_remove_disk+0x47/0x120 [raid456]
[ 46.421408] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690
[ 46.421412] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456]
[ 46.421415] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 46.421419] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0
[ 46.421422] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 46.421426] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 46.421430] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 46.421433] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 46.421435] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 46.421439] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10
[ 46.421442] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 46.421444] [<ffffffff8143eef0>] ? gs_change+0x13/0x13
[ 46.421446] ---[ end trace 8bf0f462f72ecc2d ]---
[ 46.421447] RAID conf printout:
[ 46.421449] --- level:6 rd:10 wd:8
[ 46.421451] disk 0, o:1, dev:sdj1
[ 46.421453] disk 1, o:1, dev:sdq1
[ 46.421454] disk 2, o:1, dev:sdo1
[ 46.421456] disk 3, o:1, dev:sdh1
[ 46.421457] disk 4, o:1, dev:sdk1
[ 46.421459] disk 5, o:1, dev:sdp1
[ 46.421461] disk 6, o:1, dev:sdn1
[ 46.421463] disk 7, o:1, dev:sdf1
[ 46.421464] disk 8, o:1, dev:sdm1
[ 46.421466] disk 9, o:1, dev:sdl1
[ 46.421467] ------------[ cut here ]------------
[ 46.421471] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.421473] Hardware name: System Product Name
[ 46.421474] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.421531] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5
[ 46.421532] Call Trace:
[ 46.421535] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.421537] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.421540] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.421543] [<ffffffffa067600c>] raid5_remove_disk+0x5c/0x120 [raid456]
[ 46.421546] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690
[ 46.421549] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456]
[ 46.421551] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 46.421554] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0
[ 46.421559] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 46.421562] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 46.421564] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 46.421567] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 46.421569] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 46.421572] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10
[ 46.421574] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 46.421577] [<ffffffff8143eef0>] ? gs_change+0x13/0x13
[ 46.421578] ---[ end trace 8bf0f462f72ecc2e ]---
[ 46.421579] RAID conf printout:
[ 46.421581] --- level:6 rd:10 wd:8
[ 46.421582] disk 0, o:1, dev:sdj1
[ 46.421583] disk 1, o:1, dev:sdq1
[ 46.421585] disk 2, o:1, dev:sdo1
[ 46.421586] disk 3, o:1, dev:sdh1
[ 46.421588] disk 4, o:1, dev:sdk1
[ 46.421589] disk 5, o:1, dev:sdp1
[ 46.421590] disk 6, o:1, dev:sdn1
[ 46.421592] disk 7, o:1, dev:sdf1
[ 46.421593] disk 8, o:1, dev:sdm1
[ 46.421594] disk 9, o:1, dev:sdl1
[ 46.421596] ------------[ cut here ]------------
[ 46.421598] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.421600] Hardware name: System Product Name
[ 46.421601] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.421653] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5
[ 46.421655] Call Trace:
[ 46.421657] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.421659] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.421662] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.421665] [<ffffffffa0675ff7>] raid5_remove_disk+0x47/0x120 [raid456]
[ 46.421668] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690
[ 46.421671] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456]
[ 46.421673] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 46.421676] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0
[ 46.421678] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 46.421681] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 46.421686] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 46.421688] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 46.421691] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 46.421693] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10
[ 46.421695] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 46.421698] [<ffffffff8143eef0>] ? gs_change+0x13/0x13
[ 46.421699] ---[ end trace 8bf0f462f72ecc2f ]---
[ 46.421700] RAID conf printout:
[ 46.421701] --- level:6 rd:10 wd:8
[ 46.421703] disk 0, o:1, dev:sdj1
[ 46.421704] disk 1, o:1, dev:sdq1
[ 46.421705] disk 2, o:1, dev:sdo1
[ 46.421707] disk 3, o:1, dev:sdh1
[ 46.421708] disk 4, o:1, dev:sdk1
[ 46.421709] disk 5, o:1, dev:sdp1
[ 46.421711] disk 6, o:1, dev:sdn1
[ 46.421712] disk 7, o:1, dev:sdf1
[ 46.421713] disk 8, o:1, dev:sdm1
[ 46.421715] disk 9, o:1, dev:sdl1
[ 46.421716] ------------[ cut here ]------------
[ 46.421718] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.421720] Hardware name: System Product Name
[ 46.421721] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.421771] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5
[ 46.421772] Call Trace:
[ 46.421774] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.421776] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.421779] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.421782] [<ffffffffa067600c>] raid5_remove_disk+0x5c/0x120 [raid456]
[ 46.421784] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690
[ 46.421788] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456]
[ 46.421790] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 46.421792] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0
[ 46.421795] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 46.421797] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 46.421799] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 46.421802] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 46.421804] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 46.421806] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10
[ 46.421809] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 46.421814] [<ffffffff8143eef0>] ? gs_change+0x13/0x13
[ 46.421815] ---[ end trace 8bf0f462f72ecc30 ]---
[ 46.421816] RAID conf printout:
[ 46.421818] --- level:6 rd:10 wd:8
[ 46.421819] disk 0, o:1, dev:sdj1
[ 46.421821] disk 1, o:1, dev:sdq1
[ 46.421822] disk 2, o:1, dev:sdo1
[ 46.421823] disk 3, o:1, dev:sdh1
[ 46.421825] disk 4, o:1, dev:sdk1
[ 46.421826] disk 5, o:1, dev:sdp1
[ 46.421827] disk 6, o:1, dev:sdn1
[ 46.421829] disk 7, o:1, dev:sdf1
[ 46.421830] disk 8, o:1, dev:sdm1
[ 46.421831] disk 9, o:1, dev:sdl1
[ 46.421833] ------------[ cut here ]------------
[ 46.421835] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.421837] Hardware name: System Product Name
[ 46.421838] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.421890] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5
[ 46.421891] Call Trace:
[ 46.421893] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.421896] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.421899] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.421902] [<ffffffffa06757b4>] raid5_add_disk+0xd4/0x120 [raid456]
[ 46.421904] [<ffffffff8133dded>] md_check_recovery+0x46d/0x690
[ 46.421907] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456]
[ 46.421910] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 46.421912] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0
[ 46.421915] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 46.421917] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 46.421920] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 46.421922] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 46.421924] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 46.421927] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10
[ 46.421929] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 46.421932] [<ffffffff8143eef0>] ? gs_change+0x13/0x13
[ 46.421933] ---[ end trace 8bf0f462f72ecc31 ]---
[ 46.421934] RAID conf printout:
[ 46.421935] --- level:6 rd:10 wd:8
[ 46.421937] disk 0, o:1, dev:sdj1
[ 46.421938] disk 1, o:1, dev:sdq1
[ 46.421940] disk 2, o:1, dev:sdo1
[ 46.421941] disk 3, o:1, dev:sdh1
[ 46.421942] disk 4, o:1, dev:sdk1
[ 46.421944] disk 5, o:1, dev:sdp1
[ 46.421945] disk 6, o:1, dev:sdn1
[ 46.421946] disk 7, o:1, dev:sdf1
[ 46.421948] disk 8, o:1, dev:sdm1
[ 46.421949] disk 9, o:1, dev:sdl1
[ 46.421950] md0: remove_and_add_spares will return 0
[ 46.434878] md0: unknown partition table
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 6:02 ` Anssi Hannula
@ 2011-12-12 6:24 ` NeilBrown
2011-12-12 6:42 ` Anssi Hannula
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-12-12 6:24 UTC (permalink / raw)
To: Anssi Hannula; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5307 bytes --]
On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >
> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >> >
> >> >> Hi!
> >> >>
> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
> >> >> on array activity.
> >> >>
> >> >> All the devices show up properly in --detail and two devices are marked
> >> >> as "spare rebuilding", and I can access the contents of the array just
> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
> >> >> missing something? :)
> >> >>
> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
> >> >> the same issue. mdadm is 3.1.5.
> >> >>
> >> >> I'm not using start_ro and writing to the array doesn't trigger a
> >> >> rebuild either.
> >> >>
> >> >> Attached are --examine outputs before assembly, kernel log output on
> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
> >> >>
> >> >
> >> > Thank you for the very detailed problem report.
> >>
> >> Thanks for the quick response :)
> >>
> >> > Unfortunately it is a complete mystery to me what is happening.
> >> >
> >> > The repeated "RAID conf printout" messages are almost certainly coming from
> >> > the end of raid5_remove_disk.
> >> > It is being called from remove_and_add_spares for each of the two devices
> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it
> >> > can keep rebuilding them.
> >> >
> >> > remove_and_add_spares then counts them and notes there are 2.
> >> > md_check_recovery notes that this is > 0, so it should create a thread to run
> >> > md_do_sync.
> >> >
> >> > md_do_sync should then print out a message like
> >> > md: recovery of RAID array md0
> >> >
> >> > but it doesn't. So something went wrong.
> >> > There are three reasons that md_do_sync might not print a message:
> >> >
> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
> >> > unlikely, and in any case md_check_recovery clears it.
> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
> >> > then we would be able to see that in /proc/mdstat as a "(readonly)"
> >> > status. But we don't.
> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
> >> > get set if kthread_should_stop() returns 'true', but that should only
> >> > happen if kthread_stop() was called. That is only called by
> >> > md_unregister_thread and I cannot see any way that could be call.
> >> >
> >> > So. No idea.
> >> >
> >> > Are you compiling these kernels yourself?
> >>
> >> Nope (used Mageia kernels), but I did now (3.1.5).
> >>
> >> > If so, could you:
> >> > - put a printk in the top of md_do_sync to report the values of
> >> > mddev->recovery and mddev->ro
> >> > - print a message whenever md_unregister_thread is called
> >> > - in md_check_recovery, in the
> >> > if (mddev->ro) {
> >> > /* Only thing we do on a ro array is remove
> >> > * failed devices.
> >> > */
> >> > mdk_rdev_t *rdev;
> >> >
> >> > in statement, print the value of mddev->ro.
> >> >
> >> > Then see which of those printk's fire, and what they tell us.
> >>
> >> Only the last one does, and mddev->ro == 0.
> >>
> >> For reference, attached is the used patch and resulting log output.
> >>
> >
> > Thanks.
> >
> > So it isn't running md_do_sync at all. Odd.
> >
> > Could please add:
> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
> > see who is calling it.
> > - print the value that remove_and_add_spares is going to return.
>
> Attached. As you can see, remove_and_add_spare returns 0.
>
> --
> Anssi Hannula
Please add:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5c95ccb..fa56ac5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev)
}
}
+ printk("degraded=%d\n", mddev->degraded);
if (mddev->degraded) {
list_for_each_entry(rdev, &mddev->disks, same_set) {
+ printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags);
if (rdev->raid_disk >= 0 &&
!test_bit(In_sync, &rdev->flags) &&
!test_bit(Faulty, &rdev->flags))
'degraded' must be 2 as dmesg contains
[ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
and 'degraded' is exactly the difference between '8' and '10' there.
raid disks 3 and 7 must have In_sync and Faulty clear as both of them just
show "spare rebuilding" in the 'detail' output.
so remove_and_add_spares "must" return 2.
Hopefully the above patch will help me understand which of those is wrong.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 6:24 ` NeilBrown
@ 2011-12-12 6:42 ` Anssi Hannula
2011-12-12 7:10 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Anssi Hannula @ 2011-12-12 6:42 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5748 bytes --]
On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>
>> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >
>> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
>> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >> >
>> >> >> Hi!
>> >> >>
>> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
>> >> >> on array activity.
>> >> >>
>> >> >> All the devices show up properly in --detail and two devices are marked
>> >> >> as "spare rebuilding", and I can access the contents of the array just
>> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> >> >> missing something? :)
>> >> >>
>> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> >> >> the same issue. mdadm is 3.1.5.
>> >> >>
>> >> >> I'm not using start_ro and writing to the array doesn't trigger a
>> >> >> rebuild either.
>> >> >>
>> >> >> Attached are --examine outputs before assembly, kernel log output on
>> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>> >> >>
>> >> >
>> >> > Thank you for the very detailed problem report.
>> >>
>> >> Thanks for the quick response :)
>> >>
>> >> > Unfortunately it is a complete mystery to me what is happening.
>> >> >
>> >> > The repeated "RAID conf printout" messages are almost certainly coming from
>> >> > the end of raid5_remove_disk.
>> >> > It is being called from remove_and_add_spares for each of the two devices
>> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it
>> >> > can keep rebuilding them.
>> >> >
>> >> > remove_and_add_spares then counts them and notes there are 2.
>> >> > md_check_recovery notes that this is > 0, so it should create a thread to run
>> >> > md_do_sync.
>> >> >
>> >> > md_do_sync should then print out a message like
>> >> > md: recovery of RAID array md0
>> >> >
>> >> > but it doesn't. So something went wrong.
>> >> > There are three reasons that md_do_sync might not print a message:
>> >> >
>> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
>> >> > unlikely, and in any case md_check_recovery clears it.
>> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
>> >> > then we would be able to see that in /proc/mdstat as a "(readonly)"
>> >> > status. But we don't.
>> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
>> >> > get set if kthread_should_stop() returns 'true', but that should only
>> >> > happen if kthread_stop() was called. That is only called by
>> >> > md_unregister_thread and I cannot see any way that could be call.
>> >> >
>> >> > So. No idea.
>> >> >
>> >> > Are you compiling these kernels yourself?
>> >>
>> >> Nope (used Mageia kernels), but I did now (3.1.5).
>> >>
>> >> > If so, could you:
>> >> > - put a printk in the top of md_do_sync to report the values of
>> >> > mddev->recovery and mddev->ro
>> >> > - print a message whenever md_unregister_thread is called
>> >> > - in md_check_recovery, in the
>> >> > if (mddev->ro) {
>> >> > /* Only thing we do on a ro array is remove
>> >> > * failed devices.
>> >> > */
>> >> > mdk_rdev_t *rdev;
>> >> >
>> >> > in statement, print the value of mddev->ro.
>> >> >
>> >> > Then see which of those printk's fire, and what they tell us.
>> >>
>> >> Only the last one does, and mddev->ro == 0.
>> >>
>> >> For reference, attached is the used patch and resulting log output.
>> >>
>> >
>> > Thanks.
>> >
>> > So it isn't running md_do_sync at all. Odd.
>> >
>> > Could please add:
>> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
>> > see who is calling it.
>> > - print the value that remove_and_add_spares is going to return.
>>
>> Attached. As you can see, remove_and_add_spare returns 0.
>>
>> --
>> Anssi Hannula
>
>
> Please add:
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5c95ccb..fa56ac5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev)
> }
> }
>
> + printk("degraded=%d\n", mddev->degraded);
> if (mddev->degraded) {
> list_for_each_entry(rdev, &mddev->disks, same_set) {
> + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags);
> if (rdev->raid_disk >= 0 &&
> !test_bit(In_sync, &rdev->flags) &&
> !test_bit(Faulty, &rdev->flags))
>
>
> 'degraded' must be 2 as dmesg contains
>
> [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
>
> and 'degraded' is exactly the difference between '8' and '10' there.
>
> raid disks 3 and 7 must have In_sync and Faulty clear as both of them just
> show "spare rebuilding" in the 'detail' output.
>
> so remove_and_add_spares "must" return 2.
>
> Hopefully the above patch will help me understand which of those is wrong.
The output is:
[ 47.389379] md0: degraded=2
[ 47.389380] md0: raid_disk=0 flags=4
[ 47.389381] md0: raid_disk=-1 flags=0
Full assemble log attached.
--
Anssi Hannula
[-- Attachment #2: dmesg-dbg3.txt --]
[-- Type: text/plain, Size: 20875 bytes --]
[ 45.225768] md: md0 stopped.
[ 45.238528] md: bind<sdq1>
[ 45.243922] md: bind<sdo1>
[ 45.244059] md: bind<sdh1>
[ 45.246554] md: bind<sdk1>
[ 45.246987] md: bind<sdp1>
[ 45.247224] md: bind<sdn1>
[ 45.247357] md: bind<sdf1>
[ 45.247596] md: bind<sdm1>
[ 45.247738] md: bind<sdl1>
[ 45.247860] md: bind<sdi1>
[ 45.247986] md: bind<sdg1>
[ 45.248100] md: bind<sdd1>
[ 45.248408] md: bind<sdj1>
[ 45.251243] async_tx: api initialized (async)
[ 45.433646] raid6: int64x1 2754 MB/s
[ 45.603511] raid6: int64x2 3535 MB/s
[ 45.773372] raid6: int64x4 2375 MB/s
[ 45.943221] raid6: int64x8 2489 MB/s
[ 46.113076] raid6: sse2x1 7372 MB/s
[ 46.282914] raid6: sse2x2 8653 MB/s
[ 46.452774] raid6: sse2x4 9642 MB/s
[ 46.452776] raid6: using algorithm sse2x4 (9642 MB/s)
[ 46.460643] xor: automatically using best checksumming function: generic_sse
[ 46.502736] generic_sse: 12817.200 MB/sec
[ 46.502738] xor: using function: generic_sse (12817.200 MB/sec)
[ 46.514000] md: raid6 personality registered for level 6
[ 46.514002] md: raid5 personality registered for level 5
[ 46.514003] md: raid4 personality registered for level 4
[ 46.514225] md/raid:md0: device sdj1 operational as raid disk 0
[ 46.514227] md/raid:md0: device sdl1 operational as raid disk 9
[ 46.514228] md/raid:md0: device sdm1 operational as raid disk 8
[ 46.514230] md/raid:md0: device sdn1 operational as raid disk 6
[ 46.514231] md/raid:md0: device sdp1 operational as raid disk 5
[ 46.514232] md/raid:md0: device sdk1 operational as raid disk 4
[ 46.514233] md/raid:md0: device sdo1 operational as raid disk 2
[ 46.514234] md/raid:md0: device sdq1 operational as raid disk 1
[ 46.514822] md/raid:md0: allocated 10592kB
[ 46.514941] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 46.515042] ------------[ cut here ]------------
[ 46.515048] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 46.515055] Hardware name: System Product Name
[ 46.515057] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 46.515126] Pid: 899, comm: mdadm Tainted: G C 3.1.5 #6
[ 46.515127] Call Trace:
[ 46.515132] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 46.515134] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 46.515136] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 46.515139] [<ffffffffa06872ba>] run+0x2aa/0x750 [raid456]
[ 46.515142] [<ffffffff811908a1>] ? bioset_create+0x1c1/0x280
[ 46.515144] [<ffffffff8133f42b>] md_run+0x36b/0x8a0
[ 46.515146] [<ffffffff8134064e>] ? add_new_disk+0x2ee/0x550
[ 46.515149] [<ffffffff8104d544>] ? check_preempt_curr+0x84/0xa0
[ 46.515150] [<ffffffff8133f97e>] do_md_run+0x1e/0xd0
[ 46.515152] [<ffffffff81341263>] md_ioctl+0x9b3/0x1230
[ 46.515156] [<ffffffff8121d260>] blkdev_ioctl+0x230/0x740
[ 46.515158] [<ffffffff81191a31>] block_ioctl+0x41/0x50
[ 46.515160] [<ffffffff8116f867>] do_vfs_ioctl+0x97/0x4f0
[ 46.515162] [<ffffffff8116fd51>] sys_ioctl+0x91/0xa0
[ 46.515165] [<ffffffff8143cdd2>] system_call_fastpath+0x16/0x1b
[ 46.515166] ---[ end trace d6c4c3c9ed54b494 ]---
[ 46.515167] RAID conf printout:
[ 46.515168] --- level:6 rd:10 wd:8
[ 46.515169] disk 0, o:1, dev:sdj1
[ 46.515170] disk 1, o:1, dev:sdq1
[ 46.515171] disk 2, o:1, dev:sdo1
[ 46.515173] disk 3, o:1, dev:sdh1
[ 46.515173] disk 4, o:1, dev:sdk1
[ 46.515174] disk 5, o:1, dev:sdp1
[ 46.515175] disk 6, o:1, dev:sdn1
[ 46.515176] disk 7, o:1, dev:sdf1
[ 46.515177] disk 8, o:1, dev:sdm1
[ 46.515178] disk 9, o:1, dev:sdl1
[ 46.515327] created bitmap (15 pages) for device md0
[ 46.930490] md0: bitmap initialized from disk: read 1/1 pages, set 258 of 29809 bits
[ 47.387963] md0: detected capacity change from 0 to 16003169779712
[ 47.388949] md0: in md_check_recovery(), mddev->ro = 0
[ 47.388954] ------------[ cut here ]------------
[ 47.388961] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 47.388964] Hardware name: System Product Name
[ 47.388966] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 47.389058] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6
[ 47.389059] Call Trace:
[ 47.389064] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 47.389066] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 47.389068] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 47.389070] [<ffffffffa067eff7>] raid5_remove_disk+0x47/0x120 [raid456]
[ 47.389073] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0
[ 47.389076] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456]
[ 47.389079] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 47.389083] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0
[ 47.389085] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 47.389088] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 47.389091] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 47.389093] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 47.389094] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 47.389098] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10
[ 47.389100] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 47.389102] [<ffffffff8143ef30>] ? gs_change+0x13/0x13
[ 47.389103] ---[ end trace d6c4c3c9ed54b495 ]---
[ 47.389104] RAID conf printout:
[ 47.389105] --- level:6 rd:10 wd:8
[ 47.389106] disk 0, o:1, dev:sdj1
[ 47.389108] disk 1, o:1, dev:sdq1
[ 47.389109] disk 2, o:1, dev:sdo1
[ 47.389110] disk 3, o:1, dev:sdh1
[ 47.389111] disk 4, o:1, dev:sdk1
[ 47.389113] disk 5, o:1, dev:sdp1
[ 47.389114] disk 6, o:1, dev:sdn1
[ 47.389115] disk 7, o:1, dev:sdf1
[ 47.389116] disk 8, o:1, dev:sdm1
[ 47.389117] disk 9, o:1, dev:sdl1
[ 47.389118] ------------[ cut here ]------------
[ 47.389121] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 47.389122] Hardware name: System Product Name
[ 47.389123] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 47.389162] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6
[ 47.389163] Call Trace:
[ 47.389164] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 47.389166] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 47.389168] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 47.389171] [<ffffffffa067f00c>] raid5_remove_disk+0x5c/0x120 [raid456]
[ 47.389173] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0
[ 47.389175] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456]
[ 47.389177] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 47.389179] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0
[ 47.389181] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 47.389183] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 47.389184] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 47.389186] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 47.389188] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 47.389190] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10
[ 47.389191] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 47.389193] [<ffffffff8143ef30>] ? gs_change+0x13/0x13
[ 47.389194] ---[ end trace d6c4c3c9ed54b496 ]---
[ 47.389195] RAID conf printout:
[ 47.389196] --- level:6 rd:10 wd:8
[ 47.389197] disk 0, o:1, dev:sdj1
[ 47.389198] disk 1, o:1, dev:sdq1
[ 47.389199] disk 2, o:1, dev:sdo1
[ 47.389200] disk 3, o:1, dev:sdh1
[ 47.389201] disk 4, o:1, dev:sdk1
[ 47.389202] disk 5, o:1, dev:sdp1
[ 47.389203] disk 6, o:1, dev:sdn1
[ 47.389204] disk 7, o:1, dev:sdf1
[ 47.389205] disk 8, o:1, dev:sdm1
[ 47.389206] disk 9, o:1, dev:sdl1
[ 47.389206] ------------[ cut here ]------------
[ 47.389209] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 47.389210] Hardware name: System Product Name
[ 47.389210] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 47.389248] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6
[ 47.389249] Call Trace:
[ 47.389251] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 47.389253] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 47.389255] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 47.389257] [<ffffffffa067eff7>] raid5_remove_disk+0x47/0x120 [raid456]
[ 47.389259] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0
[ 47.389261] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456]
[ 47.389263] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 47.389265] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0
[ 47.389267] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 47.389269] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 47.389271] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 47.389273] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 47.389274] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 47.389276] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10
[ 47.389278] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 47.389280] [<ffffffff8143ef30>] ? gs_change+0x13/0x13
[ 47.389281] ---[ end trace d6c4c3c9ed54b497 ]---
[ 47.389282] RAID conf printout:
[ 47.389283] --- level:6 rd:10 wd:8
[ 47.389284] disk 0, o:1, dev:sdj1
[ 47.389285] disk 1, o:1, dev:sdq1
[ 47.389285] disk 2, o:1, dev:sdo1
[ 47.389286] disk 3, o:1, dev:sdh1
[ 47.389287] disk 4, o:1, dev:sdk1
[ 47.389288] disk 5, o:1, dev:sdp1
[ 47.389289] disk 6, o:1, dev:sdn1
[ 47.389290] disk 7, o:1, dev:sdf1
[ 47.389291] disk 8, o:1, dev:sdm1
[ 47.389292] disk 9, o:1, dev:sdl1
[ 47.389293] ------------[ cut here ]------------
[ 47.389295] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 47.389296] Hardware name: System Product Name
[ 47.389297] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 47.389334] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6
[ 47.389335] Call Trace:
[ 47.389337] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 47.389339] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 47.389341] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 47.389343] [<ffffffffa067f00c>] raid5_remove_disk+0x5c/0x120 [raid456]
[ 47.389345] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0
[ 47.389347] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456]
[ 47.389349] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 47.389351] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0
[ 47.389353] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 47.389355] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 47.389357] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 47.389359] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 47.389360] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 47.389362] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10
[ 47.389364] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 47.389366] [<ffffffff8143ef30>] ? gs_change+0x13/0x13
[ 47.389367] ---[ end trace d6c4c3c9ed54b498 ]---
[ 47.389368] RAID conf printout:
[ 47.389368] --- level:6 rd:10 wd:8
[ 47.389369] disk 0, o:1, dev:sdj1
[ 47.389370] disk 1, o:1, dev:sdq1
[ 47.389371] disk 2, o:1, dev:sdo1
[ 47.389372] disk 3, o:1, dev:sdh1
[ 47.389373] disk 4, o:1, dev:sdk1
[ 47.389374] disk 5, o:1, dev:sdp1
[ 47.389375] disk 6, o:1, dev:sdn1
[ 47.389376] disk 7, o:1, dev:sdf1
[ 47.389377] disk 8, o:1, dev:sdm1
[ 47.389378] disk 9, o:1, dev:sdl1
[ 47.389379] md0: degraded=2
[ 47.389380] md0: raid_disk=0 flags=4
[ 47.389381] md0: raid_disk=-1 flags=0
[ 47.389382] ------------[ cut here ]------------
[ 47.389384] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]()
[ 47.389385] Hardware name: System Product Name
[ 47.389386] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan]
[ 47.389423] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6
[ 47.389424] Call Trace:
[ 47.389426] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0
[ 47.389428] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20
[ 47.389430] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456]
[ 47.389432] [<ffffffffa067e7b4>] raid5_add_disk+0xd4/0x120 [raid456]
[ 47.389434] [<ffffffff8133ded9>] md_check_recovery+0x559/0x6d0
[ 47.389436] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456]
[ 47.389438] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60
[ 47.389440] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0
[ 47.389442] [<ffffffff81073660>] ? cascade+0xa0/0xa0
[ 47.389444] [<ffffffff81338f96>] md_thread+0x116/0x150
[ 47.389446] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40
[ 47.389447] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130
[ 47.389449] [<ffffffff810869a6>] kthread+0x96/0xa0
[ 47.389451] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10
[ 47.389453] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0
[ 47.389455] [<ffffffff8143ef30>] ? gs_change+0x13/0x13
[ 47.389456] ---[ end trace d6c4c3c9ed54b499 ]---
[ 47.389456] RAID conf printout:
[ 47.389457] --- level:6 rd:10 wd:8
[ 47.389458] disk 0, o:1, dev:sdj1
[ 47.389459] disk 1, o:1, dev:sdq1
[ 47.389460] disk 2, o:1, dev:sdo1
[ 47.389461] disk 3, o:1, dev:sdh1
[ 47.389462] disk 4, o:1, dev:sdk1
[ 47.389463] disk 5, o:1, dev:sdp1
[ 47.389464] disk 6, o:1, dev:sdn1
[ 47.389465] disk 7, o:1, dev:sdf1
[ 47.389466] disk 8, o:1, dev:sdm1
[ 47.389467] disk 9, o:1, dev:sdl1
[ 47.389468] md0: remove_and_add_spares will return 0
[ 47.397561] md0: unknown partition table
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 6:42 ` Anssi Hannula
@ 2011-12-12 7:10 ` NeilBrown
2011-12-12 7:25 ` Anssi Hannula
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-12-12 7:10 UTC (permalink / raw)
To: Anssi Hannula; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 6751 bytes --]
On Mon, 12 Dec 2011 08:42:49 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >
> >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote:
> >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >> >
> >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
> >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
> >> >> >
> >> >> >> Hi!
> >> >> >>
> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
> >> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
> >> >> >> on array activity.
> >> >> >>
> >> >> >> All the devices show up properly in --detail and two devices are marked
> >> >> >> as "spare rebuilding", and I can access the contents of the array just
> >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
> >> >> >> missing something? :)
> >> >> >>
> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
> >> >> >> the same issue. mdadm is 3.1.5.
> >> >> >>
> >> >> >> I'm not using start_ro and writing to the array doesn't trigger a
> >> >> >> rebuild either.
> >> >> >>
> >> >> >> Attached are --examine outputs before assembly, kernel log output on
> >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
> >> >> >>
> >> >> >
> >> >> > Thank you for the very detailed problem report.
> >> >>
> >> >> Thanks for the quick response :)
> >> >>
> >> >> > Unfortunately it is a complete mystery to me what is happening.
> >> >> >
> >> >> > The repeated "RAID conf printout" messages are almost certainly coming from
> >> >> > the end of raid5_remove_disk.
> >> >> > It is being called from remove_and_add_spares for each of the two devices
> >> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it
> >> >> > can keep rebuilding them.
> >> >> >
> >> >> > remove_and_add_spares then counts them and notes there are 2.
> >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run
> >> >> > md_do_sync.
> >> >> >
> >> >> > md_do_sync should then print out a message like
> >> >> > md: recovery of RAID array md0
> >> >> >
> >> >> > but it doesn't. So something went wrong.
> >> >> > There are three reasons that md_do_sync might not print a message:
> >> >> >
> >> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
> >> >> > unlikely, and in any case md_check_recovery clears it.
> >> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
> >> >> > then we would be able to see that in /proc/mdstat as a "(readonly)"
> >> >> > status. But we don't.
> >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
> >> >> > get set if kthread_should_stop() returns 'true', but that should only
> >> >> > happen if kthread_stop() was called. That is only called by
> >> >> > md_unregister_thread and I cannot see any way that could be call.
> >> >> >
> >> >> > So. No idea.
> >> >> >
> >> >> > Are you compiling these kernels yourself?
> >> >>
> >> >> Nope (used Mageia kernels), but I did now (3.1.5).
> >> >>
> >> >> > If so, could you:
> >> >> > - put a printk in the top of md_do_sync to report the values of
> >> >> > mddev->recovery and mddev->ro
> >> >> > - print a message whenever md_unregister_thread is called
> >> >> > - in md_check_recovery, in the
> >> >> > if (mddev->ro) {
> >> >> > /* Only thing we do on a ro array is remove
> >> >> > * failed devices.
> >> >> > */
> >> >> > mdk_rdev_t *rdev;
> >> >> >
> >> >> > in statement, print the value of mddev->ro.
> >> >> >
> >> >> > Then see which of those printk's fire, and what they tell us.
> >> >>
> >> >> Only the last one does, and mddev->ro == 0.
> >> >>
> >> >> For reference, attached is the used patch and resulting log output.
> >> >>
> >> >
> >> > Thanks.
> >> >
> >> > So it isn't running md_do_sync at all. Odd.
> >> >
> >> > Could please add:
> >> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
> >> > see who is calling it.
> >> > - print the value that remove_and_add_spares is going to return.
> >>
> >> Attached. As you can see, remove_and_add_spare returns 0.
> >>
> >> --
> >> Anssi Hannula
> >
> >
> > Please add:
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 5c95ccb..fa56ac5 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev)
> > }
> > }
> >
> > + printk("degraded=%d\n", mddev->degraded);
> > if (mddev->degraded) {
> > list_for_each_entry(rdev, &mddev->disks, same_set) {
> > + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags);
> > if (rdev->raid_disk >= 0 &&
> > !test_bit(In_sync, &rdev->flags) &&
> > !test_bit(Faulty, &rdev->flags))
> >
> >
> > 'degraded' must be 2 as dmesg contains
> >
> > [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
> >
> > and 'degraded' is exactly the difference between '8' and '10' there.
> >
> > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just
> > show "spare rebuilding" in the 'detail' output.
> >
> > so remove_and_add_spares "must" return 2.
> >
> > Hopefully the above patch will help me understand which of those is wrong.
>
> The output is:
> [ 47.389379] md0: degraded=2
> [ 47.389380] md0: raid_disk=0 flags=4
> [ 47.389381] md0: raid_disk=-1 flags=0
>
> Full assemble log attached.
>
> --
> Anssi Hannula
Bingo.
This will fix it. We don't really need that 'break' there, and it is a
problem.
Thanks.
NeilBrown
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5c95ccb..440d964 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7344,8 +7344,7 @@ static int remove_and_add_spares(mddev_t *mddev)
spares++;
md_new_event(mddev);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
- } else
- break;
+ }
}
}
}
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting
2011-12-12 7:10 ` NeilBrown
@ 2011-12-12 7:25 ` Anssi Hannula
0 siblings, 0 replies; 9+ messages in thread
From: Anssi Hannula @ 2011-12-12 7:25 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Dec 12, 2011 at 9:10 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 12 Dec 2011 08:42:49 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>
>> On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >
>> >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote:
>> >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >> >
>> >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote:
>> >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote:
>> >> >> >
>> >> >> >> Hi!
>> >> >> >>
>> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> >> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen
>> >> >> >> on array activity.
>> >> >> >>
>> >> >> >> All the devices show up properly in --detail and two devices are marked
>> >> >> >> as "spare rebuilding", and I can access the contents of the array just
>> >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> >> >> >> missing something? :)
>> >> >> >>
>> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> >> >> >> the same issue. mdadm is 3.1.5.
>> >> >> >>
>> >> >> >> I'm not using start_ro and writing to the array doesn't trigger a
>> >> >> >> rebuild either.
>> >> >> >>
>> >> >> >> Attached are --examine outputs before assembly, kernel log output on
>> >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>> >> >> >>
>> >> >> >
>> >> >> > Thank you for the very detailed problem report.
>> >> >>
>> >> >> Thanks for the quick response :)
>> >> >>
>> >> >> > Unfortunately it is a complete mystery to me what is happening.
>> >> >> >
>> >> >> > The repeated "RAID conf printout" messages are almost certainly coming from
>> >> >> > the end of raid5_remove_disk.
>> >> >> > It is being called from remove_and_add_spares for each of the two devices
>> >> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it
>> >> >> > can keep rebuilding them.
>> >> >> >
>> >> >> > remove_and_add_spares then counts them and notes there are 2.
>> >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run
>> >> >> > md_do_sync.
>> >> >> >
>> >> >> > md_do_sync should then print out a message like
>> >> >> > md: recovery of RAID array md0
>> >> >> >
>> >> >> > but it doesn't. So something went wrong.
>> >> >> > There are three reasons that md_do_sync might not print a message:
>> >> >> >
>> >> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
>> >> >> > unlikely, and in any case md_check_recovery clears it.
>> >> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
>> >> >> > then we would be able to see that in /proc/mdstat as a "(readonly)"
>> >> >> > status. But we don't.
>> >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
>> >> >> > get set if kthread_should_stop() returns 'true', but that should only
>> >> >> > happen if kthread_stop() was called. That is only called by
>> >> >> > md_unregister_thread and I cannot see any way that could be call.
>> >> >> >
>> >> >> > So. No idea.
>> >> >> >
>> >> >> > Are you compiling these kernels yourself?
>> >> >>
>> >> >> Nope (used Mageia kernels), but I did now (3.1.5).
>> >> >>
>> >> >> > If so, could you:
>> >> >> > - put a printk in the top of md_do_sync to report the values of
>> >> >> > mddev->recovery and mddev->ro
>> >> >> > - print a message whenever md_unregister_thread is called
>> >> >> > - in md_check_recovery, in the
>> >> >> > if (mddev->ro) {
>> >> >> > /* Only thing we do on a ro array is remove
>> >> >> > * failed devices.
>> >> >> > */
>> >> >> > mdk_rdev_t *rdev;
>> >> >> >
>> >> >> > in statement, print the value of mddev->ro.
>> >> >> >
>> >> >> > Then see which of those printk's fire, and what they tell us.
>> >> >>
>> >> >> Only the last one does, and mddev->ro == 0.
>> >> >>
>> >> >> For reference, attached is the used patch and resulting log output.
>> >> >>
>> >> >
>> >> > Thanks.
>> >> >
>> >> > So it isn't running md_do_sync at all. Odd.
>> >> >
>> >> > Could please add:
>> >> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can
>> >> > see who is calling it.
>> >> > - print the value that remove_and_add_spares is going to return.
>> >>
>> >> Attached. As you can see, remove_and_add_spare returns 0.
>> >>
>> >> --
>> >> Anssi Hannula
>> >
>> >
>> > Please add:
>> >
>> > diff --git a/drivers/md/md.c b/drivers/md/md.c
>> > index 5c95ccb..fa56ac5 100644
>> > --- a/drivers/md/md.c
>> > +++ b/drivers/md/md.c
>> > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev)
>> > }
>> > }
>> >
>> > + printk("degraded=%d\n", mddev->degraded);
>> > if (mddev->degraded) {
>> > list_for_each_entry(rdev, &mddev->disks, same_set) {
>> > + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags);
>> > if (rdev->raid_disk >= 0 &&
>> > !test_bit(In_sync, &rdev->flags) &&
>> > !test_bit(Faulty, &rdev->flags))
>> >
>> >
>> > 'degraded' must be 2 as dmesg contains
>> >
>> > [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
>> >
>> > and 'degraded' is exactly the difference between '8' and '10' there.
>> >
>> > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just
>> > show "spare rebuilding" in the 'detail' output.
>> >
>> > so remove_and_add_spares "must" return 2.
>> >
>> > Hopefully the above patch will help me understand which of those is wrong.
>>
>> The output is:
>> [ 47.389379] md0: degraded=2
>> [ 47.389380] md0: raid_disk=0 flags=4
>> [ 47.389381] md0: raid_disk=-1 flags=0
>>
>> Full assemble log attached.
>>
>
> Bingo.
>
> This will fix it. We don't really need that 'break' there, and it is a
> problem.
>
> Thanks.
Confirmed. Thanks for the quick fix :)
--
Anssi Hannula
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-12-12 7:25 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-11 7:03 raid6 rebuild not starting Anssi Hannula
2011-12-12 3:01 ` NeilBrown
2011-12-12 5:22 ` Anssi Hannula
2011-12-12 5:42 ` NeilBrown
2011-12-12 6:02 ` Anssi Hannula
2011-12-12 6:24 ` NeilBrown
2011-12-12 6:42 ` Anssi Hannula
2011-12-12 7:10 ` NeilBrown
2011-12-12 7:25 ` Anssi Hannula
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.