* raid6 rebuild not starting
@ 2011-12-11 7:03 Anssi Hannula
2011-12-12 3:01 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Anssi Hannula @ 2011-12-11 7:03 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 742 bytes --]
Hi!
After I rebooted during a raid6 rebuild, the rebuild didn't start again.
Instead, there is a flood of "RAID conf printout"s that seemingly happen
on array activity.
All the devices show up properly in --detail and two devices are marked
as "spare rebuilding", and I can access the contents of the array just
fine, but the rebuild doesn't actually start. Is this a bug or am I
missing something? :)
I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
the same issue. mdadm is 3.1.5.
I'm not using start_ro and writing to the array doesn't trigger a
rebuild either.
Attached are --examine outputs before assembly, kernel log output on
assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
--
Anssi Hannula
[-- Attachment #2: detail-md0.txt --]
[-- Type: text/plain, Size: 1561 bytes --]
/dev/md0:
Version : 1.2
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Array Size : 15628095488 (14904.11 GiB 16003.17 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 10
Total Devices : 13
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Dec 11 08:12:18 2011
State : active, degraded
Active Devices : 8
Working Devices : 13
Failed Devices : 0
Spare Devices : 5
Layout : left-symmetric
Chunk Size : 512K
Name : delta.onse.fi:delta (local to host delta.onse.fi)
UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Events : 293646
Number Major Minor RaidDevice State
5 8 225 0 active sync /dev/sdo1
2 65 1 1 active sync /dev/sdq1
11 8 161 2 active sync /dev/sdk1
17 8 113 3 spare rebuilding /dev/sdh1
10 8 145 4 active sync /dev/sdj1
9 8 177 5 active sync /dev/sdl1
8 8 241 6 active sync /dev/sdp1
18 8 81 7 spare rebuilding /dev/sdf1
12 8 193 8 active sync /dev/sdm1
13 8 209 9 active sync /dev/sdn1
14 8 97 - spare /dev/sdg1
15 8 65 - spare /dev/sde1
16 8 129 - spare /dev/sdi1
[-- Attachment #3: dmesg-assembled.txt --]
[-- Type: text/plain, Size: 4028 bytes --]
[ 763.062742] md: md0 stopped.
[ 763.070923] md: bind<sdq1>
[ 763.072029] md: bind<sdk1>
[ 763.072567] md: bind<sdh1>
[ 763.073404] md: bind<sdj1>
[ 763.074199] md: bind<sdl1>
[ 763.074752] md: bind<sdp1>
[ 763.075234] md: bind<sdf1>
[ 763.076543] md: bind<sdm1>
[ 763.077348] md: bind<sdn1>
[ 763.078020] md: bind<sdi1>
[ 763.078356] md: bind<sdg1>
[ 763.079611] md: bind<sde1>
[ 763.080159] md: bind<sdo1>
[ 763.082687] md/raid:md0: device sdo1 operational as raid disk 0
[ 763.082765] md/raid:md0: device sdn1 operational as raid disk 9
[ 763.082840] md/raid:md0: device sdm1 operational as raid disk 8
[ 763.082914] md/raid:md0: device sdp1 operational as raid disk 6
[ 763.082988] md/raid:md0: device sdl1 operational as raid disk 5
[ 763.083063] md/raid:md0: device sdj1 operational as raid disk 4
[ 763.083137] md/raid:md0: device sdk1 operational as raid disk 2
[ 763.083211] md/raid:md0: device sdq1 operational as raid disk 1
[ 763.084250] md/raid:md0: allocated 10592kB
[ 763.084522] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 763.084617] RAID conf printout:
[ 763.084685] --- level:6 rd:10 wd:8
[ 763.087189] disk 0, o:1, dev:sdo1
[ 763.087259] disk 1, o:1, dev:sdq1
[ 763.087328] disk 2, o:1, dev:sdk1
[ 763.087397] disk 3, o:1, dev:sdh1
[ 763.087465] disk 4, o:1, dev:sdj1
[ 763.087534] disk 5, o:1, dev:sdl1
[ 763.087603] disk 6, o:1, dev:sdp1
[ 763.087672] disk 7, o:1, dev:sdf1
[ 763.087741] disk 8, o:1, dev:sdm1
[ 763.087810] disk 9, o:1, dev:sdn1
[ 763.088062] created bitmap (15 pages) for device md0
[ 763.508492] md0: bitmap initialized from disk: read 1/1 pages, set 254 of 29809 bits
[ 763.953649] md0: detected capacity change from 0 to 16003169779712
[ 763.953810] RAID conf printout:
[ 763.953813] --- level:6 rd:10 wd:8
[ 763.953816] disk 0, o:1, dev:sdo1
[ 763.953818] disk 1, o:1, dev:sdq1
[ 763.953820] disk 2, o:1, dev:sdk1
[ 763.953822] disk 3, o:1, dev:sdh1
[ 763.953824] disk 4, o:1, dev:sdj1
[ 763.953826] disk 5, o:1, dev:sdl1
[ 763.953828] disk 6, o:1, dev:sdp1
[ 763.953830] disk 7, o:1, dev:sdf1
[ 763.953832] disk 8, o:1, dev:sdm1
[ 763.953834] disk 9, o:1, dev:sdn1
[ 763.953835] RAID conf printout:
[ 763.953837] --- level:6 rd:10 wd:8
[ 763.953838] disk 0, o:1, dev:sdo1
[ 763.953840] disk 1, o:1, dev:sdq1
[ 763.953843] disk 2, o:1, dev:sdk1
[ 763.953844] disk 3, o:1, dev:sdh1
[ 763.953846] disk 4, o:1, dev:sdj1
[ 763.953848] disk 5, o:1, dev:sdl1
[ 763.953849] disk 6, o:1, dev:sdp1
[ 763.953851] disk 7, o:1, dev:sdf1
[ 763.953853] disk 8, o:1, dev:sdm1
[ 763.953854] disk 9, o:1, dev:sdn1
[ 763.953856] RAID conf printout:
[ 763.953857] --- level:6 rd:10 wd:8
[ 763.953859] disk 0, o:1, dev:sdo1
[ 763.953861] disk 1, o:1, dev:sdq1
[ 763.953862] disk 2, o:1, dev:sdk1
[ 763.953864] disk 3, o:1, dev:sdh1
[ 763.953866] disk 4, o:1, dev:sdj1
[ 763.953867] disk 5, o:1, dev:sdl1
[ 763.953869] disk 6, o:1, dev:sdp1
[ 763.953871] disk 7, o:1, dev:sdf1
[ 763.953872] disk 8, o:1, dev:sdm1
[ 763.953874] disk 9, o:1, dev:sdn1
[ 763.953875] RAID conf printout:
[ 763.953876] --- level:6 rd:10 wd:8
[ 763.953878] disk 0, o:1, dev:sdo1
[ 763.953880] disk 1, o:1, dev:sdq1
[ 763.953882] disk 2, o:1, dev:sdk1
[ 763.953883] disk 3, o:1, dev:sdh1
[ 763.953885] disk 4, o:1, dev:sdj1
[ 763.953887] disk 5, o:1, dev:sdl1
[ 763.953888] disk 6, o:1, dev:sdp1
[ 763.953890] disk 7, o:1, dev:sdf1
[ 763.953892] disk 8, o:1, dev:sdm1
[ 763.953893] disk 9, o:1, dev:sdn1
[ 763.953895] RAID conf printout:
[ 763.953896] --- level:6 rd:10 wd:8
[ 763.953898] disk 0, o:1, dev:sdo1
[ 763.953900] disk 1, o:1, dev:sdq1
[ 763.953901] disk 2, o:1, dev:sdk1
[ 763.953903] disk 3, o:1, dev:sdh1
[ 763.953905] disk 4, o:1, dev:sdj1
[ 763.953906] disk 5, o:1, dev:sdl1
[ 763.953908] disk 6, o:1, dev:sdp1
[ 763.953910] disk 7, o:1, dev:sdf1
[ 763.953911] disk 8, o:1, dev:sdm1
[ 763.953913] disk 9, o:1, dev:sdn1
[-- Attachment #4: examine-sdf1.txt --]
[-- Type: text/plain, Size: 933 bytes --]
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x3
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 289089800 sectors
State : clean
Device UUID : 775affd2:7cedaf16:d29518a2:9ebda6ab
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 397ee09a - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #5: examine-sdh1.txt --]
[-- Type: text/plain, Size: 932 bytes --]
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x3
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 289089800 sectors
State : clean
Device UUID : 01c10562:b17a29e4:f554a3bd:bf9ad501
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 7b8709c - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #6: examine-sdj1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 46cf1470:7fba6892:5646483b:687df054
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 838b69a7 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #7: examine-sdk1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 8f58934b:e8c3b310:77280e93:c6dd7f1e
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : feaa3ed8 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #8: examine-sdl1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 7c742911:a4aa0b69:75767d03:d78f19aa
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 18a1418f - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #9: examine-sdm1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdm1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5da5f2c1:27900682:eed98123:d1dd7683
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : dbc70969 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #10: examine-sdn1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdn1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a3a9b4ad:cab2a5c4:fa4a83d2:6839a473
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : a956fd07 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #11: examine-sdo1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdo1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f17591b8:8012317d:ded224ad:8054aee9
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : bd6acbef - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #12: examine-sdp1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdp1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a0a2c206:5ce49b5c:f5a735f1:62352734
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 79908075 - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #13: examine-sdq1.txt --]
[-- Type: text/plain, Size: 897 bytes --]
/dev/sdq1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 35794893:8cbfdd41:36a60e33:2a6169dd
Name : delta.onse.fi:delta (local to host delta.onse.fi)
Creation Time : Fri Sep 3 21:45:45 2010
Raid Level : raid6
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 31256190976 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3cdd3e34:252a6077:8a1a5828:27528598
Internal Bitmap : 2 sectors from superblock
Update Time : Sun Dec 11 08:12:18 2011
Checksum : 5d51902e - correct
Events : 293646
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAA ('A' == active, '.' == missing)
[-- Attachment #14: proc_mdstat.txt --]
[-- Type: text/plain, Size: 435 bytes --]
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid6 sdo1[5] sde1[15](S) sdg1[14](S) sdi1[16](S) sdn1[13] sdm1[12] sdf1[18] sdp1[8] sdl1[9] sdj1[10] sdh1[17] sdk1[11] sdq1[2]
15628095488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/8] [UUU_UUU_UU]
bitmap: 7/15 pages [28KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
116170128 blocks super 1.2 [2/2] [UU]
unused devices: <none>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: raid6 rebuild not starting 2011-12-11 7:03 raid6 rebuild not starting Anssi Hannula @ 2011-12-12 3:01 ` NeilBrown 2011-12-12 5:22 ` Anssi Hannula 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2011-12-12 3:01 UTC (permalink / raw) To: Anssi Hannula; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 2745 bytes --] On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > Hi! > > After I rebooted during a raid6 rebuild, the rebuild didn't start again. > Instead, there is a flood of "RAID conf printout"s that seemingly happen > on array activity. > > All the devices show up properly in --detail and two devices are marked > as "spare rebuilding", and I can access the contents of the array just > fine, but the rebuild doesn't actually start. Is this a bug or am I > missing something? :) > > I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have > the same issue. mdadm is 3.1.5. > > I'm not using start_ro and writing to the array doesn't trigger a > rebuild either. > > Attached are --examine outputs before assembly, kernel log output on > assembly, /proc/mdstat and --detail after assembly (on 3.1.4). > Thank you for the very detailed problem report. Unfortunately it is a complete mystery to me what is happening. The repeated "RAID conf printout" messages are almost certainly coming from the end of raid5_remove_disk. It is being called from remove_and_add_spares for each of the two devices that are being rebuilt. raid5_remove_disk declines to remove them because it can keep rebuilding them. remove_and_add_spares then counts them and notes there are 2. md_check_recovery notes that this is > 0, so it should create a thread to run md_do_sync. md_do_sync should then print out a message like md: recovery of RAID array md0 but it doesn't. So something went wrong. There are three reasons that md_do_sync might not print a message: 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is unlikely, and in any case md_check_recovery clears it. 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 then we would be able to see that in /proc/mdstat as a "(readonly)" status. But we don't. 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does get set if kthread_should_stop() returns 'true', but that should only happen if kthread_stop() was called. That is only called by md_unregister_thread and I cannot see any way that could be call. So. No idea. Are you compiling these kernels yourself? If so, could you: - put a printk in the top of md_do_sync to report the values of mddev->recovery and mddev->ro - print a message whenever md_unregister_thread is called - in md_check_recovery, in the if (mddev->ro) { /* Only thing we do on a ro array is remove * failed devices. */ mdk_rdev_t *rdev; in statement, print the value of mddev->ro. Then see which of those printk's fire, and what they tell us. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 3:01 ` NeilBrown @ 2011-12-12 5:22 ` Anssi Hannula 2011-12-12 5:42 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Anssi Hannula @ 2011-12-12 5:22 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 3213 bytes --] On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> Hi! >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. >> Instead, there is a flood of "RAID conf printout"s that seemingly happen >> on array activity. >> >> All the devices show up properly in --detail and two devices are marked >> as "spare rebuilding", and I can access the contents of the array just >> fine, but the rebuild doesn't actually start. Is this a bug or am I >> missing something? :) >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have >> the same issue. mdadm is 3.1.5. >> >> I'm not using start_ro and writing to the array doesn't trigger a >> rebuild either. >> >> Attached are --examine outputs before assembly, kernel log output on >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). >> > > Thank you for the very detailed problem report. Thanks for the quick response :) > Unfortunately it is a complete mystery to me what is happening. > > The repeated "RAID conf printout" messages are almost certainly coming from > the end of raid5_remove_disk. > It is being called from remove_and_add_spares for each of the two devices > that are being rebuilt. raid5_remove_disk declines to remove them because it > can keep rebuilding them. > > remove_and_add_spares then counts them and notes there are 2. > md_check_recovery notes that this is > 0, so it should create a thread to run > md_do_sync. > > md_do_sync should then print out a message like > md: recovery of RAID array md0 > > but it doesn't. So something went wrong. > There are three reasons that md_do_sync might not print a message: > > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is > unlikely, and in any case md_check_recovery clears it. > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 > then we would be able to see that in /proc/mdstat as a "(readonly)" > status. But we don't. > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does > get set if kthread_should_stop() returns 'true', but that should only > happen if kthread_stop() was called. That is only called by > md_unregister_thread and I cannot see any way that could be call. > > So. No idea. > > Are you compiling these kernels yourself? Nope (used Mageia kernels), but I did now (3.1.5). > If so, could you: > - put a printk in the top of md_do_sync to report the values of > mddev->recovery and mddev->ro > - print a message whenever md_unregister_thread is called > - in md_check_recovery, in the > if (mddev->ro) { > /* Only thing we do on a ro array is remove > * failed devices. > */ > mdk_rdev_t *rdev; > > in statement, print the value of mddev->ro. > > Then see which of those printk's fire, and what they tell us. Only the last one does, and mddev->ro == 0. For reference, attached is the used patch and resulting log output. -- Anssi Hannula [-- Attachment #2: dmesg-dbg.txt --] [-- Type: text/plain, Size: 7847 bytes --] [ 46.356559] md: md0 stopped. [ 46.363773] md: bind<sdq1> [ 46.363920] md: bind<sdk1> [ 46.364035] md: bind<sdh1> [ 46.364170] md: bind<sdj1> [ 46.364567] md: bind<sdl1> [ 46.364972] md: bind<sdp1> [ 46.365184] md: bind<sdf1> [ 46.365603] md: bind<sdm1> [ 46.366025] md: bind<sdn1> [ 46.366219] md: bind<sdi1> [ 46.366464] md: bind<sdg1> [ 46.366751] md: bind<sde1> [ 46.367222] md: bind<sdo1> [ 46.374687] async_tx: api initialized (async) [ 46.547228] raid6: int64x1 2723 MB/s [ 46.717043] raid6: int64x2 3533 MB/s [ 46.886928] raid6: int64x4 2398 MB/s [ 47.056795] raid6: int64x8 2482 MB/s [ 47.226617] raid6: sse2x1 7368 MB/s [ 47.396465] raid6: sse2x2 8686 MB/s [ 47.566333] raid6: sse2x4 9614 MB/s [ 47.566336] raid6: using algorithm sse2x4 (9614 MB/s) [ 47.573931] xor: automatically using best checksumming function: generic_sse [ 47.616567] generic_sse: 12849.600 MB/sec [ 47.616569] xor: using function: generic_sse (12849.600 MB/sec) [ 47.625852] md: raid6 personality registered for level 6 [ 47.625854] md: raid5 personality registered for level 5 [ 47.625855] md: raid4 personality registered for level 4 [ 47.626275] md/raid:md0: device sdo1 operational as raid disk 0 [ 47.626277] md/raid:md0: device sdn1 operational as raid disk 9 [ 47.626279] md/raid:md0: device sdm1 operational as raid disk 8 [ 47.626280] md/raid:md0: device sdp1 operational as raid disk 6 [ 47.626281] md/raid:md0: device sdl1 operational as raid disk 5 [ 47.626283] md/raid:md0: device sdj1 operational as raid disk 4 [ 47.626284] md/raid:md0: device sdk1 operational as raid disk 2 [ 47.626285] md/raid:md0: device sdq1 operational as raid disk 1 [ 47.626899] md/raid:md0: allocated 10592kB [ 47.627049] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 [ 47.627150] RAID conf printout: [ 47.627152] --- level:6 rd:10 wd:8 [ 47.627155] disk 0, o:1, dev:sdo1 [ 47.627157] disk 1, o:1, dev:sdq1 [ 47.627163] disk 2, o:1, dev:sdk1 [ 47.627165] disk 3, o:1, dev:sdh1 [ 47.627167] disk 4, o:1, dev:sdj1 [ 47.627169] disk 5, o:1, dev:sdl1 [ 47.627171] disk 6, o:1, dev:sdp1 [ 47.627173] disk 7, o:1, dev:sdf1 [ 47.627175] disk 8, o:1, dev:sdm1 [ 47.627177] disk 9, o:1, dev:sdn1 [ 47.627337] created bitmap (15 pages) for device md0 [ 48.028349] md0: bitmap initialized from disk: read 1/1 pages, set 257 of 29809 bits [ 48.476099] md0: detected capacity change from 0 to 16003169779712 [ 48.476739] md0: in md_check_recovery(), mddev->ro = 0 [ 48.476742] RAID conf printout: [ 48.476744] --- level:6 rd:10 wd:8 [ 48.476746] disk 0, o:1, dev:sdo1 [ 48.476748] disk 1, o:1, dev:sdq1 [ 48.476749] disk 2, o:1, dev:sdk1 [ 48.476751] disk 3, o:1, dev:sdh1 [ 48.476753] disk 4, o:1, dev:sdj1 [ 48.476754] disk 5, o:1, dev:sdl1 [ 48.476756] disk 6, o:1, dev:sdp1 [ 48.476757] disk 7, o:1, dev:sdf1 [ 48.476758] disk 8, o:1, dev:sdm1 [ 48.476760] disk 9, o:1, dev:sdn1 [ 48.476761] RAID conf printout: [ 48.476762] --- level:6 rd:10 wd:8 [ 48.476763] disk 0, o:1, dev:sdo1 [ 48.476764] disk 1, o:1, dev:sdq1 [ 48.476766] disk 2, o:1, dev:sdk1 [ 48.476767] disk 3, o:1, dev:sdh1 [ 48.476768] disk 4, o:1, dev:sdj1 [ 48.476770] disk 5, o:1, dev:sdl1 [ 48.476771] disk 6, o:1, dev:sdp1 [ 48.476772] disk 7, o:1, dev:sdf1 [ 48.476773] disk 8, o:1, dev:sdm1 [ 48.476775] disk 9, o:1, dev:sdn1 [ 48.476775] RAID conf printout: [ 48.476777] --- level:6 rd:10 wd:8 [ 48.476778] disk 0, o:1, dev:sdo1 [ 48.476779] disk 1, o:1, dev:sdq1 [ 48.476780] disk 2, o:1, dev:sdk1 [ 48.476781] disk 3, o:1, dev:sdh1 [ 48.476783] disk 4, o:1, dev:sdj1 [ 48.476784] disk 5, o:1, dev:sdl1 [ 48.476785] disk 6, o:1, dev:sdp1 [ 48.476786] disk 7, o:1, dev:sdf1 [ 48.476788] disk 8, o:1, dev:sdm1 [ 48.476789] disk 9, o:1, dev:sdn1 [ 48.476790] RAID conf printout: [ 48.476791] --- level:6 rd:10 wd:8 [ 48.476792] disk 0, o:1, dev:sdo1 [ 48.476793] disk 1, o:1, dev:sdq1 [ 48.476795] disk 2, o:1, dev:sdk1 [ 48.476796] disk 3, o:1, dev:sdh1 [ 48.476798] disk 4, o:1, dev:sdj1 [ 48.476799] disk 5, o:1, dev:sdl1 [ 48.476800] disk 6, o:1, dev:sdp1 [ 48.476802] disk 7, o:1, dev:sdf1 [ 48.476803] disk 8, o:1, dev:sdm1 [ 48.476804] disk 9, o:1, dev:sdn1 [ 48.476805] RAID conf printout: [ 48.476806] --- level:6 rd:10 wd:8 [ 48.476807] disk 0, o:1, dev:sdo1 [ 48.476809] disk 1, o:1, dev:sdq1 [ 48.476810] disk 2, o:1, dev:sdk1 [ 48.476811] disk 3, o:1, dev:sdh1 [ 48.476812] disk 4, o:1, dev:sdj1 [ 48.476813] disk 5, o:1, dev:sdl1 [ 48.476815] disk 6, o:1, dev:sdp1 [ 48.476816] disk 7, o:1, dev:sdf1 [ 48.476817] disk 8, o:1, dev:sdm1 [ 48.476819] disk 9, o:1, dev:sdn1 [ 48.488210] md0: unknown partition table [ 49.328595] md1: in md_check_recovery(), mddev->ro = 0 [ 49.329692] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 49.534829] md1: in md_check_recovery(), mddev->ro = 0 [ 49.792196] md0: in md_check_recovery(), mddev->ro = 0 [ 50.349944] RAID conf printout: [ 50.349949] --- level:6 rd:10 wd:8 [ 50.349952] disk 0, o:1, dev:sdo1 [ 50.349955] disk 1, o:1, dev:sdq1 [ 50.349957] disk 2, o:1, dev:sdk1 [ 50.349959] disk 3, o:1, dev:sdh1 [ 50.349961] disk 4, o:1, dev:sdj1 [ 50.349964] disk 5, o:1, dev:sdl1 [ 50.349966] disk 6, o:1, dev:sdp1 [ 50.349968] disk 7, o:1, dev:sdf1 [ 50.349970] disk 8, o:1, dev:sdm1 [ 50.349972] disk 9, o:1, dev:sdn1 [ 50.349974] RAID conf printout: [ 50.349976] --- level:6 rd:10 wd:8 [ 50.349978] disk 0, o:1, dev:sdo1 [ 50.349980] disk 1, o:1, dev:sdq1 [ 50.349982] disk 2, o:1, dev:sdk1 [ 50.349984] disk 3, o:1, dev:sdh1 [ 50.349986] disk 4, o:1, dev:sdj1 [ 50.349988] disk 5, o:1, dev:sdl1 [ 50.349990] disk 6, o:1, dev:sdp1 [ 50.349992] disk 7, o:1, dev:sdf1 [ 50.349994] disk 8, o:1, dev:sdm1 [ 50.349996] disk 9, o:1, dev:sdn1 [ 50.349997] RAID conf printout: [ 50.349999] --- level:6 rd:10 wd:8 [ 50.350001] disk 0, o:1, dev:sdo1 [ 50.350003] disk 1, o:1, dev:sdq1 [ 50.350005] disk 2, o:1, dev:sdk1 [ 50.350007] disk 3, o:1, dev:sdh1 [ 50.350009] disk 4, o:1, dev:sdj1 [ 50.350011] disk 5, o:1, dev:sdl1 [ 50.350013] disk 6, o:1, dev:sdp1 [ 50.350015] disk 7, o:1, dev:sdf1 [ 50.350017] disk 8, o:1, dev:sdm1 [ 50.350019] disk 9, o:1, dev:sdn1 [ 50.350020] RAID conf printout: [ 50.350022] --- level:6 rd:10 wd:8 [ 50.350024] disk 0, o:1, dev:sdo1 [ 50.350026] disk 1, o:1, dev:sdq1 [ 50.350028] disk 2, o:1, dev:sdk1 [ 50.350030] disk 3, o:1, dev:sdh1 [ 50.350032] disk 4, o:1, dev:sdj1 [ 50.350034] disk 5, o:1, dev:sdl1 [ 50.350036] disk 6, o:1, dev:sdp1 [ 50.350038] disk 7, o:1, dev:sdf1 [ 50.350040] disk 8, o:1, dev:sdm1 [ 50.350042] disk 9, o:1, dev:sdn1 [ 50.350043] RAID conf printout: [ 50.350045] --- level:6 rd:10 wd:8 [ 50.350047] disk 0, o:1, dev:sdo1 [ 50.350049] disk 1, o:1, dev:sdq1 [ 50.350051] disk 2, o:1, dev:sdk1 [ 50.350053] disk 3, o:1, dev:sdh1 [ 50.350055] disk 4, o:1, dev:sdj1 [ 50.350057] disk 5, o:1, dev:sdl1 [ 50.350059] disk 6, o:1, dev:sdp1 [ 50.350061] disk 7, o:1, dev:sdf1 [ 50.350063] disk 8, o:1, dev:sdm1 [ 50.350065] disk 9, o:1, dev:sdn1 [ 50.941625] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null) [ 50.966825] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null) [ 50.979025] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null) [ 50.991736] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) [ 51.006856] loop: module loaded [ 51.033690] md1: in md_check_recovery(), mddev->ro = 0 [-- Attachment #3: dbg.patch --] [-- Type: text/x-patch, Size: 1013 bytes --] --- md-orig/md.c 2011-10-24 10:10:05.000000000 +0300 +++ 3.1.4-MD/md.c 2011-12-12 06:26:43.388401596 +0200 @@ -6442,6 +6442,7 @@ mdk_thread_t *md_register_thread(void (* void md_unregister_thread(mdk_thread_t **threadp) { mdk_thread_t *thread = *threadp; + printk(KERN_INFO "md_unregister_thread()\n"); if (!thread) return; dprintk("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk)); @@ -6996,6 +6997,8 @@ void md_do_sync(mddev_t *mddev) mdk_rdev_t *rdev; char *desc; + printk(KERN_INFO "%s: md_do_sync(), recovery=0x%lx, ro=%d\n", + mdname(mddev), mddev->recovery, mddev->ro); /* just incase thread restarts... */ if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) return; @@ -7445,6 +7452,8 @@ void md_check_recovery(mddev_t *mddev) if (mddev_trylock(mddev)) { int spares = 0; + printk(KERN_INFO "%s: in md_check_recovery(), mddev->ro = %d\n", + mdname(mddev), mddev->ro); if (mddev->ro) { /* Only thing we do on a ro array is remove * failed devices. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 5:22 ` Anssi Hannula @ 2011-12-12 5:42 ` NeilBrown 2011-12-12 6:02 ` Anssi Hannula 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2011-12-12 5:42 UTC (permalink / raw) To: Anssi Hannula; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 3790 bytes --] On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: > > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > > > >> Hi! > >> > >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. > >> Instead, there is a flood of "RAID conf printout"s that seemingly happen > >> on array activity. > >> > >> All the devices show up properly in --detail and two devices are marked > >> as "spare rebuilding", and I can access the contents of the array just > >> fine, but the rebuild doesn't actually start. Is this a bug or am I > >> missing something? :) > >> > >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have > >> the same issue. mdadm is 3.1.5. > >> > >> I'm not using start_ro and writing to the array doesn't trigger a > >> rebuild either. > >> > >> Attached are --examine outputs before assembly, kernel log output on > >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). > >> > > > > Thank you for the very detailed problem report. > > Thanks for the quick response :) > > > Unfortunately it is a complete mystery to me what is happening. > > > > The repeated "RAID conf printout" messages are almost certainly coming from > > the end of raid5_remove_disk. > > It is being called from remove_and_add_spares for each of the two devices > > that are being rebuilt. raid5_remove_disk declines to remove them because it > > can keep rebuilding them. > > > > remove_and_add_spares then counts them and notes there are 2. > > md_check_recovery notes that this is > 0, so it should create a thread to run > > md_do_sync. > > > > md_do_sync should then print out a message like > > md: recovery of RAID array md0 > > > > but it doesn't. So something went wrong. > > There are three reasons that md_do_sync might not print a message: > > > > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is > > unlikely, and in any case md_check_recovery clears it. > > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 > > then we would be able to see that in /proc/mdstat as a "(readonly)" > > status. But we don't. > > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does > > get set if kthread_should_stop() returns 'true', but that should only > > happen if kthread_stop() was called. That is only called by > > md_unregister_thread and I cannot see any way that could be call. > > > > So. No idea. > > > > Are you compiling these kernels yourself? > > Nope (used Mageia kernels), but I did now (3.1.5). > > > If so, could you: > > - put a printk in the top of md_do_sync to report the values of > > mddev->recovery and mddev->ro > > - print a message whenever md_unregister_thread is called > > - in md_check_recovery, in the > > if (mddev->ro) { > > /* Only thing we do on a ro array is remove > > * failed devices. > > */ > > mdk_rdev_t *rdev; > > > > in statement, print the value of mddev->ro. > > > > Then see which of those printk's fire, and what they tell us. > > Only the last one does, and mddev->ro == 0. > > For reference, attached is the used patch and resulting log output. > Thanks. So it isn't running md_do_sync at all. Odd. Could please add: - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can see who is calling it. - print the value that remove_and_add_spares is going to return. Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 5:42 ` NeilBrown @ 2011-12-12 6:02 ` Anssi Hannula 2011-12-12 6:24 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Anssi Hannula @ 2011-12-12 6:02 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 3928 bytes --] On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote: > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> > >> >> Hi! >> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen >> >> on array activity. >> >> >> >> All the devices show up properly in --detail and two devices are marked >> >> as "spare rebuilding", and I can access the contents of the array just >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I >> >> missing something? :) >> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have >> >> the same issue. mdadm is 3.1.5. >> >> >> >> I'm not using start_ro and writing to the array doesn't trigger a >> >> rebuild either. >> >> >> >> Attached are --examine outputs before assembly, kernel log output on >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). >> >> >> > >> > Thank you for the very detailed problem report. >> >> Thanks for the quick response :) >> >> > Unfortunately it is a complete mystery to me what is happening. >> > >> > The repeated "RAID conf printout" messages are almost certainly coming from >> > the end of raid5_remove_disk. >> > It is being called from remove_and_add_spares for each of the two devices >> > that are being rebuilt. raid5_remove_disk declines to remove them because it >> > can keep rebuilding them. >> > >> > remove_and_add_spares then counts them and notes there are 2. >> > md_check_recovery notes that this is > 0, so it should create a thread to run >> > md_do_sync. >> > >> > md_do_sync should then print out a message like >> > md: recovery of RAID array md0 >> > >> > but it doesn't. So something went wrong. >> > There are three reasons that md_do_sync might not print a message: >> > >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is >> > unlikely, and in any case md_check_recovery clears it. >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 >> > then we would be able to see that in /proc/mdstat as a "(readonly)" >> > status. But we don't. >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does >> > get set if kthread_should_stop() returns 'true', but that should only >> > happen if kthread_stop() was called. That is only called by >> > md_unregister_thread and I cannot see any way that could be call. >> > >> > So. No idea. >> > >> > Are you compiling these kernels yourself? >> >> Nope (used Mageia kernels), but I did now (3.1.5). >> >> > If so, could you: >> > - put a printk in the top of md_do_sync to report the values of >> > mddev->recovery and mddev->ro >> > - print a message whenever md_unregister_thread is called >> > - in md_check_recovery, in the >> > if (mddev->ro) { >> > /* Only thing we do on a ro array is remove >> > * failed devices. >> > */ >> > mdk_rdev_t *rdev; >> > >> > in statement, print the value of mddev->ro. >> > >> > Then see which of those printk's fire, and what they tell us. >> >> Only the last one does, and mddev->ro == 0. >> >> For reference, attached is the used patch and resulting log output. >> > > Thanks. > > So it isn't running md_do_sync at all. Odd. > > Could please add: > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can > see who is calling it. > - print the value that remove_and_add_spares is going to return. Attached. As you can see, remove_and_add_spare returns 0. -- Anssi Hannula [-- Attachment #2: dmesg-dbg2.txt --] [-- Type: text/plain, Size: 20695 bytes --] [ 44.270890] md: md0 stopped. [ 44.276951] md: bind<sdq1> [ 44.277373] md: bind<sdo1> [ 44.277504] md: bind<sdh1> [ 44.278088] md: bind<sdk1> [ 44.278816] md: bind<sdp1> [ 44.282215] md: bind<sdn1> [ 44.282377] md: bind<sdf1> [ 44.282558] md: bind<sdm1> [ 44.282741] md: bind<sdl1> [ 44.282948] md: bind<sdi1> [ 44.283085] md: bind<sdg1> [ 44.283249] md: bind<sdd1> [ 44.283449] md: bind<sdj1> [ 44.288946] async_tx: api initialized (async) [ 44.465572] raid6: int64x1 2753 MB/s [ 44.637492] raid6: int64x2 3533 MB/s [ 44.805245] raid6: int64x4 2396 MB/s [ 44.975134] raid6: int64x8 2491 MB/s [ 45.146107] raid6: sse2x1 7311 MB/s [ 45.314810] raid6: sse2x2 8670 MB/s [ 45.484662] raid6: sse2x4 9630 MB/s [ 45.484665] raid6: using algorithm sse2x4 (9630 MB/s) [ 45.491202] xor: automatically using best checksumming function: generic_sse [ 45.534618] generic_sse: 12819.200 MB/sec [ 45.534623] xor: using function: generic_sse (12819.200 MB/sec) [ 45.543401] md: raid6 personality registered for level 6 [ 45.543402] md: raid5 personality registered for level 5 [ 45.543403] md: raid4 personality registered for level 4 [ 45.543961] md/raid:md0: device sdj1 operational as raid disk 0 [ 45.543963] md/raid:md0: device sdl1 operational as raid disk 9 [ 45.543964] md/raid:md0: device sdm1 operational as raid disk 8 [ 45.543966] md/raid:md0: device sdn1 operational as raid disk 6 [ 45.543967] md/raid:md0: device sdp1 operational as raid disk 5 [ 45.543968] md/raid:md0: device sdk1 operational as raid disk 4 [ 45.543969] md/raid:md0: device sdo1 operational as raid disk 2 [ 45.543970] md/raid:md0: device sdq1 operational as raid disk 1 [ 45.544667] md/raid:md0: allocated 10592kB [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 [ 45.544907] ------------[ cut here ]------------ [ 45.544913] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 45.544920] Hardware name: System Product Name [ 45.544922] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 45.544994] Pid: 848, comm: mdadm Tainted: G C 3.1.5 #5 [ 45.544995] Call Trace: [ 45.545000] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 45.545002] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 45.545004] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 45.545007] [<ffffffffa067e2ba>] run+0x2aa/0x750 [raid456] [ 45.545010] [<ffffffff811908a1>] ? bioset_create+0x1c1/0x280 [ 45.545012] [<ffffffff8133f3eb>] md_run+0x36b/0x8a0 [ 45.545014] [<ffffffff8134060e>] ? add_new_disk+0x2ee/0x550 [ 45.545016] [<ffffffff8133f93e>] do_md_run+0x1e/0xd0 [ 45.545017] [<ffffffff81341223>] md_ioctl+0x9b3/0x1230 [ 45.545021] [<ffffffff8121d260>] blkdev_ioctl+0x230/0x740 [ 45.545023] [<ffffffff81191a31>] block_ioctl+0x41/0x50 [ 45.545026] [<ffffffff8116f867>] do_vfs_ioctl+0x97/0x4f0 [ 45.545028] [<ffffffff8116fd51>] sys_ioctl+0x91/0xa0 [ 45.545031] [<ffffffff8143cd92>] system_call_fastpath+0x16/0x1b [ 45.545032] ---[ end trace 8bf0f462f72ecc2c ]--- [ 45.545033] RAID conf printout: [ 45.545034] --- level:6 rd:10 wd:8 [ 45.545035] disk 0, o:1, dev:sdj1 [ 45.545036] disk 1, o:1, dev:sdq1 [ 45.545037] disk 2, o:1, dev:sdo1 [ 45.545038] disk 3, o:1, dev:sdh1 [ 45.545039] disk 4, o:1, dev:sdk1 [ 45.545040] disk 5, o:1, dev:sdp1 [ 45.545041] disk 6, o:1, dev:sdn1 [ 45.545042] disk 7, o:1, dev:sdf1 [ 45.545043] disk 8, o:1, dev:sdm1 [ 45.545044] disk 9, o:1, dev:sdl1 [ 45.545192] created bitmap (15 pages) for device md0 [ 45.967725] md0: bitmap initialized from disk: read 1/1 pages, set 258 of 29809 bits [ 46.420406] md0: detected capacity change from 0 to 16003169779712 [ 46.421264] md0: in md_check_recovery(), mddev->ro = 0 [ 46.421270] ------------[ cut here ]------------ [ 46.421279] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.421283] Hardware name: System Product Name [ 46.421285] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.421387] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5 [ 46.421389] Call Trace: [ 46.421396] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.421398] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.421402] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.421405] [<ffffffffa0675ff7>] raid5_remove_disk+0x47/0x120 [raid456] [ 46.421408] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690 [ 46.421412] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456] [ 46.421415] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 46.421419] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0 [ 46.421422] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 46.421426] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 46.421430] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 46.421433] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 46.421435] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 46.421439] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10 [ 46.421442] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 46.421444] [<ffffffff8143eef0>] ? gs_change+0x13/0x13 [ 46.421446] ---[ end trace 8bf0f462f72ecc2d ]--- [ 46.421447] RAID conf printout: [ 46.421449] --- level:6 rd:10 wd:8 [ 46.421451] disk 0, o:1, dev:sdj1 [ 46.421453] disk 1, o:1, dev:sdq1 [ 46.421454] disk 2, o:1, dev:sdo1 [ 46.421456] disk 3, o:1, dev:sdh1 [ 46.421457] disk 4, o:1, dev:sdk1 [ 46.421459] disk 5, o:1, dev:sdp1 [ 46.421461] disk 6, o:1, dev:sdn1 [ 46.421463] disk 7, o:1, dev:sdf1 [ 46.421464] disk 8, o:1, dev:sdm1 [ 46.421466] disk 9, o:1, dev:sdl1 [ 46.421467] ------------[ cut here ]------------ [ 46.421471] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.421473] Hardware name: System Product Name [ 46.421474] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.421531] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5 [ 46.421532] Call Trace: [ 46.421535] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.421537] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.421540] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.421543] [<ffffffffa067600c>] raid5_remove_disk+0x5c/0x120 [raid456] [ 46.421546] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690 [ 46.421549] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456] [ 46.421551] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 46.421554] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0 [ 46.421559] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 46.421562] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 46.421564] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 46.421567] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 46.421569] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 46.421572] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10 [ 46.421574] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 46.421577] [<ffffffff8143eef0>] ? gs_change+0x13/0x13 [ 46.421578] ---[ end trace 8bf0f462f72ecc2e ]--- [ 46.421579] RAID conf printout: [ 46.421581] --- level:6 rd:10 wd:8 [ 46.421582] disk 0, o:1, dev:sdj1 [ 46.421583] disk 1, o:1, dev:sdq1 [ 46.421585] disk 2, o:1, dev:sdo1 [ 46.421586] disk 3, o:1, dev:sdh1 [ 46.421588] disk 4, o:1, dev:sdk1 [ 46.421589] disk 5, o:1, dev:sdp1 [ 46.421590] disk 6, o:1, dev:sdn1 [ 46.421592] disk 7, o:1, dev:sdf1 [ 46.421593] disk 8, o:1, dev:sdm1 [ 46.421594] disk 9, o:1, dev:sdl1 [ 46.421596] ------------[ cut here ]------------ [ 46.421598] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.421600] Hardware name: System Product Name [ 46.421601] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.421653] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5 [ 46.421655] Call Trace: [ 46.421657] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.421659] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.421662] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.421665] [<ffffffffa0675ff7>] raid5_remove_disk+0x47/0x120 [raid456] [ 46.421668] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690 [ 46.421671] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456] [ 46.421673] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 46.421676] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0 [ 46.421678] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 46.421681] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 46.421686] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 46.421688] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 46.421691] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 46.421693] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10 [ 46.421695] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 46.421698] [<ffffffff8143eef0>] ? gs_change+0x13/0x13 [ 46.421699] ---[ end trace 8bf0f462f72ecc2f ]--- [ 46.421700] RAID conf printout: [ 46.421701] --- level:6 rd:10 wd:8 [ 46.421703] disk 0, o:1, dev:sdj1 [ 46.421704] disk 1, o:1, dev:sdq1 [ 46.421705] disk 2, o:1, dev:sdo1 [ 46.421707] disk 3, o:1, dev:sdh1 [ 46.421708] disk 4, o:1, dev:sdk1 [ 46.421709] disk 5, o:1, dev:sdp1 [ 46.421711] disk 6, o:1, dev:sdn1 [ 46.421712] disk 7, o:1, dev:sdf1 [ 46.421713] disk 8, o:1, dev:sdm1 [ 46.421715] disk 9, o:1, dev:sdl1 [ 46.421716] ------------[ cut here ]------------ [ 46.421718] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.421720] Hardware name: System Product Name [ 46.421721] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.421771] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5 [ 46.421772] Call Trace: [ 46.421774] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.421776] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.421779] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.421782] [<ffffffffa067600c>] raid5_remove_disk+0x5c/0x120 [raid456] [ 46.421784] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x690 [ 46.421788] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456] [ 46.421790] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 46.421792] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0 [ 46.421795] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 46.421797] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 46.421799] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 46.421802] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 46.421804] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 46.421806] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10 [ 46.421809] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 46.421814] [<ffffffff8143eef0>] ? gs_change+0x13/0x13 [ 46.421815] ---[ end trace 8bf0f462f72ecc30 ]--- [ 46.421816] RAID conf printout: [ 46.421818] --- level:6 rd:10 wd:8 [ 46.421819] disk 0, o:1, dev:sdj1 [ 46.421821] disk 1, o:1, dev:sdq1 [ 46.421822] disk 2, o:1, dev:sdo1 [ 46.421823] disk 3, o:1, dev:sdh1 [ 46.421825] disk 4, o:1, dev:sdk1 [ 46.421826] disk 5, o:1, dev:sdp1 [ 46.421827] disk 6, o:1, dev:sdn1 [ 46.421829] disk 7, o:1, dev:sdf1 [ 46.421830] disk 8, o:1, dev:sdm1 [ 46.421831] disk 9, o:1, dev:sdl1 [ 46.421833] ------------[ cut here ]------------ [ 46.421835] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.421837] Hardware name: System Product Name [ 46.421838] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd joydev ir_lirc_codec rc_imon_pad lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder ir_rc5_decoder imon ir_nec_decoder usb_storage rc_core snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i2c_i801 drxk snd_timer snd_mixer_oss e1000e ddbridge evdev mxm_wmi snd dvb_core firewire_ohci sr_mod serio_raw iTCO_wdt cxd2099(C) firewire_core soundcore iTCO_vendor_support i2c_core i7core_edac crc_itu_t snd_page_alloc edac_core asus_atk0110 wmi processor button sg xhci_hcd ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.421890] Pid: 912, comm: md0_raid6 Tainted: G WC 3.1.5 #5 [ 46.421891] Call Trace: [ 46.421893] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.421896] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.421899] [<ffffffffa0675607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.421902] [<ffffffffa06757b4>] raid5_add_disk+0xd4/0x120 [raid456] [ 46.421904] [<ffffffff8133dded>] md_check_recovery+0x46d/0x690 [ 46.421907] [<ffffffffa067be63>] raid5d+0x43/0x780 [raid456] [ 46.421910] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 46.421912] [<ffffffff81433057>] ? schedule_timeout+0x177/0x2e0 [ 46.421915] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 46.421917] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 46.421920] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 46.421922] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 46.421924] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 46.421927] [<ffffffff8143eef4>] kernel_thread_helper+0x4/0x10 [ 46.421929] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 46.421932] [<ffffffff8143eef0>] ? gs_change+0x13/0x13 [ 46.421933] ---[ end trace 8bf0f462f72ecc31 ]--- [ 46.421934] RAID conf printout: [ 46.421935] --- level:6 rd:10 wd:8 [ 46.421937] disk 0, o:1, dev:sdj1 [ 46.421938] disk 1, o:1, dev:sdq1 [ 46.421940] disk 2, o:1, dev:sdo1 [ 46.421941] disk 3, o:1, dev:sdh1 [ 46.421942] disk 4, o:1, dev:sdk1 [ 46.421944] disk 5, o:1, dev:sdp1 [ 46.421945] disk 6, o:1, dev:sdn1 [ 46.421946] disk 7, o:1, dev:sdf1 [ 46.421948] disk 8, o:1, dev:sdm1 [ 46.421949] disk 9, o:1, dev:sdl1 [ 46.421950] md0: remove_and_add_spares will return 0 [ 46.434878] md0: unknown partition table ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 6:02 ` Anssi Hannula @ 2011-12-12 6:24 ` NeilBrown 2011-12-12 6:42 ` Anssi Hannula 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2011-12-12 6:24 UTC (permalink / raw) To: Anssi Hannula; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 5307 bytes --] On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote: > > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > > > >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: > >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> > > >> >> Hi! > >> >> > >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. > >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen > >> >> on array activity. > >> >> > >> >> All the devices show up properly in --detail and two devices are marked > >> >> as "spare rebuilding", and I can access the contents of the array just > >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I > >> >> missing something? :) > >> >> > >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have > >> >> the same issue. mdadm is 3.1.5. > >> >> > >> >> I'm not using start_ro and writing to the array doesn't trigger a > >> >> rebuild either. > >> >> > >> >> Attached are --examine outputs before assembly, kernel log output on > >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). > >> >> > >> > > >> > Thank you for the very detailed problem report. > >> > >> Thanks for the quick response :) > >> > >> > Unfortunately it is a complete mystery to me what is happening. > >> > > >> > The repeated "RAID conf printout" messages are almost certainly coming from > >> > the end of raid5_remove_disk. > >> > It is being called from remove_and_add_spares for each of the two devices > >> > that are being rebuilt. raid5_remove_disk declines to remove them because it > >> > can keep rebuilding them. > >> > > >> > remove_and_add_spares then counts them and notes there are 2. > >> > md_check_recovery notes that this is > 0, so it should create a thread to run > >> > md_do_sync. > >> > > >> > md_do_sync should then print out a message like > >> > md: recovery of RAID array md0 > >> > > >> > but it doesn't. So something went wrong. > >> > There are three reasons that md_do_sync might not print a message: > >> > > >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is > >> > unlikely, and in any case md_check_recovery clears it. > >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 > >> > then we would be able to see that in /proc/mdstat as a "(readonly)" > >> > status. But we don't. > >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does > >> > get set if kthread_should_stop() returns 'true', but that should only > >> > happen if kthread_stop() was called. That is only called by > >> > md_unregister_thread and I cannot see any way that could be call. > >> > > >> > So. No idea. > >> > > >> > Are you compiling these kernels yourself? > >> > >> Nope (used Mageia kernels), but I did now (3.1.5). > >> > >> > If so, could you: > >> > - put a printk in the top of md_do_sync to report the values of > >> > mddev->recovery and mddev->ro > >> > - print a message whenever md_unregister_thread is called > >> > - in md_check_recovery, in the > >> > if (mddev->ro) { > >> > /* Only thing we do on a ro array is remove > >> > * failed devices. > >> > */ > >> > mdk_rdev_t *rdev; > >> > > >> > in statement, print the value of mddev->ro. > >> > > >> > Then see which of those printk's fire, and what they tell us. > >> > >> Only the last one does, and mddev->ro == 0. > >> > >> For reference, attached is the used patch and resulting log output. > >> > > > > Thanks. > > > > So it isn't running md_do_sync at all. Odd. > > > > Could please add: > > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can > > see who is calling it. > > - print the value that remove_and_add_spares is going to return. > > Attached. As you can see, remove_and_add_spare returns 0. > > -- > Anssi Hannula Please add: diff --git a/drivers/md/md.c b/drivers/md/md.c index 5c95ccb..fa56ac5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev) } } + printk("degraded=%d\n", mddev->degraded); if (mddev->degraded) { list_for_each_entry(rdev, &mddev->disks, same_set) { + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags); if (rdev->raid_disk >= 0 && !test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) 'degraded' must be 2 as dmesg contains [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 and 'degraded' is exactly the difference between '8' and '10' there. raid disks 3 and 7 must have In_sync and Faulty clear as both of them just show "spare rebuilding" in the 'detail' output. so remove_and_add_spares "must" return 2. Hopefully the above patch will help me understand which of those is wrong. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 6:24 ` NeilBrown @ 2011-12-12 6:42 ` Anssi Hannula 2011-12-12 7:10 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Anssi Hannula @ 2011-12-12 6:42 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 5748 bytes --] On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote: > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote: >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> > >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> >> > >> >> >> Hi! >> >> >> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. >> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen >> >> >> on array activity. >> >> >> >> >> >> All the devices show up properly in --detail and two devices are marked >> >> >> as "spare rebuilding", and I can access the contents of the array just >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I >> >> >> missing something? :) >> >> >> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have >> >> >> the same issue. mdadm is 3.1.5. >> >> >> >> >> >> I'm not using start_ro and writing to the array doesn't trigger a >> >> >> rebuild either. >> >> >> >> >> >> Attached are --examine outputs before assembly, kernel log output on >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). >> >> >> >> >> > >> >> > Thank you for the very detailed problem report. >> >> >> >> Thanks for the quick response :) >> >> >> >> > Unfortunately it is a complete mystery to me what is happening. >> >> > >> >> > The repeated "RAID conf printout" messages are almost certainly coming from >> >> > the end of raid5_remove_disk. >> >> > It is being called from remove_and_add_spares for each of the two devices >> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it >> >> > can keep rebuilding them. >> >> > >> >> > remove_and_add_spares then counts them and notes there are 2. >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run >> >> > md_do_sync. >> >> > >> >> > md_do_sync should then print out a message like >> >> > md: recovery of RAID array md0 >> >> > >> >> > but it doesn't. So something went wrong. >> >> > There are three reasons that md_do_sync might not print a message: >> >> > >> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is >> >> > unlikely, and in any case md_check_recovery clears it. >> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 >> >> > then we would be able to see that in /proc/mdstat as a "(readonly)" >> >> > status. But we don't. >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does >> >> > get set if kthread_should_stop() returns 'true', but that should only >> >> > happen if kthread_stop() was called. That is only called by >> >> > md_unregister_thread and I cannot see any way that could be call. >> >> > >> >> > So. No idea. >> >> > >> >> > Are you compiling these kernels yourself? >> >> >> >> Nope (used Mageia kernels), but I did now (3.1.5). >> >> >> >> > If so, could you: >> >> > - put a printk in the top of md_do_sync to report the values of >> >> > mddev->recovery and mddev->ro >> >> > - print a message whenever md_unregister_thread is called >> >> > - in md_check_recovery, in the >> >> > if (mddev->ro) { >> >> > /* Only thing we do on a ro array is remove >> >> > * failed devices. >> >> > */ >> >> > mdk_rdev_t *rdev; >> >> > >> >> > in statement, print the value of mddev->ro. >> >> > >> >> > Then see which of those printk's fire, and what they tell us. >> >> >> >> Only the last one does, and mddev->ro == 0. >> >> >> >> For reference, attached is the used patch and resulting log output. >> >> >> > >> > Thanks. >> > >> > So it isn't running md_do_sync at all. Odd. >> > >> > Could please add: >> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can >> > see who is calling it. >> > - print the value that remove_and_add_spares is going to return. >> >> Attached. As you can see, remove_and_add_spare returns 0. >> >> -- >> Anssi Hannula > > > Please add: > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5c95ccb..fa56ac5 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev) > } > } > > + printk("degraded=%d\n", mddev->degraded); > if (mddev->degraded) { > list_for_each_entry(rdev, &mddev->disks, same_set) { > + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags); > if (rdev->raid_disk >= 0 && > !test_bit(In_sync, &rdev->flags) && > !test_bit(Faulty, &rdev->flags)) > > > 'degraded' must be 2 as dmesg contains > > [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 > > and 'degraded' is exactly the difference between '8' and '10' there. > > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just > show "spare rebuilding" in the 'detail' output. > > so remove_and_add_spares "must" return 2. > > Hopefully the above patch will help me understand which of those is wrong. The output is: [ 47.389379] md0: degraded=2 [ 47.389380] md0: raid_disk=0 flags=4 [ 47.389381] md0: raid_disk=-1 flags=0 Full assemble log attached. -- Anssi Hannula [-- Attachment #2: dmesg-dbg3.txt --] [-- Type: text/plain, Size: 20875 bytes --] [ 45.225768] md: md0 stopped. [ 45.238528] md: bind<sdq1> [ 45.243922] md: bind<sdo1> [ 45.244059] md: bind<sdh1> [ 45.246554] md: bind<sdk1> [ 45.246987] md: bind<sdp1> [ 45.247224] md: bind<sdn1> [ 45.247357] md: bind<sdf1> [ 45.247596] md: bind<sdm1> [ 45.247738] md: bind<sdl1> [ 45.247860] md: bind<sdi1> [ 45.247986] md: bind<sdg1> [ 45.248100] md: bind<sdd1> [ 45.248408] md: bind<sdj1> [ 45.251243] async_tx: api initialized (async) [ 45.433646] raid6: int64x1 2754 MB/s [ 45.603511] raid6: int64x2 3535 MB/s [ 45.773372] raid6: int64x4 2375 MB/s [ 45.943221] raid6: int64x8 2489 MB/s [ 46.113076] raid6: sse2x1 7372 MB/s [ 46.282914] raid6: sse2x2 8653 MB/s [ 46.452774] raid6: sse2x4 9642 MB/s [ 46.452776] raid6: using algorithm sse2x4 (9642 MB/s) [ 46.460643] xor: automatically using best checksumming function: generic_sse [ 46.502736] generic_sse: 12817.200 MB/sec [ 46.502738] xor: using function: generic_sse (12817.200 MB/sec) [ 46.514000] md: raid6 personality registered for level 6 [ 46.514002] md: raid5 personality registered for level 5 [ 46.514003] md: raid4 personality registered for level 4 [ 46.514225] md/raid:md0: device sdj1 operational as raid disk 0 [ 46.514227] md/raid:md0: device sdl1 operational as raid disk 9 [ 46.514228] md/raid:md0: device sdm1 operational as raid disk 8 [ 46.514230] md/raid:md0: device sdn1 operational as raid disk 6 [ 46.514231] md/raid:md0: device sdp1 operational as raid disk 5 [ 46.514232] md/raid:md0: device sdk1 operational as raid disk 4 [ 46.514233] md/raid:md0: device sdo1 operational as raid disk 2 [ 46.514234] md/raid:md0: device sdq1 operational as raid disk 1 [ 46.514822] md/raid:md0: allocated 10592kB [ 46.514941] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 [ 46.515042] ------------[ cut here ]------------ [ 46.515048] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 46.515055] Hardware name: System Product Name [ 46.515057] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 46.515126] Pid: 899, comm: mdadm Tainted: G C 3.1.5 #6 [ 46.515127] Call Trace: [ 46.515132] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 46.515134] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 46.515136] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 46.515139] [<ffffffffa06872ba>] run+0x2aa/0x750 [raid456] [ 46.515142] [<ffffffff811908a1>] ? bioset_create+0x1c1/0x280 [ 46.515144] [<ffffffff8133f42b>] md_run+0x36b/0x8a0 [ 46.515146] [<ffffffff8134064e>] ? add_new_disk+0x2ee/0x550 [ 46.515149] [<ffffffff8104d544>] ? check_preempt_curr+0x84/0xa0 [ 46.515150] [<ffffffff8133f97e>] do_md_run+0x1e/0xd0 [ 46.515152] [<ffffffff81341263>] md_ioctl+0x9b3/0x1230 [ 46.515156] [<ffffffff8121d260>] blkdev_ioctl+0x230/0x740 [ 46.515158] [<ffffffff81191a31>] block_ioctl+0x41/0x50 [ 46.515160] [<ffffffff8116f867>] do_vfs_ioctl+0x97/0x4f0 [ 46.515162] [<ffffffff8116fd51>] sys_ioctl+0x91/0xa0 [ 46.515165] [<ffffffff8143cdd2>] system_call_fastpath+0x16/0x1b [ 46.515166] ---[ end trace d6c4c3c9ed54b494 ]--- [ 46.515167] RAID conf printout: [ 46.515168] --- level:6 rd:10 wd:8 [ 46.515169] disk 0, o:1, dev:sdj1 [ 46.515170] disk 1, o:1, dev:sdq1 [ 46.515171] disk 2, o:1, dev:sdo1 [ 46.515173] disk 3, o:1, dev:sdh1 [ 46.515173] disk 4, o:1, dev:sdk1 [ 46.515174] disk 5, o:1, dev:sdp1 [ 46.515175] disk 6, o:1, dev:sdn1 [ 46.515176] disk 7, o:1, dev:sdf1 [ 46.515177] disk 8, o:1, dev:sdm1 [ 46.515178] disk 9, o:1, dev:sdl1 [ 46.515327] created bitmap (15 pages) for device md0 [ 46.930490] md0: bitmap initialized from disk: read 1/1 pages, set 258 of 29809 bits [ 47.387963] md0: detected capacity change from 0 to 16003169779712 [ 47.388949] md0: in md_check_recovery(), mddev->ro = 0 [ 47.388954] ------------[ cut here ]------------ [ 47.388961] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 47.388964] Hardware name: System Product Name [ 47.388966] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 47.389058] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6 [ 47.389059] Call Trace: [ 47.389064] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 47.389066] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 47.389068] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 47.389070] [<ffffffffa067eff7>] raid5_remove_disk+0x47/0x120 [raid456] [ 47.389073] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0 [ 47.389076] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456] [ 47.389079] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 47.389083] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0 [ 47.389085] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 47.389088] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 47.389091] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 47.389093] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 47.389094] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 47.389098] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10 [ 47.389100] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 47.389102] [<ffffffff8143ef30>] ? gs_change+0x13/0x13 [ 47.389103] ---[ end trace d6c4c3c9ed54b495 ]--- [ 47.389104] RAID conf printout: [ 47.389105] --- level:6 rd:10 wd:8 [ 47.389106] disk 0, o:1, dev:sdj1 [ 47.389108] disk 1, o:1, dev:sdq1 [ 47.389109] disk 2, o:1, dev:sdo1 [ 47.389110] disk 3, o:1, dev:sdh1 [ 47.389111] disk 4, o:1, dev:sdk1 [ 47.389113] disk 5, o:1, dev:sdp1 [ 47.389114] disk 6, o:1, dev:sdn1 [ 47.389115] disk 7, o:1, dev:sdf1 [ 47.389116] disk 8, o:1, dev:sdm1 [ 47.389117] disk 9, o:1, dev:sdl1 [ 47.389118] ------------[ cut here ]------------ [ 47.389121] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 47.389122] Hardware name: System Product Name [ 47.389123] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 47.389162] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6 [ 47.389163] Call Trace: [ 47.389164] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 47.389166] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 47.389168] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 47.389171] [<ffffffffa067f00c>] raid5_remove_disk+0x5c/0x120 [raid456] [ 47.389173] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0 [ 47.389175] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456] [ 47.389177] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 47.389179] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0 [ 47.389181] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 47.389183] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 47.389184] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 47.389186] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 47.389188] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 47.389190] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10 [ 47.389191] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 47.389193] [<ffffffff8143ef30>] ? gs_change+0x13/0x13 [ 47.389194] ---[ end trace d6c4c3c9ed54b496 ]--- [ 47.389195] RAID conf printout: [ 47.389196] --- level:6 rd:10 wd:8 [ 47.389197] disk 0, o:1, dev:sdj1 [ 47.389198] disk 1, o:1, dev:sdq1 [ 47.389199] disk 2, o:1, dev:sdo1 [ 47.389200] disk 3, o:1, dev:sdh1 [ 47.389201] disk 4, o:1, dev:sdk1 [ 47.389202] disk 5, o:1, dev:sdp1 [ 47.389203] disk 6, o:1, dev:sdn1 [ 47.389204] disk 7, o:1, dev:sdf1 [ 47.389205] disk 8, o:1, dev:sdm1 [ 47.389206] disk 9, o:1, dev:sdl1 [ 47.389206] ------------[ cut here ]------------ [ 47.389209] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 47.389210] Hardware name: System Product Name [ 47.389210] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 47.389248] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6 [ 47.389249] Call Trace: [ 47.389251] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 47.389253] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 47.389255] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 47.389257] [<ffffffffa067eff7>] raid5_remove_disk+0x47/0x120 [raid456] [ 47.389259] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0 [ 47.389261] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456] [ 47.389263] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 47.389265] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0 [ 47.389267] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 47.389269] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 47.389271] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 47.389273] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 47.389274] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 47.389276] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10 [ 47.389278] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 47.389280] [<ffffffff8143ef30>] ? gs_change+0x13/0x13 [ 47.389281] ---[ end trace d6c4c3c9ed54b497 ]--- [ 47.389282] RAID conf printout: [ 47.389283] --- level:6 rd:10 wd:8 [ 47.389284] disk 0, o:1, dev:sdj1 [ 47.389285] disk 1, o:1, dev:sdq1 [ 47.389285] disk 2, o:1, dev:sdo1 [ 47.389286] disk 3, o:1, dev:sdh1 [ 47.389287] disk 4, o:1, dev:sdk1 [ 47.389288] disk 5, o:1, dev:sdp1 [ 47.389289] disk 6, o:1, dev:sdn1 [ 47.389290] disk 7, o:1, dev:sdf1 [ 47.389291] disk 8, o:1, dev:sdm1 [ 47.389292] disk 9, o:1, dev:sdl1 [ 47.389293] ------------[ cut here ]------------ [ 47.389295] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 47.389296] Hardware name: System Product Name [ 47.389297] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 47.389334] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6 [ 47.389335] Call Trace: [ 47.389337] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 47.389339] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 47.389341] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 47.389343] [<ffffffffa067f00c>] raid5_remove_disk+0x5c/0x120 [raid456] [ 47.389345] [<ffffffff8133dd45>] md_check_recovery+0x3c5/0x6d0 [ 47.389347] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456] [ 47.389349] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 47.389351] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0 [ 47.389353] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 47.389355] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 47.389357] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 47.389359] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 47.389360] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 47.389362] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10 [ 47.389364] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 47.389366] [<ffffffff8143ef30>] ? gs_change+0x13/0x13 [ 47.389367] ---[ end trace d6c4c3c9ed54b498 ]--- [ 47.389368] RAID conf printout: [ 47.389368] --- level:6 rd:10 wd:8 [ 47.389369] disk 0, o:1, dev:sdj1 [ 47.389370] disk 1, o:1, dev:sdq1 [ 47.389371] disk 2, o:1, dev:sdo1 [ 47.389372] disk 3, o:1, dev:sdh1 [ 47.389373] disk 4, o:1, dev:sdk1 [ 47.389374] disk 5, o:1, dev:sdp1 [ 47.389375] disk 6, o:1, dev:sdn1 [ 47.389376] disk 7, o:1, dev:sdf1 [ 47.389377] disk 8, o:1, dev:sdm1 [ 47.389378] disk 9, o:1, dev:sdl1 [ 47.389379] md0: degraded=2 [ 47.389380] md0: raid_disk=0 flags=4 [ 47.389381] md0: raid_disk=-1 flags=0 [ 47.389382] ------------[ cut here ]------------ [ 47.389384] WARNING: at drivers/md/raid5.c:5030 print_raid5_conf+0x37/0x110 [raid456]() [ 47.389385] Hardware name: System Product Name [ 47.389386] Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx coretemp cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table mperf pcspkr nvram netconsole configfs kvm_intel kvm snd_hda_codec_hdmi tda18271c2dd ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sony_decoder ir_jvc_decoder uas ir_rc6_decoder rc_imon_pad ir_rc5_decoder usb_storage joydev imon ir_nec_decoder rc_core snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep drxk snd_seq_dummy snd_seq_oss ddbridge dvb_core snd_seq_midi_event snd_seq snd_seq_device cxd2099(C) snd_pcm_oss snd_pcm snd_timer snd_mixer_oss snd evdev e1000e mxm_wmi sr_mod firewire_ohci serio_raw firewire_core i2c_i801 iTCO_wdt i7core_edac i2c_core iTCO_vendor_support soundcore edac_core crc_itu_t snd_page_alloc asus_atk0110 wmi sg xhci_hcd button processor ide_generic ide_core mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class ata_generic pata_jmicron cciss pata_amd sata_sil24 sata_sil sata_via dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug ata_piix ahci libahci libata sd_mod scsi_mod crc_t10dif raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd usbhid hid usbcore [last unloaded: scsi_wait_scan] [ 47.389423] Pid: 983, comm: md0_raid6 Tainted: G WC 3.1.5 #6 [ 47.389424] Call Trace: [ 47.389426] [<ffffffff8106487f>] warn_slowpath_common+0x7f/0xc0 [ 47.389428] [<ffffffff810648da>] warn_slowpath_null+0x1a/0x20 [ 47.389430] [<ffffffffa067e607>] print_raid5_conf+0x37/0x110 [raid456] [ 47.389432] [<ffffffffa067e7b4>] raid5_add_disk+0xd4/0x120 [raid456] [ 47.389434] [<ffffffff8133ded9>] md_check_recovery+0x559/0x6d0 [ 47.389436] [<ffffffffa0684e63>] raid5d+0x43/0x780 [raid456] [ 47.389438] [<ffffffff810749ea>] ? del_timer_sync+0x3a/0x60 [ 47.389440] [<ffffffff81433097>] ? schedule_timeout+0x177/0x2e0 [ 47.389442] [<ffffffff81073660>] ? cascade+0xa0/0xa0 [ 47.389444] [<ffffffff81338f96>] md_thread+0x116/0x150 [ 47.389446] [<ffffffff81086f40>] ? wake_up_bit+0x40/0x40 [ 47.389447] [<ffffffff81338e80>] ? md_rdev_init+0x130/0x130 [ 47.389449] [<ffffffff810869a6>] kthread+0x96/0xa0 [ 47.389451] [<ffffffff8143ef34>] kernel_thread_helper+0x4/0x10 [ 47.389453] [<ffffffff81086910>] ? kthread_worker_fn+0x1a0/0x1a0 [ 47.389455] [<ffffffff8143ef30>] ? gs_change+0x13/0x13 [ 47.389456] ---[ end trace d6c4c3c9ed54b499 ]--- [ 47.389456] RAID conf printout: [ 47.389457] --- level:6 rd:10 wd:8 [ 47.389458] disk 0, o:1, dev:sdj1 [ 47.389459] disk 1, o:1, dev:sdq1 [ 47.389460] disk 2, o:1, dev:sdo1 [ 47.389461] disk 3, o:1, dev:sdh1 [ 47.389462] disk 4, o:1, dev:sdk1 [ 47.389463] disk 5, o:1, dev:sdp1 [ 47.389464] disk 6, o:1, dev:sdn1 [ 47.389465] disk 7, o:1, dev:sdf1 [ 47.389466] disk 8, o:1, dev:sdm1 [ 47.389467] disk 9, o:1, dev:sdl1 [ 47.389468] md0: remove_and_add_spares will return 0 [ 47.397561] md0: unknown partition table ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 6:42 ` Anssi Hannula @ 2011-12-12 7:10 ` NeilBrown 2011-12-12 7:25 ` Anssi Hannula 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2011-12-12 7:10 UTC (permalink / raw) To: Anssi Hannula; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 6751 bytes --] On Mon, 12 Dec 2011 08:42:49 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote: > > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > > > >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote: > >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> > > >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: > >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> >> > > >> >> >> Hi! > >> >> >> > >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. > >> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen > >> >> >> on array activity. > >> >> >> > >> >> >> All the devices show up properly in --detail and two devices are marked > >> >> >> as "spare rebuilding", and I can access the contents of the array just > >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I > >> >> >> missing something? :) > >> >> >> > >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have > >> >> >> the same issue. mdadm is 3.1.5. > >> >> >> > >> >> >> I'm not using start_ro and writing to the array doesn't trigger a > >> >> >> rebuild either. > >> >> >> > >> >> >> Attached are --examine outputs before assembly, kernel log output on > >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). > >> >> >> > >> >> > > >> >> > Thank you for the very detailed problem report. > >> >> > >> >> Thanks for the quick response :) > >> >> > >> >> > Unfortunately it is a complete mystery to me what is happening. > >> >> > > >> >> > The repeated "RAID conf printout" messages are almost certainly coming from > >> >> > the end of raid5_remove_disk. > >> >> > It is being called from remove_and_add_spares for each of the two devices > >> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it > >> >> > can keep rebuilding them. > >> >> > > >> >> > remove_and_add_spares then counts them and notes there are 2. > >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run > >> >> > md_do_sync. > >> >> > > >> >> > md_do_sync should then print out a message like > >> >> > md: recovery of RAID array md0 > >> >> > > >> >> > but it doesn't. So something went wrong. > >> >> > There are three reasons that md_do_sync might not print a message: > >> >> > > >> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is > >> >> > unlikely, and in any case md_check_recovery clears it. > >> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 > >> >> > then we would be able to see that in /proc/mdstat as a "(readonly)" > >> >> > status. But we don't. > >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does > >> >> > get set if kthread_should_stop() returns 'true', but that should only > >> >> > happen if kthread_stop() was called. That is only called by > >> >> > md_unregister_thread and I cannot see any way that could be call. > >> >> > > >> >> > So. No idea. > >> >> > > >> >> > Are you compiling these kernels yourself? > >> >> > >> >> Nope (used Mageia kernels), but I did now (3.1.5). > >> >> > >> >> > If so, could you: > >> >> > - put a printk in the top of md_do_sync to report the values of > >> >> > mddev->recovery and mddev->ro > >> >> > - print a message whenever md_unregister_thread is called > >> >> > - in md_check_recovery, in the > >> >> > if (mddev->ro) { > >> >> > /* Only thing we do on a ro array is remove > >> >> > * failed devices. > >> >> > */ > >> >> > mdk_rdev_t *rdev; > >> >> > > >> >> > in statement, print the value of mddev->ro. > >> >> > > >> >> > Then see which of those printk's fire, and what they tell us. > >> >> > >> >> Only the last one does, and mddev->ro == 0. > >> >> > >> >> For reference, attached is the used patch and resulting log output. > >> >> > >> > > >> > Thanks. > >> > > >> > So it isn't running md_do_sync at all. Odd. > >> > > >> > Could please add: > >> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can > >> > see who is calling it. > >> > - print the value that remove_and_add_spares is going to return. > >> > >> Attached. As you can see, remove_and_add_spare returns 0. > >> > >> -- > >> Anssi Hannula > > > > > > Please add: > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index 5c95ccb..fa56ac5 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev) > > } > > } > > > > + printk("degraded=%d\n", mddev->degraded); > > if (mddev->degraded) { > > list_for_each_entry(rdev, &mddev->disks, same_set) { > > + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags); > > if (rdev->raid_disk >= 0 && > > !test_bit(In_sync, &rdev->flags) && > > !test_bit(Faulty, &rdev->flags)) > > > > > > 'degraded' must be 2 as dmesg contains > > > > [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 > > > > and 'degraded' is exactly the difference between '8' and '10' there. > > > > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just > > show "spare rebuilding" in the 'detail' output. > > > > so remove_and_add_spares "must" return 2. > > > > Hopefully the above patch will help me understand which of those is wrong. > > The output is: > [ 47.389379] md0: degraded=2 > [ 47.389380] md0: raid_disk=0 flags=4 > [ 47.389381] md0: raid_disk=-1 flags=0 > > Full assemble log attached. > > -- > Anssi Hannula Bingo. This will fix it. We don't really need that 'break' there, and it is a problem. Thanks. NeilBrown diff --git a/drivers/md/md.c b/drivers/md/md.c index 5c95ccb..440d964 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -7344,8 +7344,7 @@ static int remove_and_add_spares(mddev_t *mddev) spares++; md_new_event(mddev); set_bit(MD_CHANGE_DEVS, &mddev->flags); - } else - break; + } } } } [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: raid6 rebuild not starting 2011-12-12 7:10 ` NeilBrown @ 2011-12-12 7:25 ` Anssi Hannula 0 siblings, 0 replies; 9+ messages in thread From: Anssi Hannula @ 2011-12-12 7:25 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On Mon, Dec 12, 2011 at 9:10 AM, NeilBrown <neilb@suse.de> wrote: > On Mon, 12 Dec 2011 08:42:49 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: > >> On Mon, Dec 12, 2011 at 8:24 AM, NeilBrown <neilb@suse.de> wrote: >> > On Mon, 12 Dec 2011 08:02:33 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> > >> >> On Mon, Dec 12, 2011 at 7:42 AM, NeilBrown <neilb@suse.de> wrote: >> >> > On Mon, 12 Dec 2011 07:22:17 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> >> > >> >> >> On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@suse.de> wrote: >> >> >> > On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@iki.fi> wrote: >> >> >> > >> >> >> >> Hi! >> >> >> >> >> >> >> >> After I rebooted during a raid6 rebuild, the rebuild didn't start again. >> >> >> >> Instead, there is a flood of "RAID conf printout"s that seemingly happen >> >> >> >> on array activity. >> >> >> >> >> >> >> >> All the devices show up properly in --detail and two devices are marked >> >> >> >> as "spare rebuilding", and I can access the contents of the array just >> >> >> >> fine, but the rebuild doesn't actually start. Is this a bug or am I >> >> >> >> missing something? :) >> >> >> >> >> >> >> >> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have >> >> >> >> the same issue. mdadm is 3.1.5. >> >> >> >> >> >> >> >> I'm not using start_ro and writing to the array doesn't trigger a >> >> >> >> rebuild either. >> >> >> >> >> >> >> >> Attached are --examine outputs before assembly, kernel log output on >> >> >> >> assembly, /proc/mdstat and --detail after assembly (on 3.1.4). >> >> >> >> >> >> >> > >> >> >> > Thank you for the very detailed problem report. >> >> >> >> >> >> Thanks for the quick response :) >> >> >> >> >> >> > Unfortunately it is a complete mystery to me what is happening. >> >> >> > >> >> >> > The repeated "RAID conf printout" messages are almost certainly coming from >> >> >> > the end of raid5_remove_disk. >> >> >> > It is being called from remove_and_add_spares for each of the two devices >> >> >> > that are being rebuilt. raid5_remove_disk declines to remove them because it >> >> >> > can keep rebuilding them. >> >> >> > >> >> >> > remove_and_add_spares then counts them and notes there are 2. >> >> >> > md_check_recovery notes that this is > 0, so it should create a thread to run >> >> >> > md_do_sync. >> >> >> > >> >> >> > md_do_sync should then print out a message like >> >> >> > md: recovery of RAID array md0 >> >> >> > >> >> >> > but it doesn't. So something went wrong. >> >> >> > There are three reasons that md_do_sync might not print a message: >> >> >> > >> >> >> > 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is >> >> >> > unlikely, and in any case md_check_recovery clears it. >> >> >> > 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2 >> >> >> > then we would be able to see that in /proc/mdstat as a "(readonly)" >> >> >> > status. But we don't. >> >> >> > 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does >> >> >> > get set if kthread_should_stop() returns 'true', but that should only >> >> >> > happen if kthread_stop() was called. That is only called by >> >> >> > md_unregister_thread and I cannot see any way that could be call. >> >> >> > >> >> >> > So. No idea. >> >> >> > >> >> >> > Are you compiling these kernels yourself? >> >> >> >> >> >> Nope (used Mageia kernels), but I did now (3.1.5). >> >> >> >> >> >> > If so, could you: >> >> >> > - put a printk in the top of md_do_sync to report the values of >> >> >> > mddev->recovery and mddev->ro >> >> >> > - print a message whenever md_unregister_thread is called >> >> >> > - in md_check_recovery, in the >> >> >> > if (mddev->ro) { >> >> >> > /* Only thing we do on a ro array is remove >> >> >> > * failed devices. >> >> >> > */ >> >> >> > mdk_rdev_t *rdev; >> >> >> > >> >> >> > in statement, print the value of mddev->ro. >> >> >> > >> >> >> > Then see which of those printk's fire, and what they tell us. >> >> >> >> >> >> Only the last one does, and mddev->ro == 0. >> >> >> >> >> >> For reference, attached is the used patch and resulting log output. >> >> >> >> >> > >> >> > Thanks. >> >> > >> >> > So it isn't running md_do_sync at all. Odd. >> >> > >> >> > Could please add: >> >> > - call "WARN_ON(1);" in print_raid5_conf() so we get a stack trace and can >> >> > see who is calling it. >> >> > - print the value that remove_and_add_spares is going to return. >> >> >> >> Attached. As you can see, remove_and_add_spare returns 0. >> >> >> >> -- >> >> Anssi Hannula >> > >> > >> > Please add: >> > >> > diff --git a/drivers/md/md.c b/drivers/md/md.c >> > index 5c95ccb..fa56ac5 100644 >> > --- a/drivers/md/md.c >> > +++ b/drivers/md/md.c >> > @@ -7328,8 +7328,10 @@ static int remove_and_add_spares(mddev_t *mddev) >> > } >> > } >> > >> > + printk("degraded=%d\n", mddev->degraded); >> > if (mddev->degraded) { >> > list_for_each_entry(rdev, &mddev->disks, same_set) { >> > + printk("raid_disk=%d flags=%x\n", rdev->raid_disk, rdev->flags); >> > if (rdev->raid_disk >= 0 && >> > !test_bit(In_sync, &rdev->flags) && >> > !test_bit(Faulty, &rdev->flags)) >> > >> > >> > 'degraded' must be 2 as dmesg contains >> > >> > [ 45.544806] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2 >> > >> > and 'degraded' is exactly the difference between '8' and '10' there. >> > >> > raid disks 3 and 7 must have In_sync and Faulty clear as both of them just >> > show "spare rebuilding" in the 'detail' output. >> > >> > so remove_and_add_spares "must" return 2. >> > >> > Hopefully the above patch will help me understand which of those is wrong. >> >> The output is: >> [ 47.389379] md0: degraded=2 >> [ 47.389380] md0: raid_disk=0 flags=4 >> [ 47.389381] md0: raid_disk=-1 flags=0 >> >> Full assemble log attached. >> > > Bingo. > > This will fix it. We don't really need that 'break' there, and it is a > problem. > > Thanks. Confirmed. Thanks for the quick fix :) -- Anssi Hannula -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-12-12 7:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-12-11 7:03 raid6 rebuild not starting Anssi Hannula 2011-12-12 3:01 ` NeilBrown 2011-12-12 5:22 ` Anssi Hannula 2011-12-12 5:42 ` NeilBrown 2011-12-12 6:02 ` Anssi Hannula 2011-12-12 6:24 ` NeilBrown 2011-12-12 6:42 ` Anssi Hannula 2011-12-12 7:10 ` NeilBrown 2011-12-12 7:25 ` Anssi Hannula
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.