* Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
@ 2013-08-22 13:20 Andreas Baer
2013-08-26 5:52 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Baer @ 2013-08-22 13:20 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3459 bytes --]
Short description:
I've discovered a problem during re-assembly of a clean RAID: mdadm
throws one disk out because that disk apparently reports another disk as
failed. After assembly, the RAID starts to recover onto the existing spare disk.
In detail:
1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
active disks and 1 spare disk (disk size: 1 TB), fully synced and
clean.
2. RAID-6 stopped and re-assembled with mdadm V3.2.5; during re-assembly
one disk is thrown out.
Manual assembly command for /dev/md0, relevant partitions are /dev/sd[b-i]1:
# mdadm --assemble --scan -vvv
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdi
mdadm: no RAID superblock on /dev/sdh
mdadm: no RAID superblock on /dev/sdg
mdadm: no RAID superblock on /dev/sdf
mdadm: no RAID superblock on /dev/sde
mdadm: no RAID superblock on /dev/sdd
mdadm: no RAID superblock on /dev/sdc
mdadm: no RAID superblock on /dev/sdb
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: added /dev/sdf1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: added /dev/sdh1 to /dev/md0 as 6
mdadm: added /dev/sdi1 to /dev/md0 as 7
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
I finally made a test by modifying the mdadm V3.2.5 sources so that they
do not write anything to any superblock and simply exit() somewhere in
the middle of the assembly process; this lets me reproduce the behavior
without re-creating or re-synchronizing the RAID.
With mdadm V2.6.4, /dev/md0 assembles without problems; if I switch to
mdadm V3.2.5, it shows the same messages as above.
The real problem:
I have more than a single machine receiving a similar software update,
so I need to find a solution or a workaround for this problem. By the
way, in another test without an existing spare disk there seems to be
no 'throwing out' problem when switching from V2.6.4 to V3.2.5.
It would also be a great help if someone could explain the reason
behind the relevant code fragment for rejecting a device, e.g. why is
only the 'most_recent' device important?
/* If this device thinks that 'most_recent' has failed, then
 * we must reject this device.
 */
if (j != most_recent &&
    content->array.raid_disks > 0 &&
    devices[most_recent].i.disk.raid_disk >= 0 &&
    devmap[j * content->array.raid_disks +
	   devices[most_recent].i.disk.raid_disk] == 0) {
	if (verbose > -1)
		fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
			devices[j].devname, devices[most_recent].devname);
	best[i] = -1;
	continue;
}
I also attached some files showing details of the relevant superblocks
before and after assembly, as well as the RAID status itself.
[-- Attachment #2: raid_assembly_V3.2.5.txt --]
[-- Type: text/plain, Size: 2589 bytes --]
# mdadm --assemble --scan -vvv
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdi
mdadm: no RAID superblock on /dev/sdh
mdadm: no RAID superblock on /dev/sdg
mdadm: no RAID superblock on /dev/sdf
mdadm: no RAID superblock on /dev/sde
mdadm: no RAID superblock on /dev/sdd
mdadm: no RAID superblock on /dev/sdc
mdadm: no RAID superblock on /dev/sdb
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: added /dev/sdf1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: added /dev/sdh1 to /dev/md0 as 6
mdadm: added /dev/sdi1 to /dev/md0 as 7
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
# mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Jul XX 12:04:22 20XX
State : clean, degraded, recovering
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 32K
Rebuild Status : 0% complete
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Events : 0.40
Number Major Minor RaidDevice State
7 8 129 0 spare rebuilding /dev/sdi1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 8 97 5 active sync /dev/sdg1
6 8 113 6 active sync /dev/sdh1
[-- Attachment #3: superblocks_after_synchronization.txt --]
[-- Type: text/plain, Size: 10002 bytes --]
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f938f - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93a1 - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93b3 - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93c5 - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93d7 - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 8 81 4 active sync /dev/sdf1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93e9 - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 5 8 97 5 active sync /dev/sdg1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdh1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f93fb - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 6 8 113 6 active sync /dev/sdh1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 01:05:31 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8ef99c - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 7 8 129 7 spare /dev/sdi1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
[-- Attachment #4: superblocks_after_V3.2.5_re-assembly.txt --]
[-- Type: text/plain, Size: 9890 bytes --]
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jul XX 12:04:22 20XX
State : clean
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Checksum : ce8f938f - correct
Events : 40
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1
0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa831 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa843 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa855 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa867 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 8 81 4 active sync /dev/sdf1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa879 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 5 8 97 5 active sync /dev/sdg1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdh1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa88b - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 6 8 113 6 active sync /dev/sdh1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 0.90.00
UUID : d5236bb6:af292dc7:90a2b5fe:324e77f8
Creation Time : Mon Jul XX 11:59:56 20XX
Raid Level : raid6
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 4883799680 (4657.55 GiB 5001.01 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Tue Jul XX 13:32:24 20XX
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : ce8fa897 - correct
Events : 44
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 7 8 129 7 spare /dev/sdi1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 8 81 4 active sync /dev/sdf1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 129 7 spare /dev/sdi1
* Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
2013-08-22 13:20 Update to mdadm V3.2.5 => RAID starts to recover (reproducible) Andreas Baer
@ 2013-08-26 5:52 ` NeilBrown
2013-08-29 9:55 ` Andreas Baer
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2013-08-26 5:52 UTC (permalink / raw)
To: Andreas Baer; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 4961 bytes --]
On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer <synthetic.gods@gmail.com>
wrote:
> Short description:
> I've discovered a problem during re-assembly of a clean RAID. mdadm
> throws one disk out because this disk apparently shows another disk as
> failed. After assembly, RAID starts to recover on existing spare disk.
>
> In detail:
> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> clean.
> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
> one disk is thrown out.
>
> Manual assembly command for /dev/md0, relevant partitions are /dev/sd[b-i]1:
> # mdadm --assemble --scan -vvv
> mdadm: looking for devices for /dev/md0
> mdadm: no RAID superblock on /dev/sdi
> mdadm: no RAID superblock on /dev/sdh
> mdadm: no RAID superblock on /dev/sdg
> mdadm: no RAID superblock on /dev/sdf
> mdadm: no RAID superblock on /dev/sde
> mdadm: no RAID superblock on /dev/sdd
> mdadm: no RAID superblock on /dev/sdc
> mdadm: no RAID superblock on /dev/sdb
> mdadm: no RAID superblock on /dev/sda1
> mdadm: no RAID superblock on /dev/sda
> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> mdadm: no uptodate device for slot 0 of /dev/md0
> mdadm: added /dev/sdd1 to /dev/md0 as 2
> mdadm: added /dev/sde1 to /dev/md0 as 3
> mdadm: added /dev/sdf1 to /dev/md0 as 4
> mdadm: added /dev/sdg1 to /dev/md0 as 5
> mdadm: added /dev/sdh1 to /dev/md0 as 6
> mdadm: added /dev/sdi1 to /dev/md0 as 7
> mdadm: added /dev/sdc1 to /dev/md0 as 1
> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
>
> I finally made a test by modifying mdadm V3.2.5 sources to not write
> any data to any superblock and to simply exit() somewhere in the
> middle of assembly process to be able to reproduce this behavior
> without any RAID re-creation/synchronization.
> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
> switch to mdadm V3.2.5 it shows the same messages as above.
>
> The real problem:
> I have more than a single machine receiving a similar software update
> so I need to find a solution or workaround around this problem. By the
> way, from another test without an existing spare disk, there seems to
> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
>
> It would also be a great help if someone could explain the reason
> behind the relevant code fragment for rejecting a device, e.g. why is
> only the 'most_recent' device important?
>
> /* If this device thinks that 'most_recent' has failed, then
> * we must reject this device.
> */
> if (j != most_recent &&
> content->array.raid_disks > 0 &&
> devices[most_recent].i.disk.raid_disk >= 0 &&
> devmap[j * content->array.raid_disks +
> devices[most_recent].i.disk.raid_disk] == 0) {
> if (verbose > -1)
> fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
> devices[j].devname, devices[most_recent].devname);
> best[i] = -1;
> continue;
> }
>
> I also attached some files showing some details about related
> superblocks before and after assembly as well as about RAID status
> itself.
Thanks for the thorough report. I think this issue has been fixed in 3.3-rc1.
You can fix it for 3.2.5 by applying the following patch:
diff --git a/Assemble.c b/Assemble.c
index 227d66f..bc65c29 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
 		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
 		if (most_recent < devcnt) {
 			if (devices[devcnt].i.events
-			    > devices[most_recent].i.events)
+			    > devices[most_recent].i.events &&
+			    devices[devcnt].i.disk.state == 6)
 				most_recent = devcnt;
 		}
 		if (content->array.level == LEVEL_MULTIPATH)
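(For what it's worth, the 6 being tested there is, if I recall the
md_p.h definitions correctly, the ACTIVE and SYNC disk-state bits
together, so a spare, which carries neither bit, can never pass the
new test:)

/* Disk-state bit numbers from linux/raid/md_p.h: */
#define MD_DISK_ACTIVE	1
#define MD_DISK_SYNC	2
/* (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC) == 6: active and in sync.
 * A spare has neither bit set, so its disk.state is 0. */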
The "most recent" device is important as we need to choose one to compare all
others again. The problem is that the code in 3.2.5 can sometimes choose a
spare, which isn't such a good idea.
The "most recent" is also important because when a collection of devices is
given to the kernel it will give priority to some information which is on the
last device passed in. So we make sure that the last device given to the
kernel is the "most recent".
Please let me know if the patch fixes your problem.
NeilBrown
* Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
2013-08-26 5:52 ` NeilBrown
@ 2013-08-29 9:55 ` Andreas Baer
2013-09-02 1:35 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Baer @ 2013-08-29 9:55 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5759 bytes --]
On 8/26/13, NeilBrown <neilb@suse.de> wrote:
> On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer <synthetic.gods@gmail.com>
> wrote:
>
>> Short description:
>> I've discovered a problem during re-assembly of a clean RAID. mdadm
>> throws one disk out because this disk apparently shows another disk as
>> failed. After assembly, RAID starts to recover on existing spare disk.
>>
>> In detail:
>> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
>> active disks and 1 spare disk (disk size: 1 TB), fully synced and
>> clean.
>> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
>> one disk is thrown out.
>>
>> Manual assembly command for /dev/md0, relevant partitions are
>> /dev/sd[b-i]1:
>> # mdadm --assemble --scan -vvv
>> mdadm: looking for devices for /dev/md0
>> mdadm: no RAID superblock on /dev/sdi
>> mdadm: no RAID superblock on /dev/sdh
>> mdadm: no RAID superblock on /dev/sdg
>> mdadm: no RAID superblock on /dev/sdf
>> mdadm: no RAID superblock on /dev/sde
>> mdadm: no RAID superblock on /dev/sdd
>> mdadm: no RAID superblock on /dev/sdc
>> mdadm: no RAID superblock on /dev/sdb
>> mdadm: no RAID superblock on /dev/sda1
>> mdadm: no RAID superblock on /dev/sda
>> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
>> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
>> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
>> mdadm: no uptodate device for slot 0 of /dev/md0
>> mdadm: added /dev/sdd1 to /dev/md0 as 2
>> mdadm: added /dev/sde1 to /dev/md0 as 3
>> mdadm: added /dev/sdf1 to /dev/md0 as 4
>> mdadm: added /dev/sdg1 to /dev/md0 as 5
>> mdadm: added /dev/sdh1 to /dev/md0 as 6
>> mdadm: added /dev/sdi1 to /dev/md0 as 7
>> mdadm: added /dev/sdc1 to /dev/md0 as 1
>> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
>>
>> I finally made a test by modifying mdadm V3.2.5 sources to not write
>> any data to any superblock and to simply exit() somewhere in the
>> middle of assembly process to be able to reproduce this behavior
>> without any RAID re-creation/synchronization.
>> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
>> switch to mdadm V3.2.5 it shows the same messages as above.
>>
>> The real problem:
>> I have more than a single machine receiving a similar software update
>> so I need to find a solution or workaround around this problem. By the
>> way, from another test without an existing spare disk, there seems to
>> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
>>
>> It would also be a great help if someone could explain the reason
>> behind the relevant code fragment for rejecting a device, e.g. why is
>> only the 'most_recent' device important?
>>
>> /* If this device thinks that 'most_recent' has failed, then
>> * we must reject this device.
>> */
>> if (j != most_recent &&
>> content->array.raid_disks > 0 &&
>> devices[most_recent].i.disk.raid_disk >= 0 &&
>> devmap[j * content->array.raid_disks +
>> devices[most_recent].i.disk.raid_disk] == 0) {
>> if (verbose > -1)
>> fprintf(stderr, Name ": ignoring %s as it reports %s as
>> failed\n",
>> devices[j].devname, devices[most_recent].devname);
>> best[i] = -1;
>> continue;
>> }
>>
>> I also attached some files showing some details about related
>> superblocks before and after assembly as well as about RAID status
>> itself.
>
>
> Thanks for the thorough report. I think this issue has been fixed in
> 3.3-rc1
> You can fix it for 3.2.5 by applying the following patch:
>
> diff --git a/Assemble.c b/Assemble.c
> index 227d66f..bc65c29 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> if (most_recent < devcnt) {
> if (devices[devcnt].i.events
> - > devices[most_recent].i.events)
> + > devices[most_recent].i.events &&
> + devices[devcnt].i.disk.state == 6)
> most_recent = devcnt;
> }
> if (content->array.level == LEVEL_MULTIPATH)
>
> The "most recent" device is important as we need to choose one to compare
> all
> others again. The problem is that the code in 3.2.5 can sometimes choose a
> spare, which isn't such a good idea.
>
> The "most recent" is also important because when a collection of devices is
> given to the kernel it will give priority to some information which is on
> the
> last device passed in. So we make sure that the last device given to the
> kernel is the "most recent".
>
> Please let me know if the patch fixes your problem.
>
> NeilBrown
First of all, thanks for your very helpful 'most recent disk' explanation.
Sadly, the patch didn't fix my problem: the event counters really are
equal on all disks (including the spare), and the first disk that is
examined happens to be the spare, so there is never a reason to pick
another disk as 'most recent'. I extended your patch a little to print
more output, and I also wrote my own solution, but that one needs
review because I'm not sure it can be done like that.
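To make that concrete, with your patch applied the selection on my
array effectively reduces to the following stand-alone sketch (the
event counters and the ordering are taken from the debug output
attached below; this is a simplification, not the real Assemble.c
loop):

#include <stdio.h>

/* disk.state: 6 = active and in sync, 0 = spare (values as printed
 * by the patched binary below) */
struct dev { unsigned long long events; int state; };

int main(void)
{
	/* the spare happens to be examined first; all event counters are 42 */
	struct dev devices[] = {
		{ 42, 0 }, { 42, 6 }, { 42, 6 }, { 42, 6 }, { 42, 6 }, { 42, 6 }
	};
	int most_recent = 0;	/* starts at 0, i.e. the spare */
	int devcnt;

	for (devcnt = 0; devcnt < 6; devcnt++)
		if (most_recent < devcnt &&
		    devices[devcnt].events > devices[most_recent].events &&
		    devices[devcnt].state == 6)
			most_recent = devcnt;

	/* 42 > 42 is never true, so this prints 0: the spare keeps the role */
	printf("most_recent = %d\n", most_recent);
	return 0;
}

Because the event comparison never fires, the spare keeps the 'most
recent' role and /dev/sdb1 is then rejected against it, exactly as in
the output below.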
Patch 1: Your solution with more output
Diff: mdadm-3.2.5-noassemble-patch1.diff
Assembly: mdadm-3.2.5-noassemble-patch1.txt
Patch 2: My proposed solution
Diff: mdadm-3.2.5-noassemble-patch2.diff
Assembly: mdadm-3.2.5-noassemble-patch2.txt
[-- Attachment #2: mdadm-3.2.5-noassemble-patch1.txt --]
[-- Type: text/plain, Size: 2049 bytes --]
# ./mdadm-3.2.5-noassemble-patch1 --assemble --scan -v
mdadm main: failed to get exclusive lock on mapfile
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdg
mdadm: no RAID superblock on /dev/sdf
mdadm: no RAID superblock on /dev/sde
mdadm: no RAID superblock on /dev/sdd
mdadm: no RAID superblock on /dev/sdc
mdadm: no RAID superblock on /dev/sdb
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
# most_recent=0; devcnt=0; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=0
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
# most_recent=0; devcnt=1; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
# most_recent=0; devcnt=2; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
# most_recent=0; devcnt=3; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
# most_recent=0; devcnt=4; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
# most_recent=0; devcnt=5; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
# j=5; most_recent=0; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=5; devmap[30] = 0
mdadm: ignoring /dev/sdb1 as it reports /dev/sdg1 as failed
# j=4; most_recent=0; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=5; devmap[25] = 1
# j=3; most_recent=0; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=5; devmap[20] = 1
# j=2; most_recent=0; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=5; devmap[15] = 1
# j=1; most_recent=0; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=5; devmap[10] = 1
[-- Attachment #3: mdadm-3.2.5-noassemble-patch1.diff --]
[-- Type: application/octet-stream, Size: 2649 bytes --]
diff -Nur mdadm-3.2.5-orig/Assemble.c mdadm-3.2.5-noassemble-patch1/Assemble.c
--- mdadm-3.2.5-orig/Assemble.c 2012-05-18 09:10:03.000000000 +0200
+++ mdadm-3.2.5-noassemble-patch1/Assemble.c 2013-08-29 10:57:58.000000000 +0200
@@ -220,7 +220,7 @@
int change = 0;
int inargv = 0;
int report_missmatch;
-#ifndef MDASSEMBLE
+#if 0
int bitmap_done;
#endif
int start_partial_ok = (runstop >= 0) &&
@@ -716,7 +716,7 @@
}
ioctl(mdfd, STOP_ARRAY, NULL); /* just incase it was started but has no content */
-#ifndef MDASSEMBLE
+#if 0
if (content != &info) {
/* This is a member of a container. Try starting the array. */
int err;
@@ -735,7 +735,7 @@
char *devname = tmpdev->devname;
struct stat stb;
/* looks like a good enough match to update the super block if needed */
-#ifndef MDASSEMBLE
+#if 0
if (update) {
int dfd;
/* prepare useful information in info structures */
@@ -847,10 +847,14 @@
devices[devcnt].i = *content;
devices[devcnt].i.disk.major = major(stb.st_rdev);
devices[devcnt].i.disk.minor = minor(stb.st_rdev);
+ fprintf( stderr, "# most_recent=%d; devcnt=%d; devices[devcnt].i.events=%llu; devices[most_recent].i.events=%llu; disk.state=%d\n",
+ most_recent, devcnt, devices[devcnt].i.events, devices[most_recent].i.events, devices[devcnt].i.disk.state );
if (most_recent < devcnt) {
- if (devices[devcnt].i.events
- > devices[most_recent].i.events)
+ if (devices[devcnt].i.events > devices[most_recent].i.events &&
+ devices[devcnt].i.disk.state == 6) {
most_recent = devcnt;
+ fprintf( stderr, " new: most_recent=%d; disk.state=%d\n", most_recent, devices[devcnt].i.disk.state );
+ }
}
if (content->array.level == LEVEL_MULTIPATH)
/* with multipath, the raid_disk from the superblock is meaningless */
@@ -960,6 +964,10 @@
/* If this device thinks that 'most_recent' has failed, then
* we must reject this device.
*/
+ fprintf( stderr, "# j=%d; most_recent=%d; content->array.raid_disks=%d; devices[most_recent].i.disk.raid_disk=%d; devmap[%d] = %d\n",
+ j, most_recent, content->array.raid_disks, devices[most_recent].i.disk.raid_disk,
+ j * content->array.raid_disks + devices[most_recent].i.disk.raid_disk,
+ devmap[j * content->array.raid_disks + devices[most_recent].i.disk.raid_disk] );
if (j != most_recent &&
content->array.raid_disks > 0 &&
devices[most_recent].i.disk.raid_disk >= 0 &&
@@ -988,6 +996,7 @@
}
}
free(devmap);
+exit( 255 );
while (force &&
(!enough(content->array.level, content->array.raid_disks,
content->array.layout, 1,
[-- Attachment #4: mdadm-3.2.5-noassemble-patch2.diff --]
[-- Type: application/octet-stream, Size: 3044 bytes --]
diff -Nur mdadm-3.2.5-orig/Assemble.c mdadm-3.2.5-noassemble-patch2/Assemble.c
--- mdadm-3.2.5-orig/Assemble.c 2012-05-18 09:10:03.000000000 +0200
+++ mdadm-3.2.5-noassemble-patch2/Assemble.c 2013-08-29 10:40:42.000000000 +0200
@@ -220,7 +220,7 @@
int change = 0;
int inargv = 0;
int report_missmatch;
-#ifndef MDASSEMBLE
+#if 0
int bitmap_done;
#endif
int start_partial_ok = (runstop >= 0) &&
@@ -235,6 +235,7 @@
int trustworthy;
char chosen_name[1024];
struct domainlist *domains = NULL;
+ int get_new_recent_disk = 0;
if (get_linux_version() < 2004000)
old_linux = 1;
@@ -716,7 +717,7 @@
}
ioctl(mdfd, STOP_ARRAY, NULL); /* just incase it was started but has no content */
-#ifndef MDASSEMBLE
+#if 0
if (content != &info) {
/* This is a member of a container. Try starting the array. */
int err;
@@ -735,7 +736,7 @@
char *devname = tmpdev->devname;
struct stat stb;
/* looks like a good enough match to update the super block if needed */
-#ifndef MDASSEMBLE
+#if 0
if (update) {
int dfd;
/* prepare useful information in info structures */
@@ -847,10 +848,17 @@
devices[devcnt].i = *content;
devices[devcnt].i.disk.major = major(stb.st_rdev);
devices[devcnt].i.disk.minor = minor(stb.st_rdev);
+ fprintf( stderr, "# most_recent=%d; devcnt=%d; devices[devcnt].i.events=%llu; devices[most_recent].i.events=%llu; disk.state=%d\n",
+ most_recent, devcnt, devices[devcnt].i.events, devices[most_recent].i.events, devices[devcnt].i.disk.state );
if (most_recent < devcnt) {
- if (devices[devcnt].i.events
- > devices[most_recent].i.events)
+ if ((devices[devcnt].i.events > devices[most_recent].i.events && devices[devcnt].i.disk.state == 6) ||
+ (get_new_recent_disk == 1 && devices[devcnt].i.disk.state == 6)) {
most_recent = devcnt;
+ get_new_recent_disk = 0;
+ fprintf( stderr, " new: most_recent=%d; disk.state=%d\n", most_recent, devices[devcnt].i.disk.state );
+ }
+ } else if (most_recent == devcnt && devices[devcnt].i.disk.state != 6) {
+ get_new_recent_disk = 1;
}
if (content->array.level == LEVEL_MULTIPATH)
/* with multipath, the raid_disk from the superblock is meaningless */
@@ -960,6 +968,10 @@
/* If this device thinks that 'most_recent' has failed, then
* we must reject this device.
*/
+ fprintf( stderr, "# j=%d; most_recent=%d; content->array.raid_disks=%d; devices[most_recent].i.disk.raid_disk=%d; devmap[%d] = %d\n",
+ j, most_recent, content->array.raid_disks, devices[most_recent].i.disk.raid_disk,
+ j * content->array.raid_disks + devices[most_recent].i.disk.raid_disk,
+ devmap[j * content->array.raid_disks + devices[most_recent].i.disk.raid_disk] );
if (j != most_recent &&
content->array.raid_disks > 0 &&
devices[most_recent].i.disk.raid_disk >= 0 &&
@@ -988,6 +1000,7 @@
}
}
free(devmap);
+exit( 255 );
while (force &&
(!enough(content->array.level, content->array.raid_disks,
content->array.layout, 1,
[-- Attachment #5: mdadm-3.2.5-noassemble-patch2.txt --]
[-- Type: text/plain, Size: 2023 bytes --]
# ./mdadm-3.2.5-noassemble-patch2 --assemble --scan -v
mdadm main: failed to get exclusive lock on mapfile
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/sdg
mdadm: no RAID superblock on /dev/sdf
mdadm: no RAID superblock on /dev/sde
mdadm: no RAID superblock on /dev/sdd
mdadm: no RAID superblock on /dev/sdc
mdadm: no RAID superblock on /dev/sdb
mdadm: no RAID superblock on /dev/sda1
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
# most_recent=0; devcnt=0; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=0
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
# most_recent=0; devcnt=1; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
new: most_recent=1; disk.state=6
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
# most_recent=1; devcnt=2; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
# most_recent=1; devcnt=3; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
# most_recent=1; devcnt=4; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
# most_recent=1; devcnt=5; devices[devcnt].i.events=42; devices[most_recent].i.events=42; disk.state=6
# j=5; most_recent=1; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=4; devmap[29] = 1
# j=4; most_recent=1; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=4; devmap[24] = 1
# j=3; most_recent=1; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=4; devmap[19] = 1
# j=2; most_recent=1; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=4; devmap[14] = 1
# j=1; most_recent=1; content->array.raid_disks=5; devices[most_recent].i.disk.raid_disk=4; devmap[9] = 1
* Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
2013-08-29 9:55 ` Andreas Baer
@ 2013-09-02 1:35 ` NeilBrown
2013-09-05 15:22 ` Andreas Baer
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2013-09-02 1:35 UTC (permalink / raw)
To: Andreas Baer; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 7222 bytes --]
On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@gmail.com>
wrote:
> On 8/26/13, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer <synthetic.gods@gmail.com>
> > wrote:
> >
> >> Short description:
> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
> >> throws one disk out because this disk apparently shows another disk as
> >> failed. After assembly, RAID starts to recover on existing spare disk.
> >>
> >> In detail:
> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> >> clean.
> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
> >> one disk is thrown out.
> >>
> >> Manual assembly command for /dev/md0, relevant partitions are
> >> /dev/sd[b-i]1:
> >> # mdadm --assemble --scan -vvv
> >> mdadm: looking for devices for /dev/md0
> >> mdadm: no RAID superblock on /dev/sdi
> >> mdadm: no RAID superblock on /dev/sdh
> >> mdadm: no RAID superblock on /dev/sdg
> >> mdadm: no RAID superblock on /dev/sdf
> >> mdadm: no RAID superblock on /dev/sde
> >> mdadm: no RAID superblock on /dev/sdd
> >> mdadm: no RAID superblock on /dev/sdc
> >> mdadm: no RAID superblock on /dev/sdb
> >> mdadm: no RAID superblock on /dev/sda1
> >> mdadm: no RAID superblock on /dev/sda
> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> >> mdadm: no uptodate device for slot 0 of /dev/md0
> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
> >> mdadm: added /dev/sde1 to /dev/md0 as 3
> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
> >>
> >> I finally made a test by modifying mdadm V3.2.5 sources to not write
> >> any data to any superblock and to simply exit() somewhere in the
> >> middle of assembly process to be able to reproduce this behavior
> >> without any RAID re-creation/synchronization.
> >> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
> >> switch to mdadm V3.2.5 it shows the same messages as above.
> >>
> >> The real problem:
> >> I have more than a single machine receiving a similar software update
> >> so I need to find a solution or workaround around this problem. By the
> >> way, from another test without an existing spare disk, there seems to
> >> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
> >>
> >> It would also be a great help if someone could explain the reason
> >> behind the relevant code fragment for rejecting a device, e.g. why is
> >> only the 'most_recent' device important?
> >>
> >> /* If this device thinks that 'most_recent' has failed, then
> >> * we must reject this device.
> >> */
> >> if (j != most_recent &&
> >> content->array.raid_disks > 0 &&
> >> devices[most_recent].i.disk.raid_disk >= 0 &&
> >> devmap[j * content->array.raid_disks +
> >> devices[most_recent].i.disk.raid_disk] == 0) {
> >> if (verbose > -1)
> >> fprintf(stderr, Name ": ignoring %s as it reports %s as
> >> failed\n",
> >> devices[j].devname, devices[most_recent].devname);
> >> best[i] = -1;
> >> continue;
> >> }
> >>
> >> I also attached some files showing some details about related
> >> superblocks before and after assembly as well as about RAID status
> >> itself.
> >
> >
> > Thanks for the thorough report. I think this issue has been fixed in
> > 3.3-rc1
> > You can fix it for 3.2.5 by applying the following patch:
> >
> > diff --git a/Assemble.c b/Assemble.c
> > index 227d66f..bc65c29 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> > devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> > if (most_recent < devcnt) {
> > if (devices[devcnt].i.events
> > - > devices[most_recent].i.events)
> > + > devices[most_recent].i.events &&
> > + devices[devcnt].i.disk.state == 6)
> > most_recent = devcnt;
> > }
> > if (content->array.level == LEVEL_MULTIPATH)
> >
> > The "most recent" device is important as we need to choose one to compare
> > all
> > others again. The problem is that the code in 3.2.5 can sometimes choose a
> > spare, which isn't such a good idea.
> >
> > The "most recent" is also important because when a collection of devices is
> > given to the kernel it will give priority to some information which is on
> > the
> > last device passed in. So we make sure that the last device given to the
> > kernel is the "most recent".
> >
> > Please let me know if the patch fixes your problem.
> >
> > NeilBrown
>
> First of all, thanks for your very helpful 'most recent disk' explanation.
>
> Sadly, the patch didn't fix my problem because the event counters are
> really equal on all disks (inclusive spare) and the first disk that is
> checked is the spare disk so there is no reason to set another disk as
> 'most recent disk', but I improved your patch a little bit by
> providing more output and created also an own solution, but that needs
> review because I'm not sure if it can be done like that.
>
> Patch 1: Your solution with more output
> Diff: mdadm-3.2.5-noassemble-patch1.diff
> Assembly: mdadm-3.2.5-noassemble-patch1.txt
>
> Patch 2: My proposed solution
> Diff: mdadm-3.2.5-noassemble-patch2.diff
> Assembly: mdadm-3.2.5-noassemble-patch2.txt
Thanks for the testing and suggestions. I see what I missed now.
Can you check if this patch works please?
Thanks.
NeilBrown
diff --git a/Assemble.c b/Assemble.c
index 227d66f..9131917 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
 	unsigned int okcnt, sparecnt, rebuilding_cnt;
 	unsigned int req_cnt;
 	int i;
-	int most_recent = 0;
+	int most_recent = -1;
 	int chosen_drive;
 	int change = 0;
 	int inargv = 0;
@@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
 		devices[devcnt].i = *content;
 		devices[devcnt].i.disk.major = major(stb.st_rdev);
 		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
-		if (most_recent < devcnt) {
-			if (devices[devcnt].i.events
+		if (devices[devcnt].i.disk_state == 6) {
+			if (most_recent < 0 ||
+			    devices[devcnt].i.events
 			    > devices[most_recent].i.events)
 				most_recent = devcnt;
 		}
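With most_recent starting at -1 and only devices whose recorded state
is 6 (active and in sync) ever being considered, a spare can no longer
claim the role even when every event counter is equal. On the data from
your debug output the intent is roughly this (a stand-alone sketch, not
the real code):

#include <stdio.h>

struct dev { unsigned long long events; int state; };	/* 6 = active and in sync */

int main(void)
{
	/* spare examined first, all event counters equal, as in your trace */
	struct dev devices[] = {
		{ 42, 0 }, { 42, 6 }, { 42, 6 }, { 42, 6 }, { 42, 6 }, { 42, 6 }
	};
	int most_recent = -1;
	int devcnt;

	for (devcnt = 0; devcnt < 6; devcnt++) {
		if (devices[devcnt].state != 6)
			continue;	/* spares and faulty devices never qualify */
		if (most_recent < 0 ||
		    devices[devcnt].events > devices[most_recent].events)
			most_recent = devcnt;
	}

	/* prints 1: the first in-sync device wins the tie, not the spare */
	printf("most_recent = %d\n", most_recent);
	return 0;
}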
* Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
2013-09-02 1:35 ` NeilBrown
@ 2013-09-05 15:22 ` Andreas Baer
2013-09-09 2:39 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Baer @ 2013-09-05 15:22 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 7575 bytes --]
On 9/2/13, NeilBrown <neilb@suse.de> wrote:
> On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@gmail.com>
> wrote:
>
>> On 8/26/13, NeilBrown <neilb@suse.de> wrote:
>> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer
>> > <synthetic.gods@gmail.com>
>> > wrote:
>> >
>> >> Short description:
>> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
>> >> throws one disk out because this disk apparently shows another disk as
>> >> failed. After assembly, RAID starts to recover on existing spare disk.
>> >>
>> >> In detail:
>> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
>> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
>> >> clean.
>> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
>> >> one disk is thrown out.
>> >>
>> >> Manual assembly command for /dev/md0, relevant partitions are
>> >> /dev/sd[b-i]1:
>> >> # mdadm --assemble --scan -vvv
>> >> mdadm: looking for devices for /dev/md0
>> >> mdadm: no RAID superblock on /dev/sdi
>> >> mdadm: no RAID superblock on /dev/sdh
>> >> mdadm: no RAID superblock on /dev/sdg
>> >> mdadm: no RAID superblock on /dev/sdf
>> >> mdadm: no RAID superblock on /dev/sde
>> >> mdadm: no RAID superblock on /dev/sdd
>> >> mdadm: no RAID superblock on /dev/sdc
>> >> mdadm: no RAID superblock on /dev/sdb
>> >> mdadm: no RAID superblock on /dev/sda1
>> >> mdadm: no RAID superblock on /dev/sda
>> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
>> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
>> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
>> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
>> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
>> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
>> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
>> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
>> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
>> >> mdadm: no uptodate device for slot 0 of /dev/md0
>> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
>> >> mdadm: added /dev/sde1 to /dev/md0 as 3
>> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
>> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
>> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
>> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
>> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
>> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
>> >>
>> >> I finally made a test by modifying mdadm V3.2.5 sources to not write
>> >> any data to any superblock and to simply exit() somewhere in the
>> >> middle of assembly process to be able to reproduce this behavior
>> >> without any RAID re-creation/synchronization.
>> >> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
>> >> switch to mdadm V3.2.5 it shows the same messages as above.
>> >>
>> >> The real problem:
>> >> I have more than a single machine receiving a similar software update
>> >> so I need to find a solution or workaround around this problem. By the
>> >> way, from another test without an existing spare disk, there seems to
>> >> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
>> >>
>> >> It would also be a great help if someone could explain the reason
>> >> behind the relevant code fragment for rejecting a device, e.g. why is
>> >> only the 'most_recent' device important?
>> >>
>> >> /* If this device thinks that 'most_recent' has failed, then
>> >> * we must reject this device.
>> >> */
>> >> if (j != most_recent &&
>> >> content->array.raid_disks > 0 &&
>> >> devices[most_recent].i.disk.raid_disk >= 0 &&
>> >> devmap[j * content->array.raid_disks +
>> >> devices[most_recent].i.disk.raid_disk] == 0) {
>> >> if (verbose > -1)
>> >> fprintf(stderr, Name ": ignoring %s as it reports %s as
>> >> failed\n",
>> >> devices[j].devname, devices[most_recent].devname);
>> >> best[i] = -1;
>> >> continue;
>> >> }
>> >>
>> >> I also attached some files showing some details about related
>> >> superblocks before and after assembly as well as about RAID status
>> >> itself.
>> >
>> >
>> > Thanks for the thorough report. I think this issue has been fixed in
>> > 3.3-rc1
>> > You can fix it for 3.2.5 by applying the following patch:
>> >
>> > diff --git a/Assemble.c b/Assemble.c
>> > index 227d66f..bc65c29 100644
>> > --- a/Assemble.c
>> > +++ b/Assemble.c
>> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
>> > devices[devcnt].i.disk.minor = minor(stb.st_rdev);
>> > if (most_recent < devcnt) {
>> > if (devices[devcnt].i.events
>> > - > devices[most_recent].i.events)
>> > + > devices[most_recent].i.events &&
>> > + devices[devcnt].i.disk.state == 6)
>> > most_recent = devcnt;
>> > }
>> > if (content->array.level == LEVEL_MULTIPATH)
>> >
>> > The "most recent" device is important as we need to choose one to
>> > compare
>> > all
>> > others again. The problem is that the code in 3.2.5 can sometimes
>> > choose a
>> > spare, which isn't such a good idea.
>> >
>> > The "most recent" is also important because when a collection of devices
>> > is given to the kernel it will give priority to some information which is
>> > on the
>> > last device passed in. So we make sure that the last device given to
>> > the kernel is the "most recent".
>> >
>> > Please let me know if the patch fixes your problem.
>> >
>> > NeilBrown
>>
>> First of all, thanks for your very helpful 'most recent disk'
>> explanation.
>>
>> Sadly, the patch didn't fix my problem because the event counters are
>> really equal on all disks (inclusive spare) and the first disk that is
>> checked is the spare disk so there is no reason to set another disk as
>> 'most recent disk', but I improved your patch a little bit by
>> providing more output and created also an own solution, but that needs
>> review because I'm not sure if it can be done like that.
>>
>> Patch 1: Your solution with more output
>> Diff: mdadm-3.2.5-noassemble-patch1.diff
>> Assembly: mdadm-3.2.5-noassemble-patch1.txt
>>
>> Patch 2: My proposed solution
>> Diff: mdadm-3.2.5-noassemble-patch2.diff
>> Assembly: mdadm-3.2.5-noassemble-patch2.txt
>
>
> Thanks for the testing and suggestions. I see what I missed now.
> Can you check if this patch works please?
>
> Thanks.
> NeilBrown
>
> diff --git a/Assemble.c b/Assemble.c
> index 227d66f..9131917 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
> unsigned int okcnt, sparecnt, rebuilding_cnt;
> unsigned int req_cnt;
> int i;
> - int most_recent = 0;
> + int most_recent = -1;
> int chosen_drive;
> int change = 0;
> int inargv = 0;
> @@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
> devices[devcnt].i = *content;
> devices[devcnt].i.disk.major = major(stb.st_rdev);
> devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> - if (most_recent < devcnt) {
> - if (devices[devcnt].i.events
> + if (devices[devcnt].i.disk_state == 6) {
> + if (most_recent < 0 ||
> + devices[devcnt].i.events
> > devices[most_recent].i.events)
> most_recent = devcnt;
> }
Your patch seems to work without issues.
There is only a small typo:
+ if (devices[devcnt].i.disk_state == 6) {
should be:
+ if (devices[devcnt].i.disk.state == 6) {
I attached the patch that I'm finally using to this mail.
Thank you very much for your help.
[-- Attachment #2: no-spare-as-most_recent.patch --]
[-- Type: application/octet-stream, Size: 799 bytes --]
diff -Naur mdadm-3.2.5-orig/Assemble.c mdadm-3.2.5/Assemble.c
--- mdadm-3.2.5-orig/Assemble.c 2012-05-18 09:10:03.000000000 +0200
+++ mdadm-3.2.5/Assemble.c 2013-09-03 14:05:45.000000000 +0200
@@ -215,7 +215,7 @@
unsigned int okcnt, sparecnt, rebuilding_cnt;
unsigned int req_cnt;
int i;
- int most_recent = 0;
+ int most_recent = -1;
int chosen_drive;
int change = 0;
int inargv = 0;
@@ -847,8 +847,9 @@
devices[devcnt].i = *content;
devices[devcnt].i.disk.major = major(stb.st_rdev);
devices[devcnt].i.disk.minor = minor(stb.st_rdev);
- if (most_recent < devcnt) {
- if (devices[devcnt].i.events
+ if (devices[devcnt].i.disk.state == 6) {
+ if (most_recent < 0 ||
+ devices[devcnt].i.events
> devices[most_recent].i.events)
most_recent = devcnt;
}
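For anyone applying the attached patch by hand, the following is a minimal, self-contained sketch of how the selection logic reads with the fix (including the disk.state correction) in place. The struct and sample array are illustrative stand-ins for the mdinfo data in Assemble.c; only the events and state fields and the control flow mirror the real code:

#include <stdio.h>

struct dev_info {
        unsigned long long events;  /* stands in for devices[n].i.events */
        int state;                  /* stands in for devices[n].i.disk.state */
};

int main(void)
{
        /* Scenario from the report: all event counters are equal and the
         * spare (state 0) happens to be examined first. */
        struct dev_info devices[] = {
                { 42, 0 },  /* spare */
                { 42, 6 },  /* active, in-sync member */
                { 42, 6 },  /* active, in-sync member */
        };
        int devcnt, most_recent = -1;

        for (devcnt = 0; devcnt < 3; devcnt++) {
                /* Only an active, in-sync member (state == 6) may become
                 * the reference device. */
                if (devices[devcnt].state == 6) {
                        if (most_recent < 0 ||
                            devices[devcnt].events > devices[most_recent].events)
                                most_recent = devcnt;
                }
        }
        printf("most_recent = %d\n", most_recent);  /* prints 1, not 0 */
        return 0;
}

With the unpatched 3.2.5 logic (most_recent starting at 0 and no state test), the same data would leave most_recent pointing at the spare in slot 0, which is what made assembly reject a good member for reporting the "most recent" device as failed.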
* Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)
2013-09-05 15:22 ` Andreas Baer
@ 2013-09-09 2:39 ` NeilBrown
0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2013-09-09 2:39 UTC (permalink / raw)
To: Andreas Baer; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 8300 bytes --]
On Thu, 5 Sep 2013 17:22:26 +0200 Andreas Baer <synthetic.gods@gmail.com>
wrote:
> [...]
>
> Your patch seems to work without issues.
>
> There is only a small typo:
> + if (devices[devcnt].i.disk_state == 6) {
> should be:
> + if (devices[devcnt].i.disk.state == 6) {
>
> I attached the patch that I'm finally using to this mail.
> Thank you very much for your help.
Great. Thanks for the confirmation.
This fix is in 3.3.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
Thread overview: 6+ messages
2013-08-22 13:20 Update to mdadm V3.2.5 => RAID starts to recover (reproducible) Andreas Baer
2013-08-26 5:52 ` NeilBrown
2013-08-29 9:55 ` Andreas Baer
2013-09-02 1:35 ` NeilBrown
2013-09-05 15:22 ` Andreas Baer
2013-09-09 2:39 ` NeilBrown