* A disk failure during the initial resync after create does not always suspend the resync to start the recovery
@ 2012-06-26 21:14 Ralph Berrett
2012-06-26 22:57 ` NeilBrown
0 siblings, 1 reply; 2+ messages in thread
From: Ralph Berrett @ 2012-06-26 21:14 UTC (permalink / raw)
To: linux-raid@vger.kernel.org
A disk failure during the initial resync after create does not always suspend the resync to start the recovery.
Steps:
1. Create multiple RAID6 arrays (in my case 8 arrays; this is a large storage system).
2. Create 2 spares (one spare in md1 and the other in md5).
3. Set two different "spare-group" values in /etc/mdadm/mdadm.conf so that md1-md4 and md5-md8 each share one of the two spares (a rough sketch of these commands follows the steps).
4. While the resync is still in progress, fail a disk in one of the arrays that does not have a spare (physically pulled).
5. The spare drive is moved to the affected array by the running mdadm --monitor daemon, but the "recovery" does not always start. Most of the time it waits for the "resync" to complete before starting the "recovery", but not always.
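For reference, a rough sketch of the setup implied by steps 1-3; the device names and member counts are placeholders, not the ones actually used:
# mdadm --create /dev/md1 --level=6 --raid-devices=10 --spare-devices=1 /dev/sd[a-k]
# mdadm --create /dev/md2 --level=6 --raid-devices=10 /dev/sd[l-u]
  (md3-md8 are created the same way, with the second spare given to md5)
# mdadm --monitor --scan --daemonise
The spare-group tags in mdadm.conf (quoted further down) are what allow the monitor to move a spare from md1 or md5 into any degraded array in the same group.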
Which is the expected behavior: should it stop the resync to do the recovery or not? If not, since these are fairly large arrays, the "resync" could take a while before the "recovery" even starts, leaving the system in a degraded state. From my experiments, the "resync" is left running more often than it is stopped.
The only workaround I have found is to run "echo idle > /sys/block/md2/md/sync_action", which will suspend the resync and then the recovery will start. This is intended as an embedded system, so this is not an optimal workaround.
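A minimal sketch of how that workaround could be automated (illustrative only; it assumes the standard md sysfs attributes degraded and sync_action):
#!/bin/sh
# For every md array that is degraded but still resyncing, interrupt the resync
# so md can switch to recovering onto whatever spare --monitor has moved in.
for d in /sys/block/md*/md; do
    if [ "$(cat "$d/degraded")" -gt 0 ] && [ "$(cat "$d/sync_action")" = "resync" ]; then
        echo idle > "$d/sync_action"
    fi
done
It is the same action as the manual echo above, just applied to every array that needs it.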
Emdebian 6.0.4
Kernel: 2.6.32
mdadm: v3.1.4 - 31st August 2010
Example: I failed a disk in md2 and then later one in md6. In both cases the spares from md1 and md5 (respectively) were moved into the degraded array, but md2 stopped the "resync" and started the "recovery" almost immediately, while md6 stayed in "resync" until it finished before starting the "recovery".
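The same distinction is visible directly in sysfs while it is happening; at this point one would expect roughly:
# cat /sys/block/md2/md/sync_action
recover
# cat /sys/block/md6/md/sync_action
resync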
# cat /etc/mdadm/mdadm.conf
DEVICE /dev/sd*[^0-9]
ARRAY /dev/md1 metadata=1.2 name=nl-emdebian:1 UUID=29156c78:d55cd2dd:07c02578:9238ec0a spare-group=group0
ARRAY /dev/md2 metadata=1.2 name=nl-emdebian:2 UUID=4e7531bb:8f297d64:d8c1aab8:6b711384 spare-group=group0
ARRAY /dev/md3 metadata=1.2 name=nl-emdebian:3 UUID=d14eb677:d9f8b9d9:7de9c5de:c6cee4c8 spare-group=group0
ARRAY /dev/md4 metadata=1.2 name=nl-emdebian:4 UUID=4ee94f31:65af2645:0f8b557d:58b4a203 spare-group=group0
ARRAY /dev/md5 metadata=1.2 name=nl-emdebian:5 UUID=fe6a448c:68b60591:a939b315:9dabc1d1 spare-group=group1
ARRAY /dev/md6 metadata=1.2 name=nl-emdebian:6 UUID=832328d7:1107804e:1dfc3a48:760a2341 spare-group=group1
ARRAY /dev/md7 metadata=1.2 name=nl-emdebian:7 UUID=231c4ff6:be348e5e:ff144e55:1cfa0c9f spare-group=group1
ARRAY /dev/md8 metadata=1.2 name=nl-emdebian:8 UUID=d2904af6:cbc6d409:8f1601a4:d1f2229e spare-group=group1
# mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Tue Jun 26 10:57:51 2012
Raid Level : raid6
Array Size : 7814090752 (7452.10 GiB 8001.63 GB)
Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
Raid Devices : 10
Total Devices : 11
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Jun 26 15:54:45 2012
State : active, degraded, recovering
Active Devices : 9
Working Devices : 10
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Rebuild Status : 0% complete
Name : nl-emdebian:2
UUID : 4e7531bb:8f297d64:d8c1aab8:6b711384
Events : 364
Number Major Minor RaidDevice State
10 67 160 0 spare rebuilding /dev/sdbg
1 68 112 1 active sync /dev/sdbt
2 68 128 2 active sync /dev/sdbu
3 67 80 3 active sync /dev/sdbb
4 67 48 4 active sync /dev/sdaz
5 67 64 5 active sync /dev/sdba
6 67 96 6 active sync /dev/sdbc
7 67 16 7 active sync /dev/sdax
8 67 32 8 active sync /dev/sday
9 66 144 9 active sync /dev/sdap
0 68 160 - faulty spare
# mdadm --detail /dev/md6
/dev/md6:
Version : 1.2
Creation Time : Tue Jun 26 10:57:58 2012
Raid Level : raid6
Array Size : 7814090752 (7452.10 GiB 8001.63 GB)
Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
Raid Devices : 10
Total Devices : 11
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Jun 26 15:54:04 2012
State : active, degraded, resyncing
Active Devices : 9
Working Devices : 10
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Rebuild Status : 2% complete
Name : nl-emdebian:6
UUID : 832328d7:1107804e:1dfc3a48:760a2341
Events : 336
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 65 144 3 active sync /dev/sdz
4 65 112 4 active sync /dev/sdx
5 65 128 5 active sync /dev/sdy
6 65 160 6 active sync /dev/sdaa
7 65 80 7 active sync /dev/sdv
8 65 96 8 active sync /dev/sdw
9 8 208 9 active sync /dev/sdn
0 8 80 - faulty spare
10 65 224 - spare /dev/sdae
Ralph Berrett
Senior Software Engineer | P&S Broadcast & Storage
Avid
65 Network Drive
Burlington, MA 01803
United States
ralph.berrett@avid.com
t 9786403674
We're Avid. Learn more at www.avid.com
* Re: A disk failure during the initial resync after create does not always suspend the resync to start the recovery
2012-06-26 21:14 A disk failure during the initial resync after create does not always suspend the resync to start the recovery Ralph Berrett
@ 2012-06-26 22:57 ` NeilBrown
0 siblings, 0 replies; 2+ messages in thread
From: NeilBrown @ 2012-06-26 22:57 UTC (permalink / raw)
To: Ralph Berrett; +Cc: linux-raid@vger.kernel.org
On Tue, 26 Jun 2012 21:14:54 +0000 Ralph Berrett <ralph.berrett@avid.com>
wrote:
> A disk failure during the initial resync after create does not always suspend the resync to start the recovery.
>
> Steps:
> 1. Create multiple RAID6 arrays (in my case 8 arrays; this is a large storage system).
> 2. Create 2 spares (one spare in md1 and the other in md5).
> 3. Set two different "spare-group" values in /etc/mdadm/mdadm.conf so that md1-md4 and md5-md8 each share one of the two spares.
> 4. While the resync is still in progress, fail a disk in one of the arrays that does not have a spare (physically pulled).
> 5. The spare drive is moved to the affected array by the running mdadm --monitor daemon, but the "recovery" does not always start. Most of the time it waits for the "resync" to complete before starting the "recovery", but not always.
>
> Which is the expected behavior: should it stop the resync to do the recovery or not? If not, since these are fairly large arrays, the "resync" could take a while before the "recovery" even starts, leaving the system in a degraded state. From my experiments, the "resync" is left running more often than it is stopped.
>
> The only workaround I have found is to run "echo idle > /sys/block/md2/md/sync_action", which will suspend the resync and then the recovery will start. This is intended as an embedded system, so this is not an optimal workaround.
>
> Emdebian 6.0.4
> Kernel: 2.6.32
> mdadm: v3.1.4 - 31st August 2010
>
I've never given a lot of thought to this scenario, so the way that it works
is simply how the different bits fall together, not anything deliberate.
If a device failed in an array which did not currently have a spare attached,
I would expect the resync to restart; when the spare gets moved over, the
resync would continue, and when it completes a recovery would start.
If a device failed in an array which already had a spare attached - I would
have to check the code to see what would happen but I can certainly imagine
that a recovery of that spare would start, and it may well resync the other
parity block at the same time.
It should be deterministic though - I can't see much room for any random
element.
As the initial sync of RAID6 isn't really needed anyway, it is clear that it
should be interrupted and the recovery performed instead.
However if a sync is happening after an unclean restart when a device fails,
it isn't clear to me what the preferred option is.
Allowing the sync to complete means your data will be protected from another
failure sooner.
Allowing the recovery to start immediately means that you will get all your
bandwidth to the array back sooner, and you'll be protected from double
failure sooner.
Maybe if the sync is less than half way, interrupt it; if it is more than half
way, let it complete?
The way I would 'fix' this would be to modify mdadm to write 'idle' to
'sync_action' at an appropriate time (after moving the spare over).
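For illustration only, here is a rough external approximation of that fix, hung off mdadm --monitor's PROGRAM hook (e.g. "PROGRAM /usr/local/sbin/md-spare-kick" in mdadm.conf). The handler path, the choice of event, and the half-way heuristic above are all assumptions, not anything mdadm does today:
#!/bin/sh
# Hypothetical handler for mdadm --monitor's PROGRAM hook; mdadm runs it as
#   handler EVENT DEVICE [SECOND-DEVICE]
# On a MoveSpare event, interrupt a resync in the degraded array that received the
# spare, but only while the resync is less than half done, so recovery starts sooner.
EVENT="$1"
shift
[ "$EVENT" = "MoveSpare" ] || exit 0

for dev in "$@"; do    # check both device arguments rather than assume their order
    SYS="/sys/block/$(basename "$dev")/md"
    [ -d "$SYS" ] || continue
    [ "$(cat "$SYS/degraded")" -gt 0 ] || continue
    [ "$(cat "$SYS/sync_action")" = "resync" ] || continue
    # sync_completed reads "done / total" in sectors, or "none" when nothing is running
    read completed slash total < "$SYS/sync_completed"
    if [ "$completed" != "none" ] && [ "$completed" -lt $(( total / 2 )) ]; then
        echo idle > "$SYS/sync_action"
    fi
done
Writing "idle" here is the same action as the manual workaround, just tied to the moment the monitor moves the spare.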
NeilBrown