Suspicious test failure - mdmon misses recovery events on loop devices

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Suspicious test failure - mdmon misses recovery events on loop devices
@ 2013-07-26 20:58 Martin Wilck
  2013-07-29  6:55 ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2013-07-26 20:58 UTC (permalink / raw)
  To: NeilBrown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3250 bytes --]

Hi Neil, everybody,

I am currently pulling my hair over strange failures I am observing. I
was trying to create a new unit test for DDF along the same lines as
09imsm-create-fail-rebuild. That's of course a very important test -
making sure that an array can actually recover from a disk failure.

The script does just that - create a container with 2 subarrays, fail a
disk, add a spare, and expect everything to be recovered after that.

What I find is that the recovery actually works, but sometimes the meta
data is broken after the test has finished. The added disk is shown in
"Rebuilding" state and/or one or both subarrays are considered
"degraded" although the kernel log clearly shows that the recovery
finished. The problem occurs almost always if the test is done on loop
devices, as in mdadm's "test" script. Just did another test, failed
10/10 attempts on loop devices. If I use LVM logical volumes instead (on
the same physical disk), the test never fails (0/10).

In the success case, the script prints only one line (its log file). In
the bad case, it will print some more lines of mdadm -E information. The
log file contains all the details.

I have come to the conclusion that if the failure occurs, mdmon simply
misses one or more state changes of the arrays and/or disks. For mdmon
to notice that the recovery has finished, it is crucial to see a
situation where sync_action is "idle", and had been "recover" before,
for both subarrays. This happens if I run the test on LVM, but not if I
run it on a loop device.

Thinking about it - what guarantee is there that mdmon catches a certain
kernel status change? If I read the code correctly, mdmon will only
catch it if
 (a) the status change occurs while mdmon is in the select() call, and
 (b) the status in sysfs doesn't change again between the return from
select() and mdmon reading the sysfs file contents.

I can see no guarantee that this always works, and with my loop device
test case I seem to have found a scenario where it actually doesn't. I
suppose that mdmon may be busy writing the DDF metadata while the kernel
event about finished recovery is arriving.

My first idea was that the the cause were the loop devices on my CentOS6
kernel not supporting O_DIRECT properly (recovery finishes almost
immediately in the page cache, perhaps too quickly for mdmon to notice),
but running a more recent kernel with proper O_DIRECT in the loop
device, I still see the problem, although the recovery takes longer now.

There is still a chance that I messed something up in DDF (I haven't
seen the problem with IMSM), but it isn't likely given that the test
always works fine on LVM. I am pretty much at my wit's end here and I'd
like to solicit some advice.

I'd definitely like to understand exactly what's going wrong here, but
it's very hard to debug because it's a timing issue involving the
kernel, mdadm, mdmon, and the manager. Adding debug code changes the
probability to hit the problem

Thanks for reading this far, I hope someone has an idea.

Martin

PS: In that context, reading mdmon-design.txt, is it allowed at all to
add dprintf() messages in the code path called by mdmon? That would also
affect some DDF methods where I currently have lots of debug code.

[-- Attachment #2: 22-ddf-fail --]
[-- Type: text/plain, Size: 1718 bytes --]

#! /bin/bash
mdadm=/home/martin/mdadm/mdadm
n=1
log=$(printf /var/tmp/md-%04d.log $n)
while [ -e $log ]; do
    n=$((n+1))
    log=$(printf /var/tmp/md-%04d.log $n)
done
echo log=$log

exec 3>&2
exec 1>$log 2>&1

TRAP=
trap 'eval "$TRAP"' 0

set -x
setup_lo() {
    for i in `seq 8 13`; do
	mknod /dev/loop$i b 7 $i
	losetup /dev/loop$i /var/tmp/mdtest$i
	eval "dev$i=/dev/loop$i"
    done >/dev/null 2>&1
    TRAP="cleanup_lo; $TRAP"
}

cleanup_lo() {
    for i in `seq 8 13`; do
	eval "losetup -d \$dev$i"
    done >/dev/null 2>&1
}

setup_lvm() {
    for i in `seq 8 13`; do
	eval "dev$i=/dev/tosh/rd$((i-7))"
    done
}

#setup_lvm
setup_lo

# for testing with IMSM instead of DDF
#export IMSM_NO_PLATFORM=1
#export IMSM_DEVNAME_AS_SERIAL=1
#export IMSM_TEST_OROM=1
#export MDADM_EXPERIMENTAL=1

dmesg -c >/dev/null
#date +"CREATE %H:%M:%S.%N"
$mdadm --quiet -CR /dev/md/ddf -e ddf -n 4 $dev8 $dev9 $dev10 $dev11 --auto=yes
$mdadm --quiet -CR /dev/md/vol1 $dev8 $dev9 $dev10 $dev11 -n 4 -l 5 -z 4000 -c 64 --auto=yes
$mdadm --quiet -CR /dev/md/vol0 $dev8 $dev9 $dev10 $dev11 -n 4 -l 10 -z 9000 -c 64 --auto=yes
$mdadm --quiet --wait /dev/md/vol[01]
#date +"WAIT %H:%M:%S.%N"

# get PID of mdmon for tracing
#for x in $(ps --no-heading -Lo lwp -C mdmon); do trace_cmd="$trace_cmd -p $x"; done

$mdadm --quiet --fail /dev/md/vol0 $dev11
#date +"FAIL %H:%M:%S.%N"

#strace -tt -x -o /tmp/md.str $trace_cmd &

$mdadm --quiet --add /dev/md/ddf $dev12
#date +"ADD %H:%M:%S.%N"

$mdadm --quiet --wait /dev/md/vol[01]
#date +"WAIT %H:%M:%S.%N"

cat /proc/mdstat
dmesg -c
$mdadm -E $dev11
$mdadm -E $dev12
$mdadm -Ss

# if strace was used above
#cat /tmp/md.str
set +x
egrep -i 'error|failed|rebuilding|Degraded,' $log >&3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Suspicious test failure - mdmon misses recovery events on loop devices
  2013-07-26 20:58 Suspicious test failure - mdmon misses recovery events on loop devices Martin Wilck
@ 2013-07-29  6:55 ` NeilBrown
  2013-07-29 20:39   ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2013-07-29  6:55 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4281 bytes --]

On Fri, 26 Jul 2013 22:58:05 +0200 Martin Wilck <mwilck@arcor.de> wrote:

> Hi Neil, everybody,
> 
> I am currently pulling my hair over strange failures I am observing. I
> was trying to create a new unit test for DDF along the same lines as
> 09imsm-create-fail-rebuild. That's of course a very important test -
> making sure that an array can actually recover from a disk failure.
> 
> The script does just that - create a container with 2 subarrays, fail a
> disk, add a spare, and expect everything to be recovered after that.
> 
> What I find is that the recovery actually works, but sometimes the meta
> data is broken after the test has finished. The added disk is shown in
> "Rebuilding" state and/or one or both subarrays are considered
> "degraded" although the kernel log clearly shows that the recovery
> finished. The problem occurs almost always if the test is done on loop
> devices, as in mdadm's "test" script. Just did another test, failed
> 10/10 attempts on loop devices. If I use LVM logical volumes instead (on
> the same physical disk), the test never fails (0/10).
> 
> In the success case, the script prints only one line (its log file). In
> the bad case, it will print some more lines of mdadm -E information. The
> log file contains all the details.
> 
> I have come to the conclusion that if the failure occurs, mdmon simply
> misses one or more state changes of the arrays and/or disks. For mdmon
> to notice that the recovery has finished, it is crucial to see a
> situation where sync_action is "idle", and had been "recover" before,
> for both subarrays. This happens if I run the test on LVM, but not if I
> run it on a loop device.
> 
> Thinking about it - what guarantee is there that mdmon catches a certain
> kernel status change? If I read the code correctly, mdmon will only
> catch it if
>  (a) the status change occurs while mdmon is in the select() call, and
>  (b) the status in sysfs doesn't change again between the return from
> select() and mdmon reading the sysfs file contents.
> 
> I can see no guarantee that this always works, and with my loop device
> test case I seem to have found a scenario where it actually doesn't. I
> suppose that mdmon may be busy writing the DDF metadata while the kernel
> event about finished recovery is arriving.
> 
> My first idea was that the the cause were the loop devices on my CentOS6
> kernel not supporting O_DIRECT properly (recovery finishes almost
> immediately in the page cache, perhaps too quickly for mdmon to notice),
> but running a more recent kernel with proper O_DIRECT in the loop
> device, I still see the problem, although the recovery takes longer now.
> 
> There is still a chance that I messed something up in DDF (I haven't
> seen the problem with IMSM), but it isn't likely given that the test
> always works fine on LVM. I am pretty much at my wit's end here and I'd
> like to solicit some advice.
> 
> I'd definitely like to understand exactly what's going wrong here, but
> it's very hard to debug because it's a timing issue involving the
> kernel, mdadm, mdmon, and the manager. Adding debug code changes the
> probability to hit the problem
> 
> Thanks for reading this far, I hope someone has an idea.

Hi Martin.

 I don't think the state change needs to happen while mdmon is in the select
 call.  It just need to happen between one call to read_and_act, and the next.
 And everything happens between one call and the next...

 If sync_action is 'recovery' one time and then something else that isn't
 'idle' the next time, then that would cause the transition to get lost.
 Can that ever happen?  Do you see a particular transition that bypasses
 'idle'?
 It is possible there is some race here...

 I'll try out your test script can see if I can reproduce  it.



> 
> Martin
> 
> PS: In that context, reading mdmon-design.txt, is it allowed at all to
> add dprintf() messages in the code path called by mdmon? That would also
> affect some DDF methods where I currently have lots of debug code.

Yes, you can have dprintf messages anywhere.  However if debugging is
enabled, then I don't promise that mdmon will even try to survive low memory
conditions.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Suspicious test failure - mdmon misses recovery events on loop devices
  2013-07-29  6:55 ` NeilBrown
@ 2013-07-29 20:39   ` Martin Wilck
  2013-07-29 20:42     ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2013-07-29 20:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2241 bytes --]

On 07/29/2013 08:55 AM, NeilBrown wrote:

> Hi Martin.
> 
>  I don't think the state change needs to happen while mdmon is in the select
>  call.  It just need to happen between one call to read_and_act, and the next.
>  And everything happens between one call and the next...
> 
>  If sync_action is 'recovery' one time and then something else that isn't
>  'idle' the next time, then that would cause the transition to get lost.
>  Can that ever happen?  Do you see a particular transition that bypasses
>  'idle'?
>  It is possible there is some race here...
> 
>  I'll try out your test script can see if I can reproduce  it.

I don't see "recover" followed by anything else but "idle". But what I
do observe is that the "recover" status isn't seen at all.

I've attached a comparison of good and bad case for the same test.

BAD CASE: mdmon processes the metadata update for the first member and
writes the metadata. when it calls read_and_act after that, the recovery
on the first array is already finished, and it will call set_disk, and
call sync_metadata() again. This double metadata write takes a long
time. Meanwhile, the manager sent the update for the second member and
started the recovery on it. When mdmon comes down to processing this
update, the recovery on the 2nd array is already finished, and it never
sees "recover" or "frozen" state on it. Consequently, it doesn't realize
that there ever was a recovery.

The example is somehow pathological because the test arrays are
unrealistically small. In normal situations we'd expect recovery to take
much longer than 2 meta data writes. However, it doesn't feel good to
know that this can happen.

My current idea to solve this is yet another separate thread just for
monitoring kernel state changes. Don't have it ready yet, though.

Martin

> 
> 
> 
>>
>> Martin
>>
>> PS: In that context, reading mdmon-design.txt, is it allowed at all to
>> add dprintf() messages in the code path called by mdmon? That would also
>> affect some DDF methods where I currently have lots of debug code.
> 
> Yes, you can have dprintf messages anywhere.  However if debugging is
> enabled, then I don't promise that mdmon will even try to survive low memory
> conditions.
> 
> NeilBrown

[-- Attachment #2: fail-and-add_goodbad.txt --]
[-- Type: text/plain, Size: 804 bytes --]

"*" means monitor wakeup
array (1) is the RAID10, (0) the RAID5

GOOD (2)

Monitor					Manager
					activate_spare(1)
* 1:frozen 0: idle
* process_update(1)			
  ...					activate_spare(0)
  sync_metadata				
  1: recover 0: frozen			
     	       				remove_old(md125)
*  process_update(0)
   sync_metadata
  1: recover 0: recover
* 1: recover 0: recover
					remove_old(md126)
...
* 1: idle 0: recover
  set_disk
  sync_metadata
...
* 1: clean 0: clean
  set_disk
  sync_metadata

BAD

Monitor					Manager
					activate_spare(1)
* 1: frozen 0: idle
* 
					remove_old(md125)
  1: recover 0: idle
* process_update(1)			
  ...					activate_spare(0)
  sync_metadata
					remove_old(md126)
  1: idle 0: idle
  set_disk
  sync_metadata
* process_update(0)
  sync_metadata
  1: idle 0: idle  (3s later)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Suspicious test failure - mdmon misses recovery events on loop devices
  2013-07-29 20:39   ` Martin Wilck
@ 2013-07-29 20:42     ` Martin Wilck
  2013-07-30  0:42       ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2013-07-29 20:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid


> My current idea to solve this is yet another separate thread just for
> monitoring kernel state changes. Don't have it ready yet, though.

Another idea would be in manage_member, after queueing the metadata
update and waking up the monitor, to wait for the metadata to finish
processing before actually starting the recovery (writing "recover" to
sync_action).

Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Suspicious test failure - mdmon misses recovery events on loop devices
  2013-07-29 20:42     ` Martin Wilck
@ 2013-07-30  0:42       ` NeilBrown
  2013-07-30 21:16         ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2013-07-30  0:42 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck <mwilck@arcor.de> wrote:

> 
> > My current idea to solve this is yet another separate thread just for
> > monitoring kernel state changes. Don't have it ready yet, though.
> 
> Another idea would be in manage_member, after queueing the metadata
> update and waking up the monitor, to wait for the metadata to finish
> processing before actually starting the recovery (writing "recover" to
> sync_action).
> 
> Martin

I hope an extra thread won't be necessary :-)

I think that manage_member is the place to fix this.  However it might be
even simpler than you suggest.

We currently have

		replace_array(container, a, newa);
		sysfs_set_str(&a->info, NULL, "sync_action", "recover");

monitor subsequently takes that 'newa', looks at 'sync_action', see that it
is 'idle' and assume that the recover never happened.
Suppose we change it to:

		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0)
		        newa->prev_action = newa->curr_action = recovery;
		replace_array(container, a, newa);

Then it wouldn't matter if monitor never saw the 'recovery' state as manager
explicitly told it that recovery had started.

Could you try that?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Suspicious test failure - mdmon misses recovery events on loop devices
  2013-07-30  0:42       ` NeilBrown
@ 2013-07-30 21:16         ` Martin Wilck
  2013-07-30 21:18           ` [PATCH 00/10] Two bug fixes and a lot of debug code mwilck
                             ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Martin Wilck @ 2013-07-30 21:16 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On 07/30/2013 02:42 AM, NeilBrown wrote:
> On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck <mwilck@arcor.de> wrote:
> 
>>
>>> My current idea to solve this is yet another separate thread just for
>>> monitoring kernel state changes. Don't have it ready yet, though.
>>
>> Another idea would be in manage_member, after queueing the metadata
>> update and waking up the monitor, to wait for the metadata to finish
>> processing before actually starting the recovery (writing "recover" to
>> sync_action).
>>
>> Martin
> 
> I hope an extra thread won't be necessary :-)

I think the general problem that mdmon may be busy writing to disk while
something changes in the kernel is real. But introducing an extra thread
would make things even more complex as they are now, so it might be
something to avoid.

> I think that manage_member is the place to fix this.  However it might be
> even simpler than you suggest.
> 
> We currently have
> 
> 		replace_array(container, a, newa);
> 		sysfs_set_str(&a->info, NULL, "sync_action", "recover");
> 
> monitor subsequently takes that 'newa', looks at 'sync_action', see that it
> is 'idle' and assume that the recover never happened.
> Suppose we change it to:
> 
> 		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0)
> 		        newa->prev_action = newa->curr_action = recovery;
> 		replace_array(container, a, newa);
> 
> Then it wouldn't matter if monitor never saw the 'recovery' state as manager
> explicitly told it that recovery had started.
> 
> Could you try that?

Ingenious idea :-) Unfortunately it isn't sufficient. However this PLUS
waiting for the metadata upate to finish makes my test succeed reliably
(10/10, in the previous failure scenario). I'll send in the current
status of patches in a minute.

Martin
> 
> Thanks,
> NeilBrown


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 00/10] Two bug fixes and a lot of debug code
  2013-07-30 21:16         ` Martin Wilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-31  3:10             ` NeilBrown
  2013-07-30 21:18           ` [PATCH 01/10] DDF: ddf_activate_spare: bugfix for 62ff3c40 mwilck
                             ` (9 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

This patch series fixes the "suspicious test failure" mentioned
in an earlier posting to linux-raid, and another bug introduced
by myself. Besides, it contains a lot of DEBUG code that was 
necessary to understand the "suspicious" problem.

Verified that the 09 and 10 unit test series are not broken.

Martin Wilck (10):
  DDF: ddf_activate_spare: bugfix for 62ff3c40
  DDF: log disk status changes more nicely
  DDF: ddf_process_update: log offsets for conf changes
  DDF: load_ddf_header: more error logging
  DDF: ddf_set_disk: add some debug messages
  monitor: read_and_act: log status when called
  mdmon: wait_and_act: fix debug message for SIGUSR1
  mdmon: manage_member: debug messages for array state
  mdmon: manage_member: fix race condition during slow meta data writes
  tests/10ddf-create-fail-rebuild: new unit test for DDF

 managemon.c                     |   11 +++++-
 monitor.c                       |   25 +++++++++++--
 super-ddf.c                     |   75 +++++++++++++++++++++++++++++---------
 tests/10ddf-create-fail-rebuild |   77 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 166 insertions(+), 22 deletions(-)
 create mode 100644 tests/10ddf-create-fail-rebuild

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 00/10] Two bug fixes and a lot of debug code
  2013-07-30 21:18           ` [PATCH 00/10] Two bug fixes and a lot of debug code mwilck
@ 2013-07-31  3:10             ` NeilBrown
  0 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2013-07-31  3:10 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]

On Tue, 30 Jul 2013 23:18:24 +0200 mwilck@arcor.de wrote:

> From: Martin Wilck <mwilck@arcor.de>
> 
> This patch series fixes the "suspicious test failure" mentioned
> in an earlier posting to linux-raid, and another bug introduced
> by myself. Besides, it contains a lot of DEBUG code that was 
> necessary to understand the "suspicious" problem.
> 
> Verified that the 09 and 10 unit test series are not broken.
> 
> Martin Wilck (10):
>   DDF: ddf_activate_spare: bugfix for 62ff3c40
>   DDF: log disk status changes more nicely
>   DDF: ddf_process_update: log offsets for conf changes
>   DDF: load_ddf_header: more error logging
>   DDF: ddf_set_disk: add some debug messages
>   monitor: read_and_act: log status when called
>   mdmon: wait_and_act: fix debug message for SIGUSR1
>   mdmon: manage_member: debug messages for array state
>   mdmon: manage_member: fix race condition during slow meta data writes
>   tests/10ddf-create-fail-rebuild: new unit test for DDF
> 
>  managemon.c                     |   11 +++++-
>  monitor.c                       |   25 +++++++++++--
>  super-ddf.c                     |   75 +++++++++++++++++++++++++++++---------
>  tests/10ddf-create-fail-rebuild |   77 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 166 insertions(+), 22 deletions(-)
>  create mode 100644 tests/10ddf-create-fail-rebuild

Thanks. All applied with one small change as already mentioned.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/10] DDF: ddf_activate_spare: bugfix for 62ff3c40
  2013-07-30 21:16         ` Martin Wilck
  2013-07-30 21:18           ` [PATCH 00/10] Two bug fixes and a lot of debug code mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 02/10] DDF: log disk status changes more nicely mwilck
                             ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

Move the check for good drives in the dl loop - otherwise dl
may be NULL and mdmon may crash.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index 683f969..46262ce 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -4774,13 +4774,6 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 	/* For each slot, if it is not working, find a spare */
 	dl = ddf->dlist;
 	for (i = 0; i < a->info.array.raid_disks; i++) {
-		be16 state = ddf->phys->entries[dl->pdnum].state;
-		if (be16_and(state,
-			     cpu_to_be16(DDF_Failed|DDF_Missing)) ||
-		    !be16_and(state,
-			      cpu_to_be16(DDF_Online)))
-			continue;
-
 		for (d = a->info.devs ; d ; d = d->next)
 			if (d->disk.raid_disk == i)
 				break;
@@ -4798,6 +4791,13 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 			int is_dedicated = 0;
 			struct extent *ex;
 			unsigned int j;
+			be16 state = ddf->phys->entries[dl->pdnum].state;
+			if (be16_and(state,
+				     cpu_to_be16(DDF_Failed|DDF_Missing)) ||
+			    !be16_and(state,
+				      cpu_to_be16(DDF_Online)))
+				continue;
+
 			/* If in this array, skip */
 			for (d2 = a->info.devs ; d2 ; d2 = d2->next)
 				if (d2->state_fd >= 0 &&
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/10] DDF: log disk status changes more nicely
  2013-07-30 21:16         ` Martin Wilck
  2013-07-30 21:18           ` [PATCH 00/10] Two bug fixes and a lot of debug code mwilck
  2013-07-30 21:18           ` [PATCH 01/10] DDF: ddf_activate_spare: bugfix for 62ff3c40 mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 03/10] DDF: ddf_process_update: log offsets for conf changes mwilck
                             ` (7 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

In particular, include refnum for better tracking. This makes
it a little easier for humans to track what happened to which disk.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index 46262ce..20f4cf4 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -4203,7 +4203,8 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 			ddf_set_updates_pending(ddf);
 	}
 
-	dprintf("ddf: set_disk %d to %x\n", n, state);
+	dprintf("ddf: set_disk %d (%08x) to %x\n", n,
+		be32_to_cpu(dl->disk.refnum), state);
 
 	/* Now we need to check the state of the array and update
 	 * virtual_disk.entries[n].state.
@@ -4803,7 +4804,9 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 				if (d2->state_fd >= 0 &&
 				    d2->disk.major == dl->major &&
 				    d2->disk.minor == dl->minor) {
-					dprintf("%x:%x already in array\n", dl->major, dl->minor);
+					dprintf("%x:%x (%08x) already in array\n",
+						dl->major, dl->minor,
+						be32_to_cpu(dl->disk.refnum));
 					break;
 				}
 			if (d2)
@@ -4883,8 +4886,9 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 			di->container_member = dl->pdnum;
 			di->next = rv;
 			rv = di;
-			dprintf("%x:%x to be %d at %llu\n", dl->major, dl->minor,
-				i, pos);
+			dprintf("%x:%x (%08x) to be %d at %llu\n",
+				dl->major, dl->minor,
+				be32_to_cpu(dl->disk.refnum), i, pos);
 
 			break;
 		}
@@ -4945,6 +4949,9 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 		}
 		vc->phys_refnum[i_prim] = ddf->phys->entries[dl->pdnum].refnum;
 		LBA_OFFSET(ddf, vc)[i_prim] = cpu_to_be64(di->data_offset);
+		dprintf("BVD %u gets %u: %08x at %llu\n", i_sec, i_prim,
+			be32_to_cpu(vc->phys_refnum[i_prim]),
+			be64_to_cpu(LBA_OFFSET(ddf, vc)[i_prim]));
 	}
 	*updates = mu;
 	return rv;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/10] DDF: ddf_process_update: log offsets for conf changes
  2013-07-30 21:16         ` Martin Wilck
                             ` (2 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 02/10] DDF: log disk status changes more nicely mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 04/10] DDF: load_ddf_header: more error logging mwilck
                             ` (6 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

I needed this for tracking a bug with wrong offsets after array
creation.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index 20f4cf4..b639f07 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -4495,10 +4495,27 @@ static void ddf_process_update(struct supertype *st,
 			 * fields
 			 */
 			unsigned int i;
+			unsigned int k;
 			copy_matching_bvd(ddf, &vcl->conf, update);
-			for (i = 1; i < vc->sec_elmnt_count; i++)
+			for (k = 0; k < be16_to_cpu(vc->prim_elmnt_count); k++)
+				dprintf("BVD %u has %08x at %llu\n", 0,
+					be32_to_cpu(vcl->conf.phys_refnum[k]),
+					be64_to_cpu(LBA_OFFSET(ddf,
+							       &vcl->conf)[k]));
+			for (i = 1; i < vc->sec_elmnt_count; i++) {
 				copy_matching_bvd(ddf, vcl->other_bvds[i-1],
 						  update);
+				for (k = 0; k < be16_to_cpu(
+					     vc->prim_elmnt_count); k++)
+					dprintf("BVD %u has %08x at %llu\n", i,
+						be32_to_cpu
+						(vcl->other_bvds[i-1]->
+						 phys_refnum[k]),
+						be64_to_cpu
+						(LBA_OFFSET
+						 (ddf,
+						  vcl->other_bvds[i-1])[k]));
+			}
 		} else {
 			/* A new VD_CONF */
 			unsigned int i;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/10] DDF: load_ddf_header: more error logging
  2013-07-30 21:16         ` Martin Wilck
                             ` (3 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 03/10] DDF: ddf_process_update: log offsets for conf changes mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 05/10] DDF: ddf_set_disk: add some debug messages mwilck
                             ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

Try to determine problem if load_ddf_header fails. May be useful
for determining compatibility problems with Fake RAID BIOSes.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index b639f07..f55cc11 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -754,18 +754,24 @@ static int load_ddf_header(int fd, unsigned long long lba,
 	if (read(fd, hdr, 512) != 512)
 		return 0;
 
-	if (!be32_eq(hdr->magic, DDF_HEADER_MAGIC))
+	if (!be32_eq(hdr->magic, DDF_HEADER_MAGIC)) {
+		pr_err("%s: bad header magic\n", __func__);
 		return 0;
-	if (!be32_eq(calc_crc(hdr, 512), hdr->crc))
+	}
+	if (!be32_eq(calc_crc(hdr, 512), hdr->crc)) {
+		pr_err("%s: bad CRC\n", __func__);
 		return 0;
+	}
 	if (memcmp(anchor->guid, hdr->guid, DDF_GUID_LEN) != 0 ||
 	    memcmp(anchor->revision, hdr->revision, 8) != 0 ||
 	    !be64_eq(anchor->primary_lba, hdr->primary_lba) ||
 	    !be64_eq(anchor->secondary_lba, hdr->secondary_lba) ||
 	    hdr->type != type ||
 	    memcmp(anchor->pad2, hdr->pad2, 512 -
-		   offsetof(struct ddf_header, pad2)) != 0)
+		   offsetof(struct ddf_header, pad2)) != 0) {
+		pr_err("%s: header mismatch\n", __func__);
 		return 0;
+	}
 
 	/* Looks good enough to me... */
 	return 1;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/10] DDF: ddf_set_disk: add some debug messages
  2013-07-30 21:16         ` Martin Wilck
                             ` (4 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 04/10] DDF: load_ddf_header: more error logging mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 06/10] monitor: read_and_act: log status when called mwilck
                             ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

Adds more verbose debugging in ddf_set_disk, to understand failures
better.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 super-ddf.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index f55cc11..744d52b 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -4152,7 +4152,8 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 	int pd;
 	struct mdinfo *mdi;
 	struct dl *dl;
-
+	
+	dprintf("%s: %d to %x\n", __func__, n, state);
 	if (vc == NULL) {
 		dprintf("ddf: cannot find instance %d!!\n", inst);
 		return;
@@ -4161,8 +4162,11 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 	for (mdi = a->info.devs; mdi; mdi = mdi->next)
 		if (mdi->disk.raid_disk == n)
 			break;
-	if (!mdi)
+	if (!mdi) {
+		pr_err("%s: cannot find raid disk %d\n",
+		       __func__, n);
 		return;
+	}
 
 	/* and find the 'dl' entry corresponding to that. */
 	for (dl = ddf->dlist; dl; dl = dl->next)
@@ -4170,8 +4174,12 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 		    mdi->disk.major == dl->major &&
 		    mdi->disk.minor == dl->minor)
 			break;
-	if (!dl)
+	if (!dl) {
+		pr_err("%s: cannot find raid disk %d (%d/%d)\n",
+		       __func__, n,
+		       mdi->disk.major, mdi->disk.minor);
 		return;
+	}
 
 	pd = find_phys(ddf, vc->phys_refnum[n_bvd]);
 	if (pd < 0 || pd != dl->pdnum) {
@@ -4209,8 +4217,9 @@ static void ddf_set_disk(struct active_array *a, int n, int state)
 			ddf_set_updates_pending(ddf);
 	}
 
-	dprintf("ddf: set_disk %d (%08x) to %x\n", n,
-		be32_to_cpu(dl->disk.refnum), state);
+	dprintf("ddf: set_disk %d (%08x) to %x->%02x\n", n,
+		be32_to_cpu(dl->disk.refnum), state,
+		be16_to_cpu(ddf->phys->entries[pd].state));
 
 	/* Now we need to check the state of the array and update
 	 * virtual_disk.entries[n].state.
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/10] monitor: read_and_act: log status when called
  2013-07-30 21:16         ` Martin Wilck
                             ` (5 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 05/10] DDF: ddf_set_disk: add some debug messages mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-31  2:59             ` NeilBrown
  2013-07-30 21:18           ` [PATCH 07/10] mdmon: wait_and_act: fix debug message for SIGUSR1 mwilck
                             ` (3 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

read_and_act() currently prints a debug message only very late.
Print the status seen by mdmon right away, to track mdmon's
actions more closely. Add a time stamp to observe long delays
between read_and_act calls, e.g. caused by meta data writes.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 monitor.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/monitor.c b/monitor.c
index 67b40b1..b208168 100644
--- a/monitor.c
+++ b/monitor.c
@@ -234,6 +234,7 @@ static int read_and_act(struct active_array *a)
 	struct mdinfo *mdi;
 	int ret = 0;
 	int count = 0;
+	struct timeval tv;
 
 	a->next_state = bad_word;
 	a->next_action = bad_action;
@@ -257,6 +258,17 @@ static int read_and_act(struct active_array *a)
 			mdi->curr_state = read_dev_state(mdi->state_fd);
 		}
 	}
+	
+	gettimeofday(&tv, NULL);
+	fprintf(stderr, "%s(%d): %ld.%06ld state:%s prev:%s action:%s prev: %s start:%llu\n",
+		__func__, a->info.container_member,
+		tv.tv_sec, tv.tv_usec,
+		array_states[a->curr_state],
+		array_states[a->prev_state],
+		sync_actions[a->curr_action],
+		sync_actions[a->prev_action],
+		a->info.resync_start
+		);
 
 	if (a->curr_state > inactive &&
 	    a->prev_state == inactive) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 06/10] monitor: read_and_act: log status when called
  2013-07-30 21:18           ` [PATCH 06/10] monitor: read_and_act: log status when called mwilck
@ 2013-07-31  2:59             ` NeilBrown
  2013-07-31  5:28               ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2013-07-31  2:59 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1542 bytes --]

On Tue, 30 Jul 2013 23:18:30 +0200 mwilck@arcor.de wrote:

> From: Martin Wilck <mwilck@arcor.de>
> 
> read_and_act() currently prints a debug message only very late.
> Print the status seen by mdmon right away, to track mdmon's
> actions more closely. Add a time stamp to observe long delays
> between read_and_act calls, e.g. caused by meta data writes.
> 
> Signed-off-by: Martin Wilck <mwilck@arcor.de>
> ---
>  monitor.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 67b40b1..b208168 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -234,6 +234,7 @@ static int read_and_act(struct active_array *a)
>  	struct mdinfo *mdi;
>  	int ret = 0;
>  	int count = 0;
> +	struct timeval tv;
>  
>  	a->next_state = bad_word;
>  	a->next_action = bad_action;
> @@ -257,6 +258,17 @@ static int read_and_act(struct active_array *a)
>  			mdi->curr_state = read_dev_state(mdi->state_fd);
>  		}
>  	}
> +	
> +	gettimeofday(&tv, NULL);
> +	fprintf(stderr, "%s(%d): %ld.%06ld state:%s prev:%s action:%s prev: %s start:%llu\n",
> +		__func__, a->info.container_member,
> +		tv.tv_sec, tv.tv_usec,
> +		array_states[a->curr_state],
> +		array_states[a->prev_state],
> +		sync_actions[a->curr_action],
> +		sync_actions[a->prev_action],
> +		a->info.resync_start
> +		);

I hope this was meant to be "dprintf" ???

I've made that change.

NeilBrown


>  
>  	if (a->curr_state > inactive &&
>  	    a->prev_state == inactive) {


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 06/10] monitor: read_and_act: log status when called
  2013-07-31  2:59             ` NeilBrown
@ 2013-07-31  5:28               ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2013-07-31  5:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 07/31/2013 04:59 AM, NeilBrown wrote:
>  	}
>> +	
>> +	gettimeofday(&tv, NULL);
>> +	fprintf(stderr, "%s(%d): %ld.%06ld state:%s prev:%s action:%s prev: %s start:%llu\n",
>> +		__func__, a->info.container_member,
>> +		tv.tv_sec, tv.tv_usec,
>> +		array_states[a->curr_state],
>> +		array_states[a->prev_state],
>> +		sync_actions[a->curr_action],
>> +		sync_actions[a->prev_action],
>> +		a->info.resync_start
>> +		);
> 
> I hope this was meant to be "dprintf" ???
> 
> I've made that change.

OOps sorry. That was part of a (failed) attempt of mine to print ONLY
this message. I forgot to undo it later.

Thanks a lot for spotting it.

Martin



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 07/10] mdmon: wait_and_act: fix debug message for SIGUSR1
  2013-07-30 21:16         ` Martin Wilck
                             ` (6 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 06/10] monitor: read_and_act: log status when called mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 08/10] mdmon: manage_member: debug messages for array state mwilck
                             ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

Correctly print out wake reason if it was a signal. Previous code
would print misleading select events (pselect(2) man page says the
fdsets become undefined in case of error).

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 monitor.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index b208168..571698a 100644
--- a/monitor.c
+++ b/monitor.c
@@ -640,10 +640,17 @@ static int wait_and_act(struct supertype *container, int nowait)
 		monitor_loop_cnt |= 1;
 		rv = pselect(maxfd+1, NULL, NULL, &rfds, &ts, &set);
 		monitor_loop_cnt += 1;
-		if (rv == -1 && errno == EINTR)
-			rv = 0;
+		if (rv == -1) {
+			if (errno == EINTR) {
+				rv = 0;
+				dprintf("monitor: caught signal\n");
+			} else
+				dprintf("monitor: error %d in pselect\n",
+					errno);
+		}
 		#ifdef DEBUG
-		dprint_wake_reasons(&rfds);
+		else
+			dprint_wake_reasons(&rfds);
 		#endif
 		container->retry_soon = 0;
 	}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/10] mdmon: manage_member: debug messages for array state
  2013-07-30 21:16         ` Martin Wilck
                             ` (7 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 07/10] mdmon: wait_and_act: fix debug message for SIGUSR1 mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 09/10] mdmon: manage_member: fix race condition during slow meta data writes mwilck
  2013-07-30 21:18           ` [PATCH 10/10] tests/10ddf-create-fail-rebuild: new unit test for DDF mwilck
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

Add debug messages to watch the manager's steps.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 managemon.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/managemon.c b/managemon.c
index f50f954..a655108 100644
--- a/managemon.c
+++ b/managemon.c
@@ -518,6 +518,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 		/* prevent the kernel from activating the disk(s) before we
 		 * finish adding them
 		 */
+		dprintf("%s: freezing %s\n", __func__,  a->info.sys_name);
 		sysfs_set_str(&a->info, NULL, "sync_action", "frozen");
 
 		/* Add device to array and set offset/size/slot.
@@ -536,6 +537,8 @@ static void manage_member(struct mdstat_ent *mdstat,
 		updates = NULL;
 		replace_array(container, a, newa);
 		sysfs_set_str(&a->info, NULL, "sync_action", "recover");
+		dprintf("%s: recovery started on %s\n", __func__,
+			a->info.sys_name);
  out:
 		while (newdev) {
 			d = newdev->next;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/10] mdmon: manage_member: fix race condition during slow meta data writes
  2013-07-30 21:16         ` Martin Wilck
                             ` (8 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 08/10] mdmon: manage_member: debug messages for array state mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-30 21:18           ` [PATCH 10/10] tests/10ddf-create-fail-rebuild: new unit test for DDF mwilck
  10 siblings, 0 replies; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

In order to track kernel state changes, the monitor needs to
notice changes in sysfs. If the changes are transient, and the
monitor is busy writing meta data, it can happen that the changes
are missed. This will cause the meta data to be inconsistent with
the real state of the array.

I can reproduce this in  a test scenario with a DDF container and
two subarrays, where I set a disk to "failed" and then add a global
hot-spare. On a typical MD test setup with loop devices, I can
reliably reproduce a failure where the metadata show degraded members
although the kernel finished the recovery successfully.

This patch fixes this problem by applying two changes. First, when
a metadata update is queued, wait until it is certain that the monitor
actually applied these meta data (the for loop is actually needed to
avoid failures completely in my test case). Second, after triggering the
recovery, set prev_state of the changed array to "recover", in case
the monitor misses the transient "recover" state.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 managemon.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/managemon.c b/managemon.c
index a655108..40c863f 100644
--- a/managemon.c
+++ b/managemon.c
@@ -535,8 +535,14 @@ static void manage_member(struct mdstat_ent *mdstat,
 		}
 		queue_metadata_update(updates);
 		updates = NULL;
+		while (update_queue_pending || update_queue) {
+			check_update_queue(container);
+			usleep(15*1000);
+		}
 		replace_array(container, a, newa);
-		sysfs_set_str(&a->info, NULL, "sync_action", "recover");
+		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover")
+		    == 0)
+			newa->prev_action = recover;
 		dprintf("%s: recovery started on %s\n", __func__,
 			a->info.sys_name);
  out:
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/10] tests/10ddf-create-fail-rebuild: new unit test for DDF
  2013-07-30 21:16         ` Martin Wilck
                             ` (9 preceding siblings ...)
  2013-07-30 21:18           ` [PATCH 09/10] mdmon: manage_member: fix race condition during slow meta data writes mwilck
@ 2013-07-30 21:18           ` mwilck
  2013-07-31  5:36             ` [PATCH] tests/env-ddf-template: helper for new unit test mwilck
  10 siblings, 1 reply; 22+ messages in thread
From: mwilck @ 2013-07-30 21:18 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

This test adds a new unit test similar to 009imsm-create-fail-rebuild.
With the previous patches, it actually succeeds on my system.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 tests/10ddf-create-fail-rebuild |   77 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 77 insertions(+), 0 deletions(-)
 create mode 100644 tests/10ddf-create-fail-rebuild

diff --git a/tests/10ddf-create-fail-rebuild b/tests/10ddf-create-fail-rebuild
new file mode 100644
index 0000000..525c24d
--- /dev/null
+++ b/tests/10ddf-create-fail-rebuild
@@ -0,0 +1,77 @@
+# sanity check array creation
+
+ddf_check_hold() {
+   if mdadm --remove $1 $2; then
+       echo "$2 removal from $1 should have been blocked" >&2
+       cat /proc/mdstat >&2
+       mdadm -E $2
+       exit 1
+   fi
+}
+
+ddf_check_removal() {
+   if ! mdadm --remove $1 $2 ; then
+       echo "$2 removal from $1 should have succeeded" >&2
+       cat /proc/mdstat >&2
+       mdadm -E $2
+       exit 1
+   fi
+}
+
+. tests/env-ddf-template
+
+num_disks=2
+mdadm -CR $container -e ddf -n $num_disks $dev8 $dev9
+ddf_check container $num_disks
+
+# RAID0 + RAID1
+size=9000
+level=0
+chunk=64
+offset=0
+layout=0
+mdadm -CR $member0 $dev8 $dev9 -n $num_disks -l $level -z $size -c $chunk
+ddf_check member $member0 $num_disks $level $size $((size*2)) $offset $chunk $layout
+testdev $member0 $num_disks $size $chunk
+
+offset=$(((size & ~(chunk - 1))))
+size=4000
+level=1
+chunk=0
+mdadm -CR $member1 $dev8 $dev9 -n $num_disks -l $level -z $size
+ddf_check member $member1 $num_disks $level $size $size $offset $chunk $layout
+testdev $member1 1 $size 1
+check wait
+
+mdadm -Ss
+
+# RAID10 + RAID5
+num_disks=4
+mdadm -CR $container -e ddf -n $num_disks $dev8 $dev9 $dev10 $dev11
+ddf_check container $num_disks
+
+size=9000
+level=10
+chunk=64
+offset=0
+layout=2
+mdadm -CR $member0 $dev8 $dev9 $dev10 $dev11 -n $num_disks -l $level -z $size -c $chunk
+ddf_check member $member0 $num_disks $level $size $((size*2)) $offset $chunk $layout
+testdev $member0 $((num_disks-2)) $size $chunk
+
+offset=$(((size & ~(chunk - 1))))
+size=4000
+level=5
+mdadm -CR $member1 $dev8 $dev9 $dev10 $dev11 -n $num_disks -l $level -z $size -c $chunk
+ddf_check member $member1 $num_disks $level $size $((size*3)) $offset $chunk $layout
+testdev $member1 $((num_disks-1)) $size $chunk
+check wait
+
+# FAIL / REBUILD
+ddf_check_hold $container $dev8
+mdadm --fail $member0 $dev8
+mdadm --wait-clean --scan
+ddf_check_removal $container $dev8
+mdadm --add $container $dev12
+check wait
+ddf_check_hold $container $dev12
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] tests/env-ddf-template: helper for new unit test
  2013-07-30 21:18           ` [PATCH 10/10] tests/10ddf-create-fail-rebuild: new unit test for DDF mwilck
@ 2013-07-31  5:36             ` mwilck
  2013-07-31  6:49               ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: mwilck @ 2013-07-31  5:36 UTC (permalink / raw)
  To: neilb, linux-raid; +Cc: mwilck

From: Martin Wilck <mwilck@arcor.de>

I forgot to check in this helper script, similar to the one for IMSM.
It is needed by tests/10ddf-create-fail-rebuild.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
 tests/env-ddf-template |   76 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 76 insertions(+), 0 deletions(-)
 create mode 100644 tests/env-ddf-template

diff --git a/tests/env-ddf-template b/tests/env-ddf-template
new file mode 100644
index 0000000..28e19d0
--- /dev/null
+++ b/tests/env-ddf-template
@@ -0,0 +1,76 @@
+ddf_check() {
+    udevadm settle
+    case $1 in
+    container )
+        grep -s "blocks super external:ddf" /proc/mdstat > /dev/null || {
+            echo >&2 "**Fatal** Correctly formed container not found"; cat /proc/mdstat; exit 1; }
+        ;;
+    member )
+        t_member=$2
+        t_num_disks=$3
+        t_level=$4
+        t_rd_size=$5
+        t_size=$6
+        t_offset=$7
+        t_chunk=$8
+	t_layout=$9
+	
+	if [ $t_chunk -ne 0 ]; then
+	    t_rd_size=$((t_rd_size & ~(t_chunk - 1)))
+        fi
+	case $t_level in
+	0) t_size=$((t_num_disks*$t_rd_size));;
+	1) t_size=$t_rd_size;;
+	4|5) t_size=$(((t_num_disks-1)*$t_rd_size));;
+	6) t_size=$(((t_num_disks-2)*$t_rd_size));;
+	10) t_size=$((t_num_disks*$t_rd_size/t_layout));;
+	esac
+
+        err=0
+
+        eval `stat -L -c "let major=0x%t; let minor=0x%T;" $t_member`
+        sysfs=/sys/dev/block/${major}:${minor}
+        if [ ! -f ${sysfs}/md/array_state ]; then
+            echo "**Fatal**: Array member $t_member not found" >&2; cat /proc/mdstat >&2; exit 1
+        fi
+        _chunk=`cat ${sysfs}/md/chunk_size`
+        if [ $t_chunk -ne $((_chunk/1024)) ]; then
+            echo "**Error**: Chunk size mismatch - expected $t_chunk, actual $_chunk" >&2
+            err=$((err + 1))
+        fi
+        for i in `seq 0 $((t_num_disks - 1))`; do
+	    _offset=`cat ${sysfs}/md/rd${i}/offset`
+	    if [ $t_offset -ne $((_offset / 2)) ]; then
+		echo "**Error**: Offset mismatch - expected $t_offset, actual $((_offset/2))" >&2
+		err=$((err + 1))
+	    fi
+	    _rd_size=`cat ${sysfs}/md/rd${i}/size`
+	    if [ $t_rd_size -ne $_rd_size ]; then
+		echo "**Error**: Component size mismatch - expected $t_rd_size, actual $_rd_size" >&2
+		err=$((err + 1))
+	    fi
+        done
+        _size=`cat ${sysfs}/md/array_size`
+	[ o$_size = odefault ] && _size=$(($(cat ${sysfs}/size)/2))
+        if [ $t_size -ne $_size ]; then
+            echo "**Error**: Array size mismatch - expected $t_size, actual $_size" >&2
+            err=$((err + 1))
+        fi
+        if [ $err -gt 0 ]; then
+            echo "$t_member failed check" >&2
+            cat /proc/mdstat >&2
+            mdadm -E /dev/loop8 >&2
+            exit 1
+        fi
+        ;;
+    * )
+        echo >&2 "**Error** unknown check $1"; exit 1;
+    esac
+}
+
+container=/dev/md/ddf
+member0=/dev/md/vol0
+member1=/dev/md/vol1
+member2=/dev/md/vol2
+member3=/dev/md/vol3
+member4=/dev/md/vol4
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] tests/env-ddf-template: helper for new unit test
  2013-07-31  5:36             ` [PATCH] tests/env-ddf-template: helper for new unit test mwilck
@ 2013-07-31  6:49               ` NeilBrown
  0 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2013-07-31  6:49 UTC (permalink / raw)
  To: mwilck; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3382 bytes --]

On Wed, 31 Jul 2013 07:36:32 +0200 mwilck@arcor.de wrote:

> From: Martin Wilck <mwilck@arcor.de>
> 
> I forgot to check in this helper script, similar to the one for IMSM.
> It is needed by tests/10ddf-create-fail-rebuild.
> 
> Signed-off-by: Martin Wilck <mwilck@arcor.de>
> ---
>  tests/env-ddf-template |   76 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 76 insertions(+), 0 deletions(-)
>  create mode 100644 tests/env-ddf-template
> 
> diff --git a/tests/env-ddf-template b/tests/env-ddf-template
> new file mode 100644
> index 0000000..28e19d0
> --- /dev/null
> +++ b/tests/env-ddf-template
> @@ -0,0 +1,76 @@
> +ddf_check() {
> +    udevadm settle
> +    case $1 in
> +    container )
> +        grep -s "blocks super external:ddf" /proc/mdstat > /dev/null || {
> +            echo >&2 "**Fatal** Correctly formed container not found"; cat /proc/mdstat; exit 1; }
> +        ;;
> +    member )
> +        t_member=$2
> +        t_num_disks=$3
> +        t_level=$4
> +        t_rd_size=$5
> +        t_size=$6
> +        t_offset=$7
> +        t_chunk=$8
> +	t_layout=$9
> +	
> +	if [ $t_chunk -ne 0 ]; then
> +	    t_rd_size=$((t_rd_size & ~(t_chunk - 1)))
> +        fi
> +	case $t_level in
> +	0) t_size=$((t_num_disks*$t_rd_size));;
> +	1) t_size=$t_rd_size;;
> +	4|5) t_size=$(((t_num_disks-1)*$t_rd_size));;
> +	6) t_size=$(((t_num_disks-2)*$t_rd_size));;
> +	10) t_size=$((t_num_disks*$t_rd_size/t_layout));;
> +	esac
> +
> +        err=0
> +
> +        eval `stat -L -c "let major=0x%t; let minor=0x%T;" $t_member`
> +        sysfs=/sys/dev/block/${major}:${minor}
> +        if [ ! -f ${sysfs}/md/array_state ]; then
> +            echo "**Fatal**: Array member $t_member not found" >&2; cat /proc/mdstat >&2; exit 1
> +        fi
> +        _chunk=`cat ${sysfs}/md/chunk_size`
> +        if [ $t_chunk -ne $((_chunk/1024)) ]; then
> +            echo "**Error**: Chunk size mismatch - expected $t_chunk, actual $_chunk" >&2
> +            err=$((err + 1))
> +        fi
> +        for i in `seq 0 $((t_num_disks - 1))`; do
> +	    _offset=`cat ${sysfs}/md/rd${i}/offset`
> +	    if [ $t_offset -ne $((_offset / 2)) ]; then
> +		echo "**Error**: Offset mismatch - expected $t_offset, actual $((_offset/2))" >&2
> +		err=$((err + 1))
> +	    fi
> +	    _rd_size=`cat ${sysfs}/md/rd${i}/size`
> +	    if [ $t_rd_size -ne $_rd_size ]; then
> +		echo "**Error**: Component size mismatch - expected $t_rd_size, actual $_rd_size" >&2
> +		err=$((err + 1))
> +	    fi
> +        done
> +        _size=`cat ${sysfs}/md/array_size`
> +	[ o$_size = odefault ] && _size=$(($(cat ${sysfs}/size)/2))
> +        if [ $t_size -ne $_size ]; then
> +            echo "**Error**: Array size mismatch - expected $t_size, actual $_size" >&2
> +            err=$((err + 1))
> +        fi
> +        if [ $err -gt 0 ]; then
> +            echo "$t_member failed check" >&2
> +            cat /proc/mdstat >&2
> +            mdadm -E /dev/loop8 >&2
> +            exit 1
> +        fi
> +        ;;
> +    * )
> +        echo >&2 "**Error** unknown check $1"; exit 1;
> +    esac
> +}
> +
> +container=/dev/md/ddf
> +member0=/dev/md/vol0
> +member1=/dev/md/vol1
> +member2=/dev/md/vol2
> +member3=/dev/md/vol3
> +member4=/dev/md/vol4


applied, thanks.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-07-31  6:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-26 20:58 Suspicious test failure - mdmon misses recovery events on loop devices Martin Wilck
2013-07-29  6:55 ` NeilBrown
2013-07-29 20:39   ` Martin Wilck
2013-07-29 20:42     ` Martin Wilck
2013-07-30  0:42       ` NeilBrown
2013-07-30 21:16         ` Martin Wilck
2013-07-30 21:18           ` [PATCH 00/10] Two bug fixes and a lot of debug code mwilck
2013-07-31  3:10             ` NeilBrown
2013-07-30 21:18           ` [PATCH 01/10] DDF: ddf_activate_spare: bugfix for 62ff3c40 mwilck
2013-07-30 21:18           ` [PATCH 02/10] DDF: log disk status changes more nicely mwilck
2013-07-30 21:18           ` [PATCH 03/10] DDF: ddf_process_update: log offsets for conf changes mwilck
2013-07-30 21:18           ` [PATCH 04/10] DDF: load_ddf_header: more error logging mwilck
2013-07-30 21:18           ` [PATCH 05/10] DDF: ddf_set_disk: add some debug messages mwilck
2013-07-30 21:18           ` [PATCH 06/10] monitor: read_and_act: log status when called mwilck
2013-07-31  2:59             ` NeilBrown
2013-07-31  5:28               ` Martin Wilck
2013-07-30 21:18           ` [PATCH 07/10] mdmon: wait_and_act: fix debug message for SIGUSR1 mwilck
2013-07-30 21:18           ` [PATCH 08/10] mdmon: manage_member: debug messages for array state mwilck
2013-07-30 21:18           ` [PATCH 09/10] mdmon: manage_member: fix race condition during slow meta data writes mwilck
2013-07-30 21:18           ` [PATCH 10/10] tests/10ddf-create-fail-rebuild: new unit test for DDF mwilck
2013-07-31  5:36             ` [PATCH] tests/env-ddf-template: helper for new unit test mwilck
2013-07-31  6:49               ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).