From: mwilck@arcor.de
To: neilb@suse.de, linux-raid@vger.kernel.org
Cc: mwilck@arcor.de
Subject: [PATCH 09/10] mdmon: manage_member: fix race condition during slow meta data writes
Date: Tue, 30 Jul 2013 23:18:33 +0200
Message-ID: <1375219114-5626-10-git-send-email-mwilck@arcor.de>
In-Reply-To: <51F82D3B.6060104@arcor.de>
From: Martin Wilck <mwilck@arcor.de>
To track kernel state changes, the monitor needs to notice changes
in sysfs. If a change is transient and the monitor is busy writing
metadata at that moment, the change can be missed. The metadata
then become inconsistent with the real state of the array.
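A minimal sketch of how a polling observer can miss a transient value,
assuming a simplified tick-based model. Nothing here is mdmon code:
poller_sees(), the timeline array and the busy window are hypothetical
names chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, not mdmon code: timeline[i] is the value that a
 * sysfs attribute such as sync_action holds at tick i.  The poller takes
 * one sample per tick, except during ticks [busy_from, busy_to), when it
 * is assumed to be busy writing metadata and takes no sample at all.
 */
static bool poller_sees(const char *const *timeline, size_t n,
			size_t busy_from, size_t busy_to,
			const char *wanted)
{
	assert(busy_from <= busy_to);

	for (size_t i = 0; i < n; i++) {
		if (i >= busy_from && i < busy_to)
			continue;	/* busy: this sample is lost */
		if (strcmp(timeline[i], wanted) == 0)
			return true;	/* transient value observed */
	}
	return false;			/* value came and went unseen */
}
```

With a timeline of "idle", "recover", "recover", "idle", a busy window
covering ticks 1 and 2 hides "recover" completely, while a window
covering only tick 1 still lets the poller see it at tick 2.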
I can reproduce this in a test scenario with a DDF container and
two subarrays, where I set a disk to "failed" and then add a global
hot-spare. On a typical MD test setup with loop devices, I can
reliably reproduce a failure where the metadata show degraded members
although the kernel finished the recovery successfully.
This patch fixes the problem with two changes. First, after a
metadata update is queued, wait until it is certain that the monitor
has actually applied it (the while loop is really needed to avoid
failures completely in my test case). Second, after triggering the
recovery, set prev_action of the changed array to "recover", in case
the monitor misses the transient "recover" state.
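The two changes can be sketched in isolation as follows. This is a
single-threaded model under stated assumptions, not the mdmon
implementation: struct queue_model, struct array_model, drain(),
start_recovery() and recovery_finished() are hypothetical names, and the
completion check (previous action "recover", current action "idle") is
an assumption about how a monitor might detect the end of a recovery.

```c
#include <stdbool.h>

enum action { ACT_IDLE, ACT_RECOVER };

struct queue_model {
	int pending;			/* updates not yet applied */
};

struct array_model {
	enum action prev_action;	/* last action the monitor recorded */
	enum action kernel_action;	/* what the kernel is doing now */
};

/* Stand-in for the monitor applying one queued metadata update. */
static void monitor_step(struct queue_model *q)
{
	if (q->pending > 0)
		q->pending--;
}

/*
 * Change 1: do not proceed until the queue is empty.  Returns the number
 * of steps taken, or -1 if the queue is still non-empty after max_steps.
 */
static int drain(struct queue_model *q, int max_steps)
{
	int steps = 0;

	while (q->pending > 0 && steps < max_steps) {
		monitor_step(q);
		steps++;
	}
	return q->pending == 0 ? steps : -1;
}

/*
 * Change 2: if triggering the recovery succeeded, record "recover" as the
 * previous action right away instead of relying on the monitor to see it.
 */
static int start_recovery(struct array_model *a, bool write_ok)
{
	if (!write_ok)
		return -1;		/* trigger failed: leave state alone */
	a->kernel_action = ACT_RECOVER;
	a->prev_action = ACT_RECOVER;
	return 0;
}

/* Assumed completion check: recover was recorded, kernel is idle again. */
static bool recovery_finished(const struct array_model *a, enum action now)
{
	return a->prev_action == ACT_RECOVER && now == ACT_IDLE;
}
```

In this model, an array whose prev_action was seeded by start_recovery()
is reported as finished as soon as the observed action is idle, even if
no sample ever showed "recover".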
Signed-off-by: Martin Wilck <mwilck@arcor.de>
---
managemon.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/managemon.c b/managemon.c
index a655108..40c863f 100644
--- a/managemon.c
+++ b/managemon.c
@@ -535,8 +535,14 @@ static void manage_member(struct mdstat_ent *mdstat,
 		}
 		queue_metadata_update(updates);
 		updates = NULL;
+		while (update_queue_pending || update_queue) {
+			check_update_queue(container);
+			usleep(15*1000);
+		}
 		replace_array(container, a, newa);
-		sysfs_set_str(&a->info, NULL, "sync_action", "recover");
+		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover")
+		    == 0)
+			newa->prev_action = recover;
 		dprintf("%s: recovery started on %s\n", __func__,
 			a->info.sys_name);
 out:
--
1.7.1