From: NeilBrown <neilb@suse.de>
To: Heinz Mauelshagen <heinzm@redhat.com>
Cc: "dm-devel >> device-mapper development" <dm-devel@redhat.com>,
linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: fix raid5 livelock
Date: Wed, 28 Jan 2015 13:37:54 +1100
Message-ID: <20150128133754.25835582@notabene.brown>
In-Reply-To: <54C54CBC.50101@redhat.com>
On Sun, 25 Jan 2015 21:06:20 +0100 Heinz Mauelshagen <heinzm@redhat.com>
wrote:
> From: Heinz Mauelshagen <heinzm@redhat.com>
>
> Hi Neil,
>
> the reconstruct write optimization in raid5, function fetch_block causes
> livelocks in LVM raid4/5 tests.
>
> Test scenarios:
> the tests wait for full initial array resynchronization before making a
> filesystem on the raid4/5 logical volume, mounting it, writing to the
> filesystem and failing one physical volume holding a raiddev.
>
> In short, we're seeing livelocks on fully synchronized raid4/5 arrays
> with a failed device.
>
> This patch fixes the issue but likely in a suboptimal way.
>
> Do you think there is a better solution to avoid livelocks on
> reconstruct writes?
>
> Regards,
> Heinz
>
> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
> Tested-by: Jon Brassow <jbrassow@redhat.com>
> Tested-by: Heinz Mauelshagen <heinzm@redhat.com>
>
> ---
> drivers/md/raid5.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c1b0d52..0fc8737 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2915,7 +2915,7 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
>  	     (s->failed >= 1 && fdev[0]->toread) ||
>  	     (s->failed >= 2 && fdev[1]->toread) ||
>  	     (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite &&
> -	      (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&
> +	      (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state) || s->non_overwrite) &&
>  	      !test_bit(R5_OVERWRITE, &fdev[0]->flags)) ||
>  	     ((sh->raid_conf->level == 6 ||
>  	       sh->sector >= sh->raid_conf->mddev->recovery_cp)

That is a bit heavy handed, but knowing that it fixes the problem helps a lot.

I think the problem happens when processing a non-overwrite write to a failed
device.

fetch_block() should, in that case, pre-read all of the working devices, but
since

   (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&

was added, it sometimes doesn't. The root problem is that
handle_stripe_dirtying() is getting confused because neither rmw nor rcw seems
to work, so it doesn't start the chain of events to set STRIPE_PREREAD_ACTIVE.

The following (which is against mainline) might fix it. Can you test?
Thanks,
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c1b0d52bfcb0..793cf2861e97 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3195,6 +3195,10 @@ static void handle_stripe_dirtying(struct r5conf *conf,
 			      (unsigned long long)sh->sector,
 			      rcw, qread, test_bit(STRIPE_DELAYED, &sh->state));
 	}
+	if (rcw > disks && rmw > disks &&
+	    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+		set_bit(STRIPE_DELAYED, &sh->state);
+
 	/* now if nothing is locked, and if we have enough data,
 	 * we can start a write request
 	 */

This code really really needs to be tidied up and commented better!!!
Thanks,
NeilBrown