From: Xiao Ni <xni@redhat.com>
To: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@kernel.org>,
linux-raid@vger.kernel.org, ncroxon@redhat.com, neilb@suse.com
Subject: Re: [MD PATCH 1/1] Don't add nr_pending for resync requests
Date: Wed, 26 Apr 2017 05:22:08 -0400 (EDT) [thread overview]
Message-ID: <228146615.1231110.1493198528975.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <599ae320-3375-8bd6-f1cd-4091ecc0a529@suse.de>
----- Original Message -----
> From: "Coly Li" <colyli@suse.de>
> To: "Shaohua Li" <shli@kernel.org>, "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org, ncroxon@redhat.com, neilb@suse.com
> Sent: Wednesday, April 26, 2017 4:40:20 PM
> Subject: Re: [MD PATCH 1/1] Don't add nr_pending for resync requests
>
> > On 2017/4/26 1:10 AM, Shaohua Li wrote:
> > Hi,
> > Cc: Coly and Neil.
> >
> > On Mon, Apr 24, 2017 at 04:47:15PM +0800, Xiao Ni wrote:
> >> In the newer barrier code, raise_barrier() waits until
> >> conf->nr_pending[idx] is zero. Once all the conditions are true, the
> >> resync request can proceed, but raise_barrier() then increments
> >> conf->nr_pending[idx] again. The next resync request that hits the
> >> same bucket idx has to wait for the previously submitted resync
> >> request, so resync/recovery performance is degraded.
> >>
> >> I did some tests:
> >> 1. Before removing the code, the resync performance is:
> >> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> >> sdc        0.00    0.00   0.00 164.00   0.00  10.13   126.46     0.95   5.80    0.00    5.80   5.82  95.40
> >> sdb        0.00    0.00 163.00   2.00  10.19   0.00   126.47     0.04   0.25    0.18    5.50   0.25   4.10
> >> 2. After removing the code, the resync performance is:
> >> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> >> sdc        0.00 2164.00   0.00 751.00   0.00 182.06   496.49     7.49   9.95    0.00    9.95   1.33 100.00
> >> sdb     2162.00    0.00 752.00   0.00 182.25   0.00   496.34     0.56   0.74    0.74    0.00   0.65  48.80
> >
> > A big resync regression, thanks for checking it! Your analysis makes
> > sense to me, but the patch doesn't. freeze_array() depends on
> > nr_pending to wait for all normal IO and resync. This change will
> > make the wait for resync disappear.
> >
> > I think the problem is we are abusing nr_pending here. freeze_array()
> > waits for a condition which doesn't need to be maintained by
> > nr_pending. It makes sense for raise_barrier to wait for nr_pending
> > caused by normal IO, but it doesn't make sense to wait for other
> > raise_barrier calls.
> >
> > We could add another variable, increase it in raise_barrier, decrease
> > it in lower_barrier, and let freeze_array() check the new variable.
> > That should fix the problem and not expose a race condition for
> > freeze_array().
>
> Shaohua,
>
> Adding another variable to record pending resync I/O is a good idea.
> We can do something like this,
>
> @@ -869,7 +869,7 @@ static void raise_barrier()
>
> - atomic_inc(&conf->nr_pending[idx]);
> + atomic_inc(&conf->nr_resync_pending[idx]);
> spin_unlock_irq(&conf->resync_lock);
> }
>
> @@ -880,7 +880,7 @@ static void lower_barrier()
>
> atomic_dec(&conf->barrier[idx]);
> - atomic_dec(&conf->nr_pending[idx]);
> + atomic_dec(&conf->nr_resync_pending[idx]);
> wake_up(&conf->wait_barrier);
> }
>
> @@ -1018,7 +1018,8 @@ static int get_unqueued_pending()
> int idx, ret;
>
> for (ret = 0, idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> - ret += atomic_read(&conf->nr_pending[idx]) -
> + ret += atomic_read(&conf->nr_pending[idx]) +
> + atomic_read(&conf->nr_resync_pending[idx]) -
> atomic_read(&conf->nr_queued[idx]);
>
> return ret;
>
> Xiao,
>
> Nice catch! From your email, it seems resync requests on a raid1
> device can be non-linear; otherwise, different resync I/Os raising the
> barrier would not hit the same barrier bucket. This is something I
> didn't know. Thank you!
Hi Coly and Shaohua

Sorry, I don't understand "non-linear raid1 device". What does that mean?
In raid1_sync_request, one request handles RESYNC_PAGES pages, which is
64KB. So there is a chance that different resync requests hit the same
bucket idx, because the bucket unit is 64MB. Am I right?
>
> The reason to increase conf->nr_pending[idx] comes from commit 34e97f17
> ("md/raid1: count resync requests in nr_pending."): normal I/O and
> resync I/O can happen at the same time on the same raid1 device, and
> both of them can be queued by reschedule_retry(), so raise_barrier()
> needs to increase conf->nr_pending[idx] to make sure pending I/Os equal
> queued I/Os when the array is frozen. We cannot remove the
> conf->nr_pending[idx] operations from raise_barrier()/lower_barrier().
>
> Would you like to compose a patch by Shaohua's suggestion ?
>
Sure, I'll re-send the patch later.
At first I didn't understand the relationship between nr_pending and
nr_queued: if I remove the nr_pending increment for resync requests,
there can be a deadlock in freeze_array. Thanks for the explanation.
Regards
Xiao
Thread overview: 4+ messages
2017-04-24 8:47 [MD PATCH 1/1] Don't add nr_pending for resync requests Xiao Ni
2017-04-25 17:10 ` Shaohua Li
2017-04-26 8:40 ` Coly Li
2017-04-26 9:22 ` Xiao Ni [this message]