public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH] blk-wbt: fix indefinite background writeback sleep
Date: Sat, 23 Jun 2018 09:37:20 +0800	[thread overview]
Message-ID: <20180623013719.GA19228@ming.t460p> (raw)
In-Reply-To: <d15e2417-641b-9aa4-5048-aabb48c1a7b0@kernel.dk>

On Fri, Jun 22, 2018 at 05:13:31PM -0600, Jens Axboe wrote:
> On 6/22/18 5:11 PM, Ming Lei wrote:
> > On Fri, Jun 22, 2018 at 04:51:26PM -0600, Jens Axboe wrote:
> >> On 6/22/18 4:43 PM, Ming Lei wrote:
> >>> On Fri, Jun 22, 2018 at 01:26:10PM -0600, Jens Axboe wrote:
> >>>> blk-wbt adds waiters to the tail of the waitqueue, and factors in the
> >>>> task placement in its decision making on whether or not the current task
> >>>> can proceed. This can cause issues for the lowest class of writers,
> >>>> since they can get woken up, denied access, and then put back to sleep
> >>>> at the end of the waitqueue.
> >>>>
> >>>> Fix this so that we only utilize the tail add for the initial sleep, and
> >>>> we don't factor in the wait queue placement after we've slept (and are
> >>>> now using the head addition).
> >>>>
> >>>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>
> >>>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> >>>> index 4f89b28fa652..7beeabd05f4a 100644
> >>>> --- a/block/blk-wbt.c
> >>>> +++ b/block/blk-wbt.c
> >>>> @@ -550,7 +550,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
> >>>>  	 * If the waitqueue is already active and we are not the next
> >>>>  	 * in line to be woken up, wait for our turn.
> >>>>  	 */
> >>>> -	if (waitqueue_active(&rqw->wait) &&
> >>>> +	if (wait && waitqueue_active(&rqw->wait) &&
> >>>>  	    rqw->wait.head.next != &wait->entry)
> >>>>  		return false;
> >>>>  
> >>>> @@ -567,16 +567,27 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
> >>>>  	__acquires(lock)
> >>>>  {
> >>>>  	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
> >>>> +	struct wait_queue_entry *waitptr = NULL;
> >>>>  	DEFINE_WAIT(wait);
> >>>>  
> >>>> -	if (may_queue(rwb, rqw, &wait, rw))
> >>>> +	if (may_queue(rwb, rqw, waitptr, rw))
> >>>>  		return;
> >>>>  
> >>>> +	waitptr = &wait;
> >>>>  	do {
> >>>> -		prepare_to_wait_exclusive(&rqw->wait, &wait,
> >>>> +		/*
> >>>> +		 * Don't add ourselves to the wq tail if we've already
> >>>> +		 * slept. Otherwise we can penalize background writes
> >>>> +		 * indefinitely.
> >>>> +		 */
> >>>
> >>> I saw this indefinite wbt_wait() in systemd-journal when running
> >>> aio-stress read test, but just once, not figured out one approach
> >>> to reproduce it yet, just wondering if you have quick test case for
> >>> reproducing and verifying this issue.
> >>
> >> I've seen it in production, but I'm currently relying on someone else
> >> to reproduce it synthetically. I'm just providing the patches for
> >> testing.
> >>
> >>>> +		if (waitptr)
> >>>> +			prepare_to_wait_exclusive(&rqw->wait, &wait,
> >>>> +							TASK_UNINTERRUPTIBLE);
> >>>> +		else
> >>>> +			prepare_to_wait(&rqw->wait, &wait,
> >>>>  						TASK_UNINTERRUPTIBLE);
> >>>
> >>> Could you explain a bit why the 'wait_entry' order matters wrt. this
> >>> issue? Since other 'wait_entry' still may come at the head meantime to
> >>> the same wq before checking in may_queue().
> >>
> >> Let's say we have 10 tasks queued up. Each one gets added to the tail,
> >> so when it's our turn, we've now reached the head. We fail to get a
> >> queue token, so we go back to sleep. At that point we should add
> >> back to the head, not the tail, for fairness purposes.
> > 
> > OK, it is reasonable to do it for fairness purpose, but seems we still
> > don't know how the wait forever in wbt_wait() is fixed by this way.
> 
> It's not, it's just removing a class of unfairness. You should not go
> to the back of the queue if you fail, you should remain near the top.
> 
> > I guess the reason is in the check of 'rqw->wait.head.next != &wait->entry',
> > which is basically removed when the waiter is waken up.
> 
> Plus at that point it's useless, since we are the head of queue. The
> check only makes sense if we tail add.

Seems not safe to run the check in case of tail add too:

- just during or after checking 'rqw->wait.head.next != &wait->entry', all
  inflight requests in this wq are done

- may_queue() still returns false because 'rqw->wait.head.next != &wait->entry'
  returns true, then __wbt_wait() may wait forever.

Thanks,
Ming

  reply	other threads:[~2018-06-23  1:37 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-22 19:26 [PATCH] blk-wbt: fix indefinite background writeback sleep Jens Axboe
2018-06-22 22:43 ` Ming Lei
2018-06-22 22:51   ` Jens Axboe
2018-06-22 23:11     ` Ming Lei
2018-06-22 23:13       ` Jens Axboe
2018-06-23  1:37         ` Ming Lei [this message]
2018-06-23 16:20           ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180623013719.GA19228@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox