From: Jan Kara <jack@suse.cz>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, David Jeffery <djeffery@redhat.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Gabriel Krisman Bertazi <krisman@suse.de>,
Chengming Zhou <zhouchengming@bytedance.com>
Subject: Re: [RFC PATCH] sbitmap: fix batching wakeup
Date: Tue, 8 Aug 2023 12:30:19 +0200
Message-ID: <20230808103019.zvduupwrbalk3ee4@quack3>
In-Reply-To: <ZNH6as/wUkbCMAcN@fedora>

On Tue 08-08-23 16:18:50, Ming Lei wrote:
> On Wed, Aug 02, 2023 at 06:05:53PM +0200, Jan Kara wrote:
> > On Fri 21-07-23 17:57:15, Ming Lei wrote:
> > > From: David Jeffery <djeffery@redhat.com>
> > >
> > > The current code assumes that waking up just one wait queue after each
> > > completion batch is enough to guarantee forward progress.
> > >
> > > Unfortunately this is not enough, because a new waiter can be added to
> > > a wait queue just after that queue has been woken up.
> > >
> > > Here is an example (depth 64, wake_batch 8):
> > >
> > > 1) all 64 tags are active
> > >
> > > 2) in each wait queue, there is only one single waiter
> > >
> > > 3) each completion batch (8 completions) wakes up just one waiter in a
> > > wait queue, and a new sleeper is immediately added to that wait queue
> > >
> > > 4) after 64 completions, 8 waiters have been woken up, and there are still
> > > 8 waiters in each wait queue
> > >
> > > 5) after another 8 active tags are completed, only one waiter can be woken
> > > up, and the other 7 can never be woken up.
> > >
> > > It turns out this problem isn't easy to fix, so simply wake up enough
> > > waiters for a single batch.
> > >
> > > Cc: David Jeffery <djeffery@redhat.com>
> > > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >
> > I'm sorry for the delay - I was on vacation. I can see the patch has
> > already been merged and I'm not strictly against that (although I think
> > Gabriel was experimenting with this exact wakeup scheme and, as far as I
> > remember, the more eager waking up was causing a performance decrease for
> > some configurations). But let me challenge the analysis above a bit. For the
> > sleeper to be added to a waitqueue in step 3), blk_mq_mark_tag_wait() must
> > fail the blk_mq_get_driver_tag() call. Which means that all tags were used
>
> Here only request allocation via blk_mq_get_tag() is involved; getting a
> driver tag is not.
>
> > at that moment. To summarize, anytime we add any new waiter to the
> > waitqueue, all tags are used and thus we should eventually receive enough
> > wakeups to wake all of them. What am I missing?
>
> When the final retry (__blk_mq_get_tag()) runs before sleeping
> (io_schedule()) in blk_mq_get_tag(), the sleeper has already been added
> to the wait queue.
>
> So when two completion batches arrive, both may wake up the same wq,
> because the same ->wake_index can be observed from the two completion
> paths, and both wake_up_nr() calls can return > 0 because adding a
> sleeper to the wq and wake_up_nr() can be interleaved. Then 16
> completions wake up just 2 sleepers added to the same wq.
>
> If this happens on one wq with >= 8 sleepers, an io hang will be
> triggered if there are another two pending wqs.
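
The interleaving described above can be sketched with a small user-space
model. This is only an illustration under stated assumptions, not the kernel
code: wake_first_nonempty(), wake_full_batch(), make_queues() and replay() are
hypothetical names, each wait queue is a plain deque, and the race is replayed
deterministically by letting both completion paths start from the same stale
wake_index while one sleeper is added in between. The first function models
stopping as soon as any waiter was woken; the second models waking enough
waiters for a whole batch.

```python
from collections import deque

WAIT_QUEUES = 8  # assumed number of wait queues in this model
WAKE_BATCH = 8   # wake_batch from the example (64 depth, wake_batch is 8)


def wake_up_nr(queue, nr):
    """Wake at most nr waiters from one queue; return how many were woken."""
    woken = min(nr, len(queue))
    for _ in range(woken):
        queue.popleft()
    return woken


def wake_first_nonempty(queues, start, nr):
    """Model of the old scheme: stop once any waiter was woken."""
    index = start
    for _ in range(len(queues)):
        queue = queues[index]
        index = (index + 1) % len(queues)
        woken = wake_up_nr(queue, nr)
        if woken:
            return woken
    return 0


def wake_full_batch(queues, start, nr):
    """Model of the fix: keep visiting queues until nr waiters were woken."""
    index = start
    total = 0
    for _ in range(len(queues)):
        queue = queues[index]
        index = (index + 1) % len(queues)
        total += wake_up_nr(queue, nr - total)
        if total == nr:
            break
    return total


def make_queues():
    """Queue 0 holds one sleeper; every other queue holds WAKE_BATCH sleepers."""
    rest = [deque(["sleeper"] * WAKE_BATCH) for _ in range(WAIT_QUEUES - 1)]
    return [deque(["sleeper"])] + rest


def replay(wake, queues):
    """Two completion batches that both observed the same wake_index."""
    stale_index = 0
    first = wake(queues, stale_index, WAKE_BATCH)
    queues[0].append("late sleeper")  # added between the two wake-ups
    second = wake(queues, stale_index, WAKE_BATCH)
    return first + second


if __name__ == "__main__":
    print(replay(wake_first_nonempty, make_queues()))  # prints 2
    print(replay(wake_full_batch, make_queues()))      # prints 16
```

In this model 16 completions wake only 2 sleepers under the first scheme,
while the second scheme wakes 16, because it moves on to the next wait queue
whenever the current one could not supply a full batch.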

OK, thanks for the explanation! I think I see the problem now.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Thread overview: 13+ messages
2023-07-21 9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
2023-07-21 10:40 ` Keith Busch
2023-07-21 10:50 ` Ming Lei
2023-07-21 17:38 ` David Jeffery
2023-07-21 11:51 ` Keith Busch
2023-07-21 16:35 ` Gabriel Krisman Bertazi
2023-07-22 2:42 ` Ming Lei
2023-07-21 17:29 ` Jens Axboe
2023-07-21 17:40 ` Jens Axboe
2023-08-02 16:05 ` Jan Kara
2023-08-08 8:18 ` Ming Lei
2023-08-08 10:30 ` Jan Kara [this message]
2024-01-15 9:51 ` Kemeng Shi