From: Keith Busch <kbusch@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@lst.de>,
linux-nvme@lists.infradead.org, Yi Zhang <yi.zhang@redhat.com>,
linux-block@vger.kernel.org, Chunguang Xu <brookxu.cn@gmail.com>
Subject: Re: [PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs
Date: Thu, 22 Jun 2023 09:19:19 -0600 [thread overview]
Message-ID: <ZJRmd7bnclaNW3PL@kbusch-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <ZJRgUXfRuuOoIN1o@ovpn-8-19.pek2.redhat.com>
On Thu, Jun 22, 2023 at 10:53:05PM +0800, Ming Lei wrote:
> On Thu, Jun 22, 2023 at 08:35:49AM -0600, Keith Busch wrote:
> > On Thu, Jun 22, 2023 at 09:51:12PM +0800, Ming Lei wrote:
> > > On Wed, Jun 21, 2023 at 09:48:49AM -0600, Keith Busch wrote:
> > > > The point was to contain requests from entering while the hctx's are
> > > > being reconfigured. If you're going to pair up the freezes as you've
> > > > suggested, we might as well just not call freeze at all.
> > >
> > > blk_mq_update_nr_hw_queues() requires queue to be frozen.
> >
> > It's too late at that point. Let's work through a real example. You'll
> > need a system that has more CPU's than your nvme has IO queues.
> >
> > Boot without any special nvme parameters. Every possible nvme IO queue
> > will be assigned "default" hctx type. Now start IO to every queue, then
> > run:
> >
> > # echo 8 > /sys/module/nvme/parameters/poll_queues && echo 1 > /sys/class/nvme/nvme0/reset_controller
> >
> > Today, we freeze prior to tearing down the "default" IO queues, so
> > there's nothing entered into them while the driver reconfigures the
> > queues.
>
> nvme_start_freeze() just prevents new IO from being queued; already
> allocated requests may still enter the block layer queue. What matters
> here is actually quiesce, which prevents IO from being dispatched to the
> driver/hardware.
>
> >
> > What you're suggesting will allow IO to queue up in a quiesced "default"
> > queue, which will become "polled" without an interrupt handler on the
> > other side of the reset. The application doesn't know that, so the IO
> > you're allowing to queue up will time out.
>
> A timeout only happens after the request is queued to the driver/hardware,
> i.e. after blk_mq_start_request() is called in nvme_queue_rq(). Quiesce
> prevents new IOs from being dispatched to the driver via .queue_rq(), and
> meanwhile the old requests have been canceled, so no request can time out.
Quiesce doesn't prevent requests from entering an hctx, and you can't
back it out to put on another hctx later. It doesn't matter that you
haven't dispatched it to hardware yet. The request's queue was set the
moment it was allocated, so after you unquiesce and freeze for the new
queue mapping, the requests previously blocked on quiesce will time out
in the scenario I've described.
There are certainly gaps in the existing code where errored requests
can be requeued or get stuck elsewhere and hit the exact same problem, but
the current approach at least tries to contain it.
Thread overview: 26+ messages
2023-06-20 1:33 [PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs Ming Lei
2023-06-20 1:33 ` [PATCH V2 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force Ming Lei
2023-06-20 1:33 ` [PATCH V2 2/4] nvme: add nvme_unfreeze_force() Ming Lei
2023-06-20 1:33 ` [PATCH V2 3/4] nvme: unfreeze queues before removing namespaces Ming Lei
2023-06-20 1:33 ` [PATCH V2 4/4] nvme: unquiesce io queues when " Ming Lei
2023-06-20 10:04 ` [PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs Sagi Grimberg
2023-06-20 13:23 ` Ming Lei
2023-06-20 13:40 ` Sagi Grimberg
2023-06-21 0:09 ` Ming Lei
2023-06-21 10:13 ` Sagi Grimberg
2023-06-21 13:27 ` Ming Lei
2023-06-21 15:48 ` Keith Busch
2023-06-22 13:51 ` Ming Lei
2023-06-22 14:35 ` Keith Busch
2023-06-22 14:53 ` Ming Lei
2023-06-22 15:19 ` Keith Busch [this message]
2023-06-25 0:26 ` Ming Lei
2023-06-25 8:09 ` Sagi Grimberg
2023-06-27 17:21 ` Keith Busch
2023-06-27 21:15 ` Sagi Grimberg
2023-06-28 1:30 ` Ming Lei
2023-06-28 14:35 ` Keith Busch
2023-06-29 0:08 ` Ming Lei
2023-06-29 15:33 ` Keith Busch
2023-06-28 7:06 ` Ming Lei
2023-06-28 7:26 ` Sagi Grimberg