From: Ming Lei <ming.lei@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Yi Zhang <yi.zhang@redhat.com>,
"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>,
linux-block <linux-block@vger.kernel.org>,
Bart Van Assche <bvanassche@acm.org>
Subject: Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
Date: Thu, 7 Jul 2022 09:46:10 +0800
Message-ID: <YsY64iMxnLtucKsP@T590>
In-Reply-To: <0a8099e6-6e28-da1f-7b4b-0ea04fa8f9d6@grimberg.me>

On Wed, Jul 06, 2022 at 06:30:43PM +0300, Sagi Grimberg wrote:
>
> > > > update the subject to better describe the issue:
> > > >
> > > > So I tried this issue on one nvme/rdma environment, and it was also
> > > > reproducible, here are the steps:
> > > >
> > > > # echo 0 >/sys/devices/system/cpu/cpu0/online
> > > > # dmesg | tail -10
> > > > [ 781.577235] smpboot: CPU 0 is now offline
> > > > # nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
> > > > Failed to write to /dev/nvme-fabrics: Invalid cross-device link
> > > > no controller found: failed to write to nvme-fabrics device
> > > >
> > > > # dmesg
> > > > [ 781.577235] smpboot: CPU 0 is now offline
> > > > [ 799.471627] nvme nvme0: creating 39 I/O queues.
> > > > [ 801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
> > > > [ 801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> > > > [ 801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
> > >
> > > This is because of blk_mq_alloc_request_hctx() and was raised before.
> > >
> > > IIRC there was reluctance to make it allocate a request for an hctx even
> > > if its associated mapped cpu is offline.
> > >
> > > The latest attempt was from Ming:
> > > [PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx
> > >
> > > Don't know where that went tho...
> >
> > The attempt relies on that the queue for connecting io queue uses
> > non-admined irq, unfortunately that can't be true for all drivers,
> > so that way can't go.
>
> The only consumer is nvme-fabrics, so others don't matter.
> Maybe we need a different interface that allows this relaxation.
>
> > So far, I'd suggest to fix nvme_*_connect_io_queues() to ignore failed
> > io queue, then the nvme host still can be setup with less io queues.
>
> What happens when the CPU comes back? Not sure we can simply ignore it.

Anyway, it is not a good choice to fail the whole controller if only one
queue can't be connected. I meant the queue can be kept as non-LIVE, and
that should work since no I/O can be issued to this queue while it is
non-LIVE.

Just wondering why we can't re-connect the io queue and set it LIVE after
any CPU in this hctx->cpumask comes online? blk-mq could add a pair of
callbacks for drivers to handle this queue change.

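To make the idea concrete, here is a rough userspace sketch of the flow,
not kernel code. Every name in it (demo_ctrl, demo_queue, demo_cpu_online,
demo_cpu_offline and so on) is hypothetical and does not exist in blk-mq or
nvme; the one-queue-per-CPU mapping and the rule that a queue drops out of
LIVE when its last mapped CPU goes offline are assumptions made only for
illustration. It shows a connect loop that skips a queue whose CPUs are all
offline instead of failing the controller, and a pair of callbacks that
re-connect the queue later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_QUEUES 4

/* Hypothetical stand-in for one fabrics I/O queue and its hctx mapping. */
struct demo_queue {
	unsigned int cpumask;	/* bit n set: CPU n is mapped to this queue */
	bool live;		/* only LIVE queues may carry I/O */
};

/* Hypothetical stand-in for the controller. */
struct demo_ctrl {
	unsigned int online_cpus;	/* bit n set: CPU n is online */
	struct demo_queue queues[DEMO_NR_QUEUES];
};

/* Assumed layout for the demo: queue i is mapped to CPU i only. */
void demo_ctrl_init(struct demo_ctrl *ctrl, unsigned int online_cpus)
{
	ctrl->online_cpus = online_cpus;
	for (size_t i = 0; i < DEMO_NR_QUEUES; i++) {
		ctrl->queues[i].cpumask = 1u << i;
		ctrl->queues[i].live = false;
	}
}

/*
 * Models the reported failure: the connect fails with -EXDEV (ret=-18 in
 * the log) when no CPU mapped to the queue is online.
 */
int demo_connect_queue(struct demo_ctrl *ctrl, size_t idx)
{
	assert(idx < DEMO_NR_QUEUES);
	struct demo_queue *q = &ctrl->queues[idx];

	if (!(q->cpumask & ctrl->online_cpus))
		return -EXDEV;
	q->live = true;
	return 0;
}

/* Connect every queue, ignore failures, and return how many are LIVE. */
int demo_connect_io_queues(struct demo_ctrl *ctrl)
{
	int nr_live = 0;

	for (size_t i = 0; i < DEMO_NR_QUEUES; i++)
		if (demo_connect_queue(ctrl, i) == 0)
			nr_live++;
	return nr_live;
}

/* I/O is refused on a queue that is not LIVE. */
int demo_submit_io(const struct demo_ctrl *ctrl, size_t idx)
{
	assert(idx < DEMO_NR_QUEUES);
	return ctrl->queues[idx].live ? 0 : -EAGAIN;
}

/* Hypothetical "CPU online" callback: re-connect queues mapped to cpu. */
void demo_cpu_online(struct demo_ctrl *ctrl, unsigned int cpu)
{
	ctrl->online_cpus |= 1u << cpu;
	for (size_t i = 0; i < DEMO_NR_QUEUES; i++) {
		struct demo_queue *q = &ctrl->queues[i];

		if (!q->live && (q->cpumask & (1u << cpu)))
			demo_connect_queue(ctrl, i);
	}
}

/* Hypothetical "CPU offline" callback: queues with no online CPU left
 * stop being LIVE (an assumption of this sketch). */
void demo_cpu_offline(struct demo_ctrl *ctrl, unsigned int cpu)
{
	ctrl->online_cpus &= ~(1u << cpu);
	for (size_t i = 0; i < DEMO_NR_QUEUES; i++) {
		struct demo_queue *q = &ctrl->queues[i];

		if (q->live && !(q->cpumask & ctrl->online_cpus))
			q->live = false;
	}
}
```

With CPU 0 offline at init time, demo_connect_io_queues() leaves queue 0
non-LIVE and the other queues usable; a later demo_cpu_online(ctrl, 0)
brings queue 0 to LIVE without touching the rest of the controller.
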
thanks,
Ming