From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: brookxu.cn <brookxu.cn@gmail.com>,
"kbusch@kernel.org" <kbusch@kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>, "hch@lst.de" <hch@lst.de>,
"sagi@grimberg.me" <sagi@grimberg.me>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] nvme: fix reconnection fail due to reserved tag allocation
Date: Fri, 8 Mar 2024 00:19:47 +0000 [thread overview]
Message-ID: <800a2a54-9156-482f-9bc7-e99eea7dad5a@nvidia.com> (raw)
In-Reply-To: <20240307110657.252120-1-brookxu.cn@gmail.com>
On 3/7/24 03:06, brookxu.cn wrote:
> From: Chunguang Xu <chunguang.xu@shopee.com>
>
> We found a issue on production environment while using NVMe
> over RDMA, admin_q reconnect failed forever while remote
> target and network is ok. After dig into it, we found it
> may caused by a ABBA deadlock due to tag allocation. In my
> case, the tag was hold by a keep alive request waiting
> inside admin_q, as we quiesced admin_q while reset ctrl,
> so the request maked as idle and will not process before
> reset success. As fabric_q shares tagset with admin_q,
> while reconnect remote target, we need a tag for connect
> command, but the only one reserved tag was held by keep
> alive command which waiting inside admin_q. As a result,
> we failed to reconnect admin_q forever. In order to fix
> this issue, I think we should keep two reserved tags for
> admin queue.
plz consider rearranged line length, no change in wording to use the
full length :-
We found a issue on production environment while using NVMe over RDMA,
admin_q reconnect failed forever while remote target and network is ok.
After dig into it, we found it may caused by a ABBA deadlock due to tag
allocation. In my case, the tag was hold by a keep alive request
waiting inside admin_q, as we quiesced admin_q while reset ctrl, so the
request maked as idle and will not process before reset success. As
fabric_q shares tagset with admin_q, while reconnect remote target, we
need a tag for connect command, but the only one reserved tag was held
by keep alive command which waiting inside admin_q. As a result, we
failed to reconnect admin_q forever. In order to fix this issue, I think
we should keep two reserved tags for admin queue.
Rest of the patch looks good and follows the discussion on V1.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
prev parent reply other threads:[~2024-03-08 0:19 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-07 11:06 [PATCH v2] nvme: fix reconnection fail due to reserved tag allocation brookxu.cn
2024-03-07 11:18 ` Sagi Grimberg
2024-03-08 0:19 ` Chaitanya Kulkarni [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=800a2a54-9156-482f-9bc7-e99eea7dad5a@nvidia.com \
--to=chaitanyak@nvidia.com \
--cc=axboe@kernel.dk \
--cc=brookxu.cn@gmail.com \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox