From: Sagi Grimberg <sagi@grimberg.me>
To: 许春光 <brookxu.cn@gmail.com>
Cc: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: fix reconnection fail due to reserved tag allocation
Date: Thu, 7 Mar 2024 12:35:11 +0200
Message-ID: <5a6ceb89-c242-4e76-b6f3-91c83099794a@grimberg.me>
In-Reply-To: <CADtkEeeiNDO87L9MwC392gEp7YhhGGxojRu8nW_epkTe-jxcyg@mail.gmail.com>
On 07/03/2024 12:32, 许春光 wrote:
> Thanks for the review. It seems we should revert commit
> ed01fee283a0, which looks like a standalone 'optimization'. If
> there are no objections, I will send another patch.
Not a revert, but a fix with a Fixes tag. Just use
NVMF_ADMIN_RESERVED_TAGS and NVMF_IO_RESERVED_TAGS.
>
> Thanks
>
> Sagi Grimberg <sagi@grimberg.me> wrote on Thu, Mar 7, 2024 at 17:36:
>>
>>
>> On 28/02/2024 11:14, brookxu.cn wrote:
>>> From: Chunguang Xu <chunguang.xu@shopee.com>
>>>
>>> We found an issue in a production environment while using
>>> NVMe over RDMA: admin_q reconnection failed forever even
>>> though the remote target and the network were fine. After
>>> digging into it, we found it may be caused by an ABBA
>>> deadlock in tag allocation. In my case, the tag was held by
>>> a keep-alive request waiting inside admin_q. Because we
>>> quiesce admin_q while resetting the controller, the request
>>> is marked idle and will not be processed until the reset
>>> succeeds. As fabric_q shares its tagset with admin_q,
>>> reconnecting to the remote target needs a tag for the
>>> connect command, but the only reserved tag was held by the
>>> keep-alive command waiting inside admin_q. As a result, we
>>> failed to reconnect admin_q forever.
>>>
>>> To work around this issue, I think we should not retry the
>>> keep-alive request while the controller is reconnecting.
>>> Keep-alive is already stopped while the controller is
>>> resetting and is started again when initialization
>>> finishes, so it may be OK to drop it.
>> This is the wrong fix.
>> First we should note that this is a regression caused by:
>> ed01fee283a0 ("nvme-fabrics: only reserve a single tag")
>>
>> Then, you need to restore reserving two tags for the admin
>> tagset.
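
To illustrate the deadlock described in the quoted commit message and why a second reserved tag avoids it, here is a minimal userspace sketch. It is a toy model, not kernel code: struct rsv_pool, rsv_get(), rsv_put() and reconnect_succeeds() are hypothetical names made up for this example, and the pool stands in for the reserved tags of the tagset shared by admin_q and fabric_q.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the reserved tags in the shared admin tagset. */
struct rsv_pool {
	unsigned int total;	/* number of reserved tags */
	unsigned int in_use;	/* reserved tags currently held */
};

/* Try to take a reserved tag; fails when all of them are held. */
static bool rsv_get(struct rsv_pool *p)
{
	if (p->in_use == p->total)
		return false;
	p->in_use++;
	return true;
}

/* Give a reserved tag back to the pool. */
static void rsv_put(struct rsv_pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}

/*
 * Model one reconnect attempt: a keep-alive request already holds a
 * reserved tag and is parked on the quiesced admin_q, so its tag is
 * only released after the reconnect succeeds. The connect command
 * then needs its own reserved tag from the same pool.
 */
static bool reconnect_succeeds(unsigned int reserved_tags)
{
	struct rsv_pool pool = { .total = reserved_tags, .in_use = 0 };
	bool connected;

	/* Keep-alive takes a tag and cannot complete while quiesced. */
	if (!rsv_get(&pool))
		return false;

	/* Connect command on fabric_q allocates from the same pool. */
	connected = rsv_get(&pool);
	if (connected) {
		rsv_put(&pool);	/* connect completes */
		rsv_put(&pool);	/* admin_q resumes, keep-alive completes */
	}
	return connected;
}
```

Under these assumptions, reconnect_succeeds(1) returns false, because the connect command can never get a tag while the keep-alive holds the only one, and reconnect_succeeds(2) returns true.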