* [PATCH] nvme-fabrics: use reserved tag for reg read/write command
@ 2024-04-30 2:17 brookxu.cn
2024-04-30 3:46 ` Chaitanya Kulkarni
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: brookxu.cn @ 2024-04-30 2:17 UTC (permalink / raw)
To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
From: Chunguang Xu <chunguang.xu@shopee.com>
In some scenarios, too many commands issued concurrently by user tasks
via the nvme tool may exhaust all tags of admin_q. If a reset (nvme
reset or I/O timeout) occurs before these commands finish, the
reconnect routine may fail to update NVMe registers because no tags are
available, which hangs the kernel forever. To work around this issue,
let reg_read32()/reg_read64()/reg_write32() use reserved tags. This
should be safe for nvmf:
1. For the disable ctrl path, we do not issue a connect command.
2. For the enable ctrl / fw activate path, connect and reg_xx() are
called serially.
So the reserved tags should still be sufficient while reg_xx() uses them.
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
---
drivers/nvme/host/fabrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 1f0ea1f32d22..f6416f8553f0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -180,7 +180,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
- NVME_QID_ANY, 0);
+ NVME_QID_ANY, NVME_SUBMIT_RESERVED);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -226,7 +226,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
cmd.prop_get.offset = cpu_to_le32(off);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
- NVME_QID_ANY, 0);
+ NVME_QID_ANY, NVME_SUBMIT_RESERVED);
if (ret >= 0)
*val = le64_to_cpu(res.u64);
@@ -271,7 +271,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
cmd.prop_set.value = cpu_to_le64(val);
ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0,
- NVME_QID_ANY, 0);
+ NVME_QID_ANY, NVME_SUBMIT_RESERVED);
if (unlikely(ret))
dev_err(ctrl->device,
"Property Set error: %d, offset %#x\n",
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 2:17 [PATCH] nvme-fabrics: use reserved tag for reg read/write command brookxu.cn
@ 2024-04-30 3:46 ` Chaitanya Kulkarni
2024-04-30 5:10 ` 许春光
2024-04-30 9:44 ` Sagi Grimberg
2024-04-30 17:42 ` Chaitanya Kulkarni
2 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-30 3:46 UTC (permalink / raw)
To: brookxu.cn
Cc: linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me
On 4/29/2024 7:17 PM, brookxu.cn wrote:
> From: Chunguang Xu <chunguang.xu@shopee.com>
>
> In some scenarios, too many commands issued concurrently by user tasks
> via the nvme tool may exhaust all tags of admin_q. If a reset (nvme
> reset or I/O timeout) occurs before these commands finish, the
> reconnect routine may fail to update NVMe registers because no tags are
> available, which hangs the kernel forever. To work around this issue,
> let reg_read32()/reg_read64()/reg_write32() use reserved tags. This
> should be safe for nvmf:
>
> 1. For the disable ctrl path, we do not issue a connect command.
> 2. For the enable ctrl / fw activate path, connect and reg_xx() are
> called serially.
>
Given the complexity of the scenario described above, is it possible to
write a script that triggers it and submit it to blktests? Not that
this is a blocker for getting this patch reviewed, but I believe it is
needed in the long run. WDYT?
> So the reserved tags should still be sufficient while reg_xx() uses them.
>
> Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> ---
-ck
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 3:46 ` Chaitanya Kulkarni
@ 2024-04-30 5:10 ` 许春光
2024-04-30 6:59 ` Chaitanya Kulkarni
0 siblings, 1 reply; 7+ messages in thread
From: 许春光 @ 2024-04-30 5:10 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me
Chaitanya Kulkarni <chaitanyak@nvidia.com> 于2024年4月30日周二 11:47写道:
>
> On 4/29/2024 7:17 PM, brookxu.cn wrote:
> > [...]
>
> Given the complexity of the scenario described above, is it possible to
> write a script that triggers it and submit it to blktests? Not that
> this is a blocker for getting this patch reviewed, but I believe it is
> needed in the long run. WDYT?
Thanks for your reply. I can easily reproduce it in my VMs with the following steps:
STEP 1. To reduce the complexity of reproduction, I reduced
NVME_AQ_MQ_TAG_DEPTH from 31 to 8.
STEP 2. Create an NVMe device over NVMe/TCP, e.g. with the following command:
nvme connect -t tcp -a 192.168.122.20 -s 4420 -n
nqn.2014-08.org.nvmexpress.mytest
STEP 3. Build and run the C++ program below, which issues nvme commands:
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <set>

const int concurrency = 64;
std::set<pid_t> chlds;

volatile sig_atomic_t g_exit = 0; /* renamed: __exit is a reserved identifier */

void sigint_proc(int signo)
{
	g_exit = 1;
}

int main(int argc, char **argv)
{
	signal(SIGINT, sigint_proc);

	for (auto i = 0; i < concurrency; i++) {
		auto pid = fork();
		if (!pid) {
			/* child: hammer the admin queue with passthrough commands */
			while (true)
				system("nvme list -o json > /dev/null 2>&1");
		}
		chlds.insert(pid);
	}

	while (!g_exit && !chlds.empty()) {
		for (auto pid : chlds) {
			int wstatus;
			/* non-blocking reap: WNOHANG (waitpid() rejects WNOWAIT) */
			int ret = waitpid(pid, &wstatus, WNOHANG);
			if (ret > 0) {
				chlds.erase(pid);
				break;
			}
		}
		usleep(1000);
	}

	for (auto pid : chlds)
		kill(pid, SIGKILL);
	return 0;
}
STEP 4. Open a new console and run the following command:
while true; do nvme reset /dev/nvme0; sleep `echo "$RANDOM%1" | bc`; done
The issue will reproduce soon.
>
> > So the reserved tags may still be enough while reg_xx() use reserved tags.
> >
> > Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> > ---
>
> -ck
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 5:10 ` 许春光
@ 2024-04-30 6:59 ` Chaitanya Kulkarni
2024-04-30 8:59 ` 许春光
0 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-30 6:59 UTC (permalink / raw)
To: 许春光
Cc: linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me
On 4/29/2024 10:10 PM, 许春光 wrote:
> Chaitanya Kulkarni <chaitanyak@nvidia.com> 于2024年4月30日周二 11:47写道:
>>
>> On 4/29/2024 7:17 PM, brookxu.cn wrote:
>>> [...]
>>
>> Given the complexity of the scenario described above, is it possible to
>> write a script that triggers it and submit it to blktests? Not that
>> this is a blocker for getting this patch reviewed, but I believe it is
>> needed in the long run. WDYT?
>
> Thanks for your reply. I can easily reproduce it in my VMs with the following steps:
> [...]
>>
Cool, can you please submit a blktests [1] test for this? I'm not sure
we have any coverage for this scenario ...
-ck
[1] https://github.com/osandov/blktests
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 6:59 ` Chaitanya Kulkarni
@ 2024-04-30 8:59 ` 许春光
0 siblings, 0 replies; 7+ messages in thread
From: 许春光 @ 2024-04-30 8:59 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me
Chaitanya Kulkarni <chaitanyak@nvidia.com> 于2024年4月30日周二 14:59写道:
>
> On 4/29/2024 10:10 PM, 许春光 wrote:
> > [...]
>
> Cool, can you please submit a blktests [1] test for this? I'm not sure
> we have any coverage for this scenario ...
>
My pleasure. Once the review is done and there are no more doubts, I will try to do it. Thanks!
> -ck
>
> [1] https://github.com/osandov/blktests
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 2:17 [PATCH] nvme-fabrics: use reserved tag for reg read/write command brookxu.cn
2024-04-30 3:46 ` Chaitanya Kulkarni
@ 2024-04-30 9:44 ` Sagi Grimberg
2024-04-30 17:42 ` Chaitanya Kulkarni
2 siblings, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2024-04-30 9:44 UTC (permalink / raw)
To: brookxu.cn, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme-fabrics: use reserved tag for reg read/write command
2024-04-30 2:17 [PATCH] nvme-fabrics: use reserved tag for reg read/write command brookxu.cn
2024-04-30 3:46 ` Chaitanya Kulkarni
2024-04-30 9:44 ` Sagi Grimberg
@ 2024-04-30 17:42 ` Chaitanya Kulkarni
2 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2024-04-30 17:42 UTC (permalink / raw)
To: brookxu.cn, kbusch@kernel.org, axboe@kernel.dk, hch@lst.de,
sagi@grimberg.me
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
On 4/29/24 19:17, brookxu.cn wrote:
> From: Chunguang Xu <chunguang.xu@shopee.com>
>
> In some scenarios, too many commands issued concurrently by user tasks
> via the nvme tool may exhaust all tags of admin_q. If a reset (nvme
> reset or I/O timeout) occurs before these commands finish, the
> reconnect routine may fail to update NVMe registers because no tags are
> available, which hangs the kernel forever. To work around this issue,
> let reg_read32()/reg_read64()/reg_write32() use reserved tags. This
> should be safe for nvmf:
>
> 1. For the disable ctrl path, we do not issue a connect command.
> 2. For the enable ctrl / fw activate path, connect and reg_xx() are
> called serially.
>
> So the reserved tags should still be sufficient while reg_xx() uses them.
>
> Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-04-30 17:42 UTC | newest]
Thread overview: 7+ messages
-- links below jump to the message on this page --
2024-04-30 2:17 [PATCH] nvme-fabrics: use reserved tag for reg read/write command brookxu.cn
2024-04-30 3:46 ` Chaitanya Kulkarni
2024-04-30 5:10 ` 许春光
2024-04-30 6:59 ` Chaitanya Kulkarni
2024-04-30 8:59 ` 许春光
2024-04-30 9:44 ` Sagi Grimberg
2024-04-30 17:42 ` Chaitanya Kulkarni