From: Hannes Reinecke <hare@suse.de>
To: Yi Zhang <yi.zhang@redhat.com>, Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, skt-results-master@redhat.com,
Bruno Goncalves <bgoncalv@redhat.com>
Subject: Re: [bug report] blktests nvme/022 lead kernel WARNING and NULL pointer
Date: Sat, 22 May 2021 16:59:06 +0200 [thread overview]
Message-ID: <3c592789-12ea-41cc-5b47-1a7d3aabb4d1@suse.de> (raw)
In-Reply-To: <CAHj4cs9p0ckNUT7jvcGs-u=sbbX6U5787smgeYrUE+ZM_vk6tg@mail.gmail.com>
On 5/22/21 2:12 AM, Yi Zhang wrote:
> On Sat, May 22, 2021 at 2:19 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>>>>> What about this?
>>>
>>> Hi Hannes
>>> With this patch, no WARNING/NULL pointer this time, but we still have
>>> the 'keep-alive timer expired' and reset failure issue; here is the full
>>> log:
>>>
>>> # ./check nvme/022
>>> nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [failed]
>>> runtime 10.646s ... 11.087s
>>> --- tests/nvme/022.out 2021-05-20 20:16:31.384068807 -0400
>>> +++ /root/blktests/results/nodev/nvme/022.out.bad 2021-05-20
>>> 20:24:27.874250466 -0400
>>> @@ -1,4 +1,5 @@
>>> Running nvme/022
>>> 91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>> uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>> +ERROR: reset failed
>>> Test complete
>>> # cat results/nodev/nvme/022.full
>>> Reset: Network dropped connection on reset
>>> NQN:blktests-subsystem-1 disconnected 1 controller(s)
>>>
>>> [37353.068448] run blktests nvme/022 at 2021-05-20 20:24:16
>>> [37353.146301] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [37353.161765] nvmet: creating controller 1 for subsystem
>>> blktests-subsystem-1 for NQN
>>> nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [37353.175796] nvme nvme0: creating 128 I/O queues.
>>> [37353.189734] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [37354.216686] nvme nvme0: resetting controller
>>> [37363.270607] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [37363.276521] nvmet: ctrl 1 fatal error occurred!
>>> [37363.281058] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>>
>>> # ./check nvme/021
>>> nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed]
>>> runtime 10.958s ... 11.382s
>>> # dmesg
>>> [38142.862881] run blktests nvme/021 at 2021-05-20 20:37:26
>>> [38142.941038] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [38142.956621] nvmet: creating controller 1 for subsystem
>>> blktests-subsystem-1 for NQN
>>> nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [38142.970524] nvme nvme0: creating 128 I/O queues.
>>> [38142.984356] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [38144.014601] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>> [38153.030107] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [38153.036018] nvmet: ctrl 1 fatal error occurred!
>>
>> I think the main reason is that 128 queues are being created, and during
>> that time the keep-alive timer ends up expiring, as it is now shorter
>> (it used to be 15 seconds; the default is now 5).
>>
>> nvmet only stops the keep-alive timer when the controller is freed,
>> which is pretty late in the sequence. The problem is that it needs to
>> be this way: if we shut it down sooner, a host could die in the
>> middle of a teardown sequence and we would still need to detect that
>> and clean up ourselves. But maybe we can mod the keep-alive timer for
>> every queue we delete, just in case the host is not deleting
>> fast enough?
>>
>> Ming, does this solve the issue you are seeing?
>
> Hi Sagi
> The issue was fixed by this patch. :)
>
>> --
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 1853db38b682..f0715e9a4a9c 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -804,6 +804,7 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>>  	percpu_ref_exit(&sq->ref);
>>  
>>  	if (ctrl) {
>> +		ctrl->cmd_seen = true;
>>  		nvmet_ctrl_put(ctrl);
>>  		sq->ctrl = NULL; /* allows reusing the queue later */
>>  	}
>> --
>>
>> We probably need to rename cmd_seen to extend_tbkas (extend
>> traffic-based keep-alive).
>>
>
>
Thanks for the confirmation.
I'll send a proper patchset.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview: 13+ messages
2021-04-30 16:40 [bug report] blktests nvme/022 lead kernel WARNING and NULL pointer Yi Zhang
2021-05-01 0:55 ` Sagi Grimberg
2021-05-01 9:58 ` Yi Zhang
2021-05-07 8:35 ` Yi Zhang
2021-05-07 19:50 ` Sagi Grimberg
2021-05-09 8:44 ` Hannes Reinecke
2021-05-12 0:32 ` Yi Zhang
2021-05-19 0:36 ` Yi Zhang
2021-05-20 6:19 ` Hannes Reinecke
2021-05-21 0:38 ` Yi Zhang
2021-05-21 18:19 ` Sagi Grimberg
2021-05-22 0:12 ` Yi Zhang
2021-05-22 14:59 ` Hannes Reinecke [this message]