Linux-NVME Archive on lore.kernel.org
From: Hannes Reinecke <hare@suse.de>
To: Yi Zhang <yi.zhang@redhat.com>, Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, skt-results-master@redhat.com,
	Bruno Goncalves <bgoncalv@redhat.com>
Subject: Re: [bug report] blktests nvme/022 lead kernel WARNING and NULL pointer
Date: Sat, 22 May 2021 16:59:06 +0200
Message-ID: <3c592789-12ea-41cc-5b47-1a7d3aabb4d1@suse.de>
In-Reply-To: <CAHj4cs9p0ckNUT7jvcGs-u=sbbX6U5787smgeYrUE+ZM_vk6tg@mail.gmail.com>

On 5/22/21 2:12 AM, Yi Zhang wrote:
> On Sat, May 22, 2021 at 2:19 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>>>>> What about this?
>>>
>>> Hi Hannes
>>> With this patch there is no WARNING/NULL pointer this time, but the
>>> 'keep-alive timer expired' and reset failure issues remain. Here is
>>> the full log:
>>>
>>> # ./check nvme/022
>>> nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [failed]
>>>       runtime  10.646s  ...  11.087s
>>>       --- tests/nvme/022.out 2021-05-20 20:16:31.384068807 -0400
>>>       +++ /root/blktests/results/nodev/nvme/022.out.bad 2021-05-20 20:24:27.874250466 -0400
>>>       @@ -1,4 +1,5 @@
>>>        Running nvme/022
>>>        91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>>        uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
>>>       +ERROR: reset failed
>>>        Test complete
>>> # cat results/nodev/nvme/022.full
>>> Reset: Network dropped connection on reset
>>> NQN:blktests-subsystem-1 disconnected 1 controller(s)
>>>
>>> [37353.068448] run blktests nvme/022 at 2021-05-20 20:24:16
>>> [37353.146301] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [37353.161765] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [37353.175796] nvme nvme0: creating 128 I/O queues.
>>> [37353.189734] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [37354.216686] nvme nvme0: resetting controller
>>> [37363.270607] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [37363.276521] nvmet: ctrl 1 fatal error occurred!
>>> [37363.281058] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>>
>>> # ./check nvme/021
>>> nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
>>>       runtime  10.958s  ...  11.382s
>>> # dmesg
>>> [38142.862881] run blktests nvme/021 at 2021-05-20 20:37:26
>>> [38142.941038] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> [38142.956621] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:6a70d220-bfde-1000-03ce-ea40b8730904.
>>> [38142.970524] nvme nvme0: creating 128 I/O queues.
>>> [38142.984356] nvme nvme0: new ctrl: "blktests-subsystem-1"
>>> [38144.014601] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>>> [38153.030107] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>>> [38153.036018] nvmet: ctrl 1 fatal error occurred!
>>
>> I think that the main reason is that there are 128 queues that are being
>> created, and during that time the keep alive timer ends up expiring as
>> it is shorter (used to be 15 seconds, now 5 by default).
>>
>> nvmet only stops the keep-alive timer when the controller is freed,
>> which is pretty late in the sequence. It needs to be this way: if we
>> shut the timer down sooner, a host can die in the middle of a teardown
>> sequence, and we still need to detect that and clean up ourselves.
>> But maybe we can mod the keep-alive timer for every queue we delete,
>> just in case the host is not deleting fast enough?
>>
>> Ming, does this solve the issue you are seeing?
> 
> Hi Sagi
> The issue was fixed by this patch. :)
> 
>> --
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 1853db38b682..f0715e9a4a9c 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -804,6 +804,7 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
>>           percpu_ref_exit(&sq->ref);
>>
>>           if (ctrl) {
>> +               ctrl->cmd_seen = true;
>>                   nvmet_ctrl_put(ctrl);
>>                   sq->ctrl = NULL; /* allows reusing the queue later */
>>           }
>> --
>>
>> We probably need to rename cmd_seen to extend_tbkas (extend traffic
>> based keep-alive).
>>
> 
> 
Thanks for the confirmation.

I'll send a proper patchset.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

Thread overview: 13+ messages
2021-04-30 16:40 [bug report] blktests nvme/022 lead kernel WARNING and NULL pointer Yi Zhang
2021-05-01  0:55 ` Sagi Grimberg
2021-05-01  9:58   ` Yi Zhang
2021-05-07  8:35     ` Yi Zhang
2021-05-07 19:50       ` Sagi Grimberg
2021-05-09  8:44         ` Hannes Reinecke
2021-05-12  0:32           ` Yi Zhang
2021-05-19  0:36             ` Yi Zhang
2021-05-20  6:19               ` Hannes Reinecke
2021-05-21  0:38                 ` Yi Zhang
2021-05-21 18:19                   ` Sagi Grimberg
2021-05-22  0:12                     ` Yi Zhang
2021-05-22 14:59                       ` Hannes Reinecke [this message]
