public inbox for linux-nvme@lists.infradead.org
* [PATCHv2 0/3] nvme: start keep-alive after admin queue setup
@ 2023-10-24  6:13 Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 1/3] nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue() Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-10-24  6:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

Hi all,

Setting up I/O queues might take quite some time on larger and/or
busy setups, so KATO might expire on the admin queue before all
I/O queues can be set up.
This patchset fixes the issue by moving the call to start keep-alive
into the ->init_ctrl_finish() callback, and moving the call to stop
keep-alive into nvme_cancel_admin_tagset().
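To make the timing issue concrete, here is a toy model of the two
orderings (a sketch with hypothetical numbers, not kernel code; the
function and its parameters are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical timing model: a non-TBKAS controller expects a
 * Keep Alive command within kato_ms of the admin connect at t=0.
 * If keep-alive only starts once all I/O queues are up, the first
 * keep-alive is delayed by the whole I/O queue setup time.
 */
static bool controller_times_out(int kato_ms, int io_setup_ms,
				 bool ka_starts_after_admin_setup)
{
	/* Assume the first keep-alive goes out at roughly kato/2 when
	 * armed right after admin queue setup; otherwise nothing is
	 * sent until I/O queue setup has finished. */
	int first_ka_ms = ka_starts_after_admin_setup ? kato_ms / 2
						      : io_setup_ms;
	return first_ka_ms > kato_ms;
}
```

With a 5s KATO and 6s of I/O queue setup, the old ordering times out
while the new one does not.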

As usual, comments and reviews are welcome.

Changes to the original version:
- Reworked to use nvme_cancel_admin_tagset()

Hannes Reinecke (3):
  nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()
  nvme-loop: always quiesce and cancel commands before destroying admin
    q
  nvme: start keep-alive after admin queue setup

 drivers/nvme/host/core.c   | 6 +++---
 drivers/nvme/host/fc.c     | 6 ++++++
 drivers/nvme/host/tcp.c    | 6 +-----
 drivers/nvme/target/loop.c | 4 ++++
 4 files changed, 14 insertions(+), 8 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue()
  2023-10-24  6:13 [PATCHv2 0/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
@ 2023-10-24  6:13 ` Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 2/3] nvme-loop: always quiesce and cancel commands before destroying admin q Hannes Reinecke
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-10-24  6:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

nvme_tcp_setup_ctrl() has an open-coded version of
nvme_tcp_teardown_admin_queue().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/tcp.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 4714a902f4ca..e98076392b69 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2236,11 +2236,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_tcp_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	nvme_quiesce_admin_queue(ctrl);
-	blk_sync_queue(ctrl->admin_q);
-	nvme_tcp_stop_queue(ctrl, 0);
-	nvme_cancel_admin_tagset(ctrl);
-	nvme_tcp_destroy_admin_queue(ctrl, new);
+	nvme_tcp_teardown_admin_queue(ctrl, false);
 	return ret;
 }
 
-- 
2.35.3




* [PATCH 2/3] nvme-loop: always quiesce and cancel commands before destroying admin q
  2023-10-24  6:13 [PATCHv2 0/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 1/3] nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue() Hannes Reinecke
@ 2023-10-24  6:13 ` Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 3/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
  2023-10-26 16:00 ` [PATCHv2 0/3] " Mark O'Donovan
  3 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-10-24  6:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

Once ->init_ctrl_finish() is called there may be commands outstanding,
so we should quiesce the admin queue and cancel all commands prior
to calling nvme_loop_destroy_admin_queue().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/target/loop.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 48d5df054cd0..9cb434c58075 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -466,6 +466,8 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 out_destroy_io:
 	nvme_loop_destroy_io_queues(ctrl);
 out_destroy_admin:
+	nvme_quiesce_admin_queue(&ctrl->ctrl);
+	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
 out_disable:
 	dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
@@ -600,6 +602,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 	return &ctrl->ctrl;
 
 out_remove_admin_queue:
+	nvme_quiesce_admin_queue(&ctrl->ctrl);
+	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
 out_free_queues:
 	kfree(ctrl->queues);
-- 
2.35.3




* [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-10-24  6:13 [PATCHv2 0/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 1/3] nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue() Hannes Reinecke
  2023-10-24  6:13 ` [PATCH 2/3] nvme-loop: always quiesce and cancel commands before destroying admin q Hannes Reinecke
@ 2023-10-24  6:13 ` Hannes Reinecke
  2023-11-06 17:17   ` Keith Busch
  2023-11-20 13:39   ` Sagi Grimberg
  2023-10-26 16:00 ` [PATCHv2 0/3] " Mark O'Donovan
  3 siblings, 2 replies; 15+ messages in thread
From: Hannes Reinecke @ 2023-10-24  6:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

Setting up I/O queues might take quite some time on larger and/or
busy setups, so KATO might expire before all I/O queues can be
set up.
Fix this by starting keep-alive from the ->init_ctrl_finish() callback,
and stopping it when calling nvme_cancel_admin_tagset().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 6 +++---
 drivers/nvme/host/fc.c   | 6 ++++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 62612f87aafa..f48b4f735d2d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
 
 void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
 {
+	nvme_stop_keep_alive(ctrl);
 	if (ctrl->admin_tagset) {
 		blk_mq_tagset_busy_iter(ctrl->admin_tagset,
 				nvme_cancel_request, ctrl);
@@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
 	clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
 	ctrl->identified = true;
 
+	nvme_start_keep_alive(ctrl);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl_finish);
@@ -4333,7 +4336,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 {
 	nvme_mpath_stop(ctrl);
 	nvme_auth_stop(ctrl);
-	nvme_stop_keep_alive(ctrl);
 	nvme_stop_failfast_work(ctrl);
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fw_act_work);
@@ -4344,8 +4346,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
 
 void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 {
-	nvme_start_keep_alive(ctrl);
-
 	nvme_enable_aen(ctrl);
 
 	/*
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a15b37750d6e..a9affc8b755b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2530,6 +2530,12 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 	 * clean up the admin queue. Same thing as above.
 	 */
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
+
+	/*
+	 * Open-coding nvme_cancel_admin_tagset() as fc
+	 * is not using nvme_cancel_request().
+	 */
+	nvme_stop_keep_alive(ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
-- 
2.35.3




* Re: [PATCHv2 0/3] nvme: start keep-alive after admin queue setup
  2023-10-24  6:13 [PATCHv2 0/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
                   ` (2 preceding siblings ...)
  2023-10-24  6:13 ` [PATCH 3/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
@ 2023-10-26 16:00 ` Mark O'Donovan
  3 siblings, 0 replies; 15+ messages in thread
From: Mark O'Donovan @ 2023-10-26 16:00 UTC (permalink / raw)
  To: linux-nvme

Hi Hannes,

Testing with v1 of this patchset seems
to have solved the problem.
With a 5s kato the connection below would have timed out.

The column to the left of ctrl/chap is milliseconds
since admin queue connect:

[  229.079675][  T640] nvme nvme0: qid 0: authenticated
[  229.087243][  T640] 00000169: ctrl:000000001221a9b6     <NEW KEEP-ALIVE START TIME>
[  229.091241][  T640] nvme nvme0: creating 16 I/O queues.
[  229.257482][  T112] 00000340: chap:00000000db978430 q01 nvme_queue_auth_work auth successful
[  229.412266][  T112] 00000494: chap:0000000062bd76c8 q02 nvme_queue_auth_work auth successful
...
[  231.425002][  T112] 00002507: chap:0000000051ca4f9a q15 nvme_queue_auth_work auth successful
[  231.579549][  T112] 00002662: chap:0000000008e8cfab q16 nvme_queue_auth_work auth successful
[  231.582339][  T640] <OLD KEEP-ALIVE START TIME>
[  231.832703][  T112] 00002915: ctrl:000000001221a9b6     nvme_keep_alive_work sending keep-alive


I have encountered another issue on my qemu VM, where the first
call to nvme_auth_process_dhchap_challenge() after power-up takes
almost 3s. The delay comes from crypto_alloc_tfm_node().
This patchset does not solve that particular problem.
I have submitted the patch below for it, but both would be required in
my opinion.

http://lists.infradead.org/pipermail/linux-nvme/2023-October/042862.html

Regards,
Mark



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-10-24  6:13 ` [PATCH 3/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
@ 2023-11-06 17:17   ` Keith Busch
  2023-11-20 13:39   ` Sagi Grimberg
  1 sibling, 0 replies; 15+ messages in thread
From: Keith Busch @ 2023-11-06 17:17 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On Tue, Oct 24, 2023 at 08:13:37AM +0200, Hannes Reinecke wrote:
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2530,6 +2530,12 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
>  	 * clean up the admin queue. Same thing as above.
>  	 */
>  	nvme_quiesce_admin_queue(&ctrl->ctrl);
> +
> +	/*
> +	 * Open-coding nvme_cancel_admin_tagset() as fc
> +	 * is not using nvme_cancel_request().
> +	 */
> +	nvme_stop_keep_alive(ctrl);

That should be '(&ctrl->ctrl)'. No worries, I fixed it up when applying.



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-10-24  6:13 ` [PATCH 3/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
  2023-11-06 17:17   ` Keith Busch
@ 2023-11-20 13:39   ` Sagi Grimberg
  2023-11-20 14:19     ` Hannes Reinecke
  2023-11-20 16:01     ` Hannes Reinecke
  1 sibling, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-11-20 13:39 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme


> Setting up I/O queues might take quite some time on larger and/or
> busy setups, so KATO might expire before all I/O queues could be
> set up.
> Fix this by start keep alive from the ->init_ctrl_finish() callback,
> and stopping it when calling nvme_cancel_admin_tagset().

If this is a fix, the title should describe the issue it is fixing, and
the body should say how it is fixing it.

> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/nvme/host/core.c | 6 +++---
>   drivers/nvme/host/fc.c   | 6 ++++++
>   2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 62612f87aafa..f48b4f735d2d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>   
>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>   {
> +	nvme_stop_keep_alive(ctrl);
>   	if (ctrl->admin_tagset) {
>   		blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>   				nvme_cancel_request, ctrl);

There is a cross dependency here, now nvme_cancel_admin_tagset needs to
have the keep-alive stopped first, which may be waiting on I/O, which
needs to be cancelled...

Keep in mind that kato can be arbitrarily long, and now this function
may be blocked on this kato period.

I also think that now the function is doing something that is more
than simply cancelling the inflight admin tagset, as it is named.
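The circular wait being described can be sketched as a toy ordering
model (hypothetical, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the dependency: an in-flight keep-alive command only
 * completes once the admin tagset is cancelled, but with the patch
 * nvme_cancel_admin_tagset() first blocks in nvme_stop_keep_alive()
 * waiting for exactly that completion.
 */
static bool cancel_completes_promptly(bool ka_in_flight,
				      bool stop_ka_before_cancel)
{
	if (!ka_in_flight)
		return true;	/* nothing to flush, no dependency */
	/*
	 * Stopping keep-alive first means waiting out the in-flight
	 * command (potentially a whole KATO period); cancelling first
	 * would fail the command immediately and let the stop return.
	 */
	return !stop_ka_before_cancel;
}
```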

> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
>   	clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>   	ctrl->identified = true;
>   
> +	nvme_start_keep_alive(ctrl);
> +

I'm fine with moving it here. But instead, maybe just change
nvme_start_keep_alive() to use a zero delay and keep it where it
is? will that help?
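The trade-off in this suggestion can be checked with a toy model
(hypothetical numbers, not kernel code): the work is armed at some
point in time and first fires delay_ms later, while a non-TBKAS
controller expires if no Keep Alive arrives within kato_ms of the
admin connect at t=0.

```c
#include <assert.h>
#include <stdbool.h>

/* Does the first keep-alive reach the controller in time? */
static bool survives_setup(int armed_at_ms, int delay_ms, int kato_ms)
{
	return armed_at_ms + delay_ms <= kato_ms;
}
```

With KATO at 5000 ms and I/O queue setup taking 6000 ms, arming the
work after setup even with zero delay still fires only at t = 6000,
whereas arming right after admin setup fires in time for any delay
below KATO; that is why moving the start earlier helps where a zero
delay alone would not.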

>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(nvme_init_ctrl_finish);
> @@ -4333,7 +4336,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
>   {
>   	nvme_mpath_stop(ctrl);
>   	nvme_auth_stop(ctrl);
> -	nvme_stop_keep_alive(ctrl);
>   	nvme_stop_failfast_work(ctrl);
>   	flush_work(&ctrl->async_event_work);
>   	cancel_work_sync(&ctrl->fw_act_work);
> @@ -4344,8 +4346,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
>   
>   void nvme_start_ctrl(struct nvme_ctrl *ctrl)
>   {
> -	nvme_start_keep_alive(ctrl);
> -
>   	nvme_enable_aen(ctrl);
>   
>   	/*
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index a15b37750d6e..a9affc8b755b 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2530,6 +2530,12 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
>   	 * clean up the admin queue. Same thing as above.
>   	 */
>   	nvme_quiesce_admin_queue(&ctrl->ctrl);
> +
> +	/*
> +	 * Open-coding nvme_cancel_admin_tagset() as fc
> +	 * is not using nvme_cancel_request().
> +	 */
> +	nvme_stop_keep_alive(ctrl);
>   	blk_sync_queue(ctrl->ctrl.admin_q);
>   	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
>   				nvme_fc_terminate_exchange, &ctrl->ctrl);

What does this fix? This should really be split out of the patch.



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 13:39   ` Sagi Grimberg
@ 2023-11-20 14:19     ` Hannes Reinecke
  2023-11-20 14:25       ` Sagi Grimberg
  2023-11-20 16:01     ` Hannes Reinecke
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-11-20 14:19 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On 11/20/23 14:39, Sagi Grimberg wrote:
> 
>> Setting up I/O queues might take quite some time on larger and/or
>> busy setups, so KATO might expire before all I/O queues could be
>> set up.
>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>> and stopping it when calling nvme_cancel_admin_tagset().
> 
> If this is a fix, the title should describe the issue it is fixing, and
> the body should say how it is fixing it.
> 
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c | 6 +++---
>>   drivers/nvme/host/fc.c   | 6 ++++++
>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 62612f87aafa..f48b4f735d2d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>   {
>> +    nvme_stop_keep_alive(ctrl);
>>       if (ctrl->admin_tagset) {
>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>                   nvme_cancel_request, ctrl);
> 
> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
> have the keep-alive stopped first, which may be waiting on I/O, which
> needs to be cancelled...
> 
> Keep in mind that kato can be arbitrarily long, and now this function
> may be blocked on this kato period.
> 
> I also think that now the function is doing something that is more
> than simply cancelling the inflight admin tagset, as it is named.
> 
Hmm. I could move it out of cancel_admin_tagset(). It means that I'll
have to touch each transport driver, but as I have to touch at least
fc anyway I guess it's okay.

>> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl 
>> *ctrl, bool was_suspended)
>>       clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>>       ctrl->identified = true;
>> +    nvme_start_keep_alive(ctrl);
>> +
> 
> I'm fine with moving it here. But instead, maybe just change
> nvme_start_keep_alive() to use a zero delay and keep it where it
> is? will that help?
> 
Not really. We will still fail if setting up I/O queues takes longer
than the KATO period.

Will be updating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Frankenstr. 146, 90461 Nürnberg
Managing Directors: I. Totev, A. Myers, A. McDonald, M. B. Moerman
(HRB 36809, AG Nürnberg)




* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 14:19     ` Hannes Reinecke
@ 2023-11-20 14:25       ` Sagi Grimberg
  2023-11-20 15:05         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2023-11-20 14:25 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme



On 11/20/23 16:19, Hannes Reinecke wrote:
> On 11/20/23 14:39, Sagi Grimberg wrote:
>>
>>> Setting up I/O queues might take quite some time on larger and/or
>>> busy setups, so KATO might expire before all I/O queues could be
>>> set up.
>>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>>> and stopping it when calling nvme_cancel_admin_tagset().
>>
>> If this is a fix, the title should describe the issue it is fixing, and
>> the body should say how it is fixing it.
>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/nvme/host/core.c | 6 +++---
>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 62612f87aafa..f48b4f735d2d 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>   {
>>> +    nvme_stop_keep_alive(ctrl);
>>>       if (ctrl->admin_tagset) {
>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>                   nvme_cancel_request, ctrl);
>>
>> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
>> have the keep-alive stopped first, which may be waiting on I/O, which
>> needs to be cancelled...
>>
>> Keep in mind that kato can be arbitrarily long, and now this function
>> may be blocked on this kato period.
>>
>> I also think that now the function is doing something that is more
>> than simply cancelling the inflight admin tagset, as it is named.
>>
> Hmm. I could move it out of cancel_admin_tagset(). It means that I'll
> have to touch each transport driver, but as I have to touch at least
> fc anyway I guess it's okay.
> 
>>> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl 
>>> *ctrl, bool was_suspended)
>>>       clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>>>       ctrl->identified = true;
>>> +    nvme_start_keep_alive(ctrl);
>>> +
>>
>> I'm fine with moving it here. But instead, maybe just change
>> nvme_start_keep_alive() to use a zero delay and keep it where it
>> is? will that help?
>>
> Not really. We still will fail if setting up I/O queues takes longer
> than the KATO period.

Why? Is this specific for non-tbkas? or very short kato?
because connect is effectively a command that should be counted as
indication that the controller is fine.

If you are not able to connect even a single queue, I'd say something
is wrong and it could be argued that you _want_ to fail.



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 14:25       ` Sagi Grimberg
@ 2023-11-20 15:05         ` Hannes Reinecke
  2023-11-20 19:03           ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-11-20 15:05 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On 11/20/23 15:25, Sagi Grimberg wrote:
> 
> 
> On 11/20/23 16:19, Hannes Reinecke wrote:
>> On 11/20/23 14:39, Sagi Grimberg wrote:
>>>
>>>> Setting up I/O queues might take quite some time on larger and/or
>>>> busy setups, so KATO might expire before all I/O queues could be
>>>> set up.
>>>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>>>> and stopping it when calling nvme_cancel_admin_tagset().
>>>
>>> If this is a fix, the title should describe the issue it is fixing, and
>>> the body should say how it is fixing it.
>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> ---
>>>>   drivers/nvme/host/core.c | 6 +++---
>>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 62612f87aafa..f48b4f735d2d 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>>   {
>>>> +    nvme_stop_keep_alive(ctrl);
>>>>       if (ctrl->admin_tagset) {
>>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>>                   nvme_cancel_request, ctrl);
>>>
>>> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
>>> have the keep-alive stopped first, which may be waiting on I/O, which
>>> needs to be cancelled...
>>>
>>> Keep in mind that kato can be arbitrarily long, and now this function
>>> may be blocked on this kato period.
>>>
>>> I also think that now the function is doing something that is more
>>> than simply cancelling the inflight admin tagset, as it is named.
>>>
>> Hmm. I could move it out of cancel_admin_tagset(). It means that I'll
>> have to touch each transport driver, but as I have to touch at least
>> fc anyway I guess it's okay.
>>
>>>> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl 
>>>> *ctrl, bool was_suspended)
>>>>       clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>>>>       ctrl->identified = true;
>>>> +    nvme_start_keep_alive(ctrl);
>>>> +
>>>
>>> I'm fine with moving it here. But instead, maybe just change
>>> nvme_start_keep_alive() to use a zero delay and keep it where it
>>> is? will that help?
>>>
>> Not really. We still will fail if setting up I/O queues takes longer
>> than the KATO period.
> 
> Why? Is this specific for non-tbkas? or very short kato?

Non-tbkas.

> because connect is effectively a command that should be counted as
> indication that the controller is fine.
> 
Yeah, indeed.

> If you are not able to connect even a single queue, I'd say something
> is wrong and it could be argued that you _want_ to fail.

This is in conjunction with authentication, where some controller
implementations use external entities to calculate and/or validate
the DH parameters, which takes time.

The scenario is that the admin queue connects fine, but the I/O queues
take some time to set up, until eventually KATO fires before all queues
are established.
As you said, TBKAS should avoid this scenario, but non-TBKAS is still
a valid setup.
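The difference between the two modes can be sketched as a toy model
(hypothetical tick values, not the spec text):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * TBKAS (traffic-based keep-alive): any completed command counts as
 * evidence the controller is alive.  Non-TBKAS: only an explicit
 * Keep Alive command resets the timer.
 */
static bool kato_expired(int now, int kato, bool tbkas,
			 int last_cmd_tick, int last_keep_alive_tick)
{
	int last_seen = tbkas ? last_cmd_tick : last_keep_alive_tick;
	return now - last_seen > kato;
}
```

During a long I/O queue setup the Connect/auth commands keep
last_cmd_tick fresh, so a TBKAS controller stays alive while a
non-TBKAS controller expires if no Keep Alive was ever sent.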

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich




* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 13:39   ` Sagi Grimberg
  2023-11-20 14:19     ` Hannes Reinecke
@ 2023-11-20 16:01     ` Hannes Reinecke
  2023-11-20 19:05       ` Sagi Grimberg
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-11-20 16:01 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On 11/20/23 14:39, Sagi Grimberg wrote:
> 
>> Setting up I/O queues might take quite some time on larger and/or
>> busy setups, so KATO might expire before all I/O queues could be
>> set up.
>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>> and stopping it when calling nvme_cancel_admin_tagset().
> 
> If this is a fix, the title should describe the issue it is fixing, and
> the body should say how it is fixing it.
> 
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/nvme/host/core.c | 6 +++---
>>   drivers/nvme/host/fc.c   | 6 ++++++
>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 62612f87aafa..f48b4f735d2d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>   {
>> +    nvme_stop_keep_alive(ctrl);
>>       if (ctrl->admin_tagset) {
>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>                   nvme_cancel_request, ctrl);
> 
> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
> have the keep-alive stopped first, which may be waiting on I/O, which
> needs to be cancelled...
> 
> Keep in mind that kato can be arbitrarily long, and now this function
> may be blocked on this kato period.
> 
> I also think that now the function is doing something that is more
> than simply cancelling the inflight admin tagset, as it is named.
> 
I am having a hard time following this reasoning.
While I do accept that nvme_stop_keep_alive() might trigger I/O
(ie if the work queue has just been started when calling 
cancel_delayed_work), nvme_tcp_error_recovery_work() has this:

	nvme_stop_keep_alive(ctrl);
	flush_work(&ctrl->async_event_work);
	nvme_tcp_teardown_io_queues(ctrl, false);
	/* unquiesce to fail fast pending requests */
	nvme_unquiesce_io_queues(ctrl);
	nvme_tcp_teardown_admin_queue(ctrl, false);

and nvme_tcp_teardown_admin_queue() calls nvme_cancel_admin_tagset().
So by your above reasoning this code should be wrong, too.
What am I missing?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich




* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 15:05         ` Hannes Reinecke
@ 2023-11-20 19:03           ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-11-20 19:03 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme


> Yeah, indeed.
> 
>> If you are not able to connect even a single queue, I'd say something
>> is wrong and it could be argued that you _want_ to fail.
> 
> This is in conjunction with authentication, where some controller 
> implementations use external entities to calculate and/or validate the 
> DH parameters. Which takes time.
> 
> Scenario is that the admin queue connects fine, but the I/O queues take
> some time to setup, until eventually KATO fires before all queues are
> established.
> As you said, TBKAS should avoid this scenario, but non-TBKAS is still
> a valid setup.

Well, I'm not opposed to starting the keep-alive sooner, but we don't
need to change its stop location.



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 16:01     ` Hannes Reinecke
@ 2023-11-20 19:05       ` Sagi Grimberg
  2023-11-21  7:29         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2023-11-20 19:05 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme



On 11/20/23 18:01, Hannes Reinecke wrote:
> On 11/20/23 14:39, Sagi Grimberg wrote:
>>
>>> Setting up I/O queues might take quite some time on larger and/or
>>> busy setups, so KATO might expire before all I/O queues could be
>>> set up.
>>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>>> and stopping it when calling nvme_cancel_admin_tagset().
>>
>> If this is a fix, the title should describe the issue it is fixing, and
>> the body should say how it is fixing it.
>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/nvme/host/core.c | 6 +++---
>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 62612f87aafa..f48b4f735d2d 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>   {
>>> +    nvme_stop_keep_alive(ctrl);
>>>       if (ctrl->admin_tagset) {
>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>                   nvme_cancel_request, ctrl);
>>
>> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
>> have the keep-alive stopped first, which may be waiting on I/O, which
>> needs to be cancelled...
>>
>> Keep in mind that kato can be arbitrarily long, and now this function
>> may be blocked on this kato period.
>>
>> I also think that now the function is doing something that is more
>> than simply cancelling the inflight admin tagset, as it is named.
>>
> I am having a hard time following this reasoning.
> While I do accept that nvme_stop_keep_alive() might trigger I/O
> (ie if the work queue has just been started when calling 
> cancel_delayed_work), nvme_tcp_error_recovery_work() has this:
> 
>      nvme_stop_keep_alive(ctrl);
>      flush_work(&ctrl->async_event_work);
>      nvme_tcp_teardown_io_queues(ctrl, false);
>      /* unquiesce to fail fast pending requests */
>      nvme_unquiesce_io_queues(ctrl);
>      nvme_tcp_teardown_admin_queue(ctrl, false);
> 
> and nvme_tcp_teardown_admin_queue() calls nvme_cancel_admin_tagset().
> So by your above reasoning this code should be wrong, too.
> What am I missing?

I need to dig through the history, but it can most definitely move to
after the teardown. It could be from the earlier days when the
transport fencing was not as reliable, particularly for admin requests.



* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-20 19:05       ` Sagi Grimberg
@ 2023-11-21  7:29         ` Hannes Reinecke
  2023-11-21  9:16           ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2023-11-21  7:29 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig; +Cc: Keith Busch, linux-nvme

On 11/20/23 20:05, Sagi Grimberg wrote:
> 
> 
> On 11/20/23 18:01, Hannes Reinecke wrote:
>> On 11/20/23 14:39, Sagi Grimberg wrote:
>>>
>>>> Setting up I/O queues might take quite some time on larger and/or
>>>> busy setups, so KATO might expire before all I/O queues could be
>>>> set up.
>>>> Fix this by start keep alive from the ->init_ctrl_finish() callback,
>>>> and stopping it when calling nvme_cancel_admin_tagset().
>>>
>>> If this is a fix, the title should describe the issue it is fixing, and
>>> the body should say how it is fixing it.
>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> ---
>>>>   drivers/nvme/host/core.c | 6 +++---
>>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 62612f87aafa..f48b4f735d2d 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>>   {
>>>> +    nvme_stop_keep_alive(ctrl);
>>>>       if (ctrl->admin_tagset) {
>>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>>                   nvme_cancel_request, ctrl);
>>>
>>> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
>>> have the keep-alive stopped first, which may be waiting on I/O, which
>>> needs to be cancelled...
>>>
>>> Keep in mind that kato can be arbitrarily long, and now this function
>>> may be blocked on this kato period.
>>>
>>> I also think that now the function is doing something that is more
>>> than simply cancelling the inflight admin tagset, as it is named.
>>>
>> I am having a hard time following this reasoning.
>> While I do accept that nvme_stop_keep_alive() might trigger I/O
>> (i.e. if the work queue has just been started when calling
>> cancel_delayed_work), nvme_tcp_error_recovery_work() has this:
>>
>>      nvme_stop_keep_alive(ctrl);
>>      flush_work(&ctrl->async_event_work);
>>      nvme_tcp_teardown_io_queues(ctrl, false);
>>      /* unquiesce to fail fast pending requests */
>>      nvme_unquiesce_io_queues(ctrl);
>>      nvme_tcp_teardown_admin_queue(ctrl, false);
>>
>> and nvme_tcp_teardown_admin_queue() calls nvme_cancel_admin_tagset().
>> So by your above reasoning this code should be wrong, too.
>> What am I missing?
> 
> Need to dig through the history, but it can most definitely move to
> after the teardown. It could be from the earlier days where the
> transport fencing was not as reliable particularly for admin requests.

Ah, no. We are both wrong. From the kerneldoc of cancel_delayed_work():

  * Note:
  * The work callback function may still be running on return, unless
  * it returns %true and the work doesn't re-arm itself.  Explicitly
  * flush or use cancel_delayed_work_sync() to wait on it.

Hence we won't be blocked by running I/O when calling
nvme_stop_keep_alive().
So the correct sequence would indeed be to call nvme_stop_keep_alive()
first, and then disable/cancel the admin tagset (which will terminate
any outstanding keep-alive requests).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] nvme: start keep-alive after admin queue setup
  2023-11-21  7:29         ` Hannes Reinecke
@ 2023-11-21  9:16           ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2023-11-21  9:16 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig; +Cc: Keith Busch, linux-nvme



On 11/21/23 09:29, Hannes Reinecke wrote:
> On 11/20/23 20:05, Sagi Grimberg wrote:
>>
>>
>> On 11/20/23 18:01, Hannes Reinecke wrote:
>>> On 11/20/23 14:39, Sagi Grimberg wrote:
>>>>
>>>>> Setting up I/O queues might take quite some time on larger and/or
>>>>> busy setups, so KATO might expire before all I/O queues could be
>>>>> set up.
>>>>> Fix this by starting keep-alive from the ->init_ctrl_finish() callback,
>>>>> and stopping it when calling nvme_cancel_admin_tagset().
>>>>
>>>> If this is a fix, the title should describe the issue it is fixing, and
>>>> the body should say how it is fixing it.
>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>>> ---
>>>>>   drivers/nvme/host/core.c | 6 +++---
>>>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index 62612f87aafa..f48b4f735d2d 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>>>   {
>>>>> +    nvme_stop_keep_alive(ctrl);
>>>>>       if (ctrl->admin_tagset) {
>>>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>>>                   nvme_cancel_request, ctrl);
>>>>
>>>> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
>>>> have the keep-alive stopped first, which may be waiting on I/O, which
>>>> needs to be cancelled...
>>>>
>>>> Keep in mind that kato can be arbitrarily long, and now this function
>>>> may be blocked on this kato period.
>>>>
>>>> I also think that now the function is doing something that is more
>>>> than simply cancelling the inflight admin tagset, as it is named.
>>>>
>>> I am having a hard time following this reasoning.
>>> While I do accept that nvme_stop_keep_alive() might trigger I/O
>>> (i.e. if the work queue has just been started when calling
>>> cancel_delayed_work), nvme_tcp_error_recovery_work() has this:
>>>
>>>      nvme_stop_keep_alive(ctrl);
>>>      flush_work(&ctrl->async_event_work);
>>>      nvme_tcp_teardown_io_queues(ctrl, false);
>>>      /* unquiesce to fail fast pending requests */
>>>      nvme_unquiesce_io_queues(ctrl);
>>>      nvme_tcp_teardown_admin_queue(ctrl, false);
>>>
>>> and nvme_tcp_teardown_admin_queue() calls nvme_cancel_admin_tagset().
>>> So by your above reasoning this code should be wrong, too.
>>> What am I missing?
>>
>> Need to dig through the history, but it can most definitely move to
>> after the teardown. It could be from the earlier days where the
>> transport fencing was not as reliable particularly for admin requests.
> 
> Ah, no. We are both wrong. From the kerneldoc of cancel_delayed_work():
> 
>   * Note:
>   * The work callback function may still be running on return, unless
>   * it returns %true and the work doesn't re-arm itself.  Explicitly
>   * flush or use cancel_delayed_work_sync() to wait on it.
> 
> Hence we won't be blocked by running I/O when calling
> nvme_stop_keep_alive().
> So the correct sequence would indeed be to call nvme_stop_keep_alive()
> first, and then disable/cancel the admin tagset (which will terminate
> any outstanding keep-alive requests).

Yes, you're correct. Because keep-alive is not a sync command it
won't block on I/O. However, I still don't see why we should change
where it lives today (i.e. the choice to move it to the odd location
of nvme_cancel_admin_tagset).

I agree that we can move start_keep_alive earlier.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-21  9:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-24  6:13 [PATCHv2 0/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
2023-10-24  6:13 ` [PATCH 1/3] nvme-tcp: avoid open-coding nvme_tcp_teardown_admin_queue() Hannes Reinecke
2023-10-24  6:13 ` [PATCH 2/3] nvme-loop: always quiesce and cancel commands before destroying admin q Hannes Reinecke
2023-10-24  6:13 ` [PATCH 3/3] nvme: start keep-alive after admin queue setup Hannes Reinecke
2023-11-06 17:17   ` Keith Busch
2023-11-20 13:39   ` Sagi Grimberg
2023-11-20 14:19     ` Hannes Reinecke
2023-11-20 14:25       ` Sagi Grimberg
2023-11-20 15:05         ` Hannes Reinecke
2023-11-20 19:03           ` Sagi Grimberg
2023-11-20 16:01     ` Hannes Reinecke
2023-11-20 19:05       ` Sagi Grimberg
2023-11-21  7:29         ` Hannes Reinecke
2023-11-21  9:16           ` Sagi Grimberg
2023-10-26 16:00 ` [PATCHv2 0/3] " Mark O'Donovan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox