* [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
@ 2018-01-29  3:07 Jianchao Wang
  2018-01-29  8:06 ` jianchao.wang
  2018-01-29 16:01 ` Keith Busch
  0 siblings, 2 replies; 7+ messages in thread
From: Jianchao Wang @ 2018-01-29  3:07 UTC
  To: keith.busch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

nvme_set_host_mem invokes nvme_alloc_request without the NOWAIT
flag, which is unsafe for nvme_dev_disable: the admin queue's
driver tags may all be in use when previously submitted admin
requests cannot complete due to a hardware error, and we then
have to depend on the timeout path to complete those requests
and free their tags.
However, nvme_timeout invokes nvme_dev_disable and tries to take
the shutdown_lock, which is held by another context that is
sleeping while it waits for the timeout path to free a tag. The
result is a deadlock.

To fix this, make nvme_set_host_mem use the NOWAIT flag.
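
Sketched as two call chains (paraphrased against the 4.15-era code
paths, not verbatim):

----
/* Context A (e.g. reset_work): takes shutdown_lock, then sleeps
 * waiting for an admin tag that only the timeout path can free. */
nvme_dev_disable(dev, false)
  mutex_lock(&dev->shutdown_lock);
  nvme_set_host_mem(dev, 0)
    nvme_submit_sync_cmd(...)           /* no BLK_MQ_REQ_NOWAIT */
      nvme_alloc_request(...)
        blk_mq_alloc_request(...)       /* sleeps until a tag frees */

/* Context B (timeout work): would free those tags, but first needs
 * the lock that context A holds -> deadlock. */
nvme_timeout(req)
  nvme_dev_disable(dev, false)
    mutex_lock(&dev->shutdown_lock);    /* blocks forever */
----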

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6fe7af0..9532529 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1736,7 +1736,8 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
 	c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
 	c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
 
-	ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
+	ret = __nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, NULL, 0, 0,
+			NVME_QID_ANY, 0, BLK_MQ_REQ_NOWAIT);
 	if (ret) {
 		dev_warn(dev->ctrl.device,
 			 "failed to set host mem (err %d, flags %#x).\n",
-- 
2.7.4


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-29  3:07 [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem Jianchao Wang
@ 2018-01-29  8:06 ` jianchao.wang
  2018-01-29 16:01 ` Keith Busch
  1 sibling, 0 replies; 7+ messages in thread
From: jianchao.wang @ 2018-01-29  8:06 UTC
  To: keith.busch, axboe, hch, sagi; +Cc: linux-kernel, linux-nvme



On 01/29/2018 11:07 AM, Jianchao Wang wrote:
> nvme_set_host_mem invokes nvme_alloc_request without the NOWAIT
> flag, which is unsafe for nvme_dev_disable: the admin queue's
> driver tags may all be in use when previously submitted admin
> requests cannot complete due to a hardware error, and we then
> have to depend on the timeout path to complete those requests
> and free their tags.
> However, nvme_timeout invokes nvme_dev_disable and tries to take
> the shutdown_lock, which is held by another context that is
> sleeping while it waits for the timeout path to free a tag. The
> result is a deadlock.
> 
> To fix this, make nvme_set_host_mem use the NOWAIT flag.

In fact, the nvme_set_host_mem call is the only place in
nvme_dev_disable with this problem.
Consider the following case:

An adminq request expires.

timeout_work context
nvme_timeout
  -> nvme_dev_disable
    -> nvme_set_host_mem
    if the admin command issued by nvme_set_host_mem expires as
    well, timeout_work cannot handle it, because it is still
    waiting for the result of nvme_set_host_mem.
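
Because blk-mq runs a queue's timeouts from a single kblockd work
item, there is no second context to rescue it (a paraphrased sketch,
not verbatim code):

----
/* the one timeout work item for the admin queue: */
nvme_timeout(expired_req)
  nvme_dev_disable(dev, false)
    nvme_set_host_mem(dev, 0)   /* submits a new admin command and
                                   sleeps waiting for its completion */

/* If that command expires too, only this same work item (already
 * blocked above) could handle the timeout, so it never completes. */
----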

Thanks
Jianchao


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-29  3:07 [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem Jianchao Wang
  2018-01-29  8:06 ` jianchao.wang
@ 2018-01-29 16:01 ` Keith Busch
  2018-01-29 19:55   ` Sagi Grimberg
  1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2018-01-29 16:01 UTC
  To: Jianchao Wang; +Cc: axboe, hch, sagi, linux-nvme, linux-kernel

On Mon, Jan 29, 2018 at 11:07:35AM +0800, Jianchao Wang wrote:
> nvme_set_host_mem invokes nvme_alloc_request without the NOWAIT
> flag, which is unsafe for nvme_dev_disable: the admin queue's
> driver tags may all be in use when previously submitted admin
> requests cannot complete due to a hardware error, and we then
> have to depend on the timeout path to complete those requests
> and free their tags.
> However, nvme_timeout invokes nvme_dev_disable and tries to take
> the shutdown_lock, which is held by another context that is
> sleeping while it waits for the timeout path to free a tag. The
> result is a deadlock.
> 
> To fix this, make nvme_set_host_mem use the NOWAIT flag.
> 
> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>

Thanks for the fix. It looks like we still have a problem, though.
Commands submitted with the "shutdown_lock" held need to be able to make
forward progress without relying on a completion, but this one could
block indefinitely.
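
Concretely (paraphrasing the 4.15-era __nvme_submit_sync_cmd(), not
verbatim): BLK_MQ_REQ_NOWAIT only makes the tag allocation fail fast;
the command is still executed synchronously.

----
	req = nvme_alloc_request(q, cmd, flags, qid);  /* NOWAIT helps here */
	if (IS_ERR(req))
		return PTR_ERR(req);

	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
	blk_execute_rq(req->q, NULL, req, at_head);    /* still waits for
							  the completion */
----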


>  drivers/nvme/host/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 6fe7af0..9532529 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1736,7 +1736,8 @@ static int nvme_set_host_mem(struct nvme_dev *dev, u32 bits)
>  	c.features.dword14	= cpu_to_le32(upper_32_bits(dma_addr));
>  	c.features.dword15	= cpu_to_le32(dev->nr_host_mem_descs);
>  
> -	ret = nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
> +	ret = __nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, NULL, 0, 0,
> +			NVME_QID_ANY, 0, BLK_MQ_REQ_NOWAIT);
>  	if (ret) {
>  		dev_warn(dev->ctrl.device,
>  			 "failed to set host mem (err %d, flags %#x).\n",
> -- 


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-29 16:01 ` Keith Busch
@ 2018-01-29 19:55   ` Sagi Grimberg
  2018-01-29 20:17     ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2018-01-29 19:55 UTC
  To: Keith Busch, Jianchao Wang; +Cc: axboe, hch, linux-nvme, linux-kernel

Hey Keith,

> Thanks for the fix. It looks like we still have a problem, though.
> Commands submitted with the "shutdown_lock" held need to be able to make
> forward progress without relying on a completion, but this one could
> block indefinitely.

Can you explain to me why the shutdown_lock is needed to synchronize
nvme_dev_disable? More concretely, how is nvme_dev_disable different
from other places where we rely on the ctrl state to serialize stuff?

The only reason I can see would be to protect against a
completion-after-abort scenario, but I think the block layer should
protect against that (it checks whether the request's timeout timer
has fired).
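
That is (paraphrasing the 4.15-era block layer, not verbatim):

----
void blk_mq_complete_request(struct request *rq)
{
	/* blk_mark_rq_complete() is test_and_set_bit(REQ_ATOM_COMPLETE),
	 * so a request the timeout path has already claimed cannot be
	 * completed a second time by a late hardware completion. */
	if (!blk_mark_rq_complete(rq))
		__blk_mq_complete_request(rq);
}
----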


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-29 19:55   ` Sagi Grimberg
@ 2018-01-29 20:17     ` Keith Busch
  2018-01-30  3:41       ` jianchao.wang
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2018-01-29 20:17 UTC
  To: Sagi Grimberg; +Cc: Jianchao Wang, axboe, hch, linux-nvme, linux-kernel

On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote:
> > Thanks for the fix. It looks like we still have a problem, though.
> > Commands submitted with the "shutdown_lock" held need to be able to make
> > forward progress without relying on a completion, but this one could
> > block indefinitely.
> 
> Can you explain to me why the shutdown_lock is needed to synchronize
> nvme_dev_disable? More concretely, how is nvme_dev_disable different
> from other places where we rely on the ctrl state to serialize stuff?
> 
> The only reason I can see would be to protect against a
> completion-after-abort scenario, but I think the block layer should
> protect against that (it checks whether the request's timeout timer
> has fired).

We can probably find a way to use the state machine for this. Disabling
the controller pre-dates the state machine, and the mutex is there to
protect against two actors shutting the controller down at the same
time, like a hot removal at the same time as a timeout handling reset.
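
For reference, the serialization in question (paraphrased from the
4.15-era nvme_dev_disable()):

----
static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
{
	mutex_lock(&dev->shutdown_lock);  /* one "disable" at a time */
	/* ... quiesce queues, delete I/O queues, disable the device ... */
	mutex_unlock(&dev->shutdown_lock);
}
----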


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-29 20:17     ` Keith Busch
@ 2018-01-30  3:41       ` jianchao.wang
  2018-01-30 16:13         ` Keith Busch
  0 siblings, 1 reply; 7+ messages in thread
From: jianchao.wang @ 2018-01-30  3:41 UTC
  To: Keith Busch, Sagi Grimberg; +Cc: axboe, hch, linux-nvme, linux-kernel

Hi Keith and Sagi

Thanks for your kind response. :)

On 01/30/2018 04:17 AM, Keith Busch wrote:
> On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote:
>>> Thanks for the fix. It looks like we still have a problem, though.
>>> Commands submitted with the "shutdown_lock" held need to be able to make
>>> forward progress without relying on a completion, but this one could
>>> block indefinitely.
>>
>> Can you explain to me why the shutdown_lock is needed to synchronize
>> nvme_dev_disable? More concretely, how is nvme_dev_disable different
>> from other places where we rely on the ctrl state to serialize stuff?
>>
>> The only reason I can see would be to protect against a
>> completion-after-abort scenario, but I think the block layer should
>> protect against that (it checks whether the request's timeout timer
>> has fired).
> 
> We can probably find a way to use the state machine for this. Disabling
> the controller pre-dates the state machine, and the mutex is there to
> protect against two actors shutting the controller down at the same
> time, like a hot removal at the same time as a timeout handling reset.
> 

Another point that confuses me is whether nvme_set_host_mem is necessary
in nvme_dev_disable at all.
As the comment says:
----
		/*
		 * If the controller is still alive tell it to stop using the
		 * host memory buffer.  In theory the shutdown / reset should
		 * make sure that it doesn't access the host memory anymore,
		 * but I'd rather be safe than sorry..
		 */
		if (dev->host_mem_descs)
			nvme_set_host_mem(dev, 0);
----
This is to prevent the controller from accessing host memory. But the
host memory is only freed in nvme_remove, so it looks like we only need
to do this in nvme_remove.
For example:
----
@@ -2553,6 +2545,14 @@ static void nvme_remove(struct pci_dev *pdev)
        flush_work(&dev->ctrl.reset_work);
        nvme_stop_ctrl(&dev->ctrl);
        nvme_remove_namespaces(&dev->ctrl);
+       /*
+        * If the controller is still alive tell it to stop using the
+        * host memory buffer.  In theory the shutdown / reset should
+        * make sure that it doesn't access the host memory anymore,
+        * but I'd rather be safe than sorry..
+        */
+       if (dev->host_mem_descs)
+               nvme_set_host_mem(dev, 0);
        nvme_dev_disable(dev, true);
        nvme_free_host_mem(dev);
----

If I have missed anything, please point it out.
That would be really appreciated.

Sincerely
Jianchao


* Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem
  2018-01-30  3:41       ` jianchao.wang
@ 2018-01-30 16:13         ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2018-01-30 16:13 UTC
  To: jianchao.wang; +Cc: Sagi Grimberg, axboe, hch, linux-nvme, linux-kernel

On Tue, Jan 30, 2018 at 11:41:07AM +0800, jianchao.wang wrote:
> Another point that confuses me is whether nvme_set_host_mem is necessary
> in nvme_dev_disable at all.
> As the comment says:
> ----
> 		/*
> 		 * If the controller is still alive tell it to stop using the
> 		 * host memory buffer.  In theory the shutdown / reset should
> 		 * make sure that it doesn't access the host memory anymore,
> 		 * but I'd rather be safe than sorry..
> 		 */
> 		if (dev->host_mem_descs)
> 			nvme_set_host_mem(dev, 0);
> ----
> This is to prevent the controller from accessing host memory. But the
> host memory is only freed in nvme_remove, so it looks like we only need
> to do this in nvme_remove.
> For example:
> ----
> @@ -2553,6 +2545,14 @@ static void nvme_remove(struct pci_dev *pdev)
>         flush_work(&dev->ctrl.reset_work);
>         nvme_stop_ctrl(&dev->ctrl);
>         nvme_remove_namespaces(&dev->ctrl);
> +       /*
> +        * If the controller is still alive tell it to stop using the
> +        * host memory buffer.  In theory the shutdown / reset should
> +        * make sure that it doesn't access the host memory anymore,
> +        * but I'd rather be safe than sorry..
> +        */
> +       if (dev->host_mem_descs)
> +               nvme_set_host_mem(dev, 0);
>         nvme_dev_disable(dev, true);
>         nvme_free_host_mem(dev);
> ----
> 
> If I have missed anything, please point it out.
> That would be really appreciated.

I don't make any such controller, so I can't speak to the necessity of
having this here. I just think having it in the controller disabling
path is for safety. We want the controller to acknowledge that it has
completed any use of the HMB before we make it so it can't access it.
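
In other words, the teardown order we want is (a sketch; all three
helpers exist in the 4.15-era driver):

----
nvme_set_host_mem(dev, 0);    /* 1. controller acks it is done with the HMB */
nvme_dev_disable(dev, true);  /* 2. shut the controller down */
nvme_free_host_mem(dev);      /* 3. only now free the host memory */
----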

