All of lore.kernel.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org, Sagi Grimberg <sagi@grimberg.me>,
	Yi Zhang <yi.zhang@redhat.com>
Subject: Re: [PATCH] nvme-pci: fix race between pci reset and nvme probe
Date: Mon, 1 Aug 2022 08:33:01 -0600	[thread overview]
Message-ID: <YufkHXoUrw7jRLuT@kbusch-mbp> (raw)
In-Reply-To: <20220801125753.1434024-1-ming.lei@redhat.com>

On Mon, Aug 01, 2022 at 08:57:53PM +0800, Ming Lei wrote:
> After nvme_probe() returns, device lock is released, and PCI reset
> handler may come, meantime reset work is just scheduled and should
> be in-progress.
> 
> When nvme_reset_prepare() is run, all NSs may not be allocated yet
> and each NS's request queue won't be frozen by nvme_dev_disable().
> 
> But when nvme_reset_done() is called for resetting controller, all
> NSs may have been scanned successfully, and nvme_wait_freeze() is
> called on un-frozen request queues, then wait forever.
> 
> Fix the issue by holding device lock for resetting from nvme probe.
> 
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/nvme/host/pci.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4232192e10dd..d49b1a082983 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3075,9 +3075,14 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
>  static void nvme_async_probe(void *data, async_cookie_t cookie)
>  {
>  	struct nvme_dev *dev = data;
> +	struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
> +	pci_dev_lock(pdev);
> +	nvme_reset_ctrl(&dev->ctrl);
>  	flush_work(&dev->ctrl.reset_work);
>  	flush_work(&dev->ctrl.scan_work);
> +	pci_dev_unlock(pdev);
> +
>  	nvme_put_ctrl(&dev->ctrl);
>  }

When low on memory, async_schedule() falls back to calling the requested
function directly, so this would deadlock on taking the pci_dev_lock() the
second time within the probe context.

If it is successfully scheduled asynchronously, holding the lock blocks a hot
removal, which might be the only thing that can unblock the nvme reset_work
from forward progress.

If you are encountering a nvme_reset_prepare() condition during scanning, that
might indicate a failure to communicate with the end device. The scan work may
need the error handling to unblock it.

> @@ -3154,7 +3159,6 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  
>  	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>  
> -	nvme_reset_ctrl(&dev->ctrl);
>  	async_schedule(nvme_async_probe, dev);
>  
>  	return 0;
> -- 
> 2.31.1
> 


  reply	other threads:[~2022-08-01 14:33 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-01 12:57 [PATCH] nvme-pci: fix race between pci reset and nvme probe Ming Lei
2022-08-01 14:33 ` Keith Busch [this message]
2022-08-02  0:05   ` Ming Lei
2022-08-02  2:22     ` Keith Busch
2022-08-04 21:55 ` Keith Busch
2022-08-05 12:34   ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YufkHXoUrw7jRLuT@kbusch-mbp \
    --to=kbusch@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-nvme@lists.infradead.org \
    --cc=ming.lei@redhat.com \
    --cc=sagi@grimberg.me \
    --cc=yi.zhang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.