From: sagi@grimberg.me (sagig)
Subject: [PATCHv2] NVMe: Fix reset/remove race
Date: Sun, 3 Apr 2016 19:38:46 +0300 [thread overview]
Message-ID: <57014716.5050901@grimberg.me> (raw)
In-Reply-To: <1458683375-27496-1-git-send-email-keith.busch@intel.com>
On 22/03/16 23:49, Keith Busch wrote:
> This fixes a scenario where device is present and being reset, but a
> request to unbind the driver occurs.
>
> A previous patch series addressing a device failure removal scenario
> flushed reset_work after controller disable to unblock reset_work waiting
> on a completion that wouldn't occur. This isn't safe as-is. The broken
> scenario can potentially be induced with:
>
> modprobe nvme && modprobe -r nvme
>
> To fix, the reset work is flushed immediately after setting the controller
> removing flag, and any subsequent reset will not proceed with controller
> initialization if the flag is set.
>
> The controller status must be polled while active, so the watchdog timer
> is also left active until the controller is disabled to cleanup requests
> that may be stuck during namespace removal.
>
> [Fixes: ff23a2a15a2117245b4599c1352343c8b8fb4c43]
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---
> v1->v2:
> Removed the untested change on IO timeout handling that skipped queueing
> reset work.
>
> drivers/nvme/host/pci.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 24ccda3..660ec84 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1859,6 +1859,9 @@ static void nvme_reset_work(struct work_struct *work)
> if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
> nvme_dev_disable(dev, false);
>
> + if (test_bit(NVME_CTRL_REMOVING, &dev->flags))
> + goto out;
> +
> set_bit(NVME_CTRL_RESETTING, &dev->flags);
>
> result = nvme_pci_enable(dev);
> @@ -2078,11 +2081,10 @@ static void nvme_remove(struct pci_dev *pdev)
> {
> struct nvme_dev *dev = pci_get_drvdata(pdev);
>
> - del_timer_sync(&dev->watchdog_timer);
> -
> set_bit(NVME_CTRL_REMOVING, &dev->flags);
> pci_set_drvdata(pdev, NULL);
> flush_work(&dev->async_work);
> + flush_work(&dev->reset_work);
Do we need the same for scan_work? AFAICT it can still
sneak in while we're removing...
next prev parent reply other threads:[~2016-04-03 16:38 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-22 21:49 [PATCHv2] NVMe: Fix reset/remove race Keith Busch
2016-03-23 7:44 ` Christoph Hellwig
2016-03-23 7:56 ` Johannes Thumshirn
2016-04-03 16:38 ` sagig [this message]
2016-04-04 19:14 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57014716.5050901@grimberg.me \
--to=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.