All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] nvme: don't ignore tagset allocation failures
@ 2017-03-29  9:03 Max Gurtovoy
  2017-03-29 17:19 ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2017-03-29  9:03 UTC (permalink / raw)


the nvme_dev_add() function silently ignores failures.
In case blk_mq_alloc_tag_set fails, we hit NULL deref while
calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL.
Instead, we'll not issue the scan_work in case tagset allocation
failed and leave the ctrl functional.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
Reviewed-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9b3b57f..493722a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
 	/*
 	 * Do not queue new scan work when a controller is reset during
-	 * removal.
+	 * removal or if the tagset doesn't exist.
 	 */
-	if (ctrl->state == NVME_CTRL_LIVE)
+	if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset)
 		schedule_work(&ctrl->scan_work);
 }
 EXPORT_SYMBOL_GPL(nvme_queue_scan);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-29  9:03 [PATCH 1/1] nvme: don't ignore tagset allocation failures Max Gurtovoy
@ 2017-03-29 17:19 ` Sagi Grimberg
  2017-03-29 17:32   ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2017-03-29 17:19 UTC (permalink / raw)



> the nvme_dev_add() function silently ignores failures.
> In case blk_mq_alloc_tag_set fails, we hit NULL deref while
> calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL.
> Instead, we'll not issue the scan_work in case tagset allocation
> failed and leave the ctrl functional.
>
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> Reviewed-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/core.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 9b3b57f..493722a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl)
>  {
>  	/*
>  	 * Do not queue new scan work when a controller is reset during
> -	 * removal.
> +	 * removal or if the tagset doesn't exist.
>  	 */
> -	if (ctrl->state == NVME_CTRL_LIVE)
> +	if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset)
>  		schedule_work(&ctrl->scan_work);

This looks wrong...

Why is the controller LIVE without a tagset allocated?
I think you should handle it in the pci driver.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-29 17:19 ` Sagi Grimberg
@ 2017-03-29 17:32   ` Keith Busch
  2017-03-29 17:42     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2017-03-29 17:32 UTC (permalink / raw)


On Wed, Mar 29, 2017@08:19:46PM +0300, Sagi Grimberg wrote:
> 
> > the nvme_dev_add() function silently ignores failures.
> > In case blk_mq_alloc_tag_set fails, we hit NULL deref while
> > calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL.
> > Instead, we'll not issue the scan_work in case tagset allocation
> > failed and leave the ctrl functional.
> > 
> > Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> > Reviewed-by: Keith Busch <keith.busch at intel.com>
> > ---
> >  drivers/nvme/host/core.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 9b3b57f..493722a 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl)
> >  {
> >  	/*
> >  	 * Do not queue new scan work when a controller is reset during
> > -	 * removal.
> > +	 * removal or if the tagset doesn't exist.
> >  	 */
> > -	if (ctrl->state == NVME_CTRL_LIVE)
> > +	if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset)
> >  		schedule_work(&ctrl->scan_work);
> 
> This looks wrong...
> 
> Why is the controller LIVE without a tagset allocated?
> I think you should handle it in the pci driver.

Not having a tagset doesn't mean we can't go live; it just means we can't
do IO, but the admin handle is still up for device management. Also,
nvme_queue_scan can be triggered from places outside nvme pci's control,
so I think Max's patch is an appropriate place to check.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-29 17:32   ` Keith Busch
@ 2017-03-29 17:42     ` Sagi Grimberg
  2017-03-29 23:10       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2017-03-29 17:42 UTC (permalink / raw)



> Not having a tagset doesn't mean we can't go live; it just means we can't
> do IO, but the admin handle is still up for device management.

So don't queue ns scanning...

> Also, nvme_queue_scan can be triggered from places outside nvme pci's control,
> so I think Max's patch is an appropriate place to check.

How is that not racy?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-29 17:42     ` Sagi Grimberg
@ 2017-03-29 23:10       ` Keith Busch
  2017-03-30  5:47         ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2017-03-29 23:10 UTC (permalink / raw)


On Wed, Mar 29, 2017@08:42:33PM +0300, Sagi Grimberg wrote:
> 
> > Not having a tagset doesn't mean we can't go live; it just means we can't
> > do IO, but the admin handle is still up for device management.
> 
> So don't queue ns scanning...

And what about the sysfs rescan or namespace notify async event? We have
to fence those off too, so doing it in one place sounds better than
three.
 
> > Also, nvme_queue_scan can be triggered from places outside nvme pci's control,
> > so I think Max's patch is an appropriate place to check.
> 
> How is that not racy?

The tagset can only be allocated while the controller is not "LIVE", so
that satisfies synchronizing this resource.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-29 23:10       ` Keith Busch
@ 2017-03-30  5:47         ` Sagi Grimberg
  2017-03-30  7:13           ` Max Gurtovoy
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2017-03-30  5:47 UTC (permalink / raw)



>>> Not having a tagset doesn't mean we can't go live; it just means we can't
>>> do IO, but the admin handle is still up for device management.
>>
>> So don't queue ns scanning...
>
> And what about the sysfs rescan or namespace notify async event? We have
> to fence those off too, so doing it in one place sounds better than
> three.

True, but it still feels wrong that the pci driver fails tagset
allocation and continues to scan namespaces as if nothing happened, then
queue namespaces checks for the existence of a tagset... It looks awkward...

Why is the controller moving to LIVE anyway? Obviously something went
wrong, which is either a SW BUG (EINVAL), or a temporary failure that
needs to be retried again (ENOMEM). So I think the proper solution would
be to either retry again later for transient errors and for
non-transient errors simply log the error remove the controller (its
not a device issue so no point in keeping the controller around for
error log query).

Thoughts?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-30  5:47         ` Sagi Grimberg
@ 2017-03-30  7:13           ` Max Gurtovoy
  2017-03-30 13:05             ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2017-03-30  7:13 UTC (permalink / raw)




On 3/30/2017 8:47 AM, Sagi Grimberg wrote:
>
>>>> Not having a tagset doesn't mean we can't go live; it just means we
>>>> can't
>>>> do IO, but the admin handle is still up for device management.
>>>
>>> So don't queue ns scanning...
>>
>> And what about the sysfs rescan or namespace notify async event? We have
>> to fence those off too, so doing it in one place sounds better than
>> three.
>
> True, but it still feels wrong that the pci driver fails tagset
> allocation and continues to scan namespaces as if nothing happened, then
> queue namespaces checks for the existence of a tagset... It looks
> awkward...
>
> Why is the controller moving to LIVE anyway? Obviously something went
> wrong, which is either a SW BUG (EINVAL), or a temporary failure that
> needs to be retried again (ENOMEM). So I think the proper solution would
> be to either retry again later for transient errors and for
> non-transient errors simply log the error remove the controller (its
> not a device issue so no point in keeping the controller around for
> error log query).
>
> Thoughts?


hi Sagi,

what do you think about this solution that we don't keep the ctrl:

The nvme_dev_add() function silently ignores failures.
In case blk_mq_alloc_tag_set fails, we hit NULL deref while
calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL.
Instead we'll remove the dead ctrl as it can't be functional anymore.
Also remove old and wrong comment.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>

---
  drivers/nvme/host/pci.c |   19 +++++++++----------
  1 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3faefab..3964e1d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1545,14 +1545,10 @@ static void nvme_disable_io_queues(struct 
nvme_dev *dev, int queues)
  	}
  }

-/*
- * Return: error value if an error occurred setting up the queues or 
calling
- * Identify Device.  0 if these succeeded, even if adding some of the
- * namespaces failed.  At the moment, these failures are silent.  TBD which
- * failures should be reported.
- */
  static int nvme_dev_add(struct nvme_dev *dev)
  {
+	int ret = 0;
+
  	if (!dev->ctrl.tagset) {
  		dev->tagset.ops = &nvme_mq_ops;
  		dev->tagset.nr_hw_queues = dev->online_queues - 1;
@@ -1564,8 +1560,9 @@ static int nvme_dev_add(struct nvme_dev *dev)
  		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
  		dev->tagset.driver_data = dev;

-		if (blk_mq_alloc_tag_set(&dev->tagset))
-			return 0;
+		ret = blk_mq_alloc_tag_set(&dev->tagset);
+		if (ret)
+			return ret;
  		dev->ctrl.tagset = &dev->tagset;
  	} else {
  		blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1);
@@ -1574,7 +1571,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
  		nvme_free_queues(dev, dev->online_queues);
  	}

-	return 0;
+	return ret;
  }

  static int nvme_pci_enable(struct nvme_dev *dev)
@@ -1811,7 +1808,9 @@ static void nvme_reset_work(struct work_struct *work)
  		nvme_remove_namespaces(&dev->ctrl);
  	} else {
  		nvme_start_queues(&dev->ctrl);
-		nvme_dev_add(dev);
+		result = nvme_dev_add(dev);
+		if (result)
+			goto out;
  	}

  	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
-- 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 1/1] nvme: don't ignore tagset allocation failures
  2017-03-30  7:13           ` Max Gurtovoy
@ 2017-03-30 13:05             ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2017-03-30 13:05 UTC (permalink / raw)



> hi Sagi,
>
> what do you think about this solution that we don't keep the ctrl:

I'd say that if you got a transient error, try reset again...

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-03-30 13:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-29  9:03 [PATCH 1/1] nvme: don't ignore tagset allocation failures Max Gurtovoy
2017-03-29 17:19 ` Sagi Grimberg
2017-03-29 17:32   ` Keith Busch
2017-03-29 17:42     ` Sagi Grimberg
2017-03-29 23:10       ` Keith Busch
2017-03-30  5:47         ` Sagi Grimberg
2017-03-30  7:13           ` Max Gurtovoy
2017-03-30 13:05             ` Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.