* [PATCH 1/1] nvme: don't ignore tagset allocation failures @ 2017-03-29 9:03 Max Gurtovoy 2017-03-29 17:19 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Max Gurtovoy @ 2017-03-29 9:03 UTC (permalink / raw) the nvme_dev_add() function silently ignores failures. In case blk_mq_alloc_tag_set fails, we hit NULL deref while calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL. Instead, we'll not issue the scan_work in case tagset allocation failed and leave the ctrl functional. Signed-off-by: Max Gurtovoy <maxg at mellanox.com> Reviewed-by: Keith Busch <keith.busch at intel.com> --- drivers/nvme/host/core.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 9b3b57f..493722a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl) { /* * Do not queue new scan work when a controller is reset during - * removal. + * removal or if the tagset doesn't exist. */ - if (ctrl->state == NVME_CTRL_LIVE) + if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset) schedule_work(&ctrl->scan_work); } EXPORT_SYMBOL_GPL(nvme_queue_scan); -- 1.7.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-29 9:03 [PATCH 1/1] nvme: don't ignore tagset allocation failures Max Gurtovoy @ 2017-03-29 17:19 ` Sagi Grimberg 2017-03-29 17:32 ` Keith Busch 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2017-03-29 17:19 UTC (permalink / raw) > the nvme_dev_add() function silently ignores failures. > In case blk_mq_alloc_tag_set fails, we hit NULL deref while > calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL. > Instead, we'll not issue the scan_work in case tagset allocation > failed and leave the ctrl functional. > > Signed-off-by: Max Gurtovoy <maxg at mellanox.com> > Reviewed-by: Keith Busch <keith.busch at intel.com> > --- > drivers/nvme/host/core.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 9b3b57f..493722a 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl) > { > /* > * Do not queue new scan work when a controller is reset during > - * removal. > + * removal or if the tagset doesn't exist. > */ > - if (ctrl->state == NVME_CTRL_LIVE) > + if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset) > schedule_work(&ctrl->scan_work); This looks wrong... Why is the controller LIVE without a tagset allocated? I think you should handle it in the pci driver. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-29 17:19 ` Sagi Grimberg @ 2017-03-29 17:32 ` Keith Busch 2017-03-29 17:42 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Keith Busch @ 2017-03-29 17:32 UTC (permalink / raw) On Wed, Mar 29, 2017@08:19:46PM +0300, Sagi Grimberg wrote: > > > the nvme_dev_add() function silently ignores failures. > > In case blk_mq_alloc_tag_set fails, we hit NULL deref while > > calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL. > > Instead, we'll not issue the scan_work in case tagset allocation > > failed and leave the ctrl functional. > > > > Signed-off-by: Max Gurtovoy <maxg at mellanox.com> > > Reviewed-by: Keith Busch <keith.busch at intel.com> > > --- > > drivers/nvme/host/core.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 9b3b57f..493722a 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -2115,9 +2115,9 @@ void nvme_queue_scan(struct nvme_ctrl *ctrl) > > { > > /* > > * Do not queue new scan work when a controller is reset during > > - * removal. > > + * removal or if the tagset doesn't exist. > > */ > > - if (ctrl->state == NVME_CTRL_LIVE) > > + if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset) > > schedule_work(&ctrl->scan_work); > > This looks wrong... > > Why is the controller LIVE without a tagset allocated? > I think you should handle it in the pci driver. Not having a tagset doesn't mean we can't go live; it just means we can't do IO, but the admin handle is still up for device management. Also, nvme_queue_scan can be triggered from places outside nvme pci's control, so I think Max's patch is an appropriate place to check. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-29 17:32 ` Keith Busch @ 2017-03-29 17:42 ` Sagi Grimberg 2017-03-29 23:10 ` Keith Busch 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2017-03-29 17:42 UTC (permalink / raw) > Not having a tagset doesn't mean we can't go live; it just means we can't > do IO, but the admin handle is still up for device management. So don't queue ns scanning... > Also, nvme_queue_scan can be triggered from places outside nvme pci's control, > so I think Max's patch is an appropriate place to check. How is that not racy? ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-29 17:42 ` Sagi Grimberg @ 2017-03-29 23:10 ` Keith Busch 2017-03-30 5:47 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Keith Busch @ 2017-03-29 23:10 UTC (permalink / raw) On Wed, Mar 29, 2017@08:42:33PM +0300, Sagi Grimberg wrote: > > > Not having a tagset doesn't mean we can't go live; it just means we can't > > do IO, but the admin handle is still up for device management. > > So don't queue ns scanning... And what about the sysfs rescan or namespace notify async event? We have to fence those off too, so doing it in one place sounds better than three. > > Also, nvme_queue_scan can be triggered from places outside nvme pci's control, > > so I think Max's patch is an appropriate place to check. > > How is that not racy? The tagset can only be allocated while the controller is not "LIVE", so that satisfies synchronizing this resource. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-29 23:10 ` Keith Busch @ 2017-03-30 5:47 ` Sagi Grimberg 2017-03-30 7:13 ` Max Gurtovoy 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2017-03-30 5:47 UTC (permalink / raw) >>> Not having a tagset doesn't mean we can't go live; it just means we can't >>> do IO, but the admin handle is still up for device management. >> >> So don't queue ns scanning... > > And what about the sysfs rescan or namespace notify async event? We have > to fence those off too, so doing it in one place sounds better than > three. True, but it still feels wrong that the pci driver fails tagset allocation and continues to scan namespaces as if nothing happened, then queue namespaces checks for the existence of a tagset... It looks awkward... Why is the controller moving to LIVE anyway? Obviously something went wrong, which is either a SW BUG (EINVAL), or a temporary failure that needs to be retried again (ENOMEM). So I think the proper solution would be to either retry again later for transient errors and for non-transient errors simply log the error remove the controller (its not a device issue so no point in keeping the controller around for error log query). Thoughts? ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-30 5:47 ` Sagi Grimberg @ 2017-03-30 7:13 ` Max Gurtovoy 2017-03-30 13:05 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Max Gurtovoy @ 2017-03-30 7:13 UTC (permalink / raw) On 3/30/2017 8:47 AM, Sagi Grimberg wrote: > >>>> Not having a tagset doesn't mean we can't go live; it just means we >>>> can't >>>> do IO, but the admin handle is still up for device management. >>> >>> So don't queue ns scanning... >> >> And what about the sysfs rescan or namespace notify async event? We have >> to fence those off too, so doing it in one place sounds better than >> three. > > True, but it still feels wrong that the pci driver fails tagset > allocation and continues to scan namespaces as if nothing happened, then > queue namespaces checks for the existence of a tagset... It looks > awkward... > > Why is the controller moving to LIVE anyway? Obviously something went > wrong, which is either a SW BUG (EINVAL), or a temporary failure that > needs to be retried again (ENOMEM). So I think the proper solution would > be to either retry again later for transient errors and for > non-transient errors simply log the error remove the controller (its > not a device issue so no point in keeping the controller around for > error log query). > > Thoughts? hi Sagi, what do you think about this solution that we don't keep the ctrl: The nvme_dev_add() function silently ignores failures. In case blk_mq_alloc_tag_set fails, we hit NULL deref while calling blk_mq_init_queue during nvme_alloc_ns with tagset == NULL. Instead we'll remove the dead ctrl as it can't be functional anymore. Also remove old and wrong comment. Signed-off-by: Max Gurtovoy <maxg at mellanox.com> --- drivers/nvme/host/pci.c | 19 +++++++++---------- 1 files changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 3faefab..3964e1d 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1545,14 +1545,10 @@ static void nvme_disable_io_queues(struct nvme_dev *dev, int queues) } } -/* - * Return: error value if an error occurred setting up the queues or calling - * Identify Device. 0 if these succeeded, even if adding some of the - * namespaces failed. At the moment, these failures are silent. TBD which - * failures should be reported. - */ static int nvme_dev_add(struct nvme_dev *dev) { + int ret = 0; + if (!dev->ctrl.tagset) { dev->tagset.ops = &nvme_mq_ops; dev->tagset.nr_hw_queues = dev->online_queues - 1; @@ -1564,8 +1560,9 @@ static int nvme_dev_add(struct nvme_dev *dev) dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE; dev->tagset.driver_data = dev; - if (blk_mq_alloc_tag_set(&dev->tagset)) - return 0; + ret = blk_mq_alloc_tag_set(&dev->tagset); + if (ret) + return ret; dev->ctrl.tagset = &dev->tagset; } else { blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1); @@ -1574,7 +1571,7 @@ static int nvme_dev_add(struct nvme_dev *dev) nvme_free_queues(dev, dev->online_queues); } - return 0; + return ret; } static int nvme_pci_enable(struct nvme_dev *dev) @@ -1811,7 +1808,9 @@ static void nvme_reset_work(struct work_struct *work) nvme_remove_namespaces(&dev->ctrl); } else { nvme_start_queues(&dev->ctrl); - nvme_dev_add(dev); + result = nvme_dev_add(dev); + if (result) + goto out; } if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) { -- ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/1] nvme: don't ignore tagset allocation failures 2017-03-30 7:13 ` Max Gurtovoy @ 2017-03-30 13:05 ` Sagi Grimberg 0 siblings, 0 replies; 8+ messages in thread From: Sagi Grimberg @ 2017-03-30 13:05 UTC (permalink / raw) > hi Sagi, > > what do you think about this solution that we don't keep the ctrl: I'd say that if you got a transient error, try reset again... ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-03-30 13:05 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-03-29 9:03 [PATCH 1/1] nvme: don't ignore tagset allocation failures Max Gurtovoy 2017-03-29 17:19 ` Sagi Grimberg 2017-03-29 17:32 ` Keith Busch 2017-03-29 17:42 ` Sagi Grimberg 2017-03-29 23:10 ` Keith Busch 2017-03-30 5:47 ` Sagi Grimberg 2017-03-30 7:13 ` Max Gurtovoy 2017-03-30 13:05 ` Sagi Grimberg
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.