* [PATCH 0/2] nvme: fixup crash in device_add_disk()
From: Hannes Reinecke @ 2019-02-19 12:13 UTC

Hi all,

during testing we've run into an issue where the system would crash in
device_add_disk(); analysis showed that there is a race condition in
nvme_validate_ns() if called simultaneously for the same controller.

This patchset tries to fix it up.

As usual, comments and reviews are appreciated.

Hannes Reinecke (2):
  nvme: return error from nvme_alloc_ns()
  nvme: protect against race condition in nvme_validate_ns()

 drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 13 deletions(-)

-- 
2.16.4
* [PATCH 1/2] nvme: return error from nvme_alloc_ns()
From: Hannes Reinecke @ 2019-02-19 12:13 UTC

nvme_alloc_ns() might fail, so we should be returning an error code.

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/core.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f2f75831decd..9c6f6a4db60a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3214,21 +3214,23 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
 	return 0;
 }
 
-static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
 	struct gendisk *disk;
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
-	int node = ctrl->numa_node, flags = GENHD_FL_EXT_DEVT;
+	int node = ctrl->numa_node, flags = GENHD_FL_EXT_DEVT, ret;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
-		return;
+		return -ENOMEM;
 
 	ns->queue = blk_mq_init_queue(ctrl->tagset);
-	if (IS_ERR(ns->queue))
+	if (IS_ERR(ns->queue)) {
+		ret = PTR_ERR(ns->queue);
 		goto out_free_ns;
+	}
 
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
 	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
@@ -3244,20 +3246,27 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	nvme_set_queue_limits(ctrl, ns->queue);
 
 	id = nvme_identify_ns(ctrl, nsid);
-	if (!id)
+	if (!id) {
+		ret = -EIO;
 		goto out_free_queue;
+	}
 
-	if (id->ncap == 0)
+	if (id->ncap == 0) {
+		ret = -EINVAL;
 		goto out_free_id;
+	}
 
-	if (nvme_init_ns_head(ns, nsid, id))
+	ret = nvme_init_ns_head(ns, nsid, id);
+	if (ret)
 		goto out_free_id;
 	nvme_setup_streams_ns(ctrl, ns);
 	nvme_set_disk_name(disk_name, ns, ctrl, &flags);
 
 	disk = alloc_disk_node(0, node);
-	if (!disk)
+	if (!disk) {
+		ret = -ENOMEM;
 		goto out_unlink_ns;
+	}
 
 	disk->fops = &nvme_fops;
 	disk->private_data = ns;
@@ -3269,7 +3278,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	__nvme_revalidate_disk(disk, id);
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
-		if (nvme_nvm_register(ns, disk_name, node)) {
+		ret = nvme_nvm_register(ns, disk_name, node);
+		if (ret) {
 			dev_warn(ctrl->device, "LightNVM init failure\n");
 			goto out_put_disk;
 		}
@@ -3287,7 +3297,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	nvme_fault_inject_init(ns);
 	kfree(id);
 
-	return;
+	return 0;
  out_put_disk:
 	put_disk(ns->disk);
  out_unlink_ns:
@@ -3300,6 +3310,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	blk_cleanup_queue(ns->queue);
  out_free_ns:
 	kfree(ns);
+	return ret;
 }
 
 static void nvme_ns_remove(struct nvme_ns *ns)
-- 
2.16.4
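[Editorial sketch, not part of the posted patch.] The change above follows
the usual "set ret, then goto the matching unwind label" shape. A minimal
userspace sketch of that shape is given here, assuming hypothetical names
(demo_ns, demo_alloc_ns, enum demo_fail) and plain malloc()/free() in place
of the kernel allocators; it only illustrates how each failure point picks
its own error code while sharing one cleanup path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nvme_ns; not the kernel structure. */
struct demo_ns {
	unsigned nsid;
	void *queue;
	void *disk;
};

/* Selects which step fails; exists only to exercise the error paths. */
enum demo_fail { FAIL_NONE, FAIL_QUEUE, FAIL_IDENTIFY, FAIL_DISK };

static int demo_alloc_ns(unsigned nsid, enum demo_fail fail,
			 struct demo_ns **out)
{
	struct demo_ns *ns;
	int ret;

	assert(out != NULL);

	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return -ENOMEM;
	ns->nsid = nsid;

	/* Step 1: queue setup (simulated failure yields NULL). */
	ns->queue = (fail == FAIL_QUEUE) ? NULL : malloc(16);
	if (!ns->queue) {
		ret = -ENOMEM;
		goto out_free_ns;
	}

	/* Step 2: identify the namespace (simulated I/O error). */
	if (fail == FAIL_IDENTIFY) {
		ret = -EIO;
		goto out_free_queue;
	}

	/* Step 3: disk allocation. */
	ns->disk = (fail == FAIL_DISK) ? NULL : malloc(16);
	if (!ns->disk) {
		ret = -ENOMEM;
		goto out_free_queue;
	}

	*out = ns;
	return 0;

out_free_queue:
	free(ns->queue);
out_free_ns:
	free(ns);
	return ret;	/* every failure path reports its own code */
}

/* Release everything a successful demo_alloc_ns() handed out. */
static void demo_free_ns(struct demo_ns *ns)
{
	free(ns->disk);
	free(ns->queue);
	free(ns);
}
```

A caller can then distinguish the failure cases by the returned value
instead of silently ignoring them, which is what the second patch in this
series relies on.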
* [PATCH 1/2] nvme: return error from nvme_alloc_ns()
From: Sagi Grimberg @ 2019-02-19 19:42 UTC

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
* [PATCH 1/2] nvme: return error from nvme_alloc_ns()
From: Christoph Hellwig @ 2019-02-20 14:21 UTC

Thanks, applied to nvme-5.1.
* [PATCH 2/2] nvme: protect against race condition in nvme_validate_ns()
From: Hannes Reinecke @ 2019-02-19 12:13 UTC

When subsystems are rapidly reconfigured (or sending out several AENs)
we might end up in a situation where several instances of nvme_scan_work()
are running. Each of which might be trying to register the same nsid,
so nvme_find_get_ns() in nvme_validate_ns() will return 0 for both,
resulting in a crash in nvme_alloc_ns() as both are registering a
gendisk with the same name.

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/core.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9c6f6a4db60a..7cf710e8d98d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3216,7 +3216,7 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
 
 static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
-	struct nvme_ns *ns;
+	struct nvme_ns *ns, *tmp;
 	struct gendisk *disk;
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
@@ -3286,6 +3286,15 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	}
 
 	down_write(&ctrl->namespaces_rwsem);
+	list_for_each_entry(tmp, &ctrl->namespaces, list) {
+		if (nsid == tmp->head->ns_id) {
+			up_write(&ctrl->namespaces_rwsem);
+			dev_warn(ctrl->device,
+				 "Duplicate ns %d, rescanning", nsid);
+			ret = -EAGAIN;
+			goto out_put_disk;
+		}
+	}
 	list_add_tail(&ns->list, &ctrl->namespaces);
 	up_write(&ctrl->namespaces_rwsem);
 
@@ -3343,14 +3352,19 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
+	int ret;
 
+rescan:
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
 		if (ns->disk && revalidate_disk(ns->disk))
 			nvme_ns_remove(ns);
 		nvme_put_ns(ns);
-	} else
-		nvme_alloc_ns(ctrl, nsid);
+	} else {
+		ret = nvme_alloc_ns(ctrl, nsid);
+		if (ret == -EAGAIN)
+			goto rescan;
+	}
 }
 
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
-- 
2.16.4
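[Editorial sketch, not part of the posted patch.] The idea in this patch is
to repeat the duplicate check while holding the same lock that guards the
list insertion, and to let the caller restart from the lookup when the
check fails with -EAGAIN. A rough userspace sketch of that pattern follows.
All names (demo_ctrl, demo_ns, demo_alloc_ns, demo_validate_ns) are
hypothetical, a C11 atomic_flag spinlock stands in for namespaces_rwsem,
and a singly linked list stands in for ctrl->namespaces; reference counting
and revalidation are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct demo_ns {
	unsigned nsid;
	struct demo_ns *next;
};

struct demo_ctrl {
	atomic_flag lock;		/* stand-in for namespaces_rwsem */
	struct demo_ns *namespaces;	/* stand-in for ctrl->namespaces */
};

static void demo_ctrl_init(struct demo_ctrl *ctrl)
{
	atomic_flag_clear(&ctrl->lock);
	ctrl->namespaces = NULL;
}

static void demo_lock(struct demo_ctrl *ctrl)
{
	while (atomic_flag_test_and_set(&ctrl->lock))
		;	/* spin until the previous holder clears the flag */
}

static void demo_unlock(struct demo_ctrl *ctrl)
{
	atomic_flag_clear(&ctrl->lock);
}

/* Lookup only: returns nonzero if nsid is already on the list. */
static int demo_ns_exists(struct demo_ctrl *ctrl, unsigned nsid)
{
	struct demo_ns *tmp;
	int found = 0;

	demo_lock(ctrl);
	for (tmp = ctrl->namespaces; tmp; tmp = tmp->next) {
		if (tmp->nsid == nsid) {
			found = 1;
			break;
		}
	}
	demo_unlock(ctrl);
	return found;
}

/* Allocate first, then check for a duplicate and insert in one lock hold. */
static int demo_alloc_ns(struct demo_ctrl *ctrl, unsigned nsid)
{
	struct demo_ns *ns, *tmp;

	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return -ENOMEM;
	ns->nsid = nsid;

	demo_lock(ctrl);
	for (tmp = ctrl->namespaces; tmp; tmp = tmp->next) {
		if (tmp->nsid == nsid) {
			demo_unlock(ctrl);
			free(ns);	/* lost the race: undo our allocation */
			return -EAGAIN;
		}
	}
	ns->next = ctrl->namespaces;
	ctrl->namespaces = ns;
	demo_unlock(ctrl);
	return 0;
}

/* Restart from the lookup whenever the allocation reports a duplicate. */
static int demo_validate_ns(struct demo_ctrl *ctrl, unsigned nsid)
{
	int ret;

	assert(ctrl != NULL);
	do {
		if (demo_ns_exists(ctrl, nsid))
			return 0;	/* real code would revalidate here */
		ret = demo_alloc_ns(ctrl, nsid);
	} while (ret == -EAGAIN);

	return ret;
}
```

Whether this check is needed at all depends on whether scan work for one
controller can really run concurrently, which is the question raised in
the replies to this patch.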
* [PATCH 2/2] nvme: protect against race condition in nvme_validate_ns()
From: Sagi Grimberg @ 2019-02-19 19:44 UTC

On 2/19/19 4:13 AM, Hannes Reinecke wrote:
> When subsystems are rapidly reconfigured (or sending out several AENs)
> we might end up in a situation where several instances of nvme_scan_work()
> are running. Each of which might be trying to register the same nsid,
> so nvme_find_get_ns() in nvme_validate_ns() will return 0 for both,
> resulting in a crash in nvme_alloc_ns() as both are registering a
> gendisk with the same name.

Wouldn't it be better to serialize nvme_scan_work such that it doesn't
run multiple times in parallel?

> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---
>  drivers/nvme/host/core.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 9c6f6a4db60a..7cf710e8d98d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3216,7 +3216,7 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
>
>  static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>  {
> -	struct nvme_ns *ns;
> +	struct nvme_ns *ns, *tmp;
>  	struct gendisk *disk;
>  	struct nvme_id_ns *id;
>  	char disk_name[DISK_NAME_LEN];
> @@ -3286,6 +3286,15 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>  	}
>
>  	down_write(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(tmp, &ctrl->namespaces, list) {
> +		if (nsid == tmp->head->ns_id) {
> +			up_write(&ctrl->namespaces_rwsem);
> +			dev_warn(ctrl->device,
> +				 "Duplicate ns %d, rescanning", nsid);

Can you move this print to the caller, where the actual rescanning happens?

> +			ret = -EAGAIN;
> +			goto out_put_disk;
> +		}
> +	}
>  	list_add_tail(&ns->list, &ctrl->namespaces);
>  	up_write(&ctrl->namespaces_rwsem);
>
> @@ -3343,14 +3352,19 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>  {
>  	struct nvme_ns *ns;
> +	int ret;
>
> +rescan:
>  	ns = nvme_find_get_ns(ctrl, nsid);
>  	if (ns) {
>  		if (ns->disk && revalidate_disk(ns->disk))
>  			nvme_ns_remove(ns);
>  		nvme_put_ns(ns);
> -	} else
> -		nvme_alloc_ns(ctrl, nsid);
> +	} else {
> +		ret = nvme_alloc_ns(ctrl, nsid);
> +		if (ret == -EAGAIN)
> +			goto rescan;
> +	}
>  }
>
>  static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>
* [PATCH 2/2] nvme: protect against race condition in nvme_validate_ns()
From: Keith Busch @ 2019-02-19 19:54 UTC

On Tue, Feb 19, 2019 at 11:44:41AM -0800, Sagi Grimberg wrote:
> On 2/19/19 4:13 AM, Hannes Reinecke wrote:
> > When subsystems are rapidly reconfigured (or sending out several AENs)
> > we might end up in a situation where several instances of nvme_scan_work()
> > are running. Each of which might be trying to register the same nsid,
> > so nvme_find_get_ns() in nvme_validate_ns() will return 0 for both,
> > resulting in a crash in nvme_alloc_ns() as both are registering a
> > gendisk with the same name.
>
> Wouldn't it be better to serialize nvme_scan_work such that it doesn't
> run multiple times in parallel?

Doesn't the work queue already serialize individual ctrl's scan_work?

There is also a recently added mutex to synchronize scan work with
command effects handling, which would force an nvme_ctrl's scan_work to
be serialized:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e7ad43c3eda6a1690c4c3c341f95dc1c6898da83
* [PATCH 2/2] nvme: protect against race condition in nvme_validate_ns()
From: Hannes Reinecke @ 2019-02-20 6:52 UTC

On 2/19/19 8:54 PM, Keith Busch wrote:
> On Tue, Feb 19, 2019 at 11:44:41AM -0800, Sagi Grimberg wrote:
>> On 2/19/19 4:13 AM, Hannes Reinecke wrote:
>>> When subsystems are rapidly reconfigured (or sending out several AENs)
>>> we might end up in a situation where several instances of nvme_scan_work()
>>> are running. Each of which might be trying to register the same nsid,
>>> so nvme_find_get_ns() in nvme_validate_ns() will return 0 for both,
>>> resulting in a crash in nvme_alloc_ns() as both are registering a
>>> gendisk with the same name.
>>
>> Wouldn't it be better to serialize nvme_scan_work such that it doesn't
>> run multiple times in parallel?
>
> Doesn't the work queue already serialize individual ctrl's scan_work?
>
> There is also a recently added mutex to synchronize scan work with
> command effects handling, which would force an nvme_ctrl's scan_work to
> be serialized:
>
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e7ad43c3eda6a1690c4c3c341f95dc1c6898da83
>
Ah. Hmm. Probably.
And indeed, the tests were done without this patch.
I'll check if that patch is sufficient.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke, Teamlead Storage & Networking
hare at suse.de, +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Thread overview: 8+ messages (newest: 2019-02-20 14:21 UTC)
2019-02-19 12:13 [PATCH 0/2] nvme: fixup crash in device_add_disk() Hannes Reinecke
2019-02-19 12:13 ` [PATCH 1/2] nvme: return error from nvme_alloc_ns() Hannes Reinecke
2019-02-19 19:42   ` Sagi Grimberg
2019-02-20 14:21   ` Christoph Hellwig
2019-02-19 12:13 ` [PATCH 2/2] nvme: protect against race condition in nvme_validate_ns() Hannes Reinecke
2019-02-19 19:44   ` Sagi Grimberg
2019-02-19 19:54     ` Keith Busch
2019-02-20  6:52       ` Hannes Reinecke