* [PATCH] nvme: do not block connect call on inaccessible devices
  From: Hannes Reinecke @ 2018-06-05  8:29 UTC

When a device is in ANA inaccessible state the call to 'nvme connect'
will hang, as a partition scan is triggered and the I/O will be requeued
indefinitely. Or rather, requeued until the next path is connected, but
we cannot issue any nvme ioctls anymore as the nvmf_dev_mutex is held
by the initial 'connect' call.

This patch inhibits the partition scan if no working paths are found,
and re-triggers the partition scan once the device becomes active.

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/multipath.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 14bcc51016c7..30ba3c43e541 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -275,16 +275,44 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 
 void nvme_mpath_add_disk(struct nvme_ns_head *head)
 {
+	struct nvme_ns *ns;
+	int nr_working = 0;
+
 	if (!head->disk)
 		return;
 
 	mutex_lock(&head->subsys->lock);
+	list_for_each_entry(ns, &head->list, siblings) {
+		if (nvme_ns_ana_state(ns) == NVME_ANA_OPTIMIZED ||
+		    nvme_ns_ana_state(ns) == NVME_ANA_NONOPTIMIZED)
+			nr_working++;
+	}
+	/* Disable partition scan if no working paths are found */
+	if (!nr_working)
+		head->disk->flags |= GENHD_FL_NO_PART_SCAN;
+	else
+		head->disk->flags &= ~GENHD_FL_NO_PART_SCAN;
+
 	if (!(head->disk->flags & GENHD_FL_UP)) {
 		device_add_disk(&head->subsys->dev, head->disk);
 		if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
 				&nvme_ns_id_attr_group))
 			pr_warn("%s: failed to create sysfs group for identification\n",
 				head->disk->disk_name);
+	} else {
+		struct block_device *bdev;
+
+		/*
+		 * We need to do an explicit blkdev_get() to
+		 * re-trigger partition scan if new paths have been added.
+		 */
+		bdev = bdget_disk(head->disk, 0);
+		if (bdev) {
+			bdev->bd_invalidated = 1;
+			if (!blkdev_get(bdev, FMODE_READ, NULL))
+				blkdev_put(bdev, FMODE_READ);
+			bdput(bdev);
+		}
 	}
 	mutex_unlock(&head->subsys->lock);
 }
-- 
2.12.3
* [PATCH] nvme: do not block connect call on inaccessible devices
  From: Christoph Hellwig @ 2018-06-05 11:17 UTC

On Tue, Jun 05, 2018 at 10:29:05AM +0200, Hannes Reinecke wrote:
> When a device is in ANA inaccessible state the call to 'nvme connect'
> will hang, as a partition scan is triggered and the I/O will be requeued
> indefinitely.
> Or rather, requeued until the next path is connected, but we cannot issue
> any nvme ioctls anymore as the nvmf_dev_mutex is held by the initial
> 'connect' call.
>
> This patch inhibits the partition scan if no working paths are found,
> and re-triggers the partition scan once the device becomes active.

And what updates the state once all your inaccessible or change state
groups become optimized?

Also I see no point registering the disk if we can't do I/O to it.
I'd rather prefer something along the lines of the completely untested
patch below if we really want to go down this route:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3b55580bb9be..5d90560775e1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3080,7 +3080,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
 
-	nvme_mpath_add_disk(ns->head);
+	nvme_mpath_add_disk(ns);
 	nvme_fault_inject_init(ns);
 	return;
  out_unlink_ns:
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6e4fdc1a0581..8058dc385df5 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -264,20 +264,31 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	return -ENOMEM;
 }
 
-void nvme_mpath_add_disk(struct nvme_ns_head *head)
+static void __nvme_mpath_add_disk(struct nvme_ns *ns)
 {
-	if (!head->disk)
+
+	struct nvme_ns_head *head = ns->head;
+	enum nvme_ana_state state = nvme_ns_ana_state(ns);
+
+	lockdep_assert_held(&head->subsys->lock);
+
+	if (!head->disk ||
+	    (head->disk->flags & GENHD_FL_UP) ||
+	    (state != NVME_ANA_OPTIMIZED && state != NVME_ANA_NONOPTIMIZED))
 		return;
 
-	mutex_lock(&head->subsys->lock);
-	if (!(head->disk->flags & GENHD_FL_UP)) {
-		device_add_disk(&head->subsys->dev, head->disk);
-		if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
-				&nvme_ns_id_attr_group))
-			pr_warn("%s: failed to create sysfs group for identification\n",
-				head->disk->disk_name);
-	}
-	mutex_unlock(&head->subsys->lock);
+	device_add_disk(&head->subsys->dev, head->disk);
+	if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
+			&nvme_ns_id_attr_group))
+		pr_warn("%s: failed to create sysfs group for identification\n",
+			head->disk->disk_name);
+}
+
+void nvme_mpath_add_disk(struct nvme_ns *ns)
+{
+	mutex_lock(&ns->head->subsys->lock);
+	__nvme_mpath_add_disk(ns);
+	mutex_unlock(&ns->head->subsys->lock);
 }
 
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
@@ -299,6 +310,7 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 {
 	void *base = ctrl->ana_log_buf;
 	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	struct nvme_ns *ns;
 	int error, i;
 
 	/*
@@ -316,21 +328,22 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 		return error;
 	}
 
+	error = -EINVAL;
+	mutex_lock(&ctrl->subsys->lock);
 	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
 		struct nvme_ana_group_desc *desc = base + offset;
 		u32 grpid = le32_to_cpu(desc->grpid);
 		u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
 		size_t nsid_buf_size = nr_nsids * sizeof(__le32);
-		struct nvme_ns *ns;
 
 		if (WARN_ON_ONCE(grpid == 0))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(desc->state == 0))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
-			return -EINVAL;
+			goto out_error;
 
 		dev_info(ctrl->device, "ANA group %d: %s.\n",
 				grpid, nvme_ana_state_names[desc->state]);
@@ -340,9 +353,9 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 			continue;
 
 		if (WARN_ON_ONCE(groups_only))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - nsid_buf_size))
-			return -EINVAL;
+			goto out_error;
 
 		down_write(&ctrl->namespaces_rwsem);
 		list_for_each_entry(ns, &ctrl->namespaces, list) {
@@ -359,10 +372,19 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 		offset += nsid_buf_size;
 
 		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - sizeof(*desc)))
-			return -EINVAL;
+			goto out_error;
 	}
 
-	return 0;
+	/* XXX: optimize when to actually call this.. */
+	down_write(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		__nvme_mpath_add_disk(ns);
+	up_write(&ctrl->namespaces_rwsem);
+
+	error = 0;
+out_error:
+	mutex_unlock(&ctrl->subsys->lock);
+	return error;
 }
 
 static void nvme_ana_work(struct work_struct *work)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a6541b9f1548..ccbeae09599d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -460,7 +460,7 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 void nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head);
-void nvme_mpath_add_disk(struct nvme_ns_head *head);
+void nvme_mpath_add_disk(struct nvme_ns *ns);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
 int nvme_configure_ana(struct nvme_ctrl *ctrl);
 void nvme_deconfigure_ana(struct nvme_ctrl *ctrl);
@@ -507,7 +507,7 @@ static inline int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,
 {
 	return 0;
 }
-static inline void nvme_mpath_add_disk(struct nvme_ns_head *head)
+static inline void nvme_mpath_add_disk(struct nvme_ns *ns)
 {
 }
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
* [PATCH] nvme: do not block connect call on inaccessible devices
  From: Hannes Reinecke @ 2018-06-05 12:23 UTC

On Tue, 5 Jun 2018 13:17:46 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Jun 05, 2018 at 10:29:05AM +0200, Hannes Reinecke wrote:
> > When a device is in ANA inaccessible state the call to 'nvme
> > connect' will hang, as a partition scan is triggered and the I/O will
> > be requeued indefinitely.
> > Or rather, requeued until the next path is connected, but we cannot
> > issue any nvme ioctls anymore as the nvmf_dev_mutex is held by
> > the initial 'connect' call.
> >
> > This patch inhibits the partition scan if no working paths are found,
> > and re-triggers the partition scan once the device becomes active.
>
> And what updates the state once all your inaccessible or change
> state groups become optimized?

ANA AENs?
Someone told me we should rely on ANA AENs or the system is hosed
anyway :-)
My idea was to hook into the 'revalidate_disk' call here, but then
this approach turned out to be easier.

> Also I see no point registering the disk if we can't do I/O to it.
>
> I'd rather prefer something along the lines of the completely untested
> patch below if we really want to go down this route:

Okay, I'll give it a go.

Cheers,

Hannes
* [PATCH] nvme: do not block connect call on inaccessible devices
  From: Bart Van Assche @ 2018-06-05 12:35 UTC

On Tue, 2018-06-05 at 14:23 +0200, Hannes Reinecke wrote:
> On Tue, 5 Jun 2018 13:17:46 +0200
> Christoph Hellwig <hch@lst.de> wrote:
> > And what updates the state once all your inaccessible or change
> > state groups become optimized?
>
> ANA AENs?
> Someone told me we should rely on ANA AENs or the system is hosed
> anyway :-)

Is there a standard for AENs over NVMeOF? If so, how many receive
buffers should the initiator reserve for AENs? Which credit mechanism
should be used for AEN communication?

Thanks,

Bart.
* [PATCH] nvme: do not block connect call on inaccessible devices
  From: hch @ 2018-06-05 12:50 UTC

On Tue, Jun 05, 2018 at 12:35:13PM +0000, Bart Van Assche wrote:
> Is there a standard for AENs over NVMeOF? If so, how many receive
> buffers should the initiator reserve for AENs? Which credit mechanism
> should be used for AEN communication?

Take a look at sections 5.2 and 7.7 of NVMe 1.3. All that text actually
goes back to 1.0 without major change.
* [PATCH] nvme: do not block connect call on inaccessible devices
  From: Christoph Hellwig @ 2018-06-05 12:51 UTC

On Tue, Jun 05, 2018 at 02:23:18PM +0200, Hannes Reinecke wrote:
> > > This patch inhibits the partition scan if no working paths are
> > > found, and re-triggers the partition scan once the device becomes
> > > active.
> >
> > And what updates the state once all your inaccessible or change
> > state groups become optimized?
>
> ANA AENs?

Not in your patch.

> Someone told me we should rely on ANA AENs or the system is hosed
> anyway :-)

Yes, of course. But AENs aren't going to help you if you don't take any
action upon them.