[PATCH] nvme: do not block connect call on inaccessible devices

Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nvme: do not block connect call on inaccessible devices
@ 2018-06-05  8:29 Hannes Reinecke
  2018-06-05 11:17 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2018-06-05  8:29 UTC (permalink / raw)


When a device is in ANA inaccessible state the call to 'nvme connect'
will hang as a parition scan is triggered and the I/O will be requeued
indefinitely.
Or rather, requeued until the next path is connected, but we cannot issue
any nvme ioctls anymore as the nvmf_dev_mutex is blocked by the initial
'connect' call.

This patch inhibits partition scan if no working paths are found, and
retriggers the partition scan once the device becomes active.

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/multipath.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 14bcc51016c7..30ba3c43e541 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -275,16 +275,44 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 
 void nvme_mpath_add_disk(struct nvme_ns_head *head)
 {
+	struct nvme_ns *ns;
+	int nr_working = 0;
+
 	if (!head->disk)
 		return;
 
 	mutex_lock(&head->subsys->lock);
+	list_for_each_entry(ns, &head->list, siblings) {
+		if (nvme_ns_ana_state(ns) == NVME_ANA_OPTIMIZED ||
+		    nvme_ns_ana_state(ns) == NVME_ANA_NONOPTIMIZED)
+			nr_working++;
+	}
+	/* Disable partition scan if no working paths are found */
+	if (!nr_working)
+		head->disk->flags |= GENHD_FL_NO_PART_SCAN;
+	else
+		head->disk->flags &= ~GENHD_FL_NO_PART_SCAN;
+
 	if (!(head->disk->flags & GENHD_FL_UP)) {
 		device_add_disk(&head->subsys->dev, head->disk);
 		if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
 				&nvme_ns_id_attr_group))
 			pr_warn("%s: failed to create sysfs group for identification\n",
 				head->disk->disk_name);
+	} else {
+		struct block_device *bdev;
+
+		/*
+		 * We need to do an explicit blkdev_get() to
+		 * re-trigger partition scan if new paths have been added.
+		 */
+		bdev = bdget_disk(head->disk, 0);
+		if (bdev) {
+			bdev->bd_invalidated = 1;
+			if (!blkdev_get(bdev, FMODE_READ, NULL))
+				blkdev_put(bdev, FMODE_READ);
+			bdput(bdev);
+		}
 	}
 	mutex_unlock(&head->subsys->lock);
 }
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] nvme: do not block connect call on inaccessible devices
  2018-06-05  8:29 [PATCH] nvme: do not block connect call on inaccessible devices Hannes Reinecke
@ 2018-06-05 11:17 ` Christoph Hellwig
  2018-06-05 12:23   ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2018-06-05 11:17 UTC (permalink / raw)


On Tue, Jun 05, 2018@10:29:05AM +0200, Hannes Reinecke wrote:
> When a device is in ANA inaccessible state the call to 'nvme connect'
> will hang as a parition scan is triggered and the I/O will be requeued
> indefinitely.
> Or rather, requeued until the next path is connected, but we cannot issue
> any nvme ioctls anymore as the nvmf_dev_mutex is blocked by the initial
> 'connect' call.
> 
> This patch inhibits partition scan if no working paths are found, and
> retriggers the partition scan once the device becomes active.

And what updates the state once all your inaccessible or change
state groups become optimized?

Also I see no point registering the disk if we can't do I/O to it.

I'd rather prefer something along the lines of the completely untested
patch below if we really want to go down this route:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3b55580bb9be..5d90560775e1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3080,7 +3080,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
 
-	nvme_mpath_add_disk(ns->head);
+	nvme_mpath_add_disk(ns);
 	nvme_fault_inject_init(ns);
 	return;
  out_unlink_ns:
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6e4fdc1a0581..8058dc385df5 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -264,20 +264,31 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 	return -ENOMEM;
 }
 
-void nvme_mpath_add_disk(struct nvme_ns_head *head)
+static void __nvme_mpath_add_disk(struct nvme_ns *ns)
 {
-	if (!head->disk)
+
+	struct nvme_ns_head *head = ns->head;
+	enum nvme_ana_state state = nvme_ns_ana_state(ns);
+
+	lockdep_assert_held(&head->subsys->lock);
+
+	if (!head->disk ||
+	    (head->disk->flags & GENHD_FL_UP) ||
+	    (state != NVME_ANA_OPTIMIZED && state != NVME_ANA_NONOPTIMIZED))
 		return;
 
-	mutex_lock(&head->subsys->lock);
-	if (!(head->disk->flags & GENHD_FL_UP)) {
-		device_add_disk(&head->subsys->dev, head->disk);
-		if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
-				&nvme_ns_id_attr_group))
-			pr_warn("%s: failed to create sysfs group for identification\n",
-				head->disk->disk_name);
-	}
-	mutex_unlock(&head->subsys->lock);
+	device_add_disk(&head->subsys->dev, head->disk);
+	if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
+			&nvme_ns_id_attr_group))
+		pr_warn("%s: failed to create sysfs group for identification\n",
+			head->disk->disk_name);
+}
+
+void nvme_mpath_add_disk(struct nvme_ns *ns)
+{
+	mutex_lock(&ns->head->subsys->lock);
+	__nvme_mpath_add_disk(ns);
+	mutex_unlock(&ns->head->subsys->lock);
 }
 
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
@@ -299,6 +310,7 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 {
 	void *base = ctrl->ana_log_buf;
 	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	struct nvme_ns *ns;
 	int error, i;
 
 	/*
@@ -316,21 +328,22 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 		return error;
 	}
 
+	error = -EINVAL;
+	mutex_lock(&ctrl->subsys->lock);
 	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
 		struct nvme_ana_group_desc *desc = base + offset;
 		u32 grpid = le32_to_cpu(desc->grpid);
 		u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
 		size_t nsid_buf_size = nr_nsids * sizeof(__le32);
-		struct nvme_ns *ns;
 
 		if (WARN_ON_ONCE(grpid == 0))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(desc->state == 0))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
-			return -EINVAL;
+			goto out_error;
 
 		dev_info(ctrl->device, "ANA group %d: %s.\n",
 				grpid, nvme_ana_state_names[desc->state]);
@@ -340,9 +353,9 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 			continue;
 
 		if (WARN_ON_ONCE(groups_only))
-			return -EINVAL;
+			goto out_error;
 		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - nsid_buf_size))
-			return -EINVAL;
+			goto out_error;
 
 		down_write(&ctrl->namespaces_rwsem);
 		list_for_each_entry(ns, &ctrl->namespaces, list) {
@@ -359,10 +372,19 @@ static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
 
 		offset += nsid_buf_size;
 		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - sizeof(*desc)))
-			return -EINVAL;
+			goto out_error;
 	}
 
-	return 0;
+	// XXX: optimize when to actually calls this..
+	down_write(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		__nvme_mpath_add_disk(ns);
+	up_write(&ctrl->namespaces_rwsem);
+
+	error = 0;
+out_error:
+	mutex_unlock(&ctrl->subsys->lock);
+	return error;
 }
 
 static void nvme_ana_work(struct work_struct *work)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a6541b9f1548..ccbeae09599d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -460,7 +460,7 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 void nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
-void nvme_mpath_add_disk(struct nvme_ns_head *head);
+void nvme_mpath_add_disk(struct nvme_ns *ns);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
 int nvme_configure_ana(struct nvme_ctrl *ctrl);
 void nvme_deconfigure_ana(struct nvme_ctrl *ctrl);
@@ -507,7 +507,7 @@ static inline int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,
 {
 	return 0;
 }
-static inline void nvme_mpath_add_disk(struct nvme_ns_head *head)
+static inline void nvme_mpath_add_disk(struct nvme_ns *ns)
 {
 }
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] nvme: do not block connect call on inaccessible devices
  2018-06-05 11:17 ` Christoph Hellwig
@ 2018-06-05 12:23   ` Hannes Reinecke
  2018-06-05 12:35     ` Bart Van Assche
  2018-06-05 12:51     ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Hannes Reinecke @ 2018-06-05 12:23 UTC (permalink / raw)


On Tue, 5 Jun 2018 13:17:46 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Jun 05, 2018@10:29:05AM +0200, Hannes Reinecke wrote:
> > When a device is in ANA inaccessible state the call to 'nvme
> > connect' will hang as a parition scan is triggered and the I/O will
> > be requeued indefinitely.
> > Or rather, requeued until the next path is connected, but we cannot
> > issue any nvme ioctls anymore as the nvmf_dev_mutex is blocked by
> > the initial 'connect' call.
> > 
> > This patch inhibits partition scan if no working paths are found,
> > and retriggers the partition scan once the device becomes active.  
> 
> And what updates the state once all your inaccessible or change
> state groups become optimized?
> 
ANA AENs?
Someone told me we should rely on ANA AENS or the system is hosed
anyway :-)

My idea was to hook into the 'revalidate_disk' call here, but then this
approach turned out to be easier.

> Also I see no point registering the disk if we can't do I/O to it.
> 
> I'd rather prefer something along the lines of the completely untested
> patch below if we really want to go down this route:
> 
Okay, I'll give it a go.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: do not block connect call on inaccessible devices
  2018-06-05 12:23   ` Hannes Reinecke
@ 2018-06-05 12:35     ` Bart Van Assche
  2018-06-05 12:50       ` hch
  2018-06-05 12:51     ` Christoph Hellwig
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2018-06-05 12:35 UTC (permalink / raw)


On Tue, 2018-06-05@14:23 +0200, Hannes Reinecke wrote:
> On Tue, 5 Jun 2018 13:17:46 +0200
> Christoph Hellwig <hch@lst.de> wrote:
> > And what updates the state once all your inaccessible or change
> > state groups become optimized?
> 
> ANA AENs?
> Someone told me we should rely on ANA AENS or the system is hosed
> anyway :-)

Is there a standard for AENs over NVMeOF? If so, how many receive buffers
should the initiator reserve for AENs? Which credit mechanism should be used
for AEN communication?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: do not block connect call on inaccessible devices
  2018-06-05 12:35     ` Bart Van Assche
@ 2018-06-05 12:50       ` hch
  0 siblings, 0 replies; 6+ messages in thread
From: hch @ 2018-06-05 12:50 UTC (permalink / raw)


On Tue, Jun 05, 2018@12:35:13PM +0000, Bart Van Assche wrote:
> Is there a standard for AENs over NVMeOF? If so, how many receive buffers
> should the initiator reserve for AENs? Which credit mechanism should be used
> for AEN communication?

Take a look at sections 5.2 and 7.7 of NVMe 1.3.  All that text
actually goes back to 1.0 without major change.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: do not block connect call on inaccessible devices
  2018-06-05 12:23   ` Hannes Reinecke
  2018-06-05 12:35     ` Bart Van Assche
@ 2018-06-05 12:51     ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2018-06-05 12:51 UTC (permalink / raw)


On Tue, Jun 05, 2018@02:23:18PM +0200, Hannes Reinecke wrote:
> > > This patch inhibits partition scan if no working paths are found,
> > > and retriggers the partition scan once the device becomes active.  
> > 
> > And what updates the state once all your inaccessible or change
> > state groups become optimized?
> > 
> ANA AENs?

Not in your patch.

> Someone told me we should rely on ANA AENS or the system is hosed
> anyway :-)

Yes, of course.  But AENs aren;t going to help you if you don't
take any action upon them.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-06-05 12:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-05  8:29 [PATCH] nvme: do not block connect call on inaccessible devices Hannes Reinecke
2018-06-05 11:17 ` Christoph Hellwig
2018-06-05 12:23   ` Hannes Reinecke
2018-06-05 12:35     ` Bart Van Assche
2018-06-05 12:50       ` hch
2018-06-05 12:51     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox