All of lore.kernel.org
 help / color / mirror / Atom feed
From: snitzer@redhat.com (Mike Snitzer)
Subject: [PATCH 0/3] Provide more fine grained control over multipathing
Date: Fri, 1 Jun 2018 00:11:15 -0400	[thread overview]
Message-ID: <20180601041115.GA14244@redhat.com> (raw)
In-Reply-To: <20180531163440.GB30954@lst.de>

On Thu, May 31 2018 at 12:34pm -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, May 31, 2018@08:37:39AM -0400, Mike Snitzer wrote:
> > I saw your reply to the 1/3 patch.. I do agree it is broken for not
> > checking if any handles are active.  But that is easily fixed no?
> 
> Doing a switch at runtime simply is a really bad idea.  If for some
> reason we end up with a good per-controller switch it would have
> to be something set at probe time, and to get it on a controller
> you'd need to reset it first.

Yes, I see that now.  And the implementation would need to be something
yourself or other more seasoned NVMe developers pursued.  NVMe code is
pretty unforgiving.

I took a crack at aspects of this, my head hurts.  While testing I hit
some "interesting" lack of self-awareness about NVMe resources that are
in use.  So lots of associations are able to be torn down rather than
graceful failure.  Could be nvme_fcloop specific, but it is pretty easy
to do the following using mptest's lib/unittests/nvme_4port_create.sh
followed by: modprobe -r nvme_fcloop

Results in an infinite spew of:
[14245.345759] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.351851] nvme_fcloop: fcloop_exit: Failed deleting target port
[14245.357944] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.364038] nvme_fcloop: fcloop_exit: Failed deleting target port

Another fun one is to lib/unittests/nvme_4port_delete.sh while the
native NVMe multipath device (created from nvme_4port_create.sh) was
still in use by an xfs mount, so:
./nvme_4port_create.sh
mount /dev/nvme1n1 /mnt
./nvme_4port_delete.sh
umount /mnt

Those were clear screwups on my part but I wouldn't have expected them
to cause nvme to blow through so many stop signs.

Anyway, I put enough time to trying to make the previously thought
"simple" mpath_personality switch safe -- in the face of active handles
(issue Sagi pointed out) -- that it is clear NVMe just doesn't have
enough state to do it in a clean way.  Would require a deeper
understanding of the code that I don't have.  Most every NVMe function
returns void so there is basically no potential for error handling (in
the face of a resource being in use).

The following is my WIP patch (built ontop of the 3 patches from
this thread's series) that has cured me of wanting to continue pursuit
of a robust implementation of the runtime 'mpath_personality' switch:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e018d0..80103b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2146,10 +2146,8 @@ static ssize_t __nvme_subsys_store_mpath_personality(struct nvme_subsystem *subs
 		goto out;
 	}
 
-	if (subsys->native_mpath != native_mpath) {
-		subsys->native_mpath = native_mpath;
-		ret = nvme_mpath_change_personality(subsys);
-	}
+	if (subsys->native_mpath != native_mpath)
+		ret = nvme_mpath_change_personality(subsys, native_mpath);
 out:
 	return ret ? ret : count;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 53d2610..017c924 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -247,26 +247,57 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 	put_disk(head->disk);
 }
 
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys)
+static bool __nvme_subsys_in_use(struct nvme_subsystem *subsys)
 {
 	struct nvme_ctrl *ctrl;
-	int ret = 0;
+	struct nvme_ns *ns, *next;
 
-restart:
-	mutex_lock(&subsys->lock);
 	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
-		if (!list_empty(&ctrl->namespaces)) {
-			mutex_unlock(&subsys->lock);
-			nvme_remove_namespaces(ctrl);
-			goto restart;
+		down_write(&ctrl->namespaces_rwsem);
+		list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
+			if ((kref_read(&ns->kref) > 1) ||
+			    // FIXME: need to compare with N paths
+			    (ns->head && (kref_read(&ns->head->ref) > 1))) {
+				printk("ns->kref = %d", kref_read(&ns->kref));
+				printk("ns->head->ref = %d", kref_read(&ns->head->ref));
+				up_write(&ctrl->namespaces_rwsem);
+				mutex_unlock(&subsys->lock);
+				return true;
+			}
 		}
+		up_write(&ctrl->namespaces_rwsem);
 	}
-	mutex_unlock(&subsys->lock);
+
+	return false;
+}
+
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native)
+{
+	struct nvme_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
-	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		nvme_queue_scan(ctrl);
+
+	if (__nvme_subsys_in_use(subsys)) {
+		mutex_unlock(&subsys->lock);
+		return -EBUSY;
+	}
+
+	// FIXME: racey, subsys could now be in use here.
+	// Interlock against use needs work from an NVMe developer (hch?) :)
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		cancel_work_sync(&ctrl->reset_work);
+		flush_work(&ctrl->reset_work);
+		nvme_stop_ctrl(ctrl);
+	}
+
+	subsys->native_mpath = native;
 	mutex_unlock(&subsys->lock);
 
-	return ret;
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		nvme_remove_namespaces(ctrl);
+		nvme_start_ctrl(ctrl);
+	}
+
+	return 0;
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 81e4e71..97a6b08 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -452,7 +452,7 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns_head *head);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys);
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {

WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Keith Busch <keith.busch@intel.com>,
	Hannes Reinecke <hare@suse.de>,
	Laurence Oberman <loberman@redhat.com>,
	Ewan Milne <emilne@redhat.com>,
	James Smart <james.smart@broadcom.com>,
	Linux Kernel Mailinglist <linux-kernel@vger.kernel.org>,
	Linux NVMe Mailinglist <linux-nvme@lists.infradead.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	Martin George <marting@netapp.com>,
	John Meneghini <John.Meneghini@netapp.com>
Subject: Re: [PATCH 0/3] Provide more fine grained control over multipathing
Date: Fri, 1 Jun 2018 00:11:15 -0400	[thread overview]
Message-ID: <20180601041115.GA14244@redhat.com> (raw)
In-Reply-To: <20180531163440.GB30954@lst.de>

On Thu, May 31 2018 at 12:34pm -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Thu, May 31, 2018 at 08:37:39AM -0400, Mike Snitzer wrote:
> > I saw your reply to the 1/3 patch.. I do agree it is broken for not
> > checking if any handles are active.  But that is easily fixed no?
> 
> Doing a switch at runtime simply is a really bad idea.  If for some
> reason we end up with a good per-controller switch it would have
> to be something set at probe time, and to get it on a controller
> you'd need to reset it first.

Yes, I see that now.  And the implementation would need to be something
yourself or other more seasoned NVMe developers pursued.  NVMe code is
pretty unforgiving.

I took a crack at aspects of this, my head hurts.  While testing I hit
some "interesting" lack of self-awareness about NVMe resources that are
in use.  So lots of associations are able to be torn down rather than
graceful failure.  Could be nvme_fcloop specific, but it is pretty easy
to do the following using mptest's lib/unittests/nvme_4port_create.sh
followed by: modprobe -r nvme_fcloop

Results in an infinite spew of:
[14245.345759] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.351851] nvme_fcloop: fcloop_exit: Failed deleting target port
[14245.357944] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.364038] nvme_fcloop: fcloop_exit: Failed deleting target port

Another fun one is to lib/unittests/nvme_4port_delete.sh while the
native NVMe multipath device (created from nvme_4port_create.sh) was
still in use by an xfs mount, so:
./nvme_4port_create.sh
mount /dev/nvme1n1 /mnt
./nvme_4port_delete.sh
umount /mnt

Those were clear screwups on my part but I wouldn't have expected them
to cause nvme to blow through so many stop signs.

Anyway, I put enough time to trying to make the previously thought
"simple" mpath_personality switch safe -- in the face of active handles
(issue Sagi pointed out) -- that it is clear NVMe just doesn't have
enough state to do it in a clean way.  Would require a deeper
understanding of the code that I don't have.  Most every NVMe function
returns void so there is basically no potential for error handling (in
the face of a resource being in use).

The following is my WIP patch (built ontop of the 3 patches from
this thread's series) that has cured me of wanting to continue pursuit
of a robust implementation of the runtime 'mpath_personality' switch:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e018d0..80103b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2146,10 +2146,8 @@ static ssize_t __nvme_subsys_store_mpath_personality(struct nvme_subsystem *subs
 		goto out;
 	}
 
-	if (subsys->native_mpath != native_mpath) {
-		subsys->native_mpath = native_mpath;
-		ret = nvme_mpath_change_personality(subsys);
-	}
+	if (subsys->native_mpath != native_mpath)
+		ret = nvme_mpath_change_personality(subsys, native_mpath);
 out:
 	return ret ? ret : count;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 53d2610..017c924 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -247,26 +247,57 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 	put_disk(head->disk);
 }
 
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys)
+static bool __nvme_subsys_in_use(struct nvme_subsystem *subsys)
 {
 	struct nvme_ctrl *ctrl;
-	int ret = 0;
+	struct nvme_ns *ns, *next;
 
-restart:
-	mutex_lock(&subsys->lock);
 	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
-		if (!list_empty(&ctrl->namespaces)) {
-			mutex_unlock(&subsys->lock);
-			nvme_remove_namespaces(ctrl);
-			goto restart;
+		down_write(&ctrl->namespaces_rwsem);
+		list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
+			if ((kref_read(&ns->kref) > 1) ||
+			    // FIXME: need to compare with N paths
+			    (ns->head && (kref_read(&ns->head->ref) > 1))) {
+				printk("ns->kref = %d", kref_read(&ns->kref));
+				printk("ns->head->ref = %d", kref_read(&ns->head->ref));
+				up_write(&ctrl->namespaces_rwsem);
+				mutex_unlock(&subsys->lock);
+				return true;
+			}
 		}
+		up_write(&ctrl->namespaces_rwsem);
 	}
-	mutex_unlock(&subsys->lock);
+
+	return false;
+}
+
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native)
+{
+	struct nvme_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
-	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		nvme_queue_scan(ctrl);
+
+	if (__nvme_subsys_in_use(subsys)) {
+		mutex_unlock(&subsys->lock);
+		return -EBUSY;
+	}
+
+	// FIXME: racey, subsys could now be in use here.
+	// Interlock against use needs work from an NVMe developer (hch?) :)
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		cancel_work_sync(&ctrl->reset_work);
+		flush_work(&ctrl->reset_work);
+		nvme_stop_ctrl(ctrl);
+	}
+
+	subsys->native_mpath = native;
 	mutex_unlock(&subsys->lock);
 
-	return ret;
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		nvme_remove_namespaces(ctrl);
+		nvme_start_ctrl(ctrl);
+	}
+
+	return 0;
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 81e4e71..97a6b08 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -452,7 +452,7 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns_head *head);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys);
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {

  reply	other threads:[~2018-06-01  4:11 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-25 12:53 [PATCH 0/3] Provide more fine grained control over multipathing Johannes Thumshirn
2018-05-25 12:53 ` Johannes Thumshirn
2018-05-25 12:53 ` [PATCH 1/3] nvme: provide a way to disable nvme mpath per subsystem Johannes Thumshirn
2018-05-25 12:53   ` Johannes Thumshirn
2018-05-25 13:47   ` Mike Snitzer
2018-05-25 13:47     ` Mike Snitzer
2018-05-31  8:17   ` Sagi Grimberg
2018-05-31  8:17     ` Sagi Grimberg
2018-05-25 12:53 ` [PATCH 2/3] nvme multipath: added SUBSYS_ATTR_RW Johannes Thumshirn
2018-05-25 12:53   ` Johannes Thumshirn
2018-05-25 12:53 ` [PATCH 3/3] nvme multipath: add dev_attr_mpath_personality Johannes Thumshirn
2018-05-25 12:53   ` Johannes Thumshirn
2018-05-25 13:05 ` [PATCH 0/3] Provide more fine grained control over multipathing Christoph Hellwig
2018-05-25 13:05   ` Christoph Hellwig
2018-05-25 13:58   ` Mike Snitzer
2018-05-25 13:58     ` Mike Snitzer
2018-05-25 14:12     ` Christoph Hellwig
2018-05-25 14:12       ` Christoph Hellwig
2018-05-25 14:50       ` Mike Snitzer
2018-05-25 14:50         ` Mike Snitzer
2018-05-29  1:19         ` Martin K. Petersen
2018-05-29  1:19           ` Martin K. Petersen
2018-05-29  3:02           ` Mike Snitzer
2018-05-29  3:02             ` Mike Snitzer
2018-05-29  7:18             ` Hannes Reinecke
2018-05-29  7:18               ` Hannes Reinecke
2018-05-29  7:22             ` Johannes Thumshirn
2018-05-29  7:22               ` Johannes Thumshirn
2018-05-29  8:09               ` Christoph Hellwig
2018-05-29  8:09                 ` Christoph Hellwig
2018-05-29  9:54                 ` Mike Snitzer
2018-05-29  9:54                   ` Mike Snitzer
2018-05-29 23:27                 ` Mike Snitzer
2018-05-29 23:27                   ` Mike Snitzer
2018-05-30 19:05                   ` Jens Axboe
2018-05-30 19:05                     ` Jens Axboe
2018-05-30 19:59                     ` Mike Snitzer
2018-05-30 19:59                       ` Mike Snitzer
2018-06-04  6:19                     ` Hannes Reinecke
2018-06-04  6:19                       ` Hannes Reinecke
2018-06-04  7:18                       ` Johannes Thumshirn
2018-06-04  7:18                         ` Johannes Thumshirn
2018-06-04 12:59                         ` Christoph Hellwig
2018-06-04 12:59                           ` Christoph Hellwig
2018-06-04 13:27                           ` Mike Snitzer
2018-06-04 13:27                             ` Mike Snitzer
2018-05-31  2:42               ` Ming Lei
2018-05-31  2:42                 ` Ming Lei
2018-05-30 21:20     ` Sagi Grimberg
2018-05-30 21:20       ` Sagi Grimberg
2018-05-30 22:02       ` Mike Snitzer
2018-05-30 22:02         ` Mike Snitzer
2018-05-31  8:37         ` Sagi Grimberg
2018-05-31  8:37           ` Sagi Grimberg
2018-05-31 12:37           ` Mike Snitzer
2018-05-31 12:37             ` Mike Snitzer
2018-05-31 16:34             ` Christoph Hellwig
2018-05-31 16:34               ` Christoph Hellwig
2018-06-01  4:11               ` Mike Snitzer [this message]
2018-06-01  4:11                 ` Mike Snitzer
2018-05-31 16:36           ` Christoph Hellwig
2018-05-31 16:36             ` Christoph Hellwig
2018-05-31 16:33         ` Christoph Hellwig
2018-05-31 16:33           ` Christoph Hellwig
2018-05-31 18:17           ` Mike Snitzer
2018-05-31 18:17             ` Mike Snitzer
2018-06-01  2:40             ` Martin K. Petersen
2018-06-01  2:40               ` Martin K. Petersen
2018-06-01  4:24               ` Mike Snitzer
2018-06-01  4:24                 ` Mike Snitzer
2018-06-01 14:09                 ` Martin K. Petersen
2018-06-01 14:09                   ` Martin K. Petersen
2018-06-01 15:21                   ` Mike Snitzer
2018-06-01 15:21                     ` Mike Snitzer
2018-06-03 11:00                 ` Sagi Grimberg
2018-06-03 11:00                   ` Sagi Grimberg
2018-06-03 16:06                   ` Mike Snitzer
2018-06-03 16:06                     ` Mike Snitzer
2018-06-04 11:46                     ` Sagi Grimberg
2018-06-04 11:46                       ` Sagi Grimberg
2018-06-04 12:48                       ` Johannes Thumshirn
2018-06-04 12:48                         ` Johannes Thumshirn
2018-05-30 22:44       ` Mike Snitzer
2018-05-30 22:44         ` Mike Snitzer
2018-05-31  8:51         ` Sagi Grimberg
2018-05-31  8:51           ` Sagi Grimberg
2018-05-31 12:41           ` Mike Snitzer
2018-05-31 12:41             ` Mike Snitzer
2018-06-04 21:58       ` Roland Dreier
2018-06-04 21:58         ` Roland Dreier
2018-06-05  4:42         ` Christoph Hellwig
2018-06-05  4:42           ` Christoph Hellwig
2018-06-05 22:57           ` Roland Dreier
2018-06-05 22:57             ` Roland Dreier
2018-06-06  9:51             ` Christoph Hellwig
2018-06-06  9:51               ` Christoph Hellwig
2018-06-06  9:32           ` Sagi Grimberg
2018-06-06  9:32             ` Sagi Grimberg
2018-06-06  9:50             ` Christoph Hellwig
2018-06-06  9:50               ` Christoph Hellwig
2018-05-25 14:22   ` Johannes Thumshirn
2018-05-25 14:22     ` Johannes Thumshirn
2018-05-25 14:30     ` Christoph Hellwig
2018-05-25 14:30       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180601041115.GA14244@redhat.com \
    --to=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.