All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] dm: alternate solution to reloading with failed paths
@ 2014-08-13 18:53 Benjamin Marzinski
  2014-08-13 18:53 ` [PATCH 1/2] dm-multipath: cleanup IO queueing after table load Benjamin Marzinski
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Benjamin Marzinski @ 2014-08-13 18:53 UTC (permalink / raw)
  To: device-mapper development; +Cc: Junichi Nomura

This is different solution to the problem of multipath not being able to
reload devices with failed paths than the one that Hannes first proposed.
It adds a list of dm_devs to the mapped_device structure.  All of the table
lists point to devices on this list. The list keeps track of how many tables
are accessing these devices, and locks them to deal with tables being created
and destroyed at the same time.  This avoids the issue of having tables that
list devices for which the kernel is not holding any reference, and which
can disappear and come back as a completely different device if userspace
drops its reference.  This also allows these devices to be reinstated without
another table reload.

The other patch deals with issues I discovered while tracking down why
multipath devices with no valid paths would hang on table reload. It turns
out that multipath_map can no longer be called after a table reload until
queue_io is cleared, but multipath map was previously the function that
usually cleared queue_io.


Benjamin Marzinski (2):
  dm-multipath: cleanup IO queueing after table load
  device-mapper: allow tables to share dm_devs

 drivers/md/dm-ioctl.c |   2 +-
 drivers/md/dm-mpath.c |   6 ++-
 drivers/md/dm-table.c | 101 ++++++++++++++---------------------------
 drivers/md/dm.c       | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm.h       |   5 +-
 5 files changed, 165 insertions(+), 72 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] dm-multipath: cleanup IO queueing after table load
  2014-08-13 18:53 [PATCH 0/2] dm: alternate solution to reloading with failed paths Benjamin Marzinski
@ 2014-08-13 18:53 ` Benjamin Marzinski
  2014-08-14  2:46   ` Mike Snitzer
  2014-08-13 18:53 ` [PATCH 2/2] device-mapper: allow tables to share dm_devs Benjamin Marzinski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Benjamin Marzinski @ 2014-08-13 18:53 UTC (permalink / raw)
  To: device-mapper development; +Cc: Junichi Nomura

Commit e809917735ebf1b9a56c24e877ce0d320baee2ec changed the code in
multipath_busy that checked if a multipath device was initializing a path.
It returned busy if queue_io was set on the device. However queue_io is set
when the table is loaded, and as long as the device was busy, multipath_map
wouldn't run to clear it upon receiving requsts. multipath_ioctl and
reinstate_path were still able to clear queue_io. But if no paths were
reinstated and the multipath device received no ioctls after the table
reload, all requests to it would simply queue.

This patch switches the multipath_busy code to check for a pending or in
progress path initialization, without using queue_io.  This means incoming
requests will allow multipath_map to initialize paths and clear queue_io.

This patch also changes __choose_pgpath to make it clear queue_io if there
are no valid paths, since there are obviously no paths that can be
initialized.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 drivers/md/dm-mpath.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 3f6fd9d..a0b83cb 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -317,8 +317,10 @@ static void __choose_pgpath(struct multipath *m, size_t nr_bytes)
 	struct priority_group *pg;
 	unsigned bypassed = 1;
 
-	if (!m->nr_valid_paths)
+	if (!m->nr_valid_paths) {
+		m->queue_io = 0;
 		goto failed;
+	}
 
 	/* Were we instructed to switch PG? */
 	if (m->next_pg) {
@@ -1612,7 +1614,7 @@ static int multipath_busy(struct dm_target *ti)
 	spin_lock_irqsave(&m->lock, flags);
 
 	/* pg_init in progress, requeue until done */
-	if (!pg_ready(m)) {
+	if (m->pg_init_required || m->pg_init_in_progress) {
 		busy = 1;
 		goto out;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] device-mapper: allow tables to share dm_devs
  2014-08-13 18:53 [PATCH 0/2] dm: alternate solution to reloading with failed paths Benjamin Marzinski
  2014-08-13 18:53 ` [PATCH 1/2] dm-multipath: cleanup IO queueing after table load Benjamin Marzinski
@ 2014-08-13 18:53 ` Benjamin Marzinski
  2014-09-15 21:44 ` [PATCH 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
  2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
  3 siblings, 0 replies; 12+ messages in thread
From: Benjamin Marzinski @ 2014-08-13 18:53 UTC (permalink / raw)
  To: device-mapper development; +Cc: Junichi Nomura

When a new table is created, in needs to open all of the devices that it
will use.  For multipathi, this can cause problems because it may need to
reload a table while some devices are unusable. To get around this, device
mapper now stores the dm_devs list for all table devices in the
mapped-device object itself, and the tables' device lists just point to the
main list.

The device list code in dm.c isn't a straight copy/paste form the code in
dm-table.c, but it's very close. One subtle difference is that find_devices
for the mapped-device list will only match devices with the same name and
mode. This is because we don't want to upgrade a device's mode because of
a new inactive table.

Access to the main list requires a mutex, so that tables can be created
and destroyed concurrently.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 drivers/md/dm-ioctl.c |   2 +-
 drivers/md/dm-table.c | 101 ++++++++++++++---------------------------
 drivers/md/dm.c       | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm.h       |   5 +-
 4 files changed, 161 insertions(+), 70 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5152142..0be9381 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1418,7 +1418,7 @@ static void retrieve_deps(struct dm_table *table,
 	deps->count = count;
 	count = 0;
 	list_for_each_entry (dd, dm_table_get_devices(table), list)
-		deps->dev[count++] = huge_encode_dev(dd->dm_dev.bdev->bd_dev);
+		deps->dev[count++] = huge_encode_dev(dd->dm_dev->bdev->bd_dev);
 
 	param->data_size = param->data_start + needed;
 }
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5f59f1e..9c9cd1d 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -210,15 +210,15 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
 	return 0;
 }
 
-static void free_devices(struct list_head *devices)
+static void free_devices(struct mapped_device *md, struct list_head *devices)
 {
 	struct list_head *tmp, *next;
 
 	list_for_each_safe(tmp, next, devices) {
 		struct dm_dev_internal *dd =
 		    list_entry(tmp, struct dm_dev_internal, list);
-		DMWARN("dm_table_destroy: dm_put_device call missing for %s",
-		       dd->dm_dev.name);
+		DMWARN("%s: dm_table_destroy: dm_put_device call missing for %s", dm_device_name(md), dd->dm_dev->name);
+		dm_put_table_device(md, dd->dm_dev);
 		kfree(dd);
 	}
 }
@@ -247,7 +247,7 @@ void dm_table_destroy(struct dm_table *t)
 	vfree(t->highs);
 
 	/* free the device list */
-	free_devices(&t->devices);
+	free_devices(t->md, &t->devices);
 
 	dm_free_md_mempools(t->mempools);
 
@@ -262,53 +262,13 @@ static struct dm_dev_internal *find_device(struct list_head *l, dev_t dev)
 	struct dm_dev_internal *dd;
 
 	list_for_each_entry (dd, l, list)
-		if (dd->dm_dev.bdev->bd_dev == dev)
+		if (dd->dm_dev->bdev->bd_dev == dev)
 			return dd;
 
 	return NULL;
 }
 
 /*
- * Open a device so we can use it as a map destination.
- */
-static int open_dev(struct dm_dev_internal *d, dev_t dev,
-		    struct mapped_device *md)
-{
-	static char *_claim_ptr = "I belong to device-mapper";
-	struct block_device *bdev;
-
-	int r;
-
-	BUG_ON(d->dm_dev.bdev);
-
-	bdev = blkdev_get_by_dev(dev, d->dm_dev.mode | FMODE_EXCL, _claim_ptr);
-	if (IS_ERR(bdev))
-		return PTR_ERR(bdev);
-
-	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r) {
-		blkdev_put(bdev, d->dm_dev.mode | FMODE_EXCL);
-		return r;
-	}
-
-	d->dm_dev.bdev = bdev;
-	return 0;
-}
-
-/*
- * Close a device that we've been using.
- */
-static void close_dev(struct dm_dev_internal *d, struct mapped_device *md)
-{
-	if (!d->dm_dev.bdev)
-		return;
-
-	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
-	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
-	d->dm_dev.bdev = NULL;
-}
-
-/*
  * If possible, this checks an area of a destination device is invalid.
  */
 static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
@@ -386,19 +346,17 @@ static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
 			struct mapped_device *md)
 {
 	int r;
-	struct dm_dev_internal dd_new, dd_old;
-
-	dd_new = dd_old = *dd;
+	struct dm_dev *old_dev, *new_dev;
 
-	dd_new.dm_dev.mode |= new_mode;
-	dd_new.dm_dev.bdev = NULL;
+	old_dev = dd->dm_dev;
 
-	r = open_dev(&dd_new, dd->dm_dev.bdev->bd_dev, md);
+	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev,
+				dd->dm_dev->mode | new_mode, &new_dev);
 	if (r)
 		return r;
 
-	dd->dm_dev.mode |= new_mode;
-	close_dev(&dd_old, md);
+	dd->dm_dev = new_dev;
+	dm_put_table_device(md, old_dev);
 
 	return 0;
 }
@@ -440,27 +398,22 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		if (!dd)
 			return -ENOMEM;
 
-		dd->dm_dev.mode = mode;
-		dd->dm_dev.bdev = NULL;
-
-		if ((r = open_dev(dd, dev, t->md))) {
+		if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
 			kfree(dd);
 			return r;
 		}
 
-		format_dev_t(dd->dm_dev.name, dev);
-
 		atomic_set(&dd->count, 0);
 		list_add(&dd->list, &t->devices);
 
-	} else if (dd->dm_dev.mode != (mode | dd->dm_dev.mode)) {
+	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
 		r = upgrade_mode(dd, mode, t->md);
 		if (r)
 			return r;
 	}
 	atomic_inc(&dd->count);
 
-	*result = &dd->dm_dev;
+	*result = dd->dm_dev;
 	return 0;
 }
 EXPORT_SYMBOL(dm_get_device);
@@ -505,11 +458,23 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
  */
 void dm_put_device(struct dm_target *ti, struct dm_dev *d)
 {
-	struct dm_dev_internal *dd = container_of(d, struct dm_dev_internal,
-						  dm_dev);
+	int found = 0;
+	struct list_head *l = &ti->table->devices;
+	struct dm_dev_internal *dd;
 
+	list_for_each_entry (dd, l, list) {
+		if (dd->dm_dev == d) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found) {
+		DMWARN("%s: device %s not in table list",
+		       dm_device_name(ti->table->md), d->name);
+		return;
+	}
 	if (atomic_dec_and_test(&dd->count)) {
-		close_dev(dd, ti->table->md);
+		dm_put_table_device(ti->table->md, d);
 		list_del(&dd->list);
 		kfree(dd);
 	}
@@ -906,7 +871,7 @@ static int dm_table_set_type(struct dm_table *t)
 	/* Non-request-stackable devices can't be used for request-based dm */
 	devices = dm_table_get_devices(t);
 	list_for_each_entry(dd, devices, list) {
-		if (!blk_queue_stackable(bdev_get_queue(dd->dm_dev.bdev))) {
+		if (!blk_queue_stackable(bdev_get_queue(dd->dm_dev->bdev))) {
 			DMWARN("table load rejected: including"
 			       " non-request-stackable devices");
 			return -EINVAL;
@@ -1043,7 +1008,7 @@ static struct gendisk * dm_table_get_integrity_disk(struct dm_table *t,
 	struct gendisk *prev_disk = NULL, *template_disk = NULL;
 
 	list_for_each_entry(dd, devices, list) {
-		template_disk = dd->dm_dev.bdev->bd_disk;
+		template_disk = dd->dm_dev->bdev->bd_disk;
 		if (!blk_get_integrity(template_disk))
 			goto no_integrity;
 		if (!match_all && !blk_integrity_is_initialized(template_disk))
@@ -1579,7 +1544,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 	int r = 0;
 
 	list_for_each_entry(dd, devices, list) {
-		struct request_queue *q = bdev_get_queue(dd->dm_dev.bdev);
+		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
 		char b[BDEVNAME_SIZE];
 
 		if (likely(q))
@@ -1587,7 +1552,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 		else
 			DMWARN_LIMIT("%s: any_congested: nonexistent device %s",
 				     dm_device_name(t->md),
-				     bdevname(dd->dm_dev.bdev, b));
+				     bdevname(dd->dm_dev->bdev, b));
 	}
 
 	list_for_each_entry(cb, &t->target_callbacks, list)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 437d990..d33b7f1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -140,6 +140,9 @@ struct mapped_device {
 	 */
 	struct dm_table *map;
 
+	struct list_head table_devices;
+	struct mutex devs_lock;
+
 	unsigned long flags;
 
 	struct request_queue *queue;
@@ -210,6 +213,12 @@ struct dm_md_mempools {
 	struct bio_set *bs;
 };
 
+struct table_device {
+	struct list_head list;
+	atomic_t count;
+	struct dm_dev dm_dev;
+};
+
 #define RESERVED_BIO_BASED_IOS		16
 #define RESERVED_REQUEST_BASED_IOS	256
 #define RESERVED_MAX_IOS		1024
@@ -659,6 +668,117 @@ static void dm_put_live_table_fast(struct mapped_device *md) __releases(RCU)
 }
 
 /*
+ * Open a table device so we can use it as a map destination.
+ */
+static int open_dev(struct table_device *d, dev_t dev,
+		    struct mapped_device *md)
+{
+	static char *_claim_ptr = "I belong to device-mapper";
+	struct block_device *bdev;
+
+	int r;
+
+	BUG_ON(d->dm_dev.bdev);
+
+	bdev = blkdev_get_by_dev(dev, d->dm_dev.mode | FMODE_EXCL, _claim_ptr);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	r = bd_link_disk_holder(bdev, dm_disk(md));
+	if (r) {
+		blkdev_put(bdev, d->dm_dev.mode | FMODE_EXCL);
+		return r;
+	}
+
+	d->dm_dev.bdev = bdev;
+	return 0;
+}
+
+/*
+ * Close a table device that we've been using.
+ */
+static void close_dev(struct table_device *d, struct mapped_device *md)
+{
+	if (!d->dm_dev.bdev)
+		return;
+
+	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
+	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
+	d->dm_dev.bdev = NULL;
+}
+
+static struct table_device *find_device(struct list_head *l, dev_t dev,
+					fmode_t mode) {
+	struct table_device *dd;
+
+	list_for_each_entry (dd, l, list)
+		if (dd->dm_dev.bdev->bd_dev == dev && dd->dm_dev.mode == mode)
+			return dd;
+
+	return NULL;
+}
+
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+			struct dm_dev **result) {
+	int r;
+	struct table_device *dd;
+
+	mutex_lock(&md->devs_lock);
+	dd = find_device(&md->table_devices, dev, mode);
+	if (!dd) {
+		dd = kmalloc(sizeof(*dd), GFP_KERNEL);
+		if (!dd)
+			return -ENOMEM;
+
+		dd->dm_dev.mode = mode;
+		dd->dm_dev.bdev = NULL;
+
+		if ((r = open_dev(dd, dev, md))) {
+			kfree(dd);
+			return r;
+		}
+
+		format_dev_t(dd->dm_dev.name, dev);
+
+		atomic_set(&dd->count, 0);
+		list_add(&dd->list, &md->table_devices);
+	}
+
+	atomic_inc(&dd->count);
+	mutex_unlock(&md->devs_lock);
+	*result = &dd->dm_dev;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_get_table_device);
+
+void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
+{
+	struct table_device *dd = container_of(d, struct table_device, dm_dev);
+
+	mutex_lock(&md->devs_lock);
+	if (atomic_dec_and_test(&dd->count)) {
+		close_dev(dd, md);
+		list_del(&dd->list);
+		kfree(dd);
+	}
+	mutex_unlock(&md->devs_lock);
+}
+EXPORT_SYMBOL(dm_put_table_device);
+
+static void free_devices(struct list_head *devices)
+{
+	struct list_head *tmp, *next;
+
+	list_for_each_safe(tmp, next, devices) {
+		struct table_device *dd = list_entry(tmp, struct table_device,
+						     list);
+		DMWARN("dm_destroy: %s still exists with %d references",
+		       dd->dm_dev.name, atomic_read(&dd->count));
+		kfree(dd);
+	}
+}
+
+/*
  * Get the geometry associated with a dm device
  */
 int dm_get_geometry(struct mapped_device *md, struct hd_geometry *geo)
@@ -1938,12 +2058,14 @@ static struct mapped_device *alloc_dev(int minor)
 	md->type = DM_TYPE_NONE;
 	mutex_init(&md->suspend_lock);
 	mutex_init(&md->type_lock);
+	mutex_init(&md->devs_lock);
 	spin_lock_init(&md->deferred_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
 	atomic_set(&md->event_nr, 0);
 	atomic_set(&md->uevent_seq, 0);
 	INIT_LIST_HEAD(&md->uevent_list);
+	INIT_LIST_HEAD(&md->table_devices);
 	spin_lock_init(&md->uevent_lock);
 
 	md->queue = blk_alloc_queue(GFP_KERNEL);
@@ -2029,6 +2151,7 @@ static void free_dev(struct mapped_device *md)
 	blk_integrity_unregister(md->disk);
 	del_gendisk(md->disk);
 	cleanup_srcu_struct(&md->io_barrier);
+	free_devices(&md->table_devices);
 	free_minor(minor);
 
 	spin_lock(&_minor_lock);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index ed76126..e6aa368 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -44,7 +44,7 @@
 struct dm_dev_internal {
 	struct list_head list;
 	atomic_t count;
-	struct dm_dev dm_dev;
+	struct dm_dev *dm_dev;
 };
 
 struct dm_table;
@@ -189,6 +189,9 @@ int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
 sector_t dm_get_size(struct mapped_device *md);
 struct request_queue *dm_get_md_queue(struct mapped_device *md);
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+			struct dm_dev **result);
+void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
 struct dm_stats *dm_get_stats(struct mapped_device *md);
 
 int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] dm-multipath: cleanup IO queueing after table load
  2014-08-13 18:53 ` [PATCH 1/2] dm-multipath: cleanup IO queueing after table load Benjamin Marzinski
@ 2014-08-14  2:46   ` Mike Snitzer
  2014-08-15 15:46     ` Benjamin Marzinski
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2014-08-14  2:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Junichi Nomura

On Wed, Aug 13, 2014 at 2:53 PM, Benjamin Marzinski <bmarzins@redhat.com> wrote:
> Commit e809917735ebf1b9a56c24e877ce0d320baee2ec changed the code in
> multipath_busy that checked if a multipath device was initializing a path.
> It returned busy if queue_io was set on the device. However queue_io is set
> when the table is loaded, and as long as the device was busy, multipath_map
> wouldn't run to clear it upon receiving requsts. multipath_ioctl and
> reinstate_path were still able to clear queue_io. But if no paths were
> reinstated and the multipath device received no ioctls after the table
> reload, all requests to it would simply queue.
>
> This patch switches the multipath_busy code to check for a pending or in
> progress path initialization, without using queue_io.  This means incoming
> requests will allow multipath_map to initialize paths and clear queue_io.

This was already addressed with commit 7a7a3b45fe ("dm mpath: fix IO
hang due to logic bug in multipath_busy").

> This patch also changes __choose_pgpath to make it clear queue_io if there
> are no valid paths, since there are obviously no paths that can be
> initialized.

Can you elaborate on why this is needed?  If it still is then it needs
to be rebased.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] dm-multipath: cleanup IO queueing after table load
  2014-08-14  2:46   ` Mike Snitzer
@ 2014-08-15 15:46     ` Benjamin Marzinski
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Marzinski @ 2014-08-15 15:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Junichi Nomura

On Wed, Aug 13, 2014 at 10:46:15PM -0400, Mike Snitzer wrote:
> On Wed, Aug 13, 2014 at 2:53 PM, Benjamin Marzinski <bmarzins@redhat.com> wrote:
> > Commit e809917735ebf1b9a56c24e877ce0d320baee2ec changed the code in
> > multipath_busy that checked if a multipath device was initializing a path.
> > It returned busy if queue_io was set on the device. However queue_io is set
> > when the table is loaded, and as long as the device was busy, multipath_map
> > wouldn't run to clear it upon receiving requsts. multipath_ioctl and
> > reinstate_path were still able to clear queue_io. But if no paths were
> > reinstated and the multipath device received no ioctls after the table
> > reload, all requests to it would simply queue.
> >
> > This patch switches the multipath_busy code to check for a pending or in
> > progress path initialization, without using queue_io.  This means incoming
> > requests will allow multipath_map to initialize paths and clear queue_io.
> 
> This was already addressed with commit 7a7a3b45fe ("dm mpath: fix IO
> hang due to logic bug in multipath_busy").
> 
> > This patch also changes __choose_pgpath to make it clear queue_io if there
> > are no valid paths, since there are obviously no paths that can be
> > initialized.
> 
> Can you elaborate on why this is needed?  If it still is then it needs
> to be rebased.

For a while now, it's been possible to create a multipath table with no
devices.  With this patch set, it will be even easier to reload a
multipath device with no valid paths (since multipath now supports
reloading failed paths). In these scenarios, we still need to disable
queue_io, so that IOs to the device will not just back up.

-Ben

> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] dm: alternate solution to reloading with failed paths
  2014-08-13 18:53 [PATCH 0/2] dm: alternate solution to reloading with failed paths Benjamin Marzinski
  2014-08-13 18:53 ` [PATCH 1/2] dm-multipath: cleanup IO queueing after table load Benjamin Marzinski
  2014-08-13 18:53 ` [PATCH 2/2] device-mapper: allow tables to share dm_devs Benjamin Marzinski
@ 2014-09-15 21:44 ` Mike Snitzer
  2014-09-16 16:46   ` Benjamin Marzinski
  2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
  3 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2014-09-15 21:44 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Junichi Nomura, device-mapper development

On Wed, Aug 13 2014 at  2:53pm -0400,
Benjamin Marzinski <bmarzins@redhat.com> wrote:

> This is different solution to the problem of multipath not being able to
> reload devices with failed paths than the one that Hannes first proposed.
> It adds a list of dm_devs to the mapped_device structure.  All of the table
> lists point to devices on this list. The list keeps track of how many tables
> are accessing these devices, and locks them to deal with tables being created
> and destroyed at the same time.  This avoids the issue of having tables that
> list devices for which the kernel is not holding any reference, and which
> can disappear and come back as a completely different device if userspace
> drops its reference.  This also allows these devices to be reinstated without
> another table reload.
> 
> The other patch deals with issues I discovered while tracking down why
> multipath devices with no valid paths would hang on table reload. It turns
> out that multipath_map can no longer be called after a table reload until
> queue_io is cleared, but multipath map was previously the function that
> usually cleared queue_io.

I've done a first pass review of the code changes and overlayed a
cleanup patch in this branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=devel

I'll do another pass of review tomorrow and perform a full regression
test as well as a targetted test of loading various multipath tables
with no devices.

If all looks good I'll get it staged for 3.18.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] dm: alternate solution to reloading with failed paths
  2014-09-15 21:44 ` [PATCH 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
@ 2014-09-16 16:46   ` Benjamin Marzinski
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Marzinski @ 2014-09-16 16:46 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Junichi Nomura, device-mapper development

On Mon, Sep 15, 2014 at 05:44:53PM -0400, Mike Snitzer wrote:
> On Wed, Aug 13 2014 at  2:53pm -0400,
> Benjamin Marzinski <bmarzins@redhat.com> wrote:
> 
> > This is different solution to the problem of multipath not being able to
> > reload devices with failed paths than the one that Hannes first proposed.
> > It adds a list of dm_devs to the mapped_device structure.  All of the table
> > lists point to devices on this list. The list keeps track of how many tables
> > are accessing these devices, and locks them to deal with tables being created
> > and destroyed at the same time.  This avoids the issue of having tables that
> > list devices for which the kernel is not holding any reference, and which
> > can disappear and come back as a completely different device if userspace
> > drops its reference.  This also allows these devices to be reinstated without
> > another table reload.
> > 
> > The other patch deals with issues I discovered while tracking down why
> > multipath devices with no valid paths would hang on table reload. It turns
> > out that multipath_map can no longer be called after a table reload until
> > queue_io is cleared, but multipath map was previously the function that
> > usually cleared queue_io.
> 
> I've done a first pass review of the code changes and overlayed a
> cleanup patch in this branch:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=devel
> 
> I'll do another pass of review tomorrow and perform a full regression
> test as well as a targetted test of loading various multipath tables
> with no devices.
> 
> If all looks good I'll get it staged for 3.18.


You changes look fine by me.

-Ben

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 0/2] dm: alternate solution to reloading with failed paths
  2014-08-13 18:53 [PATCH 0/2] dm: alternate solution to reloading with failed paths Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2014-09-15 21:44 ` [PATCH 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
@ 2014-09-18 16:02 ` Mike Snitzer
  2014-09-18 16:02   ` [PATCH v2 1/2] dm mpath: stop queueing IO when no valid paths exist Mike Snitzer
                     ` (2 more replies)
  3 siblings, 3 replies; 12+ messages in thread
From: Mike Snitzer @ 2014-09-18 16:02 UTC (permalink / raw)
  To: hare; +Cc: Junichi Nomura, dm-devel

Hi Hannes,

Here is v2 of Ben's approach to fixing the inability to reload a
multipath table that includes failed paths.  I've staged it in
linux-dm.git's 'for-next' branch.

But I wanted to send this v2 out to verify this approach works for
you.  This approach is focused on allowing failed paths to be pushed
down in a multipath table.

It does _not_ allow non-existent devices to be listed in an mpath
table like your patchset did.  So the aggressive dev_loss_tmo case
isn't covered... but Ben said that case is already handled by
multipathd getting a change event and pushing the table down without
the removed SCSI device(s).

Thoughts?
Mike

Benjamin Marzinski (2):
  dm mpath: stop queueing IO when no valid paths exist
  dm: allow active and inactive tables to share dm_devs

 drivers/md/dm-ioctl.c         |   2 +-
 drivers/md/dm-mpath.c         |   4 +-
 drivers/md/dm-table.c         | 102 ++++++++++++----------------------
 drivers/md/dm.c               | 126 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm.h               |   5 +-
 include/uapi/linux/dm-ioctl.h |   4 +-
 6 files changed, 170 insertions(+), 73 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] dm mpath: stop queueing IO when no valid paths exist
  2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
@ 2014-09-18 16:02   ` Mike Snitzer
  2014-09-18 16:02   ` [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs Mike Snitzer
  2014-09-18 18:50   ` [PATCH v2 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
  2 siblings, 0 replies; 12+ messages in thread
From: Mike Snitzer @ 2014-09-18 16:02 UTC (permalink / raw)
  To: hare; +Cc: Junichi Nomura, dm-devel

From: Benjamin Marzinski <bmarzins@redhat.com>

'queue_io' is set so that IO is queued while paths are being
initialized.  Clear queue_io in __choose_pgpath if there are no valid
paths, since there are obviously no paths that can be initialized.
Otherwise IOs to the device will back up.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 833d7e7..7b6b0f0 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -317,8 +317,10 @@ static void __choose_pgpath(struct multipath *m, size_t nr_bytes)
 	struct priority_group *pg;
 	unsigned bypassed = 1;
 
-	if (!m->nr_valid_paths)
+	if (!m->nr_valid_paths) {
+		m->queue_io = 0;
 		goto failed;
+	}
 
 	/* Were we instructed to switch PG? */
 	if (m->next_pg) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs
  2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
  2014-09-18 16:02   ` [PATCH v2 1/2] dm mpath: stop queueing IO when no valid paths exist Mike Snitzer
@ 2014-09-18 16:02   ` Mike Snitzer
  2014-09-18 18:48     ` Mike Snitzer
  2014-09-18 18:50   ` [PATCH v2 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
  2 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2014-09-18 16:02 UTC (permalink / raw)
  To: hare; +Cc: Junichi Nomura, dm-devel

From: Benjamin Marzinski <bmarzins@redhat.com>

When a new table is created, in needs to open all of the devices that it
will use.  For multipath, this can cause problems because it may need to
reload a table while some devices are unusable (failed). To get around
this, device mapper now stores the dm_devs list for all table devices in
the mapped_device structure, and the tables' device lists just point to
this main table_devices list.

The device list code in dm.c isn't a straight copy/paste from the code in
dm-table.c, but it's very close (aside from some variable renames).  One
subtle difference is that find_table_device for the tables_devices list
will only match devices with the same name and mode.  This is because we
don't want to upgrade a device's mode in the active table when an
inactive table is loaded.

Access to the mapped_device structure's tables_devices list requires a
mutex (tables_devices_lock), so that tables cannot be created and
destroyed concurrently.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-ioctl.c         |   2 +-
 drivers/md/dm-table.c         | 102 ++++++++++++----------------------
 drivers/md/dm.c               | 126 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/dm.h               |   5 +-
 include/uapi/linux/dm-ioctl.h |   4 +-
 5 files changed, 167 insertions(+), 72 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5152142..0be9381 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1418,7 +1418,7 @@ static void retrieve_deps(struct dm_table *table,
 	deps->count = count;
 	count = 0;
 	list_for_each_entry (dd, dm_table_get_devices(table), list)
-		deps->dev[count++] = huge_encode_dev(dd->dm_dev.bdev->bd_dev);
+		deps->dev[count++] = huge_encode_dev(dd->dm_dev->bdev->bd_dev);
 
 	param->data_size = param->data_start + needed;
 }
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f9c6cb8..b2bd1eb 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -210,15 +210,16 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
 	return 0;
 }
 
-static void free_devices(struct list_head *devices)
+static void free_devices(struct list_head *devices, struct mapped_device *md)
 {
 	struct list_head *tmp, *next;
 
 	list_for_each_safe(tmp, next, devices) {
 		struct dm_dev_internal *dd =
 		    list_entry(tmp, struct dm_dev_internal, list);
-		DMWARN("dm_table_destroy: dm_put_device call missing for %s",
-		       dd->dm_dev.name);
+		DMWARN("%s: dm_table_destroy: dm_put_device call missing for %s",
+		       dm_device_name(md), dd->dm_dev->name);
+		dm_put_table_device(md, dd->dm_dev);
 		kfree(dd);
 	}
 }
@@ -247,7 +248,7 @@ void dm_table_destroy(struct dm_table *t)
 	vfree(t->highs);
 
 	/* free the device list */
-	free_devices(&t->devices);
+	free_devices(&t->devices, t->md);
 
 	dm_free_md_mempools(t->mempools);
 
@@ -262,53 +263,13 @@ static struct dm_dev_internal *find_device(struct list_head *l, dev_t dev)
 	struct dm_dev_internal *dd;
 
 	list_for_each_entry (dd, l, list)
-		if (dd->dm_dev.bdev->bd_dev == dev)
+		if (dd->dm_dev->bdev->bd_dev == dev)
 			return dd;
 
 	return NULL;
 }
 
 /*
- * Open a device so we can use it as a map destination.
- */
-static int open_dev(struct dm_dev_internal *d, dev_t dev,
-		    struct mapped_device *md)
-{
-	static char *_claim_ptr = "I belong to device-mapper";
-	struct block_device *bdev;
-
-	int r;
-
-	BUG_ON(d->dm_dev.bdev);
-
-	bdev = blkdev_get_by_dev(dev, d->dm_dev.mode | FMODE_EXCL, _claim_ptr);
-	if (IS_ERR(bdev))
-		return PTR_ERR(bdev);
-
-	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r) {
-		blkdev_put(bdev, d->dm_dev.mode | FMODE_EXCL);
-		return r;
-	}
-
-	d->dm_dev.bdev = bdev;
-	return 0;
-}
-
-/*
- * Close a device that we've been using.
- */
-static void close_dev(struct dm_dev_internal *d, struct mapped_device *md)
-{
-	if (!d->dm_dev.bdev)
-		return;
-
-	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
-	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
-	d->dm_dev.bdev = NULL;
-}
-
-/*
  * If possible, this checks an area of a destination device is invalid.
  */
 static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
@@ -386,19 +347,17 @@ static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
 			struct mapped_device *md)
 {
 	int r;
-	struct dm_dev_internal dd_new, dd_old;
+	struct dm_dev *old_dev, *new_dev;
 
-	dd_new = dd_old = *dd;
+	old_dev = dd->dm_dev;
 
-	dd_new.dm_dev.mode |= new_mode;
-	dd_new.dm_dev.bdev = NULL;
-
-	r = open_dev(&dd_new, dd->dm_dev.bdev->bd_dev, md);
+	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev,
+				dd->dm_dev->mode | new_mode, &new_dev);
 	if (r)
 		return r;
 
-	dd->dm_dev.mode |= new_mode;
-	close_dev(&dd_old, md);
+	dd->dm_dev = new_dev;
+	dm_put_table_device(md, old_dev);
 
 	return 0;
 }
@@ -440,27 +399,22 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		if (!dd)
 			return -ENOMEM;
 
-		dd->dm_dev.mode = mode;
-		dd->dm_dev.bdev = NULL;
-
-		if ((r = open_dev(dd, dev, t->md))) {
+		if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
 			kfree(dd);
 			return r;
 		}
 
-		format_dev_t(dd->dm_dev.name, dev);
-
 		atomic_set(&dd->count, 0);
 		list_add(&dd->list, &t->devices);
 
-	} else if (dd->dm_dev.mode != (mode | dd->dm_dev.mode)) {
+	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
 		r = upgrade_mode(dd, mode, t->md);
 		if (r)
 			return r;
 	}
 	atomic_inc(&dd->count);
 
-	*result = &dd->dm_dev;
+	*result = dd->dm_dev;
 	return 0;
 }
 EXPORT_SYMBOL(dm_get_device);
@@ -505,11 +459,23 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
  */
 void dm_put_device(struct dm_target *ti, struct dm_dev *d)
 {
-	struct dm_dev_internal *dd = container_of(d, struct dm_dev_internal,
-						  dm_dev);
+	int found = 0;
+	struct list_head *devices = &ti->table->devices;
+	struct dm_dev_internal *dd;
 
+	list_for_each_entry(dd, devices, list) {
+		if (dd->dm_dev == d) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found) {
+		DMWARN("%s: device %s not in table devices list",
+		       dm_device_name(ti->table->md), d->name);
+		return;
+	}
 	if (atomic_dec_and_test(&dd->count)) {
-		close_dev(dd, ti->table->md);
+		dm_put_table_device(ti->table->md, d);
 		list_del(&dd->list);
 		kfree(dd);
 	}
@@ -906,7 +872,7 @@ static int dm_table_set_type(struct dm_table *t)
 	/* Non-request-stackable devices can't be used for request-based dm */
 	devices = dm_table_get_devices(t);
 	list_for_each_entry(dd, devices, list) {
-		if (!blk_queue_stackable(bdev_get_queue(dd->dm_dev.bdev))) {
+		if (!blk_queue_stackable(bdev_get_queue(dd->dm_dev->bdev))) {
 			DMWARN("table load rejected: including"
 			       " non-request-stackable devices");
 			return -EINVAL;
@@ -1043,7 +1009,7 @@ static struct gendisk * dm_table_get_integrity_disk(struct dm_table *t,
 	struct gendisk *prev_disk = NULL, *template_disk = NULL;
 
 	list_for_each_entry(dd, devices, list) {
-		template_disk = dd->dm_dev.bdev->bd_disk;
+		template_disk = dd->dm_dev->bdev->bd_disk;
 		if (!blk_get_integrity(template_disk))
 			goto no_integrity;
 		if (!match_all && !blk_integrity_is_initialized(template_disk))
@@ -1629,7 +1595,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 	int r = 0;
 
 	list_for_each_entry(dd, devices, list) {
-		struct request_queue *q = bdev_get_queue(dd->dm_dev.bdev);
+		struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
 		char b[BDEVNAME_SIZE];
 
 		if (likely(q))
@@ -1637,7 +1603,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 		else
 			DMWARN_LIMIT("%s: any_congested: nonexistent device %s",
 				     dm_device_name(t->md),
-				     bdevname(dd->dm_dev.bdev, b));
+				     bdevname(dd->dm_dev->bdev, b));
 	}
 
 	list_for_each_entry(cb, &t->target_callbacks, list)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 32b958d..e1db62c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -142,6 +142,9 @@ struct mapped_device {
 	 */
 	struct dm_table *map;
 
+	struct list_head table_devices;
+	struct mutex table_devices_lock;
+
 	unsigned long flags;
 
 	struct request_queue *queue;
@@ -212,6 +215,12 @@ struct dm_md_mempools {
 	struct bio_set *bs;
 };
 
+struct table_device {
+	struct list_head list;
+	atomic_t count;
+	struct dm_dev dm_dev;
+};
+
 #define RESERVED_BIO_BASED_IOS		16
 #define RESERVED_REQUEST_BASED_IOS	256
 #define RESERVED_MAX_IOS		1024
@@ -670,6 +679,120 @@ static void dm_put_live_table_fast(struct mapped_device *md) __releases(RCU)
 }
 
 /*
+ * Open a table device so we can use it as a map destination.
+ */
+static int open_table_device(struct table_device *td, dev_t dev,
+			     struct mapped_device *md)
+{
+	static char *_claim_ptr = "I belong to device-mapper";
+	struct block_device *bdev;
+
+	int r;
+
+	BUG_ON(td->dm_dev.bdev);
+
+	bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _claim_ptr);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+
+	r = bd_link_disk_holder(bdev, dm_disk(md));
+	if (r) {
+		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
+		return r;
+	}
+
+	td->dm_dev.bdev = bdev;
+	return 0;
+}
+
+/*
+ * Close a table device that we've been using.
+ */
+static void close_table_device(struct table_device *td, struct mapped_device *md)
+{
+	if (!td->dm_dev.bdev)
+		return;
+
+	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	td->dm_dev.bdev = NULL;
+}
+
+static struct table_device *find_table_device(struct list_head *l, dev_t dev,
+					      fmode_t mode) {
+	struct table_device *td;
+
+	list_for_each_entry(td, l, list)
+		if (td->dm_dev.bdev->bd_dev == dev && td->dm_dev.mode == mode)
+			return td;
+
+	return NULL;
+}
+
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+			struct dm_dev **result) {
+	int r;
+	struct table_device *td;
+
+	mutex_lock(&md->table_devices_lock);
+	td = find_table_device(&md->table_devices, dev, mode);
+	if (!td) {
+		td = kmalloc(sizeof(*td), GFP_KERNEL);
+		if (!td) {
+			mutex_unlock(&md->table_devices_lock);
+			return -ENOMEM;
+		}
+
+		td->dm_dev.mode = mode;
+		td->dm_dev.bdev = NULL;
+
+		if ((r = open_table_device(td, dev, md))) {
+			mutex_unlock(&md->table_devices_lock);
+			kfree(td);
+			return r;
+		}
+
+		format_dev_t(td->dm_dev.name, dev);
+
+		atomic_set(&td->count, 0);
+		list_add(&td->list, &md->table_devices);
+	}
+	atomic_inc(&td->count);
+	mutex_unlock(&md->table_devices_lock);
+
+	*result = &td->dm_dev;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dm_get_table_device);
+
+void dm_put_table_device(struct mapped_device *md, struct dm_dev *d)
+{
+	struct table_device *td = container_of(d, struct table_device, dm_dev);
+
+	mutex_lock(&md->table_devices_lock);
+	if (atomic_dec_and_test(&td->count)) {
+		close_table_device(td, md);
+		list_del(&td->list);
+		kfree(td);
+	}
+	mutex_unlock(&md->table_devices_lock);
+}
+EXPORT_SYMBOL(dm_put_table_device);
+
+static void free_table_devices(struct list_head *devices)
+{
+	struct list_head *tmp, *next;
+
+	list_for_each_safe(tmp, next, devices) {
+		struct table_device *td = list_entry(tmp, struct table_device, list);
+
+		DMWARN("dm_destroy: %s still exists with %d references",
+		       td->dm_dev.name, atomic_read(&td->count));
+		kfree(td);
+	}
+}
+
+/*
  * Get the geometry associated with a dm device
  */
 int dm_get_geometry(struct mapped_device *md, struct hd_geometry *geo)
@@ -1949,12 +2072,14 @@ static struct mapped_device *alloc_dev(int minor)
 	md->type = DM_TYPE_NONE;
 	mutex_init(&md->suspend_lock);
 	mutex_init(&md->type_lock);
+	mutex_init(&md->table_devices_lock);
 	spin_lock_init(&md->deferred_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
 	atomic_set(&md->event_nr, 0);
 	atomic_set(&md->uevent_seq, 0);
 	INIT_LIST_HEAD(&md->uevent_list);
+	INIT_LIST_HEAD(&md->table_devices);
 	spin_lock_init(&md->uevent_lock);
 
 	md->queue = blk_alloc_queue(GFP_KERNEL);
@@ -2040,6 +2165,7 @@ static void free_dev(struct mapped_device *md)
 	blk_integrity_unregister(md->disk);
 	del_gendisk(md->disk);
 	cleanup_srcu_struct(&md->io_barrier);
+	free_table_devices(&md->table_devices);
 	free_minor(minor);
 
 	spin_lock(&_minor_lock);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index e81d215..988c7fb 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -44,7 +44,7 @@
 struct dm_dev_internal {
 	struct list_head list;
 	atomic_t count;
-	struct dm_dev dm_dev;
+	struct dm_dev *dm_dev;
 };
 
 struct dm_table;
@@ -188,6 +188,9 @@ int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
 sector_t dm_get_size(struct mapped_device *md);
 struct request_queue *dm_get_md_queue(struct mapped_device *md);
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+			struct dm_dev **result);
+void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
 struct dm_stats *dm_get_stats(struct mapped_device *md);
 
 int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index c8a4302..06614f6 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -267,9 +267,9 @@ enum {
 #define DM_DEV_SET_GEOMETRY	_IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
 
 #define DM_VERSION_MAJOR	4
-#define DM_VERSION_MINOR	27
+#define DM_VERSION_MINOR	28
 #define DM_VERSION_PATCHLEVEL	0
-#define DM_VERSION_EXTRA	"-ioctl (2013-10-30)"
+#define DM_VERSION_EXTRA	"-ioctl (2014-9-17)"
 
 /* Status bits */
 #define DM_READONLY_FLAG	(1 << 0) /* In/Out */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs
  2014-09-18 16:02   ` [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs Mike Snitzer
@ 2014-09-18 18:48     ` Mike Snitzer
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Snitzer @ 2014-09-18 18:48 UTC (permalink / raw)
  To: hare; +Cc: Junichi Nomura, dm-devel

I've revised the header for this patch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=d379c2886c48b32d5f4cf70a27ba2409e127b6f7

dm: allow active and inactive tables to share dm_devs

Until this change, when loading a new DM table, DM core would re-open
all of the devices in the DM table. Now, DM core will avoid redundant
device opens (and closes when destroying the old table) if the old table
already has a device open using the same mode. This is achieved by
managing reference counts on the table_devices that DM core now stores
in the mapped_device structure (rather than in the dm_table
structure). So a mapped_device's active and inactive dm_tables' dm_dev
lists now just point to the dm_devs stored in the mapped_device's
table_devices list.

This improvement in DM core's device reference counting has the
side-effect of fixing a long-standing limitation of the multipath
target: a DM multipath table couldn't include any paths that were
unusable (failed). For example: if all paths have failed and you add a
new, working, path to the table; you can't use it since the table load
would fail due to it still containing failed paths. Now a re-load of a
multipath table can include failed devices and when those devices become
active again they can be used instantly.


The device list code in dm.c isn't a straight copy/paste from the code
in dm-table.c, but it's very close (aside from some variable
renames). One subtle difference is that find_table_device for the
tables_devices list will only match devices with the same name and
mode. This is because we don't want to upgrade a device's mode in the
active table when an inactive table is loaded. Access to the
mapped_device structure's tables_devices list requires a mutex
(tables_devices_lock), so that tables cannot be created and destroyed
concurrently.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] dm: alternate solution to reloading with failed paths
  2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
  2014-09-18 16:02   ` [PATCH v2 1/2] dm mpath: stop queueing IO when no valid paths exist Mike Snitzer
  2014-09-18 16:02   ` [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs Mike Snitzer
@ 2014-09-18 18:50   ` Mike Snitzer
  2 siblings, 0 replies; 12+ messages in thread
From: Mike Snitzer @ 2014-09-18 18:50 UTC (permalink / raw)
  To: hare; +Cc: Junichi Nomura, dm-devel

On Thu, Sep 18 2014 at 12:02pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> Hi Hannes,
> 
> Here is v2 of Ben's approach to fixing the inability to reload a
> multipath table that includes failed paths.  I've staged it in
> linux-dm.git's 'for-next' branch.
> 
> But I wanted to send this v2 out to verify this approach works for
> you.  This approach is focused on allowing failed paths to be pushed
> down in a multipath table.
> 
> It does _not_ allow non-existent devices to be listed in an mpath
> table like your patchset did.  So the aggressive dev_loss_tmo case
> isn't covered... but Ben said that case is already handled by
> multipathd getting a change event and pushing the table down without
> the removed SCSI device(s).

It also doesn't allow the initial multipath table load to include failed
devices...

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-09-18 18:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-13 18:53 [PATCH 0/2] dm: alternate solution to reloading with failed paths Benjamin Marzinski
2014-08-13 18:53 ` [PATCH 1/2] dm-multipath: cleanup IO queueing after table load Benjamin Marzinski
2014-08-14  2:46   ` Mike Snitzer
2014-08-15 15:46     ` Benjamin Marzinski
2014-08-13 18:53 ` [PATCH 2/2] device-mapper: allow tables to share dm_devs Benjamin Marzinski
2014-09-15 21:44 ` [PATCH 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer
2014-09-16 16:46   ` Benjamin Marzinski
2014-09-18 16:02 ` [PATCH v2 " Mike Snitzer
2014-09-18 16:02   ` [PATCH v2 1/2] dm mpath: stop queueing IO when no valid paths exist Mike Snitzer
2014-09-18 16:02   ` [PATCH v2 2/2] dm: allow active and inactive tables to share dm_devs Mike Snitzer
2014-09-18 18:48     ` Mike Snitzer
2014-09-18 18:50   ` [PATCH v2 0/2] dm: alternate solution to reloading with failed paths Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.