[PATCH] dm thin: commit pool's metadata on last close of thin device

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] dm thin: commit pool's metadata on last close of thin device
@ 2012-05-16 22:19 Mike Snitzer
  2012-05-17  0:43 ` Mikulas Patocka
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2012-05-16 22:19 UTC (permalink / raw)
  To: dm-devel

Reinstate dm_flush_all and dm_table_flush_all.  dm_blk_close will
now trigger the .flush method of all targets within a table on the last
close of a DM device.

In the case of the thin target, the thin_flush method will commit the
backing pool's metadata.

Doing so avoids a deadlock that has been observed with the following
sequence (as can be triggered via "dmsetup remove_all"):
- IO is issued to a thin device, thin device is closed
- pool's metadata device is suspended before the pool is
- because the pool still has outstanding IO we deadlock because the
  pool's metadata device is suspended

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/md/dm-table.c |    9 +++++++++
 drivers/md/dm-thin.c  |   19 +++++++++++++++++++
 drivers/md/dm.c       |   20 +++++++++++++++++++-
 drivers/md/dm.h       |    1 +
 4 files changed, 48 insertions(+), 1 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 2e227fb..077fff8 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1487,6 +1487,15 @@ int dm_table_resume_targets(struct dm_table *t)
 	return 0;
 }
 
+void dm_table_flush_all(struct dm_table *t)
+{
+	unsigned i;
+
+	for (i = 0; i < t->num_targets; i++)
+		if (t->targets[i].type->flush)
+			t->targets[i].type->flush(&t->targets[i]);
+}
+
 void dm_table_add_target_callbacks(struct dm_table *t, struct dm_target_callbacks *cb)
 {
 	list_add(&cb->list, &t->target_callbacks);
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index c514078..f64c7e6 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2429,6 +2429,24 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	set_discard_limits(pool, limits);
 }
 
+static void thin_flush(struct dm_target *ti)
+{
+	int r;
+	struct thin_c *tc = ti->private;
+	struct pool *pool = tc->pool;
+
+	/*
+	 * A bit heavy-handed but the only existing way to batch
+	 * metadata commits is to issue() a FLUSH bio -- but DM
+	 * doesn't allocate bios outside the DM core.
+	 */
+	r = dm_pool_commit_metadata(pool->pmd);
+	if (r < 0) {
+		DMERR("%s: dm_pool_commit_metadata() failed, error = %d",
+		      __func__, r);
+	}
+}
+
 static struct target_type thin_target = {
 	.name = "thin",
 	.version = {1, 1, 0},
@@ -2441,6 +2459,7 @@ static struct target_type thin_target = {
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
 	.io_hints = thin_io_hints,
+	.flush = thin_flush,
 };
 
 /*----------------------------------------------------------------*/
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 23a1a84..715ee57 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -487,6 +487,16 @@ out:
 	return md ? 0 : -ENXIO;
 }
 
+static void dm_flush_all(struct mapped_device *md)
+{
+	struct dm_table *t = dm_get_live_table(md);
+	
+	if (t) {
+		dm_table_flush_all(t);
+		dm_table_put(t);
+	}
+}
+
 static int dm_blk_close(struct gendisk *disk, fmode_t mode)
 {
 	struct mapped_device *md = disk->private_data;
@@ -494,10 +504,17 @@ static int dm_blk_close(struct gendisk *disk, fmode_t mode)
 	spin_lock(&_minor_lock);
 
 	atomic_dec(&md->open_count);
-	dm_put(md);
 
 	spin_unlock(&_minor_lock);
 
+	/*
+	 * Flush all targets on last close
+	 */
+	if (!dm_open_count(md))
+		dm_flush_all(md);
+
+	dm_put(md);	
+
 	return 0;
 }
 
@@ -2468,6 +2485,7 @@ void dm_destroy_immediate(struct mapped_device *md)
 void dm_put(struct mapped_device *md)
 {
 	atomic_dec(&md->holders);
+	smp_mb__after_atomic_dec();
 }
 EXPORT_SYMBOL_GPL(dm_put);
 
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index b7dacd5..82199a1 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -66,6 +66,7 @@ bool dm_table_supports_discards(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);
 void dm_table_free_md_mempools(struct dm_table *t);
 struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
+void dm_table_flush_all(struct dm_table *t);
 
 int dm_queue_merge_is_compulsory(struct request_queue *q);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] dm thin: commit pool's metadata on last close of thin device
  2012-05-16 22:19 [PATCH] dm thin: commit pool's metadata on last close of thin device Mike Snitzer
@ 2012-05-17  0:43 ` Mikulas Patocka
  2012-05-17  4:00   ` Mike Snitzer
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2012-05-17  0:43 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer
  Cc: Edward Thornber, Alasdair G. Kergon, Zdenek Kabelac

On Wed, 16 May 2012, Mike Snitzer wrote:

> Reinstate dm_flush_all and dm_table_flush_all.  dm_blk_close will
> now trigger the .flush method of all targets within a table on the last
> close of a DM device.
> 
> In the case of the thin target, the thin_flush method will commit the
> backing pool's metadata.
> 
> Doing so avoids a deadlock that has been observed with the following
> sequence (as can be triggered via "dmsetup remove_all"):
> - IO is issued to a thin device, thin device is closed
> - pool's metadata device is suspended before the pool is
> - because the pool still has outstanding IO we deadlock because the
>   pool's metadata device is suspended
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Cc: stable@vger.kernel.org

I'd say --- don't do this sequence.

Device mapper generally expects that devices are suspended top-down --- 
i.e. you should first suspend the thin device and then suspend its 
underlying data and metadata device. If you violate this sequence and 
suspend bottom-up, you get deadlocks.

For example, if dm-mirror is resynchronizing and you suspend the 
underlying leg or log volume and then suspend dm-mirror, you get a 
deadlock.

If dm-snapshot is merging and you suspend the underlying snapshot or 
origin volume and then suspend snapshot-merget target, you get a deadlock.

These are not bugs in dm-mirror or dm-snapshot, this is expected behavior. 
Userspace shouldn't do any bottom-up suspend sequence.

In the same sense, if you suspend the underlying data or metadata pool and 
then suspend dm-thin, you get a deadlock too. Fix userspace so that it 
doesn't do it.

Mikulas

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dm thin: commit pool's metadata on last close of thin device
  2012-05-17  0:43 ` Mikulas Patocka
@ 2012-05-17  4:00   ` Mike Snitzer
  2012-05-17  7:44     ` Zdenek Kabelac
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2012-05-17  4:00 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Edward Thornber, Alasdair G. Kergon,
	Zdenek Kabelac

On Wed, May 16 2012 at  8:43pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Wed, 16 May 2012, Mike Snitzer wrote:
> 
> > Reinstate dm_flush_all and dm_table_flush_all.  dm_blk_close will
> > now trigger the .flush method of all targets within a table on the last
> > close of a DM device.
> > 
> > In the case of the thin target, the thin_flush method will commit the
> > backing pool's metadata.
> > 
> > Doing so avoids a deadlock that has been observed with the following
> > sequence (as can be triggered via "dmsetup remove_all"):
> > - IO is issued to a thin device, thin device is closed
> > - pool's metadata device is suspended before the pool is
> > - because the pool still has outstanding IO we deadlock because the
> >   pool's metadata device is suspended
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > Cc: stable@vger.kernel.org
> 
> I'd say --- don't do this sequence.
> 
> Device mapper generally expects that devices are suspended top-down --- 
> i.e. you should first suspend the thin device and then suspend its 
> underlying data and metadata device. If you violate this sequence and 
> suspend bottom-up, you get deadlocks.
> 
> For example, if dm-mirror is resynchronizing and you suspend the 
> underlying leg or log volume and then suspend dm-mirror, you get a 
> deadlock.
> 
> If dm-snapshot is merging and you suspend the underlying snapshot or 
> origin volume and then suspend snapshot-merget target, you get a deadlock.
> 
> These are not bugs in dm-mirror or dm-snapshot, this is expected behavior. 
> Userspace shouldn't do any bottom-up suspend sequence.
> 
> In the same sense, if you suspend the underlying data or metadata pool and 
> then suspend dm-thin, you get a deadlock too. Fix userspace so that it 
> doesn't do it.

Yeah, I agree.  I told Zdenek it'd be best to just not do it.

'dmsetup remove_all' is a dumb command that invites these deadlocks when
more sophisticated stacking is being used.

But all said, in general I don't have a problem with triggering a target
specific "flush" on device close.

(Though the implementation of thin_flush could be made more
intelligent... as is we can see a pretty good storm of redundant
metadata commits if 100s of thin devices are closed simultaneously --
creating pmd->root_lock write lock contention).

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dm thin: commit pool's metadata on last close of thin device
  2012-05-17  4:00   ` Mike Snitzer
@ 2012-05-17  7:44     ` Zdenek Kabelac
  0 siblings, 0 replies; 4+ messages in thread
From: Zdenek Kabelac @ 2012-05-17  7:44 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Edward Thornber, device-mapper development, Mikulas Patocka,
	Alasdair G. Kergon

Dne 17.5.2012 06:00, Mike Snitzer napsal(a):
> On Wed, May 16 2012 at  8:43pm -0400,
> Mikulas Patocka<mpatocka@redhat.com>  wrote:
>
>>
>>
>> On Wed, 16 May 2012, Mike Snitzer wrote:
>>
>>> Reinstate dm_flush_all and dm_table_flush_all.  dm_blk_close will
>>> now trigger the .flush method of all targets within a table on the last
>>> close of a DM device.
>>>
>>> In the case of the thin target, the thin_flush method will commit the
>>> backing pool's metadata.
>>>
>>> Doing so avoids a deadlock that has been observed with the following
>>> sequence (as can be triggered via "dmsetup remove_all"):
>>> - IO is issued to a thin device, thin device is closed
>>> - pool's metadata device is suspended before the pool is
>>> - because the pool still has outstanding IO we deadlock because the
>>>    pool's metadata device is suspended
>>>
>>> Signed-off-by: Mike Snitzer<snitzer@redhat.com>
>>> Cc: stable@vger.kernel.org
>>
>> I'd say --- don't do this sequence.
>>
>> Device mapper generally expects that devices are suspended top-down ---
>> i.e. you should first suspend the thin device and then suspend its
>> underlying data and metadata device. If you violate this sequence and
>> suspend bottom-up, you get deadlocks.
>>
>> For example, if dm-mirror is resynchronizing and you suspend the
>> underlying leg or log volume and then suspend dm-mirror, you get a
>> deadlock.
>>
>> If dm-snapshot is merging and you suspend the underlying snapshot or
>> origin volume and then suspend snapshot-merget target, you get a deadlock.
>>
>> These are not bugs in dm-mirror or dm-snapshot, this is expected behavior.
>> Userspace shouldn't do any bottom-up suspend sequence.
>>
>> In the same sense, if you suspend the underlying data or metadata pool and
>> then suspend dm-thin, you get a deadlock too. Fix userspace so that it
>> doesn't do it.
>
> Yeah, I agree.  I told Zdenek it'd be best to just not do it.
>
> 'dmsetup remove_all' is a dumb command that invites these deadlocks when
> more sophisticated stacking is being used.
>
> But all said, in general I don't have a problem with triggering a target
> specific "flush" on device close.
>
> (Though the implementation of thin_flush could be made more
> intelligent... as is we can see a pretty good storm of redundant
> metadata commits if 100s of thin devices are closed simultaneously --
> creating pmd->root_lock write lock contention).


I'd say it differently - the question here is whether we really want to 
support forced removal of devices or we keep them stuck in the system.

The system must expect that device become unavailable anytime (hw fault),
and if the device is not mounted and unused, it should not be a problem to
remove such device (even suspended).

However current code basically removes thinpool target (has open count 0),
but keeps data and metadata devices  (the problem is worse, if the metadata 
device is replaced with error target in this moment).

Also not - there is not a problem in userspace code as such (except the case 
of discardless devices - where the change is simple quite unexpected new 
behavior of device table, thus older tools cannot work properly with it).
The error is technically in the 'teardown' code for the test case - which
used to assume that something with open count 0 could be easily removed - and 
unremovable targets might be replaced with error targets (so underlying devs 
could be detached as well) - however with current thinp target, we are in
some troubles and I'd like to see them behaving like all other dm targets.

Zdenek

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-05-17  7:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-16 22:19 [PATCH] dm thin: commit pool's metadata on last close of thin device Mike Snitzer
2012-05-17  0:43 ` Mikulas Patocka
2012-05-17  4:00   ` Mike Snitzer
2012-05-17  7:44     ` Zdenek Kabelac

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.