All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Device-mapper misc cleanup
@ 2012-03-19 15:53 Hannes Reinecke
  2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
  To: dm-devel; +Cc: Alasdair Kergon

This is small patchset for some cleanups I've found during
debugging other issues:

- Zero out bi_end_io: I always find it more convenient to
  explictly zero out things we've set previously. No harm
  done, but possibly easier debugging.
- Simplify call to free_devices(): free_devices() already
  does a list_for_each(), so we don't have to check for
  list_empty...
- Delayed cleanup for dm_table_destroy(): We have krefs
  in the kernel, which does exactly what we want. And I
  fail to see why we can't use them here. Plus we're
  saving an msleep ...

Hannes Reinecke (3):
  dm: zero out bi_end_io on remapping failure
  dm-table: simplify call to free_devices()
  dm-table: delayed cleanup for dm_table_destroy()

 drivers/md/dm-table.c |   29 ++++++++++++++---------------
 drivers/md/dm.c       |    1 +
 2 files changed, 15 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] dm: zero out bi_end_io on remapping failure
  2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke
@ 2012-03-19 15:53 ` Hannes Reinecke
  2012-03-19 15:53   ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
  To: dm-devel; +Cc: Alasdair Kergon

To be symmetric we should be setting bi_end_io to NULL when failing
to remap.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b89c548..e24143c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1016,6 +1016,7 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
 		/*
 		 * Store bio_set for cleanup.
 		 */
+		clone->bi_end_io = NULL;
 		clone->bi_private = md->bs;
 		bio_put(clone);
 		free_tio(md, tio);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] dm-table: simplify call to free_devices()
  2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
@ 2012-03-19 15:53   ` Hannes Reinecke
  2012-03-19 15:53     ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
  To: dm-devel; +Cc: Alasdair Kergon

free_devices already uses list_for_each(), so we don't need to
check if the list is empty.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-table.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 63cc542..a3d1e18 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -268,8 +268,7 @@ void dm_table_destroy(struct dm_table *t)
 	vfree(t->highs);
 
 	/* free the device list */
-	if (t->devices.next != &t->devices)
-		free_devices(&t->devices);
+	free_devices(&t->devices);
 
 	dm_free_md_mempools(t->mempools);
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] dm-table: delayed cleanup for dm_table_destroy()
  2012-03-19 15:53   ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
@ 2012-03-19 15:53     ` Hannes Reinecke
  2012-03-19 21:45       ` Mikulas Patocka
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
  To: dm-devel; +Cc: Alasdair Kergon

We should be using a kref instead of the existing holders flag.
This enables us to use delayed cleanup and we'll get rid of the
msleep in dm_table_destroy().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-table.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index a3d1e18..97c01f7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -41,7 +41,7 @@
 
 struct dm_table {
 	struct mapped_device *md;
-	atomic_t holders;
+	struct kref kref;
 	unsigned type;
 
 	/* btree table */
@@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
 
 	INIT_LIST_HEAD(&t->devices);
 	INIT_LIST_HEAD(&t->target_callbacks);
-	atomic_set(&t->holders, 0);
+	kref_init(&t->kref);
 
 	if (!num_targets)
 		num_targets = KEYS_PER_NODE;
@@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices)
 	}
 }
 
-void dm_table_destroy(struct dm_table *t)
+void __table_destroy(struct kref *kref)
 {
+	struct dm_table *t = container_of(kref, struct dm_table, kref);
 	unsigned int i;
 
-	if (!t)
-		return;
-
-	while (atomic_read(&t->holders))
-		msleep(1);
-	smp_mb();
-
 	/* free the indexes */
 	if (t->depth >= 2)
 		vfree(t->index[t->depth - 2]);
@@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t)
 
 void dm_table_get(struct dm_table *t)
 {
-	atomic_inc(&t->holders);
+	if (t)
+		kref_get(&t->kref);
 }
 EXPORT_SYMBOL(dm_table_get);
 
@@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t)
 	if (!t)
 		return;
 
-	smp_mb__before_atomic_dec();
-	atomic_dec(&t->holders);
+	kref_put(&t->kref, __table_destroy);
 }
 EXPORT_SYMBOL(dm_table_put);
 
+void dm_table_destroy(struct dm_table *t)
+{
+	smp_mb__before_atomic_dec();
+	dm_table_put(t);
+}
+
 /*
  * Checks to see if we need to extend highs or targets.
  */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
  2012-03-19 15:53     ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
@ 2012-03-19 21:45       ` Mikulas Patocka
  2012-03-20 15:35         ` Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-19 21:45 UTC (permalink / raw)
  To: Hannes Reinecke, device-mapper development; +Cc: Alasdair Kergon

Hi Hannes

I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
And it caused trouble.

The problems are these:

Some code may hold a spinlock and call dm_table_put. Currently, 
dm_table_put just decrements the counter and the cleanup is done 
synchronously in dm_table_destroy. With your patch, cleanup would be done 
in dm_table_put --- you call a target destructor (which may sleep) in a 
non-sleeping context.

Some code may hold a mutex and call dm_table_put. If you call a target 
destructor from dm_table_put and it takes the same mutex, it deadlocks.

Currently, when dm_table_destroy exits, it is guaranteed that the table is 
destroyed and all target destructors have been called. With your patch, 
dm_table_destroy may exit and the table could be still alive because of 
some other reference. This may cause leaked references to some other files 
or devices. For example, suppose that the user has a device mapper device 
"dm" that redirects requests to "/dev/sda". The user assumes that if he 
runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be 
open. Your patch breaks this assumption.

Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 
--- I was actually removing the old-style reference counting (that is 
functionally equivalent to what your patch does) because of these three 
problems. The old code really caused a crash although it happened very 
rarely.

Mikulas


On Mon, 19 Mar 2012, Hannes Reinecke wrote:

> We should be using a kref instead of the existing holders flag.
> This enables us to use delayed cleanup and we'll get rid of the
> msleep in dm_table_destroy().
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-table.c |   26 +++++++++++++-------------
>  1 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index a3d1e18..97c01f7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -41,7 +41,7 @@
>  
>  struct dm_table {
>  	struct mapped_device *md;
> -	atomic_t holders;
> +	struct kref kref;
>  	unsigned type;
>  
>  	/* btree table */
> @@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
>  
>  	INIT_LIST_HEAD(&t->devices);
>  	INIT_LIST_HEAD(&t->target_callbacks);
> -	atomic_set(&t->holders, 0);
> +	kref_init(&t->kref);
>  
>  	if (!num_targets)
>  		num_targets = KEYS_PER_NODE;
> @@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices)
>  	}
>  }
>  
> -void dm_table_destroy(struct dm_table *t)
> +void __table_destroy(struct kref *kref)
>  {
> +	struct dm_table *t = container_of(kref, struct dm_table, kref);
>  	unsigned int i;
>  
> -	if (!t)
> -		return;
> -
> -	while (atomic_read(&t->holders))
> -		msleep(1);
> -	smp_mb();
> -
>  	/* free the indexes */
>  	if (t->depth >= 2)
>  		vfree(t->index[t->depth - 2]);
> @@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t)
>  
>  void dm_table_get(struct dm_table *t)
>  {
> -	atomic_inc(&t->holders);
> +	if (t)
> +		kref_get(&t->kref);
>  }
>  EXPORT_SYMBOL(dm_table_get);
>  
> @@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t)
>  	if (!t)
>  		return;
>  
> -	smp_mb__before_atomic_dec();
> -	atomic_dec(&t->holders);
> +	kref_put(&t->kref, __table_destroy);
>  }
>  EXPORT_SYMBOL(dm_table_put);
>  
> +void dm_table_destroy(struct dm_table *t)
> +{
> +	smp_mb__before_atomic_dec();
> +	dm_table_put(t);
> +}
> +
>  /*
>   * Checks to see if we need to extend highs or targets.
>   */
> -- 
> 1.6.0.2
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
  2012-03-19 21:45       ` Mikulas Patocka
@ 2012-03-20 15:35         ` Hannes Reinecke
  2012-03-20 19:43           ` Mike Snitzer
                             ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-20 15:35 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, Alasdair Kergon

Hi Mikulas,

On 03/19/2012 10:45 PM, Mikulas Patocka wrote:
> Hi Hannes
> 
> I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
> And it caused trouble.
> 
> The problems are these:
> 
> Some code may hold a spinlock and call dm_table_put. Currently, 
> dm_table_put just decrements the counter and the cleanup is done 
> synchronously in dm_table_destroy. With your patch, cleanup would be done 
> in dm_table_put --- you call a target destructor (which may sleep) in a 
> non-sleeping context.
> 
> Some code may hold a mutex and call dm_table_put. If you call a target 
> destructor from dm_table_put and it takes the same mutex, it deadlocks.
> 
> Currently, when dm_table_destroy exits, it is guaranteed that the table is 
> destroyed and all target destructors have been called. With your patch, 
> dm_table_destroy may exit and the table could be still alive because of 
> some other reference. This may cause leaked references to some other files 
> or devices. For example, suppose that the user has a device mapper device 
> "dm" that redirects requests to "/dev/sda". The user assumes that if he 
> runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be 
> open. Your patch breaks this assumption.
> 
Hmm. Yes, true.

> Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 
> --- I was actually removing the old-style reference counting (that is 
> functionally equivalent to what your patch does) because of these three 
> problems. The old code really caused a crash although it happened very 
> rarely.
> 
Ah. That'll explain it.
The actual problem I'm trying to track down is that I'm seeing an
excessive duration for the 'resume' dm ioctl after a table update.
I've got reports where the ioctl can take up to several seconds.
Which (as this is multipath) causes an extremely erratic behaviour.

And the 'msleep' here is one of the obvious culprits.

But I'll continue debugging, maybe I'll find something else.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-table: delayed cleanup for dm_table_destroy()
  2012-03-20 15:35         ` Hannes Reinecke
@ 2012-03-20 19:43           ` Mike Snitzer
  2012-03-21  1:20             ` Mikulas Patocka
  2012-03-21  1:24           ` [PATCH] " Mikulas Patocka
  2012-03-21  5:34           ` Jun'ichi Nomura
  2 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2012-03-20 19:43 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: device-mapper development, Mikulas Patocka, Alasdair Kergon

On Tue, Mar 20 2012 at 11:35am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> Hi Mikulas,
> 
> On 03/19/2012 10:45 PM, Mikulas Patocka wrote:
> > Hi Hannes
> > 
> > I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
> > And it caused trouble.
> > 
> > The problems are these:
> > 
> > Some code may hold a spinlock and call dm_table_put. Currently, 
> > dm_table_put just decrements the counter and the cleanup is done 
> > synchronously in dm_table_destroy. With your patch, cleanup would be done 
> > in dm_table_put --- you call a target destructor (which may sleep) in a 
> > non-sleeping context.
> > 
> > Some code may hold a mutex and call dm_table_put. If you call a target 
> > destructor from dm_table_put and it takes the same mutex, it deadlocks.
> > 
> > Currently, when dm_table_destroy exits, it is guaranteed that the table is 
> > destroyed and all target destructors have been called. With your patch, 
> > dm_table_destroy may exit and the table could be still alive because of 
> > some other reference. This may cause leaked references to some other files 
> > or devices. For example, suppose that the user has a device mapper device 
> > "dm" that redirects requests to "/dev/sda". The user assumes that if he 
> > runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be 
> > open. Your patch breaks this assumption.
> > 
> Hmm. Yes, true.
> 
> > Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 
> > --- I was actually removing the old-style reference counting (that is 
> > functionally equivalent to what your patch does) because of these three 
> > problems. The old code really caused a crash although it happened very 
> > rarely.
> > 
> Ah. That'll explain it.
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
> 
> And the 'msleep' here is one of the obvious culprits.
> 
> But I'll continue debugging, maybe I'll find something else.

I once wanted to replace all msleep(1); with cpu_relax();:
http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html

But Alasdair wasn't sure if cpu_relax() would provide the required
delay effect:
https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-table: delayed cleanup for dm_table_destroy()
  2012-03-20 19:43           ` Mike Snitzer
@ 2012-03-21  1:20             ` Mikulas Patocka
  0 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-21  1:20 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, Alasdair Kergon



On Tue, 20 Mar 2012, Mike Snitzer wrote:

> I once wanted to replace all msleep(1); with cpu_relax();:
> http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html
> 
> But Alasdair wasn't sure if cpu_relax() would provide the required
> delay effect:
> https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html

cpu_relax() would basically kill the kernel if compiled without preempt.

cpu_relax() makes the current processor sleep for a little moment, but it 
doesn't schedule a different process.

cpu_relax() is useful in spinlocks - so that process waiting on a spinlock 
is not slowing down the other process on a CPU with hyperthreading.


There was another possibility --- replace msleep(1) with yield(). The 
problem with yield() is that if the process has readltime priority and 
calls yield(), it doesn't give CPU to a different process with a lower 
priority. So it would cause deadlock if executed with realtime priority. 
msleep(1) doesn't have this problem, so it's the best.

Mikulas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
  2012-03-20 15:35         ` Hannes Reinecke
  2012-03-20 19:43           ` Mike Snitzer
@ 2012-03-21  1:24           ` Mikulas Patocka
  2012-03-21  5:34           ` Jun'ichi Nomura
  2 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-21  1:24 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: device-mapper development, Alasdair Kergon



On Tue, 20 Mar 2012, Hannes Reinecke wrote:

> Ah. That'll explain it.
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
> 
> And the 'msleep' here is one of the obvious culprits.
> 
> But I'll continue debugging, maybe I'll find something else.
> 
> Cheers,
> 
> Hannes

If it takes several second, it is not obviously problem with msleep(1) --- 
if we converted msleep(1) to a wait queue, it would improve by at most 
1ms, but it couldn't improve by a few seconds.

Mikulas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
  2012-03-20 15:35         ` Hannes Reinecke
  2012-03-20 19:43           ` Mike Snitzer
  2012-03-21  1:24           ` [PATCH] " Mikulas Patocka
@ 2012-03-21  5:34           ` Jun'ichi Nomura
  2 siblings, 0 replies; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-03-21  5:34 UTC (permalink / raw)
  To: device-mapper development, Hannes Reinecke
  Cc: Mikulas Patocka, Alasdair Kergon

Hi Hannes,

On 03/21/12 00:35, Hannes Reinecke wrote:
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
> 
> And the 'msleep' here is one of the obvious culprits.
> 
> But I'll continue debugging, maybe I'll find something else.

Did you track down which part of the resume ioctl took long?

If table is updated and the device is not yet suspended,
resume ioctl itself suspends the device.
And dm_suspend() could take long depending on lower devices
as it waits for already-submitted I/Os to return to dm.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-03-21  5:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
2012-03-19 15:53   ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
2012-03-19 15:53     ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
2012-03-19 21:45       ` Mikulas Patocka
2012-03-20 15:35         ` Hannes Reinecke
2012-03-20 19:43           ` Mike Snitzer
2012-03-21  1:20             ` Mikulas Patocka
2012-03-21  1:24           ` [PATCH] " Mikulas Patocka
2012-03-21  5:34           ` Jun'ichi Nomura

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.