* [PATCH] Device-mapper misc cleanup
@ 2012-03-19 15:53 Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
To: dm-devel; +Cc: Alasdair Kergon
This is small patchset for some cleanups I've found during
debugging other issues:
- Zero out bi_end_io: I always find it more convenient to
explictly zero out things we've set previously. No harm
done, but possibly easier debugging.
- Simplify call to free_devices(): free_devices() already
does a list_for_each(), so we don't have to check for
list_empty...
- Delayed cleanup for dm_table_destroy(): We have krefs
in the kernel, which does exactly what we want. And I
fail to see why we can't use them here. Plus we're
saving an msleep ...
Hannes Reinecke (3):
dm: zero out bi_end_io on remapping failure
dm-table: simplify call to free_devices()
dm-table: delayed cleanup for dm_table_destroy()
drivers/md/dm-table.c | 29 ++++++++++++++---------------
drivers/md/dm.c | 1 +
2 files changed, 15 insertions(+), 15 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] dm: zero out bi_end_io on remapping failure
2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke
@ 2012-03-19 15:53 ` Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
To: dm-devel; +Cc: Alasdair Kergon
To be symmetric we should be setting bi_end_io to NULL when failing
to remap.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/md/dm.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b89c548..e24143c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1016,6 +1016,7 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
/*
* Store bio_set for cleanup.
*/
+ clone->bi_end_io = NULL;
clone->bi_private = md->bs;
bio_put(clone);
free_tio(md, tio);
--
1.6.0.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] dm-table: simplify call to free_devices()
2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
@ 2012-03-19 15:53 ` Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
To: dm-devel; +Cc: Alasdair Kergon
free_devices already uses list_for_each(), so we don't need to
check if the list is empty.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/md/dm-table.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 63cc542..a3d1e18 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -268,8 +268,7 @@ void dm_table_destroy(struct dm_table *t)
vfree(t->highs);
/* free the device list */
- if (t->devices.next != &t->devices)
- free_devices(&t->devices);
+ free_devices(&t->devices);
dm_free_md_mempools(t->mempools);
--
1.6.0.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] dm-table: delayed cleanup for dm_table_destroy()
2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
@ 2012-03-19 15:53 ` Hannes Reinecke
2012-03-19 21:45 ` Mikulas Patocka
0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw)
To: dm-devel; +Cc: Alasdair Kergon
We should be using a kref instead of the existing holders flag.
This enables us to use delayed cleanup and we'll get rid of the
msleep in dm_table_destroy().
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/md/dm-table.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index a3d1e18..97c01f7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -41,7 +41,7 @@
struct dm_table {
struct mapped_device *md;
- atomic_t holders;
+ struct kref kref;
unsigned type;
/* btree table */
@@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
INIT_LIST_HEAD(&t->devices);
INIT_LIST_HEAD(&t->target_callbacks);
- atomic_set(&t->holders, 0);
+ kref_init(&t->kref);
if (!num_targets)
num_targets = KEYS_PER_NODE;
@@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices)
}
}
-void dm_table_destroy(struct dm_table *t)
+void __table_destroy(struct kref *kref)
{
+ struct dm_table *t = container_of(kref, struct dm_table, kref);
unsigned int i;
- if (!t)
- return;
-
- while (atomic_read(&t->holders))
- msleep(1);
- smp_mb();
-
/* free the indexes */
if (t->depth >= 2)
vfree(t->index[t->depth - 2]);
@@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t)
void dm_table_get(struct dm_table *t)
{
- atomic_inc(&t->holders);
+ if (t)
+ kref_get(&t->kref);
}
EXPORT_SYMBOL(dm_table_get);
@@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t)
if (!t)
return;
- smp_mb__before_atomic_dec();
- atomic_dec(&t->holders);
+ kref_put(&t->kref, __table_destroy);
}
EXPORT_SYMBOL(dm_table_put);
+void dm_table_destroy(struct dm_table *t)
+{
+ smp_mb__before_atomic_dec();
+ dm_table_put(t);
+}
+
/*
* Checks to see if we need to extend highs or targets.
*/
--
1.6.0.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
@ 2012-03-19 21:45 ` Mikulas Patocka
2012-03-20 15:35 ` Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-19 21:45 UTC (permalink / raw)
To: Hannes Reinecke, device-mapper development; +Cc: Alasdair Kergon
Hi Hannes
I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
And it caused trouble.
The problems are these:
Some code may hold a spinlock and call dm_table_put. Currently,
dm_table_put just decrements the counter and the cleanup is done
synchronously in dm_table_destroy. With your patch, cleanup would be done
in dm_table_put --- you call a target destructor (which may sleep) in a
non-sleeping context.
Some code may hold a mutex and call dm_table_put. If you call a target
destructor from dm_table_put and it takes the same mutex, it deadlocks.
Currently, when dm_table_destroy exits, it is guaranteed that the table is
destroyed and all target destructors have been called. With your patch,
dm_table_destroy may exit and the table could be still alive because of
some other reference. This may cause leaked references to some other files
or devices. For example, suppose that the user has a device mapper device
"dm" that redirects requests to "/dev/sda". The user assumes that if he
runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be
open. Your patch breaks this assumption.
Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29
--- I was actually removing the old-style reference counting (that is
functionally equivalent to what your patch does) because of these three
problems. The old code really caused a crash although it happened very
rarely.
Mikulas
On Mon, 19 Mar 2012, Hannes Reinecke wrote:
> We should be using a kref instead of the existing holders flag.
> This enables us to use delayed cleanup and we'll get rid of the
> msleep in dm_table_destroy().
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> drivers/md/dm-table.c | 26 +++++++++++++-------------
> 1 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index a3d1e18..97c01f7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -41,7 +41,7 @@
>
> struct dm_table {
> struct mapped_device *md;
> - atomic_t holders;
> + struct kref kref;
> unsigned type;
>
> /* btree table */
> @@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
>
> INIT_LIST_HEAD(&t->devices);
> INIT_LIST_HEAD(&t->target_callbacks);
> - atomic_set(&t->holders, 0);
> + kref_init(&t->kref);
>
> if (!num_targets)
> num_targets = KEYS_PER_NODE;
> @@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices)
> }
> }
>
> -void dm_table_destroy(struct dm_table *t)
> +void __table_destroy(struct kref *kref)
> {
> + struct dm_table *t = container_of(kref, struct dm_table, kref);
> unsigned int i;
>
> - if (!t)
> - return;
> -
> - while (atomic_read(&t->holders))
> - msleep(1);
> - smp_mb();
> -
> /* free the indexes */
> if (t->depth >= 2)
> vfree(t->index[t->depth - 2]);
> @@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t)
>
> void dm_table_get(struct dm_table *t)
> {
> - atomic_inc(&t->holders);
> + if (t)
> + kref_get(&t->kref);
> }
> EXPORT_SYMBOL(dm_table_get);
>
> @@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t)
> if (!t)
> return;
>
> - smp_mb__before_atomic_dec();
> - atomic_dec(&t->holders);
> + kref_put(&t->kref, __table_destroy);
> }
> EXPORT_SYMBOL(dm_table_put);
>
> +void dm_table_destroy(struct dm_table *t)
> +{
> + smp_mb__before_atomic_dec();
> + dm_table_put(t);
> +}
> +
> /*
> * Checks to see if we need to extend highs or targets.
> */
> --
> 1.6.0.2
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
2012-03-19 21:45 ` Mikulas Patocka
@ 2012-03-20 15:35 ` Hannes Reinecke
2012-03-20 19:43 ` Mike Snitzer
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Hannes Reinecke @ 2012-03-20 15:35 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: device-mapper development, Alasdair Kergon
Hi Mikulas,
On 03/19/2012 10:45 PM, Mikulas Patocka wrote:
> Hi Hannes
>
> I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
> And it caused trouble.
>
> The problems are these:
>
> Some code may hold a spinlock and call dm_table_put. Currently,
> dm_table_put just decrements the counter and the cleanup is done
> synchronously in dm_table_destroy. With your patch, cleanup would be done
> in dm_table_put --- you call a target destructor (which may sleep) in a
> non-sleeping context.
>
> Some code may hold a mutex and call dm_table_put. If you call a target
> destructor from dm_table_put and it takes the same mutex, it deadlocks.
>
> Currently, when dm_table_destroy exits, it is guaranteed that the table is
> destroyed and all target destructors have been called. With your patch,
> dm_table_destroy may exit and the table could be still alive because of
> some other reference. This may cause leaked references to some other files
> or devices. For example, suppose that the user has a device mapper device
> "dm" that redirects requests to "/dev/sda". The user assumes that if he
> runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be
> open. Your patch breaks this assumption.
>
Hmm. Yes, true.
> Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29
> --- I was actually removing the old-style reference counting (that is
> functionally equivalent to what your patch does) because of these three
> problems. The old code really caused a crash although it happened very
> rarely.
>
Ah. That'll explain it.
The actual problem I'm trying to track down is that I'm seeing an
excessive duration for the 'resume' dm ioctl after a table update.
I've got reports where the ioctl can take up to several seconds.
Which (as this is multipath) causes an extremely erratic behaviour.
And the 'msleep' here is one of the obvious culprits.
But I'll continue debugging, maybe I'll find something else.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: dm-table: delayed cleanup for dm_table_destroy()
2012-03-20 15:35 ` Hannes Reinecke
@ 2012-03-20 19:43 ` Mike Snitzer
2012-03-21 1:20 ` Mikulas Patocka
2012-03-21 1:24 ` [PATCH] " Mikulas Patocka
2012-03-21 5:34 ` Jun'ichi Nomura
2 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2012-03-20 19:43 UTC (permalink / raw)
To: Hannes Reinecke
Cc: device-mapper development, Mikulas Patocka, Alasdair Kergon
On Tue, Mar 20 2012 at 11:35am -0400,
Hannes Reinecke <hare@suse.de> wrote:
> Hi Mikulas,
>
> On 03/19/2012 10:45 PM, Mikulas Patocka wrote:
> > Hi Hannes
> >
> > I wouldn't do this. This behavior actually existed in kernels <= 2.6.28.
> > And it caused trouble.
> >
> > The problems are these:
> >
> > Some code may hold a spinlock and call dm_table_put. Currently,
> > dm_table_put just decrements the counter and the cleanup is done
> > synchronously in dm_table_destroy. With your patch, cleanup would be done
> > in dm_table_put --- you call a target destructor (which may sleep) in a
> > non-sleeping context.
> >
> > Some code may hold a mutex and call dm_table_put. If you call a target
> > destructor from dm_table_put and it takes the same mutex, it deadlocks.
> >
> > Currently, when dm_table_destroy exits, it is guaranteed that the table is
> > destroyed and all target destructors have been called. With your patch,
> > dm_table_destroy may exit and the table could be still alive because of
> > some other reference. This may cause leaked references to some other files
> > or devices. For example, suppose that the user has a device mapper device
> > "dm" that redirects requests to "/dev/sda". The user assumes that if he
> > runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be
> > open. Your patch breaks this assumption.
> >
> Hmm. Yes, true.
>
> > Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29
> > --- I was actually removing the old-style reference counting (that is
> > functionally equivalent to what your patch does) because of these three
> > problems. The old code really caused a crash although it happened very
> > rarely.
> >
> Ah. That'll explain it.
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
>
> And the 'msleep' here is one of the obvious culprits.
>
> But I'll continue debugging, maybe I'll find something else.
I once wanted to replace all msleep(1); with cpu_relax();:
http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html
But Alasdair wasn't sure if cpu_relax() would provide the required
delay effect:
https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: dm-table: delayed cleanup for dm_table_destroy()
2012-03-20 19:43 ` Mike Snitzer
@ 2012-03-21 1:20 ` Mikulas Patocka
0 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-21 1:20 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development, Alasdair Kergon
On Tue, 20 Mar 2012, Mike Snitzer wrote:
> I once wanted to replace all msleep(1); with cpu_relax();:
> http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html
>
> But Alasdair wasn't sure if cpu_relax() would provide the required
> delay effect:
> https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html
cpu_relax() would basically kill the kernel if compiled without preempt.
cpu_relax() makes the current processor sleep for a little moment, but it
doesn't schedule a different process.
cpu_relax() is useful in spinlocks - so that process waiting on a spinlock
is not slowing down the other process on a CPU with hyperthreading.
There was another possibility --- replace msleep(1) with yield(). The
problem with yield() is that if the process has readltime priority and
calls yield(), it doesn't give CPU to a different process with a lower
priority. So it would cause deadlock if executed with realtime priority.
msleep(1) doesn't have this problem, so it's the best.
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
2012-03-20 15:35 ` Hannes Reinecke
2012-03-20 19:43 ` Mike Snitzer
@ 2012-03-21 1:24 ` Mikulas Patocka
2012-03-21 5:34 ` Jun'ichi Nomura
2 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-03-21 1:24 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: device-mapper development, Alasdair Kergon
On Tue, 20 Mar 2012, Hannes Reinecke wrote:
> Ah. That'll explain it.
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
>
> And the 'msleep' here is one of the obvious culprits.
>
> But I'll continue debugging, maybe I'll find something else.
>
> Cheers,
>
> Hannes
If it takes several second, it is not obviously problem with msleep(1) ---
if we converted msleep(1) to a wait queue, it would improve by at most
1ms, but it couldn't improve by a few seconds.
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy()
2012-03-20 15:35 ` Hannes Reinecke
2012-03-20 19:43 ` Mike Snitzer
2012-03-21 1:24 ` [PATCH] " Mikulas Patocka
@ 2012-03-21 5:34 ` Jun'ichi Nomura
2 siblings, 0 replies; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-03-21 5:34 UTC (permalink / raw)
To: device-mapper development, Hannes Reinecke
Cc: Mikulas Patocka, Alasdair Kergon
Hi Hannes,
On 03/21/12 00:35, Hannes Reinecke wrote:
> The actual problem I'm trying to track down is that I'm seeing an
> excessive duration for the 'resume' dm ioctl after a table update.
> I've got reports where the ioctl can take up to several seconds.
> Which (as this is multipath) causes an extremely erratic behaviour.
>
> And the 'msleep' here is one of the obvious culprits.
>
> But I'll continue debugging, maybe I'll find something else.
Did you track down which part of the resume ioctl took long?
If table is updated and the device is not yet suspended,
resume ioctl itself suspends the device.
And dm_suspend() could take long depending on lower devices
as it waits for already-submitted I/Os to return to dm.
Thanks,
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-03-21 5:34 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke
2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke
2012-03-19 21:45 ` Mikulas Patocka
2012-03-20 15:35 ` Hannes Reinecke
2012-03-20 19:43 ` Mike Snitzer
2012-03-21 1:20 ` Mikulas Patocka
2012-03-21 1:24 ` [PATCH] " Mikulas Patocka
2012-03-21 5:34 ` Jun'ichi Nomura
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.