* [PATCH] Device-mapper misc cleanup @ 2012-03-19 15:53 Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw) To: dm-devel; +Cc: Alasdair Kergon This is small patchset for some cleanups I've found during debugging other issues: - Zero out bi_end_io: I always find it more convenient to explictly zero out things we've set previously. No harm done, but possibly easier debugging. - Simplify call to free_devices(): free_devices() already does a list_for_each(), so we don't have to check for list_empty... - Delayed cleanup for dm_table_destroy(): We have krefs in the kernel, which does exactly what we want. And I fail to see why we can't use them here. Plus we're saving an msleep ... Hannes Reinecke (3): dm: zero out bi_end_io on remapping failure dm-table: simplify call to free_devices() dm-table: delayed cleanup for dm_table_destroy() drivers/md/dm-table.c | 29 ++++++++++++++--------------- drivers/md/dm.c | 1 + 2 files changed, 15 insertions(+), 15 deletions(-) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] dm: zero out bi_end_io on remapping failure 2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke @ 2012-03-19 15:53 ` Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw) To: dm-devel; +Cc: Alasdair Kergon To be symmetric we should be setting bi_end_io to NULL when failing to remap. Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/md/dm.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index b89c548..e24143c 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1016,6 +1016,7 @@ static void __map_bio(struct dm_target *ti, struct bio *clone, /* * Store bio_set for cleanup. */ + clone->bi_end_io = NULL; clone->bi_private = md->bs; bio_put(clone); free_tio(md, tio); -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] dm-table: simplify call to free_devices() 2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke @ 2012-03-19 15:53 ` Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw) To: dm-devel; +Cc: Alasdair Kergon free_devices already uses list_for_each(), so we don't need to check if the list is empty. Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/md/dm-table.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 63cc542..a3d1e18 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -268,8 +268,7 @@ void dm_table_destroy(struct dm_table *t) vfree(t->highs); /* free the device list */ - if (t->devices.next != &t->devices) - free_devices(&t->devices); + free_devices(&t->devices); dm_free_md_mempools(t->mempools); -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] dm-table: delayed cleanup for dm_table_destroy() 2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke @ 2012-03-19 15:53 ` Hannes Reinecke 2012-03-19 21:45 ` Mikulas Patocka 0 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2012-03-19 15:53 UTC (permalink / raw) To: dm-devel; +Cc: Alasdair Kergon We should be using a kref instead of the existing holders flag. This enables us to use delayed cleanup and we'll get rid of the msleep in dm_table_destroy(). Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/md/dm-table.c | 26 +++++++++++++------------- 1 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index a3d1e18..97c01f7 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -41,7 +41,7 @@ struct dm_table { struct mapped_device *md; - atomic_t holders; + struct kref kref; unsigned type; /* btree table */ @@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode, INIT_LIST_HEAD(&t->devices); INIT_LIST_HEAD(&t->target_callbacks); - atomic_set(&t->holders, 0); + kref_init(&t->kref); if (!num_targets) num_targets = KEYS_PER_NODE; @@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices) } } -void dm_table_destroy(struct dm_table *t) +void __table_destroy(struct kref *kref) { + struct dm_table *t = container_of(kref, struct dm_table, kref); unsigned int i; - if (!t) - return; - - while (atomic_read(&t->holders)) - msleep(1); - smp_mb(); - /* free the indexes */ if (t->depth >= 2) vfree(t->index[t->depth - 2]); @@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t) void dm_table_get(struct dm_table *t) { - atomic_inc(&t->holders); + if (t) + kref_get(&t->kref); } EXPORT_SYMBOL(dm_table_get); @@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t) if (!t) return; - smp_mb__before_atomic_dec(); - atomic_dec(&t->holders); + kref_put(&t->kref, __table_destroy); } EXPORT_SYMBOL(dm_table_put); +void dm_table_destroy(struct dm_table *t) +{ + smp_mb__before_atomic_dec(); + dm_table_put(t); +} + /* * Checks to see if we need to extend highs or targets. */ -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy() 2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke @ 2012-03-19 21:45 ` Mikulas Patocka 2012-03-20 15:35 ` Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: Mikulas Patocka @ 2012-03-19 21:45 UTC (permalink / raw) To: Hannes Reinecke, device-mapper development; +Cc: Alasdair Kergon Hi Hannes I wouldn't do this. This behavior actually existed in kernels <= 2.6.28. And it caused trouble. The problems are these: Some code may hold a spinlock and call dm_table_put. Currently, dm_table_put just decrements the counter and the cleanup is done synchronously in dm_table_destroy. With your patch, cleanup would be done in dm_table_put --- you call a target destructor (which may sleep) in a non-sleeping context. Some code may hold a mutex and call dm_table_put. If you call a target destructor from dm_table_put and it takes the same mutex, it deadlocks. Currently, when dm_table_destroy exits, it is guaranteed that the table is destroyed and all target destructors have been called. With your patch, dm_table_destroy may exit and the table could be still alive because of some other reference. This may cause leaked references to some other files or devices. For example, suppose that the user has a device mapper device "dm" that redirects requests to "/dev/sda". The user assumes that if he runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be open. Your patch breaks this assumption. Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 --- I was actually removing the old-style reference counting (that is functionally equivalent to what your patch does) because of these three problems. The old code really caused a crash although it happened very rarely. Mikulas On Mon, 19 Mar 2012, Hannes Reinecke wrote: > We should be using a kref instead of the existing holders flag. > This enables us to use delayed cleanup and we'll get rid of the > msleep in dm_table_destroy(). > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/md/dm-table.c | 26 +++++++++++++------------- > 1 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > index a3d1e18..97c01f7 100644 > --- a/drivers/md/dm-table.c > +++ b/drivers/md/dm-table.c > @@ -41,7 +41,7 @@ > > struct dm_table { > struct mapped_device *md; > - atomic_t holders; > + struct kref kref; > unsigned type; > > /* btree table */ > @@ -208,7 +208,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode, > > INIT_LIST_HEAD(&t->devices); > INIT_LIST_HEAD(&t->target_callbacks); > - atomic_set(&t->holders, 0); > + kref_init(&t->kref); > > if (!num_targets) > num_targets = KEYS_PER_NODE; > @@ -240,17 +240,11 @@ static void free_devices(struct list_head *devices) > } > } > > -void dm_table_destroy(struct dm_table *t) > +void __table_destroy(struct kref *kref) > { > + struct dm_table *t = container_of(kref, struct dm_table, kref); > unsigned int i; > > - if (!t) > - return; > - > - while (atomic_read(&t->holders)) > - msleep(1); > - smp_mb(); > - > /* free the indexes */ > if (t->depth >= 2) > vfree(t->index[t->depth - 2]); > @@ -277,7 +271,8 @@ void dm_table_destroy(struct dm_table *t) > > void dm_table_get(struct dm_table *t) > { > - atomic_inc(&t->holders); > + if (t) > + kref_get(&t->kref); > } > EXPORT_SYMBOL(dm_table_get); > > @@ -286,11 +281,16 @@ void dm_table_put(struct dm_table *t) > if (!t) > return; > > - smp_mb__before_atomic_dec(); > - atomic_dec(&t->holders); > + kref_put(&t->kref, __table_destroy); > } > EXPORT_SYMBOL(dm_table_put); > > +void dm_table_destroy(struct dm_table *t) > +{ > + smp_mb__before_atomic_dec(); > + dm_table_put(t); > +} > + > /* > * Checks to see if we need to extend highs or targets. > */ > -- > 1.6.0.2 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy() 2012-03-19 21:45 ` Mikulas Patocka @ 2012-03-20 15:35 ` Hannes Reinecke 2012-03-20 19:43 ` Mike Snitzer ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Hannes Reinecke @ 2012-03-20 15:35 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Alasdair Kergon Hi Mikulas, On 03/19/2012 10:45 PM, Mikulas Patocka wrote: > Hi Hannes > > I wouldn't do this. This behavior actually existed in kernels <= 2.6.28. > And it caused trouble. > > The problems are these: > > Some code may hold a spinlock and call dm_table_put. Currently, > dm_table_put just decrements the counter and the cleanup is done > synchronously in dm_table_destroy. With your patch, cleanup would be done > in dm_table_put --- you call a target destructor (which may sleep) in a > non-sleeping context. > > Some code may hold a mutex and call dm_table_put. If you call a target > destructor from dm_table_put and it takes the same mutex, it deadlocks. > > Currently, when dm_table_destroy exits, it is guaranteed that the table is > destroyed and all target destructors have been called. With your patch, > dm_table_destroy may exit and the table could be still alive because of > some other reference. This may cause leaked references to some other files > or devices. For example, suppose that the user has a device mapper device > "dm" that redirects requests to "/dev/sda". The user assumes that if he > runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be > open. Your patch breaks this assumption. > Hmm. Yes, true. > Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 > --- I was actually removing the old-style reference counting (that is > functionally equivalent to what your patch does) because of these three > problems. The old code really caused a crash although it happened very > rarely. > Ah. That'll explain it. The actual problem I'm trying to track down is that I'm seeing an excessive duration for the 'resume' dm ioctl after a table update. I've got reports where the ioctl can take up to several seconds. Which (as this is multipath) causes an extremely erratic behaviour. And the 'msleep' here is one of the obvious culprits. But I'll continue debugging, maybe I'll find something else. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: dm-table: delayed cleanup for dm_table_destroy() 2012-03-20 15:35 ` Hannes Reinecke @ 2012-03-20 19:43 ` Mike Snitzer 2012-03-21 1:20 ` Mikulas Patocka 2012-03-21 1:24 ` [PATCH] " Mikulas Patocka 2012-03-21 5:34 ` Jun'ichi Nomura 2 siblings, 1 reply; 10+ messages in thread From: Mike Snitzer @ 2012-03-20 19:43 UTC (permalink / raw) To: Hannes Reinecke Cc: device-mapper development, Mikulas Patocka, Alasdair Kergon On Tue, Mar 20 2012 at 11:35am -0400, Hannes Reinecke <hare@suse.de> wrote: > Hi Mikulas, > > On 03/19/2012 10:45 PM, Mikulas Patocka wrote: > > Hi Hannes > > > > I wouldn't do this. This behavior actually existed in kernels <= 2.6.28. > > And it caused trouble. > > > > The problems are these: > > > > Some code may hold a spinlock and call dm_table_put. Currently, > > dm_table_put just decrements the counter and the cleanup is done > > synchronously in dm_table_destroy. With your patch, cleanup would be done > > in dm_table_put --- you call a target destructor (which may sleep) in a > > non-sleeping context. > > > > Some code may hold a mutex and call dm_table_put. If you call a target > > destructor from dm_table_put and it takes the same mutex, it deadlocks. > > > > Currently, when dm_table_destroy exits, it is guaranteed that the table is > > destroyed and all target destructors have been called. With your patch, > > dm_table_destroy may exit and the table could be still alive because of > > some other reference. This may cause leaked references to some other files > > or devices. For example, suppose that the user has a device mapper device > > "dm" that redirects requests to "/dev/sda". The user assumes that if he > > runs "dmsetup remove dm" and this command returns, "/dev/sda" will not be > > open. Your patch breaks this assumption. > > > Hmm. Yes, true. > > > Look at the commit "d58168763f74d1edbc296d7038c60efe6493fdd4" in 2.6.29 > > --- I was actually removing the old-style reference counting (that is > > functionally equivalent to what your patch does) because of these three > > problems. The old code really caused a crash although it happened very > > rarely. > > > Ah. That'll explain it. > The actual problem I'm trying to track down is that I'm seeing an > excessive duration for the 'resume' dm ioctl after a table update. > I've got reports where the ioctl can take up to several seconds. > Which (as this is multipath) causes an extremely erratic behaviour. > > And the 'msleep' here is one of the obvious culprits. > > But I'll continue debugging, maybe I'll find something else. I once wanted to replace all msleep(1); with cpu_relax();: http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html But Alasdair wasn't sure if cpu_relax() would provide the required delay effect: https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: dm-table: delayed cleanup for dm_table_destroy() 2012-03-20 19:43 ` Mike Snitzer @ 2012-03-21 1:20 ` Mikulas Patocka 0 siblings, 0 replies; 10+ messages in thread From: Mikulas Patocka @ 2012-03-21 1:20 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Alasdair Kergon On Tue, 20 Mar 2012, Mike Snitzer wrote: > I once wanted to replace all msleep(1); with cpu_relax();: > http://www.redhat.com/archives/dm-devel/2010-September/msg00100.html > > But Alasdair wasn't sure if cpu_relax() would provide the required > delay effect: > https://www.redhat.com/archives/dm-devel/2011-June/msg00080.html cpu_relax() would basically kill the kernel if compiled without preempt. cpu_relax() makes the current processor sleep for a little moment, but it doesn't schedule a different process. cpu_relax() is useful in spinlocks - so that process waiting on a spinlock is not slowing down the other process on a CPU with hyperthreading. There was another possibility --- replace msleep(1) with yield(). The problem with yield() is that if the process has readltime priority and calls yield(), it doesn't give CPU to a different process with a lower priority. So it would cause deadlock if executed with realtime priority. msleep(1) doesn't have this problem, so it's the best. Mikulas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy() 2012-03-20 15:35 ` Hannes Reinecke 2012-03-20 19:43 ` Mike Snitzer @ 2012-03-21 1:24 ` Mikulas Patocka 2012-03-21 5:34 ` Jun'ichi Nomura 2 siblings, 0 replies; 10+ messages in thread From: Mikulas Patocka @ 2012-03-21 1:24 UTC (permalink / raw) To: Hannes Reinecke; +Cc: device-mapper development, Alasdair Kergon On Tue, 20 Mar 2012, Hannes Reinecke wrote: > Ah. That'll explain it. > The actual problem I'm trying to track down is that I'm seeing an > excessive duration for the 'resume' dm ioctl after a table update. > I've got reports where the ioctl can take up to several seconds. > Which (as this is multipath) causes an extremely erratic behaviour. > > And the 'msleep' here is one of the obvious culprits. > > But I'll continue debugging, maybe I'll find something else. > > Cheers, > > Hannes If it takes several second, it is not obviously problem with msleep(1) --- if we converted msleep(1) to a wait queue, it would improve by at most 1ms, but it couldn't improve by a few seconds. Mikulas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] dm-table: delayed cleanup for dm_table_destroy() 2012-03-20 15:35 ` Hannes Reinecke 2012-03-20 19:43 ` Mike Snitzer 2012-03-21 1:24 ` [PATCH] " Mikulas Patocka @ 2012-03-21 5:34 ` Jun'ichi Nomura 2 siblings, 0 replies; 10+ messages in thread From: Jun'ichi Nomura @ 2012-03-21 5:34 UTC (permalink / raw) To: device-mapper development, Hannes Reinecke Cc: Mikulas Patocka, Alasdair Kergon Hi Hannes, On 03/21/12 00:35, Hannes Reinecke wrote: > The actual problem I'm trying to track down is that I'm seeing an > excessive duration for the 'resume' dm ioctl after a table update. > I've got reports where the ioctl can take up to several seconds. > Which (as this is multipath) causes an extremely erratic behaviour. > > And the 'msleep' here is one of the obvious culprits. > > But I'll continue debugging, maybe I'll find something else. Did you track down which part of the resume ioctl took long? If table is updated and the device is not yet suspended, resume ioctl itself suspends the device. And dm_suspend() could take long depending on lower devices as it waits for already-submitted I/Os to return to dm. Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-03-21 5:34 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-03-19 15:53 [PATCH] Device-mapper misc cleanup Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm: zero out bi_end_io on remapping failure Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm-table: simplify call to free_devices() Hannes Reinecke 2012-03-19 15:53 ` [PATCH] dm-table: delayed cleanup for dm_table_destroy() Hannes Reinecke 2012-03-19 21:45 ` Mikulas Patocka 2012-03-20 15:35 ` Hannes Reinecke 2012-03-20 19:43 ` Mike Snitzer 2012-03-21 1:20 ` Mikulas Patocka 2012-03-21 1:24 ` [PATCH] " Mikulas Patocka 2012-03-21 5:34 ` Jun'ichi Nomura
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.