* dm: bind new table before destroying old
@ 2009-11-11 1:16 Alasdair G Kergon
2009-11-11 1:20 ` Alasdair G Kergon
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Alasdair G Kergon @ 2009-11-11 1:16 UTC (permalink / raw)
To: dm-devel
Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, Mikulas Patocka,
Zdenek Kabelac, Jun'ichi Nomura, Milan Broz
Questions:
Do all the targets correctly flush or push back everything during a
suspend (including workqueues)?
Do all the targets correctly sync to disk all internal state that
needs to be preserved during a suspend?
In other words, in the case of an already-suspended target, the target
'dtr' functions should only be freeing memory and other resources and
not causing I/O to any of the table's devices.
All targets are supposed to be behave this way already, but please
would you check the targets with which you are familiar anyway?
Alasdair
From: Alasdair G Kergon <agk@redhat.com>
When replacing a mapped device's table during a 'resume', delay the
destruction of the old table until the new one is successfully in place.
This will make it easier for a later patch to transfer internal state
information from the old table to the new one (something we do not currently
support) while giving us more options for reversion if a later part
of the operation fails.
Devices are always in the suspended state during dm_swap_table().
This patch reinforces the requirement that all I/O must have been
flushed from the table targets while in this state (including any in
workqueues). In the case of 'noflush' suspending, unprocessed
I/O should have been 'pushed back' to the dm core prior to this point,
for resubmission after the new table is in place.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
---
drivers/md/dm-table.c | 3 +++
drivers/md/dm.c | 13 +++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)
Index: linux-2.6.32-rc6/drivers/md/dm-table.c
===================================================================
--- linux-2.6.32-rc6.orig/drivers/md/dm-table.c
+++ linux-2.6.32-rc6/drivers/md/dm-table.c
@@ -237,6 +237,9 @@ void dm_table_destroy(struct dm_table *t
{
unsigned int i;
+ if (!t)
+ return;
+
while (atomic_read(&t->holders))
msleep(1);
smp_mb();
Index: linux-2.6.32-rc6/drivers/md/dm.c
===================================================================
--- linux-2.6.32-rc6.orig/drivers/md/dm.c
+++ linux-2.6.32-rc6/drivers/md/dm.c
@@ -2070,7 +2070,10 @@ static int __bind(struct mapped_device *
return 0;
}
-static void __unbind(struct mapped_device *md)
+/*
+ * Returns unbound table for the caller to free.
+ */
+struct dm_table *__unbind(struct mapped_device *md)
{
struct dm_table *map = md->map;
unsigned long flags;
@@ -2082,7 +2085,8 @@ static void __unbind(struct mapped_devic
write_lock_irqsave(&md->map_lock, flags);
md->map = NULL;
write_unlock_irqrestore(&md->map_lock, flags);
- dm_table_destroy(map);
+
+ return map;
}
/*
@@ -2175,7 +2179,7 @@ void dm_put(struct mapped_device *md)
}
dm_sysfs_exit(md);
dm_table_put(map);
- __unbind(md);
+ dm_table_destroy(__unbind(md));
free_dev(md);
}
}
@@ -2381,8 +2385,9 @@ int dm_swap_table(struct mapped_device *
goto out;
}
- __unbind(md);
+ map = __unbind(md);
r = __bind(md, table, &limits);
+ dm_table_destroy(map);
out:
mutex_unlock(&md->suspend_lock);
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: dm: bind new table before destroying old 2009-11-11 1:16 dm: bind new table before destroying old Alasdair G Kergon @ 2009-11-11 1:20 ` Alasdair G Kergon 2009-11-11 2:48 ` dm: keep old table until after resume succeeded Alasdair G Kergon 2009-11-11 6:56 ` dm: bind new table before destroying old Kiyoshi Ueda 2009-11-11 13:20 ` Mike Snitzer 2 siblings, 1 reply; 13+ messages in thread From: Alasdair G Kergon @ 2009-11-11 1:20 UTC (permalink / raw) To: dm-devel Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz And now a version that actually compiles:-) From: Alasdair G Kergon <agk@redhat.com> When replacing a mapped device's table during a 'resume', delay the destruction of the old table until the new one is successfully in place. This will make it easier for a later patch to transfer internal state information from the old table to the new one (something we do not currently support) while giving us more options for reversion if a later part of the operation fails. Devices are always in the suspended state during dm_swap_table(). This patch reinforces the requirement that all I/O must have been flushed from the table targets while in this state (including any in workqueues). In the case of 'noflush' suspending, unprocessed I/O should have been 'pushed back' to the dm core prior to this point, for resubmission after the new table is in place. Signed-off-by: Alasdair G Kergon <agk@redhat.com> --- drivers/md/dm-table.c | 3 +++ drivers/md/dm.c | 16 +++++++++++----- 2 files changed, 14 insertions(+), 5 deletions(-) Index: linux-2.6.32-rc6/drivers/md/dm-table.c =================================================================== --- linux-2.6.32-rc6.orig/drivers/md/dm-table.c +++ linux-2.6.32-rc6/drivers/md/dm-table.c @@ -237,6 +237,9 @@ void dm_table_destroy(struct dm_table *t { unsigned int i; + if (!t) + return; + while (atomic_read(&t->holders)) msleep(1); smp_mb(); Index: linux-2.6.32-rc6/drivers/md/dm.c =================================================================== --- linux-2.6.32-rc6.orig/drivers/md/dm.c +++ linux-2.6.32-rc6/drivers/md/dm.c @@ -2070,19 +2070,23 @@ static int __bind(struct mapped_device * return 0; } -static void __unbind(struct mapped_device *md) +/* + * Returns unbound table for the caller to free. + */ +static struct dm_table *__unbind(struct mapped_device *md) { struct dm_table *map = md->map; unsigned long flags; if (!map) - return; + return NULL; dm_table_event_callback(map, NULL, NULL); write_lock_irqsave(&md->map_lock, flags); md->map = NULL; write_unlock_irqrestore(&md->map_lock, flags); - dm_table_destroy(map); + + return map; } /* @@ -2175,7 +2179,7 @@ void dm_put(struct mapped_device *md) } dm_sysfs_exit(md); dm_table_put(map); - __unbind(md); + dm_table_destroy(__unbind(md)); free_dev(md); } } @@ -2361,6 +2365,7 @@ static void dm_rq_barrier_work(struct wo */ int dm_swap_table(struct mapped_device *md, struct dm_table *table) { + struct dm_table *map; struct queue_limits limits; int r = -EINVAL; @@ -2381,8 +2386,9 @@ int dm_swap_table(struct mapped_device * goto out; } - __unbind(md); + map = __unbind(md); r = __bind(md, table, &limits); + dm_table_destroy(map); out: mutex_unlock(&md->suspend_lock); ^ permalink raw reply [flat|nested] 13+ messages in thread
* dm: keep old table until after resume succeeded 2009-11-11 1:20 ` Alasdair G Kergon @ 2009-11-11 2:48 ` Alasdair G Kergon 0 siblings, 0 replies; 13+ messages in thread From: Alasdair G Kergon @ 2009-11-11 2:48 UTC (permalink / raw) To: dm-devel Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz Follow-up patch which moves the table destruction after the resume. __unbind() becomes redundant when followed by __bind(). Alasdair From: Alasdair G Kergon <agk@redhat.com> When swapping a new table into place, retain the old table until its replacement is in place. An old check for an empty table is removed because this is enforced in populate_table(). Signed-off-by: Alasdair G Kergon <agk@redhat.com> --- drivers/md/dm-ioctl.c | 10 ++++++---- drivers/md/dm.c | 34 +++++++++++++++++----------------- include/linux/device-mapper.h | 4 +++- 3 files changed, 26 insertions(+), 22 deletions(-) Index: linux-2.6.32-rc6/drivers/md/dm-ioctl.c =================================================================== --- linux-2.6.32-rc6.orig/drivers/md/dm-ioctl.c +++ linux-2.6.32-rc6/drivers/md/dm-ioctl.c @@ -855,7 +855,7 @@ static int do_resume(struct dm_ioctl *pa unsigned suspend_flags = DM_SUSPEND_LOCKFS_FLAG; struct hash_cell *hc; struct mapped_device *md; - struct dm_table *new_map; + struct dm_table *new_map, *old_map = NULL; down_write(&_hash_lock); @@ -884,11 +884,11 @@ static int do_resume(struct dm_ioctl *pa if (!dm_suspended(md)) dm_suspend(md, suspend_flags); - r = dm_swap_table(md, new_map); - if (r) { + old_map = dm_swap_table(md, new_map); + if (IS_ERR(old_map)) { dm_table_destroy(new_map); dm_put(md); - return r; + return PTR_ERR(old_map); } if (dm_table_get_mode(new_map) & FMODE_WRITE) @@ -900,6 +900,8 @@ static int do_resume(struct dm_ioctl *pa if (dm_suspended(md)) r = dm_resume(md); + if (old_map) + dm_table_destroy(old_map); if (!r) { dm_kobject_uevent(md, KOBJ_CHANGE, param->event_nr); Index: linux-2.6.32-rc6/drivers/md/dm.c =================================================================== --- linux-2.6.32-rc6.orig/drivers/md/dm.c +++ linux-2.6.32-rc6/drivers/md/dm.c @@ -2026,9 +2026,13 @@ static void __set_size(struct mapped_dev mutex_unlock(&md->bdev->bd_inode->i_mutex); } -static int __bind(struct mapped_device *md, struct dm_table *t, - struct queue_limits *limits) +/* + * Returns old map, which caller must destroy. + */ +static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t, + struct queue_limits *limits) { + struct dm_table *old_map; struct request_queue *q = md->queue; sector_t size; unsigned long flags; @@ -2043,11 +2047,6 @@ static int __bind(struct mapped_device * __set_size(md, size); - if (!size) { - dm_table_destroy(t); - return 0; - } - dm_table_event_callback(t, event_callback, md); /* @@ -2063,11 +2062,12 @@ static int __bind(struct mapped_device * __bind_mempools(md, t); write_lock_irqsave(&md->map_lock, flags); + old_map = md->map; md->map = t; dm_table_set_restrictions(t, q, limits); write_unlock_irqrestore(&md->map_lock, flags); - return 0; + return old_map; } /* @@ -2361,13 +2361,13 @@ static void dm_rq_barrier_work(struct wo } /* - * Swap in a new table (destroying old one). + * Swap in a new table, returning the old one for the caller to destroy. */ -int dm_swap_table(struct mapped_device *md, struct dm_table *table) +struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table) { - struct dm_table *map; + struct dm_table *map = ERR_PTR(-EINVAL); struct queue_limits limits; - int r = -EINVAL; + int r; mutex_lock(&md->suspend_lock); @@ -2376,8 +2376,10 @@ int dm_swap_table(struct mapped_device * goto out; r = dm_calculate_queue_limits(table, &limits); - if (r) + if (r) { + map = ERR_PTR(r); goto out; + } /* cannot change the device type, once a table is bound */ if (md->map && @@ -2386,13 +2388,11 @@ int dm_swap_table(struct mapped_device * goto out; } - map = __unbind(md); - r = __bind(md, table, &limits); - dm_table_destroy(map); + map = __bind(md, table, &limits); out: mutex_unlock(&md->suspend_lock); - return r; + return map; } /* Index: linux-2.6.32-rc6/include/linux/device-mapper.h =================================================================== --- linux-2.6.32-rc6.orig/include/linux/device-mapper.h +++ linux-2.6.32-rc6/include/linux/device-mapper.h @@ -295,8 +295,10 @@ void dm_table_event(struct dm_table *t); /* * The device must be suspended before calling this method. + * Returns the previous table, which the caller must destroy. */ -int dm_swap_table(struct mapped_device *md, struct dm_table *t); +struct dm_table *dm_swap_table(struct mapped_device *md, + struct dm_table *t); /* * A wrapper around vmalloc. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 1:16 dm: bind new table before destroying old Alasdair G Kergon 2009-11-11 1:20 ` Alasdair G Kergon @ 2009-11-11 6:56 ` Kiyoshi Ueda 2009-11-11 15:11 ` Alasdair G Kergon 2009-11-11 13:20 ` Mike Snitzer 2 siblings, 1 reply; 13+ messages in thread From: Kiyoshi Ueda @ 2009-11-11 6:56 UTC (permalink / raw) To: Alasdair Kergon Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz Hi Alasdair, On 11/11/2009 10:16 AM +0900, Alasdair G Kergon wrote: > Questions: > > Do all the targets correctly flush or push back everything during a > suspend (including workqueues)? > > Do all the targets correctly sync to disk all internal state that > needs to be preserved during a suspend? > > In other words, in the case of an already-suspended target, the target > 'dtr' functions should only be freeing memory and other resources and > not causing I/O to any of the table's devices. > > All targets are supposed to be behave this way already, but please > would you check the targets with which you are familiar anyway? I have checked multipath and found 2 issues. multipath flushes all normal I/Os before suspend completion. But multipath doesn't flush some workqueues until the table destruction. Also, such works can be added and kicked even after suspend completion through message ioctl. For example, [de]activate_path() and trigger_event() are such works. "reinstate path" message will trigger activate_path() work and activate_path() may send some SCSI commands (through pg_init()) to the underlying devices of the already-suspended target. (Also, "fail_path" message will trigger deactivate_path() work and deactivate_path() may abort the underlying device's queue of the already-suspended target.) So moving the table destruction after the resume (in your another patch) could/might cause some race problems between new_table and old_table if they have a same underlying device. (e.g. pg_init() race, aborting queue after resume.) I believe dm-mpath needs to flush such workqueues in postsuspend. Also, we need something to block message ioctl to suspended device. As for the message ioctl, I don't have any good idea, but... - Reject message ioctl to suspended device in dm-ioctl Or/And - Targets must not kick any work influencing external themselves in their message ioctl handlers. Thanks, Kiyoshi Ueda ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 6:56 ` dm: bind new table before destroying old Kiyoshi Ueda @ 2009-11-11 15:11 ` Alasdair G Kergon 2009-11-11 15:14 ` Milan Broz 2009-11-12 9:59 ` Kiyoshi Ueda 0 siblings, 2 replies; 13+ messages in thread From: Alasdair G Kergon @ 2009-11-11 15:11 UTC (permalink / raw) To: Kiyoshi Ueda Cc: Mike Snitzer, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz On Wed, Nov 11, 2009 at 03:56:05PM +0900, Kiyoshi Ueda wrote: > I believe dm-mpath needs to flush such workqueues in postsuspend. Yes - should be an easy change. > Also, we need something to block message ioctl to suspended device. > As for the message ioctl, I don't have any good idea, but... > - Reject message ioctl to suspended device in dm-ioctl I think the crypt target expects to be able to do this to manipulate the in-core encryption key. > - Targets must not kick any work influencing external themselves > in their message ioctl handlers. So I think we'll need a target-specific approach like that here. Alasdair ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 15:11 ` Alasdair G Kergon @ 2009-11-11 15:14 ` Milan Broz 2009-11-12 7:00 ` Mike Anderson 2009-11-12 9:59 ` Kiyoshi Ueda 1 sibling, 1 reply; 13+ messages in thread From: Milan Broz @ 2009-11-11 15:14 UTC (permalink / raw) To: Kiyoshi Ueda, dm-devel, Mike Snitzer, Mikulas Patocka, Jonathan Brassow, Jun'ichi On 11/11/2009 04:11 PM, Alasdair G Kergon wrote: > On Wed, Nov 11, 2009 at 03:56:05PM +0900, Kiyoshi Ueda wrote: >> I believe dm-mpath needs to flush such workqueues in postsuspend. > > Yes - should be an easy change. similar problem is probably in crypt target, I'll check it later. >> Also, we need something to block message ioctl to suspended device. > >> As for the message ioctl, I don't have any good idea, but... >> - Reject message ioctl to suspended device in dm-ioctl > > I think the crypt target expects to be able to do this to manipulate > the in-core encryption key. yes, crypt target must be able to process messages when in suspended state. Milan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: dm: bind new table before destroying old 2009-11-11 15:14 ` Milan Broz @ 2009-11-12 7:00 ` Mike Anderson 0 siblings, 0 replies; 13+ messages in thread From: Mike Anderson @ 2009-11-12 7:00 UTC (permalink / raw) To: device-mapper development Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura Milan Broz <mbroz@redhat.com> wrote: > On 11/11/2009 04:11 PM, Alasdair G Kergon wrote: > > On Wed, Nov 11, 2009 at 03:56:05PM +0900, Kiyoshi Ueda wrote: > >> I believe dm-mpath needs to flush such workqueues in postsuspend. > > > > Yes - should be an easy change. > > similar problem is probably in crypt target, I'll check it later. > > >> Also, we need something to block message ioctl to suspended device. > > > >> As for the message ioctl, I don't have any good idea, but... > >> - Reject message ioctl to suspended device in dm-ioctl > > > > I think the crypt target expects to be able to do this to manipulate > > the in-core encryption key. > > yes, crypt target must be able to process messages when in suspended state. > I hit a issue like Kiyoshi described of the target_message (multipath_message) generating new work while dev_remove is trying complete in parallel with calling fail_path / reinstate_path. I added two accessor functions to dm-table.c (dm_table_md_suspended and dm_table_md_freeing). I am calling both functions from multipath_message to return early if we are in one of these states. Depending on the usage model of the other targets message functions dm_table_md_freeing (or a new function dm_table_md_deleting) could be called in target_message instead of each targets message function. The test runs using a mutex and the alternate option of using spin_lock are currently passing I will post tomorrow the series for comments. I also tested suspend / resume in parallel with calling fail_path / reinstate_path. -andmike -- Michael Anderson andmike@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 15:11 ` Alasdair G Kergon 2009-11-11 15:14 ` Milan Broz @ 2009-11-12 9:59 ` Kiyoshi Ueda 1 sibling, 0 replies; 13+ messages in thread From: Kiyoshi Ueda @ 2009-11-12 9:59 UTC (permalink / raw) To: Alasdair Kergon Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz Hi Alasdair, On 11/12/2009 12:11 AM +0900, Alasdair G Kergon wrote: > On Wed, Nov 11, 2009 at 03:56:05PM +0900, Kiyoshi Ueda wrote: >> I believe dm-mpath needs to flush such workqueues in postsuspend. > > Yes - should be an easy change. OK, I posted the patch. See: http://patchwork.kernel.org/patch/59556/ >> Also, we need something to block message ioctl to suspended device. > >> As for the message ioctl, I don't have any good idea, but... >> - Reject message ioctl to suspended device in dm-ioctl > > I think the crypt target expects to be able to do this to manipulate > the in-core encryption key. > >> - Targets must not kick any work influencing external themselves >> in their message ioctl handlers. > > So I think we'll need a target-specific approach like that here. OK, Mike Anderson seems to be working on this. Thanks, Kiyoshi Ueda ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 1:16 dm: bind new table before destroying old Alasdair G Kergon 2009-11-11 1:20 ` Alasdair G Kergon 2009-11-11 6:56 ` dm: bind new table before destroying old Kiyoshi Ueda @ 2009-11-11 13:20 ` Mike Snitzer 2009-11-11 23:45 ` Mike Snitzer 2 siblings, 1 reply; 13+ messages in thread From: Mike Snitzer @ 2009-11-11 13:20 UTC (permalink / raw) To: Alasdair G Kergon Cc: Kiyoshi Ueda, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz On Tue, Nov 10 2009 at 8:16pm -0500, Alasdair G Kergon <agk@redhat.com> wrote: > Questions: > > Do all the targets correctly flush or push back everything during a > suspend (including workqueues)? > > Do all the targets correctly sync to disk all internal state that > needs to be preserved during a suspend? > > In other words, in the case of an already-suspended target, the target > 'dtr' functions should only be freeing memory and other resources and > not causing I/O to any of the table's devices. > > All targets are supposed to be behave this way already, but please > would you check the targets with which you are familiar anyway? > > Alasdair > > > From: Alasdair G Kergon <agk@redhat.com> > > When replacing a mapped device's table during a 'resume', delay the > destruction of the old table until the new one is successfully in place. > > This will make it easier for a later patch to transfer internal state > information from the old table to the new one (something we do not currently > support) while giving us more options for reversion if a later part > of the operation fails. I have confirmed that this patch allows handover to work within a single device. The error recovery, will it be done unilaterally in the DM core (on preresume failure)? Or will each target driver need to call a DM core callback if it'd like recovery? Mike ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 13:20 ` Mike Snitzer @ 2009-11-11 23:45 ` Mike Snitzer 2009-11-19 0:50 ` Alasdair G Kergon 0 siblings, 1 reply; 13+ messages in thread From: Mike Snitzer @ 2009-11-11 23:45 UTC (permalink / raw) To: Alasdair G Kergon Cc: Kiyoshi Ueda, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz On Wed, Nov 11 2009 at 8:20am -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, Nov 10 2009 at 8:16pm -0500, > Alasdair G Kergon <agk@redhat.com> wrote: > > > Questions: > > > > Do all the targets correctly flush or push back everything during a > > suspend (including workqueues)? > > > > Do all the targets correctly sync to disk all internal state that > > needs to be preserved during a suspend? > > > > In other words, in the case of an already-suspended target, the target > > 'dtr' functions should only be freeing memory and other resources and > > not causing I/O to any of the table's devices. > > > > All targets are supposed to be behave this way already, but please > > would you check the targets with which you are familiar anyway? > > > > Alasdair > > > > > > From: Alasdair G Kergon <agk@redhat.com> > > > > When replacing a mapped device's table during a 'resume', delay the > > destruction of the old table until the new one is successfully in place. > > > > This will make it easier for a later patch to transfer internal state > > information from the old table to the new one (something we do not currently > > support) while giving us more options for reversion if a later part > > of the operation fails. > > I have confirmed that this patch allows handover to work within a single > device. Alasdair, After further testing I've hit a lockdep trace. My testing was with handing over on the same device. I had the snapshot (of an ext3 FS) mounted and I was doing a sequential direct-io write to a file in the FS. While writing I triggered a handover with the following: echo "0 50331648 snapshot 253:2 253:3 P 8" | dmsetup reload test-testlv_snap dmsetup resume test-testlv_snap With that handover worked fine (with no IO errors), but the following lockdep resulted (some "snapshot_*" tracing was added for context): snapshot_ctr snapshot_ctr: found snap_src snapshot_presuspend ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6-snitm #8 ------------------------------------------------------- dmsetup/1827 is trying to acquire lock: (&md->suspend_lock){+.+...}, at: [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] but task is already holding lock: (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&journal->j_barrier){+.+...}: [<ffffffff810857b3>] __lock_acquire+0xb6b/0xd13 [<ffffffff81086396>] lock_release_non_nested+0x1dc/0x23b [<ffffffff8108656f>] lock_release+0x17a/0x1a5 [<ffffffff8139214b>] __mutex_unlock_slowpath+0xce/0x132 [<ffffffff813921bd>] mutex_unlock+0xe/0x10 [<ffffffff81147329>] freeze_bdev+0x104/0x110 [<ffffffffa0069038>] dm_suspend+0x119/0x2a1 [dm_mod] [<ffffffffa006db3a>] dev_suspend+0x11d/0x1de [dm_mod] [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod] [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod] [<ffffffff8112a959>] vfs_ioctl+0x22/0x87 [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce [<ffffffff8112af5e>] sys_ioctl+0x56/0x79 [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b -> #0 (&md->suspend_lock){+.+...}: [<ffffffff8108565d>] __lock_acquire+0xa15/0xd13 [<ffffffff81085a37>] lock_acquire+0xdc/0x102 [<ffffffff81392372>] __mutex_lock_common+0x4b/0x37b [<ffffffff81392766>] mutex_lock_nested+0x3e/0x43 [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] [<ffffffffa006db45>] dev_suspend+0x128/0x1de [dm_mod] [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod] [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod] [<ffffffff8112a959>] vfs_ioctl+0x22/0x87 [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce [<ffffffff8112af5e>] sys_ioctl+0x56/0x79 [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by dmsetup/1827: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0 stack backtrace: Pid: 1827, comm: dmsetup Not tainted 2.6.32-rc6-snitm #8 Call Trace: [<ffffffff81084825>] print_circular_bug+0xa8/0xb7 [<ffffffff8108565d>] __lock_acquire+0xa15/0xd13 [<ffffffff81085a37>] lock_acquire+0xdc/0x102 [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod] [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod] [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod] [<ffffffff81392372>] __mutex_lock_common+0x4b/0x37b [<ffffffffa00678d8>] ? dm_swap_table+0x2d/0x249 [dm_mod] [<ffffffff81083933>] ? mark_lock+0x2d/0x22d [<ffffffff81083b85>] ? mark_held_locks+0x52/0x70 [<ffffffff8139219d>] ? __mutex_unlock_slowpath+0x120/0x132 [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod] [<ffffffff81392766>] mutex_lock_nested+0x3e/0x43 [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] [<ffffffff813921bd>] ? mutex_unlock+0xe/0x10 [<ffffffffa00691ae>] ? dm_suspend+0x28f/0x2a1 [dm_mod] [<ffffffffa006da1d>] ? dev_suspend+0x0/0x1de [dm_mod] [<ffffffffa006db45>] dev_suspend+0x128/0x1de [dm_mod] [<ffffffffa006e30c>] ctl_ioctl+0x1c6/0x213 [dm_mod] [<ffffffff81077d7f>] ? cpu_clock+0x43/0x5e [<ffffffffa006e36c>] dm_ctl_ioctl+0x13/0x17 [dm_mod] [<ffffffff8112a959>] vfs_ioctl+0x22/0x87 [<ffffffff81083e41>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8112aec2>] do_vfs_ioctl+0x488/0x4ce [<ffffffff811f3e5a>] ? __up_read+0x76/0x7f [<ffffffff81076746>] ? up_read+0x2b/0x2f [<ffffffff8100c635>] ? retint_swapgs+0x13/0x1b [<ffffffff8112af5e>] sys_ioctl+0x56/0x79 [<ffffffff8100bb82>] system_call_fastpath+0x16/0x1b snapshot_preresume snapshot_preresume: snap_src is_handover_source snapshot_preresume: resuming handover-destination snapshot_resume snapshot_resume: handing over exceptions snapshot_dtr ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm: bind new table before destroying old 2009-11-11 23:45 ` Mike Snitzer @ 2009-11-19 0:50 ` Alasdair G Kergon 2009-11-19 2:51 ` malahal 0 siblings, 1 reply; 13+ messages in thread From: Alasdair G Kergon @ 2009-11-19 0:50 UTC (permalink / raw) To: Mike Snitzer Cc: Kiyoshi Ueda, Heinz Mauelshagen, dm-devel, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz On Wed, Nov 11, 2009 at 06:45:55PM -0500, Mike Snitzer wrote: > After further testing I've hit a lockdep trace. My testing was with > handing over on the same device. I had the snapshot (of an ext3 FS) > mounted and I was doing a sequential direct-io write to a file in the > FS. While writing I triggered a handover with the following: > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.32-rc6-snitm #8 > ------------------------------------------------------- > dmsetup/1827 is trying to acquire lock: > (&md->suspend_lock){+.+...}, at: [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] > > but task is already holding lock: > (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0 > > which lock already depends on the new lock. I'm going to assume this is bogus - and I can't spot any annotations available to suppress it, so people will just have to ignore it. Suspend involves: get suspend lock if dev is not already suspended get journal lock set state "dev is suspended" release suspend lock Resume involves [journal lock is held] get suspend lock if dev is suspended release journal lock set state "dev is not suspended" release suspend lock It looks as if lockdep sees that as a problem: Imagine those two sections running in parallel without the "Is dev suspended?" check, of which lockdep has no knowledge. Alasdair ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: dm: bind new table before destroying old 2009-11-19 0:50 ` Alasdair G Kergon @ 2009-11-19 2:51 ` malahal 2009-11-19 12:49 ` Mikulas Patocka 0 siblings, 1 reply; 13+ messages in thread From: malahal @ 2009-11-19 2:51 UTC (permalink / raw) To: dm-devel Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, Mikulas Patocka, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz Alasdair G Kergon [agk@redhat.com] wrote: > > After further testing I've hit a lockdep trace. My testing was with > > handing over on the same device. I had the snapshot (of an ext3 FS) > > mounted and I was doing a sequential direct-io write to a file in the > > FS. While writing I triggered a handover with the following: > > > ======================================================= > > [ INFO: possible circular locking dependency detected ] > > 2.6.32-rc6-snitm #8 > > ------------------------------------------------------- > > dmsetup/1827 is trying to acquire lock: > > (&md->suspend_lock){+.+...}, at: [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] > > > > but task is already holding lock: > > (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0 > > > > which lock already depends on the new lock. > > I'm going to assume this is bogus - and I can't spot any annotations > available to suppress it, so people will just have to ignore it. > > Suspend involves: > get suspend lock > if dev is not already suspended > get journal lock > set state "dev is suspended" > release suspend lock > > Resume involves > [journal lock is held] > get suspend lock > if dev is suspended > release journal lock > set state "dev is not suspended" > release suspend lock > > It looks as if lockdep sees that as a problem: > Imagine those two sections running in parallel without the "Is dev > suspended?" check, of which lockdep has no knowledge. Agreed, but is it possible to restructure the suspend code such that it acquires the journal lock before the suspend lock, and then releases the journal lock if dev is already suspended? This needs some explaining in a comment form though! :-) Thanks, Malahal. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: dm: bind new table before destroying old 2009-11-19 2:51 ` malahal @ 2009-11-19 12:49 ` Mikulas Patocka 0 siblings, 0 replies; 13+ messages in thread From: Mikulas Patocka @ 2009-11-19 12:49 UTC (permalink / raw) To: linux-ext4, Stephen Tweedie Cc: Kiyoshi Ueda, Mike Snitzer, Heinz Mauelshagen, dm-devel, Zdenek Kabelac, Jun'ichi Nomura, Milan Broz On Wed, 18 Nov 2009, malahal@us.ibm.com wrote: > Alasdair G Kergon [agk@redhat.com] wrote: > > > After further testing I've hit a lockdep trace. My testing was with > > > handing over on the same device. I had the snapshot (of an ext3 FS) > > > mounted and I was doing a sequential direct-io write to a file in the > > > FS. While writing I triggered a handover with the following: > > > > > ======================================================= > > > [ INFO: possible circular locking dependency detected ] > > > 2.6.32-rc6-snitm #8 > > > ------------------------------------------------------- > > > dmsetup/1827 is trying to acquire lock: > > > (&md->suspend_lock){+.+...}, at: [<ffffffffa00678d8>] dm_swap_table+0x2d/0x249 [dm_mod] > > > > > > but task is already holding lock: > > > (&journal->j_barrier){+.+...}, at: [<ffffffff8119192d>] journal_lock_updates+0xe1/0xf0 > > > > > > which lock already depends on the new lock. > > > > I'm going to assume this is bogus - and I can't spot any annotations > > available to suppress it, so people will just have to ignore it. > > > > Suspend involves: > > get suspend lock > > if dev is not already suspended > > get journal lock > > set state "dev is suspended" > > release suspend lock > > > > Resume involves > > [journal lock is held] > > get suspend lock > > if dev is suspended > > release journal lock > > set state "dev is not suspended" > > release suspend lock > > > > It looks as if lockdep sees that as a problem: > > Imagine those two sections running in parallel without the "Is dev > > suspended?" check, of which lockdep has no knowledge. > > Agreed, but is it possible to restructure the suspend code such that it > acquires the journal lock before the suspend lock, and then releases the > journal lock if dev is already suspended? This needs some explaining > in a comment form though! :-) > > Thanks, Malahal. The real reason for the warning is that ext3 jbd takes the mutex on suspend (ext3_freeze) and then keeps it taken until resume (ext3_unfreeze). You also get a warning if you issue "dmsetup suspend" manually, it warns that a task exists with mutex held. If suspending and resuming manually, suspend and resume are done from different processes, thus ext3 is violating mutex specification Documentation/mutex-design.txt * - only the owner can unlock the mutex * - task may not exit with mutex held It is really a bug in ext3 and ext4 and device mapper has nothing to do with it, except that it triggers it. The bug should be reported to ext[34] maintainers and fixed there. Mikulas ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-11-19 12:49 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-11-11 1:16 dm: bind new table before destroying old Alasdair G Kergon 2009-11-11 1:20 ` Alasdair G Kergon 2009-11-11 2:48 ` dm: keep old table until after resume succeeded Alasdair G Kergon 2009-11-11 6:56 ` dm: bind new table before destroying old Kiyoshi Ueda 2009-11-11 15:11 ` Alasdair G Kergon 2009-11-11 15:14 ` Milan Broz 2009-11-12 7:00 ` Mike Anderson 2009-11-12 9:59 ` Kiyoshi Ueda 2009-11-11 13:20 ` Mike Snitzer 2009-11-11 23:45 ` Mike Snitzer 2009-11-19 0:50 ` Alasdair G Kergon 2009-11-19 2:51 ` malahal 2009-11-19 12:49 ` Mikulas Patocka
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.