* [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
@ 2006-11-07 18:34 Alasdair G Kergon
2006-11-07 20:18 ` [dm-devel] " Mike Snitzer
2006-11-07 20:28 ` Andrew Morton
0 siblings, 2 replies; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-07 18:34 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, dm-devel, Ingo Molnar, Eric Sandeen, Srinivasa DS
From: Srinivasa Ds <srinivasa@in.ibm.com>
On debugging I found that "dmsetup suspend <device-name>" calls
freeze_bdev(), which locks bd_mount_mutex to make sure that no new mounts
happen on the bdev until thaw_bdev() is called.  thaw_bdev() is called
when we resume the device through "dmsetup resume <device-name>".
Hence we have two processes, one of which locks bd_mount_mutex (dmsetup
suspend) and another (dmsetup resume) that unlocks it.
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: dm-devel@redhat.com
Index: linux-2.6.19-rc4/fs/block_dev.c
===================================================================
--- linux-2.6.19-rc4.orig/fs/block_dev.c 2006-11-07 17:06:20.000000000 +0000
+++ linux-2.6.19-rc4/fs/block_dev.c 2006-11-07 17:26:04.000000000 +0000
@@ -263,7 +263,7 @@ static void init_once(void * foo, kmem_c
{
memset(bdev, 0, sizeof(*bdev));
mutex_init(&bdev->bd_mutex);
- mutex_init(&bdev->bd_mount_mutex);
+ sema_init(&bdev->bd_mount_sem, 1);
INIT_LIST_HEAD(&bdev->bd_inodes);
INIT_LIST_HEAD(&bdev->bd_list);
#ifdef CONFIG_SYSFS
Index: linux-2.6.19-rc4/fs/buffer.c
===================================================================
--- linux-2.6.19-rc4.orig/fs/buffer.c 2006-11-07 17:06:20.000000000 +0000
+++ linux-2.6.19-rc4/fs/buffer.c 2006-11-07 17:26:04.000000000 +0000
@@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
{
struct super_block *sb;
- mutex_lock(&bdev->bd_mount_mutex);
+ if (down_trylock(&bdev->bd_mount_sem))
+ return -EBUSY;
+
sb = get_super(bdev);
if (sb && !(sb->s_flags & MS_RDONLY)) {
sb->s_frozen = SB_FREEZE_WRITE;
@@ -230,7 +232,7 @@ void thaw_bdev(struct block_device *bdev
drop_super(sb);
}
- mutex_unlock(&bdev->bd_mount_mutex);
+ up(&bdev->bd_mount_sem);
}
EXPORT_SYMBOL(thaw_bdev);
Index: linux-2.6.19-rc4/fs/super.c
===================================================================
--- linux-2.6.19-rc4.orig/fs/super.c 2006-11-07 17:06:20.000000000 +0000
+++ linux-2.6.19-rc4/fs/super.c 2006-11-07 17:26:04.000000000 +0000
@@ -735,9 +735,9 @@ int get_sb_bdev(struct file_system_type
* will protect the lockfs code from trying to start a snapshot
* while we are mounting
*/
- mutex_lock(&bdev->bd_mount_mutex);
+ down(&bdev->bd_mount_sem);
s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
- mutex_unlock(&bdev->bd_mount_mutex);
+ up(&bdev->bd_mount_sem);
if (IS_ERR(s))
goto error_s;
Index: linux-2.6.19-rc4/include/linux/fs.h
===================================================================
--- linux-2.6.19-rc4.orig/include/linux/fs.h 2006-11-07 17:06:20.000000000 +0000
+++ linux-2.6.19-rc4/include/linux/fs.h 2006-11-07 17:26:04.000000000 +0000
@@ -456,7 +456,7 @@ struct block_device {
struct inode * bd_inode; /* will die */
int bd_openers;
struct mutex bd_mutex; /* open/close mutex */
- struct mutex bd_mount_mutex; /* mount mutex */
+ struct semaphore bd_mount_sem;
struct list_head bd_inodes;
void * bd_holder;
int bd_holders;
^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [dm-devel] [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 18:34 [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex Alasdair G Kergon
@ 2006-11-07 20:18 ` Mike Snitzer
  2006-11-07 20:22   ` Eric Sandeen
  2006-11-07 23:34   ` Alasdair G Kergon
  2006-11-07 20:28 ` Andrew Morton
  1 sibling, 2 replies; 69+ messages in thread
From: Mike Snitzer @ 2006-11-07 20:18 UTC (permalink / raw)
To: device-mapper development, Andrew Morton, linux-kernel, Ingo Molnar,
	Eric Sandeen, Srinivasa DS

On 11/7/06, Alasdair G Kergon <agk@redhat.com> wrote:
> From: Srinivasa Ds <srinivasa@in.ibm.com>
>
> On debugging I found out that,"dmsetup suspend <device name>" calls
> "freeze_bdev()",which locks "bd_mount_mutex" to make sure that no new mounts
> happen on bdev until thaw_bdev() is called. This "thaw_bdev()" is getting
> called when we resume the device through "dmsetup resume <device-name>".
> Hence we have 2 processes,one of which locks "bd_mount_mutex"(dmsetup
> suspend) and another(dmsetup resume) unlocks it.

Srinivasa's description of the patch just speaks to how freeze_bdev
and thaw_bdev are used by DM but completely skips justification for
switching from mutex to semaphore.  Why is it beneficial and/or
necessary to use a semaphore instead of a mutex here?

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [dm-devel] [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 20:18 ` [dm-devel] " Mike Snitzer
@ 2006-11-07 20:22   ` Eric Sandeen
  2006-11-07 23:34   ` Alasdair G Kergon
  1 sibling, 0 replies; 69+ messages in thread
From: Eric Sandeen @ 2006-11-07 20:22 UTC (permalink / raw)
To: Mike Snitzer
Cc: device-mapper development, Andrew Morton, linux-kernel, Ingo Molnar,
	Srinivasa DS

Mike Snitzer wrote:
> On 11/7/06, Alasdair G Kergon <agk@redhat.com> wrote:
>> From: Srinivasa Ds <srinivasa@in.ibm.com>
>>
>> On debugging I found out that,"dmsetup suspend <device name>" calls
>> "freeze_bdev()",which locks "bd_mount_mutex" to make sure that no new mounts
>> happen on bdev until thaw_bdev() is called. This "thaw_bdev()" is getting
>> called when we resume the device through "dmsetup resume <device-name>".
>> Hence we have 2 processes,one of which locks "bd_mount_mutex"(dmsetup
>> suspend) and another(dmsetup resume) unlocks it.
>
> Srinivasa's description of the patch just speaks to how freeze_bdev
> and thaw_bdev are used by DM but completely skips justification for
> switching from mutex to semaphore.  Why is it beneficial and/or
> necessary to use a semaphore instead of a mutex here?

Because mutexes are not supposed to be released by anything other than
the thread that took them, as enforced by the various checking code and
noted in the docs...

"The stricter mutex API means you cannot use mutexes the same way you
can use semaphores: e.g. they cannot be used from an interrupt context,
nor can they be unlocked from a different context that which acquired
it."

this particular resource can sometimes be locked & unlocked from 2
different userspace threads.

-Eric

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [dm-devel] [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 20:18 ` [dm-devel] " Mike Snitzer
  2006-11-07 20:22   ` Eric Sandeen
@ 2006-11-07 23:34   ` Alasdair G Kergon
  1 sibling, 0 replies; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-07 23:34 UTC (permalink / raw)
To: device-mapper development
Cc: Andrew Morton, linux-kernel, Ingo Molnar, Eric Sandeen, Srinivasa DS

On Tue, Nov 07, 2006 at 03:18:08PM -0500, Mike Snitzer wrote:
> Srinivasa's description of the patch just speaks to how freeze_bdev
> and thaw_bdev are used by DM but completely skips justification for
> switching from mutex to semaphore.  Why is it beneficial and/or
> necessary to use a semaphore instead of a mutex here?

This used to be a semaphore; someone changed it to a mutex; we started
getting device-mapper bug reports from people because the usage breaks
the stated semantics for a mutex; this patch puts things back while we
consider if there's a better way of doing things.

Alasdair
--
agk@redhat.com

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 18:34 [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex Alasdair G Kergon
  2006-11-07 20:18 ` [dm-devel] " Mike Snitzer
@ 2006-11-07 20:28 ` Andrew Morton
  2006-11-07 22:45   ` Eric Sandeen
                      ` (2 more replies)
  1 sibling, 3 replies; 69+ messages in thread
From: Andrew Morton @ 2006-11-07 20:28 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: linux-kernel, dm-devel, Ingo Molnar, Eric Sandeen, Srinivasa DS

On Tue, 7 Nov 2006 18:34:59 +0000
Alasdair G Kergon <agk@redhat.com> wrote:

> From: Srinivasa Ds <srinivasa@in.ibm.com>
>
> On debugging I found out that,"dmsetup suspend <device name>" calls
> "freeze_bdev()",which locks "bd_mount_mutex" to make sure that no new mounts
> happen on bdev until thaw_bdev() is called. This "thaw_bdev()" is getting
> called when we resume the device through "dmsetup resume <device-name>".
> Hence we have 2 processes,one of which locks "bd_mount_mutex"(dmsetup
> suspend) and another(dmsetup resume) unlocks it.

So...  what does this have to do with switching from mutex to semaphore?

Perhaps this works around the debugging code which gets offended if a
mutex is unlocked by a process which didn't do the lock?

If so, it's a bit sad to switch to semaphore just because of some errant
debugging code.  Perhaps it would be better to create a new
mutex_unlock_stfu() which suppresses the warning?

> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
>  {
>  	struct super_block *sb;
>
> -	mutex_lock(&bdev->bd_mount_mutex);
> +	if (down_trylock(&bdev->bd_mount_sem))
> +		return -EBUSY;
> +

This is a functional change which isn't described in the changelog.  What's
happening here?

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 20:28 ` Andrew Morton
@ 2006-11-07 22:45   ` Eric Sandeen
  2006-11-07 23:00     ` Andrew Morton
  2006-11-07 23:05     ` Rafael J. Wysocki
  1 sibling, 2 replies; 69+ messages in thread
From: Eric Sandeen @ 2006-11-07 22:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

Andrew Morton wrote:

>> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
>> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
>> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
>>  {
>>  	struct super_block *sb;
>>
>> -	mutex_lock(&bdev->bd_mount_mutex);
>> +	if (down_trylock(&bdev->bd_mount_sem))
>> +		return -EBUSY;
>> +
>
> This is a functional change which isn't described in the changelog.  What's
> happening here?

Only allow one bdev-freezer in at a time, rather than queueing them up?

-Eric

^ permalink raw reply	[flat|nested] 69+ messages in thread
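The behavioural difference being discussed can be reduced to a toy model. This is a sketch of the patch's semantics only, not kernel code: bd_mount_sem initialized to 1 is modeled as a plain counter, down_trylock() as a non-blocking decrement, and up() as an increment that any task may perform. The struct and function names are invented for illustration.

```c
#include <errno.h>	/* for EBUSY */

/* Toy model of the patched freeze_bdev()/thaw_bdev() locking:
 * sema_init(&bd_mount_sem, 1) becomes a simple counter. */
struct toy_bdev {
	int mount_sem;		/* 1 = free, 0 = held (frozen) */
};

int toy_freeze_bdev(struct toy_bdev *b)
{
	/* down_trylock(): if the semaphore is already held, a second
	 * freezer bounces with -EBUSY instead of sleeping in a queue
	 * (the old mutex_lock() would have blocked here). */
	if (b->mount_sem == 0)
		return -EBUSY;
	b->mount_sem--;
	return 0;
}

void toy_thaw_bdev(struct toy_bdev *b)
{
	/* up(): legal from a different task than the one that froze,
	 * which is the dmsetup suspend/resume pattern. */
	b->mount_sem++;
}
```

The model makes Eric's reading concrete: only one freezer can be "in" at a time, and everyone else learns immediately rather than queueing up behind the lock.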
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 22:45 ` Eric Sandeen
@ 2006-11-07 23:00   ` Andrew Morton
  2006-11-08  9:54     ` Arjan van de Ven
                        ` (2 more replies)
  1 sibling, 3 replies; 69+ messages in thread
From: Andrew Morton @ 2006-11-07 23:00 UTC (permalink / raw)
To: Eric Sandeen
Cc: Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Tue, 07 Nov 2006 16:45:07 -0600
Eric Sandeen <sandeen@redhat.com> wrote:

> Andrew Morton wrote:
>
> >> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
> >> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
> >> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
> >>  {
> >>  	struct super_block *sb;
> >>
> >> -	mutex_lock(&bdev->bd_mount_mutex);
> >> +	if (down_trylock(&bdev->bd_mount_sem))
> >> +		return -EBUSY;
> >> +
> >
> > This is a functional change which isn't described in the changelog.  What's
> > happening here?
>
> Only allow one bdev-freezer in at a time, rather than queueing them up?

You're asking me? ;)

Guys, I'm going to park this patch pending a full description of what it
does, a description of what the above hunk is doing and pending an
examination of whether we'd be better off changing the mutex-debugging code
rather than switching to semaphores.

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:00 ` Andrew Morton
@ 2006-11-08  9:54   ` Arjan van de Ven
  0 siblings, 0 replies; 69+ messages in thread
From: Arjan van de Ven @ 2006-11-08 9:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Sandeen, Alasdair G Kergon, linux-kernel, dm-devel,
	Ingo Molnar, Srinivasa DS

>
> You're asking me? ;)
>
> Guys, I'm going to park this patch pending a full description of what it
> does, a description of what the above hunk is doing and pending an
> examination of whether we'd be better off changing the mutex-debugging code
> rather than switching to semaphores.

It's not used as a mutex.  Sad but true.  It's not so easy to say "just
shut up the debug code", because it's just not that easy: the interface
allows double "unlock", which is fine for semaphores for example.  There
fundamentally is no "owner" for this case, and all the mutex concepts
assume that there is an owner.  If the owner goes away, pointers to their
task struct for example are no longer valid (used by lockdep and the
other debugging parts).

It's what makes the difference between a mutex and a semaphore: a mutex
has an owner and several semantics follow from that.  These semantics
allow a more efficient implementation (no multiple "owners" possible),
but once you go away from that fundamental property, soon we'll see "oh
and it needs <this extra code> to cover the wider semantics correctly"...
and soon you have a semaphore again.

Let true semaphores be semaphores, and make all real mutexes mutexes.
But let's not make actual semaphores use mutex code...

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via
http://www.linuxfirmwarekit.org

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:00 ` Andrew Morton
  2006-11-08  9:54   ` Arjan van de Ven
@ 2007-01-12  6:23   ` Srinivasa Ds
  2007-01-12 10:16   ` Srinivasa Ds
  2 siblings, 0 replies; 69+ messages in thread
From: Srinivasa Ds @ 2007-01-12 6:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Sandeen, Alasdair G Kergon, linux-kernel, dm-devel,
	Ingo Molnar, arjan

[-- Attachment #1: Type: text/plain, Size: 2951 bytes --]

Andrew Morton wrote:
> On Tue, 07 Nov 2006 16:45:07 -0600
> Eric Sandeen <sandeen@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>
>>>> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
>>>> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
>>>> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
>>>>  {
>>>>  	struct super_block *sb;
>>>>
>>>> -	mutex_lock(&bdev->bd_mount_mutex);
>>>> +	if (down_trylock(&bdev->bd_mount_sem))
>>>> +		return -EBUSY;
>>>> +
>
> Guys, I'm going to park this patch pending a full description of what it
> does, a description of what the above hunk is doing

I am really sorry for the delayed reply.  Here is my full explanation of
my patch.

The patch that I have proposed replaces the mutex on bd_mount_mutex with
a semaphore (bd_mount_sem).  bd_mount_sem used to be a semaphore earlier,
and then Ingo's patch changed it to a mutex; hence we had the problem in
device-mapper commands.  But now, the proposed patch allows two separate
device-mapper commands to suspend and resume the logical devices.  Even
though this explanation is from the device-mapper perspective, since the
semaphore (bd_mount_sem) existed earlier, the proposed patch doesn't hurt
the XFS code which accesses freeze_bdev().

As far as the above code is concerned, I am using down_trylock() to allow
only one process to freeze the filesystem at a time, and hence it stops
the other processes from queuing up.

> and pending an
> examination of whether we'd be better off changing the mutex-debugging code
> rather than switching to semaphores.

Regarding this one, I agree with what Arjan has said:

"It's not used as a mutex.  Sad but true.  It's not so easy to say "just
shut up the debug code", because it's just not that easy: the interface
allows double "unlock", which is fine for semaphores for example.  There
fundamentally is no "owner" for this case, and all the mutex concepts
assume that there is an owner.  If the owner goes away, pointers to their
task struct for example are no longer valid (used by lockdep and the
other debugging parts).

It's what makes the difference between a mutex and a semaphore: a mutex
has an owner and several semantics follow from that.  These semantics
allow a more efficient implementation (no multiple "owners" possible),
but once you go away from that fundamental property, soon we'll see "oh
and it needs <this extra code> to cover the wider semantics correctly"...
and soon you have a semaphore again.

Let true semaphores be semaphores, and make all real mutexes mutexes.
But let's not make actual semaphores use mutex code..."

So I am reproposing my patch (taken against the latest kernel) here.
Please let me know your comments on this.

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: dm-devel@redhat.com

[-- Attachment #2: dm.fix --]
[-- Type: text/plain, Size: 2377 bytes --]

 fs/block_dev.c     |    2 +-
 fs/buffer.c        |    6 ++++--
 fs/super.c         |    4 ++--
 include/linux/fs.h |    2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6.20-rc4/fs/block_dev.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/block_dev.c
+++ linux-2.6.20-rc4/fs/block_dev.c
@@ -411,7 +411,7 @@ static void init_once(void * foo, struct
 {
 	memset(bdev, 0, sizeof(*bdev));
 	mutex_init(&bdev->bd_mutex);
-	mutex_init(&bdev->bd_mount_mutex);
+	sema_init(&bdev->bd_mount_sem, 1);
 	INIT_LIST_HEAD(&bdev->bd_inodes);
 	INIT_LIST_HEAD(&bdev->bd_list);
 #ifdef CONFIG_SYSFS
Index: linux-2.6.20-rc4/fs/buffer.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/buffer.c
+++ linux-2.6.20-rc4/fs/buffer.c
@@ -189,7 +189,9 @@ struct super_block *freeze_bdev(struct b
 {
 	struct super_block *sb;

-	mutex_lock(&bdev->bd_mount_mutex);
+	if (down_trylock(&bdev->bd_mount_sem))
+		return -EBUSY;
+
 	sb = get_super(bdev);
 	if (sb && !(sb->s_flags & MS_RDONLY)) {
 		sb->s_frozen = SB_FREEZE_WRITE;
@@ -231,7 +233,7 @@ void thaw_bdev(struct block_device *bdev
 		drop_super(sb);
 	}

-	mutex_unlock(&bdev->bd_mount_mutex);
+	up(&bdev->bd_mount_sem);
 }
 EXPORT_SYMBOL(thaw_bdev);
Index: linux-2.6.20-rc4/include/linux/fs.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/fs.h
+++ linux-2.6.20-rc4/include/linux/fs.h
@@ -458,7 +458,7 @@ struct block_device {
 	struct inode *	bd_inode;	/* will die */
 	int		bd_openers;
 	struct mutex	bd_mutex;	/* open/close mutex */
-	struct mutex	bd_mount_mutex;	/* mount mutex */
+	struct semaphore	bd_mount_sem;
 	struct list_head	bd_inodes;
 	void *		bd_holder;
 	int		bd_holders;
Index: linux-2.6.20-rc4/fs/super.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/super.c
+++ linux-2.6.20-rc4/fs/super.c
@@ -753,9 +753,9 @@ int get_sb_bdev(struct file_system_type
 	 * will protect the lockfs code from trying to start a snapshot
 	 * while we are mounting
 	 */
-	mutex_lock(&bdev->bd_mount_mutex);
+	down(&bdev->bd_mount_sem);
 	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
-	mutex_unlock(&bdev->bd_mount_mutex);
+	up(&bdev->bd_mount_sem);
 	if (IS_ERR(s))
 		goto error_s;

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:00 ` Andrew Morton
  2006-11-08  9:54   ` Arjan van de Ven
  2007-01-12  6:23   ` Srinivasa Ds
@ 2007-01-12 10:16   ` Srinivasa Ds
  2 siblings, 0 replies; 69+ messages in thread
From: Srinivasa Ds @ 2007-01-12 10:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Eric Sandeen, Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]

Andrew Morton wrote:
> On Tue, 07 Nov 2006 16:45:07 -0600
> Eric Sandeen <sandeen@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>
>>>> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
>>>> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
>>>> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
>>>>  {
>>>>  	struct super_block *sb;
>>>>
>>>> -	mutex_lock(&bdev->bd_mount_mutex);
>>>> +	if (down_trylock(&bdev->bd_mount_sem))
>>>> +		return -EBUSY;
>>>> +
>>>
>>> This is a functional change which isn't described in the changelog.  What's
>>> happening here?
>>
>> Only allow one bdev-freezer in at a time, rather than queueing them up?
>
> You're asking me? ;)
>
> Guys, I'm going to park this patch pending a full description of what it
> does, a description of what the above hunk is doing

I am really sorry for the delayed reply.  Here is my full explanation of
my patch.

The patch that I have proposed replaces the mutex on bd_mount_mutex with
a semaphore (bd_mount_sem).  bd_mount_sem used to be a semaphore earlier,
and then Ingo's patch changed it to a mutex; hence we had the problem in
device-mapper commands.  But now, the proposed patch allows two separate
device-mapper commands to suspend and resume the logical devices.  Even
though this explanation is from the device-mapper perspective, since the
semaphore (bd_mount_sem) existed earlier, the proposed patch doesn't hurt
the XFS code which accesses freeze_bdev().

As far as the above code is concerned, I am using down_trylock() to allow
only one process to freeze the filesystem at a time, and hence it stops
the other processes from queuing up.

> and pending an
> examination of whether we'd be better off changing the mutex-debugging code
> rather than switching to semaphores.

Regarding this one, I agree with what Arjan has said:

"It's not used as a mutex.  Sad but true.  It's not so easy to say "just
shut up the debug code", because it's just not that easy: the interface
allows double "unlock", which is fine for semaphores for example.  There
fundamentally is no "owner" for this case, and all the mutex concepts
assume that there is an owner.  If the owner goes away, pointers to their
task struct for example are no longer valid (used by lockdep and the
other debugging parts).

It's what makes the difference between a mutex and a semaphore: a mutex
has an owner and several semantics follow from that.  These semantics
allow a more efficient implementation (no multiple "owners" possible),
but once you go away from that fundamental property, soon we'll see "oh
and it needs <this extra code> to cover the wider semantics correctly"...
and soon you have a semaphore again.

Let true semaphores be semaphores, and make all real mutexes mutexes.
But let's not make actual semaphores use mutex code..."

So I am reproposing my patch (taken against the latest kernel) here.
Please let me know your comments on this.

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: dm-devel@redhat.com

[-- Attachment #2: dm.fix --]
[-- Type: text/plain, Size: 2377 bytes --]

 fs/block_dev.c     |    2 +-
 fs/buffer.c        |    6 ++++--
 fs/super.c         |    4 ++--
 include/linux/fs.h |    2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6.20-rc4/fs/block_dev.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/block_dev.c
+++ linux-2.6.20-rc4/fs/block_dev.c
@@ -411,7 +411,7 @@ static void init_once(void * foo, struct
 {
 	memset(bdev, 0, sizeof(*bdev));
 	mutex_init(&bdev->bd_mutex);
-	mutex_init(&bdev->bd_mount_mutex);
+	sema_init(&bdev->bd_mount_sem, 1);
 	INIT_LIST_HEAD(&bdev->bd_inodes);
 	INIT_LIST_HEAD(&bdev->bd_list);
 #ifdef CONFIG_SYSFS
Index: linux-2.6.20-rc4/fs/buffer.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/buffer.c
+++ linux-2.6.20-rc4/fs/buffer.c
@@ -189,7 +189,9 @@ struct super_block *freeze_bdev(struct b
 {
 	struct super_block *sb;

-	mutex_lock(&bdev->bd_mount_mutex);
+	if (down_trylock(&bdev->bd_mount_sem))
+		return -EBUSY;
+
 	sb = get_super(bdev);
 	if (sb && !(sb->s_flags & MS_RDONLY)) {
 		sb->s_frozen = SB_FREEZE_WRITE;
@@ -231,7 +233,7 @@ void thaw_bdev(struct block_device *bdev
 		drop_super(sb);
 	}

-	mutex_unlock(&bdev->bd_mount_mutex);
+	up(&bdev->bd_mount_sem);
 }
 EXPORT_SYMBOL(thaw_bdev);
Index: linux-2.6.20-rc4/include/linux/fs.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/fs.h
+++ linux-2.6.20-rc4/include/linux/fs.h
@@ -458,7 +458,7 @@ struct block_device {
 	struct inode *	bd_inode;	/* will die */
 	int		bd_openers;
 	struct mutex	bd_mutex;	/* open/close mutex */
-	struct mutex	bd_mount_mutex;	/* mount mutex */
+	struct semaphore	bd_mount_sem;
 	struct list_head	bd_inodes;
 	void *		bd_holder;
 	int		bd_holders;
Index: linux-2.6.20-rc4/fs/super.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/super.c
+++ linux-2.6.20-rc4/fs/super.c
@@ -753,9 +753,9 @@ int get_sb_bdev(struct file_system_type
 	 * will protect the lockfs code from trying to start a snapshot
 	 * while we are mounting
 	 */
-	mutex_lock(&bdev->bd_mount_mutex);
+	down(&bdev->bd_mount_sem);
 	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
-	mutex_unlock(&bdev->bd_mount_mutex);
+	up(&bdev->bd_mount_sem);
 	if (IS_ERR(s))
 		goto error_s;

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 22:45 ` Eric Sandeen
  2006-11-07 23:00   ` Andrew Morton
@ 2006-11-07 23:05   ` Rafael J. Wysocki
  2006-11-07 23:18     ` Eric Sandeen
  2006-11-07 23:49     ` Alasdair G Kergon
  1 sibling, 2 replies; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-07 23:05 UTC (permalink / raw)
To: Eric Sandeen
Cc: Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel,
	Ingo Molnar, Srinivasa DS

On Tuesday, 7 November 2006 23:45, Eric Sandeen wrote:
> Andrew Morton wrote:
>
> >> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
> >> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
> >> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
> >>  {
> >>  	struct super_block *sb;
> >>
> >> -	mutex_lock(&bdev->bd_mount_mutex);
> >> +	if (down_trylock(&bdev->bd_mount_sem))
> >> +		return -EBUSY;
> >> +
> >
> > This is a functional change which isn't described in the changelog.  What's
> > happening here?
>
> Only allow one bdev-freezer in at a time, rather than queueing them up?

But freeze_bdev() is supposed to return the result of get_super(bdev)
_unconditionally_.  Moreover, in its current form freeze_bdev() _cannot_
_fail_, so I don't see how this change doesn't break any existing code.

For example freeze_filesystems() (recently added to -mm) will be broken
if the down_trylock() is unsuccessful.

Greetings,
Rafael

--
You never change things by fighting the existing reality.
		R. Buckminster Fuller

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:05 ` Rafael J. Wysocki
@ 2006-11-07 23:18   ` Eric Sandeen
  2006-11-07 23:42     ` Rafael J. Wysocki
  1 sibling, 1 reply; 69+ messages in thread
From: Eric Sandeen @ 2006-11-07 23:18 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel,
	Ingo Molnar, Srinivasa DS

Rafael J. Wysocki wrote:
> On Tuesday, 7 November 2006 23:45, Eric Sandeen wrote:
>> Andrew Morton wrote:
>>
>>>> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
>>>> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
>>>> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
>>>>  {
>>>>  	struct super_block *sb;
>>>>
>>>> -	mutex_lock(&bdev->bd_mount_mutex);
>>>> +	if (down_trylock(&bdev->bd_mount_sem))
>>>> +		return -EBUSY;
>>>> +
>>> This is a functional change which isn't described in the changelog.  What's
>>> happening here?
>> Only allow one bdev-freezer in at a time, rather than queueing them up?
>
> But freeze_bdev() is supposed to return the result of get_super(bdev)
> _unconditionally_.  Moreover, in its current form freeze_bdev() _cannot_
> _fail_, so I don't see how this change doesn't break any existing code.

Well, it could return NULL.  Is that a failure?

But, nobody is checking for an outright error, certainly.  Especially
when the error hasn't been ERR_PTR'd.  :)  So I agree, that's not so good.

But, how is a stampede of fs-freezers -supposed- to work?  I could
imagine something like a freezer count, and the filesystem is only
unfrozen after everyone has thawed?  Or should only one freezer be
active at a time... which is what we have now I guess.

-Eric

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:18 ` Eric Sandeen
@ 2006-11-07 23:42   ` Rafael J. Wysocki
  2006-11-08  0:01     ` Alasdair G Kergon
  0 siblings, 1 reply; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-07 23:42 UTC (permalink / raw)
To: Eric Sandeen
Cc: Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel,
	Ingo Molnar, Srinivasa DS

On Wednesday, 8 November 2006 00:18, Eric Sandeen wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 7 November 2006 23:45, Eric Sandeen wrote:
> >> Andrew Morton wrote:
> >>
> >>>> --- linux-2.6.19-rc4.orig/fs/buffer.c	2006-11-07 17:06:20.000000000 +0000
> >>>> +++ linux-2.6.19-rc4/fs/buffer.c	2006-11-07 17:26:04.000000000 +0000
> >>>> @@ -188,7 +188,9 @@ struct super_block *freeze_bdev(struct b
> >>>>  {
> >>>>  	struct super_block *sb;
> >>>>
> >>>> -	mutex_lock(&bdev->bd_mount_mutex);
> >>>> +	if (down_trylock(&bdev->bd_mount_sem))
> >>>> +		return -EBUSY;
> >>>> +
> >>> This is a functional change which isn't described in the changelog.  What's
> >>> happening here?
> >> Only allow one bdev-freezer in at a time, rather than queueing them up?
> >
> > But freeze_bdev() is supposed to return the result of get_super(bdev)
> > _unconditionally_.  Moreover, in its current form freeze_bdev() _cannot_
> > _fail_, so I don't see how this change doesn't break any existing code.
>
> Well, it could return NULL.  Is that a failure?

It will only fail if get_super(bdev) returns NULL, but if you call
freeze_bdev(sb->s_bdev), then it can't do that.

> But, nobody is checking for an outright error, certainly.  Especially
> when the error hasn't been ERR_PTR'd.  :)  So I agree, that's not so good.
>
> But, how is a stampede of fs-freezers -supposed- to work?  I could
> imagine something like a freezer count, and the filesystem is only
> unfrozen after everyone has thawed?  Or should only one freezer be
> active at a time... which is what we have now I guess.

I think it shouldn't be possible to freeze an fs more than once.

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:42 ` Rafael J. Wysocki
@ 2006-11-08  0:01   ` Alasdair G Kergon
  2006-11-08  8:27     ` David Chinner
  0 siblings, 1 reply; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-08 0:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Eric Sandeen, Andrew Morton, Alasdair G Kergon, linux-kernel,
	dm-devel, Ingo Molnar, Srinivasa DS

On Wed, Nov 08, 2006 at 12:42:02AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, 8 November 2006 00:18, Eric Sandeen wrote:
> > But, how is a stampede of fs-freezers -supposed- to work?  I could
> > imagine something like a freezer count, and the filesystem is only
> > unfrozen after everyone has thawed?  Or should only one freezer be
> > active at a time... which is what we have now I guess.
> I think it shouldn't be possible to freeze an fs more than once.

In device-mapper today, the only way to get more than one freeze on the
same device is to use xfs and issue xfs_freeze before creating an lvm
snapshot (or issuing the dmsetup equivalent), and at the moment we tell
people not to do that any more.  The device-mapper API does not permit
multiple freezes of the same device.

(The interesting question is actually whether the request should be
cascaded in any way when devices depend on other devices.)

Now if someone's introducing a new use for freeze_bdev, perhaps now's
the time to revisit the semantics and allow for concurrent freeze
requests.

Alasdair
--
agk@redhat.com

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08  0:01 ` Alasdair G Kergon
@ 2006-11-08  8:27 ` David Chinner
  2006-11-08 14:25 ` Alasdair G Kergon
  0 siblings, 1 reply; 69+ messages in thread
From: David Chinner @ 2006-11-08 8:27 UTC (permalink / raw)
To: Rafael J. Wysocki, Eric Sandeen, Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wed, Nov 08, 2006 at 12:01:09AM +0000, Alasdair G Kergon wrote:
> On Wed, Nov 08, 2006 at 12:42:02AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 8 November 2006 00:18, Eric Sandeen wrote:
> > > But, how is a stampede of fs-freezers -supposed- to work? I could
> > > imagine something like a freezer count, and the filesystem is only
> > > unfrozen after everyone has thawed? Or should only one freezer be
> > > active at a time... which is what we have now I guess.
> > I think it shouldn't be possible to freeze an fs more than once.
>
> In device-mapper today, the only way to get more than one freeze on the
> same device is to use xfs and issue xfs_freeze before creating an lvm snapshot
> (or issuing the dmsetup equivalent), and at the moment we tell people not to do
> that any more.

But it's trivial to detect this condition - if (sb->s_frozen != SB_UNFROZEN) then the filesystem is already frozen and you shouldn't try to freeze it again. It's simple to do, and the whole problem then just goes away....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08  8:27 ` David Chinner
@ 2006-11-08 14:25 ` Alasdair G Kergon
  2006-11-08 14:43 ` Rafael J. Wysocki
  0 siblings, 1 reply; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-08 14:25 UTC (permalink / raw)
To: David Chinner
Cc: Rafael J. Wysocki, Eric Sandeen, Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wed, Nov 08, 2006 at 07:27:22PM +1100, David Chinner wrote:
> But it's trivial to detect this condition - if (sb->s_frozen != SB_UNFROZEN)
> then the filesystem is already frozen and you shouldn't try to freeze
> it again. It's simple to do, and the whole problem then just goes away....

So is that another vote in support of explicitly supporting multiple concurrent freeze requests, letting them all succeed, and only thawing after the last one has requested its thaw? (It's not enough just to check SB_UNFROZEN - we also need to track whether there are any other outstanding requests, to avoid the risk of it getting unfrozen while something independent believes it is still frozen.)

Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08 14:25 ` Alasdair G Kergon
@ 2006-11-08 14:43 ` Rafael J. Wysocki
  2006-11-08 15:25 ` Alasdair G Kergon
  0 siblings, 1 reply; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-08 14:43 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: David Chinner, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wednesday, 8 November 2006 15:25, Alasdair G Kergon wrote:
> On Wed, Nov 08, 2006 at 07:27:22PM +1100, David Chinner wrote:
> > But it's trivial to detect this condition - if (sb->s_frozen != SB_UNFROZEN)
> > then the filesystem is already frozen and you shouldn't try to freeze
> > it again. It's simple to do, and the whole problem then just goes away....
>
> So is that another vote in support of explicitly supporting multiple concurrent
> freeze requests, letting them all succeed, and only thawing after the last one
> has requested its thaw? (It's not enough just to check SB_UNFROZEN - we also need
> to track whether there are any other outstanding requests, to avoid the risk of it
> getting unfrozen while something independent believes it is still frozen.)

So, I think, we need the following patch to fix freeze_filesystems(). Will it be enough to cover the interactions with dm?

Rafael

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.19-rc5-mm1/fs/buffer.c
===================================================================
--- linux-2.6.19-rc5-mm1.orig/fs/buffer.c
+++ linux-2.6.19-rc5-mm1/fs/buffer.c
@@ -264,7 +264,7 @@ void freeze_filesystems(void)
 	 */
 	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
 		if (!sb->s_root || !sb->s_bdev ||
-		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_frozen != SB_UNFROZEN) ||
 		    (sb->s_flags & MS_RDONLY) || (sb->s_flags & MS_FROZEN))
 			continue;
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08 14:43 ` Rafael J. Wysocki
@ 2006-11-08 15:25 ` Alasdair G Kergon
  2006-11-08 23:06 ` Rafael J. Wysocki
  0 siblings, 1 reply; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-08 15:25 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alasdair G Kergon, David Chinner, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wed, Nov 08, 2006 at 03:43:26PM +0100, Rafael J. Wysocki wrote:
> Will it be enough to cover the interactions with dm?

There are cases where you *cannot* freeze the filesystem (unless you wait for userspace processes to finish what they are doing) - and only dm can tell you that.

The freeze_filesystems() code ought to do its best in any given circumstances within the constraints. So:

  Get the filesystem's block device.
  Check the full tree of devices that that block device depends upon
  and for any device that belongs to device-mapper check if it is suspended.
  If it is, skip the original device.

    struct mapped_device *dm_get_md(dev_t dev);
    int dm_suspended(struct mapped_device *md);
    void dm_put(struct mapped_device *md);

Handling the tree is the difficult bit, but sysfs could help.
[You can get the device-mapper dependencies with:
    struct mapped_device *dm_get_md(dev_t dev);
    struct dm_table *dm_get_table(struct mapped_device *md);
    struct list_head *dm_table_get_devices(struct dm_table *t);
    void dm_table_put(struct dm_table *t);
    void dm_put(struct mapped_device *md);
]

Consider that you could have an already-frozen filesystem containing a loop device containing a device-mapper device containing a not-frozen filesystem. You won't be able to freeze that top filesystem because the I/O would queue lower down the stack. (Similar problem if the device-mapper device in the stack was suspended.)

Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-08 15:25 ` Alasdair G Kergon @ 2006-11-08 23:06 ` Rafael J. Wysocki 0 siblings, 0 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-08 23:06 UTC (permalink / raw) To: Alasdair G Kergon Cc: David Chinner, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS On Wednesday, 8 November 2006 16:25, Alasdair G Kergon wrote: > On Wed, Nov 08, 2006 at 03:43:26PM +0100, Rafael J. Wysocki wrote: > > Will it be enough to cover the interactions with dm? > > There are cases where you *cannot* freeze the filesystem (unless > you wait for userspace processes to finish what they are doing) - > and only dm can tell you that. > > The freeze_filesystems() code ought to do it's best in any given > circumstances within the constraints. > > So: > Get the filesystem's block device. > Check the full tree of devices that that block device depends upon > and for any device that belongs to device-mapper check if it is suspended. > If it is, skip the original device. > > struct mapped_device *dm_get_md(dev_t dev); > int dm_suspended(struct mapped_device *md); > void dm_put(struct mapped_device *md); > > Handling the tree is the difficult bit, but sysfs could help. > [You can get the device-mapper dependencies with: > struct mapped_device *dm_get_md(dev_t dev); > struct dm_table *dm_get_table(struct mapped_device *md); > struct list_head *dm_table_get_devices(struct dm_table *t); > void dm_table_put(struct dm_table *t); > void dm_put(struct mapped_device *md); > ] > > Consider that you could have an already-frozen filesystem containing a loop > device containing a device-mapper device containing a not-frozen filesystem. I think the last point is handled correctly by freezing the filesystems in the reverse order - unless the fs below the loop has been frozen before by someone else, but I guess that would lead to problems anyway. 
> You won't be able to freeze that top filesystem because the I/O would
> queue lower down the stack. (Similar problem if the device-mapper device
> in the stack was suspended.)

The suspended dm device in the stack is not handled, however. Is there any list of all dm devices in the system?

Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-07 23:05 ` Rafael J. Wysocki 2006-11-07 23:18 ` Eric Sandeen @ 2006-11-07 23:49 ` Alasdair G Kergon 2006-11-08 0:00 ` Rafael J. Wysocki 2006-11-08 2:30 ` Alasdair G Kergon 1 sibling, 2 replies; 69+ messages in thread From: Alasdair G Kergon @ 2006-11-07 23:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Eric Sandeen, Andrew Morton, Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS On Wed, Nov 08, 2006 at 12:05:49AM +0100, Rafael J. Wysocki wrote: > But freeze_bdev() is supposed to return the result of get_super(bdev) > _unconditionally_. Moreover, in its current form freeze_bdev() _cannot_ > _fail_, so I don't see how this change doesn't break any existing code. > For example freeze_filesystems() (recently added to -mm) will be broken > if the down_trylock() is unsuccessful. I hadn't noticed that -mm patch. I'll take a look. Up to now, device-mapper (via dmsetup) and xfs (via xfs_freeze, which dates from before device-mapper handled this automatically) were the only users. Only one freeze should be issued at once. A freeze is a temporary thing, normally used while creating a snapshot. (One problem we still have is lots of old documentation on the web advising people to run xfs_freeze before creating device-mapper snapshots.) You're right that the down_trylock idea is more trouble than it's worth and should be scrapped. Alasdair -- agk@redhat.com ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 23:49 ` Alasdair G Kergon
@ 2006-11-08  0:00 ` Rafael J. Wysocki
  2006-11-08  3:33 ` David Chinner
  0 siblings, 1 reply; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-08 0:00 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wednesday, 8 November 2006 00:49, Alasdair G Kergon wrote:
> On Wed, Nov 08, 2006 at 12:05:49AM +0100, Rafael J. Wysocki wrote:
> > But freeze_bdev() is supposed to return the result of get_super(bdev)
> > _unconditionally_. Moreover, in its current form freeze_bdev() _cannot_
> > _fail_, so I don't see how this change doesn't break any existing code.
> > For example freeze_filesystems() (recently added to -mm) will be broken
> > if the down_trylock() is unsuccessful.
>
> I hadn't noticed that -mm patch. I'll take a look. Up to now, device-mapper
> (via dmsetup) and xfs (via xfs_freeze, which dates from before device-mapper
> handled this automatically) were the only users. Only one freeze should be
> issued at once. A freeze is a temporary thing, normally used while creating a
> snapshot. (One problem we still have is lots of old documentation on the web
> advising people to run xfs_freeze before creating device-mapper snapshots.)
>
> You're right that the down_trylock idea is more trouble than it's worth and
> should be scrapped.

Well, having looked at it once again I think I was wrong that this change would break freeze_filesystems(), because it only calls freeze_bdev() after checking if sb->s_frozen is not set to SB_FREEZE_TRANS (freeze_filesystems() is only called after all of the userspace processes have been frozen).

However, XFS_IOC_FREEZE happily returns 0 after calling freeze_bdev(), apparently assuming that it won't fail.

Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08  0:00 ` Rafael J. Wysocki
@ 2006-11-08  3:33 ` David Chinner
  0 siblings, 0 replies; 69+ messages in thread
From: David Chinner @ 2006-11-08 3:33 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Ingo Molnar, Srinivasa DS

On Wed, Nov 08, 2006 at 01:00:17AM +0100, Rafael J. Wysocki wrote:
> However, XFS_IOC_FREEZE happily returns 0 after calling freeze_bdev(),
> apparently assuming that it won't fail.

Because it _can't_ fail at this point. We've got an active superblock, because we've had to open a fd on the filesystem to get to the point where we can issue the ioctl. Hence the get_super() call will always succeed and the freeze will be executed if we are not read only.

The superblock state changes if the freeze goes ahead, and XFS uses this state change to determine what to do when the thaw command is sent.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-07 23:49 ` Alasdair G Kergon 2006-11-08 0:00 ` Rafael J. Wysocki @ 2006-11-08 2:30 ` Alasdair G Kergon 2006-11-08 12:10 ` Rafael J. Wysocki 1 sibling, 1 reply; 69+ messages in thread From: Alasdair G Kergon @ 2006-11-08 2:30 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS On Tue, Nov 07, 2006 at 11:49:51PM +0000, Alasdair G Kergon wrote: > I hadn't noticed that -mm patch. I'll take a look. swsusp-freeze-filesystems-during-suspend-rev-2.patch I think you need to give more thought to device-mapper interactions here. If an underlying device is suspended by device-mapper without freezing the filesystem (the normal state) and you issue a freeze_bdev on a device above it, the freeze_bdev may never return if it attempts any synchronous I/O (as it should). Try: while process generating I/O to filesystem on LVM issue dmsetup suspend --nolockfs (which the lvm2 tools often do) try your freeze_filesystems() Maybe: don't allow freeze_filesystems() to run when the system is in that state; or, use device-mapper suspend instead of freeze_bdev directly where dm is involved; or skip dm devices that are already frozen - all with appropriate dependency tracking to process devices in the right order. Alasdair -- agk@redhat.com ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-08 2:30 ` Alasdair G Kergon @ 2006-11-08 12:10 ` Rafael J. Wysocki 2006-11-08 18:09 ` Pavel Machek 2006-11-08 20:48 ` Nigel Cunningham 0 siblings, 2 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-08 12:10 UTC (permalink / raw) To: Alasdair G Kergon Cc: Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Wednesday, 8 November 2006 03:30, Alasdair G Kergon wrote: > On Tue, Nov 07, 2006 at 11:49:51PM +0000, Alasdair G Kergon wrote: > > I hadn't noticed that -mm patch. I'll take a look. > > swsusp-freeze-filesystems-during-suspend-rev-2.patch > > I think you need to give more thought to device-mapper > interactions here. If an underlying device is suspended > by device-mapper without freezing the filesystem (the > normal state) and you issue a freeze_bdev on a device > above it, the freeze_bdev may never return if it attempts > any synchronous I/O (as it should). Well, it looks like the interactions with dm add quite a bit of complexity here. > Try: > while process generating I/O to filesystem on LVM > issue dmsetup suspend --nolockfs (which the lvm2 tools often do) > try your freeze_filesystems() Okay, I will. > Maybe: don't allow freeze_filesystems() to run when the system is in that > state; I'd like to avoid that (we may be running out of battery power at this point). > or, use device-mapper suspend instead of freeze_bdev directly where > dm is involved; How do I check if dm is involved? > or skip dm devices that are already frozen - all with > appropriate dependency tracking to process devices in the right order. I'd prefer this one, but probably the previous one is simpler to start with. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-08 12:10 ` Rafael J. Wysocki @ 2006-11-08 18:09 ` Pavel Machek 2006-11-09 15:52 ` Rafael J. Wysocki 2006-11-08 20:48 ` Nigel Cunningham 1 sibling, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-08 18:09 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi! > > swsusp-freeze-filesystems-during-suspend-rev-2.patch > > > > I think you need to give more thought to device-mapper > > interactions here. If an underlying device is suspended > > by device-mapper without freezing the filesystem (the > > normal state) and you issue a freeze_bdev on a device > > above it, the freeze_bdev may never return if it attempts > > any synchronous I/O (as it should). > > Well, it looks like the interactions with dm add quite a bit of > complexity here. What about just fixing xfs (thou shall not write to disk when kernel threads are frozen), and getting rid of blockdev freezing? Pavel -- Thanks for all the (sleeping) penguins. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-08 18:09 ` Pavel Machek @ 2006-11-09 15:52 ` Rafael J. Wysocki 2006-11-09 16:00 ` Pavel Machek 0 siblings, 1 reply; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-09 15:52 UTC (permalink / raw) To: Pavel Machek Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi, On Wednesday, 8 November 2006 19:09, Pavel Machek wrote: > Hi! > > > > swsusp-freeze-filesystems-during-suspend-rev-2.patch > > > > > > I think you need to give more thought to device-mapper > > > interactions here. If an underlying device is suspended > > > by device-mapper without freezing the filesystem (the > > > normal state) and you issue a freeze_bdev on a device > > > above it, the freeze_bdev may never return if it attempts > > > any synchronous I/O (as it should). > > > > Well, it looks like the interactions with dm add quite a bit of > > complexity here. > > What about just fixing xfs (thou shall not write to disk when kernel > threads are frozen), and getting rid of blockdev freezing? Well, first I must admit you were absolutely right being suspicious with respect to this stuff. OTOH I have no idea _how_ we can tell xfs that the processes have been frozen. Should we introduce a global flag for that or something? Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 15:52 ` Rafael J. Wysocki @ 2006-11-09 16:00 ` Pavel Machek 2006-11-09 19:59 ` Rafael J. Wysocki 2006-11-10 0:33 ` David Chinner 0 siblings, 2 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-09 16:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi! > > > Well, it looks like the interactions with dm add quite a bit of > > > complexity here. > > > > What about just fixing xfs (thou shall not write to disk when kernel > > threads are frozen), and getting rid of blockdev freezing? > > Well, first I must admit you were absolutely right being suspicious with > respect to this stuff. (OTOH your patch found real bugs in suspend.c, so...) > OTOH I have no idea _how_ we can tell xfs that the processes have been > frozen. Should we introduce a global flag for that or something? I guess XFS should just do all the writes from process context, and refuse any writing when its threads are frozen... I actually still believe it is doing the right thing, because you can't really write to disk from timer. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 16:00 ` Pavel Machek
@ 2006-11-09 19:59 ` Rafael J. Wysocki
  2006-11-09 21:17 ` Pavel Machek
  0 siblings, 1 reply; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-09 19:59 UTC (permalink / raw)
To: Pavel Machek
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

Hi,

On Thursday, 9 November 2006 17:00, Pavel Machek wrote:
> Hi!
>
> > > > Well, it looks like the interactions with dm add quite a bit of
> > > > complexity here.
> > >
> > > What about just fixing xfs (thou shall not write to disk when kernel
> > > threads are frozen), and getting rid of blockdev freezing?
> >
> > Well, first I must admit you were absolutely right being suspicious with
> > respect to this stuff.
>
> (OTOH your patch found real bugs in suspend.c, so...)
>
> > OTOH I have no idea _how_ we can tell xfs that the processes have been
> > frozen. Should we introduce a global flag for that or something?
>
> I guess XFS should just do all the writes from process context, and
> refuse any writing when its threads are frozen... I actually still
> believe it is doing the right thing, because you can't really write to
> disk from timer.

This is from a work queue, so in fact from a process context, but from a process that is running with PF_NOFREEZE. And I don't think we can forbid filesystems to use work queues. IMO it's a legitimate thing to do for an fs. _But_.

Alasdair, am I right in thinking that if there's a suspended device-mapper device below a non-frozen filesystem, then sys_sync() would block just as well as freeze_bdev() on this filesystem?

Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 19:59 ` Rafael J. Wysocki @ 2006-11-09 21:17 ` Pavel Machek 2006-11-09 21:18 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-09 21:17 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi! > > > OTOH I have no idea _how_ we can tell xfs that the processes have been > > > frozen. Should we introduce a global flag for that or something? > > > > I guess XFS should just do all the writes from process context, and > > refuse any writing when its threads are frozen... I actually still > > believe it is doing the right thing, because you can't really write to > > disk from timer. > > This is from a work queue, so in fact from a process context, but from > a process that is running with PF_NOFREEZE. Why not simply &~ PF_NOFREEZE on that particular process? Filesystems are free to use threads/work queues/whatever, but refrigerator should mean "no writes to filesystem" for them... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 21:17 ` Pavel Machek
@ 2006-11-09 21:18 ` Rafael J. Wysocki
  2006-11-09 21:41 ` Pavel Machek
  0 siblings, 1 reply; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-09 21:18 UTC (permalink / raw)
To: Pavel Machek
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

Hi,

On Thursday, 9 November 2006 22:17, Pavel Machek wrote:
> Hi!
>
> > > > OTOH I have no idea _how_ we can tell xfs that the processes have been
> > > > frozen. Should we introduce a global flag for that or something?
> > >
> > > I guess XFS should just do all the writes from process context, and
> > > refuse any writing when its threads are frozen... I actually still
> > > believe it is doing the right thing, because you can't really write to
> > > disk from timer.
> >
> > This is from a work queue, so in fact from a process context, but from
> > a process that is running with PF_NOFREEZE.
>
> Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> are free to use threads/work queues/whatever, but refrigerator should
> mean "no writes to filesystem" for them...

But how do we differentiate worker_threads used by filesystems from the other ones?

BTW, I think that worker_threads run with PF_NOFREEZE for a reason, but what exactly is it?

Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 21:18 ` Rafael J. Wysocki
@ 2006-11-09 21:41 ` Pavel Machek
  2006-11-09 22:21 ` Rafael J. Wysocki
  0 siblings, 1 reply; 69+ messages in thread
From: Pavel Machek @ 2006-11-09 21:41 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

Hi!

> > > This is from a work queue, so in fact from a process context, but from
> > > a process that is running with PF_NOFREEZE.
> >
> > Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> > are free to use threads/work queues/whatever, but refrigerator should
> > mean "no writes to filesystem" for them...
>
> But how do we differentiate worker_threads used by filesystems from the
> other ones?

I'd expect filesystems to do &~ PF_NOFREEZE by hand.

> BTW, I think that worker_threads run with PF_NOFREEZE for a reason,
> but what exactly is it?

I do not think we had particularly good reasons...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 21:41 ` Pavel Machek
@ 2006-11-09 22:21 ` Rafael J. Wysocki
  2006-11-09 23:11 ` Pavel Machek
  2006-11-10  0:57 ` David Chinner
  0 siblings, 2 replies; 69+ messages in thread
From: Rafael J. Wysocki @ 2006-11-09 22:21 UTC (permalink / raw)
To: Pavel Machek
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

On Thursday, 9 November 2006 22:41, Pavel Machek wrote:
> Hi!
>
> > > > This is from a work queue, so in fact from a process context, but from
> > > > a process that is running with PF_NOFREEZE.
> > >
> > > Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> > > are free to use threads/work queues/whatever, but refrigerator should
> > > mean "no writes to filesystem" for them...
> >
> > But how do we differentiate worker_threads used by filesystems from the
> > other ones?
>
> I'd expect filesystems to do &~ PF_NOFREEZE by hand.
>
> > BTW, I think that worker_threads run with PF_NOFREEZE for a reason,
> > but what exactly is it?
>
> I do not think we had particularly good reasons...

Well, it looks like quite a lot of drivers depend on them, including libata.

I think we can add a flag to __create_workqueue() that will indicate if this one is to be running with PF_NOFREEZE and a corresponding macro like create_freezable_workqueue() to be used wherever we want the worker thread to freeze (in which case it should be calling try_to_freeze() somewhere). Then, we can teach filesystems to use this macro instead of create_workqueue().

Having done that we'd be able to drop the freezing of bdevs patch and forget about the dm-related complexity.

[Still I wonder if the sys_sync() in freeze_processes() is actually safe if there's a suspended dm device somewhere in the stack, because in the other case the freezing of bdevs would be no more dangerous than the thing that we're already doing.]

Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 22:21 ` Rafael J. Wysocki
@ 2006-11-09 23:11 ` Pavel Machek
  2006-11-09 23:24 ` Alasdair G Kergon
  2006-11-10  0:57 ` David Chinner
  1 sibling, 1 reply; 69+ messages in thread
From: Pavel Machek @ 2006-11-09 23:11 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

Hi!

> > > > > This is from a work queue, so in fact from a process context, but from
> > > > > a process that is running with PF_NOFREEZE.
> > > >
> > > > Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> > > > are free to use threads/work queues/whatever, but refrigerator should
> > > > mean "no writes to filesystem" for them...
> > >
> > > But how do we differentiate worker_threads used by filesystems from the
> > > other ones?
> >
> > I'd expect filesystems to do &~ PF_NOFREEZE by hand.
> >
> > > BTW, I think that worker_threads run with PF_NOFREEZE for a reason,
> > > but what exactly is it?
> >
> > I do not think we had particularly good reasons...
>
> Well, it looks like quite a lot of drivers depend on them, including libata.
>
> I think we can add a flag to __create_workqueue() that will indicate if
> this one is to be running with PF_NOFREEZE and a corresponding macro like
> create_freezable_workqueue() to be used wherever we want the worker thread
> to freeze (in which case it should be calling try_to_freeze() somewhere).
> Then, we can teach filesystems to use this macro instead of
> create_workqueue().

Works for me.

> Having done that we'd be able to drop the freezing of bdevs patch and forget
> about the dm-related complexity.

yes.

> [Still I wonder if the sys_sync() in freeze_processes() is actually safe if
> there's a suspended dm device somewhere in the stack, because in the other
> case the freezing of bdevs would be no more dangerous than the thing
> that we're already doing.]

? Not sure if I quite understand, but if dm breaks sync... something is terribly wrong with dm. And we do simple sys_sync()... so I do not think we have a problem.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 23:11 ` Pavel Machek
@ 2006-11-09 23:24 ` Alasdair G Kergon
  2006-11-09 23:32 ` Pavel Machek
  0 siblings, 1 reply; 69+ messages in thread
From: Alasdair G Kergon @ 2006-11-09 23:24 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

On Fri, Nov 10, 2006 at 12:11:46AM +0100, Pavel Machek wrote:
> ? Not sure if I quite understand, but if dm breaks sync... something
> is terribly wrong with dm. And we do simple sys_sync()... so I do not
> think we have a problem.

If you want to handle arbitrary kernel state, you might have a device-mapper device somewhere lower down the stack of devices that is queueing any I/O that reaches it. So anything waiting for I/O completion will wait until the dm process that suspended that device has finished whatever it is doing - and that might be a quick thing carried out by a userspace lvm tool, or a long thing carried out by an administrator using dmsetup.

I'm guessing you need a way of detecting such state lower down the stack then optionally either aborting the operation telling the user it can't be done at present; waiting for however long it takes (perhaps for ever if the admin disappeared); or more probably skipping those devices on a 'best endeavours' basis.

I'm suggesting sysfs or something built on bd_claim might offer a mechanism for traversing the stack.

Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 23:24 ` Alasdair G Kergon @ 2006-11-09 23:32 ` Pavel Machek 2006-11-10 12:03 ` Rafael J. Wysocki 0 siblings, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-09 23:32 UTC (permalink / raw) To: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi! > On Fri, Nov 10, 2006 at 12:11:46AM +0100, Pavel Machek wrote: > > ? Not sure if I quite understand, but if dm breaks sync... something > > is teribly wrong with dm. And we do simple sys_sync()... so I do not > > think we have a problem. > > If you want to handle arbitrary kernel state, you might have a device-mapper > device somewhere lower down the stack of devices that is queueing any I/O > that reaches it. So anything waiting for I/O completion will wait until > the dm process that suspended that device has finished whatever it is doing > - and that might be a quick thing carried out by a userspace lvm tool, or > a long thing carried out by an administrator using dmsetup. > > I'm guessing you need a way of detecting such state lower down the stack > then optionally either aborting the operation telling the user it can't be > done at present; waiting for however long it takes (perhaps for ever if > the admin disappeared); or more probably skipping those devices on a > 'best endeavours' basis. Okay, so you claim that sys_sync can stall, waiting for administrator? In such case we can simply do one sys_sync() before we start freezing userspace... or just move the only sys_sync() there. That way, admin has chance to unlock his system. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 23:32 ` Pavel Machek @ 2006-11-10 12:03 ` Rafael J. Wysocki 2006-11-12 18:43 ` Pavel Machek 0 siblings, 1 reply; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-10 12:03 UTC (permalink / raw) To: Pavel Machek Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner On Friday, 10 November 2006 00:32, Pavel Machek wrote: > Hi! > > > On Fri, Nov 10, 2006 at 12:11:46AM +0100, Pavel Machek wrote: > > > ? Not sure if I quite understand, but if dm breaks sync... something > > > is teribly wrong with dm. And we do simple sys_sync()... so I do not > > > think we have a problem. > > > > If you want to handle arbitrary kernel state, you might have a device-mapper > > device somewhere lower down the stack of devices that is queueing any I/O > > that reaches it. So anything waiting for I/O completion will wait until > > the dm process that suspended that device has finished whatever it is doing > > - and that might be a quick thing carried out by a userspace lvm tool, or > > a long thing carried out by an administrator using dmsetup. > > > > I'm guessing you need a way of detecting such state lower down the stack > > then optionally either aborting the operation telling the user it can't be > > done at present; waiting for however long it takes (perhaps for ever if > > the admin disappeared); or more probably skipping those devices on a > > 'best endeavours' basis. > > Okay, so you claim that sys_sync can stall, waiting for administator? > > In such case we can simply do one sys_sync() before we start freezing > userspace... or just more the only sys_sync() there. That way, admin > has chance to unlock his system. Well, this is a different story. 
My point is that if we call sys_sync() _anyway_ before calling freeze_filesystems(), then freeze_filesystems() is _safe_ (either the sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't block either). This means, however, that we can leave the patch as is (well, with the minor fix I have already posted), for now, because it doesn't make things worse a bit, but: (a) it prevents xfs from being corrupted and (b) it prevents journaling filesystems in general from replaying journals after a failing resume. Still, there is a problem with the possibility of potential lock-up - either with the bdevs-freezing patch or without it - due to a suspended dm device down the stack and solving that is a _separate_ issue. Now if we use the userland suspend, there's no problem at all, I think, because s2disk calls sync() before it goes to suspend_system(), so the admin will have a chance to unlock the system and everything is fine and dandy (although it should be documented somewhere, IMHO). However, if the built-in swsusp is used, then well ... <looks> ... we can put a call to sys_sync() before prepare_processes() in pm_suspend_disk(). Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-10 12:03 ` Rafael J. Wysocki @ 2006-11-12 18:43 ` Pavel Machek 2006-11-12 21:53 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-12 18:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi! > > Okay, so you claim that sys_sync can stall, waiting for administator? > > > > In such case we can simply do one sys_sync() before we start freezing > > userspace... or just more the only sys_sync() there. That way, admin > > has chance to unlock his system. > > Well, this is a different story. > > My point is that if we call sys_sync() _anyway_ before calling > freeze_filesystems(), then freeze_filesystems() is _safe_ (either the > sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't > block either). > > This means, however, that we can leave the patch as is (well, with the minor > fix I have already posted), for now, because it doesn't make things worse a > bit, but: > (a) it prevents xfs from being corrupted and I'd really prefer it to be fixed by 'freezeable workqueues'. Can you point me into sources -- which xfs workqueues are problematic? (It would be nice to fix that for 2.6.19, and full bdev freezing looks intrusive to me). > (b) it prevents journaling filesystems in general from replaying journals > after a failing resume. I do not see b) as a useful goal. Pavel -- Thanks for all the (sleeping) penguins. ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 18:43 ` Pavel Machek @ 2006-11-12 21:53 ` Rafael J. Wysocki 2006-11-12 23:30 ` David Chinner 2006-11-13 7:35 ` Stefan Seyfried 2 siblings, 0 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-12 21:53 UTC (permalink / raw) To: Pavel Machek Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner Hi, On Sunday, 12 November 2006 19:43, Pavel Machek wrote: > Hi! > > > > Okay, so you claim that sys_sync can stall, waiting for administator? > > > > > > In such case we can simply do one sys_sync() before we start freezing > > > userspace... or just more the only sys_sync() there. That way, admin > > > has chance to unlock his system. > > > > Well, this is a different story. > > > > My point is that if we call sys_sync() _anyway_ before calling > > freeze_filesystems(), then freeze_filesystems() is _safe_ (either the > > sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't > > block either). > > > > This means, however, that we can leave the patch as is (well, with the minor > > fix I have already posted), for now, because it doesn't make things worse a > > bit, but: > > (a) it prevents xfs from being corrupted and > > I'd really prefer it to be fixed by 'freezeable workqueues'. Can you > point me into sources -- which xfs workqueues are problematic? I think these: http://www.linux-m32r.org/lxr/http/source/fs/xfs/linux-2.6/xfs_buf.c?a=x86_64#L1829 But then, there's also this one http://www.linux-m32r.org/lxr/http/source/fs/reiserfs/journal.c?a=x86_64#L2837 and some others that I had no time to trace. > (It would be nice to fix that for 2.6.19, and full bdev freezing looks > intrusive to me). IMHO changing __create_workqueue() will be even more intrusive. > > (b) it prevents journaling filesystems in general from replaying journals > > after a failing resume. > > I do not see b) as an useful goal. 
Okay, let's forget it. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 18:43 ` Pavel Machek 2006-11-12 21:53 ` Rafael J. Wysocki @ 2006-11-12 23:30 ` David Chinner 2006-11-13 16:11 ` Rafael J. Wysocki 2006-11-15 18:50 ` Pavel Machek 2006-11-13 7:35 ` Stefan Seyfried 2 siblings, 2 replies; 69+ messages in thread From: David Chinner @ 2006-11-12 23:30 UTC (permalink / raw) To: Pavel Machek Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner On Sun, Nov 12, 2006 at 06:43:10PM +0000, Pavel Machek wrote: > Hi! > > > > Okay, so you claim that sys_sync can stall, waiting for administator? > > > > > > In such case we can simply do one sys_sync() before we start freezing > > > userspace... or just more the only sys_sync() there. That way, admin > > > has chance to unlock his system. > > > > Well, this is a different story. > > > > My point is that if we call sys_sync() _anyway_ before calling > > freeze_filesystems(), then freeze_filesystems() is _safe_ (either the > > sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't > > block either). > > > > This means, however, that we can leave the patch as is (well, with the minor > > fix I have already posted), for now, because it doesn't make things worse a > > bit, but: > > (a) it prevents xfs from being corrupted and > > I'd really prefer it to be fixed by 'freezeable workqueues'. I'd prefer that you just freeze the filesystem and let the filesystem do things correctly. > Can you > point me into sources -- which xfs workqueues are problematic? AFAIK, its the I/O completion workqueues that are causing problems. (fs/xfs/linux-2.6/xfs_buf.c) However, thinking about it, I'm not sure that the work queues being left unfrozen is the real problem. i.e. after a sync there's still I/O outstanding (e.g. 
metadata in the log but not on disk), and because the kernel threads are frozen some time after the sync, we could have issued this delayed write metadata to disk after the sync. With XFS, we can have a queue of thousands of metadata buffers for delwri, and they are all issued async and can take many seconds for the I/O to complete. The I/O completion workqueues will continue to run until all I/O stops, and metadata I/O completion will change the state of the filesystem in memory. However, even if you stop the workqueue processing, you're still going to have to wait for all I/O completion to occur before snapshotting memory because having any I/O complete changes memory state. Hence I fail to see how freezing the workqueues really helps at all here.... Given that the only way to track and block on these delwri metadata buffers is to issue a sync flush rather than an async flush, suspend has to do something different to guarantee that we block until all those I/Os have completed. i.e. freeze the filesystem. So the problem, IMO, is suspend is not telling the filesystem to stop doing stuff and so we are getting caught out by doing stuff that suspend assumes won't happen but does nothing to prevent. > > (b) it prevents journaling filesystems in general from replaying journals > > after a failing resume. This is incorrect. Freezing an XFS filesystem _ensures_ that log replay occurs on thaw or a failed resume. XFS specifically dirties the log after a freeze down to a consistent state so that the unlinked inode lists get processed by recovery on thaw/next mount. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 23:30 ` David Chinner @ 2006-11-13 16:11 ` Rafael J. Wysocki 2006-11-15 18:50 ` Pavel Machek 1 sibling, 0 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-13 16:11 UTC (permalink / raw) To: David Chinner Cc: Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Monday, 13 November 2006 00:30, David Chinner wrote: > On Sun, Nov 12, 2006 at 06:43:10PM +0000, Pavel Machek wrote: > > Hi! > > > > > > Okay, so you claim that sys_sync can stall, waiting for administator? > > > > > > > > In such case we can simply do one sys_sync() before we start freezing > > > > userspace... or just more the only sys_sync() there. That way, admin > > > > has chance to unlock his system. > > > > > > Well, this is a different story. > > > > > > My point is that if we call sys_sync() _anyway_ before calling > > > freeze_filesystems(), then freeze_filesystems() is _safe_ (either the > > > sys_sync() blocks, or it doesn't in which case freeze_filesystems() won't > > > block either). > > > > > > This means, however, that we can leave the patch as is (well, with the minor > > > fix I have already posted), for now, because it doesn't make things worse a > > > bit, but: > > > (a) it prevents xfs from being corrupted and > > > > I'd really prefer it to be fixed by 'freezeable workqueues'. > > I'd prefer that you just freeze the filesystem and let the > filesystem do things correctly. In fact _I_ agree with you and that's what we have in -mm now. However, Pavel apparently thinks it's too invasive, so we are considering (theoretically, for now) a "less invasive" alternative. > > Can you > > point me into sources -- which xfs workqueues are problematic? > > AFAIK, its the I/O completion workqueues that are causing problems. 
> (fs/xfs/linux-2.6/xfs_buf.c) However, thinking about it, I'm not > sure that the work queues being left unfrozen is the real problem. > > i.e. after a sync there's still I/O outstanding (e.g. metadata in > the log but not on disk), and because the kernel threads are frozen > some time after the sync, we could have issued this delayed write > metadata to disk after the sync. With XFS, we can have a of queue of > thousands of metadata buffers for delwri, and they are all issued > async and can take many seconds for the I/O to complete. > > The I/O completion workqueues will continue to run until all I/O > stops, and metadata I/O completion will change the state of the > filesystem in memory. > > However, even if you stop the workqueue processing, you're still > going to have to wait for all I/O completion to occur before > snapshotting memory because having any I/O complete changes memory > state. Hence I fail to see how freezing the workqueues really helps > at all here.... The changes of the memory state by themselves are handled correctly anyway. The problem is the state of memory after the resume must be consistent with the data and metadata on disk and it will be consistent if there are no changes to the on-disk data and metadata made during the suspend _after_ the suspend image has been created. Also, the on-disk data and metadata should be sufficient to recover the filesystem in case the resume fails (but that's obvious). > Given that the only way to track and block on these delwri metadata > buffers is to issue a sync flush rather than a async flush, suspend > has to do something different to guarantee that we block until all > those I/Os have completed. i.e. freeze the filesystem. > > So the problem, IMO, is suspend is not telling the filesystem > to stop doing stuff and so we are getting caught out by doing > stuff that suspend assumes won't happen but does nothing > to prevent. 
> > > > (b) it prevents journaling filesystems in general from replaying journals > > > after a failing resume. > > This is incorrect. Freezing an XFS filesystem _ensures_ that log > replay occurs on thaw or a failed resume. XFS specifically dirties > the log after a freeze down to a consistent state so that the > unlinked inode lists get processed by recovery on thaw/next mount. Okay, so I was referring to ext3 and reiserfs. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 23:30 ` David Chinner 2006-11-13 16:11 ` Rafael J. Wysocki @ 2006-11-15 18:50 ` Pavel Machek 2006-11-15 19:56 ` Rafael J. Wysocki 1 sibling, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-15 18:50 UTC (permalink / raw) To: David Chinner Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi! > > > This means, however, that we can leave the patch as is (well, with the minor > > > fix I have already posted), for now, because it doesn't make things worse a > > > bit, but: > > > (a) it prevents xfs from being corrupted and > > > > I'd really prefer it to be fixed by 'freezeable workqueues'. > > I'd prefer that you just freeze the filesystem and let the > filesystem do things correctly. Well, I'd prefer filesystems not to know about suspend, and current "freeze the filesystem" does not really nest properly. > > Can you > > point me into sources -- which xfs workqueues are problematic? > > AFAIK, its the I/O completion workqueues that are causing problems. > (fs/xfs/linux-2.6/xfs_buf.c) However, thinking about it, I'm not > sure that the work queues being left unfrozen is the real problem. > > i.e. after a sync there's still I/O outstanding (e.g. metadata in > the log but not on disk), and because the kernel threads are frozen > some time after the sync, we could have issued this delayed write > metadata to disk after the sync. With XFS, we can have a of queue of That's okay, snapshot is atomic. As long as data are safely in the journal, we should be okay. > However, even if you stop the workqueue processing, you're still > going to have to wait for all I/O completion to occur before > snapshotting memory because having any I/O complete changes memory > state. Hence I fail to see how freezing the workqueues really helps > at all here.... 
It is okay to change memory state, just on disk state may not change after atomic snapshot. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 18:50 ` Pavel Machek @ 2006-11-15 19:56 ` Rafael J. Wysocki 2006-11-15 20:00 ` Rafael J. Wysocki 0 siblings, 1 reply; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-15 19:56 UTC (permalink / raw) To: Pavel Machek Cc: David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi, On Wednesday, 15 November 2006 19:50, Pavel Machek wrote: > Hi! > > > > > This means, however, that we can leave the patch as is (well, with the minor > > > > fix I have already posted), for now, because it doesn't make things worse a > > > > bit, but: > > > > (a) it prevents xfs from being corrupted and > > > > > > I'd really prefer it to be fixed by 'freezeable workqueues'. > > > > I'd prefer that you just freeze the filesystem and let the > > filesystem do things correctly. > > Well, I'd prefer filesystems not to know about suspend, and current > "freeze the filesystem" does not really nest properly. > > > > Can you > > > point me into sources -- which xfs workqueues are problematic? > > > > AFAIK, its the I/O completion workqueues that are causing problems. > > (fs/xfs/linux-2.6/xfs_buf.c) However, thinking about it, I'm not > > sure that the work queues being left unfrozen is the real problem. > > > > i.e. after a sync there's still I/O outstanding (e.g. metadata in > > the log but not on disk), and because the kernel threads are frozen > > some time after the sync, we could have issued this delayed write > > metadata to disk after the sync. With XFS, we can have a of queue of > > That's okay, snapshot is atomic. As long as data are safely in the > journal, we should be okay. > > > However, even if you stop the workqueue processing, you're still > > going to have to wait for all I/O completion to occur before > > snapshotting memory because having any I/O complete changes memory > > state. 
Hence I fail to see how freezing the workqueues really helps > > at all here.... > > It is okay to change memory state, just on disk state may not change > after atomic snapshot. There's one more thing, actually. If the on-disk data and metadata are changed _after_ the sync we do and _before_ we create the snapshot image, and the subsequent resume fails, there may be problems with recovering the filesystem. That is, if I correctly understand what David has told us so far. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 19:56 ` Rafael J. Wysocki @ 2006-11-15 20:00 ` Rafael J. Wysocki 2006-11-15 20:23 ` Pavel Machek 0 siblings, 1 reply; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-15 20:00 UTC (permalink / raw) To: Pavel Machek Cc: David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Wednesday, 15 November 2006 20:56, Rafael J. Wysocki wrote: > Hi, > > On Wednesday, 15 November 2006 19:50, Pavel Machek wrote: > > Hi! > > > > > > > This means, however, that we can leave the patch as is (well, with the minor > > > > > fix I have already posted), for now, because it doesn't make things worse a > > > > > bit, but: > > > > > (a) it prevents xfs from being corrupted and > > > > > > > > I'd really prefer it to be fixed by 'freezeable workqueues'. > > > > > > I'd prefer that you just freeze the filesystem and let the > > > filesystem do things correctly. > > > > Well, I'd prefer filesystems not to know about suspend, and current > > "freeze the filesystem" does not really nest properly. > > > > > > Can you > > > > point me into sources -- which xfs workqueues are problematic? > > > > > > AFAIK, its the I/O completion workqueues that are causing problems. > > > (fs/xfs/linux-2.6/xfs_buf.c) However, thinking about it, I'm not > > > sure that the work queues being left unfrozen is the real problem. > > > > > > i.e. after a sync there's still I/O outstanding (e.g. metadata in > > > the log but not on disk), and because the kernel threads are frozen > > > some time after the sync, we could have issued this delayed write > > > metadata to disk after the sync. With XFS, we can have a of queue of > > > > That's okay, snapshot is atomic. As long as data are safely in the > > journal, we should be okay. 
> > > > > However, even if you stop the workqueue processing, you're still > > > going to have to wait for all I/O completion to occur before > > > snapshotting memory because having any I/O complete changes memory > > > state. Hence I fail to see how freezing the workqueues really helps > > > at all here.... > > > > It is okay to change memory state, just on disk state may not change > > after atomic snapshot. > > There's one more thing, actually. If the on-disk data and metadata are > changed _after_ the sync we do and _before_ we create the snapshot image, > and the subsequent resume fails, Well, but this is equivalent to a power failure immediately after the sync, so there _must_ be a way to recover the filesystem from that, no? I think I'll prepare a patch for freezing the work queues and we'll see what to do next. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 20:00 ` Rafael J. Wysocki @ 2006-11-15 20:23 ` Pavel Machek 2006-11-15 21:58 ` Rafael J. Wysocki 0 siblings, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-15 20:23 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi! > > There's one more thing, actually. If the on-disk data and metadata are > > changed _after_ the sync we do and _before_ we create the snapshot image, > > and the subsequent resume fails, > > Well, but this is equivalent to a power failure immediately after the sync, so > there _must_ be a way to recover the filesystem from that, no? Exactly. > I think I'll prepare a patch for freezing the work queues and we'll see what > to do next. Thanks! Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 20:23 ` Pavel Machek @ 2006-11-15 21:58 ` Rafael J. Wysocki 2006-11-15 22:49 ` Pavel Machek 2006-11-16 23:20 ` David Chinner 0 siblings, 2 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-15 21:58 UTC (permalink / raw) To: Pavel Machek Cc: David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi, On Wednesday, 15 November 2006 21:23, Pavel Machek wrote: > Hi! > > > > There's one more thing, actually. If the on-disk data and metadata are > > > changed _after_ the sync we do and _before_ we create the snapshot image, > > > and the subsequent resume fails, > > > > Well, but this is equivalent to a power failure immediately after the sync, so > > there _must_ be a way to recover the filesystem from that, no? > > Exactly. > > > I think I'll prepare a patch for freezing the work queues and we'll see what > > to do next. > > Thanks! Okay, the patch follows. I've been running it for some time on my boxes and it doesn't seem to break anything. However, I don't use XFS, so well ... Greetings, Rafael Signed-off-by: Rafael J. 
Wysocki <rjw@sisk.pl> --- fs/xfs/linux-2.6/xfs_buf.c | 4 ++-- include/linux/workqueue.h | 8 +++++--- kernel/workqueue.c | 20 ++++++++++++++------ 3 files changed, 21 insertions(+), 11 deletions(-) Index: linux-2.6.19-rc5-mm2/include/linux/workqueue.h =================================================================== --- linux-2.6.19-rc5-mm2.orig/include/linux/workqueue.h 2006-09-20 05:42:06.000000000 +0200 +++ linux-2.6.19-rc5-mm2/include/linux/workqueue.h 2006-11-15 21:58:40.000000000 +0100 @@ -55,9 +55,11 @@ struct execute_work { } while (0) extern struct workqueue_struct *__create_workqueue(const char *name, - int singlethread); -#define create_workqueue(name) __create_workqueue((name), 0) -#define create_singlethread_workqueue(name) __create_workqueue((name), 1) + int singlethread, + int freezeable); +#define create_workqueue(name) __create_workqueue((name), 0, 0) +#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1) +#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0) extern void destroy_workqueue(struct workqueue_struct *wq); Index: linux-2.6.19-rc5-mm2/kernel/workqueue.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/kernel/workqueue.c 2006-11-15 21:32:13.000000000 +0100 +++ linux-2.6.19-rc5-mm2/kernel/workqueue.c 2006-11-15 22:27:40.000000000 +0100 @@ -31,6 +31,7 @@ #include <linux/mempolicy.h> #include <linux/kallsyms.h> #include <linux/debug_locks.h> +#include <linux/freezer.h> /* * The per-CPU workqueue (if single thread, we always use the first @@ -57,6 +58,8 @@ struct cpu_workqueue_struct { struct task_struct *thread; int run_depth; /* Detect run_workqueue() recursion depth */ + + int freezeable; /* Freeze the thread during suspend */ } ____cacheline_aligned; /* @@ -251,7 +254,8 @@ static int worker_thread(void *__cwq) struct k_sigaction sa; sigset_t blocked; - current->flags |= PF_NOFREEZE; + if (!cwq->freezeable) + current->flags |= PF_NOFREEZE; 
set_user_nice(current, -5); @@ -274,6 +278,9 @@ static int worker_thread(void *__cwq) set_current_state(TASK_INTERRUPTIBLE); while (!kthread_should_stop()) { + if (cwq->freezeable) + try_to_freeze(); + add_wait_queue(&cwq->more_work, &wait); if (list_empty(&cwq->worklist)) schedule(); @@ -350,7 +357,7 @@ void fastcall flush_workqueue(struct wor EXPORT_SYMBOL_GPL(flush_workqueue); static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq, - int cpu) + int cpu, int freezeable) { struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu); struct task_struct *p; @@ -360,6 +367,7 @@ static struct task_struct *create_workqu cwq->thread = NULL; cwq->insert_sequence = 0; cwq->remove_sequence = 0; + cwq->freezeable = freezeable; INIT_LIST_HEAD(&cwq->worklist); init_waitqueue_head(&cwq->more_work); init_waitqueue_head(&cwq->work_done); @@ -375,7 +383,7 @@ static struct task_struct *create_workqu } struct workqueue_struct *__create_workqueue(const char *name, - int singlethread) + int singlethread, int freezeable) { int cpu, destroy = 0; struct workqueue_struct *wq; @@ -395,7 +403,7 @@ struct workqueue_struct *__create_workqu mutex_lock(&workqueue_mutex); if (singlethread) { INIT_LIST_HEAD(&wq->list); - p = create_workqueue_thread(wq, singlethread_cpu); + p = create_workqueue_thread(wq, singlethread_cpu, freezeable); if (!p) destroy = 1; else @@ -403,7 +411,7 @@ struct workqueue_struct *__create_workqu } else { list_add(&wq->list, &workqueues); for_each_online_cpu(cpu) { - p = create_workqueue_thread(wq, cpu); + p = create_workqueue_thread(wq, cpu, freezeable); if (p) { kthread_bind(p, cpu); wake_up_process(p); @@ -657,7 +665,7 @@ static int __devinit workqueue_cpu_callb mutex_lock(&workqueue_mutex); /* Create a new workqueue thread for it. 
*/ list_for_each_entry(wq, &workqueues, list) { - if (!create_workqueue_thread(wq, hotcpu)) { + if (!create_workqueue_thread(wq, hotcpu, 0)) { printk("workqueue for %i failed\n", hotcpu); return NOTIFY_BAD; } Index: linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c =================================================================== --- linux-2.6.19-rc5-mm2.orig/fs/xfs/linux-2.6/xfs_buf.c 2006-11-15 21:32:08.000000000 +0100 +++ linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c 2006-11-15 22:12:43.000000000 +0100 @@ -1826,11 +1826,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_workqueue("xfslogd"); + xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_workqueue("xfsdatad"); + xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 21:58 ` Rafael J. Wysocki @ 2006-11-15 22:49 ` Pavel Machek 2006-11-16 23:20 ` David Chinner 1 sibling, 0 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-15 22:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi! > > > > There's one more thing, actually. If the on-disk data and metadata are > > > > changed _after_ the sync we do and _before_ we create the snapshot image, > > > > and the subsequent resume fails, > > > > > > Well, but this is equivalent to a power failure immediately after the sync, so > > > there _must_ be a way to recover the filesystem from that, no? > > > > Exactly. > > > > > I think I'll prepare a patch for freezing the work queues and we'll see what > > > to do next. > > > > Thanks! > > Okay, the patch follows. > > I've been running it for some time on my boxes and it doesn't seem to break > anything. However, I don't use XFS, so well ... Seems like a way to go, thanks! 
Pavel > Index: linux-2.6.19-rc5-mm2/include/linux/workqueue.h > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/include/linux/workqueue.h 2006-09-20 05:42:06.000000000 +0200 > +++ linux-2.6.19-rc5-mm2/include/linux/workqueue.h 2006-11-15 21:58:40.000000000 +0100 > @@ -55,9 +55,11 @@ struct execute_work { > } while (0) > > extern struct workqueue_struct *__create_workqueue(const char *name, > - int singlethread); > -#define create_workqueue(name) __create_workqueue((name), 0) > -#define create_singlethread_workqueue(name) __create_workqueue((name), 1) > + int singlethread, > + int freezeable); > +#define create_workqueue(name) __create_workqueue((name), 0, 0) > +#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1) > +#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0) > > extern void destroy_workqueue(struct workqueue_struct *wq); > > Index: linux-2.6.19-rc5-mm2/kernel/workqueue.c > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/kernel/workqueue.c 2006-11-15 21:32:13.000000000 +0100 > +++ linux-2.6.19-rc5-mm2/kernel/workqueue.c 2006-11-15 22:27:40.000000000 +0100 > @@ -31,6 +31,7 @@ > #include <linux/mempolicy.h> > #include <linux/kallsyms.h> > #include <linux/debug_locks.h> > +#include <linux/freezer.h> > > /* > * The per-CPU workqueue (if single thread, we always use the first > @@ -57,6 +58,8 @@ struct cpu_workqueue_struct { > struct task_struct *thread; > > int run_depth; /* Detect run_workqueue() recursion depth */ > + > + int freezeable; /* Freeze the thread during suspend */ > } ____cacheline_aligned; > > /* > @@ -251,7 +254,8 @@ static int worker_thread(void *__cwq) > struct k_sigaction sa; > sigset_t blocked; > > - current->flags |= PF_NOFREEZE; > + if (!cwq->freezeable) > + current->flags |= PF_NOFREEZE; > > set_user_nice(current, -5); > > @@ -274,6 +278,9 @@ static int worker_thread(void *__cwq) > > 
set_current_state(TASK_INTERRUPTIBLE); > while (!kthread_should_stop()) { > + if (cwq->freezeable) > + try_to_freeze(); > + > add_wait_queue(&cwq->more_work, &wait); > if (list_empty(&cwq->worklist)) > schedule(); > @@ -350,7 +357,7 @@ void fastcall flush_workqueue(struct wor > EXPORT_SYMBOL_GPL(flush_workqueue); > > static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq, > - int cpu) > + int cpu, int freezeable) > { > struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu); > struct task_struct *p; > @@ -360,6 +367,7 @@ static struct task_struct *create_workqu > cwq->thread = NULL; > cwq->insert_sequence = 0; > cwq->remove_sequence = 0; > + cwq->freezeable = freezeable; > INIT_LIST_HEAD(&cwq->worklist); > init_waitqueue_head(&cwq->more_work); > init_waitqueue_head(&cwq->work_done); > @@ -375,7 +383,7 @@ static struct task_struct *create_workqu > } > > struct workqueue_struct *__create_workqueue(const char *name, > - int singlethread) > + int singlethread, int freezeable) > { > int cpu, destroy = 0; > struct workqueue_struct *wq; > @@ -395,7 +403,7 @@ struct workqueue_struct *__create_workqu > mutex_lock(&workqueue_mutex); > if (singlethread) { > INIT_LIST_HEAD(&wq->list); > - p = create_workqueue_thread(wq, singlethread_cpu); > + p = create_workqueue_thread(wq, singlethread_cpu, freezeable); > if (!p) > destroy = 1; > else > @@ -403,7 +411,7 @@ struct workqueue_struct *__create_workqu > } else { > list_add(&wq->list, &workqueues); > for_each_online_cpu(cpu) { > - p = create_workqueue_thread(wq, cpu); > + p = create_workqueue_thread(wq, cpu, freezeable); > if (p) { > kthread_bind(p, cpu); > wake_up_process(p); > @@ -657,7 +665,7 @@ static int __devinit workqueue_cpu_callb > mutex_lock(&workqueue_mutex); > /* Create a new workqueue thread for it. 
*/ > list_for_each_entry(wq, &workqueues, list) { > - if (!create_workqueue_thread(wq, hotcpu)) { > + if (!create_workqueue_thread(wq, hotcpu, 0)) { > printk("workqueue for %i failed\n", hotcpu); > return NOTIFY_BAD; > } > Index: linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c > =================================================================== > --- linux-2.6.19-rc5-mm2.orig/fs/xfs/linux-2.6/xfs_buf.c 2006-11-15 21:32:08.000000000 +0100 > +++ linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c 2006-11-15 22:12:43.000000000 +0100 > @@ -1826,11 +1826,11 @@ xfs_buf_init(void) > if (!xfs_buf_zone) > goto out_free_trace_buf; > > - xfslogd_workqueue = create_workqueue("xfslogd"); > + xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > if (!xfslogd_workqueue) > goto out_free_buf_zone; > > - xfsdatad_workqueue = create_workqueue("xfsdatad"); > + xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > if (!xfsdatad_workqueue) > goto out_destroy_xfslogd_workqueue; > -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
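The crux of the patch quoted above is the new `freezeable` flag: a worker thread only sets PF_NOFREEZE on itself (and thus ignores the refrigerator) when its workqueue was *not* created freezeable. A minimal userspace sketch of that decision logic — the struct, the helper names, and the flag value are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

#define PF_NOFREEZE 0x1u  /* illustrative value; the kernel defines its own */

struct worker {
    unsigned int flags;
    bool freezeable;      /* mirrors cwq->freezeable in the patch */
};

/* Mirrors the worker_thread() setup above: only non-freezeable
 * workers opt out of the freezer by setting PF_NOFREEZE. */
static void worker_init(struct worker *w, bool freezeable)
{
    w->flags = 0;
    w->freezeable = freezeable;
    if (!w->freezeable)
        w->flags |= PF_NOFREEZE;
}

/* Mirrors what the freezer checks: a worker without PF_NOFREEZE
 * enters the refrigerator when a freeze is requested. */
static bool worker_would_freeze(const struct worker *w)
{
    return !(w->flags & PF_NOFREEZE);
}
```

With this split, threads of a create_freezeable_workqueue() park in the refrigerator during suspend, while plain create_workqueue() threads keep running — which is exactly the behavioural difference the XFS hunk at the end of the patch relies on.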
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-15 21:58 ` Rafael J. Wysocki 2006-11-15 22:49 ` Pavel Machek @ 2006-11-16 23:20 ` David Chinner 2006-11-16 23:38 ` Pavel Machek 1 sibling, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-16 23:20 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, David Chinner, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Wed, Nov 15, 2006 at 10:58:51PM +0100, Rafael J. Wysocki wrote: > On Wednesday, 15 November 2006 21:23, Pavel Machek wrote: > > > I think I'll prepare a patch for freezing the work queues and we'll see what > > > to do next. > > > > Thanks! > > Okay, the patch follows. > > I've been running it for some time on my boxes and it doesn't seem to break > anything. However, I don't use XFS, so well ... So you haven't actually tested whether it fixes anything or whether it introduces any regressions? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-16 23:20 ` David Chinner @ 2006-11-16 23:38 ` Pavel Machek 0 siblings, 0 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-16 23:38 UTC (permalink / raw) To: David Chinner Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham Hi! > > > > I think I'll prepare a patch for freezing the work queues and we'll see what > > > > to do next. > > > > > > Thanks! > > > > Okay, the patch follows. > > > > I've been running it for some time on my boxes and it doesn't seem to break > > anything. However, I don't use XFS, so well ... > > So you haven't actually tested whether it fixes anything or whether > it introduces any regressions? No one has a testcase for a problem; swsusp just does not eat filesystems in practice. Feel free to test the patch, but I'm pretty sure it will not break anything. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 18:43 ` Pavel Machek 2006-11-12 21:53 ` Rafael J. Wysocki 2006-11-12 23:30 ` David Chinner @ 2006-11-13 7:35 ` Stefan Seyfried 2 siblings, 0 replies; 69+ messages in thread From: Stefan Seyfried @ 2006-11-13 7:35 UTC (permalink / raw) To: Pavel Machek Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner, Rafael J. Wysocki On Sun, Nov 12, 2006 at 06:43:10PM +0000, Pavel Machek wrote: > Hi! > > (b) it prevents journaling filesystems in general from replaying journals > > after a failing resume. > > I do not see b) as an useful goal. I'm not sure, but I guess it also solves the "GRUB takes a minute to load kernel and initrd from /boot on suspended reiserfs" problem, which I see as a _very_ useful goal. Often, most of the time needed for resume is spent by GRUB loading the kernel/initrd from a journaled FS. -- Stefan Seyfried QA / R&D Team Mobile Devices | "Any ideas, John?" SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out." ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 22:21 ` Rafael J. Wysocki 2006-11-09 23:11 ` Pavel Machek @ 2006-11-10 0:57 ` David Chinner 2006-11-10 10:39 ` Pavel Machek 1 sibling, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-10 0:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > I think we can add a flag to __create_workqueue() that will indicate if > this one is to be running with PF_NOFREEZE and a corresponding macro like > create_freezable_workqueue() to be used wherever we want the worker thread > to freeze (in which case it should be calling try_to_freeze() somewhere). > Then, we can teach filesystems to use this macro instead of > create_workqueue(). At what point does the workqueue get frozen? i.e. how does this guarantee an unfrozen filesystem will end up in a consistent state? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-10 0:57 ` David Chinner @ 2006-11-10 10:39 ` Pavel Machek 2006-11-12 22:30 ` David Chinner 0 siblings, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-10 10:39 UTC (permalink / raw) To: David Chinner Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Fri 2006-11-10 11:57:49, David Chinner wrote: > On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > > I think we can add a flag to __create_workqueue() that will indicate if > > this one is to be running with PF_NOFREEZE and a corresponding macro like > > create_freezable_workqueue() to be used wherever we want the worker thread > > to freeze (in which case it should be calling try_to_freeze() somewhere). > > Then, we can teach filesystems to use this macro instead of > > create_workqueue(). > > At what point does the workqueue get frozen? i.e. how does this > guarantee an unfrozen filesystem will end up in a consistent > state? Snapshot is atomic; workqueue will be unfrozen with everyone else, but as there were no writes in the meantime, there should be no problems. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-10 10:39 ` Pavel Machek @ 2006-11-12 22:30 ` David Chinner 2006-11-12 22:43 ` Rafael J. Wysocki 0 siblings, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-12 22:30 UTC (permalink / raw) To: Pavel Machek Cc: David Chinner, Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Fri, Nov 10, 2006 at 11:39:42AM +0100, Pavel Machek wrote: > On Fri 2006-11-10 11:57:49, David Chinner wrote: > > On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > > > I think we can add a flag to __create_workqueue() that will indicate if > > > this one is to be running with PF_NOFREEZE and a corresponding macro like > > > create_freezable_workqueue() to be used wherever we want the worker thread > > > to freeze (in which case it should be calling try_to_freeze() somewhere). > > > Then, we can teach filesystems to use this macro instead of > > > create_workqueue(). > > > > At what point does the workqueue get frozen? i.e. how does this > > guarantee an unfrozen filesystem will end up in a consistent > > state? > > Snapshot is atomic; workqueue will be unfrozen with everyone else, but > as there were no writes in the meantime, there should be no problems. That doesn't answer my question - when in the sequence of freezing do you propose disabling the workqueues? before the kernel threads, after the kernel threads, before you sync the filesystem? And how does freezing them at that point in time guarantee consistent filesystem state? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 22:30 ` David Chinner @ 2006-11-12 22:43 ` Rafael J. Wysocki 2006-11-13 5:43 ` David Chinner 2006-11-16 23:23 ` David Chinner 0 siblings, 2 replies; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-12 22:43 UTC (permalink / raw) To: David Chinner Cc: Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Sunday, 12 November 2006 23:30, David Chinner wrote: > On Fri, Nov 10, 2006 at 11:39:42AM +0100, Pavel Machek wrote: > > On Fri 2006-11-10 11:57:49, David Chinner wrote: > > > On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > > > > I think we can add a flag to __create_workqueue() that will indicate if > > > > this one is to be running with PF_NOFREEZE and a corresponding macro like > > > > create_freezable_workqueue() to be used wherever we want the worker thread > > > > to freeze (in which case it should be calling try_to_freeze() somewhere). > > > > Then, we can teach filesystems to use this macro instead of > > > > create_workqueue(). > > > > > > At what point does the workqueue get frozen? i.e. how does this > > > guarantee an unfrozen filesystem will end up in a consistent > > > state? > > > > Snapshot is atomic; workqueue will be unfrozen with everyone else, but > > as there were no writes in the meantime, there should be no problems. > > That doesn't answer my question - when in the sequence of freezing > do you propose diasbling the workqueues? before the kernel threads, > after the kernel threads, before you sync the filesystem? After the sync, along with the freezing of kernel threads. > And how does freezing them at that point in time guarantee consistent > filesystem state? If the work queues are frozen, there won't be any fs-related activity _after_ we create the suspend image. 
The sync is done after the userland has been frozen, so if the resume is unsuccessful, we'll be able to recover the state of the fs right before the sync, and if the resume is successful, we'll be able to continue (the state of memory will be the same as before the creation of the suspend image and the state of the disk will be the same as before the creation of the suspend image). Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 22:43 ` Rafael J. Wysocki @ 2006-11-13 5:43 ` David Chinner 2006-11-13 16:22 ` Rafael J. Wysocki 2006-11-16 23:23 ` David Chinner 1 sibling, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-13 5:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Chinner, Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > On Fri, Nov 10, 2006 at 11:39:42AM +0100, Pavel Machek wrote: > > > On Fri 2006-11-10 11:57:49, David Chinner wrote: > > > > On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > > > > > I think we can add a flag to __create_workqueue() that will indicate if > > > > > this one is to be running with PF_NOFREEZE and a corresponding macro like > > > > > create_freezable_workqueue() to be used wherever we want the worker thread > > > > > to freeze (in which case it should be calling try_to_freeze() somewhere). > > > > > Then, we can teach filesystems to use this macro instead of > > > > > create_workqueue(). > > > > > > > > At what point does the workqueue get frozen? i.e. how does this > > > > guarantee an unfrozen filesystem will end up in a consistent > > > > state? > > > > > > Snapshot is atomic; workqueue will be unfrozen with everyone else, but > > > as there were no writes in the meantime, there should be no problems. > > > > That doesn't answer my question - when in the sequence of freezing > > do you propose diasbling the workqueues? before the kernel threads, > > after the kernel threads, before you sync the filesystem? > > After the sync, along with the freezing of kernel threads. Before or after freezing of the kthreads? Order _could_ be important, and different filesystems might require different orders. What then? 
> > And how does freezing them at that point in time guarantee consistent > > filesystem state? > > If the work queues are frozen, there won't be any fs-related activity _after_ > we create the suspend image. What about if there is still I/O in progress (i.e. kthread wins race and issues async I/O after the sync but before it's frozen) - freezing the workqueues does not prevent this activity and memory state will continue to change as long as there is I/O completing... > The sync is done after the userland has been > frozen, so if the resume is unsuccessful, we'll be able to recover the state > of the fs right before the sync, Yes, in most cases. > and if the resume is successful, we'll be > able to continue (the state of memory will be the same as before the creation > of the suspend image and the state of the disk will be the same as before the > creation of the suspend image). Assuming that you actually suspended an idle filesystem, which sync does not guarantee you. Rather than assuming the filesystem is idle, why not guarantee that it is idle by freezing it? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
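Freezing the filesystem, as Dave suggests here, is what freeze_bdev() provides — and the patch at the head of this thread converts its bd_mount_mutex to a semaphore precisely because the freeze and the thaw come from different processes (dmsetup suspend vs. dmsetup resume), which a mutex forbids. A userspace sketch of the resulting down_trylock() semantics, with hypothetical names and the semaphore modelled as an atomic counter:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Models sema_init(&bdev->bd_mount_sem, 1) from the patch. */
static atomic_int bd_mount_count = 1;

/* Models the new freeze_bdev() entry: take the semaphore only if it
 * is free, otherwise report that the device is already frozen. */
static int freeze_bdev_sketch(void)
{
    int expected = 1;
    if (!atomic_compare_exchange_strong(&bd_mount_count, &expected, 0))
        return -EBUSY;
    return 0;
}

/* Models thaw_bdev(): unlike a mutex, a task other than the one
 * that froze the device may release the semaphore. */
static void thaw_bdev_sketch(void)
{
    atomic_store(&bd_mount_count, 1);
}
```

The -EBUSY path is also new behaviour: with the mutex, a second freeze_bdev() caller would have blocked; with down_trylock() it fails fast instead.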
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-13 5:43 ` David Chinner @ 2006-11-13 16:22 ` Rafael J. Wysocki 2006-11-14 0:10 ` David Chinner 0 siblings, 1 reply; 69+ messages in thread From: Rafael J. Wysocki @ 2006-11-13 16:22 UTC (permalink / raw) To: David Chinner Cc: Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Monday, 13 November 2006 06:43, David Chinner wrote: > On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > > On Fri, Nov 10, 2006 at 11:39:42AM +0100, Pavel Machek wrote: > > > > On Fri 2006-11-10 11:57:49, David Chinner wrote: > > > > > On Thu, Nov 09, 2006 at 11:21:46PM +0100, Rafael J. Wysocki wrote: > > > > > > I think we can add a flag to __create_workqueue() that will indicate if > > > > > > this one is to be running with PF_NOFREEZE and a corresponding macro like > > > > > > create_freezable_workqueue() to be used wherever we want the worker thread > > > > > > to freeze (in which case it should be calling try_to_freeze() somewhere). > > > > > > Then, we can teach filesystems to use this macro instead of > > > > > > create_workqueue(). > > > > > > > > > > At what point does the workqueue get frozen? i.e. how does this > > > > > guarantee an unfrozen filesystem will end up in a consistent > > > > > state? > > > > > > > > Snapshot is atomic; workqueue will be unfrozen with everyone else, but > > > > as there were no writes in the meantime, there should be no problems. > > > > > > That doesn't answer my question - when in the sequence of freezing > > > do you propose diasbling the workqueues? before the kernel threads, > > > after the kernel threads, before you sync the filesystem? > > > > After the sync, along with the freezing of kernel threads. > > Before or after freezing of the kthreads? 
Order _could_ be > important, and different filesystems might require different > orders. What then? Well, I don't really think the order is important. If the freezing of work queues is done by the freezing of their respective worker threads, the other threads won't even know they have been frozen. > > > And how does freezing them at that point in time guarantee consistent > > > filesystem state? > > > > If the work queues are frozen, there won't be any fs-related activity _after_ > > we create the suspend image. > > What about if there is still I/O in progress (i.e. kthread wins race and > issues async I/O after the sync but before it's frozen) - freezing the > workqueues does not prevent this activity and memory state will continue to > change as long as there is I/O completing... > > > The sync is done after the userland has been > > frozen, so if the resume is unsuccessful, we'll be able to recover the state > > of the fs right before the sync, > > Yes, in most cases. > > > and if the resume is successful, we'll be > > able to continue (the state of memory will be the same as before the creation > > of the suspend image and the state of the disk will be the same as before the > > creation of the suspend image). > > Assuming that you actually suspended an idle filesystem, which sync does not > guarantee you. Even if it's not idle, we are safe as long as the I/O activity doesn't continue after the suspend image has been created. > Rather than assuming the filesystem is idle, why not guarantee > that it is idle by freezing it? Well, _I_ personally think that the freezing of filesystems is the right thing to do, although it may lead to some complications down the road. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-13 16:22 ` Rafael J. Wysocki @ 2006-11-14 0:10 ` David Chinner 0 siblings, 0 replies; 69+ messages in thread From: David Chinner @ 2006-11-14 0:10 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Chinner, Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Mon, Nov 13, 2006 at 05:22:54PM +0100, Rafael J. Wysocki wrote: > On Monday, 13 November 2006 06:43, David Chinner wrote: > > On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > > Before or after freezing of the kthreads? Order _could_ be > > important, and different filesystems might require different > > orders. What then? > > Well, I don't really think the order is important. If the freezing of work > queues is done by the freezing of their respective worker threads, the > other threads won't even know they have been frozen. True - I just think that some defined ordering semantics might be helpful for people trying to use workqueues and kthreads in the future. It would also help us determine if the current usage is safe as well... > > > and if the resume is successful, we'll be > > > able to continue (the state of memory will be the same as before the creation > > > of the suspend image and the state of the disk will be the same as before the > > > creation of the suspend image). > > > > Assuming that you actually suspended an idle filesystem, which sync does not > > guarantee you. > > Even if it's not idle, we are safe as long as the I/O activity doesn't > continue after the suspend image has been created. And that is my great concern - there really is nothing to prevent a fs from having I/O outstanding while the suspend image is being created. > > Rather than assuming the filesystem is idle, why not guarantee > > that it is idle by freezing it? 
> > Well, _I_ personally think that the freezing of filesystems is the right thing > to do, although it may lead to some complications down the road. *nod* I would also prefer to start with something we know is safe and work from there. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-12 22:43 ` Rafael J. Wysocki 2006-11-13 5:43 ` David Chinner @ 2006-11-16 23:23 ` David Chinner 2006-11-16 23:40 ` Pavel Machek 1 sibling, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-16 23:23 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Chinner, Pavel Machek, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > And how does freezing them at that point in time guarantee consistent > > filesystem state? > > If the work queues are frozen, there won't be any fs-related activity _after_ > we create the suspend image. fs-related activity before or after the suspend image is captured is not a problem - it's fs-related activity _during_ the suspend that is an issue here. If we have async I/O completing during the suspend image capture, we've got problems.... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-16 23:23 ` David Chinner @ 2006-11-16 23:40 ` Pavel Machek 2006-11-17 1:40 ` David Chinner 0 siblings, 1 reply; 69+ messages in thread From: Pavel Machek @ 2006-11-16 23:40 UTC (permalink / raw) To: David Chinner Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Fri 2006-11-17 10:23:49, David Chinner wrote: > On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > > And how does freezing them at that point in time guarantee consistent > > > filesystem state? > > > > If the work queues are frozen, there won't be any fs-related activity _after_ > > we create the suspend image. > > fs-related activity before or after the suspend image is captured is > not a problem - it's fs-related activity _during_ the suspend that > is an issue here. If we have async I/O completing during the suspend > image capture, we've got problems.... fs-related activity _after_ image is captured definitely is a problem -- it breaks swsusp invariants. During image capture, any fs-related activity is not possible, as we are running interrupts disabled, DMA disabled. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-16 23:40 ` Pavel Machek @ 2006-11-17 1:40 ` David Chinner 2006-11-17 15:13 ` Pavel Machek 0 siblings, 1 reply; 69+ messages in thread From: David Chinner @ 2006-11-17 1:40 UTC (permalink / raw) To: Pavel Machek Cc: David Chinner, Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Fri, Nov 17, 2006 at 12:40:53AM +0100, Pavel Machek wrote: > On Fri 2006-11-17 10:23:49, David Chinner wrote: > > On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > > > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > > > And how does freezing them at that point in time guarantee consistent > > > > filesystem state? > > > > > > If the work queues are frozen, there won't be any fs-related activity _after_ > > > we create the suspend image. > > > > fs-related activity before or after the suspend image is captured is > > not a problem - it's fs-related activity _during_ the suspend that > > is an issue here. If we have async I/O completing during the suspend > > image capture, we've got problems.... > > fs-related activity _after_ image is captured definitely is a problem > -- it breaks swsusp invariants. > > During image capture, any fs-related activity is not possible, as we > are running interrupts disabled, DMA disabled. Ok, so the I/O that finishes during the image capture won't be reported until after the capture completes. That means we lose the capability to even run the I/O completion on those buffers once we get to resume. They've already been issued, so will be locked and never issued again because the suspend image says they've been issued, but they'll never complete on resume because their completion status was updated after the capture was taken and will never be recreated after resume. 
IOWs, the filesystem will start to hang on the next reference to any of these locked buffers that the filesystem thinks is still under I/O.... Freezing the workqueues doesn't fix this..... Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-17 1:40 ` David Chinner @ 2006-11-17 15:13 ` Pavel Machek 0 siblings, 0 replies; 69+ messages in thread From: Pavel Machek @ 2006-11-17 15:13 UTC (permalink / raw) To: David Chinner Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham On Fri 2006-11-17 12:40:51, David Chinner wrote: > On Fri, Nov 17, 2006 at 12:40:53AM +0100, Pavel Machek wrote: > > On Fri 2006-11-17 10:23:49, David Chinner wrote: > > > On Sun, Nov 12, 2006 at 11:43:05PM +0100, Rafael J. Wysocki wrote: > > > > On Sunday, 12 November 2006 23:30, David Chinner wrote: > > > > > And how does freezing them at that point in time guarantee consistent > > > > > filesystem state? > > > > > > > > If the work queues are frozen, there won't be any fs-related activity _after_ > > > > we create the suspend image. > > > > > > fs-related activity before or after the suspend image is captured is > > > not a problem - it's fs-related activity _during_ the suspend that > > > is an issue here. If we have async I/O completing during the suspend > > > image capture, we've got problems.... > > > > fs-related activity _after_ image is captured definitely is a problem > > -- it breaks swsusp invariants. > > > > During image capture, any fs-related activity is not possible, as we > > are running interrupts disabled, DMA disabled. > > Ok, so the I/o that finishes during the image capture won't be reported > until after the capture completes. that means we lose the capability There's no I/O in flight during image capture. Interrupts are disabled, DMAs are stopped, and drivers were told to shut down (that includes finishing any outstanding work, and drivers do that currently; but perhaps docs should be more explicit about it). 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex 2006-11-09 21:17 ` Pavel Machek 2006-11-09 21:18 ` Rafael J. Wysocki @ 2006-11-10 0:54 ` David Chinner 2006-11-10 10:24 ` Alan Cox 2 siblings, 0 replies; 69+ messages in thread From: David Chinner @ 2006-11-10 0:54 UTC (permalink / raw) To: Pavel Machek Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner On Thu, Nov 09, 2006 at 10:17:22PM +0100, Pavel Machek wrote: > Why not simply &~ PF_NOFREEZE on that particular process? Filesystems > are free to use threads/work queues/whatever, but refrigerator should > mean "no writes to filesystem" for them... Freezing the filesytem is the way to tell the filesystem "no more writes to the filesytem". Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 21:17 ` Pavel Machek
  2006-11-09 21:18 ` Rafael J. Wysocki
  2006-11-10  0:54 ` David Chinner
@ 2006-11-10 10:24 ` Alan Cox
  2006-11-10 10:36   ` Pavel Machek
  2 siblings, 1 reply; 69+ messages in thread

From: Alan Cox @ 2006-11-10 10:24 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton,
    linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

On Thu, 2006-11-09 at 22:17 +0100, Pavel Machek wrote:
> Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> are free to use threads/work queues/whatever, but refrigerator should
> mean "no writes to filesystem" for them...

You can't go around altering the flags of another process - what locking
are you relying upon for this?
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-10 10:24 ` Alan Cox
@ 2006-11-10 10:36   ` Pavel Machek
  0 siblings, 0 replies; 69+ messages in thread

From: Pavel Machek @ 2006-11-10 10:36 UTC (permalink / raw)
To: Alan Cox
Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton,
    linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

Hi!

> > Why not simply &~ PF_NOFREEZE on that particular process? Filesystems
> > are free to use threads/work queues/whatever, but refrigerator should
> > mean "no writes to filesystem" for them...
>
> You can't go around altering the flags of another process - what locking
> are you relying upon for this?

Well, my idea was for the process to &~ PF_NOFREEZE on itself.

We are currently (kernel/power/process.c) using &p->sighand->siglock
for some accesses, and no locking for others. There were some attempts
to fix this, but they led to problems, and we are not hitting any
problems... Part of the reason is that we hot-unplug non-boot CPUs
before freezing processes, so we are running UP at that point.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-09 16:00 ` Pavel Machek
  2006-11-09 19:59 ` Rafael J. Wysocki
@ 2006-11-10  0:33 ` David Chinner
  2006-11-10 10:38   ` Pavel Machek
  1 sibling, 1 reply; 69+ messages in thread

From: David Chinner @ 2006-11-10 0:33 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton,
    linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham, David Chinner

On Thu, Nov 09, 2006 at 05:00:03PM +0100, Pavel Machek wrote:
> > > > Well, it looks like the interactions with dm add quite a bit of
> > > > complexity here.
> > >
> > > What about just fixing xfs (thou shall not write to disk when kernel
> > > threads are frozen), and getting rid of blockdev freezing?
> >
> > Well, first I must admit you were absolutely right being suspicious with
> > respect to this stuff.
>
> (OTOH your patch found real bugs in suspend.c, so...)
>
> > OTOH I have no idea _how_ we can tell xfs that the processes have been
> > frozen. Should we introduce a global flag for that or something?
>
> I guess XFS should just do all the writes from process context, and
> refuse any writing when its threads are frozen... I actually still
> believe it is doing the right thing, because you can't really write to
> disk from timer.

As per the recent thread about this, XFS threads suspend correctly
and XFS doesn't issue I/O from timers.

The problem appears to be per-cpu workqueues that don't get suspended
because suspend does not shut down workqueue threads. You haven't
quiesced the filesystem, you haven't shut down work queues, but you're
expecting the filesystem to magically stop using them when suspend
starts up?

You can't lay the blame on any subsystem for using an interface that
suspend doesn't quiesce when you *haven't told the subsystem it should
suspend itself*. This is true regardless of whether the subsystem is a
device driver, part of the network stack or a filesystem.

E.g. a struct device has suspend and resume callouts so that each
device can be safely suspended and resumed. Suspend uses these and you
sure as hell don't expect a device to work properly after a resume if
you don't use these interfaces. So why do you think filesystems should
be treated differently?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-10  0:33 ` David Chinner
@ 2006-11-10 10:38   ` Pavel Machek
  0 siblings, 0 replies; 69+ messages in thread

From: Pavel Machek @ 2006-11-10 10:38 UTC (permalink / raw)
To: David Chinner
Cc: Rafael J. Wysocki, Alasdair G Kergon, Eric Sandeen, Andrew Morton,
    linux-kernel, dm-devel, Srinivasa DS, Nigel Cunningham

Hi!

> > > OTOH I have no idea _how_ we can tell xfs that the processes have been
> > > frozen. Should we introduce a global flag for that or something?
> >
> > I guess XFS should just do all the writes from process context, and
> > refuse any writing when its threads are frozen... I actually still
> > believe it is doing the right thing, because you can't really write to
> > disk from timer.
>
> As per the recent thread about this, XFS threads suspend correctly
> and XFS doesn't issue I/O from timers.
>
> The problem appears to be per-cpu workqueues that don't get
> suspended because suspend does not shut down workqueue threads.

Workqueues should be easy to handle: current->flags &= ~PF_NOFREEZE,
and the problem should be solved.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08 12:10 ` Rafael J. Wysocki
  2006-11-08 18:09 ` Pavel Machek
@ 2006-11-08 20:48 ` Nigel Cunningham
  2006-11-08 21:08   ` Rafael J. Wysocki
  1 sibling, 1 reply; 69+ messages in thread

From: Nigel Cunningham @ 2006-11-08 20:48 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel,
    dm-devel, Srinivasa DS

Hi.

On Wed, 2006-11-08 at 13:10 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 8 November 2006 03:30, Alasdair G Kergon wrote:
> > On Tue, Nov 07, 2006 at 11:49:51PM +0000, Alasdair G Kergon wrote:
> > > I hadn't noticed that -mm patch. I'll take a look.
> >
> > swsusp-freeze-filesystems-during-suspend-rev-2.patch
> >
> > I think you need to give more thought to device-mapper
> > interactions here. If an underlying device is suspended
> > by device-mapper without freezing the filesystem (the
> > normal state) and you issue a freeze_bdev on a device
> > above it, the freeze_bdev may never return if it attempts
> > any synchronous I/O (as it should).
>
> Well, it looks like the interactions with dm add quite a bit of
> complexity here.
>
> > Try:
> >   while process generating I/O to filesystem on LVM
> >   issue dmsetup suspend --nolockfs (which the lvm2 tools often do)
> >   try your freeze_filesystems()
>
> Okay, I will.
>
> > Maybe: don't allow freeze_filesystems() to run when the system is in that
> > state;
>
> I'd like to avoid that (we may be running out of battery power at this
> point).
>
> > or, use device-mapper suspend instead of freeze_bdev directly where
> > dm is involved;
>
> How do I check if dm is involved?
>
> > or skip dm devices that are already frozen - all with
> > appropriate dependency tracking to process devices in the right order.
>
> I'd prefer this one, but probably the previous one is simpler to start with.

Shouldn't we just go for the right thing to begin with? Otherwise we'll
just make more problems for ourselves later.

If we do this last one, I guess we want to do something like I was doing
before (creating a list of the devices we've frozen)?

Regards,

Nigel
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-08 20:48 ` Nigel Cunningham
@ 2006-11-08 21:08   ` Rafael J. Wysocki
  0 siblings, 0 replies; 69+ messages in thread

From: Rafael J. Wysocki @ 2006-11-08 21:08 UTC (permalink / raw)
To: nigel
Cc: Alasdair G Kergon, Eric Sandeen, Andrew Morton, linux-kernel,
    dm-devel, Srinivasa DS

On Wednesday, 8 November 2006 21:48, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2006-11-08 at 13:10 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 8 November 2006 03:30, Alasdair G Kergon wrote:
> > > On Tue, Nov 07, 2006 at 11:49:51PM +0000, Alasdair G Kergon wrote:
> > > > I hadn't noticed that -mm patch. I'll take a look.
> > >
> > > swsusp-freeze-filesystems-during-suspend-rev-2.patch
> > >
> > > I think you need to give more thought to device-mapper
> > > interactions here. If an underlying device is suspended
> > > by device-mapper without freezing the filesystem (the
> > > normal state) and you issue a freeze_bdev on a device
> > > above it, the freeze_bdev may never return if it attempts
> > > any synchronous I/O (as it should).
> >
> > Well, it looks like the interactions with dm add quite a bit of
> > complexity here.
> >
> > > Try:
> > >   while process generating I/O to filesystem on LVM
> > >   issue dmsetup suspend --nolockfs (which the lvm2 tools often do)
> > >   try your freeze_filesystems()
> >
> > Okay, I will.
> >
> > > Maybe: don't allow freeze_filesystems() to run when the system is in that
> > > state;
> >
> > I'd like to avoid that (we may be running out of battery power at this
> > point).
> >
> > > or, use device-mapper suspend instead of freeze_bdev directly where
> > > dm is involved;
> >
> > How do I check if dm is involved?
> >
> > > or skip dm devices that are already frozen - all with
> > > appropriate dependency tracking to process devices in the right order.
> >
> > I'd prefer this one, but probably the previous one is simpler to start with.
>
> Shouldn't we just go for the right thing to begin with? Otherwise we'll
> just make more problems for ourselves later.
>
> If we do this last one, I guess we want to do something like I was doing
> before (creating a list of the devices we've frozen)?

Well, the frozen superblocks have MS_FROZEN set. What is the other list
needed for?

Rafael
--
You never change things by fighting the existing reality.
		R. Buckminster Fuller
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 20:28 ` Andrew Morton
  2006-11-07 22:45 ` Eric Sandeen
@ 2006-11-07 23:23 ` Alasdair G Kergon
  2006-11-07 23:39 ` Ingo Molnar
  2 siblings, 0 replies; 69+ messages in thread

From: Alasdair G Kergon @ 2006-11-07 23:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Alasdair G Kergon, linux-kernel, dm-devel, Ingo Molnar,
    Eric Sandeen, Srinivasa DS

On Tue, Nov 07, 2006 at 12:28:37PM -0800, Andrew Morton wrote:
> So... what does this have to do with switching from mutex to semaphore?

I should have summarised the discussion, sorry.

This has always been a semaphore, but it got changed to a mutex as part
of a recent global switch - and that indeed generated bug reports from
people who turn debugging on and report the error messages. So the
quick fix in this patch is to put things back how they were.

Now mutex.h states:

 *  - only the owner can unlock the mutex
 *  - task may not exit with mutex held
 *
 *  These semantics are fully enforced when DEBUG_MUTEXES is
 *  enabled.

Which begs the question whether the stated semantics are stricter than
is actually necessary.

> If so, it's a bit sad to switch to semaphore just because of some errant
> debugging code. Perhaps it would be better to create a new
> mutex_unlock_stfu() which suppresses the warning? Ingo?

> > +	if (down_trylock(&bdev->bd_mount_sem))
> > +		return -EBUSY;
> > +
>
> This is a functional change which isn't described in the changelog. What's
> happening here?

Srinivasa added it as a precaution. Device-mapper should never fall
foul of it, but confirming that is not trivial unless you know the code
well, so it's a useful check to prevent a bug creeping back in.

Alasdair
--
agk@redhat.com
* Re: [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex
  2006-11-07 20:28 ` Andrew Morton
  2006-11-07 22:45 ` Eric Sandeen
  2006-11-07 23:23 ` Alasdair G Kergon
@ 2006-11-07 23:39 ` Ingo Molnar
  2 siblings, 0 replies; 69+ messages in thread

From: Ingo Molnar @ 2006-11-07 23:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Alasdair G Kergon, linux-kernel, dm-devel, Eric Sandeen, Srinivasa DS

* Andrew Morton <akpm@osdl.org> wrote:

> If so, it's a bit sad to switch to semaphore just because of some
> errant debugging code. Perhaps it would be better to create a new
> mutex_unlock_stfu() which suppresses the warning?

The code was not using the semaphore as a pure mutex: for example,
unlocking by a non-owner is not legal for a mutex. AFAICS the code
returns to userspace with a held in-kernel mutex. That makes it hard
for the kernel to determine, for example, whether the lock has been
'forgotten' or is being kept locked intentionally.

Alasdair, what is the motivation for doing it like that?

	Ingo
end of thread, other threads: [~2007-01-12 20:47 UTC | newest]

Thread overview: 69+ messages
2006-11-07 18:34 [PATCH 2.6.19 5/5] fs: freeze_bdev with semaphore not mutex Alasdair G Kergon
2006-11-07 20:18 ` [dm-devel] " Mike Snitzer
2006-11-07 20:22 ` Eric Sandeen
2006-11-07 23:34 ` Alasdair G Kergon
2006-11-07 20:28 ` Andrew Morton
2006-11-07 22:45 ` Eric Sandeen
2006-11-07 23:00 ` Andrew Morton
2006-11-08  9:54 ` Arjan van de Ven
2007-01-12  6:23 ` Srinivasa Ds
2007-01-12 10:16 ` Srinivasa Ds
2006-11-07 23:05 ` Rafael J. Wysocki
2006-11-07 23:18 ` Eric Sandeen
2006-11-07 23:42 ` Rafael J. Wysocki
2006-11-08  0:01 ` Alasdair G Kergon
2006-11-08  8:27 ` David Chinner
2006-11-08 14:25 ` Alasdair G Kergon
2006-11-08 14:43 ` Rafael J. Wysocki
2006-11-08 15:25 ` Alasdair G Kergon
2006-11-08 23:06 ` Rafael J. Wysocki
2006-11-07 23:49 ` Alasdair G Kergon
2006-11-08  0:00 ` Rafael J. Wysocki
2006-11-08  3:33 ` David Chinner
2006-11-08  2:30 ` Alasdair G Kergon
2006-11-08 12:10 ` Rafael J. Wysocki
2006-11-08 18:09 ` Pavel Machek
2006-11-09 15:52 ` Rafael J. Wysocki
2006-11-09 16:00 ` Pavel Machek
2006-11-09 19:59 ` Rafael J. Wysocki
2006-11-09 21:17 ` Pavel Machek
2006-11-09 21:18 ` Rafael J. Wysocki
2006-11-09 21:41 ` Pavel Machek
2006-11-09 22:21 ` Rafael J. Wysocki
2006-11-09 23:11 ` Pavel Machek
2006-11-09 23:24 ` Alasdair G Kergon
2006-11-09 23:32 ` Pavel Machek
2006-11-10 12:03 ` Rafael J. Wysocki
2006-11-12 18:43 ` Pavel Machek
2006-11-12 21:53 ` Rafael J. Wysocki
2006-11-12 23:30 ` David Chinner
2006-11-13 16:11 ` Rafael J. Wysocki
2006-11-15 18:50 ` Pavel Machek
2006-11-15 19:56 ` Rafael J. Wysocki
2006-11-15 20:00 ` Rafael J. Wysocki
2006-11-15 20:23 ` Pavel Machek
2006-11-15 21:58 ` Rafael J. Wysocki
2006-11-15 22:49 ` Pavel Machek
2006-11-16 23:20 ` David Chinner
2006-11-16 23:38 ` Pavel Machek
2006-11-13  7:35 ` Stefan Seyfried
2006-11-10  0:57 ` David Chinner
2006-11-10 10:39 ` Pavel Machek
2006-11-12 22:30 ` David Chinner
2006-11-12 22:43 ` Rafael J. Wysocki
2006-11-13  5:43 ` David Chinner
2006-11-13 16:22 ` Rafael J. Wysocki
2006-11-14  0:10 ` David Chinner
2006-11-16 23:23 ` David Chinner
2006-11-16 23:40 ` Pavel Machek
2006-11-17  1:40 ` David Chinner
2006-11-17 15:13 ` Pavel Machek
2006-11-10  0:54 ` David Chinner
2006-11-10 10:24 ` Alan Cox
2006-11-10 10:36 ` Pavel Machek
2006-11-10  0:33 ` David Chinner
2006-11-10 10:38 ` Pavel Machek
2006-11-08 20:48 ` Nigel Cunningham
2006-11-08 21:08 ` Rafael J. Wysocki
2006-11-07 23:23 ` Alasdair G Kergon
2006-11-07 23:39 ` Ingo Molnar