linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
       [not found] ` <20110713000332.GM23038@dastard>
@ 2011-07-26 20:28   ` Rafael J. Wysocki
  2011-07-27  0:45     ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-07-26 20:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph, Linux PM mailing list, xfs

On Wednesday, July 13, 2011, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> > Hi!
> > 
> > I'd like you to have a look into this issue:
> > 
> > pm-hibernate locks up when using xfs while "Preallocating image memory".
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=33622
> > 
> > I got at least this backtrace (2.6.39.3)
> > 
> > tia
> > 
> > chris
> > 
> > 
> > 
> > SysRq : Show Blocked State
> > 
> > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > Call Trace:
> >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> >  [<ffffffff81076c58>] state_store+0x57/0xce
> >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> 
> It's waiting for IO completion, and holding an AG scan lock.
> 
> And IO completion requires a workqueue to run. Just FYI, this
> process of inode reclaim can dirty the filesystem, long after
> hibernate have assumed that it is clean due to the sys_sync() call
> you do after freezing the processes. I pointed out this flaw in
> using sync to write dirty data prior to hibernate a couple of years
> ago.

However, attempts to remove the sys_sync() from the hibernate code
were objected to by some developers, since they believe it will increase
the probability of data loss in case of a failing hibernation in general. 

> Anyway, it's a good thing that XFS doesn't use freezable work
> queues, otherwise it would hang on every hibernate. Perhaps I should
> do that to force hibernate to do things properly in filesystems
> land.

Well, I'd say it's a very well known fact that filesystems are not
handled in any special way during hibernation, which is not a good
thing.  Nevertheless, I've never seen anyone from the filesystems land
pay any kind of attention to this issue.

> However, it is entirely possible that something else that XFS relies
> on for IO completion has been put to sleep by this point.
> 
> /me finds the smoking cannon:
> 
> [  648.794455] xfsbufd/sda3    D 0000000000000000     0   192      2 0x00000000
> [  648.794455]  ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
> [  648.794455]  ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
> [  648.794455]  ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
> [  648.794455] Call Trace:
> [  648.794455]  [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
> [  648.794455]  [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
> [  648.794455]  [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
> [  648.794455]  [<ffffffff8105f25a>] kthread+0x7d/0x85
> [  648.794455]  [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
> [  648.794455]  [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
> [  648.794455]  [<ffffffff8134d6e0>] ? gs_change+0x13/0x13
> 
> The xfsbufd, responsible for pushing out dirty metadata, has been
> been frozen. sys_sync() does not push out dirty metadata because it
> is already on stable storage in the journal. If the flush lock is
> already held on the inode, then inode reclaim will wait for the
> xfsbufd to flush the backing buffer because reclaim can't do it
> directly. And hibernate has already frozen the xfsbufd.
> 
> IOWs, what hibernate does is:
> 
> 	freeze_processes()
> 	sys_sync()
> 	allocate a large amount of memory
> 
> Freezing the processes causes parts of filesystems to be put in the
> fridge, which means there is no guarantee that sys_sync() actually
> does what it is supposed to. As it is, sys_sync() really only
> guarantees file data is clean in memory - metadata does not need to
> be clean as long s it has been journalled and the journal is safe on
> disk.
>
> Further, allocating memory can cause memory reclaim to enter the
> filesystem and try to free memory held by the filesystem. In XFS (at
> least) this can cause the filesystem to issue tranactions and
> metadata IO to clean the dirty metadata to enable it to be
> reclaimed. So hibernate is effectively guaranteed to dirty the
> filesystem after it has frozen all the worker threads the filesystem
> might rely on.
> 
> Also, by this point kswapd has already been frozen, so hibernate is
> relying totally on direct memory reclaim to free up the memory it
> requires. I'm not sure that's a good idea.
> 
> IOWs, hibernate is still broken by design - and broken in exactly
> the way that was pointed out a couple of years ago by myself and
> others in the filesystem world: sys_sync() does not quiesce or
> guarantee a clean filesystem in memory after it completes.
> 
> There is a solution to this, and it already exists - it's called
> freezing the filesystem. Effectively hibernate needs to allocate
> memory before it freezes kernel/filesystem worker threads:
> 
> 	freeze_userspace_processes()
> 
> 	// just to clean the page cache quickly
> 	sys_sync()
> 
> 	// optionally to free page/inode/dentry caches:
> 		iterate_supers(drop_pagecache_sb, NULL);
> 		drop_slab()
> 
> 	allocate a large amount of memory
> 
> 	// Now quiesce the filesystems and clean remaining metadata
> 	iterate_supers(freeze_super, NULL);
> 
> 	freeze_remaining_processes()
> 
> This guarantees that filesystems are still working when memory
> reclaim comes along to free memory for the hibernate image, and that
> once it is allocated that filesystems will not be changed until
> thawed on the hibernate wakeup.
> 
> So, like I said a couple of years ago: fix hibernate to quiesce
> filesystems properly, and the hibernate will be much more reliable
> and robust and less likely to break randomly in the future.

Why don't you simply submit a patch to do that?

Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-26 20:28   ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Rafael J. Wysocki
@ 2011-07-27  0:45     ` Dave Chinner
  2011-07-27  9:35       ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2011-07-27  0:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, xfs

On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 13, 2011, Dave Chinner wrote:
> > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
.....
> > > SysRq : Show Blocked State
> > > 
> > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > Call Trace:
> > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > 
> > It's waiting for IO completion, and holding an AG scan lock.
> > 
> > And IO completion requires a workqueue to run. Just FYI, this
> > process of inode reclaim can dirty the filesystem, long after
> > hibernate have assumed that it is clean due to the sys_sync() call
> > you do after freezing the processes. I pointed out this flaw in
> > using sync to write dirty data prior to hibernate a couple of years
> > ago.
> 
> However, attempts to remove the sys_sync() from the hibernate code
> were objected to by some developers, since they believe it will increase
> the probability of data loss in case of a failing hibernation in general. 

I'm not suggesting it gets removed, I'm suggesting it gets replaced
because it doesn't give the guarantees that you want or need.

> > Anyway, it's a good thing that XFS doesn't use freezable work
> > queues, otherwise it would hang on every hibernate. Perhaps I should
> > do that to force hibernate to do things properly in filesystems
> > land.
> 
> Well, I'd say it's a very well known fact that filesystems are not
> handled in any special way during hibernation, which is not a good
> thing.  Nevertheless, I've never seen anyone from the filesystems land
> pay any kind of attention to this issue.

I beg to differ. We went through this exact clas of bugs with swsusp
back in 2006:

https://lkml.org/lkml/2006/11/12/144

And this patch:

https://lkml.org/lkml/2006/11/1/155

([PATCH -mm] swsusp: Freeze filesystems during suspend)

"This is needed by swsusp, because some filesystems (eg. XFS) use
work queues and worker_threads run with PF_NOFREEZE set, so they can
cause some writes to be performed after the suspend image has been
created which may corrupt the filesystem.  The additional benefit of
it is that if the resume fails, the filesystems will be in a
consistent state and there won't be any journal replays needed."

--

And the patch essentially does:

-			sys_sync();
+			freeze_filesystems();


But, Pavel didn't like freezing filesystems to quiesce them
correctly, so the sys_sync() and all it's problems have remained
until this day, where we still have users tripping over the same
"filesystem not idle" problems.

[....]

> > IOWs, what hibernate does is:
> > 
> > 	freeze_processes()
> > 	sys_sync()
> > 	allocate a large amount of memory
> > 
> > Freezing the processes causes parts of filesystems to be put in the
> > fridge, which means there is no guarantee that sys_sync() actually
> > does what it is supposed to. As it is, sys_sync() really only
> > guarantees file data is clean in memory - metadata does not need to
> > be clean as long s it has been journalled and the journal is safe on
> > disk.
> >
> > Further, allocating memory can cause memory reclaim to enter the
> > filesystem and try to free memory held by the filesystem. In XFS (at
> > least) this can cause the filesystem to issue tranactions and
> > metadata IO to clean the dirty metadata to enable it to be
> > reclaimed. So hibernate is effectively guaranteed to dirty the
> > filesystem after it has frozen all the worker threads the filesystem
> > might rely on.
> > 
> > Also, by this point kswapd has already been frozen, so hibernate is
> > relying totally on direct memory reclaim to free up the memory it
> > requires. I'm not sure that's a good idea.
> > 
> > IOWs, hibernate is still broken by design - and broken in exactly
> > the way that was pointed out a couple of years ago by myself and
> > others in the filesystem world: sys_sync() does not quiesce or
> > guarantee a clean filesystem in memory after it completes.
> > 
> > There is a solution to this, and it already exists - it's called
> > freezing the filesystem. Effectively hibernate needs to allocate
> > memory before it freezes kernel/filesystem worker threads:
> > 
> > 	freeze_userspace_processes()
> > 
> > 	// just to clean the page cache quickly
> > 	sys_sync()
> > 
> > 	// optionally to free page/inode/dentry caches:
> > 		iterate_supers(drop_pagecache_sb, NULL);
> > 		drop_slab()
> > 
> > 	allocate a large amount of memory
> > 
> > 	// Now quiesce the filesystems and clean remaining metadata
> > 	iterate_supers(freeze_super, NULL);
> > 
> > 	freeze_remaining_processes()
> > 
> > This guarantees that filesystems are still working when memory
> > reclaim comes along to free memory for the hibernate image, and that
> > once it is allocated that filesystems will not be changed until
> > thawed on the hibernate wakeup.
> > 
> > So, like I said a couple of years ago: fix hibernate to quiesce
> > filesystems properly, and the hibernate will be much more reliable
> > and robust and less likely to break randomly in the future.
> 
> Why don't you simply submit a patch to do that?

a) I don't know how to test suspend/hibernate
b) I don't have any hardware I can test it on.
c) I don't scale to solving every problem Linux has
d) you guys need to decide how you're going to fix this because the
problem has already been solved once before and it didn't get merged
because nobody in the swsusp/hibernate world could agree on anything
at the time.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  0:45     ` Dave Chinner
@ 2011-07-27  9:35       ` Rafael J. Wysocki
  2011-07-27 10:33         ` Christoph Hellwig
  2011-08-10 21:43         ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
  0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-07-27  9:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph, Linux PM mailing list, xfs

On Wednesday, July 27, 2011, Dave Chinner wrote:
> On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, July 13, 2011, Dave Chinner wrote:
> > > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> .....
> > > > SysRq : Show Blocked State
> > > > 
> > > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > > Call Trace:
> > > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > > 
> > > It's waiting for IO completion, and holding an AG scan lock.
> > > 
> > > And IO completion requires a workqueue to run. Just FYI, this
> > > process of inode reclaim can dirty the filesystem, long after
> > > hibernate have assumed that it is clean due to the sys_sync() call
> > > you do after freezing the processes. I pointed out this flaw in
> > > using sync to write dirty data prior to hibernate a couple of years
> > > ago.
> > 
> > However, attempts to remove the sys_sync() from the hibernate code
> > were objected to by some developers, since they believe it will increase
> > the probability of data loss in case of a failing hibernation in general. 
> 
> I'm not suggesting it gets removed, I'm suggesting it gets replaced
> because it doesn't give the guarantees that you want or need.
> 
> > > Anyway, it's a good thing that XFS doesn't use freezable work
> > > queues, otherwise it would hang on every hibernate. Perhaps I should
> > > do that to force hibernate to do things properly in filesystems
> > > land.
> > 
> > Well, I'd say it's a very well known fact that filesystems are not
> > handled in any special way during hibernation, which is not a good
> > thing.  Nevertheless, I've never seen anyone from the filesystems land
> > pay any kind of attention to this issue.
> 
> I beg to differ. We went through this exact clas of bugs with swsusp
> back in 2006:
> 
> https://lkml.org/lkml/2006/11/12/144

You're right, sorry.  We discussed this 5 years ago, but the context
was a bit different (the hibernation code was using a different
mechanism for freeing memory).

> And this patch:
> 
> https://lkml.org/lkml/2006/11/1/155
> 
> ([PATCH -mm] swsusp: Freeze filesystems during suspend)
> 
> "This is needed by swsusp, because some filesystems (eg. XFS) use
> work queues and worker_threads run with PF_NOFREEZE set, so they can
> cause some writes to be performed after the suspend image has been
> created which may corrupt the filesystem.  The additional benefit of
> it is that if the resume fails, the filesystems will be in a
> consistent state and there won't be any journal replays needed."
> 
> --
> 
> And the patch essentially does:
> 
> -			sys_sync();
> +			freeze_filesystems();

Well, if you still think the patch does the right thing, I can rebase it
on top of the current freezer code and resubmit.

> But, Pavel didn't like freezing filesystems to quiesce them
> correctly, so the sys_sync() and all it's problems have remained
> until this day, where we still have users tripping over the same
> "filesystem not idle" problems.

The problem seems to be quite specific to XFS, though.

The Pavel's objection, if I remember it correctly, was that some
(or the majority of?) filesystems didn't implement the freezing operation,
so they would be more vulnerable to data loss in case of a failing hibernation
after this change.  However, that's better than actively causing pain to XFS
users.

> 
> [....]
> 
> > > IOWs, what hibernate does is:
> > > 
> > > 	freeze_processes()
> > > 	sys_sync()
> > > 	allocate a large amount of memory
> > > 
> > > Freezing the processes causes parts of filesystems to be put in the
> > > fridge, which means there is no guarantee that sys_sync() actually
> > > does what it is supposed to. As it is, sys_sync() really only
> > > guarantees file data is clean in memory - metadata does not need to
> > > be clean as long s it has been journalled and the journal is safe on
> > > disk.
> > >
> > > Further, allocating memory can cause memory reclaim to enter the
> > > filesystem and try to free memory held by the filesystem. In XFS (at
> > > least) this can cause the filesystem to issue tranactions and
> > > metadata IO to clean the dirty metadata to enable it to be
> > > reclaimed. So hibernate is effectively guaranteed to dirty the
> > > filesystem after it has frozen all the worker threads the filesystem
> > > might rely on.
> > > 
> > > Also, by this point kswapd has already been frozen, so hibernate is
> > > relying totally on direct memory reclaim to free up the memory it
> > > requires. I'm not sure that's a good idea.
> > > 
> > > IOWs, hibernate is still broken by design - and broken in exactly
> > > the way that was pointed out a couple of years ago by myself and
> > > others in the filesystem world: sys_sync() does not quiesce or
> > > guarantee a clean filesystem in memory after it completes.
> > > 
> > > There is a solution to this, and it already exists - it's called
> > > freezing the filesystem. Effectively hibernate needs to allocate
> > > memory before it freezes kernel/filesystem worker threads:
> > > 
> > > 	freeze_userspace_processes()
> > > 
> > > 	// just to clean the page cache quickly
> > > 	sys_sync()
> > > 
> > > 	// optionally to free page/inode/dentry caches:
> > > 		iterate_supers(drop_pagecache_sb, NULL);
> > > 		drop_slab()
> > > 
> > > 	allocate a large amount of memory
> > > 
> > > 	// Now quiesce the filesystems and clean remaining metadata
> > > 	iterate_supers(freeze_super, NULL);
> > > 
> > > 	freeze_remaining_processes()
> > > 
> > > This guarantees that filesystems are still working when memory
> > > reclaim comes along to free memory for the hibernate image, and that
> > > once it is allocated that filesystems will not be changed until
> > > thawed on the hibernate wakeup.
> > > 
> > > So, like I said a couple of years ago: fix hibernate to quiesce
> > > filesystems properly, and the hibernate will be much more reliable
> > > and robust and less likely to break randomly in the future.
> > 
> > Why don't you simply submit a patch to do that?
> 
> a) I don't know how to test suspend/hibernate
> b) I don't have any hardware I can test it on.
> c) I don't scale to solving every problem Linux has
> d) you guys need to decide how you're going to fix this because the
> problem has already been solved once before and it didn't get merged
> because nobody in the swsusp/hibernate world could agree on anything
> at the time.

OK, I'm not a filesystem expert in turn.

As I said, I can revive the Nigel's patch if you think it's better than
the code we have, but I'm afraid that's all I can do without any help from
the filesystems people.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  9:35       ` Rafael J. Wysocki
@ 2011-07-27 10:33         ` Christoph Hellwig
  2011-07-27 12:22           ` Nigel Cunningham
  2011-08-10 21:43         ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2011-07-27 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Dave Chinner, xfs, Linux PM mailing list

On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> The Pavel's objection, if I remember it correctly, was that some
> (or the majority of?) filesystems didn't implement the freezing operation,
> so they would be more vulnerable to data loss in case of a failing hibernation
> after this change.  However, that's better than actively causing pain to XFS
> users.

The objection never made sense and only means he never read the code.
freeze_super (or freeze_bdev back then) always does a sync_filesystem
before even checking if we have a freeze method, and sync_filesystem is
what we iterate over for each superblock in sync().

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27 10:33         ` Christoph Hellwig
@ 2011-07-27 12:22           ` Nigel Cunningham
       [not found]             ` <201108032315.06012.rjw@sisk.pl>
  2011-08-03 21:15             ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
  0 siblings, 2 replies; 16+ messages in thread
From: Nigel Cunningham @ 2011-07-27 12:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph, Dave Chinner, xfs, Linux PM mailing list

Hi.

On 27/07/11 20:33, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
>> The Pavel's objection, if I remember it correctly, was that some
>> (or the majority of?) filesystems didn't implement the freezing operation,
>> so they would be more vulnerable to data loss in case of a failing hibernation
>> after this change.  However, that's better than actively causing pain to XFS
>> users.
> 
> The objection never made sense and only means he never read the code.
> freeze_super (or freeze_bdev back then) always does a sync_filesystem
> before even checking if we have a freeze method, and sync_filesystem is
> what we iterate over for each superblock in sync().

I've had freezing supers in TOI for a couple of years now and it has
only ever helped. To be honest, if you have a ton of dirty pages, it
does result in a big delay, but that's the worst of it.

Regards,

Nigel
-- 
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
       [not found]             ` <201108032315.06012.rjw@sisk.pl>
@ 2011-08-03 17:29               ` Pavel Machek
       [not found]               ` <20110803172922.GA2126@ucw.cz>
  1 sibling, 0 replies; 16+ messages in thread
From: Pavel Machek @ 2011-08-03 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

Hi!

> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.

> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	lockdep_off();

Ouch. So... why do we need to silence this?

> +	/*
> +	 * Freeze in reverse order so filesystems dependant upon others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY) ||
> +		    (sb->s_flags & MS_FROZEN))
> +			continue;

Should we stop NFS from modifying remote server, too?

Plus... ext3 writes to read-only filesystems on mount; not sure if it
does it later. But RDONLY means 'user cant write to it' not 'bdev will
not be modified'. Should we freeze all?

How can 'already frozen' happen?

> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

...because we'll unfreeze it even if we did not freeze it...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-07-27 12:22           ` Nigel Cunningham
       [not found]             ` <201108032315.06012.rjw@sisk.pl>
@ 2011-08-03 21:15             ` Rafael J. Wysocki
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-03 21:15 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

On Wednesday, July 27, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 27/07/11 20:33, Christoph Hellwig wrote:
> > On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> >> The Pavel's objection, if I remember it correctly, was that some
> >> (or the majority of?) filesystems didn't implement the freezing operation,
> >> so they would be more vulnerable to data loss in case of a failing hibernation
> >> after this change.  However, that's better than actively causing pain to XFS
> >> users.
> > 
> > The objection never made sense and only means he never read the code.
> > freeze_super (or freeze_bdev back then) always does a sync_filesystem
> > before even checking if we have a freeze method, and sync_filesystem is
> > what we iterate over for each superblock in sync().
> 
> I've had freezing supers in TOI for a couple of years now and it has
> only ever helped. To be honest, if you have a ton of dirty pages, it
> does result in a big delay, but that's the worst of it.

OK, so below is the revived patch.

To be precise, we don't call sys_sync() from the freezer an more
(evidently, I'd removed in myself, but later forgot about that), so
it only adds freeze_filesystems() and thaw_filesystems().

Comments welcome.

Thanks,
Rafael

---

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Not-really-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/block_dev.c         |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 ++++++
 kernel/power/process.c |    7 ++++++-
 3 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,49 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY) ||
+		    (sb->s_flags & MS_FROZEN))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
       [not found]               ` <20110803172922.GA2126@ucw.cz>
@ 2011-08-04  9:27                 ` Rafael J. Wysocki
       [not found]                 ` <201108041127.30944.rjw@sisk.pl>
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04  9:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

On Wednesday, August 03, 2011, Pavel Machek wrote:
> Hi!
> 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> 
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	lockdep_off();
> 
> Ouch. So... why do we need to silence this?

So that it doesn't complain? :-)

I'll need some time to get the exact details here.

> > +	/*
> > +	 * Freeze in reverse order so filesystems dependant upon others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY) ||
> > +		    (sb->s_flags & MS_FROZEN))
> > +			continue;
> 
> Should we stop NFS from modifying remote server, too?

What do you mean exactly?

> Plus... ext3 writes to read-only filesystems on mount; not sure if it
> does it later. But RDONLY means 'user cant write to it' not 'bdev will
> not be modified'. Should we freeze all?
> 
> How can 'already frozen' happen?
> 
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> ...because we'll unfreeze it even if we did not freeze it...

So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
       [not found]                 ` <201108041127.30944.rjw@sisk.pl>
@ 2011-08-04 22:25                   ` Rafael J. Wysocki
       [not found]                   ` <201108050025.09792.rjw@sisk.pl>
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04 22:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, Theodore Ts'o, Dave Chinner, LKML, xfs,
	Christoph Hellwig, Linux PM mailing list, linux-ext4

On Thursday, August 04, 2011, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2011, Pavel Machek wrote:
> > Hi!
> > 
> > > Freeze all filesystems during the freezing of tasks by calling
> > > freeze_bdev() for each of them and thaw them during the thawing
> > > of tasks with the help of thaw_bdev().
> > > 
> > > This is needed by hibernation, because some filesystems (e.g. XFS)
> > > deadlock with the preallocation of memory used by it if the memory
> > > pressure caused by it is too heavy.
> > > 
> > > The additional benefit of this change is that, if something goes
> > > wrong after filesystems have been frozen, they will stay in a
> > > consistent state and journal replays won't be necessary (e.g. after
> > > a failing suspend or resume).  In particular, this should help to
> > > solve a long-standing issue that in some cases during resume from
> > > hibernation the boot loader causes the journal to be replied for the
> > > filesystem containing the kernel image and initrd causing it to
> > > become inconsistent with the information stored in the hibernation
> > > image.
> > 
> > > +/**
> > > + * freeze_filesystems - Force all filesystems into a consistent state.
> > > + */
> > > +void freeze_filesystems(void)
> > > +{
> > > +	struct super_block *sb;
> > > +
> > > +	lockdep_off();
> > 
> > Ouch. So... why do we need to silence this?
> 
> So that it doesn't complain? :-)
> 
> I'll need some time to get the exact details here.

So, this is because ext3_freeze() that doesn't call
journal_unlock_updates() on success, which quite frankly looks like
a bug in ext3 to me.  At least that's different from what ext4 does
in exactly the same situation (which looks correct).

If ext3_freeze() called journal_unlock_updates() on success too and
the call to journal_unlock_updates() is removed from ext3_unfreeze(),
we wouldn't need that lockdep_off()/lockdep_on() around the loop.

I need someone with ext3/ext4 knowledge to comment here, though.

Moreover, I'm not sure if other filesystems don't do such things.

Anyway, this is just a false-positive, even with the ext3 code as is.

> > > +	/*
> > > +	 * Freeze in reverse order so filesystems dependant upon others are
> > > +	 * frozen in the right order (eg. loopback on ext3).
> > > +	 */
> > > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > > +		if (!sb->s_root || !sb->s_bdev ||
> > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > +		    (sb->s_flags & MS_RDONLY) ||
> > > +		    (sb->s_flags & MS_FROZEN))
> > > +			continue;
> > 
> > Should we stop NFS from modifying remote server, too?
> 
> What do you mean exactly?
> 
> > Plus... ext3 writes to read-only filesystems on mount; not sure if it
> > does it later. But RDONLY means 'user cant write to it' not 'bdev will
> > not be modified'. Should we freeze all?
> > 
> > How can 'already frozen' happen?
> > 
> > > +	list_for_each_entry(sb, &super_blocks, s_list)
> > > +		if (sb->s_flags & MS_FROZEN) {
> > > +			sb->s_flags &= ~MS_FROZEN;
> > > +			thaw_bdev(sb->s_bdev, sb);
> > > +		}
> > 
> > ...because we'll unfreeze it even if we did not freeze it...
> 
> So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
       [not found]                   ` <201108050025.09792.rjw@sisk.pl>
@ 2011-08-06 21:17                     ` Rafael J. Wysocki
       [not found]                     ` <201108062317.19033.rjw@sisk.pl>
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-06 21:17 UTC (permalink / raw)
  To: Linux PM mailing list
  Cc: Christoph, Theodore Ts'o, Dave Chinner, LKML, xfs,
	Christoph Hellwig, linux-fsdevel, linux-ext4

From: Rafael J. Wysocki <rjw@sisk.pl>

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

OK, so nobody except for Pavel appears to have any comments, so I assume
that everyone except for Pavel is fine with the approach, interestingly enough.

I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
and added comments explaining why lockdep_off/on() are used.

Thanks,
Rafael

---
 fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 +++++
 kernel/power/process.c |    7 +++++-
 3 files changed, 68 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,62 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary, because some filesystems (e.g. ext3) lock
+	 * mutexes in their .freeze_fs() callbacks and leave them locked for
+	 * their .unfreeze_fs() callbacks to unlock.  This is done under
+	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
+	 * lockdep think something may be wrong when freeze_bdev() attempts
+	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
+	 */
+	lockdep_off();
+
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary for the same reason as in freeze_filesystems()
+	 * above.
+	 */
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
       [not found]                     ` <201108062317.19033.rjw@sisk.pl>
@ 2011-08-07  0:14                       ` Dave Chinner
       [not found]                       ` <20110807001446.GI3162@dastard>
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2011-08-07  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
> 
> This change is based on earlier work by Nigel Cunningham.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
> 
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
> 
> Thanks,
> Rafael
> 
> ---
>  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     |    6 +++++
>  kernel/power/process.c |    7 +++++-
>  3 files changed, 68 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
>  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
>  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
>  #define MS_NOSEC	(1<<28)
>  #define MS_BORN		(1<<29)
>  #define MS_ACTIVE	(1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary, because some filesystems (e.g. ext3) lock
> +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> +	 * lockdep think something may be wrong when freeze_bdev() attempts
> +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> +	 */
> +	lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3.  Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> +	/*
> +	 * Freeze in reverse order so filesystems depending on others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		freeze_bdev(sb->s_bdev);
> +		sb->s_flags |= MS_FROZEN;
> +	}

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblock with locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock befor calling freeze_super() and then
picks it back up afterwards.

> +
> +	lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary for the same reason as in freeze_filesystems()
> +	 * above.
> +	 */
> +	lockdep_off();
> +
> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
       [not found]                       ` <20110807001446.GI3162@dastard>
@ 2011-08-08 21:11                         ` Rafael J. Wysocki
  2011-08-14  0:16                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-08 21:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > 
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> > 
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> >  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     |    6 +++++
> >  kernel/power/process.c |    7 +++++-
> >  3 files changed, 68 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> >  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
> >  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
> >  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
> >  #define MS_NOSEC	(1<<28)
> >  #define MS_BORN		(1<<29)
> >  #define MS_ACTIVE	(1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> >  extern void emergency_thaw_all(void);
> >  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> >  extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> >  #else
> >  static inline void bd_forget(struct inode *inode) {}
> >  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> >  #endif
> >  extern int sync_filesystem(struct super_block *);
> >  extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> >  }
> >  EXPORT_SYMBOL(thaw_bdev);
> >  
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary, because some filesystems (e.g. ext3) lock
> > +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> > +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> > +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > +	 * lockdep think something may be wrong when freeze_bdev() attempts
> > +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > +	 */
> > +	lockdep_off();
> 
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
> 
> Well, well:
> 
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
> 
> <sigh>
> 
> Looks like the problem was fixed for ext4, but not ext3.  Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.

OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)

> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().

OK, so do you mean I should call freeze_super() rather than freeze_bdev()?

> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

Hmm, I'll try that, but I doubt I'll get it right first time. :-)

> > +
> > +	lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary for the same reason as in freeze_filesystems()
> > +	 * above.
> > +	 */
> > +	lockdep_off();
> > +
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> And once again, iterate_supers() is what you want here.

OK

> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN

Hmm, I'm not really sure what you mean?

> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
> 
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.

I see.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  9:35       ` Rafael J. Wysocki
  2011-07-27 10:33         ` Christoph Hellwig
@ 2011-08-10 21:43         ` Pavel Machek
  2011-08-16 12:38           ` Christoph
       [not found]           ` <4E4A64D4.4000005@u-club.de>
  1 sibling, 2 replies; 16+ messages in thread
From: Pavel Machek @ 2011-08-10 21:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, Dave Chinner, xfs

Hi!

> > > Why don't you simply submit a patch to do that?
> > 
> > a) I don't know how to test suspend/hibernate
> > b) I don't have any hardware I can test it on.

hibernation should work on any hardware. Just add cmdline parameter
resume=<your swap> and use echo disk > /sys/power/state.

suspend is supported on most PCs, and certainly on any notebook. Just
echo mem > /sys/power/state.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
       [not found]                       ` <20110807001446.GI3162@dastard>
  2011-08-08 21:11                         ` Rafael J. Wysocki
@ 2011-08-14  0:16                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-14  0:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
...
> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
> 
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

So, what about the patch below?  It appears to work on my test boxes.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Freezer: Freeze filesystems while freezing processes (v3)

Freeze all filesystems during the freezing of tasks by calling
freeze_super() for all superblocks and thaw them during the thawing
of tasks with the help of thaw_super().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/super.c             |   70 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    3 ++
 kernel/power/process.c |    9 +++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* Frozen filesystem */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2497,6 +2498,8 @@ extern void drop_super(struct super_bloc
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
+extern int freeze_supers(void);
+extern void thaw_supers(void);
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,12 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	printk("Freezing filesystems ... ");
+	error = freeze_supers();
+	if (error)
+		goto Exit;
+	printk("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +194,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_supers();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -590,6 +590,76 @@ void iterate_supers_type(struct file_sys
 EXPORT_SYMBOL(iterate_supers_type);
 
 /**
+ *	freeze_supers - call freeze_super() for all superblocks
+ */
+int freeze_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_root && sb->s_frozen != SB_FREEZE_TRANS
+		    && !(sb->s_flags & MS_RDONLY)) {
+			error = freeze_super(sb);
+			if (!error)
+				sb->s_flags |= MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (error)
+			break;
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
+/**
+ *	thaw_supers - call thaw_super() for all superblocks
+ */
+void thaw_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_flags & MS_FROZEN) {
+			thaw_super(sb);
+			sb->s_flags &= ~MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+}
+
+
+/**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
  *	

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-08-10 21:43         ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
@ 2011-08-16 12:38           ` Christoph
       [not found]           ` <4E4A64D4.4000005@u-club.de>
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph @ 2011-08-16 12:38 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux PM mailing list, Dave Chinner, xfs

On 10.08.2011 23:43, Pavel Machek wrote:
> Hi!
> 
>>>> Why don't you simply submit a patch to do that?
>>>
>>> a) I don't know how to test suspend/hibernate
>>> b) I don't have any hardware I can test it on.


Hi!

I do apologize fmy silence but I've been in a serious productions the last
weeks.

If there any patches to test, let me know. I can run some decent stress tests.

chris


> 
> hibernation should work on any hardware. Just add cmdline parameter
> resume=<your swap> and use echo disk > /sys/power/state.
> 
> suspend is supported on most PCs, and certainly on any notebook. Just
> echo mem > /sys/power/state.
> 									Pavel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
       [not found]           ` <4E4A64D4.4000005@u-club.de>
@ 2011-08-16 18:05             ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2011-08-16 18:05 UTC (permalink / raw)
  To: Christoph; +Cc: Linux PM mailing list, Dave Chinner, xfs

On Tuesday, August 16, 2011, Christoph wrote:
> On 10.08.2011 23:43, Pavel Machek wrote:
> > Hi!
> > 
> >>>> Why don't you simply submit a patch to do that?
> >>>
> >>> a) I don't know how to test suspend/hibernate
> >>> b) I don't have any hardware I can test it on.
> 
> 
> Hi!
> 
> I do apologize fmy silence but I've been in a serious productions the last
> weeks.
> 
> If there any patches to test, let me know. I can run some decent stress tests.

Well, please try the one attached to
https://bugzilla.kernel.org/show_bug.cgi?id=33622#c10

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-08-16 18:05 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4E1C70AD.1010101@u-club.de>
     [not found] ` <20110713000332.GM23038@dastard>
2011-07-26 20:28   ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Rafael J. Wysocki
2011-07-27  0:45     ` Dave Chinner
2011-07-27  9:35       ` Rafael J. Wysocki
2011-07-27 10:33         ` Christoph Hellwig
2011-07-27 12:22           ` Nigel Cunningham
     [not found]             ` <201108032315.06012.rjw@sisk.pl>
2011-08-03 17:29               ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Pavel Machek
     [not found]               ` <20110803172922.GA2126@ucw.cz>
2011-08-04  9:27                 ` Rafael J. Wysocki
     [not found]                 ` <201108041127.30944.rjw@sisk.pl>
2011-08-04 22:25                   ` Rafael J. Wysocki
     [not found]                   ` <201108050025.09792.rjw@sisk.pl>
2011-08-06 21:17                     ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Rafael J. Wysocki
     [not found]                     ` <201108062317.19033.rjw@sisk.pl>
2011-08-07  0:14                       ` Dave Chinner
     [not found]                       ` <20110807001446.GI3162@dastard>
2011-08-08 21:11                         ` Rafael J. Wysocki
2011-08-14  0:16                         ` Rafael J. Wysocki
2011-08-03 21:15             ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
2011-08-10 21:43         ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
2011-08-16 12:38           ` Christoph
     [not found]           ` <4E4A64D4.4000005@u-club.de>
2011-08-16 18:05             ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).