From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph <cr2005@u-club.de>,
Linux PM mailing list <linux-pm@lists.linux-foundation.org>,
xfs@oss.sgi.com
Subject: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
Date: Tue, 26 Jul 2011 22:28:11 +0200
Message-ID: <201107262228.12099.rjw@sisk.pl>
In-Reply-To: <20110713000332.GM23038@dastard>

On Wednesday, July 13, 2011, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> > Hi!
> >
> > I'd like you to have a look into this issue:
> >
> > pm-hibernate locks up when using xfs while "Preallocating image memory".
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=33622
> >
> > I got at least this backtrace (2.6.39.3)
> >
> > tia
> >
> > chris
> >
> >
> >
> > SysRq : Show Blocked State
> >
> > pm-hibernate D 0000000000000000 0 3638 3637 0x00000000
> > ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > Call Trace:
> > [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > [<ffffffff81345527>] ? down_write+0x2d/0x31
> > [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > [<ffffffff81076c58>] state_store+0x57/0xce
> > [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
>
> It's waiting for IO completion, and holding an AG scan lock.
>
> And IO completion requires a workqueue to run. Just FYI, this
> process of inode reclaim can dirty the filesystem, long after
> hibernate has assumed that it is clean due to the sys_sync() call
> you do after freezing the processes. I pointed out this flaw in
> using sync to write dirty data prior to hibernate a couple of years
> ago.

However, attempts to remove the sys_sync() from the hibernate code
were objected to by some developers, who believe that removing it would
increase the probability of data loss if hibernation fails.

> Anyway, it's a good thing that XFS doesn't use freezable work
> queues, otherwise it would hang on every hibernate. Perhaps I should
> do that to force hibernate to do things properly in filesystems
> land.

Well, I'd say it's a very well known fact that filesystems are not
handled in any special way during hibernation, which is not a good
thing. Nevertheless, I've never seen anyone from the filesystems land
pay any kind of attention to this issue.

> However, it is entirely possible that something else that XFS relies
> on for IO completion has been put to sleep by this point.
>
> /me finds the smoking cannon:
>
> [ 648.794455] xfsbufd/sda3 D 0000000000000000 0 192 2 0x00000000
> [ 648.794455] ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
> [ 648.794455] ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
> [ 648.794455] ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
> [ 648.794455] Call Trace:
> [ 648.794455] [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
> [ 648.794455] [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
> [ 648.794455] [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
> [ 648.794455] [<ffffffff8105f25a>] kthread+0x7d/0x85
> [ 648.794455] [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
> [ 648.794455] [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
> [ 648.794455] [<ffffffff8134d6e0>] ? gs_change+0x13/0x13
>
> The xfsbufd, responsible for pushing out dirty metadata, has been
> frozen. sys_sync() does not push out dirty metadata because it
> is already on stable storage in the journal. If the flush lock is
> already held on the inode, then inode reclaim will wait for the
> xfsbufd to flush the backing buffer because reclaim can't do it
> directly. And hibernate has already frozen the xfsbufd.
>
> IOWs, what hibernate does is:
>
> freeze_processes()
> sys_sync()
> allocate a large amount of memory
>
> Freezing the processes causes parts of filesystems to be put in the
> fridge, which means there is no guarantee that sys_sync() actually
> does what it is supposed to. As it is, sys_sync() really only
> guarantees file data is clean in memory - metadata does not need to
> be clean as long as it has been journalled and the journal is safe on
> disk.
>
> Further, allocating memory can cause memory reclaim to enter the
> filesystem and try to free memory held by the filesystem. In XFS (at
> least) this can cause the filesystem to issue transactions and
> metadata IO to clean the dirty metadata to enable it to be
> reclaimed. So hibernate is effectively guaranteed to dirty the
> filesystem after it has frozen all the worker threads the filesystem
> might rely on.
>
> Also, by this point kswapd has already been frozen, so hibernate is
> relying totally on direct memory reclaim to free up the memory it
> requires. I'm not sure that's a good idea.
>
> IOWs, hibernate is still broken by design - and broken in exactly
> the way that was pointed out a couple of years ago by myself and
> others in the filesystem world: sys_sync() does not quiesce or
> guarantee a clean filesystem in memory after it completes.
>
> There is a solution to this, and it already exists - it's called
> freezing the filesystem. Effectively hibernate needs to allocate
> memory before it freezes kernel/filesystem worker threads:
>
> freeze_userspace_processes()
>
> // just to clean the page cache quickly
> sys_sync()
>
> // optionally to free page/inode/dentry caches:
> iterate_supers(drop_pagecache_sb, NULL);
> drop_slab()
>
> allocate a large amount of memory
>
> // Now quiesce the filesystems and clean remaining metadata
> iterate_supers(freeze_super, NULL);
>
> freeze_remaining_processes()
>
> This guarantees that filesystems are still working when memory
> reclaim comes along to free memory for the hibernate image, and that
> once it is allocated, filesystems will not be changed until
> thawed on the hibernate wakeup.
>
> So, like I said a couple of years ago: fix hibernate to quiesce
> filesystems properly, and the hibernate will be much more reliable
> and robust and less likely to break randomly in the future.

Why don't you simply submit a patch to do that?

Rafael
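
The ordering argument in the quoted message can be illustrated with a
toy userspace model. This is only a sketch under stated assumptions:
none of the names below (enum step, struct model, run_step, first_hang)
are kernel APIs, the "flusher" merely stands in for xfsbufd, and reclaim
is reduced to "blocks if dirty metadata exists while the flusher is
frozen". It does not reproduce real reclaim or freezer behaviour; it
only shows why the step order matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel interfaces. */
enum step {
    FREEZE_ALL_TASKS,    /* user tasks and kernel threads, incl. the flusher */
    FREEZE_USER_TASKS,   /* user tasks only; kernel threads keep running */
    SYNC,                /* cleans file data; journalled metadata stays dirty */
    ALLOC_IMAGE_MEMORY,  /* reclaim may have to wait on the flusher */
    FREEZE_FILESYSTEMS,  /* quiesce: remaining metadata is written back */
    FREEZE_KTHREADS      /* remaining kernel threads, incl. the flusher */
};

struct model {
    bool flusher_frozen; /* stands in for xfsbufd sitting in the refrigerator */
    bool dirty_metadata; /* metadata that only the flusher can write back */
};

/* Returns false if the step would wait forever on a frozen flusher. */
bool run_step(struct model *m, enum step s)
{
    switch (s) {
    case FREEZE_ALL_TASKS:
    case FREEZE_KTHREADS:
        m->flusher_frozen = true;
        return true;
    case FREEZE_USER_TASKS:
    case SYNC:
        /* Neither step touches the flusher or the dirty metadata. */
        return true;
    case ALLOC_IMAGE_MEMORY:
    case FREEZE_FILESYSTEMS:
        /* Both need the flusher if any metadata is still dirty. */
        if (m->dirty_metadata && m->flusher_frozen)
            return false;
        m->dirty_metadata = false; /* simplification: flusher cleans it all */
        return true;
    }
    return false;
}

/* Returns the index of the first step that hangs, or -1 if all complete. */
int first_hang(const enum step *seq, size_t n)
{
    struct model m = { .flusher_frozen = false, .dirty_metadata = true };

    assert(seq != NULL);
    for (size_t i = 0; i < n; i++)
        if (!run_step(&m, seq[i]))
            return (int)i;
    return -1;
}

/* The order described as "what hibernate does". */
const enum step current_order[] = {
    FREEZE_ALL_TASKS, SYNC, ALLOC_IMAGE_MEMORY
};

/* The order proposed in the quoted message. */
const enum step proposed_order[] = {
    FREEZE_USER_TASKS, SYNC, ALLOC_IMAGE_MEMORY,
    FREEZE_FILESYSTEMS, FREEZE_KTHREADS
};
```

In this model, first_hang(current_order, 3) reports a hang at step
index 2 (the memory allocation), while first_hang(proposed_order, 5)
returns -1 because the flusher is still running when reclaim and the
filesystem freeze need it.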