From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph <cr2005@u-club.de>,
Linux PM mailing list <linux-pm@lists.linux-foundation.org>,
Pavel Machek <pavel@ucw.cz>,
xfs@oss.sgi.com
Subject: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
Date: Tue, 26 Jul 2011 22:28:11 +0200
Message-ID: <201107262228.12099.rjw@sisk.pl>
In-Reply-To: <20110713000332.GM23038@dastard>

On Wednesday, July 13, 2011, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> > Hi!
> >
> > I'd like you to have a look into this issue:
> >
> > pm-hibernate locks up when using xfs while "Preallocating image memory".
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=33622
> >
> > I got at least this backtrace (2.6.39.3)
> >
> > tia
> >
> > chris
> >
> >
> >
> > SysRq : Show Blocked State
> >
> > pm-hibernate D 0000000000000000 0 3638 3637 0x00000000
> > ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > Call Trace:
> > [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > [<ffffffff81345527>] ? down_write+0x2d/0x31
> > [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > [<ffffffff81076c58>] state_store+0x57/0xce
> > [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
>
> It's waiting for IO completion, and holding an AG scan lock.
>
> And IO completion requires a workqueue to run. Just FYI, this
> process of inode reclaim can dirty the filesystem, long after
> hibernate has assumed that it is clean due to the sys_sync() call
> you do after freezing the processes. I pointed out this flaw in
> using sync to write dirty data prior to hibernate a couple of years
> ago.

However, attempts to remove the sys_sync() from the hibernate code
were opposed by some developers, who believe that removing it would
increase the probability of data loss if hibernation fails.

> Anyway, it's a good thing that XFS doesn't use freezable work
> queues, otherwise it would hang on every hibernate. Perhaps I should
> do that to force hibernate to do things properly in filesystems
> land.

Well, I'd say it's a well-known fact that filesystems are not
handled in any special way during hibernation, which is not a good
thing. Nevertheless, I've never seen anyone from the filesystems land
pay any attention to this issue.

> However, it is entirely possible that something else that XFS relies
> on for IO completion has been put to sleep by this point.
>
> /me finds the smoking cannon:
>
> [ 648.794455] xfsbufd/sda3 D 0000000000000000 0 192 2 0x00000000
> [ 648.794455] ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
> [ 648.794455] ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
> [ 648.794455] ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
> [ 648.794455] Call Trace:
> [ 648.794455] [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
> [ 648.794455] [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
> [ 648.794455] [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
> [ 648.794455] [<ffffffff8105f25a>] kthread+0x7d/0x85
> [ 648.794455] [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
> [ 648.794455] [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
> [ 648.794455] [<ffffffff8134d6e0>] ? gs_change+0x13/0x13
>
> The xfsbufd, responsible for pushing out dirty metadata, has been
> frozen. sys_sync() does not push out dirty metadata because it
> is already on stable storage in the journal. If the flush lock is
> already held on the inode, then inode reclaim will wait for the
> xfsbufd to flush the backing buffer because reclaim can't do it
> directly. And hibernate has already frozen the xfsbufd.
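
To make the dependency concrete, here is a minimal user-space model in C using POSIX threads. It is not XFS or kernel code: the hypothetical `flusher` thread stands in for xfsbufd, the `frozen` flag stands in for the freezer, and the hypothetical `reclaim_wait()` stands in for inode reclaim waiting for the backing buffer to be flushed. A timed wait replaces the real, unbounded wait_for_completion() so that the hang shows up as ETIMEDOUT rather than a stuck process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model: "flusher" plays xfsbufd, "frozen" plays the freezer. */
struct model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool frozen;        /* flusher is in the "refrigerator" */
    bool flush_request; /* reclaim asks the flusher to write the buffer */
    bool flushed;       /* flusher reports that the buffer is clean */
    bool stop;          /* tells the flusher thread to exit */
};

static void *flusher(void *arg)
{
    struct model *m = arg;

    pthread_mutex_lock(&m->lock);
    while (!m->stop) {
        if (m->flush_request && !m->frozen) {
            /* Service the request: pretend the buffer reached disk. */
            m->flush_request = false;
            m->flushed = true;
            pthread_cond_broadcast(&m->cond);
        } else {
            /* Nothing to do, or frozen: sleep until woken. */
            pthread_cond_wait(&m->cond, &m->lock);
        }
    }
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Returns 0 if the flush completed, ETIMEDOUT if reclaim would hang. */
static int reclaim_wait(bool freeze_flusher_first, int timeout_ms)
{
    struct model m = { .frozen = freeze_flusher_first };
    struct timespec deadline;
    pthread_t t;
    int rc = 0;

    assert(timeout_ms > 0);
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.cond, NULL);
    pthread_create(&t, NULL, flusher, &m);

    /* Absolute CLOCK_REALTIME deadline, as pthread_cond_timedwait expects. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&m.lock);
    m.flush_request = true;            /* reclaim can't flush directly */
    pthread_cond_broadcast(&m.cond);
    while (!m.flushed && rc == 0)
        rc = pthread_cond_timedwait(&m.cond, &m.lock, &deadline);
    m.stop = true;                     /* shut the model down either way */
    pthread_cond_broadcast(&m.cond);
    pthread_mutex_unlock(&m.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return m.flushed ? 0 : rc;
}
```

With the flusher running, reclaim_wait(false, 2000) returns 0; with the flusher frozen before reclaim starts, reclaim_wait(true, 100) returns ETIMEDOUT, because nothing ever services the flush request.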
>
> IOWs, what hibernate does is:
>
> freeze_processes()
> sys_sync()
> allocate a large amount of memory
>
> Freezing the processes causes parts of filesystems to be put in the
> fridge, which means there is no guarantee that sys_sync() actually
> does what it is supposed to. As it is, sys_sync() really only
> guarantees file data is clean in memory - metadata does not need to
> be clean as long as it has been journalled and the journal is safe on
> disk.
>
> Further, allocating memory can cause memory reclaim to enter the
> filesystem and try to free memory held by the filesystem. In XFS (at
> least) this can cause the filesystem to issue transactions and
> metadata IO to clean the dirty metadata to enable it to be
> reclaimed. So hibernate is effectively guaranteed to dirty the
> filesystem after it has frozen all the worker threads the filesystem
> might rely on.
>
> Also, by this point kswapd has already been frozen, so hibernate is
> relying totally on direct memory reclaim to free up the memory it
> requires. I'm not sure that's a good idea.
>
> IOWs, hibernate is still broken by design - and broken in exactly
> the way that was pointed out a couple of years ago by myself and
> others in the filesystem world: sys_sync() does not quiesce or
> guarantee a clean filesystem in memory after it completes.
>
> There is a solution to this, and it already exists - it's called
> freezing the filesystem. Effectively hibernate needs to allocate
> memory before it freezes kernel/filesystem worker threads:
>
> freeze_userspace_processes()
>
> // just to clean the page cache quickly
> sys_sync()
>
> // optionally to free page/inode/dentry caches:
> iterate_supers(drop_pagecache_sb, NULL);
> drop_slab()
>
> allocate a large amount of memory
>
> // Now quiesce the filesystems and clean remaining metadata
> iterate_supers(freeze_super, NULL);
>
> freeze_remaining_processes()
>
> This guarantees that filesystems are still working when memory
> reclaim comes along to free memory for the hibernate image, and that
> once it is allocated that filesystems will not be changed until
> thawed on the hibernate wakeup.
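
The two orderings can be compared with a small C sketch that tracks only the two rules stated above: memory allocation must not happen after the kernel threads it may depend on are frozen, and the filesystem must be frozen after the last allocation so that reclaim cannot dirty it again. The step names and `check_order()` are hypothetical labels for the calls listed in this message, not kernel APIs. freeze_processes() is modelled as FREEZE_USER followed by FREEZE_KTHREADS, and SYNC is assumed to leave journalled metadata dirty in memory, as explained above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps named in the message. */
enum step { FREEZE_USER, SYNC, DROP_CACHES, ALLOCATE, FREEZE_FS, FREEZE_KTHREADS };
enum verdict { ORDER_OK, ORDER_RECLAIM_MAY_HANG, ORDER_FS_MAY_BE_DIRTY };

static enum verdict check_order(const enum step *steps, size_t n)
{
    bool kthreads_frozen = false;
    bool fs_frozen = false;
    bool fs_dirty = true;   /* sys_sync() alone does not clean metadata */

    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (steps[i]) {
        case FREEZE_KTHREADS:
            kthreads_frozen = true;
            break;
        case FREEZE_FS:
            /* Freezing quiesces the fs and writes back remaining metadata. */
            fs_frozen = true;
            fs_dirty = false;
            break;
        case ALLOCATE:
            /* Reclaim may wait on a worker thread that is now frozen. */
            if (kthreads_frozen)
                return ORDER_RECLAIM_MAY_HANG;
            /* Reclaim may issue transactions and dirty metadata again. */
            if (!fs_frozen)
                fs_dirty = true;
            break;
        default:
            /* FREEZE_USER, SYNC, DROP_CACHES: no effect on these rules. */
            break;
        }
    }
    return fs_dirty ? ORDER_FS_MAY_BE_DIRTY : ORDER_OK;
}
```

Fed the current sequence (FREEZE_USER, FREEZE_KTHREADS, SYNC, ALLOCATE), the check reports ORDER_RECLAIM_MAY_HANG; the proposed sequence ending in FREEZE_FS and FREEZE_KTHREADS reports ORDER_OK; dropping FREEZE_FS from it reports ORDER_FS_MAY_BE_DIRTY.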
>
> So, like I said a couple of years ago: fix hibernate to quiesce
> filesystems properly, and hibernate will be much more reliable
> and robust and less likely to break randomly in the future.

Why don't you simply submit a patch to do that?

Rafael
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 30+ messages
2011-07-12 16:05 PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Christoph
2011-07-13 0:03 ` Dave Chinner
2011-07-26 20:28 ` Rafael J. Wysocki [this message]
2011-07-27 0:45 ` Dave Chinner
2011-07-27 9:35 ` Rafael J. Wysocki
2011-07-27 10:33 ` Christoph Hellwig
2011-07-27 12:22 ` Nigel Cunningham
2011-08-03 21:15 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
[not found] ` <20110803172922.GA2126@ucw.cz>
2011-08-04 9:27 ` Rafael J. Wysocki
2011-08-04 22:25 ` Rafael J. Wysocki
2011-08-06 21:17 ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Rafael J. Wysocki
2011-08-07 0:14 ` Dave Chinner
2011-08-08 21:11 ` Rafael J. Wysocki
2011-08-14 0:16 ` Rafael J. Wysocki
2011-09-24 22:56 ` Rafael J. Wysocki
2011-09-25 5:32 ` Nigel Cunningham
2011-09-25 13:37 ` Rafael J. Wysocki
2011-09-25 10:38 ` Christoph
2011-09-25 13:32 ` Rafael J. Wysocki
2011-09-25 21:57 ` Christoph
2011-09-25 22:10 ` Rafael J. Wysocki
2011-09-26 5:27 ` Christoph
2011-10-22 15:14 ` Christoph
2011-10-22 21:35 ` Rafael J. Wysocki
2011-11-16 13:49 ` Ferenc Wagner
2011-11-16 21:50 ` Rafael J. Wysocki
2011-09-25 13:40 ` [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory Rafael J. Wysocki
2011-08-10 21:43 ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
2011-08-16 12:38 ` Christoph
2011-08-16 18:05 ` Rafael J. Wysocki