From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Dave Chinner <david@fromorbit.com>
Cc: Linux PM mailing list <linux-pm@lists.linux-foundation.org>,
Pavel Machek <pavel@ucw.cz>,
Nigel Cunningham <nigel@tuxonice.net>,
Christoph Hellwig <hch@infradead.org>,
Christoph <cr2005@u-club.de>,
xfs@oss.sgi.com, LKML <linux-kernel@vger.kernel.org>,
linux-ext4@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
Date: Mon, 8 Aug 2011 23:11:27 +0200 [thread overview]
Message-ID: <201108082311.28190.rjw@sisk.pl> (raw)
In-Reply-To: <20110807001446.GI3162@dastard>
On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> >
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> >
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume). In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> >
> > This change is based on earlier work by Nigel Cunningham.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> >
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > fs/block_dev.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 6 +++++
> > kernel/power/process.c | 7 +++++-
> > 3 files changed, 68 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> > #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
> > #define MS_I_VERSION (1<<23) /* Update inode I_version field */
> > #define MS_STRICTATIME (1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN (1<<25) /* bdev has been frozen */
> > #define MS_NOSEC (1<<28)
> > #define MS_BORN (1<<29)
> > #define MS_ACTIVE (1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> > extern void emergency_thaw_all(void);
> > extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> > extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> > #else
> > static inline void bd_forget(struct inode *inode) {}
> > static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> > {
> > return 0;
> > }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> > #endif
> > extern int sync_filesystem(struct super_block *);
> > extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> > }
> > EXPORT_SYMBOL(thaw_bdev);
> >
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary, because some filesystems (e.g. ext3) lock
> > + * mutexes in their .freeze_fs() callbacks and leave them locked for
> > + * their .unfreeze_fs() callbacks to unlock. This is done under
> > + * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > + * lockdep think something may be wrong when freeze_bdev() attempts
> > + * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > + */
> > + lockdep_off();
>
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
>
> Well, well:
>
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739] #0: (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
>
> <sigh>
>
> Looks like the problem was fixed for ext4, but not ext3. Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.
OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)
> > + /*
> > + * Freeze in reverse order so filesystems depending on others are
> > + * frozen in the right order (eg. loopback on ext3).
> > + */
> > + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > + if (!sb->s_root || !sb->s_bdev ||
> > + (sb->s_frozen == SB_FREEZE_TRANS) ||
> > + (sb->s_flags & MS_RDONLY))
> > + continue;
> > +
> > + freeze_bdev(sb->s_bdev);
> > + sb->s_flags |= MS_FROZEN;
> > + }
>
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
OK, so do you mean I should call freeze_super() rather than freeze_bdev()?
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.
Hmm, I'll try that, but I doubt I'll get it right first time. :-)
> > +
> > + lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary for the same reason as in freeze_filesystems()
> > + * above.
> > + */
> > + lockdep_off();
> > +
> > + list_for_each_entry(sb, &super_blocks, s_list)
> > + if (sb->s_flags & MS_FROZEN) {
> > + sb->s_flags &= ~MS_FROZEN;
> > + thaw_bdev(sb->s_bdev, sb);
> > + }
>
> And once again, iterate_supers() is what you want here.
OK
> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN
Hmm, I'm not really sure what you mean?
> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
>
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.
I see.
Thanks,
Rafael
WARNING: multiple messages have this Message-ID (diff)
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph <cr2005@u-club.de>, Theodore Ts'o <tytso@mit.edu>,
LKML <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, Christoph Hellwig <hch@infradead.org>,
Nigel Cunningham <nigel@tuxonice.net>,
Pavel Machek <pavel@ucw.cz>,
linux-fsdevel@vger.kernel.org,
Linux PM mailing list <linux-pm@lists.linux-foundation.org>,
linux-ext4@vger.kernel.org
Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
Date: Mon, 8 Aug 2011 23:11:27 +0200 [thread overview]
Message-ID: <201108082311.28190.rjw@sisk.pl> (raw)
In-Reply-To: <20110807001446.GI3162@dastard>
On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> >
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> >
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume). In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> >
> > This change is based on earlier work by Nigel Cunningham.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> >
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > fs/block_dev.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 6 +++++
> > kernel/power/process.c | 7 +++++-
> > 3 files changed, 68 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> > #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
> > #define MS_I_VERSION (1<<23) /* Update inode I_version field */
> > #define MS_STRICTATIME (1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN (1<<25) /* bdev has been frozen */
> > #define MS_NOSEC (1<<28)
> > #define MS_BORN (1<<29)
> > #define MS_ACTIVE (1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> > extern void emergency_thaw_all(void);
> > extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> > extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> > #else
> > static inline void bd_forget(struct inode *inode) {}
> > static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> > {
> > return 0;
> > }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> > #endif
> > extern int sync_filesystem(struct super_block *);
> > extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> > }
> > EXPORT_SYMBOL(thaw_bdev);
> >
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary, because some filesystems (e.g. ext3) lock
> > + * mutexes in their .freeze_fs() callbacks and leave them locked for
> > + * their .unfreeze_fs() callbacks to unlock. This is done under
> > + * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > + * lockdep think something may be wrong when freeze_bdev() attempts
> > + * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > + */
> > + lockdep_off();
>
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
>
> Well, well:
>
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739] #0: (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
>
> <sigh>
>
> Looks like the problem was fixed for ext4, but not ext3. Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.
OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)
> > + /*
> > + * Freeze in reverse order so filesystems depending on others are
> > + * frozen in the right order (eg. loopback on ext3).
> > + */
> > + list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > + if (!sb->s_root || !sb->s_bdev ||
> > + (sb->s_frozen == SB_FREEZE_TRANS) ||
> > + (sb->s_flags & MS_RDONLY))
> > + continue;
> > +
> > + freeze_bdev(sb->s_bdev);
> > + sb->s_flags |= MS_FROZEN;
> > + }
>
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
OK, so do you mean I should call freeze_super() rather than freeze_bdev()?
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.
Hmm, I'll try that, but I doubt I'll get it right first time. :-)
> > +
> > + lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > + struct super_block *sb;
> > +
> > + /*
> > + * This is necessary for the same reason as in freeze_filesystems()
> > + * above.
> > + */
> > + lockdep_off();
> > +
> > + list_for_each_entry(sb, &super_blocks, s_list)
> > + if (sb->s_flags & MS_FROZEN) {
> > + sb->s_flags &= ~MS_FROZEN;
> > + thaw_bdev(sb->s_bdev, sb);
> > + }
>
> And once again, iterate_supers() is what you want here.
OK
> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN
Hmm, I'm not really sure what you mean?
> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
>
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.
I see.
Thanks,
Rafael
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-08-08 21:10 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-12 16:05 PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Christoph
2011-07-13 0:03 ` Dave Chinner
2011-07-26 20:28 ` Rafael J. Wysocki
2011-07-26 20:28 ` Rafael J. Wysocki
2011-07-27 0:45 ` Dave Chinner
2011-07-27 0:45 ` Dave Chinner
2011-07-27 9:35 ` Rafael J. Wysocki
2011-07-27 9:35 ` Rafael J. Wysocki
2011-07-27 10:33 ` Christoph Hellwig
2011-07-27 10:33 ` Christoph Hellwig
2011-07-27 12:22 ` Nigel Cunningham
2011-07-27 12:22 ` Nigel Cunningham
2011-08-03 21:15 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
2011-08-03 21:15 ` Rafael J. Wysocki
2011-08-03 21:15 ` Rafael J. Wysocki
2011-08-03 17:29 ` Pavel Machek
2011-08-04 9:27 ` Rafael J. Wysocki
2011-08-04 9:27 ` Rafael J. Wysocki
2011-08-04 22:25 ` Rafael J. Wysocki
2011-08-04 22:25 ` Rafael J. Wysocki
2011-08-04 22:25 ` Rafael J. Wysocki
2011-08-06 21:17 ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Rafael J. Wysocki
2011-08-06 21:17 ` Rafael J. Wysocki
2011-08-06 21:17 ` Rafael J. Wysocki
2011-08-07 0:14 ` Dave Chinner
2011-08-07 0:14 ` Dave Chinner
2011-08-08 21:11 ` Rafael J. Wysocki
2011-08-08 21:11 ` Rafael J. Wysocki [this message]
2011-08-08 21:11 ` Rafael J. Wysocki
2011-08-14 0:16 ` Rafael J. Wysocki
2011-08-14 0:16 ` Rafael J. Wysocki
2011-08-14 0:16 ` Rafael J. Wysocki
2011-09-24 22:56 ` Rafael J. Wysocki
2011-09-24 22:56 ` Rafael J. Wysocki
2011-09-24 22:56 ` Rafael J. Wysocki
2011-09-25 5:32 ` Nigel Cunningham
2011-09-25 5:32 ` Nigel Cunningham
2011-09-25 13:37 ` Rafael J. Wysocki
2011-09-25 13:37 ` Rafael J. Wysocki
2011-09-25 10:38 ` Christoph
2011-09-25 10:38 ` Christoph
2011-09-25 13:32 ` Rafael J. Wysocki
2011-09-25 13:32 ` Rafael J. Wysocki
2011-09-25 21:57 ` Christoph
2011-09-25 21:57 ` Christoph
2011-09-25 22:10 ` Rafael J. Wysocki
2011-09-25 22:10 ` Rafael J. Wysocki
2011-09-26 5:27 ` Christoph
2011-09-26 5:27 ` Christoph
2011-10-22 15:14 ` Christoph
2011-10-22 15:14 ` Christoph
2011-10-22 21:35 ` Rafael J. Wysocki
2011-10-22 21:35 ` Rafael J. Wysocki
2011-11-16 13:49 ` Ferenc Wagner
2011-11-16 13:49 ` Ferenc Wagner
2011-11-16 21:50 ` Rafael J. Wysocki
2011-11-16 21:50 ` Rafael J. Wysocki
2011-09-25 13:40 ` [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory Rafael J. Wysocki
2011-09-25 13:40 ` Rafael J. Wysocki
2011-08-07 0:14 ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Dave Chinner
2011-08-04 9:27 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
2011-08-03 17:29 ` Pavel Machek
2011-08-10 21:43 ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
2011-08-10 21:43 ` Pavel Machek
2011-08-16 12:38 ` Christoph
2011-08-16 12:38 ` Christoph
2011-08-16 18:05 ` Rafael J. Wysocki
2011-08-16 18:05 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201108082311.28190.rjw@sisk.pl \
--to=rjw@sisk.pl \
--cc=cr2005@u-club.de \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=nigel@tuxonice.net \
--cc=pavel@ucw.cz \
--cc=tytso@mit.edu \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.