* [PATCH 00/19 v5] Fix filesystem freezing deadlocks
@ 2012-04-16 16:13 Jan Kara
2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
To: Al Viro
Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Alex Elder,
Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs
Hello,
here is the fifth iteration of my patches to improve filesystem freezing.
No serious changes since last time. Mostly I rebased patches and merged this
series with series moving file_update_time() to ->page_mkwrite() to simplify
testing and merging.
Filesystem freezing is currently racy and thus we can end up with dirty data on
frozen filesystem (see changelog patch 13 for detailed race description). This
patch series aims at fixing this.
To be able to block all places where inodes get dirtied, I've moved filesystem
file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
freeze handling in mnt_want_write() / mnt_drop_write(). That however required
some code shuffling and changes to kern_path_create() (see patches 09-12). I
think the result is OK but opinions may differ ;). The advantage of this change
also is that all filesystems get freeze protection almost for free - even ext2
can handle freezing well now.
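The freeze levels described above can be pictured with a small userspace model. This is only an illustrative sketch, not the kernel implementation: every toy_* name is hypothetical, the level ordering (ordinary writes, then page faults, then fs-internal work) is an assumption of the model, and where the real code would sleep until active writers drain, the model simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical freeze levels for the model; not the kernel's definitions. */
enum {
	TOY_UNFROZEN  = 0,
	TOY_WRITE     = 1,	/* ordinary write paths */
	TOY_PAGEFAULT = 2,	/* write faults on mmapped pages */
	TOY_FS        = 3,	/* filesystem-internal work */
	TOY_COMPLETE  = 4
};

struct toy_sb {
	int frozen;		/* highest level currently blocked */
	int writers[3];		/* active holders per level (index level - 1) */
};

/* Enter a protected section; false means a real caller would have to wait. */
static bool toy_start(struct toy_sb *sb, int level)
{
	if (sb->frozen >= level)
		return false;
	sb->writers[level - 1]++;
	return true;
}

static void toy_end(struct toy_sb *sb, int level)
{
	sb->writers[level - 1]--;
}

/*
 * Freeze level by level: block new entries first, then check that nobody
 * is still inside. Returns 0 on success, -1 if a holder is still active
 * (the model gives up and unfreezes instead of waiting).
 */
static int toy_freeze(struct toy_sb *sb)
{
	for (int level = TOY_WRITE; level <= TOY_FS; level++) {
		sb->frozen = level;
		if (sb->writers[level - 1] > 0) {
			sb->frozen = TOY_UNFROZEN;
			return -1;
		}
	}
	sb->frozen = TOY_COMPLETE;
	return 0;
}

static void toy_thaw(struct toy_sb *sb)
{
	sb->frozen = TOY_UNFROZEN;
}

/* Walk through one freeze/thaw cycle; returns the number of leaked holders. */
static int toy_demo(void)
{
	struct toy_sb sb = { TOY_UNFROZEN, { 0, 0, 0 } };
	bool entered;
	int rc;

	entered = toy_start(&sb, TOY_WRITE);
	assert(entered);
	rc = toy_freeze(&sb);		/* a writer is active: cannot freeze */
	assert(rc == -1);
	toy_end(&sb, TOY_WRITE);

	rc = toy_freeze(&sb);		/* no holders left: freeze succeeds */
	assert(rc == 0 && sb.frozen == TOY_COMPLETE);
	entered = toy_start(&sb, TOY_PAGEFAULT);
	assert(!entered);		/* frozen: a page fault would block */

	toy_thaw(&sb);
	entered = toy_start(&sb, TOY_PAGEFAULT);
	assert(entered);
	toy_end(&sb, TOY_PAGEFAULT);

	return sb.writers[0] + sb.writers[1] + sb.writers[2];
}
```

The point of blocking new entries before checking the counter is that once a level is observed empty, nothing can enter it again until the thaw.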
Another potential contention point might be patch 19. In that patch we make
freeze_super() refuse to freeze the filesystem when there are open but unlinked
files which may be impractical in some cases. The main reason for this is the
problem with handling of file deletion from fput() called with mmap_sem held
(e.g. from munmap(2)), and then there's the fact that we cannot really force
such filesystem into a consistent state... But if people think that freezing
with open but unlinked files should happen, then I have some possible
solutions in mind (maybe as a separate patchset since this is large enough).
I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
filesystem despite beating it with fsstress and bash-shared-mapping while
freezing and unfreezing for several hours (using ext4 and xfs) so I'm
reasonably confident this could finally be the right solution.
Changes since v4:
* added a couple of Acked-by's
* added some comments & doc update
* added patches from series "Push file_update_time() into .page_mkwrite"
since it doesn't make much sense to keep them separate anymore
* rebased on top of 3.4-rc2
Changes since v3:
* added third level of freezing for fs internal purposes - hooked some
filesystems to use it (XFS, nilfs2)
* removed racy i_size check from filemap_mkwrite()
Changes since v2:
* completely rewritten
* freezing is now blocked at VFS entry points
* two stage freezing to handle both mmapped writes and other IO
The biggest changes since v1:
* have two counters to provide safe state transitions for SB_FREEZE_WRITE
and SB_FREEZE_TRANS states
* use percpu counters instead of own percpu structure
* added documentation fixes from the old fs freezing series
* converted XFS to use SB_FREEZE_TRANS counter instead of its private
m_active_trans counter
Honza
CC: Alex Elder <elder@kernel.org>
CC: Anton Altaparmakov <anton@tuxera.com>
CC: Ben Myers <bpm@sgi.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: cluster-devel@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: fuse-devel@lists.sourceforge.net
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Joel Becker <jlbec@evilplan.org>
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
CC: linux-btrfs@vger.kernel.org
CC: linux-ext4@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-nilfs@vger.kernel.org
CC: linux-ntfs-dev@lists.sourceforge.net
CC: Mark Fasheh <mfasheh@suse.com>
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: ocfs2-devel@oss.oracle.com
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: "Theodore Ts'o" <tytso@mit.edu>
CC: xfs@oss.sgi.com
* [PATCH 17/27] ext4: Convert to new freezing mechanism
2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
[not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2012-04-16 22:02 ` Andreas Dilger
2 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
To: Al Viro
Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-ext4,
Theodore Ts'o
We remove most of frozen checks since upper layer takes care
of blocking all writes. We only have to handle protection in
ext4_page_mkwrite() in a special way because we cannot use
generic block_page_mkwrite().
CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 7 ++-----
fs/ext4/mmp.c | 14 ++++++++++----
fs/ext4/super.c | 31 +++++++------------------------
3 files changed, 19 insertions(+), 33 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c77b0bd..bf568fb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4600,11 +4600,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
get_block_t *get_block;
int retries = 0;
- /*
- * This check is racy but catches the common case. We rely on
- * __block_page_mkwrite() to do a reliable check.
- */
- vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+ sb_start_pagefault(inode->i_sb);
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
@@ -4672,5 +4668,6 @@ retry_alloc:
out_ret:
ret = block_page_mkwrite_return(ret);
out:
+ sb_end_pagefault(inode->i_sb);
return ret;
}
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index ed6548d..4f63f90 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -10,14 +10,20 @@
* Write the MMP block using WRITE_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
{
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
mark_buffer_dirty(bh);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(WRITE_SYNC, bh);
wait_on_buffer(bh);
+ sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return 1;
@@ -120,7 +126,7 @@ static int kmmpd(void *data)
mmp->mmp_time = cpu_to_le64(get_seconds());
last_update_time = jiffies;
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
/*
* Don't spew too many error messages. Print one every
* (s_mmp_update_interval * 60) seconds.
@@ -200,7 +206,7 @@ static int kmmpd(void *data)
mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
mmp->mmp_time = cpu_to_le64(get_seconds());
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
failed:
kfree(data);
@@ -299,7 +305,7 @@ skip:
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
if (retval)
goto failed;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ceebaf8..08c7326 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -290,33 +290,17 @@ static void ext4_put_nojournal(handle_t *handle)
* journal_end calls result in the superblock being marked dirty, so
* that sync() will call the filesystem's write_super callback if
* appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
*/
handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
{
journal_t *journal;
- handle_t *handle;
trace_ext4_journal_start(sb, nblocks, _RET_IP_);
if (sb->s_flags & MS_RDONLY)
return ERR_PTR(-EROFS);
+ WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
journal = EXT4_SB(sb)->s_journal;
- handle = ext4_journal_current_handle();
-
- /*
- * If a handle has been started, it should be allowed to
- * finish, otherwise deadlock could happen between freeze
- * and others(e.g. truncate) due to the restart of the
- * journal handle if the filesystem is forzen and active
- * handles are not stopped.
- */
- if (!handle)
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
if (!journal)
return ext4_get_nojournal();
/*
@@ -2635,6 +2619,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
sb = elr->lr_super;
ngroups = EXT4_SB(sb)->s_groups_count;
+ sb_start_write(sb);
for (group = elr->lr_next_group; group < ngroups; group++) {
gdp = ext4_get_group_desc(sb, group, NULL);
if (!gdp) {
@@ -2661,6 +2646,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
elr->lr_next_sched = jiffies + elr->lr_timeout;
elr->lr_next_group = group + 1;
}
+ sb_end_write(sb);
return ret;
}
@@ -4137,10 +4123,8 @@ int ext4_force_commit(struct super_block *sb)
return 0;
journal = EXT4_SB(sb)->s_journal;
- if (journal) {
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
+ if (journal)
ret = ext4_journal_force_commit(journal);
- }
return ret;
}
@@ -4172,9 +4156,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
* gives us a chance to flush the journal completely and mark the fs clean.
*
* Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer. It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
*/
static int ext4_freeze(struct super_block *sb)
{
@@ -4201,7 +4184,7 @@ static int ext4_freeze(struct super_block *sb)
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
error = ext4_commit_super(sb, 1);
out:
- /* we rely on s_frozen to stop further updates */
+ /* we rely on upper layer to stop further updates */
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
return error;
}
--
1.7.1
[parent not found: <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>]
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
[not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
@ 2012-04-16 16:16 ` Jan Kara
0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2012-04-16 16:16 UTC (permalink / raw)
To: Al Viro
Cc: dchinner-H+wXaHxf7aLQT0dZR+AlfA, LKML,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jan Kara, Alex Elder,
Anton Altaparmakov, Ben Myers, Chris Mason,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, David S. Miller,
fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, J. Bruce Fields,
Joel Becker, KONISHI Ryusuke, linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-ext4-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
Miklos Szeredi, ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o,
xfs-VZNHf3L845pBDgjK7y7TUQ
The subject should have been [PATCH 00/27]... Sorry for the mistake.
Honza
On Mon 16-04-12 18:13:38, Jan Kara wrote:
> Hello,
>
> here is the fifth iteration of my patches to improve filesystem freezing.
> No serious changes since last time. Mostly I rebased patches and merged this
> series with series moving file_update_time() to ->page_mkwrite() to simplify
> testing and merging.
>
> Filesystem freezing is currently racy and thus we can end up with dirty data on
> frozen filesystem (see changelog patch 13 for detailed race description). This
> patch series aims at fixing this.
>
> To be able to block all places where inodes get dirtied, I've moved filesystem
> file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
> freeze handling in mnt_want_write() / mnt_drop_write(). That however required
> some code shuffling and changes to kern_path_create() (see patches 09-12). I
> think the result is OK but opinions may differ ;). The advantage of this change
> also is that all filesystems get freeze protection almost for free - even ext2
> can handle freezing well now.
>
> Another potential contention point might be patch 19. In that patch we make
> freeze_super() refuse to freeze the filesystem when there are open but unlinked
> files which may be impractical in some cases. The main reason for this is the
> problem with handling of file deletion from fput() called with mmap_sem held
> (e.g. from munmap(2)), and then there's the fact that we cannot really force
> such filesystem into a consistent state... But if people think that freezing
> with open but unlinked files should happen, then I have some possible
> solutions in mind (maybe as a separate patchset since this is large enough).
>
> I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
> filesystem despite beating it with fsstress and bash-shared-mapping while
> freezing and unfreezing for several hours (using ext4 and xfs) so I'm
> reasonably confident this could finally be the right solution.
>
> Changes since v4:
> * added a couple of Acked-by's
> * added some comments & doc update
> * added patches from series "Push file_update_time() into .page_mkwrite"
> since it doesn't make much sense to keep them separate anymore
> * rebased on top of 3.4-rc2
>
> Changes since v3:
> * added third level of freezing for fs internal purposes - hooked some
> filesystems to use it (XFS, nilfs2)
> * removed racy i_size check from filemap_mkwrite()
>
> Changes since v2:
> * completely rewritten
> * freezing is now blocked at VFS entry points
> * two stage freezing to handle both mmapped writes and other IO
>
> The biggest changes since v1:
> * have two counters to provide safe state transitions for SB_FREEZE_WRITE
> and SB_FREEZE_TRANS states
> * use percpu counters instead of own percpu structure
> * added documentation fixes from the old fs freezing series
> * converted XFS to use SB_FREEZE_TRANS counter instead of its private
> m_active_trans counter
>
> Honza
>
> CC: Alex Elder <elder-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: Anton Altaparmakov <anton-yrGDUoBaLx3QT0dZR+AlfA@public.gmane.org>
> CC: Ben Myers <bpm-sJ/iWh9BUns@public.gmane.org>
> CC: Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> CC: cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> CC: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> CC: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> CC: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> CC: Joel Becker <jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org>
> CC: KONISHI Ryusuke <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> CC: linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> CC: Mark Fasheh <mfasheh-IBi9RG/b67k@public.gmane.org>
> CC: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
> CC: ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org
> CC: OGAWA Hirofumi <hirofumi-UIVanBePwB70ZhReMnHkpc8NsWr+9BEh@public.gmane.org>
> CC: Steven Whitehouse <swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> CC: xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org
--
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
[not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
@ 2012-04-16 22:02 ` Andreas Dilger
2012-04-17 0:43 ` Dave Chinner
2012-04-17 9:32 ` Jan Kara
2 siblings, 2 replies; 12+ messages in thread
From: Andreas Dilger @ 2012-04-16 22:02 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs
On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> Another potential contention point might be patch 19. In that patch
> we make freeze_super() refuse to freeze the filesystem when there
> are open but unlinked files which may be impractical in some cases.
> The main reason for this is the problem with handling of file deletion
> from fput() called with mmap_sem held (e.g. from munmap(2)), and
> then there's the fact that we cannot really force such filesystem
> into a consistent state... But if people think that freezing with
> open but unlinked files should happen, then I have some possible
> solutions in mind (maybe as a separate patchset since this is
> large enough).
Looking at a desktop system, I think it is very typical that there
are open-unlinked files present, so I don't know if this is really
an acceptable solution. It isn't clear from your comments whether
this is a blanket refusal for all open-unlinked files, or only in
some particular cases...
lsof | grep deleted
nautilus 25393 adilger 19r REG 253,0 340 253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
nautilus 25393 adilger 20r REG 253,0 32768 253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
gnome-ter 25623 adilger 22u REG 0,18 17841 2717846 /tmp/vtePIRJCW (deleted)
gnome-ter 25623 adilger 23u REG 0,18 5568 2717847 /tmp/vteDCSJCW (deleted)
gnome-ter 25623 adilger 29u REG 0,18 480 2728484 /tmp/vte6C1TCW (deleted)
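For readers unfamiliar with how such "(deleted)" entries arise, the following minimal C sketch creates one: it opens a temporary file, unlinks it, and keeps using the descriptor. The helper name open_unlinked_nlink and the /tmp path template are illustrative choices, not taken from any of the programs listed above.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file, unlink it while it is still open, and keep writing to it.
 * Returns the link count seen through the open descriptor, or -1 on error.
 */
static long open_unlinked_nlink(void)
{
	char path[] = "/tmp/open-unlinked-XXXXXX";
	const char msg[] = "still here";
	struct stat st;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* Remove the only name; the inode lives on because fd is open. */
	if (unlink(path) != 0) {
		close(fd);
		return -1;
	}

	/* The file is still writable through the descriptor. */
	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
		close(fd);
		return -1;
	}

	if (fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}

	/* Dropping the last reference is what finally frees the inode. */
	close(fd);
	return (long)st.st_nlink;
}
```

While the descriptor is open, lsof would report the file as "(deleted)"; the final close() is the point at which the filesystem has to free the inode, which is the operation at issue on a frozen filesystem.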
Cheers, Andreas
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
2012-04-16 22:02 ` Andreas Dilger
@ 2012-04-17 0:43 ` Dave Chinner
2012-04-17 5:10 ` Andreas Dilger
2012-04-17 9:32 ` Jan Kara
1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-04-17 0:43 UTC (permalink / raw)
To: Andreas Dilger
Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs
On Mon, Apr 16, 2012 at 03:02:50PM -0700, Andreas Dilger wrote:
> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> > Another potential contention point might be patch 19. In that patch
> > we make freeze_super() refuse to freeze the filesystem when there
> > are open but unlinked files which may be impractical in some cases.
> > The main reason for this is the problem with handling of file deletion
> > from fput() called with mmap_sem held (e.g. from munmap(2)), and
> > then there's the fact that we cannot really force such filesystem
> > into a consistent state... But if people think that freezing with
> > open but unlinked files should happen, then I have some possible
> > solutions in mind (maybe as a separate patchset since this is
> > large enough).
>
> Looking at a desktop system, I think it is very typical that there
> are open-unlinked files present, so I don't know if this is really
> an acceptable solution. It isn't clear from your comments whether
> this is a blanket refusal for all open-unlinked files, or only in
> some particular cases...
>
> lsof | grep deleted
> nautilus 25393 adilger 19r REG 253,0 340 253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
> nautilus 25393 adilger 20r REG 253,0 32768 253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
> gnome-ter 25623 adilger 22u REG 0,18 17841 2717846 /tmp/vtePIRJCW (deleted)
> gnome-ter 25623 adilger 23u REG 0,18 5568 2717847 /tmp/vteDCSJCW (deleted)
> gnome-ter 25623 adilger 29u REG 0,18 480 2728484 /tmp/vte6C1TCW (deleted)
Unlinked-but-open files are the reason that XFS dirties the log
after the freeze process is complete. This ensures that if the
system crashes while the filesystem is frozen then log recovery
during the next mount will process the unlinked (orphaned) inodes
and free them correctly, i.e. you can still freeze a filesystem with
inodes in this state successfully and have everything behave as
you'd expect.
I'm not sure how other filesystems handle this problem, but perhaps
pushing this check down into filesystem specific code or adding a
superblock feature flag might be a way to allow filesystems to
handle this case in the way they think is best...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
2012-04-17 0:43 ` Dave Chinner
@ 2012-04-17 5:10 ` Andreas Dilger
2012-04-18 0:46 ` Chris Samuel
0 siblings, 1 reply; 12+ messages in thread
From: Andreas Dilger @ 2012-04-17 5:10 UTC (permalink / raw)
To: Dave Chinner
Cc: Jan Kara, J. Bruce Fields, ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
KONISHI Ryusuke, OGAWA Hirofumi,
linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Miklos Szeredi,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, Anton Altaparmakov,
linux-ext4-u79uwXL29TY76Z2rM5mHXA,
fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
xfs-VZNHf3L845pBDgjK7y7TUQ, Ben Myers, Joel Becker,
dchinner-H+wXaHxf7aLQT0dZR+AlfA, Steven Whitehouse, Chris Mason,
linux-nfs-u79uwXL29TY76Z2rM5mHXA, Alex Elder, Theodore Ts'o,
linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, LKML, Al Viro,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
linux-btrfs-u79uwXL29Tb/PtFMR13I2A
On 2012-04-16, at 5:43 PM, Dave Chinner wrote:
> On Mon, Apr 16, 2012 at 03:02:50PM -0700, Andreas Dilger wrote:
>> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
>>> Another potential contention point might be patch 19. In that patch
>>> we make freeze_super() refuse to freeze the filesystem when there
>>> are open but unlinked files which may be impractical in some cases.
>>> The main reason for this is the problem with handling of file deletion
>>> from fput() called with mmap_sem held (e.g. from munmap(2)), and
>>> then there's the fact that we cannot really force such filesystem
>>> into a consistent state... But if people think that freezing with
>>> open but unlinked files should happen, then I have some possible
>>> solutions in mind (maybe as a separate patchset since this is
>>> large enough).
>>
>> Looking at a desktop system, I think it is very typical that there
>> are open-unlinked files present, so I don't know if this is really
>> an acceptable solution. It isn't clear from your comments whether
>> this is a blanket refusal for all open-unlinked files, or only in
>> some particular cases...
>
> Unlinked-but-open files are the reason that XFS dirties the log
> after the freeze process is complete. This ensures that if the
> system crashes while the filesystem is frozen then log recovery
> during the next mount will process the unlinked (orphaned) inodes
> and free them correctly, i.e. you can still freeze a filesystem with
> inodes in this state successfully and have everything behave as
> you'd expect.
>
> I'm not sure how other filesystems handle this problem, but perhaps
> pushing this check down into filesystem specific code or adding a
> superblock feature flag might be a way to allow filesystems to
> handle this case in the way they think is best...
The ext3/4 code has long been able to handle open-unlinked files
properly after a crash (they are put into a singly-linked list from
the superblock on disk that is processed after journal recovery).
The issue here is that blocking freeze from succeeding with open-
unlinked files is an unreasonable assumption of this patch, and
I don't think it is acceptable to land this patchset (which IMHO
would prevent nearly every Gnome system from freezing unless these
apps have changed their behaviour in more recent releases).
Like you suggest, filesystems that handle this correctly should be
able to flag or otherwise indicate that this is OK, and allow the
freeze to continue. For other filesystems that do not handle
open-unlinked file consistency during a filesystem freeze/snapshot,
the question is whether this should even be considered a new case,
or is something that has existed for ages already.
The other question is whether this is still a problem even for
filesystems handling the consistency issue, but from Jan's comment
above there is a new locking issue related to mmap_sem being added?
Cheers, Andreas
--
Andreas Dilger Whamcloud, Inc.
Principal Lustre Engineer http://www.whamcloud.com/
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
2012-04-17 5:10 ` Andreas Dilger
@ 2012-04-18 0:46 ` Chris Samuel
0 siblings, 0 replies; 12+ messages in thread
From: Chris Samuel @ 2012-04-18 0:46 UTC (permalink / raw)
To: Andreas Dilger
Cc: Dave Chinner, Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel,
Alex Elder, Anton Altaparmakov, Ben Myers, Chris Mason,
cluster-devel, David S. Miller, fuse-devel, J. Bruce Fields,
Joel Becker, KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs,
linux-nilfs, linux-ntfs-dev, Mark Fasheh, Miklos Szeredi,
ocfs2-devel, OGAWA Hirofumi, Steven Whitehouse
On 17/04/12 15:10, Andreas Dilger wrote:
> (which IMHO would prevent nearly every Gnome system from freezing unless these
> apps have changed their behaviour in more recent releases).
It would also affect current KDE desktops as they tend to use MySQL for
Akonadi & Nepomuk, plus anyone using Chromium (and presumably Chrome):
samuel@eris:/tmp$ sudo lsof | grep deleted | awk '{print $1}' | sort |
uniq -c
[sudo] password for samuel:
32 chromium-
1 dovecot
5 imap-logi
5 mysqld
I would be surprised if you could find many systems that didn't have
files in this situation.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
2012-04-16 22:02 ` Andreas Dilger
2012-04-17 0:43 ` Dave Chinner
@ 2012-04-17 9:32 ` Jan Kara
[not found] ` <20120417093246.GD7198-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2012-04-17 9:32 UTC (permalink / raw)
To: Andreas Dilger
Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs
On Mon 16-04-12 15:02:50, Andreas Dilger wrote:
> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> > Another potential contention point might be patch 19. In that patch
> > we make freeze_super() refuse to freeze the filesystem when there
> > are open but unlinked files which may be impractical in some cases.
> > The main reason for this is the problem with handling of file deletion
> > from fput() called with mmap_sem held (e.g. from munmap(2)), and
> > then there's the fact that we cannot really force such filesystem
> > into a consistent state... But if people think that freezing with
> > open but unlinked files should happen, then I have some possible
> > solutions in mind (maybe as a separate patchset since this is
> > large enough).
>
> Looking at a desktop system, I think it is very typical that there
> are open-unlinked files present, so I don't know if this is really
> an acceptable solution. It isn't clear from your comments whether
> this is a blanket refusal for all open-unlinked files, or only in
> some particular cases...
Thanks for looking at this. It is currently a blanket refusal. And I
agree it's problematic. There are two problems with open but unlinked
files.
One is that some old filesystems cannot get in a consistent state in
presence of open but unlinked files but for filesystems we really care
about - xfs, ext4, ext3, btrfs, or even ocfs2, gfs2 - that is not a real
issue (these filesystems will delete those inodes on next mount read-write).
The other problem is with what should happen when you put last inode
reference on a frozen filesystem. Two possibilities I see are:
a) block the iput() call - that is inconvenient because it can be
called in various contexts. I think we could possibly use the same level of
freeze protection as for page fault (this has changed since I originally
thought about this and that would make things simpler) but I'm not
completely sure.
b) let the iput finish but filesystem will keep inode on its orphan list
(or its equivalent) and the inode will be deleted after the filesystem is
thawed. The advantage of this is we don't have to block iput(), the
disadvantage is we have to have filesystem support and not all filesystems
can do this.
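Option b) can be sketched as a toy userspace model. This is an assumption-laden illustration only: the orphan_* names, the fixed-size array and the counters are hypothetical, and a real filesystem would keep the orphan list on disk rather than in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ORPHAN_MAX 8	/* arbitrary capacity for the model */

struct orphan_fs {
	bool frozen;
	unsigned long orphans[ORPHAN_MAX];	/* inodes awaiting deletion */
	size_t n_orphans;
	size_t n_deleted;			/* inodes actually freed */
};

/*
 * Final reference drop on an unlinked inode. On a frozen filesystem the
 * deletion is only recorded; otherwise it happens immediately.
 * Returns 0 on success, -1 if the model's orphan list is full.
 */
static int orphan_iput_final(struct orphan_fs *fs, unsigned long ino)
{
	if (!fs->frozen) {
		fs->n_deleted++;
		return 0;
	}
	if (fs->n_orphans == ORPHAN_MAX)
		return -1;
	fs->orphans[fs->n_orphans++] = ino;
	return 0;
}

/* Thaw the filesystem and process every deferred deletion. */
static void orphan_thaw(struct orphan_fs *fs)
{
	fs->frozen = false;
	while (fs->n_orphans > 0) {
		fs->n_orphans--;
		fs->n_deleted++;
	}
}

/* Returns the number of inodes deleted after one freeze/thaw cycle. */
static size_t orphan_demo(void)
{
	struct orphan_fs fs = { .frozen = true };
	int rc;

	rc = orphan_iput_final(&fs, 253954);	/* deferred, not deleted */
	assert(rc == 0 && fs.n_deleted == 0 && fs.n_orphans == 1);
	rc = orphan_iput_final(&fs, 253964);
	assert(rc == 0 && fs.n_orphans == 2);

	orphan_thaw(&fs);			/* both deletions run now */
	assert(fs.n_orphans == 0);
	return fs.n_deleted;
}
```

The model shows why this variant never blocks iput(): the caller always returns immediately, and the cost is that the filesystem must be able to remember the pending deletions until the thaw.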
Any thoughts?
Honza
>
> lsof | grep deleted
> nautilus 25393 adilger 19r REG 253,0 340 253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
> nautilus 25393 adilger 20r REG 253,0 32768 253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
> gnome-ter 25623 adilger 22u REG 0,18 17841 2717846 /tmp/vtePIRJCW (deleted)
> gnome-ter 25623 adilger 23u REG 0,18 5568 2717847 /tmp/vteDCSJCW (deleted)
> gnome-ter 25623 adilger 29u REG 0,18 480 2728484 /tmp/vte6C1TCW (deleted)
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* [PATCH 00/27 v6] Fix filesystem freezing deadlocks
@ 2012-06-01 22:30 Jan Kara
2012-06-01 22:30 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2012-06-01 22:30 UTC (permalink / raw)
To: linux-fsdevel
Cc: Al Viro, dchinner, Jan Kara, Alex Elder, Anton Altaparmakov,
Ben Myers, Chris Mason, cluster-devel, David S. Miller,
fuse-devel, J. Bruce Fields, Joel Becker, KONISHI Ryusuke,
linux-btrfs, linux-ext4, linux-nfs, linux-nilfs, linux-ntfs-dev,
Mark Fasheh, Miklos Szeredi, ocfs2-devel, OGAWA Hirofumi,
Steven Whitehouse, Theodore Ts'o, xfs
Hello,
here is the sixth iteration of my patches to improve filesystem freezing.
The change since last iteration is that filesystem can be frozen with open but
unlinked files. After some thinking, I've decided that the best way to handle
this is to block removal inside ->evict_inode() of each filesystem and use
fs-internal level of freeze protection for that (usually I've instrumented
filesystem's transaction system to use freeze protection). Handling
inside VFS would be less work but the only level of freeze protection that
has a chance of not causing deadlocks is the one used for page faults and even
there it's not clear lock ordering would be correct wrt some fs-specific locks.
I've converted ext2, ext4, btrfs, xfs, nilfs2, ocfs2, gfs2 and also checked
that ext3, reiserfs, jfs should work as well (they have their internal freeze
protection mechanisms, possibly they could be replaced by a generic one but
given these are mostly aging filesystems, it's not a real priority IMHO).
So finally I'm not aware of any pending issue with this patch set so if you
have some concern, please speak up!
Introductory text to first time readers:
Filesystem freezing is currently racy and thus we can end up with dirty data on
frozen filesystem (see changelog patch 13 for detailed race description). This
patch series aims at fixing this.
To be able to block all places where inodes get dirtied, I've moved filesystem
file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
freeze handling in mnt_want_write() / mnt_drop_write(). That however required
some code shuffling and changes to kern_path_create() (see patches 09-12). I
think the result is OK but opinions may differ ;). The advantage of this change
also is that all filesystems get freeze protection almost for free - even ext2
can handle freezing well now.
Another potential contention point might be patch 19. In that patch we make
freeze_super() refuse to freeze the filesystem when there are open but unlinked
files which may be impractical in some cases. The main reason for this is the
problem with handling of file deletion from fput() called with mmap_sem held
(e.g. from munmap(2)), and then there's the fact that we cannot really force
such filesystem into a consistent state... But if people think that freezing
with open but unlinked files should happen, then I have some possible
solutions in mind (maybe as a separate patchset since this is large enough).
I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
filesystem despite beating it with fsstress and bash-shared-mapping while
freezing and unfreezing for several hours (using ext4 and xfs) so I'm
reasonably confident this could finally be the right solution.
Changes since v5:
* handle unlinked & open files on frozen filesystem
* lockdep keys for freeze protection are now per filesystem type
* taught lockdep that freeze protection at lower level does not create
dependency when we already hold freeze protection at higher level
* rebased on 3.5-rc1-ish
Changes since v4:
* added a couple of Acked-by's
* added some comments & doc update
* added patches from series "Push file_update_time() into .page_mkwrite"
since it doesn't make much sense to keep them separate anymore
* rebased on top of 3.4-rc2
Changes since v3:
* added third level of freezing for fs internal purposes - hooked some
filesystems to use it (XFS, nilfs2)
* removed racy i_size check from filemap_mkwrite()
Changes since v2:
* completely rewritten
* freezing is now blocked at VFS entry points
* two stage freezing to handle both mmapped writes and other IO
The biggest changes since v1:
* have two counters to provide safe state transitions for SB_FREEZE_WRITE
and SB_FREEZE_TRANS states
* use percpu counters instead of own percpu structure
* added documentation fixes from the old fs freezing series
* converted XFS to use SB_FREEZE_TRANS counter instead of its private
m_active_trans counter
Honza
CC: Alex Elder <elder@kernel.org>
CC: Anton Altaparmakov <anton@tuxera.com>
CC: Ben Myers <bpm@sgi.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: cluster-devel@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: fuse-devel@lists.sourceforge.net
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Joel Becker <jlbec@evilplan.org>
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
CC: linux-btrfs@vger.kernel.org
CC: linux-ext4@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-nilfs@vger.kernel.org
CC: linux-ntfs-dev@lists.sourceforge.net
CC: Mark Fasheh <mfasheh@suse.com>
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: ocfs2-devel@oss.oracle.com
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: "Theodore Ts'o" <tytso@mit.edu>
CC: xfs@oss.sgi.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 17/27] ext4: Convert to new freezing mechanism
2012-06-01 22:30 [PATCH 00/27 v6] " Jan Kara
@ 2012-06-01 22:30 ` Jan Kara
2012-06-11 3:13 ` Ted Ts'o
0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2012-06-01 22:30 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Al Viro, dchinner, Jan Kara, linux-ext4, Theodore Ts'o
We remove most of frozen checks since upper layer takes care of blocking all
writes. We have to handle protection in ext4_page_mkwrite() in a special way
because we cannot use generic block_page_mkwrite(). Also we add a freeze
protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify
a frozen filesystem (we cannot easily instrument ext4_journal_start() /
ext4_journal_stop() with freeze protection because we are missing the
superblock pointer in ext4_journal_stop() in nojournal mode).
CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
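Note on the ext4_evict_inode() hunks below: the point of the three added sb_end_intwrite() calls is that every exit path taken after sb_start_intwrite() drops the protection again, while the early is_bad_inode() exit never took it. A minimal userspace sketch of that pairing follows, assuming a made-up depth counter in place of the per-superblock writer count; all toy_* names and the failure enum are hypothetical and only mirror the shape of the patched function.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-superblock count of fs-internal writers. */
static int toy_intwrite_depth;

static void toy_start_intwrite(void)
{
    toy_intwrite_depth++;
}

static void toy_end_intwrite(void)
{
    assert(toy_intwrite_depth > 0);
    toy_intwrite_depth--;
}

/* Hypothetical failure points, one per error exit in the patch. */
enum toy_fail { FAIL_NONE, FAIL_JOURNAL_START, FAIL_WITH_HANDLE };

/* Returns true if the inode was deleted, false on the no_delete path. */
static bool toy_evict_inode(bool bad_inode, enum toy_fail fail)
{
    if (bad_inode)
        goto no_delete;         /* protection not taken yet */

    toy_start_intwrite();
    if (fail == FAIL_JOURNAL_START) {
        toy_end_intwrite();     /* starting the handle failed */
        goto no_delete;
    }
    if (fail == FAIL_WITH_HANDLE) {
        /* the handle would be stopped here first */
        toy_end_intwrite();
        goto no_delete;
    }

    /* truncate and free the inode under the handle */
    toy_end_intwrite();
    return true;

no_delete:
    return false;
}

/* Exercise every exit path; returns 0 if protection is always balanced. */
int toy_evict_demo(void)
{
    if (toy_evict_inode(true, FAIL_NONE) || toy_intwrite_depth != 0)
        return 1;
    if (toy_evict_inode(false, FAIL_JOURNAL_START) || toy_intwrite_depth != 0)
        return 2;
    if (toy_evict_inode(false, FAIL_WITH_HANDLE) || toy_intwrite_depth != 0)
        return 3;
    if (!toy_evict_inode(false, FAIL_NONE) || toy_intwrite_depth != 0)
        return 4;
    return 0;
}
```

A leaked count on any of these paths would correspond to a freeze that never completes, which is why each goto no_delete after the start call is preceded by the matching end call.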
fs/ext4/inode.c | 15 ++++++++++-----
fs/ext4/mmp.c | 14 ++++++++++----
fs/ext4/super.c | 31 +++++++------------------------
3 files changed, 27 insertions(+), 33 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 07eaf56..4884127 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -166,6 +166,11 @@ void ext4_evict_inode(struct inode *inode)
if (is_bad_inode(inode))
goto no_delete;
+ /*
+ * Protect us against freezing - iput() caller didn't have to have any
+ * protection against it
+ */
+ sb_start_intwrite(inode->i_sb);
handle = ext4_journal_start(inode, ext4_blocks_for_truncate(inode)+3);
if (IS_ERR(handle)) {
ext4_std_error(inode->i_sb, PTR_ERR(handle));
@@ -175,6 +180,7 @@ void ext4_evict_inode(struct inode *inode)
* cleaned up.
*/
ext4_orphan_del(NULL, inode);
+ sb_end_intwrite(inode->i_sb);
goto no_delete;
}
@@ -206,6 +212,7 @@ void ext4_evict_inode(struct inode *inode)
stop_handle:
ext4_journal_stop(handle);
ext4_orphan_del(NULL, inode);
+ sb_end_intwrite(inode->i_sb);
goto no_delete;
}
}
@@ -234,6 +241,7 @@ void ext4_evict_inode(struct inode *inode)
else
ext4_free_inode(handle, inode);
ext4_journal_stop(handle);
+ sb_end_intwrite(inode->i_sb);
return;
no_delete:
ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
@@ -4606,11 +4614,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
get_block_t *get_block;
int retries = 0;
- /*
- * This check is racy but catches the common case. We rely on
- * __block_page_mkwrite() to do a reliable check.
- */
- vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+ sb_start_pagefault(inode->i_sb);
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
@@ -4678,5 +4682,6 @@ retry_alloc:
out_ret:
ret = block_page_mkwrite_return(ret);
out:
+ sb_end_pagefault(inode->i_sb);
return ret;
}
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index ed6548d..4f63f90 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -10,14 +10,20 @@
* Write the MMP block using WRITE_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
{
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
mark_buffer_dirty(bh);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(WRITE_SYNC, bh);
wait_on_buffer(bh);
+ sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return 1;
@@ -120,7 +126,7 @@ static int kmmpd(void *data)
mmp->mmp_time = cpu_to_le64(get_seconds());
last_update_time = jiffies;
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
/*
* Don't spew too many error messages. Print one every
* (s_mmp_update_interval * 60) seconds.
@@ -200,7 +206,7 @@ static int kmmpd(void *data)
mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
mmp->mmp_time = cpu_to_le64(get_seconds());
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
failed:
kfree(data);
@@ -299,7 +305,7 @@ skip:
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(bh);
+ retval = write_mmp_block(sb, bh);
if (retval)
goto failed;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 35b5954..cd6a516 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -290,33 +290,17 @@ static void ext4_put_nojournal(handle_t *handle)
* journal_end calls result in the superblock being marked dirty, so
* that sync() will call the filesystem's write_super callback if
* appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
*/
handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
{
journal_t *journal;
- handle_t *handle;
trace_ext4_journal_start(sb, nblocks, _RET_IP_);
if (sb->s_flags & MS_RDONLY)
return ERR_PTR(-EROFS);
+ WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
journal = EXT4_SB(sb)->s_journal;
- handle = ext4_journal_current_handle();
-
- /*
- * If a handle has been started, it should be allowed to
- * finish, otherwise deadlock could happen between freeze
- * and others(e.g. truncate) due to the restart of the
- * journal handle if the filesystem is forzen and active
- * handles are not stopped.
- */
- if (!handle)
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
if (!journal)
return ext4_get_nojournal();
/*
@@ -2633,6 +2617,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
sb = elr->lr_super;
ngroups = EXT4_SB(sb)->s_groups_count;
+ sb_start_write(sb);
for (group = elr->lr_next_group; group < ngroups; group++) {
gdp = ext4_get_group_desc(sb, group, NULL);
if (!gdp) {
@@ -2659,6 +2644,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
elr->lr_next_sched = jiffies + elr->lr_timeout;
elr->lr_next_group = group + 1;
}
+ sb_end_write(sb);
return ret;
}
@@ -4135,10 +4121,8 @@ int ext4_force_commit(struct super_block *sb)
return 0;
journal = EXT4_SB(sb)->s_journal;
- if (journal) {
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
+ if (journal)
ret = ext4_journal_force_commit(journal);
- }
return ret;
}
@@ -4170,9 +4154,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
* gives us a chance to flush the journal completely and mark the fs clean.
*
* Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer. It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
*/
static int ext4_freeze(struct super_block *sb)
{
@@ -4199,7 +4182,7 @@ static int ext4_freeze(struct super_block *sb)
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
error = ext4_commit_super(sb, 1);
out:
- /* we rely on s_frozen to stop further updates */
+ /* we rely on upper layer to stop further updates */
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
return error;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 17/27] ext4: Convert to new freezing mechanism
2012-06-01 22:30 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
@ 2012-06-11 3:13 ` Ted Ts'o
0 siblings, 0 replies; 12+ messages in thread
From: Ted Ts'o @ 2012-06-11 3:13 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-fsdevel, Al Viro, dchinner, linux-ext4
On Sat, Jun 02, 2012 at 12:30:31AM +0200, Jan Kara wrote:
> We remove most of frozen checks since upper layer takes care of blocking all
> writes. We have to handle protection in ext4_page_mkwrite() in a special way
> because we cannot use generic block_page_mkwrite(). Also we add a freeze
> protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify
> a frozen filesystem (we cannot easily instrument ext4_journal_start() /
> ext4_journal_stop() with freeze protection because we are missing the
> superblock pointer in ext4_journal_stop() in nojournal mode).
>
> CC: linux-ext4@vger.kernel.org
> CC: "Theodore Ts'o" <tytso@mit.edu>
> BugLink: https://bugs.launchpad.net/bugs/897421
> Tested-by: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Tested-by: Massimo Morana <massimo.morana@canonical.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
- Ted
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 00/27 v7] Fix filesystem freezing deadlocks
@ 2012-06-12 14:20 Jan Kara
2012-06-12 14:20 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2012-06-12 14:20 UTC (permalink / raw)
To: Al Viro
Cc: Jan Kara, J. Bruce Fields, KONISHI Ryusuke,
linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Miklos Szeredi,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, Chris Mason,
linux-ext4-u79uwXL29TY76Z2rM5mHXA,
fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
linux-fsdevel-AlSwsSmVLrQ, xfs-VZNHf3L845pBDgjK7y7TUQ, Ben Myers,
Joel Becker, Anton Altaparmakov, Steven Whitehouse,
OGAWA Hirofumi, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Alex Elder,
Theodore Ts'o,
linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, LKML,
ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g, David S. Miller,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA
Hello,
here is the seventh iteration of my patches to improve filesystem freezing.
I've rebased patches on top of 3.5-rc2 as Al requested. Otherwise I've just
fixed some outdated text in the introduction below and added one ack.
Introductory text to first time readers:
Filesystem freezing is currently racy and thus we can end up with dirty data on
frozen filesystem (see changelog patch 13 for detailed race description). This
patch series aims at fixing this.
To be able to block all places where inodes get dirtied, I've moved filesystem
file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
freeze handling in mnt_want_write() / mnt_drop_write(). That however required
some code shuffling and changes to kern_path_create() (see patches 09-12). I
think the result is OK but opinions may differ ;). The advantage of this change
also is that all filesystems get freeze protection almost for free - even ext2
can handle freezing well now.
I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
filesystem despite beating it with fsstress, bash-shared-mapping, and
aio-stress while freezing and unfreezing for several hours (using ext4 and xfs)
so I'm reasonably confident this could finally be the right solution.
Changes since v6:
* rebased on 3.5-rc2
* added ack
Changes since v5:
* handle unlinked & open files on frozen filesystem
* lockdep keys for freeze protection are now per filesystem type
* taught lockdep that freeze protection at lower level does not create
dependency when we already hold freeze protection at higher level
* rebased on 3.5-rc1-ish
Changes since v4:
* added a couple of Acked-by's
* added some comments & doc update
* added patches from series "Push file_update_time() into .page_mkwrite"
since it doesn't make much sense to keep them separate anymore
* rebased on top of 3.4-rc2
Changes since v3:
* added third level of freezing for fs internal purposes - hooked some
filesystems to use it (XFS, nilfs2)
* removed racy i_size check from filemap_mkwrite()
Changes since v2:
* completely rewritten
* freezing is now blocked at VFS entry points
* two stage freezing to handle both mmapped writes and other IO
The biggest changes since v1:
* have two counters to provide safe state transitions for SB_FREEZE_WRITE
and SB_FREEZE_TRANS states
* use percpu counters instead of own percpu structure
* added documentation fixes from the old fs freezing series
* converted XFS to use SB_FREEZE_TRANS counter instead of its private
m_active_trans counter
Honza
CC: Alex Elder <elder-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Anton Altaparmakov <anton-yrGDUoBaLx3QT0dZR+AlfA@public.gmane.org>
CC: Ben Myers <bpm-sJ/iWh9BUns@public.gmane.org>
CC: Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
CC: cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
CC: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
CC: Joel Becker <jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org>
CC: KONISHI Ryusuke <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
CC: linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: Mark Fasheh <mfasheh-IBi9RG/b67k@public.gmane.org>
CC: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
CC: ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org
CC: OGAWA Hirofumi <hirofumi-UIVanBePwB70ZhReMnHkpc8NsWr+9BEh@public.gmane.org>
CC: Steven Whitehouse <swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
CC: xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 17/27] ext4: Convert to new freezing mechanism
2012-06-12 14:20 [PATCH 00/27 v7] Fix filesystem freezing deadlocks Jan Kara
@ 2012-06-12 14:20 ` Jan Kara
0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2012-06-12 14:20 UTC (permalink / raw)
To: Al Viro; +Cc: LKML, linux-fsdevel, Jan Kara, linux-ext4, Theodore Ts'o
We remove most of frozen checks since upper layer takes care of blocking all
writes. We have to handle protection in ext4_page_mkwrite() in a special way
because we cannot use generic block_page_mkwrite(). Also we add a freeze
protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify
a frozen filesystem (we cannot easily instrument ext4_journal_start() /
ext4_journal_stop() with freeze protection because we are missing the
superblock pointer in ext4_journal_stop() in nojournal mode).
CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 15 ++++++++++-----
fs/ext4/mmp.c | 6 ++++++
fs/ext4/super.c | 31 +++++++------------------------
3 files changed, 23 insertions(+), 29 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02bc8cb..301e1c2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -233,6 +233,11 @@ void ext4_evict_inode(struct inode *inode)
if (is_bad_inode(inode))
goto no_delete;
+ /*
+ * Protect us against freezing - iput() caller didn't have to have any
+ * protection against it
+ */
+ sb_start_intwrite(inode->i_sb);
handle = ext4_journal_start(inode, ext4_blocks_for_truncate(inode)+3);
if (IS_ERR(handle)) {
ext4_std_error(inode->i_sb, PTR_ERR(handle));
@@ -242,6 +247,7 @@ void ext4_evict_inode(struct inode *inode)
* cleaned up.
*/
ext4_orphan_del(NULL, inode);
+ sb_end_intwrite(inode->i_sb);
goto no_delete;
}
@@ -273,6 +279,7 @@ void ext4_evict_inode(struct inode *inode)
stop_handle:
ext4_journal_stop(handle);
ext4_orphan_del(NULL, inode);
+ sb_end_intwrite(inode->i_sb);
goto no_delete;
}
}
@@ -301,6 +308,7 @@ void ext4_evict_inode(struct inode *inode)
else
ext4_free_inode(handle, inode);
ext4_journal_stop(handle);
+ sb_end_intwrite(inode->i_sb);
return;
no_delete:
ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
@@ -4701,11 +4709,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
get_block_t *get_block;
int retries = 0;
- /*
- * This check is racy but catches the common case. We rely on
- * __block_page_mkwrite() to do a reliable check.
- */
- vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+ sb_start_pagefault(inode->i_sb);
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
@@ -4773,5 +4777,6 @@ retry_alloc:
out_ret:
ret = block_page_mkwrite_return(ret);
out:
+ sb_end_pagefault(inode->i_sb);
return ret;
}
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index f99a131..fe7c63f 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -44,6 +44,11 @@ static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
mark_buffer_dirty(bh);
lock_buffer(bh);
@@ -51,6 +56,7 @@ static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
get_bh(bh);
submit_bh(WRITE_SYNC, bh);
wait_on_buffer(bh);
+ sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return 1;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb7aa3e..bd6a415 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -332,33 +332,17 @@ static void ext4_put_nojournal(handle_t *handle)
* journal_end calls result in the superblock being marked dirty, so
* that sync() will call the filesystem's write_super callback if
* appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
*/
handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
{
journal_t *journal;
- handle_t *handle;
trace_ext4_journal_start(sb, nblocks, _RET_IP_);
if (sb->s_flags & MS_RDONLY)
return ERR_PTR(-EROFS);
+ WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
journal = EXT4_SB(sb)->s_journal;
- handle = ext4_journal_current_handle();
-
- /*
- * If a handle has been started, it should be allowed to
- * finish, otherwise deadlock could happen between freeze
- * and others(e.g. truncate) due to the restart of the
- * journal handle if the filesystem is forzen and active
- * handles are not stopped.
- */
- if (!handle)
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
if (!journal)
return ext4_get_nojournal();
/*
@@ -2723,6 +2707,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
sb = elr->lr_super;
ngroups = EXT4_SB(sb)->s_groups_count;
+ sb_start_write(sb);
for (group = elr->lr_next_group; group < ngroups; group++) {
gdp = ext4_get_group_desc(sb, group, NULL);
if (!gdp) {
@@ -2749,6 +2734,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
elr->lr_next_sched = jiffies + elr->lr_timeout;
elr->lr_next_group = group + 1;
}
+ sb_end_write(sb);
return ret;
}
@@ -4302,10 +4288,8 @@ int ext4_force_commit(struct super_block *sb)
return 0;
journal = EXT4_SB(sb)->s_journal;
- if (journal) {
- vfs_check_frozen(sb, SB_FREEZE_TRANS);
+ if (journal)
ret = ext4_journal_force_commit(journal);
- }
return ret;
}
@@ -4337,9 +4321,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
* gives us a chance to flush the journal completely and mark the fs clean.
*
* Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer. It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
*/
static int ext4_freeze(struct super_block *sb)
{
@@ -4366,7 +4349,7 @@ static int ext4_freeze(struct super_block *sb)
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
error = ext4_commit_super(sb, 1);
out:
- /* we rely on s_frozen to stop further updates */
+ /* we rely on upper layer to stop further updates */
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
return error;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-06-12 14:20 UTC | newest]
Thread overview: 12+ messages
2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
[not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2012-04-16 16:16 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
2012-04-16 22:02 ` Andreas Dilger
2012-04-17 0:43 ` Dave Chinner
2012-04-17 5:10 ` Andreas Dilger
2012-04-18 0:46 ` Chris Samuel
2012-04-17 9:32 ` Jan Kara
[not found] ` <20120417093246.GD7198-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-17 19:34 ` Joel Becker
-- strict thread matches above, loose matches on Subject: below --
2012-06-01 22:30 [PATCH 00/27 v6] " Jan Kara
2012-06-01 22:30 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
2012-06-11 3:13 ` Ted Ts'o
2012-06-12 14:20 [PATCH 00/27 v7] Fix filesystem freezing deadlocks Jan Kara
2012-06-12 14:20 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara