[PATCH 00/19] Fix filesystem freezing deadlocks

linux-ext4.vger.kernel.org archive mirror

* [PATCH 00/19] Fix filesystem freezing deadlocks
@ 2012-03-05 16:00 Jan Kara
  2012-03-05 16:01 ` [PATCH 10/19] ext4: Convert to new freezing mechanism Jan Kara
  2012-03-11 20:22 ` [PATCH 00/19] Fix filesystem freezing deadlocks Kamal Mostafa
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Kara @ 2012-03-05 16:00 UTC (permalink / raw)
  To: LKML
  Cc: Jan Kara, J. Bruce Fields, cluster-devel-H+wXaHxf7aLQT0dZR+AlfA,
	ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g, KONISHI Ryusuke,
	OGAWA Hirofumi, sandeen-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Miklos Szeredi,
	Christoph Hellwig, Anton Altaparmakov,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
	xfs-VZNHf3L845pBDgjK7y7TUQ, Ben Myers, Joel Becker,
	dchinner-H+wXaHxf7aLQT0dZR+AlfA, Steven Whitehouse, Chris Mason,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Alex Elder, Theodore Ts'o,
	linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Kamal Mostafa,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, "D

  Hallelujah,

  after a couple of weeks and several rewrites, here comes the third iteration
of my patches to improve filesystem freezing.  Filesystem freezing is currently
racy and thus we can end up with dirty data on a frozen filesystem (see the
changelog of patch 06 for a detailed description of the race). This patch
series aims at fixing this.

To be able to block all places where inodes get dirtied, I've moved filesystem
freeze handling into mnt_want_write() / mnt_drop_write(). This however required
some code shuffling and changes to kern_path_create() (see patches 02-05). I
think the result is OK but opinions may differ ;). Another advantage of this
change is that all filesystems get freeze protection almost for free - even
ext2 can handle freezing well now.
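
To illustrate why filesystems get protection "almost for free", here is a
minimal userspace sketch. It is a single-threaded model, not kernel code: every
name in it (gate_sb, gate_want_write(), gate_vfs_modify(), ...) is hypothetical,
and where the real mnt_want_write() would sleep until the filesystem is thawed,
the model simply returns -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct gate_sb {
	bool frozen;	/* set once a freeze has completed */
	int writers;	/* callers currently between want/drop */
	int dirty_ops;	/* modifications that reached the filesystem */
};

/* Models mnt_want_write(): the real call sleeps, the model fails instead. */
static int gate_want_write(struct gate_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	sb->writers++;
	return 0;
}

/* Models mnt_drop_write(). */
static void gate_drop_write(struct gate_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/* A filesystem-specific operation: it contains no freeze check at all. */
typedef void (*gate_fs_op)(struct gate_sb *sb);

static void gate_ext2_create(struct gate_sb *sb)
{
	sb->dirty_ops++;
}

/* Generic entry point: the protection wraps whatever the filesystem does. */
static int gate_vfs_modify(struct gate_sb *sb, gate_fs_op op)
{
	int err = gate_want_write(sb);

	if (err)
		return err;
	op(sb);
	gate_drop_write(sb);
	return 0;
}

/* Freeze succeeds only with no active writer (real code would wait). */
static int gate_freeze(struct gate_sb *sb)
{
	if (sb->writers > 0)
		return -EBUSY;
	sb->frozen = true;
	return 0;
}
```

Calling gate_vfs_modify(&sb, gate_ext2_create) before gate_freeze() dirties the
model filesystem; the same call after a successful freeze is turned away before
the filesystem-specific function ever runs.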

Another potential point of contention might be patch 19. In that patch we make
freeze_super() refuse to freeze the filesystem when there are open but unlinked
files, which may be impractical in some cases. The main reason for this is the
problem with handling of file deletion from fput() called with mmap_sem held
(e.g. from munmap(2)), and then there's the fact that we cannot really force
such a filesystem into a consistent state... But if people think that freezing
with open but unlinked files should happen, then I have some possible
solutions in mind (maybe as a separate patchset since this one is large enough).
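
The behaviour described for patch 19 could be modeled roughly as below. This is
only a sketch under assumptions: the structure, the counter of open but
unlinked files, and the function names are all made up for illustration, and
the error value returned for the refusal is a guess, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct unl_sb {
	bool frozen;
	int open_unlinked;	/* files with no links left that are still open */
};

/* unlink() of a file that some process still holds open. */
static void unl_unlink_open_file(struct unl_sb *sb)
{
	sb->open_unlinked++;
}

/* Last close of such a file: the deferred deletion can finally run. */
static void unl_last_close(struct unl_sb *sb)
{
	assert(sb->open_unlinked > 0);
	sb->open_unlinked--;
}

/* Models the refusal; the -EBUSY for unlinked files is an assumption. */
static int unl_freeze_super(struct unl_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	if (sb->open_unlinked > 0)
		return -EBUSY;
	sb->frozen = true;
	return 0;
}
```

In this model a freeze attempted between unl_unlink_open_file() and
unl_last_close() fails, and the same attempt succeeds once the last reference
is gone, so the deletion never has to run against a frozen filesystem.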

I'm not able to hit any deadlocks, lockdep warnings, or dirty data on a frozen
filesystem despite beating it with fsstress and bash-shared-mapping while
freezing and unfreezing for several hours (using ext4 and xfs), so I'm
reasonably confident this could finally be the right solution.

And for people wanting to test - this patchset is based on the patch series
"Push file_update_time() into .page_mkwrite" so you'll need to pull that one
in as well.

Changes since v2:
  * completely rewritten
  * freezing is now blocked at VFS entry points
  * two stage freezing to handle both mmapped writes and other IO

The biggest changes since v1:
  * have two counters to provide safe state transitions for SB_FREEZE_WRITE
    and SB_FREEZE_TRANS states
  * use percpu counters instead of own percpu structure
  * added documentation fixes from the old fs freezing series
  * converted XFS to use SB_FREEZE_TRANS counter instead of its private
    m_active_trans counter
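
The idea of staged freezing with one counter per stage can be sketched in
userspace as follows. This is a simplified single-threaded model with
hypothetical names; the ordering used here (ordinary writes first, page faults
second) is an assumption of the sketch, and where the real code would wait for
a counter to drain, the model gives up and returns -EBUSY.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stage numbers for the model, not the kernel's constants. */
enum { ST_UNFROZEN = 0, ST_WRITE = 1, ST_PAGEFAULT = 2, ST_COMPLETE = 3 };

struct staged_sb {
	int frozen;	/* highest stage reached so far */
	int writers[2];	/* [0] ordinary writes, [1] page faults */
};

/* Enter a protected section at the given stage (ST_WRITE or ST_PAGEFAULT). */
static int staged_start(struct staged_sb *sb, int level)
{
	if (sb->frozen >= level)
		return -EBUSY;	/* real code would sleep until thaw */
	sb->writers[level - 1]++;
	return 0;
}

/* Leave a protected section. */
static void staged_end(struct staged_sb *sb, int level)
{
	assert(sb->writers[level - 1] > 0);
	sb->writers[level - 1]--;
}

/* Walk through the stages, closing each gate before checking its counter. */
static int staged_freeze(struct staged_sb *sb)
{
	int level;

	if (sb->frozen != ST_UNFROZEN)
		return -EBUSY;
	for (level = ST_WRITE; level <= ST_PAGEFAULT; level++) {
		sb->frozen = level;	/* no new entries at this stage */
		if (sb->writers[level - 1] > 0) {
			/* real code would wait for the counter to drain */
			sb->frozen = ST_UNFROZEN;
			return -EBUSY;
		}
	}
	sb->frozen = ST_COMPLETE;
	return 0;
}
```

Because the gate for a stage is closed before its counter is inspected, nobody
can slip in between the check and the state change, which is the kind of safe
state transition the counters are meant to provide.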

								Honza

CC: Alex Elder <elder-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Anton Altaparmakov <anton-yrGDUoBaLx3QT0dZR+AlfA@public.gmane.org>
CC: Ben Myers <bpm-sJ/iWh9BUns@public.gmane.org>
CC: Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
CC: cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
CC: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
CC: Joel Becker <jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org>
CC: KONISHI Ryusuke <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
CC: linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: Mark Fasheh <mfasheh-IBi9RG/b67k@public.gmane.org>
CC: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
CC: ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org
CC: OGAWA Hirofumi <hirofumi-UIVanBePwB70ZhReMnHkpc8NsWr+9BEh@public.gmane.org>
CC: Steven Whitehouse <swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
CC: xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org

------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2


* [PATCH 10/19] ext4: Convert to new freezing mechanism
  2012-03-05 16:00 [PATCH 00/19] Fix filesystem freezing deadlocks Jan Kara
@ 2012-03-05 16:01 ` Jan Kara
  2012-03-07 22:32   ` Kamal Mostafa
  2012-03-11 20:22 ` [PATCH 00/19] Fix filesystem freezing deadlocks Kamal Mostafa
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Kara @ 2012-03-05 16:01 UTC (permalink / raw)
  To: LKML
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, dchinner, sandeen,
	Kamal Mostafa, Jan Kara, linux-ext4, Theodore Ts'o

We remove most of the frozen checks since the upper layer takes care
of blocking all writes. We only have to handle protection in
ext4_page_mkwrite() in a special way because we cannot use the
generic block_page_mkwrite().

CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |    7 ++-----
 fs/ext4/super.c |   29 +++++------------------------
 2 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..c65baf9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4593,11 +4593,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	get_block_t *get_block;
 	int retries = 0;
 
-	/*
-	 * This check is racy but catches the common case. We rely on
-	 * __block_page_mkwrite() to do a reliable check.
-	 */
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_pagefault(inode->i_sb);
 	/* Delalloc case is easy... */
 	if (test_opt(inode->i_sb, DELALLOC) &&
 	    !ext4_should_journal_data(inode) &&
@@ -4665,5 +4661,6 @@ retry_alloc:
 out_ret:
 	ret = block_page_mkwrite_return(ret);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 502c61f..0f1024a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -289,33 +289,17 @@ static void ext4_put_nojournal(handle_t *handle)
  * journal_end calls result in the superblock being marked dirty, so
  * that sync() will call the filesystem's write_super callback if
  * appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
  */
 handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 {
 	journal_t *journal;
-	handle_t  *handle;
 
 	trace_ext4_journal_start(sb, nblocks, _RET_IP_);
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);
 
+	WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
 	journal = EXT4_SB(sb)->s_journal;
-	handle = ext4_journal_current_handle();
-
-	/*
-	 * If a handle has been started, it should be allowed to
-	 * finish, otherwise deadlock could happen between freeze
-	 * and others(e.g. truncate) due to the restart of the
-	 * journal handle if the filesystem is forzen and active
-	 * handles are not stopped.
-	 */
-	if (!handle)
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
 	if (!journal)
 		return ext4_get_nojournal();
 	/*
@@ -4280,10 +4264,8 @@ int ext4_force_commit(struct super_block *sb)
 		return 0;
 
 	journal = EXT4_SB(sb)->s_journal;
-	if (journal) {
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
+	if (journal)
 		ret = ext4_journal_force_commit(journal);
-	}
 
 	return ret;
 }
@@ -4315,9 +4297,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
  * gives us a chance to flush the journal completely and mark the fs clean.
  *
  * Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer.  It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
  */
 static int ext4_freeze(struct super_block *sb)
 {
@@ -4344,7 +4325,7 @@ static int ext4_freeze(struct super_block *sb)
 	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 	error = ext4_commit_super(sb, 1);
 out:
-	/* we rely on s_frozen to stop further updates */
+	/* we rely on upper layer to stop further updates */
 	jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
 	return error;
 }
-- 
1.7.1


* Re: [PATCH 10/19] ext4: Convert to new freezing mechanism
  2012-03-05 16:01 ` [PATCH 10/19] ext4: Convert to new freezing mechanism Jan Kara
@ 2012-03-07 22:32   ` Kamal Mostafa
  2012-03-08  9:05     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Kamal Mostafa @ 2012-03-07 22:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-fsdevel, Al Viro, Christoph Hellwig, dchinner,
	sandeen, linux-ext4, Theodore Ts'o


Re: the patch set:
    [PATCH 00/19] Fix filesystem freezing deadlocks

In my initial smoke testing of this, I find that if I freeze a newly
created ext4 filesystem immediately after mounting it for the very first
time, then I get the new SB_FREEZE_COMPLETE warning from
ext4_journal_start_sb() every 0.4 seconds from ext4lazyinit...

        # mkfs -t ext4 /dev/sdaX
        # mount /dev/sdaX /mnt
        # fsfreeze -f /mnt

         WARNING:
        at /home/kamal/src/linux/ubuntu-precise/fs/ext4/super.c:301
        ext4_journal_start_sb+0x159/0x160()
        
         Pid: 3252, comm: ext4lazyinit Tainted: G        W
        3.2.0-18-generic #28+kamal1+jankara1
        
         Call Trace:
          [<ffffffff8106724f>] warn_slowpath_common+0x7f/0xc0
          [<ffffffff810672aa>] warn_slowpath_null+0x1a/0x20
          [<ffffffff812352c9>] ext4_journal_start_sb+0x159/0x160
          [<ffffffff8121326b>] ? ext4_init_inode_table+0xab/0x370
          [<ffffffff8121326b>] ext4_init_inode_table+0xab/0x370
          [<ffffffff81659cb5>] ? schedule_timeout+0x175/0x320
          [<ffffffff81226905>] ext4_run_li_request+0x85/0xe0
          [<ffffffff812269fc>] ext4_lazyinit_thread+0x9c/0x1c0
          [<ffffffff81226960>] ? ext4_run_li_request+0xe0/0xe0
          [<ffffffff8108a39c>] kthread+0x8c/0xa0
          [<ffffffff81665e34>] kernel_thread_helper+0x4/0x10
          [<ffffffff8108a310>] ? flush_kthread_worker+0xa0/0xa0
          [<ffffffff81665e30>] ? gs_change+0x13/0x13

 -Kamal





* Re: [PATCH 10/19] ext4: Convert to new freezing mechanism
  2012-03-07 22:32   ` Kamal Mostafa
@ 2012-03-08  9:05     ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2012-03-08  9:05 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Jan Kara, LKML, linux-fsdevel, Al Viro, Christoph Hellwig,
	dchinner, sandeen, linux-ext4, Theodore Ts'o


On Wed 07-03-12 14:32:13, Kamal Mostafa wrote:
> Re: the patch set:
>     [PATCH 00/19] Fix filesystem freezing deadlocks
> 
> In my initial smoke testing of this, I find that if I freeze a newly
> created ext4 filesystem immediately after mounting it for the very first
> time, then I get the new SB_FREEZE_COMPLETE warning from
> ext4_journal_start_sb() every 0.4 seconds from ext4lazyinit...
> 
>         # mkfs -t ext4 /dev/sdaX
>         # mount /dev/sdaX /mnt
>         # fsfreeze -f /mnt
> 
>          WARNING:
>         at /home/kamal/src/linux/ubuntu-precise/fs/ext4/super.c:301
>         ext4_journal_start_sb+0x159/0x160()
>         
>          Pid: 3252, comm: ext4lazyinit Tainted: G        W
>         3.2.0-18-generic #28+kamal1+jankara1
>         
>          Call Trace:
>           [<ffffffff8106724f>] warn_slowpath_common+0x7f/0xc0
>           [<ffffffff810672aa>] warn_slowpath_null+0x1a/0x20
>           [<ffffffff812352c9>] ext4_journal_start_sb+0x159/0x160
>           [<ffffffff8121326b>] ? ext4_init_inode_table+0xab/0x370
>           [<ffffffff8121326b>] ext4_init_inode_table+0xab/0x370
>           [<ffffffff81659cb5>] ? schedule_timeout+0x175/0x320
>           [<ffffffff81226905>] ext4_run_li_request+0x85/0xe0
>           [<ffffffff812269fc>] ext4_lazyinit_thread+0x9c/0x1c0
>           [<ffffffff81226960>] ? ext4_run_li_request+0xe0/0xe0
>           [<ffffffff8108a39c>] kthread+0x8c/0xa0
>           [<ffffffff81665e34>] kernel_thread_helper+0x4/0x10
>           [<ffffffff8108a310>] ? flush_kthread_worker+0xa0/0xa0
>           [<ffffffff81665e30>] ? gs_change+0x13/0x13
  Ah, good point. Thanks for spotting this. I forgot about the lazyinit
thread. Attached patch fixes the problem (I've folded it into ext4
conversion patch in my series).

								Honza
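
For readers following along, the effect of the fix can be modeled in userspace
like this. It is only a sketch: all names are hypothetical, the "warning" is a
counter standing in for the WARN_ON in ext4_journal_start_sb(), and the
protected variant returns -EBUSY where the real sb_start_write() would put the
lazyinit thread to sleep until the filesystem is thawed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the lazyinit problem; not kernel code. */
struct li_sb {
	bool frozen;		/* stands in for the fully frozen state */
	int writers;		/* holders of write protection */
	int warnings;		/* how often the modeled WARN_ON fired */
	int groups_done;	/* inode-table groups initialized */
};

/* Models the journal start: warn if a handle starts on a frozen fs. */
static void li_journal_start(struct li_sb *sb)
{
	if (sb->frozen)
		sb->warnings++;
}

/* One step of inode-table initialization; it needs a journal handle. */
static void li_init_group(struct li_sb *sb)
{
	li_journal_start(sb);
	sb->groups_done++;
}

/* Before the fix: the background thread ignores the freeze state. */
static int li_run_unprotected(struct li_sb *sb)
{
	li_init_group(sb);
	return 0;
}

/* After the fix: take write protection around the whole request. */
static int li_run_protected(struct li_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;	/* real code would sleep here instead */
	sb->writers++;
	li_init_group(sb);
	sb->writers--;
	return 0;
}
```

On a frozen model filesystem li_run_unprotected() bumps the warning counter,
which mirrors the report above, while li_run_protected() never reaches the
journal start at all.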


-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: ext4_lazyinit_freeze_fix.diff --]
[-- Type: text/x-patch, Size: 626 bytes --]

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0f1024a..039b1e0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2756,6 +2756,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
+	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2782,6 +2783,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
+	sb_end_write(sb);
 
 	return ret;
 }


* Re: [PATCH 00/19] Fix filesystem freezing deadlocks
  2012-03-05 16:00 [PATCH 00/19] Fix filesystem freezing deadlocks Jan Kara
  2012-03-05 16:01 ` [PATCH 10/19] ext4: Convert to new freezing mechanism Jan Kara
@ 2012-03-11 20:22 ` Kamal Mostafa
  1 sibling, 0 replies; 6+ messages in thread
From: Kamal Mostafa @ 2012-03-11 20:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, linux-fsdevel, Al Viro, Christoph Hellwig, dchinner,
	sandeen, Alex Elder, Anton Altaparmakov, Ben Myers, Chris Mason,
	cluster-devel, David S. Miller, fuse-devel, J. Bruce Fields,
	Joel Becker, KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs,
	linux-nilfs, linux-ntfs-dev, Mark Fasheh, Miklos Szeredi,
	ocfs2-devel, OGAWA Hirofumi, Steven Whitehouse, Th


On Mon, 2012-03-05 at 17:00 +0100, Jan Kara wrote:
> Hallelujah,
> 
>   after a couple of weeks and several rewrites, here comes the third iteration
> of my patches to improve filesystem freezing.  [...]

We've been testing this patch set at Canonical on the multipath failover
SAN configuration where we originally encountered the freeze deadlock.
We are happy to report that it does appear to fix the problem.  Thanks
Jan!

Please add the following endorsements for these patches (those actually
exercised by our test case):  01, 02, 03, 06, 07, 08, 09, 10, 14, 18, 19

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>

 -Kamal



* [PATCH 10/19] ext4: Convert to new freezing mechanism
  2012-03-28 23:43 [PATCH 00/19 v4] " Jan Kara
@ 2012-03-28 23:43 ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2012-03-28 23:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, dchinner, sandeen, Kamal Mostafa, Jan Kara, linux-ext4,
	Theodore Ts'o

We remove most of the frozen checks since the upper layer takes care
of blocking all writes. We only have to handle protection in
ext4_page_mkwrite() in a special way because we cannot use the
generic block_page_mkwrite().

CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |    7 ++-----
 fs/ext4/mmp.c   |   14 ++++++++++----
 fs/ext4/super.c |   31 +++++++------------------------
 3 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..c65baf9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4593,11 +4593,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	get_block_t *get_block;
 	int retries = 0;
 
-	/*
-	 * This check is racy but catches the common case. We rely on
-	 * __block_page_mkwrite() to do a reliable check.
-	 */
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_pagefault(inode->i_sb);
 	/* Delalloc case is easy... */
 	if (test_opt(inode->i_sb, DELALLOC) &&
 	    !ext4_should_journal_data(inode) &&
@@ -4665,5 +4661,6 @@ retry_alloc:
 out_ret:
 	ret = block_page_mkwrite_return(ret);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 7ea4ba4..9061bc7 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -10,14 +10,20 @@
  * Write the MMP block using WRITE_SYNC to try to get the block on-disk
  * faster.
  */
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
 {
+	/*
+	 * We protect against freezing so that we don't create dirty buffers
+	 * on frozen filesystem.
+	 */
+	sb_start_write(sb);
 	mark_buffer_dirty(bh);
 	lock_buffer(bh);
 	bh->b_end_io = end_buffer_write_sync;
 	get_bh(bh);
 	submit_bh(WRITE_SYNC, bh);
 	wait_on_buffer(bh);
+	sb_end_write(sb);
 	if (unlikely(!buffer_uptodate(bh)))
 		return 1;
 
@@ -120,7 +126,7 @@ static int kmmpd(void *data)
 		mmp->mmp_time = cpu_to_le64(get_seconds());
 		last_update_time = jiffies;
 
-		retval = write_mmp_block(bh);
+		retval = write_mmp_block(sb, bh);
 		/*
 		 * Don't spew too many error messages. Print one every
 		 * (s_mmp_update_interval * 60) seconds.
@@ -200,7 +206,7 @@ static int kmmpd(void *data)
 	mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
 	mmp->mmp_time = cpu_to_le64(get_seconds());
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 
 failed:
 	kfree(data);
@@ -299,7 +305,7 @@ skip:
 	seq = mmp_new_seq();
 	mmp->mmp_seq = cpu_to_le32(seq);
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 	if (retval)
 		goto failed;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 502c61f..039b1e0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -289,33 +289,17 @@ static void ext4_put_nojournal(handle_t *handle)
  * journal_end calls result in the superblock being marked dirty, so
  * that sync() will call the filesystem's write_super callback if
  * appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
  */
 handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 {
 	journal_t *journal;
-	handle_t  *handle;
 
 	trace_ext4_journal_start(sb, nblocks, _RET_IP_);
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);
 
+	WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
 	journal = EXT4_SB(sb)->s_journal;
-	handle = ext4_journal_current_handle();
-
-	/*
-	 * If a handle has been started, it should be allowed to
-	 * finish, otherwise deadlock could happen between freeze
-	 * and others(e.g. truncate) due to the restart of the
-	 * journal handle if the filesystem is forzen and active
-	 * handles are not stopped.
-	 */
-	if (!handle)
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
 	if (!journal)
 		return ext4_get_nojournal();
 	/*
@@ -2772,6 +2756,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
+	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2798,6 +2783,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
+	sb_end_write(sb);
 
 	return ret;
 }
@@ -4280,10 +4266,8 @@ int ext4_force_commit(struct super_block *sb)
 		return 0;
 
 	journal = EXT4_SB(sb)->s_journal;
-	if (journal) {
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
+	if (journal)
 		ret = ext4_journal_force_commit(journal);
-	}
 
 	return ret;
 }
@@ -4315,9 +4299,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
  * gives us a chance to flush the journal completely and mark the fs clean.
  *
  * Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer.  It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
  */
 static int ext4_freeze(struct super_block *sb)
 {
@@ -4344,7 +4327,7 @@ static int ext4_freeze(struct super_block *sb)
 	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 	error = ext4_commit_super(sb, 1);
 out:
-	/* we rely on s_frozen to stop further updates */
+	/* we rely on upper layer to stop further updates */
 	jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
 	return error;
 }
-- 
1.7.1


