linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/19 v5] Fix filesystem freezing deadlocks
@ 2012-04-16 16:13 Jan Kara
  2012-04-16 16:13 ` [PATCH 01/27] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite() Jan Kara
                   ` (27 more replies)
  0 siblings, 28 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
	David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
	KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
	linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs

  Hello,

  here is the fifth iteration of my patches to improve filesystem freezing.
No serious changes since last time. Mostly I rebased patches and merged this
series with series moving file_update_time() to ->page_mkwrite() to simplify
testing and merging.

Filesystem freezing is currently racy and thus we can end up with dirty data on
frozen filesystem (see changelog patch 13 for detailed race description). This
patch series aims at fixing this.

To be able to block all places where inodes get dirtied, I've moved filesystem
file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
freeze handling in mnt_want_write() / mnt_drop_write(). That however required
some code shuffling and changes to kern_path_create() (see patches 09-12). I
think the result is OK but opinions may differ ;). The advantage of this change
also is that all filesystems get freeze protection almost for free - even ext2
can handle freezing well now.

Another potential contention point might be patch 19. In that patch we make
freeze_super() refuse to freeze the filesystem when there are open but unlinked
files which may be impractical in some cases. The main reason for this is the
problem with handling of file deletion from fput() called with mmap_sem held
(e.g. from munmap(2)), and then there's the fact that we cannot really force
such filesystem into a consistent state... But if people think that freezing
with open but unlinked files should happen, then I have some possible
solutions in mind (maybe as a separate patchset since this is large enough).

I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
filesystem despite beating it with fsstress and bash-shared-mapping while
freezing and unfreezing for several hours (using ext4 and xfs) so I'm
reasonably confident this could finally be the right solution.

Changes since v4:
  * added a couple of Acked-by's
  * added some comments & doc update
  * added patches from series "Push file_update_time() into .page_mkwrite"
    since it doesn't make much sense to keep them separate anymore
  * rebased on top of 3.4-rc2

Changes since v3:
  * added third level of freezing for fs internal purposes - hooked some
    filesystems to use it (XFS, nilfs2)
  * removed racy i_size check from filemap_mkwrite()

Changes since v2:
  * completely rewritten
  * freezing is now blocked at VFS entry points
  * two stage freezing to handle both mmapped writes and other IO

The biggest changes since v1:
  * have two counters to provide safe state transitions for SB_FREEZE_WRITE
    and SB_FREEZE_TRANS states
  * use percpu counters instead of own percpu structure
  * added documentation fixes from the old fs freezing series
  * converted XFS to use SB_FREEZE_TRANS counter instead of its private
    m_active_trans counter

								Honza

CC: Alex Elder <elder@kernel.org>
CC: Anton Altaparmakov <anton@tuxera.com>
CC: Ben Myers <bpm@sgi.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: cluster-devel@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: fuse-devel@lists.sourceforge.net
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Joel Becker <jlbec@evilplan.org>
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
CC: linux-btrfs@vger.kernel.org
CC: linux-ext4@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-nilfs@vger.kernel.org
CC: linux-ntfs-dev@lists.sourceforge.net
CC: Mark Fasheh <mfasheh@suse.com>
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: ocfs2-devel@oss.oracle.com
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: "Theodore Ts'o" <tytso@mit.edu>
CC: xfs@oss.sgi.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/27] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 02/27] fs: Push file_update_time() into __block_page_mkwrite() Jan Kara
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Jaya Kumar

CC: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/video/fb_defio.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/video/fb_defio.c b/drivers/video/fb_defio.c
index c27e153..7a09c06 100644
--- a/drivers/video/fb_defio.c
+++ b/drivers/video/fb_defio.c
@@ -104,6 +104,8 @@ static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
 	deferred framebuffer IO. then if userspace touches a page
 	again, we repeat the same scheme */
 
+	file_update_time(vma->vm_file);
+
 	/* protect against the workqueue changing the page list */
 	mutex_lock(&fbdefio->lock);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 02/27] fs: Push file_update_time() into __block_page_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
  2012-04-16 16:13 ` [PATCH 01/27] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 03/27] ceph: Push file_update_time() into ceph_page_mkwrite() Jan Kara
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 36d6665..b8a2386 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2313,6 +2313,12 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	loff_t size;
 	int ret;
 
+	/*
+	 * Update file times before taking page lock. We may end up failing the
+	 * fault so this update may be superfluous but who really cares...
+	 */
+	file_update_time(vma->vm_file);
+
 	lock_page(page);
 	size = i_size_read(inode);
 	if ((page->mapping != inode->i_mapping) ||
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 03/27] ceph: Push file_update_time() into ceph_page_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
  2012-04-16 16:13 ` [PATCH 01/27] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite() Jan Kara
  2012-04-16 16:13 ` [PATCH 02/27] fs: Push file_update_time() into __block_page_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 04/27] 9p: Push file_update_time() into v9fs_vm_page_mkwrite() Jan Kara
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Sage Weil, ceph-devel

CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ceph/addr.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 173b1d2..12b139f 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1181,6 +1181,9 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	loff_t size, len;
 	int ret;
 
+	/* Update time before taking page lock */
+	file_update_time(vma->vm_file);
+
 	size = i_size_read(inode);
 	if (off + PAGE_CACHE_SIZE <= size)
 		len = PAGE_CACHE_SIZE;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/27] 9p: Push file_update_time() into v9fs_vm_page_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (2 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 03/27] ceph: Push file_update_time() into ceph_page_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 05/27] gfs2: Push file_update_time() into gfs2_page_mkwrite() Jan Kara
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Eric Van Hensbergen,
	Ron Minnich, Latchesar Ionkov, v9fs-developer

CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/9p/vfs_file.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index fc06fd2..dd6f7ee 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -610,6 +610,9 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	p9_debug(P9_DEBUG_VFS, "page %p fid %lx\n",
 		 page, (unsigned long)filp->private_data);
 
+	/* Update file times before taking page lock */
+	file_update_time(filp);
+
 	v9inode = V9FS_I(inode);
 	/* make sure the cache has finished storing the page */
 	v9fs_fscache_wait_on_page_write(inode, page);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 05/27] gfs2: Push file_update_time() into gfs2_page_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (3 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 04/27] 9p: Push file_update_time() into v9fs_vm_page_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 06/27] sysfs: Push file_update_time() into bin_page_mkwrite() Jan Kara
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, LKML, cluster-devel, linux-fsdevel

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/file.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index a3d2c9e..151d667 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -376,6 +376,9 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 */
 	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
 
+	/* Update file times before taking page lock */
+	file_update_time(vma->vm_file);
+
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
 	ret = gfs2_glock_nq(&gh);
 	if (ret)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/27] sysfs: Push file_update_time() into bin_page_mkwrite()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (4 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 05/27] gfs2: Push file_update_time() into gfs2_page_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 07/27] mm: Update file times from fault path only if .page_mkwrite is not set Jan Kara
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Greg Kroah-Hartman

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/sysfs/bin.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index a475983..614b2b5 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -228,6 +228,8 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	ret = 0;
 	if (bb->vm_ops->page_mkwrite)
 		ret = bb->vm_ops->page_mkwrite(vma, vmf);
+	else
+		file_update_time(file);
 
 	sysfs_put_active(attr_sd);
 	return ret;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/27] mm: Update file times from fault path only if .page_mkwrite is not set
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (5 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 06/27] sysfs: Push file_update_time() into bin_page_mkwrite() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 08/27] mm: Make default vm_ops provide ->page_mkwrite handler Jan Kara
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Filesystems wanting to properly support freezing need to have control
when file_update_time() is called. After pushing file_update_time()
to all relevant .page_mkwrite implementations we can just stop calling
file_update_time() when filesystem implements .page_mkwrite.

Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6105f47..4803e47 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2633,6 +2633,9 @@ reuse:
 		if (!page_mkwrite) {
 			wait_on_page_locked(dirty_page);
 			set_page_dirty_balance(dirty_page, page_mkwrite);
+			/* file_update_time outside page_lock */
+			if (vma->vm_file)
+				file_update_time(vma->vm_file);
 		}
 		put_page(dirty_page);
 		if (page_mkwrite) {
@@ -2650,10 +2653,6 @@ reuse:
 			}
 		}
 
-		/* file_update_time outside page_lock */
-		if (vma->vm_file)
-			file_update_time(vma->vm_file);
-
 		return ret;
 	}
 
@@ -3322,12 +3321,13 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (dirty_page) {
 		struct address_space *mapping = page->mapping;
+		int dirtied = 0;
 
 		if (set_page_dirty(dirty_page))
-			page_mkwrite = 1;
+			dirtied = 1;
 		unlock_page(dirty_page);
 		put_page(dirty_page);
-		if (page_mkwrite && mapping) {
+		if ((dirtied || page_mkwrite) && mapping) {
 			/*
 			 * Some device drivers do not set page.mapping but still
 			 * dirty their pages
@@ -3336,7 +3336,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 
 		/* file_update_time outside page_lock */
-		if (vma->vm_file)
+		if (vma->vm_file && !page_mkwrite)
 			file_update_time(vma->vm_file);
 	} else {
 		unlock_page(vmf.page);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/27] mm: Make default vm_ops provide ->page_mkwrite handler
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (6 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 07/27] mm: Update file times from fault path only if .page_mkwrite is not set Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex Jan Kara
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Make default vm_ops provide ->page_mkwrite handler. Currently it only updates
file's modification times and gets locked page but later it will also handle
filesystem freezing.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h |    1 +
 mm/filemap.c       |   19 +++++++++++++++++++
 mm/filemap_xip.c   |    1 +
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8738a4..9bec077 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1424,6 +1424,7 @@ extern void truncate_inode_pages_range(struct address_space *,
 
 /* generic vm_area_ops exported for stackable file systems */
 extern int filemap_fault(struct vm_area_struct *, struct vm_fault *);
+extern int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 
 /* mm/page-writeback.c */
 int write_one_page(struct page *page, int wait);
diff --git a/mm/filemap.c b/mm/filemap.c
index 79c4b2b..aec7356 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1751,8 +1751,27 @@ page_not_uptodate:
 }
 EXPORT_SYMBOL(filemap_fault);
 
+int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct page *page = vmf->page;
+	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
+	int ret = VM_FAULT_LOCKED;
+
+	file_update_time(vma->vm_file);
+	lock_page(page);
+	if (page->mapping != inode->i_mapping) {
+		unlock_page(page);
+		ret = VM_FAULT_NOPAGE;
+		goto out;
+	}
+out:
+	return ret;
+}
+EXPORT_SYMBOL(filemap_page_mkwrite);
+
 const struct vm_operations_struct generic_file_vm_ops = {
 	.fault		= filemap_fault,
+	.page_mkwrite	= filemap_page_mkwrite,
 };
 
 /* This is used for a general mmap of a disk file */
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index a4eb311..591dba6 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -304,6 +304,7 @@ out:
 
 static const struct vm_operations_struct xip_file_vm_ops = {
 	.fault	= xip_file_fault,
+	.page_mkwrite	= filemap_page_mkwrite,
 };
 
 int xip_file_mmap(struct file * file, struct vm_area_struct * vma)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (7 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 08/27] mm: Make default vm_ops provide ->page_mkwrite handler Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-17  2:18   ` Joel Becker
  2012-04-16 16:13 ` [PATCH 10/27] fat: " Jan Kara
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Mark Fasheh, LKML, dchinner, linux-fsdevel,
	David S. Miller, ocfs2-devel

Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
without it. This isn't really a problem because mnt_want_write() is a
non-blocking operation (essentially has a trylock semantics) but when the
function starts to handle also frozen filesystems, it will get a full lock
semantics and thus proper lock ordering has to be established. So move
all mnt_want_write() calls outside of i_mutex.

One non-trivial case needing conversion is kern_path_create() /
user_path_create() which didn't include mnt_want_write() but now needs to
because it acquires i_mutex.  Because there are virtual file systems which
don't bother with freeze / remount-ro protection we actually provide both
versions of the function - one which calls mnt_want_write() and one which does
not.

CC: ocfs2-devel@oss.oracle.com
CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: "David S. Miller" <davem@davemloft.net>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/namei.c              |  115 +++++++++++++++++++++++++++--------------------
 fs/ocfs2/refcounttree.c |   10 +---
 include/linux/namei.h   |    2 +
 net/unix/af_unix.c      |   13 ++----
 4 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0062dd1..5417fa1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2460,7 +2460,9 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 	return file;
 }
 
-struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
+static struct dentry *do_kern_path_create(int dfd, const char *pathname,
+					  struct path *path, int is_dir,
+					  int freeze_protect)
 {
 	struct dentry *dentry = ERR_PTR(-EEXIST);
 	struct nameidata nd;
@@ -2478,6 +2480,14 @@ struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path
 	nd.flags |= LOOKUP_CREATE | LOOKUP_EXCL;
 	nd.intent.open.flags = O_EXCL;
 
+	if (freeze_protect) {
+		error = mnt_want_write(nd.path.mnt);
+		if (error) {
+			dentry = ERR_PTR(error);
+			goto out;
+		}
+	}
+
 	/*
 	 * Do the final lookup.
 	 */
@@ -2506,24 +2516,49 @@ eexist:
 	dentry = ERR_PTR(-EEXIST);
 fail:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+	if (freeze_protect)
+		mnt_drop_write(nd.path.mnt);
 out:
 	path_put(&nd.path);
 	return dentry;
 }
+
+struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
+{
+	return do_kern_path_create(dfd, pathname, path, is_dir, 0);
+}
 EXPORT_SYMBOL(kern_path_create);
 
+struct dentry *kern_path_create_thawed(int dfd, const char *pathname, struct path *path, int is_dir)
+{
+	return do_kern_path_create(dfd, pathname, path, is_dir, 1);
+}
+EXPORT_SYMBOL(kern_path_create_thawed);
+
 struct dentry *user_path_create(int dfd, const char __user *pathname, struct path *path, int is_dir)
 {
 	char *tmp = getname(pathname);
 	struct dentry *res;
 	if (IS_ERR(tmp))
 		return ERR_CAST(tmp);
-	res = kern_path_create(dfd, tmp, path, is_dir);
+	res = do_kern_path_create(dfd, tmp, path, is_dir, 0);
 	putname(tmp);
 	return res;
 }
 EXPORT_SYMBOL(user_path_create);
 
+struct dentry *user_path_create_thawed(int dfd, const char __user *pathname, struct path *path, int is_dir)
+{
+	char *tmp = getname(pathname);
+	struct dentry *res;
+	if (IS_ERR(tmp))
+		return ERR_CAST(tmp);
+	res = do_kern_path_create(dfd, tmp, path, is_dir, 1);
+	putname(tmp);
+	return res;
+}
+EXPORT_SYMBOL(user_path_create_thawed);
+
 int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	int error = may_create(dir, dentry);
@@ -2579,7 +2614,7 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
 	if (S_ISDIR(mode))
 		return -EPERM;
 
-	dentry = user_path_create(dfd, filename, &path, 0);
+	dentry = user_path_create_thawed(dfd, filename, &path, 0);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
@@ -2588,12 +2623,9 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
 	error = may_mknod(mode);
 	if (error)
 		goto out_dput;
-	error = mnt_want_write(path.mnt);
-	if (error)
-		goto out_dput;
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (error)
-		goto out_drop_write;
+		goto out_dput;
 	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
 			error = vfs_create(path.dentry->d_inode,dentry,mode,NULL);
@@ -2606,11 +2638,10 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
 			error = vfs_mknod(path.dentry->d_inode,dentry,mode,0);
 			break;
 	}
-out_drop_write:
-	mnt_drop_write(path.mnt);
 out_dput:
 	dput(dentry);
 	mutex_unlock(&path.dentry->d_inode->i_mutex);
+	mnt_drop_write(path.mnt);
 	path_put(&path);
 
 	return error;
@@ -2652,24 +2683,20 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
 	struct path path;
 	int error;
 
-	dentry = user_path_create(dfd, pathname, &path, 1);
+	dentry = user_path_create_thawed(dfd, pathname, &path, 1);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
 	if (!IS_POSIXACL(path.dentry->d_inode))
 		mode &= ~current_umask();
-	error = mnt_want_write(path.mnt);
-	if (error)
-		goto out_dput;
 	error = security_path_mkdir(&path, dentry, mode);
 	if (error)
-		goto out_drop_write;
+		goto out_dput;
 	error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
-out_drop_write:
-	mnt_drop_write(path.mnt);
 out_dput:
 	dput(dentry);
 	mutex_unlock(&path.dentry->d_inode->i_mutex);
+	mnt_drop_write(path.mnt);
 	path_put(&path);
 	return error;
 }
@@ -2764,6 +2791,9 @@ static long do_rmdir(int dfd, const char __user *pathname)
 	}
 
 	nd.flags &= ~LOOKUP_PARENT;
+	error = mnt_want_write(nd.path.mnt);
+	if (error)
+		goto exit1;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
@@ -2774,19 +2804,15 @@ static long do_rmdir(int dfd, const char __user *pathname)
 		error = -ENOENT;
 		goto exit3;
 	}
-	error = mnt_want_write(nd.path.mnt);
-	if (error)
-		goto exit3;
 	error = security_path_rmdir(&nd.path, dentry);
 	if (error)
-		goto exit4;
+		goto exit3;
 	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
-exit4:
-	mnt_drop_write(nd.path.mnt);
 exit3:
 	dput(dentry);
 exit2:
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+	mnt_drop_write(nd.path.mnt);
 exit1:
 	path_put(&nd.path);
 	putname(name);
@@ -2853,6 +2879,9 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		goto exit1;
 
 	nd.flags &= ~LOOKUP_PARENT;
+	error = mnt_want_write(nd.path.mnt);
+	if (error)
+		goto exit1;
 
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
@@ -2865,21 +2894,17 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		if (!inode)
 			goto slashes;
 		ihold(inode);
-		error = mnt_want_write(nd.path.mnt);
-		if (error)
-			goto exit2;
 		error = security_path_unlink(&nd.path, dentry);
 		if (error)
-			goto exit3;
+			goto exit2;
 		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
-exit3:
-		mnt_drop_write(nd.path.mnt);
-	exit2:
+exit2:
 		dput(dentry);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
 		iput(inode);	/* truncate the inode here */
+	mnt_drop_write(nd.path.mnt);
 exit1:
 	path_put(&nd.path);
 	putname(name);
@@ -2939,23 +2964,19 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
 	if (IS_ERR(from))
 		return PTR_ERR(from);
 
-	dentry = user_path_create(newdfd, newname, &path, 0);
+	dentry = user_path_create_thawed(newdfd, newname, &path, 0);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
 		goto out_putname;
 
-	error = mnt_want_write(path.mnt);
-	if (error)
-		goto out_dput;
 	error = security_path_symlink(&path, dentry, from);
 	if (error)
-		goto out_drop_write;
+		goto out_dput;
 	error = vfs_symlink(path.dentry->d_inode, dentry, from);
-out_drop_write:
-	mnt_drop_write(path.mnt);
 out_dput:
 	dput(dentry);
 	mutex_unlock(&path.dentry->d_inode->i_mutex);
+	mnt_drop_write(path.mnt);
 	path_put(&path);
 out_putname:
 	putname(from);
@@ -3048,7 +3069,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	if (error)
 		return error;
 
-	new_dentry = user_path_create(newdfd, newname, &new_path, 0);
+	new_dentry = user_path_create_thawed(newdfd, newname, &new_path, 0);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto out;
@@ -3056,18 +3077,14 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
 		goto out_dput;
-	error = mnt_want_write(new_path.mnt);
-	if (error)
-		goto out_dput;
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
-		goto out_drop_write;
+		goto out_dput;
 	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
-out_drop_write:
-	mnt_drop_write(new_path.mnt);
 out_dput:
 	dput(new_dentry);
 	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
+	mnt_drop_write(new_path.mnt);
 	path_put(&new_path);
 out:
 	path_put(&old_path);
@@ -3264,6 +3281,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	if (newnd.last_type != LAST_NORM)
 		goto exit2;
 
+	error = mnt_want_write(oldnd.path.mnt);
+	if (error)
+		goto exit2;
+
 	oldnd.flags &= ~LOOKUP_PARENT;
 	newnd.flags &= ~LOOKUP_PARENT;
 	newnd.flags |= LOOKUP_RENAME_TARGET;
@@ -3299,23 +3320,19 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	if (new_dentry == trap)
 		goto exit5;
 
-	error = mnt_want_write(oldnd.path.mnt);
-	if (error)
-		goto exit5;
 	error = security_path_rename(&oldnd.path, old_dentry,
 				     &newnd.path, new_dentry);
 	if (error)
-		goto exit6;
+		goto exit5;
 	error = vfs_rename(old_dir->d_inode, old_dentry,
 				   new_dir->d_inode, new_dentry);
-exit6:
-	mnt_drop_write(oldnd.path.mnt);
 exit5:
 	dput(new_dentry);
 exit4:
 	dput(old_dentry);
 exit3:
 	unlock_rename(new_dir, old_dir);
+	mnt_drop_write(oldnd.path.mnt);
 exit2:
 	path_put(&newnd.path);
 	putname(to);
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index cf78233..a99b8e2 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -4453,7 +4453,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
 		return error;
 	}
 
-	new_dentry = user_path_create(AT_FDCWD, newname, &new_path, 0);
+	new_dentry = user_path_create_thawed(AT_FDCWD, newname, &new_path, 0);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry)) {
 		mlog_errno(error);
@@ -4466,19 +4466,13 @@ int ocfs2_reflink_ioctl(struct inode *inode,
 		goto out_dput;
 	}
 
-	error = mnt_want_write(new_path.mnt);
-	if (error) {
-		mlog_errno(error);
-		goto out_dput;
-	}
-
 	error = ocfs2_vfs_reflink(old_path.dentry,
 				  new_path.dentry->d_inode,
 				  new_dentry, preserve);
-	mnt_drop_write(new_path.mnt);
 out_dput:
 	dput(new_dentry);
 	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
+	mnt_drop_write(new_path.mnt);
 	path_put(&new_path);
 out:
 	path_put(&old_path);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ffc0213..432f6bb 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -77,7 +77,9 @@ extern int user_path_at_empty(int, const char __user *, unsigned, struct path *,
 extern int kern_path(const char *, unsigned, struct path *);
 
 extern struct dentry *kern_path_create(int, const char *, struct path *, int);
+extern struct dentry *kern_path_create_thawed(int, const char *, struct path *, int);
 extern struct dentry *user_path_create(int, const char __user *, struct path *, int);
+extern struct dentry *user_path_create_thawed(int, const char __user *, struct path *, int);
 extern int kern_path_parent(const char *, struct nameidata *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d510353..c532632 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -865,7 +865,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		 * Get the parent directory, calculate the hash for last
 		 * component.
 		 */
-		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
+		dentry = kern_path_create_thawed(AT_FDCWD, sun_path, &path, 0);
 		err = PTR_ERR(dentry);
 		if (IS_ERR(dentry))
 			goto out_mknod_parent;
@@ -875,19 +875,13 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		 */
 		mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current_umask());
-		err = mnt_want_write(path.mnt);
-		if (err)
-			goto out_mknod_dput;
 		err = security_path_mknod(&path, dentry, mode, 0);
 		if (err)
-			goto out_mknod_drop_write;
-		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
-out_mknod_drop_write:
-		mnt_drop_write(path.mnt);
-		if (err)
 			goto out_mknod_dput;
+		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
 		mutex_unlock(&path.dentry->d_inode->i_mutex);
 		dput(path.dentry);
+		mnt_drop_write(path.mnt);
 		path.dentry = dentry;
 
 		addr->hash = UNIX_HASH_SIZE;
@@ -924,6 +918,7 @@ out:
 out_mknod_dput:
 	dput(dentry);
 	mutex_unlock(&path.dentry->d_inode->i_mutex);
+	mnt_drop_write(path.mnt);
 	path_put(&path);
 out_mknod_parent:
 	if (err == -EEXIST)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/27] fat: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (8 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 11/27] btrfs: " Jan Kara
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, OGAWA Hirofumi

When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
outside of i_mutex as in other places.

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fat/file.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/fat/file.c b/fs/fat/file.c
index a71fe37..e007b8b 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -43,10 +43,10 @@ static int fat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 	if (err)
 		goto out;
 
-	mutex_lock(&inode->i_mutex);
 	err = mnt_want_write_file(file);
 	if (err)
-		goto out_unlock_inode;
+		goto out;
+	mutex_lock(&inode->i_mutex);
 
 	/*
 	 * ATTR_VOLUME and ATTR_DIR cannot be changed; this also
@@ -73,14 +73,14 @@ static int fat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 	/* The root directory has no attributes */
 	if (inode->i_ino == MSDOS_ROOT_INO && attr != ATTR_DIR) {
 		err = -EINVAL;
-		goto out_drop_write;
+		goto out_unlock_inode;
 	}
 
 	if (sbi->options.sys_immutable &&
 	    ((attr | oldattr) & ATTR_SYS) &&
 	    !capable(CAP_LINUX_IMMUTABLE)) {
 		err = -EPERM;
-		goto out_drop_write;
+		goto out_unlock_inode;
 	}
 
 	/*
@@ -90,12 +90,12 @@ static int fat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 	 */
 	err = security_inode_setattr(file->f_path.dentry, &ia);
 	if (err)
-		goto out_drop_write;
+		goto out_unlock_inode;
 
 	/* This MUST be done before doing anything irreversible... */
 	err = fat_setattr(file->f_path.dentry, &ia);
 	if (err)
-		goto out_drop_write;
+		goto out_unlock_inode;
 
 	fsnotify_change(file->f_path.dentry, ia.ia_valid);
 	if (sbi->options.sys_immutable) {
@@ -107,10 +107,9 @@ static int fat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 
 	fat_save_attrs(inode, attr);
 	mark_inode_dirty(inode);
-out_drop_write:
-	mnt_drop_write_file(file);
 out_unlock_inode:
 	mutex_unlock(&inode->i_mutex);
+	mnt_drop_write_file(file);
 out:
 	return err;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/27] btrfs: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (9 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 10/27] fat: " Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 12/27] nfsd: " Jan Kara
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, Chris Mason, linux-btrfs

When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/btrfs/ioctl.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 18cc23d..869d913 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -192,6 +192,10 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	if (!inode_owner_or_capable(inode))
 		return -EACCES;
 
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+
 	mutex_lock(&inode->i_mutex);
 
 	ip_oldflags = ip->flags;
@@ -206,10 +210,6 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 		}
 	}
 
-	ret = mnt_want_write_file(file);
-	if (ret)
-		goto out_unlock;
-
 	if (flags & FS_SYNC_FL)
 		ip->flags |= BTRFS_INODE_SYNC;
 	else
@@ -271,9 +271,9 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 		inode->i_flags = i_oldflags;
 	}
 
-	mnt_drop_write_file(file);
  out_unlock:
 	mutex_unlock(&inode->i_mutex);
+	mnt_drop_write_file(file);
 	return ret;
 }
 
@@ -639,6 +639,10 @@ static noinline int btrfs_mksubvol(struct path *parent,
 	struct dentry *dentry;
 	int error;
 
+	error = mnt_want_write(parent->mnt);
+	if (error)
+		return error;
+
 	mutex_lock_nested(&dir->i_mutex, I_MUTEX_PARENT);
 
 	dentry = lookup_one_len(name, parent->dentry, namelen);
@@ -650,13 +654,9 @@ static noinline int btrfs_mksubvol(struct path *parent,
 	if (dentry->d_inode)
 		goto out_dput;
 
-	error = mnt_want_write(parent->mnt);
-	if (error)
-		goto out_dput;
-
 	error = btrfs_may_create(dir, dentry);
 	if (error)
-		goto out_drop_write;
+		goto out_dput;
 
 	down_read(&BTRFS_I(dir)->root->fs_info->subvol_sem);
 
@@ -674,12 +674,11 @@ static noinline int btrfs_mksubvol(struct path *parent,
 		fsnotify_mkdir(dir, dentry);
 out_up_read:
 	up_read(&BTRFS_I(dir)->root->fs_info->subvol_sem);
-out_drop_write:
-	mnt_drop_write(parent->mnt);
 out_dput:
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&dir->i_mutex);
+	mnt_drop_write(parent->mnt);
 	return error;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 12/27] nfsd: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (10 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 11/27] btrfs: " Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 18:25   ` J. Bruce Fields
  2012-04-16 16:13 ` [PATCH 13/27] fs: Improve filesystem freezing handling Jan Kara
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-nfs,
	J. Bruce Fields

When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfsd/nfs4recover.c      |    9 +++--
 fs/nfsd/nfsfh.c            |    1 +
 fs/nfsd/nfsproc.c          |    9 ++++-
 fs/nfsd/vfs.c              |   79 ++++++++++++++++++++++---------------------
 fs/nfsd/vfs.h              |   11 +++++-
 include/linux/nfsd/nfsfh.h |    1 +
 6 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 4767429..efa7574 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -154,6 +154,10 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 	if (status < 0)
 		return;
 
+	status = mnt_want_write_file(rec_file);
+	if (status)
+		return;
+
 	dir = rec_file->f_path.dentry;
 	/* lock the parent */
 	mutex_lock(&dir->d_inode->i_mutex);
@@ -173,11 +177,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 		 * as well be forgiving and just succeed silently.
 		 */
 		goto out_put;
-	status = mnt_want_write_file(rec_file);
-	if (status)
-		goto out_put;
 	status = vfs_mkdir(dir->d_inode, dentry, S_IRWXU);
-	mnt_drop_write_file(rec_file);
 out_put:
 	dput(dentry);
 out_unlock:
@@ -189,6 +189,7 @@ out_unlock:
 				" (err %d); please check that %s exists"
 				" and is writeable", status,
 				user_recovery_dirname);
+	mnt_drop_write_file(rec_file);
 	nfs4_reset_creds(original_cred);
 }
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 68454e7..8b93353 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -635,6 +635,7 @@ fh_put(struct svc_fh *fhp)
 		fhp->fh_post_saved = 0;
 #endif
 	}
+	fh_drop_write(fhp);
 	if (exp) {
 		cache_put(&exp->h, &svc_export_cache);
 		fhp->fh_export = NULL;
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index e15dc45..aad6d45 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -196,6 +196,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
 	struct dentry	*dchild;
 	int		type, mode;
 	__be32		nfserr;
+	int		hosterr;
 	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
 
 	dprintk("nfsd: CREATE   %s %.*s\n",
@@ -214,6 +215,12 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
 	nfserr = nfserr_exist;
 	if (isdotent(argp->name, argp->len))
 		goto done;
+	hosterr = fh_want_write(dirfhp);
+	if (hosterr) {
+		nfserr = nfserrno(hosterr);
+		goto done;
+	}
+
 	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
 	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
 	if (IS_ERR(dchild)) {
@@ -330,7 +337,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
 out_unlock:
 	/* We don't really need to unlock, as fh_put does it. */
 	fh_unlock(dirfhp);
-
+	fh_drop_write(dirfhp);
 done:
 	fh_put(dirfhp);
 	return nfsd_return_dirop(nfserr, resp);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 296d671..b8bb649 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1276,6 +1276,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 * If it has, the parent directory should already be locked.
 	 */
 	if (!resfhp->fh_dentry) {
+		host_err = fh_want_write(fhp);
+		if (host_err)
+			goto out_nfserr;
+
 		/* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
 		fh_lock_nested(fhp, I_MUTEX_PARENT);
 		dchild = lookup_one_len(fname, dentry, flen);
@@ -1319,14 +1323,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		goto out;
 	}
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_nfserr;
-
 	/*
 	 * Get the dir op function pointer.
 	 */
 	err = 0;
+	host_err = 0;
 	switch (type) {
 	case S_IFREG:
 		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
@@ -1343,10 +1344,8 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
 		break;
 	}
-	if (host_err < 0) {
-		fh_drop_write(fhp);
+	if (host_err < 0)
 		goto out_nfserr;
-	}
 
 	err = nfsd_create_setattr(rqstp, resfhp, iap);
 
@@ -1358,7 +1357,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	err2 = nfserrno(commit_metadata(fhp));
 	if (err2)
 		err = err2;
-	fh_drop_write(fhp);
 	/*
 	 * Update the file handle to get the new inode info.
 	 */
@@ -1417,6 +1415,11 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	err = nfserr_notdir;
 	if (!dirp->i_op->lookup)
 		goto out;
+
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		goto out_nfserr;
+
 	fh_lock_nested(fhp, I_MUTEX_PARENT);
 
 	/*
@@ -1449,9 +1452,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		v_atime = verifier[1]&0x7fffffff;
 	}
 	
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_nfserr;
 	if (dchild->d_inode) {
 		err = 0;
 
@@ -1522,7 +1522,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!err)
 		err = nfserrno(commit_metadata(fhp));
 
-	fh_drop_write(fhp);
 	/*
 	 * Update the filehandle to get the new inode info.
 	 */
@@ -1533,6 +1532,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	fh_unlock(fhp);
 	if (dchild && !IS_ERR(dchild))
 		dput(dchild);
+	fh_drop_write(fhp);
  	return err;
  
  out_nfserr:
@@ -1613,6 +1613,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
 	if (err)
 		goto out;
+
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		goto out_nfserr;
+
 	fh_lock(fhp);
 	dentry = fhp->fh_dentry;
 	dnew = lookup_one_len(fname, dentry, flen);
@@ -1620,10 +1625,6 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (IS_ERR(dnew))
 		goto out_nfserr;
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_nfserr;
-
 	if (unlikely(path[plen] != 0)) {
 		char *path_alloced = kmalloc(plen+1, GFP_KERNEL);
 		if (path_alloced == NULL)
@@ -1683,6 +1684,12 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	if (isdotent(name, len))
 		goto out;
 
+	host_err = fh_want_write(tfhp);
+	if (host_err) {
+		err = nfserrno(host_err);
+		goto out;
+	}
+
 	fh_lock_nested(ffhp, I_MUTEX_PARENT);
 	ddir = ffhp->fh_dentry;
 	dirp = ddir->d_inode;
@@ -1694,18 +1701,13 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 
 	dold = tfhp->fh_dentry;
 
-	host_err = fh_want_write(tfhp);
-	if (host_err) {
-		err = nfserrno(host_err);
-		goto out_dput;
-	}
 	err = nfserr_noent;
 	if (!dold->d_inode)
-		goto out_drop_write;
+		goto out_dput;
 	host_err = nfsd_break_lease(dold->d_inode);
 	if (host_err) {
 		err = nfserrno(host_err);
-		goto out_drop_write;
+		goto out_dput;
 	}
 	host_err = vfs_link(dold, dirp, dnew);
 	if (!host_err) {
@@ -1718,12 +1720,11 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 		else
 			err = nfserrno(host_err);
 	}
-out_drop_write:
-	fh_drop_write(tfhp);
 out_dput:
 	dput(dnew);
 out_unlock:
 	fh_unlock(ffhp);
+	fh_drop_write(tfhp);
 out:
 	return err;
 
@@ -1766,6 +1767,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
 		goto out;
 
+	host_err = fh_want_write(ffhp);
+	if (host_err) {
+		err = nfserrno(host_err);
+		goto out;
+	}
+
 	/* cannot use fh_lock as we need deadlock protective ordering
 	 * so do it by hand */
 	trap = lock_rename(tdentry, fdentry);
@@ -1796,17 +1803,14 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	host_err = -EXDEV;
 	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
 		goto out_dput_new;
-	host_err = fh_want_write(ffhp);
-	if (host_err)
-		goto out_dput_new;
 
 	host_err = nfsd_break_lease(odentry->d_inode);
 	if (host_err)
-		goto out_drop_write;
+		goto out_dput_new;
 	if (ndentry->d_inode) {
 		host_err = nfsd_break_lease(ndentry->d_inode);
 		if (host_err)
-			goto out_drop_write;
+			goto out_dput_new;
 	}
 	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
 	if (!host_err) {
@@ -1814,8 +1818,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		if (!host_err)
 			host_err = commit_metadata(ffhp);
 	}
-out_drop_write:
-	fh_drop_write(ffhp);
  out_dput_new:
 	dput(ndentry);
  out_dput_old:
@@ -1831,6 +1833,7 @@ out_drop_write:
 	fill_post_wcc(tfhp);
 	unlock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = 0;
+	fh_drop_write(ffhp);
 
 out:
 	return err;
@@ -1856,6 +1859,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (err)
 		goto out;
 
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		goto out_nfserr;
+
 	fh_lock_nested(fhp, I_MUTEX_PARENT);
 	dentry = fhp->fh_dentry;
 	dirp = dentry->d_inode;
@@ -1874,21 +1881,15 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (!type)
 		type = rdentry->d_inode->i_mode & S_IFMT;
 
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_put;
-
 	host_err = nfsd_break_lease(rdentry->d_inode);
 	if (host_err)
-		goto out_drop_write;
+		goto out_put;
 	if (type != S_IFDIR)
 		host_err = vfs_unlink(dirp, rdentry);
 	else
 		host_err = vfs_rmdir(dirp, rdentry);
 	if (!host_err)
 		host_err = commit_metadata(fhp);
-out_drop_write:
-	fh_drop_write(fhp);
 out_put:
 	dput(rdentry);
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index ec0611b..359594c 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -110,12 +110,19 @@ int nfsd_set_posix_acl(struct svc_fh *, int, struct posix_acl *);
 
 static inline int fh_want_write(struct svc_fh *fh)
 {
-	return mnt_want_write(fh->fh_export->ex_path.mnt);
+	int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
+
+	if (!ret)
+		fh->fh_want_write = 1;
+	return ret;
 }
 
 static inline void fh_drop_write(struct svc_fh *fh)
 {
-	mnt_drop_write(fh->fh_export->ex_path.mnt);
+	if (fh->fh_want_write) {
+		fh->fh_want_write = 0;
+		mnt_drop_write(fh->fh_export->ex_path.mnt);
+	}
 }
 
 #endif /* LINUX_NFSD_VFS_H */
diff --git a/include/linux/nfsd/nfsfh.h b/include/linux/nfsd/nfsfh.h
index ce4743a..fa63048 100644
--- a/include/linux/nfsd/nfsfh.h
+++ b/include/linux/nfsd/nfsfh.h
@@ -143,6 +143,7 @@ typedef struct svc_fh {
 	int			fh_maxsize;	/* max size for fh_handle */
 
 	unsigned char		fh_locked;	/* inode locked by us */
+	unsigned char		fh_want_write;	/* remount protection taken */
 
 #ifdef CONFIG_NFSD_V3
 	unsigned char		fh_post_saved;	/* post-op attrs saved */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 13/27] fs: Improve filesystem freezing handling
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (11 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 12/27] nfsd: " Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 14/27] fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Jan Kara
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

vfs_check_frozen() tests are racy since the filesystem can be frozen just after
the test is performed. Thus in write paths we can end up marking some pages or
inodes dirty even though the file system is already frozen. This creates
problems with flusher thread hanging on frozen filesystem.

Another problem is that exclusion between ->page_mkwrite() and filesystem
freezing has been handled by setting page dirty and then verifying s_frozen.
This guaranteed that either the freezing code sees the faulted page, writes it,
and writeprotects it again or we see s_frozen set and bail out of page fault.
This works to protect from page being marked writeable while filesystem
freezing is running but has an unpleasant artefact of leaving dirty (although
unmodified and writeprotected) pages on frozen filesystem resulting in similar
problems with flusher thread as the first problem.

This patch aims at providing exclusion between write paths and filesystem
freezing. We implement a writer-freeze read-write semaphore in the superblock.
Actually, there are three such semaphores because of lock ranking reasons - one
for page fault handlers (->page_mkwrite), one for all other writers, and one of
internal filesystem purposes (used e.g. to track running transactions).  Write
paths which should block freezing (e.g. directory operations, ->aio_write(),
->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem
takes the writer side.

Only that we don't really want to bounce cachelines of the semaphores between
CPUs for each write happening. So we implement the reader side of the semaphore
as a per-cpu counter and the writer side is implemented using s_writers.frozen
superblock field.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c         |  329 ++++++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/fs.h |   48 +++++++--
 2 files changed, 348 insertions(+), 29 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index cf00177..6ed27cd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -33,12 +33,20 @@
 #include <linux/rculist_bl.h>
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
+#include <linux/lockdep.h>
 #include "internal.h"
 
 
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
+static struct lock_class_key sb_writers_key[SB_FREEZE_LEVELS];
+static char *sb_writers_name[SB_FREEZE_LEVELS] = {
+	"sb_writers",
+	"sb_pagefaults",
+	"sb_internal",
+};
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
@@ -102,6 +110,35 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
 	return total_objects;
 }
 
+static int init_sb_writers(struct super_block *s)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < SB_FREEZE_LEVELS; i++) {
+		err = percpu_counter_init(&s->s_writers.counter[i], 0);
+		if (err < 0)
+			goto err_out;
+		lockdep_init_map(&s->s_writers.lock_map[i], sb_writers_name[i],
+				 &sb_writers_key[i], 0);
+	}
+	init_waitqueue_head(&s->s_writers.wait);
+	init_waitqueue_head(&s->s_writers.wait_unfrozen);
+	return 0;
+err_out:
+	while (--i >= 0)
+		percpu_counter_destroy(&s->s_writers.counter[i]);
+	return err;
+}
+
+static void destroy_sb_writers(struct super_block *s)
+{
+	int i;
+
+	for (i = 0; i < SB_FREEZE_LEVELS; i++)
+		percpu_counter_destroy(&s->s_writers.counter[i]);
+}
+
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
@@ -116,18 +153,19 @@ static struct super_block *alloc_super(struct file_system_type *type)
 
 	if (s) {
 		if (security_sb_alloc(s)) {
+			/*
+			 * We cannot call security_sb_free() without
+			 * security_sb_alloc() succeeding. So bail out manually
+			 */
 			kfree(s);
 			s = NULL;
 			goto out;
 		}
 #ifdef CONFIG_SMP
 		s->s_files = alloc_percpu(struct list_head);
-		if (!s->s_files) {
-			security_sb_free(s);
-			kfree(s);
-			s = NULL;
-			goto out;
-		} else {
+		if (!s->s_files)
+			goto err_out;
+		else {
 			int i;
 
 			for_each_possible_cpu(i)
@@ -136,6 +174,9 @@ static struct super_block *alloc_super(struct file_system_type *type)
 #else
 		INIT_LIST_HEAD(&s->s_files);
 #endif
+		if (init_sb_writers(s))
+			goto err_out;
+
 		s->s_bdi = &default_backing_dev_info;
 		INIT_HLIST_NODE(&s->s_instances);
 		INIT_HLIST_BL_HEAD(&s->s_anon);
@@ -188,6 +229,16 @@ static struct super_block *alloc_super(struct file_system_type *type)
 	}
 out:
 	return s;
+err_out:
+	security_sb_free(s);
+#ifdef CONFIG_SMP
+	if (s->s_files)
+		free_percpu(s->s_files);
+#endif
+	destroy_sb_writers(s);
+	kfree(s);
+	s = NULL;
+	goto out;
 }
 
 /**
@@ -201,6 +252,7 @@ static inline void destroy_super(struct super_block *s)
 #ifdef CONFIG_SMP
 	free_percpu(s->s_files);
 #endif
+	destroy_sb_writers(s);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
 	kfree(s->s_subtype);
@@ -647,10 +699,11 @@ struct super_block *get_super_thawed(struct block_device *bdev)
 {
 	while (1) {
 		struct super_block *s = get_super(bdev);
-		if (!s || s->s_frozen == SB_UNFROZEN)
+		if (!s || s->s_writers.frozen == SB_UNFROZEN)
 			return s;
 		up_read(&s->s_umount);
-		vfs_check_frozen(s, SB_FREEZE_WRITE);
+		wait_event(s->s_writers.wait_unfrozen,
+			   s->s_writers.frozen == SB_UNFROZEN);
 		put_super(s);
 	}
 }
@@ -728,7 +781,7 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	int retval;
 	int remount_ro;
 
-	if (sb->s_frozen != SB_UNFROZEN)
+	if (sb->s_writers.frozen != SB_UNFROZEN)
 		return -EBUSY;
 
 #ifdef CONFIG_BLOCK
@@ -1163,6 +1216,196 @@ out:
 	return ERR_PTR(error);
 }
 
+static void __sb_end_write(struct super_block *sb, int level)
+{
+	percpu_counter_dec(&sb->s_writers.counter[level-1]);
+	/*
+	 * Make sure s_writers are updated before we wake up waiters in
+	 * freeze_super().
+	 */
+	smp_mb();
+	if (waitqueue_active(&sb->s_writers.wait))
+		wake_up(&sb->s_writers.wait);
+	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _RET_IP_);
+}
+
+/**
+ * sb_end_write - drop write access to a superblock
+ * @sb: the super we wrote to
+ *
+ * Decrement number of writers to the filesystem. Wake up possible waiters
+ * wanting to freeze the filesystem.
+ */
+void sb_end_write(struct super_block *sb)
+{
+	__sb_end_write(sb, SB_FREEZE_WRITE);
+}
+EXPORT_SYMBOL(sb_end_write);
+
+/**
+ * sb_end_pagefault - drop write access to a superblock from a page fault
+ * @sb: the super we wrote to
+ *
+ * Decrement number of processes handling write page fault to the filesystem.
+ * Wake up possible waiters wanting to freeze the filesystem.
+ */
+void sb_end_pagefault(struct super_block *sb)
+{
+	__sb_end_write(sb, SB_FREEZE_PAGEFAULT);
+}
+EXPORT_SYMBOL(sb_end_pagefault);
+
+/**
+ * sb_end_intwrite - drop write access to a superblock for internal fs purposes
+ * @sb: the super we wrote to
+ *
+ * Decrement fs-internal number of writers to the filesystem.  Wake up possible
+ * waiters wanting to freeze the filesystem.
+ */
+void sb_end_intwrite(struct super_block *sb)
+{
+	__sb_end_write(sb, SB_FREEZE_FS);
+}
+EXPORT_SYMBOL(sb_end_intwrite);
+
+static int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+retry:
+	if (wait) {
+		wait_event(sb->s_writers.wait_unfrozen,
+			   sb->s_writers.frozen < level);
+	} else if (sb->s_writers.frozen >= level)
+		return 0;
+
+	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, !wait,
+			   _RET_IP_);
+	percpu_counter_inc(&sb->s_writers.counter[level-1]);
+	/*
+	 * Make sure counter is updated before we check for frozen.
+	 * freeze_super() first sets frozen and then checks the counter.
+	 */
+	smp_mb();
+	if (sb->s_writers.frozen >= level) {
+		__sb_end_write(sb, level);
+		goto retry;
+	}
+	return 1;
+}
+
+/**
+ * sb_start_write - get write access to a superblock
+ * @sb: the super we write to
+ *
+ * When a process wants to write data or metadata to a file system (i.e. dirty
+ * a page or an inode), it should embed the operation in a sb_start_write() -
+ * sb_end_write() pair to get exclusion against file system freezing. This
+ * function increments number of writers preventing freezing. If the file
+ * system is already frozen, the function waits until the file system is
+ * thawed.
+ *
+ * Since freeze protection behaves as a lock, users have to preserve
+ * ordering of freeze protection and other filesystem locks. Generally,
+ * freeze protection should be the outermost lock. In particular, we have:
+ *
+ * sb_start_write
+ *   -> i_mutex			(write path, truncate, directory ops, ...)
+ *   -> s_umount		(freeze_super, thaw_super)
+ */
+void sb_start_write(struct super_block *sb)
+{
+	__sb_start_write(sb, SB_FREEZE_WRITE, true);
+}
+EXPORT_SYMBOL(sb_start_write);
+
+int sb_start_write_trylock(struct super_block *sb)
+{
+	return __sb_start_write(sb, SB_FREEZE_WRITE, false);
+}
+EXPORT_SYMBOL(sb_start_write_trylock);
+
+/**
+ * sb_start_pagefault - get write access to a superblock from a page fault
+ * @sb: the super we write to
+ *
+ * When a process starts handling write page fault, it should embed the
+ * operation into sb_start_pagefault() - sb_end_pagefault() pair to get
+ * exclusion against file system freezing. This is needed since the page fault
+ * is going to dirty a page. This function increments number of running page
+ * faults preventing freezing. If the file system is already frozen, the
+ * function waits until the file system is thawed.
+ *
+ * Since page fault freeze protection behaves as a lock, users have to preserve
+ * ordering of freeze protection and other filesystem locks. It is advised to
+ * put sb_start_pagefault() close to mmap_sem in lock ordering. Page fault
+ * handling code implies lock dependency:
+ *
+ * mmap_sem
+ *   -> sb_start_pagefault
+ */
+void sb_start_pagefault(struct super_block *sb)
+{
+	__sb_start_write(sb, SB_FREEZE_PAGEFAULT, true);
+}
+EXPORT_SYMBOL(sb_start_pagefault);
+
+/*
+ * sb_start_intwrite - get write access to a superblock for internal fs purposes
+ * @sb: the super we write to
+ *
+ * This is the third level of protection against filesystem freezing. It is
+ * free for use by a filesystem. The only requirement is that it must rank
+ * below sb_start_pagefault.
+ *
+ * For example filesystem can call sb_start_intwrite() when starting a
+ * transaction which somewhat eases handling of freezing for internal sources
+ * of filesystem changes (internal fs threads, discarding preallocation on file
+ * close, etc.).
+ */
+void sb_start_intwrite(struct super_block *sb)
+{
+	__sb_start_write(sb, SB_FREEZE_FS, true);
+}
+EXPORT_SYMBOL(sb_start_intwrite);
+
+/**
+ * sb_wait_write - wait until all writers to given file system finish
+ * @sb: the super for which we wait
+ * @level: type of writers we wait for (normal vs page fault)
+ *
+ * This function waits until there are no writers of given type to given file
+ * system. Caller of this function should make sure there can be no new writers
+ * of type @level before calling this function. Otherwise this function can
+ * livelock.
+ */
+static void sb_wait_write(struct super_block *sb, int level)
+{
+	s64 writers;
+
+	/*
+	 * We just cycle-through lockdep here so that it does not complain
+	 * about returning with lock to userspace
+	 */
+	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
+	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
+
+	do {
+		DEFINE_WAIT(wait);
+
+		/*
+		 * We use a barrier in prepare_to_wait() to separate setting
+		 * of frozen and checking of the counter
+		 */
+		prepare_to_wait(&sb->s_writers.wait, &wait,
+				TASK_UNINTERRUPTIBLE);
+
+		writers = percpu_counter_sum(&sb->s_writers.counter[level-1]);
+		if (writers)
+			schedule();
+
+		finish_wait(&sb->s_writers.wait, &wait);
+	} while (writers);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1170,6 +1413,31 @@ out:
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
  * freeze_fs.  Subsequent calls to this without first thawing the fs will return
  * -EBUSY.
+ *
+ * During this function, sb->s_writers.frozen goes through these values:
+ *
+ * SB_UNFROZEN: File system is normal, all writes progress as usual.
+ *
+ * SB_FREEZE_WRITE: The file system is in the process of being frozen.  New
+ * writes should be blocked, though page faults are still allowed. We wait for
+ * all writes to complete and then proceed to the next stage.
+ *
+ * SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
+ * but internal fs threads can still modify the filesystem (although they
+ * should not dirty new pages or inodes), writeback can run etc. After waiting
+ * for all running page faults we sync the filesystem which will clean all
+ * dirty pages and inodes (no new dirty pages or inodes can be created when
+ * sync is running).
+ *
+ * SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs
+ * modification are blocked (e.g. XFS preallocation truncation on inode
+ * reclaim). This is usually implemented by blocking new transactions for
+ * filesystems that have them and need this additional guard. After all
+ * internal writers are finished we call ->freeze_fs() to finish filesystem
+ * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
+ * mostly auxiliary for filesystems to verify they do not modify frozen fs.
+ *
+ * sb->s_writers.frozen is protected by sb->s_umount.
  */
 int freeze_super(struct super_block *sb)
 {
@@ -1177,7 +1445,7 @@ int freeze_super(struct super_block *sb)
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
-	if (sb->s_frozen) {
+	if (sb->s_writers.frozen != SB_UNFROZEN) {
 		deactivate_locked_super(sb);
 		return -EBUSY;
 	}
@@ -1188,33 +1456,53 @@ int freeze_super(struct super_block *sb)
 	}
 
 	if (sb->s_flags & MS_RDONLY) {
-		sb->s_frozen = SB_FREEZE_TRANS;
-		smp_wmb();
+		/* Nothing to do really... */
+		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 		up_write(&sb->s_umount);
 		return 0;
 	}
 
-	sb->s_frozen = SB_FREEZE_WRITE;
+	/* From now on, no new normal writers can start */
+	sb->s_writers.frozen = SB_FREEZE_WRITE;
+	smp_wmb();
+
+	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
+	up_write(&sb->s_umount);
+
+	sb_wait_write(sb, SB_FREEZE_WRITE);
+
+	/* Now we go and block page faults... */
+	down_write(&sb->s_umount);
+	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
 	smp_wmb();
 
+	sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
+
+	/* All writers are done so after syncing there won't be dirty data */
 	sync_filesystem(sb);
 
-	sb->s_frozen = SB_FREEZE_TRANS;
+	/* Now wait for internal filesystem counter */
+	sb->s_writers.frozen = SB_FREEZE_FS;
 	smp_wmb();
+	sb_wait_write(sb, SB_FREEZE_FS);
 
-	sync_blockdev(sb->s_bdev);
 	if (sb->s_op->freeze_fs) {
 		ret = sb->s_op->freeze_fs(sb);
 		if (ret) {
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
-			sb->s_frozen = SB_UNFROZEN;
+			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
-			wake_up(&sb->s_wait_unfrozen);
+			wake_up(&sb->s_writers.wait_unfrozen);
 			deactivate_locked_super(sb);
 			return ret;
 		}
 	}
+	/*
+	 * This is just for debugging purposes so that fs can warn if it
+	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
+	 */
+	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	up_write(&sb->s_umount);
 	return 0;
 }
@@ -1231,7 +1519,7 @@ int thaw_super(struct super_block *sb)
 	int error;
 
 	down_write(&sb->s_umount);
-	if (sb->s_frozen == SB_UNFROZEN) {
+	if (sb->s_writers.frozen == SB_UNFROZEN) {
 		up_write(&sb->s_umount);
 		return -EINVAL;
 	}
@@ -1244,16 +1532,15 @@ int thaw_super(struct super_block *sb)
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
-			sb->s_frozen = SB_FREEZE_TRANS;
 			up_write(&sb->s_umount);
 			return error;
 		}
 	}
 
 out:
-	sb->s_frozen = SB_UNFROZEN;
+	sb->s_writers.frozen = SB_UNFROZEN;
 	smp_wmb();
-	wake_up(&sb->s_wait_unfrozen);
+	wake_up(&sb->s_writers.wait_unfrozen);
 	deactivate_locked_super(sb);
 
 	return 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8de6755..d3c5126 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,7 @@ struct inodes_stat_t {
 #include <linux/atomic.h>
 #include <linux/shrinker.h>
 #include <linux/migrate_mode.h>
+#include <linux/lockdep.h>
 
 #include <asm/byteorder.h>
 
@@ -1402,6 +1403,16 @@ extern void f_delown(struct file *filp);
 extern pid_t f_getown(struct file *filp);
 extern int send_sigurg(struct fown_struct *fown);
 
+struct mm_struct;
+
+void sb_start_write(struct super_block *sb);
+int sb_start_write_trylock(struct super_block *sb);
+void sb_end_write(struct super_block *sb);
+void sb_start_pagefault(struct super_block *sb);
+void sb_end_pagefault(struct super_block *sb);
+void sb_start_intwrite(struct super_block *sb);
+void sb_end_intwrite(struct super_block *sb);
+
 /*
  *	Umount options
  */
@@ -1415,6 +1426,32 @@ extern int send_sigurg(struct fown_struct *fown);
 extern struct list_head super_blocks;
 extern spinlock_t sb_lock;
 
+/* Possible states of 'frozen' field */
+enum {
+	SB_UNFROZEN = 0,		/* FS is unfrozen */
+	SB_FREEZE_WRITE	= 1,		/* Writes, dir ops, ioctls frozen */
+	SB_FREEZE_TRANS = 2,
+	SB_FREEZE_PAGEFAULT = 2,	/* Page faults stopped as well */
+	SB_FREEZE_FS = 3,		/* For internal FS use (e.g. to stop
+					 * internal threads if needed) */
+	SB_FREEZE_COMPLETE = 4,		/* ->freeze_fs finished successfully */
+};
+
+#define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
+
+struct sb_writers {
+	/* Counters for counting writers at each level */
+	struct percpu_counter	counter[SB_FREEZE_LEVELS];
+	wait_queue_head_t	wait;		/* queue for waiting for
+						   writers / faults to finish */
+	int			frozen;		/* Is sb frozen? */
+	wait_queue_head_t	wait_unfrozen;	/* queue for waiting for
+						   sb to be thawed */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	lock_map[SB_FREEZE_LEVELS];
+#endif
+};
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1464,6 +1501,7 @@ struct super_block {
 
 	int			s_frozen;
 	wait_queue_head_t	s_wait_unfrozen;
+	struct sb_writers	s_writers;
 
 	char s_id[32];				/* Informational name */
 	u8 s_uuid[16];				/* UUID */
@@ -1518,14 +1556,8 @@ extern struct timespec current_fs_time(struct super_block *sb);
 /*
  * Snapshotting support.
  */
-enum {
-	SB_UNFROZEN = 0,
-	SB_FREEZE_WRITE	= 1,
-	SB_FREEZE_TRANS = 2,
-};
-
-#define vfs_check_frozen(sb, level) \
-	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
+/* Will go away when all users are converted */
+#define vfs_check_frozen(sb, level) do { } while (0)
 
 /*
  * until VFS tracks user namespaces for inodes, just make all files
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 14/27] fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (12 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 13/27] fs: Improve filesystem freezing handling Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 15/27] fs: Skip atime update on frozen filesystem Jan Kara
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Most of places where we want freeze protection coincides with the places where
we also have remount-ro protection. So make mnt_want_write() and
mnt_drop_write() (and their _file alternative) prevent freezing as well.
For the few cases that are really interested only in remount-ro protection
provide new function variants.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/file_table.c       |    2 +-
 fs/inode.c            |    4 +-
 fs/namei.c            |   16 +++++++-
 fs/namespace.c        |  101 +++++++++++++++++++++++++++++++++++++++----------
 fs/open.c             |    2 +-
 include/linux/mount.h |    4 ++
 6 files changed, 103 insertions(+), 26 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 70f2a0f..e842ca3 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -216,7 +216,7 @@ static void drop_file_write_access(struct file *file)
 		return;
 	if (file_check_writeable(file) != 0)
 		return;
-	mnt_drop_write(mnt);
+	__mnt_drop_write(mnt);
 	file_release_write(file);
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index 9f4f5fe..396a388 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1560,7 +1560,7 @@ void file_update_time(struct file *file)
 		return;
 
 	/* Finally allowed to write? Takes lock. */
-	if (mnt_want_write_file(file))
+	if (__mnt_want_write_file(file))
 		return;
 
 	/* Only change inode inside the lock region */
@@ -1571,7 +1571,7 @@ void file_update_time(struct file *file)
 	if (sync_it & S_MTIME)
 		inode->i_mtime = now;
 	mark_inode_dirty_sync(inode);
-	mnt_drop_write_file(file);
+	__mnt_drop_write_file(file);
 }
 EXPORT_SYMBOL(file_update_time);
 
diff --git a/fs/namei.c b/fs/namei.c
index 5417fa1..87dd013 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2243,12 +2243,21 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	if (nd->last.name[nd->last.len])
 		goto exit;
 
+	/*
+	 * We don't know whether write access will be needed or not. But we
+	 * must get the protection before getting i_mutex due to locking
+	 * constraints. Thus O_CREAT open will get blocked on frozen filesystem
+	 * even if it didn't need to create anything. Not that surprising...
+	 */
+	sb_start_write(dir->d_sb);
+
 	mutex_lock(&dir->d_inode->i_mutex);
 
 	dentry = lookup_hash(nd);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry)) {
 		mutex_unlock(&dir->d_inode->i_mutex);
+		sb_end_write(dir->d_sb);
 		goto exit;
 	}
 
@@ -2267,9 +2276,11 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 		 * a permanent write count is taken through
 		 * the 'struct file' in nameidata_to_filp().
 		 */
-		error = mnt_want_write(nd->path.mnt);
-		if (error)
+		error = __mnt_want_write(nd->path.mnt);
+		if (error) {
+			sb_end_write(dir->d_sb);
 			goto exit_mutex_unlock;
+		}
 		want_write = 1;
 		/* Don't check for write permission, don't truncate */
 		open_flag &= ~O_TRUNC;
@@ -2291,6 +2302,7 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	 * It already exists.
 	 */
 	mutex_unlock(&dir->d_inode->i_mutex);
+	sb_end_write(dir->d_sb);
 	audit_inode(pathname, path->dentry);
 
 	error = -EEXIST;
diff --git a/fs/namespace.c b/fs/namespace.c
index e608199..157afbe 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -283,24 +283,22 @@ static int mnt_is_readonly(struct vfsmount *mnt)
 }
 
 /*
- * Most r/o checks on a fs are for operations that take
- * discrete amounts of time, like a write() or unlink().
- * We must keep track of when those operations start
- * (for permission checks) and when they end, so that
- * we can determine when writes are able to occur to
- * a filesystem.
+ * Most r/o & frozen checks on a fs are for operations that take discrete
+ * amounts of time, like a write() or unlink().  We must keep track of when
+ * those operations start (for permission checks) and when they end, so that we
+ * can determine when writes are able to occur to a filesystem.
  */
 /**
- * mnt_want_write - get write access to a mount
+ * __mnt_want_write - get write access to a mount without freeze protection
  * @m: the mount on which to take a write
  *
- * This tells the low-level filesystem that a write is
- * about to be performed to it, and makes sure that
- * writes are allowed before returning success.  When
- * the write operation is finished, mnt_drop_write()
- * must be called.  This is effectively a refcount.
+ * This tells the low-level filesystem that a write is about to be performed to
+ * it, and makes sure that writes are allowed (mnt it read-write) before
+ * returning success. This operation does not protect against filesystem being
+ * frozen. When the write operation is finished, __mnt_drop_write() must be
+ * called. This is effectively a refcount.
  */
-int mnt_want_write(struct vfsmount *m)
+int __mnt_want_write(struct vfsmount *m)
 {
 	struct mount *mnt = real_mount(m);
 	int ret = 0;
@@ -326,6 +324,28 @@ int mnt_want_write(struct vfsmount *m)
 		ret = -EROFS;
 	}
 	preempt_enable();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__mnt_want_write);
+
+/**
+ * mnt_want_write - get write access to a mount
+ * @m: the mount on which to take a write
+ *
+ * This tells the low-level filesystem that a write is about to be performed to
+ * it, and makes sure that writes are allowed (mount is read-write, filesystem
+ * is not frozen) before returning success.  When the write operation is
+ * finished, mnt_drop_write() must be called.  This is effectively a refcount.
+ */
+int mnt_want_write(struct vfsmount *m)
+{
+	int ret;
+
+	sb_start_write(m->mnt_sb);
+	ret = __mnt_want_write(m);
+	if (ret)
+		sb_end_write(m->mnt_sb);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mnt_want_write);
@@ -355,38 +375,79 @@ int mnt_clone_write(struct vfsmount *mnt)
 EXPORT_SYMBOL_GPL(mnt_clone_write);
 
 /**
- * mnt_want_write_file - get write access to a file's mount
+ * __mnt_want_write_file - get write access to a file's mount
  * @file: the file who's mount on which to take a write
  *
- * This is like mnt_want_write, but it takes a file and can
+ * This is like __mnt_want_write, but it takes a file and can
  * do some optimisations if the file is open for write already
  */
-int mnt_want_write_file(struct file *file)
+int __mnt_want_write_file(struct file *file)
 {
 	struct inode *inode = file->f_dentry->d_inode;
+
 	if (!(file->f_mode & FMODE_WRITE) || special_file(inode->i_mode))
-		return mnt_want_write(file->f_path.mnt);
+		return __mnt_want_write(file->f_path.mnt);
 	else
 		return mnt_clone_write(file->f_path.mnt);
 }
+EXPORT_SYMBOL_GPL(__mnt_want_write_file);
+
+/**
+ * mnt_want_write_file - get write access to a file's mount
+ * @file: the file who's mount on which to take a write
+ *
+ * This is like mnt_want_write, but it takes a file and can
+ * do some optimisations if the file is open for write already
+ */
+int mnt_want_write_file(struct file *file)
+{
+	int ret;
+
+	sb_start_write(file->f_path.mnt->mnt_sb);
+	ret = __mnt_want_write_file(file);
+	if (ret)
+		sb_end_write(file->f_path.mnt->mnt_sb);
+	return ret;
+}
 EXPORT_SYMBOL_GPL(mnt_want_write_file);
 
 /**
- * mnt_drop_write - give up write access to a mount
+ * __mnt_drop_write - give up write access to a mount
  * @mnt: the mount on which to give up write access
  *
  * Tells the low-level filesystem that we are done
  * performing writes to it.  Must be matched with
- * mnt_want_write() call above.
+ * __mnt_want_write() call above.
  */
-void mnt_drop_write(struct vfsmount *mnt)
+void __mnt_drop_write(struct vfsmount *mnt)
 {
 	preempt_disable();
 	mnt_dec_writers(real_mount(mnt));
 	preempt_enable();
 }
+EXPORT_SYMBOL_GPL(__mnt_drop_write);
+
+/**
+ * mnt_drop_write - give up write access to a mount
+ * @mnt: the mount on which to give up write access
+ *
+ * Tells the low-level filesystem that we are done performing writes to it and
+ * also allows filesystem to be frozen again.  Must be matched with
+ * mnt_want_write() call above.
+ */
+void mnt_drop_write(struct vfsmount *mnt)
+{
+	__mnt_drop_write(mnt);
+	sb_end_write(mnt->mnt_sb);
+}
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
+void __mnt_drop_write_file(struct file *file)
+{
+	__mnt_drop_write(file->f_path.mnt);
+}
+EXPORT_SYMBOL(__mnt_drop_write_file);
+
 void mnt_drop_write_file(struct file *file)
 {
 	mnt_drop_write(file->f_path.mnt);
diff --git a/fs/open.c b/fs/open.c
index 5720854..06afb5b 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -637,7 +637,7 @@ static inline int __get_file_write_access(struct inode *inode,
 		/*
 		 * Balanced in __fput()
 		 */
-		error = mnt_want_write(mnt);
+		error = __mnt_want_write(mnt);
 		if (error)
 			put_write_access(inode);
 	}
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..d6ffc3d 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -55,10 +55,14 @@ struct vfsmount {
 
 struct file; /* forward dec */
 
+extern int __mnt_want_write(struct vfsmount *mnt);
 extern int mnt_want_write(struct vfsmount *mnt);
+extern int __mnt_want_write_file(struct file *file);
 extern int mnt_want_write_file(struct file *file);
 extern int mnt_clone_write(struct vfsmount *mnt);
+extern void __mnt_drop_write(struct vfsmount *mnt);
 extern void mnt_drop_write(struct vfsmount *mnt);
+extern void __mnt_drop_write_file(struct file *file);
 extern void mnt_drop_write_file(struct file *file);
 extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 15/27] fs: Skip atime update on frozen filesystem
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (13 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 14/27] fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 16/27] fs: Protect write paths by sb_start_write - sb_end_write Jan Kara
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

It is unexpected to block reading of frozen filesystem because of atime update.
Also handling blocking on frozen filesystem because of atime update would make
locking more complex than it already is. So just skip atime update when
filesystem is frozen like we skip it when filesystem is remounted read-only.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 396a388..3ded74d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1515,12 +1515,17 @@ void touch_atime(struct path *path)
 	if (timespec_equal(&inode->i_atime, &now))
 		return;
 
-	if (mnt_want_write(mnt))
+	if (!sb_start_write_trylock(inode->i_sb))
 		return;
 
+	if (__mnt_want_write(mnt))
+		goto skip_update;
+
 	inode->i_atime = now;
 	mark_inode_dirty_sync(inode);
-	mnt_drop_write(mnt);
+	__mnt_drop_write(mnt);
+skip_update:
+	sb_end_write(inode->i_sb);
 }
 EXPORT_SYMBOL(touch_atime);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 16/27] fs: Protect write paths by sb_start_write - sb_end_write
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (14 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 15/27] fs: Skip atime update on frozen filesystem Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

There are several entry points which dirty pages in a filesystem.  mmap
(handled by block_page_mkwrite()), buffered write (handled by
__generic_file_aio_write()), splice write (generic_file_splice_write),
truncate, and fallocate (these can dirty last partial page - handled inside
each filesystem separately). Protect these places with sb_start_write() and
sb_end_write().

->page_mkwrite() calls are particularly complex since they are called with
mmap_sem held and thus we cannot use standard sb_start_write() due to lock
ordering constraints. We solve the problem by using a special freeze protection
sb_start_pagefault() which ranks below mmap_sem.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c      |   22 ++++------------------
 fs/open.c        |    7 ++++++-
 fs/splice.c      |    3 +++
 mm/filemap.c     |   12 ++++++++++--
 mm/filemap_xip.c |    5 +++--
 5 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b8a2386..0331d4a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2301,8 +2301,8 @@ EXPORT_SYMBOL(block_commit_write);
  * beyond EOF, then the page is guaranteed safe against truncation until we
  * unlock the page.
  *
- * Direct callers of this function should call vfs_check_frozen() so that page
- * fault does not busyloop until the fs is thawed.
+ * Direct callers of this function should protect against filesystem freezing
+ * using sb_start_write() - sb_end_write() functions.
  */
 int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 			 get_block_t get_block)
@@ -2340,18 +2340,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	if (unlikely(ret < 0))
 		goto out_unlock;
-	/*
-	 * Freezing in progress? We check after the page is marked dirty and
-	 * with page lock held so if the test here fails, we are sure freezing
-	 * code will wait during syncing until the page fault is done - at that
-	 * point page will be dirty and unlocked so freezing code will write it
-	 * and writeprotect it again.
-	 */
 	set_page_dirty(page);
-	if (inode->i_sb->s_frozen != SB_UNFROZEN) {
-		ret = -EAGAIN;
-		goto out_unlock;
-	}
 	wait_on_page_writeback(page);
 	return 0;
 out_unlock:
@@ -2366,12 +2355,9 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	int ret;
 	struct super_block *sb = vma->vm_file->f_path.dentry->d_inode->i_sb;
 
-	/*
-	 * This check is racy but catches the common case. The check in
-	 * __block_page_mkwrite() is reliable.
-	 */
-	vfs_check_frozen(sb, SB_FREEZE_WRITE);
+	sb_start_pagefault(sb);
 	ret = __block_page_mkwrite(vma, vmf, get_block);
+	sb_end_pagefault(sb);
 	return block_page_mkwrite_return(ret);
 }
 EXPORT_SYMBOL(block_page_mkwrite);
diff --git a/fs/open.c b/fs/open.c
index 06afb5b..e3181a4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -164,11 +164,13 @@ static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 	if (IS_APPEND(inode))
 		goto out_putf;
 
+	sb_start_write(inode->i_sb);
 	error = locks_verify_truncate(inode, file, length);
 	if (!error)
 		error = security_path_truncate(&file->f_path);
 	if (!error)
 		error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
+	sb_end_write(inode->i_sb);
 out_putf:
 	fput(file);
 out:
@@ -266,7 +268,10 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (!file->f_op->fallocate)
 		return -EOPNOTSUPP;
 
-	return file->f_op->fallocate(file, mode, offset, len);
+	sb_start_write(inode->i_sb);
+	ret = file->f_op->fallocate(file, mode, offset, len);
+	sb_end_write(inode->i_sb);
+	return ret;
 }
 
 SYSCALL_DEFINE(fallocate)(int fd, int mode, loff_t offset, loff_t len)
diff --git a/fs/splice.c b/fs/splice.c
index f847684..e06da17 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -992,6 +992,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	};
 	ssize_t ret;
 
+	sb_start_write(inode->i_sb);
+
 	pipe_lock(pipe);
 
 	splice_from_pipe_begin(&sd);
@@ -1028,6 +1030,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
+	sb_end_write(inode->i_sb);
 
 	return ret;
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index aec7356..689ffdb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1757,6 +1757,7 @@ int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	int ret = VM_FAULT_LOCKED;
 
+	sb_start_pagefault(inode->i_sb);
 	file_update_time(vma->vm_file);
 	lock_page(page);
 	if (page->mapping != inode->i_mapping) {
@@ -1764,7 +1765,14 @@ int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 		ret = VM_FAULT_NOPAGE;
 		goto out;
 	}
+	/*
+	 * We mark the page dirty already here so that when freeze is in
+	 * progress, we are guaranteed that writeback during freezing will
+	 * see the dirty page and writeprotect it again.
+	 */
+	set_page_dirty(page);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 EXPORT_SYMBOL(filemap_page_mkwrite);
@@ -2530,8 +2538,6 @@ ssize_t __generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	count = ocount;
 	pos = *ppos;
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
-
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
 	written = 0;
@@ -2628,6 +2634,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
+	sb_start_write(inode->i_sb);
 	mutex_lock(&inode->i_mutex);
 	blk_start_plug(&plug);
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
@@ -2641,6 +2648,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			ret = err;
 	}
 	blk_finish_plug(&plug);
+	sb_end_write(inode->i_sb);
 	return ret;
 }
 EXPORT_SYMBOL(generic_file_aio_write);
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index 591dba6..b051f0d 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -402,6 +402,8 @@ xip_file_write(struct file *filp, const char __user *buf, size_t len,
 	loff_t pos;
 	ssize_t ret;
 
+	sb_start_write(inode->i_sb);
+
 	mutex_lock(&inode->i_mutex);
 
 	if (!access_ok(VERIFY_READ, buf, len)) {
@@ -412,8 +414,6 @@ xip_file_write(struct file *filp, const char __user *buf, size_t len,
 	pos = *ppos;
 	count = len;
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
-
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
 
@@ -435,6 +435,7 @@ xip_file_write(struct file *filp, const char __user *buf, size_t len,
 	current->backing_dev_info = NULL;
  out_up:
 	mutex_unlock(&inode->i_mutex);
+	sb_end_write(inode->i_sb);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(xip_file_write);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 17/27] ext4: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (15 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 16/27] fs: Protect write paths by sb_start_write - sb_end_write Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 18/27] xfs: Convert to new freezing code Jan Kara
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-ext4,
	Theodore Ts'o

We remove most of frozen checks since upper layer takes care
of blocking all writes. We only have to handle protection in
ext4_page_mkwrite() in a special way because we cannot use
generic block_page_mkwrite().

CC: linux-ext4@vger.kernel.org
CC: "Theodore Ts'o" <tytso@mit.edu>
BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |    7 ++-----
 fs/ext4/mmp.c   |   14 ++++++++++----
 fs/ext4/super.c |   31 +++++++------------------------
 3 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c77b0bd..bf568fb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4600,11 +4600,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	get_block_t *get_block;
 	int retries = 0;
 
-	/*
-	 * This check is racy but catches the common case. We rely on
-	 * __block_page_mkwrite() to do a reliable check.
-	 */
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_pagefault(inode->i_sb);
 	/* Delalloc case is easy... */
 	if (test_opt(inode->i_sb, DELALLOC) &&
 	    !ext4_should_journal_data(inode) &&
@@ -4672,5 +4668,6 @@ retry_alloc:
 out_ret:
 	ret = block_page_mkwrite_return(ret);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index ed6548d..4f63f90 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -10,14 +10,20 @@
  * Write the MMP block using WRITE_SYNC to try to get the block on-disk
  * faster.
  */
-static int write_mmp_block(struct buffer_head *bh)
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
 {
+	/*
+	 * We protect against freezing so that we don't create dirty buffers
+	 * on frozen filesystem.
+	 */
+	sb_start_write(sb);
 	mark_buffer_dirty(bh);
 	lock_buffer(bh);
 	bh->b_end_io = end_buffer_write_sync;
 	get_bh(bh);
 	submit_bh(WRITE_SYNC, bh);
 	wait_on_buffer(bh);
+	sb_end_write(sb);
 	if (unlikely(!buffer_uptodate(bh)))
 		return 1;
 
@@ -120,7 +126,7 @@ static int kmmpd(void *data)
 		mmp->mmp_time = cpu_to_le64(get_seconds());
 		last_update_time = jiffies;
 
-		retval = write_mmp_block(bh);
+		retval = write_mmp_block(sb, bh);
 		/*
 		 * Don't spew too many error messages. Print one every
 		 * (s_mmp_update_interval * 60) seconds.
@@ -200,7 +206,7 @@ static int kmmpd(void *data)
 	mmp->mmp_seq = cpu_to_le32(EXT4_MMP_SEQ_CLEAN);
 	mmp->mmp_time = cpu_to_le64(get_seconds());
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 
 failed:
 	kfree(data);
@@ -299,7 +305,7 @@ skip:
 	seq = mmp_new_seq();
 	mmp->mmp_seq = cpu_to_le32(seq);
 
-	retval = write_mmp_block(bh);
+	retval = write_mmp_block(sb, bh);
 	if (retval)
 		goto failed;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ceebaf8..08c7326 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -290,33 +290,17 @@ static void ext4_put_nojournal(handle_t *handle)
  * journal_end calls result in the superblock being marked dirty, so
  * that sync() will call the filesystem's write_super callback if
  * appropriate.
- *
- * To avoid j_barrier hold in userspace when a user calls freeze(),
- * ext4 prevents a new handle from being started by s_frozen, which
- * is in an upper layer.
  */
 handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 {
 	journal_t *journal;
-	handle_t  *handle;
 
 	trace_ext4_journal_start(sb, nblocks, _RET_IP_);
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);
 
+	WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
 	journal = EXT4_SB(sb)->s_journal;
-	handle = ext4_journal_current_handle();
-
-	/*
-	 * If a handle has been started, it should be allowed to
-	 * finish, otherwise deadlock could happen between freeze
-	 * and others(e.g. truncate) due to the restart of the
-	 * journal handle if the filesystem is forzen and active
-	 * handles are not stopped.
-	 */
-	if (!handle)
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
-
 	if (!journal)
 		return ext4_get_nojournal();
 	/*
@@ -2635,6 +2619,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
+	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2661,6 +2646,7 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
+	sb_end_write(sb);
 
 	return ret;
 }
@@ -4137,10 +4123,8 @@ int ext4_force_commit(struct super_block *sb)
 		return 0;
 
 	journal = EXT4_SB(sb)->s_journal;
-	if (journal) {
-		vfs_check_frozen(sb, SB_FREEZE_TRANS);
+	if (journal)
 		ret = ext4_journal_force_commit(journal);
-	}
 
 	return ret;
 }
@@ -4172,9 +4156,8 @@ static int ext4_sync_fs(struct super_block *sb, int wait)
  * gives us a chance to flush the journal completely and mark the fs clean.
  *
  * Note that only this function cannot bring a filesystem to be in a clean
- * state independently, because ext4 prevents a new handle from being started
- * by @sb->s_frozen, which stays in an upper layer.  It thus needs help from
- * the upper layer.
+ * state independently. It relies on upper layer to stop all data & metadata
+ * modifications.
  */
 static int ext4_freeze(struct super_block *sb)
 {
@@ -4201,7 +4184,7 @@ static int ext4_freeze(struct super_block *sb)
 	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 	error = ext4_commit_super(sb, 1);
 out:
-	/* we rely on s_frozen to stop further updates */
+	/* we rely on upper layer to stop further updates */
 	jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
 	return error;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 18/27] xfs: Convert to new freezing code
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (16 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-16 16:13 ` [PATCH 19/27] ocfs2: Convert to new freezing mechanism Jan Kara
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Alex Elder, Jan Kara, LKML, xfs, Ben Myers, dchinner,
	linux-fsdevel

Generic code now blocks all writers from standard write paths. So we add
blocking of all writers coming from ioctl (we get a protection of ioctl against
racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
non-racy freeze protection. We also keep freeze protection on transaction
start to block internal filesystem writes such as removal of preallocated
blocks.

CC: Ben Myers <bpm@sgi.com>
CC: Alex Elder <elder@kernel.org>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/xfs/xfs_file.c    |   10 ++++++--
 fs/xfs/xfs_ioctl.c   |   55 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_ioctl32.c |   12 ++++++++++
 fs/xfs/xfs_iomap.c   |    4 +-
 fs/xfs/xfs_mount.c   |    2 +-
 fs/xfs/xfs_mount.h   |    3 --
 fs/xfs/xfs_sync.c    |    2 +-
 fs/xfs/xfs_trans.c   |   17 ++++++++++++--
 fs/xfs/xfs_trans.h   |    2 +
 9 files changed, 91 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 54a67dd..7c4c471 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -822,10 +822,12 @@ xfs_file_aio_write(
 	if (ocount == 0)
 		return 0;
 
-	xfs_wait_for_freeze(ip->i_mount, SB_FREEZE_WRITE);
+	sb_start_write(inode->i_sb);
 
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
-		return -EIO;
+	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
+		ret = -EIO;
+		goto out;
+	}
 
 	if (unlikely(file->f_flags & O_DIRECT))
 		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos, ocount);
@@ -844,6 +846,8 @@ xfs_file_aio_write(
 			ret = err;
 	}
 
+out:
+	sb_end_write(inode->i_sb);
 	return ret;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 91f8ff5..828b7cb 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -363,9 +363,15 @@ xfs_fssetdm_by_handle(
 	if (copy_from_user(&dmhreq, arg, sizeof(xfs_fsop_setdm_handlereq_t)))
 		return -XFS_ERROR(EFAULT);
 
+	error = mnt_want_write_file(parfilp);
+	if (error)
+		return error;
+
 	dentry = xfs_handlereq_to_dentry(parfilp, &dmhreq.hreq);
-	if (IS_ERR(dentry))
+	if (IS_ERR(dentry)) {
+		mnt_drop_write_file(parfilp);
 		return PTR_ERR(dentry);
+	}
 
 	if (IS_IMMUTABLE(dentry->d_inode) || IS_APPEND(dentry->d_inode)) {
 		error = -XFS_ERROR(EPERM);
@@ -381,6 +387,7 @@ xfs_fssetdm_by_handle(
 				 fsd.fsd_dmstate);
 
  out:
+	mnt_drop_write_file(parfilp);
 	dput(dentry);
 	return error;
 }
@@ -633,7 +640,11 @@ xfs_ioc_space(
 	if (ioflags & IO_INVIS)
 		attr_flags |= XFS_ATTR_DMI;
 
+	error = mnt_want_write_file(filp);
+	if (error)
+		return error;
 	error = xfs_change_file_space(ip, cmd, bf, filp->f_pos, attr_flags);
+	mnt_drop_write_file(filp);
 	return -error;
 }
 
@@ -1162,6 +1173,7 @@ xfs_ioc_fssetxattr(
 {
 	struct fsxattr		fa;
 	unsigned int		mask;
+	int error;
 
 	if (copy_from_user(&fa, arg, sizeof(fa)))
 		return -EFAULT;
@@ -1170,7 +1182,12 @@ xfs_ioc_fssetxattr(
 	if (filp->f_flags & (O_NDELAY|O_NONBLOCK))
 		mask |= FSX_NONBLOCK;
 
-	return -xfs_ioctl_setattr(ip, &fa, mask);
+	error = mnt_want_write_file(filp);
+	if (error)
+		return error;
+	error = xfs_ioctl_setattr(ip, &fa, mask);
+	mnt_drop_write_file(filp);
+	return -error;
 }
 
 STATIC int
@@ -1195,6 +1212,7 @@ xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	unsigned int		flags;
 	unsigned int		mask;
+	int error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
 		return -EFAULT;
@@ -1209,7 +1227,12 @@ xfs_ioc_setxflags(
 		mask |= FSX_NONBLOCK;
 	fa.fsx_xflags = xfs_merge_ioc_xflags(flags, xfs_ip2xflags(ip));
 
-	return -xfs_ioctl_setattr(ip, &fa, mask);
+	error = mnt_want_write_file(filp);
+	if (error)
+		return error;
+	error = xfs_ioctl_setattr(ip, &fa, mask);
+	mnt_drop_write_file(filp);
+	return -error;
 }
 
 STATIC int
@@ -1384,8 +1407,13 @@ xfs_file_ioctl(
 		if (copy_from_user(&dmi, arg, sizeof(dmi)))
 			return -XFS_ERROR(EFAULT);
 
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
+
 		error = xfs_set_dmattrs(ip, dmi.fsd_dmevmask,
 				dmi.fsd_dmstate);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 
@@ -1433,7 +1461,11 @@ xfs_file_ioctl(
 
 		if (copy_from_user(&sxp, arg, sizeof(xfs_swapext_t)))
 			return -XFS_ERROR(EFAULT);
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_swapext(&sxp);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 
@@ -1462,9 +1494,14 @@ xfs_file_ioctl(
 		if (copy_from_user(&inout, arg, sizeof(inout)))
 			return -XFS_ERROR(EFAULT);
 
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
+
 		/* input parameter is passed in resblks field of structure */
 		in = inout.resblks;
 		error = xfs_reserve_blocks(mp, &in, &inout);
+		mnt_drop_write_file(filp);
 		if (error)
 			return -error;
 
@@ -1495,7 +1532,11 @@ xfs_file_ioctl(
 		if (copy_from_user(&in, arg, sizeof(in)))
 			return -XFS_ERROR(EFAULT);
 
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_growfs_data(mp, &in);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 
@@ -1505,7 +1546,11 @@ xfs_file_ioctl(
 		if (copy_from_user(&in, arg, sizeof(in)))
 			return -XFS_ERROR(EFAULT);
 
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_growfs_log(mp, &in);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 
@@ -1515,7 +1560,11 @@ xfs_file_ioctl(
 		if (copy_from_user(&in, arg, sizeof(in)))
 			return -XFS_ERROR(EFAULT);
 
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_growfs_rt(mp, &in);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index a849a54..542ce93 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -602,7 +602,11 @@ xfs_file_compat_ioctl(
 
 		if (xfs_compat_growfs_data_copyin(&in, arg))
 			return -XFS_ERROR(EFAULT);
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_growfs_data(mp, &in);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 	case XFS_IOC_FSGROWFSRT_32: {
@@ -610,7 +614,11 @@ xfs_file_compat_ioctl(
 
 		if (xfs_compat_growfs_rt_copyin(&in, arg))
 			return -XFS_ERROR(EFAULT);
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_growfs_rt(mp, &in);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 #endif
@@ -629,7 +637,11 @@ xfs_file_compat_ioctl(
 				   offsetof(struct xfs_swapext, sx_stat)) ||
 		    xfs_ioctl32_bstat_copyin(&sxp.sx_stat, &sxu->sx_stat))
 			return -XFS_ERROR(EFAULT);
+		error = mnt_want_write_file(filp);
+		if (error)
+			return error;
 		error = xfs_swapext(&sxp);
+		mnt_drop_write_file(filp);
 		return -error;
 	}
 	case XFS_IOC_FSBULKSTAT_32:
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 71a4645..dbcae37 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -681,9 +681,9 @@ xfs_iomap_write_unwritten(
 		 * the same inode that we complete here and might deadlock
 		 * on the iolock.
 		 */
-		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
+		sb_start_intwrite(mp->m_super);
 		tp = _xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE, KM_NOFS);
-		tp->t_flags |= XFS_TRANS_RESERVE;
+		tp->t_flags |= XFS_TRANS_RESERVE | XFS_TRANS_FREEZE_PROT;
 		error = xfs_trans_reserve(tp, resblks,
 				XFS_WRITE_LOG_RES(mp), 0,
 				XFS_TRANS_PERM_LOG_RES,
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 1ffead4..5aa7444 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1543,7 +1543,7 @@ xfs_unmountfs(
 int
 xfs_fs_writable(xfs_mount_t *mp)
 {
-	return !(xfs_test_for_freeze(mp) || XFS_FORCED_SHUTDOWN(mp) ||
+	return !(mp->m_super->s_writers.frozen || XFS_FORCED_SHUTDOWN(mp) ||
 		(mp->m_flags & XFS_MOUNT_RDONLY));
 }
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 9eba738..73f6c7a 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -313,9 +313,6 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, int flags, char *fname,
 #define SHUTDOWN_REMOTE_REQ	0x0010	/* shutdown came from remote cell */
 #define SHUTDOWN_DEVICE_REQ	0x0020	/* failed all paths to the device */
 
-#define xfs_test_for_freeze(mp)		((mp)->m_super->s_frozen)
-#define xfs_wait_for_freeze(mp,l)	vfs_check_frozen((mp)->m_super, (l))
-
 /*
  * Flags for xfs_mountfs
  */
diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c
index 205ebcb..cb99ce0 100644
--- a/fs/xfs/xfs_sync.c
+++ b/fs/xfs/xfs_sync.c
@@ -462,7 +462,7 @@ xfs_sync_worker(
 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		/* dgc: errors ignored here */
-		if (mp->m_super->s_frozen == SB_UNFROZEN &&
+		if (mp->m_super->s_writers.frozen == SB_UNFROZEN &&
 		    xfs_log_need_covered(mp))
 			error = xfs_fs_log_dummy(mp);
 		else
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 103b00c..d012dd2 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -577,8 +577,12 @@ xfs_trans_alloc(
 	xfs_mount_t	*mp,
 	uint		type)
 {
-	xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
-	return _xfs_trans_alloc(mp, type, KM_SLEEP);
+	xfs_trans_t     *tp;
+
+	sb_start_intwrite(mp->m_super);
+	tp = _xfs_trans_alloc(mp, type, KM_SLEEP);
+	tp->t_flags |= XFS_TRANS_FREEZE_PROT;
+	return tp;
 }
 
 xfs_trans_t *
@@ -589,6 +593,7 @@ _xfs_trans_alloc(
 {
 	xfs_trans_t	*tp;
 
+	WARN_ON(mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
 	atomic_inc(&mp->m_active_trans);
 
 	tp = kmem_zone_zalloc(xfs_trans_zone, memflags);
@@ -612,6 +617,8 @@ xfs_trans_free(
 	xfs_alloc_busy_clear(tp->t_mountp, &tp->t_busy, false);
 
 	atomic_dec(&tp->t_mountp->m_active_trans);
+	if (tp->t_flags & XFS_TRANS_FREEZE_PROT)
+		sb_end_intwrite(tp->t_mountp->m_super);
 	xfs_trans_free_dqinfo(tp);
 	kmem_zone_free(xfs_trans_zone, tp);
 }
@@ -644,7 +651,11 @@ xfs_trans_dup(
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(tp->t_ticket != NULL);
 
-	ntp->t_flags = XFS_TRANS_PERM_LOG_RES | (tp->t_flags & XFS_TRANS_RESERVE);
+	ntp->t_flags = XFS_TRANS_PERM_LOG_RES |
+		       (tp->t_flags & XFS_TRANS_RESERVE) |
+		       (tp->t_flags & XFS_TRANS_FREEZE_PROT);
+	/* We gave our writer reference to the new transaction */
+	tp->t_flags &= ~XFS_TRANS_FREEZE_PROT;
 	ntp->t_ticket = xfs_log_ticket_get(tp->t_ticket);
 	ntp->t_blk_res = tp->t_blk_res - tp->t_blk_res_used;
 	tp->t_blk_res = tp->t_blk_res_used;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index f611870..e42d94e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -179,6 +179,8 @@ struct xfs_log_item_desc {
 #define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
 #define XFS_TRANS_DQ_DIRTY	0x10	/* at least one dquot in trx dirty */
 #define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
+#define XFS_TRANS_FREEZE_PROT	0x40	/* Transaction has elevated writer
+					   count in superblock */
 
 /*
  * Values for call flags parameter.
-- 
1.7.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 19/27] ocfs2: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (17 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 18/27] xfs: Convert to new freezing code Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
  2012-04-17  2:19   ` Joel Becker
  2012-04-16 16:13 ` [PATCH 20/27] gfs2: " Jan Kara
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Mark Fasheh, LKML, dchinner, linux-fsdevel, ocfs2-devel

Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new
freeze protection. We also protect several ioctl entry points which
were missing the protection.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/file.c  |   11 +++++++++--
 fs/ocfs2/ioctl.c |   14 ++++++++++++--
 fs/ocfs2/mmap.c  |    2 ++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 061591a..9b1e3d4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1971,6 +1971,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int ret;
 
 	if ((cmd == OCFS2_IOC_RESVSP || cmd == OCFS2_IOC_RESVSP64) &&
 	    !ocfs2_writes_unwritten_extents(osb))
@@ -1985,7 +1986,12 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 
-	return __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+	ret = __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
+	mnt_drop_write_file(file);
+	return ret;
 }
 
 static long ocfs2_fallocate(struct file *file, int mode, loff_t offset,
@@ -2261,7 +2267,7 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
 	if (iocb->ki_left == 0)
 		return 0;
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_write(inode->i_sb);
 
 	appending = file->f_flags & O_APPEND ? 1 : 0;
 	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
@@ -2434,6 +2440,7 @@ out_sems:
 		ocfs2_iocb_clear_sem_locked(iocb);
 
 	mutex_unlock(&inode->i_mutex);
+	sb_end_write(inode->i_sb);
 
 	if (written)
 		ret = written;
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index a1a1bfd..d9c352b 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -926,7 +926,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (get_user(new_clusters, (int __user *)arg))
 			return -EFAULT;
 
-		return ocfs2_group_extend(inode, new_clusters);
+		status = mnt_want_write_file(filp);
+		if (status)
+			return status;
+		status = ocfs2_group_extend(inode, new_clusters);
+		mnt_drop_write_file(filp);
+		return status;
 	case OCFS2_IOC_GROUP_ADD:
 	case OCFS2_IOC_GROUP_ADD64:
 		if (!capable(CAP_SYS_RESOURCE))
@@ -935,7 +940,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (copy_from_user(&input, (int __user *) arg, sizeof(input)))
 			return -EFAULT;
 
-		return ocfs2_group_add(inode, &input);
+		status = mnt_want_write_file(filp);
+		if (status)
+			return status;
+		status = ocfs2_group_add(inode, &input);
+		mnt_drop_write_file(filp);
+		return status;
 	case OCFS2_IOC_REFLINK:
 		if (copy_from_user(&args, (struct reflink_arguments *)arg,
 				   sizeof(args)))
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 9cd4108..d150372 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -136,6 +136,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	sigset_t oldset;
 	int ret;
 
+	sb_start_pagefault(inode->i_sb);
 	ocfs2_block_signals(&oldset);
 
 	/*
@@ -165,6 +166,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 out:
 	ocfs2_unblock_signals(&oldset);
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 20/27] gfs2: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (18 preceding siblings ...)
  2012-04-16 16:13 ` [PATCH 19/27] ocfs2: Convert to new freezing mechanism Jan Kara
@ 2012-04-16 16:13 ` Jan Kara
       [not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, LKML, cluster-devel, linux-fsdevel

It is enough to update gfs2_page_mkwrite() to use new freeze protection.
Rest is handled by the generic code.

CC: cluster-devel@redhat.com
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/file.c |   15 +++------------
 1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 151d667..87f5aa8 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -370,11 +370,7 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	loff_t size;
 	int ret;
 
-	/* Wait if fs is frozen. This is racy so we check again later on
-	 * and retry if the fs has been frozen after the page lock has
-	 * been acquired
-	 */
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_pagefault(inode->i_sb);
 
 	/* Update file times before taking page lock */
 	file_update_time(vma->vm_file);
@@ -458,14 +454,9 @@ out:
 	gfs2_holder_uninit(&gh);
 	if (ret == 0) {
 		set_page_dirty(page);
-		/* This check must be post dropping of transaction lock */
-		if (inode->i_sb->s_frozen == SB_UNFROZEN) {
-			wait_on_page_writeback(page);
-		} else {
-			ret = -EAGAIN;
-			unlock_page(page);
-		}
+		wait_on_page_writeback(page);
 	}
+	sb_end_pagefault(inode->i_sb);
 	return block_page_mkwrite_return(ret);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 21/27] fuse: Convert to new freezing mechanism
       [not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
@ 2012-04-16 16:13   ` Jan Kara
  2012-04-16 16:16   ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Miklos Szeredi,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, LKML,
	dchinner-H+wXaHxf7aLQT0dZR+AlfA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Convert check in fuse_file_aio_write() to using new freeze protection.

CC: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
---
 fs/fuse/file.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a841868..48b5147 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -943,8 +943,8 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	if (err)
 		return err;
 
+	sb_start_write(inode->i_sb);
 	mutex_lock(&inode->i_mutex);
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
 
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
@@ -970,6 +970,7 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 out:
 	current->backing_dev_info = NULL;
 	mutex_unlock(&inode->i_mutex);
+	sb_end_write(inode->i_sb);
 
 	return written ? written : err;
 }
-- 
1.7.1


------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 22/27] ntfs: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (20 preceding siblings ...)
       [not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 16:14 ` [PATCH 23/27] nilfs2: " Jan Kara
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-ntfs-dev,
	Anton Altaparmakov

Move check in ntfs_file_aio_write_nolock() to ntfs_file_aio_write() and
use new freeze protection.

CC: linux-ntfs-dev@lists.sourceforge.net
CC: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ntfs/file.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 8639169..e026e5f 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2084,7 +2084,6 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb,
 	if (err)
 		return err;
 	pos = *ppos;
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
 	/* We can write back this queue in page reclaim. */
 	current->backing_dev_info = mapping->backing_dev_info;
 	written = 0;
@@ -2117,6 +2116,7 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
+	sb_start_write(inode->i_sb);
 	mutex_lock(&inode->i_mutex);
 	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
@@ -2125,6 +2125,7 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		if (err < 0)
 			ret = err;
 	}
+	sb_end_write(inode->i_sb);
 	return ret;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 23/27] nilfs2: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (21 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 22/27] ntfs: Convert to new freezing mechanism Jan Kara
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 16:14 ` [PATCH 24/27] btrfs: " Jan Kara
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-nilfs,
	KONISHI Ryusuke

We change nilfs_page_mkwrite() to provide proper freeze protection for
writeable page faults (we must wait for frozen filesystem even if the
page is fully mapped).

We remove all vfs_check_frozen() checks since they are now handled by
the generic code.

CC: linux-nilfs@vger.kernel.org
CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nilfs2/file.c    |   18 +++++++++++-------
 fs/nilfs2/ioctl.c   |    2 --
 fs/nilfs2/segment.c |    5 ++++-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index 2660152..0cdd702 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -65,16 +65,18 @@ static int nilfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct page *page = vmf->page;
 	struct inode *inode = vma->vm_file->f_dentry->d_inode;
 	struct nilfs_transaction_info ti;
-	int ret;
+	int ret = 0;
 
 	if (unlikely(nilfs_near_disk_full(inode->i_sb->s_fs_info)))
 		return VM_FAULT_SIGBUS; /* -ENOSPC */
 
+	sb_start_pagefault(inode->i_sb);
 	lock_page(page);
 	if (page->mapping != inode->i_mapping ||
 	    page_offset(page) >= i_size_read(inode) || !PageUptodate(page)) {
 		unlock_page(page);
-		return VM_FAULT_NOPAGE; /* make the VM retry the fault */
+		ret = -EFAULT;	/* make the VM retry the fault */
+		goto out;
 	}
 
 	/*
@@ -108,19 +110,21 @@ static int nilfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	ret = nilfs_transaction_begin(inode->i_sb, &ti, 1);
 	/* never returns -ENOMEM, but may return -ENOSPC */
 	if (unlikely(ret))
-		return VM_FAULT_SIGBUS;
+		goto out;
 
-	ret = block_page_mkwrite(vma, vmf, nilfs_get_block);
-	if (ret != VM_FAULT_LOCKED) {
+	ret = __block_page_mkwrite(vma, vmf, nilfs_get_block);
+	if (ret) {
 		nilfs_transaction_abort(inode->i_sb);
-		return ret;
+		goto out;
 	}
 	nilfs_set_file_dirty(inode, 1 << (PAGE_SHIFT - inode->i_blkbits));
 	nilfs_transaction_commit(inode->i_sb);
 
  mapped:
 	wait_on_page_writeback(page);
-	return VM_FAULT_LOCKED;
+ out:
+	sb_end_pagefault(inode->i_sb);
+	return block_page_mkwrite_return(ret);
 }
 
 static const struct vm_operations_struct nilfs_file_vm_ops = {
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 2a70fce..a2063c7 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -660,8 +660,6 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 		goto out_free;
 	}
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
-
 	ret = nilfs_ioctl_move_blocks(inode->i_sb, &argv[0], kbufs[0]);
 	if (ret < 0)
 		printk(KERN_ERR "NILFS: GC failed during preparation: "
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 0e72ad6..627351a 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -189,7 +189,7 @@ int nilfs_transaction_begin(struct super_block *sb,
 	if (ret > 0)
 		return 0;
 
-	vfs_check_frozen(sb, SB_FREEZE_WRITE);
+	sb_start_intwrite(sb);
 
 	nilfs = sb->s_fs_info;
 	down_read(&nilfs->ns_segctor_sem);
@@ -205,6 +205,7 @@ int nilfs_transaction_begin(struct super_block *sb,
 	current->journal_info = ti->ti_save;
 	if (ti->ti_flags & NILFS_TI_DYNAMIC_ALLOC)
 		kmem_cache_free(nilfs_transaction_cachep, ti);
+	sb_end_intwrite(sb);
 	return ret;
 }
 
@@ -246,6 +247,7 @@ int nilfs_transaction_commit(struct super_block *sb)
 		err = nilfs_construct_segment(sb);
 	if (ti->ti_flags & NILFS_TI_DYNAMIC_ALLOC)
 		kmem_cache_free(nilfs_transaction_cachep, ti);
+	sb_end_intwrite(sb);
 	return err;
 }
 
@@ -264,6 +266,7 @@ void nilfs_transaction_abort(struct super_block *sb)
 	current->journal_info = ti->ti_save;
 	if (ti->ti_flags & NILFS_TI_DYNAMIC_ALLOC)
 		kmem_cache_free(nilfs_transaction_cachep, ti);
+	sb_end_intwrite(sb);
 }
 
 void nilfs_relax_pressure_in_lock(struct super_block *sb)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 24/27] btrfs: Convert to new freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (22 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 23/27] nilfs2: " Jan Kara
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 16:14 ` [PATCH 25/27] fs: Remove old " Jan Kara
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara, linux-btrfs, Chris Mason

We convert btrfs_file_aio_write() to use new freeze check.  We also add proper
freeze protection to btrfs_page_mkwrite(). Checks in cleaner_kthread() and
transaction_kthread() can be safely removed since btrfs_freeze() will lock
the mutexes and thus block the threads (and they shouldn't have anything to
do anyway).

CC: linux-btrfs@vger.kernel.org
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/btrfs/disk-io.c |    3 ---
 fs/btrfs/file.c    |    3 ++-
 fs/btrfs/inode.c   |    6 +++++-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..555a57a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1527,8 +1527,6 @@ static int cleaner_kthread(void *arg)
 	struct btrfs_root *root = arg;
 
 	do {
-		vfs_check_frozen(root->fs_info->sb, SB_FREEZE_WRITE);
-
 		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
 		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
 			btrfs_run_delayed_iputs(root);
@@ -1560,7 +1558,6 @@ static int transaction_kthread(void *arg)
 	do {
 		cannot_commit = false;
 		delay = HZ * 30;
-		vfs_check_frozen(root->fs_info->sb, SB_FREEZE_WRITE);
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);
 
 		spin_lock(&root->fs_info->trans_lock);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d83260d..a48251e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1358,7 +1358,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	ssize_t err = 0;
 	size_t count, ocount;
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_write(inode->i_sb);
 
 	mutex_lock(&inode->i_mutex);
 
@@ -1449,6 +1449,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 			num_written = err;
 	}
 out:
+	sb_end_write(inode->i_sb);
 	current->backing_dev_info = NULL;
 	return num_written ? num_written : err;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 115bc05..db4fc01 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6592,6 +6592,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	u64 page_start;
 	u64 page_end;
 
+	sb_start_pagefault(inode->i_sb);
 	ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
 	if (!ret) {
 		ret = btrfs_update_time(vma->vm_file);
@@ -6681,12 +6682,15 @@ again:
 	unlock_extent_cached(io_tree, page_start, page_end, &cached_state, GFP_NOFS);
 
 out_unlock:
-	if (!ret)
+	if (!ret) {
+		sb_end_pagefault(inode->i_sb);
 		return VM_FAULT_LOCKED;
+	}
 	unlock_page(page);
 out:
 	btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
 out_noreserve:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 25/27] fs: Remove old freezing mechanism
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (23 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 24/27] btrfs: " Jan Kara
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 16:14 ` [PATCH 26/27] fs: Refuse to freeze filesystem with open but unlinked files Jan Kara
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Now that all users are converted, we can remove functions, variables, and
constants defined by the old freezing mechanism.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c         |    1 -
 include/linux/fs.h |    9 ---------
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 6ed27cd..7b2c81c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -217,7 +217,6 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		mutex_init(&s->s_dquot.dqio_mutex);
 		mutex_init(&s->s_dquot.dqonoff_mutex);
 		init_rwsem(&s->s_dquot.dqptr_sem);
-		init_waitqueue_head(&s->s_wait_unfrozen);
 		s->s_maxbytes = MAX_NON_LFS;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d3c5126..e6bb666 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1430,7 +1430,6 @@ extern spinlock_t sb_lock;
 enum {
 	SB_UNFROZEN = 0,		/* FS is unfrozen */
 	SB_FREEZE_WRITE	= 1,		/* Writes, dir ops, ioctls frozen */
-	SB_FREEZE_TRANS = 2,
 	SB_FREEZE_PAGEFAULT = 2,	/* Page faults stopped as well */
 	SB_FREEZE_FS = 3,		/* For internal FS use (e.g. to stop
 					 * internal threads if needed) */
@@ -1499,8 +1498,6 @@ struct super_block {
 	struct hlist_node	s_instances;
 	struct quota_info	s_dquot;	/* Diskquota specific options */
 
-	int			s_frozen;
-	wait_queue_head_t	s_wait_unfrozen;
 	struct sb_writers	s_writers;
 
 	char s_id[32];				/* Informational name */
@@ -1554,12 +1551,6 @@ extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
 extern struct timespec current_fs_time(struct super_block *sb);
 
 /*
- * Snapshotting support.
- */
-/* Will go away when all users are converted */
-#define vfs_check_frozen(sb, level) do { } while (0)
-
-/*
  * until VFS tracks user namespaces for inodes, just make all files
  * belong to init_user_ns
  */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 26/27] fs: Refuse to freeze filesystem with open but unlinked files
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (24 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 25/27] fs: Remove old " Jan Kara
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 16:14 ` [PATCH 27/27] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Jan Kara
  2012-04-16 22:02 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Andreas Dilger
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro; +Cc: dchinner, LKML, linux-fsdevel, Jan Kara

Filesystem with frozen but unlinked files cannot be forced into
a fully consistent state. Also handling of closing such files on frozen
filesystem is problematic since fput() can be called with mmap_sem held
and protection against frozen filesystem ranks above it.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/super.c |   26 +++++++++++++++++++-------
 1 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7b2c81c..2bd3632 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1485,16 +1485,22 @@ int freeze_super(struct super_block *sb)
 	smp_wmb();
 	sb_wait_write(sb, SB_FREEZE_FS);
 
+	/*
+	 * Check whether there are some open unlinked files (since all writers
+	 * are blocked, new ones cannot be created now so the check is
+	 * reliable). In that case filesystem cannot be forced into a
+	 * consistent state so we just bail out.
+	 */
+	if (atomic_long_read(&sb->s_remove_count)) {
+		ret = -EBUSY;
+		goto bail;
+	}
+
 	if (sb->s_op->freeze_fs) {
 		ret = sb->s_op->freeze_fs(sb);
 		if (ret) {
-			printk(KERN_ERR
-				"VFS:Filesystem freeze failed\n");
-			sb->s_writers.frozen = SB_UNFROZEN;
-			smp_wmb();
-			wake_up(&sb->s_writers.wait_unfrozen);
-			deactivate_locked_super(sb);
-			return ret;
+			printk(KERN_ERR "VFS: Filesystem freeze failed\n");
+			goto bail;
 		}
 	}
 	/*
@@ -1504,6 +1510,12 @@ int freeze_super(struct super_block *sb)
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	up_write(&sb->s_umount);
 	return 0;
+bail:
+	sb->s_writers.frozen = SB_UNFROZEN;
+	smp_wmb();
+	wake_up(&sb->s_writers.wait_unfrozen);
+	deactivate_locked_super(sb);
+	return ret;
 }
 EXPORT_SYMBOL(freeze_super);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 27/27] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (25 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 26/27] fs: Refuse to freeze filesystem with open but unlinked files Jan Kara
@ 2012-04-16 16:14 ` Jan Kara
  2012-04-16 22:02 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Andreas Dilger
  27 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:14 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner, LKML, linux-fsdevel, Valerie Aurora, Kamal Mostafa,
	Jan Kara

From: Valerie Aurora <val@vaaconsulting.com>

freeze_fs/unfreeze_fs ops are called with s_umount held for write, not read.

Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/filesystems/Locking |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 4fca82e..406fb09 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -134,8 +134,8 @@ evict_inode:
 put_super:		write
 write_super:		read
 sync_fs:		read
-freeze_fs:		read
-unfreeze_fs:		read
+freeze_fs:		write
+unfreeze_fs:		write
 statfs:			maybe(read)	(see below)
 remount_fs:		write
 umount_begin:		no
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
       [not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
  2012-04-16 16:13   ` [PATCH 21/27] fuse: " Jan Kara
@ 2012-04-16 16:16   ` Jan Kara
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-16 16:16 UTC (permalink / raw)
  To: Al Viro
  Cc: dchinner-H+wXaHxf7aLQT0dZR+AlfA, LKML,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jan Kara, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason,
	cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, David S. Miller,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, J. Bruce Fields,
	Joel Becker, KONISHI Ryusuke, linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
	Miklos Szeredi, ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o,
	xfs-VZNHf3L845pBDgjK7y7TUQ

  The subject should have been [PATCH 00/27]... Sorry for the mistake.

								Honza

On Mon 16-04-12 18:13:38, Jan Kara wrote:
>   Hello,
> 
>   here is the fifth iteration of my patches to improve filesystem freezing.
> No serious changes since last time. Mostly I rebased patches and merged this
> series with series moving file_update_time() to ->page_mkwrite() to simplify
> testing and merging.
> 
> Filesystem freezing is currently racy and thus we can end up with dirty data on
> frozen filesystem (see changelog patch 13 for detailed race description). This
> patch series aims at fixing this.
> 
> To be able to block all places where inodes get dirtied, I've moved filesystem
> file_update_time() call to ->page_mkwrite callback (patches 01-07) and put
> freeze handling in mnt_want_write() / mnt_drop_write(). That however required
> some code shuffling and changes to kern_path_create() (see patches 09-12). I
> think the result is OK but opinions may differ ;). The advantage of this change
> also is that all filesystems get freeze protection almost for free - even ext2
> can handle freezing well now.
> 
> Another potential contention point might be patch 19. In that patch we make
> freeze_super() refuse to freeze the filesystem when there are open but unlinked
> files which may be impractical in some cases. The main reason for this is the
> problem with handling of file deletion from fput() called with mmap_sem held
> (e.g. from munmap(2)), and then there's the fact that we cannot really force
> such filesystem into a consistent state... But if people think that freezing
> with open but unlinked files should happen, then I have some possible
> solutions in mind (maybe as a separate patchset since this is large enough).
> 
> I'm not able to hit any deadlocks, lockdep warnings, or dirty data on frozen
> filesystem despite beating it with fsstress and bash-shared-mapping while
> freezing and unfreezing for several hours (using ext4 and xfs) so I'm
> reasonably confident this could finally be the right solution.
> 
> Changes since v4:
>   * added a couple of Acked-by's
>   * added some comments & doc update
>   * added patches from series "Push file_update_time() into .page_mkwrite"
>     since it doesn't make much sense to keep them separate anymore
>   * rebased on top of 3.4-rc2
> 
> Changes since v3:
>   * added third level of freezing for fs internal purposes - hooked some
>     filesystems to use it (XFS, nilfs2)
>   * removed racy i_size check from filemap_mkwrite()
> 
> Changes since v2:
>   * completely rewritten
>   * freezing is now blocked at VFS entry points
>   * two stage freezing to handle both mmapped writes and other IO
> 
> The biggest changes since v1:
>   * have two counters to provide safe state transitions for SB_FREEZE_WRITE
>     and SB_FREEZE_TRANS states
>   * use percpu counters instead of own percpu structure
>   * added documentation fixes from the old fs freezing series
>   * converted XFS to use SB_FREEZE_TRANS counter instead of its private
>     m_active_trans counter
> 
> 								Honza
> 
> CC: Alex Elder <elder-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: Anton Altaparmakov <anton-yrGDUoBaLx3QT0dZR+AlfA@public.gmane.org>
> CC: Ben Myers <bpm-sJ/iWh9BUns@public.gmane.org>
> CC: Chris Mason <chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> CC: cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> CC: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> CC: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> CC: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> CC: Joel Becker <jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org>
> CC: KONISHI Ryusuke <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> CC: linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> CC: linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> CC: Mark Fasheh <mfasheh-IBi9RG/b67k@public.gmane.org>
> CC: Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
> CC: ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org
> CC: OGAWA Hirofumi <hirofumi-UIVanBePwB70ZhReMnHkpc8NsWr+9BEh@public.gmane.org>
> CC: Steven Whitehouse <swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> CC: xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/27] nfsd: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 ` [PATCH 12/27] nfsd: " Jan Kara
@ 2012-04-16 18:25   ` J. Bruce Fields
  2012-04-17  8:17     ` Jan Kara
  0 siblings, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2012-04-16 18:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, dchinner, LKML, linux-fsdevel, linux-nfs

On Mon, Apr 16, 2012 at 06:13:50PM +0200, Jan Kara wrote:
> When mnt_want_write() starts to handle freezing it will get a full lock
> semantics requiring proper lock ordering. So push mnt_want_write() call
> consistently outside of i_mutex.

How are you testing this?  And do you want this particular track merged
for 3.5 through the nfsd tree, or should it go some other way?

--b.

> 
> CC: linux-nfs@vger.kernel.org
> CC: "J. Bruce Fields" <bfields@fieldses.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/nfsd/nfs4recover.c      |    9 +++--
>  fs/nfsd/nfsfh.c            |    1 +
>  fs/nfsd/nfsproc.c          |    9 ++++-
>  fs/nfsd/vfs.c              |   79 ++++++++++++++++++++++---------------------
>  fs/nfsd/vfs.h              |   11 +++++-
>  include/linux/nfsd/nfsfh.h |    1 +
>  6 files changed, 64 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 4767429..efa7574 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -154,6 +154,10 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
>  	if (status < 0)
>  		return;
>  
> +	status = mnt_want_write_file(rec_file);
> +	if (status)
> +		return;
> +
>  	dir = rec_file->f_path.dentry;
>  	/* lock the parent */
>  	mutex_lock(&dir->d_inode->i_mutex);
> @@ -173,11 +177,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
>  		 * as well be forgiving and just succeed silently.
>  		 */
>  		goto out_put;
> -	status = mnt_want_write_file(rec_file);
> -	if (status)
> -		goto out_put;
>  	status = vfs_mkdir(dir->d_inode, dentry, S_IRWXU);
> -	mnt_drop_write_file(rec_file);
>  out_put:
>  	dput(dentry);
>  out_unlock:
> @@ -189,6 +189,7 @@ out_unlock:
>  				" (err %d); please check that %s exists"
>  				" and is writeable", status,
>  				user_recovery_dirname);
> +	mnt_drop_write_file(rec_file);
>  	nfs4_reset_creds(original_cred);
>  }
>  
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 68454e7..8b93353 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -635,6 +635,7 @@ fh_put(struct svc_fh *fhp)
>  		fhp->fh_post_saved = 0;
>  #endif
>  	}
> +	fh_drop_write(fhp);
>  	if (exp) {
>  		cache_put(&exp->h, &svc_export_cache);
>  		fhp->fh_export = NULL;
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index e15dc45..aad6d45 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -196,6 +196,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
>  	struct dentry	*dchild;
>  	int		type, mode;
>  	__be32		nfserr;
> +	int		hosterr;
>  	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
>  
>  	dprintk("nfsd: CREATE   %s %.*s\n",
> @@ -214,6 +215,12 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
>  	nfserr = nfserr_exist;
>  	if (isdotent(argp->name, argp->len))
>  		goto done;
> +	hosterr = fh_want_write(dirfhp);
> +	if (hosterr) {
> +		nfserr = nfserrno(hosterr);
> +		goto done;
> +	}
> +
>  	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
>  	if (IS_ERR(dchild)) {
> @@ -330,7 +337,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
>  out_unlock:
>  	/* We don't really need to unlock, as fh_put does it. */
>  	fh_unlock(dirfhp);
> -
> +	fh_drop_write(dirfhp);
>  done:
>  	fh_put(dirfhp);
>  	return nfsd_return_dirop(nfserr, resp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 296d671..b8bb649 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1276,6 +1276,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	 * If it has, the parent directory should already be locked.
>  	 */
>  	if (!resfhp->fh_dentry) {
> +		host_err = fh_want_write(fhp);
> +		if (host_err)
> +			goto out_nfserr;
> +
>  		/* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
>  		fh_lock_nested(fhp, I_MUTEX_PARENT);
>  		dchild = lookup_one_len(fname, dentry, flen);
> @@ -1319,14 +1323,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		goto out;
>  	}
>  
> -	host_err = fh_want_write(fhp);
> -	if (host_err)
> -		goto out_nfserr;
> -
>  	/*
>  	 * Get the dir op function pointer.
>  	 */
>  	err = 0;
> +	host_err = 0;
>  	switch (type) {
>  	case S_IFREG:
>  		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
> @@ -1343,10 +1344,8 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
>  		break;
>  	}
> -	if (host_err < 0) {
> -		fh_drop_write(fhp);
> +	if (host_err < 0)
>  		goto out_nfserr;
> -	}
>  
>  	err = nfsd_create_setattr(rqstp, resfhp, iap);
>  
> @@ -1358,7 +1357,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	err2 = nfserrno(commit_metadata(fhp));
>  	if (err2)
>  		err = err2;
> -	fh_drop_write(fhp);
>  	/*
>  	 * Update the file handle to get the new inode info.
>  	 */
> @@ -1417,6 +1415,11 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	err = nfserr_notdir;
>  	if (!dirp->i_op->lookup)
>  		goto out;
> +
> +	host_err = fh_want_write(fhp);
> +	if (host_err)
> +		goto out_nfserr;
> +
>  	fh_lock_nested(fhp, I_MUTEX_PARENT);
>  
>  	/*
> @@ -1449,9 +1452,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		v_atime = verifier[1]&0x7fffffff;
>  	}
>  	
> -	host_err = fh_want_write(fhp);
> -	if (host_err)
> -		goto out_nfserr;
>  	if (dchild->d_inode) {
>  		err = 0;
>  
> @@ -1522,7 +1522,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (!err)
>  		err = nfserrno(commit_metadata(fhp));
>  
> -	fh_drop_write(fhp);
>  	/*
>  	 * Update the filehandle to get the new inode info.
>  	 */
> @@ -1533,6 +1532,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	fh_unlock(fhp);
>  	if (dchild && !IS_ERR(dchild))
>  		dput(dchild);
> +	fh_drop_write(fhp);
>   	return err;
>   
>   out_nfserr:
> @@ -1613,6 +1613,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>  	if (err)
>  		goto out;
> +
> +	host_err = fh_want_write(fhp);
> +	if (host_err)
> +		goto out_nfserr;
> +
>  	fh_lock(fhp);
>  	dentry = fhp->fh_dentry;
>  	dnew = lookup_one_len(fname, dentry, flen);
> @@ -1620,10 +1625,6 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (IS_ERR(dnew))
>  		goto out_nfserr;
>  
> -	host_err = fh_want_write(fhp);
> -	if (host_err)
> -		goto out_nfserr;
> -
>  	if (unlikely(path[plen] != 0)) {
>  		char *path_alloced = kmalloc(plen+1, GFP_KERNEL);
>  		if (path_alloced == NULL)
> @@ -1683,6 +1684,12 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  	if (isdotent(name, len))
>  		goto out;
>  
> +	host_err = fh_want_write(tfhp);
> +	if (host_err) {
> +		err = nfserrno(host_err);
> +		goto out;
> +	}
> +
>  	fh_lock_nested(ffhp, I_MUTEX_PARENT);
>  	ddir = ffhp->fh_dentry;
>  	dirp = ddir->d_inode;
> @@ -1694,18 +1701,13 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  
>  	dold = tfhp->fh_dentry;
>  
> -	host_err = fh_want_write(tfhp);
> -	if (host_err) {
> -		err = nfserrno(host_err);
> -		goto out_dput;
> -	}
>  	err = nfserr_noent;
>  	if (!dold->d_inode)
> -		goto out_drop_write;
> +		goto out_dput;
>  	host_err = nfsd_break_lease(dold->d_inode);
>  	if (host_err) {
>  		err = nfserrno(host_err);
> -		goto out_drop_write;
> +		goto out_dput;
>  	}
>  	host_err = vfs_link(dold, dirp, dnew);
>  	if (!host_err) {
> @@ -1718,12 +1720,11 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		else
>  			err = nfserrno(host_err);
>  	}
> -out_drop_write:
> -	fh_drop_write(tfhp);
>  out_dput:
>  	dput(dnew);
>  out_unlock:
>  	fh_unlock(ffhp);
> +	fh_drop_write(tfhp);
>  out:
>  	return err;
>  
> @@ -1766,6 +1767,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
>  		goto out;
>  
> +	host_err = fh_want_write(ffhp);
> +	if (host_err) {
> +		err = nfserrno(host_err);
> +		goto out;
> +	}
> +
>  	/* cannot use fh_lock as we need deadlock protective ordering
>  	 * so do it by hand */
>  	trap = lock_rename(tdentry, fdentry);
> @@ -1796,17 +1803,14 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  	host_err = -EXDEV;
>  	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
>  		goto out_dput_new;
> -	host_err = fh_want_write(ffhp);
> -	if (host_err)
> -		goto out_dput_new;
>  
>  	host_err = nfsd_break_lease(odentry->d_inode);
>  	if (host_err)
> -		goto out_drop_write;
> +		goto out_dput_new;
>  	if (ndentry->d_inode) {
>  		host_err = nfsd_break_lease(ndentry->d_inode);
>  		if (host_err)
> -			goto out_drop_write;
> +			goto out_dput_new;
>  	}
>  	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
>  	if (!host_err) {
> @@ -1814,8 +1818,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  		if (!host_err)
>  			host_err = commit_metadata(ffhp);
>  	}
> -out_drop_write:
> -	fh_drop_write(ffhp);
>   out_dput_new:
>  	dput(ndentry);
>   out_dput_old:
> @@ -1831,6 +1833,7 @@ out_drop_write:
>  	fill_post_wcc(tfhp);
>  	unlock_rename(tdentry, fdentry);
>  	ffhp->fh_locked = tfhp->fh_locked = 0;
> +	fh_drop_write(ffhp);
>  
>  out:
>  	return err;
> @@ -1856,6 +1859,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (err)
>  		goto out;
>  
> +	host_err = fh_want_write(fhp);
> +	if (host_err)
> +		goto out_nfserr;
> +
>  	fh_lock_nested(fhp, I_MUTEX_PARENT);
>  	dentry = fhp->fh_dentry;
>  	dirp = dentry->d_inode;
> @@ -1874,21 +1881,15 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (!type)
>  		type = rdentry->d_inode->i_mode & S_IFMT;
>  
> -	host_err = fh_want_write(fhp);
> -	if (host_err)
> -		goto out_put;
> -
>  	host_err = nfsd_break_lease(rdentry->d_inode);
>  	if (host_err)
> -		goto out_drop_write;
> +		goto out_put;
>  	if (type != S_IFDIR)
>  		host_err = vfs_unlink(dirp, rdentry);
>  	else
>  		host_err = vfs_rmdir(dirp, rdentry);
>  	if (!host_err)
>  		host_err = commit_metadata(fhp);
> -out_drop_write:
> -	fh_drop_write(fhp);
>  out_put:
>  	dput(rdentry);
>  
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index ec0611b..359594c 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -110,12 +110,19 @@ int nfsd_set_posix_acl(struct svc_fh *, int, struct posix_acl *);
>  
>  static inline int fh_want_write(struct svc_fh *fh)
>  {
> -	return mnt_want_write(fh->fh_export->ex_path.mnt);
> +	int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
> +
> +	if (!ret)
> +		fh->fh_want_write = 1;
> +	return ret;
>  }
>  
>  static inline void fh_drop_write(struct svc_fh *fh)
>  {
> -	mnt_drop_write(fh->fh_export->ex_path.mnt);
> +	if (fh->fh_want_write) {
> +		fh->fh_want_write = 0;
> +		mnt_drop_write(fh->fh_export->ex_path.mnt);
> +	}
>  }
>  
>  #endif /* LINUX_NFSD_VFS_H */
> diff --git a/include/linux/nfsd/nfsfh.h b/include/linux/nfsd/nfsfh.h
> index ce4743a..fa63048 100644
> --- a/include/linux/nfsd/nfsfh.h
> +++ b/include/linux/nfsd/nfsfh.h
> @@ -143,6 +143,7 @@ typedef struct svc_fh {
>  	int			fh_maxsize;	/* max size for fh_handle */
>  
>  	unsigned char		fh_locked;	/* inode locked by us */
> +	unsigned char		fh_want_write;	/* remount protection taken */
>  
>  #ifdef CONFIG_NFSD_V3
>  	unsigned char		fh_post_saved;	/* post-op attrs saved */
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
  2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
                   ` (26 preceding siblings ...)
  2012-04-16 16:14 ` [PATCH 27/27] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Jan Kara
@ 2012-04-16 22:02 ` Andreas Dilger
  2012-04-17  0:43   ` Dave Chinner
  2012-04-17  9:32   ` Jan Kara
  27 siblings, 2 replies; 42+ messages in thread
From: Andreas Dilger @ 2012-04-16 22:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
	David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
	KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
	linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs

On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> Another potential contention point might be patch 19. In that patch
> we make freeze_super() refuse to freeze the filesystem when there
> are open but unlinked files which may be impractical in some cases.
> The main reason for this is the problem with handling of file deletion
> from fput() called with mmap_sem held (e.g. from munmap(2)), and
> then there's the fact that we cannot really force such filesystem
> into a consistent state... But if people think that freezing with
> open but unlinked files should happen, then I have some possible
> solutions in mind (maybe as a separate patchset since this is
> large enough).

Looking at a desktop system, I think it is very typical that there
are open-unlinked files present, so I don't know if this is really
an acceptable solution.  It isn't clear from your comments whether
this is a blanket refusal for all open-unlinked files, or only in
some particular cases...

lsof | grep deleted
nautilus  25393  adilger   19r      REG           253,0      340     253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
nautilus  25393  adilger   20r      REG           253,0    32768     253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
gnome-ter 25623  adilger   22u      REG            0,18    17841    2717846 /tmp/vtePIRJCW (deleted)
gnome-ter 25623  adilger   23u      REG            0,18     5568    2717847 /tmp/vteDCSJCW (deleted)
gnome-ter 25623  adilger   29u      REG            0,18      480    2728484 /tmp/vte6C1TCW (deleted)

Cheers, Andreas

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
  2012-04-16 22:02 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Andreas Dilger
@ 2012-04-17  0:43   ` Dave Chinner
  2012-04-17  5:10     ` Andreas Dilger
  2012-04-17  9:32   ` Jan Kara
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2012-04-17  0:43 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
	David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
	KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
	linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs

On Mon, Apr 16, 2012 at 03:02:50PM -0700, Andreas Dilger wrote:
> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> > Another potential contention point might be patch 19. In that patch
> > we make freeze_super() refuse to freeze the filesystem when there
> > are open but unlinked files which may be impractical in some cases.
> > The main reason for this is the problem with handling of file deletion
> > from fput() called with mmap_sem held (e.g. from munmap(2)), and
> > then there's the fact that we cannot really force such filesystem
> > into a consistent state... But if people think that freezing with
> > open but unlinked files should happen, then I have some possible
> > solutions in mind (maybe as a separate patchset since this is
> > large enough).
> 
> Looking at a desktop system, I think it is very typical that there
> are open-unlinked files present, so I don't know if this is really
> an acceptable solution.  It isn't clear from your comments whether
> this is a blanket refusal for all open-unlinked files, or only in
> some particular cases...
> 
> lsof | grep deleted
> nautilus  25393  adilger   19r      REG           253,0      340     253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
> nautilus  25393  adilger   20r      REG           253,0    32768     253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
> gnome-ter 25623  adilger   22u      REG            0,18    17841    2717846 /tmp/vtePIRJCW (deleted)
> gnome-ter 25623  adilger   23u      REG            0,18     5568    2717847 /tmp/vteDCSJCW (deleted)
> gnome-ter 25623  adilger   29u      REG            0,18      480    2728484 /tmp/vte6C1TCW (deleted)

Unlinked-but-open files are the reason that XFS dirties the log
after the freeze process is complete. This ensures that if the
system crashes while the filesystem is frozen then log recovery
during the next mount will process the unlinked (orphaned) inodes
and free the correctly. i.e. you can still freeze a filesystem with
inodes in this state successfully and have everythign behave as
you'd expect.

I'm not sure how other filesystems handle this problem, but perhaps
pushing this check down into filesystem specific code or adding a
superblock feature flag might be a way to allow filesystems to
handle this case in the way they think is best...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex
  2012-04-16 16:13 ` [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex Jan Kara
@ 2012-04-17  2:18   ` Joel Becker
  2012-04-17  7:43     ` Jan Kara
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Becker @ 2012-04-17  2:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, dchinner, LKML, linux-fsdevel, ocfs2-devel, Mark Fasheh,
	David S. Miller

On Mon, Apr 16, 2012 at 06:13:47PM +0200, Jan Kara wrote:
> Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
> without it. This isn't really a problem because mnt_want_write() is a
> non-blocking operation (essentially has a trylock semantics) but when the
> function starts to handle also frozen filesystems, it will get a full lock
> semantics and thus proper lock ordering has to be established. So move
> all mnt_want_write() calls outside of i_mutex.
> 
> One non-trivial case needing conversion is kern_path_create() /
> user_path_create() which didn't include mnt_want_write() but now needs to
> because it acquires i_mutex.  Because there are virtual file systems which
> don't bother with freeze / remount-ro protection we actually provide both
> versions of the function - one which calls mnt_want_write() and one which does
> not.
> 
> CC: ocfs2-devel@oss.oracle.com
> CC: Mark Fasheh <mfasheh@suse.com>
> CC: Joel Becker <jlbec@evilplan.org>
> CC: "David S. Miller" <davem@davemloft.net>
> BugLink: https://bugs.launchpad.net/bugs/897421
> Tested-by: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> Tested-by: Dann Frazier <dann.frazier@canonical.com>
> Tested-by: Massimo Morana <massimo.morana@canonical.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Joel Becker <jlbec@evilplan.org>

> ---
>  fs/namei.c              |  115 +++++++++++++++++++++++++++--------------------
>  fs/ocfs2/refcounttree.c |   10 +---
>  include/linux/namei.h   |    2 +
>  net/unix/af_unix.c      |   13 ++----
>  4 files changed, 74 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 0062dd1..5417fa1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2460,7 +2460,9 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
>  	return file;
>  }
>  
> -struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
> +static struct dentry *do_kern_path_create(int dfd, const char *pathname,
> +					  struct path *path, int is_dir,
> +					  int freeze_protect)
>  {
>  	struct dentry *dentry = ERR_PTR(-EEXIST);
>  	struct nameidata nd;
> @@ -2478,6 +2480,14 @@ struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path
>  	nd.flags |= LOOKUP_CREATE | LOOKUP_EXCL;
>  	nd.intent.open.flags = O_EXCL;
>  
> +	if (freeze_protect) {
> +		error = mnt_want_write(nd.path.mnt);
> +		if (error) {
> +			dentry = ERR_PTR(error);
> +			goto out;
> +		}
> +	}
> +
>  	/*
>  	 * Do the final lookup.
>  	 */
> @@ -2506,24 +2516,49 @@ eexist:
>  	dentry = ERR_PTR(-EEXIST);
>  fail:
>  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> +	if (freeze_protect)
> +		mnt_drop_write(nd.path.mnt);
>  out:
>  	path_put(&nd.path);
>  	return dentry;
>  }
> +
> +struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
> +{
> +	return do_kern_path_create(dfd, pathname, path, is_dir, 0);
> +}
>  EXPORT_SYMBOL(kern_path_create);
>  
> +struct dentry *kern_path_create_thawed(int dfd, const char *pathname, struct path *path, int is_dir)
> +{
> +	return do_kern_path_create(dfd, pathname, path, is_dir, 1);
> +}
> +EXPORT_SYMBOL(kern_path_create_thawed);
> +
>  struct dentry *user_path_create(int dfd, const char __user *pathname, struct path *path, int is_dir)
>  {
>  	char *tmp = getname(pathname);
>  	struct dentry *res;
>  	if (IS_ERR(tmp))
>  		return ERR_CAST(tmp);
> -	res = kern_path_create(dfd, tmp, path, is_dir);
> +	res = do_kern_path_create(dfd, tmp, path, is_dir, 0);
>  	putname(tmp);
>  	return res;
>  }
>  EXPORT_SYMBOL(user_path_create);
>  
> +struct dentry *user_path_create_thawed(int dfd, const char __user *pathname, struct path *path, int is_dir)
> +{
> +	char *tmp = getname(pathname);
> +	struct dentry *res;
> +	if (IS_ERR(tmp))
> +		return ERR_CAST(tmp);
> +	res = do_kern_path_create(dfd, tmp, path, is_dir, 1);
> +	putname(tmp);
> +	return res;
> +}
> +EXPORT_SYMBOL(user_path_create_thawed);
> +
>  int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
>  {
>  	int error = may_create(dir, dentry);
> @@ -2579,7 +2614,7 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
>  	if (S_ISDIR(mode))
>  		return -EPERM;
>  
> -	dentry = user_path_create(dfd, filename, &path, 0);
> +	dentry = user_path_create_thawed(dfd, filename, &path, 0);
>  	if (IS_ERR(dentry))
>  		return PTR_ERR(dentry);
>  
> @@ -2588,12 +2623,9 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
>  	error = may_mknod(mode);
>  	if (error)
>  		goto out_dput;
> -	error = mnt_want_write(path.mnt);
> -	if (error)
> -		goto out_dput;
>  	error = security_path_mknod(&path, dentry, mode, dev);
>  	if (error)
> -		goto out_drop_write;
> +		goto out_dput;
>  	switch (mode & S_IFMT) {
>  		case 0: case S_IFREG:
>  			error = vfs_create(path.dentry->d_inode,dentry,mode,NULL);
> @@ -2606,11 +2638,10 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
>  			error = vfs_mknod(path.dentry->d_inode,dentry,mode,0);
>  			break;
>  	}
> -out_drop_write:
> -	mnt_drop_write(path.mnt);
>  out_dput:
>  	dput(dentry);
>  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(path.mnt);
>  	path_put(&path);
>  
>  	return error;
> @@ -2652,24 +2683,20 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
>  	struct path path;
>  	int error;
>  
> -	dentry = user_path_create(dfd, pathname, &path, 1);
> +	dentry = user_path_create_thawed(dfd, pathname, &path, 1);
>  	if (IS_ERR(dentry))
>  		return PTR_ERR(dentry);
>  
>  	if (!IS_POSIXACL(path.dentry->d_inode))
>  		mode &= ~current_umask();
> -	error = mnt_want_write(path.mnt);
> -	if (error)
> -		goto out_dput;
>  	error = security_path_mkdir(&path, dentry, mode);
>  	if (error)
> -		goto out_drop_write;
> +		goto out_dput;
>  	error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
> -out_drop_write:
> -	mnt_drop_write(path.mnt);
>  out_dput:
>  	dput(dentry);
>  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(path.mnt);
>  	path_put(&path);
>  	return error;
>  }
> @@ -2764,6 +2791,9 @@ static long do_rmdir(int dfd, const char __user *pathname)
>  	}
>  
>  	nd.flags &= ~LOOKUP_PARENT;
> +	error = mnt_want_write(nd.path.mnt);
> +	if (error)
> +		goto exit1;
>  
>  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
>  	dentry = lookup_hash(&nd);
> @@ -2774,19 +2804,15 @@ static long do_rmdir(int dfd, const char __user *pathname)
>  		error = -ENOENT;
>  		goto exit3;
>  	}
> -	error = mnt_want_write(nd.path.mnt);
> -	if (error)
> -		goto exit3;
>  	error = security_path_rmdir(&nd.path, dentry);
>  	if (error)
> -		goto exit4;
> +		goto exit3;
>  	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
> -exit4:
> -	mnt_drop_write(nd.path.mnt);
>  exit3:
>  	dput(dentry);
>  exit2:
>  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(nd.path.mnt);
>  exit1:
>  	path_put(&nd.path);
>  	putname(name);
> @@ -2853,6 +2879,9 @@ static long do_unlinkat(int dfd, const char __user *pathname)
>  		goto exit1;
>  
>  	nd.flags &= ~LOOKUP_PARENT;
> +	error = mnt_want_write(nd.path.mnt);
> +	if (error)
> +		goto exit1;
>  
>  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
>  	dentry = lookup_hash(&nd);
> @@ -2865,21 +2894,17 @@ static long do_unlinkat(int dfd, const char __user *pathname)
>  		if (!inode)
>  			goto slashes;
>  		ihold(inode);
> -		error = mnt_want_write(nd.path.mnt);
> -		if (error)
> -			goto exit2;
>  		error = security_path_unlink(&nd.path, dentry);
>  		if (error)
> -			goto exit3;
> +			goto exit2;
>  		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> -exit3:
> -		mnt_drop_write(nd.path.mnt);
> -	exit2:
> +exit2:
>  		dput(dentry);
>  	}
>  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
>  	if (inode)
>  		iput(inode);	/* truncate the inode here */
> +	mnt_drop_write(nd.path.mnt);
>  exit1:
>  	path_put(&nd.path);
>  	putname(name);
> @@ -2939,23 +2964,19 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
>  	if (IS_ERR(from))
>  		return PTR_ERR(from);
>  
> -	dentry = user_path_create(newdfd, newname, &path, 0);
> +	dentry = user_path_create_thawed(newdfd, newname, &path, 0);
>  	error = PTR_ERR(dentry);
>  	if (IS_ERR(dentry))
>  		goto out_putname;
>  
> -	error = mnt_want_write(path.mnt);
> -	if (error)
> -		goto out_dput;
>  	error = security_path_symlink(&path, dentry, from);
>  	if (error)
> -		goto out_drop_write;
> +		goto out_dput;
>  	error = vfs_symlink(path.dentry->d_inode, dentry, from);
> -out_drop_write:
> -	mnt_drop_write(path.mnt);
>  out_dput:
>  	dput(dentry);
>  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(path.mnt);
>  	path_put(&path);
>  out_putname:
>  	putname(from);
> @@ -3048,7 +3069,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
>  	if (error)
>  		return error;
>  
> -	new_dentry = user_path_create(newdfd, newname, &new_path, 0);
> +	new_dentry = user_path_create_thawed(newdfd, newname, &new_path, 0);
>  	error = PTR_ERR(new_dentry);
>  	if (IS_ERR(new_dentry))
>  		goto out;
> @@ -3056,18 +3077,14 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
>  	error = -EXDEV;
>  	if (old_path.mnt != new_path.mnt)
>  		goto out_dput;
> -	error = mnt_want_write(new_path.mnt);
> -	if (error)
> -		goto out_dput;
>  	error = security_path_link(old_path.dentry, &new_path, new_dentry);
>  	if (error)
> -		goto out_drop_write;
> +		goto out_dput;
>  	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
> -out_drop_write:
> -	mnt_drop_write(new_path.mnt);
>  out_dput:
>  	dput(new_dentry);
>  	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(new_path.mnt);
>  	path_put(&new_path);
>  out:
>  	path_put(&old_path);
> @@ -3264,6 +3281,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>  	if (newnd.last_type != LAST_NORM)
>  		goto exit2;
>  
> +	error = mnt_want_write(oldnd.path.mnt);
> +	if (error)
> +		goto exit2;
> +
>  	oldnd.flags &= ~LOOKUP_PARENT;
>  	newnd.flags &= ~LOOKUP_PARENT;
>  	newnd.flags |= LOOKUP_RENAME_TARGET;
> @@ -3299,23 +3320,19 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>  	if (new_dentry == trap)
>  		goto exit5;
>  
> -	error = mnt_want_write(oldnd.path.mnt);
> -	if (error)
> -		goto exit5;
>  	error = security_path_rename(&oldnd.path, old_dentry,
>  				     &newnd.path, new_dentry);
>  	if (error)
> -		goto exit6;
> +		goto exit5;
>  	error = vfs_rename(old_dir->d_inode, old_dentry,
>  				   new_dir->d_inode, new_dentry);
> -exit6:
> -	mnt_drop_write(oldnd.path.mnt);
>  exit5:
>  	dput(new_dentry);
>  exit4:
>  	dput(old_dentry);
>  exit3:
>  	unlock_rename(new_dir, old_dir);
> +	mnt_drop_write(oldnd.path.mnt);
>  exit2:
>  	path_put(&newnd.path);
>  	putname(to);
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index cf78233..a99b8e2 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -4453,7 +4453,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
>  		return error;
>  	}
>  
> -	new_dentry = user_path_create(AT_FDCWD, newname, &new_path, 0);
> +	new_dentry = user_path_create_thawed(AT_FDCWD, newname, &new_path, 0);
>  	error = PTR_ERR(new_dentry);
>  	if (IS_ERR(new_dentry)) {
>  		mlog_errno(error);
> @@ -4466,19 +4466,13 @@ int ocfs2_reflink_ioctl(struct inode *inode,
>  		goto out_dput;
>  	}
>  
> -	error = mnt_want_write(new_path.mnt);
> -	if (error) {
> -		mlog_errno(error);
> -		goto out_dput;
> -	}
> -
>  	error = ocfs2_vfs_reflink(old_path.dentry,
>  				  new_path.dentry->d_inode,
>  				  new_dentry, preserve);
> -	mnt_drop_write(new_path.mnt);
>  out_dput:
>  	dput(new_dentry);
>  	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(new_path.mnt);
>  	path_put(&new_path);
>  out:
>  	path_put(&old_path);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index ffc0213..432f6bb 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -77,7 +77,9 @@ extern int user_path_at_empty(int, const char __user *, unsigned, struct path *,
>  extern int kern_path(const char *, unsigned, struct path *);
>  
>  extern struct dentry *kern_path_create(int, const char *, struct path *, int);
> +extern struct dentry *kern_path_create_thawed(int, const char *, struct path *, int);
>  extern struct dentry *user_path_create(int, const char __user *, struct path *, int);
> +extern struct dentry *user_path_create_thawed(int, const char __user *, struct path *, int);
>  extern int kern_path_parent(const char *, struct nameidata *);
>  extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>  			   const char *, unsigned int, struct path *);
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index d510353..c532632 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -865,7 +865,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  		 * Get the parent directory, calculate the hash for last
>  		 * component.
>  		 */
> -		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
> +		dentry = kern_path_create_thawed(AT_FDCWD, sun_path, &path, 0);
>  		err = PTR_ERR(dentry);
>  		if (IS_ERR(dentry))
>  			goto out_mknod_parent;
> @@ -875,19 +875,13 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  		 */
>  		mode = S_IFSOCK |
>  		       (SOCK_INODE(sock)->i_mode & ~current_umask());
> -		err = mnt_want_write(path.mnt);
> -		if (err)
> -			goto out_mknod_dput;
>  		err = security_path_mknod(&path, dentry, mode, 0);
>  		if (err)
> -			goto out_mknod_drop_write;
> -		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
> -out_mknod_drop_write:
> -		mnt_drop_write(path.mnt);
> -		if (err)
>  			goto out_mknod_dput;
> +		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
>  		mutex_unlock(&path.dentry->d_inode->i_mutex);
>  		dput(path.dentry);
> +		mnt_drop_write(path.mnt);
>  		path.dentry = dentry;
>  
>  		addr->hash = UNIX_HASH_SIZE;
> @@ -924,6 +918,7 @@ out:
>  out_mknod_dput:
>  	dput(dentry);
>  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> +	mnt_drop_write(path.mnt);
>  	path_put(&path);
>  out_mknod_parent:
>  	if (err == -EEXIST)
> -- 
> 1.7.1
> 

-- 

"Hell is oneself, hell is alone, the other figures in it, merely projections."
        - T. S. Eliot

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 19/27] ocfs2: Convert to new freezing mechanism
  2012-04-16 16:13 ` [PATCH 19/27] ocfs2: Convert to new freezing mechanism Jan Kara
@ 2012-04-17  2:19   ` Joel Becker
  2012-04-17  7:44     ` Jan Kara
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Becker @ 2012-04-17  2:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, dchinner, LKML, linux-fsdevel, Mark Fasheh, ocfs2-devel

On Mon, Apr 16, 2012 at 06:13:57PM +0200, Jan Kara wrote:
> Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new
> freeze protection. We also protect several ioctl entry points which
> were missing the protection.
> 
> CC: Mark Fasheh <mfasheh@suse.com>
> CC: Joel Becker <jlbec@evilplan.org>
> CC: ocfs2-devel@oss.oracle.com
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Joel Becker <jlbec@evilplan.org>

> ---
>  fs/ocfs2/file.c  |   11 +++++++++--
>  fs/ocfs2/ioctl.c |   14 ++++++++++++--
>  fs/ocfs2/mmap.c  |    2 ++
>  3 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 061591a..9b1e3d4 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1971,6 +1971,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
>  {
>  	struct inode *inode = file->f_path.dentry->d_inode;
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	int ret;
>  
>  	if ((cmd == OCFS2_IOC_RESVSP || cmd == OCFS2_IOC_RESVSP64) &&
>  	    !ocfs2_writes_unwritten_extents(osb))
> @@ -1985,7 +1986,12 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
>  	if (!(file->f_mode & FMODE_WRITE))
>  		return -EBADF;
>  
> -	return __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
> +	ret = mnt_want_write_file(file);
> +	if (ret)
> +		return ret;
> +	ret = __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
> +	mnt_drop_write_file(file);
> +	return ret;
>  }
>  
>  static long ocfs2_fallocate(struct file *file, int mode, loff_t offset,
> @@ -2261,7 +2267,7 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
>  	if (iocb->ki_left == 0)
>  		return 0;
>  
> -	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> +	sb_start_write(inode->i_sb);
>  
>  	appending = file->f_flags & O_APPEND ? 1 : 0;
>  	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
> @@ -2434,6 +2440,7 @@ out_sems:
>  		ocfs2_iocb_clear_sem_locked(iocb);
>  
>  	mutex_unlock(&inode->i_mutex);
> +	sb_end_write(inode->i_sb);
>  
>  	if (written)
>  		ret = written;
> diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
> index a1a1bfd..d9c352b 100644
> --- a/fs/ocfs2/ioctl.c
> +++ b/fs/ocfs2/ioctl.c
> @@ -926,7 +926,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (get_user(new_clusters, (int __user *)arg))
>  			return -EFAULT;
>  
> -		return ocfs2_group_extend(inode, new_clusters);
> +		status = mnt_want_write_file(filp);
> +		if (status)
> +			return status;
> +		status = ocfs2_group_extend(inode, new_clusters);
> +		mnt_drop_write_file(filp);
> +		return status;
>  	case OCFS2_IOC_GROUP_ADD:
>  	case OCFS2_IOC_GROUP_ADD64:
>  		if (!capable(CAP_SYS_RESOURCE))
> @@ -935,7 +940,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (copy_from_user(&input, (int __user *) arg, sizeof(input)))
>  			return -EFAULT;
>  
> -		return ocfs2_group_add(inode, &input);
> +		status = mnt_want_write_file(filp);
> +		if (status)
> +			return status;
> +		status = ocfs2_group_add(inode, &input);
> +		mnt_drop_write_file(filp);
> +		return status;
>  	case OCFS2_IOC_REFLINK:
>  		if (copy_from_user(&args, (struct reflink_arguments *)arg,
>  				   sizeof(args)))
> diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
> index 9cd4108..d150372 100644
> --- a/fs/ocfs2/mmap.c
> +++ b/fs/ocfs2/mmap.c
> @@ -136,6 +136,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	sigset_t oldset;
>  	int ret;
>  
> +	sb_start_pagefault(inode->i_sb);
>  	ocfs2_block_signals(&oldset);
>  
>  	/*
> @@ -165,6 +166,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  out:
>  	ocfs2_unblock_signals(&oldset);
> +	sb_end_pagefault(inode->i_sb);
>  	return ret;
>  }
>  
> -- 
> 1.7.1
> 

-- 

Life's Little Instruction Book #197

	"Don't forget, a person's greatest emotional need is to 
	 feel appreciated."

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
  2012-04-17  0:43   ` Dave Chinner
@ 2012-04-17  5:10     ` Andreas Dilger
  2012-04-18  0:46       ` Chris Samuel
  0 siblings, 1 reply; 42+ messages in thread
From: Andreas Dilger @ 2012-04-17  5:10 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, J. Bruce Fields, ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
	KONISHI Ryusuke, OGAWA Hirofumi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Miklos Szeredi,
	cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, Anton Altaparmakov,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
	xfs-VZNHf3L845pBDgjK7y7TUQ, Ben Myers, Joel Becker,
	dchinner-H+wXaHxf7aLQT0dZR+AlfA, Steven Whitehouse, Chris Mason,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, Alex Elder, Theodore Ts'o,
	linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, LKML, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	linux-btrfs-u79uwXL29Tb/PtFMR13I2A

On 2012-04-16, at 5:43 PM, Dave Chinner wrote:
> On Mon, Apr 16, 2012 at 03:02:50PM -0700, Andreas Dilger wrote:
>> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
>>> Another potential contention point might be patch 19. In that patch
>>> we make freeze_super() refuse to freeze the filesystem when there
>>> are open but unlinked files which may be impractical in some cases.
>>> The main reason for this is the problem with handling of file deletion from fput() called with mmap_sem held (e.g. from munmap(2)),
>>> and then there's the fact that we cannot really force such filesystem
>>> into a consistent state... But if people think that freezing with
>>> open but unlinked files should happen, then I have some possible
>>> solutions in mind (maybe as a separate patchset since this is
>>> large enough).
>> 
>> Looking at a desktop system, I think it is very typical that there
>> are open-unlinked files present, so I don't know if this is really
>> an acceptable solution.  It isn't clear from your comments whether
>> this is a blanket refusal for all open-unlinked files, or only in
>> some particular cases...
> 
> Unlinked-but-open files are the reason that XFS dirties the log
> after the freeze process is complete. This ensures that if the
> system crashes while the filesystem is frozen then log recovery
> during the next mount will process the unlinked (orphaned) inodes
> and free the correctly. i.e. you can still freeze a filesystem with
> inodes in this state successfully and have everythign behave as
> you'd expect.
> 
> I'm not sure how other filesystems handle this problem, but perhaps
> pushing this check down into filesystem specific code or adding a
> superblock feature flag might be a way to allow filesystems to
> handle this case in the way they think is best...

The ext3/4 code has long been able to handle open-unlinked files
properly after a crash (they are put into a singly-linked list from
the superblock on disk that is processed after journal recovery).

The issue here is that blocking freeze from succeeding with open-
unlinked files is an unreasonable assumption of this patch, and
I don't think it is acceptable to land this patchset (which IMHO
would prevent nearly every Gnome system from freezing unless these
apps have changed their behaviour in more recent releases).

Like you suggest, filesystems that handle this correctly should be
able to flag or otherwise indicate that this is OK, and allow the
freeze to continue.  For other filesystems that do not handle
open-unlinked file consistency during a filesystem freeze/snapshot
whether this should even be considered a new case, or is something
that has existed for ages already.

The other question is whether this is still a problem even for
filesystems handling the consistency issue, but from Jan's comment
above there is a new locking issue related to mmap_sem being added?

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/





------------------------------------------------------------------------------
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex
  2012-04-17  2:18   ` Joel Becker
@ 2012-04-17  7:43     ` Jan Kara
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-17  7:43 UTC (permalink / raw)
  To: Joel Becker
  Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, ocfs2-devel,
	Mark Fasheh, David S. Miller

On Mon 16-04-12 19:18:53, Joel Becker wrote:
> On Mon, Apr 16, 2012 at 06:13:47PM +0200, Jan Kara wrote:
> > Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes
> > without it. This isn't really a problem because mnt_want_write() is a
> > non-blocking operation (essentially has a trylock semantics) but when the
> > function starts to handle also frozen filesystems, it will get a full lock
> > semantics and thus proper lock ordering has to be established. So move
> > all mnt_want_write() calls outside of i_mutex.
> > 
> > One non-trivial case needing conversion is kern_path_create() /
> > user_path_create() which didn't include mnt_want_write() but now needs to
> > because it acquires i_mutex.  Because there are virtual file systems which
> > don't bother with freeze / remount-ro protection we actually provide both
> > versions of the function - one which calls mnt_want_write() and one which does
> > not.
> > 
> > CC: ocfs2-devel@oss.oracle.com
> > CC: Mark Fasheh <mfasheh@suse.com>
> > CC: Joel Becker <jlbec@evilplan.org>
> > CC: "David S. Miller" <davem@davemloft.net>
> > BugLink: https://bugs.launchpad.net/bugs/897421
> > Tested-by: Kamal Mostafa <kamal@canonical.com>
> > Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> > Tested-by: Dann Frazier <dann.frazier@canonical.com>
> > Tested-by: Massimo Morana <massimo.morana@canonical.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Acked-by: Joel Becker <jlbec@evilplan.org>
  Thanks. Added.

								Honza
> 
> > ---
> >  fs/namei.c              |  115 +++++++++++++++++++++++++++--------------------
> >  fs/ocfs2/refcounttree.c |   10 +---
> >  include/linux/namei.h   |    2 +
> >  net/unix/af_unix.c      |   13 ++----
> >  4 files changed, 74 insertions(+), 66 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 0062dd1..5417fa1 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2460,7 +2460,9 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
> >  	return file;
> >  }
> >  
> > -struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
> > +static struct dentry *do_kern_path_create(int dfd, const char *pathname,
> > +					  struct path *path, int is_dir,
> > +					  int freeze_protect)
> >  {
> >  	struct dentry *dentry = ERR_PTR(-EEXIST);
> >  	struct nameidata nd;
> > @@ -2478,6 +2480,14 @@ struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path
> >  	nd.flags |= LOOKUP_CREATE | LOOKUP_EXCL;
> >  	nd.intent.open.flags = O_EXCL;
> >  
> > +	if (freeze_protect) {
> > +		error = mnt_want_write(nd.path.mnt);
> > +		if (error) {
> > +			dentry = ERR_PTR(error);
> > +			goto out;
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * Do the final lookup.
> >  	 */
> > @@ -2506,24 +2516,49 @@ eexist:
> >  	dentry = ERR_PTR(-EEXIST);
> >  fail:
> >  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> > +	if (freeze_protect)
> > +		mnt_drop_write(nd.path.mnt);
> >  out:
> >  	path_put(&nd.path);
> >  	return dentry;
> >  }
> > +
> > +struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
> > +{
> > +	return do_kern_path_create(dfd, pathname, path, is_dir, 0);
> > +}
> >  EXPORT_SYMBOL(kern_path_create);
> >  
> > +struct dentry *kern_path_create_thawed(int dfd, const char *pathname, struct path *path, int is_dir)
> > +{
> > +	return do_kern_path_create(dfd, pathname, path, is_dir, 1);
> > +}
> > +EXPORT_SYMBOL(kern_path_create_thawed);
> > +
> >  struct dentry *user_path_create(int dfd, const char __user *pathname, struct path *path, int is_dir)
> >  {
> >  	char *tmp = getname(pathname);
> >  	struct dentry *res;
> >  	if (IS_ERR(tmp))
> >  		return ERR_CAST(tmp);
> > -	res = kern_path_create(dfd, tmp, path, is_dir);
> > +	res = do_kern_path_create(dfd, tmp, path, is_dir, 0);
> >  	putname(tmp);
> >  	return res;
> >  }
> >  EXPORT_SYMBOL(user_path_create);
> >  
> > +struct dentry *user_path_create_thawed(int dfd, const char __user *pathname, struct path *path, int is_dir)
> > +{
> > +	char *tmp = getname(pathname);
> > +	struct dentry *res;
> > +	if (IS_ERR(tmp))
> > +		return ERR_CAST(tmp);
> > +	res = do_kern_path_create(dfd, tmp, path, is_dir, 1);
> > +	putname(tmp);
> > +	return res;
> > +}
> > +EXPORT_SYMBOL(user_path_create_thawed);
> > +
> >  int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
> >  {
> >  	int error = may_create(dir, dentry);
> > @@ -2579,7 +2614,7 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
> >  	if (S_ISDIR(mode))
> >  		return -EPERM;
> >  
> > -	dentry = user_path_create(dfd, filename, &path, 0);
> > +	dentry = user_path_create_thawed(dfd, filename, &path, 0);
> >  	if (IS_ERR(dentry))
> >  		return PTR_ERR(dentry);
> >  
> > @@ -2588,12 +2623,9 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
> >  	error = may_mknod(mode);
> >  	if (error)
> >  		goto out_dput;
> > -	error = mnt_want_write(path.mnt);
> > -	if (error)
> > -		goto out_dput;
> >  	error = security_path_mknod(&path, dentry, mode, dev);
> >  	if (error)
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	switch (mode & S_IFMT) {
> >  		case 0: case S_IFREG:
> >  			error = vfs_create(path.dentry->d_inode,dentry,mode,NULL);
> > @@ -2606,11 +2638,10 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
> >  			error = vfs_mknod(path.dentry->d_inode,dentry,mode,0);
> >  			break;
> >  	}
> > -out_drop_write:
> > -	mnt_drop_write(path.mnt);
> >  out_dput:
> >  	dput(dentry);
> >  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(path.mnt);
> >  	path_put(&path);
> >  
> >  	return error;
> > @@ -2652,24 +2683,20 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
> >  	struct path path;
> >  	int error;
> >  
> > -	dentry = user_path_create(dfd, pathname, &path, 1);
> > +	dentry = user_path_create_thawed(dfd, pathname, &path, 1);
> >  	if (IS_ERR(dentry))
> >  		return PTR_ERR(dentry);
> >  
> >  	if (!IS_POSIXACL(path.dentry->d_inode))
> >  		mode &= ~current_umask();
> > -	error = mnt_want_write(path.mnt);
> > -	if (error)
> > -		goto out_dput;
> >  	error = security_path_mkdir(&path, dentry, mode);
> >  	if (error)
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
> > -out_drop_write:
> > -	mnt_drop_write(path.mnt);
> >  out_dput:
> >  	dput(dentry);
> >  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(path.mnt);
> >  	path_put(&path);
> >  	return error;
> >  }
> > @@ -2764,6 +2791,9 @@ static long do_rmdir(int dfd, const char __user *pathname)
> >  	}
> >  
> >  	nd.flags &= ~LOOKUP_PARENT;
> > +	error = mnt_want_write(nd.path.mnt);
> > +	if (error)
> > +		goto exit1;
> >  
> >  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> >  	dentry = lookup_hash(&nd);
> > @@ -2774,19 +2804,15 @@ static long do_rmdir(int dfd, const char __user *pathname)
> >  		error = -ENOENT;
> >  		goto exit3;
> >  	}
> > -	error = mnt_want_write(nd.path.mnt);
> > -	if (error)
> > -		goto exit3;
> >  	error = security_path_rmdir(&nd.path, dentry);
> >  	if (error)
> > -		goto exit4;
> > +		goto exit3;
> >  	error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
> > -exit4:
> > -	mnt_drop_write(nd.path.mnt);
> >  exit3:
> >  	dput(dentry);
> >  exit2:
> >  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(nd.path.mnt);
> >  exit1:
> >  	path_put(&nd.path);
> >  	putname(name);
> > @@ -2853,6 +2879,9 @@ static long do_unlinkat(int dfd, const char __user *pathname)
> >  		goto exit1;
> >  
> >  	nd.flags &= ~LOOKUP_PARENT;
> > +	error = mnt_want_write(nd.path.mnt);
> > +	if (error)
> > +		goto exit1;
> >  
> >  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> >  	dentry = lookup_hash(&nd);
> > @@ -2865,21 +2894,17 @@ static long do_unlinkat(int dfd, const char __user *pathname)
> >  		if (!inode)
> >  			goto slashes;
> >  		ihold(inode);
> > -		error = mnt_want_write(nd.path.mnt);
> > -		if (error)
> > -			goto exit2;
> >  		error = security_path_unlink(&nd.path, dentry);
> >  		if (error)
> > -			goto exit3;
> > +			goto exit2;
> >  		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> > -exit3:
> > -		mnt_drop_write(nd.path.mnt);
> > -	exit2:
> > +exit2:
> >  		dput(dentry);
> >  	}
> >  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> >  	if (inode)
> >  		iput(inode);	/* truncate the inode here */
> > +	mnt_drop_write(nd.path.mnt);
> >  exit1:
> >  	path_put(&nd.path);
> >  	putname(name);
> > @@ -2939,23 +2964,19 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
> >  	if (IS_ERR(from))
> >  		return PTR_ERR(from);
> >  
> > -	dentry = user_path_create(newdfd, newname, &path, 0);
> > +	dentry = user_path_create_thawed(newdfd, newname, &path, 0);
> >  	error = PTR_ERR(dentry);
> >  	if (IS_ERR(dentry))
> >  		goto out_putname;
> >  
> > -	error = mnt_want_write(path.mnt);
> > -	if (error)
> > -		goto out_dput;
> >  	error = security_path_symlink(&path, dentry, from);
> >  	if (error)
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	error = vfs_symlink(path.dentry->d_inode, dentry, from);
> > -out_drop_write:
> > -	mnt_drop_write(path.mnt);
> >  out_dput:
> >  	dput(dentry);
> >  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(path.mnt);
> >  	path_put(&path);
> >  out_putname:
> >  	putname(from);
> > @@ -3048,7 +3069,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
> >  	if (error)
> >  		return error;
> >  
> > -	new_dentry = user_path_create(newdfd, newname, &new_path, 0);
> > +	new_dentry = user_path_create_thawed(newdfd, newname, &new_path, 0);
> >  	error = PTR_ERR(new_dentry);
> >  	if (IS_ERR(new_dentry))
> >  		goto out;
> > @@ -3056,18 +3077,14 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
> >  	error = -EXDEV;
> >  	if (old_path.mnt != new_path.mnt)
> >  		goto out_dput;
> > -	error = mnt_want_write(new_path.mnt);
> > -	if (error)
> > -		goto out_dput;
> >  	error = security_path_link(old_path.dentry, &new_path, new_dentry);
> >  	if (error)
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
> > -out_drop_write:
> > -	mnt_drop_write(new_path.mnt);
> >  out_dput:
> >  	dput(new_dentry);
> >  	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(new_path.mnt);
> >  	path_put(&new_path);
> >  out:
> >  	path_put(&old_path);
> > @@ -3264,6 +3281,10 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> >  	if (newnd.last_type != LAST_NORM)
> >  		goto exit2;
> >  
> > +	error = mnt_want_write(oldnd.path.mnt);
> > +	if (error)
> > +		goto exit2;
> > +
> >  	oldnd.flags &= ~LOOKUP_PARENT;
> >  	newnd.flags &= ~LOOKUP_PARENT;
> >  	newnd.flags |= LOOKUP_RENAME_TARGET;
> > @@ -3299,23 +3320,19 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> >  	if (new_dentry == trap)
> >  		goto exit5;
> >  
> > -	error = mnt_want_write(oldnd.path.mnt);
> > -	if (error)
> > -		goto exit5;
> >  	error = security_path_rename(&oldnd.path, old_dentry,
> >  				     &newnd.path, new_dentry);
> >  	if (error)
> > -		goto exit6;
> > +		goto exit5;
> >  	error = vfs_rename(old_dir->d_inode, old_dentry,
> >  				   new_dir->d_inode, new_dentry);
> > -exit6:
> > -	mnt_drop_write(oldnd.path.mnt);
> >  exit5:
> >  	dput(new_dentry);
> >  exit4:
> >  	dput(old_dentry);
> >  exit3:
> >  	unlock_rename(new_dir, old_dir);
> > +	mnt_drop_write(oldnd.path.mnt);
> >  exit2:
> >  	path_put(&newnd.path);
> >  	putname(to);
> > diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> > index cf78233..a99b8e2 100644
> > --- a/fs/ocfs2/refcounttree.c
> > +++ b/fs/ocfs2/refcounttree.c
> > @@ -4453,7 +4453,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
> >  		return error;
> >  	}
> >  
> > -	new_dentry = user_path_create(AT_FDCWD, newname, &new_path, 0);
> > +	new_dentry = user_path_create_thawed(AT_FDCWD, newname, &new_path, 0);
> >  	error = PTR_ERR(new_dentry);
> >  	if (IS_ERR(new_dentry)) {
> >  		mlog_errno(error);
> > @@ -4466,19 +4466,13 @@ int ocfs2_reflink_ioctl(struct inode *inode,
> >  		goto out_dput;
> >  	}
> >  
> > -	error = mnt_want_write(new_path.mnt);
> > -	if (error) {
> > -		mlog_errno(error);
> > -		goto out_dput;
> > -	}
> > -
> >  	error = ocfs2_vfs_reflink(old_path.dentry,
> >  				  new_path.dentry->d_inode,
> >  				  new_dentry, preserve);
> > -	mnt_drop_write(new_path.mnt);
> >  out_dput:
> >  	dput(new_dentry);
> >  	mutex_unlock(&new_path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(new_path.mnt);
> >  	path_put(&new_path);
> >  out:
> >  	path_put(&old_path);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index ffc0213..432f6bb 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -77,7 +77,9 @@ extern int user_path_at_empty(int, const char __user *, unsigned, struct path *,
> >  extern int kern_path(const char *, unsigned, struct path *);
> >  
> >  extern struct dentry *kern_path_create(int, const char *, struct path *, int);
> > +extern struct dentry *kern_path_create_thawed(int, const char *, struct path *, int);
> >  extern struct dentry *user_path_create(int, const char __user *, struct path *, int);
> > +extern struct dentry *user_path_create_thawed(int, const char __user *, struct path *, int);
> >  extern int kern_path_parent(const char *, struct nameidata *);
> >  extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> >  			   const char *, unsigned int, struct path *);
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index d510353..c532632 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -865,7 +865,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> >  		 * Get the parent directory, calculate the hash for last
> >  		 * component.
> >  		 */
> > -		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
> > +		dentry = kern_path_create_thawed(AT_FDCWD, sun_path, &path, 0);
> >  		err = PTR_ERR(dentry);
> >  		if (IS_ERR(dentry))
> >  			goto out_mknod_parent;
> > @@ -875,19 +875,13 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> >  		 */
> >  		mode = S_IFSOCK |
> >  		       (SOCK_INODE(sock)->i_mode & ~current_umask());
> > -		err = mnt_want_write(path.mnt);
> > -		if (err)
> > -			goto out_mknod_dput;
> >  		err = security_path_mknod(&path, dentry, mode, 0);
> >  		if (err)
> > -			goto out_mknod_drop_write;
> > -		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
> > -out_mknod_drop_write:
> > -		mnt_drop_write(path.mnt);
> > -		if (err)
> >  			goto out_mknod_dput;
> > +		err = vfs_mknod(path.dentry->d_inode, dentry, mode, 0);
> >  		mutex_unlock(&path.dentry->d_inode->i_mutex);
> >  		dput(path.dentry);
> > +		mnt_drop_write(path.mnt);
> >  		path.dentry = dentry;
> >  
> >  		addr->hash = UNIX_HASH_SIZE;
> > @@ -924,6 +918,7 @@ out:
> >  out_mknod_dput:
> >  	dput(dentry);
> >  	mutex_unlock(&path.dentry->d_inode->i_mutex);
> > +	mnt_drop_write(path.mnt);
> >  	path_put(&path);
> >  out_mknod_parent:
> >  	if (err == -EEXIST)
> > -- 
> > 1.7.1
> > 
> 
> -- 
> 
> "Hell is oneself, hell is alone, the other figures in it, merely projections."
>         - T. S. Eliot
> 
> 			http://www.jlbec.org/
> 			jlbec@evilplan.org
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 19/27] ocfs2: Convert to new freezing mechanism
  2012-04-17  2:19   ` Joel Becker
@ 2012-04-17  7:44     ` Jan Kara
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-17  7:44 UTC (permalink / raw)
  To: Joel Becker
  Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, Mark Fasheh,
	ocfs2-devel

On Mon 16-04-12 19:19:21, Joel Becker wrote:
> On Mon, Apr 16, 2012 at 06:13:57PM +0200, Jan Kara wrote:
> > Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new
> > freeze protection. We also protect several ioctl entry points which
> > were missing the protection.
> > 
> > CC: Mark Fasheh <mfasheh@suse.com>
> > CC: Joel Becker <jlbec@evilplan.org>
> > CC: ocfs2-devel@oss.oracle.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Acked-by: Joel Becker <jlbec@evilplan.org>
  Thanks. Added.

								Honza

> > ---
> >  fs/ocfs2/file.c  |   11 +++++++++--
> >  fs/ocfs2/ioctl.c |   14 ++++++++++++--
> >  fs/ocfs2/mmap.c  |    2 ++
> >  3 files changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> > index 061591a..9b1e3d4 100644
> > --- a/fs/ocfs2/file.c
> > +++ b/fs/ocfs2/file.c
> > @@ -1971,6 +1971,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
> >  {
> >  	struct inode *inode = file->f_path.dentry->d_inode;
> >  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> > +	int ret;
> >  
> >  	if ((cmd == OCFS2_IOC_RESVSP || cmd == OCFS2_IOC_RESVSP64) &&
> >  	    !ocfs2_writes_unwritten_extents(osb))
> > @@ -1985,7 +1986,12 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
> >  	if (!(file->f_mode & FMODE_WRITE))
> >  		return -EBADF;
> >  
> > -	return __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
> > +	ret = mnt_want_write_file(file);
> > +	if (ret)
> > +		return ret;
> > +	ret = __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
> > +	mnt_drop_write_file(file);
> > +	return ret;
> >  }
> >  
> >  static long ocfs2_fallocate(struct file *file, int mode, loff_t offset,
> > @@ -2261,7 +2267,7 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
> >  	if (iocb->ki_left == 0)
> >  		return 0;
> >  
> > -	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> > +	sb_start_write(inode->i_sb);
> >  
> >  	appending = file->f_flags & O_APPEND ? 1 : 0;
> >  	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
> > @@ -2434,6 +2440,7 @@ out_sems:
> >  		ocfs2_iocb_clear_sem_locked(iocb);
> >  
> >  	mutex_unlock(&inode->i_mutex);
> > +	sb_end_write(inode->i_sb);
> >  
> >  	if (written)
> >  		ret = written;
> > diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
> > index a1a1bfd..d9c352b 100644
> > --- a/fs/ocfs2/ioctl.c
> > +++ b/fs/ocfs2/ioctl.c
> > @@ -926,7 +926,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		if (get_user(new_clusters, (int __user *)arg))
> >  			return -EFAULT;
> >  
> > -		return ocfs2_group_extend(inode, new_clusters);
> > +		status = mnt_want_write_file(filp);
> > +		if (status)
> > +			return status;
> > +		status = ocfs2_group_extend(inode, new_clusters);
> > +		mnt_drop_write_file(filp);
> > +		return status;
> >  	case OCFS2_IOC_GROUP_ADD:
> >  	case OCFS2_IOC_GROUP_ADD64:
> >  		if (!capable(CAP_SYS_RESOURCE))
> > @@ -935,7 +940,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> >  		if (copy_from_user(&input, (int __user *) arg, sizeof(input)))
> >  			return -EFAULT;
> >  
> > -		return ocfs2_group_add(inode, &input);
> > +		status = mnt_want_write_file(filp);
> > +		if (status)
> > +			return status;
> > +		status = ocfs2_group_add(inode, &input);
> > +		mnt_drop_write_file(filp);
> > +		return status;
> >  	case OCFS2_IOC_REFLINK:
> >  		if (copy_from_user(&args, (struct reflink_arguments *)arg,
> >  				   sizeof(args)))
> > diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
> > index 9cd4108..d150372 100644
> > --- a/fs/ocfs2/mmap.c
> > +++ b/fs/ocfs2/mmap.c
> > @@ -136,6 +136,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	sigset_t oldset;
> >  	int ret;
> >  
> > +	sb_start_pagefault(inode->i_sb);
> >  	ocfs2_block_signals(&oldset);
> >  
> >  	/*
> > @@ -165,6 +166,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  
> >  out:
> >  	ocfs2_unblock_signals(&oldset);
> > +	sb_end_pagefault(inode->i_sb);
> >  	return ret;
> >  }
> >  
> > -- 
> > 1.7.1
> > 
> 
> -- 
> 
> Life's Little Instruction Book #197
> 
> 	"Don't forget, a person's greatest emotional need is to 
> 	 feel appreciated."
> 
> 			http://www.jlbec.org/
> 			jlbec@evilplan.org
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/27] nfsd: Push mnt_want_write() outside of i_mutex
  2012-04-16 18:25   ` J. Bruce Fields
@ 2012-04-17  8:17     ` Jan Kara
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-04-17  8:17 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, linux-nfs

On Mon 16-04-12 14:25:33, J. Bruce Fields wrote:
> On Mon, Apr 16, 2012 at 06:13:50PM +0200, Jan Kara wrote:
> > When mnt_want_write() starts to handle freezing it will get a full lock
> > semantics requiring proper lock ordering. So push mnt_want_write() call
> > consistently outside of i_mutex.
> 
> How are you testing this?
  That's a good question :). When I wrote this, I tried running nfs server,
mounting it and checked that I could do basic operations and no lockdep
warning is emitted. If you have better idea what test I should run, I can
do that.

> And do you want this particular track merged for 3.5 through the nfsd
> tree, or should it go some other way?
  My original intent was to let Al merge everything to make it easy. But
you wish to take this patch via your tree, it should be possible (it does
not depend on anything, just the rest of the series depends on it due to
lock ordering constraints).
  
								Honza

> > CC: linux-nfs@vger.kernel.org
> > CC: "J. Bruce Fields" <bfields@fieldses.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/nfsd/nfs4recover.c      |    9 +++--
> >  fs/nfsd/nfsfh.c            |    1 +
> >  fs/nfsd/nfsproc.c          |    9 ++++-
> >  fs/nfsd/vfs.c              |   79 ++++++++++++++++++++++---------------------
> >  fs/nfsd/vfs.h              |   11 +++++-
> >  include/linux/nfsd/nfsfh.h |    1 +
> >  6 files changed, 64 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > index 4767429..efa7574 100644
> > --- a/fs/nfsd/nfs4recover.c
> > +++ b/fs/nfsd/nfs4recover.c
> > @@ -154,6 +154,10 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> >  	if (status < 0)
> >  		return;
> >  
> > +	status = mnt_want_write_file(rec_file);
> > +	if (status)
> > +		return;
> > +
> >  	dir = rec_file->f_path.dentry;
> >  	/* lock the parent */
> >  	mutex_lock(&dir->d_inode->i_mutex);
> > @@ -173,11 +177,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> >  		 * as well be forgiving and just succeed silently.
> >  		 */
> >  		goto out_put;
> > -	status = mnt_want_write_file(rec_file);
> > -	if (status)
> > -		goto out_put;
> >  	status = vfs_mkdir(dir->d_inode, dentry, S_IRWXU);
> > -	mnt_drop_write_file(rec_file);
> >  out_put:
> >  	dput(dentry);
> >  out_unlock:
> > @@ -189,6 +189,7 @@ out_unlock:
> >  				" (err %d); please check that %s exists"
> >  				" and is writeable", status,
> >  				user_recovery_dirname);
> > +	mnt_drop_write_file(rec_file);
> >  	nfs4_reset_creds(original_cred);
> >  }
> >  
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 68454e7..8b93353 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -635,6 +635,7 @@ fh_put(struct svc_fh *fhp)
> >  		fhp->fh_post_saved = 0;
> >  #endif
> >  	}
> > +	fh_drop_write(fhp);
> >  	if (exp) {
> >  		cache_put(&exp->h, &svc_export_cache);
> >  		fhp->fh_export = NULL;
> > diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> > index e15dc45..aad6d45 100644
> > --- a/fs/nfsd/nfsproc.c
> > +++ b/fs/nfsd/nfsproc.c
> > @@ -196,6 +196,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
> >  	struct dentry	*dchild;
> >  	int		type, mode;
> >  	__be32		nfserr;
> > +	int		hosterr;
> >  	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
> >  
> >  	dprintk("nfsd: CREATE   %s %.*s\n",
> > @@ -214,6 +215,12 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
> >  	nfserr = nfserr_exist;
> >  	if (isdotent(argp->name, argp->len))
> >  		goto done;
> > +	hosterr = fh_want_write(dirfhp);
> > +	if (hosterr) {
> > +		nfserr = nfserrno(hosterr);
> > +		goto done;
> > +	}
> > +
> >  	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
> >  	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
> >  	if (IS_ERR(dchild)) {
> > @@ -330,7 +337,7 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
> >  out_unlock:
> >  	/* We don't really need to unlock, as fh_put does it. */
> >  	fh_unlock(dirfhp);
> > -
> > +	fh_drop_write(dirfhp);
> >  done:
> >  	fh_put(dirfhp);
> >  	return nfsd_return_dirop(nfserr, resp);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 296d671..b8bb649 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1276,6 +1276,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	 * If it has, the parent directory should already be locked.
> >  	 */
> >  	if (!resfhp->fh_dentry) {
> > +		host_err = fh_want_write(fhp);
> > +		if (host_err)
> > +			goto out_nfserr;
> > +
> >  		/* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
> >  		fh_lock_nested(fhp, I_MUTEX_PARENT);
> >  		dchild = lookup_one_len(fname, dentry, flen);
> > @@ -1319,14 +1323,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		goto out;
> >  	}
> >  
> > -	host_err = fh_want_write(fhp);
> > -	if (host_err)
> > -		goto out_nfserr;
> > -
> >  	/*
> >  	 * Get the dir op function pointer.
> >  	 */
> >  	err = 0;
> > +	host_err = 0;
> >  	switch (type) {
> >  	case S_IFREG:
> >  		host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
> > @@ -1343,10 +1344,8 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
> >  		break;
> >  	}
> > -	if (host_err < 0) {
> > -		fh_drop_write(fhp);
> > +	if (host_err < 0)
> >  		goto out_nfserr;
> > -	}
> >  
> >  	err = nfsd_create_setattr(rqstp, resfhp, iap);
> >  
> > @@ -1358,7 +1357,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	err2 = nfserrno(commit_metadata(fhp));
> >  	if (err2)
> >  		err = err2;
> > -	fh_drop_write(fhp);
> >  	/*
> >  	 * Update the file handle to get the new inode info.
> >  	 */
> > @@ -1417,6 +1415,11 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	err = nfserr_notdir;
> >  	if (!dirp->i_op->lookup)
> >  		goto out;
> > +
> > +	host_err = fh_want_write(fhp);
> > +	if (host_err)
> > +		goto out_nfserr;
> > +
> >  	fh_lock_nested(fhp, I_MUTEX_PARENT);
> >  
> >  	/*
> > @@ -1449,9 +1452,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		v_atime = verifier[1]&0x7fffffff;
> >  	}
> >  	
> > -	host_err = fh_want_write(fhp);
> > -	if (host_err)
> > -		goto out_nfserr;
> >  	if (dchild->d_inode) {
> >  		err = 0;
> >  
> > @@ -1522,7 +1522,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (!err)
> >  		err = nfserrno(commit_metadata(fhp));
> >  
> > -	fh_drop_write(fhp);
> >  	/*
> >  	 * Update the filehandle to get the new inode info.
> >  	 */
> > @@ -1533,6 +1532,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	fh_unlock(fhp);
> >  	if (dchild && !IS_ERR(dchild))
> >  		dput(dchild);
> > +	fh_drop_write(fhp);
> >   	return err;
> >   
> >   out_nfserr:
> > @@ -1613,6 +1613,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >  	if (err)
> >  		goto out;
> > +
> > +	host_err = fh_want_write(fhp);
> > +	if (host_err)
> > +		goto out_nfserr;
> > +
> >  	fh_lock(fhp);
> >  	dentry = fhp->fh_dentry;
> >  	dnew = lookup_one_len(fname, dentry, flen);
> > @@ -1620,10 +1625,6 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (IS_ERR(dnew))
> >  		goto out_nfserr;
> >  
> > -	host_err = fh_want_write(fhp);
> > -	if (host_err)
> > -		goto out_nfserr;
> > -
> >  	if (unlikely(path[plen] != 0)) {
> >  		char *path_alloced = kmalloc(plen+1, GFP_KERNEL);
> >  		if (path_alloced == NULL)
> > @@ -1683,6 +1684,12 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  	if (isdotent(name, len))
> >  		goto out;
> >  
> > +	host_err = fh_want_write(tfhp);
> > +	if (host_err) {
> > +		err = nfserrno(host_err);
> > +		goto out;
> > +	}
> > +
> >  	fh_lock_nested(ffhp, I_MUTEX_PARENT);
> >  	ddir = ffhp->fh_dentry;
> >  	dirp = ddir->d_inode;
> > @@ -1694,18 +1701,13 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  
> >  	dold = tfhp->fh_dentry;
> >  
> > -	host_err = fh_want_write(tfhp);
> > -	if (host_err) {
> > -		err = nfserrno(host_err);
> > -		goto out_dput;
> > -	}
> >  	err = nfserr_noent;
> >  	if (!dold->d_inode)
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	host_err = nfsd_break_lease(dold->d_inode);
> >  	if (host_err) {
> >  		err = nfserrno(host_err);
> > -		goto out_drop_write;
> > +		goto out_dput;
> >  	}
> >  	host_err = vfs_link(dold, dirp, dnew);
> >  	if (!host_err) {
> > @@ -1718,12 +1720,11 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  		else
> >  			err = nfserrno(host_err);
> >  	}
> > -out_drop_write:
> > -	fh_drop_write(tfhp);
> >  out_dput:
> >  	dput(dnew);
> >  out_unlock:
> >  	fh_unlock(ffhp);
> > +	fh_drop_write(tfhp);
> >  out:
> >  	return err;
> >  
> > @@ -1766,6 +1767,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> >  	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
> >  		goto out;
> >  
> > +	host_err = fh_want_write(ffhp);
> > +	if (host_err) {
> > +		err = nfserrno(host_err);
> > +		goto out;
> > +	}
> > +
> >  	/* cannot use fh_lock as we need deadlock protective ordering
> >  	 * so do it by hand */
> >  	trap = lock_rename(tdentry, fdentry);
> > @@ -1796,17 +1803,14 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> >  	host_err = -EXDEV;
> >  	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
> >  		goto out_dput_new;
> > -	host_err = fh_want_write(ffhp);
> > -	if (host_err)
> > -		goto out_dput_new;
> >  
> >  	host_err = nfsd_break_lease(odentry->d_inode);
> >  	if (host_err)
> > -		goto out_drop_write;
> > +		goto out_dput_new;
> >  	if (ndentry->d_inode) {
> >  		host_err = nfsd_break_lease(ndentry->d_inode);
> >  		if (host_err)
> > -			goto out_drop_write;
> > +			goto out_dput_new;
> >  	}
> >  	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
> >  	if (!host_err) {
> > @@ -1814,8 +1818,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> >  		if (!host_err)
> >  			host_err = commit_metadata(ffhp);
> >  	}
> > -out_drop_write:
> > -	fh_drop_write(ffhp);
> >   out_dput_new:
> >  	dput(ndentry);
> >   out_dput_old:
> > @@ -1831,6 +1833,7 @@ out_drop_write:
> >  	fill_post_wcc(tfhp);
> >  	unlock_rename(tdentry, fdentry);
> >  	ffhp->fh_locked = tfhp->fh_locked = 0;
> > +	fh_drop_write(ffhp);
> >  
> >  out:
> >  	return err;
> > @@ -1856,6 +1859,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (err)
> >  		goto out;
> >  
> > +	host_err = fh_want_write(fhp);
> > +	if (host_err)
> > +		goto out_nfserr;
> > +
> >  	fh_lock_nested(fhp, I_MUTEX_PARENT);
> >  	dentry = fhp->fh_dentry;
> >  	dirp = dentry->d_inode;
> > @@ -1874,21 +1881,15 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (!type)
> >  		type = rdentry->d_inode->i_mode & S_IFMT;
> >  
> > -	host_err = fh_want_write(fhp);
> > -	if (host_err)
> > -		goto out_put;
> > -
> >  	host_err = nfsd_break_lease(rdentry->d_inode);
> >  	if (host_err)
> > -		goto out_drop_write;
> > +		goto out_put;
> >  	if (type != S_IFDIR)
> >  		host_err = vfs_unlink(dirp, rdentry);
> >  	else
> >  		host_err = vfs_rmdir(dirp, rdentry);
> >  	if (!host_err)
> >  		host_err = commit_metadata(fhp);
> > -out_drop_write:
> > -	fh_drop_write(fhp);
> >  out_put:
> >  	dput(rdentry);
> >  
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index ec0611b..359594c 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -110,12 +110,19 @@ int nfsd_set_posix_acl(struct svc_fh *, int, struct posix_acl *);
> >  
> >  static inline int fh_want_write(struct svc_fh *fh)
> >  {
> > -	return mnt_want_write(fh->fh_export->ex_path.mnt);
> > +	int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
> > +
> > +	if (!ret)
> > +		fh->fh_want_write = 1;
> > +	return ret;
> >  }
> >  
> >  static inline void fh_drop_write(struct svc_fh *fh)
> >  {
> > -	mnt_drop_write(fh->fh_export->ex_path.mnt);
> > +	if (fh->fh_want_write) {
> > +		fh->fh_want_write = 0;
> > +		mnt_drop_write(fh->fh_export->ex_path.mnt);
> > +	}
> >  }
> >  
> >  #endif /* LINUX_NFSD_VFS_H */
> > diff --git a/include/linux/nfsd/nfsfh.h b/include/linux/nfsd/nfsfh.h
> > index ce4743a..fa63048 100644
> > --- a/include/linux/nfsd/nfsfh.h
> > +++ b/include/linux/nfsd/nfsfh.h
> > @@ -143,6 +143,7 @@ typedef struct svc_fh {
> >  	int			fh_maxsize;	/* max size for fh_handle */
> >  
> >  	unsigned char		fh_locked;	/* inode locked by us */
> > +	unsigned char		fh_want_write;	/* remount protection taken */
> >  
> >  #ifdef CONFIG_NFSD_V3
> >  	unsigned char		fh_post_saved;	/* post-op attrs saved */
> > -- 
> > 1.7.1
> > 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
  2012-04-16 22:02 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Andreas Dilger
  2012-04-17  0:43   ` Dave Chinner
@ 2012-04-17  9:32   ` Jan Kara
       [not found]     ` <20120417093246.GD7198-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Kara @ 2012-04-17  9:32 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason, cluster-devel,
	David S. Miller, fuse-devel, J. Bruce Fields, Joel Becker,
	KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs, linux-nilfs,
	linux-ntfs-dev, Mark Fasheh, Miklos Szeredi, ocfs2-devel,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o, xfs

On Mon 16-04-12 15:02:50, Andreas Dilger wrote:
> On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> > Another potential contention point might be patch 19. In that patch
> > we make freeze_super() refuse to freeze the filesystem when there
> > are open but unlinked files which may be impractical in some cases.
> > The main reason for this is the problem with handling of file deletion
> > from fput() called with mmap_sem held (e.g. from munmap(2)), and
> > then there's the fact that we cannot really force such filesystem
> > into a consistent state... But if people think that freezing with
> > open but unlinked files should happen, then I have some possible
> > solutions in mind (maybe as a separate patchset since this is
> > large enough).
> 
> Looking at a desktop system, I think it is very typical that there
> are open-unlinked files present, so I don't know if this is really
> an acceptable solution.  It isn't clear from your comments whether
> this is a blanket refusal for all open-unlinked files, or only in
> some particular cases...
  Thanks for looking at this. It is currently a blanket refusal. And I
agree it's problematic. There are two problems with open but unlinked
files.

One is that some old filesystems cannot get in a consistent state in
presence of open but unlinked files but for filesystems we really care
about - xfs, ext4, ext3, btrfs, or even ocfs2, gfs2 - that is not a real
issue (these filesystems will delete those inodes on next mount read-write).

The other problem is with what should happen when you put last inode
reference on a frozen filesystem. Two possibilities I see are:

a) block the iput() call - that is inconvenient because it can be
called in various contexts. I think we could possibly use the same level of
freeze protection as for page fault (this has changed since I originally
thought about this and that would make things simpler) but I'm not
completely sure.

b) let the iput finish but filesystem will keep inode on its orphan list
(or it's equivalent) and the inode will be deleted after the filesystem is
thawed. The advantage of this is we don't have to block iput(), the
disadvantage is we have to have filesystem support and not all filesystems
can do this.

Any thoughts?

								Honza
> 
> lsof | grep deleted
> nautilus  25393  adilger   19r      REG           253,0      340     253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
> nautilus  25393  adilger   20r      REG           253,0    32768     253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
> gnome-ter 25623  adilger   22u      REG            0,18    17841    2717846 /tmp/vtePIRJCW (deleted)
> gnome-ter 25623  adilger   23u      REG            0,18     5568    2717847 /tmp/vteDCSJCW (deleted)
> gnome-ter 25623  adilger   29u      REG            0,18      480    2728484 /tmp/vte6C1TCW (deleted)
  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
       [not found]     ` <20120417093246.GD7198-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
@ 2012-04-17 19:34       ` Joel Becker
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Becker @ 2012-04-17 19:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andreas Dilger, Al Viro, dchinner-H+wXaHxf7aLQT0dZR+AlfA, LKML,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Alex Elder,
	Anton Altaparmakov, Ben Myers, Chris Mason,
	cluster-devel-H+wXaHxf7aLQT0dZR+AlfA, David S. Miller,
	fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, J. Bruce Fields,
	KONISHI Ryusuke, linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Mark Fasheh,
	Miklos Szeredi, ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
	OGAWA Hirofumi, Steven Whitehouse, Theodore Ts'o,
	xfs-VZNHf3L845pBDgjK7y7TUQ

On Tue, Apr 17, 2012 at 11:32:46AM +0200, Jan Kara wrote:
> On Mon 16-04-12 15:02:50, Andreas Dilger wrote:
> > On 2012-04-16, at 9:13 AM, Jan Kara wrote:
> > > Another potential contention point might be patch 19. In that patch
> > > we make freeze_super() refuse to freeze the filesystem when there
> > > are open but unlinked files which may be impractical in some cases.
> > > The main reason for this is the problem with handling of file deletion
> > > from fput() called with mmap_sem held (e.g. from munmap(2)), and
> > > then there's the fact that we cannot really force such filesystem
> > > into a consistent state... But if people think that freezing with
> > > open but unlinked files should happen, then I have some possible
> > > solutions in mind (maybe as a separate patchset since this is
> > > large enough).
> > 
> > Looking at a desktop system, I think it is very typical that there
> > are open-unlinked files present, so I don't know if this is really
> > an acceptable solution.  It isn't clear from your comments whether
> > this is a blanket refusal for all open-unlinked files, or only in
> > some particular cases...
>   Thanks for looking at this. It is currently a blanket refusal. And I
> agree it's problematic. There are two problems with open but unlinked
> files.

	Let me add my name to the chorus of "we have to handle freezing
with open+unlinked, we cannot assume they don't exist."

> One is that some old filesystems cannot get in a consistent state in
> presence of open but unlinked files but for filesystems we really care
> about - xfs, ext4, ext3, btrfs, or even ocfs2, gfs2 - that is not a real
> issue (these filesystems will delete those inodes on next mount read-write).

	Others have pointed out that we can flag the safe filesystems.
I'd even be willing to say you can't freeze the unsafe filesystems.

> The other problem is with what should happen when you put last inode
> reference on a frozen filesystem. Two possibilities I see are:
> 
> a) block the iput() call - that is inconvenient because it can be
> called in various contexts. I think we could possibly use the same level of
> freeze protection as for page fault (this has changed since I originally
> thought about this and that would make things simpler) but I'm not
> completely sure.

	Given that frozen filesystems can stay that way for a while,
couldn't that lead to a million frozen df(1)s?  It's like your average
NFS network failure.

> b) let the iput finish but filesystem will keep inode on its orphan list
> (or it's equivalent) and the inode will be deleted after the filesystem is
> thawed. The advantage of this is we don't have to block iput(), the
> disadvantage is we have to have filesystem support and not all filesystems
> can do this.

	Perhaps we handle iput() like unlinked.  If the filesystem can
handle it, we allow it, otherwise we block.

Joel

> 
> Any thoughts?
> 
> 								Honza
> > 
> > lsof | grep deleted
> > nautilus  25393  adilger   19r      REG           253,0      340     253954 /home/adilger/.local/share/gvfs-metadata/home (deleted)
> > nautilus  25393  adilger   20r      REG           253,0    32768     253964 /home/adilger/.local/share/gvfs-metadata/home-f332a8f3.log (deleted)
> > gnome-ter 25623  adilger   22u      REG            0,18    17841    2717846 /tmp/vtePIRJCW (deleted)
> > gnome-ter 25623  adilger   23u      REG            0,18     5568    2717847 /tmp/vteDCSJCW (deleted)
> > gnome-ter 25623  adilger   29u      REG            0,18      480    2728484 /tmp/vte6C1TCW (deleted)
>   
> -- 
> Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> SUSE Labs, CR

-- 

"The first requisite of a good citizen in this republic of ours
 is that he shall be able and willing to pull his weight."
	- Theodore Roosevelt

			http://www.jlbec.org/
			jlbec-aKy9MeLSZ9dg9hUCZPvPmw@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 00/19 v5] Fix filesystem freezing deadlocks
  2012-04-17  5:10     ` Andreas Dilger
@ 2012-04-18  0:46       ` Chris Samuel
  0 siblings, 0 replies; 42+ messages in thread
From: Chris Samuel @ 2012-04-18  0:46 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Dave Chinner, Jan Kara, Al Viro, dchinner, LKML, linux-fsdevel,
	Alex Elder, Anton Altaparmakov, Ben Myers, Chris Mason,
	cluster-devel, David S. Miller, fuse-devel, J. Bruce Fields,
	Joel Becker, KONISHI Ryusuke, linux-btrfs, linux-ext4, linux-nfs,
	linux-nilfs, linux-ntfs-dev, Mark Fasheh, Miklos Szeredi,
	ocfs2-devel, OGAWA Hirofumi, Steven Whitehouse

On 17/04/12 15:10, Andreas Dilger wrote:

> (which IMHO would prevent nearly every Gnome system from freezing unless these
> apps have changed their behaviour in more recent releases).

It would also affect current KDE desktops as they tend to use MySQL for
Akonadi & Nepomuk, plus anyone using Chromium (and presumably Chrome):

samuel@eris:/tmp$ sudo lsof | grep deleted | awk '{print $1}' | sort |
uniq -c
[sudo] password for samuel:
     32 chromium-
      1 dovecot
      5 imap-logi
      5 mysqld

I would be surprised if you could find many systems that didn't have
files in this situation.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 19/27] ocfs2: Convert to new freezing mechanism
  2012-06-01 22:30 [PATCH 00/27 v6] " Jan Kara
@ 2012-06-01 22:30 ` Jan Kara
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Kara @ 2012-06-01 22:30 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, dchinner, Jan Kara, Mark Fasheh, Joel Becker,
	ocfs2-devel

Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze
protection. We also protect several ioctl entry points which were missing the
protection. Finally, we add freeze protection to the journaling mechanism so
that iput() of unlinked inode cannot modify a frozen filesystem.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/file.c    |   11 +++++++++--
 fs/ocfs2/ioctl.c   |   14 ++++++++++++--
 fs/ocfs2/journal.c |    7 ++++++-
 fs/ocfs2/mmap.c    |    2 ++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 061591a..9b1e3d4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1971,6 +1971,7 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int ret;
 
 	if ((cmd == OCFS2_IOC_RESVSP || cmd == OCFS2_IOC_RESVSP64) &&
 	    !ocfs2_writes_unwritten_extents(osb))
@@ -1985,7 +1986,12 @@ int ocfs2_change_file_space(struct file *file, unsigned int cmd,
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 
-	return __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
+	ret = mnt_want_write_file(file);
+	if (ret)
+		return ret;
+	ret = __ocfs2_change_file_space(file, inode, file->f_pos, cmd, sr, 0);
+	mnt_drop_write_file(file);
+	return ret;
 }
 
 static long ocfs2_fallocate(struct file *file, int mode, loff_t offset,
@@ -2261,7 +2267,7 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
 	if (iocb->ki_left == 0)
 		return 0;
 
-	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+	sb_start_write(inode->i_sb);
 
 	appending = file->f_flags & O_APPEND ? 1 : 0;
 	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
@@ -2434,6 +2440,7 @@ out_sems:
 		ocfs2_iocb_clear_sem_locked(iocb);
 
 	mutex_unlock(&inode->i_mutex);
+	sb_end_write(inode->i_sb);
 
 	if (written)
 		ret = written;
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index a1a1bfd..d9c352b 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -926,7 +926,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (get_user(new_clusters, (int __user *)arg))
 			return -EFAULT;
 
-		return ocfs2_group_extend(inode, new_clusters);
+		status = mnt_want_write_file(filp);
+		if (status)
+			return status;
+		status = ocfs2_group_extend(inode, new_clusters);
+		mnt_drop_write_file(filp);
+		return status;
 	case OCFS2_IOC_GROUP_ADD:
 	case OCFS2_IOC_GROUP_ADD64:
 		if (!capable(CAP_SYS_RESOURCE))
@@ -935,7 +940,12 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (copy_from_user(&input, (int __user *) arg, sizeof(input)))
 			return -EFAULT;
 
-		return ocfs2_group_add(inode, &input);
+		status = mnt_want_write_file(filp);
+		if (status)
+			return status;
+		status = ocfs2_group_add(inode, &input);
+		mnt_drop_write_file(filp);
+		return status;
 	case OCFS2_IOC_REFLINK:
 		if (copy_from_user(&args, (struct reflink_arguments *)arg,
 				   sizeof(args)))
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 0a42ae9..2dd36af 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -355,11 +355,14 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int max_buffs)
 	if (journal_current_handle())
 		return jbd2_journal_start(journal, max_buffs);
 
+	sb_start_intwrite(osb->sb);
+
 	down_read(&osb->journal->j_trans_barrier);
 
 	handle = jbd2_journal_start(journal, max_buffs);
 	if (IS_ERR(handle)) {
 		up_read(&osb->journal->j_trans_barrier);
+		sb_end_intwrite(osb->sb);
 
 		mlog_errno(PTR_ERR(handle));
 
@@ -388,8 +391,10 @@ int ocfs2_commit_trans(struct ocfs2_super *osb,
 	if (ret < 0)
 		mlog_errno(ret);
 
-	if (!nested)
+	if (!nested) {
 		up_read(&journal->j_trans_barrier);
+		sb_end_intwrite(osb->sb);
+	}
 
 	return ret;
 }
diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index 9cd4108..d150372 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -136,6 +136,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	sigset_t oldset;
 	int ret;
 
+	sb_start_pagefault(inode->i_sb);
 	ocfs2_block_signals(&oldset);
 
 	/*
@@ -165,6 +166,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 out:
 	ocfs2_unblock_signals(&oldset);
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2012-06-01 22:31 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-16 16:13 [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
2012-04-16 16:13 ` [PATCH 01/27] fb_defio: Push file_update_time() into fb_deferred_io_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 02/27] fs: Push file_update_time() into __block_page_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 03/27] ceph: Push file_update_time() into ceph_page_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 04/27] 9p: Push file_update_time() into v9fs_vm_page_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 05/27] gfs2: Push file_update_time() into gfs2_page_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 06/27] sysfs: Push file_update_time() into bin_page_mkwrite() Jan Kara
2012-04-16 16:13 ` [PATCH 07/27] mm: Update file times from fault path only if .page_mkwrite is not set Jan Kara
2012-04-16 16:13 ` [PATCH 08/27] mm: Make default vm_ops provide ->page_mkwrite handler Jan Kara
2012-04-16 16:13 ` [PATCH 09/27] fs: Push mnt_want_write() outside of i_mutex Jan Kara
2012-04-17  2:18   ` Joel Becker
2012-04-17  7:43     ` Jan Kara
2012-04-16 16:13 ` [PATCH 10/27] fat: " Jan Kara
2012-04-16 16:13 ` [PATCH 11/27] btrfs: " Jan Kara
2012-04-16 16:13 ` [PATCH 12/27] nfsd: " Jan Kara
2012-04-16 18:25   ` J. Bruce Fields
2012-04-17  8:17     ` Jan Kara
2012-04-16 16:13 ` [PATCH 13/27] fs: Improve filesystem freezing handling Jan Kara
2012-04-16 16:13 ` [PATCH 14/27] fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Jan Kara
2012-04-16 16:13 ` [PATCH 15/27] fs: Skip atime update on frozen filesystem Jan Kara
2012-04-16 16:13 ` [PATCH 16/27] fs: Protect write paths by sb_start_write - sb_end_write Jan Kara
2012-04-16 16:13 ` [PATCH 17/27] ext4: Convert to new freezing mechanism Jan Kara
2012-04-16 16:13 ` [PATCH 18/27] xfs: Convert to new freezing code Jan Kara
2012-04-16 16:13 ` [PATCH 19/27] ocfs2: Convert to new freezing mechanism Jan Kara
2012-04-17  2:19   ` Joel Becker
2012-04-17  7:44     ` Jan Kara
2012-04-16 16:13 ` [PATCH 20/27] gfs2: " Jan Kara
     [not found] ` <1334592845-22862-1-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2012-04-16 16:13   ` [PATCH 21/27] fuse: " Jan Kara
2012-04-16 16:16   ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Jan Kara
2012-04-16 16:14 ` [PATCH 22/27] ntfs: Convert to new freezing mechanism Jan Kara
2012-04-16 16:14 ` [PATCH 23/27] nilfs2: " Jan Kara
2012-04-16 16:14 ` [PATCH 24/27] btrfs: " Jan Kara
2012-04-16 16:14 ` [PATCH 25/27] fs: Remove old " Jan Kara
2012-04-16 16:14 ` [PATCH 26/27] fs: Refuse to freeze filesystem with open but unlinked files Jan Kara
2012-04-16 16:14 ` [PATCH 27/27] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Jan Kara
2012-04-16 22:02 ` [PATCH 00/19 v5] Fix filesystem freezing deadlocks Andreas Dilger
2012-04-17  0:43   ` Dave Chinner
2012-04-17  5:10     ` Andreas Dilger
2012-04-18  0:46       ` Chris Samuel
2012-04-17  9:32   ` Jan Kara
     [not found]     ` <20120417093246.GD7198-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-17 19:34       ` Joel Becker
  -- strict thread matches above, loose matches on Subject: below --
2012-06-01 22:30 [PATCH 00/27 v6] " Jan Kara
2012-06-01 22:30 ` [PATCH 19/27] ocfs2: Convert to new freezing mechanism Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).