linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups
@ 2013-01-07 11:18 Fernando Luis Vázquez Cao
  2013-01-07 11:21 ` [PATCH 1/17] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
                   ` (16 more replies)
  0 siblings, 17 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

This patch set addresses several long standing issues:

- Fix emergency thaw (patch 1 through 4) and make some cleanups while at it
  (patches 5 and 6).

- Fix nested freezes for superblock-less block devices (patch 7).

- Integrate sb-level/bdev-level fsfreeze and sanitize the fsfreeze API to
  avoid perpetual frost in the event of changes in the filesystem namespace
  (patch 8 through 13).

- Add infrastructure needed to support dm-initiated freeze of filesystems
  spanning several bdevs and actually use it to implement this feature in
  btrfs (patch 14 through 16).

- Export freeze state to userspace through mountinfo (patch 17).

---
[1/17] vfs: add __iterate_supers() and helpers around it
[2/17] fsfreeze: add unlocked version of thaw_super

Preparatory patches to fix s_umount lockup of emergency thaw code.

[3/17] fsfreeze: fix emergency thaw infinite loop

Fix thaw_bdev so that it propagates the error code properly to the caller.
This bug caused emergency thaw to loop infinitely. This is a forward port of
a previous patch by Dave Chinner.

[4/17] fsfreeze: emergency thaw will deadlock on s_umount

Avoid emergency thaw deadlock on s_umount by using unlocked version of
thaw_super() and __iterate_supers() (introduced in patches 2 and 1
respectively).

[5/17] xfs: switch to using super methods for fsfreeze
[6/17] fsfreeze: move emergency thaw code to fs/super.c

Two cleanups (based on previous patches by David Chinner): convert xfs to using
superblock methods for fsfreeze and move emergency thaw code to where it
belongs.

[7/17] fsfreeze: fix nested freezing of sb-less bdevs

Get nested freezes working for sb-less bdevs.

[8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen
[9/17] fsfreeze: freeze_super and thaw_bdev don't play well together

Restore the re-entrancy guards in sb-level API as the first step to achieve
proper sb-level/bdev-level fsfreeze integration.

[10/17] fsfreeze: automatically thaw on umount

The fsfreeze ioctls are a sb-level API which means that to thaw a frozen
filesystem it is necessary to have it in our filesystem hierarchy. To avoid
being stuck with a frozen filesystem and frozen processes by accident, we
automatically thaw filesystems on sys_umount().

[11/17] fsfreeze: add thaw_super_force

Both emergecy thaw and the umount-time automatic thaw need to make sure that
the thaw operation does not fail so with some code refactoring we can share
and clean up the code.

[12/17] fsfreeze: sb-level/bdev-level fsfreeze integration
[13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw

Fix the problem that a filesystem frozen through the bdev level API can be
thawed using the superblock level API and make sure that their respective
freeze counters are always consistent with each other.

[14/17] vfs: leverage bd_super in get_super and get_active_super
[15/17] btrfs: store pointer to superblock in bd_super
[16/17] fsfreeze: allow freeze counter lock nesting

Leverage bd_super in get_active_super to be able to get the superblock from the bdev when the filesystem spans several bdevs. Once this is in place, getting
thaw_bdev(), i.e. dm snapshot, and btrfs to play well together is just a matter
of using bd_super in btrfs.

[17/17] fsfreeze: export freeze_count through mountinfo

Export freeze_count through mountinfo so that we have a way to figure out the
current freeze state from userspace.
---

Changes since: v5:
  - Fix patch 4 ("fsfreeze: emergency thaw will deadlock on s_umount").
  - Add Jan Kara's Reviewed-by to patch 9 ("fsfreeze: freeze_super and
    thaw_bdev don't play well together").
  - Exclude conflictive check ioctls from the patch set (could be back in a
    follow-up patch set) and add a patch to export freeze state to userspace
    through mountinfo.
  - Add fix for nested freezing of sb-less bdevs.
  - Rewrite the patches to integrate sb-level/bdev-level fsfreeze and sanitize
    the fsfreeze API (including automatical thawing of filesystems on umount).
  - Add infrastructure needed to support dm-initiated freeze of filesystems
    spanning several bdevs and actually use it to implement this feature in
    btrfs.

Changes since v4:
  - Make implementation of iterate_supers_read and iterate_supers_write
    symmetric and update locking of emergency thaw code accordingly.
  - Add Jan Kara's Reviewed-by to patch 2.
  - Add explanation about the change from a block device based emergency thaw
    to a superblock based one to patch 4.
  - Modify patch 7 so that the new superblock based emergency does not leave
    the block device level fsfreeze counter in an inconsistent state.
  - Update and comment handling of fsfreeze during mount.

Changes since v3:
  - Include right version of the emergency thaw fix (thanks go to Josef for the
    heads up).
  - Add iterate_supers_(read/write) helpers as suggested by Eric Sandeen.
  - Fix typos.
  - Add Eric Sandeen's Reviewed-by to patches 2 and 3.

Changes since v2:
  - Rebase on top of Jan Kara's patch set which made it upstream.
  - Do without the horrible "emergency" parameter as suggested by Jan Kara.
  - New approach to fix emergency thaw deadlock on s_umount which does locking
    properly.

Changes since v1:
  - Do not break unfreezing of block devices without a mounted filesystem.
  - Avoid conditional locking in thaw_bdev.
  - Update filesystem locking documentation.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/17] vfs: add __iterate_supers() and helpers around it
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2013-01-07 11:21 ` Fernando Luis Vázquez Cao
  2013-01-07 11:22 ` [PATCH 2/17] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

iterate_supers() calls a function provided by the caller with the s_umount
semaphore taken in read mode. However, there may be cases where write mode
is preferable, so we add __iterate_supers(), which lets one
specify the mode of the lock, and replace iterate_supers with two helpers
around __iterate_supers(), iterate_supers_read() and iterate_supers_write().

This will be used to fix the emergency thaw (filesystem unfreeze) code, which
iterates over the list of superblocks but needs to hold the s_umount semaphore
in _write_ mode bebore carrying out the actual thaw operation.

This patch introduces no semantic changes since current iterate_supers()
callers are just updated to use iterate_supers_read() instead.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/buffer.c linux-3.8-rc1/fs/buffer.c
--- linux-3.8-rc1-orig/fs/buffer.c	2012-12-25 10:27:41.166737000 +0900
+++ linux-3.8-rc1/fs/buffer.c	2012-12-25 11:14:41.304018000 +0900
@@ -520,7 +520,7 @@ static void do_thaw_one(struct super_blo
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers(do_thaw_one, NULL);
+	iterate_supers_read(do_thaw_one, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }
diff -urNp linux-3.8-rc1-orig/fs/drop_caches.c linux-3.8-rc1/fs/drop_caches.c
--- linux-3.8-rc1-orig/fs/drop_caches.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/fs/drop_caches.c	2012-12-25 11:14:41.304018000 +0900
@@ -59,7 +59,7 @@ int drop_caches_sysctl_handler(ctl_table
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
-			iterate_supers(drop_pagecache_sb, NULL);
+			iterate_supers_read(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
diff -urNp linux-3.8-rc1-orig/fs/quota/quota.c linux-3.8-rc1/fs/quota/quota.c
--- linux-3.8-rc1-orig/fs/quota/quota.c	2012-12-25 10:27:42.034737000 +0900
+++ linux-3.8-rc1/fs/quota/quota.c	2012-12-25 11:14:41.304018000 +0900
@@ -58,7 +58,7 @@ static int quota_sync_all(int type)
 		return -EINVAL;
 	ret = security_quotactl(Q_SYNC, type, 0, NULL);
 	if (!ret)
-		iterate_supers(quota_sync_one, &type);
+		iterate_supers_read(quota_sync_one, &type);
 	return ret;
 }
 
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 11:14:41.308018000 +0900
@@ -508,14 +508,16 @@ void drop_super(struct super_block *sb)
 EXPORT_SYMBOL(drop_super);
 
 /**
- *	iterate_supers - call function for all active superblocks
+ *	__iterate_supers - call function for all active superblocks
  *	@f: function to call
  *	@arg: argument to pass to it
+ *	@wlock: mode of superblock lock (false->read lock, true->write lock)
  *
  *	Scans the superblock list and calls given function, passing it
  *	locked superblock and given argument.
  */
-void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
+static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
+			     bool wlock)
 {
 	struct super_block *sb, *p = NULL;
 
@@ -526,10 +528,18 @@ void iterate_supers(void (*f)(struct sup
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		down_read(&sb->s_umount);
+		if (wlock)
+			down_write(&sb->s_umount);
+		else
+			down_read(&sb->s_umount);
+
 		if (sb->s_root && (sb->s_flags & MS_BORN))
 			f(sb, arg);
-		up_read(&sb->s_umount);
+
+		if (wlock)
+			up_write(&sb->s_umount);
+		else
+			up_read(&sb->s_umount);
 
 		spin_lock(&sb_lock);
 		if (p)
@@ -542,6 +552,34 @@ void iterate_supers(void (*f)(struct sup
 }
 
 /**
+ *	iterate_supers_read - call function for all active superblocks
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	a superblock locked in _read_ mode and given argument. The lock
+ *	is automatically relased after the function returns.
+ */
+void iterate_supers_read(void (*f)(struct super_block *, void *), void *arg)
+{
+	__iterate_supers(f, arg , false);
+}
+
+/**
+ *	iterate_supers_write - call function for all active superblocks
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	a superblock locked in _write_ mode and given argument. The lock
+ *	is automatically relased after the function returns.
+ */
+void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+{
+	__iterate_supers(f, arg , true);
+}
+
+/**
  *	iterate_supers_type - call function for superblocks of given type
  *	@type: fs type
  *	@f: function to call
diff -urNp linux-3.8-rc1-orig/fs/sync.c linux-3.8-rc1/fs/sync.c
--- linux-3.8-rc1-orig/fs/sync.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/fs/sync.c	2012-12-25 11:14:41.312018000 +0900
@@ -104,9 +104,9 @@ SYSCALL_DEFINE0(sync)
 	int nowait = 0, wait = 1;
 
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
-	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers_read(sync_inodes_one_sb, NULL);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &wait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
@@ -122,11 +122,11 @@ static void do_sync_work(struct work_str
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_inodes_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_inodes_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 10:27:42.198737000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:14:41.312018000 +0900
@@ -2494,7 +2494,8 @@ extern struct super_block *get_super(str
 extern struct super_block *get_super_thawed(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
-extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
 
diff -urNp linux-3.8-rc1-orig/security/selinux/hooks.c linux-3.8-rc1/security/selinux/hooks.c
--- linux-3.8-rc1-orig/security/selinux/hooks.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/security/selinux/hooks.c	2012-12-25 11:14:41.312018000 +0900
@@ -5720,7 +5720,7 @@ void selinux_complete_init(void)
 
 	/* Set up any superblocks initialized prior to the policy load. */
 	printk(KERN_DEBUG "SELinux:  Setting up existing superblocks.\n");
-	iterate_supers(delayed_superblock_init, NULL);
+	iterate_supers_read(delayed_superblock_init, NULL);
 }
 
 /* SELinux requires early initialization in order to label



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/17] fsfreeze: add unlocked version of thaw_super
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2013-01-07 11:21 ` [PATCH 1/17] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
@ 2013-01-07 11:22 ` Fernando Luis Vázquez Cao
  2013-01-07 11:23 ` [PATCH 3/17] fsfreeze: fix emergency thaw infinite loop Fernando Luis Vázquez Cao
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

thaw_super may be called with superblock lock already taken (fsfreeze's
emergency thaw being one example), so we need an unlocked version to avoid
lockups.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 11:30:38.208018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 11:33:32.796018000 +0900
@@ -1399,40 +1399,59 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
  *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Unlocks the filesystem and marks it writeable again.
+ *
+ * This is the unlocked version of thaw_super, so it is the caller's job to
+ * to protect the superblock by grabbing the s_umount semaphore in write mode
+ * and release it again on return. See thaw_super() for an example.
  */
-int thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb)
 {
-	int error;
+	int error = 0;
 
-	down_write(&sb->s_umount);
 	if (sb->s_writers.frozen == SB_UNFROZEN) {
-		up_write(&sb->s_umount);
-		return -EINVAL;
+		error = -EINVAL;
+		goto out;
 	}
 
 	if (sb->s_flags & MS_RDONLY)
-		goto out;
+		goto out_thaw;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
-			up_write(&sb->s_umount);
-			return error;
+			goto out;
 		}
 	}
 
-out:
+out_thaw:
 	sb->s_writers.frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_writers.wait_unfrozen);
-	deactivate_locked_super(sb);
+out:
+	return error;
+}
 
-	return 0;
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+	int res;
+	down_write(&sb->s_umount);
+	res = __thaw_super(sb);
+	if (!res)
+		deactivate_locked_super(sb);
+	else
+		up_write(&sb->s_umount);
+	return res;
 }
 EXPORT_SYMBOL(thaw_super);
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:30:38.212018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:33:32.796018000 +0900
@@ -1879,6 +1879,7 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/17] fsfreeze: fix emergency thaw infinite loop
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2013-01-07 11:21 ` [PATCH 1/17] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
  2013-01-07 11:22 ` [PATCH 2/17] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
@ 2013-01-07 11:23 ` Fernando Luis Vázquez Cao
  2013-01-07 11:26 ` [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.

Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 10:27:41.110737000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 11:36:24.236018000 +0900
@@ -266,22 +266,17 @@ int thaw_bdev(struct block_device *bdev,
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	error = 0;
-	if (--bdev->bd_fsfreeze_count > 0)
-		goto out;
-
-	if (!sb)
+	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+		error = 0;
 		goto out;
+	}
 
 	error = thaw_super(sb);
-	if (error) {
+	if (error)
 		bdev->bd_fsfreeze_count++;
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return error;
-	}
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return 0;
+	return error;
 }
 EXPORT_SYMBOL(thaw_bdev);
 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (2 preceding siblings ...)
  2013-01-07 11:23 ` [PATCH 3/17] fsfreeze: fix emergency thaw infinite loop Fernando Luis Vázquez Cao
@ 2013-01-07 11:26 ` Fernando Luis Vázquez Cao
  2013-01-09 16:12   ` Jan Kara
  2013-01-07 11:27 ` [PATCH 5/17] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.
 
Use the unlocked version of thaw_super() to do the thawing and replace
iterate_supers() with iterate_supers_write() so that the unfreeze operation can
be performed with s_umount held as the locking rules for fsfreeze indicate.

As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
when can get rid of the ugly while loop.

Jan Kara pointed out that with this approach we will leave the block devices
frozen, but this is a problem we have had since the introduction of the
superblock level API: if we thaw the filesystem using the superblock level API
(be it through the thaw ioctl or emergency thaw) the bdev level freeze
reference counter (bd_fsfreeze_count) will not be updated and even though
subsequent calls to thaw_bdev() will decrease it it will never get back to 0
(if thaw_super() returns an error, and it will when the superblock is unfrozen,
thaw_bdev() will return without decreasing the counter). The solution I propose
(and will be implementing in the followup patch "fsfreeze: freeze_super and
thaw_bdev don't play well together") is letting bd_fsfreeze_count
become zero when the superblock sitting on top of it is unfrozen, so that
future calls to freeze_bdev() actually try to freeze the superblock.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/drivers/tty/sysrq.c linux-3.8-rc1/drivers/tty/sysrq.c
--- linux-3.8-rc1-orig/drivers/tty/sysrq.c	2012-12-25 10:27:40.614737000 +0900
+++ linux-3.8-rc1/drivers/tty/sysrq.c	2012-12-25 11:40:06.128018000 +0900
@@ -363,7 +363,6 @@ static struct sysrq_key_op sysrq_moom_op
 	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
 };
 
-#ifdef CONFIG_BLOCK
 static void sysrq_handle_thaw(int key)
 {
 	emergency_thaw_all();
@@ -374,7 +373,6 @@ static struct sysrq_key_op sysrq_thaw_op
 	.action_msg	= "Emergency Thaw of all frozen filesystems",
 	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
 };
-#endif
 
 static void sysrq_handle_kill(int key)
 {
diff -urNp linux-3.8-rc1-orig/fs/buffer.c linux-3.8-rc1/fs/buffer.c
--- linux-3.8-rc1-orig/fs/buffer.c	2012-12-25 11:30:38.208018000 +0900
+++ linux-3.8-rc1/fs/buffer.c	2012-12-25 11:40:06.128018000 +0900
@@ -512,15 +512,33 @@ repeat:
 
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
-	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
-		printk(KERN_WARNING "Emergency Thaw on %s\n",
+	int res;
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
 		       bdevname(sb->s_bdev, b));
+	}
+
+	/*
+	 * We got here from __iterate_supers with the superblock lock taken
+	 * so we can call the lockless version of thaw_super() safely.
+	 */
+	res = __thaw_super(sb);
+	if (!res) {
+		deactivate_locked_super(sb);
+		/*
+		 * We have to re-acquire s_umount because
+		 * iterate_supers_write() will unlock it. It still holds
+		 * passive reference so sb cannot be freed under us.
+		 */
+		down_write(&sb->s_umount);
+	}
 }
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers_read(do_thaw_one, NULL);
+	iterate_supers_write(do_thaw_one, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:35:55.488018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:40:06.132018000 +0900
@@ -1881,6 +1881,7 @@ extern int vfs_ustat(dev_t, struct kstat
 extern int freeze_super(struct super_block *super);
 extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);
@@ -2053,7 +2054,6 @@ extern void iterate_bdevs(void (*)(struc
 extern int sync_blockdev(struct block_device *bdev);
 extern void kill_bdev(struct block_device *);
 extern struct super_block *freeze_bdev(struct block_device *);
-extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 5/17] xfs: switch to using super methods for fsfreeze
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (3 preceding siblings ...)
  2013-01-07 11:26 ` [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2013-01-07 11:27 ` Fernando Luis Vázquez Cao
  2013-01-07 11:29 ` [PATCH 6/17] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

xfs uses the bdev interfaces despite the fact that the kernel is operating
at the superblock level here. Convert xfs to use the superblock interfaces
instead.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/xfs/xfs_fsops.c linux-3.8-rc1/fs/xfs/xfs_fsops.c
--- linux-3.8-rc1-orig/fs/xfs/xfs_fsops.c	2012-12-25 10:27:42.074737000 +0900
+++ linux-3.8-rc1/fs/xfs/xfs_fsops.c	2012-12-25 11:44:11.632018000 +0900
@@ -728,16 +728,12 @@ xfs_fs_goingdown(
 	__uint32_t	inflags)
 {
 	switch (inflags) {
-	case XFS_FSOP_GOING_FLAGS_DEFAULT: {
-		struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
-		if (sb && !IS_ERR(sb)) {
+	case XFS_FSOP_GOING_FLAGS_DEFAULT:
+		if (!freeze_super(mp->m_super)) {
 			xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
-			thaw_bdev(sb->s_bdev, sb);
+			thaw_super(mp->m_super);
 		}
-
 		break;
-	}
 	case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
 		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
 		break;



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 6/17] fsfreeze: move emergency thaw code to fs/super.c
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (4 preceding siblings ...)
  2013-01-07 11:27 ` [PATCH 5/17] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
@ 2013-01-07 11:29 ` Fernando Luis Vázquez Cao
  2013-01-07 11:30 ` [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs Fernando Luis Vázquez Cao
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:29 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/buffer.c linux-3.8-rc1/fs/buffer.c
--- linux-3.8-rc1-orig/fs/buffer.c	2012-12-25 11:43:09.116018000 +0900
+++ linux-3.8-rc1/fs/buffer.c	2012-12-25 11:47:43.372018000 +0900
@@ -510,55 +510,6 @@ repeat:
 	return err;
 }
 
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
-	int res;
-
-	if (sb->s_bdev) {
-		char b[BDEVNAME_SIZE];
-		printk(KERN_WARNING "Emergency Thaw on %s.\n",
-		       bdevname(sb->s_bdev, b));
-	}
-
-	/*
-	 * We got here from __iterate_supers with the superblock lock taken
-	 * so we can call the lockless version of thaw_super() safely.
-	 */
-	res = __thaw_super(sb);
-	if (!res) {
-		deactivate_locked_super(sb);
-		/*
-		 * We have to re-acquire s_umount because
-		 * iterate_supers_write() will unlock it. It still holds
-		 * passive reference so sb cannot be freed under us.
-		 */
-		down_write(&sb->s_umount);
-	}
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
-	iterate_supers_write(do_thaw_one, NULL);
-	kfree(work);
-	printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
-	struct work_struct *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		INIT_WORK(work, do_thaw_all);
-		schedule_work(work);
-	}
-}
-
 /**
  * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
  * @mapping: the mapping which wants those buffers written
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 11:35:55.488018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 11:47:43.372018000 +0900
@@ -574,7 +574,7 @@ void iterate_supers_read(void (*f)(struc
  *	a superblock locked in _write_ mode and given argument. The lock
  *	is automatically relased after the function returns.
  */
-void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+static void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
 {
 	__iterate_supers(f, arg , true);
 }
@@ -1408,7 +1408,7 @@ EXPORT_SYMBOL(freeze_super);
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
  */
-int __thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb)
 {
 	int error = 0;
 
@@ -1455,3 +1455,53 @@ int thaw_super(struct super_block *sb)
 	return res;
 }
 EXPORT_SYMBOL(thaw_super);
+
+static void do_thaw_one(struct super_block *sb, void *unused)
+{
+	int res;
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
+		       bdevname(sb->s_bdev, b));
+	}
+
+	/*
+	 * We got here from __iterate_supers with the superblock lock taken
+	 * so we can call the lockless version of thaw_super() safely.
+	 */
+	res = __thaw_super(sb);
+	if (!res) {
+		deactivate_locked_super(sb);
+		/*
+		 * We have to re-acquire s_umount because
+		 * iterate_supers_write() will unlock it. It still holds
+		 * passive reference so sb cannot be freed under us.
+		 */
+		down_write(&sb->s_umount);
+	}
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+	iterate_supers_write(do_thaw_one, NULL);
+	kfree(work);
+	printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
+ */
+void emergency_thaw_all(void)
+{
+	struct work_struct *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(work, do_thaw_all);
+		schedule_work(work);
+	}
+}
+
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:43:09.116018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:47:43.376018000 +0900
@@ -1879,7 +1879,6 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
-extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);
@@ -2496,7 +2495,6 @@ extern struct super_block *get_super_tha
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
-extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (5 preceding siblings ...)
  2013-01-07 11:29 ` [PATCH 6/17] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
@ 2013-01-07 11:30 ` Fernando Luis Vázquez Cao
  2013-01-09 16:24   ` Jan Kara
  2013-01-07 11:32 ` [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen Fernando Luis Vázquez Cao
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

If a frozen bdev does not have a filesystem sitting on top of it any subsequent
nested freeze will cause a null pointer dereference because freeze_bdev()
calls drop_super() unconditionally. drop_super() should be called only when
there is a superblock to drop.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 11:38:05.072018000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 11:50:43.060018000 +0900
@@ -227,8 +227,8 @@ struct super_block *freeze_bdev(struct b
 		 * to freeze_bdev grab an active reference and only the last
 		 * thaw_bdev drops it.
 		 */
-		sb = get_super(bdev);
-		drop_super(sb);
+		if ((sb = get_super(bdev)) != NULL)
+			drop_super(sb);
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
 		return sb;
 	}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (6 preceding siblings ...)
  2013-01-07 11:30 ` [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs Fernando Luis Vázquez Cao
@ 2013-01-07 11:32 ` Fernando Luis Vázquez Cao
  2013-01-09 16:26   ` Jan Kara
  2013-01-07 11:34 ` [PATCH 9/17] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

This behavior is important in a scenario where a filesystem frozen using
freeze_bdev() is thawed through the superblock level API; if we
caused subsequent calls to thaw_bdev() future calls to freeze_bdev()
would fail to freeze the superblock.

We can get rid of this code away once bdev level fsfreeze and sb level
fsfreeze are properly integrated.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 11:52:04.344018000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:11:12.648018000 +0900
@@ -272,8 +272,19 @@ int thaw_bdev(struct block_device *bdev,
 	}
 
 	error = thaw_super(sb);
-	if (error)
+	/*
+	 * If the superblock is already unfrozen, i.e. thaw_super() returned
+	 * -EINVAL, we consider the block device level thaw successful. This
+	 * behavior is important in a scenario where a filesystem frozen using
+	 * freeze_bdev() is thawed through the superblock level API; if we
+	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
+	 * would not go back to 0 which means that future calls to freeze_bdev()
+	 * would not freeze the superblock, just increase the counter.
+	 */
+	if (error && error != -EINVAL)
 		bdev->bd_fsfreeze_count++;
+	else
+		error = 0;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 9/17] fsfreeze: freeze_super and thaw_bdev don't play well together
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (7 preceding siblings ...)
  2013-01-07 11:32 ` [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen Fernando Luis Vázquez Cao
@ 2013-01-07 11:34 ` Fernando Luis Vázquez Cao
  2013-01-07 11:35 ` [PATCH 10/17] fsfreeze: automatically thaw on umount Fernando Luis Vázquez Cao
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:34 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel, Konishi Ryusuke,
	Steven Whitehouse

thaw_bdev() has re-entrancy guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).

Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.

Move the re-entrancy guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again.

Caveat: Things work as expected as long as direct calls to
thaw_super are always in response to a previous sb level freeze. In
other words an unpaired call to thaw_super can still thaw a
filesystem frozen using freeze_bdev (this issue could be addressed
in a follow-up patch if deemed necessary).

This patch retains the bdev level mutex and counter to keep the
"feature" that we can freeze a block device that does not have a
filesystem mounted yet.

IMPORTANT NOTE: This patch restores the nesting feature of the fsfreeze
ioctl which was removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
("Introduce freeze_super and thaw_super for the fsfreeze ioctl"), thus
fixing the current ABI breakage.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Konishi Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 16:12:52.304018000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:20:49.840018000 +0900
@@ -204,16 +204,18 @@ int fsync_bdev(struct block_device *bdev
 EXPORT_SYMBOL(fsync_bdev);
 
 /**
- * freeze_bdev  --  lock a filesystem and force it into a consistent state
+ * freeze_bdev  -  lock a block device
  * @bdev:	blockdevice to lock
  *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
+ * Locks the block device and, if present, the associated filesystem too.
+ *
+ * The reference counter (bd_fsfreeze_count) is used to implement the feature
+ * that allows one to freeze a block device that does not have a filesystem
+ * mounted yet. For filesystems using mount_bdev the kernel takes care of
+ * things by preventing the mount operation from succeeding if the underlying
+ * block device is frozen. Other filesystems should check this counter or risk
+ * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race,
+ * which may lead to inconsistencies.
  */
 struct super_block *freeze_bdev(struct block_device *bdev)
 {
@@ -244,19 +246,19 @@ struct super_block *freeze_bdev(struct b
 		return ERR_PTR(error);
 	}
 	deactivate_super(sb);
- out:
+out:
 	sync_blockdev(bdev);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return sb;	/* thaw_bdev releases s->s_umount */
+	return sb;
 }
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * thaw_bdev  -- unlock filesystem
+ * thaw_bdev  - unlock a block device
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
  *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ * Unlocks the block device and, if present, the associated filesystem too.
  */
 int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
diff -urNp linux-3.8-rc1-orig/fs/gfs2/ops_fstype.c linux-3.8-rc1/fs/gfs2/ops_fstype.c
--- linux-3.8-rc1-orig/fs/gfs2/ops_fstype.c	2012-12-25 10:27:41.294737000 +0900
+++ linux-3.8-rc1/fs/gfs2/ops_fstype.c	2012-12-25 16:20:49.840018000 +0900
@@ -1300,9 +1300,13 @@ static struct dentry *gfs2_mount(struct
 		return ERR_CAST(bdev);
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
diff -urNp linux-3.8-rc1-orig/fs/nilfs2/super.c linux-3.8-rc1/fs/nilfs2/super.c
--- linux-3.8-rc1-orig/fs/nilfs2/super.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/fs/nilfs2/super.c	2012-12-25 16:20:49.840018000 +0900
@@ -1274,9 +1274,13 @@ nilfs_mount(struct file_system_type *fs_
 	}
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&sd.bdev->bd_fsfreeze_mutex);
 	if (sd.bdev->bd_fsfreeze_count > 0) {
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 11:49:56.172018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:20:49.844018000 +0900
@@ -1004,9 +1004,13 @@ struct dentry *mount_bdev(struct file_sy
 		return ERR_CAST(bdev);
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
@@ -1301,8 +1305,11 @@ static void sb_wait_write(struct super_b
  * @sb: the super to lock
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and counts down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
  *
  * During this function, sb->s_writers.frozen goes through these values:
  *
@@ -1327,29 +1334,31 @@ static void sb_wait_write(struct super_b
  * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
  * mostly auxiliary for filesystems to verify they do not modify frozen fs.
  *
- * sb->s_writers.frozen is protected by sb->s_umount.
+ * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
  */
 int freeze_super(struct super_block *sb)
 {
-	int ret;
+	int ret = 0;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
-	if (sb->s_writers.frozen != SB_UNFROZEN) {
-		deactivate_locked_super(sb);
-		return -EBUSY;
-	}
 
-	if (!(sb->s_flags & MS_BORN)) {
-		up_write(&sb->s_umount);
-		return 0;	/* sic - it's "nothing to do" */
-	}
+	if (++sb->s_freeze_count > 1)
+		goto out_deactivate;
+
+	/*
+	 * If MS_BORN is not set it means we have a failing mount (this is
+	 * possible if we got here from freeze_bdev()). Keep an active
+	 * reference so that the superblock is not killed until it is thawed
+	 * via thaw_bdev().
+	 */
+	if (!(sb->s_flags & MS_BORN))
+		goto out_unlock;
 
 	if (sb->s_flags & MS_RDONLY) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-		up_write(&sb->s_umount);
-		return 0;
+		goto out_unlock;
 	}
 
 	/* From now on, no new normal writers can start */
@@ -1381,11 +1390,11 @@ int freeze_super(struct super_block *sb)
 		if (ret) {
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
+			sb->s_freeze_count--;
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
 			wake_up(&sb->s_writers.wait_unfrozen);
-			deactivate_locked_super(sb);
-			return ret;
+			goto out_deactivate;
 		}
 	}
 	/*
@@ -1393,8 +1402,12 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+out_unlock:
 	up_write(&sb->s_umount);
-	return 0;
+	return ret;
+out_deactivate:
+	deactivate_locked_super(sb);
+	return ret;
 }
 EXPORT_SYMBOL(freeze_super);
 
@@ -1404,6 +1417,10 @@ EXPORT_SYMBOL(freeze_super);
  *
  * Unlocks the filesystem and marks it writeable again.
  *
+ * Returns -EINVAL if @sb is not frozen, the number of nested freezes remaining
+ * after this thaw if it succeeded or the corresponding error code otherwise.
+ * If the unfreeze fails, @sb is left in the frozen state.
+ *
  * This is the unlocked version of thaw_super, so it is the caller's job to
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
@@ -1412,11 +1429,16 @@ static int __thaw_super(struct super_blo
 {
 	int error = 0;
 
-	if (sb->s_writers.frozen == SB_UNFROZEN) {
+	if (!sb->s_freeze_count) {
 		error = -EINVAL;
 		goto out;
 	}
 
+	if (--sb->s_freeze_count > 0) {
+		error = sb->s_freeze_count;
+		goto out;
+	}
+
 	if (sb->s_flags & MS_RDONLY)
 		goto out_thaw;
 
@@ -1425,6 +1447,7 @@ static int __thaw_super(struct super_blo
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
+			sb->s_freeze_count++;
 			goto out;
 		}
 	}
@@ -1448,11 +1471,11 @@ int thaw_super(struct super_block *sb)
 	int res;
 	down_write(&sb->s_umount);
 	res = __thaw_super(sb);
-	if (!res)
+	if (!res) /* Active reference released after last thaw. */
 		deactivate_locked_super(sb);
 	else
 		up_write(&sb->s_umount);
-	return res;
+	return res > 0 ? 0 : res;
 }
 EXPORT_SYMBOL(thaw_super);
 
@@ -1460,6 +1483,9 @@ static void do_thaw_one(struct super_blo
 {
 	int res;
 
+	if (sb->s_writers.frozen == SB_UNFROZEN)
+		return;
+
 	if (sb->s_bdev) {
 		char b[BDEVNAME_SIZE];
 		printk(KERN_WARNING "Emergency Thaw on %s.\n",
@@ -1470,6 +1496,7 @@ static void do_thaw_one(struct super_blo
 	 * We got here from __iterate_supers with the superblock lock taken
 	 * so we can call the lockless version of thaw_super() safely.
 	 */
+	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
 	res = __thaw_super(sb);
 	if (!res) {
 		deactivate_locked_super(sb);
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:49:56.172018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:20:49.844018000 +0900
@@ -1320,6 +1320,9 @@ struct super_block {
 
 	/* Being remounted read-only */
 	int s_readonly_remount;
+
+	/* Number of nested freezes */
+	int s_freeze_count;
 };
 
 /* superblock cache pruning functions */



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 10/17] fsfreeze: automatically thaw on umount
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (8 preceding siblings ...)
  2013-01-07 11:34 ` [PATCH 9/17] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2013-01-07 11:35 ` Fernando Luis Vázquez Cao
  2013-01-09 17:20   ` Jan Kara
  2013-01-07 11:36 ` [PATCH 11/17] fsfreeze: add thaw_super_force Fernando Luis Vázquez Cao
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

The fsfreeze userspace API is a sb level API, which means that to thaw a frozen
filesystem we need to have access to it. This means that after an unmount the
filesystem cannot be thawed, because it is not part of the filesystem
namespace anymore.

A possible solution to the problem above is to mount the filesystem again and
execute the thaw operation. However, a subsequent mount is not guaranteed to
succeed and even if it does succeeded it may generate I/O in the process,
which is not supposed to happen since the filesystem is frozen.

Another approach is adding a bdev level API through which the unmounted
filesystem can be reached. The problem with this is that it only works for
filesystems for block devices. We could also return EBUSY when the user tries
to ummount an frozen filesystem, but this wold break lazy unmount.

Automatically freezing on sys_umount fixes all the problems mentioned above.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/namespace.c linux-3.8-rc1/fs/namespace.c
--- linux-3.8-rc1-orig/fs/namespace.c	2012-12-25 10:27:41.322737000 +0900
+++ linux-3.8-rc1/fs/namespace.c	2012-12-25 16:23:56.904018000 +0900
@@ -1091,6 +1091,27 @@ int may_umount(struct vfsmount *mnt)
 
 EXPORT_SYMBOL(may_umount);
 
+static void thaw_mount(struct mount *mnt)
+{
+	struct super_block *sb = mnt->mnt.mnt_sb;
+
+	down_write(&sb->s_umount);
+	if (sb->s_writers.frozen == SB_UNFROZEN) {
+		up_write(&sb->s_umount);
+		return;
+	}
+	/*
+	 * The superblock may be in the process of being detached from the
+	 * namespace which means we have to make sure the thaw of the superblock
+	 * succeeds (once it has been detached the fsfreeze ioctls become
+	 * unusable). Thus, Force-thaw sb so that all tasks in fsfreeze wait
+	 * queue are woken up.
+	 */
+	sb->s_freeze_count = 1;
+	__thaw_super(sb, true);
+	deactivate_locked_super(sb);
+}
+
 void release_mounts(struct list_head *head)
 {
 	struct mount *mnt;
@@ -1111,6 +1132,7 @@ void release_mounts(struct list_head *he
 			dput(dentry);
 			mntput(&m->mnt);
 		}
+		thaw_mount(mnt);
 		mntput(&mnt->mnt);
 	}
 }
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:22:48.272018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:23:56.904018000 +0900
@@ -1414,6 +1414,7 @@ EXPORT_SYMBOL(freeze_super);
 /**
  * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
+ * @force: whether or not to force the thaw (read details below before using)
  *
  * Unlocks the filesystem and marks it writeable again.
  *
@@ -1421,11 +1422,17 @@ EXPORT_SYMBOL(freeze_super);
  * after this thaw if it succeeded or the corresponding error code otherwise.
  * If the unfreeze fails, @sb is left in the frozen state.
  *
+ * If the force flag is set the kernel will proceeed with the thaw even if the
+ * call to the filesystem specific thaw function (->unfreeze_fs()) fails. This
+ * feature should be used only in situations where there is no entity we can
+ * return an error to so that it has a chance to clean things up and retry, i.e.
+ * this is the last oportunity to wake the tasks in the fsfreeze wait queue up.
+ *
  * This is the unlocked version of thaw_super, so it is the caller's job to
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
  */
-static int __thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb, bool force)
 {
 	int error = 0;
 
@@ -1442,11 +1449,9 @@ static int __thaw_super(struct super_blo
 	if (sb->s_flags & MS_RDONLY)
 		goto out_thaw;
 
-	if (sb->s_op->unfreeze_fs) {
-		error = sb->s_op->unfreeze_fs(sb);
-		if (error) {
-			printk(KERN_ERR
-				"VFS:Filesystem thaw failed\n");
+	if (sb->s_op->unfreeze_fs && (error = sb->s_op->unfreeze_fs(sb))) {
+		printk(KERN_ERR "VFS: Filesystem thaw failed\n");
+		if (!force) {
 			sb->s_freeze_count++;
 			goto out;
 		}
@@ -1470,7 +1475,7 @@ int thaw_super(struct super_block *sb)
 {
 	int res;
 	down_write(&sb->s_umount);
-	res = __thaw_super(sb);
+	res = __thaw_super(sb, false);
 	if (!res) /* Active reference released after last thaw. */
 		deactivate_locked_super(sb);
 	else
@@ -1497,7 +1502,7 @@ static void do_thaw_one(struct super_blo
 	 * so we can call the lockless version of thaw_super() safely.
 	 */
 	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
-	res = __thaw_super(sb);
+	res = __thaw_super(sb, false);
 	if (!res) {
 		deactivate_locked_super(sb);
 		/*
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:22:48.272018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:23:56.904018000 +0900
@@ -1882,6 +1882,7 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super, bool force);
 extern int thaw_super(struct super_block *super);
 extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 11/17] fsfreeze: add thaw_super_force
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (9 preceding siblings ...)
  2013-01-07 11:35 ` [PATCH 10/17] fsfreeze: automatically thaw on umount Fernando Luis Vázquez Cao
@ 2013-01-07 11:36 ` Fernando Luis Vázquez Cao
  2013-01-07 11:38 ` [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration Fernando Luis Vázquez Cao
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

There are cases where we need the thaw operation to succeed not matter what:
the umount time thaw and emergency thaw. The former is particularly problematic
because after the umount the filesystem may not accessible anymore, which
means that any processes in the fsfreeze wait queue would be stuck in an
uninterruptible wait forever.

We add thaw_super_force() to the fsfreeze in-kernel API to be used in
situations like the ones mentioned above. For code reuse purposes
raw_thaw_super() is introduced too and all the thaw functions restructured
around it.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/namespace.c linux-3.8-rc1/fs/namespace.c
--- linux-3.8-rc1-orig/fs/namespace.c	2012-12-25 16:26:40.328018000 +0900
+++ linux-3.8-rc1/fs/namespace.c	2012-12-25 16:27:19.792018000 +0900
@@ -1095,21 +1095,16 @@ static void thaw_mount(struct mount *mnt
 {
 	struct super_block *sb = mnt->mnt.mnt_sb;
 
+	/* thaw_super_force() has to be called with superblock lock taken. */
 	down_write(&sb->s_umount);
-	if (sb->s_writers.frozen == SB_UNFROZEN) {
-		up_write(&sb->s_umount);
-		return;
-	}
 	/*
-	 * The superblock may be in the process of being detached from the
-	 * namespace which means we have to make sure the thaw of the superblock
-	 * succeeds (once it has been detached the fsfreeze ioctls become
-	 * unusable). Thus, Force-thaw sb so that all tasks in fsfreeze wait
-	 * queue are woken up.
+	 * The superblock may be in the process of being detached from
+	 * the namespace which means we have to make sure the thaw of the
+	 * superblock succeeds (once it has been detached the fsfreeze
+	 * ioctls become unusable). Thus, force-thaw sb so that all tasks
+	 * in fsfreeze wait queue are woken up.
 	 */
-	sb->s_freeze_count = 1;
-	__thaw_super(sb, true);
-	deactivate_locked_super(sb);
+	thaw_super_force(sb); /* Drops superblock lock. */
 }
 
 void release_mounts(struct list_head *head)
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:26:40.332018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:27:19.792018000 +0900
@@ -1412,51 +1412,34 @@ out_deactivate:
 EXPORT_SYMBOL(freeze_super);
 
 /**
- * __thaw_super -- unlock filesystem
+ * raw_thaw_super - unlock filesystem
  * @sb: the super to thaw
  * @force: whether or not to force the thaw (read details below before using)
  *
  * Unlocks the filesystem and marks it writeable again.
  *
- * Returns -EINVAL if @sb is not frozen, the number of nested freezes remaining
- * after this thaw if it succeeded or the corresponding error code otherwise.
- * If the unfreeze fails, @sb is left in the frozen state.
- *
  * If the force flag is set the kernel will proceeed with the thaw even if the
  * call to the filesystem specific thaw function (->unfreeze_fs()) fails. This
  * feature should be used only in situations where there is no entity we can
  * return an error to so that it has a chance to clean things up and retry, i.e.
  * this is the last oportunity to wake the tasks in the fsfreeze wait queue up.
  *
- * This is the unlocked version of thaw_super, so it is the caller's job to
- * to protect the superblock by grabbing the s_umount semaphore in write mode
- * and release it again on return. See thaw_super() for an example.
+ * This is the raw version of thaw_super and comes without locking, so it is
+ * the caller's job to to protect the superblock by grabbing the s_umount
+ * semaphore in write mode and to release it again on return.
  */
-int __thaw_super(struct super_block *sb, bool force)
+static int raw_thaw_super(struct super_block *sb, bool force)
 {
 	int error = 0;
 
-	if (!sb->s_freeze_count) {
-		error = -EINVAL;
-		goto out;
-	}
-
-	if (--sb->s_freeze_count > 0) {
-		error = sb->s_freeze_count;
-		goto out;
-	}
-
 	if (sb->s_flags & MS_RDONLY)
 		goto out_thaw;
 
 	if (sb->s_op->unfreeze_fs && (error = sb->s_op->unfreeze_fs(sb))) {
 		printk(KERN_ERR "VFS: Filesystem thaw failed\n");
-		if (!force) {
-			sb->s_freeze_count++;
+		if (!force)
 			goto out;
-		}
 	}
-
 out_thaw:
 	sb->s_writers.frozen = SB_UNFROZEN;
 	smp_wmb();
@@ -1466,52 +1449,91 @@ out:
 }
 
 /**
- * thaw_super -- unlock filesystem
+ * thaw_super - unlock filesystem
  * @sb: the super to thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
+ *
+ * Returns -EINVAL if @sb is not frozen, 0 if it succeeded or the corresponding
+ * error code otherwise. If the unfreeze fails, @sb is left in the frozen state.
  */
 int thaw_super(struct super_block *sb)
 {
-	int res;
+	int error = 0;
+
 	down_write(&sb->s_umount);
-	res = __thaw_super(sb, false);
-	if (!res) /* Active reference released after last thaw. */
-		deactivate_locked_super(sb);
-	else
-		up_write(&sb->s_umount);
-	return res > 0 ? 0 : res;
+
+	if (!sb->s_freeze_count) {
+		error = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (--sb->s_freeze_count > 0)
+		goto out_unlock;
+
+	error = raw_thaw_super(sb, false);
+
+	if (error) {
+		sb->s_freeze_count++;
+		goto out_unlock;
+	}
+
+	/* Active reference released after last thaw. */
+	deactivate_locked_super(sb);
+	return error;
+
+out_unlock:
+	up_write(&sb->s_umount);
+	return error;
 }
 EXPORT_SYMBOL(thaw_super);
 
+/**
+ * thaw_super_force - unlock filesystem by force
+ * @sb: the super to force unlock
+ *
+ * Thaws the filesystem by force and releases the active reference taken at
+ * freeze time. Returns -EINVAL if the filesystem was not frozen or 0
+ * otherwise.
+ *
+ * thaw_super_force is called with ->s_umount lock held and drops it.
+ */
+int thaw_super_force(struct super_block *sb)
+{
+	if (sb->s_writers.frozen == SB_UNFROZEN) {
+		/* Superblock was not frozen so release lock and bail out. */
+		up_write(&sb->s_umount);
+		return -EINVAL;
+	}
+	sb->s_freeze_count = 0;
+	raw_thaw_super(sb, true);
+	/* Active reference released after last thaw. */
+	deactivate_locked_super(sb);
+	return 0;
+}
+
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
-	int res;
+	int error;
 
-	if (sb->s_writers.frozen == SB_UNFROZEN)
-		return;
+	/*
+	 * We got here from __iterate_supers with the superblock lock taken
+	 * so we can call thaw_super_force() safely.
+	 */
+	error = thaw_super_force(sb);
 
-	if (sb->s_bdev) {
+	if (error != -EINVAL && sb->s_bdev) {
 		char b[BDEVNAME_SIZE];
 		printk(KERN_WARNING "Emergency Thaw on %s.\n",
 		       bdevname(sb->s_bdev, b));
 	}
 
 	/*
-	 * We got here from __iterate_supers with the superblock lock taken
-	 * so we can call the lockless version of thaw_super() safely.
+	 * We have to re-acquire s_umount because iterate_supers_write() will
+	 * unlock it. It still holds passive reference so sb cannot be freed
+	 * under us.
 	 */
-	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
-	res = __thaw_super(sb, false);
-	if (!res) {
-		deactivate_locked_super(sb);
-		/*
-		 * We have to re-acquire s_umount because
-		 * iterate_supers_write() will unlock it. It still holds
-		 * passive reference so sb cannot be freed under us.
-		 */
-		down_write(&sb->s_umount);
-	}
+	down_write(&sb->s_umount);
 }
 
 static void do_thaw_all(struct work_struct *work)
@@ -1522,7 +1544,7 @@ static void do_thaw_all(struct work_stru
 }
 
 /**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ * emergency_thaw_all - forcibly thaw every frozen filesystem
  *
  * Used for emergency unfreeze of all filesystems via SysRq
  */
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:26:40.332018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:27:19.792018000 +0900
@@ -1882,8 +1882,8 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
-extern int __thaw_super(struct super_block *super, bool force);
 extern int thaw_super(struct super_block *super);
+extern int thaw_super_force(struct super_block *super);
 extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);
 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (10 preceding siblings ...)
  2013-01-07 11:36 ` [PATCH 11/17] fsfreeze: add thaw_super_force Fernando Luis Vázquez Cao
@ 2013-01-07 11:38 ` Fernando Luis Vázquez Cao
  2013-01-09 16:37   ` Jan Kara
  2013-01-07 11:39 ` [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw Fernando Luis Vázquez Cao
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

As things stand now a filesystem frozen through the in-kernel bdev level API
can be thawed using the userspace sb level API, which can lead to accidental
corruption of filesystem snapshots and backups.

To address this problem we modify the in-kernel API so that we can tell
fsfreeze that a kernel initiated freeze is in progress and that the filesystem
should not be thawed no matter how many times the FITHAW ioctl is invoked.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 16:22:48.268018000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:32:09.712018000 +0900
@@ -238,7 +238,7 @@ struct super_block *freeze_bdev(struct b
 	sb = get_active_super(bdev);
 	if (!sb)
 		goto out;
-	error = freeze_super(sb);
+	error = __freeze_super(sb, true);
 	if (error) {
 		deactivate_super(sb);
 		bdev->bd_fsfreeze_count--;
@@ -265,6 +265,7 @@ int thaw_bdev(struct block_device *bdev,
 	int error = -EINVAL;
 
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
+
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
@@ -273,20 +274,10 @@ int thaw_bdev(struct block_device *bdev,
 		goto out;
 	}
 
-	error = thaw_super(sb);
-	/*
-	 * If the superblock is already unfrozen, i.e. thaw_super() returned
-	 * -EINVAL, we consider the block device level thaw successful. This
-	 * behavior is important in a scenario where a filesystem frozen using
-	 * freeze_bdev() is thawed through the superblock level API; if we
-	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
-	 * would not go back to 0 which means that future calls to freeze_bdev()
-	 * would not freeze the superblock, just increase the counter.
-	 */
-	if (error && error != -EINVAL)
+	error = __thaw_super(sb, true);
+
+	if (error)
 		bdev->bd_fsfreeze_count++;
-	else
-		error = 0;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
diff -urNp linux-3.8-rc1-orig/fs/namespace.c linux-3.8-rc1/fs/namespace.c
--- linux-3.8-rc1-orig/fs/namespace.c	2012-12-25 16:31:10.780018000 +0900
+++ linux-3.8-rc1/fs/namespace.c	2012-12-25 16:32:09.712018000 +0900
@@ -1103,6 +1103,11 @@ static void thaw_mount(struct mount *mnt
 	 * superblock succeeds (once it has been detached the fsfreeze
 	 * ioctls become unusable). Thus, force-thaw sb so that all tasks
 	 * in fsfreeze wait queue are woken up.
+	 *
+	 * thaw_super_force() does not actually thaw the sb if the freeze
+	 * counter was locked (i.e. was frozen through the block device
+	 * level API). In such a case the freeze counter is set to one
+	 * thus guaranteeing that the sb will get thawed unlock time.
 	 */
 	thaw_super_force(sb); /* Drops superblock lock. */
 }
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:31:10.780018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:32:09.712018000 +0900
@@ -1301,15 +1301,20 @@ static void sb_wait_write(struct super_b
 }
 
 /**
- * freeze_super - lock the filesystem and force it into a consistent state
+ * __freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
+ * @lock: should we lock the freeze counter?
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
- * last unfreeze process can unfreeze the frozen filesystem actually when
- * multiple freeze requests arrive simultaneously. It counts up in
- * freeze_super() and counts down in thaw_super(). When it becomes 0,
- * thaw_super() will execute the unfreeze.
+ * freeze_fs. Freezes can nest which has two implications: the filesystem level
+ * freeze occurs during the first nested freeze, the actual filesystem thaw
+ * occurs only when the last thaw operation brings the freeze counter down to
+ * zero.
+ *
+ * If @lock is true the freeze counter is increased after a successful freeze
+ * but it cannot go back to zero (and the filesystem get actually thawed) until
+ * the the counter is unlocked using this function's thaw counterpart. The
+ * freeze counter lock does not nest.
  *
  * During this function, sb->s_writers.frozen goes through these values:
  *
@@ -1334,15 +1339,24 @@ static void sb_wait_write(struct super_b
  * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
  * mostly auxiliary for filesystems to verify they do not modify frozen fs.
  *
- * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
+ * sb->s_writers.frozen, sb->s_freeze_count and sb->s_freeze_locked are
+ * protected by sb->s_umount.
  */
-int freeze_super(struct super_block *sb)
+int __freeze_super(struct super_block *sb, bool lock)
 {
 	int ret = 0;
+	bool locked_old = sb->s_freeze_locked;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
 
+	/* The freeze counter lock does not nest. */
+	if (sb->s_freeze_locked && lock) {
+		ret = -EBUSY;
+		goto out_deactivate;
+	}
+
+	sb->s_freeze_locked = lock ? true : sb->s_freeze_locked;
 	if (++sb->s_freeze_count > 1)
 		goto out_deactivate;
 
@@ -1390,6 +1404,7 @@ int freeze_super(struct super_block *sb)
 		if (ret) {
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
+			sb->s_freeze_locked = locked_old;
 			sb->s_freeze_count--;
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
@@ -1397,11 +1412,13 @@ int freeze_super(struct super_block *sb)
 			goto out_deactivate;
 		}
 	}
+
 	/*
 	 * This is just for debugging purposes so that fs can warn if it
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+
 out_unlock:
 	up_write(&sb->s_umount);
 	return ret;
@@ -1409,6 +1426,18 @@ out_deactivate:
 	deactivate_locked_super(sb);
 	return ret;
 }
+
+/**
+ * freeze_super - lock the filesystem and force it into a consistent state
+ * @sb: the super to lock
+ *
+ * This is a wrapper around __freeze_super() which does the actual work of
+ * freezing the filesystem. fsfreeze counter lock is not requested.
+ */
+int freeze_super(struct super_block *sb)
+{
+	return __freeze_super(sb, false);
+}
 EXPORT_SYMBOL(freeze_super);
 
 /**
@@ -1449,34 +1478,56 @@ out:
 }
 
 /**
- * thaw_super - unlock filesystem
+ * __thaw_super - unlock filesystem
  * @sb: the super to thaw
+ * @unlock: should we unlock the freeze counter?
  *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Tries to decrease the freeze counter and when it reaches zero unlocks the
+ * filesystem and marks it writeable again. If the counter is locked it cannot
+ * go back to zero (and thus trigger the actual filesystem thaw) unless @unlock
+ * is true.
  *
  * Returns -EINVAL if @sb is not frozen, 0 if it succeeded or the corresponding
  * error code otherwise. If the unfreeze fails, @sb is left in the frozen state.
  */
-int thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb, bool unlock)
 {
 	int error = 0;
 
 	down_write(&sb->s_umount);
 
-	if (!sb->s_freeze_count) {
+	/*
+	 * An unfrozen filesystem cannot be thawed. Similarly, an unlocked
+	 * freeze counter cannot be unlocked.
+	 */
+	if (!sb->s_freeze_count || (!sb->s_freeze_locked && unlock)) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
 
-	if (--sb->s_freeze_count > 0)
+	/*
+	 * Freezes nest so only the last call (freeze counter down to one) can
+	 * trigger the actual filesystem thaw.
+	 */
+	if (sb->s_freeze_count > 1) {
+		sb->s_freeze_count--;
+		sb->s_freeze_locked = unlock ? false: sb->s_freeze_locked;
+		goto out_unlock;
+	}
+	/* A locked filesystem cannot be thawed unless unlock was requested. */
+	else if (sb->s_freeze_locked && !unlock) {
+		error = -EINVAL;
 		goto out_unlock;
+	}
 
 	error = raw_thaw_super(sb, false);
 
-	if (error) {
-		sb->s_freeze_count++;
-		goto out_unlock;
+	if (!error) {
+		sb->s_freeze_count = 0;
+		sb->s_freeze_locked = false;
 	}
+	else
+		goto out_unlock;
 
 	/* Active reference released after last thaw. */
 	deactivate_locked_super(sb);
@@ -1486,6 +1537,19 @@ out_unlock:
 	up_write(&sb->s_umount);
 	return error;
 }
+
+/**
+ * thaw_super - unlock filesystem
+ * @sb: the super to unlock
+ *
+ * This is a wrapper around __thaw_super() which does the actual work of
+ * thawing the filesystem. Release of the fsfreeze counter lock is not
+ * requested.
+ */
+int thaw_super(struct super_block *sb)
+{
+	return __thaw_super(sb, false);
+}
 EXPORT_SYMBOL(thaw_super);
 
 /**
@@ -1505,10 +1569,19 @@ int thaw_super_force(struct super_block
 		up_write(&sb->s_umount);
 		return -EINVAL;
 	}
+
+	if (sb->s_freeze_locked) {
+		/* Ensure superblock gets thawed at unlock time */
+		sb->s_freeze_count = 1;
+		up_write(&sb->s_umount);
+		return -EINVAL;
+	}
+
 	sb->s_freeze_count = 0;
 	raw_thaw_super(sb, true);
 	/* Active reference released after last thaw. */
 	deactivate_locked_super(sb);
+
 	return 0;
 }
 
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:31:10.784018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:32:09.712018000 +0900
@@ -1323,6 +1323,9 @@ struct super_block {
 
 	/* Number of nested freezes */
 	int s_freeze_count;
+
+	/* Is freeze state locked? */
+	bool s_freeze_locked;
 };
 
 /* superblock cache pruning functions */
@@ -1881,7 +1884,9 @@ extern int vfs_statfs(struct path *, str
 extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
+extern int __freeze_super(struct super_block *sb, bool lock);
 extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *sb, bool unlock);
 extern int thaw_super(struct super_block *super);
 extern int thaw_super_force(struct super_block *super);
 extern void emergency_thaw_all(void);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (11 preceding siblings ...)
  2013-01-07 11:38 ` [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration Fernando Luis Vázquez Cao
@ 2013-01-07 11:39 ` Fernando Luis Vázquez Cao
  2013-01-09 16:41   ` Jan Kara
  2013-01-07 11:41 ` [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super Fernando Luis Vázquez Cao
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:39 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

Now that sb-level/bdev-level fsfreeze integration has been achieved and
sb-lebel/bdev-level freeze counters are always kept consistent, we need to
unfreeze bdevs in addition to filesystems during emergency thaw so that
we can recover from a situation where someone forgot to call thaw_bdev().

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
--- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 16:33:38.664018000 +0900
+++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:34:01.900018000 +0900
@@ -254,18 +254,18 @@ out:
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * thaw_bdev  - unlock a block device
+ * __thaw_bdev  -  unlock a block device
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
  *
  * Unlocks the block device and, if present, the associated filesystem too.
+ * This is the unlocked version of thaw_bdev and it has to be called with
+ * ->bd_fsfreeze_mutex mutex taken.
  */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+static int __thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	int error = -EINVAL;
 
-	mutex_lock(&bdev->bd_fsfreeze_mutex);
-
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
@@ -279,11 +279,49 @@ int thaw_bdev(struct block_device *bdev,
 	if (error)
 		bdev->bd_fsfreeze_count++;
 out:
+	return error;
+}
+
+/**
+ * thaw_bdev  -- unlock a block device
+ * @bdev:      blockdevice to unlock
+ * @sb:                associated superblock
+ *
+ * Unlocks the block device and, if present, the associated filesystem too.
+ */
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+{
+	int error;
+	mutex_lock(&bdev->bd_fsfreeze_mutex);
+	error = __thaw_bdev(bdev, sb);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+void do_thaw_one_bdev(struct block_device *bdev, void *arg)
+{
+	struct super_block *sb;
+
+	mutex_lock(&bdev->bd_fsfreeze_mutex);
+
+	if (!bdev->bd_fsfreeze_count) {
+		mutex_unlock(&bdev->bd_fsfreeze_mutex);
+		return;
+	}
+
+	/*
+	 * We know the block device is frozen so we do not need to grab a
+	 * reference - the first call to thaw_bdev() dit it.
+	 */
+	if ((sb = get_super(bdev)) != NULL)
+		drop_super(sb);
+
+	while (!__thaw_bdev(bdev, sb));
+
+	mutex_unlock(&bdev->bd_fsfreeze_mutex);
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:33:38.664018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:34:01.900018000 +0900
@@ -1585,7 +1585,7 @@ int thaw_super_force(struct super_block
 	return 0;
 }
 
-static void do_thaw_one(struct super_block *sb, void *unused)
+static void do_thaw_one_sb(struct super_block *sb, void *unused)
 {
 	int error;
 
@@ -1611,7 +1611,8 @@ static void do_thaw_one(struct super_blo
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers_write(do_thaw_one, NULL);
+	iterate_bdevs(do_thaw_one_bdev, NULL);
+	iterate_supers_write(do_thaw_one_sb, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:33:38.664018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:34:01.904018000 +0900
@@ -2063,6 +2063,7 @@ extern int sync_blockdev(struct block_de
 extern void kill_bdev(struct block_device *);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern void do_thaw_one_bdev(struct block_device *bdev, void *arg);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -2080,6 +2081,10 @@ static inline int thaw_bdev(struct block
 	return 0;
 }
 
+static inline void do_thaw_one_bdev(struct block_device *bdev, void *arg)
+{
+}
+
 static inline void iterate_bdevs(void (*f)(struct block_device *, void *), void *arg)
 {
 }



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (12 preceding siblings ...)
  2013-01-07 11:39 ` [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw Fernando Luis Vázquez Cao
@ 2013-01-07 11:41 ` Fernando Luis Vázquez Cao
  2013-01-09 16:44   ` Jan Kara
  2013-01-07 11:42 ` [PATCH 15/17] btrfs: store pointer to superblock in bd_super Fernando Luis Vázquez Cao
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel, Chris Mason

Using get_active_super will not work properly if the fs (like btrfs) does not
save its s_bdev with the device it is on.  Also it does not provide the entire
picture, since an filesystem can be contained on multiple disks (again like
btrfs).

Fortunately we now have a bd_super field in struct block_device were a pointer
to the superblock sitting on top of the block device can be stored.

Filesystems using the vfs helper mount_bdev (the ext filesystems, xfs, etc) and
gfs2 already do this (for the former it is a freebie), so for these there is
no need to iterate through the list of superblocks; we can get the superblock
directly from ->bd_super which is more efficient and what this patch
implements.

A multi-device filesystem (once again lile btrfs) can use that field to store
a pointer to the superblock for each block device in its storage pool. By
doing so it would get_active_super and, by extension, thaw_bdev initiated
freezes working.

Thanks go to Josef Bacik and Christoph Hellwig for initiating this effort
to fix btrfs and for suggesting the solution implemented here, respectively.

Cc: linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:35:43.100018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:36:22.312018000 +0900
@@ -632,26 +632,29 @@ struct super_block *get_super(struct blo
 		return NULL;
 
 	spin_lock(&sb_lock);
+	if ((sb = bdev->bd_super) != NULL)
+		goto out_found;
 rescan:
 	list_for_each_entry(sb, &super_blocks, s_list) {
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
-		if (sb->s_bdev == bdev) {
-			sb->s_count++;
-			spin_unlock(&sb_lock);
-			down_read(&sb->s_umount);
-			/* still alive? */
-			if (sb->s_root && (sb->s_flags & MS_BORN))
-				return sb;
-			up_read(&sb->s_umount);
-			/* nope, got unmounted */
-			spin_lock(&sb_lock);
-			__put_super(sb);
-			goto rescan;
-		}
+		if (sb->s_bdev == bdev)
+			goto out_found;
 	}
 	spin_unlock(&sb_lock);
 	return NULL;
+out_found:
+	sb->s_count++;
+	spin_unlock(&sb_lock);
+	down_read(&sb->s_umount);
+	/* still alive? */
+	if (sb->s_root && (sb->s_flags & MS_BORN))
+		return sb;
+	up_read(&sb->s_umount);
+	/* nope, got unmounted */
+	spin_lock(&sb_lock);
+	__put_super(sb);
+	goto rescan;
 }
 
 EXPORT_SYMBOL(get_super);
@@ -696,18 +699,22 @@ struct super_block *get_active_super(str
 
 restart:
 	spin_lock(&sb_lock);
+	if ((sb = bdev->bd_super) != NULL)
+		goto out_grabsuper;
 	list_for_each_entry(sb, &super_blocks, s_list) {
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
-		if (sb->s_bdev == bdev) {
-			if (grab_super(sb)) /* drops sb_lock */
-				return sb;
-			else
-				goto restart;
-		}
+		if (sb->s_bdev == bdev)
+			goto out_grabsuper;
 	}
 	spin_unlock(&sb_lock);
 	return NULL;
+
+out_grabsuper:
+	if (grab_super(sb)) /* drops sb_lock */
+		return sb;
+	else
+		goto restart;
 }
  
 struct super_block *user_get_super(dev_t dev)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 15/17] btrfs: store pointer to superblock in bd_super
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (13 preceding siblings ...)
  2013-01-07 11:41 ` [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super Fernando Luis Vázquez Cao
@ 2013-01-07 11:42 ` Fernando Luis Vázquez Cao
  2013-01-07 11:43 ` [PATCH 16/17] fsfreeze: allow freeze counter lock nesting Fernando Luis Vázquez Cao
  2013-01-07 11:44 ` [PATCH 17/17] fsfreeze: export freeze_count through mountinfo Fernando Luis Vázquez Cao
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:42 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel, Chris Mason

This is needed to get get_active_super and, by extension, thaw_bdev initiated
freezes working.

Thanks go to Josef Bacik and Christoph Hellwig for initiating this effort
to fix btrfs and for suggesting the solution implemented here, respectively.

Cc: linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/btrfs/super.c linux-3.8-rc1/fs/btrfs/super.c
--- linux-3.8-rc1-orig/fs/btrfs/super.c	2012-12-25 10:27:41.146737000 +0900
+++ linux-3.8-rc1/fs/btrfs/super.c	2012-12-25 16:38:35.880018000 +0900
@@ -854,6 +854,7 @@ static int btrfs_fill_super(struct super
 	save_mount_options(sb, data);
 	cleancache_init_fs(sb);
 	sb->s_flags |= MS_ACTIVE;
+	btrfs_set_super_devices(fs_devices, sb);
 	return 0;
 
 fail_close:
@@ -1507,6 +1508,7 @@ static int btrfs_statfs(struct dentry *d
 static void btrfs_kill_super(struct super_block *sb)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	btrfs_set_super_devices(fs_info->fs_devices, NULL);
 	kill_anon_super(sb);
 	free_fs_info(fs_info);
 }
diff -urNp linux-3.8-rc1-orig/fs/btrfs/volumes.c linux-3.8-rc1/fs/btrfs/volumes.c
--- linux-3.8-rc1-orig/fs/btrfs/volumes.c	2012-12-25 10:27:41.150737000 +0900
+++ linux-3.8-rc1/fs/btrfs/volumes.c	2012-12-25 16:38:35.880018000 +0900
@@ -44,6 +44,8 @@ static int init_first_rw_device(struct b
 static int btrfs_relocate_sys_chunks(struct btrfs_root *root);
 static void __btrfs_reset_dev_stats(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
+static void btrfs_set_super_device(struct btrfs_device *device,
+                                   struct super_block *sb);
 
 static DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
@@ -1497,6 +1499,7 @@ int btrfs_rm_device(struct btrfs_root *r
 
 	cur_devices = device->fs_devices;
 	mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+	btrfs_set_super_device(device, NULL);
 	list_del_rcu(&device->dev_list);
 
 	device->fs_devices->num_devices--;
@@ -1819,6 +1822,23 @@ error:
 	return ret;
 }
 
+static void btrfs_set_super_device(struct btrfs_device *device,
+				   struct super_block *sb)
+{
+	if (device->bdev)
+		device->bdev->bd_super = sb;
+}
+
+void btrfs_set_super_devices(struct btrfs_fs_devices *fs_devices,
+			     struct super_block *sb)
+{
+	struct btrfs_device *device;
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list)
+		btrfs_set_super_device(device, sb);
+	mutex_unlock(&fs_devices->device_list_mutex);
+}
+
 int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 {
 	struct request_queue *q;
@@ -1984,7 +2004,7 @@ int btrfs_init_new_device(struct btrfs_r
 		up_write(&sb->s_umount);
 
 		if (ret) /* transaction commit */
-			return ret;
+			goto out;
 
 		ret = btrfs_relocate_sys_chunks(root);
 		if (ret < 0)
@@ -1994,13 +2014,17 @@ int btrfs_init_new_device(struct btrfs_r
 				    "using the \"btrfs balance\" command.");
 		trans = btrfs_attach_transaction(root);
 		if (IS_ERR(trans)) {
-			if (PTR_ERR(trans) == -ENOENT)
-				return 0;
-			return PTR_ERR(trans);
+			ret = PTR_ERR(trans) == -ENOENT ? 0 : PTR_ERR(trans);
+			goto out;
 		}
 		ret = btrfs_commit_transaction(trans, root);
 	}
-
+out:
+	/*
+	 * The device was successfully added to the filesystem, so we store a
+	 * pointer to the superblock in ->bd_super.
+	 */
+	btrfs_set_super_device(device, sb);
 	return ret;
 
 error_trans:
diff -urNp linux-3.8-rc1-orig/fs/btrfs/volumes.h linux-3.8-rc1/fs/btrfs/volumes.h
--- linux-3.8-rc1-orig/fs/btrfs/volumes.h	2012-12-25 10:27:41.166737000 +0900
+++ linux-3.8-rc1/fs/btrfs/volumes.h	2012-12-25 16:38:35.884018000 +0900
@@ -295,6 +295,8 @@ int btrfs_grow_device(struct btrfs_trans
 struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 				       u8 *uuid, u8 *fsid);
 int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
+void btrfs_set_super_devices(struct btrfs_fs_devices *fs_devices,
+			     struct super_block *sb);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_init_dev_replace_tgtdev(struct btrfs_root *root, char *device_path,
 				  struct btrfs_device **device_out);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 16/17] fsfreeze: allow freeze counter lock nesting
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (14 preceding siblings ...)
  2013-01-07 11:42 ` [PATCH 15/17] btrfs: store pointer to superblock in bd_super Fernando Luis Vázquez Cao
@ 2013-01-07 11:43 ` Fernando Luis Vázquez Cao
  2013-01-07 11:44 ` [PATCH 17/17] fsfreeze: export freeze_count through mountinfo Fernando Luis Vázquez Cao
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

A side effect of adding support for multi-device filesystem to the bdev level
API is that we can have nested calls to thaw_super() from the bdev layer (for
example, the user may try to dm-snapshot all the devices of a btrfs filesystem
at the same time), which means that the lock for the sb-level freeze counter
needs to become a counter.
 
Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com> 
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
--- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:37:51.172018000 +0900
+++ linux-3.8-rc1/fs/super.c	2012-12-25 16:55:22.240018000 +0900
@@ -1346,24 +1346,19 @@ static void sb_wait_write(struct super_b
  * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
  * mostly auxiliary for filesystems to verify they do not modify frozen fs.
  *
- * sb->s_writers.frozen, sb->s_freeze_count and sb->s_freeze_locked are
+ * sb->s_writers.frozen, sb->s_freeze_count and sb->s_freeze_lock_count are
  * protected by sb->s_umount.
  */
 int __freeze_super(struct super_block *sb, bool lock)
 {
 	int ret = 0;
-	bool locked_old = sb->s_freeze_locked;
+	int lock_count_old = sb->s_freeze_lock_count;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
 
-	/* The freeze counter lock does not nest. */
-	if (sb->s_freeze_locked && lock) {
-		ret = -EBUSY;
-		goto out_deactivate;
-	}
-
-	sb->s_freeze_locked = lock ? true : sb->s_freeze_locked;
+	if (lock)
+		sb->s_freeze_lock_count++;
 	if (++sb->s_freeze_count > 1)
 		goto out_deactivate;
 
@@ -1411,7 +1406,7 @@ int __freeze_super(struct super_block *s
 		if (ret) {
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
-			sb->s_freeze_locked = locked_old;
+			sb->s_freeze_lock_count = lock_count_old;
 			sb->s_freeze_count--;
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
@@ -1507,7 +1502,7 @@ int __thaw_super(struct super_block *sb,
 	 * An unfrozen filesystem cannot be thawed. Similarly, an unlocked
 	 * freeze counter cannot be unlocked.
 	 */
-	if (!sb->s_freeze_count || (!sb->s_freeze_locked && unlock)) {
+	if (!sb->s_freeze_count || (!sb->s_freeze_lock_count && unlock)) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
@@ -1517,12 +1512,13 @@ int __thaw_super(struct super_block *sb,
 	 * trigger the actual filesystem thaw.
 	 */
 	if (sb->s_freeze_count > 1) {
+		if (unlock)
+			sb->s_freeze_lock_count--;
 		sb->s_freeze_count--;
-		sb->s_freeze_locked = unlock ? false: sb->s_freeze_locked;
 		goto out_unlock;
 	}
 	/* A locked filesystem cannot be thawed unless unlock was requested. */
-	else if (sb->s_freeze_locked && !unlock) {
+	else if (sb->s_freeze_lock_count && !unlock) {
 		error = -EINVAL;
 		goto out_unlock;
 	}
@@ -1531,7 +1527,7 @@ int __thaw_super(struct super_block *sb,
 
 	if (!error) {
 		sb->s_freeze_count = 0;
-		sb->s_freeze_locked = false;
+		sb->s_freeze_lock_count = 0;
 	}
 	else
 		goto out_unlock;
@@ -1577,9 +1573,9 @@ int thaw_super_force(struct super_block
 		return -EINVAL;
 	}
 
-	if (sb->s_freeze_locked) {
+	if (sb->s_freeze_lock_count) {
 		/* Ensure superblock gets thawed at unlock time */
-		sb->s_freeze_count = 1;
+		sb->s_freeze_count = sb->s_freeze_lock_count;
 		up_write(&sb->s_umount);
 		return -EINVAL;
 	}
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:35:43.104018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:55:22.244018000 +0900
@@ -1325,7 +1325,7 @@ struct super_block {
 	int s_freeze_count;
 
 	/* Is freeze state locked? */
-	bool s_freeze_locked;
+	int s_freeze_lock_count;
 };
 
 /* superblock cache pruning functions */



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 17/17] fsfreeze: export freeze_count through mountinfo
  2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (15 preceding siblings ...)
  2013-01-07 11:43 ` [PATCH 16/17] fsfreeze: allow freeze counter lock nesting Fernando Luis Vázquez Cao
@ 2013-01-07 11:44 ` Fernando Luis Vázquez Cao
  16 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vázquez Cao @ 2013-01-07 11:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, Luiz Capitulino, linux-fsdevel

Export freeze_count through mountinfo so that we have a way to figure out the
current freeze state from userspace.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc1-orig/fs/proc_namespace.c linux-3.8-rc1/fs/proc_namespace.c
--- linux-3.8-rc1-orig/fs/proc_namespace.c	2012-12-11 12:30:57.000000000 +0900
+++ linux-3.8-rc1/fs/proc_namespace.c	2012-12-25 16:58:27.292018000 +0900
@@ -158,6 +158,8 @@ static int show_mountinfo(struct seq_fil
 	}
 	if (IS_MNT_UNBINDABLE(r))
 		seq_puts(m, " unbindable");
+	if (sb->s_op->freeze_fs)
+		seq_printf(m, " freeze_count:%i", sb->s_freeze_count);
 
 	/* Filesystem specific data */
 	seq_puts(m, " - ");



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount
  2013-01-07 11:26 ` [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2013-01-09 16:12   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:12 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:26:12, Fernando Luis Vázquez Cao wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
>  
> Use the unlocked version of thaw_super() to do the thawing and replace
> iterate_supers() with iterate_supers_write() so that the unfreeze operation can
> be performed with s_umount held as the locking rules for fsfreeze indicate.
> 
> As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
> when can get rid of the ugly while loop.
> 
> Jan Kara pointed out that with this approach we will leave the block devices
> frozen, but this is a problem we have had since the introduction of the
> superblock level API: if we thaw the filesystem using the superblock level API
> (be it through the thaw ioctl or emergency thaw) the bdev level freeze
> reference counter (bd_fsfreeze_count) will not be updated and even though
> subsequent calls to thaw_bdev() will decrease it it will never get back to 0
> (if thaw_super() returns an error, and it will when the superblock is unfrozen,
> thaw_bdev() will return without decreasing the counter). The solution I propose
> (and will be implementing in the followup patch "fsfreeze: freeze_super and
> thaw_bdev don't play well together") is letting bd_fsfreeze_count
> become zero when the superblock sitting on top of it is unfrozen, so that
> future calls to freeze_bdev() actually try to freeze the superblock.
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/drivers/tty/sysrq.c linux-3.8-rc1/drivers/tty/sysrq.c
> --- linux-3.8-rc1-orig/drivers/tty/sysrq.c	2012-12-25 10:27:40.614737000 +0900
> +++ linux-3.8-rc1/drivers/tty/sysrq.c	2012-12-25 11:40:06.128018000 +0900
> @@ -363,7 +363,6 @@ static struct sysrq_key_op sysrq_moom_op
>  	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
>  };
>  
> -#ifdef CONFIG_BLOCK
>  static void sysrq_handle_thaw(int key)
>  {
>  	emergency_thaw_all();
> @@ -374,7 +373,6 @@ static struct sysrq_key_op sysrq_thaw_op
>  	.action_msg	= "Emergency Thaw of all frozen filesystems",
>  	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
>  };
> -#endif
>  
>  static void sysrq_handle_kill(int key)
>  {
> diff -urNp linux-3.8-rc1-orig/fs/buffer.c linux-3.8-rc1/fs/buffer.c
> --- linux-3.8-rc1-orig/fs/buffer.c	2012-12-25 11:30:38.208018000 +0900
> +++ linux-3.8-rc1/fs/buffer.c	2012-12-25 11:40:06.128018000 +0900
> @@ -512,15 +512,33 @@ repeat:
>  
>  static void do_thaw_one(struct super_block *sb, void *unused)
>  {
> -	char b[BDEVNAME_SIZE];
> -	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
> -		printk(KERN_WARNING "Emergency Thaw on %s\n",
> +	int res;
> +
> +	if (sb->s_bdev) {
> +		char b[BDEVNAME_SIZE];
> +		printk(KERN_WARNING "Emergency Thaw on %s.\n",
>  		       bdevname(sb->s_bdev, b));
> +	}
> +
> +	/*
> +	 * We got here from __iterate_supers with the superblock lock taken
> +	 * so we can call the lockless version of thaw_super() safely.
> +	 */
> +	res = __thaw_super(sb);
> +	if (!res) {
> +		deactivate_locked_super(sb);
> +		/*
> +		 * We have to re-acquire s_umount because
> +		 * iterate_supers_write() will unlock it. It still holds
> +		 * passive reference so sb cannot be freed under us.
> +		 */
> +		down_write(&sb->s_umount);
> +	}
>  }
>  
>  static void do_thaw_all(struct work_struct *work)
>  {
> -	iterate_supers_read(do_thaw_one, NULL);
> +	iterate_supers_write(do_thaw_one, NULL);
>  	kfree(work);
>  	printk(KERN_WARNING "Emergency Thaw complete\n");
>  }
> diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
> --- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:35:55.488018000 +0900
> +++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:40:06.132018000 +0900
> @@ -1881,6 +1881,7 @@ extern int vfs_ustat(dev_t, struct kstat
>  extern int freeze_super(struct super_block *super);
>  extern int __thaw_super(struct super_block *super);
>  extern int thaw_super(struct super_block *super);
> +extern void emergency_thaw_all(void);
>  extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
> @@ -2053,7 +2054,6 @@ extern void iterate_bdevs(void (*)(struc
>  extern int sync_blockdev(struct block_device *bdev);
>  extern void kill_bdev(struct block_device *);
>  extern struct super_block *freeze_bdev(struct block_device *);
> -extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
>  #else
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs
  2013-01-07 11:30 ` [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs Fernando Luis Vázquez Cao
@ 2013-01-09 16:24   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:24 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:30:38, Fernando Luis Vázquez Cao wrote:
> If a frozen bdev does not have a filesystem sitting on top of it any subsequent
> nested freeze will cause a null pointer dereference because freeze_bdev()
> calls drop_super() unconditionally. drop_super() should be called only when
> there is a superblock to drop.
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

									Honza
> 
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
> --- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 11:38:05.072018000 +0900
> +++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 11:50:43.060018000 +0900
> @@ -227,8 +227,8 @@ struct super_block *freeze_bdev(struct b
>  		 * to freeze_bdev grab an active reference and only the last
>  		 * thaw_bdev drops it.
>  		 */
> -		sb = get_super(bdev);
> -		drop_super(sb);
> +		if ((sb = get_super(bdev)) != NULL)
> +			drop_super(sb);
>  		mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  		return sb;
>  	}
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen
  2013-01-07 11:32 ` [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen Fernando Luis Vázquez Cao
@ 2013-01-09 16:26   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:26 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:32:13, Fernando Luis Vázquez Cao wrote:
> This behavior is important in a scenario where a filesystem frozen using
> freeze_bdev() is thawed through the superblock level API; if we
> caused subsequent calls to thaw_bdev() future calls to freeze_bdev()
> would fail to freeze the superblock.
> 
> We can get rid of this code away once bdev level fsfreeze and sb level
> fsfreeze are properly integrated.
  OK, makes sence. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
> --- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 11:52:04.344018000 +0900
> +++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:11:12.648018000 +0900
> @@ -272,8 +272,19 @@ int thaw_bdev(struct block_device *bdev,
>  	}
>  
>  	error = thaw_super(sb);
> -	if (error)
> +	/*
> +	 * If the superblock is already unfrozen, i.e. thaw_super() returned
> +	 * -EINVAL, we consider the block device level thaw successful. This
> +	 * behavior is important in a scenario where a filesystem frozen using
> +	 * freeze_bdev() is thawed through the superblock level API; if we
> +	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
> +	 * would not go back to 0 which means that future calls to freeze_bdev()
> +	 * would not freeze the superblock, just increase the counter.
> +	 */
> +	if (error && error != -EINVAL)
>  		bdev->bd_fsfreeze_count++;
> +	else
> +		error = 0;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration
  2013-01-07 11:38 ` [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration Fernando Luis Vázquez Cao
@ 2013-01-09 16:37   ` Jan Kara
  2013-01-10  9:57     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:37 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:38:24, Fernando Luis Vázquez Cao wrote:
> As things stand now a filesystem frozen through the in-kernel bdev level API
> can be thawed using the userspace sb level API, which can lead to accidental
> corruption of filesystem snapshots and backups.
> 
> To address this problem we modify the in-kernel API so that we can tell
> fsfreeze that a kernel initiated freeze is in progress and that the filesystem
> should not be thawed no matter how many times the FITHAW ioctl is invoked.
  I'm not sure if this isn't going too far in the direction of trying to
prevent sysadmin to shoot himself in the foot. For well written applications
where FITHAW and FIFREEZE are paired, things should work OK after your
initial fixes. And if someone calls unpaired FITHAW, things can break
spectacularly anyway for other users of FIFREEZE. So I just wouldn't bother
with any more protections. What do you think?

									Honza

> Cc: linux-fsdevel@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/fs/block_dev.c linux-3.8-rc1/fs/block_dev.c
> --- linux-3.8-rc1-orig/fs/block_dev.c	2012-12-25 16:22:48.268018000 +0900
> +++ linux-3.8-rc1/fs/block_dev.c	2012-12-25 16:32:09.712018000 +0900
> @@ -238,7 +238,7 @@ struct super_block *freeze_bdev(struct b
>  	sb = get_active_super(bdev);
>  	if (!sb)
>  		goto out;
> -	error = freeze_super(sb);
> +	error = __freeze_super(sb, true);
>  	if (error) {
>  		deactivate_super(sb);
>  		bdev->bd_fsfreeze_count--;
> @@ -265,6 +265,7 @@ int thaw_bdev(struct block_device *bdev,
>  	int error = -EINVAL;
>  
>  	mutex_lock(&bdev->bd_fsfreeze_mutex);
> +
>  	if (!bdev->bd_fsfreeze_count)
>  		goto out;
>  
> @@ -273,20 +274,10 @@ int thaw_bdev(struct block_device *bdev,
>  		goto out;
>  	}
>  
> -	error = thaw_super(sb);
> -	/*
> -	 * If the superblock is already unfrozen, i.e. thaw_super() returned
> -	 * -EINVAL, we consider the block device level thaw successful. This
> -	 * behavior is important in a scenario where a filesystem frozen using
> -	 * freeze_bdev() is thawed through the superblock level API; if we
> -	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
> -	 * would not go back to 0 which means that future calls to freeze_bdev()
> -	 * would not freeze the superblock, just increase the counter.
> -	 */
> -	if (error && error != -EINVAL)
> +	error = __thaw_super(sb, true);
> +
> +	if (error)
>  		bdev->bd_fsfreeze_count++;
> -	else
> -		error = 0;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
> diff -urNp linux-3.8-rc1-orig/fs/namespace.c linux-3.8-rc1/fs/namespace.c
> --- linux-3.8-rc1-orig/fs/namespace.c	2012-12-25 16:31:10.780018000 +0900
> +++ linux-3.8-rc1/fs/namespace.c	2012-12-25 16:32:09.712018000 +0900
> @@ -1103,6 +1103,11 @@ static void thaw_mount(struct mount *mnt
>  	 * superblock succeeds (once it has been detached the fsfreeze
>  	 * ioctls become unusable). Thus, force-thaw sb so that all tasks
>  	 * in fsfreeze wait queue are woken up.
> +	 *
> +	 * thaw_super_force() does not actually thaw the sb if the freeze
> +	 * counter was locked (i.e. was frozen through the block device
> +	 * level API). In such a case the freeze counter is set to one
> +	 * thus guaranteeing that the sb will get thawed unlock time.
>  	 */
>  	thaw_super_force(sb); /* Drops superblock lock. */
>  }
> diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
> --- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:31:10.780018000 +0900
> +++ linux-3.8-rc1/fs/super.c	2012-12-25 16:32:09.712018000 +0900
> @@ -1301,15 +1301,20 @@ static void sb_wait_write(struct super_b
>  }
>  
>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * __freeze_super - lock the filesystem and force it into a consistent state
>   * @sb: the super to lock
> + * @lock: should we lock the freeze counter?
>   *
>   * Syncs the super to make sure the filesystem is consistent and calls the fs's
> - * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
> - * last unfreeze process can unfreeze the frozen filesystem actually when
> - * multiple freeze requests arrive simultaneously. It counts up in
> - * freeze_super() and counts down in thaw_super(). When it becomes 0,
> - * thaw_super() will execute the unfreeze.
> + * freeze_fs. Freezes can nest which has two implications: the filesystem level
> + * freeze occurs during the first nested freeze, the actual filesystem thaw
> + * occurs only when the last thaw operation brings the freeze counter down to
> + * zero.
> + *
> + * If @lock is true the freeze counter is increased after a successful freeze
> + * but it cannot go back to zero (and the filesystem get actually thawed) until
> + * the the counter is unlocked using this function's thaw counterpart. The
> + * freeze counter lock does not nest.
>   *
>   * During this function, sb->s_writers.frozen goes through these values:
>   *
> @@ -1334,15 +1339,24 @@ static void sb_wait_write(struct super_b
>   * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
>   * mostly auxiliary for filesystems to verify they do not modify frozen fs.
>   *
> - * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
> + * sb->s_writers.frozen, sb->s_freeze_count and sb->s_freeze_locked are
> + * protected by sb->s_umount.
>   */
> -int freeze_super(struct super_block *sb)
> +int __freeze_super(struct super_block *sb, bool lock)
>  {
>  	int ret = 0;
> +	bool locked_old = sb->s_freeze_locked;
>  
>  	atomic_inc(&sb->s_active);
>  	down_write(&sb->s_umount);
>  
> +	/* The freeze counter lock does not nest. */
> +	if (sb->s_freeze_locked && lock) {
> +		ret = -EBUSY;
> +		goto out_deactivate;
> +	}
> +
> +	sb->s_freeze_locked = lock ? true : sb->s_freeze_locked;
>  	if (++sb->s_freeze_count > 1)
>  		goto out_deactivate;
>  
> @@ -1390,6 +1404,7 @@ int freeze_super(struct super_block *sb)
>  		if (ret) {
>  			printk(KERN_ERR
>  				"VFS:Filesystem freeze failed\n");
> +			sb->s_freeze_locked = locked_old;
>  			sb->s_freeze_count--;
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			smp_wmb();
> @@ -1397,11 +1412,13 @@ int freeze_super(struct super_block *sb)
>  			goto out_deactivate;
>  		}
>  	}
> +
>  	/*
>  	 * This is just for debugging purposes so that fs can warn if it
>  	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
>  	 */
>  	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> +
>  out_unlock:
>  	up_write(&sb->s_umount);
>  	return ret;
> @@ -1409,6 +1426,18 @@ out_deactivate:
>  	deactivate_locked_super(sb);
>  	return ret;
>  }
> +
> +/**
> + * freeze_super - lock the filesystem and force it into a consistent state
> + * @sb: the super to lock
> + *
> + * This is a wrapper around __freeze_super() which does the actual work of
> + * freezing the filesystem. fsfreeze counter lock is not requested.
> + */
> +int freeze_super(struct super_block *sb)
> +{
> +	return __freeze_super(sb, false);
> +}
>  EXPORT_SYMBOL(freeze_super);
>  
>  /**
> @@ -1449,34 +1478,56 @@ out:
>  }
>  
>  /**
> - * thaw_super - unlock filesystem
> + * __thaw_super - unlock filesystem
>   * @sb: the super to thaw
> + * @unlock: should we unlock the freeze counter?
>   *
> - * Unlocks the filesystem and marks it writeable again after freeze_super().
> + * Tries to decrease the freeze counter and when it reaches zero unlocks the
> + * filesystem and marks it writeable again. If the counter is locked it cannot
> + * go back to zero (and thus trigger the actual filesystem thaw) unless @unlock
> + * is true.
>   *
>   * Returns -EINVAL if @sb is not frozen, 0 if it succeeded or the corresponding
>   * error code otherwise. If the unfreeze fails, @sb is left in the frozen state.
>   */
> -int thaw_super(struct super_block *sb)
> +int __thaw_super(struct super_block *sb, bool unlock)
>  {
>  	int error = 0;
>  
>  	down_write(&sb->s_umount);
>  
> -	if (!sb->s_freeze_count) {
> +	/*
> +	 * An unfrozen filesystem cannot be thawed. Similarly, an unlocked
> +	 * freeze counter cannot be unlocked.
> +	 */
> +	if (!sb->s_freeze_count || (!sb->s_freeze_locked && unlock)) {
>  		error = -EINVAL;
>  		goto out_unlock;
>  	}
>  
> -	if (--sb->s_freeze_count > 0)
> +	/*
> +	 * Freezes nest so only the last call (freeze counter down to one) can
> +	 * trigger the actual filesystem thaw.
> +	 */
> +	if (sb->s_freeze_count > 1) {
> +		sb->s_freeze_count--;
> +		sb->s_freeze_locked = unlock ? false: sb->s_freeze_locked;
> +		goto out_unlock;
> +	}
> +	/* A locked filesystem cannot be thawed unless unlock was requested. */
> +	else if (sb->s_freeze_locked && !unlock) {
> +		error = -EINVAL;
>  		goto out_unlock;
> +	}
>  
>  	error = raw_thaw_super(sb, false);
>  
> -	if (error) {
> -		sb->s_freeze_count++;
> -		goto out_unlock;
> +	if (!error) {
> +		sb->s_freeze_count = 0;
> +		sb->s_freeze_locked = false;
>  	}
> +	else
> +		goto out_unlock;
>  
>  	/* Active reference released after last thaw. */
>  	deactivate_locked_super(sb);
> @@ -1486,6 +1537,19 @@ out_unlock:
>  	up_write(&sb->s_umount);
>  	return error;
>  }
> +
> +/**
> + * thaw_super - unlock filesystem
> + * @sb: the super to unlock
> + *
> + * This is a wrapper around __thaw_super() which does the actual work of
> + * thawing the filesystem. Release of the fsfreeze counter lock is not
> + * requested.
> + */
> +int thaw_super(struct super_block *sb)
> +{
> +	return __thaw_super(sb, false);
> +}
>  EXPORT_SYMBOL(thaw_super);
>  
>  /**
> @@ -1505,10 +1569,19 @@ int thaw_super_force(struct super_block
>  		up_write(&sb->s_umount);
>  		return -EINVAL;
>  	}
> +
> +	if (sb->s_freeze_locked) {
> +		/* Ensure superblock gets thawed at unlock time */
> +		sb->s_freeze_count = 1;
> +		up_write(&sb->s_umount);
> +		return -EINVAL;
> +	}
> +
>  	sb->s_freeze_count = 0;
>  	raw_thaw_super(sb, true);
>  	/* Active reference released after last thaw. */
>  	deactivate_locked_super(sb);
> +
>  	return 0;
>  }
>  
> diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
> --- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:31:10.784018000 +0900
> +++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:32:09.712018000 +0900
> @@ -1323,6 +1323,9 @@ struct super_block {
>  
>  	/* Number of nested freezes */
>  	int s_freeze_count;
> +
> +	/* Is freeze state locked? */
> +	bool s_freeze_locked;
>  };
>  
>  /* superblock cache pruning functions */
> @@ -1881,7 +1884,9 @@ extern int vfs_statfs(struct path *, str
>  extern int user_statfs(const char __user *, struct kstatfs *);
>  extern int fd_statfs(int, struct kstatfs *);
>  extern int vfs_ustat(dev_t, struct kstatfs *);
> +extern int __freeze_super(struct super_block *sb, bool lock);
>  extern int freeze_super(struct super_block *super);
> +extern int __thaw_super(struct super_block *sb, bool unlock);
>  extern int thaw_super(struct super_block *super);
>  extern int thaw_super_force(struct super_block *super);
>  extern void emergency_thaw_all(void);
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw
  2013-01-07 11:39 ` [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw Fernando Luis Vázquez Cao
@ 2013-01-09 16:41   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:41 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:39:41, Fernando Luis Vázquez Cao wrote:
> Now that sb-level/bdev-level fsfreeze integration has been achieved and
> sb-lebel/bdev-level freeze counters are always kept consistent, we need to
> unfreeze bdevs in addition to filesystems during emergency thaw so that
> we can recover from a situation where someone forgot to call thaw_bdev().
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

  A typo fix below:

<snip>
> +void do_thaw_one_bdev(struct block_device *bdev, void *arg)
> +{
> +	struct super_block *sb;
> +
> +	mutex_lock(&bdev->bd_fsfreeze_mutex);
> +
> +	if (!bdev->bd_fsfreeze_count) {
> +		mutex_unlock(&bdev->bd_fsfreeze_mutex);
> +		return;
> +	}
> +
> +	/*
> +	 * We know the block device is frozen so we do not need to grab a
> +	 * reference - the first call to thaw_bdev() dit it.
						     ^^^ did
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super
  2013-01-07 11:41 ` [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super Fernando Luis Vázquez Cao
@ 2013-01-09 16:44   ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2013-01-09 16:44 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel,
	Chris Mason

On Mon 07-01-13 20:41:04, Fernando Luis Vázquez Cao wrote:
> Using get_active_super will not work properly if the fs (like btrfs) does not
> save its s_bdev with the device it is on.  Also it does not provide the entire
> picture, since an filesystem can be contained on multiple disks (again like
> btrfs).
> 
> Fortunately we now have a bd_super field in struct block_device were a pointer
> to the superblock sitting on top of the block device can be stored.
> 
> Filesystems using the vfs helper mount_bdev (the ext filesystems, xfs, etc) and
> gfs2 already do this (for the former it is a freebie), so for these there is
> no need to iterate through the list of superblocks; we can get the superblock
> directly from ->bd_super which is more efficient and what this patch
> implements.
> 
> A multi-device filesystem (once again lile btrfs) can use that field to store
> a pointer to the superblock for each block device in its storage pool. By
> doing so it would get_active_super and, by extension, thaw_bdev initiated
> freezes working.
> 
> Thanks go to Josef Bacik and Christoph Hellwig for initiating this effort
> to fix btrfs and for suggesting the solution implemented here, respectively.
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

									Honza
> 
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-btrfs@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Chris Mason <chris.mason@fusionio.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
> --- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:35:43.100018000 +0900
> +++ linux-3.8-rc1/fs/super.c	2012-12-25 16:36:22.312018000 +0900
> @@ -632,26 +632,29 @@ struct super_block *get_super(struct blo
>  		return NULL;
>  
>  	spin_lock(&sb_lock);
> +	if ((sb = bdev->bd_super) != NULL)
> +		goto out_found;
>  rescan:
>  	list_for_each_entry(sb, &super_blocks, s_list) {
>  		if (hlist_unhashed(&sb->s_instances))
>  			continue;
> -		if (sb->s_bdev == bdev) {
> -			sb->s_count++;
> -			spin_unlock(&sb_lock);
> -			down_read(&sb->s_umount);
> -			/* still alive? */
> -			if (sb->s_root && (sb->s_flags & MS_BORN))
> -				return sb;
> -			up_read(&sb->s_umount);
> -			/* nope, got unmounted */
> -			spin_lock(&sb_lock);
> -			__put_super(sb);
> -			goto rescan;
> -		}
> +		if (sb->s_bdev == bdev)
> +			goto out_found;
>  	}
>  	spin_unlock(&sb_lock);
>  	return NULL;
> +out_found:
> +	sb->s_count++;
> +	spin_unlock(&sb_lock);
> +	down_read(&sb->s_umount);
> +	/* still alive? */
> +	if (sb->s_root && (sb->s_flags & MS_BORN))
> +		return sb;
> +	up_read(&sb->s_umount);
> +	/* nope, got unmounted */
> +	spin_lock(&sb_lock);
> +	__put_super(sb);
> +	goto rescan;
>  }
>  
>  EXPORT_SYMBOL(get_super);
> @@ -696,18 +699,22 @@ struct super_block *get_active_super(str
>  
>  restart:
>  	spin_lock(&sb_lock);
> +	if ((sb = bdev->bd_super) != NULL)
> +		goto out_grabsuper;
>  	list_for_each_entry(sb, &super_blocks, s_list) {
>  		if (hlist_unhashed(&sb->s_instances))
>  			continue;
> -		if (sb->s_bdev == bdev) {
> -			if (grab_super(sb)) /* drops sb_lock */
> -				return sb;
> -			else
> -				goto restart;
> -		}
> +		if (sb->s_bdev == bdev)
> +			goto out_grabsuper;
>  	}
>  	spin_unlock(&sb_lock);
>  	return NULL;
> +
> +out_grabsuper:
> +	if (grab_super(sb)) /* drops sb_lock */
> +		return sb;
> +	else
> +		goto restart;
>  }
>   
>  struct super_block *user_get_super(dev_t dev)
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/17] fsfreeze: automatically thaw on umount
  2013-01-07 11:35 ` [PATCH 10/17] fsfreeze: automatically thaw on umount Fernando Luis Vázquez Cao
@ 2013-01-09 17:20   ` Jan Kara
  2013-01-10  9:14     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2013-01-09 17:20 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, Luiz Capitulino, linux-fsdevel

On Mon 07-01-13 20:35:14, Fernando Luis Vázquez Cao wrote:
> The fsfreeze userspace API is a sb level API, which means that to thaw a frozen
> filesystem we need to have access to it. This means that after an unmount the
> filesystem cannot be thawed, because it is not part of the filesystem
> namespace anymore.
> 
> A possible solution to the problem above is to mount the filesystem again and
> execute the thaw operation. However, a subsequent mount is not guaranteed to
> succeed and even if it does succeeded it may generate I/O in the process,
> which is not supposed to happen since the filesystem is frozen.
> 
> Another approach is adding a bdev level API through which the unmounted
> filesystem can be reached. The problem with this is that it only works for
> filesystems for block devices. We could also return EBUSY when the user tries
> to ummount an frozen filesystem, but this wold break lazy unmount.
> 
> Automatically freezing on sys_umount fixes all the problems mentioned above.
  I have to say I'm not convinced this is the right way to go. For example
if dm-snapshot is run on a filesystem being unmounted, thawing during
unmount isn't really desirable (although I find this scenario unlikely to
happen in practice). And the way you implemented the idea is definitely
wrong - the filesystem can be mounted several times and you would thaw it
on any umount which can definitely surprise processes using freeze.

So what if we changed propagate_mount_busy() to return true if the
superblock is frozen. That will *not* break lazy umount because that
doesn't care about propagate_mount_busy() returns but all other attempts to
umount will return EBUSY. Sure our problem with unreachable filesystem
being frozen won't be completely solved since frozen filesystem can still
be made unreachable by lazy umount but that does not seem to be that
common. What do you think?

									Honza
 
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.8-rc1-orig/fs/namespace.c linux-3.8-rc1/fs/namespace.c
> --- linux-3.8-rc1-orig/fs/namespace.c	2012-12-25 10:27:41.322737000 +0900
> +++ linux-3.8-rc1/fs/namespace.c	2012-12-25 16:23:56.904018000 +0900
> @@ -1091,6 +1091,27 @@ int may_umount(struct vfsmount *mnt)
>  
>  EXPORT_SYMBOL(may_umount);
>  
> +static void thaw_mount(struct mount *mnt)
> +{
> +	struct super_block *sb = mnt->mnt.mnt_sb;
> +
> +	down_write(&sb->s_umount);
> +	if (sb->s_writers.frozen == SB_UNFROZEN) {
> +		up_write(&sb->s_umount);
> +		return;
> +	}
> +	/*
> +	 * The superblock may be in the process of being detached from the
> +	 * namespace which means we have to make sure the thaw of the superblock
> +	 * succeeds (once it has been detached the fsfreeze ioctls become
> +	 * unusable). Thus, Force-thaw sb so that all tasks in fsfreeze wait
> +	 * queue are woken up.
> +	 */
> +	sb->s_freeze_count = 1;
> +	__thaw_super(sb, true);
> +	deactivate_locked_super(sb);
> +}
> +
>  void release_mounts(struct list_head *head)
>  {
>  	struct mount *mnt;
> @@ -1111,6 +1132,7 @@ void release_mounts(struct list_head *he
>  			dput(dentry);
>  			mntput(&m->mnt);
>  		}
> +		thaw_mount(mnt);
>  		mntput(&mnt->mnt);
>  	}
>  }
> diff -urNp linux-3.8-rc1-orig/fs/super.c linux-3.8-rc1/fs/super.c
> --- linux-3.8-rc1-orig/fs/super.c	2012-12-25 16:22:48.272018000 +0900
> +++ linux-3.8-rc1/fs/super.c	2012-12-25 16:23:56.904018000 +0900
> @@ -1414,6 +1414,7 @@ EXPORT_SYMBOL(freeze_super);
>  /**
>   * __thaw_super -- unlock filesystem
>   * @sb: the super to thaw
> + * @force: whether or not to force the thaw (read details below before using)
>   *
>   * Unlocks the filesystem and marks it writeable again.
>   *
> @@ -1421,11 +1422,17 @@ EXPORT_SYMBOL(freeze_super);
>   * after this thaw if it succeeded or the corresponding error code otherwise.
>   * If the unfreeze fails, @sb is left in the frozen state.
>   *
> + * If the force flag is set the kernel will proceeed with the thaw even if the
> + * call to the filesystem specific thaw function (->unfreeze_fs()) fails. This
> + * feature should be used only in situations where there is no entity we can
> + * return an error to so that it has a chance to clean things up and retry, i.e.
> + * this is the last oportunity to wake the tasks in the fsfreeze wait queue up.
> + *
>   * This is the unlocked version of thaw_super, so it is the caller's job to
>   * to protect the superblock by grabbing the s_umount semaphore in write mode
>   * and release it again on return. See thaw_super() for an example.
>   */
> -static int __thaw_super(struct super_block *sb)
> +int __thaw_super(struct super_block *sb, bool force)
>  {
>  	int error = 0;
>  
> @@ -1442,11 +1449,9 @@ static int __thaw_super(struct super_blo
>  	if (sb->s_flags & MS_RDONLY)
>  		goto out_thaw;
>  
> -	if (sb->s_op->unfreeze_fs) {
> -		error = sb->s_op->unfreeze_fs(sb);
> -		if (error) {
> -			printk(KERN_ERR
> -				"VFS:Filesystem thaw failed\n");
> +	if (sb->s_op->unfreeze_fs && (error = sb->s_op->unfreeze_fs(sb))) {
> +		printk(KERN_ERR "VFS: Filesystem thaw failed\n");
> +		if (!force) {
>  			sb->s_freeze_count++;
>  			goto out;
>  		}
> @@ -1470,7 +1475,7 @@ int thaw_super(struct super_block *sb)
>  {
>  	int res;
>  	down_write(&sb->s_umount);
> -	res = __thaw_super(sb);
> +	res = __thaw_super(sb, false);
>  	if (!res) /* Active reference released after last thaw. */
>  		deactivate_locked_super(sb);
>  	else
> @@ -1497,7 +1502,7 @@ static void do_thaw_one(struct super_blo
>  	 * so we can call the lockless version of thaw_super() safely.
>  	 */
>  	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
> -	res = __thaw_super(sb);
> +	res = __thaw_super(sb, false);
>  	if (!res) {
>  		deactivate_locked_super(sb);
>  		/*
> diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
> --- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 16:22:48.272018000 +0900
> +++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 16:23:56.904018000 +0900
> @@ -1882,6 +1882,7 @@ extern int user_statfs(const char __user
>  extern int fd_statfs(int, struct kstatfs *);
>  extern int vfs_ustat(dev_t, struct kstatfs *);
>  extern int freeze_super(struct super_block *super);
> +extern int __thaw_super(struct super_block *super, bool force);
>  extern int thaw_super(struct super_block *super);
>  extern void emergency_thaw_all(void);
>  extern bool our_mnt(struct vfsmount *mnt);
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/17] fsfreeze: automatically thaw on umount
  2013-01-09 17:20   ` Jan Kara
@ 2013-01-10  9:14     ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vazquez Cao @ 2013-01-10  9:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Luiz Capitulino, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]

On 2013/01/10 02:20, Jan Kara wrote:
 > On Mon 07-01-13 20:35:14, Fernando Luis Vázquez Cao wrote:
 >> The fsfreeze userspace API is a sb level API, which means that to 
thaw a frozen
 >> filesystem we need to have access to it. This means that after an 
unmount the
 >> filesystem cannot be thawed, because it is not part of the filesystem
 >> namespace anymore.
 >>
 >> A possible solution to the problem above is to mount the filesystem 
again and
 >> execute the thaw operation. However, a subsequent mount is not 
guaranteed to
 >> succeed and even if it does succeeded it may generate I/O in the 
process,
 >> which is not supposed to happen since the filesystem is frozen.
 >>
 >> Another approach is adding a bdev level API through which the unmounted
 >> filesystem can be reached. The problem with this is that it only 
works for
 >> filesystems for block devices. We could also return EBUSY when the 
user tries
 >> to ummount an frozen filesystem, but this wold break lazy unmount.
 >>
 >> Automatically freezing on sys_umount fixes all the problems 
mentioned above.
 >
 >   I have to say I'm not convinced this is the right way to go. For 
example
 > if dm-snapshot is run on a filesystem being unmounted, thawing during
 > unmount isn't really desirable (although I find this scenario unlikely to
 > happen in practice).

Patch 12 addresses that scenario. With it applied, the sb level freeze
counter would be set to 1 at umount time and the actual filesystem
thaw carried out after dm-snapshot's call to thaw_bdev (which brings
the freeze counter down to zero thus triggering the unfreeze).


 > And the way you implemented the idea is definitely
 > wrong - the filesystem can be mounted several times and you would thaw it
 > on any umount which can definitely surprise processes using freeze.

I agree, the current behavior can cause confusion. I am attaching an
alternative version of this patch that thaws the filesystem only if
this is the last mount remaining. Does it look like a better approach?


 > So what if we changed propagate_mount_busy() to return true if the
 > superblock is frozen. That will *not* break lazy umount because that
 > doesn't care about propagate_mount_busy() returns but all other 
attempts to
 > umount will return EBUSY. Sure our problem with unreachable filesystem
 > being frozen won't be completely solved since frozen filesystem can still
 > be made unreachable by lazy umount but that does not seem to be that
 > common. What do you think?

I think that we should avoid solutions that do not address
the unreachable filesystem case properly. Hopefully I got it
right this time around.


Thank you for your detailed review of the patch set!

- Fernando

[-- Attachment #2: 10-fsfreeze-automatically-thaw-on-umount.patch --]
[-- Type: text/x-patch, Size: 5841 bytes --]

Subject: [PATCH 10/17] fsfreeze: automatically thaw on umount

The fsfreeze userspace API is a sb level API, which means that to thaw a frozen
filesystem we need to have access to it. This means that after an unmount the
filesystem cannot be thawed, because it is not part of the filesystem
namespace anymore.

A possible solution to the problem above is to mount the filesystem again and
execute the thaw operation. However, a subsequent mount is not guaranteed to
succeed and even if it does succeeded it may generate I/O in the process,
which is not supposed to happen since the filesystem is frozen.

Another approach is adding a bdev level API through which the unmounted
filesystem can be reached. The problem with this is that it only works for
filesystems for block devices. We could also return EBUSY when the user tries
to ummount an frozen filesystem, but this wold break lazy unmount.

Automatically freezing on sys_umount fixes all the problems mentioned above.

Cc: linux-fsdevel@vger.kernel.org
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.8-rc3-orig/fs/namespace.c linux-3.8-rc3/fs/namespace.c
--- linux-3.8-rc3-orig/fs/namespace.c	2013-01-10 17:51:41.651680000 +0900
+++ linux-3.8-rc3/fs/namespace.c	2013-01-10 17:47:53.419680000 +0900
@@ -1091,9 +1091,31 @@ int may_umount(struct vfsmount *mnt)
 
 EXPORT_SYMBOL(may_umount);
 
+static void thaw_mount(struct mount *mnt)
+{
+	struct super_block *sb = mnt->mnt.mnt_sb;
+
+	down_write(&sb->s_umount);
+	if (sb->s_writers.frozen == SB_UNFROZEN) {
+		up_write(&sb->s_umount);
+		return;
+	}
+	/*
+	 * The superblock may be in the process of being detached from the
+	 * namespace which means we have to make sure the thaw of the superblock
+	 * succeeds (once it has been detached the fsfreeze ioctls become
+	 * unusable). Thus, Force-thaw sb so that all tasks in fsfreeze wait
+	 * queue are woken up.
+	 */
+	sb->s_freeze_count = 1;
+	__thaw_super(sb, true);
+	deactivate_locked_super(sb);
+}
+
 void release_mounts(struct list_head *head)
 {
 	struct mount *mnt;
+	struct super_block *sb;
 	while (!list_empty(head)) {
 		mnt = list_first_entry(head, struct mount, mnt_hash);
 		list_del_init(&mnt->mnt_hash);
@@ -1111,6 +1133,11 @@ void release_mounts(struct list_head *he
 			dput(dentry);
 			mntput(&m->mnt);
 		}
+		br_read_lock(&vfsmount_lock);
+		sb = mnt->mnt.mnt_sb;
+		if (list_is_last(&mnt->mnt_instance, &sb->s_mounts))
+			thaw_mount(mnt);
+		br_read_lock(&vfsmount_lock);
 		mntput(&mnt->mnt);
 	}
 }
diff -urNp linux-3.8-rc3-orig/fs/super.c linux-3.8-rc3/fs/super.c
--- linux-3.8-rc3-orig/fs/super.c	2013-01-10 17:51:41.651680000 +0900
+++ linux-3.8-rc3/fs/super.c	2013-01-10 17:47:53.419680000 +0900
@@ -1414,6 +1414,7 @@ EXPORT_SYMBOL(freeze_super);
 /**
  * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
+ * @force: whether or not to force the thaw (read details below before using)
  *
  * Unlocks the filesystem and marks it writeable again.
  *
@@ -1421,11 +1422,17 @@ EXPORT_SYMBOL(freeze_super);
  * after this thaw if it succeeded or the corresponding error code otherwise.
  * If the unfreeze fails, @sb is left in the frozen state.
  *
+ * If the force flag is set the kernel will proceeed with the thaw even if the
+ * call to the filesystem specific thaw function (->unfreeze_fs()) fails. This
+ * feature should be used only in situations where there is no entity we can
+ * return an error to so that it has a chance to clean things up and retry, i.e.
+ * this is the last oportunity to wake the tasks in the fsfreeze wait queue up.
+ *
  * This is the unlocked version of thaw_super, so it is the caller's job to
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
  */
-static int __thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb, bool force)
 {
 	int error = 0;
 
@@ -1442,11 +1449,9 @@ static int __thaw_super(struct super_blo
 	if (sb->s_flags & MS_RDONLY)
 		goto out_thaw;
 
-	if (sb->s_op->unfreeze_fs) {
-		error = sb->s_op->unfreeze_fs(sb);
-		if (error) {
-			printk(KERN_ERR
-				"VFS:Filesystem thaw failed\n");
+	if (sb->s_op->unfreeze_fs && (error = sb->s_op->unfreeze_fs(sb))) {
+		printk(KERN_ERR "VFS: Filesystem thaw failed\n");
+		if (!force) {
 			sb->s_freeze_count++;
 			goto out;
 		}
@@ -1470,7 +1475,7 @@ int thaw_super(struct super_block *sb)
 {
 	int res;
 	down_write(&sb->s_umount);
-	res = __thaw_super(sb);
+	res = __thaw_super(sb, false);
 	if (!res) /* Active reference released after last thaw. */
 		deactivate_locked_super(sb);
 	else
@@ -1497,7 +1502,7 @@ static void do_thaw_one(struct super_blo
 	 * so we can call the lockless version of thaw_super() safely.
 	 */
 	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
-	res = __thaw_super(sb);
+	res = __thaw_super(sb, false);
 	if (!res) {
 		deactivate_locked_super(sb);
 		/*
diff -urNp linux-3.8-rc3-orig/include/linux/fs.h linux-3.8-rc3/include/linux/fs.h
--- linux-3.8-rc3-orig/include/linux/fs.h	2013-01-10 17:51:41.651680000 +0900
+++ linux-3.8-rc3/include/linux/fs.h	2013-01-10 17:47:53.419680000 +0900
@@ -1882,6 +1882,7 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super, bool force);
 extern int thaw_super(struct super_block *super);
 extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration
  2013-01-09 16:37   ` Jan Kara
@ 2013-01-10  9:57     ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 27+ messages in thread
From: Fernando Luis Vazquez Cao @ 2013-01-10  9:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Luiz Capitulino, linux-fsdevel

On 2013/01/10 01:37, Jan Kara wrote:
 > On Mon 07-01-13 20:38:24, Fernando Luis Vázquez Cao wrote:
 >> As things stand now a filesystem frozen through the in-kernel bdev 
level API
 >> can be thawed using the userspace sb level API, which can lead to 
accidental
 >> corruption of filesystem snapshots and backups.
 >>
 >> To address this problem we modify the in-kernel API so that we can tell
 >> fsfreeze that a kernel initiated freeze is in progress and that the 
filesystem
 >> should not be thawed no matter how many times the FITHAW ioctl is 
invoked.
 >
 >   I'm not sure if this isn't going too far in the direction of trying to
 > prevent sysadmin to shoot himself in the foot. For well written 
applications
 > where FITHAW and FIFREEZE are paired, things should work OK after your
 > initial fixes. And if someone calls unpaired FITHAW, things can break
 > spectacularly anyway for other users of FIFREEZE. So I just wouldn't 
bother
 > with any more protections. What do you think?

I think that FITHAW/FIFREEZE should not interfere with
(freeze/thaw)_bdev, which is an internal kernel API. It could be the
case that users of freeze_bdev (external modules) cannot handle a
scenario where a frozen filesystem is unexpectedly thawed through the
userspace API.

This kind of protection is also beneficial in virtualization
environments where there can be hypervisor initiated uses of the
fsfreeze API (this is a reality in KVM with the advent of QEMU's guest
agent), which means that we effectively have two administrative
domains (the guest's and hypervisor's). It would be nice if we can
avoid situations where a guest initiated dm-snapshot is not affected
by a spurious FITHAW request from the hypervisor.

If the added complexity is acceptable I think this king of protections
is desirable.

- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-01-10  9:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-07 11:18 [PATCH v6 0/17] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2013-01-07 11:21 ` [PATCH 1/17] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
2013-01-07 11:22 ` [PATCH 2/17] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
2013-01-07 11:23 ` [PATCH 3/17] fsfreeze: fix emergency thaw infinite loop Fernando Luis Vázquez Cao
2013-01-07 11:26 ` [PATCH 4/17] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
2013-01-09 16:12   ` Jan Kara
2013-01-07 11:27 ` [PATCH 5/17] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
2013-01-07 11:29 ` [PATCH 6/17] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
2013-01-07 11:30 ` [PATCH 7/17] fsfreeze: fix nested freezing of sb-less bdevs Fernando Luis Vázquez Cao
2013-01-09 16:24   ` Jan Kara
2013-01-07 11:32 ` [PATCH 8/17] fsfreeze: allow bdev level thaws when the sb is unfrozen Fernando Luis Vázquez Cao
2013-01-09 16:26   ` Jan Kara
2013-01-07 11:34 ` [PATCH 9/17] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2013-01-07 11:35 ` [PATCH 10/17] fsfreeze: automatically thaw on umount Fernando Luis Vázquez Cao
2013-01-09 17:20   ` Jan Kara
2013-01-10  9:14     ` Fernando Luis Vazquez Cao
2013-01-07 11:36 ` [PATCH 11/17] fsfreeze: add thaw_super_force Fernando Luis Vázquez Cao
2013-01-07 11:38 ` [PATCH 12/17] fsfreeze: sb-level/bdev-level fsfreeze integration Fernando Luis Vázquez Cao
2013-01-09 16:37   ` Jan Kara
2013-01-10  9:57     ` Fernando Luis Vazquez Cao
2013-01-07 11:39 ` [PATCH 13/17] fsfreeze: unfreeze bdevs in addition to filesystems during emergency thaw Fernando Luis Vázquez Cao
2013-01-09 16:41   ` Jan Kara
2013-01-07 11:41 ` [PATCH 14/17] vfs: leverage bd_super in get_super and get_active_super Fernando Luis Vázquez Cao
2013-01-09 16:44   ` Jan Kara
2013-01-07 11:42 ` [PATCH 15/17] btrfs: store pointer to superblock in bd_super Fernando Luis Vázquez Cao
2013-01-07 11:43 ` [PATCH 16/17] fsfreeze: allow freeze counter lock nesting Fernando Luis Vázquez Cao
2013-01-07 11:44 ` [PATCH 17/17] fsfreeze: export freeze_count through mountinfo Fernando Luis Vázquez Cao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).