* [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups
@ 2012-09-14 6:43 Fernando Luis Vázquez Cao
2012-09-14 6:45 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
` (8 more replies)
0 siblings, 9 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:43 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
This patch set is to address long standing issues in the filesytem freeze code
and to fill some functionality gaps in the API. Some minor code rearrangements
are included too.
The following patches are included:
---
[1/9] vfs: add __iterate_supers() helper
[2/9] fsfreeze: add unlocked version of thaw_super
Preparatory patches to fix s_umount lockup of emergency thaw code.
[3/9] fsfreeze: Prevent emergency thaw from looping infinitely
Fix thaw_bdev so that it propagates the error code properly to the caller.
This bug caused emergency thaw to loop infinitely. This is a forward port of
a previous patch by Dave Chinner.
[4/9] fsfreeze: emergency thaw will deadlock on s_umount
Avoid emergency thaw deadlock on s_umount by using unlocked version of
thaw_super() and __iterate_supers()i (introduced in patches 2 and 1
respectively).
[5/9] xfs: switch to using super methods for fsfreeze
[6/9] fsfreeze: move emergency thaw code to fs/super.c
Two cleanups (based on previous patches by David Chinner): convert xfs to using
superblock methods for fsfreeze and move emergency thaw code to where it
belongs.
[7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
These patches address the bdev/sb mismatch and the fact that sb level freezing
does not nest correctly. For all the places that the block device interfaces
are used, we need a superblock anyway so we may as well make freeze/thaw work
only at the superblock level. There is one caveat (pointed out by Christoph in
the discussion linked above) though: we still need to support the "feature"
that we can freeze a block device that does not have a filesystem mounted yet
(this can be used to prevent a filesystem mount during a dm snapshot for
example).
This series moves part of the nesting code to the superblock from the block
device level but retains the freeze_bdev/thaw_bdev API so that we can keep the
"feature" mentioned above for those who may need it, namely dm and external
users (yes, block device level API is exported for external use). All the other
users of the block device API are converted to work at the superblock level.
[8/9] fsfreeze: add vfs ioctl to check freeze state
[9/9] fsfreeze: add block device ioctl to check freeze state
These ioctls can be used to by HA and monitoring software to check the freeze
state of block devices and filesystems.
---
In a follow-up patch set I will tackle a big outstanding problem: we allow
users to umount a frozen filesystem but we have no bdev level fsfreeze API to
thaw it after that. My plan is to add the bdev level API but this requires
some modifications to btrfs and vfs layer that I am still working on. By
the way, the approach of returning EBUSY on umount was discarded because it
would break lazy umount.
Changes since v3:
- Include right version of the emergency thaw fix (thanks go to Josef for the
heads up).
- Add iterate_supers_(read/write) helpers as suggested by Eric Sandeen.
- Fix typos.
- Add Eric Sandeen's Reviewed-by to patches 2 and 3.
Changes since v2:
- Rebase on top of Jan Kara's patch set which made it upstream.
- Do without the horrible "emergency" parameter as suggested by Jan Kara.
- New approach to fix emergency thaw deadlock on s_umount which does locking
properly.
Changes since v1:
- Do not break unfreezing of block devices without a mounted filesystem.
- Avoid conditional locking in thaw_bdev.
- Update filesystem locking documentation.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-09-14 6:45 ` Fernando Luis Vázquez Cao
2012-09-25 9:11 ` Jan Kara
2012-09-14 6:46 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
` (7 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:45 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
iterate_supers() calls a function provided by the caller with the s_umount
semaphore taken in read mode. However, there may be cases where write mode
is preferable, so we add __iterate_supers(), which lets one
specify the mode of the lock, and replace iterate_supers with two helpers
around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
This will be used to fix the emergency thaw (filesystem unfreeze) code, which
iterates over the list of superblocks but needs to hold the s_umount semaphore
in _write_ mode bebore carrying out the actual thaw operation.
This patch introduces no semantic changes since iterate_supers() users become
iterate_supers_read() which is equivalent.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/buffer.c linux-3.6-rc5/fs/buffer.c
--- linux-3.6-rc5-orig/fs/buffer.c 2012-09-14 11:53:42.972703234 +0900
+++ linux-3.6-rc5/fs/buffer.c 2012-09-14 12:02:04.892713525 +0900
@@ -521,7 +521,7 @@ static void do_thaw_one(struct super_blo
static void do_thaw_all(struct work_struct *work)
{
- iterate_supers(do_thaw_one, NULL);
+ iterate_supers_read(do_thaw_one, NULL);
kfree(work);
printk(KERN_WARNING "Emergency Thaw complete\n");
}
diff -urNp linux-3.6-rc5-orig/fs/drop_caches.c linux-3.6-rc5/fs/drop_caches.c
--- linux-3.6-rc5-orig/fs/drop_caches.c 2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/drop_caches.c 2012-09-14 12:01:30.524713362 +0900
@@ -59,7 +59,7 @@ int drop_caches_sysctl_handler(ctl_table
return ret;
if (write) {
if (sysctl_drop_caches & 1)
- iterate_supers(drop_pagecache_sb, NULL);
+ iterate_supers_read(drop_pagecache_sb, NULL);
if (sysctl_drop_caches & 2)
drop_slab();
}
diff -urNp linux-3.6-rc5-orig/fs/quota/quota.c linux-3.6-rc5/fs/quota/quota.c
--- linux-3.6-rc5-orig/fs/quota/quota.c 2012-09-14 11:53:43.388703307 +0900
+++ linux-3.6-rc5/fs/quota/quota.c 2012-09-14 12:00:31.568712693 +0900
@@ -58,7 +58,7 @@ static int quota_sync_all(int type)
return -EINVAL;
ret = security_quotactl(Q_SYNC, type, 0, NULL);
if (!ret)
- iterate_supers(quota_sync_one, &type);
+ iterate_supers_read(quota_sync_one, &type);
return ret;
}
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 11:53:43.416703312 +0900
+++ linux-3.6-rc5/fs/super.c 2012-09-14 12:30:52.188833193 +0900
@@ -537,14 +537,22 @@ void drop_super(struct super_block *sb)
EXPORT_SYMBOL(drop_super);
/**
- * iterate_supers - call function for all active superblocks
+ * __iterate_supers - call function for all active superblocks
* @f: function to call
* @arg: argument to pass to it
+ * @wlock: mode of superblock lock (false->read lock, true->write lock)
*
* Scans the superblock list and calls given function, passing it
* locked superblock and given argument.
+ *
+ * When the caller asks for the superblock lock (s_umount semaphore) to be
+ * taken in write mode, the lock is taken but not released because the
+ * function provided by the caller may deactivate the superblock itself.
+ * It is that function's job to unlock the superblock as needed in such a
+ * case.
*/
-void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
+static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
+ bool wlock)
{
struct super_block *sb, *p = NULL;
@@ -555,10 +563,19 @@ void iterate_supers(void (*f)(struct sup
sb->s_count++;
spin_unlock(&sb_lock);
- down_read(&sb->s_umount);
+ if (wlock)
+ down_write(&sb->s_umount);
+ else
+ down_read(&sb->s_umount);
+
if (sb->s_root && (sb->s_flags & MS_BORN))
f(sb, arg);
- up_read(&sb->s_umount);
+
+ /* When the semaphore was taken in write mode the function
+ * provided by the caller takes care of unlocking it as
+ * needed. See explanation above for details. */
+ if (!wlock)
+ up_read(&sb->s_umount);
spin_lock(&sb_lock);
if (p)
@@ -571,6 +588,36 @@ void iterate_supers(void (*f)(struct sup
}
/**
+ * iterate_supers_read - call function for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * a superblock locked in _read_ mode and given argument. The lock
+ * is automatically relased after the function return.
+ */
+void iterate_supers_read(void (*f)(struct super_block *, void *), void *arg)
+{
+ __iterate_supers(f, arg , false);
+}
+
+/**
+ * iterate_supers_write - call function for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * a superblock locked in _write_ mode and given argument. Diferently
+ * from its read counterpart, iterate_supers_write does not release
+ * the lock after the function return. See description of __iterate_supers
+ * above for details.
+ */
+void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+{
+ __iterate_supers(f, arg , true);
+}
+
+/**
* iterate_supers_type - call function for superblocks of given type
* @type: fs type
* @f: function to call
diff -urNp linux-3.6-rc5-orig/fs/sync.c linux-3.6-rc5/fs/sync.c
--- linux-3.6-rc5-orig/fs/sync.c 2012-09-14 11:53:43.416703312 +0900
+++ linux-3.6-rc5/fs/sync.c 2012-09-14 12:02:45.900713907 +0900
@@ -104,9 +104,9 @@ SYSCALL_DEFINE0(sync)
int nowait = 0, wait = 1;
wakeup_flusher_threads(0, WB_REASON_SYNC);
- iterate_supers(sync_inodes_one_sb, NULL);
- iterate_supers(sync_fs_one_sb, &nowait);
- iterate_supers(sync_fs_one_sb, &wait);
+ iterate_supers_read(sync_inodes_one_sb, NULL);
+ iterate_supers_read(sync_fs_one_sb, &nowait);
+ iterate_supers_read(sync_fs_one_sb, &wait);
iterate_bdevs(fdatawrite_one_bdev, NULL);
iterate_bdevs(fdatawait_one_bdev, NULL);
if (unlikely(laptop_mode))
@@ -122,11 +122,11 @@ static void do_sync_work(struct work_str
* Sync twice to reduce the possibility we skipped some inodes / pages
* because they were temporarily locked
*/
- iterate_supers(sync_inodes_one_sb, &nowait);
- iterate_supers(sync_fs_one_sb, &nowait);
+ iterate_supers_read(sync_inodes_one_sb, &nowait);
+ iterate_supers_read(sync_fs_one_sb, &nowait);
iterate_bdevs(fdatawrite_one_bdev, NULL);
- iterate_supers(sync_inodes_one_sb, &nowait);
- iterate_supers(sync_fs_one_sb, &nowait);
+ iterate_supers_read(sync_inodes_one_sb, &nowait);
+ iterate_supers_read(sync_fs_one_sb, &nowait);
iterate_bdevs(fdatawrite_one_bdev, NULL);
printk("Emergency Sync complete\n");
kfree(work);
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 11:53:43.556703343 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 12:03:50.352714271 +0900
@@ -2684,7 +2684,8 @@ extern struct super_block *get_super(str
extern struct super_block *get_super_thawed(struct block_device *);
extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
-extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
diff -urNp linux-3.6-rc5-orig/security/selinux/hooks.c linux-3.6-rc5/security/selinux/hooks.c
--- linux-3.6-rc5-orig/security/selinux/hooks.c 2012-09-14 11:53:44.484703545 +0900
+++ linux-3.6-rc5/security/selinux/hooks.c 2012-09-14 12:00:14.816712518 +0900
@@ -5755,7 +5755,7 @@ void selinux_complete_init(void)
/* Set up any superblocks initialized prior to the policy load. */
printk(KERN_DEBUG "SELinux: Setting up existing superblocks.\n");
- iterate_supers(delayed_superblock_init, NULL);
+ iterate_supers_read(delayed_superblock_init, NULL);
}
/* SELinux requires early initialization in order to label
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 2/9] fsfreeze: add unlocked version of thaw_super
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-09-14 6:45 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
@ 2012-09-14 6:46 ` Fernando Luis Vázquez Cao
2012-09-25 9:13 ` Jan Kara
2012-09-14 6:47 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
` (6 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:46 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
thaw_super may be called with superblock lock already taken (fsfreeze's
emergency thaw being one example), so we need an unlocked version to avoid
lockups.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 12:35:48.076847666 +0900
+++ linux-3.6-rc5/fs/super.c 2012-09-14 12:41:01.596850005 +0900
@@ -1437,40 +1437,59 @@ int freeze_super(struct super_block *sb)
EXPORT_SYMBOL(freeze_super);
/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw
*
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Unlocks the filesystem and marks it writeable again.
+ *
+ * This is the unlocked version of thaw_super, so it is the caller's job to
+ * to protect the superblock by grabbing the s_umount semaphore in write mode
+ * and release it again on return. See thaw_super() for an example.
*/
-int thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb)
{
- int error;
+ int error = 0;
- down_write(&sb->s_umount);
if (sb->s_writers.frozen == SB_UNFROZEN) {
- up_write(&sb->s_umount);
- return -EINVAL;
+ error = -EINVAL;
+ goto out;
}
if (sb->s_flags & MS_RDONLY)
- goto out;
+ goto out_thaw;
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
- up_write(&sb->s_umount);
- return error;
+ goto out;
}
}
-out:
+out_thaw:
sb->s_writers.frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
+out:
+ return error;
+}
- return 0;
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+ int res;
+ down_write(&sb->s_umount);
+ res = __thaw_super(sb);
+ if (!res)
+ deactivate_locked_super(sb);
+ else
+ up_write(&sb->s_umount);
+ return res;
}
EXPORT_SYMBOL(thaw_super);
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 12:35:48.076847666 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 12:37:39.304848651 +0900
@@ -2082,6 +2082,7 @@ extern int user_statfs(const char __user
extern int fd_statfs(int, struct kstatfs *);
extern int vfs_ustat(dev_t, struct kstatfs *);
extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
extern bool our_mnt(struct vfsmount *mnt);
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-09-14 6:45 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
2012-09-14 6:46 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
@ 2012-09-14 6:47 ` Fernando Luis Vázquez Cao
2012-09-14 6:48 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
` (5 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:47 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.
Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c 2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/block_dev.c 2012-07-12 13:22:38.124815295 +0900
@@ -319,22 +319,17 @@ int thaw_bdev(struct block_device *bdev,
if (!bdev->bd_fsfreeze_count)
goto out;
- error = 0;
- if (--bdev->bd_fsfreeze_count > 0)
- goto out;
-
- if (!sb)
+ if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+ error = 0;
goto out;
+ }
error = thaw_super(sb);
- if (error) {
+ if (error)
bdev->bd_fsfreeze_count++;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return error;
- }
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return 0;
+ return error;
}
EXPORT_SYMBOL(thaw_bdev);
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (2 preceding siblings ...)
2012-09-14 6:47 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
@ 2012-09-14 6:48 ` Fernando Luis Vázquez Cao
2012-09-25 9:24 ` Jan Kara
2012-09-14 6:50 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
` (4 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:48 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.
Use the unlocked version of thaw_super() to do the thawing and replace
iterate_supers() with __iterate_supers() so that the unfreeze operation can
be performed with s_umount held as the locking rules for fsfreeze indicate.
As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
when can get rid of the ugly while loop.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/buffer.c linux-3.6-rc5/fs/buffer.c
--- linux-3.6-rc5-orig/fs/buffer.c 2012-09-14 12:46:22.988871880 +0900
+++ linux-3.6-rc5/fs/buffer.c 2012-09-14 13:05:47.812919103 +0900
@@ -513,15 +513,27 @@ repeat:
static void do_thaw_one(struct super_block *sb, void *unused)
{
- char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
- printk(KERN_WARNING "Emergency Thaw on %s\n",
+ int res;
+
+ if (sb->s_bdev) {
+ char b[BDEVNAME_SIZE];
+ printk(KERN_WARNING "Emergency Thaw on %s.\n",
bdevname(sb->s_bdev, b));
+ }
+
+ /* We were called from __iterate_supers with superblock lock taken
+ * so we do not need to do it here. */
+ res = __thaw_super(sb);
+ if (!res)
+ deactivate_locked_super(sb);
+ else
+ up_write(&sb->s_umount);
+ return res;
}
static void do_thaw_all(struct work_struct *work)
{
- iterate_supers_read(do_thaw_one, NULL);
+ iterate_supers_write(do_thaw_one, NULL);
kfree(work);
printk(KERN_WARNING "Emergency Thaw complete\n");
}
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 5/9] xfs: switch to using super methods for fsfreeze
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (3 preceding siblings ...)
2012-09-14 6:48 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-09-14 6:50 ` Fernando Luis Vázquez Cao
2012-09-14 6:51 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
` (3 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:50 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
xfs uses the bdev interfaces despite the fact that the kernel is operating
at the superblock level here. Convert xfs to use the superblock interfaces
instead.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/xfs/xfs_fsops.c linux-3.6-rc5/fs/xfs/xfs_fsops.c
--- linux-3.6-rc5-orig/fs/xfs/xfs_fsops.c 2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/xfs/xfs_fsops.c 2012-09-14 13:20:44.525043726 +0900
@@ -664,16 +664,12 @@ xfs_fs_goingdown(
__uint32_t inflags)
{
switch (inflags) {
- case XFS_FSOP_GOING_FLAGS_DEFAULT: {
- struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
- if (sb && !IS_ERR(sb)) {
+ case XFS_FSOP_GOING_FLAGS_DEFAULT:
+ if (!freeze_super(mp->m_super)) {
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
- thaw_bdev(sb->s_bdev, sb);
+ thaw_super(mp->m_super);
}
-
break;
- }
case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
break;
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (4 preceding siblings ...)
2012-09-14 6:50 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
@ 2012-09-14 6:51 ` Fernando Luis Vázquez Cao
2012-09-14 6:53 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
` (2 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:51 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/buffer.c linux-3.6-rc5/fs/buffer.c
--- linux-3.6-rc5-orig/fs/buffer.c 2012-09-14 13:18:41.801042524 +0900
+++ linux-3.6-rc5/fs/buffer.c 2012-09-14 13:30:11.605062587 +0900
@@ -511,49 +511,6 @@ repeat:
return err;
}
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
- int res;
-
- if (sb->s_bdev) {
- char b[BDEVNAME_SIZE];
- printk(KERN_WARNING "Emergency Thaw on %s.\n",
- bdevname(sb->s_bdev, b));
- }
-
- /* We were called from __iterate_supers with superblock lock taken
- * so we do not need to do it here. */
- res = __thaw_super(sb);
- if (!res)
- deactivate_locked_super(sb);
- else
- up_write(&sb->s_umount);
- return res;
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
- iterate_supers_write(do_thaw_one, NULL);
- kfree(work);
- printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
- struct work_struct *work;
-
- work = kmalloc(sizeof(*work), GFP_ATOMIC);
- if (work) {
- INIT_WORK(work, do_thaw_all);
- schedule_work(work);
- }
-}
-
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 12:46:26.884872000 +0900
+++ linux-3.6-rc5/fs/super.c 2012-09-14 13:34:17.125068798 +0900
@@ -612,7 +612,7 @@ void iterate_supers_read(void (*f)(struc
* the lock after the function return. See description of __iterate_supers
* above for details.
*/
-void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+static void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
{
__iterate_supers(f, arg , true);
}
@@ -1446,7 +1446,7 @@ EXPORT_SYMBOL(freeze_super);
* to protect the superblock by grabbing the s_umount semaphore in write mode
* and release it again on return. See thaw_super() for an example.
*/
-int __thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb)
{
int error = 0;
@@ -1493,3 +1493,46 @@ int thaw_super(struct super_block *sb)
return res;
}
EXPORT_SYMBOL(thaw_super);
+
+static void do_thaw_one(struct super_block *sb, void *unused)
+{
+ int res;
+
+ if (sb->s_bdev) {
+ char b[BDEVNAME_SIZE];
+ printk(KERN_WARNING "Emergency Thaw on %s.\n",
+ bdevname(sb->s_bdev, b));
+ }
+
+ /* We were called from __iterate_supers with superblock lock taken
+ * so we do not need to do it here. */
+ res = __thaw_super(sb);
+ if (!res)
+ deactivate_locked_super(sb);
+ else
+ up_write(&sb->s_umount);
+ return res;
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+ iterate_supers_write(do_thaw_one, NULL);
+ kfree(work);
+ printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
+ */
+void emergency_thaw_all(void)
+{
+ struct work_struct *work;
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (work) {
+ INIT_WORK(work, do_thaw_all);
+ schedule_work(work);
+ }
+}
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 12:46:26.884872000 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 13:32:52.993064282 +0900
@@ -2082,7 +2082,6 @@ extern int user_statfs(const char __user
extern int fd_statfs(int, struct kstatfs *);
extern int vfs_ustat(dev_t, struct kstatfs *);
extern int freeze_super(struct super_block *super);
-extern int __thaw_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
extern bool our_mnt(struct vfsmount *mnt);
@@ -2686,7 +2685,6 @@ extern struct super_block *get_super_tha
extern struct super_block *get_active_super(struct block_device *bdev);
extern void drop_super(struct super_block *sb);
extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
-extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (5 preceding siblings ...)
2012-09-14 6:51 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
@ 2012-09-14 6:53 ` Fernando Luis Vázquez Cao
2012-09-14 19:20 ` Eric Sandeen
2012-09-25 9:48 ` Jan Kara
2012-09-14 6:54 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-09-14 6:55 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
8 siblings, 2 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:53 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
thaw_bdev() has re-entrancy guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).
Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.
Move the re-enterency guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again. Caveat: Things work as expected as long as direct calls to
thaw_super are always in response to a previous sb level freeze. In
other words an unpaired call to thaw_super can still thaw a
filesystem frozen using freeze_bdev (this issue could be addressed
in a follow-up patch if deemed necessary).
This patch retains the bdev level mutex and counter to keep the
"feature" that we can freeze a block device that does not have a
filesystem mounted yet.
IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
argued that it is too late to fix the userland ABI breakage caused by that
patch and that the current ABI is the one that should be kept. If this is the
respective maintainer(s) opinion this could be modified to not allow fsfreeze
ioctl nesting.
Changes since version 2:
- Fix reference count leak in freeze_super when MS_BORN flag is not set in
the superblock.
- Protect freeze counter using s_umount and get rid of superblock level
fsfreeze mutex.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/block_dev.c linux-3.6-rc5/fs/block_dev.c
--- linux-3.6-rc5-orig/fs/block_dev.c 2012-09-14 12:48:53.336884715 +0900
+++ linux-3.6-rc5/fs/block_dev.c 2012-09-14 13:51:36.457205813 +0900
@@ -257,16 +257,18 @@ int fsync_bdev(struct block_device *bdev
EXPORT_SYMBOL(fsync_bdev);
/**
- * freeze_bdev -- lock a filesystem and force it into a consistent state
+ * freeze_bdev -- lock a block device
* @bdev: blockdevice to lock
*
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
+ * Locks the block device and, if present, the associated filesystem too.
+ *
+ * The reference counter (bd_fsfreeze_count) is used to implement the feature
+ * that allows one to freeze a block device that does not have a filesystem
+ * mounted yet. For filesystems using mount_bdev the kernel takes care of
+ * things by preventing the mount operation from succeeding if the underlying
+ * block device is frozen. Other filesystems should check this counter or risk
+ * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race,
+ * which may lead to inconsistencies.
*/
struct super_block *freeze_bdev(struct block_device *bdev)
{
@@ -274,17 +276,7 @@ struct super_block *freeze_bdev(struct b
int error = 0;
mutex_lock(&bdev->bd_fsfreeze_mutex);
- if (++bdev->bd_fsfreeze_count > 1) {
- /*
- * We don't even need to grab a reference - the first call
- * to freeze_bdev grab an active reference and only the last
- * thaw_bdev drops it.
- */
- sb = get_super(bdev);
- drop_super(sb);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return sb;
- }
+ bdev->bd_fsfreeze_count++;
sb = get_active_super(bdev);
if (!sb)
@@ -297,29 +289,32 @@ struct super_block *freeze_bdev(struct b
return ERR_PTR(error);
}
deactivate_super(sb);
- out:
+out:
sync_blockdev(bdev);
mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return sb; /* thaw_bdev releases s->s_umount */
+ return sb;
}
EXPORT_SYMBOL(freeze_bdev);
/**
- * thaw_bdev -- unlock filesystem
+ * thaw_bdev -- unlock a block device
* @bdev: blockdevice to unlock
* @sb: associated superblock
*
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ * Unlocks the block device and, if present, the associated filesystem too.
*/
int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
int error = -EINVAL;
mutex_lock(&bdev->bd_fsfreeze_mutex);
+
if (!bdev->bd_fsfreeze_count)
goto out;
- if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+ bdev->bd_fsfreeze_count--;
+
+ if (!sb) {
error = 0;
goto out;
}
diff -urNp linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c linux-3.6-rc5/fs/gfs2/ops_fstype.c
--- linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c 2012-09-14 12:46:15.152871285 +0900
+++ linux-3.6-rc5/fs/gfs2/ops_fstype.c 2012-09-14 13:51:36.457205813 +0900
@@ -1288,11 +1288,6 @@ static struct dentry *gfs2_mount(struct
if (IS_ERR(bdev))
return ERR_CAST(bdev);
- /*
- * once the super is inserted into the list by sget, s_umount
- * will protect the lockfs code from trying to start a snapshot
- * while we are mounting
- */
mutex_lock(&bdev->bd_fsfreeze_mutex);
if (bdev->bd_fsfreeze_count > 0) {
mutex_unlock(&bdev->bd_fsfreeze_mutex);
diff -urNp linux-3.6-rc5-orig/fs/nilfs2/super.c linux-3.6-rc5/fs/nilfs2/super.c
--- linux-3.6-rc5-orig/fs/nilfs2/super.c 2012-09-14 12:46:15.280871314 +0900
+++ linux-3.6-rc5/fs/nilfs2/super.c 2012-09-14 13:51:36.485205815 +0900
@@ -1273,11 +1273,6 @@ nilfs_mount(struct file_system_type *fs_
goto failed;
}
- /*
- * once the super is inserted into the list by sget, s_umount
- * will protect the lockfs code from trying to start a snapshot
- * while we are mounting
- */
mutex_lock(&sd.bdev->bd_fsfreeze_mutex);
if (sd.bdev->bd_fsfreeze_count > 0) {
mutex_unlock(&sd.bdev->bd_fsfreeze_mutex);
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 13:46:38.237179501 +0900
+++ linux-3.6-rc5/fs/super.c 2012-09-14 13:54:12.461207150 +0900
@@ -1041,11 +1041,6 @@ struct dentry *mount_bdev(struct file_sy
if (IS_ERR(bdev))
return ERR_CAST(bdev);
- /*
- * once the super is inserted into the list by sget, s_umount
- * will protect the lockfs code from trying to start a snapshot
- * while we are mounting
- */
mutex_lock(&bdev->bd_fsfreeze_mutex);
if (bdev->bd_fsfreeze_count > 0) {
mutex_unlock(&bdev->bd_fsfreeze_mutex);
@@ -1339,8 +1334,11 @@ static void sb_wait_write(struct super_b
* @sb: the super to lock
*
* Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs. Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and counts down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
*
* During this function, sb->s_writers.frozen goes through these values:
*
@@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
* freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
* mostly auxiliary for filesystems to verify they do not modify frozen fs.
*
- * sb->s_writers.frozen is protected by sb->s_umount.
+ * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
*/
int freeze_super(struct super_block *sb)
{
- int ret;
+ int ret = 0;
atomic_inc(&sb->s_active);
down_write(&sb->s_umount);
- if (sb->s_writers.frozen != SB_UNFROZEN) {
- deactivate_locked_super(sb);
- return -EBUSY;
- }
if (!(sb->s_flags & MS_BORN)) {
- up_write(&sb->s_umount);
- return 0; /* sic - it's "nothing to do" */
+ atomic_dec(&sb->s_active);
+ goto out_active; /* sic - it's "nothing to do" */
}
+ if (++sb->s_freeze_count > 1)
+ goto out_deactivate;
+
if (sb->s_flags & MS_RDONLY) {
/* Nothing to do really... */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
- up_write(&sb->s_umount);
- return 0;
+ goto out_active;
}
/* From now on, no new normal writers can start */
@@ -1419,11 +1415,11 @@ int freeze_super(struct super_block *sb)
if (ret) {
printk(KERN_ERR
"VFS:Filesystem freeze failed\n");
+ sb->s_freeze_count--;
sb->s_writers.frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
- return ret;
+ goto out_deactivate;
}
}
/*
@@ -1431,8 +1427,12 @@ int freeze_super(struct super_block *sb)
* sees write activity when frozen is set to SB_FREEZE_COMPLETE.
*/
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+out_active:
up_write(&sb->s_umount);
- return 0;
+ return ret;
+out_deactivate:
+ deactivate_locked_super(sb);
+ return ret;
}
EXPORT_SYMBOL(freeze_super);
@@ -1442,6 +1442,10 @@ EXPORT_SYMBOL(freeze_super);
*
* Unlocks the filesystem and marks it writeable again.
*
+ * Returns -EINVAL if @sb is not frozen, 0 if the filesystem specific unfreeze
+ * function was executed and succeeded or the corresponding error code
+ * otherwise. if the unfreeze fails, @sb is left in the frozen state.
+ *
* This is the unlocked version of thaw_super, so it is the caller's job to
* to protect the superblock by grabbing the s_umount semaphore in write mode
* and release it again on return. See thaw_super() for an example.
@@ -1450,11 +1454,14 @@ static int __thaw_super(struct super_blo
{
int error = 0;
- if (sb->s_writers.frozen == SB_UNFROZEN) {
+ if (!sb->s_freeze_count) {
error = -EINVAL;
goto out;
}
+ if (--sb->s_freeze_count > 0)
+ goto out;
+
if (sb->s_flags & MS_RDONLY)
goto out_thaw;
@@ -1463,6 +1470,7 @@ static int __thaw_super(struct super_blo
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
+ sb->s_freeze_count++;
goto out;
}
}
@@ -1506,6 +1514,7 @@ static void do_thaw_one(struct super_blo
/* We were called from __iterate_supers with superblock lock taken
* so we do not need to do it here. */
+ sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
res = __thaw_super(sb);
if (!res)
deactivate_locked_super(sb);
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 13:46:38.325179510 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 13:51:36.485205815 +0900
@@ -1578,6 +1578,8 @@ struct super_block {
/* Being remounted read-only */
int s_readonly_remount;
+
+ int s_freeze_count; /* nr of nested freezes */
};
/* superblock cache pruning functions */
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (6 preceding siblings ...)
2012-09-14 6:53 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-09-14 6:54 ` Fernando Luis Vázquez Cao
2012-09-14 6:55 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
8 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:54 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
The FIISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of a mounted filesystem.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/fs/compat_ioctl.c linux-3.6-rc5/fs/compat_ioctl.c
--- linux-3.6-rc5-orig/fs/compat_ioctl.c 2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/compat_ioctl.c 2012-09-14 15:03:25.481355872 +0900
@@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
/* 'X' - originally XFS but some now in the VFS */
COMPATIBLE_IOCTL(FIFREEZE)
COMPATIBLE_IOCTL(FITHAW)
+COMPATIBLE_IOCTL(FIISFROZEN)
COMPATIBLE_IOCTL(KDGETKEYCODE)
COMPATIBLE_IOCTL(KDSETKEYCODE)
COMPATIBLE_IOCTL(KDGKBTYPE)
diff -urNp linux-3.6-rc5-orig/fs/ioctl.c linux-3.6-rc5/fs/ioctl.c
--- linux-3.6-rc5-orig/fs/ioctl.c 2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/ioctl.c 2012-09-14 15:03:25.481355872 +0900
@@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
return thaw_super(sb);
}
+static int ioctl_fs_isfrozen(struct file *filp)
+{
+ struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ return isfrozen_super(sb);
+}
+
/*
* When you add any new common ioctls to the switches above and below
* please update compat_sys_ioctl() too.
@@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
error = ioctl_fsthaw(filp);
break;
+ case FIISFROZEN:
+ error = ioctl_fs_isfrozen(filp);
+ if (error >= 0)
+ return put_user(error, (int __user *)arg);
+ break;
+
case FS_IOC_FIEMAP:
return ioctl_fiemap(filp, arg);
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 14:03:18.653312262 +0900
+++ linux-3.6-rc5/fs/super.c 2012-09-14 15:03:25.485355873 +0900
@@ -1545,3 +1545,8 @@ void emergency_thaw_all(void)
schedule_work(work);
}
}
+
+int isfrozen_super(struct super_block *sb)
+{
+ return sb->s_writers.frozen > SB_UNFROZEN ? 1 : 0;
+}
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 14:03:18.993312323 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 15:03:25.485355873 +0900
@@ -342,6 +342,7 @@ struct inodes_stat_t {
#define FIFREEZE _IOWR('X', 119, int) /* Freeze */
#define FITHAW _IOWR('X', 120, int) /* Thaw */
#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
+#define FIISFROZEN _IOR('X', 122, int) /* get sb freeze state */
#define FS_IOC_GETFLAGS _IOR('f', 1, long)
#define FS_IOC_SETFLAGS _IOW('f', 2, long)
@@ -2085,6 +2086,7 @@ extern int fd_statfs(int, struct kstatfs
extern int vfs_ustat(dev_t, struct kstatfs *);
extern int freeze_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
+extern int isfrozen_super(struct super_block *sb);
extern bool our_mnt(struct vfsmount *mnt);
extern int current_umask(void);
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 9/9] fsfreeze: add block device ioctl to check freeze state
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
` (7 preceding siblings ...)
2012-09-14 6:54 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
@ 2012-09-14 6:55 ` Fernando Luis Vázquez Cao
8 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14 6:55 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel, fernando
The BLKISFROZEN ioctl can be used to check the freeze state of a block device
(it is possible to freeze a block device that does not have a filesystem
sitting on top of it).
If allowing the umounting of frozen filesystems is deemed acceptable this ioctl
will be extended to check the state of the associated superblock if any.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6-rc5-orig/block/compat_ioctl.c linux-3.6-rc5/block/compat_ioctl.c
--- linux-3.6-rc5-orig/block/compat_ioctl.c 2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/block/compat_ioctl.c 2012-09-14 15:07:42.581375326 +0900
@@ -746,6 +746,13 @@ long compat_blkdev_ioctl(struct file *fi
case BLKTRACETEARDOWN: /* compatible */
ret = blk_trace_ioctl(bdev, cmd, compat_ptr(arg));
return ret;
+ case BLKISFROZEN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ ret = isfrozen_bdev(bdev);
+ if (ret >= 0)
+ return compat_put_int(arg, ret);
+ return ret;
default:
if (disk->fops->compat_ioctl)
ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
diff -urNp linux-3.6-rc5-orig/block/ioctl.c linux-3.6-rc5/block/ioctl.c
--- linux-3.6-rc5-orig/block/ioctl.c 2012-09-14 12:46:09.984870443 +0900
+++ linux-3.6-rc5/block/ioctl.c 2012-09-14 15:07:42.593375327 +0900
@@ -396,6 +396,13 @@ int blkdev_ioctl(struct block_device *bd
case BLKTRACETEARDOWN:
ret = blk_trace_ioctl(bdev, cmd, (char __user *) arg);
break;
+ case BLKISFROZEN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ ret = isfrozen_bdev(bdev);
+ if (ret >= 0)
+ return put_int(arg, ret);
+ return ret;
default:
ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
}
diff -urNp linux-3.6-rc5-orig/fs/block_dev.c linux-3.6-rc5/fs/block_dev.c
--- linux-3.6-rc5-orig/fs/block_dev.c 2012-09-14 14:03:17.861312116 +0900
+++ linux-3.6-rc5/fs/block_dev.c 2012-09-14 15:07:42.593375327 +0900
@@ -328,6 +328,11 @@ out:
}
EXPORT_SYMBOL(thaw_bdev);
+int isfrozen_bdev(struct block_device *bdev)
+{
+ return bdev->bd_fsfreeze_count > 0;
+}
+
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 15:07:19.161375105 +0900
+++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 15:07:42.593375327 +0900
@@ -335,6 +335,7 @@ struct inodes_stat_t {
#define BLKDISCARDZEROES _IO(0x12,124)
#define BLKSECDISCARD _IO(0x12,125)
#define BLKROTATIONAL _IO(0x12,126)
+#define BLKISFROZEN _IOR(0x12,127, int) /* get file system freeze state */
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
@@ -2250,6 +2251,7 @@ extern void kill_bdev(struct block_devic
extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);
extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int isfrozen_bdev(struct block_device *bdev);
extern int fsync_bdev(struct block_device *);
#else
static inline void bd_forget(struct inode *inode) {}
@@ -2270,6 +2272,11 @@ static inline int thaw_bdev(struct block
static inline void iterate_bdevs(void (*f)(struct block_device *, void *), void *arg)
{
}
+
+static int isfrozen_bdev(struct block_device *bdev)
+{
+ return 0;
+}
#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-14 6:53 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-09-14 19:20 ` Eric Sandeen
2012-09-15 1:15 ` Eric Sandeen
2012-09-25 9:48 ` Jan Kara
1 sibling, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2012-09-14 19:20 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Dave Chinner, Christoph Hellwig, Jan Kara,
linux-fsdevel, fernando
On 9/14/12 1:53 AM, Fernando Luis Vázquez Cao wrote:
> thaw_bdev() has re-entrancy guards to allow freezes to nest
> together. That is, it ensures that the filesystem is not thawed
> until the last thaw command is issued. This is needed to prevent the
> filesystem from being unfrozen while an existing freezer is still
> operating on the filesystem in a frozen state (e.g. dm-snapshot).
>
> Currently, freeze_super() and thaw_super() bypasses these guards,
> and as a result manual freezing and unfreezing via the ioctl methods
> do not nest correctly. hence mixing userspace directed freezes with
> block device level freezes result in inconsistency due to premature
> thawing of the filesystem.
>
> Move the re-enterency guards to the superblock and into freeze_super
> and thaw_super() so that userspace directed freezes nest correctly
> again. Caveat: Things work as expected as long as direct calls to
> thaw_super are always in response to a previous sb level freeze. In
> other words an unpaired call to thaw_super can still thaw a
> filesystem frozen using freeze_bdev (this issue could be addressed
> in a follow-up patch if deemed necessary).
>
> This patch retains the bdev level mutex and counter to keep the
> "feature" that we can freeze a block device that does not have a
> filesystem mounted yet.
>
> IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
> which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
> ("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
> argued that it is too late to fix the userland ABI breakage caused by that
> patch and that the current ABI is the one that should be kept. If this is the
> respective maintainer(s) opinion this could be modified to not allow fsfreeze
> ioctl nesting.
I think what's more important is that a given process or person can expect
their freeze to last until they issue an unfreeze, but maybe there are
counter-arguments.
> Changes since version 2:
> - Fix reference count leak in freeze_super when MS_BORN flag is not set in
> the superblock.
> - Protect freeze counter using s_umount and get rid of superblock level
> fsfreeze mutex.
>
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
<big snip for now>
> @@ -1506,6 +1514,7 @@ static void do_thaw_one(struct super_blo
>
> /* We were called from __iterate_supers with superblock lock taken
> * so we do not need to do it here. */
> + sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
> res = __thaw_super(sb);
Hm, freeze_super() did:
atomic_inc(&sb->s_active);
as well as
if (++sb->s_freeze_count > 1)
for each successful nested freeze.
so won't this leave a bunch of active refs on the superblock?
> if (!res)
> deactivate_locked_super(sb);
> diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
> --- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 13:46:38.325179510 +0900
> +++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 13:51:36.485205815 +0900
> @@ -1578,6 +1578,8 @@ struct super_block {
>
> /* Being remounted read-only */
> int s_readonly_remount;
> +
> + int s_freeze_count; /* nr of nested freezes */
> };
>
> /* superblock cache pruning functions */
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-14 19:20 ` Eric Sandeen
@ 2012-09-15 1:15 ` Eric Sandeen
0 siblings, 0 replies; 30+ messages in thread
From: Eric Sandeen @ 2012-09-15 1:15 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Dave Chinner, Christoph Hellwig, Jan Kara,
linux-fsdevel, fernando
On 9/14/12 2:20 PM, Eric Sandeen wrote:
> On 9/14/12 1:53 AM, Fernando Luis Vázquez Cao wrote:
>> thaw_bdev() has re-entrancy guards to allow freezes to nest
>> together. That is, it ensures that the filesystem is not thawed
>> until the last thaw command is issued. This is needed to prevent the
>> filesystem from being unfrozen while an existing freezer is still
>> operating on the filesystem in a frozen state (e.g. dm-snapshot).
>>
>> Currently, freeze_super() and thaw_super() bypasses these guards,
>> and as a result manual freezing and unfreezing via the ioctl methods
>> do not nest correctly. hence mixing userspace directed freezes with
>> block device level freezes result in inconsistency due to premature
>> thawing of the filesystem.
>>
>> Move the re-enterency guards to the superblock and into freeze_super
>> and thaw_super() so that userspace directed freezes nest correctly
>> again. Caveat: Things work as expected as long as direct calls to
>> thaw_super are always in response to a previous sb level freeze. In
>> other words an unpaired call to thaw_super can still thaw a
>> filesystem frozen using freeze_bdev (this issue could be addressed
>> in a follow-up patch if deemed necessary).
>>
>> This patch retains the bdev level mutex and counter to keep the
>> "feature" that we can freeze a block device that does not have a
>> filesystem mounted yet.
>>
>> IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
>> which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
>> ("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
>> argued that it is too late to fix the userland ABI breakage caused by that
>> patch and that the current ABI is the one that should be kept. If this is the
>> respective maintainer(s) opinion this could be modified to not allow fsfreeze
>> ioctl nesting.
>
> I think what's more important is that a given process or person can expect
> their freeze to last until they issue an unfreeze, but maybe there are
> counter-arguments.
>
>> Changes since version 2:
>> - Fix reference count leak in freeze_super when MS_BORN flag is not set in
>> the superblock.
>> - Protect freeze counter using s_umount and get rid of superblock level
>> fsfreeze mutex.
>>
>> Cc: Josef Bacik <jbacik@fusionio.com>
>> Cc: Eric Sandeen <sandeen@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> ---
>
> <big snip for now>
>
>> @@ -1506,6 +1514,7 @@ static void do_thaw_one(struct super_blo
>>
>> /* We were called from __iterate_supers with superblock lock taken
>> * so we do not need to do it here. */
>> + sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
>> res = __thaw_super(sb);
>
> Hm, freeze_super() did:
>
> atomic_inc(&sb->s_active);
>
> as well as
>
> if (++sb->s_freeze_count > 1)
>
> for each successful nested freeze.
>
> so won't this leave a bunch of active refs on the superblock?
Sorry, Fernando pointed out that I missed the fact that we only keep 1 active
ref on s_active even for nested freezers. So this is ok.
-Eric
>
>> if (!res)
>> deactivate_locked_super(sb);
>> diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
>> --- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 13:46:38.325179510 +0900
>> +++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 13:51:36.485205815 +0900
>> @@ -1578,6 +1578,8 @@ struct super_block {
>>
>> /* Being remounted read-only */
>> int s_readonly_remount;
>> +
>> + int s_freeze_count; /* nr of nested freezes */
>> };
>>
>> /* superblock cache pruning functions */
>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
2012-09-14 6:45 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
@ 2012-09-25 9:11 ` Jan Kara
2012-09-25 9:42 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 9:11 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, Jan Kara, linux-fsdevel, fernando
On Fri 14-09-12 15:45:04, Fernando Luis Vázquez Cao wrote:
> iterate_supers() calls a function provided by the caller with the s_umount
> semaphore taken in read mode. However, there may be cases where write mode
> is preferable, so we add __iterate_supers(), which lets one
> specify the mode of the lock, and replace iterate_supers with two helpers
> around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
>
> This will be used to fix the emergency thaw (filesystem unfreeze) code, which
> iterates over the list of superblocks but needs to hold the s_umount semaphore
> in _write_ mode bebore carrying out the actual thaw operation.
>
> This patch introduces no semantic changes since iterate_supers() users become
> iterate_supers_read() which is equivalent.
>
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
...
> diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
> --- linux-3.6-rc5-orig/fs/super.c 2012-09-14 11:53:43.416703312 +0900
> +++ linux-3.6-rc5/fs/super.c 2012-09-14 12:30:52.188833193 +0900
> @@ -537,14 +537,22 @@ void drop_super(struct super_block *sb)
> EXPORT_SYMBOL(drop_super);
>
> /**
> - * iterate_supers - call function for all active superblocks
> + * __iterate_supers - call function for all active superblocks
> * @f: function to call
> * @arg: argument to pass to it
> + * @wlock: mode of superblock lock (false->read lock, true->write lock)
> *
> * Scans the superblock list and calls given function, passing it
> * locked superblock and given argument.
> + *
> + * When the caller asks for the superblock lock (s_umount semaphore) to be
> + * taken in write mode, the lock is taken but not released because the
> + * function provided by the caller may deactivate the superblock itself.
> + * It is that function's job to unlock the superblock as needed in such a
> + * case.
> */
> -void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> +static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
> + bool wlock)
> {
> struct super_block *sb, *p = NULL;
>
> @@ -555,10 +563,19 @@ void iterate_supers(void (*f)(struct sup
> sb->s_count++;
> spin_unlock(&sb_lock);
>
> - down_read(&sb->s_umount);
> + if (wlock)
> + down_write(&sb->s_umount);
> + else
> + down_read(&sb->s_umount);
> +
> if (sb->s_root && (sb->s_flags & MS_BORN))
> f(sb, arg);
> - up_read(&sb->s_umount);
> +
> + /* When the semaphore was taken in write mode the function
> + * provided by the caller takes care of unlocking it as
> + * needed. See explanation above for details. */
> + if (!wlock)
> + up_read(&sb->s_umount);
>
> spin_lock(&sb_lock);
> if (p)
These locking rules are ugly and counterintuitive. People will easily
get them wrong and create bugs. I'd rather see emergency thaw retake the
s_umount semaphore so that iterate_supers() can drop it...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/9] fsfreeze: add unlocked version of thaw_super
2012-09-14 6:46 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
@ 2012-09-25 9:13 ` Jan Kara
2012-09-25 9:43 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 9:13 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, Jan Kara, linux-fsdevel, fernando
On Fri 14-09-12 15:46:23, Fernando Luis Vázquez Cao wrote:
> thaw_super may be called with superblock lock already taken (fsfreeze's
> emergency thaw being one example), so we need an unlocked version to avoid
> lockups.
>
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
>
> diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
> --- linux-3.6-rc5-orig/fs/super.c 2012-09-14 12:35:48.076847666 +0900
> +++ linux-3.6-rc5/fs/super.c 2012-09-14 12:41:01.596850005 +0900
> @@ -1437,40 +1437,59 @@ int freeze_super(struct super_block *sb)
> EXPORT_SYMBOL(freeze_super);
>
> /**
> - * thaw_super -- unlock filesystem
> + * __thaw_super -- unlock filesystem
> * @sb: the super to thaw
> *
> - * Unlocks the filesystem and marks it writeable again after freeze_super().
> + * Unlocks the filesystem and marks it writeable again.
> + *
> + * This is the unlocked version of thaw_super, so it is the caller's job to
> + * to protect the superblock by grabbing the s_umount semaphore in write mode
> + * and release it again on return. See thaw_super() for an example.
> */
> -int thaw_super(struct super_block *sb)
> +int __thaw_super(struct super_block *sb)
> {
> - int error;
> + int error = 0;
>
> - down_write(&sb->s_umount);
> if (sb->s_writers.frozen == SB_UNFROZEN) {
> - up_write(&sb->s_umount);
> - return -EINVAL;
> + error = -EINVAL;
> + goto out;
> }
>
> if (sb->s_flags & MS_RDONLY)
> - goto out;
> + goto out_thaw;
>
> if (sb->s_op->unfreeze_fs) {
> error = sb->s_op->unfreeze_fs(sb);
> if (error) {
> printk(KERN_ERR
> "VFS:Filesystem thaw failed\n");
> - up_write(&sb->s_umount);
> - return error;
> + goto out;
> }
> }
>
> -out:
> +out_thaw:
> sb->s_writers.frozen = SB_UNFROZEN;
> smp_wmb();
> wake_up(&sb->s_writers.wait_unfrozen);
> - deactivate_locked_super(sb);
> +out:
> + return error;
> +}
>
> - return 0;
> +/**
> + * thaw_super -- unlock filesystem
> + * @sb: the super to thaw
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_super().
> + */
> +int thaw_super(struct super_block *sb)
> +{
> + int res;
> + down_write(&sb->s_umount);
> + res = __thaw_super(sb);
> + if (!res)
> + deactivate_locked_super(sb);
> + else
> + up_write(&sb->s_umount);
> + return res;
> }
> EXPORT_SYMBOL(thaw_super);
> diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
> --- linux-3.6-rc5-orig/include/linux/fs.h 2012-09-14 12:35:48.076847666 +0900
> +++ linux-3.6-rc5/include/linux/fs.h 2012-09-14 12:37:39.304848651 +0900
> @@ -2082,6 +2082,7 @@ extern int user_statfs(const char __user
> extern int fd_statfs(int, struct kstatfs *);
> extern int vfs_ustat(dev_t, struct kstatfs *);
> extern int freeze_super(struct super_block *super);
> +extern int __thaw_super(struct super_block *super);
> extern int thaw_super(struct super_block *super);
> extern bool our_mnt(struct vfsmount *mnt);
>
>
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
2012-09-14 6:48 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-09-25 9:24 ` Jan Kara
2012-09-25 10:31 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 9:24 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, Jan Kara, linux-fsdevel, fernando
On Fri 14-09-12 15:48:56, Fernando Luis Vázquez Cao wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
>
> Use the unlocked version of thaw_super() to do the thawing and replace
> iterate_supers() with __iterate_supers() so that the unfreeze operation can
> be performed with s_umount held as the locking rules for fsfreeze indicate.
>
> As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
> when can get rid of the ugly while loop.
Hum, but that will leave block devices frozen? Isn't that a bug? It
definitely creates an inconsistent state. I think emergency thaw should
also iterate over all block devices (via iterate_bdevs()) and thaw all of
them (and no, this does not allow us to avoid the iterate_super() changes
because there are filesystems without block devices...).
Honza
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
>
> diff -urNp linux-3.6-rc5-orig/fs/buffer.c linux-3.6-rc5/fs/buffer.c
> --- linux-3.6-rc5-orig/fs/buffer.c 2012-09-14 12:46:22.988871880 +0900
> +++ linux-3.6-rc5/fs/buffer.c 2012-09-14 13:05:47.812919103 +0900
> @@ -513,15 +513,27 @@ repeat:
>
> static void do_thaw_one(struct super_block *sb, void *unused)
> {
> - char b[BDEVNAME_SIZE];
> - while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
> - printk(KERN_WARNING "Emergency Thaw on %s\n",
> + int res;
> +
> + if (sb->s_bdev) {
> + char b[BDEVNAME_SIZE];
> + printk(KERN_WARNING "Emergency Thaw on %s.\n",
> bdevname(sb->s_bdev, b));
> + }
> +
> + /* We were called from __iterate_supers with superblock lock taken
> + * so we do not need to do it here. */
> + res = __thaw_super(sb);
> + if (!res)
> + deactivate_locked_super(sb);
> + else
> + up_write(&sb->s_umount);
> + return res;
> }
>
> static void do_thaw_all(struct work_struct *work)
> {
> - iterate_supers_read(do_thaw_one, NULL);
> + iterate_supers_write(do_thaw_one, NULL);
> kfree(work);
> printk(KERN_WARNING "Emergency Thaw complete\n");
> }
>
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
2012-09-25 9:11 ` Jan Kara
@ 2012-09-25 9:42 ` Fernando Luis Vazquez Cao
2012-09-25 9:52 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-25 9:42 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel, fernando
On 2012年09月25日 18:11, Jan Kara wrote:
> On Fri 14-09-12 15:45:04, Fernando Luis Vázquez Cao wrote:
>> iterate_supers() calls a function provided by the caller with the s_umount
>> semaphore taken in read mode. However, there may be cases where write mode
>> is preferable, so we add __iterate_supers(), which lets one
>> specify the mode of the lock, and replace iterate_supers with two helpers
>> around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
>>
>> This will be used to fix the emergency thaw (filesystem unfreeze) code, which
>> iterates over the list of superblocks but needs to hold the s_umount semaphore
>> in _write_ mode bebore carrying out the actual thaw operation.
>>
>> This patch introduces no semantic changes since iterate_supers() users become
>> iterate_supers_read() which is equivalent.
>>
>> Cc: Josef Bacik <jbacik@fusionio.com>
>> Cc: Eric Sandeen <sandeen@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> ---
> ...
>> diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
>> --- linux-3.6-rc5-orig/fs/super.c 2012-09-14 11:53:43.416703312 +0900
>> +++ linux-3.6-rc5/fs/super.c 2012-09-14 12:30:52.188833193 +0900
>> @@ -537,14 +537,22 @@ void drop_super(struct super_block *sb)
>> EXPORT_SYMBOL(drop_super);
>>
>> /**
>> - * iterate_supers - call function for all active superblocks
>> + * __iterate_supers - call function for all active superblocks
>> * @f: function to call
>> * @arg: argument to pass to it
>> + * @wlock: mode of superblock lock (false->read lock, true->write lock)
>> *
>> * Scans the superblock list and calls given function, passing it
>> * locked superblock and given argument.
>> + *
>> + * When the caller asks for the superblock lock (s_umount semaphore) to be
>> + * taken in write mode, the lock is taken but not released because the
>> + * function provided by the caller may deactivate the superblock itself.
>> + * It is that function's job to unlock the superblock as needed in such a
>> + * case.
>> */
>> -void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
>> +static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
>> + bool wlock)
>> {
>> struct super_block *sb, *p = NULL;
>>
>> @@ -555,10 +563,19 @@ void iterate_supers(void (*f)(struct sup
>> sb->s_count++;
>> spin_unlock(&sb_lock);
>>
>> - down_read(&sb->s_umount);
>> + if (wlock)
>> + down_write(&sb->s_umount);
>> + else
>> + down_read(&sb->s_umount);
>> +
>> if (sb->s_root && (sb->s_flags & MS_BORN))
>> f(sb, arg);
>> - up_read(&sb->s_umount);
>> +
>> + /* When the semaphore was taken in write mode the function
>> + * provided by the caller takes care of unlocking it as
>> + * needed. See explanation above for details. */
>> + if (!wlock)
>> + up_read(&sb->s_umount);
>>
>> spin_lock(&sb_lock);
>> if (p)
> These locking rules are ugly and counterintuitive. People will easily
> get them wrong and create bugs. I'd rather see emergency thaw retake the
> s_umount semaphore so that iterate_supers() can drop it...
I guess you are referring to treating the write lock differently
and not dropping the lock inside __iterate_supers(). The
problem is that f() may release the last reference to the
superblock which in turn will go away, so letting
__iterate_supers() drop the lock is not safe (I added a
comment about this issue in the function itself).
Regarding the ugliness, please notice that __iterate_supers
is static and is not supposed to be used directly; I added two
wrappers around it (a read variant that is semantically identical
to what we have now and a write variant) and documented them
as thoroughly as I could.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/9] fsfreeze: add unlocked version of thaw_super
2012-09-25 9:13 ` Jan Kara
@ 2012-09-25 9:43 ` Fernando Luis Vazquez Cao
0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-25 9:43 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel, fernando
On 2012年09月25日 18:13, Jan Kara wrote:
> On Fri 14-09-12 15:46:23, Fernando Luis Vázquez Cao wrote:
>> thaw_super may be called with superblock lock already taken (fsfreeze's
>> emergency thaw being one example), so we need an unlocked version to avoid
>> lockups.
>>
>> Cc: Josef Bacik <jbacik@fusionio.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> Looks good. You can add:
> Reviewed-by: Jan Kara <jack@suse.cz>
Thank you.
- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-14 6:53 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2012-09-14 19:20 ` Eric Sandeen
@ 2012-09-25 9:48 ` Jan Kara
2012-09-25 10:51 ` Fernando Luis Vazquez Cao
1 sibling, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 9:48 UTC (permalink / raw)
To: Fernando Luis Vázquez Cao
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, Jan Kara, linux-fsdevel, fernando
On Fri 14-09-12 15:53:34, Fernando Luis Vázquez Cao wrote:
> thaw_bdev() has re-entrancy guards to allow freezes to nest
> together. That is, it ensures that the filesystem is not thawed
> until the last thaw command is issued. This is needed to prevent the
> filesystem from being unfrozen while an existing freezer is still
> operating on the filesystem in a frozen state (e.g. dm-snapshot).
>
> Currently, freeze_super() and thaw_super() bypasses these guards,
> and as a result manual freezing and unfreezing via the ioctl methods
> do not nest correctly. hence mixing userspace directed freezes with
> block device level freezes result in inconsistency due to premature
> thawing of the filesystem.
>
> Move the re-enterency guards to the superblock and into freeze_super
> and thaw_super() so that userspace directed freezes nest correctly
> again. Caveat: Things work as expected as long as direct calls to
> thaw_super are always in response to a previous sb level freeze. In
> other words an unpaired call to thaw_super can still thaw a
> filesystem frozen using freeze_bdev (this issue could be addressed
> in a follow-up patch if deemed necessary).
>
> This patch retains the bdev level mutex and counter to keep the
> "feature" that we can freeze a block device that does not have a
> filesystem mounted yet.
>
> IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
> which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
> ("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
> argued that it is too late to fix the userland ABI breakage caused by that
> patch and that the current ABI is the one that should be kept. If this is the
> respective maintainer(s) opinion this could be modified to not allow fsfreeze
> ioctl nesting.
>
> Changes since version 2:
> - Fix reference count leak in freeze_super when MS_BORN flag is not set in
> the superblock.
> - Protect freeze counter using s_umount and get rid of superblock level
> fsfreeze mutex.
>
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
I have just some mostly minor comments:
...
> diff -urNp linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c linux-3.6-rc5/fs/gfs2/ops_fstype.c
> --- linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c 2012-09-14 12:46:15.152871285 +0900
> +++ linux-3.6-rc5/fs/gfs2/ops_fstype.c 2012-09-14 13:51:36.457205813 +0900
> @@ -1288,11 +1288,6 @@ static struct dentry *gfs2_mount(struct
> if (IS_ERR(bdev))
> return ERR_CAST(bdev);
>
> - /*
> - * once the super is inserted into the list by sget, s_umount
> - * will protect the lockfs code from trying to start a snapshot
> - * while we are mounting
> - */
Shouldn't this comment be replaced with something more accurate instead
of just deleting it?
...
> @@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
> * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
> * mostly auxiliary for filesystems to verify they do not modify frozen fs.
> *
> - * sb->s_writers.frozen is protected by sb->s_umount.
> + * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
> */
> int freeze_super(struct super_block *sb)
> {
> - int ret;
> + int ret = 0;
>
> atomic_inc(&sb->s_active);
> down_write(&sb->s_umount);
> - if (sb->s_writers.frozen != SB_UNFROZEN) {
> - deactivate_locked_super(sb);
> - return -EBUSY;
> - }
>
> if (!(sb->s_flags & MS_BORN)) {
> - up_write(&sb->s_umount);
> - return 0; /* sic - it's "nothing to do" */
> + atomic_dec(&sb->s_active);
> + goto out_active; /* sic - it's "nothing to do" */
Why not 'goto out_deactivate' here instead of manually decrementing
s_active?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
2012-09-25 9:42 ` Fernando Luis Vazquez Cao
@ 2012-09-25 9:52 ` Jan Kara
2012-09-25 10:03 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 9:52 UTC (permalink / raw)
To: Fernando Luis Vazquez Cao
Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel, fernando
On Tue 25-09-12 18:42:16, Fernando Luis Vazquez Cao wrote:
> On 2012年09月25日 18:11, Jan Kara wrote:
> >On Fri 14-09-12 15:45:04, Fernando Luis Vázquez Cao wrote:
> >>iterate_supers() calls a function provided by the caller with the s_umount
> >>semaphore taken in read mode. However, there may be cases where write mode
> >>is preferable, so we add __iterate_supers(), which lets one
> >>specify the mode of the lock, and replace iterate_supers with two helpers
> >>around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
> >>
> >>This will be used to fix the emergency thaw (filesystem unfreeze) code, which
> >>iterates over the list of superblocks but needs to hold the s_umount semaphore
> >>in _write_ mode bebore carrying out the actual thaw operation.
> >>
> >>This patch introduces no semantic changes since iterate_supers() users become
> >>iterate_supers_read() which is equivalent.
> >>
> >>Cc: Josef Bacik <jbacik@fusionio.com>
> >>Cc: Eric Sandeen <sandeen@redhat.com>
> >>Cc: Christoph Hellwig <hch@infradead.org>
> >>Cc: Jan Kara <jack@suse.cz>
> >>Cc: Dave Chinner <dchinner@redhat.com>
> >>Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> >>---
> >...
> >>diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
> >>--- linux-3.6-rc5-orig/fs/super.c 2012-09-14 11:53:43.416703312 +0900
> >>+++ linux-3.6-rc5/fs/super.c 2012-09-14 12:30:52.188833193 +0900
> >>@@ -537,14 +537,22 @@ void drop_super(struct super_block *sb)
> >> EXPORT_SYMBOL(drop_super);
> >> /**
> >>- * iterate_supers - call function for all active superblocks
> >>+ * __iterate_supers - call function for all active superblocks
> >> * @f: function to call
> >> * @arg: argument to pass to it
> >>+ * @wlock: mode of superblock lock (false->read lock, true->write lock)
> >> *
> >> * Scans the superblock list and calls given function, passing it
> >> * locked superblock and given argument.
> >>+ *
> >>+ * When the caller asks for the superblock lock (s_umount semaphore) to be
> >>+ * taken in write mode, the lock is taken but not released because the
> >>+ * function provided by the caller may deactivate the superblock itself.
> >>+ * It is that function's job to unlock the superblock as needed in such a
> >>+ * case.
> >> */
> >>-void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> >>+static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
> >>+ bool wlock)
> >> {
> >> struct super_block *sb, *p = NULL;
> >>@@ -555,10 +563,19 @@ void iterate_supers(void (*f)(struct sup
> >> sb->s_count++;
> >> spin_unlock(&sb_lock);
> >>- down_read(&sb->s_umount);
> >>+ if (wlock)
> >>+ down_write(&sb->s_umount);
> >>+ else
> >>+ down_read(&sb->s_umount);
> >>+
> >> if (sb->s_root && (sb->s_flags & MS_BORN))
> >> f(sb, arg);
> >>- up_read(&sb->s_umount);
> >>+
> >>+ /* When the semaphore was taken in write mode the function
> >>+ * provided by the caller takes care of unlocking it as
> >>+ * needed. See explanation above for details. */
> >>+ if (!wlock)
> >>+ up_read(&sb->s_umount);
> >> spin_lock(&sb_lock);
> >> if (p)
> > These locking rules are ugly and counterintuitive. People will easily
> >get them wrong and create bugs. I'd rather see emergency thaw retake the
> >s_umount semaphore so that iterate_supers() can drop it...
>
> I guess you are referring to treating the write lock differently
> and not dropping the lock inside __iterate_supers(). The
> problem is that f() may release the last reference to the
> superblock which in turn will go away, so letting
> __iterate_supers() drop the lock is not safe (I added a
> comment about this issue in the function itself).
Well, except that iterate_supers() actually takes a passive reference
(s_count) of the superblock. Thus deactivate_locked_super() will never
really destroy it. So what I propose should be safe.
> Regarding the ugliness, please notice that __iterate_supers
> is static and is not supposed to be used directly; I added two
> wrappers around it (a read variant that is semantically identical
> to what we have now and a write variant) and documented them
> as thoroughly as I could.
I know but really the assymetry in the locking has to be observed by the
callback function. And if you have a callback function which doesn't want
to deactivate superblock, it is far from obvious it should drop the
s_umount semaphore... And I'm happy how you documented stuff but people
read documentation only after they spot problems so it is better to have
interfaces which are hard to get wrong.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
2012-09-25 9:52 ` Jan Kara
@ 2012-09-25 10:03 ` Fernando Luis Vazquez Cao
0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-25 10:03 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On 2012/09/25 18:52, Jan Kara wrote:
> On Tue 25-09-12 18:42:16, Fernando Luis Vazquez Cao wrote:
>> On 2012年09月25日 18:11, Jan Kara wrote:
>>> On Fri 14-09-12 15:45:04, Fernando Luis Vázquez Cao wrote:
>>>> iterate_supers() calls a function provided by the caller with the s_umount
>>>> semaphore taken in read mode. However, there may be cases where write mode
>>>> is preferable, so we add __iterate_supers(), which lets one
>>>> specify the mode of the lock, and replace iterate_supers with two helpers
>>>> around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
>>>>
>>>> This will be used to fix the emergency thaw (filesystem unfreeze) code, which
>>>> iterates over the list of superblocks but needs to hold the s_umount semaphore
>>>> in _write_ mode bebore carrying out the actual thaw operation.
>>>>
>>>> This patch introduces no semantic changes since iterate_supers() users become
>>>> iterate_supers_read() which is equivalent.
>>>>
>>>> Cc: Josef Bacik <jbacik@fusionio.com>
>>>> Cc: Eric Sandeen <sandeen@redhat.com>
>>>> Cc: Christoph Hellwig <hch@infradead.org>
>>>> Cc: Jan Kara <jack@suse.cz>
>>>> Cc: Dave Chinner <dchinner@redhat.com>
>>>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>>>> ---
>>> ...
>>>> diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
>>>> --- linux-3.6-rc5-orig/fs/super.c 2012-09-14 11:53:43.416703312 +0900
>>>> +++ linux-3.6-rc5/fs/super.c 2012-09-14 12:30:52.188833193 +0900
>>>> @@ -537,14 +537,22 @@ void drop_super(struct super_block *sb)
>>>> EXPORT_SYMBOL(drop_super);
>>>> /**
>>>> - * iterate_supers - call function for all active superblocks
>>>> + * __iterate_supers - call function for all active superblocks
>>>> * @f: function to call
>>>> * @arg: argument to pass to it
>>>> + * @wlock: mode of superblock lock (false->read lock, true->write lock)
>>>> *
>>>> * Scans the superblock list and calls given function, passing it
>>>> * locked superblock and given argument.
>>>> + *
>>>> + * When the caller asks for the superblock lock (s_umount semaphore) to be
>>>> + * taken in write mode, the lock is taken but not released because the
>>>> + * function provided by the caller may deactivate the superblock itself.
>>>> + * It is that function's job to unlock the superblock as needed in such a
>>>> + * case.
>>>> */
>>>> -void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
>>>> +static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
>>>> + bool wlock)
>>>> {
>>>> struct super_block *sb, *p = NULL;
>>>> @@ -555,10 +563,19 @@ void iterate_supers(void (*f)(struct sup
>>>> sb->s_count++;
>>>> spin_unlock(&sb_lock);
>>>> - down_read(&sb->s_umount);
>>>> + if (wlock)
>>>> + down_write(&sb->s_umount);
>>>> + else
>>>> + down_read(&sb->s_umount);
>>>> +
>>>> if (sb->s_root && (sb->s_flags & MS_BORN))
>>>> f(sb, arg);
>>>> - up_read(&sb->s_umount);
>>>> +
>>>> + /* When the semaphore was taken in write mode the function
>>>> + * provided by the caller takes care of unlocking it as
>>>> + * needed. See explanation above for details. */
>>>> + if (!wlock)
>>>> + up_read(&sb->s_umount);
>>>> spin_lock(&sb_lock);
>>>> if (p)
>>> These locking rules are ugly and counterintuitive. People will easily
>>> get them wrong and create bugs. I'd rather see emergency thaw retake the
>>> s_umount semaphore so that iterate_supers() can drop it...
>> I guess you are referring to treating the write lock differently
>> and not dropping the lock inside __iterate_supers(). The
>> problem is that f() may release the last reference to the
>> superblock which in turn will go away, so letting
>> __iterate_supers() drop the lock is not safe (I added a
>> comment about this issue in the function itself).
> Well, except that iterate_supers() actually takes a passive reference
> (s_count) of the superblock. Thus deactivate_locked_super() will never
> really destroy it. So what I propose should be safe.
Good point. I missed the fact that we are taking a passive
reference there, which should simplifying locking. I will
take that into account for the next revision.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
2012-09-25 9:24 ` Jan Kara
@ 2012-09-25 10:31 ` Fernando Luis Vazquez Cao
0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-25 10:31 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On 2012/09/25 18:24, Jan Kara wrote:
> On Fri 14-09-12 15:48:56, Fernando Luis Vázquez Cao wrote:
>> The emergency thaw process uses iterate_super() which holds the
>> sb->s_umount lock in read mode. The current thaw_super() code takes
>> the sb->s_umount lock in write mode, hence leading to an instant
>> deadlock.
>>
>> Use the unlocked version of thaw_super() to do the thawing and replace
>> iterate_supers() with __iterate_supers() so that the unfreeze operation can
>> be performed with s_umount held as the locking rules for fsfreeze indicate.
>>
>> As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
>> when can get rid of the ugly while loop.
> Hum, but that will leave block devices frozen? Isn't that a bug? It
> definitely creates an inconsistent state. I think emergency thaw should
> also iterate over all block devices (via iterate_bdevs()) and thaw all of
> them (and no, this does not allow us to avoid the iterate_super() changes
> because there are filesystems without block devices...).
Yes, we may end up with an inconsistent state when
bdev->bd_fsfreeze_count == 1 entering thaw_bdev
(other cases will work as expected) because we have
this code:
error = thaw_super(sb);
if (error) {
bdev->bd_fsfreeze_count++;
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
}
After the emergency unfreeze thaw_super() will return
-EINVAL and bdev->bd_fsfreeze_count will be restored
to one which can confuse the caller.
I think that the best solution to this problem is
considering the thaw successful when thaw_super()
returns -EINVAL, because that means the filesystem
is already unfrozen. I prefer this approach to
thawing bdevs too during emergency thaw,
because dm or external users may call thaw_bdev
after that, which would fail returning -EINVAL.
If you agree with this I will implement it.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-25 9:48 ` Jan Kara
@ 2012-09-25 10:51 ` Fernando Luis Vazquez Cao
2012-09-25 16:39 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-25 10:51 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On 2012/09/25 18:48, Jan Kara wrote:
> On Fri 14-09-12 15:53:34, Fernando Luis Vázquez Cao wrote:
>> thaw_bdev() has re-entrancy guards to allow freezes to nest
>> together. That is, it ensures that the filesystem is not thawed
>> until the last thaw command is issued. This is needed to prevent the
>> filesystem from being unfrozen while an existing freezer is still
>> operating on the filesystem in a frozen state (e.g. dm-snapshot).
>>
>> Currently, freeze_super() and thaw_super() bypasses these guards,
>> and as a result manual freezing and unfreezing via the ioctl methods
>> do not nest correctly. hence mixing userspace directed freezes with
>> block device level freezes result in inconsistency due to premature
>> thawing of the filesystem.
>>
>> Move the re-enterency guards to the superblock and into freeze_super
>> and thaw_super() so that userspace directed freezes nest correctly
>> again. Caveat: Things work as expected as long as direct calls to
>> thaw_super are always in response to a previous sb level freeze. In
>> other words an unpaired call to thaw_super can still thaw a
>> filesystem frozen using freeze_bdev (this issue could be addressed
>> in a follow-up patch if deemed necessary).
>>
>> This patch retains the bdev level mutex and counter to keep the
>> "feature" that we can freeze a block device that does not have a
>> filesystem mounted yet.
>>
>> IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
>> which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
>> ("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
>> argued that it is too late to fix the userland ABI breakage caused by that
>> patch and that the current ABI is the one that should be kept. If this is the
>> respective maintainer(s) opinion this could be modified to not allow fsfreeze
>> ioctl nesting.
>>
>> Changes since version 2:
>> - Fix reference count leak in freeze_super when MS_BORN flag is not set in
>> the superblock.
>> - Protect freeze counter using s_umount and get rid of superblock level
>> fsfreeze mutex.
>>
>> Cc: Josef Bacik <jbacik@fusionio.com>
>> Cc: Eric Sandeen <sandeen@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> ---
> I have just some mostly minor comments:
Thank you for the detailed review.
>> diff -urNp linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c linux-3.6-rc5/fs/gfs2/ops_fstype.c
>> --- linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c 2012-09-14 12:46:15.152871285 +0900
>> +++ linux-3.6-rc5/fs/gfs2/ops_fstype.c 2012-09-14 13:51:36.457205813 +0900
>> @@ -1288,11 +1288,6 @@ static struct dentry *gfs2_mount(struct
>> if (IS_ERR(bdev))
>> return ERR_CAST(bdev);
>>
>> - /*
>> - * once the super is inserted into the list by sget, s_umount
>> - * will protect the lockfs code from trying to start a snapshot
>> - * while we are mounting
>> - */
> Shouldn't this comment be replaced with something more accurate instead
> of just deleting it?
Good point.
>> @@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
>> * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
>> * mostly auxiliary for filesystems to verify they do not modify frozen fs.
>> *
>> - * sb->s_writers.frozen is protected by sb->s_umount.
>> + * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
>> */
>> int freeze_super(struct super_block *sb)
>> {
>> - int ret;
>> + int ret = 0;
>>
>> atomic_inc(&sb->s_active);
>> down_write(&sb->s_umount);
>> - if (sb->s_writers.frozen != SB_UNFROZEN) {
>> - deactivate_locked_super(sb);
>> - return -EBUSY;
>> - }
>>
>> if (!(sb->s_flags & MS_BORN)) {
>> - up_write(&sb->s_umount);
>> - return 0; /* sic - it's "nothing to do" */
>> + atomic_dec(&sb->s_active);
>> + goto out_active; /* sic - it's "nothing to do" */
> Why not 'goto out_deactivate' here instead of manually decrementing
> s_active?
I was afraid that calling deactivate_locked_super()
when the MS_BORN flag is set could release the
last active reference to the superblock and
end up freeing it.
By the way, I probably should explain in the
patch that that piece of code fixes a reference
leak bug in the current code.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-25 10:51 ` Fernando Luis Vazquez Cao
@ 2012-09-25 16:39 ` Jan Kara
2012-09-26 8:22 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-25 16:39 UTC (permalink / raw)
To: Fernando Luis Vazquez Cao
Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On Tue 25-09-12 19:51:33, Fernando Luis Vazquez Cao wrote:
> On 2012/09/25 18:48, Jan Kara wrote:
> >On Fri 14-09-12 15:53:34, Fernando Luis Vázquez Cao wrote:
> >>thaw_bdev() has re-entrancy guards to allow freezes to nest
> >>together. That is, it ensures that the filesystem is not thawed
> >>until the last thaw command is issued. This is needed to prevent the
> >>filesystem from being unfrozen while an existing freezer is still
> >>operating on the filesystem in a frozen state (e.g. dm-snapshot).
> >>
> >>Currently, freeze_super() and thaw_super() bypasses these guards,
> >>and as a result manual freezing and unfreezing via the ioctl methods
> >>do not nest correctly. hence mixing userspace directed freezes with
> >>block device level freezes result in inconsistency due to premature
> >>thawing of the filesystem.
> >>
> >>Move the re-enterency guards to the superblock and into freeze_super
> >>and thaw_super() so that userspace directed freezes nest correctly
> >>again. Caveat: Things work as expected as long as direct calls to
> >>thaw_super are always in response to a previous sb level freeze. In
> >>other words an unpaired call to thaw_super can still thaw a
> >>filesystem frozen using freeze_bdev (this issue could be addressed
> >>in a follow-up patch if deemed necessary).
> >>
> >>This patch retains the bdev level mutex and counter to keep the
> >>"feature" that we can freeze a block device that does not have a
> >>filesystem mounted yet.
> >>
> >>IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
> >>which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
> >>("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
> >>argued that it is too late to fix the userland ABI breakage caused by that
> >>patch and that the current ABI is the one that should be kept. If this is the
> >>respective maintainer(s) opinion this could be modified to not allow fsfreeze
> >>ioctl nesting.
> >>
> >>Changes since version 2:
> >> - Fix reference count leak in freeze_super when MS_BORN flag is not set in
> >> the superblock.
> >> - Protect freeze counter using s_umount and get rid of superblock level
> >> fsfreeze mutex.
> >>
> >>Cc: Josef Bacik <jbacik@fusionio.com>
> >>Cc: Eric Sandeen <sandeen@redhat.com>
> >>Cc: Christoph Hellwig <hch@infradead.org>
> >>Cc: Jan Kara <jack@suse.cz>
> >>Cc: Dave Chinner <dchinner@redhat.com>
> >>Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> >>---
> > I have just some mostly minor comments:
>
> Thank you for the detailed review.
>
>
> >>diff -urNp linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c linux-3.6-rc5/fs/gfs2/ops_fstype.c
> >>--- linux-3.6-rc5-orig/fs/gfs2/ops_fstype.c 2012-09-14 12:46:15.152871285 +0900
> >>+++ linux-3.6-rc5/fs/gfs2/ops_fstype.c 2012-09-14 13:51:36.457205813 +0900
> >>@@ -1288,11 +1288,6 @@ static struct dentry *gfs2_mount(struct
> >> if (IS_ERR(bdev))
> >> return ERR_CAST(bdev);
> >>- /*
> >>- * once the super is inserted into the list by sget, s_umount
> >>- * will protect the lockfs code from trying to start a snapshot
> >>- * while we are mounting
> >>- */
> > Shouldn't this comment be replaced with something more accurate instead
> >of just deleting it?
>
> Good point.
>
>
> >>@@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
> >> * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
> >> * mostly auxiliary for filesystems to verify they do not modify frozen fs.
> >> *
> >>- * sb->s_writers.frozen is protected by sb->s_umount.
> >>+ * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
> >> */
> >> int freeze_super(struct super_block *sb)
> >> {
> >>- int ret;
> >>+ int ret = 0;
> >> atomic_inc(&sb->s_active);
> >> down_write(&sb->s_umount);
> >>- if (sb->s_writers.frozen != SB_UNFROZEN) {
> >>- deactivate_locked_super(sb);
> >>- return -EBUSY;
> >>- }
> >> if (!(sb->s_flags & MS_BORN)) {
> >>- up_write(&sb->s_umount);
> >>- return 0; /* sic - it's "nothing to do" */
> >>+ atomic_dec(&sb->s_active);
> >>+ goto out_active; /* sic - it's "nothing to do" */
> > Why not 'goto out_deactivate' here instead of manually decrementing
> >s_active?
>
> I was afraid that calling deactivate_locked_super()
> when the MS_BORN flag is set could release the
> last active reference to the superblock and
> end up freeing it.
Well, the caller has to have the sb pinned somehow so that it can safely
pass it into freeze_super(). So deactivate_locked_super() cannot really
end up freeing the superblock...
> By the way, I probably should explain in the
> patch that that piece of code fixes a reference
> leak bug in the current code.
Yep, that would be nice.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-25 16:39 ` Jan Kara
@ 2012-09-26 8:22 ` Fernando Luis Vazquez Cao
2012-09-26 9:09 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-26 8:22 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On 2012/09/26 01:39, Jan Kara wrote:
>>>> @@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
>>>> * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
>>>> * mostly auxiliary for filesystems to verify they do not modify frozen fs.
>>>> *
>>>> - * sb->s_writers.frozen is protected by sb->s_umount.
>>>> + * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
>>>> */
>>>> int freeze_super(struct super_block *sb)
>>>> {
>>>> - int ret;
>>>> + int ret = 0;
>>>> atomic_inc(&sb->s_active);
>>>> down_write(&sb->s_umount);
>>>> - if (sb->s_writers.frozen != SB_UNFROZEN) {
>>>> - deactivate_locked_super(sb);
>>>> - return -EBUSY;
>>>> - }
>>>> if (!(sb->s_flags & MS_BORN)) {
>>>> - up_write(&sb->s_umount);
>>>> - return 0; /* sic - it's "nothing to do" */
>>>> + atomic_dec(&sb->s_active);
>>>> + goto out_active; /* sic - it's "nothing to do" */
>>> Why not 'goto out_deactivate' here instead of manually decrementing
>>> s_active?
>> I was afraid that calling deactivate_locked_super()
>> when the MS_BORN flag is set could release the
>> last active reference to the superblock and
>> end up freeing it.
> Well, the caller has to have the sb pinned somehow so that it can safely
> pass it into freeze_super(). So deactivate_locked_super() cannot really
> end up freeing the superblock...
I should have explained my fears better. If no one else
is holding an active reference we will end up executing:
fs->kill_sb(s);
...
put_filesystem(fs);
put_super(s);
in deactivate_locked_super(). If s_count is greater than 0
the superblock will not be freed, as you say, however we
still do "fs->kill_sb(s)" and "put_filesystem(fs)" and I am not
sure whether this is safe when MS_BORN flag is not set in
the superblock. I will check how MS_BORN is being used
once more and try to figure it out (if you are familiar with
MS_BORN I would appreciate it your advise).
>> By the way, I probably should explain in the
>> patch that that piece of code fixes a reference
>> leak bug in the current code.
> Yep, that would be nice.
Will do.
Thanks,
Fernando
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-26 8:22 ` Fernando Luis Vazquez Cao
@ 2012-09-26 9:09 ` Jan Kara
2012-10-03 7:58 ` Fernando Luis Vazquez Cao
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-09-26 9:09 UTC (permalink / raw)
To: Fernando Luis Vazquez Cao
Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On Wed 26-09-12 17:22:53, Fernando Luis Vazquez Cao wrote:
> On 2012/09/26 01:39, Jan Kara wrote:
> >>>>@@ -1365,29 +1363,27 @@ static void sb_wait_write(struct super_b
> >>>> * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
> >>>> * mostly auxiliary for filesystems to verify they do not modify frozen fs.
> >>>> *
> >>>>- * sb->s_writers.frozen is protected by sb->s_umount.
> >>>>+ * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
> >>>> */
> >>>> int freeze_super(struct super_block *sb)
> >>>> {
> >>>>- int ret;
> >>>>+ int ret = 0;
> >>>> atomic_inc(&sb->s_active);
> >>>> down_write(&sb->s_umount);
> >>>>- if (sb->s_writers.frozen != SB_UNFROZEN) {
> >>>>- deactivate_locked_super(sb);
> >>>>- return -EBUSY;
> >>>>- }
> >>>> if (!(sb->s_flags & MS_BORN)) {
> >>>>- up_write(&sb->s_umount);
> >>>>- return 0; /* sic - it's "nothing to do" */
> >>>>+ atomic_dec(&sb->s_active);
> >>>>+ goto out_active; /* sic - it's "nothing to do" */
> >>> Why not 'goto out_deactivate' here instead of manually decrementing
> >>>s_active?
> >>I was afraid that calling deactivate_locked_super()
> >>when the MS_BORN flag is set could release the
> >>last active reference to the superblock and
> >>end up freeing it.
> > Well, the caller has to have the sb pinned somehow so that it can safely
> >pass it into freeze_super(). So deactivate_locked_super() cannot really
> >end up freeing the superblock...
>
> I should have explained my fears better. If no one else
> is holding an active reference we will end up executing:
>
> fs->kill_sb(s);
> ...
> put_filesystem(fs);
> put_super(s);
>
> in deactivate_locked_super(). If s_count is greater than 0
> the superblock will not be freed, as you say, however we
> still do "fs->kill_sb(s)" and "put_filesystem(fs)" and I am not
> sure whether this is safe when MS_BORN flag is not set in
> the superblock. I will check how MS_BORN is being used
> once more and try to figure it out (if you are familiar with
> MS_BORN I would appreciate it your advise).
I see. Well, from a brief check I don't think we should ever get to a
superblock without MS_BORN set. All functions iterating over superblocks
return only superblocks with MS_BORN set. In particular freeze_bdev() even
has an active reference itself and ioctl_fsfreeze() has a file open on the
sb which guarantees an active reference as well...
So maybe just removing that check altogether should be OK.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-09-26 9:09 ` Jan Kara
@ 2012-10-03 7:58 ` Fernando Luis Vazquez Cao
2012-10-04 8:18 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-10-03 7:58 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On 2012/09/26 18:09, Jan Kara wrote:
>> I should have explained my fears better. If no one else
>> is holding an active reference we will end up executing:
>>
>> fs->kill_sb(s);
>> ...
>> put_filesystem(fs);
>> put_super(s);
>>
>> in deactivate_locked_super(). If s_count is greater than 0
>> the superblock will not be freed, as you say, however we
>> still do "fs->kill_sb(s)" and "put_filesystem(fs)" and I am not
>> sure whether this is safe when MS_BORN flag is not set in
>> the superblock. I will check how MS_BORN is being used
>> once more and try to figure it out (if you are familiar with
>> MS_BORN I would appreciate it your advise).
> I see. Well, from a brief check I don't think we should ever get to a
> superblock without MS_BORN set. All functions iterating over superblocks
> return only superblocks with MS_BORN set. In particular freeze_bdev() even
> has an active reference itself and ioctl_fsfreeze() has a file open on the
> sb which guarantees an active reference as well...
As you say when we get there via the superblock level API
it is guaranteed that we have at least one active reference
and that MS_BORN is set. However, freeze_bdev() iterates
over superblocks using get_active_super() which can return
a superblock without the MS_BORN flag set; during sys_mount
mount_fs() sets the MS_BORN flag *after* sget() inserts the
superblock in all the lists.
If the analysis above is correct we do need the MS_BORN
check.
By the way, tomorrow I will be sending v5 of this patch set.
Thanks,
Fernando
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-10-03 7:58 ` Fernando Luis Vazquez Cao
@ 2012-10-04 8:18 ` Jan Kara
2012-10-05 4:22 ` Fernando Luis Vázquez Cao
2012-10-05 4:30 ` Fernando Luis Vázquez Cao
0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2012-10-04 8:18 UTC (permalink / raw)
To: Fernando Luis Vazquez Cao
Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
On Wed 03-10-12 16:58:50, Fernando Luis Vazquez Cao wrote:
> On 2012/09/26 18:09, Jan Kara wrote:
> >>I should have explained my fears better. If no one else
> >>is holding an active reference we will end up executing:
> >>
> >>fs->kill_sb(s);
> >>...
> >>put_filesystem(fs);
> >>put_super(s);
> >>
> >>in deactivate_locked_super(). If s_count is greater than 0
> >>the superblock will not be freed, as you say, however we
> >>still do "fs->kill_sb(s)" and "put_filesystem(fs)" and I am not
> >>sure whether this is safe when MS_BORN flag is not set in
> >>the superblock. I will check how MS_BORN is being used
> >>once more and try to figure it out (if you are familiar with
> >>MS_BORN I would appreciate it your advise).
> > I see. Well, from a brief check I don't think we should ever get to a
> >superblock without MS_BORN set. All functions iterating over superblocks
> >return only superblocks with MS_BORN set. In particular freeze_bdev() even
> >has an active reference itself and ioctl_fsfreeze() has a file open on the
> >sb which guarantees an active reference as well...
>
> As you say when we get there via the superblock level API
> it is guaranteed that we have at least one active reference
> and that MS_BORN is set. However, freeze_bdev() iterates
> over superblocks using get_active_super() which can return
> a superblock without the MS_BORN flag set; during sys_mount
> mount_fs() sets the MS_BORN flag *after* sget() inserts the
> superblock in all the lists.
>
> If the analysis above is correct we do need the MS_BORN
> check.
Checking again I agree with you. But that seems more like an issue with
get_active_super() which shouldn't return superblock without MS_BORN.
Whatever... that can be left for a separate patch.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-10-04 8:18 ` Jan Kara
@ 2012-10-05 4:22 ` Fernando Luis Vázquez Cao
2012-10-05 4:30 ` Fernando Luis Vázquez Cao
1 sibling, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05 4:22 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
2012-10-04 (木) の 10:18 +0200 に Jan Kara さんは書きました:
> > As you say when we get there via the superblock level API
> > it is guaranteed that we have at least one active reference
> > and that MS_BORN is set. However, freeze_bdev() iterates
> > over superblocks using get_active_super() which can return
> > a superblock without the MS_BORN flag set; during sys_mount
> > mount_fs() sets the MS_BORN flag *after* sget() inserts the
> > superblock in all the lists.
> >
> > If the analysis above is correct we do need the MS_BORN
> > check.
> Checking again I agree with you. But that seems more like an issue with
> get_active_super() which shouldn't return superblock without MS_BORN.
> Whatever... that can be left for a separate patch.
Yes, let's leave it for a separate patch.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
2012-10-04 8:18 ` Jan Kara
2012-10-05 4:22 ` Fernando Luis Vázquez Cao
@ 2012-10-05 4:30 ` Fernando Luis Vázquez Cao
1 sibling, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05 4:30 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
Christoph Hellwig, linux-fsdevel
2012-10-04 (木) の 10:18 +0200 に Jan Kara さんは書きました:
> > As you say when we get there via the superblock level API
> > it is guaranteed that we have at least one active reference
> > and that MS_BORN is set. However, freeze_bdev() iterates
> > over superblocks using get_active_super() which can return
> > a superblock without the MS_BORN flag set; during sys_mount
> > mount_fs() sets the MS_BORN flag *after* sget() inserts the
> > superblock in all the lists.
> >
> > If the analysis above is correct we do need the MS_BORN
> > check.
> Checking again I agree with you. But that seems more like an issue with
> get_active_super() which shouldn't return superblock without MS_BORN.
> Whatever... that can be left for a separate patch.
Yes, let's leave it for a separate patch.
Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 2/9] fsfreeze: add unlocked version of thaw_super
2012-10-05 5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-10-05 5:33 ` Fernando Luis Vázquez Cao
0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05 5:33 UTC (permalink / raw)
To: Al Viro
Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
Jan Kara, linux-fsdevel
thaw_super may be called with superblock lock already taken (fsfreeze's
emergency thaw being one example), so we need an unlocked version to avoid
lockups.
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-3.6.0-rc7-orig/fs/super.c linux-3.6.0-rc7/fs/super.c
--- linux-3.6.0-rc7-orig/fs/super.c 2012-09-26 13:20:14.854365058 +0900
+++ linux-3.6.0-rc7/fs/super.c 2012-09-26 13:20:36.342365207 +0900
@@ -1428,40 +1428,59 @@ int freeze_super(struct super_block *sb)
EXPORT_SYMBOL(freeze_super);
/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw
*
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Unlocks the filesystem and marks it writeable again.
+ *
+ * This is the unlocked version of thaw_super, so it is the caller's job to
+ * to protect the superblock by grabbing the s_umount semaphore in write mode
+ * and release it again on return. See thaw_super() for an example.
*/
-int thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb)
{
- int error;
+ int error = 0;
- down_write(&sb->s_umount);
if (sb->s_writers.frozen == SB_UNFROZEN) {
- up_write(&sb->s_umount);
- return -EINVAL;
+ error = -EINVAL;
+ goto out;
}
if (sb->s_flags & MS_RDONLY)
- goto out;
+ goto out_thaw;
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
- up_write(&sb->s_umount);
- return error;
+ goto out;
}
}
-out:
+out_thaw:
sb->s_writers.frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
+out:
+ return error;
+}
- return 0;
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+ int res;
+ down_write(&sb->s_umount);
+ res = __thaw_super(sb);
+ if (!res)
+ deactivate_locked_super(sb);
+ else
+ up_write(&sb->s_umount);
+ return res;
}
EXPORT_SYMBOL(thaw_super);
diff -urNp linux-3.6.0-rc7-orig/include/linux/fs.h linux-3.6.0-rc7/include/linux/fs.h
--- linux-3.6.0-rc7-orig/include/linux/fs.h 2012-09-26 13:20:14.858365059 +0900
+++ linux-3.6.0-rc7/include/linux/fs.h 2012-09-26 13:20:36.358365209 +0900
@@ -2082,6 +2082,7 @@ extern int user_statfs(const char __user
extern int fd_statfs(int, struct kstatfs *);
extern int vfs_ustat(dev_t, struct kstatfs *);
extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
extern bool our_mnt(struct vfsmount *mnt);
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2012-10-05 7:05 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-14 6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-09-14 6:45 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
2012-09-25 9:11 ` Jan Kara
2012-09-25 9:42 ` Fernando Luis Vazquez Cao
2012-09-25 9:52 ` Jan Kara
2012-09-25 10:03 ` Fernando Luis Vazquez Cao
2012-09-14 6:46 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
2012-09-25 9:13 ` Jan Kara
2012-09-25 9:43 ` Fernando Luis Vazquez Cao
2012-09-14 6:47 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
2012-09-14 6:48 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
2012-09-25 9:24 ` Jan Kara
2012-09-25 10:31 ` Fernando Luis Vazquez Cao
2012-09-14 6:50 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
2012-09-14 6:51 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
2012-09-14 6:53 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2012-09-14 19:20 ` Eric Sandeen
2012-09-15 1:15 ` Eric Sandeen
2012-09-25 9:48 ` Jan Kara
2012-09-25 10:51 ` Fernando Luis Vazquez Cao
2012-09-25 16:39 ` Jan Kara
2012-09-26 8:22 ` Fernando Luis Vazquez Cao
2012-09-26 9:09 ` Jan Kara
2012-10-03 7:58 ` Fernando Luis Vazquez Cao
2012-10-04 8:18 ` Jan Kara
2012-10-05 4:22 ` Fernando Luis Vázquez Cao
2012-10-05 4:30 ` Fernando Luis Vázquez Cao
2012-09-14 6:54 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-09-14 6:55 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
-- strict thread matches above, loose matches on Subject: below --
2012-10-05 5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-10-05 5:33 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).