* [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
@ 2011-09-13 2:53 Valerie Aurora
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Valerie Aurora @ 2011-09-13 2:53 UTC (permalink / raw)
To: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux
[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]
I've been working on some patches to fix the following problem:
File system freeze/thaw require the superblock's s_umount lock.
However, we write to the file system while holding the s_umount lock
in several places (e.g., writeback and quotas). Any of these can
block on a frozen file system while holding s_umount, preventing any
later thaw from occurring, since thaw requires s_umount. The solution
is to add a check, vfs_is_frozen(), to all code paths that write while
holding sb->s_umount and return without writing if it is true.
Attached is an audit of each use of ->s_umount and whether it could
result in a deadlock with freeze/thaw. Some of this has gone into the
code comments. Patches against some random post 3.0.0 pull to follow.
A patch to xfstests is coming. Basically, we need to test for mmaped
writes concurrent with file system freeze. Probably test 068 can be
extended. (If anyone is interested in doing this, it is relatively
simple work.)
Thanks, Jan Kara and Dave Chinner for their kind suggestions for fixes
and review of patches offline.
-VAL
[-- Attachment #2: audit.txt --]
[-- Type: text/plain, Size: 5300 bytes --]
Actual places s_umount is grabbed
fs/fs-writeback.c
- grabbed by writeback_inodes_sb_if_idle()
YES: fs is mounted, can be frozen, we may write an inode
Called only by ext4 when space is getting low, return value ignored
Solution: down_read_trylock() (also writeback_sb_inodes() checks
for frozen)
- grabbed by writeback_inodes_sb_nr_if_idle()
YES: fs is mounted, can be frozen, we may write an inode
Called only by btrfs when it is trying to free space, return value ignored
Solution: down_read_trylock() (also writeback_sb_inodes() checks
for frozen)
- grabbed in wb_do_writeback() by grab_super_passive()
YES: fs is mounted, can be frozen, we may write an inode
Solution: writeback_sb_inodes() checks for frozen and returns
Note: there was a livelock in wb_do_writeback() at grab_super_passive()
fs/namespace.c
- grabbed by do_emergency_remount()
YES: fs is mounted, can be frozen, we may write the superblock
Solution: already checks for frozen in do_remount_sb()
- grabbed by do_remount()
YES: fs is mounted, can be frozen, we may write the superblock
Solution: already checks for frozen in do_remount_sb()
- grabbed by mount_bdev()
NO: this is only called on first mount, fs not visible, can't be frozen
fs/super.c
- grabbed by sync_supers()
NO: We only write super block if s->s_dirt is set
- grabbed by iterate_supers()/iterate_supers_type()
YES: fs is mounted, can be frozen, we may write anything
Solution: Fix current callers and document that
iterate_supers[_type]() callers must check for frozen before
writing
Current iterate_supers() callers:
fs/buffer.c: iterate_supers(do_thaw_one, NULL);
- NO: emergency thaw - we do want to ignore the frozen
fs/drop_caches.c: iterate_supers(drop_pagecache_sb, NULL);
- NO: No dirty inodes, no writes should occur, but documented anyway
fs/drop_caches.c: drop_slab()
- MAYBE: Freeing inodes can dirty inodes, e.g. XFS shrinker.
Documented that this kind of shrinking requires a frozen check first.
fs/quota/quota.c: iterate_supers(quota_sync_one, &type);
- YES: fs is mounted, can be frozen, depends on behavior of fs
quota sync
Solution: check for frozen in quota_sync_one()
fs/sync.c: iterate_supers(sync_one_sb, &wait);
- YES: fs is mounted, can be frozen, quota_sync and sync_fs called
Nowhere is it written, quota ops can't write, sync can't write
- Solution: Check for frozen in sync_one_sb()
security/selinux/hooks.c: iterate_supers(delayed_superblock_init, NULL);
- MAYBE: Let SELinux people figure this one out
- grabbed by get_super() and kept until dropped by caller
YES: fs is mounted, can be frozen, caller may write
Current callers:
fs/quota/quota.c: block-based quota ops:
YES: fs is mounted, can be frozen, most quota ops may write
Solution: check in quotactl()
fs/block_dev.c: fsync_bdev()
YES: fs is mounted, can be frozen, may write
Solution: Existing check in sync_filesystem()
fs/block_dev.c: freeze_bdev()
NO: Caller immediately drops the s_umount ref via drop_super()
fs/statfs.c: statfs() via user_get_super()
YES: fs is mounted, can be frozen, ->statfs may write
Solution: ->statfs() shouldn't write
- grabbed during emergency remount
YES: fs is mounted, can be frozen, we may write the superblock
Solution: already checks for frozen in do_remount_sb()
- grabbed during freeze_super() (included for completeness)
YES: fs is mounted, can be frozen, we may write the superblock
Solution: Limit writes to following: Sync out existing writes
(sync_filesystem() call) with sb->s_frozen set to SB_FREEZE_WRITE
(be sure write_inodes_sb() can complete in this situation), then
set sb->s_frozen to SB_FREEZE_TRANS and write out superblock.
- grabbed during thaw_super() (included for completeness)
YES: fs is mounted, can be frozen, we may write the superblock
Solution: Be sure the ->unfreeze_fs() op can complete with
SB_FREEZE_TRANS set.
fs/sync.c
- grabbed by syncfs(), single fs sync
YES: fs is mounted, can be frozen, we may write the superblock
Solution: Should call sync_one_sb(), actually, but with error
return, Jan has patches touching this so make minimum change
- grabbed eventually through sync_fileystem()
YES: fs is mounted, can be frozen, we may write the superblock
Solution: Document that all callers check for frozen-ness before calling this
File system-specific uses
fs/ubifs/budget.c
- grabbed during sync to call writeback_inodes_sb()
YES: fs is mounted, can be frozen, we may write the superblock
Solution: check for frozen in writeback_inodes_sb()
fs/btrfs/volumes.c
- grabbed during "seeding" of a bdev
NO: we open the device with O_EXCL, no fs on it already
fs/cachefiles/interface.c
- grabbed during sync, calls sync_filesystem()
XXX not sure? Only equipped to deal with EIO
fs/nfs/super.c
- grabbed during nfs_follow_remote_path()
NO: just following path, no write will occur (hopefully)
fs/nilfs2/ioctl.c
- held during change of fs mode, implemented as fs transaction
XXX think this needs freeze awareness in general?
fs/reiserfs/procfs.c
- dropped after get_super() call in /proc operation
XXX don't know, need a reiser expert
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-13 2:53 [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock Valerie Aurora
@ 2011-09-13 2:57 ` Valerie Aurora
2011-09-14 14:00 ` Jan Kara
2011-09-20 22:51 ` Christoph Hellwig
2011-09-14 13:05 ` [RFC PATCH 0/3] " Jan Kara
2011-10-27 21:39 ` Christoph Hellwig
2 siblings, 2 replies; 13+ messages in thread
From: Valerie Aurora @ 2011-09-13 2:57 UTC (permalink / raw)
To: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux
[-- Attachment #1: Type: text/plain, Size: 1 bytes --]
[-- Attachment #2: 1 --]
[-- Type: application/octet-stream, Size: 8828 bytes --]
commit 043b1284c113cfbaa6a175ad21f369cf59215c71
Author: Valerie Aurora <val@vaaconsulting.com>
Date: Fri Aug 26 20:32:57 2011 -0400
VFS: Fix s_umount thaw/write deadlock
File system freeze/thaw require the superblock's s_umount lock.
However, we write to the file system while holding the s_umount lock
in several places (e.g., writeback and quotas). Any of these can
block on a frozen file system while holding s_umount, preventing any
later thaw from occurring, since thaw requires s_umount.
The solution is to add a check, vfs_is_frozen(), to all code paths
that write while holding sb->s_umount and return without writing if it
is true.
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index c00e055..90f8f7e 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -57,6 +57,14 @@ int drop_caches_sysctl_handler(ctl_table *table, int write,
ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
if (ret)
return ret;
+
+ /*
+ * Any file-system specific routines eventually called by
+ * drop_pagecache_sb() and drop_slab() ought to check for a
+ * frozen file system before writing to the disk. Most file
+ * systems won't write in this case but XFS is a notable
+ * exception in certain cases.
+ */
if (write) {
if (sysctl_drop_caches & 1)
iterate_supers(drop_pagecache_sb, NULL);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 04cf3b9..d1dca03 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
long write_chunk;
long wrote = 0; /* count both pages and inodes */
+ if (vfs_is_frozen(sb))
+ return 0;
+
while (!list_empty(&wb->b_io)) {
struct inode *inode = wb_inode(wb->b_io.prev);
@@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
* writeback_inodes_sb_if_idle - start writeback if none underway
* @sb: the superblock
*
- * Invoke writeback_inodes_sb if no writeback is currently underway.
- * Returns 1 if writeback was started, 0 if not.
+ * Invoke writeback_inodes_sb if no writeback is currently underway
+ * and no one else holds the s_umount lock. Returns 1 if writeback
+ * was started, 0 if not.
*/
int writeback_inodes_sb_if_idle(struct super_block *sb)
{
if (!writeback_in_progress(sb->s_bdi)) {
- down_read(&sb->s_umount);
- writeback_inodes_sb(sb);
- up_read(&sb->s_umount);
- return 1;
- } else
- return 0;
+ if (down_read_trylock(&sb->s_umount)) {
+ writeback_inodes_sb(sb);
+ up_read(&sb->s_umount);
+ return 1;
+ }
+ }
+ return 0;
}
EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
/**
- * writeback_inodes_sb_if_idle - start writeback if none underway
+ * writeback_inodes_sb_nr_if_idle - start writeback if none underway
* @sb: the superblock
* @nr: the number of pages to write
*
- * Invoke writeback_inodes_sb if no writeback is currently underway.
- * Returns 1 if writeback was started, 0 if not.
+ * Invoke writeback_inodes_sb if no writeback is currently underway
+ * and no one else holds the s_umount lock. Returns 1 if writeback
+ * was started, 0 if not.
*/
int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
unsigned long nr)
{
if (!writeback_in_progress(sb->s_bdi)) {
- down_read(&sb->s_umount);
- writeback_inodes_sb_nr(sb, nr);
- up_read(&sb->s_umount);
- return 1;
- } else
- return 0;
+ if (down_read_trylock(&sb->s_umount)) {
+ writeback_inodes_sb_nr(sb, nr);
+ up_read(&sb->s_umount);
+ return 1;
+ }
+ }
+ return 0;
}
EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index b34bdb2..993ce22 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
static void quota_sync_one(struct super_block *sb, void *arg)
{
- if (sb->s_qcop && sb->s_qcop->quota_sync)
+ if (sb->s_qcop && sb->s_qcop->quota_sync && !vfs_is_frozen(sb))
sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
}
@@ -366,6 +366,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
if (IS_ERR(sb))
return PTR_ERR(sb);
+ /*
+ * It's not clear which quota functions may write to the file
+ * system (all?). Check for a frozen fs and bail out now.
+ */
+ if (vfs_is_frozen(sb)) {
+ drop_super(sb);
+ /* XXX Should quotactl_block() error path do this too? */
+ if (pathp && !IS_ERR(pathp))
+ path_put(pathp);
+ return -EBUSY;
+ }
+
ret = do_quotactl(sb, type, cmds, id, addr, pathp);
drop_super(sb);
diff --git a/fs/super.c b/fs/super.c
index 3f56a26..11ef978 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -526,6 +526,10 @@ void sync_supers(void)
*
* Scans the superblock list and calls given function, passing it
* locked superblock and given argument.
+ *
+ * Note: If @f is going to write to the superblock, it must first
+ * check if the file system is frozen (via vfs_is_frozen(sb)) and
+ * refuse to write if so.
*/
void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
{
@@ -561,6 +565,10 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
*
* Scans the superblock list and calls given function, passing it
* locked superblock and given argument.
+ *
+ * Note: If @f is going to write to the superblock, it must first
+ * check if the file system is frozen (via vfs_is_frozen(sb)) and
+ * refuse to write if so.
*/
void iterate_supers_type(struct file_system_type *type,
void (*f)(struct super_block *, void *), void *arg)
@@ -595,6 +603,10 @@ EXPORT_SYMBOL(iterate_supers_type);
*
* Scans the superblock list and finds the superblock of the file system
* mounted on the device given. %NULL is returned if no match is found.
+ *
+ * Note: If caller is going to write to the superblock, it must first
+ * check if the file system is frozen (via vfs_is_frozen(sb)) and
+ * refuse to write if so.
*/
struct super_block *get_super(struct block_device *bdev)
@@ -1127,6 +1139,23 @@ out:
* Syncs the super to make sure the filesystem is consistent and calls the fs's
* freeze_fs. Subsequent calls to this without first thawing the fs will return
* -EBUSY.
+ *
+ * During this function, sb->s_frozen goes through these values:
+ *
+ * SB_UNFROZEN: File system is normal, all writes progress as usual.
+ *
+ * SB_FREEZE_WRITE: The file system is in the process of being frozen
+ * and out-standing writes are being written out. Writes that
+ * complete in-process writes should be permitted but new ones should
+ * be blocked.
+ *
+ * SB_FREEZE_TRANS: The file system is frozen. No writes are
+ * permitted to any part.
+ *
+ * sb->s_frozen is protected by sb->s_umount. Additionally,
+ * SB_FREEZE_WRITE is only temporarily set during freeze/thaw while
+ * holding sb->s_umount for writing, so any other callers holding
+ * sb->s_umount will never see this state.
*/
int freeze_super(struct super_block *sb)
{
diff --git a/fs/sync.c b/fs/sync.c
index c98a747..db15b11 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
/*
* No point in syncing out anything if the filesystem is read-only.
*/
- if (sb->s_flags & MS_RDONLY)
+ if ((sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
return 0;
ret = __sync_filesystem(sb, 0);
@@ -80,7 +80,7 @@ EXPORT_SYMBOL_GPL(sync_filesystem);
static void sync_one_sb(struct super_block *sb, void *arg)
{
- if (!(sb->s_flags & MS_RDONLY))
+ if (!(sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
__sync_filesystem(sb, *(int *)arg);
}
/*
@@ -136,7 +136,7 @@ SYSCALL_DEFINE1(syncfs, int, fd)
{
struct file *file;
struct super_block *sb;
- int ret;
+ int ret = 0;
int fput_needed;
file = fget_light(fd, &fput_needed);
@@ -145,7 +145,8 @@ SYSCALL_DEFINE1(syncfs, int, fd)
sb = file->f_dentry->d_sb;
down_read(&sb->s_umount);
- ret = sync_filesystem(sb);
+ if (!(sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
+ ret = sync_filesystem(sb);
up_read(&sb->s_umount);
fput_light(file, fput_needed);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f23bcb7..d23ab4f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1457,7 +1457,7 @@ extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
extern struct timespec current_fs_time(struct super_block *sb);
/*
- * Snapshotting support.
+ * Snapshotting support. See freeze_super() for documentation.
*/
enum {
SB_UNFROZEN = 0,
@@ -1468,6 +1468,10 @@ enum {
#define vfs_check_frozen(sb, level) \
wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
+static inline int vfs_is_frozen(struct super_block *sb) {
+ return sb->s_frozen == SB_FREEZE_TRANS;
+}
+
/*
* until VFS tracks user namespaces for inodes, just make all files
* belong to init_user_ns
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-09-13 2:53 [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock Valerie Aurora
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
@ 2011-09-14 13:05 ` Jan Kara
2011-09-14 23:22 ` Valerie Aurora
2011-10-27 21:39 ` Christoph Hellwig
2 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2011-09-14 13:05 UTC (permalink / raw)
To: Valerie Aurora
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen
> fs/reiserfs/procfs.c
> - dropped after get_super() call in /proc operation
> XXX don't know, need a reiser expert
Where exactly do we hold s_umount? Anyway, nothing in reiserfs/procfs.c
does not seem to involve writes.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
@ 2011-09-14 14:00 ` Jan Kara
2011-09-14 23:53 ` Valerie Aurora
2011-09-20 22:51 ` Christoph Hellwig
1 sibling, 1 reply; 13+ messages in thread
From: Jan Kara @ 2011-09-14 14:00 UTC (permalink / raw)
To: Valerie Aurora
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen
On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> Val, if you are sending patches as attachments, make them at least
> text/plain please!
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 04cf3b9..d1dca03 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> long write_chunk;
> long wrote = 0; /* count both pages and inodes */
>
> + if (vfs_is_frozen(sb))
> + return 0;
> +
Umm, maybe we could make this more robust by skipping the superblock in
__writeback_inodes_wb() and just explicitely stopping the writeback when
work->sb is set (i.e. writeback is required only for frozen sb) in
wb_writeback()?
> while (!list_empty(&wb->b_io)) {
> struct inode *inode = wb_inode(wb->b_io.prev);
>
> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> * writeback_inodes_sb_if_idle - start writeback if none underway
> * @sb: the superblock
> *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock. Returns 1 if writeback
> + * was started, 0 if not.
> */
> int writeback_inodes_sb_if_idle(struct super_block *sb)
> {
> if (!writeback_in_progress(sb->s_bdi)) {
> - down_read(&sb->s_umount);
> - writeback_inodes_sb(sb);
> - up_read(&sb->s_umount);
> - return 1;
> - } else
> - return 0;
> + if (down_read_trylock(&sb->s_umount)) {
What's exactly the deadlock trylock protects from here? Or is it just an
optimization?
> + writeback_inodes_sb(sb);
> + up_read(&sb->s_umount);
> + return 1;
> + }
> + }
> + return 0;
> }
> EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
>
> /**
> - * writeback_inodes_sb_if_idle - start writeback if none underway
> + * writeback_inodes_sb_nr_if_idle - start writeback if none underway
> * @sb: the superblock
> * @nr: the number of pages to write
> *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock. Returns 1 if writeback
> + * was started, 0 if not.
> */
> int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
> unsigned long nr)
> {
> if (!writeback_in_progress(sb->s_bdi)) {
> - down_read(&sb->s_umount);
> - writeback_inodes_sb_nr(sb, nr);
> - up_read(&sb->s_umount);
> - return 1;
> - } else
> - return 0;
> + if (down_read_trylock(&sb->s_umount)) {
The same question here...
> + writeback_inodes_sb_nr(sb, nr);
> + up_read(&sb->s_umount);
> + return 1;
> + }
> + }
> + return 0;
> }
> EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
>
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index b34bdb2..993ce22 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -366,6 +366,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
> if (IS_ERR(sb))
> return PTR_ERR(sb);
>
> + /*
> + * It's not clear which quota functions may write to the file
> + * system (all?). Check for a frozen fs and bail out now.
> + */
> + if (vfs_is_frozen(sb)) {
> + drop_super(sb);
> + /* XXX Should quotactl_block() error path do this too? */
Yes, it does. Thanks for spotting this.
> + if (pathp && !IS_ERR(pathp))
> + path_put(pathp);
> + return -EBUSY;
> + }
> +
> ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>
> drop_super(sb);
...
> diff --git a/fs/sync.c b/fs/sync.c
> index c98a747..db15b11 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
> /*
> * No point in syncing out anything if the filesystem is read-only.
> */
> - if (sb->s_flags & MS_RDONLY)
> + if ((sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
^^^^
The check should be: (sb->s_flags & MS_RDONLY || vfs_is_frozen(sb))
> return 0;
>
> ret = __sync_filesystem(sb, 0);
Honza
--...
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-09-14 13:05 ` [RFC PATCH 0/3] " Jan Kara
@ 2011-09-14 23:22 ` Valerie Aurora
0 siblings, 0 replies; 13+ messages in thread
From: Valerie Aurora @ 2011-09-14 23:22 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Dave Chinner, Masayoshi MIZUMA, Greg Freemyer,
linux-ext4, Eric Sandeen
On Wed, Sep 14, 2011 at 6:05 AM, Jan Kara <jack@suse.cz> wrote:
>> fs/reiserfs/procfs.c
>> - dropped after get_super() call in /proc operation
>> XXX don't know, need a reiser expert
> Where exactly do we hold s_umount? Anyway, nothing in reiserfs/procfs.c
> does not seem to involve writes.
Sorry, this actually calls sget(), not get_super(). And I agree,
there aren't any writes. Audit updated accordingly.
Thanks!
-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-14 14:00 ` Jan Kara
@ 2011-09-14 23:53 ` Valerie Aurora
2011-09-15 16:22 ` Jan Kara
2011-09-18 23:25 ` Dave Chinner
0 siblings, 2 replies; 13+ messages in thread
From: Valerie Aurora @ 2011-09-14 23:53 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Dave Chinner, Masayoshi MIZUMA, Greg Freemyer,
linux-ext4, Eric Sandeen
On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <jack@suse.cz> wrote:
> On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
>> Val, if you are sending patches as attachments, make them at least
>> text/plain please!
Oops, sorry.
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 04cf3b9..d1dca03 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
>> long write_chunk;
>> long wrote = 0; /* count both pages and inodes */
>>
>> + if (vfs_is_frozen(sb))
>> + return 0;
>> +
> Umm, maybe we could make this more robust by skipping the superblock in
> __writeback_inodes_wb() and just explicitely stopping the writeback when
> work->sb is set (i.e. writeback is required only for frozen sb) in
> wb_writeback()?
Sorry, I don't quite understand what the goal is here? I'm happy to
make the change, just want to make sure I'm accomplishing what you
want.
>> while (!list_empty(&wb->b_io)) {
>> struct inode *inode = wb_inode(wb->b_io.prev);
>>
>> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
>> * writeback_inodes_sb_if_idle - start writeback if none underway
>> * @sb: the superblock
>> *
>> - * Invoke writeback_inodes_sb if no writeback is currently underway.
>> - * Returns 1 if writeback was started, 0 if not.
>> + * Invoke writeback_inodes_sb if no writeback is currently underway
>> + * and no one else holds the s_umount lock. Returns 1 if writeback
>> + * was started, 0 if not.
>> */
>> int writeback_inodes_sb_if_idle(struct super_block *sb)
>> {
>> if (!writeback_in_progress(sb->s_bdi)) {
>> - down_read(&sb->s_umount);
>> - writeback_inodes_sb(sb);
>> - up_read(&sb->s_umount);
>> - return 1;
>> - } else
>> - return 0;
>> + if (down_read_trylock(&sb->s_umount)) {
> What's exactly the deadlock trylock protects from here? Or is it just an
> optimization?
The trylock is an optimization Dave Chinner suggested. The first
version I wrote acquired the lock and then checked vfs_is_frozen().
This function and the similar following one each have one caller,
btrfs in one case and ext4 in the other, and they are both trying to
get more writes to go out to disk in the hope of freeing up disk
space.
>> + writeback_inodes_sb(sb);
>> + up_read(&sb->s_umount);
>> + return 1;
>> + }
>> + }
>> + return 0;
>> }
>> EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
>>
>> /**
>> - * writeback_inodes_sb_if_idle - start writeback if none underway
>> + * writeback_inodes_sb_nr_if_idle - start writeback if none underway
>> * @sb: the superblock
>> * @nr: the number of pages to write
>> *
>> - * Invoke writeback_inodes_sb if no writeback is currently underway.
>> - * Returns 1 if writeback was started, 0 if not.
>> + * Invoke writeback_inodes_sb if no writeback is currently underway
>> + * and no one else holds the s_umount lock. Returns 1 if writeback
>> + * was started, 0 if not.
>> */
>> int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
>> unsigned long nr)
>> {
>> if (!writeback_in_progress(sb->s_bdi)) {
>> - down_read(&sb->s_umount);
>> - writeback_inodes_sb_nr(sb, nr);
>> - up_read(&sb->s_umount);
>> - return 1;
>> - } else
>> - return 0;
>> + if (down_read_trylock(&sb->s_umount)) {
> The same question here...
>
>> + writeback_inodes_sb_nr(sb, nr);
>> + up_read(&sb->s_umount);
>> + return 1;
>> + }
>> + }
>> + return 0;
>> }
>> EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
>>
>> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
>> index b34bdb2..993ce22 100644
>> --- a/fs/quota/quota.c
>> +++ b/fs/quota/quota.c
>> @@ -366,6 +366,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
>> if (IS_ERR(sb))
>> return PTR_ERR(sb);
>>
>> + /*
>> + * It's not clear which quota functions may write to the file
>> + * system (all?). Check for a frozen fs and bail out now.
>> + */
>> + if (vfs_is_frozen(sb)) {
>> + drop_super(sb);
>> + /* XXX Should quotactl_block() error path do this too? */
> Yes, it does. Thanks for spotting this.
Fixed, thanks for confirming.
>> + if (pathp && !IS_ERR(pathp))
>> + path_put(pathp);
>> + return -EBUSY;
>> + }
>> +
>> ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>>
>> drop_super(sb);
> ...
>> diff --git a/fs/sync.c b/fs/sync.c
>> index c98a747..db15b11 100644
>> --- a/fs/sync.c
>> +++ b/fs/sync.c
>> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
>> /*
>> * No point in syncing out anything if the filesystem is read-only.
>> */
>> - if (sb->s_flags & MS_RDONLY)
>> + if ((sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
> ^^^^
> The check should be: (sb->s_flags & MS_RDONLY || vfs_is_frozen(sb))
Fixed, and I reviewed the other checks for similar mistakes. Thanks!
I'll resend soon.
-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-14 23:53 ` Valerie Aurora
@ 2011-09-15 16:22 ` Jan Kara
2011-09-18 23:25 ` Dave Chinner
1 sibling, 0 replies; 13+ messages in thread
From: Jan Kara @ 2011-09-15 16:22 UTC (permalink / raw)
To: Valerie Aurora
Cc: Jan Kara, linux-fsdevel, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen
On Wed 14-09-11 16:53:38, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <jack@suse.cz> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >> long write_chunk;
> >> long wrote = 0; /* count both pages and inodes */
> >>
> >> + if (vfs_is_frozen(sb))
> >> + return 0;
> >> +
> > Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
>
> Sorry, I don't quite understand what the goal is here? I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
My worry is, that when we just bail out from writeback_sb_inodes(), we
leave dirty inodes on b_io list. That is a unique behavior in writeback
code since all other places either put inode to b_more_io list for a quick
retry or redirty the inode to retry it later.
Emptyness of b_io list is used for example to detect whether we should
queue more inodes to b_io list and also busylooping logic kind of doesn't
count with inodes staying at b_io list. So although I don't see an
immediate problem with your solution it does add a new special case.
After some more thought I think the cleanest would be to just call
redirty_tail() if an inode is on frozen superblock in the beginning of the
loop in writeback_sb_inodes(). It won't be as efficient but much more in
line how the rest of the writeback code works.
> >> while (!list_empty(&wb->b_io)) {
> >> struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >> * writeback_inodes_sb_if_idle - start writeback if none underway
> >> * @sb: the superblock
> >> *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock. Returns 1 if writeback
> >> + * was started, 0 if not.
> >> */
> >> int writeback_inodes_sb_if_idle(struct super_block *sb)
> >> {
> >> if (!writeback_in_progress(sb->s_bdi)) {
> >> - down_read(&sb->s_umount);
> >> - writeback_inodes_sb(sb);
> >> - up_read(&sb->s_umount);
> >> - return 1;
> >> - } else
> >> - return 0;
> >> + if (down_read_trylock(&sb->s_umount)) {
> > What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
>
> The trylock is an optimization Dave Chinner suggested. The first
> version I wrote acquired the lock and then checked vfs_is_frozen().
>
> This function and the similar following one each have one caller,
> btrfs in one case and ext4 in the other, and they are both trying to
> get more writes to go out to disk in the hope of freeing up disk
> space.
Ah right. Maybe a short comment that it's just an optimization for
opportunictic writeout would be nice.
> >> + writeback_inodes_sb(sb);
> >> + up_read(&sb->s_umount);
> >> + return 1;
> >> + }
> >> + }
> >> + return 0;
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-14 23:53 ` Valerie Aurora
2011-09-15 16:22 ` Jan Kara
@ 2011-09-18 23:25 ` Dave Chinner
1 sibling, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2011-09-18 23:25 UTC (permalink / raw)
To: Valerie Aurora
Cc: Jan Kara, linux-fsdevel, Masayoshi MIZUMA, Greg Freemyer,
linux-ext4, Eric Sandeen
On Wed, Sep 14, 2011 at 04:53:38PM -0700, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <jack@suse.cz> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> Val, if you are sending patches as attachments, make them at least
> >> text/plain please!
>
> Oops, sorry.
>
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >> long write_chunk;
> >> long wrote = 0; /* count both pages and inodes */
> >>
> >> + if (vfs_is_frozen(sb))
> >> + return 0;
> >> +
> > Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
>
> Sorry, I don't quite understand what the goal is here? I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
>
> >> while (!list_empty(&wb->b_io)) {
> >> struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >> * writeback_inodes_sb_if_idle - start writeback if none underway
> >> * @sb: the superblock
> >> *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock. Returns 1 if writeback
> >> + * was started, 0 if not.
> >> */
> >> int writeback_inodes_sb_if_idle(struct super_block *sb)
> >> {
> >> if (!writeback_in_progress(sb->s_bdi)) {
> >> - down_read(&sb->s_umount);
> >> - writeback_inodes_sb(sb);
> >> - up_read(&sb->s_umount);
> >> - return 1;
> >> - } else
> >> - return 0;
> >> + if (down_read_trylock(&sb->s_umount)) {
> > What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
>
> The trylock is an optimization Dave Chinner suggested. The first
> version I wrote acquired the lock and then checked vfs_is_frozen().
It's not so much an optimisation, but the general case of avoiding
read-write deadlocks such that freezing can trigger. I think remount
can trigger the same deadlock as freezing, so the trylock avoids both
deadlock cases rather than just working around the freeze problem....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
2011-09-14 14:00 ` Jan Kara
@ 2011-09-20 22:51 ` Christoph Hellwig
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-09-20 22:51 UTC (permalink / raw)
To: Valerie Aurora
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen
On Mon, Sep 12, 2011 at 07:57:11PM -0700, Valerie Aurora wrote:
>
Can you please resend the series in proper format, that is inlined,
with the subject in the mail header and just the description and the
patch itself in the body?
The current form is not really reviewable.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-09-13 2:53 [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock Valerie Aurora
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
2011-09-14 13:05 ` [RFC PATCH 0/3] " Jan Kara
@ 2011-10-27 21:39 ` Christoph Hellwig
2011-10-27 22:08 ` Valerie Aurora
2 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2011-10-27 21:39 UTC (permalink / raw)
To: Valerie Aurora
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen
Any plans to resubmit these with the updates and in proper format?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-10-27 21:39 ` Christoph Hellwig
@ 2011-10-27 22:08 ` Valerie Aurora
2011-10-30 0:59 ` Valerie Aurora
0 siblings, 1 reply; 13+ messages in thread
From: Valerie Aurora @ 2011-10-27 22:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen, Surbhi Palande
On Thu, Oct 27, 2011 at 2:39 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Any plans to resubmit these with the updates and in proper format?
Yes, coming tomorrow. We found more bugs during testing which
fortunately Surbhi Palande already fixed and submitted patches for,
but got lost in the shuffle.
-VAL
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-10-27 22:08 ` Valerie Aurora
@ 2011-10-30 0:59 ` Valerie Aurora
2011-11-28 20:58 ` Valerie Aurora
0 siblings, 1 reply; 13+ messages in thread
From: Valerie Aurora @ 2011-10-30 0:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen, Surbhi Palande
On Thu, Oct 27, 2011 at 3:08 PM, Valerie Aurora <val@vaaconsulting.com> wrote:
> On Thu, Oct 27, 2011 at 2:39 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> Any plans to resubmit these with the updates and in proper format?
>
> Yes, coming tomorrow. We found more bugs during testing which
> fortunately Surbhi Palande already fixed and submitted patches for,
> but got lost in the shuffle.
Ironically, this has been held up by a disk failure. :) A few more days.
-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock
2011-10-30 0:59 ` Valerie Aurora
@ 2011-11-28 20:58 ` Valerie Aurora
0 siblings, 0 replies; 13+ messages in thread
From: Valerie Aurora @ 2011-11-28 20:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, Jan Kara, Dave Chinner, Masayoshi MIZUMA,
Greg Freemyer, linux-ext4, Eric Sandeen, Surbhi Palande,
Christopher Chaltain
On Sat, Oct 29, 2011 at 5:59 PM, Valerie Aurora <val@vaaconsulting.com> wrote:
> On Thu, Oct 27, 2011 at 3:08 PM, Valerie Aurora <val@vaaconsulting.com> wrote:
>> On Thu, Oct 27, 2011 at 2:39 PM, Christoph Hellwig <hch@infradead.org> wrote:
>>> Any plans to resubmit these with the updates and in proper format?
>>
>> Yes, coming tomorrow. We found more bugs during testing which
>> fortunately Surbhi Palande already fixed and submitted patches for,
>> but got lost in the shuffle.
>
> Ironically, this has been held up by a disk failure. :) A few more days.
I no longer have time to consult in addition to my full-time job, so
Canonical is taking the lead on this again.
-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-11-28 20:58 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-13 2:53 [RFC PATCH 0/3] VFS: Fix s_umount thaw/write deadlock Valerie Aurora
2011-09-13 2:57 ` [RFC PATCH 1/3] " Valerie Aurora
2011-09-14 14:00 ` Jan Kara
2011-09-14 23:53 ` Valerie Aurora
2011-09-15 16:22 ` Jan Kara
2011-09-18 23:25 ` Dave Chinner
2011-09-20 22:51 ` Christoph Hellwig
2011-09-14 13:05 ` [RFC PATCH 0/3] " Jan Kara
2011-09-14 23:22 ` Valerie Aurora
2011-10-27 21:39 ` Christoph Hellwig
2011-10-27 22:08 ` Valerie Aurora
2011-10-30 0:59 ` Valerie Aurora
2011-11-28 20:58 ` Valerie Aurora
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).