From: Jan Kara <jack@suse.cz>
To: Kamal Mostafa <kamal@canonical.com>
Cc: Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Matthew Wilcox <matthew@wil.cx>,
Randy Dunlap <rdunlap@xenotime.net>, Theodore Tso <tytso@mit.edu>,
linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Surbhi Palande <csurbhi@gmail.com>,
Valerie Aurora <val@vaaconsulting.com>,
Christopher Chaltain <christopher.chaltain@canonical.com>,
"Peter M. Petrakis" <peter.petrakis@canonical.com>,
Mikulas Patocka <mpatocka@redhat.com>
Subject: Re: [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock
Date: Fri, 6 Jan 2012 02:50:26 +0100 [thread overview]
Message-ID: <20120106015026.GF3790@quack.suse.cz> (raw)
In-Reply-To: <1323367477-21685-4-git-send-email-kamal@canonical.com>
Hello,
On Thu 08-12-11 10:04:33, Kamal Mostafa wrote:
> From: Valerie Aurora <val@vaaconsulting.com>
>
> File system freeze/thaw require the superblock's s_umount lock.
> However, we write to the file system while holding the s_umount lock
> in several places (e.g., writeback and quotas). Any of these can
> block on a frozen file system while holding s_umount, preventing any
> later thaw from occurring, since thaw requires s_umount.
>
> The solution is to add a check, vfs_is_frozen(), to all code paths
> that write while holding sb->s_umount and return without writing if it
> is true.
Mikulas Patocka correctly pointed out (in thread starting by
https://lkml.org/lkml/2011/11/25/169) that skipping frozen filesystem is
currently actually wrong for filesystems such as ext2. These filesystems
cannot stop modifications from happening and thus sync must not skip them
even when they are marked as frozen. That complicates the solution of the
deadlock you observe.
Another issue is that even with ext4 current freezing code can leave dirty
data on frozen filesystem. Consider the following race:
Thread 1 Thread 2
freeze_super() __generic_file_aio_write()
... vfs_check_frozen(sb)
sb->s_frozen = SB_FREEZE_WRITE; ...
sync_filesystem(sb);
now we go and write to the fs
sb->s_frozen = SB_FREEZE_TRANS;
Your patches would just hide this race (which I can actually trigger rather
easily in my testing).
Above two issues make me think more about how to really avoid having any
dirty bits set on frozen filesystem. Then we shouldn't need this patch at
all. The trouble is that race like above can really happen with any
operation modifying the filesystem so it's not really easy to fix. I'll
write email about that tomorrow...
Honza
> BugLink: https://bugs.launchpad.net/bugs/897421
> Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
> Cc: Kamal Mostafa <kamal@canonical.com>
> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
> [kamal@canonical.com: minor changes; patch restructure]
> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
> ---
> fs/fs-writeback.c | 8 ++++++++
> fs/quota/quota.c | 21 ++++++++++++++++++++-
> fs/super.c | 8 ++++++++
> fs/sync.c | 4 ++--
> include/linux/fs.h | 7 ++++++-
> 5 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 73c3992..eee01cd 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -554,6 +554,14 @@ static long writeback_sb_inodes(struct super_block *sb,
> while (!list_empty(&wb->b_io)) {
> struct inode *inode = wb_inode(wb->b_io.prev);
>
> + if (inode->i_sb == sb && vfs_is_frozen(sb)) {
> + /*
> + * Inode is on a frozen superblock; skip it for now.
> + */
> + redirty_tail(inode, wb);
> + continue;
> + }
> +
> if (inode->i_sb != sb) {
> if (work->sb) {
> /*
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 35f4b0e..1d770f2 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -47,7 +47,7 @@ static int check_quotactl_permission(struct super_block *sb, int type, int cmd,
>
> static void quota_sync_one(struct super_block *sb, void *arg)
> {
> - if (sb->s_qcop && sb->s_qcop->quota_sync)
> + if (sb->s_qcop && sb->s_qcop->quota_sync && !vfs_is_frozen(sb))
> sb->s_qcop->quota_sync(sb, *(int *)arg, 1);
> }
>
> @@ -368,8 +368,27 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
> goto out;
> }
>
> + /*
> + * If the fs is frozen, only allow read-only quota subcmds.
> + */
> + if (vfs_is_frozen(sb)) {
> + switch (cmd) {
> + case Q_GETFMT:
> + case Q_GETINFO:
> + case Q_XGETQSTAT:
> + ret = 0;
> + break;
> + default:
> + ret = -EBUSY;
> + break;
> + }
> + if ( ret != 0 )
> + goto out_drop_super;
> + }
> +
> ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>
> +out_drop_super:
> drop_super(sb);
> out:
> if (pathp && !IS_ERR(pathp))
> diff --git a/fs/super.c b/fs/super.c
> index afd0f1a..5629d06 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -526,6 +526,10 @@ void sync_supers(void)
> *
> * Scans the superblock list and calls given function, passing it
> * locked superblock and given argument.
> + *
> + * Note: If @f is going to write to the file system, it must first
> + * check if the file system is frozen (via vfs_is_frozen(sb)) and
> + * refuse to write if so.
> */
> void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> {
> @@ -595,6 +599,10 @@ EXPORT_SYMBOL(iterate_supers_type);
> *
> * Scans the superblock list and finds the superblock of the file system
> * mounted on the device given. %NULL is returned if no match is found.
> + *
> + * Note: If caller is going to write to the superblock, it must first
> + * check if the file system is frozen (via vfs_is_frozen(sb)) and
> + * refuse to write if so.
> */
>
> struct super_block *get_super(struct block_device *bdev)
> diff --git a/fs/sync.c b/fs/sync.c
> index 101b8ef..e8166db 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
> /*
> * No point in syncing out anything if the filesystem is read-only.
> */
> - if (sb->s_flags & MS_RDONLY)
> + if ((sb->s_flags & MS_RDONLY) || vfs_is_frozen(sb))
> return 0;
>
> ret = __sync_filesystem(sb, 0);
> @@ -80,7 +80,7 @@ EXPORT_SYMBOL_GPL(sync_filesystem);
>
> static void sync_one_sb(struct super_block *sb, void *arg)
> {
> - if (!(sb->s_flags & MS_RDONLY))
> + if (!(sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
> __sync_filesystem(sb, *(int *)arg);
> }
> /*
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 019dc55..ec33b4c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1490,7 +1490,7 @@ extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan);
> extern struct timespec current_fs_time(struct super_block *sb);
>
> /*
> - * Snapshotting support.
> + * Snapshotting support. See freeze_super() for documentation.
> */
> enum {
> SB_UNFROZEN = 0,
> @@ -1501,6 +1501,11 @@ enum {
> #define vfs_check_frozen(sb, level) \
> wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
>
> +static inline int vfs_is_frozen(struct super_block *sb)
> +{
> + return sb->s_frozen == SB_FREEZE_TRANS;
> +}
> +
> /*
> * until VFS tracks user namespaces for inodes, just make all files
> * belong to init_user_ns
> --
> 1.7.5.4
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2012-01-06 1:50 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-08 18:04 [PATCH v2 0/7] fix s_umount thaw/write and journal deadlock Kamal Mostafa
2011-12-08 18:04 ` [PATCH v2 1/7] Adding support to freeze and unfreeze a journal Kamal Mostafa
2012-01-10 20:20 ` Eric Sandeen
2012-01-10 21:31 ` Jan Kara
2012-01-10 21:55 ` Surbhi Palande
[not found] ` <CAMBkX3eVeKSmEzmYTe6Oe_D6kAMQTL5LYoi1-Axj7CcrM85Pow@mail.gmail.com>
2012-01-11 0:04 ` Jan Kara
2012-01-11 0:13 ` Surbhi Palande
2012-01-11 0:51 ` Jan Kara
2012-01-11 3:08 ` Eric Sandeen
2012-01-10 22:00 ` Surbhi Palande
2011-12-08 18:04 ` [PATCH v2 2/7] Freeze and thaw the journal on ext4 freeze Kamal Mostafa
2012-01-06 0:32 ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 3/7] VFS: Fix s_umount thaw/write deadlock Kamal Mostafa
2012-01-06 1:50 ` Jan Kara [this message]
2011-12-08 18:04 ` [PATCH v2 4/7] VFS: Rename and refactor writeback_inodes_sb_if_idle Kamal Mostafa
2011-12-13 3:34 ` Miao Xie
2011-12-15 7:10 ` Miao Xie
2011-12-16 20:48 ` Kamal Mostafa
2012-01-06 0:33 ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb Kamal Mostafa
2012-01-06 0:35 ` Jan Kara
2012-01-11 20:29 ` Kamal Mostafa
2012-01-12 15:53 ` Mikulas Patocka
2011-12-08 18:04 ` [PATCH v2 6/7] VFS: Document s_frozen state through freeze_super Kamal Mostafa
2012-01-06 0:36 ` Jan Kara
2011-12-08 18:04 ` [PATCH v2 7/7] Documentation: Correct s_umount state for freeze_fs/unfreeze_fs Kamal Mostafa
2012-01-06 0:36 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120106015026.GF3790@quack.suse.cz \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=christopher.chaltain@canonical.com \
--cc=csurbhi@gmail.com \
--cc=kamal@canonical.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=matthew@wil.cx \
--cc=mpatocka@redhat.com \
--cc=peter.petrakis@canonical.com \
--cc=rdunlap@xenotime.net \
--cc=tytso@mit.edu \
--cc=val@vaaconsulting.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).