From: Nick Piggin <npiggin@suse.de>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-fsdevel@vger.kernel.org, viro@ZenIV.linux.org.uk
Subject: Re: [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2
Date: Wed, 3 Mar 2010 02:21:11 +1100 [thread overview]
Message-ID: <20100302152111.GK8653@laptop> (raw)
In-Reply-To: <1267230349-8192-3-git-send-email-dmonakhov@openvz.org>
On Sat, Feb 27, 2010 at 03:25:49AM +0300, Dmitry Monakhov wrote:
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e816097..bc79f1f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -277,6 +277,13 @@ int mnt_want_write(struct vfsmount *mnt)
> dec_mnt_writers(mnt);
> ret = -EROFS;
> goto out;
> + } else {
> + /*
> + * Clear ST_REMOUNT_RO flag to let remount task know
> + * about new writers.
> + */
> + if (unlikely(test_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state)))
> + clear_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state);
> }
> out:
> preempt_enable();
I'm not totally sure you've covered the race here. The other side:
store s_state <- ST_REMOUNT_RO
smp_mb()
load write count
spin_unlock()
load s_state
So to start with, the load of s_state could be moved up above one
of the loads of write_count.
This side:
store write count
smp_mb()
load s_state
store s_state
The memory orderings themselves should be OK, although you haven't
updated the existing smp_mb() comments to say that it now also orders
s_state manipulation with write count manipulation and hence also
couples with the new smp_mb you introduced[*].
However if you do this store/store versus load/load sequence for
synchronization (or load/store vs store/load), then you usually need to
do them in opposite order otherwise you lose synchronization: couldn't
write count be loaded in (1) before it is incremented in (2), then
s_state loaded in (1) before it is cleared in (2)?
[*] please explicitly comment all barriers (including acquire and release
barriers if you use them for something other than just the critical
sections). For *every* barrier site, comment or at least point to
comments that document all actors in the protocol.
> @@ -581,18 +584,33 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>
> if (flags & MS_RDONLY)
> acct_auto_close(sb);
> + if (remount_ro) {
> + if (force)
> + mark_files_ro(sb);
It's a pity we still need the files list for mark_files_ro.
> + /*
> + * Store ST_REMOUNT_RO flag. New writers (in any) will clrear
> + * this bit.
> + */
> + set_bit(ST_REMOUNT_RO, &sb->s_state);
> + /*
> + * This store should be visible before we do.
> + */
> + smp_mb();
> + /*
> + * To prevent sync vs write race condition we have to check
> + * writers before filesystem sync.
> + */
> + if (!fs_may_remount_ro(sb))
> + return -EBUSY;
I guess in the 'force' case, you may actually want to try harder than
this.
I don't know how you can reasonably do that and prevent livelock,
though. Apart from doing this totally differently and putting a sleeping
lock into mnt_want_write, taken only in the slowpath when a remount
action is pending. This should make things simpler here but no idea
whether mnt_want_write is called from atomic context.
Overall I don't know what to think of the direction you're taking here.
I guess it's probably viable, but OTOH if we were able to block in
mnt_want_write then we should be able to just eliminate all races
and make it work still by just traversing the files list. (I would
like it much more if mark_files_ro could go away at the same time,
but I don't see how).
prev parent reply other threads:[~2010-03-02 15:21 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-27 0:25 [PATCH 1/3] vfs: redesign sb state flags Dmitry Monakhov
2010-02-27 0:25 ` [PATCH 2/3] vfs: per-sb write count preparation stage Dmitry Monakhov
2010-02-27 0:25 ` [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2 Dmitry Monakhov
2010-03-02 13:29 ` Jan Kara
2010-03-02 14:04 ` Dmitry Monakhov
2010-03-02 14:06 ` Jan Kara
2010-03-02 14:24 ` Dmitry Monakhov
2010-03-02 14:35 ` Jan Kara
2010-03-02 15:38 ` Dmitry Monakhov
2010-03-02 15:21 ` Nick Piggin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100302152111.GK8653@laptop \
--to=npiggin@suse.de \
--cc=dmonakhov@openvz.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).