Re: [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: linux-fsdevel@vger.kernel.org, viro@ZenIV.linux.org.uk
Subject: Re: [PATCH 3/3]      vfs: fix filesystem_sync vs write race on rw=>ro remount v2
Date: Wed, 3 Mar 2010 02:21:11 +1100	[thread overview]
Message-ID: <20100302152111.GK8653@laptop> (raw)
In-Reply-To: <1267230349-8192-3-git-send-email-dmonakhov@openvz.org>

On Sat, Feb 27, 2010 at 03:25:49AM +0300, Dmitry Monakhov wrote:
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e816097..bc79f1f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -277,6 +277,13 @@ int mnt_want_write(struct vfsmount *mnt)
>  		dec_mnt_writers(mnt);
>  		ret = -EROFS;
>  		goto out;
> +	} else {
> +		/*
> +		 * Clear ST_REMOUNT_RO flag to let remount task know
> +		 * about new writers.
> +		 */
> +		if (unlikely(test_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state)))
> +			clear_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state);
>  	}
>  out:
>  	preempt_enable();

I'm not totally sure you've covered the race here. The other side:

store s_state <- ST_REMOUNT_RO
smp_mb()
load  write count
spin_unlock()
load  s_state

So to start with, the load of s_state could be moved up above one
of the loads of write_count.

This side:
store  write count
smp_mb()
load   s_state
store  s_state

The memory orderings themselves should be OK, although you haven't
updated the existing smp_mb() comments to say that it now also orders
s_state manipulation with write count manipulation and hence also
couples with the new smp_mb you introduced[*].

However if you do this store/store versus load/load sequence for
synchronization (or load/store vs store/load), then you usually need to
do them in opposite order otherwise you lose synchronization: couldn't
write count be loaded in (1) before it is incremented in (2), then
s_state loaded in (1) before it is cleared in (2)?

[*] please explicitly comment all barriers (including acquire and release
barriers if you use them for something other than just the critical
sections). For *every* barrier site, comment or at least point to
comments that document all actors in the protocol.

> @@ -581,18 +584,33 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>  
>  	if (flags & MS_RDONLY)
>  		acct_auto_close(sb);
> +	if (remount_ro) {
> +		if (force)
> +			mark_files_ro(sb);

It's a pity we still need the files list for mark_files_ro.

> +		/*
> +		 * Store ST_REMOUNT_RO flag. New writers (in any) will clrear
> +		 * this bit.
> +		 */
> +		set_bit(ST_REMOUNT_RO, &sb->s_state);
> +		/*
> +		 * This store should be visible before we do.
> +		 */
> +		smp_mb();
> +		/*
> +		 * To prevent sync vs write race condition we have to check
> +		 * writers before filesystem sync.
> +		 */
> +		if (!fs_may_remount_ro(sb))
> +			return -EBUSY;

I guess in the 'force' case, you may actually want to try harder than
this.

I don't know how you can reasonably do that and prevent livelock,
though. Apart from doing this totally differently and putting a sleeping
lock into mnt_want_write, taken only in the slowpath when a remount
action is pending. This should make things simpler here but no idea
whether mnt_want_write is called from atomic context. 

Overall I don't know what to think of the direction you're taking here.
I guess it's probably viable, but OTOH if we were able to block in
mnt_want_write then we should be able to just eliminate all races
and make it work still by just traversing the files list. (I would
like it much more if mark_files_ro could go away at the same time,
but I don't see how).

     prev parent reply	other threads:[~2010-03-02 15:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-27  0:25 [PATCH 1/3] vfs: redesign sb state flags Dmitry Monakhov
2010-02-27  0:25 ` [PATCH 2/3] vfs: per-sb write count preparation stage Dmitry Monakhov
2010-02-27  0:25   ` [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2 Dmitry Monakhov
2010-03-02 13:29     ` Jan Kara
2010-03-02 14:04       ` Dmitry Monakhov
2010-03-02 14:06         ` Jan Kara
2010-03-02 14:24           ` Dmitry Monakhov
2010-03-02 14:35             ` Jan Kara
2010-03-02 15:38               ` Dmitry Monakhov
2010-03-02 15:21     ` Nick Piggin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100302152111.GK8653@laptop \
    --to=npiggin@suse.de \
    --cc=dmonakhov@openvz.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.