From: Nick Piggin
Subject: Re: [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2
Date: Wed, 3 Mar 2010 02:21:11 +1100
Message-ID: <20100302152111.GK8653@laptop>
In-Reply-To: <1267230349-8192-3-git-send-email-dmonakhov@openvz.org>
References: <1267230349-8192-1-git-send-email-dmonakhov@openvz.org>
 <1267230349-8192-2-git-send-email-dmonakhov@openvz.org>
 <1267230349-8192-3-git-send-email-dmonakhov@openvz.org>
To: Dmitry Monakhov
Cc: linux-fsdevel@vger.kernel.org, viro@ZenIV.linux.org.uk

On Sat, Feb 27, 2010 at 03:25:49AM +0300, Dmitry Monakhov wrote:
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e816097..bc79f1f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -277,6 +277,13 @@ int mnt_want_write(struct vfsmount *mnt)
>  		dec_mnt_writers(mnt);
>  		ret = -EROFS;
>  		goto out;
> +	} else {
> +		/*
> +		 * Clear ST_REMOUNT_RO flag to let remount task know
> +		 * about new writers.
> +		 */
> +		if (unlikely(test_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state)))
> +			clear_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state);
>  	}
>  out:
>  	preempt_enable();

I'm not totally sure you've covered the race here.

The other side (1):

  store s_state <- ST_REMOUNT_RO
  smp_mb()
  load write count
  spin_unlock()
  load s_state

So to start with, the load of s_state could be moved up above one of
the loads of write count.

This side (2):

  store write count
  smp_mb()
  load s_state
  store s_state

The memory orderings themselves should be OK, although you haven't
updated the existing smp_mb() comments to say that it now also orders
s_state manipulation with write count manipulation, and hence also
couples with the new smp_mb() you introduced[*].

However, if you do this store/store versus load/load sequence for
synchronization (or load/store vs store/load), then you usually need
to do them in the opposite order, otherwise you lose synchronization:
couldn't the write count be loaded in (1) before it is incremented in
(2), and then s_state be loaded in (1) before it is cleared in (2)?

[*] Please explicitly comment all barriers (including acquire and
release barriers if you use them for something other than just the
critical sections). For *every* barrier site, comment or at least
point to comments that document all actors in the protocol.

> @@ -581,18 +584,33 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>
>  	if (flags & MS_RDONLY)
>  		acct_auto_close(sb);
> +	if (remount_ro) {
> +		if (force)
> +			mark_files_ro(sb);

It's a pity we still need the files list for mark_files_ro.

> +		/*
> +		 * Store ST_REMOUNT_RO flag. New writers (in any) will clrear
> +		 * this bit.
> +		 */
> +		set_bit(ST_REMOUNT_RO, &sb->s_state);
> +		/*
> +		 * This store should be visible before we do.
> +		 */
> +		smp_mb();
> +		/*
> +		 * To prevent sync vs write race condition we have to check
> +		 * writers before filesystem sync.
> +		 */
> +		if (!fs_may_remount_ro(sb))
> +			return -EBUSY;

I guess in the 'force' case, you may actually want to try harder than
this. I don't know how you can reasonably do that and prevent
livelock, though.
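Going back to the ordering question for a moment, here is a rough
userspace model of the two sides above, just to make the interleaving
easier to talk about (the names are invented, and C11 seq_cst atomics
stand in for the kernel primitives; this is obviously not the real
code):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  writers;		/* models the per-mount write count    */
static atomic_bool remount_ro_bit;	/* models ST_REMOUNT_RO in s_state     */

/* Side (2): mnt_want_write() on a still read-write mount. */
void writer_side(void)
{
	atomic_fetch_add(&writers, 1);			/* store write count */
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb()      */
	if (atomic_load(&remount_ro_bit))		/* load s_state      */
		atomic_store(&remount_ro_bit, false);	/* clear s_state     */
}

/* Side (1): do_remount_sb() trying to go read-only. */
bool remount_side(void)
{
	atomic_store(&remount_ro_bit, true);		/* store s_state     */
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb()      */
	if (atomic_load(&writers) > 0)			/* load write count  */
		return false;				/* -EBUSY            */
	/* ... sync the filesystem ... */
	return atomic_load(&remount_ro_bit);		/* recheck s_state   */
}

The interleaving I'm worried about: (1) loads the write count while it
is still zero, (2) then increments it, sees the bit set and clears it,
but nothing prevents (1)'s final recheck of s_state from having run
before that clear becomes visible, so the remount sees the bit still
set and carries on even though a writer got in.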
The other option is to do this totally differently and put a sleeping
lock into mnt_want_write, taken only in the slow path when a remount
action is pending. This should make things simpler here, but I have no
idea whether mnt_want_write is ever called from atomic context.

Overall I don't know what to think of the direction you're taking
here. I guess it's probably viable, but OTOH if we were able to block
in mnt_want_write, then we should be able to eliminate all the races
and still make it work just by traversing the files list. (I would
like it much more if mark_files_ro could go away at the same time, but
I don't see how.)
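To sketch roughly what I mean by the sleeping-lock variant (again just
a userspace model, not real code: remount_rwsem, remount_pending and
friends are all invented names, pthread and C11 seq_cst atomics stand
in for kernel primitives, and it assumes mnt_want_write really is
allowed to sleep):

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_rwlock_t remount_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool remount_pending;	/* a remount-ro is in flight     */
static atomic_bool readonly;		/* the mount has gone read-only  */
static atomic_int  writers;		/* models the mnt write count    */

/* mnt_want_write(): lockless fast path, sleeping slow path. */
int want_write(void)
{
	for (;;) {
		atomic_fetch_add(&writers, 1);

		if (atomic_load(&remount_pending)) {
			/*
			 * Slow path: back out, sleep until the remount in
			 * flight has decided one way or the other, retry.
			 */
			atomic_fetch_sub(&writers, 1);
			pthread_rwlock_rdlock(&remount_rwsem);
			pthread_rwlock_unlock(&remount_rwsem);
			continue;
		}
		if (atomic_load(&readonly)) {
			atomic_fetch_sub(&writers, 1);
			return -EROFS;
		}
		return 0;	/* caller now holds write access */
	}
}

/* do_remount_sb() going read-only. */
int remount_ro(void)
{
	int ret = 0;

	pthread_rwlock_wrlock(&remount_rwsem);
	atomic_store(&remount_pending, true);

	if (atomic_load(&writers) > 0) {
		ret = -EBUSY;	/* existing writers, give up */
	} else {
		/*
		 * ... sync the filesystem; any new writer will either see
		 * remount_pending and back out, or see readonly later ...
		 */
		atomic_store(&readonly, true);
	}

	atomic_store(&remount_pending, false);
	pthread_rwlock_unlock(&remount_rwsem);
	return ret;
}

The fast path stays lockless, and a task only sleeps when it actually
races with a remount; since the remount holds the lock across the
sync, a writer (in this model at least) either shows up in the count
before the remount commits or ends up with -EROFS afterwards.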