From mboxrd@z Thu Jan 1 00:00:00 1970
From: Al Viro
Subject: Re: [PATCH 17/17] RCU'd vfsmounts
Date: Thu, 3 Oct 2013 20:43:51 +0100
Message-ID: <20131003194351.GK13318@ZenIV.linux.org.uk>
References: <20131003105130.GE13318@ZenIV.linux.org.uk> <20131003174439.GG13318@ZenIV.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel, Linux Kernel Mailing List
To: Linus Torvalds
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Oct 03, 2013 at 12:06:04PM -0700, Linus Torvalds wrote:
> On Thu, Oct 3, 2013 at 10:44 AM, Al Viro wrote:
> >
> > Anyway, I've done nicer variants of that protection for everything except
> > fuse (hadn't gotten around to it yet); see vfs.git#experimental now:
>
> Ok, I did a quick test, and it looks ok here, so looking good for 3.13.
>
> However, the new smp_mb() in mntput_no_expire() is quite noticeable in
> the path lookup stress-test profiles. And I'm not seeing what that
> allegedly protects against, especially if mnt_ns is NULL (ie all the
> common important cases).

In the common case ->mnt_ns is *not* NULL; that's what we get if the damn
thing is still mounted. What we need to avoid is this:

	mnt_ns non-NULL, mnt_count is 2

CPU1: umount -l				CPU2: mntput
umount_tree() clears mnt_ns
drop mount_lock.lock
namespace_unlock() calls mntput()
decrement mnt_count
see that mnt_ns is NULL
grab mount_lock.lock
check mnt_count				decrement mnt_count
					see old value of mnt_ns
					decide to bugger off
see it equal to 1 (i.e. miss decrement on CPU2)
decide to bugger off

The barrier in mntput() is there to prevent that combination, so that
either CPU2 sees mnt_ns cleared by CPU1, or CPU1 sees the mnt_count
decrement done by CPU2. Its counterpart on CPU1 is provided by the
spin_unlock/spin_lock we've done between clearing mnt_ns and checking
mnt_count.

Note that synchronize_rcu() in namespace_unlock() and rcu_read_lock()
in mntput() are irrelevant here - the latter on CPU2 might very well
have happened after the former on CPU1, so umount -l did *not* wait
for CPU2 to do anything.

Any suggestions re getting rid of that barrier?
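
FWIW, the pairing boils down to the classic store-buffering pattern: each
side does a store, a full barrier, then a load of the location the other
side stores to, and the two barriers guarantee that at least one side
observes the other's store. Below is a minimal standalone userspace model
of just that pattern using C11 atomics - the names and structure are
illustrative, not the actual fs/namespace.c code, and the seq_cst fences
stand in for smp_mb() on the mntput side and for the spin_unlock/spin_lock
pair on the umount side:

/*
 * Store-buffering model of the mntput()/umount -l pairing.
 * Illustrative only; build with: cc -O2 -pthread sb.c
 */
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>
#include <stdio.h>

static atomic_int ns_cleared;	/* models "mnt_ns has been set to NULL" */
static atomic_int count;	/* models mnt_count */
static int cpu1_saw_decrement, cpu2_saw_clear;

/* CPU1 (umount -l): clear mnt_ns, full barrier (stands in for the
 * unlock/lock of mount_lock), then look at mnt_count. */
static void *cpu1(void *unused)
{
	atomic_store_explicit(&ns_cleared, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	cpu1_saw_decrement =
		atomic_load_explicit(&count, memory_order_relaxed) != 2;
	return NULL;
}

/* CPU2 (mntput): decrement mnt_count, full barrier (stands in for
 * smp_mb()), then look at mnt_ns. */
static void *cpu2(void *unused)
{
	atomic_fetch_sub_explicit(&count, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	cpu2_saw_clear =
		atomic_load_explicit(&ns_cleared, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 100000; i++) {
		pthread_t a, b;

		atomic_store(&ns_cleared, 0);
		atomic_store(&count, 2);
		cpu1_saw_decrement = cpu2_saw_clear = 0;
		pthread_create(&a, NULL, cpu1, NULL);
		pthread_create(&b, NULL, cpu2, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		/* The outcome the barriers forbid: both sides miss the
		 * other's store, i.e. both decide to bugger off. */
		assert(cpu1_saw_decrement || cpu2_saw_clear);
	}
	puts("no double-miss observed");
	return 0;
}

The state the assert rules out - neither side seeing the other's store -
is exactly the leak in the diagram above; drop either fence and the
relaxed loads are allowed to return the stale values.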