From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Hansen
Subject: Re: [rfc][patch 1/2] mnt_want_write speedup 1
Date: Thu, 18 Dec 2008 22:54:57 -0800
Message-ID: <1229669697.17206.602.camel@nimitz>
References: <20081219061937.GA16268@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Linux Memory Management List , linux-fsdevel@vger.kernel.org
To: Nick Piggin
Return-path:
Received: from e33.co.us.ibm.com ([32.97.110.151]:34156 "EHLO e33.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751111AbYLSGy7 (ORCPT ); Fri, 19 Dec 2008 01:54:59 -0500
Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227])
	by e33.co.us.ibm.com (8.13.1/8.13.1) with ESMTP id mBJ6sEqQ018394
	for ; Thu, 18 Dec 2008 23:54:14 -0700
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v9.1) with ESMTP id mBJ6swrw187004
	for ; Thu, 18 Dec 2008 23:54:58 -0700
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1])
	by d03av03.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id mBJ6sweY021523
	for ; Thu, 18 Dec 2008 23:54:58 -0700
In-Reply-To: <20081219061937.GA16268@wotan.suse.de>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> @@ -369,24 +283,34 @@ static int mnt_make_readonly(struct vfsm
>  {
>  	int ret = 0;
>  
> -	lock_mnt_writers();
> +	spin_lock(&vfsmount_lock);
> +	mnt->mnt_flags |= MNT_WRITE_HOLD;
>  	/*
> -	 * With all the locks held, this value is stable
> +	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
> +	 * should be visible before we do.
>  	 */
> -	if (atomic_read(&mnt->__mnt_writers) > 0) {
> +	smp_mb();
> +
> +	/*
> +	 * With writers on hold, if this value is zero, then there are definitely
> +	 * no active writers (although held writers may subsequently increment
> +	 * the count, they'll have to wait, and decrement it after seeing
> +	 * MNT_READONLY).
> +	 */
> +	if (count_mnt_writers(mnt) > 0) {
>  		ret = -EBUSY;

OK, I think this is one of the big races inherent with this approach.
There's nothing in here to ensure that no one is in the middle of an
update during this code. The preempt_disable() will, of course, reduce
the window, but I think there's still a race here.

Is this where you wanted to put the synchronize_rcu()? That's a nice
touch, because *that* will ensure that no one is in the middle of an
increment here and that they will, at worst, be blocking on the
MNT_WRITE_HOLD thing.

I kinda remember going down this path a few times, but you may have
cracked the problem. Dunno. I need to stare at the code a bit more
before I'm convinced. I'm optimistic, but a bit skeptical this can
work. :)

I am really wondering where all the cost is that you're observing in
those benchmarks. Have you captured any profiles by chance?

-- Dave
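
A minimal userspace sketch of the ordering this scheme seems to depend on.
The quoted hunk only shows the mnt_make_readonly() side, so the writer side
here is an assumption: it is taken to increment its counter first and only
then look at the flags. struct mnt_model, want_write(), drop_write() and
make_readonly() are hypothetical names, C11 seq_cst atomics stand in for
smp_mb(), and a plain array stands in for the per-CPU counters. Under those
assumptions a writer is either already counted when the sum is taken, or it
sees the hold flag and waits.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define NR_CPUS        4     /* hypothetical; stands in for real per-CPU data */
#define MNT_READONLY   0x1
#define MNT_WRITE_HOLD 0x2

struct mnt_model {
	atomic_int flags;
	atomic_int writers[NR_CPUS];
};

/* Sum every counter, as count_mnt_writers() does in the quoted patch. */
static int count_writers(struct mnt_model *m)
{
	int sum = 0;

	for (int i = 0; i < NR_CPUS; i++)
		sum += atomic_load(&m->writers[i]);
	return sum;
}

/* Assumed writer side: publish the increment, then check the flags. */
static int want_write(struct mnt_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* seq_cst read-modify-write: the increment is ordered before the loads below */
	atomic_fetch_add(&m->writers[cpu], 1);

	/* Wait while a read-only transition has writers on hold. */
	while (atomic_load(&m->flags) & MNT_WRITE_HOLD)
		;

	if (atomic_load(&m->flags) & MNT_READONLY) {
		atomic_fetch_sub(&m->writers[cpu], 1);  /* back out the count */
		return -EROFS;
	}
	return 0;
}

static void drop_write(struct mnt_model *m, int cpu)
{
	atomic_fetch_sub(&m->writers[cpu], 1);
}

/* Read-only side, following the shape of the quoted hunk. */
static int make_readonly(struct mnt_model *m)
{
	int ret = 0;

	/* Store the hold flag before reading any counter (seq_cst). */
	atomic_fetch_or(&m->flags, MNT_WRITE_HOLD);

	if (count_writers(m) > 0)
		ret = -EBUSY;
	else
		atomic_fetch_or(&m->flags, MNT_READONLY);

	/* Release any held writers; they re-check MNT_READONLY. */
	atomic_fetch_and(&m->flags, ~MNT_WRITE_HOLD);
	return ret;
}
```

The sketch leaves out vfsmount_lock and preempt_disable(), so it only
illustrates the store/load ordering, not the full locking in the patch.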