linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch 1/2] fs: mnt_want_write speedup
Date: Wed, 18 Mar 2009 12:13:43 -0700	[thread overview]
Message-ID: <1237403623.8286.196.camel@nimitz> (raw)
In-Reply-To: <20090312041334.GB1893@wotan.suse.de>

On Thu, 2009-03-12 at 05:13 +0100, Nick Piggin wrote: 
> On Wed, Mar 11, 2009 at 03:11:17PM -0700, Dave Hansen wrote:
> > I'm feeling a bit better about these, although I am still honestly quite
> > afraid of the barriers.  I also didn't like all the #ifdefs much, but
> > here's some help on that.
> 
> FWIW, we have this in suse kernels because page fault performance was
> so bad compared with SLES10. mnt_want_write & co was I think the 2nd
> biggest offender for file backed mappings (after pvops). I think we're
> around parity again even with pvops.

Page faults themselves?  Which path was that from?

> Basically I think we have to improve this one way or another in mainline
> too. Is there any way to make you feel better about the barriers? More
> comments?
> 
> mnt_make_readonly()                    mnt_want_write()
> 1. mnt_flags |= MNT_WRITE_HOLD         A. mnt_writers[x]++
> 2. smp_mb()                            B. smp_mb()
> 3. count += mnt_writers[0]             C. while (mnt_flags & MNT_WRITE_HOLD) ;
>    ...                                 D. smp_rmb()
>    count += mnt_writers[N]             E. if (mnt_flags & MNT_READONLY)
> 4. if (count == 0)                     F.    mnt_writers[x]-- /* fail */
> 5.     mnt_flags |= MNT_READONLY       G. else /* success */
> 6. else /* fail */
> 7. smp_wmb()
> 8. mnt_flags &= ~MNT_WRITE_HOLD
> 
> * 2 ensures that 1 is visible before 3 is loaded
> * B ensures that A is visible before C is loaded
> * Therefore, either count != 0 at 4, or C will loop (or both)
> * If count == 0
>   * (make_readonly success)
>   * C will loop until 8
>   * D ensures E is not loaded until loop ends
>   * 7 ensures 5 is visible before 8 is
>   * Therefore E will find MNT_READONLY (want_write fail)
> * If C does not loop
>   * 4 will find count != 0 (make_readonly fail)
>   * Therefore 5 is not executed.
>   * Therefore E will not find MNT_READONLY (want_write success)
> * If count != 0 and C loops
>   * (make_readonly fail)
>   * 5 will not be executed
>   * Therefore E will not find MNT_READONLY (want_write success)

It is great to spell it out like that.  But, honestly, I think the code
and comments that are there are probably better than looking at an out
of line description like that.  

> I don't know if that helps (I should reference which statements rely
> on which). I think it shows that either one or the other only must
> succeed.
> 
> It does not illustrate how the loop in the want_write side prevents
> the sumation from getting confused by decrementing count on a different
> CPU than it was incremented, but I've commented that case in the code
> fairly well I think.

I think you mentioned a seqlock being a possibility here.  That would
slot in as a replacement for MNT_WRITE_HOLD, right?  mnt_make_readonly()
takes a seq_write, mnt_want_write() takes a seq_read and doesn't
consider MNT_READONLY valid until it gets a clear read of it.  It would
internalize all the barriers into the seqlock implementation except for
the mnt_writers[]-related ones.

As for mnt_writers, you'd need an smp_rmb() in mnt_make_readonly()
before reading the counters and an smp_wmb() after writing the counter
in mnt_want_write().

Is see that those are effectively consolidated down in your version into
a single smp_mb() at both call sites when combined with the
MNT_WRITE_HOLD barriers.

If we can get this down to two explicit calls to the barrier functions
that paired and very clear, I think I'd be much more OK with it.

> > How about this on top of what you have as a bit of a cleanup?  It gets
> > rid of all the new #ifdefs in .c files?
> > 
> > Did I miss the use of get_mnt_writers_ptr()?  I don't think I actually
> > saw it used anywhere in this pair of patches, so I've stolen it.  I
> > think gcc should compile all this new stuff down to be basically the
> > same as you had before.  The one thing I'm not horribly sure of is the
> > "out_free_devname:" label.  It shouldn't be reachable in the !SMP case. 
> > 
> > I could also consolidate the header #ifdefs into a single one if you
> > think that looks better.
> 
> I don't like the get_mnt_writers_ptr terribly. The *_mnt_writers functions
> are quite primitive and just happen to be in the .c file because they're
> private to it. The alloc/free_mnt_writers is good (they could be
> in the .c file too?).

Yeah, it could all move to the .c file.  

> Another thing I should probably do is slash away most of the crap from
> mnt_want_write in the UP case. It only needs to do a preempt_disable,
> test MNT_READONLY, increment mnt_writers (and similarly for mnt_make_readonly)

Yeah, that's true.  I probably tried to prematurely optimize it.

-- Dave


  reply	other threads:[~2009-03-18 19:13 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-10 14:37 [patch 1/2] fs: mnt_want_write speedup Nick Piggin
2009-03-10 14:38 ` [patch 2/2] fs: introduce mnt_clone_write Nick Piggin
2009-03-10 14:55   ` Matthew Wilcox
2009-03-10 15:08     ` Nick Piggin
2009-03-10 14:48 ` [patch 1/2] fs: mnt_want_write speedup Matthew Wilcox
2009-03-10 15:03   ` Nick Piggin
2009-03-10 15:31 ` Nick Piggin
2009-03-11 22:11 ` Dave Hansen
2009-03-12  4:13   ` Nick Piggin
2009-03-18 19:13     ` Dave Hansen [this message]
2009-04-02 18:22       ` Nick Piggin
2009-04-02 18:37         ` Dave Hansen
2009-04-02 20:31           ` Christoph Hellwig
2009-04-03  1:29           ` Nick Piggin
2009-04-02 18:43         ` Al Viro
2009-04-02 18:48           ` Al Viro
2009-04-02 19:08           ` Dave Hansen
2009-04-03 10:31             ` Al Viro
2009-04-03  1:31           ` Nick Piggin
2009-04-02 18:08   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1237403623.8286.196.camel@nimitz \
    --to=dave@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).