From: Nick Piggin <npiggin@suse.de>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [rfc][patch 1/2] mnt_want_write speedup 1
Date: Mon, 22 Dec 2008 05:35:26 +0100
Message-ID: <20081222043526.GC13406@wotan.suse.de>
In-Reply-To: <1229700721.17206.634.camel@nimitz>

On Fri, Dec 19, 2008 at 07:32:01AM -0800, Dave Hansen wrote:
> On Fri, 2008-12-19 at 08:03 +0100, Nick Piggin wrote:
> > MNT_WRITE_HOLD is set, so any writer that has already made it past
> > the MNT_WANT_WRITE loop will have its count visible here. Any writer
> > that has not made it past that loop will wait until the slowpath
> > completes and then the fastpath will go on to check whether the
> > mount is still writeable.
> 
> Ahh, got it.  I'm slowly absorbing the barriers.  Not the normal way I
> code.
> 
> I thought there was another race with MNT_WRITE_HOLD since mnt_flags
> isn't really managed atomically.  But, by only modifying with the
> vfsmount_lock, I think it is OK.
> 
> I also wondered if there was a possibility of getting a spurious -EBUSY
> when remounting r/w->r/o.  But, that turned out to just happen when the
> fs was *already* r/o.  So that looks good.
> 
> While this has cleared out a huge amount of complexity, I can't stop
> wondering if this could be done with a wee bit more "normal" operations.
> I'm pretty sure I couldn't have come up with this by myself, and I'm a
> bit worried that I wouldn't be able to find a race in it if one reared
> its ugly head.  

It could be done with a seqcounter I think, but that adds more branches,
variables, and barriers to this fastpath. Perhaps I should simply add
a bit more documentation.
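
For readers following along, here is a rough userspace sketch of the ordering
described in the quoted text above, written with C11 atomics rather than the
kernel's barriers and per-cpu primitives. It is not the code from the patch:
the sk_* names, the fixed SKETCH_NCPUS array, and the caller-supplied cpu
index are all assumptions made for illustration, and the slowpath assumes the
caller already holds whatever lock serialises flag updates (vfsmount_lock in
the discussion).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define SKETCH_NCPUS  4       /* hypothetical fixed CPU count */
#define SK_READONLY   0x1u    /* stands in for MNT_READONLY */
#define SK_WRITE_HOLD 0x2u    /* stands in for MNT_WRITE_HOLD */

struct sk_mount {
	atomic_uint flags;
	atomic_int writers[SKETCH_NCPUS];   /* one writer count per CPU */
};

/* Fastpath: take a write reference on behalf of `cpu`. */
static int sk_want_write(struct sk_mount *m, unsigned cpu)
{
	assert(cpu < SKETCH_NCPUS);

	/* Publish our count first (seq_cst, so it is ordered before the
	 * flag reads below). */
	atomic_fetch_add(&m->writers[cpu], 1);

	/* Wait for any slowpath in progress to finish. */
	while (atomic_load(&m->flags) & SK_WRITE_HOLD)
		;

	/* Only now check whether the mount is still writeable. */
	if (atomic_load(&m->flags) & SK_READONLY) {
		atomic_fetch_sub(&m->writers[cpu], 1);
		return -EROFS;
	}
	return 0;
}

/* Drop a write reference; may run on a different CPU than the take,
 * so individual slots can go negative while the sum stays correct. */
static void sk_drop_write(struct sk_mount *m, unsigned cpu)
{
	assert(cpu < SKETCH_NCPUS);
	atomic_fetch_sub(&m->writers[cpu], 1);
}

/* Slowpath: try to make the mount read-only.
 * Assumes the caller holds the lock that serialises flag updates. */
static int sk_make_readonly(struct sk_mount *m)
{
	int sum = 0, ret = 0;

	/* Stall new writers, then sum the counts of those already past
	 * the hold loop. */
	atomic_fetch_or(&m->flags, SK_WRITE_HOLD);
	for (unsigned i = 0; i < SKETCH_NCPUS; i++)
		sum += atomic_load(&m->writers[i]);

	if (sum > 0)
		ret = -EBUSY;
	else
		atomic_fetch_or(&m->flags, SK_READONLY);

	/* Release the stalled writers; they re-check SK_READONLY. */
	atomic_fetch_and(&m->flags, ~SK_WRITE_HOLD);
	return ret;
}
```

A writer that incremented its count before the hold flag was set is seen by
the sum; a writer that arrives later spins until the hold is cleared and then
observes the final read-only state.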

 
> Is there a real good reason to allocate the percpu counters dynamically?
> Might as well stick them in the vfsmount and let the one
> kmem_cache_zalloc() in alloc_vfsmnt() do a bit larger of an allocation.
> Did you think that was going to bloat it to a compound allocation or
> something?  I hate the #ifdefs. :)

Distros want to ship big NR_CPUS kernels and have them run reasonably on
small num_possible_cpus() systems. But also, it would help to avoid
cacheline bouncing from false sharing (allocpercpu.c code can also mess
this up for small objects like these counters, but that's a problem
with the allocpercpu code which should be fixed anyway).
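
To make the size argument concrete, here is a small userspace sketch that
compares an array embedded at a compile-time maximum with a run-time
allocation sized by the number of possible CPUs. None of this is kernel code:
PC_NR_CPUS, PC_CACHELINE, and the pc_* names are hypothetical, and padding
each counter to an assumed 64-byte cache line is only a stand-in for what a
per-cpu allocator would give you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PC_NR_CPUS   4096   /* hypothetical compile-time maximum */
#define PC_CACHELINE 64     /* assumed cache line size in bytes */

/* Embedded layout: every mount pays for PC_NR_CPUS counters, and
 * neighbouring CPUs' counters share cache lines. */
struct pc_mount_embedded {
	int writers[PC_NR_CPUS];
};

/* One counter per cache line, so two CPUs never share a line. */
struct pc_padded_counter {
	_Alignas(PC_CACHELINE) int count;
};

/* Dynamic layout: the counter array is sized at run time. */
struct pc_mount_dynamic {
	size_t ncpus;
	struct pc_padded_counter *writers;
};

/* Allocate one padded counter per possible CPU. Returns 0 on success. */
static int pc_mount_init(struct pc_mount_dynamic *m, size_t possible_cpus)
{
	assert(possible_cpus > 0);

	/* The size is a multiple of the alignment, as aligned_alloc requires. */
	m->writers = aligned_alloc(PC_CACHELINE,
				   possible_cpus * sizeof(*m->writers));
	if (!m->writers)
		return -1;

	for (size_t i = 0; i < possible_cpus; i++)
		m->writers[i].count = 0;
	m->ncpus = possible_cpus;
	return 0;
}

/* Bytes used by the counters of one dynamically sized mount. */
static size_t pc_mount_counter_bytes(const struct pc_mount_dynamic *m)
{
	return m->ncpus * sizeof(*m->writers);
}

static void pc_mount_destroy(struct pc_mount_dynamic *m)
{
	free(m->writers);
	m->writers = NULL;
	m->ncpus = 0;
}
```

With these assumed numbers, a 4-CPU machine needs 4 cache lines of counters
per mount in the dynamic layout, while the embedded layout reserves
PC_NR_CPUS ints regardless of how many CPUs are actually possible.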



