From: Nick Piggin <npiggin@kernel.dk>
To: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@ZenIV.linux.org.uk>,
Stephen Rothwell <sfr@canb.auug.org.au>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] fs: scale vfsmount refcount (was Re: rcu-walk and dcache scaling tree update and status)
Date: Mon, 13 Dec 2010 14:31:10 +1100
Message-ID: <20101213033110.GA7898@amd>
In-Reply-To: <20101213024217.GC6522@amd>

On Mon, Dec 13, 2010 at 01:42:17PM +1100, Nick Piggin wrote:
> On Mon, Dec 13, 2010 at 01:37:33PM +1100, Nick Piggin wrote:
> > Final note:
> > You won't be able to reproduce the parallel path walk scalability
> > numbers that I've posted, because the vfsmount refcounting scalability
> > patch is not included. I have a new idea for that now, so I'll be asking
> > for comments with that soon.
>
> Here is the patch I've been using, which works but has the problem
> described in the changelog. But it works nicely for testing.
>
> As I said, I have a promising approach to solving the problem.
>
> fs: scale mntget/mntput
[...]
> [Note: this is not for merging. Un-attached operation (lazy umount) may not be
> uncommon and will be slowed down and actually have worse scalability after
> this patch. I need to think about how to do fast refcounting with unattached
> mounts.]

So the problem this patch tries to fix is vfsmount refcount scalability.
We need to take a ref for every successful path lookup, and lookups
often go to the same mountpoint.

(Yes, this little bouncing atomic hurts, badly, even on my small 2s12c
tightly connected system on the parallel git diff workload -- because
there are other bouncing kernel cachelines in this workload).

The fundamental difficulty is that a simple refcount can never be SMP
scalable, because dropping the ref requires that we check whether we
hold the last reference (which implies communicating with other CPUs
that might have taken references).
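
To make the bottleneck concrete, here is a rough userspace sketch (not
taken from the patch) of a simple shared refcount using C11 atomics. The
names mnt_model, mnt_model_get and mnt_model_put are hypothetical
stand-ins, not kernel interfaces. Every get and put modifies the same
counter, and put has to learn whether it hit zero, which is what forces
the cacheline to bounce between CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount; NOT the kernel's struct vfsmount. */
struct mnt_model {
	atomic_int count;	/* one shared counter touched by every get/put */
};

static inline void mnt_model_init(struct mnt_model *m)
{
	atomic_init(&m->count, 1);	/* the creator holds the first reference */
}

static inline void mnt_model_get(struct mnt_model *m)
{
	atomic_fetch_add(&m->count, 1);
}

/* Returns true when the caller has just dropped the last reference. */
static inline bool mnt_model_put(struct mnt_model *m)
{
	/* fetch_sub returns the previous value, so 1 means we reached zero */
	int prev = atomic_fetch_sub(&m->count, 1);

	assert(prev > 0);	/* a put without a matching get is a bug */
	return prev == 1;
}
```

The caller that sees mnt_model_put() return true is the one responsible
for freeing the object.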

We can make them scalable by keeping a local count, and checking the
global sum less frequently. Some possibilities:

- avoid checking the global sum while the vfsmount is mounted, because
  the mount contributes to the refcount (that is what this patch does,
  but it kills performance inside a lazy umounted subtree).

- check the global sum once every time interval (this would delay mount
  and sb garbage collection, so it's probably a showstopper).

- check the global sum only if the local sum goes to 0 (this is
  difficult with vfsmounts because the 'get' and the 'put' can happen on
  different CPUs, so we'd need to have a per-thread refcount, or carry
  around the CPU number with the refcount; both get horribly ugly, it
  turns out).
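
The first option above can be modelled roughly as follows. This is a
single-threaded userspace sketch under stated assumptions, not the patch
itself: pcpu_mnt_model, pcpu_get, pcpu_put and NR_CPUS_MODEL are
hypothetical names, the "cpu" argument stands in for whatever CPU the
caller runs on, and the locking a real implementation would need against
a concurrent umount is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for the model */

struct pcpu_mnt_model {
	long count[NR_CPUS_MODEL];	/* per-CPU incs minus decs; a slot may go negative */
	bool attached;			/* while true, the mount itself pins the object */
};

/* Slow path: has to read every CPU's slot. */
static inline long pcpu_sum(const struct pcpu_mnt_model *m)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += m->count[cpu];
	return sum;
}

static inline void pcpu_get(struct pcpu_mnt_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	m->count[cpu]++;		/* touches only the local slot */
}

/* Returns true when no references remain and the object can be freed. */
static inline bool pcpu_put(struct pcpu_mnt_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	m->count[cpu]--;
	if (m->attached)
		return false;		/* fast path: still mounted, skip the sum */
	return pcpu_sum(m) == 0;	/* detached: every put pays for the full sum */
}
```

Note that a get on one CPU and the matching put on another leave the
individual slots at +1 and -1; only the sum is meaningful. Once
"attached" is cleared, every put takes the slow path, which is the lazy
umount slowdown described above.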

My proposal is a variant / generalisation of the first idea, which is to
have "long" refcounts. Normal refcounts will be a per-cpu difference of
incs and decs, but dropping a reference will not have to check the
global sum while "long" refcounts are elevated. If the mount itself
holds a long refcount, that is essentially what the current patch does.

But then I would also have cwd take a long refcount, which allows
detached operation to remain fast while there are processes working
inside the detached namespace.
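
A rough single-threaded model of the long refcount idea might look like
the sketch below. It is only an illustration under assumptions: the
names longref_model, lr_get, lr_put, lr_get_long and lr_put_long are
hypothetical, and the model ignores the race between the fast put check
and long refcount changes, which is exactly the part that still needs
working out.

```c
#include <assert.h>
#include <stdbool.h>

#define LR_NR_CPUS 4		/* hypothetical CPU count for the model */

struct longref_model {
	long count[LR_NR_CPUS];	/* normal refs: per-CPU incs minus decs */
	int long_refs;		/* long refs, e.g. one for the mount, one per cwd */
};

static inline long lr_sum(const struct longref_model *m)
{
	long sum = 0;

	for (int cpu = 0; cpu < LR_NR_CPUS; cpu++)
		sum += m->count[cpu];
	return sum;
}

static inline void lr_get(struct longref_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < LR_NR_CPUS);
	m->count[cpu]++;
}

/* Returns true when the last reference of any kind has been dropped. */
static inline bool lr_put(struct longref_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < LR_NR_CPUS);
	m->count[cpu]--;
	if (m->long_refs > 0)
		return false;	/* fast path: a long ref pins the object */
	return lr_sum(m) == 0;
}

static inline void lr_get_long(struct longref_model *m)
{
	m->long_refs++;		/* rare, heavier operation (mount, chdir, fork) */
}

/* Returns true when the last reference of any kind has been dropped. */
static inline bool lr_put_long(struct longref_model *m)
{
	assert(m->long_refs > 0);
	m->long_refs--;
	return m->long_refs == 0 && lr_sum(m) == 0;
}
```

In this model a lazy umount drops the mount's long ref, but as long as
some process still holds a cwd long ref inside the detached tree,
lr_put() stays on the fast path.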

Details of locking aren't completely worked out -- it's a bit trickier
because umount can be much heavier than fork() or chdir(), so there are
some difficulties in making long refcount operations faster (the
problem is remaining race-free versus the fast mntput check, but I
think a seqcount to go with the long refcount should do the trick).