From: Jens Axboe <jens.axboe@oracle.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org,
Ravikiran G Thirumalai <kiran@scalex86.org>,
Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [rfc][patch] store-free path walking
Date: Wed, 7 Oct 2009 11:56:57 +0200 [thread overview]
Message-ID: <20091007095657.GB8703@kernel.dk> (raw)
In-Reply-To: <20091007085849.GN30316@wotan.suse.de>
On Wed, Oct 07 2009, Nick Piggin wrote:
> On Tue, Oct 06, 2009 at 02:49:41PM +0200, Jens Axboe wrote:
> > On Tue, Oct 06 2009, Nick Piggin wrote:
> > > It's a copout, but you could try running multiple dbenches under
> > > different working directories (or actually, IIRC dbench does root
> > > based path lookups so maybe that won't help you much).
> >
> > Yeah, it's hitting dentry->d_lock pretty hard so basically
> > spin-serialized at that point.
> >
> > > > Anyway, below are the results. Seem very stable.
> > > >
> > > > throughput
> > > > ------------------------------------------------
> > > > 2.6.32-rc3-git | 561.218 MB/sec
> > > > 2.6.32-rc3-git+patch | 627.022 MB/sec
> > >
> > > Well it's good to see you got some improvement.
> >
> > Yes, it's an improvement though the results are still pretty abysmal :-)
>
> OK, I have a really basic patch that does store-free path walking
> (except on the final element). dbench is pretty nasty still because
> it seems to do a lot of stupid things like reading from /proc/mounts
> all the time.
>
> Actually it isn't quite store-free because it still takes mnt ref
> (which is hard to avoid, but at least it is a per-cpu data). So
> this patch applies on top of my recent patchset. At least it does
> not store to the dentries.
>
> Basically it is holding rcu_read_lock for the entire walk, it uses
> inode RCU from my earlier patches to check inode permissions, and
> it bails out at the slightest sign of trouble :) (actually I think
> I have got it to walk mountpoints which is nice).
>
> The seqlock in the dentry is for getting consistent name,len pointer,
> and also not giving a false positive if a rename has partially
> overwritten the name string (false negatives are always fine in the
> lock free lookup path but false positives are not). Possibly we
> could make do with a per-sb seqlock for this (or just rename_lock).
>
> I have duplicated most of the lookup code, altering it to the lock
> free version, which returns -EAGAIN if it gets in trouble and the
> regular path walk starts up. I don't know if this is the best way
> to go... it's fairly easy but a bit ugly. But complexifying the
> existing path walk any more would not be nice either. It might be
> nice to try locking the dentry that we're having trouble with and
> continuing from there, rather than redoing the whole walk with
> locks...
>
> Anyway, this is the basics working for now, microbenchmark shows
> same-cwd lookups scale linearly now too. We can probably slowly
> tackle more cases if they come up as being important, simply by
> auditing filesystems etc.
throughput
------------------------------------------------
2.6.32-rc3-git | 561.218 MB/sec
2.6.32-rc3-git+patch | 627.022 MB/sec
2.6.32-rc3-git+patch+inc| 969.761 MB/sec
So better, quite a bit too. Latencies are not listed here, but they are
also a lot better. Perf top still shows ~95% spinlock time. I did a
shorter run (the above are full 600 second runs) of 60s with profiling
and the full 64 clients, this time using -a as well (which generated
9.4GB of trace data!). The top is now:
_spin_lock (92%)
path_get (39%)
d_path (59%)
path_init (26%)
path_walk (13%)
dput (37%)
path_put (86%)
link_path_walk (13%)
__d_path (23%)
And finally, this:
> + if (!nd->dentry->d_inode) {
> + spin_unlock(&nd->path.dentry->d_lock);
> + return -EAGAIN;
> + }
doesn't compile, it wants to be
if (!nd->path.dentry->d_inode) {
--
Jens Axboe
next prev parent reply other threads:[~2009-10-07 9:57 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-06 6:49 Latest vfs scalability patch Nick Piggin
2009-10-06 10:14 ` Jens Axboe
2009-10-06 10:26 ` Jens Axboe
2009-10-06 11:10 ` Peter Zijlstra
2009-10-06 12:51 ` Jens Axboe
2009-10-06 12:26 ` Nick Piggin
2009-10-06 12:49 ` Jens Axboe
2009-10-07 8:58 ` [rfc][patch] store-free path walking Nick Piggin
2009-10-07 9:56 ` Jens Axboe [this message]
2009-10-07 10:10 ` Nick Piggin
2009-10-12 3:58 ` Nick Piggin
2009-10-12 5:59 ` Nick Piggin
2009-10-12 8:20 ` Jens Axboe
2009-10-12 11:00 ` Jens Axboe
2009-10-13 1:26 ` Christoph Hellwig
2009-10-13 1:52 ` Nick Piggin
2009-10-07 14:56 ` Linus Torvalds
2009-10-07 16:27 ` Linus Torvalds
2009-10-07 16:46 ` Nick Piggin
2009-10-07 19:25 ` Linus Torvalds
2009-10-07 20:34 ` Andi Kleen
2009-10-07 20:51 ` Linus Torvalds
2009-10-07 21:06 ` Andi Kleen
2009-10-07 21:20 ` Linus Torvalds
2009-10-07 21:57 ` Linus Torvalds
2009-10-07 22:22 ` Andi Kleen
2009-10-08 7:39 ` Nick Piggin
2009-10-09 17:53 ` Andi Kleen
2009-10-08 13:12 ` Denys Vlasenko
2009-10-09 7:47 ` Nick Piggin
2009-10-09 17:49 ` Andi Kleen
2009-10-07 16:29 ` Nick Piggin
2009-10-08 12:36 ` Nick Piggin
2009-10-08 12:57 ` Jens Axboe
2009-10-08 13:22 ` Nick Piggin
2009-10-08 13:30 ` Jens Axboe
2009-10-08 18:00 ` Peter Zijlstra
2009-10-09 4:04 ` Nick Piggin
2009-10-09 8:54 ` Jens Axboe
2009-10-09 9:51 ` Jens Axboe
2009-10-09 10:02 ` Nick Piggin
2009-10-09 10:08 ` Jens Axboe
2009-10-09 10:07 ` Nick Piggin
2009-10-09 3:50 ` Nick Piggin
2009-10-09 6:15 ` David Miller
2009-10-09 10:40 ` Nick Piggin
2009-10-09 11:09 ` Jens Axboe
2009-10-09 10:44 ` Nick Piggin
2009-10-09 10:48 ` Jens Axboe
2009-10-09 23:16 ` Paul E. McKenney
2009-10-15 10:08 ` Latest vfs scalability patch Anton Blanchard
2009-10-15 10:39 ` Nick Piggin
2009-10-15 10:46 ` Anton Blanchard
2009-10-15 10:53 ` Nick Piggin
2009-10-15 11:23 ` Anton Blanchard
2009-10-15 11:41 ` Nick Piggin
2009-10-15 11:48 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091007095657.GB8703@kernel.dk \
--to=jens.axboe@oracle.com \
--cc=kiran@scalex86.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=npiggin@suse.de \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.