linux-fsdevel.vger.kernel.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Ravikiran G Thirumalai <kiran@scalex86.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [rfc][patch] store-free path walking
Date: Wed, 7 Oct 2009 18:46:22 +0200
Message-ID: <20091007164622.GX30316@wotan.suse.de>
In-Reply-To: <alpine.LFD.2.01.0910070911080.3432@localhost.localdomain>

On Wed, Oct 07, 2009 at 09:27:59AM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 7 Oct 2009, Linus Torvalds wrote:
> > 
> > Hmm. Regardless, this very much does look like what I envisioned, apart 
> > from details like that. And maybe your per-dentry seqlock is the right 
> > choice. On x86, it certainly doesn't have the performance issues it could 
> > have in other places.
> 
> Actually, if we really want to do the per-dentry thing, then we should 
> change it a bit. Maybe rather than using a seqlock data structure (which 
> is really just an unsigned counter and a spinlock), we could do just the 
> unsigned counter, and use the d_lock as the spinlock for the sequence 
> lock.
> 
> The hackiest way to do that would be to get rid of d_lock entirely, 
> replace it with d_seqlock, and then just do
> 
> 	#define d_lock d_seqlock.lock
> 
> instead (but the dentry structure may well have layout issues that make 
> that not work very well - we're mixing pointers and 'int'-sized things 
> and need to pack them well etc).
> 
> That would cut down the seqlock memory costs from 8 bytes (or more - just 
> the spinlock itself is currently 8 bytes on ia64, so on ia64 the seqlock 
> is actually 16 bytes, not to mention all the spinlock debugging cases) to 
> just four bytes.

Oh I did that, used a "seqcount" which is the bare sequence counter
(and update it while holding d_lock).
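
A minimal userspace sketch of that arrangement, assuming C11 atomics and
a pthread mutex standing in for d_lock. Every toy_* name below is
hypothetical and is not the kernel's struct dentry or seqcount API; it
only shows a bare counter bumped by a writer that already holds the
per-object lock, with readers retrying instead of taking the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a dentry; NOT the kernel's struct dentry. */
struct toy_dentry {
	pthread_mutex_t d_lock;	/* plays the role of d_lock */
	atomic_uint d_seq;	/* bare sequence counter, no second lock */
	atomic_int name_len;	/* fields a rename would change */
	atomic_int parent_id;
};

/* Writer: the counter is only ever updated while d_lock is held. */
static void toy_rename(struct toy_dentry *d, int len, int parent)
{
	pthread_mutex_lock(&d->d_lock);
	unsigned s = atomic_load(&d->d_seq);
	assert((s & 1) == 0);			/* lock holder sees an even count */
	atomic_store(&d->d_seq, s + 1);		/* odd: update in progress */
	atomic_store(&d->name_len, len);
	atomic_store(&d->parent_id, parent);
	atomic_store(&d->d_seq, s + 2);		/* even again: update done */
	pthread_mutex_unlock(&d->d_lock);
}

/* Reader: takes no lock; retries if a writer was active or intervened. */
static void toy_lookup(struct toy_dentry *d, int *len, int *parent)
{
	unsigned s;

	do {
		while ((s = atomic_load(&d->d_seq)) & 1)
			;			/* writer active, wait */
		*len = atomic_load(&d->name_len);
		*parent = atomic_load(&d->parent_id);
	} while (atomic_load(&d->d_seq) != s);
}
```

The sketch uses sequentially consistent atomics throughout for
simplicity; a real implementation would use weaker barriers, which is
where the per-architecture cost mentioned above comes from.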

Yes it still has packing issues, although I think I can get rid of
d_mounted so it will then pack nicely and size won't change. (just
have a flag if we are mounted at least once, and just store the
count elsewhere for mountpoints -- or even just search the mount
hash on each umount to see if anything is left mounted on it)

 
> However, I still suspect we could do things entirely without the seqlock. 
> The outer seqlock will handle the "couldn't find it" case, and I've got 
> the strongest feeling that we should be able to just use some basic memory 
> ordering on the dentry hash to make the inner seqlock unnecessary (ie 
> make sure that either we don't see the old entry at all, or that we can 
> guarantee that it won't trigger a successful compare while the rename is 
> in process because we set the dentry name length to zero).

Well, I would be all for improving things of course. But keep in
mind we already do the rename_lock seqcount for each d_lookup,
so the lock-free lookup path is only doing extra seqlocks on dcache
hash collision cases.
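
The outer check can be sketched roughly as follows, again in userspace C
with C11 atomics. The toy_* names and the flat name table are
hypothetical and only stand in for rename_lock and the dcache hash; a
single writer is assumed, where the kernel would serialize renames with
a lock. The point shown is that a miss is only believed if the global
sequence count did not change during the search, while a hit ends the
loop immediately.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4

static atomic_uint toy_rename_seq;		/* stand-in for rename_lock's counter */
static _Atomic(const char *) toy_names[TOY_SLOTS];	/* stand-in for the hash */

/* Plain scan with no sequence check, like an inner lookup helper. */
static int toy_raw_lookup(const char *name)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		const char *n = atomic_load(&toy_names[i]);

		if (n && strcmp(n, name) == 0)
			return i;
	}
	return -1;
}

/* Outer lookup: a miss is retried if a rename ran concurrently. */
static int toy_d_lookup(const char *name)
{
	unsigned seq;
	int slot;

	do {
		while ((seq = atomic_load(&toy_rename_seq)) & 1)
			;		/* rename in progress, wait */
		slot = toy_raw_lookup(name);
		if (slot >= 0)
			break;		/* found: no need to recheck the count */
	} while (atomic_load(&toy_rename_seq) != seq);

	return slot;
}

/* Single writer assumed: move a name from one slot to another. */
static void toy_rename(int from, int to, const char *name)
{
	atomic_fetch_add(&toy_rename_seq, 1);	/* odd: rename in progress */
	atomic_store(&toy_names[from], (const char *)NULL);
	atomic_store(&toy_names[to], name);
	atomic_fetch_add(&toy_rename_seq, 1);	/* even: rename done */
}
```

The toy trusts any hit as-is. Whether a hit also needs its own
per-entry validation is exactly the inner seqcount question discussed
above, and the sketch does not try to answer it.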

But I do agree it needs more thought. I'll try to get the powerpc
guys interested in running tests for us tomorrow :)


Thread overview: 57+ messages
2009-10-06  6:49 Latest vfs scalability patch Nick Piggin
2009-10-06 10:14 ` Jens Axboe
2009-10-06 10:26   ` Jens Axboe
2009-10-06 11:10     ` Peter Zijlstra
2009-10-06 12:51       ` Jens Axboe
2009-10-06 12:26   ` Nick Piggin
2009-10-06 12:49     ` Jens Axboe
2009-10-07  8:58       ` [rfc][patch] store-free path walking Nick Piggin
2009-10-07  9:56         ` Jens Axboe
2009-10-07 10:10           ` Nick Piggin
2009-10-12  3:58           ` Nick Piggin
2009-10-12  5:59             ` Nick Piggin
2009-10-12  8:20               ` Jens Axboe
2009-10-12 11:00                 ` Jens Axboe
2009-10-13  1:26             ` Christoph Hellwig
2009-10-13  1:52               ` Nick Piggin
2009-10-07 14:56         ` Linus Torvalds
2009-10-07 16:27           ` Linus Torvalds
2009-10-07 16:46             ` Nick Piggin [this message]
2009-10-07 19:25               ` Linus Torvalds
2009-10-07 20:34                 ` Andi Kleen
2009-10-07 20:51                   ` Linus Torvalds
2009-10-07 21:06                     ` Andi Kleen
2009-10-07 21:20                       ` Linus Torvalds
2009-10-07 21:57                         ` Linus Torvalds
2009-10-07 22:22                           ` Andi Kleen
2009-10-08  7:39                             ` Nick Piggin
2009-10-09 17:53                               ` Andi Kleen
2009-10-08 13:12                           ` Denys Vlasenko
2009-10-09  7:47                             ` Nick Piggin
2009-10-09 17:49                             ` Andi Kleen
2009-10-07 16:29           ` Nick Piggin
2009-10-08 12:36           ` Nick Piggin
2009-10-08 12:57             ` Jens Axboe
2009-10-08 13:22               ` Nick Piggin
2009-10-08 13:30                 ` Jens Axboe
2009-10-08 18:00                   ` Peter Zijlstra
2009-10-09  4:04                     ` Nick Piggin
2009-10-09  8:54                 ` Jens Axboe
2009-10-09  9:51                   ` Jens Axboe
2009-10-09 10:02                     ` Nick Piggin
2009-10-09 10:08                       ` Jens Axboe
2009-10-09 10:07                   ` Nick Piggin
2009-10-09  3:50             ` Nick Piggin
2009-10-09  6:15               ` David Miller
2009-10-09 10:40                 ` Nick Piggin
2009-10-09 11:09                   ` Jens Axboe
2009-10-09 10:44                 ` Nick Piggin
2009-10-09 10:48                   ` Jens Axboe
2009-10-09 23:16         ` Paul E. McKenney
2009-10-15 10:08 ` Latest vfs scalability patch Anton Blanchard
2009-10-15 10:39   ` Nick Piggin
2009-10-15 10:46     ` Anton Blanchard
2009-10-15 10:53   ` Nick Piggin
2009-10-15 11:23     ` Anton Blanchard
2009-10-15 11:41       ` Nick Piggin
2009-10-15 11:48         ` Nick Piggin
