public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>,
	linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [patches] seqlock: add barrier-less special cases for seqcounts
Date: Sat, 13 Nov 2010 10:06:50 +1100	[thread overview]
Message-ID: <20101112230650.GA3317@amd> (raw)
In-Reply-To: <AANLkTinAY=jbtEEFa0XgYiHs8Lj21Q58FhPpU+TOtWnf@mail.gmail.com>

On Fri, Nov 12, 2010 at 08:39:17AM -0800, Linus Torvalds wrote:
> On Thu, Nov 11, 2010 at 12:00 AM, Nick Piggin <npiggin@kernel.dk> wrote:
> > Add branch annotations for seqlock read fastpath, and introduce
> > __read_seqcount_begin and __read_seqcount_end functions, that can avoid
> > the smp_rmb() if it is provided with some other barrier. Read barriers
> > have non trivial cost on many architectures.
> >
> > These will be used by store-free path walking algorithm, where
> > performance is critical and seqlocks are widely used.
> 
> A couple of questions:
> 
>  - what are the barriers in question? IOW, describe some normal use.

OK, anything that provides smp_rmb() or stronger. I really prefer not
to have a "normal" usage for these things, only *really* carefully
controlled and critical parts. I came up with these after doing some
testing on a POWER7 to really shave off cycles.

In the case of rcu-walk, we basically have a chain of dentries to walk
down, and so we need to take seqlocks as we go. So the pattern goes:

dentry = cwd;
seq = read_seqlock(&dentry->d_seq);
/* do path walk */
child = d_lookup(dentry, name);
seq2 = read_seqlock(&child->d_seq);
if (read_seqretry(&dentry->d_seq, seq))
  /* bail out */

So we have to have these inter-linked chain of seqlocks covering the
walk. As such, the smp_rmb tends to get repeated in each one, wheras
we don't actually have to have the smp_rmb for the child issued until
after we verify the parent's sequence (because we don't load anything
from the child until after that).

I really don't anticipate many other users, but perhaps similar case
like walking down nodes of a tree or something.


>  - do we really want the "repeat until seqlock is even" code in the
> __read_seqcount_begin() code for those kinds of internal cases?
> 
> That second one is very much a question for the use-case like the
> pathname walk where you have a fall-back that uses "real" locking
> rather than the optimistic sequence locks. I have a suspicion that if
> seq_locks are used as an "optimistic lockless path with a locking
> fallback", then if we see an odd value at the beginning we should
> consider it a hint that the sequence lock is contended and the
> optimistic path should be aborted early.
> 
> In other words, I kind of suspect that anybody that wants to use some
> internal sequence lock function like __read_seqcount_begin() would
> also want to do its own decision about what happens when the seqlock
> is already in the middle of having an active writer.
> 
> So the interface seems a bit broken: if we really want to expose these
> kinds of internal helper functions, then I suspect not only the
> smp_rmb(), but also the whole "loop until even" should be in the
> normal "read_seqcount_begin()" function, and __read_seqcount_begin()
> would _literally_ just do the single sequence counter access.
> 
> I dunno. Just a gut feel. Added Al, Ingo and Thomas to the Cc - the
> whole "loop in begin" was added by Ingo and Thomas a few years ago to
> avoid a live-lock, but that live-lock issue really isn't an issue if
> you end up falling back on a locking algorithm and have a "early
> failure" case for the __read_seqcount_begin() the same way we have the
> final failure case for [__]read_seqcount_retry().

Possibly, you're right. Now the fallback case is obviously suboptimal
and heavyweight, so we do want to avoid it if we can. Also not having
an error to handle in seqcount_begin is just one less thing to worry
about.

I mean, we can just fall out immediately if we want to, but is there
much advantage in doing so? The write side critical sections on these
things are very small -- pretty much only when the ->d_inode goes
away or ->d_name changes.


  reply	other threads:[~2010-11-12 23:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-11  8:00 [patches] seqlock: add barrier-less special cases for seqcounts Nick Piggin
2010-11-12 16:39 ` Linus Torvalds
2010-11-12 23:06   ` Nick Piggin [this message]
2010-11-12 23:52     ` Linus Torvalds
2010-11-15  3:49       ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101112230650.GA3317@amd \
    --to=npiggin@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox