From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Rik van Riel <riel@redhat.com>,
rostedt@goodmis.org, aquini@redhat.com,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Michel Lespinasse <walken@google.com>,
linux-tip-commits@vger.kernel.org
Subject: Re: [tip:core/locking] x86/smp: Move waiting on contended ticket lock out of line
Date: Fri, 15 Feb 2013 17:48:14 +1100
Message-ID: <1360910894.22260.68.camel@pasglop>
In-Reply-To: <CA+55aFwZ2wN=9qTNNmqjPfK=ZwS=kx6HAAFj_cWLnJDbK1M0_Q@mail.gmail.com>
On Wed, 2013-02-13 at 10:30 -0800, Linus Torvalds wrote:
> On Wed, Feb 13, 2013 at 8:20 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Adding an external function call is *horrible*, and you might almost
> > as well just uninline the spinlock entirely if you do this. It means
> > that all the small callers now have their registers trashed, whether
> > the unlikely function call is taken or not, and now leaf functions
> > aren't leaves any more.
>
> Btw, we've had things like this before, and I wonder if we could
> perhaps introduce the notion of a "light-weight call" for fastpath
> code that calls unlikely slow-path code..
>
> In particular, see the out-of-line code used by the rwlocks etc (see
> "arch_read_lock()" for an example in arch/x86/include/asm/spinlock.h
> and arch/x86/lib/rwlock.S), where we end up calling things from inline
> asm, with one big reason being exactly the fact that a "normal" C call
> has such horribly detrimental effects on the caller.

This would be nice. I've been wanting to do something like that for a
while, in fact... On archs like powerpc, we lose 11 GPRs on a function
call, which ends up causing a lot of stupid stack spills for cases that
are often corner cases (error cases etc. in inlines).
> Sadly, gcc doesn't seem to allow specifying which registers are
> clobbered in any easy way, which means that both the caller and the
> callee *both* tend to need to have some asm interface. So we bothered
> to do this for __read_lock_failed, but we have *not* bothered to do
> the same for the otherwise very similar __mutex_fastpath_lock() case,
> for example.
>
> So for rwlocks, we actually get very nice code generation with small
> leaf functions not necessarily needing stack frames, but for mutexes
> we mark a lot of registers "unnecessarily" clobbered in the caller,
> exactly because we do *not* do that asm interface for the callee. So
> we have to clobber all the standard callee-clobbered registers, which
> is really sad, and callers almost always need a stack frame, because
> if they have any data live at all across the mutex, they have to save
> it in some register that is callee-saved - which basically means that
> the function has to have that stack frame in order to save its *own*
> callee-saved registers.
>
> So it means that we penalize the fastpath because the slow-path can't
> be bothered to do the extra register saving, unless we go to the
> lengths we went to for the rwlocks, and build a wrapper in asm to save
> the extra registers in the cold path.
>
> Maybe we could introduce some helpers to create these kinds of asm
> wrappers to do this? Something that would allow us to say: "this
> function only clobbers a minimal set of registers and you can call it
> from asm and only mark %rax/rcx/rdx clobbered" and that allows leaf
> functions to look like leaf functions for the fastpath?

We could do something like:

define_fastcall(func [.. figure out how to deal with args ... ])

which spits out both a trampoline that saves the nasty stuff and calls
the real func(), and a call_func() inline asm for the call site.

At least on archs with register-passing conventions, it's fairly easy to
do, especially if we make it mandatory to stick to register args only
and forbid stack spills (i.e., only a handful of args).

For stack-based archs, it gets nastier, as you have to dig out the args,
save stuff, and push them back onto the stack.

But since we also don't want to lose strong typing, we probably want to
express the args in that macro, maybe like we do for the syscall
defines. A bit ugly, but that would allow us to have a strongly typed
call_func() *and* let the trampoline know what to do about the args
for stack-based calling conventions.

About to go and travel, so I don't have time to actually write
something, at least not for a couple of weeks...

Ben.

> Hmm? That would make my dislike of uninlining the slow case largely go
> away. I still think that back-off tends to be a mistake (and is often
> horrible for virtualization etc), but as long as the fastpath stays
> close to optimal, I don't care *too* much.
>
> Linus