All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@au1.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	James Hogan <james.hogan@imgtec.com>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>,
	Helge Deller <deller@gmx.de>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Anton Blanchard <anton@au1.ibm.com>
Subject: Re: [RFC GIT PULL] softirq: Consolidation and stack overrun fix
Date: Tue, 24 Sep 2013 11:52:07 +1000	[thread overview]
Message-ID: <1379987527.5443.20.camel@pasglop> (raw)
In-Reply-To: <CA+55aFwyVG4DXOEvaVTp0gE-H2Zy0Kx7Mg7RWzpxzwm_RSWEYQ@mail.gmail.com>

On Mon, 2013-09-23 at 18:19 -0700, Linus Torvalds wrote:
> On Mon, Sep 23, 2013 at 5:10 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> >
> > BTW, that boils down to a choice between using r13 as either a TLS for
> > current or current_thread_info, or as a per-cpu pointer, which one is
> > the most performance critical ?
> 
> I think you can tune most of the architecture setup to best suit your needs.
> 
> For example, on x86, we don't have much choice: the per-cpu accessors
> are going to be faster than the alternatives, and there are patches
> afoot to tune the preempt and rcu-readside counters to use the percpu
> area (and then save/restore things at task switch time). But having
> the counters natively in the thread_info struct is fine too and is
> what we do now.

Right, as long as the generic code doesn't move toward putting
everything in per-cpu without leaving us the option :-)

> Generally, we've put the performance-critical stuff into
> "current_thread_info" as opposed to "current", so it's likely that if
> the choice is between those two, then you might want to pick %r13
> pointing to the thread-info rather than the "struct task_struct" (ie
> things like low-level thread flags). But which is better probably
> depends on load, and again, some of it you can tweak by just making
> per-architecture structure choices and making the macros point at one
> or the other.

Well, if current_thread_info is basically inside the thread struct, it
will be the same, just a different offset from r13... task struct,
thread struct, thread info, it all becomes just one big structure
pointed to by r13.

> There's a few things that really depend on per-cpu areas, but I don't
> think it's a huge performance issue if you have to indirect off memory
> to get that. Most of the performance issues with per-cpu stuff is
> about avoiding cachelines ping-ponging back and forth, not so much
> about fast direct access. Of course, if some load really uses a *lot*
> of percpu accesses, you get both.
> 
> The advantage of having %r13 point to thread data (which is "stable"
> as far as the compiler is concerned) as opposed to having it be a
> per-cpu pointer (which can change randomly due to task switching) is
> that from a correctness standpoint I really do think that either
> thread-info or current is *much* easier to handle than using it for
> the per-cpu base pointer.

Right. I had a chat with Alan Modra (gcc) and he reckons the "right" way
to make the per-cpu (or PACA) stuff work reasonably reliably is to do
something along the lines of:

register unsigned long per_cpu_offset asm("r13");

And have a barrier in preempt_enable/disable (and irq enable/disable,
though arguably we could just make barrier() do it) which marks r13 as
an *output* (not a clobber !).

>From there, gcc knows that after any such barrier, r13 can have changed
and we intend to use the new value (if it's marked as a clobber, it
assumes it was *clobbered* and thus need to be restored to it's
*previous* value).

So if that holds, we have a solid way to do per-cpu. On one side, I tend
to think that r13 being task/thread/thread_info is probably a better
overall choice, I'm worried that going in a different direction than x86
means generic code will get "tuned" to use per-cpu for performance
critical stuff rather than task/thread/thread_info in inflexible ways.

Cheers,
Ben.

>              Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



  reply	other threads:[~2013-09-24  1:53 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-19 19:51 [RFC GIT PULL] softirq: Consolidation and stack overrun fix Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 1/3] irq: Consolidate do_softirq() arch overriden implementations Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 2/3] irq: Execute softirq on its own stack on irq exit Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 3/3] irq: Comment on the use of inline stack for ksoftirqd Frederic Weisbecker
2013-09-20  0:02 ` [RFC GIT PULL] softirq: Consolidation and stack overrun fix Linus Torvalds
2013-09-20  1:53   ` Benjamin Herrenschmidt
2013-09-20 11:03   ` Thomas Gleixner
2013-09-20 11:11     ` Peter Zijlstra
2013-09-21  0:55       ` Benjamin Herrenschmidt
2013-09-20 16:26     ` Frederic Weisbecker
2013-09-20 17:30       ` Thomas Gleixner
2013-09-20 18:37         ` Frederic Weisbecker
2013-09-20 22:14       ` Linus Torvalds
2013-09-21  7:47         ` Ingo Molnar
2013-09-21 18:58         ` Frederic Weisbecker
2013-09-21 21:45           ` Benjamin Herrenschmidt
2013-09-21 23:27             ` Frederic Weisbecker
2013-09-22  2:01             ` H. Peter Anvin
2013-09-22  4:39               ` Benjamin Herrenschmidt
2013-09-22  4:41                 ` Benjamin Herrenschmidt
2013-09-22 16:24                   ` Peter Zijlstra
2013-09-22 17:47                     ` H. Peter Anvin
2013-09-22 22:00                       ` Benjamin Herrenschmidt
2013-09-22 21:56                     ` Benjamin Herrenschmidt
2013-09-22 22:22                       ` Linus Torvalds
2013-09-22 22:38                         ` Benjamin Herrenschmidt
2013-09-23  4:35                           ` [PATCH] powerpc/irq: Run softirqs off the top of the irq stack Benjamin Herrenschmidt
2013-09-23  4:35                             ` Benjamin Herrenschmidt
2013-09-23  7:56                             ` Stephen Rothwell
2013-09-23  7:56                               ` Stephen Rothwell
2013-09-23 10:13                               ` Benjamin Herrenschmidt
2013-09-23 10:13                                 ` Benjamin Herrenschmidt
2013-09-23 16:47                             ` Linus Torvalds
2013-09-23 16:47                               ` Linus Torvalds
2013-09-23 20:51                               ` Benjamin Herrenschmidt
2013-09-23 20:51                                 ` Benjamin Herrenschmidt
2013-09-24  5:42                           ` [PATCH v2] " Benjamin Herrenschmidt
2013-09-24  5:42                             ` Benjamin Herrenschmidt
2013-09-23 17:59                         ` [RFC GIT PULL] softirq: Consolidation and stack overrun fix Chris Metcalf
2013-09-23 20:57                           ` Benjamin Herrenschmidt
2013-09-24 19:27                             ` Chris Metcalf
2013-09-24 20:58                               ` Benjamin Herrenschmidt
2013-09-24  0:10                         ` Benjamin Herrenschmidt
2013-09-24  1:19                           ` Linus Torvalds
2013-09-24  1:52                             ` Benjamin Herrenschmidt [this message]
2013-09-24  8:04                               ` Peter Zijlstra
2013-09-24  8:16                                 ` Benjamin Herrenschmidt
2013-09-24  8:21                                   ` Peter Zijlstra
2013-09-24  9:31                                     ` Benjamin Herrenschmidt
2013-09-23  4:40             ` Benjamin Herrenschmidt
2013-09-23  5:01               ` David Miller
2013-09-24  2:44               ` Frederic Weisbecker
2013-09-24  4:42                 ` Benjamin Herrenschmidt
2013-09-24 13:56                   ` Frederic Weisbecker
2013-09-24 20:55                     ` Benjamin Herrenschmidt
2013-09-25  8:46                       ` Frederic Weisbecker
2013-09-21  0:52       ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1379987527.5443.20.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=akpm@linux-foundation.org \
    --cc=anton@au1.ibm.com \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=fweisbec@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=james.hogan@imgtec.com \
    --cc=jejb@parisc-linux.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulus@au1.ibm.com \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.