linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@au1.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	James Hogan <james.hogan@imgtec.com>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>,
	Helge Deller <deller@gmx.de>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC GIT PULL] softirq: Consolidation and stack overrun fix
Date: Mon, 23 Sep 2013 08:38:09 +1000	[thread overview]
Message-ID: <1379889489.24090.34.camel@pasglop> (raw)
In-Reply-To: <CA+55aFwaf_Wst=AS75ydBJVQ6aJxPfAzXdt-UXj3qC9WeUt7kw@mail.gmail.com>

On Sun, 2013-09-22 at 15:22 -0700, Linus Torvalds wrote:
> On Sun, Sep 22, 2013 at 2:56 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Sun, 2013-09-22 at 18:24 +0200, Peter Zijlstra wrote:
> >>
> >> We use a segment offset. Something like:
> >>
> >>   inc %gs:var;
> >>
> >
> > And gcc makes no stupid assumptions that this gs doesn't change ? That's
> > the main problem we have with using r13 for PACA.
> 
> Since gcc doesn't really know about segment registers at all (modulo
> %fs as TLS on x86), we do everything like that using inline asm.
> 
> It's not *too* painful if you have a number of macro helpers to build
> up all the different versions.
> 
> And r13 isn't volatile if you are preempt-safe, so I'm wondering if
> you could just make the preempt disable code mark %r13 as modified
> ("+r"). Then gcc won't ever cache r13 across one of those. And if you
> don't have preemption disabled, then you cannot do multiple ops using
> %r13 anyway, since on a load-store architecture it might change even
> between the load and store, so a per-cpu "add" operation *has* to
> cache the %r13 value in *another* register anyway, because using
> memory ops with just an offset off %r13 would be buggy.

The only other one is our soft-disable of irqs which is a byte off r13,
so technically a simple:

	li	r0,0
	stb	r0,PACASOFTIRQEN(r13)

Is a perfectly valid implementation of local_irq_disable() ;-) But that
sort of special case I think we can handle with special cases.

Right, using an r13 clobber in a dummy asm in things like
preempt_disable and local_irq_disable is so far the most interesting
option left on our table.

> So I don't think this is a gcc issue. gcc can't fix those kinds of problems.

Well, it would help if we had a way to tell gcc that a global register
variable is volatile. But it doesn't understand that concept.

> Personally, I'd suggest something like:
> 
>  - the paca stuff is just insane. Try to get rid of it.

I wish I could... it dig deep. The problem is fundamentally because with
MMU off (aka "real" mode), under a hypervisor, we have only access to a
portion of the partition memory (it's a fake real mode if you want, and
on older systems it's implemented as a physically contiguous chunk of
memory reserved by the hypervisor for the partition, and as such cannot
be arbitrarily large, newer CPUs support something smarter but with
still with some limitations).

That combined with the fact that all interrupts are delivered in real
mode, and you get the picture. KVM "userspace" (the variant that
emulates the guest as a user process) makes it worse as it needs to
maintain much more state in a place that is "real mode accessible".

We *might* be bale to repurpose r13 as our per-cpu offset and move a
chunk of stuff from the paca to per-cpu variables, but we will still
need "something" akin to the PACA for the guts of the exception
entry/exit and KVM.

>  - use %r13 for the per-thread thread-info pointer instead. A
> per-thread pointer is *not* volatile like the per-cpu base is.

Yes, use it as a TLS, that would be an option I've been considering. It
could also be the task struct but thread_info might be slightly hotter.

That's what I had in mind when I mentioned using it as a TLS for
"current".

>  - Now you can make the per-cpu offset be loaded off the per-thread
> pointer (update it at context switch). gcc knows to not cache it
> across function calls, since it's a memory access. Use ACCESS_ONCE()
> or something to make sure it's only loaded once for the cpu offset
> ops.

I have seen gcc completely disregard ACCESS_ONCE in the past though, at
least in the context of PACA relative accesses...

> Alternatively, make %r13 point to the percpu side, but make sure that
> you always use an asm accessor to fetch the value. In particular, I
> think you need to make __my_cpu_offset be an inline asm that fetches
> %r13 into some other register. Otherwise you can never get it right.

Yes, that's my plan B, at least for PACA but it would work for per-cpu
as well. Use a get_paca/put_paca style construct which fetches is in
another register (and does the preempt enable/disable while at it) and
for the handful of cases where we do want direct and faster access such
as manipulating the soft-irq flag, an inline asm load or store which is
what we do today.

Also with LE, we still have a bit of potential for tweaking the ABI and
the compiler, so that's something I want to put on the table.

Cheers,
Ben.



  reply	other threads:[~2013-09-22 22:39 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-19 19:51 [RFC GIT PULL] softirq: Consolidation and stack overrun fix Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 1/3] irq: Consolidate do_softirq() arch overriden implementations Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 2/3] irq: Execute softirq on its own stack on irq exit Frederic Weisbecker
2013-09-19 19:51 ` [PATCH 3/3] irq: Comment on the use of inline stack for ksoftirqd Frederic Weisbecker
2013-09-20  0:02 ` [RFC GIT PULL] softirq: Consolidation and stack overrun fix Linus Torvalds
2013-09-20  1:53   ` Benjamin Herrenschmidt
2013-09-20 11:03   ` Thomas Gleixner
2013-09-20 11:11     ` Peter Zijlstra
2013-09-21  0:55       ` Benjamin Herrenschmidt
2013-09-20 16:26     ` Frederic Weisbecker
2013-09-20 17:30       ` Thomas Gleixner
2013-09-20 18:37         ` Frederic Weisbecker
2013-09-20 22:14       ` Linus Torvalds
2013-09-21  7:47         ` Ingo Molnar
2013-09-21 18:58         ` Frederic Weisbecker
2013-09-21 21:45           ` Benjamin Herrenschmidt
2013-09-21 23:27             ` Frederic Weisbecker
2013-09-22  2:01             ` H. Peter Anvin
2013-09-22  4:39               ` Benjamin Herrenschmidt
2013-09-22  4:41                 ` Benjamin Herrenschmidt
2013-09-22 16:24                   ` Peter Zijlstra
2013-09-22 17:47                     ` H. Peter Anvin
2013-09-22 22:00                       ` Benjamin Herrenschmidt
2013-09-22 21:56                     ` Benjamin Herrenschmidt
2013-09-22 22:22                       ` Linus Torvalds
2013-09-22 22:38                         ` Benjamin Herrenschmidt [this message]
2013-09-23  4:35                           ` [PATCH] powerpc/irq: Run softirqs off the top of the irq stack Benjamin Herrenschmidt
2013-09-23  7:56                             ` Stephen Rothwell
2013-09-23 10:13                               ` Benjamin Herrenschmidt
2013-09-23 16:47                             ` Linus Torvalds
2013-09-23 20:51                               ` Benjamin Herrenschmidt
2013-09-24  5:42                           ` [PATCH v2] " Benjamin Herrenschmidt
2013-09-23 17:59                         ` [RFC GIT PULL] softirq: Consolidation and stack overrun fix Chris Metcalf
2013-09-23 20:57                           ` Benjamin Herrenschmidt
2013-09-24 19:27                             ` Chris Metcalf
2013-09-24 20:58                               ` Benjamin Herrenschmidt
2013-09-24  0:10                         ` Benjamin Herrenschmidt
2013-09-24  1:19                           ` Linus Torvalds
2013-09-24  1:52                             ` Benjamin Herrenschmidt
2013-09-24  8:04                               ` Peter Zijlstra
2013-09-24  8:16                                 ` Benjamin Herrenschmidt
2013-09-24  8:21                                   ` Peter Zijlstra
2013-09-24  9:31                                     ` Benjamin Herrenschmidt
2013-09-23  4:40             ` Benjamin Herrenschmidt
2013-09-23  5:01               ` David Miller
2013-09-24  2:44               ` Frederic Weisbecker
2013-09-24  4:42                 ` Benjamin Herrenschmidt
2013-09-24 13:56                   ` Frederic Weisbecker
2013-09-24 20:55                     ` Benjamin Herrenschmidt
2013-09-25  8:46                       ` Frederic Weisbecker
2013-09-21  0:52       ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1379889489.24090.34.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=fweisbec@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=james.hogan@imgtec.com \
    --cc=jejb@parisc-linux.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulus@au1.ibm.com \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).