* [PATCH 0/8] softirq: Consolidation and stack overrun fix v3
@ 2013-09-26 16:24 Frederic Weisbecker
2013-09-26 16:24 ` [PATCH 1/8] irq: Force hardirq exit's softirq processing on its own stack Frederic Weisbecker
2013-09-26 16:43 ` [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Linus Torvalds
0 siblings, 2 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2013-09-26 16:24 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Benjamin Herrenschmidt, Paul Mackerras,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
Linus Torvalds, James Hogan, James E.J. Bottomley, Helge Deller,
Martin Schwidefsky, Heiko Carstens, David S. Miller,
Andrew Morton, stable
Hi,
Series is fixed to address Linus and Ben comments:
* Turn __ARCH_IRQ_EXIT_ON_IRQ_STACK old ad-hoc style symbol to proper Kconfig
(CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK)
* Activates it to powerpc as well.
* Fix a bit the changelog of patch 6/8
This applies on top of 06367d58f487a592de50e6e2327371c5e3b6188f
("Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc")
And it can be pulled from:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
irq/core-v4
HEAD: 0362b2ff8d67913a8ffc5d757500129ead56b007
Frederic Weisbecker (8):
irq: Force hardirq exit's softirq processing on its own stack
irq: Consolidate do_softirq() arch overriden implementations
irq: Optimize call to softirq on hardirq exit
irq: Improve a bit softirq debugging
irq: Justify the various softirq stack choices
irq: Optimize softirq stack selection in irq exit
x86: Tell about irq stack coverage
powerpc: Tell about irq stack coverage
arch/Kconfig | 10 ++++++++
arch/metag/kernel/irq.c | 52 ++++++++++++++++--------------------------
arch/parisc/kernel/irq.c | 17 ++------------
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/irq.c | 17 +-------------
arch/s390/kernel/irq.c | 52 +++++++++++++++++-------------------------
arch/sh/kernel/irq.c | 57 +++++++++++++++++-----------------------------
arch/sparc/kernel/irq_64.c | 31 ++++++++-----------------
arch/x86/Kconfig | 1 +
arch/x86/kernel/entry_64.S | 4 ++--
arch/x86/kernel/irq_32.c | 30 +++++++-----------------
arch/x86/kernel/irq_64.c | 21 -----------------
include/linux/interrupt.h | 11 +++++++++
kernel/softirq.c | 40 ++++++++++++++++++++++++--------
14 files changed, 138 insertions(+), 206 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/8] irq: Force hardirq exit's softirq processing on its own stack
2013-09-26 16:24 [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Frederic Weisbecker
@ 2013-09-26 16:24 ` Frederic Weisbecker
2013-09-26 16:43 ` [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Linus Torvalds
1 sibling, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2013-09-26 16:24 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, stable, Benjamin Herrenschmidt,
Paul Mackerras, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
H. Peter Anvin, Linus Torvalds, James Hogan, James E.J. Bottomley,
Helge Deller, Martin Schwidefsky, Heiko Carstens, David S. Miller,
Andrew Morton
The commit facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
("irq: Sanitize invoke_softirq") converted irq exit
calls of do_softirq() to __do_softirq() on all architectures,
assuming it was only used there for its irq disablement
properties.
But as a side effect, the softirqs processed in the end
of the hardirq are always called on the inline current
stack that is used by irq_exit() instead of the softirq
stack provided by the archs that override do_softirq().
The result is mostly safe if the architecture runs irq_exit()
on a separate irq stack because then softirqs are processed
on that same stack that is near empty at this stage (assuming
hardirq aren't nesting).
Otherwise irq_exit() runs in the task stack and so does the softirq
too. The interrupted call stack can be randomly deep already and
the softirq can dig through it even further. To add insult to the
injury, this softirq can be interrupted by a new hardirq, maximizing
the chances for a stack overrun as reported in powerpc for example:
[ 1002.364577] do_IRQ: stack overflow: 1920
[ 1002.364653] CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
[ 1002.364734] Call Trace:
[ 1002.364770] [c0000000050a8740] [c0000000000157c0] .show_stack+0x130/0x200 (unreliable)
[ 1002.364872] [c0000000050a8810] [c00000000082f6d0] .dump_stack+0x28/0x3c
[ 1002.364955] [c0000000050a8880] [c000000000010b98] .do_IRQ+0x2b8/0x2c0
[ 1002.365039] [c0000000050a8930] [c0000000000026d4] hardware_interrupt_common+0x154/0x180
[ 1002.365148] --- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
[ 1002.365148] LR = .cp_start_xmit+0x390/0x820 [8139cp]
[ 1002.365359] [c0000000050a8d40] [c0000000006d7f14] .dev_hard_start_xmit+0x394/0x640
[ 1002.365433] [c0000000050a8e00] [c0000000007028c0] .sch_direct_xmit+0x110/0x260
[ 1002.365499] [c0000000050a8ea0] [c0000000006d8420] .dev_queue_xmit+0x260/0x630
[ 1002.365571] [c0000000050a8f40] [d0000000027d30d4] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
[ 1002.365641] [c0000000050a8fc0] [d0000000027d01f8] .br_dev_xmit+0x198/0x270 [bridge]
[ 1002.365707] [c0000000050a9070] [c0000000006d7f14] .dev_hard_start_xmit+0x394/0x640
[ 1002.365774] [c0000000050a9130] [c0000000006d85e8] .dev_queue_xmit+0x428/0x630
[ 1002.365834] [c0000000050a91d0] [c000000000729764] .ip_finish_output+0x2a4/0x550
[ 1002.365902] [c0000000050a9290] [c00000000072aaf0] .ip_local_out+0x50/0x70
[ 1002.365960] [c0000000050a9310] [c00000000072aed8] .ip_queue_xmit+0x148/0x420
[ 1002.366018] [c0000000050a93b0] [c000000000749524] .tcp_transmit_skb+0x4e4/0xaf0
[ 1002.366085] [c0000000050a94a0] [c00000000073de9c] .__tcp_ack_snd_check+0x7c/0xf0
[ 1002.366152] [c0000000050a9520] [c0000000007451d8] .tcp_rcv_established+0x1e8/0x930
[ 1002.366217] [c0000000050a95f0] [c00000000075326c] .tcp_v4_do_rcv+0x21c/0x570
[ 1002.366274] [c0000000050a96c0] [c000000000754a44] .tcp_v4_rcv+0x734/0x930
[ 1002.366332] [c0000000050a97a0] [c000000000724144] .ip_local_deliver_finish+0x184/0x360
[ 1002.366398] [c0000000050a9840] [c000000000724468] .ip_rcv_finish+0x148/0x400
[ 1002.366457] [c0000000050a98d0] [c0000000006d3248] .__netif_receive_skb_core+0x4f8/0xb00
[ 1002.366523] [c0000000050a99d0] [c0000000006d5414] .netif_receive_skb+0x44/0x110
[ 1002.366594] [c0000000050a9a70] [d0000000027d4e2c] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
[ 1002.366674] [c0000000050a9b20] [d0000000027de5ac] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
[ 1002.366754] [c0000000050a9bd0] [d0000000027df5ec] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
[ 1002.366820] [c0000000050a9c70] [c000000000717aa4] .nf_iterate+0x114/0x130
[ 1002.366877] [c0000000050a9d30] [c000000000717b74] .nf_hook_slow+0xb4/0x1e0
[ 1002.366938] [c0000000050a9e00] [d0000000027d51f0] .br_handle_frame+0x290/0x330 [bridge]
[ 1002.367005] [c0000000050a9ea0] [c0000000006d309c] .__netif_receive_skb_core+0x34c/0xb00
[ 1002.367072] [c0000000050a9fa0] [c0000000006d5414] .netif_receive_skb+0x44/0x110
[ 1002.367138] [c0000000050aa040] [c0000000006d6218] .napi_gro_receive+0xe8/0x120
[ 1002.367210] [c0000000050aa0c0] [d00000000208536c] .cp_rx_poll+0x31c/0x590 [8139cp]
[ 1002.367276] [c0000000050aa1d0] [c0000000006d59cc] .net_rx_action+0x1dc/0x310
[ 1002.367335] [c0000000050aa2b0] [c0000000000a0568] .__do_softirq+0x158/0x330
[ 1002.367394] [c0000000050aa3b0] [c0000000000a0978] .irq_exit+0xc8/0x110
[ 1002.367452] [c0000000050aa430] [c0000000000109bc] .do_IRQ+0xdc/0x2c0
[ 1002.367510] [c0000000050aa4e0] [c0000000000026d4] hardware_interrupt_common+0x154/0x180
[ 1002.367580] --- Exception: 501 at .bad_range+0x1c/0x110
[ 1002.367580] LR = .get_page_from_freelist+0x908/0xbb0
[ 1002.367658] [c0000000050aa7d0] [c00000000041d758] .list_del+0x18/0x50 (unreliable)
[ 1002.367725] [c0000000050aa850] [c0000000001bfa98] .get_page_from_freelist+0x908/0xbb0
[ 1002.367792] [c0000000050aa9e0] [c0000000001bff5c] .__alloc_pages_nodemask+0x21c/0xae0
[ 1002.367860] [c0000000050aaba0] [c0000000002126d0] .alloc_pages_vma+0xd0/0x210
[ 1002.367918] [c0000000050aac60] [c0000000001e93f4] .handle_pte_fault+0x814/0xb70
[ 1002.367985] [c0000000050aad50] [c0000000001eade4] .__get_user_pages+0x1a4/0x640
[ 1002.368052] [c0000000050aae60] [c00000000004606c] .get_user_pages_fast+0xec/0x160
[ 1002.368130] [c0000000050aaf10] [d000000001f73930] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
[ 1002.368205] [c0000000050aafd0] [d000000001f7e214] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
[ 1002.368280] [c0000000050ab070] [d000000001f8a824] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
[ 1002.368354] [c0000000050ab190] [d000000001f85064] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
[ 1002.368429] [c0000000050ab270] [d000000001f85b74] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
[ 1002.368504] [c0000000050ab320] [d000000001f88ec4] kvm_start_lightweight+0x1ec/0x1fc [kvm]
[ 1002.368578] [c0000000050ab4f0] [d000000001f86a58] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
[ 1002.368652] [c0000000050ab9c0] [d000000001f7f218] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
[ 1002.368725] [c0000000050aba50] [d000000001f7bdac] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
[ 1002.368797] [c0000000050abae0] [d000000001f74618] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
[ 1002.368865] [c0000000050abc90] [c00000000025302c] .do_vfs_ioctl+0x4ec/0x7c0
[ 1002.368923] [c0000000050abd80] [c0000000002533d4] .SyS_ioctl+0xd4/0xf0
[ 1002.368981] [c0000000050abe30] [c000000000009ed4] syscall_exit+0x0/0x98
Since this is a regression, this patch proposes a minimalistic
and low-risk solution by blindly forcing the hardirq exit processing of
softirqs on the softirq stack. This way we should reduce significantly
the opportunities for task stack overflow dug by softirqs.
Longer term solutions may involve extending the hardirq stack coverage to
irq_exit(), etc...
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
kernel/softirq.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 53cc09c..d7d498d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -328,10 +328,19 @@ void irq_enter(void)
static inline void invoke_softirq(void)
{
- if (!force_irqthreads)
- __do_softirq();
- else
+ if (!force_irqthreads) {
+ /*
+ * We can safely execute softirq on the current stack if
+ * it is the irq stack, because it should be near empty
+ * at this stage. But we have no way to know if the arch
+ * calls irq_exit() on the irq stack. So call softirq
+ * in its own stack to prevent from any overrun on top
+ * of a potentially deep task stack.
+ */
+ do_softirq();
+ } else {
wakeup_softirqd();
+ }
}
static inline void tick_irq_exit(void)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/8] softirq: Consolidation and stack overrun fix v3
2013-09-26 16:24 [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Frederic Weisbecker
2013-09-26 16:24 ` [PATCH 1/8] irq: Force hardirq exit's softirq processing on its own stack Frederic Weisbecker
@ 2013-09-26 16:43 ` Linus Torvalds
2013-09-26 17:38 ` Frederic Weisbecker
2013-09-30 11:12 ` Frederic Weisbecker
1 sibling, 2 replies; 5+ messages in thread
From: Linus Torvalds @ 2013-09-26 16:43 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, James Hogan,
James E.J. Bottomley, Helge Deller, Martin Schwidefsky,
Heiko Carstens, David S. Miller, Andrew Morton, stable
On Thu, Sep 26, 2013 at 9:24 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> * Turn __ARCH_IRQ_EXIT_ON_IRQ_STACK old ad-hoc style symbol to proper Kconfig
> (CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK)
>
> * Activates it to powerpc as well.
>
> * Fix a bit the changelog of patch 6/8
Ack on the whole series.
I guess x86-64 and powerpc no longer care, but for other architectures
the stack usage issue can be a regression even if I think it's only
been reported on Power. So I'm assuming we should pull this into 3.12.
Yes?
And what about earlier stable kernels? Afaik this was originally
introduced by commit facd8b80c67a, merged into 3.9. Or was there some
other trigger? Do we want to just do the "__do_softirq" ->
"do_softirq()" change for stable?
"Help us, Obi-wan Weisbecker. You are our only hope"
Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/8] softirq: Consolidation and stack overrun fix v3
2013-09-26 16:43 ` [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Linus Torvalds
@ 2013-09-26 17:38 ` Frederic Weisbecker
2013-09-30 11:12 ` Frederic Weisbecker
1 sibling, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2013-09-26 17:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, James Hogan,
James E.J. Bottomley, Helge Deller, Martin Schwidefsky,
Heiko Carstens, David S. Miller, Andrew Morton, stable
On Thu, Sep 26, 2013 at 09:43:38AM -0700, Linus Torvalds wrote:
> On Thu, Sep 26, 2013 at 9:24 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > * Turn __ARCH_IRQ_EXIT_ON_IRQ_STACK old ad-hoc style symbol to proper Kconfig
> > (CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK)
> >
> > * Activates it to powerpc as well.
> >
> > * Fix a bit the changelog of patch 6/8
>
>
> Ack on the whole series.
>
> I guess x86-64 and powerpc no longer care, but for other architectures
> the stack usage issue can be a regression even if I think it's only
> been reported on Power. So I'm assuming we should pull this into 3.12.
> Yes?
>
> And what about earlier stable kernels? Afaik this was originally
> introduced by commit facd8b80c67a, merged into 3.9. Or was there some
> other trigger? Do we want to just do the "__do_softirq" ->
> "do_softirq()" change for stable?
So there is like an underway proposition inside this patchset: the very first commit
has a stable tag on it. This way it can be merged seperately first for 3.12 and
then backported up to the release that has facd8b80c67a.
Then the rest of the series can go for the next merge window in a seperate tree.
Now that's really just one way among possible others to spread the patchset. Because
the regression fix alone unfortunately comes with a performance penalty.
Now if we compare that performance penalty to what we had prior facd8b80c67a, we
were already calling plain do_softirq() on irq_exit() for all archs that had
__ARCH_IRQ_EXIT_IRQS_DISABLED not defined. Namely only arm, arm64, blackfin, s390
were calling __do_softirq() instead. These archs, especially arm, aren't the least
that being said.
We could also backport the part that consolidates to do_softirq_own_stack() and call
it directly if __ARCH_IRQ_EXIT_IRQS_DISABLED. That restores a bit the facd8b80c67a~
state. But it's a bit invasive for post -rc2.
Now may be somebody has some better scenario in mind.
>
> "Help us, Obi-wan Weisbecker. You are our only hope"
"To thy heart thou shalt listen, young Linux maintainer. Then the right merge decision
thou wilt take".
>
> Linus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/8] softirq: Consolidation and stack overrun fix v3
2013-09-26 16:43 ` [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Linus Torvalds
2013-09-26 17:38 ` Frederic Weisbecker
@ 2013-09-30 11:12 ` Frederic Weisbecker
1 sibling, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2013-09-30 11:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: LKML, Benjamin Herrenschmidt, Paul Mackerras, Ingo Molnar,
Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, James Hogan,
James E.J. Bottomley, Helge Deller, Martin Schwidefsky,
Heiko Carstens, David S. Miller, Andrew Morton, stable
On Thu, Sep 26, 2013 at 09:43:38AM -0700, Linus Torvalds wrote:
> On Thu, Sep 26, 2013 at 9:24 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > * Turn __ARCH_IRQ_EXIT_ON_IRQ_STACK old ad-hoc style symbol to proper Kconfig
> > (CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK)
> >
> > * Activates it to powerpc as well.
> >
> > * Fix a bit the changelog of patch 6/8
>
>
> Ack on the whole series.
>
> I guess x86-64 and powerpc no longer care, but for other architectures
> the stack usage issue can be a regression even if I think it's only
> been reported on Power. So I'm assuming we should pull this into 3.12.
> Yes?
>
> And what about earlier stable kernels? Afaik this was originally
> introduced by commit facd8b80c67a, merged into 3.9. Or was there some
> other trigger? Do we want to just do the "__do_softirq" ->
> "do_softirq()" change for stable?
>
> "Help us, Obi-wan Weisbecker. You are our only hope"
Ok, I'm going to split that in two pull requests for Ingo and Thomas.
One for the 1st patch as a regression fix to be applied for 3.12 and backported.
Another one for the rest of the series to be applied on the next merge window.
If anybody oppose, please tell.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-09-30 11:12 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-26 16:24 [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Frederic Weisbecker
2013-09-26 16:24 ` [PATCH 1/8] irq: Force hardirq exit's softirq processing on its own stack Frederic Weisbecker
2013-09-26 16:43 ` [PATCH 0/8] softirq: Consolidation and stack overrun fix v3 Linus Torvalds
2013-09-26 17:38 ` Frederic Weisbecker
2013-09-30 11:12 ` Frederic Weisbecker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).