Re: rcu_preempt self-detected stall on CPU from 4.4-rc4, since 3.17

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ross Green <rgkernel@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com,
	"Eric Dumazet" <edumazet@google.com>,
	dvhart@linux.intel.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	oleg@redhat.com, "pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.4-rc4, since 3.17
Date: Sat, 2 Jan 2016 22:17:20 -0800
Message-ID: <20160103061720.GT4054@linux.vnet.ibm.com>
In-Reply-To: <CANfgCY2=np7CX1vHxY7gyYb1-6GZJB7o4yFStkOsh5Yta_-v7A@mail.gmail.com>

On Sun, Jan 03, 2016 at 04:29:11PM +1100, Ross Green wrote:
> Still seeing these rcu_preempt stalls on kernels through to 4.4-rc7
> 
> Still have not found a sure fire method to evoke this stall, but have
> found that it will normally occur within a week of running a kernel -
> usually when it is quiet with light load.
> 
> Have seen similar self detected stalls all the way back to 3.17.
> Most recent kernels have included 4.4-rc5 4.4-rc6 and 4.4-rc7
> 
> Regards,
> 
> Ross
> 
> On Fri, Dec 11, 2015 at 10:17 PM, Ross Green <rgkernel@gmail.com> wrote:
> > I have been getting these stalls in kernels going back to 3.17.
> >
> > This stall occurs usually under light load but often requires several
> > days to show itself. I have not found any simple way to trigger the
> > stall. Indeed heavy workloads seems not to show the fault.
> >
> > Anyone have any thoughts here?
> >
> > The recent patch by peterz with kernel/sched/wait.c I thought might
> > help the situation, but alas after a few days of running 4.4-rc4 the
> > following turned up.
> >
> > [179922.003570] INFO: rcu_preempt self-detected stall on CPU
> > [179922.008178] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [179922.008178]         0-...: (1 ticks this GP) idle=a91/1/0

CPU 0 is non-idle from an RCU perspective.
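
For anyone decoding the idle= field by hand, here is a minimal Python
sketch of that check.  It assumes the convention described in
Documentation/RCU/stallwarn.txt: the hex number before the first "/" is
the low-order bits of the dynticks counter, even when the CPU is in
dyntick-idle mode and odd otherwise.  The helper name is made up for
illustration and the other two numbers are left uninterpreted.

```python
import re

def rcu_sees_cpu_idle(stall_line):
    """Return True if the idle= field says the CPU is dyntick-idle.

    Assumption: the first hex number after "idle=" is the low-order bits
    of the dynticks counter (even = dyntick-idle, odd = non-idle).
    """
    m = re.search(r"idle=([0-9a-f]+)/", stall_line)
    if m is None:
        raise ValueError("no idle= field found")
    dynticks = int(m.group(1), 16)
    return dynticks % 2 == 0

line = "0-...: (1 ticks this GP) idle=a91/1/0 softirq=1296733/1296733 fqs=0"
print(rcu_sees_cpu_idle(line))  # False: 0xa91 is odd, so CPU 0 is non-idle
```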

> > softirq=1296733/1296733 fqs=0
> > [179922.008178]
> > [179922.008209] (detected by 1, t=8775 jiffies, g=576439, c=576438, q=102)
> > [179922.008209] Task dump for CPU 0:
> > [179922.008209] swapper/0       R [179922.008209]  running [179922.008209]     0     0      0 0x00000000
> > [179922.008209] Backtrace:
> >
> > [179922.008239] Backtrace aborted due to bad frame pointer <c0907f54>

Can't have everything, I guess...

> > [179922.008239] rcu_preempt kthread starved for 8775 jiffies! g576439 c576438 f0x0 s3 ->state=0x1

Something is keeping the rcu_preempt grace-period kthread from
running.  This far into the grace period, it should have a
timer event waking it every few jiffies.  It is currently
in TASK_INTERRUPTIBLE state.
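
If it helps when collecting more of these, the "kthread starved" line
can be pulled apart mechanically.  The Python sketch below is only an
illustration: the regular expression is fitted to the line as it appears
in this report, and reading g as the current grace-period number and c
as the last completed one is my interpretation of the field names, so
treat both as assumptions.  The s field is parsed but not interpreted.

```python
import re

# Pattern fitted to the line quoted in this report (an assumption, not a
# guaranteed-stable format).
STARVED_RE = re.compile(
    r"(?P<name>\S+) kthread starved for (?P<jiffies>\d+) jiffies! "
    r"g(?P<g>\d+) c(?P<c>\d+) f(?P<f>0x[0-9a-f]+) s(?P<s>\d+) "
    r"->state=(?P<state>0x[0-9a-f]+)"
)

def parse_starved(line):
    """Split a 'kthread starved' line into integer fields."""
    m = STARVED_RE.search(line)
    if m is None:
        raise ValueError("not a kthread-starved line")
    d = m.groupdict()
    return {
        "name": d["name"],
        "jiffies": int(d["jiffies"]),
        "g": int(d["g"]),            # assumed: current grace-period number
        "c": int(d["c"]),            # assumed: last completed grace period
        "f": int(d["f"], 16),        # grace-period flags
        "s": int(d["s"]),            # left uninterpreted here
        "state": int(d["state"], 16),  # task ->state of the kthread
    }

report = ("[179922.008239] rcu_preempt kthread starved for 8775 jiffies! "
          "g576439 c576438 f0x0 s3 ->state=0x1")
fields = parse_starved(report)
print(fields["name"], fields["jiffies"])
print("grace period in progress:", fields["g"] != fields["c"])
```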

> > [179922.060302]         0-...: (1 ticks this GP) idle=a91/1/0 softirq=1296733/1296733 fqs=0
> > [179922.068023]          (t=8775 jiffies g=576439 c=576438 q=102)
> > [179922.073913] rcu_preempt kthread starved for 8775 jiffies! g576439 c576438 f0x2 s3 ->state=0x100

Same story, same grace period, pretty much same time.  Now there is an FQS
request (f0x2) and the state is now TASK_WAKING (->state=0x100 == 256).
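
The difference between the two reports can be spelled out with a small
Python sketch.  It decodes only the values named in this message (0x2 as
the FQS request flag, 0x1 as TASK_INTERRUPTIBLE, 0x100 as TASK_WAKING);
the table is deliberately incomplete and the function name is made up,
so anything else is reported as unknown rather than guessed.

```python
# Only the values discussed in this message; deliberately incomplete.
TASK_STATES = {0x1: "TASK_INTERRUPTIBLE", 0x100: "TASK_WAKING"}
FQS_REQUEST = 0x2

def describe(flags, state):
    """Summarize the f and ->state fields of a 'kthread starved' line."""
    fqs = "FQS requested" if flags & FQS_REQUEST else "no FQS request"
    return f"{fqs}, {TASK_STATES.get(state, 'unknown state')}"

print(describe(0x0, 0x1))    # first report:  f0x0 ->state=0x1
print(describe(0x2, 0x100))  # second report: f0x2 ->state=0x100
```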

> > [179922.083587] Task dump for CPU 0:
> > [179922.087097] swapper/0       R running      0     0      0 0x00000000
> > [179922.093292] Backtrace:
> > [179922.096313] [<c0013ea8>] (dump_backtrace) from [<c00140a4>] (show_stack+0x18/0x1c)
> > [179922.104675]  r7:c0908514 r6:80080193 r5:00000000 r4:c090aca8
> > [179922.110809] [<c001408c>] (show_stack) from [<c005a858>] (sched_show_task+0xbc/0x110)
> > [179922.119049] [<c005a79c>] (sched_show_task) from [<c005ccd4>] (dump_cpu_task+0x40/0x44)
> > [179922.127624]  r5:c0917680 r4:00000000
> > [179922.131042] [<c005cc94>] (dump_cpu_task) from [<c0082268>] (rcu_dump_cpu_stacks+0x9c/0xdc)
> > [179922.140350]  r5:c0917680 r4:00000001
> > [179922.143157] [<c00821cc>] (rcu_dump_cpu_stacks) from [<c008637c>] (rcu_check_callbacks+0x504/0x8e4)
> > [179922.153808]  r9:c0908514 r8:c0917680 r7:00000066 r6:2eeab000
> > r5:c0904300 r4:ef7af300
> > [179922.161499] [<c0085e78>] (rcu_check_callbacks) from [<c00895d0>] (update_process_times+0x40/0x6c)
> > [179922.170898]  r10:c009a584 r9:00000001 r8:ef7abc4c r7:0000a3a3
> > r6:4ec3391c r5:00000000
> > [179922.179901]  r4:c090aca8
> > [179922.182708] [<c0089590>] (update_process_times) from [<c009a580>]
> > (tick_sched_handle+0x50/0x54)
> > [179922.192108]  r5:c0907f10 r4:ef7abe40
> > [179922.195983] [<c009a530>] (tick_sched_handle) from [<c009a5d4>]
> > (tick_sched_timer+0x50/0x94)
> > [179922.204895] [<c009a584>] (tick_sched_timer) from [<c0089fe4>]
> > (__hrtimer_run_queues+0x110/0x1a0)
> > [179922.214324]  r7:00000000 r6:ef7abc40 r5:ef7abe40 r4:ef7abc00
> > [179922.220428] [<c0089ed4>] (__hrtimer_run_queues) from [<c008a674>]
> > (hrtimer_interrupt+0xac/0x1f8)
> > [179922.227111]  r10:ef7abc78 r9:ef7abc98 r8:ef7abc14 r7:ef7abcb8
> > r6:ffffffff r5:00000003
> > [179922.238220]  r4:ef7abc00
> > [179922.238220] [<c008a5c8>] (hrtimer_interrupt) from [<c00170ec>]
> > (twd_handler+0x38/0x48)
> > [179922.238220]  r10:c09084e8 r9:fa241100 r8:00000011 r7:ef028780
> > r6:c092574c r5:ef005cc0

All interrupt stack up to this point.

It is quite possible that the stuff below here is at fault as well.
That said, the grace-period kthread should actually get to execute at
some point.  Do you have a heavy real-time load that might be starving
things?

							Thanx, Paul

> > [179922.257110]  r4:00000001
> > [179922.257110] [<c00170b4>] (twd_handler) from [<c007c8f8>] (handle_percpu_devid_irq+0x74/0x8c)
> > [179922.269683]  r5:ef005cc0 r4:ef7b1740
> > [179922.269683] [<c007c884>] (handle_percpu_devid_irq) from [<c0078454>] (generic_handle_irq+0x2c/0x3c)
> > [179922.283233]  r9:fa241100 r8:ef008000 r7:00000001 r6:00000000
> > r5:00000000 r4:c09013e8
> > [179922.290985] [<c0078428>] (generic_handle_irq) from [<c007872c>] (__handle_domain_irq+0x64/0xbc)
> > [179922.300842] [<c00786c8>] (__handle_domain_irq) from [<c00094c0>]
> > (gic_handle_irq+0x50/0x90)
> > [179922.303222]  r9:fa241100 r8:fa240100 r7:c0907f10 r6:fa24010c
> > r5:c09087a8 r4:c0925748
> > [179922.315216] [<c0009470>] (gic_handle_irq) from [<c0014bd4>]
> > (__irq_svc+0x54/0x90)
> > [179922.319000] Exception stack(0xc0907f10 to 0xc0907f58)
> > [179922.331542] 7f00:                                     00000000
> > ef7ab390 fe600000 00000000
> > [179922.331542] 7f20: c0906000 c090849c c0900364 c06a8124 c0907f80
> > c0944563 c09084e8 c0907f6c
> > [179922.349029] 7f40: c0907f4c c0907f60 c00287ac c0010ba8 60080113 ffffffff
> > [179922.349029]  r9:c0944563 r8:c0907f80 r7:c0907f44 r6:ffffffff
> > r5:60080113 r4:c0010ba8
> > [179922.357116] [<c0010b80>] (arch_cpu_idle) from [<c006f034>]
> > (default_idle_call+0x28/0x34)
> > [179922.368926] [<c006f00c>] (default_idle_call) from [<c006f154>]
> > (cpu_startup_entry+0x114/0x18c)
> > [179922.368926] [<c006f040>] (cpu_startup_entry) from [<c069fc6c>]
> > (rest_init+0x90/0x94)
> > [179922.385284]  r7:ffffffff r4:00000002
> > [179922.393463] [<c069fbdc>] (rest_init) from [<c08bbcec>]
> > (start_kernel+0x370/0x37c)
> > [179922.400421]  r5:c0947000 r4:00000000
> > [179922.400421] [<c08bb97c>] (start_kernel) from [<8000807c>] (0x8000807c)
> > $
