Re: Question concerning RCU - Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Stoidner, Christoph" <c.stoidner@arvero.de>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Question concerning RCU
Date: Sun, 11 Jan 2015 12:26:04 -0800
Message-ID: <20150111202604.GC8063@linux.vnet.ibm.com>
In-Reply-To: <81b94fc89c774b71a967fc93823e9c63@EX132MBOX1A.de2.local>

On Sun, Jan 11, 2015 at 11:59:45AM +0000, Stoidner, Christoph wrote:
> 
> Hi Paul,
> 
> many thanks for your fast answer!
> 
> I have now changed my application so that it no longer requires
> Xenomai/I-Pipe. That means my kernel is now built from
> mainline source, with preempt_rt only and no Xenomai or I-Pipe.
> However, the problem is exactly the same. After some runtime (minutes
> or hours) the kernel freezes, and JTAG debugging shows that it ends up
> in an endless loop in rcu_print_task_stall (as described before).
> 
> > First I have seen this.  Were you doing lots of CPU-hotplug operations?
> 
> My system has only one core. So I think there should not be any 
> CPU-hotplugging.

OK, so no point in providing you that set of patches, then.

> > If you have more CPUs than the value of CONFIG_RCU_FANOUT (which
> > defaults to 16), and if your workload offlined a full block of CPUs (full
> > blocks being CPUs 0-15, 16-31, 32-47, and so on for the default value
> > of CONFIG_RCU_FANOUT), then there is a theoretical issue that -might-
> > cause the problem that you are seeing.
> 
> So this could not happen on a single-core system either. Am I right?

Yep, no way this can happen without a lot of CPUs and a lot of CPU
hotplugging.
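
To make the "full block" arithmetic above concrete, here is a minimal
sketch in Python. It is only an illustration of the CPU ranges mentioned
above (0-15, 16-31, 32-47 for a fanout of 16), not kernel code; the
function names leaf_block and fully_offlined_blocks are hypothetical.

```python
RCU_FANOUT = 16  # default value mentioned above; everything else here is illustrative


def leaf_block(cpu, fanout=RCU_FANOUT):
    """Return (first_cpu, last_cpu) of the fanout-sized block containing cpu."""
    first = (cpu // fanout) * fanout
    return first, first + fanout - 1


def fully_offlined_blocks(nr_cpus, offline, fanout=RCU_FANOUT):
    """List the blocks in which every existing CPU is in the offline set."""
    blocks = []
    for first in range(0, nr_cpus, fanout):
        # The last block may be partial when nr_cpus is not a multiple of fanout.
        members = range(first, min(first + fanout, nr_cpus))
        if all(cpu in offline for cpu in members):
            blocks.append((members[0], members[-1]))
    return blocks
```

On a single-core system with CPU 0 online, fully_offlined_blocks(1, set())
returns an empty list, which matches the point that the scenario needs
many CPUs and a lot of hotplugging.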

> I have no idea how to find the problem. Do you have any more hints or ideas?

You got stack traces with the stall warnings, correct?  If so, please look
at them and at Documentation/RCU/stallwarn.txt and see if the kernel is
looping somewhere inappropriate.

I am not familiar with the low-level ARM kernel code, but the stack below
leads me to suspect that your kernel is interrupting itself to death or
is improperly handling interrupts.
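
One rough way to spot this pattern in a saved backtrace is to look for the
same function appearing in many consecutive frames. The sketch below
assumes gdb-style "#N [0xADDR in] func (...)" lines like those quoted
below; the name repeated_frames and the regular expression are
hypothetical helpers, and a long run can also be an unwinder artifact, so
treat the result as a hint only.

```python
import re
from itertools import groupby

# Matches gdb frame lines such as "#17 0xc000e360 in __irq_svc () at ...".
# The address part is optional, as in "#0  rcu_print_task_stall (rnp=...)".
FRAME = re.compile(r"#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")


def repeated_frames(backtrace, min_run=2):
    """Return (function, run_length) for functions repeated in consecutive frames."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME.search(line)
        if match:
            names.append(match.group(1))
    runs = []
    for name, group in groupby(names):
        length = len(list(group))
        if length >= min_run:
            runs.append((name, length))
    return runs
```

Fed the frames quoted below, this would report a run of __irq_svc entries,
which is the part of the trace that looks suspicious.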

							Thanx, Paul

> Here is a backtrace when the problem has occurred on the system without Xenomai/I-Pipe:
> 
> #0  rcu_print_task_stall (rnp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree_plugin.h:528
> #1  0xc005cabc in print_other_cpu_stall (rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:885
> #2  check_cpu_stall (rdp=0x80000093, rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:977
> #3  __rcu_pending (rdp=0x80000093, rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:2750
> #4  rcu_pending (cpu=<optimized out>) at kernel/rcutree.c:2800
> #5  rcu_check_callbacks (cpu=<optimized out>, user=<optimized out>) at kernel/rcutree.c:2179
> #6  0xc0027648 in update_process_times (user_tick=0) at kernel/timer.c:1427
> #7  0xc004e840 in tick_sched_timer (timer=0xc0498860 <tick_cpu_sched>) at kernel/time/tick-sched.c:1095
> #8  0xc003a0dc in __run_hrtimer (timer=0xc0498860 <tick_cpu_sched>, now=<optimized out>) at kernel/hrtimer.c:1363
> #9  0xc003ab4c in hrtimer_interrupt (dev=<optimized out>) at kernel/hrtimer.c:1582
> #10 0xc02bf7bc in mxs_timer_interrupt (irq=<optimized out>, dev_id=<optimized out>) at drivers/clocksource/mxs_timer.c:132
> #11 0xc0055154 in handle_irq_event_percpu (desc=0xc7804c00, action=0xc04b0520 <mxs_timer_irq>) at kernel/irq/handle.c:144
> #12 0xc0055320 in handle_irq_event (desc=0xc7804c00) at kernel/irq/handle.c:197
> #13 0xc00578b8 in handle_level_irq (irq=<optimized out>, desc=0xc7804c00) at kernel/irq/chip.c:406
> #14 0xc0054aec in generic_handle_irq_desc (desc=<optimized out>, irq=16) at include/linux/irqdesc.h:115
> #15 generic_handle_irq (irq=16) at kernel/irq/irqdesc.c:314
> #16 0xc000f58c in handle_IRQ (irq=16, regs=<optimized out>) at arch/arm/kernel/irq.c:80
> #17 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #18 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #19 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #20 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> ...
> 
> Thanks and regards,
> Christoph
>
