From: Thomas Gleixner <tglx@linutronix.de>
To: Frank Rowand <frank.rowand@am.sony.com>
Cc: "paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>,
	"Rowand, Frank" <Frank_Rowand@sonyusa.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>, Ingo Molnar <mingo@tglx.de>,
	Venkatesh Pallipadi <venki@google.com>,
	Russell King <linux@arm.linux.org.uk>
Subject: Re: [ANNOUNCE] 3.0.1-rt11
Date: Wed, 7 Sep 2011 11:25:29 +0200 (CEST)
Message-ID: <alpine.LFD.2.02.1109071115410.2723@ionos>
In-Reply-To: <4E66DCAB.8090801@am.sony.com>

On Tue, 6 Sep 2011, Frank Rowand wrote:

> On 08/26/11 16:55, Paul E. McKenney wrote:
> > On Wed, Aug 24, 2011 at 04:58:49PM -0700, Frank Rowand wrote:
> >> On 08/13/11 03:53, Peter Zijlstra wrote:
> >>>
> >>> Whee, I can skip release announcements too!
> >>>
> >>> So no the subject ain't no mistake its not, 3.0.1-rt11 is there for the
> >>> grabs.
> 
> < snip >
> 
> >> I have a consistent (every boot) hang on boot.  With a few
> >> hacks to get console output, I get the
> >>
> >>   rcu_preempt_state detected stalls on CPUs/tasks
> 
> < snip >
> 
> >> This is an ARM NaviEngine (out of tree, so I also have applied
> >> a series of patches for platform support).
> >>
> >> CONFIG_PREEMPT_RT_FULL is set.  Full config is attached.
> 
> I have also replicated the problem on the ARM RealView (in tree) and
> without the RT patches.
> 
> > 
> > Hmmm...  The last few that I have seen that looked like this were
> > due to my messing up rcutorture so that the RCU-boost testing kthreads
> > ran CPU-bound at real-time priority.
> > 
> > Is it possible that something similar is happening on your system?
> > 
> >                                                         Thanx, Paul
> 
> The problem ended up being caused by the allowed cpus mask being set
> to all possible cpus for the ksoftirqd on the secondary processors.
> So the RCU softirq was never executing on cpu 2.
> 
> I'll test the following patch on 3.1 tomorrow.
> 
> -Frank Rowand
> 
> 
> Symptom: rcu stall
> 
> The problem was that ksoftirqd was woken on the secondary processors before
> the secondary processors were online.  This led to allowed cpus being set
> to all cpus.
> 
>    wake_up_process()
>       try_to_wake_up()
>          select_task_rq()
>             if (... || !cpu_online(cpu))
>                select_fallback_rq(task_cpu(p), p)
>                   ...
>                   /* No more Mr. Nice Guy. */
>                   dest_cpu = cpuset_cpus_allowed_fallback(p)
>                      do_set_cpus_allowed(p, cpu_possible_mask)
>                         #  Thus ksoftirqd can now run on any cpu...

This smells badly like the problem we've seen on x86 before. And
looking at the arm SMP boot code:

asmlinkage void __cpuinit secondary_start_kernel(void)
{
	.....

	/*
	 * Give the platform a chance to do its own initialisation.
	 */
	platform_secondary_init(cpu);

	/*
	 * Enable local interrupts.
	 */
	notify_cpu_starting(cpu);
	local_irq_enable();

Here we enable interrupts, but the CPU is neither online nor active.

	local_fiq_enable();

	/*
	 * Setup the percpu timer for this CPU.
	 */
	percpu_timer_setup();

	calibrate_delay();

	smp_store_cpu_info(cpu);

	/*
	 * OK, now it's safe to let the boot CPU continue.  Wait for
	 * the CPU migration code to notice that the CPU is online
	 * before we continue.
	 */
	set_cpu_online(cpu, true);
	while (!cpu_active(cpu))
		cpu_relax();

That's the same thing x86 does, just with interrupts already enabled,
and therefore it does not help. And the softirq is only part of the
problem; the same can happen with worker threads and other CPU-bound
nasties.

	/*
	 * OK, it's off to the idle thread for us
	 */
	cpu_idle();
}
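
To illustrate the failure mode, here is a toy userspace model (not
kernel code). All toy_* names are made up for this sketch, and the
fallback logic is a heavy simplification of select_fallback_rq(). It
assumes a per-CPU thread whose wakeup is delivered as soon as
interrupts are enabled on the secondary CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4u

/* Hypothetical stand-ins for kernel state: bit n set means CPU n is in the mask. */
struct toy_system {
	unsigned int possible;	/* models cpu_possible_mask */
	unsigned int online;	/* models cpu_online_mask */
};

struct toy_task {
	unsigned int cpu;	/* CPU the thread is bound to */
	unsigned int allowed;	/* models the task's allowed cpus mask */
};

static bool toy_cpu_online(const struct toy_system *sys, unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return (sys->online >> cpu) & 1u;
}

/*
 * Simplified wakeup path: pick a CPU for the task. If no allowed CPU is
 * online, widen the allowed mask to all possible CPUs, which is the
 * "No more Mr. Nice Guy" step in the call chain quoted above.
 */
static unsigned int toy_wake_up(const struct toy_system *sys, struct toy_task *p)
{
	unsigned int cpu = p->cpu;

	if (toy_cpu_online(sys, cpu) && ((p->allowed >> cpu) & 1u))
		return cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_online(sys, cpu) && ((p->allowed >> cpu) & 1u))
			return cpu;

	/* Last resort: the binding to a single CPU is lost here. */
	p->allowed = sys->possible;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_online(sys, cpu))
			return cpu;

	return p->cpu;
}

enum toy_order { TOY_IRQS_BEFORE_ONLINE, TOY_ONLINE_BEFORE_IRQS };

/*
 * Model the bringup of secondary `cpu` with the boot CPU (0) already
 * online. Returns the allowed mask of the per-CPU thread afterwards.
 */
static unsigned int toy_bring_up(enum toy_order order, unsigned int cpu)
{
	struct toy_system sys = {
		.possible = (1u << TOY_NR_CPUS) - 1u,
		.online = 1u,
	};
	struct toy_task thread = { .cpu = cpu, .allowed = 1u << cpu };

	assert(cpu < TOY_NR_CPUS);

	if (order == TOY_ONLINE_BEFORE_IRQS)
		sys.online |= 1u << cpu;	/* online first, as in the patch */

	/* Interrupts are enabled here; the pending wakeup fires. */
	toy_wake_up(&sys, &thread);

	if (order == TOY_IRQS_BEFORE_ONLINE)
		sys.online |= 1u << cpu;	/* too late for the wakeup above */

	return thread.allowed;
}
```

Under these assumptions, toy_bring_up(TOY_IRQS_BEFORE_ONLINE, 2) leaves
the thread allowed on every possible CPU, while
toy_bring_up(TOY_ONLINE_BEFORE_IRQS, 2) keeps it bound to CPU 2.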

So that wants to be ordered differently. Patch below.

Thanks,

	tglx

Index: linux-2.6/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/smp.c
+++ linux-2.6/arch/arm/kernel/smp.c
@@ -305,6 +305,16 @@ asmlinkage void __cpuinit secondary_star
 	 * Enable local interrupts.
 	 */
 	notify_cpu_starting(cpu);
+
+	/*
+	 * OK, now it's safe to let the boot CPU continue.  Wait for
+	 * the CPU migration code to notice that the CPU is online
+	 * before we continue.
+	 */
+	set_cpu_online(cpu, true);
+	while (!cpu_active(cpu))
+		cpu_relax();
+
 	local_irq_enable();
 	local_fiq_enable();
 
@@ -318,15 +328,6 @@ asmlinkage void __cpuinit secondary_star
 	smp_store_cpu_info(cpu);
 
 	/*
-	 * OK, now it's safe to let the boot CPU continue.  Wait for
-	 * the CPU migration code to notice that the CPU is online
-	 * before we continue.
-	 */
-	set_cpu_online(cpu, true);
-	while (!cpu_active(cpu))
-		cpu_relax();
-
-	/*
 	 * OK, it's off to the idle thread for us
 	 */
 	cpu_idle();


