From: Daniel J Blueman <daniel@numascale.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Hillf Danton <dhillf@gmail.com>, Borislav Petkov <bp@amd64.org>,
	Ingo Molnar <mingo@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Steffen Persvold <sp@numascale.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [3.14] core onlining/hotplug regression
Date: Sat, 13 Sep 2014 17:03:14 +0800
Message-ID: <54140852.8090803@numascale.com>
In-Reply-To: <alpine.DEB.2.10.1407251059130.23352@nanos>

On 07/25/2014 05:05 PM, Thomas Gleixner wrote:
> On Fri, 25 Jul 2014, Daniel J Blueman wrote:
>> On a larger x86 system with 1728 cores, 3.15(.6) asserts on
>> smpboot_thread_fn's td->cpu != smp_processor_id() consistently after ~1500
>> cores are online.
>>
>> Reverting the only directly related changes I could find [1,2] doesn't help.
>> Debugging indicates there is a race where the created thread is quickly
>> migrated to core 0 when this occurs, since smp_processor_id returns 0 in these
>> cases. Thomas introduced a thread parked state to fix related issues a year
>> back. Linux 3.14(.13) boots just nice.
>
> Weird. Commits [1,2] are definitely not the culprits.
>
>> Full boot output is at:
>> https://resources.numascale.com/linux-315-thread-mig.txt
>
> Not really helpful, as we don't see what causes it. We just see the
> wreckage.
>
>> Any theories so far? I'll start bisecting when I have full access to the
>> system again in a week and I'll do some more debugging with intermittent
>> access before then.
>
> One thing you could try is enabling tracing.
>
>      "ftrace=function ftrace_dump_on_oops"
>
> It'll take a looooong time to spill out the traces, but that should
> give us the root cause precisely.

Bisecting led to Lai's patch "sched: Fix hotplug vs.
set_cpus_allowed_ptr()" [1]. With that commit reverted, the smpboot.c
BUG_ON(td->cpu != smp_processor_id()) in smpboot_thread_fn no longer trips.
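
For anyone following along, the condition that trips can be modelled in
plain userspace C roughly as below. This is only a sketch and not the
kernel code: struct thread_data_model and on_home_cpu() are hypothetical
names made up for illustration, and the running CPU is passed in as an
argument where the kernel would call smp_processor_id().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU thread descriptor; only the
 * field relevant to the check is modelled. */
struct thread_data_model {
	unsigned int cpu;	/* CPU the thread was created for */
};

/* True when the thread runs on the CPU it was created for. In the
 * kernel the second argument would come from smp_processor_id(). */
bool on_home_cpu(const struct thread_data_model *td,
		 unsigned int running_cpu)
{
	return td->cpu == running_cpu;
}

/* Illustrative self-check: the healthy case and the failing case. */
void demo_home_cpu_check(void)
{
	struct thread_data_model td = { .cpu = 1500 };

	assert(on_home_cpu(&td, 1500));	/* thread stayed on its CPU */
	assert(!on_home_cpu(&td, 0));	/* ended up on core 0: the BUG_ON case */
}
```

The second assertion corresponds to what was reported earlier in the
thread, where smp_processor_id() returned 0 for a thread created for
another core.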

-- [1]

commit 6acbfb96976fc3350e30d964acb1dbbdf876d55e
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
Date:   Fri May 16 11:50:42 2014 +0800

     sched: Fix hotplug vs. set_cpus_allowed_ptr()

     Lai found that:

       WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
       ...
       migration_cpu_stop+0x1d/0x22

     was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
     always a sub-set of cpu_online_mask.

     This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

     So set active and online at the same time to avoid this particular
     problem.

     Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
     Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
     Signed-off-by: Peter Zijlstra <peterz@infradead.org>
     Cc: Andrew Morton <akpm@linux-foundation.org>
     Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
     Cc: Linus Torvalds <torvalds@linux-foundation.org>
     Cc: Michael wang <wangyun@linux.vnet.ibm.com>
     Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
     Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
     Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
     Cc: Toshi Kani <toshi.kani@hp.com>
     Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
     Signed-off-by: Ingo Molnar <mingo@kernel.org>
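
The subset assumption described in that commit log can be illustrated
with toy bitmasks. This is a rough userspace sketch under stated
assumptions, not the scheduler's logic: cpumask_model_t,
mask_is_subset(), pick_dest_cpu() and cpu_in_mask() are hypothetical
helpers, masks are limited to 64 CPUs, and the "lowest allowed and
active CPU" selection rule is an assumption chosen only to keep the
example small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 64-CPU mask; bit n set means CPU n is in the set. */
typedef uint64_t cpumask_model_t;

/* True when every CPU in 'a' is also in 'b'. */
bool mask_is_subset(cpumask_model_t a, cpumask_model_t b)
{
	return (a & ~b) == 0;
}

/* True when 'cpu' is a valid index and its bit is set in 'mask'. */
bool cpu_in_mask(int cpu, cpumask_model_t mask)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1);
}

/* Pick the lowest CPU that is both allowed and active, or -1 if none.
 * Nothing here consults the online mask, which is the assumption
 * under discussion. */
int pick_dest_cpu(cpumask_model_t allowed, cpumask_model_t active)
{
	cpumask_model_t candidates = allowed & active;

	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With online = 0x3 (CPUs 0 and 1) and active = 0x7 (CPU 2 marked active
before it is online), mask_is_subset(active, online) is false and
pick_dest_cpu(0x4, active) returns 2, a CPU that is not in the online
mask. If active and online are always updated together, the chosen CPU
is online by construction.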
-- 
Daniel J Blueman
Principal Software Engineer, Numascale

