From: Daniel J Blueman <daniel@numascale.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Hillf Danton <dhillf@gmail.com>, Borislav Petkov <bp@amd64.org>,
	Ingo Molnar <mingo@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Steffen Persvold <sp@numascale.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [3.14] core onlining/hotplug regression
Date: Fri, 25 Jul 2014 17:36:51 +0800
Message-ID: <53D22533.9030401@numascale.com>
In-Reply-To: <alpine.DEB.2.10.1407251059130.23352@nanos>

On 07/25/2014 05:05 PM, Thomas Gleixner wrote:
> On Fri, 25 Jul 2014, Daniel J Blueman wrote:
>> On a larger x86 system with 1728 cores, 3.15(.6) asserts on
>> smpboot_thread_fn's td->cpu != smp_processor_id() consistently after ~1500
>> cores are online.
>>
>> Reverting the only directly related changes I could find [1,2] doesn't help.
>> Debugging indicates there is a race where the created thread is quickly
>> migrated to core 0 when this occurs, since smp_processor_id returns 0 in these
>> cases. Thomas introduced a thread parked state to fix related issues a year
>> back. Linux 3.14(.13) boots just nice.
>
> Weird. Commits [1,2] are definitely not the culprits.
>
>> Full boot output is at:
>> https://resources.numascale.com/linux-315-thread-mig.txt
>
> Not really helpful, as we don't see what causes it. We just see the
> wreckage.
>
>> Any theories so far? I'll start bisecting when I have full access to the
>> system again in a week and I'll do some more debugging with intermittent
>> access before then.
>
> One thing you could try is enabling tracing.
>
>      "ftrace=function ftrace_dump_on_oops"
>
> It'll take a looooong time to spill out the traces, but that should
> give us the root cause precisely.

Good trick. I'll get this early next week and we'll see what's up.

Thanks,
   Daniel
-- 
Daniel J Blueman
Principal Software Engineer, Numascale
