From: Prarit Bhargava <prarit@redhat.com>
To: Jakub Kicinski <kubakici@wp.pl>, LKML <linux-kernel@vger.kernel.org>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [bisected] x86 boot still broken on -rc2
Date: Mon, 4 Dec 2017 08:13:29 -0500
Message-ID: <b7e29d24-2317-2355-65aa-04978bd45add@redhat.com>
In-Reply-To: <58b4f89a-63c2-b8bd-4414-fbc312c52697@redhat.com>

On 12/04/2017 07:28 AM, Prarit Bhargava wrote:
>
>
> On 12/03/2017 08:28 PM, Jakub Kicinski wrote:
>> Same thing on rc2, bisected down to:
>>
>> commit b4c0a7326f5dc0ef7a64128b0ae7d081f4b2cbd1 (refs/bisect/bad)
>> Author: Prarit Bhargava <prarit@redhat.com>
>> Date: Tue Nov 14 07:42:57 2017 -0500
>>
>> x86/smpboot: Fix __max_logical_packages estimate
>>
>> A system booted with a small number of cores enabled per package
>> panics because the estimate of __max_logical_packages is too low.
>>
>> This occurs when the total number of active cores across all packages is
>> less than the maximum core count for a single package. e.g.:
>>
>> On a 4 package system with 20 cores/package where only 4 cores are
>> enabled on each package, the value of __max_logical_packages is
>> calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>>
>> Calculate __max_logical_packages after the cpu enumeration has completed.
>> Use the boot cpu's data to extrapolate the number of packages.
>>
>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Kan Liang <kan.liang@intel.com>
>> Cc: He Chen <he.chen@linux.intel.com>
>> Cc: Stephane Eranian <eranian@google.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Piotr Luc <piotr.luc@intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
>> Cc: Mathias Krause <minipli@googlemail.com>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
>>
>>
>> On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
>>> Hi!
>>>
>>> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
>>> E5-2630 v4 box. It also happens on linux-next. Did anyone else
>>> experience it? (.config attached)
>>>
>>> [ 5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
>>> [ 5.007544] Modules linked in:
>>> [ 5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>>> [ 5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>
> I have a Dell R730 available for use. OOC are you booting with the default
> BIOS options?
>
Jakub, I was able to reproduce this on a similar system by DISABLING
hyperthreading in the BIOS. Doing the same on other systems seems to have
no impact. What is odd about this system is that, when booting, the
kernel claims that hyperthreading is ENABLED:

  x86: Booting SMP configuration:
  .... node #0, CPUs: #1 #2 #3 #4
  .... node #1, CPUs: #5 #6 #7 #8 #9
  .... node #0, CPUs: #10 #11 #12 #13 #14
  .... node #1, CPUs: #15 #16 #17 #18 #19
  smp: Brought up 2 nodes, 20 CPUs
  smpboot: Max logical packages: 1

which means that the calculation of logical packages is wrong because

  ncpus = cpu_data(0).booted_cores * smp_num_siblings;
  ncpus = 10 * 2;
  ncpus = 20;

smp_num_siblings is defined as "The number of threads in a core", which
should be 1 if HT/SMT is disabled.

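To make the arithmetic concrete, here is a small userspace sketch (not the
kernel code) of extrapolating the boot CPU's data to all packages.
DIV_ROUND_UP is redefined locally with the same rounding as the kernel
macro; estimate_packages() and its parameter names are hypothetical, and
total_cpus is assumed to be the number of enumerated CPUs (20 on this box).

```c
/* Rounding-up integer division, same semantics as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical model of the estimate: assume every package looks like the
 * boot CPU's package (booted_cores cores, smp_num_siblings threads each)
 * and divide the total CPU count by that per-package CPU count.
 */
static unsigned int estimate_packages(unsigned int total_cpus,
				      unsigned int booted_cores,
				      unsigned int smp_num_siblings)
{
	unsigned int ncpus = booted_cores * smp_num_siblings;

	return DIV_ROUND_UP(total_cpus, ncpus);
}
```

With this R730's numbers, estimate_packages(20, 10, 2) returns 1, matching
the "Max logical packages: 1" line above, while estimate_packages(20, 10, 1)
returns 2, which is what a correct smp_num_siblings of 1 would produce.
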
It looks like my patch has exposed a bug in the smp_num_siblings
calculation. I'm still debugging ...

FWIW, I did test this code by disabling HT/SMT in the BIOS on several
systems. I have tested those systems again and don't see a problem. It is
something peculiar to a few systems.

P.