From: Peter Zijlstra <peterz@infradead.org>
To: Xiong Zhou <jencce.kernel@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>,
Andreas Herrmann <aherrmann@suse.com>
Subject: Re: 4.5.0+ panic when setup loop device
Date: Thu, 17 Mar 2016 10:52:20 +0100
Message-ID: <20160317095220.GO6344@twins.programming.kicks-ass.net>
In-Reply-To: <CADJHv_s1cM=NxkCSHEm_1MctDYJUkDtutLB=aFtx9rB4a3FZdw@mail.gmail.com>
On Thu, Mar 17, 2016 at 09:56:05AM +0800, Xiong Zhou wrote:
> On Wed, Mar 16, 2016 at 11:26 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Can you please provide a full boot log and the output of 'cat /proc/cpuinfo' ?
Mar 17 17:34:30 myhost kernel: smpboot: Max logical packages: 1
Mar 17 17:34:30 myhost kernel: smpboot: APIC(20) Converting physical 1 to logical package 0
Mar 17 17:34:30 myhost kernel: smpboot: APIC(40) Package 2 exceeds logical package map
So that is busted: it turns out AMD gets x86_max_cores wrong when there
are compute units.
Mar 17 17:34:30 myhost kernel: smpboot: CPU 1 APICId 40 disabled
Mar 17 17:34:30 myhost kernel: Switched APIC routing to physical flat.
Mar 17 17:34:30 myhost kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Mar 17 17:34:30 myhost kernel: smpboot: CPU0: AMD Opteron(TM) Processor 6274 (family: 0x15, model: 0x1, stepping: 0x2)
Mar 17 17:34:30 myhost kernel: Performance Events: Fam15h core perfctr, Broken BIOS detected, complain to your hardware vendor.
Mar 17 17:34:30 myhost kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 430076)
Mar 17 17:34:30 myhost kernel: AMD PMU driver.
Mar 17 17:34:30 myhost kernel: ... version: 0
Mar 17 17:34:30 myhost kernel: ... bit width: 48
Mar 17 17:34:30 myhost kernel: ... generic registers: 6
Mar 17 17:34:30 myhost kernel: ... value mask: 0000ffffffffffff
Mar 17 17:34:30 myhost kernel: ... max period: 00007fffffffffff
Mar 17 17:34:30 myhost kernel: ... fixed-purpose events: 0
Mar 17 17:34:30 myhost kernel: ... event mask: 000000000000003f
Mar 17 17:34:30 myhost kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #17
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #18
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #19
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #20
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #21
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #22
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #23
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #24
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #25
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #26
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #27
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #28
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #29
Mar 17 17:34:30 myhost kernel: .... node #0, CPUs: #30
Mar 17 17:34:30 myhost kernel: .... node #3, CPUs: #31
Mar 17 17:34:30 myhost kernel: x86: Booted up 2 nodes, 31 CPUs
And that is one weird node mapping..
I have a similar system, which after the below patch says:
[ 0.182174] max_cores: 8, cpu_ids: 32, num_siblings: 2, coreid_bits: 5
[ 0.188712] smpboot: Max logical packages: 2
[ 0.192988] smpboot: APIC(20) Converting physical 1 to logical package 0
[ 0.199689] smpboot: APIC(40) Converting physical 2 to logical package 1
[ 0.206405] Switched APIC routing to physical flat.
[ 0.211851] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.329578] smpboot: CPU0: AMD Opteron(tm) Processor 6278 (family: 0x15, model: 0x1, stepping: 0x2)
[ 0.338705] Performance Events: Fam15h core perfctr, AMD PMU driver.
[ 0.345134] ... version: 0
[ 0.349147] ... bit width: 48
[ 0.353262] ... generic registers: 6
[ 0.357274] ... value mask: 0000ffffffffffff
[ 0.362586] ... max period: 00007fffffffffff
[ 0.367900] ... fixed-purpose events: 0
[ 0.371911] ... event mask: 000000000000003f
[ 0.378664] MCE: In-kernel MCE decoding enabled.
[ 0.383965] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.393079] x86: Booting SMP configuration:
[ 0.397262] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.848764] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.364701] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.898586] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.413417] x86: Booted up 4 nodes, 32 CPUs
Could you please try it? I'm not sure how this would explain your loop
device failure, but the log certainly points at something being broken.
Andreas; Borislav said to Cc you since you wrote all this.
The issue is that Linux assumes:
  nr_logical_cpus = nr_cores * nr_siblings
But AMD reports each compute unit (CU) as 2 cores, and then sets
smp_num_siblings to 2 as well, so every core gets counted twice.
Thomas; I removed that first branch testing pkg against
__max_logical_packages because if the first pkg id is larger than that,
find_first_zero_bit() will hand us logical package id 0. If the second
pkg id then is indeed 0, the branch claims logical id 0 again without
testing whether it was already taken. That branch also skips printing
the mapping.
---
arch/x86/kernel/cpu/amd.c | 8 ++++----
arch/x86/kernel/smpboot.c | 11 ++++++-----
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 97c59fd..6216e80 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -310,9 +310,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
node_id = ecx & 7;
/* get compute unit information */
- smp_num_siblings = ((ebx >> 8) & 3) + 1;
+ cores_per_cu = smp_num_siblings = ((ebx >> 8) & 3) + 1;
+ c->x86_max_cores /= smp_num_siblings;
c->compute_unit_id = ebx & 0xff;
- cores_per_cu += ((ebx >> 8) & 3);
} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
u64 value;
@@ -328,8 +328,8 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
u32 cus_per_node;
set_cpu_cap(c, X86_FEATURE_AMD_DCM);
- cores_per_node = c->x86_max_cores / nodes_per_socket;
- cus_per_node = cores_per_node / cores_per_cu;
+ cus_per_node = c->x86_max_cores / nodes_per_socket;
+ cores_per_node = cus_per_node * cores_per_cu;
/* store NodeID, use llc_shared_map to store sibling info */
per_cpu(cpu_llc_id, cpu) = node_id;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 643dbdc..15c5fda 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -274,11 +274,6 @@ int topology_update_package_map(unsigned int apicid, unsigned int cpu)
if (test_and_set_bit(pkg, physical_package_map))
goto found;
- if (pkg < __max_logical_packages) {
- set_bit(pkg, logical_package_map);
- physical_to_logical_pkg[pkg] = pkg;
- goto found;
- }
new = find_first_zero_bit(logical_package_map, __max_logical_packages);
if (new >= __max_logical_packages) {
physical_to_logical_pkg[pkg] = -1;
@@ -314,6 +309,12 @@ static void __init smp_init_package_map(void)
unsigned int ncpus, cpu;
size_t size;
+ printk("max_cores: %d, cpu_ids: %d, num_siblings: %d, coreid_bits: %d\n",
+ boot_cpu_data.x86_max_cores,
+ nr_cpu_ids,
+ smp_num_siblings,
+ boot_cpu_data.x86_coreid_bits);
+
/*
* Today neither Intel nor AMD support heterogenous systems. That
* might change in the future....