* Re: [patch V6 11/19] x86/cpu: Use common topology code for AMD
@ 2024-03-14 10:21 Yuezhang.Mo
2024-03-14 12:07 ` Borislav Petkov
2024-05-08 19:53 ` [patch] x86/topology/amd: Ensure that LLC ID is initialized Thomas Gleixner
0 siblings, 2 replies; 4+ messages in thread
From: Yuezhang.Mo @ 2024-03-14 10:21 UTC (permalink / raw)
To: tglx@linutronix.de
Cc: andrew.cooper3@citrix.com, andy@infradead.org,
arjan@linux.intel.com, dimitri.sivanich@hpe.com,
feng.tang@intel.com, jgross@suse.com, kan.liang@linux.intel.com,
kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
mhklinux@outlook.com, paulmck@kernel.org, peterz@infradead.org,
ray.huang@amd.com, rui.zhang@intel.com, sohil.mehta@intel.com,
thomas.lendacky@amd.com, wendy.wang@intel.com, x86@kernel.org
I ran xfstests generic/650 and found that it failed.
The reason for the failure is that this appears in dmesg:
[ 649.590421] smpboot: CPU 2 is now offline
[ 650.132920] smpboot: Booting Node 0 Processor 3 APIC 0x13
[ 650.133432] LVT offset 0 assigned for vector 0x400
[ 650.148931] ACPI: \_PR_.P003: Found 2 idle states
[ 650.149478] BUG: arch topology borken
[ 650.149483] the CLS domain not a subset of the MC domain
[ 650.149486] BUG: arch topology borken
[ 650.149487] the CLS domain not a subset of the MC domain
I prepared the following script to reproduce this issue.
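#!/bin/bash
# Randomly toggle hotpluggable CPUs online/offline (needs root).
# bash is required: the script uses arrays and (( )) arithmetic.

sysfs_cpu_dir="/sys/devices/system/cpu"
nrcpus=$(getconf _NPROCESSORS_CONF)

# Collect the CPUs that expose an "online" control file.
hotplug_cpus=()
for ((i = 0; i < nrcpus; i++)); do
	test -e "$sysfs_cpu_dir/cpu$i/online" && hotplug_cpus+=("$i")
done
nr_hotplug_cpus="${#hotplug_cpus[@]}"

# 20 rounds: pick a random CPU and write 0 (offline) or 1 (online).
for ((i = 0; i < 20; i++)); do
	idx=$((RANDOM % nr_hotplug_cpus))
	cpu="${hotplug_cpus[$idx]}"
	action=$((RANDOM % 2))
	echo "$action" > "$sysfs_cpu_dir/cpu$cpu/online" 2>/dev/null
	sleep 0.5
done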
If I run the script without this commit, the issue cannot be reproduced.
* Re: [patch V6 11/19] x86/cpu: Use common topology code for AMD
From: Borislav Petkov @ 2024-03-14 12:07 UTC (permalink / raw)
To: Yuezhang.Mo@sony.com
Cc: tglx@linutronix.de, andrew.cooper3@citrix.com, andy@infradead.org,
arjan@linux.intel.com, dimitri.sivanich@hpe.com,
feng.tang@intel.com, jgross@suse.com, kan.liang@linux.intel.com,
kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
mhklinux@outlook.com, paulmck@kernel.org, peterz@infradead.org,
ray.huang@amd.com, rui.zhang@intel.com, sohil.mehta@intel.com,
thomas.lendacky@amd.com, wendy.wang@intel.com, x86@kernel.org
On Thu, Mar 14, 2024 at 10:21:34AM +0000, Yuezhang.Mo@sony.com wrote:
> I ran xfstests generic/650 and found that it failed.
>
> The reason for the failure is that this appears in dmesg:
Can you send - privately is fine too - from that machine:
* output of "cpuid -r"
* full dmesg
* output of "grep -r . /sys/kernel/debug/x86/topo/"
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [patch] x86/topology/amd: Ensure that LLC ID is initialized
From: Thomas Gleixner @ 2024-05-08 19:53 UTC (permalink / raw)
To: Yuezhang.Mo@sony.com
Cc: andrew.cooper3@citrix.com, andy@infradead.org,
arjan@linux.intel.com, dimitri.sivanich@hpe.com,
feng.tang@intel.com, jgross@suse.com, kan.liang@linux.intel.com,
kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
mhklinux@outlook.com, paulmck@kernel.org, peterz@infradead.org,
ray.huang@amd.com, rui.zhang@intel.com, sohil.mehta@intel.com,
thomas.lendacky@amd.com, wendy.wang@intel.com, x86@kernel.org
The original topology evaluation code initialized cpu_data::topo::llc_id
with the die ID initially and then eventually overwrote it with information
gathered from a CPUID leaf.
The conversion analysis failed to spot that particular detail and omitted
this initial assignment under the assumption that each topology evaluation
path will set it up. That assumption is mostly correct, but it turns out to be
wrong when CPUID leaf 0x80000006 does not provide an LLC ID.
In that case LLC ID is invalid and as a consequence the setup of the
scheduling domain CPU masks is incorrect which subsequently causes the
scheduler core to complain about it during CPU hotplug:
BUG: arch topology borken
the CLS domain not a subset of the MC domain
Cure it by reusing legacy_set_llc() and assigning the die ID if the LLC ID
is invalid after all possible parsers have been tried.
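
The failure mode described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: the
struct cpu_model type, the MODEL_BAD_ID sentinel (standing in for
BAD_APICID) and all function names are hypothetical, and it assumes as a
simplification that a CPU with an invalid LLC ID never matches any other CPU
when the MC mask is built, while the CLS mask is built from L2 IDs alone.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BAD_ID 0xFFFFu /* assumed sentinel, stands in for BAD_APICID */

/* Hypothetical per-CPU record; not a kernel structure. */
struct cpu_model {
	unsigned int l2_id;  /* identifies the cluster (CLS) */
	unsigned int llc_id; /* identifies the last-level cache (MC) */
};

/* CPUs share a cluster when their L2 IDs are equal. */
bool share_l2(const struct cpu_model *a, const struct cpu_model *b)
{
	return a->l2_id == b->l2_id;
}

/* CPUs share an LLC only when the ID is valid and equal. */
bool share_llc(const struct cpu_model *a, const struct cpu_model *b)
{
	return a->llc_id != MODEL_BAD_ID && a->llc_id == b->llc_id;
}

/* Bitmask of CPUs matching 'cpu' (ncpus <= 32); a CPU is always in its own mask. */
uint32_t build_mask(const struct cpu_model *cpus, int ncpus, int cpu,
		    bool (*match)(const struct cpu_model *,
				  const struct cpu_model *))
{
	uint32_t mask = UINT32_C(1) << cpu;

	for (int i = 0; i < ncpus; i++)
		if (i != cpu && match(&cpus[cpu], &cpus[i]))
			mask |= UINT32_C(1) << i;
	return mask;
}

/* Sanity check: every CPU in the child mask must also be in the parent mask. */
bool is_subset(uint32_t child, uint32_t parent)
{
	return (child & ~parent) == 0;
}

/* True when the CLS mask of 'cpu' is contained in its MC mask. */
bool cls_within_mc(const struct cpu_model *cpus, int ncpus, int cpu)
{
	return is_subset(build_mask(cpus, ncpus, cpu, share_l2),
			 build_mask(cpus, ncpus, cpu, share_llc));
}
```

In this model, two CPUs with the same L2 ID but an invalid LLC ID end up with
a CLS mask covering both and an MC mask covering only the CPU itself, so the
subset check fails. Giving both a valid, equal LLC ID makes it pass.
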
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Thanks to Yuezhang for providing the debug information!
---
arch/x86/kernel/cpu/topology_amd.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
--- a/arch/x86/kernel/cpu/topology_amd.c
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -119,7 +119,7 @@ static bool parse_8000_001e(struct topo_
 	return true;
 }
 
-static bool parse_fam10h_node_id(struct topo_scan *tscan)
+static void parse_fam10h_node_id(struct topo_scan *tscan)
 {
 	union {
 		struct {
@@ -131,20 +131,20 @@ static bool parse_fam10h_node_id(struct
 	} nid;
 
 	if (!boot_cpu_has(X86_FEATURE_NODEID_MSR))
-		return false;
+		return;
 
 	rdmsrl(MSR_FAM10H_NODE_ID, nid.msr);
 	store_node(tscan, nid.nodes_per_pkg + 1, nid.node_id);
 	tscan->c->topo.llc_id = nid.node_id;
-	return true;
 }
 
 static void legacy_set_llc(struct topo_scan *tscan)
 {
 	unsigned int apicid = tscan->c->topo.initial_apicid;
 
-	/* parse_8000_0008() set everything up except llc_id */
-	tscan->c->topo.llc_id = apicid >> tscan->dom_shifts[TOPO_CORE_DOMAIN];
+	/* If none of the parsers set LLC ID then use the die ID for it. */
+	if (tscan->c->topo.llc_id == BAD_APICID)
+		tscan->c->topo.llc_id = apicid >> tscan->dom_shifts[TOPO_CORE_DOMAIN];
 }
 
 static void topoext_fixup(struct topo_scan *tscan)
@@ -187,10 +187,7 @@ static void parse_topology_amd(struct to
 		return;
 
 	/* Try the NODEID MSR */
-	if (parse_fam10h_node_id(tscan))
-		return;
-
-	legacy_set_llc(tscan);
+	parse_fam10h_node_id(tscan);
 }
 
 void cpu_parse_topology_amd(struct topo_scan *tscan)
@@ -198,6 +195,7 @@ void cpu_parse_topology_amd(struct topo_
 	tscan->amd_nodes_per_pkg = 1;
 	topoext_fixup(tscan);
 	parse_topology_amd(tscan);
+	legacy_set_llc(tscan);
 
 	if (tscan->amd_nodes_per_pkg > 1)
 		set_cpu_cap(tscan->c, X86_FEATURE_AMD_DCM);
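
The control-flow change in the diff above can be modelled in userspace as
follows. This is a hedged sketch, not the kernel implementation: struct
topo_model, MODEL_BAD_ID (standing in for BAD_APICID), core_shift (standing
in for dom_shifts[TOPO_CORE_DOMAIN]) and the function names are all
hypothetical. It shows only the shape of the fix: parsers may or may not set
the LLC ID, and a single fallback that runs afterwards fills it in only when
it is still invalid.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BAD_ID 0xFFFFu /* assumed sentinel, stands in for BAD_APICID */

/* Hypothetical stand-in for the few fields of the scan state used here. */
struct topo_model {
	uint32_t initial_apicid;
	uint32_t llc_id;         /* MODEL_BAD_ID until some parser sets it */
	unsigned int core_shift; /* assumed APIC ID shift that yields the die ID */
};

/* Models an optional parser: sets the LLC ID only if the feature exists. */
void model_parse_node_id(struct topo_model *t, bool has_nodeid_msr,
			 uint32_t node_id)
{
	if (!has_nodeid_msr)
		return;
	t->llc_id = node_id;
}

/* Models the patched fallback: never overwrite an ID a parser provided. */
void model_legacy_set_llc(struct topo_model *t)
{
	if (t->llc_id == MODEL_BAD_ID)
		t->llc_id = t->initial_apicid >> t->core_shift;
}

/* Models the patched call order: parsers first, fallback unconditionally last. */
void model_parse_topology(struct topo_model *t, bool has_nodeid_msr,
			  uint32_t node_id)
{
	model_parse_node_id(t, has_nodeid_msr, node_id);
	model_legacy_set_llc(t);
}
```

Because the fallback is now called unconditionally at the end, early returns
in the parser chain can no longer leave the LLC ID at its invalid initial
value, and a parser-supplied ID is preserved by the validity check.
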
* Re: [patch] x86/topology/amd: Ensure that LLC ID is initialized
From: Yuezhang.Mo @ 2024-05-10 8:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: andrew.cooper3@citrix.com, andy@infradead.org,
arjan@linux.intel.com, dimitri.sivanich@hpe.com,
feng.tang@intel.com, jgross@suse.com, kan.liang@linux.intel.com,
kprateek.nayak@amd.com, linux-kernel@vger.kernel.org,
mhklinux@outlook.com, paulmck@kernel.org, peterz@infradead.org,
ray.huang@amd.com, rui.zhang@intel.com, sohil.mehta@intel.com,
thomas.lendacky@amd.com, wendy.wang@intel.com, x86@kernel.org
After applying this patch, "BUG: arch topology borken" no longer
appears in dmesg, either at boot or while running my test script.
Tested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>