* [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies
@ 2026-01-13 20:56 Zide Chen
2026-01-13 20:56 ` [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs Zide Chen
2026-01-14 2:31 ` [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies Mi, Dapeng
0 siblings, 2 replies; 6+ messages in thread
From: Zide Chen @ 2026-01-13 20:56 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen, Xudong Hao,
Falcon Thomas, Steve Wahl
This warning can be triggered if NUMA is disabled and the system
boots with fewer CPUs than the number of CPUs in die 0.
WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]
Currently, the discovery table continues to be parsed even if all CPUs
in the associated die are offline. This can lead to an array overflow
at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
trigger the warning above or cause other issues.
Reported-by: Steve Wahl <steve.wahl@hpe.com>
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
V2:
- Add the Tested-by tag
- Rebase onto perf/core (base commit: a491c02c2770)
arch/x86/events/intel/uncore.c | 4 ++++
arch/x86/events/intel/uncore_discovery.c | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 4684649109d9..c126a29ab729 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1368,6 +1368,10 @@ static void uncore_pci_pmus_register(void)
for (node = rb_first(type->boxes); node; node = rb_next(node)) {
unit = rb_entry(node, struct intel_uncore_discovery_unit, node);
+
+ if (WARN_ON(unit->die >= uncore_max_dies()))
+ continue;
+
pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(unit->addr),
UNCORE_DISCOVERY_PCI_BUS(unit->addr),
UNCORE_DISCOVERY_PCI_DEVFN(unit->addr));
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index b46575254dbe..0e414cecb6f2 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -366,7 +366,7 @@ static bool uncore_discovery_pci(struct uncore_discovery_domain *domain)
(val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
die = get_device_die_id(dev);
- if (die < 0)
+ if ((die < 0) || (die >= uncore_max_dies()))
continue;
parse_discovery_table(domain, dev, die, bar_offset, &parsed);
--
2.52.0
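The overflow described in this patch comes from using a die ID as an array index without checking it against the array size. A minimal user-space sketch of the same guard follows. `MAX_DIES`, `struct pmu`, `struct box`, `die_is_usable()` and `pmu_register_box()` are hypothetical stand-ins for illustration, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the value uncore_max_dies() would return. */
#define MAX_DIES 2

struct box { int id; };

/* Hypothetical PMU with one box slot per online die. */
struct pmu {
	struct box *boxes[MAX_DIES];
};

/* Same shape as the patched filter: reject negative or out-of-range die IDs. */
static bool die_is_usable(int die, int max_dies)
{
	return die >= 0 && die < max_dies;
}

/* Store the box only when the die index is inside the array. */
static bool pmu_register_box(struct pmu *pmu, int die, struct box *box)
{
	if (!die_is_usable(die, MAX_DIES))
		return false;	/* offline or unknown die: skip, do not index */

	pmu->boxes[die] = box;
	return true;
}
```

In this sketch the check sits where the index is used. The patch instead filters earlier, at discovery time, so that out-of-range dies never reach the registration path.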
* [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs
2026-01-13 20:56 [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies Zide Chen
@ 2026-01-13 20:56 ` Zide Chen
2026-01-14 5:19 ` Mi, Dapeng
2026-01-14 2:31 ` [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies Mi, Dapeng
1 sibling, 1 reply; 6+ messages in thread
From: Zide Chen @ 2026-01-13 20:56 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen, Xudong Hao,
Falcon Thomas, Steve Wahl

In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
uncore_device_to_die() may return -1 when all CPUs associated
with the UBOX device are offline.

Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:

- The current code breaks out of the loop. This is incorrect because
pci_get_device() does not guarantee iteration in domain or bus order,
so additional UBOX devices may be skipped during the scan.

- Returning -EINVAL is incorrect, since marking offline buses with
die_id == -1 is expected and should not be treated as an error.

Separately, when NUMA is disabled on a NUMA-capable platform,
pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
to return -1 for all PCI devices. As a result,
spr_update_device_location(), used on Intel SPR and EMR, ignores the
corresponding PMON units and does not add them to the RB tree.

Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
from the UBOX GIDNIDMAP register and works regardless of whether NUMA
is enabled in Linux. This requires snbep_pci2phy_map_init() to be
added in spr_uncore_pci_init().

Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
NUMA is expected to be enabled.

Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR")
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
V2:
- Fix the commit message to note that spr_update_device_location() is
used by EMR, not GNR.
- Rewrite the commit message for clarity.
- Add a Tested-by tag.
arch/x86/events/intel/uncore.c | 1 +
arch/x86/events/intel/uncore_snbep.c | 13 ++++++-------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index c126a29ab729..c721042be629 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
return bus ? pci_domain_nr(bus) : -EINVAL;
}
+/* Note: This API can only be used when NUMA information is available. */
int uncore_device_to_die(struct pci_dev *dev)
{
int node = pcibus_to_node(dev->bus);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 7ca0429c4004..52dec34d18c4 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1459,13 +1459,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
}
map->pbus_to_dieid[bus] = die_id = uncore_device_to_die(ubox_dev);
-
raw_spin_unlock(&pci2phy_map_lock);
-
- if (WARN_ON_ONCE(die_id == -1)) {
- err = -EINVAL;
- break;
- }
}
}
@@ -6420,7 +6414,7 @@ static void spr_update_device_location(int type_id)
while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
- die = uncore_device_to_die(dev);
+ die = uncore_pcibus_to_dieid(dev->bus);
if (die < 0)
continue;
@@ -6444,6 +6438,11 @@ static void spr_update_device_location(int type_id)
int spr_uncore_pci_init(void)
{
+ int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
+
+ if (ret)
+ return ret;
+
/*
* The discovery table of UPI on some SPR variant is broken,
* which impacts the detection of both UPI and M3UPI uncore PMON.
--
2.52.0
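The first reason given for removing the WARN_ON_ONCE() check (breaking out of an unordered scan can skip later devices) can be illustrated with a small user-space sketch. `struct fake_dev`, `scan_with_break()` and `scan_all()` are hypothetical names, and the array merely models the order in which an iterator might hand back devices. This is not the driver's code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record: a bus number and the die it maps to (-1 = offline). */
struct fake_dev {
	int bus;
	int die_id;
};

/*
 * Old behaviour, modelled: stop at the first device with die_id == -1.
 * Devices that the iterator returns later are never recorded.
 * bus_to_die must have room for every bus number that appears in devs.
 */
static size_t scan_with_break(const struct fake_dev *devs, size_t n, int *bus_to_die)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++) {
		bus_to_die[devs[i].bus] = devs[i].die_id;
		if (devs[i].die_id == -1)
			break;
		mapped++;
	}
	return mapped;
}

/* New behaviour, modelled: record -1 for offline buses and keep scanning. */
static size_t scan_all(const struct fake_dev *devs, size_t n, int *bus_to_die)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++) {
		bus_to_die[devs[i].bus] = devs[i].die_id;
		if (devs[i].die_id != -1)
			mapped++;
	}
	return mapped;
}
```

With an offline device in the middle of the sequence, the first function leaves the trailing online device unmapped, while the second records all of them.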
* Re: [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs
2026-01-13 20:56 ` [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs Zide Chen
@ 2026-01-14 5:19 ` Mi, Dapeng
2026-01-15 18:17 ` Chen, Zide
0 siblings, 1 reply; 6+ messages in thread
From: Mi, Dapeng @ 2026-01-14 5:19 UTC (permalink / raw)
To: Zide Chen, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Xudong Hao, Falcon Thomas, Steve Wahl

On 1/14/2026 4:56 AM, Zide Chen wrote:
> In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
> uncore_device_to_die() may return -1 when all CPUs associated
> with the UBOX device are offline.
>
> Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:
>
> - The current code breaks out of the loop. This is incorrect because
> pci_get_device() does not guarantee iteration in domain or bus order,
> so additional UBOX devices may be skipped during the scan.
>
> - Returning -EINVAL is incorrect, since marking offline buses with
> die_id == -1 is expected and should not be treated as an error.
>
> Separately, when NUMA is disabled on a NUMA-capable platform,
> pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
> to return -1 for all PCI devices. As a result,
> spr_update_device_location(), used on Intel SPR and EMR, ignores the
> corresponding PMON units and does not add them to the RB tree.
>
> Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
> from the UBOX GIDNIDMAP register and works regardless of whether NUMA
> is enabled in Linux. This requires snbep_pci2phy_map_init() to be
> added in spr_uncore_pci_init().
>
> Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
> NUMA is expected to be enabled.
>
> Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
> Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR")
> Tested-by: Steve Wahl <steve.wahl@hpe.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> V2:
> - Fix the commit message to note that spr_update_device_location() is
> used by EMR, not GNR.
> - Rewrite the commit message for clarity.
> - Add a Tested-by tag.
>
> arch/x86/events/intel/uncore.c | 1 +
> arch/x86/events/intel/uncore_snbep.c | 13 ++++++-------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index c126a29ab729..c721042be629 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
> return bus ? pci_domain_nr(bus) : -EINVAL;
> }
>
> +/* Note: This API can only be used when NUMA information is available. */
> int uncore_device_to_die(struct pci_dev *dev)

Not everyone will read the comment and follow the rule. Could we add a
WARN_ON in this function to warn users when it is used inappropriately?

Everything else looks good to me.

> {
> int node = pcibus_to_node(dev->bus);
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index 7ca0429c4004..52dec34d18c4 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -1459,13 +1459,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
> }
>
> map->pbus_to_dieid[bus] = die_id = uncore_device_to_die(ubox_dev);
> -
> raw_spin_unlock(&pci2phy_map_lock);
> -
> - if (WARN_ON_ONCE(die_id == -1)) {
> - err = -EINVAL;
> - break;
> - }
> }
> }
>
> @@ -6420,7 +6414,7 @@ static void spr_update_device_location(int type_id)
>
> while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>
> - die = uncore_device_to_die(dev);
> + die = uncore_pcibus_to_dieid(dev->bus);
> if (die < 0)
> continue;
>
> @@ -6444,6 +6438,11 @@ static void spr_update_device_location(int type_id)
>
> int spr_uncore_pci_init(void)
> {
> + int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
> +
> + if (ret)
> + return ret;
> +
> /*
> * The discovery table of UPI on some SPR variant is broken,
> * which impacts the detection of both UPI and M3UPI uncore PMON.
* Re: [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs
2026-01-14 5:19 ` Mi, Dapeng
@ 2026-01-15 18:17 ` Chen, Zide
0 siblings, 0 replies; 6+ messages in thread
From: Chen, Zide @ 2026-01-15 18:17 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Xudong Hao, Falcon Thomas, Steve Wahl

On 1/13/2026 9:19 PM, Mi, Dapeng wrote:
>
> On 1/14/2026 4:56 AM, Zide Chen wrote:
>> In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
>> uncore_device_to_die() may return -1 when all CPUs associated
>> with the UBOX device are offline.
>>
>> Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:
>>
>> - The current code breaks out of the loop. This is incorrect because
>> pci_get_device() does not guarantee iteration in domain or bus order,
>> so additional UBOX devices may be skipped during the scan.
>>
>> - Returning -EINVAL is incorrect, since marking offline buses with
>> die_id == -1 is expected and should not be treated as an error.
>>
>> Separately, when NUMA is disabled on a NUMA-capable platform,
>> pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
>> to return -1 for all PCI devices. As a result,
>> spr_update_device_location(), used on Intel SPR and EMR, ignores the
>> corresponding PMON units and does not add them to the RB tree.
>>
>> Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
>> from the UBOX GIDNIDMAP register and works regardless of whether NUMA
>> is enabled in Linux. This requires snbep_pci2phy_map_init() to be
>> added in spr_uncore_pci_init().
>>
>> Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
>> NUMA is expected to be enabled.
>>
>> Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
>> Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR")
>> Tested-by: Steve Wahl <steve.wahl@hpe.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> V2:
>> - Fix the commit message to note that spr_update_device_location() is
>> used by EMR, not GNR.
>> - Rewrite the commit message for clarity.
>> - Add a Tested-by tag.
>>
>> arch/x86/events/intel/uncore.c | 1 +
>> arch/x86/events/intel/uncore_snbep.c | 13 ++++++-------
>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index c126a29ab729..c721042be629 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
>> return bus ? pci_domain_nr(bus) : -EINVAL;
>> }
>>
>> +/* Note: This API can only be used when NUMA information is available. */
>> int uncore_device_to_die(struct pci_dev *dev)
>
> Not everyone could look at the comment and follow the rule. Could we add a
> WARN_ON in this function and WARN the users if it's not used appropriately?

I may be missing something, but I can’t find a simple, clean, and
reliable way to determine that NUMA is not available (either disabled
by Linux/firmware or not supported by the hardware).

For example, NUMA_NO_NODE returned from pcibus_to_node() only indicates
that the kernel cannot associate the given PCI bus with any NUMA node.
This can happen even when NUMA is enabled, if the PCI locality
information is unavailable for some PCI buses for some reason.

Commit ad5086108b (“PCI: Warn if no host bridge NUMA node info”)
explicitly discusses this case, which is also consistent with the
Linux documentation:
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci

I haven’t looked closely at the CPU-to-node mapping and don’t know
whether it would work reliably here. Even if it does, my intuition is
that it may not be worth adding the extra complexity in this situation.

> Others look good to me.
>
>
>> {
>> int node = pcibus_to_node(dev->bus);
>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>> index 7ca0429c4004..52dec34d18c4 100644
>> --- a/arch/x86/events/intel/uncore_snbep.c
>> +++ b/arch/x86/events/intel/uncore_snbep.c
>> @@ -1459,13 +1459,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
>> }
>>
>> map->pbus_to_dieid[bus] = die_id = uncore_device_to_die(ubox_dev);
>> -
>> raw_spin_unlock(&pci2phy_map_lock);
>> -
>> - if (WARN_ON_ONCE(die_id == -1)) {
>> - err = -EINVAL;
>> - break;
>> - }
>> }
>> }
>>
>> @@ -6420,7 +6414,7 @@ static void spr_update_device_location(int type_id)
>>
>> while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>>
>> - die = uncore_device_to_die(dev);
>> + die = uncore_pcibus_to_dieid(dev->bus);
>> if (die < 0)
>> continue;
>>
>> @@ -6444,6 +6438,11 @@ static void spr_update_device_location(int type_id)
>>
>> int spr_uncore_pci_init(void)
>> {
>> + int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
>> +
>> + if (ret)
>> + return ret;
>> +
>> /*
>> * The discovery table of UPI on some SPR variant is broken,
>> * which impacts the detection of both UPI and M3UPI uncore PMON.
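The point made in this reply, that a "no node" result does not by itself say whether NUMA is disabled, can be modelled in a few lines of user-space C. Everything here is a hypothetical stand-in: `FAKE_NUMA_NO_NODE`, `struct fake_bus`, its two flags, and `fake_pcibus_to_node()` are illustrative only and do not reflect how the kernel stores or derives this information:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's NUMA_NO_NODE sentinel. */
#define FAKE_NUMA_NO_NODE (-1)

/* Hypothetical model of what could determine a bus-to-node lookup. */
struct fake_bus {
	bool numa_enabled;	/* assumption: NUMA active in this boot */
	bool has_locality_info;	/* assumption: locality known for this bus */
	int node;
};

/* Two distinct causes collapse into the same sentinel return value. */
static int fake_pcibus_to_node(const struct fake_bus *bus)
{
	if (!bus->numa_enabled || !bus->has_locality_info)
		return FAKE_NUMA_NO_NODE;
	return bus->node;
}

/* True when both lookups yield "no node", so the caller cannot tell why. */
static bool lookup_is_ambiguous(const struct fake_bus *a, const struct fake_bus *b)
{
	return fake_pcibus_to_node(a) == FAKE_NUMA_NO_NODE &&
	       fake_pcibus_to_node(b) == FAKE_NUMA_NO_NODE;
}
```

A WARN_ON keyed only on the sentinel would, under these assumptions, fire in both cases, including the one where NUMA is enabled but one bus lacks locality information.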
* Re: [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies
2026-01-13 20:56 [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies Zide Chen
2026-01-13 20:56 ` [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs Zide Chen
@ 2026-01-14 2:31 ` Mi, Dapeng
2026-01-15 18:24 ` Chen, Zide
1 sibling, 1 reply; 6+ messages in thread
From: Mi, Dapeng @ 2026-01-14 2:31 UTC (permalink / raw)
To: Zide Chen, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Xudong Hao, Falcon Thomas, Steve Wahl

On 1/14/2026 4:56 AM, Zide Chen wrote:
> This warning can be triggered if NUMA is disabled and the system
> boots with fewer CPUs than the number of CPUs in die 0.
>
> WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]
>
> Currently, the discovery table continues to be parsed even if all CPUs
> in the associated die are offline. This can lead to an array overflow
> at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
> trigger the warning above or cause other issues.
>
> Reported-by: Steve Wahl <steve.wahl@hpe.com>
> Tested-by: Steve Wahl <steve.wahl@hpe.com>
> Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> V2:
> - Add the Tested-by tag
> - Rebase onto perf/core (base commit: a491c02c2770)
>
> arch/x86/events/intel/uncore.c | 4 ++++
> arch/x86/events/intel/uncore_discovery.c | 2 +-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 4684649109d9..c126a29ab729 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1368,6 +1368,10 @@ static void uncore_pci_pmus_register(void)
>
> for (node = rb_first(type->boxes); node; node = rb_next(node)) {
> unit = rb_entry(node, struct intel_uncore_discovery_unit, node);
> +
> + if (WARN_ON(unit->die >= uncore_max_dies()))
> + continue;

I'm wondering whether we need the WARN_ON here. Since every uncore unit
whose die ID is not below uncore_max_dies() is already skipped in the
discovery phase, unit->die should always be below uncore_max_dies() by
the time we reach uncore_pci_pmus_register(). Is that right?

> +
> pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(unit->addr),
> UNCORE_DISCOVERY_PCI_BUS(unit->addr),
> UNCORE_DISCOVERY_PCI_DEVFN(unit->addr));
> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
> index b46575254dbe..0e414cecb6f2 100644
> --- a/arch/x86/events/intel/uncore_discovery.c
> +++ b/arch/x86/events/intel/uncore_discovery.c
> @@ -366,7 +366,7 @@ static bool uncore_discovery_pci(struct uncore_discovery_domain *domain)
> (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
>
> die = get_device_die_id(dev);
> - if (die < 0)
> + if ((die < 0) || (die >= uncore_max_dies()))
> continue;
>
> parse_discovery_table(domain, dev, die, bar_offset, &parsed);
* Re: [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies
2026-01-14 2:31 ` [PATCH V2 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies Mi, Dapeng
@ 2026-01-15 18:24 ` Chen, Zide
0 siblings, 0 replies; 6+ messages in thread
From: Chen, Zide @ 2026-01-15 18:24 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Xudong Hao, Falcon Thomas, Steve Wahl

On 1/13/2026 6:31 PM, Mi, Dapeng wrote:
>
> On 1/14/2026 4:56 AM, Zide Chen wrote:
>> This warning can be triggered if NUMA is disabled and the system
>> boots with fewer CPUs than the number of CPUs in die 0.
>>
>> WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]
>>
>> Currently, the discovery table continues to be parsed even if all CPUs
>> in the associated die are offline. This can lead to an array overflow
>> at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
>> trigger the warning above or cause other issues.
>>
>> Reported-by: Steve Wahl <steve.wahl@hpe.com>
>> Tested-by: Steve Wahl <steve.wahl@hpe.com>
>> Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> V2:
>> - Add the Tested-by tag
>> - Rebase onto perf/core (base commit: a491c02c2770)
>>
>> arch/x86/events/intel/uncore.c | 4 ++++
>> arch/x86/events/intel/uncore_discovery.c | 2 +-
>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 4684649109d9..c126a29ab729 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -1368,6 +1368,10 @@ static void uncore_pci_pmus_register(void)
>>
>> for (node = rb_first(type->boxes); node; node = rb_next(node)) {
>> unit = rb_entry(node, struct intel_uncore_discovery_unit, node);
>> +
>> + if (WARN_ON(unit->die >= uncore_max_dies()))
>> + continue;
>
> I'm thinking if we need to add "WARN_ON" here. Since all uncore units that
> the die id is larger than uncore_max_dies() would be skipped in discovery
> phase, the unit die id should be not larger than uncore_max_dies() in
> uncore_pci_pmus_register(). Is it right?

I originally added this as a defensive check in case it could trigger
an array overflow. However, I agree it can be dropped since this
condition should not be reachable.

>> +
>> pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(unit->addr),
>> UNCORE_DISCOVERY_PCI_BUS(unit->addr),
>> UNCORE_DISCOVERY_PCI_DEVFN(unit->addr));
>> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
>> index b46575254dbe..0e414cecb6f2 100644
>> --- a/arch/x86/events/intel/uncore_discovery.c
>> +++ b/arch/x86/events/intel/uncore_discovery.c
>> @@ -366,7 +366,7 @@ static bool uncore_discovery_pci(struct uncore_discovery_domain *domain)
>> (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
>>
>> die = get_device_die_id(dev);
>> - if (die < 0)
>> + if ((die < 0) || (die >= uncore_max_dies()))
>> continue;
>>
>> parse_discovery_table(domain, dev, die, bar_offset, &parsed);