* [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-13 1:49 ` sashiko-bot
2026-05-11 23:05 ` [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery Zide Chen
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
In uncore_find_add_unit(), PMON units with the same unit ID may be
added to the uncore discovery RB tree for different dies. These units
are distinguished by node->die.
However, intel_generic_uncore_box_ctl() uses a fixed die ID of -1 when
looking up the discovery unit, which may retrieve the wrong node on
multi-die systems.
Use box->dieid instead so the correct discovery unit is selected.
No functional issue has been observed so far because currently supported
platforms happen to use the same unit control register for such units.
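For illustration, the wildcard semantics can be modeled in a few lines of
userspace C (names and the flat array are hypothetical; the real
intel_uncore_find_discovery_unit() walks an RB tree):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the discovery-unit lookup. A die argument of -1 acts as
 * a wildcard: it matches the first unit with the requested id, whichever
 * die that unit belongs to. */
struct unit {
	int die;
	int id;
	unsigned long long ctl;		/* unit control register address */
};

static const struct unit example_units[] = {
	{ .die = 0, .id = 0, .ctl = 0x2000 },	/* same unit id on die 0 ... */
	{ .die = 1, .id = 0, .ctl = 0x3000 },	/* ... and on die 1 */
};

static const struct unit *find_unit(const struct unit *units, int n,
				    int die, int id)
{
	int i;

	for (i = 0; i < n; i++) {
		if (units[i].id != id)
			continue;
		if (die < 0 || units[i].die == die)
			return &units[i];
	}
	return NULL;
}
```

With die == -1 the lookup always lands on die 0's unit (ctl 0x2000), even
for a box sitting on die 1; passing the box's own die ID selects 0x3000.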
Fixes: b1d9ea2e1ca4 ("perf/x86/uncore: Apply the unit control RB tree to MSR uncore units")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore_discovery.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 583cbd06b9b8..1d22d7c00ee0 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -481,7 +481,7 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
struct intel_uncore_discovery_unit *unit;
unit = intel_uncore_find_discovery_unit(box->pmu->type->boxes,
- -1, box->pmu->pmu_idx);
+ box->dieid, box->pmu->pmu_idx);
if (WARN_ON_ONCE(!unit))
return 0;
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
2026-05-11 23:05 ` [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems Zide Chen
@ 2026-05-13 1:49 ` sashiko-bot
2026-05-13 18:35 ` Chen, Zide
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-05-13 1:49 UTC (permalink / raw)
To: Zide Chen; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unchecked return value causes a write to MSR 0x0 when a PMON discovery unit is not found for the current die, leading to a #GP fault and kernel panic.
--
commit 618972be846a765ebeeda359a995bf217a4d825c
Author: Zide Chen <zide.chen@intel.com>
perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
This commit updates intel_generic_uncore_box_ctl() to look up discovery
units using box->dieid instead of hardcoding -1. This ensures the
correct node is selected on multi-die systems.
> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
> index 583cbd06b9b88..1d22d7c00ee06 100644
> --- a/arch/x86/events/intel/uncore_discovery.c
> +++ b/arch/x86/events/intel/uncore_discovery.c
> @@ -481,7 +481,7 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
> struct intel_uncore_discovery_unit *unit;
>
> unit = intel_uncore_find_discovery_unit(box->pmu->type->boxes,
> - -1, box->pmu->pmu_idx);
> + box->dieid, box->pmu->pmu_idx);
> if (WARN_ON_ONCE(!unit))
> return 0;
Could returning 0 here cause a regression during CPU hotplug if a die is
brought online later?
If a CPU die is offline during boot, uncore_discovery_msr() skips it,
meaning no PMON units are added to the discovery RB-tree for that dieid.
When the die is later brought online, uncore_event_cpu_online()
allocates boxes based on the globally maximum number of discovered units
and calls uncore_box_init().
The initialization function unconditionally calls
intel_generic_uncore_msr_init_box(), which writes to the returned address
without checking if it is 0:
arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_msr_init_box() {
wrmsrq(intel_generic_uncore_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
}
Because the lookup now strictly matches box->dieid instead of safely
falling back with -1 to an identical unit on another die,
intel_uncore_find_discovery_unit() will return NULL for this die.
This results in intel_generic_uncore_box_ctl() returning 0. Calling
wrmsrq(0, ...) writes directly to MSR 0x0 (IA32_P5_MC_ADDR), which will
trigger a #GP fault and crash the system.
Similar unvalidated accesses also occur in PCI configuration space
via intel_generic_uncore_pci_init_box():
arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_pci_init_box() {
struct pci_dev *pdev = box->pci_dev;
int box_ctl = intel_pci_uncore_box_ctl(box);
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
}
Should the callers be updated to check the return value of
intel_generic_uncore_box_ctl() before attempting hardware accesses?
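A minimal userspace sketch of the guard being proposed (all names are
stand-ins and the MSR write is simulated with a counter): a control
address of 0 is treated as "no discovery unit for this die", and the
init path returns without touching the hardware instead of writing MSR 0x0.

```c
#include <assert.h>

static int wrmsr_count;

static void fake_wrmsrq(unsigned long long msr, unsigned long long val)
{
	(void)msr;
	(void)val;
	wrmsr_count++;
}

static void guarded_init_box(unsigned long long box_ctl)
{
	if (!box_ctl)		/* lookup failed: leave the box uninitialized */
		return;
	fake_wrmsrq(box_ctl, 1);	/* stands in for GENERIC_PMON_BOX_CTL_INT */
}
```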
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511230527.26096-1-zide.chen@intel.com?part=1
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
2026-05-13 1:49 ` sashiko-bot
@ 2026-05-13 18:35 ` Chen, Zide
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Zide @ 2026-05-13 18:35 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-perf-users
On 5/12/2026 6:49 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Unchecked return value causes a write to MSR 0x0 when a PMON discovery unit is not found for the current die, leading to a #GP fault and kernel panic.
> --
>
> commit 618972be846a765ebeeda359a995bf217a4d825c
> Author: Zide Chen <zide.chen@intel.com>
>
> perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
>
> This commit updates intel_generic_uncore_box_ctl() to look up discovery
> units using box->dieid instead of hardcoding -1. This ensures the
> correct node is selected on multi-die systems.
>
>> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
>> index 583cbd06b9b88..1d22d7c00ee06 100644
>> --- a/arch/x86/events/intel/uncore_discovery.c
>> +++ b/arch/x86/events/intel/uncore_discovery.c
>> @@ -481,7 +481,7 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
>> struct intel_uncore_discovery_unit *unit;
>>
>> unit = intel_uncore_find_discovery_unit(box->pmu->type->boxes,
>> - -1, box->pmu->pmu_idx);
>> + box->dieid, box->pmu->pmu_idx);
>> if (WARN_ON_ONCE(!unit))
>> return 0;
>
> Could returning 0 here cause a regression during CPU hotplug if a die is
> brought online later?
You're right, it's problematic, but not a regression — it's caused by a
pre-existing bug.
With or without this patch, intel_uncore_find_discovery_unit() could
return NULL, causing intel_generic_uncore_box_ctl() to return 0, and all
callers should check the return value before accessing hardware.
The CPU offline scenario you described is just one example of this. I'll
fix it in a separate patch.
> If a CPU die is offline during boot, uncore_discovery_msr() skips it,
> meaning no PMON units are added to the discovery RB-tree for that dieid.
>
> When the die is later brought online, uncore_event_cpu_online()
> allocates boxes based on the globally maximum number of discovered units
> and calls uncore_box_init().
>
> The initialization function unconditionally calls
> intel_generic_uncore_msr_init_box(), which writes to the returned address
> without checking if it is 0:
>
> arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_msr_init_box() {
> wrmsrq(intel_generic_uncore_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
> }
>
> Because the lookup now strictly matches box->dieid instead of safely
> falling back with -1 to an identical unit on another die,
> intel_uncore_find_discovery_unit() will return NULL for this die.
>
> This results in intel_generic_uncore_box_ctl() returning 0. Calling
> wrmsrq(0, ...) writes directly to MSR 0x0 (IA32_P5_MC_ADDR), which will
> trigger a #GP fault and crash the system.
>
> Similar unvalidated accesses also occur in PCI configuration space
> via intel_generic_uncore_pci_init_box():
>
> arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_pci_init_box() {
> struct pci_dev *pdev = box->pci_dev;
> int box_ctl = intel_pci_uncore_box_ctl(box);
>
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
> }
>
> Should the callers be updated to check the return value of
> intel_generic_uncore_box_ctl() before attempting hardware accesses?
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
2026-05-11 23:05 ` [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-12 9:27 ` Mi, Dapeng
2026-05-11 23:05 ` [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box() Zide Chen
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
pci_get_domain_bus_and_slot() increments the reference count of the
returned PCI device and therefore requires a matching pci_dev_put().
In skx_upi_topology_cb() and discover_upi_topology(), the lookup is
performed inside a loop, but pci_dev_put() is only called once after
the loop. As a result, references from all previous iterations are
leaked.
Move pci_dev_put(dev) into the if (dev) block immediately after
upi_fill_topology() returns.
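The imbalance can be demonstrated with a small userspace model, where
fake_get()/fake_put() are hypothetical stand-ins for
pci_get_domain_bus_and_slot()/pci_dev_put():

```c
#include <assert.h>

/* Each successful lookup takes one reference, so a single put after the
 * loop balances only the final iteration. */
static int refs;

static int fake_get(void)
{
	refs++;			/* the lookup returns a referenced device */
	return 1;
}

static void fake_put(void)
{
	refs--;
}

/* buggy pattern: one put after the loop leaks n - 1 references */
static int scan_buggy(int n)
{
	int i, found = 0;

	refs = 0;
	for (i = 0; i < n; i++)
		found = fake_get();
	if (found)
		fake_put();
	return refs;
}

/* fixed pattern: put inside the loop, right after using the device */
static int scan_fixed(int n)
{
	int i;

	refs = 0;
	for (i = 0; i < n; i++) {
		if (fake_get())
			fake_put();
	}
	return refs;
}
```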
Opportunistically, fix an uninitialized 'ret' variable in skx_upi_topology_cb().
Fixes: 4cfce57fa42d ("perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server")
Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore_snbep.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 215d33e260ed..c9ce206fcbb6 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -4261,7 +4261,7 @@ static int upi_fill_topology(struct pci_dev *dev, struct intel_uncore_topology *
static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
int die, u64 cpu_bus_msr)
{
- int idx, ret;
+ int idx, ret = 0;
struct intel_uncore_topology *upi;
unsigned int devfn;
struct pci_dev *dev = NULL;
@@ -4274,12 +4274,12 @@ static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
if (dev) {
ret = upi_fill_topology(dev, upi, idx);
+ pci_dev_put(dev);
if (ret)
break;
}
}
- pci_dev_put(dev);
return ret;
}
@@ -5499,6 +5499,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
devfn);
if (dev) {
ret = upi_fill_topology(dev, upi, idx);
+ pci_dev_put(dev);
if (ret)
goto err;
}
@@ -5506,7 +5507,6 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
}
err:
pci_dev_put(ubox);
- pci_dev_put(dev);
return ret;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
2026-05-11 23:05 ` [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery Zide Chen
@ 2026-05-12 9:27 ` Mi, Dapeng
2026-05-12 17:35 ` Chen, Zide
0 siblings, 1 reply; 18+ messages in thread
From: Mi, Dapeng @ 2026-05-12 9:27 UTC (permalink / raw)
To: Zide Chen, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/12/2026 7:05 AM, Zide Chen wrote:
> pci_get_domain_bus_and_slot() increments the reference count of the
> returned PCI device and therefore requires a matching pci_dev_put().
>
> In skx_upi_topology_cb() and discover_upi_topology(), the lookup is
> performed inside a loop, but pci_dev_put() is only called once after
> the loop. As a result, references from all previous iterations are
> leaked.
>
> Move pci_dev_put(dev) into the if (dev) block immediately after
> upi_fill_topology() returns.
>
> Opportunistically, fix an uninitialized 'ret' variable in skx_upi_topology_cb().
>
> Fixes: 4cfce57fa42d ("perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server")
> Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> arch/x86/events/intel/uncore_snbep.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index 215d33e260ed..c9ce206fcbb6 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -4261,7 +4261,7 @@ static int upi_fill_topology(struct pci_dev *dev, struct intel_uncore_topology *
> static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
> int die, u64 cpu_bus_msr)
> {
> - int idx, ret;
> + int idx, ret = 0;
> struct intel_uncore_topology *upi;
> unsigned int devfn;
> struct pci_dev *dev = NULL;
> @@ -4274,12 +4274,12 @@ static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
> dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
> if (dev) {
> ret = upi_fill_topology(dev, upi, idx);
> + pci_dev_put(dev);
> if (ret)
> break;
> }
> }
>
> - pci_dev_put(dev);
> return ret;
> }
>
> @@ -5499,6 +5499,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
> devfn);
> if (dev) {
> ret = upi_fill_topology(dev, upi, idx);
> + pci_dev_put(dev);
> if (ret)
> goto err;
> }
> @@ -5506,7 +5507,6 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
> }
> err:
> pci_dev_put(ubox);
> - pci_dev_put(dev);
Should we move the "pci_dev_put(ubox)" into the while loop as well? In
theory, the ubox device could be found multiple times.
Besides, could you please search "pci_get_device()" in uncore code, it
seems some functions don't call pci_dev_put() or only call it once, like
the function spr_update_device_location() ...
Thanks.
> return ret;
> }
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
2026-05-12 9:27 ` Mi, Dapeng
@ 2026-05-12 17:35 ` Chen, Zide
2026-05-13 0:31 ` Mi, Dapeng
0 siblings, 1 reply; 18+ messages in thread
From: Chen, Zide @ 2026-05-12 17:35 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/12/2026 2:27 AM, Mi, Dapeng wrote:
>
> On 5/12/2026 7:05 AM, Zide Chen wrote:
>> pci_get_domain_bus_and_slot() increments the reference count of the
>> returned PCI device and therefore requires a matching pci_dev_put().
>>
>> In skx_upi_topology_cb() and discover_upi_topology(), the lookup is
>> performed inside a loop, but pci_dev_put() is only called once after
>> the loop. As a result, references from all previous iterations are
>> leaked.
>>
>> Move pci_dev_put(dev) into the if (dev) block immediately after
>> upi_fill_topology() returns.
>>
>> Opportunistically, fix an uninitialized 'ret' variable in skx_upi_topology_cb().
>>
>> Fixes: 4cfce57fa42d ("perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server")
>> Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> arch/x86/events/intel/uncore_snbep.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>> index 215d33e260ed..c9ce206fcbb6 100644
>> --- a/arch/x86/events/intel/uncore_snbep.c
>> +++ b/arch/x86/events/intel/uncore_snbep.c
>> @@ -4261,7 +4261,7 @@ static int upi_fill_topology(struct pci_dev *dev, struct intel_uncore_topology *
>> static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
>> int die, u64 cpu_bus_msr)
>> {
>> - int idx, ret;
>> + int idx, ret = 0;
>> struct intel_uncore_topology *upi;
>> unsigned int devfn;
>> struct pci_dev *dev = NULL;
>> @@ -4274,12 +4274,12 @@ static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
>> dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
>> if (dev) {
>> ret = upi_fill_topology(dev, upi, idx);
>> + pci_dev_put(dev);
>> if (ret)
>> break;
>> }
>> }
>>
>> - pci_dev_put(dev);
>> return ret;
>> }
>>
>> @@ -5499,6 +5499,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>> devfn);
>> if (dev) {
>> ret = upi_fill_topology(dev, upi, idx);
>> + pci_dev_put(dev);
>> if (ret)
>> goto err;
>> }
>> @@ -5506,7 +5507,6 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>> }
>> err:
>> pci_dev_put(ubox);
>> - pci_dev_put(dev);
>
> Should we move the "pci_dev_put(ubox)" into the while loop as well? In
> theory, the ubox device could be found multiple times.
As mentioned below, pci_dev_put(ubox) is needed for the two "goto err"
breaks. Moving it into the main loop would require two pci_dev_put()
calls, which adds no benefit.
>
> Besides, could you please search "pci_get_device()" in uncore code, it
> seems some functions don't call pci_dev_put() or only call it once, like
> the function spr_update_device_location() ...
pci_get_device() calls pci_dev_put() internally on the previous device
before returning the next one, so if the "while (pci_get_device())" loop
runs to completion without a break, no extra pci_dev_put() is needed:
https://elixir.bootlin.com/linux/v7.1-rc3/source/drivers/pci/search.c#L283
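The iterator contract can be modeled in userspace (names are stand-ins):
each call drops the reference on the device passed in and takes one on the
device it returns, so running to NULL ends balanced, while breaking out
early leaves exactly one reference for the caller to put.

```c
#include <assert.h>
#include <stddef.h>

static int refs;

struct fakedev { int id; };

static struct fakedev devices[3] = { {0}, {1}, {2} };

static struct fakedev *fake_get_device(struct fakedev *from)
{
	int next = from ? (int)(from - devices) + 1 : 0;

	if (from)
		refs--;			/* internal pci_dev_put(from) */
	if (next >= 3)
		return NULL;
	refs++;				/* reference on the returned device */
	return &devices[next];
}

static int iterate_to_completion(void)
{
	struct fakedev *d = NULL;

	refs = 0;
	while ((d = fake_get_device(d)) != NULL)
		;			/* visit each device */
	return refs;
}

static int iterate_break_at(int id)
{
	struct fakedev *d = NULL;

	refs = 0;
	while ((d = fake_get_device(d)) != NULL) {
		if (d->id == id)
			break;		/* caller still holds a ref on d */
	}
	return refs;
}
```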
> Thanks.
>
>
>
>> return ret;
>> }
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
2026-05-12 17:35 ` Chen, Zide
@ 2026-05-13 0:31 ` Mi, Dapeng
0 siblings, 0 replies; 18+ messages in thread
From: Mi, Dapeng @ 2026-05-13 0:31 UTC (permalink / raw)
To: Chen, Zide, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/13/2026 1:35 AM, Chen, Zide wrote:
>
> On 5/12/2026 2:27 AM, Mi, Dapeng wrote:
>> On 5/12/2026 7:05 AM, Zide Chen wrote:
>>> pci_get_domain_bus_and_slot() increments the reference count of the
>>> returned PCI device and therefore requires a matching pci_dev_put().
>>>
>>> In skx_upi_topology_cb() and discover_upi_topology(), the lookup is
>>> performed inside a loop, but pci_dev_put() is only called once after
>>> the loop. As a result, references from all previous iterations are
>>> leaked.
>>>
>>> Move pci_dev_put(dev) into the if (dev) block immediately after
>>> upi_fill_topology() returns.
>>>
>>> Opportunistically, fix an uninitialized 'ret' variable in skx_upi_topology_cb().
>>>
>>> Fixes: 4cfce57fa42d ("perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server")
>>> Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>> ---
>>> arch/x86/events/intel/uncore_snbep.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>>> index 215d33e260ed..c9ce206fcbb6 100644
>>> --- a/arch/x86/events/intel/uncore_snbep.c
>>> +++ b/arch/x86/events/intel/uncore_snbep.c
>>> @@ -4261,7 +4261,7 @@ static int upi_fill_topology(struct pci_dev *dev, struct intel_uncore_topology *
>>> static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
>>> int die, u64 cpu_bus_msr)
>>> {
>>> - int idx, ret;
>>> + int idx, ret = 0;
>>> struct intel_uncore_topology *upi;
>>> unsigned int devfn;
>>> struct pci_dev *dev = NULL;
>>> @@ -4274,12 +4274,12 @@ static int skx_upi_topology_cb(struct intel_uncore_type *type, int segment,
>>> dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
>>> if (dev) {
>>> ret = upi_fill_topology(dev, upi, idx);
>>> + pci_dev_put(dev);
>>> if (ret)
>>> break;
>>> }
>>> }
>>>
>>> - pci_dev_put(dev);
>>> return ret;
>>> }
>>>
>>> @@ -5499,6 +5499,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>>> devfn);
>>> if (dev) {
>>> ret = upi_fill_topology(dev, upi, idx);
>>> + pci_dev_put(dev);
>>> if (ret)
>>> goto err;
>>> }
>>> @@ -5506,7 +5507,6 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>>> }
>>> err:
>>> pci_dev_put(ubox);
>>> - pci_dev_put(dev);
>> Should we move the "pci_dev_put(ubox)" into the while loop as well? In
>> theory, the ubox device could be found multiple times.
> As mentioned below, pci_dev_put(ubox) is needed for the two "goto err"
> breaks. Moving it into the main loop would require two pci_dev_put()
> calls, which adds no benefit.
> >
>> Besides, could you please search "pci_get_device()" in uncore code, it
>> seems some functions don't call pci_dev_put() or only call it once, like
>> the function spr_update_device_location() ...
> pci_get_device() calls pci_dev_put() internally on the previous device
> before returning the next one, so if the "while (pci_get_device())" loop
> runs to completion without a break, no extra pci_dev_put() is needed:
>
> https://elixir.bootlin.com/linux/v7.1-rc3/source/drivers/pci/search.c#L283
I see. Thanks.
>
>
>> Thanks.
>>
>>
>>
>>> return ret;
>>> }
>>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
2026-05-11 23:05 ` [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems Zide Chen
2026-05-11 23:05 ` [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-13 2:33 ` sashiko-bot
2026-05-11 23:05 ` [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c Zide Chen
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen, stable
On some Raptor Cove CPUs, enabling uncore PMON globally at driver init
may increase power consumption even when no perf events are in use.
Drop adl_uncore_msr_init_box() and defer programming the global control
register to enable_box(), so it is only set when a box is actually used.
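The effect of the change can be modeled in userspace (illustrative names
only, with the MSR write simulated by a counter): before the patch the
global-enable register was written at driver load, after the patch it is
written only when a box is actually enabled.

```c
#include <assert.h>

static int msr_writes;

static void wrmsr_global_ctl(void)
{
	msr_writes++;		/* wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, ...) */
}

/* before the patch: write at load time, then again on every enable */
static int writes_with_init_box(int enables)
{
	int i;

	msr_writes = 0;
	wrmsr_global_ctl();		/* init_box() at driver load */
	for (i = 0; i < enables; i++)
		wrmsr_global_ctl();	/* enable_box() */
	return msr_writes;
}

/* after the patch: no write until a box is actually used */
static int writes_without_init_box(int enables)
{
	int i;

	msr_writes = 0;
	for (i = 0; i < enables; i++)
		wrmsr_global_ctl();
	return msr_writes;
}
```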
Cc: stable@vger.kernel.org
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore_snb.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 3dbc6bacbd9d..edddd4f9ab5f 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -563,12 +563,6 @@ void tgl_uncore_cpu_init(void)
skl_uncore_msr_ops.init_box = rkl_uncore_msr_init_box;
}
-static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
-{
- if (box->pmu->pmu_idx == 0)
- wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
-}
-
static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
{
wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
@@ -587,7 +581,6 @@ static void adl_uncore_msr_exit_box(struct intel_uncore_box *box)
}
static struct intel_uncore_ops adl_uncore_msr_ops = {
- .init_box = adl_uncore_msr_init_box,
.enable_box = adl_uncore_msr_enable_box,
.disable_box = adl_uncore_msr_disable_box,
.exit_box = adl_uncore_msr_exit_box,
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
2026-05-11 23:05 ` [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box() Zide Chen
@ 2026-05-13 2:33 ` sashiko-bot
2026-05-13 17:58 ` Chen, Zide
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-05-13 2:33 UTC (permalink / raw)
To: Zide Chen; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The deferred global PMON enablement permanently leaks the hardware power state after the first use, nullifying the intended power savings.
--
commit ce0046bfd8b6ed5388332db6e76ea45581716939
Author: Zide Chen <zide.chen@intel.com>
perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
This commit drops adl_uncore_msr_init_box() and defers programming the
ADL_UNC_PERF_GLOBAL_CTL MSR to enable_box(). The goal is to save power by
ensuring the global control register is only enabled when a box is actually
used.
> diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
> index 3dbc6bacbd9de..edddd4f9ab5fc 100644
> --- a/arch/x86/events/intel/uncore_snb.c
> +++ b/arch/x86/events/intel/uncore_snb.c
[ ... ]
> -static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
> -{
> - if (box->pmu->pmu_idx == 0)
> - wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
> -}
> -
> static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
> {
> wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
Does this unintentionally leave the hardware power state permanently enabled
after the first use?
The Linux perf core framework unconditionally calls perf_pmu_enable(), which
maps to uncore_pmu_enable() and executes enable_box(), at the end of PMU
transactions. This includes event removal sequences:
perf_pmu_disable()
event->pmu->del()
perf_pmu_enable()
Since adl_uncore_msr_enable_box() unconditionally writes
SNB_UNC_GLOBAL_CTL_EN, it seems the global PMON control is immediately
re-enabled and remains active even when zero events are in use, negating
the intended power savings.
Additionally, if a check like if (box->n_active) were added here, could
that silently break concurrent perf sessions on other uncore boxes?
Because disable_box() unconditionally clears the shared MSR during
transactions if pmu_idx == 0, failing to re-enable it would leave PMON
globally disabled for the entire die. Safely managing dynamic power for this
shared global MSR might require a package-level active reference count
across all boxes.
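A minimal sketch of what such a reference count could look like (a
hypothetical userspace model, not kernel code): the shared global-enable
register is set only on the 0->1 transition and cleared only on the 1->0
transition of the active-box count, so one box going idle cannot disable
PMON for concurrent sessions on other boxes.

```c
#include <assert.h>

static int active_boxes;
static int global_enabled;

static void refcounted_enable_box(void)
{
	if (active_boxes++ == 0)
		global_enabled = 1;	/* first user: set global ctl */
}

static void refcounted_disable_box(void)
{
	if (--active_boxes == 0)
		global_enabled = 0;	/* last user gone: clear global ctl */
}
```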
> @@ -587,7 +581,6 @@ static void adl_uncore_msr_exit_box(struct intel_uncore_box *box)
> }
>
> static struct intel_uncore_ops adl_uncore_msr_ops = {
> - .init_box = adl_uncore_msr_init_box,
> .enable_box = adl_uncore_msr_enable_box,
> .disable_box = adl_uncore_msr_disable_box,
> .exit_box = adl_uncore_msr_exit_box,
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511230527.26096-1-zide.chen@intel.com?part=3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
2026-05-13 2:33 ` sashiko-bot
@ 2026-05-13 17:58 ` Chen, Zide
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Zide @ 2026-05-13 17:58 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-perf-users
On 5/12/2026 7:33 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] The deferred global PMON enablement permanently leaks the hardware power state after the first use, nullifying the intended power savings.
> --
>
> commit ce0046bfd8b6ed5388332db6e76ea45581716939
> Author: Zide Chen <zide.chen@intel.com>
>
> perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
>
> This commit drops adl_uncore_msr_init_box() and defers programming the
> ADL_UNC_PERF_GLOBAL_CTL MSR to enable_box(). The goal is to save power by
> ensuring the global control register is only enabled when a box is actually
> used.
>
>> diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
>> index 3dbc6bacbd9de..edddd4f9ab5fc 100644
>> --- a/arch/x86/events/intel/uncore_snb.c
>> +++ b/arch/x86/events/intel/uncore_snb.c
> [ ... ]
>> -static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
>> -{
>> - if (box->pmu->pmu_idx == 0)
>> - wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
>> -}
>> -
>> static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
>> {
>> wrmsrq(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
>
> Does this unintentionally leave the hardware power state permanently enabled
> after the first use?
>
> The Linux perf core framework unconditionally calls perf_pmu_enable(), which
> maps to uncore_pmu_enable() and executes enable_box(), at the end of PMU
> transactions. This includes event removal sequences:
>
> perf_pmu_disable()
> event->pmu->del()
> perf_pmu_enable()
>
> Since adl_uncore_msr_enable_box() unconditionally writes
> SNB_UNC_GLOBAL_CTL_EN, it seems the global PMON control is immediately
> re-enabled and remains active even when zero events are in use, negating
> the intended power savings.
> Additionally, if a check like if (box->n_active) were added here, could
> that silently break concurrent perf sessions on other uncore boxes?
The additional power consumption occurs as soon as the MSR is written to
any value — not because the global control remains enabled. So checking
n_active or disabling the global control when no events are active
wouldn't help.
> Because disable_box() unconditionally clears the shared MSR during
> transactions if pmu_idx == 0, failing to re-enable it would leave PMON
> globally disabled for the entire die. Safely managing dynamic power for this
> shared global MSR might require a package-level active reference count
> across all boxes.
Again, for the power consumption concern, the only thing we need to do
is to postpone the write to the global control MSR (0x2ff0), and after
the first write, it's irrelevant.
Regarding the concern of the disable_box(), I'd like to address it
separately from this patch. The fix here is simply to defer the first
write until it's actually needed.
>> @@ -587,7 +581,6 @@ static void adl_uncore_msr_exit_box(struct intel_uncore_box *box)
>> }
>>
>> static struct intel_uncore_ops adl_uncore_msr_ops = {
>> - .init_box = adl_uncore_msr_init_box,
>> .enable_box = adl_uncore_msr_enable_box,
>> .disable_box = adl_uncore_msr_disable_box,
>> .exit_box = adl_uncore_msr_exit_box,
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
` (2 preceding siblings ...)
2026-05-11 23:05 ` [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box() Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-13 2:58 ` sashiko-bot
2026-05-11 23:05 ` [PATCH 5/6] perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies Zide Chen
2026-05-11 23:05 ` [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore Zide Chen
5 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
Move die_to_cpu() into uncore.c so it can be reused by the MSR
initialization path, preparing for the introduction of an MSR global
initialization callback.
Add the uncore_ prefix for consistency with other uncore APIs.
No functional change intended.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 19 +++++++++++++++++++
arch/x86/events/intel/uncore.h | 1 +
arch/x86/events/intel/uncore_snbep.c | 21 +--------------------
3 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index e9cc1ba921c5..2bbe4cc1df3e 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -83,6 +83,25 @@ int uncore_device_to_die(struct pci_dev *dev)
return -1;
}
+int uncore_die_to_cpu(int die)
+{
+ int res = 0, cpu;
+
+ /*
+ * Using cpus_read_lock() to ensure cpu is not going down between
+ * looking at cpu_online_mask.
+ */
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ if (topology_logical_die_id(cpu) == die) {
+ res = cpu;
+ break;
+ }
+ }
+ cpus_read_unlock();
+ return res;
+}
+
static void uncore_free_pcibus_map(void)
{
struct pci2phy_map *map, *tmp;
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index c35918c01afa..94c68e3417b6 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -235,6 +235,7 @@ struct pci2phy_map *__find_pci2phy_map(int segment);
int uncore_pcibus_to_dieid(struct pci_bus *bus);
int uncore_die_to_segment(int die);
int uncore_device_to_die(struct pci_dev *dev);
+int uncore_die_to_cpu(int die);
ssize_t uncore_event_show(struct device *dev,
struct device_attribute *attr, char *buf);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index c9ce206fcbb6..30c6a9306c54 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3704,25 +3704,6 @@ static int skx_msr_cpu_bus_read(int cpu, u64 *topology)
return 0;
}
-static int die_to_cpu(int die)
-{
- int res = 0, cpu, current_die;
- /*
- * Using cpus_read_lock() to ensure cpu is not going down between
- * looking at cpu_online_mask.
- */
- cpus_read_lock();
- for_each_online_cpu(cpu) {
- current_die = topology_logical_die_id(cpu);
- if (current_die == die) {
- res = cpu;
- break;
- }
- }
- cpus_read_unlock();
- return res;
-}
-
enum {
IIO_TOPOLOGY_TYPE,
UPI_TOPOLOGY_TYPE,
@@ -3795,7 +3776,7 @@ static int skx_pmu_get_topology(struct intel_uncore_type *type,
u64 cpu_bus_msr;
for (die = 0; die < uncore_max_dies(); die++) {
- ret = skx_msr_cpu_bus_read(die_to_cpu(die), &cpu_bus_msr);
+ ret = skx_msr_cpu_bus_read(uncore_die_to_cpu(die), &cpu_bus_msr);
if (ret)
break;
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
2026-05-11 23:05 ` [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c Zide Chen
@ 2026-05-13 2:58 ` sashiko-bot
2026-05-13 18:11 ` Chen, Zide
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-05-13 2:58 UTC (permalink / raw)
To: Zide Chen; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Recursive `cpus_read_lock()` acquisition leads to deadlock when used by the planned initialization callback.
--
commit d8b9d983f61a73e17f268ae570b5186a67f1a6ae
Author: Zide Chen <zide.chen@intel.com>
perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
This commit moves die_to_cpu() into uncore.c and renames it to
uncore_die_to_cpu(). This change prepares the function to be reused by a
planned MSR global initialization callback.
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index e9cc1ba921c5e..2bbe4cc1df3e2 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
[ ... ]
> @@ -83,6 +83,25 @@ int uncore_device_to_die(struct pci_dev *dev)
> return -1;
> }
>
> +int uncore_die_to_cpu(int die)
> +{
> + int res = 0, cpu;
> +
> + /*
> + * Using cpus_read_lock() to ensure cpu is not going down between
> + * looking at cpu_online_mask.
> + */
> + cpus_read_lock();
Will the planned MSR global initialization callback already hold
cpus_read_lock() when it calls this function?
If the initialization path (such as uncore_discovery_msr() calling into
__parse_discovery_table()) already holds the lock, does acquiring it a
second time here risk a lockdep warning or a deadlock?
Since cpus_read_lock() is backed by a percpu rwsem, a second read lock by
the same thread can block if a writer (like CPU hotplug) is pending. Does
this API need an unlocked variant for initialization paths, or should it
rely on the caller to manage the lock entirely?
> + for_each_online_cpu(cpu) {
> + if (topology_logical_die_id(cpu) == die) {
> + res = cpu;
> + break;
> + }
> + }
> + cpus_read_unlock();
> + return res;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511230527.26096-1-zide.chen@intel.com?part=4
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
2026-05-13 2:58 ` sashiko-bot
@ 2026-05-13 18:11 ` Chen, Zide
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Zide @ 2026-05-13 18:11 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-perf-users
On 5/12/2026 7:58 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Recursive `cpus_read_lock()` acquisition leads to deadlock when used by the planned initialization callback.
> --
>
> commit d8b9d983f61a73e17f268ae570b5186a67f1a6ae
> Author: Zide Chen <zide.chen@intel.com>
>
> perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
>
> This commit moves die_to_cpu() into uncore.c and renames it to
> uncore_die_to_cpu(). This change prepares the function to be reused by a
> planned MSR global initialization callback.
>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index e9cc1ba921c5e..2bbe4cc1df3e2 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
> [ ... ]
>> @@ -83,6 +83,25 @@ int uncore_device_to_die(struct pci_dev *dev)
>> return -1;
>> }
>>
>> +int uncore_die_to_cpu(int die)
>> +{
>> + int res = 0, cpu;
>> +
>> + /*
>> + * Using cpus_read_lock() to ensure cpu is not going down between
>> + * looking at cpu_online_mask.
>> + */
>> + cpus_read_lock();
>
> Will the planned MSR global initialization callback already hold
> cpus_read_lock() when it calls this function?
>
> If the initialization path (such as uncore_discovery_msr() calling into
> __parse_discovery_table()) already holds the lock, does acquiring it a
> second time here risk a lockdep warning or a deadlock?
>
> Since cpus_read_lock() is backed by a percpu rwsem, a second read lock by
> the same thread can block if a writer (like CPU hotplug) is pending. Does
> this API need an unlocked variant for initialization paths, or should it
> rely on the caller to manage the lock entirely?
Yes, you are right. It seems the best option would be to remove the lock
from uncore_die_to_cpu() and rely on callers to manage the lock.
>
>> + for_each_online_cpu(cpu) {
>> + if (topology_logical_die_id(cpu) == die) {
>> + res = cpu;
>> + break;
>> + }
>> + }
>> + cpus_read_unlock();
>> + return res;
>> +}
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 5/6] perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
` (3 preceding siblings ...)
2026-05-11 23:05 ` [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-11 23:05 ` [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore Zide Chen
5 siblings, 0 replies; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
If all CPUs on the die are offline when uncore_die_to_cpu() is called,
it silently returns 0, which is misleading. Return -1 in this case to
indicate that the die has no online CPUs, so the caller can handle it
accordingly.
Opportunistically, replace -EPERM with -ENODEV, as -ENODEV is
the appropriate error when no CPUs are online across all dies.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 2 +-
arch/x86/events/intel/uncore_snbep.c | 9 +++++++--
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 2bbe4cc1df3e..19056514b081 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -85,7 +85,7 @@ int uncore_device_to_die(struct pci_dev *dev)
int uncore_die_to_cpu(int die)
{
- int res = 0, cpu;
+ int res = -1, cpu;
/*
* Using cpus_read_lock() to ensure cpu is not going down between
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 30c6a9306c54..251c7bdbe30b 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3772,11 +3772,16 @@ static void pmu_free_topology(struct intel_uncore_type *type)
static int skx_pmu_get_topology(struct intel_uncore_type *type,
int (*topology_cb)(struct intel_uncore_type*, int, int, u64))
{
- int die, ret = -EPERM;
+ int die, ret = -ENODEV;
u64 cpu_bus_msr;
+ int cpu;
for (die = 0; die < uncore_max_dies(); die++) {
- ret = skx_msr_cpu_bus_read(uncore_die_to_cpu(die), &cpu_bus_msr);
+ cpu = uncore_die_to_cpu(die);
+ if (cpu == -1)
+ continue;
+
+ ret = skx_msr_cpu_bus_read(cpu, &cpu_bus_msr);
if (ret)
break;
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
` (4 preceding siblings ...)
2026-05-11 23:05 ` [PATCH 5/6] perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies Zide Chen
@ 2026-05-11 23:05 ` Zide Chen
2026-05-13 4:25 ` sashiko-bot
5 siblings, 1 reply; 18+ messages in thread
From: Zide Chen @ 2026-05-11 23:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
On Sierra Forest and Clearwater Forest, the FRZ_ALL bit in the global
control register defaults to 0 at boot, but the UBOX PMON units do not
work until the global control register is explicitly written with 0 to
properly trigger hardware initialization.
Implement the generic uncore_msr_global_init() callback and add it to
gnr_uncore_init[], which is shared by GNR, GRR, SRF, and CWF.
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 14 +++++++++++++-
arch/x86/events/intel/uncore.h | 2 +-
arch/x86/events/intel/uncore_discovery.c | 2 +-
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 19056514b081..a7780c5cd419 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1716,7 +1716,7 @@ static int __init uncore_mmio_init(void)
return ret;
}
-static int uncore_mmio_global_init(u64 ctl)
+static int uncore_mmio_global_init(int die, u64 ctl)
{
void __iomem *io_addr;
@@ -1731,6 +1731,17 @@ static int uncore_mmio_global_init(u64 ctl)
return 0;
}
+static int uncore_msr_global_init(int die, u64 msr)
+{
+ int cpu = uncore_die_to_cpu(die);
+
+ if (cpu == -1)
+ return -ENODEV;
+
+ wrmsrq_on_cpu(cpu, msr, 0);
+ return 0;
+}
+
static const struct uncore_plat_init nhm_uncore_init __initconst = {
.cpu_init = nhm_uncore_cpu_init,
};
@@ -1871,6 +1882,7 @@ static const struct uncore_plat_init gnr_uncore_init __initconst = {
.domain[0].base_is_pci = true,
.domain[0].discovery_base = UNCORE_DISCOVERY_TABLE_DEVICE,
.domain[0].units_ignore = gnr_uncore_units_ignore,
+ .domain[0].global_init = uncore_msr_global_init,
};
static const struct uncore_plat_init dmr_uncore_init __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 94c68e3417b6..c2e5ccb1d72c 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -53,7 +53,7 @@ struct uncore_discovery_domain {
/* MSR address or PCI device used as the discovery base */
u32 discovery_base;
bool base_is_pci;
- int (*global_init)(u64 ctl);
+ int (*global_init)(int die, u64 ctl);
/* The units in the discovery table should be ignored. */
int *units_ignore;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 1d22d7c00ee0..49183d607a34 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -287,7 +287,7 @@ static int __parse_discovery_table(struct uncore_discovery_domain *domain,
if (!io_addr)
return -ENOMEM;
- if (domain->global_init && domain->global_init(global.ctl)) {
+ if (domain->global_init && domain->global_init(die, global.ctl)) {
ret = -ENODEV;
goto out;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore
2026-05-11 23:05 ` [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore Zide Chen
@ 2026-05-13 4:25 ` sashiko-bot
2026-05-13 18:26 ` Chen, Zide
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-05-13 4:25 UTC (permalink / raw)
To: Zide Chen; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The return value of `wrmsrq_on_cpu()` is ignored in `uncore_msr_global_init()`, masking hardware initialization failures caused by a CPU hotplug TOCTOU race.
--
commit d795e124f297e8f2c0cfee9efd29cdf4ec1993ce
Author: Zide Chen <zide.chen@intel.com>
perf/x86/intel/uncore: Implement global init callback for GNR uncore
This commit adds a generic uncore_msr_global_init() callback for GNR, GRR,
SRF, and CWF. It writes 0 to the global control register during boot to
ensure the UBOX PMON units are properly initialized.
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 19056514b0812..a7780c5cd419a 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
[ ... ]
> @@ -1731,6 +1731,17 @@ static int uncore_mmio_global_init(u64 ctl)
> return 0;
> }
>
> +static int uncore_msr_global_init(int die, u64 msr)
> +{
> + int cpu = uncore_die_to_cpu(die);
> +
> + if (cpu == -1)
> + return -ENODEV;
> +
> + wrmsrq_on_cpu(cpu, msr, 0);
> + return 0;
> +}
uncore_die_to_cpu() acquires and releases cpus_read_lock() to find an
online CPU, meaning the selected CPU could be hot-unplugged concurrently
before wrmsrq_on_cpu() executes.
If the CPU goes offline, the underlying smp_call_function_single() will
fail and wrmsrq_on_cpu() will return an error.
Because this function unconditionally returns 0, the caller will assume
success and proceed to register the PMON units. As noted in the commit
message, the units will not work unless the MSR is written, which could
lead to a silent functional failure.
Does this code need to check and propagate the return value of
wrmsrq_on_cpu() to prevent this regression?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511230527.26096-1-zide.chen@intel.com?part=6
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore
2026-05-13 4:25 ` sashiko-bot
@ 2026-05-13 18:26 ` Chen, Zide
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Zide @ 2026-05-13 18:26 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-perf-users
On 5/12/2026 9:25 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [Medium] The return value of `wrmsrq_on_cpu()` is ignored in `uncore_msr_global_init()`, masking hardware initialization failures caused by a CPU hotplug TOCTOU race.
> --
>
> commit d795e124f297e8f2c0cfee9efd29cdf4ec1993ce
> Author: Zide Chen <zide.chen@intel.com>
>
> perf/x86/intel/uncore: Implement global init callback for GNR uncore
>
> This commit adds a generic uncore_msr_global_init() callback for GNR, GRR,
> SRF, and CWF. It writes 0 to the global control register during boot to
> ensure the UBOX PMON units are properly initialized.
>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 19056514b0812..a7780c5cd419a 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>
> [ ... ]
>
>> @@ -1731,6 +1731,17 @@ static int uncore_mmio_global_init(u64 ctl)
>> return 0;
>> }
>>
>> +static int uncore_msr_global_init(int die, u64 msr)
>> +{
>> + int cpu = uncore_die_to_cpu(die);
>> +
>> + if (cpu == -1)
>> + return -ENODEV;
>> +
>> + wrmsrq_on_cpu(cpu, msr, 0);
>> + return 0;
>> +}
>
> uncore_die_to_cpu() acquires and releases cpus_read_lock() to find an
> online CPU, meaning the selected CPU could be hot-unplugged concurrently
> before wrmsrq_on_cpu() executes.
>
> If the CPU goes offline, the underlying smp_call_function_single() will
> fail and wrmsrq_on_cpu() will return an error.
As you pointed out in PATCH 4/6, I will move the lock out of
uncore_die_to_cpu() and rely on the caller to manage it. With the lock
held across both uncore_die_to_cpu() and wrmsrq_on_cpu(), CPU hot-unplug
cannot occur in between.
> Because this function unconditionally returns 0, the caller will assume
> success and proceed to register the PMON units. As noted in the commit
> message, the units will not work unless the MSR is written, which could
> lead to a silent functional failure.
>
> Does this code need to check and propagate the return value of
> wrmsrq_on_cpu() to prevent this regression?
No, wrmsrq_on_cpu() always returns 0.
>
^ permalink raw reply [flat|nested] 18+ messages in thread