* [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes
@ 2026-05-12 23:30 Zide Chen
2026-05-12 23:30 ` [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups Zide Chen
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
This series fixes correctness issues in Intel uncore PMU setup:
- If init_box() fails on every die of a PMU, the PMU's sysfs node may
still exist while its events read zeros, silently reporting wrong data.
- If init_box() fails on only some dies, perf may return partial
non-zero counts, which is harder to diagnose.
- CPU hotplug ref/unref ordering bugs can skip init_box() when the first
CPU in a die comes online, and can call box_exit() prematurely when
the second-to-last CPU goes offline.
To address this, the series introduces a PMU broken state to track setup
failures and switches MSR/MMIO PMUs to lazy registration, matching
existing PCI behavior.
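The intended state machine can be modeled in plain userspace C (a minimal
sketch only; `pmu_model`, `box_online` and `box_offline` are hypothetical
names that illustrate the idea, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a PMU is registered lazily, on the first die
 * whose box initializes successfully, and marked broken when
 * init_box() fails, so no sysfs node is created for a PMU that can
 * only read zeros. */
struct pmu_model {
	bool registered;   /* sysfs node exists */
	bool broken;       /* some init_box() failed */
	int  die_refcnt;   /* dies with a live box */
};

static int box_online(struct pmu_model *pmu, int init_ret)
{
	if (init_ret) {
		pmu->broken = true;
		return init_ret;
	}
	if (pmu->die_refcnt++ == 0)
		pmu->registered = true;   /* lazy registration */
	return 0;
}

static void box_offline(struct pmu_model *pmu)
{
	if (--pmu->die_refcnt == 0)
		pmu->registered = false;  /* last die unregisters */
}
```

A broken PMU never becomes visible, while a working one registers on its
first successful die, matching the existing PCI behavior described above.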
Zide Chen (7):
perf/x86/intel/uncore: Rename refcount fields and other cleanups
perf/x86/intel/uncore: Let init_box() callback report failures
perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails
perf/x86/intel/uncore: Factor out box setup code
perf/x86/intel/uncore: Introduce PMU flags and broken state
perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU
hotplug
perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU
arch/x86/events/intel/uncore.c | 214 +++++++++++------------
arch/x86/events/intel/uncore.h | 36 ++--
arch/x86/events/intel/uncore_discovery.c | 16 +-
arch/x86/events/intel/uncore_discovery.h | 6 +-
arch/x86/events/intel/uncore_nhmex.c | 3 +-
arch/x86/events/intel/uncore_snb.c | 84 +++++----
arch/x86/events/intel/uncore_snbep.c | 71 +++++---
7 files changed, 242 insertions(+), 188 deletions(-)
--
2.54.0
* [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:26 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures Zide Chen
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
Fix typo UNCORE_BOX_FLAG_INITIATED to UNCORE_BOX_FLAG_INITIALIZED.
Rename the 'id' parameter in uncore_box_{ref,unref}() to 'die' to
reflect its actual meaning and be consistent with other functions.
pmu->activeboxes is misleading because it does not decrement when the
PMU fails to register. Rename it to die_refcnt, reflecting that it
counts the number of dies with a box present, regardless of whether
the PMU is functioning.
Rename box->refcnt to box->cpu_refcnt to clarify that it tracks the
number of CPUs on the die with a corresponding PMU box present.
Remove the incorrect atomic_inc(&box->refcnt) from
uncore_pci_pmu_register(): PCI boxes are not tracked by cpu_refcnt,
so this per-die increment is spurious.
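The cpu_refcnt semantics this rename is meant to convey, init the box
when the first CPU on the die comes online, exit it when the last goes
offline, can be sketched in userspace C (illustrative names only, not
the kernel's):

```c
#include <assert.h>

/* Illustrative model of box->cpu_refcnt: counts CPUs on the die
 * that see this box; init on the 0->1 transition, exit on 1->0. */
struct box_model {
	int cpu_refcnt;
	int inited;   /* 1 between init_box() and exit_box() */
};

static void cpu_online(struct box_model *box)
{
	if (++box->cpu_refcnt == 1)
		box->inited = 1;   /* first CPU on the die: init_box() */
}

static void cpu_offline(struct box_model *box)
{
	if (--box->cpu_refcnt == 0)
		box->inited = 0;   /* last CPU on the die: exit_box() */
}
```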
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 21 +++++++++++----------
arch/x86/events/intel/uncore.h | 10 +++++-----
2 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index a7780c5cd419..012a7e081014 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1170,14 +1170,13 @@ static int uncore_pci_pmu_register(struct pci_dev *pdev,
if (!box)
return -ENOMEM;
- atomic_inc(&box->refcnt);
box->dieid = die;
box->pci_dev = pdev;
box->pmu = pmu;
uncore_box_init(box);
pmu->boxes[die] = box;
- if (atomic_inc_return(&pmu->activeboxes) > 1)
+ if (atomic_inc_return(&pmu->die_refcnt) > 1)
return 0;
/* First active box registers the pmu */
@@ -1249,7 +1248,7 @@ static void uncore_pci_pmu_unregister(struct intel_uncore_pmu *pmu, int die)
struct intel_uncore_box *box = pmu->boxes[die];
pmu->boxes[die] = NULL;
- if (atomic_dec_return(&pmu->activeboxes) == 0)
+ if (atomic_dec_return(&pmu->die_refcnt) == 0)
uncore_pmu_unregister(pmu);
uncore_box_exit(box);
kfree(box);
@@ -1515,7 +1514,7 @@ static void uncore_change_context(struct intel_uncore_type **uncores,
uncore_change_type_ctx(*uncores, old_cpu, new_cpu);
}
-static void uncore_box_unref(struct intel_uncore_type **types, int id)
+static void uncore_box_unref(struct intel_uncore_type **types, int die)
{
struct intel_uncore_type *type;
struct intel_uncore_pmu *pmu;
@@ -1526,8 +1525,9 @@ static void uncore_box_unref(struct intel_uncore_type **types, int id)
type = *types;
pmu = type->pmus;
for (i = 0; i < type->num_boxes; i++, pmu++) {
- box = pmu->boxes[id];
- if (box && box->cpu >= 0 && atomic_dec_return(&box->refcnt) == 0)
+ box = pmu->boxes[die];
+ if (box && box->cpu >= 0 &&
+ atomic_dec_return(&box->cpu_refcnt) == 0)
uncore_box_exit(box);
}
}
@@ -1601,14 +1601,14 @@ static int allocate_boxes(struct intel_uncore_type **types,
}
static int uncore_box_ref(struct intel_uncore_type **types,
- int id, unsigned int cpu)
+ int die, unsigned int cpu)
{
struct intel_uncore_type *type;
struct intel_uncore_pmu *pmu;
struct intel_uncore_box *box;
int i, ret;
- ret = allocate_boxes(types, id, cpu);
+ ret = allocate_boxes(types, die, cpu);
if (ret)
return ret;
@@ -1616,8 +1616,9 @@ static int uncore_box_ref(struct intel_uncore_type **types,
type = *types;
pmu = type->pmus;
for (i = 0; i < type->num_boxes; i++, pmu++) {
- box = pmu->boxes[id];
- if (box && box->cpu >= 0 && atomic_inc_return(&box->refcnt) == 1)
+ box = pmu->boxes[die];
+ if (box && box->cpu >= 0 &&
+ atomic_inc_return(&box->cpu_refcnt) == 1)
uncore_box_init(box);
}
}
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index c2e5ccb1d72c..7d4ef869d193 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -147,7 +147,7 @@ struct intel_uncore_pmu {
char name[UNCORE_PMU_NAME_LEN];
int pmu_idx;
bool registered;
- atomic_t activeboxes;
+ atomic_t die_refcnt;
cpumask_t cpu_mask;
struct intel_uncore_type *type;
struct intel_uncore_box **boxes;
@@ -165,7 +165,7 @@ struct intel_uncore_box {
int n_events;
int cpu; /* cpu to collect events */
unsigned long flags;
- atomic_t refcnt;
+ atomic_t cpu_refcnt; /* Number of CPUs that have this box online */
struct perf_event *events[UNCORE_PMC_IDX_MAX];
struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
struct event_constraint *event_constraint[UNCORE_PMC_IDX_MAX];
@@ -185,7 +185,7 @@ struct intel_uncore_box {
#define CFL_UNC_CBO_7_PERFEVTSEL0 0xf70
#define CFL_UNC_CBO_7_PER_CTR0 0xf76
-#define UNCORE_BOX_FLAG_INITIATED 0
+#define UNCORE_BOX_FLAG_INITIALIZED 0
/* event config registers are 8-byte apart */
#define UNCORE_BOX_FLAG_CTL_OFFS8 1
/* CFL 8th CBOX has different MSR space */
@@ -559,7 +559,7 @@ static inline u64 uncore_read_counter(struct intel_uncore_box *box,
static inline void uncore_box_init(struct intel_uncore_box *box)
{
- if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+ if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
if (box->pmu->type->ops->init_box)
box->pmu->type->ops->init_box(box);
}
@@ -567,7 +567,7 @@ static inline void uncore_box_init(struct intel_uncore_box *box)
static inline void uncore_box_exit(struct intel_uncore_box *box)
{
- if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+ if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
if (box->pmu->type->ops->exit_box)
box->pmu->type->ops->exit_box(box);
}
--
2.54.0
* [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
2026-05-12 23:30 ` [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:23 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails Zide Chen
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
The init_box() callback currently returns void, so initialization
failures are silently ignored and the box is still marked initialized.
Change the callback to return int so platform code can report errors
back to the common uncore layer.
Update uncore_box_init() to set the initialized flag only when
init_box() succeeds. Because box->cpu_refcnt guarantees that at most
one CPU calls uncore_box_init() for a given box at a time, plain
__set_bit() is safe for the initialized flag without atomic overhead.
Convert all init_box() implementations to return 0 on success or a
negative error code on failure. This is a prerequisite for propagating
initialization errors to the caller so they can be handled properly.
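The new contract can be modeled outside the kernel as follows (a sketch
only; `model_box_init` and `try_init` are hypothetical stand-ins for
uncore_box_init() and the ops->init_box callback):

```c
#include <assert.h>

#define FLAG_INITIALIZED 1UL

/* Model of the new contract: the initialized flag is set only when
 * the init callback succeeds, so a failed box is reported instead of
 * silently counting as ready; a second call is a no-op. */
static int model_box_init(unsigned long *flags, int (*try_init)(void))
{
	int ret = 0;

	if (!(*flags & FLAG_INITIALIZED) && try_init) {
		ret = try_init();
		if (!ret)
			*flags |= FLAG_INITIALIZED;
	}
	return ret;
}

static int init_ok(void)   { return 0; }
static int init_fail(void) { return -5; } /* -EIO */
```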
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.h | 16 +++--
arch/x86/events/intel/uncore_discovery.c | 15 +++--
arch/x86/events/intel/uncore_discovery.h | 6 +-
arch/x86/events/intel/uncore_nhmex.c | 3 +-
arch/x86/events/intel/uncore_snb.c | 80 +++++++++++++++---------
arch/x86/events/intel/uncore_snbep.c | 69 ++++++++++++--------
6 files changed, 120 insertions(+), 69 deletions(-)
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 7d4ef869d193..5ee05545116a 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -129,7 +129,7 @@ struct intel_uncore_type {
#define events_group attr_groups[2]
struct intel_uncore_ops {
- void (*init_box)(struct intel_uncore_box *);
+ int (*init_box)(struct intel_uncore_box *);
void (*exit_box)(struct intel_uncore_box *);
void (*disable_box)(struct intel_uncore_box *);
void (*enable_box)(struct intel_uncore_box *);
@@ -557,12 +557,18 @@ static inline u64 uncore_read_counter(struct intel_uncore_box *box,
return box->pmu->type->ops->read_counter(box, event);
}
-static inline void uncore_box_init(struct intel_uncore_box *box)
+static inline int uncore_box_init(struct intel_uncore_box *box)
{
- if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
- if (box->pmu->type->ops->init_box)
- box->pmu->type->ops->init_box(box);
+ int ret = 0;
+
+ if (!test_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags) &&
+ box->pmu->type->ops->init_box) {
+ ret = box->pmu->type->ops->init_box(box);
+ if (!ret)
+ __set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags);
}
+
+ return ret;
}
static inline void uncore_box_exit(struct intel_uncore_box *box)
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 49183d607a34..f1156ad03523 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -488,9 +488,10 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
return unit->addr;
}
-void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
+int intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
{
wrmsrq(intel_generic_uncore_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
+ return 0;
}
void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
@@ -565,13 +566,14 @@ static inline int intel_pci_uncore_box_ctl(struct intel_uncore_box *box)
return UNCORE_DISCOVERY_PCI_BOX_CTRL(intel_generic_uncore_box_ctl(box));
}
-void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
+int intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = intel_pci_uncore_box_ctl(box);
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
+ return 0;
}
void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
@@ -632,7 +634,7 @@ static struct intel_uncore_ops generic_uncore_pci_ops = {
#define UNCORE_GENERIC_MMIO_SIZE 0x4000
-void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+int intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
{
static struct intel_uncore_discovery_unit *unit;
struct intel_uncore_type *type = box->pmu->type;
@@ -642,13 +644,13 @@ void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
if (!unit) {
pr_warn("Uncore type %d id %d: Cannot find box control address.\n",
type->type_id, box->pmu->pmu_idx);
- return;
+ return -ENODEV;
}
if (!unit->addr) {
pr_warn("Uncore type %d box %d: Invalid box control address.\n",
type->type_id, unit->id);
- return;
+ return -ENODEV;
}
addr = unit->addr;
@@ -656,10 +658,11 @@ void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
if (!box->io_addr) {
pr_warn("Uncore type %d box %d: ioremap error for 0x%llx.\n",
type->type_id, unit->id, (unsigned long long)addr);
- return;
+ return -ENOMEM;
}
writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
+ return 0;
}
void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index e1330342b92e..142e1b56cfc2 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -148,11 +148,11 @@ void intel_uncore_generic_uncore_cpu_init(void);
int intel_uncore_generic_uncore_pci_init(void);
void intel_uncore_generic_uncore_mmio_init(void);
-void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
+int intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box);
void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box);
-void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
+int intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box);
void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box);
void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
@@ -160,7 +160,7 @@ void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
struct perf_event *event);
-void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
+int intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box);
void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box);
void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
diff --git a/arch/x86/events/intel/uncore_nhmex.c b/arch/x86/events/intel/uncore_nhmex.c
index 8962e7cb21e3..7a6855281102 100644
--- a/arch/x86/events/intel/uncore_nhmex.c
+++ b/arch/x86/events/intel/uncore_nhmex.c
@@ -199,9 +199,10 @@ DEFINE_UNCORE_FORMAT_ATTR(counter, counter, "config:6-7");
DEFINE_UNCORE_FORMAT_ATTR(match, match, "config1:0-63");
DEFINE_UNCORE_FORMAT_ATTR(mask, mask, "config2:0-63");
-static void nhmex_uncore_msr_init_box(struct intel_uncore_box *box)
+static int nhmex_uncore_msr_init_box(struct intel_uncore_box *box)
{
wrmsrq(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
+ return 0;
}
static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index edddd4f9ab5f..c5347920541c 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -295,12 +295,14 @@ static void snb_uncore_msr_disable_event(struct intel_uncore_box *box, struct pe
wrmsrq(event->hw.config_base, 0);
}
-static void snb_uncore_msr_init_box(struct intel_uncore_box *box)
+static int snb_uncore_msr_init_box(struct intel_uncore_box *box)
{
if (box->pmu->pmu_idx == 0) {
wrmsrq(SNB_UNC_PERF_GLOBAL_CTL,
SNB_UNC_GLOBAL_CTL_EN | SNB_UNC_GLOBAL_CTL_CORE_ALL);
}
+
+ return 0;
}
static void snb_uncore_msr_enable_box(struct intel_uncore_box *box)
@@ -394,7 +396,7 @@ void snb_uncore_cpu_init(void)
snb_uncore_cbox.num_boxes = topology_num_cores_per_package();
}
-static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
+static int skl_uncore_msr_init_box(struct intel_uncore_box *box)
{
if (box->pmu->pmu_idx == 0) {
wrmsrq(SKL_UNC_PERF_GLOBAL_CTL,
@@ -404,6 +406,8 @@ static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
/* The 8th CBOX has different MSR space */
if (box->pmu->pmu_idx == 7)
__set_bit(UNCORE_BOX_FLAG_CFL8_CBOX_MSR_OFFS, &box->flags);
+
+ return 0;
}
static void skl_uncore_msr_enable_box(struct intel_uncore_box *box)
@@ -547,10 +551,12 @@ static struct intel_uncore_type *tgl_msr_uncores[] = {
NULL,
};
-static void rkl_uncore_msr_init_box(struct intel_uncore_box *box)
+static int rkl_uncore_msr_init_box(struct intel_uncore_box *box)
{
if (box->pmu->pmu_idx == 0)
wrmsrq(SKL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+
+ return 0;
}
void tgl_uncore_cpu_init(void)
@@ -707,9 +713,10 @@ static struct intel_uncore_type mtl_uncore_hac_cbox = {
.format_group = &adl_uncore_format_group,
};
-static void mtl_uncore_msr_init_box(struct intel_uncore_box *box)
+static int mtl_uncore_msr_init_box(struct intel_uncore_box *box)
{
wrmsrq(uncore_msr_box_ctl(box), SNB_UNC_GLOBAL_CTL_EN);
+ return 0;
}
static struct intel_uncore_ops mtl_uncore_msr_ops = {
@@ -773,10 +780,12 @@ static struct intel_uncore_type *lnl_msr_uncores[] = {
#define LNL_UNC_MSR_GLOBAL_CTL 0x240e
-static void lnl_uncore_msr_init_box(struct intel_uncore_box *box)
+static int lnl_uncore_msr_init_box(struct intel_uncore_box *box)
{
if (box->pmu->pmu_idx == 0)
wrmsrq(LNL_UNC_MSR_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+
+ return 0;
}
static struct intel_uncore_ops lnl_uncore_msr_ops = {
@@ -874,7 +883,7 @@ static const struct attribute_group snb_uncore_imc_format_group = {
.attrs = snb_uncore_imc_formats_attr,
};
-static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
+static int snb_uncore_imc_init_box(struct intel_uncore_box *box)
{
struct intel_uncore_type *type = box->pmu->type;
struct pci_dev *pdev = box->pci_dev;
@@ -893,10 +902,13 @@ static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
addr &= ~(PAGE_SIZE - 1);
box->io_addr = ioremap(addr, type->mmio_map_size);
- if (!box->io_addr)
+ if (!box->io_addr) {
pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
+ return -ENOMEM;
+ }
box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
+ return 0;
}
static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
@@ -1532,7 +1544,7 @@ static struct pci_dev *tgl_uncore_get_mc_dev(void)
#define TGL_UNCORE_MMIO_IMC_MEM_OFFSET 0x10000
#define TGL_UNCORE_PCI_IMC_MAP_SIZE 0xe000
-static void
+static int
uncore_get_box_mmio_addr(struct intel_uncore_box *box,
unsigned int base_offset,
int bar_offset, int step)
@@ -1541,19 +1553,20 @@ uncore_get_box_mmio_addr(struct intel_uncore_box *box,
struct intel_uncore_pmu *pmu = box->pmu;
struct intel_uncore_type *type = pmu->type;
resource_size_t addr;
+ int ret = 0;
u32 bar;
if (!pdev) {
pr_warn("perf uncore: Cannot find matched IMC device.\n");
- return;
+ return -ENODEV;
}
pci_read_config_dword(pdev, bar_offset, &bar);
if (!(bar & BIT(0))) {
pr_warn("perf uncore: BAR 0x%x is disabled. Failed to map %s counters.\n",
bar_offset, type->name);
- pci_dev_put(pdev);
- return;
+ ret = -ENODEV;
+ goto out;
}
bar &= ~BIT(0);
addr = (resource_size_t)(bar + step * pmu->pmu_idx);
@@ -1565,23 +1578,26 @@ uncore_get_box_mmio_addr(struct intel_uncore_box *box,
addr += base_offset;
box->io_addr = ioremap(addr, type->mmio_map_size);
- if (!box->io_addr)
+ if (!box->io_addr) {
+ ret = -ENOMEM;
pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
-
+ }
+out:
pci_dev_put(pdev);
+ return ret;
}
-static void __uncore_imc_init_box(struct intel_uncore_box *box,
+static int __uncore_imc_init_box(struct intel_uncore_box *box,
unsigned int base_offset)
{
- uncore_get_box_mmio_addr(box, base_offset,
+ return uncore_get_box_mmio_addr(box, base_offset,
SNB_UNCORE_PCI_IMC_BAR_OFFSET,
TGL_UNCORE_MMIO_IMC_MEM_OFFSET);
}
-static void tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+static int tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
{
- __uncore_imc_init_box(box, 0);
+ return __uncore_imc_init_box(box, 0);
}
static struct intel_uncore_ops tgl_uncore_imc_freerunning_ops = {
@@ -1648,13 +1664,15 @@ void tgl_uncore_mmio_init(void)
#define ADL_UNCORE_IMC_CTL_INT (ADL_UNCORE_IMC_CTL_RST_CTRL | \
ADL_UNCORE_IMC_CTL_RST_CTRS)
-static void adl_uncore_imc_init_box(struct intel_uncore_box *box)
+static int adl_uncore_imc_init_box(struct intel_uncore_box *box)
{
- __uncore_imc_init_box(box, ADL_UNCORE_IMC_BASE);
+ int ret = __uncore_imc_init_box(box, ADL_UNCORE_IMC_BASE);
/* The global control in MC1 can control both MCs. */
- if (box->io_addr && (box->pmu->pmu_idx == 1))
+ if (!ret && (box->pmu->pmu_idx == 1))
writel(ADL_UNCORE_IMC_CTL_INT, box->io_addr + ADL_UNCORE_IMC_GLOBAL_CTL);
+
+ return ret;
}
static void adl_uncore_mmio_disable_box(struct intel_uncore_box *box)
@@ -1731,9 +1749,9 @@ static struct freerunning_counters adl_uncore_imc_freerunning[] = {
[ADL_MMIO_UNCORE_IMC_DATA_WRITE] = { 0xA0, 0x0, 0x0, 1, 64 },
};
-static void adl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+static int adl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
{
- __uncore_imc_init_box(box, ADL_UNCORE_IMC_FREERUNNING_BASE);
+ return __uncore_imc_init_box(box, ADL_UNCORE_IMC_FREERUNNING_BASE);
}
static struct intel_uncore_ops adl_uncore_imc_freerunning_ops = {
@@ -1803,9 +1821,9 @@ static const struct attribute_group lnl_uncore_format_group = {
.attrs = lnl_uncore_formats_attr,
};
-static void lnl_uncore_hbo_init_box(struct intel_uncore_box *box)
+static int lnl_uncore_hbo_init_box(struct intel_uncore_box *box)
{
- uncore_get_box_mmio_addr(box, LNL_UNCORE_HBO_BASE,
+ return uncore_get_box_mmio_addr(box, LNL_UNCORE_HBO_BASE,
LNL_UNCORE_PCI_SAFBAR_OFFSET,
LNL_UNCORE_HBO_OFFSET);
}
@@ -1829,14 +1847,16 @@ static struct intel_uncore_type lnl_uncore_hbo = {
.format_group = &lnl_uncore_format_group,
};
-static void lnl_uncore_sncu_init_box(struct intel_uncore_box *box)
+static int lnl_uncore_sncu_init_box(struct intel_uncore_box *box)
{
- uncore_get_box_mmio_addr(box, LNL_UNCORE_SNCU_BASE,
+ int ret = uncore_get_box_mmio_addr(box, LNL_UNCORE_SNCU_BASE,
LNL_UNCORE_PCI_SAFBAR_OFFSET,
0);
- if (box->io_addr)
+ if (!ret)
writel(ADL_UNCORE_IMC_CTL_INT, box->io_addr + LNL_UNCORE_GLOBAL_CTL);
+
+ return ret;
}
static struct intel_uncore_ops lnl_uncore_sncu_ops = {
@@ -1887,13 +1907,15 @@ static struct intel_uncore_type ptl_uncore_imc = {
.mmio_map_size = 0xf00,
};
-static void ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
+static int ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
{
- intel_generic_uncore_mmio_init_box(box);
+ int ret = intel_generic_uncore_mmio_init_box(box);
/* Clear the global freeze bit */
if (box->io_addr)
writel(0, box->io_addr + PTL_UNCORE_GLOBAL_CTL_OFFSET);
+
+ return ret;
}
static struct intel_uncore_ops ptl_uncore_sncu_ops = {
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 251c7bdbe30b..fbc51fa1b705 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -627,12 +627,13 @@ static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box, struct pe
return count;
}
-static void snbep_uncore_pci_init_box(struct intel_uncore_box *box)
+static int snbep_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = uncore_pci_box_ctl(box);
pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
+ return 0;
}
static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box)
@@ -680,12 +681,14 @@ static void snbep_uncore_msr_disable_event(struct intel_uncore_box *box,
wrmsrq(hwc->config_base, hwc->config);
}
-static void snbep_uncore_msr_init_box(struct intel_uncore_box *box)
+static int snbep_uncore_msr_init_box(struct intel_uncore_box *box)
{
unsigned msr = uncore_msr_box_ctl(box);
if (msr)
wrmsrq(msr, SNBEP_PMON_BOX_CTL_INT);
+
+ return 0;
}
static struct attribute *snbep_uncore_formats_attr[] = {
@@ -1507,18 +1510,21 @@ int snbep_uncore_pci_init(void)
/* end of Sandy Bridge-EP uncore support */
/* IvyTown uncore support */
-static void ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
+static int ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
{
unsigned msr = uncore_msr_box_ctl(box);
if (msr)
wrmsrq(msr, IVBEP_PMON_BOX_CTL_INT);
+
+ return 0;
}
-static void ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
+static int ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
+ return 0;
}
#define IVBEP_UNCORE_MSR_OPS_COMMON_INIT() \
@@ -2784,7 +2790,7 @@ static struct intel_uncore_type hswep_uncore_cbox = {
/*
* Write SBOX Initialization register bit by bit to avoid spurious #GPs
*/
-static void hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
+static int hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
{
unsigned msr = uncore_msr_box_ctl(box);
@@ -2798,6 +2804,8 @@ static void hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
wrmsrq(msr, flags);
}
}
+
+ return 0;
}
static struct intel_uncore_ops hswep_uncore_sbox_msr_ops = {
@@ -4160,12 +4168,13 @@ static const struct attribute_group skx_upi_uncore_format_group = {
.attrs = skx_upi_uncore_formats_attr,
};
-static void skx_upi_uncore_pci_init_box(struct intel_uncore_box *box)
+static int skx_upi_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
pci_write_config_dword(pdev, SKX_UPI_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
+ return 0;
}
static struct intel_uncore_ops skx_upi_uncore_pci_ops = {
@@ -4321,12 +4330,13 @@ static struct intel_uncore_type skx_uncore_upi = {
.cleanup_mapping = skx_upi_cleanup_mapping,
};
-static void skx_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
+static int skx_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
pci_write_config_dword(pdev, SKX_M2M_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
+ return 0;
}
static struct intel_uncore_ops skx_m2m_uncore_pci_ops = {
@@ -4829,13 +4839,14 @@ void snr_uncore_cpu_init(void)
uncore_msr_uncores = snr_msr_uncores;
}
-static void snr_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
+static int snr_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = uncore_pci_box_ctl(box);
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
pci_write_config_dword(pdev, box_ctl, IVBEP_PMON_BOX_CTL_INT);
+ return 0;
}
static struct intel_uncore_ops snr_m2m_uncore_pci_ops = {
@@ -5008,17 +5019,22 @@ static int snr_uncore_mmio_map(struct intel_uncore_box *box,
return 0;
}
-static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
+static int __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
unsigned int box_ctl, int mem_offset,
unsigned int device)
{
- if (!snr_uncore_mmio_map(box, box_ctl, mem_offset, device))
+ int ret;
+
+ ret = snr_uncore_mmio_map(box, box_ctl, mem_offset, device);
+ if (!ret)
writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
+
+ return ret;
}
-static void snr_uncore_mmio_init_box(struct intel_uncore_box *box)
+static int snr_uncore_mmio_init_box(struct intel_uncore_box *box)
{
- __snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
+ return __snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
SNR_IMC_MMIO_MEM0_OFFSET,
SNR_MC_DEVICE_ID);
}
@@ -5635,14 +5651,14 @@ int icx_uncore_pci_init(void)
return 0;
}
-static void icx_uncore_imc_init_box(struct intel_uncore_box *box)
+static int icx_uncore_imc_init_box(struct intel_uncore_box *box)
{
unsigned int box_ctl = box->pmu->type->box_ctl +
box->pmu->type->mmio_offset * (box->pmu->pmu_idx % ICX_NUMBER_IMC_CHN);
int mem_offset = (box->pmu->pmu_idx / ICX_NUMBER_IMC_CHN) * ICX_IMC_MEM_STRIDE +
SNR_IMC_MMIO_MEM0_OFFSET;
- __snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
+ return __snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
SNR_MC_DEVICE_ID);
}
@@ -5699,12 +5715,12 @@ static struct uncore_event_desc icx_uncore_imc_freerunning_events[] = {
{ /* end: all zeroes */ },
};
-static void icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+static int icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
{
int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE +
SNR_IMC_MMIO_MEM0_OFFSET;
- snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+ return snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
mem_offset, SNR_MC_DEVICE_ID);
}
@@ -6001,10 +6017,10 @@ static struct intel_uncore_type spr_uncore_mdf = {
.name = "mdf",
};
-static void spr_uncore_mmio_offs8_init_box(struct intel_uncore_box *box)
+static int spr_uncore_mmio_offs8_init_box(struct intel_uncore_box *box)
{
__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
- intel_generic_uncore_mmio_init_box(box);
+ return intel_generic_uncore_mmio_init_box(box);
}
static struct intel_uncore_ops spr_uncore_mmio_offs8_ops = {
@@ -6185,12 +6201,11 @@ static struct uncore_event_desc spr_uncore_imc_freerunning_events[] = {
#define SPR_MC_DEVICE_ID 0x3251
-static void spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+static int spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
{
int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE + SNR_IMC_MMIO_MEM0_OFFSET;
-
- snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
- mem_offset, SPR_MC_DEVICE_ID);
+ return snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+ mem_offset, SPR_MC_DEVICE_ID);
}
static struct intel_uncore_ops spr_uncore_imc_freerunning_ops = {
@@ -6879,20 +6894,24 @@ static unsigned int dmr_iio_freerunning_box_offsets[] = {
0x0, 0x8000, 0x18000, 0x20000
};
-static void dmr_uncore_freerunning_init_box(struct intel_uncore_box *box)
+static int dmr_uncore_freerunning_init_box(struct intel_uncore_box *box)
{
struct intel_uncore_type *type = box->pmu->type;
u64 mmio_base;
if (box->pmu->pmu_idx >= type->num_boxes)
- return;
+ return -ENODEV;
mmio_base = DMR_IMH1_HIOP_MMIO_BASE;
mmio_base += dmr_iio_freerunning_box_offsets[box->pmu->pmu_idx];
box->io_addr = ioremap(mmio_base, type->mmio_map_size);
- if (!box->io_addr)
+ if (!box->io_addr) {
pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
+ return -ENOMEM;
+ }
+
+ return 0;
}
static struct intel_uncore_ops dmr_uncore_freerunning_ops = {
--
2.54.0
* [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
2026-05-12 23:30 ` [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups Zide Chen
2026-05-12 23:30 ` [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:30 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code Zide Chen
` (3 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
uncore_event_cpu_online() returns -ENOMEM early when both the MSR and
MMIO box allocations fail. This also aborts PCI uncore setup, even
though PCI PMUs are independent of the MSR/MMIO paths.
Remove the early return so PCI uncore setup always runs regardless
of whether MSR or MMIO box allocation succeeds.
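The control-flow change can be sketched in userspace C (helper names are illustrative, not the kernel's): both MSR and MMIO setup are simulated as failing, and the point is that the independent PCI path must still run.

```c
#include <assert.h>
#include <errno.h>

/* Simulated per-backend setup: MSR and MMIO fail, PCI succeeds. */
static int pci_ran;

static int setup_msr(void)  { return -ENOMEM; }  /* simulated failure */
static int setup_mmio(void) { return -ENOMEM; }  /* simulated failure */
static int setup_pci(void)  { pci_ran = 1; return 0; }

/* Before the fix: both failures trigger an early return, so the
 * PCI setup below never runs. */
static int online_buggy(void)
{
	int msr_ret = setup_msr();
	int mmio_ret = setup_mmio();

	if (msr_ret && mmio_ret)
		return -ENOMEM;
	return setup_pci();
}

/* After the fix: PCI setup runs regardless of MSR/MMIO status. */
static int online_fixed(void)
{
	setup_msr();
	setup_mmio();
	return setup_pci();
}
```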
Fixes: 3da04b8a00dd ("perf/x86/intel/uncore: Support MMIO type uncore blocks")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 012a7e081014..8d5170788af2 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1632,8 +1632,6 @@ static int uncore_event_cpu_online(unsigned int cpu)
die = topology_logical_die_id(cpu);
msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
- if (msr_ret && mmio_ret)
- return -ENOMEM;
/*
* Check if there is an online cpu in the package
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
` (2 preceding siblings ...)
2026-05-12 23:30 ` [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:27 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state Zide Chen
` (2 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
The PCI uncore PMU path already implements a lazy registration model:
the PMU is registered when the first active box appears and
unregistered when the last active box is removed.
Factor this registration management into a shared helper, so the same
code can be reused by the MSR and MMIO paths in later changes.
No functional change intended.
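A minimal sketch of the helper's registration logic, as a non-atomic single-PMU userspace model (names are illustrative, not the kernel's): only the first box on a PMU triggers registration, and a registration failure is reported back so the caller can unwind the box.

```c
#include <assert.h>
#include <errno.h>

static int die_refcnt;
static int registered;
static int register_fails;	/* toggle to simulate a failure */

static int pmu_register(void)
{
	if (register_fails)
		return -EINVAL;
	registered = 1;
	return 0;
}

static int box_setup(void)
{
	int dies = ++die_refcnt;	/* atomic_inc_return() in the kernel */

	/* First active box registers the pmu; later boxes only
	 * bump the die refcount. */
	if (dies > 1)
		return 0;

	return pmu_register();
}
```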
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 39 ++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 8d5170788af2..00ed4e5047ac 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1148,6 +1148,29 @@ uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
return pmu;
}
+static int uncore_box_setup(struct intel_uncore_pmu *pmu,
+ struct intel_uncore_box *box)
+{
+ int dies, ret;
+
+ /* die_refcnt tracks online dies, not only functioning boxes. */
+ dies = atomic_inc_return(&pmu->die_refcnt);
+ uncore_box_init(box);
+
+ /* First active box registers the pmu. */
+ if (dies > 1)
+ return 0;
+
+ ret = uncore_pmu_register(pmu);
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ uncore_box_exit(box);
+ return ret;
+}
+
/*
* Register the PMU for a PCI device
* @pdev: The PCI device.
@@ -1173,19 +1196,13 @@ static int uncore_pci_pmu_register(struct pci_dev *pdev,
box->dieid = die;
box->pci_dev = pdev;
box->pmu = pmu;
- uncore_box_init(box);
- pmu->boxes[die] = box;
- if (atomic_inc_return(&pmu->die_refcnt) > 1)
- return 0;
-
- /* First active box registers the pmu */
- ret = uncore_pmu_register(pmu);
- if (ret) {
- pmu->boxes[die] = NULL;
- uncore_box_exit(box);
+ ret = uncore_box_setup(pmu, box);
+ if (!ret)
+ pmu->boxes[die] = box;
+ else
kfree(box);
- }
+
return ret;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
` (3 preceding siblings ...)
2026-05-12 23:30 ` [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:28 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug Zide Chen
2026-05-12 23:30 ` [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU Zide Chen
6 siblings, 1 reply; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
Replace the boolean 'registered' field in intel_uncore_pmu with an
unsigned long 'flags' field, and add a PMU_BROKEN flag to track box
setup failures.
When any box fails to initialize, the PMU is marked broken. Broken
PMUs reject new event assignments and skip future box setup attempts.
If the PMU was already registered, it remains so to avoid disrupting
in-flight events on other boxes.
To prevent retry loops, die_refcnt and cpu_refcnt are not decremented
on failure, and broken PMUs are skipped in the CPU hotplug and box
allocation paths.
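The two-bit state can be sketched in userspace C; the kernel uses test_bit()/set_bit() on pmu->flags, but plain bit arithmetic is enough to show the state machine this patch introduces.

```c
#include <assert.h>

/* Mirrors the two flag bits added to struct intel_uncore_pmu. */
#define PMU_REGISTERED	(1UL << 0)
#define PMU_BROKEN	(1UL << 1)

/* Mirrors uncore_pmu_available(): events are accepted only while
 * the PMU is registered and not marked broken. */
static int pmu_available(unsigned long flags)
{
	return (flags & PMU_REGISTERED) && !(flags & PMU_BROKEN);
}
```

A registered PMU that later trips the broken bit keeps its registration (so in-flight events survive) but stops accepting new ones.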
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 36 +++++++++++++++++++++++-------
arch/x86/events/intel/uncore.h | 12 +++++++++-
arch/x86/events/intel/uncore_snb.c | 2 +-
3 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 00ed4e5047ac..922ba299533e 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -757,7 +757,7 @@ static int uncore_pmu_event_init(struct perf_event *event)
pmu = uncore_event_to_pmu(event);
/* no device found for this pmu */
- if (!pmu->registered)
+ if (!uncore_pmu_available(pmu))
return -ENOENT;
/* Sampling not supported yet */
@@ -953,16 +953,16 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (!ret)
- pmu->registered = true;
+ uncore_pmu_set_registered(pmu);
return ret;
}
static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
{
- if (!pmu->registered)
+ if (!uncore_pmu_registered(pmu))
return;
perf_pmu_unregister(&pmu->pmu);
- pmu->registered = false;
+ WRITE_ONCE(pmu->flags, 0);
}
static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
@@ -1155,7 +1155,13 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
/* die_refcnt tracks online dies, not only functioning boxes. */
dies = atomic_inc_return(&pmu->die_refcnt);
- uncore_box_init(box);
+
+ if (uncore_pmu_broken(pmu))
+ return -ENODEV;
+
+ ret = uncore_box_init(box);
+ if (ret)
+ goto err;
/* First active box registers the pmu. */
if (dies > 1)
@@ -1167,6 +1173,19 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
return 0;
err:
+ /*
+ * On failure on any box, mark the per-package PMU as broken regardless
+ * of whether it was registered or not.
+ *
+ * Don't decrement die_refcnt, to prevent any future CPU online
+ * event or PCI probe from retrying the failed PMU registration.
+ *
+ * Don't decrement cpu_refcnt, to keep other in-die CPUs from
+ * trying to set up the PMU box again.
+ *
+ * Don't kfree box; MSR and MMIO boxes are freed at module exit only.
+ */
+ uncore_pmu_set_broken(pmu);
uncore_box_exit(box);
return ret;
}
@@ -1502,7 +1521,8 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
if (old_cpu < 0) {
WARN_ON_ONCE(box->cpu != -1);
- if (uncore_die_has_box(type, die, pmu->pmu_idx)) {
+ if (uncore_die_has_box(type, die, pmu->pmu_idx) &&
+ !uncore_pmu_broken(pmu)) {
box->cpu = new_cpu;
cpumask_set_cpu(new_cpu, &pmu->cpu_mask);
}
@@ -1512,7 +1532,7 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
WARN_ON_ONCE(box->cpu != -1 && box->cpu != old_cpu);
box->cpu = -1;
cpumask_clear_cpu(old_cpu, &pmu->cpu_mask);
- if (new_cpu < 0)
+ if (new_cpu < 0 || uncore_pmu_broken(pmu))
continue;
if (!uncore_die_has_box(type, die, pmu->pmu_idx))
@@ -1592,7 +1612,7 @@ static int allocate_boxes(struct intel_uncore_type **types,
type = *types;
pmu = type->pmus;
for (i = 0; i < type->num_boxes; i++, pmu++) {
- if (pmu->boxes[die])
+ if (pmu->boxes[die] || uncore_pmu_broken(pmu))
continue;
box = uncore_alloc_box(type, cpu_to_node(cpu));
if (!box)
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 5ee05545116a..4d3a99bf1455 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -146,13 +146,23 @@ struct intel_uncore_pmu {
struct pmu pmu;
char name[UNCORE_PMU_NAME_LEN];
int pmu_idx;
- bool registered;
+ unsigned long flags;
atomic_t die_refcnt;
cpumask_t cpu_mask;
struct intel_uncore_type *type;
struct intel_uncore_box **boxes;
};
+#define PMU_REGISTERED_BIT 0
+#define PMU_BROKEN_BIT 1
+
+#define uncore_pmu_registered(pmu) test_bit(PMU_REGISTERED_BIT, &(pmu)->flags)
+#define uncore_pmu_broken(pmu) test_bit(PMU_BROKEN_BIT, &(pmu)->flags)
+#define uncore_pmu_available(pmu) (uncore_pmu_registered(pmu) && \
+ !uncore_pmu_broken(pmu))
+#define uncore_pmu_set_registered(pmu) set_bit(PMU_REGISTERED_BIT, &(pmu)->flags)
+#define uncore_pmu_set_broken(pmu) set_bit(PMU_BROKEN_BIT, &(pmu)->flags)
+
struct intel_uncore_extra_reg {
raw_spinlock_t lock;
u64 config, config1, config2;
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index c5347920541c..055131c508ff 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -940,7 +940,7 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
pmu = uncore_event_to_pmu(event);
/* no device found for this pmu */
- if (!pmu->registered)
+ if (!uncore_pmu_available(pmu))
return -ENOENT;
/* Sampling not supported yet */
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
` (4 preceding siblings ...)
2026-05-12 23:30 ` [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:32 ` Ian Rogers
2026-05-13 8:59 ` Mi, Dapeng
2026-05-12 23:30 ` [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU Zide Chen
6 siblings, 2 replies; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
In uncore_event_cpu_online(), uncore_box_ref() was called before
uncore_change_context(). uncore_box_ref() gates on box->cpu >= 0,
but box->cpu is still -1 at that point because uncore_change_context()
has not run yet. As a result, the box is never initialized on the
first CPU to come online in a die, leaving it permanently
uninitialized in the single-CPU-per-die case.
Consequently, cpu_refcnt ends up one below its true value, so in the
CPU offline path the box is torn down when the second-to-last CPU goes
offline.
In uncore_event_cpu_offline(), uncore_box_unref() was called after
uncore_change_context(), so box->cpu is already -1 when the collector
CPU goes offline, which prevents it from tearing down the box.
Fix by swapping the call order in both paths so that
uncore_box_{ref,unref}() runs at the point where box->cpu reflects
the correct context.
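The ordering bug in the online path can be modeled with a few lines of userspace C (names are illustrative, not the kernel's): box_ref() gates initialization on the collector CPU being assigned, but that assignment only happens in change_context().

```c
#include <assert.h>

static int box_cpu;
static int box_inited;

static void reset(void)           { box_cpu = -1; box_inited = 0; }
static void change_context(int c) { box_cpu = c; }

static void box_ref(void)
{
	if (box_cpu >= 0)	/* the gate that skipped init in the old order */
		box_inited = 1;
}

/* Old order: ref before the context change, so the first CPU to come
 * online in a die never initializes the box. */
static void online_buggy(int cpu) { box_ref(); change_context(cpu); }

/* Fixed order: assign the collector CPU first, then set up the box. */
static void online_fixed(int cpu) { change_context(cpu); box_ref(); }
```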
Fixes: c74443d92f68 ("perf/x86/uncore: Support per PMU cpumask")
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 56 ++++++++++++++++++----------------
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 922ba299533e..399f434e1a7d 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1574,9 +1574,15 @@ static int uncore_event_cpu_offline(unsigned int cpu)
{
int die, target;
+ /* Clear the references */
+ die = topology_logical_die_id(cpu);
+ uncore_box_unref(uncore_msr_uncores, die);
+ uncore_box_unref(uncore_mmio_uncores, die);
+
/* Check if exiting cpu is used for collecting uncore events */
if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
- goto unref;
+ return 0;
+
/* Find a new cpu to collect uncore events */
target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
@@ -1589,20 +1595,14 @@ static int uncore_event_cpu_offline(unsigned int cpu)
uncore_change_context(uncore_msr_uncores, cpu, target);
uncore_change_context(uncore_mmio_uncores, cpu, target);
uncore_change_context(uncore_pci_uncores, cpu, target);
-
-unref:
- /* Clear the references */
- die = topology_logical_die_id(cpu);
- uncore_box_unref(uncore_msr_uncores, die);
- uncore_box_unref(uncore_mmio_uncores, die);
return 0;
}
-static int allocate_boxes(struct intel_uncore_type **types,
+static void allocate_boxes(struct intel_uncore_type **types,
unsigned int die, unsigned int cpu)
{
struct intel_uncore_box *box, *tmp;
- struct intel_uncore_type *type;
+ struct intel_uncore_type *type, **start = types;
struct intel_uncore_pmu *pmu;
LIST_HEAD(allocated);
int i;
@@ -1627,14 +1627,21 @@ static int allocate_boxes(struct intel_uncore_type **types,
list_del_init(&box->active_list);
box->pmu->boxes[die] = box;
}
- return 0;
+ return;
cleanup:
list_for_each_entry_safe(box, tmp, &allocated, active_list) {
list_del_init(&box->active_list);
kfree(box);
}
- return -ENOMEM;
+
+ /* Mark the PMU broken to prevent future usage. */
+ for (; *start; start++) {
+ type = *start;
+ pmu = type->pmus;
+ for (i = 0; i < type->num_boxes; i++, pmu++)
+ uncore_pmu_set_broken(pmu);
+ }
}
static int uncore_box_ref(struct intel_uncore_type **types,
@@ -1643,11 +1650,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
struct intel_uncore_type *type;
struct intel_uncore_pmu *pmu;
struct intel_uncore_box *box;
- int i, ret;
-
- ret = allocate_boxes(types, die, cpu);
- if (ret)
- return ret;
+ int i;
for (; *types; types++) {
type = *types;
@@ -1664,27 +1667,26 @@ static int uncore_box_ref(struct intel_uncore_type **types,
static int uncore_event_cpu_online(unsigned int cpu)
{
- int die, target, msr_ret, mmio_ret;
+ int die, target;
die = topology_logical_die_id(cpu);
- msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
- mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
+ allocate_boxes(uncore_msr_uncores, die, cpu);
+ allocate_boxes(uncore_mmio_uncores, die, cpu);
/*
* Check if there is an online cpu in the package
* which collects uncore events already.
*/
target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu));
- if (target < nr_cpu_ids)
- return 0;
-
- cpumask_set_cpu(cpu, &uncore_cpu_mask);
-
- if (!msr_ret)
+ if (target >= nr_cpu_ids) {
+ cpumask_set_cpu(cpu, &uncore_cpu_mask);
uncore_change_context(uncore_msr_uncores, -1, cpu);
- if (!mmio_ret)
uncore_change_context(uncore_mmio_uncores, -1, cpu);
- uncore_change_context(uncore_pci_uncores, -1, cpu);
+ uncore_change_context(uncore_pci_uncores, -1, cpu);
+ }
+
+ uncore_box_ref(uncore_msr_uncores, die, cpu);
+ uncore_box_ref(uncore_mmio_uncores, die, cpu);
return 0;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
` (5 preceding siblings ...)
2026-05-12 23:30 ` [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug Zide Chen
@ 2026-05-12 23:30 ` Zide Chen
2026-05-13 0:34 ` Ian Rogers
2026-05-13 9:03 ` Mi, Dapeng
6 siblings, 2 replies; 19+ messages in thread
From: Zide Chen @ 2026-05-12 23:30 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen
MSR and MMIO uncore PMUs are currently registered at module init time
and appear in sysfs even when no PMU boxes are functional.
Apply the same lazy registration model used by PCI uncore PMUs: the
PMU is registered when the first box is successfully initialized, and
unregistered when the last box exits. If a box fails to initialize on
a subsequent die, the PMU is marked broken but remains registered to
avoid disrupting any in-flight perf events.
Box allocation and freeing remain at module init/exit time to avoid
repeated alloc/kfree cycles across CPU offline/online events.
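The resulting lifecycle can be sketched as a non-atomic single-PMU userspace model (illustrative names, not the kernel's): register on the first successful box setup, unregister when the last reference drops.

```c
#include <assert.h>

static int die_refcnt;
static int registered;

static void box_setup(void)
{
	if (++die_refcnt == 1)
		registered = 1;		/* uncore_pmu_register() */
}

static void box_unref(void)
{
	if (--die_refcnt == 0)
		registered = 0;		/* uncore_pmu_unregister() */
}
```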
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
arch/x86/events/intel/uncore.c | 72 ++++++----------------------------
1 file changed, 12 insertions(+), 60 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 399f434e1a7d..2aaac0b49bb6 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1564,8 +1564,11 @@ static void uncore_box_unref(struct intel_uncore_type **types, int die)
for (i = 0; i < type->num_boxes; i++, pmu++) {
box = pmu->boxes[die];
if (box && box->cpu >= 0 &&
- atomic_dec_return(&box->cpu_refcnt) == 0)
+ atomic_dec_return(&box->cpu_refcnt) == 0) {
+ if (atomic_dec_return(&pmu->die_refcnt) == 0)
+ uncore_pmu_unregister(pmu);
uncore_box_exit(box);
+ }
}
}
}
@@ -1659,7 +1662,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
box = pmu->boxes[die];
if (box && box->cpu >= 0 &&
atomic_inc_return(&box->cpu_refcnt) == 1)
- uncore_box_init(box);
+ uncore_box_setup(pmu, box);
}
}
return 0;
@@ -1690,67 +1693,16 @@ static int uncore_event_cpu_online(unsigned int cpu)
return 0;
}
-static int __init type_pmu_register(struct intel_uncore_type *type)
+static int __init uncore_cpu_mmio_init(struct intel_uncore_type **types)
{
- int i, ret;
-
- for (i = 0; i < type->num_boxes; i++) {
- ret = uncore_pmu_register(&type->pmus[i]);
- if (ret)
- return ret;
- }
- return 0;
-}
-
-static int __init uncore_msr_pmus_register(void)
-{
- struct intel_uncore_type **types = uncore_msr_uncores;
- int ret;
-
- for (; *types; types++) {
- ret = type_pmu_register(*types);
- if (ret)
- return ret;
- }
- return 0;
-}
-
-static int __init uncore_cpu_init(void)
-{
- int ret;
-
- ret = uncore_types_init(uncore_msr_uncores);
- if (ret)
- goto err;
-
- ret = uncore_msr_pmus_register();
- if (ret)
- goto err;
- return 0;
-err:
- uncore_types_exit(uncore_msr_uncores);
- uncore_msr_uncores = empty_uncore;
- return ret;
-}
-
-static int __init uncore_mmio_init(void)
-{
- struct intel_uncore_type **types = uncore_mmio_uncores;
int ret;
ret = uncore_types_init(types);
- if (ret)
- goto err;
+ if (!ret)
+ return 0;
- for (; *types; types++) {
- ret = type_pmu_register(*types);
- if (ret)
- goto err;
- }
- return 0;
-err:
- uncore_types_exit(uncore_mmio_uncores);
- uncore_mmio_uncores = empty_uncore;
+ uncore_types_exit(types);
+ types = empty_uncore;
return ret;
}
@@ -2052,12 +2004,12 @@ static int __init intel_uncore_init(void)
if (uncore_init->cpu_init) {
uncore_init->cpu_init();
- cret = uncore_cpu_init();
+ cret = uncore_cpu_mmio_init(uncore_msr_uncores);
}
if (uncore_init->mmio_init) {
uncore_init->mmio_init();
- mret = uncore_mmio_init();
+ mret = uncore_cpu_mmio_init(uncore_mmio_uncores);
}
if (cret && pret && mret) {
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures
2026-05-12 23:30 ` [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures Zide Chen
@ 2026-05-13 0:23 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:23 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> The init_box() callback currently returns void, so initialization
> failures are silently ignored and the box is still marked initialized.
>
> Change the callback to return int so platform code can report errors
> back to the common uncore layer.
>
> Update uncore_box_init() to set the initialized flag only when
> init_box() succeeds. Because box->cpu_refcnt guarantees that at most
> one CPU calls uncore_box_init() for a given box at a time, plain
> __set_bit() is safe for the initialized flag without atomic overhead.
>
> Convert all init_box() implementations to return 0 on success or a
> negative error code on failure. This is a prerequisite for propagating
> initialization errors to the caller so they can be handled properly.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.h | 16 +++--
> arch/x86/events/intel/uncore_discovery.c | 15 +++--
> arch/x86/events/intel/uncore_discovery.h | 6 +-
> arch/x86/events/intel/uncore_nhmex.c | 3 +-
> arch/x86/events/intel/uncore_snb.c | 80 +++++++++++++++---------
> arch/x86/events/intel/uncore_snbep.c | 69 ++++++++++++--------
> 6 files changed, 120 insertions(+), 69 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
> index 7d4ef869d193..5ee05545116a 100644
> --- a/arch/x86/events/intel/uncore.h
> +++ b/arch/x86/events/intel/uncore.h
> @@ -129,7 +129,7 @@ struct intel_uncore_type {
> #define events_group attr_groups[2]
>
> struct intel_uncore_ops {
> - void (*init_box)(struct intel_uncore_box *);
> + int (*init_box)(struct intel_uncore_box *);
> void (*exit_box)(struct intel_uncore_box *);
> void (*disable_box)(struct intel_uncore_box *);
> void (*enable_box)(struct intel_uncore_box *);
> @@ -557,12 +557,18 @@ static inline u64 uncore_read_counter(struct intel_uncore_box *box,
> return box->pmu->type->ops->read_counter(box, event);
> }
>
> -static inline void uncore_box_init(struct intel_uncore_box *box)
> +static inline int uncore_box_init(struct intel_uncore_box *box)
> {
> - if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
> - if (box->pmu->type->ops->init_box)
> - box->pmu->type->ops->init_box(box);
> + int ret = 0;
> +
> + if (!test_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags) &&
> + box->pmu->type->ops->init_box) {
> + ret = box->pmu->type->ops->init_box(box);
> + if (!ret)
> + __set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags);
> }
> +
> + return ret;
> }
>
> static inline void uncore_box_exit(struct intel_uncore_box *box)
> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
> index 49183d607a34..f1156ad03523 100644
> --- a/arch/x86/events/intel/uncore_discovery.c
> +++ b/arch/x86/events/intel/uncore_discovery.c
> @@ -488,9 +488,10 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
> return unit->addr;
> }
>
> -void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
> +int intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> wrmsrq(intel_generic_uncore_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
> @@ -565,13 +566,14 @@ static inline int intel_pci_uncore_box_ctl(struct intel_uncore_box *box)
> return UNCORE_DISCOVERY_PCI_BOX_CTRL(intel_generic_uncore_box_ctl(box));
> }
>
> -void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
> +int intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
> int box_ctl = intel_pci_uncore_box_ctl(box);
>
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
> @@ -632,7 +634,7 @@ static struct intel_uncore_ops generic_uncore_pci_ops = {
>
> #define UNCORE_GENERIC_MMIO_SIZE 0x4000
>
> -void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
> +int intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
> {
> static struct intel_uncore_discovery_unit *unit;
> struct intel_uncore_type *type = box->pmu->type;
> @@ -642,13 +644,13 @@ void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
> if (!unit) {
> pr_warn("Uncore type %d id %d: Cannot find box control address.\n",
> type->type_id, box->pmu->pmu_idx);
> - return;
> + return -ENODEV;
> }
>
> if (!unit->addr) {
> pr_warn("Uncore type %d box %d: Invalid box control address.\n",
> type->type_id, unit->id);
> - return;
> + return -ENODEV;
> }
>
> addr = unit->addr;
> @@ -656,10 +658,11 @@ void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
> if (!box->io_addr) {
> pr_warn("Uncore type %d box %d: ioremap error for 0x%llx.\n",
> type->type_id, unit->id, (unsigned long long)addr);
> - return;
> + return -ENOMEM;
> }
>
> writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
> + return 0;
> }
>
> void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
> diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
> index e1330342b92e..142e1b56cfc2 100644
> --- a/arch/x86/events/intel/uncore_discovery.h
> +++ b/arch/x86/events/intel/uncore_discovery.h
> @@ -148,11 +148,11 @@ void intel_uncore_generic_uncore_cpu_init(void);
> int intel_uncore_generic_uncore_pci_init(void);
> void intel_uncore_generic_uncore_mmio_init(void);
>
> -void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
> +int intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
> void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box);
> void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box);
>
> -void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
> +int intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
> void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box);
> void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box);
> void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
> @@ -160,7 +160,7 @@ void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
> void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
> struct perf_event *event);
>
> -void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
> +int intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
> void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box);
> void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box);
> void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
> diff --git a/arch/x86/events/intel/uncore_nhmex.c b/arch/x86/events/intel/uncore_nhmex.c
> index 8962e7cb21e3..7a6855281102 100644
> --- a/arch/x86/events/intel/uncore_nhmex.c
> +++ b/arch/x86/events/intel/uncore_nhmex.c
> @@ -199,9 +199,10 @@ DEFINE_UNCORE_FORMAT_ATTR(counter, counter, "config:6-7");
> DEFINE_UNCORE_FORMAT_ATTR(match, match, "config1:0-63");
> DEFINE_UNCORE_FORMAT_ATTR(mask, mask, "config2:0-63");
>
> -static void nhmex_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int nhmex_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> wrmsrq(NHMEX_U_MSR_PMON_GLOBAL_CTL, NHMEX_U_PMON_GLOBAL_EN_ALL);
> + return 0;
> }
>
> static void nhmex_uncore_msr_exit_box(struct intel_uncore_box *box)
> diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
> index edddd4f9ab5f..c5347920541c 100644
> --- a/arch/x86/events/intel/uncore_snb.c
> +++ b/arch/x86/events/intel/uncore_snb.c
> @@ -295,12 +295,14 @@ static void snb_uncore_msr_disable_event(struct intel_uncore_box *box, struct pe
> wrmsrq(event->hw.config_base, 0);
> }
>
> -static void snb_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int snb_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> if (box->pmu->pmu_idx == 0) {
> wrmsrq(SNB_UNC_PERF_GLOBAL_CTL,
> SNB_UNC_GLOBAL_CTL_EN | SNB_UNC_GLOBAL_CTL_CORE_ALL);
> }
> +
> + return 0;
> }
>
> static void snb_uncore_msr_enable_box(struct intel_uncore_box *box)
> @@ -394,7 +396,7 @@ void snb_uncore_cpu_init(void)
> snb_uncore_cbox.num_boxes = topology_num_cores_per_package();
> }
>
> -static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int skl_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> if (box->pmu->pmu_idx == 0) {
> wrmsrq(SKL_UNC_PERF_GLOBAL_CTL,
> @@ -404,6 +406,8 @@ static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
> /* The 8th CBOX has different MSR space */
> if (box->pmu->pmu_idx == 7)
> __set_bit(UNCORE_BOX_FLAG_CFL8_CBOX_MSR_OFFS, &box->flags);
> +
> + return 0;
> }
>
> static void skl_uncore_msr_enable_box(struct intel_uncore_box *box)
> @@ -547,10 +551,12 @@ static struct intel_uncore_type *tgl_msr_uncores[] = {
> NULL,
> };
>
> -static void rkl_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int rkl_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> if (box->pmu->pmu_idx == 0)
> wrmsrq(SKL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
> +
> + return 0;
> }
>
> void tgl_uncore_cpu_init(void)
> @@ -707,9 +713,10 @@ static struct intel_uncore_type mtl_uncore_hac_cbox = {
> .format_group = &adl_uncore_format_group,
> };
>
> -static void mtl_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int mtl_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> wrmsrq(uncore_msr_box_ctl(box), SNB_UNC_GLOBAL_CTL_EN);
> + return 0;
> }
>
> static struct intel_uncore_ops mtl_uncore_msr_ops = {
> @@ -773,10 +780,12 @@ static struct intel_uncore_type *lnl_msr_uncores[] = {
>
> #define LNL_UNC_MSR_GLOBAL_CTL 0x240e
>
> -static void lnl_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int lnl_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> if (box->pmu->pmu_idx == 0)
> wrmsrq(LNL_UNC_MSR_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
> +
> + return 0;
> }
>
> static struct intel_uncore_ops lnl_uncore_msr_ops = {
> @@ -874,7 +883,7 @@ static const struct attribute_group snb_uncore_imc_format_group = {
> .attrs = snb_uncore_imc_formats_attr,
> };
>
> -static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
> +static int snb_uncore_imc_init_box(struct intel_uncore_box *box)
> {
> struct intel_uncore_type *type = box->pmu->type;
> struct pci_dev *pdev = box->pci_dev;
> @@ -893,10 +902,13 @@ static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
> addr &= ~(PAGE_SIZE - 1);
>
> box->io_addr = ioremap(addr, type->mmio_map_size);
> - if (!box->io_addr)
> + if (!box->io_addr) {
> pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
> + return -ENOMEM;
> + }
>
> box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
> + return 0;
> }
>
> static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
> @@ -1532,7 +1544,7 @@ static struct pci_dev *tgl_uncore_get_mc_dev(void)
> #define TGL_UNCORE_MMIO_IMC_MEM_OFFSET 0x10000
> #define TGL_UNCORE_PCI_IMC_MAP_SIZE 0xe000
>
> -static void
> +static int
> uncore_get_box_mmio_addr(struct intel_uncore_box *box,
> unsigned int base_offset,
> int bar_offset, int step)
> @@ -1541,19 +1553,20 @@ uncore_get_box_mmio_addr(struct intel_uncore_box *box,
> struct intel_uncore_pmu *pmu = box->pmu;
> struct intel_uncore_type *type = pmu->type;
> resource_size_t addr;
> + int ret = 0;
> u32 bar;
>
> if (!pdev) {
> pr_warn("perf uncore: Cannot find matched IMC device.\n");
> - return;
> + return -ENODEV;
> }
>
> pci_read_config_dword(pdev, bar_offset, &bar);
> if (!(bar & BIT(0))) {
> pr_warn("perf uncore: BAR 0x%x is disabled. Failed to map %s counters.\n",
> bar_offset, type->name);
> - pci_dev_put(pdev);
> - return;
> + ret = -ENODEV;
> + goto out;
> }
> bar &= ~BIT(0);
> addr = (resource_size_t)(bar + step * pmu->pmu_idx);
> @@ -1565,23 +1578,26 @@ uncore_get_box_mmio_addr(struct intel_uncore_box *box,
>
> addr += base_offset;
> box->io_addr = ioremap(addr, type->mmio_map_size);
> - if (!box->io_addr)
> + if (!box->io_addr) {
> + ret = -ENOMEM;
> pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
> -
> + }
> +out:
> pci_dev_put(pdev);
> + return ret;
> }
>
> -static void __uncore_imc_init_box(struct intel_uncore_box *box,
> +static int __uncore_imc_init_box(struct intel_uncore_box *box,
> unsigned int base_offset)
> {
> - uncore_get_box_mmio_addr(box, base_offset,
> + return uncore_get_box_mmio_addr(box, base_offset,
> SNB_UNCORE_PCI_IMC_BAR_OFFSET,
> TGL_UNCORE_MMIO_IMC_MEM_OFFSET);
> }
>
> -static void tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> +static int tgl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> {
> - __uncore_imc_init_box(box, 0);
> + return __uncore_imc_init_box(box, 0);
> }
>
> static struct intel_uncore_ops tgl_uncore_imc_freerunning_ops = {
> @@ -1648,13 +1664,15 @@ void tgl_uncore_mmio_init(void)
> #define ADL_UNCORE_IMC_CTL_INT (ADL_UNCORE_IMC_CTL_RST_CTRL | \
> ADL_UNCORE_IMC_CTL_RST_CTRS)
>
> -static void adl_uncore_imc_init_box(struct intel_uncore_box *box)
> +static int adl_uncore_imc_init_box(struct intel_uncore_box *box)
> {
> - __uncore_imc_init_box(box, ADL_UNCORE_IMC_BASE);
> + int ret = __uncore_imc_init_box(box, ADL_UNCORE_IMC_BASE);
>
> /* The global control in MC1 can control both MCs. */
> - if (box->io_addr && (box->pmu->pmu_idx == 1))
> + if (!ret && (box->pmu->pmu_idx == 1))
> writel(ADL_UNCORE_IMC_CTL_INT, box->io_addr + ADL_UNCORE_IMC_GLOBAL_CTL);
> +
> + return ret;
> }
>
> static void adl_uncore_mmio_disable_box(struct intel_uncore_box *box)
> @@ -1731,9 +1749,9 @@ static struct freerunning_counters adl_uncore_imc_freerunning[] = {
> [ADL_MMIO_UNCORE_IMC_DATA_WRITE] = { 0xA0, 0x0, 0x0, 1, 64 },
> };
>
> -static void adl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> +static int adl_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> {
> - __uncore_imc_init_box(box, ADL_UNCORE_IMC_FREERUNNING_BASE);
> + return __uncore_imc_init_box(box, ADL_UNCORE_IMC_FREERUNNING_BASE);
> }
>
> static struct intel_uncore_ops adl_uncore_imc_freerunning_ops = {
> @@ -1803,9 +1821,9 @@ static const struct attribute_group lnl_uncore_format_group = {
> .attrs = lnl_uncore_formats_attr,
> };
>
> -static void lnl_uncore_hbo_init_box(struct intel_uncore_box *box)
> +static int lnl_uncore_hbo_init_box(struct intel_uncore_box *box)
> {
> - uncore_get_box_mmio_addr(box, LNL_UNCORE_HBO_BASE,
> + return uncore_get_box_mmio_addr(box, LNL_UNCORE_HBO_BASE,
> LNL_UNCORE_PCI_SAFBAR_OFFSET,
> LNL_UNCORE_HBO_OFFSET);
> }
> @@ -1829,14 +1847,16 @@ static struct intel_uncore_type lnl_uncore_hbo = {
> .format_group = &lnl_uncore_format_group,
> };
>
> -static void lnl_uncore_sncu_init_box(struct intel_uncore_box *box)
> +static int lnl_uncore_sncu_init_box(struct intel_uncore_box *box)
> {
> - uncore_get_box_mmio_addr(box, LNL_UNCORE_SNCU_BASE,
> + int ret = uncore_get_box_mmio_addr(box, LNL_UNCORE_SNCU_BASE,
> LNL_UNCORE_PCI_SAFBAR_OFFSET,
> 0);
>
> - if (box->io_addr)
> + if (!ret)
> writel(ADL_UNCORE_IMC_CTL_INT, box->io_addr + LNL_UNCORE_GLOBAL_CTL);
> +
> + return ret;
> }
>
> static struct intel_uncore_ops lnl_uncore_sncu_ops = {
> @@ -1887,13 +1907,15 @@ static struct intel_uncore_type ptl_uncore_imc = {
> .mmio_map_size = 0xf00,
> };
>
> -static void ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
> +static int ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
> {
> - intel_generic_uncore_mmio_init_box(box);
> + int ret = intel_generic_uncore_mmio_init_box(box);
>
> /* Clear the global freeze bit */
> if (box->io_addr)
> writel(0, box->io_addr + PTL_UNCORE_GLOBAL_CTL_OFFSET);
> +
> + return ret;
> }
>
> static struct intel_uncore_ops ptl_uncore_sncu_ops = {
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index 251c7bdbe30b..fbc51fa1b705 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -627,12 +627,13 @@ static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box, struct pe
> return count;
> }
>
> -static void snbep_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int snbep_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
> int box_ctl = uncore_pci_box_ctl(box);
>
> pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box)
> @@ -680,12 +681,14 @@ static void snbep_uncore_msr_disable_event(struct intel_uncore_box *box,
> wrmsrq(hwc->config_base, hwc->config);
> }
>
> -static void snbep_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int snbep_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> unsigned msr = uncore_msr_box_ctl(box);
>
> if (msr)
> wrmsrq(msr, SNBEP_PMON_BOX_CTL_INT);
> +
> + return 0;
> }
>
> static struct attribute *snbep_uncore_formats_attr[] = {
> @@ -1507,18 +1510,21 @@ int snbep_uncore_pci_init(void)
> /* end of Sandy Bridge-EP uncore support */
>
> /* IvyTown uncore support */
> -static void ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
> +static int ivbep_uncore_msr_init_box(struct intel_uncore_box *box)
> {
> unsigned msr = uncore_msr_box_ctl(box);
> if (msr)
> wrmsrq(msr, IVBEP_PMON_BOX_CTL_INT);
> +
> + return 0;
> }
>
> -static void ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int ivbep_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
>
> pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> #define IVBEP_UNCORE_MSR_OPS_COMMON_INIT() \
> @@ -2784,7 +2790,7 @@ static struct intel_uncore_type hswep_uncore_cbox = {
> /*
> * Write SBOX Initialization register bit by bit to avoid spurious #GPs
> */
> -static void hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
> +static int hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
> {
> unsigned msr = uncore_msr_box_ctl(box);
>
> @@ -2798,6 +2804,8 @@ static void hswep_uncore_sbox_msr_init_box(struct intel_uncore_box *box)
> wrmsrq(msr, flags);
> }
> }
> +
> + return 0;
> }
>
> static struct intel_uncore_ops hswep_uncore_sbox_msr_ops = {
> @@ -4160,12 +4168,13 @@ static const struct attribute_group skx_upi_uncore_format_group = {
> .attrs = skx_upi_uncore_formats_attr,
> };
>
> -static void skx_upi_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int skx_upi_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
>
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> pci_write_config_dword(pdev, SKX_UPI_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> static struct intel_uncore_ops skx_upi_uncore_pci_ops = {
> @@ -4321,12 +4330,13 @@ static struct intel_uncore_type skx_uncore_upi = {
> .cleanup_mapping = skx_upi_cleanup_mapping,
> };
>
> -static void skx_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int skx_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
>
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> pci_write_config_dword(pdev, SKX_M2M_PCI_PMON_BOX_CTL, IVBEP_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> static struct intel_uncore_ops skx_m2m_uncore_pci_ops = {
> @@ -4829,13 +4839,14 @@ void snr_uncore_cpu_init(void)
> uncore_msr_uncores = snr_msr_uncores;
> }
>
> -static void snr_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int snr_m2m_uncore_pci_init_box(struct intel_uncore_box *box)
> {
> struct pci_dev *pdev = box->pci_dev;
> int box_ctl = uncore_pci_box_ctl(box);
>
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> pci_write_config_dword(pdev, box_ctl, IVBEP_PMON_BOX_CTL_INT);
> + return 0;
> }
>
> static struct intel_uncore_ops snr_m2m_uncore_pci_ops = {
> @@ -5008,17 +5019,22 @@ static int snr_uncore_mmio_map(struct intel_uncore_box *box,
> return 0;
> }
>
> -static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
> +static int __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
> unsigned int box_ctl, int mem_offset,
> unsigned int device)
> {
> - if (!snr_uncore_mmio_map(box, box_ctl, mem_offset, device))
> + int ret;
> +
> + ret = snr_uncore_mmio_map(box, box_ctl, mem_offset, device);
> + if (!ret)
> writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
> +
> + return ret;
> }
>
> -static void snr_uncore_mmio_init_box(struct intel_uncore_box *box)
> +static int snr_uncore_mmio_init_box(struct intel_uncore_box *box)
> {
> - __snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
> + return __snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
> SNR_IMC_MMIO_MEM0_OFFSET,
> SNR_MC_DEVICE_ID);
> }
> @@ -5635,14 +5651,14 @@ int icx_uncore_pci_init(void)
> return 0;
> }
>
> -static void icx_uncore_imc_init_box(struct intel_uncore_box *box)
> +static int icx_uncore_imc_init_box(struct intel_uncore_box *box)
> {
> unsigned int box_ctl = box->pmu->type->box_ctl +
> box->pmu->type->mmio_offset * (box->pmu->pmu_idx % ICX_NUMBER_IMC_CHN);
> int mem_offset = (box->pmu->pmu_idx / ICX_NUMBER_IMC_CHN) * ICX_IMC_MEM_STRIDE +
> SNR_IMC_MMIO_MEM0_OFFSET;
>
> - __snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
> + return __snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
> SNR_MC_DEVICE_ID);
> }
>
> @@ -5699,12 +5715,12 @@ static struct uncore_event_desc icx_uncore_imc_freerunning_events[] = {
> { /* end: all zeroes */ },
> };
>
> -static void icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> +static int icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> {
> int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE +
> SNR_IMC_MMIO_MEM0_OFFSET;
>
> - snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
> + return snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
> mem_offset, SNR_MC_DEVICE_ID);
> }
>
> @@ -6001,10 +6017,10 @@ static struct intel_uncore_type spr_uncore_mdf = {
> .name = "mdf",
> };
>
> -static void spr_uncore_mmio_offs8_init_box(struct intel_uncore_box *box)
> +static int spr_uncore_mmio_offs8_init_box(struct intel_uncore_box *box)
> {
> __set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> - intel_generic_uncore_mmio_init_box(box);
> + return intel_generic_uncore_mmio_init_box(box);
> }
>
> static struct intel_uncore_ops spr_uncore_mmio_offs8_ops = {
> @@ -6185,12 +6201,11 @@ static struct uncore_event_desc spr_uncore_imc_freerunning_events[] = {
>
> #define SPR_MC_DEVICE_ID 0x3251
>
> -static void spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> +static int spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
> {
> int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE + SNR_IMC_MMIO_MEM0_OFFSET;
> -
> - snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
> - mem_offset, SPR_MC_DEVICE_ID);
> + return snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
> + mem_offset, SPR_MC_DEVICE_ID);
> }
>
> static struct intel_uncore_ops spr_uncore_imc_freerunning_ops = {
> @@ -6879,20 +6894,24 @@ static unsigned int dmr_iio_freerunning_box_offsets[] = {
> 0x0, 0x8000, 0x18000, 0x20000
> };
>
> -static void dmr_uncore_freerunning_init_box(struct intel_uncore_box *box)
> +static int dmr_uncore_freerunning_init_box(struct intel_uncore_box *box)
> {
> struct intel_uncore_type *type = box->pmu->type;
> u64 mmio_base;
>
> if (box->pmu->pmu_idx >= type->num_boxes)
> - return;
> + return -ENODEV;
>
> mmio_base = DMR_IMH1_HIOP_MMIO_BASE;
> mmio_base += dmr_iio_freerunning_box_offsets[box->pmu->pmu_idx];
>
> box->io_addr = ioremap(mmio_base, type->mmio_map_size);
> - if (!box->io_addr)
> + if (!box->io_addr) {
> pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
> + return -ENOMEM;
> + }
> +
> + return 0;
> }
>
> static struct intel_uncore_ops dmr_uncore_freerunning_ops = {
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups
2026-05-12 23:30 ` [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups Zide Chen
@ 2026-05-13 0:26 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:26 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> Fix typo UNCORE_BOX_FLAG_INITIATED to UNCORE_BOX_FLAG_INITIALIZED.
>
> Rename the 'id' parameter in uncore_box_{ref,unref}() to 'die' to
> reflect its actual meaning and be consistent with other functions.
>
> pmu->activeboxes is misleading because it does not decrement when the
> PMU fails to register. Rename it to die_refcnt, reflecting that it
> counts the number of dies with a box present, regardless of whether
> the PMU is functioning.
>
> Rename box->refcnt to box->cpu_refcnt to clarify that it tracks the
> number of CPUs on the die with a corresponding PMU box present.
>
> Remove the incorrect atomic_inc(&box->refcnt) from
> uncore_pci_pmu_register(): PCI boxes are not tracked by cpu_refcnt,
> and this call incorrectly increments it on a per-die basis.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 21 +++++++++++----------
> arch/x86/events/intel/uncore.h | 10 +++++-----
> 2 files changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index a7780c5cd419..012a7e081014 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1170,14 +1170,13 @@ static int uncore_pci_pmu_register(struct pci_dev *pdev,
> if (!box)
> return -ENOMEM;
>
> - atomic_inc(&box->refcnt);
> box->dieid = die;
> box->pci_dev = pdev;
> box->pmu = pmu;
> uncore_box_init(box);
>
> pmu->boxes[die] = box;
> - if (atomic_inc_return(&pmu->activeboxes) > 1)
> + if (atomic_inc_return(&pmu->die_refcnt) > 1)
> return 0;
>
> /* First active box registers the pmu */
> @@ -1249,7 +1248,7 @@ static void uncore_pci_pmu_unregister(struct intel_uncore_pmu *pmu, int die)
> struct intel_uncore_box *box = pmu->boxes[die];
>
> pmu->boxes[die] = NULL;
> - if (atomic_dec_return(&pmu->activeboxes) == 0)
> + if (atomic_dec_return(&pmu->die_refcnt) == 0)
> uncore_pmu_unregister(pmu);
> uncore_box_exit(box);
> kfree(box);
> @@ -1515,7 +1514,7 @@ static void uncore_change_context(struct intel_uncore_type **uncores,
> uncore_change_type_ctx(*uncores, old_cpu, new_cpu);
> }
>
> -static void uncore_box_unref(struct intel_uncore_type **types, int id)
> +static void uncore_box_unref(struct intel_uncore_type **types, int die)
> {
> struct intel_uncore_type *type;
> struct intel_uncore_pmu *pmu;
> @@ -1526,8 +1525,9 @@ static void uncore_box_unref(struct intel_uncore_type **types, int id)
> type = *types;
> pmu = type->pmus;
> for (i = 0; i < type->num_boxes; i++, pmu++) {
> - box = pmu->boxes[id];
> - if (box && box->cpu >= 0 && atomic_dec_return(&box->refcnt) == 0)
> + box = pmu->boxes[die];
> + if (box && box->cpu >= 0 &&
> + atomic_dec_return(&box->cpu_refcnt) == 0)
> uncore_box_exit(box);
> }
> }
> @@ -1601,14 +1601,14 @@ static int allocate_boxes(struct intel_uncore_type **types,
> }
>
> static int uncore_box_ref(struct intel_uncore_type **types,
> - int id, unsigned int cpu)
> + int die, unsigned int cpu)
> {
> struct intel_uncore_type *type;
> struct intel_uncore_pmu *pmu;
> struct intel_uncore_box *box;
> int i, ret;
>
> - ret = allocate_boxes(types, id, cpu);
> + ret = allocate_boxes(types, die, cpu);
> if (ret)
> return ret;
>
> @@ -1616,8 +1616,9 @@ static int uncore_box_ref(struct intel_uncore_type **types,
> type = *types;
> pmu = type->pmus;
> for (i = 0; i < type->num_boxes; i++, pmu++) {
> - box = pmu->boxes[id];
> - if (box && box->cpu >= 0 && atomic_inc_return(&box->refcnt) == 1)
> + box = pmu->boxes[die];
> + if (box && box->cpu >= 0 &&
> + atomic_inc_return(&box->cpu_refcnt) == 1)
> uncore_box_init(box);
> }
> }
> diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
> index c2e5ccb1d72c..7d4ef869d193 100644
> --- a/arch/x86/events/intel/uncore.h
> +++ b/arch/x86/events/intel/uncore.h
> @@ -147,7 +147,7 @@ struct intel_uncore_pmu {
> char name[UNCORE_PMU_NAME_LEN];
> int pmu_idx;
> bool registered;
> - atomic_t activeboxes;
> + atomic_t die_refcnt;
> cpumask_t cpu_mask;
> struct intel_uncore_type *type;
> struct intel_uncore_box **boxes;
> @@ -165,7 +165,7 @@ struct intel_uncore_box {
> int n_events;
> int cpu; /* cpu to collect events */
> unsigned long flags;
> - atomic_t refcnt;
> + atomic_t cpu_refcnt; /* Number of CPUs that have this box online */
> struct perf_event *events[UNCORE_PMC_IDX_MAX];
> struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
> struct event_constraint *event_constraint[UNCORE_PMC_IDX_MAX];
> @@ -185,7 +185,7 @@ struct intel_uncore_box {
> #define CFL_UNC_CBO_7_PERFEVTSEL0 0xf70
> #define CFL_UNC_CBO_7_PER_CTR0 0xf76
>
> -#define UNCORE_BOX_FLAG_INITIATED 0
> +#define UNCORE_BOX_FLAG_INITIALIZED 0
> /* event config registers are 8-byte apart */
> #define UNCORE_BOX_FLAG_CTL_OFFS8 1
> /* CFL 8th CBOX has different MSR space */
> @@ -559,7 +559,7 @@ static inline u64 uncore_read_counter(struct intel_uncore_box *box,
>
> static inline void uncore_box_init(struct intel_uncore_box *box)
> {
> - if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
> + if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
> if (box->pmu->type->ops->init_box)
> box->pmu->type->ops->init_box(box);
> }
> @@ -567,7 +567,7 @@ static inline void uncore_box_init(struct intel_uncore_box *box)
>
> static inline void uncore_box_exit(struct intel_uncore_box *box)
> {
> - if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
> + if (test_and_clear_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
> if (box->pmu->type->ops->exit_box)
> box->pmu->type->ops->exit_box(box);
> }
> --
> 2.54.0
>
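[Editorial note: the cpu_refcnt scheme described in the changelog above — the first CPU of a die to come online initializes the box, the last one to go offline tears it down, with test_and_set_bit() making init/exit idempotent — can be sketched as a small user-space model. All names here (box_t, cpu_online, cpu_offline) are illustrative stand-ins, not the kernel API, and the kernel's box->cpu gating is omitted.]

```c
#include <assert.h>
#include <stdatomic.h>

#define FLAG_INITIALIZED 0

typedef struct {
    atomic_int cpu_refcnt;      /* CPUs on this die that hold the box */
    atomic_ulong flags;
    int init_calls, exit_calls;
} box_t;

/* Idempotent init: only the caller that sets the bit runs init_box() */
static void box_init(box_t *b)
{
    if (!(atomic_fetch_or(&b->flags, 1UL << FLAG_INITIALIZED) &
          (1UL << FLAG_INITIALIZED)))
        b->init_calls++;
}

/* Idempotent exit: only the caller that clears the bit runs exit_box() */
static void box_exit(box_t *b)
{
    if (atomic_fetch_and(&b->flags, ~(1UL << FLAG_INITIALIZED)) &
        (1UL << FLAG_INITIALIZED))
        b->exit_calls++;
}

/* First CPU of the die to come online initializes the box */
static void cpu_online(box_t *b)
{
    if (atomic_fetch_add(&b->cpu_refcnt, 1) + 1 == 1)
        box_init(b);
}

/* Last CPU of the die to go offline tears the box down */
static void cpu_offline(box_t *b)
{
    if (atomic_fetch_sub(&b->cpu_refcnt, 1) - 1 == 0)
        box_exit(b);
}
```

The key invariant — mirrored from the patch 6 changelog — is that taking the second-to-last CPU offline must not call box_exit().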
* Re: [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code
2026-05-12 23:30 ` [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code Zide Chen
@ 2026-05-13 0:27 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:27 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> The PCI uncore PMU path already implements a lazy registration model:
> the PMU is registered when the first active box appears and
> unregistered when the last active box is removed.
>
> Factor this registration management into a shared helper, so the same
> code can be reused by the MSR and MMIO paths in later changes.
>
> No functional change intended.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 39 ++++++++++++++++++++++++----------
> 1 file changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 8d5170788af2..00ed4e5047ac 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1148,6 +1148,29 @@ uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
> return pmu;
> }
>
> +static int uncore_box_setup(struct intel_uncore_pmu *pmu,
> + struct intel_uncore_box *box)
> +{
> + int dies, ret;
> +
> + /* die_refcnt tracks online dies, not only functioning boxes. */
> + dies = atomic_inc_return(&pmu->die_refcnt);
> + uncore_box_init(box);
> +
> + /* First active box registers the pmu. */
> + if (dies > 1)
> + return 0;
> +
> + ret = uncore_pmu_register(pmu);
> + if (ret)
> + goto err;
> +
> + return 0;
> +err:
> + uncore_box_exit(box);
> + return ret;
> +}
> +
> /*
> * Register the PMU for a PCI device
> * @pdev: The PCI device.
> @@ -1173,19 +1196,13 @@ static int uncore_pci_pmu_register(struct pci_dev *pdev,
> box->dieid = die;
> box->pci_dev = pdev;
> box->pmu = pmu;
> - uncore_box_init(box);
>
> - pmu->boxes[die] = box;
> - if (atomic_inc_return(&pmu->die_refcnt) > 1)
> - return 0;
> -
> - /* First active box registers the pmu */
> - ret = uncore_pmu_register(pmu);
> - if (ret) {
> - pmu->boxes[die] = NULL;
> - uncore_box_exit(box);
> + ret = uncore_box_setup(pmu, box);
> + if (!ret)
> + pmu->boxes[die] = box;
> + else
> kfree(box);
> - }
> +
> return ret;
> }
>
> --
> 2.54.0
>
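[Editorial note: the lazy-registration pattern this patch factors out — the first active box on any die registers the PMU, later dies only bump the refcount — can be modeled in a few lines of plain C. pmu_t, box_setup() and pmu_register() are hypothetical stand-ins for the kernel symbols, with error handling omitted.]

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_int die_refcnt;  /* dies with a box present */
    int register_calls;     /* counts perf_pmu_register() invocations */
} pmu_t;

static int pmu_register(pmu_t *p)
{
    p->register_calls++;
    return 0;
}

/* Mirrors the shared helper: only the first die to show up registers
 * the PMU; subsequent dies see die_refcnt > 1 and return early. */
static int box_setup(pmu_t *p)
{
    if (atomic_fetch_add(&p->die_refcnt, 1) + 1 > 1)
        return 0;
    return pmu_register(p);
}
```

However many dies come up, the PMU is registered exactly once, which is the behavior the MSR and MMIO paths adopt in later patches.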
* Re: [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state
2026-05-12 23:30 ` [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state Zide Chen
@ 2026-05-13 0:28 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:28 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> Replace the boolean 'registered' field in intel_uncore_pmu with an
> unsigned long 'flags' field, and add a PMU_BROKEN flag to track box
> setup failures.
>
> When any box fails to initialize, the PMU is marked broken. Broken
> PMUs reject new event assignments and skip future box setup attempts.
> If the PMU was already registered, it remains so to avoid disrupting
> in-flight events on other boxes.
>
> To prevent retry loops, die_refcnt and cpu_refcnt are not decremented
> on failure, and broken PMUs are skipped in the CPU hotplug and box
> allocation paths.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 36 +++++++++++++++++++++++-------
> arch/x86/events/intel/uncore.h | 12 +++++++++-
> arch/x86/events/intel/uncore_snb.c | 2 +-
> 3 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 00ed4e5047ac..922ba299533e 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -757,7 +757,7 @@ static int uncore_pmu_event_init(struct perf_event *event)
>
> pmu = uncore_event_to_pmu(event);
> /* no device found for this pmu */
> - if (!pmu->registered)
> + if (!uncore_pmu_available(pmu))
> return -ENOENT;
>
> /* Sampling not supported yet */
> @@ -953,16 +953,16 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
>
> ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> if (!ret)
> - pmu->registered = true;
> + uncore_pmu_set_registered(pmu);
> return ret;
> }
>
> static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
> {
> - if (!pmu->registered)
> + if (!uncore_pmu_registered(pmu))
> return;
> perf_pmu_unregister(&pmu->pmu);
> - pmu->registered = false;
> + WRITE_ONCE(pmu->flags, 0);
> }
>
> static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
> @@ -1155,7 +1155,13 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
>
> /* die_refcnt tracks online dies, not only functioning boxes. */
> dies = atomic_inc_return(&pmu->die_refcnt);
> - uncore_box_init(box);
> +
> + if (uncore_pmu_broken(pmu))
> + return -ENODEV;
> +
> + ret = uncore_box_init(box);
> + if (ret)
> + goto err;
>
> /* First active box registers the pmu. */
> if (dies > 1)
> @@ -1167,6 +1173,19 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
>
> return 0;
> err:
> + /*
> + * On failure on any box, mark the per-package PMU as broken regardless
> + * of whether it was registered or not.
> + *
> + * Don't decrement die_refcnt to prevent any future CPU online
> + * event or PCI probe from retrying the failed PMU registration.
> + *
> + * Don't decrement cpu_refcnt to avoid other in-die CPUs from
> + * trying to set up the PMU box again.
> + *
> + * Don't kfree box; MSR and MMIO boxes are freed at module exit only.
> + */
> + uncore_pmu_set_broken(pmu);
> uncore_box_exit(box);
> return ret;
> }
> @@ -1502,7 +1521,8 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
>
> if (old_cpu < 0) {
> WARN_ON_ONCE(box->cpu != -1);
> - if (uncore_die_has_box(type, die, pmu->pmu_idx)) {
> + if (uncore_die_has_box(type, die, pmu->pmu_idx) &&
> + !uncore_pmu_broken(pmu)) {
> box->cpu = new_cpu;
> cpumask_set_cpu(new_cpu, &pmu->cpu_mask);
> }
> @@ -1512,7 +1532,7 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
> WARN_ON_ONCE(box->cpu != -1 && box->cpu != old_cpu);
> box->cpu = -1;
> cpumask_clear_cpu(old_cpu, &pmu->cpu_mask);
> - if (new_cpu < 0)
> + if (new_cpu < 0 || uncore_pmu_broken(pmu))
> continue;
>
> if (!uncore_die_has_box(type, die, pmu->pmu_idx))
> @@ -1592,7 +1612,7 @@ static int allocate_boxes(struct intel_uncore_type **types,
> type = *types;
> pmu = type->pmus;
> for (i = 0; i < type->num_boxes; i++, pmu++) {
> - if (pmu->boxes[die])
> + if (pmu->boxes[die] || uncore_pmu_broken(pmu))
> continue;
> box = uncore_alloc_box(type, cpu_to_node(cpu));
> if (!box)
> diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
> index 5ee05545116a..4d3a99bf1455 100644
> --- a/arch/x86/events/intel/uncore.h
> +++ b/arch/x86/events/intel/uncore.h
> @@ -146,13 +146,23 @@ struct intel_uncore_pmu {
> struct pmu pmu;
> char name[UNCORE_PMU_NAME_LEN];
> int pmu_idx;
> - bool registered;
> + unsigned long flags;
> atomic_t die_refcnt;
> cpumask_t cpu_mask;
> struct intel_uncore_type *type;
> struct intel_uncore_box **boxes;
> };
>
> +#define PMU_REGISTERED_BIT 0
> +#define PMU_BROKEN_BIT 1
> +
> +#define uncore_pmu_registered(pmu) test_bit(PMU_REGISTERED_BIT, &(pmu)->flags)
> +#define uncore_pmu_broken(pmu) test_bit(PMU_BROKEN_BIT, &(pmu)->flags)
> +#define uncore_pmu_available(pmu) (uncore_pmu_registered(pmu) && \
> + !uncore_pmu_broken(pmu))
> +#define uncore_pmu_set_registered(pmu) set_bit(PMU_REGISTERED_BIT, &(pmu)->flags)
> +#define uncore_pmu_set_broken(pmu) set_bit(PMU_BROKEN_BIT, &(pmu)->flags)
> +
> struct intel_uncore_extra_reg {
> raw_spinlock_t lock;
> u64 config, config1, config2;
> diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
> index c5347920541c..055131c508ff 100644
> --- a/arch/x86/events/intel/uncore_snb.c
> +++ b/arch/x86/events/intel/uncore_snb.c
> @@ -940,7 +940,7 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
>
> pmu = uncore_event_to_pmu(event);
> /* no device found for this pmu */
> - if (!pmu->registered)
> + if (!uncore_pmu_available(pmu))
> return -ENOENT;
>
> /* Sampling not supported yet */
> --
> 2.54.0
>
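[Editorial note: a rough user-space sketch of the flag helpers this patch introduces, modeled with plain bit ops rather than the kernel's atomic set_bit()/test_bit(). The point of the broken state: a broken PMU may stay registered (so in-flight events keep working), but it is no longer "available", so new event creation fails with -ENOENT.]

```c
#include <assert.h>
#include <stdbool.h>

struct pmu { unsigned long flags; };

#define PMU_REGISTERED_BIT 0
#define PMU_BROKEN_BIT     1

static bool pmu_registered(struct pmu *p)
{
    return p->flags & (1UL << PMU_REGISTERED_BIT);
}

static bool pmu_broken(struct pmu *p)
{
    return p->flags & (1UL << PMU_BROKEN_BIT);
}

/* Usable for new events only if registered and not broken */
static bool pmu_available(struct pmu *p)
{
    return pmu_registered(p) && !pmu_broken(p);
}

static void pmu_set_registered(struct pmu *p)
{
    p->flags |= 1UL << PMU_REGISTERED_BIT;
}

static void pmu_set_broken(struct pmu *p)
{
    p->flags |= 1UL << PMU_BROKEN_BIT;
}
```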
* Re: [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails
2026-05-12 23:30 ` [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails Zide Chen
@ 2026-05-13 0:30 ` Ian Rogers
0 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:30 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> uncore_event_cpu_online() returns -ENOMEM early when both the MSR and
> MMIO box allocations fail. This also aborts PCI uncore setup, even
> though PCI PMUs are independent of the MSR/MMIO paths.
>
> Remove the early return so PCI uncore setup always runs regardless
> of whether MSR or MMIO box allocation succeeds.
>
> Fixes: 3da04b8a00dd ("perf/x86/intel/uncore: Support MMIO type uncore blocks")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 012a7e081014..8d5170788af2 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1632,8 +1632,6 @@ static int uncore_event_cpu_online(unsigned int cpu)
> die = topology_logical_die_id(cpu);
> msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
> mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
> - if (msr_ret && mmio_ret)
> - return -ENOMEM;
>
> /*
> * Check if there is an online cpu in the package
> --
> 2.54.0
>
* Re: [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug
2026-05-12 23:30 ` [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug Zide Chen
@ 2026-05-13 0:32 ` Ian Rogers
2026-05-13 8:59 ` Mi, Dapeng
1 sibling, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:32 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> In uncore_event_cpu_online(), uncore_box_ref() was called before
> uncore_change_context(). uncore_box_ref() gates on box->cpu >= 0,
> but box->cpu is still -1 at that point because uncore_change_context()
> has not run yet. As a result, the box is never initialized on the
> first CPU to come online in a die, leaving it permanently
> uninitialized in the single-CPU-per-die case.
>
> Thus, cpu_refcnt is one count below the true value, and in the CPU
> offline path, the box will be torn down on the second-to-last CPU.
>
> In uncore_event_cpu_offline(), uncore_box_unref() was called after
> uncore_change_context(), so box->cpu is already -1 when the collector
> CPU goes offline, which prevents it from tearing down the box.
>
> Fix by swapping the call order in both paths so that
> uncore_box_{ref,unref}() runs at the point where box->cpu reflects
> the correct context.
>
> Fixes: c74443d92f68 ("perf/x86/uncore: Support per PMU cpumask")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 56 ++++++++++++++++++----------------
> 1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 922ba299533e..399f434e1a7d 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1574,9 +1574,15 @@ static int uncore_event_cpu_offline(unsigned int cpu)
> {
> int die, target;
>
> + /* Clear the references */
> + die = topology_logical_die_id(cpu);
> + uncore_box_unref(uncore_msr_uncores, die);
> + uncore_box_unref(uncore_mmio_uncores, die);
> +
> /* Check if exiting cpu is used for collecting uncore events */
> if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
> - goto unref;
> + return 0;
> +
> /* Find a new cpu to collect uncore events */
> target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>
> @@ -1589,20 +1595,14 @@ static int uncore_event_cpu_offline(unsigned int cpu)
> uncore_change_context(uncore_msr_uncores, cpu, target);
> uncore_change_context(uncore_mmio_uncores, cpu, target);
> uncore_change_context(uncore_pci_uncores, cpu, target);
> -
> -unref:
> - /* Clear the references */
> - die = topology_logical_die_id(cpu);
> - uncore_box_unref(uncore_msr_uncores, die);
> - uncore_box_unref(uncore_mmio_uncores, die);
> return 0;
> }
>
> -static int allocate_boxes(struct intel_uncore_type **types,
> +static void allocate_boxes(struct intel_uncore_type **types,
> unsigned int die, unsigned int cpu)
> {
> struct intel_uncore_box *box, *tmp;
> - struct intel_uncore_type *type;
> + struct intel_uncore_type *type, **start = types;
> struct intel_uncore_pmu *pmu;
> LIST_HEAD(allocated);
> int i;
> @@ -1627,14 +1627,21 @@ static int allocate_boxes(struct intel_uncore_type **types,
> list_del_init(&box->active_list);
> box->pmu->boxes[die] = box;
> }
> - return 0;
> + return;
>
> cleanup:
> list_for_each_entry_safe(box, tmp, &allocated, active_list) {
> list_del_init(&box->active_list);
> kfree(box);
> }
> - return -ENOMEM;
> +
> + /* Mark the PMU broken to prevent future usage. */
> + for (; *start; start++) {
> + type = *start;
> + pmu = type->pmus;
> + for (i = 0; i < type->num_boxes; i++, pmu++)
> + uncore_pmu_set_broken(pmu);
> + }
> }
>
> static int uncore_box_ref(struct intel_uncore_type **types,
> @@ -1643,11 +1650,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
> struct intel_uncore_type *type;
> struct intel_uncore_pmu *pmu;
> struct intel_uncore_box *box;
> - int i, ret;
> -
> - ret = allocate_boxes(types, die, cpu);
> - if (ret)
> - return ret;
> + int i;
>
> for (; *types; types++) {
> type = *types;
> @@ -1664,27 +1667,26 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>
> static int uncore_event_cpu_online(unsigned int cpu)
> {
> - int die, target, msr_ret, mmio_ret;
> + int die, target;
>
> die = topology_logical_die_id(cpu);
> - msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
> - mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
> + allocate_boxes(uncore_msr_uncores, die, cpu);
> + allocate_boxes(uncore_mmio_uncores, die, cpu);
>
> /*
> * Check if there is an online cpu in the package
> * which collects uncore events already.
> */
> target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu));
> - if (target < nr_cpu_ids)
> - return 0;
> -
> - cpumask_set_cpu(cpu, &uncore_cpu_mask);
> -
> - if (!msr_ret)
> + if (target >= nr_cpu_ids) {
> + cpumask_set_cpu(cpu, &uncore_cpu_mask);
> uncore_change_context(uncore_msr_uncores, -1, cpu);
> - if (!mmio_ret)
> uncore_change_context(uncore_mmio_uncores, -1, cpu);
> - uncore_change_context(uncore_pci_uncores, -1, cpu);
> + uncore_change_context(uncore_pci_uncores, -1, cpu);
> + }
> +
> + uncore_box_ref(uncore_msr_uncores, die, cpu);
> + uncore_box_ref(uncore_mmio_uncores, die, cpu);
> return 0;
> }
>
> --
> 2.54.0
>
* Re: [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU
2026-05-12 23:30 ` [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU Zide Chen
@ 2026-05-13 0:34 ` Ian Rogers
2026-05-13 9:03 ` Mi, Dapeng
1 sibling, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2026-05-13 0:34 UTC (permalink / raw)
To: Zide Chen
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi
On Tue, May 12, 2026 at 4:39 PM Zide Chen <zide.chen@intel.com> wrote:
>
> MSR and MMIO uncore PMUs are currently registered at module init time
> and appear in sysfs even when no PMU boxes are functional.
>
> Apply the same lazy registration model used by PCI uncore PMUs: the
> PMU is registered when the first box is successfully initialized, and
> unregistered when the last box exits. If a box fails to initialize on
> a subsequent die, the PMU is marked broken but remains registered to
> avoid disrupting any in-flight perf events.
>
> Box allocation and free remain at module init/exit time to avoid
> repeated kfree/alloc cycles across CPU offline/online events.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> arch/x86/events/intel/uncore.c | 72 ++++++----------------------------
> 1 file changed, 12 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 399f434e1a7d..2aaac0b49bb6 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1564,8 +1564,11 @@ static void uncore_box_unref(struct intel_uncore_type **types, int die)
> for (i = 0; i < type->num_boxes; i++, pmu++) {
> box = pmu->boxes[die];
> if (box && box->cpu >= 0 &&
> - atomic_dec_return(&box->cpu_refcnt) == 0)
> + atomic_dec_return(&box->cpu_refcnt) == 0) {
> + if (atomic_dec_return(&pmu->die_refcnt) == 0)
> + uncore_pmu_unregister(pmu);
> uncore_box_exit(box);
> + }
> }
> }
> }
> @@ -1659,7 +1662,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
> box = pmu->boxes[die];
> if (box && box->cpu >= 0 &&
> atomic_inc_return(&box->cpu_refcnt) == 1)
> - uncore_box_init(box);
> + uncore_box_setup(pmu, box);
> }
> }
> return 0;
> @@ -1690,67 +1693,16 @@ static int uncore_event_cpu_online(unsigned int cpu)
> return 0;
> }
>
> -static int __init type_pmu_register(struct intel_uncore_type *type)
> +static int __init uncore_cpu_mmio_init(struct intel_uncore_type **types)
> {
> - int i, ret;
> -
> - for (i = 0; i < type->num_boxes; i++) {
> - ret = uncore_pmu_register(&type->pmus[i]);
> - if (ret)
> - return ret;
> - }
> - return 0;
> -}
> -
> -static int __init uncore_msr_pmus_register(void)
> -{
> - struct intel_uncore_type **types = uncore_msr_uncores;
> - int ret;
> -
> - for (; *types; types++) {
> - ret = type_pmu_register(*types);
> - if (ret)
> - return ret;
> - }
> - return 0;
> -}
> -
> -static int __init uncore_cpu_init(void)
> -{
> - int ret;
> -
> - ret = uncore_types_init(uncore_msr_uncores);
> - if (ret)
> - goto err;
> -
> - ret = uncore_msr_pmus_register();
> - if (ret)
> - goto err;
> - return 0;
> -err:
> - uncore_types_exit(uncore_msr_uncores);
> - uncore_msr_uncores = empty_uncore;
> - return ret;
> -}
> -
> -static int __init uncore_mmio_init(void)
> -{
> - struct intel_uncore_type **types = uncore_mmio_uncores;
> int ret;
>
> ret = uncore_types_init(types);
> - if (ret)
> - goto err;
> + if (!ret)
> + return 0;
>
> - for (; *types; types++) {
> - ret = type_pmu_register(*types);
> - if (ret)
> - goto err;
> - }
> - return 0;
> -err:
> - uncore_types_exit(uncore_mmio_uncores);
> - uncore_mmio_uncores = empty_uncore;
> + uncore_types_exit(types);
> + types = empty_uncore;
> return ret;
> }
>
> @@ -2052,12 +2004,12 @@ static int __init intel_uncore_init(void)
>
> if (uncore_init->cpu_init) {
> uncore_init->cpu_init();
> - cret = uncore_cpu_init();
> + cret = uncore_cpu_mmio_init(uncore_msr_uncores);
> }
>
> if (uncore_init->mmio_init) {
> uncore_init->mmio_init();
> - mret = uncore_mmio_init();
> + mret = uncore_cpu_mmio_init(uncore_mmio_uncores);
> }
>
> if (cret && pret && mret) {
> --
> 2.54.0
>
* Re: [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug
2026-05-12 23:30 ` [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug Zide Chen
2026-05-13 0:32 ` Ian Rogers
@ 2026-05-13 8:59 ` Mi, Dapeng
2026-05-13 18:43 ` Chen, Zide
1 sibling, 1 reply; 19+ messages in thread
From: Mi, Dapeng @ 2026-05-13 8:59 UTC (permalink / raw)
To: Zide Chen, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/13/2026 7:30 AM, Zide Chen wrote:
> In uncore_event_cpu_online(), uncore_box_ref() was called before
> uncore_change_context(). uncore_box_ref() gates on box->cpu >= 0,
> but box->cpu is still -1 at that point because uncore_change_context()
> has not run yet. As a result, the box is never initialized on the
> first CPU to come online in a die, leaving it permanently
> uninitialized in the single-CPU-per-die case.
>
> Thus, cpu_refcnt is one count below the true value, and in the CPU
> offline path, the box will be torn down on the second-to-last CPU.
>
> In uncore_event_cpu_offline(), uncore_box_unref() was called after
> uncore_change_context(), so box->cpu is already -1 when the collector
> CPU goes offline, which prevents it from tearing down the box.
>
> Fix by swapping the call order in both paths so that
> uncore_box_{ref,unref}() runs at the point where box->cpu reflects
> the correct context.
>
> Fixes: c74443d92f68 ("perf/x86/uncore: Support per PMU cpumask")
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> arch/x86/events/intel/uncore.c | 56 ++++++++++++++++++----------------
> 1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 922ba299533e..399f434e1a7d 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1574,9 +1574,15 @@ static int uncore_event_cpu_offline(unsigned int cpu)
> {
> int die, target;
>
> + /* Clear the references */
> + die = topology_logical_die_id(cpu);
> + uncore_box_unref(uncore_msr_uncores, die);
> + uncore_box_unref(uncore_mmio_uncores, die);
> +
> /* Check if exiting cpu is used for collecting uncore events */
> if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
> - goto unref;
> + return 0;
> +
> /* Find a new cpu to collect uncore events */
> target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>
> @@ -1589,20 +1595,14 @@ static int uncore_event_cpu_offline(unsigned int cpu)
> uncore_change_context(uncore_msr_uncores, cpu, target);
> uncore_change_context(uncore_mmio_uncores, cpu, target);
> uncore_change_context(uncore_pci_uncores, cpu, target);
> -
> -unref:
> - /* Clear the references */
> - die = topology_logical_die_id(cpu);
> - uncore_box_unref(uncore_msr_uncores, die);
> - uncore_box_unref(uncore_mmio_uncores, die);
> return 0;
> }
>
> -static int allocate_boxes(struct intel_uncore_type **types,
> +static void allocate_boxes(struct intel_uncore_type **types,
> unsigned int die, unsigned int cpu)
> {
> struct intel_uncore_box *box, *tmp;
> - struct intel_uncore_type *type;
> + struct intel_uncore_type *type, **start = types;
> struct intel_uncore_pmu *pmu;
> LIST_HEAD(allocated);
> int i;
> @@ -1627,14 +1627,21 @@ static int allocate_boxes(struct intel_uncore_type **types,
> list_del_init(&box->active_list);
> box->pmu->boxes[die] = box;
> }
> - return 0;
> + return;
>
> cleanup:
> list_for_each_entry_safe(box, tmp, &allocated, active_list) {
> list_del_init(&box->active_list);
> kfree(box);
> }
> - return -ENOMEM;
> +
> + /* Mark the PMU broken to prevent future usage. */
> + for (; *start; start++) {
> + type = *start;
> + pmu = type->pmus;
> + for (i = 0; i < type->num_boxes; i++, pmu++)
> + uncore_pmu_set_broken(pmu);
> + }
It looks like all PMUs of all types are set to be broken even when the box
allocation of some PMUs didn't fail. Could we set only the failed and
later PMUs to be broken?
> }
>
> static int uncore_box_ref(struct intel_uncore_type **types,
> @@ -1643,11 +1650,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
> struct intel_uncore_type *type;
> struct intel_uncore_pmu *pmu;
> struct intel_uncore_box *box;
> - int i, ret;
> -
> - ret = allocate_boxes(types, die, cpu);
> - if (ret)
> - return ret;
> + int i;
>
> for (; *types; types++) {
> type = *types;
> @@ -1664,27 +1667,26 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>
> static int uncore_event_cpu_online(unsigned int cpu)
> {
> - int die, target, msr_ret, mmio_ret;
> + int die, target;
>
> die = topology_logical_die_id(cpu);
> - msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
> - mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
> + allocate_boxes(uncore_msr_uncores, die, cpu);
> + allocate_boxes(uncore_mmio_uncores, die, cpu);
>
> /*
> * Check if there is an online cpu in the package
> * which collects uncore events already.
> */
> target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu));
> - if (target < nr_cpu_ids)
> - return 0;
> -
> - cpumask_set_cpu(cpu, &uncore_cpu_mask);
> -
> - if (!msr_ret)
> + if (target >= nr_cpu_ids) {
> + cpumask_set_cpu(cpu, &uncore_cpu_mask);
> uncore_change_context(uncore_msr_uncores, -1, cpu);
> - if (!mmio_ret)
> uncore_change_context(uncore_mmio_uncores, -1, cpu);
> - uncore_change_context(uncore_pci_uncores, -1, cpu);
> + uncore_change_context(uncore_pci_uncores, -1, cpu);
> + }
> +
> + uncore_box_ref(uncore_msr_uncores, die, cpu);
> + uncore_box_ref(uncore_mmio_uncores, die, cpu);
> return 0;
> }
>
* Re: [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU
2026-05-12 23:30 ` [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU Zide Chen
2026-05-13 0:34 ` Ian Rogers
@ 2026-05-13 9:03 ` Mi, Dapeng
2026-05-13 16:47 ` Chen, Zide
1 sibling, 1 reply; 19+ messages in thread
From: Mi, Dapeng @ 2026-05-13 9:03 UTC (permalink / raw)
To: Zide Chen, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/13/2026 7:30 AM, Zide Chen wrote:
> MSR and MMIO uncore PMUs are currently registered at module init time
> and appear in sysfs even when no PMU boxes are functional.
>
> Apply the same lazy registration model used by PCI uncore PMUs: the
> PMU is registered when the first box is successfully initialized, and
> unregistered when the last box exits. If a box fails to initialize on
> a subsequent die, the PMU is marked broken but remains registered to
> avoid disrupting any in-flight perf events.
>
> Box allocation and free remain at module init/exit time to avoid
> repeated kfree/alloc cycles across CPU offline/online events.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
> arch/x86/events/intel/uncore.c | 72 ++++++----------------------------
> 1 file changed, 12 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 399f434e1a7d..2aaac0b49bb6 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -1564,8 +1564,11 @@ static void uncore_box_unref(struct intel_uncore_type **types, int die)
> for (i = 0; i < type->num_boxes; i++, pmu++) {
> box = pmu->boxes[die];
> if (box && box->cpu >= 0 &&
> - atomic_dec_return(&box->cpu_refcnt) == 0)
> + atomic_dec_return(&box->cpu_refcnt) == 0) {
> + if (atomic_dec_return(&pmu->die_refcnt) == 0)
> + uncore_pmu_unregister(pmu);
> uncore_box_exit(box);
> + }
> }
> }
> }
> @@ -1659,7 +1662,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
> box = pmu->boxes[die];
> if (box && box->cpu >= 0 &&
> atomic_inc_return(&box->cpu_refcnt) == 1)
> - uncore_box_init(box);
> + uncore_box_setup(pmu, box);
> }
> }
> return 0;
> @@ -1690,67 +1693,16 @@ static int uncore_event_cpu_online(unsigned int cpu)
> return 0;
> }
>
> -static int __init type_pmu_register(struct intel_uncore_type *type)
> +static int __init uncore_cpu_mmio_init(struct intel_uncore_type **types)
The name seems a bit odd; could we give it a more generic name,
maybe uncore_pmu_types_init() or something similar? Thanks.
> {
> - int i, ret;
> -
> - for (i = 0; i < type->num_boxes; i++) {
> - ret = uncore_pmu_register(&type->pmus[i]);
> - if (ret)
> - return ret;
> - }
> - return 0;
> -}
> -
> -static int __init uncore_msr_pmus_register(void)
> -{
> - struct intel_uncore_type **types = uncore_msr_uncores;
> - int ret;
> -
> - for (; *types; types++) {
> - ret = type_pmu_register(*types);
> - if (ret)
> - return ret;
> - }
> - return 0;
> -}
> -
> -static int __init uncore_cpu_init(void)
> -{
> - int ret;
> -
> - ret = uncore_types_init(uncore_msr_uncores);
> - if (ret)
> - goto err;
> -
> - ret = uncore_msr_pmus_register();
> - if (ret)
> - goto err;
> - return 0;
> -err:
> - uncore_types_exit(uncore_msr_uncores);
> - uncore_msr_uncores = empty_uncore;
> - return ret;
> -}
> -
> -static int __init uncore_mmio_init(void)
> -{
> - struct intel_uncore_type **types = uncore_mmio_uncores;
> int ret;
>
> ret = uncore_types_init(types);
> - if (ret)
> - goto err;
> + if (!ret)
> + return 0;
>
> - for (; *types; types++) {
> - ret = type_pmu_register(*types);
> - if (ret)
> - goto err;
> - }
> - return 0;
> -err:
> - uncore_types_exit(uncore_mmio_uncores);
> - uncore_mmio_uncores = empty_uncore;
> + uncore_types_exit(types);
> + types = empty_uncore;
> return ret;
> }
>
> @@ -2052,12 +2004,12 @@ static int __init intel_uncore_init(void)
>
> if (uncore_init->cpu_init) {
> uncore_init->cpu_init();
> - cret = uncore_cpu_init();
> + cret = uncore_cpu_mmio_init(uncore_msr_uncores);
> }
>
> if (uncore_init->mmio_init) {
> uncore_init->mmio_init();
> - mret = uncore_mmio_init();
> + mret = uncore_cpu_mmio_init(uncore_mmio_uncores);
> }
>
> if (cret && pret && mret) {
* Re: [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU
2026-05-13 9:03 ` Mi, Dapeng
@ 2026-05-13 16:47 ` Chen, Zide
0 siblings, 0 replies; 19+ messages in thread
From: Chen, Zide @ 2026-05-13 16:47 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/13/2026 2:03 AM, Mi, Dapeng wrote:
>
> On 5/13/2026 7:30 AM, Zide Chen wrote:
>> MSR and MMIO uncore PMUs are currently registered at module init time
>> and appear in sysfs even when no PMU boxes are functional.
>>
>> Apply the same lazy registration model used by PCI uncore PMUs: the
>> PMU is registered when the first box is successfully initialized, and
>> unregistered when the last box exits. If a box fails to initialize on
>> a subsequent die, the PMU is marked broken but remains registered to
>> avoid disrupting any in-flight perf events.
>>
>> Box allocation and free remain at module init/exit time to avoid
>> repeated kfree/alloc cycles across CPU offline/online events.
>>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> arch/x86/events/intel/uncore.c | 72 ++++++----------------------------
>> 1 file changed, 12 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 399f434e1a7d..2aaac0b49bb6 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -1564,8 +1564,11 @@ static void uncore_box_unref(struct intel_uncore_type **types, int die)
>> for (i = 0; i < type->num_boxes; i++, pmu++) {
>> box = pmu->boxes[die];
>> if (box && box->cpu >= 0 &&
>> - atomic_dec_return(&box->cpu_refcnt) == 0)
>> + atomic_dec_return(&box->cpu_refcnt) == 0) {
>> + if (atomic_dec_return(&pmu->die_refcnt) == 0)
>> + uncore_pmu_unregister(pmu);
>> uncore_box_exit(box);
>> + }
>> }
>> }
>> }
>> @@ -1659,7 +1662,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>> box = pmu->boxes[die];
>> if (box && box->cpu >= 0 &&
>> atomic_inc_return(&box->cpu_refcnt) == 1)
>> - uncore_box_init(box);
>> + uncore_box_setup(pmu, box);
>> }
>> }
>> return 0;
>> @@ -1690,67 +1693,16 @@ static int uncore_event_cpu_online(unsigned int cpu)
>> return 0;
>> }
>>
>> -static int __init type_pmu_register(struct intel_uncore_type *type)
>> +static int __init uncore_cpu_mmio_init(struct intel_uncore_type **types)
>
> The name seems a bit odd; could we give it a more generic name,
> maybe uncore_pmu_types_init() or something similar? Thanks.
Sure, I may pick uncore_pmu_types_init().
* Re: [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug
2026-05-13 8:59 ` Mi, Dapeng
@ 2026-05-13 18:43 ` Chen, Zide
0 siblings, 0 replies; 19+ messages in thread
From: Chen, Zide @ 2026-05-13 18:43 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users
On 5/13/2026 1:59 AM, Mi, Dapeng wrote:
>
> On 5/13/2026 7:30 AM, Zide Chen wrote:
>> In uncore_event_cpu_online(), uncore_box_ref() was called before
>> uncore_change_context(). uncore_box_ref() gates on box->cpu >= 0,
>> but box->cpu is still -1 at that point because uncore_change_context()
>> has not run yet. As a result, the box is never initialized on the
>> first CPU to come online in a die, leaving it permanently
>> uninitialized in the single-CPU-per-die case.
>>
>> Thus, cpu_refcnt is one count below the true value, and in the CPU
>> offline path, the box will be torn down on the second-to-last CPU.
>>
>> In uncore_event_cpu_offline(), uncore_box_unref() was called after
>> uncore_change_context(), so box->cpu is already -1 when the collector
>> CPU goes offline, which prevents it from tearing down the box.
>>
>> Fix by swapping the call order in both paths so that
>> uncore_box_{ref,unref}() runs at the point where box->cpu reflects
>> the correct context.
>>
>> Fixes: c74443d92f68 ("perf/x86/uncore: Support per PMU cpumask")
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>> arch/x86/events/intel/uncore.c | 56 ++++++++++++++++++----------------
>> 1 file changed, 29 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 922ba299533e..399f434e1a7d 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -1574,9 +1574,15 @@ static int uncore_event_cpu_offline(unsigned int cpu)
>> {
>> int die, target;
>>
>> + /* Clear the references */
>> + die = topology_logical_die_id(cpu);
>> + uncore_box_unref(uncore_msr_uncores, die);
>> + uncore_box_unref(uncore_mmio_uncores, die);
>> +
>> /* Check if exiting cpu is used for collecting uncore events */
>> if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
>> - goto unref;
>> + return 0;
>> +
>> /* Find a new cpu to collect uncore events */
>> target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>>
>> @@ -1589,20 +1595,14 @@ static int uncore_event_cpu_offline(unsigned int cpu)
>> uncore_change_context(uncore_msr_uncores, cpu, target);
>> uncore_change_context(uncore_mmio_uncores, cpu, target);
>> uncore_change_context(uncore_pci_uncores, cpu, target);
>> -
>> -unref:
>> - /* Clear the references */
>> - die = topology_logical_die_id(cpu);
>> - uncore_box_unref(uncore_msr_uncores, die);
>> - uncore_box_unref(uncore_mmio_uncores, die);
>> return 0;
>> }
>>
>> -static int allocate_boxes(struct intel_uncore_type **types,
>> +static void allocate_boxes(struct intel_uncore_type **types,
>> unsigned int die, unsigned int cpu)
>> {
>> struct intel_uncore_box *box, *tmp;
>> - struct intel_uncore_type *type;
>> + struct intel_uncore_type *type, **start = types;
>> struct intel_uncore_pmu *pmu;
>> LIST_HEAD(allocated);
>> int i;
>> @@ -1627,14 +1627,21 @@ static int allocate_boxes(struct intel_uncore_type **types,
>> list_del_init(&box->active_list);
>> box->pmu->boxes[die] = box;
>> }
>> - return 0;
>> + return;
>>
>> cleanup:
>> list_for_each_entry_safe(box, tmp, &allocated, active_list) {
>> list_del_init(&box->active_list);
>> kfree(box);
>> }
>> - return -ENOMEM;
>> +
>> + /* Mark the PMU broken to prevent future usage. */
>> + for (; *start; start++) {
>> + type = *start;
>> + pmu = type->pmus;
>> + for (i = 0; i < type->num_boxes; i++, pmu++)
>> + uncore_pmu_set_broken(pmu);
>> + }
>
> It looks like all PMUs of all types are set to be broken even when the box
> allocation of some PMUs didn't fail. Could we set only the failed and
> later PMUs to be broken?
Right, it should simply mark just that one PMU broken:
- if (!box)
+ if (!box) {
+ uncore_pmu_set_broken(pmu);
goto cleanup;
+ }
>> }
>>
>> static int uncore_box_ref(struct intel_uncore_type **types,
>> @@ -1643,11 +1650,7 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>> struct intel_uncore_type *type;
>> struct intel_uncore_pmu *pmu;
>> struct intel_uncore_box *box;
>> - int i, ret;
>> -
>> - ret = allocate_boxes(types, die, cpu);
>> - if (ret)
>> - return ret;
>> + int i;
>>
>> for (; *types; types++) {
>> type = *types;
>> @@ -1664,27 +1667,26 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>>
>> static int uncore_event_cpu_online(unsigned int cpu)
>> {
>> - int die, target, msr_ret, mmio_ret;
>> + int die, target;
>>
>> die = topology_logical_die_id(cpu);
>> - msr_ret = uncore_box_ref(uncore_msr_uncores, die, cpu);
>> - mmio_ret = uncore_box_ref(uncore_mmio_uncores, die, cpu);
>> + allocate_boxes(uncore_msr_uncores, die, cpu);
>> + allocate_boxes(uncore_mmio_uncores, die, cpu);
>>
>> /*
>> * Check if there is an online cpu in the package
>> * which collects uncore events already.
>> */
>> target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu));
>> - if (target < nr_cpu_ids)
>> - return 0;
>> -
>> - cpumask_set_cpu(cpu, &uncore_cpu_mask);
>> -
>> - if (!msr_ret)
>> + if (target >= nr_cpu_ids) {
>> + cpumask_set_cpu(cpu, &uncore_cpu_mask);
>> uncore_change_context(uncore_msr_uncores, -1, cpu);
>> - if (!mmio_ret)
>> uncore_change_context(uncore_mmio_uncores, -1, cpu);
>> - uncore_change_context(uncore_pci_uncores, -1, cpu);
>> + uncore_change_context(uncore_pci_uncores, -1, cpu);
>> + }
>> +
>> + uncore_box_ref(uncore_msr_uncores, die, cpu);
>> + uncore_box_ref(uncore_mmio_uncores, die, cpu);
>> return 0;
>> }
>>
end of thread, other threads:[~2026-05-13 18:43 UTC | newest]
Thread overview: 19+ messages
2026-05-12 23:30 [PATCH 0/7] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
2026-05-12 23:30 ` [PATCH 1/7] perf/x86/intel/uncore: Rename refcount fields and other cleanups Zide Chen
2026-05-13 0:26 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 2/7] perf/x86/intel/uncore: Let init_box() callback report failures Zide Chen
2026-05-13 0:23 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 3/7] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails Zide Chen
2026-05-13 0:30 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 4/7] perf/x86/intel/uncore: Factor out box setup code Zide Chen
2026-05-13 0:27 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state Zide Chen
2026-05-13 0:28 ` Ian Rogers
2026-05-12 23:30 ` [PATCH 6/7] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering on CPU hotplug Zide Chen
2026-05-13 0:32 ` Ian Rogers
2026-05-13 8:59 ` Mi, Dapeng
2026-05-13 18:43 ` Chen, Zide
2026-05-12 23:30 ` [PATCH 7/7] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMU Zide Chen
2026-05-13 0:34 ` Ian Rogers
2026-05-13 9:03 ` Mi, Dapeng
2026-05-13 16:47 ` Chen, Zide