public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/7] Generic hotplug support for a PMU with a scope
@ 2024-08-02 15:16 kan.liang
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel; +Cc: Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The perf subsystem assumes that all counters of a PMU are per-CPU, so
in system-wide mode the user space tool reads a counter from each CPU.
However, many PMUs don't have per-CPU counters. Their counters are
effective for a whole scope, e.g., a die or a socket. To address this,
the kernel driver exposes a cpumask that nominates a single CPU to
stand for a specific scope. In case that CPU is removed, hotplug
support has to be implemented in each such driver.

The code to support the cpumask and hotplug is very similar:
- Expose a cpumask in sysfs
- Pick another CPU in the same scope if the nominated CPU is removed.
- Invoke perf_pmu_migrate_context() to migrate to the new CPU.
- In event init, always set the CPU from the cpumask in event->cpu
- Usually, an event can be read from any CPU of the scope. (For now,
  this is only supported by the pkg-scope PMUs, via
  PERF_EV_CAP_READ_ACTIVE_PKG, e.g., cstate_pkg, rapl, etc.)

Nearly identical code is duplicated in each such PMU driver. It
would be good to introduce a generic infrastructure to avoid this
duplication.

The patch series introduces five common scopes: core, die, cluster,
pkg, and system-wide. The PMU drivers for cstate, iommu, idxd and rapl
are updated to use the new infrastructure. The infrastructure can be
applied to PMU drivers from other architectures as well, but I don't
have such platforms, so that is not done in this series. Those drivers
can be converted later, separately.

The uncore driver isn't updated either, because it has required a
per-PMU cpumask since commit c74443d92f68 ("perf/x86/uncore: Support
per PMU cpumask"). Since different PMU types there share the same
hotplug code, more factoring out and verification are needed. The
cleanup of the uncore driver can be done later, separately.

Kan Liang (7):
  perf: Generic hotplug support for a PMU with a scope
  perf: Add PERF_EV_CAP_READ_SCOPE
  perf/x86/intel/cstate: Clean up cpumask and hotplug
  iommu/vt-d: Clean up cpumask and hotplug
  dmaengine: idxd: Clean up cpumask and hotplug
  perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  perf/x86/rapl: Clean up cpumask and hotplug

 arch/x86/events/intel/cstate.c | 140 +-------------------------
 arch/x86/events/rapl.c         | 119 ++++++----------------
 drivers/dma/idxd/idxd.h        |   7 --
 drivers/dma/idxd/init.c        |   3 -
 drivers/dma/idxd/perfmon.c     |  98 +-----------------
 drivers/iommu/intel/iommu.h    |   2 -
 drivers/iommu/intel/perfmon.c  | 111 +--------------------
 include/linux/cpuhotplug.h     |   3 -
 include/linux/perf_event.h     |  21 ++++
 kernel/events/core.c           | 176 ++++++++++++++++++++++++++++++++-
 10 files changed, 229 insertions(+), 451 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
                     ` (2 more replies)
  2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel; +Cc: Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The perf subsystem assumes that all counters of a PMU are per-CPU, so
in system-wide mode the user space tool reads a counter from each CPU.
However, many PMUs don't have per-CPU counters. Their counters are
effective for a whole scope, e.g., a die or a socket. To address this,
the kernel driver exposes a cpumask that nominates a single CPU to
stand for a specific scope. In case that CPU is removed, hotplug
support has to be implemented in each such driver.

The code to support the cpumask and hotplug is very similar:
- Expose a cpumask in sysfs
- Pick another CPU in the same scope if the nominated CPU is removed.
- Invoke perf_pmu_migrate_context() to migrate to the new CPU.
- In event init, always set the CPU from the cpumask in event->cpu

Nearly identical code is duplicated in each such PMU driver. It
would be good to introduce a generic infrastructure to avoid this
duplication.

Five common scopes are implemented here: core, die, cluster, pkg, and
system-wide. The scope can be set when a PMU is registered. If it is, a
"cpumask" is automatically exposed for the PMU.

The "cpumask" comes from the corresponding perf_online_<scope>_mask,
which tracks the active CPU of each scope. A CPU is set in the mask
when the first CPU of its scope comes online, via the generic perf
hotplug support. When that CPU is removed, perf_online_<scope>_mask is
updated accordingly and the PMU is moved to another CPU in the same
scope, if one is available.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  18 ++++
 kernel/events/core.c       | 164 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 180 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1a8942277dda..1102d5c2be70 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -292,6 +292,19 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
 
+/**
+ * pmu::scope
+ */
+enum perf_pmu_scope {
+	PERF_PMU_SCOPE_NONE	= 0,
+	PERF_PMU_SCOPE_CORE,
+	PERF_PMU_SCOPE_DIE,
+	PERF_PMU_SCOPE_CLUSTER,
+	PERF_PMU_SCOPE_PKG,
+	PERF_PMU_SCOPE_SYS_WIDE,
+	PERF_PMU_MAX_SCOPE,
+};
+
 struct perf_output_handle;
 
 #define PMU_NULL_DEV	((void *)(~0UL))
@@ -315,6 +328,11 @@ struct pmu {
 	 */
 	int				capabilities;
 
+	/*
+	 * PMU scope
+	 */
+	unsigned int			scope;
+
 	int __percpu			*pmu_disable_count;
 	struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
 	atomic_t			exclusive_cnt; /* < 0: cpu; > 0: tsk */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa3450bdc227..5e1877c4cb4c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -407,6 +407,11 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 static cpumask_var_t perf_online_mask;
+static cpumask_var_t perf_online_core_mask;
+static cpumask_var_t perf_online_die_mask;
+static cpumask_var_t perf_online_cluster_mask;
+static cpumask_var_t perf_online_pkg_mask;
+static cpumask_var_t perf_online_sys_mask;
 static struct kmem_cache *perf_event_cache;
 
 /*
@@ -11477,10 +11482,60 @@ perf_event_mux_interval_ms_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(perf_event_mux_interval_ms);
 
+static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu)
+{
+	switch (scope) {
+	case PERF_PMU_SCOPE_CORE:
+		return topology_sibling_cpumask(cpu);
+	case PERF_PMU_SCOPE_DIE:
+		return topology_die_cpumask(cpu);
+	case PERF_PMU_SCOPE_CLUSTER:
+		return topology_cluster_cpumask(cpu);
+	case PERF_PMU_SCOPE_PKG:
+		return topology_core_cpumask(cpu);
+	case PERF_PMU_SCOPE_SYS_WIDE:
+		return cpu_online_mask;
+	}
+
+	return NULL;
+}
+
+static inline struct cpumask *perf_scope_cpumask(unsigned int scope)
+{
+	switch (scope) {
+	case PERF_PMU_SCOPE_CORE:
+		return perf_online_core_mask;
+	case PERF_PMU_SCOPE_DIE:
+		return perf_online_die_mask;
+	case PERF_PMU_SCOPE_CLUSTER:
+		return perf_online_cluster_mask;
+	case PERF_PMU_SCOPE_PKG:
+		return perf_online_pkg_mask;
+	case PERF_PMU_SCOPE_SYS_WIDE:
+		return perf_online_sys_mask;
+	}
+
+	return NULL;
+}
+
+static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct cpumask *mask = perf_scope_cpumask(pmu->scope);
+
+	if (mask)
+		return cpumap_print_to_pagebuf(true, buf, mask);
+	return 0;
+}
+
+static DEVICE_ATTR_RO(cpumask);
+
 static struct attribute *pmu_dev_attrs[] = {
 	&dev_attr_type.attr,
 	&dev_attr_perf_event_mux_interval_ms.attr,
 	&dev_attr_nr_addr_filters.attr,
+	&dev_attr_cpumask.attr,
 	NULL,
 };
 
@@ -11492,6 +11547,10 @@ static umode_t pmu_dev_is_visible(struct kobject *kobj, struct attribute *a, int
 	if (n == 2 && !pmu->nr_addr_filters)
 		return 0;
 
+	/* cpumask */
+	if (n == 3 && pmu->scope == PERF_PMU_SCOPE_NONE)
+		return 0;
+
 	return a->mode;
 }
 
@@ -11576,6 +11635,11 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 		goto free_pdc;
 	}
 
+	if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
+		ret = -EINVAL;
+		goto free_pdc;
+	}
+
 	pmu->name = name;
 
 	if (type >= 0)
@@ -11730,6 +11794,22 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		    event_has_any_exclude_flag(event))
 			ret = -EINVAL;
 
+		if (pmu->scope != PERF_PMU_SCOPE_NONE && event->cpu >= 0) {
+			const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(pmu->scope, event->cpu);
+			struct cpumask *pmu_cpumask = perf_scope_cpumask(pmu->scope);
+			int cpu;
+
+			if (pmu_cpumask && cpumask) {
+				cpu = cpumask_any_and(pmu_cpumask, cpumask);
+				if (cpu >= nr_cpu_ids)
+					ret = -ENODEV;
+				else
+					event->cpu = cpu;
+			} else {
+				ret = -ENODEV;
+			}
+		}
+
 		if (ret && event->destroy)
 			event->destroy(event);
 	}
@@ -13681,6 +13761,12 @@ static void __init perf_event_init_all_cpus(void)
 	int cpu;
 
 	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_core_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_die_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_cluster_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_pkg_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_sys_mask, GFP_KERNEL);
+
 
 	for_each_possible_cpu(cpu) {
 		swhash = &per_cpu(swevent_htable, cpu);
@@ -13730,6 +13816,40 @@ static void __perf_event_exit_context(void *__info)
 	raw_spin_unlock(&ctx->lock);
 }
 
+static void perf_event_clear_cpumask(unsigned int cpu)
+{
+	int target[PERF_PMU_MAX_SCOPE];
+	unsigned int scope;
+	struct pmu *pmu;
+
+	cpumask_clear_cpu(cpu, perf_online_mask);
+
+	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
+		struct cpumask *pmu_cpumask = perf_scope_cpumask(scope);
+
+		target[scope] = -1;
+		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
+			continue;
+
+		if (!cpumask_test_and_clear_cpu(cpu, pmu_cpumask))
+			continue;
+		target[scope] = cpumask_any_but(cpumask, cpu);
+		if (target[scope] < nr_cpu_ids)
+			cpumask_set_cpu(target[scope], pmu_cpumask);
+	}
+
+	/* migrate */
+	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
+		if (pmu->scope == PERF_PMU_SCOPE_NONE ||
+		    WARN_ON_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE))
+			continue;
+
+		if (target[pmu->scope] >= 0 && target[pmu->scope] < nr_cpu_ids)
+			perf_pmu_migrate_context(pmu, cpu, target[pmu->scope]);
+	}
+}
+
 static void perf_event_exit_cpu_context(int cpu)
 {
 	struct perf_cpu_context *cpuctx;
@@ -13737,6 +13857,11 @@ static void perf_event_exit_cpu_context(int cpu)
 
 	// XXX simplify cpuctx->online
 	mutex_lock(&pmus_lock);
+	/*
+	 * Clear the cpumasks, and migrate to other CPUs if possible.
+	 * Must be invoked before the __perf_event_exit_context.
+	 */
+	perf_event_clear_cpumask(cpu);
 	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
 	ctx = &cpuctx->ctx;
 
@@ -13744,7 +13869,6 @@ static void perf_event_exit_cpu_context(int cpu)
 	smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
 	cpuctx->online = 0;
 	mutex_unlock(&ctx->mutex);
-	cpumask_clear_cpu(cpu, perf_online_mask);
 	mutex_unlock(&pmus_lock);
 }
 #else
@@ -13753,6 +13877,42 @@ static void perf_event_exit_cpu_context(int cpu) { }
 
 #endif
 
+static void perf_event_setup_cpumask(unsigned int cpu)
+{
+	struct cpumask *pmu_cpumask;
+	unsigned int scope;
+
+	cpumask_set_cpu(cpu, perf_online_mask);
+
+	/*
+	 * Early boot stage, the cpumask hasn't been set yet.
+	 * The perf_online_<domain>_masks includes the first CPU of each domain.
+	 * Unconditionally set the boot CPU in the perf_online_<domain>_masks.
+	 */
+	if (!topology_sibling_cpumask(cpu)) {
+		for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+			pmu_cpumask = perf_scope_cpumask(scope);
+			if (WARN_ON_ONCE(!pmu_cpumask))
+				continue;
+			cpumask_set_cpu(cpu, pmu_cpumask);
+		}
+		return;
+	}
+
+	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
+
+		pmu_cpumask = perf_scope_cpumask(scope);
+
+		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
+			continue;
+
+		if (!cpumask_empty(cpumask) &&
+		    cpumask_any_and(pmu_cpumask, cpumask) >= nr_cpu_ids)
+			cpumask_set_cpu(cpu, pmu_cpumask);
+	}
+}
+
 int perf_event_init_cpu(unsigned int cpu)
 {
 	struct perf_cpu_context *cpuctx;
@@ -13761,7 +13921,7 @@ int perf_event_init_cpu(unsigned int cpu)
 	perf_swevent_init_cpu(cpu);
 
 	mutex_lock(&pmus_lock);
-	cpumask_set_cpu(cpu, perf_online_mask);
+	perf_event_setup_cpumask(cpu);
 	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
 	ctx = &cpuctx->ctx;
 
-- 
2.38.1



* [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-06 15:11   ` Peter Zijlstra
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-08-02 15:16 ` [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug kan.liang
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel; +Cc: Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Usually, an event can be read from any CPU in its scope; it doesn't
need to be read on the advertised CPU.

Add a new event capability, PERF_EV_CAP_READ_SCOPE. An event of a PMU
with a scope can be read from any active CPU in that scope.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  3 +++
 kernel/events/core.c       | 14 +++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1102d5c2be70..1206bc86eb4f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -633,10 +633,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
  * cannot be a group leader. If an event with this flag is detached from the
  * group it is scheduled out and moved into an unrecoverable ERROR state.
+ * PERF_EV_CAP_READ_SCOPE: A CPU event that can be read from any CPU of the
+ * PMU scope where it is active.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
 #define PERF_EV_CAP_SIBLING		BIT(2)
+#define PERF_EV_CAP_READ_SCOPE		BIT(3)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5e1877c4cb4c..c55294f34575 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4463,16 +4463,24 @@ struct perf_read_data {
 	int ret;
 };
 
+static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu);
+
 static int __perf_event_read_cpu(struct perf_event *event, int event_cpu)
 {
+	int local_cpu = smp_processor_id();
 	u16 local_pkg, event_pkg;
 
 	if ((unsigned)event_cpu >= nr_cpu_ids)
 		return event_cpu;
 
-	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
-		int local_cpu = smp_processor_id();
+	if (event->group_caps & PERF_EV_CAP_READ_SCOPE) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(event->pmu->scope, event_cpu);
+
+		if (cpumask && cpumask_test_cpu(local_cpu, cpumask))
+			return local_cpu;
+	}
 
+	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
 		event_pkg = topology_physical_package_id(event_cpu);
 		local_pkg = topology_physical_package_id(local_cpu);
 
@@ -11804,7 +11812,7 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 				if (cpu >= nr_cpu_ids)
 					ret = -ENODEV;
 				else
-					event->cpu = cpu;
+					event->event_caps |= PERF_EV_CAP_READ_SCOPE;
 			} else {
 				ret = -ENODEV;
 			}
-- 
2.38.1



* [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
  2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-08-02 15:16 ` [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon kan.liang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel; +Cc: Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

There are three cstate PMUs with different scopes: core, die and
module. These scopes are now supported by the generic perf_event
subsystem.

Set the scope for each PMU and remove all the cpumask and hotplug code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/cstate.c | 142 ++-------------------------------
 include/linux/cpuhotplug.h     |   2 -
 2 files changed, 5 insertions(+), 139 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index be58cfb012dd..13d229f2cdda 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -128,10 +128,6 @@ static ssize_t __cstate_##_var##_show(struct device *dev,	\
 static struct device_attribute format_attr_##_var =		\
 	__ATTR(_name, 0444, __cstate_##_var##_show, NULL)
 
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf);
-
 /* Model -> events mapping */
 struct cstate_model {
 	unsigned long		core_events;
@@ -206,22 +202,9 @@ static struct attribute_group cstate_format_attr_group = {
 	.attrs = cstate_format_attrs,
 };
 
-static cpumask_t cstate_core_cpu_mask;
-static DEVICE_ATTR(cpumask, S_IRUGO, cstate_get_attr_cpumask, NULL);
-
-static struct attribute *cstate_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group cpumask_attr_group = {
-	.attrs = cstate_cpumask_attrs,
-};
-
 static const struct attribute_group *cstate_attr_groups[] = {
 	&cstate_events_attr_group,
 	&cstate_format_attr_group,
-	&cpumask_attr_group,
 	NULL,
 };
 
@@ -269,8 +252,6 @@ static struct perf_msr pkg_msr[] = {
 	[PERF_CSTATE_PKG_C10_RES] = { MSR_PKG_C10_RESIDENCY,	&group_cstate_pkg_c10,	test_msr },
 };
 
-static cpumask_t cstate_pkg_cpu_mask;
-
 /* cstate_module PMU */
 static struct pmu cstate_module_pmu;
 static bool has_cstate_module;
@@ -291,28 +272,9 @@ static struct perf_msr module_msr[] = {
 	[PERF_CSTATE_MODULE_C6_RES]  = { MSR_MODULE_C6_RES_MS,	&group_cstate_module_c6,	test_msr },
 };
 
-static cpumask_t cstate_module_cpu_mask;
-
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf)
-{
-	struct pmu *pmu = dev_get_drvdata(dev);
-
-	if (pmu == &cstate_core_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_core_cpu_mask);
-	else if (pmu == &cstate_pkg_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_pkg_cpu_mask);
-	else if (pmu == &cstate_module_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_module_cpu_mask);
-	else
-		return 0;
-}
-
 static int cstate_pmu_event_init(struct perf_event *event)
 {
 	u64 cfg = event->attr.config;
-	int cpu;
 
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
@@ -331,20 +293,13 @@ static int cstate_pmu_event_init(struct perf_event *event)
 		if (!(core_msr_mask & (1 << cfg)))
 			return -EINVAL;
 		event->hw.event_base = core_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_core_cpu_mask,
-				      topology_sibling_cpumask(event->cpu));
 	} else if (event->pmu == &cstate_pkg_pmu) {
 		if (cfg >= PERF_CSTATE_PKG_EVENT_MAX)
 			return -EINVAL;
 		cfg = array_index_nospec((unsigned long)cfg, PERF_CSTATE_PKG_EVENT_MAX);
 		if (!(pkg_msr_mask & (1 << cfg)))
 			return -EINVAL;
-
-		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
 		event->hw.event_base = pkg_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_pkg_cpu_mask,
-				      topology_die_cpumask(event->cpu));
 	} else if (event->pmu == &cstate_module_pmu) {
 		if (cfg >= PERF_CSTATE_MODULE_EVENT_MAX)
 			return -EINVAL;
@@ -352,16 +307,10 @@ static int cstate_pmu_event_init(struct perf_event *event)
 		if (!(module_msr_mask & (1 << cfg)))
 			return -EINVAL;
 		event->hw.event_base = module_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_module_cpu_mask,
-				      topology_cluster_cpumask(event->cpu));
 	} else {
 		return -ENOENT;
 	}
 
-	if (cpu >= nr_cpu_ids)
-		return -ENODEV;
-
-	event->cpu = cpu;
 	event->hw.config = cfg;
 	event->hw.idx = -1;
 	return 0;
@@ -412,84 +361,6 @@ static int cstate_pmu_event_add(struct perf_event *event, int mode)
 	return 0;
 }
 
-/*
- * Check if exiting cpu is the designated reader. If so migrate the
- * events when there is a valid target available
- */
-static int cstate_cpu_exit(unsigned int cpu)
-{
-	unsigned int target;
-
-	if (has_cstate_core &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_core_cpu_mask)) {
-
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_core_cpu_mask);
-			perf_pmu_migrate_context(&cstate_core_pmu, cpu, target);
-		}
-	}
-
-	if (has_cstate_pkg &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_pkg_cpu_mask)) {
-
-		target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_pkg_cpu_mask);
-			perf_pmu_migrate_context(&cstate_pkg_pmu, cpu, target);
-		}
-	}
-
-	if (has_cstate_module &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_module_cpu_mask)) {
-
-		target = cpumask_any_but(topology_cluster_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_module_cpu_mask);
-			perf_pmu_migrate_context(&cstate_module_pmu, cpu, target);
-		}
-	}
-	return 0;
-}
-
-static int cstate_cpu_init(unsigned int cpu)
-{
-	unsigned int target;
-
-	/*
-	 * If this is the first online thread of that core, set it in
-	 * the core cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_core_cpu_mask,
-				 topology_sibling_cpumask(cpu));
-
-	if (has_cstate_core && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_core_cpu_mask);
-
-	/*
-	 * If this is the first online thread of that package, set it
-	 * in the package cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_pkg_cpu_mask,
-				 topology_die_cpumask(cpu));
-	if (has_cstate_pkg && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_pkg_cpu_mask);
-
-	/*
-	 * If this is the first online thread of that cluster, set it
-	 * in the cluster cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_module_cpu_mask,
-				 topology_cluster_cpumask(cpu));
-	if (has_cstate_module && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_module_cpu_mask);
-
-	return 0;
-}
-
 static const struct attribute_group *core_attr_update[] = {
 	&group_cstate_core_c1,
 	&group_cstate_core_c3,
@@ -526,6 +397,7 @@ static struct pmu cstate_core_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_CORE,
 	.module		= THIS_MODULE,
 };
 
@@ -541,6 +413,7 @@ static struct pmu cstate_pkg_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_PKG,
 	.module		= THIS_MODULE,
 };
 
@@ -556,6 +429,7 @@ static struct pmu cstate_module_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_CLUSTER,
 	.module		= THIS_MODULE,
 };
 
@@ -809,9 +683,6 @@ static int __init cstate_probe(const struct cstate_model *cm)
 
 static inline void cstate_cleanup(void)
 {
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_ONLINE);
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_STARTING);
-
 	if (has_cstate_core)
 		perf_pmu_unregister(&cstate_core_pmu);
 
@@ -826,11 +697,6 @@ static int __init cstate_init(void)
 {
 	int err;
 
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_STARTING,
-			  "perf/x86/cstate:starting", cstate_cpu_init, NULL);
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_ONLINE,
-			  "perf/x86/cstate:online", NULL, cstate_cpu_exit);
-
 	if (has_cstate_core) {
 		err = perf_pmu_register(&cstate_core_pmu, cstate_core_pmu.name, -1);
 		if (err) {
@@ -843,6 +709,8 @@ static int __init cstate_init(void)
 
 	if (has_cstate_pkg) {
 		if (topology_max_dies_per_package() > 1) {
+			/* CLX-AP is multi-die and the cstate is die-scope */
+			cstate_pkg_pmu.scope = PERF_PMU_SCOPE_DIE;
 			err = perf_pmu_register(&cstate_pkg_pmu,
 						"cstate_die", -1);
 		} else {
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 51ba681b915a..9ea6290ade56 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -152,7 +152,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING,
 	CPUHP_AP_PERF_X86_STARTING,
 	CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
-	CPUHP_AP_PERF_X86_CSTATE_STARTING,
 	CPUHP_AP_PERF_XTENSA_STARTING,
 	CPUHP_AP_ARM_VFP_STARTING,
 	CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING,
@@ -209,7 +208,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_POWER_ONLINE,
 	CPUHP_AP_PERF_X86_RAPL_ONLINE,
-	CPUHP_AP_PERF_X86_CSTATE_ONLINE,
 	CPUHP_AP_PERF_S390_CF_ONLINE,
 	CPUHP_AP_PERF_S390_SF_ONLINE,
 	CPUHP_AP_PERF_ARM_CCI_ONLINE,
-- 
2.38.1



* [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (2 preceding siblings ...)
  2024-08-02 15:16 ` [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Kan Liang, Lu Baolu, David Woodhouse, Joerg Roedel, Will Deacon,
	iommu

From: Kan Liang <kan.liang@linux.intel.com>

The iommu PMU has system-wide scope, which is now supported by the
generic perf_event subsystem.

Set the scope for the iommu PMU and remove all the cpumask and hotplug
code.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: iommu@lists.linux.dev
---
 drivers/iommu/intel/iommu.h   |   2 -
 drivers/iommu/intel/perfmon.c | 111 +---------------------------------
 2 files changed, 2 insertions(+), 111 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b67c14da1240..bd2c5a4ca11a 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -687,8 +687,6 @@ struct iommu_pmu {
 	DECLARE_BITMAP(used_mask, IOMMU_PMU_IDX_MAX);
 	struct perf_event	*event_list[IOMMU_PMU_IDX_MAX];
 	unsigned char		irq_name[16];
-	struct hlist_node	cpuhp_node;
-	int			cpu;
 };
 
 #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index 44083d01852d..75f493bcb353 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -34,28 +34,9 @@ static struct attribute_group iommu_pmu_events_attr_group = {
 	.attrs = attrs_empty,
 };
 
-static cpumask_t iommu_pmu_cpu_mask;
-
-static ssize_t
-cpumask_show(struct device *dev, struct device_attribute *attr, char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &iommu_pmu_cpu_mask);
-}
-static DEVICE_ATTR_RO(cpumask);
-
-static struct attribute *iommu_pmu_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL
-};
-
-static struct attribute_group iommu_pmu_cpumask_attr_group = {
-	.attrs = iommu_pmu_cpumask_attrs,
-};
-
 static const struct attribute_group *iommu_pmu_attr_groups[] = {
 	&iommu_pmu_format_attr_group,
 	&iommu_pmu_events_attr_group,
-	&iommu_pmu_cpumask_attr_group,
 	NULL
 };
 
@@ -565,6 +546,7 @@ static int __iommu_pmu_register(struct intel_iommu *iommu)
 	iommu_pmu->pmu.attr_groups	= iommu_pmu_attr_groups;
 	iommu_pmu->pmu.attr_update	= iommu_pmu_attr_update;
 	iommu_pmu->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+	iommu_pmu->pmu.scope		= PERF_PMU_SCOPE_SYS_WIDE;
 	iommu_pmu->pmu.module		= THIS_MODULE;
 
 	return perf_pmu_register(&iommu_pmu->pmu, iommu_pmu->pmu.name, -1);
@@ -773,89 +755,6 @@ static void iommu_pmu_unset_interrupt(struct intel_iommu *iommu)
 	iommu->perf_irq = 0;
 }
 
-static int iommu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
-
-	if (cpumask_empty(&iommu_pmu_cpu_mask))
-		cpumask_set_cpu(cpu, &iommu_pmu_cpu_mask);
-
-	if (cpumask_test_cpu(cpu, &iommu_pmu_cpu_mask))
-		iommu_pmu->cpu = cpu;
-
-	return 0;
-}
-
-static int iommu_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
-	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
-	int target = cpumask_first(&iommu_pmu_cpu_mask);
-
-	/*
-	 * The iommu_pmu_cpu_mask has been updated when offline the CPU
-	 * for the first iommu_pmu. Migrate the other iommu_pmu to the
-	 * new target.
-	 */
-	if (target < nr_cpu_ids && target != iommu_pmu->cpu) {
-		perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
-		iommu_pmu->cpu = target;
-		return 0;
-	}
-
-	if (!cpumask_test_and_clear_cpu(cpu, &iommu_pmu_cpu_mask))
-		return 0;
-
-	target = cpumask_any_but(cpu_online_mask, cpu);
-
-	if (target < nr_cpu_ids)
-		cpumask_set_cpu(target, &iommu_pmu_cpu_mask);
-	else
-		return 0;
-
-	perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
-	iommu_pmu->cpu = target;
-
-	return 0;
-}
-
-static int nr_iommu_pmu;
-static enum cpuhp_state iommu_cpuhp_slot;
-
-static int iommu_pmu_cpuhp_setup(struct iommu_pmu *iommu_pmu)
-{
-	int ret;
-
-	if (!nr_iommu_pmu) {
-		ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
-					      "driver/iommu/intel/perfmon:online",
-					      iommu_pmu_cpu_online,
-					      iommu_pmu_cpu_offline);
-		if (ret < 0)
-			return ret;
-		iommu_cpuhp_slot = ret;
-	}
-
-	ret = cpuhp_state_add_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
-	if (ret) {
-		if (!nr_iommu_pmu)
-			cpuhp_remove_multi_state(iommu_cpuhp_slot);
-		return ret;
-	}
-	nr_iommu_pmu++;
-
-	return 0;
-}
-
-static void iommu_pmu_cpuhp_free(struct iommu_pmu *iommu_pmu)
-{
-	cpuhp_state_remove_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
-
-	if (--nr_iommu_pmu)
-		return;
-
-	cpuhp_remove_multi_state(iommu_cpuhp_slot);
-}
-
 void iommu_pmu_register(struct intel_iommu *iommu)
 {
 	struct iommu_pmu *iommu_pmu = iommu->pmu;
@@ -866,17 +765,12 @@ void iommu_pmu_register(struct intel_iommu *iommu)
 	if (__iommu_pmu_register(iommu))
 		goto err;
 
-	if (iommu_pmu_cpuhp_setup(iommu_pmu))
-		goto unregister;
-
 	/* Set interrupt for overflow */
 	if (iommu_pmu_set_interrupt(iommu))
-		goto cpuhp_free;
+		goto unregister;
 
 	return;
 
-cpuhp_free:
-	iommu_pmu_cpuhp_free(iommu_pmu);
 unregister:
 	perf_pmu_unregister(&iommu_pmu->pmu);
 err:
@@ -892,6 +786,5 @@ void iommu_pmu_unregister(struct intel_iommu *iommu)
 		return;
 
 	iommu_pmu_unset_interrupt(iommu);
-	iommu_pmu_cpuhp_free(iommu_pmu);
 	perf_pmu_unregister(&iommu_pmu->pmu);
 }
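For comparison with the generic support that replaces it, the offline handler removed above follows the pattern common to all of these drivers: if the departing CPU was the designated reader, clear it from the mask, pick another online CPU, and migrate the events there. A minimal stand-alone sketch of that target selection (hypothetical and simplified: plain int arrays stand in for cpumask_t, and the perf_pmu_migrate_context() call is reduced to a comment):

```c
#include <assert.h>

#define NR_CPUS     8
#define INVALID_CPU NR_CPUS

/* Hypothetical stand-ins for cpu_online_mask and the PMU's cpumask. */
static int cpu_online[NR_CPUS];
static int pmu_mask[NR_CPUS];

/* Mimic cpumask_any_but(): first online CPU other than 'cpu'. */
static int any_online_but(int cpu)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (cpu_online[i] && i != cpu)
			return i;
	return INVALID_CPU;
}

/*
 * The offline pattern: if 'cpu' was the designated reader, clear it
 * from the mask and promote another online CPU. Returns the new
 * reader, or INVALID_CPU if 'cpu' was not the reader or no online
 * CPU is left.
 */
static int on_cpu_offline(int cpu)
{
	int target;

	if (!pmu_mask[cpu])
		return INVALID_CPU;	/* not the reader, nothing to do */
	pmu_mask[cpu] = 0;

	target = any_online_but(cpu);
	if (target != INVALID_CPU)
		pmu_mask[target] = 1;	/* perf_pmu_migrate_context() here */
	return target;
}
```

The generic scope support moves exactly this target selection into the perf core, keyed on the PMU's scope cpumask instead of a per-driver mask.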
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/7] dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (3 preceding siblings ...)
  2024-08-02 15:16 ` [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-08-05 15:40   ` Dave Jiang
                     ` (2 more replies)
  2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Kan Liang, Fenghua Yu, Dave Jiang, Vinod Koul, dmaengine

From: Kan Liang <kan.liang@linux.intel.com>

The idxd PMU has system-wide scope, which is now supported by the
generic perf_event subsystem.

Set the scope for the idxd PMU and remove all the cpumask and hotplug
code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: dmaengine@vger.kernel.org
---
 drivers/dma/idxd/idxd.h    |  7 ---
 drivers/dma/idxd/init.c    |  3 --
 drivers/dma/idxd/perfmon.c | 98 +-------------------------------------
 3 files changed, 1 insertion(+), 107 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 868b724a3b75..d84e21daa991 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -124,7 +124,6 @@ struct idxd_pmu {
 
 	struct pmu pmu;
 	char name[IDXD_NAME_SIZE];
-	int cpu;
 
 	int n_counters;
 	int counter_width;
@@ -135,8 +134,6 @@ struct idxd_pmu {
 
 	unsigned long supported_filters;
 	int n_filters;
-
-	struct hlist_node cpuhp_node;
 };
 
 #define IDXD_MAX_PRIORITY	0xf
@@ -803,14 +800,10 @@ void idxd_user_counter_increment(struct idxd_wq *wq, u32 pasid, int index);
 int perfmon_pmu_init(struct idxd_device *idxd);
 void perfmon_pmu_remove(struct idxd_device *idxd);
 void perfmon_counter_overflow(struct idxd_device *idxd);
-void perfmon_init(void);
-void perfmon_exit(void);
 #else
 static inline int perfmon_pmu_init(struct idxd_device *idxd) { return 0; }
 static inline void perfmon_pmu_remove(struct idxd_device *idxd) {}
 static inline void perfmon_counter_overflow(struct idxd_device *idxd) {}
-static inline void perfmon_init(void) {}
-static inline void perfmon_exit(void) {}
 #endif
 
 /* debugfs */
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 21f6905b554d..5725ea82c409 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -878,8 +878,6 @@ static int __init idxd_init_module(void)
 	else
 		support_enqcmd = true;
 
-	perfmon_init();
-
 	err = idxd_driver_register(&idxd_drv);
 	if (err < 0)
 		goto err_idxd_driver_register;
@@ -928,7 +926,6 @@ static void __exit idxd_exit_module(void)
 	idxd_driver_unregister(&idxd_drv);
 	pci_unregister_driver(&idxd_pci_driver);
 	idxd_cdev_remove();
-	perfmon_exit();
 	idxd_remove_debugfs();
 }
 module_exit(idxd_exit_module);
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index 5e94247e1ea7..f511cf15845b 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -6,29 +6,6 @@
 #include "idxd.h"
 #include "perfmon.h"
 
-static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
-			    char *buf);
-
-static cpumask_t		perfmon_dsa_cpu_mask;
-static bool			cpuhp_set_up;
-static enum cpuhp_state		cpuhp_slot;
-
-/*
- * perf userspace reads this attribute to determine which cpus to open
- * counters on.  It's connected to perfmon_dsa_cpu_mask, which is
- * maintained by the cpu hotplug handlers.
- */
-static DEVICE_ATTR_RO(cpumask);
-
-static struct attribute *perfmon_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group cpumask_attr_group = {
-	.attrs = perfmon_cpumask_attrs,
-};
-
 /*
  * These attributes specify the bits in the config word that the perf
  * syscall uses to pass the event ids and categories to perfmon.
@@ -67,16 +44,9 @@ static struct attribute_group perfmon_format_attr_group = {
 
 static const struct attribute_group *perfmon_attr_groups[] = {
 	&perfmon_format_attr_group,
-	&cpumask_attr_group,
 	NULL,
 };
 
-static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
-			    char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &perfmon_dsa_cpu_mask);
-}
-
 static bool is_idxd_event(struct idxd_pmu *idxd_pmu, struct perf_event *event)
 {
 	return &idxd_pmu->pmu == event->pmu;
@@ -217,7 +187,6 @@ static int perfmon_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 
 	event->hw.event_base = ioread64(PERFMON_TABLE_OFFSET(idxd));
-	event->cpu = idxd->idxd_pmu->cpu;
 	event->hw.config = event->attr.config;
 
 	if (event->group_leader != event)
@@ -488,6 +457,7 @@ static void idxd_pmu_init(struct idxd_pmu *idxd_pmu)
 	idxd_pmu->pmu.stop		= perfmon_pmu_event_stop;
 	idxd_pmu->pmu.read		= perfmon_pmu_event_update;
 	idxd_pmu->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+	idxd_pmu->pmu.scope		= PERF_PMU_SCOPE_SYS_WIDE;
 	idxd_pmu->pmu.module		= THIS_MODULE;
 }
 
@@ -496,59 +466,17 @@ void perfmon_pmu_remove(struct idxd_device *idxd)
 	if (!idxd->idxd_pmu)
 		return;
 
-	cpuhp_state_remove_instance(cpuhp_slot, &idxd->idxd_pmu->cpuhp_node);
 	perf_pmu_unregister(&idxd->idxd_pmu->pmu);
 	kfree(idxd->idxd_pmu);
 	idxd->idxd_pmu = NULL;
 }
 
-static int perf_event_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-	struct idxd_pmu *idxd_pmu;
-
-	idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
-
-	/* select the first online CPU as the designated reader */
-	if (cpumask_empty(&perfmon_dsa_cpu_mask)) {
-		cpumask_set_cpu(cpu, &perfmon_dsa_cpu_mask);
-		idxd_pmu->cpu = cpu;
-	}
-
-	return 0;
-}
-
-static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
-	struct idxd_pmu *idxd_pmu;
-	unsigned int target;
-
-	idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
-
-	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
-		return 0;
-
-	target = cpumask_any_but(cpu_online_mask, cpu);
-	/* migrate events if there is a valid target */
-	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &perfmon_dsa_cpu_mask);
-		perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
-	}
-
-	return 0;
-}
-
 int perfmon_pmu_init(struct idxd_device *idxd)
 {
 	union idxd_perfcap perfcap;
 	struct idxd_pmu *idxd_pmu;
 	int rc = -ENODEV;
 
-	/*
-	 * perfmon module initialization failed, nothing to do
-	 */
-	if (!cpuhp_set_up)
-		return -ENODEV;
-
 	/*
 	 * If perfmon_offset or num_counters is 0, it means perfmon is
 	 * not supported on this hardware.
@@ -624,11 +552,6 @@ int perfmon_pmu_init(struct idxd_device *idxd)
 	if (rc)
 		goto free;
 
-	rc = cpuhp_state_add_instance(cpuhp_slot, &idxd_pmu->cpuhp_node);
-	if (rc) {
-		perf_pmu_unregister(&idxd->idxd_pmu->pmu);
-		goto free;
-	}
 out:
 	return rc;
 free:
@@ -637,22 +560,3 @@ int perfmon_pmu_init(struct idxd_device *idxd)
 
 	goto out;
 }
-
-void __init perfmon_init(void)
-{
-	int rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
-					 "driver/dma/idxd/perf:online",
-					 perf_event_cpu_online,
-					 perf_event_cpu_offline);
-	if (WARN_ON(rc < 0))
-		return;
-
-	cpuhp_slot = rc;
-	cpuhp_set_up = true;
-}
-
-void __exit perfmon_exit(void)
-{
-	if (cpuhp_set_up)
-		cpuhp_remove_multi_state(cpuhp_slot);
-}
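The online side of the removed code is the mirror image of the offline handler: the first CPU to come online is chosen as the designated reader, and later CPUs leave the mask alone. A stand-alone sketch of that selection (hypothetical; an int array stands in for perfmon_dsa_cpu_mask):

```c
#include <assert.h>

#define NR_CPUS     8
#define INVALID_CPU NR_CPUS

/* Hypothetical stand-in for perfmon_dsa_cpu_mask. */
static int reader_mask[NR_CPUS];

static int mask_empty(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (reader_mask[i])
			return 0;
	return 1;
}

/*
 * The online pattern: the first CPU to come online becomes the
 * designated reader. Returns the reader chosen for this call, or
 * INVALID_CPU if a reader already exists.
 */
static int on_cpu_online(int cpu)
{
	if (mask_empty()) {
		reader_mask[cpu] = 1;
		return cpu;
	}
	return INVALID_CPU;
}
```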
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (4 preceding siblings ...)
  2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-09  9:26   ` Dhananjay Ugwekar
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-08-02 15:16 ` [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug kan.liang
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Kan Liang, Dhananjay Ugwekar

From: Kan Liang <kan.liang@linux.intel.com>

The rapl PMU only needs to be allocated once. It makes no difference
whether the allocation happens in the CPU hotplug callback or in the
global init_rapl_pmus().

Move the PMU allocation into init_rapl_pmus(), so that the generic
hotplug support can be applied.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 43 +++++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..f8b6d504d03f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -568,19 +568,8 @@ static int rapl_cpu_online(unsigned int cpu)
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
-	if (!pmu) {
-		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
-		if (!pmu)
-			return -ENOMEM;
-
-		raw_spin_lock_init(&pmu->lock);
-		INIT_LIST_HEAD(&pmu->active_list);
-		pmu->pmu = &rapl_pmus->pmu;
-		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
-		rapl_hrtimer_init(pmu);
-
-		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
-	}
+	if (!pmu)
+		return -ENOMEM;
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
@@ -673,6 +662,32 @@ static const struct attribute_group *rapl_attr_update[] = {
 	NULL,
 };
 
+static void __init init_rapl_pmu(void)
+{
+	struct rapl_pmu *pmu;
+	int cpu;
+
+	cpus_read_lock();
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		pmu = cpu_to_rapl_pmu(cpu);
+		if (pmu)
+			continue;
+		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
+		if (!pmu)
+			continue;
+		raw_spin_lock_init(&pmu->lock);
+		INIT_LIST_HEAD(&pmu->active_list);
+		pmu->pmu = &rapl_pmus->pmu;
+		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+		rapl_hrtimer_init(pmu);
+
+		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+	}
+
+	cpus_read_unlock();
+}
+
 static int __init init_rapl_pmus(void)
 {
 	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
@@ -681,6 +696,8 @@ static int __init init_rapl_pmus(void)
 	if (!rapl_pmus)
 		return -ENOMEM;
 
+	init_rapl_pmu();
+
 	rapl_pmus->nr_rapl_pmu		= nr_rapl_pmu;
 	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
 	rapl_pmus->pmu.attr_update	= rapl_attr_update;
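The allocation loop added in init_rapl_pmu() above walks the online CPUs and allocates one rapl_pmu per die, skipping dies that already have one and tolerating allocation failure. A stand-alone sketch of that walk (hypothetical; cpu_to_die() is a toy two-CPUs-per-die topology standing in for topology_logical_die_id()):

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS  8
#define MAX_DIES 4

struct rapl_pmu_sketch { int die_id; };

static struct rapl_pmu_sketch *pmus[MAX_DIES];

/* Hypothetical topology helper: two CPUs per die. */
static int cpu_to_die(int cpu) { return cpu / 2; }

/*
 * The pattern from init_rapl_pmu(): walk the online CPUs and allocate
 * one PMU per die, skipping dies that already have one. Returns the
 * number of allocations performed.
 */
static int alloc_die_pmus(const int *online, int n)
{
	int allocated = 0;

	for (int cpu = 0; cpu < n; cpu++) {
		int die;

		if (!online[cpu])
			continue;
		die = cpu_to_die(cpu);
		if (pmus[die])
			continue;
		pmus[die] = calloc(1, sizeof(*pmus[die]));
		if (!pmus[die])
			continue;	/* as in the driver: skip on failure */
		pmus[die]->die_id = die;
		allocated++;
	}
	return allocated;
}
```

Because the loop only skips (never fails) on a missing die, a second pass over the same online mask is a no-op, which is what lets the driver call it once at init time under cpus_read_lock().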
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (5 preceding siblings ...)
  2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
@ 2024-08-02 15:16 ` kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-09-04 12:44 ` [PATCH 0/7] Generic hotplug support for a PMU with a scope Liang, Kan
  2024-09-06 15:12 ` Peter Zijlstra
  8 siblings, 1 reply; 35+ messages in thread
From: kan.liang @ 2024-08-02 15:16 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Kan Liang, Dhananjay Ugwekar

From: Kan Liang <kan.liang@linux.intel.com>

The rapl PMU has die scope, which is now supported by the generic
perf_event subsystem.

Set the scope for the rapl PMU and remove all the cpumask and hotplug
code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c     | 80 +-------------------------------------
 include/linux/cpuhotplug.h |  1 -
 2 files changed, 2 insertions(+), 79 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f8b6d504d03f..b70ad880c5bc 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -135,7 +135,6 @@ struct rapl_model {
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
 static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
 static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
@@ -340,8 +339,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
-	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
 	if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
 		return -EINVAL;
 
@@ -360,7 +357,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	pmu = cpu_to_rapl_pmu(event->cpu);
 	if (!pmu)
 		return -EINVAL;
-	event->cpu = pmu->cpu;
 	event->pmu_private = pmu;
 	event->hw.event_base = rapl_msrs[bit].msr;
 	event->hw.config = cfg;
@@ -374,23 +370,6 @@ static void rapl_pmu_event_read(struct perf_event *event)
 	rapl_event_update(event);
 }
 
-static ssize_t rapl_get_attr_cpumask(struct device *dev,
-				struct device_attribute *attr, char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
-}
-
-static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
-
-static struct attribute *rapl_pmu_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group rapl_pmu_attr_group = {
-	.attrs = rapl_pmu_attrs,
-};
-
 RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
 RAPL_EVENT_ATTR_STR(energy-pkg  ,   rapl_pkg, "event=0x02");
 RAPL_EVENT_ATTR_STR(energy-ram  ,   rapl_ram, "event=0x03");
@@ -438,7 +417,6 @@ static struct attribute_group rapl_pmu_format_group = {
 };
 
 static const struct attribute_group *rapl_attr_groups[] = {
-	&rapl_pmu_attr_group,
 	&rapl_pmu_format_group,
 	&rapl_pmu_events_group,
 	NULL,
@@ -541,49 +519,6 @@ static struct perf_msr amd_rapl_msrs[] = {
 	[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group,  NULL, false, 0 },
 };
 
-static int rapl_cpu_offline(unsigned int cpu)
-{
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-	int target;
-
-	/* Check if exiting cpu is used for collecting rapl events */
-	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
-		return 0;
-
-	pmu->cpu = -1;
-	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
-
-	/* Migrate rapl events to the new target */
-	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &rapl_cpu_mask);
-		pmu->cpu = target;
-		perf_pmu_migrate_context(pmu->pmu, cpu, target);
-	}
-	return 0;
-}
-
-static int rapl_cpu_online(unsigned int cpu)
-{
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-	int target;
-
-	if (!pmu)
-		return -ENOMEM;
-
-	/*
-	 * Check if there is an online cpu in the package which collects rapl
-	 * events already.
-	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
-	if (target < nr_cpu_ids)
-		return 0;
-
-	cpumask_set_cpu(cpu, &rapl_cpu_mask);
-	pmu->cpu = cpu;
-	return 0;
-}
-
 static int rapl_check_hw_unit(struct rapl_model *rm)
 {
 	u64 msr_rapl_power_unit_bits;
@@ -709,6 +644,7 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.stop		= rapl_pmu_event_stop;
 	rapl_pmus->pmu.read		= rapl_pmu_event_read;
 	rapl_pmus->pmu.module		= THIS_MODULE;
+	rapl_pmus->pmu.scope		= PERF_PMU_SCOPE_DIE;
 	rapl_pmus->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
 	return 0;
 }
@@ -856,24 +792,13 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Install callbacks. Core will call them for each online cpu.
-	 */
-	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
-				"perf/x86/rapl:online",
-				rapl_cpu_online, rapl_cpu_offline);
-	if (ret)
-		goto out;
-
 	ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
 	if (ret)
-		goto out1;
+		goto out;
 
 	rapl_advertise();
 	return 0;
 
-out1:
-	cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 out:
 	pr_warn("Initialization failed (%d), disabled\n", ret);
 	cleanup_rapl_pmus();
@@ -883,7 +808,6 @@ module_init(rapl_pmu_init);
 
 static void __exit intel_rapl_exit(void)
 {
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 	perf_pmu_unregister(&rapl_pmus->pmu);
 	cleanup_rapl_pmus();
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9ea6290ade56..f408521be568 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -207,7 +207,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_POWER_ONLINE,
-	CPUHP_AP_PERF_X86_RAPL_ONLINE,
 	CPUHP_AP_PERF_S390_CF_ONLINE,
 	CPUHP_AP_PERF_S390_SF_ONLINE,
 	CPUHP_AP_PERF_ARM_CCI_ONLINE,
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/7] dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
@ 2024-08-05 15:40   ` Dave Jiang
  2024-08-05 17:46   ` Fenghua Yu
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2 siblings, 0 replies; 35+ messages in thread
From: Dave Jiang @ 2024-08-05 15:40 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Fenghua Yu, Vinod Koul, dmaengine, Zanussi, Tom



On 8/2/24 8:16 AM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The idxd PMU is system-wide scope, which is supported by the generic
> perf_event subsystem now.
> 
> Set the scope for the idxd PMU and remove all the cpumask and hotplug
> codes.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Vinod Koul <vkoul@kernel.org>
> Cc: dmaengine@vger.kernel.org

Cc: Tom Zanussi <tom.zanussi@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/7] dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
  2024-08-05 15:40   ` Dave Jiang
@ 2024-08-05 17:46   ` Fenghua Yu
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2 siblings, 0 replies; 35+ messages in thread
From: Fenghua Yu @ 2024-08-05 17:46 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, namhyung, irogers, linux-kernel
  Cc: Dave Jiang, Vinod Koul, dmaengine



On 8/2/24 08:16, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The idxd PMU is system-wide scope, which is supported by the generic
> perf_event subsystem now.
> 
> Set the scope for the idxd PMU and remove all the cpumask and hotplug
> codes.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Vinod Koul <vkoul@kernel.org>
> Cc: dmaengine@vger.kernel.org

Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (6 preceding siblings ...)
  2024-08-02 15:16 ` [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug kan.liang
@ 2024-09-04 12:44 ` Liang, Kan
  2024-09-06 15:17   ` Peter Zijlstra
  2024-09-06 15:12 ` Peter Zijlstra
  8 siblings, 1 reply; 35+ messages in thread
From: Liang, Kan @ 2024-09-04 12:44 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel

Hi Peter,

Gentle ping.

The patch set provides generic hotplug support to facilitate various
specific PMU drivers. Could you please take a look?

Thanks,
Kan
On 2024-08-02 11:16 a.m., kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The perf subsystem assumes that the counters of a PMU are per-CPU. So
> the user space tool reads a counter from each CPU in the system wide
> mode. However, many PMUs don't have a per-CPU counter. The counter is
> effective for a scope, e.g., a die or a socket. To address this, a
> cpumask is exposed by the kernel driver to restrict to one CPU to stand
> for a specific scope. In case the given CPU is removed,
> the hotplug support has to be implemented for each such driver.
> 
> The code to support the cpumask and hotplug is very similar.
> - Expose a cpumask into sysfs
> - Pick up another CPU in the same scope if the given CPU is removed.
> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
> - In event init, always set the CPU in the cpumask to event->cpu
> - Usually, an event can be read from any CPU of the scope. (For now,
>   it is only supported by the pkg scope PMU, via
>   PERF_EV_CAP_READ_ACTIVE_PKG, e.g., cstate_pkg, rapl, etc)
> 
> Similar duplicated code is implemented for each such PMU driver. It
> would be good to introduce a generic infrastructure to avoid such
> duplication.
> 
> The patch series introduces five popular scopes: core, die, cluster,
> pkg, and system-wide. The PMU drivers for cstate, iommu, idxd and rapl
> are updated to apply the new infrastructure. The new infrastructure can
> also be applied to PMU drivers from other architectures, but I don't
> have such platforms, so that isn't done in this patch series. It can be
> added later separately.
> 
> The uncore driver isn't updated either, because a per-PMU cpumask has
> been required since commit c74443d92f68 ("perf/x86/uncore: Support per
> PMU cpumask"). Since different types of PMU share the same hotplug
> code, more factoring out and verification are expected. The cleanup of
> the uncore driver can be done later separately.
> 
> Kan Liang (7):
>   perf: Generic hotplug support for a PMU with a scope
>   perf: Add PERF_EV_CAP_READ_SCOPE
>   perf/x86/intel/cstate: Clean up cpumask and hotplug
>   iommu/vt-d: Clean up cpumask and hotplug
>   dmaengine: idxd: Clean up cpumask and hotplug
>   perf/x86/rapl: Move the pmu allocation out of CPU hotplug
>   perf/x86/rapl: Clean up cpumask and hotplug
> 
>  arch/x86/events/intel/cstate.c | 140 +-------------------------
>  arch/x86/events/rapl.c         | 119 ++++++----------------
>  drivers/dma/idxd/idxd.h        |   7 --
>  drivers/dma/idxd/init.c        |   3 -
>  drivers/dma/idxd/perfmon.c     |  98 +-----------------
>  drivers/iommu/intel/iommu.h    |   2 -
>  drivers/iommu/intel/perfmon.c  | 111 +--------------------
>  include/linux/cpuhotplug.h     |   3 -
>  include/linux/perf_event.h     |  21 ++++
>  kernel/events/core.c           | 176 ++++++++++++++++++++++++++++++++-
>  10 files changed, 229 insertions(+), 451 deletions(-)
> 


* Re: [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE
  2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
@ 2024-09-06 15:11   ` Peter Zijlstra
  2024-09-06 15:26     ` Liang, Kan
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2024-09-06 15:11 UTC (permalink / raw)
  To: kan.liang; +Cc: mingo, acme, namhyung, irogers, linux-kernel

On Fri, Aug 02, 2024 at 08:16:38AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Usually, an event can be read from any CPU of the scope. It doesn't need
> to be read from the advertised CPU.
> 
> Add a new event cap, PERF_EV_CAP_READ_SCOPE. An event of a PMU with
> scope can be read from any active CPU in the scope.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c       | 14 +++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1102d5c2be70..1206bc86eb4f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -633,10 +633,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
>   * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
>   * cannot be a group leader. If an event with this flag is detached from the
>   * group it is scheduled out and moved into an unrecoverable ERROR state.
> + * PERF_EV_CAP_READ_SCOPE: A CPU event that can be read from any CPU of the
> + * PMU scope where it is active.
>   */
>  #define PERF_EV_CAP_SOFTWARE		BIT(0)
>  #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
>  #define PERF_EV_CAP_SIBLING		BIT(2)
> +#define PERF_EV_CAP_READ_SCOPE		BIT(3)
>  
>  #define SWEVENT_HLIST_BITS		8
>  #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5e1877c4cb4c..c55294f34575 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4463,16 +4463,24 @@ struct perf_read_data {
>  	int ret;
>  };
>  
> +static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu);
> +
>  static int __perf_event_read_cpu(struct perf_event *event, int event_cpu)
>  {
> +	int local_cpu = smp_processor_id();
>  	u16 local_pkg, event_pkg;
>  
>  	if ((unsigned)event_cpu >= nr_cpu_ids)
>  		return event_cpu;
>  
> -	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
> -		int local_cpu = smp_processor_id();
> +	if (event->group_caps & PERF_EV_CAP_READ_SCOPE) {
> +		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(event->pmu->scope, event_cpu);
> +
> +		if (cpumask && cpumask_test_cpu(local_cpu, cpumask))
> +			return local_cpu;
> +	}
>  
> +	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {

I'm guessing the goal is to eventually remove this one, right?


* Re: [PATCH 0/7] Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
                   ` (7 preceding siblings ...)
  2024-09-04 12:44 ` [PATCH 0/7] Generic hotplug support for a PMU with a scope Liang, Kan
@ 2024-09-06 15:12 ` Peter Zijlstra
  2024-09-06 15:30   ` Liang, Kan
  8 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2024-09-06 15:12 UTC (permalink / raw)
  To: kan.liang; +Cc: mingo, acme, namhyung, irogers, linux-kernel

On Fri, Aug 02, 2024 at 08:16:36AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The perf subsystem assumes that the counters of a PMU are per-CPU. So
> the user space tool reads a counter from each CPU in the system wide
> mode. However, many PMUs don't have a per-CPU counter. The counter is
> effective for a scope, e.g., a die or a socket. To address this, a
> cpumask is exposed by the kernel driver to restrict to one CPU to stand
> for a specific scope. In case the given CPU is removed,
> the hotplug support has to be implemented for each such driver.
> 
> The code to support the cpumask and hotplug is very similar.
> - Expose a cpumask into sysfs
> - Pick up another CPU in the same scope if the given CPU is removed.
> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
> - In event init, always set the CPU in the cpumask to event->cpu
> - Usually, an event can be read from any CPU of the scope. (For now,
>   it is only supported by the pkg scope PMU, via
>   PERF_EV_CAP_READ_ACTIVE_PKG, e.g., cstate_pkg, rapl, etc)
> 
> Similar duplicated code is implemented for each such PMU driver. It
> would be good to introduce a generic infrastructure to avoid such
> duplication.
> 
> The patch series introduces five popular scopes: core, die, cluster,
> pkg, and system-wide. The PMU drivers for cstate, iommu, idxd and rapl
> are updated to apply the new infrastructure. The new infrastructure can
> also be applied to PMU drivers from other architectures, but I don't
> have such platforms, so that isn't done in this patch series. It can be
> added later separately.
> 
> The uncore driver isn't updated either, because a per-PMU cpumask has
> been required since commit c74443d92f68 ("perf/x86/uncore: Support per
> PMU cpumask"). Since different types of PMU share the same hotplug
> code, more factoring out and verification are expected. The cleanup of
> the uncore driver can be done later separately.
> 
> Kan Liang (7):
>   perf: Generic hotplug support for a PMU with a scope
>   perf: Add PERF_EV_CAP_READ_SCOPE
>   perf/x86/intel/cstate: Clean up cpumask and hotplug
>   iommu/vt-d: Clean up cpumask and hotplug
>   dmaengine: idxd: Clean up cpumask and hotplug
>   perf/x86/rapl: Move the pmu allocation out of CPU hotplug
>   perf/x86/rapl: Clean up cpumask and hotplug
> 
>  arch/x86/events/intel/cstate.c | 140 +-------------------------
>  arch/x86/events/rapl.c         | 119 ++++++----------------

Looks like we have another RAPL driver in:

  drivers/powercap/intel_rapl_common.c

that wants to be converted.


* Re: [PATCH 0/7] Generic hotplug support for a PMU with a scope
  2024-09-04 12:44 ` [PATCH 0/7] Generic hotplug support for a PMU with a scope Liang, Kan
@ 2024-09-06 15:17   ` Peter Zijlstra
  2024-09-06 15:30     ` Liang, Kan
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2024-09-06 15:17 UTC (permalink / raw)
  To: Liang, Kan; +Cc: mingo, acme, namhyung, irogers, linux-kernel

On Wed, Sep 04, 2024 at 08:44:47AM -0400, Liang, Kan wrote:
> Hi Peter,
> 
> Gentle ping.
> 
> The patch set provides generic hotplug support to facilitate various
> specific PMU drivers. Could you please take a look?

Seems nice, still applied, so I've queued it up for the robots to pass
judgement.


* Re: [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE
  2024-09-06 15:11   ` Peter Zijlstra
@ 2024-09-06 15:26     ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-06 15:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, namhyung, irogers, linux-kernel



On 2024-09-06 11:11 a.m., Peter Zijlstra wrote:
> On Fri, Aug 02, 2024 at 08:16:38AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Usually, an event can be read from any CPU of the scope. It doesn't need
>> to be read from the advertised CPU.
>>
>> Add a new event cap, PERF_EV_CAP_READ_SCOPE. An event of a PMU with
>> scope can be read from any active CPU in the scope.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>  include/linux/perf_event.h |  3 +++
>>  kernel/events/core.c       | 14 +++++++++++---
>>  2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 1102d5c2be70..1206bc86eb4f 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -633,10 +633,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
>>   * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
>>   * cannot be a group leader. If an event with this flag is detached from the
>>   * group it is scheduled out and moved into an unrecoverable ERROR state.
>> + * PERF_EV_CAP_READ_SCOPE: A CPU event that can be read from any CPU of the
>> + * PMU scope where it is active.
>>   */
>>  #define PERF_EV_CAP_SOFTWARE		BIT(0)
>>  #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
>>  #define PERF_EV_CAP_SIBLING		BIT(2)
>> +#define PERF_EV_CAP_READ_SCOPE		BIT(3)
>>  
>>  #define SWEVENT_HLIST_BITS		8
>>  #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 5e1877c4cb4c..c55294f34575 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -4463,16 +4463,24 @@ struct perf_read_data {
>>  	int ret;
>>  };
>>  
>> +static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu);
>> +
>>  static int __perf_event_read_cpu(struct perf_event *event, int event_cpu)
>>  {
>> +	int local_cpu = smp_processor_id();
>>  	u16 local_pkg, event_pkg;
>>  
>>  	if ((unsigned)event_cpu >= nr_cpu_ids)
>>  		return event_cpu;
>>  
>> -	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
>> -		int local_cpu = smp_processor_id();
>> +	if (event->group_caps & PERF_EV_CAP_READ_SCOPE) {
>> +		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(event->pmu->scope, event_cpu);
>> +
>> +		if (cpumask && cpumask_test_cpu(local_cpu, cpumask))
>> +			return local_cpu;
>> +	}
>>  
>> +	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
> 
> I'm guessing the goal is to eventually remove this one, right?

Yes, once the uncore driver is cleaned up, it should be possible to
remove PERF_EV_CAP_READ_ACTIVE_PKG.

Thanks,
Kan
> 


* Re: [PATCH 0/7] Generic hotplug support for a PMU with a scope
  2024-09-06 15:12 ` Peter Zijlstra
@ 2024-09-06 15:30   ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-06 15:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, namhyung, irogers, linux-kernel



On 2024-09-06 11:12 a.m., Peter Zijlstra wrote:
> On Fri, Aug 02, 2024 at 08:16:36AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The perf subsystem assumes that the counters of a PMU are per-CPU. So
>> the user space tool reads a counter from each CPU in the system wide
>> mode. However, many PMUs don't have a per-CPU counter. The counter is
>> effective for a scope, e.g., a die or a socket. To address this, a
>> cpumask is exposed by the kernel driver to restrict to one CPU to stand
>> for a specific scope. In case the given CPU is removed,
>> the hotplug support has to be implemented for each such driver.
>>
>> The code to support the cpumask and hotplug is very similar.
>> - Expose a cpumask into sysfs
>> - Pick up another CPU in the same scope if the given CPU is removed.
>> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
>> - In event init, always set the CPU in the cpumask to event->cpu
>> - Usually, an event can be read from any CPU of the scope. (For now,
>>   it is only supported by the pkg scope PMU, via
>>   PERF_EV_CAP_READ_ACTIVE_PKG, e.g., cstate_pkg, rapl, etc)
>>
>> Similar duplicated code is implemented for each such PMU driver. It
>> would be good to introduce a generic infrastructure to avoid such
>> duplication.
>>
>> The patch series introduces five popular scopes: core, die, cluster,
>> pkg, and system-wide. The PMU drivers for cstate, iommu, idxd and rapl
>> are updated to apply the new infrastructure. The new infrastructure can
>> also be applied to PMU drivers from other architectures, but I don't
>> have such platforms, so that isn't done in this patch series. It can be
>> added later separately.
>>
>> The uncore driver isn't updated either, because a per-PMU cpumask has
>> been required since commit c74443d92f68 ("perf/x86/uncore: Support per
>> PMU cpumask"). Since different types of PMU share the same hotplug
>> code, more factoring out and verification are expected. The cleanup of
>> the uncore driver can be done later separately.
>>
>> Kan Liang (7):
>>   perf: Generic hotplug support for a PMU with a scope
>>   perf: Add PERF_EV_CAP_READ_SCOPE
>>   perf/x86/intel/cstate: Clean up cpumask and hotplug
>>   iommu/vt-d: Clean up cpumask and hotplug
>>   dmaengine: idxd: Clean up cpumask and hotplug
>>   perf/x86/rapl: Move the pmu allocation out of CPU hotplug
>>   perf/x86/rapl: Clean up cpumask and hotplug
>>
>>  arch/x86/events/intel/cstate.c | 140 +-------------------------
>>  arch/x86/events/rapl.c         | 119 ++++++----------------
> 
> Looks like we have another RAPL driver in:
> 
>   drivers/powercap/intel_rapl_common.c
> 
> that wants to be converted.

Right, but I need to talk with the power guys first. I have a vague
impression that some of the counters are not exactly PKG scope.

Thanks,
Kan


* Re: [PATCH 0/7] Generic hotplug support for a PMU with a scope
  2024-09-06 15:17   ` Peter Zijlstra
@ 2024-09-06 15:30     ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-06 15:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, namhyung, irogers, linux-kernel



On 2024-09-06 11:17 a.m., Peter Zijlstra wrote:
> On Wed, Sep 04, 2024 at 08:44:47AM -0400, Liang, Kan wrote:
>> Hi Peter,
>>
>> Gentle ping.
>>
>> The patch set provides generic hotplug support to facilitate various
>> specific PMU drivers. Could you please take a look?
> 
> Seems nice, still applied, so I've queued it up for the robots to pass
> judgement.

Thanks!

Kan


* Re: [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
@ 2024-09-09  9:26   ` Dhananjay Ugwekar
  2024-09-09 13:02     ` Liang, Kan
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  1 sibling, 1 reply; 35+ messages in thread
From: Dhananjay Ugwekar @ 2024-09-09  9:26 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, namhyung, irogers, linux-kernel

Hello Kan,

On 8/2/2024 8:46 PM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The rapl pmu just needs to be allocated once; it doesn't matter whether
> that happens at each CPU hotplug or in the global init_rapl_pmus().
> 
> Move the pmu allocation to init_rapl_pmus(), so that the generic hotplug
> support can be applied.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> ---
>  arch/x86/events/rapl.c | 43 +++++++++++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b985ca79cf97..f8b6d504d03f 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -568,19 +568,8 @@ static int rapl_cpu_online(unsigned int cpu)
>  	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>  	int target;
>  
> -	if (!pmu) {
> -		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
> -		if (!pmu)
> -			return -ENOMEM;
> -
> -		raw_spin_lock_init(&pmu->lock);
> -		INIT_LIST_HEAD(&pmu->active_list);
> -		pmu->pmu = &rapl_pmus->pmu;
> -		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> -		rapl_hrtimer_init(pmu);
> -
> -		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> -	}
> +	if (!pmu)
> +		return -ENOMEM;
>  
>  	/*
>  	 * Check if there is an online cpu in the package which collects rapl
> @@ -673,6 +662,32 @@ static const struct attribute_group *rapl_attr_update[] = {
>  	NULL,
>  };
>  
> +static void __init init_rapl_pmu(void)
> +{
> +	struct rapl_pmu *pmu;
> +	int cpu;
> +
> +	cpus_read_lock();
> +
> +	for_each_cpu(cpu, cpu_online_mask) {
> +		pmu = cpu_to_rapl_pmu(cpu);
> +		if (pmu)
> +			continue;
> +		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
> +		if (!pmu)
> +			continue;
> +		raw_spin_lock_init(&pmu->lock);
> +		INIT_LIST_HEAD(&pmu->active_list);
> +		pmu->pmu = &rapl_pmus->pmu;
> +		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> +		rapl_hrtimer_init(pmu);
> +
> +		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> +	}
> +
> +	cpus_read_unlock();
> +}
> +
>  static int __init init_rapl_pmus(void)
>  {
>  	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
> @@ -681,6 +696,8 @@ static int __init init_rapl_pmus(void)
>  	if (!rapl_pmus)
>  		return -ENOMEM;
>  
> +	init_rapl_pmu();
> +
>  	rapl_pmus->nr_rapl_pmu		= nr_rapl_pmu;

I feel there is one potential bug here: first we call init_rapl_pmu() --> cpu_to_rapl_pmu(cpu) --> "return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;",
and only then do we update "rapl_pmus->nr_rapl_pmu = nr_rapl_pmu;". This makes the bounds check in cpu_to_rapl_pmu() ineffective, as rapl_pmus->nr_rapl_pmu is still (falsely) zero at that point.

Please let me know if I'm missing something.

Thanks,
Dhananjay

>  	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
>  	rapl_pmus->pmu.attr_update	= rapl_attr_update;


* Re: [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-09-09  9:26   ` Dhananjay Ugwekar
@ 2024-09-09 13:02     ` Liang, Kan
  2024-09-09 13:24       ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Liang, Kan @ 2024-09-09 13:02 UTC (permalink / raw)
  To: Dhananjay Ugwekar, peterz, mingo, acme, namhyung, irogers,
	linux-kernel

Hi Dhananjay,

On 2024-09-09 5:26 a.m., Dhananjay Ugwekar wrote:
> Hello Kan,
> 
> On 8/2/2024 8:46 PM, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The rapl pmu just needs to be allocated once; it doesn't matter whether
>> that happens at each CPU hotplug or in the global init_rapl_pmus().
>>
>> Move the pmu allocation to init_rapl_pmus(), so that the generic hotplug
>> support can be applied.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> ---
>>  arch/x86/events/rapl.c | 43 +++++++++++++++++++++++++++++-------------
>>  1 file changed, 30 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index b985ca79cf97..f8b6d504d03f 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -568,19 +568,8 @@ static int rapl_cpu_online(unsigned int cpu)
>>  	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>>  	int target;
>>  
>> -	if (!pmu) {
>> -		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
>> -		if (!pmu)
>> -			return -ENOMEM;
>> -
>> -		raw_spin_lock_init(&pmu->lock);
>> -		INIT_LIST_HEAD(&pmu->active_list);
>> -		pmu->pmu = &rapl_pmus->pmu;
>> -		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>> -		rapl_hrtimer_init(pmu);
>> -
>> -		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
>> -	}
>> +	if (!pmu)
>> +		return -ENOMEM;
>>  
>>  	/*
>>  	 * Check if there is an online cpu in the package which collects rapl
>> @@ -673,6 +662,32 @@ static const struct attribute_group *rapl_attr_update[] = {
>>  	NULL,
>>  };
>>  
>> +static void __init init_rapl_pmu(void)
>> +{
>> +	struct rapl_pmu *pmu;
>> +	int cpu;
>> +
>> +	cpus_read_lock();
>> +
>> +	for_each_cpu(cpu, cpu_online_mask) {
>> +		pmu = cpu_to_rapl_pmu(cpu);
>> +		if (pmu)
>> +			continue;
>> +		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
>> +		if (!pmu)
>> +			continue;
>> +		raw_spin_lock_init(&pmu->lock);
>> +		INIT_LIST_HEAD(&pmu->active_list);
>> +		pmu->pmu = &rapl_pmus->pmu;
>> +		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>> +		rapl_hrtimer_init(pmu);
>> +
>> +		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
>> +	}
>> +
>> +	cpus_read_unlock();
>> +}
>> +
>>  static int __init init_rapl_pmus(void)
>>  {
>>  	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>> @@ -681,6 +696,8 @@ static int __init init_rapl_pmus(void)
>>  	if (!rapl_pmus)
>>  		return -ENOMEM;
>>  
>> +	init_rapl_pmu();
>> +
>>  	rapl_pmus->nr_rapl_pmu		= nr_rapl_pmu;
> 
> I feel there is one potential bug here: first we call init_rapl_pmu() --> cpu_to_rapl_pmu(cpu) --> "return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;",
> and only then do we update "rapl_pmus->nr_rapl_pmu = nr_rapl_pmu;". This makes the bounds check in cpu_to_rapl_pmu() ineffective, as rapl_pmus->nr_rapl_pmu is still (falsely) zero at that point.
> 

Ah, right. A pmu will be allocated for each CPU rather than each socket.
A user wouldn't see a difference, but it wastes memory and may cause a
memory leak.

I think we should move the init_rapl_pmu() call to the end of the function.

The patch set has been merged into Peter's perf/core branch. Do you want
to post a fix patch to address the issue?

Thanks,
Kan

> Please let me know if I'm missing something.
> 
> Thanks,
> Dhananjay
> 
>>  	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
>>  	rapl_pmus->pmu.attr_update	= rapl_attr_update;
> 


* Re: [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-09-09 13:02     ` Liang, Kan
@ 2024-09-09 13:24       ` Peter Zijlstra
  2024-09-09 17:11         ` Liang, Kan
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2024-09-09 13:24 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Dhananjay Ugwekar, mingo, acme, namhyung, irogers, linux-kernel

On Mon, Sep 09, 2024 at 09:02:56AM -0400, Liang, Kan wrote:

> The patch set has been merged into Peter's perf/core branch. Do you want
> to post a fix patch to address the issue?

I've not yet pushed out to tip, so I can readily rebase. Send me a delta
and indicate what patch it should go into and I'll make it happen.


* Re: [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-09-09 13:24       ` Peter Zijlstra
@ 2024-09-09 17:11         ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-09 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dhananjay Ugwekar, mingo, acme, namhyung, irogers, linux-kernel



On 2024-09-09 9:24 a.m., Peter Zijlstra wrote:
> On Mon, Sep 09, 2024 at 09:02:56AM -0400, Liang, Kan wrote:
> 
>> The patch set has been merged into Peter's perf/core branch. Do you want
>> to post a fix patch to address the issue?
> 
> I've not yet pushed out to tip, so I can readily rebase. Send me a delta
> and indicate what patch it should go into and I'll make it happen.
> 

Thanks Peter. Please fold the below patch into commit 90942140bb6c
("perf/x86/rapl: Move the pmu allocation out of CPU hotplug").

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b70ad880c5bc..1b38f8771488 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -631,8 +632,6 @@ static int __init init_rapl_pmus(void)
 	if (!rapl_pmus)
 		return -ENOMEM;

-	init_rapl_pmu();
-
 	rapl_pmus->nr_rapl_pmu		= nr_rapl_pmu;
 	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
 	rapl_pmus->pmu.attr_update	= rapl_attr_update;
@@ -646,6 +645,9 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.module		= THIS_MODULE;
 	rapl_pmus->pmu.scope		= PERF_PMU_SCOPE_DIE;
 	rapl_pmus->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+
+	init_rapl_pmu();
+
 	return 0;
 }

Thanks,
Kan


* [tip: perf/core] perf/x86/rapl: Clean up cpumask and hotplug
  2024-08-02 15:16 ` [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug kan.liang
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     fa90a7dad75b47302f6d35f6c6a36d0c87450187
Gitweb:        https://git.kernel.org/tip/fa90a7dad75b47302f6d35f6c6a36d0c87450187
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:43 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:14 +02:00

perf/x86/rapl: Clean up cpumask and hotplug

The rapl pmu is die scope, which is supported by the generic perf_event
subsystem now.

Set the scope for the rapl PMU and remove all the cpumask and hotplug
codes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-8-kan.liang@linux.intel.com
---
 arch/x86/events/rapl.c     | 80 +-------------------------------------
 include/linux/cpuhotplug.h |  1 +-
 2 files changed, 2 insertions(+), 79 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index d12f3a6..0f8f4eb 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -135,7 +135,6 @@ struct rapl_model {
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
 static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
 static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
@@ -340,8 +339,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
-	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
 	if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
 		return -EINVAL;
 
@@ -360,7 +357,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	pmu = cpu_to_rapl_pmu(event->cpu);
 	if (!pmu)
 		return -EINVAL;
-	event->cpu = pmu->cpu;
 	event->pmu_private = pmu;
 	event->hw.event_base = rapl_msrs[bit].msr;
 	event->hw.config = cfg;
@@ -374,23 +370,6 @@ static void rapl_pmu_event_read(struct perf_event *event)
 	rapl_event_update(event);
 }
 
-static ssize_t rapl_get_attr_cpumask(struct device *dev,
-				struct device_attribute *attr, char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
-}
-
-static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
-
-static struct attribute *rapl_pmu_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group rapl_pmu_attr_group = {
-	.attrs = rapl_pmu_attrs,
-};
-
 RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
 RAPL_EVENT_ATTR_STR(energy-pkg  ,   rapl_pkg, "event=0x02");
 RAPL_EVENT_ATTR_STR(energy-ram  ,   rapl_ram, "event=0x03");
@@ -438,7 +417,6 @@ static struct attribute_group rapl_pmu_format_group = {
 };
 
 static const struct attribute_group *rapl_attr_groups[] = {
-	&rapl_pmu_attr_group,
 	&rapl_pmu_format_group,
 	&rapl_pmu_events_group,
 	NULL,
@@ -541,49 +519,6 @@ static struct perf_msr amd_rapl_msrs[] = {
 	[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group,  NULL, false, 0 },
 };
 
-static int rapl_cpu_offline(unsigned int cpu)
-{
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-	int target;
-
-	/* Check if exiting cpu is used for collecting rapl events */
-	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
-		return 0;
-
-	pmu->cpu = -1;
-	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
-
-	/* Migrate rapl events to the new target */
-	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &rapl_cpu_mask);
-		pmu->cpu = target;
-		perf_pmu_migrate_context(pmu->pmu, cpu, target);
-	}
-	return 0;
-}
-
-static int rapl_cpu_online(unsigned int cpu)
-{
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-	int target;
-
-	if (!pmu)
-		return -ENOMEM;
-
-	/*
-	 * Check if there is an online cpu in the package which collects rapl
-	 * events already.
-	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
-	if (target < nr_cpu_ids)
-		return 0;
-
-	cpumask_set_cpu(cpu, &rapl_cpu_mask);
-	pmu->cpu = cpu;
-	return 0;
-}
-
 static int rapl_check_hw_unit(struct rapl_model *rm)
 {
 	u64 msr_rapl_power_unit_bits;
@@ -707,6 +642,7 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.stop		= rapl_pmu_event_stop;
 	rapl_pmus->pmu.read		= rapl_pmu_event_read;
 	rapl_pmus->pmu.module		= THIS_MODULE;
+	rapl_pmus->pmu.scope		= PERF_PMU_SCOPE_DIE;
 	rapl_pmus->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
 
 	init_rapl_pmu();
@@ -857,24 +793,13 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Install callbacks. Core will call them for each online cpu.
-	 */
-	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
-				"perf/x86/rapl:online",
-				rapl_cpu_online, rapl_cpu_offline);
-	if (ret)
-		goto out;
-
 	ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
 	if (ret)
-		goto out1;
+		goto out;
 
 	rapl_advertise();
 	return 0;
 
-out1:
-	cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 out:
 	pr_warn("Initialization failed (%d), disabled\n", ret);
 	cleanup_rapl_pmus();
@@ -884,7 +809,6 @@ module_init(rapl_pmu_init);
 
 static void __exit intel_rapl_exit(void)
 {
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 	perf_pmu_unregister(&rapl_pmus->pmu);
 	cleanup_rapl_pmus();
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 2101ae2..801053c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -207,7 +207,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_POWER_ONLINE,
-	CPUHP_AP_PERF_X86_RAPL_ONLINE,
 	CPUHP_AP_PERF_S390_CF_ONLINE,
 	CPUHP_AP_PERF_S390_SF_ONLINE,
 	CPUHP_AP_PERF_ARM_CCI_ONLINE,

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
  2024-09-09  9:26   ` Dhananjay Ugwekar
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     351e6ba39e5c851b00d83716ffb4d19b807ecc3d
Gitweb:        https://git.kernel.org/tip/351e6ba39e5c851b00d83716ffb4d19b807ecc3d
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:42 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:14 +02:00

perf/x86/rapl: Move the pmu allocation out of CPU hotplug

The rapl PMU only needs to be allocated once; it makes no difference
whether that happens in each CPU hotplug callback or in the global
init_rapl_pmus().

Move the PMU allocation into init_rapl_pmus(), so that the generic
hotplug support can be applied.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-7-kan.liang@linux.intel.com
---
 arch/x86/events/rapl.c | 44 ++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca7..d12f3a6 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -568,19 +568,8 @@ static int rapl_cpu_online(unsigned int cpu)
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
-	if (!pmu) {
-		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
-		if (!pmu)
-			return -ENOMEM;
-
-		raw_spin_lock_init(&pmu->lock);
-		INIT_LIST_HEAD(&pmu->active_list);
-		pmu->pmu = &rapl_pmus->pmu;
-		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
-		rapl_hrtimer_init(pmu);
-
-		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
-	}
+	if (!pmu)
+		return -ENOMEM;
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
@@ -673,6 +662,32 @@ static const struct attribute_group *rapl_attr_update[] = {
 	NULL,
 };
 
+static void __init init_rapl_pmu(void)
+{
+	struct rapl_pmu *pmu;
+	int cpu;
+
+	cpus_read_lock();
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		pmu = cpu_to_rapl_pmu(cpu);
+		if (pmu)
+			continue;
+		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
+		if (!pmu)
+			continue;
+		raw_spin_lock_init(&pmu->lock);
+		INIT_LIST_HEAD(&pmu->active_list);
+		pmu->pmu = &rapl_pmus->pmu;
+		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+		rapl_hrtimer_init(pmu);
+
+		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+	}
+
+	cpus_read_unlock();
+}
+
 static int __init init_rapl_pmus(void)
 {
 	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
@@ -693,6 +708,9 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.read		= rapl_pmu_event_read;
 	rapl_pmus->pmu.module		= THIS_MODULE;
 	rapl_pmus->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+
+	init_rapl_pmu();
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] dmaengine: idxd: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
  2024-08-05 15:40   ` Dave Jiang
  2024-08-05 17:46   ` Fenghua Yu
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  2 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Peter Zijlstra (Intel), Dave Jiang, Fenghua Yu, x86,
	linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     bbdd4df35c31fe502c1c70e4d3a8e3ebe5a270b7
Gitweb:        https://git.kernel.org/tip/bbdd4df35c31fe502c1c70e4d3a8e3ebe5a270b7
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:41 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:13 +02:00

dmaengine: idxd: Clean up cpumask and hotplug for perfmon

The idxd PMU has system-wide scope, which is now supported by the
generic perf_event subsystem.

Set the scope for the idxd PMU and remove all the cpumask and hotplug
code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240802151643.1691631-6-kan.liang@linux.intel.com
---
 drivers/dma/idxd/idxd.h    |  7 +---
 drivers/dma/idxd/init.c    |  3 +-
 drivers/dma/idxd/perfmon.c | 98 +-------------------------------------
 3 files changed, 1 insertion(+), 107 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 868b724..d84e21d 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -124,7 +124,6 @@ struct idxd_pmu {
 
 	struct pmu pmu;
 	char name[IDXD_NAME_SIZE];
-	int cpu;
 
 	int n_counters;
 	int counter_width;
@@ -135,8 +134,6 @@ struct idxd_pmu {
 
 	unsigned long supported_filters;
 	int n_filters;
-
-	struct hlist_node cpuhp_node;
 };
 
 #define IDXD_MAX_PRIORITY	0xf
@@ -803,14 +800,10 @@ void idxd_user_counter_increment(struct idxd_wq *wq, u32 pasid, int index);
 int perfmon_pmu_init(struct idxd_device *idxd);
 void perfmon_pmu_remove(struct idxd_device *idxd);
 void perfmon_counter_overflow(struct idxd_device *idxd);
-void perfmon_init(void);
-void perfmon_exit(void);
 #else
 static inline int perfmon_pmu_init(struct idxd_device *idxd) { return 0; }
 static inline void perfmon_pmu_remove(struct idxd_device *idxd) {}
 static inline void perfmon_counter_overflow(struct idxd_device *idxd) {}
-static inline void perfmon_init(void) {}
-static inline void perfmon_exit(void) {}
 #endif
 
 /* debugfs */
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 21f6905..5725ea8 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -878,8 +878,6 @@ static int __init idxd_init_module(void)
 	else
 		support_enqcmd = true;
 
-	perfmon_init();
-
 	err = idxd_driver_register(&idxd_drv);
 	if (err < 0)
 		goto err_idxd_driver_register;
@@ -928,7 +926,6 @@ static void __exit idxd_exit_module(void)
 	idxd_driver_unregister(&idxd_drv);
 	pci_unregister_driver(&idxd_pci_driver);
 	idxd_cdev_remove();
-	perfmon_exit();
 	idxd_remove_debugfs();
 }
 module_exit(idxd_exit_module);
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index 5e94247..f511cf1 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -6,29 +6,6 @@
 #include "idxd.h"
 #include "perfmon.h"
 
-static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
-			    char *buf);
-
-static cpumask_t		perfmon_dsa_cpu_mask;
-static bool			cpuhp_set_up;
-static enum cpuhp_state		cpuhp_slot;
-
-/*
- * perf userspace reads this attribute to determine which cpus to open
- * counters on.  It's connected to perfmon_dsa_cpu_mask, which is
- * maintained by the cpu hotplug handlers.
- */
-static DEVICE_ATTR_RO(cpumask);
-
-static struct attribute *perfmon_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group cpumask_attr_group = {
-	.attrs = perfmon_cpumask_attrs,
-};
-
 /*
  * These attributes specify the bits in the config word that the perf
  * syscall uses to pass the event ids and categories to perfmon.
@@ -67,16 +44,9 @@ static struct attribute_group perfmon_format_attr_group = {
 
 static const struct attribute_group *perfmon_attr_groups[] = {
 	&perfmon_format_attr_group,
-	&cpumask_attr_group,
 	NULL,
 };
 
-static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
-			    char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &perfmon_dsa_cpu_mask);
-}
-
 static bool is_idxd_event(struct idxd_pmu *idxd_pmu, struct perf_event *event)
 {
 	return &idxd_pmu->pmu == event->pmu;
@@ -217,7 +187,6 @@ static int perfmon_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 
 	event->hw.event_base = ioread64(PERFMON_TABLE_OFFSET(idxd));
-	event->cpu = idxd->idxd_pmu->cpu;
 	event->hw.config = event->attr.config;
 
 	if (event->group_leader != event)
@@ -488,6 +457,7 @@ static void idxd_pmu_init(struct idxd_pmu *idxd_pmu)
 	idxd_pmu->pmu.stop		= perfmon_pmu_event_stop;
 	idxd_pmu->pmu.read		= perfmon_pmu_event_update;
 	idxd_pmu->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+	idxd_pmu->pmu.scope		= PERF_PMU_SCOPE_SYS_WIDE;
 	idxd_pmu->pmu.module		= THIS_MODULE;
 }
 
@@ -496,47 +466,11 @@ void perfmon_pmu_remove(struct idxd_device *idxd)
 	if (!idxd->idxd_pmu)
 		return;
 
-	cpuhp_state_remove_instance(cpuhp_slot, &idxd->idxd_pmu->cpuhp_node);
 	perf_pmu_unregister(&idxd->idxd_pmu->pmu);
 	kfree(idxd->idxd_pmu);
 	idxd->idxd_pmu = NULL;
 }
 
-static int perf_event_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-	struct idxd_pmu *idxd_pmu;
-
-	idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
-
-	/* select the first online CPU as the designated reader */
-	if (cpumask_empty(&perfmon_dsa_cpu_mask)) {
-		cpumask_set_cpu(cpu, &perfmon_dsa_cpu_mask);
-		idxd_pmu->cpu = cpu;
-	}
-
-	return 0;
-}
-
-static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
-	struct idxd_pmu *idxd_pmu;
-	unsigned int target;
-
-	idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
-
-	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
-		return 0;
-
-	target = cpumask_any_but(cpu_online_mask, cpu);
-	/* migrate events if there is a valid target */
-	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &perfmon_dsa_cpu_mask);
-		perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
-	}
-
-	return 0;
-}
-
 int perfmon_pmu_init(struct idxd_device *idxd)
 {
 	union idxd_perfcap perfcap;
@@ -544,12 +478,6 @@ int perfmon_pmu_init(struct idxd_device *idxd)
 	int rc = -ENODEV;
 
 	/*
-	 * perfmon module initialization failed, nothing to do
-	 */
-	if (!cpuhp_set_up)
-		return -ENODEV;
-
-	/*
 	 * If perfmon_offset or num_counters is 0, it means perfmon is
 	 * not supported on this hardware.
 	 */
@@ -624,11 +552,6 @@ int perfmon_pmu_init(struct idxd_device *idxd)
 	if (rc)
 		goto free;
 
-	rc = cpuhp_state_add_instance(cpuhp_slot, &idxd_pmu->cpuhp_node);
-	if (rc) {
-		perf_pmu_unregister(&idxd->idxd_pmu->pmu);
-		goto free;
-	}
 out:
 	return rc;
 free:
@@ -637,22 +560,3 @@ free:
 
 	goto out;
 }
-
-void __init perfmon_init(void)
-{
-	int rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
-					 "driver/dma/idxd/perf:online",
-					 perf_event_cpu_online,
-					 perf_event_cpu_offline);
-	if (WARN_ON(rc < 0))
-		return;
-
-	cpuhp_slot = rc;
-	cpuhp_set_up = true;
-}
-
-void __exit perfmon_exit(void)
-{
-	if (cpuhp_set_up)
-		cpuhp_remove_multi_state(cpuhp_slot);
-}

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] iommu/vt-d: Clean up cpumask and hotplug for perfmon
  2024-08-02 15:16 ` [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon kan.liang
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Peter Zijlstra (Intel), Lu Baolu, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a8c73b82f7792400555143773d0152d5bc3ac068
Gitweb:        https://git.kernel.org/tip/a8c73b82f7792400555143773d0152d5bc3ac068
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:40 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:13 +02:00

iommu/vt-d: Clean up cpumask and hotplug for perfmon

The iommu PMU has system-wide scope, which is now supported by the
generic perf_event subsystem.

Set the scope for the iommu PMU and remove all the cpumask and hotplug
code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20240802151643.1691631-5-kan.liang@linux.intel.com
---
 drivers/iommu/intel/iommu.h   |   2 +-
 drivers/iommu/intel/perfmon.c | 111 +---------------------------------
 2 files changed, 2 insertions(+), 111 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b67c14d..bd2c5a4 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -687,8 +687,6 @@ struct iommu_pmu {
 	DECLARE_BITMAP(used_mask, IOMMU_PMU_IDX_MAX);
 	struct perf_event	*event_list[IOMMU_PMU_IDX_MAX];
 	unsigned char		irq_name[16];
-	struct hlist_node	cpuhp_node;
-	int			cpu;
 };
 
 #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
diff --git a/drivers/iommu/intel/perfmon.c b/drivers/iommu/intel/perfmon.c
index 44083d0..75f493b 100644
--- a/drivers/iommu/intel/perfmon.c
+++ b/drivers/iommu/intel/perfmon.c
@@ -34,28 +34,9 @@ static struct attribute_group iommu_pmu_events_attr_group = {
 	.attrs = attrs_empty,
 };
 
-static cpumask_t iommu_pmu_cpu_mask;
-
-static ssize_t
-cpumask_show(struct device *dev, struct device_attribute *attr, char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &iommu_pmu_cpu_mask);
-}
-static DEVICE_ATTR_RO(cpumask);
-
-static struct attribute *iommu_pmu_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL
-};
-
-static struct attribute_group iommu_pmu_cpumask_attr_group = {
-	.attrs = iommu_pmu_cpumask_attrs,
-};
-
 static const struct attribute_group *iommu_pmu_attr_groups[] = {
 	&iommu_pmu_format_attr_group,
 	&iommu_pmu_events_attr_group,
-	&iommu_pmu_cpumask_attr_group,
 	NULL
 };
 
@@ -565,6 +546,7 @@ static int __iommu_pmu_register(struct intel_iommu *iommu)
 	iommu_pmu->pmu.attr_groups	= iommu_pmu_attr_groups;
 	iommu_pmu->pmu.attr_update	= iommu_pmu_attr_update;
 	iommu_pmu->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+	iommu_pmu->pmu.scope		= PERF_PMU_SCOPE_SYS_WIDE;
 	iommu_pmu->pmu.module		= THIS_MODULE;
 
 	return perf_pmu_register(&iommu_pmu->pmu, iommu_pmu->pmu.name, -1);
@@ -773,89 +755,6 @@ static void iommu_pmu_unset_interrupt(struct intel_iommu *iommu)
 	iommu->perf_irq = 0;
 }
 
-static int iommu_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
-
-	if (cpumask_empty(&iommu_pmu_cpu_mask))
-		cpumask_set_cpu(cpu, &iommu_pmu_cpu_mask);
-
-	if (cpumask_test_cpu(cpu, &iommu_pmu_cpu_mask))
-		iommu_pmu->cpu = cpu;
-
-	return 0;
-}
-
-static int iommu_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
-	struct iommu_pmu *iommu_pmu = hlist_entry_safe(node, typeof(*iommu_pmu), cpuhp_node);
-	int target = cpumask_first(&iommu_pmu_cpu_mask);
-
-	/*
-	 * The iommu_pmu_cpu_mask has been updated when offline the CPU
-	 * for the first iommu_pmu. Migrate the other iommu_pmu to the
-	 * new target.
-	 */
-	if (target < nr_cpu_ids && target != iommu_pmu->cpu) {
-		perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
-		iommu_pmu->cpu = target;
-		return 0;
-	}
-
-	if (!cpumask_test_and_clear_cpu(cpu, &iommu_pmu_cpu_mask))
-		return 0;
-
-	target = cpumask_any_but(cpu_online_mask, cpu);
-
-	if (target < nr_cpu_ids)
-		cpumask_set_cpu(target, &iommu_pmu_cpu_mask);
-	else
-		return 0;
-
-	perf_pmu_migrate_context(&iommu_pmu->pmu, cpu, target);
-	iommu_pmu->cpu = target;
-
-	return 0;
-}
-
-static int nr_iommu_pmu;
-static enum cpuhp_state iommu_cpuhp_slot;
-
-static int iommu_pmu_cpuhp_setup(struct iommu_pmu *iommu_pmu)
-{
-	int ret;
-
-	if (!nr_iommu_pmu) {
-		ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
-					      "driver/iommu/intel/perfmon:online",
-					      iommu_pmu_cpu_online,
-					      iommu_pmu_cpu_offline);
-		if (ret < 0)
-			return ret;
-		iommu_cpuhp_slot = ret;
-	}
-
-	ret = cpuhp_state_add_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
-	if (ret) {
-		if (!nr_iommu_pmu)
-			cpuhp_remove_multi_state(iommu_cpuhp_slot);
-		return ret;
-	}
-	nr_iommu_pmu++;
-
-	return 0;
-}
-
-static void iommu_pmu_cpuhp_free(struct iommu_pmu *iommu_pmu)
-{
-	cpuhp_state_remove_instance(iommu_cpuhp_slot, &iommu_pmu->cpuhp_node);
-
-	if (--nr_iommu_pmu)
-		return;
-
-	cpuhp_remove_multi_state(iommu_cpuhp_slot);
-}
-
 void iommu_pmu_register(struct intel_iommu *iommu)
 {
 	struct iommu_pmu *iommu_pmu = iommu->pmu;
@@ -866,17 +765,12 @@ void iommu_pmu_register(struct intel_iommu *iommu)
 	if (__iommu_pmu_register(iommu))
 		goto err;
 
-	if (iommu_pmu_cpuhp_setup(iommu_pmu))
-		goto unregister;
-
 	/* Set interrupt for overflow */
 	if (iommu_pmu_set_interrupt(iommu))
-		goto cpuhp_free;
+		goto unregister;
 
 	return;
 
-cpuhp_free:
-	iommu_pmu_cpuhp_free(iommu_pmu);
 unregister:
 	perf_pmu_unregister(&iommu_pmu->pmu);
 err:
@@ -892,6 +786,5 @@ void iommu_pmu_unregister(struct intel_iommu *iommu)
 		return;
 
 	iommu_pmu_unset_interrupt(iommu);
-	iommu_pmu_cpuhp_free(iommu_pmu);
 	perf_pmu_unregister(&iommu_pmu->pmu);
 }

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] perf: Add PERF_EV_CAP_READ_SCOPE
  2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
  2024-09-06 15:11   ` Peter Zijlstra
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a48a36b316ae5d3ab83f9b545dba15998e96d59c
Gitweb:        https://git.kernel.org/tip/a48a36b316ae5d3ab83f9b545dba15998e96d59c
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:38 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:13 +02:00

perf: Add PERF_EV_CAP_READ_SCOPE

Usually, an event can be read from any CPU in its PMU's scope; it
doesn't have to be read from the advertised CPU.

Add a new event capability, PERF_EV_CAP_READ_SCOPE. An event of a PMU
with a scope can be read from any active CPU in that scope.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-3-kan.liang@linux.intel.com
---
 include/linux/perf_event.h |  3 +++
 kernel/events/core.c       | 14 +++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a3cbcd7..794f660 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -636,10 +636,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
  * cannot be a group leader. If an event with this flag is detached from the
  * group it is scheduled out and moved into an unrecoverable ERROR state.
+ * PERF_EV_CAP_READ_SCOPE: A CPU event that can be read from any CPU of the
+ * PMU scope where it is active.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
 #define PERF_EV_CAP_SIBLING		BIT(2)
+#define PERF_EV_CAP_READ_SCOPE		BIT(3)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5ff9735..2766090 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4556,16 +4556,24 @@ struct perf_read_data {
 	int ret;
 };
 
+static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu);
+
 static int __perf_event_read_cpu(struct perf_event *event, int event_cpu)
 {
+	int local_cpu = smp_processor_id();
 	u16 local_pkg, event_pkg;
 
 	if ((unsigned)event_cpu >= nr_cpu_ids)
 		return event_cpu;
 
-	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
-		int local_cpu = smp_processor_id();
+	if (event->group_caps & PERF_EV_CAP_READ_SCOPE) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(event->pmu->scope, event_cpu);
+
+		if (cpumask && cpumask_test_cpu(local_cpu, cpumask))
+			return local_cpu;
+	}
 
+	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
 		event_pkg = topology_physical_package_id(event_cpu);
 		local_pkg = topology_physical_package_id(local_cpu);
 
@@ -11905,7 +11913,7 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 				if (cpu >= nr_cpu_ids)
 					ret = -ENODEV;
 				else
-					event->cpu = cpu;
+					event->event_caps |= PERF_EV_CAP_READ_SCOPE;
 			} else {
 				ret = -ENODEV;
 			}

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] perf/x86/intel/cstate: Clean up cpumask and hotplug
  2024-08-02 15:16 ` [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug kan.liang
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     08155c7f2a2cc6ff99886ea5743da274be24ebe4
Gitweb:        https://git.kernel.org/tip/08155c7f2a2cc6ff99886ea5743da274be24ebe4
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:39 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:13 +02:00

perf/x86/intel/cstate: Clean up cpumask and hotplug

There are three cstate PMUs with different scopes: core, die and module.
These scopes are now supported by the generic perf_event subsystem.

Set the scope for each PMU and remove all the cpumask and hotplug code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-4-kan.liang@linux.intel.com
---
 arch/x86/events/intel/cstate.c | 142 +-------------------------------
 include/linux/cpuhotplug.h     |   2 +-
 2 files changed, 5 insertions(+), 139 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 9f116df..ae4ec16 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -128,10 +128,6 @@ static ssize_t __cstate_##_var##_show(struct device *dev,	\
 static struct device_attribute format_attr_##_var =		\
 	__ATTR(_name, 0444, __cstate_##_var##_show, NULL)
 
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf);
-
 /* Model -> events mapping */
 struct cstate_model {
 	unsigned long		core_events;
@@ -206,22 +202,9 @@ static struct attribute_group cstate_format_attr_group = {
 	.attrs = cstate_format_attrs,
 };
 
-static cpumask_t cstate_core_cpu_mask;
-static DEVICE_ATTR(cpumask, S_IRUGO, cstate_get_attr_cpumask, NULL);
-
-static struct attribute *cstate_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static struct attribute_group cpumask_attr_group = {
-	.attrs = cstate_cpumask_attrs,
-};
-
 static const struct attribute_group *cstate_attr_groups[] = {
 	&cstate_events_attr_group,
 	&cstate_format_attr_group,
-	&cpumask_attr_group,
 	NULL,
 };
 
@@ -269,8 +252,6 @@ static struct perf_msr pkg_msr[] = {
 	[PERF_CSTATE_PKG_C10_RES] = { MSR_PKG_C10_RESIDENCY,	&group_cstate_pkg_c10,	test_msr },
 };
 
-static cpumask_t cstate_pkg_cpu_mask;
-
 /* cstate_module PMU */
 static struct pmu cstate_module_pmu;
 static bool has_cstate_module;
@@ -291,28 +272,9 @@ static struct perf_msr module_msr[] = {
 	[PERF_CSTATE_MODULE_C6_RES]  = { MSR_MODULE_C6_RES_MS,	&group_cstate_module_c6,	test_msr },
 };
 
-static cpumask_t cstate_module_cpu_mask;
-
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf)
-{
-	struct pmu *pmu = dev_get_drvdata(dev);
-
-	if (pmu == &cstate_core_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_core_cpu_mask);
-	else if (pmu == &cstate_pkg_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_pkg_cpu_mask);
-	else if (pmu == &cstate_module_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_module_cpu_mask);
-	else
-		return 0;
-}
-
 static int cstate_pmu_event_init(struct perf_event *event)
 {
 	u64 cfg = event->attr.config;
-	int cpu;
 
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
@@ -331,20 +293,13 @@ static int cstate_pmu_event_init(struct perf_event *event)
 		if (!(core_msr_mask & (1 << cfg)))
 			return -EINVAL;
 		event->hw.event_base = core_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_core_cpu_mask,
-				      topology_sibling_cpumask(event->cpu));
 	} else if (event->pmu == &cstate_pkg_pmu) {
 		if (cfg >= PERF_CSTATE_PKG_EVENT_MAX)
 			return -EINVAL;
 		cfg = array_index_nospec((unsigned long)cfg, PERF_CSTATE_PKG_EVENT_MAX);
 		if (!(pkg_msr_mask & (1 << cfg)))
 			return -EINVAL;
-
-		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
 		event->hw.event_base = pkg_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_pkg_cpu_mask,
-				      topology_die_cpumask(event->cpu));
 	} else if (event->pmu == &cstate_module_pmu) {
 		if (cfg >= PERF_CSTATE_MODULE_EVENT_MAX)
 			return -EINVAL;
@@ -352,16 +307,10 @@ static int cstate_pmu_event_init(struct perf_event *event)
 		if (!(module_msr_mask & (1 << cfg)))
 			return -EINVAL;
 		event->hw.event_base = module_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_module_cpu_mask,
-				      topology_cluster_cpumask(event->cpu));
 	} else {
 		return -ENOENT;
 	}
 
-	if (cpu >= nr_cpu_ids)
-		return -ENODEV;
-
-	event->cpu = cpu;
 	event->hw.config = cfg;
 	event->hw.idx = -1;
 	return 0;
@@ -412,84 +361,6 @@ static int cstate_pmu_event_add(struct perf_event *event, int mode)
 	return 0;
 }
 
-/*
- * Check if exiting cpu is the designated reader. If so migrate the
- * events when there is a valid target available
- */
-static int cstate_cpu_exit(unsigned int cpu)
-{
-	unsigned int target;
-
-	if (has_cstate_core &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_core_cpu_mask)) {
-
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_core_cpu_mask);
-			perf_pmu_migrate_context(&cstate_core_pmu, cpu, target);
-		}
-	}
-
-	if (has_cstate_pkg &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_pkg_cpu_mask)) {
-
-		target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_pkg_cpu_mask);
-			perf_pmu_migrate_context(&cstate_pkg_pmu, cpu, target);
-		}
-	}
-
-	if (has_cstate_module &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_module_cpu_mask)) {
-
-		target = cpumask_any_but(topology_cluster_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_module_cpu_mask);
-			perf_pmu_migrate_context(&cstate_module_pmu, cpu, target);
-		}
-	}
-	return 0;
-}
-
-static int cstate_cpu_init(unsigned int cpu)
-{
-	unsigned int target;
-
-	/*
-	 * If this is the first online thread of that core, set it in
-	 * the core cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_core_cpu_mask,
-				 topology_sibling_cpumask(cpu));
-
-	if (has_cstate_core && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_core_cpu_mask);
-
-	/*
-	 * If this is the first online thread of that package, set it
-	 * in the package cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_pkg_cpu_mask,
-				 topology_die_cpumask(cpu));
-	if (has_cstate_pkg && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_pkg_cpu_mask);
-
-	/*
-	 * If this is the first online thread of that cluster, set it
-	 * in the cluster cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_module_cpu_mask,
-				 topology_cluster_cpumask(cpu));
-	if (has_cstate_module && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_module_cpu_mask);
-
-	return 0;
-}
-
 static const struct attribute_group *core_attr_update[] = {
 	&group_cstate_core_c1,
 	&group_cstate_core_c3,
@@ -526,6 +397,7 @@ static struct pmu cstate_core_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_CORE,
 	.module		= THIS_MODULE,
 };
 
@@ -541,6 +413,7 @@ static struct pmu cstate_pkg_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_PKG,
 	.module		= THIS_MODULE,
 };
 
@@ -556,6 +429,7 @@ static struct pmu cstate_module_pmu = {
 	.stop		= cstate_pmu_event_stop,
 	.read		= cstate_pmu_event_update,
 	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
+	.scope		= PERF_PMU_SCOPE_CLUSTER,
 	.module		= THIS_MODULE,
 };
 
@@ -810,9 +684,6 @@ static int __init cstate_probe(const struct cstate_model *cm)
 
 static inline void cstate_cleanup(void)
 {
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_ONLINE);
-	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_STARTING);
-
 	if (has_cstate_core)
 		perf_pmu_unregister(&cstate_core_pmu);
 
@@ -827,11 +698,6 @@ static int __init cstate_init(void)
 {
 	int err;
 
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_STARTING,
-			  "perf/x86/cstate:starting", cstate_cpu_init, NULL);
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_ONLINE,
-			  "perf/x86/cstate:online", NULL, cstate_cpu_exit);
-
 	if (has_cstate_core) {
 		err = perf_pmu_register(&cstate_core_pmu, cstate_core_pmu.name, -1);
 		if (err) {
@@ -844,6 +710,8 @@ static int __init cstate_init(void)
 
 	if (has_cstate_pkg) {
 		if (topology_max_dies_per_package() > 1) {
+			/* CLX-AP is multi-die and the cstate is die-scope */
+			cstate_pkg_pmu.scope = PERF_PMU_SCOPE_DIE;
 			err = perf_pmu_register(&cstate_pkg_pmu,
 						"cstate_die", -1);
 		} else {
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9316c39..2101ae2 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -152,7 +152,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING,
 	CPUHP_AP_PERF_X86_STARTING,
 	CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
-	CPUHP_AP_PERF_X86_CSTATE_STARTING,
 	CPUHP_AP_PERF_XTENSA_STARTING,
 	CPUHP_AP_ARM_VFP_STARTING,
 	CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING,
@@ -209,7 +208,6 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE,
 	CPUHP_AP_PERF_X86_AMD_POWER_ONLINE,
 	CPUHP_AP_PERF_X86_RAPL_ONLINE,
-	CPUHP_AP_PERF_X86_CSTATE_ONLINE,
 	CPUHP_AP_PERF_S390_CF_ONLINE,
 	CPUHP_AP_PERF_S390_SF_ONLINE,
 	CPUHP_AP_PERF_ARM_CCI_ONLINE,

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [tip: perf/core] perf: Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
@ 2024-09-10  9:59   ` tip-bot2 for Kan Liang
  2024-09-12 10:12     ` Steven Price
  2024-09-19 15:43   ` [PATCH 1/7] " Guenter Roeck
  2024-10-23 17:09   ` Matthieu Baerts
  2 siblings, 1 reply; 35+ messages in thread
From: tip-bot2 for Kan Liang @ 2024-09-10  9:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     4ba4f1afb6a9fed8ef896c2363076e36572f71da
Gitweb:        https://git.kernel.org/tip/4ba4f1afb6a9fed8ef896c2363076e36572f71da
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Fri, 02 Aug 2024 08:16:37 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 10 Sep 2024 11:44:12 +02:00

perf: Generic hotplug support for a PMU with a scope

The perf subsystem assumes that the counters of a PMU are per-CPU, so
the user-space tool reads a counter from each CPU in system-wide
mode. However, many PMUs don't have a per-CPU counter; the counter is
effective for a scope, e.g., a die or a socket. To address this, the
kernel driver exposes a cpumask that restricts collection to one CPU
standing in for the whole scope. In case that CPU is removed, hotplug
support has to be implemented in each such driver.

The code to support the cpumask and hotplug is very similar:
- Expose a cpumask in sysfs
- Pick another CPU in the same scope if the given CPU is removed.
- Invoke perf_pmu_migrate_context() to migrate to a new CPU.
- In event init, always set the CPU in the cpumask to event->cpu

Similar duplicated code is implemented for each such PMU driver. It
would be good to introduce a generic infrastructure to avoid such
duplication.

Five popular scopes are implemented here: core, die, cluster, pkg, and
system-wide. The scope can be set when a PMU is registered. If so, a
"cpumask" is automatically exposed for the PMU.

The "cpumask" comes from the perf_online_<scope>_mask, which tracks
the active CPU for each scope. A CPU is set in the mask when it is the
first online CPU of its scope, via the generic perf hotplug support.
When the corresponding CPU is removed, the perf_online_<scope>_mask is
updated accordingly and the PMU is moved to a new CPU from the same
scope if possible.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240802151643.1691631-2-kan.liang@linux.intel.com
---
 include/linux/perf_event.h |  18 ++++-
 kernel/events/core.c       | 164 +++++++++++++++++++++++++++++++++++-
 2 files changed, 180 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 7015499..a3cbcd7 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -295,6 +295,19 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
 
+/**
+ * pmu::scope
+ */
+enum perf_pmu_scope {
+	PERF_PMU_SCOPE_NONE	= 0,
+	PERF_PMU_SCOPE_CORE,
+	PERF_PMU_SCOPE_DIE,
+	PERF_PMU_SCOPE_CLUSTER,
+	PERF_PMU_SCOPE_PKG,
+	PERF_PMU_SCOPE_SYS_WIDE,
+	PERF_PMU_MAX_SCOPE,
+};
+
 struct perf_output_handle;
 
 #define PMU_NULL_DEV	((void *)(~0UL))
@@ -318,6 +331,11 @@ struct pmu {
 	 */
 	int				capabilities;
 
+	/*
+	 * PMU scope
+	 */
+	unsigned int			scope;
+
 	int __percpu			*pmu_disable_count;
 	struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
 	atomic_t			exclusive_cnt; /* < 0: cpu; > 0: tsk */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67e115d..5ff9735 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -436,6 +436,11 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 static cpumask_var_t perf_online_mask;
+static cpumask_var_t perf_online_core_mask;
+static cpumask_var_t perf_online_die_mask;
+static cpumask_var_t perf_online_cluster_mask;
+static cpumask_var_t perf_online_pkg_mask;
+static cpumask_var_t perf_online_sys_mask;
 static struct kmem_cache *perf_event_cache;
 
 /*
@@ -11578,10 +11583,60 @@ perf_event_mux_interval_ms_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(perf_event_mux_interval_ms);
 
+static inline const struct cpumask *perf_scope_cpu_topology_cpumask(unsigned int scope, int cpu)
+{
+	switch (scope) {
+	case PERF_PMU_SCOPE_CORE:
+		return topology_sibling_cpumask(cpu);
+	case PERF_PMU_SCOPE_DIE:
+		return topology_die_cpumask(cpu);
+	case PERF_PMU_SCOPE_CLUSTER:
+		return topology_cluster_cpumask(cpu);
+	case PERF_PMU_SCOPE_PKG:
+		return topology_core_cpumask(cpu);
+	case PERF_PMU_SCOPE_SYS_WIDE:
+		return cpu_online_mask;
+	}
+
+	return NULL;
+}
+
+static inline struct cpumask *perf_scope_cpumask(unsigned int scope)
+{
+	switch (scope) {
+	case PERF_PMU_SCOPE_CORE:
+		return perf_online_core_mask;
+	case PERF_PMU_SCOPE_DIE:
+		return perf_online_die_mask;
+	case PERF_PMU_SCOPE_CLUSTER:
+		return perf_online_cluster_mask;
+	case PERF_PMU_SCOPE_PKG:
+		return perf_online_pkg_mask;
+	case PERF_PMU_SCOPE_SYS_WIDE:
+		return perf_online_sys_mask;
+	}
+
+	return NULL;
+}
+
+static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct cpumask *mask = perf_scope_cpumask(pmu->scope);
+
+	if (mask)
+		return cpumap_print_to_pagebuf(true, buf, mask);
+	return 0;
+}
+
+static DEVICE_ATTR_RO(cpumask);
+
 static struct attribute *pmu_dev_attrs[] = {
 	&dev_attr_type.attr,
 	&dev_attr_perf_event_mux_interval_ms.attr,
 	&dev_attr_nr_addr_filters.attr,
+	&dev_attr_cpumask.attr,
 	NULL,
 };
 
@@ -11593,6 +11648,10 @@ static umode_t pmu_dev_is_visible(struct kobject *kobj, struct attribute *a, int
 	if (n == 2 && !pmu->nr_addr_filters)
 		return 0;
 
+	/* cpumask */
+	if (n == 3 && pmu->scope == PERF_PMU_SCOPE_NONE)
+		return 0;
+
 	return a->mode;
 }
 
@@ -11677,6 +11736,11 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 		goto free_pdc;
 	}
 
+	if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
+		ret = -EINVAL;
+		goto free_pdc;
+	}
+
 	pmu->name = name;
 
 	if (type >= 0)
@@ -11831,6 +11895,22 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		    event_has_any_exclude_flag(event))
 			ret = -EINVAL;
 
+		if (pmu->scope != PERF_PMU_SCOPE_NONE && event->cpu >= 0) {
+			const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(pmu->scope, event->cpu);
+			struct cpumask *pmu_cpumask = perf_scope_cpumask(pmu->scope);
+			int cpu;
+
+			if (pmu_cpumask && cpumask) {
+				cpu = cpumask_any_and(pmu_cpumask, cpumask);
+				if (cpu >= nr_cpu_ids)
+					ret = -ENODEV;
+				else
+					event->cpu = cpu;
+			} else {
+				ret = -ENODEV;
+			}
+		}
+
 		if (ret && event->destroy)
 			event->destroy(event);
 	}
@@ -13784,6 +13864,12 @@ static void __init perf_event_init_all_cpus(void)
 	int cpu;
 
 	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_core_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_die_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_cluster_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_pkg_mask, GFP_KERNEL);
+	zalloc_cpumask_var(&perf_online_sys_mask, GFP_KERNEL);
+
 
 	for_each_possible_cpu(cpu) {
 		swhash = &per_cpu(swevent_htable, cpu);
@@ -13833,6 +13919,40 @@ static void __perf_event_exit_context(void *__info)
 	raw_spin_unlock(&ctx->lock);
 }
 
+static void perf_event_clear_cpumask(unsigned int cpu)
+{
+	int target[PERF_PMU_MAX_SCOPE];
+	unsigned int scope;
+	struct pmu *pmu;
+
+	cpumask_clear_cpu(cpu, perf_online_mask);
+
+	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
+		struct cpumask *pmu_cpumask = perf_scope_cpumask(scope);
+
+		target[scope] = -1;
+		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
+			continue;
+
+		if (!cpumask_test_and_clear_cpu(cpu, pmu_cpumask))
+			continue;
+		target[scope] = cpumask_any_but(cpumask, cpu);
+		if (target[scope] < nr_cpu_ids)
+			cpumask_set_cpu(target[scope], pmu_cpumask);
+	}
+
+	/* migrate */
+	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
+		if (pmu->scope == PERF_PMU_SCOPE_NONE ||
+		    WARN_ON_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE))
+			continue;
+
+		if (target[pmu->scope] >= 0 && target[pmu->scope] < nr_cpu_ids)
+			perf_pmu_migrate_context(pmu, cpu, target[pmu->scope]);
+	}
+}
+
 static void perf_event_exit_cpu_context(int cpu)
 {
 	struct perf_cpu_context *cpuctx;
@@ -13840,6 +13960,11 @@ static void perf_event_exit_cpu_context(int cpu)
 
 	// XXX simplify cpuctx->online
 	mutex_lock(&pmus_lock);
+	/*
+	 * Clear the cpumasks, and migrate to other CPUs if possible.
+	 * Must be invoked before the __perf_event_exit_context.
+	 */
+	perf_event_clear_cpumask(cpu);
 	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
 	ctx = &cpuctx->ctx;
 
@@ -13847,7 +13972,6 @@ static void perf_event_exit_cpu_context(int cpu)
 	smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
 	cpuctx->online = 0;
 	mutex_unlock(&ctx->mutex);
-	cpumask_clear_cpu(cpu, perf_online_mask);
 	mutex_unlock(&pmus_lock);
 }
 #else
@@ -13856,6 +13980,42 @@ static void perf_event_exit_cpu_context(int cpu) { }
 
 #endif
 
+static void perf_event_setup_cpumask(unsigned int cpu)
+{
+	struct cpumask *pmu_cpumask;
+	unsigned int scope;
+
+	cpumask_set_cpu(cpu, perf_online_mask);
+
+	/*
+	 * Early boot stage, the cpumask hasn't been set yet.
+	 * The perf_online_<domain>_masks includes the first CPU of each domain.
+	 * Always uncondifionally set the boot CPU for the perf_online_<domain>_masks.
+	 */
+	if (!topology_sibling_cpumask(cpu)) {
+		for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+			pmu_cpumask = perf_scope_cpumask(scope);
+			if (WARN_ON_ONCE(!pmu_cpumask))
+				continue;
+			cpumask_set_cpu(cpu, pmu_cpumask);
+		}
+		return;
+	}
+
+	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
+		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
+
+		pmu_cpumask = perf_scope_cpumask(scope);
+
+		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
+			continue;
+
+		if (!cpumask_empty(cpumask) &&
+		    cpumask_any_and(pmu_cpumask, cpumask) >= nr_cpu_ids)
+			cpumask_set_cpu(cpu, pmu_cpumask);
+	}
+}
+
 int perf_event_init_cpu(unsigned int cpu)
 {
 	struct perf_cpu_context *cpuctx;
@@ -13864,7 +14024,7 @@ int perf_event_init_cpu(unsigned int cpu)
 	perf_swevent_init_cpu(cpu);
 
 	mutex_lock(&pmus_lock);
-	cpumask_set_cpu(cpu, perf_online_mask);
+	perf_event_setup_cpumask(cpu);
 	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
 	ctx = &cpuctx->ctx;
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [tip: perf/core] perf: Generic hotplug support for a PMU with a scope
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
@ 2024-09-12 10:12     ` Steven Price
  2024-09-12 14:53       ` Liang, Kan
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Price @ 2024-09-12 10:12 UTC (permalink / raw)
  To: linux-kernel, linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86

On 10/09/2024 10:59, tip-bot2 for Kan Liang wrote:
> The following commit has been merged into the perf/core branch of tip:
> 
> Commit-ID:     4ba4f1afb6a9fed8ef896c2363076e36572f71da
> Gitweb:        https://git.kernel.org/tip/4ba4f1afb6a9fed8ef896c2363076e36572f71da
> Author:        Kan Liang <kan.liang@linux.intel.com>
> AuthorDate:    Fri, 02 Aug 2024 08:16:37 -07:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Tue, 10 Sep 2024 11:44:12 +02:00
> 
> perf: Generic hotplug support for a PMU with a scope
> 
> The perf subsystem assumes that the counters of a PMU are per-CPU. So
> the user space tool reads a counter from each CPU in the system wide
> mode. However, many PMUs don't have a per-CPU counter. The counter is
> effective for a scope, e.g., a die or a socket. To address this, a
> cpumask is exposed by the kernel driver to restrict to one CPU to stand
> for a specific scope. In case the given CPU is removed,
> the hotplug support has to be implemented for each such driver.
> 
> The codes to support the cpumask and hotplug are very similar.
> - Expose a cpumask into sysfs
> - Pickup another CPU in the same scope if the given CPU is removed.
> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
> - In event init, always set the CPU in the cpumask to event->cpu
> 
> Similar duplicated codes are implemented for each such PMU driver. It
> would be good to introduce a generic infrastructure to avoid such
> duplication.
> 
> 5 popular scopes are implemented here, core, die, cluster, pkg, and
> the system-wide. The scope can be set when a PMU is registered. If so, a
> "cpumask" is automatically exposed for the PMU.
> 
> The "cpumask" is from the perf_online_<scope>_mask, which is to track
> the active CPU for each scope. They are set when the first CPU of the
> scope is online via the generic perf hotplug support. When a
> corresponding CPU is removed, the perf_online_<scope>_mask is updated
> accordingly and the PMU will be moved to a new CPU from the same scope
> if possible.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lore.kernel.org/r/20240802151643.1691631-2-kan.liang@linux.intel.com
> ---
>  include/linux/perf_event.h |  18 ++++-
>  kernel/events/core.c       | 164 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 180 insertions(+), 2 deletions(-)
> 
[...]
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 67e115d..5ff9735 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
[...]
> @@ -13856,6 +13980,42 @@ static void perf_event_exit_cpu_context(int cpu) { }
>  
>  #endif
>  
> +static void perf_event_setup_cpumask(unsigned int cpu)
> +{
> +	struct cpumask *pmu_cpumask;
> +	unsigned int scope;
> +
> +	cpumask_set_cpu(cpu, perf_online_mask);
> +
> +	/*
> +	 * Early boot stage, the cpumask hasn't been set yet.
> +	 * The perf_online_<domain>_masks includes the first CPU of each domain.
> +	 * Always uncondifionally set the boot CPU for the perf_online_<domain>_masks.
                  ^^^^^^^^^^^^^^^ typo

> +	 */
> +	if (!topology_sibling_cpumask(cpu)) {

This causes a compiler warning:

> kernel/events/core.c: In function 'perf_event_setup_cpumask':
> kernel/events/core.c:14012:13: error: the comparison will always evaluate as 'true' for the address of 'thread_sibling' will never be NULL [-Werror=address]
> 14012 |         if (!topology_sibling_cpumask(cpu)) {
>       |             ^
> In file included from ./include/linux/topology.h:30,
>                  from ./include/linux/gfp.h:8,
>                  from ./include/linux/xarray.h:16,
>                  from ./include/linux/list_lru.h:14,
>                  from ./include/linux/fs.h:13,
>                  from kernel/events/core.c:11:
> ./include/linux/arch_topology.h:78:19: note: 'thread_sibling' declared here
>    78 |         cpumask_t thread_sibling;
>       |                   ^~~~~~~~~~~~~~
> cc1: all warnings being treated as errors

Steve


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [tip: perf/core] perf: Generic hotplug support for a PMU with a scope
  2024-09-12 10:12     ` Steven Price
@ 2024-09-12 14:53       ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-12 14:53 UTC (permalink / raw)
  To: Steven Price, linux-kernel, linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86



On 2024-09-12 6:12 a.m., Steven Price wrote:
> On 10/09/2024 10:59, tip-bot2 for Kan Liang wrote:
>> The following commit has been merged into the perf/core branch of tip:
>>
>> Commit-ID:     4ba4f1afb6a9fed8ef896c2363076e36572f71da
>> Gitweb:        https://git.kernel.org/tip/4ba4f1afb6a9fed8ef896c2363076e36572f71da
>> Author:        Kan Liang <kan.liang@linux.intel.com>
>> AuthorDate:    Fri, 02 Aug 2024 08:16:37 -07:00
>> Committer:     Peter Zijlstra <peterz@infradead.org>
>> CommitterDate: Tue, 10 Sep 2024 11:44:12 +02:00
>>
>> perf: Generic hotplug support for a PMU with a scope
>>
>> The perf subsystem assumes that the counters of a PMU are per-CPU. So
>> the user space tool reads a counter from each CPU in the system wide
>> mode. However, many PMUs don't have a per-CPU counter. The counter is
>> effective for a scope, e.g., a die or a socket. To address this, a
>> cpumask is exposed by the kernel driver to restrict to one CPU to stand
>> for a specific scope. In case the given CPU is removed,
>> the hotplug support has to be implemented for each such driver.
>>
>> The codes to support the cpumask and hotplug are very similar.
>> - Expose a cpumask into sysfs
>> - Pickup another CPU in the same scope if the given CPU is removed.
>> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
>> - In event init, always set the CPU in the cpumask to event->cpu
>>
>> Similar duplicated codes are implemented for each such PMU driver. It
>> would be good to introduce a generic infrastructure to avoid such
>> duplication.
>>
>> 5 popular scopes are implemented here, core, die, cluster, pkg, and
>> the system-wide. The scope can be set when a PMU is registered. If so, a
>> "cpumask" is automatically exposed for the PMU.
>>
>> The "cpumask" is from the perf_online_<scope>_mask, which is to track
>> the active CPU for each scope. They are set when the first CPU of the
>> scope is online via the generic perf hotplug support. When a
>> corresponding CPU is removed, the perf_online_<scope>_mask is updated
>> accordingly and the PMU will be moved to a new CPU from the same scope
>> if possible.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Link: https://lore.kernel.org/r/20240802151643.1691631-2-kan.liang@linux.intel.com
>> ---
>>  include/linux/perf_event.h |  18 ++++-
>>  kernel/events/core.c       | 164 +++++++++++++++++++++++++++++++++++-
>>  2 files changed, 180 insertions(+), 2 deletions(-)
>>
> [...]
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 67e115d..5ff9735 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
> [...]
>> @@ -13856,6 +13980,42 @@ static void perf_event_exit_cpu_context(int cpu) { }
>>  
>>  #endif
>>  
>> +static void perf_event_setup_cpumask(unsigned int cpu)
>> +{
>> +	struct cpumask *pmu_cpumask;
>> +	unsigned int scope;
>> +
>> +	cpumask_set_cpu(cpu, perf_online_mask);
>> +
>> +	/*
>> +	 * Early boot stage, the cpumask hasn't been set yet.
>> +	 * The perf_online_<domain>_masks includes the first CPU of each domain.
>> +	 * Always uncondifionally set the boot CPU for the perf_online_<domain>_masks.
>                   ^^^^^^^^^^^^^^^ typo
> 
>> +	 */
>> +	if (!topology_sibling_cpumask(cpu)) {
> 
> This causes a compiler warning:
> 
>> kernel/events/core.c: In function 'perf_event_setup_cpumask':
>> kernel/events/core.c:14012:13: error: the comparison will always evaluate as 'true' for the address of 'thread_sibling' will never be NULL [-Werror=address]
>> 14012 |         if (!topology_sibling_cpumask(cpu)) {
>>       |             ^
>> In file included from ./include/linux/topology.h:30,
>>                  from ./include/linux/gfp.h:8,
>>                  from ./include/linux/xarray.h:16,
>>                  from ./include/linux/list_lru.h:14,
>>                  from ./include/linux/fs.h:13,
>>                  from kernel/events/core.c:11:
>> ./include/linux/arch_topology.h:78:19: note: 'thread_sibling' declared here
>>    78 |         cpumask_t thread_sibling;
>>       |                   ^~~~~~~~~~~~~~
>> cc1: all warnings being treated as errors
> 

The patch to fix the warning has been posted.
https://lore.kernel.org/lkml/20240912145025.1574448-1-kan.liang@linux.intel.com/

Please give it a try.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
@ 2024-09-19 15:43   ` Guenter Roeck
  2024-09-19 16:28     ` Liang, Kan
  2024-10-23 17:09   ` Matthieu Baerts
  2 siblings, 1 reply; 35+ messages in thread
From: Guenter Roeck @ 2024-09-19 15:43 UTC (permalink / raw)
  To: kan.liang; +Cc: peterz, mingo, acme, namhyung, irogers, linux-kernel

On Fri, Aug 02, 2024 at 08:16:37AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The perf subsystem assumes that the counters of a PMU are per-CPU. So
> the user space tool reads a counter from each CPU in the system wide
> mode. However, many PMUs don't have a per-CPU counter. The counter is
> effective for a scope, e.g., a die or a socket. To address this, a
> cpumask is exposed by the kernel driver to restrict to one CPU to stand
> for a specific scope. In case the given CPU is removed,
> the hotplug support has to be implemented for each such driver.
> 
> The codes to support the cpumask and hotplug are very similar.
> - Expose a cpumask into sysfs
> - Pickup another CPU in the same scope if the given CPU is removed.
> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
> - In event init, always set the CPU in the cpumask to event->cpu
> 
> Similar duplicated codes are implemented for each such PMU driver. It
> would be good to introduce a generic infrastructure to avoid such
> duplication.
> 
> 5 popular scopes are implemented here, core, die, cluster, pkg, and
> the system-wide. The scope can be set when a PMU is registered. If so, a
> "cpumask" is automatically exposed for the PMU.
> 
> The "cpumask" is from the perf_online_<scope>_mask, which is to track
> the active CPU for each scope. They are set when the first CPU of the
> scope is online via the generic perf hotplug support. When a
> corresponding CPU is removed, the perf_online_<scope>_mask is updated
> accordingly and the PMU will be moved to a new CPU from the same scope
> if possible.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

This patch results in build failures in the mainline kernel.

Building arm:allmodconfig ... failed
Building arm64:allmodconfig ... failed
Building loongarch:allmodconfig ... failed
Building mips:allmodconfig ... failed
Building parisc:allmodconfig ... failed

[ and more ]

--------------
Error log:
kernel/events/core.c: In function 'perf_event_setup_cpumask':
kernel/events/core.c:14012:13: error: the comparison will always evaluate as 'true' for the address of 'thread_sibling' will never be NULL


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-09-19 15:43   ` [PATCH 1/7] " Guenter Roeck
@ 2024-09-19 16:28     ` Liang, Kan
  0 siblings, 0 replies; 35+ messages in thread
From: Liang, Kan @ 2024-09-19 16:28 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: peterz, mingo, acme, namhyung, irogers, linux-kernel



On 2024-09-19 11:43 a.m., Guenter Roeck wrote:
> On Fri, Aug 02, 2024 at 08:16:37AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The perf subsystem assumes that the counters of a PMU are per-CPU. So
>> the user space tool reads a counter from each CPU in the system wide
>> mode. However, many PMUs don't have a per-CPU counter. The counter is
>> effective for a scope, e.g., a die or a socket. To address this, a
>> cpumask is exposed by the kernel driver to restrict to one CPU to stand
>> for a specific scope. In case the given CPU is removed,
>> the hotplug support has to be implemented for each such driver.
>>
>> The codes to support the cpumask and hotplug are very similar.
>> - Expose a cpumask into sysfs
>> - Pickup another CPU in the same scope if the given CPU is removed.
>> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
>> - In event init, always set the CPU in the cpumask to event->cpu
>>
>> Similar duplicated codes are implemented for each such PMU driver. It
>> would be good to introduce a generic infrastructure to avoid such
>> duplication.
>>
>> 5 popular scopes are implemented here, core, die, cluster, pkg, and
>> the system-wide. The scope can be set when a PMU is registered. If so, a
>> "cpumask" is automatically exposed for the PMU.
>>
>> The "cpumask" is from the perf_online_<scope>_mask, which is to track
>> the active CPU for each scope. They are set when the first CPU of the
>> scope is online via the generic perf hotplug support. When a
>> corresponding CPU is removed, the perf_online_<scope>_mask is updated
>> accordingly and the PMU will be moved to a new CPU from the same scope
>> if possible.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
> This patch results in build failures in the mainline kernel.
> 
> Building arm:allmodconfig ... failed
> Building arm64:allmodconfig ... failed
> Building loongarch:allmodconfig ... failed
> Building mips:allmodconfig ... failed
> Building parisc:allmodconfig ... failed
> 
> [ and more ]
> 
> --------------
> Error log:
> kernel/events/core.c: In function 'perf_event_setup_cpumask':
> kernel/events/core.c:14012:13: error: the comparison will always evaluate as 'true' for the address of 'thread_sibling' will never be NULL
> 

The patch has been posted, but hasn't been merged yet.

https://lore.kernel.org/lkml/20240912145025.1574448-1-kan.liang@linux.intel.com/

Thanks,
Kan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
  2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2024-09-19 15:43   ` [PATCH 1/7] " Guenter Roeck
@ 2024-10-23 17:09   ` Matthieu Baerts
  2024-10-23 17:46     ` Liang, Kan
  2 siblings, 1 reply; 35+ messages in thread
From: Matthieu Baerts @ 2024-10-23 17:09 UTC (permalink / raw)
  To: kan.liang
  Cc: linux-kernel, irogers, peterz, mingo, acme, namhyung,
	linux-perf-users

Hi Kan Liang,

(+ cc Perf ML)

On 02/08/2024 17:16, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The perf subsystem assumes that the counters of a PMU are per-CPU. So
> the user space tool reads a counter from each CPU in the system wide
> mode. However, many PMUs don't have a per-CPU counter. The counter is
> effective for a scope, e.g., a die or a socket. To address this, a
> cpumask is exposed by the kernel driver to restrict to one CPU to stand
> for a specific scope. In case the given CPU is removed,
> the hotplug support has to be implemented for each such driver.
> 
> The codes to support the cpumask and hotplug are very similar.
> - Expose a cpumask into sysfs
> - Pickup another CPU in the same scope if the given CPU is removed.
> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
> - In event init, always set the CPU in the cpumask to event->cpu
> 
> Similar duplicated codes are implemented for each such PMU driver. It
> would be good to introduce a generic infrastructure to avoid such
> duplication.
> 
> 5 popular scopes are implemented here, core, die, cluster, pkg, and
> the system-wide. The scope can be set when a PMU is registered. If so, a
> "cpumask" is automatically exposed for the PMU.
> 
> The "cpumask" is from the perf_online_<scope>_mask, which is to track
> the active CPU for each scope. They are set when the first CPU of the
> scope is online via the generic perf hotplug support. When a
> corresponding CPU is removed, the perf_online_<scope>_mask is updated
> accordingly and the PMU will be moved to a new CPU from the same scope
> if possible.

Thank you for the patch.

It looks like this modification causes the following warning on my side
when shutting down a VM running a kernel built with a debug config
including CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y):


> # /usr/lib/klibc/bin/poweroff
> 
> =============================
> WARNING: suspicious RCU usage
> 6.12.0-rc3+ #3 Not tainted
> -----------------------------
> kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 3 locks held by poweroff/11748:
>  #0: ffffffff9b441e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770)
>  #1: ffffffff9b43eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388)
>  #2: ffffffff9b6d06c8 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983)
> 
> stack backtrace:
> CPU: 0 UID: 0 PID: 11748 Comm: poweroff Not tainted 6.12.0-rc3+ #3
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>  <TASK>
>  dump_stack_lvl (lib/dump_stack.c:123)
>  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
>  perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9))
>  ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939)
>  ? acpi_execute_simple_method (drivers/acpi/utils.c:679)
>  ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679)
>  ? md_notify_reboot (drivers/md/md.c:9860)
>  perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1))
>  perf_reboot (kernel/events/core.c:14066 (discriminator 3))
>  ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2))
>  notifier_call_chain (kernel/notifier.c:93)
>  blocking_notifier_call_chain (kernel/notifier.c:389)
>  kernel_power_off (kernel/reboot.c:294)
>  __do_sys_reboot (kernel/reboot.c:771)
>  ? __pfx___do_sys_reboot (kernel/reboot.c:717)
>  ? __pfx_ksys_sync (fs/sync.c:98)
>  do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
>  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)


It is very easy for me to reproduce: simply stop the VM. Just
in case, here are the steps I used to set up the same VM:

  $ cd [kernel source code]
  $ echo true > .virtme-exec-run
  $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug -e RCU_EXPERT -e PROVE_RCU_LIST


I have one question below about the modification you did here.

(...)

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index aa3450bdc227..5e1877c4cb4c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c

(...)

> @@ -13730,6 +13816,40 @@ static void __perf_event_exit_context(void *__info)
>  	raw_spin_unlock(&ctx->lock);
>  }
>  
> +static void perf_event_clear_cpumask(unsigned int cpu)
> +{
> +	int target[PERF_PMU_MAX_SCOPE];
> +	unsigned int scope;
> +	struct pmu *pmu;
> +
> +	cpumask_clear_cpu(cpu, perf_online_mask);
> +
> +	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
> +		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
> +		struct cpumask *pmu_cpumask = perf_scope_cpumask(scope);
> +
> +		target[scope] = -1;
> +		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
> +			continue;
> +
> +		if (!cpumask_test_and_clear_cpu(cpu, pmu_cpumask))
> +			continue;
> +		target[scope] = cpumask_any_but(cpumask, cpu);
> +		if (target[scope] < nr_cpu_ids)
> +			cpumask_set_cpu(target[scope], pmu_cpumask);
> +	}
> +
> +	/* migrate */
> +	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {


Here is the line causing the warning: rcu_read_lock() is not held
around this traversal.

I don't know this code, but I guess you are not only doing read
operations here if you are migrating something, right? This operation is
done under "pmus_lock", so maybe the "_rcu" variant is not needed here?

So perhaps just using this instead is enough:

  list_for_each_entry(pmu, &pmus, entry) {


> +		if (pmu->scope == PERF_PMU_SCOPE_NONE ||
> +		    WARN_ON_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE))
> +			continue;
> +
> +		if (target[pmu->scope] >= 0 && target[pmu->scope] < nr_cpu_ids)
> +			perf_pmu_migrate_context(pmu, cpu, target[pmu->scope]);
> +	}
> +}
> +
>  static void perf_event_exit_cpu_context(int cpu)
>  {
>  	struct perf_cpu_context *cpuctx;
> @@ -13737,6 +13857,11 @@ static void perf_event_exit_cpu_context(int cpu)
>  
>  	// XXX simplify cpuctx->online
>  	mutex_lock(&pmus_lock);
> +	/*
> +	 * Clear the cpumasks, and migrate to other CPUs if possible.
> +	 * Must be invoked before the __perf_event_exit_context.
> +	 */
> +	perf_event_clear_cpumask(cpu);
>  	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
>  	ctx = &cpuctx->ctx;
>  
> @@ -13744,7 +13869,6 @@ static void perf_event_exit_cpu_context(int cpu)
>  	smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
>  	cpuctx->online = 0;
>  	mutex_unlock(&ctx->mutex);
> -	cpumask_clear_cpu(cpu, perf_online_mask);
>  	mutex_unlock(&pmus_lock);
>  }
>  #else
(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-10-23 17:09   ` Matthieu Baerts
@ 2024-10-23 17:46     ` Liang, Kan
  2024-10-23 18:19       ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Liang, Kan @ 2024-10-23 17:46 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: linux-kernel, irogers, peterz, mingo, acme, namhyung,
	linux-perf-users



On 2024-10-23 1:09 p.m., Matthieu Baerts wrote:
> Hi Kan Liang,
> 
> (+ cc Perf ML)
> 
> On 02/08/2024 17:16, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The perf subsystem assumes that the counters of a PMU are per-CPU. So
>> the user space tool reads a counter from each CPU in the system wide
>> mode. However, many PMUs don't have a per-CPU counter. The counter is
>> effective for a scope, e.g., a die or a socket. To address this, a
>> cpumask is exposed by the kernel driver to restrict to one CPU to stand
>> for a specific scope. In case the given CPU is removed,
>> the hotplug support has to be implemented for each such driver.
>>
>> The code to support the cpumask and hotplug is very similar:
>> - Expose a cpumask in sysfs.
>> - Pick up another CPU in the same scope if the given CPU is removed.
>> - Invoke perf_pmu_migrate_context() to migrate to a new CPU.
>> - In event init, always assign the CPU from the cpumask to event->cpu.
>>
>> Similar duplicated code is implemented in each such PMU driver. It
>> would be good to introduce a generic infrastructure to avoid this
>> duplication.
>>
>> Five popular scopes are implemented here: core, die, cluster, pkg, and
>> system-wide. The scope can be set when a PMU is registered. If it is, a
>> "cpumask" is automatically exposed for the PMU.
>>
>> The "cpumask" comes from perf_online_<scope>_mask, which tracks the
>> active CPU for each scope. The masks are set when the first CPU of a
>> scope comes online, via the generic perf hotplug support. When the
>> corresponding CPU is removed, perf_online_<scope>_mask is updated
>> accordingly and the PMU is moved to a new CPU from the same scope,
>> if possible.
> 
> Thank you for the patch.
> 
> It looks like this modification causes the following warning on my side
> when shutting down a VM running a kernel built with a debug config
> including CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y):
> 
> 
>> # /usr/lib/klibc/bin/poweroff
>>
>> =============================
>> WARNING: suspicious RCU usage
>> 6.12.0-rc3+ #3 Not tainted
>> -----------------------------
>> kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
>>
>> other info that might help us debug this:
>>
>>
>> rcu_scheduler_active = 2, debug_locks = 1
>> 3 locks held by poweroff/11748:
>>  #0: ffffffff9b441e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770)
>>  #1: ffffffff9b43eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388)
>>  #2: ffffffff9b6d06c8 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983)
>>
>> stack backtrace:
>> CPU: 0 UID: 0 PID: 11748 Comm: poweroff Not tainted 6.12.0-rc3+ #3
>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> Call Trace:
>>  <TASK>
>>  dump_stack_lvl (lib/dump_stack.c:123)
>>  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
>>  perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9))
>>  ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939)
>>  ? acpi_execute_simple_method (drivers/acpi/utils.c:679)
>>  ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679)
>>  ? md_notify_reboot (drivers/md/md.c:9860)
>>  perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1))
>>  perf_reboot (kernel/events/core.c:14066 (discriminator 3))
>>  ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2))
>>  notifier_call_chain (kernel/notifier.c:93)
>>  blocking_notifier_call_chain (kernel/notifier.c:389)
>>  kernel_power_off (kernel/reboot.c:294)
>>  __do_sys_reboot (kernel/reboot.c:771)
>>  ? __pfx___do_sys_reboot (kernel/reboot.c:717)
>>  ? __pfx_ksys_sync (fs/sync.c:98)
>>  do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
>>  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> 
> 
> It is very easy for me to reproduce: simply stop the VM. Just
> in case, here are the steps I used to set up the same VM:
> 
>   $ cd [kernel source code]
>   $ echo true > .virtme-exec-run
>   $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
>         --pull always mptcp/mptcp-upstream-virtme-docker:latest \
>         auto-debug -e RCU_EXPERT -e PROVE_RCU_LIST
> 
> 
> I have one question below about the modification you did here.
> 
> (...)
> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index aa3450bdc227..5e1877c4cb4c 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
> 
> (...)
> 
>> @@ -13730,6 +13816,40 @@ static void __perf_event_exit_context(void *__info)
>>  	raw_spin_unlock(&ctx->lock);
>>  }
>>  
>> +static void perf_event_clear_cpumask(unsigned int cpu)
>> +{
>> +	int target[PERF_PMU_MAX_SCOPE];
>> +	unsigned int scope;
>> +	struct pmu *pmu;
>> +
>> +	cpumask_clear_cpu(cpu, perf_online_mask);
>> +
>> +	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
>> +		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
>> +		struct cpumask *pmu_cpumask = perf_scope_cpumask(scope);
>> +
>> +		target[scope] = -1;
>> +		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
>> +			continue;
>> +
>> +		if (!cpumask_test_and_clear_cpu(cpu, pmu_cpumask))
>> +			continue;
>> +		target[scope] = cpumask_any_but(cpumask, cpu);
>> +		if (target[scope] < nr_cpu_ids)
>> +			cpumask_set_cpu(target[scope], pmu_cpumask);
>> +	}
>> +
>> +	/* migrate */
>> +	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
> 
> 
> Here is the line causing the warning: rcu_read_lock() is not held
> around this traversal.
> 
> I don't know this code, but I guess you are not only doing read
> operations here if you are migrating something, right? This operation is
> done under "pmus_lock", so maybe the "_rcu" variant is not needed here?
> 
> So perhaps just using this instead is enough:
> 
>   list_for_each_entry(pmu, &pmus, entry) {

Yes, that's good enough. A patch has been proposed, but it hasn't been
merged yet:

https://lore.kernel.org/lkml/20240913162340.2142976-1-kan.liang@linux.intel.com/

Thanks,
Kan
> 
> 
>> +		if (pmu->scope == PERF_PMU_SCOPE_NONE ||
>> +		    WARN_ON_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE))
>> +			continue;
>> +
>> +		if (target[pmu->scope] >= 0 && target[pmu->scope] < nr_cpu_ids)
>> +			perf_pmu_migrate_context(pmu, cpu, target[pmu->scope]);
>> +	}
>> +}
>> +
>>  static void perf_event_exit_cpu_context(int cpu)
>>  {
>>  	struct perf_cpu_context *cpuctx;
>> @@ -13737,6 +13857,11 @@ static void perf_event_exit_cpu_context(int cpu)
>>  
>>  	// XXX simplify cpuctx->online
>>  	mutex_lock(&pmus_lock);
>> +	/*
>> +	 * Clear the cpumasks, and migrate to other CPUs if possible.
>> +	 * Must be invoked before the __perf_event_exit_context.
>> +	 */
>> +	perf_event_clear_cpumask(cpu);
>>  	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
>>  	ctx = &cpuctx->ctx;
>>  
>> @@ -13744,7 +13869,6 @@ static void perf_event_exit_cpu_context(int cpu)
>>  	smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
>>  	cpuctx->online = 0;
>>  	mutex_unlock(&ctx->mutex);
>> -	cpumask_clear_cpu(cpu, perf_online_mask);
>>  	mutex_unlock(&pmus_lock);
>>  }
>>  #else
> (...)
> 
> Cheers,
> Matt



* Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
  2024-10-23 17:46     ` Liang, Kan
@ 2024-10-23 18:19       ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2024-10-23 18:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Matthieu Baerts, linux-kernel, irogers, mingo, acme, namhyung,
	linux-perf-users

On Wed, Oct 23, 2024 at 01:46:43PM -0400, Liang, Kan wrote:
> Yes, it's good enough. A patch has been proposed, but haven't been
> merged yet.
> 
> https://lore.kernel.org/lkml/20240913162340.2142976-1-kan.liang@linux.intel.com/

Got it now.


end of thread, other threads:[~2024-10-23 18:19 UTC | newest]

Thread overview: 35+ messages
2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-09-12 10:12     ` Steven Price
2024-09-12 14:53       ` Liang, Kan
2024-09-19 15:43   ` [PATCH 1/7] " Guenter Roeck
2024-09-19 16:28     ` Liang, Kan
2024-10-23 17:09   ` Matthieu Baerts
2024-10-23 17:46     ` Liang, Kan
2024-10-23 18:19       ` Peter Zijlstra
2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
2024-09-06 15:11   ` Peter Zijlstra
2024-09-06 15:26     ` Liang, Kan
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
2024-08-05 15:40   ` Dave Jiang
2024-08-05 17:46   ` Fenghua Yu
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
2024-09-09  9:26   ` Dhananjay Ugwekar
2024-09-09 13:02     ` Liang, Kan
2024-09-09 13:24       ` Peter Zijlstra
2024-09-09 17:11         ` Liang, Kan
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-09-04 12:44 ` [PATCH 0/7] Generic hotplug support for a PMU with a scope Liang, Kan
2024-09-06 15:17   ` Peter Zijlstra
2024-09-06 15:30     ` Liang, Kan
2024-09-06 15:12 ` Peter Zijlstra
2024-09-06 15:30   ` Liang, Kan
