* [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, kees, gustavoars, rui.zhang, oleksandr
Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
linux-pm, Dhananjay.Ugwekar
Currently, the energy-cores event in the power PMU aggregates energy
consumption data at the package level. The core energy RAPL counter on
AMD CPUs, on the other hand, has core scope, i.e. energy consumption is
recorded separately for each core. An earlier effort to add the core
event to the power PMU failed [1] due to this difference in scope.
Hence, there is a need for a new core-scope PMU.
This patchset adds a new "power_per_core" PMU alongside the existing
"power" PMU; the new PMU is responsible for collecting the new
"energy-per-core" event.
Tested the package-level and core-level PMU counters with workloads
pinned to different CPUs.
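For reference, a workload can be pinned to CPU 1 as below; the stress-ng
invocation is only illustrative (not the exact workload used for these
numbers), any CPU-bound task pinned with taskset serves the same purpose:
$ taskset -c 1 stress-ng --cpu 1 --timeout 30 &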
Results with the workload pinned to CPU 1 (in core 1) on an AMD Zen4
Genoa machine:
$ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
[1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
This patchset applies cleanly on top of v6.10-rc7 as well as latest
tip/master.
v4 changes:
* Add patch 11, which removes the unused function cpu_to_rapl_pmu()
* Add Rui's rb tag for patch 1
* Invert the pmu scope check logic in patch 2 (Peter)
* Add comments explaining the scope check in patch 2 (Peter)
* Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
* Move renaming code to patch 8 (Rui)
* Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
* Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
10 (Rui)
PS: The scope check logic is kept the same (i.e., all Intel systems are
considered die scope); Rui will modify it to limit the die scope to
Cascadelake-AP in a future patch on top of this patchset.
v3 changes:
* Patch 1 added to introduce logical_core_id, which is unique across
the system (Prateek)
* Use the unique topology_logical_core_id() instead of
topology_core_id() (which is only unique within a package on tested
AMD and Intel systems) in Patch 10
v2 changes:
* Patches 6,7,8 added to split some changes out of the last patch
* Use container_of to get the rapl_pmus from event variable (Rui)
* Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
* Use event id 0x1 for energy-per-core event (Rui)
* Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
per-core counter hw support (Rui)
Dhananjay Ugwekar (10):
perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
perf/x86/rapl: Rename rapl_pmu variables
perf/x86/rapl: Make rapl_model struct global
perf/x86/rapl: Move cpumask variable to rapl_pmus struct
perf/x86/rapl: Add wrapper for online/offline functions
perf/x86/rapl: Add an argument to the cleanup and init functions
perf/x86/rapl: Modify the generic variable names to *_pkg*
perf/x86/rapl: Remove the global variable rapl_msrs
perf/x86/rapl: Add per-core energy counter support for AMD CPUs
perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
K Prateek Nayak (1):
x86/topology: Introduce topology_logical_core_id()
Documentation/arch/x86/topology.rst | 4 +
arch/x86/events/rapl.c | 454 ++++++++++++++++++--------
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/debugfs.c | 1 +
arch/x86/kernel/cpu/topology_common.c | 1 +
6 files changed, 328 insertions(+), 134 deletions(-)
--
2.34.1
* [PATCH v4 01/11] x86/topology: Introduce topology_logical_core_id()
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
From: K Prateek Nayak <kprateek.nayak@amd.com>
On x86, topology_core_id() returns a core ID that is unique only within
the PKG domain. Looking at match_smt() suggests that a core ID just
needs to be unique within an LLC domain. For use cases such as the
per-core RAPL PMU, a core ID that is unique across the entire system,
spanning multiple PKG domains, is needed. Introduce
topology_logical_core_id() to derive such a unique core ID.
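As an illustrative sketch (not part of this patch), a consumer can use
the new macro to index a system-wide per-core array; the array and its
size below are hypothetical names:
static struct rapl_pmu *rapl_pmu_for_core(unsigned int cpu)
{
        unsigned int idx = topology_logical_core_id(cpu);

        /* The unsigned compare also rejects the '-1' no-mapping value. */
        return idx < nr_core_pmus ? core_pmus[idx] : NULL;
}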
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
---
Documentation/arch/x86/topology.rst | 4 ++++
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/debugfs.c | 1 +
arch/x86/kernel/cpu/topology_common.c | 1 +
5 files changed, 8 insertions(+)
diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst
index 7352ab89a55a..c12837e61bda 100644
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -135,6 +135,10 @@ Thread-related topology information in the kernel:
The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
"core_id."
+ - topology_logical_core_id();
+
+ The logical core ID to which a thread belongs.
+
System topology examples
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cb4f6c513c48..1ffe4260bef6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -98,6 +98,7 @@ struct cpuinfo_topology {
// Logical ID mappings
u32 logical_pkg_id;
u32 logical_die_id;
+ u32 logical_core_id;
// AMD Node ID and Nodes per Package info
u32 amd_node_id;
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index abe3a8f22cbd..2a6dbf965d92 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -137,6 +137,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
#define topology_logical_package_id(cpu) (cpu_data(cpu).topo.logical_pkg_id)
#define topology_physical_package_id(cpu) (cpu_data(cpu).topo.pkg_id)
#define topology_logical_die_id(cpu) (cpu_data(cpu).topo.logical_die_id)
+#define topology_logical_core_id(cpu) (cpu_data(cpu).topo.logical_core_id)
#define topology_die_id(cpu) (cpu_data(cpu).topo.die_id)
#define topology_core_id(cpu) (cpu_data(cpu).topo.core_id)
#define topology_ppin(cpu) (cpu_data(cpu).ppin)
diff --git a/arch/x86/kernel/cpu/debugfs.c b/arch/x86/kernel/cpu/debugfs.c
index 3baf3e435834..b1eb6d7828db 100644
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -24,6 +24,7 @@ static int cpu_debug_show(struct seq_file *m, void *p)
seq_printf(m, "core_id: %u\n", c->topo.core_id);
seq_printf(m, "logical_pkg_id: %u\n", c->topo.logical_pkg_id);
seq_printf(m, "logical_die_id: %u\n", c->topo.logical_die_id);
+ seq_printf(m, "logical_core_id: %u\n", c->topo.logical_core_id);
seq_printf(m, "llc_id: %u\n", c->topo.llc_id);
seq_printf(m, "l2c_id: %u\n", c->topo.l2c_id);
seq_printf(m, "amd_node_id: %u\n", c->topo.amd_node_id);
diff --git a/arch/x86/kernel/cpu/topology_common.c b/arch/x86/kernel/cpu/topology_common.c
index 9a6069e7133c..23722aa21e2f 100644
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -151,6 +151,7 @@ static void topo_set_ids(struct topo_scan *tscan, bool early)
if (!early) {
c->topo.logical_pkg_id = topology_get_logical_id(apicid, TOPO_PKG_DOMAIN);
c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
+ c->topo.logical_core_id = topology_get_logical_id(apicid, TOPO_CORE_DOMAIN);
}
/* Package relative core ID */
--
2.34.1
* [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
After commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
0x80000026 leaf"), on AMD processors that support extended CPUID leaf
0x80000026, the topology_die_cpumask() and topology_logical_die_id()
macros no longer return the package cpumask and package ID; instead,
they return the CCD (Core Complex Die) mask and ID respectively. This
changes the scope of the energy-pkg event from package to CCD.
Replacing these macros with their package counterparts fixes the
energy-pkg event for AMD CPUs.
However, because the scope of the energy-pkg event differs between
Intel and AMD CPUs, these macros are replaced conditionally, only on
AMD CPUs.
On a 12-CCD, 1-package AMD Zen4 Genoa machine:
Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88
The expected cpumask here is just "0": since this is a package-scope
event, a single CPU collects the event for all the CPUs in the
package.
After:
$ cat /sys/devices/power/cpumask
0
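As a further (illustrative) sanity check, the package event should now
report a single count for the whole package:
$ perf stat -a -e power/energy-pkg/ -- sleep 1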
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
---
Changes in v4:
* Invert the pkg scope check in init_rapl_pmus() (Peter)
* Add comments to explain the pkg scope check (Peter)
PS: The scope check logic is kept the same (i.e., all Intel systems are
considered die scope); Rui will modify it to limit the die scope to
Cascadelake-AP in a future patch on top of this patchset.
---
arch/x86/events/rapl.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 0c5e7a7c43ac..df71f38ad98d 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,13 @@ static struct perf_pmu_events_attr event_attr_##v = { \
.event_str = str, \
};
+/*
+ * RAPL PMU scope for AMD is package whereas for Intel it is die.
+ */
+#define rapl_pmu_is_pkg_scope() \
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
struct rapl_pmu {
raw_spinlock_t lock;
int n_active;
@@ -140,9 +147,25 @@ static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
+/*
+ * Helper functions to get the correct topology macros according to the
+ * RAPL PMU scope.
+ */
+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+ topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+ topology_die_cpumask(cpu);
+}
+
static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
{
- unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+ unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
/*
* The unsigned check also catches the '-1' return value for non
@@ -543,6 +566,7 @@ static struct perf_msr amd_rapl_msrs[] = {
static int rapl_cpu_offline(unsigned int cpu)
{
+ const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;
@@ -552,7 +576,7 @@ static int rapl_cpu_offline(unsigned int cpu)
pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
- target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+ target = cpumask_any_but(rapl_pmu_cpumask, cpu);
/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
@@ -565,6 +589,8 @@ static int rapl_cpu_offline(unsigned int cpu)
static int rapl_cpu_online(unsigned int cpu)
{
+ unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
+ const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;
@@ -579,14 +605,14 @@ static int rapl_cpu_online(unsigned int cpu)
pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
rapl_hrtimer_init(pmu);
- rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+ rapl_pmus->pmus[rapl_pmu_idx] = pmu;
}
/*
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+ target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
if (target < nr_cpu_ids)
return 0;
@@ -675,7 +701,10 @@ static const struct attribute_group *rapl_attr_update[] = {
static int __init init_rapl_pmus(void)
{
- int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+ int nr_rapl_pmu = topology_max_packages();
+
+ if (!rapl_pmu_is_pkg_scope())
+ nr_rapl_pmu *= topology_max_dies_per_package();
rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
--
2.34.1
* [PATCH v4 03/11] perf/x86/rapl: Rename rapl_pmu variables
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
Rename struct rapl_pmu variables from "pmu" to "rapl_pmu" to avoid
confusion between the two different structs, pmu and rapl_pmu. Since
struct rapl_pmu also contains a pointer to struct pmu, the old naming
led to constructs like pmu->pmu, which are needlessly confusing. With
this change, those become the much more readable rapl_pmu->pmu.
Also rename the "pmus" member of struct rapl_pmus, for the same reason.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 104 ++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 52 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index df71f38ad98d..e6162bd84f23 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -123,7 +123,7 @@ struct rapl_pmu {
struct rapl_pmus {
struct pmu pmu;
unsigned int nr_rapl_pmu;
- struct rapl_pmu *pmus[] __counted_by(nr_rapl_pmu);
+ struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};
enum rapl_unit_quirk {
@@ -171,7 +171,7 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
* The unsigned check also catches the '-1' return value for non
* existent mappings in the topology map.
*/
- return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;
+ return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
}
static inline u64 rapl_read_counter(struct perf_event *event)
@@ -235,34 +235,34 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)
static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
{
- struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
+ struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
struct perf_event *event;
unsigned long flags;
- if (!pmu->n_active)
+ if (!rapl_pmu->n_active)
return HRTIMER_NORESTART;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
- list_for_each_entry(event, &pmu->active_list, active_entry)
+ list_for_each_entry(event, &rapl_pmu->active_list, active_entry)
rapl_event_update(event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
- hrtimer_forward_now(hrtimer, pmu->timer_interval);
+ hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);
return HRTIMER_RESTART;
}
-static void rapl_hrtimer_init(struct rapl_pmu *pmu)
+static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
{
- struct hrtimer *hr = &pmu->hrtimer;
+ struct hrtimer *hr = &rapl_pmu->hrtimer;
hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hr->function = rapl_hrtimer_handle;
}
-static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
+static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
struct perf_event *event)
{
if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
@@ -270,39 +270,39 @@ static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
event->hw.state = 0;
- list_add_tail(&event->active_entry, &pmu->active_list);
+ list_add_tail(&event->active_entry, &rapl_pmu->active_list);
local64_set(&event->hw.prev_count, rapl_read_counter(event));
- pmu->n_active++;
- if (pmu->n_active == 1)
- rapl_start_hrtimer(pmu);
+ rapl_pmu->n_active++;
+ if (rapl_pmu->n_active == 1)
+ rapl_start_hrtimer(rapl_pmu);
}
static void rapl_pmu_event_start(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
- __rapl_pmu_event_start(pmu, event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
+ __rapl_pmu_event_start(rapl_pmu, event);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}
static void rapl_pmu_event_stop(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
/* mark event as deactivated and stopped */
if (!(hwc->state & PERF_HES_STOPPED)) {
- WARN_ON_ONCE(pmu->n_active <= 0);
- pmu->n_active--;
- if (pmu->n_active == 0)
- hrtimer_cancel(&pmu->hrtimer);
+ WARN_ON_ONCE(rapl_pmu->n_active <= 0);
+ rapl_pmu->n_active--;
+ if (rapl_pmu->n_active == 0)
+ hrtimer_cancel(&rapl_pmu->hrtimer);
list_del(&event->active_entry);
@@ -320,23 +320,23 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
hwc->state |= PERF_HES_UPTODATE;
}
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}
static int rapl_pmu_event_add(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
if (mode & PERF_EF_START)
- __rapl_pmu_event_start(pmu, event);
+ __rapl_pmu_event_start(rapl_pmu, event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
return 0;
}
@@ -350,7 +350,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
- struct rapl_pmu *pmu;
+ struct rapl_pmu *rapl_pmu;
/* only look at RAPL events */
if (event->attr.type != rapl_pmus->pmu.type)
@@ -380,11 +380,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
/* must be done before validate_group */
- pmu = cpu_to_rapl_pmu(event->cpu);
- if (!pmu)
+ rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+ if (!rapl_pmu)
return -EINVAL;
- event->cpu = pmu->cpu;
- event->pmu_private = pmu;
+ event->cpu = rapl_pmu->cpu;
+ event->pmu_private = rapl_pmu;
event->hw.event_base = rapl_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -567,22 +567,22 @@ static struct perf_msr amd_rapl_msrs[] = {
static int rapl_cpu_offline(unsigned int cpu)
{
const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
int target;
/* Check if exiting cpu is used for collecting rapl events */
if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
return 0;
- pmu->cpu = -1;
+ rapl_pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
target = cpumask_any_but(rapl_pmu_cpumask, cpu);
/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
cpumask_set_cpu(target, &rapl_cpu_mask);
- pmu->cpu = target;
- perf_pmu_migrate_context(pmu->pmu, cpu, target);
+ rapl_pmu->cpu = target;
+ perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
}
return 0;
}
@@ -591,21 +591,21 @@ static int rapl_cpu_online(unsigned int cpu)
{
unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
int target;
- if (!pmu) {
- pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
- if (!pmu)
+ if (!rapl_pmu) {
+ rapl_pmu = kzalloc_node(sizeof(*rapl_pmu), GFP_KERNEL, cpu_to_node(cpu));
+ if (!rapl_pmu)
return -ENOMEM;
- raw_spin_lock_init(&pmu->lock);
- INIT_LIST_HEAD(&pmu->active_list);
- pmu->pmu = &rapl_pmus->pmu;
- pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
- rapl_hrtimer_init(pmu);
+ raw_spin_lock_init(&rapl_pmu->lock);
+ INIT_LIST_HEAD(&rapl_pmu->active_list);
+ rapl_pmu->pmu = &rapl_pmus->pmu;
+ rapl_pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+ rapl_hrtimer_init(rapl_pmu);
- rapl_pmus->pmus[rapl_pmu_idx] = pmu;
+ rapl_pmus->rapl_pmu[rapl_pmu_idx] = rapl_pmu;
}
/*
@@ -617,7 +617,7 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;
cpumask_set_cpu(cpu, &rapl_cpu_mask);
- pmu->cpu = cpu;
+ rapl_pmu->cpu = cpu;
return 0;
}
@@ -686,7 +686,7 @@ static void cleanup_rapl_pmus(void)
int i;
for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
- kfree(rapl_pmus->pmus[i]);
+ kfree(rapl_pmus->rapl_pmu[i]);
kfree(rapl_pmus);
}
@@ -706,7 +706,7 @@ static int __init init_rapl_pmus(void)
if (!rapl_pmu_is_pkg_scope())
nr_rapl_pmu *= topology_max_dies_per_package();
- rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
+ rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
--
2.34.1
* [PATCH v4 04/11] perf/x86/rapl: Make rapl_model struct global
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
To support AMD's per-core RAPL counter, the per-core capability of the
current rapl_model will need to be checked multiple times, in the
rapl_cpu_online()/rapl_cpu_offline() and init_rapl_pmus() functions.
Cache the matched rapl_model in a global variable to avoid calling
x86_match_cpu() repeatedly.
No functional change.
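For illustration, later patches can then test the cached model wherever
needed; the core_events field and PERF_RAPL_PER_CORE bit below are the
names patch 10 introduces, used here only as a sketch:
        /* Sketch: decide whether the per-core PMU must be set up too. */
        if (rapl_model->core_events & BIT(PERF_RAPL_PER_CORE)) {
                /* ... per-core PMU setup/migration path ... */
        }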
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index c3afeaf679c2..4ee0877eb4d8 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -146,6 +146,7 @@ static cpumask_t rapl_cpu_mask;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
+static struct rapl_model *rapl_model;
/*
* Helper functions to get the correct topology macros according to the
@@ -621,18 +622,18 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;
}
-static int rapl_check_hw_unit(struct rapl_model *rm)
+static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
int i;
/* protect rdmsrl() to handle virtualization */
- if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits))
+ if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
return -1;
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
- switch (rm->unit_quirk) {
+ switch (rapl_model->unit_quirk) {
/*
* DRAM domain on HSW server and KNL has fixed energy unit which can be
* different than the unit from power unit MSR. See
@@ -846,21 +847,20 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
- struct rapl_model *rm;
int ret;
id = x86_match_cpu(rapl_model_match);
if (!id)
return -ENODEV;
- rm = (struct rapl_model *) id->driver_data;
+ rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rm->rapl_msrs;
+ rapl_msrs = rapl_model->rapl_msrs;
rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
- false, (void *) &rm->events);
+ false, (void *) &rapl_model->events);
- ret = rapl_check_hw_unit(rm);
+ ret = rapl_check_hw_unit();
if (ret)
return ret;
--
2.34.1
* [PATCH v4 05/11] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
This patch is in preparation for the addition of per-core energy
counter support for AMD CPUs.
The per-core energy counter PMU will need a separate cpumask. Adding
the cpumask inside the rapl_pmus struct is a better approach than
creating another global cpumask variable for the per-core PMU. This
way, if a new PMU with a different scope (e.g. CCD) is needed in the
future, no new global cpumask variable will be necessary.
No functional change.
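For context, cpumask_var_t allocation can fail and pairs with
free_cpumask_var(); a minimal usage sketch (not code from this patch):
        cpumask_var_t mask;

        if (!alloc_cpumask_var(&mask, GFP_KERNEL))
                return -ENOMEM;
        /* ... cpumask_set_cpu(cpu, mask), cpumask_test_cpu(cpu, mask) ... */
        free_cpumask_var(mask);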
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
Changes in v4:
* Use cpumask_var_t instead of cpumask_t (Peter)
---
arch/x86/events/rapl.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index d231a9c068af..e3d0a82e12b9 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -122,6 +122,7 @@ struct rapl_pmu {
struct rapl_pmus {
struct pmu pmu;
+ cpumask_var_t cpumask;
unsigned int nr_rapl_pmu;
struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};
@@ -142,7 +143,6 @@ struct rapl_model {
/* 1/2^hw_unit Joule */
static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
@@ -401,7 +401,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
static ssize_t rapl_get_attr_cpumask(struct device *dev,
struct device_attribute *attr, char *buf)
{
- return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
+ return cpumap_print_to_pagebuf(true, buf, rapl_pmus->cpumask);
}
static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -572,7 +572,7 @@ static int rapl_cpu_offline(unsigned int cpu)
int target;
/* Check if exiting cpu is used for collecting rapl events */
- if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
+ if (!cpumask_test_and_clear_cpu(cpu, rapl_pmus->cpumask))
return 0;
rapl_pmu->cpu = -1;
@@ -581,7 +581,7 @@ static int rapl_cpu_offline(unsigned int cpu)
/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
- cpumask_set_cpu(target, &rapl_cpu_mask);
+ cpumask_set_cpu(target, rapl_pmus->cpumask);
rapl_pmu->cpu = target;
perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
}
@@ -613,11 +613,11 @@ static int rapl_cpu_online(unsigned int cpu)
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
+ target = cpumask_any_and(rapl_pmus->cpumask, rapl_pmu_cpumask);
if (target < nr_cpu_ids)
return 0;
- cpumask_set_cpu(cpu, &rapl_cpu_mask);
+ cpumask_set_cpu(cpu, rapl_pmus->cpumask);
rapl_pmu->cpu = cpu;
return 0;
}
@@ -710,6 +710,9 @@ static int __init init_rapl_pmus(void)
rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
+
+ if (!alloc_cpumask_var(&rapl_pmus->cpumask, GFP_KERNEL))
+ return -ENOMEM;
rapl_pmus->nr_rapl_pmu = nr_rapl_pmu;
rapl_pmus->pmu.attr_groups = rapl_attr_groups;
--
2.34.1
* [PATCH v4 06/11] perf/x86/rapl: Add wrapper for online/offline functions
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
This is in preparation for the addition of per-core RAPL counter support
for AMD CPUs.
The CPU online and offline functions will need to handle the setup and
migration of the new per-core PMU as well. The wrapper functions added
below make it easier to pass the corresponding arguments for both PMUs.
No functional change.
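To illustrate where this is headed (a hypothetical sketch; rapl_pmus_core
and the per-core scope arguments only appear in patch 10), the online
path can then service both PMUs through the same helper:
static int rapl_cpu_online(unsigned int cpu)
{
        int ret;

        ret = __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
                                get_rapl_pmu_cpumask(cpu), cpu);
        if (ret)
                return ret;

        /* The SMT sibling mask gives the core-scope cpumask. */
        return __rapl_cpu_online(rapl_pmus_core, topology_logical_core_id(cpu),
                                 topology_sibling_cpumask(cpu), cpu);
}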
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e3d0a82e12b9..1d36565d0cb9 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -565,10 +565,10 @@ static struct perf_msr amd_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
};
-static int rapl_cpu_offline(unsigned int cpu)
+static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+ const struct cpumask *event_cpumask, unsigned int cpu)
{
- const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
int target;
/* Check if exiting cpu is used for collecting rapl events */
@@ -577,7 +577,7 @@ static int rapl_cpu_offline(unsigned int cpu)
rapl_pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
- target = cpumask_any_but(rapl_pmu_cpumask, cpu);
+ target = cpumask_any_but(event_cpumask, cpu);
/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
@@ -588,11 +588,16 @@ static int rapl_cpu_offline(unsigned int cpu)
return 0;
}
-static int rapl_cpu_online(unsigned int cpu)
+static int rapl_cpu_offline(unsigned int cpu)
{
- unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
- const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+ return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+ const struct cpumask *event_cpumask, unsigned int cpu)
+{
+ struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
int target;
if (!rapl_pmu) {
@@ -613,7 +618,7 @@ static int rapl_cpu_online(unsigned int cpu)
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(rapl_pmus->cpumask, rapl_pmu_cpumask);
+ target = cpumask_any_and(rapl_pmus->cpumask, event_cpumask);
if (target < nr_cpu_ids)
return 0;
@@ -622,6 +627,13 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;
}
+static int rapl_cpu_online(unsigned int cpu)
+{
+ return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+
static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
--
2.34.1
* [PATCH v4 07/11] perf/x86/rapl: Add an argument to the cleanup and init functions
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
Prep for the per-core RAPL PMU addition: pass the rapl_pmus pointer to
the cleanup and init functions instead of relying on the global, so the
same helpers can later serve both the package-scope and per-core PMUs.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 1d36565d0cb9..362e82284ccb 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -694,7 +694,7 @@ static void __init rapl_advertise(void)
}
}
-static void cleanup_rapl_pmus(void)
+static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
{
int i;
@@ -712,9 +712,10 @@ static const struct attribute_group *rapl_attr_update[] = {
NULL,
};
-static int __init init_rapl_pmus(void)
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
{
int nr_rapl_pmu = topology_max_packages();
+ struct rapl_pmus *rapl_pmus;
if (!rapl_pmu_is_pkg_scope())
nr_rapl_pmu *= topology_max_dies_per_package();
@@ -738,6 +739,9 @@ static int __init init_rapl_pmus(void)
rapl_pmus->pmu.read = rapl_pmu_event_read;
rapl_pmus->pmu.module = THIS_MODULE;
rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
+
+ *rapl_pmus_ptr = rapl_pmus;
+
return 0;
}
@@ -879,7 +883,7 @@ static int __init rapl_pmu_init(void)
if (ret)
return ret;
- ret = init_rapl_pmus();
+ ret = init_rapl_pmus(&rapl_pmus);
if (ret)
return ret;
@@ -903,7 +907,7 @@ static int __init rapl_pmu_init(void)
cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
return ret;
}
module_init(rapl_pmu_init);
@@ -912,6 +916,6 @@ static void __exit intel_rapl_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
perf_pmu_unregister(&rapl_pmus->pmu);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
}
module_exit(intel_rapl_exit);
--
2.34.1
* [PATCH v4 08/11] perf/x86/rapl: Modify the generic variable names to *_pkg*
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
Prep for the addition of the power_per_core PMU, which handles
core-scope energy consumption on AMD CPUs.
Replace the generic variable names with *_pkg* to differentiate between
the scopes of the two different PMUs and their variables.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
Changes in v4:
* Move some renaming code from patch 10 to here (Rui)
---
arch/x86/events/rapl.c | 113 +++++++++++++++++++++--------------------
1 file changed, 57 insertions(+), 56 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 362e82284ccb..9beded3d4a14 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -70,18 +70,18 @@ MODULE_LICENSE("GPL");
/*
* RAPL energy status counters
*/
-enum perf_rapl_events {
+enum perf_rapl_pkg_events {
PERF_RAPL_PP0 = 0, /* all cores */
PERF_RAPL_PKG, /* entire package */
PERF_RAPL_RAM, /* DRAM */
PERF_RAPL_PP1, /* gpu */
PERF_RAPL_PSYS, /* psys */
- PERF_RAPL_MAX,
- NR_RAPL_DOMAINS = PERF_RAPL_MAX,
+ PERF_RAPL_PKG_EVENTS_MAX,
+ NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
};
-static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
+static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
@@ -134,16 +134,16 @@ enum rapl_unit_quirk {
};
struct rapl_model {
- struct perf_msr *rapl_msrs;
- unsigned long events;
+ struct perf_msr *rapl_pkg_msrs;
+ unsigned long pkg_events;
unsigned int msr_power_unit;
enum rapl_unit_quirk unit_quirk;
};
/* 1/2^hw_unit Joule */
-static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
-static struct rapl_pmus *rapl_pmus;
-static unsigned int rapl_cntr_mask;
+static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
+static struct rapl_pmus *rapl_pmus_pkg;
+static unsigned int rapl_pkg_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
static struct rapl_model *rapl_model;
@@ -172,7 +172,8 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
* The unsigned check also catches the '-1' return value for non
* existent mappings in the topology map.
*/
- return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
+ return rapl_pmu_idx < rapl_pmus_pkg->nr_rapl_pmu ?
+ rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx] : NULL;
}
static inline u64 rapl_read_counter(struct perf_event *event)
@@ -184,7 +185,7 @@ static inline u64 rapl_read_counter(struct perf_event *event)
static inline u64 rapl_scale(u64 v, int cfg)
{
- if (cfg > NR_RAPL_DOMAINS) {
+ if (cfg > NR_RAPL_PKG_DOMAINS) {
pr_warn("Invalid domain %d, failed to scale data\n", cfg);
return v;
}
@@ -194,7 +195,7 @@ static inline u64 rapl_scale(u64 v, int cfg)
* or use ldexp(count, -32).
* Watts = Joules/Time delta
*/
- return v << (32 - rapl_hw_unit[cfg - 1]);
+ return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
}
static u64 rapl_event_update(struct perf_event *event)
@@ -354,7 +355,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
struct rapl_pmu *rapl_pmu;
/* only look at RAPL events */
- if (event->attr.type != rapl_pmus->pmu.type)
+ if (event->attr.type != rapl_pmus_pkg->pmu.type)
return -ENOENT;
/* check only supported bits are set */
@@ -366,14 +367,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
- if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
+ if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
return -EINVAL;
- cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
+ cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
bit = cfg - 1;
/* check event supported */
- if (!(rapl_cntr_mask & (1 << bit)))
+ if (!(rapl_pkg_cntr_mask & (1 << bit)))
return -EINVAL;
/* unsupported modes and filters */
@@ -401,7 +402,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
static ssize_t rapl_get_attr_cpumask(struct device *dev,
struct device_attribute *attr, char *buf)
{
- return cpumap_print_to_pagebuf(true, buf, rapl_pmus->cpumask);
+ return cpumap_print_to_pagebuf(true, buf, rapl_pmus_pkg->cpumask);
}
static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -553,11 +554,11 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
};
/*
- * Force to PERF_RAPL_MAX size due to:
- * - perf_msr_probe(PERF_RAPL_MAX)
+ * Force to PERF_RAPL_PKG_EVENTS_MAX size due to:
+ * - perf_msr_probe(PERF_RAPL_PKG_EVENTS_MAX)
* - want to use same event codes across both architectures
*/
-static struct perf_msr amd_rapl_msrs[] = {
+static struct perf_msr amd_rapl_pkg_msrs[] = {
[PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, NULL, false, 0 },
[PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, NULL, false, 0 },
@@ -590,7 +591,7 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu
static int rapl_cpu_offline(unsigned int cpu)
{
- return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+ return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
get_rapl_pmu_cpumask(cpu), cpu);
}
@@ -629,7 +630,7 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
static int rapl_cpu_online(unsigned int cpu)
{
- return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+ return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
get_rapl_pmu_cpumask(cpu), cpu);
}
@@ -642,8 +643,8 @@ static int rapl_check_hw_unit(void)
/* protect rdmsrl() to handle virtualization */
if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
return -1;
- for (i = 0; i < NR_RAPL_DOMAINS; i++)
- rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+ for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
+ rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
switch (rapl_model->unit_quirk) {
/*
@@ -653,11 +654,11 @@ static int rapl_check_hw_unit(void)
* of 2. Datasheet, September 2014, Reference Number: 330784-001 "
*/
case RAPL_UNIT_QUIRK_INTEL_HSW:
- rapl_hw_unit[PERF_RAPL_RAM] = 16;
+ rapl_pkg_hw_unit[PERF_RAPL_RAM] = 16;
break;
/* SPR uses a fixed energy unit for Psys domain. */
case RAPL_UNIT_QUIRK_INTEL_SPR:
- rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+ rapl_pkg_hw_unit[PERF_RAPL_PSYS] = 0;
break;
default:
break;
@@ -672,9 +673,9 @@ static int rapl_check_hw_unit(void)
* if hw unit is 32, then we use 2 ms 1/200/2
*/
rapl_timer_ms = 2;
- if (rapl_hw_unit[0] < 32) {
+ if (rapl_pkg_hw_unit[0] < 32) {
rapl_timer_ms = (1000 / (2 * 100));
- rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
+ rapl_timer_ms *= (1ULL << (32 - rapl_pkg_hw_unit[0] - 1));
}
return 0;
}
@@ -684,12 +685,12 @@ static void __init rapl_advertise(void)
int i;
pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
- hweight32(rapl_cntr_mask), rapl_timer_ms);
+ hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
- for (i = 0; i < NR_RAPL_DOMAINS; i++) {
- if (rapl_cntr_mask & (1 << i)) {
+ for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
+ if (rapl_pkg_cntr_mask & (1 << i)) {
pr_info("hw unit of domain %s 2^-%d Joules\n",
- rapl_domain_names[i], rapl_hw_unit[i]);
+ rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
}
}
}
@@ -746,71 +747,71 @@ static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
}
static struct rapl_model model_snb = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_PP1),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_snbep = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_hsw = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PP1),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_hsx = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_knl = {
- .events = BIT(PERF_RAPL_PKG) |
+ .pkg_events = BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_skl = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PP1) |
BIT(PERF_RAPL_PSYS),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_spr = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PSYS),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_spr_msrs,
+ .rapl_pkg_msrs = intel_rapl_spr_msrs,
};
static struct rapl_model model_amd_hygon = {
- .events = BIT(PERF_RAPL_PKG),
+ .pkg_events = BIT(PERF_RAPL_PKG),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
- .rapl_msrs = amd_rapl_msrs,
+ .rapl_pkg_msrs = amd_rapl_pkg_msrs,
};
static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -874,16 +875,16 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rapl_model->rapl_msrs;
+ rapl_msrs = rapl_model->rapl_pkg_msrs;
- rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
- false, (void *) &rapl_model->events);
+ rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+ false, (void *) &rapl_model->pkg_events);
ret = rapl_check_hw_unit();
if (ret)
return ret;
- ret = init_rapl_pmus(&rapl_pmus);
+ ret = init_rapl_pmus(&rapl_pmus_pkg);
if (ret)
return ret;
@@ -896,7 +897,7 @@ static int __init rapl_pmu_init(void)
if (ret)
goto out;
- ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
+ ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
if (ret)
goto out1;
@@ -907,7 +908,7 @@ static int __init rapl_pmu_init(void)
cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
- cleanup_rapl_pmus(rapl_pmus);
+ cleanup_rapl_pmus(rapl_pmus_pkg);
return ret;
}
module_init(rapl_pmu_init);
@@ -915,7 +916,7 @@ module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
- perf_pmu_unregister(&rapl_pmus->pmu);
- cleanup_rapl_pmus(rapl_pmus);
+ perf_pmu_unregister(&rapl_pmus_pkg->pmu);
+ cleanup_rapl_pmus(rapl_pmus_pkg);
}
module_exit(intel_rapl_exit);
--
2.34.1
* [PATCH v4 09/11] perf/x86/rapl: Remove the global variable rapl_msrs
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
After making the rapl_model struct global, the rapl_msrs global
variable isn't needed, so remove it.
This is also cleaner for when the new per-core scope PMU is added: two
rapl_msrs arrays (one for the per-core scope PMU and one for the
package scope PMU) will then be maintained inside the rapl_model struct.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 9beded3d4a14..7691497042e9 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -145,7 +145,6 @@ static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus_pkg;
static unsigned int rapl_pkg_cntr_mask;
static u64 rapl_timer_ms;
-static struct perf_msr *rapl_msrs;
static struct rapl_model *rapl_model;
/*
@@ -387,7 +386,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
event->cpu = rapl_pmu->cpu;
event->pmu_private = rapl_pmu;
- event->hw.event_base = rapl_msrs[bit].msr;
+ event->hw.event_base = rapl_model->rapl_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -875,9 +874,7 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rapl_model->rapl_pkg_msrs;
-
- rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+ rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
false, (void *) &rapl_model->pkg_events);
ret = rapl_check_hw_unit();
--
2.34.1
* [PATCH v4 10/11] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC
Add a new "power_per_core" PMU and "energy-per-core" event for
monitoring energy consumption by each core. The existing energy-cores
event aggregates the energy consumption at the package level.
This new event aligns with the AMD's per_core energy counters.
Tested the package level and core level PMU counters with workloads
pinned to different CPUs.
Results with workload pinned to CPU 1 in core 1 on a AMD Zen4 Genoa
machine:
$ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
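Assuming the new PMU exposes the same cpumask attribute as the package
PMU, its collecting CPUs (expected to be one per core) can be inspected
with:
$ cat /sys/devices/power_per_core/cpumask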
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
Changes in v4:
* Reorder the cleanup order of per-core and per-pkg PMU (Rui)
* Add rapl_core_hw_unit variable to store the per-core PMU unit (Rui)
* Use the newly added hw_unit in rapl_scale() and rapl_advertise()
functions
* Add a new argument in rapl_scale() to differentiate between per-core
and per-pkg events.
---
arch/x86/events/rapl.c | 190 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 171 insertions(+), 19 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7691497042e9..91eb9850df9b 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -39,6 +39,10 @@
* event: rapl_energy_psys
* perf code: 0x5
*
+ * per_core counter: consumption of a single physical core
+ * event: rapl_energy_per_core (power_per_core PMU)
+ * perf code: 0x1
+ *
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
@@ -81,6 +85,13 @@ enum perf_rapl_pkg_events {
NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
};
+enum perf_rapl_core_events {
+ PERF_RAPL_PER_CORE = 0, /* per-core */
+
+ PERF_RAPL_CORE_EVENTS_MAX,
+ NR_RAPL_CORE_DOMAINS = PERF_RAPL_CORE_EVENTS_MAX,
+};
+
static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
"pp0-core",
"package",
@@ -89,6 +100,10 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
"psys",
};
+static const char *const rapl_core_domain_names[NR_RAPL_CORE_DOMAINS] __initconst = {
+ "per-core",
+};
+
/*
* event code: LSB 8 bits, passed in attr->config
* any other bit is reserved
@@ -135,15 +150,20 @@ enum rapl_unit_quirk {
struct rapl_model {
struct perf_msr *rapl_pkg_msrs;
+ struct perf_msr *rapl_core_msrs;
unsigned long pkg_events;
+ unsigned long core_events;
unsigned int msr_power_unit;
enum rapl_unit_quirk unit_quirk;
};
/* 1/2^hw_unit Joule */
static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
+static int rapl_core_hw_unit __read_mostly;
static struct rapl_pmus *rapl_pmus_pkg;
+static struct rapl_pmus *rapl_pmus_core;
static unsigned int rapl_pkg_cntr_mask;
+static unsigned int rapl_core_cntr_mask;
static u64 rapl_timer_ms;
static struct rapl_model *rapl_model;
@@ -182,19 +202,25 @@ static inline u64 rapl_read_counter(struct perf_event *event)
return raw;
}
-static inline u64 rapl_scale(u64 v, int cfg)
+static inline u64 rapl_scale(u64 v, int cfg, bool per_core_event)
{
- if (cfg > NR_RAPL_PKG_DOMAINS) {
+ if ((!per_core_event && cfg > NR_RAPL_PKG_DOMAINS) ||
+ (per_core_event && cfg > NR_RAPL_CORE_DOMAINS)) {
pr_warn("Invalid domain %d, failed to scale data\n", cfg);
return v;
}
+ int hw_unit = rapl_pkg_hw_unit[cfg - 1];
+
+ if (per_core_event)
+ hw_unit = rapl_core_hw_unit;
+
/*
* scale delta to smallest unit (1/2^32)
* users must then scale back: count * 1/(1e9*2^32) to get Joules
* or use ldexp(count, -32).
* Watts = Joules/Time delta
*/
- return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
+ return v << (32 - hw_unit);
}
static u64 rapl_event_update(struct perf_event *event)
@@ -203,6 +229,7 @@ static u64 rapl_event_update(struct perf_event *event)
u64 prev_raw_count, new_raw_count;
s64 delta, sdelta;
int shift = RAPL_CNTR_WIDTH;
+ struct rapl_pmus *curr_rapl_pmus;
prev_raw_count = local64_read(&hwc->prev_count);
do {
@@ -221,7 +248,12 @@ static u64 rapl_event_update(struct perf_event *event)
delta = (new_raw_count << shift) - (prev_raw_count << shift);
delta >>= shift;
- sdelta = rapl_scale(delta, event->hw.config);
+ curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
+
+ if (curr_rapl_pmus == rapl_pmus_core)
+ sdelta = rapl_scale(delta, event->hw.config, true);
+ else
+ sdelta = rapl_scale(delta, event->hw.config, false);
local64_add(sdelta, &event->count);
@@ -352,9 +384,13 @@ static int rapl_pmu_event_init(struct perf_event *event)
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
struct rapl_pmu *rapl_pmu;
+ struct rapl_pmus *curr_rapl_pmus;
/* only look at RAPL events */
- if (event->attr.type != rapl_pmus_pkg->pmu.type)
+ if (event->attr.type == rapl_pmus_pkg->pmu.type ||
+ (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
+ curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
+ else
return -ENOENT;
/* check only supported bits are set */
@@ -364,7 +400,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;
- event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
+ if (curr_rapl_pmus == rapl_pmus_pkg)
+ event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
return -EINVAL;
@@ -373,7 +410,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
bit = cfg - 1;
/* check event supported */
- if (!(rapl_pkg_cntr_mask & (1 << bit)))
+ if (!(rapl_pkg_cntr_mask & (1 << bit)) &&
+ !(rapl_core_cntr_mask & (1 << bit)))
return -EINVAL;
/* unsupported modes and filters */
@@ -381,12 +419,18 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
/* must be done before validate_group */
- rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+ if (curr_rapl_pmus == rapl_pmus_core) {
+ rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_logical_core_id(event->cpu)];
+ event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
+ } else {
+ rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
+ event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
+ }
+
if (!rapl_pmu)
return -EINVAL;
event->cpu = rapl_pmu->cpu;
event->pmu_private = rapl_pmu;
- event->hw.event_base = rapl_model->rapl_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -415,17 +459,38 @@ static struct attribute_group rapl_pmu_attr_group = {
.attrs = rapl_pmu_attrs,
};
+static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, rapl_pmus_core->cpumask);
+}
+
+static struct device_attribute dev_attr_per_core_cpumask = __ATTR(cpumask, 0444,
+ rapl_get_attr_per_core_cpumask,
+ NULL);
+
+static struct attribute *rapl_pmu_per_core_attrs[] = {
+ &dev_attr_per_core_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group rapl_pmu_per_core_attr_group = {
+ .attrs = rapl_pmu_per_core_attrs,
+};
+
RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
+RAPL_EVENT_ATTR_STR(energy-per-core, rapl_per_core, "event=0x01");
RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
+RAPL_EVENT_ATTR_STR(energy-per-core.unit, rapl_per_core_unit, "Joules");
/*
* we compute in 0.23 nJ increments regardless of MSR
@@ -435,6 +500,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890
RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
+RAPL_EVENT_ATTR_STR(energy-per-core.scale, rapl_per_core_scale, "2.3283064365386962890625e-10");
/*
* There are no default events, but we need to create
@@ -468,6 +534,13 @@ static const struct attribute_group *rapl_attr_groups[] = {
NULL,
};
+static const struct attribute_group *rapl_per_core_attr_groups[] = {
+ &rapl_pmu_per_core_attr_group,
+ &rapl_pmu_format_group,
+ &rapl_pmu_events_group,
+ NULL,
+};
+
static struct attribute *rapl_events_cores[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_cores_unit),
@@ -528,6 +601,18 @@ static struct attribute_group rapl_events_psys_group = {
.attrs = rapl_events_psys,
};
+static struct attribute *rapl_events_per_core[] = {
+ EVENT_PTR(rapl_per_core),
+ EVENT_PTR(rapl_per_core_unit),
+ EVENT_PTR(rapl_per_core_scale),
+ NULL,
+};
+
+static struct attribute_group rapl_events_per_core_group = {
+ .name = "events",
+ .attrs = rapl_events_per_core,
+};
+
static bool test_msr(int idx, void *data)
{
return test_bit(idx, (unsigned long *) data);
@@ -565,6 +650,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
};
+static struct perf_msr amd_rapl_core_msrs[] = {
+ [PERF_RAPL_PER_CORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_per_core_group,
+ test_msr, false, RAPL_MSR_MASK },
+};
+
static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
const struct cpumask *event_cpumask, unsigned int cpu)
{
@@ -590,8 +680,14 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu
static int rapl_cpu_offline(unsigned int cpu)
{
- return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
+ int ret = __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
get_rapl_pmu_cpumask(cpu), cpu);
+
+ if (ret == 0 && rapl_model->core_events)
+ ret = __rapl_cpu_offline(rapl_pmus_core, topology_logical_core_id(cpu),
+ topology_sibling_cpumask(cpu), cpu);
+
+ return ret;
}
static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
@@ -629,8 +725,14 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
static int rapl_cpu_online(unsigned int cpu)
{
- return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
+ int ret = __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
get_rapl_pmu_cpumask(cpu), cpu);
+
+ if (ret == 0 && rapl_model->core_events)
+ ret = __rapl_cpu_online(rapl_pmus_core, topology_logical_core_id(cpu),
+ topology_sibling_cpumask(cpu), cpu);
+
+ return ret;
}
@@ -645,6 +747,8 @@ static int rapl_check_hw_unit(void)
for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+ rapl_core_hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+
switch (rapl_model->unit_quirk) {
/*
* DRAM domain on HSW server and KNL has fixed energy unit which can be
@@ -684,7 +788,7 @@ static void __init rapl_advertise(void)
int i;
pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
- hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
+ hweight32(rapl_pkg_cntr_mask) + hweight32(rapl_core_cntr_mask), rapl_timer_ms);
for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
if (rapl_pkg_cntr_mask & (1 << i)) {
@@ -692,6 +796,10 @@ static void __init rapl_advertise(void)
rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
}
}
+
+ if (rapl_core_cntr_mask & (1 << PERF_RAPL_PER_CORE))
+ pr_info("hw unit of domain %s 2^-%d Joules\n",
+ rapl_core_domain_names[PERF_RAPL_PER_CORE], rapl_core_hw_unit);
}
static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
@@ -712,14 +820,16 @@ static const struct attribute_group *rapl_attr_update[] = {
NULL,
};
-static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
+static const struct attribute_group *rapl_per_core_attr_update[] = {
+ &rapl_events_per_core_group,
+ NULL,
+};
+
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int nr_rapl_pmu,
+ const struct attribute_group **rapl_attr_groups,
+ const struct attribute_group **rapl_attr_update)
{
- int nr_rapl_pmu = topology_max_packages();
struct rapl_pmus *rapl_pmus;
- if (!rapl_pmu_is_pkg_scope())
- nr_rapl_pmu *= topology_max_dies_per_package();
-
rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
@@ -809,8 +919,10 @@ static struct rapl_model model_spr = {
static struct rapl_model model_amd_hygon = {
.pkg_events = BIT(PERF_RAPL_PKG),
+ .core_events = BIT(PERF_RAPL_PER_CORE),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_pkg_msrs = amd_rapl_pkg_msrs,
+ .rapl_core_msrs = amd_rapl_core_msrs,
};
static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -867,6 +979,11 @@ static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
int ret;
+ int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+ int nr_cores = topology_max_packages() * topology_num_cores_per_package();
+
+ if (rapl_pmu_is_pkg_scope())
+ nr_rapl_pmu = topology_max_packages();
id = x86_match_cpu(rapl_model_match);
if (!id)
@@ -874,17 +991,34 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+ rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
false, (void *) &rapl_model->pkg_events);
ret = rapl_check_hw_unit();
if (ret)
return ret;
- ret = init_rapl_pmus(&rapl_pmus_pkg);
+ ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu, rapl_attr_groups, rapl_attr_update);
if (ret)
return ret;
+ if (rapl_model->core_events) {
+ rapl_core_cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
+ PERF_RAPL_CORE_EVENTS_MAX, false,
+ (void *) &rapl_model->core_events);
+
+ ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
+ rapl_per_core_attr_groups, rapl_per_core_attr_update);
+ if (ret) {
+ /*
+ * If initialization of per_core PMU fails, reset per_core
+ * flag, and continue with power PMU initialization.
+ */
+ pr_warn("Per-core PMU initialization failed (%d)\n", ret);
+ rapl_model->core_events = 0UL;
+ }
+ }
+
/*
* Install callbacks. Core will call them for each online cpu.
*/
@@ -898,6 +1032,20 @@ static int __init rapl_pmu_init(void)
if (ret)
goto out1;
+ if (rapl_model->core_events) {
+ ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_per_core", -1);
+ if (ret) {
+ /*
+ * If registration of per_core PMU fails, cleanup per_core PMU
+ * variables, reset the per_core flag and keep the
+ * power PMU untouched.
+ */
+ pr_warn("Per-core PMU registration failed (%d)\n", ret);
+ cleanup_rapl_pmus(rapl_pmus_core);
+ rapl_model->core_events = 0UL;
+ }
+ }
+
rapl_advertise();
return 0;
@@ -915,5 +1063,9 @@ static void __exit intel_rapl_exit(void)
cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
+ if (rapl_model->core_events) {
+ perf_pmu_unregister(&rapl_pmus_core->pmu);
+ cleanup_rapl_pmus(rapl_pmus_core);
+ }
perf_pmu_unregister(&rapl_pmus_pkg->pmu);
cleanup_rapl_pmus(rapl_pmus_pkg);
}
module_exit(intel_rapl_exit);
--
2.34.1
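As a side note on the scaling comment in rapl_scale() above, here is a
minimal user-space sketch (not part of the patch) of reading the new
event directly via perf_event_open(2) and converting counts to Joules
with ldexp(count, -32). The sysfs path assumes the PMU is registered as
"power_per_core", as done later in this patch; error handling is
trimmed for brevity.

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    int main(void)
    {
            struct perf_event_attr attr;
            unsigned int type;
            uint64_t count;
            FILE *f;
            int fd;

            /* the PMU type id is assigned dynamically; read it from sysfs */
            f = fopen("/sys/bus/event_source/devices/power_per_core/type", "r");
            if (!f || fscanf(f, "%u", &type) != 1)
                    return 1;
            fclose(f);

            memset(&attr, 0, sizeof(attr));
            attr.type = type;
            attr.size = sizeof(attr);
            attr.config = 0x1;      /* energy-per-core */

            /* system-wide counter on CPU 1 (pid == -1, cpu == 1) */
            fd = syscall(SYS_perf_event_open, &attr, -1, 1, -1, 0);
            if (fd < 0)
                    return 1;
            sleep(1);
            if (read(fd, &count, sizeof(count)) != sizeof(count))
                    return 1;

            /* counts are in 2^-32 Joule units, matching the .scale attribute */
            printf("core of CPU1: %.6f Joules over 1s\n", ldexp((double)count, -32));
            return 0;
    }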
* [PATCH v4 11/11] perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
2024-07-11 10:24 [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
` (9 preceding siblings ...)
2024-07-11 10:24 ` [PATCH v4 10/11] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-07-11 10:24 ` Dhananjay Ugwekar
2024-07-11 22:23 ` [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Ian Rogers
11 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-11 10:24 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, kees, gustavoars, rui.zhang, oleksandr
Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
linux-pm, Dhananjay.Ugwekar
After the addition of per-core counter support, this function no longer
works, as the same CPU can now be mapped to multiple rapl_pmu structs
(one package-scope and one core-scope). So, remove it.
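For reference, a two-line sketch (illustration only, using this
patchset's helpers) of why a single lookup is now ambiguous:

    /* the same CPU resolves to two different rapl_pmu structs */
    struct rapl_pmu *pkg_pmu  = rapl_pmus_pkg->rapl_pmu[get_rapl_pmu_idx(cpu)];
    struct rapl_pmu *core_pmu = rapl_pmus_core->rapl_pmu[topology_logical_core_id(cpu)];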
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
arch/x86/events/rapl.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 91eb9850df9b..6edbf798a1a0 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -183,18 +183,6 @@ static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
topology_die_cpumask(cpu);
}
-static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
-{
- unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
-
- /*
- * The unsigned check also catches the '-1' return value for non
- * existent mappings in the topology map.
- */
- return rapl_pmu_idx < rapl_pmus_pkg->nr_rapl_pmu ?
- rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx] : NULL;
-}
-
static inline u64 rapl_read_counter(struct perf_event *event)
{
u64 raw;
--
2.34.1
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-11 10:24 [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
` (10 preceding siblings ...)
2024-07-11 10:24 ` [PATCH v4 11/11] perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu Dhananjay Ugwekar
@ 2024-07-11 22:23 ` Ian Rogers
2024-07-15 9:35 ` Dhananjay Ugwekar
11 siblings, 1 reply; 31+ messages in thread
From: Ian Rogers @ 2024-07-11 22:23 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Currently the energy-cores event in the power PMU aggregates energy
> consumption data at a package level. On the other hand the core energy
> RAPL counter in AMD CPUs has a core scope (which means the energy
> consumption is recorded separately for each core). Earlier efforts to add
> the core event in the power PMU had failed [1], due to the difference in
> the scope of these two events. Hence, there is a need for a new core scope
> PMU.
>
> This patchset adds a new "power_per_core" PMU alongside the existing
> "power" PMU, which will be responsible for collecting the new
> "energy-per-core" event.
Sorry for being naive, but is the cpumask the only reason for adding
the new PMU? Perhaps we could add per-event cpumasks instead, say
`/sys/devices/power/events/energy-per-core.cpumask`, which in this case
would contain the CPUs of the different cores. Supporting CPU hotplug
would be an issue there. Adding the tool support for this wouldn't be
hard and it may be less messy (although old perf tools on new kernels
wouldn't know about these files).
Thanks,
Ian
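For illustration, the tool side of this suggestion could be as simple
as the sketch below. The per-event path and the ".cpumask" suffix are
hypothetical (taken from the suggestion above, not an existing
interface); the fallback path is the PMU-wide cpumask that already
exists today.

    #include <stdio.h>

    /* prefer a (hypothetical) per-event cpumask, else the PMU-wide one */
    static int read_event_cpumask(char *buf, int len)
    {
            FILE *f = fopen("/sys/devices/power/events/energy-per-core.cpumask", "r");

            if (!f)     /* old kernel: fall back to the existing PMU cpumask */
                    f = fopen("/sys/devices/power/cpumask", "r");
            if (!f)
                    return -1;
            if (!fgets(buf, len, f)) {
                    fclose(f);
                    return -1;
            }
            fclose(f);
            return 0;
    }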
> Tested the package level and core level PMU counters with workloads
> pinned to different CPUs.
>
> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> machine:
>
> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>
> Performance counter stats for 'system wide':
>
> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
>
> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>
> This patchset applies cleanly on top of v6.10-rc7 as well as latest
> tip/master.
>
> v4 changes:
> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
> * Add Rui's rb tag for patch 1
> * Invert the pmu scope check logic in patch 2 (Peter)
> * Add comments explaining the scope check in patch 2 (Peter)
> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
> * Move renaming code to patch 8 (Rui)
> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
> 10 (Rui)
>
> PS: Scope check logic is still kept the same (i.e., all Intel systems being
> considered as die scope), Rui will be modifying it to limit the die-scope
> only to Cascadelake-AP in a future patch on top of this patchset.
>
> v3 changes:
> * Patch 1 added to introduce the logical_core_id which is unique across
> the system (Prateek)
> * Use the unique topology_logical_core_id() instead of
> topology_core_id() (which is only unique within a package on tested
> AMD and Intel systems) in Patch 10
>
> v2 changes:
> * Patches 6,7,8 added to split some changes out of the last patch
> * Use container_of to get the rapl_pmus from event variable (Rui)
> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
> * Use event id 0x1 for energy-per-core event (Rui)
> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
> per-core counter hw support (Rui)
>
> Dhananjay Ugwekar (10):
> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> perf/x86/rapl: Rename rapl_pmu variables
> perf/x86/rapl: Make rapl_model struct global
> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> perf/x86/rapl: Add wrapper for online/offline functions
> perf/x86/rapl: Add an argument to the cleanup and init functions
> perf/x86/rapl: Modify the generic variable names to *_pkg*
> perf/x86/rapl: Remove the global variable rapl_msrs
> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
> perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>
> K Prateek Nayak (1):
> x86/topology: Introduce topology_logical_core_id()
>
> Documentation/arch/x86/topology.rst | 4 +
> arch/x86/events/rapl.c | 454 ++++++++++++++++++--------
> arch/x86/include/asm/processor.h | 1 +
> arch/x86/include/asm/topology.h | 1 +
> arch/x86/kernel/cpu/debugfs.c | 1 +
> arch/x86/kernel/cpu/topology_common.c | 1 +
> 6 files changed, 328 insertions(+), 134 deletions(-)
>
> --
> 2.34.1
>
* Re: [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-07-11 10:24 ` [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
@ 2024-07-12 2:04 ` Zhang, Rui
2024-07-12 3:28 ` Dhananjay Ugwekar
2024-07-16 8:56 ` Dhananjay Ugwekar
1 sibling, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 2:04 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
> leaf"), on AMD processors that support extended CPUID leaf 0x80000026,
> the topology_die_cpumask() and topology_logical_die_id() macros no
> longer return the package cpumask and package id; instead they return
> the CCD (Core Complex Die) mask and id respectively. This leads to the
> energy-pkg event scope being modified to CCD instead of package.
>
> Replacing these macros with their package counterparts fixes the
> energy-pkg event for AMD CPUs.
>
> However, due to the difference between the scope of the energy-pkg
> event for Intel and AMD CPUs, we have to replace these macros
> conditionally, only for AMD CPUs.
>
> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>
> Before:
> $ cat /sys/devices/power/cpumask
> 0,8,16,24,32,40,48,56,64,72,80,88.
>
> The expected cpumask here is supposed to be just "0": as it is a
> package scope event, only one CPU will be collecting the event for all
> the CPUs in the package.
>
> After:
> $ cat /sys/devices/power/cpumask
> 0
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
> 0x80000026 leaf")
You still missed my Reviewed-by tag for this one. :)
https://lore.kernel.org/all/e1f70a09f85dbd0ee3f32dffea37993e141269d0.camel@intel.com/
thanks,
rui
> ---
> Changes in v4:
> * Invert the pkg scope check in init_rapl_pmus() (Peter)
> * Add comments to explain the pkg scope check (Peter)
>
> PS: Scope check logic is still kept the same (i.e., all Intel systems
> being considered as die scope); Rui will be modifying it to limit the
> die scope only to Cascadelake-AP in a future patch on top of this
> patchset.
> ---
> arch/x86/events/rapl.c | 39 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 0c5e7a7c43ac..df71f38ad98d 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,13 @@ static struct perf_pmu_events_attr event_attr_##v = { \
> .event_str = str, \
> };
>
> +/*
> + * RAPL PMU scope for AMD is package whereas for Intel it is die.
> + */
> +#define rapl_pmu_is_pkg_scope() \
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
> struct rapl_pmu {
> raw_spinlock_t lock;
> int n_active;
> @@ -140,9 +147,25 @@ static unsigned int rapl_cntr_mask;
> static u64 rapl_timer_ms;
> static struct perf_msr *rapl_msrs;
>
> +/*
> + * Helper functions to get the correct topology macros according to the
> + * RAPL PMU scope.
> + */
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> + topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> + topology_die_cpumask(cpu);
> +}
> +
> static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> {
> - unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>
> /*
> * The unsigned check also catches the '-1' return value for non
> @@ -543,6 +566,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>
> static int rapl_cpu_offline(unsigned int cpu)
> {
> + const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> @@ -552,7 +576,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> pmu->cpu = -1;
> /* Find a new cpu to collect rapl events */
> - target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
> + target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>
> /* Migrate rapl events to the new target */
> if (target < nr_cpu_ids) {
> @@ -565,6 +589,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> static int rapl_cpu_online(unsigned int cpu)
> {
> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> + const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> @@ -579,14 +605,14 @@ static int rapl_cpu_online(unsigned int cpu)
> pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> rapl_hrtimer_init(pmu);
>
> - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> + rapl_pmus->pmus[rapl_pmu_idx] = pmu;
> }
>
> /*
> * Check if there is an online cpu in the package which collects rapl
> * events already.
> */
> - target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
> + target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
> if (target < nr_cpu_ids)
> return 0;
>
> @@ -675,7 +701,10 @@ static const struct attribute_group *rapl_attr_update[] = {
>
> static int __init init_rapl_pmus(void)
> {
> - int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
> + int nr_rapl_pmu = topology_max_packages();
> +
> + if (!rapl_pmu_is_pkg_scope())
> + nr_rapl_pmu *= topology_max_dies_per_package();
>
> rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
* Re: [PATCH v4 03/11] perf/x86/rapl: Rename rapl_pmu variables
2024-07-11 10:24 ` [PATCH v4 03/11] perf/x86/rapl: Rename rapl_pmu variables Dhananjay Ugwekar
@ 2024-07-12 2:14 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 2:14 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> Rename struct rapl_pmu variables from "pmu" to "rapl_pmu" to avoid
> any confusion between the variables of the two different structs pmu
> and rapl_pmu. rapl_pmu also contains a pointer to struct pmu, which
> leads to situations in code like pmu->pmu, which is needlessly
> confusing. The above scenario is replaced with the much more readable
> rapl_pmu->pmu with this change.
>
> Also rename the "pmus" member in the rapl_pmus struct, for the same
> reason.
>
> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> arch/x86/events/rapl.c | 104 ++++++++++++++++++++---------------------
> 1 file changed, 52 insertions(+), 52 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index df71f38ad98d..e6162bd84f23 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -123,7 +123,7 @@ struct rapl_pmu {
> struct rapl_pmus {
> struct pmu pmu;
> unsigned int nr_rapl_pmu;
> - struct rapl_pmu *pmus[] __counted_by(nr_rapl_pmu);
> + struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
> };
>
> enum rapl_unit_quirk {
> @@ -171,7 +171,7 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> * The unsigned check also catches the '-1' return value for non
> * existent mappings in the topology map.
> */
> - return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;
> + return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
> }
>
> static inline u64 rapl_read_counter(struct perf_event *event)
> @@ -235,34 +235,34 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)
>
> static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
> {
> - struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
> + struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
> struct perf_event *event;
> unsigned long flags;
>
> - if (!pmu->n_active)
> + if (!rapl_pmu->n_active)
> return HRTIMER_NORESTART;
>
> - raw_spin_lock_irqsave(&pmu->lock, flags);
> + raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>
> - list_for_each_entry(event, &pmu->active_list, active_entry)
> + list_for_each_entry(event, &rapl_pmu->active_list, active_entry)
> rapl_event_update(event);
>
> - raw_spin_unlock_irqrestore(&pmu->lock, flags);
> + raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>
> - hrtimer_forward_now(hrtimer, pmu->timer_interval);
> + hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);
>
> return HRTIMER_RESTART;
> }
>
> -static void rapl_hrtimer_init(struct rapl_pmu *pmu)
> +static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
> {
> - struct hrtimer *hr = &pmu->hrtimer;
> + struct hrtimer *hr = &rapl_pmu->hrtimer;
>
> hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> hr->function = rapl_hrtimer_handle;
> }
>
> -static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
> +static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
> struct perf_event *event)
> {
> if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> @@ -270,39 +270,39 @@ static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
>
> event->hw.state = 0;
>
> - list_add_tail(&event->active_entry, &pmu->active_list);
> + list_add_tail(&event->active_entry, &rapl_pmu->active_list);
>
> local64_set(&event->hw.prev_count, rapl_read_counter(event));
>
> - pmu->n_active++;
> - if (pmu->n_active == 1)
> - rapl_start_hrtimer(pmu);
> + rapl_pmu->n_active++;
> + if (rapl_pmu->n_active == 1)
> + rapl_start_hrtimer(rapl_pmu);
> }
>
> static void rapl_pmu_event_start(struct perf_event *event, int mode)
> {
> - struct rapl_pmu *pmu = event->pmu_private;
> + struct rapl_pmu *rapl_pmu = event->pmu_private;
> unsigned long flags;
>
> - raw_spin_lock_irqsave(&pmu->lock, flags);
> - __rapl_pmu_event_start(pmu, event);
> - raw_spin_unlock_irqrestore(&pmu->lock, flags);
> + raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
> + __rapl_pmu_event_start(rapl_pmu, event);
> + raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
> }
>
> static void rapl_pmu_event_stop(struct perf_event *event, int mode)
> {
> - struct rapl_pmu *pmu = event->pmu_private;
> + struct rapl_pmu *rapl_pmu = event->pmu_private;
> struct hw_perf_event *hwc = &event->hw;
> unsigned long flags;
>
> - raw_spin_lock_irqsave(&pmu->lock, flags);
> + raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>
> /* mark event as deactivated and stopped */
> if (!(hwc->state & PERF_HES_STOPPED)) {
> - WARN_ON_ONCE(pmu->n_active <= 0);
> - pmu->n_active--;
> - if (pmu->n_active == 0)
> - hrtimer_cancel(&pmu->hrtimer);
> + WARN_ON_ONCE(rapl_pmu->n_active <= 0);
> + rapl_pmu->n_active--;
> + if (rapl_pmu->n_active == 0)
> + hrtimer_cancel(&rapl_pmu->hrtimer);
>
> list_del(&event->active_entry);
>
> @@ -320,23 +320,23 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
> hwc->state |= PERF_HES_UPTODATE;
> }
>
> - raw_spin_unlock_irqrestore(&pmu->lock, flags);
> + raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
> }
>
> static int rapl_pmu_event_add(struct perf_event *event, int mode)
> {
> - struct rapl_pmu *pmu = event->pmu_private;
> + struct rapl_pmu *rapl_pmu = event->pmu_private;
> struct hw_perf_event *hwc = &event->hw;
> unsigned long flags;
>
> - raw_spin_lock_irqsave(&pmu->lock, flags);
> + raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>
> hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>
> if (mode & PERF_EF_START)
> - __rapl_pmu_event_start(pmu, event);
> + __rapl_pmu_event_start(rapl_pmu, event);
>
> - raw_spin_unlock_irqrestore(&pmu->lock, flags);
> + raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>
> return 0;
> }
> @@ -350,7 +350,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
> {
> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> int bit, ret = 0;
> - struct rapl_pmu *pmu;
> + struct rapl_pmu *rapl_pmu;
>
> /* only look at RAPL events */
> if (event->attr.type != rapl_pmus->pmu.type)
> @@ -380,11 +380,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
> return -EINVAL;
>
> /* must be done before validate_group */
> - pmu = cpu_to_rapl_pmu(event->cpu);
> - if (!pmu)
> + rapl_pmu = cpu_to_rapl_pmu(event->cpu);
> + if (!rapl_pmu)
> return -EINVAL;
> - event->cpu = pmu->cpu;
> - event->pmu_private = pmu;
> + event->cpu = rapl_pmu->cpu;
> + event->pmu_private = rapl_pmu;
> event->hw.event_base = rapl_msrs[bit].msr;
> event->hw.config = cfg;
> event->hw.idx = bit;
> @@ -567,22 +567,22 @@ static struct perf_msr amd_rapl_msrs[] = {
> static int rapl_cpu_offline(unsigned int cpu)
> {
> const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> - struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> + struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> /* Check if exiting cpu is used for collecting rapl events */
> if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
> return 0;
>
> - pmu->cpu = -1;
> + rapl_pmu->cpu = -1;
> /* Find a new cpu to collect rapl events */
> target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>
> /* Migrate rapl events to the new target */
> if (target < nr_cpu_ids) {
> cpumask_set_cpu(target, &rapl_cpu_mask);
> - pmu->cpu = target;
> - perf_pmu_migrate_context(pmu->pmu, cpu, target);
> + rapl_pmu->cpu = target;
> + perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
> }
> return 0;
> }
> @@ -591,21 +591,21 @@ static int rapl_cpu_online(unsigned int cpu)
> {
> unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> - struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> + struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> - if (!pmu) {
> - pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL,
> cpu_to_node(cpu));
> - if (!pmu)
> + if (!rapl_pmu) {
> + rapl_pmu = kzalloc_node(sizeof(*rapl_pmu), GFP_KERNEL, cpu_to_node(cpu));
> + if (!rapl_pmu)
> return -ENOMEM;
>
> - raw_spin_lock_init(&pmu->lock);
> - INIT_LIST_HEAD(&pmu->active_list);
> - pmu->pmu = &rapl_pmus->pmu;
> - pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> - rapl_hrtimer_init(pmu);
> + raw_spin_lock_init(&rapl_pmu->lock);
> + INIT_LIST_HEAD(&rapl_pmu->active_list);
> + rapl_pmu->pmu = &rapl_pmus->pmu;
> + rapl_pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> + rapl_hrtimer_init(rapl_pmu);
>
> - rapl_pmus->pmus[rapl_pmu_idx] = pmu;
> + rapl_pmus->rapl_pmu[rapl_pmu_idx] = rapl_pmu;
> }
>
> /*
> @@ -617,7 +617,7 @@ static int rapl_cpu_online(unsigned int cpu)
> return 0;
>
> cpumask_set_cpu(cpu, &rapl_cpu_mask);
> - pmu->cpu = cpu;
> + rapl_pmu->cpu = cpu;
> return 0;
> }
>
> @@ -686,7 +686,7 @@ static void cleanup_rapl_pmus(void)
> int i;
>
> for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
> - kfree(rapl_pmus->pmus[i]);
> + kfree(rapl_pmus->rapl_pmu[i]);
> kfree(rapl_pmus);
> }
>
> @@ -706,7 +706,7 @@ static int __init init_rapl_pmus(void)
> if (!rapl_pmu_is_pkg_scope())
> nr_rapl_pmu *= topology_max_dies_per_package();
>
> - rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
> + rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
> return -ENOMEM;
>
* Re: [PATCH v4 04/11] perf/x86/rapl: Make rapl_model struct global
2024-07-11 10:24 ` [PATCH v4 04/11] perf/x86/rapl: Make rapl_model struct global Dhananjay Ugwekar
@ 2024-07-12 3:03 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:03 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> To support AMD's per-core RAPL counter, we will need to check the
> per-core capability of the current rapl_model multiple times in the
> rapl_cpu_online()/rapl_cpu_offline() and init_rapl_pmus() functions,
> so cache the matched rapl model in a global variable to avoid calling
> x86_match_cpu() multiple times.
>
> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> arch/x86/events/rapl.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index c3afeaf679c2..4ee0877eb4d8 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -146,6 +146,7 @@ static cpumask_t rapl_cpu_mask;
> static unsigned int rapl_cntr_mask;
> static u64 rapl_timer_ms;
> static struct perf_msr *rapl_msrs;
> +static struct rapl_model *rapl_model;
>
> /*
> * Helper functions to get the correct topology macros according to the
> @@ -621,18 +622,18 @@ static int rapl_cpu_online(unsigned int cpu)
> return 0;
> }
>
> -static int rapl_check_hw_unit(struct rapl_model *rm)
> +static int rapl_check_hw_unit(void)
> {
> u64 msr_rapl_power_unit_bits;
> int i;
>
> /* protect rdmsrl() to handle virtualization */
> - if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits))
> + if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
> return -1;
> for (i = 0; i < NR_RAPL_DOMAINS; i++)
> rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
>
> - switch (rm->unit_quirk) {
> + switch (rapl_model->unit_quirk) {
> /*
> * DRAM domain on HSW server and KNL has fixed energy unit which can be
> * different than the unit from power unit MSR. See
> @@ -846,21 +847,20 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
> static int __init rapl_pmu_init(void)
> {
> const struct x86_cpu_id *id;
> - struct rapl_model *rm;
> int ret;
>
> id = x86_match_cpu(rapl_model_match);
> if (!id)
> return -ENODEV;
>
> - rm = (struct rapl_model *) id->driver_data;
> + rapl_model = (struct rapl_model *) id->driver_data;
>
> - rapl_msrs = rm->rapl_msrs;
> + rapl_msrs = rapl_model->rapl_msrs;
>
> rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
> - false, (void *) &rm->events);
> + false, (void *) &rapl_model->events);
>
> - ret = rapl_check_hw_unit(rm);
> + ret = rapl_check_hw_unit();
> if (ret)
> return ret;
>
* Re: [PATCH v4 05/11] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
2024-07-11 10:24 ` [PATCH v4 05/11] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
@ 2024-07-12 3:07 ` Zhang, Rui
2024-07-12 3:30 ` Dhananjay Ugwekar
0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:07 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
> @@ -710,6 +710,9 @@ static int __init init_rapl_pmus(void)
> rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
> return -ENOMEM;
> +
> + if (!alloc_cpumask_var(&rapl_pmus->cpumask, GFP_KERNEL))
> + return -ENOMEM;
missing free_cpumask_var() in cleanup_rapl_pmus()?
thanks,
rui
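For clarity, a sketch (not the posted patch) of the pairing Rui is
asking for, assuming the cpumask is allocated with alloc_cpumask_var()
in init_rapl_pmus() as above; the early-return path above would
likewise need to free the partially initialized rapl_pmus:

    static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
    {
            int i;

            for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
                    kfree(rapl_pmus->rapl_pmu[i]);
            /* pairs with alloc_cpumask_var() in init_rapl_pmus() */
            free_cpumask_var(rapl_pmus->cpumask);
            kfree(rapl_pmus);
    }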
* Re: [PATCH v4 06/11] perf/x86/rapl: Add wrapper for online/offline functions
2024-07-11 10:24 ` [PATCH v4 06/11] perf/x86/rapl: Add wrapper for online/offline functions Dhananjay Ugwekar
@ 2024-07-12 3:20 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:20 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> This is in preparation for the addition of per-core RAPL counter
> support for AMD CPUs.
>
> The CPU online and offline functions will need to handle the setting
> up and migration of the new per-core PMU as well. The wrapper
> functions added below will make it easier to pass the corresponding
> args for both the PMUs.
>
> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> arch/x86/events/rapl.c | 30 +++++++++++++++++++++---------
> 1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index e3d0a82e12b9..1d36565d0cb9 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -565,10 +565,10 @@ static struct perf_msr amd_rapl_msrs[] = {
> [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
> };
>
> -static int rapl_cpu_offline(unsigned int cpu)
> +static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
> + const struct cpumask *event_cpumask, unsigned int cpu)
> {
> - const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> - struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
> + struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
> int target;
>
> /* Check if exiting cpu is used for collecting rapl events */
> @@ -577,7 +577,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> rapl_pmu->cpu = -1;
> /* Find a new cpu to collect rapl events */
> - target = cpumask_any_but(rapl_pmu_cpumask, cpu);
> + target = cpumask_any_but(event_cpumask, cpu);
>
> /* Migrate rapl events to the new target */
> if (target < nr_cpu_ids) {
> @@ -588,11 +588,16 @@ static int rapl_cpu_offline(unsigned int cpu)
> return 0;
> }
>
> -static int rapl_cpu_online(unsigned int cpu)
> +static int rapl_cpu_offline(unsigned int cpu)
> {
> - unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> - const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> - struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
> + return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
> + get_rapl_pmu_cpumask(cpu), cpu);
> +}
> +
> +static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
> + const struct cpumask *event_cpumask, unsigned int cpu)
> +{
> + struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
> int target;
>
> if (!rapl_pmu) {
> @@ -613,7 +618,7 @@ static int rapl_cpu_online(unsigned int cpu)
> * Check if there is an online cpu in the package which collects rapl
> * events already.
> */
> - target = cpumask_any_and(rapl_pmus->cpumask, rapl_pmu_cpumask);
> + target = cpumask_any_and(rapl_pmus->cpumask, event_cpumask);
> if (target < nr_cpu_ids)
> return 0;
>
> @@ -622,6 +627,13 @@ static int rapl_cpu_online(unsigned int cpu)
> return 0;
> }
>
> +static int rapl_cpu_online(unsigned int cpu)
> +{
> + return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
> + get_rapl_pmu_cpumask(cpu), cpu);
> +}
> +
> +
> static int rapl_check_hw_unit(void)
> {
> u64 msr_rapl_power_unit_bits;
* Re: [PATCH v4 07/11] perf/x86/rapl: Add an argument to the cleanup and init functions
2024-07-11 10:24 ` [PATCH v4 07/11] perf/x86/rapl: Add an argument to the cleanup and init functions Dhananjay Ugwekar
@ 2024-07-12 3:22 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:22 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> Prep for per-core RAPL PMU addition.
>
> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> arch/x86/events/rapl.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 1d36565d0cb9..362e82284ccb 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -694,7 +694,7 @@ static void __init rapl_advertise(void)
> }
> }
>
> -static void cleanup_rapl_pmus(void)
> +static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
> {
> int i;
>
> @@ -712,9 +712,10 @@ static const struct attribute_group *rapl_attr_update[] = {
> NULL,
> };
>
> -static int __init init_rapl_pmus(void)
> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
> {
> int nr_rapl_pmu = topology_max_packages();
> + struct rapl_pmus *rapl_pmus;
>
> if (!rapl_pmu_is_pkg_scope())
> nr_rapl_pmu *= topology_max_dies_per_package();
> @@ -738,6 +739,9 @@ static int __init init_rapl_pmus(void)
> rapl_pmus->pmu.read = rapl_pmu_event_read;
> rapl_pmus->pmu.module = THIS_MODULE;
> rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
> +
> + *rapl_pmus_ptr = rapl_pmus;
> +
> return 0;
> }
>
> @@ -879,7 +883,7 @@ static int __init rapl_pmu_init(void)
> if (ret)
> return ret;
>
> - ret = init_rapl_pmus();
> + ret = init_rapl_pmus(&rapl_pmus);
> if (ret)
> return ret;
>
> @@ -903,7 +907,7 @@ static int __init rapl_pmu_init(void)
> cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
> out:
> pr_warn("Initialization failed (%d), disabled\n", ret);
> - cleanup_rapl_pmus();
> + cleanup_rapl_pmus(rapl_pmus);
> return ret;
> }
> module_init(rapl_pmu_init);
> @@ -912,6 +916,6 @@ static void __exit intel_rapl_exit(void)
> {
> cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
> perf_pmu_unregister(&rapl_pmus->pmu);
> - cleanup_rapl_pmus();
> + cleanup_rapl_pmus(rapl_pmus);
> }
> module_exit(intel_rapl_exit);
* Re: [PATCH v4 08/11] perf/x86/rapl: Modify the generic variable names to *_pkg*
2024-07-11 10:24 ` [PATCH v4 08/11] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
@ 2024-07-12 3:28 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:28 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> Prep for addition of power_per_core PMU to handle core scope energy
> consumption for AMD CPUs.
>
> Replace the generic names with *_pkg*, to differentiate between the
> scopes of the two different PMUs and their variables.
>
> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> Changes in v4:
> * Move some renaming code from patch 10 to here (Rui)
> ---
> arch/x86/events/rapl.c | 113 +++++++++++++++++++++--------------------
> 1 file changed, 57 insertions(+), 56 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 362e82284ccb..9beded3d4a14 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -70,18 +70,18 @@ MODULE_LICENSE("GPL");
> /*
> * RAPL energy status counters
> */
> -enum perf_rapl_events {
> +enum perf_rapl_pkg_events {
> PERF_RAPL_PP0 = 0, /* all cores */
> PERF_RAPL_PKG, /* entire package */
> PERF_RAPL_RAM, /* DRAM */
> PERF_RAPL_PP1, /* gpu */
> PERF_RAPL_PSYS, /* psys */
>
> - PERF_RAPL_MAX,
> - NR_RAPL_DOMAINS = PERF_RAPL_MAX,
> + PERF_RAPL_PKG_EVENTS_MAX,
> + NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
> };
>
> -static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
> +static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
> "pp0-core",
> "package",
> "dram",
> @@ -134,16 +134,16 @@ enum rapl_unit_quirk {
> };
>
> struct rapl_model {
> - struct perf_msr *rapl_msrs;
> - unsigned long events;
> + struct perf_msr *rapl_pkg_msrs;
> + unsigned long pkg_events;
> unsigned int msr_power_unit;
> enum rapl_unit_quirk unit_quirk;
> };
>
> /* 1/2^hw_unit Joule */
> -static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
> -static struct rapl_pmus *rapl_pmus;
> -static unsigned int rapl_cntr_mask;
> +static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> +static struct rapl_pmus *rapl_pmus_pkg;
> +static unsigned int rapl_pkg_cntr_mask;
> static u64 rapl_timer_ms;
> static struct perf_msr *rapl_msrs;
> static struct rapl_model *rapl_model;
> @@ -172,7 +172,8 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> * The unsigned check also catches the '-1' return value for non
> * existent mappings in the topology map.
> */
> - return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
> + return rapl_pmu_idx < rapl_pmus_pkg->nr_rapl_pmu ?
> + rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx] : NULL;
> }
>
> static inline u64 rapl_read_counter(struct perf_event *event)
> @@ -184,7 +185,7 @@ static inline u64 rapl_read_counter(struct perf_event *event)
>
> static inline u64 rapl_scale(u64 v, int cfg)
> {
> - if (cfg > NR_RAPL_DOMAINS) {
> + if (cfg > NR_RAPL_PKG_DOMAINS) {
> pr_warn("Invalid domain %d, failed to scale data\n",
> cfg);
> return v;
> }
> @@ -194,7 +195,7 @@ static inline u64 rapl_scale(u64 v, int cfg)
> * or use ldexp(count, -32).
> * Watts = Joules/Time delta
> */
> - return v << (32 - rapl_hw_unit[cfg - 1]);
> + return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
> }
>
> static u64 rapl_event_update(struct perf_event *event)
> @@ -354,7 +355,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
> struct rapl_pmu *rapl_pmu;
>
> /* only look at RAPL events */
> - if (event->attr.type != rapl_pmus->pmu.type)
> + if (event->attr.type != rapl_pmus_pkg->pmu.type)
> return -ENOENT;
>
> /* check only supported bits are set */
> @@ -366,14 +367,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
>
> event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
>
> - if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> return -EINVAL;
>
> - cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
> + cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> bit = cfg - 1;
>
> /* check event supported */
> - if (!(rapl_cntr_mask & (1 << bit)))
> + if (!(rapl_pkg_cntr_mask & (1 << bit)))
> return -EINVAL;
>
> /* unsupported modes and filters */
> @@ -401,7 +402,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
> static ssize_t rapl_get_attr_cpumask(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> - return cpumap_print_to_pagebuf(true, buf, rapl_pmus->cpumask);
> + return cpumap_print_to_pagebuf(true, buf, rapl_pmus_pkg->cpumask);
> }
>
> static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
> @@ -553,11 +554,11 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
> };
>
> /*
> - * Force to PERF_RAPL_MAX size due to:
> - * - perf_msr_probe(PERF_RAPL_MAX)
> + * Force to PERF_RAPL_PKG_EVENTS_MAX size due to:
> + * - perf_msr_probe(PERF_RAPL_PKG_EVENTS_MAX)
> * - want to use same event codes across both architectures
> */
> -static struct perf_msr amd_rapl_msrs[] = {
> +static struct perf_msr amd_rapl_pkg_msrs[] = {
> [PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, NULL, false, 0 },
> [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
> [PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, NULL, false, 0 },
> @@ -590,7 +591,7 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu
>
> static int rapl_cpu_offline(unsigned int cpu)
> {
> - return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
> + return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
> get_rapl_pmu_cpumask(cpu), cpu);
> }
>
> @@ -629,7 +630,7 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
>
> static int rapl_cpu_online(unsigned int cpu)
> {
> - return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
> + return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
> get_rapl_pmu_cpumask(cpu), cpu);
> }
>
> @@ -642,8 +643,8 @@ static int rapl_check_hw_unit(void)
> /* protect rdmsrl() to handle virtualization */
> if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
> return -1;
> - for (i = 0; i < NR_RAPL_DOMAINS; i++)
> - rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
> + for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
> + rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
>
> switch (rapl_model->unit_quirk) {
> /*
> @@ -653,11 +654,11 @@ static int rapl_check_hw_unit(void)
> * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
> */
> case RAPL_UNIT_QUIRK_INTEL_HSW:
> - rapl_hw_unit[PERF_RAPL_RAM] = 16;
> + rapl_pkg_hw_unit[PERF_RAPL_RAM] = 16;
> break;
> /* SPR uses a fixed energy unit for Psys domain. */
> case RAPL_UNIT_QUIRK_INTEL_SPR:
> - rapl_hw_unit[PERF_RAPL_PSYS] = 0;
> + rapl_pkg_hw_unit[PERF_RAPL_PSYS] = 0;
> break;
> default:
> break;
> @@ -672,9 +673,9 @@ static int rapl_check_hw_unit(void)
> * if hw unit is 32, then we use 2 ms 1/200/2
> */
> rapl_timer_ms = 2;
> - if (rapl_hw_unit[0] < 32) {
> + if (rapl_pkg_hw_unit[0] < 32) {
> rapl_timer_ms = (1000 / (2 * 100));
> -		rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
> +		rapl_timer_ms *= (1ULL << (32 - rapl_pkg_hw_unit[0] - 1));
> }
> return 0;
> }
> @@ -684,12 +685,12 @@ static void __init rapl_advertise(void)
> int i;
>
> pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms
> ovfl timer\n",
> - hweight32(rapl_cntr_mask), rapl_timer_ms);
> + hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
>
> - for (i = 0; i < NR_RAPL_DOMAINS; i++) {
> - if (rapl_cntr_mask & (1 << i)) {
> + for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
> + if (rapl_pkg_cntr_mask & (1 << i)) {
> pr_info("hw unit of domain %s 2^-%d
> Joules\n",
> - rapl_domain_names[i],
> rapl_hw_unit[i]);
> + rapl_pkg_domain_names[i],
> rapl_pkg_hw_unit[i]);
> }
> }
> }
> @@ -746,71 +747,71 @@ static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
> }
>
> static struct rapl_model model_snb = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_PP1),
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_snbep = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM),
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_hsw = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM) |
> BIT(PERF_RAPL_PP1),
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_hsx = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM),
> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_knl = {
> - .events = BIT(PERF_RAPL_PKG) |
> + .pkg_events = BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM),
> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_skl = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM) |
> BIT(PERF_RAPL_PP1) |
> BIT(PERF_RAPL_PSYS),
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_msrs,
> + .rapl_pkg_msrs = intel_rapl_msrs,
> };
>
> static struct rapl_model model_spr = {
> - .events = BIT(PERF_RAPL_PP0) |
> + .pkg_events = BIT(PERF_RAPL_PP0) |
> BIT(PERF_RAPL_PKG) |
> BIT(PERF_RAPL_RAM) |
> BIT(PERF_RAPL_PSYS),
> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
> .msr_power_unit = MSR_RAPL_POWER_UNIT,
> - .rapl_msrs = intel_rapl_spr_msrs,
> + .rapl_pkg_msrs = intel_rapl_spr_msrs,
> };
>
> static struct rapl_model model_amd_hygon = {
> - .events = BIT(PERF_RAPL_PKG),
> + .pkg_events = BIT(PERF_RAPL_PKG),
> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> - .rapl_msrs = amd_rapl_msrs,
> + .rapl_pkg_msrs = amd_rapl_pkg_msrs,
> };
>
> static const struct x86_cpu_id rapl_model_match[] __initconst = {
> @@ -874,16 +875,16 @@ static int __init rapl_pmu_init(void)
>
> rapl_model = (struct rapl_model *) id->driver_data;
>
> - rapl_msrs = rapl_model->rapl_msrs;
> + rapl_msrs = rapl_model->rapl_pkg_msrs;
>
> -	rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
> -					false, (void *) &rapl_model->events);
> +	rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
> +					    false, (void *) &rapl_model->pkg_events);
>
> ret = rapl_check_hw_unit();
> if (ret)
> return ret;
>
> - ret = init_rapl_pmus(&rapl_pmus);
> + ret = init_rapl_pmus(&rapl_pmus_pkg);
> if (ret)
> return ret;
>
> @@ -896,7 +897,7 @@ static int __init rapl_pmu_init(void)
> if (ret)
> goto out;
>
> - ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
> + ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
> if (ret)
> goto out1;
>
> @@ -907,7 +908,7 @@ static int __init rapl_pmu_init(void)
> cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
> out:
> pr_warn("Initialization failed (%d), disabled\n", ret);
> - cleanup_rapl_pmus(rapl_pmus);
> + cleanup_rapl_pmus(rapl_pmus_pkg);
> return ret;
> }
> module_init(rapl_pmu_init);
> @@ -915,7 +916,7 @@ module_init(rapl_pmu_init);
> static void __exit intel_rapl_exit(void)
> {
> cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
> - perf_pmu_unregister(&rapl_pmus->pmu);
> - cleanup_rapl_pmus(rapl_pmus);
> + perf_pmu_unregister(&rapl_pmus_pkg->pmu);
> + cleanup_rapl_pmus(rapl_pmus_pkg);
> }
> module_exit(intel_rapl_exit);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-07-12 2:04 ` Zhang, Rui
@ 2024-07-12 3:28 ` Dhananjay Ugwekar
0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-12 3:28 UTC (permalink / raw)
To: Zhang, Rui, alexander.shishkin@linux.intel.com,
dave.hansen@linux.intel.com, Hunter, Adrian, mingo@redhat.com,
irogers@google.com, tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, bp@alien8.de, acme@kernel.org,
oleksandr@natalenko.name, jolsa@kernel.org, x86@kernel.org,
namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
Hello Rui,
On 7/12/2024 7:34 AM, Zhang, Rui wrote:
> On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
>> leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_die_cpumask() and topology_logical_die_id() macros, no
>> longer
>> return the package cpumask and package id, instead they return the
>> CCD
>> (Core Complex Die) mask and id respectively. This leads to the
>> energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> Replacing these macros with their package counterparts fixes the
>> energy-pkg event for AMD CPUs.
>>
>> However, due to the difference between the scope of the energy-pkg
>> event for Intel and AMD CPUs, we have to replace these macros
>> conditionally, only for AMD CPUs.
>>
>> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>>
>> Before:
>> $ cat /sys/devices/power/cpumask
>> 0,8,16,24,32,40,48,56,64,72,80,88.
>>
>> The expected cpumask here is supposed to be just "0", as it is a
>> package
>> scope event, only one CPU will be collecting the event for all the
>> CPUs in
>> the package.
>>
>> After:
>> $ cat /sys/devices/power/cpumask
>> 0
>>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
>> 0x80000026 leaf")
>
> You still missed my Reviewed-by tag for this one. :)
> https://lore.kernel.org/all/e1f70a09f85dbd0ee3f32dffea37993e141269d0.camel@intel.com/
I didn't forget this time, just wanted to give you a chance to review the changes :) (although they are minimal)
Will surely add it in the next version!
Thanks,
Dhananjay
>
> thanks,
> rui
>
>> ---
>> Changes in v4:
>> * Invert the pkg scope check in init_rapl_pmus() (Peter)
>> * Add comments to explain the pkg scope check (Peter)
>>
>> PS: Scope check logic is still kept the same (i.e., all Intel systems
>> being considered as die scope); Rui will be modifying it to limit the
>> die-scope only to Cascadelake-AP in a future patch on top of this
>> patchset.
>> ---
>> arch/x86/events/rapl.c | 39 ++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index 0c5e7a7c43ac..df71f38ad98d 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -103,6 +103,13 @@ static struct perf_pmu_events_attr event_attr_##v = { \
>>  	.event_str	= str,						\
>> };
>>
>> +/*
>> + * RAPL PMU scope for AMD is package whereas for Intel it is die.
>> + */
>> +#define rapl_pmu_is_pkg_scope() \
>> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
>> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>> struct rapl_pmu {
>> raw_spinlock_t lock;
>> int n_active;
>> @@ -140,9 +147,25 @@ static unsigned int rapl_cntr_mask;
>> static u64 rapl_timer_ms;
>> static struct perf_msr *rapl_msrs;
>>
>> +/*
>> + * Helper functions to get the correct topology macros according to the
>> + * RAPL PMU scope.
>> + */
>> +static inline unsigned int get_rapl_pmu_idx(int cpu)
>> +{
>> +	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>> +					 topology_logical_die_id(cpu);
>> +}
>> +
>> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
>> +{
>> + return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
>> + topology_die_cpumask(cpu);
>> +}
>> +
>> static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
>> {
>> - unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
>> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>>
>> /*
>> * The unsigned check also catches the '-1' return value for
>> non
>> @@ -543,6 +566,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>>
>> static int rapl_cpu_offline(unsigned int cpu)
>> {
>> +	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
>> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>> int target;
>>
>> @@ -552,7 +576,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>>
>> pmu->cpu = -1;
>> /* Find a new cpu to collect rapl events */
>> - target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>> + target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>>
>> /* Migrate rapl events to the new target */
>> if (target < nr_cpu_ids) {
>> @@ -565,6 +589,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>>
>> static int rapl_cpu_online(unsigned int cpu)
>> {
>> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>> +	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
>> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>> int target;
>>
>> @@ -579,14 +605,14 @@ static int rapl_cpu_online(unsigned int cpu)
>> pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>> rapl_hrtimer_init(pmu);
>>
>> - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
>> + rapl_pmus->pmus[rapl_pmu_idx] = pmu;
>> }
>>
>> /*
>>  	 * Check if there is an online cpu in the package which collects rapl
>> * events already.
>> */
>> -	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
>> + target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
>> if (target < nr_cpu_ids)
>> return 0;
>>
>> @@ -675,7 +701,10 @@ static const struct attribute_group *rapl_attr_update[] = {
>>
>> static int __init init_rapl_pmus(void)
>> {
>> -	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>> + int nr_rapl_pmu = topology_max_packages();
>> +
>> + if (!rapl_pmu_is_pkg_scope())
>> + nr_rapl_pmu *= topology_max_dies_per_package();
>>
>>  	rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
>> if (!rapl_pmus)
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 09/11] perf/x86/rapl: Remove the global variable rapl_msrs
2024-07-11 10:24 ` [PATCH v4 09/11] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
@ 2024-07-12 3:29 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 3:29 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On Thu, 2024-07-11 at 10:24 +0000, Dhananjay Ugwekar wrote:
> After making the rapl_model struct global, the rapl_msrs global
> variable isn't needed, so remove it.
>
> Also, it will be cleaner when the new per-core scope PMU is added, as
> we will need to maintain two rapl_msrs arrays (one for the per-core
> scope and one for the package scope PMU) inside the rapl_model struct.
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
-rui
> ---
> arch/x86/events/rapl.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 9beded3d4a14..7691497042e9 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -145,7 +145,6 @@ static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> static struct rapl_pmus *rapl_pmus_pkg;
> static unsigned int rapl_pkg_cntr_mask;
> static u64 rapl_timer_ms;
> -static struct perf_msr *rapl_msrs;
> static struct rapl_model *rapl_model;
>
> /*
> @@ -387,7 +386,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
> return -EINVAL;
> event->cpu = rapl_pmu->cpu;
> event->pmu_private = rapl_pmu;
> - event->hw.event_base = rapl_msrs[bit].msr;
> + event->hw.event_base = rapl_model->rapl_msrs[bit].msr;
> event->hw.config = cfg;
> event->hw.idx = bit;
>
> @@ -875,9 +874,7 @@ static int __init rapl_pmu_init(void)
>
> rapl_model = (struct rapl_model *) id->driver_data;
>
> - rapl_msrs = rapl_model->rapl_pkg_msrs;
> -
> -	rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
> +	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>  					false, (void *) &rapl_model->pkg_events);
>
> ret = rapl_check_hw_unit();
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 05/11] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
2024-07-12 3:07 ` Zhang, Rui
@ 2024-07-12 3:30 ` Dhananjay Ugwekar
0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-12 3:30 UTC (permalink / raw)
To: Zhang, Rui, alexander.shishkin@linux.intel.com,
dave.hansen@linux.intel.com, Hunter, Adrian, mingo@redhat.com,
irogers@google.com, tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, bp@alien8.de, acme@kernel.org,
oleksandr@natalenko.name, jolsa@kernel.org, x86@kernel.org,
namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
On 7/12/2024 8:37 AM, Zhang, Rui wrote:
>> @@ -710,6 +710,9 @@ static int __init init_rapl_pmus(void)
>>  	rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
>> if (!rapl_pmus)
>> return -ENOMEM;
>> +
>> + if (!alloc_cpumask_var(&rapl_pmus->cpumask, GFP_KERNEL))
>> + return -ENOMEM;
>
> missing free_cpumask_var() in cleanup_rapl_pmus()?
Good catch, I had it in one draft of the patch, but then it got
missed in the final one.
Will fix this
Thanks,
Dhananjay
>
> thanks,
> rui
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 10/11] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
2024-07-11 10:24 ` [PATCH v4 10/11] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-07-12 6:15 ` Zhang, Rui
0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-12 6:15 UTC (permalink / raw)
To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
Hunter, Adrian, mingo@redhat.com, irogers@google.com,
tglx@linutronix.de, gustavoars@kernel.org,
kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
x86@kernel.org, namhyung@kernel.org
Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
sandipan.das@amd.com, ananth.narayan@amd.com,
linux-pm@vger.kernel.org
>
> @@ -352,9 +384,13 @@ static int rapl_pmu_event_init(struct perf_event *event)
> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> int bit, ret = 0;
> struct rapl_pmu *rapl_pmu;
> + struct rapl_pmus *curr_rapl_pmus;
>
> /* only look at RAPL events */
> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> +	if (event->attr.type == rapl_pmus_pkg->pmu.type ||
> +	    (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
> +		curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> + else
> return -ENOENT;
>
> /* check only supported bits are set */
> @@ -364,7 +400,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
> if (event->cpu < 0)
> return -EINVAL;
>
> - event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
> + if (curr_rapl_pmus == rapl_pmus_pkg)
> + event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
>
> if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> return -EINVAL;
this sanity check becomes bogus for per_core event.
> @@ -373,7 +410,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
> bit = cfg - 1;
>
> /* check event supported */
> - if (!(rapl_pkg_cntr_mask & (1 << bit)))
> +	if (!(rapl_pkg_cntr_mask & (1 << bit)) &&
> +	    !(rapl_core_cntr_mask & (1 << bit)))
> return -EINVAL;
what if bit > 1 for a per_core event?
>
> /* unsupported modes and filters */
> @@ -381,12 +419,18 @@ static int rapl_pmu_event_init(struct perf_event *event)
> return -EINVAL;
>
> /* must be done before validate_group */
> - rapl_pmu = cpu_to_rapl_pmu(event->cpu);
> + if (curr_rapl_pmus == rapl_pmus_core) {
> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_logical_core_id(event->cpu)];
> +		event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
> +	} else {
> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
> +		event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> + }
> +
To avoid the above issues and the need to check for (curr_rapl_pmus ==
rapl_pmus_core) all over the place, I'd suggest we do the per_core/per_pkg
sanity checks and handling together, say something like:
	if (event->attr.type == rapl_pmus_pkg->pmu.type) {
		/* all pkg-scope sanity checks */
		rapl_pmu = ...
		event->hw.event_base = ...
	} else if (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type) {
		/* all core-scope sanity checks */
		rapl_pmu = ...
		event->hw.event_base = ...
	} else {
		return -ENOENT;
	}
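
For concreteness, a minimal sketch of that shape, reusing the identifiers
from this series (illustrative only, not the final code; the unchanged
mode/filter checks are omitted):

	static int rapl_pmu_event_init(struct perf_event *event)
	{
		u64 cfg = event->attr.config & RAPL_EVENT_MASK;
		struct rapl_pmu *rapl_pmu;
		int bit;

		if (event->attr.type == rapl_pmus_pkg->pmu.type) {
			/* Only the pkg-scope domains are valid here */
			if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
				return -EINVAL;
			cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
			bit = cfg - 1;
			if (!(rapl_pkg_cntr_mask & (1 << bit)))
				return -EINVAL;

			event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
			rapl_pmu = rapl_pmus_pkg->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
			event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
		} else if (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type) {
			/* Only the single per-core domain (event id 0x1) is valid here */
			if (cfg != PERF_RAPL_PER_CORE + 1)
				return -EINVAL;
			bit = PERF_RAPL_PER_CORE;
			if (!(rapl_core_cntr_mask & (1 << bit)))
				return -EINVAL;

			rapl_pmu = rapl_pmus_core->rapl_pmu[topology_logical_core_id(event->cpu)];
			event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
		} else {
			return -ENOENT;
		}

		if (!rapl_pmu)
			return -EINVAL;
		event->cpu = rapl_pmu->cpu;
		event->pmu_private = rapl_pmu;
		event->hw.config = cfg;
		event->hw.idx = bit;

		return 0;
	}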
[...]
> static int rapl_cpu_offline(unsigned int cpu)
> {
> -	return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
> +	int ret =  __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
>  				 get_rapl_pmu_cpumask(cpu), cpu);
extra space after '='?
[...]
> +
> + if (ret == 0 && rapl_model->core_events)
> +		ret = __rapl_cpu_offline(rapl_pmus_core, topology_logical_core_id(cpu),
> +					 topology_sibling_cpumask(cpu), cpu);
> +
> + return ret;
> }
>
>  static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
> @@ -629,8 +725,14 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
>
> static int rapl_cpu_online(unsigned int cpu)
> {
> -	return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
> +	int ret =  __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
>  				get_rapl_pmu_cpumask(cpu), cpu);
extra space after '='?
[...]
> +
> + if (rapl_core_cntr_mask & (1 << PERF_RAPL_PER_CORE))
> + pr_info("hw unit of domain %s 2^-%d Joules\n",
> +			rapl_core_domain_names[PERF_RAPL_PER_CORE], rapl_core_hw_unit);
> }
Are we expecting to have more than one domain for the per_core power PMU?
If not, we don't need to introduce
+enum perf_rapl_core_events {
+ PERF_RAPL_PER_CORE = 0, /* per-core */
+
+ PERF_RAPL_CORE_EVENTS_MAX,
+ NR_RAPL_CORE_DOMAINS = PERF_RAPL_CORE_EVENTS_MAX,
+};
+
and check for NR_RAPL_CORE_DOMAINS all over the place.
Otherwise, we should use a loop here to advertise all possible per_core
domains. Either is okay with me, but the code needs to be consistent.
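
For the loop option, roughly (a sketch only; rapl_core_hw_unit would then
need to become a per-domain array):

	for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
		if (rapl_core_cntr_mask & (1 << i))
			pr_info("hw unit of domain %s 2^-%d Joules\n",
				rapl_core_domain_names[i], rapl_core_hw_unit);
	}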
>
> static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
> @@ -712,14 +820,16 @@ static const struct attribute_group *rapl_attr_update[] = {
> NULL,
> };
>
> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
> +static const struct attribute_group *rapl_per_core_attr_update[] = {
> + &rapl_events_per_core_group,
> +};
> +
> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int nr_rapl_pmu,
> +				 const struct attribute_group **rapl_attr_groups,
> +				 const struct attribute_group **rapl_attr_update)
> {
> - int nr_rapl_pmu = topology_max_packages();
> struct rapl_pmus *rapl_pmus;
>
> - if (!rapl_pmu_is_pkg_scope())
> - nr_rapl_pmu *= topology_max_dies_per_package();
> -
>  	rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
> return -ENOMEM;
> @@ -809,8 +919,10 @@ static struct rapl_model model_spr = {
>
> static struct rapl_model model_amd_hygon = {
> .pkg_events = BIT(PERF_RAPL_PKG),
> + .core_events = BIT(PERF_RAPL_PER_CORE),
> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> .rapl_pkg_msrs = amd_rapl_pkg_msrs,
> + .rapl_core_msrs = amd_rapl_core_msrs,
> };
>
> static const struct x86_cpu_id rapl_model_match[] __initconst = {
> @@ -867,6 +979,11 @@ static int __init rapl_pmu_init(void)
> {
> const struct x86_cpu_id *id;
> int ret;
> +	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
> +	int nr_cores = topology_max_packages() * topology_num_cores_per_package();
> +
I thought we agreed to use one variable for all three cases.
thanks,
rui
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-11 22:23 ` [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Ian Rogers
@ 2024-07-15 9:35 ` Dhananjay Ugwekar
2024-07-15 15:22 ` Ian Rogers
0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-15 9:35 UTC (permalink / raw)
To: Ian Rogers
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
Hello Ian,
On 7/12/2024 3:53 AM, Ian Rogers wrote:
> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> Currently the energy-cores event in the power PMU aggregates energy
>> consumption data at a package level. On the other hand the core energy
>> RAPL counter in AMD CPUs has a core scope (which means the energy
>> consumption is recorded separately for each core). Earlier efforts to add
>> the core event in the power PMU had failed [1], due to the difference in
>> the scope of these two events. Hence, there is a need for a new core scope
>> PMU.
>>
>> This patchset adds a new "power_per_core" PMU alongside the existing
>> "power" PMU, which will be responsible for collecting the new
>> "energy-per-core" event.
>
> Sorry for being naive, is the only reason for adding the new PMU for
> the sake of the cpumask? Perhaps we can add per event cpumasks like
> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> the CPUs of the different cores in this case. There's supporting
> hotplugging CPUs as an issue. Adding the tool support for this
> wouldn't be hard and it may be less messy (although old perf tools on
> new kernels wouldn't know about these files).
I went over the two approaches and below are my thoughts,
New PMU approach:
Pros
* It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
Cons
* More code changes in rapl.c
Event specific cpumask approach:
Pros
* It might be easier to add different-scope events within the same PMU in future (although currently I'm not able to find such a use case, apart from the RAPL pkg and core energy counters)
Cons
* Both new kernel and perf tool will be required to use the new per-core event.
I feel that while the event-specific cpumask is a viable alternative to the new-PMU approach, I don't see any clear pros to selecting it over the current approach. Please let me know if you have any design-related concerns with the addition of a new PMU, or whether your concern is mostly about the amount of code changes in this approach.
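
For reference, exposing such a per-event cpumask on the kernel side could
presumably be done with an extra sysfs attribute along these lines (a rough,
untested sketch; the "energy-per-core.cpumask" file name and the use of
rapl_pmus_core->cpumask are assumptions, not part of this series):

	static ssize_t energy_per_core_cpumask_show(struct device *dev,
						    struct device_attribute *attr,
						    char *buf)
	{
		/* One CPU per core, mirroring the power_per_core PMU's mask */
		return cpumap_print_to_pagebuf(true, buf, rapl_pmus_core->cpumask);
	}

	static struct device_attribute dev_attr_energy_per_core_cpumask = {
		.attr	= { .name = "energy-per-core.cpumask", .mode = 0444 },
		.show	= energy_per_core_cpumask_show,
	};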
Thanks,
Dhananjay
>
> Thanks,
> Ian
>
>
>> Tested the package level and core level PMU counters with workloads
>> pinned to different CPUs.
>>
>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>> machine:
>>
>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
>> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
>>
>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>>
>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
>> tip/master.
>>
>> v4 changes:
>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
>> * Add Rui's rb tag for patch 1
>> * Invert the pmu scope check logic in patch 2 (Peter)
>> * Add comments explaining the scope check in patch 2 (Peter)
>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
>> * Move renaming code to patch 8 (Rui)
>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
>> 10 (Rui)
>>
>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
>> considered as die scope), Rui will be modifying it to limit the die-scope
>> only to Cascadelake-AP in a future patch on top of this patchset.
>>
>> v3 changes:
>> * Patch 1 added to introduce the logical_core_id which is unique across
>> the system (Prateek)
>> * Use the unique topology_logical_core_id() instead of
>> topology_core_id() (which is only unique within a package on tested
>> AMD and Intel systems) in Patch 10
>>
>> v2 changes:
>> * Patches 6,7,8 added to split some changes out of the last patch
>> * Use container_of to get the rapl_pmus from event variable (Rui)
>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
>> * Use event id 0x1 for energy-per-core event (Rui)
>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
>> per-core counter hw support (Rui)
>>
>> Dhananjay Ugwekar (10):
>> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>> perf/x86/rapl: Rename rapl_pmu variables
>> perf/x86/rapl: Make rapl_model struct global
>> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>> perf/x86/rapl: Add wrapper for online/offline functions
>> perf/x86/rapl: Add an argument to the cleanup and init functions
>> perf/x86/rapl: Modify the generic variable names to *_pkg*
>> perf/x86/rapl: Remove the global variable rapl_msrs
>> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>> perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>>
>> K Prateek Nayak (1):
>> x86/topology: Introduce topology_logical_core_id()
>>
>> Documentation/arch/x86/topology.rst | 4 +
>> arch/x86/events/rapl.c | 454 ++++++++++++++++++--------
>> arch/x86/include/asm/processor.h | 1 +
>> arch/x86/include/asm/topology.h | 1 +
>> arch/x86/kernel/cpu/debugfs.c | 1 +
>> arch/x86/kernel/cpu/topology_common.c | 1 +
>> 6 files changed, 328 insertions(+), 134 deletions(-)
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-15 9:35 ` Dhananjay Ugwekar
@ 2024-07-15 15:22 ` Ian Rogers
2024-07-16 8:42 ` Dhananjay Ugwekar
0 siblings, 1 reply; 31+ messages in thread
From: Ian Rogers @ 2024-07-15 15:22 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Ian,
>
> On 7/12/2024 3:53 AM, Ian Rogers wrote:
> > On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> > <Dhananjay.Ugwekar@amd.com> wrote:
> >>
> >> Currently the energy-cores event in the power PMU aggregates energy
> >> consumption data at a package level. On the other hand the core energy
> >> RAPL counter in AMD CPUs has a core scope (which means the energy
> >> consumption is recorded separately for each core). Earlier efforts to add
> >> the core event in the power PMU had failed [1], due to the difference in
> >> the scope of these two events. Hence, there is a need for a new core scope
> >> PMU.
> >>
> >> This patchset adds a new "power_per_core" PMU alongside the existing
> >> "power" PMU, which will be responsible for collecting the new
> >> "energy-per-core" event.
> >
> > Sorry for being naive, is the only reason for adding the new PMU for
> > the sake of the cpumask? Perhaps we can add per event cpumasks like
> > say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> > the CPUs of the different cores in this case. There's supporting
> > hotplugging CPUs as an issue. Adding the tool support for this
> > wouldn't be hard and it may be less messy (although old perf tools on
> > new kernels wouldn't know about these files).
>
> I went over the two approaches and below are my thoughts,
>
> New PMU approach:
> Pros
> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
> Cons
> * More code changes in rapl.c
>
> Event specific cpumask approach:
> Pros
> * It might be easier to add different-scope events within the same PMU in future (although currently I'm not able to find such a use case, apart from the RAPL pkg and core energy counters)
> Cons
> * Both new kernel and perf tool will be required to use the new per-core event.
>
> I feel that while the event-specific cpumask is a viable alternative to the new-PMU approach, I don't see any clear pros to selecting it over the current approach. Please let me know if you have any design-related concerns with the addition of a new PMU, or whether your concern is mostly about the amount of code changes in this approach.
Thanks Dhananjay, and thanks for taking the time for an objective
discussion on the mailing list. I'm very supportive of seeing the work
you are enabling land.
My concern comes from the tool side. If every PMU starts to have
variants for the sake of the cpumask, what does this mean for
aggregation in the perf tool? There is another issue around event
grouping: you can't group events across PMUs, but my feeling is that
event grouping needs to be rethought. By default the power_per_core
events are going to be aggregated together by the perf tool, which
then loses their per_core-ness.
I was trying to think of the same problem but in other PMUs. One
thought I had was the difference between hyperthread and core events.
At least on Intel, some events can only count for the whole core not
per hyperthread. The events don't have a cpu_per_core PMU, they just
use the regular cpu one, and so the cpumask is set to all online
hyperthreads. When a per-core event is programmed it will get
programmed on every hyperthread and so counted twice for the core.
This at the least wastes a counter, but it probably also yields twice
the expected count as every event is counted twice then aggregated. So
this is just wrong and the user is expected to be smart and fix it
(checking the x86 events there is a convention to use a ".ALL" or
".ANY" suffix for core wide events iirc). If we had a cpumask for
these events then we could avoid the double setting, free up a counter
and avoid double counting. Were we to fix things the way it is done in
this patch series we'd add another PMU.
My feeling is that in the longer term a per event cpumask looks
cleaner. I think either way you need a new kernel for the new RAPL
events. As for the problem of an old perf tool with a new kernel: this
doesn't normally happen with distributions, as they match the perf tool
to the kernel version (needlessly losing features and fixes along the
way). If the new PMU is going to get backported through fixes... then we
can do similar for reading the per event cpumask. I'd be tempted not
to do this and focus on the next LTS kernel, getting the kernel and
tool fixes in as necessary.
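
Concretely, the two sysfs layouts being compared would look roughly like
this (the per-event file name is hypothetical):

  # separate-PMU approach (this series): two PMUs, two cpumasks
  /sys/devices/power/cpumask
  /sys/devices/power_per_core/cpumask

  # per-event alternative: one PMU, per-event mask file
  /sys/devices/power/cpumask
  /sys/devices/power/events/energy-per-core.cpumask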
Thanks,
Ian
> Thanks,
> Dhananjay
>
> >
> > Thanks,
> > Ian
> >
> >
> >> Tested the package level and core level PMU counters with workloads
> >> pinned to different CPUs.
> >>
> >> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> >> machine:
> >>
> >> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
> >> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
> >>
> >> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
> >>
> >> This patchset applies cleanly on top of v6.10-rc7 as well as latest
> >> tip/master.
> >>
> >> v4 changes:
> >> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
> >> * Add Rui's rb tag for patch 1
> >> * Invert the pmu scope check logic in patch 2 (Peter)
> >> * Add comments explaining the scope check in patch 2 (Peter)
> >> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
> >> * Move renaming code to patch 8 (Rui)
> >> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
> >> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
> >> 10 (Rui)
> >>
> >> PS: Scope check logic is still kept the same (i.e., all Intel systems being
> >> considered as die scope), Rui will be modifying it to limit the die-scope
> >> only to Cascadelake-AP in a future patch on top of this patchset.
> >>
> >> v3 changes:
> >> * Patch 1 added to introduce the logical_core_id which is unique across
> >> the system (Prateek)
> >> * Use the unique topology_logical_core_id() instead of
> >> topology_core_id() (which is only unique within a package on tested
> >> AMD and Intel systems) in Patch 10
> >>
> >> v2 changes:
> >> * Patches 6,7,8 added to split some changes out of the last patch
> >> * Use container_of to get the rapl_pmus from event variable (Rui)
> >> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
> >> * Use event id 0x1 for energy-per-core event (Rui)
> >> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
> >> per-core counter hw support (Rui)
> >>
> >> Dhananjay Ugwekar (10):
> >> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> >> perf/x86/rapl: Rename rapl_pmu variables
> >> perf/x86/rapl: Make rapl_model struct global
> >> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> >> perf/x86/rapl: Add wrapper for online/offline functions
> >> perf/x86/rapl: Add an argument to the cleanup and init functions
> >> perf/x86/rapl: Modify the generic variable names to *_pkg*
> >> perf/x86/rapl: Remove the global variable rapl_msrs
> >> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
> >> perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
> >>
> >> K Prateek Nayak (1):
> >> x86/topology: Introduce topology_logical_core_id()
> >>
> >> Documentation/arch/x86/topology.rst | 4 +
> >> arch/x86/events/rapl.c | 454 ++++++++++++++++++--------
> >> arch/x86/include/asm/processor.h | 1 +
> >> arch/x86/include/asm/topology.h | 1 +
> >> arch/x86/kernel/cpu/debugfs.c | 1 +
> >> arch/x86/kernel/cpu/topology_common.c | 1 +
> >> 6 files changed, 328 insertions(+), 134 deletions(-)
> >>
> >> --
> >> 2.34.1
> >>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-15 15:22 ` Ian Rogers
@ 2024-07-16 8:42 ` Dhananjay Ugwekar
2024-07-16 22:47 ` Ian Rogers
0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-16 8:42 UTC (permalink / raw)
To: Ian Rogers
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
Hello Ian,
On 7/15/2024 8:52 PM, Ian Rogers wrote:
> On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> Hello Ian,
>>
>> On 7/12/2024 3:53 AM, Ian Rogers wrote:
>>> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
>>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>>
>>>> Currently the energy-cores event in the power PMU aggregates energy
>>>> consumption data at a package level. On the other hand the core energy
>>>> RAPL counter in AMD CPUs has a core scope (which means the energy
>>>> consumption is recorded separately for each core). Earlier efforts to add
>>>> the core event in the power PMU had failed [1], due to the difference in
>>>> the scope of these two events. Hence, there is a need for a new core scope
>>>> PMU.
>>>>
>>>> This patchset adds a new "power_per_core" PMU alongside the existing
>>>> "power" PMU, which will be responsible for collecting the new
>>>> "energy-per-core" event.
>>>
>>> Sorry for being naive, is the only reason for adding the new PMU for
>>> the sake of the cpumask? Perhaps we can add per event cpumasks like
>>> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
>>> the CPUs of the different cores in this case. There's supporting
>>> hotplugging CPUs as an issue. Adding the tool support for this
>>> wouldn't be hard and it may be less messy (although old perf tools on
>>> new kernels wouldn't know about these files).
>>
>> I went over the two approaches and below are my thoughts,
>>
>> New PMU approach:
>> Pros
>> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
>> Cons
>> * More code changes in rapl.c
>>
>> Event specific cpumask approach:
>> Pros
>> * It might be easier to add different-scope events within the same PMU in future (although currently I'm not able to find such a use case, apart from the RAPL pkg and core energy counters)
>> Cons
>> * Both new kernel and perf tool will be required to use the new per-core event.
>>
>> I feel that while the event-specific cpumask is a viable alternative to the new-PMU approach, I don't see any clear pros to selecting it over the current approach. Please let me know if you have any design-related concerns with the addition of a new PMU, or whether your concern is mostly about the amount of code changes in this approach.
>
> Thanks Dhananjay, and thanks for taking the time for an objective
> discussion on the mailing list. I'm very supportive of seeing the work
> you are enabling land.
>
> My concern comes from the tool side. If every PMU starts to have
> > variants for the sake of the cpumask, what does this mean for
> > aggregation in the perf tool? There is another issue around event
> > grouping: you can't group events across PMUs, but my feeling is that
> event grouping needs to be rethought. By default the power_per_core
> events are going to be aggregated together by the perf tool, which
> then loses their per_core-ness.
Yea right, maybe we need to fix this behavior.
>
> I was trying to think of the same problem but in other PMUs. One
> thought I had was the difference between hyperthread and core events.
> At least on Intel, some events can only count for the whole core not
> per hyperthread. The events don't have a cpu_per_core PMU, they just
> use the regular cpu one, and so the cpumask is set to all online
> hyperthreads. When a per-core event is programmed it will get
> programmed on every hyperthread and so counted twice for the core.
> This at the least wastes a counter, but it probably also yields twice
> the expected count as every event is counted twice then aggregated. So
> this is just wrong and the user is expected to be smart and fix it
> (checking the x86 events there is a convention to use a ".ALL" or
> ".ANY" suffix for core wide events iirc). If we had a cpumask for
> these events then we could avoid the double setting, free up a counter
> and avoid double counting. Were we to fix things the way it is done in
> this patch series we'd add another PMU.
Yes, this seems like a valid usecase for event-specific cpumasks.
>
> My feeling is that in the longer term a per event cpumask looks
> cleaner. I think either way you need a new kernel for the new RAPL
> > events. As for the problem of an old perf tool with a new kernel: this
> > doesn't normally happen with distributions, as they match the perf tool
> > to the kernel version (needlessly losing features and fixes along the
> > way). If the new PMU is going to get backported through fixes... then we
> can do similar for reading the per event cpumask. I'd be tempted not
> to do this and focus on the next LTS kernel, getting the kernel and
> tool fixes in as necessary.
Makes sense. Even though this approach will require more effort, it seems
to be worthwhile, as it would help things down the line (make it easier to have
heterogeneous-scope events within a PMU). I'll need to go through the perf tool
to see how we can design this. I'll get back with an RFC series probably once
I have an initial design in mind.
Thanks,
Dhananjay
>
> Thanks,
> Ian
>
>
>> Thanks,
>> Dhananjay
>>
>>>
>>> Thanks,
>>> Ian
>>>
>>>
>>>> Tested the package level and core level PMU counters with workloads
>>>> pinned to different CPUs.
>>>>
>>>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>>>> machine:
>>>>
>>>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
>>>> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
>>>>
>>>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>>>>
>>>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
>>>> tip/master.
>>>>
>>>> v4 changes:
>>>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
>>>> * Add Rui's rb tag for patch 1
>>>> * Invert the pmu scope check logic in patch 2 (Peter)
>>>> * Add comments explaining the scope check in patch 2 (Peter)
>>>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
>>>> * Move renaming code to patch 8 (Rui)
>>>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
>>>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
>>>> 10 (Rui)
>>>>
>>>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
>>>> considered as die scope), Rui will be modifying it to limit the die-scope
>>>> only to Cascadelake-AP in a future patch on top of this patchset.
>>>>
>>>> v3 changes:
>>>> * Patch 1 added to introduce the logical_core_id which is unique across
>>>> the system (Prateek)
>>>> * Use the unique topology_logical_core_id() instead of
>>>> topology_core_id() (which is only unique within a package on tested
>>>> AMD and Intel systems) in Patch 10
>>>>
>>>> v2 changes:
>>>> * Patches 6,7,8 added to split some changes out of the last patch
>>>> * Use container_of to get the rapl_pmus from event variable (Rui)
>>>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
>>>> * Use event id 0x1 for energy-per-core event (Rui)
>>>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
>>>> per-core counter hw support (Rui)
>>>>
>>>> Dhananjay Ugwekar (10):
>>>> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>>>> perf/x86/rapl: Rename rapl_pmu variables
>>>> perf/x86/rapl: Make rapl_model struct global
>>>> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>>>> perf/x86/rapl: Add wrapper for online/offline functions
>>>> perf/x86/rapl: Add an argument to the cleanup and init functions
>>>> perf/x86/rapl: Modify the generic variable names to *_pkg*
>>>> perf/x86/rapl: Remove the global variable rapl_msrs
>>>> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>>>> perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>>>>
>>>> K Prateek Nayak (1):
>>>> x86/topology: Introduce topology_logical_core_id()
>>>>
>>>> Documentation/arch/x86/topology.rst | 4 +
>>>> arch/x86/events/rapl.c | 454 ++++++++++++++++++--------
>>>> arch/x86/include/asm/processor.h | 1 +
>>>> arch/x86/include/asm/topology.h | 1 +
>>>> arch/x86/kernel/cpu/debugfs.c | 1 +
>>>> arch/x86/kernel/cpu/topology_common.c | 1 +
>>>> 6 files changed, 328 insertions(+), 134 deletions(-)
>>>>
>>>> --
>>>> 2.34.1
>>>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-07-11 10:24 ` [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
2024-07-12 2:04 ` Zhang, Rui
@ 2024-07-16 8:56 ` Dhananjay Ugwekar
1 sibling, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-16 8:56 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, kees, gustavoars, rui.zhang, oleksandr
Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
linux-pm
Hello Peter, Rui,
After Ian's comments on the series, I have decided to rethink the approach of
adding a new PMU for the per-core RAPL counters.
However, this patch is still needed as a fix for the
commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"); I will be
sending it separately, along with a similar fix for powercap/intel_rapl_common.
Thanks,
Dhananjay
On 7/11/2024 3:54 PM, Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_die_cpumask() and topology_logical_die_id() macros no longer
> return the package cpumask and package id; instead they return the CCD
> (Core Complex Die) mask and id respectively. This leads to the energy-pkg
> event scope being modified to CCD instead of package.
>
> Replacing these macros with their package counterparts fixes the
> energy-pkg event for AMD CPUs.
>
> However, due to the difference between the scope of the energy-pkg event for
> Intel and AMD CPUs, we have to replace these macros conditionally only for
> AMD CPUs.
>
> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>
> Before:
> $ cat /sys/devices/power/cpumask
> 0,8,16,24,32,40,48,56,64,72,80,88.
>
> The expected cpumask here is supposed to be just "0", as it is a package
> scope event, only one CPU will be collecting the event for all the CPUs in
> the package.
>
> After:
> $ cat /sys/devices/power/cpumask
> 0
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
> ---
> Changes in v4:
> * Invert the pkg scope check in init_rapl_pmus() (Peter)
> * Add comments to explain the pkg scope check (Peter)
>
> PS: Scope check logic is still kept the same (i.e., all Intel systems being
> considered as die scope), Rui will be modifying it to limit the die-scope
> only to Cascadelake-AP in a future patch on top of this patchset.
> ---
> arch/x86/events/rapl.c | 39 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 0c5e7a7c43ac..df71f38ad98d 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,13 @@ static struct perf_pmu_events_attr event_attr_##v = { \
> .event_str = str, \
> };
>
> +/*
> + * RAPL PMU scope for AMD is package whereas for Intel it is die.
> + */
> +#define rapl_pmu_is_pkg_scope() \
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
> struct rapl_pmu {
> raw_spinlock_t lock;
> int n_active;
> @@ -140,9 +147,25 @@ static unsigned int rapl_cntr_mask;
> static u64 rapl_timer_ms;
> static struct perf_msr *rapl_msrs;
>
> +/*
> + * Helper functions to get the correct topology macros according to the
> + * RAPL PMU scope.
> + */
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> + topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> + topology_die_cpumask(cpu);
> +}
> +
> static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> {
> - unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>
> /*
> * The unsigned check also catches the '-1' return value for non
> @@ -543,6 +566,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>
> static int rapl_cpu_offline(unsigned int cpu)
> {
> + const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> @@ -552,7 +576,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> pmu->cpu = -1;
> /* Find a new cpu to collect rapl events */
> - target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
> + target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>
> /* Migrate rapl events to the new target */
> if (target < nr_cpu_ids) {
> @@ -565,6 +589,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> static int rapl_cpu_online(unsigned int cpu)
> {
> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> + const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> @@ -579,14 +605,14 @@ static int rapl_cpu_online(unsigned int cpu)
> pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> rapl_hrtimer_init(pmu);
>
> - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> + rapl_pmus->pmus[rapl_pmu_idx] = pmu;
> }
>
> /*
> * Check if there is an online cpu in the package which collects rapl
> * events already.
> */
> - target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
> + target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
> if (target < nr_cpu_ids)
> return 0;
>
> @@ -675,7 +701,10 @@ static const struct attribute_group *rapl_attr_update[] = {
>
> static int __init init_rapl_pmus(void)
> {
> - int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
> + int nr_rapl_pmu = topology_max_packages();
> +
> + if (!rapl_pmu_is_pkg_scope())
> + nr_rapl_pmu *= topology_max_dies_per_package();
>
> rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-16 8:42 ` Dhananjay Ugwekar
@ 2024-07-16 22:47 ` Ian Rogers
2024-07-17 8:04 ` Dhananjay Ugwekar
0 siblings, 1 reply; 31+ messages in thread
From: Ian Rogers @ 2024-07-16 22:47 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
On Tue, Jul 16, 2024 at 1:42 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Ian,
>
> On 7/15/2024 8:52 PM, Ian Rogers wrote:
> > On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
> > <Dhananjay.Ugwekar@amd.com> wrote:
> >>
> >> Hello Ian,
> >>
> >> On 7/12/2024 3:53 AM, Ian Rogers wrote:
> >>> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> >>> <Dhananjay.Ugwekar@amd.com> wrote:
> >>>>
> >>>> Currently the energy-cores event in the power PMU aggregates energy
> >>>> consumption data at a package level. On the other hand the core energy
> >>>> RAPL counter in AMD CPUs has a core scope (which means the energy
> >>>> consumption is recorded separately for each core). Earlier efforts to add
> >>>> the core event in the power PMU had failed [1], due to the difference in
> >>>> the scope of these two events. Hence, there is a need for a new core scope
> >>>> PMU.
> >>>>
> >>>> This patchset adds a new "power_per_core" PMU alongside the existing
> >>>> "power" PMU, which will be responsible for collecting the new
> >>>> "energy-per-core" event.
> >>>
> >>> Sorry for being naive, is the only reason for adding the new PMU for
> >>> the sake of the cpumask? Perhaps we can add per event cpumasks like
> >>> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> >>> the CPUs of the different cores in this case. There's supporting
> >>> hotplugging CPUs as an issue. Adding the tool support for this
> >>> wouldn't be hard and it may be less messy (although old perf tools on
> >>> new kernels wouldn't know about these files).
> >>
> >> I went over the two approaches and below are my thoughts,
> >>
> >> New PMU approach:
> >> Pros
> >> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
> >> Cons
> >> * More code changes in rapl.c
> >>
> >> Event specific cpumask approach:
> >> Pros
> >> * It might be easier to add different-scope events within the same PMU in future (although currently I'm not able to find such a use case, apart from the RAPL pkg and core energy counters)
> >> Cons
> >> * Both new kernel and perf tool will be required to use the new per-core event.
> >>
> >> I feel that while the event-specific cpumask is a viable alternative to the new-PMU approach, I don't see any clear pros to selecting it over the current approach. Please let me know if you have any design-related concerns with the addition of a new PMU, or whether your concern is mostly about the amount of code changes in this approach.
> >
> > Thanks Dhananjay, and thanks for taking the time for an objective
> > discussion on the mailing list. I'm very supportive of seeing the work
> > you are enabling land.
> >
> > My concern comes from the tool side. If every PMU starts to have
> > variants for the sake of the cpumask, what does this mean for
> > aggregation in the perf tool? There is another issue around event
> > grouping: you can't group events across PMUs, but my feeling is that
> > event grouping needs to be rethought. By default the power_per_core
> > events are going to be aggregated together by the perf tool, which
> > then loses their per-core-ness.
>
> Yea right, maybe we need to fix this behavior.
>
> >
> > I was trying to think of the same problem but in other PMUs. One
> > thought I had was the difference between hyperthread and core events.
> > At least on Intel, some events can only count for the whole core, not
> > per hyperthread. The events don't have a cpu_per_core PMU; they just
> > use the regular cpu one, and so the cpumask is set to all online
> > hyperthreads. When a per-core event is programmed it will get
> > programmed on every hyperthread and so counted twice for the core.
> > This at the least wastes a counter, but it probably also yields twice
> > the expected count, as every event is counted twice then aggregated. So
> > this is just wrong and the user is expected to be smart and fix it
> > (checking the x86 events, there is a convention to use a ".ALL" or
> > ".ANY" suffix for core-wide events iirc). If we had a cpumask for
> > these events then we could avoid the double setting, free up a counter
> > and avoid double counting. Were we to fix things the way it is done in
> > this patch series, we'd add another PMU.
>
> Yes, this seems like a valid use case for event-specific cpumasks.
>
> >
> > My feeling is that in the longer term a per-event cpumask looks
> > cleaner. I think either way you need a new kernel for the new RAPL
> > events. The problem of an old perf tool with a new kernel doesn't
> > normally happen with distributions, as they match the perf tool to
> > the kernel version, needlessly losing features and fixes along the
> > way. If the new PMU is going to get backported through fixes, then we
> > can do similar for reading the per-event cpumask. I'd be tempted not
> > to do this and to focus on the next LTS kernel, getting the kernel and
> > tool fixes in as necessary.
>
> Makes sense. Even though this approach will require more effort, it seems
> worthwhile, as it would help things down the line (make it easier to have
> heterogeneous-scope events within a PMU). I'll need to go through the perf
> tool to see how we can design this. I'll get back with an RFC series,
> probably once I have an initial design in mind.
Hi Dhananjay,
I can have a go at the perf tool side of this - I probably know the
way around the code best. Basically we need to do something similar to
how other "<event>.<setting>" values are parsed:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n335
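
(A rough sketch of the kind of split involved; this is a hypothetical
helper, not the actual pmu.c code at the link above:)

#include <stdio.h>
#include <string.h>

/* Split "energy-per-core.cpus" into event "energy-per-core" and
 * setting "cpus". */
static int split_event_setting(const char *name,
			       char *event, size_t elen,
			       char *setting, size_t slen)
{
	const char *dot = strrchr(name, '.');

	if (!dot || dot == name || !dot[1])
		return -1;	/* no ".setting" suffix */
	snprintf(event, elen, "%.*s", (int)(dot - name), name);
	snprintf(setting, slen, "%s", dot + 1);
	return 0;
}

int main(void)
{
	char ev[64], set[64];

	if (!split_event_setting("energy-per-core.cpus",
				 ev, sizeof(ev), set, sizeof(set)))
		printf("event=%s setting=%s\n", ev, set);
	return 0;
}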
The cpumask handling in the perf tool is a little weird as there is a
summary value in the evlist. Anyway, I can do that if you want to spin
the RAPL/power PMU side of things.
Thanks,
Ian
> Thanks,
> Dhananjay
> [...]
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-16 22:47 ` Ian Rogers
@ 2024-07-17 8:04 ` Dhananjay Ugwekar
2024-07-17 15:36 ` Ian Rogers
0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-17 8:04 UTC (permalink / raw)
To: Ian Rogers
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
On 7/17/2024 4:17 AM, Ian Rogers wrote:
> [...]
>
> Hi Dhananjay,
>
> I can have a go at the perf tool side of this - I probably know the
> way around the code best. Basically we need to do something similar to
> how other "<event>.<setting>" values are parsed:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n335
Yea, right.
> The cpumask handling in the perf tool is a little weird as there is a
> summary value in the evlist. Anyway, I can do that if you want to spin
> the RAPL/power PMU side of things.
Sounds great! I'll be happy to refactor the RAPL code to use the event.cpumask
feature to add the per-core energy counter. Also, please let me know if you need
any help from me on the perf tool side.
Regards,
Dhananjay
>
> Thanks,
> Ian
> [...]
* Re: [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs
2024-07-17 8:04 ` Dhananjay Ugwekar
@ 2024-07-17 15:36 ` Ian Rogers
0 siblings, 0 replies; 31+ messages in thread
From: Ian Rogers @ 2024-07-17 15:36 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86, kees,
gustavoars, rui.zhang, oleksandr, linux-perf-users, linux-kernel,
linux-hardening, ananth.narayan, gautham.shenoy, kprateek.nayak,
ravi.bangoria, sandipan.das, linux-pm
On Wed, Jul 17, 2024 at 1:05 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
...
> Sounds great! I'll be happy to refactor the RAPL code to use the event.cpumask
> feature to add the per-core energy counter. Also, please let me know if you need
> any help from me on the perf tool side.
I hope to send out some patches soon. One thought I have is that
"event.cpumask" is probably a less intention-revealing name than
"event.cpus", so I'm going to code assuming the suffix is ".cpus", but
obviously we can change it to ".cpumask". To me the "mask" makes it
sound like we "and" the values with something, hence wanting to avoid
the confusion. There's much confusion in the area already, such as the
term "cpu map", which does not implement some kind of map.
Thanks,
Ian
end of thread, other threads: [~2024-07-17 15:36 UTC | newest]
Thread overview: 31+ messages:
2024-07-11 10:24 [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
2024-07-11 10:24 ` [PATCH v4 01/11] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
2024-07-11 10:24 ` [PATCH v4 02/11] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
2024-07-12 2:04 ` Zhang, Rui
2024-07-12 3:28 ` Dhananjay Ugwekar
2024-07-16 8:56 ` Dhananjay Ugwekar
2024-07-11 10:24 ` [PATCH v4 03/11] perf/x86/rapl: Rename rapl_pmu variables Dhananjay Ugwekar
2024-07-12 2:14 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 04/11] perf/x86/rapl: Make rapl_model struct global Dhananjay Ugwekar
2024-07-12 3:03 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 05/11] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
2024-07-12 3:07 ` Zhang, Rui
2024-07-12 3:30 ` Dhananjay Ugwekar
2024-07-11 10:24 ` [PATCH v4 06/11] perf/x86/rapl: Add wrapper for online/offline functions Dhananjay Ugwekar
2024-07-12 3:20 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 07/11] perf/x86/rapl: Add an argument to the cleanup and init functions Dhananjay Ugwekar
2024-07-12 3:22 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 08/11] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
2024-07-12 3:28 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 09/11] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
2024-07-12 3:29 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 10/11] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
2024-07-12 6:15 ` Zhang, Rui
2024-07-11 10:24 ` [PATCH v4 11/11] perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu Dhananjay Ugwekar
2024-07-11 22:23 ` [PATCH v4 00/11] Add per-core RAPL energy counter support for AMD CPUs Ian Rogers
2024-07-15 9:35 ` Dhananjay Ugwekar
2024-07-15 15:22 ` Ian Rogers
2024-07-16 8:42 ` Dhananjay Ugwekar
2024-07-16 22:47 ` Ian Rogers
2024-07-17 8:04 ` Dhananjay Ugwekar
2024-07-17 15:36 ` Ian Rogers