[PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs
@ 2024-06-24  5:58 Dhananjay Ugwekar
  2024-06-24  5:58 ` [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:58 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

Currently the energy-cores event in the power PMU aggregates energy
consumption data at a package level. On the other hand the core energy
RAPL counter in AMD CPUs has a core scope (which means the energy 
consumption is recorded separately for each core). Earlier efforts to add
the core event in the power PMU had failed [1], due to the difference in 
the scope of these two events. Hence, there is a need for a new core scope
PMU.

This patchset adds a new "power_per_core" PMU alongside the existing
"power" PMU, which will be responsible for collecting the new
"energy-per-core" event.

Tested the package level and core level PMU counters with workloads
pinned to different CPUs.

Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa 
machine:

$ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1

 Performance counter stats for 'system wide':

S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/

[1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/

This patchset applies cleanly on top of v6.10-rc4 as well as latest 
tip/master.

v3 changes:
* Patch 1 added to introduce the logical_core_id which is unique across
  the system (Prateek)
* Use the unique topology_logical_core_id() instead of
  topology_core_id() (which is only unique within a package on tested
  AMD and Intel systems) in Patch 10

v2 changes:
* Patches 6,7,8 added to split some changes out of the last patch
* Use container_of to get the rapl_pmus from event variable (Rui)
* Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
* Use event id 0x1 for energy-per-core event (Rui)
* Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
  per-core counter hw support (Rui)

Dhananjay Ugwekar (9):
  perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  perf/x86/rapl: Rename rapl_pmu variables
  perf/x86/rapl: Make rapl_model struct global
  perf/x86/rapl: Move cpumask variable to rapl_pmus struct
  perf/x86/rapl: Add wrapper for online/offline functions
  perf/x86/rapl: Add an argument to the cleanup and init functions
  perf/x86/rapl: Modify the generic variable names to *_pkg*
  perf/x86/rapl: Remove the global variable rapl_msrs
  perf/x86/rapl: Add per-core energy counter support for AMD CPUs

K Prateek Nayak (1):
  x86/topology: Introduce topology_logical_core_id()

 Documentation/arch/x86/topology.rst   |   4 +
 arch/x86/events/rapl.c                | 418 ++++++++++++++++++--------
 arch/x86/include/asm/processor.h      |   1 +
 arch/x86/include/asm/topology.h       |   1 +
 arch/x86/kernel/cpu/debugfs.c         |   1 +
 arch/x86/kernel/cpu/topology_common.c |   1 +
 6 files changed, 305 insertions(+), 121 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id()
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-06-24  5:58 ` Dhananjay Ugwekar
  2024-06-26 14:14   ` Zhang, Rui
  2024-06-24  5:58 ` [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:58 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

From: K Prateek Nayak <kprateek.nayak@amd.com>

On x86, topology_core_id() returns a unique core ID within the PKG
domain. Looking at match_smt() suggests that a core ID just needs to be
unique within a LLC domain. For use cases such as the per-core RAPL PMU,
there exists a need for a unique core ID across the entire system with
multiple PKG domains. Introduce topology_logical_core_id() to derive a
unique core ID across the system.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 Documentation/arch/x86/topology.rst   | 4 ++++
 arch/x86/include/asm/processor.h      | 1 +
 arch/x86/include/asm/topology.h       | 1 +
 arch/x86/kernel/cpu/debugfs.c         | 1 +
 arch/x86/kernel/cpu/topology_common.c | 1 +
 5 files changed, 8 insertions(+)

diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst
index 7352ab89a55a..c12837e61bda 100644
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -135,6 +135,10 @@ Thread-related topology information in the kernel:
     The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
     "core_id."
 
+  - topology_logical_core_id();
+
+    The logical core ID to which a thread belongs.
+
 
 
 System topology examples
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4fd3364dbc73..e1eeb42edeaf 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -98,6 +98,7 @@ struct cpuinfo_topology {
 	// Logical ID mappings
 	u32			logical_pkg_id;
 	u32			logical_die_id;
+	u32			logical_core_id;
 
 	// AMD Node ID and Nodes per Package info
 	u32			amd_node_id;
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index abe3a8f22cbd..2a6dbf965d92 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -137,6 +137,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 #define topology_logical_package_id(cpu)	(cpu_data(cpu).topo.logical_pkg_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)
 #define topology_logical_die_id(cpu)		(cpu_data(cpu).topo.logical_die_id)
+#define topology_logical_core_id(cpu)		(cpu_data(cpu).topo.logical_core_id)
 #define topology_die_id(cpu)			(cpu_data(cpu).topo.die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).topo.core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
diff --git a/arch/x86/kernel/cpu/debugfs.c b/arch/x86/kernel/cpu/debugfs.c
index 3baf3e435834..b1eb6d7828db 100644
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -24,6 +24,7 @@ static int cpu_debug_show(struct seq_file *m, void *p)
 	seq_printf(m, "core_id:             %u\n", c->topo.core_id);
 	seq_printf(m, "logical_pkg_id:      %u\n", c->topo.logical_pkg_id);
 	seq_printf(m, "logical_die_id:      %u\n", c->topo.logical_die_id);
+	seq_printf(m, "logical_core_id:     %u\n", c->topo.logical_core_id);
 	seq_printf(m, "llc_id:              %u\n", c->topo.llc_id);
 	seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
 	seq_printf(m, "amd_node_id:         %u\n", c->topo.amd_node_id);
diff --git a/arch/x86/kernel/cpu/topology_common.c b/arch/x86/kernel/cpu/topology_common.c
index 9a6069e7133c..23722aa21e2f 100644
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -151,6 +151,7 @@ static void topo_set_ids(struct topo_scan *tscan, bool early)
 	if (!early) {
 		c->topo.logical_pkg_id = topology_get_logical_id(apicid, TOPO_PKG_DOMAIN);
 		c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
+		c->topo.logical_core_id = topology_get_logical_id(apicid, TOPO_CORE_DOMAIN);
 	}
 
 	/* Package relative core ID */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id()
  2024-06-24  5:58 ` [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
@ 2024-06-26 14:14   ` Zhang, Rui
  0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-06-26 14:14 UTC (permalink / raw)
  To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	Hunter, Adrian, mingo@redhat.com, irogers@google.com,
	tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
	acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
	x86@kernel.org, namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	sandipan.das@amd.com, ananth.narayan@amd.com,
	linux-pm@vger.kernel.org

>  For use cases such as the per-core RAPL PMU,
> there exists a need for a unique core ID across the entire system
> with
> multiple PKG domains.

Agreed. And there are other cases that a system unique core ID is
needed, like in drivers/hwmon/coretemp.c, where per-core temperature
info is saved in an array.

>  Introduce topology_logical_core_id() to derive a
> unique core ID across the system.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
>  Documentation/arch/x86/topology.rst   | 4 ++++
>  arch/x86/include/asm/processor.h      | 1 +
>  arch/x86/include/asm/topology.h       | 1 +
>  arch/x86/kernel/cpu/debugfs.c         | 1 +
>  arch/x86/kernel/cpu/topology_common.c | 1 +
>  5 files changed, 8 insertions(+)
> 
> diff --git a/Documentation/arch/x86/topology.rst
> b/Documentation/arch/x86/topology.rst
> index 7352ab89a55a..c12837e61bda 100644
> --- a/Documentation/arch/x86/topology.rst
> +++ b/Documentation/arch/x86/topology.rst
> @@ -135,6 +135,10 @@ Thread-related topology information in the
> kernel:
>      The ID of the core to which a thread belongs. It is also printed
> in /proc/cpuinfo
>      "core_id."
>  
> +  - topology_logical_core_id();
> +
> +    The logical core ID to which a thread belongs.
> +
>  
>  
>  System topology examples
> diff --git a/arch/x86/include/asm/processor.h
> b/arch/x86/include/asm/processor.h
> index 4fd3364dbc73..e1eeb42edeaf 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -98,6 +98,7 @@ struct cpuinfo_topology {
>         // Logical ID mappings
>         u32                     logical_pkg_id;
>         u32                     logical_die_id;
> +       u32                     logical_core_id;
>  
>         // AMD Node ID and Nodes per Package info
>         u32                     amd_node_id;
> diff --git a/arch/x86/include/asm/topology.h
> b/arch/x86/include/asm/topology.h
> index abe3a8f22cbd..2a6dbf965d92 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -137,6 +137,7 @@ extern const struct cpumask
> *cpu_clustergroup_mask(int cpu);
>  #define
> topology_logical_package_id(cpu)       (cpu_data(cpu).topo.logical_pk
> g_id)
>  #define
> topology_physical_package_id(cpu)      (cpu_data(cpu).topo.pkg_id)
>  #define
> topology_logical_die_id(cpu)           (cpu_data(cpu).topo.logical_di
> e_id)
> +#define
> topology_logical_core_id(cpu)          (cpu_data(cpu).topo.logical_co
> re_id)
>  #define
> topology_die_id(cpu)                   (cpu_data(cpu).topo.die_id)
>  #define
> topology_core_id(cpu)                  (cpu_data(cpu).topo.core_id)
>  #define topology_ppin(cpu)                     (cpu_data(cpu).ppin)
> diff --git a/arch/x86/kernel/cpu/debugfs.c
> b/arch/x86/kernel/cpu/debugfs.c
> index 3baf3e435834..b1eb6d7828db 100644
> --- a/arch/x86/kernel/cpu/debugfs.c
> +++ b/arch/x86/kernel/cpu/debugfs.c
> @@ -24,6 +24,7 @@ static int cpu_debug_show(struct seq_file *m, void
> *p)
>         seq_printf(m, "core_id:             %u\n", c->topo.core_id);
>         seq_printf(m, "logical_pkg_id:      %u\n", c-
> >topo.logical_pkg_id);
>         seq_printf(m, "logical_die_id:      %u\n", c-
> >topo.logical_die_id);
> +       seq_printf(m, "logical_core_id:     %u\n", c-
> >topo.logical_core_id);
>         seq_printf(m, "llc_id:              %u\n", c->topo.llc_id);
>         seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
>         seq_printf(m, "amd_node_id:         %u\n", c-
> >topo.amd_node_id);
> diff --git a/arch/x86/kernel/cpu/topology_common.c
> b/arch/x86/kernel/cpu/topology_common.c
> index 9a6069e7133c..23722aa21e2f 100644
> --- a/arch/x86/kernel/cpu/topology_common.c
> +++ b/arch/x86/kernel/cpu/topology_common.c
> @@ -151,6 +151,7 @@ static void topo_set_ids(struct topo_scan *tscan,
> bool early)
>         if (!early) {
>                 c->topo.logical_pkg_id =
> topology_get_logical_id(apicid, TOPO_PKG_DOMAIN);
>                 c->topo.logical_die_id =
> topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
> +               c->topo.logical_core_id =
> topology_get_logical_id(apicid, TOPO_CORE_DOMAIN);
>         }
>  
>         /* Package relative core ID */


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
  2024-06-24  5:58 ` [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
@ 2024-06-24  5:58 ` Dhananjay Ugwekar
  2024-06-26 14:18   ` Zhang, Rui
  2024-07-01 12:59   ` Peter Zijlstra
  2024-06-24  5:59 ` [PATCH v3 03/10] perf/x86/rapl: Rename rapl_pmu variables Dhananjay Ugwekar
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:58 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_die_cpumask() and topology_logical_die_id() macros, no longer
return the package cpumask and package id, instead they return the CCD
(Core Complex Die) mask and id respectively. This leads to the energy-pkg
event scope to be modified to CCD instead of package.

Replacing these macros with their package counterparts fixes the
energy-pkg event for AMD CPUs.

However due to the difference between the scope of energy-pkg event for
Intel and AMD CPUs, we have to replace these macros conditionally only for
AMD CPUs.

On a 12 CCD 1 Package AMD Zen4 Genoa machine:

Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88.

The expected cpumask here is supposed to be just "0", as it is a package
scope event, only one CPU will be collecting the event for all the CPUs in
the package.

After:
$ cat /sys/devices/power/cpumask
0

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
---
 arch/x86/events/rapl.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..73be25e1f4b4 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,10 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
 	.event_str	= str,							\
 };
 
+#define rapl_pmu_is_pkg_scope()				\
+	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
+	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
 struct rapl_pmu {
 	raw_spinlock_t		lock;
 	int			n_active;
@@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
 
+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+					 topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+					 topology_die_cpumask(cpu);
+}
+
 static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
 {
-	unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
 
 	/*
 	 * The unsigned check also catches the '-1' return value for non
@@ -543,6 +559,7 @@ static struct perf_msr amd_rapl_msrs[] = {
 
 static int rapl_cpu_offline(unsigned int cpu)
 {
+	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
@@ -552,7 +569,7 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 	pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+	target = cpumask_any_but(rapl_pmu_cpumask, cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
@@ -565,6 +582,8 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 static int rapl_cpu_online(unsigned int cpu)
 {
+	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
+	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
@@ -579,14 +598,14 @@ static int rapl_cpu_online(unsigned int cpu)
 		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 		rapl_hrtimer_init(pmu);
 
-		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+		rapl_pmus->pmus[rapl_pmu_idx] = pmu;
 	}
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+	target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
 	if (target < nr_cpu_ids)
 		return 0;
 
@@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
 {
 	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
 
+	if (rapl_pmu_is_pkg_scope())
+		nr_rapl_pmu = topology_max_packages();
+
 	rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
 	if (!rapl_pmus)
 		return -ENOMEM;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-06-24  5:58 ` [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
@ 2024-06-26 14:18   ` Zhang, Rui
  2024-06-26 15:34     ` Dhananjay Ugwekar
  2024-07-01 12:59   ` Peter Zijlstra
  1 sibling, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-06-26 14:18 UTC (permalink / raw)
  To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	Hunter, Adrian, mingo@redhat.com, irogers@google.com,
	tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
	acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
	x86@kernel.org, namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	sandipan.das@amd.com, ananth.narayan@amd.com,
	linux-pm@vger.kernel.org

On Mon, 2024-06-24 at 05:58 +0000, Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
> leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_die_cpumask() and topology_logical_die_id() macros, no
> longer
> return the package cpumask and package id, instead they return the
> CCD
> (Core Complex Die) mask and id respectively. This leads to the
> energy-pkg
> event scope to be modified to CCD instead of package.
> 
> Replacing these macros with their package counterparts fixes the
> energy-pkg event for AMD CPUs.
> 
> However due to the difference between the scope of energy-pkg event
> for
> Intel and AMD CPUs, we have to replace these macros conditionally
> only for
> AMD CPUs.
> 
> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
> 
> Before:
> $ cat /sys/devices/power/cpumask
> 0,8,16,24,32,40,48,56,64,72,80,88.
> 
> The expected cpumask here is supposed to be just "0", as it is a
> package
> scope event, only one CPU will be collecting the event for all the
> CPUs in
> the package.
> 
> After:
> $ cat /sys/devices/power/cpumask
> 0
> 
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
> 0x80000026 leaf")

As there is no code change compared with V1, I think you missed my
Reviewed-by tag
https://lore.kernel.org/all/e1f70a09f85dbd0ee3f32dffea37993e141269d0.camel@intel.com/

thanks,
rui

> ---
>  arch/x86/events/rapl.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b985ca79cf97..73be25e1f4b4 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,10 @@ static struct perf_pmu_events_attr
> event_attr_##v = {                              \
>         .event_str      =
> str,                                                  \
>  };
>  
> +#define rapl_pmu_is_pkg_scope()                                \
> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
>  struct rapl_pmu {
>         raw_spinlock_t          lock;
>         int                     n_active;
> @@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
>  static u64 rapl_timer_ms;
>  static struct perf_msr *rapl_msrs;
>  
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> +       return rapl_pmu_is_pkg_scope() ?
> topology_logical_package_id(cpu) :
> +                                       
> topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> +       return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> +                                        topology_die_cpumask(cpu);
> +}
> +
>  static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
>  {
> -       unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>  
>         /*
>          * The unsigned check also catches the '-1' return value for
> non
> @@ -543,6 +559,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>  
>  static int rapl_cpu_offline(unsigned int cpu)
>  {
> +       const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
> @@ -552,7 +569,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>  
>         pmu->cpu = -1;
>         /* Find a new cpu to collect rapl events */
> -       target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
> +       target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>  
>         /* Migrate rapl events to the new target */
>         if (target < nr_cpu_ids) {
> @@ -565,6 +582,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>  
>  static int rapl_cpu_online(unsigned int cpu)
>  {
> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> +       const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
> @@ -579,14 +598,14 @@ static int rapl_cpu_online(unsigned int cpu)
>                 pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>                 rapl_hrtimer_init(pmu);
>  
> -               rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> +               rapl_pmus->pmus[rapl_pmu_idx] = pmu;
>         }
>  
>         /*
>          * Check if there is an online cpu in the package which
> collects rapl
>          * events already.
>          */
> -       target = cpumask_any_and(&rapl_cpu_mask,
> topology_die_cpumask(cpu));
> +       target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
>         if (target < nr_cpu_ids)
>                 return 0;
>  
> @@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
>  {
>         int nr_rapl_pmu = topology_max_packages() *
> topology_max_dies_per_package();
>  
> +       if (rapl_pmu_is_pkg_scope())
> +               nr_rapl_pmu = topology_max_packages();
> +
>         rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus,
> nr_rapl_pmu), GFP_KERNEL);
>         if (!rapl_pmus)
>                 return -ENOMEM;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-06-26 14:18   ` Zhang, Rui
@ 2024-06-26 15:34     ` Dhananjay Ugwekar
  0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-26 15:34 UTC (permalink / raw)
  To: Zhang, Rui, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, Hunter, Adrian, mingo@redhat.com,
	irogers@google.com, tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, bp@alien8.de, acme@kernel.org,
	oleksandr@natalenko.name, jolsa@kernel.org, x86@kernel.org,
	namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	sandipan.das@amd.com, ananth.narayan@amd.com,
	linux-pm@vger.kernel.org

Hi Rui,

On 6/26/2024 7:48 PM, Zhang, Rui wrote:
> On Mon, 2024-06-24 at 05:58 +0000, Dhananjay Ugwekar wrote:
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
>> leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_die_cpumask() and topology_logical_die_id() macros, no
>> longer
>> return the package cpumask and package id, instead they return the
>> CCD
>> (Core Complex Die) mask and id respectively. This leads to the
>> energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> Replacing these macros with their package counterparts fixes the
>> energy-pkg event for AMD CPUs.
>>
>> However due to the difference between the scope of energy-pkg event
>> for
>> Intel and AMD CPUs, we have to replace these macros conditionally
>> only for
>> AMD CPUs.
>>
>> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>>
>> Before:
>> $ cat /sys/devices/power/cpumask
>> 0,8,16,24,32,40,48,56,64,72,80,88.
>>
>> The expected cpumask here is supposed to be just "0", as it is a
>> package
>> scope event, only one CPU will be collecting the event for all the
>> CPUs in
>> the package.
>>
>> After:
>> $ cat /sys/devices/power/cpumask
>> 0
>>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
>> 0x80000026 leaf")
> 
> As there is no code change compared with V1, I think you missed my
> Reviewed-by tag
> https://lore.kernel.org/all/e1f70a09f85dbd0ee3f32dffea37993e141269d0.camel@intel.com/

Yes!, forgot to add your Reviewed-by tag, will add in the next version.

Thanks,
Dhananjay

> 
> thanks,
> rui
> 
>> ---
>>  arch/x86/events/rapl.c | 30 ++++++++++++++++++++++++++----
>>  1 file changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index b985ca79cf97..73be25e1f4b4 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -103,6 +103,10 @@ static struct perf_pmu_events_attr
>> event_attr_##v = {                              \
>>         .event_str      =
>> str,                                                  \
>>  };
>>  
>> +#define rapl_pmu_is_pkg_scope()                                \
>> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
>> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>>  struct rapl_pmu {
>>         raw_spinlock_t          lock;
>>         int                     n_active;
>> @@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
>>  static u64 rapl_timer_ms;
>>  static struct perf_msr *rapl_msrs;
>>  
>> +static inline unsigned int get_rapl_pmu_idx(int cpu)
>> +{
>> +       return rapl_pmu_is_pkg_scope() ?
>> topology_logical_package_id(cpu) :
>> +                                       
>> topology_logical_die_id(cpu);
>> +}
>> +
>> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
>> +{
>> +       return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
>> +                                        topology_die_cpumask(cpu);
>> +}
>> +
>>  static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
>>  {
>> -       unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
>> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>>  
>>         /*
>>          * The unsigned check also catches the '-1' return value for
>> non
>> @@ -543,6 +559,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>>  
>>  static int rapl_cpu_offline(unsigned int cpu)
>>  {
>> +       const struct cpumask *rapl_pmu_cpumask =
>> get_rapl_pmu_cpumask(cpu);
>>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>>         int target;
>>  
>> @@ -552,7 +569,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>>  
>>         pmu->cpu = -1;
>>         /* Find a new cpu to collect rapl events */
>> -       target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>> +       target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>>  
>>         /* Migrate rapl events to the new target */
>>         if (target < nr_cpu_ids) {
>> @@ -565,6 +582,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>>  
>>  static int rapl_cpu_online(unsigned int cpu)
>>  {
>> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>> +       const struct cpumask *rapl_pmu_cpumask =
>> get_rapl_pmu_cpumask(cpu);
>>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>>         int target;
>>  
>> @@ -579,14 +598,14 @@ static int rapl_cpu_online(unsigned int cpu)
>>                 pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>>                 rapl_hrtimer_init(pmu);
>>  
>> -               rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
>> +               rapl_pmus->pmus[rapl_pmu_idx] = pmu;
>>         }
>>  
>>         /*
>>          * Check if there is an online cpu in the package which
>> collects rapl
>>          * events already.
>>          */
>> -       target = cpumask_any_and(&rapl_cpu_mask,
>> topology_die_cpumask(cpu));
>> +       target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
>>         if (target < nr_cpu_ids)
>>                 return 0;
>>  
>> @@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
>>  {
>>         int nr_rapl_pmu = topology_max_packages() *
>> topology_max_dies_per_package();
>>  
>> +       if (rapl_pmu_is_pkg_scope())
>> +               nr_rapl_pmu = topology_max_packages();
>> +
>>         rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus,
>> nr_rapl_pmu), GFP_KERNEL);
>>         if (!rapl_pmus)
>>                 return -ENOMEM;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-06-24  5:58 ` [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
  2024-06-26 14:18   ` Zhang, Rui
@ 2024-07-01 12:59   ` Peter Zijlstra
  2024-07-02  9:39     ` Dhananjay Ugwekar
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2024-07-01 12:59 UTC (permalink / raw)
  To: Dhananjay Ugwekar
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86,
	kees, gustavoars, rui.zhang, oleksandr, linux-perf-users,
	linux-kernel, linux-hardening, ananth.narayan, gautham.shenoy,
	kprateek.nayak, ravi.bangoria, sandipan.das, linux-pm

On Mon, Jun 24, 2024 at 05:58:59AM +0000, Dhananjay Ugwekar wrote:
 	
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b985ca79cf97..73be25e1f4b4 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,10 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
>  	.event_str	= str,							\
>  };
>  
> +#define rapl_pmu_is_pkg_scope()				\
> +	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
> +	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
>  struct rapl_pmu {
>  	raw_spinlock_t		lock;
>  	int			n_active;
> @@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
>  static u64 rapl_timer_ms;
>  static struct perf_msr *rapl_msrs;
>  
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> +	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> +					 topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> +	return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> +					 topology_die_cpumask(cpu);
> +}

This wants a comment. The next time someone looks at this we're going to
be confused.

> @@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
>  {
>  	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>  
> +	if (rapl_pmu_is_pkg_scope())
> +		nr_rapl_pmu = topology_max_packages();
> +

How about:

	int nr_rapl_pmu = topology_max_packages();
	if (!rapl_pmu_is_pkg_scope())
		nr_rapl_pmu *= topology_max_dies_per_package();

hmm?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-01 12:59   ` Peter Zijlstra
@ 2024-07-02  9:39     ` Dhananjay Ugwekar
  0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-02  9:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86,
	kees, gustavoars, rui.zhang, oleksandr, linux-perf-users,
	linux-kernel, linux-hardening, ananth.narayan, gautham.shenoy,
	kprateek.nayak, ravi.bangoria, sandipan.das, linux-pm

Hello Peter,

On 7/1/2024 6:29 PM, Peter Zijlstra wrote:
> On Mon, Jun 24, 2024 at 05:58:59AM +0000, Dhananjay Ugwekar wrote:
>  	
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index b985ca79cf97..73be25e1f4b4 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -103,6 +103,10 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
>>  	.event_str	= str,							\
>>  };
>>  
>> +#define rapl_pmu_is_pkg_scope()				\
>> +	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
>> +	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>>  struct rapl_pmu {
>>  	raw_spinlock_t		lock;
>>  	int			n_active;
>> @@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
>>  static u64 rapl_timer_ms;
>>  static struct perf_msr *rapl_msrs;
>>  
>> +static inline unsigned int get_rapl_pmu_idx(int cpu)
>> +{
>> +	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>> +					 topology_logical_die_id(cpu);
>> +}
>> +
>> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
>> +{
>> +	return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
>> +					 topology_die_cpumask(cpu);
>> +}
> 
> This wants a comment. The next time someone looks at this we're going to
> be confused.

Yes, will add a comment.

> 
>> @@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
>>  {
>>  	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>>  
>> +	if (rapl_pmu_is_pkg_scope())
>> +		nr_rapl_pmu = topology_max_packages();
>> +
> 
> How about:
> 
> 	int nr_rapl_pmu = topology_max_packages();
> 	if (!rapl_pmu_is_pkg_scope())
> 		nr_rapl_pmu *= topology_max_dies_per_package();
> 
> hmm?

Sure, I'm okay with this as well, will modify in next version.

Thanks,
Dhananjay

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 03/10] perf/x86/rapl: Rename rapl_pmu variables
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
  2024-06-24  5:58 ` [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
  2024-06-24  5:58 ` [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-24  5:59 ` [PATCH v3 04/10] perf/x86/rapl: Make rapl_model struct global Dhananjay Ugwekar
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

Rename struct rapl_pmu variables from "pmu" to "rapl_pmu", to
avoid any confusion between the variables of two different
structs pmu and rapl_pmu. As rapl_pmu also contains a pointer to
struct pmu, which leads to situations in code like pmu->pmu,
which is needlessly confusing. Above scenario is replaced with
much more readable rapl_pmu->pmu with this change.

Also rename "pmus" member in rapl_pmus struct, for same reason.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 104 ++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 73be25e1f4b4..b4e2073a178e 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -120,7 +120,7 @@ struct rapl_pmu {
 struct rapl_pmus {
 	struct pmu		pmu;
 	unsigned int		nr_rapl_pmu;
-	struct rapl_pmu		*pmus[] __counted_by(nr_rapl_pmu);
+	struct rapl_pmu		*rapl_pmu[] __counted_by(nr_rapl_pmu);
 };
 
 enum rapl_unit_quirk {
@@ -164,7 +164,7 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
 	 * The unsigned check also catches the '-1' return value for non
 	 * existent mappings in the topology map.
 	 */
-	return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;
+	return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
 }
 
 static inline u64 rapl_read_counter(struct perf_event *event)
@@ -228,34 +228,34 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)
 
 static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 {
-	struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
+	struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
 	struct perf_event *event;
 	unsigned long flags;
 
-	if (!pmu->n_active)
+	if (!rapl_pmu->n_active)
 		return HRTIMER_NORESTART;
 
-	raw_spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
 
-	list_for_each_entry(event, &pmu->active_list, active_entry)
+	list_for_each_entry(event, &rapl_pmu->active_list, active_entry)
 		rapl_event_update(event);
 
-	raw_spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
 
-	hrtimer_forward_now(hrtimer, pmu->timer_interval);
+	hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);
 
 	return HRTIMER_RESTART;
 }
 
-static void rapl_hrtimer_init(struct rapl_pmu *pmu)
+static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
 {
-	struct hrtimer *hr = &pmu->hrtimer;
+	struct hrtimer *hr = &rapl_pmu->hrtimer;
 
 	hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hr->function = rapl_hrtimer_handle;
 }
 
-static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
+static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
 				   struct perf_event *event)
 {
 	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
@@ -263,39 +263,39 @@ static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
 
 	event->hw.state = 0;
 
-	list_add_tail(&event->active_entry, &pmu->active_list);
+	list_add_tail(&event->active_entry, &rapl_pmu->active_list);
 
 	local64_set(&event->hw.prev_count, rapl_read_counter(event));
 
-	pmu->n_active++;
-	if (pmu->n_active == 1)
-		rapl_start_hrtimer(pmu);
+	rapl_pmu->n_active++;
+	if (rapl_pmu->n_active == 1)
+		rapl_start_hrtimer(rapl_pmu);
 }
 
 static void rapl_pmu_event_start(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = event->pmu_private;
+	struct rapl_pmu *rapl_pmu = event->pmu_private;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&pmu->lock, flags);
-	__rapl_pmu_event_start(pmu, event);
-	raw_spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
+	__rapl_pmu_event_start(rapl_pmu, event);
+	raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
 }
 
 static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = event->pmu_private;
+	struct rapl_pmu *rapl_pmu = event->pmu_private;
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
 
 	/* mark event as deactivated and stopped */
 	if (!(hwc->state & PERF_HES_STOPPED)) {
-		WARN_ON_ONCE(pmu->n_active <= 0);
-		pmu->n_active--;
-		if (pmu->n_active == 0)
-			hrtimer_cancel(&pmu->hrtimer);
+		WARN_ON_ONCE(rapl_pmu->n_active <= 0);
+		rapl_pmu->n_active--;
+		if (rapl_pmu->n_active == 0)
+			hrtimer_cancel(&rapl_pmu->hrtimer);
 
 		list_del(&event->active_entry);
 
@@ -313,23 +313,23 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 		hwc->state |= PERF_HES_UPTODATE;
 	}
 
-	raw_spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
 }
 
 static int rapl_pmu_event_add(struct perf_event *event, int mode)
 {
-	struct rapl_pmu *pmu = event->pmu_private;
+	struct rapl_pmu *rapl_pmu = event->pmu_private;
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&pmu->lock, flags);
+	raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
 
 	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
 
 	if (mode & PERF_EF_START)
-		__rapl_pmu_event_start(pmu, event);
+		__rapl_pmu_event_start(rapl_pmu, event);
 
-	raw_spin_unlock_irqrestore(&pmu->lock, flags);
+	raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
 
 	return 0;
 }
@@ -343,7 +343,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
 {
 	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
 	int bit, ret = 0;
-	struct rapl_pmu *pmu;
+	struct rapl_pmu *rapl_pmu;
 
 	/* only look at RAPL events */
 	if (event->attr.type != rapl_pmus->pmu.type)
@@ -373,11 +373,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 
 	/* must be done before validate_group */
-	pmu = cpu_to_rapl_pmu(event->cpu);
-	if (!pmu)
+	rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+	if (!rapl_pmu)
 		return -EINVAL;
-	event->cpu = pmu->cpu;
-	event->pmu_private = pmu;
+	event->cpu = rapl_pmu->cpu;
+	event->pmu_private = rapl_pmu;
 	event->hw.event_base = rapl_msrs[bit].msr;
 	event->hw.config = cfg;
 	event->hw.idx = bit;
@@ -560,22 +560,22 @@ static struct perf_msr amd_rapl_msrs[] = {
 static int rapl_cpu_offline(unsigned int cpu)
 {
 	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+	struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
 	/* Check if exiting cpu is used for collecting rapl events */
 	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
 		return 0;
 
-	pmu->cpu = -1;
+	rapl_pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
 	target = cpumask_any_but(rapl_pmu_cpumask, cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
 		cpumask_set_cpu(target, &rapl_cpu_mask);
-		pmu->cpu = target;
-		perf_pmu_migrate_context(pmu->pmu, cpu, target);
+		rapl_pmu->cpu = target;
+		perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
 	}
 	return 0;
 }
@@ -584,21 +584,21 @@ static int rapl_cpu_online(unsigned int cpu)
 {
 	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
 	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+	struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
-	if (!pmu) {
-		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
-		if (!pmu)
+	if (!rapl_pmu) {
+		rapl_pmu = kzalloc_node(sizeof(*rapl_pmu), GFP_KERNEL, cpu_to_node(cpu));
+		if (!rapl_pmu)
 			return -ENOMEM;
 
-		raw_spin_lock_init(&pmu->lock);
-		INIT_LIST_HEAD(&pmu->active_list);
-		pmu->pmu = &rapl_pmus->pmu;
-		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
-		rapl_hrtimer_init(pmu);
+		raw_spin_lock_init(&rapl_pmu->lock);
+		INIT_LIST_HEAD(&rapl_pmu->active_list);
+		rapl_pmu->pmu = &rapl_pmus->pmu;
+		rapl_pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+		rapl_hrtimer_init(rapl_pmu);
 
-		rapl_pmus->pmus[rapl_pmu_idx] = pmu;
+		rapl_pmus->rapl_pmu[rapl_pmu_idx] = rapl_pmu;
 	}
 
 	/*
@@ -610,7 +610,7 @@ static int rapl_cpu_online(unsigned int cpu)
 		return 0;
 
 	cpumask_set_cpu(cpu, &rapl_cpu_mask);
-	pmu->cpu = cpu;
+	rapl_pmu->cpu = cpu;
 	return 0;
 }
 
@@ -679,7 +679,7 @@ static void cleanup_rapl_pmus(void)
 	int i;
 
 	for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
-		kfree(rapl_pmus->pmus[i]);
+		kfree(rapl_pmus->rapl_pmu[i]);
 	kfree(rapl_pmus);
 }
 
@@ -699,7 +699,7 @@ static int __init init_rapl_pmus(void)
 	if (rapl_pmu_is_pkg_scope())
 		nr_rapl_pmu = topology_max_packages();
 
-	rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
+	rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
 	if (!rapl_pmus)
 		return -ENOMEM;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 04/10] perf/x86/rapl: Make rapl_model struct global
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (2 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 03/10] perf/x86/rapl: Rename rapl_pmu variables Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-24  5:59 ` [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

To support AMD's per_core RAPL counter, we will need to check
per_core capability of the current rapl_model multiple times in
rapl_cpu_online/offline, init_rapl_pmus functions, so cache the
matched rapl model in a global variable, to avoid calling
x86_match_cpu() multiple times.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b4e2073a178e..e5e878146542 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -143,6 +143,7 @@ static cpumask_t rapl_cpu_mask;
 static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
+static struct rapl_model *rapl_model;
 
 static inline unsigned int get_rapl_pmu_idx(int cpu)
 {
@@ -614,18 +615,18 @@ static int rapl_cpu_online(unsigned int cpu)
 	return 0;
 }
 
-static int rapl_check_hw_unit(struct rapl_model *rm)
+static int rapl_check_hw_unit(void)
 {
 	u64 msr_rapl_power_unit_bits;
 	int i;
 
 	/* protect rdmsrl() to handle virtualization */
-	if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits))
+	if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
 		return -1;
 	for (i = 0; i < NR_RAPL_DOMAINS; i++)
 		rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
-	switch (rm->unit_quirk) {
+	switch (rapl_model->unit_quirk) {
 	/*
 	 * DRAM domain on HSW server and KNL has fixed energy unit which can be
 	 * different than the unit from power unit MSR. See
@@ -839,21 +840,20 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
 static int __init rapl_pmu_init(void)
 {
 	const struct x86_cpu_id *id;
-	struct rapl_model *rm;
 	int ret;
 
 	id = x86_match_cpu(rapl_model_match);
 	if (!id)
 		return -ENODEV;
 
-	rm = (struct rapl_model *) id->driver_data;
+	rapl_model = (struct rapl_model *) id->driver_data;
 
-	rapl_msrs = rm->rapl_msrs;
+	rapl_msrs = rapl_model->rapl_msrs;
 
 	rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
-					false, (void *) &rm->events);
+					false, (void *) &rapl_model->events);
 
-	ret = rapl_check_hw_unit(rm);
+	ret = rapl_check_hw_unit();
 	if (ret)
 		return ret;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (3 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 04/10] perf/x86/rapl: Make rapl_model struct global Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-07-01 13:02   ` Peter Zijlstra
  2024-06-24  5:59 ` [PATCH v3 06/10] perf/x86/rapl: Add wrapper for online/offline functions Dhananjay Ugwekar
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

This patch is in preparation for addition of per-core energy counter
support for AMD CPUs.

Per-core energy counter PMU will need a separate cpumask. It seems like
a better approach to add the cpumask inside the rapl_pmus struct, instead
of creating another global cpumask variable for per-core PMU. This way, in
future, if there is a need for a new PMU with a different scope (e.g. CCD),
adding a new global cpumask variable won't be necessary.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e5e878146542..be139e9f9ee0 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -119,6 +119,7 @@ struct rapl_pmu {
 
 struct rapl_pmus {
 	struct pmu		pmu;
+	cpumask_t		cpumask;
 	unsigned int		nr_rapl_pmu;
 	struct rapl_pmu		*rapl_pmu[] __counted_by(nr_rapl_pmu);
 };
@@ -139,7 +140,6 @@ struct rapl_model {
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
 static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
 static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
@@ -394,7 +394,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
 static ssize_t rapl_get_attr_cpumask(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
+	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus->cpumask);
 }
 
 static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -565,7 +565,7 @@ static int rapl_cpu_offline(unsigned int cpu)
 	int target;
 
 	/* Check if exiting cpu is used for collecting rapl events */
-	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
+	if (!cpumask_test_and_clear_cpu(cpu, &rapl_pmus->cpumask))
 		return 0;
 
 	rapl_pmu->cpu = -1;
@@ -574,7 +574,7 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &rapl_cpu_mask);
+		cpumask_set_cpu(target, &rapl_pmus->cpumask);
 		rapl_pmu->cpu = target;
 		perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
 	}
@@ -606,11 +606,11 @@ static int rapl_cpu_online(unsigned int cpu)
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
+	target = cpumask_any_and(&rapl_pmus->cpumask, rapl_pmu_cpumask);
 	if (target < nr_cpu_ids)
 		return 0;
 
-	cpumask_set_cpu(cpu, &rapl_cpu_mask);
+	cpumask_set_cpu(cpu, &rapl_pmus->cpumask);
 	rapl_pmu->cpu = cpu;
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
  2024-06-24  5:59 ` [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
@ 2024-07-01 13:02   ` Peter Zijlstra
  2024-07-02 10:16     ` Dhananjay Ugwekar
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2024-07-01 13:02 UTC (permalink / raw)
  To: Dhananjay Ugwekar
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86,
	kees, gustavoars, rui.zhang, oleksandr, linux-perf-users,
	linux-kernel, linux-hardening, ananth.narayan, gautham.shenoy,
	kprateek.nayak, ravi.bangoria, sandipan.das, linux-pm

On Mon, Jun 24, 2024 at 05:59:02AM +0000, Dhananjay Ugwekar wrote:
> This patch is in preparation for addition of per-core energy counter
> support for AMD CPUs.
> 
> Per-core energy counter PMU will need a separate cpumask. It seems like
> a better approach to add the cpumask inside the rapl_pmus struct, instead
> of creating another global cpumask variable for per-core PMU. This way, in
> future, if there is a need for a new PMU with a different scope (e.g. CCD),
> adding a new global cpumask variable won't be necessary.
> 
> No functional change.
> 
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> ---
>  arch/x86/events/rapl.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index e5e878146542..be139e9f9ee0 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -119,6 +119,7 @@ struct rapl_pmu {
>  
>  struct rapl_pmus {
>  	struct pmu		pmu;
> +	cpumask_t		cpumask;
>  	unsigned int		nr_rapl_pmu;
>  	struct rapl_pmu		*rapl_pmu[] __counted_by(nr_rapl_pmu);
>  };

Yikes no, please use cpumask_var_t and alloc_cpumask_var() and friends.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct
  2024-07-01 13:02   ` Peter Zijlstra
@ 2024-07-02 10:16     ` Dhananjay Ugwekar
  0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-02 10:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86,
	kees, gustavoars, rui.zhang, oleksandr, linux-perf-users,
	linux-kernel, linux-hardening, ananth.narayan, gautham.shenoy,
	kprateek.nayak, ravi.bangoria, sandipan.das, linux-pm

Hello Peter,

On 7/1/2024 6:32 PM, Peter Zijlstra wrote:
> On Mon, Jun 24, 2024 at 05:59:02AM +0000, Dhananjay Ugwekar wrote:
>> This patch is in preparation for addition of per-core energy counter
>> support for AMD CPUs.
>>
>> Per-core energy counter PMU will need a separate cpumask. It seems like
>> a better approach to add the cpumask inside the rapl_pmus struct, instead
>> of creating another global cpumask variable for per-core PMU. This way, in
>> future, if there is a need for a new PMU with a different scope (e.g. CCD),
>> adding a new global cpumask variable won't be necessary.
>>
>> No functional change.
>>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> ---
>>  arch/x86/events/rapl.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index e5e878146542..be139e9f9ee0 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -119,6 +119,7 @@ struct rapl_pmu {
>>  
>>  struct rapl_pmus {
>>  	struct pmu		pmu;
>> +	cpumask_t		cpumask;
>>  	unsigned int		nr_rapl_pmu;
>>  	struct rapl_pmu		*rapl_pmu[] __counted_by(nr_rapl_pmu);
>>  };
> 
> Yikes no, please use cpumask_var_t and alloc_cpumask_var() and friends.

Ah yes!, I did not know about this API, will use this in the next version.

Thanks,
Dhananjay

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 06/10] perf/x86/rapl: Add wrapper for online/offline functions
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (4 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-24  5:59 ` [PATCH v3 07/10] perf/x86/rapl: Add an argument to the cleanup and init functions Dhananjay Ugwekar
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

This is in preparation for the addition of per-core RAPL counter support
for AMD CPUs.

The CPU online and offline functions will need to handle the setting up and
migration of the new per-core PMU as well. The wrapper functions added
below will make it easier to pass the corresponding args for both the PMUs.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index be139e9f9ee0..70c7b35fb4d2 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -558,10 +558,10 @@ static struct perf_msr amd_rapl_msrs[] = {
 	[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group,  NULL, false, 0 },
 };
 
-static int rapl_cpu_offline(unsigned int cpu)
+static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+			      const struct cpumask *event_cpumask, unsigned int cpu)
 {
-	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
-	struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+	struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
 	int target;
 
 	/* Check if exiting cpu is used for collecting rapl events */
@@ -570,7 +570,7 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 	rapl_pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(rapl_pmu_cpumask, cpu);
+	target = cpumask_any_but(event_cpumask, cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
@@ -581,11 +581,16 @@ static int rapl_cpu_offline(unsigned int cpu)
 	return 0;
 }
 
-static int rapl_cpu_online(unsigned int cpu)
+static int rapl_cpu_offline(unsigned int cpu)
 {
-	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
-	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
-	struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+	return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+				  get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+			     const struct cpumask *event_cpumask, unsigned int cpu)
+{
+	struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
 	int target;
 
 	if (!rapl_pmu) {
@@ -606,7 +611,7 @@ static int rapl_cpu_online(unsigned int cpu)
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_pmus->cpumask, rapl_pmu_cpumask);
+	target = cpumask_any_and(&rapl_pmus->cpumask, event_cpumask);
 	if (target < nr_cpu_ids)
 		return 0;
 
@@ -615,6 +620,13 @@ static int rapl_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+static int rapl_cpu_online(unsigned int cpu)
+{
+	return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+				 get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+
 static int rapl_check_hw_unit(void)
 {
 	u64 msr_rapl_power_unit_bits;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 07/10] perf/x86/rapl: Add an argument to the cleanup and init functions
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (5 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 06/10] perf/x86/rapl: Add wrapper for online/offline functions Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-24  5:59 ` [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

Prep for per-core RAPL PMU addition.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 70c7b35fb4d2..f815c60ec551 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -687,7 +687,7 @@ static void __init rapl_advertise(void)
 	}
 }
 
-static void cleanup_rapl_pmus(void)
+static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
 {
 	int i;
 
@@ -705,8 +705,10 @@ static const struct attribute_group *rapl_attr_update[] = {
 	NULL,
 };
 
-static int __init init_rapl_pmus(void)
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
 {
+	struct rapl_pmus *rapl_pmus;
+
 	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
 
 	if (rapl_pmu_is_pkg_scope())
@@ -728,6 +730,9 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.read		= rapl_pmu_event_read;
 	rapl_pmus->pmu.module		= THIS_MODULE;
 	rapl_pmus->pmu.capabilities	= PERF_PMU_CAP_NO_EXCLUDE;
+
+	*rapl_pmus_ptr = rapl_pmus;
+
 	return 0;
 }
 
@@ -869,7 +874,7 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		return ret;
 
-	ret = init_rapl_pmus();
+	ret = init_rapl_pmus(&rapl_pmus);
 	if (ret)
 		return ret;
 
@@ -893,7 +898,7 @@ static int __init rapl_pmu_init(void)
 	cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 out:
 	pr_warn("Initialization failed (%d), disabled\n", ret);
-	cleanup_rapl_pmus();
+	cleanup_rapl_pmus(rapl_pmus);
 	return ret;
 }
 module_init(rapl_pmu_init);
@@ -902,6 +907,6 @@ static void __exit intel_rapl_exit(void)
 {
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 	perf_pmu_unregister(&rapl_pmus->pmu);
-	cleanup_rapl_pmus();
+	cleanup_rapl_pmus(rapl_pmus);
 }
 module_exit(intel_rapl_exit);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (6 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 07/10] perf/x86/rapl: Add an argument to the cleanup and init functions Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-07-01 13:08   ` Peter Zijlstra
  2024-06-24  5:59 ` [PATCH v3 09/10] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

Prep for addition of power_per_core PMU to handle core scope energy
consumption for AMD CPUs.

Replace the generic names with *_pkg*, to differentiate between the
scopes of the two different PMUs and their variables.

No functional change.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 83 +++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 41 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f815c60ec551..b03b044a390f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -70,18 +70,18 @@ MODULE_LICENSE("GPL");
 /*
  * RAPL energy status counters
  */
-enum perf_rapl_events {
+enum perf_rapl_pkg_events {
 	PERF_RAPL_PP0 = 0,		/* all cores */
 	PERF_RAPL_PKG,			/* entire package */
 	PERF_RAPL_RAM,			/* DRAM */
 	PERF_RAPL_PP1,			/* gpu */
 	PERF_RAPL_PSYS,			/* psys */
 
-	PERF_RAPL_MAX,
-	NR_RAPL_DOMAINS = PERF_RAPL_MAX,
+	PERF_RAPL_PKG_EVENTS_MAX,
+	NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
 };
 
-static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
+static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
 	"pp0-core",
 	"package",
 	"dram",
@@ -132,15 +132,15 @@ enum rapl_unit_quirk {
 
 struct rapl_model {
 	struct perf_msr *rapl_msrs;
-	unsigned long	events;
+	unsigned long	pkg_events;
 	unsigned int	msr_power_unit;
 	enum rapl_unit_quirk	unit_quirk;
 };
 
  /* 1/2^hw_unit Joule */
-static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
-static struct rapl_pmus *rapl_pmus;
-static unsigned int rapl_cntr_mask;
+static int rapl_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
+static struct rapl_pmus *rapl_pmus_pkg;
+static unsigned int rapl_pkg_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
 static struct rapl_model *rapl_model;
@@ -165,7 +165,8 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
 	 * The unsigned check also catches the '-1' return value for non
 	 * existent mappings in the topology map.
 	 */
-	return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
+	return rapl_pmu_idx < rapl_pmus_pkg->nr_rapl_pmu ?
+	       rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx] : NULL;
 }
 
 static inline u64 rapl_read_counter(struct perf_event *event)
@@ -177,7 +178,7 @@ static inline u64 rapl_read_counter(struct perf_event *event)
 
 static inline u64 rapl_scale(u64 v, int cfg)
 {
-	if (cfg > NR_RAPL_DOMAINS) {
+	if (cfg > NR_RAPL_PKG_DOMAINS) {
 		pr_warn("Invalid domain %d, failed to scale data\n", cfg);
 		return v;
 	}
@@ -347,7 +348,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	struct rapl_pmu *rapl_pmu;
 
 	/* only look at RAPL events */
-	if (event->attr.type != rapl_pmus->pmu.type)
+	if (event->attr.type != rapl_pmus_pkg->pmu.type)
 		return -ENOENT;
 
 	/* check only supported bits are set */
@@ -359,14 +360,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
 
 	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
 
-	if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
+	if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
 		return -EINVAL;
 
-	cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
+	cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
 	bit = cfg - 1;
 
 	/* check event supported */
-	if (!(rapl_cntr_mask & (1 << bit)))
+	if (!(rapl_pkg_cntr_mask & (1 << bit)))
 		return -EINVAL;
 
 	/* unsupported modes and filters */
@@ -394,7 +395,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
 static ssize_t rapl_get_attr_cpumask(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus->cpumask);
+	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus_pkg->cpumask);
 }
 
 static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -546,11 +547,11 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
 };
 
 /*
- * Force to PERF_RAPL_MAX size due to:
- * - perf_msr_probe(PERF_RAPL_MAX)
+ * Force to PERF_RAPL_PKG_EVENTS_MAX size due to:
+ * - perf_msr_probe(PERF_RAPL_PKG_EVENTS_MAX)
  * - want to use same event codes across both architectures
  */
-static struct perf_msr amd_rapl_msrs[] = {
+static struct perf_msr amd_rapl_pkg_msrs[] = {
 	[PERF_RAPL_PP0]  = { 0, &rapl_events_cores_group, NULL, false, 0 },
 	[PERF_RAPL_PKG]  = { MSR_AMD_PKG_ENERGY_STATUS,  &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
 	[PERF_RAPL_RAM]  = { 0, &rapl_events_ram_group,   NULL, false, 0 },
@@ -583,7 +584,7 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu
 
 static int rapl_cpu_offline(unsigned int cpu)
 {
-	return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+	return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
 				  get_rapl_pmu_cpumask(cpu), cpu);
 }
 
@@ -622,7 +623,7 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
 
 static int rapl_cpu_online(unsigned int cpu)
 {
-	return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+	return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
 				 get_rapl_pmu_cpumask(cpu), cpu);
 }
 
@@ -635,7 +636,7 @@ static int rapl_check_hw_unit(void)
 	/* protect rdmsrl() to handle virtualization */
 	if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
 		return -1;
-	for (i = 0; i < NR_RAPL_DOMAINS; i++)
+	for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
 		rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
 	switch (rapl_model->unit_quirk) {
@@ -677,12 +678,12 @@ static void __init rapl_advertise(void)
 	int i;
 
 	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
-		hweight32(rapl_cntr_mask), rapl_timer_ms);
+		hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
 
-	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
-		if (rapl_cntr_mask & (1 << i)) {
+	for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
+		if (rapl_pkg_cntr_mask & (1 << i)) {
 			pr_info("hw unit of domain %s 2^-%d Joules\n",
-				rapl_domain_names[i], rapl_hw_unit[i]);
+				rapl_pkg_domain_names[i], rapl_hw_unit[i]);
 		}
 	}
 }
@@ -737,7 +738,7 @@ static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
 }
 
 static struct rapl_model model_snb = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_PP1),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
@@ -745,7 +746,7 @@ static struct rapl_model model_snb = {
 };
 
 static struct rapl_model model_snbep = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
@@ -753,7 +754,7 @@ static struct rapl_model model_snbep = {
 };
 
 static struct rapl_model model_hsw = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM) |
 			  BIT(PERF_RAPL_PP1),
@@ -762,7 +763,7 @@ static struct rapl_model model_hsw = {
 };
 
 static struct rapl_model model_hsx = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM),
 	.unit_quirk	= RAPL_UNIT_QUIRK_INTEL_HSW,
@@ -771,7 +772,7 @@ static struct rapl_model model_hsx = {
 };
 
 static struct rapl_model model_knl = {
-	.events		= BIT(PERF_RAPL_PKG) |
+	.pkg_events	= BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM),
 	.unit_quirk	= RAPL_UNIT_QUIRK_INTEL_HSW,
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
@@ -779,7 +780,7 @@ static struct rapl_model model_knl = {
 };
 
 static struct rapl_model model_skl = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM) |
 			  BIT(PERF_RAPL_PP1) |
@@ -789,7 +790,7 @@ static struct rapl_model model_skl = {
 };
 
 static struct rapl_model model_spr = {
-	.events		= BIT(PERF_RAPL_PP0) |
+	.pkg_events	= BIT(PERF_RAPL_PP0) |
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM) |
 			  BIT(PERF_RAPL_PSYS),
@@ -799,9 +800,9 @@ static struct rapl_model model_spr = {
 };
 
 static struct rapl_model model_amd_hygon = {
-	.events		= BIT(PERF_RAPL_PKG),
+	.pkg_events	= BIT(PERF_RAPL_PKG),
 	.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
-	.rapl_msrs      = amd_rapl_msrs,
+	.rapl_msrs      = amd_rapl_pkg_msrs,
 };
 
 static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -867,14 +868,14 @@ static int __init rapl_pmu_init(void)
 
 	rapl_msrs = rapl_model->rapl_msrs;
 
-	rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
-					false, (void *) &rapl_model->events);
+	rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+					false, (void *) &rapl_model->pkg_events);
 
 	ret = rapl_check_hw_unit();
 	if (ret)
 		return ret;
 
-	ret = init_rapl_pmus(&rapl_pmus);
+	ret = init_rapl_pmus(&rapl_pmus_pkg);
 	if (ret)
 		return ret;
 
@@ -887,7 +888,7 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		goto out;
 
-	ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
+	ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
 	if (ret)
 		goto out1;
 
@@ -898,7 +899,7 @@ static int __init rapl_pmu_init(void)
 	cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 out:
 	pr_warn("Initialization failed (%d), disabled\n", ret);
-	cleanup_rapl_pmus(rapl_pmus);
+	cleanup_rapl_pmus(rapl_pmus_pkg);
 	return ret;
 }
 module_init(rapl_pmu_init);
@@ -906,7 +907,7 @@ module_init(rapl_pmu_init);
 static void __exit intel_rapl_exit(void)
 {
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
-	perf_pmu_unregister(&rapl_pmus->pmu);
-	cleanup_rapl_pmus(rapl_pmus);
+	perf_pmu_unregister(&rapl_pmus_pkg->pmu);
+	cleanup_rapl_pmus(rapl_pmus_pkg);
 }
 module_exit(intel_rapl_exit);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-06-24  5:59 ` [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
@ 2024-07-01 13:08   ` Peter Zijlstra
  2024-07-02  2:25     ` Zhang, Rui
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2024-07-01 13:08 UTC (permalink / raw)
  To: Dhananjay Ugwekar
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen, x86,
	kees, gustavoars, rui.zhang, oleksandr, linux-perf-users,
	linux-kernel, linux-hardening, ananth.narayan, gautham.shenoy,
	kprateek.nayak, ravi.bangoria, sandipan.das, linux-pm

On Mon, Jun 24, 2024 at 05:59:05AM +0000, Dhananjay Ugwekar wrote:
> Prep for addition of power_per_core PMU to handle core scope energy
> consumption for AMD CPUs.
> 
> Replace the generic names with *_pkg*, to differentiate between the
> scopes of the two different PMUs and their variables.

But then remember patch 2 and recall that intel seems to have everything
at die level, not pkg.

Does this proposed naming make sense? How?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-01 13:08   ` Peter Zijlstra
@ 2024-07-02  2:25     ` Zhang, Rui
  2024-07-02 10:20       ` Dhananjay Ugwekar
  0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-07-02  2:25 UTC (permalink / raw)
  To: peterz@infradead.org, Dhananjay.Ugwekar@amd.com
  Cc: gautham.shenoy@amd.com, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, ananth.narayan@amd.com,
	Hunter, Adrian, ravi.bangoria@amd.com,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

On Mon, 2024-07-01 at 15:08 +0200, Peter Zijlstra wrote:
> On Mon, Jun 24, 2024 at 05:59:05AM +0000, Dhananjay Ugwekar wrote:
> > Prep for addition of power_per_core PMU to handle core scope energy
> > consumption for AMD CPUs.
> > 
> > Replace the generic names with *_pkg*, to differentiate between the
> > scopes of the two different PMUs and their variables.
> 
> But then remember patch 2 and recall that intel seems to have
> everything
> at die level, not pkg.
> 
> Does this proposed naming make sense? How?

For Intel products, we have
1. Casecadelake-AP which has multi-die per package and has per-die RAPL
MSRs
2. all other platforms which has single-die per package, so its RAPL
MSRs can be considered as either package-scope or die-scope
This applies to Thermal MSRs as well.

so for these MSRs, we can treat them as
1. always die-scope for all existing platforms
or
2. package-scope with the exception of Casecadelake-ap
And current kernel code follows rule 1.

I propose we switch to rule 2 for these code because rule 1 can be
broke on future multi-die systems (This is already true for Thermal
MSRs).

In this sense, I think it is okay to call it pkg level rapl for both
Intel and AMD.

thanks,
rui

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-02  2:25     ` Zhang, Rui
@ 2024-07-02 10:20       ` Dhananjay Ugwekar
  2024-07-03  4:07         ` Zhang, Rui
  0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-02 10:20 UTC (permalink / raw)
  To: Zhang, Rui, peterz@infradead.org
  Cc: gautham.shenoy@amd.com, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, ananth.narayan@amd.com,
	Hunter, Adrian, ravi.bangoria@amd.com,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

Hello Rui,

On 7/2/2024 7:55 AM, Zhang, Rui wrote:
> On Mon, 2024-07-01 at 15:08 +0200, Peter Zijlstra wrote:
>> On Mon, Jun 24, 2024 at 05:59:05AM +0000, Dhananjay Ugwekar wrote:
>>> Prep for addition of power_per_core PMU to handle core scope energy
>>> consumption for AMD CPUs.
>>>
>>> Replace the generic names with *_pkg*, to differentiate between the
>>> scopes of the two different PMUs and their variables.
>>
>> But then remember patch 2 and recall that intel seems to have
>> everything
>> at die level, not pkg.
>>
>> Does this proposed naming make sense? How?
> 
> For Intel products, we have
> 1. Casecadelake-AP which has multi-die per package and has per-die RAPL
> MSRs
> 2. all other platforms which has single-die per package, so its RAPL
> MSRs can be considered as either package-scope or die-scope
> This applies to Thermal MSRs as well.
> 
> so for these MSRs, we can treat them as
> 1. always die-scope for all existing platforms
> or
> 2. package-scope with the exception of Casecadelake-ap
> And current kernel code follows rule 1.
> 
> I propose we switch to rule 2 for these code because rule 1 can be
> broke on future multi-die systems (This is already true for Thermal
> MSRs).

I have a doubt about this, won't the future Intel multi-die systems 
have die-scope for the RAPL PMU like Casecadelake-AP?

If yes, then rule 1 above seems better.

Regards,
Dhananjay

> 
> In this sense, I think it is okay to call it pkg level rapl for both
> Intel and AMD.
> 
> thanks,
> rui
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-02 10:20       ` Dhananjay Ugwekar
@ 2024-07-03  4:07         ` Zhang, Rui
  2024-07-03  6:31           ` Dhananjay Ugwekar
  0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-07-03  4:07 UTC (permalink / raw)
  To: peterz@infradead.org, Dhananjay.Ugwekar@amd.com
  Cc: gautham.shenoy@amd.com, alexander.shishkin@linux.intel.com,
	ananth.narayan@amd.com, dave.hansen@linux.intel.com,
	ravi.bangoria@amd.com, Hunter, Adrian,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

On Tue, 2024-07-02 at 15:50 +0530, Dhananjay Ugwekar wrote:
> 
> > 
> > For Intel products, we have
> > 1. Casecadelake-AP which has multi-die per package and has per-die
> > RAPL
> > MSRs
> > 2. all other platforms which has single-die per package, so its
> > RAPL
> > MSRs can be considered as either package-scope or die-scope
> > This applies to Thermal MSRs as well.
> > 
> > so for these MSRs, we can treat them as
> > 1. always die-scope for all existing platforms
> > or
> > 2. package-scope with the exception of Casecadelake-ap
> > And current kernel code follows rule 1.
> > 
> > I propose we switch to rule 2 for these code because rule 1 can be
> > broke on future multi-die systems (This is already true for Thermal
> > MSRs).
> 
> I have a doubt about this, won't the future Intel multi-die systems 
> have die-scope for the RAPL PMU like Casecadelake-AP?

For future multi-die systems that I know, the RAPL is still package
scope but it is just lucky that RAPL control is not exposed via the
MSRs so rule 1 is not actually broke for RAPL PMU (while it is indeed
broken for other drivers like thermal).

In short, if we stick with rule 1, the RAPL PMU still works. Switching
to rule 2 to be consistent with the other drivers is also a choice IMV.

thanks,
rui
> 
> If yes, then rule 1 above seems better.
> 
> Regards,
> Dhananjay
> 
> > 
> > In this sense, I think it is okay to call it pkg level rapl for
> > both
> > Intel and AMD.
> > 
> > thanks,
> > rui
> > 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-03  4:07         ` Zhang, Rui
@ 2024-07-03  6:31           ` Dhananjay Ugwekar
  2024-07-05  2:18             ` Zhang, Rui
  0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-03  6:31 UTC (permalink / raw)
  To: Zhang, Rui, peterz@infradead.org
  Cc: gautham.shenoy@amd.com, alexander.shishkin@linux.intel.com,
	ananth.narayan@amd.com, dave.hansen@linux.intel.com,
	ravi.bangoria@amd.com, Hunter, Adrian,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

Hi Rui,

On 7/3/2024 9:37 AM, Zhang, Rui wrote:
> On Tue, 2024-07-02 at 15:50 +0530, Dhananjay Ugwekar wrote:
>>
>>>
>>> For Intel products, we have
>>> 1. Casecadelake-AP which has multi-die per package and has per-die
>>> RAPL
>>> MSRs
>>> 2. all other platforms which has single-die per package, so its
>>> RAPL
>>> MSRs can be considered as either package-scope or die-scope
>>> This applies to Thermal MSRs as well.
>>>
>>> so for these MSRs, we can treat them as
>>> 1. always die-scope for all existing platforms
>>> or
>>> 2. package-scope with the exception of Casecadelake-ap
>>> And current kernel code follows rule 1.
>>>
>>> I propose we switch to rule 2 for these code because rule 1 can be
>>> broke on future multi-die systems (This is already true for Thermal
>>> MSRs).
>>
>> I have a doubt about this, won't the future Intel multi-die systems 
>> have die-scope for the RAPL PMU like Casecadelake-AP?
> 
> For future multi-die systems that I know, the RAPL is still package
> scope 

I think in that case we can go with rule 2, it would be future proof
for Intel systems. If you agree, I can make the change in next version.

Something like below?,

-#define rapl_pmu_is_pkg_scope()                         \
-        (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \                                                                                                                                                                  
-	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)

+#define rapl_pmu_is_die_scope()			\
+	(boot_cpu_data.x86_model_id == CASCADELAKE)

Regards,
Dhananjay

but it is just lucky that RAPL control is not exposed via the
> MSRs so rule 1 is not actually broke for RAPL PMU (while it is indeed
> broken for other drivers like thermal).
> 
> In short, if we stick with rule 1, the RAPL PMU still works. Switching> to rule 2 to be consistent with the other drivers is also a choice IMV.> 
> thanks,
> rui
>>
>> If yes, then rule 1 above seems better.
>>
>> Regards,
>> Dhananjay
>>
>>>
>>> In this sense, I think it is okay to call it pkg level rapl for
>>> both
>>> Intel and AMD.
>>>
>>> thanks,
>>> rui
>>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-03  6:31           ` Dhananjay Ugwekar
@ 2024-07-05  2:18             ` Zhang, Rui
  2024-07-05  9:24               ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-07-05  2:18 UTC (permalink / raw)
  To: peterz@infradead.org, Dhananjay.Ugwekar@amd.com
  Cc: gautham.shenoy@amd.com, alexander.shishkin@linux.intel.com,
	ananth.narayan@amd.com, dave.hansen@linux.intel.com,
	ravi.bangoria@amd.com, Hunter, Adrian,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

On Wed, 2024-07-03 at 12:01 +0530, Dhananjay Ugwekar wrote:
> Hi Rui,
> 
> On 7/3/2024 9:37 AM, Zhang, Rui wrote:
> > On Tue, 2024-07-02 at 15:50 +0530, Dhananjay Ugwekar wrote:
> > > 
> > > > 
> > > > For Intel products, we have
> > > > 1. Casecadelake-AP which has multi-die per package and has per-
> > > > die
> > > > RAPL
> > > > MSRs
> > > > 2. all other platforms which has single-die per package, so its
> > > > RAPL
> > > > MSRs can be considered as either package-scope or die-scope
> > > > This applies to Thermal MSRs as well.
> > > > 
> > > > so for these MSRs, we can treat them as
> > > > 1. always die-scope for all existing platforms
> > > > or
> > > > 2. package-scope with the exception of Casecadelake-ap
> > > > And current kernel code follows rule 1.
> > > > 
> > > > I propose we switch to rule 2 for these code because rule 1 can
> > > > be
> > > > broke on future multi-die systems (This is already true for
> > > > Thermal
> > > > MSRs).
> > > 
> > > I have a doubt about this, won't the future Intel multi-die
> > > systems 
> > > have die-scope for the RAPL PMU like Casecadelake-AP?
> > 
> > For future multi-die systems that I know, the RAPL is still package
> > scope 
> 
> I think in that case we can go with rule 2, it would be future proof
> for Intel systems. If you agree, I can make the change in next
> version.
> 
> Something like below?,
> 
> -#define rapl_pmu_is_pkg_scope()                         \
> -        (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || 
> \                                                                    
>                                                                      
>                          
> -        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> 
> +#define rapl_pmu_is_die_scope()                        \
> +       (boot_cpu_data.x86_model_id == CASCADELAKE)
> 
sounds good to me. Just a reminder that using boot_cpu_data.vfm is a
better choice here.

And it would be great to get Peter' view on this.

thanks,
rui

> Regards,
> Dhananjay
> 
> but it is just lucky that RAPL control is not exposed via the
> > MSRs so rule 1 is not actually broke for RAPL PMU (while it is
> > indeed
> > broken for other drivers like thermal).
> > 
> > In short, if we stick with rule 1, the RAPL PMU still works.
> > Switching> to rule 2 to be consistent with the other drivers is
> > also a choice IMV.> 
> > thanks,
> > rui
> > > 
> > > If yes, then rule 1 above seems better.
> > > 
> > > Regards,
> > > Dhananjay
> > > 
> > > > 
> > > > In this sense, I think it is okay to call it pkg level rapl for
> > > > both
> > > > Intel and AMD.
> > > > 
> > > > thanks,
> > > > rui
> > > > 
> > 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-05  2:18             ` Zhang, Rui
@ 2024-07-05  9:24               ` Peter Zijlstra
  2024-07-08  1:56                 ` Zhang, Rui
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2024-07-05  9:24 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: Dhananjay.Ugwekar@amd.com, gautham.shenoy@amd.com,
	alexander.shishkin@linux.intel.com, ananth.narayan@amd.com,
	dave.hansen@linux.intel.com, ravi.bangoria@amd.com,
	Hunter, Adrian, linux-kernel@vger.kernel.org, mingo@redhat.com,
	oleksandr@natalenko.name, irogers@google.com, tglx@linutronix.de,
	gustavoars@kernel.org, kan.liang@linux.intel.com, kees@kernel.org,
	sandipan.das@amd.com, mark.rutland@arm.com,
	linux-perf-users@vger.kernel.org, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, kprateek.nayak@amd.com,
	jolsa@kernel.org, linux-pm@vger.kernel.org, x86@kernel.org,
	namhyung@kernel.org

On Fri, Jul 05, 2024 at 02:18:30AM +0000, Zhang, Rui wrote:

> > > > I have a doubt about this, won't the future Intel multi-die
> > > > systems 
> > > > have die-scope for the RAPL PMU like Casecadelake-AP?
> > > 
> > > For future multi-die systems that I know, the RAPL is still package
> > > scope 
> > 
> > I think in that case we can go with rule 2, it would be future proof
> > for Intel systems. If you agree, I can make the change in next
> > version.
> > 
> > Something like below?,
> > 
> > -#define rapl_pmu_is_pkg_scope()                         \
> > -        (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || 
> > \                                                                    
> >                                                                      
> >                          
> > -        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> > 
> > +#define rapl_pmu_is_die_scope()                        \
> > +       (boot_cpu_data.x86_model_id == CASCADELAKE)
> > 
> sounds good to me. Just a reminder that using boot_cpu_data.vfm is a
> better choice here.
> 
> And it would be great to get Peter' view on this.

Peter is confused :-) So you're saying that:

 - old Intel is pkg wide (it has no DIE enumeration)
 - Cascadelake (part of the skylake refresh) is per-DIE
 - modern Intel is pkg wide (they have no DIE enumeration)
 - future Intel will be pkg wide

And this works because for everything that does not enumerate a specific
DIE topology, it ends up begin the same as the PKG topology.

But what about future products that have DIE but are not CASCADE (what
about COOPERLAKE) ?

If this really is a one off for CASCADE, then yes, I think we should be
doing what Dhananjay suggests, and then the PKG naming is fine.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
  2024-07-05  9:24               ` Peter Zijlstra
@ 2024-07-08  1:56                 ` Zhang, Rui
  0 siblings, 0 replies; 31+ messages in thread
From: Zhang, Rui @ 2024-07-08  1:56 UTC (permalink / raw)
  To: peterz@infradead.org
  Cc: linux-pm@vger.kernel.org, gautham.shenoy@amd.com,
	alexander.shishkin@linux.intel.com, Brown, Len,
	ananth.narayan@amd.com, dave.hansen@linux.intel.com,
	ravi.bangoria@amd.com, Hunter, Adrian,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	irogers@google.com, oleksandr@natalenko.name, tglx@linutronix.de,
	sandipan.das@amd.com, kan.liang@linux.intel.com, kees@kernel.org,
	gustavoars@kernel.org, Dhananjay.Ugwekar@amd.com,
	mark.rutland@arm.com, linux-hardening@vger.kernel.org,
	bp@alien8.de, acme@kernel.org, linux-perf-users@vger.kernel.org,
	kprateek.nayak@amd.com, jolsa@kernel.org, x86@kernel.org,
	namhyung@kernel.org

Hi, Peter,

On Fri, 2024-07-05 at 11:24 +0200, Peter Zijlstra wrote:
> On Fri, Jul 05, 2024 at 02:18:30AM +0000, Zhang, Rui wrote:
> 
> > > > > I have a doubt about this, won't the future Intel multi-die
> > > > > systems 
> > > > > have die-scope for the RAPL PMU like Casecadelake-AP?
> > > > 
> > > > For future multi-die systems that I know, the RAPL is still
> > > > package
> > > > scope 
> > > 
> > > I think in that case we can go with rule 2, it would be future
> > > proof
> > > for Intel systems. If you agree, I can make the change in next
> > > version.
> > > 
> > > Something like below?,
> > > 
> > > -#define rapl_pmu_is_pkg_scope()                         \
> > > -        (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || 
> > > \                                                                
> > >     
> > >                                                                  
> > >     
> > >                          
> > > -        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> > > 
> > > +#define rapl_pmu_is_die_scope()                        \
> > > +       (boot_cpu_data.x86_model_id == CASCADELAKE)
> > > 
> > sounds good to me. Just a reminder that using boot_cpu_data.vfm is
> > a
> > better choice here.
> > 
> > And it would be great to get Peter' view on this.
> 
> Peter is confused :-) So you're saying that:
> 
>  - old Intel is pkg wide (it has no DIE enumeration)

right.

>  - Cascadelake (part of the skylake refresh) is per-DIE

right.
It enumerates DIE and its RAPL control (RAPL MSR scope) is also per-
DIE.

>  - modern Intel is pkg wide (they have no DIE enumeration)

right.

>  - future Intel will be pkg wide

see detailed comment below.
> 

> And this works because for everything that does not enumerate a
> specific
> DIE topology, it ends up begin the same as the PKG topology.

Right.

> 
> But what about future products that have DIE but are not CASCADE
> (what
> about COOPERLAKE) ?

For the one that I know, it has Die enumeration but its RAPL control is
still pkg wide.
However, the RAPL control for it (and other future Xeon platforms) is
not done via MSR interface so it does not break this RAPL PMU code.

Future Intel client platforms still rely on MSR to do RAPL control, but
their RAPL control scope (and if they will enumerate DIE) is not clear.

But still, IMO, this suggests that the RAPL control scope is not
architectural and a quirk list (probably ends up with Casecade-AP only)
makes more sense here.

thanks,
rui

> 
> If this really is a one off for CASCADE, then yes, I think we should
> be
> doing what Dhananjay suggests, and then the PKG naming is fine.
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 09/10] perf/x86/rapl: Remove the global variable rapl_msrs
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (7 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-24  5:59 ` [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
  2024-06-24  6:39 ` [PATCH v3 00/10] Add per-core RAPL " K Prateek Nayak
  10 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

After making the rapl_model struct global, the rapl_msrs global
variable isn't needed, so remove it.

Also it will be cleaner when new per-core scope PMU is added. As we will
need to maintain two rapl_msrs array(one for per-core scope and one for
package scope PMU), inside the rapl_model struct.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b03b044a390f..e962209e9e4d 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -142,7 +142,6 @@ static int rapl_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
 static struct rapl_pmus *rapl_pmus_pkg;
 static unsigned int rapl_pkg_cntr_mask;
 static u64 rapl_timer_ms;
-static struct perf_msr *rapl_msrs;
 static struct rapl_model *rapl_model;
 
 static inline unsigned int get_rapl_pmu_idx(int cpu)
@@ -380,7 +379,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 	event->cpu = rapl_pmu->cpu;
 	event->pmu_private = rapl_pmu;
-	event->hw.event_base = rapl_msrs[bit].msr;
+	event->hw.event_base = rapl_model->rapl_msrs[bit].msr;
 	event->hw.config = cfg;
 	event->hw.idx = bit;
 
@@ -866,9 +865,7 @@ static int __init rapl_pmu_init(void)
 
 	rapl_model = (struct rapl_model *) id->driver_data;
 
-	rapl_msrs = rapl_model->rapl_msrs;
-
-	rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
 					false, (void *) &rapl_model->pkg_events);
 
 	ret = rapl_check_hw_unit();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (8 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 09/10] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
@ 2024-06-24  5:59 ` Dhananjay Ugwekar
  2024-06-26 15:18   ` Zhang, Rui
  2024-06-24  6:39 ` [PATCH v3 00/10] Add per-core RAPL " K Prateek Nayak
  10 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-24  5:59 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, kees, gustavoars, rui.zhang, oleksandr
  Cc: linux-perf-users, linux-kernel, linux-hardening, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	linux-pm, Dhananjay Ugwekar

Add a new "power_per_core" PMU and "energy-per-core" event for
monitoring energy consumption by each core. The existing energy-cores
event aggregates the energy consumption at the package level.
This new event aligns with the AMD's per_core energy counters.

Tested the package level and core level PMU counters with workloads
pinned to different CPUs.

Results with workload pinned to CPU 1 in core 1 on a AMD Zen4 Genoa
machine:

$ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1

 Performance counter stats for 'system wide':

S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 arch/x86/events/rapl.c | 189 +++++++++++++++++++++++++++++++++++------
 1 file changed, 164 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e962209e9e4d..8206038a01ac 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -39,6 +39,10 @@
  *	  event: rapl_energy_psys
  *    perf code: 0x5
  *
+ *  per_core counter: consumption of a single physical core
+ *	  event: rapl_energy_per_core (power_per_core PMU)
+ *    perf code: 0x1
+ *
  * We manage those counters as free running (read-only). They may be
  * use simultaneously by other tools, such as turbostat.
  *
@@ -81,6 +85,13 @@ enum perf_rapl_pkg_events {
 	NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
 };
 
+enum perf_rapl_core_events {
+	PERF_RAPL_PER_CORE = 0,		/* per-core */
+
+	PERF_RAPL_CORE_EVENTS_MAX,
+	NR_RAPL_CORE_DOMAINS = PERF_RAPL_CORE_EVENTS_MAX,
+};
+
 static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
 	"pp0-core",
 	"package",
@@ -89,6 +100,10 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
 	"psys",
 };
 
+static const char *const rapl_core_domain_names[NR_RAPL_CORE_DOMAINS] __initconst = {
+	"per-core",
+};
+
 /*
  * event code: LSB 8 bits, passed in attr->config
  * any other bit is reserved
@@ -131,8 +146,10 @@ enum rapl_unit_quirk {
 };
 
 struct rapl_model {
-	struct perf_msr *rapl_msrs;
+	struct perf_msr *rapl_pkg_msrs;
+	struct perf_msr *rapl_core_msrs;
 	unsigned long	pkg_events;
+	unsigned long	core_events;
 	unsigned int	msr_power_unit;
 	enum rapl_unit_quirk	unit_quirk;
 };
@@ -140,7 +157,9 @@ struct rapl_model {
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
 static struct rapl_pmus *rapl_pmus_pkg;
+static struct rapl_pmus *rapl_pmus_core;
 static unsigned int rapl_pkg_cntr_mask;
+static unsigned int rapl_core_cntr_mask;
 static u64 rapl_timer_ms;
 static struct rapl_model *rapl_model;
 
@@ -345,9 +364,13 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
 	int bit, ret = 0;
 	struct rapl_pmu *rapl_pmu;
+	struct rapl_pmus *curr_rapl_pmus;
 
 	/* only look at RAPL events */
-	if (event->attr.type != rapl_pmus_pkg->pmu.type)
+	if (event->attr.type == rapl_pmus_pkg->pmu.type ||
+		(rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
+		curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
+	else
 		return -ENOENT;
 
 	/* check only supported bits are set */
@@ -357,7 +380,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
-	event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
+	if (curr_rapl_pmus == rapl_pmus_pkg)
+		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
 
 	if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
 		return -EINVAL;
@@ -366,7 +390,8 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	bit = cfg - 1;
 
 	/* check event supported */
-	if (!(rapl_pkg_cntr_mask & (1 << bit)))
+	if (!(rapl_pkg_cntr_mask & (1 << bit)) &&
+	    !(rapl_core_cntr_mask & (1 << bit)))
 		return -EINVAL;
 
 	/* unsupported modes and filters */
@@ -374,12 +399,18 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 
 	/* must be done before validate_group */
-	rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+	if (curr_rapl_pmus == rapl_pmus_core) {
+		rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_logical_core_id(event->cpu)];
+		event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
+	} else {
+		rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
+		event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
+	}
+
 	if (!rapl_pmu)
 		return -EINVAL;
 	event->cpu = rapl_pmu->cpu;
 	event->pmu_private = rapl_pmu;
-	event->hw.event_base = rapl_model->rapl_msrs[bit].msr;
 	event->hw.config = cfg;
 	event->hw.idx = bit;
 
@@ -408,17 +439,38 @@ static struct attribute_group rapl_pmu_attr_group = {
 	.attrs = rapl_pmu_attrs,
 };
 
+static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
+					     struct device_attribute *attr, char *buf)
+{
+	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus_core->cpumask);
+}
+
+static struct device_attribute dev_attr_per_core_cpumask = __ATTR(cpumask, 0444,
+								 rapl_get_attr_per_core_cpumask,
+								 NULL);
+
+static struct attribute *rapl_pmu_per_core_attrs[] = {
+	&dev_attr_per_core_cpumask.attr,
+	NULL,
+};
+
+static struct attribute_group rapl_pmu_per_core_attr_group = {
+	.attrs = rapl_pmu_per_core_attrs,
+};
+
 RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
 RAPL_EVENT_ATTR_STR(energy-pkg  ,   rapl_pkg, "event=0x02");
 RAPL_EVENT_ATTR_STR(energy-ram  ,   rapl_ram, "event=0x03");
 RAPL_EVENT_ATTR_STR(energy-gpu  ,   rapl_gpu, "event=0x04");
 RAPL_EVENT_ATTR_STR(energy-psys,   rapl_psys, "event=0x05");
+RAPL_EVENT_ATTR_STR(energy-per-core,   rapl_per_core, "event=0x01");
 
 RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
 RAPL_EVENT_ATTR_STR(energy-pkg.unit  ,   rapl_pkg_unit, "Joules");
 RAPL_EVENT_ATTR_STR(energy-ram.unit  ,   rapl_ram_unit, "Joules");
 RAPL_EVENT_ATTR_STR(energy-gpu.unit  ,   rapl_gpu_unit, "Joules");
 RAPL_EVENT_ATTR_STR(energy-psys.unit,   rapl_psys_unit, "Joules");
+RAPL_EVENT_ATTR_STR(energy-per-core.unit,   rapl_per_core_unit, "Joules");
 
 /*
  * we compute in 0.23 nJ increments regardless of MSR
@@ -428,6 +480,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale,     rapl_pkg_scale, "2.3283064365386962890
 RAPL_EVENT_ATTR_STR(energy-ram.scale,     rapl_ram_scale, "2.3283064365386962890625e-10");
 RAPL_EVENT_ATTR_STR(energy-gpu.scale,     rapl_gpu_scale, "2.3283064365386962890625e-10");
 RAPL_EVENT_ATTR_STR(energy-psys.scale,   rapl_psys_scale, "2.3283064365386962890625e-10");
+RAPL_EVENT_ATTR_STR(energy-per-core.scale,   rapl_per_core_scale, "2.3283064365386962890625e-10");
 
 /*
  * There are no default events, but we need to create
@@ -461,6 +514,13 @@ static const struct attribute_group *rapl_attr_groups[] = {
 	NULL,
 };
 
+static const struct attribute_group *rapl_per_core_attr_groups[] = {
+	&rapl_pmu_per_core_attr_group,
+	&rapl_pmu_format_group,
+	&rapl_pmu_events_group,
+	NULL,
+};
+
 static struct attribute *rapl_events_cores[] = {
 	EVENT_PTR(rapl_cores),
 	EVENT_PTR(rapl_cores_unit),
@@ -521,6 +581,18 @@ static struct attribute_group rapl_events_psys_group = {
 	.attrs = rapl_events_psys,
 };
 
+static struct attribute *rapl_events_per_core[] = {
+	EVENT_PTR(rapl_per_core),
+	EVENT_PTR(rapl_per_core_unit),
+	EVENT_PTR(rapl_per_core_scale),
+	NULL,
+};
+
+static struct attribute_group rapl_events_per_core_group = {
+	.name  = "events",
+	.attrs = rapl_events_per_core,
+};
+
 static bool test_msr(int idx, void *data)
 {
 	return test_bit(idx, (unsigned long *) data);
@@ -558,6 +630,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
 	[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group,  NULL, false, 0 },
 };
 
+static struct perf_msr amd_rapl_core_msrs[] = {
+	[PERF_RAPL_PER_CORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_per_core_group,
+				 test_msr, false, RAPL_MSR_MASK },
+};
+
 static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
 			      const struct cpumask *event_cpumask, unsigned int cpu)
 {
@@ -583,8 +660,14 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu
 
 static int rapl_cpu_offline(unsigned int cpu)
 {
-	return __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
+	int ret =  __rapl_cpu_offline(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
 				  get_rapl_pmu_cpumask(cpu), cpu);
+
+	if (ret == 0 && rapl_model->core_events)
+		ret = __rapl_cpu_offline(rapl_pmus_core, topology_logical_core_id(cpu),
+				   topology_sibling_cpumask(cpu), cpu);
+
+	return ret;
 }
 
 static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
@@ -622,8 +705,14 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_
 
 static int rapl_cpu_online(unsigned int cpu)
 {
-	return __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
+	int ret =  __rapl_cpu_online(rapl_pmus_pkg, get_rapl_pmu_idx(cpu),
 				 get_rapl_pmu_cpumask(cpu), cpu);
+
+	if (ret == 0 && rapl_model->core_events)
+		ret = __rapl_cpu_online(rapl_pmus_core, topology_logical_core_id(cpu),
+				   topology_sibling_cpumask(cpu), cpu);
+
+	return ret;
 }
 
 
@@ -677,7 +766,7 @@ static void __init rapl_advertise(void)
 	int i;
 
 	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
-		hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
+		hweight32(rapl_pkg_cntr_mask) + hweight32(rapl_core_cntr_mask), rapl_timer_ms);
 
 	for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
 		if (rapl_pkg_cntr_mask & (1 << i)) {
@@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
 				rapl_pkg_domain_names[i], rapl_hw_unit[i]);
 		}
 	}
+
+	for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
+		if (rapl_core_cntr_mask & (1 << i)) {
+			pr_info("hw unit of domain %s 2^-%d Joules\n",
+				rapl_core_domain_names[i], rapl_hw_unit[i]);
+		}
+	}
 }
 
 static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
@@ -705,15 +801,16 @@ static const struct attribute_group *rapl_attr_update[] = {
 	NULL,
 };
 
-static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
+static const struct attribute_group *rapl_per_core_attr_update[] = {
+	&rapl_events_per_core_group,
+};
+
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int nr_rapl_pmu,
+				 const struct attribute_group **rapl_attr_groups,
+				 const struct attribute_group **rapl_attr_update)
 {
 	struct rapl_pmus *rapl_pmus;
 
-	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
-
-	if (rapl_pmu_is_pkg_scope())
-		nr_rapl_pmu = topology_max_packages();
-
 	rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
 	if (!rapl_pmus)
 		return -ENOMEM;
@@ -741,7 +838,7 @@ static struct rapl_model model_snb = {
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_PP1),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_snbep = {
@@ -749,7 +846,7 @@ static struct rapl_model model_snbep = {
 			  BIT(PERF_RAPL_PKG) |
 			  BIT(PERF_RAPL_RAM),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_hsw = {
@@ -758,7 +855,7 @@ static struct rapl_model model_hsw = {
 			  BIT(PERF_RAPL_RAM) |
 			  BIT(PERF_RAPL_PP1),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_hsx = {
@@ -767,7 +864,7 @@ static struct rapl_model model_hsx = {
 			  BIT(PERF_RAPL_RAM),
 	.unit_quirk	= RAPL_UNIT_QUIRK_INTEL_HSW,
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_knl = {
@@ -775,7 +872,7 @@ static struct rapl_model model_knl = {
 			  BIT(PERF_RAPL_RAM),
 	.unit_quirk	= RAPL_UNIT_QUIRK_INTEL_HSW,
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_skl = {
@@ -785,7 +882,7 @@ static struct rapl_model model_skl = {
 			  BIT(PERF_RAPL_PP1) |
 			  BIT(PERF_RAPL_PSYS),
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_msrs,
+	.rapl_pkg_msrs	= intel_rapl_msrs,
 };
 
 static struct rapl_model model_spr = {
@@ -795,13 +892,15 @@ static struct rapl_model model_spr = {
 			  BIT(PERF_RAPL_PSYS),
 	.unit_quirk	= RAPL_UNIT_QUIRK_INTEL_SPR,
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs      = intel_rapl_spr_msrs,
+	.rapl_pkg_msrs	= intel_rapl_spr_msrs,
 };
 
 static struct rapl_model model_amd_hygon = {
 	.pkg_events	= BIT(PERF_RAPL_PKG),
+	.core_events	= BIT(PERF_RAPL_PER_CORE),
 	.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
-	.rapl_msrs      = amd_rapl_pkg_msrs,
+	.rapl_pkg_msrs	= amd_rapl_pkg_msrs,
+	.rapl_core_msrs	= amd_rapl_core_msrs,
 };
 
 static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
 {
 	const struct x86_cpu_id *id;
 	int ret;
+	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+	int nr_cores = topology_max_packages() * topology_num_cores_per_package();
+
+	if (rapl_pmu_is_pkg_scope())
+		nr_rapl_pmu = topology_max_packages();
 
 	id = x86_match_cpu(rapl_model_match);
 	if (!id)
@@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
 
 	rapl_model = (struct rapl_model *) id->driver_data;
 
-	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
 					false, (void *) &rapl_model->pkg_events);
 
 	ret = rapl_check_hw_unit();
 	if (ret)
 		return ret;
 
-	ret = init_rapl_pmus(&rapl_pmus_pkg);
+	ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu, rapl_attr_groups, rapl_attr_update);
 	if (ret)
 		return ret;
 
+	if (rapl_model->core_events) {
+		rapl_core_cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
+						     PERF_RAPL_CORE_EVENTS_MAX, false,
+						     (void *) &rapl_model->core_events);
+
+		ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
+				     rapl_per_core_attr_groups, rapl_per_core_attr_update);
+		if (ret) {
+			/*
+			 * If initialization of per_core PMU fails, reset per_core
+			 * flag, and continue with power PMU initialization.
+			 */
+			pr_warn("Per-core PMU initialization failed (%d)\n", ret);
+			rapl_model->core_events = 0UL;
+		}
+	}
+
 	/*
 	 * Install callbacks. Core will call them for each online cpu.
 	 */
@@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		goto out1;
 
+	if (rapl_model->core_events) {
+		ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_per_core", -1);
+		if (ret) {
+			/*
+			 * If registration of per_core PMU fails, cleanup per_core PMU
+			 * variables, reset the per_core flag and keep the
+			 * power PMU untouched.
+			 */
+			pr_warn("Per-core PMU registration failed (%d)\n", ret);
+			cleanup_rapl_pmus(rapl_pmus_core);
+			rapl_model->core_events = 0UL;
+		}
+	}
+
 	rapl_advertise();
 	return 0;
 
@@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
 	perf_pmu_unregister(&rapl_pmus_pkg->pmu);
 	cleanup_rapl_pmus(rapl_pmus_pkg);
+	if (rapl_model->core_events) {
+		perf_pmu_unregister(&rapl_pmus_core->pmu);
+		cleanup_rapl_pmus(rapl_pmus_core);
+	}
 }
 module_exit(intel_rapl_exit);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  2024-06-24  5:59 ` [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-06-26 15:18   ` Zhang, Rui
  2024-06-26 16:37     ` Dhananjay Ugwekar
  0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-06-26 15:18 UTC (permalink / raw)
  To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	Hunter, Adrian, mingo@redhat.com, irogers@google.com,
	tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
	acme@kernel.org, oleksandr@natalenko.name, jolsa@kernel.org,
	x86@kernel.org, namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	sandipan.das@amd.com, ananth.narayan@amd.com,
	linux-pm@vger.kernel.org


> @@ -131,8 +146,10 @@ enum rapl_unit_quirk {
>  };
>  
>  struct rapl_model {
> -       struct perf_msr *rapl_msrs;
> +       struct perf_msr *rapl_pkg_msrs;

IMO, this should be part of patch 8/10.

[...]

> @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
>                                 rapl_pkg_domain_names[i],
> rapl_hw_unit[i]);
>                 }
>         }
> +
> +       for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
> +               if (rapl_core_cntr_mask & (1 << i)) {
> +                       pr_info("hw unit of domain %s 2^-%d
> Joules\n",
> +                               rapl_core_domain_names[i],
> rapl_hw_unit[i]);

rapl_hw_unit[] is for package pmu only and
rapl_hw_unit[0] is rapl_hw_unit[PERF_RAPL_PP0] rather than
rapl_hw_unit[PERF_RAPL_PER_CORE]

you cannot use rapl_hw_unit[i] to represent per-core rapl domain unit.

> +               }
> +       }
>  }
>  
>  static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
> @@ -705,15 +801,16 @@ static const struct attribute_group
> *rapl_attr_update[] = {
>         NULL,
>  };
>  
> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
> +static const struct attribute_group *rapl_per_core_attr_update[] = {
> +       &rapl_events_per_core_group,
> +};
> +
> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr,
> int nr_rapl_pmu,
> +                                const struct attribute_group
> **rapl_attr_groups,
> +                                const struct attribute_group
> **rapl_attr_update)
>  {
>         struct rapl_pmus *rapl_pmus;
>  
> -       int nr_rapl_pmu = topology_max_packages() *
> topology_max_dies_per_package();
> -
> -       if (rapl_pmu_is_pkg_scope())
> -               nr_rapl_pmu = topology_max_packages();
> -
>         rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu,
> nr_rapl_pmu), GFP_KERNEL);
>         if (!rapl_pmus)
>                 return -ENOMEM;
> @@ -741,7 +838,7 @@ static struct rapl_model model_snb = {
>                           BIT(PERF_RAPL_PKG) |
>                           BIT(PERF_RAPL_PP1),
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_snbep = {
> @@ -749,7 +846,7 @@ static struct rapl_model model_snbep = {
>                           BIT(PERF_RAPL_PKG) |
>                           BIT(PERF_RAPL_RAM),
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_hsw = {
> @@ -758,7 +855,7 @@ static struct rapl_model model_hsw = {
>                           BIT(PERF_RAPL_RAM) |
>                           BIT(PERF_RAPL_PP1),
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_hsx = {
> @@ -767,7 +864,7 @@ static struct rapl_model model_hsx = {
>                           BIT(PERF_RAPL_RAM),
>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_HSW,
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_knl = {
> @@ -775,7 +872,7 @@ static struct rapl_model model_knl = {
>                           BIT(PERF_RAPL_RAM),
>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_HSW,
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_skl = {
> @@ -785,7 +882,7 @@ static struct rapl_model model_skl = {
>                           BIT(PERF_RAPL_PP1) |
>                           BIT(PERF_RAPL_PSYS),
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>  };
>  
>  static struct rapl_model model_spr = {
> @@ -795,13 +892,15 @@ static struct rapl_model model_spr = {
>                           BIT(PERF_RAPL_PSYS),
>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_SPR,
>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
> -       .rapl_msrs      = intel_rapl_spr_msrs,
> +       .rapl_pkg_msrs  = intel_rapl_spr_msrs,
>  };

All the above renaming code should be in patch 8/10.
Or else it is a distraction for reviewing this patch.

>  
>  static struct rapl_model model_amd_hygon = {
>         .pkg_events     = BIT(PERF_RAPL_PKG),
> +       .core_events    = BIT(PERF_RAPL_PER_CORE),
>         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> -       .rapl_msrs      = amd_rapl_pkg_msrs,
> +       .rapl_pkg_msrs  = amd_rapl_pkg_msrs,
> +       .rapl_core_msrs = amd_rapl_core_msrs,
>  };
>  
>  static const struct x86_cpu_id rapl_model_match[] __initconst = {
> @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
>  {
>         const struct x86_cpu_id *id;
>         int ret;
> +       int nr_rapl_pmu = topology_max_packages() *
> topology_max_dies_per_package();
> +       int nr_cores = topology_max_packages() *
> topology_num_cores_per_package();

I'd suggest either using two variables nr_pkgs/nr_cores, or reuse one
variable nr_rapl_pmu for both pkg pmu and per-core pmu.

> +
> +       if (rapl_pmu_is_pkg_scope())
> +               nr_rapl_pmu = topology_max_packages();
>  
>         id = x86_match_cpu(rapl_model_match);
>         if (!id)
> @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
>  
>         rapl_model = (struct rapl_model *) id->driver_data;
>  
> -       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs,
> PERF_RAPL_PKG_EVENTS_MAX,
> +       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
> >rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>                                         false, (void *) &rapl_model-
> >pkg_events);
>  
>         ret = rapl_check_hw_unit();
>         if (ret)
>                 return ret;
>  
> -       ret = init_rapl_pmus(&rapl_pmus_pkg);
> +       ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu,
> rapl_attr_groups, rapl_attr_update);
>         if (ret)
>                 return ret;
>  
> +       if (rapl_model->core_events) {
> +               rapl_core_cntr_mask = perf_msr_probe(rapl_model-
> >rapl_core_msrs,
> +                                                   
> PERF_RAPL_CORE_EVENTS_MAX, false,
> +                                                    (void *)
> &rapl_model->core_events);
> +
> +               ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
> +                                    rapl_per_core_attr_groups,
> rapl_per_core_attr_update);
> +               if (ret) {
> +                       /*
> +                        * If initialization of per_core PMU fails,
> reset per_core
> +                        * flag, and continue with power PMU
> initialization.
> +                        */
> +                       pr_warn("Per-core PMU initialization failed
> (%d)\n", ret);
> +                       rapl_model->core_events = 0UL;
> +               }
> +       }
> +
>         /*
>          * Install callbacks. Core will call them for each online
> cpu.
>          */
> @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
>         if (ret)
>                 goto out1;
>  
> +       if (rapl_model->core_events) {
> +               ret = perf_pmu_register(&rapl_pmus_core->pmu,
> "power_per_core", -1);
> +               if (ret) {
> +                       /*
> +                        * If registration of per_core PMU fails,
> cleanup per_core PMU
> +                        * variables, reset the per_core flag and
> keep the
> +                        * power PMU untouched.
> +                        */
> +                       pr_warn("Per-core PMU registration failed
> (%d)\n", ret);
> +                       cleanup_rapl_pmus(rapl_pmus_core);
> +                       rapl_model->core_events = 0UL;
> +               }
> +       }
> +
>         rapl_advertise();
>         return 0;
>  
> @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
>         cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
>         perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>         cleanup_rapl_pmus(rapl_pmus_pkg);
> +       if (rapl_model->core_events) {
> +               perf_pmu_unregister(&rapl_pmus_core->pmu);
> +               cleanup_rapl_pmus(rapl_pmus_core);
> +       }

we do check rapl_pmus_core before accessing it, but we never check
rapl_pmus_pkg because the previous code assumes it always exists.

so could there be a problem if some one starts the per-core pmu when
pkg pmu is unregistered and cleaned up?

say, in rapl_pmu_event_init(),

if (event->attr.type == rapl_pmus_pkg->pmu.type ||
   (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))

this can break because rapl_pmus_pkg is freed, right?

thanks,
rui


>  }
>  module_exit(intel_rapl_exit);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  2024-06-26 15:18   ` Zhang, Rui
@ 2024-06-26 16:37     ` Dhananjay Ugwekar
  2024-06-27  6:49       ` Zhang, Rui
  0 siblings, 1 reply; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-26 16:37 UTC (permalink / raw)
  To: Zhang, Rui, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, Hunter, Adrian, mingo@redhat.com,
	irogers@google.com, tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, bp@alien8.de, acme@kernel.org,
	oleksandr@natalenko.name, jolsa@kernel.org, x86@kernel.org,
	namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	sandipan.das@amd.com, ananth.narayan@amd.com,
	linux-pm@vger.kernel.org

Hello Rui,

On 6/26/2024 8:48 PM, Zhang, Rui wrote:
> 
>> @@ -131,8 +146,10 @@ enum rapl_unit_quirk {
>>  };
>>  
>>  struct rapl_model {
>> -       struct perf_msr *rapl_msrs;
>> +       struct perf_msr *rapl_pkg_msrs;
> 
> IMO, this should be part of patch 8/10.

Makes sense, better to move all the renaming code to 8th patch.

> 
> [...]
> 
>> @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
>>                                 rapl_pkg_domain_names[i],
>> rapl_hw_unit[i]);
>>                 }
>>         }
>> +
>> +       for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
>> +               if (rapl_core_cntr_mask & (1 << i)) {
>> +                       pr_info("hw unit of domain %s 2^-%d
>> Joules\n",
>> +                               rapl_core_domain_names[i],
>> rapl_hw_unit[i]);
> 
> rapl_hw_unit[] is for package pmu only and
> rapl_hw_unit[0] is rapl_hw_unit[PERF_RAPL_PP0] rather than
> rapl_hw_unit[PERF_RAPL_PER_CORE]
> 
> you cannot use rapl_hw_unit[i] to represent per-core rapl domain unit.

Yes right, I saw that all the elements in the rapl_hw_unit array were actually 
using the value from the same register "MSR_RAPL_POWER_UNIT" or "MSR_AMD_RAPL_POWER_UNIT".
Except for the two quirks,
 
 737         case RAPL_UNIT_QUIRK_INTEL_HSW:                                                                                                                                                                      
 738                 rapl_hw_unit[PERF_RAPL_RAM] = 16;                                                                                                                                                            
 739                 break;                                                                                                                                                                                       
 740         /* SPR uses a fixed energy unit for Psys domain. */
 741         case RAPL_UNIT_QUIRK_INTEL_SPR:
 742                 rapl_hw_unit[PERF_RAPL_PSYS] = 0;
 743                 break;

So, as for AMD systems the rapl_hw_unit[] elements will always have the same value, I ended 
up using the rapl_hw_unit[PERF_RAPL_PP0] for rapl_hw_unit[PERF_RAPL_PER_CORE], but I do realize
it is quite hacky. So, better to do it cleanly and add a separate array/variable for the core events.

> 
>> +               }
>> +       }
>>  }
>>  
>>  static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
>> @@ -705,15 +801,16 @@ static const struct attribute_group
>> *rapl_attr_update[] = {
>>         NULL,
>>  };
>>  
>> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
>> +static const struct attribute_group *rapl_per_core_attr_update[] = {
>> +       &rapl_events_per_core_group,
>> +};
>> +
>> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr,
>> int nr_rapl_pmu,
>> +                                const struct attribute_group
>> **rapl_attr_groups,
>> +                                const struct attribute_group
>> **rapl_attr_update)
>>  {
>>         struct rapl_pmus *rapl_pmus;
>>  
>> -       int nr_rapl_pmu = topology_max_packages() *
>> topology_max_dies_per_package();
>> -
>> -       if (rapl_pmu_is_pkg_scope())
>> -               nr_rapl_pmu = topology_max_packages();
>> -
>>         rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu,
>> nr_rapl_pmu), GFP_KERNEL);
>>         if (!rapl_pmus)
>>                 return -ENOMEM;
>> @@ -741,7 +838,7 @@ static struct rapl_model model_snb = {
>>                           BIT(PERF_RAPL_PKG) |
>>                           BIT(PERF_RAPL_PP1),
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_snbep = {
>> @@ -749,7 +846,7 @@ static struct rapl_model model_snbep = {
>>                           BIT(PERF_RAPL_PKG) |
>>                           BIT(PERF_RAPL_RAM),
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_hsw = {
>> @@ -758,7 +855,7 @@ static struct rapl_model model_hsw = {
>>                           BIT(PERF_RAPL_RAM) |
>>                           BIT(PERF_RAPL_PP1),
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_hsx = {
>> @@ -767,7 +864,7 @@ static struct rapl_model model_hsx = {
>>                           BIT(PERF_RAPL_RAM),
>>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_HSW,
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_knl = {
>> @@ -775,7 +872,7 @@ static struct rapl_model model_knl = {
>>                           BIT(PERF_RAPL_RAM),
>>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_HSW,
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_skl = {
>> @@ -785,7 +882,7 @@ static struct rapl_model model_skl = {
>>                           BIT(PERF_RAPL_PP1) |
>>                           BIT(PERF_RAPL_PSYS),
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_msrs,
>>  };
>>  
>>  static struct rapl_model model_spr = {
>> @@ -795,13 +892,15 @@ static struct rapl_model model_spr = {
>>                           BIT(PERF_RAPL_PSYS),
>>         .unit_quirk     = RAPL_UNIT_QUIRK_INTEL_SPR,
>>         .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = intel_rapl_spr_msrs,
>> +       .rapl_pkg_msrs  = intel_rapl_spr_msrs,
>>  };
> 
> All the above renaming code should be in patch 8/10.
> Or else it is a distraction for reviewing this patch.

Agreed, will move it in the next version.

> 
>>  
>>  static struct rapl_model model_amd_hygon = {
>>         .pkg_events     = BIT(PERF_RAPL_PKG),
>> +       .core_events    = BIT(PERF_RAPL_PER_CORE),
>>         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>> -       .rapl_msrs      = amd_rapl_pkg_msrs,
>> +       .rapl_pkg_msrs  = amd_rapl_pkg_msrs,
>> +       .rapl_core_msrs = amd_rapl_core_msrs,
>>  };
>>  
>>  static const struct x86_cpu_id rapl_model_match[] __initconst = {
>> @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
>>  {
>>         const struct x86_cpu_id *id;
>>         int ret;
>> +       int nr_rapl_pmu = topology_max_packages() *
>> topology_max_dies_per_package();
>> +       int nr_cores = topology_max_packages() *
>> topology_num_cores_per_package();
> 
> I'd suggest either using two variables nr_pkgs/nr_cores, or reuse one
> variable nr_rapl_pmu for both pkg pmu and per-core pmu.

I understand your point, but the problem with that is, there are actually three scopes needed here

Some Intel systems need a *die* scope for the rapl_pmus_pkg PMU
Some Intel systems and all AMD systems need a *package* scope for the rapl_pmus_pkg PMU
And AMD systems need a *core* scope for the rapl_pmus_per_core PMU

I think what we can do is three variables, nr_dies (for all Intel systems as before), 
nr_pkgs(for AMD systems rapl_pmus_pkg PMU) and nr_cores(for rapl_pmus_per_core PMU)

Sounds good?

> 
>> +
>> +       if (rapl_pmu_is_pkg_scope())
>> +               nr_rapl_pmu = topology_max_packages();
>>  
>>         id = x86_match_cpu(rapl_model_match);
>>         if (!id)
>> @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
>>  
>>         rapl_model = (struct rapl_model *) id->driver_data;
>>  
>> -       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs,
>> PERF_RAPL_PKG_EVENTS_MAX,
>> +       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
>>> rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>>                                         false, (void *) &rapl_model-
>>> pkg_events);
>>  
>>         ret = rapl_check_hw_unit();
>>         if (ret)
>>                 return ret;
>>  
>> -       ret = init_rapl_pmus(&rapl_pmus_pkg);
>> +       ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu,
>> rapl_attr_groups, rapl_attr_update);
>>         if (ret)
>>                 return ret;
>>  
>> +       if (rapl_model->core_events) {
>> +               rapl_core_cntr_mask = perf_msr_probe(rapl_model-
>>> rapl_core_msrs,
>> +                                                   
>> PERF_RAPL_CORE_EVENTS_MAX, false,
>> +                                                    (void *)
>> &rapl_model->core_events);
>> +
>> +               ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
>> +                                    rapl_per_core_attr_groups,
>> rapl_per_core_attr_update);
>> +               if (ret) {
>> +                       /*
>> +                        * If initialization of per_core PMU fails,
>> reset per_core
>> +                        * flag, and continue with power PMU
>> initialization.
>> +                        */
>> +                       pr_warn("Per-core PMU initialization failed
>> (%d)\n", ret);
>> +                       rapl_model->core_events = 0UL;
>> +               }
>> +       }
>> +
>>         /*
>>          * Install callbacks. Core will call them for each online
>> cpu.
>>          */
>> @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
>>         if (ret)
>>                 goto out1;
>>  
>> +       if (rapl_model->core_events) {
>> +               ret = perf_pmu_register(&rapl_pmus_core->pmu,
>> "power_per_core", -1);
>> +               if (ret) {
>> +                       /*
>> +                        * If registration of per_core PMU fails,
>> cleanup per_core PMU
>> +                        * variables, reset the per_core flag and
>> keep the
>> +                        * power PMU untouched.
>> +                        */
>> +                       pr_warn("Per-core PMU registration failed
>> (%d)\n", ret);
>> +                       cleanup_rapl_pmus(rapl_pmus_core);
>> +                       rapl_model->core_events = 0UL;
>> +               }
>> +       }
>> +
>>         rapl_advertise();
>>         return 0;
>>  
>> @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
>>         cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
>>         perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>>         cleanup_rapl_pmus(rapl_pmus_pkg);
>> +       if (rapl_model->core_events) {
>> +               perf_pmu_unregister(&rapl_pmus_core->pmu);
>> +               cleanup_rapl_pmus(rapl_pmus_core);
>> +       }
> 
> we do check rapl_pmus_core before accessing it, but we never check
> rapl_pmus_pkg because the previous code assumes it always exists.
> 
> so could there be a problem if some one starts the per-core pmu when
> pkg pmu is unregistered and cleaned up?
> 
> say, in rapl_pmu_event_init(),
> 
> if (event->attr.type == rapl_pmus_pkg->pmu.type ||
>    (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
> 
> this can break because rapl_pmus_pkg is freed, right?

Hmm, I think this situation can't arise as whenever the power PMU fails, we 
directly go to the failure path and dont setup the per-core PMU(which means 
no one will be able to start the per-core PMU), 
Please let me know if there is a scenario where this assumption can fail.

Thanks for all the helpful suggestions!, will incorporate them in v4.

Regards,
Dhananjay

> 
> thanks,
> rui
> 
> 
>>  }
>>  module_exit(intel_rapl_exit);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  2024-06-26 16:37     ` Dhananjay Ugwekar
@ 2024-06-27  6:49       ` Zhang, Rui
  2024-06-27 11:13         ` Dhananjay Ugwekar
  0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Rui @ 2024-06-27  6:49 UTC (permalink / raw)
  To: alexander.shishkin@linux.intel.com, dave.hansen@linux.intel.com,
	Hunter, Adrian, oleksandr@natalenko.name, mingo@redhat.com,
	irogers@google.com, tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, Dhananjay.Ugwekar@amd.com, bp@alien8.de,
	acme@kernel.org, jolsa@kernel.org, x86@kernel.org,
	namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	ananth.narayan@amd.com, sandipan.das@amd.com,
	linux-pm@vger.kernel.org

Hi, Dhananjay
> 
> > 
> > [...]
> > 
> > > @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
> > >                                 rapl_pkg_domain_names[i],
> > > rapl_hw_unit[i]);
> > >                 }
> > >         }
> > > +
> > > +       for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
> > > +               if (rapl_core_cntr_mask & (1 << i)) {
> > > +                       pr_info("hw unit of domain %s 2^-%d
> > > Joules\n",
> > > +                               rapl_core_domain_names[i],
> > > rapl_hw_unit[i]);
> > 
> > rapl_hw_unit[] is for package pmu only and
> > rapl_hw_unit[0] is rapl_hw_unit[PERF_RAPL_PP0] rather than
> > rapl_hw_unit[PERF_RAPL_PER_CORE]
> > 
> > you cannot use rapl_hw_unit[i] to represent per-core rapl domain
> > unit.
> 
> Yes right, I saw that all the elements in the rapl_hw_unit array were
> actually 
> using the value from the same register "MSR_RAPL_POWER_UNIT" or
> "MSR_AMD_RAPL_POWER_UNIT".
> Except for the two quirks,
>  
>  737         case
> RAPL_UNIT_QUIRK_INTEL_HSW:                                           
>                                                                      
>                                                       
>  738                 rapl_hw_unit[PERF_RAPL_RAM] =
> 16;                                                                  
>                                                                      
>                      
>  739                
> break;                                                               
>                                                                      
>                                                    
>  740         /* SPR uses a fixed energy unit for Psys domain. */
>  741         case RAPL_UNIT_QUIRK_INTEL_SPR:
>  742                 rapl_hw_unit[PERF_RAPL_PSYS] = 0;
>  743                 break;
> 
> So, as for AMD systems the rapl_hw_unit[] elements will always have
> the same value, I ended 
> up using the rapl_hw_unit[PERF_RAPL_PP0] for
> rapl_hw_unit[PERF_RAPL_PER_CORE], but I do realize
> it is quite hacky. So, better to do it cleanly and add a separate
> array/variable for the core events.
> 
yeah, that is much better.
> 

> > 
> > >  
> > >  static struct rapl_model model_amd_hygon = {
> > >         .pkg_events     = BIT(PERF_RAPL_PKG),
> > > +       .core_events    = BIT(PERF_RAPL_PER_CORE),
> > >         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> > > -       .rapl_msrs      = amd_rapl_pkg_msrs,
> > > +       .rapl_pkg_msrs  = amd_rapl_pkg_msrs,
> > > +       .rapl_core_msrs = amd_rapl_core_msrs,
> > >  };
> > >  
> > >  static const struct x86_cpu_id rapl_model_match[] __initconst =
> > > {
> > > @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
> > >  {
> > >         const struct x86_cpu_id *id;
> > >         int ret;
> > > +       int nr_rapl_pmu = topology_max_packages() *
> > > topology_max_dies_per_package();
> > > +       int nr_cores = topology_max_packages() *
> > > topology_num_cores_per_package();
> > 
> > I'd suggest either using two variables nr_pkgs/nr_cores, or reuse
> > one
> > variable nr_rapl_pmu for both pkg pmu and per-core pmu.
> 
> I understand your point, but the problem with that is, there are
> actually three scopes needed here
> 
> Some Intel systems need a *die* scope for the rapl_pmus_pkg PMU
> Some Intel systems and all AMD systems need a *package* scope for the
> rapl_pmus_pkg PMU
> And AMD systems need a *core* scope for the rapl_pmus_per_core PMU
> 
> I think what we can do is three variables, nr_dies (for all Intel
> systems as before), 
> nr_pkgs(for AMD systems rapl_pmus_pkg PMU)

Not necessarily, we already uses rapl_pmus_pkg for intel systems,
right?

>  and nr_cores(for rapl_pmus_per_core PMU)
> 
> Sounds good?

what about just one variable "count" and reuse it for every cases?

> 
> > 
> > > +
> > > +       if (rapl_pmu_is_pkg_scope())
> > > +               nr_rapl_pmu = topology_max_packages();
> > >  
> > >         id = x86_match_cpu(rapl_model_match);
> > >         if (!id)
> > > @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
> > >  
> > >         rapl_model = (struct rapl_model *) id->driver_data;
> > >  
> > > -       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
> > > >rapl_msrs,
> > > PERF_RAPL_PKG_EVENTS_MAX,
> > > +       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
> > > > rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
> > >                                         false, (void *)
> > > &rapl_model-
> > > > pkg_events);
> > >  
> > >         ret = rapl_check_hw_unit();
> > >         if (ret)
> > >                 return ret;
> > >  
> > > -       ret = init_rapl_pmus(&rapl_pmus_pkg);
> > > +       ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu,
> > > rapl_attr_groups, rapl_attr_update);
> > >         if (ret)
> > >                 return ret;
> > >  
> > > +       if (rapl_model->core_events) {
> > > +               rapl_core_cntr_mask = perf_msr_probe(rapl_model-
> > > > rapl_core_msrs,
> > > +                                                   
> > > PERF_RAPL_CORE_EVENTS_MAX, false,
> > > +                                                    (void *)
> > > &rapl_model->core_events);
> > > +
> > > +               ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
> > > +                                    rapl_per_core_attr_groups,
> > > rapl_per_core_attr_update);
> > > +               if (ret) {
> > > +                       /*
> > > +                        * If initialization of per_core PMU
> > > fails,
> > > reset per_core
> > > +                        * flag, and continue with power PMU
> > > initialization.
> > > +                        */
> > > +                       pr_warn("Per-core PMU initialization
> > > failed
> > > (%d)\n", ret);
> > > +                       rapl_model->core_events = 0UL;
> > > +               }
> > > +       }
> > > +
> > >         /*
> > >          * Install callbacks. Core will call them for each online
> > > cpu.
> > >          */
> > > @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
> > >         if (ret)
> > >                 goto out1;
> > >  
> > > +       if (rapl_model->core_events) {
> > > +               ret = perf_pmu_register(&rapl_pmus_core->pmu,
> > > "power_per_core", -1);
> > > +               if (ret) {
> > > +                       /*
> > > +                        * If registration of per_core PMU fails,
> > > cleanup per_core PMU
> > > +                        * variables, reset the per_core flag and
> > > keep the
> > > +                        * power PMU untouched.
> > > +                        */
> > > +                       pr_warn("Per-core PMU registration failed
> > > (%d)\n", ret);
> > > +                       cleanup_rapl_pmus(rapl_pmus_core);
> > > +                       rapl_model->core_events = 0UL;
> > > +               }
> > > +       }
> > > +
> > >         rapl_advertise();
> > >         return 0;
> > >  
> > > @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
> > >         cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE)
> > > ;
> > >         perf_pmu_unregister(&rapl_pmus_pkg->pmu);
> > >         cleanup_rapl_pmus(rapl_pmus_pkg);
> > > +       if (rapl_model->core_events) {
> > > +               perf_pmu_unregister(&rapl_pmus_core->pmu);
> > > +               cleanup_rapl_pmus(rapl_pmus_core);
> > > +       }
> > 
> > we do check rapl_pmus_core before accessing it, but we never check
> > rapl_pmus_pkg because the previous code assumes it always exists.
> > 
> > so could there be a problem if some one starts the per-core pmu
> > when
> > pkg pmu is unregistered and cleaned up?
> > 
> > say, in rapl_pmu_event_init(),
> > 
> > if (event->attr.type == rapl_pmus_pkg->pmu.type ||
> >    (rapl_pmus_core && event->attr.type == rapl_pmus_core-
> > >pmu.type))
> > 
> > this can break because rapl_pmus_pkg is freed, right?
> 
> Hmm, I think this situation can't arise as whenever the power PMU
> fails, we 
> directly go to the failure path and dont setup the per-core PMU(which
> means 
> no one will be able to start the per-core PMU), 
> Please let me know if there is a scenario where this assumption can
> fail.

I mean if we do module unload and access power-per-core pmu at the same
time, could there be a race?

why not just unregister and cleanup the per-core pmu before the pkg
pmu?
> 

thanks,
rui


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  2024-06-27  6:49       ` Zhang, Rui
@ 2024-06-27 11:13         ` Dhananjay Ugwekar
  0 siblings, 0 replies; 31+ messages in thread
From: Dhananjay Ugwekar @ 2024-06-27 11:13 UTC (permalink / raw)
  To: Zhang, Rui, alexander.shishkin@linux.intel.com,
	dave.hansen@linux.intel.com, Hunter, Adrian,
	oleksandr@natalenko.name, mingo@redhat.com, irogers@google.com,
	tglx@linutronix.de, gustavoars@kernel.org,
	kan.liang@linux.intel.com, kees@kernel.org, mark.rutland@arm.com,
	peterz@infradead.org, bp@alien8.de, acme@kernel.org,
	jolsa@kernel.org, x86@kernel.org, namhyung@kernel.org
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	ananth.narayan@amd.com, sandipan.das@amd.com,
	linux-pm@vger.kernel.org

Hello Rui,

On 6/27/2024 12:19 PM, Zhang, Rui wrote:
> Hi, Dhananjay
>>
>>>
>>> [...]
>>>
>>>> @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
>>>>                                 rapl_pkg_domain_names[i],
>>>> rapl_hw_unit[i]);
>>>>                 }
>>>>         }
>>>> +
>>>> +       for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
>>>> +               if (rapl_core_cntr_mask & (1 << i)) {
>>>> +                       pr_info("hw unit of domain %s 2^-%d
>>>> Joules\n",
>>>> +                               rapl_core_domain_names[i],
>>>> rapl_hw_unit[i]);
>>>
>>> rapl_hw_unit[] is for package pmu only and
>>> rapl_hw_unit[0] is rapl_hw_unit[PERF_RAPL_PP0] rather than
>>> rapl_hw_unit[PERF_RAPL_PER_CORE]
>>>
>>> you cannot use rapl_hw_unit[i] to represent per-core rapl domain
>>> unit.
>>
>> Yes right, I saw that all the elements in the rapl_hw_unit array were
>> actually 
>> using the value from the same register "MSR_RAPL_POWER_UNIT" or
>> "MSR_AMD_RAPL_POWER_UNIT".
>> Except for the two quirks,
>>  
>>  737         case
>> RAPL_UNIT_QUIRK_INTEL_HSW:                                           
>>                                                                      
>>                                                       
>>  738                 rapl_hw_unit[PERF_RAPL_RAM] =
>> 16;                                                                  
>>                                                                      
>>                      
>>  739                
>> break;                                                               
>>                                                                      
>>                                                    
>>  740         /* SPR uses a fixed energy unit for Psys domain. */
>>  741         case RAPL_UNIT_QUIRK_INTEL_SPR:
>>  742                 rapl_hw_unit[PERF_RAPL_PSYS] = 0;
>>  743                 break;
>>
>> So, as for AMD systems the rapl_hw_unit[] elements will always have
>> the same value, I ended 
>> up using the rapl_hw_unit[PERF_RAPL_PP0] for
>> rapl_hw_unit[PERF_RAPL_PER_CORE], but I do realize
>> it is quite hacky. So, better to do it cleanly and add a separate
>> array/variable for the core events.
>>
> yeah, that is much better.
>>
> 
>>>
>>>>  
>>>>  static struct rapl_model model_amd_hygon = {
>>>>         .pkg_events     = BIT(PERF_RAPL_PKG),
>>>> +       .core_events    = BIT(PERF_RAPL_PER_CORE),
>>>>         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>>>> -       .rapl_msrs      = amd_rapl_pkg_msrs,
>>>> +       .rapl_pkg_msrs  = amd_rapl_pkg_msrs,
>>>> +       .rapl_core_msrs = amd_rapl_core_msrs,
>>>>  };
>>>>  
>>>>  static const struct x86_cpu_id rapl_model_match[] __initconst =
>>>> {
>>>> @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
>>>>  {
>>>>         const struct x86_cpu_id *id;
>>>>         int ret;
>>>> +       int nr_rapl_pmu = topology_max_packages() *
>>>> topology_max_dies_per_package();
>>>> +       int nr_cores = topology_max_packages() *
>>>> topology_num_cores_per_package();
>>>
>>> I'd suggest either using two variables nr_pkgs/nr_cores, or reuse
>>> one
>>> variable nr_rapl_pmu for both pkg pmu and per-core pmu.
>>
>> I understand your point, but the problem with that is, there are
>> actually three scopes needed here
>>
>> Some Intel systems need a *die* scope for the rapl_pmus_pkg PMU
>> Some Intel systems and all AMD systems need a *package* scope for the
>> rapl_pmus_pkg PMU
>> And AMD systems need a *core* scope for the rapl_pmus_per_core PMU
>>
>> I think what we can do is three variables, nr_dies (for all Intel
>> systems as before), 
>> nr_pkgs(for AMD systems rapl_pmus_pkg PMU)
> 
> Not necessarily, we already uses rapl_pmus_pkg for intel systems,
> right?

Yes, but scope of rapl_pmus_pkg is actually *die* for some Intel systems
(Xeon Cascade Lake-AP to be specific), right?

Whereas, all AMD systems have pkg scope for power PMU.

> 
>>  and nr_cores(for rapl_pmus_per_core PMU)
>>
>> Sounds good?
> 
> what about just one variable "count" and reuse it for every cases?

Sure that would be cleaner, albeit not that explicit, will make the change 
in next version

> 
>>
>>>
>>>> +
>>>> +       if (rapl_pmu_is_pkg_scope())
>>>> +               nr_rapl_pmu = topology_max_packages();
>>>>  
>>>>         id = x86_match_cpu(rapl_model_match);
>>>>         if (!id)
>>>> @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
>>>>  
>>>>         rapl_model = (struct rapl_model *) id->driver_data;
>>>>  
>>>> -       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
>>>>> rapl_msrs,
>>>> PERF_RAPL_PKG_EVENTS_MAX,
>>>> +       rapl_pkg_cntr_mask = perf_msr_probe(rapl_model-
>>>>> rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>>>>                                         false, (void *)
>>>> &rapl_model-
>>>>> pkg_events);
>>>>  
>>>>         ret = rapl_check_hw_unit();
>>>>         if (ret)
>>>>                 return ret;
>>>>  
>>>> -       ret = init_rapl_pmus(&rapl_pmus_pkg);
>>>> +       ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu,
>>>> rapl_attr_groups, rapl_attr_update);
>>>>         if (ret)
>>>>                 return ret;
>>>>  
>>>> +       if (rapl_model->core_events) {
>>>> +               rapl_core_cntr_mask = perf_msr_probe(rapl_model-
>>>>> rapl_core_msrs,
>>>> +                                                   
>>>> PERF_RAPL_CORE_EVENTS_MAX, false,
>>>> +                                                    (void *)
>>>> &rapl_model->core_events);
>>>> +
>>>> +               ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
>>>> +                                    rapl_per_core_attr_groups,
>>>> rapl_per_core_attr_update);
>>>> +               if (ret) {
>>>> +                       /*
>>>> +                        * If initialization of per_core PMU
>>>> fails,
>>>> reset per_core
>>>> +                        * flag, and continue with power PMU
>>>> initialization.
>>>> +                        */
>>>> +                       pr_warn("Per-core PMU initialization
>>>> failed
>>>> (%d)\n", ret);
>>>> +                       rapl_model->core_events = 0UL;
>>>> +               }
>>>> +       }
>>>> +
>>>>         /*
>>>>          * Install callbacks. Core will call them for each online
>>>> cpu.
>>>>          */
>>>> @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
>>>>         if (ret)
>>>>                 goto out1;
>>>>  
>>>> +       if (rapl_model->core_events) {
>>>> +               ret = perf_pmu_register(&rapl_pmus_core->pmu,
>>>> "power_per_core", -1);
>>>> +               if (ret) {
>>>> +                       /*
>>>> +                        * If registration of per_core PMU fails,
>>>> cleanup per_core PMU
>>>> +                        * variables, reset the per_core flag and
>>>> keep the
>>>> +                        * power PMU untouched.
>>>> +                        */
>>>> +                       pr_warn("Per-core PMU registration failed
>>>> (%d)\n", ret);
>>>> +                       cleanup_rapl_pmus(rapl_pmus_core);
>>>> +                       rapl_model->core_events = 0UL;
>>>> +               }
>>>> +       }
>>>> +
>>>>         rapl_advertise();
>>>>         return 0;
>>>>  
>>>> @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
>>>>         cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE)
>>>> ;
>>>>         perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>>>>         cleanup_rapl_pmus(rapl_pmus_pkg);
>>>> +       if (rapl_model->core_events) {
>>>> +               perf_pmu_unregister(&rapl_pmus_core->pmu);
>>>> +               cleanup_rapl_pmus(rapl_pmus_core);
>>>> +       }
>>>
>>> we do check rapl_pmus_core before accessing it, but we never check
>>> rapl_pmus_pkg because the previous code assumes it always exists.
>>>
>>> so could there be a problem if some one starts the per-core pmu
>>> when
>>> pkg pmu is unregistered and cleaned up?
>>>
>>> say, in rapl_pmu_event_init(),
>>>
>>> if (event->attr.type == rapl_pmus_pkg->pmu.type ||
>>>    (rapl_pmus_core && event->attr.type == rapl_pmus_core-
>>>> pmu.type))
>>>
>>> this can break because rapl_pmus_pkg is freed, right?
>>
>> Hmm, I think this situation can't arise as whenever the power PMU
>> fails, we 
>> directly go to the failure path and dont setup the per-core PMU(which
>> means 
>> no one will be able to start the per-core PMU), 
>> Please let me know if there is a scenario where this assumption can
>> fail.
> 
> I mean if we do module unload and access power-per-core pmu at the same
> time, could there be a race?
> 
> why not just unregister and cleanup the per-core pmu before the pkg
> pmu?

Yes, better to be safe, will reorder the cleanup.

Thanks,
Dhananjay

>>
> 
> thanks,
> rui
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs
  2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
                   ` (9 preceding siblings ...)
  2024-06-24  5:59 ` [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-06-24  6:39 ` K Prateek Nayak
  10 siblings, 0 replies; 31+ messages in thread
From: K Prateek Nayak @ 2024-06-24  6:39 UTC (permalink / raw)
  To: Dhananjay Ugwekar
  Cc: rui.zhang, mingo, acme, irogers, mark.rutland, jolsa,
	alexander.shishkin, peterz, linux-perf-users, linux-kernel,
	linux-hardening, ananth.narayan, gautham.shenoy, ravi.bangoria,
	sandipan.das, linux-pm, namhyung, adrian.hunter, kan.liang, tglx,
	bp, dave.hansen, x86, kees, gustavoars, oleksandr

Hello Dhananjay,

On 6/24/2024 11:28 AM, Dhananjay Ugwekar wrote:
> Currently the energy-cores event in the power PMU aggregates energy
> consumption data at a package level. On the other hand the core energy
> RAPL counter in AMD CPUs has a core scope (which means the energy
> consumption is recorded separately for each core). Earlier efforts to add
> the core event in the power PMU had failed [1], due to the difference in
> the scope of these two events. Hence, there is a need for a new core scope
> PMU.
> 
> This patchset adds a new "power_per_core" PMU alongside the existing
> "power" PMU, which will be responsible for collecting the new
> "energy-per-core" event.
> 
> Tested the package level and core level PMU counters with workloads
> pinned to different CPUs.
> 
> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> machine:
> 
> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
> 
>   Performance counter stats for 'system wide':
> 
> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/

Tested a bunch of scenarios on my 2P 3rd Generation EPYC server and this
time around I'm seeing the expected behavior. I'll leave some of
scenarios I've tested below:

   $ for i in `seq 0 63`; do taskset -c $i loop & done
   $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 5

   S0-D0-C0              1              10.82 Joules power_per_core/energy-per-core/
   S0-D0-C1              1              10.87 Joules power_per_core/energy-per-core/
   S0-D0-C2              1              10.86 Joules power_per_core/energy-per-core/
   S0-D0-C3              1              10.89 Joules power_per_core/energy-per-core/
   S0-D0-C4              1              10.91 Joules power_per_core/energy-per-core/
   ...
   S0-D0-C63             1              11.03 Joules power_per_core/energy-per-core/
   S1-D1-C0              1               0.19 Joules power_per_core/energy-per-core/
   S1-D1-C1              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C2              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C3              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C4              1               0.00 Joules power_per_core/energy-per-core/
   ...
   S1-D1-C63             1               0.00 Joules power_per_core/energy-per-core/

   $ for i in `seq 64 127`; do taskset -c $i loop & done
   $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 5

   S0-D0-C0              1               0.17 Joules power_per_core/energy-per-core/
   S0-D0-C1              1               0.00 Joules power_per_core/energy-per-core/
   S0-D0-C2              1               0.00 Joules power_per_core/energy-per-core/
   S0-D0-C3              1               0.00 Joules power_per_core/energy-per-core/
   S0-D0-C4              1               0.00 Joules power_per_core/energy-per-core/
   ...
   S0-D0-C63             1               0.01 Joules power_per_core/energy-per-core/
   S1-D1-C0              1              10.51 Joules power_per_core/energy-per-core/
   S1-D1-C1              1              10.50 Joules power_per_core/energy-per-core/
   S1-D1-C2              1              10.52 Joules power_per_core/energy-per-core/
   S1-D1-C3              1              10.51 Joules power_per_core/energy-per-core/
   S1-D1-C4              1              10.51 Joules power_per_core/energy-per-core/
   ...
   S1-D1-C63             1              10.59 Joules power_per_core/energy-per-core/

   $ for i in `seq 0 15`; do taskset -c $i loop & done
   $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 5

   S0-D0-C0              1              11.16 Joules power_per_core/energy-per-core/
   S0-D0-C1              1              11.21 Joules power_per_core/energy-per-core/
   S0-D0-C2              1              11.20 Joules power_per_core/energy-per-core/
   S0-D0-C3              1              11.24 Joules power_per_core/energy-per-core/
   S0-D0-C4              1              11.25 Joules power_per_core/energy-per-core/
   S0-D0-C5              1              11.26 Joules power_per_core/energy-per-core/
   S0-D0-C6              1              11.25 Joules power_per_core/energy-per-core/
   S0-D0-C7              1              11.25 Joules power_per_core/energy-per-core/
   S0-D0-C8              1              11.42 Joules power_per_core/energy-per-core/
   S0-D0-C9              1              11.43 Joules power_per_core/energy-per-core/
   S0-D0-C10             1              11.47 Joules power_per_core/energy-per-core/
   S0-D0-C11             1              11.43 Joules power_per_core/energy-per-core/
   S0-D0-C12             1              11.44 Joules power_per_core/energy-per-core/
   S0-D0-C13             1              11.41 Joules power_per_core/energy-per-core/
   S0-D0-C14             1              11.40 Joules power_per_core/energy-per-core/
   S0-D0-C15             1              11.41 Joules power_per_core/energy-per-core/
   S0-D0-C16             1               0.33 Joules power_per_core/energy-per-core/
   ...
   S0-D0-C63             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C0              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C1              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C2              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C3              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C4              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C5              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C6              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C7              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C8              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C9              1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C10             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C11             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C12             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C13             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C14             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C15             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C16             1               0.00 Joules power_per_core/energy-per-core/
   ...
   S1-D1-C63             1               0.01 Joules power_per_core/energy-per-core/

   $ for i in `seq 0 7` `seq 128 131` `seq 64 71` `seq 192 199`; do taskset -c $i loop & done
   $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 5

   S0-D0-C0              1              18.68 Joules power_per_core/energy-per-core/
   S0-D0-C1              1              18.20 Joules power_per_core/energy-per-core/
   S0-D0-C2              1              18.27 Joules power_per_core/energy-per-core/
   S0-D0-C3              1              18.41 Joules power_per_core/energy-per-core/
   S0-D0-C4              1              16.94 Joules power_per_core/energy-per-core/
   S0-D0-C5              1              16.95 Joules power_per_core/energy-per-core/
   S0-D0-C6              1              16.92 Joules power_per_core/energy-per-core/
   S0-D0-C7              1              16.94 Joules power_per_core/energy-per-core/
   S0-D0-C8              1               0.39 Joules power_per_core/energy-per-core/
   ...
   S0-D0-C63             1               0.00 Joules power_per_core/energy-per-core/
   S1-D1-C0              1              18.59 Joules power_per_core/energy-per-core/
   S1-D1-C1              1              18.39 Joules power_per_core/energy-per-core/
   S1-D1-C2              1              17.50 Joules power_per_core/energy-per-core/
   S1-D1-C3              1              18.29 Joules power_per_core/energy-per-core/
   S1-D1-C4              1              18.58 Joules power_per_core/energy-per-core/
   S1-D1-C5              1              17.62 Joules power_per_core/energy-per-core/
   S1-D1-C6              1              17.75 Joules power_per_core/energy-per-core/
   S1-D1-C7              1              17.53 Joules power_per_core/energy-per-core/
   S1-D1-C8              1               0.00 Joules power_per_core/energy-per-core/
   ...
   S1-D1-C63             1               0.00 Joules power_per_core/energy-per-core/

Unlike last time, each socket is reporting accurate values for all the
scenarios I've tried above.

> 
> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
> 
> This patchset applies cleanly on top of v6.10-rc4 as well as latest
> tip/master.

P.S. I tested this on top of v6.10-rc4 this time around.

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

> 
> v3 changes:
> * Patch 1 added to introduce the logical_core_id which is unique across
>    the system (Prateek)
> * Use the unique topology_logical_core_id() instead of
>    topology_core_id() (which is only unique within a package on tested
>    AMD and Intel systems) in Patch 10
> 
> [..snip..]
> 

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-07-08  1:56 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-24  5:58 [PATCH v3 00/10] Add per-core RAPL energy counter support for AMD CPUs Dhananjay Ugwekar
2024-06-24  5:58 ` [PATCH v3 01/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
2024-06-26 14:14   ` Zhang, Rui
2024-06-24  5:58 ` [PATCH v3 02/10] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs Dhananjay Ugwekar
2024-06-26 14:18   ` Zhang, Rui
2024-06-26 15:34     ` Dhananjay Ugwekar
2024-07-01 12:59   ` Peter Zijlstra
2024-07-02  9:39     ` Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 03/10] perf/x86/rapl: Rename rapl_pmu variables Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 04/10] perf/x86/rapl: Make rapl_model struct global Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 05/10] perf/x86/rapl: Move cpumask variable to rapl_pmus struct Dhananjay Ugwekar
2024-07-01 13:02   ` Peter Zijlstra
2024-07-02 10:16     ` Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 06/10] perf/x86/rapl: Add wrapper for online/offline functions Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 07/10] perf/x86/rapl: Add an argument to the cleanup and init functions Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 08/10] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
2024-07-01 13:08   ` Peter Zijlstra
2024-07-02  2:25     ` Zhang, Rui
2024-07-02 10:20       ` Dhananjay Ugwekar
2024-07-03  4:07         ` Zhang, Rui
2024-07-03  6:31           ` Dhananjay Ugwekar
2024-07-05  2:18             ` Zhang, Rui
2024-07-05  9:24               ` Peter Zijlstra
2024-07-08  1:56                 ` Zhang, Rui
2024-06-24  5:59 ` [PATCH v3 09/10] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
2024-06-24  5:59 ` [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter support for AMD CPUs Dhananjay Ugwekar
2024-06-26 15:18   ` Zhang, Rui
2024-06-26 16:37     ` Dhananjay Ugwekar
2024-06-27  6:49       ` Zhang, Rui
2024-06-27 11:13         ` Dhananjay Ugwekar
2024-06-24  6:39 ` [PATCH v3 00/10] Add per-core RAPL " K Prateek Nayak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).