linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] RAPL driver fixes for AMD CPUs
@ 2024-07-19  9:25 Dhananjay Ugwekar
  2024-07-19  9:25 ` [PATCH 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
  2024-07-19  9:25 ` [PATCH 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
  0 siblings, 2 replies; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-19  9:25 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, rui.zhang
  Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	Dhananjay.Ugwekar

After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
AMD CPUs with 0x80000026 leaf have become CCD(Core Complex Die)-aware and 
this CCD info is reported by the _die_ macros, which earlier used to return 
package info.

e.g: Earlier topology_logical_die_id() and topology_logical_package_id()
used to return the same value for all the CPUs even on a multi-die/multi-CCD
system. But after the above mentioned commit, topology_logical_die_id() 
returns the CCD id and topology_logical_package_id() returns the 
package/socket id.

Hence, the RAPL energy-pkg event which uses _die_ macros has to be
fixed, in light of above change.

Dhananjay Ugwekar (2):
  perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs

 arch/x86/events/rapl.c               | 44 ++++++++++++++++++++++++----
 drivers/powercap/intel_rapl_common.c | 20 +++++++++++--
 2 files changed, 56 insertions(+), 8 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-19  9:25 [PATCH 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
@ 2024-07-19  9:25 ` Dhananjay Ugwekar
  2024-07-19  9:25 ` [PATCH 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
  1 sibling, 0 replies; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-19  9:25 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, rui.zhang
  Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	Dhananjay.Ugwekar

After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_die_cpumask() and topology_logical_die_id() macros, no longer
return the package cpumask and package id, instead they return the CCD
(Core Complex Die) mask and id respectively. This leads to the energy-pkg
event scope to be modified to CCD instead of package.

For more historical context, please refer to commit 32fb480e0a2c
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration were not affected by it. So, all systems(Intel, AMD, Hygon),
worked correctly with topology_logical_die_id() until recently, but this
changed after the "0x80000026 leaf" commit mentioned above.

Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.

On a 12 CCD 1 Package AMD Zen4 Genoa machine:

Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88.

The expected cpumask here is supposed to be just "0", as it is a package
scope event, only one CPU will be collecting the event for all the CPUs in
the package.

After:
$ cat /sys/devices/power/cpumask
0

Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
---
 arch/x86/events/rapl.c | 44 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..6e81bb685f7c 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,18 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
 	.event_str	= str,							\
 };
 
+/*
+ * Intel systems that enumerate DIE domain have RAPL domains implemented
+ * per-die, however, the same is not true for AMD and Hygon processors
+ * where RAPL domains for PKG energy are in-fact per-PKG. Since
+ * logical_die_id is same as logical_package_id in absence of DIE
+ * enumeration, use topology_logical_die_id() on Intel systems and
+ * topology_logical_package_id() on AMD and Hygon systems.
+ */
+#define rapl_pmu_is_pkg_scope()				\
+	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
+	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
 struct rapl_pmu {
 	raw_spinlock_t		lock;
 	int			n_active;
@@ -140,9 +152,25 @@ static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
 
+/*
+ * Helper functions to get the correct topology macros according to the
+ * RAPL PMU scope.
+ */
+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+					 topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+					 topology_die_cpumask(cpu);
+}
+
 static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
 {
-	unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
 
 	/*
 	 * The unsigned check also catches the '-1' return value for non
@@ -543,6 +571,7 @@ static struct perf_msr amd_rapl_msrs[] = {
 
 static int rapl_cpu_offline(unsigned int cpu)
 {
+	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
@@ -552,7 +581,7 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 	pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+	target = cpumask_any_but(rapl_pmu_cpumask, cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
@@ -565,10 +594,12 @@ static int rapl_cpu_offline(unsigned int cpu)
 
 static int rapl_cpu_online(unsigned int cpu)
 {
+	const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
 	if (!pmu) {
+		unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
 		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 		if (!pmu)
 			return -ENOMEM;
@@ -579,14 +610,14 @@ static int rapl_cpu_online(unsigned int cpu)
 		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 		rapl_hrtimer_init(pmu);
 
-		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+		rapl_pmus->pmus[rapl_pmu_idx] = pmu;
 	}
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+	target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
 	if (target < nr_cpu_ids)
 		return 0;
 
@@ -675,7 +706,10 @@ static const struct attribute_group *rapl_attr_update[] = {
 
 static int __init init_rapl_pmus(void)
 {
-	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+	int nr_rapl_pmu = topology_max_packages();
+
+	if (!rapl_pmu_is_pkg_scope())
+		nr_rapl_pmu *= topology_max_dies_per_package();
 
 	rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
 	if (!rapl_pmus)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-19  9:25 [PATCH 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
  2024-07-19  9:25 ` [PATCH 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
@ 2024-07-19  9:25 ` Dhananjay Ugwekar
  2024-07-21 14:17   ` Zhang, Rui
  1 sibling, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-19  9:25 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
	x86, rui.zhang
  Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
	gautham.shenoy, kprateek.nayak, ravi.bangoria, sandipan.das,
	Dhananjay.Ugwekar, Michael Larabel

After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_logical_die_id() macros, no longer returns package id, instead it
returns the CCD (Core Complex Die) id. This leads to the energy-pkg
event scope to be modified to CCD instead of package.

For more historical context, please refer to commit 32fb480e0a2c
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
correctly with topology_logical_die_id() until recently, but this changed
after the "0x80000026 leaf" commit mentioned above.

Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.

On an AMD 2 socket 8 CCD Zen5 server:

Before:

linux$ ls /sys/class/powercap/
intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-rapl:5:0
intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-rapl:d:0
intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-rapl:4:0
intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-rapl:c:0
intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
intel-rapl:f

After:

linux$ ls /sys/class/powercap/
intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1  intel-rapl:1:0

Only one sysfs entry per-event per-package is created after this change.

Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Reported-by: Michael Larabel <michael@michaellarabel.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 3cffa6c79538..2f24ca764408 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -2128,6 +2128,18 @@ void rapl_remove_package(struct rapl_package *rp)
 }
 EXPORT_SYMBOL_GPL(rapl_remove_package);
 
+/*
+ * Intel systems that enumerate DIE domain have RAPL domains implemented
+ * per-die, however, the same is not true for AMD and Hygon processors
+ * where RAPL domains for PKG energy are in-fact per-PKG. Since
+ * logical_die_id is same as logical_package_id in absence of DIE
+ * enumeration, use topology_logical_die_id() on Intel systems and
+ * topology_logical_package_id() on AMD and Hygon systems.
+ */
+#define rapl_pmu_is_pkg_scope()				\
+	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
+	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
 /* caller to ensure CPU hotplug lock is held */
 struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv,
 							 bool id_is_cpu)
@@ -2136,7 +2148,8 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
 	int uid;
 
 	if (id_is_cpu)
-		uid = topology_logical_die_id(id);
+		uid = rapl_pmu_is_pkg_scope() ?
+		      topology_physical_package_id(id) : topology_logical_die_id(id);
 	else
 		uid = id;
 
@@ -2168,9 +2181,10 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
 		return ERR_PTR(-ENOMEM);
 
 	if (id_is_cpu) {
-		rp->id = topology_logical_die_id(id);
+		rp->id = rapl_pmu_is_pkg_scope() ?
+			 topology_physical_package_id(id) : topology_logical_die_id(id);
 		rp->lead_cpu = id;
-		if (topology_max_dies_per_package() > 1)
+		if (!rapl_pmu_is_pkg_scope() && topology_max_dies_per_package() > 1)
 			snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
 				 topology_physical_package_id(id), topology_die_id(id));
 		else
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-19  9:25 ` [PATCH 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
@ 2024-07-21 14:17   ` Zhang, Rui
  2024-07-22  8:24     ` Dhananjay Ugwekar
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Rui @ 2024-07-21 14:17 UTC (permalink / raw)
  To: alexander.shishkin@linux.intel.com, tglx@linutronix.de,
	bp@alien8.de, dave.hansen@linux.intel.com, peterz@infradead.org,
	mark.rutland@arm.com, mingo@redhat.com, Dhananjay.Ugwekar@amd.com,
	acme@kernel.org, namhyung@kernel.org, jolsa@kernel.org,
	kan.liang@linux.intel.com, irogers@google.com, Hunter, Adrian,
	x86@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com

On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
> leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_logical_die_id() macros, no longer returns package id,
> instead it
> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
> event scope to be modified to CCD instead of package.
> 
> For more historical context, please refer to commit 32fb480e0a2c
> ("powercap/intel_rapl: Support multi-die/package"), which initially
> changed
> the RAPL scope from package to die for all systems, as Intel systems
> with Die enumeration have RAPL scope as die, and those without die
> enumeration are not affected. So, all systems(Intel, AMD, Hygon),
> worked
> correctly with topology_logical_die_id() until recently, but this
> changed
> after the "0x80000026 leaf" commit mentioned above.
> 
> Replacing topology_logical_die_id() with
> topology_physical_package_id()
> conditionally only for AMD and Hygon fixes the energy-pkg event.
> 
> On an AMD 2 socket 8 CCD Zen5 server:
> 
> Before:
> 
> linux$ ls /sys/class/powercap/
> intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-rapl:5:0
> intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-rapl:d:0
> intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
> intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
> intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-rapl:4:0
> intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-rapl:c:0
> intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
> intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
> intel-rapl:f
> 
> After:
> 
> linux$ ls /sys/class/powercap/
> intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1  intel-
> rapl:1:0
> 
> Only one sysfs entry per-event per-package is created after this
> change.
> 
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
> 0x80000026 leaf")
> Reported-by: Michael Larabel <michael@michaellarabel.com>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>

For the future Intel multi-die system that I know, it still has
package-scope RAPL, but this is done with TPMI RAPL interface.

The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
"id_is_cpu == false", so no need to make rapl_pmu_is_pkg_scope()
returns true for those Intel systems.

The patch LGTM.

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui
> ---
>  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/powercap/intel_rapl_common.c
> b/drivers/powercap/intel_rapl_common.c
> index 3cffa6c79538..2f24ca764408 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct rapl_package
> *rp)
>  }
>  EXPORT_SYMBOL_GPL(rapl_remove_package);
>  
> +/*
> + * Intel systems that enumerate DIE domain have RAPL domains
> implemented
> + * per-die, however, the same is not true for AMD and Hygon
> processors
> + * where RAPL domains for PKG energy are in-fact per-PKG. Since
> + * logical_die_id is same as logical_package_id in absence of DIE
> + * enumeration, use topology_logical_die_id() on Intel systems and
> + * topology_logical_package_id() on AMD and Hygon systems.
> + */
> +#define rapl_pmu_is_pkg_scope()                                \
> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
>  /* caller to ensure CPU hotplug lock is held */
>  struct rapl_package *rapl_find_package_domain_cpuslocked(int id,
> struct rapl_if_priv *priv,
>                                                          bool
> id_is_cpu)
> @@ -2136,7 +2148,8 @@ struct rapl_package
> *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
>         int uid;
>  
>         if (id_is_cpu)
> -               uid = topology_logical_die_id(id);
> +               uid = rapl_pmu_is_pkg_scope() ?
> +                     topology_physical_package_id(id) :
> topology_logical_die_id(id);
>         else
>                 uid = id;
>  
> @@ -2168,9 +2181,10 @@ struct rapl_package
> *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
>                 return ERR_PTR(-ENOMEM);
>  
>         if (id_is_cpu) {
> -               rp->id = topology_logical_die_id(id);
> +               rp->id = rapl_pmu_is_pkg_scope() ?
> +                        topology_physical_package_id(id) :
> topology_logical_die_id(id);
>                 rp->lead_cpu = id;
> -               if (topology_max_dies_per_package() > 1)
> +               if (!rapl_pmu_is_pkg_scope() &&
> topology_max_dies_per_package() > 1)
>                         snprintf(rp->name,
> PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
>                                  topology_physical_package_id(id),
> topology_die_id(id));
>                 else


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-21 14:17   ` Zhang, Rui
@ 2024-07-22  8:24     ` Dhananjay Ugwekar
  2024-07-22 13:52       ` Zhang, Rui
  0 siblings, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-22  8:24 UTC (permalink / raw)
  To: Zhang, Rui, alexander.shishkin@linux.intel.com,
	tglx@linutronix.de, bp@alien8.de, dave.hansen@linux.intel.com,
	peterz@infradead.org, mark.rutland@arm.com, mingo@redhat.com,
	acme@kernel.org, namhyung@kernel.org, jolsa@kernel.org,
	kan.liang@linux.intel.com, irogers@google.com, Hunter, Adrian,
	x86@kernel.org
  Cc: ravi.bangoria@amd.com, kprateek.nayak@amd.com,
	gautham.shenoy@amd.com, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com

Hi Rui,

On 7/21/2024 7:47 PM, Zhang, Rui wrote:
> On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
>> leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_logical_die_id() macros, no longer returns package id,
>> instead it
>> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> For more historical context, please refer to commit 32fb480e0a2c
>> ("powercap/intel_rapl: Support multi-die/package"), which initially
>> changed
>> the RAPL scope from package to die for all systems, as Intel systems
>> with Die enumeration have RAPL scope as die, and those without die
>> enumeration are not affected. So, all systems(Intel, AMD, Hygon),
>> worked
>> correctly with topology_logical_die_id() until recently, but this
>> changed
>> after the "0x80000026 leaf" commit mentioned above.
>>
>> Replacing topology_logical_die_id() with
>> topology_physical_package_id()
>> conditionally only for AMD and Hygon fixes the energy-pkg event.
>>
>> On an AMD 2 socket 8 CCD Zen5 server:
>>
>> Before:
>>
>> linux$ ls /sys/class/powercap/
>> intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-rapl:5:0
>> intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-rapl:d:0
>> intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
>> intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
>> intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-rapl:4:0
>> intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-rapl:c:0
>> intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
>> intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
>> intel-rapl:f
>>
>> After:
>>
>> linux$ ls /sys/class/powercap/
>> intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1  intel-
>> rapl:1:0
>>
>> Only one sysfs entry per-event per-package is created after this
>> change.
>>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
>> 0x80000026 leaf")
>> Reported-by: Michael Larabel <michael@michaellarabel.com>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> 
> For the future Intel multi-die system that I know, it still has
> package-scope RAPL, but this is done with TPMI RAPL interface.
> 
> The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
> "id_is_cpu == false", so no need to make rapl_pmu_is_pkg_scope()
> returns true for those Intel systems.

This seems like an important point, would you be okay with it, if I include
this info in the commit log in v2 along with you rb tag?

Thanks for the review.

Regards,
Dhananjay

> 
> The patch LGTM.
> 
> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> 
> thanks,
> rui
>> ---
>>  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/powercap/intel_rapl_common.c
>> b/drivers/powercap/intel_rapl_common.c
>> index 3cffa6c79538..2f24ca764408 100644
>> --- a/drivers/powercap/intel_rapl_common.c
>> +++ b/drivers/powercap/intel_rapl_common.c
>> @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct rapl_package
>> *rp)
>>  }
>>  EXPORT_SYMBOL_GPL(rapl_remove_package);
>>  
>> +/*
>> + * Intel systems that enumerate DIE domain have RAPL domains
>> implemented
>> + * per-die, however, the same is not true for AMD and Hygon
>> processors
>> + * where RAPL domains for PKG energy are in-fact per-PKG. Since
>> + * logical_die_id is same as logical_package_id in absence of DIE
>> + * enumeration, use topology_logical_die_id() on Intel systems and
>> + * topology_logical_package_id() on AMD and Hygon systems.
>> + */
>> +#define rapl_pmu_is_pkg_scope()                                \
>> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
>> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>>  /* caller to ensure CPU hotplug lock is held */
>>  struct rapl_package *rapl_find_package_domain_cpuslocked(int id,
>> struct rapl_if_priv *priv,
>>                                                          bool
>> id_is_cpu)
>> @@ -2136,7 +2148,8 @@ struct rapl_package
>> *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
>>         int uid;
>>  
>>         if (id_is_cpu)
>> -               uid = topology_logical_die_id(id);
>> +               uid = rapl_pmu_is_pkg_scope() ?
>> +                     topology_physical_package_id(id) :
>> topology_logical_die_id(id);
>>         else
>>                 uid = id;
>>  
>> @@ -2168,9 +2181,10 @@ struct rapl_package
>> *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
>>                 return ERR_PTR(-ENOMEM);
>>  
>>         if (id_is_cpu) {
>> -               rp->id = topology_logical_die_id(id);
>> +               rp->id = rapl_pmu_is_pkg_scope() ?
>> +                        topology_physical_package_id(id) :
>> topology_logical_die_id(id);
>>                 rp->lead_cpu = id;
>> -               if (topology_max_dies_per_package() > 1)
>> +               if (!rapl_pmu_is_pkg_scope() &&
>> topology_max_dies_per_package() > 1)
>>                         snprintf(rp->name,
>> PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
>>                                  topology_physical_package_id(id),
>> topology_die_id(id));
>>                 else
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-22  8:24     ` Dhananjay Ugwekar
@ 2024-07-22 13:52       ` Zhang, Rui
  2024-07-22 14:01         ` Dhananjay Ugwekar
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Rui @ 2024-07-22 13:52 UTC (permalink / raw)
  To: irogers@google.com, alexander.shishkin@linux.intel.com,
	x86@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de,
	peterz@infradead.org, mark.rutland@arm.com, mingo@redhat.com,
	tglx@linutronix.de, Dhananjay.Ugwekar@amd.com,
	namhyung@kernel.org, jolsa@kernel.org, acme@kernel.org,
	kan.liang@linux.intel.com, Hunter, Adrian
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com

On Mon, 2024-07-22 at 13:54 +0530, Dhananjay Ugwekar wrote:
> Hi Rui,
> 
> On 7/21/2024 7:47 PM, Zhang, Rui wrote:
> > On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
> > > After commit ("x86/cpu/topology: Add support for the AMD
> > > 0x80000026
> > > leaf"),
> > > on AMD processors that support extended CPUID leaf 0x80000026,
> > > the
> > > topology_logical_die_id() macros, no longer returns package id,
> > > instead it
> > > returns the CCD (Core Complex Die) id. This leads to the energy-
> > > pkg
> > > event scope to be modified to CCD instead of package.
> > > 
> > > For more historical context, please refer to commit 32fb480e0a2c
> > > ("powercap/intel_rapl: Support multi-die/package"), which
> > > initially
> > > changed
> > > the RAPL scope from package to die for all systems, as Intel
> > > systems
> > > with Die enumeration have RAPL scope as die, and those without
> > > die
> > > enumeration are not affected. So, all systems(Intel, AMD, Hygon),
> > > worked
> > > correctly with topology_logical_die_id() until recently, but this
> > > changed
> > > after the "0x80000026 leaf" commit mentioned above.
> > > 
> > > Replacing topology_logical_die_id() with
> > > topology_physical_package_id()
> > > conditionally only for AMD and Hygon fixes the energy-pkg event.
> > > 
> > > On an AMD 2 socket 8 CCD Zen5 server:
> > > 
> > > Before:
> > > 
> > > linux$ ls /sys/class/powercap/
> > > intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-rapl:5:0
> > > intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-rapl:d:0
> > > intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
> > > intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
> > > intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-rapl:4:0
> > > intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-rapl:c:0
> > > intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
> > > intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
> > > intel-rapl:f
> > > 
> > > After:
> > > 
> > > linux$ ls /sys/class/powercap/
> > > intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1  intel-
> > > rapl:1:0
> > > 
> > > Only one sysfs entry per-event per-package is created after this
> > > change.
> > > 
> > > Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
> > > 0x80000026 leaf")
> > > Reported-by: Michael Larabel <michael@michaellarabel.com>
> > > Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> > 
> > For the future Intel multi-die system that I know, it still has
> > package-scope RAPL, but this is done with TPMI RAPL interface.
> > 
> > The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
> > "id_is_cpu == false", so no need to make rapl_pmu_is_pkg_scope()
> > returns true for those Intel systems.
> 
> This seems like an important point, would you be okay with it, if I
> include
> this info in the commit log in v2 along with you rb tag?

Yes.

This reminds me that we can rephrase the comment for
rapl_pmu_is_pkg_scope() a bit, something including below points,
1. AMD/HYGON platforms use per-PKG Package energy counter
2. For Intel platforms
   2.1 CLX-AP platform has per-DIE Package energy counter
   2.2 other platforms that uses MSR RAPL are single die systems so the
Package energy counter are per-PKG/per-DIE
   2.3 new platforms that use TPMI RAPL doesn't care about the scope
because they are not MSR/CPU based.

what do you think?

thanks,
rui
> 
> Thanks for the review.
> 
> Regards,
> Dhananjay
> 
> > 
> > The patch LGTM.
> > 
> > Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> > 
> > thanks,
> > rui
> > > ---
> > >  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++---
> > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/powercap/intel_rapl_common.c
> > > b/drivers/powercap/intel_rapl_common.c
> > > index 3cffa6c79538..2f24ca764408 100644
> > > --- a/drivers/powercap/intel_rapl_common.c
> > > +++ b/drivers/powercap/intel_rapl_common.c
> > > @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct
> > > rapl_package
> > > *rp)
> > >  }
> > >  EXPORT_SYMBOL_GPL(rapl_remove_package);
> > >  
> > > +/*
> > > + * Intel systems that enumerate DIE domain have RAPL domains
> > > implemented
> > > + * per-die, however, the same is not true for AMD and Hygon
> > > processors
> > > + * where RAPL domains for PKG energy are in-fact per-PKG. Since
> > > + * logical_die_id is same as logical_package_id in absence of
> > > DIE
> > > + * enumeration, use topology_logical_die_id() on Intel systems
> > > and
> > > + * topology_logical_package_id() on AMD and Hygon systems.
> > > + */
> > > +#define rapl_pmu_is_pkg_scope()                                \
> > > +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
> > > +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> > > +
> > >  /* caller to ensure CPU hotplug lock is held */
> > >  struct rapl_package *rapl_find_package_domain_cpuslocked(int id,
> > > struct rapl_if_priv *priv,
> > >                                                          bool
> > > id_is_cpu)
> > > @@ -2136,7 +2148,8 @@ struct rapl_package
> > > *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
> > >         int uid;
> > >  
> > >         if (id_is_cpu)
> > > -               uid = topology_logical_die_id(id);
> > > +               uid = rapl_pmu_is_pkg_scope() ?
> > > +                     topology_physical_package_id(id) :
> > > topology_logical_die_id(id);
> > >         else
> > >                 uid = id;
> > >  
> > > @@ -2168,9 +2181,10 @@ struct rapl_package
> > > *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
> > >                 return ERR_PTR(-ENOMEM);
> > >  
> > >         if (id_is_cpu) {
> > > -               rp->id = topology_logical_die_id(id);
> > > +               rp->id = rapl_pmu_is_pkg_scope() ?
> > > +                        topology_physical_package_id(id) :
> > > topology_logical_die_id(id);
> > >                 rp->lead_cpu = id;
> > > -               if (topology_max_dies_per_package() > 1)
> > > +               if (!rapl_pmu_is_pkg_scope() &&
> > > topology_max_dies_per_package() > 1)
> > >                         snprintf(rp->name,
> > > PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
> > >                                 
> > > topology_physical_package_id(id),
> > > topology_die_id(id));
> > >                 else
> > 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-22 13:52       ` Zhang, Rui
@ 2024-07-22 14:01         ` Dhananjay Ugwekar
  2024-07-22 15:21           ` Zhang, Rui
  0 siblings, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-22 14:01 UTC (permalink / raw)
  To: Zhang, Rui, irogers@google.com,
	alexander.shishkin@linux.intel.com, x86@kernel.org,
	dave.hansen@linux.intel.com, bp@alien8.de, peterz@infradead.org,
	mark.rutland@arm.com, mingo@redhat.com, tglx@linutronix.de,
	namhyung@kernel.org, jolsa@kernel.org, acme@kernel.org,
	kan.liang@linux.intel.com, Hunter, Adrian
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com



On 7/22/2024 7:22 PM, Zhang, Rui wrote:
> On Mon, 2024-07-22 at 13:54 +0530, Dhananjay Ugwekar wrote:
>> Hi Rui,
>>
>> On 7/21/2024 7:47 PM, Zhang, Rui wrote:
>>> On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
>>>> After commit ("x86/cpu/topology: Add support for the AMD
>>>> 0x80000026
>>>> leaf"),
>>>> on AMD processors that support extended CPUID leaf 0x80000026,
>>>> the
>>>> topology_logical_die_id() macros, no longer returns package id,
>>>> instead it
>>>> returns the CCD (Core Complex Die) id. This leads to the energy-
>>>> pkg
>>>> event scope to be modified to CCD instead of package.
>>>>
>>>> For more historical context, please refer to commit 32fb480e0a2c
>>>> ("powercap/intel_rapl: Support multi-die/package"), which
>>>> initially
>>>> changed
>>>> the RAPL scope from package to die for all systems, as Intel
>>>> systems
>>>> with Die enumeration have RAPL scope as die, and those without
>>>> die
>>>> enumeration are not affected. So, all systems(Intel, AMD, Hygon),
>>>> worked
>>>> correctly with topology_logical_die_id() until recently, but this
>>>> changed
>>>> after the "0x80000026 leaf" commit mentioned above.
>>>>
>>>> Replacing topology_logical_die_id() with
>>>> topology_physical_package_id()
>>>> conditionally only for AMD and Hygon fixes the energy-pkg event.
>>>>
>>>> On an AMD 2 socket 8 CCD Zen5 server:
>>>>
>>>> Before:
>>>>
>>>> linux$ ls /sys/class/powercap/
>>>> intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-rapl:5:0
>>>> intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-rapl:d:0
>>>> intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
>>>> intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
>>>> intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-rapl:4:0
>>>> intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-rapl:c:0
>>>> intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
>>>> intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
>>>> intel-rapl:f
>>>>
>>>> After:
>>>>
>>>> linux$ ls /sys/class/powercap/
>>>> intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1  intel-
>>>> rapl:1:0
>>>>
>>>> Only one sysfs entry per-event per-package is created after this
>>>> change.
>>>>
>>>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
>>>> 0x80000026 leaf")
>>>> Reported-by: Michael Larabel <michael@michaellarabel.com>
>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>>>
>>> For the future Intel multi-die system that I know, it still has
>>> package-scope RAPL, but this is done with TPMI RAPL interface.
>>>
>>> The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
>>> "id_is_cpu == false", so no need to make rapl_pmu_is_pkg_scope()
>>> returns true for those Intel systems.
>>
>> This seems like an important point, would you be okay with it, if I
>> include
>> this info in the commit log in v2 along with you rb tag?
> 
> Yes.
> 
> This reminds me that we can rephrase the comment for
> rapl_pmu_is_pkg_scope() a bit, something including below points,
> 1. AMD/HYGON platforms use per-PKG Package energy counter
> 2. For Intel platforms
>    2.1 CLX-AP platform has per-DIE Package energy counter
>    2.2 other platforms that uses MSR RAPL are single die systems so the
> Package energy counter are per-PKG/per-DIE
>    2.3 new platforms that use TPMI RAPL doesn't care about the scope
> because they are not MSR/CPU based.
> 
> what do you think?

Agreed, this gives a more clear picture of the all the RAPL scopes.

We will need the above comment in the first patch as well, apart from the 2.3 point.

Also, regarding perf/x86/rapl driver(patch 1), will you be sending a patch
to conditionally set the rapl scope to die for CLK-AP platform(on top of this fix),
or should I fix it in this patch 1 itself?

Thanks,
Dhananjay

> 
> thanks,
> rui
>>
>> Thanks for the review.
>>
>> Regards,
>> Dhananjay
>>
>>>
>>> The patch LGTM.
>>>
>>> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
>>>
>>> thanks,
>>> rui
>>>> ---
>>>>  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++---
>>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/powercap/intel_rapl_common.c
>>>> b/drivers/powercap/intel_rapl_common.c
>>>> index 3cffa6c79538..2f24ca764408 100644
>>>> --- a/drivers/powercap/intel_rapl_common.c
>>>> +++ b/drivers/powercap/intel_rapl_common.c
>>>> @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct
>>>> rapl_package
>>>> *rp)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(rapl_remove_package);
>>>>  
>>>> +/*
>>>> + * Intel systems that enumerate DIE domain have RAPL domains
>>>> implemented
>>>> + * per-die, however, the same is not true for AMD and Hygon
>>>> processors
>>>> + * where RAPL domains for PKG energy are in-fact per-PKG. Since
>>>> + * logical_die_id is same as logical_package_id in absence of
>>>> DIE
>>>> + * enumeration, use topology_logical_die_id() on Intel systems
>>>> and
>>>> + * topology_logical_package_id() on AMD and Hygon systems.
>>>> + */
>>>> +#define rapl_pmu_is_pkg_scope()                                \
>>>> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
>>>> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>>>> +
>>>>  /* caller to ensure CPU hotplug lock is held */
>>>>  struct rapl_package *rapl_find_package_domain_cpuslocked(int id,
>>>> struct rapl_if_priv *priv,
>>>>                                                          bool
>>>> id_is_cpu)
>>>> @@ -2136,7 +2148,8 @@ struct rapl_package
>>>> *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
>>>>         int uid;
>>>>  
>>>>         if (id_is_cpu)
>>>> -               uid = topology_logical_die_id(id);
>>>> +               uid = rapl_pmu_is_pkg_scope() ?
>>>> +                     topology_physical_package_id(id) :
>>>> topology_logical_die_id(id);
>>>>         else
>>>>                 uid = id;
>>>>  
>>>> @@ -2168,9 +2181,10 @@ struct rapl_package
>>>> *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
>>>>                 return ERR_PTR(-ENOMEM);
>>>>  
>>>>         if (id_is_cpu) {
>>>> -               rp->id = topology_logical_die_id(id);
>>>> +               rp->id = rapl_pmu_is_pkg_scope() ?
>>>> +                        topology_physical_package_id(id) :
>>>> topology_logical_die_id(id);
>>>>                 rp->lead_cpu = id;
>>>> -               if (topology_max_dies_per_package() > 1)
>>>> +               if (!rapl_pmu_is_pkg_scope() &&
>>>> topology_max_dies_per_package() > 1)
>>>>                         snprintf(rp->name,
>>>> PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
>>>>                                 
>>>> topology_physical_package_id(id),
>>>> topology_die_id(id));
>>>>                 else
>>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-22 14:01         ` Dhananjay Ugwekar
@ 2024-07-22 15:21           ` Zhang, Rui
  2024-07-23  4:12             ` Dhananjay Ugwekar
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Rui @ 2024-07-22 15:21 UTC (permalink / raw)
  To: acme@kernel.org, alexander.shishkin@linux.intel.com,
	Hunter, Adrian, dave.hansen@linux.intel.com, bp@alien8.de,
	mark.rutland@arm.com, peterz@infradead.org, mingo@redhat.com,
	Dhananjay.Ugwekar@amd.com, x86@kernel.org, irogers@google.com,
	tglx@linutronix.de, jolsa@kernel.org, namhyung@kernel.org,
	kan.liang@linux.intel.com
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com

On Mon, 2024-07-22 at 19:31 +0530, Dhananjay Ugwekar wrote:
> 
> 
> On 7/22/2024 7:22 PM, Zhang, Rui wrote:
> > On Mon, 2024-07-22 at 13:54 +0530, Dhananjay Ugwekar wrote:
> > > Hi Rui,
> > > 
> > > On 7/21/2024 7:47 PM, Zhang, Rui wrote:
> > > > On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
> > > > > After commit ("x86/cpu/topology: Add support for the AMD
> > > > > 0x80000026
> > > > > leaf"),
> > > > > on AMD processors that support extended CPUID leaf
> > > > > 0x80000026,
> > > > > the
> > > > > topology_logical_die_id() macros, no longer returns package
> > > > > id,
> > > > > instead it
> > > > > returns the CCD (Core Complex Die) id. This leads to the
> > > > > energy-
> > > > > pkg
> > > > > event scope to be modified to CCD instead of package.
> > > > > 
> > > > > For more historical context, please refer to commit
> > > > > 32fb480e0a2c
> > > > > ("powercap/intel_rapl: Support multi-die/package"), which
> > > > > initially
> > > > > changed
> > > > > the RAPL scope from package to die for all systems, as Intel
> > > > > systems
> > > > > with Die enumeration have RAPL scope as die, and those
> > > > > without
> > > > > die
> > > > > enumeration are not affected. So, all systems(Intel, AMD,
> > > > > Hygon),
> > > > > worked
> > > > > correctly with topology_logical_die_id() until recently, but
> > > > > this
> > > > > changed
> > > > > after the "0x80000026 leaf" commit mentioned above.
> > > > > 
> > > > > Replacing topology_logical_die_id() with
> > > > > topology_physical_package_id()
> > > > > conditionally only for AMD and Hygon fixes the energy-pkg
> > > > > event.
> > > > > 
> > > > > On an AMD 2 socket 8 CCD Zen5 server:
> > > > > 
> > > > > Before:
> > > > > 
> > > > > linux$ ls /sys/class/powercap/
> > > > > intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-
> > > > > rapl:5:0
> > > > > intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-
> > > > > rapl:d:0
> > > > > intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
> > > > > intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
> > > > > intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-
> > > > > rapl:4:0
> > > > > intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-
> > > > > rapl:c:0
> > > > > intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
> > > > > intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
> > > > > intel-rapl:f
> > > > > 
> > > > > After:
> > > > > 
> > > > > linux$ ls /sys/class/powercap/
> > > > > intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1 
> > > > > intel-
> > > > > rapl:1:0
> > > > > 
> > > > > Only one sysfs entry per-event per-package is created after
> > > > > this
> > > > > change.
> > > > > 
> > > > > Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the
> > > > > AMD
> > > > > 0x80000026 leaf")
> > > > > Reported-by: Michael Larabel <michael@michaellarabel.com>
> > > > > Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> > > > 
> > > > For the future Intel multi-die system that I know, it still has
> > > > package-scope RAPL, but this is done with TPMI RAPL interface.
> > > > 
> > > > The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
> > > > "id_is_cpu == false", so no need to make
> > > > rapl_pmu_is_pkg_scope()
> > > > returns true for those Intel systems.
> > > 
> > > This seems like an important point, would you be okay with it, if
> > > I
> > > include
> > > this info in the commit log in v2 along with you rb tag?
> > 
> > Yes.
> > 
> > This reminds me that we can rephrase the comment for
> > rapl_pmu_is_pkg_scope() a bit, something including below points,
> > 1. AMD/HYGON platforms use per-PKG Package energy counter
> > 2. For Intel platforms
> >    2.1 CLX-AP platform has per-DIE Package energy counter
> >    2.2 other platforms that uses MSR RAPL are single die systems so
> > the
> > Package energy counter are per-PKG/per-DIE
> >    2.3 new platforms that use TPMI RAPL doesn't care about the
> > scope
> > because they are not MSR/CPU based.
> > 
> > what do you think?
> 
> Agreed, this gives a more clear picture of the all the RAPL scopes.
> 
> We will need the above comment in the first patch as well, apart from
> the 2.3 point.

Sounds good to me.
> 
> Also, regarding perf/x86/rapl driver(patch 1), will you be sending a
> patch
> to conditionally set the rapl scope to die for CLK-AP platform(on top
> of this fix),
> or should I fix it in this patch 1 itself?

patch 1 is a fix patch.
optimization for CLX-AP should be a separate patch and that is not
urgent (the new logic is still correct for current existing Intel
platforms), I will submit it later.

I think the fix patch is good enough as long as we have below
information
1. CLX-AP is multi-die and its RAPL MSRs are die scope
2. other Intel platforms are single die systems so the scope can be
considered as either pkg-scope or die-scope.
This info will make the future optimization easier.

thanks,
rui

> 
> Thanks,
> Dhananjay
> 
> > 
> > thanks,
> > rui
> > > 
> > > Thanks for the review.
> > > 
> > > Regards,
> > > Dhananjay
> > > 
> > > > 
> > > > The patch LGTM.
> > > > 
> > > > Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> > > > 
> > > > thanks,
> > > > rui
> > > > > ---
> > > > >  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++-
> > > > > --
> > > > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/powercap/intel_rapl_common.c
> > > > > b/drivers/powercap/intel_rapl_common.c
> > > > > index 3cffa6c79538..2f24ca764408 100644
> > > > > --- a/drivers/powercap/intel_rapl_common.c
> > > > > +++ b/drivers/powercap/intel_rapl_common.c
> > > > > @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct
> > > > > rapl_package
> > > > > *rp)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(rapl_remove_package);
> > > > >  
> > > > > +/*
> > > > > + * Intel systems that enumerate DIE domain have RAPL domains
> > > > > implemented
> > > > > + * per-die, however, the same is not true for AMD and Hygon
> > > > > processors
> > > > > + * where RAPL domains for PKG energy are in-fact per-PKG.
> > > > > Since
> > > > > + * logical_die_id is same as logical_package_id in absence
> > > > > of
> > > > > DIE
> > > > > + * enumeration, use topology_logical_die_id() on Intel
> > > > > systems
> > > > > and
> > > > > + * topology_logical_package_id() on AMD and Hygon systems.
> > > > > + */
> > > > > +#define
> > > > > rapl_pmu_is_pkg_scope()                                \
> > > > > +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
> > > > > +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> > > > > +
> > > > >  /* caller to ensure CPU hotplug lock is held */
> > > > >  struct rapl_package *rapl_find_package_domain_cpuslocked(int
> > > > > id,
> > > > > struct rapl_if_priv *priv,
> > > > >                                                          bool
> > > > > id_is_cpu)
> > > > > @@ -2136,7 +2148,8 @@ struct rapl_package
> > > > > *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
> > > > >         int uid;
> > > > >  
> > > > >         if (id_is_cpu)
> > > > > -               uid = topology_logical_die_id(id);
> > > > > +               uid = rapl_pmu_is_pkg_scope() ?
> > > > > +                     topology_physical_package_id(id) :
> > > > > topology_logical_die_id(id);
> > > > >         else
> > > > >                 uid = id;
> > > > >  
> > > > > @@ -2168,9 +2181,10 @@ struct rapl_package
> > > > > *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
> > > > >                 return ERR_PTR(-ENOMEM);
> > > > >  
> > > > >         if (id_is_cpu) {
> > > > > -               rp->id = topology_logical_die_id(id);
> > > > > +               rp->id = rapl_pmu_is_pkg_scope() ?
> > > > > +                        topology_physical_package_id(id) :
> > > > > topology_logical_die_id(id);
> > > > >                 rp->lead_cpu = id;
> > > > > -               if (topology_max_dies_per_package() > 1)
> > > > > +               if (!rapl_pmu_is_pkg_scope() &&
> > > > > topology_max_dies_per_package() > 1)
> > > > >                         snprintf(rp->name,
> > > > > PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
> > > > >                                 
> > > > > topology_physical_package_id(id),
> > > > > topology_die_id(id));
> > > > >                 else
> > > > 
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
  2024-07-22 15:21           ` Zhang, Rui
@ 2024-07-23  4:12             ` Dhananjay Ugwekar
  0 siblings, 0 replies; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-23  4:12 UTC (permalink / raw)
  To: Zhang, Rui, acme@kernel.org, alexander.shishkin@linux.intel.com,
	Hunter, Adrian, dave.hansen@linux.intel.com, bp@alien8.de,
	mark.rutland@arm.com, peterz@infradead.org, mingo@redhat.com,
	x86@kernel.org, irogers@google.com, tglx@linutronix.de,
	jolsa@kernel.org, namhyung@kernel.org, kan.liang@linux.intel.com
  Cc: ravi.bangoria@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, Larabel, Michael,
	sandipan.das@amd.com, linux-pm@vger.kernel.org,
	ananth.narayan@amd.com



On 7/22/2024 8:51 PM, Zhang, Rui wrote:
> On Mon, 2024-07-22 at 19:31 +0530, Dhananjay Ugwekar wrote:
>>
>>
>> On 7/22/2024 7:22 PM, Zhang, Rui wrote:
>>> On Mon, 2024-07-22 at 13:54 +0530, Dhananjay Ugwekar wrote:
>>>> Hi Rui,
>>>>
>>>> On 7/21/2024 7:47 PM, Zhang, Rui wrote:
>>>>> On Fri, 2024-07-19 at 09:25 +0000, Dhananjay Ugwekar wrote:
>>>>>> After commit ("x86/cpu/topology: Add support for the AMD
>>>>>> 0x80000026
>>>>>> leaf"),
>>>>>> on AMD processors that support extended CPUID leaf
>>>>>> 0x80000026,
>>>>>> the
>>>>>> topology_logical_die_id() macros, no longer returns package
>>>>>> id,
>>>>>> instead it
>>>>>> returns the CCD (Core Complex Die) id. This leads to the
>>>>>> energy-
>>>>>> pkg
>>>>>> event scope to be modified to CCD instead of package.
>>>>>>
>>>>>> For more historical context, please refer to commit
>>>>>> 32fb480e0a2c
>>>>>> ("powercap/intel_rapl: Support multi-die/package"), which
>>>>>> initially
>>>>>> changed
>>>>>> the RAPL scope from package to die for all systems, as Intel
>>>>>> systems
>>>>>> with Die enumeration have RAPL scope as die, and those
>>>>>> without
>>>>>> die
>>>>>> enumeration are not affected. So, all systems(Intel, AMD,
>>>>>> Hygon),
>>>>>> worked
>>>>>> correctly with topology_logical_die_id() until recently, but
>>>>>> this
>>>>>> changed
>>>>>> after the "0x80000026 leaf" commit mentioned above.
>>>>>>
>>>>>> Replacing topology_logical_die_id() with
>>>>>> topology_physical_package_id()
>>>>>> conditionally only for AMD and Hygon fixes the energy-pkg
>>>>>> event.
>>>>>>
>>>>>> On an AMD 2 socket 8 CCD Zen5 server:
>>>>>>
>>>>>> Before:
>>>>>>
>>>>>> linux$ ls /sys/class/powercap/
>>>>>> intel-rapl      intel-rapl:1:0  intel-rapl:3:0  intel-
>>>>>> rapl:5:0
>>>>>> intel-rapl:7:0  intel-rapl:9:0  intel-rapl:b:0  intel-
>>>>>> rapl:d:0
>>>>>> intel-rapl:f:0  intel-rapl:0    intel-rapl:2    intel-rapl:4
>>>>>> intel-rapl:6    intel-rapl:8    intel-rapl:a    intel-rapl:c
>>>>>> intel-rapl:e    intel-rapl:0:0  intel-rapl:2:0  intel-
>>>>>> rapl:4:0
>>>>>> intel-rapl:6:0  intel-rapl:8:0  intel-rapl:a:0  intel-
>>>>>> rapl:c:0
>>>>>> intel-rapl:e:0  intel-rapl:1    intel-rapl:3    intel-rapl:5
>>>>>> intel-rapl:7    intel-rapl:9    intel-rapl:b    intel-rapl:d
>>>>>> intel-rapl:f
>>>>>>
>>>>>> After:
>>>>>>
>>>>>> linux$ ls /sys/class/powercap/
>>>>>> intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:1 
>>>>>> intel-
>>>>>> rapl:1:0
>>>>>>
>>>>>> Only one sysfs entry per-event per-package is created after
>>>>>> this
>>>>>> change.
>>>>>>
>>>>>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the
>>>>>> AMD
>>>>>> 0x80000026 leaf")
>>>>>> Reported-by: Michael Larabel <michael@michaellarabel.com>
>>>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>>>>>
>>>>> For the future Intel multi-die system that I know, it still has
>>>>> package-scope RAPL, but this is done with TPMI RAPL interface.
>>>>>
>>>>> The TPMI RAPL driver invokes these APIs with "id == pkg_id" and
>>>>> "id_is_cpu == false", so no need to make
>>>>> rapl_pmu_is_pkg_scope()
>>>>> returns true for those Intel systems.
>>>>
>>>> This seems like an important point, would you be okay with it, if
>>>> I
>>>> include
>>>> this info in the commit log in v2 along with you rb tag?
>>>
>>> Yes.
>>>
>>> This reminds me that we can rephrase the comment for
>>> rapl_pmu_is_pkg_scope() a bit, something including below points,
>>> 1. AMD/HYGON platforms use per-PKG Package energy counter
>>> 2. For Intel platforms
>>>    2.1 CLX-AP platform has per-DIE Package energy counter
>>>    2.2 other platforms that uses MSR RAPL are single die systems so
>>> the
>>> Package energy counter are per-PKG/per-DIE
>>>    2.3 new platforms that use TPMI RAPL doesn't care about the
>>> scope
>>> because they are not MSR/CPU based.
>>>
>>> what do you think?
>>
>> Agreed, this gives a more clear picture of the all the RAPL scopes.
>>
>> We will need the above comment in the first patch as well, apart from
>> the 2.3 point.
> 
> Sounds good to me.
>>
>> Also, regarding perf/x86/rapl driver(patch 1), will you be sending a
>> patch
>> to conditionally set the rapl scope to die for CLK-AP platform(on top
>> of this fix),
>> or should I fix it in this patch 1 itself?
> 
> patch 1 is a fix patch.
> optimization for CLX-AP should be a separate patch and that is not
> urgent (the new logic is still correct for current existing Intel
> platforms), I will submit it later.

Makes sense

> 
> I think the fix patch is good enough as long as we have below
> information
> 1. CLX-AP is multi-die and its RAPL MSRs are die scope
> 2. other Intel platforms are single die systems so the scope can be
> considered as either pkg-scope or die-scope.
> This info will make the future optimization easier.

Yes, this seems good

Thanks, 
Dhananjay

> 
> thanks,
> rui
> 
>>
>> Thanks,
>> Dhananjay
>>
>>>
>>> thanks,
>>> rui
>>>>
>>>> Thanks for the review.
>>>>
>>>> Regards,
>>>> Dhananjay
>>>>
>>>>>
>>>>> The patch LGTM.
>>>>>
>>>>> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
>>>>>
>>>>> thanks,
>>>>> rui
>>>>>> ---
>>>>>>  drivers/powercap/intel_rapl_common.c | 20 +++++++++++++++++-
>>>>>> --
>>>>>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/powercap/intel_rapl_common.c
>>>>>> b/drivers/powercap/intel_rapl_common.c
>>>>>> index 3cffa6c79538..2f24ca764408 100644
>>>>>> --- a/drivers/powercap/intel_rapl_common.c
>>>>>> +++ b/drivers/powercap/intel_rapl_common.c
>>>>>> @@ -2128,6 +2128,18 @@ void rapl_remove_package(struct
>>>>>> rapl_package
>>>>>> *rp)
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(rapl_remove_package);
>>>>>>  
>>>>>> +/*
>>>>>> + * Intel systems that enumerate DIE domain have RAPL domains
>>>>>> implemented
>>>>>> + * per-die, however, the same is not true for AMD and Hygon
>>>>>> processors
>>>>>> + * where RAPL domains for PKG energy are in-fact per-PKG.
>>>>>> Since
>>>>>> + * logical_die_id is same as logical_package_id in absence
>>>>>> of
>>>>>> DIE
>>>>>> + * enumeration, use topology_logical_die_id() on Intel
>>>>>> systems
>>>>>> and
>>>>>> + * topology_logical_package_id() on AMD and Hygon systems.
>>>>>> + */
>>>>>> +#define
>>>>>> rapl_pmu_is_pkg_scope()                                \
>>>>>> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
>>>>>> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>>>>>> +
>>>>>>  /* caller to ensure CPU hotplug lock is held */
>>>>>>  struct rapl_package *rapl_find_package_domain_cpuslocked(int
>>>>>> id,
>>>>>> struct rapl_if_priv *priv,
>>>>>>                                                          bool
>>>>>> id_is_cpu)
>>>>>> @@ -2136,7 +2148,8 @@ struct rapl_package
>>>>>> *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
>>>>>>         int uid;
>>>>>>  
>>>>>>         if (id_is_cpu)
>>>>>> -               uid = topology_logical_die_id(id);
>>>>>> +               uid = rapl_pmu_is_pkg_scope() ?
>>>>>> +                     topology_physical_package_id(id) :
>>>>>> topology_logical_die_id(id);
>>>>>>         else
>>>>>>                 uid = id;
>>>>>>  
>>>>>> @@ -2168,9 +2181,10 @@ struct rapl_package
>>>>>> *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
>>>>>>                 return ERR_PTR(-ENOMEM);
>>>>>>  
>>>>>>         if (id_is_cpu) {
>>>>>> -               rp->id = topology_logical_die_id(id);
>>>>>> +               rp->id = rapl_pmu_is_pkg_scope() ?
>>>>>> +                        topology_physical_package_id(id) :
>>>>>> topology_logical_die_id(id);
>>>>>>                 rp->lead_cpu = id;
>>>>>> -               if (topology_max_dies_per_package() > 1)
>>>>>> +               if (!rapl_pmu_is_pkg_scope() &&
>>>>>> topology_max_dies_per_package() > 1)
>>>>>>                         snprintf(rp->name,
>>>>>> PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
>>>>>>                                 
>>>>>> topology_physical_package_id(id),
>>>>>> topology_die_id(id));
>>>>>>                 else
>>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-07-23  4:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-19  9:25 [PATCH 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
2024-07-19  9:25 ` [PATCH 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
2024-07-19  9:25 ` [PATCH 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
2024-07-21 14:17   ` Zhang, Rui
2024-07-22  8:24     ` Dhananjay Ugwekar
2024-07-22 13:52       ` Zhang, Rui
2024-07-22 14:01         ` Dhananjay Ugwekar
2024-07-22 15:21           ` Zhang, Rui
2024-07-23  4:12             ` Dhananjay Ugwekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).