* [PATCH v2 0/2] RAPL driver fixes for AMD CPUs
@ 2024-07-30 4:49 Dhananjay Ugwekar
2024-07-30 4:49 ` [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
2024-07-30 4:49 ` [PATCH v2 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
0 siblings, 2 replies; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-30 4:49 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, rui.zhang
Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, Dhananjay.Ugwekar
After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
AMD CPUs with 0x80000026 leaf have become CCD(Core Complex Die)-aware and
this CCD info is reported by the _die_ macros, which earlier used to return
package info.
e.g: Earlier topology_logical_die_id() and topology_logical_package_id()
used to return the same value for all the CPUs even on a multi-die/
multi-CCD system. But after the above mentioned commit,
topology_logical_die_id() returns the CCD id and
topology_logical_package_id() returns the package/socket id.
Hence, the RAPL energy-pkg event which uses _die_ macros has to be
fixed, in light of above change.
Dhananjay Ugwekar (2):
perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
arch/x86/events/rapl.c | 47 +++++++++++++++++++++++++---
drivers/powercap/intel_rapl_common.c | 34 +++++++++++++++++---
2 files changed, 72 insertions(+), 9 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-07-30 4:49 [PATCH v2 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
@ 2024-07-30 4:49 ` Dhananjay Ugwekar
2024-08-02 15:26 ` Liang, Kan
2024-07-30 4:49 ` [PATCH v2 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
1 sibling, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-30 4:49 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, rui.zhang
Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, Dhananjay.Ugwekar
After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_die_cpumask() and topology_logical_die_id() macros, no longer
return the package cpumask and package id, instead they return the CCD
(Core Complex Die) mask and id respectively. This leads to the energy-pkg
event scope to be modified to CCD instead of package.
For more historical context, please refer to commit 32fb480e0a2c
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration were not affected by it. So, all systems(Intel, AMD, Hygon),
worked correctly with topology_logical_die_id() until recently, but this
changed after the "0x80000026 leaf" commit mentioned above.
Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.
On a 12 CCD 1 Package AMD Zen4 Genoa machine:
Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88.
The expected cpumask here is supposed to be just "0", as it is a package
scope event, only one CPU will be collecting the event for all the CPUs in
the package.
After:
$ cat /sys/devices/power/cpumask
0
Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
---
Changes in v2:
* Updated the scope description comment
* Dont create rapl_pmu_cpumask and rapl_pmu_idx local variables, as they're
used only once, instead call the get_* functions directly where needed
* Check topology_logical_(die/package)_id return value
---
arch/x86/events/rapl.c | 47 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..7097c0f6a71f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,19 @@ static struct perf_pmu_events_attr event_attr_##v = { \
.event_str = str, \
};
+/*
+ * RAPL Package energy counter scope:
+ * 1. AMD/HYGON platforms have a per-PKG package energy counter
+ * 2. For Intel platforms
+ * 2.1. CLX-AP is multi-die and its RAPL MSRs are die-scope
+ * 2.2. Other Intel platforms are single die systems so the scope can be
+ * considered as either pkg-scope or die-scope, and we are considering
+ * them as die-scope.
+ */
+#define rapl_pmu_is_pkg_scope() \
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
struct rapl_pmu {
raw_spinlock_t lock;
int n_active;
@@ -140,9 +153,25 @@ static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
+/*
+ * Helper functions to get the correct topology macros according to the
+ * RAPL PMU scope.
+ */
+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+ topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+ topology_die_cpumask(cpu);
+}
+
static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
{
- unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+ unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
/*
* The unsigned check also catches the '-1' return value for non
@@ -552,7 +581,7 @@ static int rapl_cpu_offline(unsigned int cpu)
pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
- target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+ target = cpumask_any_but(get_rapl_pmu_cpumask(cpu), cpu);
/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
@@ -565,6 +594,11 @@ static int rapl_cpu_offline(unsigned int cpu)
static int rapl_cpu_online(unsigned int cpu)
{
+ s32 rapl_pmu_idx = get_rapl_pmu_idx(cpu);
+ if (rapl_pmu_idx < 0) {
+ pr_err("topology_logical_(package/die)_id() returned a negative value");
+ return -EINVAL;
+ }
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;
@@ -579,14 +613,14 @@ static int rapl_cpu_online(unsigned int cpu)
pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
rapl_hrtimer_init(pmu);
- rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+ rapl_pmus->pmus[rapl_pmu_idx] = pmu;
}
/*
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+ target = cpumask_any_and(&rapl_cpu_mask, get_rapl_pmu_cpumask(cpu));
if (target < nr_cpu_ids)
return 0;
@@ -675,7 +709,10 @@ static const struct attribute_group *rapl_attr_update[] = {
static int __init init_rapl_pmus(void)
{
- int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+ int nr_rapl_pmu = topology_max_packages();
+
+ if (!rapl_pmu_is_pkg_scope())
+ nr_rapl_pmu *= topology_max_dies_per_package();
rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v2 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
2024-07-30 4:49 [PATCH v2 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
2024-07-30 4:49 ` [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
@ 2024-07-30 4:49 ` Dhananjay Ugwekar
2024-08-02 12:35 ` Rafael J. Wysocki
1 sibling, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-07-30 4:49 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, rui.zhang
Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria, Dhananjay.Ugwekar,
Michael Larabel
After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_logical_die_id() macros, no longer returns package id, instead it
returns the CCD (Core Complex Die) id. This leads to the energy-pkg
event scope to be modified to CCD instead of package.
For more historical context, please refer to commit 32fb480e0a2c
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
correctly with topology_logical_die_id() until recently, but this changed
after the "0x80000026 leaf" commit mentioned above.
Future multi-die Intel systems will have package scope RAPL counters,
but they will be using TPMI RAPL interface, which is not affected by
this change.
Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.
On an AMD 2 socket 8 CCD Zen4 server:
Before:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
intel-rapl:3 intel-rapl:7:0 intel-rapl:c
intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
After:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
Only one sysfs entry per-event per-package is created after this change.
Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Reported-by: Michael Larabel <michael@michaellarabel.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
---
Changes in v2:
* Updated scope description comment, commit log
* Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope()
* Check topology_logical_(die/package)_id return value
---
drivers/powercap/intel_rapl_common.c | 34 ++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 3cffa6c79538..4bc56acb99d6 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp)
}
EXPORT_SYMBOL_GPL(rapl_remove_package);
+/*
+ * RAPL Package energy counter scope:
+ * 1. AMD/HYGON platforms use per-PKG package energy counter
+ * 2. For Intel platforms
+ * 2.1 CLX-AP platform has per-DIE package energy counter
+ * 2.2 Other platforms that uses MSR RAPL are single die systems so the
+ * package energy counter can be considered as per-PKG/per-DIE,
+ * here it is considered as per-DIE.
+ * 2.3 New platforms that use TPMI RAPL doesn't care about the
+ * scope because they are not MSR/CPU based.
+ */
+#define rapl_msrs_are_pkg_scope() \
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
/* caller to ensure CPU hotplug lock is held */
struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv,
bool id_is_cpu)
@@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
struct rapl_package *rp;
int uid;
- if (id_is_cpu)
- uid = topology_logical_die_id(id);
+ if (id_is_cpu) {
+ uid = rapl_msrs_are_pkg_scope() ?
+ topology_physical_package_id(id) : topology_logical_die_id(id);
+ if (uid < 0) {
+ pr_err("topology_logical_(package/die)_id() returned a negative value");
+ return ERR_PTR(-EINVAL);
+ }
+ }
else
uid = id;
@@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
return ERR_PTR(-ENOMEM);
if (id_is_cpu) {
- rp->id = topology_logical_die_id(id);
+ rp->id = rapl_msrs_are_pkg_scope() ?
+ topology_physical_package_id(id) : topology_logical_die_id(id);
+ if ((int)(rp->id) < 0) {
+ pr_err("topology_logical_(package/die)_id() returned a negative value");
+ return ERR_PTR(-EINVAL);
+ }
rp->lead_cpu = id;
- if (topology_max_dies_per_package() > 1)
+ if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1)
snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
topology_physical_package_id(id), topology_die_id(id));
else
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
2024-07-30 4:49 ` [PATCH v2 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
@ 2024-08-02 12:35 ` Rafael J. Wysocki
2024-08-08 11:17 ` Dhananjay Ugwekar
0 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2024-08-02 12:35 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, rui.zhang, linux-perf-users, linux-kernel, linux-pm,
ananth.narayan, gautham.shenoy, kprateek.nayak, ravi.bangoria,
Michael Larabel
On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_logical_die_id() macros, no longer returns package id, instead it
> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
> event scope to be modified to CCD instead of package.
>
> For more historical context, please refer to commit 32fb480e0a2c
> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
> the RAPL scope from package to die for all systems, as Intel systems
> with Die enumeration have RAPL scope as die, and those without die
> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
> correctly with topology_logical_die_id() until recently, but this changed
> after the "0x80000026 leaf" commit mentioned above.
>
> Future multi-die Intel systems will have package scope RAPL counters,
> but they will be using TPMI RAPL interface, which is not affected by
> this change.
>
> Replacing topology_logical_die_id() with topology_physical_package_id()
> conditionally only for AMD and Hygon fixes the energy-pkg event.
>
> On an AMD 2 socket 8 CCD Zen4 server:
>
> Before:
>
> linux$ ls /sys/class/powercap/
> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
> intel-rapl:3 intel-rapl:7:0 intel-rapl:c
> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
>
> After:
>
> linux$ ls /sys/class/powercap/
> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
>
> Only one sysfs entry per-event per-package is created after this change.
>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
> Reported-by: Michael Larabel <michael@michaellarabel.com>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> ---
> Changes in v2:
> * Updated scope description comment, commit log
> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope()
> * Check topology_logical_(die/package)_id return value
This patch does not depend on the first one in the series if I'm not
mistaken, in which case I can pick it up separately if you want me to
do that, so please let me know.
Thanks!
> ---
> drivers/powercap/intel_rapl_common.c | 34 ++++++++++++++++++++++++----
> 1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> index 3cffa6c79538..4bc56acb99d6 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp)
> }
> EXPORT_SYMBOL_GPL(rapl_remove_package);
>
> +/*
> + * RAPL Package energy counter scope:
> + * 1. AMD/HYGON platforms use per-PKG package energy counter
> + * 2. For Intel platforms
> + * 2.1 CLX-AP platform has per-DIE package energy counter
> + * 2.2 Other platforms that uses MSR RAPL are single die systems so the
> + * package energy counter can be considered as per-PKG/per-DIE,
> + * here it is considered as per-DIE.
> + * 2.3 New platforms that use TPMI RAPL doesn't care about the
> + * scope because they are not MSR/CPU based.
> + */
> +#define rapl_msrs_are_pkg_scope() \
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
> /* caller to ensure CPU hotplug lock is held */
> struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv,
> bool id_is_cpu)
> @@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
> struct rapl_package *rp;
> int uid;
>
> - if (id_is_cpu)
> - uid = topology_logical_die_id(id);
> + if (id_is_cpu) {
> + uid = rapl_msrs_are_pkg_scope() ?
> + topology_physical_package_id(id) : topology_logical_die_id(id);
> + if (uid < 0) {
> + pr_err("topology_logical_(package/die)_id() returned a negative value");
> + return ERR_PTR(-EINVAL);
> + }
> + }
> else
> uid = id;
>
> @@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
> return ERR_PTR(-ENOMEM);
>
> if (id_is_cpu) {
> - rp->id = topology_logical_die_id(id);
> + rp->id = rapl_msrs_are_pkg_scope() ?
> + topology_physical_package_id(id) : topology_logical_die_id(id);
> + if ((int)(rp->id) < 0) {
> + pr_err("topology_logical_(package/die)_id() returned a negative value");
> + return ERR_PTR(-EINVAL);
> + }
> rp->lead_cpu = id;
> - if (topology_max_dies_per_package() > 1)
> + if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1)
> snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
> topology_physical_package_id(id), topology_die_id(id));
> else
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-07-30 4:49 ` [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
@ 2024-08-02 15:26 ` Liang, Kan
2024-08-08 12:10 ` Dhananjay Ugwekar
0 siblings, 1 reply; 9+ messages in thread
From: Liang, Kan @ 2024-08-02 15:26 UTC (permalink / raw)
To: Dhananjay Ugwekar, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, tglx, bp,
dave.hansen, x86, rui.zhang
Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria
Hi Dhananjay,
On 2024-07-30 12:49 a.m., Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_die_cpumask() and topology_logical_die_id() macros, no longer
> return the package cpumask and package id, instead they return the CCD
> (Core Complex Die) mask and id respectively. This leads to the energy-pkg
> event scope to be modified to CCD instead of package.
>
> For more historical context, please refer to commit 32fb480e0a2c
> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
> the RAPL scope from package to die for all systems, as Intel systems
> with Die enumeration have RAPL scope as die, and those without die
> enumeration were not affected by it. So, all systems(Intel, AMD, Hygon),
> worked correctly with topology_logical_die_id() until recently, but this
> changed after the "0x80000026 leaf" commit mentioned above.
>
> Replacing topology_logical_die_id() with topology_physical_package_id()
> conditionally only for AMD and Hygon fixes the energy-pkg event.
>
> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>
> Before:
> $ cat /sys/devices/power/cpumask
> 0,8,16,24,32,40,48,56,64,72,80,88.
>
> The expected cpumask here is supposed to be just "0", as it is a package
> scope event, only one CPU will be collecting the event for all the CPUs in
> the package.
>
> After:
> $ cat /sys/devices/power/cpumask
> 0
>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> ---
> Changes in v2:
> * Updated the scope description comment
> * Dont create rapl_pmu_cpumask and rapl_pmu_idx local variables, as they're
> used only once, instead call the get_* functions directly where needed
> * Check topology_logical_(die/package)_id return value
> ---> arch/x86/events/rapl.c | 47 +++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 42 insertions(+), 5 deletions(-)
I just posted a patch set to clean up the hotplug code in perf.
https://lore.kernel.org/lkml/20240802151643.1691631-1-kan.liang@linux.intel.com/
With the cleanup patch set, the fix may be simplified as below.
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b70ad880c5bc..801697be4118 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -646,6 +646,10 @@ static int __init init_rapl_pmus(void)
rapl_pmus->pmu.module = THIS_MODULE;
rapl_pmus->pmu.scope = PERF_PMU_SCOPE_DIE;
rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
+
+ if (rapl_pmu_is_pkg_scope())
+ rapl_pmus->pmu.scope = PERF_PMU_SCOPE_PKG;
+
return 0;
}
Could you please take a look at the cleanup patch and give it a try?
Thanks,
Kan
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b985ca79cf97..7097c0f6a71f 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,19 @@ static struct perf_pmu_events_attr event_attr_##v = { \
> .event_str = str, \
> };
>
> +/*
> + * RAPL Package energy counter scope:
> + * 1. AMD/HYGON platforms have a per-PKG package energy counter
> + * 2. For Intel platforms
> + * 2.1. CLX-AP is multi-die and its RAPL MSRs are die-scope
> + * 2.2. Other Intel platforms are single die systems so the scope can be
> + * considered as either pkg-scope or die-scope, and we are considering
> + * them as die-scope.
> + */
> +#define rapl_pmu_is_pkg_scope() \
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
> struct rapl_pmu {
> raw_spinlock_t lock;
> int n_active;
> @@ -140,9 +153,25 @@ static unsigned int rapl_cntr_mask;
> static u64 rapl_timer_ms;
> static struct perf_msr *rapl_msrs;
>
> +/*
> + * Helper functions to get the correct topology macros according to the
> + * RAPL PMU scope.
> + */
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> + topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> + return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> + topology_die_cpumask(cpu);
> +}
> +
> static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
> {
> - unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>
> /*
> * The unsigned check also catches the '-1' return value for non
> @@ -552,7 +581,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> pmu->cpu = -1;
> /* Find a new cpu to collect rapl events */
> - target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
> + target = cpumask_any_but(get_rapl_pmu_cpumask(cpu), cpu);
>
> /* Migrate rapl events to the new target */
> if (target < nr_cpu_ids) {
> @@ -565,6 +594,11 @@ static int rapl_cpu_offline(unsigned int cpu)
>
> static int rapl_cpu_online(unsigned int cpu)
> {
> + s32 rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> + if (rapl_pmu_idx < 0) {
> + pr_err("topology_logical_(package/die)_id() returned a negative value");
> + return -EINVAL;
> + }
> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> int target;
>
> @@ -579,14 +613,14 @@ static int rapl_cpu_online(unsigned int cpu)
> pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> rapl_hrtimer_init(pmu);
>
> - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> + rapl_pmus->pmus[rapl_pmu_idx] = pmu;
> }
>
> /*
> * Check if there is an online cpu in the package which collects rapl
> * events already.
> */
> - target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
> + target = cpumask_any_and(&rapl_cpu_mask, get_rapl_pmu_cpumask(cpu));
> if (target < nr_cpu_ids)
> return 0;
>
> @@ -675,7 +709,10 @@ static const struct attribute_group *rapl_attr_update[] = {
>
> static int __init init_rapl_pmus(void)
> {
> - int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
> + int nr_rapl_pmu = topology_max_packages();
> +
> + if (!rapl_pmu_is_pkg_scope())
> + nr_rapl_pmu *= topology_max_dies_per_package();
>
> rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
2024-08-02 12:35 ` Rafael J. Wysocki
@ 2024-08-08 11:17 ` Dhananjay Ugwekar
2024-08-19 13:34 ` Rafael J. Wysocki
0 siblings, 1 reply; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-08-08 11:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
jolsa, irogers, adrian.hunter, kan.liang, tglx, bp, dave.hansen,
x86, rui.zhang, linux-perf-users, linux-kernel, linux-pm,
ananth.narayan, gautham.shenoy, kprateek.nayak, ravi.bangoria,
Michael Larabel
Hello Rafael,
On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote:
> On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_logical_die_id() macros, no longer returns package id, instead it
>> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> For more historical context, please refer to commit 32fb480e0a2c
>> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
>> the RAPL scope from package to die for all systems, as Intel systems
>> with Die enumeration have RAPL scope as die, and those without die
>> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
>> correctly with topology_logical_die_id() until recently, but this changed
>> after the "0x80000026 leaf" commit mentioned above.
>>
>> Future multi-die Intel systems will have package scope RAPL counters,
>> but they will be using TPMI RAPL interface, which is not affected by
>> this change.
>>
>> Replacing topology_logical_die_id() with topology_physical_package_id()
>> conditionally only for AMD and Hygon fixes the energy-pkg event.
>>
>> On an AMD 2 socket 8 CCD Zen4 server:
>>
>> Before:
>>
>> linux$ ls /sys/class/powercap/
>> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
>> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
>> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
>> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
>> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
>> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
>> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
>> intel-rapl:3 intel-rapl:7:0 intel-rapl:c
>> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
>>
>> After:
>>
>> linux$ ls /sys/class/powercap/
>> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
>>
>> Only one sysfs entry per-event per-package is created after this change.
>>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
>> Reported-by: Michael Larabel <michael@michaellarabel.com>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
>> ---
>> Changes in v2:
>> * Updated scope description comment, commit log
>> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope()
>> * Check topology_logical_(die/package)_id return value
>
> This patch does not depend on the first one in the series if I'm not
> mistaken, in which case I can pick it up separately if you want me to
> do that, so please let me know.
Sorry for the late reply, was out sick,
Yes, please pick this patch separately, it is independent from the first one.
Thanks,
Dhananjay
>
> Thanks!
>
>> ---
>> drivers/powercap/intel_rapl_common.c | 34 ++++++++++++++++++++++++----
>> 1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
>> index 3cffa6c79538..4bc56acb99d6 100644
>> --- a/drivers/powercap/intel_rapl_common.c
>> +++ b/drivers/powercap/intel_rapl_common.c
>> @@ -2128,6 +2128,21 @@ void rapl_remove_package(struct rapl_package *rp)
>> }
>> EXPORT_SYMBOL_GPL(rapl_remove_package);
>>
>> +/*
>> + * RAPL Package energy counter scope:
>> + * 1. AMD/HYGON platforms use per-PKG package energy counter
>> + * 2. For Intel platforms
>> + * 2.1 CLX-AP platform has per-DIE package energy counter
>> + * 2.2 Other platforms that uses MSR RAPL are single die systems so the
>> + * package energy counter can be considered as per-PKG/per-DIE,
>> + * here it is considered as per-DIE.
>> + * 2.3 New platforms that use TPMI RAPL doesn't care about the
>> + * scope because they are not MSR/CPU based.
>> + */
>> +#define rapl_msrs_are_pkg_scope() \
>> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
>> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>> /* caller to ensure CPU hotplug lock is held */
>> struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv,
>> bool id_is_cpu)
>> @@ -2135,8 +2150,14 @@ struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_
>> struct rapl_package *rp;
>> int uid;
>>
>> - if (id_is_cpu)
>> - uid = topology_logical_die_id(id);
>> + if (id_is_cpu) {
>> + uid = rapl_msrs_are_pkg_scope() ?
>> + topology_physical_package_id(id) : topology_logical_die_id(id);
>> + if (uid < 0) {
>> + pr_err("topology_logical_(package/die)_id() returned a negative value");
>> + return ERR_PTR(-EINVAL);
>> + }
>> + }
>> else
>> uid = id;
>>
>> @@ -2168,9 +2189,14 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr
>> return ERR_PTR(-ENOMEM);
>>
>> if (id_is_cpu) {
>> - rp->id = topology_logical_die_id(id);
>> + rp->id = rapl_msrs_are_pkg_scope() ?
>> + topology_physical_package_id(id) : topology_logical_die_id(id);
>> + if ((int)(rp->id) < 0) {
>> + pr_err("topology_logical_(package/die)_id() returned a negative value");
>> + return ERR_PTR(-EINVAL);
>> + }
>> rp->lead_cpu = id;
>> - if (topology_max_dies_per_package() > 1)
>> + if (!rapl_msrs_are_pkg_scope() && topology_max_dies_per_package() > 1)
>> snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
>> topology_physical_package_id(id), topology_die_id(id));
>> else
>> --
>> 2.43.0
>>
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
2024-08-02 15:26 ` Liang, Kan
@ 2024-08-08 12:10 ` Dhananjay Ugwekar
0 siblings, 0 replies; 9+ messages in thread
From: Dhananjay Ugwekar @ 2024-08-08 12:10 UTC (permalink / raw)
To: Liang, Kan, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, tglx, bp,
dave.hansen, x86, rui.zhang
Cc: linux-perf-users, linux-kernel, linux-pm, ananth.narayan,
gautham.shenoy, kprateek.nayak, ravi.bangoria
Hello Kan,
On 8/2/2024 8:56 PM, Liang, Kan wrote:
> Hi Dhananjay,
>
> On 2024-07-30 12:49 a.m., Dhananjay Ugwekar wrote:
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_die_cpumask() and topology_logical_die_id() macros, no longer
>> return the package cpumask and package id, instead they return the CCD
>> (Core Complex Die) mask and id respectively. This leads to the energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> For more historical context, please refer to commit 32fb480e0a2c
>> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
>> the RAPL scope from package to die for all systems, as Intel systems
>> with Die enumeration have RAPL scope as die, and those without die
>> enumeration were not affected by it. So, all systems(Intel, AMD, Hygon),
>> worked correctly with topology_logical_die_id() until recently, but this
>> changed after the "0x80000026 leaf" commit mentioned above.
>>
>> Replacing topology_logical_die_id() with topology_physical_package_id()
>> conditionally only for AMD and Hygon fixes the energy-pkg event.
>>
>> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>>
>> Before:
>> $ cat /sys/devices/power/cpumask
>> 0,8,16,24,32,40,48,56,64,72,80,88.
>>
>> The expected cpumask here is supposed to be just "0", as it is a package
>> scope event, only one CPU will be collecting the event for all the CPUs in
>> the package.
>>
>> After:
>> $ cat /sys/devices/power/cpumask
>> 0
>>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
>> ---
>> Changes in v2:
>> * Updated the scope description comment
>> * Dont create rapl_pmu_cpumask and rapl_pmu_idx local variables, as they're
>> used only once, instead call the get_* functions directly where needed
>> * Check topology_logical_(die/package)_id return value
>> ---> arch/x86/events/rapl.c | 47 +++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 42 insertions(+), 5 deletions(-)
>
> I just posted a patch set to clean up the hotplug code in perf.
> https://lore.kernel.org/lkml/20240802151643.1691631-1-kan.liang@linux.intel.com/
>
> With the cleanup patch set, the fix may be simplified as below.
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b70ad880c5bc..801697be4118 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -646,6 +646,10 @@ static int __init init_rapl_pmus(void)
> rapl_pmus->pmu.module = THIS_MODULE;
> rapl_pmus->pmu.scope = PERF_PMU_SCOPE_DIE;
> rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
> +
> + if (rapl_pmu_is_pkg_scope())
> + rapl_pmus->pmu.scope = PERF_PMU_SCOPE_PKG;
> +
> return 0;
> }
>
> Could you please take a look at the cleanup patch and give it a try?
Sorry for the late reply, I was out sick for last few days.
I tried out my fix(modified) on top your patchset, the energy-pkg event seems to be
working correctly,
On a Zen4 Genoa EPYC server:
When idle,
~$ sudo perf stat -a -e power/energy-pkg/ sleep 1
Performance counter stats for 'system wide':
33.93 Joules power/energy-pkg/
1.001969742 seconds time elapsed
After running stress-ng:
~$ sudo perf stat -a -e power/energy-pkg/ sleep 1
Performance counter stats for 'system wide':
376.16 Joules power/energy-pkg/
1.003985155 seconds time elapsed
Please feel free to add my tested-by on your hotplug cleanup patchset.
Thanks,
Dhananjay
>
> Thanks,
> Kan
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index b985ca79cf97..7097c0f6a71f 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -103,6 +103,19 @@ static struct perf_pmu_events_attr event_attr_##v = { \
>> .event_str = str, \
>> };
>>
>> +/*
>> + * RAPL Package energy counter scope:
>> + * 1. AMD/HYGON platforms have a per-PKG package energy counter
>> + * 2. For Intel platforms
>> + * 2.1. CLX-AP is multi-die and its RAPL MSRs are die-scope
>> + * 2.2. Other Intel platforms are single die systems so the scope can be
>> + * considered as either pkg-scope or die-scope, and we are considering
>> + * them as die-scope.
>> + */
>> +#define rapl_pmu_is_pkg_scope() \
>> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
>> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +
>> struct rapl_pmu {
>> raw_spinlock_t lock;
>> int n_active;
>> @@ -140,9 +153,25 @@ static unsigned int rapl_cntr_mask;
>> static u64 rapl_timer_ms;
>> static struct perf_msr *rapl_msrs;
>>
>> +/*
>> + * Helper functions to get the correct topology macros according to the
>> + * RAPL PMU scope.
>> + */
>> +static inline unsigned int get_rapl_pmu_idx(int cpu)
>> +{
>> + return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>> + topology_logical_die_id(cpu);
>> +}
>> +
>> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
>> +{
>> + return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
>> + topology_die_cpumask(cpu);
>> +}
>> +
>> static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
>> {
>> - unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
>> + unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>>
>> /*
>> * The unsigned check also catches the '-1' return value for non
>> @@ -552,7 +581,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>>
>> pmu->cpu = -1;
>> /* Find a new cpu to collect rapl events */
>> - target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
>> + target = cpumask_any_but(get_rapl_pmu_cpumask(cpu), cpu);
>>
>> /* Migrate rapl events to the new target */
>> if (target < nr_cpu_ids) {
>> @@ -565,6 +594,11 @@ static int rapl_cpu_offline(unsigned int cpu)
>>
>> static int rapl_cpu_online(unsigned int cpu)
>> {
>> + s32 rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>> + if (rapl_pmu_idx < 0) {
>> + pr_err("topology_logical_(package/die)_id() returned a negative value");
>> + return -EINVAL;
>> + }
>> struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>> int target;
>>
>> @@ -579,14 +613,14 @@ static int rapl_cpu_online(unsigned int cpu)
>> pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>> rapl_hrtimer_init(pmu);
>>
>> - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
>> + rapl_pmus->pmus[rapl_pmu_idx] = pmu;
>> }
>>
>> /*
>> * Check if there is an online cpu in the package which collects rapl
>> * events already.
>> */
>> - target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
>> + target = cpumask_any_and(&rapl_cpu_mask, get_rapl_pmu_cpumask(cpu));
>> if (target < nr_cpu_ids)
>> return 0;
>>
>> @@ -675,7 +709,10 @@ static const struct attribute_group *rapl_attr_update[] = {
>>
>> static int __init init_rapl_pmus(void)
>> {
>> - int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>> + int nr_rapl_pmu = topology_max_packages();
>> +
>> + if (!rapl_pmu_is_pkg_scope())
>> + nr_rapl_pmu *= topology_max_dies_per_package();
>>
>> rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
>> if (!rapl_pmus)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
2024-08-08 11:17 ` Dhananjay Ugwekar
@ 2024-08-19 13:34 ` Rafael J. Wysocki
2024-08-21 4:57 ` Gautham R. Shenoy
0 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 13:34 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: Rafael J. Wysocki, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
tglx, bp, dave.hansen, x86, rui.zhang, linux-perf-users,
linux-kernel, linux-pm, ananth.narayan, gautham.shenoy,
kprateek.nayak, ravi.bangoria, Michael Larabel
On Thu, Aug 8, 2024 at 1:18 PM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Rafael,
>
> On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote:
> > On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar
> > <Dhananjay.Ugwekar@amd.com> wrote:
> >>
> >> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
> >> on AMD processors that support extended CPUID leaf 0x80000026, the
> >> topology_logical_die_id() macros, no longer returns package id, instead it
> >> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
> >> event scope to be modified to CCD instead of package.
> >>
> >> For more historical context, please refer to commit 32fb480e0a2c
> >> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
> >> the RAPL scope from package to die for all systems, as Intel systems
> >> with Die enumeration have RAPL scope as die, and those without die
> >> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
> >> correctly with topology_logical_die_id() until recently, but this changed
> >> after the "0x80000026 leaf" commit mentioned above.
> >>
> >> Future multi-die Intel systems will have package scope RAPL counters,
> >> but they will be using TPMI RAPL interface, which is not affected by
> >> this change.
> >>
> >> Replacing topology_logical_die_id() with topology_physical_package_id()
> >> conditionally only for AMD and Hygon fixes the energy-pkg event.
> >>
> >> On an AMD 2 socket 8 CCD Zen4 server:
> >>
> >> Before:
> >>
> >> linux$ ls /sys/class/powercap/
> >> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
> >> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
> >> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
> >> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
> >> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
> >> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
> >> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
> >> intel-rapl:3 intel-rapl:7:0 intel-rapl:c
> >> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
> >>
> >> After:
> >>
> >> linux$ ls /sys/class/powercap/
> >> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
> >>
> >> Only one sysfs entry per-event per-package is created after this change.
> >>
> >> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
> >> Reported-by: Michael Larabel <michael@michaellarabel.com>
> >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> >> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> >> ---
> >> Changes in v2:
> >> * Updated scope description comment, commit log
> >> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope()
> >> * Check topology_logical_(die/package)_id return value
> >
> > This patch does not depend on the first one in the series if I'm not
> > mistaken, in which case I can pick it up separately if you want me to
> > do that, so please let me know.
>
> Sorry for the late reply, was out sick,
>
> Yes, please pick this patch separately, it is independent from the first one.
OK, applied as 6.12 material.
Thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
2024-08-19 13:34 ` Rafael J. Wysocki
@ 2024-08-21 4:57 ` Gautham R. Shenoy
0 siblings, 0 replies; 9+ messages in thread
From: Gautham R. Shenoy @ 2024-08-21 4:57 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Dhananjay Ugwekar, peterz, mingo, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
tglx, bp, dave.hansen, x86, rui.zhang, linux-perf-users,
linux-kernel, linux-pm, ananth.narayan, kprateek.nayak,
ravi.bangoria, Michael Larabel
Hello Rafael,
On Mon, Aug 19, 2024 at 03:34:09PM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 8, 2024 at 1:18 PM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
> >
> > Hello Rafael,
> >
> > On 8/2/2024 6:05 PM, Rafael J. Wysocki wrote:
> > > On Tue, Jul 30, 2024 at 6:53 AM Dhananjay Ugwekar
> > > <Dhananjay.Ugwekar@amd.com> wrote:
> > >>
> > >> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
> > >> on AMD processors that support extended CPUID leaf 0x80000026, the
> > >> topology_logical_die_id() macros, no longer returns package id, instead it
> > >> returns the CCD (Core Complex Die) id. This leads to the energy-pkg
> > >> event scope to be modified to CCD instead of package.
> > >>
> > >> For more historical context, please refer to commit 32fb480e0a2c
> > >> ("powercap/intel_rapl: Support multi-die/package"), which initially changed
> > >> the RAPL scope from package to die for all systems, as Intel systems
> > >> with Die enumeration have RAPL scope as die, and those without die
> > >> enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
> > >> correctly with topology_logical_die_id() until recently, but this changed
> > >> after the "0x80000026 leaf" commit mentioned above.
> > >>
> > >> Future multi-die Intel systems will have package scope RAPL counters,
> > >> but they will be using TPMI RAPL interface, which is not affected by
> > >> this change.
> > >>
> > >> Replacing topology_logical_die_id() with topology_physical_package_id()
> > >> conditionally only for AMD and Hygon fixes the energy-pkg event.
> > >>
> > >> On an AMD 2 socket 8 CCD Zen4 server:
> > >>
> > >> Before:
> > >>
> > >> linux$ ls /sys/class/powercap/
> > >> intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
> > >> intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
> > >> intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
> > >> intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
> > >> intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
> > >> intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
> > >> intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
> > >> intel-rapl:3 intel-rapl:7:0 intel-rapl:c
> > >> intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
> > >>
> > >> After:
> > >>
> > >> linux$ ls /sys/class/powercap/
> > >> intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
> > >>
> > >> Only one sysfs entry per-event per-package is created after this change.
> > >>
> > >> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
> > >> Reported-by: Michael Larabel <michael@michaellarabel.com>
> > >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> > >> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
> > >> ---
> > >> Changes in v2:
> > >> * Updated scope description comment, commit log
> > >> * Rename rapl_pmu_is_pkg_scope() to rapl_msrs_are_pkg_scope()
> > >> * Check topology_logical_(die/package)_id return value
> > >
> > > This patch does not depend on the first one in the series if I'm not
> > > mistaken, in which case I can pick it up separately if you want me to
> > > do that, so please let me know.
> >
> > Sorry for the late reply, was out sick,
> >
> > Yes, please pick this patch separately, it is independent from the first one.
>
> OK, applied as 6.12 material.
Can this go into 6.11 fixes?
It fixes the commit 63edbaa48a57 ("x86/cpu/topology: Add support for
the AMD 0x80000026 leaf") which got merged in 6.10.
--
Thanks and Regards
gautham.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-08-21 4:57 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 4:49 [PATCH v2 0/2] RAPL driver fixes for AMD CPUs Dhananjay Ugwekar
2024-07-30 4:49 ` [PATCH v2 1/2] perf/x86/rapl: Fix the energy-pkg event " Dhananjay Ugwekar
2024-08-02 15:26 ` Liang, Kan
2024-08-08 12:10 ` Dhananjay Ugwekar
2024-07-30 4:49 ` [PATCH v2 2/2] powercap/intel_rapl: " Dhananjay Ugwekar
2024-08-02 12:35 ` Rafael J. Wysocki
2024-08-08 11:17 ` Dhananjay Ugwekar
2024-08-19 13:34 ` Rafael J. Wysocki
2024-08-21 4:57 ` Gautham R. Shenoy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).