* [PATCH v7 01/10] perf/x86/rapl: Remove the unused get_rapl_pmu_cpumask() function
2024-11-15 6:07 [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-11-15 6:07 ` Dhananjay Ugwekar
2024-11-15 6:07 ` [PATCH v7 02/10] x86/topology: Introduce topology_logical_core_id() Dhananjay Ugwekar
` (9 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-15 6:07 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Commit 9e9af8bbb5f9 ("perf/x86/rapl: Clean up cpumask and hotplug")
removed the cpumask handling from the RAPL PMU code. Since then, the
get_rapl_pmu_cpumask() function has been unused, so remove it.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index a8defc813c36..f70c49ca0ef3 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -153,7 +153,7 @@ static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
/*
- * Helper functions to get the correct topology macros according to the
+ * Helper function to get the correct topology id according to the
* RAPL PMU scope.
*/
static inline unsigned int get_rapl_pmu_idx(int cpu)
@@ -162,12 +162,6 @@ static inline unsigned int get_rapl_pmu_idx(int cpu)
topology_logical_die_id(cpu);
}
-static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
-{
- return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
- topology_die_cpumask(cpu);
-}
-
static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
{
unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
--
2.34.1
* [PATCH v7 02/10] x86/topology: Introduce topology_logical_core_id()
From: Dhananjay Ugwekar @ 2024-11-15 6:07 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Oleksandr Natalenko, Dhananjay Ugwekar
From: K Prateek Nayak <kprateek.nayak@amd.com>
On x86, topology_core_id() returns a unique core ID within the PKG
domain. Looking at match_smt() suggests that a core ID just needs to be
unique within an LLC domain. For use cases such as the core RAPL PMU,
there exists a need for a unique core ID across the entire system with
multiple PKG domains. Introduce topology_logical_core_id() to derive a
unique core ID across the system.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
---
Documentation/arch/x86/topology.rst | 4 ++++
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/debugfs.c | 1 +
arch/x86/kernel/cpu/topology_common.c | 1 +
5 files changed, 8 insertions(+)
diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst
index 7352ab89a55a..c12837e61bda 100644
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -135,6 +135,10 @@ Thread-related topology information in the kernel:
The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo
"core_id."
+ - topology_logical_core_id();
+
+ The logical core ID to which a thread belongs.
+
System topology examples
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4a686f0e5dbf..595859cbf7ff 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -98,6 +98,7 @@ struct cpuinfo_topology {
// Logical ID mappings
u32 logical_pkg_id;
u32 logical_die_id;
+ u32 logical_core_id;
// AMD Node ID and Nodes per Package info
u32 amd_node_id;
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index aef70336d624..672fccf9f845 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -137,6 +137,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
#define topology_logical_package_id(cpu) (cpu_data(cpu).topo.logical_pkg_id)
#define topology_physical_package_id(cpu) (cpu_data(cpu).topo.pkg_id)
#define topology_logical_die_id(cpu) (cpu_data(cpu).topo.logical_die_id)
+#define topology_logical_core_id(cpu) (cpu_data(cpu).topo.logical_core_id)
#define topology_die_id(cpu) (cpu_data(cpu).topo.die_id)
#define topology_core_id(cpu) (cpu_data(cpu).topo.core_id)
#define topology_ppin(cpu) (cpu_data(cpu).ppin)
diff --git a/arch/x86/kernel/cpu/debugfs.c b/arch/x86/kernel/cpu/debugfs.c
index 3baf3e435834..b1eb6d7828db 100644
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -24,6 +24,7 @@ static int cpu_debug_show(struct seq_file *m, void *p)
seq_printf(m, "core_id: %u\n", c->topo.core_id);
seq_printf(m, "logical_pkg_id: %u\n", c->topo.logical_pkg_id);
seq_printf(m, "logical_die_id: %u\n", c->topo.logical_die_id);
+ seq_printf(m, "logical_core_id: %u\n", c->topo.logical_core_id);
seq_printf(m, "llc_id: %u\n", c->topo.llc_id);
seq_printf(m, "l2c_id: %u\n", c->topo.l2c_id);
seq_printf(m, "amd_node_id: %u\n", c->topo.amd_node_id);
diff --git a/arch/x86/kernel/cpu/topology_common.c b/arch/x86/kernel/cpu/topology_common.c
index 9a6069e7133c..23722aa21e2f 100644
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -151,6 +151,7 @@ static void topo_set_ids(struct topo_scan *tscan, bool early)
if (!early) {
c->topo.logical_pkg_id = topology_get_logical_id(apicid, TOPO_PKG_DOMAIN);
c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
+ c->topo.logical_core_id = topology_get_logical_id(apicid, TOPO_CORE_DOMAIN);
}
/* Package relative core ID */
--
2.34.1
* [PATCH v7 03/10] perf/x86/rapl: Remove the cpu_to_rapl_pmu() function
From: Dhananjay Ugwekar @ 2024-11-15 6:07 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
Once that lands, one CPU may be mapped to more than one rapl_pmu
(a package/die one and a core one), so remove the cpu_to_rapl_pmu()
function.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
v7 changes:
* Add comment to get_rapl_pmu_idx()
* Change the rapl_pmu_idx variable in rapl_pmu_event_init() to
unsigned int
---
arch/x86/events/rapl.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f70c49ca0ef3..a3c063176513 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -157,22 +157,15 @@ static struct perf_msr *rapl_msrs;
* RAPL PMU scope.
*/
static inline unsigned int get_rapl_pmu_idx(int cpu)
-{
+{
+	/*
+ * Returns unsigned int, which converts the '-1' return value
+ * (for non-existent mappings in topology map) to UINT_MAX, so
+ * the error check in the caller is simplified.
+ */
return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
topology_logical_die_id(cpu);
}
-static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
-{
- unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
-
- /*
- * The unsigned check also catches the '-1' return value for non
- * existent mappings in the topology map.
- */
- return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;
-}
-
static inline u64 rapl_read_counter(struct perf_event *event)
{
u64 raw;
@@ -350,6 +343,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
struct rapl_pmu *pmu;
+ unsigned int rapl_pmu_idx;
/* only look at RAPL events */
if (event->attr.type != rapl_pmus->pmu.type)
@@ -376,8 +370,12 @@ static int rapl_pmu_event_init(struct perf_event *event)
if (event->attr.sample_period) /* no sampling */
return -EINVAL;
+ rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
+ if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
+ return -EINVAL;
+
/* must be done before validate_group */
- pmu = cpu_to_rapl_pmu(event->cpu);
+ pmu = rapl_pmus->pmus[rapl_pmu_idx];
if (!pmu)
return -EINVAL;
event->pmu_private = pmu;
--
2.34.1
* [PATCH v7 04/10] perf/x86/rapl: Rename rapl_pmu variables
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Rename struct rapl_pmu variables from "pmu" to "rapl_pmu" to avoid
confusion between variables of the two different structs, pmu and
rapl_pmu. struct rapl_pmu also contains a pointer to struct pmu,
which led to needlessly confusing constructs in the code such as
pmu->pmu; with this change they become the much more readable
rapl_pmu->pmu.
Also rename the "pmus" member of struct rapl_pmus, for the same
reason.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 91 +++++++++++++++++++++---------------------
1 file changed, 46 insertions(+), 45 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index a3c063176513..dc350292e1fc 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -129,7 +129,7 @@ struct rapl_pmu {
struct rapl_pmus {
struct pmu pmu;
unsigned int nr_rapl_pmu;
- struct rapl_pmu *pmus[] __counted_by(nr_rapl_pmu);
+ struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};
enum rapl_unit_quirk {
@@ -227,34 +227,34 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)
static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
{
- struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
+ struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
struct perf_event *event;
unsigned long flags;
- if (!pmu->n_active)
+ if (!rapl_pmu->n_active)
return HRTIMER_NORESTART;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
- list_for_each_entry(event, &pmu->active_list, active_entry)
+ list_for_each_entry(event, &rapl_pmu->active_list, active_entry)
rapl_event_update(event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
- hrtimer_forward_now(hrtimer, pmu->timer_interval);
+ hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);
return HRTIMER_RESTART;
}
-static void rapl_hrtimer_init(struct rapl_pmu *pmu)
+static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
{
- struct hrtimer *hr = &pmu->hrtimer;
+ struct hrtimer *hr = &rapl_pmu->hrtimer;
hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hr->function = rapl_hrtimer_handle;
}
-static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
+static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
struct perf_event *event)
{
if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
@@ -262,39 +262,39 @@ static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
event->hw.state = 0;
- list_add_tail(&event->active_entry, &pmu->active_list);
+ list_add_tail(&event->active_entry, &rapl_pmu->active_list);
local64_set(&event->hw.prev_count, rapl_read_counter(event));
- pmu->n_active++;
- if (pmu->n_active == 1)
- rapl_start_hrtimer(pmu);
+ rapl_pmu->n_active++;
+ if (rapl_pmu->n_active == 1)
+ rapl_start_hrtimer(rapl_pmu);
}
static void rapl_pmu_event_start(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
- __rapl_pmu_event_start(pmu, event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
+ __rapl_pmu_event_start(rapl_pmu, event);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}
static void rapl_pmu_event_stop(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
/* mark event as deactivated and stopped */
if (!(hwc->state & PERF_HES_STOPPED)) {
- WARN_ON_ONCE(pmu->n_active <= 0);
- pmu->n_active--;
- if (pmu->n_active == 0)
- hrtimer_cancel(&pmu->hrtimer);
+ WARN_ON_ONCE(rapl_pmu->n_active <= 0);
+ rapl_pmu->n_active--;
+ if (rapl_pmu->n_active == 0)
+ hrtimer_cancel(&rapl_pmu->hrtimer);
list_del(&event->active_entry);
@@ -312,23 +312,23 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
hwc->state |= PERF_HES_UPTODATE;
}
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}
static int rapl_pmu_event_add(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;
- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
if (mode & PERF_EF_START)
- __rapl_pmu_event_start(pmu, event);
+ __rapl_pmu_event_start(rapl_pmu, event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
return 0;
}
@@ -342,7 +342,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
- struct rapl_pmu *pmu;
+ struct rapl_pmu *rapl_pmu;
unsigned int rapl_pmu_idx;
/* only look at RAPL events */
@@ -375,10 +375,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
/* must be done before validate_group */
- pmu = rapl_pmus->pmus[rapl_pmu_idx];
- if (!pmu)
+ rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
+ if (!rapl_pmu)
return -EINVAL;
- event->pmu_private = pmu;
+
+ event->pmu_private = rapl_pmu;
event->hw.event_base = rapl_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -605,7 +606,7 @@ static void cleanup_rapl_pmus(void)
int i;
for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
- kfree(rapl_pmus->pmus[i]);
+ kfree(rapl_pmus->rapl_pmu[i]);
kfree(rapl_pmus);
}
@@ -620,27 +621,27 @@ static const struct attribute_group *rapl_attr_update[] = {
static int __init init_rapl_pmu(void)
{
- struct rapl_pmu *pmu;
+ struct rapl_pmu *rapl_pmu;
int idx;
for (idx = 0; idx < rapl_pmus->nr_rapl_pmu; idx++) {
- pmu = kzalloc(sizeof(*pmu), GFP_KERNEL);
- if (!pmu)
+ rapl_pmu = kzalloc(sizeof(*rapl_pmu), GFP_KERNEL);
+ if (!rapl_pmu)
goto free;
- raw_spin_lock_init(&pmu->lock);
- INIT_LIST_HEAD(&pmu->active_list);
- pmu->pmu = &rapl_pmus->pmu;
- pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
- rapl_hrtimer_init(pmu);
+ raw_spin_lock_init(&rapl_pmu->lock);
+ INIT_LIST_HEAD(&rapl_pmu->active_list);
+ rapl_pmu->pmu = &rapl_pmus->pmu;
+ rapl_pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+ rapl_hrtimer_init(rapl_pmu);
- rapl_pmus->pmus[idx] = pmu;
+ rapl_pmus->rapl_pmu[idx] = rapl_pmu;
}
return 0;
free:
for (; idx > 0; idx--)
- kfree(rapl_pmus->pmus[idx - 1]);
+ kfree(rapl_pmus->rapl_pmu[idx - 1]);
return -ENOMEM;
}
@@ -654,7 +655,7 @@ static int __init init_rapl_pmus(void)
rapl_pmu_scope = PERF_PMU_SCOPE_DIE;
}
- rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
+ rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
--
2.34.1
* [PATCH v7 05/10] perf/x86/rapl: Make rapl_model struct global
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
As there will only ever be one rapl_model in use on a system, make the
variable global so that it can be accessed from any function.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
---
arch/x86/events/rapl.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index dc350292e1fc..0585e15caeab 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -151,6 +151,7 @@ static struct rapl_pmus *rapl_pmus;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
+static struct rapl_model *rapl_model;
/*
* Helper function to get the correct topology id according to the
@@ -541,18 +542,18 @@ static struct perf_msr amd_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
};
-static int rapl_check_hw_unit(struct rapl_model *rm)
+static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
int i;
/* protect rdmsrl() to handle virtualization */
- if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits))
+ if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
return -1;
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
- switch (rm->unit_quirk) {
+ switch (rapl_model->unit_quirk) {
/*
* DRAM domain on HSW server and KNL has fixed energy unit which can be
* different than the unit from power unit MSR. See
@@ -797,21 +798,20 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
- struct rapl_model *rm;
int ret;
id = x86_match_cpu(rapl_model_match);
if (!id)
return -ENODEV;
- rm = (struct rapl_model *) id->driver_data;
+ rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rm->rapl_msrs;
+ rapl_msrs = rapl_model->rapl_msrs;
rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
- false, (void *) &rm->events);
+ false, (void *) &rapl_model->events);
- ret = rapl_check_hw_unit(rm);
+ ret = rapl_check_hw_unit();
if (ret)
return ret;
--
2.34.1
* [PATCH v7 06/10] perf/x86/rapl: Add arguments to the init and cleanup functions
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
Add arguments to the init and cleanup functions, so that the same code
can be used to initialize and clean up two separate PMUs.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 0585e15caeab..f353008b860b 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -602,7 +602,7 @@ static void __init rapl_advertise(void)
}
}
-static void cleanup_rapl_pmus(void)
+static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
{
int i;
@@ -620,7 +620,7 @@ static const struct attribute_group *rapl_attr_update[] = {
NULL,
};
-static int __init init_rapl_pmu(void)
+static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
{
struct rapl_pmu *rapl_pmu;
int idx;
@@ -646,20 +646,20 @@ static int __init init_rapl_pmu(void)
return -ENOMEM;
}
-static int __init init_rapl_pmus(void)
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope)
{
int nr_rapl_pmu = topology_max_packages();
- int rapl_pmu_scope = PERF_PMU_SCOPE_PKG;
+ struct rapl_pmus *rapl_pmus;
- if (!rapl_pmu_is_pkg_scope()) {
- nr_rapl_pmu *= topology_max_dies_per_package();
- rapl_pmu_scope = PERF_PMU_SCOPE_DIE;
- }
+ if (rapl_pmu_scope == PERF_PMU_SCOPE_DIE)
+ nr_rapl_pmu *= topology_max_dies_per_package();
rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
+ *rapl_pmus_ptr = rapl_pmus;
+
rapl_pmus->nr_rapl_pmu = nr_rapl_pmu;
rapl_pmus->pmu.attr_groups = rapl_attr_groups;
rapl_pmus->pmu.attr_update = rapl_attr_update;
@@ -674,7 +674,7 @@ static int __init init_rapl_pmus(void)
rapl_pmus->pmu.module = THIS_MODULE;
rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
- return init_rapl_pmu();
+ return init_rapl_pmu(rapl_pmus);
}
static struct rapl_model model_snb = {
@@ -798,8 +798,12 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
+ int rapl_pmu_scope = PERF_PMU_SCOPE_DIE;
int ret;
+ if (rapl_pmu_is_pkg_scope())
+ rapl_pmu_scope = PERF_PMU_SCOPE_PKG;
+
id = x86_match_cpu(rapl_model_match);
if (!id)
return -ENODEV;
@@ -815,7 +819,7 @@ static int __init rapl_pmu_init(void)
if (ret)
return ret;
- ret = init_rapl_pmus();
+ ret = init_rapl_pmus(&rapl_pmus, rapl_pmu_scope);
if (ret)
return ret;
@@ -828,7 +832,7 @@ static int __init rapl_pmu_init(void)
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
return ret;
}
module_init(rapl_pmu_init);
@@ -836,6 +840,6 @@ module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
perf_pmu_unregister(&rapl_pmus->pmu);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
}
module_exit(intel_rapl_exit);
--
2.34.1
* [PATCH v7 07/10] perf/x86/rapl: Modify the generic variable names to *_pkg*
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
Replace the generic names with *_pkg* ones, so that the scopes of the
two different PMUs and their variables can be differentiated later.
No functional change.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 120 ++++++++++++++++++++---------------------
1 file changed, 60 insertions(+), 60 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f353008b860b..2398a8452709 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -70,18 +70,18 @@ MODULE_LICENSE("GPL");
/*
* RAPL energy status counters
*/
-enum perf_rapl_events {
+enum perf_rapl_pkg_events {
PERF_RAPL_PP0 = 0, /* all cores */
PERF_RAPL_PKG, /* entire package */
PERF_RAPL_RAM, /* DRAM */
PERF_RAPL_PP1, /* gpu */
PERF_RAPL_PSYS, /* psys */
- PERF_RAPL_MAX,
- NR_RAPL_DOMAINS = PERF_RAPL_MAX,
+ PERF_RAPL_PKG_EVENTS_MAX,
+ NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
};
-static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
+static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
"pp0-core",
"package",
"dram",
@@ -112,7 +112,7 @@ static struct perf_pmu_events_attr event_attr_##v = { \
* considered as either pkg-scope or die-scope, and we are considering
* them as die-scope.
*/
-#define rapl_pmu_is_pkg_scope() \
+#define rapl_pkg_pmu_is_pkg_scope() \
(boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
@@ -139,16 +139,16 @@ enum rapl_unit_quirk {
};
struct rapl_model {
- struct perf_msr *rapl_msrs;
- unsigned long events;
+ struct perf_msr *rapl_pkg_msrs;
+ unsigned long pkg_events;
unsigned int msr_power_unit;
enum rapl_unit_quirk unit_quirk;
};
/* 1/2^hw_unit Joule */
-static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
-static struct rapl_pmus *rapl_pmus;
-static unsigned int rapl_cntr_mask;
+static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
+static struct rapl_pmus *rapl_pmus_pkg;
+static unsigned int rapl_pkg_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
static struct rapl_model *rapl_model;
@@ -163,8 +163,8 @@ static inline unsigned int get_rapl_pmu_idx(int cpu)
* (for non-existent mappings in topology map) to UINT_MAX, so
* the error check in the caller is simplified.
*/
- return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
- topology_logical_die_id(cpu);
+ return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+ topology_logical_die_id(cpu);
}
static inline u64 rapl_read_counter(struct perf_event *event)
@@ -176,7 +176,7 @@ static inline u64 rapl_read_counter(struct perf_event *event)
static inline u64 rapl_scale(u64 v, int cfg)
{
- if (cfg > NR_RAPL_DOMAINS) {
+ if (cfg > NR_RAPL_PKG_DOMAINS) {
pr_warn("Invalid domain %d, failed to scale data\n", cfg);
return v;
}
@@ -186,7 +186,7 @@ static inline u64 rapl_scale(u64 v, int cfg)
* or use ldexp(count, -32).
* Watts = Joules/Time delta
*/
- return v << (32 - rapl_hw_unit[cfg - 1]);
+ return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
}
static u64 rapl_event_update(struct perf_event *event)
@@ -347,7 +347,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
unsigned int rapl_pmu_idx;
/* only look at RAPL events */
- if (event->attr.type != rapl_pmus->pmu.type)
+ if (event->attr.type != rapl_pmus_pkg->pmu.type)
return -ENOENT;
/* check only supported bits are set */
@@ -357,14 +357,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;
- if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
+ if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
return -EINVAL;
- cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
+ cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
bit = cfg - 1;
/* check event supported */
- if (!(rapl_cntr_mask & (1 << bit)))
+ if (!(rapl_pkg_cntr_mask & (1 << bit)))
return -EINVAL;
/* unsupported modes and filters */
@@ -372,11 +372,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
- if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
+ if (rapl_pmu_idx >= rapl_pmus_pkg->nr_rapl_pmu)
return -EINVAL;
/* must be done before validate_group */
- rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
+ rapl_pmu = rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx];
if (!rapl_pmu)
return -EINVAL;
@@ -530,11 +530,11 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
};
/*
- * Force to PERF_RAPL_MAX size due to:
- * - perf_msr_probe(PERF_RAPL_MAX)
+ * Force to PERF_RAPL_PKG_EVENTS_MAX size due to:
+ * - perf_msr_probe(PERF_RAPL_PKG_EVENTS_MAX)
* - want to use same event codes across both architectures
*/
-static struct perf_msr amd_rapl_msrs[] = {
+static struct perf_msr amd_rapl_pkg_msrs[] = {
[PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, NULL, false, 0 },
[PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, NULL, false, 0 },
@@ -550,8 +550,8 @@ static int rapl_check_hw_unit(void)
/* protect rdmsrl() to handle virtualization */
if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
return -1;
- for (i = 0; i < NR_RAPL_DOMAINS; i++)
- rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+ for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
+ rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
switch (rapl_model->unit_quirk) {
/*
@@ -561,11 +561,11 @@ static int rapl_check_hw_unit(void)
* of 2. Datasheet, September 2014, Reference Number: 330784-001 "
*/
case RAPL_UNIT_QUIRK_INTEL_HSW:
- rapl_hw_unit[PERF_RAPL_RAM] = 16;
+ rapl_pkg_hw_unit[PERF_RAPL_RAM] = 16;
break;
/* SPR uses a fixed energy unit for Psys domain. */
case RAPL_UNIT_QUIRK_INTEL_SPR:
- rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+ rapl_pkg_hw_unit[PERF_RAPL_PSYS] = 0;
break;
default:
break;
@@ -580,9 +580,9 @@ static int rapl_check_hw_unit(void)
* if hw unit is 32, then we use 2 ms 1/200/2
*/
rapl_timer_ms = 2;
- if (rapl_hw_unit[0] < 32) {
+ if (rapl_pkg_hw_unit[0] < 32) {
rapl_timer_ms = (1000 / (2 * 100));
- rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1));
+ rapl_timer_ms *= (1ULL << (32 - rapl_pkg_hw_unit[0] - 1));
}
return 0;
}
@@ -592,12 +592,12 @@ static void __init rapl_advertise(void)
int i;
pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
- hweight32(rapl_cntr_mask), rapl_timer_ms);
+ hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
- for (i = 0; i < NR_RAPL_DOMAINS; i++) {
- if (rapl_cntr_mask & (1 << i)) {
+ for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
+ if (rapl_pkg_cntr_mask & (1 << i)) {
pr_info("hw unit of domain %s 2^-%d Joules\n",
- rapl_domain_names[i], rapl_hw_unit[i]);
+ rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
}
}
}
@@ -678,71 +678,71 @@ static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_
}
static struct rapl_model model_snb = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_PP1),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_snbep = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_hsw = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PP1),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_hsx = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_knl = {
- .events = BIT(PERF_RAPL_PKG) |
+ .pkg_events = BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_skl = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PP1) |
BIT(PERF_RAPL_PSYS),
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_msrs,
+ .rapl_pkg_msrs = intel_rapl_msrs,
};
static struct rapl_model model_spr = {
- .events = BIT(PERF_RAPL_PP0) |
+ .pkg_events = BIT(PERF_RAPL_PP0) |
BIT(PERF_RAPL_PKG) |
BIT(PERF_RAPL_RAM) |
BIT(PERF_RAPL_PSYS),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
- .rapl_msrs = intel_rapl_spr_msrs,
+ .rapl_pkg_msrs = intel_rapl_spr_msrs,
};
static struct rapl_model model_amd_hygon = {
- .events = BIT(PERF_RAPL_PKG),
+ .pkg_events = BIT(PERF_RAPL_PKG),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
- .rapl_msrs = amd_rapl_msrs,
+ .rapl_pkg_msrs = amd_rapl_pkg_msrs,
};
static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -798,11 +798,11 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
- int rapl_pmu_scope = PERF_PMU_SCOPE_DIE;
+ int rapl_pkg_pmu_scope = PERF_PMU_SCOPE_DIE;
int ret;
- if (rapl_pmu_is_pkg_scope())
- rapl_pmu_scope = PERF_PMU_SCOPE_PKG;
+ if (rapl_pkg_pmu_is_pkg_scope())
+ rapl_pkg_pmu_scope = PERF_PMU_SCOPE_PKG;
id = x86_match_cpu(rapl_model_match);
if (!id)
@@ -810,20 +810,20 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rapl_model->rapl_msrs;
+ rapl_msrs = rapl_model->rapl_pkg_msrs;
- rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
- false, (void *) &rapl_model->events);
+ rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+ false, (void *) &rapl_model->pkg_events);
ret = rapl_check_hw_unit();
if (ret)
return ret;
- ret = init_rapl_pmus(&rapl_pmus, rapl_pmu_scope);
+ ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope);
if (ret)
return ret;
- ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
+ ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
if (ret)
goto out;
@@ -832,14 +832,14 @@ static int __init rapl_pmu_init(void)
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
- cleanup_rapl_pmus(rapl_pmus);
+ cleanup_rapl_pmus(rapl_pmus_pkg);
return ret;
}
module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
- perf_pmu_unregister(&rapl_pmus->pmu);
- cleanup_rapl_pmus(rapl_pmus);
+ perf_pmu_unregister(&rapl_pmus_pkg->pmu);
+ cleanup_rapl_pmus(rapl_pmus_pkg);
}
module_exit(intel_rapl_exit);
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread

* [PATCH v7 08/10] perf/x86/rapl: Remove the global variable rapl_msrs
2024-11-15 6:07 [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs Dhananjay Ugwekar
` (6 preceding siblings ...)
2024-11-15 6:08 ` [PATCH v7 07/10] perf/x86/rapl: Modify the generic variable names to *_pkg* Dhananjay Ugwekar
@ 2024-11-15 6:08 ` Dhananjay Ugwekar
2024-11-15 6:08 ` [PATCH v7 09/10] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct Dhananjay Ugwekar
` (2 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC (permalink / raw)
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
After making the rapl_model struct global, the rapl_msrs global
variable isn't needed, so remove it.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 2398a8452709..46ad20b5e7c5 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -150,7 +150,6 @@ static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus_pkg;
static unsigned int rapl_pkg_cntr_mask;
static u64 rapl_timer_ms;
-static struct perf_msr *rapl_msrs;
static struct rapl_model *rapl_model;
/*
@@ -381,7 +380,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
event->pmu_private = rapl_pmu;
- event->hw.event_base = rapl_msrs[bit].msr;
+ event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -810,9 +809,7 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_msrs = rapl_model->rapl_pkg_msrs;
-
- rapl_pkg_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
+ rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
false, (void *) &rapl_model->pkg_events);
ret = rapl_check_hw_unit();
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread

* [PATCH v7 09/10] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
2024-11-15 6:07 [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs Dhananjay Ugwekar
` (7 preceding siblings ...)
2024-11-15 6:08 ` [PATCH v7 08/10] perf/x86/rapl: Remove the global variable rapl_msrs Dhananjay Ugwekar
@ 2024-11-15 6:08 ` Dhananjay Ugwekar
2024-11-15 6:08 ` [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs Dhananjay Ugwekar
2024-11-19 12:20 ` [PATCH v7 00/10] Add RAPL " Peter Zijlstra
10 siblings, 0 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC (permalink / raw)
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Prepare for the addition of RAPL core energy counter support.
Move cntr_mask to rapl_pmus struct instead of adding a new global
cntr_mask for the new RAPL power_core PMU. This will also ensure that
the second "core_cntr_mask" is only created if needed (i.e. in case of
AMD CPUs).
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
---
arch/x86/events/rapl.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 46ad20b5e7c5..6e51386ff91f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -129,6 +129,7 @@ struct rapl_pmu {
struct rapl_pmus {
struct pmu pmu;
unsigned int nr_rapl_pmu;
+ unsigned int cntr_mask;
struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};
@@ -148,7 +149,6 @@ struct rapl_model {
/* 1/2^hw_unit Joule */
static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus_pkg;
-static unsigned int rapl_pkg_cntr_mask;
static u64 rapl_timer_ms;
static struct rapl_model *rapl_model;
@@ -363,7 +363,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
bit = cfg - 1;
/* check event supported */
- if (!(rapl_pkg_cntr_mask & (1 << bit)))
+ if (!(rapl_pmus_pkg->cntr_mask & (1 << bit)))
return -EINVAL;
/* unsupported modes and filters */
@@ -591,10 +591,10 @@ static void __init rapl_advertise(void)
int i;
pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
- hweight32(rapl_pkg_cntr_mask), rapl_timer_ms);
+ hweight32(rapl_pmus_pkg->cntr_mask), rapl_timer_ms);
for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
- if (rapl_pkg_cntr_mask & (1 << i)) {
+ if (rapl_pmus_pkg->cntr_mask & (1 << i)) {
pr_info("hw unit of domain %s 2^-%d Joules\n",
rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
}
@@ -809,9 +809,6 @@ static int __init rapl_pmu_init(void)
rapl_model = (struct rapl_model *) id->driver_data;
- rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
- false, (void *) &rapl_model->pkg_events);
-
ret = rapl_check_hw_unit();
if (ret)
return ret;
@@ -820,6 +817,10 @@ static int __init rapl_pmu_init(void)
if (ret)
return ret;
+ rapl_pmus_pkg->cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs,
+ PERF_RAPL_PKG_EVENTS_MAX, false,
+ (void *) &rapl_model->pkg_events);
+
ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
if (ret)
goto out;
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread

* [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-15 6:07 [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs Dhananjay Ugwekar
` (8 preceding siblings ...)
2024-11-15 6:08 ` [PATCH v7 09/10] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct Dhananjay Ugwekar
@ 2024-11-15 6:08 ` Dhananjay Ugwekar
2024-11-20 7:58 ` Peter Jung
2025-01-12 13:42 ` Koichiro Den
2024-11-19 12:20 ` [PATCH v7 00/10] Add RAPL " Peter Zijlstra
10 siblings, 2 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-15 6:08 UTC (permalink / raw)
To: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel, Dhananjay Ugwekar
Add a new "power_core" PMU and "energy-core" event for monitoring
energy consumption by each individual core. The existing energy-cores
event aggregates the energy consumption of CPU cores at the package level.
This new event aligns with AMD's per-core energy counters.
Tested the package-level and core-level PMU counters with workloads
pinned to different CPUs.
Results with a workload pinned to CPU 4 (core 4) on an AMD Zen 4 Genoa
machine:
$ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
stress-ng: info: [21250] setting to a 5 second run per stressor
stress-ng: info: [21250] dispatching hogs: 1 matrix
stress-ng: info: [21250] successful run completed in 5.00s
Performance counter stats for 'system wide':
S0-D0-C0 1 0.00 Joules power_core/energy-core/
S0-D0-C1 1 0.00 Joules power_core/energy-core/
S0-D0-C2 1 0.00 Joules power_core/energy-core/
S0-D0-C3 1 0.00 Joules power_core/energy-core/
S0-D0-C4 1 8.43 Joules power_core/energy-core/
S0-D0-C5 1 0.00 Joules power_core/energy-core/
S0-D0-C6 1 0.00 Joules power_core/energy-core/
S0-D0-C7 1 0.00 Joules power_core/energy-core/
S0-D1-C8 1 0.00 Joules power_core/energy-core/
S0-D1-C9 1 0.00 Joules power_core/energy-core/
S0-D1-C10 1 0.00 Joules power_core/energy-core/
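The conversion behind these Joules figures can be sketched as follows
(an illustrative sketch, not part of the patch; the helper names and the
example hw_unit value of 16 are assumptions for demonstration):

```python
# Illustrative sketch: how a raw RAPL counter delta becomes the Joules
# figures shown above.
#
# The driver's rapl_scale() left-shifts the raw MSR delta so that all
# events use a fixed 1/2^32 J unit; the perf tool then multiplies by
# the advertised ".scale" sysfs attribute, which is exactly 2^-32.

RAPL_SCALE = 2.3283064365386962890625e-10  # the energy-core.scale string, exactly 2^-32


def rapl_scale(delta: int, hw_unit: int) -> int:
    """Mirror of the driver's rapl_scale(): bring delta to 1/2^32 J units."""
    return delta << (32 - hw_unit)


def counts_to_joules(scaled_delta: int) -> float:
    """What the perf tool does with the scaled count."""
    return scaled_delta * RAPL_SCALE


# With a (hypothetical) energy status unit of 2^-16 J, one raw count
# is worth 2^-16 J:
joules = counts_to_joules(rapl_scale(1, 16))
print(joules)  # prints 1.52587890625e-05 (i.e. 2^-16 J)
```

This is why the scale attribute can be a single fixed constant for every
event: the per-domain hardware unit is folded into the in-kernel shift,
not into the userspace scaling.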
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
---
arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
1 file changed, 152 insertions(+), 33 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 6e51386ff91f..e9be1f31163d 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -39,6 +39,10 @@
* event: rapl_energy_psys
* perf code: 0x5
*
+ * core counter: consumption of a single physical core
+ * event: rapl_energy_core (power_core PMU)
+ * perf code: 0x1
+ *
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
@@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
};
+#define PERF_RAPL_CORE 0 /* single core */
+#define PERF_RAPL_CORE_EVENTS_MAX 1
+#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
+
static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
"pp0-core",
"package",
@@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
"psys",
};
+static const char *const rapl_core_domain_name __initconst = "core";
+
/*
* event code: LSB 8 bits, passed in attr->config
* any other bit is reserved
@@ -141,14 +151,18 @@ enum rapl_unit_quirk {
struct rapl_model {
struct perf_msr *rapl_pkg_msrs;
+ struct perf_msr *rapl_core_msrs;
unsigned long pkg_events;
+ unsigned long core_events;
unsigned int msr_power_unit;
enum rapl_unit_quirk unit_quirk;
};
/* 1/2^hw_unit Joule */
static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
+static int rapl_core_hw_unit __read_mostly;
static struct rapl_pmus *rapl_pmus_pkg;
+static struct rapl_pmus *rapl_pmus_core;
static u64 rapl_timer_ms;
static struct rapl_model *rapl_model;
@@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
* Helper function to get the correct topology id according to the
* RAPL PMU scope.
*/
-static inline unsigned int get_rapl_pmu_idx(int cpu)
-{ /*
+static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
+{
+ /*
* Returns unsigned int, which converts the '-1' return value
* (for non-existent mappings in topology map) to UINT_MAX, so
* the error check in the caller is simplified.
*/
- return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
- topology_logical_die_id(cpu);
+ switch (scope) {
+ case PERF_PMU_SCOPE_PKG:
+ return topology_logical_package_id(cpu);
+ case PERF_PMU_SCOPE_DIE:
+ return topology_logical_die_id(cpu);
+ case PERF_PMU_SCOPE_CORE:
+ return topology_logical_core_id(cpu);
+ default:
+ return -EINVAL;
+ }
}
static inline u64 rapl_read_counter(struct perf_event *event)
@@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
return raw;
}
-static inline u64 rapl_scale(u64 v, int cfg)
+static inline u64 rapl_scale(u64 v, struct perf_event *event)
{
- if (cfg > NR_RAPL_PKG_DOMAINS) {
- pr_warn("Invalid domain %d, failed to scale data\n", cfg);
- return v;
- }
+ int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
+
+ if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
+ hw_unit = rapl_core_hw_unit;
+
/*
* scale delta to smallest unit (1/2^32)
* users must then scale back: count * 1/(1e9*2^32) to get Joules
* or use ldexp(count, -32).
* Watts = Joules/Time delta
*/
- return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
+ return v << (32 - hw_unit);
}
static u64 rapl_event_update(struct perf_event *event)
@@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
delta = (new_raw_count << shift) - (prev_raw_count << shift);
delta >>= shift;
- sdelta = rapl_scale(delta, event->hw.config);
+ sdelta = rapl_scale(delta, event);
local64_add(sdelta, &event->count);
@@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
static int rapl_pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
- int bit, ret = 0;
+ int bit, rapl_pmus_scope, ret = 0;
struct rapl_pmu *rapl_pmu;
unsigned int rapl_pmu_idx;
+ struct rapl_pmus *rapl_pmus;
- /* only look at RAPL events */
- if (event->attr.type != rapl_pmus_pkg->pmu.type)
- return -ENOENT;
+ /* unsupported modes and filters */
+ if (event->attr.sample_period) /* no sampling */
+ return -EINVAL;
/* check only supported bits are set */
if (event->attr.config & ~RAPL_EVENT_MASK)
@@ -356,31 +381,49 @@ static int rapl_pmu_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;
- if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
+ rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
+ if (!rapl_pmus)
return -EINVAL;
-
- cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
- bit = cfg - 1;
-
- /* check event supported */
- if (!(rapl_pmus_pkg->cntr_mask & (1 << bit)))
+ rapl_pmus_scope = rapl_pmus->pmu.scope;
+
+ if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
+ /* only look at RAPL package events */
+ if (event->attr.type != rapl_pmus_pkg->pmu.type)
+ return -ENOENT;
+
+ cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
+ if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
+ return -EINVAL;
+
+ bit = cfg - 1;
+ event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
+ } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
+ /* only look at RAPL core events */
+ if (event->attr.type != rapl_pmus_core->pmu.type)
+ return -ENOENT;
+
+ cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
+ if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
+ return -EINVAL;
+
+ bit = cfg - 1;
+ event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
+ } else
return -EINVAL;
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
+ /* check event supported */
+ if (!(rapl_pmus->cntr_mask & (1 << bit)))
return -EINVAL;
- rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
- if (rapl_pmu_idx >= rapl_pmus_pkg->nr_rapl_pmu)
+ rapl_pmu_idx = get_rapl_pmu_idx(event->cpu, rapl_pmus_scope);
+ if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
return -EINVAL;
-
/* must be done before validate_group */
- rapl_pmu = rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx];
+ rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
if (!rapl_pmu)
return -EINVAL;
event->pmu_private = rapl_pmu;
- event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -397,12 +440,14 @@ RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
+RAPL_EVENT_ATTR_STR(energy-core, rapl_core, "event=0x01");
RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
+RAPL_EVENT_ATTR_STR(energy-core.unit, rapl_core_unit, "Joules");
/*
* we compute in 0.23 nJ increments regardless of MSR
@@ -412,6 +457,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890
RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
+RAPL_EVENT_ATTR_STR(energy-core.scale, rapl_core_scale, "2.3283064365386962890625e-10");
/*
* There are no default events, but we need to create
@@ -444,6 +490,12 @@ static const struct attribute_group *rapl_attr_groups[] = {
NULL,
};
+static const struct attribute_group *rapl_core_attr_groups[] = {
+ &rapl_pmu_format_group,
+ &rapl_pmu_events_group,
+ NULL,
+};
+
static struct attribute *rapl_events_cores[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_cores_unit),
@@ -504,6 +556,18 @@ static struct attribute_group rapl_events_psys_group = {
.attrs = rapl_events_psys,
};
+static struct attribute *rapl_events_core[] = {
+ EVENT_PTR(rapl_core),
+ EVENT_PTR(rapl_core_unit),
+ EVENT_PTR(rapl_core_scale),
+ NULL,
+};
+
+static struct attribute_group rapl_events_core_group = {
+ .name = "events",
+ .attrs = rapl_events_core,
+};
+
static bool test_msr(int idx, void *data)
{
return test_bit(idx, (unsigned long *) data);
@@ -541,6 +605,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
};
+static struct perf_msr amd_rapl_core_msrs[] = {
+ [PERF_RAPL_CORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_core_group,
+ test_msr, false, RAPL_MSR_MASK },
+};
+
static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
@@ -552,6 +621,8 @@ static int rapl_check_hw_unit(void)
for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+ rapl_core_hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+
switch (rapl_model->unit_quirk) {
/*
* DRAM domain on HSW server and KNL has fixed energy unit which can be
@@ -570,7 +641,6 @@ static int rapl_check_hw_unit(void)
break;
}
-
/*
* Calculate the timer rate:
* Use reference of 200W for scaling the timeout to avoid counter
@@ -589,9 +659,13 @@ static int rapl_check_hw_unit(void)
static void __init rapl_advertise(void)
{
int i;
+ int num_counters = hweight32(rapl_pmus_pkg->cntr_mask);
+
+ if (rapl_pmus_core)
+ num_counters += hweight32(rapl_pmus_core->cntr_mask);
pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
- hweight32(rapl_pmus_pkg->cntr_mask), rapl_timer_ms);
+ num_counters, rapl_timer_ms);
for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
if (rapl_pmus_pkg->cntr_mask & (1 << i)) {
@@ -599,6 +673,10 @@ static void __init rapl_advertise(void)
rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
}
}
+
+ if (rapl_pmus_core && (rapl_pmus_core->cntr_mask & (1 << PERF_RAPL_CORE)))
+ pr_info("hw unit of domain %s 2^-%d Joules\n",
+ rapl_core_domain_name, rapl_core_hw_unit);
}
static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
@@ -619,6 +697,10 @@ static const struct attribute_group *rapl_attr_update[] = {
NULL,
};
+static const struct attribute_group *rapl_core_attr_update[] = {
+ &rapl_events_core_group,
+};
+
static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
{
struct rapl_pmu *rapl_pmu;
@@ -645,13 +727,22 @@ static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
return -ENOMEM;
}
-static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope)
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope,
+ const struct attribute_group **rapl_attr_groups,
+ const struct attribute_group **rapl_attr_update)
{
int nr_rapl_pmu = topology_max_packages();
struct rapl_pmus *rapl_pmus;
+ /*
+ * rapl_pmu_scope must be either PKG, DIE or CORE
+ */
if (rapl_pmu_scope == PERF_PMU_SCOPE_DIE)
nr_rapl_pmu *= topology_max_dies_per_package();
+ else if (rapl_pmu_scope == PERF_PMU_SCOPE_CORE)
+ nr_rapl_pmu *= topology_num_cores_per_package();
+ else if (rapl_pmu_scope != PERF_PMU_SCOPE_PKG)
+ return -EINVAL;
rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
@@ -740,8 +831,10 @@ static struct rapl_model model_spr = {
static struct rapl_model model_amd_hygon = {
.pkg_events = BIT(PERF_RAPL_PKG),
+ .core_events = BIT(PERF_RAPL_CORE),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_pkg_msrs = amd_rapl_pkg_msrs,
+ .rapl_core_msrs = amd_rapl_core_msrs,
};
static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -813,7 +906,8 @@ static int __init rapl_pmu_init(void)
if (ret)
return ret;
- ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope);
+ ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope, rapl_attr_groups,
+ rapl_attr_update);
if (ret)
return ret;
@@ -825,6 +919,27 @@ static int __init rapl_pmu_init(void)
if (ret)
goto out;
+ if (rapl_model->core_events) {
+ ret = init_rapl_pmus(&rapl_pmus_core, PERF_PMU_SCOPE_CORE,
+ rapl_core_attr_groups,
+ rapl_core_attr_update);
+ if (ret) {
+ pr_warn("power-core PMU initialization failed (%d)\n", ret);
+ goto core_init_failed;
+ }
+
+ rapl_pmus_core->cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
+ PERF_RAPL_CORE_EVENTS_MAX, false,
+ (void *) &rapl_model->core_events);
+
+ ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_core", -1);
+ if (ret) {
+ pr_warn("power-core PMU registration failed (%d)\n", ret);
+ cleanup_rapl_pmus(rapl_pmus_core);
+ }
+ }
+
+core_init_failed:
rapl_advertise();
return 0;
@@ -837,6 +952,10 @@ module_init(rapl_pmu_init);
static void __exit intel_rapl_exit(void)
{
+ if (rapl_pmus_core) {
+ perf_pmu_unregister(&rapl_pmus_core->pmu);
+ cleanup_rapl_pmus(rapl_pmus_core);
+ }
perf_pmu_unregister(&rapl_pmus_pkg->pmu);
cleanup_rapl_pmus(rapl_pmus_pkg);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread

* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-15 6:08 ` [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-11-20 7:58 ` Peter Jung
2024-11-20 8:18 ` Peter Jung
2024-11-20 13:58 ` Dhananjay Ugwekar
2025-01-12 13:42 ` Koichiro Den
1 sibling, 2 replies; 26+ messages in thread
From: Peter Jung @ 2024-11-20 7:58 UTC (permalink / raw)
To: Dhananjay Ugwekar, peterz, mingo, rui.zhang, irogers, kan.liang,
tglx, bp, gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
Hi all,
This patch seems to crash the kernel and results in an unbootable
system.
The patch was applied on top of 6.12-rc7; I have not tested it on
linux-next yet.
I was also able to reproduce this issue with v6; the only "good"
version was v4.
This was reproduced on several Zen 3+ machines and also on my 9950X.
Bisect log:
```
git bisect start
# status: waiting for both good and bad commits
# good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
# status: waiting for bad commit, 1 good commit known
# bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add
core energy counter support for AMD CPUs
git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
# good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename
rapl_pmu variables
git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
# good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify
the generic variable names to *_pkg*
git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
# good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move
the cntr_mask to rapl_pmus struct
git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
# first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e]
perf/x86/rapl: Add core energy counter support for AMD CPUs
```
Log:
```
Nov 17 12:17:37 varvalian kernel: Oops: general protection fault,
probably for non-canonical address 0x796772656e650073: 0000 [#1] PREEMPT
SMP NOPTI
Nov 17 12:17:37 varvalian kernel: CPU: 8 UID: 0 PID: 549 Comm:
(udev-worker) Tainted: G OE 6.12.0-rc7-6-cachyos-rc-test
#1 bf94bb78cd9745ddc98a0655e5681df034f56f3d
Nov 17 12:17:37 varvalian kernel: Tainted: [O]=OOT_MODULE,
[E]=UNSIGNED_MODULE
Nov 17 12:17:37 varvalian kernel: Hardware name: LENOVO
82SN/LNVNB161216, BIOS J4CN41WW 04/09/2024
Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d
74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b
43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75
1a 48 8b 43 20 48
Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS:
00010202
Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX:
ffffffffc2a642aa RCX: f781ec27a963db00
Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI:
ffffaa5281fe7884 RDI: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08:
0000000000000000 R09: 0000000000000001
Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11:
ffffffff8e86ee00 R12: 0000000000000000
Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14:
ffff90c611dc8400 R15: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000)
GS:ffff90c8f0000000(0000) knlGS:0000000000000000
Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Nov 17 12:17:37 varvalian kernel: CR2: 00007163efd6a637 CR3:
0000000121298000 CR4: 0000000000f50ef0
Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
Nov 17 12:17:37 varvalian kernel: Call Trace:
Nov 17 12:17:37 varvalian kernel: <TASK>
Nov 17 12:17:37 varvalian kernel: ? __die_body+0x6a/0xb0
Nov 17 12:17:37 varvalian kernel: ? die_addr+0xa4/0xd0
Nov 17 12:17:37 varvalian kernel: ? exc_general_protection+0x170/0x220
Nov 17 12:17:37 varvalian kernel: ? asm_exc_general_protection+0x26/0x30
Nov 17 12:17:37 varvalian kernel: ? __pfx_device_get_ownership+0x10/0x10
Nov 17 12:17:37 varvalian kernel: ? internal_create_group+0x9a/0x4e0
Nov 17 12:17:37 varvalian kernel: ? internal_create_group+0x8b/0x4e0
Nov 17 12:17:37 varvalian kernel: sysfs_update_groups+0x37/0x80
Nov 17 12:17:37 varvalian kernel: pmu_dev_alloc+0xc6/0xe0
Nov 17 12:17:37 varvalian kernel: perf_pmu_register+0x28c/0x3e0
Nov 17 12:17:37 varvalian kernel: init_module+0x1d9/0x240 [rapl
339c1d4b88353cca8b92d05e1524f5f14886bd70]
Nov 17 12:17:37 varvalian kernel: ? __pfx_init_module+0x10/0x10 [rapl
339c1d4b88353cca8b92d05e1524f5f14886bd70]
Nov 17 12:17:37 varvalian kernel: do_one_initcall+0x122/0x370
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? idr_alloc_cyclic+0x139/0x1d0
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ?
security_kernfs_init_security+0x54/0x190
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? __kernfs_new_node+0x1ba/0x240
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? kernfs_link_sibling+0xef/0x110
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? kernfs_activate+0x2c/0x110
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? kernfs_add_one+0x108/0x140
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? __kernfs_create_file+0x75/0xb0
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? sysfs_create_bin_file+0xc6/0x120
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? __vunmap_range_noflush+0x344/0x410
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? insert_vmap_area+0xab/0xf0
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? module_decompress_cleanup+0x4a/0x70
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? load_module+0x1111/0x1330
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? __kmalloc_cache_noprof+0x12b/0x2e0
Nov 17 12:17:37 varvalian kernel: do_init_module+0x60/0x240
Nov 17 12:17:37 varvalian kernel: __se_sys_finit_module+0x270/0x380
Nov 17 12:17:37 varvalian kernel: do_syscall_64+0x88/0x160
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? __x64_sys_lseek+0x5d/0xa0
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? syscall_exit_to_user_mode+0x34/0xd0
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? do_syscall_64+0x94/0x160
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? do_user_addr_fault+0x289/0x730
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Nov 17 12:17:37 varvalian kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 17 12:17:37 varvalian kernel: RIP: 0033:0x7163ef71f15d
Nov 17 12:17:37 varvalian kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8
4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 6b 0d 00
f7 d8 64 89 01 48
Nov 17 12:17:37 varvalian kernel: RSP: 002b:00007ffe2bda0ac8 EFLAGS:
00000246 ORIG_RAX: 0000000000000139
Nov 17 12:17:37 varvalian kernel: RAX: ffffffffffffffda RBX:
00005c1834b627f0 RCX: 00007163ef71f15d
Nov 17 12:17:37 varvalian kernel: RDX: 0000000000000004 RSI:
00007163eef044ac RDI: 000000000000002b
Nov 17 12:17:37 varvalian kernel: RBP: 00007163eef044ac R08:
00007163ef7f6ad0 R09: 00007ffe2bda0bf8
Nov 17 12:17:37 varvalian kernel: R10: 00005c18349ed010 R11:
0000000000000246 R12: 0000000000020000
Nov 17 12:17:37 varvalian kernel: R13: 00005c1834b47b80 R14:
0000000000000000 R15: 00005c1834b54470
Nov 17 12:17:37 varvalian kernel: </TASK>
Nov 17 12:17:37 varvalian kernel: Modules linked in: rapl(+)
nvidia_wmi_ec_backlight pcspkr sparse_keymap roles wmi_bmof
platform_profile snd snd_pci_acp3x k10temp i2c_smbus ccp rfkill mousedev
soundcore joydev typec i2c_hid_acpi i2c_hid amd_pmc ip6t_REJECT acpi_tad
nf_reject_ipv6 mac_hid xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG
nf_log_syslog xt_recent nft_limit xt_limit xt_addrtype xt_tcpudp
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat
nf_tables pkcs8_key_parser uinput i2c_dev crypto_user loop dm_mod
nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress
ip_tables x_tables amdgpu btrfs crc16 libcrc32c amdxcp crc32c_generic
drm_buddy raid6_pq serio_raw drm_suballoc_helper xor drm_exec atkbd
sdhci_pci i2c_algo_bit crc32c_intel vivaldi_fmap libps2 sha256_ssse3
cqhci drm_display_helper nvme sdhci cec nvme_core i8042 mmc_core
gpu_sched nvme_auth serio hid_logitech_hidpp hid_logitech_dj hid_generic
usbhid nvidia_drm(OE) drm_ttm_helper ttm nvidia_uvm(OE)
nvidia_modeset(OE) video wmi nvidia(OE)
Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d
74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b
43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75
1a 48 8b 43 20 48
Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS:
00010202
Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX:
ffffffffc2a642aa RCX: f781ec27a963db00
Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI:
ffffaa5281fe7884 RDI: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08:
0000000000000000 R09: 0000000000000001
Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11:
ffffffff8e86ee00 R12: 0000000000000000
Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14:
ffff90c611dc8400 R15: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000)
GS:ffff90c8efe00000(0000) knlGS:0000000000000000
Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3:
0000000121298000 CR4: 0000000000f50ef0
Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
```
I'll do some additional tests on the weekend based on the latest
linux-next snapshot and provide some more logs.
Regards,
Peter
On 15.11.24 07:08, Dhananjay Ugwekar wrote:
> Add a new "power_core" PMU and "energy-core" event for monitoring
> energy consumption by each individual core. The existing energy-cores
> event aggregates the energy consumption of CPU cores at the package level.
> This new event aligns with AMD's per-core energy counters.
>
> Tested the package level and core level PMU counters with workloads
> pinned to different CPUs.
>
> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
> machine:
>
> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
> stress-ng: info: [21250] setting to a 5 second run per stressor
> stress-ng: info: [21250] dispatching hogs: 1 matrix
> stress-ng: info: [21250] successful run completed in 5.00s
>
> Performance counter stats for 'system wide':
>
> S0-D0-C0 1 0.00 Joules power_core/energy-core/
> S0-D0-C1 1 0.00 Joules power_core/energy-core/
> S0-D0-C2 1 0.00 Joules power_core/energy-core/
> S0-D0-C3 1 0.00 Joules power_core/energy-core/
> S0-D0-C4 1 8.43 Joules power_core/energy-core/
> S0-D0-C5 1 0.00 Joules power_core/energy-core/
> S0-D0-C6 1 0.00 Joules power_core/energy-core/
> S0-D0-C7 1 0.00 Joules power_core/energy-core/
> S0-D1-C8 1 0.00 Joules power_core/energy-core/
> S0-D1-C9 1 0.00 Joules power_core/energy-core/
> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>
> Signed-off-by: Dhananjay Ugwekar<Dhananjay.Ugwekar@amd.com>
> Reviewed-by: Gautham R. Shenoy<gautham.shenoy@amd.com>
> ---
> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 152 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 6e51386ff91f..e9be1f31163d 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -39,6 +39,10 @@
> * event: rapl_energy_psys
> * perf code: 0x5
> *
> + * core counter: consumption of a single physical core
> + * event: rapl_energy_core (power_core PMU)
> + * perf code: 0x1
> + *
> * We manage those counters as free running (read-only). They may be
> * use simultaneously by other tools, such as turbostat.
> *
> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
> };
>
> +#define PERF_RAPL_CORE 0 /* single core */
> +#define PERF_RAPL_CORE_EVENTS_MAX 1
> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
> +
> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
> "pp0-core",
> "package",
> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
> "psys",
> };
>
> +static const char *const rapl_core_domain_name __initconst = "core";
> +
> /*
> * event code: LSB 8 bits, passed in attr->config
> * any other bit is reserved
> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>
> struct rapl_model {
> struct perf_msr *rapl_pkg_msrs;
> + struct perf_msr *rapl_core_msrs;
> unsigned long pkg_events;
> + unsigned long core_events;
> unsigned int msr_power_unit;
> enum rapl_unit_quirk unit_quirk;
> };
>
> /* 1/2^hw_unit Joule */
> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> +static int rapl_core_hw_unit __read_mostly;
> static struct rapl_pmus *rapl_pmus_pkg;
> +static struct rapl_pmus *rapl_pmus_core;
> static u64 rapl_timer_ms;
> static struct rapl_model *rapl_model;
>
> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
> * Helper function to get the correct topology id according to the
> * RAPL PMU scope.
> */
> -static inline unsigned int get_rapl_pmu_idx(int cpu)
> -{ /*
> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
> +{
> + /*
> * Returns unsigned int, which converts the '-1' return value
> * (for non-existent mappings in topology map) to UINT_MAX, so
> * the error check in the caller is simplified.
> */
> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> - topology_logical_die_id(cpu);
> + switch (scope) {
> + case PERF_PMU_SCOPE_PKG:
> + return topology_logical_package_id(cpu);
> + case PERF_PMU_SCOPE_DIE:
> + return topology_logical_die_id(cpu);
> + case PERF_PMU_SCOPE_CORE:
> + return topology_logical_core_id(cpu);
> + default:
> + return -EINVAL;
> + }
> }
>
> static inline u64 rapl_read_counter(struct perf_event *event)
> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
> return raw;
> }
>
> -static inline u64 rapl_scale(u64 v, int cfg)
> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
> {
> - if (cfg > NR_RAPL_PKG_DOMAINS) {
> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
> - return v;
> - }
> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
> +
> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
> + hw_unit = rapl_core_hw_unit;
> +
> /*
> * scale delta to smallest unit (1/2^32)
> * users must then scale back: count * 1/(1e9*2^32) to get Joules
> * or use ldexp(count, -32).
> * Watts = Joules/Time delta
> */
> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
> + return v << (32 - hw_unit);
> }
>
> static u64 rapl_event_update(struct perf_event *event)
> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
> delta = (new_raw_count << shift) - (prev_raw_count << shift);
> delta >>= shift;
>
> - sdelta = rapl_scale(delta, event->hw.config);
> + sdelta = rapl_scale(delta, event);
>
> local64_add(sdelta, &event->count);
>
> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
> static int rapl_pmu_event_init(struct perf_event *event)
> {
> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> - int bit, ret = 0;
> + int bit, rapl_pmus_scope, ret = 0;
> struct rapl_pmu *rapl_pmu;
> unsigned int rapl_pmu_idx;
> + struct rapl_pmus *rapl_pmus;
>
> - /* only look at RAPL events */
> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> - return -ENOENT;
> + /* unsupported modes and filters */
> + if (event->attr.sample_period) /* no sampling */
> + return -EINVAL;
>
> /* check only supported bits are set */
> if (event->attr.config & ~RAPL_EVENT_MASK)
> @@ -356,31 +381,49 @@ static int rapl_pmu_event_init(struct perf_event *event)
> if (event->cpu < 0)
> return -EINVAL;
>
> - if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> + if (!rapl_pmus)
> return -EINVAL;
> -
> - cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> - bit = cfg - 1;
> -
> - /* check event supported */
> - if (!(rapl_pmus_pkg->cntr_mask & (1 << bit)))
> + rapl_pmus_scope = rapl_pmus->pmu.scope;
> +
> + if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
> + /* only look at RAPL package events */
> + if (event->attr.type != rapl_pmus_pkg->pmu.type)
> + return -ENOENT;
> +
> + cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + return -EINVAL;
> +
> + bit = cfg - 1;
> + event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> + } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
> + /* only look at RAPL core events */
> + if (event->attr.type != rapl_pmus_core->pmu.type)
> + return -ENOENT;
> +
> + cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + return -EINVAL;
> +
> + bit = cfg - 1;
> + event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
> + } else
> return -EINVAL;
>
> - /* unsupported modes and filters */
> - if (event->attr.sample_period) /* no sampling */
> + /* check event supported */
> + if (!(rapl_pmus->cntr_mask & (1 << bit)))
> return -EINVAL;
>
> - rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
> - if (rapl_pmu_idx >= rapl_pmus_pkg->nr_rapl_pmu)
> + rapl_pmu_idx = get_rapl_pmu_idx(event->cpu, rapl_pmus_scope);
> + if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
> return -EINVAL;
> -
> /* must be done before validate_group */
> - rapl_pmu = rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx];
> + rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
> if (!rapl_pmu)
> return -EINVAL;
>
> event->pmu_private = rapl_pmu;
> - event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> event->hw.config = cfg;
> event->hw.idx = bit;
>
> @@ -397,12 +440,14 @@ RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
> RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
> RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
> RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
> +RAPL_EVENT_ATTR_STR(energy-core, rapl_core, "event=0x01");
>
> RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
> +RAPL_EVENT_ATTR_STR(energy-core.unit, rapl_core_unit, "Joules");
>
> /*
> * we compute in 0.23 nJ increments regardless of MSR
> @@ -412,6 +457,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890
> RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
> RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
> RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
> +RAPL_EVENT_ATTR_STR(energy-core.scale, rapl_core_scale, "2.3283064365386962890625e-10");
>
> /*
> * There are no default events, but we need to create
> @@ -444,6 +490,12 @@ static const struct attribute_group *rapl_attr_groups[] = {
> NULL,
> };
>
> +static const struct attribute_group *rapl_core_attr_groups[] = {
> + &rapl_pmu_format_group,
> + &rapl_pmu_events_group,
> + NULL,
> +};
> +
> static struct attribute *rapl_events_cores[] = {
> EVENT_PTR(rapl_cores),
> EVENT_PTR(rapl_cores_unit),
> @@ -504,6 +556,18 @@ static struct attribute_group rapl_events_psys_group = {
> .attrs = rapl_events_psys,
> };
>
> +static struct attribute *rapl_events_core[] = {
> + EVENT_PTR(rapl_core),
> + EVENT_PTR(rapl_core_unit),
> + EVENT_PTR(rapl_core_scale),
> + NULL,
> +};
> +
> +static struct attribute_group rapl_events_core_group = {
> + .name = "events",
> + .attrs = rapl_events_core,
> +};
> +
> static bool test_msr(int idx, void *data)
> {
> return test_bit(idx, (unsigned long *) data);
> @@ -541,6 +605,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
> [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
> };
>
> +static struct perf_msr amd_rapl_core_msrs[] = {
> + [PERF_RAPL_CORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_core_group,
> + test_msr, false, RAPL_MSR_MASK },
> +};
> +
> static int rapl_check_hw_unit(void)
> {
> u64 msr_rapl_power_unit_bits;
> @@ -552,6 +621,8 @@ static int rapl_check_hw_unit(void)
> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
> rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
>
> + rapl_core_hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
> +
> switch (rapl_model->unit_quirk) {
> /*
> * DRAM domain on HSW server and KNL has fixed energy unit which can be
> @@ -570,7 +641,6 @@ static int rapl_check_hw_unit(void)
> break;
> }
>
> -
> /*
> * Calculate the timer rate:
> * Use reference of 200W for scaling the timeout to avoid counter
> @@ -589,9 +659,13 @@ static int rapl_check_hw_unit(void)
> static void __init rapl_advertise(void)
> {
> int i;
> + int num_counters = hweight32(rapl_pmus_pkg->cntr_mask);
> +
> + if (rapl_pmus_core)
> + num_counters += hweight32(rapl_pmus_core->cntr_mask);
>
> pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
> - hweight32(rapl_pmus_pkg->cntr_mask), rapl_timer_ms);
> + num_counters, rapl_timer_ms);
>
> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
> if (rapl_pmus_pkg->cntr_mask & (1 << i)) {
> @@ -599,6 +673,10 @@ static void __init rapl_advertise(void)
> rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
> }
> }
> +
> + if (rapl_pmus_core && (rapl_pmus_core->cntr_mask & (1 << PERF_RAPL_CORE)))
> + pr_info("hw unit of domain %s 2^-%d Joules\n",
> + rapl_core_domain_name, rapl_core_hw_unit);
> }
>
> static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
> @@ -619,6 +697,10 @@ static const struct attribute_group *rapl_attr_update[] = {
> NULL,
> };
>
> +static const struct attribute_group *rapl_core_attr_update[] = {
> + &rapl_events_core_group,
> +};
> +
> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
> {
> struct rapl_pmu *rapl_pmu;
> @@ -645,13 +727,22 @@ static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
> return -ENOMEM;
> }
>
> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope)
> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope,
> + const struct attribute_group **rapl_attr_groups,
> + const struct attribute_group **rapl_attr_update)
> {
> int nr_rapl_pmu = topology_max_packages();
> struct rapl_pmus *rapl_pmus;
>
> + /*
> + * rapl_pmu_scope must be either PKG, DIE or CORE
> + */
> if (rapl_pmu_scope == PERF_PMU_SCOPE_DIE)
> nr_rapl_pmu *= topology_max_dies_per_package();
> + else if (rapl_pmu_scope == PERF_PMU_SCOPE_CORE)
> + nr_rapl_pmu *= topology_num_cores_per_package();
> + else if (rapl_pmu_scope != PERF_PMU_SCOPE_PKG)
> + return -EINVAL;
>
> rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
> @@ -740,8 +831,10 @@ static struct rapl_model model_spr = {
>
> static struct rapl_model model_amd_hygon = {
> .pkg_events = BIT(PERF_RAPL_PKG),
> + .core_events = BIT(PERF_RAPL_CORE),
> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> .rapl_pkg_msrs = amd_rapl_pkg_msrs,
> + .rapl_core_msrs = amd_rapl_core_msrs,
> };
>
> static const struct x86_cpu_id rapl_model_match[] __initconst = {
> @@ -813,7 +906,8 @@ static int __init rapl_pmu_init(void)
> if (ret)
> return ret;
>
> - ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope);
> + ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope, rapl_attr_groups,
> + rapl_attr_update);
> if (ret)
> return ret;
>
> @@ -825,6 +919,27 @@ static int __init rapl_pmu_init(void)
> if (ret)
> goto out;
>
> + if (rapl_model->core_events) {
> + ret = init_rapl_pmus(&rapl_pmus_core, PERF_PMU_SCOPE_CORE,
> + rapl_core_attr_groups,
> + rapl_core_attr_update);
> + if (ret) {
> + pr_warn("power-core PMU initialization failed (%d)\n", ret);
> + goto core_init_failed;
> + }
> +
> + rapl_pmus_core->cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
> + PERF_RAPL_CORE_EVENTS_MAX, false,
> + (void *) &rapl_model->core_events);
> +
> + ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_core", -1);
> + if (ret) {
> + pr_warn("power-core PMU registration failed (%d)\n", ret);
> + cleanup_rapl_pmus(rapl_pmus_core);
> + }
> + }
> +
> +core_init_failed:
> rapl_advertise();
> return 0;
>
> @@ -837,6 +952,10 @@ module_init(rapl_pmu_init);
>
> static void __exit intel_rapl_exit(void)
> {
> + if (rapl_pmus_core) {
> + perf_pmu_unregister(&rapl_pmus_core->pmu);
> + cleanup_rapl_pmus(rapl_pmus_core);
> + }
> perf_pmu_unregister(&rapl_pmus_pkg->pmu);
> cleanup_rapl_pmus(rapl_pmus_pkg);
> }
> -- 2.34.1
>
^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-20 7:58 ` Peter Jung
@ 2024-11-20 8:18 ` Peter Jung
2024-11-20 13:58 ` Dhananjay Ugwekar
1 sibling, 0 replies; 26+ messages in thread
From: Peter Jung @ 2024-11-20 8:18 UTC (permalink / raw)
To: Dhananjay Ugwekar, peterz, mingo, rui.zhang, irogers, kan.liang,
tglx, bp, gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
Sorry, a little addition: this only happens on Clang-built kernels.
I missed mentioning that.
On 20.11.24 08:58, Peter Jung wrote:
> Hi all,
>
> This patch seems to crash the kernel and results in an unbootable
> system.
>
>
> The patch has been applied on base 6.12-rc7 - I have not tested it yet
> on linux-next.
>
> I was also able to reproduce this issue on v6; the only "good" version
> was v4.
> This has been reproduced on several zen3+ machines and also on my 9950X.
>
> Bisect log:
> ```
> git bisect start
> # status: waiting for both good and bad commits
> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> # status: waiting for bad commit, 1 good commit known
> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add
> core energy counter support for AMD CPUs
> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename
> rapl_pmu variables
> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify
> the generic variable names to *_pkg*
> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move
> the cntr_mask to rapl_pmus struct
> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/
> rapl: Add core energy counter support for AMD CPUs
> ```
>
> Log:
> ```
> Nov 17 12:17:37 varvalian kernel: Oops: general protection fault,
> probably for non-canonical address 0x796772656e650073: 0000 [#1] PREEMPT
> SMP NOPTI
> Nov 17 12:17:37 varvalian kernel: CPU: 8 UID: 0 PID: 549 Comm: (udev-
> worker) Tainted: G OE 6.12.0-rc7-6-cachyos-rc-test #1
> bf94bb78cd9745ddc98a0655e5681df034f56f3d
> Nov 17 12:17:37 varvalian kernel: Tainted: [O]=OOT_MODULE,
> [E]=UNSIGNED_MODULE
> Nov 17 12:17:37 varvalian kernel: Hardware name: LENOVO 82SN/
> LNVNB161216, BIOS J4CN41WW 04/09/2024
> Nov 17 12:17:37 varvalian kernel: RIP:
> 0010:internal_create_group+0x9a/0x4e0
> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d
> 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b
> 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75
> 1a 48 8b 43 20 48
> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS:
> 00010202
> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX:
> ffffffffc2a642aa RCX: f781ec27a963db00
> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI:
> ffffaa5281fe7884 RDI: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08:
> 0000000000000000 R09: 0000000000000001
> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11:
> ffffffff8e86ee00 R12: 0000000000000000
> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14:
> ffff90c611dc8400 R15: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000)
> GS:ffff90c8f0000000(0000) knlGS:0000000000000000
> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Nov 17 12:17:37 varvalian kernel: CR2: 00007163efd6a637 CR3:
> 0000000121298000 CR4: 0000000000f50ef0
> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
> Nov 17 12:17:37 varvalian kernel: Call Trace:
> Nov 17 12:17:37 varvalian kernel: <TASK>
> Nov 17 12:17:37 varvalian kernel: ? __die_body+0x6a/0xb0
> Nov 17 12:17:37 varvalian kernel: ? die_addr+0xa4/0xd0
> Nov 17 12:17:37 varvalian kernel: ? exc_general_protection+0x170/0x220
> Nov 17 12:17:37 varvalian kernel: ? asm_exc_general_protection+0x26/0x30
> Nov 17 12:17:37 varvalian kernel: ? __pfx_device_get_ownership+0x10/0x10
> Nov 17 12:17:37 varvalian kernel: ? internal_create_group+0x9a/0x4e0
> Nov 17 12:17:37 varvalian kernel: ? internal_create_group+0x8b/0x4e0
> Nov 17 12:17:37 varvalian kernel: sysfs_update_groups+0x37/0x80
> Nov 17 12:17:37 varvalian kernel: pmu_dev_alloc+0xc6/0xe0
> Nov 17 12:17:37 varvalian kernel: perf_pmu_register+0x28c/0x3e0
> Nov 17 12:17:37 varvalian kernel: init_module+0x1d9/0x240 [rapl
> 339c1d4b88353cca8b92d05e1524f5f14886bd70]
> Nov 17 12:17:37 varvalian kernel: ? __pfx_init_module+0x10/0x10 [rapl
> 339c1d4b88353cca8b92d05e1524f5f14886bd70]
> Nov 17 12:17:37 varvalian kernel: do_one_initcall+0x122/0x370
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? idr_alloc_cyclic+0x139/0x1d0
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ?
> security_kernfs_init_security+0x54/0x190
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? __kernfs_new_node+0x1ba/0x240
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? kernfs_link_sibling+0xef/0x110
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? kernfs_activate+0x2c/0x110
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? kernfs_add_one+0x108/0x140
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? __kernfs_create_file+0x75/0xb0
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? sysfs_create_bin_file+0xc6/0x120
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? __vunmap_range_noflush+0x344/0x410
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? insert_vmap_area+0xab/0xf0
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? module_decompress_cleanup+0x4a/0x70
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? load_module+0x1111/0x1330
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? __kmalloc_cache_noprof+0x12b/0x2e0
> Nov 17 12:17:37 varvalian kernel: do_init_module+0x60/0x240
> Nov 17 12:17:37 varvalian kernel: __se_sys_finit_module+0x270/0x380
> Nov 17 12:17:37 varvalian kernel: do_syscall_64+0x88/0x160
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? __x64_sys_lseek+0x5d/0xa0
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? syscall_exit_to_user_mode+0x34/0xd0
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? do_syscall_64+0x94/0x160
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? do_user_addr_fault+0x289/0x730
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> Nov 17 12:17:37 varvalian kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Nov 17 12:17:37 varvalian kernel: RIP: 0033:0x7163ef71f15d
> Nov 17 12:17:37 varvalian kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00
> 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8
> 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 6b 0d 00
> f7 d8 64 89 01 48
> Nov 17 12:17:37 varvalian kernel: RSP: 002b:00007ffe2bda0ac8 EFLAGS:
> 00000246 ORIG_RAX: 0000000000000139
> Nov 17 12:17:37 varvalian kernel: RAX: ffffffffffffffda RBX:
> 00005c1834b627f0 RCX: 00007163ef71f15d
> Nov 17 12:17:37 varvalian kernel: RDX: 0000000000000004 RSI:
> 00007163eef044ac RDI: 000000000000002b
> Nov 17 12:17:37 varvalian kernel: RBP: 00007163eef044ac R08:
> 00007163ef7f6ad0 R09: 00007ffe2bda0bf8
> Nov 17 12:17:37 varvalian kernel: R10: 00005c18349ed010 R11:
> 0000000000000246 R12: 0000000000020000
> Nov 17 12:17:37 varvalian kernel: R13: 00005c1834b47b80 R14:
> 0000000000000000 R15: 00005c1834b54470
> Nov 17 12:17:37 varvalian kernel: </TASK>
> Nov 17 12:17:37 varvalian kernel: Modules linked in: rapl(+)
> nvidia_wmi_ec_backlight pcspkr sparse_keymap roles wmi_bmof
> platform_profile snd snd_pci_acp3x k10temp i2c_smbus ccp rfkill mousedev
> soundcore joydev typec i2c_hid_acpi i2c_hid amd_pmc ip6t_REJECT acpi_tad
> nf_reject_ipv6 mac_hid xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG
> nf_log_syslog xt_recent nft_limit xt_limit xt_addrtype xt_tcpudp
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat
> nf_tables pkcs8_key_parser uinput i2c_dev crypto_user loop dm_mod
> nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress
> ip_tables x_tables amdgpu btrfs crc16 libcrc32c amdxcp crc32c_generic
> drm_buddy raid6_pq serio_raw drm_suballoc_helper xor drm_exec atkbd
> sdhci_pci i2c_algo_bit crc32c_intel vivaldi_fmap libps2 sha256_ssse3
> cqhci drm_display_helper nvme sdhci cec nvme_core i8042 mmc_core
> gpu_sched nvme_auth serio hid_logitech_hidpp hid_logitech_dj hid_generic
> usbhid nvidia_drm(OE) drm_ttm_helper ttm nvidia_uvm(OE)
> nvidia_modeset(OE) video wmi nvidia(OE)
> Nov 17 12:17:37 varvalian kernel: RIP:
> 0010:internal_create_group+0x9a/0x4e0
> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d
> 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b
> 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75
> 1a 48 8b 43 20 48
> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS:
> 00010202
> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX:
> ffffffffc2a642aa RCX: f781ec27a963db00
> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI:
> ffffaa5281fe7884 RDI: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08:
> 0000000000000000 R09: 0000000000000001
> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11:
> ffffffff8e86ee00 R12: 0000000000000000
> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14:
> ffff90c611dc8400 R15: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000)
> GS:ffff90c8efe00000(0000) knlGS:0000000000000000
> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3:
> 0000000121298000 CR4: 0000000000f50ef0
> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
> ```
>
> I'll do some additional tests on the weekend based on the latest
> linux-next snapshot and provide some more logs.
>
> Regards,
>
> Peter
>
> On 15.11.24 07:08, Dhananjay Ugwekar wrote:
>> Add a new "power_core" PMU and "energy-core" event for monitoring
>> energy consumption by each individual core. The existing energy-cores
>> event aggregates the energy consumption of CPU cores at the package
>> level.
>> This new event aligns with AMD's per-core energy counters.
>>
>> Tested the package level and core level PMU counters with workloads
>> pinned to different CPUs.
>>
>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
>> machine:
>>
>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4
>> stress-ng --matrix 1 --timeout 5s
>> stress-ng: info: [21250] setting to a 5 second run per stressor
>> stress-ng: info: [21250] dispatching hogs: 1 matrix
>> stress-ng: info: [21250] successful run completed in 5.00s
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>>
>> Signed-off-by: Dhananjay Ugwekar<Dhananjay.Ugwekar@amd.com>
>> Reviewed-by: Gautham R. Shenoy<gautham.shenoy@amd.com>
>> ---
>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
>> 1 file changed, 152 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index 6e51386ff91f..e9be1f31163d 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -39,6 +39,10 @@
>> * event: rapl_energy_psys
>> * perf code: 0x5
>> *
>> + * core counter: consumption of a single physical core
>> + * event: rapl_energy_core (power_core PMU)
>> + * perf code: 0x1
>> + *
>> * We manage those counters as free running (read-only). They may be
>> * use simultaneously by other tools, such as turbostat.
>> *
>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
>> };
>> +#define PERF_RAPL_CORE 0 /* single core */
>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
>> +
>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS]
>> __initconst = {
>> "pp0-core",
>> "package",
>> @@ -89,6 +97,8 @@ static const char *const
>> rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
>> "psys",
>> };
>> +static const char *const rapl_core_domain_name __initconst = "core";
>> +
>> /*
>> * event code: LSB 8 bits, passed in attr->config
>> * any other bit is reserved
>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>> struct rapl_model {
>> struct perf_msr *rapl_pkg_msrs;
>> + struct perf_msr *rapl_core_msrs;
>> unsigned long pkg_events;
>> + unsigned long core_events;
>> unsigned int msr_power_unit;
>> enum rapl_unit_quirk unit_quirk;
>> };
>> /* 1/2^hw_unit Joule */
>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
>> +static int rapl_core_hw_unit __read_mostly;
>> static struct rapl_pmus *rapl_pmus_pkg;
>> +static struct rapl_pmus *rapl_pmus_core;
>> static u64 rapl_timer_ms;
>> static struct rapl_model *rapl_model;
>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
>> * Helper function to get the correct topology id according to the
>> * RAPL PMU scope.
>> */
>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
>> -{ /*
>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
>> +{
>> + /*
>> * Returns unsigned int, which converts the '-1' return value
>> * (for non-existent mappings in topology map) to UINT_MAX, so
>> * the error check in the caller is simplified.
>> */
>> - return rapl_pkg_pmu_is_pkg_scope() ?
>> topology_logical_package_id(cpu) :
>> - topology_logical_die_id(cpu);
>> + switch (scope) {
>> + case PERF_PMU_SCOPE_PKG:
>> + return topology_logical_package_id(cpu);
>> + case PERF_PMU_SCOPE_DIE:
>> + return topology_logical_die_id(cpu);
>> + case PERF_PMU_SCOPE_CORE:
>> + return topology_logical_core_id(cpu);
>> + default:
>> + return -EINVAL;
>> + }
>> }
>> static inline u64 rapl_read_counter(struct perf_event *event)
>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct
>> perf_event *event)
>> return raw;
>> }
>> -static inline u64 rapl_scale(u64 v, int cfg)
>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
>> {
>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
>> - return v;
>> - }
>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
>> +
>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
>> + hw_unit = rapl_core_hw_unit;
>> +
>> /*
>> * scale delta to smallest unit (1/2^32)
>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
>> * or use ldexp(count, -32).
>> * Watts = Joules/Time delta
>> */
>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
>> + return v << (32 - hw_unit);
>> }
>> static u64 rapl_event_update(struct perf_event *event)
>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event
>> *event)
>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
>> delta >>= shift;
>> - sdelta = rapl_scale(delta, event->hw.config);
>> + sdelta = rapl_scale(delta, event);
>> local64_add(sdelta, &event->count);
>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event
>> *event, int flags)
>> static int rapl_pmu_event_init(struct perf_event *event)
>> {
>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>> - int bit, ret = 0;
>> + int bit, rapl_pmus_scope, ret = 0;
>> struct rapl_pmu *rapl_pmu;
>> unsigned int rapl_pmu_idx;
>> + struct rapl_pmus *rapl_pmus;
>> - /* only look at RAPL events */
>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>> - return -ENOENT;
>> + /* unsupported modes and filters */
>> + if (event->attr.sample_period) /* no sampling */
>> + return -EINVAL;
>> /* check only supported bits are set */
>> if (event->attr.config & ~RAPL_EVENT_MASK)
>> @@ -356,31 +381,49 @@ static int rapl_pmu_event_init(struct perf_event
>> *event)
>> if (event->cpu < 0)
>> return -EINVAL;
>> - if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
>> + rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
>> + if (!rapl_pmus)
>> return -EINVAL;
>> -
>> - cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
>> - bit = cfg - 1;
>> -
>> - /* check event supported */
>> - if (!(rapl_pmus_pkg->cntr_mask & (1 << bit)))
>> + rapl_pmus_scope = rapl_pmus->pmu.scope;
>> +
>> + if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope ==
>> PERF_PMU_SCOPE_DIE) {
>> + /* only look at RAPL package events */
>> + if (event->attr.type != rapl_pmus_pkg->pmu.type)
>> + return -ENOENT;
>> +
>> + cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
>> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
>> + return -EINVAL;
>> +
>> + bit = cfg - 1;
>> + event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
>> + } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
>> + /* only look at RAPL core events */
>> + if (event->attr.type != rapl_pmus_core->pmu.type)
>> + return -ENOENT;
>> +
>> + cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
>> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
>> + return -EINVAL;
>> +
>> + bit = cfg - 1;
>> + event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
>> + } else
>> return -EINVAL;
>> - /* unsupported modes and filters */
>> - if (event->attr.sample_period) /* no sampling */
>> + /* check event supported */
>> + if (!(rapl_pmus->cntr_mask & (1 << bit)))
>> return -EINVAL;
>> - rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
>> - if (rapl_pmu_idx >= rapl_pmus_pkg->nr_rapl_pmu)
>> + rapl_pmu_idx = get_rapl_pmu_idx(event->cpu, rapl_pmus_scope);
>> + if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
>> return -EINVAL;
>> -
>> /* must be done before validate_group */
>> - rapl_pmu = rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx];
>> + rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
>> if (!rapl_pmu)
>> return -EINVAL;
>> event->pmu_private = rapl_pmu;
>> - event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
>> event->hw.config = cfg;
>> event->hw.idx = bit;
>> @@ -397,12 +440,14 @@ RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg,
>> "event=0x02");
>> RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
>> RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
>> RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
>> +RAPL_EVENT_ATTR_STR(energy-core, rapl_core, "event=0x01");
>> RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
>> RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
>> RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
>> RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
>> RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
>> +RAPL_EVENT_ATTR_STR(energy-core.unit, rapl_core_unit, "Joules");
>> /*
>> * we compute in 0.23 nJ increments regardless of MSR
>> @@ -412,6 +457,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale,
>> rapl_pkg_scale, "2.3283064365386962890
>> RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale,
>> "2.3283064365386962890625e-10");
>> RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale,
>> "2.3283064365386962890625e-10");
>> RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale,
>> "2.3283064365386962890625e-10");
>> +RAPL_EVENT_ATTR_STR(energy-core.scale, rapl_core_scale,
>> "2.3283064365386962890625e-10");
>> /*
>> * There are no default events, but we need to create
>> @@ -444,6 +490,12 @@ static const struct attribute_group
>> *rapl_attr_groups[] = {
>> NULL,
>> };
>> +static const struct attribute_group *rapl_core_attr_groups[] = {
>> + &rapl_pmu_format_group,
>> + &rapl_pmu_events_group,
>> + NULL,
>> +};
>> +
>> static struct attribute *rapl_events_cores[] = {
>> EVENT_PTR(rapl_cores),
>> EVENT_PTR(rapl_cores_unit),
>> @@ -504,6 +556,18 @@ static struct attribute_group
>> rapl_events_psys_group = {
>> .attrs = rapl_events_psys,
>> };
>> +static struct attribute *rapl_events_core[] = {
>> + EVENT_PTR(rapl_core),
>> + EVENT_PTR(rapl_core_unit),
>> + EVENT_PTR(rapl_core_scale),
>> + NULL,
>> +};
>> +
>> +static struct attribute_group rapl_events_core_group = {
>> + .name = "events",
>> + .attrs = rapl_events_core,
>> +};
>> +
>> static bool test_msr(int idx, void *data)
>> {
>> return test_bit(idx, (unsigned long *) data);
>> @@ -541,6 +605,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
>> [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
>> };
>> +static struct perf_msr amd_rapl_core_msrs[] = {
>> + [PERF_RAPL_CORE] = { MSR_AMD_CORE_ENERGY_STATUS,
>> &rapl_events_core_group,
>> + test_msr, false, RAPL_MSR_MASK },
>> +};
>> +
>> static int rapl_check_hw_unit(void)
>> {
>> u64 msr_rapl_power_unit_bits;
>> @@ -552,6 +621,8 @@ static int rapl_check_hw_unit(void)
>> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
>> rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) &
>> 0x1FULL;
>> + rapl_core_hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
>> +
>> switch (rapl_model->unit_quirk) {
>> /*
>> * DRAM domain on HSW server and KNL has fixed energy unit which
>> can be
>> @@ -570,7 +641,6 @@ static int rapl_check_hw_unit(void)
>> break;
>> }
>> -
>> /*
>> * Calculate the timer rate:
>> * Use reference of 200W for scaling the timeout to avoid counter
>> @@ -589,9 +659,13 @@ static int rapl_check_hw_unit(void)
>> static void __init rapl_advertise(void)
>> {
>> int i;
>> + int num_counters = hweight32(rapl_pmus_pkg->cntr_mask);
>> +
>> + if (rapl_pmus_core)
>> + num_counters += hweight32(rapl_pmus_core->cntr_mask);
>> pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms
>> ovfl timer\n",
>> - hweight32(rapl_pmus_pkg->cntr_mask), rapl_timer_ms);
>> + num_counters, rapl_timer_ms);
>> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
>> if (rapl_pmus_pkg->cntr_mask & (1 << i)) {
>> @@ -599,6 +673,10 @@ static void __init rapl_advertise(void)
>> rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
>> }
>> }
>> +
>> + if (rapl_pmus_core && (rapl_pmus_core->cntr_mask & (1 <<
>> PERF_RAPL_CORE)))
>> + pr_info("hw unit of domain %s 2^-%d Joules\n",
>> + rapl_core_domain_name, rapl_core_hw_unit);
>> }
>> static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
>> @@ -619,6 +697,10 @@ static const struct attribute_group
>> *rapl_attr_update[] = {
>> NULL,
>> };
>> +static const struct attribute_group *rapl_core_attr_update[] = {
>> + &rapl_events_core_group,
>> +};
>> +
>> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
>> {
>> struct rapl_pmu *rapl_pmu;
>> @@ -645,13 +727,22 @@ static int __init init_rapl_pmu(struct rapl_pmus
>> *rapl_pmus)
>> return -ENOMEM;
>> }
>> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr,
>> int rapl_pmu_scope)
>> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr,
>> int rapl_pmu_scope,
>> + const struct attribute_group **rapl_attr_groups,
>> + const struct attribute_group **rapl_attr_update)
>> {
>> int nr_rapl_pmu = topology_max_packages();
>> struct rapl_pmus *rapl_pmus;
>> + /*
>> + * rapl_pmu_scope must be either PKG, DIE or CORE
>> + */
>> if (rapl_pmu_scope == PERF_PMU_SCOPE_DIE)
>> nr_rapl_pmu *= topology_max_dies_per_package();
>> + else if (rapl_pmu_scope == PERF_PMU_SCOPE_CORE)
>> + nr_rapl_pmu *= topology_num_cores_per_package();
>> + else if (rapl_pmu_scope != PERF_PMU_SCOPE_PKG)
>> + return -EINVAL;
>> rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu,
>> nr_rapl_pmu), GFP_KERNEL);
>> if (!rapl_pmus)
>> @@ -740,8 +831,10 @@ static struct rapl_model model_spr = {
>> static struct rapl_model model_amd_hygon = {
>> .pkg_events = BIT(PERF_RAPL_PKG),
>> + .core_events = BIT(PERF_RAPL_CORE),
>> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>> .rapl_pkg_msrs = amd_rapl_pkg_msrs,
>> + .rapl_core_msrs = amd_rapl_core_msrs,
>> };
>> static const struct x86_cpu_id rapl_model_match[] __initconst = {
>> @@ -813,7 +906,8 @@ static int __init rapl_pmu_init(void)
>> if (ret)
>> return ret;
>> - ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope);
>> + ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope,
>> rapl_attr_groups,
>> + rapl_attr_update);
>> if (ret)
>> return ret;
>> @@ -825,6 +919,27 @@ static int __init rapl_pmu_init(void)
>> if (ret)
>> goto out;
>> + if (rapl_model->core_events) {
>> + ret = init_rapl_pmus(&rapl_pmus_core, PERF_PMU_SCOPE_CORE,
>> + rapl_core_attr_groups,
>> + rapl_core_attr_update);
>> + if (ret) {
>> + pr_warn("power-core PMU initialization failed (%d)\n", ret);
>> + goto core_init_failed;
>> + }
>> +
>> + rapl_pmus_core->cntr_mask = perf_msr_probe(rapl_model-
>> >rapl_core_msrs,
>> + PERF_RAPL_CORE_EVENTS_MAX, false,
>> + (void *) &rapl_model->core_events);
>> +
>> + ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_core", -1);
>> + if (ret) {
>> + pr_warn("power-core PMU registration failed (%d)\n", ret);
>> + cleanup_rapl_pmus(rapl_pmus_core);
>> + }
>> + }
>> +
>> +core_init_failed:
>> rapl_advertise();
>> return 0;
>> @@ -837,6 +952,10 @@ module_init(rapl_pmu_init);
>> static void __exit intel_rapl_exit(void)
>> {
>> + if (rapl_pmus_core) {
>> + perf_pmu_unregister(&rapl_pmus_core->pmu);
>> + cleanup_rapl_pmus(rapl_pmus_core);
>> + }
>> perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>> cleanup_rapl_pmus(rapl_pmus_pkg);
>> }
>> -- 2.34.1
>>
>
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-20 7:58 ` Peter Jung
2024-11-20 8:18 ` Peter Jung
@ 2024-11-20 13:58 ` Dhananjay Ugwekar
2024-11-20 14:30 ` Peter Jung
1 sibling, 1 reply; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-20 13:58 UTC (permalink / raw)
To: Peter Jung, peterz, mingo, rui.zhang, irogers, kan.liang, tglx,
bp, gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
Hello Peter Jung,
Thanks for trying out the patchset,
On 11/20/2024 1:28 PM, Peter Jung wrote:
> Hi together,
>
> This patch seems to crash the kernel and results in an unbootable system.
>
>
> The patch has been applied on base 6.12-rc7 - I have not tested it yet on linux-next.
>
> I was able to reproduce this issue on v6 as well; the only "good" version was v4.
> This has been reproduced on several zen3+ machines and also on my 9950X.
>
> Bisect log:
> ```
> git bisect start
> # status: waiting for both good and bad commits
> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> # status: waiting for bad commit, 1 good commit known
> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
> ```
>
> Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
> ```
>
> I'll do some additional tests on the weekend based on the latest linux-next snapshot and provide some more logs.
Can you please try the diff below:
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e9be1f31163d..d3bb3865c1b1 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
static const struct attribute_group *rapl_core_attr_update[] = {
&rapl_events_core_group,
+ NULL,
};
static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
Regards,
Dhananjay
>
> Regards,
>
> Peter
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-20 13:58 ` Dhananjay Ugwekar
@ 2024-11-20 14:30 ` Peter Jung
2024-11-21 15:58 ` Dhananjay Ugwekar
0 siblings, 1 reply; 26+ messages in thread
From: Peter Jung @ 2024-11-20 14:30 UTC (permalink / raw)
To: Dhananjay Ugwekar, peterz, mingo, rui.zhang, irogers, kan.liang,
tglx, bp, gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
Hi Dhananjay,
On 20.11.24 14:58, Dhananjay Ugwekar wrote:
> Hello Peter Jung,
>
> Thanks for trying out the patchset,
>
> On 11/20/2024 1:28 PM, Peter Jung wrote:
>> Hi together,
>>
>> This patch seems to crash the kernel and results in an unbootable system.
>>
>>
>> The patch has been applied on base 6.12-rc7 - I have not tested it yet on linux-next.
>>
>> I was able to reproduce this issue on v6 as well; the only "good" version was v4.
>> This has been reproduced on several zen3+ machines and also on my 9950X.
>>
>> Bisect log:
>> ```
>> git bisect start
>> # status: waiting for both good and bad commits
>> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
>> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
>> # status: waiting for bad commit, 1 good commit known
>> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
>> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
>> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
>> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
>> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
>> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
>> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
>> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>> ```
>>
>> Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
>> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
>> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
>> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
>> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
>> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
>> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
>> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
>> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
>> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
>> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
>> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
>> ```
>>
>> I'll do some additional tests on the weekend based on the latest linux-next snapshot and provide some more logs.
> Can you please try the diff below:
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index e9be1f31163d..d3bb3865c1b1 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
>
> static const struct attribute_group *rapl_core_attr_update[] = {
> &rapl_events_core_group,
> + NULL,
> };
>
> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
>
> Regards,
> Dhananjay
>
Thanks! This patch appears to fix the issue when the kernel is built
with clang. Thanks for providing such a fast fix! :)
Peter
>> Regards,
>>
>> Peter
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-20 14:30 ` Peter Jung
@ 2024-11-21 15:58 ` Dhananjay Ugwekar
0 siblings, 0 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-21 15:58 UTC (permalink / raw)
To: Peter Jung, peterz, mingo, rui.zhang, irogers, kan.liang, tglx,
bp, gautham.shenoy
Cc: kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
On 11/20/2024 8:00 PM, Peter Jung wrote:
> Hi Dhananjay,
>
> On 20.11.24 14:58, Dhananjay Ugwekar wrote:
>> Hello Peter Jung,
>>
>> Thanks for trying out the patchset,
>>
>> On 11/20/2024 1:28 PM, Peter Jung wrote:
>>> Hi together,
>>>
>>> This patch seems to crash the kernel and results in an unbootable system.
>>>
>>>
>>> The patch has been applied on base 6.12-rc7 - I have not tested it yet on linux-next.
>>>
>>> I was able to reproduce this issue on v6 as well; the only "good" version was v4.
>>> This has been reproduced on several zen3+ machines and also on my 9950X.
>>>
>>> Bisect log:
>>> ```
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
>>> git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>>> git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
>>> # good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
>>> git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
>>> # good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
>>> git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
>>> # good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
>>> git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
>>> # first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
>>> ```
>>>
>>> Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
>>> Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
>>> Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
>>> Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
>>> Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
>>> Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
>>> Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
>>> Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
>>> Nov 17 12:17:37 varvalian kernel: FS: 00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
>>> Nov 17 12:17:37 varvalian kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
>>> Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
>>> Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
>>> ```
>>>
>>> I'll do some additional tests on the weekend based on the latest linux-next snapshot and provide some more logs.
>> Can you please try the diff below:
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index e9be1f31163d..d3bb3865c1b1 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
>>
>> static const struct attribute_group *rapl_core_attr_update[] = {
>> &rapl_events_core_group,
>> + NULL,
>> };
>>
>> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
>>
>> Regards,
>> Dhananjay
>>
>
>
> Thanks! This patch appears to fix the issue when the kernel is built with clang. Thanks for providing such a fast fix! :)
Great! Thanks for the confirmation.
Regards,
Dhananjay
>
> Peter
>
>
>>> Regards,
>>>
>>> Peter
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2024-11-15 6:08 ` [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs Dhananjay Ugwekar
2024-11-20 7:58 ` Peter Jung
@ 2025-01-12 13:42 ` Koichiro Den
2025-01-13 6:34 ` Dhananjay Ugwekar
1 sibling, 1 reply; 26+ messages in thread
From: Koichiro Den @ 2025-01-12 13:42 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
> Add a new "power_core" PMU and "energy-core" event for monitoring
> energy consumption by each individual core. The existing energy-cores
> event aggregates the energy consumption of CPU cores at the package level.
> This new event aligns with the AMD's per-core energy counters.
>
> Tested the package level and core level PMU counters with workloads
> pinned to different CPUs.
>
> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
> machine:
>
> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
> stress-ng: info: [21250] setting to a 5 second run per stressor
> stress-ng: info: [21250] dispatching hogs: 1 matrix
> stress-ng: info: [21250] successful run completed in 5.00s
>
> Performance counter stats for 'system wide':
>
> S0-D0-C0 1 0.00 Joules power_core/energy-core/
> S0-D0-C1 1 0.00 Joules power_core/energy-core/
> S0-D0-C2 1 0.00 Joules power_core/energy-core/
> S0-D0-C3 1 0.00 Joules power_core/energy-core/
> S0-D0-C4 1 8.43 Joules power_core/energy-core/
> S0-D0-C5 1 0.00 Joules power_core/energy-core/
> S0-D0-C6 1 0.00 Joules power_core/energy-core/
> S0-D0-C7 1 0.00 Joules power_core/energy-core/
> S0-D1-C8 1 0.00 Joules power_core/energy-core/
> S0-D1-C9 1 0.00 Joules power_core/energy-core/
> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>
> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
> ---
> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 152 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 6e51386ff91f..e9be1f31163d 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -39,6 +39,10 @@
> * event: rapl_energy_psys
> * perf code: 0x5
> *
> + * core counter: consumption of a single physical core
> + * event: rapl_energy_core (power_core PMU)
> + * perf code: 0x1
> + *
> * We manage those counters as free running (read-only). They may be
> * use simultaneously by other tools, such as turbostat.
> *
> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
> };
>
> +#define PERF_RAPL_CORE 0 /* single core */
> +#define PERF_RAPL_CORE_EVENTS_MAX 1
> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
> +
> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
> "pp0-core",
> "package",
> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
> "psys",
> };
>
> +static const char *const rapl_core_domain_name __initconst = "core";
> +
> /*
> * event code: LSB 8 bits, passed in attr->config
> * any other bit is reserved
> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>
> struct rapl_model {
> struct perf_msr *rapl_pkg_msrs;
> + struct perf_msr *rapl_core_msrs;
> unsigned long pkg_events;
> + unsigned long core_events;
> unsigned int msr_power_unit;
> enum rapl_unit_quirk unit_quirk;
> };
>
> /* 1/2^hw_unit Joule */
> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> +static int rapl_core_hw_unit __read_mostly;
> static struct rapl_pmus *rapl_pmus_pkg;
> +static struct rapl_pmus *rapl_pmus_core;
> static u64 rapl_timer_ms;
> static struct rapl_model *rapl_model;
>
> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
> * Helper function to get the correct topology id according to the
> * RAPL PMU scope.
> */
> -static inline unsigned int get_rapl_pmu_idx(int cpu)
> -{ /*
> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
> +{
> + /*
> * Returns unsigned int, which converts the '-1' return value
> * (for non-existent mappings in topology map) to UINT_MAX, so
> * the error check in the caller is simplified.
> */
> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> - topology_logical_die_id(cpu);
> + switch (scope) {
> + case PERF_PMU_SCOPE_PKG:
> + return topology_logical_package_id(cpu);
> + case PERF_PMU_SCOPE_DIE:
> + return topology_logical_die_id(cpu);
> + case PERF_PMU_SCOPE_CORE:
> + return topology_logical_core_id(cpu);
> + default:
> + return -EINVAL;
> + }
> }
>
> static inline u64 rapl_read_counter(struct perf_event *event)
> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
> return raw;
> }
>
> -static inline u64 rapl_scale(u64 v, int cfg)
> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
> {
> - if (cfg > NR_RAPL_PKG_DOMAINS) {
> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
> - return v;
> - }
> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
> +
> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
> + hw_unit = rapl_core_hw_unit;
> +
> /*
> * scale delta to smallest unit (1/2^32)
> * users must then scale back: count * 1/(1e9*2^32) to get Joules
> * or use ldexp(count, -32).
> * Watts = Joules/Time delta
> */
> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
> + return v << (32 - hw_unit);
> }
>
> static u64 rapl_event_update(struct perf_event *event)
> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
> delta = (new_raw_count << shift) - (prev_raw_count << shift);
> delta >>= shift;
>
> - sdelta = rapl_scale(delta, event->hw.config);
> + sdelta = rapl_scale(delta, event);
>
> local64_add(sdelta, &event->count);
>
> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
> static int rapl_pmu_event_init(struct perf_event *event)
> {
> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> - int bit, ret = 0;
> + int bit, rapl_pmus_scope, ret = 0;
> struct rapl_pmu *rapl_pmu;
> unsigned int rapl_pmu_idx;
> + struct rapl_pmus *rapl_pmus;
>
> - /* only look at RAPL events */
> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> - return -ENOENT;
> + /* unsupported modes and filters */
> + if (event->attr.sample_period) /* no sampling */
> + return -EINVAL;
Hi Dhananjay,
On linux-next, since this commit, it seems that simple sampling with 'perf
record -- <command>' (i.e. the default event), 'perf top' etc. can
unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
of -ENOENT even in the case of a type mismatch. I observed that this
prevents evsel__fallback() from falling back to cpu-clock or task-clock.
Should we reorder the checks in rapl_pmu_event_init() to allow an early
return with -ENOENT in such cases, as shown below? I'm not very familiar
with this area and I might be missing something. I'd appreciate it if you
could share your thoughts.
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
unsigned int rapl_pmu_idx;
struct rapl_pmus *rapl_pmus;
- /* unsupported modes and filters */
- if (event->attr.sample_period) /* no sampling */
- return -EINVAL;
-
- /* check only supported bits are set */
- if (event->attr.config & ~RAPL_EVENT_MASK)
- return -EINVAL;
-
- if (event->cpu < 0)
- return -EINVAL;
-
rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
if (!rapl_pmus)
return -EINVAL;
@@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
} else
return -EINVAL;
+ /* unsupported modes and filters */
+ if (event->attr.sample_period) /* no sampling */
+ return -EINVAL;
+
+ /* check only supported bits are set */
+ if (event->attr.config & ~RAPL_EVENT_MASK)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
/* check event supported */
if (!(rapl_pmus->cntr_mask & (1 << bit)))
return -EINVAL;
Thanks.
-Koichiro
>
> /* check only supported bits are set */
> if (event->attr.config & ~RAPL_EVENT_MASK)
> @@ -356,31 +381,49 @@ static int rapl_pmu_event_init(struct perf_event *event)
> if (event->cpu < 0)
> return -EINVAL;
>
> - if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> + if (!rapl_pmus)
> return -EINVAL;
> -
> - cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> - bit = cfg - 1;
> -
> - /* check event supported */
> - if (!(rapl_pmus_pkg->cntr_mask & (1 << bit)))
> + rapl_pmus_scope = rapl_pmus->pmu.scope;
> +
> + if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
> + /* only look at RAPL package events */
> + if (event->attr.type != rapl_pmus_pkg->pmu.type)
> + return -ENOENT;
> +
> + cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + return -EINVAL;
> +
> + bit = cfg - 1;
> + event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> + } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
> + /* only look at RAPL core events */
> + if (event->attr.type != rapl_pmus_core->pmu.type)
> + return -ENOENT;
> +
> + cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
> + if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> + return -EINVAL;
> +
> + bit = cfg - 1;
> + event->hw.event_base = rapl_model->rapl_core_msrs[bit].msr;
> + } else
> return -EINVAL;
>
> - /* unsupported modes and filters */
> - if (event->attr.sample_period) /* no sampling */
> + /* check event supported */
> + if (!(rapl_pmus->cntr_mask & (1 << bit)))
> return -EINVAL;
>
> - rapl_pmu_idx = get_rapl_pmu_idx(event->cpu);
> - if (rapl_pmu_idx >= rapl_pmus_pkg->nr_rapl_pmu)
> + rapl_pmu_idx = get_rapl_pmu_idx(event->cpu, rapl_pmus_scope);
> + if (rapl_pmu_idx >= rapl_pmus->nr_rapl_pmu)
> return -EINVAL;
> -
> /* must be done before validate_group */
> - rapl_pmu = rapl_pmus_pkg->rapl_pmu[rapl_pmu_idx];
> + rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
> if (!rapl_pmu)
> return -EINVAL;
>
> event->pmu_private = rapl_pmu;
> - event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> event->hw.config = cfg;
> event->hw.idx = bit;
>
> @@ -397,12 +440,14 @@ RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
> RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
> RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
> RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
> +RAPL_EVENT_ATTR_STR(energy-core, rapl_core, "event=0x01");
>
> RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
> RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
> +RAPL_EVENT_ATTR_STR(energy-core.unit, rapl_core_unit, "Joules");
>
> /*
> * we compute in 0.23 nJ increments regardless of MSR
> @@ -412,6 +457,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890
> RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
> RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
> RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
> +RAPL_EVENT_ATTR_STR(energy-core.scale, rapl_core_scale, "2.3283064365386962890625e-10");
>
> /*
> * There are no default events, but we need to create
> @@ -444,6 +490,12 @@ static const struct attribute_group *rapl_attr_groups[] = {
> NULL,
> };
>
> +static const struct attribute_group *rapl_core_attr_groups[] = {
> + &rapl_pmu_format_group,
> + &rapl_pmu_events_group,
> + NULL,
> +};
> +
> static struct attribute *rapl_events_cores[] = {
> EVENT_PTR(rapl_cores),
> EVENT_PTR(rapl_cores_unit),
> @@ -504,6 +556,18 @@ static struct attribute_group rapl_events_psys_group = {
> .attrs = rapl_events_psys,
> };
>
> +static struct attribute *rapl_events_core[] = {
> + EVENT_PTR(rapl_core),
> + EVENT_PTR(rapl_core_unit),
> + EVENT_PTR(rapl_core_scale),
> + NULL,
> +};
> +
> +static struct attribute_group rapl_events_core_group = {
> + .name = "events",
> + .attrs = rapl_events_core,
> +};
> +
> static bool test_msr(int idx, void *data)
> {
> return test_bit(idx, (unsigned long *) data);
> @@ -541,6 +605,11 @@ static struct perf_msr amd_rapl_pkg_msrs[] = {
> [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
> };
>
> +static struct perf_msr amd_rapl_core_msrs[] = {
> + [PERF_RAPL_CORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_core_group,
> + test_msr, false, RAPL_MSR_MASK },
> +};
> +
> static int rapl_check_hw_unit(void)
> {
> u64 msr_rapl_power_unit_bits;
> @@ -552,6 +621,8 @@ static int rapl_check_hw_unit(void)
> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
> rapl_pkg_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
>
> + rapl_core_hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
> +
> switch (rapl_model->unit_quirk) {
> /*
> * DRAM domain on HSW server and KNL has fixed energy unit which can be
> @@ -570,7 +641,6 @@ static int rapl_check_hw_unit(void)
> break;
> }
>
> -
> /*
> * Calculate the timer rate:
> * Use reference of 200W for scaling the timeout to avoid counter
> @@ -589,9 +659,13 @@ static int rapl_check_hw_unit(void)
> static void __init rapl_advertise(void)
> {
> int i;
> + int num_counters = hweight32(rapl_pmus_pkg->cntr_mask);
> +
> + if (rapl_pmus_core)
> + num_counters += hweight32(rapl_pmus_core->cntr_mask);
>
> pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
> - hweight32(rapl_pmus_pkg->cntr_mask), rapl_timer_ms);
> + num_counters, rapl_timer_ms);
>
> for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++) {
> if (rapl_pmus_pkg->cntr_mask & (1 << i)) {
> @@ -599,6 +673,10 @@ static void __init rapl_advertise(void)
> rapl_pkg_domain_names[i], rapl_pkg_hw_unit[i]);
> }
> }
> +
> + if (rapl_pmus_core && (rapl_pmus_core->cntr_mask & (1 << PERF_RAPL_CORE)))
> + pr_info("hw unit of domain %s 2^-%d Joules\n",
> + rapl_core_domain_name, rapl_core_hw_unit);
> }
>
> static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
> @@ -619,6 +697,10 @@ static const struct attribute_group *rapl_attr_update[] = {
> NULL,
> };
>
> +static const struct attribute_group *rapl_core_attr_update[] = {
> + &rapl_events_core_group,
> +};
> +
> static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
> {
> struct rapl_pmu *rapl_pmu;
> @@ -645,13 +727,22 @@ static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)
> return -ENOMEM;
> }
>
> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope)
> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int rapl_pmu_scope,
> + const struct attribute_group **rapl_attr_groups,
> + const struct attribute_group **rapl_attr_update)
> {
> int nr_rapl_pmu = topology_max_packages();
> struct rapl_pmus *rapl_pmus;
>
> + /*
> + * rapl_pmu_scope must be either PKG, DIE or CORE
> + */
> if (rapl_pmu_scope == PERF_PMU_SCOPE_DIE)
> nr_rapl_pmu *= topology_max_dies_per_package();
> + else if (rapl_pmu_scope == PERF_PMU_SCOPE_CORE)
> + nr_rapl_pmu *= topology_num_cores_per_package();
> + else if (rapl_pmu_scope != PERF_PMU_SCOPE_PKG)
> + return -EINVAL;
>
> rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
> if (!rapl_pmus)
> @@ -740,8 +831,10 @@ static struct rapl_model model_spr = {
>
> static struct rapl_model model_amd_hygon = {
> .pkg_events = BIT(PERF_RAPL_PKG),
> + .core_events = BIT(PERF_RAPL_CORE),
> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
> .rapl_pkg_msrs = amd_rapl_pkg_msrs,
> + .rapl_core_msrs = amd_rapl_core_msrs,
> };
>
> static const struct x86_cpu_id rapl_model_match[] __initconst = {
> @@ -813,7 +906,8 @@ static int __init rapl_pmu_init(void)
> if (ret)
> return ret;
>
> - ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope);
> + ret = init_rapl_pmus(&rapl_pmus_pkg, rapl_pkg_pmu_scope, rapl_attr_groups,
> + rapl_attr_update);
> if (ret)
> return ret;
>
> @@ -825,6 +919,27 @@ static int __init rapl_pmu_init(void)
> if (ret)
> goto out;
>
> + if (rapl_model->core_events) {
> + ret = init_rapl_pmus(&rapl_pmus_core, PERF_PMU_SCOPE_CORE,
> + rapl_core_attr_groups,
> + rapl_core_attr_update);
> + if (ret) {
> + pr_warn("power-core PMU initialization failed (%d)\n", ret);
> + goto core_init_failed;
> + }
> +
> + rapl_pmus_core->cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
> + PERF_RAPL_CORE_EVENTS_MAX, false,
> + (void *) &rapl_model->core_events);
> +
> + ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_core", -1);
> + if (ret) {
> + pr_warn("power-core PMU registration failed (%d)\n", ret);
> + cleanup_rapl_pmus(rapl_pmus_core);
> + }
> + }
> +
> +core_init_failed:
> rapl_advertise();
> return 0;
>
> @@ -837,6 +952,10 @@ module_init(rapl_pmu_init);
>
> static void __exit intel_rapl_exit(void)
> {
> + if (rapl_pmus_core) {
> + perf_pmu_unregister(&rapl_pmus_core->pmu);
> + cleanup_rapl_pmus(rapl_pmus_core);
> + }
> perf_pmu_unregister(&rapl_pmus_pkg->pmu);
> cleanup_rapl_pmus(rapl_pmus_pkg);
> }
> --
> 2.34.1
>
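The semantics Koichiro's reordering relies on: perf core's perf_init_event() walks every registered PMU, treating -ENOENT from an init callback as "not my event, try the next PMU", while any other error aborts event creation outright. A minimal userspace sketch of that walk — the stub callbacks and the integer "type" standing in for event->attr.type are hypothetical, not kernel code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical PMU init callbacks; type 1 plays the role of a
 * task-clock-like event that the RAPL PMU should not claim. */
static int rapl_strict_init(int type)   /* v7 ordering: -EINVAL first */
{
	(void)type;
	return -EINVAL;
}

static int rapl_enoent_init(int type)   /* proposed: type check first */
{
	return (type == 100) ? 0 : -ENOENT;
}

static int task_clock_init(int type)    /* fallback software PMU */
{
	return (type == 1) ? 0 : -ENOENT;
}

/* Simplified model of the perf_init_event() list walk: -ENOENT means
 * "try the next PMU"; any other error code aborts the whole walk. */
static int scan_pmus(int (*pmus[])(int), size_t n, int type)
{
	for (size_t i = 0; i < n; i++) {
		int ret = pmus[i](type);

		if (!ret)
			return 0;	/* this PMU accepted the event */
		if (ret != -ENOENT)
			return ret;	/* hard error: no fallback possible */
	}
	return -ENOENT;
}
```

With the strict callback first in the list, scan_pmus() returns -EINVAL for a task-clock-like event and task_clock_init() is never reached; with the -ENOENT variant the walk continues and the event lands on the software PMU — the fallback that evsel__fallback() depends on.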
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2025-01-12 13:42 ` Koichiro Den
@ 2025-01-13 6:34 ` Dhananjay Ugwekar
2025-01-15 14:23 ` Koichiro Den
0 siblings, 1 reply; 26+ messages in thread
From: Dhananjay Ugwekar @ 2025-01-13 6:34 UTC (permalink / raw)
To: Koichiro Den
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On 1/12/2025 7:12 PM, Koichiro Den wrote:
> On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
>> Add a new "power_core" PMU and "energy-core" event for monitoring
>> energy consumption by each individual core. The existing energy-cores
>> event aggregates the energy consumption of CPU cores at the package level.
>> This new event aligns with the AMD's per-core energy counters.
>>
>> Tested the package level and core level PMU counters with workloads
>> pinned to different CPUs.
>>
>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
>> machine:
>>
>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
>> stress-ng: info: [21250] setting to a 5 second run per stressor
>> stress-ng: info: [21250] dispatching hogs: 1 matrix
>> stress-ng: info: [21250] successful run completed in 5.00s
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
>> ---
>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
>> 1 file changed, 152 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>> index 6e51386ff91f..e9be1f31163d 100644
>> --- a/arch/x86/events/rapl.c
>> +++ b/arch/x86/events/rapl.c
>> @@ -39,6 +39,10 @@
>> * event: rapl_energy_psys
>> * perf code: 0x5
>> *
>> + * core counter: consumption of a single physical core
>> + * event: rapl_energy_core (power_core PMU)
>> + * perf code: 0x1
>> + *
>> * We manage those counters as free running (read-only). They may be
>> * use simultaneously by other tools, such as turbostat.
>> *
>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
>> };
>>
>> +#define PERF_RAPL_CORE 0 /* single core */
>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
>> +
>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
>> "pp0-core",
>> "package",
>> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
>> "psys",
>> };
>>
>> +static const char *const rapl_core_domain_name __initconst = "core";
>> +
>> /*
>> * event code: LSB 8 bits, passed in attr->config
>> * any other bit is reserved
>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>>
>> struct rapl_model {
>> struct perf_msr *rapl_pkg_msrs;
>> + struct perf_msr *rapl_core_msrs;
>> unsigned long pkg_events;
>> + unsigned long core_events;
>> unsigned int msr_power_unit;
>> enum rapl_unit_quirk unit_quirk;
>> };
>>
>> /* 1/2^hw_unit Joule */
>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
>> +static int rapl_core_hw_unit __read_mostly;
>> static struct rapl_pmus *rapl_pmus_pkg;
>> +static struct rapl_pmus *rapl_pmus_core;
>> static u64 rapl_timer_ms;
>> static struct rapl_model *rapl_model;
>>
>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
>> * Helper function to get the correct topology id according to the
>> * RAPL PMU scope.
>> */
>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
>> -{ /*
>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
>> +{
>> + /*
>> * Returns unsigned int, which converts the '-1' return value
>> * (for non-existent mappings in topology map) to UINT_MAX, so
>> * the error check in the caller is simplified.
>> */
>> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>> - topology_logical_die_id(cpu);
>> + switch (scope) {
>> + case PERF_PMU_SCOPE_PKG:
>> + return topology_logical_package_id(cpu);
>> + case PERF_PMU_SCOPE_DIE:
>> + return topology_logical_die_id(cpu);
>> + case PERF_PMU_SCOPE_CORE:
>> + return topology_logical_core_id(cpu);
>> + default:
>> + return -EINVAL;
>> + }
>> }
>>
>> static inline u64 rapl_read_counter(struct perf_event *event)
>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
>> return raw;
>> }
>>
>> -static inline u64 rapl_scale(u64 v, int cfg)
>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
>> {
>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
>> - return v;
>> - }
>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
>> +
>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
>> + hw_unit = rapl_core_hw_unit;
>> +
>> /*
>> * scale delta to smallest unit (1/2^32)
>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
>> * or use ldexp(count, -32).
>> * Watts = Joules/Time delta
>> */
>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
>> + return v << (32 - hw_unit);
>> }
>>
>> static u64 rapl_event_update(struct perf_event *event)
>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
>> delta >>= shift;
>>
>> - sdelta = rapl_scale(delta, event->hw.config);
>> + sdelta = rapl_scale(delta, event);
>>
>> local64_add(sdelta, &event->count);
>>
>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
>> static int rapl_pmu_event_init(struct perf_event *event)
>> {
>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>> - int bit, ret = 0;
>> + int bit, rapl_pmus_scope, ret = 0;
>> struct rapl_pmu *rapl_pmu;
>> unsigned int rapl_pmu_idx;
>> + struct rapl_pmus *rapl_pmus;
>>
>> - /* only look at RAPL events */
>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>> - return -ENOENT;
>> + /* unsupported modes and filters */
>> + if (event->attr.sample_period) /* no sampling */
>> + return -EINVAL;
>
> Hi Dhananjay,
>
> On linux-next, since this commit, it seems that simple sampling with 'perf
> record -- <command>' (i.e. the default event), 'perf top' etc. can
> unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
> of -ENOENT even in such cases of a type mismatch. I observed that this
> prevents evsel__fallback() from falling back to cpu-clock or task-clock.
>
> Should we reorder the checks in rapl_pmu_event_init() to allow an early
> return with -ENOENT in such cases, as shown below? I'm not very familiar
> with this area and I might be missing something. I'd appreciate it if you
> could share your thoughts.
>
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> unsigned int rapl_pmu_idx;
> struct rapl_pmus *rapl_pmus;
>
> - /* unsupported modes and filters */
> - if (event->attr.sample_period) /* no sampling */
> - return -EINVAL;
> -
> - /* check only supported bits are set */
> - if (event->attr.config & ~RAPL_EVENT_MASK)
> - return -EINVAL;
> -
> - if (event->cpu < 0)
> - return -EINVAL;
> -
> rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> if (!rapl_pmus)
> return -EINVAL;
> @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
> } else
> return -EINVAL;
>
> + /* unsupported modes and filters */
> + if (event->attr.sample_period) /* no sampling */
> + return -EINVAL;
> +
> + /* check only supported bits are set */
> + if (event->attr.config & ~RAPL_EVENT_MASK)
> + return -EINVAL;
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> /* check event supported */
> if (!(rapl_pmus->cntr_mask & (1 << bit)))
> return -EINVAL;
>
> Thanks.
>
> -Koichiro
Hello Koichiro,
I tried reproducing the issue you mentioned using the "sudo perf record -- sleep 2" and
"sudo perf top" commands on an AMD EPYC system; both commands worked successfully.
Can you please mention which system you're on and the exact commands you ran to
reproduce the issue?
My analysis is that if we run "perf record/top" with the default event, we would
never enter rapl_pmu_event_init(), which would render the reordering of the type
checks irrelevant. Regardless, please let me know how I can reproduce the issue.
Thanks,
Dhananjay
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2025-01-13 6:34 ` Dhananjay Ugwekar
@ 2025-01-15 14:23 ` Koichiro Den
2025-01-20 11:42 ` Dhananjay Ugwekar
0 siblings, 1 reply; 26+ messages in thread
From: Koichiro Den @ 2025-01-15 14:23 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On Mon, Jan 13, 2025 at 12:04:43PM GMT, Dhananjay Ugwekar wrote:
> On 1/12/2025 7:12 PM, Koichiro Den wrote:
> > On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
> >> Add a new "power_core" PMU and "energy-core" event for monitoring
> >> energy consumption by each individual core. The existing energy-cores
> >> event aggregates the energy consumption of CPU cores at the package level.
> >> This new event aligns with the AMD's per-core energy counters.
> >>
> >> Tested the package level and core level PMU counters with workloads
> >> pinned to different CPUs.
> >>
> >> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
> >> machine:
> >>
> >> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
> >> stress-ng: info: [21250] setting to a 5 second run per stressor
> >> stress-ng: info: [21250] dispatching hogs: 1 matrix
> >> stress-ng: info: [21250] successful run completed in 5.00s
> >>
> >> Performance counter stats for 'system wide':
> >>
> >> S0-D0-C0 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C1 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C2 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C3 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C4 1 8.43 Joules power_core/energy-core/
> >> S0-D0-C5 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C6 1 0.00 Joules power_core/energy-core/
> >> S0-D0-C7 1 0.00 Joules power_core/energy-core/
> >> S0-D1-C8 1 0.00 Joules power_core/energy-core/
> >> S0-D1-C9 1 0.00 Joules power_core/energy-core/
> >> S0-D1-C10 1 0.00 Joules power_core/energy-core/
> >>
> >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> >> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
> >> ---
> >> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 152 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> >> index 6e51386ff91f..e9be1f31163d 100644
> >> --- a/arch/x86/events/rapl.c
> >> +++ b/arch/x86/events/rapl.c
> >> @@ -39,6 +39,10 @@
> >> * event: rapl_energy_psys
> >> * perf code: 0x5
> >> *
> >> + * core counter: consumption of a single physical core
> >> + * event: rapl_energy_core (power_core PMU)
> >> + * perf code: 0x1
> >> + *
> >> * We manage those counters as free running (read-only). They may be
> >> * use simultaneously by other tools, such as turbostat.
> >> *
> >> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
> >> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
> >> };
> >>
> >> +#define PERF_RAPL_CORE 0 /* single core */
> >> +#define PERF_RAPL_CORE_EVENTS_MAX 1
> >> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
> >> +
> >> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
> >> "pp0-core",
> >> "package",
> >> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
> >> "psys",
> >> };
> >>
> >> +static const char *const rapl_core_domain_name __initconst = "core";
> >> +
> >> /*
> >> * event code: LSB 8 bits, passed in attr->config
> >> * any other bit is reserved
> >> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
> >>
> >> struct rapl_model {
> >> struct perf_msr *rapl_pkg_msrs;
> >> + struct perf_msr *rapl_core_msrs;
> >> unsigned long pkg_events;
> >> + unsigned long core_events;
> >> unsigned int msr_power_unit;
> >> enum rapl_unit_quirk unit_quirk;
> >> };
> >>
> >> /* 1/2^hw_unit Joule */
> >> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> >> +static int rapl_core_hw_unit __read_mostly;
> >> static struct rapl_pmus *rapl_pmus_pkg;
> >> +static struct rapl_pmus *rapl_pmus_core;
> >> static u64 rapl_timer_ms;
> >> static struct rapl_model *rapl_model;
> >>
> >> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
> >> * Helper function to get the correct topology id according to the
> >> * RAPL PMU scope.
> >> */
> >> -static inline unsigned int get_rapl_pmu_idx(int cpu)
> >> -{ /*
> >> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
> >> +{
> >> + /*
> >> * Returns unsigned int, which converts the '-1' return value
> >> * (for non-existent mappings in topology map) to UINT_MAX, so
> >> * the error check in the caller is simplified.
> >> */
> >> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> >> - topology_logical_die_id(cpu);
> >> + switch (scope) {
> >> + case PERF_PMU_SCOPE_PKG:
> >> + return topology_logical_package_id(cpu);
> >> + case PERF_PMU_SCOPE_DIE:
> >> + return topology_logical_die_id(cpu);
> >> + case PERF_PMU_SCOPE_CORE:
> >> + return topology_logical_core_id(cpu);
> >> + default:
> >> + return -EINVAL;
> >> + }
> >> }
> >>
> >> static inline u64 rapl_read_counter(struct perf_event *event)
> >> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
> >> return raw;
> >> }
> >>
> >> -static inline u64 rapl_scale(u64 v, int cfg)
> >> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
> >> {
> >> - if (cfg > NR_RAPL_PKG_DOMAINS) {
> >> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
> >> - return v;
> >> - }
> >> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
> >> +
> >> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
> >> + hw_unit = rapl_core_hw_unit;
> >> +
> >> /*
> >> * scale delta to smallest unit (1/2^32)
> >> * users must then scale back: count * 1/(1e9*2^32) to get Joules
> >> * or use ldexp(count, -32).
> >> * Watts = Joules/Time delta
> >> */
> >> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
> >> + return v << (32 - hw_unit);
> >> }
> >>
> >> static u64 rapl_event_update(struct perf_event *event)
> >> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
> >> delta = (new_raw_count << shift) - (prev_raw_count << shift);
> >> delta >>= shift;
> >>
> >> - sdelta = rapl_scale(delta, event->hw.config);
> >> + sdelta = rapl_scale(delta, event);
> >>
> >> local64_add(sdelta, &event->count);
> >>
> >> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
> >> static int rapl_pmu_event_init(struct perf_event *event)
> >> {
> >> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> >> - int bit, ret = 0;
> >> + int bit, rapl_pmus_scope, ret = 0;
> >> struct rapl_pmu *rapl_pmu;
> >> unsigned int rapl_pmu_idx;
> >> + struct rapl_pmus *rapl_pmus;
> >>
> >> - /* only look at RAPL events */
> >> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> >> - return -ENOENT;
> >> + /* unsupported modes and filters */
> >> + if (event->attr.sample_period) /* no sampling */
> >> + return -EINVAL;
> >
> > Hi Dhananjay,
> >
> > On linux-next, since this commit, it seems that simple sampling with 'perf
> > record -- <command>' (i.e. the default event), 'perf top' etc. can
> > unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
> > of -ENOENT even in such cases of a type mismatch. I observed that this
> > prevents evsel__fallback() from falling back to cpu-clock or task-clock.
> >
> > Should we reorder the checks in rapl_pmu_event_init() to allow an early
> > return with -ENOENT in such cases, as shown below? I'm not very familiar
> > with this area and I might be missing something. I'd appreciate it if you
> > could share your thoughts.
> >
> > --- a/arch/x86/events/rapl.c
> > +++ b/arch/x86/events/rapl.c
> > @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> > unsigned int rapl_pmu_idx;
> > struct rapl_pmus *rapl_pmus;
> >
> > - /* unsupported modes and filters */
> > - if (event->attr.sample_period) /* no sampling */
> > - return -EINVAL;
> > -
> > - /* check only supported bits are set */
> > - if (event->attr.config & ~RAPL_EVENT_MASK)
> > - return -EINVAL;
> > -
> > - if (event->cpu < 0)
> > - return -EINVAL;
> > -
> > rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> > if (!rapl_pmus)
> > return -EINVAL;
> > @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
> > } else
> > return -EINVAL;
> >
> > + /* unsupported modes and filters */
> > + if (event->attr.sample_period) /* no sampling */
> > + return -EINVAL;
> > +
> > + /* check only supported bits are set */
> > + if (event->attr.config & ~RAPL_EVENT_MASK)
> > + return -EINVAL;
> > +
> > + if (event->cpu < 0)
> > + return -EINVAL;
> > +
> > /* check event supported */
> > if (!(rapl_pmus->cntr_mask & (1 << bit)))
> > return -EINVAL;
> >
> > Thanks.
> >
> > -Koichiro
>
> Hello Koichiro,
>
> I tried reproducing the issue you mentioned using "sudo perf record -- sleep 2" and
> "sudo perf top" commands on an AMD EPYC system, the commands worked successfully.
> Can you please mention which system and which exact commands you're
> running that reproduced the issue?
>
> My analysis is, if we are running "perf record/top" with the default event, we would
> not enter the rapl_pmu_event_init() function, which renders the reordering of the type
> checks irrelevant. Regardless, please let me know how I can reproduce the issue.
>
> Thanks,
> Dhananjay
Hi,
Apologies for the delayed response, and thank you for your comment. I
confirmed that just running "perf top" on a QEMU instance reproduces it.
The host CPU model is Intel Core i9-13900K, which is passed through to
the guest.
In my case, no pmu for PERF_TYPE_RAW is registered, but the rapl pmu is
present. Then, perf_init_event() reaches the line marked "---->" below, and
rapl_pmu_event_init() runs and returns -EINVAL before the type check.
static struct pmu *perf_init_event(struct perf_event *event)
{
[...]
if (pmu) {
[...]
goto unlock;
}
list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
----> ret = perf_try_init_event(pmu, event);
if (!ret)
goto unlock;
if (ret != -ENOENT) {
pmu = ERR_PTR(ret);
goto unlock;
}
}
I'll look into this a bit more on my side later and get back to you if
something becomes clear.
Thanks,
-Koichiro
^ permalink raw reply [flat|nested] 26+ messages in thread

* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2025-01-15 14:23 ` Koichiro Den
@ 2025-01-20 11:42 ` Dhananjay Ugwekar
2025-01-28 8:11 ` Dhananjay Ugwekar
0 siblings, 1 reply; 26+ messages in thread
From: Dhananjay Ugwekar @ 2025-01-20 11:42 UTC (permalink / raw)
To: Koichiro Den
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On 1/15/2025 7:53 PM, Koichiro Den wrote:
> On Mon, Jan 13, 2025 at 12:04:43PM GMT, Dhananjay Ugwekar wrote:
>> On 1/12/2025 7:12 PM, Koichiro Den wrote:
>>> On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
>>>> Add a new "power_core" PMU and "energy-core" event for monitoring
>>>> energy consumption by each individual core. The existing energy-cores
>>>> event aggregates the energy consumption of CPU cores at the package level.
>>>> This new event aligns with the AMD's per-core energy counters.
>>>>
>>>> Tested the package level and core level PMU counters with workloads
>>>> pinned to different CPUs.
>>>>
>>>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
>>>> machine:
>>>>
>>>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
>>>> stress-ng: info: [21250] setting to a 5 second run per stressor
>>>> stress-ng: info: [21250] dispatching hogs: 1 matrix
>>>> stress-ng: info: [21250] successful run completed in 5.00s
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
>>>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
>>>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
>>>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
>>>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
>>>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>>>>
>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>>>> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
>>>> ---
>>>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
>>>> 1 file changed, 152 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>>>> index 6e51386ff91f..e9be1f31163d 100644
>>>> --- a/arch/x86/events/rapl.c
>>>> +++ b/arch/x86/events/rapl.c
>>>> @@ -39,6 +39,10 @@
>>>> * event: rapl_energy_psys
>>>> * perf code: 0x5
>>>> *
>>>> + * core counter: consumption of a single physical core
>>>> + * event: rapl_energy_core (power_core PMU)
>>>> + * perf code: 0x1
>>>> + *
>>>> * We manage those counters as free running (read-only). They may be
>>>> * use simultaneously by other tools, such as turbostat.
>>>> *
>>>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
>>>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
>>>> };
>>>>
>>>> +#define PERF_RAPL_CORE 0 /* single core */
>>>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
>>>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
>>>> +
>>>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
>>>> "pp0-core",
>>>> "package",
>>>> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
>>>> "psys",
>>>> };
>>>>
>>>> +static const char *const rapl_core_domain_name __initconst = "core";
>>>> +
>>>> /*
>>>> * event code: LSB 8 bits, passed in attr->config
>>>> * any other bit is reserved
>>>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>>>>
>>>> struct rapl_model {
>>>> struct perf_msr *rapl_pkg_msrs;
>>>> + struct perf_msr *rapl_core_msrs;
>>>> unsigned long pkg_events;
>>>> + unsigned long core_events;
>>>> unsigned int msr_power_unit;
>>>> enum rapl_unit_quirk unit_quirk;
>>>> };
>>>>
>>>> /* 1/2^hw_unit Joule */
>>>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
>>>> +static int rapl_core_hw_unit __read_mostly;
>>>> static struct rapl_pmus *rapl_pmus_pkg;
>>>> +static struct rapl_pmus *rapl_pmus_core;
>>>> static u64 rapl_timer_ms;
>>>> static struct rapl_model *rapl_model;
>>>>
>>>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
>>>> * Helper function to get the correct topology id according to the
>>>> * RAPL PMU scope.
>>>> */
>>>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
>>>> -{ /*
>>>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
>>>> +{
>>>> + /*
>>>> * Returns unsigned int, which converts the '-1' return value
>>>> * (for non-existent mappings in topology map) to UINT_MAX, so
>>>> * the error check in the caller is simplified.
>>>> */
>>>> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>>>> - topology_logical_die_id(cpu);
>>>> + switch (scope) {
>>>> + case PERF_PMU_SCOPE_PKG:
>>>> + return topology_logical_package_id(cpu);
>>>> + case PERF_PMU_SCOPE_DIE:
>>>> + return topology_logical_die_id(cpu);
>>>> + case PERF_PMU_SCOPE_CORE:
>>>> + return topology_logical_core_id(cpu);
>>>> + default:
>>>> + return -EINVAL;
>>>> + }
>>>> }
>>>>
>>>> static inline u64 rapl_read_counter(struct perf_event *event)
>>>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
>>>> return raw;
>>>> }
>>>>
>>>> -static inline u64 rapl_scale(u64 v, int cfg)
>>>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
>>>> {
>>>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
>>>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
>>>> - return v;
>>>> - }
>>>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
>>>> +
>>>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
>>>> + hw_unit = rapl_core_hw_unit;
>>>> +
>>>> /*
>>>> * scale delta to smallest unit (1/2^32)
>>>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
>>>> * or use ldexp(count, -32).
>>>> * Watts = Joules/Time delta
>>>> */
>>>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
>>>> + return v << (32 - hw_unit);
>>>> }
>>>>
>>>> static u64 rapl_event_update(struct perf_event *event)
>>>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
>>>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
>>>> delta >>= shift;
>>>>
>>>> - sdelta = rapl_scale(delta, event->hw.config);
>>>> + sdelta = rapl_scale(delta, event);
>>>>
>>>> local64_add(sdelta, &event->count);
>>>>
>>>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
>>>> static int rapl_pmu_event_init(struct perf_event *event)
>>>> {
>>>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>>> - int bit, ret = 0;
>>>> + int bit, rapl_pmus_scope, ret = 0;
>>>> struct rapl_pmu *rapl_pmu;
>>>> unsigned int rapl_pmu_idx;
>>>> + struct rapl_pmus *rapl_pmus;
>>>>
>>>> - /* only look at RAPL events */
>>>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>>>> - return -ENOENT;
>>>> + /* unsupported modes and filters */
>>>> + if (event->attr.sample_period) /* no sampling */
>>>> + return -EINVAL;
>>>
>>> Hi Dhananjay,
>>>
>>> On linux-next, since this commit, it seems that simple sampling with 'perf
>>> record -- <command>' (i.e. the default event), 'perf top' etc. can
>>> unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
>>> of -ENOENT even in such cases of a type mismatch. I observed that this
>>> prevents evsel__fallback() from falling back to cpu-clock or task-clock.
>>>
>>> Should we reorder the checks in rapl_pmu_event_init() to allow an early
>>> return with -ENOENT in such cases, as shown below? I'm not very familiar
>>> with this area and I might be missing something. I'd appreciate it if you
>>> could share your thoughts.
>>>
>>> --- a/arch/x86/events/rapl.c
>>> +++ b/arch/x86/events/rapl.c
>>> @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>> unsigned int rapl_pmu_idx;
>>> struct rapl_pmus *rapl_pmus;
>>>
>>> - /* unsupported modes and filters */
>>> - if (event->attr.sample_period) /* no sampling */
>>> - return -EINVAL;
>>> -
>>> - /* check only supported bits are set */
>>> - if (event->attr.config & ~RAPL_EVENT_MASK)
>>> - return -EINVAL;
>>> -
>>> - if (event->cpu < 0)
>>> - return -EINVAL;
>>> -
>>> rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
>>> if (!rapl_pmus)
>>> return -EINVAL;
>>> @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>> } else
>>> return -EINVAL;
>>>
>>> + /* unsupported modes and filters */
>>> + if (event->attr.sample_period) /* no sampling */
>>> + return -EINVAL;
>>> +
>>> + /* check only supported bits are set */
>>> + if (event->attr.config & ~RAPL_EVENT_MASK)
>>> + return -EINVAL;
>>> +
>>> + if (event->cpu < 0)
>>> + return -EINVAL;
>>> +
>>> /* check event supported */
>>> if (!(rapl_pmus->cntr_mask & (1 << bit)))
>>> return -EINVAL;
>>>
>>> Thanks.
>>>
>>> -Koichiro
>>
>> Hello Koichiro,
>>
>> I tried reproducing the issue you mentioned using "sudo perf record -- sleep 2" and
>> "sudo perf top" commands on an AMD EPYC system, the commands worked successfully.
>> Can you please mention which system and which exact commands you're
>> running that reproduced the issue?
>>
>> My analysis is, if we are running "perf record/top" with the default event, we would
>> not enter the rapl_pmu_event_init() function, which renders the reordering of the type
>> checks irrelevant. Regardless, please let me know how I can reproduce the issue.
>>
>> Thanks,
>> Dhananjay
>
> Hi,
>
> Apologies for the delayed response, and thank you for your comment. I
> confirmed that just running "perf top" on a qemu instance reproduces it.
> The host CPU model is Intel Core i9-13900K, which is passed through to
> the guest.
>
> In my case, no pmu for PERF_TYPE_RAW is registered, but the rapl pmu is
> present. Then, perf_init_event() reaches the line marked "---->" below, and
> rapl_pmu_event_init() runs, which returns -EINVAL before the type check.
>
> static struct pmu *perf_init_event(struct perf_event *event)
> {
> [...]
> if (pmu) {
> [...]
> goto unlock;
> }
>
> list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
> ----> ret = perf_try_init_event(pmu, event);
> if (!ret)
> goto unlock;
>
> if (ret != -ENOENT) {
> pmu = ERR_PTR(ret);
> goto unlock;
> }
> }
>
> I'll look into this a bit more on my side later and get back to you if
> something becomes clear.

Sorry for the delayed response, can you please try the below diff and let me know
if it fixes the issue?

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index d3bb3865c1b1..4952faf03e82 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -370,6 +370,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	unsigned int rapl_pmu_idx;
 	struct rapl_pmus *rapl_pmus;
 
+	/* only look at RAPL events */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
 	/* unsupported modes and filters */
 	if (event->attr.sample_period) /* no sampling */
 		return -EINVAL;
@@ -387,10 +391,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 	rapl_pmus_scope = rapl_pmus->pmu.scope;
 
 	if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
-		/* only look at RAPL package events */
-		if (event->attr.type != rapl_pmus_pkg->pmu.type)
-			return -ENOENT;
-
 		cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
 		if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
 			return -EINVAL;
@@ -398,10 +398,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		bit = cfg - 1;
 		event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
 	} else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
-		/* only look at RAPL core events */
-		if (event->attr.type != rapl_pmus_core->pmu.type)
-			return -ENOENT;
-
 		cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
 		if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
 			return -EINVAL;
>
> Thanks,
>
> -Koichiro
^ permalink raw reply related [flat|nested] 26+ messages in thread

* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2025-01-20 11:42 ` Dhananjay Ugwekar
@ 2025-01-28 8:11 ` Dhananjay Ugwekar
2025-01-29 3:03 ` Koichiro Den
0 siblings, 1 reply; 26+ messages in thread
From: Dhananjay Ugwekar @ 2025-01-28 8:11 UTC (permalink / raw)
To: Koichiro Den
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On 1/20/2025 5:12 PM, Dhananjay Ugwekar wrote:
> On 1/15/2025 7:53 PM, Koichiro Den wrote:
>> On Mon, Jan 13, 2025 at 12:04:43PM GMT, Dhananjay Ugwekar wrote:
>>> On 1/12/2025 7:12 PM, Koichiro Den wrote:
>>>> On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
>>>>> Add a new "power_core" PMU and "energy-core" event for monitoring
>>>>> energy consumption by each individual core. The existing energy-cores
>>>>> event aggregates the energy consumption of CPU cores at the package level.
>>>>> This new event aligns with the AMD's per-core energy counters.
>>>>>
>>>>> Tested the package level and core level PMU counters with workloads
>>>>> pinned to different CPUs.
>>>>>
>>>>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
>>>>> machine:
>>>>>
>>>>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
>>>>> stress-ng: info: [21250] setting to a 5 second run per stressor
>>>>> stress-ng: info: [21250] dispatching hogs: 1 matrix
>>>>> stress-ng: info: [21250] successful run completed in 5.00s
>>>>>
>>>>> Performance counter stats for 'system wide':
>>>>>
>>>>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
>>>>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
>>>>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
>>>>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
>>>>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
>>>>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>>>>>
>>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>>>>> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
>>>>> ---
>>>>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
>>>>> 1 file changed, 152 insertions(+), 33 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>>>>> index 6e51386ff91f..e9be1f31163d 100644
>>>>> --- a/arch/x86/events/rapl.c
>>>>> +++ b/arch/x86/events/rapl.c
>>>>> @@ -39,6 +39,10 @@
>>>>> * event: rapl_energy_psys
>>>>> * perf code: 0x5
>>>>> *
>>>>> + * core counter: consumption of a single physical core
>>>>> + * event: rapl_energy_core (power_core PMU)
>>>>> + * perf code: 0x1
>>>>> + *
>>>>> * We manage those counters as free running (read-only). They may be
>>>>> * use simultaneously by other tools, such as turbostat.
>>>>> *
>>>>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
>>>>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
>>>>> };
>>>>>
>>>>> +#define PERF_RAPL_CORE 0 /* single core */
>>>>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
>>>>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
>>>>> +
>>>>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
>>>>> "pp0-core",
>>>>> "package",
>>>>> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
>>>>> "psys",
>>>>> };
>>>>>
>>>>> +static const char *const rapl_core_domain_name __initconst = "core";
>>>>> +
>>>>> /*
>>>>> * event code: LSB 8 bits, passed in attr->config
>>>>> * any other bit is reserved
>>>>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>>>>>
>>>>> struct rapl_model {
>>>>> struct perf_msr *rapl_pkg_msrs;
>>>>> + struct perf_msr *rapl_core_msrs;
>>>>> unsigned long pkg_events;
>>>>> + unsigned long core_events;
>>>>> unsigned int msr_power_unit;
>>>>> enum rapl_unit_quirk unit_quirk;
>>>>> };
>>>>>
>>>>> /* 1/2^hw_unit Joule */
>>>>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
>>>>> +static int rapl_core_hw_unit __read_mostly;
>>>>> static struct rapl_pmus *rapl_pmus_pkg;
>>>>> +static struct rapl_pmus *rapl_pmus_core;
>>>>> static u64 rapl_timer_ms;
>>>>> static struct rapl_model *rapl_model;
>>>>>
>>>>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
>>>>> * Helper function to get the correct topology id according to the
>>>>> * RAPL PMU scope.
>>>>> */
>>>>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
>>>>> -{ /*
>>>>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
>>>>> +{
>>>>> + /*
>>>>> * Returns unsigned int, which converts the '-1' return value
>>>>> * (for non-existent mappings in topology map) to UINT_MAX, so
>>>>> * the error check in the caller is simplified.
>>>>> */
>>>>> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>>>>> - topology_logical_die_id(cpu);
>>>>> + switch (scope) {
>>>>> + case PERF_PMU_SCOPE_PKG:
>>>>> + return topology_logical_package_id(cpu);
>>>>> + case PERF_PMU_SCOPE_DIE:
>>>>> + return topology_logical_die_id(cpu);
>>>>> + case PERF_PMU_SCOPE_CORE:
>>>>> + return topology_logical_core_id(cpu);
>>>>> + default:
>>>>> + return -EINVAL;
>>>>> + }
>>>>> }
>>>>>
>>>>> static inline u64 rapl_read_counter(struct perf_event *event)
>>>>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
>>>>> return raw;
>>>>> }
>>>>>
>>>>> -static inline u64 rapl_scale(u64 v, int cfg)
>>>>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
>>>>> {
>>>>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
>>>>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
>>>>> - return v;
>>>>> - }
>>>>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
>>>>> +
>>>>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
>>>>> + hw_unit = rapl_core_hw_unit;
>>>>> +
>>>>> /*
>>>>> * scale delta to smallest unit (1/2^32)
>>>>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
>>>>> * or use ldexp(count, -32).
>>>>> * Watts = Joules/Time delta
>>>>> */
>>>>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
>>>>> + return v << (32 - hw_unit);
>>>>> }
>>>>>
>>>>> static u64 rapl_event_update(struct perf_event *event)
>>>>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
>>>>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
>>>>> delta >>= shift;
>>>>>
>>>>> - sdelta = rapl_scale(delta, event->hw.config);
>>>>> + sdelta = rapl_scale(delta, event);
>>>>>
>>>>> local64_add(sdelta, &event->count);
>>>>>
>>>>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
>>>>> static int rapl_pmu_event_init(struct perf_event *event)
>>>>> {
>>>>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>>>> - int bit, ret = 0;
>>>>> + int bit, rapl_pmus_scope, ret = 0;
>>>>> struct rapl_pmu *rapl_pmu;
>>>>> unsigned int rapl_pmu_idx;
>>>>> + struct rapl_pmus *rapl_pmus;
>>>>>
>>>>> - /* only look at RAPL events */
>>>>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>>>>> - return -ENOENT;
>>>>> + /* unsupported modes and filters */
>>>>> + if (event->attr.sample_period) /* no sampling */
>>>>> + return -EINVAL;
>>>>
>>>> Hi Dhananjay,
>>>>
>>>> On linux-next, since this commit, it seems that simple sampling with 'perf
>>>> record -- <command>' (i.e. the default event), 'perf top' etc. can
>>>> unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
>>>> of -ENOENT even in such cases of a type mismatch. I observed that this
>>>> prevents evsel__fallback() from falling back to cpu-clock or task-clock.
>>>>
>>>> Should we reorder the checks in rapl_pmu_event_init() to allow an early
>>>> return with -ENOENT in such cases, as shown below? I'm not very familiar
>>>> with this area and I might be missing something. I'd appreciate it if you
>>>> could share your thoughts.
>>>>
>>>> --- a/arch/x86/events/rapl.c
>>>> +++ b/arch/x86/events/rapl.c
>>>> @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>>> unsigned int rapl_pmu_idx;
>>>> struct rapl_pmus *rapl_pmus;
>>>>
>>>> - /* unsupported modes and filters */
>>>> - if (event->attr.sample_period) /* no sampling */
>>>> - return -EINVAL;
>>>> -
>>>> - /* check only supported bits are set */
>>>> - if (event->attr.config & ~RAPL_EVENT_MASK)
>>>> - return -EINVAL;
>>>> -
>>>> - if (event->cpu < 0)
>>>> - return -EINVAL;
>>>> -
>>>> rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
>>>> if (!rapl_pmus)
>>>> return -EINVAL;
>>>> @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>>> } else
>>>> return -EINVAL;
>>>>
>>>> + /* unsupported modes and filters */
>>>> + if (event->attr.sample_period) /* no sampling */
>>>> + return -EINVAL;
>>>> +
>>>> + /* check only supported bits are set */
>>>> + if (event->attr.config & ~RAPL_EVENT_MASK)
>>>> + return -EINVAL;
>>>> +
>>>> + if (event->cpu < 0)
>>>> + return -EINVAL;
>>>> +
>>>> /* check event supported */
>>>> if (!(rapl_pmus->cntr_mask & (1 << bit)))
>>>> return -EINVAL;
>>>>
>>>> Thanks.
>>>>
>>>> -Koichiro
>>>
>>> Hello Koichiro,
>>>
>>> I tried reproducing the issue you mentioned using "sudo perf record -- sleep 2" and
>>> "sudo perf top" commands on an AMD EPYC system, the commands worked successfully.
>>> Can you please mention which system and which exact commands you're
>>> running that reproduced the issue?
>>>
>>> My analysis is, if we are running "perf record/top" with the default event, we would
>>> not enter the rapl_pmu_event_init() function, which renders the reordering of the type
>>> checks irrelevant. Regardless, please let me know how I can reproduce the issue.
>>>
>>> Thanks,
>>> Dhananjay
>>
>> Hi,
>>
>> Apologies for the delayed response, and thank you for your comment. I
>> confirmed that just running "perf top" on a qemu instance reproduces it.
>> The host CPU model is Intel Core i9-13900K, which is passed through to
>> the guest.
>>
>> In my case, no pmu for PERF_TYPE_RAW is registered, but the rapl pmu is
>> present. Then, perf_init_event() reaches the line marked "---->" below, and
>> rapl_pmu_event_init() runs, which returns -EINVAL before the type check.
>>
>> static struct pmu *perf_init_event(struct perf_event *event)
>> {
>> [...]
>> if (pmu) {
>> [...]
>> goto unlock;
>> }
>>
>> list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
>> ----> ret = perf_try_init_event(pmu, event);
>> if (!ret)
>> goto unlock;
>>
>> if (ret != -ENOENT) {
>> pmu = ERR_PTR(ret);
>> goto unlock;
>> }
>> }
>>
>> I'll look into this a bit more on my side later and get back to you if
>> something becomes clear.
>
> Sorry for the delayed response, can you please try the below diff and let me know
> if it fixes the issue?
Hello Koichiro,

Gentle ping, please let me know once you try out the below fix.

Thanks,
Dhananjay
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index d3bb3865c1b1..4952faf03e82 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -370,6 +370,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
> unsigned int rapl_pmu_idx;
> struct rapl_pmus *rapl_pmus;
>
> + /* only look at RAPL events */
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> /* unsupported modes and filters */
> if (event->attr.sample_period) /* no sampling */
> return -EINVAL;
> @@ -387,10 +391,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> rapl_pmus_scope = rapl_pmus->pmu.scope;
>
> if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
> - /* only look at RAPL package events */
> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> - return -ENOENT;
> -
> cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> return -EINVAL;
> @@ -398,10 +398,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> bit = cfg - 1;
> event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
> - /* only look at RAPL core events */
> - if (event->attr.type != rapl_pmus_core->pmu.type)
> - return -ENOENT;
> -
> cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
> if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> return -EINVAL;
>
>>
>> Thanks,
>>
>> -Koichiro
>
^ permalink raw reply [flat|nested] 26+ messages in thread

* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
2025-01-28 8:11 ` Dhananjay Ugwekar
@ 2025-01-29 3:03 ` Koichiro Den
2025-01-29 5:34 ` Dhananjay Ugwekar
0 siblings, 1 reply; 26+ messages in thread
From: Koichiro Den @ 2025-01-29 3:03 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On Tue, Jan 28, 2025 at 01:41:51PM GMT, Dhananjay Ugwekar wrote:
> On 1/20/2025 5:12 PM, Dhananjay Ugwekar wrote:
> > On 1/15/2025 7:53 PM, Koichiro Den wrote:
> >> On Mon, Jan 13, 2025 at 12:04:43PM GMT, Dhananjay Ugwekar wrote:
> >>> On 1/12/2025 7:12 PM, Koichiro Den wrote:
> >>>> On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
> >>>>> Add a new "power_core" PMU and "energy-core" event for monitoring
> >>>>> energy consumption by each individual core. The existing energy-cores
> >>>>> event aggregates the energy consumption of CPU cores at the package level.
> >>>>> This new event aligns with the AMD's per-core energy counters.
> >>>>>
> >>>>> Tested the package level and core level PMU counters with workloads
> >>>>> pinned to different CPUs.
> >>>>>
> >>>>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
> >>>>> machine:
> >>>>>
> >>>>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
> >>>>> stress-ng: info: [21250] setting to a 5 second run per stressor
> >>>>> stress-ng: info: [21250] dispatching hogs: 1 matrix
> >>>>> stress-ng: info: [21250] successful run completed in 5.00s
> >>>>>
> >>>>> Performance counter stats for 'system wide':
> >>>>>
> >>>>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
> >>>>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
> >>>>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
> >>>>>
> >>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
> >>>>> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
> >>>>> ---
> >>>>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
> >>>>> 1 file changed, 152 insertions(+), 33 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> >>>>> index 6e51386ff91f..e9be1f31163d 100644
> >>>>> --- a/arch/x86/events/rapl.c
> >>>>> +++ b/arch/x86/events/rapl.c
> >>>>> @@ -39,6 +39,10 @@
> >>>>> * event: rapl_energy_psys
> >>>>> * perf code: 0x5
> >>>>> *
> >>>>> + * core counter: consumption of a single physical core
> >>>>> + * event: rapl_energy_core (power_core PMU)
> >>>>> + * perf code: 0x1
> >>>>> + *
> >>>>> * We manage those counters as free running (read-only). They may be
> >>>>> * use simultaneously by other tools, such as turbostat.
> >>>>> *
> >>>>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
> >>>>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
> >>>>> };
> >>>>>
> >>>>> +#define PERF_RAPL_CORE 0 /* single core */
> >>>>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
> >>>>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
> >>>>> +
> >>>>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
> >>>>> "pp0-core",
> >>>>> "package",
> >>>>> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
> >>>>> "psys",
> >>>>> };
> >>>>>
> >>>>> +static const char *const rapl_core_domain_name __initconst = "core";
> >>>>> +
> >>>>> /*
> >>>>> * event code: LSB 8 bits, passed in attr->config
> >>>>> * any other bit is reserved
> >>>>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
> >>>>>
> >>>>> struct rapl_model {
> >>>>> struct perf_msr *rapl_pkg_msrs;
> >>>>> + struct perf_msr *rapl_core_msrs;
> >>>>> unsigned long pkg_events;
> >>>>> + unsigned long core_events;
> >>>>> unsigned int msr_power_unit;
> >>>>> enum rapl_unit_quirk unit_quirk;
> >>>>> };
> >>>>>
> >>>>> /* 1/2^hw_unit Joule */
> >>>>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
> >>>>> +static int rapl_core_hw_unit __read_mostly;
> >>>>> static struct rapl_pmus *rapl_pmus_pkg;
> >>>>> +static struct rapl_pmus *rapl_pmus_core;
> >>>>> static u64 rapl_timer_ms;
> >>>>> static struct rapl_model *rapl_model;
> >>>>>
> >>>>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
> >>>>> * Helper function to get the correct topology id according to the
> >>>>> * RAPL PMU scope.
> >>>>> */
> >>>>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
> >>>>> -{ /*
> >>>>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
> >>>>> +{
> >>>>> + /*
> >>>>> * Returns unsigned int, which converts the '-1' return value
> >>>>> * (for non-existent mappings in topology map) to UINT_MAX, so
> >>>>> * the error check in the caller is simplified.
> >>>>> */
> >>>>> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
> >>>>> - topology_logical_die_id(cpu);
> >>>>> + switch (scope) {
> >>>>> + case PERF_PMU_SCOPE_PKG:
> >>>>> + return topology_logical_package_id(cpu);
> >>>>> + case PERF_PMU_SCOPE_DIE:
> >>>>> + return topology_logical_die_id(cpu);
> >>>>> + case PERF_PMU_SCOPE_CORE:
> >>>>> + return topology_logical_core_id(cpu);
> >>>>> + default:
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>>> }
> >>>>>
> >>>>> static inline u64 rapl_read_counter(struct perf_event *event)
> >>>>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
> >>>>> return raw;
> >>>>> }
> >>>>>
> >>>>> -static inline u64 rapl_scale(u64 v, int cfg)
> >>>>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
> >>>>> {
> >>>>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
> >>>>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
> >>>>> - return v;
> >>>>> - }
> >>>>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
> >>>>> +
> >>>>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
> >>>>> + hw_unit = rapl_core_hw_unit;
> >>>>> +
> >>>>> /*
> >>>>> * scale delta to smallest unit (1/2^32)
> >>>>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
> >>>>> * or use ldexp(count, -32).
> >>>>> * Watts = Joules/Time delta
> >>>>> */
> >>>>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
> >>>>> + return v << (32 - hw_unit);
> >>>>> }
> >>>>>
> >>>>> static u64 rapl_event_update(struct perf_event *event)
> >>>>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
> >>>>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
> >>>>> delta >>= shift;
> >>>>>
> >>>>> - sdelta = rapl_scale(delta, event->hw.config);
> >>>>> + sdelta = rapl_scale(delta, event);
> >>>>>
> >>>>> local64_add(sdelta, &event->count);
> >>>>>
> >>>>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
> >>>>> static int rapl_pmu_event_init(struct perf_event *event)
> >>>>> {
> >>>>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
> >>>>> - int bit, ret = 0;
> >>>>> + int bit, rapl_pmus_scope, ret = 0;
> >>>>> struct rapl_pmu *rapl_pmu;
> >>>>> unsigned int rapl_pmu_idx;
> >>>>> + struct rapl_pmus *rapl_pmus;
> >>>>>
> >>>>> - /* only look at RAPL events */
> >>>>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> >>>>> - return -ENOENT;
> >>>>> + /* unsupported modes and filters */
> >>>>> + if (event->attr.sample_period) /* no sampling */
> >>>>> + return -EINVAL;
> >>>>
> >>>> Hi Dhananjay,
> >>>>
> >>>> On linux-next, since this commit, it seems that simple sampling with 'perf
> >>>> record -- <command>' (i.e. the default event), 'perf top' etc. can
> >>>> unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
> >>>> of -ENOENT even in such cases of a type mismatch. I observed that this
> >>>> prevents evsel__fallback() from falling back to cpu-clock or task-clock.
> >>>>
> >>>> Should we reorder the checks in rapl_pmu_event_init() to allow an early
> >>>> return with -ENOENT in such cases, as shown below? I'm not very familiar
> >>>> with this area and I might be missing something. I'd appreciate it if you
> >>>> could share your thoughts.
> >>>>
> >>>> --- a/arch/x86/events/rapl.c
> >>>> +++ b/arch/x86/events/rapl.c
> >>>> @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> >>>> unsigned int rapl_pmu_idx;
> >>>> struct rapl_pmus *rapl_pmus;
> >>>>
> >>>> - /* unsupported modes and filters */
> >>>> - if (event->attr.sample_period) /* no sampling */
> >>>> - return -EINVAL;
> >>>> -
> >>>> - /* check only supported bits are set */
> >>>> - if (event->attr.config & ~RAPL_EVENT_MASK)
> >>>> - return -EINVAL;
> >>>> -
> >>>> - if (event->cpu < 0)
> >>>> - return -EINVAL;
> >>>> -
> >>>> rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
> >>>> if (!rapl_pmus)
> >>>> return -EINVAL;
> >>>> @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
> >>>> } else
> >>>> return -EINVAL;
> >>>>
> >>>> + /* unsupported modes and filters */
> >>>> + if (event->attr.sample_period) /* no sampling */
> >>>> + return -EINVAL;
> >>>> +
> >>>> + /* check only supported bits are set */
> >>>> + if (event->attr.config & ~RAPL_EVENT_MASK)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + if (event->cpu < 0)
> >>>> + return -EINVAL;
> >>>> +
> >>>> /* check event supported */
> >>>> if (!(rapl_pmus->cntr_mask & (1 << bit)))
> >>>> return -EINVAL;
> >>>>
> >>>> Thanks.
> >>>>
> >>>> -Koichiro
> >>>
> >>> Hello Koichiro,
> >>>
> >>> I tried reproducing the issue you mentioned using "sudo perf record -- sleep 2" and
> >>> "sudo perf top" commands on an AMD EPYC system; the commands worked successfully.
> >>> Can you please mention which system and which exact commands you're
> >>> running that reproduced the issue?
> >>>
> >>> My analysis is that if we are running "perf record/top" with the default event, we would
> >>> not enter the rapl_pmu_event_init() function, which renders the reordering of the type
> >>> checks irrelevant. Regardless, please let me know how I can reproduce the issue.
> >>>
> >>> Thanks,
> >>> Dhananjay
> >>
> >> Hi,
> >>
> >> Apologies for the delayed response, and thank you for your comment. I
> >> confirmed that just running "perf top" on a qemu instance reproduces it.
> >> The host CPU model is Intel Core i9-13900K, which is passed through to
> >> the guest.
> >>
> >> In my case, no pmu for PERF_TYPE_RAW is registered, but the rapl pmu is
> >> present. Then, perf_init_event() reaches the line marked "---->" below, and
> >> rapl_pmu_event_init() runs, which returns -EINVAL before the type check.
> >>
> >> static struct pmu *perf_init_event(struct perf_event *event)
> >> {
> >> [...]
> >> if (pmu) {
> >> [...]
> >> goto unlock;
> >> }
> >>
> >> list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
> >> ----> ret = perf_try_init_event(pmu, event);
> >> if (!ret)
> >> goto unlock;
> >>
> >> if (ret != -ENOENT) {
> >> pmu = ERR_PTR(ret);
> >> goto unlock;
> >> }
> >> }
> >>
> >> I'll look into this a bit more on my side later and get back to you if
> >> something becomes clear.
> >
> > Sorry for the delayed response, can you please try the below diff and let me know
> > if it fixes the issue?
>
> Hello Koichiro,
>
> Gentle ping, please let me know once you try out the below fix.
Sorry to be late. Yes, it works.
Early return with -ENOENT when event->attr.type != event->pmu->type seems
like a common approach used in other PMU implementations as well.
Thanks,
Koichiro
>
> Thanks,
> Dhananjay
>
> >
> > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> > index d3bb3865c1b1..4952faf03e82 100644
> > --- a/arch/x86/events/rapl.c
> > +++ b/arch/x86/events/rapl.c
> > @@ -370,6 +370,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
> > unsigned int rapl_pmu_idx;
> > struct rapl_pmus *rapl_pmus;
> >
> > + /* only look at RAPL events */
> > + if (event->attr.type != event->pmu->type)
> > + return -ENOENT;
> > +
> > /* unsupported modes and filters */
> > if (event->attr.sample_period) /* no sampling */
> > return -EINVAL;
> > @@ -387,10 +391,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> > rapl_pmus_scope = rapl_pmus->pmu.scope;
> >
> > if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
> > - /* only look at RAPL package events */
> > - if (event->attr.type != rapl_pmus_pkg->pmu.type)
> > - return -ENOENT;
> > -
> > cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
> > if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> > return -EINVAL;
> > @@ -398,10 +398,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
> > bit = cfg - 1;
> > event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
> > } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
> > - /* only look at RAPL core events */
> > - if (event->attr.type != rapl_pmus_core->pmu.type)
> > - return -ENOENT;
> > -
> > cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
> > if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
> > return -EINVAL;
> >
> >>
> >> Thanks,
> >>
> >> -Koichiro
> >
>
^ permalink raw reply [flat|nested] 26+ messages in thread

* Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs
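The rapl_scale() hunk quoted above relies on RAPL's fixed-point convention: hardware reports energy in units of 1/2^hw_unit Joules, perf normalizes every counter delta to the smallest unit (1/2^32 Joules) by left-shifting, and userspace scales back with ldexp(count, -32) to recover Joules. The standalone sketch below models that arithmetic; the helper names are illustrative, not the kernel's.

```c
#include <stdint.h>

/*
 * Model of the rapl_scale() convention: a hardware delta counted in
 * units of 1/2^hw_unit J is re-expressed in units of 1/2^32 J, so that
 * all RAPL counters share one normalized scale regardless of the
 * per-model energy-status unit.
 */
static inline uint64_t rapl_scale_model(uint64_t delta, int hw_unit)
{
	/* delta * 2^(32 - hw_unit) */
	return delta << (32 - hw_unit);
}

/* What userspace does with the normalized count: multiply by 2^-32. */
static inline double rapl_counts_to_joules(uint64_t scaled)
{
	return (double)scaled / 4294967296.0;	/* scaled * 2^-32 */
}
```

With a typical AMD energy-status unit of 2^-16 J (hw_unit = 16), one raw tick becomes 65536 normalized counts, which scales back to exactly 1/65536 J.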
2025-01-29 3:03 ` Koichiro Den
@ 2025-01-29 5:34 ` Dhananjay Ugwekar
0 siblings, 0 replies; 26+ messages in thread
From: Dhananjay Ugwekar @ 2025-01-29 5:34 UTC (permalink / raw)
To: Koichiro Den
Cc: peterz, mingo, rui.zhang, irogers, kan.liang, tglx, bp,
gautham.shenoy, kprateek.nayak, ravi.bangoria, x86,
linux-perf-users, linux-kernel
On 1/29/2025 8:33 AM, Koichiro Den wrote:
> On Tue, Jan 28, 2025 at 01:41:51PM GMT, Dhananjay Ugwekar wrote:
>> On 1/20/2025 5:12 PM, Dhananjay Ugwekar wrote:
>>> On 1/15/2025 7:53 PM, Koichiro Den wrote:
>>>> On Mon, Jan 13, 2025 at 12:04:43PM GMT, Dhananjay Ugwekar wrote:
>>>>> On 1/12/2025 7:12 PM, Koichiro Den wrote:
>>>>>> On Fri, Nov 15, 2024 at 06:08:06AM GMT, Dhananjay Ugwekar wrote:
>>>>>>> Add a new "power_core" PMU and "energy-core" event for monitoring
>>>>>>> energy consumption by each individual core. The existing energy-cores
>>>>>>> event aggregates the energy consumption of CPU cores at the package level.
>>>>>>> This new event aligns with the AMD's per-core energy counters.
>>>>>>>
>>>>>>> Tested the package level and core level PMU counters with workloads
>>>>>>> pinned to different CPUs.
>>>>>>>
>>>>>>> Results with workload pinned to CPU 4 in core 4 on an AMD Zen4 Genoa
>>>>>>> machine:
>>>>>>>
>>>>>>> $ sudo perf stat --per-core -e power_core/energy-core/ -- taskset -c 4 stress-ng --matrix 1 --timeout 5s
>>>>>>> stress-ng: info: [21250] setting to a 5 second run per stressor
>>>>>>> stress-ng: info: [21250] dispatching hogs: 1 matrix
>>>>>>> stress-ng: info: [21250] successful run completed in 5.00s
>>>>>>>
>>>>>>> Performance counter stats for 'system wide':
>>>>>>>
>>>>>>> S0-D0-C0 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C1 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C2 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C3 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C4 1 8.43 Joules power_core/energy-core/
>>>>>>> S0-D0-C5 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C6 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D0-C7 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D1-C8 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D1-C9 1 0.00 Joules power_core/energy-core/
>>>>>>> S0-D1-C10 1 0.00 Joules power_core/energy-core/
>>>>>>>
>>>>>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
>>>>>>> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
>>>>>>> ---
>>>>>>> arch/x86/events/rapl.c | 185 +++++++++++++++++++++++++++++++++--------
>>>>>>> 1 file changed, 152 insertions(+), 33 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>>>>>>> index 6e51386ff91f..e9be1f31163d 100644
>>>>>>> --- a/arch/x86/events/rapl.c
>>>>>>> +++ b/arch/x86/events/rapl.c
>>>>>>> @@ -39,6 +39,10 @@
>>>>>>> * event: rapl_energy_psys
>>>>>>> * perf code: 0x5
>>>>>>> *
>>>>>>> + * core counter: consumption of a single physical core
>>>>>>> + * event: rapl_energy_core (power_core PMU)
>>>>>>> + * perf code: 0x1
>>>>>>> + *
>>>>>>> * We manage those counters as free running (read-only). They may be
>>>>>>> * use simultaneously by other tools, such as turbostat.
>>>>>>> *
>>>>>>> @@ -81,6 +85,10 @@ enum perf_rapl_pkg_events {
>>>>>>> NR_RAPL_PKG_DOMAINS = PERF_RAPL_PKG_EVENTS_MAX,
>>>>>>> };
>>>>>>>
>>>>>>> +#define PERF_RAPL_CORE 0 /* single core */
>>>>>>> +#define PERF_RAPL_CORE_EVENTS_MAX 1
>>>>>>> +#define NR_RAPL_CORE_DOMAINS PERF_RAPL_CORE_EVENTS_MAX
>>>>>>> +
>>>>>>> static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst = {
>>>>>>> "pp0-core",
>>>>>>> "package",
>>>>>>> @@ -89,6 +97,8 @@ static const char *const rapl_pkg_domain_names[NR_RAPL_PKG_DOMAINS] __initconst
>>>>>>> "psys",
>>>>>>> };
>>>>>>>
>>>>>>> +static const char *const rapl_core_domain_name __initconst = "core";
>>>>>>> +
>>>>>>> /*
>>>>>>> * event code: LSB 8 bits, passed in attr->config
>>>>>>> * any other bit is reserved
>>>>>>> @@ -141,14 +151,18 @@ enum rapl_unit_quirk {
>>>>>>>
>>>>>>> struct rapl_model {
>>>>>>> struct perf_msr *rapl_pkg_msrs;
>>>>>>> + struct perf_msr *rapl_core_msrs;
>>>>>>> unsigned long pkg_events;
>>>>>>> + unsigned long core_events;
>>>>>>> unsigned int msr_power_unit;
>>>>>>> enum rapl_unit_quirk unit_quirk;
>>>>>>> };
>>>>>>>
>>>>>>> /* 1/2^hw_unit Joule */
>>>>>>> static int rapl_pkg_hw_unit[NR_RAPL_PKG_DOMAINS] __read_mostly;
>>>>>>> +static int rapl_core_hw_unit __read_mostly;
>>>>>>> static struct rapl_pmus *rapl_pmus_pkg;
>>>>>>> +static struct rapl_pmus *rapl_pmus_core;
>>>>>>> static u64 rapl_timer_ms;
>>>>>>> static struct rapl_model *rapl_model;
>>>>>>>
>>>>>>> @@ -156,14 +170,23 @@ static struct rapl_model *rapl_model;
>>>>>>> * Helper function to get the correct topology id according to the
>>>>>>> * RAPL PMU scope.
>>>>>>> */
>>>>>>> -static inline unsigned int get_rapl_pmu_idx(int cpu)
>>>>>>> -{ /*
>>>>>>> +static inline unsigned int get_rapl_pmu_idx(int cpu, int scope)
>>>>>>> +{
>>>>>>> + /*
>>>>>>> * Returns unsigned int, which converts the '-1' return value
>>>>>>> * (for non-existent mappings in topology map) to UINT_MAX, so
>>>>>>> * the error check in the caller is simplified.
>>>>>>> */
>>>>>>> - return rapl_pkg_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
>>>>>>> - topology_logical_die_id(cpu);
>>>>>>> + switch (scope) {
>>>>>>> + case PERF_PMU_SCOPE_PKG:
>>>>>>> + return topology_logical_package_id(cpu);
>>>>>>> + case PERF_PMU_SCOPE_DIE:
>>>>>>> + return topology_logical_die_id(cpu);
>>>>>>> + case PERF_PMU_SCOPE_CORE:
>>>>>>> + return topology_logical_core_id(cpu);
>>>>>>> + default:
>>>>>>> + return -EINVAL;
>>>>>>> + }
>>>>>>> }
>>>>>>>
>>>>>>> static inline u64 rapl_read_counter(struct perf_event *event)
>>>>>>> @@ -173,19 +196,20 @@ static inline u64 rapl_read_counter(struct perf_event *event)
>>>>>>> return raw;
>>>>>>> }
>>>>>>>
>>>>>>> -static inline u64 rapl_scale(u64 v, int cfg)
>>>>>>> +static inline u64 rapl_scale(u64 v, struct perf_event *event)
>>>>>>> {
>>>>>>> - if (cfg > NR_RAPL_PKG_DOMAINS) {
>>>>>>> - pr_warn("Invalid domain %d, failed to scale data\n", cfg);
>>>>>>> - return v;
>>>>>>> - }
>>>>>>> + int hw_unit = rapl_pkg_hw_unit[event->hw.config - 1];
>>>>>>> +
>>>>>>> + if (event->pmu->scope == PERF_PMU_SCOPE_CORE)
>>>>>>> + hw_unit = rapl_core_hw_unit;
>>>>>>> +
>>>>>>> /*
>>>>>>> * scale delta to smallest unit (1/2^32)
>>>>>>> * users must then scale back: count * 1/(1e9*2^32) to get Joules
>>>>>>> * or use ldexp(count, -32).
>>>>>>> * Watts = Joules/Time delta
>>>>>>> */
>>>>>>> - return v << (32 - rapl_pkg_hw_unit[cfg - 1]);
>>>>>>> + return v << (32 - hw_unit);
>>>>>>> }
>>>>>>>
>>>>>>> static u64 rapl_event_update(struct perf_event *event)
>>>>>>> @@ -212,7 +236,7 @@ static u64 rapl_event_update(struct perf_event *event)
>>>>>>> delta = (new_raw_count << shift) - (prev_raw_count << shift);
>>>>>>> delta >>= shift;
>>>>>>>
>>>>>>> - sdelta = rapl_scale(delta, event->hw.config);
>>>>>>> + sdelta = rapl_scale(delta, event);
>>>>>>>
>>>>>>> local64_add(sdelta, &event->count);
>>>>>>>
>>>>>>> @@ -341,13 +365,14 @@ static void rapl_pmu_event_del(struct perf_event *event, int flags)
>>>>>>> static int rapl_pmu_event_init(struct perf_event *event)
>>>>>>> {
>>>>>>> u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>>>>>> - int bit, ret = 0;
>>>>>>> + int bit, rapl_pmus_scope, ret = 0;
>>>>>>> struct rapl_pmu *rapl_pmu;
>>>>>>> unsigned int rapl_pmu_idx;
>>>>>>> + struct rapl_pmus *rapl_pmus;
>>>>>>>
>>>>>>> - /* only look at RAPL events */
>>>>>>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>>>>>>> - return -ENOENT;
>>>>>>> + /* unsupported modes and filters */
>>>>>>> + if (event->attr.sample_period) /* no sampling */
>>>>>>> + return -EINVAL;
>>>>>>
>>>>>> Hi Dhananjay,
>>>>>>
>>>>>> On linux-next, since this commit, it seems that simple sampling with 'perf
>>>>>> record -- <command>' (i.e. the default event), 'perf top' etc. can
>>>>>> unexpectedly fail because rapl_pmu_event_init() now returns -EINVAL instead
>>>>>> of -ENOENT even in such cases of a type mismatch. I observed that this
>>>>>> prevents evsel__fallback() from falling back to cpu-clock or task-clock.
>>>>>>
>>>>>> Should we reorder the checks in rapl_pmu_event_init() to allow an early
>>>>>> return with -ENOENT in such cases, as shown below? I'm not very familiar
>>>>>> with this area and I might be missing something. I'd appreciate it if you
>>>>>> could share your thoughts.
>>>>>>
>>>>>> --- a/arch/x86/events/rapl.c
>>>>>> +++ b/arch/x86/events/rapl.c
>>>>>> @@ -370,17 +370,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>>>>> unsigned int rapl_pmu_idx;
>>>>>> struct rapl_pmus *rapl_pmus;
>>>>>>
>>>>>> - /* unsupported modes and filters */
>>>>>> - if (event->attr.sample_period) /* no sampling */
>>>>>> - return -EINVAL;
>>>>>> -
>>>>>> - /* check only supported bits are set */
>>>>>> - if (event->attr.config & ~RAPL_EVENT_MASK)
>>>>>> - return -EINVAL;
>>>>>> -
>>>>>> - if (event->cpu < 0)
>>>>>> - return -EINVAL;
>>>>>> -
>>>>>> rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);
>>>>>> if (!rapl_pmus)
>>>>>> return -EINVAL;
>>>>>> @@ -411,6 +400,17 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>>>>> } else
>>>>>> return -EINVAL;
>>>>>>
>>>>>> + /* unsupported modes and filters */
>>>>>> + if (event->attr.sample_period) /* no sampling */
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>> + /* check only supported bits are set */
>>>>>> + if (event->attr.config & ~RAPL_EVENT_MASK)
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>> + if (event->cpu < 0)
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>> /* check event supported */
>>>>>> if (!(rapl_pmus->cntr_mask & (1 << bit)))
>>>>>> return -EINVAL;
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> -Koichiro
>>>>>
>>>>> Hello Koichiro,
>>>>>
>>>>> I tried reproducing the issue you mentioned using "sudo perf record -- sleep 2" and
> >>>>> "sudo perf top" commands on an AMD EPYC system; the commands worked successfully.
>>>>> Can you please mention which system and which exact commands you're
>>>>> running that reproduced the issue?
>>>>>
> >>>>> My analysis is that if we are running "perf record/top" with the default event, we would
>>>>> not enter the rapl_pmu_event_init() function, which renders the reordering of the type
>>>>> checks irrelevant. Regardless, please let me know how I can reproduce the issue.
>>>>>
>>>>> Thanks,
>>>>> Dhananjay
>>>>
>>>> Hi,
>>>>
>>>> Apologies for the delayed response, and thank you for your comment. I
>>>> confirmed that just running "perf top" on a qemu instance reproduces it.
>>>> The host CPU model is Intel Core i9-13900K, which is passed through to
>>>> the guest.
>>>>
>>>> In my case, no pmu for PERF_TYPE_RAW is registered, but the rapl pmu is
>>>> present. Then, perf_init_event() reaches the line marked "---->" below, and
> >>>> rapl_pmu_event_init() runs, which returns -EINVAL before the type check.
>>>>
>>>> static struct pmu *perf_init_event(struct perf_event *event)
>>>> {
>>>> [...]
>>>> if (pmu) {
>>>> [...]
>>>> goto unlock;
>>>> }
>>>>
>>>> list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
>>>> ----> ret = perf_try_init_event(pmu, event);
>>>> if (!ret)
>>>> goto unlock;
>>>>
>>>> if (ret != -ENOENT) {
>>>> pmu = ERR_PTR(ret);
>>>> goto unlock;
>>>> }
>>>> }
>>>>
>>>> I'll look into this a bit more on my side later and get back to you if
>>>> something becomes clear.
>>>
>>> Sorry for the delayed response, can you please try the below diff and let me know
>>> if it fixes the issue?
>>
>> Hello Koichiro,
>>
>> Gentle ping, please let me know once you try out the below fix.
>
> Sorry to be late. Yes, it works.
> Early return with -ENOENT when event->attr.type != event->pmu->type seems
> like a common approach used in other PMU implementations as well.
Great! I will post this as a fix for the last patch.
Thanks for your help!
Regards,
Dhananjay
>
> Thanks,
> Koichiro
>
>>
>> Thanks,
>> Dhananjay
>>
>>>
>>> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
>>> index d3bb3865c1b1..4952faf03e82 100644
>>> --- a/arch/x86/events/rapl.c
>>> +++ b/arch/x86/events/rapl.c
>>> @@ -370,6 +370,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>> unsigned int rapl_pmu_idx;
>>> struct rapl_pmus *rapl_pmus;
>>>
>>> + /* only look at RAPL events */
>>> + if (event->attr.type != event->pmu->type)
>>> + return -ENOENT;
>>> +
>>> /* unsupported modes and filters */
>>> if (event->attr.sample_period) /* no sampling */
>>> return -EINVAL;
>>> @@ -387,10 +391,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>> rapl_pmus_scope = rapl_pmus->pmu.scope;
>>>
>>> if (rapl_pmus_scope == PERF_PMU_SCOPE_PKG || rapl_pmus_scope == PERF_PMU_SCOPE_DIE) {
>>> - /* only look at RAPL package events */
>>> - if (event->attr.type != rapl_pmus_pkg->pmu.type)
>>> - return -ENOENT;
>>> -
>>> cfg = array_index_nospec((long)cfg, NR_RAPL_PKG_DOMAINS + 1);
>>> if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
>>> return -EINVAL;
>>> @@ -398,10 +398,6 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>> bit = cfg - 1;
>>> event->hw.event_base = rapl_model->rapl_pkg_msrs[bit].msr;
>>> } else if (rapl_pmus_scope == PERF_PMU_SCOPE_CORE) {
>>> - /* only look at RAPL core events */
>>> - if (event->attr.type != rapl_pmus_core->pmu.type)
>>> - return -ENOENT;
>>> -
>>> cfg = array_index_nospec((long)cfg, NR_RAPL_CORE_DOMAINS + 1);
>>> if (!cfg || cfg >= NR_RAPL_PKG_DOMAINS + 1)
>>> return -EINVAL;
>>>
>>>>
>>>> Thanks,
>>>>
>>>> -Koichiro
>>>
>>
^ permalink raw reply [flat|nested] 26+ messages in thread
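The fix agreed on in the message above hinges on an error-code contract of the perf core's PMU search loop: an event_init callback must return -ENOENT for "not my event type" so perf_init_event() keeps probing other PMUs, while any other error (such as -EINVAL) aborts the whole search. The toy model below illustrates why checking sample_period before the type check broke perf top's fallback; all names are hypothetical and only the error-code contract mirrors the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for a PMU with an event_init callback. */
struct toy_pmu {
	int type;
	int (*event_init)(const struct toy_pmu *pmu, int ev_type, int sample_period);
};

/* Fixed ordering: type mismatches yield -ENOENT, so the search continues. */
static int init_type_check_first(const struct toy_pmu *pmu, int ev_type, int sample_period)
{
	if (ev_type != pmu->type)
		return -ENOENT;
	if (sample_period)		/* RAPL-style counter: no sampling */
		return -EINVAL;
	return 0;
}

/* Buggy ordering: the sampling check fires before the type check. */
static int init_sampling_check_first(const struct toy_pmu *pmu, int ev_type, int sample_period)
{
	if (sample_period)
		return -EINVAL;
	if (ev_type != pmu->type)
		return -ENOENT;
	return 0;
}

/*
 * Mimics the perf_init_event() loop quoted earlier in the thread:
 * returns the matching PMU's type on success, -ENOENT if nobody claims
 * the event, or the first hard error, which aborts the probe.
 */
static int find_pmu(const struct toy_pmu *pmus, size_t n, int ev_type, int sample_period)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = pmus[i].event_init(&pmus[i], ev_type, sample_period);
		if (!ret)
			return pmus[i].type;
		if (ret != -ENOENT)	/* hard error: stop probing */
			return ret;
	}
	return -ENOENT;
}
```

A sampling event of a type no PMU provides (like perf top's default event on the reporter's guest) gets a hard -EINVAL from the buggy ordering, so the caller cannot fall back to cpu-clock; with the type check first it gets -ENOENT and the fallback works.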
* Re: [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs
2024-11-15 6:07 [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs Dhananjay Ugwekar
` (9 preceding siblings ...)
2024-11-15 6:08 ` [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs Dhananjay Ugwekar
@ 2024-11-19 12:20 ` Peter Zijlstra
2024-11-22 6:34 ` Dhananjay Ugwekar
10 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2024-11-19 12:20 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: mingo, rui.zhang, irogers, kan.liang, tglx, bp, gautham.shenoy,
kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
On Fri, Nov 15, 2024 at 06:07:56AM +0000, Dhananjay Ugwekar wrote:
>
> Dhananjay Ugwekar (9):
> perf/x86/rapl: Remove the unused get_rapl_pmu_cpumask() function
> perf/x86/rapl: Remove the cpu_to_rapl_pmu() function
> perf/x86/rapl: Rename rapl_pmu variables
> perf/x86/rapl: Make rapl_model struct global
> perf/x86/rapl: Add arguments to the init and cleanup functions
> perf/x86/rapl: Modify the generic variable names to *_pkg*
> perf/x86/rapl: Remove the global variable rapl_msrs
> perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
> perf/x86/rapl: Add core energy counter support for AMD CPUs
>
> K Prateek Nayak (1):
> x86/topology: Introduce topology_logical_core_id()
>
> Documentation/arch/x86/topology.rst | 4 +
> arch/x86/events/rapl.c | 414 ++++++++++++++++----------
> arch/x86/include/asm/processor.h | 1 +
> arch/x86/include/asm/topology.h | 1 +
> arch/x86/kernel/cpu/debugfs.c | 1 +
> arch/x86/kernel/cpu/topology_common.c | 1 +
> 6 files changed, 272 insertions(+), 150 deletions(-)
Thanks, I'll stick them in a tree post -rc1.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs
2024-11-19 12:20 ` [PATCH v7 00/10] Add RAPL " Peter Zijlstra
@ 2024-11-22 6:34 ` Dhananjay Ugwekar
2024-11-22 11:19 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Dhananjay Ugwekar @ 2024-11-22 6:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, rui.zhang, irogers, kan.liang, tglx, bp, gautham.shenoy,
kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
On 11/19/2024 5:50 PM, Peter Zijlstra wrote:
> On Fri, Nov 15, 2024 at 06:07:56AM +0000, Dhananjay Ugwekar wrote:
>>
>> Dhananjay Ugwekar (9):
>> perf/x86/rapl: Remove the unused get_rapl_pmu_cpumask() function
>> perf/x86/rapl: Remove the cpu_to_rapl_pmu() function
>> perf/x86/rapl: Rename rapl_pmu variables
>> perf/x86/rapl: Make rapl_model struct global
>> perf/x86/rapl: Add arguments to the init and cleanup functions
>> perf/x86/rapl: Modify the generic variable names to *_pkg*
>> perf/x86/rapl: Remove the global variable rapl_msrs
>> perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
>> perf/x86/rapl: Add core energy counter support for AMD CPUs
>>
>> K Prateek Nayak (1):
>> x86/topology: Introduce topology_logical_core_id()
>>
>> Documentation/arch/x86/topology.rst | 4 +
>> arch/x86/events/rapl.c | 414 ++++++++++++++++----------
>> arch/x86/include/asm/processor.h | 1 +
>> arch/x86/include/asm/topology.h | 1 +
>> arch/x86/kernel/cpu/debugfs.c | 1 +
>> arch/x86/kernel/cpu/topology_common.c | 1 +
>> 6 files changed, 272 insertions(+), 150 deletions(-)
>
> Thanks, I'll stick them in a tree post -rc1.
Thanks Peter! There was a bug reported by Peter Jung; it is fixed by a diff that
I posted as a reply. Could you please squash the diff into the last patch of the
series before merging, or let me know if you prefer me to post the squashed patch
separately.
Regards,
Dhananjay
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v7 00/10] Add RAPL core energy counter support for AMD CPUs
2024-11-22 6:34 ` Dhananjay Ugwekar
@ 2024-11-22 11:19 ` Peter Zijlstra
0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2024-11-22 11:19 UTC (permalink / raw)
To: Dhananjay Ugwekar
Cc: mingo, rui.zhang, irogers, kan.liang, tglx, bp, gautham.shenoy,
kprateek.nayak, ravi.bangoria, x86, linux-perf-users,
linux-kernel
On Fri, Nov 22, 2024 at 12:04:25PM +0530, Dhananjay Ugwekar wrote:
> On 11/19/2024 5:50 PM, Peter Zijlstra wrote:
> > On Fri, Nov 15, 2024 at 06:07:56AM +0000, Dhananjay Ugwekar wrote:
> >>
> >> Dhananjay Ugwekar (9):
> >> perf/x86/rapl: Remove the unused get_rapl_pmu_cpumask() function
> >> perf/x86/rapl: Remove the cpu_to_rapl_pmu() function
> >> perf/x86/rapl: Rename rapl_pmu variables
> >> perf/x86/rapl: Make rapl_model struct global
> >> perf/x86/rapl: Add arguments to the init and cleanup functions
> >> perf/x86/rapl: Modify the generic variable names to *_pkg*
> >> perf/x86/rapl: Remove the global variable rapl_msrs
> >> perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
> >> perf/x86/rapl: Add core energy counter support for AMD CPUs
> >>
> >> K Prateek Nayak (1):
> >> x86/topology: Introduce topology_logical_core_id()
> >>
> >> Documentation/arch/x86/topology.rst | 4 +
> >> arch/x86/events/rapl.c | 414 ++++++++++++++++----------
> >> arch/x86/include/asm/processor.h | 1 +
> >> arch/x86/include/asm/topology.h | 1 +
> >> arch/x86/kernel/cpu/debugfs.c | 1 +
> >> arch/x86/kernel/cpu/topology_common.c | 1 +
> >> 6 files changed, 272 insertions(+), 150 deletions(-)
> >
> > Thanks, I'll stick them in a tree post -rc1.
>
> Thanks Peter!, there was a bug reported by Peter Jung, it is fixed by a diff that
> I posted as a reply, could you please squash the diff into the last patch of the
> series before merging or let me know if you prefer me to post the squashed patch
> separately.
Already squashed. Thanks!
^ permalink raw reply [flat|nested] 26+ messages in thread
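One detail worth noting from the get_rapl_pmu_idx() hunks quoted earlier in the thread: the topology_logical_*_id() helpers return -1 for CPUs with no mapping, and the function deliberately returns unsigned int so that -1 becomes UINT_MAX and a single range check in the caller also catches the error case. A minimal sketch of that trick, with hypothetical names:

```c
#include <limits.h>

/*
 * Model of the "return unsigned int" trick: a topology lookup that
 * failed with -1 wraps to UINT_MAX on conversion, so the caller's
 * ordinary upper-bound check rejects it without a separate error path.
 */
static unsigned int toy_rapl_pmu_idx(int topo_id)
{
	return topo_id;			/* -1 becomes UINT_MAX */
}

static int toy_rapl_pmu_idx_valid(unsigned int idx, unsigned int nr_pmus)
{
	return idx < nr_pmus;		/* rejects out-of-range ids and UINT_MAX alike */
}
```

This is why the comment in the patch says "the error check in the caller is simplified": one comparison covers both a missing topology mapping and an index beyond the allocated PMU array.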