linux-perf-users.vger.kernel.org archive mirror
* [PATCH 1/3] perf/x86: Fix low freq setting issue
@ 2025-01-17 15:19 kan.liang
  2025-01-17 15:19 ` [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD kan.liang
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: kan.liang @ 2025-01-17 15:19 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel,
	linux-perf-users
  Cc: ak, ravi.bangoria, jolsa, Kan Liang, stable

From: Kan Liang <kan.liang@linux.intel.com>

Perf doesn't work with a low frequency:

perf record -e cpu_core/instructions/ppp -F 120
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cpu_core/instructions/ppp).
"dmesg | grep -i perf" may provide additional information.

The limit_period() check prevents a low sampling period on a counter;
it is not intended to limit the frequency.
The check in x86_pmu_hw_config() should apply only to the non-freq
mode. The attr.sample_period and attr.sample_freq fields share a
union, so attr.sample_period must not be used to indicate the freq
mode.

Fixes: c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7b6430e5a77b..20ad5cca6ad2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -630,7 +630,7 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type == event->pmu->type)
 		event->hw.config |= x86_pmu_get_event_config(event);
 
-	if (event->attr.sample_period && x86_pmu.limit_period) {
+	if (!event->attr.freq && x86_pmu.limit_period) {
 		s64 left = event->attr.sample_period;
 		x86_pmu.limit_period(event, &left);
 		if (left > event->attr.sample_period)
-- 
2.38.1



* [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD
  2025-01-17 15:19 [PATCH 1/3] perf/x86: Fix low freq setting issue kan.liang
@ 2025-01-17 15:19 ` kan.liang
  2025-01-20  4:59   ` Ravi Bangoria
  2025-01-17 15:19 ` [RESEND PATCH 3/3] perf/x86/intel: New start period for the freq mode kan.liang
  2025-01-20  5:00 ` [PATCH 1/3] perf/x86: Fix low freq setting issue Ravi Bangoria
  2 siblings, 1 reply; 5+ messages in thread
From: kan.liang @ 2025-01-17 15:19 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel,
	linux-perf-users
  Cc: ak, ravi.bangoria, jolsa, Kan Liang, stable

From: Kan Liang <kan.liang@linux.intel.com>

A low frequency cannot be set via IOC_PERIOD on some platforms.

perf_event_check_period(), introduced in commit 81ec3f3c4c4d
("perf/x86: Add check_period PMU callback"), was intended to check the
period, not the frequency. A low frequency may be mistakenly rejected
by limit_period().

Fixes: 81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: stable@vger.kernel.org
---
 kernel/events/core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f91ba29048ce..a9a04d4f3619 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5960,14 +5960,15 @@ static int _perf_event_period(struct perf_event *event, u64 value)
 	if (!value)
 		return -EINVAL;
 
-	if (event->attr.freq && value > sysctl_perf_event_sample_rate)
-		return -EINVAL;
-
-	if (perf_event_check_period(event, value))
-		return -EINVAL;
-
-	if (!event->attr.freq && (value & (1ULL << 63)))
-		return -EINVAL;
+	if (event->attr.freq) {
+		if (value > sysctl_perf_event_sample_rate)
+			return -EINVAL;
+	} else {
+		if (perf_event_check_period(event, value))
+			return -EINVAL;
+		if (value & (1ULL << 63))
+			return -EINVAL;
+	}
 
 	event_function_call(event, __perf_event_period, &value);
 
-- 
2.38.1



* [RESEND PATCH 3/3] perf/x86/intel: New start period for the freq mode
  2025-01-17 15:19 [PATCH 1/3] perf/x86: Fix low freq setting issue kan.liang
  2025-01-17 15:19 ` [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD kan.liang
@ 2025-01-17 15:19 ` kan.liang
  2025-01-20  5:00 ` [PATCH 1/3] perf/x86: Fix low freq setting issue Ravi Bangoria
  2 siblings, 0 replies; 5+ messages in thread
From: kan.liang @ 2025-01-17 15:19 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, irogers, linux-kernel,
	linux-perf-users
  Cc: ak, ravi.bangoria, jolsa, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The freq mode is the current default mode of Linux perf. A period of 1
is used as the start period, and the period is auto-adjusted on each
tick or overflow to meet the frequency target.

The start period of 1 is too low and may trigger some issues:
- Many HWs do not support a period of 1 well.
  https://lore.kernel.org/lkml/875xs2oh69.ffs@tglx/
- For an event that occurs frequently, a period of 1 is far from the
  real period, so lots of samples are generated at the beginning and
  the distribution of samples may be uneven.
- A low start period for a frequently occurring event also challenges
  virtualization, which has a longer path to handle a PMI.

The limit_period callback only checks the minimum acceptable value for
the HW. It cannot be used to set the start period, because some events
may need a very low period, so limit_period cannot be set too high and
doesn't help with events that occur frequently.

It's hard to find a universal start period for all events. The idea is
to give an estimate only for the popular HW and HW cache events, and to
start from the lowest possible recommended value for the rest.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---

The patch was sent last year, but not merged.
https://lore.kernel.org/lkml/20241022130414.2493923-1-kan.liang@linux.intel.com/

Since it's also related to the period of freq mode, resend it here.

 arch/x86/events/intel/core.c | 85 ++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7f1b6b90a5fb..e36f8e737f6b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3991,6 +3991,85 @@ static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 	return test_bit(idx, (unsigned long *)&intel_cap->capabilities);
 }
 
+static u64 intel_pmu_freq_start_period(struct perf_event *event)
+{
+	int type = event->attr.type;
+	u64 config, factor;
+	s64 start;
+
+	/*
+	 * The 127 is the lowest possible recommended SAV (sample after value)
+	 * for a 4000 freq (default freq), according to the event list JSON file.
+	 * Also, assume the workload is idle 50% time.
+	 */
+	factor = 64 * 4000;
+	if (type != PERF_TYPE_HARDWARE && type != PERF_TYPE_HW_CACHE)
+		goto end;
+
+	/*
+	 * The estimation of the start period in the freq mode is
+	 * based on the assumptions below.
+	 *
+	 * For a cycles or an instructions event: a 1GHz platform,
+	 * 1 IPC, and a workload that is idle 50% of the time.
+	 * The start period = 1,000,000,000 * 1 / freq / 2.
+	 *		    = 500,000,000 / freq
+	 *
+	 * Usually, branch-related events occur less often than the
+	 * instructions event. According to the Intel event list JSON
+	 * file, the SAV (sample after value) of a branch-related event
+	 * is usually 1/4 of an instruction event.
+	 * The start period of branch-related events = 125,000,000 / freq.
+	 *
+	 * Cache-related events occur even less often. The SAV is usually
+	 * 1/20 of an instruction event.
+	 * The start period of cache-related events = 25,000,000 / freq.
+	 */
+	config = event->attr.config & PERF_HW_EVENT_MASK;
+	if (type == PERF_TYPE_HARDWARE) {
+		switch (config) {
+		case PERF_COUNT_HW_CPU_CYCLES:
+		case PERF_COUNT_HW_INSTRUCTIONS:
+		case PERF_COUNT_HW_BUS_CYCLES:
+		case PERF_COUNT_HW_STALLED_CYCLES_FRONTEND:
+		case PERF_COUNT_HW_STALLED_CYCLES_BACKEND:
+		case PERF_COUNT_HW_REF_CPU_CYCLES:
+			factor = 500000000;
+			break;
+		case PERF_COUNT_HW_BRANCH_INSTRUCTIONS:
+		case PERF_COUNT_HW_BRANCH_MISSES:
+			factor = 125000000;
+			break;
+		case PERF_COUNT_HW_CACHE_REFERENCES:
+		case PERF_COUNT_HW_CACHE_MISSES:
+			factor = 25000000;
+			break;
+		default:
+			goto end;
+		}
+	}
+
+	if (type == PERF_TYPE_HW_CACHE)
+		factor = 25000000;
+end:
+	/*
+	 * Usually, a prime or a number with few factors (close to prime)
+	 * is chosen as an SAV, which makes it less likely that the sampling
+	 * period synchronizes with some periodic event in the workload.
+	 * Subtract 1 so that the default freq at least avoids values near
+	 * powers of two.
+	 */
+	start = DIV_ROUND_UP_ULL(factor, event->attr.sample_freq) - 1;
+
+	if (start > x86_pmu.max_period)
+		start = x86_pmu.max_period;
+
+	if (x86_pmu.limit_period)
+		x86_pmu.limit_period(event, &start);
+
+	return start;
+}
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -4002,6 +4081,12 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (ret)
 		return ret;
 
+	if (event->attr.freq && event->attr.sample_freq) {
+		event->hw.sample_period = intel_pmu_freq_start_period(event);
+		event->hw.last_period = event->hw.sample_period;
+		local64_set(&event->hw.period_left, event->hw.sample_period);
+	}
+
 	if (event->attr.precise_ip) {
 		if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == INTEL_FIXED_VLBR_EVENT)
 			return -EINVAL;
-- 
2.38.1



* Re: [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD
  2025-01-17 15:19 ` [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD kan.liang
@ 2025-01-20  4:59   ` Ravi Bangoria
  0 siblings, 0 replies; 5+ messages in thread
From: Ravi Bangoria @ 2025-01-20  4:59 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, namhyung, irogers, linux-kernel,
	linux-perf-users
  Cc: ak, jolsa, stable, Ravi Bangoria

On 17-Jan-25 8:49 PM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> A low frequency cannot be set via IOC_PERIOD on some platforms.
> 
> perf_event_check_period(), introduced in commit 81ec3f3c4c4d
> ("perf/x86: Add check_period PMU callback"), was intended to check the
> period, not the frequency. A low frequency may be mistakenly rejected
> by limit_period().
> 
> Fixes: 81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
> Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks,
Ravi


* Re: [PATCH 1/3] perf/x86: Fix low freq setting issue
  2025-01-17 15:19 [PATCH 1/3] perf/x86: Fix low freq setting issue kan.liang
  2025-01-17 15:19 ` [PATCH 2/3] perf: Fix low freq setting via IOC_PERIOD kan.liang
  2025-01-17 15:19 ` [RESEND PATCH 3/3] perf/x86/intel: New start period for the freq mode kan.liang
@ 2025-01-20  5:00 ` Ravi Bangoria
  2 siblings, 0 replies; 5+ messages in thread
From: Ravi Bangoria @ 2025-01-20  5:00 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, namhyung, irogers, linux-kernel,
	linux-perf-users
  Cc: ak, jolsa, stable, Ravi Bangoria

On 17-Jan-25 8:49 PM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Perf doesn't work with a low frequency:
> 
> perf record -e cpu_core/instructions/ppp -F 120
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (cpu_core/instructions/ppp).
> "dmesg | grep -i perf" may provide additional information.
> 
> The limit_period() check prevents a low sampling period on a counter;
> it is not intended to limit the frequency.
> The check in x86_pmu_hw_config() should apply only to the non-freq
> mode. The attr.sample_period and attr.sample_freq fields share a
> union, so attr.sample_period must not be used to indicate the freq
> mode.
> 
> Fixes: c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
> Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks,
Ravi

