[tip: perf/core] perf: Fix the throttle logic for a group


From: "tip-bot2 for Kan Liang" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Kan Liang <kan.liang@linux.intel.com>,
	Namhyung Kim <namhyung@kernel.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: perf/core] perf: Fix the throttle logic for a group
Date: Wed, 21 May 2025 12:16:04 -0000
Message-ID: <174782976403.406.11916704121951822622.tip-bot2@tip-bot2>
In-Reply-To: <20250520181644.2673067-2-kan.liang@linux.intel.com>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9734e25fbf5ae68eb04234b2cd14a4b36ab89141
Gitweb:        https://git.kernel.org/tip/9734e25fbf5ae68eb04234b2cd14a4b36ab89141
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Tue, 20 May 2025 11:16:29 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 21 May 2025 13:57:42 +02:00

perf: Fix the throttle logic for a group

The current throttle logic doesn't work well with a group, e.g., the
following sampling-read case.

$ perf record -e "{cycles,cycles}:S" ...

$ perf report -D | grep THROTTLE | tail -2
            THROTTLE events:        426  ( 9.0%)
          UNTHROTTLE events:        425  ( 9.0%)

$ perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
0 1020120874009167 0x74970 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1):
... sample_read:
.... group nr 2
..... id 0000000000000327, value 000000000cbb993a, lost 0
..... id 0000000000000328, value 00000002211c26df, lost 0

The second cycles event has a much larger value than the first cycles
event in the same group.
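
To put the gap in numbers, here is a small C sketch that takes the two
values from the dump above and computes their integer ratio. The helper
name sibling_ratio and the constant names are made up for illustration;
they are not part of perf.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the sample_read record in the dump above. */
static const uint64_t before_first  = 0x000000000cbb993aULL; /* id 0x327 */
static const uint64_t before_second = 0x00000002211c26dfULL; /* id 0x328 */

/*
 * Integer ratio between two sibling counts taken from one sample_read
 * record. Returns 0 when the first count is 0 to avoid dividing by zero.
 */
static uint64_t sibling_ratio(uint64_t first, uint64_t second)
{
	return first ? second / first : 0;
}
```

With these values the second count is roughly 42 times the first, even
though both siblings count the same cycles event.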

The current throttle logic in the generic code only logs the THROTTLE
event. It relies on the specific driver implementation to disable
events. The implementation is similar for all ARCHs: only the
overflowing event is disabled, rather than the whole group.

The logic to disable the group should be generic for all ARCHs. Add the
logic in the generic code. The following patch will remove the buggy
driver-specific implementation.

Throttling only happens when an event overflows. Stop the entire
group when any event in the group triggers the throttle, and set
hw.interrupts to MAX_INTERRUPTS for every throttled event.

Unthrottling can happen in three places:
- Event/group scheduling. All events in the group are scheduled one by
  one, so all of them are unthrottled eventually. Nothing needs to be
  changed.
- perf_adjust_freq_unthr_events() on each tick. The whole group needs
  to be restarted together.
- __perf_event_period(). The whole group needs to be restarted
  together as well.
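
The throttle and unthrottle rules described above can be modeled in
userspace as a rough sketch. Everything named model_* is hypothetical
and only mirrors the shape of the helpers in this patch: `interrupts`
stands in for event->hw.interrupts, `running` for whether the counter
has been started, and events[0] for the group leader. The sketch does
not log THROTTLE records or touch a real PMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's MAX_INTERRUPTS marker (an assumption here). */
#define MODEL_MAX_INTERRUPTS UINT64_MAX

struct model_event {
	uint64_t interrupts;	/* models event->hw.interrupts */
	bool running;		/* models a started PMU counter */
};

struct model_group {
	struct model_event *events;	/* events[0] is the leader */
	size_t nr;
};

/* Stop every event in the group and mark each one as throttled. */
static void model_throttle_group(struct model_group *g)
{
	for (size_t i = 0; i < g->nr; i++) {
		g->events[i].running = false;
		g->events[i].interrupts = MODEL_MAX_INTERRUPTS;
	}
}

/*
 * Clear the throttled marker on every event in the group. Each event is
 * restarted, except that @event itself is left alone when
 * @skip_start_event is true because the caller handles it separately.
 */
static void model_unthrottle_group(struct model_group *g,
				   const struct model_event *event,
				   bool skip_start_event)
{
	for (size_t i = 0; i < g->nr; i++) {
		struct model_event *e = &g->events[i];
		bool start = skip_start_event ? (e != event) : true;

		e->interrupts = 0;
		if (start)
			e->running = true;
	}
}
```

When skip_start_event is true, the event that triggered the unthrottle
is left for the caller to start. That matches how the patch calls the
group helper from perf_adjust_freq_unthr_events() for freq-mode events,
and from __perf_event_period() after the event has already been
restarted.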

With the fix,
$ sudo perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
0 3573470770332 0x12f5f8 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2):
... sample_read:
.... group nr 2
..... id 0000000000000a28, value 00000004fd3dfd8f, lost 0
..... id 0000000000000a29, value 00000004fd3dfd8f, lost 0

Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250520181644.2673067-2-kan.liang@linux.intel.com
---
 kernel/events/core.c | 66 +++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 952340f..8327ab0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2645,6 +2645,39 @@ void perf_event_disable_inatomic(struct perf_event *event)
 static void perf_log_throttle(struct perf_event *event, int enable);
 static void perf_log_itrace_start(struct perf_event *event);
 
+static void perf_event_unthrottle(struct perf_event *event, bool start)
+{
+	event->hw.interrupts = 0;
+	if (start)
+		event->pmu->start(event, 0);
+	perf_log_throttle(event, 1);
+}
+
+static void perf_event_throttle(struct perf_event *event)
+{
+	event->pmu->stop(event, 0);
+	event->hw.interrupts = MAX_INTERRUPTS;
+	perf_log_throttle(event, 0);
+}
+
+static void perf_event_unthrottle_group(struct perf_event *event, bool skip_start_event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+
+	perf_event_unthrottle(leader, skip_start_event ? leader != event : true);
+	for_each_sibling_event(sibling, leader)
+		perf_event_unthrottle(sibling, skip_start_event ? sibling != event : true);
+}
+
+static void perf_event_throttle_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+
+	perf_event_throttle(leader);
+	for_each_sibling_event(sibling, leader)
+		perf_event_throttle(sibling);
+}
+
 static int
 event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
 {
@@ -2673,10 +2706,8 @@ event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
 	 * ticks already, also for a heavily scheduling task there is little
 	 * guarantee it'll get a tick in a timely manner.
 	 */
-	if (unlikely(event->hw.interrupts == MAX_INTERRUPTS)) {
-		perf_log_throttle(event, 1);
-		event->hw.interrupts = 0;
-	}
+	if (unlikely(event->hw.interrupts == MAX_INTERRUPTS))
+		perf_event_unthrottle(event, false);
 
 	perf_pmu_disable(event->pmu);
 
@@ -4254,12 +4285,8 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
 
 		hwc = &event->hw;
 
-		if (hwc->interrupts == MAX_INTERRUPTS) {
-			hwc->interrupts = 0;
-			perf_log_throttle(event, 1);
-			if (!is_event_in_freq_mode(event))
-				event->pmu->start(event, 0);
-		}
+		if (hwc->interrupts == MAX_INTERRUPTS)
+			perf_event_unthrottle_group(event, is_event_in_freq_mode(event));
 
 		if (!is_event_in_freq_mode(event))
 			continue;
@@ -6181,14 +6208,6 @@ static void __perf_event_period(struct perf_event *event,
 	active = (event->state == PERF_EVENT_STATE_ACTIVE);
 	if (active) {
 		perf_pmu_disable(event->pmu);
-		/*
-		 * We could be throttled; unthrottle now to avoid the tick
-		 * trying to unthrottle while we already re-started the event.
-		 */
-		if (event->hw.interrupts == MAX_INTERRUPTS) {
-			event->hw.interrupts = 0;
-			perf_log_throttle(event, 1);
-		}
 		event->pmu->stop(event, PERF_EF_UPDATE);
 	}
 
@@ -6196,6 +6215,14 @@ static void __perf_event_period(struct perf_event *event,
 
 	if (active) {
 		event->pmu->start(event, PERF_EF_RELOAD);
+		/*
+		 * Once the period is force-reset, the event starts immediately.
+		 * But the event/group could be throttled. Unthrottle the
+		 * event/group now to avoid the next tick trying to unthrottle
+		 * while we already re-started the event/group.
+		 */
+		if (event->hw.interrupts == MAX_INTERRUPTS)
+			perf_event_unthrottle_group(event, true);
 		perf_pmu_enable(event->pmu);
 	}
 }
@@ -10084,8 +10111,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
 	if (unlikely(throttle && hwc->interrupts >= max_samples_per_tick)) {
 		__this_cpu_inc(perf_throttled_count);
 		tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
-		hwc->interrupts = MAX_INTERRUPTS;
-		perf_log_throttle(event, 0);
+		perf_event_throttle_group(event);
 		ret = 1;
 	}
