From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 141BF28466B;
	Tue,  6 May 2025 16:48:12 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1746550096; cv=none; b=EtPpqymrT/jTOtz9SM+CRp/eKm8zV8n98W9VDxydbrp34IPkLjm/xCl+pYEy3vwbV13lHM0bZjlQHX0OzFCjwauqe+CGjNXzUC/i07j8B0FDYK0BkumiX0y/kO6RvmapNWlkwPL6lEZrd721pWm8ISFIdlhxw6MOq7BbgHTURBc=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1746550096; c=relaxed/simple;
	bh=mdNYtaRODBwuKF3VeWvQWytU/9E3Yw8BVWMEoPcA6lQ=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version; b=jP0cZNQSynHm+KciTSRHOhdQaKVoLfnHYtKNZmEYusnTCjWaZeqGEPrqZo2dFH4Kwvdn9GKyU1aM/Bnkknz8j+50yqEzIT/+mLgevnDJHpF84c9OBLSkfcJRsjF44gpqz7/fnny76U0poYWxhtroHcLIhDquCS9EIz/9rbdH6H0=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Q9i8tK4u; arc=none smtp.client-ip=192.198.163.10
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Q9i8tK4u"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1746550093; x=1778086093;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=mdNYtaRODBwuKF3VeWvQWytU/9E3Yw8BVWMEoPcA6lQ=;
  b=Q9i8tK4uocx2/iQgOajK978jUBG5cNu3/vWkTXqA7FaSpP1XwOenT4WW
   alOzreIx0tV+LUFXqglKYRA+5gPbqPYjJ6x0oUXRKC1e6fVp5TPNU2j2M
   FC3hNgQaenOBHbv/T8feuF0VRkIhCguezTea7b5aPO68h0DpV5uMChFW7
   aPJO2eyoQosRPX6UFqlT7nafPW8rXzybHFJazz7YCPW5JR2Y4ZwxgZG/d
   Kznsi1Lzt6CpDyJfd1L1xpNM41LUMbItRVkWjmXsk/4xaXl+yrJKlBfWQ
   JbW5R9f6qDnSgeIoWA7AcbQEGqcLdHyChOq3lnK8OmBN908Vi5LUfRP79
   A==;
X-CSE-ConnectionGUID: 3w1T0rgxR/GRt3NRbHw8HA==
X-CSE-MsgGUID: /FSwg7AaQIObnBmfKgp80g==
X-IronPort-AV: E=McAfee;i="6700,10204,11425"; a="59595215"
X-IronPort-AV: E=Sophos;i="6.15,266,1739865600"; 
   d="scan'208";a="59595215"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 May 2025 09:48:11 -0700
X-CSE-ConnectionGUID: ovA2FqPhTrGIn+ztkKVIDw==
X-CSE-MsgGUID: IKCG6UOVT72WMMuJbsczww==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.15,266,1739865600"; 
   d="scan'208";a="135674837"
Received: from kanliang-dev.jf.intel.com ([10.165.154.102])
  by fmviesa007.fm.intel.com with ESMTP; 06 May 2025 09:48:10 -0700
From: kan.liang@linux.intel.com
To: peterz@infradead.org,
	mingo@redhat.com,
	namhyung@kernel.org,
	irogers@google.com,
	mark.rutland@arm.com,
	linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org
Cc: eranian@google.com,
	ctshao@google.com,
	tmricht@linux.ibm.com,
	Kan Liang <kan.liang@linux.intel.com>
Subject: [RFC PATCH 01/15] perf: Fix the throttle logic for a group
Date: Tue,  6 May 2025 09:47:26 -0700
Message-Id: <20250506164740.1317237-2-kan.liang@linux.intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20250506164740.1317237-1-kan.liang@linux.intel.com>
References: <20250506164740.1317237-1-kan.liang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kan Liang <kan.liang@linux.intel.com>

The current throttle logic doesn't work well with a group, e.g., the
following sampling-read case.

$ perf record -e "{cycles,cycles}:S" ...

$ perf report -D | grep THROTTLE | tail -2
            THROTTLE events:        426  ( 9.0%)
          UNTHROTTLE events:        425  ( 9.0%)

$ perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
0 1020120874009167 0x74970 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1):
... sample_read:
.... group nr 2
..... id 0000000000000327, value 000000000cbb993a, lost 0
..... id 0000000000000328, value 00000002211c26df, lost 0

The second cycles event has a much larger value than the first cycles
event in the same group.

The current throttle logic in the generic code only logs the THROTTLE
event. It relies on the specific driver implementation to disable
events. However, for all ARCHs, the implementation is similar. It only
disable the event, rather than the group.

The logic to disable the group should be generic for all ARCHs. Add the
logic in the generic code. The following patch will remove the buggy
driver-specific implementation.

The throttle only happens when an event is overflowed. Stop the entire
group when any event in the group triggers the throttle. Set the
MAX_INTERRUPTS to the leader event to indicate the group is throttled.

The unthrottled could happen in 3 places.
- event/group sched. All events in the group are scheduled one by one.
  All of them will be unthrottled eventually. Nothing needs to be
  changed.
- The perf_adjust_freq_unthr_events for each tick. Needs to restart the
  group altogether.
- The __perf_event_period(). The whole group needs to be restarted
  altogether as well.

With the fix,
$ sudo perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
0 3573470770332 0x12f5f8 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2):
... sample_read:
.... group nr 2
..... id 0000000000000a28, value 00000004fd3dfd8f, lost 0
..... id 0000000000000a29, value 00000004fd3dfd8f, lost 0

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 kernel/events/core.c | 55 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a84abc2b7f20..eb0dc871f4f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2734,6 +2734,38 @@ void perf_event_disable_inatomic(struct perf_event *event)
 static void perf_log_throttle(struct perf_event *event, int enable);
 static void perf_log_itrace_start(struct perf_event *event);
 
+static void perf_event_group_unthrottle(struct perf_event *event, bool start_event)
+{
+	struct perf_event *leader = event->group_leader;
+	struct perf_event *sibling;
+
+	if (leader != event || start_event)
+		leader->pmu->start(leader, 0);
+	leader->hw.interrupts = 0;
+
+	for_each_sibling_event(sibling, leader) {
+		if (sibling != event || start_event)
+			sibling->pmu->start(sibling, 0);
+		sibling->hw.interrupts = 0;
+	}
+
+	perf_log_throttle(leader, 1);
+}
+
+static void perf_event_group_throttle(struct perf_event *event)
+{
+	struct perf_event *leader = event->group_leader;
+	struct perf_event *sibling;
+
+	leader->hw.interrupts = MAX_INTERRUPTS;
+	leader->pmu->stop(leader, 0);
+
+	for_each_sibling_event(sibling, leader)
+		sibling->pmu->stop(sibling, 0);
+
+	perf_log_throttle(leader, 0);
+}
+
 static int
 event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
 {
@@ -4389,10 +4421,8 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
 		hwc = &event->hw;
 
 		if (hwc->interrupts == MAX_INTERRUPTS) {
-			hwc->interrupts = 0;
-			perf_log_throttle(event, 1);
-			if (!event->attr.freq || !event->attr.sample_freq)
-				event->pmu->start(event, 0);
+			perf_event_group_unthrottle(event,
+				!event->attr.freq || !event->attr.sample_freq);
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
@@ -6421,14 +6451,6 @@ static void __perf_event_period(struct perf_event *event,
 	active = (event->state == PERF_EVENT_STATE_ACTIVE);
 	if (active) {
 		perf_pmu_disable(event->pmu);
-		/*
-		 * We could be throttled; unthrottle now to avoid the tick
-		 * trying to unthrottle while we already re-started the event.
-		 */
-		if (event->hw.interrupts == MAX_INTERRUPTS) {
-			event->hw.interrupts = 0;
-			perf_log_throttle(event, 1);
-		}
 		event->pmu->stop(event, PERF_EF_UPDATE);
 	}
 
@@ -6436,6 +6458,12 @@ static void __perf_event_period(struct perf_event *event,
 
 	if (active) {
 		event->pmu->start(event, PERF_EF_RELOAD);
+		/*
+		 * We could be throttled; unthrottle now to avoid the tick
+		 * trying to unthrottle while we already re-started the event.
+		 */
+		if (event->group_leader->hw.interrupts == MAX_INTERRUPTS)
+			perf_event_group_unthrottle(event, false);
 		perf_pmu_enable(event->pmu);
 	}
 }
@@ -10326,8 +10354,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
 	if (unlikely(throttle && hwc->interrupts >= max_samples_per_tick)) {
 		__this_cpu_inc(perf_throttled_count);
 		tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
-		hwc->interrupts = MAX_INTERRUPTS;
-		perf_log_throttle(event, 0);
+		perf_event_group_throttle(event);
 		ret = 1;
 	}
 
-- 
2.38.1