public inbox for linux-kernel@vger.kernel.org
* [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements
@ 2026-01-16  3:34 Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples Ravi Bangoria
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

Patches 1-5:
  o Resolve several issues in the current IBS implementation

Patches 6-11:
  o Add support for new capabilities that will appear in future AMD
    CPUs:
    - Alternate disable bit in new control-only MSRs to eliminate the
      RMW race in existing IBS_{FETCH|OP}_CTL MSRs
    - RIP bit 63 filtering, which can be used as hardware-assisted
      privilege filtering, enabling IBS for unprivileged users without
      software-based privilege filtering
    - Fetch latency threshold filter to capture only high-latency fetch
      events
    - Streaming-store filter to sample only instructions that perform
      streaming stores
    - Remote socket indicator for load/store instructions

Patches are prepared on tip/perf/core (eebe6446ccb7)

TODO:
  o perf tool and man page changes
  o perf test "AMD IBS software filtering" fails on systems that
    support RIP bit63 filtering, but the failure is a false positive.
    I'll post a fix along with the perf tools changes

Ravi Bangoria (11):
  perf/amd/ibs: Throttle interrupts with filtered ldlat samples
  perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5
  perf/amd/ibs: Preserve PhyAddrVal bit when clearing PhyAddr MSR
  perf/amd/ibs: Avoid race between event add and NMI
  perf/amd/ibs: Define macro for ldlat mask
  perf/amd/ibs: Add new MSRs and CPUID bits definitions
  perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race
  perf/amd/ibs: Enable fetch latency filtering
  perf/amd/ibs: Enable RIP bit63 hardware filtering
  perf/amd/ibs: Enable streaming store filter
  perf/amd/ibs: Advertise remote socket capability

 arch/x86/events/amd/ibs.c         | 252 ++++++++++++++++++++++++++++--
 arch/x86/include/asm/amd/ibs.h    |   4 +-
 arch/x86/include/asm/msr-index.h  |   2 +
 arch/x86/include/asm/perf_event.h |  52 +++---
 4 files changed, 275 insertions(+), 35 deletions(-)

-- 
2.43.0



* [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-19  7:31   ` Mi, Dapeng
  2026-01-16  3:34 ` [PATCH 02/11] perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5 Ravi Bangoria
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

The IBS NMI handler has a software filter (on top of the hardware
filter) to discard samples whose load latency does not exceed the
user-requested threshold. However, since the software filter still
takes an NMI for every discarded sample, account for the NMI overhead
and throttle the sample rate if needed.
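
A minimal user-space sketch of the idea, not the kernel code: the
struct names, the helper names and the fixed NMI budget are all
hypothetical stand-ins. It mirrors the drop condition from the hunk
below and shows that a dropped sample still counts toward the
throttling budget.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the IBS_OP_DATA3 fields the handler checks. */
struct op_sample {
	bool ld_op;            /* sample is a load */
	bool dc_miss;          /* load missed the data cache */
	uint16_t dc_miss_lat;  /* measured miss latency in cycles */
};

/* Returns true when the software filter would discard the sample. */
static bool ldlat_sw_filter_drops(const struct op_sample *s, uint64_t config1)
{
	uint64_t threshold = config1 & 0xFFF;  /* ldlat is in config1[11:0] */

	return !s->ld_op || !s->dc_miss || s->dc_miss_lat <= threshold;
}

/* Hypothetical throttle model: a fixed number of NMIs allowed per tick. */
struct nmi_budget {
	unsigned int seen;
	unsigned int max;
};

/* Counts one NMI; returns true once the budget is exceeded. */
static bool account_interrupt(struct nmi_budget *b)
{
	return ++b->seen > b->max;
}
```

In this model the handler would call account_interrupt() for every
NMI, including those whose sample ldlat_sw_filter_drops() rejects.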

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index aca89f23d2e0..96bb0974057f 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1293,8 +1293,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		 * within [128, 2048] range.
 		 */
 		if (!op_data3.ld_op || !op_data3.dc_miss ||
-		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF))
+		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF)) {
+			throttle = perf_event_account_interrupt(event);
 			goto out;
+		}
 	}
 
 	/*
-- 
2.43.0



* [PATCH 02/11] perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 03/11] perf/amd/ibs: Preserve PhyAddrVal bit when clearing PhyAddr MSR Ravi Bangoria
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

The ldlat dependency on l3missonly is specific to Zen 5; newer generations
are not affected. This quirk is documented as an erratum in the following
Revision Guide.

  Erratum: 1606 IBS (Instruction Based Sampling) OP Load Latency Filtering
           May Capture Unwanted Samples When L3Miss Filtering is Disabled

  Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors,
  Pub. 58251 Rev. 1.30 July 2025
  https://bugzilla.kernel.org/attachment.cgi?id=309193
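
For illustration only, the threshold encoding in perf_ibs_init() after
this change can be sketched as a standalone function. The bit
positions are taken from the existing IBS_OP_* definitions; the
function name and the is_zen5 parameter (standing in for
cpu_feature_enabled(X86_FEATURE_ZEN5)) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IBS_OP_LDLAT_EN    (1ULL << 63)
#define IBS_OP_L3MISSONLY  (1ULL << 16)

/*
 * Encode a load latency threshold into IBS_OP_CTL bits.
 * Returns 0 and fills *config on success, -1 if out of range.
 */
static int encode_ldlat(uint64_t ldlat, bool is_zen5, uint64_t *config)
{
	if (ldlat < 128 || ldlat > 2048)
		return -1;

	ldlat >>= 7;                      /* hardware counts in units of 128 */
	*config = (ldlat - 1) << 59;      /* 4-bit threshold field at bit 59 */
	*config |= IBS_OP_LDLAT_EN;

	if (is_zen5)                      /* erratum 1606 workaround only */
		*config |= IBS_OP_L3MISSONLY;

	return 0;
}
```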

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 96bb0974057f..dc8cc173cdf5 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -356,7 +356,10 @@ static int perf_ibs_init(struct perf_event *event)
 		ldlat >>= 7;
 
 		config |= (ldlat - 1) << 59;
-		config |= IBS_OP_L3MISSONLY | IBS_OP_LDLAT_EN;
+
+		config |= IBS_OP_LDLAT_EN;
+		if (cpu_feature_enabled(X86_FEATURE_ZEN5))
+			config |= IBS_OP_L3MISSONLY;
 	}
 
 	/*
-- 
2.43.0



* [PATCH 03/11] perf/amd/ibs: Preserve PhyAddrVal bit when clearing PhyAddr MSR
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 02/11] perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5 Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 04/11] perf/amd/ibs: Avoid race between event add and NMI Ravi Bangoria
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

Commit 50a53b60e141 ("perf/amd/ibs: Prevent leaking sensitive data to
userspace") zeroed the physical address and also cleared the PhyAddrVal
flag before copying the value into a perf sample to avoid exposing
physical addresses to unprivileged users.

Clearing PhyAddrVal, however, has an unintended side-effect: several
other IBS fields are considered valid only when this bit is set. As a
result, those otherwise correct fields are discarded, reducing IBS
functionality.

Continue to zero the physical address, but keep the PhyAddrVal bit
intact so the related fields remain usable while still preventing any
address leak.
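
A rough sketch of the resulting behaviour for the Op PMU, using a
hypothetical two-field struct instead of the real ibs_data->regs[]
array. Bit 18 is the IBS_OP_DATA3 bit that the removed line used to
clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_DATA3_PHY_ADDR_VALID  (1ULL << 18)

/* Hypothetical stand-in for the two sampled registers involved. */
struct op_regs {
	uint64_t op_data3;      /* carries the PhyAddrVal flag */
	uint64_t dc_phys_addr;  /* physical address of the access */
};

/* Hide the address but leave the valid flag alone. */
static void phyaddr_clear(struct op_regs *r)
{
	r->dc_phys_addr = 0;
}

/* Consumers gate the dependent fields on the valid flag. */
static bool dependent_fields_usable(const struct op_regs *r)
{
	return (r->op_data3 & OP_DATA3_PHY_ADDR_VALID) != 0;
}
```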

Fixes: 50a53b60e141 ("perf/amd/ibs: Prevent leaking sensitive data to userspace")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index dc8cc173cdf5..72abc474ec23 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1217,12 +1217,10 @@ static void perf_ibs_phyaddr_clear(struct perf_ibs *perf_ibs,
 				   struct perf_ibs_data *ibs_data)
 {
 	if (perf_ibs == &perf_ibs_op) {
-		ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSOPDATA3)] &= ~(1ULL << 18);
 		ibs_data->regs[ibs_op_msr_idx(MSR_AMD64_IBSDCPHYSAD)] = 0;
 		return;
 	}
 
-	ibs_data->regs[ibs_fetch_msr_idx(MSR_AMD64_IBSFETCHCTL)] &= ~(1ULL << 52);
 	ibs_data->regs[ibs_fetch_msr_idx(MSR_AMD64_IBSFETCHPHYSAD)] = 0;
 }
 
-- 
2.43.0



* [PATCH 04/11] perf/amd/ibs: Avoid race between event add and NMI
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (2 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 03/11] perf/amd/ibs: Preserve PhyAddrVal bit when clearing PhyAddr MSR Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask Ravi Bangoria
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

Consider the following race:

  --------
  o OP_CTL contains stale value: OP_CTL[Val]=1, OP_CTL[En]=0
  o A new IBS OP event is being added
  o [P]: Process context, [N]: NMI context

  [P] perf_ibs_add(event) {
  [P]     if (test_and_set_bit(IBS_ENABLED, pcpu->state))
  [P]         return;
  [P]     /* pcpu->state = IBS_ENABLED */
  [P]
  [P]     pcpu->event = event;
  [P]
  [P]     perf_ibs_start(event) {
  [P]         set_bit(IBS_STARTED, pcpu->state);
  [P]         /* pcpu->state = IBS_ENABLED | IBS_STARTED */
  [P]         clear_bit(IBS_STOPPING, pcpu->state);
  [P]         /* pcpu->state = IBS_ENABLED | IBS_STARTED */

  [N] --> NMI due to genuine FETCH event. perf_ibs_handle_irq()
  [N]     called for OP PMU as well.
  [N]
  [N] perf_ibs_handle_irq(perf_ibs) {
  [N]     event = pcpu->event; /* See line 6 */
  [N]
  [N]     if (!test_bit(IBS_STARTED, pcpu->state)) /* false */
  [N]         return 0;
  [N]
  [N]     if (WARN_ON_ONCE(!event)) /* false */
  [N]         goto fail;
  [N]
  [N]     if (!(*buf++ & perf_ibs->valid_mask)) /* false due to stale
  [N]                                            * IBS_OP_CTL value */
  [N]         goto fail;
  [N]
  [N]         ...
  [N]
  [N]     perf_ibs_enable_event() // *Accidentally* enable the event.
  [N] }
  [N]
  [N] /*
  [N]  * Repeated NMIs may follow due to accidentally enabled IBS OP
  [N]  * event if the sample period is very low. It could also lead
  [N]  * to pcpu->state corruption if the event gets throttled due
  [N]  * to too frequent NMIs.
  [N]  */

  [P]         perf_ibs_enable_event();
  [P]     }
  [P] }
  --------

We cannot safely clear IBS_{FETCH|OP}_CTL while disabling the event,
because the register might be read again later. So, clear the register
in the enable path - before we update pcpu->state and enable the event.
This guarantees that any NMI that lands in the gap finds Val=0 and
bails out cleanly.
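
The effect of the reordering can be illustrated with a heavily
simplified, single-threaded model. Everything here is hypothetical
(struct cpu_model, both helpers, and the reduction of pcpu->state to
one flag); it only shows why a stale Val bit matters once STARTED is
set.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_VAL  (1ULL << 18)  /* IBS_OP_VAL */
#define CTL_EN   (1ULL << 17)  /* IBS_OP_ENABLE */

struct cpu_model {
	uint64_t op_ctl;  /* modelled IBS_OP_CTL contents */
	bool started;     /* modelled IBS_STARTED state bit */
};

/* Simplified NMI handler: returns true if it re-armed the event. */
static bool nmi_handler(struct cpu_model *c)
{
	if (!c->started)
		return false;
	if (!(c->op_ctl & CTL_VAL))
		return false;      /* no valid sample, bail out */

	c->op_ctl = CTL_EN;        /* re-enable for the next sample */
	return true;
}

/* Start path up to the point where an unrelated NMI can land. */
static void start_prologue(struct cpu_model *c, bool reset_ctl_first)
{
	if (reset_ctl_first)
		c->op_ctl = 0;     /* the fix: drop the stale Val first */
	c->started = true;
}
```

With reset_ctl_first == false the handler sees the stale Val and
enables the event on its own; with true it returns early.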

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 72abc474ec23..27b764eee6c7 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -490,6 +490,14 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	}
 	config |= period >> 4;
 
+	/*
+	 * Reset the IBS_{FETCH|OP}_CTL MSR before updating pcpu->state.
+	 * Doing so prevents a race condition in which an NMI due to other
+	 * source might accidentally activate the event before we enable
+	 * it ourselves.
+	 */
+	perf_ibs_disable_event(perf_ibs, hwc, 0);
+
 	/*
 	 * Set STARTED before enabling the hardware, such that a subsequent NMI
 	 * must observe it.
-- 
2.43.0



* [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (3 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 04/11] perf/amd/ibs: Avoid race between event add and NMI Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-19  7:38   ` Mi, Dapeng
  2026-01-16  3:34 ` [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions Ravi Bangoria
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

The load latency filter threshold is encoded in config1[11:0]. Define a
mask for it instead of hardcoding 0xFFF. Unlike the "config" fields,
whose layout maps to the IBS_{FETCH|OP}_CTL MSRs, the layout of
"config1" is custom defined, so a new set of macros is needed for the
"config1" fields.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 27b764eee6c7..02e7bffe1208 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -32,6 +32,9 @@ static u32 ibs_caps;
 /* attr.config2 */
 #define IBS_SW_FILTER_MASK	1
 
+/* attr.config1 */
+#define IBS_OP_CONFIG1_LDLAT_MASK		(0xFFFULL <<  0)
+
 /*
  * IBS states:
  *
@@ -274,7 +277,7 @@ static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
 {
 	return perf_ibs == &perf_ibs_op &&
 	       (ibs_caps & IBS_CAPS_OPLDLAT) &&
-	       (event->attr.config1 & 0xFFF);
+	       (event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK);
 }
 
 static int perf_ibs_init(struct perf_event *event)
@@ -349,7 +352,7 @@ static int perf_ibs_init(struct perf_event *event)
 	}
 
 	if (perf_ibs_ldlat_event(perf_ibs, event)) {
-		u64 ldlat = event->attr.config1 & 0xFFF;
+		u64 ldlat = event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK;
 
 		if (ldlat < 128 || ldlat > 2048)
 			return -EINVAL;
@@ -1302,7 +1305,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		 * within [128, 2048] range.
 		 */
 		if (!op_data3.ld_op || !op_data3.dc_miss ||
-		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF)) {
+		    op_data3.dc_miss_lat <= (event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK)) {
 			throttle = perf_event_account_interrupt(event);
 			goto out;
 		}
-- 
2.43.0



* [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (4 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-19  7:39   ` Mi, Dapeng
  2026-01-16  3:34 ` [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race Ravi Bangoria
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

IBS on an upcoming microarchitecture introduces two new control MSRs and
a couple of new features. Define macros for them.

New capabilities:

 o IBS_CAPS_DIS: Alternate Fetch and Op IBS disable bits
 o IBS_CAPS_FETCHLAT: Fetch Latency filter
 o IBS_CAPS_BIT63_FILTER: Virtual address bit 63 based filters for Fetch
   and Op
 o IBS_CAPS_STRMST_RMTSOCKET: Streaming store filter and indicator,
   remote socket indicator

New control MSRs for above features:

 o MSR_AMD64_IBSFETCHCTL2
 o MSR_AMD64_IBSOPCTL2

Also do cosmetic alignment changes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/include/asm/msr-index.h  |  2 ++
 arch/x86/include/asm/perf_event.h | 52 ++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3d0a0950d20a..d8b3f3abe583 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -693,6 +693,8 @@
 #define MSR_AMD64_IBSBRTARGET		0xc001103b
 #define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
+#define MSR_AMD64_IBSOPCTL2		0xc001103e
+#define MSR_AMD64_IBSFETCHCTL2		0xc001103f
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
 #define MSR_AMD64_SVM_AVIC_DOORBELL	0xc001011b
 #define MSR_AMD64_VM_PAGE_FLUSH		0xc001011e
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 0d9af4135e0a..6f5ec5c9d5b4 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -639,6 +639,10 @@ struct arch_pebs_cntr_header {
 #define IBS_CAPS_OPDATA4		(1U<<10)
 #define IBS_CAPS_ZEN4			(1U<<11)
 #define IBS_CAPS_OPLDLAT		(1U<<12)
+#define IBS_CAPS_DIS			(1U<<13)
+#define IBS_CAPS_FETCHLAT		(1U<<14)
+#define IBS_CAPS_BIT63_FILTER		(1U<<15)
+#define IBS_CAPS_STRMST_RMTSOCKET	(1U<<16)
 #define IBS_CAPS_OPDTLBPGSIZE		(1U<<19)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
@@ -653,31 +657,41 @@ struct arch_pebs_cntr_header {
 #define IBSCTL_LVT_OFFSET_MASK		0x0F
 
 /* IBS fetch bits/masks */
-#define IBS_FETCH_L3MISSONLY	(1ULL<<59)
-#define IBS_FETCH_RAND_EN	(1ULL<<57)
-#define IBS_FETCH_VAL		(1ULL<<49)
-#define IBS_FETCH_ENABLE	(1ULL<<48)
-#define IBS_FETCH_CNT		0xFFFF0000ULL
-#define IBS_FETCH_MAX_CNT	0x0000FFFFULL
+#define IBS_FETCH_L3MISSONLY		      (1ULL << 59)
+#define IBS_FETCH_RAND_EN		      (1ULL << 57)
+#define IBS_FETCH_VAL			      (1ULL << 49)
+#define IBS_FETCH_ENABLE		      (1ULL << 48)
+#define IBS_FETCH_CNT			     0xFFFF0000ULL
+#define IBS_FETCH_MAX_CNT		     0x0000FFFFULL
+
+#define IBS_FETCH_2_DIS			      (1ULL <<  0)
+#define IBS_FETCH_2_FETCH_LAT_FILTER	    (0xFULL <<  1)
+#define IBS_FETCH_2_EXCL_RIP_63_EQ_1	      (1ULL <<  5)
+#define IBS_FETCH_2_EXCL_RIP_63_EQ_0	      (1ULL <<  6)
 
 /*
  * IBS op bits/masks
  * The lower 7 bits of the current count are random bits
  * preloaded by hardware and ignored in software
  */
-#define IBS_OP_LDLAT_EN		(1ULL<<63)
-#define IBS_OP_LDLAT_THRSH	(0xFULL<<59)
-#define IBS_OP_CUR_CNT		(0xFFF80ULL<<32)
-#define IBS_OP_CUR_CNT_RAND	(0x0007FULL<<32)
-#define IBS_OP_CUR_CNT_EXT_MASK	(0x7FULL<<52)
-#define IBS_OP_CNT_CTL		(1ULL<<19)
-#define IBS_OP_VAL		(1ULL<<18)
-#define IBS_OP_ENABLE		(1ULL<<17)
-#define IBS_OP_L3MISSONLY	(1ULL<<16)
-#define IBS_OP_MAX_CNT		0x0000FFFFULL
-#define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
-#define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
-#define IBS_RIP_INVALID		(1ULL<<38)
+#define IBS_OP_LDLAT_EN			      (1ULL << 63)
+#define IBS_OP_LDLAT_THRSH		    (0xFULL << 59)
+#define IBS_OP_CUR_CNT			(0xFFF80ULL << 32)
+#define IBS_OP_CUR_CNT_RAND		(0x0007FULL << 32)
+#define IBS_OP_CUR_CNT_EXT_MASK		   (0x7FULL << 52)
+#define IBS_OP_CNT_CTL			      (1ULL << 19)
+#define IBS_OP_VAL			      (1ULL << 18)
+#define IBS_OP_ENABLE			      (1ULL << 17)
+#define IBS_OP_L3MISSONLY		      (1ULL << 16)
+#define IBS_OP_MAX_CNT			     0x0000FFFFULL
+#define IBS_OP_MAX_CNT_EXT		     0x007FFFFFULL	/* not a register bit mask */
+#define IBS_OP_MAX_CNT_EXT_MASK		   (0x7FULL << 20)	/* separate upper 7 bits */
+#define IBS_RIP_INVALID			      (1ULL << 38)
+
+#define IBS_OP_2_DIS			      (1ULL <<  0)
+#define IBS_OP_2_EXCL_RIP_63_EQ_0	      (1ULL <<  1)
+#define IBS_OP_2_EXCL_RIP_63_EQ_1	      (1ULL <<  2)
+#define IBS_OP_2_STRM_ST_FILTER		      (1ULL <<  3)
 
 #ifdef CONFIG_X86_LOCAL_APIC
 extern u32 get_ibs_caps(void);
-- 
2.43.0



* [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (5 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-19  7:48   ` Mi, Dapeng
  2026-01-16  3:34 ` [PATCH 08/11] perf/amd/ibs: Enable fetch latency filtering Ravi Bangoria
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

The existing IBS_{FETCH|OP}_CTL MSRs combine control and status bits,
which leads to an RMW race between HW and SW:

  HW                               SW
  ------------------------         ------------------------------
                                   config = rdmsr(IBS_OP_CTL);
                                   config &= ~EN;
  Set IBS_OP_CTL[Val] to 1
  trigger NMI
                                   wrmsr(IBS_OP_CTL, config);
                                   // Val is accidentally cleared

Future hardware adds a control-only MSR, IBS_{FETCH|OP}_CTL2, which
provides a second-level "disable" bit (Dis). IBS is now:

  Enabled:  IBS_{FETCH|OP}_CTL[En] = 1 && IBS_{FETCH|OP}_CTL2[Dis] = 0
  Disabled: IBS_{FETCH|OP}_CTL[En] = 0 || IBS_{FETCH|OP}_CTL2[Dis] = 1

The separate "Dis" bit lets software disable IBS without touching any
status fields, eliminating the hardware/software race.
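
As a sketch of the two-level scheme (a user-space model, not the
driver code; struct ibs_op_msrs and the three helpers are
hypothetical), disabling becomes a plain write to CTL2 that never
reads or rewrites CTL, so a Val bit set concurrently by hardware
survives:

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_EN    (1ULL << 17)  /* IBS_OP_ENABLE */
#define CTL_VAL   (1ULL << 18)  /* IBS_OP_VAL, set by hardware */
#define CTL2_DIS  (1ULL << 0)   /* IBS_OP_2_DIS */

struct ibs_op_msrs {
	uint64_t ctl;   /* modelled IBS_OP_CTL: control + status */
	uint64_t ctl2;  /* modelled IBS_OP_CTL2: control only */
};

static bool ibs_enabled(const struct ibs_op_msrs *m)
{
	return (m->ctl & CTL_EN) && !(m->ctl2 & CTL2_DIS);
}

/* Enable sequence: CTL[En] = 1, then CTL2[Dis] = 0. */
static void ibs_enable(struct ibs_op_msrs *m, uint64_t config)
{
	m->ctl = config | CTL_EN;
	m->ctl2 = 0;
}

/* Disable sequence: CTL2[Dis] = 1, no read-modify-write of CTL. */
static void ibs_disable(struct ibs_op_msrs *m)
{
	m->ctl2 = CTL2_DIS;
}
```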

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 45 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 02e7bffe1208..d8216048be84 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -86,9 +86,11 @@ struct cpu_perf_ibs {
 struct perf_ibs {
 	struct pmu			pmu;
 	unsigned int			msr;
+	unsigned int			msr2;
 	u64				config_mask;
 	u64				cnt_mask;
 	u64				enable_mask;
+	u64				disable_mask;
 	u64				valid_mask;
 	u16				min_period;
 	u64				max_period;
@@ -292,6 +294,8 @@ static int perf_ibs_init(struct perf_event *event)
 		return -ENOENT;
 
 	config = event->attr.config;
+	hwc->extra_reg.config = 0;
+	hwc->extra_reg.reg = 0;
 
 	if (event->pmu != &perf_ibs->pmu)
 		return -ENOENT;
@@ -316,6 +320,11 @@ static int perf_ibs_init(struct perf_event *event)
 	if (ret)
 		return ret;
 
+	if (ibs_caps & IBS_CAPS_DIS) {
+		hwc->extra_reg.config &= ~perf_ibs->disable_mask;
+		hwc->extra_reg.reg = perf_ibs->msr2;
+	}
+
 	if (hwc->sample_period) {
 		if (config & perf_ibs->cnt_mask)
 			/* raw max_cnt may not be set */
@@ -445,6 +454,9 @@ static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
 		wrmsrq(hwc->config_base, tmp & ~perf_ibs->enable_mask);
 
 	wrmsrq(hwc->config_base, tmp | perf_ibs->enable_mask);
+
+	if (hwc->extra_reg.reg)
+		wrmsrq(hwc->extra_reg.reg, hwc->extra_reg.config);
 }
 
 /*
@@ -457,6 +469,11 @@ static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
 static inline void perf_ibs_disable_event(struct perf_ibs *perf_ibs,
 					  struct hw_perf_event *hwc, u64 config)
 {
+	if (ibs_caps & IBS_CAPS_DIS) {
+		wrmsrq(hwc->extra_reg.reg, perf_ibs->disable_mask);
+		return;
+	}
+
 	config &= ~perf_ibs->cnt_mask;
 	if (boot_cpu_data.x86 == 0x10)
 		wrmsrq(hwc->config_base, config);
@@ -809,6 +826,7 @@ static struct perf_ibs perf_ibs_fetch = {
 		.check_period	= perf_ibs_check_period,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
+	.msr2			= MSR_AMD64_IBSFETCHCTL2,
 	.config_mask		= IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
 	.cnt_mask		= IBS_FETCH_MAX_CNT,
 	.enable_mask		= IBS_FETCH_ENABLE,
@@ -834,6 +852,7 @@ static struct perf_ibs perf_ibs_op = {
 		.check_period	= perf_ibs_check_period,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
+	.msr2			= MSR_AMD64_IBSOPCTL2,
 	.config_mask		= IBS_OP_MAX_CNT,
 	.cnt_mask		= IBS_OP_MAX_CNT | IBS_OP_CUR_CNT |
 				  IBS_OP_CUR_CNT_RAND,
@@ -1389,6 +1408,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 out:
 	if (!throttle) {
+		if (ibs_caps & IBS_CAPS_DIS)
+			wrmsrq(hwc->extra_reg.reg, perf_ibs->disable_mask);
+
 		if (perf_ibs == &perf_ibs_op) {
 			if (ibs_caps & IBS_CAPS_OPCNTEXT) {
 				new_config = period & IBS_OP_MAX_CNT_EXT_MASK;
@@ -1460,6 +1482,9 @@ static __init int perf_ibs_fetch_init(void)
 	if (ibs_caps & IBS_CAPS_ZEN4)
 		perf_ibs_fetch.config_mask |= IBS_FETCH_L3MISSONLY;
 
+	if (ibs_caps & IBS_CAPS_DIS)
+		perf_ibs_fetch.disable_mask = IBS_FETCH_2_DIS;
+
 	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
 	perf_ibs_fetch.pmu.attr_update = fetch_attr_update;
 
@@ -1481,6 +1506,9 @@ static __init int perf_ibs_op_init(void)
 	if (ibs_caps & IBS_CAPS_ZEN4)
 		perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
 
+	if (ibs_caps & IBS_CAPS_DIS)
+		perf_ibs_op.disable_mask = IBS_OP_2_DIS;
+
 	perf_ibs_op.pmu.attr_groups = op_attr_groups;
 	perf_ibs_op.pmu.attr_update = op_attr_update;
 
@@ -1727,6 +1755,23 @@ static void clear_APIC_ibs(void)
 static int x86_pmu_amd_ibs_starting_cpu(unsigned int cpu)
 {
 	setup_APIC_ibs();
+
+	if (ibs_caps & IBS_CAPS_DIS) {
+		/*
+		 * IBS enable sequence:
+		 *   CTL[En] = 1;
+		 *   CTL2[Dis] = 0;
+		 *
+		 * IBS disable sequence:
+		 *   CTL2[Dis] = 1;
+		 *
+		 * Set CTL2[Dis] when CPU comes up. This is needed to make
+		 * enable sequence effective.
+		 */
+		wrmsrq(MSR_AMD64_IBSFETCHCTL2, 1);
+		wrmsrq(MSR_AMD64_IBSOPCTL2, 1);
+	}
+
 	return 0;
 }
 
-- 
2.43.0



* [PATCH 08/11] perf/amd/ibs: Enable fetch latency filtering
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (6 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 09/11] perf/amd/ibs: Enable RIP bit63 hardware filtering Ravi Bangoria
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

IBS Fetch on future hardware adds fetch latency filtering, which
generates an interrupt only when the FetchLat value exceeds a
programmable threshold.

The hardware allows thresholds in 128-cycle increments (i.e. 128, 256,
384 etc.) from 128 to 1920 cycles. As with the existing IBS filters,
samples that fail the latency test are dropped and IBS restarts
internally.

Since the hardware supports thresholds only in multiples of 128, add a
software filter on top to support latency thresholds with 1-cycle
granularity within [128, 1920].

Example:
  # perf record -e ibs_fetch/fetchlat=128/ -c 10000 -a -- sleep 5
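
The split between the coarse hardware threshold and the fine-grained
software check can be sketched as follows. This is an illustration
mirroring the hunks below, with hypothetical function names; the
field position matches IBS_FETCH_2_FETCH_LAT_FILTER.

```c
#include <stdbool.h>
#include <stdint.h>

#define IBS_FETCH_2_FETCH_LAT_FILTER  (0xFULL << 1)

/*
 * Encode the requested threshold into IBS_FETCH_CTL2 bits.
 * Returns 0 and fills *ctl2 on success, -1 if out of range.
 */
static int encode_fetchlat(uint64_t fetchlat, uint64_t *ctl2)
{
	if (fetchlat < 128 || fetchlat > 1920)
		return -1;

	/* Round down to a multiple of 128 for the hardware filter. */
	*ctl2 = (fetchlat >> 7) << 1;
	return 0;
}

/* Software filter: drop samples below the exact requested value. */
static bool fetchlat_sw_filter_drops(uint64_t measured, uint64_t requested)
{
	return measured < requested;
}
```

For a request of 200 cycles, the hardware field is programmed with 1
(128 cycles), and a sample with a latency of 150 that passes the
hardware filter is then dropped in software.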

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 66 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d8216048be84..b2d21026edae 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -35,6 +35,8 @@ static u32 ibs_caps;
 /* attr.config1 */
 #define IBS_OP_CONFIG1_LDLAT_MASK		(0xFFFULL <<  0)
 
+#define IBS_FETCH_CONFIG1_FETCHLAT_MASK		(0x7FFULL <<  0)
+
 /*
  * IBS states:
  *
@@ -282,6 +284,14 @@ static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
 	       (event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK);
 }
 
+static bool perf_ibs_fetch_lat_event(struct perf_ibs *perf_ibs,
+				     struct perf_event *event)
+{
+	return perf_ibs == &perf_ibs_fetch &&
+	       (ibs_caps & IBS_CAPS_FETCHLAT) &&
+	       (event->attr.config1 & IBS_FETCH_CONFIG1_FETCHLAT_MASK);
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -374,6 +384,17 @@ static int perf_ibs_init(struct perf_event *event)
 			config |= IBS_OP_L3MISSONLY;
 	}
 
+	if (perf_ibs_fetch_lat_event(perf_ibs, event)) {
+		u64 fetchlat = event->attr.config1 & IBS_FETCH_CONFIG1_FETCHLAT_MASK;
+
+		if (fetchlat < 128 || fetchlat > 1920)
+			return -EINVAL;
+		fetchlat >>= 7;
+
+		hwc->extra_reg.reg = perf_ibs->msr2;
+		hwc->extra_reg.config |= fetchlat << 1;
+	}
+
 	/*
 	 * If we modify hwc->sample_period, we also need to update
 	 * hwc->last_period and hwc->period_left.
@@ -662,6 +683,8 @@ PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_format, "config1:0-11");
 PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
 PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_cap, "1");
 PMU_EVENT_ATTR_STRING(dtlb_pgsize, ibs_op_dtlb_pgsize_cap, "1");
+PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_format, "config1:0-10");
+PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_cap, "1");
 
 static umode_t
 zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@@ -669,6 +692,12 @@ zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int
 	return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
 }
 
+static umode_t
+ibs_fetch_lat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_FETCHLAT ? attr->mode : 0;
+}
+
 static umode_t
 ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
@@ -697,6 +726,16 @@ static struct attribute *zen4_ibs_extensions_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ibs_fetch_lat_format_attrs[] = {
+	&ibs_fetch_lat_format.attr.attr,
+	NULL,
+};
+
+static struct attribute *ibs_fetch_lat_cap_attrs[] = {
+	&ibs_fetch_lat_cap.attr.attr,
+	NULL,
+};
+
 static struct attribute *ibs_op_ldlat_cap_attrs[] = {
 	&ibs_op_ldlat_cap.attr.attr,
 	NULL,
@@ -724,6 +763,18 @@ static struct attribute_group group_zen4_ibs_extensions = {
 	.is_visible = zen4_ibs_extensions_is_visible,
 };
 
+static struct attribute_group group_ibs_fetch_lat_cap = {
+	.name = "caps",
+	.attrs = ibs_fetch_lat_cap_attrs,
+	.is_visible = ibs_fetch_lat_is_visible,
+};
+
+static struct attribute_group group_ibs_fetch_lat_format = {
+	.name = "format",
+	.attrs = ibs_fetch_lat_format_attrs,
+	.is_visible = ibs_fetch_lat_is_visible,
+};
+
 static struct attribute_group group_ibs_op_ldlat_cap = {
 	.name = "caps",
 	.attrs = ibs_op_ldlat_cap_attrs,
@@ -745,6 +796,8 @@ static const struct attribute_group *fetch_attr_groups[] = {
 static const struct attribute_group *fetch_attr_update[] = {
 	&group_fetch_l3missonly,
 	&group_zen4_ibs_extensions,
+	&group_ibs_fetch_lat_cap,
+	&group_ibs_fetch_lat_format,
 	NULL,
 };
 
@@ -1188,7 +1241,8 @@ static int perf_ibs_get_offset_max(struct perf_ibs *perf_ibs,
 {
 	if (event->attr.sample_type & PERF_SAMPLE_RAW ||
 	    perf_ibs_is_mem_sample_type(perf_ibs, event) ||
-	    perf_ibs_ldlat_event(perf_ibs, event))
+	    perf_ibs_ldlat_event(perf_ibs, event) ||
+	    perf_ibs_fetch_lat_event(perf_ibs, event))
 		return perf_ibs->offset_max;
 	else if (check_rip)
 		return 3;
@@ -1330,6 +1384,16 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		}
 	}
 
+	if (perf_ibs_fetch_lat_event(perf_ibs, event)) {
+		union ibs_fetch_ctl fetch_ctl;
+
+		fetch_ctl.val = ibs_data.regs[ibs_fetch_msr_idx(MSR_AMD64_IBSFETCHCTL)];
+		if (fetch_ctl.fetch_lat < (event->attr.config1 & IBS_FETCH_CONFIG1_FETCHLAT_MASK)) {
+			throttle = perf_event_account_interrupt(event);
+			goto out;
+		}
+	}
+
 	/*
 	 * Read IbsBrTarget, IbsOpData4, and IbsExtdCtl separately
 	 * depending on their availability.
-- 
2.43.0



* [PATCH 09/11] perf/amd/ibs: Enable RIP bit63 hardware filtering
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (7 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 08/11] perf/amd/ibs: Enable fetch latency filtering Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 10/11] perf/amd/ibs: Enable streaming store filter Ravi Bangoria
  2026-01-16  3:34 ` [PATCH 11/11] perf/amd/ibs: Advertise remote socket capability Ravi Bangoria
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

IBS on future hardware adds the ability to filter IBS events by examining
RIP bit 63. Because Linux kernel addresses always have bit 63 set while
user-space addresses never do, this capability can be used as a privilege
filter.

So far, IBS supports privilege filtering only in software (swfilt=1),
where samples are dropped in the NMI handler. The RIP bit63 hardware
filter makes IBS usable by unprivileged users without passing the swfilt
flag, so the swfilt flag is silently ignored when the hardware filtering
capability is present.

Example (non-root user):
  $ perf record -e ibs_op//u -- <workload>
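
The mapping described above can be sketched in user-space C. This is only
an illustration, not the kernel code: the two bit definitions are copied
from the IBS_OP_2_* macros in patch 06 of this series, while the helper
names ibs_op_ctl2_filter_bits() and sample_kept() are hypothetical, and
sample_kept() is merely an assumed model of what the hardware does with
those bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the IBS_OP_2_* definitions in patch 06. */
#define IBS_OP_2_EXCL_RIP_63_EQ_0	(1ULL << 1)
#define IBS_OP_2_EXCL_RIP_63_EQ_1	(1ULL << 2)

/* On x86-64 Linux, kernel virtual addresses have bit 63 set. */
static bool rip_bit63_set(uint64_t rip)
{
	return (rip >> 63) & 1;
}

/* Hypothetical helper: IBS_OP_CTL2 filter bits for an event's exclude flags. */
static uint64_t ibs_op_ctl2_filter_bits(bool exclude_kernel, bool exclude_user)
{
	uint64_t bits = 0;

	if (exclude_kernel)
		bits |= IBS_OP_2_EXCL_RIP_63_EQ_1;	/* drop RIP[63] == 1 */
	if (exclude_user)
		bits |= IBS_OP_2_EXCL_RIP_63_EQ_0;	/* drop RIP[63] == 0 */
	return bits;
}

/* Assumed model of the filter: is a sample taken at @rip kept? */
static bool sample_kept(uint64_t ctl2, uint64_t rip)
{
	if (rip_bit63_set(rip))
		return !(ctl2 & IBS_OP_2_EXCL_RIP_63_EQ_1);
	return !(ctl2 & IBS_OP_2_EXCL_RIP_63_EQ_0);
}
```

For the "ibs_op//u" example, exclude_kernel is set, so only the
EXCL_RIP_63_EQ_1 bit would be programmed and user-space samples pass
through. The fetch PMU uses different bit positions (see the
IBS_FETCH_2_* macros in patch 06).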

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 46 ++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index b2d21026edae..a768a82d7ad2 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -321,11 +321,6 @@ static int perf_ibs_init(struct perf_event *event)
 	    event->attr.exclude_idle)
 		return -EINVAL;
 
-	if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
-	    (event->attr.exclude_kernel || event->attr.exclude_user ||
-	     event->attr.exclude_hv))
-		return -EINVAL;
-
 	ret = validate_group(event);
 	if (ret)
 		return ret;
@@ -335,6 +330,32 @@ static int perf_ibs_init(struct perf_event *event)
 		hwc->extra_reg.reg = perf_ibs->msr2;
 	}
 
+	if (ibs_caps & IBS_CAPS_BIT63_FILTER) {
+		if (perf_ibs == &perf_ibs_fetch) {
+			if (event->attr.exclude_kernel) {
+				hwc->extra_reg.config |= IBS_FETCH_2_EXCL_RIP_63_EQ_1;
+				hwc->extra_reg.reg = perf_ibs->msr2;
+			}
+			if (event->attr.exclude_user) {
+				hwc->extra_reg.config |= IBS_FETCH_2_EXCL_RIP_63_EQ_0;
+				hwc->extra_reg.reg = perf_ibs->msr2;
+			}
+		} else {
+			if (event->attr.exclude_kernel) {
+				hwc->extra_reg.config |= IBS_OP_2_EXCL_RIP_63_EQ_1;
+				hwc->extra_reg.reg = perf_ibs->msr2;
+			}
+			if (event->attr.exclude_user) {
+				hwc->extra_reg.config |= IBS_OP_2_EXCL_RIP_63_EQ_0;
+				hwc->extra_reg.reg = perf_ibs->msr2;
+			}
+		}
+	} else if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
+		   (event->attr.exclude_kernel || event->attr.exclude_user ||
+		    event->attr.exclude_hv)) {
+		return -EINVAL;
+	}
+
 	if (hwc->sample_period) {
 		if (config & perf_ibs->cnt_mask)
 			/* raw max_cnt may not be set */
@@ -1277,7 +1298,7 @@ static bool perf_ibs_is_kernel_br_target(struct perf_event *event,
 			op_data.op_brn_ret && kernel_ip(br_target));
 }
 
-static bool perf_ibs_swfilt_discard(struct perf_ibs *perf_ibs, struct perf_event *event,
+static bool perf_ibs_discard_sample(struct perf_ibs *perf_ibs, struct perf_event *event,
 				    struct pt_regs *regs, struct perf_ibs_data *ibs_data,
 				    int br_target_idx)
 {
@@ -1430,8 +1451,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		regs.flags |= PERF_EFLAGS_EXACT;
 	}
 
-	if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
-	    perf_ibs_swfilt_discard(perf_ibs, event, &regs, &ibs_data, br_target_idx)) {
+	if (((ibs_caps & IBS_CAPS_BIT63_FILTER) ||
+	     (event->attr.config2 & IBS_SW_FILTER_MASK)) &&
+	    perf_ibs_discard_sample(perf_ibs, event, &regs, &ibs_data, br_target_idx)) {
 		throttle = perf_event_account_interrupt(event);
 		goto out;
 	}
@@ -1894,6 +1916,14 @@ static __init int amd_ibs_init(void)
 
 	perf_ibs_pm_init();
 
+#ifdef CONFIG_X86_32
+	/*
+	 * IBS_CAPS_BIT63_FILTER is used for exclude_kernel/user filtering,
+	 * which obviously won't work for 32 bit kernel.
+	 */
+	caps &= ~IBS_CAPS_BIT63_FILTER;
+#endif
+
 	ibs_caps = caps;
 	/* make ibs_caps visible to other cpus: */
 	smp_mb();
-- 
2.43.0



* [PATCH 10/11] perf/amd/ibs: Enable streaming store filter
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (8 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 09/11] perf/amd/ibs: Enable RIP bit63 hardware filtering Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  2026-01-19  7:57   ` Mi, Dapeng
  2026-01-16  3:34 ` [PATCH 11/11] perf/amd/ibs: Advertise remote socket capability Ravi Bangoria
  10 siblings, 1 reply; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

IBS OP on future hardware supports recording samples only for instructions
that perform streaming stores. As with the existing IBS filters, samples
pointing to instructions that do not perform a streaming store are
discarded and IBS restarts internally.

Example:

  $ perf record -e ibs_op/strmst=1/ -- <workload>
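
As a rough illustration of what "strmst=1" turns into, the sketch below
composes attr.config1 the way a tool would after reading the sysfs format
strings from this series ("config1:12" for strmst, config1[11:0] for
ldlat). The helper name ibs_op_make_config1() is hypothetical, and whether
combining both filters is meaningful on real hardware is not stated in
this thread.

```c
#include <stdint.h>

/* Field layout taken from the config1 definitions in this series. */
#define IBS_OP_CONFIG1_LDLAT_MASK	(0xFFFULL << 0)	/* ldlat:  config1:0-11 */
#define IBS_OP_CONFIG1_STRMST_MASK	(1ULL << 12)	/* strmst: config1:12   */

/* Hypothetical helper: compose attr.config1 for an ibs_op event. */
static uint64_t ibs_op_make_config1(uint64_t ldlat, int strmst)
{
	uint64_t config1 = ldlat & IBS_OP_CONFIG1_LDLAT_MASK;

	if (strmst)
		config1 |= IBS_OP_CONFIG1_STRMST_MASK;
	return config1;
}
```

With this layout, "ibs_op/strmst=1/" corresponds to config1 == 0x1000.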

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c      | 50 ++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/amd/ibs.h |  3 +-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index a768a82d7ad2..0331bcd82272 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -34,6 +34,7 @@ static u32 ibs_caps;
 
 /* attr.config1 */
 #define IBS_OP_CONFIG1_LDLAT_MASK		(0xFFFULL <<  0)
+#define IBS_OP_CONFIG1_STRMST_MASK		(    1ULL << 12)
 
 #define IBS_FETCH_CONFIG1_FETCHLAT_MASK		(0x7FFULL <<  0)
 
@@ -292,6 +293,14 @@ static bool perf_ibs_fetch_lat_event(struct perf_ibs *perf_ibs,
 	       (event->attr.config1 & IBS_FETCH_CONFIG1_FETCHLAT_MASK);
 }
 
+static bool perf_ibs_strmst_event(struct perf_ibs *perf_ibs,
+				  struct perf_event *event)
+{
+	return perf_ibs == &perf_ibs_op &&
+	       (ibs_caps & IBS_CAPS_STRMST_RMTSOCKET) &&
+	       (event->attr.config1 & IBS_OP_CONFIG1_STRMST_MASK);
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -416,6 +425,15 @@ static int perf_ibs_init(struct perf_event *event)
 		hwc->extra_reg.config |= fetchlat << 1;
 	}
 
+	if (perf_ibs_strmst_event(perf_ibs, event)) {
+		u64 strmst = event->attr.config1 & IBS_OP_CONFIG1_STRMST_MASK;
+
+		strmst >>= 12;
+
+		hwc->extra_reg.reg = perf_ibs->msr2;
+		hwc->extra_reg.config |= strmst << 3;
+	}
+
 	/*
 	 * If we modify hwc->sample_period, we also need to update
 	 * hwc->last_period and hwc->period_left.
@@ -706,6 +724,8 @@ PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_cap, "1");
 PMU_EVENT_ATTR_STRING(dtlb_pgsize, ibs_op_dtlb_pgsize_cap, "1");
 PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_format, "config1:0-10");
 PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_cap, "1");
+PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_format, "config1:12");
+PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_cap, "1");
 
 static umode_t
 zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@@ -719,6 +739,12 @@ ibs_fetch_lat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 	return ibs_caps & IBS_CAPS_FETCHLAT ? attr->mode : 0;
 }
 
+static umode_t
+ibs_op_strmst_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_STRMST_RMTSOCKET ? attr->mode : 0;
+}
+
 static umode_t
 ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
@@ -767,6 +793,11 @@ static struct attribute *ibs_op_dtlb_pgsize_cap_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ibs_op_strmst_cap_attrs[] = {
+	&ibs_op_strmst_cap.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_fetch_formats = {
 	.name = "format",
 	.attrs = fetch_attrs,
@@ -808,6 +839,12 @@ static struct attribute_group group_ibs_op_dtlb_pgsize_cap = {
 	.is_visible = ibs_op_dtlb_pgsize_is_visible,
 };
 
+static struct attribute_group group_ibs_op_strmst_cap = {
+	.name = "caps",
+	.attrs = ibs_op_strmst_cap_attrs,
+	.is_visible = ibs_op_strmst_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_fetch_formats,
 	&empty_caps_group,
@@ -853,6 +890,11 @@ static struct attribute *ibs_op_ldlat_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ibs_op_strmst_format_attrs[] = {
+	&ibs_op_strmst_format.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_cnt_ctl = {
 	.name = "format",
 	.attrs = cnt_ctl_attrs,
@@ -877,6 +919,12 @@ static struct attribute_group group_ibs_op_ldlat_format = {
 	.is_visible = ibs_op_ldlat_is_visible,
 };
 
+static struct attribute_group group_ibs_op_strmst_format = {
+	.name = "format",
+	.attrs = ibs_op_strmst_format_attrs,
+	.is_visible = ibs_op_strmst_is_visible,
+};
+
 static const struct attribute_group *op_attr_update[] = {
 	&group_cnt_ctl,
 	&group_op_l3missonly,
@@ -884,6 +932,8 @@ static const struct attribute_group *op_attr_update[] = {
 	&group_ibs_op_ldlat_cap,
 	&group_ibs_op_ldlat_format,
 	&group_ibs_op_dtlb_pgsize_cap,
+	&group_ibs_op_strmst_cap,
+	&group_ibs_op_strmst_format,
 	NULL,
 };
 
diff --git a/arch/x86/include/asm/amd/ibs.h b/arch/x86/include/asm/amd/ibs.h
index 3ee5903982c2..b940156b7d23 100644
--- a/arch/x86/include/asm/amd/ibs.h
+++ b/arch/x86/include/asm/amd/ibs.h
@@ -99,7 +99,8 @@ union ibs_op_data2 {
 			rmt_node:1,	/* 4: destination node */
 			cache_hit_st:1,	/* 5: cache hit state */
 			data_src_hi:2,	/* 6-7: data source high */
-			reserved1:56;	/* 8-63: reserved */
+			strm_st:1,	/* 8: streaming store */
+			reserved1:55;	/* 9-63: reserved */
 	};
 };
 
-- 
2.43.0



* [PATCH 11/11] perf/amd/ibs: Advertise remote socket capability
  2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
                   ` (9 preceding siblings ...)
  2026-01-16  3:34 ` [PATCH 10/11] perf/amd/ibs: Enable streaming store filter Ravi Bangoria
@ 2026-01-16  3:34 ` Ravi Bangoria
  10 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-16  3:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Ravi Bangoria, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Dapeng Mi, James Clark, x86, linux-perf-users, linux-kernel,
	Manali Shukla, Santosh Shukla, Ananth Narayan, Sandipan Das

IBS OP on future hardware can also indicate that the data came from a
remote socket. Advertise this capability to userspace so that userspace
tools can decode IBS data accordingly.
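
A userspace decoder might pick the new bits out of a raw IbsOpData2 value
roughly as follows. This is a sketch only: the bit positions come from
union ibs_op_data2 as extended by patches 10 and 11, but the macro and
function names are made up for illustration, and a real tool would first
check that the "rmtsocket" entry exists under the ibs_op PMU's "caps"
sysfs directory before trusting the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from union ibs_op_data2 after patches 10 and 11. */
#define IBS_OP_DATA2_STRM_ST_BIT	8	/* streaming store */
#define IBS_OP_DATA2_RMT_SOCKET_BIT	9	/* remote socket   */

/* Hypothetical accessor: was the sampled op a streaming store? */
static bool ibs_op_data2_strm_st(uint64_t op_data2)
{
	return (op_data2 >> IBS_OP_DATA2_STRM_ST_BIT) & 1;
}

/* Hypothetical accessor: did the data come from a remote socket? */
static bool ibs_op_data2_rmt_socket(uint64_t op_data2)
{
	return (op_data2 >> IBS_OP_DATA2_RMT_SOCKET_BIT) & 1;
}
```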

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c      | 19 +++++++++++++++++++
 arch/x86/include/asm/amd/ibs.h |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 0331bcd82272..b1e05a13df7a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -726,6 +726,7 @@ PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_format, "config1:0-10");
 PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_cap, "1");
 PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_format, "config1:12");
 PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_cap, "1");
+PMU_EVENT_ATTR_STRING(rmtsocket, ibs_op_rmtsocket_cap, "1");
 
 static umode_t
 zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
@@ -745,6 +746,12 @@ ibs_op_strmst_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 	return ibs_caps & IBS_CAPS_STRMST_RMTSOCKET ? attr->mode : 0;
 }
 
+static umode_t
+ibs_op_rmtsocket_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return ibs_caps & IBS_CAPS_STRMST_RMTSOCKET ? attr->mode : 0;
+}
+
 static umode_t
 ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
@@ -798,6 +805,11 @@ static struct attribute *ibs_op_strmst_cap_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ibs_op_rmtsocket_cap_attrs[] = {
+	&ibs_op_rmtsocket_cap.attr.attr,
+	NULL,
+};
+
 static struct attribute_group group_fetch_formats = {
 	.name = "format",
 	.attrs = fetch_attrs,
@@ -845,6 +857,12 @@ static struct attribute_group group_ibs_op_strmst_cap = {
 	.is_visible = ibs_op_strmst_is_visible,
 };
 
+static struct attribute_group group_ibs_op_rmtsocket_cap = {
+	.name = "caps",
+	.attrs = ibs_op_rmtsocket_cap_attrs,
+	.is_visible = ibs_op_rmtsocket_is_visible,
+};
+
 static const struct attribute_group *fetch_attr_groups[] = {
 	&group_fetch_formats,
 	&empty_caps_group,
@@ -934,6 +952,7 @@ static const struct attribute_group *op_attr_update[] = {
 	&group_ibs_op_dtlb_pgsize_cap,
 	&group_ibs_op_strmst_cap,
 	&group_ibs_op_strmst_format,
+	&group_ibs_op_rmtsocket_cap,
 	NULL,
 };
 
diff --git a/arch/x86/include/asm/amd/ibs.h b/arch/x86/include/asm/amd/ibs.h
index b940156b7d23..532c189e77b8 100644
--- a/arch/x86/include/asm/amd/ibs.h
+++ b/arch/x86/include/asm/amd/ibs.h
@@ -100,7 +100,8 @@ union ibs_op_data2 {
 			cache_hit_st:1,	/* 5: cache hit state */
 			data_src_hi:2,	/* 6-7: data source high */
 			strm_st:1,	/* 8: streaming store */
-			reserved1:55;	/* 9-63: reserved */
+			rmt_socket:1,   /* 9: remote socket */
+			reserved1:54;   /* 10-63: reserved */
 	};
 };
 
-- 
2.43.0



* Re: [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples
  2026-01-16  3:34 ` [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples Ravi Bangoria
@ 2026-01-19  7:31   ` Mi, Dapeng
  2026-01-19 12:56     ` Ravi Bangoria
  0 siblings, 1 reply; 20+ messages in thread
From: Mi, Dapeng @ 2026-01-19  7:31 UTC (permalink / raw)
  To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, James Clark,
	x86, linux-perf-users, linux-kernel, Manali Shukla,
	Santosh Shukla, Ananth Narayan, Sandipan Das


On 1/16/2026 11:34 AM, Ravi Bangoria wrote:
> The IBS NMI handler has a software filter (on top of the hardware
> filter) to discard samples with a load latency value lower than the
> user-requested threshold. However, since the software filter still
> involves an NMI, check for NMI overhead and throttle the sample rate if
> needed.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index aca89f23d2e0..96bb0974057f 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -1293,8 +1293,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  		 * within [128, 2048] range.
>  		 */
>  		if (!op_data3.ld_op || !op_data3.dc_miss ||
> -		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF))
> +		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF)) {
> +			throttle = perf_event_account_interrupt(event);
>  			goto out;
> +		}
>  	}

I'm not very familiar with the IBS code, but should the code below also
account the interrupt for throttling?

        /* Workaround for erratum #1197 */
        if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
            goto out;


>  
>  	/*


* Re: [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask
  2026-01-16  3:34 ` [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask Ravi Bangoria
@ 2026-01-19  7:38   ` Mi, Dapeng
  0 siblings, 0 replies; 20+ messages in thread
From: Mi, Dapeng @ 2026-01-19  7:38 UTC (permalink / raw)
  To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, James Clark,
	x86, linux-perf-users, linux-kernel, Manali Shukla,
	Santosh Shukla, Ananth Narayan, Sandipan Das


On 1/16/2026 11:34 AM, Ravi Bangoria wrote:
> Load latency filter threshold is encoded in config1[11:0]. Define a mask
> for it instead of hardcoded 0xFFF. Unlike "config" fields whose layout
> maps to PERF_{FETCH|OP}_CTL MSR, layout of "config1" is custom defined
> so a new set of macros are needed for "config1" fields.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 27b764eee6c7..02e7bffe1208 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -32,6 +32,9 @@ static u32 ibs_caps;
>  /* attr.config2 */
>  #define IBS_SW_FILTER_MASK	1
>  
> +/* attr.config1 */
> +#define IBS_OP_CONFIG1_LDLAT_MASK		(0xFFFULL <<  0)
> +
>  /*
>   * IBS states:
>   *
> @@ -274,7 +277,7 @@ static bool perf_ibs_ldlat_event(struct perf_ibs *perf_ibs,
>  {
>  	return perf_ibs == &perf_ibs_op &&
>  	       (ibs_caps & IBS_CAPS_OPLDLAT) &&
> -	       (event->attr.config1 & 0xFFF);
> +	       (event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK);
>  }
>  
>  static int perf_ibs_init(struct perf_event *event)
> @@ -349,7 +352,7 @@ static int perf_ibs_init(struct perf_event *event)
>  	}
>  
>  	if (perf_ibs_ldlat_event(perf_ibs, event)) {
> -		u64 ldlat = event->attr.config1 & 0xFFF;
> +		u64 ldlat = event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK;
>  
>  		if (ldlat < 128 || ldlat > 2048)
>  			return -EINVAL;
> @@ -1302,7 +1305,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  		 * within [128, 2048] range.
>  		 */
>  		if (!op_data3.ld_op || !op_data3.dc_miss ||
> -		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF)) {
> +		    op_data3.dc_miss_lat <= (event->attr.config1 & IBS_OP_CONFIG1_LDLAT_MASK)) {
>  			throttle = perf_event_account_interrupt(event);
>  			goto out;
>  		}

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>




* Re: [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions
  2026-01-16  3:34 ` [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions Ravi Bangoria
@ 2026-01-19  7:39   ` Mi, Dapeng
  0 siblings, 0 replies; 20+ messages in thread
From: Mi, Dapeng @ 2026-01-19  7:39 UTC (permalink / raw)
  To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, James Clark,
	x86, linux-perf-users, linux-kernel, Manali Shukla,
	Santosh Shukla, Ananth Narayan, Sandipan Das


On 1/16/2026 11:34 AM, Ravi Bangoria wrote:
> IBS on an upcoming microarchitecture introduces two new control MSRs and
> a couple of new features. Define macros for them.
>
> New capabilities:
>
>  o IBS_CAPS_DIS: Alternate Fetch and Op IBS disable bits
>  o IBS_CAPS_FETCHLAT: Fetch Latency filter
>  o IBS_CAPS_BIT63_FILTER: Virtual address bit 63 based filters for Fetch
>    and Op
>  o IBS_CAPS_STRMST_RMTSOCKET: Streaming store filter and indicator,
>    remote socket indicator
>
> New control MSRs for above features:
>
>  o MSR_AMD64_IBSFETCHCTL2
>  o MSR_AMD64_IBSOPCTL2
>
> Also do cosmetic alignment changes.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h  |  2 ++
>  arch/x86/include/asm/perf_event.h | 52 ++++++++++++++++++++-----------
>  2 files changed, 35 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3d0a0950d20a..d8b3f3abe583 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -693,6 +693,8 @@
>  #define MSR_AMD64_IBSBRTARGET		0xc001103b
>  #define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
>  #define MSR_AMD64_IBSOPDATA4		0xc001103d
> +#define MSR_AMD64_IBSOPCTL2		0xc001103e
> +#define MSR_AMD64_IBSFETCHCTL2		0xc001103f
>  #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
>  #define MSR_AMD64_SVM_AVIC_DOORBELL	0xc001011b
>  #define MSR_AMD64_VM_PAGE_FLUSH		0xc001011e
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 0d9af4135e0a..6f5ec5c9d5b4 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -639,6 +639,10 @@ struct arch_pebs_cntr_header {
>  #define IBS_CAPS_OPDATA4		(1U<<10)
>  #define IBS_CAPS_ZEN4			(1U<<11)
>  #define IBS_CAPS_OPLDLAT		(1U<<12)
> +#define IBS_CAPS_DIS			(1U<<13)
> +#define IBS_CAPS_FETCHLAT		(1U<<14)
> +#define IBS_CAPS_BIT63_FILTER		(1U<<15)
> +#define IBS_CAPS_STRMST_RMTSOCKET	(1U<<16)
>  #define IBS_CAPS_OPDTLBPGSIZE		(1U<<19)
>  
>  #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
> @@ -653,31 +657,41 @@ struct arch_pebs_cntr_header {
>  #define IBSCTL_LVT_OFFSET_MASK		0x0F
>  
>  /* IBS fetch bits/masks */
> -#define IBS_FETCH_L3MISSONLY	(1ULL<<59)
> -#define IBS_FETCH_RAND_EN	(1ULL<<57)
> -#define IBS_FETCH_VAL		(1ULL<<49)
> -#define IBS_FETCH_ENABLE	(1ULL<<48)
> -#define IBS_FETCH_CNT		0xFFFF0000ULL
> -#define IBS_FETCH_MAX_CNT	0x0000FFFFULL
> +#define IBS_FETCH_L3MISSONLY		      (1ULL << 59)
> +#define IBS_FETCH_RAND_EN		      (1ULL << 57)
> +#define IBS_FETCH_VAL			      (1ULL << 49)
> +#define IBS_FETCH_ENABLE		      (1ULL << 48)
> +#define IBS_FETCH_CNT			     0xFFFF0000ULL
> +#define IBS_FETCH_MAX_CNT		     0x0000FFFFULL
> +
> +#define IBS_FETCH_2_DIS			      (1ULL <<  0)
> +#define IBS_FETCH_2_FETCH_LAT_FILTER	    (0xFULL <<  1)
> +#define IBS_FETCH_2_EXCL_RIP_63_EQ_1	      (1ULL <<  5)
> +#define IBS_FETCH_2_EXCL_RIP_63_EQ_0	      (1ULL <<  6)
>  
>  /*
>   * IBS op bits/masks
>   * The lower 7 bits of the current count are random bits
>   * preloaded by hardware and ignored in software
>   */
> -#define IBS_OP_LDLAT_EN		(1ULL<<63)
> -#define IBS_OP_LDLAT_THRSH	(0xFULL<<59)
> -#define IBS_OP_CUR_CNT		(0xFFF80ULL<<32)
> -#define IBS_OP_CUR_CNT_RAND	(0x0007FULL<<32)
> -#define IBS_OP_CUR_CNT_EXT_MASK	(0x7FULL<<52)
> -#define IBS_OP_CNT_CTL		(1ULL<<19)
> -#define IBS_OP_VAL		(1ULL<<18)
> -#define IBS_OP_ENABLE		(1ULL<<17)
> -#define IBS_OP_L3MISSONLY	(1ULL<<16)
> -#define IBS_OP_MAX_CNT		0x0000FFFFULL
> -#define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
> -#define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
> -#define IBS_RIP_INVALID		(1ULL<<38)
> +#define IBS_OP_LDLAT_EN			      (1ULL << 63)
> +#define IBS_OP_LDLAT_THRSH		    (0xFULL << 59)
> +#define IBS_OP_CUR_CNT			(0xFFF80ULL << 32)
> +#define IBS_OP_CUR_CNT_RAND		(0x0007FULL << 32)
> +#define IBS_OP_CUR_CNT_EXT_MASK		   (0x7FULL << 52)
> +#define IBS_OP_CNT_CTL			      (1ULL << 19)
> +#define IBS_OP_VAL			      (1ULL << 18)
> +#define IBS_OP_ENABLE			      (1ULL << 17)
> +#define IBS_OP_L3MISSONLY		      (1ULL << 16)
> +#define IBS_OP_MAX_CNT			     0x0000FFFFULL
> +#define IBS_OP_MAX_CNT_EXT		     0x007FFFFFULL	/* not a register bit mask */
> +#define IBS_OP_MAX_CNT_EXT_MASK		   (0x7FULL << 20)	/* separate upper 7 bits */
> +#define IBS_RIP_INVALID			      (1ULL << 38)
> +
> +#define IBS_OP_2_DIS			      (1ULL <<  0)
> +#define IBS_OP_2_EXCL_RIP_63_EQ_0	      (1ULL <<  1)
> +#define IBS_OP_2_EXCL_RIP_63_EQ_1	      (1ULL <<  2)
> +#define IBS_OP_2_STRM_ST_FILTER		      (1ULL <<  3)
>  
>  #ifdef CONFIG_X86_LOCAL_APIC
>  extern u32 get_ibs_caps(void);

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>




* Re: [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race
  2026-01-16  3:34 ` [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race Ravi Bangoria
@ 2026-01-19  7:48   ` Mi, Dapeng
  2026-01-19 13:00     ` Ravi Bangoria
  0 siblings, 1 reply; 20+ messages in thread
From: Mi, Dapeng @ 2026-01-19  7:48 UTC (permalink / raw)
  To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, James Clark,
	x86, linux-perf-users, linux-kernel, Manali Shukla,
	Santosh Shukla, Ananth Narayan, Sandipan Das


On 1/16/2026 11:34 AM, Ravi Bangoria wrote:
> The existing IBS_{FETCH|OP}_CTL MSRs combine control and status bits
> which leads to RMW race between HW and SW:
>
>   HW                               SW
>   ------------------------         ------------------------------
>                                    config = rdmsr(IBS_OP_CTL);
>                                    config &= ~EN;
>   Set IBS_OP_CTL[Val] to 1
>   trigger NMI
>                                    wrmsr(IBS_OP_CTL, config);
>                                    // Val is accidentally cleared
>
> Future hardware adds a control-only MSR, IBS_{FETCH|OP}_CTL2, which
> provides a second-level "disable" bit (Dis). IBS is now:
>
>   Enabled:  IBS_{FETCH|OP}_CTL[En] = 1 && IBS_{FETCH|OP}_CTL2[Dis] = 0
>   Disabled: IBS_{FETCH|OP}_CTL[En] = 0 || IBS_{FETCH|OP}_CTL2[Dis] = 1
>
> The separate "Dis" bit lets software disable IBS without touching any
> status fields, eliminating the hardware/software race.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c | 45 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 02e7bffe1208..d8216048be84 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -86,9 +86,11 @@ struct cpu_perf_ibs {
>  struct perf_ibs {
>  	struct pmu			pmu;
>  	unsigned int			msr;
> +	unsigned int			msr2;
>  	u64				config_mask;
>  	u64				cnt_mask;
>  	u64				enable_mask;
> +	u64				disable_mask;
>  	u64				valid_mask;
>  	u16				min_period;
>  	u64				max_period;
> @@ -292,6 +294,8 @@ static int perf_ibs_init(struct perf_event *event)
>  		return -ENOENT;
>  
>  	config = event->attr.config;
> +	hwc->extra_reg.config = 0;
> +	hwc->extra_reg.reg = 0;
>  
>  	if (event->pmu != &perf_ibs->pmu)
>  		return -ENOENT;
> @@ -316,6 +320,11 @@ static int perf_ibs_init(struct perf_event *event)
>  	if (ret)
>  		return ret;
>  
> +	if (ibs_caps & IBS_CAPS_DIS) {
> +		hwc->extra_reg.config &= ~perf_ibs->disable_mask;
> +		hwc->extra_reg.reg = perf_ibs->msr2;
> +	}
> +
>  	if (hwc->sample_period) {
>  		if (config & perf_ibs->cnt_mask)
>  			/* raw max_cnt may not be set */
> @@ -445,6 +454,9 @@ static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
>  		wrmsrq(hwc->config_base, tmp & ~perf_ibs->enable_mask);
>  
>  	wrmsrq(hwc->config_base, tmp | perf_ibs->enable_mask);
> +
> +	if (hwc->extra_reg.reg)
> +		wrmsrq(hwc->extra_reg.reg, hwc->extra_reg.config);
>  }
>  
>  /*
> @@ -457,6 +469,11 @@ static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
>  static inline void perf_ibs_disable_event(struct perf_ibs *perf_ibs,
>  					  struct hw_perf_event *hwc, u64 config)
>  {
> +	if (ibs_caps & IBS_CAPS_DIS) {
> +		wrmsrq(hwc->extra_reg.reg, perf_ibs->disable_mask);
> +		return;
> +	}
> +
>  	config &= ~perf_ibs->cnt_mask;
>  	if (boot_cpu_data.x86 == 0x10)
>  		wrmsrq(hwc->config_base, config);
> @@ -809,6 +826,7 @@ static struct perf_ibs perf_ibs_fetch = {
>  		.check_period	= perf_ibs_check_period,
>  	},
>  	.msr			= MSR_AMD64_IBSFETCHCTL,
> +	.msr2			= MSR_AMD64_IBSFETCHCTL2,
>  	.config_mask		= IBS_FETCH_MAX_CNT | IBS_FETCH_RAND_EN,
>  	.cnt_mask		= IBS_FETCH_MAX_CNT,
>  	.enable_mask		= IBS_FETCH_ENABLE,
> @@ -834,6 +852,7 @@ static struct perf_ibs perf_ibs_op = {
>  		.check_period	= perf_ibs_check_period,
>  	},
>  	.msr			= MSR_AMD64_IBSOPCTL,
> +	.msr2			= MSR_AMD64_IBSOPCTL2,
>  	.config_mask		= IBS_OP_MAX_CNT,
>  	.cnt_mask		= IBS_OP_MAX_CNT | IBS_OP_CUR_CNT |
>  				  IBS_OP_CUR_CNT_RAND,
> @@ -1389,6 +1408,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  
>  out:
>  	if (!throttle) {
> +		if (ibs_caps & IBS_CAPS_DIS)
> +			wrmsrq(hwc->extra_reg.reg, perf_ibs->disable_mask);
> +
>  		if (perf_ibs == &perf_ibs_op) {
>  			if (ibs_caps & IBS_CAPS_OPCNTEXT) {
>  				new_config = period & IBS_OP_MAX_CNT_EXT_MASK;
> @@ -1460,6 +1482,9 @@ static __init int perf_ibs_fetch_init(void)
>  	if (ibs_caps & IBS_CAPS_ZEN4)
>  		perf_ibs_fetch.config_mask |= IBS_FETCH_L3MISSONLY;
>  
> +	if (ibs_caps & IBS_CAPS_DIS)
> +		perf_ibs_fetch.disable_mask = IBS_FETCH_2_DIS;
> +
>  	perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups;
>  	perf_ibs_fetch.pmu.attr_update = fetch_attr_update;
>  
> @@ -1481,6 +1506,9 @@ static __init int perf_ibs_op_init(void)
>  	if (ibs_caps & IBS_CAPS_ZEN4)
>  		perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
>  
> +	if (ibs_caps & IBS_CAPS_DIS)
> +		perf_ibs_op.disable_mask = IBS_OP_2_DIS;
> +
>  	perf_ibs_op.pmu.attr_groups = op_attr_groups;
>  	perf_ibs_op.pmu.attr_update = op_attr_update;
>  
> @@ -1727,6 +1755,23 @@ static void clear_APIC_ibs(void)
>  static int x86_pmu_amd_ibs_starting_cpu(unsigned int cpu)
>  {
>  	setup_APIC_ibs();
> +
> +	if (ibs_caps & IBS_CAPS_DIS) {
> +		/*
> +		 * IBS enable sequence:
> +		 *   CTL[En] = 1;
> +		 *   CTL2[Dis] = 0;
> +		 *
> +		 * IBS disable sequence:
> +		 *   CTL2[Dis] = 1;
> +		 *
> +		 * Set CTL2[Dis] when CPU comes up. This is needed to make
> +		 * enable sequence effective.
> +		 */
> +		wrmsrq(MSR_AMD64_IBSFETCHCTL2, 1);
> +		wrmsrq(MSR_AMD64_IBSOPCTL2, 1);

What does bit 0 of these two MSRs mean? Disable? It would be better to
use a macro here instead of the magic number "1".
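
One possible shape for this, sketched as standalone C so it can be compiled
outside the kernel: patch 06 of this series already defines IBS_FETCH_2_DIS
and IBS_OP_2_DIS as bit 0 of the respective CTL2 MSRs, so the CPU-up path
could name them instead of writing 1. Here fake_wrmsrq() and fake_msr[] are
stand-ins for the kernel's wrmsrq() and the real MSRs, and
ibs_set_dis_on_cpu_up() is a hypothetical name.

```c
#include <stdint.h>

/* MSR numbers and bit definitions as introduced in patch 06. */
#define MSR_AMD64_IBSOPCTL2	0xc001103e
#define MSR_AMD64_IBSFETCHCTL2	0xc001103f
#define IBS_FETCH_2_DIS		(1ULL << 0)
#define IBS_OP_2_DIS		(1ULL << 0)

/* Stand-in for real MSRs: [0] is IBSOPCTL2, [1] is IBSFETCHCTL2. */
static uint64_t fake_msr[2];

/* Stand-in for the kernel's wrmsrq(); only handles the two CTL2 MSRs. */
static void fake_wrmsrq(uint32_t msr, uint64_t val)
{
	fake_msr[msr - MSR_AMD64_IBSOPCTL2] = val;
}

/* Set CTL2[Dis] on both PMUs, using the named bits rather than "1". */
static void ibs_set_dis_on_cpu_up(void)
{
	fake_wrmsrq(MSR_AMD64_IBSFETCHCTL2, IBS_FETCH_2_DIS);
	fake_wrmsrq(MSR_AMD64_IBSOPCTL2, IBS_OP_2_DIS);
}
```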


> +	}
> +
>  	return 0;
>  }
>  


* Re: [PATCH 10/11] perf/amd/ibs: Enable streaming store filter
  2026-01-16  3:34 ` [PATCH 10/11] perf/amd/ibs: Enable streaming store filter Ravi Bangoria
@ 2026-01-19  7:57   ` Mi, Dapeng
  2026-01-19 13:02     ` Ravi Bangoria
  0 siblings, 1 reply; 20+ messages in thread
From: Mi, Dapeng @ 2026-01-19  7:57 UTC (permalink / raw)
  To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, James Clark,
	x86, linux-perf-users, linux-kernel, Manali Shukla,
	Santosh Shukla, Ananth Narayan, Sandipan Das


On 1/16/2026 11:34 AM, Ravi Bangoria wrote:
> IBS OP on future hardware supports recording samples only for instructions
> that perform streaming stores. As with the existing IBS filters, samples
> pointing to instructions that do not perform a streaming store are
> discarded and IBS restarts internally.
>
> Example:
>
>   $ perf record -e ibs_op/strmst=1/ -- <workload>
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/ibs.c      | 50 ++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/amd/ibs.h |  3 +-
>  2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index a768a82d7ad2..0331bcd82272 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -34,6 +34,7 @@ static u32 ibs_caps;
>  
>  /* attr.config1 */
>  #define IBS_OP_CONFIG1_LDLAT_MASK		(0xFFFULL <<  0)
> +#define IBS_OP_CONFIG1_STRMST_MASK		(    1ULL << 12)
>  
>  #define IBS_FETCH_CONFIG1_FETCHLAT_MASK		(0x7FFULL <<  0)
>  
> @@ -292,6 +293,14 @@ static bool perf_ibs_fetch_lat_event(struct perf_ibs *perf_ibs,
>  	       (event->attr.config1 & IBS_FETCH_CONFIG1_FETCHLAT_MASK);
>  }
>  
> +static bool perf_ibs_strmst_event(struct perf_ibs *perf_ibs,
> +				  struct perf_event *event)
> +{
> +	return perf_ibs == &perf_ibs_op &&
> +	       (ibs_caps & IBS_CAPS_STRMST_RMTSOCKET) &&
> +	       (event->attr.config1 & IBS_OP_CONFIG1_STRMST_MASK);
> +}
> +
>  static int perf_ibs_init(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
> @@ -416,6 +425,15 @@ static int perf_ibs_init(struct perf_event *event)
>  		hwc->extra_reg.config |= fetchlat << 1;
>  	}
>  
> +	if (perf_ibs_strmst_event(perf_ibs, event)) {
> +		u64 strmst = event->attr.config1 & IBS_OP_CONFIG1_STRMST_MASK;
> +
> +		strmst >>= 12;

The right shift can be merged directly into the previous statement, and it
would be better to define macros instead of using these magic numbers.
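
A minimal userspace sketch of what this suggestion could look like. Only
IBS_OP_CONFIG1_STRMST_MASK exists in the patch; the two *_SHIFT macros and
the helper name are hypothetical, and the values 12 and 3 are simply the
literals the patch already uses.

```c
#include <stdint.h>

/* attr.config1: strmst lives at bit 12 (format string "config1:12"). */
#define IBS_OP_CONFIG1_STRMST_SHIFT	12	/* hypothetical name */
#define IBS_OP_CONFIG1_STRMST_MASK	(1ULL << IBS_OP_CONFIG1_STRMST_SHIFT)

/* Bit position the patch uses in extra_reg.config ("strmst << 3"). */
#define IBS_OP_2_STRMST_SHIFT		3	/* hypothetical name */

/*
 * Hypothetical helper: extract strmst from config1 and return it at the
 * position expected in extra_reg.config, with mask and shift merged into
 * a single statement.
 */
static inline uint64_t ibs_op_strmst_extra_config(uint64_t config1)
{
	uint64_t strmst = (config1 & IBS_OP_CONFIG1_STRMST_MASK) >>
			  IBS_OP_CONFIG1_STRMST_SHIFT;

	return strmst << IBS_OP_2_STRMST_SHIFT;
}
```

With such a helper, the hunk would reduce to OR-ing the returned value
into hwc->extra_reg.config.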


> +
> +		hwc->extra_reg.reg = perf_ibs->msr2;
> +		hwc->extra_reg.config |= strmst << 3;
> +	}
> +
>  	/*
>  	 * If we modify hwc->sample_period, we also need to update
>  	 * hwc->last_period and hwc->period_left.
> @@ -706,6 +724,8 @@ PMU_EVENT_ATTR_STRING(ldlat, ibs_op_ldlat_cap, "1");
>  PMU_EVENT_ATTR_STRING(dtlb_pgsize, ibs_op_dtlb_pgsize_cap, "1");
>  PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_format, "config1:0-10");
>  PMU_EVENT_ATTR_STRING(fetchlat, ibs_fetch_lat_cap, "1");
> +PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_format, "config1:12");
> +PMU_EVENT_ATTR_STRING(strmst, ibs_op_strmst_cap, "1");
>  
>  static umode_t
>  zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i)
> @@ -719,6 +739,12 @@ ibs_fetch_lat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
>  	return ibs_caps & IBS_CAPS_FETCHLAT ? attr->mode : 0;
>  }
>  
> +static umode_t
> +ibs_op_strmst_is_visible(struct kobject *kobj, struct attribute *attr, int i)
> +{
> +	return ibs_caps & IBS_CAPS_STRMST_RMTSOCKET ? attr->mode : 0;
> +}
> +
>  static umode_t
>  ibs_op_ldlat_is_visible(struct kobject *kobj, struct attribute *attr, int i)
>  {
> @@ -767,6 +793,11 @@ static struct attribute *ibs_op_dtlb_pgsize_cap_attrs[] = {
>  	NULL,
>  };
>  
> +static struct attribute *ibs_op_strmst_cap_attrs[] = {
> +	&ibs_op_strmst_cap.attr.attr,
> +	NULL,
> +};
> +
>  static struct attribute_group group_fetch_formats = {
>  	.name = "format",
>  	.attrs = fetch_attrs,
> @@ -808,6 +839,12 @@ static struct attribute_group group_ibs_op_dtlb_pgsize_cap = {
>  	.is_visible = ibs_op_dtlb_pgsize_is_visible,
>  };
>  
> +static struct attribute_group group_ibs_op_strmst_cap = {
> +	.name = "caps",
> +	.attrs = ibs_op_strmst_cap_attrs,
> +	.is_visible = ibs_op_strmst_is_visible,
> +};
> +
>  static const struct attribute_group *fetch_attr_groups[] = {
>  	&group_fetch_formats,
>  	&empty_caps_group,
> @@ -853,6 +890,11 @@ static struct attribute *ibs_op_ldlat_format_attrs[] = {
>  	NULL,
>  };
>  
> +static struct attribute *ibs_op_strmst_format_attrs[] = {
> +	&ibs_op_strmst_format.attr.attr,
> +	NULL,
> +};
> +
>  static struct attribute_group group_cnt_ctl = {
>  	.name = "format",
>  	.attrs = cnt_ctl_attrs,
> @@ -877,6 +919,12 @@ static struct attribute_group group_ibs_op_ldlat_format = {
>  	.is_visible = ibs_op_ldlat_is_visible,
>  };
>  
> +static struct attribute_group group_ibs_op_strmst_format = {
> +	.name = "format",
> +	.attrs = ibs_op_strmst_format_attrs,
> +	.is_visible = ibs_op_strmst_is_visible,
> +};
> +
>  static const struct attribute_group *op_attr_update[] = {
>  	&group_cnt_ctl,
>  	&group_op_l3missonly,
> @@ -884,6 +932,8 @@ static const struct attribute_group *op_attr_update[] = {
>  	&group_ibs_op_ldlat_cap,
>  	&group_ibs_op_ldlat_format,
>  	&group_ibs_op_dtlb_pgsize_cap,
> +	&group_ibs_op_strmst_cap,
> +	&group_ibs_op_strmst_format,
>  	NULL,
>  };
>  
> diff --git a/arch/x86/include/asm/amd/ibs.h b/arch/x86/include/asm/amd/ibs.h
> index 3ee5903982c2..b940156b7d23 100644
> --- a/arch/x86/include/asm/amd/ibs.h
> +++ b/arch/x86/include/asm/amd/ibs.h
> @@ -99,7 +99,8 @@ union ibs_op_data2 {
>  			rmt_node:1,	/* 4: destination node */
>  			cache_hit_st:1,	/* 5: cache hit state */
>  			data_src_hi:2,	/* 6-7: data source high */
> -			reserved1:56;	/* 8-63: reserved */
> +			strm_st:1,	/* 8: streaming store */
> +			reserved1:55;	/* 9-63: reserved */
>  	};
>  };
>  

* Re: [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples
  2026-01-19  7:31   ` Mi, Dapeng
@ 2026-01-19 12:56     ` Ravi Bangoria
  0 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-19 12:56 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, James Clark, x86, linux-perf-users,
	linux-kernel, Manali Shukla, Santosh Shukla, Ananth Narayan,
	Sandipan Das

Hi Dapeng,

>> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
>> index aca89f23d2e0..96bb0974057f 100644
>> --- a/arch/x86/events/amd/ibs.c
>> +++ b/arch/x86/events/amd/ibs.c
>> @@ -1293,8 +1293,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>>  		 * within [128, 2048] range.
>>  		 */
>>  		if (!op_data3.ld_op || !op_data3.dc_miss ||
>> -		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF))
>> +		    op_data3.dc_miss_lat <= (event->attr.config1 & 0xFFF)) {
>> +			throttle = perf_event_account_interrupt(event);
>>  			goto out;
>> +		}
>>  	}
> 
> I'm not very familiar with the IBS code, but should the code below also
> call the throttle accounting?
> 
>         /* Workaround for erratum #1197 */
>         if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
>             goto out;
> 

Yes, it should. I'll add a patch in the next version.
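
As a rough illustration of the idea (not the actual follow-up patch), the
userspace model below treats every early-exit path as an accounted
interrupt. All sim_* names are hypothetical; sim_account_interrupt() is
only a stand-in for perf_event_account_interrupt(), and the per-tick
limit is an assumed simplification of the real throttling logic.

```c
#include <stdbool.h>

/* Hypothetical model of an event with a per-tick interrupt budget. */
struct sim_event {
	unsigned int interrupts;	/* interrupts seen in this tick */
	unsigned int max_per_tick;	/* assumed throttle threshold */
};

enum sim_result { SIM_SAMPLE, SIM_DROPPED, SIM_THROTTLED };

/* Stand-in for perf_event_account_interrupt(): true means "throttle". */
static bool sim_account_interrupt(struct sim_event *ev)
{
	return ++ev->interrupts > ev->max_per_tick;
}

/*
 * Model of the NMI handler: both the ldlat filter and the zero-RIP
 * erratum path drop the sample, but still count the interrupt.
 */
static enum sim_result sim_handle_irq(struct sim_event *ev,
				      bool zero_rip_erratum,
				      bool ldlat_filtered)
{
	if (zero_rip_erratum || ldlat_filtered)
		return sim_account_interrupt(ev) ? SIM_THROTTLED : SIM_DROPPED;

	/* Delivered samples are accounted as part of overflow handling. */
	return sim_account_interrupt(ev) ? SIM_THROTTLED : SIM_SAMPLE;
}
```

In this model a burst of dropped samples consumes the same budget as
delivered ones, so a stream of filtered NMIs can no longer go unthrottled.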

Thanks,
Ravi

* Re: [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race
  2026-01-19  7:48   ` Mi, Dapeng
@ 2026-01-19 13:00     ` Ravi Bangoria
  0 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-19 13:00 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, James Clark, x86, linux-perf-users,
	linux-kernel, Manali Shukla, Santosh Shukla, Ananth Narayan,
	Sandipan Das

Hi Dapeng,

>>  static int x86_pmu_amd_ibs_starting_cpu(unsigned int cpu)
>>  {
>>  	setup_APIC_ibs();
>> +
>> +	if (ibs_caps & IBS_CAPS_DIS) {
>> +		/*
>> +		 * IBS enable sequence:
>> +		 *   CTL[En] = 1;
>> +		 *   CTL2[Dis] = 0;
>> +		 *
>> +		 * IBS disable sequence:
>> +		 *   CTL2[Dis] = 1;
>> +		 *
>> +		 * Set CTL2[Dis] when CPU comes up. This is needed to make
>> +		 * enable sequence effective.
>> +		 */
>> +		wrmsrq(MSR_AMD64_IBSFETCHCTL2, 1);
>> +		wrmsrq(MSR_AMD64_IBSOPCTL2, 1);
> 
> What does bit 0 of these two MSRs mean? Disable? It would be better to
> define a macro instead of using the magic number "1".

Right, those are disable bits. I'll replace those magic numbers with
IBS_FETCH_2_DIS and IBS_OP_2_DIS.
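
Roughly along these lines, shown here as a userspace simulation of the
sequences described in the comment. The two *_DIS names are the ones
mentioned above; everything prefixed sim_/SIM_ is a hypothetical stand-in
(including the enable bit value and the "running" rule), and the sketch
ignores any other CTL2 fields.

```c
#include <stdint.h>

#define IBS_FETCH_2_DIS		(1ULL << 0)
#define IBS_OP_2_DIS		(1ULL << 0)

/* Simulated registers; indices are placeholders, not real MSR numbers. */
enum sim_msr {
	SIM_IBSFETCHCTL2,
	SIM_IBSOPCTL,
	SIM_IBSOPCTL2,
	SIM_MSR_MAX,
};

#define SIM_IBS_OP_ENABLE	(1ULL << 17)	/* assumed CTL[En] position */

static uint64_t sim_msrs[SIM_MSR_MAX];

static void sim_wrmsrq(enum sim_msr msr, uint64_t val)
{
	sim_msrs[msr] = val;
}

/* CPU bring-up: set Dis so that a later enable sequence is effective. */
static void sim_ibs_starting_cpu(void)
{
	sim_wrmsrq(SIM_IBSFETCHCTL2, IBS_FETCH_2_DIS);
	sim_wrmsrq(SIM_IBSOPCTL2, IBS_OP_2_DIS);
}

/* Enable sequence: CTL[En] = 1, then CTL2[Dis] = 0. */
static void sim_ibs_op_enable(uint64_t config)
{
	sim_wrmsrq(SIM_IBSOPCTL, config | SIM_IBS_OP_ENABLE);
	sim_wrmsrq(SIM_IBSOPCTL2, 0);
}

/* Disable sequence: a single write, no read-modify-write of CTL. */
static void sim_ibs_op_disable(void)
{
	sim_wrmsrq(SIM_IBSOPCTL2, IBS_OP_2_DIS);
}

/* Assumed rule: sampling runs only while En is set and Dis is clear. */
static int sim_ibs_op_running(void)
{
	return (sim_msrs[SIM_IBSOPCTL] & SIM_IBS_OP_ENABLE) &&
	       !(sim_msrs[SIM_IBSOPCTL2] & IBS_OP_2_DIS);
}
```

Note that the disable path never touches CTL, so its contents stay intact
while sampling is stopped.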

Thanks,
Ravi

* Re: [PATCH 10/11] perf/amd/ibs: Enable streaming store filter
  2026-01-19  7:57   ` Mi, Dapeng
@ 2026-01-19 13:02     ` Ravi Bangoria
  0 siblings, 0 replies; 20+ messages in thread
From: Ravi Bangoria @ 2026-01-19 13:02 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, James Clark, x86, linux-perf-users,
	linux-kernel, Manali Shukla, Santosh Shukla, Ananth Narayan,
	Sandipan Das, Ravi Bangoria

Hi Dapeng,

>> @@ -416,6 +425,15 @@ static int perf_ibs_init(struct perf_event *event)
>>  		hwc->extra_reg.config |= fetchlat << 1;
>>  	}
>>  
>> +	if (perf_ibs_strmst_event(perf_ibs, event)) {
>> +		u64 strmst = event->attr.config1 & IBS_OP_CONFIG1_STRMST_MASK;
>> +
>> +		strmst >>= 12;
> 
> The right shift can be merged directly into the previous statement, and it
> would be better to define macros instead of using these magic numbers.

Ack.

Thanks for the review,
Ravi

Thread overview: 20+ messages
2026-01-16  3:34 [PATCH 00/11] perf/amd/ibs: Fixes + future enhancements Ravi Bangoria
2026-01-16  3:34 ` [PATCH 01/11] perf/amd/ibs: Throttle interrupts with filtered ldlat samples Ravi Bangoria
2026-01-19  7:31   ` Mi, Dapeng
2026-01-19 12:56     ` Ravi Bangoria
2026-01-16  3:34 ` [PATCH 02/11] perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5 Ravi Bangoria
2026-01-16  3:34 ` [PATCH 03/11] perf/amd/ibs: Preserve PhyAddrVal bit when clearing PhyAddr MSR Ravi Bangoria
2026-01-16  3:34 ` [PATCH 04/11] perf/amd/ibs: Avoid race between event add and NMI Ravi Bangoria
2026-01-16  3:34 ` [PATCH 05/11] perf/amd/ibs: Define macro for ldlat mask Ravi Bangoria
2026-01-19  7:38   ` Mi, Dapeng
2026-01-16  3:34 ` [PATCH 06/11] perf/amd/ibs: Add new MSRs and CPUID bits definitions Ravi Bangoria
2026-01-19  7:39   ` Mi, Dapeng
2026-01-16  3:34 ` [PATCH 07/11] perf/amd/ibs: Support IBS_{FETCH|OP}_CTL2[Dis] to eliminate RMW race Ravi Bangoria
2026-01-19  7:48   ` Mi, Dapeng
2026-01-19 13:00     ` Ravi Bangoria
2026-01-16  3:34 ` [PATCH 08/11] perf/amd/ibs: Enable fetch latency filtering Ravi Bangoria
2026-01-16  3:34 ` [PATCH 09/11] perf/amd/ibs: Enable RIP bit63 hardware filtering Ravi Bangoria
2026-01-16  3:34 ` [PATCH 10/11] perf/amd/ibs: Enable streaming store filter Ravi Bangoria
2026-01-19  7:57   ` Mi, Dapeng
2026-01-19 13:02     ` Ravi Bangoria
2026-01-16  3:34 ` [PATCH 11/11] perf/amd/ibs: Advertise remote socket capability Ravi Bangoria