Kernel KVM virtualization development
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
@ 2026-02-27 21:38 Jim Mattson
  2026-02-28  0:55 ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Mattson @ 2026-02-27 21:38 UTC (permalink / raw)
  To: kvm, Paolo Bonzini, Sean Christopherson, Yosry Ahmed; +Cc: Jim Mattson

Linux commit 400816f60c54 ("perf/x86/intel: Implement support for TSX Force
Abort") introduced a feature that temporarily disables RTM on Skylake and
similar CPUs (06_55H stepping <= 5, 06_4EH, 06_5EH, 06_8EH stepping <= 0BH,
and 06_9EH stepping <= 0CH) via the TSX_FORCE_ABORT MSR, so that all four
general purpose PMCs can be used by perf. This feature is on by default,
but can be disabled by writing 0 to /sys/devices/cpu/allow_tsx_force_abort.

When TSX_FORCE_ABORT.RTM_FORCE_ABORT[bit 0] is set, all RTM transactions
will immediately abort, before the xbegin instruction retires.

The test of a single-step #DB delivered in a transactional region,
introduced in commit 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept
in L1"), does not handle this scenario.

Modify the test to identify an immediate RTM transaction abort and to try
up to 30 times before giving up. If the xbegin instruction never retires,
report the test as skipped.

Note that when an RTM transaction aborts, the CPU state is rolled back to
before the xbegin instruction, but the RIP is modified to point to the
fallback code address. Hence, if the transaction aborts before the
single-step #DB trap is delivered, the first instruction of the fallback
code will retire before the single-step #DB trap is delivered.

Fixes: 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept in L1")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 lib/x86/msr.h   |  1 +
 x86/vmx_tests.c | 56 ++++++++++++++++++++++++++++++++-----------------
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 7397809c07cd..97f52bb5bb4e 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -109,6 +109,7 @@
 #define DEBUGCTLMSR_BTS_OFF_OS		(1UL <<  9)
 #define DEBUGCTLMSR_BTS_OFF_USR		(1UL << 10)
 #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1UL << 11)
+#define DEBUGCTLMSR_RTM_DEBUG		(1UL << 15)
 
 #define MSR_LBR_NHM_FROM	0x00000680
 #define MSR_LBR_NHM_TO		0x000006c0
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 5ffb80a3d866..2094a0d3ec57 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -9217,9 +9217,10 @@ static void vmx_db_test_guest(void)
 	 * For a hardware generated single-step #DB in a transactional region.
 	 */
 	asm volatile("vmcall;"
-		     ".Lxbegin: xbegin .Lskip_rtm;"
+		     ".Lrtm_begin: xbegin .Lrtm_fallback;"
 		     "xend;"
-		     ".Lskip_rtm:");
+		     ".Lrtm_fallback: nop;"
+		     ".Lpost_rtm:");
 }
 
 /*
@@ -9295,6 +9296,10 @@ static void single_step_guest(const char *test_name, u64 starting_dr6,
  * exception bits are properly accumulated into the exit qualification
  * field.
  */
+
+#define RTM_RETRIES 30
+#define ONE_BILLION 1000000000ul
+
 static void vmx_db_test(void)
 {
 	/*
@@ -9308,8 +9313,8 @@ static void vmx_db_test(void)
 	extern char post_movss_nop asm(".Lpost_movss_nop");
 	extern char post_wbinvd asm(".Lpost_wbinvd");
 	extern char post_movss_wbinvd asm(".Lpost_movss_wbinvd");
-	extern char xbegin asm(".Lxbegin");
-	extern char skip_rtm asm(".Lskip_rtm");
+	extern char rtm_begin asm(".Lrtm_begin");
+	extern char post_rtm asm(".Lpost_rtm");
 
 	/*
 	 * L1 wants to intercept #DB exceptions encountered in L2.
@@ -9362,30 +9367,43 @@ static void vmx_db_test(void)
 		      starting_dr6);
 
 	/*
-	 * Optional RTM test for hardware that supports RTM, to
-	 * demonstrate that the current volume 3 of the SDM
-	 * (325384-067US), table 27-1 is incorrect. Bit 16 of the exit
-	 * qualification for debug exceptions is not reserved. It is
-	 * set to 1 if a debug exception (#DB) or a breakpoint
-	 * exception (#BP) occurs inside an RTM region while advanced
-	 * debugging of RTM transactional regions is enabled.
+	 * Optional RTM test for hardware that supports RTM, to verify that
+	 * bit 16 of the exit qualification for debug exceptions is set to
+	 * 1 if a #DB occurs inside an RTM region while advanced debugging
+	 * of RTM transactional regions is enabled.
 	 */
 	if (this_cpu_has(X86_FEATURE_RTM)) {
+		int i = RTM_RETRIES;
+
 		vmcs_write(ENT_CONTROLS,
 			   vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS);
 		/*
-		 * Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15]
-		 * in the guest to enable advanced debugging of RTM
-		 * transactional regions.
+		 * Set DR7.RTM and IA32_DEBUGCTL.RTM to enable advanced
+		 * debugging of RTM transactional regions. See "RTM-Enabled
+		 * Debugger Support" in the SDM, volume 1.
 		 */
-		vmcs_write(GUEST_DR7, BIT(11));
-		vmcs_write(GUEST_DEBUGCTL, BIT(15));
+		vmcs_write(GUEST_DR7, DR7_RTM);
+		vmcs_write(GUEST_DEBUGCTL, DEBUGCTLMSR_RTM_DEBUG);
+
 		single_step_guest("Hardware delivered single-step in "
 				  "transactional region", starting_dr6, 0);
-		check_db_exit(false, false, false, &xbegin, BIT(16),
-			      starting_dr6);
+
+		while (--i && vmcs_read(GUEST_RIP) == (u64)&post_rtm) {
+			delay(ONE_BILLION);
+			vmcs_write(GUEST_RIP, (u64)&rtm_begin);
+			enter_guest();
+		}
+
+		if (vmcs_read(GUEST_RIP) == (u64)&post_rtm) {
+			report_skip("Transaction always aborted before xbegin "
+				    "retired (%d attempts)", RTM_RETRIES);
+			dismiss_db();
+		} else {
+			check_db_exit(false, false, false, &rtm_begin, DR6_RTM,
+				      starting_dr6);
+		}
 	} else {
-		vmcs_write(GUEST_RIP, (u64)&skip_rtm);
+		vmcs_write(GUEST_RIP, (u64)&post_rtm);
 		enter_guest();
 	}
 }

base-commit: 86e53277ac80dabb04f4fa5fa6a6cc7649392bdc
prerequisite-patch-id: 177e49d1b63609e5b421ea64fe7490a4906617a9
prerequisite-patch-id: 7e71aa35841be5d72bf543e4f332554d12e83cc0
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
  2026-02-27 21:38 [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest Jim Mattson
@ 2026-02-28  0:55 ` Yosry Ahmed
  2026-03-03  0:08   ` Jim Mattson
  0 siblings, 1 reply; 6+ messages in thread
From: Yosry Ahmed @ 2026-02-28  0:55 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini, Sean Christopherson

On Fri, Feb 27, 2026 at 1:39 PM Jim Mattson <jmattson@google.com> wrote:
>
> Linux commit 400816f60c54 ("perf/x86/intel: Implement support for TSX Force
> Abort") introduced a feature that temporarily disables RTM on Skylake and
> similar CPUs (06_55H stepping <= 5, 06_4EH, 06_5EH, 06_8EH stepping <= 0BH,
> and 06_9EH stepping <= 0CH) via the TSX_FORCE_ABORT MSR, so that all four
> general purpose PMCs can be used by perf. This feature is on by default,
> but can be disabled by writing 0 to /sys/devices/cpu/allow_tsx_force_abort.
>
> When TSX_FORCE_ABORT.RTM_FORCE_ABORT[bit 0] is set, all RTM transactions
> will immediately abort, before the xbegin instruction retires.
>
> The test of a single-step #DB delivered in a transactional region,
> introduced in commit 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept
> in L1"), does not handle this scenario.
>
> Modify the test to identify an immediate RTM transaction abort and to try
> up to 30 times before giving up. If the xbegin instruction never retires,
> report the test as skipped.
>
> Note that when an RTM transaction aborts, the CPU state is rolled back to
> before the xbegin instruction, but the RIP is modified to point to the
> fallback code address. Hence, if the transaction aborts before the
> single-step #DB trap is delivered, the first instruction of the fallback
> code will retire before the single-step #DB trap is delivered.
>
> Fixes: 414bd9d5ebd7 ("x86: nVMX: Basic test of #DB intercept in L1")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  lib/x86/msr.h   |  1 +
>  x86/vmx_tests.c | 56 ++++++++++++++++++++++++++++++++-----------------
>  2 files changed, 38 insertions(+), 19 deletions(-)
>
> diff --git a/lib/x86/msr.h b/lib/x86/msr.h
> index 7397809c07cd..97f52bb5bb4e 100644
> --- a/lib/x86/msr.h
> +++ b/lib/x86/msr.h
> @@ -109,6 +109,7 @@
>  #define DEBUGCTLMSR_BTS_OFF_OS         (1UL <<  9)
>  #define DEBUGCTLMSR_BTS_OFF_USR                (1UL << 10)
>  #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
> +#define DEBUGCTLMSR_RTM_DEBUG          (1UL << 15)
>
>  #define MSR_LBR_NHM_FROM       0x00000680
>  #define MSR_LBR_NHM_TO         0x000006c0
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 5ffb80a3d866..2094a0d3ec57 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -9217,9 +9217,10 @@ static void vmx_db_test_guest(void)
>          * For a hardware generated single-step #DB in a transactional region.
>          */
>         asm volatile("vmcall;"
> -                    ".Lxbegin: xbegin .Lskip_rtm;"
> +                    ".Lrtm_begin: xbegin .Lrtm_fallback;"
>                      "xend;"
> -                    ".Lskip_rtm:");
> +                    ".Lrtm_fallback: nop;"
> +                    ".Lpost_rtm:");
>  }
>
>  /*
> @@ -9295,6 +9296,10 @@ static void single_step_guest(const char *test_name, u64 starting_dr6,
>   * exception bits are properly accumulated into the exit qualification
>   * field.
>   */
> +
> +#define RTM_RETRIES 30
> +#define ONE_BILLION 1000000000ul

I think the name would be more descriptive as RTM_DELAY_CYCLES or sth,
IIUC this will be in the order of 100s of milliseconds. Do we need to
wait that long between retries? If the CPU is in a state where it will
always abort RTM, 30 retries will end up taking seconds or 10s of
seconds, right?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
  2026-02-28  0:55 ` Yosry Ahmed
@ 2026-03-03  0:08   ` Jim Mattson
  2026-03-03  0:09     ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Mattson @ 2026-03-03  0:08 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm, Paolo Bonzini, Sean Christopherson

On Fri, Feb 27, 2026 at 4:55 PM Yosry Ahmed <yosry@kernel.org> wrote:
>
> On Fri, Feb 27, 2026 at 1:39 PM Jim Mattson <jmattson@google.com> wrote:
> > +#define RTM_RETRIES 30
> > +#define ONE_BILLION 1000000000ul
>
> I think the name would be more descriptive as RTM_DELAY_CYCLES or sth,
RTM_RETRY_DELAY?

> IIUC this will be in the order of 100s of milliseconds. Do we need to
> wait that long between retries? If the CPU is in a state where it will
> always abort RTM, 30 retries will end up taking seconds or 10s of
> seconds, right?

I tried reducing the delay by a factor of 10. At 200 retries, I still
see a 2% skip rate on a Skylake Xeon E5 @ 2GHz. I'd like to get the
skip rate under 1%. But, maybe others don't care as much?

Yes, 30 billion cycles is going to be on the order of 10 seconds.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
  2026-03-03  0:08   ` Jim Mattson
@ 2026-03-03  0:09     ` Yosry Ahmed
  2026-03-03  0:26       ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:09 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini, Sean Christopherson

On Mon, Mar 2, 2026 at 4:08 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Feb 27, 2026 at 4:55 PM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > On Fri, Feb 27, 2026 at 1:39 PM Jim Mattson <jmattson@google.com> wrote:
> > > +#define RTM_RETRIES 30
> > > +#define ONE_BILLION 1000000000ul
> >
> > I think the name would be more descriptive as RTM_DELAY_CYCLES or sth,
> RTM_RETRY_DELAY?

SG.

>
> > IIUC this will be in the order of 100s of milliseconds. Do we need to
> > wait that long between retries? If the CPU is in a state where it will
> > always abort RTM, 30 retries will end up taking seconds or 10s of
> > seconds, right?
>
> I tried reducing the delay by a factor of 10. At 200 retries, I still
> see a 2% skip rate on a Skylake Xeon E5 @ 2GHz. I'd like to get the
> skip rate under 1%. But, maybe others don't care as much?
>
> Yes, 30 billion cycles is going to be on the order of 10 seconds.

I personally care more about the test time than the fact that it won't
test RTM 2% of the time, but my opinion doesn't really matter :P

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
  2026-03-03  0:09     ` Yosry Ahmed
@ 2026-03-03  0:26       ` Sean Christopherson
  2026-03-03  0:38         ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-03-03  0:26 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Jim Mattson, kvm, Paolo Bonzini

On Mon, Mar 02, 2026, Yosry Ahmed wrote:
> On Mon, Mar 2, 2026 at 4:08 PM Jim Mattson <jmattson@google.com> wrote:
> > > IIUC this will be in the order of 100s of milliseconds. Do we need to
> > > wait that long between retries? If the CPU is in a state where it will
> > > always abort RTM, 30 retries will end up taking seconds or 10s of
> > > seconds, right?
> >
> > I tried reducing the delay by a factor of 10. At 200 retries, I still
> > see a 2% skip rate on a Skylake Xeon E5 @ 2GHz. I'd like to get the
> > skip rate under 1%. But, maybe others don't care as much?
> >
> > Yes, 30 billion cycles is going to be on the order of 10 seconds.
> 
> I personally care more about the test time than the fact that it won't
> test RTM 2% of the time, but my opinion doesn't really matter :P

I generally care more about runtime too, but isn't 10 seconds only the worst
case scenario, and only on these fubar CPUs?  E.g. if there's no perf activity
in the host, or the CPU isn't one of these oddballs, isn't XBEGIN going to succeed
~100% of the time?

If this were a choice between "eat N seconds every time" and "skip the test",
I'd be a-ok with a skip rate of 50% if it meant reducing N.  But, assuming this
requires perf activity and a Skylake-era CPU, odds are good this will only be hit
in CI environments, at which point adding ~10 seconds to the worst case scenario
isn't a bad tradeoff (so long as it doesn't push the total runtime close to the
timeout).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest
  2026-03-03  0:26       ` Sean Christopherson
@ 2026-03-03  0:38         ` Yosry Ahmed
  0 siblings, 0 replies; 6+ messages in thread
From: Yosry Ahmed @ 2026-03-03  0:38 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Jim Mattson, kvm, Paolo Bonzini

On Mon, Mar 2, 2026 at 4:26 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 02, 2026, Yosry Ahmed wrote:
> > On Mon, Mar 2, 2026 at 4:08 PM Jim Mattson <jmattson@google.com> wrote:
> > > > IIUC this will be in the order of 100s of milliseconds. Do we need to
> > > > wait that long between retries? If the CPU is in a state where it will
> > > > always abort RTM, 30 retries will end up taking seconds or 10s of
> > > > seconds, right?
> > >
> > > I tried reducing the delay by a factor of 10. At 200 retries, I still
> > > see a 2% skip rate on a Skylake Xeon E5 @ 2GHz. I'd like to get the
> > > skip rate under 1%. But, maybe others don't care as much?
> > >
> > > Yes, 30 billion cycles is going to be on the order of 10 seconds.
> >
> > I personally care more about the test time than the fact that it won't
> > test RTM 2% of the time, but my opinion doesn't really matter :P
>
> I generally care more about runtime too, but isn't 10 seconds only the worst
> case scenario, and only on these fubar CPUs?  E.g. if there's no perf activity
> in the host, or the CPU isn't one of these oddballs, isn't XBEGIN going to succeed
> ~100% of the time?
>
> If this were a choice between "eat N seconds every time" and "skip the test",
> I'd be a-ok with a skip rate of 50% if it meant reducing N.  But, assuming this
> requires perf activity and a Skylake-era CPU, odds are good this will only be hit
> in CI environments, at which point adding ~10 seconds to the worst case scenario
> isn't a bad tradeoff (so long as it doesn't push the total runtime close to the
> timeout).

Good point. We can always revisit if we get really unlucky :)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-03  0:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-27 21:38 [kvm-unit-tests PATCH] x86: nVMX: Add retry loop to advanced RTM debugging subtest Jim Mattson
2026-02-28  0:55 ` Yosry Ahmed
2026-03-03  0:08   ` Jim Mattson
2026-03-03  0:09     ` Yosry Ahmed
2026-03-03  0:26       ` Sean Christopherson
2026-03-03  0:38         ` Yosry Ahmed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox