public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test
@ 2025-09-15 14:49 Chao Gao
  2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
  2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
  0 siblings, 2 replies; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, seanjc, Chao Gao

As reported in [1], the eventinj test can cause a triple fault due to an
invalid RSP after IRET. Fix this by pushing a valid stack pointer to the
crafted IRET frame in do_iret(), ensuring RSP is restored to a valid
stack in 64-bit mode.

[1]: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/

Chao Gao (2):
  x86/eventinj: Use global asm label for nested NMI IP address
    verification
  x86/eventinj: Push SP to IRET frame

 x86/eventinj.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
  2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
@ 2025-09-15 14:49 ` Chao Gao
  2025-09-16 10:10   ` Mathias Krause
  2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
  1 sibling, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, seanjc, Chao Gao

Use a global asm label to get the expected IP address for nested NMI
interception instead of reading a hardcoded offset from the stack.

the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
the return address (IP register) in the IRET frame, as IRET opens the
NMI window. Currently, nested_nmi_iret_isr() reads the return address
using a magic offset (iret_stack[-3]), which is unclear and may break if
more values are pushed to the "iret_stack".

To improve readability, add a global 'ip_after_iret' label for the expected
return address, push it to the IRET frame, and verify it matches the
interrupted address in the nested NMI handler.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 x86/eventinj.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/x86/eventinj.c b/x86/eventinj.c
index 6fbb2d0f..ec8a5ef1 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
 }
 
 unsigned long *iret_stack;
+extern char ip_after_iret[];
 
 static void nested_nmi_iret_isr(struct ex_regs *r)
 {
 	printf("Nested NMI isr running rip=%lx\n", r->rip);
 
-	if (r->rip == iret_stack[-3])
+	if (r->rip == (unsigned long)ip_after_iret)
 		test_count++;
 }
 
@@ -156,11 +157,11 @@ asm("do_iret:"
 	"mov %cs, %ecx \n\t"
 	"push"W" %"R "cx \n\t"
 #ifndef __x86_64__
-	"push"W" $2f \n\t"
+	"push"W" $ip_after_iret \n\t"
 
 	"cmpb $0, no_test_device\n\t"	// see if need to flush
 #else
-	"leaq 2f(%rip), %rbx \n\t"
+	"leaq ip_after_iret(%rip), %rbx \n\t"
 	"pushq %rbx \n\t"
 
 	"mov no_test_device(%rip), %bl \n\t"
@@ -170,7 +171,9 @@ asm("do_iret:"
 	"outl %eax, $0xe4 \n\t"		// flush page
 	"1: \n\t"
 	"iret"W" \n\t"
-	"2: xchg %"R "dx, %"R "sp \n\t"	// point to old stack
+	".global ip_after_iret \n\t"
+	"ip_after_iret: \n\t"
+	"xchg %"R "dx, %"R "sp \n\t"	// point to old stack
 	"ret\n\t"
    );
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] x86/eventinj: Push SP to IRET frame
  2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
  2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
@ 2025-09-15 14:49 ` Chao Gao
  2025-09-16 10:21   ` Mathias Krause
  1 sibling, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, seanjc, Chao Gao

Push the current stack pointer (SP) to the IRET frame in do_iret() to
ensure a valid stack after IRET execution, particularly in 64-bit mode.

Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit
guests, SP is not popped during IRET due to no privilege change. so, the
stack after IRET is still valid. But for 64-bit guests, IRET
unconditionally pops RSP, restoring it to zero. This can cause a nested NMI
to push its interrupt frame to the topmost page (0xffffffff_fffff000),
which may be not mapped and cause triple fault [1].

To fix this issue, push the current SP to the IRET frame, ensuring RSP is
restored to a valid stack in 64-bit mode.

Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
---
 x86/eventinj.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/x86/eventinj.c b/x86/eventinj.c
index ec8a5ef1..63ebbaab 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -153,6 +153,7 @@ asm("do_iret:"
 	"mov 8(%esp), %edx \n\t"	// virt_stack
 #endif
 	"xchg %"R "dx, %"R "sp \n\t"	// point to new stack
+	"push"W" %"R "sp \n\t"
 	"pushf"W" \n\t"
 	"mov %cs, %ecx \n\t"
 	"push"W" %"R "cx \n\t"
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
  2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
@ 2025-09-16 10:10   ` Mathias Krause
  2025-10-15  1:47     ` Chao Gao
  0 siblings, 1 reply; 8+ messages in thread
From: Mathias Krause @ 2025-09-16 10:10 UTC (permalink / raw)
  To: Chao Gao, kvm; +Cc: pbonzini, seanjc

Am 15.09.25 um 16:49 schrieb Chao Gao:
> Use a global asm label to get the expected IP address for nested NMI
> interception instead of reading a hardcoded offset from the stack.
> 
> the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
> the return address (IP register) in the IRET frame, as IRET opens the
> NMI window. Currently, nested_nmi_iret_isr() reads the return address
> using a magic offset (iret_stack[-3]), which is unclear and may break if
> more values are pushed to the "iret_stack".
> 
> To improve readability, add a global 'ip_after_iret' label for the expected
> return address, push it to the IRET frame, and verify it matches the
> interrupted address in the nested NMI handler.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  x86/eventinj.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/x86/eventinj.c b/x86/eventinj.c
> index 6fbb2d0f..ec8a5ef1 100644
> --- a/x86/eventinj.c
> +++ b/x86/eventinj.c
> @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
>  }
>  
>  unsigned long *iret_stack;
> +extern char ip_after_iret[];
>  
>  static void nested_nmi_iret_isr(struct ex_regs *r)
>  {
>  	printf("Nested NMI isr running rip=%lx\n", r->rip);
>  
> -	if (r->rip == iret_stack[-3])
> +	if (r->rip == (unsigned long)ip_after_iret)

This change basically eliminates the need for the global
'ip_after_iret', it can be local to nmi_iret_isr() now.

>  		test_count++;
>  }
>  
> @@ -156,11 +157,11 @@ asm("do_iret:"
>  	"mov %cs, %ecx \n\t"
>  	"push"W" %"R "cx \n\t"
>  #ifndef __x86_64__
> -	"push"W" $2f \n\t"
> +	"push"W" $ip_after_iret \n\t"
>  
>  	"cmpb $0, no_test_device\n\t"	// see if need to flush
>  #else
> -	"leaq 2f(%rip), %rbx \n\t"
> +	"leaq ip_after_iret(%rip), %rbx \n\t"
>  	"pushq %rbx \n\t"
>  
>  	"mov no_test_device(%rip), %bl \n\t"
> @@ -170,7 +171,9 @@ asm("do_iret:"
>  	"outl %eax, $0xe4 \n\t"		// flush page
>  	"1: \n\t"
>  	"iret"W" \n\t"
> -	"2: xchg %"R "dx, %"R "sp \n\t"	// point to old stack
> +	".global ip_after_iret \n\t"
> +	"ip_after_iret: \n\t"
> +	"xchg %"R "dx, %"R "sp \n\t"	// point to old stack
>  	"ret\n\t"
>     );
>  


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame
  2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
@ 2025-09-16 10:21   ` Mathias Krause
  2025-10-15  1:49     ` Chao Gao
  0 siblings, 1 reply; 8+ messages in thread
From: Mathias Krause @ 2025-09-16 10:21 UTC (permalink / raw)
  To: Chao Gao, kvm; +Cc: pbonzini, seanjc

Am 15.09.25 um 16:49 schrieb Chao Gao:
> Push the current stack pointer (SP) to the IRET frame in do_iret() to
> ensure a valid stack after IRET execution, particularly in 64-bit mode.
> 
> Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit
> guests, SP is not popped during IRET due to no privilege change. so, the
> stack after IRET is still valid. But for 64-bit guests, IRET
> unconditionally pops RSP, restoring it to zero. This can cause a nested NMI
> to push its interrupt frame to the topmost page (0xffffffff_fffff000),
> which may be not mapped and cause triple fault [1].

Nice catch!

> 
> To fix this issue, push the current SP to the IRET frame, ensuring RSP is
> restored to a valid stack in 64-bit mode.
> 
> Reported-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
> ---
>  x86/eventinj.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/x86/eventinj.c b/x86/eventinj.c
> index ec8a5ef1..63ebbaab 100644
> --- a/x86/eventinj.c
> +++ b/x86/eventinj.c
> @@ -153,6 +153,7 @@ asm("do_iret:"
>  	"mov 8(%esp), %edx \n\t"	// virt_stack
>  #endif
>  	"xchg %"R "dx, %"R "sp \n\t"	// point to new stack
> +	"push"W" %"R "sp \n\t"

We should also push SS, for consistency reasons. Not that it would
matter much for x64-64, but a NULL selector for SS still feels wrong.

>  	"pushf"W" \n\t"
>  	"mov %cs, %ecx \n\t"
>  	"push"W" %"R "cx \n\t"

The leading comment also needs an update and the 'extern bool
no_test_device' can be dropped as well, as fwcfg.h gets included by
eventinj.c.

So, maybe something like this on top?:

diff --git a/x86/eventinj.c b/x86/eventinj.c
index 63ebbaabda66..4e21f3915820 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -139,11 +139,8 @@ static void nested_nmi_iret_isr(struct ex_regs *r)

 extern void do_iret(ulong phys_stack, void *virt_stack);

-// Return to same privilege level won't pop SS or SP, so
+// Return to same privilege level won't pop SS or SP for i386 but x86-64, so
 // save it in RDX while we run on the nested stack
-
-extern bool no_test_device;
-
 asm("do_iret:"
 #ifdef __x86_64__
        "mov %rdi, %rax \n\t"           // phys_stack
@@ -153,7 +150,12 @@ asm("do_iret:"
        "mov 8(%esp), %edx \n\t"        // virt_stack
 #endif
        "xchg %"R "dx, %"R "sp \n\t"    // point to new stack
+#ifdef __x86_64__
+       // IRET in 64 bit mode unconditionally pops SS:xSP
+       "mov %ss, %ecx \n\t"
+       "push"W" %"R "cx \n\t"
        "push"W" %"R "sp \n\t"
+#endif
        "pushf"W" \n\t"
        "mov %cs, %ecx \n\t"
        "push"W" %"R "cx \n\t"

Thanks,
Mathias

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
  2025-09-16 10:10   ` Mathias Krause
@ 2025-10-15  1:47     ` Chao Gao
  2025-10-15  4:29       ` Mathias Krause
  0 siblings, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-10-15  1:47 UTC (permalink / raw)
  To: Mathias Krause; +Cc: kvm, pbonzini, seanjc

On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote:
>Am 15.09.25 um 16:49 schrieb Chao Gao:
>> Use a global asm label to get the expected IP address for nested NMI
>> interception instead of reading a hardcoded offset from the stack.
>> 
>> the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
>> the return address (IP register) in the IRET frame, as IRET opens the
>> NMI window. Currently, nested_nmi_iret_isr() reads the return address
>> using a magic offset (iret_stack[-3]), which is unclear and may break if
>> more values are pushed to the "iret_stack".
>> 
>> To improve readability, add a global 'ip_after_iret' label for the expected
>> return address, push it to the IRET frame, and verify it matches the
>> interrupted address in the nested NMI handler.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  x86/eventinj.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>> 
>> diff --git a/x86/eventinj.c b/x86/eventinj.c
>> index 6fbb2d0f..ec8a5ef1 100644
>> --- a/x86/eventinj.c
>> +++ b/x86/eventinj.c
>> @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
>>  }
>>  
>>  unsigned long *iret_stack;
>> +extern char ip_after_iret[];
>>  
>>  static void nested_nmi_iret_isr(struct ex_regs *r)
>>  {
>>  	printf("Nested NMI isr running rip=%lx\n", r->rip);
>>  
>> -	if (r->rip == iret_stack[-3])
>> +	if (r->rip == (unsigned long)ip_after_iret)
>
>This change basically eliminates the need for the global
>'ip_after_iret', it can be local to nmi_iret_isr() now.

You meant 'iret_stack', right? if so, sure, I will make it local to
nmi_iret_isr().

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame
  2025-09-16 10:21   ` Mathias Krause
@ 2025-10-15  1:49     ` Chao Gao
  0 siblings, 0 replies; 8+ messages in thread
From: Chao Gao @ 2025-10-15  1:49 UTC (permalink / raw)
  To: Mathias Krause; +Cc: kvm, pbonzini, seanjc

>> ---
>>  x86/eventinj.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/x86/eventinj.c b/x86/eventinj.c
>> index ec8a5ef1..63ebbaab 100644
>> --- a/x86/eventinj.c
>> +++ b/x86/eventinj.c
>> @@ -153,6 +153,7 @@ asm("do_iret:"
>>  	"mov 8(%esp), %edx \n\t"	// virt_stack
>>  #endif
>>  	"xchg %"R "dx, %"R "sp \n\t"	// point to new stack
>> +	"push"W" %"R "sp \n\t"
>
>We should also push SS, for consistency reasons. Not that it would
>matter much for x64-64, but a NULL selector for SS still feels wrong.
>
>>  	"pushf"W" \n\t"
>>  	"mov %cs, %ecx \n\t"
>>  	"push"W" %"R "cx \n\t"
>
>The leading comment also needs an update and the 'extern bool
>no_test_device' can be dropped as well, as fwcfg.h gets included by
>eventinj.c.
>
>So, maybe something like this on top?:

Looks good to me. Thanks for your suggestion.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
  2025-10-15  1:47     ` Chao Gao
@ 2025-10-15  4:29       ` Mathias Krause
  0 siblings, 0 replies; 8+ messages in thread
From: Mathias Krause @ 2025-10-15  4:29 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, pbonzini, seanjc

On 10/15/25 03:47, Chao Gao wrote:
> On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote:
>> Am 15.09.25 um 16:49 schrieb Chao Gao:
>>> [...]
>>
>> This change basically eliminates the need for the global
>> 'ip_after_iret', it can be local to nmi_iret_isr() now.
> 
> You meant 'iret_stack', right? if so, sure, I will make it local to
> nmi_iret_isr().

Bleh, sure. Sorry for the confusion.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-10-15  4:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
2025-09-16 10:10   ` Mathias Krause
2025-10-15  1:47     ` Chao Gao
2025-10-15  4:29       ` Mathias Krause
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
2025-09-16 10:21   ` Mathias Krause
2025-10-15  1:49     ` Chao Gao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox