* [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test
@ 2025-09-15 14:49 Chao Gao
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
0 siblings, 2 replies; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, seanjc, Chao Gao
As reported in [1], the eventinj test can cause a triple fault due to an
invalid RSP after IRET. Fix this by pushing a valid stack pointer to the
crafted IRET frame in do_iret(), ensuring RSP is restored to a valid
stack in 64-bit mode.
[1]: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
Chao Gao (2):
x86/eventinj: Use global asm label for nested NMI IP address
verification
x86/eventinj: Push SP to IRET frame
x86/eventinj.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
@ 2025-09-15 14:49 ` Chao Gao
2025-09-16 10:10 ` Mathias Krause
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
1 sibling, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, seanjc, Chao Gao
Use a global asm label to get the expected IP address for nested NMI
interception instead of reading a hardcoded offset from the stack.
the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
the return address (IP register) in the IRET frame, as IRET opens the
NMI window. Currently, nested_nmi_iret_isr() reads the return address
using a magic offset (iret_stack[-3]), which is unclear and may break if
more values are pushed to the "iret_stack".
To improve readability, add a global 'ip_after_iret' label for the expected
return address, push it to the IRET frame, and verify it matches the
interrupted address in the nested NMI handler.
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
x86/eventinj.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/x86/eventinj.c b/x86/eventinj.c
index 6fbb2d0f..ec8a5ef1 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
}
unsigned long *iret_stack;
+extern char ip_after_iret[];
static void nested_nmi_iret_isr(struct ex_regs *r)
{
printf("Nested NMI isr running rip=%lx\n", r->rip);
- if (r->rip == iret_stack[-3])
+ if (r->rip == (unsigned long)ip_after_iret)
test_count++;
}
@@ -156,11 +157,11 @@ asm("do_iret:"
"mov %cs, %ecx \n\t"
"push"W" %"R "cx \n\t"
#ifndef __x86_64__
- "push"W" $2f \n\t"
+ "push"W" $ip_after_iret \n\t"
"cmpb $0, no_test_device\n\t" // see if need to flush
#else
- "leaq 2f(%rip), %rbx \n\t"
+ "leaq ip_after_iret(%rip), %rbx \n\t"
"pushq %rbx \n\t"
"mov no_test_device(%rip), %bl \n\t"
@@ -170,7 +171,9 @@ asm("do_iret:"
"outl %eax, $0xe4 \n\t" // flush page
"1: \n\t"
"iret"W" \n\t"
- "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack
+ ".global ip_after_iret \n\t"
+ "ip_after_iret: \n\t"
+ "xchg %"R "dx, %"R "sp \n\t" // point to old stack
"ret\n\t"
);
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] x86/eventinj: Push SP to IRET frame
2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
@ 2025-09-15 14:49 ` Chao Gao
2025-09-16 10:21 ` Mathias Krause
1 sibling, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, seanjc, Chao Gao
Push the current stack pointer (SP) to the IRET frame in do_iret() to
ensure a valid stack after IRET execution, particularly in 64-bit mode.
Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit
guests, SP is not popped during IRET due to no privilege change. so, the
stack after IRET is still valid. But for 64-bit guests, IRET
unconditionally pops RSP, restoring it to zero. This can cause a nested NMI
to push its interrupt frame to the topmost page (0xffffffff_fffff000),
which may be not mapped and cause triple fault [1].
To fix this issue, push the current SP to the IRET frame, ensuring RSP is
restored to a valid stack in 64-bit mode.
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
---
x86/eventinj.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/x86/eventinj.c b/x86/eventinj.c
index ec8a5ef1..63ebbaab 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -153,6 +153,7 @@ asm("do_iret:"
"mov 8(%esp), %edx \n\t" // virt_stack
#endif
"xchg %"R "dx, %"R "sp \n\t" // point to new stack
+ "push"W" %"R "sp \n\t"
"pushf"W" \n\t"
"mov %cs, %ecx \n\t"
"push"W" %"R "cx \n\t"
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
@ 2025-09-16 10:10 ` Mathias Krause
2025-10-15 1:47 ` Chao Gao
0 siblings, 1 reply; 8+ messages in thread
From: Mathias Krause @ 2025-09-16 10:10 UTC (permalink / raw)
To: Chao Gao, kvm; +Cc: pbonzini, seanjc
Am 15.09.25 um 16:49 schrieb Chao Gao:
> Use a global asm label to get the expected IP address for nested NMI
> interception instead of reading a hardcoded offset from the stack.
>
> the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
> the return address (IP register) in the IRET frame, as IRET opens the
> NMI window. Currently, nested_nmi_iret_isr() reads the return address
> using a magic offset (iret_stack[-3]), which is unclear and may break if
> more values are pushed to the "iret_stack".
>
> To improve readability, add a global 'ip_after_iret' label for the expected
> return address, push it to the IRET frame, and verify it matches the
> interrupted address in the nested NMI handler.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> x86/eventinj.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/x86/eventinj.c b/x86/eventinj.c
> index 6fbb2d0f..ec8a5ef1 100644
> --- a/x86/eventinj.c
> +++ b/x86/eventinj.c
> @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
> }
>
> unsigned long *iret_stack;
> +extern char ip_after_iret[];
>
> static void nested_nmi_iret_isr(struct ex_regs *r)
> {
> printf("Nested NMI isr running rip=%lx\n", r->rip);
>
> - if (r->rip == iret_stack[-3])
> + if (r->rip == (unsigned long)ip_after_iret)
This change basically eliminates the need for the global
'ip_after_iret', it can be local to nmi_iret_isr() now.
> test_count++;
> }
>
> @@ -156,11 +157,11 @@ asm("do_iret:"
> "mov %cs, %ecx \n\t"
> "push"W" %"R "cx \n\t"
> #ifndef __x86_64__
> - "push"W" $2f \n\t"
> + "push"W" $ip_after_iret \n\t"
>
> "cmpb $0, no_test_device\n\t" // see if need to flush
> #else
> - "leaq 2f(%rip), %rbx \n\t"
> + "leaq ip_after_iret(%rip), %rbx \n\t"
> "pushq %rbx \n\t"
>
> "mov no_test_device(%rip), %bl \n\t"
> @@ -170,7 +171,9 @@ asm("do_iret:"
> "outl %eax, $0xe4 \n\t" // flush page
> "1: \n\t"
> "iret"W" \n\t"
> - "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack
> + ".global ip_after_iret \n\t"
> + "ip_after_iret: \n\t"
> + "xchg %"R "dx, %"R "sp \n\t" // point to old stack
> "ret\n\t"
> );
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
@ 2025-09-16 10:21 ` Mathias Krause
2025-10-15 1:49 ` Chao Gao
0 siblings, 1 reply; 8+ messages in thread
From: Mathias Krause @ 2025-09-16 10:21 UTC (permalink / raw)
To: Chao Gao, kvm; +Cc: pbonzini, seanjc
Am 15.09.25 um 16:49 schrieb Chao Gao:
> Push the current stack pointer (SP) to the IRET frame in do_iret() to
> ensure a valid stack after IRET execution, particularly in 64-bit mode.
>
> Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit
> guests, SP is not popped during IRET due to no privilege change. so, the
> stack after IRET is still valid. But for 64-bit guests, IRET
> unconditionally pops RSP, restoring it to zero. This can cause a nested NMI
> to push its interrupt frame to the topmost page (0xffffffff_fffff000),
> which may be not mapped and cause triple fault [1].
Nice catch!
>
> To fix this issue, push the current SP to the IRET frame, ensuring RSP is
> restored to a valid stack in 64-bit mode.
>
> Reported-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
> ---
> x86/eventinj.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/x86/eventinj.c b/x86/eventinj.c
> index ec8a5ef1..63ebbaab 100644
> --- a/x86/eventinj.c
> +++ b/x86/eventinj.c
> @@ -153,6 +153,7 @@ asm("do_iret:"
> "mov 8(%esp), %edx \n\t" // virt_stack
> #endif
> "xchg %"R "dx, %"R "sp \n\t" // point to new stack
> + "push"W" %"R "sp \n\t"
We should also push SS, for consistency reasons. Not that it would
matter much for x64-64, but a NULL selector for SS still feels wrong.
> "pushf"W" \n\t"
> "mov %cs, %ecx \n\t"
> "push"W" %"R "cx \n\t"
The leading comment also needs an update and the 'extern bool
no_test_device' can be dropped as well, as fwcfg.h gets included by
eventinj.c.
So, maybe something like this on top?:
diff --git a/x86/eventinj.c b/x86/eventinj.c
index 63ebbaabda66..4e21f3915820 100644
--- a/x86/eventinj.c
+++ b/x86/eventinj.c
@@ -139,11 +139,8 @@ static void nested_nmi_iret_isr(struct ex_regs *r)
extern void do_iret(ulong phys_stack, void *virt_stack);
-// Return to same privilege level won't pop SS or SP, so
+// Return to same privilege level won't pop SS or SP for i386 but x86-64, so
// save it in RDX while we run on the nested stack
-
-extern bool no_test_device;
-
asm("do_iret:"
#ifdef __x86_64__
"mov %rdi, %rax \n\t" // phys_stack
@@ -153,7 +150,12 @@ asm("do_iret:"
"mov 8(%esp), %edx \n\t" // virt_stack
#endif
"xchg %"R "dx, %"R "sp \n\t" // point to new stack
+#ifdef __x86_64__
+ // IRET in 64 bit mode unconditionally pops SS:xSP
+ "mov %ss, %ecx \n\t"
+ "push"W" %"R "cx \n\t"
"push"W" %"R "sp \n\t"
+#endif
"pushf"W" \n\t"
"mov %cs, %ecx \n\t"
"push"W" %"R "cx \n\t"
Thanks,
Mathias
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
2025-09-16 10:10 ` Mathias Krause
@ 2025-10-15 1:47 ` Chao Gao
2025-10-15 4:29 ` Mathias Krause
0 siblings, 1 reply; 8+ messages in thread
From: Chao Gao @ 2025-10-15 1:47 UTC (permalink / raw)
To: Mathias Krause; +Cc: kvm, pbonzini, seanjc
On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote:
>Am 15.09.25 um 16:49 schrieb Chao Gao:
>> Use a global asm label to get the expected IP address for nested NMI
>> interception instead of reading a hardcoded offset from the stack.
>>
>> the NMI test in eventinj.c verifies that a nested NMI occurs immediately at
>> the return address (IP register) in the IRET frame, as IRET opens the
>> NMI window. Currently, nested_nmi_iret_isr() reads the return address
>> using a magic offset (iret_stack[-3]), which is unclear and may break if
>> more values are pushed to the "iret_stack".
>>
>> To improve readability, add a global 'ip_after_iret' label for the expected
>> return address, push it to the IRET frame, and verify it matches the
>> interrupted address in the nested NMI handler.
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> x86/eventinj.c | 11 +++++++----
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/x86/eventinj.c b/x86/eventinj.c
>> index 6fbb2d0f..ec8a5ef1 100644
>> --- a/x86/eventinj.c
>> +++ b/x86/eventinj.c
>> @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r)
>> }
>>
>> unsigned long *iret_stack;
>> +extern char ip_after_iret[];
>>
>> static void nested_nmi_iret_isr(struct ex_regs *r)
>> {
>> printf("Nested NMI isr running rip=%lx\n", r->rip);
>>
>> - if (r->rip == iret_stack[-3])
>> + if (r->rip == (unsigned long)ip_after_iret)
>
>This change basically eliminates the need for the global
>'ip_after_iret', it can be local to nmi_iret_isr() now.
You meant 'iret_stack', right? if so, sure, I will make it local to
nmi_iret_isr().
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame
2025-09-16 10:21 ` Mathias Krause
@ 2025-10-15 1:49 ` Chao Gao
0 siblings, 0 replies; 8+ messages in thread
From: Chao Gao @ 2025-10-15 1:49 UTC (permalink / raw)
To: Mathias Krause; +Cc: kvm, pbonzini, seanjc
>> ---
>> x86/eventinj.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/x86/eventinj.c b/x86/eventinj.c
>> index ec8a5ef1..63ebbaab 100644
>> --- a/x86/eventinj.c
>> +++ b/x86/eventinj.c
>> @@ -153,6 +153,7 @@ asm("do_iret:"
>> "mov 8(%esp), %edx \n\t" // virt_stack
>> #endif
>> "xchg %"R "dx, %"R "sp \n\t" // point to new stack
>> + "push"W" %"R "sp \n\t"
>
>We should also push SS, for consistency reasons. Not that it would
>matter much for x64-64, but a NULL selector for SS still feels wrong.
>
>> "pushf"W" \n\t"
>> "mov %cs, %ecx \n\t"
>> "push"W" %"R "cx \n\t"
>
>The leading comment also needs an update and the 'extern bool
>no_test_device' can be dropped as well, as fwcfg.h gets included by
>eventinj.c.
>
>So, maybe something like this on top?:
Looks good to me. Thanks for your suggestion.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification
2025-10-15 1:47 ` Chao Gao
@ 2025-10-15 4:29 ` Mathias Krause
0 siblings, 0 replies; 8+ messages in thread
From: Mathias Krause @ 2025-10-15 4:29 UTC (permalink / raw)
To: Chao Gao; +Cc: kvm, pbonzini, seanjc
On 10/15/25 03:47, Chao Gao wrote:
> On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote:
>> Am 15.09.25 um 16:49 schrieb Chao Gao:
>>> [...]
>>
>> This change basically eliminates the need for the global
>> 'ip_after_iret', it can be local to nmi_iret_isr() now.
>
> You meant 'iret_stack', right? if so, sure, I will make it local to
> nmi_iret_isr().
Bleh, sure. Sorry for the confusion.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-10-15 4:29 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
2025-09-16 10:10 ` Mathias Krause
2025-10-15 1:47 ` Chao Gao
2025-10-15 4:29 ` Mathias Krause
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
2025-09-16 10:21 ` Mathias Krause
2025-10-15 1:49 ` Chao Gao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox