* [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test
@ 2025-09-15 14:49 Chao Gao
2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao
2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao
0 siblings, 2 replies; 8+ messages in thread
From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw)
To: kvm; +Cc: pbonzini, seanjc, Chao Gao
As reported in [1], the eventinj test can cause a triple fault due to an
invalid RSP after IRET. Fix this by pushing a valid stack pointer to the
crafted IRET frame in do_iret(), ensuring RSP is restored to a valid
stack in 64-bit mode.
[1]: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/
Chao Gao (2):
x86/eventinj: Use global asm label for nested NMI IP address
verification
x86/eventinj: Push SP to IRET frame
x86/eventinj.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification 2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao @ 2025-09-15 14:49 ` Chao Gao 2025-09-16 10:10 ` Mathias Krause 2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao 1 sibling, 1 reply; 8+ messages in thread From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw) To: kvm; +Cc: pbonzini, seanjc, Chao Gao Use a global asm label to get the expected IP address for nested NMI interception instead of reading a hardcoded offset from the stack. the NMI test in eventinj.c verifies that a nested NMI occurs immediately at the return address (IP register) in the IRET frame, as IRET opens the NMI window. Currently, nested_nmi_iret_isr() reads the return address using a magic offset (iret_stack[-3]), which is unclear and may break if more values are pushed to the "iret_stack". To improve readability, add a global 'ip_after_iret' label for the expected return address, push it to the IRET frame, and verify it matches the interrupted address in the nested NMI handler. Signed-off-by: Chao Gao <chao.gao@intel.com> --- x86/eventinj.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/x86/eventinj.c b/x86/eventinj.c index 6fbb2d0f..ec8a5ef1 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r) } unsigned long *iret_stack; +extern char ip_after_iret[]; static void nested_nmi_iret_isr(struct ex_regs *r) { printf("Nested NMI isr running rip=%lx\n", r->rip); - if (r->rip == iret_stack[-3]) + if (r->rip == (unsigned long)ip_after_iret) test_count++; } @@ -156,11 +157,11 @@ asm("do_iret:" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" #ifndef __x86_64__ - "push"W" $2f \n\t" + "push"W" $ip_after_iret \n\t" "cmpb $0, no_test_device\n\t" // see if need to flush #else - "leaq 2f(%rip), %rbx \n\t" + "leaq ip_after_iret(%rip), %rbx \n\t" "pushq %rbx \n\t" "mov no_test_device(%rip), %bl \n\t" @@ -170,7 +171,9 @@ asm("do_iret:" "outl %eax, $0xe4 \n\t" // flush page "1: \n\t" "iret"W" \n\t" - "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack + ".global ip_after_iret \n\t" + "ip_after_iret: \n\t" + "xchg %"R "dx, %"R "sp \n\t" // point to old stack "ret\n\t" ); -- 2.47.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification 2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao @ 2025-09-16 10:10 ` Mathias Krause 2025-10-15 1:47 ` Chao Gao 0 siblings, 1 reply; 8+ messages in thread From: Mathias Krause @ 2025-09-16 10:10 UTC (permalink / raw) To: Chao Gao, kvm; +Cc: pbonzini, seanjc Am 15.09.25 um 16:49 schrieb Chao Gao: > Use a global asm label to get the expected IP address for nested NMI > interception instead of reading a hardcoded offset from the stack. > > the NMI test in eventinj.c verifies that a nested NMI occurs immediately at > the return address (IP register) in the IRET frame, as IRET opens the > NMI window. Currently, nested_nmi_iret_isr() reads the return address > using a magic offset (iret_stack[-3]), which is unclear and may break if > more values are pushed to the "iret_stack". > > To improve readability, add a global 'ip_after_iret' label for the expected > return address, push it to the IRET frame, and verify it matches the > interrupted address in the nested NMI handler. > > Signed-off-by: Chao Gao <chao.gao@intel.com> > --- > x86/eventinj.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/x86/eventinj.c b/x86/eventinj.c > index 6fbb2d0f..ec8a5ef1 100644 > --- a/x86/eventinj.c > +++ b/x86/eventinj.c > @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r) > } > > unsigned long *iret_stack; > +extern char ip_after_iret[]; > > static void nested_nmi_iret_isr(struct ex_regs *r) > { > printf("Nested NMI isr running rip=%lx\n", r->rip); > > - if (r->rip == iret_stack[-3]) > + if (r->rip == (unsigned long)ip_after_iret) This change basically eliminates the need for the global 'ip_after_iret', it can be local to nmi_iret_isr() now. > test_count++; > } > > @@ -156,11 +157,11 @@ asm("do_iret:" > "mov %cs, %ecx \n\t" > "push"W" %"R "cx \n\t" > #ifndef __x86_64__ > - "push"W" $2f \n\t" > + "push"W" $ip_after_iret \n\t" > > "cmpb $0, no_test_device\n\t" // see if need to flush > #else > - "leaq 2f(%rip), %rbx \n\t" > + "leaq ip_after_iret(%rip), %rbx \n\t" > "pushq %rbx \n\t" > > "mov no_test_device(%rip), %bl \n\t" > @@ -170,7 +171,9 @@ asm("do_iret:" > "outl %eax, $0xe4 \n\t" // flush page > "1: \n\t" > "iret"W" \n\t" > - "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack > + ".global ip_after_iret \n\t" > + "ip_after_iret: \n\t" > + "xchg %"R "dx, %"R "sp \n\t" // point to old stack > "ret\n\t" > ); > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification 2025-09-16 10:10 ` Mathias Krause @ 2025-10-15 1:47 ` Chao Gao 2025-10-15 4:29 ` Mathias Krause 0 siblings, 1 reply; 8+ messages in thread From: Chao Gao @ 2025-10-15 1:47 UTC (permalink / raw) To: Mathias Krause; +Cc: kvm, pbonzini, seanjc On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote: >Am 15.09.25 um 16:49 schrieb Chao Gao: >> Use a global asm label to get the expected IP address for nested NMI >> interception instead of reading a hardcoded offset from the stack. >> >> the NMI test in eventinj.c verifies that a nested NMI occurs immediately at >> the return address (IP register) in the IRET frame, as IRET opens the >> NMI window. Currently, nested_nmi_iret_isr() reads the return address >> using a magic offset (iret_stack[-3]), which is unclear and may break if >> more values are pushed to the "iret_stack". >> >> To improve readability, add a global 'ip_after_iret' label for the expected >> return address, push it to the IRET frame, and verify it matches the >> interrupted address in the nested NMI handler. >> >> Signed-off-by: Chao Gao <chao.gao@intel.com> >> --- >> x86/eventinj.c | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git a/x86/eventinj.c b/x86/eventinj.c >> index 6fbb2d0f..ec8a5ef1 100644 >> --- a/x86/eventinj.c >> +++ b/x86/eventinj.c >> @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r) >> } >> >> unsigned long *iret_stack; >> +extern char ip_after_iret[]; >> >> static void nested_nmi_iret_isr(struct ex_regs *r) >> { >> printf("Nested NMI isr running rip=%lx\n", r->rip); >> >> - if (r->rip == iret_stack[-3]) >> + if (r->rip == (unsigned long)ip_after_iret) > >This change basically eliminates the need for the global >'ip_after_iret', it can be local to nmi_iret_isr() now. You meant 'iret_stack', right? if so, sure, I will make it local to nmi_iret_isr(). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification 2025-10-15 1:47 ` Chao Gao @ 2025-10-15 4:29 ` Mathias Krause 0 siblings, 0 replies; 8+ messages in thread From: Mathias Krause @ 2025-10-15 4:29 UTC (permalink / raw) To: Chao Gao; +Cc: kvm, pbonzini, seanjc On 10/15/25 03:47, Chao Gao wrote: > On Tue, Sep 16, 2025 at 12:10:46PM +0200, Mathias Krause wrote: >> Am 15.09.25 um 16:49 schrieb Chao Gao: >>> [...] >> >> This change basically eliminates the need for the global >> 'ip_after_iret', it can be local to nmi_iret_isr() now. > > You meant 'iret_stack', right? if so, sure, I will make it local to > nmi_iret_isr(). Bleh, sure. Sorry for the confusion. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] x86/eventinj: Push SP to IRET frame 2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao 2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao @ 2025-09-15 14:49 ` Chao Gao 2025-09-16 10:21 ` Mathias Krause 1 sibling, 1 reply; 8+ messages in thread From: Chao Gao @ 2025-09-15 14:49 UTC (permalink / raw) To: kvm; +Cc: pbonzini, seanjc, Chao Gao Push the current stack pointer (SP) to the IRET frame in do_iret() to ensure a valid stack after IRET execution, particularly in 64-bit mode. Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit guests, SP is not popped during IRET due to no privilege change. so, the stack after IRET is still valid. But for 64-bit guests, IRET unconditionally pops RSP, restoring it to zero. This can cause a nested NMI to push its interrupt frame to the topmost page (0xffffffff_fffff000), which may be not mapped and cause triple fault [1]. To fix this issue, push the current SP to the IRET frame, ensuring RSP is restored to a valid stack in 64-bit mode. Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/ --- x86/eventinj.c | 1 + 1 file changed, 1 insertion(+) diff --git a/x86/eventinj.c b/x86/eventinj.c index ec8a5ef1..63ebbaab 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -153,6 +153,7 @@ asm("do_iret:" "mov 8(%esp), %edx \n\t" // virt_stack #endif "xchg %"R "dx, %"R "sp \n\t" // point to new stack + "push"W" %"R "sp \n\t" "pushf"W" \n\t" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" -- 2.47.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame 2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao @ 2025-09-16 10:21 ` Mathias Krause 2025-10-15 1:49 ` Chao Gao 0 siblings, 1 reply; 8+ messages in thread From: Mathias Krause @ 2025-09-16 10:21 UTC (permalink / raw) To: Chao Gao, kvm; +Cc: pbonzini, seanjc Am 15.09.25 um 16:49 schrieb Chao Gao: > Push the current stack pointer (SP) to the IRET frame in do_iret() to > ensure a valid stack after IRET execution, particularly in 64-bit mode. > > Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit > guests, SP is not popped during IRET due to no privilege change. so, the > stack after IRET is still valid. But for 64-bit guests, IRET > unconditionally pops RSP, restoring it to zero. This can cause a nested NMI > to push its interrupt frame to the topmost page (0xffffffff_fffff000), > which may be not mapped and cause triple fault [1]. Nice catch! > > To fix this issue, push the current SP to the IRET frame, ensuring RSP is > restored to a valid stack in 64-bit mode. > > Reported-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Chao Gao <chao.gao@intel.com> > Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/ > --- > x86/eventinj.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/x86/eventinj.c b/x86/eventinj.c > index ec8a5ef1..63ebbaab 100644 > --- a/x86/eventinj.c > +++ b/x86/eventinj.c > @@ -153,6 +153,7 @@ asm("do_iret:" > "mov 8(%esp), %edx \n\t" // virt_stack > #endif > "xchg %"R "dx, %"R "sp \n\t" // point to new stack > + "push"W" %"R "sp \n\t" We should also push SS, for consistency reasons. Not that it would matter much for x64-64, but a NULL selector for SS still feels wrong. > "pushf"W" \n\t" > "mov %cs, %ecx \n\t" > "push"W" %"R "cx \n\t" The leading comment also needs an update and the 'extern bool no_test_device' can be dropped as well, as fwcfg.h gets included by eventinj.c. So, maybe something like this on top?: diff --git a/x86/eventinj.c b/x86/eventinj.c index 63ebbaabda66..4e21f3915820 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -139,11 +139,8 @@ static void nested_nmi_iret_isr(struct ex_regs *r) extern void do_iret(ulong phys_stack, void *virt_stack); -// Return to same privilege level won't pop SS or SP, so +// Return to same privilege level won't pop SS or SP for i386 but x86-64, so // save it in RDX while we run on the nested stack - -extern bool no_test_device; - asm("do_iret:" #ifdef __x86_64__ "mov %rdi, %rax \n\t" // phys_stack @@ -153,7 +150,12 @@ asm("do_iret:" "mov 8(%esp), %edx \n\t" // virt_stack #endif "xchg %"R "dx, %"R "sp \n\t" // point to new stack +#ifdef __x86_64__ + // IRET in 64 bit mode unconditionally pops SS:xSP + "mov %ss, %ecx \n\t" + "push"W" %"R "cx \n\t" "push"W" %"R "sp \n\t" +#endif "pushf"W" \n\t" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" Thanks, Mathias ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] x86/eventinj: Push SP to IRET frame 2025-09-16 10:21 ` Mathias Krause @ 2025-10-15 1:49 ` Chao Gao 0 siblings, 0 replies; 8+ messages in thread From: Chao Gao @ 2025-10-15 1:49 UTC (permalink / raw) To: Mathias Krause; +Cc: kvm, pbonzini, seanjc >> --- >> x86/eventinj.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/x86/eventinj.c b/x86/eventinj.c >> index ec8a5ef1..63ebbaab 100644 >> --- a/x86/eventinj.c >> +++ b/x86/eventinj.c >> @@ -153,6 +153,7 @@ asm("do_iret:" >> "mov 8(%esp), %edx \n\t" // virt_stack >> #endif >> "xchg %"R "dx, %"R "sp \n\t" // point to new stack >> + "push"W" %"R "sp \n\t" > >We should also push SS, for consistency reasons. Not that it would >matter much for x64-64, but a NULL selector for SS still feels wrong. > >> "pushf"W" \n\t" >> "mov %cs, %ecx \n\t" >> "push"W" %"R "cx \n\t" > >The leading comment also needs an update and the 'extern bool >no_test_device' can be dropped as well, as fwcfg.h gets included by >eventinj.c. > >So, maybe something like this on top?: Looks good to me. Thanks for your suggestion. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-10-15 4:29 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-15 14:49 [kvm-unit-tests PATCH 0/2] Fix triple fault in eventinj test Chao Gao 2025-09-15 14:49 ` [PATCH 1/2] x86/eventinj: Use global asm label for nested NMI IP address verification Chao Gao 2025-09-16 10:10 ` Mathias Krause 2025-10-15 1:47 ` Chao Gao 2025-10-15 4:29 ` Mathias Krause 2025-09-15 14:49 ` [PATCH 2/2] x86/eventinj: Push SP to IRET frame Chao Gao 2025-09-16 10:21 ` Mathias Krause 2025-10-15 1:49 ` Chao Gao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox