* [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime @ 2024-06-11 16:20 Robert R. Henry 2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry 0 siblings, 1 reply; 5+ messages in thread From: Robert R. Henry @ 2024-06-11 16:20 UTC (permalink / raw) To: qemu-devel; +Cc: pbonzini, richard.henderson, Robert R. Henry This patch fixes the i386/tcg implementation of the IRET instruction so that IRET can return from user space to user space, as used by the dotnet runtime to switch threads. This fixes https://gitlab.com/qemu-project/qemu/-/issues/249 I debugged this issue 4+ years ago, and wrote this patch then. At the time, I did not fully understand the nuances of the priority levels in the TCG emulation of the x86, nor of the x86 itself. I understand less now! I do not recall exactly how I was led to the conclusion that an unhandled page fault in kernel space was due to a bug in the code executed in the tcg emulator for IRET. Eventually, my approach to debugging was to modify the source for the dotnet runtime so that immediately prior to the IRET I executed an x87 fpatan2 instruction, knowing that no modern program used that instruction, and that there was a single point in QEMU source code that emulated that, making it a convenient place to put gdb breakpoints to enable further breakpoints in the IRET emulation code. With this change the page faults go away, and that the dotnet program completes as expected. For the curious, https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/arch/amd64/context2.S#L241 shows how the dotnet runtime uses iret. I have booted BSD, solaris and macosX with this change, and await results for booting Windows from the Windows kernel team. I have not tested this with other modern JITers, such as Java, v8, or HHVM. Robert R. Henry (1): i386/tcg: Allow IRET from user mode to user mode for dotnet runtime target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++-------------- 1 file changed, 47 insertions(+), 31 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime 2024-06-11 16:20 [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime Robert R. Henry @ 2024-06-11 16:20 ` Robert R. Henry 2024-06-15 23:25 ` Richard Henderson 0 siblings, 1 reply; 5+ messages in thread From: Robert R. Henry @ 2024-06-11 16:20 UTC (permalink / raw) To: qemu-devel; +Cc: pbonzini, richard.henderson, Robert R. Henry This fixes a bug wherein i386/tcg assumed an interrupt return using the IRET instruction was always returning from kernel mode to either kernel mode or user mode. This assumption is violated when IRET is used as a clever way to restore thread state, as for example in the dotnet runtime. There, IRET returns from user mode to user mode. This bug manifested itself as a page fault in the guest Linux kernel. This bug appears to have been in QEMU since the beginning. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 Signed-off-by: Robert R. Henry <robhenry@microsoft.com> --- target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++-------------- 1 file changed, 47 insertions(+), 31 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 715db1f232..815d26e61d 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, #ifdef TARGET_X86_64 -#define PUSHQ_RA(sp, val, ra) \ - { \ - sp -= 8; \ - cpu_stq_kernel_ra(env, sp, (val), ra); \ - } - -#define POPQ_RA(sp, val, ra) \ - { \ - val = cpu_ldq_kernel_ra(env, sp, ra); \ - sp += 8; \ - } +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \ + FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl) + +static inline void FUNC_PUSHQ_RA( + CPUX86State *env, target_ulong *sp, + target_ulong val, target_ulong ra, int cpl, int dpl) { + *sp -= 8; + if (dpl == 0) { + cpu_stq_kernel_ra(env, *sp, val, ra); + } else { + cpu_stq_data_ra(env, *sp, val, ra); + } +} -#define PUSHQ(sp, val) PUSHQ_RA(sp, val, 0) -#define POPQ(sp, val) POPQ_RA(sp, val, 0) +#define POPQ_RA(sp, val, ra, cpl, dpl) \ + val = FUNC_POPQ_RA(env, &sp, ra, cpl, dpl) + +static inline target_ulong FUNC_POPQ_RA( + CPUX86State *env, target_ulong *sp, + target_ulong ra, int cpl, int dpl) { + target_ulong val; + if (cpl == 0) { /* TODO perhaps both arms reduce to cpu_ldq_data_ra? */ + val = cpu_ldq_kernel_ra(env, *sp, ra); + } else { + val = cpu_ldq_data_ra(env, *sp, ra); + } + *sp += 8; + return val; +} static inline target_ulong get_rsp_from_tss(CPUX86State *env, int level) { @@ -901,6 +916,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, uint32_t e1, e2, e3, ss, eflags; target_ulong old_eip, esp, offset; bool set_rf; + const target_ulong retaddr = 0; has_error_code = 0; if (!is_int && !is_hw) { @@ -989,13 +1005,13 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, eflags |= RF_MASK; } - PUSHQ(esp, env->segs[R_SS].selector); - PUSHQ(esp, env->regs[R_ESP]); - PUSHQ(esp, eflags); - PUSHQ(esp, env->segs[R_CS].selector); - PUSHQ(esp, old_eip); + PUSHQ_RA(esp, env->segs[R_SS].selector, retaddr, cpl, dpl); + PUSHQ_RA(esp, env->regs[R_ESP], retaddr, cpl, dpl); + PUSHQ_RA(esp, eflags, retaddr, cpl, dpl); + PUSHQ_RA(esp, env->segs[R_CS].selector, retaddr, cpl, dpl); + PUSHQ_RA(esp, old_eip, retaddr, cpl, dpl); if (has_error_code) { - PUSHQ(esp, error_code); + PUSHQ_RA(esp, error_code, retaddr, cpl, dpl); } /* interrupt gate clear IF mask */ @@ -1621,8 +1637,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, /* 64 bit case */ rsp = env->regs[R_ESP]; - PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC()); - PUSHQ_RA(rsp, next_eip, GETPC()); + PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC(), cpl, dpl); + PUSHQ_RA(rsp, next_eip, GETPC(), cpl, dpl); /* from this point, not restartable */ env->regs[R_ESP] = rsp; cpu_x86_load_seg_cache(env, R_CS, (new_cs & 0xfffc) | cpl, @@ -1792,8 +1808,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, #ifdef TARGET_X86_64 if (shift == 2) { /* XXX: verify if new stack address is canonical */ - PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC()); - PUSHQ_RA(sp, env->regs[R_ESP], GETPC()); + PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC(), cpl, dpl); + PUSHQ_RA(sp, env->regs[R_ESP], GETPC(), cpl, dpl); /* parameters aren't supported for 64-bit call gates */ } else #endif @@ -1828,8 +1844,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, #ifdef TARGET_X86_64 if (shift == 2) { - PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC()); - PUSHQ_RA(sp, next_eip, GETPC()); + PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC(), cpl, dpl); + PUSHQ_RA(sp, next_eip, GETPC(), cpl, dpl); } else #endif if (shift == 1) { @@ -1944,6 +1960,7 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, int cpl, dpl, rpl, eflags_mask, iopl; target_ulong ssp, sp, new_eip, new_esp, sp_mask; + cpl = env->hflags & HF_CPL_MASK; #ifdef TARGET_X86_64 if (shift == 2) { sp_mask = -1; @@ -1957,11 +1974,11 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, new_eflags = 0; /* avoid warning */ #ifdef TARGET_X86_64 if (shift == 2) { - POPQ_RA(sp, new_eip, retaddr); - POPQ_RA(sp, new_cs, retaddr); + POPQ_RA(sp, new_eip, retaddr, cpl, dpl); + POPQ_RA(sp, new_cs, retaddr, cpl, dpl); new_cs &= 0xffff; if (is_iret) { - POPQ_RA(sp, new_eflags, retaddr); + POPQ_RA(sp, new_eflags, retaddr, cpl, dpl); } } else #endif @@ -1999,7 +2016,6 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, !(e2 & DESC_CS_MASK)) { raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr); } - cpl = env->hflags & HF_CPL_MASK; rpl = new_cs & 3; if (rpl < cpl) { raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr); @@ -2030,8 +2046,8 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, /* return to different privilege level */ #ifdef TARGET_X86_64 if (shift == 2) { - POPQ_RA(sp, new_esp, retaddr); - POPQ_RA(sp, new_ss, retaddr); + POPQ_RA(sp, new_esp, retaddr, cpl, dpl); + POPQ_RA(sp, new_ss, retaddr, cpl, dpl); new_ss &= 0xffff; } else #endif -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime 2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry @ 2024-06-15 23:25 ` Richard Henderson 2024-06-16 22:44 ` Robert Henry 0 siblings, 1 reply; 5+ messages in thread From: Richard Henderson @ 2024-06-15 23:25 UTC (permalink / raw) To: Robert R. Henry, qemu-devel; +Cc: pbonzini, Robert R. Henry On 6/11/24 09:20, Robert R. Henry wrote: > This fixes a bug wherein i386/tcg assumed an interrupt return using > the IRET instruction was always returning from kernel mode to either > kernel mode or user mode. This assumption is violated when IRET is used > as a clever way to restore thread state, as for example in the dotnet > runtime. There, IRET returns from user mode to user mode. > > This bug manifested itself as a page fault in the guest Linux kernel. > > This bug appears to have been in QEMU since the beginning. > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 > Signed-off-by: Robert R. Henry <robhenry@microsoft.com> > --- > target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++-------------- > 1 file changed, 47 insertions(+), 31 deletions(-) > > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c > index 715db1f232..815d26e61d 100644 > --- a/target/i386/tcg/seg_helper.c > +++ b/target/i386/tcg/seg_helper.c > @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, > > #ifdef TARGET_X86_64 > > -#define PUSHQ_RA(sp, val, ra) \ > - { \ > - sp -= 8; \ > - cpu_stq_kernel_ra(env, sp, (val), ra); \ > - } > - > -#define POPQ_RA(sp, val, ra) \ > - { \ > - val = cpu_ldq_kernel_ra(env, sp, ra); \ > - sp += 8; \ > - } > +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \ > + FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl) > + > +static inline void FUNC_PUSHQ_RA( > + CPUX86State *env, target_ulong *sp, > + target_ulong val, target_ulong ra, int cpl, int dpl) { > + *sp -= 8; > + if (dpl == 0) { > + cpu_stq_kernel_ra(env, *sp, val, ra); > + } else { > + cpu_stq_data_ra(env, *sp, val, ra); > + } > +} This doesn't seem quite right. I would be much happier if we were to resolve the proper mmu index earlier, once, rather than within each call to cpu_{ld,st}*_{kernel,data}_ra. With the mmu index in hand, use cpu_{ld,st}*_mmuidx_ra instead. I believe you will want to factor out a subroutine of x86_cpu_mmu_index which passes in the pl, rather than reading cpl from env->hflags. This will also allow cpu_mmu_index_kernel to be eliminated or simplified, which is written to assume pl=0. r~ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime 2024-06-15 23:25 ` Richard Henderson @ 2024-06-16 22:44 ` Robert Henry 2024-06-17 8:33 ` Paolo Bonzini 0 siblings, 1 reply; 5+ messages in thread From: Robert Henry @ 2024-06-16 22:44 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel, pbonzini, Robert R. Henry [-- Attachment #1: Type: text/plain, Size: 3154 bytes --] I do not think I will have the time or focus to work on improving this patch this summer, as I will retire in 2 weeks and need to make a clean break to focus on other things (health, for one) for a while. If anyone wants to put into place Richard's ideas, I will not be offended! I do not see any of this chatter in this email thread on the bug report https://gitlab.com/qemu-project/qemu/-/issues/249 Robert Henry On Sat, Jun 15, 2024 at 4:25 PM Richard Henderson < richard.henderson@linaro.org> wrote: > On 6/11/24 09:20, Robert R. Henry wrote: > > This fixes a bug wherein i386/tcg assumed an interrupt return using > > the IRET instruction was always returning from kernel mode to either > > kernel mode or user mode. This assumption is violated when IRET is used > > as a clever way to restore thread state, as for example in the dotnet > > runtime. There, IRET returns from user mode to user mode. > > > > This bug manifested itself as a page fault in the guest Linux kernel. > > > > This bug appears to have been in QEMU since the beginning. > > > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 > > Signed-off-by: Robert R. Henry <robhenry@microsoft.com> > > --- > > target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++-------------- > > 1 file changed, 47 insertions(+), 31 deletions(-) > > > > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c > > index 715db1f232..815d26e61d 100644 > > --- a/target/i386/tcg/seg_helper.c > > +++ b/target/i386/tcg/seg_helper.c > > @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State > *env, int intno, int is_int, > > > > #ifdef TARGET_X86_64 > > > > -#define PUSHQ_RA(sp, val, ra) \ > > - { \ > > - sp -= 8; \ > > - cpu_stq_kernel_ra(env, sp, (val), ra); \ > > - } > > - > > -#define POPQ_RA(sp, val, ra) \ > > - { \ > > - val = cpu_ldq_kernel_ra(env, sp, ra); \ > > - sp += 8; \ > > - } > > +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \ > > + FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl) > > + > > +static inline void FUNC_PUSHQ_RA( > > + CPUX86State *env, target_ulong *sp, > > + target_ulong val, target_ulong ra, int cpl, int dpl) { > > + *sp -= 8; > > + if (dpl == 0) { > > + cpu_stq_kernel_ra(env, *sp, val, ra); > > + } else { > > + cpu_stq_data_ra(env, *sp, val, ra); > > + } > > +} > > This doesn't seem quite right. > > I would be much happier if we were to resolve the proper mmu index > earlier, once, rather > than within each call to cpu_{ld,st}*_{kernel,data}_ra. With the mmu > index in hand, use > cpu_{ld,st}*_mmuidx_ra instead. > > I believe you will want to factor out a subroutine of x86_cpu_mmu_index > which passes in > the pl, rather than reading cpl from env->hflags. This will also allow > cpu_mmu_index_kernel to be eliminated or simplified, which is written to > assume pl=0. > > > r~ > [-- Attachment #2: Type: text/html, Size: 4162 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime 2024-06-16 22:44 ` Robert Henry @ 2024-06-17 8:33 ` Paolo Bonzini 0 siblings, 0 replies; 5+ messages in thread From: Paolo Bonzini @ 2024-06-17 8:33 UTC (permalink / raw) To: Robert Henry; +Cc: Richard Henderson, qemu-devel, Robert R. Henry On Mon, Jun 17, 2024 at 12:45 AM Robert Henry <rrh.henry@gmail.com> wrote: > I do not think I will have the time or focus to work on improving this patch this summer, as I will retire in 2 weeks and need to make a clean break to focus on other things (health, for one) for a while. > If anyone wants to put into place Richard's ideas, I will not be offended! Great, I'll do the work and make sure your analysis and contribution to the patch is recognized. > I do not see any of this chatter in this email thread on the bug report https://gitlab.com/qemu-project/qemu/-/issues/249 Yeah, that happens - the discussion in the bug report often focuses more on what the bug is than how to fix it. I had looked at the patch and came to roughly the same conclusion as Richard, though he beat me to answering. More precisely: - the main issue with your patch is that it only affects IRETQ and I think (going from memory) RETFQ. All IRET and RETF operations should use the CPL for the access. - a secondary issue with the patch is that you can use the *_data variant for both CPL0 and CPL3 accesses. - it's a good idea to also solve the related issue, that interrupts should use a data access for the DPL of the interrupt/call gate. That one cannot use *_data, so there are two alternatives: using an if like you did, or using the *_mmuidx variant with the MMU index computed in advance. Yet another related issue (going back to *really* legacy stuff) is that call gates need to use *_data when reading the parameters from the stack, so that it's possible to use call gates from CPL3 to CPL0 with CR4.SMAP=1. Paolo ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-06-17 8:34 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-11 16:20 [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime Robert R. Henry 2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry 2024-06-15 23:25 ` Richard Henderson 2024-06-16 22:44 ` Robert Henry 2024-06-17 8:33 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).