[PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime
@ 2024-06-11 16:20 Robert R. Henry
  2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry
  0 siblings, 1 reply; 5+ messages in thread
From: Robert R. Henry @ 2024-06-11 16:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, richard.henderson, Robert R. Henry

This patch fixes the i386/tcg implementation of the IRET instruction
so that IRET can return from user space to user space, as used by the
dotnet runtime to switch threads.

This fixes https://gitlab.com/qemu-project/qemu/-/issues/249

I debugged this issue 4+ years ago, and wrote this patch then.

At the time, I did not fully understand the nuances of the priority
levels in the TCG emulation of the x86, nor of the x86 itself.
I understand less now!

I do not recall exactly how I was led to the conclusion that an
unhandled page fault in kernel space was due to a bug in the code
executed in the tcg emulator for IRET. Eventually, my approach to
debugging was to modify the source for the dotnet runtime so that
immediately prior to the IRET I executed an x87 fpatan2 instruction,
knowing that no modern program used that instruction, and that there
was a single point in QEMU source code that emulated that, making it a
convenient place to put gdb breakpoints to enable further breakpoints in
the IRET emulation code.

With this change the page faults go away, and that the dotnet program
completes as expected. For the curious,
https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/arch/amd64/context2.S#L241
shows how the dotnet runtime uses iret.

I have booted BSD, solaris and macosX with this change, and await
results for booting Windows from the Windows kernel team.

I have not tested this with other modern JITers, such as Java,
v8, or HHVM.

Robert R. Henry (1):
  i386/tcg: Allow IRET from user mode to user mode for dotnet runtime

 target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 31 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime
  2024-06-11 16:20 [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime Robert R. Henry
@ 2024-06-11 16:20 ` Robert R. Henry
  2024-06-15 23:25   ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Robert R. Henry @ 2024-06-11 16:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, richard.henderson, Robert R. Henry

This fixes a bug wherein i386/tcg assumed an interrupt return using
the IRET instruction was always returning from kernel mode to either
kernel mode or user mode. This assumption is violated when IRET is used
as a clever way to restore thread state, as for example in the dotnet
runtime. There, IRET returns from user mode to user mode.

This bug manifested itself as a page fault in the guest Linux kernel.

This bug appears to have been in QEMU since the beginning.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
Signed-off-by: Robert R. Henry <robhenry@microsoft.com>
---
 target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index 715db1f232..815d26e61d 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
 
 #ifdef TARGET_X86_64
 
-#define PUSHQ_RA(sp, val, ra)                   \
-    {                                           \
-        sp -= 8;                                \
-        cpu_stq_kernel_ra(env, sp, (val), ra);  \
-    }
-
-#define POPQ_RA(sp, val, ra)                    \
-    {                                           \
-        val = cpu_ldq_kernel_ra(env, sp, ra);   \
-        sp += 8;                                \
-    }
+#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
+  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
+
+static inline void FUNC_PUSHQ_RA(
+    CPUX86State *env, target_ulong *sp,
+    target_ulong val, target_ulong ra, int cpl, int dpl) {
+  *sp -= 8;
+  if (dpl == 0) {
+    cpu_stq_kernel_ra(env, *sp, val, ra);
+  } else {
+    cpu_stq_data_ra(env, *sp, val, ra);
+  } 
+}
 
-#define PUSHQ(sp, val) PUSHQ_RA(sp, val, 0)
-#define POPQ(sp, val) POPQ_RA(sp, val, 0)
+#define POPQ_RA(sp, val, ra, cpl, dpl) \
+  val = FUNC_POPQ_RA(env, &sp, ra, cpl, dpl)
+
+static inline target_ulong FUNC_POPQ_RA(
+    CPUX86State *env, target_ulong *sp,
+    target_ulong ra, int cpl, int dpl) {
+  target_ulong val;
+  if (cpl == 0) {  /* TODO perhaps both arms reduce to cpu_ldq_data_ra? */
+    val = cpu_ldq_kernel_ra(env, *sp, ra);
+  } else {
+    val = cpu_ldq_data_ra(env, *sp, ra);
+  }
+  *sp += 8;
+  return val;
+}
 
 static inline target_ulong get_rsp_from_tss(CPUX86State *env, int level)
 {
@@ -901,6 +916,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int,
     uint32_t e1, e2, e3, ss, eflags;
     target_ulong old_eip, esp, offset;
     bool set_rf;
+    const target_ulong retaddr = 0;
 
     has_error_code = 0;
     if (!is_int && !is_hw) {
@@ -989,13 +1005,13 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int,
         eflags |= RF_MASK;
     }
 
-    PUSHQ(esp, env->segs[R_SS].selector);
-    PUSHQ(esp, env->regs[R_ESP]);
-    PUSHQ(esp, eflags);
-    PUSHQ(esp, env->segs[R_CS].selector);
-    PUSHQ(esp, old_eip);
+    PUSHQ_RA(esp, env->segs[R_SS].selector, retaddr, cpl, dpl);
+    PUSHQ_RA(esp, env->regs[R_ESP],         retaddr, cpl, dpl);
+    PUSHQ_RA(esp, eflags,                   retaddr, cpl, dpl);
+    PUSHQ_RA(esp, env->segs[R_CS].selector, retaddr, cpl, dpl);
+    PUSHQ_RA(esp, old_eip,                  retaddr, cpl, dpl);
     if (has_error_code) {
-        PUSHQ(esp, error_code);
+        PUSHQ_RA(esp, error_code, retaddr, cpl, dpl);
     }
 
     /* interrupt gate clear IF mask */
@@ -1621,8 +1637,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
 
             /* 64 bit case */
             rsp = env->regs[R_ESP];
-            PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC());
-            PUSHQ_RA(rsp, next_eip, GETPC());
+            PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC(), cpl, dpl);
+            PUSHQ_RA(rsp, next_eip, GETPC(), cpl, dpl);
             /* from this point, not restartable */
             env->regs[R_ESP] = rsp;
             cpu_x86_load_seg_cache(env, R_CS, (new_cs & 0xfffc) | cpl,
@@ -1792,8 +1808,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
 #ifdef TARGET_X86_64
             if (shift == 2) {
                 /* XXX: verify if new stack address is canonical */
-                PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC());
-                PUSHQ_RA(sp, env->regs[R_ESP], GETPC());
+                PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC(), cpl, dpl);
+                PUSHQ_RA(sp, env->regs[R_ESP], GETPC(), cpl, dpl);
                 /* parameters aren't supported for 64-bit call gates */
             } else
 #endif
@@ -1828,8 +1844,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
 
 #ifdef TARGET_X86_64
         if (shift == 2) {
-            PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC());
-            PUSHQ_RA(sp, next_eip, GETPC());
+            PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC(), cpl, dpl);
+            PUSHQ_RA(sp, next_eip, GETPC(), cpl, dpl);
         } else
 #endif
         if (shift == 1) {
@@ -1944,6 +1960,7 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
     int cpl, dpl, rpl, eflags_mask, iopl;
     target_ulong ssp, sp, new_eip, new_esp, sp_mask;
 
+    cpl = env->hflags & HF_CPL_MASK;
 #ifdef TARGET_X86_64
     if (shift == 2) {
         sp_mask = -1;
@@ -1957,11 +1974,11 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
     new_eflags = 0; /* avoid warning */
 #ifdef TARGET_X86_64
     if (shift == 2) {
-        POPQ_RA(sp, new_eip, retaddr);
-        POPQ_RA(sp, new_cs, retaddr);
+        POPQ_RA(sp, new_eip, retaddr, cpl, dpl);
+        POPQ_RA(sp, new_cs, retaddr, cpl, dpl);
         new_cs &= 0xffff;
         if (is_iret) {
-            POPQ_RA(sp, new_eflags, retaddr);
+            POPQ_RA(sp, new_eflags, retaddr, cpl, dpl);
         }
     } else
 #endif
@@ -1999,7 +2016,6 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
         !(e2 & DESC_CS_MASK)) {
         raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
     }
-    cpl = env->hflags & HF_CPL_MASK;
     rpl = new_cs & 3;
     if (rpl < cpl) {
         raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
@@ -2030,8 +2046,8 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
         /* return to different privilege level */
 #ifdef TARGET_X86_64
         if (shift == 2) {
-            POPQ_RA(sp, new_esp, retaddr);
-            POPQ_RA(sp, new_ss, retaddr);
+            POPQ_RA(sp, new_esp, retaddr, cpl, dpl);
+            POPQ_RA(sp, new_ss, retaddr, cpl, dpl);
             new_ss &= 0xffff;
         } else
 #endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime
  2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry
@ 2024-06-15 23:25   ` Richard Henderson
  2024-06-16 22:44     ` Robert Henry
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2024-06-15 23:25 UTC (permalink / raw)
  To: Robert R. Henry, qemu-devel; +Cc: pbonzini, Robert R. Henry

On 6/11/24 09:20, Robert R. Henry wrote:
> This fixes a bug wherein i386/tcg assumed an interrupt return using
> the IRET instruction was always returning from kernel mode to either
> kernel mode or user mode. This assumption is violated when IRET is used
> as a clever way to restore thread state, as for example in the dotnet
> runtime. There, IRET returns from user mode to user mode.
> 
> This bug manifested itself as a page fault in the guest Linux kernel.
> 
> This bug appears to have been in QEMU since the beginning.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
> Signed-off-by: Robert R. Henry <robhenry@microsoft.com>
> ---
>   target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
>   1 file changed, 47 insertions(+), 31 deletions(-)
> 
> diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
> index 715db1f232..815d26e61d 100644
> --- a/target/i386/tcg/seg_helper.c
> +++ b/target/i386/tcg/seg_helper.c
> @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
>   
>   #ifdef TARGET_X86_64
>   
> -#define PUSHQ_RA(sp, val, ra)                   \
> -    {                                           \
> -        sp -= 8;                                \
> -        cpu_stq_kernel_ra(env, sp, (val), ra);  \
> -    }
> -
> -#define POPQ_RA(sp, val, ra)                    \
> -    {                                           \
> -        val = cpu_ldq_kernel_ra(env, sp, ra);   \
> -        sp += 8;                                \
> -    }
> +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
> +  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
> +
> +static inline void FUNC_PUSHQ_RA(
> +    CPUX86State *env, target_ulong *sp,
> +    target_ulong val, target_ulong ra, int cpl, int dpl) {
> +  *sp -= 8;
> +  if (dpl == 0) {
> +    cpu_stq_kernel_ra(env, *sp, val, ra);
> +  } else {
> +    cpu_stq_data_ra(env, *sp, val, ra);
> +  }
> +}

This doesn't seem quite right.

I would be much happier if we were to resolve the proper mmu index earlier, once, rather 
than within each call to cpu_{ld,st}*_{kernel,data}_ra.  With the mmu index in hand, use 
cpu_{ld,st}*_mmuidx_ra instead.

I believe you will want to factor out a subroutine of x86_cpu_mmu_index which passes in 
the pl, rather than reading cpl from env->hflags.  This will also allow 
cpu_mmu_index_kernel to be eliminated or simplified, which is written to assume pl=0.


r~


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime
  2024-06-15 23:25   ` Richard Henderson
@ 2024-06-16 22:44     ` Robert Henry
  2024-06-17  8:33       ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Robert Henry @ 2024-06-16 22:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, pbonzini, Robert R. Henry

[-- Attachment #1: Type: text/plain, Size: 3154 bytes --]

I do not think I will have the time or focus to work on improving this
patch this summer, as I will retire in 2 weeks and need to make a clean
break to focus on other things (health, for one) for a while.

If anyone wants to put into place Richard's ideas, I will not be offended!

I do not see any of this chatter in this email thread on the bug report
https://gitlab.com/qemu-project/qemu/-/issues/249

Robert Henry

On Sat, Jun 15, 2024 at 4:25 PM Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 6/11/24 09:20, Robert R. Henry wrote:
> > This fixes a bug wherein i386/tcg assumed an interrupt return using
> > the IRET instruction was always returning from kernel mode to either
> > kernel mode or user mode. This assumption is violated when IRET is used
> > as a clever way to restore thread state, as for example in the dotnet
> > runtime. There, IRET returns from user mode to user mode.
> >
> > This bug manifested itself as a page fault in the guest Linux kernel.
> >
> > This bug appears to have been in QEMU since the beginning.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
> > Signed-off-by: Robert R. Henry <robhenry@microsoft.com>
> > ---
> >   target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
> >   1 file changed, 47 insertions(+), 31 deletions(-)
> >
> > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
> > index 715db1f232..815d26e61d 100644
> > --- a/target/i386/tcg/seg_helper.c
> > +++ b/target/i386/tcg/seg_helper.c
> > @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State
> *env, int intno, int is_int,
> >
> >   #ifdef TARGET_X86_64
> >
> > -#define PUSHQ_RA(sp, val, ra)                   \
> > -    {                                           \
> > -        sp -= 8;                                \
> > -        cpu_stq_kernel_ra(env, sp, (val), ra);  \
> > -    }
> > -
> > -#define POPQ_RA(sp, val, ra)                    \
> > -    {                                           \
> > -        val = cpu_ldq_kernel_ra(env, sp, ra);   \
> > -        sp += 8;                                \
> > -    }
> > +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
> > +  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
> > +
> > +static inline void FUNC_PUSHQ_RA(
> > +    CPUX86State *env, target_ulong *sp,
> > +    target_ulong val, target_ulong ra, int cpl, int dpl) {
> > +  *sp -= 8;
> > +  if (dpl == 0) {
> > +    cpu_stq_kernel_ra(env, *sp, val, ra);
> > +  } else {
> > +    cpu_stq_data_ra(env, *sp, val, ra);
> > +  }
> > +}
>
> This doesn't seem quite right.
>
> I would be much happier if we were to resolve the proper mmu index
> earlier, once, rather
> than within each call to cpu_{ld,st}*_{kernel,data}_ra.  With the mmu
> index in hand, use
> cpu_{ld,st}*_mmuidx_ra instead.
>
> I believe you will want to factor out a subroutine of x86_cpu_mmu_index
> which passes in
> the pl, rather than reading cpl from env->hflags.  This will also allow
> cpu_mmu_index_kernel to be eliminated or simplified, which is written to
> assume pl=0.
>
>
> r~
>

[-- Attachment #2: Type: text/html, Size: 4162 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime
  2024-06-16 22:44     ` Robert Henry
@ 2024-06-17  8:33       ` Paolo Bonzini
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2024-06-17  8:33 UTC (permalink / raw)
  To: Robert Henry; +Cc: Richard Henderson, qemu-devel, Robert R. Henry

On Mon, Jun 17, 2024 at 12:45 AM Robert Henry <rrh.henry@gmail.com> wrote:
> I do not think I will have the time or focus to work on improving this patch this summer, as I will retire in 2 weeks and need to make a clean break to focus on other things (health, for one) for a while.
> If anyone wants to put into place Richard's ideas, I will not be offended!

Great, I'll do the work and make sure your analysis and contribution
to the patch is recognized.

> I do not see any of this chatter in this email thread on the bug report https://gitlab.com/qemu-project/qemu/-/issues/249

Yeah, that happens - the discussion in the bug report often focuses
more on what the bug is than how to fix it. I had looked at the patch
and came to roughly the same conclusion as Richard, though he beat me
to answering.

More precisely:

- the main issue with your patch is that it only affects IRETQ and I
think (going from memory) RETFQ. All IRET and RETF operations should
use the CPL for the access.

- a secondary issue with the patch is that you can use the *_data
variant for both CPL0 and CPL3 accesses.

- it's a good idea to also solve the related issue, that interrupts
should use a data access for the DPL of the interrupt/call gate. That
one cannot use *_data, so there are two alternatives: using an if like
you did, or using the *_mmuidx variant with the MMU index computed in
advance.

Yet another related issue (going back to *really* legacy stuff) is
that call gates need to use *_data when reading the parameters from
the stack, so that it's possible to use call gates from CPL3 to CPL0
with CR4.SMAP=1.

Paolo



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-06-17  8:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-11 16:20 [PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime Robert R. Henry
2024-06-11 16:20 ` [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for " Robert R. Henry
2024-06-15 23:25   ` Richard Henderson
2024-06-16 22:44     ` Robert Henry
2024-06-17  8:33       ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).