I do not think I will have the time or focus to work on improving this patch this summer, as I retire in two weeks and need to make a clean break to focus on other things (health, for one) for a while.

If anyone wants to put into place Richard's ideas, I will not be offended!

I do not see any of the discussion from this email thread reflected on the bug report: https://gitlab.com/qemu-project/qemu/-/issues/249

Robert Henry

On Sat, Jun 15, 2024 at 4:25 PM Richard Henderson <richard.henderson@linaro.org> wrote:
On 6/11/24 09:20, Robert R. Henry wrote:
> This fixes a bug wherein i386/tcg assumed an interrupt return using
> the IRET instruction was always returning from kernel mode to either
> kernel mode or user mode. This assumption is violated when IRET is used
> as a clever way to restore thread state, as for example in the dotnet
> runtime. There, IRET returns from user mode to user mode.
>
> This bug manifested itself as a page fault in the guest Linux kernel.
>
> This bug appears to have been in QEMU since the beginning.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
> Signed-off-by: Robert R. Henry <robhenry@microsoft.com>
> ---
>   target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
>   1 file changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
> index 715db1f232..815d26e61d 100644
> --- a/target/i386/tcg/seg_helper.c
> +++ b/target/i386/tcg/seg_helper.c
> @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
>   
>   #ifdef TARGET_X86_64
>   
> -#define PUSHQ_RA(sp, val, ra)                   \
> -    {                                           \
> -        sp -= 8;                                \
> -        cpu_stq_kernel_ra(env, sp, (val), ra);  \
> -    }
> -
> -#define POPQ_RA(sp, val, ra)                    \
> -    {                                           \
> -        val = cpu_ldq_kernel_ra(env, sp, ra);   \
> -        sp += 8;                                \
> -    }
> +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
> +  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
> +
> +static inline void FUNC_PUSHQ_RA(
> +    CPUX86State *env, target_ulong *sp,
> +    target_ulong val, target_ulong ra, int cpl, int dpl) {
> +  *sp -= 8;
> +  if (dpl == 0) {
> +    cpu_stq_kernel_ra(env, *sp, val, ra);
> +  } else {
> +    cpu_stq_data_ra(env, *sp, val, ra);
> +  }
> +}

This doesn't seem quite right.

I would be much happier if we were to resolve the proper mmu index earlier, once, rather
than within each call to cpu_{ld,st}*_{kernel,data}_ra.  With the mmu index in hand, use
cpu_{ld,st}*_mmuidx_ra instead.

I believe you will want to factor out a subroutine of x86_cpu_mmu_index that takes the pl as a
parameter, rather than reading cpl from env->hflags.  This will also allow
cpu_mmu_index_kernel, which is written to assume pl=0, to be eliminated or simplified.


r~