I do not think I will have the time or focus to work on improving this patch this summer, as I retire in two weeks and need to make a clean break to focus on other things (health, for one) for a while.

If anyone wants to put into place Richard's ideas, I will not be offended!

I do not see any of the discussion from this email thread reflected on the bug report: https://gitlab.com/qemu-project/qemu/-/issues/249

Robert Henry

On Sat, Jun 15, 2024 at 4:25 PM Richard Henderson <richard.henderson@linaro.org> wrote:
On 6/11/24 09:20, Robert R. Henry wrote:
> This fixes a bug wherein i386/tcg assumed an interrupt return using
> the IRET instruction was always returning from kernel mode to either
> kernel mode or user mode. This assumption is violated when IRET is used
> as a clever way to restore thread state, as for example in the dotnet
> runtime. There, IRET returns from user mode to user mode.
>
> This bug manifested itself as a page fault in the guest Linux kernel.
>
> This bug appears to have been in QEMU since the beginning.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
> Signed-off-by: Robert R. Henry <robhenry@microsoft.com>
> ---
>   target/i386/tcg/seg_helper.c | 78 ++++++++++++++++++++++--------------
>   1 file changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
> index 715db1f232..815d26e61d 100644
> --- a/target/i386/tcg/seg_helper.c
> +++ b/target/i386/tcg/seg_helper.c
> @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
>   
>   #ifdef TARGET_X86_64
>   
> -#define PUSHQ_RA(sp, val, ra)                   \
> -    {                                           \
> -        sp -= 8;                                \
> -        cpu_stq_kernel_ra(env, sp, (val), ra);  \
> -    }
> -
> -#define POPQ_RA(sp, val, ra)                    \
> -    {                                           \
> -        val = cpu_ldq_kernel_ra(env, sp, ra);   \
> -        sp += 8;                                \
> -    }
> +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
> +  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
> +
> +static inline void FUNC_PUSHQ_RA(
> +    CPUX86State *env, target_ulong *sp,
> +    target_ulong val, target_ulong ra, int cpl, int dpl) {
> +  *sp -= 8;
> +  if (dpl == 0) {
> +    cpu_stq_kernel_ra(env, *sp, val, ra);
> +  } else {
> +    cpu_stq_data_ra(env, *sp, val, ra);
> +  }
> +}

This doesn't seem quite right.

I would be much happier if we were to resolve the proper mmu index earlier, once, rather
than within each call to cpu_{ld,st}*_{kernel,data}_ra.  With the mmu index in hand, use
cpu_{ld,st}*_mmuidx_ra instead.

I believe you will want to factor out a subroutine of x86_cpu_mmu_index that takes the pl as a
parameter, rather than reading cpl from env->hflags.  This will also allow
cpu_mmu_index_kernel, which is written to assume pl=0, to be eliminated or simplified.


r~