Re: arm: singlestep bug

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Andrew Jones <drjones@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	richard.henderson@linaro.org, peter.maydell@linaro.org,
	ricarkol@google.com
Subject: Re: arm: singlestep bug
Date: Wed, 02 Feb 2022 11:16:46 +0000	[thread overview]
Message-ID: <87a6f9wjd5.fsf@linaro.org> (raw)
In-Reply-To: <20220202102832.lwiyc5huddau4i6a@gator>


Andrew Jones <drjones@redhat.com> writes:

> Hello TCG developers,
>
> We have new debug test cases in kvm-unit-tests thanks to Ricardo
> Koller.

Yay tests ;-)

> The singlestep test cases are failing with TCG. Enabling TCG debug outputs
> the error
>
>   TCG hflags mismatch (current:(0x000004a1,0x0000000000004000) rebuilt:(0x000004a3,0x0000000000004000)

This shows that:

  FIELD(TBFLAG_ANY, SS_ACTIVE, 1, 1)

should be set but wasn't cached.

> I noticed that the test passed on an older QEMU, so I bisected it and
> found commit e979972a6a17 ("target/arm: Rely on hflags correct in
> cpu_get_tb_cpu_state"), which unfortunately doesn't tell us anything
> that the above error message didn't say already (apparently we can't
> currently depend on hflags being correct wrt singlestep at this
> point).

Fortunately this is intended - the enable-debug always recalculates the
hflags (and expensive operation) and makes it pretty easy to spot where
we failed to call arm_rebuild_hflags(). You can do this with the normal
debug tools or my new favourite tool (for short programs) using the
execlog plugin.

  0, 0x40080a24, 0x52840020, "movz w0, #0x2001"
  0, 0x40080a28, 0x2a010000, "orr w0, w0, w1"
  0, 0x40080a2c, 0xd5100240, "msr mdscr_el1, x0"
  0, 0x40080a30, 0xd5033fdf, "isb "
  0, 0x40080a34, 0x350001f4, "cbnz w20, #0x40080a70"
  TCG hflags mismatch (current:(0x000004a1,0x0000000000004000) rebuilt:(0x000004a3,0x0000000000004000)

This is a touch weird though because any msr write does trigger a
rebuild of the flags. See handle_sys():

    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
        /*
         * A write to any coprocessor regiser that ends a TB
         * must rebuild the hflags for the next TB.
         */
        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
        tcg_temp_free_i32(tcg_el);
        /*
         * We default to ending the TB on a coprocessor register write,
         * but allow this to be suppressed by the register definition
         * (usually only necessary to work around guest bugs).
         */
        s->base.is_jmp = DISAS_UPDATE_EXIT;
    }

And indeed in rr I can see it working though the tail end of
helper_rebuild_flags_a64() but it seems arm_singlestep_active() returns
false at this point. This ultimately fails at
aa64_generate_debug_exceptions():

    /*
     * Same EL to same EL debug exceptions need MDSCR_KDE enabled
     * while not masking the (D)ebug bit in DAIF.
     */
    debug_el = arm_debug_target_el(env);

    if (cur_el == debug_el) {
        return extract32(env->cp15.mdscr_el1, 13, 1)
            && !(env->daif & PSTATE_D);
    }

And if I look at the objdump it is indeed the instruction we never
completed:

     a34:       350001f4        cbnz    w20, a70 <ss_start+0x34>
     a38:       d50348ff        msr     daifclr, #0x8

So if I force the flag generation on manipulating daif:

--8<---------------cut here---------------start------------->8---
modified   target/arm/helper-a64.c
@@ -83,12 +83,14 @@ void HELPER(msr_i_daifset)(CPUARMState *env, uint32_t imm)
 {
     daif_check(env, 0x1e, imm, GETPC());
     env->daif |= (imm << 6) & PSTATE_DAIF;
+    arm_rebuild_hflags(env);
 }
 
 void HELPER(msr_i_daifclear)(CPUARMState *env, uint32_t imm)
 {
     daif_check(env, 0x1f, imm, GETPC());
     env->daif &= ~((imm << 6) & PSTATE_DAIF);
+    arm_rebuild_hflags(env);
 }
 
--8<---------------cut here---------------end--------------->8---

  I now get a working test:

  env QEMU=$HOME/lsrc/qemu.git/builds/all.debug/qemu-system-aarch64 ./run_tests.sh -g debug
  PASS debug-bp (6 tests)
  PASS debug-bp-migration (7 tests)
  PASS debug-wp (8 tests)
  PASS debug-wp-migration (9 tests)
  PASS debug-sstep (1 tests)
  PASS debug-sstep-migration (1 tests)

(I was momentarily confused when debug-sstep failed, but that was I'd
forgotten to point to my build, the system 5.2 qemu is broken in this
regard).

I'll spin up a proper patch.

Side note:

  ad5fb8830150071487025b3594a7b1bf218d12d8 is the first bad commit
  commit ad5fb8830150071487025b3594a7b1bf218d12d8
  Author: Zixuan Wang <zixuanwang@google.com>
  Date:   Mon Oct 4 13:49:19 2021 -0700

breaks the running on kvm-unit-test for me, I needed to patch:

--8<---------------cut here---------------start------------->8---
modified   run_tests.sh
@@ -31,7 +31,8 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
-RUNTIME_arch_run="./$TEST_SUBDIR/run"
+RUNTIME_arch_run="./$TEST_DIR/run"
+#RUNTIME_arch_run="./$TEST_SUBDIR/run"
 source scripts/runtime.bash
 
 # require enhanced getopt
--8<---------------cut here---------------end--------------->8---


-- 
Alex Bennée

WARNING: multiple messages have this Message-ID (diff)

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Andrew Jones <drjones@redhat.com>
Cc: peter.maydell@linaro.org, ricarkol@google.com,
	qemu-arm@nongnu.org, richard.henderson@linaro.org,
	qemu-devel@nongnu.org
Subject: Re: arm: singlestep bug
Date: Wed, 02 Feb 2022 11:16:46 +0000	[thread overview]
Message-ID: <87a6f9wjd5.fsf@linaro.org> (raw)
In-Reply-To: <20220202102832.lwiyc5huddau4i6a@gator>


Andrew Jones <drjones@redhat.com> writes:

> Hello TCG developers,
>
> We have new debug test cases in kvm-unit-tests thanks to Ricardo
> Koller.

Yay tests ;-)

> The singlestep test cases are failing with TCG. Enabling TCG debug outputs
> the error
>
>   TCG hflags mismatch (current:(0x000004a1,0x0000000000004000) rebuilt:(0x000004a3,0x0000000000004000)

This shows that:

  FIELD(TBFLAG_ANY, SS_ACTIVE, 1, 1)

should be set but wasn't cached.

> I noticed that the test passed on an older QEMU, so I bisected it and
> found commit e979972a6a17 ("target/arm: Rely on hflags correct in
> cpu_get_tb_cpu_state"), which unfortunately doesn't tell us anything
> that the above error message didn't say already (apparently we can't
> currently depend on hflags being correct wrt singlestep at this
> point).

Fortunately this is intended - the enable-debug always recalculates the
hflags (and expensive operation) and makes it pretty easy to spot where
we failed to call arm_rebuild_hflags(). You can do this with the normal
debug tools or my new favourite tool (for short programs) using the
execlog plugin.

  0, 0x40080a24, 0x52840020, "movz w0, #0x2001"
  0, 0x40080a28, 0x2a010000, "orr w0, w0, w1"
  0, 0x40080a2c, 0xd5100240, "msr mdscr_el1, x0"
  0, 0x40080a30, 0xd5033fdf, "isb "
  0, 0x40080a34, 0x350001f4, "cbnz w20, #0x40080a70"
  TCG hflags mismatch (current:(0x000004a1,0x0000000000004000) rebuilt:(0x000004a3,0x0000000000004000)

This is a touch weird though because any msr write does trigger a
rebuild of the flags. See handle_sys():

    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
        /*
         * A write to any coprocessor regiser that ends a TB
         * must rebuild the hflags for the next TB.
         */
        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
        tcg_temp_free_i32(tcg_el);
        /*
         * We default to ending the TB on a coprocessor register write,
         * but allow this to be suppressed by the register definition
         * (usually only necessary to work around guest bugs).
         */
        s->base.is_jmp = DISAS_UPDATE_EXIT;
    }

And indeed in rr I can see it working though the tail end of
helper_rebuild_flags_a64() but it seems arm_singlestep_active() returns
false at this point. This ultimately fails at
aa64_generate_debug_exceptions():

    /*
     * Same EL to same EL debug exceptions need MDSCR_KDE enabled
     * while not masking the (D)ebug bit in DAIF.
     */
    debug_el = arm_debug_target_el(env);

    if (cur_el == debug_el) {
        return extract32(env->cp15.mdscr_el1, 13, 1)
            && !(env->daif & PSTATE_D);
    }

And if I look at the objdump it is indeed the instruction we never
completed:

     a34:       350001f4        cbnz    w20, a70 <ss_start+0x34>
     a38:       d50348ff        msr     daifclr, #0x8

So if I force the flag generation on manipulating daif:

--8<---------------cut here---------------start------------->8---
modified   target/arm/helper-a64.c
@@ -83,12 +83,14 @@ void HELPER(msr_i_daifset)(CPUARMState *env, uint32_t imm)
 {
     daif_check(env, 0x1e, imm, GETPC());
     env->daif |= (imm << 6) & PSTATE_DAIF;
+    arm_rebuild_hflags(env);
 }
 
 void HELPER(msr_i_daifclear)(CPUARMState *env, uint32_t imm)
 {
     daif_check(env, 0x1f, imm, GETPC());
     env->daif &= ~((imm << 6) & PSTATE_DAIF);
+    arm_rebuild_hflags(env);
 }
 
--8<---------------cut here---------------end--------------->8---

  I now get a working test:

  env QEMU=$HOME/lsrc/qemu.git/builds/all.debug/qemu-system-aarch64 ./run_tests.sh -g debug
  PASS debug-bp (6 tests)
  PASS debug-bp-migration (7 tests)
  PASS debug-wp (8 tests)
  PASS debug-wp-migration (9 tests)
  PASS debug-sstep (1 tests)
  PASS debug-sstep-migration (1 tests)

(I was momentarily confused when debug-sstep failed, but that was I'd
forgotten to point to my build, the system 5.2 qemu is broken in this
regard).

I'll spin up a proper patch.

Side note:

  ad5fb8830150071487025b3594a7b1bf218d12d8 is the first bad commit
  commit ad5fb8830150071487025b3594a7b1bf218d12d8
  Author: Zixuan Wang <zixuanwang@google.com>
  Date:   Mon Oct 4 13:49:19 2021 -0700

breaks the running on kvm-unit-test for me, I needed to patch:

--8<---------------cut here---------------start------------->8---
modified   run_tests.sh
@@ -31,7 +31,8 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
-RUNTIME_arch_run="./$TEST_SUBDIR/run"
+RUNTIME_arch_run="./$TEST_DIR/run"
+#RUNTIME_arch_run="./$TEST_SUBDIR/run"
 source scripts/runtime.bash
 
 # require enhanced getopt
--8<---------------cut here---------------end--------------->8---


-- 
Alex Bennée

next prev parent reply	other threads:[~2022-02-02 12:20 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02 10:28 arm: singlestep bug Andrew Jones
2022-02-02 10:28 ` Andrew Jones
2022-02-02 11:16 ` Alex Bennée [this message]
2022-02-02 11:16   ` Alex Bennée
2022-02-02 12:30   ` Andrew Jones
2022-02-02 12:30     ` Andrew Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a6f9wjd5.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=drjones@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ricarkol@google.com \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.