qemu-devel.nongnu.org archive mirror
From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: Julian Ganz <neither@nut.email>, qemu-devel@nongnu.org
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	"Alexandre Iooss" <erdnaxe@crans.org>,
	"Mahmoud Mandour" <ma.mandourr@gmail.com>
Subject: Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of discon event's to_pc
Date: Fri, 20 Dec 2024 13:46:52 -0800
Message-ID: <86f28c7e-eaac-4350-a3cd-000108d8943e@linaro.org>
In-Reply-To: <11ae3330-71bb-4da9-9dcb-b7378f1682bc@linaro.org>

On 12/20/24 13:17, Pierrick Bouvier wrote:
> Hi Julian,
> 
> On 12/20/24 03:47, Julian Ganz wrote:
>> Hi Pierrick,
>>
>> December 5, 2024 at 12:33 AM, "Pierrick Bouvier" wrote:
>>> On 12/2/24 11:41, Julian Ganz wrote:
>>>>    +static void insn_exec(unsigned int vcpu_index, void *userdata)
>>>>    +{
>>>>    + struct cpu_state *state = qemu_plugin_scoreboard_find(states, vcpu_index);
>>>>    + uint64_t pc = (uint64_t) userdata;
>>>>    + GString* report;
>>>>    +
>>>>    + if (state->has_next) {
>>>>    + if (state->next_pc != pc) {
>>>>    + report = g_string_new("Trap target PC mismatch\n");
>>>>    + g_string_append_printf(report,
>>>>    + "Expected: %"PRIx64"\nEncountered: %"
>>>>    + PRIx64"\n",
>>>>    + state->next_pc, pc);
>>>>    + qemu_plugin_outs(report->str);
>>>>    + if (abort_on_mismatch) {
>>>>    + g_abort();
>>>>    + }
>>>>    + g_string_free(report, true);
>>>>    + }
>>>>    + state->has_next = false;
>>>>    + }
>>>>    +}
>>>>
>>> When booting an arm64 vm, I get this message:
>>> Trap target PC mismatch
>>> Expected: 23faf3a80
>>> Encountered: 23faf3a84
>>
>> A colleague of mine went to great lengths trying to track and reliably
>> reproduce this. We think that it's something amiss with the existing
>> instruction exec callback infrastructure. So... it's not something I'll
>> be addressing with the next iteration as it's out of scope. We'll
>> probably continue looking into it, though.
>>
>> The mismatch is reported for perfectly normal and boring exceptions and
>> interrupts, with no indication of any difference from other (not
>> reported) events that fire on a regular basis. Apparently, once in a
>> blue moon (relatively speaking), the callback is not invoked for the
>> first instruction of a handler (even though it is definitely executed
>> and qemu does print a trace-line for that instruction):
>>
>> | Trace 0: 0x7fffa0b03900 [00104004/000000023fde73b4/00000021/ff020200]
>> | Trace 0: 0x7fffa02d9580 [00104004/000000023fde72b8/00000021/ff020200]
>> | Trace 0: 0x7fffa02dfc40 [00104004/000000023fde7338/00000021/ff020200]
>> | Trace 0: 0x7fffa0b03d00 [00104004/000000023fde73d4/00000021/ff020200]
>> | Trace 0: 0x7fffa0b03e80 [00104004/000000023fde73d8/00000021/ff020200]
>> | Trace 0: 0x7fffa0b04140 [00104004/000000023fde7408/00000021/ff020200]
>> | Trace 0: 0x7fffa02dd6c0 [00104004/000000023fde70b8/00000021/ff020200]
>> | Trace 0: 0x7fffa02dd800 [00104004/000000023fde7b90/00000021/ff020200]
>> | cpu_io_recompile: rewound execution of TB to 000000023fde7b90
>> | Taking exception 5 [IRQ] on CPU 0
>> | ...from EL1 to EL1
>> | ...with ESR 0x0/0x3800000
>> | ...with SPSR 0x20000305
>> | ...with ELR 0x23fde7b90
>> | ...to EL1 PC 0x23fd77a80 PSTATE 0x23c5
>> | Trace 0: 0x7fffa13a8340 [00104004/000000023fd77a80/00000021/ff021201]
>> | Trace 0: 0x7fffa13a8480 [00104004/000000023fd77a84/00000021/ff020200]
>> | Trap target PC mismatch CPU 0
>> | Expected:    23fd77a80
>> | Encountered: 23fd77a84
>> | warning: 44	./nptl/pthread_kill.c: No such file or directory
>> | Couldn't get registers: No such process.
>>
>> It does show up with both single-core and multi-core VMs, so that at
>> least eliminates some possibilities. Maybe :/
>>
>> The issue is nasty to reproduce in a way that allows any meaningful
>> investigation. It usually involves sifting through many GBs of QEMU logs
>> for maybe one occurrence. We could add another testing/dummy plugin that
>> just prints the PC for _any_ instruction executed and have a script
>> check for non-alternating trace lines from QEMU and that plugin. But
>> then we're talking nearly double the number of lines to look through
>> with probably little additional information.
>>
> 
> Thanks for the investigation.
> I could reproduce this with this command line:
> ./build/qemu-system-aarch64 -M virt -plugin
> ./build/tests/tcg/plugins/libdiscons.so,abort=on -m 8G -device
> virtio-blk-pci,drive=root -drive
> if=none,id=root,file=/home/user/.work/images/debianaarch64.img -M virt
> -cpu max,pauth=off  -drive
> if=pflash,readonly=on,file=/usr/share/AAVMF/AAVMF_CODE.fd -drive
> if=pflash,file=/home/user/.work/images/AAVMF_VARS.fd -d plugin,in_asm,op
> -D crash.log
> 
> # -d plugin,in_asm,op dumps the asm of every translated block, the
> # plugin output (for the discon plugin), and the generated TCG ops.
> 
> It reliably crashes with a single address.
> Looking at the debug output (crash.log):
> ----------------
> IN:
> 0x23faf3a80:  d108c3ff  sub      sp, sp, #0x230
> # => This bb has a single instruction as input
> 
> OP:
> # this is the TB instrumentation
>    ld_i32 loc0,env,$0xfffffffffffffff8
>    brcond_i32 loc0,$0x0,lt,$L0
>    st8_i32 $0x1,env,$0xfffffffffffffffc
> 
>    ---- 0000000000000a80 0000000000000000 0000000000000000
> # => we can see that there is no call_plugin, looks like instrumentation
> # is not applied
>    sub_i64 sp,sp,$0x230
>    add_i64 pc,pc,$0x4
>    goto_tb $0x1
>    exit_tb $0x7f7eedd355c1
>    set_label $L0
>    exit_tb $0x7f7eedd355c3
> 
> ----------------
> IN:
> 0x23faf3a84:  a9b007e0  stp      x0, x1, [sp, #-0x100]!
> 0x23faf3a88:  a9010fe2  stp      x2, x3, [sp, #0x10]
> ...
> 
> OP:
>    ld_i32 loc0,env,$0xfffffffffffffff8
>    brcond_i32 loc0,$0x0,lt,$L0
>    st8_i32 $0x0,env,$0xfffffffffffffffc
> 
>    ---- 0000000000000a84 0000000000000000 0000000000000000
> # => instrumentation is correctly applied
> call plugin(0x7f7eec96d530),$0x1,$0,$0x0,$0x23faf3a84
>    mov_i64 loc2,sp
>    ...
> 
> Trap target PC mismatch
> Expected:    23faf3a80
> Encountered: 23faf3a84
> 
> The interesting thing here is that 23faf3a80 is a translation block
> with a single instruction, and that instrumentation is not applied to
> this instruction (call_plugin is not present).
> 
> Overall, it really looks like a bug on the QEMU side, where we fail to
> instrument something. I'll take a look. You can ignore this for now.
> 
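
For anyone who wants to scan a large crash.log for other blocks like the
one above, here is a rough sketch of the check done by eye. It assumes
the OP dump format shown in the excerpt (an instruction marker line
starting with "----", and a line containing "call plugin(" when the
per-instruction callback is present). The helper names are hypothetical
and this is not part of QEMU.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: bounded substring search within one line. */
static bool span_contains(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(s + i, needle, n) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical helper: given the OP section of ONE translated block,
 * return true if some "----" instruction marker is not followed by a
 * "call plugin(" line before the next marker or the end of the dump.
 * Assumes the plugin registers an exec callback for every instruction.
 */
static bool op_dump_has_uninstrumented_insn(const char *dump)
{
    bool pending = false; /* marker seen, no plugin call seen yet */
    const char *line = dump;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *p = line;

        /* Skip the leading indentation of the dump line. */
        while (p < line + len && (*p == ' ' || *p == '\t')) {
            p++;
        }
        size_t rest = len - (size_t)(p - line);

        if (rest >= 5 && strncmp(p, "---- ", 5) == 0) {
            if (pending) {
                return true;
            }
            pending = true;
        } else if (span_contains(p, rest, "call plugin(")) {
            pending = false;
        }

        line += len;
        if (*line == '\n') {
            line++;
        }
    }
    return pending;
}
```

Fed the OP section of the 0x23faf3a80 block it should report true, and
false for the 0x23faf3a84 block. Splitting crash.log into per-block
sections is left out here.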

It seems we have a problem with how a TB is identified as mem_only. This 
was introduced to prevent double instrumentation of some memory accesses 
(those touching MMIO), but as a result we sometimes skip instrumenting 
some instructions.
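
To make the suspected failure mode concrete, here is a toy model (not
QEMU code, every toy_* name is made up for illustration). It assumes
that a TB flagged as mem-only gets no instruction exec callbacks at all,
and reuses the has_next/next_pc check from insn_exec in the test plugin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a translation block; mem_only models CF_MEMI_ONLY. */
struct toy_tb {
    uint64_t pc;     /* address of the first instruction */
    size_t n_insns;  /* number of 4-byte instructions in the block */
    bool mem_only;   /* assumption: no insn exec callbacks when set */
};

/* Toy stand-in for the per-vCPU state kept by the test plugin. */
struct toy_checker {
    bool has_next;
    uint64_t next_pc;
    unsigned mismatches;
};

/* A discontinuity announces the PC expected to execute next. */
static void toy_discon(struct toy_checker *c, uint64_t to_pc)
{
    c->next_pc = to_pc;
    c->has_next = true;
}

/* Same check as insn_exec in the plugin, counting instead of aborting. */
static void toy_insn_exec(struct toy_checker *c, uint64_t pc)
{
    if (c->has_next) {
        if (c->next_pc != pc) {
            c->mismatches++;
        }
        c->has_next = false;
    }
}

/* "Execute" a TB: callbacks fire per instruction unless it is mem-only. */
static void toy_run_tb(struct toy_checker *c, const struct toy_tb *tb)
{
    if (tb->mem_only) {
        return;
    }
    for (size_t i = 0; i < tb->n_insns; i++) {
        toy_insn_exec(c, tb->pc + 4 * i);
    }
}

/* Replay the reported case: IRQ to 0x23faf3a80, then the following TB. */
static unsigned toy_scenario(bool first_tb_mem_only)
{
    struct toy_checker c = {0};
    struct toy_tb handler = { 0x23faf3a80, 1, first_tb_mem_only };
    struct toy_tb next = { 0x23faf3a84, 2, false };

    toy_discon(&c, handler.pc);
    toy_run_tb(&c, &handler);
    toy_run_tb(&c, &next);
    return c.mismatches;
}
```

With the first TB treated as mem-only, the first callback the checker
sees is for 23faf3a84 and it counts one mismatch. With normal
instrumentation it counts none.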

I need to dig into this further, but for now you should be able to 
work around this on your side with this patch:

diff --git a/plugins/api.c b/plugins/api.c
index 24ea64e2de5..6cb9d81a0a2 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -92,6 +92,7 @@ void qemu_plugin_register_vcpu_exit_cb(qemu_plugin_id_t id,

 static bool tb_is_mem_only(void)
 {
+    return false;
     return tb_cflags(tcg_ctx->gen_tb) & CF_MEMI_ONLY;
 }

>> Regards,
>> Julian
> 

Regards,
Pierrick

