From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: "Alex Bennée" <alex.bennee@linaro.org>,
"Julian Ganz" <neither@nut.email>
Cc: qemu-devel@nongnu.org, Alexandre Iooss <erdnaxe@crans.org>,
Mahmoud Mandour <ma.mandourr@gmail.com>,
Richard Henderson <richard.henderson@linaro.org>
Subject: Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of discon event's to_pc
Date: Thu, 9 Jan 2025 14:27:20 -0800
Message-ID: <86f60229-e76b-40d1-b8e2-2ad6c29c1194@linaro.org>
In-Reply-To: <87a5c06j7b.fsf@draig.linaro.org>
On 1/9/25 08:33, Alex Bennée wrote:
> "Julian Ganz" <neither@nut.email> writes:
>
> (Add Richard to CC)
>
>> Hi Pierrick,
>>
>> December 5, 2024 at 12:33 AM, "Pierrick Bouvier" wrote:
>>> On 12/2/24 11:41, Julian Ganz wrote:
>>>> +static void insn_exec(unsigned int vcpu_index, void *userdata)
>>>> +{
>>>> + struct cpu_state *state = qemu_plugin_scoreboard_find(states, vcpu_index);
>>>> + uint64_t pc = (uint64_t) userdata;
>>>> + GString* report;
>>>> +
>>>> + if (state->has_next) {
>>>> + if (state->next_pc != pc) {
>>>> + report = g_string_new("Trap target PC mismatch\n");
>>>> + g_string_append_printf(report,
>>>> + "Expected: %"PRIx64"\nEncountered: %"
>>>> + PRIx64"\n",
>>>> + state->next_pc, pc);
>>>> + qemu_plugin_outs(report->str);
>>>> + if (abort_on_mismatch) {
>>>> + g_abort();
>>>> + }
>>>> + g_string_free(report, true);
>>>> + }
>>>> + state->has_next = false;
>>>> + }
>>>> +}
>>>>
>>> When booting an arm64 vm, I get this message:
>>> Trap target PC mismatch
>>> Expected: 23faf3a80
>>> Encountered: 23faf3a84
>>
>> A colleague of mine went to great lengths trying to track and reliably
>> reproduce this. We think that it's something amiss with the existing
>> instruction exec callback infrastructure. So... it's not something I'll
>> be addressing with the next iteration as it's out of scope. We'll
>> probably continue looking into it, though.
>>
>> The mismatch is reported for perfectly normal and boring exceptions and
>> interrupts, with no indication of any difference from other (unreported)
>> events that fire on a regular basis. Apparently, once in a blue moon
>> (relatively speaking), it is reported for the first instruction of a
>> handler, even though that instruction is definitely executed and QEMU
>> does print a trace line for it:
>>
>> | Trace 0: 0x7fffa0b03900 [00104004/000000023fde73b4/00000021/ff020200]
>> | Trace 0: 0x7fffa02d9580 [00104004/000000023fde72b8/00000021/ff020200]
>> | Trace 0: 0x7fffa02dfc40 [00104004/000000023fde7338/00000021/ff020200]
>> | Trace 0: 0x7fffa0b03d00 [00104004/000000023fde73d4/00000021/ff020200]
>> | Trace 0: 0x7fffa0b03e80 [00104004/000000023fde73d8/00000021/ff020200]
>> | Trace 0: 0x7fffa0b04140 [00104004/000000023fde7408/00000021/ff020200]
>> | Trace 0: 0x7fffa02dd6c0 [00104004/000000023fde70b8/00000021/ff020200]
>> | Trace 0: 0x7fffa02dd800 [00104004/000000023fde7b90/00000021/ff020200]
>> | cpu_io_recompile: rewound execution of TB to 000000023fde7b90
>
> So this happens when an instruction that is not the last instruction of
> the block does some IO. As IO accesses can potentially change system
> state, we can't allow more instructions in the block to run, as they
> might not have that change of state captured.
>
> cpu_io_recompile exits the loop and forces the next TranslationBlock to
> contain only one (or maybe two) instructions. We have to play games with
> instrumentation to avoid double counting execution:
>
> /*
> * Exit the loop and potentially generate a new TB executing the
> * just the I/O insns. We also limit instrumentation to memory
> * operations only (which execute after completion) so we don't
> * double instrument the instruction.
> */
> cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | n;
>
> The instruction is in a weird state: from the plugin's point of view it
> has executed, but it has not changed any state (it was stopped from doing
> MMIO until the next instruction).
>
>> | Taking exception 5 [IRQ] on CPU 0
>> | ...from EL1 to EL1
>> | ...with ESR 0x0/0x3800000
>> | ...with SPSR 0x20000305
>> | ...with ELR 0x23fde7b90
>> | ...to EL1 PC 0x23fd77a80 PSTATE 0x23c5
>
> I guess an asynchronous interrupt came in before we re-executed the new
> block?
>
> Does changing the above to:
>
> cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | CF_NOIRQ | n;
>
> make the problem go away? It should ensure the next one- or
> two-instruction block executes without checking for async events. See
> gen_tb_start() for the gory details.
>
Thanks, it solves the problem indeed.
I was not sure why this specific block was re-executed in the case of an IRQ.
>> | Trace 0: 0x7fffa13a8340 [00104004/000000023fd77a80/00000021/ff021201]
>> | Trace 0: 0x7fffa13a8480 [00104004/000000023fd77a84/00000021/ff020200]
>> | Trap target PC mismatch CPU 0
>> | Expected: 23fd77a80
>> | Encountered: 23fd77a84
>> | warning: 44 ./nptl/pthread_kill.c: No such file or directory
>> | Couldn't get registers: No such process.
>>
>> It does show up with both single-core and multi-core VMs, so that at
>> least eliminates some possibilities. Maybe :/
>>
>> The issue is nasty to reproduce in a way that allows any meaningful
>> investigation. It usually involves sifting through many GBs of QEMU logs
>> for maybe one occurrence. We could add another testing/dummy plugin that
>> just prints the PC for _any_ instruction executed and have a script
>> check for non-alternating trace lines from QEMU and that plugin. But
>> then we're talking nearly double the number of lines to look through,
>> with probably little additional information.
>>
>> Regards,
>> Julian
>