* Plugin Memory Callback Debugging
@ 2022-11-15 22:05 Aaron Lindsay
2022-11-15 22:36 ` Alex Bennée
2022-11-16 6:19 ` Emilio Cota
0 siblings, 2 replies; 17+ messages in thread
From: Aaron Lindsay @ 2022-11-15 22:05 UTC (permalink / raw)
To: qemu-devel; +Cc: Emilio G. Cota, Alex Bennée, Richard Henderson
Hello,
I have been wrestling with what might be a bug in the plugin memory
callbacks. The immediate error is that I hit the
`g_assert_not_reached()` in the 'default:' case in
qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
are obviously bogus (`len` was absurdly high, for example). After doing
some further digging/instrumenting, I eventually found that
`free_dyn_cb_arr(void *p, ...)` is being called shortly before the
assertion is hit with `p` pointing to the same address as
`cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
the memory still pointed to by `cpu->plugin_mem_cbs`.
I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
end of an instruction/TB's execution, so it's not exactly clear to me how this
is occurring. However, I suspect it may be relevant that we are calling
`free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
I have additionally found that the below addition allows me to run successfully
without hitting the assert:
diff --git a/plugins/core.c b/plugins/core.c
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
 
 void qemu_plugin_flush_cb(void)
 {
+    CPUState *cpu;
     qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
     qht_reset(&plugin.dyn_cb_arr_ht);
 
+    CPU_FOREACH(cpu) {
+        cpu->plugin_mem_cbs = NULL;
+    }
+
     plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
 }
Unfortunately, the workload/setup with which I have encountered this bug is
difficult to reproduce in a way suitable for sharing upstream (admittedly
potentially because I do not fully understand the conditions necessary to
trigger it). The failure also occurs deep into a run, and I haven't found
a good way to break in gdb immediately before it happens in order to
inspect it without perturbing things enough that it no longer occurs...
I welcome any feedback or insights on how to further nail down the
failure case and/or help in working towards an appropriate solution.
Thanks!
-Aaron
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Plugin Memory Callback Debugging
From: Alex Bennée @ 2022-11-15 22:36 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> Hello,
>
> I have been wrestling with what might be a bug in the plugin memory
> callbacks. The immediate error is that I hit the
> `g_assert_not_reached()` in the 'default:' case in
> qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> are obviously bogus (`len` was absurdly high, for example). After doing
> some further digging/instrumenting, I eventually found that
> `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> assertion is hit with `p` pointing to the same address as
> `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> the memory still pointed to by `cpu->plugin_mem_cbs`.
>
> I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> end of an instruction/TB's execution, so its not exactly clear to me how this
> is occurring. However, I suspect it may be relevant that we are calling
> `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
Hmm I'm going to have to remind myself about how this bit works.
>
> I have additionally found that the below addition allows me to run successfully
> without hitting the assert:
>
> diff --git a/plugins/core.c b/plugins/core.c
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
>
> void qemu_plugin_flush_cb(void)
> {
> + CPUState *cpu;
> qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
> qht_reset(&plugin.dyn_cb_arr_ht);
>
> + CPU_FOREACH(cpu) {
> + cpu->plugin_mem_cbs = NULL;
> + }
> +
This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
I think we should be able to treat the CPUs separately.
> plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
> }
>
> Unfortunately, the workload/setup I have encountered this bug with are
> difficult to reproduce in a way suitable for sharing upstream (admittedly
> potentially because I do not fully understand the conditions necessary to
> trigger it). It is also deep into a run
How many full TB flushes have there been? You only see
qemu_plugin_flush_cb when we flush the whole translation buffer (which is
something we do more often when plugins exit).
Does lowering tb-size make it easier to hit the failure mode?
> , and I haven't found a good way
> to break in gdb immediately prior to it happening in order to inspect
> it, without perturbing it enough such that it doesn't happen...
This is exactly the sort of thing rr is great for. Can you trigger it in
that?
https://rr-project.org/
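[Editor's sketch: a possible rr workflow for this kind of stale-pointer hunt. Plugin and workload paths are placeholders, and the exact gdb syntax may vary by version.]

```
$ rr record ./qemu-aarch64 -plugin ./libmyplugin.so ./workload
$ rr replay                       # deterministic replay under gdb
(gdb) continue                    # run forward to the assertion
(gdb) watch -l cpu->plugin_mem_cbs
(gdb) reverse-continue            # stops where the stale value was last written
```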
>
> I welcome any feedback or insights on how to further nail down the
> failure case and/or help in working towards an appropriate solution.
>
> Thanks!
>
> -Aaron
--
Alex Bennée
* Re: Plugin Memory Callback Debugging
From: Emilio Cota @ 2022-11-16 6:19 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: Alex Bennée, qemu-devel, Richard Henderson
On Tue, Nov 15, 2022 at 22:36:07 +0000, Alex Bennée wrote:
> This is exactly the sort of thing rr is great for. Can you trigger it in
> that?
>
> https://rr-project.org/
The sanitizers should also help.
For TLB flush tracing, defining DEBUG_TLB at the top of cputlb.c
might be useful.
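[Editor's sketch of both suggestions, assuming a typical QEMU build tree; the configure flag spelling should be verified against your version.]

```
$ ./configure --enable-sanitizers --target-list=aarch64-linux-user
$ make -j"$(nproc)"

# TLB flush tracing is a compile-time switch: add near the top of
# accel/tcg/cputlb.c before rebuilding:
#define DEBUG_TLB
```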
Emilio
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay via @ 2022-11-18 21:58 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
On Nov 15 22:36, Alex Bennée wrote:
>
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>
> > Hello,
> >
> > I have been wrestling with what might be a bug in the plugin memory
> > callbacks. The immediate error is that I hit the
> > `g_assert_not_reached()` in the 'default:' case in
> > qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> > breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> > are obviously bogus (`len` was absurdly high, for example). After doing
> > some further digging/instrumenting, I eventually found that
> > `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> > assertion is hit with `p` pointing to the same address as
> > `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> > the memory still pointed to by `cpu->plugin_mem_cbs`.
> >
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > end of an instruction/TB's execution, so its not exactly clear to me how this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>
> Hmm I'm going to have to remind myself about how this bit works.
>
> >
> > I have additionally found that the below addition allows me to run successfully
> > without hitting the assert:
> >
> > diff --git a/plugins/core.c b/plugins/core.c
> > --- a/plugins/core.c
> > +++ b/plugins/core.c
> > @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
> >
> > void qemu_plugin_flush_cb(void)
> > {
> > + CPUState *cpu;
> > qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
> > qht_reset(&plugin.dyn_cb_arr_ht);
> >
> > + CPU_FOREACH(cpu) {
> > + cpu->plugin_mem_cbs = NULL;
> > + }
> > +
>
> This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
> I think we should be able to treat the CPUs separately.
I agree it's similar to qemu_plugin_disable_mem_helpers(), but for all
CPUs. Though a perhaps-important distinction is that it's occurring
unconditionally in conjunction with the event which flushes the TBs and
frees the callback arrays.
Isn't the code calling into qemu_plugin_flush_cb() already clearing TBs
for all CPUs? Can you help me understand why treating the CPUs
separately would be better?
> > plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
> > }
> >
> > Unfortunately, the workload/setup I have encountered this bug with are
> > difficult to reproduce in a way suitable for sharing upstream (admittedly
> > potentially because I do not fully understand the conditions necessary to
> > trigger it). It is also deep into a run
>
> How many full TB flushes have there been? You only see
> qemu_plugin_flush_cb when we flush whole translation buffer (which is
> something we do more often when plugins exit).
There have been maybe hundreds of TB flushes at this point (I, erm, use
qemu_plugin_reset() somewhat liberally in this plugin). I believe it is
the most recent such flush that is problematic - I observe the call to
free_dyn_cb_arr() mentioned above as a result of it.
> Does lowering tb-size make it easier to hit the failure mode?
Hrm, interesting, I have not tried that. I'll poke at that if the rr
debug doesn't pan out.
> > , and I haven't found a good way
> > to break in gdb immediately prior to it happening in order to inspect
> > it, without perturbing it enough such that it doesn't happen...
>
> This is exactly the sort of thing rr is great for. Can you trigger it in
> that?
>
> https://rr-project.org/
I had not used rr before, thanks for the push to do so!
I have, so far, discovered the following timeline:
1. My plugin receives an instruction execution callback for a load
instruction. At this time, cpu->plugin_mem_cbs points to the same
memory which will later be freed
2. During the handling of this callback, my plugin calls qemu_plugin_reset()
3. Ostensibly something goes wrong here with the cleanup of
cpu->plugin_mem_cbs???
4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
to by cpu->plugin_mem_cbs
Since I have this nicely recorded now with rr, it ought to be easier to
poke at, though I admit I'm not entirely sure how to poke at the
generated code to see what's going wrong (i.e. why wouldn't the tb exit
stuff be clearing this pointer like normal?).
> > I welcome any feedback or insights on how to further nail down the
> > failure case and/or help in working towards an appropriate solution.
-Aaron
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay @ 2022-11-18 22:02 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
Sorry, left off the very end of my timeline:
On Nov 18 16:58, Aaron Lindsay wrote:
> I have, so far, discovered the following timeline:
> 1. My plugin receives a instruction execution callback for a load
> instruction. At this time, cpu->plugin_mem_cbs points to the same
> memory which will later be freed
> 2. During the handling of this callback, my plugin calls qemu_plugin_reset()
> 3. Ostensibly something goes wrong here with the cleanup of
> cpu->plugin_mem_cbs???
> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
> to by cpu->plugin_mem_cbs
5. A store exclusive instruction is translated and then executed, which
requires the use of a helper. When executed, this helper checks
cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
and use it, resulting in the assertion.
-Aaron
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay via @ 2022-11-21 20:18 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
On Nov 15 22:36, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > end of an instruction/TB's execution, so its not exactly clear to me how this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>
> Hmm I'm going to have to remind myself about how this bit works.
When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
set for an instruction? Is it guaranteed it is reset by the end of the
tb? If I were to put an assertion in cpu_tb_exec() just after the call
to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL there?
In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
for an instruction, and then not reset it to NULL. I'm wondering if it's
getting optimized away or something, but want to make sure I've got my
assumptions correct about how this is intended to be working.
Thanks!
-Aaron
* Re: Plugin Memory Callback Debugging
From: Alex Bennée @ 2022-11-21 21:51 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> On Nov 15 22:36, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>> > end of an instruction/TB's execution, so its not exactly clear to me how this
>> > is occurring. However, I suspect it may be relevant that we are calling
>> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>>
>> Hmm I'm going to have to remind myself about how this bit works.
>
> When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> set for an instruction? Is it guaranteed it is reset by the end of the
> tb?
It should be by the end of the instruction. See
inject_mem_disable_helper() which inserts TCG code to disable the
helpers. We also have plugin_gen_disable_mem_helpers() which should
catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
why qemu_plugin_disable_mem_helpers() is only really concerned about
when we longjmp out of the loop.
> If I were to put an assertion in cpu_tb_exec() just after the call
> to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> there?
Yes I think so.
> In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
> for an instruction, and then not reset it to NULL. I'm wondering if its
> getting optimized away or something, but want to make sure I've got my
> assumptions correct about how this is intended to be working.
We expect some of this to be eliminated as dead code (but hopefully in pairs).
We don't know ahead of the instruction decode if it will be a memory
instruction, so we opportunistically insert our empty helper calls. If no
memory access is done, those ops should be gone. This is all done to
avoid doing a two-pass translation.
Richard,
Have I got that right? I think that's how I remember the TCG code working.
>
> Thanks!
>
> -Aaron
--
Alex Bennée
* Re: Plugin Memory Callback Debugging
From: Alex Bennée @ 2022-11-21 22:02 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> Sorry, left off the very end of my timeline:
>
> On Nov 18 16:58, Aaron Lindsay wrote:
>> I have, so far, discovered the following timeline:
>> 1. My plugin receives a instruction execution callback for a load
>> instruction. At this time, cpu->plugin_mem_cbs points to the same
>> memory which will later be freed
>> 2. During the handling of this callback, my plugin calls
>qemu_plugin_reset()
The final plugin reset should only execute in the safe async context
(i.e. no other vCPUs running code). That flushes all current generated
code.
>> 3. Ostensibly something goes wrong here with the cleanup of
>> cpu->plugin_mem_cbs???
This may be missed by the reset path (hence your patch), but it should be
reset after every instruction we instrument.
>> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
>> to by cpu->plugin_mem_cbs
>
> 5. A store exclusive instruction is translated and then executed, which
> requires the use of a helper. When executed, this helper checks
> cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
> and use it, resulting in the assertion.
It should be reset for each instruction, I think.
>
> -Aaron
--
Alex Bennée
* Re: Plugin Memory Callback Debugging
From: Richard Henderson @ 2022-11-22 2:22 UTC (permalink / raw)
To: Alex Bennée, Aaron Lindsay; +Cc: qemu-devel, Emilio G. Cota
On 11/21/22 13:51, Alex Bennée wrote:
>
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>
>> On Nov 15 22:36, Alex Bennée wrote:
>>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>>>> I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>>>> end of an instruction/TB's execution, so its not exactly clear to me how this
>>>> is occurring. However, I suspect it may be relevant that we are calling
>>>> `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>>>
>>> Hmm I'm going to have to remind myself about how this bit works.
>>
>> When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
>> set for an instruction? Is it guaranteed it is reset by the end of the
>> tb?
>
> It should be by the end of the instruction. See
> inject_mem_disable_helper() which inserts TCG code to disable the
> helpers. We also have plugin_gen_disable_mem_helpers() which should
> catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> why qemu_plugin_disable_mem_helpers() is only really concerned about
> when we longjmp out of the loop.
>
>> If I were to put an assertion in cpu_tb_exec() just after the call
>> to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
>> there?
>
> Yes I think so.
Indeed.
As an aside, this field should be moved out of CPUState to DisasContextBase, because it's
only used in code generation.
>
>> In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
>> for an instruction, and then not reset it to NULL. I'm wondering if its
>> getting optimized away or something, but want to make sure I've got my
>> assumptions correct about how this is intended to be working.
>
> We are expecting some stuff to dead code away (but hopefully in pairs).
> We don't know ahead of the instruction decode if it will be a memory
> instruction so opportunistically insert our empty helper calls. If no
> memory access is done those ops should be gone. This is all done to
> avoid doing a 2 pass translate.
>
> Richard,
>
> Have I got that right? I think thats how I remember the TCG code working.
That's about right.
r~
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay via @ 2022-11-22 15:57 UTC (permalink / raw)
To: Richard Henderson; +Cc: Alex Bennée, qemu-devel, Emilio G. Cota
On Nov 21 18:22, Richard Henderson wrote:
> On 11/21/22 13:51, Alex Bennée wrote:
> >
> > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> >
> > > On Nov 15 22:36, Alex Bennée wrote:
> > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
> > > > > is occurring. However, I suspect it may be relevant that we are calling
> > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > >
> > > > Hmm I'm going to have to remind myself about how this bit works.
> > >
> > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > tb?
> >
> > It should be by the end of the instruction. See
> > inject_mem_disable_helper() which inserts TCG code to disable the
> > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > when we longjmp out of the loop.
> >
> > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > there?
> >
> > Yes I think so.
>
> Indeed.
Well, the good news is that if this is an assumption we're relying on, it is
now trivial to reproduce the problem!
Compile some simple program (doesn't really matter, the issue gets triggered
early):
$ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
Make this change to cpu_tb_exec():
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 356fe348de..50a010327d 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
>
> qemu_thread_jit_execute();
> ret = tcg_qemu_tb_exec(env, tb_ptr);
> + if (cpu->plugin_mem_cbs != NULL) {
> + g_assert_not_reached();
> + }
> cpu->can_do_io = 1;
> /*
> * TODO: Delay swapping back to the read-write region of the TB
And run:
$ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
You should fairly quickly see something like:
> [snip]
> 0, 0x5502814d04, 0xb4000082, ""
> 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> 0, 0x5502814d0c, 0xf1001c1f, ""
> **
> ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
pointer changing from one non-null value to another (which also seems to
indicate it is not being cleared between instructions).
Does this hint that there are cases where the code to reset cpu->plugin_mem_cbs
to NULL is getting optimized away, but not the code that sets it in the first place?
-Aaron
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay via @ 2022-11-22 17:05 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Emilio G. Cota, Richard Henderson
On Nov 21 22:02, Alex Bennée wrote:
>
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>
> > Sorry, left off the very end of my timeline:
> >
> > On Nov 18 16:58, Aaron Lindsay wrote:
> >> I have, so far, discovered the following timeline:
> >> 1. My plugin receives a instruction execution callback for a load
> >> instruction. At this time, cpu->plugin_mem_cbs points to the same
> >> memory which will later be freed
> >> 2. During the handling of this callback, my plugin calls
> >qemu_plugin_reset()
>
> The final plugin reset should only execute in the safe async context
> (i.e. no other vCPUs running code). That flushes all current generated
> code.
>
> >> 3. Ostensibly something goes wrong here with the cleanup of
> >> cpu->plugin_mem_cbs???
>
> This may be missed by the reset path (hence your patch) but it should be
> being reset every instruction we instrument.
>
> >> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
> >> to by cpu->plugin_mem_cbs
> >
> > 5. A store exclusive instruction is translated and then executed, which
> > requires the use of a helper. When executed, this helper checks
> > cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
> > and use it, resulting in the assertion.
>
> It should be being reset for each instruction I think.
FYI - I suspect my above presentation of the problem suffered from the
"searching where the streetlamp is instead of where you lost something"
problem. In other words, I did/do observe the error at reset, but I now
believe that is merely where it is easiest to observe: cpu->plugin_mem_cbs
doesn't appear to be reset at the end of instructions, and the problem only
manifests at reset because that's when the underlying memory is freed.
-Aaron
* Re: Plugin Memory Callback Debugging
From: Aaron Lindsay via @ 2022-11-29 20:37 UTC (permalink / raw)
To: Richard Henderson; +Cc: Alex Bennée, qemu-devel, Emilio G. Cota
On Nov 22 10:57, Aaron Lindsay wrote:
> On Nov 21 18:22, Richard Henderson wrote:
> > On 11/21/22 13:51, Alex Bennée wrote:
> > >
> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > >
> > > > On Nov 15 22:36, Alex Bennée wrote:
> > > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
> > > > > > is occurring. However, I suspect it may be relevant that we are calling
> > > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > >
> > > > > Hmm I'm going to have to remind myself about how this bit works.
> > > >
> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > > tb?
> > >
> > > It should be by the end of the instruction. See
> > > inject_mem_disable_helper() which inserts TCG code to disable the
> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > > when we longjmp out of the loop.
> > >
> > > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > > there?
> > >
> > > Yes I think so.
> >
> > Indeed.
>
> Well, the good news is that if this is an assumption we're relying on, it is
> now trivial to reproduce the problem!
>
> Compile some simple program (doesn't really matter, the issue gets triggered
> early):
>
> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
>
> Make this change to cpu_tb_exec():
>
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 356fe348de..50a010327d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
> >
> > qemu_thread_jit_execute();
> > ret = tcg_qemu_tb_exec(env, tb_ptr);
> > + if (cpu->plugin_mem_cbs != NULL) {
> > + g_assert_not_reached();
> > + }
> > cpu->can_do_io = 1;
> > /*
> > * TODO: Delay swapping back to the read-write region of the TB
>
> And run:
>
> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
>
> You should fairly quickly see something like:
>
> > [snip]
> > 0, 0x5502814d04, 0xb4000082, ""
> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> > 0, 0x5502814d0c, 0xf1001c1f, ""
> > **
> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
>
> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
> pointer changing from one non-null value to another (which also seems to
> indicate it is not being cleared between instructions).
>
> Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> getting optimized away, but not the code to set it in the first place?
Is there anyone who could help take a look at this from the code gen
perspective?
-Aaron
* Re: Plugin Memory Callback Debugging
From: Alex Bennée @ 2022-12-01 19:32 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: Richard Henderson, qemu-devel, Emilio G. Cota
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> On Nov 22 10:57, Aaron Lindsay wrote:
>> On Nov 21 18:22, Richard Henderson wrote:
>> > On 11/21/22 13:51, Alex Bennée wrote:
>> > >
>> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > >
>> > > > On Nov 15 22:36, Alex Bennée wrote:
>> > > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>> > > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
>> > > > > > is occurring. However, I suspect it may be relevant that we are calling
>> > > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>> > > > >
>> > > > > Hmm I'm going to have to remind myself about how this bit works.
>> > > >
>> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
>> > > > set for an instruction? Is it guaranteed it is reset by the end of the
>> > > > tb?
>> > >
>> > > It should be by the end of the instruction. See
>> > > inject_mem_disable_helper() which inserts TCG code to disable the
>> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
>> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
>> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
>> > > when we longjmp out of the loop.
>> > >
>> > > > If I were to put an assertion in cpu_tb_exec() just after the call
>> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
>> > > > there?
>> > >
>> > > Yes I think so.
>> >
>> > Indeed.
>>
>> Well, the good news is that if this is an assumption we're relying on, it is
>> now trivial to reproduce the problem!
>>
>> Compile some simple program (doesn't really matter, the issue gets triggered
>> early):
>>
>> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
>>
>> Make this change to cpu_tb_exec():
>>
>> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> > index 356fe348de..50a010327d 100644
>> > --- a/accel/tcg/cpu-exec.c
>> > +++ b/accel/tcg/cpu-exec.c
>> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
>> >
>> > qemu_thread_jit_execute();
>> > ret = tcg_qemu_tb_exec(env, tb_ptr);
>> > + if (cpu->plugin_mem_cbs != NULL) {
>> > + g_assert_not_reached();
>> > + }
>> > cpu->can_do_io = 1;
>> > /*
>> > * TODO: Delay swapping back to the read-write region of the TB
>>
>> And run:
>>
>> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
>>
>> You should fairly quickly see something like:
>>
>> > [snip]
>> > 0, 0x5502814d04, 0xb4000082, ""
>> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
>> > 0, 0x5502814d0c, 0xf1001c1f, ""
Hmm why are you not getting any opcodes there? Missing capstone?
>> > **
>> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
>> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code
>> > should not be reached
Hmm, I can replicate it, so I need to check my understanding. It fails in
the first block:
./qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d \
plugin,in_asm,op,op_opt,out_asm ./tests/tcg/aarch64-linux-user/sha512
gives:
PROLOGUE: [size=45]
0x7f4b64000000: 55 pushq %rbp
0x7f4b64000001: 53 pushq %rbx
0x7f4b64000002: 41 54 pushq %r12
0x7f4b64000004: 41 55 pushq %r13
0x7f4b64000006: 41 56 pushq %r14
0x7f4b64000008: 41 57 pushq %r15
0x7f4b6400000a: 48 8b ef movq %rdi, %rbp
0x7f4b6400000d: 48 81 c4 78 fb ff ff addq $-0x488, %rsp
0x7f4b64000014: ff e6 jmpq *%rsi
0x7f4b64000016: 33 c0 xorl %eax, %eax
0x7f4b64000018: 48 81 c4 88 04 00 00 addq $0x488, %rsp
0x7f4b6400001f: c5 f8 77 vzeroupper
0x7f4b64000022: 41 5f popq %r15
0x7f4b64000024: 41 5e popq %r14
0x7f4b64000026: 41 5d popq %r13
0x7f4b64000028: 41 5c popq %r12
0x7f4b6400002a: 5b popq %rbx
0x7f4b6400002b: 5d popq %rbp
0x7f4b6400002c: c3 retq
----------------
IN:
0x004005d0: d280001d movz x29, #0
0x004005d4: d280001e movz x30, #0
0x004005d8: aa0003e5 mov x5, x0
0x004005dc: f94003e1 ldr x1, [sp]
0x004005e0: 910023e2 add x2, sp, #8
0x004005e4: 910003e6 mov x6, sp
0x004005e8: 90000000 adrp x0, #0x400000
0x004005ec: 91182000 add x0, x0, #0x608
0x004005f0: b0000023 adrp x3, #0x405000
0x004005f4: 91014063 add x3, x3, #0x50
0x004005f8: b0000024 adrp x4, #0x405000
0x004005fc: 91044084 add x4, x4, #0x110
0x00400600: 940010e8 bl #0x4049a0
OP:
ld_i32 tmp0,env,$0xfffffffffffffff0
brcond_i32 tmp0,$0x0,lt,$L0
---- 00000000004005d0 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff203430
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 x29,$0x0
---- 00000000004005d4 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff202800
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 lr,$0x0
---- 00000000004005d8 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff203400
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 x5,x0
This is a memory-annotated instruction:
---- 00000000004005dc 0000000000000000 0000000000000f06
mov_i64 tmp2,$0x55c0ff1a6150
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 tmp2,sp
shl_i64 tmp4,tmp2,$0x8
sar_i64 tmp4,tmp4,$0x8
and_i64 tmp4,tmp4,tmp2
mov_i64 tmp7,tmp4
qemu_ld_i64 x1,tmp7,leq,0
mov_i32 tmp8,$0x10030
mov_i64 tmp11,$0x0
ld_i32 tmp0,env,$0xffffffffffffffa8
mov_i64 tmp10,tmp7
call plugin(0x7f4b71c14388),$0x1,$0,tmp0,tmp8,tmp10,tmp11
---- 00000000004005e0 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff1fa4e0
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
add_i64 tmp2,sp,$0x8
mov_i64 x2,tmp2
---- 00000000004005e4 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff193500
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 tmp2,sp
mov_i64 x6,tmp2
---- 00000000004005e8 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff219700
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 x0,$0x400000
---- 00000000004005ec 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff21d160
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
add_i64 tmp2,x0,$0x608
mov_i64 x0,tmp2
---- 00000000004005f0 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff217f80
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 x3,$0x405000
---- 00000000004005f4 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff2180c0
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
add_i64 tmp2,x3,$0x50
mov_i64 x3,tmp2
---- 00000000004005f8 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff21c4b0
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 x4,$0x405000
---- 00000000004005fc 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff21c590
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
add_i64 tmp2,x4,$0x110
mov_i64 x4,tmp2
---- 0000000000400600 0000000000000000 0000000000000000
mov_i64 tmp2,$0x55c0ff217cd0
st_i64 tmp2,env,$0xffffffffffffff90
mov_i64 tmp2,$0x55c0ff21c670
ld_i32 tmp0,env,$0xffffffffffffffa8
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
mov_i64 lr,$0x400604
mov_i64 pc,$0x4049a0
call lookup_tb_ptr,$0x6,$1,tmp2,env
goto_ptr tmp2
set_label $L0
exit_tb $0x7f4b64000043
OP after optimization and liveness analysis:
ld_i32 tmp0,env,$0xfffffffffffffff0 pref=0xffff
brcond_i32 tmp0,$0x0,lt,$L0 dead: 0 1
---- 00000000004005d0 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff203430 dead: 0 1
mov_i64 x29,$0x0 sync: 0 dead: 0 pref=0xffff
---- 00000000004005d4 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff202800 dead: 0 1
mov_i64 lr,$0x0 sync: 0 dead: 0 pref=0xffff
---- 00000000004005d8 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff203400 dead: 0 1
mov_i64 x5,x0 sync: 0 dead: 0 1 pref=0xffff
---- 00000000004005dc 0000000000000000 0000000000000f06
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff1a6150 dead: 0 1
shl_i64 tmp4,sp,$0x8 pref=0xffff
sar_i64 tmp4,tmp4,$0x8 dead: 1 pref=0xffff
and_i64 tmp4,tmp4,sp dead: 1 pref=0xffff
mov_i64 tmp7,tmp4 dead: 1 pref=0xf038
qemu_ld_i64 x1,tmp7,leq,0 sync: 0 dead: 0 pref=0xffff
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
mov_i64 tmp10,tmp7 dead: 1 pref=0x4
call plugin(0x7f4b71c14388),$0x1,$0,tmp0,$0x10030,tmp10,$0x0 dead: 0 1 2 3
---- 00000000004005e0 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff1fa4e0 dead: 0 1
add_i64 tmp2,sp,$0x8 dead: 2 pref=0xffff
mov_i64 x2,tmp2 sync: 0 dead: 0 1 pref=0xffff
---- 00000000004005e4 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff193500 dead: 0 1
mov_i64 x6,sp sync: 0 dead: 0 1 pref=0xffff
---- 00000000004005e8 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff219700 dead: 0 1
---- 00000000004005ec 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21d160 dead: 0 1
mov_i64 x0,$0x400608 sync: 0 dead: 0 1 pref=0xffff
---- 00000000004005f0 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff217f80 dead: 0 1
---- 00000000004005f4 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff2180c0 dead: 0 1
mov_i64 x3,$0x405050 sync: 0 dead: 0 1 pref=0xffff
---- 00000000004005f8 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c4b0 dead: 0 1
---- 00000000004005fc 0000000000000000 0000000000000000
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c590 dead: 0 1
mov_i64 x4,$0x405110 sync: 0 dead: 0 1 pref=0xffff
---- 0000000000400600 0000000000000000 0000000000000000
st_i64 $0x55c0ff217cd0,env,$0xffffffffffffff90 dead: 0
ld_i32 tmp0,env,$0xffffffffffffffa8 pref=0x80
call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c670 dead: 0 1
mov_i64 lr,$0x400604 sync: 0 dead: 0 1 pref=0xffff
mov_i64 pc,$0x4049a0 sync: 0 dead: 0 1 pref=0xffff
call lookup_tb_ptr,$0x6,$1,tmp2,env dead: 1 pref=none
goto_ptr tmp2 dead: 0
set_label $L0
exit_tb $0x7f4b64000043
OUT: [size=432]
-- guest addr 0x00000000004005d0 + tb prologue
0x7f4b64000100: 8b 5d f0 movl -0x10(%rbp), %ebx
0x7f4b64000103: 85 db testl %ebx, %ebx
0x7f4b64000105: 0f 8c 8a 01 00 00 jl 0x7f4b64000295
0x7f4b6400010b: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b6400010e: 48 be 30 34 20 ff c0 55 movabsq $0x55c0ff203430, %rsi
0x7f4b64000116: 00 00
0x7f4b64000118: e8 82 43 c1 0d callq 0x7f4b71c1449f
0x7f4b6400011d: 48 c7 85 28 01 00 00 00 movq $0, 0x128(%rbp)
0x7f4b64000125: 00 00 00
-- guest addr 0x00000000004005d4
0x7f4b64000128: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b6400012b: 48 be 00 28 20 ff c0 55 movabsq $0x55c0ff202800, %rsi
0x7f4b64000133: 00 00
0x7f4b64000135: e8 65 43 c1 0d callq 0x7f4b71c1449f
0x7f4b6400013a: 48 c7 85 30 01 00 00 00 movq $0, 0x130(%rbp)
0x7f4b64000142: 00 00 00
-- guest addr 0x00000000004005d8
0x7f4b64000145: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b64000148: 48 be 00 34 20 ff c0 55 movabsq $0x55c0ff203400, %rsi
0x7f4b64000150: 00 00
0x7f4b64000152: e8 48 43 c1 0d callq 0x7f4b71c1449f
0x7f4b64000157: 48 8b 5d 40 movq 0x40(%rbp), %rbx
0x7f4b6400015b: 48 89 5d 68 movq %rbx, 0x68(%rbp)
-- guest addr 0x00000000004005dc
0x7f4b6400015f: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b64000162: 48 be 50 61 1a ff c0 55 movabsq $0x55c0ff1a6150, %rsi
0x7f4b6400016a: 00 00
0x7f4b6400016c: e8 2e 43 c1 0d callq 0x7f4b71c1449f
0x7f4b64000171: 48 8b 9d 38 01 00 00 movq 0x138(%rbp), %rbx
0x7f4b64000178: 4c 8b e3 movq %rbx, %r12
0x7f4b6400017b: 49 c1 e4 08 shlq $8, %r12
0x7f4b6400017f: 49 c1 fc 08 sarq $8, %r12
0x7f4b64000183: 4c 23 e3 andq %rbx, %r12
0x7f4b64000186: 4d 8b 2c 24 movq (%r12), %r13
0x7f4b6400018a: 4c 89 6d 48 movq %r13, 0x48(%rbp)
0x7f4b6400018e: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b64000191: be 30 00 01 00 movl $0x10030, %esi
0x7f4b64000196: 49 8b d4 movq %r12, %rdx
0x7f4b64000199: 33 c9 xorl %ecx, %ecx
0x7f4b6400019b: e8 e8 41 c1 0d callq 0x7f4b71c14388
-- guest addr 0x00000000004005e0
0x7f4b640001a0: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b640001a3: 48 be e0 a4 1f ff c0 55 movabsq $0x55c0ff1fa4e0, %rsi
0x7f4b640001ab: 00 00
0x7f4b640001ad: e8 ed 42 c1 0d callq 0x7f4b71c1449f
0x7f4b640001b2: 4c 8d 63 08 leaq 8(%rbx), %r12
0x7f4b640001b6: 4c 89 65 50 movq %r12, 0x50(%rbp)
-- guest addr 0x00000000004005e4
0x7f4b640001ba: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b640001bd: 48 be 00 35 19 ff c0 55 movabsq $0x55c0ff193500, %rsi
0x7f4b640001c5: 00 00
0x7f4b640001c7: e8 d3 42 c1 0d callq 0x7f4b71c1449f
0x7f4b640001cc: 48 89 5d 70 movq %rbx, 0x70(%rbp)
-- guest addr 0x00000000004005e8
0x7f4b640001d0: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b640001d3: 48 be 00 97 21 ff c0 55 movabsq $0x55c0ff219700, %rsi
0x7f4b640001db: 00 00
0x7f4b640001dd: e8 bd 42 c1 0d callq 0x7f4b71c1449f
-- guest addr 0x00000000004005ec
0x7f4b640001e2: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b640001e5: 48 be 60 d1 21 ff c0 55 movabsq $0x55c0ff21d160, %rsi
0x7f4b640001ed: 00 00
0x7f4b640001ef: e8 ab 42 c1 0d callq 0x7f4b71c1449f
0x7f4b640001f4: 48 c7 45 40 08 06 40 00 movq $0x400608, 0x40(%rbp)
-- guest addr 0x00000000004005f0
0x7f4b640001fc: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b640001ff: 48 be 80 7f 21 ff c0 55 movabsq $0x55c0ff217f80, %rsi
0x7f4b64000207: 00 00
0x7f4b64000209: e8 91 42 c1 0d callq 0x7f4b71c1449f
-- guest addr 0x00000000004005f4
0x7f4b6400020e: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b64000211: 48 be c0 80 21 ff c0 55 movabsq $0x55c0ff2180c0, %rsi
0x7f4b64000219: 00 00
0x7f4b6400021b: e8 7f 42 c1 0d callq 0x7f4b71c1449f
0x7f4b64000220: 48 c7 45 58 50 50 40 00 movq $0x405050, 0x58(%rbp)
-- guest addr 0x00000000004005f8
0x7f4b64000228: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b6400022b: 48 be b0 c4 21 ff c0 55 movabsq $0x55c0ff21c4b0, %rsi
0x7f4b64000233: 00 00
0x7f4b64000235: e8 65 42 c1 0d callq 0x7f4b71c1449f
-- guest addr 0x00000000004005fc
0x7f4b6400023a: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b6400023d: 48 be 90 c5 21 ff c0 55 movabsq $0x55c0ff21c590, %rsi
0x7f4b64000245: 00 00
0x7f4b64000247: e8 53 42 c1 0d callq 0x7f4b71c1449f
0x7f4b6400024c: 48 c7 45 60 10 51 40 00 movq $0x405110, 0x60(%rbp)
-- guest addr 0x0000000000400600
0x7f4b64000254: 48 bb d0 7c 21 ff c0 55 movabsq $0x55c0ff217cd0, %rbx
0x7f4b6400025c: 00 00
0x7f4b6400025e: 48 89 5d 90 movq %rbx, -0x70(%rbp)
0x7f4b64000262: 8b 7d a8 movl -0x58(%rbp), %edi
0x7f4b64000265: 48 be 70 c6 21 ff c0 55 movabsq $0x55c0ff21c670, %rsi
0x7f4b6400026d: 00 00
0x7f4b6400026f: e8 2b 42 c1 0d callq 0x7f4b71c1449f
0x7f4b64000274: 48 c7 85 30 01 00 00 04 movq $0x400604, 0x130(%rbp)
0x7f4b6400027c: 06 40 00
0x7f4b6400027f: 48 c7 85 40 01 00 00 a0 movq $0x4049a0, 0x140(%rbp)
0x7f4b64000287: 49 40 00
0x7f4b6400028a: 48 8b fd movq %rbp, %rdi
0x7f4b6400028d: ff 15 15 00 00 00 callq *0x15(%rip)
0x7f4b64000293: ff e0 jmpq *%rax
0x7f4b64000295: 48 8d 05 a7 fd ff ff leaq -0x259(%rip), %rax
0x7f4b6400029c: e9 77 fd ff ff jmp 0x7f4b64000018
-- tb slow paths + alignment
0x7f4b640002a1: 90 nop
0x7f4b640002a2: 90 nop
0x7f4b640002a3: 90 nop
0x7f4b640002a4: 90 nop
0x7f4b640002a5: 90 nop
0x7f4b640002a6: 90 nop
0x7f4b640002a7: 90 nop
data: [size=8]
0x7f4b640002a8: .quad 0x000055c0feba1d00
0, 0x4005d0, 0xd280001d, "movz x29, #0"
0, 0x4005d4, 0xd280001e, "movz x30, #0"
0, 0x4005d8, 0xaa0003e5, "mov x5, x0"
0, 0x4005dc, 0xf94003e1, "ldr x1, [sp]", load, 0x55008000f0
0, 0x4005e0, 0x910023e2, "add x2, sp, #8"
0, 0x4005e4, 0x910003e6, "mov x6, sp"
0, 0x4005e8, 0x90000000, "adrp x0, #0x400000"
0, 0x4005ec, 0x91182000, "add x0, x0, #0x608"
0, 0x4005f0, 0xb0000023, "adrp x3, #0x405000"
0, 0x4005f4, 0x91014063, "add x3, x3, #0x50"
0, 0x4005f8, 0xb0000024, "adrp x4, #0x405000"
0, 0x4005fc, 0x91044084, "add x4, x4, #0x110"
cpu_tb_exec: 0
**
ERROR:../../accel/tcg/cpu-exec.c:443:cpu_tb_exec: code should not be reached
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
>>
>> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
>> pointer changing from one non-null value to another (which also seems to
>> indicate it is not being cleared between instructions).
>>
>> Does this hint that there are cases where the code to reset cpu->plugin_mem_cbs
>> to NULL is getting optimized away, but not the code to set it in the first place?
>
> Is there anyone who could help take a look at this from the code gen
> perspective?
>
> -Aaron
--
Alex Bennée
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Plugin Memory Callback Debugging
2022-11-29 20:37 ` Aaron Lindsay via
2022-12-01 19:32 ` Alex Bennée
@ 2022-12-18 5:24 ` Emilio Cota
2022-12-19 20:11 ` Aaron Lindsay
1 sibling, 1 reply; 17+ messages in thread
From: Emilio Cota @ 2022-12-18 5:24 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: Richard Henderson, Alex Bennée, qemu-devel
On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
(snip)
> > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> > getting optimized away, but not the code to set it in the first place?
>
> Is there anyone who could help take a look at this from the code gen
> perspective?
Thanks for the report. Just adding assertions was enough to uncover
several bugs. I did not reproduce the use-after-free, but by calling
reset from a callback it's easy to see how it can occur.
I have fixes in https://github.com/cota/qemu/tree/plugins
Can you please give those a try?
BTW I created an issue on gitlab to track this
https://gitlab.com/qemu-project/qemu/-/issues/1381
Thanks,
Emilio
* Re: Plugin Memory Callback Debugging
2022-12-18 5:24 ` Emilio Cota
@ 2022-12-19 20:11 ` Aaron Lindsay
2023-01-06 10:30 ` Alex Bennée
0 siblings, 1 reply; 17+ messages in thread
From: Aaron Lindsay @ 2022-12-19 20:11 UTC (permalink / raw)
To: Emilio Cota; +Cc: Richard Henderson, Alex Bennée, qemu-devel
Emilio,
On Dec 18 00:24, Emilio Cota wrote:
> On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
> (snip)
> > > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> > > getting optimized away, but not the code to set it in the first place?
> >
> > Is there anyone who could help take a look at this from the code gen
> > perspective?
>
> Thanks for the report. Just adding assertions was enough to uncover
> several bugs. I did not reproduce the use-after-free, but by calling
> reset from a callback it's easy to see how it can occur.
>
> I have fixes in https://github.com/cota/qemu/tree/plugins
>
> Can you please give those a try?
>
> BTW I created an issue on gitlab to track this
> https://gitlab.com/qemu-project/qemu/-/issues/1381
Thanks so much for digging into this!
I rebased your plugins branch on top of v7.2.0 and tested with several
scenarios which reliably triggered the bug for me. None of them
reproduced the original problem (or hit other bugs!) with your fixes.
-Aaron
* Re: Plugin Memory Callback Debugging
2022-12-19 20:11 ` Aaron Lindsay
@ 2023-01-06 10:30 ` Alex Bennée
2023-01-07 3:07 ` Emilio Cota
0 siblings, 1 reply; 17+ messages in thread
From: Alex Bennée @ 2023-01-06 10:30 UTC (permalink / raw)
To: Aaron Lindsay; +Cc: Emilio Cota, Richard Henderson, qemu-devel
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> Emilio,
>
> On Dec 18 00:24, Emilio Cota wrote:
>> On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
>> (snip)
>> > > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
>> > > getting optimized away, but not the code to set it in the first place?
>> >
>> > Is there anyone who could help take a look at this from the code gen
>> > perspective?
>>
>> Thanks for the report. Just adding assertions was enough to uncover
>> several bugs. I did not reproduce the use-after-free, but by calling
>> reset from a callback it's easy to see how it can occur.
>>
>> I have fixes in https://github.com/cota/qemu/tree/plugins
>>
>> Can you please give those a try?
>>
>> BTW I created an issue on gitlab to track this
>> https://gitlab.com/qemu-project/qemu/-/issues/1381
>
> Thanks so much for digging into this!
>
> I rebased your plugins branch on top of v7.2.0 and tested with several
> scenarios which reliably triggered the bug for me. None of them
> reproduced the original problem (or hit other bugs!) with your fixes.
Emilio,
Are you going to be able to post the patches soon? I'd like to get the
fixes in as early in the cycle as possible.
Thanks,
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: Plugin Memory Callback Debugging
2023-01-06 10:30 ` Alex Bennée
@ 2023-01-07 3:07 ` Emilio Cota
0 siblings, 0 replies; 17+ messages in thread
From: Emilio Cota @ 2023-01-07 3:07 UTC (permalink / raw)
To: Alex Bennée; +Cc: Aaron Lindsay, Richard Henderson, QEMU Developers
On Fri, Jan 6, 2023, 5:31 AM Alex Bennée <alex.bennee@linaro.org> wrote:
> Are you going to be able to post the patches soon? I'd like to get the
> fixes in as early in the cycle as possible.
>
I intend to post this series on Sunday.
Thanks,
Emilio
end of thread, other threads:[~2023-01-07 3:08 UTC | newest]
Thread overview: 17+ messages
2022-11-15 22:05 Plugin Memory Callback Debugging Aaron Lindsay
2022-11-15 22:36 ` Alex Bennée
2022-11-18 21:58 ` Aaron Lindsay via
2022-11-18 22:02 ` Aaron Lindsay
2022-11-21 22:02 ` Alex Bennée
2022-11-22 17:05 ` Aaron Lindsay via
2022-11-21 20:18 ` Aaron Lindsay via
2022-11-21 21:51 ` Alex Bennée
2022-11-22 2:22 ` Richard Henderson
2022-11-22 15:57 ` Aaron Lindsay via
2022-11-29 20:37 ` Aaron Lindsay via
2022-12-01 19:32 ` Alex Bennée
2022-12-18 5:24 ` Emilio Cota
2022-12-19 20:11 ` Aaron Lindsay
2023-01-06 10:30 ` Alex Bennée
2023-01-07 3:07 ` Emilio Cota
2022-11-16 6:19 ` Emilio Cota