qemu-devel.nongnu.org archive mirror
* Plugin Memory Callback Debugging
@ 2022-11-15 22:05 Aaron Lindsay
  2022-11-15 22:36 ` Alex Bennée
  2022-11-16  6:19 ` Emilio Cota
  0 siblings, 2 replies; 17+ messages in thread
From: Aaron Lindsay @ 2022-11-15 22:05 UTC
  To: qemu-devel; +Cc: Emilio G. Cota, Alex Bennée, Richard Henderson

Hello,

I have been wrestling with what might be a bug in the plugin memory
callbacks. The immediate error is that I hit the
`g_assert_not_reached()` in the 'default:' case in
qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
are obviously bogus (`len` was absurdly high, for example). After some
further digging and instrumenting, I found that
`free_dyn_cb_arr(void *p, ...)` is called shortly before the assertion
is hit, with `p` equal to the address that `cpu->plugin_mem_cbs` holds
when the assertion fires. In other words, we are freeing memory that
`cpu->plugin_mem_cbs` still points to.

I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
end of an instruction's or TB's execution, so it's not clear to me how this
is occurring. However, I suspect it may be relevant that we are calling
`free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.

I have also found that the following change lets me run successfully
without hitting the assert:

diff --git a/plugins/core.c b/plugins/core.c
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)

 void qemu_plugin_flush_cb(void)
 {
+    CPUState *cpu;
     qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
     qht_reset(&plugin.dyn_cb_arr_ht);

+    CPU_FOREACH(cpu) {
+        cpu->plugin_mem_cbs = NULL;
+    }
+
     plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
 }
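
To make the suspected sequence concrete, here is a minimal standalone sketch
of the pattern the patch addresses. It is a toy model, not QEMU code: every
`toy_*` name is hypothetical, slot handles stand in for the pointers stored in
`cpu->plugin_mem_cbs`, and a flag per slot stands in for memory owned by the
hash table. It assumes, as described above, that a flush can run while a vCPU
still holds a borrowed reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is a hypothetical stand-in, not QEMU code. */
#define TOY_NUM_CPUS 2
#define TOY_NO_CBS   (-1)

/* Stand-in for the table that owns the callback arrays (true = allocated). */
static bool toy_slot_alive[TOY_NUM_CPUS];

/* Stand-in for cpu->plugin_mem_cbs: a borrowed slot handle, or TOY_NO_CBS. */
static int toy_cpu_mem_cbs[TOY_NUM_CPUS];

/* Start from a clean state: no live slots, no CPU holding a handle. */
static void toy_init(void)
{
    for (int i = 0; i < TOY_NUM_CPUS; i++) {
        toy_slot_alive[i] = false;
        toy_cpu_mem_cbs[i] = TOY_NO_CBS;
    }
}

/* A vCPU starts an instrumented instruction and borrows its callback slot. */
static void toy_enter_insn(int cpu)
{
    toy_slot_alive[cpu] = true;
    toy_cpu_mem_cbs[cpu] = cpu;
}

/* Expected path: the borrowed handle is dropped when the instruction ends. */
static void toy_leave_insn(int cpu)
{
    toy_cpu_mem_cbs[cpu] = TOY_NO_CBS;
}

/* Flush: release every slot; optionally also clear the per-CPU handles. */
static void toy_flush(bool clear_cpu_handles)
{
    for (int i = 0; i < TOY_NUM_CPUS; i++) {
        toy_slot_alive[i] = false;
        if (clear_cpu_handles) {
            toy_cpu_mem_cbs[i] = TOY_NO_CBS;
        }
    }
}

/* Count CPUs whose handle refers to a slot that has already been released. */
static int toy_stale_count(void)
{
    int stale = 0;
    for (int i = 0; i < TOY_NUM_CPUS; i++) {
        int h = toy_cpu_mem_cbs[i];
        if (h != TOY_NO_CBS && !toy_slot_alive[h]) {
            stale++;
        }
    }
    return stale;
}
```

In this model, calling `toy_flush(false)` between `toy_enter_insn()` and
`toy_leave_insn()` leaves a stale handle behind, while `toy_flush(true)` does
not. The model says nothing about why the real code would reach a flush with
the pointer still set; that remains the open question.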

Unfortunately, the workload/setup I have encountered this bug with is
difficult to reproduce in a way suitable for sharing upstream (admittedly,
possibly because I do not fully understand the conditions necessary to
trigger it). It also occurs deep into a run, and I haven't found a good
way to break in gdb immediately before it happens in order to inspect
it without perturbing the run enough that it no longer happens.
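
One lower-overhead alternative to a gdb breakpoint might be an in-process
check: log recently freed addresses in a small ring buffer and test a pointer
against that log just before it is used. The sketch below is standalone C with
hypothetical names (`freed_log_*`); it is not an existing QEMU facility, it is
not thread-safe, and it can report false positives if the allocator hands the
same address out again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging aid: remembers the last FREED_LOG_SIZE freed addresses. */
#define FREED_LOG_SIZE 64

static uintptr_t freed_log[FREED_LOG_SIZE];
static size_t freed_log_next;   /* total number of addresses recorded so far */

/* Call at the point of free; overwrites the oldest entry once the log is full. */
static void freed_log_record(const void *p)
{
    freed_log[freed_log_next % FREED_LOG_SIZE] = (uintptr_t)p;
    freed_log_next++;
}

/* Call before using a pointer; true means it matches a recently freed address. */
static bool freed_log_contains(const void *p)
{
    size_t used = freed_log_next < FREED_LOG_SIZE ? freed_log_next
                                                  : FREED_LOG_SIZE;
    if (p == NULL) {
        return false;
    }
    for (size_t i = 0; i < used; i++) {
        if (freed_log[i] == (uintptr_t)p) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, recording in the free path and aborting on a hit at
the use site would yield a core dump at the first use of a freed array rather
than at the later assertion.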

I welcome any feedback or insights on how to further nail down the
failure case and/or help in working towards an appropriate solution.

Thanks!

-Aaron



Thread overview: 17+ messages
2022-11-15 22:05 Plugin Memory Callback Debugging Aaron Lindsay
2022-11-15 22:36 ` Alex Bennée
2022-11-18 21:58   ` Aaron Lindsay via
2022-11-18 22:02     ` Aaron Lindsay
2022-11-21 22:02       ` Alex Bennée
2022-11-22 17:05         ` Aaron Lindsay via
2022-11-21 20:18   ` Aaron Lindsay via
2022-11-21 21:51     ` Alex Bennée
2022-11-22  2:22       ` Richard Henderson
2022-11-22 15:57         ` Aaron Lindsay via
2022-11-29 20:37           ` Aaron Lindsay via
2022-12-01 19:32             ` Alex Bennée
2022-12-18  5:24             ` Emilio Cota
2022-12-19 20:11               ` Aaron Lindsay
2023-01-06 10:30                 ` Alex Bennée
2023-01-07  3:07                   ` Emilio Cota
2022-11-16  6:19 ` Emilio Cota
