* [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps
@ 2024-12-12 20:47 Peter Xu
2024-12-12 20:47 ` [PATCH v2 1/3] scripts/qemu-gdb: Always do full stack dump for python errors Peter Xu
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Peter Xu @ 2024-12-12 20:47 UTC (permalink / raw)
To: qemu-devel
Cc: Stefan Hajnoczi, Fabiano Rosas, Kevin Wolf, Paolo Bonzini, peterx,
Peter Maydell, s_sourceforge, Maxim Levitsky
v1: https://lore.kernel.org/r/20241211201739.1380222-1-peterx@redhat.com
Changelog: in v1 the commit message was accidentally cut off; that is now
fixed (along with some small touch-ups elsewhere). While at it, I also tried
to make the output look as close to gdb's bt as possible, so it now looks
like this:
Coroutine at 0x7f9f4c57c748:
#0 0x55ae6c0dc9a8 in qemu_coroutine_switch<+120> () at ../util/coroutine-ucontext.c:321
#1 0x55ae6c0da2f8 in qemu_aio_coroutine_enter<+356> () at ../util/qemu-coroutine.c:293
#2 0x55ae6c0da3f1 in qemu_coroutine_enter<+34> () at ../util/qemu-coroutine.c:316
#3 0x55ae6baf775e in migration_incoming_process<+43> () at ../migration/migration.c:876
#4 0x55ae6baf7ab4 in migration_ioc_process_incoming<+490> () at ../migration/migration.c:1008
#5 0x55ae6bae9ae7 in migration_channel_process_incoming<+145> () at ../migration/channel.c:45
#6 0x55ae6bb18e35 in socket_accept_incoming_migration<+118> () at ../migration/socket.c:132
#7 0x55ae6be939ef in qio_net_listener_channel_func<+131> () at ../io/net-listener.c:54
#8 0x55ae6be8ce1a in qio_channel_fd_source_dispatch<+78> () at ../io/channel-watch.c:84
#9 0x7f9f5b26728c in g_main_context_dispatch_unlocked.lto_priv<+315> ()
#10 0x7f9f5b267555 in g_main_context_dispatch<+36> ()
#11 0x55ae6c0d91a7 in glib_pollfds_poll<+90> () at ../util/main-loop.c:287
#12 0x55ae6c0d9235 in os_host_main_loop_wait<+128> () at ../util/main-loop.c:310
#13 0x55ae6c0d9364 in main_loop_wait<+203> () at ../util/main-loop.c:589
#14 0x55ae6bac212a in qemu_main_loop<+41> () at ../system/runstate.c:835
#15 0x55ae6bfdf522 in qemu_default_main<+19> () at ../system/main.c:37
#16 0x55ae6bfdf55f in main<+40> () at ../system/main.c:48
#17 0x7f9f59d42248 in __libc_start_call_main<+119> ()
#18 0x7f9f59d4230b in __libc_start_main_impl<+138> ()
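Each frame line above has the shape "#N ADDR in FUNC<+OFFSET> () at PATH:LINE". As a rough sketch (not the code from the patch), a line like that can be assembled from the text gdb prints for "info symbol" and "info line". The function names below are hypothetical, and the sample "info symbol" string is made up to match the format; only the two output formats themselves are taken from the comments in patch 3.

```python
def parse_info_symbol(text):
    """Split gdb 'info symbol' text into (function, offset), or None.

    Expected shape: "__clone3 + 44 in section .text of /lib64/libc.so.6"
    """
    head, sep, _section = text.strip().partition(" in section ")
    if not sep:
        return None  # e.g. gdb reported that no symbol matches
    func, plus, offset = head.partition(" + ")
    return func, (offset if plus else "0")


def parse_info_line(text):
    """Split gdb 'info line' text into (path, line), or None.

    Expected shape: 'Line 321 of "file.c" starts at address ...'
    """
    text = text.strip()
    if not text.startswith("Line "):
        return None
    location = text[len("Line "):].split(" starts ")[0]
    line, sep, path = location.partition(" of ")
    if not sep:
        return None
    return path.strip('"'), line


def format_frame(index, addr, symbol_text, line_text):
    """Build one bt-like line from the two gdb text outputs."""
    sym = parse_info_symbol(symbol_text)
    if sym is None:
        return f"#{index} {hex(addr)} in ??? ()"
    func, offset = sym
    frame = f"#{index} {hex(addr)} in {func}<+{offset}> ()"
    loc = parse_info_line(line_text)
    if loc is not None:
        path, line = loc
        frame += f" at {path}:{line}"
    return frame


# Sample inputs: the symbol text is invented for illustration, the line
# text follows the example quoted in the patch.
sample_symbol = "qemu_coroutine_switch + 120 in section .text"
sample_line = ('Line 321 of "../util/coroutine-ucontext.c" starts at address '
               '0x55cf3894d993 <qemu_coroutine_switch+99> and ends at '
               '0x55cf3894d9ab <qemu_coroutine_switch+123>.')
print(format_frame(0, 0x55ae6c0dc9a8, sample_symbol, sample_line))
```

Frames without line information (for example in libraries without debuginfo) simply omit the "at PATH:LINE" suffix, which matches frames #9 and #10 above.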
Coroutines are used in many places in the block layer. They are also used in
live migration on the destination side, and when diagnosing a crash within a
coroutine it is handy to also know what the other coroutines are doing.
This series adds initial support for that; it is not pretty, but it should
work. Since we can't use the trick of modifying registers on the fly in
non-live gdb sessions, we unwind manually.
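To illustrate what a manual unwind means here, below is a minimal sketch of walking an x86-64 frame-pointer chain, assuming every frame keeps rbp as a frame pointer. It runs on a simulated memory dict rather than inside gdb; the names unwind_frame_pointers and read_u64 and the max_frames guard are hypothetical and not part of the series. In a gdb session, read_u64 would be backed by something like gdb.parse_and_eval on a uint64_t dereference, as patch 3 does.

```python
def unwind_frame_pointers(rip, rbp, read_u64, max_frames=64):
    """Walk a frame-pointer chain starting from the given rip/rbp.

    Returns a list of (pc, lookup_addr) pairs.  For every frame except
    the innermost one, pc is a return address, i.e. the instruction
    after the CALL, so symbols are looked up at pc - 1 to land inside
    the CALL instruction itself.
    """
    frames = []
    # max_frames is an extra guard against corrupted chains (assumption,
    # not in the patch).
    while rbp and len(frames) < max_frames:
        lookup = rip if not frames else rip - 1
        frames.append((rip, lookup))
        rip = read_u64(rbp + 8)  # return address sits above the saved rbp
        rbp = read_u64(rbp)      # saved rbp of the caller
    return frames


# Simulated stack: two frames, the chain ends where the saved rbp is 0.
memory = {
    0x1000: 0x2000,    # saved rbp of frame 0 -> frame 1
    0x1008: 0x401234,  # return address into frame 1
    0x2000: 0x0,       # end of chain
    0x2008: 0x405678,  # return address of the outermost frame (not walked)
}
for pc, lookup in unwind_frame_pointers(0x400100, 0x1000, memory.__getitem__):
    print(hex(pc), hex(lookup))
```

The walk stops as soon as the saved rbp read from the stack is zero, so the last return address read is not reported; that mirrors the loop condition used in patch 3.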
One thing to mention is that there's a similar but more generic solution
mentioned on the list by Niall:
https://lore.kernel.org/r/f0ebccca-7a17-4da8-ac4a-71cf6d69abc3@mtasv.net
That approach is more generic, but it adds more dependencies on both gdb and
qemu in the future. So this series is an intermediate quick solution for now,
which should work with most older qemu/gdb binaries too.
Thanks,
Peter Xu (3):
scripts/qemu-gdb: Always do full stack dump for python errors
scripts/qemu-gdb: Simplify fs_base fetching for coroutines
scripts/qemu-gdb: Support coroutine dumps in coredumps
scripts/qemu-gdb.py | 2 +
scripts/qemugdb/coroutine.py | 102 +++++++++++++++++++++++++----------
2 files changed, 77 insertions(+), 27 deletions(-)
--
2.47.0
* [PATCH v2 1/3] scripts/qemu-gdb: Always do full stack dump for python errors
2024-12-12 20:47 [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Peter Xu
@ 2024-12-12 20:47 ` Peter Xu
2024-12-12 20:48 ` [PATCH v2 2/3] scripts/qemu-gdb: Simplify fs_base fetching for coroutines Peter Xu
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2024-12-12 20:47 UTC (permalink / raw)
To: qemu-devel
Cc: Stefan Hajnoczi, Fabiano Rosas, Kevin Wolf, Paolo Bonzini, peterx,
Peter Maydell, s_sourceforge, Maxim Levitsky
This makes it easier both to debug plugin errors and to report issues.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
scripts/qemu-gdb.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/scripts/qemu-gdb.py b/scripts/qemu-gdb.py
index 4d2a9f6c43..cfae94a2e9 100644
--- a/scripts/qemu-gdb.py
+++ b/scripts/qemu-gdb.py
@@ -45,3 +45,5 @@ def __init__(self):
# Default to silently passing through SIGUSR1, because QEMU sends it
# to itself a lot.
gdb.execute('handle SIGUSR1 pass noprint nostop')
+# Always print full stack for python errors, easier to debug and report issues
+gdb.execute('set python print-stack full')
--
2.47.0
* [PATCH v2 2/3] scripts/qemu-gdb: Simplify fs_base fetching for coroutines
2024-12-12 20:47 [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Peter Xu
2024-12-12 20:47 ` [PATCH v2 1/3] scripts/qemu-gdb: Always do full stack dump for python errors Peter Xu
@ 2024-12-12 20:48 ` Peter Xu
2024-12-12 20:48 ` [PATCH v2 3/3] scripts/qemu-gdb: Support coroutine dumps in coredumps Peter Xu
2025-01-22 13:50 ` [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Kevin Wolf
3 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2024-12-12 20:48 UTC (permalink / raw)
To: qemu-devel
Cc: Stefan Hajnoczi, Fabiano Rosas, Kevin Wolf, Paolo Bonzini, peterx,
Peter Maydell, s_sourceforge, Maxim Levitsky
There is a bunch of code trying to fetch fs_base in different ways. IIUC
the simplest way is to use "$fs_base" instead. It also has the benefit that
it works for both live gdb sessions and coredumps.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
scripts/qemugdb/coroutine.py | 23 ++---------------------
1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index 7db46d4b68..20f76ed37b 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -13,28 +13,9 @@
 VOID_PTR = gdb.lookup_type('void').pointer()
-def get_fs_base():
-    '''Fetch %fs base value using arch_prctl(ARCH_GET_FS). This is
-       pthread_self().'''
-    # %rsp - 120 is scratch space according to the SystemV ABI
-    old = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
-    gdb.execute('call (int)arch_prctl(0x1003, $rsp - 120)', False, True)
-    fs_base = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
-    gdb.execute('set *(uint64_t*)($rsp - 120) = %s' % old, False, True)
-    return fs_base
-
 def pthread_self():
-    '''Fetch pthread_self() from the glibc start_thread function.'''
-    f = gdb.newest_frame()
-    while f.name() != 'start_thread':
-        f = f.older()
-        if f is None:
-            return get_fs_base()
-
-    try:
-        return f.read_var("arg")
-    except ValueError:
-        return get_fs_base()
+    '''Fetch the base address of TLS.'''
+    return gdb.parse_and_eval("$fs_base")
 def get_glibc_pointer_guard():
     '''Fetch glibc pointer guard value'''
--
2.47.0
* [PATCH v2 3/3] scripts/qemu-gdb: Support coroutine dumps in coredumps
2024-12-12 20:47 [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Peter Xu
2024-12-12 20:47 ` [PATCH v2 1/3] scripts/qemu-gdb: Always do full stack dump for python errors Peter Xu
2024-12-12 20:48 ` [PATCH v2 2/3] scripts/qemu-gdb: Simplify fs_base fetching for coroutines Peter Xu
@ 2024-12-12 20:48 ` Peter Xu
2025-01-22 13:50 ` [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Kevin Wolf
3 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2024-12-12 20:48 UTC (permalink / raw)
To: qemu-devel
Cc: Stefan Hajnoczi, Fabiano Rosas, Kevin Wolf, Paolo Bonzini, peterx,
Peter Maydell, s_sourceforge, Maxim Levitsky
Dumping coroutines doesn't yet work with coredumps. Let's make it work.
We still keep most of the old code because it can be either more flexible
or prettier. The fallbacks are only used when the old code stops working.
Currently the raw unwind is pretty ugly, but it works, like this:
(gdb) qemu bt
#0 process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:788
#1 0x000055ae6c0dc4d9 in coroutine_trampoline (i0=-1711718576, i1=21934) at ../util/coroutine-ucontext.c:175
#2 0x00007f9f59d72f40 in ??? () at /lib64/libc.so.6
#3 0x00007ffd549214a0 in ??? ()
#4 0x0000000000000000 in ??? ()
Coroutine at 0x7f9f4c57c748:
#0 0x55ae6c0dc9a8 in qemu_coroutine_switch<+120> () at ../util/coroutine-ucontext.c:321
#1 0x55ae6c0da2f8 in qemu_aio_coroutine_enter<+356> () at ../util/qemu-coroutine.c:293
#2 0x55ae6c0da3f1 in qemu_coroutine_enter<+34> () at ../util/qemu-coroutine.c:316
#3 0x55ae6baf775e in migration_incoming_process<+43> () at ../migration/migration.c:876
#4 0x55ae6baf7ab4 in migration_ioc_process_incoming<+490> () at ../migration/migration.c:1008
#5 0x55ae6bae9ae7 in migration_channel_process_incoming<+145> () at ../migration/channel.c:45
#6 0x55ae6bb18e35 in socket_accept_incoming_migration<+118> () at ../migration/socket.c:132
#7 0x55ae6be939ef in qio_net_listener_channel_func<+131> () at ../io/net-listener.c:54
#8 0x55ae6be8ce1a in qio_channel_fd_source_dispatch<+78> () at ../io/channel-watch.c:84
#9 0x7f9f5b26728c in g_main_context_dispatch_unlocked.lto_priv<+315> ()
#10 0x7f9f5b267555 in g_main_context_dispatch<+36> ()
#11 0x55ae6c0d91a7 in glib_pollfds_poll<+90> () at ../util/main-loop.c:287
#12 0x55ae6c0d9235 in os_host_main_loop_wait<+128> () at ../util/main-loop.c:310
#13 0x55ae6c0d9364 in main_loop_wait<+203> () at ../util/main-loop.c:589
#14 0x55ae6bac212a in qemu_main_loop<+41> () at ../system/runstate.c:835
#15 0x55ae6bfdf522 in qemu_default_main<+19> () at ../system/main.c:37
#16 0x55ae6bfdf55f in main<+40> () at ../system/main.c:48
#17 0x7f9f59d42248 in __libc_start_call_main<+119> ()
#18 0x7f9f59d4230b in __libc_start_main_impl<+138> ()
Signed-off-by: Peter Xu <peterx@redhat.com>
---
scripts/qemugdb/coroutine.py | 79 +++++++++++++++++++++++++++++++++---
1 file changed, 73 insertions(+), 6 deletions(-)
diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index 20f76ed37b..e98fc48a4b 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -46,9 +46,60 @@ def get_jmpbuf_regs(jmpbuf):
         'r15': jmpbuf[JB_R15],
         'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }
-def bt_jmpbuf(jmpbuf):
-    '''Backtrace a jmpbuf'''
-    regs = get_jmpbuf_regs(jmpbuf)
+def symbol_lookup(addr):
+    # Example: "__clone3 + 44 in section .text of /lib64/libc.so.6"
+    result = gdb.execute(f"info symbol {hex(addr)}", to_string=True).strip()
+    try:
+        if "+" in result:
+            (func, result) = result.split(" + ")
+            (offset, result) = result.split(" in ")
+        else:
+            offset = "0"
+            (func, result) = result.split(" in ")
+        func_str = f"{func}<+{offset}> ()"
+    except:
+        return f"??? ({result})"
+
+    # Example: Line 321 of "../util/coroutine-ucontext.c" starts at address
+    # 0x55cf3894d993 <qemu_coroutine_switch+99> and ends at 0x55cf3894d9ab
+    # <qemu_coroutine_switch+123>.
+    result = gdb.execute(f"info line *{hex(addr)}", to_string=True).strip()
+    if not result.startswith("Line "):
+        return func_str
+    result = result[5:]
+
+    try:
+        result = result.split(" starts ")[0]
+        (line, path) = result.split(" of ")
+        path = path.replace("\"", "")
+    except:
+        return func_str
+
+    return f"{func_str} at {path}:{line}"
+
+def dump_backtrace(regs):
+    '''
+    Backtrace dump with raw registers, mimic GDB command 'bt'.
+    '''
+    # Here only rbp and rip that matter..
+    rbp = regs['rbp']
+    rip = regs['rip']
+    i = 0
+
+    while rbp:
+        # For all return addresses on stack, we want to look up symbol/line
+        # on the CALL command, because the return address is the next
+        # instruction instead of the CALL. Here -1 would work for any
+        # sized CALL instruction.
+        print(f"#{i} {hex(rip)} in {symbol_lookup(rip if i == 0 else rip-1)}")
+        rip = gdb.parse_and_eval(f"*(uint64_t *)(uint64_t)({hex(rbp)} + 8)")
+        rbp = gdb.parse_and_eval(f"*(uint64_t *)(uint64_t)({hex(rbp)})")
+        i += 1
+
+def dump_backtrace_live(regs):
+    '''
+    Backtrace dump with gdb's 'bt' command, only usable in a live session.
+    '''
     old = dict()
     # remember current stack frame and select the topmost
@@ -69,6 +120,17 @@ def bt_jmpbuf(jmpbuf):
     selected_frame.select()
+def bt_jmpbuf(jmpbuf):
+    '''Backtrace a jmpbuf'''
+    regs = get_jmpbuf_regs(jmpbuf)
+    try:
+        # This reuses gdb's "bt" command, which can be slightly prettier
+        # but only works with live sessions.
+        dump_backtrace_live(regs)
+    except:
+        # If above doesn't work, fallback to poor man's unwind
+        dump_backtrace(regs)
+
 def co_cast(co):
     return co.cast(gdb.lookup_type('CoroutineUContext').pointer())
@@ -101,10 +163,15 @@ def invoke(self, arg, from_tty):
         gdb.execute("bt")
-        if gdb.parse_and_eval("qemu_in_coroutine()") == False:
-            return
+        try:
+            # This only works with a live session
+            co_ptr = gdb.parse_and_eval("qemu_coroutine_self()")
+        except:
+            # Fallback to use hard-coded ucontext vars if it's coredump
+            co_ptr = gdb.parse_and_eval("co_tls_current")
-        co_ptr = gdb.parse_and_eval("qemu_coroutine_self()")
+        if co_ptr == False:
+            return
         while True:
             co = co_cast(co_ptr)
--
2.47.0
* Re: [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps
2024-12-12 20:47 [PATCH v2 0/3] scripts/qemu-gdb: Make coroutine dumps to work with coredumps Peter Xu
` (2 preceding siblings ...)
2024-12-12 20:48 ` [PATCH v2 3/3] scripts/qemu-gdb: Support coroutine dumps in coredumps Peter Xu
@ 2025-01-22 13:50 ` Kevin Wolf
3 siblings, 0 replies; 5+ messages in thread
From: Kevin Wolf @ 2025-01-22 13:50 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Stefan Hajnoczi, Fabiano Rosas, Paolo Bonzini,
Peter Maydell, s_sourceforge, Maxim Levitsky
On 12.12.2024 at 21:47, Peter Xu wrote:
> v1: https://lore.kernel.org/r/20241211201739.1380222-1-peterx@redhat.com
>
> Changelog: in v1 the commit message was accidentally cut off; that is now
> fixed (along with some small touch-ups elsewhere). While at it, I also tried
> to make the output look as close to gdb's bt as possible, so it now looks
> like this:
>
> Coroutine at 0x7f9f4c57c748:
> #0 0x55ae6c0dc9a8 in qemu_coroutine_switch<+120> () at ../util/coroutine-ucontext.c:321
> #1 0x55ae6c0da2f8 in qemu_aio_coroutine_enter<+356> () at ../util/qemu-coroutine.c:293
> #2 0x55ae6c0da3f1 in qemu_coroutine_enter<+34> () at ../util/qemu-coroutine.c:316
> #3 0x55ae6baf775e in migration_incoming_process<+43> () at ../migration/migration.c:876
> #4 0x55ae6baf7ab4 in migration_ioc_process_incoming<+490> () at ../migration/migration.c:1008
> #5 0x55ae6bae9ae7 in migration_channel_process_incoming<+145> () at ../migration/channel.c:45
> #6 0x55ae6bb18e35 in socket_accept_incoming_migration<+118> () at ../migration/socket.c:132
> #7 0x55ae6be939ef in qio_net_listener_channel_func<+131> () at ../io/net-listener.c:54
> #8 0x55ae6be8ce1a in qio_channel_fd_source_dispatch<+78> () at ../io/channel-watch.c:84
> #9 0x7f9f5b26728c in g_main_context_dispatch_unlocked.lto_priv<+315> ()
> #10 0x7f9f5b267555 in g_main_context_dispatch<+36> ()
> #11 0x55ae6c0d91a7 in glib_pollfds_poll<+90> () at ../util/main-loop.c:287
> #12 0x55ae6c0d9235 in os_host_main_loop_wait<+128> () at ../util/main-loop.c:310
> #13 0x55ae6c0d9364 in main_loop_wait<+203> () at ../util/main-loop.c:589
> #14 0x55ae6bac212a in qemu_main_loop<+41> () at ../system/runstate.c:835
> #15 0x55ae6bfdf522 in qemu_default_main<+19> () at ../system/main.c:37
> #16 0x55ae6bfdf55f in main<+40> () at ../system/main.c:48
> #17 0x7f9f59d42248 in __libc_start_call_main<+119> ()
> #18 0x7f9f59d4230b in __libc_start_main_impl<+138> ()
>
> Coroutines are used in many places in the block layer. They are also used in
> live migration on the destination side, and when diagnosing a crash within a
> coroutine it is handy to also know what the other coroutines are doing.
>
> This series adds initial support for that; it is not pretty, but it should
> work. Since we can't use the trick of modifying registers on the fly in
> non-live gdb sessions, we unwind manually.
>
> One thing to mention is that there's a similar but more generic solution
> mentioned on the list by Niall:
>
> https://lore.kernel.org/r/f0ebccca-7a17-4da8-ac4a-71cf6d69abc3@mtasv.net
>
> That approach is more generic, but it adds more dependencies on both gdb and
> qemu in the future. So this series is an intermediate quick solution for now,
> which should work with most older qemu/gdb binaries too.
>
> Thanks,
>
> Peter Xu (3):
> scripts/qemu-gdb: Always do full stack dump for python errors
> scripts/qemu-gdb: Simplify fs_base fetching for coroutines
> scripts/qemu-gdb: Support coroutine dumps in coredumps
Thanks, applied to the block branch.
Kevin