qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands
@ 2025-11-25 14:21 andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 1/4] scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses andrey.drobyshev
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: andrey.drobyshev @ 2025-11-25 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, peterx, stefanha, vsementsov, den, andrey.drobyshev

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Commands 'qemu mtree' and 'qemu timers' are currently not working.
Luckily, fixing them requires minimal effort.

The most interesting part is adding '--detailed' option into 'qemu bt'
and 'qemu coroutine' commands.  That is based on the previous downstream
work by Vladimir.  As this approach is much heavier than the current
stack unwinding, we add it as non-default behaviour.

Andrey Drobyshev (4):
  scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit
    addresses
  scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command
  scripts/qemugdb: timers: Improve 'qemu timers' command readability
  scripts/qemugdb: coroutine: Add option for obtaining detailed trace in
    coredump

 scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
 scripts/qemugdb/mtree.py     |   2 +-
 scripts/qemugdb/timers.py    |  54 ++++++++++++---
 3 files changed, 162 insertions(+), 20 deletions(-)

-- 
2.43.5



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses
  2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
@ 2025-11-25 14:21 ` andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 2/4] scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command andrey.drobyshev
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: andrey.drobyshev @ 2025-11-25 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, peterx, stefanha, vsementsov, den, andrey.drobyshev

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

The 'qemu mtree' command fails with "OverflowError: int too big to
convert" when memory regions have 128-bit addresses.

Fix by changing conversion base from 16 to 0 (automatic detection based
on string prefix).  This works more reliably in GDB's embedded
Python.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 scripts/qemugdb/mtree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qemugdb/mtree.py b/scripts/qemugdb/mtree.py
index 8fe42c3c12..77603c04b1 100644
--- a/scripts/qemugdb/mtree.py
+++ b/scripts/qemugdb/mtree.py
@@ -25,7 +25,7 @@ def int128(p):
     if p.type.code == gdb.TYPE_CODE_STRUCT:
         return int(p['lo']) + (int(p['hi']) << 64)
     else:
-        return int(("%s" % p), 16)
+        return int(("%s" % p), 0)
 
 class MtreeCommand(gdb.Command):
     '''Display the memory tree hierarchy'''
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command
  2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 1/4] scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses andrey.drobyshev
@ 2025-11-25 14:21 ` andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 3/4] scripts/qemugdb: timers: Improve 'qemu timers' command readability andrey.drobyshev
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: andrey.drobyshev @ 2025-11-25 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, peterx, stefanha, vsementsov, den, andrey.drobyshev

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Currently invoking 'qemu timers' command results into: "gdb.error: There
is no member named last".  Let's remove the legacy 'last' field from
QEMUClock, as it was removed in v4.2.0 by the commit 3c2d4c8aa6a
("timer: last, remove last bits of last").

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 scripts/qemugdb/timers.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/qemugdb/timers.py b/scripts/qemugdb/timers.py
index 5714f92cc2..1219a96b32 100644
--- a/scripts/qemugdb/timers.py
+++ b/scripts/qemugdb/timers.py
@@ -36,10 +36,9 @@ def dump_timers(self, timer):
 
     def process_timerlist(self, tlist, ttype):
         gdb.write("Processing %s timers\n" % (ttype))
-        gdb.write("  clock %s is enabled:%s, last:%s\n" % (
+        gdb.write("  clock %s is enabled:%s\n" % (
             tlist['clock']['type'],
-            tlist['clock']['enabled'],
-            tlist['clock']['last']))
+            tlist['clock']['enabled']))
         if int(tlist['active_timers']) > 0:
             self.dump_timers(tlist['active_timers'])
 
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] scripts/qemugdb: timers: Improve 'qemu timers' command readability
  2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 1/4] scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 2/4] scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command andrey.drobyshev
@ 2025-11-25 14:21 ` andrey.drobyshev
  2025-11-25 14:21 ` [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump andrey.drobyshev
  2025-11-26 20:59 ` [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands Stefan Hajnoczi
  4 siblings, 0 replies; 18+ messages in thread
From: andrey.drobyshev @ 2025-11-25 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, peterx, stefanha, vsementsov, den, andrey.drobyshev

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

* Add the 'attributes' field from QEMUTimer;
* Stringify the field's value in accordance with macros from
  include/qemu/timer.h;
* Make timer expiration times human-readable by converting from nanoseconds
  to appropriate units (ms/s/min/hrs/days) and showing the scale factor
  (ns/us/ms/s).

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 scripts/qemugdb/timers.py | 49 +++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/scripts/qemugdb/timers.py b/scripts/qemugdb/timers.py
index 1219a96b32..916c71b74a 100644
--- a/scripts/qemugdb/timers.py
+++ b/scripts/qemugdb/timers.py
@@ -21,14 +21,53 @@ def __init__(self):
         gdb.Command.__init__(self, 'qemu timers', gdb.COMMAND_DATA,
                              gdb.COMPLETE_NONE)
 
+    def _format_expire_time(self, expire_time, scale):
+        "Return human-readable expiry time (ns) with scale info."
+        secs = expire_time / 1e9
+
+        # Select unit and compute value
+        if secs < 1:
+            val, unit = secs * 1000, "ms"
+        elif secs < 60:
+            val, unit = secs, "s"
+        elif secs < 3600:
+            val, unit = secs / 60, "min"
+        elif secs < 86400:
+            val, unit = secs / 3600, "hrs"
+        else:
+            val, unit = secs / 86400, "days"
+
+        scale_map = {1: "ns", 1000: "us", 1000000: "ms",
+                     1000000000: "s"}
+        scale_str = scale_map.get(scale, f"scale={scale}")
+        return f"{val:.2f} {unit} [{scale_str}]"
+
+    def _format_attribute(self, attr):
+        "Given QEMUTimer attributes value, return a human-readable string"
+
+        # From include/qemu/timer.h
+        if attr == 0:
+            value = 'NONE'
+        elif attr == 1 << 0:
+            value = 'ATTR_EXTERNAL'
+        elif attr == int(0xffffffff):
+            value = 'ATTR_ALL'
+        else:
+            value = 'UNKNOWN'
+        return f'{attr} <{value}>'
+
     def dump_timers(self, timer):
         "Follow a timer and recursively dump each one in the list."
         # timer should be of type QemuTimer
-        gdb.write("    timer %s/%s (cb:%s,opq:%s)\n" % (
-            timer['expire_time'],
-            timer['scale'],
-            timer['cb'],
-            timer['opaque']))
+        scale = int(timer['scale'])
+        expire_time = int(timer['expire_time'])
+        attributes = int(timer['attributes'])
+
+        time_str = self._format_expire_time(expire_time, scale)
+        attr_str = self._format_attribute(attributes)
+
+        gdb.write(f"    timer at {time_str} (attr:{attr_str}, "
+                  f"cb:{timer['cb']}, opq:{timer['opaque']})\n")
 
         if int(timer['next']) > 0:
             self.dump_timers(timer['next'])
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
                   ` (2 preceding siblings ...)
  2025-11-25 14:21 ` [PATCH 3/4] scripts/qemugdb: timers: Improve 'qemu timers' command readability andrey.drobyshev
@ 2025-11-25 14:21 ` andrey.drobyshev
  2025-11-26 20:58   ` Stefan Hajnoczi
  2025-11-27  9:56   ` Kevin Wolf
  2025-11-26 20:59 ` [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands Stefan Hajnoczi
  4 siblings, 2 replies; 18+ messages in thread
From: andrey.drobyshev @ 2025-11-25 14:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, peterx, stefanha, vsementsov, den, andrey.drobyshev

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
coredumps") introduced coroutine traces in coredumps using raw stack
unwinding.  While this works, this approach does not allow to view the
function arguments in the corresponding stack frames.

As an alternative, we can obtain saved registers from the coroutine's
jmpbuf, copy the original coredump file into a temporary file, patch the
saved registers into the tmp coredump's struct elf_prstatus and execute
another gdb subprocess to get backtrace from the patched temporary coredump.

While providing more detailed info, this alternative approach, however, is
quite heavyweight as it takes significantly more time and disk space.
So, instead of making it a new default, let's keep raw unwind the default
behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
command which would enforce the new behaviour.

That's how this looks:

  (gdb) qemu coroutine 0x7fda9335a508
  #0  0x5602bdb41c26 in qemu_coroutine_switch<+214> () at ../util/coroutine-ucontext.c:321
  #1  0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at ../util/qemu-coroutine.c:293
  #2  0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
  #3  0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
  #4  0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
  #5  0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
  #6  0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
  #7  0x5602bdb18a24 in qemu_thread_start<+355> () at ../util/qemu-thread-posix.c:393

  (gdb) qemu coroutine 0x7fda9335a508 --detailed
  patching core file /tmp/tmpq4hmk2qc
  found "CORE" at 0x10c48
  assume pt_regs at 0x10cbc
  write r15 at 0x10cbc
  write r14 at 0x10cc4
  write r13 at 0x10ccc
  write r12 at 0x10cd4
  write rbp at 0x10cdc
  write rbx at 0x10ce4
  write rip at 0x10d3c
  write rsp at 0x10d54

  #0  0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508, to_=0x7fda8400c280, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:321
  #1  0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0, co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
  #2  0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at ../util/async.c:547
  #3  0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at ../util/async.c:172
  #4  0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at ../util/async.c:219
  #5  0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at ../util/aio-posix.c:719
  #6  0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at ../iothread.c:63
  #7  0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at ../util/qemu-thread-posix.c:393
  #8  0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at pthread_create.c:443
  #9  0x00007fda9e83f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Peter Xu <peterx@redhat.com>
Originally-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
 1 file changed, 115 insertions(+), 11 deletions(-)

diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index e98fc48a4b..b1c7f96af5 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -10,6 +10,13 @@
 # or later.  See the COPYING file in the top-level directory.
 
 import gdb
+import os
+import re
+import struct
+import shutil
+import subprocess
+import tempfile
+import textwrap
 
 VOID_PTR = gdb.lookup_type('void').pointer()
 
@@ -77,6 +84,65 @@ def symbol_lookup(addr):
 
     return f"{func_str} at {path}:{line}"
 
+def write_regs_to_coredump(corefile, set_regs):
+    # asm/ptrace.h
+    pt_regs = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10',
+               'r9', 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax',
+               'rip', 'cs', 'eflags', 'rsp', 'ss']
+
+    with open(corefile, 'r+b') as f:
+        gdb.write(f'patching core file {corefile}\n')
+
+        while f.read(4) != b'CORE':
+            pass
+        gdb.write(f'found "CORE" at 0x{f.tell():x}\n')
+
+        # Looking for struct elf_prstatus and pr_reg field in it (an array
+        # of general purpose registers).  See sys/procfs.h
+
+        # lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
+        f.seek(4, 1)
+        # lseek(f.fileno(), 112, SEEK_CUR): offsetof(struct elf_prstatus, pr_reg)
+        f.seek(112, 1)
+
+        gdb.write(f'assume pt_regs at 0x{f.tell():x}\n')
+        for reg in pt_regs:
+            if reg in set_regs:
+                gdb.write(f'write {reg} at 0x{f.tell():x}\n')
+                f.write(struct.pack('q', set_regs[reg]))
+            else:
+                # lseek(f.fileno(), 8, SEEK_CUR): go to the next reg
+                f.seek(8, 1)
+
+def clone_coredump(source, target, set_regs):
+    shutil.copyfile(source, target)
+    write_regs_to_coredump(target, set_regs)
+
+def dump_backtrace_patched(regs):
+    files = gdb.execute('info files', False, True).split('\n')
+    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
+    dump = re.search("`(.*)'", files[2]).group(1)
+
+    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
+        tmpcore = f.name
+
+    clone_coredump(dump, tmpcore, regs)
+
+    cmd = ['script', '-qec',
+           'gdb -batch ' +
+           '-ex "set complaints 0" ' +
+           '-ex "set verbose off" ' +
+           '-ex "set style enabled on" ' +
+           '-ex "python print(\'----split----\')" ' +
+           f'-ex bt {executable} {tmpcore}',
+           '/dev/null']
+    out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
+    out = out.split(b'----split----')[1].decode('utf-8')
+
+    os.remove(tmpcore)
+
+    gdb.write(out)
+
 def dump_backtrace(regs):
     '''
     Backtrace dump with raw registers, mimic GDB command 'bt'.
@@ -120,7 +186,7 @@ def dump_backtrace_live(regs):
 
     selected_frame.select()
 
-def bt_jmpbuf(jmpbuf):
+def bt_jmpbuf(jmpbuf, detailed=False):
     '''Backtrace a jmpbuf'''
     regs = get_jmpbuf_regs(jmpbuf)
     try:
@@ -128,8 +194,12 @@ def bt_jmpbuf(jmpbuf):
         # but only works with live sessions.
         dump_backtrace_live(regs)
     except:
-        # If above doesn't work, fallback to poor man's unwind
-        dump_backtrace(regs)
+        if detailed:
+            # Obtain detailed trace by patching regs in copied coredump
+            dump_backtrace_patched(regs)
+        else:
+            # If above doesn't work, fallback to poor man's unwind
+            dump_backtrace(regs)
 
 def co_cast(co):
     return co.cast(gdb.lookup_type('CoroutineUContext').pointer())
@@ -140,26 +210,60 @@ def coroutine_to_jmpbuf(co):
 
 
 class CoroutineCommand(gdb.Command):
-    '''Display coroutine backtrace'''
+    __doc__ = textwrap.dedent("""\
+        Display coroutine backtrace
+
+        Usage: qemu coroutine COROPTR [--detailed]
+        Show backtrace for a coroutine specified by COROPTR
+
+          --detailed       obtain detailed trace by copying coredump, patching
+                           regs in it, and runing gdb subprocess to get
+                           backtrace from the patched coredump
+        """)
+
     def __init__(self):
         gdb.Command.__init__(self, 'qemu coroutine', gdb.COMMAND_DATA,
                              gdb.COMPLETE_NONE)
 
+    def _usage(self):
+        gdb.write('usage: qemu coroutine <coroutine-pointer> [--detailed]\n')
+        return
+
     def invoke(self, arg, from_tty):
         argv = gdb.string_to_argv(arg)
-        if len(argv) != 1:
-            gdb.write('usage: qemu coroutine <coroutine-pointer>\n')
-            return
+        argc = len(argv)
+        if argc == 0 or argc > 2 or (argc == 2 and argv[1] != '--detailed'):
+            return self._usage()
+        detailed = True if argc == 2 else False
 
-        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])))
+        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])),
+                  detailed=detailed)
 
 class CoroutineBt(gdb.Command):
-    '''Display backtrace including coroutine switches'''
+    __doc__ = textwrap.dedent("""\
+        Display backtrace including coroutine switches
+
+        Usage: qemu bt [--detailed]
+
+          --detailed       obtain detailed trace by copying coredump, patching
+                           regs in it, and runing gdb subprocess to get
+                           backtrace from the patched coredump
+        """)
+
     def __init__(self):
         gdb.Command.__init__(self, 'qemu bt', gdb.COMMAND_STACK,
                              gdb.COMPLETE_NONE)
 
+    def _usage(self):
+        gdb.write('usage: qemu bt [--detailed]\n')
+        return
+
     def invoke(self, arg, from_tty):
+        argv = gdb.string_to_argv(arg)
+        argc = len(argv)
+        if argc > 1 or (argc == 1 and argv[0] != '--detailed'):
+            return self._usage()
+        detailed = True if argc == 1 else False
 
         gdb.execute("bt")
 
@@ -178,8 +282,8 @@ def invoke(self, arg, from_tty):
             co_ptr = co["base"]["caller"]
             if co_ptr == 0:
                 break
-            gdb.write("Coroutine at " + str(co_ptr) + ":\n")
-            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr))
+            gdb.write("\nCoroutine at " + str(co_ptr) + ":\n")
+            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr), detailed=detailed)
 
 class CoroutineSPFunction(gdb.Function):
     def __init__(self):
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-25 14:21 ` [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump andrey.drobyshev
@ 2025-11-26 20:58   ` Stefan Hajnoczi
  2025-11-27 13:14     ` Andrey Drobyshev
  2025-11-27  9:56   ` Kevin Wolf
  1 sibling, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2025-11-26 20:58 UTC (permalink / raw)
  To: andrey.drobyshev; +Cc: qemu-devel, kwolf, peterx, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 11106 bytes --]

On Tue, Nov 25, 2025 at 04:21:05PM +0200, andrey.drobyshev@virtuozzo.com wrote:
> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> 
> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> coredumps") introduced coroutine traces in coredumps using raw stack
> unwinding.  While this works, this approach does not allow to view the
> function arguments in the corresponding stack frames.
> 
> As an alternative, we can obtain saved registers from the coroutine's
> jmpbuf, copy the original coredump file into a temporary file, patch the
> saved registers into the tmp coredump's struct elf_prstatus and execute
> another gdb subprocess to get backtrace from the patched temporary coredump.
> 
> While providing more detailed info, this alternative approach, however, is
> quite heavyweight as it takes significantly more time and disk space.
> So, instead of making it a new default, let's keep raw unwind the default
> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> command which would enforce the new behaviour.

Wow, that's a big hack around GDB limitations but I don't see any harm
in offering this as an option.

> 
> That's how this looks:
> 
>   (gdb) qemu coroutine 0x7fda9335a508
>   #0  0x5602bdb41c26 in qemu_coroutine_switch<+214> () at ../util/coroutine-ucontext.c:321
>   #1  0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at ../util/qemu-coroutine.c:293
>   #2  0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
>   #3  0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
>   #4  0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
>   #5  0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
>   #6  0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
>   #7  0x5602bdb18a24 in qemu_thread_start<+355> () at ../util/qemu-thread-posix.c:393
> 
>   (gdb) qemu coroutine 0x7fda9335a508 --detailed
>   patching core file /tmp/tmpq4hmk2qc
>   found "CORE" at 0x10c48
>   assume pt_regs at 0x10cbc
>   write r15 at 0x10cbc
>   write r14 at 0x10cc4
>   write r13 at 0x10ccc
>   write r12 at 0x10cd4
>   write rbp at 0x10cdc
>   write rbx at 0x10ce4
>   write rip at 0x10d3c
>   write rsp at 0x10d54
> 
>   #0  0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508, to_=0x7fda8400c280, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:321
>   #1  0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0, co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
>   #2  0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at ../util/async.c:547
>   #3  0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at ../util/async.c:172
>   #4  0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at ../util/async.c:219
>   #5  0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at ../util/aio-posix.c:719
>   #6  0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at ../iothread.c:63
>   #7  0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at ../util/qemu-thread-posix.c:393
>   #8  0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at pthread_create.c:443
>   #9  0x00007fda9e83f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> 
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> CC: Peter Xu <peterx@redhat.com>
> Originally-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>  scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
>  1 file changed, 115 insertions(+), 11 deletions(-)
> 
> diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
> index e98fc48a4b..b1c7f96af5 100644
> --- a/scripts/qemugdb/coroutine.py
> +++ b/scripts/qemugdb/coroutine.py
> @@ -10,6 +10,13 @@
>  # or later.  See the COPYING file in the top-level directory.
>  
>  import gdb
> +import os
> +import re
> +import struct
> +import shutil
> +import subprocess
> +import tempfile
> +import textwrap
>  
>  VOID_PTR = gdb.lookup_type('void').pointer()
>  
> @@ -77,6 +84,65 @@ def symbol_lookup(addr):
>  
>      return f"{func_str} at {path}:{line}"
>  
> +def write_regs_to_coredump(corefile, set_regs):
> +    # asm/ptrace.h
> +    pt_regs = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10',
> +               'r9', 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax',
> +               'rip', 'cs', 'eflags', 'rsp', 'ss']
> +
> +    with open(corefile, 'r+b') as f:
> +        gdb.write(f'patching core file {corefile}\n')
> +
> +        while f.read(4) != b'CORE':
> +            pass
> +        gdb.write(f'found "CORE" at 0x{f.tell():x}\n')
> +
> +        # Looking for struct elf_prstatus and pr_reg field in it (an array
> +        # of general purpose registers).  See sys/procfs.h
> +
> +        # lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
> +        f.seek(4, 1)
> +        # lseek(f.fileno(), 112, SEEK_CUR): offsetof(struct elf_prstatus, pr_reg)
> +        f.seek(112, 1)
> +
> +        gdb.write(f'assume pt_regs at 0x{f.tell():x}\n')
> +        for reg in pt_regs:
> +            if reg in set_regs:
> +                gdb.write(f'write {reg} at 0x{f.tell():x}\n')
> +                f.write(struct.pack('q', set_regs[reg]))
> +            else:
> +                # lseek(f.fileno(), 8, SEEK_CUR): go to the next reg
> +                f.seek(8, 1)
> +
> +def clone_coredump(source, target, set_regs):
> +    shutil.copyfile(source, target)
> +    write_regs_to_coredump(target, set_regs)
> +
> +def dump_backtrace_patched(regs):
> +    files = gdb.execute('info files', False, True).split('\n')
> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
> +    dump = re.search("`(.*)'", files[2]).group(1)
> +
> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
> +        tmpcore = f.name
> +
> +    clone_coredump(dump, tmpcore, regs)
> +
> +    cmd = ['script', '-qec',
> +           'gdb -batch ' +
> +           '-ex "set complaints 0" ' +
> +           '-ex "set verbose off" ' +
> +           '-ex "set style enabled on" ' +
> +           '-ex "python print(\'----split----\')" ' +
> +           f'-ex bt {executable} {tmpcore}',
> +           '/dev/null']
> +    out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)

Is script(1) necessary or just something you used for debugging?

On Fedora 43 the script(1) utility isn't installed by default. Due to
its generic name it's also a little hard to find the package name
online. It would be nice to print a help message pointing to the
packages. From what I can tell, script(1) is available in
util-linux-script on Red Hat-based distros, bsdutils on Debian-based
distros, and util-linux on Arch.

> +    out = out.split(b'----split----')[1].decode('utf-8')
> +
> +    os.remove(tmpcore)
> +
> +    gdb.write(out)
> +
>  def dump_backtrace(regs):
>      '''
>      Backtrace dump with raw registers, mimic GDB command 'bt'.
> @@ -120,7 +186,7 @@ def dump_backtrace_live(regs):
>  
>      selected_frame.select()
>  
> -def bt_jmpbuf(jmpbuf):
> +def bt_jmpbuf(jmpbuf, detailed=False):
>      '''Backtrace a jmpbuf'''
>      regs = get_jmpbuf_regs(jmpbuf)
>      try:
> @@ -128,8 +194,12 @@ def bt_jmpbuf(jmpbuf):
>          # but only works with live sessions.
>          dump_backtrace_live(regs)
>      except:
> -        # If above doesn't work, fallback to poor man's unwind
> -        dump_backtrace(regs)
> +        if detailed:
> +            # Obtain detailed trace by patching regs in copied coredump
> +            dump_backtrace_patched(regs)
> +        else:
> +            # If above doesn't work, fallback to poor man's unwind
> +            dump_backtrace(regs)
>  
>  def co_cast(co):
>      return co.cast(gdb.lookup_type('CoroutineUContext').pointer())
> @@ -140,26 +210,60 @@ def coroutine_to_jmpbuf(co):
>  
>  
>  class CoroutineCommand(gdb.Command):
> -    '''Display coroutine backtrace'''
> +    __doc__ = textwrap.dedent("""\
> +        Display coroutine backtrace
> +
> +        Usage: qemu coroutine COROPTR [--detailed]
> +        Show backtrace for a coroutine specified by COROPTR
> +
> +          --detailed       obtain detailed trace by copying coredump, patching
> +                           regs in it, and runing gdb subprocess to get
> +                           backtrace from the patched coredump
> +        """)
> +
>      def __init__(self):
>          gdb.Command.__init__(self, 'qemu coroutine', gdb.COMMAND_DATA,
>                               gdb.COMPLETE_NONE)
>  
> +    def _usage(self):
> +        gdb.write('usage: qemu coroutine <coroutine-pointer> [--detailed]\n')
> +        return
> +
>      def invoke(self, arg, from_tty):
>          argv = gdb.string_to_argv(arg)
> -        if len(argv) != 1:
> -            gdb.write('usage: qemu coroutine <coroutine-pointer>\n')
> -            return
> +        argc = len(argv)
> +        if argc == 0 or argc > 2 or (argc == 2 and argv[1] != '--detailed'):
> +            return self._usage()
> +        detailed = True if argc == 2 else False
>  
> -        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])))
> +        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])),
> +                  detailed=detailed)
>  
>  class CoroutineBt(gdb.Command):
> -    '''Display backtrace including coroutine switches'''
> +    __doc__ = textwrap.dedent("""\
> +        Display backtrace including coroutine switches
> +
> +        Usage: qemu bt [--detailed]
> +
> +          --detailed       obtain detailed trace by copying coredump, patching
> +                           regs in it, and runing gdb subprocess to get
> +                           backtrace from the patched coredump
> +        """)
> +
>      def __init__(self):
>          gdb.Command.__init__(self, 'qemu bt', gdb.COMMAND_STACK,
>                               gdb.COMPLETE_NONE)
>  
> +    def _usage(self):
> +        gdb.write('usage: qemu bt [--detailed]\n')
> +        return
> +
>      def invoke(self, arg, from_tty):
> +        argv = gdb.string_to_argv(arg)
> +        argc = len(argv)
> +        if argc > 1 or (argc == 1 and argv[0] != '--detailed'):
> +            return self._usage()
> +        detailed = True if argc == 1 else False
>  
>          gdb.execute("bt")
>  
> @@ -178,8 +282,8 @@ def invoke(self, arg, from_tty):
>              co_ptr = co["base"]["caller"]
>              if co_ptr == 0:
>                  break
> -            gdb.write("Coroutine at " + str(co_ptr) + ":\n")
> -            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr))
> +            gdb.write("\nCoroutine at " + str(co_ptr) + ":\n")
> +            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr), detailed=detailed)
>  
>  class CoroutineSPFunction(gdb.Function):
>      def __init__(self):
> -- 
> 2.43.5
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands
  2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
                   ` (3 preceding siblings ...)
  2025-11-25 14:21 ` [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump andrey.drobyshev
@ 2025-11-26 20:59 ` Stefan Hajnoczi
  4 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2025-11-26 20:59 UTC (permalink / raw)
  To: andrey.drobyshev; +Cc: qemu-devel, kwolf, peterx, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

On Tue, Nov 25, 2025 at 04:21:01PM +0200, andrey.drobyshev@virtuozzo.com wrote:
> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> 
> Commands 'qemu mtree' and 'qemu timers' are currently not working.
> Luckily, fixing them requires minimal effort.
> 
> The most interesting part is adding '--detailed' option into 'qemu bt'
> and 'qemu coroutine' commands.  That is based on the previous downstream
> work by Vladimir.  As this approach is much heavier than the current
> stack unwinding, we add it as non-default behaviour.
> 
> Andrey Drobyshev (4):
>   scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit
>     addresses
>   scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command
>   scripts/qemugdb: timers: Improve 'qemu timers' command readability
>   scripts/qemugdb: coroutine: Add option for obtaining detailed trace in
>     coredump
> 
>  scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
>  scripts/qemugdb/mtree.py     |   2 +-
>  scripts/qemugdb/timers.py    |  54 ++++++++++++---
>  3 files changed, 162 insertions(+), 20 deletions(-)
> 
> -- 
> 2.43.5
> 

Aside from the question about script(1):
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-25 14:21 ` [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump andrey.drobyshev
  2025-11-26 20:58   ` Stefan Hajnoczi
@ 2025-11-27  9:56   ` Kevin Wolf
  2025-11-27 10:02     ` Daniel P. Berrangé
  1 sibling, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2025-11-27  9:56 UTC (permalink / raw)
  To: andrey.drobyshev; +Cc: qemu-devel, peterx, stefanha, vsementsov, den

Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> 
> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> coredumps") introduced coroutine traces in coredumps using raw stack
> unwinding.  While this works, this approach does not allow to view the
> function arguments in the corresponding stack frames.
> 
> As an alternative, we can obtain saved registers from the coroutine's
> jmpbuf, copy the original coredump file into a temporary file, patch the
> saved registers into the tmp coredump's struct elf_prstatus and execute
> another gdb subprocess to get backtrace from the patched temporary coredump.
> 
> While providing more detailed info, this alternative approach, however, is
> quite heavyweight as it takes significantly more time and disk space.
> So, instead of making it a new default, let's keep raw unwind the default
> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> command which would enforce the new behaviour.
> [...]

> +def clone_coredump(source, target, set_regs):
> +    shutil.copyfile(source, target)
> +    write_regs_to_coredump(target, set_regs)
> +
> +def dump_backtrace_patched(regs):
> +    files = gdb.execute('info files', False, True).split('\n')
> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
> +    dump = re.search("`(.*)'", files[2]).group(1)
> +
> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
> +        tmpcore = f.name
> +
> +    clone_coredump(dump, tmpcore, regs)

I think this is what makes it so heavy, right? Coredumps can be quite
large and /tmp is probably a different filesystem, so you end up really
copying the full size of the coredump around.

Wouldn't it be better in the general case if we could just do a reflink
copy of the coredump and then do only very few writes for updating the
register values? Then the overhead should actually be quite negligible
both in terms of time and disk space.

Kevin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27  9:56   ` Kevin Wolf
@ 2025-11-27 10:02     ` Daniel P. Berrangé
  2025-11-27 14:31       ` Andrey Drobyshev
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel P. Berrangé @ 2025-11-27 10:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: andrey.drobyshev, qemu-devel, peterx, stefanha, vsementsov, den

On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
> Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
> > From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> > 
> > Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> > coredumps") introduced coroutine traces in coredumps using raw stack
> > unwinding.  While this works, this approach does not allow to view the
> > function arguments in the corresponding stack frames.
> > 
> > As an alternative, we can obtain saved registers from the coroutine's
> > jmpbuf, copy the original coredump file into a temporary file, patch the
> > saved registers into the tmp coredump's struct elf_prstatus and execute
> > another gdb subprocess to get backtrace from the patched temporary coredump.
> > 
> > While providing more detailed info, this alternative approach, however, is
> > quite heavyweight as it takes significantly more time and disk space.
> > So, instead of making it a new default, let's keep raw unwind the default
> > behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> > command which would enforce the new behaviour.
> > [...]
> 
> > +def clone_coredump(source, target, set_regs):
> > +    shutil.copyfile(source, target)
> > +    write_regs_to_coredump(target, set_regs)
> > +
> > +def dump_backtrace_patched(regs):
> > +    files = gdb.execute('info files', False, True).split('\n')
> > +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
> > +    dump = re.search("`(.*)'", files[2]).group(1)
> > +
> > +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
> > +        tmpcore = f.name
> > +
> > +    clone_coredump(dump, tmpcore, regs)
> 
> I think this is what makes it so heavy, right? Coredumps can be quite
> large and /tmp is probably a different filesystem, so you end up really
> copying the full size of the coredump around.

On my system /tmp is  tmpfs, so this is actually bringing the whole
coredump into RAM which is not a sensible approach.

> Wouldn't it be better in the general case if we could just do a reflink
> copy of the coredump and then do only very few writes for updating the
> register values? Then the overhead should actually be quite negligible
> both in terms of time and disk space.

Personally I'd be fine with just modifying the core dump in place
most of the time. I don't need to keep the current file untouched,
as it is is just a temporary download acquired from systemd's
coredumpctl, or from a bug tracker. 


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-26 20:58   ` Stefan Hajnoczi
@ 2025-11-27 13:14     ` Andrey Drobyshev
  2025-11-27 14:48       ` Stefan Hajnoczi
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Drobyshev @ 2025-11-27 13:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, kwolf, peterx, vsementsov, den

On 11/26/25 10:58 PM, Stefan Hajnoczi wrote:
> On Tue, Nov 25, 2025 at 04:21:05PM +0200, andrey.drobyshev@virtuozzo.com wrote:
>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>
>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>> coredumps") introduced coroutine traces in coredumps using raw stack
>> unwinding.  While this works, this approach does not allow to view the
>> function arguments in the corresponding stack frames.
>>
>> As an alternative, we can obtain saved registers from the coroutine's
>> jmpbuf, copy the original coredump file into a temporary file, patch the
>> saved registers into the tmp coredump's struct elf_prstatus and execute
>> another gdb subprocess to get backtrace from the patched temporary coredump.
>>
>> While providing more detailed info, this alternative approach, however, is
>> quite heavyweight as it takes significantly more time and disk space.
>> So, instead of making it a new default, let's keep raw unwind the default
>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
>> command which would enforce the new behaviour.
> 
> Wow, that's a big hack around GDB limitations but I don't see any harm
> in offering this as an option.
> 
>>
>> That's how this looks:
>>
>>   (gdb) qemu coroutine 0x7fda9335a508
>>   #0  0x5602bdb41c26 in qemu_coroutine_switch<+214> () at ../util/coroutine-ucontext.c:321
>>   #1  0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at ../util/qemu-coroutine.c:293
>>   #2  0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
>>   #3  0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
>>   #4  0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
>>   #5  0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
>>   #6  0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
>>   #7  0x5602bdb18a24 in qemu_thread_start<+355> () at ../util/qemu-thread-posix.c:393
>>
>>   (gdb) qemu coroutine 0x7fda9335a508 --detailed
>>   patching core file /tmp/tmpq4hmk2qc
>>   found "CORE" at 0x10c48
>>   assume pt_regs at 0x10cbc
>>   write r15 at 0x10cbc
>>   write r14 at 0x10cc4
>>   write r13 at 0x10ccc
>>   write r12 at 0x10cd4
>>   write rbp at 0x10cdc
>>   write rbx at 0x10ce4
>>   write rip at 0x10d3c
>>   write rsp at 0x10d54
>>
>>   #0  0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508, to_=0x7fda8400c280, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:321
>>   #1  0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0, co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
>>   #2  0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at ../util/async.c:547
>>   #3  0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at ../util/async.c:172
>>   #4  0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at ../util/async.c:219
>>   #5  0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at ../util/aio-posix.c:719
>>   #6  0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at ../iothread.c:63
>>   #7  0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at ../util/qemu-thread-posix.c:393
>>   #8  0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at pthread_create.c:443
>>   #9  0x00007fda9e83f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>>
>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> CC: Peter Xu <peterx@redhat.com>
>> Originally-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>  scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
>>  1 file changed, 115 insertions(+), 11 deletions(-)
>>
>> diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
>> index e98fc48a4b..b1c7f96af5 100644
>> --- a/scripts/qemugdb/coroutine.py
>> +++ b/scripts/qemugdb/coroutine.py
>> @@ -10,6 +10,13 @@
>>  # or later.  See the COPYING file in the top-level directory.
>>  
>>  import gdb
>> +import os
>> +import re
>> +import struct
>> +import shutil
>> +import subprocess
>> +import tempfile
>> +import textwrap
>>  
>>  VOID_PTR = gdb.lookup_type('void').pointer()
>>  
>> @@ -77,6 +84,65 @@ def symbol_lookup(addr):
>>  
>>      return f"{func_str} at {path}:{line}"
>>  
>> +def write_regs_to_coredump(corefile, set_regs):
>> +    # asm/ptrace.h
>> +    pt_regs = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10',
>> +               'r9', 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax',
>> +               'rip', 'cs', 'eflags', 'rsp', 'ss']
>> +
>> +    with open(corefile, 'r+b') as f:
>> +        gdb.write(f'patching core file {corefile}\n')
>> +
>> +        while f.read(4) != b'CORE':
>> +            pass
>> +        gdb.write(f'found "CORE" at 0x{f.tell():x}\n')
>> +
>> +        # Looking for struct elf_prstatus and pr_reg field in it (an array
>> +        # of general purpose registers).  See sys/procfs.h
>> +
>> +        # lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
>> +        f.seek(4, 1)
>> +        # lseek(f.fileno(), 112, SEEK_CUR): offsetof(struct elf_prstatus, pr_reg)
>> +        f.seek(112, 1)
>> +
>> +        gdb.write(f'assume pt_regs at 0x{f.tell():x}\n')
>> +        for reg in pt_regs:
>> +            if reg in set_regs:
>> +                gdb.write(f'write {reg} at 0x{f.tell():x}\n')
>> +                f.write(struct.pack('q', set_regs[reg]))
>> +            else:
>> +                # lseek(f.fileno(), 8, SEEK_CUR): go to the next reg
>> +                f.seek(8, 1)
>> +
>> +def clone_coredump(source, target, set_regs):
>> +    shutil.copyfile(source, target)
>> +    write_regs_to_coredump(target, set_regs)
>> +
>> +def dump_backtrace_patched(regs):
>> +    files = gdb.execute('info files', False, True).split('\n')
>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
>> +    dump = re.search("`(.*)'", files[2]).group(1)
>> +
>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
>> +        tmpcore = f.name
>> +
>> +    clone_coredump(dump, tmpcore, regs)
>> +
>> +    cmd = ['script', '-qec',
>> +           'gdb -batch ' +
>> +           '-ex "set complaints 0" ' +
>> +           '-ex "set verbose off" ' +
>> +           '-ex "set style enabled on" ' +
>> +           '-ex "python print(\'----split----\')" ' +
>> +           f'-ex bt {executable} {tmpcore}',
>> +           '/dev/null']
>> +    out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
> 
> Is script(1) necessary or just something you used for debugging?
> 
> On Fedora 43 the script(1) utility isn't installed by default. Due to
> its generic name it's also a little hard to find the package name
> online. It would be nice to print a help message pointing to the
> packages. From what I can tell, script(1) is available in
> util-linux-script on Red Hat-based distros, bsdutils on Debian-based
> distros, and util-linux on Arch.
> 
> [...]
My sole purpose for using script(1) was to make GDB subprocess produce
colored stack trace output, just like what we get when calling 'bt' in a
regular GDB session.  I just find it easier to read.  So, unless there's
an easier way to achieve that same result, I'd prefer to keep using
script(1).

But your point is of course valid -- I didn't think of the case when
script(1) program might not be installed.  Since we're just decorating
the output here, instead of failing with a help message I'd suggest
simply checking whether script(1) binary is present in the system with
smth like shutil.which(), and only using it if it is.  I'll update the
patch accordingly, if there're no objections.

Andrey


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 10:02     ` Daniel P. Berrangé
@ 2025-11-27 14:31       ` Andrey Drobyshev
  2025-11-27 14:55         ` Daniel P. Berrangé
  2025-11-27 16:39         ` Kevin Wolf
  0 siblings, 2 replies; 18+ messages in thread
From: Andrey Drobyshev @ 2025-11-27 14:31 UTC (permalink / raw)
  To: Daniel P. Berrangé, Kevin Wolf
  Cc: qemu-devel, peterx, stefanha, vsementsov, den

On 11/27/25 12:02 PM, Daniel P. Berrangé wrote:
> On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
>> Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
>>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>>
>>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>>> coredumps") introduced coroutine traces in coredumps using raw stack
>>> unwinding.  While this works, this approach does not allow to view the
>>> function arguments in the corresponding stack frames.
>>>
>>> As an alternative, we can obtain saved registers from the coroutine's
>>> jmpbuf, copy the original coredump file into a temporary file, patch the
>>> saved registers into the tmp coredump's struct elf_prstatus and execute
>>> another gdb subprocess to get backtrace from the patched temporary coredump.
>>>
>>> While providing more detailed info, this alternative approach, however, is
>>> quite heavyweight as it takes significantly more time and disk space.
>>> So, instead of making it a new default, let's keep raw unwind the default
>>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
>>> command which would enforce the new behaviour.
>>> [...]
>>
>>> +def clone_coredump(source, target, set_regs):
>>> +    shutil.copyfile(source, target)
>>> +    write_regs_to_coredump(target, set_regs)
>>> +
>>> +def dump_backtrace_patched(regs):
>>> +    files = gdb.execute('info files', False, True).split('\n')
>>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
>>> +    dump = re.search("`(.*)'", files[2]).group(1)
>>> +
>>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
>>> +        tmpcore = f.name
>>> +
>>> +    clone_coredump(dump, tmpcore, regs)
>>
>> I think this is what makes it so heavy, right? Coredumps can be quite
>> large and /tmp is probably a different filesystem, so you end up really
>> copying the full size of the coredump around.
> 
> On my system /tmp is  tmpfs, so this is actually bringing the whole
> coredump into RAM which is not a sensible approach.
> 
>> Wouldn't it be better in the general case if we could just do a reflink
>> copy of the coredump and then do only very few writes for updating the
>> register values? Then the overhead should actually be quite negligible
>> both in terms of time and disk space.
> 

That's correct, copying the file to /tmp takes most of the time with
this approach.

As for reflink copy, this might've been a great solution.  However, it
would largely depend on the FS used.  E.g. in my system coredumpctl
places uncompressed coredump at /var/tmp, which is mounted as ext4.  And
in this case:

# cp --reflink /var/tmp/coredump-MQCZQc /root
cp: failed to clone '/root/coredump-MQCZQc' from
'/var/tmp/coredump-MQCZQc': Invalid cross-device link

# cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
cp: failed to clone '/var/tmp/coredump.ref' from
'/var/tmp/coredump-MQCZQc': Operation not supported

Apparently, ext4 doesn't support reflink copy. xfs and btrfs do.  But I
guess our implementation better be FS-agnostic.
> Personally I'd be fine with just modifying the core dump in place
> most of the time. I don't need to keep the current file untouched,
> as it is is just a temporary download acquired from systemd's
> coredumpctl, or from a bug tracker. 
> 
>

Hmm, that's an interesting proposal.  But I still see some potential
pitfalls with it:

1. When dealing with the core dump stored by coredumpctl, original file
is indeed stored compressed and not being modified.  We don't really
care about the uncompressed temporary dump placed in /var/tmp.  What we
do care about is that current GDB session keeps working smoothly.  I
tried patching the dump in place without copying, and it doesn't seem to
break subsequent commands.  However GDB keeps the temporary dump open
throughout the whole session, which means it can occasionally read
modified data from it.  I'm not sure that we have a solid guarantee that
things will keep working with the patched dump.

2. If we're dealing with an external core dump downloaded from a bug
report, we surely want to be able to create new GDB sessions with it.
That means we'll want its unmodified version.  Having to re-download it
again is even slower than plain copying.

The solution to both problems would be saving original registers and
patching them back into the core dump once we've obtained our coroutine
trace.  It's still potentially fragile in 2nd case if GDB process
abruptly gets killed/dies leaving registers un-restored.  But I guess we
can live with it?

What do you think?


> With regards,
> Daniel



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 13:14     ` Andrey Drobyshev
@ 2025-11-27 14:48       ` Stefan Hajnoczi
  2025-11-27 16:13         ` Andrey Drobyshev
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2025-11-27 14:48 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: qemu-devel, kwolf, peterx, vsementsov, den

[-- Attachment #1: Type: text/plain, Size: 8358 bytes --]

On Thu, Nov 27, 2025 at 03:14:43PM +0200, Andrey Drobyshev wrote:
> On 11/26/25 10:58 PM, Stefan Hajnoczi wrote:
> > On Tue, Nov 25, 2025 at 04:21:05PM +0200, andrey.drobyshev@virtuozzo.com wrote:
> >> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> >>
> >> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> >> coredumps") introduced coroutine traces in coredumps using raw stack
> >> unwinding.  While this works, this approach does not allow to view the
> >> function arguments in the corresponding stack frames.
> >>
> >> As an alternative, we can obtain saved registers from the coroutine's
> >> jmpbuf, copy the original coredump file into a temporary file, patch the
> >> saved registers into the tmp coredump's struct elf_prstatus and execute
> >> another gdb subprocess to get backtrace from the patched temporary coredump.
> >>
> >> While providing more detailed info, this alternative approach, however, is
> >> quite heavyweight as it takes significantly more time and disk space.
> >> So, instead of making it a new default, let's keep raw unwind the default
> >> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> >> command which would enforce the new behaviour.
> > 
> > Wow, that's a big hack around GDB limitations but I don't see any harm
> > in offering this as an option.
> > 
> >>
> >> That's how this looks:
> >>
> >>   (gdb) qemu coroutine 0x7fda9335a508
> >>   #0  0x5602bdb41c26 in qemu_coroutine_switch<+214> () at ../util/coroutine-ucontext.c:321
> >>   #1  0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at ../util/qemu-coroutine.c:293
> >>   #2  0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
> >>   #3  0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
> >>   #4  0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
> >>   #5  0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
> >>   #6  0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
> >>   #7  0x5602bdb18a24 in qemu_thread_start<+355> () at ../util/qemu-thread-posix.c:393
> >>
> >>   (gdb) qemu coroutine 0x7fda9335a508 --detailed
> >>   patching core file /tmp/tmpq4hmk2qc
> >>   found "CORE" at 0x10c48
> >>   assume pt_regs at 0x10cbc
> >>   write r15 at 0x10cbc
> >>   write r14 at 0x10cc4
> >>   write r13 at 0x10ccc
> >>   write r12 at 0x10cd4
> >>   write rbp at 0x10cdc
> >>   write rbx at 0x10ce4
> >>   write rip at 0x10d3c
> >>   write rsp at 0x10d54
> >>
> >>   #0  0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508, to_=0x7fda8400c280, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:321
> >>   #1  0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0, co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
> >>   #2  0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at ../util/async.c:547
> >>   #3  0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at ../util/async.c:172
> >>   #4  0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at ../util/async.c:219
> >>   #5  0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at ../util/aio-posix.c:719
> >>   #6  0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at ../iothread.c:63
> >>   #7  0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at ../util/qemu-thread-posix.c:393
> >>   #8  0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at pthread_create.c:443
> >>   #9  0x00007fda9e83f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> >>
> >> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> >> CC: Peter Xu <peterx@redhat.com>
> >> Originally-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> >> ---
> >>  scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
> >>  1 file changed, 115 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
> >> index e98fc48a4b..b1c7f96af5 100644
> >> --- a/scripts/qemugdb/coroutine.py
> >> +++ b/scripts/qemugdb/coroutine.py
> >> @@ -10,6 +10,13 @@
> >>  # or later.  See the COPYING file in the top-level directory.
> >>  
> >>  import gdb
> >> +import os
> >> +import re
> >> +import struct
> >> +import shutil
> >> +import subprocess
> >> +import tempfile
> >> +import textwrap
> >>  
> >>  VOID_PTR = gdb.lookup_type('void').pointer()
> >>  
> >> @@ -77,6 +84,65 @@ def symbol_lookup(addr):
> >>  
> >>      return f"{func_str} at {path}:{line}"
> >>  
> >> +def write_regs_to_coredump(corefile, set_regs):
> >> +    # asm/ptrace.h
> >> +    pt_regs = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10',
> >> +               'r9', 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax',
> >> +               'rip', 'cs', 'eflags', 'rsp', 'ss']
> >> +
> >> +    with open(corefile, 'r+b') as f:
> >> +        gdb.write(f'patching core file {corefile}\n')
> >> +
> >> +        while f.read(4) != b'CORE':
> >> +            pass
> >> +        gdb.write(f'found "CORE" at 0x{f.tell():x}\n')
> >> +
> >> +        # Looking for struct elf_prstatus and pr_reg field in it (an array
> >> +        # of general purpose registers).  See sys/procfs.h
> >> +
> >> +        # lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
> >> +        f.seek(4, 1)
> >> +        # lseek(f.fileno(), 112, SEEK_CUR): offsetof(struct elf_prstatus, pr_reg)
> >> +        f.seek(112, 1)
> >> +
> >> +        gdb.write(f'assume pt_regs at 0x{f.tell():x}\n')
> >> +        for reg in pt_regs:
> >> +            if reg in set_regs:
> >> +                gdb.write(f'write {reg} at 0x{f.tell():x}\n')
> >> +                f.write(struct.pack('q', set_regs[reg]))
> >> +            else:
> >> +                # lseek(f.fileno(), 8, SEEK_CUR): go to the next reg
> >> +                f.seek(8, 1)
> >> +
> >> +def clone_coredump(source, target, set_regs):
> >> +    shutil.copyfile(source, target)
> >> +    write_regs_to_coredump(target, set_regs)
> >> +
> >> +def dump_backtrace_patched(regs):
> >> +    files = gdb.execute('info files', False, True).split('\n')
> >> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
> >> +    dump = re.search("`(.*)'", files[2]).group(1)
> >> +
> >> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
> >> +        tmpcore = f.name
> >> +
> >> +    clone_coredump(dump, tmpcore, regs)
> >> +
> >> +    cmd = ['script', '-qec',
> >> +           'gdb -batch ' +
> >> +           '-ex "set complaints 0" ' +
> >> +           '-ex "set verbose off" ' +
> >> +           '-ex "set style enabled on" ' +
> >> +           '-ex "python print(\'----split----\')" ' +
> >> +           f'-ex bt {executable} {tmpcore}',
> >> +           '/dev/null']
> >> +    out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
> > 
> > Is script(1) necessary or just something you used for debugging?
> > 
> > On Fedora 43 the script(1) utility isn't installed by default. Due to
> > its generic name it's also a little hard to find the package name
> > online. It would be nice to print a help message pointing to the
> > packages. From what I can tell, script(1) is available in
> > util-linux-script on Red Hat-based distros, bsdutils on Debian-based
> > distros, and util-linux on Arch.
> > 
> > [...]
> My sole purpose for using script(1) was to make GDB subprocess produce
> colored stack trace output, just like what we get when calling 'bt' in a
> regular GDB session.  I just find it easier to read.  So, unless there's
> an easier way to achieve that same result, I'd prefer to keep using
> script(1).

Have you tried the pty Python standard library module?
https://docs.python.org/3/library/pty.html

> 
> But your point is of course valid -- I didn't think of the case when
> script(1) program might not be installed.  Since we're just decorating
> the output here, instead of failing with a help message I'd suggest
> simply checking whether script(1) binary is present in the system with
> smth like shutil.which(), and only using it if it is.  I'll update the
> patch accordingly, if there're no objections.
> 
> Andrey
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 14:31       ` Andrey Drobyshev
@ 2025-11-27 14:55         ` Daniel P. Berrangé
  2025-11-27 16:39         ` Kevin Wolf
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2025-11-27 14:55 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: Kevin Wolf, qemu-devel, peterx, stefanha, vsementsov, den

On Thu, Nov 27, 2025 at 04:31:29PM +0200, Andrey Drobyshev wrote:
> On 11/27/25 12:02 PM, Daniel P. Berrangé wrote:
> > On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
> >> Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
> >>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> >>>
> >>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> >>> coredumps") introduced coroutine traces in coredumps using raw stack
> >>> unwinding.  While this works, this approach does not allow to view the
> >>> function arguments in the corresponding stack frames.
> >>>
> >>> As an alternative, we can obtain saved registers from the coroutine's
> >>> jmpbuf, copy the original coredump file into a temporary file, patch the
> >>> saved registers into the tmp coredump's struct elf_prstatus and execute
> >>> another gdb subprocess to get backtrace from the patched temporary coredump.
> >>>
> >>> While providing more detailed info, this alternative approach, however, is
> >>> quite heavyweight as it takes significantly more time and disk space.
> >>> So, instead of making it a new default, let's keep raw unwind the default
> >>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> >>> command which would enforce the new behaviour.
> >>> [...]
> >>

> > Personally I'd be fine with just modifying the core dump in place
> > most of the time. I don't need to keep the current file untouched,
> > as it is is just a temporary download acquired from systemd's
> > coredumpctl, or from a bug tracker. 
> > 
> >
> 
> Hmm, that's an interesting proposal.  But I still see some potential
> pitfalls with it:
> 
> 1. When dealing with the core dump stored by coredumpctl, original file
> is indeed stored compressed and not being modified.  We don't really
> care about the uncompressed temporary dump placed in /var/tmp.  What we
> do care about is that current GDB session keeps working smoothly.  I
> tried patching the dump in place without copying, and it doesn't seem to
> break subsequent commands.  However GDB keeps the temporary dump open
> throughout the whole session, which means it can occasionally read
> modified data from it.  I'm not sure that we have a solid guarantee that
> things will keep working with the patched dump.
> 
> 2. If we're dealing with an external core dump downloaded from a bug
> report, we surely want to be able to create new GDB sessions with it.
> That means we'll want its unmodified version.  Having to re-download it
> again is even slower than plain copying.
> 
> The solution to both problems would be saving original registers and
> patching them back into the core dump once we've obtained our coroutine
> trace.  It's still potentially fragile in 2nd case if GDB process
> abruptly gets killed/dies leaving registers un-restored.  But I guess we
> can live with it?
> 
> What do you think?

I didn't realize at first that we're actually running this every time
a back trace is requested in the session. It feels fragile / risky to
be modifying the file repeatedly while QEMU has the orignal file open
so probably best to disregard that suggestion of mine.

None the less, that makes me even more wary of the full copyfile
approach, as we'd be taking that performance hit potentually many
times over :-(

I don't suppose there's an easy way to make a sparse copy of the
file such that we only include the minimum data needed to produce
the stack trace and omit other parts (leave all-zeros) ? 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 14:48       ` Stefan Hajnoczi
@ 2025-11-27 16:13         ` Andrey Drobyshev
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Drobyshev @ 2025-11-27 16:13 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, kwolf, peterx, vsementsov, den

On 11/27/25 4:48 PM, Stefan Hajnoczi wrote:
> On Thu, Nov 27, 2025 at 03:14:43PM +0200, Andrey Drobyshev wrote:
>> On 11/26/25 10:58 PM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 25, 2025 at 04:21:05PM +0200, andrey.drobyshev@virtuozzo.com wrote:
>>>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>>>
>>>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>>>> coredumps") introduced coroutine traces in coredumps using raw stack
>>>> unwinding.  While this works, this approach does not allow to view the
>>>> function arguments in the corresponding stack frames.
>>>>
>>>> As an alternative, we can obtain saved registers from the coroutine's
>>>> jmpbuf, copy the original coredump file into a temporary file, patch the
>>>> saved registers into the tmp coredump's struct elf_prstatus and execute
>>>> another gdb subprocess to get backtrace from the patched temporary coredump.
>>>>
>>>> While providing more detailed info, this alternative approach, however, is
>>>> quite heavyweight as it takes significantly more time and disk space.
>>>> So, instead of making it a new default, let's keep raw unwind the default
>>>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
>>>> command which would enforce the new behaviour.
>>>
>>> Wow, that's a big hack around GDB limitations but I don't see any harm
>>> in offering this as an option.
>>>
>>>>
>>>> That's how this looks:
>>>>
>>>>   (gdb) qemu coroutine 0x7fda9335a508
>>>>   #0  0x5602bdb41c26 in qemu_coroutine_switch<+214> () at ../util/coroutine-ucontext.c:321
>>>>   #1  0x5602bdb3e8fe in qemu_aio_coroutine_enter<+493> () at ../util/qemu-coroutine.c:293
>>>>   #2  0x5602bdb3c4eb in co_schedule_bh_cb<+538> () at ../util/async.c:547
>>>>   #3  0x5602bdb3b518 in aio_bh_call<+119> () at ../util/async.c:172
>>>>   #4  0x5602bdb3b79a in aio_bh_poll<+457> () at ../util/async.c:219
>>>>   #5  0x5602bdb10f22 in aio_poll<+1201> () at ../util/aio-posix.c:719
>>>>   #6  0x5602bd8fb1ac in iothread_run<+123> () at ../iothread.c:63
>>>>   #7  0x5602bdb18a24 in qemu_thread_start<+355> () at ../util/qemu-thread-posix.c:393
>>>>
>>>>   (gdb) qemu coroutine 0x7fda9335a508 --detailed
>>>>   patching core file /tmp/tmpq4hmk2qc
>>>>   found "CORE" at 0x10c48
>>>>   assume pt_regs at 0x10cbc
>>>>   write r15 at 0x10cbc
>>>>   write r14 at 0x10cc4
>>>>   write r13 at 0x10ccc
>>>>   write r12 at 0x10cd4
>>>>   write rbp at 0x10cdc
>>>>   write rbx at 0x10ce4
>>>>   write rip at 0x10d3c
>>>>   write rsp at 0x10d54
>>>>
>>>>   #0  0x00005602bdb41c26 in qemu_coroutine_switch (from_=0x7fda9335a508, to_=0x7fda8400c280, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:321
>>>>   #1  0x00005602bdb3e8fe in qemu_aio_coroutine_enter (ctx=0x5602bf7147c0, co=0x7fda8400c280) at ../util/qemu-coroutine.c:293
>>>>   #2  0x00005602bdb3c4eb in co_schedule_bh_cb (opaque=0x5602bf7147c0) at ../util/async.c:547
>>>>   #3  0x00005602bdb3b518 in aio_bh_call (bh=0x5602bf714a40) at ../util/async.c:172
>>>>   #4  0x00005602bdb3b79a in aio_bh_poll (ctx=0x5602bf7147c0) at ../util/async.c:219
>>>>   #5  0x00005602bdb10f22 in aio_poll (ctx=0x5602bf7147c0, blocking=true) at ../util/aio-posix.c:719
>>>>   #6  0x00005602bd8fb1ac in iothread_run (opaque=0x5602bf42b100) at ../iothread.c:63
>>>>   #7  0x00005602bdb18a24 in qemu_thread_start (args=0x5602bf7164a0) at ../util/qemu-thread-posix.c:393
>>>>   #8  0x00007fda9e89f7f2 in start_thread (arg=<optimized out>) at pthread_create.c:443
>>>>   #9  0x00007fda9e83f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>>>>
>>>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> CC: Peter Xu <peterx@redhat.com>
>>>> Originally-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>>> ---
>>>>  scripts/qemugdb/coroutine.py | 126 ++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 115 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
>>>> index e98fc48a4b..b1c7f96af5 100644
>>>> --- a/scripts/qemugdb/coroutine.py
>>>> +++ b/scripts/qemugdb/coroutine.py
>>>> @@ -10,6 +10,13 @@
>>>>  # or later.  See the COPYING file in the top-level directory.
>>>>  
>>>>  import gdb
>>>> +import os
>>>> +import re
>>>> +import struct
>>>> +import shutil
>>>> +import subprocess
>>>> +import tempfile
>>>> +import textwrap
>>>>  
>>>>  VOID_PTR = gdb.lookup_type('void').pointer()
>>>>  
>>>> @@ -77,6 +84,65 @@ def symbol_lookup(addr):
>>>>  
>>>>      return f"{func_str} at {path}:{line}"
>>>>  
>>>> +def write_regs_to_coredump(corefile, set_regs):
>>>> +    # asm/ptrace.h
>>>> +    pt_regs = ['r15', 'r14', 'r13', 'r12', 'rbp', 'rbx', 'r11', 'r10',
>>>> +               'r9', 'r8', 'rax', 'rcx', 'rdx', 'rsi', 'rdi', 'orig_rax',
>>>> +               'rip', 'cs', 'eflags', 'rsp', 'ss']
>>>> +
>>>> +    with open(corefile, 'r+b') as f:
>>>> +        gdb.write(f'patching core file {corefile}\n')
>>>> +
>>>> +        while f.read(4) != b'CORE':
>>>> +            pass
>>>> +        gdb.write(f'found "CORE" at 0x{f.tell():x}\n')
>>>> +
>>>> +        # Looking for struct elf_prstatus and pr_reg field in it (an array
>>>> +        # of general purpose registers).  See sys/procfs.h
>>>> +
>>>> +        # lseek(f.fileno(), 4, SEEK_CUR): go to elf_prstatus
>>>> +        f.seek(4, 1)
>>>> +        # lseek(f.fileno(), 112, SEEK_CUR): offsetof(struct elf_prstatus, pr_reg)
>>>> +        f.seek(112, 1)
>>>> +
>>>> +        gdb.write(f'assume pt_regs at 0x{f.tell():x}\n')
>>>> +        for reg in pt_regs:
>>>> +            if reg in set_regs:
>>>> +                gdb.write(f'write {reg} at 0x{f.tell():x}\n')
>>>> +                f.write(struct.pack('q', set_regs[reg]))
>>>> +            else:
>>>> +                # lseek(f.fileno(), 8, SEEK_CUR): go to the next reg
>>>> +                f.seek(8, 1)
>>>> +
>>>> +def clone_coredump(source, target, set_regs):
>>>> +    shutil.copyfile(source, target)
>>>> +    write_regs_to_coredump(target, set_regs)
>>>> +
>>>> +def dump_backtrace_patched(regs):
>>>> +    files = gdb.execute('info files', False, True).split('\n')
>>>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
>>>> +    dump = re.search("`(.*)'", files[2]).group(1)
>>>> +
>>>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
>>>> +        tmpcore = f.name
>>>> +
>>>> +    clone_coredump(dump, tmpcore, regs)
>>>> +
>>>> +    cmd = ['script', '-qec',
>>>> +           'gdb -batch ' +
>>>> +           '-ex "set complaints 0" ' +
>>>> +           '-ex "set verbose off" ' +
>>>> +           '-ex "set style enabled on" ' +
>>>> +           '-ex "python print(\'----split----\')" ' +
>>>> +           f'-ex bt {executable} {tmpcore}',
>>>> +           '/dev/null']
>>>> +    out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
>>>
>>> Is script(1) necessary or just something you used for debugging?
>>>
>>> On Fedora 43 the script(1) utility isn't installed by default. Due to
>>> its generic name it's also a little hard to find the package name
>>> online. It would be nice to print a help message pointing to the
>>> packages. From what I can tell, script(1) is available in
>>> util-linux-script on Red Hat-based distros, bsdutils on Debian-based
>>> distros, and util-linux on Arch.
>>>
>>> [...]
>> My sole purpose for using script(1) was to make GDB subprocess produce
>> colored stack trace output, just like what we get when calling 'bt' in a
>> regular GDB session.  I just find it easier to read.  So, unless there's
>> an easier way to achieve that same result, I'd prefer to keep using
>> script(1).
> 
> Have you tried the pty Python standard library module?
> https://docs.python.org/3/library/pty.html
> 

I haven't until now.  Although using it creates a necessity to manually
fork() and manage master/slave fds, it does serves its purpose and
eliminates the dependency for external programs.  So thank you, I'll add
it in v2.

>>
>> But your point is of course valid -- I didn't think of the case when
>> script(1) program might not be installed.  Since we're just decorating
>> the output here, instead of failing with a help message I'd suggest
>> simply checking whether script(1) binary is present in the system with
>> smth like shutil.which(), and only using it if it is.  I'll update the
>> patch accordingly, if there're no objections.
>>
>> Andrey
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 14:31       ` Andrey Drobyshev
  2025-11-27 14:55         ` Daniel P. Berrangé
@ 2025-11-27 16:39         ` Kevin Wolf
  2025-11-28 12:24           ` Andrey Drobyshev
  1 sibling, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2025-11-27 16:39 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: Daniel P. Berrangé, qemu-devel, peterx, stefanha, vsementsov,
	den

Am 27.11.2025 um 15:31 hat Andrey Drobyshev geschrieben:
> On 11/27/25 12:02 PM, Daniel P. Berrangé wrote:
> > On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
> >> Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
> >>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> >>>
> >>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
> >>> coredumps") introduced coroutine traces in coredumps using raw stack
> >>> unwinding.  While this works, this approach does not allow to view the
> >>> function arguments in the corresponding stack frames.
> >>>
> >>> As an alternative, we can obtain saved registers from the coroutine's
> >>> jmpbuf, copy the original coredump file into a temporary file, patch the
> >>> saved registers into the tmp coredump's struct elf_prstatus and execute
> >>> another gdb subprocess to get backtrace from the patched temporary coredump.
> >>>
> >>> While providing more detailed info, this alternative approach, however, is
> >>> quite heavyweight as it takes significantly more time and disk space.
> >>> So, instead of making it a new default, let's keep raw unwind the default
> >>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
> >>> command which would enforce the new behaviour.
> >>> [...]
> >>
> >>> +def clone_coredump(source, target, set_regs):
> >>> +    shutil.copyfile(source, target)
> >>> +    write_regs_to_coredump(target, set_regs)
> >>> +
> >>> +def dump_backtrace_patched(regs):
> >>> +    files = gdb.execute('info files', False, True).split('\n')
> >>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
> >>> +    dump = re.search("`(.*)'", files[2]).group(1)
> >>> +
> >>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
> >>> +        tmpcore = f.name
> >>> +
> >>> +    clone_coredump(dump, tmpcore, regs)
> >>
> >> I think this is what makes it so heavy, right? Coredumps can be quite
> >> large and /tmp is probably a different filesystem, so you end up really
> >> copying the full size of the coredump around.
> > 
> > On my system /tmp is  tmpfs, so this is actually bringing the whole
> > coredump into RAM which is not a sensible approach.
> > 
> >> Wouldn't it be better in the general case if we could just do a reflink
> >> copy of the coredump and then do only very few writes for updating the
> >> register values? Then the overhead should actually be quite negligible
> >> both in terms of time and disk space.
> > 
> 
> That's correct, copying the file to /tmp takes most of the time with
> this approach.
> 
> As for reflink copy, this might've been a great solution.  However, it
> would largely depend on the FS used.  E.g. in my system coredumpctl
> places uncompressed coredump at /var/tmp, which is mounted as ext4.  And
> in this case:
> 
> # cp --reflink /var/tmp/coredump-MQCZQc /root
> cp: failed to clone '/root/coredump-MQCZQc' from
> '/var/tmp/coredump-MQCZQc': Invalid cross-device link
> 
> # cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
> cp: failed to clone '/var/tmp/coredump.ref' from
> '/var/tmp/coredump-MQCZQc': Operation not supported
> 
> Apparently, ext4 doesn't support reflink copy. xfs and btrfs do.  But I
> guess our implementation better be FS-agnostic.

Yes, we might still need a slow copy fallback for those filesystems,
i.e. essentially a 'cp --reflink=auto'. For myself, coredumps will
generally live on XFS, so I would benefit from creating that copy in the
same filesystem where the coredump lives, and for you it shouldn't hurt
at least.

Another thought... it's a bit crazy, but... we're QEMU, we have our own
tools for this. We could create a qcow2 overlay for the coredump and
export it using FUSE! :-D (Probably not very practical because you need
the right paths for the binaries, but I had to mention it.)

Kevin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-27 16:39         ` Kevin Wolf
@ 2025-11-28 12:24           ` Andrey Drobyshev
  2025-11-28 13:18             ` Kevin Wolf
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Drobyshev @ 2025-11-28 12:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Daniel P. Berrangé, qemu-devel, peterx, stefanha, vsementsov,
	den

On 11/27/25 6:39 PM, Kevin Wolf wrote:
> Am 27.11.2025 um 15:31 hat Andrey Drobyshev geschrieben:
>> On 11/27/25 12:02 PM, Daniel P. Berrangé wrote:
>>> On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
>>>> Am 25.11.2025 um 15:21 hat andrey.drobyshev@virtuozzo.com geschrieben:
>>>>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>>>>
>>>>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>>>>> coredumps") introduced coroutine traces in coredumps using raw stack
>>>>> unwinding.  While this works, this approach does not allow to view the
>>>>> function arguments in the corresponding stack frames.
>>>>>
>>>>> As an alternative, we can obtain saved registers from the coroutine's
>>>>> jmpbuf, copy the original coredump file into a temporary file, patch the
>>>>> saved registers into the tmp coredump's struct elf_prstatus and execute
>>>>> another gdb subprocess to get backtrace from the patched temporary coredump.
>>>>>
>>>>> While providing more detailed info, this alternative approach, however, is
>>>>> quite heavyweight as it takes significantly more time and disk space.
>>>>> So, instead of making it a new default, let's keep raw unwind the default
>>>>> behaviour, but add the '--detailed' option for 'qemu bt' and 'qemu coroutine'
>>>>> command which would enforce the new behaviour.
>>>>> [...]
>>>>
>>>>> +def clone_coredump(source, target, set_regs):
>>>>> +    shutil.copyfile(source, target)
>>>>> +    write_regs_to_coredump(target, set_regs)
>>>>> +
>>>>> +def dump_backtrace_patched(regs):
>>>>> +    files = gdb.execute('info files', False, True).split('\n')
>>>>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
>>>>> +    dump = re.search("`(.*)'", files[2]).group(1)
>>>>> +
>>>>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
>>>>> +        tmpcore = f.name
>>>>> +
>>>>> +    clone_coredump(dump, tmpcore, regs)
>>>>
>>>> I think this is what makes it so heavy, right? Coredumps can be quite
>>>> large and /tmp is probably a different filesystem, so you end up really
>>>> copying the full size of the coredump around.
>>>
>>> On my system /tmp is  tmpfs, so this is actually bringing the whole
>>> coredump into RAM which is not a sensible approach.
>>>
>>>> Wouldn't it be better in the general case if we could just do a reflink
>>>> copy of the coredump and then do only very few writes for updating the
>>>> register values? Then the overhead should actually be quite negligible
>>>> both in terms of time and disk space.
>>>
>>
>> That's correct, copying the file to /tmp takes most of the time with
>> this approach.
>>
>> As for reflink copy, this might've been a great solution.  However, it
>> would largely depend on the FS used.  E.g. in my system coredumpctl
>> places uncompressed coredump at /var/tmp, which is mounted as ext4.  And
>> in this case:
>>
>> # cp --reflink /var/tmp/coredump-MQCZQc /root
>> cp: failed to clone '/root/coredump-MQCZQc' from
>> '/var/tmp/coredump-MQCZQc': Invalid cross-device link
>>
>> # cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
>> cp: failed to clone '/var/tmp/coredump.ref' from
>> '/var/tmp/coredump-MQCZQc': Operation not supported
>>
>> Apparently, ext4 doesn't support reflink copy. xfs and btrfs do.  But I
>> guess our implementation better be FS-agnostic.
> 
> Yes, we might still need a slow copy fallback for those filesystems,
> i.e. essentially a 'cp --reflink=auto'. For myself, coredumps will
> generally live on XFS, so I would benefit from creating that copy in the
> same filesystem where the coredump lives, and for you it shouldn't hurt
> at least.
> 
> Another thought... it's a bit crazy, but... we're QEMU, we have our own
> tools for this. We could create a qcow2 overlay for the coredump and
> export it using FUSE! :-D (Probably not very practical because you need
> the right paths for the binaries, but I had to mention it.)
> 
> Kevin
> 

We can surely add reflink copying as a fast path option which we try
first.  That's cheap to implement.  The real issue is designing a
sensible fallback approach.

As for creating an overlay... That's an interesting option but it would
require everybody who wants to use this stuff configure their QEMU build
with --enable-fuse.  We, for instance, don't have it enabled in our
builds, as I'm sure many others.

Of course we can think of an NBD export for such an overlay instead of
FUSE.  But it'll then require root user to write to /dev/nbd0.  Also not
very acceptable.

Quick overlayfs mount with lowerdir=/var/tmp could also solve this.  But
again, root is required.  Not good.

So the most robust option, I guess, is the one suggested by Daniel:
copying some kind of minimal viable coredump part/range instead of the
whole file, which is just enough for producing valid backtrace.  The
only thing left is figuring out which part to copy.  That might require
some tricky ELF structure parsing.

Andrey


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-28 12:24           ` Andrey Drobyshev
@ 2025-11-28 13:18             ` Kevin Wolf
  2025-12-02 16:36               ` Andrey Drobyshev
  0 siblings, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2025-11-28 13:18 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: Daniel P. Berrangé, qemu-devel, peterx, stefanha, vsementsov,
	den

Am 28.11.2025 um 13:24 hat Andrey Drobyshev geschrieben:
> On 11/27/25 6:39 PM, Kevin Wolf wrote:
> > Am 27.11.2025 um 15:31 hat Andrey Drobyshev geschrieben:
> >> As for reflink copy, this might've been a great solution.  However, it
> >> would largely depend on the FS used.  E.g. in my system coredumpctl
> >> places uncompressed coredump at /var/tmp, which is mounted as ext4.  And
> >> in this case:
> >>
> >> # cp --reflink /var/tmp/coredump-MQCZQc /root
> >> cp: failed to clone '/root/coredump-MQCZQc' from
> >> '/var/tmp/coredump-MQCZQc': Invalid cross-device link
> >>
> >> # cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
> >> cp: failed to clone '/var/tmp/coredump.ref' from
> >> '/var/tmp/coredump-MQCZQc': Operation not supported
> >>
> >> Apparently, ext4 doesn't support reflink copy. xfs and btrfs do.  But I
> >> guess our implementation better be FS-agnostic.
> > 
> > Yes, we might still need a slow copy fallback for those filesystems,
> > i.e. essentially a 'cp --reflink=auto'. For myself, coredumps will
> > generally live on XFS, so I would benefit from creating that copy in the
> > same filesystem where the coredump lives, and for you it shouldn't hurt
> > at least.
> > 
> > Another thought... it's a bit crazy, but... we're QEMU, we have our own
> > tools for this. We could create a qcow2 overlay for the coredump and
> > export it using FUSE! :-D (Probably not very practical because you need
> > the right paths for the binaries, but I had to mention it.)
> > 
> > Kevin
> 
> We can surely add reflink copying as a fast path option which we try
> first.  That's cheap to implement.  The real issue is designing a
> sensible fallback approach.

I mean, as far as I am concerned, you can keep what you already have as
the fallback approach. Reflink copy if possible, and otherwise a slow
full copy.

Or if the coredump can be written to, you could do the in-place editing,
though I would save the original content in a file that could survive a
crash. And after finishing the operation, the original content
definitely should be written back.

> As for creating an overlay... That's an interesting option but it would
> require everybody who wants to use this stuff configure their QEMU build
> with --enable-fuse.  We, for instance, don't have it enabled in our
> builds, as I'm sure many others.
> 
> Of course we can think of an NBD export for such an overlay instead of
> FUSE.  But it'll then require root user to write to /dev/nbd0.  Also not
> very acceptable.
> 
> Quick overlayfs mount with lowerdir=/var/tmp could also solve this.  But
> again, root is required.  Not good.
> 
> So the most robust option, I guess, is the one suggested by Daniel:
> copying some kind of minimal viable coredump part/range instead of the
> whole file, which is just enough for producing valid backtrace.  The
> only thing left is figuring out which part to copy.  That might require
> some tricky ELF structure parsing.

All of these solutions are interesting, but honestly feel a bit too
complex for a simple debugging script.

Kevin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump
  2025-11-28 13:18             ` Kevin Wolf
@ 2025-12-02 16:36               ` Andrey Drobyshev
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Drobyshev @ 2025-12-02 16:36 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Daniel P. Berrangé, qemu-devel, peterx, stefanha, vsementsov,
	den

On 11/28/25 3:18 PM, Kevin Wolf wrote:
> Am 28.11.2025 um 13:24 hat Andrey Drobyshev geschrieben:
>> On 11/27/25 6:39 PM, Kevin Wolf wrote:
>>> Am 27.11.2025 um 15:31 hat Andrey Drobyshev geschrieben:
>>>> As for reflink copy, this might've been a great solution.  However, it
>>>> would largely depend on the FS used.  E.g. in my system coredumpctl
>>>> places uncompressed coredump at /var/tmp, which is mounted as ext4.  And
>>>> in this case:
>>>>
>>>> # cp --reflink /var/tmp/coredump-MQCZQc /root
>>>> cp: failed to clone '/root/coredump-MQCZQc' from
>>>> '/var/tmp/coredump-MQCZQc': Invalid cross-device link
>>>>
>>>> # cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
>>>> cp: failed to clone '/var/tmp/coredump.ref' from
>>>> '/var/tmp/coredump-MQCZQc': Operation not supported
>>>>
>>>> Apparently, ext4 doesn't support reflink copy. xfs and btrfs do.  But I
>>>> guess our implementation better be FS-agnostic.
>>>
>>> Yes, we might still need a slow copy fallback for those filesystems,
>>> i.e. essentially a 'cp --reflink=auto'. For myself, coredumps will
>>> generally live on XFS, so I would benefit from creating that copy in the
>>> same filesystem where the coredump lives, and for you it shouldn't hurt
>>> at least.
>>>
>>> Another thought... it's a bit crazy, but... we're QEMU, we have our own
>>> tools for this. We could create a qcow2 overlay for the coredump and
>>> export it using FUSE! :-D (Probably not very practical because you need
>>> the right paths for the binaries, but I had to mention it.)
>>>
>>> Kevin
>>
>> We can surely add reflink copying as a fast path option which we try
>> first.  That's cheap to implement.  The real issue is designing a
>> sensible fallback approach.
> 
> I mean, as far as I am concerned, you can keep what you already have as
> the fallback approach. Reflink copy if possible, and otherwise a slow
> full copy.
> 
> Or if the coredump can be written to, you could do the in-place editing,
> though I would save the original content in a file that could survive a
> crash. And after finishing the operation, the original content
> definitely should be written back.
> 

I decided to go with the latter approach as it's more universal.
In-place modifying doesn't take as much time, so performance difference
with reflink shouldn't be a problem.  I've sent v2 with an implementation.

>> As for creating an overlay... That's an interesting option but it would
>> require everybody who wants to use this stuff configure their QEMU build
>> with --enable-fuse.  We, for instance, don't have it enabled in our
>> builds, as I'm sure many others.
>>
>> Of course we can think of an NBD export for such an overlay instead of
>> FUSE.  But it'll then require root user to write to /dev/nbd0.  Also not
>> very acceptable.
>>
>> Quick overlayfs mount with lowerdir=/var/tmp could also solve this.  But
>> again, root is required.  Not good.
>>
>> So the most robust option, I guess, is the one suggested by Daniel:
>> copying some kind of minimal viable coredump part/range instead of the
>> whole file, which is just enough for producing valid backtrace.  The
>> only thing left is figuring out which part to copy.  That might require
>> some tricky ELF structure parsing.
> 
> All of these solutions are interesting, but honestly feel a bit too
> complex for a simple debugging script.
> 
> Kevin
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-12-02 16:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-25 14:21 [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands andrey.drobyshev
2025-11-25 14:21 ` [PATCH 1/4] scripts/qemugdb: mtree: Fix OverflowError in mtree with 128-bit addresses andrey.drobyshev
2025-11-25 14:21 ` [PATCH 2/4] scripts/qemugdb: timers: Fix KeyError in 'qemu timers' command andrey.drobyshev
2025-11-25 14:21 ` [PATCH 3/4] scripts/qemugdb: timers: Improve 'qemu timers' command readability andrey.drobyshev
2025-11-25 14:21 ` [PATCH 4/4] scripts/qemugdb: coroutine: Add option for obtaining detailed trace in coredump andrey.drobyshev
2025-11-26 20:58   ` Stefan Hajnoczi
2025-11-27 13:14     ` Andrey Drobyshev
2025-11-27 14:48       ` Stefan Hajnoczi
2025-11-27 16:13         ` Andrey Drobyshev
2025-11-27  9:56   ` Kevin Wolf
2025-11-27 10:02     ` Daniel P. Berrangé
2025-11-27 14:31       ` Andrey Drobyshev
2025-11-27 14:55         ` Daniel P. Berrangé
2025-11-27 16:39         ` Kevin Wolf
2025-11-28 12:24           ` Andrey Drobyshev
2025-11-28 13:18             ` Kevin Wolf
2025-12-02 16:36               ` Andrey Drobyshev
2025-11-26 20:59 ` [PATCH 0/4] Fixes and improvements for scripts/qemugdb commands Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).