public inbox for kexec@lists.infradead.org
From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Tao Liu <ltao@redhat.com>,
	yamazaki-msmt@nec.com, k-hagio-ab@nec.com,
	kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, Tao Liu <ltao@redhat.com>
Subject: Re: [PATCH v4][makedumpfile 0/7] btf/kallsyms based makedumpfile extension for mm page filtering
Date: Fri, 03 Apr 2026 11:26:59 -0700	[thread overview]
Message-ID: <87ldf3ew7g.fsf@oracle.com> (raw)
In-Reply-To: <20260317150743.69590-1-ltao@redhat.com>

Hello,

My testing of the patch series involves my own extension, userstack.so,
which I've implemented here[1] on top of this branch:

[1]: https://github.com/brenns10/makedumpfile/commits/stepbren_userstack_upstream_v4/

To test, I have a vmcore created at dump-level 23, and I'm using
makedumpfile to re-filter the vmcore at dump-level 31, either including
the userspace stacks, or not using an extension at all:

$ /usr/bin/time ./makedumpfile -z -d31 --extension userstack.so ./dump.withuser.img ./extension.img
...
9.66user 0.86system 0:10.56elapsed 99%CPU (0avgtext+0avgdata 15108maxresident)k
0inputs+1133800outputs (0major+2184minor)pagefaults 0swaps

$ /usr/bin/time ./makedumpfile -z -d31 ./dump.withuser.img ./baseline.img
...
9.28user 0.84system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 3336maxresident)k
0inputs+1132120outputs (0major+236minor)pagefaults 0swaps

$ ls -l *.img
-rw------- 1 stepbren stepbren  577384093 Apr  3 10:55 baseline.img
-rw------- 1 stepbren stepbren 1746475073 Apr  2 15:21 dump.withuser.img
-rw------- 1 stepbren stepbren  578253346 Apr  3 10:54 extension.img
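As an aside for anyone unfamiliar with dump levels: the -d value is a
bitmask of page types to exclude, per makedumpfile(8). A quick sketch of
why a -d23 vmcore can still be re-filtered down to -d31:

```python
# Dump-level bits per makedumpfile(8); each set bit EXCLUDES a page type.
PAGE_TYPES = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user-process data pages",
    16: "free pages",
}

def excluded(dump_level):
    """Return the page types excluded at a given dump level."""
    return [name for bit, name in PAGE_TYPES.items() if dump_level & bit]

# -d23 (16+4+2+1) leaves bit 8 unset, so user pages are retained and the
# vmcore can be re-filtered later; -d31 excludes all five types.
assert "user-process data pages" not in excluded(23)
assert "user-process data pages" in excluded(31)
```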


With drgn's contrib/pstack.py, we can dump a stack trace for an example
process in the vmcore. I'll show it three times. First, with the
original vmcore containing all userspace pages: it works, but the vmcore
is 1.7 GiB, more than 3x larger than a dump-level 31 vmcore. Second,
with the baseline at dump-level 31: the userspace stack cannot be
retrieved. Third, with the userstack.so extension: despite being less
than 1 MiB larger than the baseline, it contains the necessary stack
pages, so the stack trace matches the original.
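To make the size comparison concrete, here is a quick check of the
claims above against the `ls -l` output:

```python
# File sizes in bytes, copied from the ls -l output above.
dump_withuser = 1_746_475_073   # dump.withuser.img, all user pages kept
baseline      =   577_384_093   # baseline.img, -d31, no extension
extension     =   578_253_346   # extension.img, -d31 + userstack.so

# The full dump is more than 3x the size of the -d31 baseline.
assert dump_withuser / baseline > 3

# The extension keeps the stack pages at a cost of well under 1 MiB.
overhead = extension - baseline
assert overhead < 1 << 20       # 869253 bytes, ~0.83 MiB
```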


$ drgn -c dump.withuser.img /usr/share/drgn/contrib/pstack.py pstack -p 19284
[PID: 19284 COMM: bash]
  Thread 0 TID=19284 [S] CPU=18 ('bash')
    #0  context_switch (kernel/sched/core.c:5367:2)
    #1  __schedule (kernel/sched/core.c:6752:8)
    #2  __schedule_loop (kernel/sched/core.c:6829:3)
    #3  schedule (kernel/sched/core.c:6844:2)
    #4  do_wait (kernel/exit.c:1698:3)
    #5  kernel_wait4 (kernel/exit.c:1852:8)
    #6  __do_sys_wait4 (kernel/exit.c:1880:13)
    #7  do_syscall_x64 (arch/x86/entry/common.c:47:14)
    #8  do_syscall_64 (arch/x86/entry/common.c:84:7)
    #9  entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)
    #10 0x7f46e82b63d7
    ------ userspace ---------
    #0  wait4+0x17/0xa3 (from /usr/lib64/libc.so.6 +0x1103d7)
    #1  waitchld.isra.0+0x102/0xc42 (from /usr/bin/bash +0x10bd52)
    #2  wait_for+0x15a/0xda6 (from /usr/bin/bash +0x5b54a)
    #3  execute_command_internal+0x3186/0x3877 (from /usr/bin/bash +0x3e976)
    #4  execute_command+0xcc/0x1c8 (from /usr/bin/bash +0x3f13c)
    #5  reader_loop+0x1ea/0x39a (from /usr/bin/bash +0x311da)
    #6  main+0xe2a/0x1a94 (from /usr/bin/bash +0x2548a)
    #7  __libc_start_call_main+0x7e/0xac (from /usr/lib64/libc.so.6 +0x2d39e)
    #8  __libc_start_main@@GLIBC_2.34+0x89/0x14c (from /usr/lib64/libc.so.6 +0x2d459)
    #9  _start+0x25/0x26 (from /usr/bin/bash +0x26125)
    #10 ???


$ drgn -c baseline.img /usr/share/drgn/contrib/pstack.py pstack -p 19284
[PID: 19284 COMM: bash]
  Thread 0 TID=19284 [S] CPU=18 ('bash')
    #0  context_switch (kernel/sched/core.c:5367:2)
    #1  __schedule (kernel/sched/core.c:6752:8)
    #2  __schedule_loop (kernel/sched/core.c:6829:3)
    #3  schedule (kernel/sched/core.c:6844:2)
    #4  do_wait (kernel/exit.c:1698:3)
    #5  kernel_wait4 (kernel/exit.c:1852:8)
    #6  __do_sys_wait4 (kernel/exit.c:1880:13)
    #7  do_syscall_x64 (arch/x86/entry/common.c:47:14)
    #8  do_syscall_64 (arch/x86/entry/common.c:84:7)
    #9  entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)
    #10 0x7f46e82b63d7
    ------ userspace ---------
    #0  wait4+0x17/0xa3 (from /usr/lib64/libc.so.6 +0x1103d7)


$ drgn -c extension.img /usr/share/drgn/contrib/pstack.py pstack -p 19284
[PID: 19284 COMM: bash]
  Thread 0 TID=19284 [S] CPU=18 ('bash')
    #0  context_switch (kernel/sched/core.c:5367:2)
    #1  __schedule (kernel/sched/core.c:6752:8)
    #2  __schedule_loop (kernel/sched/core.c:6829:3)
    #3  schedule (kernel/sched/core.c:6844:2)
    #4  do_wait (kernel/exit.c:1698:3)
    #5  kernel_wait4 (kernel/exit.c:1852:8)
    #6  __do_sys_wait4 (kernel/exit.c:1880:13)
    #7  do_syscall_x64 (arch/x86/entry/common.c:47:14)
    #8  do_syscall_64 (arch/x86/entry/common.c:84:7)
    #9  entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)
    #10 0x7f46e82b63d7
    ------ userspace ---------
    #0  wait4+0x17/0xa3 (from /usr/lib64/libc.so.6 +0x1103d7)
    #1  waitchld.isra.0+0x102/0xc42 (from /usr/bin/bash +0x10bd52)
    #2  wait_for+0x15a/0xda6 (from /usr/bin/bash +0x5b54a)
    #3  execute_command_internal+0x3186/0x3877 (from /usr/bin/bash +0x3e976)
    #4  execute_command+0xcc/0x1c8 (from /usr/bin/bash +0x3f13c)
    #5  reader_loop+0x1ea/0x39a (from /usr/bin/bash +0x311da)
    #6  main+0xe2a/0x1a94 (from /usr/bin/bash +0x2548a)
    #7  __libc_start_call_main+0x7e/0xac (from /usr/lib64/libc.so.6 +0x2d39e)
    #8  __libc_start_main@@GLIBC_2.34+0x89/0x14c (from /usr/lib64/libc.so.6 +0x2d459)
    #9  _start+0x25/0x26 (from /usr/bin/bash +0x26125)
    #10 ???

Beyond my testing / use case, the only other feedback I want to share on
this patch series is that I think it would be good to have the amdgpu
and userstack extensions contributed upstream, rather than kept as
external customizations.

Given that there's no defined extension API or ABI, I don't think it
makes sense yet for extensions to be independent: they depend heavily on
the internals of makedumpfile. There's probably not going to be a
"makedumpfile-devel" package any time soon that allows building and
maintaining external extensions against your system makedumpfile. So the
only way to make use of extensions would be either (a) the Linux distro
bundles its own extension, or (b) the user builds and manually installs
a custom version of makedumpfile & extensions.

The main purpose of the extension API is to provide specific
functionality that is impossible with page-based filtering. Ideally, I
think makedumpfile should provide the real, useful capability itself,
not just a framework for a motivated developer to build it.

That said, I know it's a maintenance overhead and there may not be a
clear test strategy for every extension.

Thanks,
Stephen

Tao Liu <ltao@redhat.com> writes:
> A) This patchset will introduce the following features to makedumpfile:
>
>   1) Add .so extension support to makedumpfile
>   2) Enable btf and kallsyms for symbol type and address resolving.
>
> B) The purpose of the features are:
>
>   1) Currently makedumpfile filters mm pages based on page flags, because
>      flags can help determine a page's usage. But this page-flag-checking
>      method lacks flexibility in certain cases, e.g. if we want to filter
>      out the mm pages occupied by a GPU during vmcore dumping because:
>
>      a) the GPU may occupy a large amount of memory containing sensitive
>         data;
>      b) GPU mm pages have no relation to the kernel crash and are useless
>         for vmcore analysis.
>
>      But there is no GPU-specific mm page flag, and we clearly don't need
>      to create one just for kdump's use. A programmable filtering tool is
>      more suitable for such cases. In addition, different GPU vendors may
>      allocate mm pages in different ways, so programmable filtering is
>      better than hard-coding these GPU-specific details into makedumpfile.
>
>   2) makedumpfile already contains a programmable filtering tool, the
>      eppic script, which allows users to write customized code for data
>      erasing. However, it has the following drawbacks:
>
>      a) it cannot do mm page filtering;
>      b) it needs access to the debuginfo of both the kernel and modules,
>         which is not available in the 2nd kernel;
>      c) the eppic library has memory leaks which are not all resolved [1],
>         which is not acceptable in the 2nd kernel.
>
>      makedumpfile needs to resolve DWARF data from debuginfo to get symbol
>      types and addresses. Recent kernels carry DWARF alternatives such as
>      btf/kallsyms which can serve this purpose, and since the btf/kallsyms
>      info is already packed within the vmcore, we can use it directly.
>
>   With these, this patchset introduces makedumpfile extensions, which are
>   based on btf/kallsyms symbol resolving and are programmable for mm page
>   filtering. The following section shows usage and performance; please
>   note the tests were performed in the 1st kernel.
>
>   3) Compile and run makedumpfile extensions:
>
>   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on
>   $ make extensions
>   
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/extension.out --extension amdgpu_filter.so
>     Loaded extension: ./extensions/amdgpu_filter.so
>     makedumpfile Completed.
> 	User time (seconds): 5.08
> 	System time (seconds): 0.84
> 	Percent of CPU this job got: 99%
> 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.95
>         Maximum resident set size (kbytes): 17360
>         ...
>  
>      To contrast with eppic script of v2 [2]:
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c   
>     makedumpfile Completed.
>         User time (seconds): 8.23
>         System time (seconds): 0.88
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
>         Maximum resident set size (kbytes): 57128
>         ...
>
>   -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
>   -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
>   -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
>   1) GPU types: currently only amdgpu's mm page filtering has been tested.
>   2) OS: the code works on rhel-10+/rhel-9.5+ on x86_64/arm64/s390/ppc64.
>      Others are not tested.
>
> D) Testing:
>
>      If you don't want to create your own vmcore, you can use a vmcore I
>      created with amdgpu mm pages unfiltered [3]; the amdgpu mm pages were
>      allocated by the program [4]. You can use the vmcore in the 1st
>      kernel to filter the amdgpu mm pages with the performance-testing
>      command line above. To verify in crash that the pages are filtered:
>
>      Unfiltered:
>      crash> search -c "!QAZXSW@#EDC"
>      ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      crash> rd ffff96b7fa800000
>      ffff96b7fa800000:  405753585a415121                    !QAZXSW@
>      crash> rd ffff96b87c800000
>      ffff96b87c800000:  405753585a415121                    !QAZXSW@
>
>      Filtered:
>      crash> search -c "!QAZXSW@#EDC"
>      crash> rd ffff96b7fa800000
>      rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
>      crash> rd ffff96b87c800000
>      rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
>
> [1]: https://github.com/lucchouina/eppic/pull/32
> [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> [3]: https://people.redhat.com/~ltao/core/vmcore
> [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> v3 -> v4:
>
> 1) Get rid of all hash table usage, so only the required syms/types info
>    will be stored, rather than installing all of the kernel's syms/types.
>    To do this, special ELF sections such as .init_ksyms/ktypes are used to
>    declare and store the required info.
>
> 2) Support extension callbacks in makedumpfile, so during mm page
>    filtering an extension can help decide whether to keep or discard a page.
>
> 3) The patches are organized as follows:
>
>     --- <only for test purpose, don't merge> ---
>     7. Filter amdgpu mm pages
>
>     --- <code should be merged> ---
>     6. Add makedumpfile extensions support
>     5. Implement kernel module's btf resolving
>     4. Implement kernel module's kallsyms resolving
>     3. Implement kernel btf resolving
>     2. Implement kernel kallsyms resolving
>     1. Reserve sections for makedumpfile and extenions
>
>     Patch 7 is customization-specific and can be maintained separately.
>     Patches 1 ~ 6 are common code which should be integrated into makedumpfile.
>
> Link to v3: https://lore.kernel.org/kexec/20260120025500.25095-1-ltao@redhat.com/
> Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
>
> Tao Liu (7):
>   Reserve sections for makedumpfile and extenions
>   Implement kernel kallsyms resolving
>   Implement kernel btf resolving
>   Implement kernel module's kallsyms resolving
>   Implement kernel module's btf resolving
>   Add makedumpfile extensions support
>   Filter amdgpu mm pages
>
>  Makefile                   |  11 +-
>  btf_info.c                 | 345 +++++++++++++++++++++++++++
>  btf_info.h                 |  92 ++++++++
>  extension.c                | 300 +++++++++++++++++++++++
>  extension.h                |  12 +
>  extensions/Makefile        |  12 +
>  extensions/amdgpu_filter.c | 190 +++++++++++++++
>  extensions/maple_tree.c    | 307 ++++++++++++++++++++++++
>  extensions/maple_tree.h    |   6 +
>  kallsyms.c                 | 473 +++++++++++++++++++++++++++++++++++++
>  kallsyms.h                 |  94 ++++++++
>  makedumpfile.c             |  41 +++-
>  makedumpfile.h             |  13 +
>  makedumpfile.ld            |  15 ++
>  14 files changed, 1903 insertions(+), 8 deletions(-)
>  create mode 100644 btf_info.c
>  create mode 100644 btf_info.h
>  create mode 100644 extension.c
>  create mode 100644 extension.h
>  create mode 100644 extensions/Makefile
>  create mode 100644 extensions/amdgpu_filter.c
>  create mode 100644 extensions/maple_tree.c
>  create mode 100644 extensions/maple_tree.h
>  create mode 100644 kallsyms.c
>  create mode 100644 kallsyms.h
>  create mode 100644 makedumpfile.ld
>
> -- 
> 2.47.0


