* [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
@ 2026-04-14 10:26 Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 1/9] Reserve sections for makedumpfile and extensions Tao Liu
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

A) This patchset introduces the following features to makedumpfile:

  1) Add .so extension support to makedumpfile.
  2) Enable btf and kallsyms for resolving symbol types and addresses.

B) The purposes of the features are:

  1) Currently makedumpfile filters mm pages based on page flags, because the
     flags help to determine a page's usage. But this page-flag-checking
     method lacks flexibility in certain cases, e.g. if we want to filter out
     the mm pages occupied by a GPU during vmcore dumping because:

     a) The GPU may occupy a large amount of memory that contains sensitive
        data;
     b) GPU mm pages are unrelated to the kernel crash and useless for vmcore
        analysis.

     But there are no GPU-specific mm page flags, and apparently we don't need
     to create one just for kdump use. A programmable filtering tool is more
     suitable for such cases. In addition, different GPU vendors may allocate
     mm pages in different ways, so programmable filtering is better than
     hard-coding GPU-specific logic into makedumpfile.

  2) Currently makedumpfile already contains a programmable filtering tool,
     the eppic script, which allows users to write customized code for data
     erasing. However, it has the following drawbacks:

     a) It cannot do mm page filtering.
     b) It needs access to the debuginfo of both the kernel and the modules,
        which is not available in the 2nd kernel.
     c) The eppic library has memory leaks which are not all resolved [1].
        This is not acceptable in the 2nd kernel.

     makedumpfile needs to parse the dwarf data from debuginfo to get symbol
     types and addresses. Recent kernels provide dwarf alternatives such as
     btf/kallsyms which can be used for this purpose. And the btf/kallsyms
     info is already packed within the vmcore, so we can use it directly.

  With these, this patchset introduces makedumpfile extensions, which are
  based on btf/kallsyms symbol resolving and are programmable for mm page
  filtering. The following section shows their usage and performance. Please
  note that the tests were performed in the 1st kernel.

  3) Compile and run makedumpfile extensions:

  $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
  $ make extensions
  
  $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
    /tmp/extension.out --extension amdgpu_filter.so
    Loaded extension: ./extensions/amdgpu_filter.so
    makedumpfile Completed.
        User time (seconds): 5.08
        System time (seconds): 0.84
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.95
        Maximum resident set size (kbytes): 17360
        ...
 
     For comparison, the eppic script of v2 [2]:

  $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
    /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c   
    makedumpfile Completed.
        User time (seconds): 8.23
        System time (seconds): 0.88
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
        Maximum resident set size (kbytes): 57128
        ...

  -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
  -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
  -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore

C) Discussion:

  1) GPU types: Currently only tested with amdgpu's mm page filtering, others
     are not tested.
  2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
     rhel8.x is not supported, others are not tested.

D) Testing:

     If you don't want to create your own vmcore, you can use a vmcore which
     I created with the amdgpu mm pages unfiltered [3]; the amdgpu mm pages
     were allocated by the program in [4]. You can filter the amdgpu mm pages
     from this vmcore in the 1st kernel with the command line from the
     performance test above. To verify in crash that the pages are filtered:

     Unfiltered:
     crash> search -c "!QAZXSW@#EDC"
     ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     crash> rd ffff96b7fa800000
     ffff96b7fa800000:  405753585a415121                    !QAZXSW@
     crash> rd ffff96b87c800000
     ffff96b87c800000:  405753585a415121                    !QAZXSW@

     Filtered:
     crash> search -c "!QAZXSW@#EDC"
     crash> rd ffff96b7fa800000
     rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
     crash> rd ffff96b87c800000
     rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"

[1]: https://github.com/lucchouina/eppic/pull/32
[2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
[3]: https://people.redhat.com/~ltao/core/vmcore
[4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df

v4 -> v5:

1) Add "make EXTENSION=on" switch to customize the extension feature.
2) Clean up macros within btf_info.h.
3) Update doc and add a sample extension to demo the extension usage.
4) Use MSG()/ERRMSG() rather than fprintf().
5) Add return value check for readmem().
6) Allow "makedumpfile -d 1 --extension ext.so" to enter extension.
7) The patches are organized as follows:

    --- <customization specific> ---
    9. Add amdgpu mm pages filtering extension

    --- <code should be merged> ---
    8. Doc: Add --extension option to makedumpfile manual
    7. Add sample extension as an example reference
    6. Add makedumpfile extensions support
    5. Implement kernel module's btf resolving
    4. Implement kernel module's kallsyms resolving
    3. Implement kernel btf resolving
    2. Implement kernel kallsyms resolving
    1. Reserve sections for makedumpfile and extensions

    Patch 9 is customization specific; whether to merge it depends on the
    maintenance strategy.
    Patches 1 ~ 8 are common code which should be merged into makedumpfile.

Link to v4: https://lore.kernel.org/kexec/20260317150743.69590-1-ltao@redhat.com/
Link to v3: https://lore.kernel.org/kexec/20260120025500.25095-1-ltao@redhat.com/
Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/

Tao Liu (9):
  Reserve sections for makedumpfile and extensions
  Implement kernel kallsyms resolving
  Implement kernel btf resolving
  Implement kernel module's kallsyms resolving
  Implement kernel module's btf resolving
  Add makedumpfile extensions support
  Add sample extension as an example reference
  Doc: Add --extension option to makedumpfile manual
  Add amdgpu mm pages filtering extension

 Makefile                   |  15 +-
 README                     |   6 +
 btf_info.c                 | 375 +++++++++++++++++++++++++
 btf_info.h                 |  77 ++++++
 extension.c                | 338 ++++++++++++++++++++++
 extension.h                |  16 ++
 extensions/Makefile        |  13 +
 extensions/amdgpu_filter.c | 221 +++++++++++++++
 extensions/maple_tree.c    | 328 ++++++++++++++++++++++
 extensions/maple_tree.h    |   7 +
 extensions/sample.c        |  69 +++++
 kallsyms.c                 | 554 +++++++++++++++++++++++++++++++++++++
 kallsyms.h                 |  87 ++++++
 makedumpfile.8.in          |  11 +-
 makedumpfile.c             |  44 ++-
 makedumpfile.h             |  12 +
 makedumpfile.ld            |  16 ++
 17 files changed, 2180 insertions(+), 9 deletions(-)
 create mode 100644 btf_info.c
 create mode 100644 btf_info.h
 create mode 100644 extension.c
 create mode 100644 extension.h
 create mode 100644 extensions/Makefile
 create mode 100644 extensions/amdgpu_filter.c
 create mode 100644 extensions/maple_tree.c
 create mode 100644 extensions/maple_tree.h
 create mode 100644 extensions/sample.c
 create mode 100644 kallsyms.c
 create mode 100644 kallsyms.h
 create mode 100644 makedumpfile.ld

-- 
2.47.0




* [PATCH v5][makedumpfile 1/9] Reserve sections for makedumpfile and extensions
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 2/9] Implement kernel kallsyms resolving Tao Liu
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch prepares for btf/kallsyms support in makedumpfile and its
extensions. Any needed kernel symbols/types are reserved within special
sections: .init_ksyms for kallsyms symbols and .init_ktypes for kernel
types. During makedumpfile kallsyms/btf initialization, the missing
info of these entries is resolved. A makedumpfile.ld linker script is
introduced for this purpose.
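
The section mechanism described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from the
series: struct demo_sym, DEMO_SYM() and demo_count_for_module() are
hypothetical names, and the section is called "demo_ksyms" (a valid C
identifier) so that GNU ld or lld on an ELF target generates the
__start_/__stop_ symbols by itself. The series uses ".init_ksyms", which
starts with a dot, presumably one reason why the bounds are defined
explicitly in makedumpfile.ld.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type; the series itself uses struct ksym_info. */
struct demo_sym {
	const char *modname;
	const char *symname;
};

/*
 * Define one record and place a pointer to it into the "demo_ksyms"
 * section. "used" keeps the pointer although nothing references it by name.
 */
#define DEMO_SYM(MOD, SYM)						\
	static struct demo_sym demo_##MOD##_##SYM = { #MOD, #SYM };	\
	static struct demo_sym *demo_ptr_##MOD##_##SYM			\
	__attribute__((section("demo_ksyms"), used)) = &demo_##MOD##_##SYM

DEMO_SYM(vmlinux, _stext);
DEMO_SYM(vmlinux, __start_BTF);
DEMO_SYM(amdgpu, some_symbol);	/* made-up symbol name */

/* Provided by the linker for sections whose name is a C identifier. */
extern struct demo_sym *__start_demo_ksyms[];
extern struct demo_sym *__stop_demo_ksyms[];

/* Walk the section and count the records that belong to modname. */
size_t demo_count_for_module(const char *modname)
{
	size_t n = 0;

	for (struct demo_sym **p = __start_demo_ksyms;
	     p < __stop_demo_ksyms; p++) {
		if (strcmp((*p)->modname, modname) == 0)
			n++;
	}
	return n;
}
```

With the three DEMO_SYM() lines above, demo_count_for_module("vmlinux")
walks the section and finds two records. A resolver can fill in the
records' output fields during the same kind of walk.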

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 Makefile        |  2 +-
 makedumpfile.ld | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 makedumpfile.ld

diff --git a/Makefile b/Makefile
index 05ab5f2..15a4ba0 100644
--- a/Makefile
+++ b/Makefile
@@ -113,7 +113,7 @@ $(OBJ_ARCH): $(SRC_ARCH)
 	$(CC) $(CFLAGS_ARCH) -c -o ./$@ $(VPATH)$(@:.o=.c)
 
 makedumpfile: $(SRC_BASE) $(OBJ_PART) $(OBJ_ARCH)
-	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -o $@ $< $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -Wl,-T,makedumpfile.ld -o $@ $< $(LIBS)
 	@sed -e "s/@DATE@/$(DATE)/" \
 	     -e "s/@VERSION@/$(VERSION)/" \
 	     $(VPATH)makedumpfile.8.in > $(VPATH)makedumpfile.8
diff --git a/makedumpfile.ld b/makedumpfile.ld
new file mode 100644
index 0000000..474ad41
--- /dev/null
+++ b/makedumpfile.ld
@@ -0,0 +1,16 @@
+SECTIONS
+{
+	.init_ksyms ALIGN(8) : {
+		__start_init_ksyms = .;
+		KEEP(*(.init_ksyms*))
+		__stop_init_ksyms = .;
+	}
+
+	.init_ktypes ALIGN(8) : {
+		__start_init_ktypes = .;
+		KEEP(*(.init_ktypes*))
+		__stop_init_ktypes = .;
+	}
+}
+INSERT AFTER .data;
+
-- 
2.47.0




* [PATCH v5][makedumpfile 2/9] Implement kernel kallsyms resolving
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 1/9] Reserve sections for makedumpfile and extensions Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 3/9] Implement kernel btf resolving Tao Liu
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch parses the kernel's kallsyms data. During parsing, the
.init_ksyms sections of makedumpfile and the extensions are iterated,
so the kallsyms symbols which belong to vmlinux are resolved at this
point.
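
As a rough illustration of the name parsing, the sketch below expands one
kallsyms_names entry using a token table. It is a simplified stand-alone
model, not the code of this patch: demo_decode_len() and
demo_expand_symbol() are hypothetical helpers, and the buffers are passed
in by the caller instead of being read from the vmcore. A 2-byte length
can exceed 255, so the sketch keeps the decoded length in a 16-bit
integer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode the length prefix of one kallsyms_names entry.
 * Returns the number of prefix bytes (1 or 2), or 0 for an unexpected
 * 3-byte prefix.
 */
size_t demo_decode_len(const uint8_t *names, uint16_t *len)
{
	if (!(names[0] & 0x80)) {
		*len = names[0];
		return 1;
	}
	if (names[1] & 0x80)
		return 0;
	/* Low 7 bits come from the first byte, the rest from the second. */
	*len = (uint16_t)((names[0] & 0x7F) | (names[1] << 7));
	return 2;
}

/*
 * Expand one entry into out. Every data byte selects an entry of
 * token_index, which holds the offset of a NUL-terminated token in
 * token_table. Returns the number of bytes consumed from names, or 0 if
 * the prefix is invalid or out is too small.
 */
size_t demo_expand_symbol(const uint8_t *names, const char *token_table,
			  const uint16_t *token_index,
			  char *out, size_t out_sz)
{
	uint16_t len;
	size_t used = 0;
	size_t hdr = demo_decode_len(names, &len);

	if (!hdr || !out_sz)
		return 0;
	out[0] = '\0';
	for (uint16_t i = 0; i < len; i++) {
		const char *tok = &token_table[token_index[names[hdr + i]]];
		size_t tok_len = strlen(tok);

		if (used + tok_len >= out_sz)
			return 0;
		memcpy(out + used, tok, tok_len + 1);
		used += tok_len;
	}
	return hdr + len;
}
```

The first character of the expanded string is the symbol type, which is
why the patch compares the wanted symbol name against &buf[1].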

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 Makefile       |   2 +-
 kallsyms.c     | 396 +++++++++++++++++++++++++++++++++++++++++++++++++
 kallsyms.h     |  84 +++++++++++
 makedumpfile.c |   3 +
 makedumpfile.h |  11 ++
 5 files changed, 495 insertions(+), 1 deletion(-)
 create mode 100644 kallsyms.c
 create mode 100644 kallsyms.h

diff --git a/Makefile b/Makefile
index 15a4ba0..a57185e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
diff --git a/kallsyms.c b/kallsyms.c
new file mode 100644
index 0000000..a198de6
--- /dev/null
+++ b/kallsyms.c
@@ -0,0 +1,396 @@
+#define _GNU_SOURCE
+#include <stdbool.h>
+#ifdef EXTENSION
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include "makedumpfile.h"
+#include "kallsyms.h"
+
+static uint32_t *kallsyms_offsets = NULL;
+static uint16_t *kallsyms_token_index = NULL;
+static uint8_t  *kallsyms_token_table = NULL;
+static uint8_t  *kallsyms_names = NULL;
+static unsigned long kallsyms_relative_base = 0;
+static unsigned int kallsyms_num_syms = 0;
+
+/* makedumpfile & extensions' .init_ksyms section range array */
+static struct section_range **sr = NULL;
+static int sr_len = 0;
+static int sr_cap = 0;
+
+/* Which mod's kallsyms should be inited? */
+static char **mods = NULL;
+static int mods_len = 0;
+static int mods_cap = 0;
+
+INIT_MOD_SYM(vmlinux, _stext);
+
+/*
+ * Utility: add elem to arr, which can auto extend its capacity.
+ * (*arr) is a pointer array, holding pointers of elem
+*/
+bool add_to_arr(void ***arr, int *arr_len, int *arr_cap, void *elem)
+{
+	void *tmp;
+	int new_cap = 0;
+
+	if (*arr == NULL) {
+		*arr_len = 0;
+		new_cap = 4;
+	} else if (*arr_len >= *arr_cap) {
+		new_cap = (*arr_cap) + ((*arr_cap) >> 1);
+	}
+
+	if (new_cap) {
+		tmp = reallocarray(*arr, new_cap, sizeof(void *));
+		if (!tmp)
+			goto no_mem;
+		*arr = tmp;
+		*arr_cap = new_cap;
+	}
+
+	(*arr)[(*arr_len)++] = elem;
+	return true;
+
+no_mem:
+	ERRMSG("Not enough memory!\n");
+	return false;
+}
+
+/*
+ * Utility: add uniq string to arr, which can auto extend its capacity.
+*/
+bool push_uniq_str(void ***arr, int *arr_len, int *arr_cap, char *str)
+{
+	for (int i = 0; i < (*arr_len); i++) {
+		if (!strcmp((*arr)[i], str))
+			/* String already exists, skip it */
+			return true;
+	}
+	return add_to_arr(arr, arr_len, arr_cap, str);
+}
+
+static bool add_ksym_modname(char *modname)
+{
+	return push_uniq_str((void ***)&mods, &mods_len, &mods_cap, modname);
+}
+
+bool check_ksyms_require_modname(char *modname, int *total)
+{
+	if (total)
+		*total = mods_len;
+	for (int i = 0; i < mods_len; i++) {
+		if (!strcmp(modname, mods[i]))
+			return true;
+	}
+	return false;
+}
+
+static void cleanup_ksyms_modname(void)
+{
+	if (mods) {
+		free(mods);
+		mods = NULL;
+	}
+	mods_len = 0;
+	mods_cap = 0;
+}
+
+/*
+ * Used by makedumpfile and extensions to register their .init_ksyms section,
+ * so kallsyms can know which module/sym should be inited.
+*/
+REGISTER_SECTION(ksym)
+
+static void cleanup_ksyms_section_range(void)
+{
+	for (int i = 0; i < sr_len; i++) {
+		free(sr[i]);
+	}
+	if (sr) {
+		free(sr);
+		sr = NULL;
+	}
+	sr_len = 0;
+	sr_cap = 0;
+}
+
+static uint64_t absolute_percpu(uint64_t base, int32_t val)
+{
+	if (val >= 0)
+		return (uint64_t)val;
+	else
+		return base - 1 - val;
+}
+
+static uint64_t calc_addr_absolute_percpu(struct ksym_info *p)
+{
+	return absolute_percpu(kallsyms_relative_base, p->value);
+}
+
+static uint64_t calc_addr_relative_base(struct ksym_info *p)
+{
+	return p->value + kallsyms_relative_base;
+}
+
+static uint64_t calc_addr_place_relative(struct ksym_info *p)
+{
+	return SYMBOL(kallsyms_offsets) + p->index * sizeof(uint32_t) +
+		(int32_t)kallsyms_offsets[p->index];
+}
+
+static bool parse_kernel_kallsyms(void)
+{
+	char buf[BUFSIZE];
+	int index = 0, i, j;
+	uint8_t *compressd_data;
+	uint8_t *uncompressd_data;
+	uint16_t len, len_old;	/* a 2-byte length does not fit in 8 bits */
+	struct ksym_info **p;
+	uint64_t (*calc_addr)(struct ksym_info *);
+	struct ksym_info *stext_p;
+	bool skip_symbol;
+
+	for (i = 0; i < kallsyms_num_syms; i++) {
+		skip_symbol = false;
+		memset(buf, 0, BUFSIZE);
+		len = kallsyms_names[index];
+		if (len & 0x80) {
+			index++;
+			len_old = len;
+			len = kallsyms_names[index];
+			if (len & 0x80) {
+				ERRMSG("BUG! Unexpected 3-byte length, "
+				       "should be detected in init_kernel_kallsyms()\n");
+				goto out;
+			}
+			len = (len_old & 0x7F) | (len << 7);
+		}
+		index++;
+
+		compressd_data = &kallsyms_names[index];
+		index += len;
+		while (len--) {
+			uncompressd_data = &kallsyms_token_table[kallsyms_token_index[*compressd_data]];
+			if (strlen(buf) + strlen((char *)uncompressd_data) >= BUFSIZE) {
+				skip_symbol = true;
+				break;
+			}
+			strcat(buf, (char *)uncompressd_data);
+			compressd_data++;
+		}
+
+		if (skip_symbol)
+			continue;
+
+		/* Now check if the symbol is one we want */
+		for (j = 0; j < sr_len; j++) {
+			for (p = (struct ksym_info **)(sr[j]->start);
+			     p < (struct ksym_info **)(sr[j]->stop);
+			     p++) {
+				if (!strcmp((*p)->modname, "vmlinux") &&
+				    !strcmp((*p)->symname, &buf[1])) {
+					(*p)->value = kallsyms_offsets[i];
+					(*p)->index = i;
+				}
+			}
+		}
+	}
+
+	/* Determine which approach calculates the absolute kallsyms address
+	 *
+	 * For a complete description of each approach, please refer to:
+	 * https://github.com/osandov/drgn/commit/744f36ec3c3f64d7e1323a0037898158698585c4
+	 */
+	if (!MOD_SYM_EXIST(vmlinux, _stext)) {
+		ERRMSG("symbol _stext not found!\n");
+		goto out;
+	}
+
+	stext_p = GET_MOD_SYM_PTR(vmlinux, _stext);
+
+	if (SYMBOL(_stext) == calc_addr_absolute_percpu(stext_p)) {
+		calc_addr = calc_addr_absolute_percpu;
+	} else if (SYMBOL(_stext) == calc_addr_relative_base(stext_p)) {
+		calc_addr = calc_addr_relative_base;
+	} else if (SYMBOL(_stext) == calc_addr_place_relative(stext_p)) {
+		calc_addr = calc_addr_place_relative;
+	} else {
+		ERRMSG("Wrong calculate kallsyms symbol value!\n");
+		goto out;
+	}
+
+	/* Now do the calc */
+	for (j = 0; j < sr_len; j++) {
+		for (p = (struct ksym_info **)(sr[j]->start);
+		     p < (struct ksym_info **)(sr[j]->stop);
+		     p++) {
+			if (!strcmp((*p)->modname, "vmlinux") &&
+			    SYM_EXIST(*p)) {
+				(*p)->value = calc_addr(*p);
+			}
+		}
+	}
+
+	return true;
+out:
+	return false;
+}
+
+static bool vmcore_info_ready = false;
+
+bool read_vmcoreinfo_kallsyms(void)
+{
+	READ_SYMBOL("kallsyms_names", kallsyms_names);
+	READ_SYMBOL("kallsyms_num_syms", kallsyms_num_syms);
+	READ_SYMBOL("kallsyms_token_table", kallsyms_token_table);
+	READ_SYMBOL("kallsyms_token_index", kallsyms_token_index);
+	READ_SYMBOL("kallsyms_offsets", kallsyms_offsets);
+	READ_SYMBOL("kallsyms_relative_base", kallsyms_relative_base);
+	if (SYMBOL(kallsyms_names) != NOT_FOUND_SYMBOL) {
+		vmcore_info_ready = true;
+	} else {
+		vmcore_info_ready = false;
+	}
+	return true;
+}
+
+/*
+ * Makedumpfile's .init_ksyms section
+*/
+extern struct ksym_info *__start_init_ksyms[];
+extern struct ksym_info *__stop_init_ksyms[];
+
+bool init_kernel_kallsyms(void)
+{
+	const int token_index_size = (UINT8_MAX + 1) * sizeof(uint16_t);
+	uint64_t last_token, len;
+	unsigned char data, data_old;
+	int i;
+	bool ret = false;
+
+	if (vmcore_info_ready == false) {
+		ERRMSG("vmcoreinfo not ready for kallsyms!\n");
+		return ret;
+	}
+
+	if (!register_ksym_section((char *)__start_init_ksyms,
+				   (char *)__stop_init_ksyms))
+		return ret;
+
+	if (!readmem(VADDR, SYMBOL(kallsyms_num_syms), &kallsyms_num_syms,
+		sizeof(kallsyms_num_syms))) {
+		ERRMSG("Can't get kallsyms_num_syms!\n");
+		goto out;
+	}
+	if (SYMBOL(kallsyms_relative_base) != NOT_FOUND_SYMBOL) {
+		if (!readmem(VADDR, SYMBOL(kallsyms_relative_base),
+			&kallsyms_relative_base, sizeof(kallsyms_relative_base))) {
+			ERRMSG("Can't get kallsyms_relative_base!\n");
+			goto out;
+		}
+	}
+
+	kallsyms_offsets = malloc(sizeof(uint32_t) * kallsyms_num_syms);
+	if (!kallsyms_offsets)
+		goto no_mem;
+	if (!readmem(VADDR, SYMBOL(kallsyms_offsets), kallsyms_offsets,
+		kallsyms_num_syms * sizeof(uint32_t))) {
+		ERRMSG("Can't get kallsyms_offsets!\n");
+		goto out;
+	}
+
+	kallsyms_token_index = malloc(token_index_size);
+	if (!kallsyms_token_index)
+		goto no_mem;
+	if (!readmem(VADDR, SYMBOL(kallsyms_token_index), kallsyms_token_index,
+		token_index_size)) {
+		ERRMSG("Can't get kallsyms_token_index!\n");
+		goto out;
+	}
+
+	last_token = SYMBOL(kallsyms_token_table) + kallsyms_token_index[UINT8_MAX];
+	do {
+		if (!readmem(VADDR, last_token++, &data, 1)) {
+			ERRMSG("Can't get last_token!\n");
+			goto out;
+		}
+	} while(data);
+	len = last_token - SYMBOL(kallsyms_token_table);
+	kallsyms_token_table = malloc(len);
+	if (!kallsyms_token_table)
+		goto no_mem;
+	if (!readmem(VADDR, SYMBOL(kallsyms_token_table), kallsyms_token_table, len)) {
+		ERRMSG("Can't get kallsyms_token_table!\n");
+		goto out;
+	}
+
+	for (len = 0, i = 0; i < kallsyms_num_syms; i++) {
+		if (!readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1)) {
+			ERRMSG("Can't get kallsyms_names len1!\n");
+			goto out;
+		}
+		/*
+		 * The 2-byte representation was added in commit 73bbb94466fd3
+		 * ("kallsyms: support "big" kernel symbols") in v6.1, thus for
+		 * v6.1+, they indicate a long symbol, but for kernel versions
+		 * prior to v6.1, they might be ambiguous.
+		 */
+		if (!(data & 0x80)) {
+			len += data + 1;
+		} else {
+			data_old = data;
+			if (!readmem(VADDR, SYMBOL(kallsyms_names) + len + 1, &data, 1)) {
+				ERRMSG("Can't get kallsyms_names len2!\n");
+				goto out;
+			}
+			if (data & 0x80) {
+				ERRMSG("BUG! Unexpected 3-byte length "
+					"encoding in kallsyms names\n");
+				goto out;
+			}
+			len += ((data_old & 0x7F) | (data << 7)) + 2;
+		}
+	}
+	kallsyms_names = malloc(len);
+	if (!kallsyms_names)
+		goto no_mem;
+	if (!readmem(VADDR, SYMBOL(kallsyms_names), kallsyms_names, len)) {
+		ERRMSG("Can't get kallsyms_names!\n");
+		goto out;
+	}
+
+	ret = parse_kernel_kallsyms();
+	goto out;
+
+no_mem:
+	ERRMSG("Not enough memory!\n");
+out:
+	if (kallsyms_offsets) {
+		free(kallsyms_offsets);
+		kallsyms_offsets = NULL;
+	}
+	if (kallsyms_token_index) {
+		free(kallsyms_token_index);
+		kallsyms_token_index = NULL;
+	}
+	if (kallsyms_token_table) {
+		free(kallsyms_token_table);
+		kallsyms_token_table = NULL;
+	}
+	if (kallsyms_names) {
+		free(kallsyms_names);
+		kallsyms_names = NULL;
+	}
+	return ret;
+}
+#else /* EXTENSION */
+
+bool read_vmcoreinfo_kallsyms(void)
+{
+	return true;
+}
+
+#endif /* EXTENSION */
+
diff --git a/kallsyms.h b/kallsyms.h
new file mode 100644
index 0000000..73e9839
--- /dev/null
+++ b/kallsyms.h
@@ -0,0 +1,84 @@
+#ifndef _KALLSYMS_H
+#define _KALLSYMS_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+struct ksym_info {
+	/********in******/
+	char *modname;
+	char *symname;
+	bool sym_required;
+	/********out*****/
+	uint64_t value;
+	int index;	// -1 if sym not found
+};
+
+#define INIT_MOD_SYM_RQD(MOD, SYM, R)			\
+	struct ksym_info _##MOD##_##SYM = {		\
+		#MOD, #SYM, R, 0, -1			\
+	};						\
+	__attribute__((section(".init_ksyms"), used))	\
+	struct ksym_info * _ptr_##MOD##_##SYM = &_##MOD##_##SYM
+
+#define GET_MOD_SYM(MOD, SYM) (_##MOD##_##SYM.value)
+#define GET_MOD_SYM_PTR(MOD, SYM) (&_##MOD##_##SYM)
+#define MOD_SYM_EXIST(MOD, SYM) (_##MOD##_##SYM.index >= 0)
+#define SYM_EXIST(p) ((p)->index >= 0)
+
+/*
+ * Required syms will be checked automatically before an extension runs.
+ * Optional syms should be checked manually at extension runtime.
+ */
+#define INIT_MOD_SYM(MOD, SYM) INIT_MOD_SYM_RQD(MOD, SYM, 1)
+#define INIT_OPT_MOD_SYM(MOD, SYM) INIT_MOD_SYM_RQD(MOD, SYM, 0)
+
+struct section_range {
+	char *start;
+	char *stop;
+};
+
+#define REGISTER_SECTION(T)						\
+bool register_##T##_section(char *start, char *stop)			\
+{									\
+	struct section_range *new_sr;					\
+	struct T##_info **p;						\
+	bool ret = false;						\
+									\
+	if (!start || !stop) {						\
+		fprintf(stderr, "%s: Invalid section start/stop\n",	\
+			__func__);					\
+		goto out;						\
+	}								\
+									\
+	for (p = (struct T##_info **)start;				\
+	     p < (struct T##_info **)stop;				\
+	     p++) {							\
+		if (!add_##T##_modname((*p)->modname))			\
+			goto out;					\
+	}								\
+									\
+	new_sr = malloc(sizeof(struct section_range));			\
+	if (!new_sr) {							\
+		fprintf(stderr, "%s: Not enough memory!\n", __func__);	\
+		goto out;						\
+	}								\
+	new_sr->start = start;						\
+	new_sr->stop = stop;						\
+	if (!add_to_arr((void ***)&sr, &sr_len, &sr_cap, new_sr)) {	\
+		free(new_sr);						\
+		goto out;						\
+	}								\
+	ret = true;							\
+out:									\
+	return ret;							\
+}
+
+bool add_to_arr(void ***arr, int *arr_len, int *arr_cap, void *elem);
+bool push_uniq_str(void ***arr, int *arr_len, int *arr_cap, char *str);
+bool check_ksyms_require_modname(char *modname, int *total);
+bool register_ksym_section(char *start, char *stop);
+bool read_vmcoreinfo_kallsyms(void);
+bool init_kernel_kallsyms(void);
+#endif /* _KALLSYMS_H */
+
diff --git a/makedumpfile.c b/makedumpfile.c
index 12fb0d8..dba3628 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,7 @@
 #include <limits.h>
 #include <assert.h>
 #include <zlib.h>
+#include "kallsyms.h"
 
 struct symbol_table	symbol_table;
 struct size_table	size_table;
@@ -3105,6 +3106,8 @@ read_vmcoreinfo_from_vmcore(off_t offset, unsigned long size, int flag_xen_hv)
 		if (!read_vmcoreinfo())
 			goto out;
 	}
+	read_vmcoreinfo_kallsyms();
+
 	close_vmcoreinfo();
 
 	ret = TRUE;
diff --git a/makedumpfile.h b/makedumpfile.h
index 134eb7a..0f13743 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -259,6 +259,7 @@ static inline int string_exists(char *s) { return (s ? TRUE : FALSE); }
 #define UINT(ADDR)	*((unsigned int *)(ADDR))
 #define ULONG(ADDR)	*((unsigned long *)(ADDR))
 #define ULONGLONG(ADDR)	*((unsigned long long *)(ADDR))
+#define VOID_PTR(ADDR)	*((void **)(ADDR))
 
 
 /*
@@ -1919,6 +1920,16 @@ struct symbol_table {
 	 * symbols on sparc64 arch
 	 */
 	unsigned long long		vmemmap_table;
+
+	/*
+	 * kallsyms related
+	 */
+	unsigned long long		kallsyms_names;
+	unsigned long long		kallsyms_num_syms;
+	unsigned long long		kallsyms_token_table;
+	unsigned long long		kallsyms_token_index;
+	unsigned long long		kallsyms_offsets;
+	unsigned long long		kallsyms_relative_base;
 };
 
 struct size_table {
-- 
2.47.0




* [PATCH v5][makedumpfile 3/9] Implement kernel btf resolving
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 1/9] Reserve sections for makedumpfile and extensions Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 2/9] Implement kernel kallsyms resolving Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 4/9] Implement kernel module's kallsyms resolving Tao Liu
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch parses the kernel's btf data using libbpf. The kernel's
btf data is located between the __start_BTF and __stop_BTF symbols, which
are resolved by the kallsyms code of the previous patch. As in the
previous patch, the .init_ktypes sections of makedumpfile and the
extensions are iterated, and any types which belong to vmlinux are
resolved at this time.

Another primary function implemented in this patch recursively dives
into any anonymous struct/union it encounters, in order to find a
member by its name.
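
The anonymous struct/union descent can be illustrated with the toy model
below. It deliberately does not use libbpf: struct demo_type, struct
demo_member and demo_find_member() are hypothetical stand-ins for the BTF
records, so this is a sketch of the idea only, assuming each member stores
its bit offset relative to the enclosing type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for BTF type records; not the libbpf data model. */
struct demo_type;

struct demo_member {
	const char *name;		/* NULL or "" if anonymous */
	int bit_offset;			/* relative to the enclosing type */
	const struct demo_type *type;	/* set for struct/union members */
};

struct demo_type {
	const struct demo_member *members;
	size_t nr_members;
};

/*
 * Search t for a member called name. Anonymous struct/union members are
 * entered recursively and their offsets are accumulated, so the result
 * is the bit offset from the start of the outermost type.
 */
bool demo_find_member(const struct demo_type *t, const char *name,
		      int base_offset, int *bit_offset)
{
	for (size_t i = 0; i < t->nr_members; i++) {
		const struct demo_member *m = &t->members[i];
		int off = base_offset + m->bit_offset;

		if (m->name && m->name[0]) {
			if (strcmp(m->name, name) == 0) {
				*bit_offset = off;
				return true;
			}
			continue;	/* named member: do not descend */
		}
		if (m->type &&
		    demo_find_member(m->type, name, off, bit_offset))
			return true;
	}
	return false;
}
```

For a type laid out like "struct { int a; union { int b; struct { int c;
int d; }; }; }", looking up "d" enters the anonymous union at bit 32 and
then the anonymous struct inside it, giving a bit offset of 64.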

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 Makefile   |   2 +-
 btf_info.c | 242 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 btf_info.h |  75 +++++++++++++++++
 3 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 btf_info.c
 create mode 100644 btf_info.h

diff --git a/Makefile b/Makefile
index a57185e..690ef3e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
diff --git a/btf_info.c b/btf_info.c
new file mode 100644
index 0000000..7243674
--- /dev/null
+++ b/btf_info.c
@@ -0,0 +1,242 @@
+#ifdef EXTENSION
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf_legacy.h>
+#include "makedumpfile.h"
+#include "kallsyms.h"
+#include "btf_info.h"
+
+struct btf_arr_elem {
+	struct btf *btf;
+	char *module;
+};
+
+static struct btf_arr_elem **btf_arr = NULL;
+static int btf_arr_len = 0;
+static int btf_arr_cap = 0;
+
+/* makedumpfile & extensions' .init_ktypes section range array */
+static struct section_range **sr = NULL;
+static int sr_len = 0;
+static int sr_cap = 0;
+
+/* Which mod's btf should be inited? */
+static char **mods = NULL;
+static int mods_len = 0;
+static int mods_cap = 0;
+
+static bool add_ktype_modname(char *modname)
+{
+	return push_uniq_str((void ***)&mods, &mods_len, &mods_cap, modname);
+}
+
+bool check_ktypes_require_modname(char *modname, int *total)
+{
+	if (total)
+		*total = mods_len;
+	for (int i = 0; i < mods_len; i++) {
+		if (!strcmp(modname, mods[i]))
+			return true;
+	}
+	return false;
+}
+
+static void cleanup_ktypes_modname(void)
+{
+	if (mods) {
+		free(mods);
+		mods = NULL;
+	}
+	mods_len = 0;
+	mods_cap = 0;
+}
+
+/*
+ * Used by makedumpfile and extensions, to register their .init_ktypes section,
+ * so btf_info can know which module/type should be inited.
+*/
+REGISTER_SECTION(ktype)
+
+static void cleanup_ktypes_section_range(void)
+{
+	for (int i = 0; i < sr_len; i++) {
+		free(sr[i]);
+	}
+	if (sr) {
+		free(sr);
+		sr = NULL;
+	}
+	sr_len = 0;
+	sr_cap = 0;
+}
+
+static void find_member_recursive(struct btf *btf, int struct_typeid,
+				  int base_offset, struct ktype_info *ki)
+{
+	const struct btf_type *st;
+	struct btf_member *bm;
+	int i, vlen;
+
+	struct_typeid = btf__resolve_type(btf, struct_typeid);
+	st = btf__type_by_id(btf, struct_typeid);
+
+	if (!st)
+		return;
+
+	if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT &&
+	    BTF_INFO_KIND(st->info) != BTF_KIND_UNION)
+		return;
+
+	vlen = BTF_INFO_VLEN(st->info);
+	bm = btf_members(st);
+
+	for (i = 0; i < vlen; i++, bm++) {
+		const char *name = btf__name_by_offset(btf, bm->name_off);
+		int member_bit_offset = btf_member_bit_offset(st, i) + base_offset;
+		int member_typeid = btf__resolve_type(btf, bm->type);
+		const struct btf_type *mt = btf__type_by_id(btf, member_typeid);
+
+		if (name && strcmp(name, ki->member_name) == 0) {
+			ki->member_bit_offset = member_bit_offset;
+			ki->member_bit_sz = btf_member_bitfield_size(st, i);
+			ki->member_size = btf__resolve_size(btf, member_typeid);
+			ki->index = i;
+			return;
+		}
+
+		if (!name || !name[0]) {
+			if (BTF_INFO_KIND(mt->info) == BTF_KIND_STRUCT ||
+			    BTF_INFO_KIND(mt->info) == BTF_KIND_UNION) {
+				find_member_recursive(btf, member_typeid,
+						      member_bit_offset, ki);
+			}
+		}
+	}
+}
+
+static void get_ktype_info(struct ktype_info *ki, char *mod_to_resolve)
+{
+	int i, j, start_id;
+
+	if (mod_to_resolve != NULL) {
+		if (strcmp(ki->modname, mod_to_resolve) != 0)
+			/* Exit safely */
+			return;
+	}
+
+	for (i = 0; i < btf_arr_len; i++) {
+		if (strcmp(btf_arr[i]->module, ki->modname) != 0)
+			continue;
+		/*
+		 * vmlinux(btf_arr[0])'s typeid is 1~vmlinux_type_cnt,
+		 * modules(btf_arr[1...])'s typeid is vmlinux_type_cnt~btf__type_cnt
+		 */
+		start_id = (i == 0 ? 1 : btf__type_cnt(btf_arr[0]->btf));
+
+		for (j = start_id; j < btf__type_cnt(btf_arr[i]->btf); j++) {
+			const struct btf_type *bt =
+				btf__type_by_id(btf_arr[i]->btf, j);
+			const char *name =
+				btf__name_by_offset(btf_arr[i]->btf, bt->name_off);
+
+			if (name && strcmp(ki->struct_name, name) == 0) {
+				if (ki->member_name != NULL) {
+					/* Retrieve member info */
+					find_member_recursive(btf_arr[i]->btf, j, 0, ki);
+				} else {
+					ki->index = j;
+				}
+				ki->struct_size = btf__resolve_size(btf_arr[i]->btf, j);
+				return;
+			}
+		}
+	}
+}
+
+static bool add_to_btf_arr(struct btf *btf, char *module_name)
+{
+	struct btf_arr_elem *new_p;
+
+	new_p = malloc(sizeof(struct btf_arr_elem));
+	if (!new_p)
+		goto no_mem;
+
+	new_p->btf = btf;
+	new_p->module = module_name;
+
+	return add_to_arr((void ***)&btf_arr, &btf_arr_len, &btf_arr_cap, new_p);
+
+no_mem:
+	ERRMSG("Not enough memory!\n");
+	return false;
+}
+
+INIT_MOD_SYM(vmlinux, __start_BTF);
+INIT_MOD_SYM(vmlinux, __stop_BTF);
+
+#define GET_KERN_SYM(SYM) GET_MOD_SYM(vmlinux, SYM)
+#define KERN_SYM_EXIST(SYM) MOD_SYM_EXIST(vmlinux, SYM)
+
+/*
+ * Makedumpfile's .init_ktypes section
+ */
+extern struct ktype_info *__start_init_ktypes[];
+extern struct ktype_info *__stop_init_ktypes[];
+
+bool init_kernel_btf(void)
+{
+	uint64_t size;
+	struct btf *btf;
+	int i;
+	struct ktype_info **p;
+	char *buf = NULL;
+	bool ret = false;
+
+	uint64_t start_btf = GET_KERN_SYM(__start_BTF);
+	uint64_t stop_btf = GET_KERN_SYM(__stop_BTF);
+	if (!KERN_SYM_EXIST(__start_BTF) ||
+	    !KERN_SYM_EXIST(__stop_BTF)) {
+		ERRMSG("symbol __start/stop_BTF not found!\n");
+		goto out;
+	}
+
+	if (!register_ktype_section((char *)__start_init_ktypes,
+				    (char *)__stop_init_ktypes))
+		return ret;
+
+	size = stop_btf - start_btf;
+	buf = (char *)malloc(size);
+	if (!buf) {
+		ERRMSG("Not enough memory!\n");
+		goto out;
+	}
+	if (!readmem(VADDR, start_btf, buf, size)) {
+		ERRMSG("Can't get kernel btf data!\n");
+		goto out;
+	}
+	btf = btf__new(buf, size);
+
+	if (libbpf_get_error(btf) != 0 ||
+	    add_to_btf_arr(btf, strdup("vmlinux")) == false) {
+		ERRMSG("init vmlinux btf fail\n");
+		goto out;
+	}
+
+	for (i = 0; i < sr_len; i++) {
+		for (p = (struct ktype_info **)(sr[i]->start);
+		     p < (struct ktype_info **)(sr[i]->stop);
+		     p++) {
+			get_ktype_info(*p, "vmlinux");
+		}
+	}
+
+	ret = true;
+out:
+	if (buf)
+		free(buf);
+	return ret;
+}
+#endif /* EXTENSION */
+
diff --git a/btf_info.h b/btf_info.h
new file mode 100644
index 0000000..8eace67
--- /dev/null
+++ b/btf_info.h
@@ -0,0 +1,75 @@
+#ifndef _BTF_INFO_H
+#define _BTF_INFO_H
+#include <stdint.h>
+#include <stdbool.h>
+
+struct ktype_info {
+	/********in******/
+	char *modname;		// Set to search within the module, in case
+				// name conflict of different modules
+	char *struct_name;	// Search by struct name
+	char *member_name;	// Search by member name
+	bool struct_required : 1;
+	bool member_required : 1;
+	/********out*****/
+	uint32_t member_bit_offset;	// member offset in bits
+	uint32_t member_bit_sz;	// member width in bits
+	uint32_t member_size;	// member size in bytes
+	uint32_t struct_size;	// struct size in bytes
+	int index;		// -1 if type not found
+};
+
+bool check_ktypes_require_modname(char *modname, int *total);
+bool register_ktype_section(char *start, char *stop);
+bool init_kernel_btf(void);
+
+#define _GEN_NAME_PTR_IMPL(PTR, NAME)		PTR##NAME
+#define _GEN_NAME_PTR(PTR, NAME)		_GEN_NAME_PTR_IMPL(PTR, NAME)
+#define ___GEN_NAME_2(MOD, S)			_##MOD##_##S
+#define ___GEN_NAME_3(MOD, S, M)		_##MOD##_##S##_##M
+#define __GEN_NAME_SELECTOR(_1, _2, _3, NAME, ...) NAME
+#define _GEN_NAME(...) __GEN_NAME_SELECTOR(__VA_ARGS__, ___GEN_NAME_3, ___GEN_NAME_2)(__VA_ARGS__)
+
+#define _INIT_MOD_STRUCT_MEMBER_RQD(MOD, S, M, R)		\
+	struct ktype_info _GEN_NAME(MOD, S, M) = {		\
+		#MOD, #S, #M, R, R, 0, 0, 0, 0, -1		\
+	};							\
+	__attribute__((section(".init_ktypes"), used))		\
+	struct ktype_info * _GEN_NAME_PTR(_ptr, _GEN_NAME(MOD, S, M)) = &_GEN_NAME(MOD, S, M)
+
+/*
+ * Required types are checked automatically before an extension runs.
+ * Optional types should be checked manually at extension runtime.
+ */
+#define INIT_MOD_STRUCT_MEMBER(MOD, S, M) \
+	_INIT_MOD_STRUCT_MEMBER_RQD(MOD, S, M, 1)
+#define INIT_OPT_MOD_STRUCT_MEMBER(MOD, S, M) \
+	_INIT_MOD_STRUCT_MEMBER_RQD(MOD, S, M, 0)
+
+#define DECLARE_MOD_STRUCT_MEMBER(MOD, S, M) \
+	extern struct ktype_info _GEN_NAME(MOD, S, M)
+
+#define GET_MOD_STRUCT_MEMBER_MOFF(MOD, S, M)	(_GEN_NAME(MOD, S, M).member_bit_offset)
+#define GET_MOD_STRUCT_MEMBER_MSIZE(MOD, S, M)	(_GEN_NAME(MOD, S, M).member_size)
+#define GET_MOD_STRUCT_MEMBER_SSIZE(MOD, S, M)	(_GEN_NAME(MOD, S, M).struct_size)
+#define MOD_STRUCT_MEMBER_EXIST(MOD, S, M)	(_GEN_NAME(MOD, S, M).index >= 0)
+#define TYPE_EXIST(p)				((p)->index >= 0)
+
+
+#define _INIT_MOD_STRUCT_RQD(MOD, S, R)				\
+	struct ktype_info _GEN_NAME(MOD, S) = {			\
+		#MOD, #S, 0, R, 0, 0, 0, 0, 0, -1		\
+	};							\
+	__attribute__((section(".init_ktypes"), used))		\
+	struct ktype_info * _GEN_NAME_PTR(_ptr, _GEN_NAME(MOD, S)) = &_GEN_NAME(MOD, S)
+
+#define INIT_MOD_STRUCT(MOD, S)		_INIT_MOD_STRUCT_RQD(MOD, S, 1)
+#define INIT_OPT_MOD_STRUCT(MOD, S)	_INIT_MOD_STRUCT_RQD(MOD, S, 0)
+
+#define DECLARE_MOD_STRUCT(MOD, S)	extern struct ktype_info _GEN_NAME(MOD, S)
+
+#define GET_MOD_STRUCT_SSIZE(MOD, S)	(_GEN_NAME(MOD, S).struct_size)
+#define MOD_STRUCT_EXIST(MOD, S)	(_GEN_NAME(MOD, S).index >= 0)
+
+#endif /* _BTF_INFO_H */
+
-- 
2.47.0




* [PATCH v5][makedumpfile 4/9] Implement kernel module's kallsyms resolving
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (2 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 3/9] Implement kernel btf resolving Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 5/9] Implement kernel module's btf resolving Tao Liu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

With the kernel's kallsyms and btf ready, we can resolve any kernel type
and symbol address. That lets us walk the kernel's module list and parse
each module's structure to reach its kallsyms data. At this point, the
module kallsyms symbols declared in the .init_ksyms section are
resolved.
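
For illustration only, the list walk described above can be sketched in
self-contained C. The flat mem[] array, read_u64()/write_u64() and the
record layout (LIST_OFF, NEXT_OFF) are assumptions made up for this
sketch; the patch itself reads the dump with readmem() and takes the
offsets from btf.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for dump memory: addresses are offsets into mem[]. */
static unsigned char mem[256];

static uint64_t read_u64(uint64_t addr)
{
	uint64_t v;

	assert(addr + sizeof(v) <= sizeof(mem));
	memcpy(&v, mem + addr, sizeof(v));
	return v;
}

static void write_u64(uint64_t addr, uint64_t v)
{
	assert(addr + sizeof(v) <= sizeof(mem));
	memcpy(mem + addr, &v, sizeof(v));
}

/* Illustrative record layout: { char name[8]; struct list_head list; } */
enum { LIST_OFF = 8, NEXT_OFF = 0 };

/*
 * Walk a circular list anchored at `head` and store the base address of
 * each enclosing record. Returns the number of records found (at most max).
 */
static int walk_modules(uint64_t head, uint64_t *bases, int max)
{
	int n = 0;
	uint64_t node;

	for (node = read_u64(head + NEXT_OFF); node != head && n < max;
	     node = read_u64(node + NEXT_OFF))
		bases[n++] = node - LIST_OFF;	/* container_of() by hand */
	return n;
}
```

The subtraction of LIST_OFF plays the role of
"list - MEMBER_OFF(module, list)" in the patch: the list node address is
turned back into the address of the enclosing structure.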

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 kallsyms.c | 158 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 kallsyms.h |   3 +
 2 files changed, 161 insertions(+)

diff --git a/kallsyms.c b/kallsyms.c
index a198de6..f231a06 100644
--- a/kallsyms.c
+++ b/kallsyms.c
@@ -6,6 +6,7 @@
 #include <string.h>
 #include "makedumpfile.h"
 #include "kallsyms.h"
+#include "btf_info.h"
 
 static uint32_t *kallsyms_offsets = NULL;
 static uint16_t *kallsyms_token_index = NULL;
@@ -385,6 +386,163 @@ out:
 	}
 	return ret;
 }
+
+INIT_MOD_SYM(vmlinux, modules);
+
+INIT_MOD_STRUCT_MEMBER(vmlinux, list_head, next);
+INIT_MOD_STRUCT_MEMBER(vmlinux, module, list);
+INIT_MOD_STRUCT_MEMBER(vmlinux, module, name);
+INIT_MOD_STRUCT_MEMBER(vmlinux, module, core_kallsyms);
+INIT_MOD_STRUCT_MEMBER(vmlinux, mod_kallsyms, symtab);
+INIT_MOD_STRUCT_MEMBER(vmlinux, mod_kallsyms, num_symtab);
+INIT_MOD_STRUCT_MEMBER(vmlinux, mod_kallsyms, strtab);
+INIT_MOD_STRUCT_MEMBER(vmlinux, elf64_sym, st_name);
+INIT_MOD_STRUCT_MEMBER(vmlinux, elf64_sym, st_value);
+
+#define MEMBER_OFF(S, M) \
+	(GET_MOD_STRUCT_MEMBER_MOFF(vmlinux, S, M) / 8)
+#define GET_KERN_STRUCT_MEMBER_MSIZE(S, M) \
+	GET_MOD_STRUCT_MEMBER_MSIZE(vmlinux, S, M)
+#define KERN_STRUCT_MEMBER_EXIST(S, M) \
+	MOD_STRUCT_MEMBER_EXIST(vmlinux, S, M)
+#define GET_KERN_STRUCT_MEMBER_SSIZE(S, M) \
+	GET_MOD_STRUCT_MEMBER_SSIZE(vmlinux, S, M)
+#define GET_KERN_SYM(SYM) GET_MOD_SYM(vmlinux, SYM)
+#define KERN_SYM_EXIST(SYM) MOD_SYM_EXIST(vmlinux, SYM)
+
+uint64_t next_list(uint64_t list)
+{
+	uint64_t next = 0;
+
+	if (!readmem(VADDR, list + MEMBER_OFF(list_head, next),
+		&next, GET_KERN_STRUCT_MEMBER_MSIZE(list_head, next))) {
+		ERRMSG("Can't get next list!\n");
+	}
+	return next;
+}
+
+bool init_module_kallsyms(void)
+{
+	uint64_t modules, list, value = 0, symtab = 0, strtab = 0;
+	uint32_t st_name = 0;
+	int num_symtab, i, j;
+	struct ksym_info **p;
+	char symname[512], ch;
+	char *modname = NULL;
+	bool ret = false;
+
+	modules = GET_KERN_SYM(modules);
+	if (!KERN_SYM_EXIST(modules)) {
+		/* Not a failure if no module enabled */
+		ret = true;
+		goto out;
+	}
+
+	if (!KERN_STRUCT_MEMBER_EXIST(list_head, next) ||
+	    !KERN_STRUCT_MEMBER_EXIST(module, list) ||
+	    !KERN_STRUCT_MEMBER_EXIST(module, name) ||
+	    !KERN_STRUCT_MEMBER_EXIST(module, core_kallsyms) ||
+	    !KERN_STRUCT_MEMBER_EXIST(mod_kallsyms, symtab) ||
+	    !KERN_STRUCT_MEMBER_EXIST(mod_kallsyms, num_symtab) ||
+	    !KERN_STRUCT_MEMBER_EXIST(mod_kallsyms, strtab) ||
+	    !KERN_STRUCT_MEMBER_EXIST(elf64_sym, st_name) ||
+	    !KERN_STRUCT_MEMBER_EXIST(elf64_sym, st_value)) {
+		/* Fail when module enabled but any required types not found */
+		ERRMSG("Missing required module syms/types!\n");
+		goto out;
+	}
+
+	modname = (char *)malloc(GET_KERN_STRUCT_MEMBER_MSIZE(module, name));
+	if (!modname)
+		goto no_mem;
+
+	for (list = next_list(modules); list != modules; list = next_list(list)) {
+		if (!readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, name),
+			modname, GET_KERN_STRUCT_MEMBER_MSIZE(module, name))) {
+			ERRMSG("Can't get module modname!\n");
+			goto out;
+		}
+		if (!check_ksyms_require_modname(modname, NULL))
+			continue;
+		if (!readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, num_symtab),
+			&num_symtab, GET_KERN_STRUCT_MEMBER_MSIZE(mod_kallsyms, num_symtab))) {
+			ERRMSG("Can't get module num_symtab!\n");
+			goto out;
+		}
+		if (!readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, symtab),
+			&symtab, GET_KERN_STRUCT_MEMBER_MSIZE(mod_kallsyms, symtab))) {
+			ERRMSG("Can't get module symtab!\n");
+			goto out;
+		}
+		if (!readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, strtab),
+			&strtab, GET_KERN_STRUCT_MEMBER_MSIZE(mod_kallsyms, strtab))) {
+			ERRMSG("Can't get module strtab!\n");
+			goto out;
+		}
+		for (i = 0; i < num_symtab; i++) {
+			j = 0;
+			if (!readmem(VADDR, symtab + i * GET_KERN_STRUCT_MEMBER_SSIZE(elf64_sym, st_value) +
+					MEMBER_OFF(elf64_sym, st_value),
+				&value, GET_KERN_STRUCT_MEMBER_MSIZE(elf64_sym, st_value))) {
+				ERRMSG("Can't get module st_value!\n");
+				goto out;
+			}
+			if (!readmem(VADDR, symtab + i * GET_KERN_STRUCT_MEMBER_SSIZE(elf64_sym, st_name) +
+					MEMBER_OFF(elf64_sym, st_name),
+				&st_name, GET_KERN_STRUCT_MEMBER_MSIZE(elf64_sym, st_name))) {
+				ERRMSG("Can't get module st_name!\n");
+				goto out;
+			}
+			do {
+				if (!readmem(VADDR, strtab + st_name + j++, &ch, 1)) {
+					ERRMSG("Can't get module symname's next char!\n");
+					goto out;
+				}
+			} while (ch != '\0');
+			if (j == 1 || j > sizeof(symname))
+				/* Skip empty or too long string */
+				continue;
+			if (!readmem(VADDR, strtab + st_name, symname, j)) {
+				ERRMSG("Can't get module symname!\n");
+				goto out;
+			}
+
+			for (j = 0; j < sr_len; j++) {
+				for (p = (struct ksym_info **)(sr[j]->start);
+				     p < (struct ksym_info **)(sr[j]->stop);
+				     p++) {
+					if (!strcmp((*p)->modname, modname) &&
+					    !strcmp((*p)->symname, symname)) {
+						(*p)->value = value;
+						(*p)->index = i;
+					}
+				}
+			}
+		}
+	}
+	ret = true;
+	goto out;
+no_mem:
+	ERRMSG("Not enough memory!\n");
+out:
+	if (modname)
+		free(modname);
+	return ret;
+}
+
+void cleanup_kallsyms(void)
+{
+	cleanup_ksyms_section_range();
+	cleanup_ksyms_modname();
+}
+
 #else /* EXTENSION */
 
 bool read_vmcoreinfo_kallsyms(void)
diff --git a/kallsyms.h b/kallsyms.h
index 73e9839..155fa55 100644
--- a/kallsyms.h
+++ b/kallsyms.h
@@ -80,5 +80,8 @@ bool check_ksyms_require_modname(char *modname, int *total);
 bool register_ksym_section(char *start, char *stop);
 bool read_vmcoreinfo_kallsyms(void);
 bool init_kernel_kallsyms(void);
+uint64_t next_list(uint64_t list);
+bool init_module_kallsyms(void);
+void cleanup_kallsyms(void);
 #endif /* _KALLSYMS_H */
 
-- 
2.47.0




* [PATCH v5][makedumpfile 5/9] Implement kernel module's btf resolving
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (3 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 4/9] Implement kernel module's kallsyms resolving Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 6/9] Add makedumpfile extensions support Tao Liu
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

As in the previous patch, with the kernel's kallsyms and btf ready, we
can locate and iterate all kernel modules' btf data. The module types
declared in the .init_ktypes section are then resolved.
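
As a rough sketch of why module types are searched only in the module's
own id range: a split btf continues the type-id numbering of its base,
so the base owns ids 1..N-1 and the module's first own id is N. The
struct demo_btf type and both helpers below are hypothetical and only
model that numbering; they are not libbpf API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified view of one btf object: only its own type names. */
struct demo_btf {
	const char **names;	/* names of the types this object itself defines */
	int nr_own;		/* how many types it defines */
	int start_id;		/* first id it owns: 1 for the base, base count for a split */
};

/* Total id count seen through this object (valid ids are 0 .. count - 1). */
static int demo_type_cnt(const struct demo_btf *b)
{
	return b->start_id + b->nr_own;
}

/* Return the id of `name` among the object's own types, or -1 if absent. */
static int demo_find_own(const struct demo_btf *b, const char *name)
{
	int id;

	assert(b && name);
	for (id = b->start_id; id < demo_type_cnt(b); id++) {
		if (strcmp(b->names[id - b->start_id], name) == 0)
			return id;
	}
	return -1;
}
```

A module object would be set up with start_id equal to
demo_type_cnt(&base), which mirrors the start_id selection in
get_ktype_info().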

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 btf_info.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 btf_info.h |   2 +
 2 files changed, 135 insertions(+)

diff --git a/btf_info.c b/btf_info.c
index 7243674..1506152 100644
--- a/btf_info.c
+++ b/btf_info.c
@@ -238,5 +238,138 @@ out:
 		free(buf);
 	return ret;
 }
+
+INIT_MOD_SYM(vmlinux, btf_modules);
+
+INIT_MOD_STRUCT_MEMBER(vmlinux, btf_module, list);
+INIT_MOD_STRUCT_MEMBER(vmlinux, btf_module, btf);
+INIT_MOD_STRUCT_MEMBER(vmlinux, btf_module, module);
+DECLARE_MOD_STRUCT_MEMBER(vmlinux, module, name);
+INIT_MOD_STRUCT_MEMBER(vmlinux, btf, data);
+INIT_MOD_STRUCT_MEMBER(vmlinux, btf, data_size);
+
+#define KERN_STRUCT_MEMBER_EXIST(S, M) MOD_STRUCT_MEMBER_EXIST(vmlinux, S, M)
+#define MEMBER_OFF(S, M) (GET_MOD_STRUCT_MEMBER_MOFF(vmlinux, S, M) / 8)
+#define GET_KERN_STRUCT_MEMBER_MSIZE(S, M) GET_MOD_STRUCT_MEMBER_MSIZE(vmlinux, S, M)
+#define GET_KERN_SYM(SYM) GET_MOD_SYM(vmlinux, SYM)
+
+bool init_module_btf(void)
+{
+	struct btf *btf_mod;
+	uint64_t btf_modules, list;
+	uint64_t btf = 0, data = 0, module = 0;
+	int data_size = 0;
+	bool ret = false;
+	char *btf_buf = NULL;
+	char *modname = NULL;
+	struct ktype_info **p;
+
+	btf_modules = GET_KERN_SYM(btf_modules);
+	if (!KERN_SYM_EXIST(btf_modules))
+		/* Maybe module is not enabled, this is not an error */
+		return true;
+
+	if (!KERN_STRUCT_MEMBER_EXIST(btf_module, list) ||
+	    !KERN_STRUCT_MEMBER_EXIST(btf_module, btf) ||
+	    !KERN_STRUCT_MEMBER_EXIST(btf_module, module) ||
+	    !KERN_STRUCT_MEMBER_EXIST(btf, data) ||
+	    !KERN_STRUCT_MEMBER_EXIST(btf, data_size)) {
+		/* Fail when module enabled but any required types not found */
+		ERRMSG("Missing required btf syms/types!\n");
+		goto out;
+	}
+
+	modname = (char *)malloc(GET_KERN_STRUCT_MEMBER_MSIZE(module, name));
+	if (!modname)
+		goto no_mem;
+
+	for (list = next_list(btf_modules); list != btf_modules; list = next_list(list)) {
+		if (!readmem(VADDR, list - MEMBER_OFF(btf_module, list) +
+				MEMBER_OFF(btf_module, btf),
+			&btf, GET_KERN_STRUCT_MEMBER_MSIZE(btf_module, btf))) {
+			ERRMSG("Can't get btf_module member btf!\n");
+			goto out;
+		}
+		if (!readmem(VADDR, list - MEMBER_OFF(btf_module, list) +
+				MEMBER_OFF(btf_module, module),
+			&module, GET_KERN_STRUCT_MEMBER_MSIZE(btf_module, module))) {
+			ERRMSG("Can't get btf_module member module!\n");
+			goto out;
+		}
+		if (!readmem(VADDR, module + MEMBER_OFF(module, name),
+			modname, GET_KERN_STRUCT_MEMBER_MSIZE(module, name))) {
+			ERRMSG("Can't get module modname!\n");
+			goto out;
+		}
+		if (!check_ktypes_require_modname(modname, NULL)) {
+			continue;
+		}
+		if (!readmem(VADDR, btf + MEMBER_OFF(btf, data),
+			&data, GET_KERN_STRUCT_MEMBER_MSIZE(btf, data))) {
+			ERRMSG("Can't get module btf address!\n");
+			goto out;
+		}
+		if (!readmem(VADDR, btf + MEMBER_OFF(btf, data_size),
+			&data_size, GET_KERN_STRUCT_MEMBER_MSIZE(btf, data_size))) {
+			ERRMSG("Can't get module btf data size!\n");
+			goto out;
+		}
+		btf_buf = (char *)malloc(data_size);
+		if (!btf_buf)
+			goto no_mem;
+		if (!readmem(VADDR, data, btf_buf, data_size)) {
+			ERRMSG("Can't get module btf data!\n");
+			goto out;
+		}
+		btf_mod = btf__new_split(btf_buf, data_size, btf_arr[0]->btf);
+		free(btf_buf);
+		if (libbpf_get_error(btf_mod) != 0 ||
+		    add_to_btf_arr(btf_mod, strdup(modname)) == false) {
+			ERRMSG("init %s btf fail\n", modname);
+			goto out;
+		}
+	}
+
+	/* All needed modules' btf is loaded; now resolve the types */
+	for (int i = 0; i < sr_len; i++) {
+		for (p = (struct ktype_info **)(sr[i]->start);
+		     p < (struct ktype_info **)(sr[i]->stop);
+		     p++)
+			get_ktype_info(*p, NULL);
+	}
+
+	ret = true;
+	goto out;
+
+no_mem:
+	ERRMSG("Not enough memory!\n");
+out:
+	if (modname)
+		free(modname);
+	return ret;
+}
+
+static void cleanup_btf_arr(void)
+{
+	for (int i = 0; i < btf_arr_len; i++) {
+		free(btf_arr[i]->module);
+		btf__free(btf_arr[i]->btf);
+		free(btf_arr[i]);
+	}
+	if (btf_arr) {
+		free(btf_arr);
+		btf_arr = NULL;
+	}
+	btf_arr_len = 0;
+	btf_arr_cap = 0;
+}
+
+void cleanup_btf(void)
+{
+	cleanup_btf_arr();
+	cleanup_ktypes_section_range();
+	cleanup_ktypes_modname();
+}
+
 #endif /* EXTENSION */
 
diff --git a/btf_info.h b/btf_info.h
index 8eace67..a6c3bba 100644
--- a/btf_info.h
+++ b/btf_info.h
@@ -22,6 +22,8 @@ struct ktype_info {
 bool check_ktypes_require_modname(char *modname, int *total);
 bool register_ktype_section(char *start, char *stop);
 bool init_kernel_btf(void);
+bool init_module_btf(void);
+void cleanup_btf(void);
 
 #define _GEN_NAME_PTR_IMPL(PTR, NAME)		PTR##NAME
 #define _GEN_NAME_PTR(PTR, NAME)		_GEN_NAME_PTR_IMPL(PTR, NAME)
-- 
2.47.0




* [PATCH v5][makedumpfile 6/9] Add makedumpfile extensions support
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (4 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 5/9] Implement kernel module's btf resolving Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 7/9] Add sample extension as an example reference Tao Liu
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

Extensions are specified with the makedumpfile command-line option
"--extension", followed by the extension's filename or absolute path. If
only a filename is given, "./", "./extensions/" and
"/usr/lib64/makedumpfile/extensions/" are searched in that order.
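
A minimal sketch of that lookup rule, assuming the same three
directories. The helper names has_so_suffix() and candidate_path() are
invented for this example; the patch does the equivalent inline in
load_extensions() and then tries dlopen() on each candidate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *search_dirs[] = {
	"./",
	"./extensions/",
	"/usr/lib64/makedumpfile/extensions/",
};
#define NR_DIRS (sizeof(search_dirs) / sizeof(search_dirs[0]))

/* True if `name` is longer than ".so" and ends with it. */
static int has_so_suffix(const char *name)
{
	size_t len = strlen(name);

	return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/*
 * Write the idx-th candidate path for `name` into out.
 * An absolute path has exactly one candidate (idx 0); a bare filename has
 * one candidate per search directory. Returns 1 on success, 0 if there is
 * no such candidate or it does not fit into out.
 */
static int candidate_path(const char *name, size_t idx, char *out, size_t outsz)
{
	int n;

	assert(name && out && outsz > 0);
	if (!has_so_suffix(name))
		return 0;
	if (name[0] == '/') {
		if (idx != 0)
			return 0;
		n = snprintf(out, outsz, "%s", name);
	} else {
		if (idx >= NR_DIRS)
			return 0;
		n = snprintf(out, outsz, "%s%s", search_dirs[idx], name);
	}
	return n >= 0 && (size_t)n < outsz;
}
```

Unlike the patch, this sketch reports a path that would be truncated
instead of silently using it.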

Extensions are processed as follows:

Step 0: Each extension declares, at programming time, which kernel
symbols/types it needs. This information is stored in its
.init_ksyms/.init_ktypes sections. Each extension also provides a
callback function for makedumpfile to call.

Step 1: Register the .init_ksyms and .init_ktypes sections of makedumpfile
itself and of the extension .so files, which tells the kallsyms/btf
subcomponents which kernel symbols/types need to be resolved. The callbacks
are registered as well.
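
The section-based declaration in Steps 0 and 1 can be illustrated with a
small stand-alone sketch. Everything named demo_* here is hypothetical.
The sketch also assumes GCC or Clang with GNU ld (or a compatible
linker) and uses a section name without a leading dot, "demo_reqs", so
that the linker provides __start_demo_reqs/__stop_demo_reqs by itself;
the patch uses ".init_ksyms"/".init_ktypes" together with a linker
script instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical request descriptor, loosely modelled on struct ktype_info. */
struct demo_req {
	const char *modname;
	const char *name;
	int required;
	int index;	/* -1 until resolved */
};

/* Define a descriptor and drop a pointer to it into section "demo_reqs". */
#define DEMO_REQUIRE(MOD, NAME, RQD)					\
	static struct demo_req demo_##MOD##_##NAME = {			\
		#MOD, #NAME, RQD, -1					\
	};								\
	static struct demo_req *demo_ptr_##MOD##_##NAME			\
	__attribute__((section("demo_reqs"), used)) = &demo_##MOD##_##NAME

DEMO_REQUIRE(vmlinux, modules, 1);
DEMO_REQUIRE(gpu, gpu_pages, 0);

/* Section bounds, generated by the linker for identifier-like section names. */
extern struct demo_req *__start_demo_reqs[];
extern struct demo_req *__stop_demo_reqs[];

/* Count how many registered descriptors ask for something in `modname`. */
static int count_reqs_for(const char *modname)
{
	struct demo_req **p;
	int n = 0;

	assert(modname);
	for (p = __start_demo_reqs; p < __stop_demo_reqs; p++) {
		if (strcmp((*p)->modname, modname) == 0)
			n++;
	}
	return n;
}
```

Walking the pointer array between the two bounds is the same pattern
the patch applies to each registered section range (sr[i]->start to
sr[i]->stop).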

Step 2: Initialize the kernel's and modules' btf/kallsyms on demand. Any
kernel modules that are not needed are skipped.

Step 3: During btf/kallsyms parsing, the needed info is filled in.
Syms/types defined via the INIT_OPT_MOD_*(...) macros (e.g.
INIT_OPT_MOD_STRUCT_MEMBER) are optional: parsing does not fail if any
are missing, and each extension must check them itself in its
extension_init(). Syms/types defined via the INIT_MOD_*(...) macros are
must-haves: if any are missing, the extension fails at this step and is
skipped.

After this step, the required kernel symbol values and kernel type
sizes/offsets are resolved, and the extensions are ready to go.
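
The required/optional rule of Step 3 boils down to a simple scan. The
following is a simplified sketch with a hypothetical struct demo_type
carrying a single "required" flag; the patch keeps separate
struct_required/member_required bits in struct ktype_info and reports
each miss via ERRMSG().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced descriptor for one requested struct or member. */
struct demo_type {
	const char *struct_name;
	const char *member_name;	/* NULL when only the struct is wanted */
	int required;			/* 1: must resolve, 0: optional */
	int index;			/* -1 if not found */
};

/* Return 1 if every required entry resolved; optional misses are ignored. */
static int all_required_resolved(const struct demo_type *t, size_t n)
{
	size_t i;

	assert(t || n == 0);
	for (i = 0; i < n; i++) {
		if (t[i].required && t[i].index < 0)
			return 0;
	}
	return 1;
}
```

An extension whose table fails this check would be skipped, while an
extension with only optional misses still runs and has to test those
entries itself.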

Step 4: When makedumpfile does page filtering, in addition to its
original filtering mechanism it calls the extensions' callbacks for
advice on whether a page should be included or excluded.
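
How several callbacks are combined for one page can be sketched as
follows. The enum values match extension.h in this patch; arbitrate()
and the three cb_* callbacks are made up for the example. The rule is
the one documented above run_extension_callback(): any PG_INCLUDE wins,
otherwise any PG_EXCLUDE wins, otherwise the result stays undecided.

```c
#include <assert.h>
#include <stddef.h>

enum { PG_INCLUDE, PG_EXCLUDE, PG_UNDECID };

typedef int (*demo_cb)(unsigned long pfn, const void *pcache);

/* Hypothetical callbacks, only here to exercise the arbitration. */
static int cb_undecided(unsigned long pfn, const void *pcache)
{
	(void)pfn; (void)pcache;
	return PG_UNDECID;
}

static int cb_exclude_odd(unsigned long pfn, const void *pcache)
{
	(void)pcache;
	return (pfn & 1) ? PG_EXCLUDE : PG_UNDECID;
}

static int cb_include_42(unsigned long pfn, const void *pcache)
{
	(void)pcache;
	return pfn == 42 ? PG_INCLUDE : PG_UNDECID;
}

/* Combine the verdicts of n callbacks for one page. */
static int arbitrate(demo_cb *cbs, int n, unsigned long pfn, const void *pcache)
{
	int i, result, ret = PG_UNDECID;

	assert(cbs || n == 0);
	for (i = 0; i < n; i++) {
		if (!cbs[i])
			continue;
		result = cbs[i](pfn, pcache);
		if (result == PG_INCLUDE)
			return PG_INCLUDE;	/* one include vote is final */
		if (result == PG_EXCLUDE)
			ret = PG_EXCLUDE;	/* remember, but keep asking */
	}
	return ret;
}
```

A PG_UNDECID result leaves the decision to the existing page-flag
based filtering.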

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 Makefile            |  13 +-
 extension.c         | 338 ++++++++++++++++++++++++++++++++++++++++++++
 extension.h         |  16 +++
 extensions/Makefile |  11 ++
 makedumpfile.c      |  41 +++++-
 makedumpfile.h      |   1 +
 6 files changed, 413 insertions(+), 7 deletions(-)
 create mode 100644 extension.c
 create mode 100644 extension.h
 create mode 100644 extensions/Makefile

diff --git a/Makefile b/Makefile
index 690ef3e..061e8e1 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
@@ -73,6 +73,11 @@ LIBS := -lzstd $(LIBS)
 CFLAGS += -DUSEZSTD
 endif
 
+ifeq ($(EXTENSION), on)
+LIBS := -lbpf $(LIBS)
+CFLAGS += -DEXTENSION
+endif
+
 ifeq ($(DEBUG), on)
 # Requires libasan
 CFLAGS += -fsanitize=address
@@ -126,6 +131,7 @@ eppic_makedumpfile.so: extension_eppic.c
 
 clean:
 	rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
+	$(MAKE) -C extensions clean
 
 install:
 	install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
@@ -135,3 +141,8 @@ install:
 	mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
 	install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
 	install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
+
+.PHONY: extensions
+extensions:
+	$(MAKE) -C extensions CC=$(CC)
+
diff --git a/extension.c b/extension.c
new file mode 100644
index 0000000..017c980
--- /dev/null
+++ b/extension.c
@@ -0,0 +1,338 @@
+#include <stdio.h>
+#include "extension.h"
+#include "makedumpfile.h"
+#ifdef EXTENSION
+#include <string.h>
+#include <dirent.h>
+#include <dlfcn.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include "kallsyms.h"
+#include "btf_info.h"
+
+typedef int (*callback_fn)(unsigned long, const void *);
+
+struct extension_handle_cb {
+	void *handle;
+	callback_fn cb;
+};
+
+/* Extension .so extension_handle_cb array */
+static struct extension_handle_cb **handle_cbs = NULL;
+static int handle_cbs_len = 0;
+static int handle_cbs_cap = 0;
+
+/* Extension option array */
+static char **extension_opts = NULL;
+static int extension_opts_len = 0;
+static int extension_opts_cap = 0;
+
+static const char *dirs[] = {
+	"./",
+	"./extensions/",
+	"/usr/lib64/makedumpfile/extensions/",
+};
+
+bool add_extension_opts(char *opt)
+{
+	if (!add_to_arr((void ***)&extension_opts, &extension_opts_len,
+			&extension_opts_cap, opt)) {
+		/*
+		 * If fail, print error info and skip the extension.
+		*/
+		ERRMSG("Fail to add extension %s\n", opt);
+		return false;
+	} else {
+		return true;
+	}
+}
+
+static bool init_kallsyms_btf(void)
+{
+	int count;
+	bool ret = false;
+	/* We will load module's btf/kallsyms on demand */
+	bool init_ksyms_module = false;
+	bool init_ktypes_module = false;
+
+	if (check_ksyms_require_modname("vmlinux", &count)) {
+		if (!init_kernel_kallsyms())
+			goto out;
+		if (count >= 2)
+			init_ksyms_module = true;
+	}
+	if (check_ktypes_require_modname("vmlinux", &count)) {
+		if (!init_kernel_btf())
+			goto out;
+		if (count >= 2)
+			init_ktypes_module = true;
+	}
+	if (init_ksyms_module && !init_module_kallsyms())
+		goto out;
+	if (init_ktypes_module && !init_module_btf())
+		goto out;
+	ret = true;
+out:
+	return ret;
+}
+
+static void cleanup_kallsyms_btf(void)
+{
+	cleanup_kallsyms();
+	cleanup_btf();
+}
+
+static void load_extensions(void)
+{
+	char path[512];
+	int len, i, j;
+	void *handle;
+	struct extension_handle_cb *ehc;
+
+	for (i = 0; i < extension_opts_len; i++) {
+		handle = NULL;
+		if (!extension_opts[i])
+			continue;
+		if ((len = strlen(extension_opts[i])) <= 3 ||
+		    (strcmp(extension_opts[i] + len - 3, ".so") != 0)) {
+			ERRMSG("Skip invalid extension: %s\n", extension_opts[i]);
+			continue;
+		}
+
+		if (extension_opts[i][0] == '/') {
+			/* Path & filename */
+			snprintf(path, sizeof(path), "%s", extension_opts[i]);
+			handle = dlopen(path, RTLD_NOW);
+			if (!handle) {
+				ERRMSG("Failed to load %s\n", dlerror());
+				continue;
+			}
+		} else {
+			/* Only filename */
+			for (j = 0; j < sizeof(dirs) / sizeof(char *); j++) {
+				snprintf(path, sizeof(path), "%s", dirs[j]);
+				len = strlen(path);
+				snprintf(path + len, sizeof(path) - len, "%s",
+					extension_opts[i]);
+				if (access(path, F_OK) == 0) {
+					handle = dlopen(path, RTLD_NOW);
+					if (handle)
+						break;
+					else
+						ERRMSG("Failed to load %s\n", dlerror());
+				}
+			}
+			if (!handle && j >= sizeof(dirs) / sizeof(char *)) {
+				ERRMSG("Not found %s\n", extension_opts[i]);
+				continue;
+			}
+		}
+
+		if (dlsym(handle, "extension_init") == NULL) {
+			ERRMSG("Skip extension %s: No extension_init()\n", path);
+			dlclose(handle);
+			continue;
+		}
+
+		if ((ehc = malloc(sizeof(struct extension_handle_cb))) == NULL) {
+			ERRMSG("Skip extension %s: No memory\n", path);
+			dlclose(handle);
+			continue;
+		}
+
+		ehc->handle = handle;
+		ehc->cb = dlsym(handle, "extension_callback");
+
+		if (!add_to_arr((void ***)&handle_cbs, &handle_cbs_len, &handle_cbs_cap, ehc)) {
+			ERRMSG("Failed to load %s\n", extension_opts[i]);
+			free(ehc);
+			dlclose(handle);
+			continue;
+		}
+		MSG("Loaded extension: %s\n", path);
+	}
+}
+
+static bool register_extension_sections(void)
+{
+	char *start, *stop;
+	int i;
+	bool ret = false;
+
+	for (i = 0; i < handle_cbs_len; i++) {
+		start = dlsym(handle_cbs[i]->handle, "__start_init_ksyms");
+		stop = dlsym(handle_cbs[i]->handle, "__stop_init_ksyms");
+		if (!register_ksym_section(start, stop))
+			goto out;
+
+		start = dlsym(handle_cbs[i]->handle, "__start_init_ktypes");
+		stop = dlsym(handle_cbs[i]->handle, "__stop_init_ktypes");
+		if (!register_ktype_section(start, stop))
+			goto out;
+	}
+	ret = true;
+out:
+	return ret;
+}
+
+void cleanup_extensions(void)
+{
+	for (int i = 0; i < handle_cbs_len; i++) {
+		dlclose(handle_cbs[i]->handle);
+		free(handle_cbs[i]);
+	}
+	if (handle_cbs) {
+		free(handle_cbs);
+		handle_cbs = NULL;
+	}
+	handle_cbs_len = 0;
+	handle_cbs_cap = 0;
+	if (extension_opts) {
+		free(extension_opts);
+		extension_opts = NULL;
+	}
+	extension_opts_len = 0;
+	extension_opts_cap = 0;
+
+	cleanup_kallsyms_btf();
+}
+
+static bool check_required_ksyms_all_resolved(void *handle)
+{
+	char *start, *stop;
+	struct ksym_info **p;
+	bool ret = true;
+
+	start = dlsym(handle, "__start_init_ksyms");
+	stop = dlsym(handle, "__stop_init_ksyms");
+
+	for (p = (struct ksym_info **)start;
+	     p < (struct ksym_info **)stop;
+	     p++) {
+		if ((*p)->sym_required && !SYM_EXIST(*p)) {
+			ret = false;
+			ERRMSG("Symbol %s in %s not found\n",
+				(*p)->symname, (*p)->modname);
+		}
+	}
+
+	return ret;
+}
+
+static bool check_required_ktypes_all_resolved(void *handle)
+{
+	char *start, *stop;
+	struct ktype_info **p;
+	bool ret = true;
+
+	start = dlsym(handle, "__start_init_ktypes");
+	stop = dlsym(handle, "__stop_init_ktypes");
+
+	for (p = (struct ktype_info **)start;
+	     p < (struct ktype_info **)stop;
+	     p++) {
+		if (!TYPE_EXIST(*p)) {
+			if ((*p)->member_required) {
+				ret = false;
+				ERRMSG("Member %s of struct %s in %s not found\n",
+					(*p)->member_name, (*p)->struct_name,
+					(*p)->modname);
+			} else if ((*p)->struct_required) {
+				ret = false;
+				ERRMSG("Struct %s in %s not found\n",
+					(*p)->struct_name, (*p)->modname);
+			}
+		}
+	}
+
+	return ret;
+}
+
+static bool extension_runnable(void *handle)
+{
+	return check_required_ksyms_all_resolved(handle) &&
+		check_required_ktypes_all_resolved(handle);
+}
+
+void init_extensions(void)
+{
+	/* Entry of extension init */
+	void (*init)(void);
+
+	load_extensions();
+	if (!register_extension_sections())
+		goto fail;
+	if (!init_kallsyms_btf())
+		goto fail;
+	for (int i = 0; i < handle_cbs_len; i++) {
+		if (extension_runnable(handle_cbs[i]->handle)) {
+			init = dlsym(handle_cbs[i]->handle, "extension_init");
+			init();
+		} else {
+			ERRMSG("Skip %dth extension\n", i + 1);
+		}
+	}
+	return;
+fail:
+	ERRMSG("fail & skip all extensions\n");
+	cleanup_extensions();
+}
+
+/*
+ * For a single pfn/pcache, multiple extensions will decide whether to:
+ * 1) include the page (PG_INCLUDE), or
+ * 2) exclude the page (PG_EXCLUDE), or
+ * 3) make no decision to pass to others or fallback to traditional page-flags
+ *    based filtering (PG_UNDECID).
+ *
+ * The arbitration is:
+ * 1) Include the page if anyone says PG_INCLUDE, and
+ * 2) Exclude the page if no one says PG_INCLUDE, but one or more say PG_EXCLUDE.
+ */
+int run_extension_callback(unsigned long pfn, const void *pcache)
+{
+	int result;
+	int ret = PG_UNDECID;
+
+	for (int i = 0; i < handle_cbs_len; i++) {
+		if (handle_cbs[i]->cb) {
+			result = handle_cbs[i]->cb(pfn, pcache);
+			if (result == PG_INCLUDE) {
+				ret = result;
+				goto out;
+			} else if (result == PG_EXCLUDE) {
+				ret = result;
+			}
+		}
+	}
+out:
+	return ret;
+}
+
+bool has_extension_loaded(void)
+{
+	return extension_opts_len > 0;
+}
+
+#else /* EXTENSION */
+
+void init_extensions(void) { }
+void cleanup_extensions(void) { }
+bool add_extension_opts(char *opt)
+{
+	ERRMSG("extension unsupported. Try `make EXTENSION=on` when building\n");
+	return false;
+}
+
+int run_extension_callback(unsigned long pfn, const void *pcache)
+{
+	return PG_UNDECID;
+}
+
+bool has_extension_loaded(void)
+{
+	return false;
+}
+
+#endif /* EXTENSION */
+
diff --git a/extension.h b/extension.h
new file mode 100644
index 0000000..972fcc8
--- /dev/null
+++ b/extension.h
@@ -0,0 +1,16 @@
+#ifndef _EXTENSION_H
+#define _EXTENSION_H
+#include <stdbool.h>
+
+enum {
+	PG_INCLUDE,	// Extension will keep the page
+	PG_EXCLUDE,	// Extension will discard the page
+	PG_UNDECID,	// Extension makes no decision
+};
+int run_extension_callback(unsigned long pfn, const void *pcache);
+void init_extensions(void);
+void cleanup_extensions(void);
+bool add_extension_opts(char *opt);
+bool has_extension_loaded(void);
+#endif /* _EXTENSION_H */
+
diff --git a/extensions/Makefile b/extensions/Makefile
new file mode 100644
index 0000000..b23f346
--- /dev/null
+++ b/extensions/Makefile
@@ -0,0 +1,11 @@
+CC ?= gcc
+CONTRIB_SO :=
+
+all: $(CONTRIB_SO)
+
+$(CONTRIB_SO): %.so: %.c
+	$(CC) -O2 -g -fPIC -shared -Wl,-T,../makedumpfile.ld -o $@ $^
+
+clean:
+	rm -f $(CONTRIB_SO)
+
diff --git a/makedumpfile.c b/makedumpfile.c
index dba3628..70badbc 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -28,6 +28,7 @@
 #include <assert.h>
 #include <zlib.h>
 #include "kallsyms.h"
+#include "extension.h"
 
 struct symbol_table	symbol_table;
 struct size_table	size_table;
@@ -102,6 +103,7 @@ mdf_pfn_t pfn_free;
 mdf_pfn_t pfn_hwpoison;
 mdf_pfn_t pfn_offline;
 mdf_pfn_t pfn_elf_excluded;
+mdf_pfn_t pfn_extension;
 
 mdf_pfn_t num_dumped;
 
@@ -6459,6 +6461,7 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 	unsigned int order_offset, dtor_offset;
 	unsigned long flags, mapping, private = 0;
 	unsigned long compound_dtor, compound_head = 0;
+	int filter_pg;
 
 	/*
 	 * If a multi-page exclusion is pending, do it first
@@ -6531,6 +6534,14 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 			pfn_read_end   = pfn + pfn_mm - 1;
 		}
 
+		/*
+		 * Include pages that are specified by the user via
+		 * makedumpfile extensions
+		 */
+		filter_pg = run_extension_callback(pfn, pcache);
+		if (filter_pg == PG_INCLUDE)
+			continue;
+
 		flags   = ULONG(pcache + OFFSET(page.flags));
 		_count  = UINT(pcache + OFFSET(page._refcount));
 		mapping = ULONG(pcache + OFFSET(page.mapping));
@@ -6687,6 +6698,14 @@ check_order:
 		else if (isOffline(flags, _mapcount)) {
 			pfn_counter = &pfn_offline;
 		}
+		/*
+		 * Exclude pages that are specified by the user via
+		 * makedumpfile extensions
+		 */
+		else if (filter_pg == PG_EXCLUDE) {
+			nr_pages = 1;
+			pfn_counter = &pfn_extension;
+		}
 		/*
 		 * Unexcludable page
 		 */
@@ -7173,13 +7192,14 @@ create_2nd_bitmap(struct cycle *cycle)
 
 	/*
 	 * Exclude cache pages, cache private pages, user data pages,
-	 * and hwpoison pages.
+	 * hwpoison pages and extension specified pages.
 	 */
 	if (info->dump_level & DL_EXCLUDE_CACHE ||
 	    info->dump_level & DL_EXCLUDE_CACHE_PRI ||
 	    info->dump_level & DL_EXCLUDE_USER_DATA ||
 	    NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER ||
-	    ((info->dump_level & DL_EXCLUDE_FREE) && info->page_is_buddy)) {
+	    ((info->dump_level & DL_EXCLUDE_FREE) && info->page_is_buddy) ||
+	    has_extension_loaded()) {
 		if (!exclude_unnecessary_pages(cycle)) {
 			ERRMSG("Can't exclude unnecessary pages.\n");
 			return FALSE;
@@ -8234,7 +8254,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	 */
 	if (info->flag_cyclic) {
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
 		pfn_memhole = info->max_mapnr;
 	}
 
@@ -9579,7 +9599,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
 		 * Reset counter for debug message.
 		 */
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
 		pfn_memhole = info->max_mapnr;
 
 		/*
@@ -10528,7 +10548,7 @@ print_report(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
 
 	REPORT_MSG("\n");
 	REPORT_MSG("Original pages  : 0x%016llx\n", pfn_original);
@@ -10544,6 +10564,7 @@ print_report(void)
 	REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
 	REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
 	REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
+	REPORT_MSG("    Extension filter pages  : 0x%016llx\n", pfn_extension);
 	REPORT_MSG("  Remaining pages  : 0x%016llx\n",
 	    pfn_original - pfn_excluded);
 
@@ -10584,7 +10605,7 @@ print_mem_usage(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
 	shrinking = (pfn_original - pfn_excluded) * 100;
 	shrinking = shrinking / pfn_original;
 	total_size = info->page_size * pfn_original;
@@ -10878,6 +10899,7 @@ create_dumpfile(void)
 	}
 
 	print_vtop();
+	init_extensions();
 
 	num_retry = 0;
 retry:
@@ -10921,6 +10943,7 @@ retry:
 	}
 	print_report();
 
+	cleanup_extensions();
 	clear_filter_info();
 	if (!close_files_for_creating_dumpfile())
 		return FALSE;
@@ -12130,6 +12153,7 @@ static struct option longopts[] = {
 	{"check-params", no_argument, NULL, OPT_CHECK_PARAMS},
 	{"dry-run", no_argument, NULL, OPT_DRY_RUN},
 	{"show-stats", no_argument, NULL, OPT_SHOW_STATS},
+	{"extension", required_argument, NULL, OPT_EXTENSION},
 	{0, 0, 0, 0}
 };
 
@@ -12317,6 +12341,11 @@ main(int argc, char *argv[])
 		case OPT_SHOW_STATS:
 			flag_show_stats = TRUE;
 			break;
+		case OPT_EXTENSION:
+			if (add_extension_opts(optarg))
+				break;
+			else
+				goto out;
 		case '?':
 			MSG("Commandline parameter is invalid.\n");
 			MSG("Try `makedumpfile --help' for more information.\n");
diff --git a/makedumpfile.h b/makedumpfile.h
index 0f13743..974b648 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -2747,6 +2747,7 @@ struct elf_prstatus {
 #define OPT_CHECK_PARAMS        OPT_START+18
 #define OPT_DRY_RUN             OPT_START+19
 #define OPT_SHOW_STATS          OPT_START+20
+#define OPT_EXTENSION           OPT_START+21
 
 /*
  * Function Prototype.
-- 
2.47.0




* [PATCH v5][makedumpfile 7/9] Add sample extension as an example reference
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (5 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 6/9] Add makedumpfile extensions support Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 8/9] Doc: Add --extension option to makedumpfile manual Tao Liu
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch adds sample.c as an example extension for code reference.
The extension does no page filtering; it only prints some kernel
symbol/type information.
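
To illustrate what a PG_UNDECID return value from the sample callback means
for the final decision, here is a minimal, self-contained sketch of the
arbitration described for run_extension_callback() in patch 6/9. The names
arbitrate(), page_cb_t and the three toy callbacks are hypothetical and exist
only for this illustration; they are not part of makedumpfile.

```c
#include <assert.h>
#include <stddef.h>

/* Same three verdicts as extension.h in this series. */
enum { PG_INCLUDE, PG_EXCLUDE, PG_UNDECID };

/* Hypothetical callback type; mirrors the extension_callback() signature. */
typedef int (*page_cb_t)(unsigned long pfn, const void *pcache);

/*
 * Hypothetical helper, not part of makedumpfile: combine the verdicts of
 * n callbacks for one page. PG_INCLUDE wins immediately, PG_EXCLUDE wins
 * over PG_UNDECID, and PG_UNDECID is returned only if nobody decided.
 */
int arbitrate(const page_cb_t *cbs, size_t n,
	      unsigned long pfn, const void *pcache)
{
	int ret = PG_UNDECID;

	assert(cbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int verdict;

		if (!cbs[i])
			continue;	/* extension without a callback */
		verdict = cbs[i](pfn, pcache);
		if (verdict == PG_INCLUDE)
			return PG_INCLUDE;
		if (verdict == PG_EXCLUDE)
			ret = PG_EXCLUDE;
	}
	return ret;
}

/* Toy callback behaving like sample.c: never decides. */
int cb_undecided(unsigned long pfn, const void *pcache)
{
	(void)pfn;
	(void)pcache;
	return PG_UNDECID;
}

/* Toy callback that always asks to discard the page. */
int cb_exclude(unsigned long pfn, const void *pcache)
{
	(void)pfn;
	(void)pcache;
	return PG_EXCLUDE;
}

/* Toy callback that always asks to keep the page. */
int cb_include(unsigned long pfn, const void *pcache)
{
	(void)pfn;
	(void)pcache;
	return PG_INCLUDE;
}
```

With only a sample-like callback loaded the result stays PG_UNDECID, so the
page falls through to the page-flags based filtering; a single PG_INCLUDE
from any extension overrides every PG_EXCLUDE.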

Suggested-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 extensions/Makefile |  2 +-
 extensions/sample.c | 69 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 extensions/sample.c

diff --git a/extensions/Makefile b/extensions/Makefile
index b23f346..dbaec4e 100644
--- a/extensions/Makefile
+++ b/extensions/Makefile
@@ -1,5 +1,5 @@
 CC ?= gcc
-CONTRIB_SO :=
+CONTRIB_SO := sample.so
 
 all: $(CONTRIB_SO)
 
diff --git a/extensions/sample.c b/extensions/sample.c
new file mode 100644
index 0000000..e4938d1
--- /dev/null
+++ b/extensions/sample.c
@@ -0,0 +1,69 @@
+#include "../makedumpfile.h"
+#include "../btf_info.h"
+#include "../kallsyms.h"
+#include "../extension.h"
+
+/*
+ * Declare the kernel symbols/types that the extension will use later.
+ * The btf/kallsyms component of makedumpfile resolves the requested
+ * info automatically while the extension is being loaded.
+ *
+ * Symbols/types declared with the non-OPT macros (INIT_MOD_XX) are mandatory:
+ * if any of them is missing, loading the extension fails. This is useful
+ * for skipping an extension early, e.g. the amdgpu mm filtering extension
+ * exits when filtering a vmcore dumped by a machine that has no amdgpu
+ * hardware.
+ *
+ * Symbols/types declared with the OPT macros (INIT_OPT_MOD_XX) are optional:
+ * their existence is checked at extension runtime. This is useful for
+ * covering different kernel versions in which some data structures
+ * differ slightly.
+ */
+/* All kernels have init_task and task_struct.mm */
+INIT_MOD_SYM(vmlinux, init_task);
+INIT_MOD_STRUCT_MEMBER(vmlinux, task_struct, mm);
+/* 
+ * Older kernels use mm_struct.mm_rb,
+ * later ones use mm_struct.mm_mt
+ */
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, mm_struct, mm_mt);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, mm_struct, mm_rb);
+
+/*
+ * Extension callback, invoked while makedumpfile is doing page filtering.
+ * The extension decides whether the given page is kept (PG_INCLUDE),
+ * discarded (PG_EXCLUDE) or left undecided (PG_UNDECID). Here we simply
+ * return PG_UNDECID so that every page falls back to the traditional
+ * page-flags check routine or is decided by other extensions.
+ */
+int extension_callback(unsigned long pfn, const void *pcache)
+{
+	return PG_UNDECID;
+}
+
+/* Entry of extension */
+void extension_init(void)
+{
+	MSG("sample.so: The address of init_task is: %lx\n",
+		GET_MOD_SYM(vmlinux, init_task));
+	MSG("sample.so: The size of task_struct is: %d bytes\n",
+		GET_MOD_STRUCT_MEMBER_SSIZE(vmlinux, task_struct, mm));
+	MSG("sample.so: The offset of member mm within task_struct is: %d bytes\n",
+		GET_MOD_STRUCT_MEMBER_MOFF(vmlinux, task_struct, mm) / 8 );
+	MSG("sample.so: The size of member mm within task_struct is: %d bytes\n",
+		GET_MOD_STRUCT_MEMBER_MSIZE(vmlinux, task_struct, mm));
+	if (MOD_STRUCT_MEMBER_EXIST(vmlinux, mm_struct, mm_mt)) {
+		MSG("sample.so: Your kernel is using maple tree in mm_struct\n");
+	}
+	if (MOD_STRUCT_MEMBER_EXIST(vmlinux, mm_struct, mm_rb)) {
+		MSG("sample.so: Your kernel is using rb tree in mm_struct\n");
+	}
+}
+
+/* 
+ * This function is called when the extension is unloaded. 
+ * If desired, perform any cleanups here. 
+ */
+__attribute__((destructor))
+void extension_cleanup(void) { }
+
-- 
2.47.0




* [PATCH v5][makedumpfile 8/9] Doc: Add --extension option to makedumpfile manual
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (6 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 7/9] Add sample extension as an example reference Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 9/9] Add amdgpu mm pages filtering extension Tao Liu
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

Suggested-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 README            |  6 ++++++
 makedumpfile.8.in | 11 ++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/README b/README
index b2666cb..0f1ccb9 100644
--- a/README
+++ b/README
@@ -65,6 +65,12 @@
     # make USEZSTD=on ; make install
     The user has to prepare zstd library.
 
+  - Build with extension support:
+    # make EXTENSION=on ; make install
+    The user has to prepare the libbpf library (version >= v1.4.0).
+    Note: the extension will only work for kernels with DEBUG_INFO_BTF
+    & KALLSYMS enabled.
+
   - Build the extension module for --eppic option.
     # make eppic_makedumpfile.so
     The user has to prepare eppic library from the following site:
diff --git a/makedumpfile.8.in b/makedumpfile.8.in
index 1edd0ff..5af1786 100644
--- a/makedumpfile.8.in
+++ b/makedumpfile.8.in
@@ -2,7 +2,7 @@
 .SH NAME
 makedumpfile \- make a small dumpfile of kdump
 .SH SYNOPSIS
-\fBmakedumpfile\fR    [\fIOPTION\fR] [\-x \fIVMLINUX\fR|\-i \fIVMCOREINFO\fR] \fIVMCORE\fR \fIDUMPFILE\fR
+\fBmakedumpfile\fR    [\fIOPTION\fR] [\-x \fIVMLINUX\fR|\-i \fIVMCOREINFO\fR] [--extension \fIEXTENSION1.SO\fR [--extension \fIEXTENSION2.SO\fR ..]] \fIVMCORE\fR \fIDUMPFILE\fR
 .br
 \fBmakedumpfile\fR \-F [\fIOPTION\fR] [\-x \fIVMLINUX\fR|\-i \fIVMCOREINFO\fR] \fIVMCORE\fR
 .br
@@ -664,6 +664,15 @@ This option cannot be used with the --dump-dmesg, --reassemble and -g options.
 Display report messages. This is an alternative to enabling bit 4 in the level
 provided to --message-level.
 
+.TP
+\fB\-\-extension\fR \fIEXTENSION.SO\fR
+Load makedumpfile extensions. With extensions, users can programmatically
+customize which mm pages are kept or discarded, along with the traditional
+page-flag (-d dump_level) based filtering. The option can be given multiple
+times to load several extensions. NOTE: this option requires makedumpfile to
+be built with EXTENSION=on and libbpf (>= v1.4.0). It also only works for
+kernels with DEBUG_INFO_BTF & KALLSYMS enabled.
+
 .SH ENVIRONMENT VARIABLES
 
 .TP 8
-- 
2.47.0




* [PATCH v5][makedumpfile 9/9] Add amdgpu mm pages filtering extension
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (7 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 8/9] Doc: Add --extension option to makedumpfile manual Tao Liu
@ 2026-04-14 10:26 ` Tao Liu
  2026-05-20  4:55 ` [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-04-14 10:26 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch introduces a maple_tree helper and an amdgpu mm page filtering
extension. The mm pages allocated to amdgpu are discarded from the vmcore in
order to shrink its size, since they are useless for kernel crash analysis
and may contain sensitive data.
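
The extension records the discarded pages as blocks of (start pfn, page
count) in a singly linked list sorted by start pfn, and the callback checks
each pfn against that list. The following is a minimal, self-contained sketch
of that idea, not the code from amdgpu_filter.c: struct pfn_range,
range_insert(), range_contains() and range_free() are hypothetical names, and
the lookup is a plain scan from the list head rather than the cursor-based
walk used in filter_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ft_page_info in amdgpu_filter.c. */
struct pfn_range {
	unsigned long pfn;	/* first page frame of the block */
	unsigned long num;	/* number of pages in the block */
	struct pfn_range *next;
};

/* Insert a block so that the list stays sorted by ascending start pfn. */
bool range_insert(struct pfn_range **head, unsigned long pfn,
		  unsigned long num)
{
	struct pfn_range *node;

	assert(head != NULL);
	node = malloc(sizeof(*node));
	if (!node)
		return false;
	node->pfn = pfn;
	node->num = num;

	/* Follow the links until the next block starts at or after pfn. */
	while (*head && (*head)->pfn < pfn)
		head = &(*head)->next;
	node->next = *head;
	*head = node;
	return true;
}

/* Return true if pfn falls inside any recorded block. */
bool range_contains(const struct pfn_range *head, unsigned long pfn)
{
	for (const struct pfn_range *p = head; p; p = p->next) {
		if (pfn < p->pfn)
			return false;	/* sorted: later blocks start higher */
		if (pfn - p->pfn < p->num)
			return true;	/* pfn is within [pfn, pfn + num) */
	}
	return false;
}

/* Release every node and reset the list head. */
void range_free(struct pfn_range **head)
{
	while (*head) {
		struct pfn_range *next = (*head)->next;

		free(*head);
		*head = next;
	}
}
```

In terms of the extension API, a hit would map to PG_EXCLUDE and a miss to
PG_UNDECID, so that pages outside the recorded blocks are still handled by
other extensions or by the page-flags based filtering.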

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 extensions/Makefile        |   4 +-
 extensions/amdgpu_filter.c | 221 +++++++++++++++++++++++++
 extensions/maple_tree.c    | 328 +++++++++++++++++++++++++++++++++++++
 extensions/maple_tree.h    |   7 +
 4 files changed, 559 insertions(+), 1 deletion(-)
 create mode 100644 extensions/amdgpu_filter.c
 create mode 100644 extensions/maple_tree.c
 create mode 100644 extensions/maple_tree.h

diff --git a/extensions/Makefile b/extensions/Makefile
index dbaec4e..c1dbc4f 100644
--- a/extensions/Makefile
+++ b/extensions/Makefile
@@ -1,8 +1,10 @@
 CC ?= gcc
-CONTRIB_SO := sample.so
+CONTRIB_SO := sample.so amdgpu_filter.so
 
 all: $(CONTRIB_SO)
 
+amdgpu_filter.so: maple_tree.c
+
 $(CONTRIB_SO): %.so: %.c
 	$(CC) -O2 -g -fPIC -shared -Wl,-T,../makedumpfile.ld -o $@ $^
 
diff --git a/extensions/amdgpu_filter.c b/extensions/amdgpu_filter.c
new file mode 100644
index 0000000..0ad0fb5
--- /dev/null
+++ b/extensions/amdgpu_filter.c
@@ -0,0 +1,221 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include "maple_tree.h"
+#include "../makedumpfile.h"
+#include "../btf_info.h"
+#include "../kallsyms.h"
+#include "../extension.h"
+
+/*
+ * These syms/types are mandatory for the extension.
+ */
+INIT_MOD_STRUCT_MEMBER(vmlinux, task_struct, tasks);
+INIT_MOD_STRUCT_MEMBER(vmlinux, task_struct, mm);
+INIT_MOD_STRUCT_MEMBER(vmlinux, mm_struct, mm_mt);
+INIT_MOD_STRUCT_MEMBER(vmlinux, vm_area_struct, vm_ops);
+INIT_MOD_STRUCT_MEMBER(vmlinux, vm_area_struct, vm_private_data);
+INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_buffer_object, ttm);
+INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_tt, pages);
+INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_tt, num_pages);
+INIT_MOD_STRUCT(vmlinux, page);
+
+INIT_MOD_SYM(vmlinux, init_task);
+INIT_MOD_SYM(vmlinux, vmemmap_base);
+INIT_MOD_SYM(amdgpu, amdgpu_gem_vm_ops);
+
+struct ft_page_info {
+	unsigned long pfn;
+	unsigned long num;
+	struct ft_page_info *next;
+};
+
+static struct ft_page_info *ft_head_discard = NULL;
+
+static void update_filter_pages_info(unsigned long pfn, unsigned long num)
+{
+	struct ft_page_info *p, **ft_head;
+	struct ft_page_info *new_p = malloc(sizeof(struct ft_page_info));
+
+	ft_head = &ft_head_discard;
+
+	if (!new_p) {
+		ERRMSG("Can't allocate memory for ft_page_info\n");
+		return;
+	}
+	new_p->pfn = pfn;
+	new_p->num = num;
+	new_p->next = NULL;
+
+	if (!(*ft_head) || (*ft_head)->pfn > new_p->pfn) {
+		new_p->next = (*ft_head);
+		(*ft_head) = new_p;
+		return;
+	}
+
+	p = (*ft_head);
+	while (p->next != NULL && p->next->pfn < new_p->pfn) {
+		p = p->next;
+	}
+
+	new_p->next = p->next;
+	p->next = new_p;
+}
+
+static int filter_page(unsigned long pfn, struct ft_page_info **p)
+{
+	struct ft_page_info *ft_head = ft_head_discard;
+
+	if (ft_head == NULL)
+		return PG_UNDECID;
+
+	if (*p == NULL)
+		*p = ft_head;
+
+	/* The gap before 1st block */
+	if (pfn < ft_head->pfn)
+		return PG_UNDECID;
+
+	/* Handle 1~(n-1) blocks and following gaps */
+	while ((*p)->next) {
+		if (pfn >= (*p)->pfn && pfn < (*p)->pfn + (*p)->num)
+			return PG_EXCLUDE; // hit the block
+		if (pfn >= (*p)->pfn + (*p)->num && pfn < (*p)->next->pfn)
+			return PG_UNDECID; // the gap after the block
+		*p = (*p)->next;
+	}
+
+	/* The last block and gap */
+	if (pfn >= (*p)->pfn + (*p)->num)
+		return PG_UNDECID;
+	else
+		return PG_EXCLUDE;
+}
+
+static void do_cleanup(struct ft_page_info **ft_head)
+{
+	struct ft_page_info *p, *p_tmp;
+
+	for (p = *ft_head; p;) {
+		p_tmp = p;
+		p = p->next;
+		free(p_tmp);
+	}
+	*ft_head = NULL;
+}
+
+#define MOD_MEMBER_OFF(MOD, S, M) \
+	GET_MOD_STRUCT_MEMBER_MOFF(MOD, S, M) / 8
+#define KERN_MEMBER_OFF(S, M) MOD_MEMBER_OFF(vmlinux, S, M)
+#define GET_KERN_STRUCT_MEMBER_MSIZE(S, M) \
+	GET_MOD_STRUCT_MEMBER_MSIZE(vmlinux, S, M)
+#define GET_KERN_SYM(SYM) GET_MOD_SYM(vmlinux, SYM)
+#define GET_KERN_STRUCT_SSIZE(S) GET_MOD_STRUCT_SSIZE(vmlinux, S)
+
+static bool gather_amdgpu_mm_range_info(void)
+{
+	uint64_t init_task, list, list_offset, amdgpu_gem_vm_ops;
+	uint64_t mm, vm_ops, tbo, ttm, num_pages, pages, pfn, vmemmap_base;
+	int array_len;
+	unsigned long *array_out;
+	init_task = GET_KERN_SYM(init_task);
+	amdgpu_gem_vm_ops = GET_MOD_SYM(amdgpu, amdgpu_gem_vm_ops);
+
+	list = init_task + KERN_MEMBER_OFF(task_struct, tasks);
+
+	do {
+		if (!readmem(VADDR, list - KERN_MEMBER_OFF(task_struct, tasks) + 
+				KERN_MEMBER_OFF(task_struct, mm),
+			&mm, sizeof(uint64_t))) {
+			ERRMSG("Can't get task_struct member mm!\n");
+			goto out;
+		}
+		if (!mm) {
+			list = next_list(list);
+			continue;
+		}
+
+		array_out = mt_dump(mm + KERN_MEMBER_OFF(mm_struct, mm_mt), &array_len);
+		if (!array_out)
+			goto out;
+
+		for (int i = 0; i < array_len; i++) {
+			num_pages = 0;
+			if (!readmem(VADDR, array_out[i] + KERN_MEMBER_OFF(vm_area_struct, vm_ops),
+				&vm_ops, GET_KERN_STRUCT_MEMBER_MSIZE(vm_area_struct, vm_ops))) {
+				ERRMSG("Can't get vm_area_struct member vm_ops!\n");
+				goto out;
+			}
+			if (vm_ops == amdgpu_gem_vm_ops) {
+				if (!readmem(VADDR, array_out[i] +
+					KERN_MEMBER_OFF(vm_area_struct, vm_private_data),
+					&tbo, GET_KERN_STRUCT_MEMBER_MSIZE(vm_area_struct, vm_private_data))) {
+					ERRMSG("Can't get vm_area_struct member vm_private_data!\n");
+					goto out;
+				}
+				if (!readmem(VADDR, tbo + MOD_MEMBER_OFF(amdgpu, ttm_buffer_object, ttm),
+					&ttm, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_buffer_object, ttm))) {
+					ERRMSG("Can't get ttm_buffer_object member ttm!\n");
+					goto out;
+				}
+				if (ttm) {
+					if (!readmem(VADDR, ttm + MOD_MEMBER_OFF(amdgpu, ttm_tt, num_pages),
+						&num_pages, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_tt, num_pages))) {
+						ERRMSG("Can't get ttm_tt member num_pages!\n");
+						goto out;
+					}
+					if (!readmem(VADDR, ttm + MOD_MEMBER_OFF(amdgpu, ttm_tt, pages),
+						&pages, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_tt, pages))) {
+						ERRMSG("Can't get ttm_tt member pages!\n");
+						goto out;
+					}
+					if (!readmem(VADDR, pages, &pages, sizeof(unsigned long))) {
+						ERRMSG("Can't get pages!\n");
+						goto out;
+					}
+					if (!readmem(VADDR, GET_KERN_SYM(vmemmap_base),
+						&vmemmap_base, sizeof(unsigned long))) {
+						ERRMSG("Can't get vmemmap_base!\n");
+						goto out;
+					}
+					pfn = (pages - vmemmap_base) / GET_KERN_STRUCT_SSIZE(page);
+					update_filter_pages_info(pfn, num_pages);
+				}
+			}
+		}
+
+		free(array_out);
+		list = next_list(list);
+	} while (list != init_task + KERN_MEMBER_OFF(task_struct, tasks));
+
+	return true;
+out:
+	return false;
+}
+
+/* Extension callback when makedumpfile does page filtering */
+int extension_callback(unsigned long pfn, const void *pcache)
+{
+	struct ft_page_info *cur = NULL;
+
+	return filter_page(pfn, &cur);
+}
+
+/* Entry of extension */
+void extension_init(void)
+{
+	if (!maple_init() || !gather_amdgpu_mm_range_info()) {
+		ERRMSG("amdgpu_filter.so: init fail!\n");
+		goto out;
+	}
+	MSG("amdgpu_filter.so: init success!\n");
+out:
+	return;
+}
+
+__attribute__((destructor))
+void extension_cleanup(void)
+{
+	do_cleanup(&ft_head_discard);
+}
+
+
diff --git a/extensions/maple_tree.c b/extensions/maple_tree.c
new file mode 100644
index 0000000..9cc067f
--- /dev/null
+++ b/extensions/maple_tree.c
@@ -0,0 +1,328 @@
+#include <stdio.h>
+#include <stdbool.h>
+#include "../btf_info.h"
+#include "../kallsyms.h"
+#include "../makedumpfile.h"
+
+static unsigned char mt_slots[4] = {0};
+static unsigned char mt_pivots[4] = {0};
+static unsigned long mt_max[4] = {0};
+
+INIT_OPT_MOD_SYM(vmlinux, mt_slots);
+INIT_OPT_MOD_SYM(vmlinux, mt_pivots);
+
+INIT_OPT_MOD_STRUCT(vmlinux, maple_tree);
+INIT_OPT_MOD_STRUCT(vmlinux, maple_node);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_tree, ma_root);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_node, ma64);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_node, mr64);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_node, slot);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_arange_64, pivot);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_arange_64, slot);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_range_64, pivot);
+INIT_OPT_MOD_STRUCT_MEMBER(vmlinux, maple_range_64, slot);
+
+#define MEMBER_OFF(S, M) \
+	GET_MOD_STRUCT_MEMBER_MOFF(vmlinux, S, M) / 8
+#define KERN_STRUCT_MEMBER_EXIST(S, M) \
+	MOD_STRUCT_MEMBER_EXIST(vmlinux, S, M)
+#define GET_KERN_SYM(SYM) GET_MOD_SYM(vmlinux, SYM)
+#define KERN_SYM_EXIST(SYM) MOD_SYM_EXIST(vmlinux, SYM)
+#define GET_KERN_STRUCT_SSIZE(S) \
+	GET_MOD_STRUCT_SSIZE(vmlinux, S)
+#define KERN_STRUCT_EXIST(SYM) MOD_STRUCT_EXIST(vmlinux, SYM)
+
+#define MAPLE_BUFSIZE			512
+
+enum {
+	maple_dense_enum,
+	maple_leaf_64_enum,
+	maple_range_64_enum,
+	maple_arange_64_enum,
+};
+
+#define MAPLE_NODE_MASK			255UL
+#define MAPLE_NODE_TYPE_MASK		0x0F
+#define MAPLE_NODE_TYPE_SHIFT		0x03
+#define XA_ZERO_ENTRY			xa_mk_internal(257)
+
+static unsigned long xa_mk_internal(unsigned long v)
+{
+	return (v << 2) | 2;
+}
+
+static bool xa_is_internal(unsigned long entry)
+{
+	return (entry & 3) == 2;
+}
+
+static bool xa_is_node(unsigned long entry)
+{
+	return xa_is_internal(entry) && entry > 4096;
+}
+
+static bool xa_is_value(unsigned long entry)
+{
+	return entry & 1;
+}
+
+static bool xa_is_zero(unsigned long entry)
+{
+	return entry == XA_ZERO_ENTRY;
+}
+
+static unsigned long xa_to_internal(unsigned long entry)
+{
+	return entry >> 2;
+}
+
+static unsigned long xa_to_value(unsigned long entry)
+{
+	return entry >> 1;
+}
+
+static unsigned long mte_to_node(unsigned long entry)
+{
+	return entry & ~MAPLE_NODE_MASK;
+}
+
+static unsigned long mte_node_type(unsigned long maple_enode_entry)
+{
+	return (maple_enode_entry >> MAPLE_NODE_TYPE_SHIFT) &
+		MAPLE_NODE_TYPE_MASK;
+}
+
+static unsigned long mt_slot(void **slots, unsigned char offset)
+{
+	return (unsigned long)slots[offset];
+}
+
+static bool ma_is_leaf(unsigned long type)
+{
+	return type < maple_range_64_enum;
+}
+
+static bool mte_is_leaf(unsigned long maple_enode_entry)
+{
+	return ma_is_leaf(mte_node_type(maple_enode_entry));
+}
+
+static void mt_dump_entry(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	if (entry == 0)
+		return;
+
+	add_to_arr((void ***)array_out, array_len, array_cap, (void *)entry);
+}
+
+static void mt_dump_node(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap);
+
+static void mt_dump_range64(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node_m_node = mte_to_node(entry);
+	char node_buf[MAPLE_BUFSIZE];
+	bool leaf = mte_is_leaf(entry);
+	unsigned long first = min, last;
+	int i;
+	char *mr64_buf;
+
+	if (!readmem(VADDR, maple_node_m_node, node_buf, GET_KERN_STRUCT_SSIZE(maple_node))) {
+		ERRMSG("Can't get maple_node_m_node!\n");
+	}
+	mr64_buf = node_buf + MEMBER_OFF(maple_node, mr64);
+
+	for (i = 0; i < mt_slots[maple_range_64_enum]; i++) {
+		last = max;
+
+		if (i < (mt_slots[maple_range_64_enum] - 1))
+			last = ULONG(mr64_buf + MEMBER_OFF(maple_range_64, pivot) +
+				     sizeof(ulong) * i);
+
+		else if (!VOID_PTR(mr64_buf + MEMBER_OFF(maple_range_64, slot) +
+			  sizeof(void *) * i) &&
+			 max != mt_max[mte_node_type(entry)])
+			break;
+		if (last == 0 && i > 0)
+			break;
+		if (leaf)
+			mt_dump_entry(mt_slot((void **)(mr64_buf +
+						      MEMBER_OFF(maple_range_64, slot)), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+		else if (VOID_PTR(mr64_buf + MEMBER_OFF(maple_range_64, slot) +
+				  sizeof(void *) * i)) {
+			mt_dump_node(mt_slot((void **)(mr64_buf +
+						     MEMBER_OFF(maple_range_64, slot)), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+		}
+
+		if (last == max)
+			break;
+		if (last > max) {
+			MSG("node %p last (%lu) > max (%lu) at pivot %d!\n",
+				mr64_buf, last, max, i);
+			break;
+		}
+		first = last + 1;
+	}
+}
+
+static void mt_dump_arange64(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node_m_node = mte_to_node(entry);
+	char node_buf[MAPLE_BUFSIZE];
+	unsigned long first = min, last;
+	int i;
+	char *ma64_buf;
+
+	if (!readmem(VADDR, maple_node_m_node, node_buf, GET_KERN_STRUCT_SSIZE(maple_node))) {
+		ERRMSG("Can't get maple_node_m_node!\n");
+	}
+	ma64_buf = node_buf + MEMBER_OFF(maple_node, ma64);
+
+	for (i = 0; i < mt_slots[maple_arange_64_enum]; i++) {
+		last = max;
+
+		if (i < (mt_slots[maple_arange_64_enum] - 1))
+			last = ULONG(ma64_buf + MEMBER_OFF(maple_arange_64, pivot) +
+				     sizeof(void *) * i);
+		else if (!VOID_PTR(ma64_buf + MEMBER_OFF(maple_arange_64, slot) +
+				   sizeof(void *) * i))
+			break;
+		if (last == 0 && i > 0)
+			break;
+
+		if (ULONG(ma64_buf + MEMBER_OFF(maple_arange_64, slot) + sizeof(void *) * i))
+			mt_dump_node(mt_slot((void **)(ma64_buf +
+						      MEMBER_OFF(maple_arange_64, slot)), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+
+		if (last == max)
+			break;
+		if (last > max) {
+			MSG("node %p last (%lu) > max (%lu) at pivot %d!\n",
+				ma64_buf, last, max, i);
+			break;
+		}
+		first = last + 1;
+	}
+}
+
+static void mt_dump_node(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node = mte_to_node(entry);
+	unsigned long type = mte_node_type(entry);
+	int i;
+	char node_buf[MAPLE_BUFSIZE];
+
+	if (!readmem(VADDR, maple_node, node_buf, GET_KERN_STRUCT_SSIZE(maple_node))) {
+		ERRMSG("Can't get maple_node!\n");
+	}
+
+	switch (type) {
+	case maple_dense_enum:
+		for (i = 0; i < mt_slots[maple_dense_enum]; i++) {
+			if (min + i > max)
+				MSG("OUT OF RANGE: ");
+			mt_dump_entry(mt_slot((void **)(node_buf + MEMBER_OFF(maple_node, slot)), i),
+				min + i, min + i, depth, array_out, array_len, array_cap);
+		}
+		break;
+	case maple_leaf_64_enum:
+	case maple_range_64_enum:
+		mt_dump_range64(entry, min, max, depth, array_out, array_len, array_cap);
+		break;
+	case maple_arange_64_enum:
+		mt_dump_arange64(entry, min, max, depth, array_out, array_len, array_cap);
+		break;
+	default:
+		MSG(" UNKNOWN TYPE\n");
+	}	
+}
+
+unsigned long *mt_dump(unsigned long mt, int *array_len)
+{
+	char tree_buf[MAPLE_BUFSIZE];
+	unsigned long entry;
+	unsigned long *array_out = NULL;
+	int array_cap = 0;
+	*array_len = 0;
+
+	if (!readmem(VADDR, mt, tree_buf, GET_KERN_STRUCT_SSIZE(maple_tree))) {
+		ERRMSG("Can't get maple_tree!\n");
+	}
+	entry = ULONG(tree_buf + MEMBER_OFF(maple_tree, ma_root));
+
+	if (xa_is_node(entry))
+		mt_dump_node(entry, 0, mt_max[mte_node_type(entry)], 0,
+				&array_out, array_len, &array_cap);
+	else if (entry)
+		mt_dump_entry(entry, 0, 0, 0, &array_out, array_len, &array_cap);
+	else
+		MSG("(empty)\n");
+
+	return array_out;
+}
+
+bool maple_init(void)
+{
+	unsigned long mt_slots_ptr;
+	unsigned long mt_pivots_ptr;
+
+	if (!KERN_SYM_EXIST(mt_slots) ||
+	    !KERN_SYM_EXIST(mt_pivots) ||
+	    !KERN_STRUCT_EXIST(maple_tree) ||
+	    !KERN_STRUCT_EXIST(maple_node) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_tree, ma_root) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_node, ma64) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_node, mr64) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_node, slot) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_arange_64, pivot) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_arange_64, slot) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_range_64, pivot) ||
+	    !KERN_STRUCT_MEMBER_EXIST(maple_range_64, slot)) {
+		ERRMSG("Missing required maple tree syms/types\n");
+		return false;
+	}
+
+	mt_slots_ptr = GET_KERN_SYM(mt_slots);
+	mt_pivots_ptr = GET_KERN_SYM(mt_pivots);
+
+	if (GET_KERN_STRUCT_SSIZE(maple_tree) > MAPLE_BUFSIZE ||
+	    GET_KERN_STRUCT_SSIZE(maple_node) > MAPLE_BUFSIZE) {
+		ERRMSG("MAPLE_BUFSIZE should be larger than maple_node/tree struct\n");
+		return false;
+	}
+
+	if (!readmem(VADDR, mt_slots_ptr, mt_slots, sizeof(mt_slots))) {
+		ERRMSG("Can't get mt_slots!\n");
+		return false;
+	}
+	if (!readmem(VADDR, mt_pivots_ptr, mt_pivots, sizeof(mt_pivots))) {
+		ERRMSG("Can't get mt_pivots!\n");
+		return false;
+	}
+
+	mt_max[maple_dense_enum]           = mt_slots[maple_dense_enum];
+	mt_max[maple_leaf_64_enum]         = ULONG_MAX;
+	mt_max[maple_range_64_enum]        = ULONG_MAX;
+	mt_max[maple_arange_64_enum]       = ULONG_MAX;
+
+	return true;
+}
+
+
diff --git a/extensions/maple_tree.h b/extensions/maple_tree.h
new file mode 100644
index 0000000..2e8d029
--- /dev/null
+++ b/extensions/maple_tree.h
@@ -0,0 +1,7 @@
+#ifndef _MAPLE_TREE_H
+#define _MAPLE_TREE_H
+#include <stdbool.h>
+unsigned long *mt_dump(unsigned long mt, int *array_len);
+bool maple_init(void);
+#endif /* _MAPLE_TREE_H */
+
-- 
2.47.0




* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (8 preceding siblings ...)
  2026-04-14 10:26 ` [PATCH v5][makedumpfile 9/9] Add amdgpu mm pages filtering extension Tao Liu
@ 2026-05-20  4:55 ` Tao Liu
  2026-05-28 18:37 ` Stephen Brennan
  2026-05-29 21:11 ` Krister Johansen
  11 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-05-20  4:55 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan

Kind ping: are there any comments on this patchset?

Thanks,
Tao Liu

On Tue, Apr 14, 2026 at 10:27 PM Tao Liu <ltao@redhat.com> wrote:
>
> A) This patchset will introduce the following features to makedumpfile:
>
>   1) Add .so extension support to makedumpfile
>   2) Enable btf and kallsyms for symbol type and address resolving.
>
> B) The purpose of the features are:
>
>   1) Currently makedumpfile filters mm pages based on page flags, because flags
>      can help to determine one page's usage. But this page-flag-checking method
>      lacks of flexibility in certain cases, e.g. if we want to filter those mm
>      pages occupied by GPU during vmcore dumping due to:
>
>      a) GPU may be taking a large memory and contains sensitive data;
>      b) GPU mm pages have no relations to kernel crash and useless for vmcore
>         analysis.
>
>      But there is no GPU mm page specific flags, and apparently we don't need
>      to create one just for kdump use. A programmable filtering tool is more
>      suitable for such cases. In addition, different GPU vendors may use
>      different ways for mm pages allocating, programmable filtering is better
>      than hard coding these GPU specific logics into makedumpfile in this case.
>
>   2) Currently makedumpfile already contains a programmable filtering tool, aka
>      eppic script, which allows user to write customized code for data erasing.
>      However it has the following drawbacks:
>
>      a) cannot do mm page filtering.
>      b) need to access to debuginfo of both kernel and modules, which is not
>         applicable in the 2nd kernel.
>      c) eppic library has memory leaks which are not all resolved [1]. This
>         is not acceptable in 2nd kernel.
>
>      makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
>      types and addresses. In recent kernel there are dwarf alternatives such
>      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>      are already packed within vmcore, so we can use it directly.
>
>   With these, this patchset introduces makedumpfile extensions, which is based
>   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>   The following section shows its usage and performance, please note the tests
>   are performed in 1st kernel.
>
>   3) Compile and run makedumpfile extensions:
>
>   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
>   $ make extensions
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/extension.out --extension amdgpu_filter.so
>     Loaded extension: ./extensions/amdgpu_filter.so
>     makedumpfile Completed.
>         User time (seconds): 5.08
>         System time (seconds): 0.84
>         Percent of CPU this job got: 99%
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.95
>         Maximum resident set size (kbytes): 17360
>         ...
>
>      To contrast with eppic script of v2 [2]:
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>     makedumpfile Completed.
>         User time (seconds): 8.23
>         System time (seconds): 0.88
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
>         Maximum resident set size (kbytes): 57128
>         ...
>
>   -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
>   -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
>   -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
>   1) GPU types: Currently only tested with amdgpu's mm page filtering, others
>      are not tested.
>   2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>      rhel8.x is not supported, others are not tested.
>
> D) Testing:
>
>      If you don't want to create your vmcore, you can find a vmcore which I
>      created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
>      allocated by program [4]. You can use the vmcore in 1st kernel to filter
>      the amdgpu mm pages by the previous performance testing cmdline. To
>      verify the pages are filtered in crash:
>
>      Unfiltered:
>      crash> search -c "!QAZXSW@#EDC"
>      ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      crash> rd ffff96b7fa800000
>      ffff96b7fa800000:  405753585a415121                    !QAZXSW@
>      crash> rd ffff96b87c800000
>      ffff96b87c800000:  405753585a415121                    !QAZXSW@
>
>      Filtered:
>      crash> search -c "!QAZXSW@#EDC"
>      crash> rd ffff96b7fa800000
>      rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
>      crash> rd ffff96b87c800000
>      rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
>
> [1]: https://github.com/lucchouina/eppic/pull/32
> [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> [3]: https://people.redhat.com/~ltao/core/vmcore
> [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> v5 -> v4:
>
> 1) Add "make EXTENSION=on" switch to customize the extension feature.
> 2) Clean up macros within btf_info.h.
> 3) Updated doc and a sample extension to demo the extension usage.
> 4) Use MSG()/ERRMSG() rather than fprintf().
> 5) Add return value check for readmem().
> 6) Allow "makedumpfile -d 1 --extension ext.so" to enter extension.
> 7) The patches are organized as follows:
>
>     --- <customization specific> ---
>     9. Add amdgpu mm pages filtering extension
>
>     --- <code should be merged> ---
>     8. Doc: Add --extension option to makedumpfile manual
>     7. Add sample extension as an example reference
>     6. Add makedumpfile extensions support
>     5. Implement kernel module's btf resolving
>     4. Implement kernel module's kallsyms resolving
>     3. Implement kernel btf resolving
>     2. Implement kernel kallsyms resolving
>     1. Reserve sections for makedumpfile and extenions
>
>     Patch 9 is customization specific, merging depends on the strategy of
>     maintenance.
>     Patch 1 ~ 8 are common code which should be merged with makedumpfile.
>
> Link to v4: https://lore.kernel.org/kexec/20260317150743.69590-1-ltao@redhat.com/
> Link to v3: https://lore.kernel.org/kexec/20260120025500.25095-1-ltao@redhat.com/
> Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
>
> Tao Liu (9):
>   Reserve sections for makedumpfile and extenions
>   Implement kernel kallsyms resolving
>   Implement kernel btf resolving
>   Implement kernel module's kallsyms resolving
>   Implement kernel module's btf resolving
>   Add makedumpfile extensions support
>   Add sample extension as an example reference
>   Doc: Add --extension option to makedumpfile manual
>   Add amdgpu mm pages filtering extension
>
>  Makefile                   |  15 +-
>  README                     |   6 +
>  btf_info.c                 | 375 +++++++++++++++++++++++++
>  btf_info.h                 |  77 ++++++
>  extension.c                | 338 ++++++++++++++++++++++
>  extension.h                |  16 ++
>  extensions/Makefile        |  13 +
>  extensions/amdgpu_filter.c | 221 +++++++++++++++
>  extensions/maple_tree.c    | 328 ++++++++++++++++++++++
>  extensions/maple_tree.h    |   7 +
>  extensions/sample.c        |  69 +++++
>  kallsyms.c                 | 554 +++++++++++++++++++++++++++++++++++++
>  kallsyms.h                 |  87 ++++++
>  makedumpfile.8.in          |  11 +-
>  makedumpfile.c             |  44 ++-
>  makedumpfile.h             |  12 +
>  makedumpfile.ld            |  16 ++
>  17 files changed, 2180 insertions(+), 9 deletions(-)
>  create mode 100644 btf_info.c
>  create mode 100644 btf_info.h
>  create mode 100644 extension.c
>  create mode 100644 extension.h
>  create mode 100644 extensions/Makefile
>  create mode 100644 extensions/amdgpu_filter.c
>  create mode 100644 extensions/maple_tree.c
>  create mode 100644 extensions/maple_tree.h
>  create mode 100644 extensions/sample.c
>  create mode 100644 kallsyms.c
>  create mode 100644 kallsyms.h
>  create mode 100644 makedumpfile.ld
>
> --
> 2.47.0
>




* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (9 preceding siblings ...)
  2026-05-20  4:55 ` [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
@ 2026-05-28 18:37 ` Stephen Brennan
  2026-05-28 22:02   ` Tao Liu
  2026-05-29 21:11 ` Krister Johansen
  11 siblings, 1 reply; 18+ messages in thread
From: Stephen Brennan @ 2026-05-28 18:37 UTC (permalink / raw)
  To: Tao Liu, yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, Tao Liu

Tao Liu <ltao@redhat.com> writes:
> A) This patchset will introduce the following features to makedumpfile:
>
>   1) Add .so extension support to makedumpfile
>   2) Enable btf and kallsyms for symbol type and address resolving.
>
> B) The purpose of the features are:
>
>   1) Currently makedumpfile filters mm pages based on page flags, because flags
>      can help to determine one page's usage. But this page-flag-checking method
>      lacks of flexibility in certain cases, e.g. if we want to filter those mm
>      pages occupied by GPU during vmcore dumping due to:
>
>      a) GPU may be taking a large memory and contains sensitive data;
>      b) GPU mm pages have no relations to kernel crash and useless for vmcore
>         analysis.
>
>      But there is no GPU mm page specific flags, and apparently we don't need
>      to create one just for kdump use. A programmable filtering tool is more
>      suitable for such cases. In addition, different GPU vendors may use
>      different ways for mm pages allocating, programmable filtering is better
>      than hard coding these GPU specific logics into makedumpfile in this case.
>
>   2) Currently makedumpfile already contains a programmable filtering tool, aka
>      eppic script, which allows user to write customized code for data erasing.
>      However it has the following drawbacks:
>
>      a) cannot do mm page filtering.
>      b) need to access to debuginfo of both kernel and modules, which is not
>         applicable in the 2nd kernel.
>      c) eppic library has memory leaks which are not all resolved [1]. This
>         is not acceptable in 2nd kernel.
>
>      makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
>      types and addresses. In recent kernel there are dwarf alternatives such
>      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>      are already packed within vmcore, so we can use it directly.
>
>   With these, this patchset introduces makedumpfile extensions, which is based
>   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>   The following section shows its usage and performance, please note the tests
>   are performed in 1st kernel.
>
>   3) Compile and run makedumpfile extensions:
>
>   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
>   $ make extensions
>   
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/extension.out --extension amdgpu_filter.so
>     Loaded extension: ./extensions/amdgpu_filter.so
>     makedumpfile Completed.
>         User time (seconds): 5.08
>         System time (seconds): 0.84
>         Percent of CPU this job got: 99%
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.95
>         Maximum resident set size (kbytes): 17360
>         ...
>  
>      To contrast with eppic script of v2 [2]:
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c   
>     makedumpfile Completed.
>         User time (seconds): 8.23
>         System time (seconds): 0.88
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
>         Maximum resident set size (kbytes): 57128
>         ...
>
>   -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
>   -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
>   -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
>   1) GPU types: Currently only tested with amdgpu's mm page filtering, others
>      are not tested.
>   2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>      rhel8.x is not supported, others are not tested.
>
> D) Testing:
>
>      If you don't want to create your vmcore, you can find a vmcore which I
>      created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
>      allocated by program [4]. You can use the vmcore in 1st kernel to filter
>      the amdgpu mm pages by the previous performance testing cmdline. To
>      verify the pages are filtered in crash:
>
>      Unfiltered:
>      crash> search -c "!QAZXSW@#EDC"
>      ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      crash> rd ffff96b7fa800000
>      ffff96b7fa800000:  405753585a415121                    !QAZXSW@
>      crash> rd ffff96b87c800000
>      ffff96b87c800000:  405753585a415121                    !QAZXSW@
>
>      Filtered:
>      crash> search -c "!QAZXSW@#EDC"
>      crash> rd ffff96b7fa800000
>      rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
>      crash> rd ffff96b87c800000
>      rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
>
> [1]: https://github.com/lucchouina/eppic/pull/32
> [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> [3]: https://people.redhat.com/~ltao/core/vmcore
> [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> v5 -> v4:
>
> 1) Add "make EXTENSION=on" switch to customize the extension feature.
> 2) Clean up macros within btf_info.h.
> 3) Updated doc and a sample extension to demo the extension usage.
> 4) Use MSG()/ERRMSG() rather than fprintf().
> 5) Add return value check for readmem().
> 6) Allow "makedumpfile -d 1 --extension ext.so" to enter extension.
> 7) The patches are organized as follows:
>
>     --- <customization specific> ---
>     9. Add amdgpu mm pages filtering extension
>
>     --- <code should be merged> ---
>     8. Doc: Add --extension option to makedumpfile manual
>     7. Add sample extension as an example reference
>     6. Add makedumpfile extensions support
>     5. Implement kernel module's btf resolving
>     4. Implement kernel module's kallsyms resolving
>     3. Implement kernel btf resolving
>     2. Implement kernel kallsyms resolving
>     1. Reserve sections for makedumpfile and extenions
>
>     Patch 9 is customization specific, merging depends on the strategy of
>     maintenance.
>     Patch 1 ~ 8 are common code which should be merged with makedumpfile.

Hi Tao,

Just to be thorough, I did re-read the code of each patch, in addition
to reviewing the diff from v4 to v5.

I've also re-based my userspace stack extension on this new version.
I think it is ready to go.

Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>

Thanks,
Stephen

> Link to v4: https://lore.kernel.org/kexec/20260317150743.69590-1-ltao@redhat.com/
> Link to v3: https://lore.kernel.org/kexec/20260120025500.25095-1-ltao@redhat.com/
> Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
>
> Tao Liu (9):
>   Reserve sections for makedumpfile and extenions
>   Implement kernel kallsyms resolving
>   Implement kernel btf resolving
>   Implement kernel module's kallsyms resolving
>   Implement kernel module's btf resolving
>   Add makedumpfile extensions support
>   Add sample extension as an example reference
>   Doc: Add --extension option to makedumpfile manual
>   Add amdgpu mm pages filtering extension
>
>  Makefile                   |  15 +-
>  README                     |   6 +
>  btf_info.c                 | 375 +++++++++++++++++++++++++
>  btf_info.h                 |  77 ++++++
>  extension.c                | 338 ++++++++++++++++++++++
>  extension.h                |  16 ++
>  extensions/Makefile        |  13 +
>  extensions/amdgpu_filter.c | 221 +++++++++++++++
>  extensions/maple_tree.c    | 328 ++++++++++++++++++++++
>  extensions/maple_tree.h    |   7 +
>  extensions/sample.c        |  69 +++++
>  kallsyms.c                 | 554 +++++++++++++++++++++++++++++++++++++
>  kallsyms.h                 |  87 ++++++
>  makedumpfile.8.in          |  11 +-
>  makedumpfile.c             |  44 ++-
>  makedumpfile.h             |  12 +
>  makedumpfile.ld            |  16 ++
>  17 files changed, 2180 insertions(+), 9 deletions(-)
>  create mode 100644 btf_info.c
>  create mode 100644 btf_info.h
>  create mode 100644 extension.c
>  create mode 100644 extension.h
>  create mode 100644 extensions/Makefile
>  create mode 100644 extensions/amdgpu_filter.c
>  create mode 100644 extensions/maple_tree.c
>  create mode 100644 extensions/maple_tree.h
>  create mode 100644 extensions/sample.c
>  create mode 100644 kallsyms.c
>  create mode 100644 kallsyms.h
>  create mode 100644 makedumpfile.ld
>
> -- 
> 2.47.0



* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-05-28 18:37 ` Stephen Brennan
@ 2026-05-28 22:02   ` Tao Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Tao Liu @ 2026-05-28 22:02 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

Thanks a lot for your code review; I really appreciate it! I have also
enjoyed working with you on the code discussion, design, and
implementation. Thanks again for your help on this! :)

Hi maintainers,

Just a kind ask: any comments on the v5 patchset? I'm happy to make
further modifications. Thanks!

Thanks,
Tao Liu

On Fri, May 29, 2026 at 6:38 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Tao Liu <ltao@redhat.com> writes:
> > A) This patchset will introduce the following features to makedumpfile:
> >
> >   1) Add .so extension support to makedumpfile
> >   2) Enable btf and kallsyms for symbol type and address resolving.
> >
> > B) The purpose of the features are:
> >
> >   1) Currently makedumpfile filters mm pages based on page flags, because flags
> >      can help to determine one page's usage. But this page-flag-checking method
> >      lacks of flexibility in certain cases, e.g. if we want to filter those mm
> >      pages occupied by GPU during vmcore dumping due to:
> >
> >      a) GPU may be taking a large memory and contains sensitive data;
> >      b) GPU mm pages have no relations to kernel crash and useless for vmcore
> >         analysis.
> >
> >      But there is no GPU mm page specific flags, and apparently we don't need
> >      to create one just for kdump use. A programmable filtering tool is more
> >      suitable for such cases. In addition, different GPU vendors may use
> >      different ways for mm pages allocating, programmable filtering is better
> >      than hard coding these GPU specific logics into makedumpfile in this case.
> >
> >   2) Currently makedumpfile already contains a programmable filtering tool, aka
> >      eppic script, which allows user to write customized code for data erasing.
> >      However it has the following drawbacks:
> >
> >      a) cannot do mm page filtering.
> >      b) need to access to debuginfo of both kernel and modules, which is not
> >         applicable in the 2nd kernel.
> >      c) eppic library has memory leaks which are not all resolved [1]. This
> >         is not acceptable in 2nd kernel.
> >
> >      makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
> >      types and addresses. In recent kernel there are dwarf alternatives such
> >      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
> >      are already packed within vmcore, so we can use it directly.
> >
> >   With these, this patchset introduces makedumpfile extensions, which is based
> >   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> >   The following section shows its usage and performance, please note the tests
> >   are performed in 1st kernel.
> >
> >   3) Compile and run makedumpfile extensions:
> >
> >   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
> >   $ make extensions
> >
> >   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >     /tmp/extension.out --extension amdgpu_filter.so
> >     Loaded extension: ./extensions/amdgpu_filter.so
> >     makedumpfile Completed.
> >         User time (seconds): 5.08
> >         System time (seconds): 0.84
> >         Percent of CPU this job got: 99%
> >         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.95
> >         Maximum resident set size (kbytes): 17360
> >         ...
> >
> >      To contrast with eppic script of v2 [2]:
> >
> >   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >     /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> >     makedumpfile Completed.
> >         User time (seconds): 8.23
> >         System time (seconds): 0.88
> >         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
> >         Maximum resident set size (kbytes): 57128
> >         ...
> >
> >   -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
> >   -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
> >   -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
> >
> > C) Discussion:
> >
> >   1) GPU types: Currently only tested with amdgpu's mm page filtering, others
> >      are not tested.
> >   2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
> >      rhel8.x is not supported, others are not tested.
> >
> > D) Testing:
> >
> >      If you don't want to create your vmcore, you can find a vmcore which I
> >      created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
> >      allocated by program [4]. You can use the vmcore in 1st kernel to filter
> >      the amdgpu mm pages by the previous performance testing cmdline. To
> >      verify the pages are filtered in crash:
> >
> >      Unfiltered:
> >      crash> search -c "!QAZXSW@#EDC"
> >      ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >      ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >      crash> rd ffff96b7fa800000
> >      ffff96b7fa800000:  405753585a415121                    !QAZXSW@
> >      crash> rd ffff96b87c800000
> >      ffff96b87c800000:  405753585a415121                    !QAZXSW@
> >
> >      Filtered:
> >      crash> search -c "!QAZXSW@#EDC"
> >      crash> rd ffff96b7fa800000
> >      rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
> >      crash> rd ffff96b87c800000
> >      rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
> >
> > [1]: https://github.com/lucchouina/eppic/pull/32
> > [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> > [3]: https://people.redhat.com/~ltao/core/vmcore
> > [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
> >
> > v5 -> v4:
> >
> > 1) Add "make EXTENSION=on" switch to customize the extension feature.
> > 2) Clean up macros within btf_info.h.
> > 3) Updated doc and a sample extension to demo the extension usage.
> > 4) Use MSG()/ERRMSG() rather than fprintf().
> > 5) Add return value check for readmem().
> > 6) Allow "makedumpfile -d 1 --extension ext.so" to enter extension.
> > 7) The patches are organized as follows:
> >
> >     --- <customization specific> ---
> >     9. Add amdgpu mm pages filtering extension
> >
> >     --- <code should be merged> ---
> >     8. Doc: Add --extension option to makedumpfile manual
> >     7. Add sample extension as an example reference
> >     6. Add makedumpfile extensions support
> >     5. Implement kernel module's btf resolving
> >     4. Implement kernel module's kallsyms resolving
> >     3. Implement kernel btf resolving
> >     2. Implement kernel kallsyms resolving
> >     1. Reserve sections for makedumpfile and extenions
> >
> >     Patch 9 is customization specific, merging depends on the strategy of
> >     maintenance.
> >     Patch 1 ~ 8 are common code which should be merged with makedumpfile.
>
> Hi Tao,
>
> Just to be thorough, I did re-read the code of each patch, in addition
> to reviewing the diff from v4 to v5.
>
> I've also re-based my userspace stack extension on this new version.
> I think it is ready to go.
>
> Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> Thanks,
> Stephen
>
> > Link to v4: https://lore.kernel.org/kexec/20260317150743.69590-1-ltao@redhat.com/
> > Link to v3: https://lore.kernel.org/kexec/20260120025500.25095-1-ltao@redhat.com/
> > Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> > Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
> >
> > Tao Liu (9):
> >   Reserve sections for makedumpfile and extenions
> >   Implement kernel kallsyms resolving
> >   Implement kernel btf resolving
> >   Implement kernel module's kallsyms resolving
> >   Implement kernel module's btf resolving
> >   Add makedumpfile extensions support
> >   Add sample extension as an example reference
> >   Doc: Add --extension option to makedumpfile manual
> >   Add amdgpu mm pages filtering extension
> >
> >  Makefile                   |  15 +-
> >  README                     |   6 +
> >  btf_info.c                 | 375 +++++++++++++++++++++++++
> >  btf_info.h                 |  77 ++++++
> >  extension.c                | 338 ++++++++++++++++++++++
> >  extension.h                |  16 ++
> >  extensions/Makefile        |  13 +
> >  extensions/amdgpu_filter.c | 221 +++++++++++++++
> >  extensions/maple_tree.c    | 328 ++++++++++++++++++++++
> >  extensions/maple_tree.h    |   7 +
> >  extensions/sample.c        |  69 +++++
> >  kallsyms.c                 | 554 +++++++++++++++++++++++++++++++++++++
> >  kallsyms.h                 |  87 ++++++
> >  makedumpfile.8.in          |  11 +-
> >  makedumpfile.c             |  44 ++-
> >  makedumpfile.h             |  12 +
> >  makedumpfile.ld            |  16 ++
> >  17 files changed, 2180 insertions(+), 9 deletions(-)
> >  create mode 100644 btf_info.c
> >  create mode 100644 btf_info.h
> >  create mode 100644 extension.c
> >  create mode 100644 extension.h
> >  create mode 100644 extensions/Makefile
> >  create mode 100644 extensions/amdgpu_filter.c
> >  create mode 100644 extensions/maple_tree.c
> >  create mode 100644 extensions/maple_tree.h
> >  create mode 100644 extensions/sample.c
> >  create mode 100644 kallsyms.c
> >  create mode 100644 kallsyms.h
> >  create mode 100644 makedumpfile.ld
> >
> > --
> > 2.47.0
>




* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (10 preceding siblings ...)
  2026-05-28 18:37 ` Stephen Brennan
@ 2026-05-29 21:11 ` Krister Johansen
  2026-06-01 23:12   ` Tao Liu
  11 siblings, 1 reply; 18+ messages in thread
From: Krister Johansen @ 2026-05-29 21:11 UTC (permalink / raw)
  To: Tao Liu; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda, stephen.s.brennan

On Tue, Apr 14, 2026 at 10:26:47PM +1200, Tao Liu wrote:
> A) This patchset will introduce the following features to makedumpfile:
> 
>   1) Add .so extension support to makedumpfile
>   2) Enable btf and kallsyms for symbol type and address resolving.
> 
> B) The purpose of the features are:
> 
>   1) Currently makedumpfile filters mm pages based on page flags, because flags
>      can help to determine one page's usage. But this page-flag-checking method
>      lacks of flexibility in certain cases, e.g. if we want to filter those mm
>      pages occupied by GPU during vmcore dumping due to:
> 
>      a) GPU may be taking a large memory and contains sensitive data;
>      b) GPU mm pages have no relations to kernel crash and useless for vmcore
>         analysis.
> 
>      But there is no GPU mm page specific flags, and apparently we don't need
>      to create one just for kdump use. A programmable filtering tool is more
>      suitable for such cases. In addition, different GPU vendors may use
>      different ways for mm pages allocating, programmable filtering is better
>      than hard coding these GPU specific logics into makedumpfile in this case.
> 
>   2) Currently makedumpfile already contains a programmable filtering tool, aka
>      eppic script, which allows user to write customized code for data erasing.
>      However it has the following drawbacks:
> 
>      a) cannot do mm page filtering.
>      b) need to access to debuginfo of both kernel and modules, which is not
>         applicable in the 2nd kernel.
>      c) eppic library has memory leaks which are not all resolved [1]. This
>         is not acceptable in 2nd kernel.
> 
>      makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
>      types and addresses. In recent kernel there are dwarf alternatives such
>      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>      are already packed within vmcore, so we can use it directly.
> 
>   With these, this patchset introduces makedumpfile extensions, which is based
>   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>   The following section shows its usage and performance, please note the tests
>   are performed in 1st kernel.
> 
>   3) Compile and run makedumpfile extensions:
> 
>   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
>   $ make extensions

I love this idea.  Do you have time to take it further, and if not are
you open to making the extension framework more modular so that we could
add others in the future?

Could the btf lookups be extended to cover the symbol lookups used by
eppic and the erase filters so that the -x option is unnecessary for
kernels that have BTF support?

The current extension implementation is focused just on skipping pages,
but it would be great to be able to use this to erase data in structures
like the config filters and eppic, but without having to provide a
vmlinux at dump time.  What do you think about adding the ability to
use the extensions to also erase parts of data structures, in addition
to filtering whole pages?

Would you be willing to modify the extension registration options to
allow an extension to specify what kind it is? That way, in the future
we could register multiple different kinds without breaking existing
ones.  One for filtering pages, one for erasing / modifying dump
content, and others based upon whatever additional use cases develop.
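
To make the registration idea concrete, here is a rough sketch of what a
kind-aware registry might look like. Every name in it (ext_kind, ext_desc,
ext_register, ext_registered, EXT_MAX_PER_KIND) is hypothetical and does not
come from the v5 patches; it only shows one kind being registered without
disturbing another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension kinds; none of these names exist in the patchset. */
enum ext_kind {
	EXT_KIND_PAGE_FILTER,	/* exclude whole pages from the dump */
	EXT_KIND_DATA_ERASE,	/* scrub fields inside pages that are kept */
	EXT_KIND_MAX
};

/* Hypothetical descriptor an extension would hand over when it is loaded. */
struct ext_desc {
	enum ext_kind kind;
	const char *name;
};

#define EXT_MAX_PER_KIND 4
static_assert(EXT_MAX_PER_KIND > 0, "table needs at least one slot per kind");

/* One fixed-size slot array per kind, so kinds never share storage. */
static const struct ext_desc *ext_table[EXT_KIND_MAX][EXT_MAX_PER_KIND];
static size_t ext_count[EXT_KIND_MAX];

/* Register under the declared kind; false on bad input or a full table. */
bool ext_register(const struct ext_desc *desc)
{
	unsigned int kind;

	if (!desc)
		return false;
	kind = (unsigned int)desc->kind;
	if (kind >= (unsigned int)EXT_KIND_MAX ||
	    ext_count[kind] >= EXT_MAX_PER_KIND)
		return false;
	ext_table[kind][ext_count[kind]++] = desc;
	return true;
}

/* Number of extensions registered for a kind (0 for an unknown kind). */
size_t ext_registered(enum ext_kind kind)
{
	unsigned int k = (unsigned int)kind;

	return k < (unsigned int)EXT_KIND_MAX ? ext_count[k] : 0;
}
```

With something along these lines, a page-filtering extension and a
data-erasing extension would each call ext_register() with their own kind,
and a new kind could later be added by extending the enum without touching
existing extensions.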

Thanks,

-K



* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-05-29 21:11 ` Krister Johansen
@ 2026-06-01 23:12   ` Tao Liu
  2026-06-02  0:47     ` Krister Johansen
  0 siblings, 1 reply; 18+ messages in thread
From: Tao Liu @ 2026-06-01 23:12 UTC (permalink / raw)
  To: Krister Johansen
  Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda, stephen.s.brennan

Hi Krister,

Thanks a lot for your suggestions and comments!

On Sat, May 30, 2026 at 9:11 AM Krister Johansen
<kjlx@templeofstupid.com> wrote:
>
> On Tue, Apr 14, 2026 at 10:26:47PM +1200, Tao Liu wrote:
> > A) This patchset will introduce the following features to makedumpfile:
> >
> >   1) Add .so extension support to makedumpfile
> >   2) Enable btf and kallsyms for symbol type and address resolving.
> >
> > B) The purpose of the features are:
> >
> >   1) Currently makedumpfile filters mm pages based on page flags, because flags
> >      can help to determine one page's usage. But this page-flag-checking method
> >      lacks of flexibility in certain cases, e.g. if we want to filter those mm
> >      pages occupied by GPU during vmcore dumping due to:
> >
> >      a) GPU may be taking a large memory and contains sensitive data;
> >      b) GPU mm pages have no relations to kernel crash and useless for vmcore
> >         analysis.
> >
> >      But there is no GPU mm page specific flags, and apparently we don't need
> >      to create one just for kdump use. A programmable filtering tool is more
> >      suitable for such cases. In addition, different GPU vendors may use
> >      different ways for mm pages allocating, programmable filtering is better
> >      than hard coding these GPU specific logics into makedumpfile in this case.
> >
> >   2) Currently makedumpfile already contains a programmable filtering tool, aka
> >      eppic script, which allows user to write customized code for data erasing.
> >      However it has the following drawbacks:
> >
> >      a) cannot do mm page filtering.
> >      b) need to access to debuginfo of both kernel and modules, which is not
> >         applicable in the 2nd kernel.
> >      c) eppic library has memory leaks which are not all resolved [1]. This
> >         is not acceptable in 2nd kernel.
> >
> >      makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
> >      types and addresses. In recent kernel there are dwarf alternatives such
> >      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
> >      are already packed within vmcore, so we can use it directly.
> >
> >   With these, this patchset introduces makedumpfile extensions, which is based
> >   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> >   The following section shows its usage and performance, please note the tests
> >   are performed in 1st kernel.
> >
> >   3) Compile and run makedumpfile extensions:
> >
> >   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on EXTENSION=on
> >   $ make extensions
>
> I love this idea.  Do you have time to take it further, and if not are
> you open to making the extension framework more modular so that we could
> add others in the future?

The purpose of extensions is to make the framework modular. My original
thought was that we can implement several makedumpfile extensions, each
restricted to one specific function: one extension deals with AMD
GPU mm filtering only, another with Intel GPU only, and so on. Distros
can ship all extensions along with makedumpfile, but each
extension will only take effect if the machine has the matching AMD or
Intel GPU. The same applies if you'd like to add other customized
functions while the makedumpfile core remains unchanged.

>
> Could the btf lookups be extended to cover the symbol lookups used by
> eppic and the erase filters so that the -x option is unnecessary for
> kernels that have BTF support?

Yes, from my view it is doable and not difficult to implement.

>
> The current extension implementation is focused just on skipping pages,
> but it would be great to be able to use this to erase data in structures
> like the config filters and eppic, but without having to provide a
> vmlinux at dump time.  What do you think about adding the ability to
> use the extensions to also erase parts of data structures, in addition
> to filtering whole pages?

That's step 2 of the BTF/kallsyms work for makedumpfile, and I have
planned to work on it once this patchset (step 1) is accepted. The
reason for splitting the work is that, in my view, GPU mm page filtering
is more urgent than data erasing. For data erasing, we can at least do
the erasing in the 1st kernel with the help of dwarf, which is cumbersome
but works; for GPU mm filtering, as far as I know, there are no
handy tools in the 2nd kernel.

I think data erasing can be built on top of the current page filtering code.

>
> Would you be willing to modify the extension registration options to
> allow an extension to specify what kind it is? That way, in the future

I'm not sure what you mean by "what kind". Do you mean an extension
needs to tell makedumpfile what purpose it is for when loading?

> we could register multiple different kinds without breaking existing
> ones.  One for filtering pages, one for erasing / modifying dump
> content, and others based upon whatever additional use cases develop.

That's the goal of extensions, each extension deals with its own
business. Could you point out the code that doesn't match the goal?
I'm happy to correct it in v6.

Thanks,
Tao Liu


>
> Thanks,
>
> -K
>




* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-06-01 23:12   ` Tao Liu
@ 2026-06-02  0:47     ` Krister Johansen
  2026-06-02  3:04       ` Tao Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Krister Johansen @ 2026-06-02  0:47 UTC (permalink / raw)
  To: Tao Liu
  Cc: Krister Johansen, yamazaki-msmt, k-hagio-ab, kexec, aravinda,
	stephen.s.brennan

Hi Tao,
Thanks for the response! I've put the followups below.

On Tue, Jun 02, 2026 at 11:12:05AM +1200, Tao Liu wrote:
> On Sat, May 30, 2026 at 9:11 AM Krister Johansen
> <kjlx@templeofstupid.com> wrote:
> >
> > I love this idea.  Do you have time to take it further, and if not are
> > you open to making the extension framework more modular so that we could
> > add others in the future?
> 
> The purpose of extension is to make the framework modular. My original
> thought is, we can implement several makedumpfile extensions, each
> restricted to one specific function. Like one extension deals with AMD
> gpu mm filtering only, one deals with Intel gpu only etc. For distros
> we can ship all extensions along with makedumpfile once, but the
> respective extensions will only take effect if the machine has AMD /
> Intel gpu. This is the same case if you'd like to add other customized
> functions while the makedumpfile core remains unchanged.

Makes sense.

> > Could the btf lookups be extended to cover the symbol lookups used by
> > eppic and the erase filters so that the -x option is unnecessary for
> > kernels that have BTF support?
> 
> Yes, from my view it is doable and not difficult to implement.

In some environments, the size of the vmlinux + modules can be fairly
substantial to leave on disk.  It's attractive to have the option to
omit it and still filter dumps.

> > The current extension implementation is focused just on skipping pages,
> > but it would be great to be able to use this to erase data in structures
> > like the config filters and eppic, but without having to provide a
> > vmlinux at dump time.  What do you think about adding the ability to
> > use the extensions to also erase parts of data structures, in addition
> > to filtering whole pages?
> 
> That's the step 2 for the BTF/kallsyms work of makedumpfile, and I
> > have planned to work on this once the patchset (step 1) is accepted. The
> reason for the task dividing is, the GPU mm page filtering is more
> urgent than data erasing from my view. For data erasing, at least we
> can do the erasing in 1st kernel with the help of dwarf, cumbersome
> but working; For GPU mm filtering, as far as I know, there are no
> handy tools in 2nd kernel.

Excited to hear that you have something already planned for erasing.  My
apologies if I missed a more comprehensive write-up about the longer
term goals for the work.

> I think erasing the data is doable upon the current page filtering code.

I wondered about this, but for data-structures that are smaller than a
page, wouldn't that mean that we're erasing other content?  The "erase"
plugins memset the output data to a chosen value (or 0), whereas the
filtering just drops the page.  Couldn't this also lead to a situation
where the debugger can't find the page at all, versus giving us one
that's sanitized?  (I do understand why you want to drop the pages for
the GPU cases)

> > Would you be willing to modify the extension registration options to
> > allow an extension to specify what kind it is? That way, in the future
> 
> I'm not sure what you mean by "what kind". Do you mean an extension
> needs to tell makedumpfile what purpose it is for when loading?

Yes, sorry I wasn't clear in writing the question.  Stating this
differently, if we want to allow different extensions to
do different things, how do the extensions declare to makedumpfile what
they can do, so that it knows where to invoke their callbacks, and which
of their callbacks to invoke?

Looking at patch 6/9, right now run_extension_callback() is invoked
from __exclude_unnecessary_pages() and always calls the
"extension_callback" symbol in the module.  This makes sense for a
single extension type that's focused on filtering pages.  However, if we
wanted to have multiple different extension types, this might be more
difficult.

If we could determine what type of functionality the module implements
in load_extensions, then we could tell if this is a page filtering
extension, an erase extension, or some other kind of extension.

For example, for an erase filter, perhaps we would want two callbacks:
one to set up the ranges to filter, "extension_gather_callback", and
another to actually check the address range to see if it is filtered,
"extension_filter_data_callback".
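
As a rough illustration of the two-callback split described above, here
is a minimal sketch.  All names and signatures in it (gather_add,
filter_data, struct erase_range) are hypothetical and are not part of
the v5 patchset; it only shows one way a "gather" step could record
ranges and a "filter data" step could zero whatever part of an output
buffer overlaps them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one physical address range that should be erased. */
struct erase_range {
	unsigned long long start;
	unsigned long long len;
};

#define MAX_RANGES 16
static struct erase_range ranges[MAX_RANGES];
static int nr_ranges;

/* "gather" step: record a range to erase later. Returns -1 when full. */
static int gather_add(unsigned long long start, unsigned long long len)
{
	if (nr_ranges == MAX_RANGES)
		return -1;
	ranges[nr_ranges].start = start;
	ranges[nr_ranges].len = len;
	nr_ranges++;
	return 0;
}

/*
 * "filter data" step: buf holds the bytes for [paddr, paddr + len).
 * Zero every part of buf that overlaps a gathered range and return the
 * number of bytes zeroed (overlapping ranges are counted once each).
 */
static size_t filter_data(unsigned char *buf, unsigned long long paddr,
			  size_t len)
{
	size_t erased = 0;

	assert(buf != NULL);
	for (int i = 0; i < nr_ranges; i++) {
		unsigned long long s = ranges[i].start;
		unsigned long long e = ranges[i].start + ranges[i].len;

		/* Clamp the range to the buffer's address window. */
		if (s < paddr)
			s = paddr;
		if (e > paddr + len)
			e = paddr + len;
		if (s < e) {
			memset(buf + (s - paddr), 0, (size_t)(e - s));
			erased += (size_t)(e - s);
		}
	}
	return erased;
}
```

Unlike dropping a page, this leaves the rest of the buffer intact, so
only the bytes inside a gathered range are sanitized.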

I'm not sure about the names.  "extension_callback" seems generic, but
this has a specific purpose.  It's really an "extension_filter_page_callback".

I may be overengineering this a bit, but having makedumpfile pass an ops
vector to the extension in a load function could help here.  Then the
module's load function fills out the vector with the functions it
supports.  Depending on what's implemented, these can be placed into
different callback lists to get invoked at different points in the
program (e.g. one at pfn filter time, another in filter_data_buffer,
etc).
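
To make the ops-vector idea a bit more concrete, here is a minimal
sketch.  Everything in it is an assumption for illustration: struct
ext_ops, register_ops(), sample_ext_load() and the hook arrays do not
exist in the v5 patchset.  The extension's load function fills in only
the members it supports, and makedumpfile sorts the non-NULL members
into per-hook-point callback lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops vector handed to each extension's load function. */
struct ext_ops {
	/* Nonzero return means "exclude this page". */
	int (*filter_page)(unsigned long long pfn);
	/* Called on data buffers; unused by page-only extensions. */
	int (*filter_data)(unsigned long long paddr, size_t len);
};

#define MAX_HOOKS 8
static int (*page_hooks[MAX_HOOKS])(unsigned long long pfn);
static int nr_page_hooks;
static int (*data_hooks[MAX_HOOKS])(unsigned long long paddr, size_t len);
static int nr_data_hooks;

/* Place each implemented callback on the list for its hook point. */
static int register_ops(const struct ext_ops *ops)
{
	assert(ops != NULL);
	if ((ops->filter_page && nr_page_hooks == MAX_HOOKS) ||
	    (ops->filter_data && nr_data_hooks == MAX_HOOKS))
		return -1;
	if (ops->filter_page)
		page_hooks[nr_page_hooks++] = ops->filter_page;
	if (ops->filter_data)
		data_hooks[nr_data_hooks++] = ops->filter_data;
	return 0;
}

/* Hook point used at pfn filter time: any callback may exclude the page. */
static int page_is_filtered(unsigned long long pfn)
{
	for (int i = 0; i < nr_page_hooks; i++)
		if (page_hooks[i](pfn))
			return 1;
	return 0;
}

/* Toy extension: implements page filtering only (drops odd pfns). */
static int drop_odd_pfn(unsigned long long pfn)
{
	return (int)(pfn & 1);
}

static void sample_ext_load(struct ext_ops *ops)
{
	ops->filter_page = drop_odd_pfn;
}
```

With this shape, a page-only extension never touches filter_data, so
the data hook list stays empty for it.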

It sounds like you had a plan here, though.  Were you thinking of adding
new extension types a different way?

> > we could register multiple different kinds without breaking existing
> > ones.  One for filtering pages, one for erasing / modifying dump
> > content, and others based upon whatever additional use cases develop.
> 
> That's the goal of extensions, each extension deals with its own
> business. Could you point out the code that doesn't match the goal?
> I'm happy to correct it in v6.

Yes, I attempted to elaborate on this in the preceding paragraphs.
Basically wondering how we can add new extension functionality without
breaking existing extensions, and then get the code to invoke the right
one if there are multiple types that need to be used at different times.

Thanks,

-K



* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-06-02  0:47     ` Krister Johansen
@ 2026-06-02  3:04       ` Tao Liu
  2026-06-02  4:49         ` Krister Johansen
  0 siblings, 1 reply; 18+ messages in thread
From: Tao Liu @ 2026-06-02  3:04 UTC (permalink / raw)
  To: Krister Johansen
  Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda, stephen.s.brennan

Hi Krister,

On Tue, Jun 2, 2026 at 12:47 PM Krister Johansen
<kjlx@templeofstupid.com> wrote:
>
> Hi Tao,
> Thanks for the response! I've put the followups below.

Thanks for your in-depth explanation, it's very helpful to me for
designing the data erasing function.

>
> On Tue, Jun 02, 2026 at 11:12:05AM +1200, Tao Liu wrote:
> > On Sat, May 30, 2026 at 9:11 AM Krister Johansen
> > <kjlx@templeofstupid.com> wrote:
> > >
> > > I love this idea.  Do you have time to take it further, and if not are
> > > you open to making the extension framework more modular so that we could
> > > add others in the future?
> >
> > The purpose of extension is to make the framework modular. My original
> > thought is, we can implement several makedumpfile extensions, each
> > restricted to one specific function. Like one extension deals with AMD
> > gpu mm filtering only, one deals with Intel gpu only etc. For distros
> > we can ship all extensions along with makedumpfile once, but the
> > respective extensions will only take effect if the machine has AMD /
> > Intel gpu. This is the same case if you'd like to add other customized
> > functions while the makedumpfile core remains unchanged.
>
> Makes sense.
>
> > > Could the btf lookups be extended to cover the symbol lookups used by
> > > eppic and the erase filters so that the -x option is unnecessary for
> > > kernels that have BTF support?
> >
> > Yes, from my view it is doable and not difficult to implement.
>
> In some environments, the size of the vmlinux + modules can be fairly
> substantial to leave on disk.  It's attractive to have the option to
> omit it and still filter dumps.

Yes, I totally agree.

>
> > > The current extension implementation is focused just on skipping pages,
> > > but it would be great to be able to use this to erase data in structures
> > > like the config filters and eppic, but without having to provide a
> > > vmlinux at dump time.  What do you think about adding the ability to
> > > use the extensions to also erase parts of data structures, in addition
> > > to filtering whole pages?
> >
> > That's the step 2 for the BTF/kallsyms work of makedumpfile, and I
> > > have planned to work on this once the patchset (step 1) is accepted. The
> > reason for the task dividing is, the GPU mm page filtering is more
> > urgent than data erasing from my view. For data erasing, at least we
> > can do the erasing in 1st kernel with the help of dwarf, cumbersome
> > but working; For GPU mm filtering, as far as I know, there are no
> > handy tools in 2nd kernel.
>
> Excited to hear that you have something already planned for erasing.  My
> apologies if I missed a more comprehensive write-up about the longer
> term goals for the work.

No worries, I didn't post the goals upstream; I only had internal
discussions within my team regarding the next steps for BTF/kallsyms
in makedumpfile.

>
> > I think erasing the data is doable upon the current page filtering code.
>
> I wondered about this, but for data-structures that are smaller than a
> page, wouldn't that mean that we're erasing other content?  The "erase"
> plugins memset the output data to a chosen value (or 0), whereas the
> filtering just drops the page.  Couldn't this also lead to a situation
> where the debugger can't find the page at all, versus giving us one
> that's sanitized?  (I do understand why you want to drop the pages for
> the GPU cases)

Frankly, I didn't consider data erasing in as much depth as you did. I
think you are right: makedumpfile needs to know which extensions
handle data erasing and which handle mm page filtering. I guess the mm
page filtering extensions will need to perform a "dry-run" filter
first, in case the "data erasing" extensions break any useful data
structure. In this step, the "dry-run" only records the pfns of the
pages that will be filtered. Then the "data erasing" extensions are
called, so all the sensitive data is memset to 0. Finally, all desired
pages are filtered out based on the previous recording.

With this, "data erase" and "page filtering" will not interfere with
each other. What do you think?
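
The three steps above can be sketched as follows.  This is only a toy
model under stated assumptions: the in-memory "mem" array, the tiny
page size and the function names (dry_run_filter, erase_range,
write_pages) are hypothetical and do not correspond to existing
makedumpfile code.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SZ  16	/* toy page size, for illustration only */
#define NR_PAGES 4

static unsigned char mem[NR_PAGES][PAGE_SZ];	/* stand-in for the vmcore */
static unsigned char dropped[NR_PAGES];		/* 1 = pfn recorded by dry-run */

/* Step 1: dry-run. Only record which pfns would be filtered. */
static void dry_run_filter(int (*should_drop)(unsigned int pfn))
{
	assert(should_drop != NULL);
	for (unsigned int pfn = 0; pfn < NR_PAGES; pfn++)
		dropped[pfn] = should_drop(pfn) ? 1 : 0;
}

/* Step 2: erase a byte range that lies within a single page. */
static int erase_range(unsigned int pfn, unsigned int off, unsigned int len)
{
	if (pfn >= NR_PAGES || off > PAGE_SZ || len > PAGE_SZ - off)
		return -1;
	memset(&mem[pfn][off], 0, len);
	return 0;
}

/* Step 3: emit every page that the dry-run did not record. */
static unsigned int write_pages(unsigned char out[][PAGE_SZ])
{
	unsigned int n = 0;

	for (unsigned int pfn = 0; pfn < NR_PAGES; pfn++) {
		if (dropped[pfn])
			continue;
		memcpy(out[n++], mem[pfn], PAGE_SZ);
	}
	return n;
}

/* Toy page filter: pretend pfn 2 belongs to the GPU. */
static int drop_pfn_2(unsigned int pfn)
{
	return pfn == 2;
}
```

Because the pfn list is fixed before any data is erased, the erase step
cannot change which pages get dropped.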

>
> > > Would you be willing to modify the extension registration options to
> > > allow an extension to specify what kind it is? That way, in the future
> >
> > I'm not sure what you mean by "what kind". Do you mean an extension
> > needs to tell makedumpfile what purpose it is for when loading?
>
> Yes, sorry I wasn't clear in writing the question.  Stating this
> differently, if we want to allow the ability for different extensions to
> do different things, how do the extensions declare to makedumpfile what
> they can do, so that it knows where to invoke their callbacks, and what
> callbacks of theirs to invoke.
>
> Looking at patch 6/9, right now run_extension_callback() is invoked
> from __exclude_unnecessary_pages() and always calls the
> "extension_callback" symbol in the module.  This makes sense for a
> single extension type that's focused on filtering pages.  However, if we
> wanted to have multiple different extensions, this might be more
> difficult.
>
> If we could determine what type of functionality the module implements
> in load_extensions, then we could tell if this is a page filtering
> extension, an erase extension, or some other kind of extension.
>
> For example, for an erase filter, perhaps we would want two callbacks:
> one to set up the ranges to filter "extension_gather_callback" and
> another to actually check the address range to see if it is filtered,
> "extension_filter_data_callback"
>
> I'm not sure about the names.  "extension_callback" seems generic, but
> this has a specific purpose.  It's a "extension_filter_page_callback"
>
> I may be overengineering this a bit, but having makedumpfile pass an ops
> vector to the extension in a load function could help here.  Then the
> module's load function fills out the vector with the functions it
> supports.  Depending on what's implemented, these can be placed into
> different callback lists to get invoked at different points in the
> program (e.g. one at pfn filter time, another in filter_data_buffer,
> etc).
>
> It sounds like you had a plan here, though.  Were you thinking of adding
> new extension types a different way?

I see your idea: makedumpfile predefines a few hook points at
different stages, and extensions can register their callbacks at these
hook points. For now I think 2 hook points are enough: one for page
filtering and the other for data erasing, which definitely
shouldn't be within __exclude_unnecessary_pages().

I'm willing to modify the code, for example by implementing hook point
registration/management. But since I haven't worked on the data erasing
functions so far, the design might be superficial. Personally, I'd
prefer to do this along with the data erasing functions in the next
independent patchset, considering that the current patchset already
includes plenty of code/function implementations. @maintainers, what's
your opinion?

>
> > > we could register multiple different kinds without breaking existing
> > > ones.  One for filtering pages, one for erasing / modifying dump
> > > content, and others based upon whatever additional use cases develop.
> >
> > That's the goal of extensions, each extension deals with its own
> > business. Could you point out the code that doesn't match the goal?
> > I'm happy to correct it in v6.
>
> Yes, I attempted to elaborate on this in the preceding paragraphs.
> Basically wondering how we can add new extension functionality without
> breaking existing extensions, and then get the code to invoke the right
> if there are multiple types that need to be used at different times.

Agreed.

Thanks,
Tao Liu

>
> Thanks,
>
> -K
>




* Re: [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-06-02  3:04       ` Tao Liu
@ 2026-06-02  4:49         ` Krister Johansen
  0 siblings, 0 replies; 18+ messages in thread
From: Krister Johansen @ 2026-06-02  4:49 UTC (permalink / raw)
  To: Tao Liu; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda, stephen.s.brennan

Hi Tao,

On Tue, Jun 02, 2026 at 03:04:12PM +1200, Tao Liu wrote:
> On Tue, Jun 2, 2026 at 12:47 PM Krister Johansen
> <kjlx@templeofstupid.com> wrote:
> Thanks for your in-depth explanation, it's very helpful to me for
> designing the data erasing function.

Thanks for the great discussion.

> > On Tue, Jun 02, 2026 at 11:12:05AM +1200, Tao Liu wrote:
> > > On Sat, May 30, 2026 at 9:11 AM Krister Johansen
> > I wondered about this, but for data-structures that are smaller than a
> > page, wouldn't that mean that we're erasing other content?  The "erase"
> > plugins memset the output data to a chosen value (or 0), whereas the
> > filtering just drops the page.  Couldn't this also lead to a situation
> > where the debugger can't find the page at all, versus giving us one
> > that's sanitized?  (I do understand why you want to drop the pages for
> > the GPU cases)
> 
> Frankly I didn't consider the data erasing as in-depth as you did. I
> think you are right, makedumpfile needs to know which extensions
> handle data erasing and which handle mm page filtering. I guess the mm
> page filtering extensions will need to perform a "dry-run" filter
> first, in case the "data erasing" extensions break any useful data
> structure. In this step, "dry-run" will only record pfn numbers of the
> pages that will be filtered. Then "data erasing" extensions are
> called, so all the sensitive data is memset to 0. Finally, all desired
> pages are filtered out based on the previous recording.
> 
> With this, "data erase" and "page filtering" will not interfere with
> each other. What do you think?

This is a great point.  It's probably worth documenting the precedence
order in which these callbacks are expected to be applied.  Naively, I
might expect filtering pages to take precedence over erasing data
structures.  For the GPU cases, these are orthogonal.  However, for
something where a user might be both trying to filter the page and erase
matching content, we don't have any rules defined.  It's probably less
surprising to allow pages to be filtered first.  (I think it is this way
in the code.) It also prevents the page filtering from completely
filtering a page.

> > > > Would you be willing to modify the extension registration options to
> > > > allow an extension to specify what kind it is? That way, in the future
> > >
> > > I'm not sure what you mean by "what kind". Do you mean an extension
> > > needs to tell makedumpfile what purpose it is for when loading?
> >
> > Yes, sorry I wasn't clear in writing the question.  Stating this
> > differently, if we want to allow the ability for different extensions to
> > do different things, how do the extensions declare to makedumpfile what
> > they can do, so that it knows where to invoke their callbacks, and what
> > callbacks of theirs to invoke.
> >
> > Looking at patch 6/9, right now run_extension_callback() is invoked
> > from __exclude_unnecessary_pages() and always calls the
> > "extension_callback" symbol in the module.  This makes sense for a
> > single extension type that's focused on filtering pages.  However, if we
> > wanted to have multiple different extensions, this might be more
> > difficult.
> >
> > If we could determine what type of functionality the module implements
> > in load_extensions, then we could tell if this is a page filtering
> > extension, an erase extension, or some other kind of extension.
> >
> > For example, for an erase filter, perhaps we would want two callbacks:
> > one to set up the ranges to filter "extension_gather_callback" and
> > another to actually check the address range to see if it is filtered,
> > "extension_filter_data_callback"
> >
> > I'm not sure about the names.  "extension_callback" seems generic, but
> > this has a specific purpose.  It's a "extension_filter_page_callback"
> >
> > I may be overengineering this a bit, but having makedumpfile pass an ops
> > vector to the extension in a load function could help here.  Then the
> > module's load function fills out the vector with the functions it
> > supports.  Depending on what's implemented, these can be placed into
> > different callback lists to get invoked at different points in the
> > program (e.g. one at pfn filter time, another in filter_data_buffer,
> > etc).
> >
> > It sounds like you had a plan here, though.  Were you thinking of adding
> > new extension types a different way?
> 
> I see your idea: makedumpfile predefines a few hook points at
> different stages, and extensions can register their callbacks to these
> hook points. For now I think 2 hook points are enough, one for page
> filtering and the other one for registering the data erasing, which definitely
> shouldn't be within __exclude_unnecessary_pages().
> 
> I'm willing to modify the code. Such as implementing a hooking point
> registration/management. But since I haven't work on the data erasing
> functions so far, the design might be superficial, personally I'd
> prefer to do this along with the data erasing functions in the next
> independent patchset, considering current patchset we already includes
> plenty of code/function implementations. @maintainers, What's your
> opinion?

Just to clarify, I'm not asking that you implement any erase
functionality in the current patchset.  Rather, asking if there's a way
to implement the current functionality such that the extension modules
won't need recompilation when a new extension type is introduced. I
think there are a number of different ways to do this, but I didn't want
to be overly prescriptive in my feedback.
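
As one possible illustration (not a proposal for the actual interface),
a size-tagged ops structure lets the host accept extensions built
against an older layout.  The struct names, the "size" member and
import_ops() below are all hypothetical.  The assumption is that new
members are only ever appended, so an old extension's structure is a
prefix of the newer one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout an older extension was compiled against (hypothetical). */
struct ext_ops_v1 {
	size_t size;	/* sizeof the struct the extension was built with */
	int (*filter_page)(unsigned long long pfn);
};

/* Newer host layout: members are only appended, never reordered. */
struct ext_ops_v2 {
	size_t size;
	int (*filter_page)(unsigned long long pfn);
	int (*filter_data)(unsigned long long paddr, size_t len);
};

/*
 * Host side: copy only as many bytes as the extension declared.
 * Members the extension does not know about stay NULL.
 */
static int import_ops(struct ext_ops_v2 *dst, const void *src)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	memcpy(&n, src, sizeof(n));		/* leading size field */
	if (n < sizeof(struct ext_ops_v1))
		return -1;			/* too small to be valid */
	if (n > sizeof(*dst))
		n = sizeof(*dst);		/* newer than the host: truncate */
	*dst = (struct ext_ops_v2){0};		/* all callbacks start as NULL */
	memcpy(dst, src, n);
	dst->size = n;
	return 0;
}

/* Callback belonging to an "old" page-filtering extension. */
static int old_ext_filter(unsigned long long pfn)
{
	return pfn == 7;
}
```

An extension built against the v1 layout keeps working with a host that
knows v2; the host simply sees filter_data as NULL and skips that hook.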

Thanks again,

-K



end of thread, other threads:[~2026-06-02  4:49 UTC | newest]

Thread overview: 18+ messages
2026-04-14 10:26 [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 1/9] Reserve sections for makedumpfile and extenions Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 2/9] Implement kernel kallsyms resolving Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 3/9] Implement kernel btf resolving Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 4/9] Implement kernel module's kallsyms resolving Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 5/9] Implement kernel module's btf resolving Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 6/9] Add makedumpfile extensions support Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 7/9] Add sample extension as an example reference Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 8/9] Doc: Add --extension option to makedumpfile manual Tao Liu
2026-04-14 10:26 ` [PATCH v5][makedumpfile 9/9] Add amdgpu mm pages filtering extension Tao Liu
2026-05-20  4:55 ` [PATCH v5][makedumpfile 0/9] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
2026-05-28 18:37 ` Stephen Brennan
2026-05-28 22:02   ` Tao Liu
2026-05-29 21:11 ` Krister Johansen
2026-06-01 23:12   ` Tao Liu
2026-06-02  0:47     ` Krister Johansen
2026-06-02  3:04       ` Tao Liu
2026-06-02  4:49         ` Krister Johansen
