[PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering


* [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering
@ 2026-01-20  2:54 Tao Liu
  2026-01-20  2:54 ` [PATCH v3 1/8] Implement kernel kallsyms resolving Tao Liu
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

A) This patchset will introduce the following features to makedumpfile:

  1) Add .so extension support to makedumpfile
  2) Enable btf and kallsyms for symbol type and address resolving.

B) The purpose of these features is:

  1) Currently makedumpfile filters mm pages based on page flags, because the
     flags help to determine how a page is used. But this page-flag-checking
     method lacks flexibility in certain cases, e.g. if we want to filter out
     the mm pages occupied by a GPU during vmcore dumping because:

     a) The GPU may occupy a large amount of memory that contains sensitive
        data;
     b) GPU mm pages are unrelated to the kernel crash and useless for vmcore
        analysis.

     There are no GPU-specific mm page flags, and we don't need to create one
     just for kdump use. A programmable filtering tool is more suitable for
     such cases. In addition, different GPU vendors may allocate mm pages in
     different ways, so programmable filtering is better than hard-coding
     GPU-specific logic into makedumpfile.

  2) makedumpfile already contains a programmable filtering tool, the eppic
     script, which allows users to write customized code for data erasing.
     However, it has the following drawbacks:

     a) It cannot do mm page filtering.
     b) It needs access to the debuginfo of both the kernel and the modules,
        which is not practical in the 2nd kernel.
     c) The eppic library has memory leaks which are not all resolved [1].
        This is not acceptable in the 2nd kernel.

     makedumpfile needs to parse the dwarf data from debuginfo to get symbol
     types and addresses. Recent kernels provide dwarf alternatives such as
     btf/kallsyms which can be used for this purpose. And the btf/kallsyms
     info is already packed within the vmcore, so we can use it directly.

  With these, this patchset introduces makedumpfile extensions, which are based
  on btf/kallsyms symbol resolving and are programmable for mm page filtering.
  The following section shows their usage and performance. Please note that the
  tests were performed in the 1st kernel.

  3) Compile and run makedumpfile extensions:

  $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on
  $ make extensions
  
  $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
    /tmp/extension.out
    Loaded extension: ./extensions/amdgpu_filter.so
    makedumpfile Completed.
        User time (seconds): 6.37
        System time (seconds): 0.70
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.10
        Maximum resident set size (kbytes): 38024
        ...
 
     For comparison, the eppic script of v2 [2]:

  $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
    /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c   
    makedumpfile Completed.
        User time (seconds): 8.23
        System time (seconds): 0.88
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
        Maximum resident set size (kbytes): 57128
        ...

  -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
  -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
  -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
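
  A rough reading of the figures above (plain arithmetic on the quoted
  numbers, no additional measurements): the extension run needs about 22%
  less wall-clock time and about 33% less peak RSS than the eppic run, and
  both outputs are about 5% smaller than the input vmcore. A minimal sketch
  of the calculation follows; percent_saved() and print_savings() are
  hypothetical helpers used only for illustration, not part of the patchset.

```c
#include <assert.h>
#include <stdio.h>

/* Relative saving of `after` compared with `before`, in percent. */
double percent_saved(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Print the savings for the numbers quoted in the cover letter. */
void print_savings(void)
{
	/* eppic run vs. extension run */
	printf("wall clock: %.1f%%\n", percent_saved(9.16, 7.10));
	printf("peak RSS:   %.1f%%\n", percent_saved(57128.0, 38024.0));
	/* input vmcore vs. filtered output (same size for both runs) */
	printf("file size:  %.1f%%\n",
	       percent_saved(387181418.0, 367475074.0));
}
```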

C) Discussion:

  1) GPU types: Currently only tested with amdgpu's mm page filtering, others
     are not tested.
  2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
     Others are not tested.

D) Testing:

     If you don't want to create your own vmcore, you can use one which I
     created with the amdgpu mm pages unfiltered [3]; the amdgpu mm pages were
     allocated by program [4]. You can use the vmcore in the 1st kernel to
     filter the amdgpu mm pages with the command line from the performance
     test above. To verify in crash that the pages are filtered:

     Unfiltered:
     crash> search -c "!QAZXSW@#EDC"
     ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     crash> rd ffff96b7fa800000
     ffff96b7fa800000:  405753585a415121                    !QAZXSW@
     crash> rd ffff96b87c800000
     ffff96b87c800000:  405753585a415121                    !QAZXSW@

     Filtered:
     crash> search -c "!QAZXSW@#EDC"
     crash> rd ffff96b7fa800000
     rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
     crash> rd ffff96b87c800000
     rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"

[1]: https://github.com/lucchouina/eppic/pull/32
[2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
[3]: https://people.redhat.com/~ltao/core/vmcore
[4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df

v2 -> v3:

1) Removed btf/kallsyms support for eppic script, and introduced
   makedumpfile .so extension instead. The reasons for removing eppic
   support are:
   a) Native binary code as .so has better performance than scripting,
      see the time consumption comparison above.
   b) The eppic library has memory leaks which haven't been fully fixed,
      and memory leaks in the 2nd kernel might be fatal.

2) Removed the code for manually parsing btf info, and used libbpf for
   btf info parsing instead. The reasons for removing manual parsing are:
   a) Less code modification to makedumpfile, easier to maintain.
   b) The performance of libbpf is as good as manual parsing +
      hash table indexing, with less memory consumption, see the time
      and memory consumption comparison above.

3) The patches are organized as follows:

    --- <only for test purpose, don't merge> ---
    8.Filter amdgpu mm pages
    7.Add maple tree support to makedumpfile extension

    --- <code should be merged> ---
    6.Add page filtering function
    5.Add makedumpfile extension support
    4.Implement kernel modules' btf resolving
    3.Implement kernel modules' kallsyms resolving
    2.Implement kernel btf resolving
    1.Implement kernel kallsyms resolving

    Patches 7 & 8 are customization specific and can be maintained separately.
    Patches 1 ~ 6 are common code which should be integrated into makedumpfile.

Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/

Tao Liu (8):
  Implement kernel kallsyms resolving
  Implement kernel btf resolving
  Implement kernel modules' kallsyms resolving
  Implement kernel modules' btf resolving
  Add makedumpfile extension support
  Add page filtering function
  Add maple tree support to makedumpfile extension
  Filter amdgpu mm pages

 Makefile                   |   9 +-
 btf_info.c                 | 260 +++++++++++++++++++++++++
 btf_info.h                 |  66 +++++++
 erase_info.c               |  98 ++++++++++
 erase_info.h               |  12 ++
 extension.c                |  82 ++++++++
 extensions/Makefile        |  10 +
 extensions/amdgpu_filter.c |  90 +++++++++
 extensions/maple_tree.c    | 336 +++++++++++++++++++++++++++++++++
 extensions/maple_tree.h    |   6 +
 kallsyms.c                 | 376 +++++++++++++++++++++++++++++++++++++
 kallsyms.h                 |  20 ++
 makedumpfile.c             |  35 +++-
 makedumpfile.h             |  11 ++
 14 files changed, 1405 insertions(+), 6 deletions(-)
 create mode 100644 btf_info.c
 create mode 100644 btf_info.h
 create mode 100644 extension.c
 create mode 100644 extensions/Makefile
 create mode 100644 extensions/amdgpu_filter.c
 create mode 100644 extensions/maple_tree.c
 create mode 100644 extensions/maple_tree.h
 create mode 100644 kallsyms.c
 create mode 100644 kallsyms.h

-- 
2.47.0




* [PATCH v3 1/8] Implement kernel kallsyms resolving
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-24  1:09   ` Stephen Brennan
  2026-01-20  2:54 ` [PATCH v3 2/8] Implement kernel btf resolving Tao Liu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch parses the kernel's kallsyms data and stores the symbols in a hash
table so they can be looked up quickly later.

Signed-off-by: Tao Liu <ltao@redhat.com>
---
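
The name decoding that this patch performs can be sketched on toy data as
follows. This is a simplified illustration, not the code in the patch:
decode_len() and expand_entry() are hypothetical names, and the tables are
passed in as plain arrays instead of being read from the vmcore. Each entry
in kallsyms_names is a length prefix (one byte, or two bytes when bit 7 of
the first byte is set) followed by that many token indexes; expanding the
tokens yields the symbol type character followed by the symbol name. A
two-byte prefix can encode lengths up to 16383, so the sketch keeps the
length in a size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode the length prefix of one kallsyms_names entry.
 * One byte if bit 7 is clear, otherwise two bytes (low 7 bits first).
 * The number of prefix bytes is stored in *consumed.
 */
size_t decode_len(const uint8_t *p, size_t *consumed)
{
	size_t len = p[0];

	*consumed = 1;
	if (len & 0x80) {
		len = (len & 0x7F) | ((size_t)p[1] << 7);
		*consumed = 2;
	}
	return len;
}

/*
 * Expand the entry at names[off] into out as a NUL-terminated string
 * (type character followed by the symbol name).
 * Returns the offset of the next entry, or 0 if out is too small.
 */
size_t expand_entry(const uint8_t *names, size_t off,
		    const char *token_table, const uint16_t *token_index,
		    char *out, size_t outsz)
{
	size_t consumed, used = 0;
	size_t len = decode_len(&names[off], &consumed);

	assert(outsz > 0);
	off += consumed;
	out[0] = '\0';
	while (len--) {
		/* Each byte selects one NUL-terminated token. */
		const char *tok = &token_table[token_index[names[off++]]];
		size_t tlen = strlen(tok);

		if (used + tlen >= outsz)
			return 0;
		memcpy(out + used, tok, tlen + 1);
		used += tlen;
	}
	return off;
}
```

With a toy token table holding "T", "_st" and "ext", the entry {3, 0, 1, 2}
expands to "T_stext"; skipping the leading type character gives the symbol
name "_stext".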
 Makefile       |   2 +-
 kallsyms.c     | 265 +++++++++++++++++++++++++++++++++++++++++++++++++
 kallsyms.h     |  17 ++++
 makedumpfile.c |   3 +
 makedumpfile.h |  11 ++
 5 files changed, 297 insertions(+), 1 deletion(-)
 create mode 100644 kallsyms.c
 create mode 100644 kallsyms.h

diff --git a/Makefile b/Makefile
index 05ab5f2..6c450ac 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
diff --git a/kallsyms.c b/kallsyms.c
new file mode 100644
index 0000000..ecf64e0
--- /dev/null
+++ b/kallsyms.c
@@ -0,0 +1,265 @@
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include "makedumpfile.h"
+#include "kallsyms.h"
+
+static uint32_t *kallsyms_offsets = NULL;
+static uint16_t *kallsyms_token_index = NULL;
+static uint8_t  *kallsyms_token_table = NULL;
+static uint8_t  *kallsyms_names = NULL;
+static unsigned long kallsyms_relative_base = 0;
+static unsigned int kallsyms_num_syms = 0;
+
+#define NAME_HASH 512
+static struct syment *name_hash_table[NAME_HASH] = {0};
+
+static uint64_t absolute_percpu(uint64_t base, int32_t val)
+{
+	if (val >= 0)
+		return (uint64_t)val;
+	else
+		return base - 1 - val;
+}
+
+static unsigned int hash_index(const char *name, unsigned int hash_size)
+{
+	unsigned int len, value;
+
+	len = strlen(name);
+	value = name[len - 1] * name[len / 2];
+
+	return (name[0] ^ value) % hash_size;
+}
+
+static void name_hash_install(struct syment *en)
+{
+	unsigned int index = hash_index(en->name, NAME_HASH);
+	struct syment *sp = name_hash_table[index];
+
+	if (sp == NULL) {
+		name_hash_table[index] = en;
+	} else {
+		while (sp) {
+			if (sp->name_hash_next) {
+				sp = sp->name_hash_next;
+			} else {
+				sp->name_hash_next = en;
+				break;
+			}
+		}
+	}
+}
+
+static struct syment *search_kallsyms_by_name(char *name)
+{
+	unsigned int index;
+	struct syment *sp;
+
+	index = hash_index(name, NAME_HASH);
+	for (sp = name_hash_table[index]; sp; sp = sp->name_hash_next) {
+		if (!strcmp(name, sp->name)) {
+			return sp;
+		}
+	}
+	return sp;
+}
+
+static bool is_unwanted_symbol(char *name)
+{
+	const char *unwanted_prefix[] = {
+		"__pfx_",	// CFI symbols
+		"_R",		// Rust symbols
+	};
+	for (int i = 0; i < sizeof(unwanted_prefix) / sizeof(char *); i++) {
+		if (!strncmp(name, unwanted_prefix[i], strlen(unwanted_prefix[i])))
+			return true;
+	}
+	return false;
+}
+
+uint64_t get_kallsyms_value_by_name(char *name)
+{
+	struct syment *sp;
+
+	sp = search_kallsyms_by_name(name);
+	if (!sp)
+		return 0;
+	return sp->value;
+}
+
+#define BUFLEN 1024
+static bool parse_kernel_kallsyms(void)
+{
+	char buf[BUFLEN];
+	int index = 0, i;
+	uint8_t *compressd_data;
+	uint8_t *uncompressd_data;
+	uint64_t stext;
+	uint16_t len, len_old;
+	struct syment *kern_syment;
+	bool skip;
+
+	for (i = 0; i < kallsyms_num_syms; i++) {
+		skip = false;
+		memset(buf, 0, BUFLEN);
+		len = kallsyms_names[index];
+		if (len & 0x80) {
+			index++;
+			len_old = len;
+			len = kallsyms_names[index];
+			if (len & 0x80) {
+				fprintf(stderr, "%s: BUG! Unexpected 3-byte length,"
+					" should be detected in init_kernel_kallsyms()\n",
+					__func__);
+				goto out;
+			}
+			len = (len_old & 0x7F) | (len << 7);
+		}
+		index++;
+
+		compressd_data = &kallsyms_names[index];
+		index += len;
+		while (len--) {
+			uncompressd_data = &kallsyms_token_table[kallsyms_token_index[*compressd_data]];
+			if (strlen(buf) + strlen((char *)uncompressd_data) >= BUFLEN) {
+				skip = true;
+				break;
+			}
+			strcat(buf, (char *)uncompressd_data);
+			compressd_data++;
+		}
+		if (skip || is_unwanted_symbol(&buf[1]))
+			continue;
+		kern_syment = (struct syment *)calloc(1, sizeof(struct syment));
+		if (!kern_syment)
+			goto no_mem;
+		kern_syment->value = kallsyms_offsets[i];
+		kern_syment->name = strdup(&buf[1]);
+		if (!kern_syment->name) {
+			free(kern_syment);
+			goto no_mem;
+		}
+		name_hash_install(kern_syment);
+	}
+
+	/* Now convert each kallsyms offset into an absolute address */
+	stext = get_kallsyms_value_by_name("_stext");
+	if (SYMBOL(_stext) == absolute_percpu(kallsyms_relative_base, stext)) {
+		for (i = 0; i < NAME_HASH; i++) {
+			for (kern_syment = name_hash_table[i];
+			     kern_syment;
+			     kern_syment = kern_syment->name_hash_next)
+				kern_syment->value = absolute_percpu(kallsyms_relative_base,
+							kern_syment->value);
+		}
+	} else if (SYMBOL(_stext) == kallsyms_relative_base + stext) {
+		for (i = 0; i < NAME_HASH; i++) {
+			for (kern_syment = name_hash_table[i];
+			     kern_syment;
+			     kern_syment = kern_syment->name_hash_next)
+				kern_syment->value += kallsyms_relative_base;
+		}
+	} else {
+		fprintf(stderr, "%s: Failed to calculate kallsyms symbol values!\n", __func__);
+		goto out;
+	}
+
+	return true;
+no_mem:
+	fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+	return false;
+}
+
+static bool vmcore_info_ready = false;
+
+bool read_vmcoreinfo_kallsyms(void)
+{
+	READ_SYMBOL("kallsyms_names", kallsyms_names);
+	READ_SYMBOL("kallsyms_num_syms", kallsyms_num_syms);
+	READ_SYMBOL("kallsyms_token_table", kallsyms_token_table);
+	READ_SYMBOL("kallsyms_token_index", kallsyms_token_index);
+	READ_SYMBOL("kallsyms_offsets", kallsyms_offsets);
+	READ_SYMBOL("kallsyms_relative_base", kallsyms_relative_base);
+	vmcore_info_ready = true;
+	return true;
+}
+
+bool init_kernel_kallsyms(void)
+{
+	const int token_index_size = (UINT8_MAX + 1) * sizeof(uint16_t);
+	uint64_t last_token, len;
+	unsigned char data, data_old;
+	int i;
+	bool ret = false;
+
+	if (vmcore_info_ready == false) {
+		fprintf(stderr, "%s: vmcoreinfo not ready for kallsyms!\n",
+			__func__);
+		return ret;
+	}
+
+	readmem(VADDR, SYMBOL(kallsyms_num_syms), &kallsyms_num_syms,
+		sizeof(kallsyms_num_syms));
+	readmem(VADDR, SYMBOL(kallsyms_relative_base), &kallsyms_relative_base,
+		sizeof(kallsyms_relative_base));
+
+	kallsyms_offsets = malloc(sizeof(uint32_t) * kallsyms_num_syms);
+	if (!kallsyms_offsets)
+		goto no_mem;
+	readmem(VADDR, SYMBOL(kallsyms_offsets), kallsyms_offsets,
+		kallsyms_num_syms * sizeof(uint32_t));
+
+	kallsyms_token_index = malloc(token_index_size);
+	if (!kallsyms_token_index)
+		goto no_mem;
+	readmem(VADDR, SYMBOL(kallsyms_token_index), kallsyms_token_index,
+		token_index_size);
+
+	last_token = SYMBOL(kallsyms_token_table) + kallsyms_token_index[UINT8_MAX];
+	do {
+		readmem(VADDR, last_token++, &data, 1);
+	} while(data);
+	len = last_token - SYMBOL(kallsyms_token_table);
+	kallsyms_token_table = malloc(len);
+	if (!kallsyms_token_table)
+		goto no_mem;
+	readmem(VADDR, SYMBOL(kallsyms_token_table), kallsyms_token_table, len);
+
+	for (len = 0, i = 0; i < kallsyms_num_syms; i++) {
+		readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
+		if (data & 0x80) {
+			len += 1;
+			data_old = data;
+			readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
+			if (data & 0x80) {
+				fprintf(stderr, "%s: BUG! Unexpected 3-byte length"
+					" encoding in kallsyms names\n", __func__);
+				goto out;
+			}
+			data = (data_old & 0x7F) | (data << 7);
+		}
+		len += data + 1;
+	}
+	kallsyms_names = malloc(len);
+	if (!kallsyms_names)
+		goto no_mem;
+	readmem(VADDR, SYMBOL(kallsyms_names), kallsyms_names, len);
+
+	ret = parse_kernel_kallsyms();
+	goto out;
+
+no_mem:
+	fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+	if (kallsyms_offsets)
+		free(kallsyms_offsets);
+	if (kallsyms_token_index)
+		free(kallsyms_token_index);
+	if (kallsyms_token_table)
+		free(kallsyms_token_table);
+	if (kallsyms_names)
+		free(kallsyms_names);
+	return ret;
+}
diff --git a/kallsyms.h b/kallsyms.h
new file mode 100644
index 0000000..a4fbe10
--- /dev/null
+++ b/kallsyms.h
@@ -0,0 +1,17 @@
+#ifndef _KALLSYMS_H
+#define _KALLSYMS_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+struct __attribute__((packed)) syment {
+	uint64_t value;
+	char *name;
+	struct syment *name_hash_next;
+};
+
+bool read_vmcoreinfo_kallsyms(void);
+bool init_kernel_kallsyms(void);
+uint64_t get_kallsyms_value_by_name(char *);
+
+#endif /* _KALLSYMS_H */
\ No newline at end of file
diff --git a/makedumpfile.c b/makedumpfile.c
index 12fb0d8..dba3628 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,7 @@
 #include <limits.h>
 #include <assert.h>
 #include <zlib.h>
+#include "kallsyms.h"
 
 struct symbol_table	symbol_table;
 struct size_table	size_table;
@@ -3105,6 +3106,8 @@ read_vmcoreinfo_from_vmcore(off_t offset, unsigned long size, int flag_xen_hv)
 		if (!read_vmcoreinfo())
 			goto out;
 	}
+	read_vmcoreinfo_kallsyms();
+
 	close_vmcoreinfo();
 
 	ret = TRUE;
diff --git a/makedumpfile.h b/makedumpfile.h
index 134eb7a..0dec50e 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -259,6 +259,7 @@ static inline int string_exists(char *s) { return (s ? TRUE : FALSE); }
 #define UINT(ADDR)	*((unsigned int *)(ADDR))
 #define ULONG(ADDR)	*((unsigned long *)(ADDR))
 #define ULONGLONG(ADDR)	*((unsigned long long *)(ADDR))
+#define VOID_PTR(ADDR)  *((void **)(ADDR))
 
 
 /*
@@ -1919,6 +1920,16 @@ struct symbol_table {
 	 * symbols on sparc64 arch
 	 */
 	unsigned long long		vmemmap_table;
+
+	/*
+	 * kallsyms related
+	 */
+	unsigned long long		kallsyms_names;
+	unsigned long long		kallsyms_num_syms;
+	unsigned long long		kallsyms_token_table;
+	unsigned long long		kallsyms_token_index;
+	unsigned long long		kallsyms_offsets;
+	unsigned long long		kallsyms_relative_base;
 };
 
 struct size_table {
-- 
2.47.0
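
A note on the address conversion in parse_kernel_kallsyms(): the patch
accepts two encodings for the 32-bit entries of kallsyms_offsets and picks
whichever one reproduces the known address of _stext. The sketch below
restates that selection on plain integers. The enum and function names are
hypothetical and chosen for illustration only; the arithmetic follows
absolute_percpu() and the base-relative branch of the patch.

```c
#include <assert.h>
#include <stdint.h>

enum toy_encoding {
	ENC_ABSOLUTE_PERCPU,	/* non-negative: absolute, negative: relative */
	ENC_BASE_RELATIVE,	/* always relative_base + offset */
	ENC_UNKNOWN
};

/* Non-negative values are absolute; negative ones encode base - 1 - val. */
uint64_t addr_absolute_percpu(uint64_t base, uint32_t raw)
{
	int32_t val = (int32_t)raw;

	if (val >= 0)
		return (uint64_t)val;
	return base - 1 - (int64_t)val;
}

/* Every value is an unsigned offset from the relative base. */
uint64_t addr_base_relative(uint64_t base, uint32_t raw)
{
	return base + raw;
}

/* Choose the encoding that maps the raw _stext entry to its known address. */
enum toy_encoding pick_encoding(uint64_t base, uint32_t raw_stext,
				uint64_t known_stext)
{
	assert(known_stext != 0);
	if (addr_absolute_percpu(base, raw_stext) == known_stext)
		return ENC_ABSOLUTE_PERCPU;
	if (addr_base_relative(base, raw_stext) == known_stext)
		return ENC_BASE_RELATIVE;
	return ENC_UNKNOWN;
}
```

For example, with a base of 0xffffffff81000000, a raw value of -1 decodes to
the base itself under the first encoding, while a raw value of 0x1000
decodes to base + 0x1000 under the second.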




* [PATCH v3 2/8] Implement kernel btf resolving
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
  2026-01-20  2:54 ` [PATCH v3 1/8] Implement kernel kallsyms resolving Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-20  2:54 ` [PATCH v3 3/8] Implement kernel modules' kallsyms resolving Tao Liu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch parses the kernel's btf data using libbpf. The kernel's btf data is
located between the __start_BTF and __stop_BTF symbols, which are resolved by
the kallsyms code of the previous patch. The primary function implemented in
this patch finds a member by name, recursively descending into any anonymous
struct/union it encounters.

Signed-off-by: Tao Liu <ltao@redhat.com>
---
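
The recursive lookup described above can be illustrated without libbpf on a
toy type model. Everything in this sketch (struct toy_type, struct
toy_member, find_member()) is hypothetical and only mirrors the idea of
find_member_recursive(): a named member matches directly, while an
anonymous struct/union member is searched transparently, with its bit
offset added to the offsets of the members inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_type;

/* One member; an empty name marks an anonymous struct/union member. */
struct toy_member {
	const char *name;
	uint32_t bit_offset;		/* relative to the enclosing type */
	const struct toy_type *type;	/* non-NULL only for nested aggregates */
};

struct toy_type {
	size_t nr_members;
	const struct toy_member *members;
};

/*
 * Find `want` in `t`, descending into anonymous aggregates only.
 * On success, *out_bits is the bit offset from the outermost type.
 */
bool find_member(const struct toy_type *t, uint32_t base_bits,
		 const char *want, uint32_t *out_bits)
{
	for (size_t i = 0; i < t->nr_members; i++) {
		const struct toy_member *m = &t->members[i];
		uint32_t off = base_bits + m->bit_offset;

		if (m->name[0] && strcmp(m->name, want) == 0) {
			*out_bits = off;
			return true;
		}
		/* Named aggregates are not transparent to name lookup. */
		if (!m->name[0] && m->type &&
		    find_member(m->type, off, want, out_bits))
			return true;
	}
	return false;
}
```

If an anonymous union sits at bit 64 of the outer type and contains a member
at bit 128, the lookup reports bit 192, i.e. byte offset 24 after dividing
by 8.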
 Makefile   |   4 +-
 btf_info.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 btf_info.h |  64 ++++++++++++++++++
 3 files changed, 252 insertions(+), 2 deletions(-)
 create mode 100644 btf_info.c
 create mode 100644 btf_info.h

diff --git a/Makefile b/Makefile
index 6c450ac..f3f4da8 100644
--- a/Makefile
+++ b/Makefile
@@ -45,12 +45,12 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
 
-LIBS = -ldw -lbz2 -ldl -lelf -lz
+LIBS = -ldw -lbz2 -ldl -lelf -lz -lbpf
 ifneq ($(LINKTYPE), dynamic)
 LIBS := -static $(LIBS) -llzma
 endif
diff --git a/btf_info.c b/btf_info.c
new file mode 100644
index 0000000..e7f8d9a
--- /dev/null
+++ b/btf_info.c
@@ -0,0 +1,186 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf_legacy.h>
+#include "makedumpfile.h"
+#include "kallsyms.h"
+#include "btf_info.h"
+
+struct btf_arr_elem {
+	struct btf *btf;
+	char *module;
+};
+
+static struct btf_arr_elem* btf_arr = NULL;
+static int btf_arr_len = 0;
+static int btf_arr_cap = 0;
+
+static bool find_member_recursive(struct btf *btf,
+				int struct_typeid,
+				int base_offset,
+				char *member_name,
+				uint32_t *out_bit_offset,
+				uint32_t *out_bit_sz,
+				uint32_t *out_member_size)
+{
+	const struct btf_type *st;
+	struct btf_member *bm;
+	int i, vlen;
+
+	struct_typeid = btf__resolve_type(btf, struct_typeid);
+	st = btf__type_by_id(btf, struct_typeid);
+
+	if (!st)
+		return false;
+
+	if (BTF_INFO_KIND(st->info) != BTF_KIND_STRUCT &&
+	    BTF_INFO_KIND(st->info) != BTF_KIND_UNION)
+		return false;
+
+	vlen = BTF_INFO_VLEN(st->info);
+	bm = btf_members(st);
+
+	for (i = 0; i < vlen; i++, bm++) {
+		const char *name = btf__name_by_offset(btf, bm->name_off);
+		int member_bit_offset = btf_member_bit_offset(st, i) + base_offset;
+		int member_typeid = btf__resolve_type(btf, bm->type);
+		const struct btf_type *mt = btf__type_by_id(btf, member_typeid);
+
+		if (name && strcmp(name, member_name) == 0) {
+			*out_bit_offset = member_bit_offset;
+			*out_bit_sz = btf_member_bitfield_size(st, i);
+			*out_member_size = btf__resolve_size(btf, member_typeid);
+			return true;
+		}
+
+		if (!name || !name[0]) {
+			if (BTF_INFO_KIND(mt->info) == BTF_KIND_STRUCT ||
+			    BTF_INFO_KIND(mt->info) == BTF_KIND_UNION) {
+				if (find_member_recursive(btf, member_typeid,
+								member_bit_offset,
+								member_name,
+								out_bit_offset,
+								out_bit_sz,
+								out_member_size))
+					return true;
+			}
+		}
+	}
+	return false;
+}
+
+bool get_struct_member_by_name(struct struct_member_info *smi)
+{
+	int i, j, start_id;
+	char *fmt;
+
+	for (i = 0; i < btf_arr_len; i++) {
+		if (smi->modname != NULL) {
+			if (strcmp(smi->modname, btf_arr[i].module) != 0)
+				continue;
+		}
+		/*
+		 * vmlinux(btf_arr[0])'s typeid is 1~vmlinux_type_cnt,
+		 * modules(btf_arr[1...])'s typeid is vmlinux_type_cnt~btf__type_cnt
+		 */
+		start_id = (i == 0 ? 1 : btf__type_cnt(btf_arr[0].btf));
+
+		for (j = start_id; j < btf__type_cnt(btf_arr[i].btf); j++) {
+			const struct btf_type *bt = btf__type_by_id(btf_arr[i].btf, j);
+			const char *name = btf__name_by_offset(btf_arr[i].btf, bt->name_off);
+
+			if (name && strcmp(smi->struct_name, name) == 0) {
+				if (smi->member_name != NULL) {
+					/* Retrieve member info */
+					if (!find_member_recursive(btf_arr[i].btf, j,
+								0,
+								smi->member_name,
+								&(smi->member_bit_offset),
+								&(smi->member_bit_sz),
+								&(smi->member_size))) {
+						fprintf(stderr, "%s: Cannot find member %s in %s\n",
+							__func__, smi->member_name,
+							smi->struct_name);
+						return false;
+					}
+				}
+				smi->struct_size = btf__resolve_size(btf_arr[i].btf, j);
+				return true;
+			}
+		}
+	}
+	fmt = smi->modname ?
+		"%s: Not find struct/union %s in %s\n" :
+		"%s: Not find struct/union %s%s\n";
+
+	fprintf(stderr, fmt, __func__, smi->struct_name,
+			smi->modname ? smi->modname : "");
+	return false;
+}
+
+static bool add_to_btf_arr(struct btf *btf, char *module_name)
+{
+	struct btf_arr_elem* tmp;
+	int new_cap = 0;
+
+	if (btf_arr == NULL) {
+		new_cap = 4;
+	} else if (btf_arr_len >= btf_arr_cap) {
+		new_cap = btf_arr_cap + (btf_arr_cap >> 1);
+	}
+
+	if (!module_name)
+		goto no_mem;
+
+	if (new_cap) {
+		tmp = reallocarray(btf_arr, new_cap, sizeof(struct btf_arr_elem));
+		if (!tmp)
+			goto no_mem;
+		btf_arr = tmp;
+		btf_arr_cap = new_cap;
+	}
+
+	btf_arr[btf_arr_len].btf = btf;
+	btf_arr[btf_arr_len++].module = module_name;
+	return true;
+
+no_mem:
+	fprintf(stderr, "%s: Not enough memory!\n", __func__);
+	return false;
+}
+
+bool init_kernel_btf(void)
+{
+	uint64_t size;
+	struct btf *btf;
+	char *buf = NULL;
+	bool ret = false;
+
+	uint64_t start_btf = get_kallsyms_value_by_name("__start_BTF");
+	uint64_t stop_btf = get_kallsyms_value_by_name("__stop_BTF");
+	if (!start_btf || !stop_btf) {
+		fprintf(stderr, "%s: symbol __start/stop_BTF not found!\n", __func__);
+		goto out;
+	}
+
+	size = stop_btf - start_btf;
+	buf = (char *)malloc(size);
+	if (!buf) {
+		fprintf(stderr, "%s: Not enough memory!\n", __func__);
+		goto out;
+	}
+	readmem(VADDR, start_btf, buf, size);
+	btf = btf__new(buf, size);
+
+	if (libbpf_get_error(btf) != 0 ||
+	    add_to_btf_arr(btf, strdup("vmlinux")) == false) {	
+		fprintf(stderr, "%s: init vmlinux btf fail\n", __func__);
+		goto out;
+	}
+	ret = true;
+out:
+	if (buf)
+		free(buf);
+	return ret;
+}
diff --git a/btf_info.h b/btf_info.h
new file mode 100644
index 0000000..1fc6829
--- /dev/null
+++ b/btf_info.h
@@ -0,0 +1,64 @@
+#ifndef _BTF_INFO_H
+#define _BTF_INFO_H
+#include <stdint.h>
+
+struct struct_member_info {
+	/********in******/
+	char *modname;		// Set to search within the module, in case
+				// name conflict of different modules
+	char *struct_name;	// Search by struct name
+	char *member_name;	// Search by member name
+	/********out*****/
+	uint32_t member_bit_offset;	// member offset in bits
+	uint32_t member_bit_sz;	// member width in bits
+	uint32_t member_size;	// member size in bytes
+	uint32_t struct_size;	// struct size in bytes
+};
+
+bool init_kernel_btf(void);
+bool get_struct_member_by_name(struct struct_member_info *smi);
+
+struct member_off_size {
+	int m_off;
+	int m_size;
+	int s_size;
+};
+#define QUATE(x) #x
+#define INIT_MOD_STRUCT_MEMBER(MOD, S, M) \
+	struct member_off_size _##MOD##_##S##_##M; \
+	memset(&smi, 0, sizeof(struct struct_member_info)); \
+	smi.modname = QUATE(MOD); \
+	smi.struct_name = QUATE(S); \
+	smi.member_name = QUATE(M); \
+	get_struct_member_by_name(&smi); \
+	_##MOD##_##S##_##M.s_size = smi.struct_size; \
+	_##MOD##_##S##_##M.m_size = smi.member_size; \
+	_##MOD##_##S##_##M.m_off = smi.member_bit_offset;
+#define GET_MOD_STRUCT_MEMBER_MOFF(MOD, S, M) (_##MOD##_##S##_##M.m_off)
+#define GET_MOD_STRUCT_MEMBER_MSIZE(MOD, S, M) (_##MOD##_##S##_##M.m_size)
+#define GET_MOD_STRUCT_MEMBER_SSIZE(MOD, S, M) (_##MOD##_##S##_##M.s_size)
+
+#define INIT_STRUCT_MEMBER(S, M) \
+	struct member_off_size _##S##_##M; \
+	memset(&smi, 0, sizeof(struct struct_member_info)); \
+	smi.modname = NULL; \
+	smi.struct_name = QUATE(S); \
+	smi.member_name = QUATE(M); \
+	get_struct_member_by_name(&smi); \
+	_##S##_##M.s_size = smi.struct_size; \
+	_##S##_##M.m_size = smi.member_size; \
+	_##S##_##M.m_off = smi.member_bit_offset;
+#define GET_STRUCT_MEMBER_MOFF(S, M) (_##S##_##M.m_off)
+#define GET_STRUCT_MEMBER_MSIZE(S, M) (_##S##_##M.m_size)
+#define GET_STRUCT_MEMBER_SSIZE(S, M) (_##S##_##M.s_size)
+
+#define INIT_STRUCT(S) \
+	struct member_off_size _##S; \
+	memset(&smi, 0, sizeof(struct struct_member_info)); \
+	smi.modname = NULL; \
+	smi.member_name = NULL; \
+	smi.struct_name = QUATE(S); \
+	get_struct_member_by_name(&smi); \
+	_##S.s_size = smi.struct_size;
+#define GET_STRUCT_SSIZE(S) (_##S.s_size)
+#endif /* _BTF_INFO_H */
-- 
2.47.0




* [PATCH v3 3/8] Implement kernel modules' kallsyms resolving
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
  2026-01-20  2:54 ` [PATCH v3 1/8] Implement kernel kallsyms resolving Tao Liu
  2026-01-20  2:54 ` [PATCH v3 2/8] Implement kernel btf resolving Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-20  2:54 ` [PATCH v3 4/8] Implement kernel modules' btf resolving Tao Liu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

With the kernel's kallsyms and btf ready, we can get any kernel type and
symbol address. So we can iterate over the kernel modules' linked list and
parse each module's structure to get its kallsyms data.

Signed-off-by: Tao Liu <ltao@redhat.com>
---
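
The list walk described above amounts to container_of() done by hand on
addresses read from the dump. The sketch below shows the pattern on a toy
byte buffer that stands in for readmem(): "addresses" are offsets into the
buffer, and the member offsets are passed in as constants, whereas the patch
resolves them from btf. read_u64() and walk_list() are hypothetical names,
and the sketch assumes that the next pointer is the first field of the list
node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for readmem(): read 8 bytes at "address" addr. */
uint64_t read_u64(const uint8_t *mem, size_t size, uint64_t addr)
{
	uint64_t v;

	assert(addr + sizeof(v) <= size);
	memcpy(&v, mem + addr, sizeof(v));
	return v;
}

/*
 * Walk the circular list anchored at `head`. For every node, step back by
 * list_off to the start of the containing object and collect the 64-bit
 * field at field_off. Returns the number of objects visited (at most max).
 */
size_t walk_list(const uint8_t *mem, size_t size, uint64_t head,
		 uint64_t list_off, uint64_t field_off,
		 uint64_t *out, size_t max)
{
	size_t n = 0;
	uint64_t list;

	for (list = read_u64(mem, size, head);
	     list != head && n < max;
	     list = read_u64(mem, size, list)) {
		uint64_t obj = list - list_off;	/* container_of() by hand */

		out[n++] = read_u64(mem, size, obj + field_off);
	}
	return n;
}
```

The loop stops when the next pointer leads back to the list head, which is
the same termination condition as in init_module_kallsyms().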
 kallsyms.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 kallsyms.h |   3 ++
 2 files changed, 114 insertions(+)

diff --git a/kallsyms.c b/kallsyms.c
index ecf64e0..9069c88 100644
--- a/kallsyms.c
+++ b/kallsyms.c
@@ -3,6 +3,7 @@
 #include <string.h>
 #include "makedumpfile.h"
 #include "kallsyms.h"
+#include "btf_info.h"
 
 static uint32_t *kallsyms_offsets = NULL;
 static uint16_t *kallsyms_token_index = NULL;
@@ -263,3 +264,113 @@ out:
 		free(kallsyms_names);
 	return ret;
 }
+
+#define MEMBER_OFF(S, M) \
+	(GET_STRUCT_MEMBER_MOFF(S, M) / 8)
+
+uint64_t next_list(uint64_t list)
+{
+	static int list_head_next_offset = 0;
+	static int list_head_next_size = 0;
+
+	struct struct_member_info smi;
+	uint64_t next = 0;
+
+	if (!list_head_next_size) {
+		INIT_STRUCT_MEMBER(list_head, next);
+		list_head_next_size = GET_STRUCT_MEMBER_MSIZE(list_head, next);
+		list_head_next_offset = MEMBER_OFF(list_head, next);
+	}
+	readmem(VADDR, list + list_head_next_offset, &next, list_head_next_size);
+	return next;
+}
+
+bool init_module_kallsyms(void)
+{
+	struct struct_member_info smi;
+	uint64_t modules, list, value = 0, symtab = 0, strtab = 0;
+	uint32_t st_name = 0;
+	int num_symtab, i, j;
+	struct syment *mod_syment;
+	char symname[512], ch;
+	bool ret = false;
+
+	modules = get_kallsyms_value_by_name("modules");
+	if (!modules) {
+		/* Not a failure if no module enabled */
+		ret = true;
+		goto out;
+	}
+
+	INIT_STRUCT_MEMBER(module, list);
+	INIT_STRUCT_MEMBER(module, core_kallsyms);
+	INIT_STRUCT_MEMBER(mod_kallsyms, symtab);
+	INIT_STRUCT_MEMBER(mod_kallsyms, num_symtab);
+	INIT_STRUCT_MEMBER(mod_kallsyms, strtab);
+	INIT_STRUCT_MEMBER(elf64_sym, st_name);
+	INIT_STRUCT_MEMBER(elf64_sym, st_value);
+
+	for (list = next_list(modules); list != modules; list = next_list(list)) {
+		readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, num_symtab),
+			&num_symtab, GET_STRUCT_MEMBER_MSIZE(mod_kallsyms, num_symtab));
+		readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, symtab),
+			&symtab, GET_STRUCT_MEMBER_MSIZE(mod_kallsyms, symtab));
+		readmem(VADDR, list - MEMBER_OFF(module, list) +
+				MEMBER_OFF(module, core_kallsyms) +
+				MEMBER_OFF(mod_kallsyms, strtab),
+			&strtab, GET_STRUCT_MEMBER_MSIZE(mod_kallsyms, strtab));
+		for (i = 0; i < num_symtab; i++) {
+			j = 0;
+			readmem(VADDR, symtab + i * GET_STRUCT_MEMBER_SSIZE(elf64_sym, st_value) +
+					MEMBER_OFF(elf64_sym, st_value),
+				&value, GET_STRUCT_MEMBER_MSIZE(elf64_sym, st_value));
+			readmem(VADDR, symtab + i * GET_STRUCT_MEMBER_SSIZE(elf64_sym, st_name) +
+					MEMBER_OFF(elf64_sym, st_name),
+				&st_name, GET_STRUCT_MEMBER_MSIZE(elf64_sym, st_name));
+			do {
+				readmem(VADDR, strtab + st_name + j++, &ch, 1);
+			} while (ch != '\0');
+			if (j == 1 || j > sizeof(symname))
+				/* Skip empty or too long string */
+				continue;
+			readmem(VADDR, strtab + st_name, symname, j);
+			if (is_unwanted_symbol(symname))
+				continue;
+			mod_syment = (struct syment *)calloc(1, sizeof(struct syment));
+			if (!mod_syment)
+				goto no_mem;
+			mod_syment->name = strdup(symname);
+			if (!mod_syment->name) {
+				free(mod_syment);
+				goto no_mem;
+			}
+			mod_syment->value = value;
+			name_hash_install(mod_syment);
+		}
+	}
+	ret = true;
+	goto out;
+no_mem:
+	/* Hashtable will be cleaned later */
+	fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+	return ret;
+}
+
+void cleanup_kallsyms(void)
+{
+	struct syment *en, *en_tmp;
+
+	for (int i = 0; i < NAME_HASH; i++) {
+		for (en = name_hash_table[i]; en;) {
+			en_tmp = en;
+			en = en->name_hash_next;
+			free(en_tmp->name);
+			free(en_tmp);
+		}
+	}
+}
diff --git a/kallsyms.h b/kallsyms.h
index a4fbe10..78af4ef 100644
--- a/kallsyms.h
+++ b/kallsyms.h
@@ -12,6 +12,9 @@ struct __attribute__((packed)) syment {
 
 bool read_vmcoreinfo_kallsyms(void);
 bool init_kernel_kallsyms(void);
+bool init_module_kallsyms(void);
+void cleanup_kallsyms(void);
+uint64_t next_list(uint64_t);
 uint64_t get_kallsyms_value_by_name(char *);
 
 #endif /* _KALLSYMS_H */
\ No newline at end of file
-- 
2.47.0




* [PATCH v3 4/8] Implement kernel modules' btf resolving
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (2 preceding siblings ...)
  2026-01-20  2:54 ` [PATCH v3 3/8] Implement kernel modules' kallsyms resolving Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-20  2:54 ` [PATCH v3 5/8] Add makedumpfile extension support Tao Liu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

As in the previous patch, once the kernel's kallsyms and btf are ready,
we can locate and iterate over the btf data of all kernel modules.
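
The list walk behind this can be sketched in isolation. The following is a
minimal sketch, not the patch's code: struct list_node, struct demo_module and
the demo_* helpers are hypothetical stand-ins, and the list lives in local
memory, whereas the patch reads each field from the dump with readmem(). It
shows the arithmetic of stepping through a circular list and subtracting the
member offset to get from the embedded list node back to its containing struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Hypothetical stand-in for a struct that embeds its list linkage. */
struct demo_module {
	int id;
	struct list_node list;
};

/* Link n at the tail of the circular list headed by head. */
static void demo_list_add_tail(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk the circular list until we are back at the head and sum the ids.
 * The containing struct is found by subtracting the offset of the
 * embedded member from the node address, the same arithmetic as
 * "list - MEMBER_OFF(btf_module, list)" in the patch.
 */
static int demo_sum_ids(struct list_node *head)
{
	int sum = 0;
	struct list_node *pos;

	assert(head != NULL && head->next != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct demo_module *m = (struct demo_module *)
			((char *)pos - offsetof(struct demo_module, list));
		sum += m->id;
	}
	return sum;
}
```

An empty list is a head that points to itself, so the loop body never runs.
In the patch the offsets come from btf at runtime instead of offsetof(), which
is what lets the walk work without debuginfo.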

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 btf_info.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 btf_info.h |  2 ++
 2 files changed, 76 insertions(+)

diff --git a/btf_info.c b/btf_info.c
index e7f8d9a..752661c 100644
--- a/btf_info.c
+++ b/btf_info.c
@@ -184,3 +184,77 @@ out:
 		free(buf);
 	return ret;
 }
+
+#define MEMBER_OFF(S, M) \
+	GET_STRUCT_MEMBER_MOFF(S, M) / 8
+
+bool init_module_btf(void)
+{
+	struct btf *btf_mod;
+	uint64_t btf_modules, list;
+	struct struct_member_info smi;
+	uint64_t btf = 0, data = 0, module = 0;
+	int data_size = 0;
+	bool ret = false;
+	char *btf_buf = NULL;
+	char *modname = NULL;
+
+	btf_modules = get_kallsyms_value_by_name("btf_modules");
+	if (!btf_modules)
+		/* Maybe module is not enabled, this is not an error */
+		return true;
+
+	INIT_STRUCT_MEMBER(btf_module, list);
+	INIT_STRUCT_MEMBER(btf_module, btf);
+	INIT_STRUCT_MEMBER(btf_module, module);
+	INIT_STRUCT_MEMBER(module, name);
+	INIT_STRUCT_MEMBER(btf, data);
+	INIT_STRUCT_MEMBER(btf, data_size);
+	modname = (char *)malloc(GET_STRUCT_MEMBER_MSIZE(module, name));
+	if (!modname)
+		goto no_mem;
+
+	for (list = next_list(btf_modules); list != btf_modules; list = next_list(list)) {
+		readmem(VADDR, list - MEMBER_OFF(btf_module, list) +
+				MEMBER_OFF(btf_module, btf),
+			&btf, GET_STRUCT_MEMBER_MSIZE(btf_module, btf));
+		readmem(VADDR, list - MEMBER_OFF(btf_module, list) +
+				MEMBER_OFF(btf_module, module),
+			&module, GET_STRUCT_MEMBER_MSIZE(btf_module, module));
+		readmem(VADDR, module + MEMBER_OFF(module, name),
+			modname, GET_STRUCT_MEMBER_MSIZE(module, name));
+		readmem(VADDR, btf + MEMBER_OFF(btf, data),
+			&data, GET_STRUCT_MEMBER_MSIZE(btf, data));
+		readmem(VADDR, btf + MEMBER_OFF(btf, data_size),
+			&data_size, GET_STRUCT_MEMBER_MSIZE(btf, data_size));
+		btf_buf = (char *)malloc(data_size);
+		if (!btf_buf)
+			goto no_mem;
+		readmem(VADDR, data, btf_buf, data_size);
+		btf_mod = btf__new_split(btf_buf, data_size, btf_arr[0].btf);
+		free(btf_buf);
+		if (libbpf_get_error(btf_mod) != 0 ||
+		    add_to_btf_arr(btf_mod, strdup(modname)) == false) {
+			fprintf(stderr, "%s: init %s btf fail\n", __func__, modname);
+			goto out;
+		}
+	}
+	ret = true;
+	goto out;
+
+no_mem:
+	fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+	if (modname)
+		free(modname);
+	return ret;
+}
+
+void cleanup_btf(void)
+{
+	for (int i = 0; i < btf_arr_len; i++) {
+		free(btf_arr[i].module);
+		btf__free(btf_arr[i].btf);
+	}
+	free(btf_arr);
+}
diff --git a/btf_info.h b/btf_info.h
index 1fc6829..d4408c3 100644
--- a/btf_info.h
+++ b/btf_info.h
@@ -16,7 +16,9 @@ struct struct_member_info {
 };
 
 bool init_kernel_btf(void);
+bool init_module_btf(void);
 bool get_struct_member_by_name(struct struct_member_info *smi);
+void cleanup_btf(void);
 
 struct member_off_size {
 	int m_off;
-- 
2.47.0




* [PATCH v3 5/8] Add makedumpfile extension support
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (3 preceding siblings ...)
  2026-01-20  2:54 ` [PATCH v3 4/8] Implement kernel modules' btf resolving Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-22  0:51   ` Stephen Brennan
  2026-01-20  2:54 ` [PATCH v3 6/8] Add page filtering function Tao Liu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch adds .so extension support to makedumpfile, similar to the crash
utility's extensions. Currently only "/usr/lib64/makedumpfile/extensions"
and "./extensions" are searched for extensions. Once an extension is found,
kallsyms and btf are initialized so that all extensions can benefit from them
(makedumpfile itself doesn't use this info currently; the kallsyms/btf init
code can be moved elsewhere later if makedumpfile needs it).

The makedumpfile extension helps users customize mm page filtering on top of
the traditional mm page flag filtering, without modifying the makedumpfile
code itself.
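
The file name check used when scanning an extension directory can be sketched
on its own. This is a minimal sketch under assumptions: demo_is_extension()
and demo_build_path() are hypothetical helper names that do not exist in
makedumpfile, and the truncation check in demo_build_path() is an addition for
illustration that the patch does not perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if name is "<something>.so"; a bare ".so" is rejected. */
static bool demo_is_extension(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/* Build "<dir>/<name>" into buf; false if the result would not fit. */
static bool demo_build_path(char *buf, size_t size,
			    const char *dir, const char *name)
{
	int n = snprintf(buf, size, "%s/%s", dir, name);

	return n >= 0 && (size_t)n < size;
}
```

A caller would run demo_is_extension() on each d_name returned by readdir()
and pass the path from demo_build_path() to dlopen(). Rejecting truncated
paths avoids handing dlopen() a name that silently lost its ".so" suffix.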

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 Makefile            |  7 +++-
 extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
 extensions/Makefile | 10 ++++++
 makedumpfile.c      |  4 +++
 4 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 extension.c
 create mode 100644 extensions/Makefile

diff --git a/Makefile b/Makefile
index f3f4da8..7e29220 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
@@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
 
 clean:
 	rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
+	$(MAKE) -C extensions clean
 
 install:
 	install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
@@ -135,3 +136,7 @@ install:
 	mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
 	install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
 	install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
+
+.PHONY: extensions
+extensions:
+	$(MAKE) -C extensions CC=$(CC)
\ No newline at end of file
diff --git a/extension.c b/extension.c
new file mode 100644
index 0000000..6ee7f4e
--- /dev/null
+++ b/extension.c
@@ -0,0 +1,82 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <dirent.h>
+#include <dlfcn.h>
+#include <stdbool.h>
+#include "kallsyms.h"
+#include "btf_info.h"
+
+static const char *dirs[] = {
+	"/usr/lib64/makedumpfile/extensions",
+	"./extensions",
+};
+
+/* Will only init once */
+static bool init_kallsyms_btf(void)
+{
+	static bool ret = false;
+	static bool has_inited = false;
+
+	if (has_inited)
+		goto out;
+	if (!init_kernel_kallsyms())
+		goto out;
+	if (!init_kernel_btf())
+		goto out;
+	if (!init_module_kallsyms())
+		goto out;
+	if (!init_module_btf())
+		goto out;
+	ret = true;
+out:
+	has_inited = true;
+	return ret;
+}
+
+static void cleanup_kallsyms_btf(void)
+{
+	cleanup_kallsyms();
+	cleanup_btf();
+}
+
+void run_extensions(void)
+{
+	DIR *dir;
+	struct dirent *entry;
+	size_t len;
+	int i;
+	void *handle;
+	char path[512];
+
+	for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
+		if ((dir = opendir(dirs[i])) != NULL)
+			break;
+	}
+
+	if (!dir || i >= sizeof(dirs) / sizeof(char *))
+		/* No extensions found */
+		return;
+
+	while ((entry = readdir(dir)) != NULL) {
+		len = strlen(entry->d_name);
+		if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
+			/* Will only init when .so exist */
+			if (!init_kallsyms_btf())
+				goto out;
+
+			snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
+			handle = dlopen(path, RTLD_NOW);
+			if (!handle) {
+				fprintf(stderr, "%s: Failed to load %s: %s\n",
+					__func__, path, dlerror());
+				continue;
+			}
+			printf("Loaded extension: %s\n", path);
+			dlclose(handle);
+		}
+	}
+out:
+	closedir(dir);
+	cleanup_kallsyms_btf();
+}
\ No newline at end of file
diff --git a/extensions/Makefile b/extensions/Makefile
new file mode 100644
index 0000000..afbc61e
--- /dev/null
+++ b/extensions/Makefile
@@ -0,0 +1,10 @@
+CC ?= gcc
+CONTRIB_SO :=
+
+all: $(CONTRIB_SO)
+
+$(CONTRIB_SO): %.so: %.c
+	$(CC) -O2 -g -fPIC -shared -o $@ $^
+
+clean:
+	rm -f $(CONTRIB_SO)
diff --git a/makedumpfile.c b/makedumpfile.c
index dba3628..ca8ed8a 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -10847,6 +10847,8 @@ update_dump_level(void)
 	}
 }
 
+void run_extensions(void);
+
 int
 create_dumpfile(void)
 {
@@ -10884,6 +10886,8 @@ retry:
 	if (info->flag_refiltering)
 		update_dump_level();
 
+	run_extensions();
+
 	if ((info->name_filterconfig || info->name_eppic_config)
 			&& !gather_filter_info())
 		return FALSE;
-- 
2.47.0




* [PATCH v3 6/8] Add page filtering function
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (4 preceding siblings ...)
  2026-01-20  2:54 ` [PATCH v3 5/8] Add makedumpfile extension support Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-23  0:54   ` Stephen Brennan
  2026-01-20  2:54 ` [PATCH v3 7/8] Add maple tree support to makedumpfile extension Tao Liu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

pfn and num are the data which extensions give to makedumpfile for mm page
filtering. Since makedumpfile iterates over pfns in ascending order in
__exclude_unnecessary_pages(), pfn and num are stored in ft_page_info linked
lists that are sorted by ascending pfn. If one pfn hits a list entry, the
next pfn is most likely to hit either the same entry or the one that follows,
so a cur variable saves the current list position to speed up the pfn
checking process.

In addition, 2 ft_page_info linked lists are used, one for mm pages to
discard and the other for mm pages to keep.
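
The sorted list plus cursor idea can be sketched as below. This is a minimal
sketch, not the patch's implementation: struct demo_range and the demo_*
functions are hypothetical names, and it assumes that ranges do not overlap
and that lookups arrive in ascending pfn order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical range node: covers pfns in [pfn, pfn + num). */
struct demo_range {
	unsigned long pfn;
	unsigned long num;
	struct demo_range *next;
};

/* Insert a range, keeping the list sorted by ascending start pfn. */
static bool demo_insert(struct demo_range **head, unsigned long pfn,
			unsigned long num)
{
	struct demo_range **pp = head;
	struct demo_range *n = malloc(sizeof(*n));

	if (!n)
		return false;
	while (*pp && (*pp)->pfn < pfn)
		pp = &(*pp)->next;
	n->pfn = pfn;
	n->num = num;
	n->next = *pp;
	*pp = n;
	return true;
}

/*
 * Return true if pfn lies inside some range. *cur remembers where the
 * previous lookup stopped, so ascending queries never rescan ranges
 * that already ended below the queried pfn.
 */
static bool demo_hit(struct demo_range *head, struct demo_range **cur,
		     unsigned long pfn)
{
	assert(cur != NULL);
	if (!head)
		return false;
	if (!*cur)
		*cur = head;
	/* Skip ranges that end at or before pfn, but stay on the last one. */
	while ((*cur)->next && pfn >= (*cur)->pfn + (*cur)->num)
		*cur = (*cur)->next;
	return pfn >= (*cur)->pfn && pfn < (*cur)->pfn + (*cur)->num;
}

/* Free every node and reset the head. */
static void demo_free(struct demo_range **head)
{
	while (*head) {
		struct demo_range *n = *head;

		*head = n->next;
		free(n);
	}
}
```

Because the cursor only moves forward, a full ascending scan costs one pass
over the list in total instead of one pass per pfn. The cursor must be reset
to NULL before starting a new scan from a lower pfn.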

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 erase_info.c   | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++
 erase_info.h   | 12 +++++++
 makedumpfile.c | 28 ++++++++++++---
 3 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/erase_info.c b/erase_info.c
index b67d1d0..8838bea 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -2466,3 +2466,101 @@ get_size_eraseinfo(void)
 	return size_eraseinfo;
 }
 
+/* Pages to be discarded */
+static struct ft_page_info *ft_head_discard = NULL;
+/* Pages to be kept */
+static struct ft_page_info *ft_head_keep = NULL;
+
+/*
+ * Insert the ft_page_info blocks into ft_head by ascending pfn.
+ */
+bool
+update_filter_pages_info(unsigned long pfn, unsigned long num, bool to_discard)
+{
+	struct ft_page_info *p, **ft_head;
+	struct ft_page_info *new_p = malloc(sizeof(struct ft_page_info));
+
+	ft_head = to_discard ? &ft_head_discard : &ft_head_keep;
+
+	if (!new_p) {
+		ERRMSG("Can't allocate memory for ft_page_info at %lx\n", pfn);
+		return false;
+	}
+	new_p->pfn = pfn;
+	new_p->num = num;
+	new_p->next = NULL;
+
+	if (!(*ft_head) || (*ft_head)->pfn > new_p->pfn) {
+		new_p->next = (*ft_head);
+		(*ft_head) = new_p;
+		return true;
+	}
+
+	p = (*ft_head);
+	while (p->next != NULL && p->next->pfn < new_p->pfn) {
+		p = p->next;
+	}
+
+	new_p->next = p->next;
+	p->next = new_p;
+	return true;
+}
+
+/*
+ * Check if the pfn hits an ft_page_info block.
+ *
+ * pfn and ft_head are in ascending order, so save the current ft_page_info
+ * block into **p because it is likely to hit again next time.
+ */
+bool
+filter_page(unsigned long pfn, struct ft_page_info **p, bool handle_discard)
+{
+	struct ft_page_info *ft_head;
+
+	ft_head = handle_discard ? ft_head_discard : ft_head_keep;
+
+	if (ft_head == NULL)
+		return false;
+
+	if (*p == NULL)
+		*p = ft_head;
+
+	/* The gap before 1st block */
+	if (pfn >= 0 && pfn < ft_head->pfn)
+		return false;
+
+	/* Handle 1~(n-1) blocks and following gaps */
+	while ((*p)->next) {
+		if (pfn >= (*p)->pfn && pfn < (*p)->pfn + (*p)->num)
+			return true; // hit the block
+		if (pfn >= (*p)->pfn + (*p)->num && pfn < (*p)->next->pfn)
+			return false; // the gap after the block
+		*p = (*p)->next;
+	}
+
+	/* The last block and gap */
+	if (pfn >= (*p)->pfn + (*p)->num)
+		return false;
+	else
+		return true;
+}
+
+static void
+do_cleanup(struct ft_page_info **ft_head)
+{
+	struct ft_page_info *p, *p_tmp;
+
+	for (p = *ft_head; p;) {
+		p_tmp = p;
+		p = p->next;
+		free(p_tmp);
+	}
+	*ft_head = NULL;
+}
+
+void
+cleanup_filter_pages_info(void)
+{
+	do_cleanup(&ft_head_discard);
+	do_cleanup(&ft_head_keep);
+}
diff --git a/erase_info.h b/erase_info.h
index b363a40..6c60706 100644
--- a/erase_info.h
+++ b/erase_info.h
@@ -20,6 +20,7 @@
 #define _ERASE_INFO_H
 
 #define MAX_SIZE_STR_LEN (26)
+#include <stdbool.h>
 
 /*
  * Erase information, original symbol expressions.
@@ -65,5 +66,16 @@ void filter_data_buffer_parallel(unsigned char *buf, unsigned long long paddr,
 unsigned long get_size_eraseinfo(void);
 int update_filter_info_raw(unsigned long long, int, int);
 
+bool update_filter_pages_info(unsigned long, unsigned long, bool);
+
+struct ft_page_info {
+	unsigned long pfn;
+	unsigned long num;
+	struct ft_page_info *next;
+} __attribute__((packed));
+
+bool filter_page(unsigned long, struct ft_page_info **p, bool handle_discard);
+void cleanup_filter_pages_info(void);
+
 #endif /* _ERASE_INFO_H */
 
diff --git a/makedumpfile.c b/makedumpfile.c
index ca8ed8a..ebac8da 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -102,6 +102,7 @@ mdf_pfn_t pfn_free;
 mdf_pfn_t pfn_hwpoison;
 mdf_pfn_t pfn_offline;
 mdf_pfn_t pfn_elf_excluded;
+mdf_pfn_t pfn_extension;
 
 mdf_pfn_t num_dumped;
 
@@ -6459,6 +6460,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 	unsigned int order_offset, dtor_offset;
 	unsigned long flags, mapping, private = 0;
 	unsigned long compound_dtor, compound_head = 0;
+	struct ft_page_info *cur_discard = NULL;
+	struct ft_page_info *cur_keep = NULL;
 
 	/*
 	 * If a multi-page exclusion is pending, do it first
@@ -6495,6 +6498,13 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 		if (info->flag_cyclic && !is_cyclic_region(pfn, cycle))
 			continue;
 
+		/*
+		 * Keep pages that are specified by the user via
+		 * makedumpfile extensions
+		 */
+		if (filter_page(pfn, &cur_keep, false))
+			continue;
+
 		/*
 		 * Exclude the memory hole.
 		 */
@@ -6687,6 +6697,14 @@ check_order:
 		else if (isOffline(flags, _mapcount)) {
 			pfn_counter = &pfn_offline;
 		}
+		/*
+		 * Exclude pages that are specified by the user via
+		 * makedumpfile extensions
+		 */
+		else if (filter_page(pfn, &cur_discard, true)) {
+			nr_pages = 1;
+			pfn_counter = &pfn_extension;
+		}
 		/*
 		 * Unexcludable page
 		 */
@@ -6748,6 +6766,7 @@ exclude_unnecessary_pages(struct cycle *cycle)
 		print_progress(PROGRESS_UNN_PAGES, info->num_mem_map, info->num_mem_map, NULL);
 		print_execution_time(PROGRESS_UNN_PAGES, &ts_start);
 	}
+	cleanup_filter_pages_info();
 
 	return TRUE;
 }
@@ -8234,7 +8253,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	 */
 	if (info->flag_cyclic) {
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
 		pfn_memhole = info->max_mapnr;
 	}
 
@@ -9579,7 +9598,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
 		 * Reset counter for debug message.
 		 */
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
 		pfn_memhole = info->max_mapnr;
 
 		/*
@@ -10528,7 +10547,7 @@ print_report(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
 
 	REPORT_MSG("\n");
 	REPORT_MSG("Original pages  : 0x%016llx\n", pfn_original);
@@ -10544,6 +10563,7 @@ print_report(void)
 	REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
 	REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
 	REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
+	REPORT_MSG("    Extension filter pages  : 0x%016llx\n", pfn_extension);
 	REPORT_MSG("  Remaining pages  : 0x%016llx\n",
 	    pfn_original - pfn_excluded);
 
@@ -10584,7 +10604,7 @@ print_mem_usage(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
 	shrinking = (pfn_original - pfn_excluded) * 100;
 	shrinking = shrinking / pfn_original;
 	total_size = info->page_size * pfn_original;
-- 
2.47.0




* [PATCH v3 7/8] Add maple tree support to makedumpfile extension
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (5 preceding siblings ...)
  2026-01-20  2:54 ` [PATCH v3 6/8] Add page filtering function Tao Liu
@ 2026-01-20  2:54 ` Tao Liu
  2026-01-20  2:55 ` [PATCH v3 8/8] Filter amdgpu mm pages Tao Liu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:54 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 extensions/maple_tree.c | 336 ++++++++++++++++++++++++++++++++++++++++
 extensions/maple_tree.h |   6 +
 2 files changed, 342 insertions(+)
 create mode 100644 extensions/maple_tree.c
 create mode 100644 extensions/maple_tree.h

diff --git a/extensions/maple_tree.c b/extensions/maple_tree.c
new file mode 100644
index 0000000..0cc65bc
--- /dev/null
+++ b/extensions/maple_tree.c
@@ -0,0 +1,336 @@
+#include <stdio.h>
+#include <stdbool.h>
+#include "../btf_info.h"
+#include "../kallsyms.h"
+#include "../makedumpfile.h"
+
+static unsigned char mt_slots[4] = {0};
+static unsigned char mt_pivots[4] = {0};
+static unsigned long mt_max[4] = {0};
+
+static int maple_tree_size;
+static int maple_node_size;
+static int maple_tree_ma_root;
+static int maple_tree_ma_flags;
+static int maple_node_parent;
+static int maple_node_ma64;
+static int maple_node_mr64;
+static int maple_node_slot;
+static int maple_arange_64_pivot;
+static int maple_arange_64_slot;
+static int maple_arange_64_gap;
+static int maple_arange_64_meta;
+static int maple_range_64_pivot;
+static int maple_range_64_slot;
+static int maple_metadata_end;
+static int maple_metadata_gap;
+
+#define MAPLE_BUFSIZE			512
+
+enum {
+	maple_dense_enum,
+	maple_leaf_64_enum,
+	maple_range_64_enum,
+	maple_arange_64_enum,
+};
+
+#define MAPLE_NODE_MASK			255UL
+#define MAPLE_NODE_TYPE_MASK		0x0F
+#define MAPLE_NODE_TYPE_SHIFT		0x03
+#define XA_ZERO_ENTRY			xa_mk_internal(257)
+
+static unsigned long xa_mk_internal(unsigned long v)
+{
+	return (v << 2) | 2;
+}
+
+static bool xa_is_internal(unsigned long entry)
+{
+	return (entry & 3) == 2;
+}
+
+static bool xa_is_node(unsigned long entry)
+{
+	return xa_is_internal(entry) && entry > 4096;
+}
+
+static bool xa_is_value(unsigned long entry)
+{
+	return entry & 1;
+}
+
+static bool xa_is_zero(unsigned long entry)
+{
+	return entry == XA_ZERO_ENTRY;
+}
+
+static unsigned long xa_to_internal(unsigned long entry)
+{
+	return entry >> 2;
+}
+
+static unsigned long xa_to_value(unsigned long entry)
+{
+	return entry >> 1;
+}
+
+static unsigned long mte_to_node(unsigned long entry)
+{
+        return entry & ~MAPLE_NODE_MASK;
+}
+
+static unsigned long mte_node_type(unsigned long maple_enode_entry)
+{
+	return (maple_enode_entry >> MAPLE_NODE_TYPE_SHIFT) &
+		MAPLE_NODE_TYPE_MASK;
+}
+
+static unsigned long mt_slot(void **slots, unsigned char offset)
+{
+       return (unsigned long)slots[offset];
+}
+
+static bool ma_is_leaf(unsigned long type)
+{
+	return type < maple_range_64_enum;
+}
+
+static bool mte_is_leaf(unsigned long maple_enode_entry)
+{
+       return ma_is_leaf(mte_node_type(maple_enode_entry));
+}
+
+static void mt_dump_entry(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long *tmp;
+	int new_cap = 0;
+
+	if (entry == 0)
+		return;
+	if (*array_out == NULL) {
+		*array_len = 0;
+		new_cap = 4;
+	} else if (*array_len >= *array_cap) {
+		new_cap = *array_cap + (*array_cap >> 1);
+	}
+
+	if (new_cap) {
+		tmp = reallocarray(*array_out, new_cap, sizeof(unsigned long));
+		if (!tmp)
+			goto no_mem;
+		*array_out = tmp;
+		*array_cap = new_cap;
+	}
+
+	(*array_out)[(*array_len)++] = entry;
+	return;
+
+no_mem:
+	printf("%s: Not enough memory!\n", __func__);
+}
+
+static void mt_dump_node(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap);
+
+static void mt_dump_range64(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node_m_node = mte_to_node(entry);
+	char node_buf[MAPLE_BUFSIZE];
+	bool leaf = mte_is_leaf(entry);
+	unsigned long first = min, last;
+	int i;
+	char *mr64_buf;
+
+	readmem(VADDR, maple_node_m_node, node_buf, maple_node_size);
+	mr64_buf = node_buf + maple_node_mr64;
+
+	for (i = 0; i < mt_slots[maple_range_64_enum]; i++) {
+		last = max;
+
+		if (i < (mt_slots[maple_range_64_enum] - 1))
+			last = ULONG(mr64_buf + maple_range_64_pivot +
+				     sizeof(ulong) * i);
+
+		else if (!VOID_PTR(mr64_buf + maple_range_64_slot +
+			  sizeof(void *) * i) &&
+			 max != mt_max[mte_node_type(entry)])
+			break;
+		if (last == 0 && i > 0)
+			break;
+		if (leaf)
+			mt_dump_entry(mt_slot((void **)(mr64_buf +
+						      maple_range_64_slot), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+		else if (VOID_PTR(mr64_buf + maple_range_64_slot +
+				  sizeof(void *) * i)) {
+			mt_dump_node(mt_slot((void **)(mr64_buf +
+						     maple_range_64_slot), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+		}
+
+		if (last == max)
+			break;
+		if (last > max) {
+			printf("node %p last (%lu) > max (%lu) at pivot %d!\n",
+				mr64_buf, last, max, i);
+			break;
+		}
+		first = last + 1;
+	}
+}
+
+static void mt_dump_arange64(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node_m_node = mte_to_node(entry);
+	char node_buf[MAPLE_BUFSIZE];
+	unsigned long first = min, last;
+	int i;
+	char *ma64_buf;
+
+	readmem(VADDR, maple_node_m_node, node_buf, maple_node_size);
+	ma64_buf = node_buf + maple_node_ma64;
+
+	for (i = 0; i < mt_slots[maple_arange_64_enum]; i++) {
+		last = max;
+
+		if (i < (mt_slots[maple_arange_64_enum] - 1))
+			last = ULONG(ma64_buf + maple_arange_64_pivot +
+				     sizeof(void *) * i);
+		else if (!VOID_PTR(ma64_buf + maple_arange_64_slot +
+				   sizeof(void *) * i))
+			break;
+		if (last == 0 && i > 0)
+			break;
+
+		if (ULONG(ma64_buf + maple_arange_64_slot + sizeof(void *) * i))
+			mt_dump_node(mt_slot((void **)(ma64_buf +
+						      maple_arange_64_slot), i),
+				first, last, depth + 1, array_out, array_len, array_cap);
+
+		if (last == max)
+			break;
+		if (last > max) {
+			printf("node %p last (%lu) > max (%lu) at pivot %d!\n",
+				ma64_buf, last, max, i);
+			break;
+		}
+		first = last + 1;
+	}
+}
+
+static void mt_dump_node(unsigned long entry, unsigned long min,
+			unsigned long max, unsigned int depth,
+			unsigned long **array_out, int *array_len,
+			int *array_cap)
+{
+	unsigned long maple_node = mte_to_node(entry);
+	unsigned long type = mte_node_type(entry);
+	int i;
+	char node_buf[MAPLE_BUFSIZE];
+
+	readmem(VADDR, maple_node, node_buf, maple_node_size);
+
+	switch (type) {
+	case maple_dense_enum:
+		for (i = 0; i < mt_slots[maple_dense_enum]; i++) {
+			if (min + i > max)
+				printf("OUT OF RANGE: ");
+			mt_dump_entry(mt_slot((void **)(node_buf + maple_node_slot), i),
+				min + i, min + i, depth, array_out, array_len, array_cap);
+		}
+		break;
+	case maple_leaf_64_enum:
+	case maple_range_64_enum:
+		mt_dump_range64(entry, min, max, depth, array_out, array_len, array_cap);
+		break;
+	case maple_arange_64_enum:
+		mt_dump_arange64(entry, min, max, depth, array_out, array_len, array_cap);
+		break;
+	default:
+		printf(" UNKNOWN TYPE\n");
+	}	
+}
+
+unsigned long *mt_dump(unsigned long mt, int *array_len)
+{
+	char tree_buf[MAPLE_BUFSIZE];
+	unsigned long entry;
+	unsigned long *array_out = NULL;
+	int array_cap = 0;
+	*array_len = 0;
+
+	readmem(VADDR, mt, tree_buf, maple_tree_size);
+	entry = ULONG(tree_buf + maple_tree_ma_root);
+
+	if (xa_is_node(entry))
+		mt_dump_node(entry, 0, mt_max[mte_node_type(entry)], 0,
+				&array_out, array_len, &array_cap);
+	else if (entry)
+		mt_dump_entry(entry, 0, 0, 0, &array_out, array_len, &array_cap);
+	else
+		printf("(empty)\n");
+
+	return array_out;
+}
+
+#define MAPLE_STRUCT_SIZE(S) \
+	({INIT_STRUCT(S); GET_STRUCT_SSIZE(S);})
+#define MAPLE_STRUCT_MEMBER_OFFSET(S, M) \
+	({INIT_STRUCT_MEMBER(S, M); GET_STRUCT_MEMBER_MOFF(S, M) / 8;})
+
+bool maple_init(void)
+{
+	unsigned long mt_slots_ptr;
+	unsigned long mt_pivots_ptr;
+	struct struct_member_info smi;
+
+	maple_tree_size = MAPLE_STRUCT_SIZE(maple_tree);
+	maple_node_size = MAPLE_STRUCT_SIZE(maple_node);
+	maple_tree_ma_root = MAPLE_STRUCT_MEMBER_OFFSET(maple_tree, ma_root);
+	maple_tree_ma_flags = MAPLE_STRUCT_MEMBER_OFFSET(maple_tree, ma_flags);
+	maple_node_parent = MAPLE_STRUCT_MEMBER_OFFSET(maple_node, parent);
+	maple_node_ma64 = MAPLE_STRUCT_MEMBER_OFFSET(maple_node, ma64);
+	maple_node_mr64 = MAPLE_STRUCT_MEMBER_OFFSET(maple_node, mr64);
+	maple_node_slot = MAPLE_STRUCT_MEMBER_OFFSET(maple_node, slot);
+	maple_arange_64_pivot = MAPLE_STRUCT_MEMBER_OFFSET(maple_arange_64, pivot);
+	maple_arange_64_slot = MAPLE_STRUCT_MEMBER_OFFSET(maple_arange_64, slot);
+	maple_arange_64_gap = MAPLE_STRUCT_MEMBER_OFFSET(maple_arange_64, gap);
+	maple_arange_64_meta = MAPLE_STRUCT_MEMBER_OFFSET(maple_arange_64, meta);
+	maple_range_64_pivot = MAPLE_STRUCT_MEMBER_OFFSET(maple_range_64, pivot);
+	maple_range_64_slot = MAPLE_STRUCT_MEMBER_OFFSET(maple_range_64, slot);
+	maple_metadata_end = MAPLE_STRUCT_MEMBER_OFFSET(maple_metadata, end);
+	maple_metadata_gap = MAPLE_STRUCT_MEMBER_OFFSET(maple_metadata, gap);
+
+	mt_slots_ptr = get_kallsyms_value_by_name("mt_slots");
+	mt_pivots_ptr = get_kallsyms_value_by_name("mt_pivots");
+	if (mt_slots_ptr == 0 || mt_pivots_ptr == 0) {
+		printf("Invalid mt_slots/mt_pivots address\n");
+		return false;
+	}
+	if (maple_tree_size > MAPLE_BUFSIZE ||
+	    maple_node_size > MAPLE_BUFSIZE) {
+		printf("MAPLE_BUFSIZE should be larger than maple_node/tree struct\n");
+		return false;
+	}
+
+	readmem(VADDR, mt_slots_ptr, mt_slots, sizeof(mt_slots));
+	readmem(VADDR, mt_pivots_ptr, mt_pivots, sizeof(mt_pivots));
+
+	mt_max[maple_dense_enum]           = mt_slots[maple_dense_enum];
+	mt_max[maple_leaf_64_enum]         = ULONG_MAX;
+	mt_max[maple_range_64_enum]        = ULONG_MAX;
+	mt_max[maple_arange_64_enum]       = ULONG_MAX;
+
+	return true;
+}
diff --git a/extensions/maple_tree.h b/extensions/maple_tree.h
new file mode 100644
index 0000000..c96624c
--- /dev/null
+++ b/extensions/maple_tree.h
@@ -0,0 +1,6 @@
+#ifndef _MAPLE_TREE_H
+#define _MAPLE_TREE_H
+#include <stdbool.h>
+unsigned long *mt_dump(unsigned long mt, int *array_len);
+bool maple_init(void);
+#endif /* _MAPLE_TREE_H */
\ No newline at end of file
-- 
2.47.0




* [PATCH v3 8/8] Filter amdgpu mm pages
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (6 preceding siblings ...)
  2026-01-20  2:54 ` [PATCH v3 7/8] Add maple tree support to makedumpfile extension Tao Liu
@ 2026-01-20  2:55 ` Tao Liu
  2026-01-20  4:39 ` [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
  2026-01-29 10:19 ` YAMAZAKI MASAMITSU(山崎　真光)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  2:55 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan, Tao Liu

This patch introduces an amdgpu mm page filtering extension. The mm pages
allocated to amdgpu are discarded from the vmcore in order to shrink the
vmcore size, since these pages are useless for kernel crash analysis and
may contain sensitive data.

Signed-off-by: Tao Liu <ltao@redhat.com>
---
 extensions/Makefile        |  4 +-
 extensions/amdgpu_filter.c | 90 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 extensions/amdgpu_filter.c

diff --git a/extensions/Makefile b/extensions/Makefile
index afbc61e..81dda31 100644
--- a/extensions/Makefile
+++ b/extensions/Makefile
@@ -1,9 +1,9 @@
 CC ?= gcc
-CONTRIB_SO :=
+CONTRIB_SO := amdgpu_filter.so
 
 all: $(CONTRIB_SO)
 
-$(CONTRIB_SO): %.so: %.c
+$(CONTRIB_SO): %.so: %.c maple_tree.c
 	$(CC) -O2 -g -fPIC -shared -o $@ $^
 
 clean:
diff --git a/extensions/amdgpu_filter.c b/extensions/amdgpu_filter.c
new file mode 100644
index 0000000..433cc0b
--- /dev/null
+++ b/extensions/amdgpu_filter.c
@@ -0,0 +1,90 @@
+#include <stdio.h>
+#include "maple_tree.h"
+#include "../makedumpfile.h"
+#include "../btf_info.h"
+#include "../kallsyms.h"
+#include "../erase_info.h"
+
+#define MEMBER_OFF(S, M) \
+	(GET_STRUCT_MEMBER_MOFF(S, M) / 8)
+#define MOD_MEMBER_OFF(MOD, S, M) \
+	(GET_MOD_STRUCT_MEMBER_MOFF(MOD, S, M) / 8)
+
+static void do_filter(void)
+{
+	uint64_t init_task, list, list_offset, amdgpu_gem_vm_ops;
+	uint64_t mm, vm_ops, tbo, ttm, num_pages, pages, pfn, vmemmap_base;
+	struct struct_member_info smi;
+	int array_len;
+	unsigned long *array_out;
+	init_task = get_kallsyms_value_by_name("init_task");
+	amdgpu_gem_vm_ops = get_kallsyms_value_by_name("amdgpu_gem_vm_ops");
+
+	INIT_STRUCT_MEMBER(task_struct, tasks);
+	INIT_STRUCT_MEMBER(task_struct, mm);
+	INIT_STRUCT_MEMBER(mm_struct, mm_mt);
+	INIT_STRUCT_MEMBER(vm_area_struct, vm_ops);
+	INIT_STRUCT_MEMBER(vm_area_struct, vm_private_data);
+	INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_buffer_object, ttm);
+	INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_tt, pages);
+	INIT_MOD_STRUCT_MEMBER(amdgpu, ttm_tt, num_pages);
+	INIT_STRUCT(page);
+
+	list = init_task + MEMBER_OFF(task_struct, tasks);
+
+	do {
+		readmem(VADDR, list - MEMBER_OFF(task_struct, tasks) + 
+				MEMBER_OFF(task_struct, mm),
+			&mm, sizeof(uint64_t));
+		if (!mm) {
+			list = next_list(list);
+			continue;
+		}
+
+		array_out = mt_dump(mm + MEMBER_OFF(mm_struct, mm_mt), &array_len);
+		if (!array_out)
+			return;
+
+		for (int i = 0; i < array_len; i++) {
+			num_pages = 0;
+			readmem(VADDR, array_out[i] + MEMBER_OFF(vm_area_struct, vm_ops),
+				&vm_ops, GET_STRUCT_MEMBER_MSIZE(vm_area_struct, vm_ops));
+			if (vm_ops == amdgpu_gem_vm_ops) {
+				readmem(VADDR, array_out[i] +
+					MEMBER_OFF(vm_area_struct, vm_private_data),
+					&tbo, GET_STRUCT_MEMBER_MSIZE(vm_area_struct, vm_private_data));
+				readmem(VADDR, tbo + MOD_MEMBER_OFF(amdgpu, ttm_buffer_object, ttm),
+					&ttm, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_buffer_object, ttm));
+				if (ttm) {
+					readmem(VADDR, ttm + MOD_MEMBER_OFF(amdgpu, ttm_tt, num_pages),
+						&num_pages, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_tt, num_pages));
+					readmem(VADDR, ttm + MOD_MEMBER_OFF(amdgpu, ttm_tt, pages),
+						&pages, GET_MOD_STRUCT_MEMBER_MSIZE(amdgpu, ttm_tt, pages));
+					readmem(VADDR, pages, &pages, sizeof(unsigned long));
+					readmem(VADDR, get_kallsyms_value_by_name("vmemmap_base"),
+						&vmemmap_base, sizeof(unsigned long));
+					pfn = (pages - vmemmap_base) / GET_STRUCT_SSIZE(page);
+					update_filter_pages_info(pfn, num_pages, true);
+				}
+			}
+		}
+
+		free(array_out);
+		list = next_list(list);
+	} while (list != init_task + MEMBER_OFF(task_struct, tasks));
+
+	return;
+}
+
+__attribute__((constructor))
+static void amdgpu_mmpage_filter_constructor(void)
+{
+	if (!maple_init())
+		goto out;
+	do_filter();
+out:
+	return;
+}
+
+__attribute__((destructor))
+static void amdgpu_mmpage_filter_destructor(void) {}
\ No newline at end of file
-- 
2.47.0




* Re: [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (7 preceding siblings ...)
  2026-01-20  2:55 ` [PATCH v3 8/8] Filter amdgpu mm pages Tao Liu
@ 2026-01-20  4:39 ` Tao Liu
  2026-01-29 10:19 ` YAMAZAKI MASAMITSU(山崎　真光)
  9 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-20  4:39 UTC (permalink / raw)
  To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, stephen.s.brennan

Forgot to mention: this patchset is for makedumpfile only.

On Tue, Jan 20, 2026 at 4:39 PM Tao Liu <ltao@redhat.com> wrote:
>
> A) This patchset will introduce the following features to makedumpfile:
>
>   1) Add .so extension support to makedumpfile
>   2) Enable btf and kallsyms for symbol type and address resolving.
>
> B) The purpose of the features are:
>
>   1) Currently makedumpfile filters mm pages based on page flags, because flags
>      can help to determine one page's usage. But this page-flag-checking method
>      lacks flexibility in certain cases, e.g. if we want to filter those mm
>      pages occupied by GPU during vmcore dumping due to:
>
>      a) GPU may be taking a large memory and contains sensitive data;
>      b) GPU mm pages have no relations to kernel crash and useless for vmcore
>         analysis.
>
>      But there is no GPU mm page specific flags, and apparently we don't need
>      to create one just for kdump use. A programmable filtering tool is more
>      suitable for such cases. In addition, different GPU vendors may use
>      different ways for mm pages allocating, programmable filtering is better
>      than hard coding these GPU specific logics into makedumpfile in this case.
>
>   2) Currently makedumpfile already contains a programmable filtering tool, aka
>      eppic script, which allows user to write customized code for data erasing.
>      However it has the following drawbacks:
>
>      a) cannot do mm page filtering.
>      b) needs access to debuginfo of both kernel and modules, which is not
>         applicable in the 2nd kernel.
>      c) eppic library has memory leaks which are not all resolved [1]. This
>         is not acceptable in 2nd kernel.
>
>      makedumpfile needs to resolve the dwarf data from debuginfo to get symbol
>      types and addresses. In recent kernels there are dwarf alternatives such
>      as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>      is already packed within vmcore, so we can use it directly.
>
>   With these, this patchset introduces makedumpfile extensions, which is based
>   on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>   The following section shows its usage and performance, please note the tests
>   are performed in 1st kernel.
>
>   3) Compile and run makedumpfile extensions:
>
>   $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on
>   $ make extensions
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/extension.out
>     Loaded extension: ./extensions/amdgpu_filter.so
>     makedumpfile Completed.
>         User time (seconds): 6.37
>         System time (seconds): 0.70
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.10
>         Maximum resident set size (kbytes): 38024
>         ...
>
>      To contrast with eppic script of v2 [2]:
>
>   $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>     /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>     makedumpfile Completed.
>         User time (seconds): 8.23
>         System time (seconds): 0.88
>         Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
>         Maximum resident set size (kbytes): 57128
>         ...
>
>   -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
>   -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
>   -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
>   1) GPU types: Currently only tested with amdgpu's mm page filtering, others
>      are not tested.
>   2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>      Others are not tested.
>
> D) Testing:
>
>      If you don't want to create your vmcore, you can find a vmcore which I
>      created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
>      allocated by program [4]. You can use the vmcore in 1st kernel to filter
>      the amdgpu mm pages by the previous performance testing cmdline. To
>      verify the pages are filtered in crash:
>
>      Unfiltered:
>      crash> search -c "!QAZXSW@#EDC"
>      ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>      crash> rd ffff96b7fa800000
>      ffff96b7fa800000:  405753585a415121                    !QAZXSW@
>      crash> rd ffff96b87c800000
>      ffff96b87c800000:  405753585a415121                    !QAZXSW@
>
>      Filtered:
>      crash> search -c "!QAZXSW@#EDC"
>      crash> rd ffff96b7fa800000
>      rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
>      crash> rd ffff96b87c800000
>      rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
>
> [1]: https://github.com/lucchouina/eppic/pull/32
> [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> [3]: https://people.redhat.com/~ltao/core/vmcore
> [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> v3 -> v2:
>
> 1) Removed btf/kallsyms support for eppic script, and introduced
>    makedumpfile .so extension instead. The reason of removing eppic
>    support is:
>    a) Native binary code as .so has better performance than scripting,
>       see the time consumption contrast above.
>    b) Eppic library has memory leaks which haven't been fully fixed;
>       memory leaks in the 2nd kernel might be fatal.
>
> 2) Removed the code of manually parsing btf info, and used libbpf for
>    btf info parsing instead. The reason of removing manually parsing is:
>    a) Less code modification to makedumpfile, easier to maintain.
>    b) The performance of using libbpf is as good as manual parsing +
>       hash table indexing, as well as less memory consumption, see time
>       and memory consumption contrast above.
>
> 3) The patches are organized as follows:
>
>     --- <only for test purpose, don't merge> ---
>     8.Filter amdgpu mm pages
>     7.Add maple tree support to makedumpfile extension
>
>     --- <code should be merged> ---
>     6.Add page filtering function
>     5.Add makedumpfile extension support
>     4.Implement kernel modules' btf resolving
>     3.Implement kernel modules' kallsyms resolving
>     2.Implement kernel btf resolving
>     1.Implement kernel kallsyms resolving
>
>     Patches 7 & 8 are customization specific and can be maintained separately.
>     Patches 1 ~ 6 are common code which should be integrated into makedumpfile.
>
> Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
>
> Tao Liu (8):
>   Implement kernel kallsyms resolving
>   Implement kernel btf resolving
>   Implement kernel modules' kallsyms resolving
>   Implement kernel modules' btf resolving
>   Add makedumpfile extension support
>   Add page filtering function
>   Add maple tree support to makedumpfile extension
>   Filter amdgpu mm pages
>
>  Makefile                   |   9 +-
>  btf_info.c                 | 260 +++++++++++++++++++++++++
>  btf_info.h                 |  66 +++++++
>  erase_info.c               |  98 ++++++++++
>  erase_info.h               |  12 ++
>  extension.c                |  82 ++++++++
>  extensions/Makefile        |  10 +
>  extensions/amdgpu_filter.c |  90 +++++++++
>  extensions/maple_tree.c    | 336 +++++++++++++++++++++++++++++++++
>  extensions/maple_tree.h    |   6 +
>  kallsyms.c                 | 376 +++++++++++++++++++++++++++++++++++++
>  kallsyms.h                 |  20 ++
>  makedumpfile.c             |  35 +++-
>  makedumpfile.h             |  11 ++
>  14 files changed, 1405 insertions(+), 6 deletions(-)
>  create mode 100644 btf_info.c
>  create mode 100644 btf_info.h
>  create mode 100644 extension.c
>  create mode 100644 extensions/Makefile
>  create mode 100644 extensions/amdgpu_filter.c
>  create mode 100644 extensions/maple_tree.c
>  create mode 100644 extensions/maple_tree.h
>  create mode 100644 kallsyms.c
>  create mode 100644 kallsyms.h
>
> --
> 2.47.0
>




* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-01-20  2:54 ` [PATCH v3 5/8] Add makedumpfile extension support Tao Liu
@ 2026-01-22  0:51   ` Stephen Brennan
  2026-01-22 13:43     ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Brennan @ 2026-01-22  0:51 UTC (permalink / raw)
  To: Tao Liu, yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, Tao Liu

Hi Tao,

This series looks really great -- I'm excited to see the switch to
native .so extensions instead of eppic. I've applied the series locally
and I'll rebuild my userspace stack inclusion feature based on it, to
try it out myself.

In the meantime, I'll share some of my feedback on the patches (though
I'm not a makedumpfile developer). This seems like the most important
patch in terms of design, so I'll start here.

Tao Liu <ltao@redhat.com> writes:
> This patch will add .so extension support to makedumpfile, similar to crash
> extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
> and "./extensions" are searched for extensions. Once found, kallsyms and btf
> will be initialized so all extensions can benefit from them (currently makedumpfile
> doesn't use this info; we can move the kallsyms/btf init code elsewhere later
> if makedumpfile needs it).
>
> The makedumpfile extension helps users customize mm page filtering on top of
> the traditional mm page flag filtering, without modifying makedumpfile
> itself.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
>  Makefile            |  7 +++-
>  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
>  extensions/Makefile | 10 ++++++
>  makedumpfile.c      |  4 +++
>  4 files changed, 102 insertions(+), 1 deletion(-)
>  create mode 100644 extension.c
>  create mode 100644 extensions/Makefile
>
> diff --git a/Makefile b/Makefile
> index f3f4da8..7e29220 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
>  endif
>  
>  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
> +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
>  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
>  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
>  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
>  
>  clean:
>  	rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
> +	$(MAKE) -C extensions clean
>  
>  install:
>  	install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
> @@ -135,3 +136,7 @@ install:
>  	mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
>  	install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
>  	install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
> +
> +.PHONY: extensions
> +extensions:
> +	$(MAKE) -C extensions CC=$(CC)
> \ No newline at end of file
> diff --git a/extension.c b/extension.c
> new file mode 100644
> index 0000000..6ee7f4e
> --- /dev/null
> +++ b/extension.c
> @@ -0,0 +1,82 @@
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <dirent.h>
> +#include <dlfcn.h>
> +#include <stdbool.h>
> +#include "kallsyms.h"
> +#include "btf_info.h"
> +
> +static const char *dirs[] = {
> +	"/usr/lib64/makedumpfile/extensions",
> +	"./extensions",
> +};
> +
> +/* Will only init once */
> +static bool init_kallsyms_btf(void)
> +{
> +	static bool ret = false;
> +	static bool has_inited = false;
> +
> +	if (has_inited)
> +		goto out;
> +	if (!init_kernel_kallsyms())
> +		goto out;
> +	if (!init_kernel_btf())
> +		goto out;
> +	if (!init_module_kallsyms())
> +		goto out;
> +	if (!init_module_btf())
> +		goto out;
> +	ret = true;

I feel it would be good practice to load as little information as is
necessary for the task. If "amdgpu" module is required, then load kernel
kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
debuginfo is required, then just the kernel would suffice.

This would reduce memory usage and runtime, though I don't know if it
would show up in profiling. The main benefit could be reliability: by
handling less data, there are fewer chances to hit an error.

> +out:
> +	has_inited = true;
> +	return ret;
> +}
> +
> +static void cleanup_kallsyms_btf(void)
> +{
> +	cleanup_kallsyms();
> +	cleanup_btf();
> +}
> +
> +void run_extensions(void)
> +{
> +	DIR *dir;
> +	struct dirent *entry;
> +	size_t len;
> +	int i;
> +	void *handle;
> +	char path[512];
> +
> +	for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
> +		if ((dir = opendir(dirs[i])) != NULL)
> +			break;
> +	}
> +
> +	if (!dir || i >= sizeof(dirs) / sizeof(char *))
> +		/* No extensions found */
> +		return;

It could be confusing that makedumpfile would behave differently with
the same command-line arguments depending on the presence or absence of
these extensions on the filesystem.

I think it may fit users' expectations better if they are required to
specify extensions on the command line. Then we could load them by
searching each directory in order. This allows:

(a) more expected behavior
(b) multiple extensions can exist without all being enabled, thus more
    flexibility
(c) extensions can be present in the local "extensions/" directory, or
    in the system directory
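
As a rough sketch of the lookup described above, assuming the user passes
extension file names through some command-line option (no such option exists
in this series, and find_extension() is a hypothetical helper), each name
could be resolved against the search directories in order:

```c
#include <stdio.h>
#include <unistd.h>

/* Search order taken from the dirs[] array in this patch. */
const char *const ext_dirs[] = {
	"/usr/lib64/makedumpfile/extensions",
	"./extensions",
};

/*
 * Hypothetical helper: look for `name` in each directory in order.
 * On success, writes the full path into `path` and returns 0.
 * Returns -1 if the file is not readable in any directory.
 */
static int find_extension(const char *name, const char *const *dirs,
			  size_t ndirs, char *path, size_t len)
{
	for (size_t i = 0; i < ndirs; i++) {
		int n = snprintf(path, len, "%s/%s", dirs[i], name);

		if (n < 0 || (size_t)n >= len)
			continue;	/* path would be truncated, skip it */
		if (access(path, R_OK) == 0)
			return 0;	/* first match wins */
	}
	return -1;
}
```

Only the names given on the command line would then be passed to dlopen(),
so an unused .so sitting in either directory would have no effect.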

> +	while ((entry = readdir(dir)) != NULL) {
> +		len = strlen(entry->d_name);
> +		if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
> +			/* Will only init when .so exist */
> +			if (!init_kallsyms_btf())
> +				goto out;
> +
> +			snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
> +			handle = dlopen(path, RTLD_NOW);
> +			if (!handle) {
> +				fprintf(stderr, "%s: Failed to load %s: %s\n",
> +					__func__, path, dlerror());
> +				continue;
> +			}
> +			printf("Loaded extension: %s\n", path);
> +			dlclose(handle);

Using the constructor/destructor of the shared object is clever! But we
lose some flexibility: by the time the dlopen() returns, the constructor
has executed and the plugin has thus executed.

What if we instead use dlsym() to load some symbols from the DSO? In
particular, I think it would be useful if extensions could declare a
list of symbols and a list of structure information which they are
interested in receiving. We could use these lists to know which
kernel/module kallsyms & BTF we should load. We could even load the
information into the local variables of the extension, so the extension
would not need to manually load it.

Of course this is more complex, but the benefit is:

1. Extensions can be written more simply, and would not need to manually
load each symbol & type.
2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
the loading of unnecessary module information. Instead, we'd just
populate the symbol addresses, struct offsets, and type sizes directly
into the local variables which request them.

Again, while I don't want to prematurely optimize -- it's good to avoid
loading unnecessary information. I hope I've described my idea well. I
would be happy to work on an implementation of it based on your patches
here, if you're interested.
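
To make the idea more concrete, here is a minimal sketch. Everything in it
is hypothetical: the struct ext_sym_request layout, the exported table name
ext_sym_requests, and the lookup callback are assumptions for illustration,
not part of the posted patches. In a real implementation the host would
obtain the table with dlsym() and use its kallsyms code as the lookup;
demo_lookup() below is only a stand-in with made-up addresses.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request record that an extension would export. */
struct ext_sym_request {
	const char *name;	/* kernel symbol the extension needs */
	uint64_t *addr;		/* where the host stores the address */
};

/* --- extension side: declare what is needed, no manual loading --- */
static uint64_t init_task_addr, vmemmap_base_addr;

struct ext_sym_request ext_sym_requests[] = {
	{ "init_task",    &init_task_addr },
	{ "vmemmap_base", &vmemmap_base_addr },
	{ NULL, NULL },		/* terminator */
};

/* --- host side: resolve every request before running the extension --- */
typedef int (*sym_lookup_fn)(const char *name, uint64_t *addr);

/* Returns the number of symbols that could not be resolved. */
static int fill_sym_requests(struct ext_sym_request *req, sym_lookup_fn lookup)
{
	int missing = 0;

	for (; req->name; req++)
		if (lookup(req->name, req->addr) != 0)
			missing++;
	return missing;
}

/* Stand-in for the real kallsyms lookup; the addresses are fake. */
static int demo_lookup(const char *name, uint64_t *addr)
{
	if (strcmp(name, "init_task") == 0) {
		*addr = 0x1000;
		return 0;
	}
	if (strcmp(name, "vmemmap_base") == 0) {
		*addr = 0x2000;
		return 0;
	}
	return -1;
}
```

A similar table could describe struct member offsets and type sizes, and the
set of module names appearing in the requests would tell the host which
module kallsyms & BTF need loading at all.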

Thanks,
Stephen

> +		}
> +	}
> +out:
> +	closedir(dir);
> +	cleanup_kallsyms_btf();
> +}
> \ No newline at end of file
> diff --git a/extensions/Makefile b/extensions/Makefile
> new file mode 100644
> index 0000000..afbc61e
> --- /dev/null
> +++ b/extensions/Makefile
> @@ -0,0 +1,10 @@
> +CC ?= gcc
> +CONTRIB_SO :=
> +
> +all: $(CONTRIB_SO)
> +
> +$(CONTRIB_SO): %.so: %.c
> +	$(CC) -O2 -g -fPIC -shared -o $@ $^
> +
> +clean:
> +	rm -f $(CONTRIB_SO)
> diff --git a/makedumpfile.c b/makedumpfile.c
> index dba3628..ca8ed8a 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -10847,6 +10847,8 @@ update_dump_level(void)
>  	}
>  }
>  
> +void run_extensions(void);
> +
>  int
>  create_dumpfile(void)
>  {
> @@ -10884,6 +10886,8 @@ retry:
>  	if (info->flag_refiltering)
>  		update_dump_level();
>  
> +	run_extensions();
> +
>  	if ((info->name_filterconfig || info->name_eppic_config)
>  			&& !gather_filter_info())
>  		return FALSE;
> -- 
> 2.47.0



* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-01-22  0:51   ` Stephen Brennan
@ 2026-01-22 13:43     ` Tao Liu
  2026-02-04  8:40       ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-01-22 13:43 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

Thanks a lot for your quick reply and detailed information, I really
appreciate it!

On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Hi Tao,
>
> This series looks really great -- I'm excited to see the switch to
> native .so extensions instead of eppic. I've applied the series locally
> and I'll rebuild my userspace stack inclusion feature based on it, to
> try it out myself.

Awesome, looking forward to your feedback on the code/API designs etc...

>
> In the meantime, I'll share some of my feedback on the patches (though
> I'm not a makedumpfile developer). This seems like the most important
> patch in terms of design, so I'll start here.
>
> Tao Liu <ltao@redhat.com> writes:
> > This patch will add .so extension support to makedumpfile, similar to crash
> > extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
> > and "./extensions" are searched for extensions. Once found, kallsyms and btf
> > will be initialized so all extensions can benefit from them (currently makedumpfile
> > doesn't use this info; we can move the kallsyms/btf init code elsewhere later
> > if makedumpfile needs it).
> >
> > The makedumpfile extension helps users customize mm page filtering on top of
> > the traditional mm page flag filtering, without modifying makedumpfile
> > itself.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> >  Makefile            |  7 +++-
> >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
> >  extensions/Makefile | 10 ++++++
> >  makedumpfile.c      |  4 +++
> >  4 files changed, 102 insertions(+), 1 deletion(-)
> >  create mode 100644 extension.c
> >  create mode 100644 extensions/Makefile
> >
> > diff --git a/Makefile b/Makefile
> > index f3f4da8..7e29220 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
> >  endif
> >
> >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
> > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
> >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
> >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
> >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
> >
> >  clean:
> >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
> > +     $(MAKE) -C extensions clean
> >
> >  install:
> >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
> > @@ -135,3 +136,7 @@ install:
> >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
> >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
> >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
> > +
> > +.PHONY: extensions
> > +extensions:
> > +     $(MAKE) -C extensions CC=$(CC)
> > \ No newline at end of file
> > diff --git a/extension.c b/extension.c
> > new file mode 100644
> > index 0000000..6ee7f4e
> > --- /dev/null
> > +++ b/extension.c
> > @@ -0,0 +1,82 @@
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <dirent.h>
> > +#include <dlfcn.h>
> > +#include <stdbool.h>
> > +#include "kallsyms.h"
> > +#include "btf_info.h"
> > +
> > +static const char *dirs[] = {
> > +     "/usr/lib64/makedumpfile/extensions",
> > +     "./extensions",
> > +};
> > +
> > +/* Will only init once */
> > +static bool init_kallsyms_btf(void)
> > +{
> > +     static bool ret = false;
> > +     static bool has_inited = false;
> > +
> > +     if (has_inited)
> > +             goto out;
> > +     if (!init_kernel_kallsyms())
> > +             goto out;
> > +     if (!init_kernel_btf())
> > +             goto out;
> > +     if (!init_module_kallsyms())
> > +             goto out;
> > +     if (!init_module_btf())
> > +             goto out;
> > +     ret = true;
>
> I feel it would be good practice to load as little information as is
> necessary for the task. If "amdgpu" module is required, then load kernel
> kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
> debuginfo is required, then just the kernel would suffice.
>
> This would reduce memory usage and runtime, though I don't know if it
> would show up in profiling. The main benefit could be reliability: by
> handling less data, there are fewer chances to hit an error.

OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
module btf/kallsyms info is a reasonable design. So kernel modules'
info can be loaded on demand.

>
> > +out:
> > +     has_inited = true;
> > +     return ret;
> > +}
> > +
> > +static void cleanup_kallsyms_btf(void)
> > +{
> > +     cleanup_kallsyms();
> > +     cleanup_btf();
> > +}
> > +
> > +void run_extensions(void)
> > +{
> > +     DIR *dir;
> > +     struct dirent *entry;
> > +     size_t len;
> > +     int i;
> > +     void *handle;
> > +     char path[512];
> > +
> > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
> > +             if ((dir = opendir(dirs[i])) != NULL)
> > +                     break;
> > +     }
> > +
> > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
> > +             /* No extensions found */
> > +             return;
>
> It could be confusing that makedumpfile would behave differently with
> the same command-line arguments depending on the presence or absence of
> these extensions on the filesystem.
>
> I think it may fit users' expectations better if they are required to
> specify extensions on the command line. Then we could load them by
> searching each directory in order. This allows:
>
> (a) more expected behavior
> (b) multiple extensions can exist without all being enabled, thus more
>     flexibility
> (c) extensions can be present in the local "extensions/" directory, or
>     in the system directory

Sure, that also sounds reasonable. My original thought was that user
customization of mm filtering is specified in the .so files, and if a user
doesn't need a particular .so, e.g. amdgpu mm filtering on an nvidia machine,
they simply don't pack amdgpu_filter.so into kdump's initramfs. I agree that
adding an extra makedumpfile cmdline option to receive the needed .so files
is a better design.

>
> > +     while ((entry = readdir(dir)) != NULL) {
> > +             len = strlen(entry->d_name);
> > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
> > +                     /* Will only init when .so exist */
> > +                     if (!init_kallsyms_btf())
> > +                             goto out;
> > +
> > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
> > +                     handle = dlopen(path, RTLD_NOW);
> > +                     if (!handle) {
> > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
> > +                                     __func__, path, dlerror());
> > +                             continue;
> > +                     }
> > +                     printf("Loaded extension: %s\n", path);
> > +                     dlclose(handle);
>
> Using the constructor/destructor of the shared object is clever! But we
> lose some flexibility: by the time the dlopen() returns, the constructor
> has executed and the plugin has thus executed.
>
> What if we instead use dlsym() to load some symbols from the DSO? In
> particular, I think it would be useful if extensions could declare a
> list of symbols and a list of structure information which they are
> interested in receiving. We could use these lists to know which
> kernel/module kallsyms & BTF we should load. We could even load the
> information into the local variables of the extension, so the extension
> would not need to manually load it.
>
> Of course this is more complex, but the benefit is:
>
> 1. Extensions can be written more simply, and would not need to manually
> load each symbol & type.
> 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
> the loading of unnecessary module information. Instead, we'd just
> populate the symbol addresses, struct offsets, and type sizes directly
> into the local variables which request them.

It is a clever idea! Though complex to implement, I think it is doable.

>
> Again, while I don't want to prematurely optimize -- it's good to avoid
> loading unnecessary information. I hope I've described my idea well. I
> would be happy to work on an implementation of it based on your patches
> here, if you're interested.

Thanks again for your suggestions! I got your points, and I think I can
improve the code while waiting for the maintainers' feedback at the same time.
I will let you know when I'm done, or if I encounter any blockers.

Thanks,
Tao Liu

>
> Thanks,
> Stephen
>
> > +             }
> > +     }
> > +out:
> > +     closedir(dir);
> > +     cleanup_kallsyms_btf();
> > +}
> > \ No newline at end of file
> > diff --git a/extensions/Makefile b/extensions/Makefile
> > new file mode 100644
> > index 0000000..afbc61e
> > --- /dev/null
> > +++ b/extensions/Makefile
> > @@ -0,0 +1,10 @@
> > +CC ?= gcc
> > +CONTRIB_SO :=
> > +
> > +all: $(CONTRIB_SO)
> > +
> > +$(CONTRIB_SO): %.so: %.c
> > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
> > +
> > +clean:
> > +     rm -f $(CONTRIB_SO)
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index dba3628..ca8ed8a 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -10847,6 +10847,8 @@ update_dump_level(void)
> >       }
> >  }
> >
> > +void run_extensions(void);
> > +
> >  int
> >  create_dumpfile(void)
> >  {
> > @@ -10884,6 +10886,8 @@ retry:
> >       if (info->flag_refiltering)
> >               update_dump_level();
> >
> > +     run_extensions();
> > +
> >       if ((info->name_filterconfig || info->name_eppic_config)
> >                       && !gather_filter_info())
> >               return FALSE;
> > --
> > 2.47.0
>




* Re: [PATCH v3 6/8] Add page filtering function
  2026-01-20  2:54 ` [PATCH v3 6/8] Add page filtering function Tao Liu
@ 2026-01-23  0:54   ` Stephen Brennan
  2026-01-27  3:21     ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Brennan @ 2026-01-23  0:54 UTC (permalink / raw)
  To: Tao Liu, yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, Tao Liu

Tao Liu <ltao@redhat.com> writes:
> pfn and num are the data which extensions give to makedumpfile for mm page
> filtering. Since makedumpfile iterates over pfns in ascending order in
> __exclude_unnecessary_pages(), pfn and num are stored in ft_page_info linked
> lists sorted in ascending order by pfn. If one pfn is hit by one list
> entry, the next pfn is most likely to be hit either by the same entry
> or by the one that follows, so a cur variable is used to save
> the current list position and speed up the pfn checking process.

I'm wondering about the trade-off for using a linked list versus an
array. Using the linked list, we are forced to maintain the sorted
order as we construct the list, which is an O(N^2) insertion sort.

If instead we used an array, we could sort it with qsort() once, at the
end. Then we could merge any overlapping ranges. Lookup could be
implemented cheaply with bsearch(), and we could continue to use the
optimization where we maintain a "cur" pointer.  I believe the overall
runtime complexity of the array approach would be O(N*log(N)) without
requiring hand-implementing anything too complex, compared to O(N^2).

Depending on the number of pages (and how fragmented they are), this
may or may not be an issue.

In my testing for userspace tasks, the number of pages retained can be
on the order of ~100k. However -- my use case can't really use a list of
PFNs, which I'll explain below. So my use case doesn't really matter too
much here -- maybe your use case has relatively few page ranges, so the
cost of O(N^2) is not bad.

So I guess I don't have a strong preference - but it's worth
considering.
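
To make the array idea concrete, here is a rough sketch. The struct and
function names are hypothetical and not part of the patch; it only assumes
ranges are collected into a plain array first, then sorted and merged once
before filtering starts:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array-based counterpart of struct ft_page_info. */
struct pfn_range {
	unsigned long pfn;	/* first PFN of the range */
	unsigned long num;	/* number of pages in the range */
};

static int cmp_range_start(const void *a, const void *b)
{
	const struct pfn_range *ra = a, *rb = b;

	if (ra->pfn < rb->pfn)
		return -1;
	return ra->pfn > rb->pfn;
}

/*
 * Sort the ranges by start PFN and merge overlapping or adjacent ones
 * in place. Returns the number of ranges left after merging.
 */
static size_t sort_and_merge_ranges(struct pfn_range *r, size_t n)
{
	size_t i, out = 0;

	if (n == 0)
		return 0;
	qsort(r, n, sizeof(*r), cmp_range_start);
	for (i = 1; i < n; i++) {
		unsigned long end = r[out].pfn + r[out].num;

		if (r[i].pfn <= end) {
			/* Overlapping or adjacent: extend the current range. */
			if (r[i].pfn + r[i].num > end)
				r[out].num = r[i].pfn + r[i].num - r[out].pfn;
		} else {
			r[++out] = r[i];
		}
	}
	return out + 1;
}

/* bsearch() comparator: the key is a single PFN, the element a range. */
static int cmp_pfn_in_range(const void *key, const void *elem)
{
	unsigned long pfn = *(const unsigned long *)key;
	const struct pfn_range *r = elem;

	if (pfn < r->pfn)
		return -1;
	return pfn >= r->pfn + r->num;
}

/* Return true if pfn falls inside one of the sorted, merged ranges. */
static bool pfn_in_ranges(const struct pfn_range *r, size_t n,
			  unsigned long pfn)
{
	return bsearch(&pfn, r, n, sizeof(*r), cmp_pfn_in_range) != NULL;
}
```

The "cur" optimization could still sit in front of pfn_in_ranges(): check
the last matching range first and fall back to bsearch() only on a miss.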

> In addition, 2 ft_page_info linked list chains are used, one for mm page
> discarding and the other for page keeping.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
>  erase_info.c   | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  erase_info.h   | 12 +++++++
>  makedumpfile.c | 28 ++++++++++++---
>  3 files changed, 134 insertions(+), 4 deletions(-)
>
> diff --git a/erase_info.c b/erase_info.c
> index b67d1d0..8838bea 100644
> --- a/erase_info.c
> +++ b/erase_info.c
> @@ -2466,3 +2466,101 @@ get_size_eraseinfo(void)
>  	return size_eraseinfo;
>  }
>  
> +/* Pages to be discarded */
> +static struct ft_page_info *ft_head_discard = NULL;
> +/* Pages to be kept */
> +static struct ft_page_info *ft_head_keep = NULL;
> +
> +/*
> + * Insert the ft_page_info blocks into ft_head by ascending pfn.
> + */
> +bool
> +update_filter_pages_info(unsigned long pfn, unsigned long num, bool to_discard)
> +{
> +	struct ft_page_info *p, **ft_head;
> +	struct ft_page_info *new_p = malloc(sizeof(struct ft_page_info));
> +
> +	ft_head = to_discard ? &ft_head_discard : &ft_head_keep;
> +
> +	if (!new_p) {
> +		ERRMSG("Can't allocate memory for ft_page_info at %lx\n", pfn);
> +		return false;
> +	}
> +	new_p->pfn = pfn;
> +	new_p->num = num;
> +	new_p->next = NULL;
> +
> +	if (!(*ft_head) || (*ft_head)->pfn > new_p->pfn) {
> +		new_p->next = (*ft_head);
> +		(*ft_head) = new_p;
> +		return true;
> +	}
> +
> +	p = (*ft_head);
> +	while (p->next != NULL && p->next->pfn < new_p->pfn) {
> +		p = p->next;
> +	}
> +
> +	new_p->next = p->next;
> +	p->next = new_p;

It might be wise to defensively handle the case of overlapping
PFN ranges by merging them.
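
If the linked list stays, merging on insert could look roughly like the
sketch below. insert_merged() is a hypothetical name, and the struct is a
local copy of ft_page_info (without the packed attribute) so the example
compiles on its own:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Local copy of the patch's list node, for illustration only. */
struct ft_page_info {
	unsigned long pfn;
	unsigned long num;
	struct ft_page_info *next;
};

/*
 * Insert [pfn, pfn + num) into a list sorted by pfn, absorbing every
 * existing block that the new range overlaps or touches, so the list
 * always holds disjoint ranges.
 */
static bool insert_merged(struct ft_page_info **head, unsigned long pfn,
			  unsigned long num)
{
	struct ft_page_info **link = head, *p, *new_p;
	unsigned long end = pfn + num;

	/* Allocate first so a failure leaves the list untouched. */
	new_p = malloc(sizeof(*new_p));
	if (!new_p)
		return false;

	/* Skip blocks that end strictly before the new range starts. */
	while (*link && (*link)->pfn + (*link)->num < pfn)
		link = &(*link)->next;

	/* Absorb every block that overlaps or touches [pfn, end). */
	while ((p = *link) && p->pfn <= end) {
		if (p->pfn < pfn)
			pfn = p->pfn;
		if (p->pfn + p->num > end)
			end = p->pfn + p->num;
		*link = p->next;
		free(p);
	}

	new_p->pfn = pfn;
	new_p->num = end - pfn;
	new_p->next = *link;
	*link = new_p;
	return true;
}
```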

> +	return true;
> +}
> +
> +/*
> + * Check if the pfn hit ft_page_info block.
> + *
> + * pfn and ft_head are in ascending order, so save the current ft_page_info
> + * block into **p because it is likely to hit again next time.
> + */
> +bool
> +filter_page(unsigned long pfn, struct ft_page_info **p, bool handle_discard)
> +{
> +	struct ft_page_info *ft_head;
> +
> +	ft_head = handle_discard ? ft_head_discard : ft_head_keep;
> +
> +	if (ft_head == NULL)
> +		return false;
> +
> +	if (*p == NULL)
> +		*p = ft_head;
> +
> +	/* The gap before 1st block */
> +	if (pfn >= 0 && pfn < ft_head->pfn)
> +		return false;
> +
> +	/* Handle 1~(n-1) blocks and following gaps */
> +	while ((*p)->next) {
> +		if (pfn >= (*p)->pfn && pfn < (*p)->pfn + (*p)->num)
> +			return true; // hit the block
> +		if (pfn >= (*p)->pfn + (*p)->num && pfn < (*p)->next->pfn)
> +			return false; // the gap after the block
> +		*p = (*p)->next;
> +	}
> +
> +	/* The last block and gap */
> +	if (pfn >= (*p)->pfn + (*p)->num)
> +		return false;
> +	else
> +		return true;
> +}
> +
> +static void
> +do_cleanup(struct ft_page_info **ft_head)
> +{
> +	struct ft_page_info *p, *p_tmp;
> +
> +	for (p = *ft_head; p;) {
> +		p_tmp = p;
> +		p = p->next;
> +		free(p_tmp);
> +	}
> +	*ft_head = NULL;
> +}
> +
> +void
> +cleanup_filter_pages_info(void)
> +{
> +	do_cleanup(&ft_head_discard);
> +	do_cleanup(&ft_head_keep);
> +}
> diff --git a/erase_info.h b/erase_info.h
> index b363a40..6c60706 100644
> --- a/erase_info.h
> +++ b/erase_info.h
> @@ -20,6 +20,7 @@
>  #define _ERASE_INFO_H
>  
>  #define MAX_SIZE_STR_LEN (26)
> +#include <stdbool.h>
>  
>  /*
>   * Erase information, original symbol expressions.
> @@ -65,5 +66,16 @@ void filter_data_buffer_parallel(unsigned char *buf, unsigned long long paddr,
>  unsigned long get_size_eraseinfo(void);
>  int update_filter_info_raw(unsigned long long, int, int);
>  
> +bool update_filter_pages_info(unsigned long, unsigned long, bool);
> +
> +struct ft_page_info {
> +	unsigned long pfn;
> +	unsigned long num;
> +	struct ft_page_info *next;
> +} __attribute__((packed));
> +
> +bool filter_page(unsigned long, struct ft_page_info **p, bool handle_discard);
> +void cleanup_filter_pages_info(void);
> +
>  #endif /* _ERASE_INFO_H */
>  
> diff --git a/makedumpfile.c b/makedumpfile.c
> index ca8ed8a..ebac8da 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -102,6 +102,7 @@ mdf_pfn_t pfn_free;
>  mdf_pfn_t pfn_hwpoison;
>  mdf_pfn_t pfn_offline;
>  mdf_pfn_t pfn_elf_excluded;
> +mdf_pfn_t pfn_extension;
>  
>  mdf_pfn_t num_dumped;
>  
> @@ -6459,6 +6460,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
>  	unsigned int order_offset, dtor_offset;
>  	unsigned long flags, mapping, private = 0;
>  	unsigned long compound_dtor, compound_head = 0;
> +	struct ft_page_info *cur_discard = NULL;
> +	struct ft_page_info *cur_keep = NULL;
>  
>  	/*
>  	 * If a multi-page exclusion is pending, do it first
> @@ -6495,6 +6498,13 @@ __exclude_unnecessary_pages(unsigned long mem_map,
>  		if (info->flag_cyclic && !is_cyclic_region(pfn, cycle))
>  			continue;
>  
> +		/*
> +		 * Keep pages that are specified by the user via
> +		 * makedumpfile extensions
> +		 */
> +		if (filter_page(pfn, &cur_keep, false))
> +			continue;
> +

It makes sense to allow plugins to enumerate a list of PFNs to override
and include. I like that - it's simple enough. But it's not flexible
enough for my use case with userspace stacks :(

The userspace stack region is an anon_vma. My plugin can enumerate the
anon_vmas that it wants to save, but it's prohibitively expensive and
complex to enumerate the list of pages associated with each anon_vma. We
would need to do a page table walk for each process.

There's a simpler way: from the struct page mapping and index fields,
it's possible to determine which anon_vma the page is associated with,
and what index it has within the VMA. And from this, we can make the
determination of whether to include a page or not. This is what I had
implemented in this patch:

https://github.com/brenns10/makedumpfile/commit/1c0a828ef80962480f771915c2d494272721b659#diff-2593512d7ec329b34b1ca5686a7b6b073d0ca636df8ff20fea04684da2c8e063R6692-R12150

So, I wonder if it makes sense to allow a plugin to register a callback
to be called here, so the plugin can make the more complex decision?
This would keep the logic outside of the core makedumpfile code, but
allow the necessary flexibility.

Something like:

if (plugin_keep_page_callback && plugin_keep_page_callback(pfn, pcache))
    continue;

And then the extension system could allow an extension to register that
callback. It would need to keep the extension loaded for the duration of
the execution of makedumpfile (rather than calling dlclose()
immediately).
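
A minimal sketch of what the registration side might look like follows.
Everything in it is an assumption for illustration: the callback type, the
register function, the fixed mapping offset (real code would take it from
makedumpfile's offset table), and the low-bit anon check, which reflects my
understanding of how the kernel tags anonymous mappings:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback type: return true to keep the page. "pcache"
 * stands for the struct page bytes makedumpfile has already read.
 */
typedef bool (*keep_page_fn)(unsigned long pfn, const unsigned char *pcache);

static keep_page_fn plugin_keep_page_callback;

/* Hypothetical hook an extension would call while it is being loaded. */
static void register_keep_page_callback(keep_page_fn fn)
{
	plugin_keep_page_callback = fn;
}

/* The check proposed for the per-page loop, wrapped in a helper. */
static bool extension_wants_page(unsigned long pfn,
				 const unsigned char *pcache)
{
	return plugin_keep_page_callback &&
	       plugin_keep_page_callback(pfn, pcache);
}

/* Example extension side. Offset and flag values are assumptions. */
#define EXAMPLE_MAPPING_OFFSET	24	/* assumed offset of page.mapping */
#define EXAMPLE_MAPPING_ANON	0x1UL	/* assumed "anonymous" tag bit */
#define EXAMPLE_MAPPING_FLAGS	0x3UL	/* assumed mask of all tag bits */

static unsigned long wanted_anon_vma;	/* anon_vma address to retain */

static bool keep_anon_vma_page(unsigned long pfn,
			       const unsigned char *pcache)
{
	unsigned long mapping;

	(void)pfn;
	memcpy(&mapping, pcache + EXAMPLE_MAPPING_OFFSET, sizeof(mapping));
	if (!(mapping & EXAMPLE_MAPPING_ANON))
		return false;
	return (mapping & ~EXAMPLE_MAPPING_FLAGS) == wanted_anon_vma;
}
```

A real extension would match against a set of anon_vmas and also consult
the page's index field, but the control flow would be the same.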

What do you think about this? I'm happy to implement this part of it
separate from your patch series -- you could simply drop the stuff
related to page inclusion, and I can add the necessary pieces when I
submit my extension patches.

Thanks,
Stephen

>  		/*
>  		 * Exclude the memory hole.
>  		 */
> @@ -6687,6 +6697,14 @@ check_order:
>  		else if (isOffline(flags, _mapcount)) {
>  			pfn_counter = &pfn_offline;
>  		}
> +		/*
> +		 * Exclude pages that are specified by the user via
> +		 * makedumpfile extensions
> +		 */
> +		else if (filter_page(pfn, &cur_discard, true)) {
> +			nr_pages = 1;
> +			pfn_counter = &pfn_extension;
> +		}
>  		/*
>  		 * Unexcludable page
>  		 */
> @@ -6748,6 +6766,7 @@ exclude_unnecessary_pages(struct cycle *cycle)
>  		print_progress(PROGRESS_UNN_PAGES, info->num_mem_map, info->num_mem_map, NULL);
>  		print_execution_time(PROGRESS_UNN_PAGES, &ts_start);
>  	}
> +	cleanup_filter_pages_info();
>  
>  	return TRUE;
>  }
> @@ -8234,7 +8253,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>  	 */
>  	if (info->flag_cyclic) {
>  		pfn_zero = pfn_cache = pfn_cache_private = 0;
> -		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
> +		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
>  		pfn_memhole = info->max_mapnr;
>  	}
>  
> @@ -9579,7 +9598,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
>  		 * Reset counter for debug message.
>  		 */
>  		pfn_zero = pfn_cache = pfn_cache_private = 0;
> -		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
> +		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
>  		pfn_memhole = info->max_mapnr;
>  
>  		/*
> @@ -10528,7 +10547,7 @@ print_report(void)
>  	pfn_original = info->max_mapnr - pfn_memhole;
>  
>  	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
> -	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
> +	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
>  
>  	REPORT_MSG("\n");
>  	REPORT_MSG("Original pages  : 0x%016llx\n", pfn_original);
> @@ -10544,6 +10563,7 @@ print_report(void)
>  	REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
>  	REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
>  	REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
> +	REPORT_MSG("    Extension filter pages  : 0x%016llx\n", pfn_extension);
>  	REPORT_MSG("  Remaining pages  : 0x%016llx\n",
>  	    pfn_original - pfn_excluded);
>  
> @@ -10584,7 +10604,7 @@ print_mem_usage(void)
>  	pfn_original = info->max_mapnr - pfn_memhole;
>  
>  	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
> -	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
> +	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
>  	shrinking = (pfn_original - pfn_excluded) * 100;
>  	shrinking = shrinking / pfn_original;
>  	total_size = info->page_size * pfn_original;
> -- 
> 2.47.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] Implement kernel kallsyms resolving
  2026-01-20  2:54 ` [PATCH v3 1/8] Implement kernel kallsyms resolving Tao Liu
@ 2026-01-24  1:09   ` Stephen Brennan
  2026-01-24  5:52     ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Brennan @ 2026-01-24  1:09 UTC (permalink / raw)
  To: Tao Liu, yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, Tao Liu

Hi Tao,

You've managed to do nearly the same as my implementations of kallsyms,
but with far less code! Nice! Thank you for this.

A few comments inline.

Tao Liu <ltao@redhat.com> writes:
> This patch parses the kernel's kallsyms data and stores it in a hash
> table so that symbols can be looked up quickly later.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
>  Makefile       |   2 +-
>  kallsyms.c     | 265 +++++++++++++++++++++++++++++++++++++++++++++++++
>  kallsyms.h     |  17 ++++
>  makedumpfile.c |   3 +
>  makedumpfile.h |  11 ++
>  5 files changed, 297 insertions(+), 1 deletion(-)
>  create mode 100644 kallsyms.c
>  create mode 100644 kallsyms.h
>
> diff --git a/Makefile b/Makefile
> index 05ab5f2..6c450ac 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
>  endif
>  
>  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
> +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
>  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
>  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
>  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> diff --git a/kallsyms.c b/kallsyms.c
> new file mode 100644
> index 0000000..ecf64e0
> --- /dev/null
> +++ b/kallsyms.c
> @@ -0,0 +1,265 @@
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include "makedumpfile.h"
> +#include "kallsyms.h"
> +
> +static uint32_t *kallsyms_offsets = NULL;
> +static uint16_t *kallsyms_token_index = NULL;
> +static uint8_t  *kallsyms_token_table = NULL;
> +static uint8_t  *kallsyms_names = NULL;
> +static unsigned long kallsyms_relative_base = 0;
> +static unsigned int kallsyms_num_syms = 0;
> +
> +#define NAME_HASH 512
> +static struct syment *name_hash_table[NAME_HASH] = {0};
> +
> +static uint64_t absolute_percpu(uint64_t base, int32_t val)
> +{
> +	if (val >= 0)
> +		return (uint64_t)val;
> +	else
> +		return base - 1 - val;
> +}
> +
> +static unsigned int hash_index(const char *name, unsigned int hash_size)
> +{
> +	unsigned int len, value;
> +
> +	len = strlen(name);
> +	value = name[len - 1] * name[len / 2];
> +
> +	return (name[0] ^ value) % hash_size;
> +}
> +
> +static void name_hash_install(struct syment *en)
> +{
> +	unsigned int index = hash_index(en->name, NAME_HASH);
> +	struct syment *sp = name_hash_table[index];
> +
> +	if (sp == NULL) {
> +		name_hash_table[index] = en;
> +	} else {
> +		while (sp) {
> +			if (sp->name_hash_next) {
> +				sp = sp->name_hash_next;
> +			} else {
> +				sp->name_hash_next = en;
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +static struct syment *search_kallsyms_by_name(char *name)
> +{
> +	unsigned int index;
> +	struct syment *sp;
> +
> +	index = hash_index(name, NAME_HASH);
> +	for (sp = name_hash_table[index]; sp; sp = sp->name_hash_next) {
> +		if (!strcmp(name, sp->name)) {
> +			return sp;
> +		}
> +	}
> +	return sp;
> +}
> +
> +static bool is_unwanted_symbol(char *name)
> +{
> +	const char *unwanted_prefix[] = {
> +		"__pfx_",	// CFI symbols
> +		"_R",		// Rust symbols
> +	};
> +	for (int i = 0; i < sizeof(unwanted_prefix) / sizeof(char *); i++) {
> +		if (!strncmp(name, unwanted_prefix[i], strlen(unwanted_prefix[i])))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +uint64_t get_kallsyms_value_by_name(char *name)
> +{
> +	struct syment *sp;
> +
> +	sp = search_kallsyms_by_name(name);
> +	if (!sp)
> +		return 0;
> +	return sp->value;
> +}
> +
> +#define BUFLEN 1024
> +static bool parse_kernel_kallsyms(void)
> +{
> +	char buf[BUFLEN];
> +	int index = 0, i;
> +	uint8_t *compressd_data;
> +	uint8_t *uncompressd_data;
> +	uint64_t stext;
> +	uint8_t len, len_old;
> +	struct syment *kern_syment;
> +	bool skip;
> +
> +	for (i = 0; i < kallsyms_num_syms; i++) {
> +		skip = false;
> +		memset(buf, 0, BUFLEN);
> +		len = kallsyms_names[index];
> +		if (len & 0x80) {
> +			index++;
> +			len_old = len;
> +			len = kallsyms_names[index];
> +			if (len & 0x80) {
> +				fprintf(stderr, "%s: BUG! Unexpected 3-byte length,"
> +					" should be detected in init_kernel_kallsyms()\n",
> +					__func__);
> +				goto out;
> +			}
> +			len = (len_old & 0x7F) | (len << 7);

The 2-byte representation was added in commit 73bbb94466fd3 ("kallsyms:
support "big" kernel symbols"), in v6.1. It seems useful to include a
comment about that, at a minimum.

It also seems to me that, for older kernel versions, this means lengths
128-255 are ambiguous: for v6.1+, they indicate a long symbol, but for
kernel versions prior to that, they are valid lengths.

I guess this is implemented for current kernels, but it might be worth
checking the kernel major/minor version for this. Though, I haven't
personally witnessed the issue, so maybe it's unnecessary. I will test
this on some older kernels and let you know.
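
For illustration, a version-aware length decoder might look like the
sketch below. decode_name_len() and its big_symbols flag are hypothetical;
how the caller decides whether the dump comes from a kernel with the
2-byte encoding is left open here. Note that a decoded 2-byte length can
exceed 255, so the sketch returns it in an unsigned int:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the length prefix of the kallsyms_names entry at names[*pos]
 * and advance *pos past the prefix.
 *
 * big_symbols: true if the kernel uses the 2-byte "big symbol" encoding,
 * in which case a first byte with the top bit set is followed by a
 * second byte carrying the upper bits. Otherwise the first byte is the
 * whole length, including values 128-255.
 *
 * Returns false on an unexpected 3-byte encoding.
 */
static bool decode_name_len(const uint8_t *names, size_t *pos,
			    bool big_symbols, unsigned int *len)
{
	unsigned int lo = names[(*pos)++];

	if (!big_symbols || !(lo & 0x80)) {
		*len = lo;
		return true;
	}

	unsigned int hi = names[(*pos)++];

	if (hi & 0x80)
		return false;
	*len = (lo & 0x7F) | (hi << 7);
	return true;
}
```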

> +		}
> +		index++;
> +
> +		compressd_data = &kallsyms_names[index];
> +		index += len;
> +		while (len--) {
> +			uncompressd_data = &kallsyms_token_table[kallsyms_token_index[*compressd_data]];
> +			if (strlen(buf) + strlen((char *)uncompressd_data) >= BUFLEN) {
> +				skip = true;
> +				break;
> +			}
> +			strcat(buf, (char *)uncompressd_data);
> +			compressd_data++;
> +		}
> +		if (skip || is_unwanted_symbol(&buf[1]))
> +			continue;
> +		kern_syment = (struct syment *)calloc(1, sizeof(struct syment));
> +		if (!kern_syment)
> +			goto no_mem;
> +		kern_syment->value = kallsyms_offsets[i];
> +		kern_syment->name = strdup(&buf[1]);
> +		if (!kern_syment->name) {
> +			free(kern_syment);
> +			goto no_mem;
> +		}
> +		name_hash_install(kern_syment);

Like I mentioned in a prior email, if we were able to know the list of
symbols we care about up-front, we could entirely avoid creating the
hash table, and also avoid maintaining a list of symbol prefixes we want
to skip loading. I'm not certain that you would want to go that far, but
it's a thought.
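
As a sketch of that alternative, assuming the extensions could declare
their symbols before the kallsyms pass runs (struct wanted_sym and
note_symbol() are hypothetical names), the parse loop would call something
like this for each decoded name instead of allocating a syment:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol that some extension has asked for up-front. */
struct wanted_sym {
	const char *name;
	uint64_t value;
	bool found;
};

/*
 * Record the value if this decoded symbol is one of the wanted ones.
 * Returns how many wanted symbols are still missing, so the caller can
 * stop walking kallsyms_names as soon as the result reaches zero.
 */
static size_t note_symbol(struct wanted_sym *w, size_t n,
			  const char *name, uint64_t value)
{
	size_t i, missing = 0;

	for (i = 0; i < n; i++) {
		if (!w[i].found && !strcmp(w[i].name, name)) {
			w[i].value = value;
			w[i].found = true;
		}
		if (!w[i].found)
			missing++;
	}
	return missing;
}
```

With only a handful of wanted symbols, the linear scan per decoded name
should be cheap compared to the decompression itself.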

> +	}
> +
> +	/* Now refresh the absolute each kallsyms address */

I think this could use a better comment. This is my understanding of the
history of the kallsyms address encoding:

    Kallsyms originally stored absolute symbol addresses in a plain
    array called "kallsyms_addresses". This strategy was called
    "absolute kallsyms". In Linux v4.6, commit 2213e9a66bb87 ("kallsyms:
    add support for relative offsets in kallsyms address table")
    introduced two ways of storing symbol addresses relative to a base
    address, so that 64-bit architectures could use 32-bit arrays. These
    methods were CONFIG_KALLSYMS_BASE_RELATIVE and
    CONFIG_KALLSYMS_ABSOLUTE_PERCPU. The ABSOLUTE_PERCPU mechanism was
    used by architectures like x86_64 with a percpu address range near
    0x0, but kernel address range in the negative address space. Some
    architectures, namely tile and ia64, had to continue using absolute
    kallsyms due to very large gaps in their address spaces.

    After both architectures were removed, absolute kallsyms was dropped
    in v6.11 commit 64e166099b69b ("kallsyms: get rid of code for
    absolute kallsyms"). In v6.15, the x86_64 percpu address range was
    moved away from 0x0, and as a result ABSOLUTE_PERCPU was no longer
    required. It was dropped in 01157ddc58dc2 ("kallsyms: Remove
    KALLSYMS_ABSOLUTE_PERCPU"), leaving only the BASE_RELATIVE scheme
    (which no longer has a kconfig entry, since there is no other
    scheme).

    This code implements support for BASE_RELATIVE and ABSOLUTE_PERCPU,
    but absolute kallsyms is not supported. The kallsyms symbols
    themselves were only added to vmcoreinfo in v6.0 with commit
    f09bddbd86619 ("vmcoreinfo: add kallsyms_num_syms symbol"). At that
    time, only ia64 would have used the absolute kallsyms mechanism. Even
    if these commits were backported to quite old kernels, BASE_RELATIVE
    and ABSOLUTE_PERCPU would suffice for most other architectures back
    to v4.6.

I don't know if all of the context is necessary, but I find it helpful
to know.
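
To make the two relative schemes concrete, this is roughly how I understand
a single kallsyms_offsets entry to be decoded in each case. The enum and
function names are mine, for illustration only:

```c
#include <stdint.h>

/* Hypothetical labels for the two relative encodings. */
enum kallsyms_scheme {
	KALLSYMS_BASE_RELATIVE,		/* every entry is base + offset */
	KALLSYMS_ABSOLUTE_PERCPU,	/* sign of the entry selects the form */
};

/*
 * Turn one raw 32-bit kallsyms_offsets entry into an address.
 *
 * BASE_RELATIVE: the entry is an unsigned offset from relative_base.
 * ABSOLUTE_PERCPU: a non-negative entry is an absolute (percpu) address,
 * a negative entry encodes relative_base - 1 - entry.
 */
static uint64_t kallsyms_decode_addr(enum kallsyms_scheme scheme,
				     uint64_t relative_base, uint32_t raw)
{
	/* Reinterpret the raw entry as signed (two's complement assumed). */
	int32_t off = (int32_t)raw;

	if (scheme == KALLSYMS_BASE_RELATIVE)
		return relative_base + raw;
	if (off >= 0)
		return (uint64_t)off;
	return relative_base - 1 - (int64_t)off;
}
```

Decoding _stext both ways and comparing against the vmcoreinfo value, as
the patch does, is then a reasonable way to detect which scheme is in use.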

Thanks,
Stephen

> +	stext = get_kallsyms_value_by_name("_stext");
> +	if (SYMBOL(_stext) == absolute_percpu(kallsyms_relative_base, stext)) {
> +		for (i = 0; i < NAME_HASH; i++) {
> +			for (kern_syment = name_hash_table[i];
> +			     kern_syment;
> +			     kern_syment = kern_syment->name_hash_next)
> +				kern_syment->value = absolute_percpu(kallsyms_relative_base,
> +							kern_syment->value);
> +		}
> +	} else if (SYMBOL(_stext) == kallsyms_relative_base + stext) {
> +		for (i = 0; i < NAME_HASH; i++) {
> +			for (kern_syment = name_hash_table[i];
> +			     kern_syment;
> +			     kern_syment = kern_syment->name_hash_next)
> +				kern_syment->value += kallsyms_relative_base;
> +		}
> +	} else {
> +		fprintf(stderr, "%s: Wrong calculate kallsyms symbol value!\n", __func__);
> +		goto out;
> +	}
> +
> +	return true;
> +no_mem:
> +	fprintf(stderr, "%s: Not enough memory!\n", __func__);
> +out:
> +	return false;
> +}
> +
> +static bool vmcore_info_ready = false;
> +
> +bool read_vmcoreinfo_kallsyms(void)
> +{
> +	READ_SYMBOL("kallsyms_names", kallsyms_names);
> +	READ_SYMBOL("kallsyms_num_syms", kallsyms_num_syms);
> +	READ_SYMBOL("kallsyms_token_table", kallsyms_token_table);
> +	READ_SYMBOL("kallsyms_token_index", kallsyms_token_index);
> +	READ_SYMBOL("kallsyms_offsets", kallsyms_offsets);
> +	READ_SYMBOL("kallsyms_relative_base", kallsyms_relative_base);
> +	vmcore_info_ready = true;
> +	return true;
> +}
> +
> +bool init_kernel_kallsyms(void)
> +{
> +	const int token_index_size = (UINT8_MAX + 1) * sizeof(uint16_t);
> +	uint64_t last_token, len;
> +	unsigned char data, data_old;
> +	int i;
> +	bool ret = false;
> +
> +	if (vmcore_info_ready == false) {
> +		fprintf(stderr, "%s: vmcoreinfo not ready for kallsyms!\n",
> +			__func__);
> +		return ret;
> +	}
> +
> +	readmem(VADDR, SYMBOL(kallsyms_num_syms), &kallsyms_num_syms,
> +		sizeof(kallsyms_num_syms));
> +	readmem(VADDR, SYMBOL(kallsyms_relative_base), &kallsyms_relative_base,
> +		sizeof(kallsyms_relative_base));
> +
> +	kallsyms_offsets = malloc(sizeof(uint32_t) * kallsyms_num_syms);
> +	if (!kallsyms_offsets)
> +		goto no_mem;
> +	readmem(VADDR, SYMBOL(kallsyms_offsets), kallsyms_offsets,
> +		kallsyms_num_syms * sizeof(uint32_t));
> +
> +	kallsyms_token_index = malloc(token_index_size);
> +	if (!kallsyms_token_index)
> +		goto no_mem;
> +	readmem(VADDR, SYMBOL(kallsyms_token_index), kallsyms_token_index,
> +		token_index_size);
> +
> +	last_token = SYMBOL(kallsyms_token_table) + kallsyms_token_index[UINT8_MAX];
> +	do {
> +		readmem(VADDR, last_token++, &data, 1);
> +	} while(data);
> +	len = last_token - SYMBOL(kallsyms_token_table);
> +	kallsyms_token_table = malloc(len);
> +	if (!kallsyms_token_table)
> +		goto no_mem;
> +	readmem(VADDR, SYMBOL(kallsyms_token_table), kallsyms_token_table, len);
> +
> +	for (len = 0, i = 0; i < kallsyms_num_syms; i++) {
> +		readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
> +		if (data & 0x80) {
> +			len += 1;
> +			data_old = data;
> +			readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
> +			if (data & 0x80) {
> +				fprintf(stderr, "%s: BUG! Unexpected 3-byte length"
> +					" encoding in kallsyms names\n", __func__);
> +				goto out;
> +			}
> +			data = (data_old & 0x7F) | (data << 7);
> +		}
> +		len += data + 1;
> +	}
> +	kallsyms_names = malloc(len);
> +	if (!kallsyms_names)
> +		goto no_mem;
> +	readmem(VADDR, SYMBOL(kallsyms_names), kallsyms_names, len);
> +
> +	ret = parse_kernel_kallsyms();
> +	goto out;
> +
> +no_mem:
> +	fprintf(stderr, "%s: Not enough memory!\n", __func__);
> +out:
> +	if (kallsyms_offsets)
> +		free(kallsyms_offsets);
> +	if (kallsyms_token_index)
> +		free(kallsyms_token_index);
> +	if (kallsyms_token_table)
> +		free(kallsyms_token_table);
> +	if (kallsyms_names)
> +		free(kallsyms_names);
> +	return ret;
> +}
> diff --git a/kallsyms.h b/kallsyms.h
> new file mode 100644
> index 0000000..a4fbe10
> --- /dev/null
> +++ b/kallsyms.h
> @@ -0,0 +1,17 @@
> +#ifndef _KALLSYMS_H
> +#define _KALLSYMS_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +struct __attribute__((packed)) syment {
> +	uint64_t value;
> +	char *name;
> +	struct syment *name_hash_next;
> +};
> +
> +bool read_vmcoreinfo_kallsyms(void);
> +bool init_kernel_kallsyms(void);
> +uint64_t get_kallsyms_value_by_name(char *);
> +
> +#endif /* _KALLSYMS_H */
> \ No newline at end of file
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 12fb0d8..dba3628 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -27,6 +27,7 @@
>  #include <limits.h>
>  #include <assert.h>
>  #include <zlib.h>
> +#include "kallsyms.h"
>  
>  struct symbol_table	symbol_table;
>  struct size_table	size_table;
> @@ -3105,6 +3106,8 @@ read_vmcoreinfo_from_vmcore(off_t offset, unsigned long size, int flag_xen_hv)
>  		if (!read_vmcoreinfo())
>  			goto out;
>  	}
> +	read_vmcoreinfo_kallsyms();
> +
>  	close_vmcoreinfo();
>  
>  	ret = TRUE;
> diff --git a/makedumpfile.h b/makedumpfile.h
> index 134eb7a..0dec50e 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -259,6 +259,7 @@ static inline int string_exists(char *s) { return (s ? TRUE : FALSE); }
>  #define UINT(ADDR)	*((unsigned int *)(ADDR))
>  #define ULONG(ADDR)	*((unsigned long *)(ADDR))
>  #define ULONGLONG(ADDR)	*((unsigned long long *)(ADDR))
> +#define VOID_PTR(ADDR)  *((void **)(ADDR))
>  
>  
>  /*
> @@ -1919,6 +1920,16 @@ struct symbol_table {
>  	 * symbols on sparc64 arch
>  	 */
>  	unsigned long long		vmemmap_table;
> +
> +	/*
> +	 * kallsyms related
> +	 */
> +	unsigned long long		kallsyms_names;
> +	unsigned long long		kallsyms_num_syms;
> +	unsigned long long		kallsyms_token_table;
> +	unsigned long long		kallsyms_token_index;
> +	unsigned long long		kallsyms_offsets;
> +	unsigned long long		kallsyms_relative_base;
>  };
>  
>  struct size_table {
> -- 
> 2.47.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] Implement kernel kallsyms resolving
  2026-01-24  1:09   ` Stephen Brennan
@ 2026-01-24  5:52     ` Tao Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-24  5:52 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

On Sat, Jan 24, 2026 at 2:10 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Hi Tao,
>
> You've managed to do nearly the same as my implementations of kallsyms,
> but with far less code! Nice! Thank you for this.
>
> A few comments inline.

Thanks again for your detailed information and comments on the
patchset. Please give me some time to digest your information and
reconsider the features to be implemented for v4.

Thanks,
Tao Liu

>
> Tao Liu <ltao@redhat.com> writes:
> > This patch will parse kernel's kallsyms data, and store them into a hash
> > table so they can be referenced later in a fast speed.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> >  Makefile       |   2 +-
> >  kallsyms.c     | 265 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  kallsyms.h     |  17 ++++
> >  makedumpfile.c |   3 +
> >  makedumpfile.h |  11 ++
> >  5 files changed, 297 insertions(+), 1 deletion(-)
> >  create mode 100644 kallsyms.c
> >  create mode 100644 kallsyms.h
> >
> > diff --git a/Makefile b/Makefile
> > index 05ab5f2..6c450ac 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
> >  endif
> >
> >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
> > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c
> >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
> >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
> >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> > diff --git a/kallsyms.c b/kallsyms.c
> > new file mode 100644
> > index 0000000..ecf64e0
> > --- /dev/null
> > +++ b/kallsyms.c
> > @@ -0,0 +1,265 @@
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +#include <string.h>
> > +#include "makedumpfile.h"
> > +#include "kallsyms.h"
> > +
> > +static uint32_t *kallsyms_offsets = NULL;
> > +static uint16_t *kallsyms_token_index = NULL;
> > +static uint8_t  *kallsyms_token_table = NULL;
> > +static uint8_t  *kallsyms_names = NULL;
> > +static unsigned long kallsyms_relative_base = 0;
> > +static unsigned int kallsyms_num_syms = 0;
> > +
> > +#define NAME_HASH 512
> > +static struct syment *name_hash_table[NAME_HASH] = {0};
> > +
> > +static uint64_t absolute_percpu(uint64_t base, int32_t val)
> > +{
> > +     if (val >= 0)
> > +             return (uint64_t)val;
> > +     else
> > +             return base - 1 - val;
> > +}
> > +
> > +static unsigned int hash_index(const char *name, unsigned int hash_size)
> > +{
> > +     unsigned int len, value;
> > +
> > +     len = strlen(name);
> > +     value = name[len - 1] * name[len / 2];
> > +
> > +     return (name[0] ^ value) % hash_size;
> > +}
> > +
> > +static void name_hash_install(struct syment *en)
> > +{
> > +     unsigned int index = hash_index(en->name, NAME_HASH);
> > +     struct syment *sp = name_hash_table[index];
> > +
> > +     if (sp == NULL) {
> > +             name_hash_table[index] = en;
> > +     } else {
> > +             while (sp) {
> > +                     if (sp->name_hash_next) {
> > +                             sp = sp->name_hash_next;
> > +                     } else {
> > +                             sp->name_hash_next = en;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +}
> > +
> > +static struct syment *search_kallsyms_by_name(char *name)
> > +{
> > +     unsigned int index;
> > +     struct syment *sp;
> > +
> > +     index = hash_index(name, NAME_HASH);
> > +     for (sp = name_hash_table[index]; sp; sp = sp->name_hash_next) {
> > +             if (!strcmp(name, sp->name)) {
> > +                     return sp;
> > +             }
> > +     }
> > +     return sp;
> > +}
> > +
> > +static bool is_unwanted_symbol(char *name)
> > +{
> > +     const char *unwanted_prefix[] = {
> > +             "__pfx_",       // CFI symbols
> > +             "_R",           // Rust symbols
> > +     };
> > +     for (int i = 0; i < sizeof(unwanted_prefix) / sizeof(char *); i++) {
> > +             if (!strncmp(name, unwanted_prefix[i], strlen(unwanted_prefix[i])))
> > +                     return true;
> > +     }
> > +     return false;
> > +}
> > +
> > +uint64_t get_kallsyms_value_by_name(char *name)
> > +{
> > +     struct syment *sp;
> > +
> > +     sp = search_kallsyms_by_name(name);
> > +     if (!sp)
> > +             return 0;
> > +     return sp->value;
> > +}
> > +
> > +#define BUFLEN 1024
> > +static bool parse_kernel_kallsyms(void)
> > +{
> > +     char buf[BUFLEN];
> > +     int index = 0, i;
> > +     uint8_t *compressd_data;
> > +     uint8_t *uncompressd_data;
> > +     uint64_t stext;
> > +     uint8_t len, len_old;
> > +     struct syment *kern_syment;
> > +     bool skip;
> > +
> > +     for (i = 0; i < kallsyms_num_syms; i++) {
> > +             skip = false;
> > +             memset(buf, 0, BUFLEN);
> > +             len = kallsyms_names[index];
> > +             if (len & 0x80) {
> > +                     index++;
> > +                     len_old = len;
> > +                     len = kallsyms_names[index];
> > +                     if (len & 0x80) {
> > +                             fprintf(stderr, "%s: BUG! Unexpected 3-byte length,"
> > +                                     " should be detected in init_kernel_kallsyms()\n",
> > +                                     __func__);
> > +                             goto out;
> > +                     }
> > +                     len = (len_old & 0x7F) | (len << 7);
>
> The 2-byte representation was added in commit 73bbb94466fd3 ("kallsyms:
> support "big" kernel symbols"), in v6.1. It seems useful to include a
> comment about that, at a minimum.
>
> It also seems to me that, for older kernel versions, this means lengths
> 128-255 are ambiguous: for v6.1+, they indicate a long symbol, but for
> kernel versions prior to that, they are valid lengths.
>
> I guess this is implemented for current kernels, but it might be worth
> checking the kernel major/minor version for this. Though, I haven't
> personally witnessed the issue, so maybe it's unnecessary. I will test
> this on some older kernels and let you know.
>
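
For reference, the two-byte length prefix discussed above can be
sketched as follows. This is a minimal illustration assuming the v6.1+
layout (first byte carries the low 7 bits plus a continuation bit, the
second byte carries the remaining bits); kallsyms_decode_len() is a
hypothetical helper name, not something in this patch. Note that the
decoded length can exceed 255, so it needs a holder wider than uint8_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the length prefix of one kallsyms_names entry (v6.1+ layout).
 * Returns the number of compressed bytes in the entry and stores in
 * *consumed how many prefix bytes (1 or 2) were read.
 * Hypothetical helper, for illustration only.
 */
static uint16_t kallsyms_decode_len(const uint8_t *names, size_t *consumed)
{
	uint16_t len = names[0];

	assert(consumed != NULL);
	*consumed = 1;
	if (len & 0x80) {
		/* Low 7 bits come from the first byte, the rest from the second. */
		len = (uint16_t)((len & 0x7F) | ((uint16_t)names[1] << 7));
		*consumed = 2;
	}
	return len;
}
```

On kernels older than v6.1 the first byte is always the full length, so
a caller would have to skip the 0x80 check there.
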
> > +             }
> > +             index++;
> > +
> > +             compressd_data = &kallsyms_names[index];
> > +             index += len;
> > +             while (len--) {
> > +                     uncompressd_data = &kallsyms_token_table[kallsyms_token_index[*compressd_data]];
> > +                     if (strlen(buf) + strlen((char *)uncompressd_data) >= BUFLEN) {
> > +                             skip = true;
> > +                             break;
> > +                     }
> > +                     strcat(buf, (char *)uncompressd_data);
> > +                     compressd_data++;
> > +             }
> > +             if (skip || is_unwanted_symbol(&buf[1]))
> > +                     continue;
> > +             kern_syment = (struct syment *)calloc(1, sizeof(struct syment));
> > +             if (!kern_syment)
> > +                     goto no_mem;
> > +             kern_syment->value = kallsyms_offsets[i];
> > +             kern_syment->name = strdup(&buf[1]);
> > +             if (!kern_syment->name) {
> > +                     free(kern_syment);
> > +                     goto no_mem;
> > +             }
> > +             name_hash_install(kern_syment);
>
> Like I mentioned in a prior email, if we were able to know the list of
> symbols we care about up-front, we could entirely avoid creating the
> hash table, and also avoid maintaining a list of symbol prefixes we want
> to skip loading. I'm not certain that you would want to go that far, but
> it's a thought.
>
> > +     }
> > +
> > +     /* Now refresh the absolute each kallsyms address */
>
> I think this could use a better comment. This is my understanding of the
> history of the kallsyms address encoding:
>
>     Kallsyms originally stored absolute symbol addresses in a plain
>     array called "kallsyms_addresses". This strategy was called
>     "absolute kallsyms". In Linux v4.6, commit 2213e9a66bb87 ("kallsyms:
>     add support for relative offsets in kallsyms address table"),
>     introduced two ways of storing symbol addresses relative to a base
>     address, so that 64-bit architectures could use 32-bit arrays. These
>     methods were CONFIG_KALLSYMS_BASE_RELATIVE and
>     CONFIG_KALLSYMS_ABSOLUTE_PERCPU. The ABSOLUTE_PERCPU mechanism was
>     used by architectures like x86_64 with a percpu address range near
>     0x0, but kernel address range in the negative address space. Some
>     architectures, namely tile and ia64, had to continue using absolute
>     kallsyms due to very large gaps in their address spaces.
>
>     After both architectures were removed, absolute kallsyms was dropped
>     in v6.11 commit 64e166099b69b ("kallsyms: get rid of code for
>     absolute kallsyms"). In v6.15, the x86_64 percpu address range was
>     moved away from 0x0, and as a result ABSOLUTE_PERCPU was no longer
>     required. It was dropped in 01157ddc58dc2 ("kallsyms: Remove
>     KALLSYMS_ABSOLUTE_PERCPU"), leaving only the BASE_RELATIVE scheme
>     (which no longer has a kconfig entry, since there is no other
>     scheme).
>
>     This code implements support for BASE_RELATIVE and ABSOLUTE_PERCPU,
>     but absolute kallsyms is not supported. The kallsyms symbols
>     themselves were only added to vmcoreinfo in v6.0 with commit
>     f09bddbd86619 ("vmcoreinfo: add kallsyms_num_syms symbol"). At that
>     time, only ia64 would have used the absolute kallsyms mechanism. Even
>     if these commits were backported to quite old kernels, BASE_RELATIVE
>     and ABSOLUTE_PERCPU would suffice for most other architectures back
>     to v4.6.
>
> I don't know if all of the context is necessary, but I find it helpful
> to know.
>
> Thanks,
> Stephen
>
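
To make the two relative schemes concrete, here is a minimal sketch of
how one raw 32-bit kallsyms_offsets[] entry maps to an address. With
plain BASE_RELATIVE the entry is an unsigned offset added to the base;
with ABSOLUTE_PERCPU the entry is signed, non-negative values are
absolute (percpu) addresses, and negative values are relative to
kallsyms_relative_base - 1. kallsyms_offset_to_addr() is a hypothetical
name used only for illustration; it is not the absolute_percpu() helper
from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Turn one raw kallsyms_offsets[] entry into an address.
 * absolute_percpu selects the ABSOLUTE_PERCPU interpretation;
 * otherwise plain BASE_RELATIVE is assumed.
 * Hypothetical helper, for illustration only.
 */
static uint64_t kallsyms_offset_to_addr(uint64_t relative_base, uint32_t raw,
					bool absolute_percpu)
{
	if (!absolute_percpu)
		return relative_base + raw;	/* unsigned offset from the base */

	if (!(raw & 0x80000000u))
		return raw;			/* non-negative: absolute value */

	/* negative offset -N encodes the address relative_base - 1 + N */
	return relative_base - 1 + (uint64_t)(uint32_t)(0u - raw);
}
```

Since vmcoreinfo does not say which scheme the kernel was built with,
comparing both results for a known symbol such as _stext against its
expected address is one way to pick the right interpretation.
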
> > +     stext = get_kallsyms_value_by_name("_stext");
> > +     if (SYMBOL(_stext) == absolute_percpu(kallsyms_relative_base, stext)) {
> > +             for (i = 0; i < NAME_HASH; i++) {
> > +                     for (kern_syment = name_hash_table[i];
> > +                          kern_syment;
> > +                          kern_syment = kern_syment->name_hash_next)
> > +                             kern_syment->value = absolute_percpu(kallsyms_relative_base,
> > +                                                     kern_syment->value);
> > +             }
> > +     } else if (SYMBOL(_stext) == kallsyms_relative_base + stext) {
> > +             for (i = 0; i < NAME_HASH; i++) {
> > +                     for (kern_syment = name_hash_table[i];
> > +                          kern_syment;
> > +                          kern_syment = kern_syment->name_hash_next)
> > +                             kern_syment->value += kallsyms_relative_base;
> > +             }
> > +     } else {
> > +             fprintf(stderr, "%s: Wrong calculate kallsyms symbol value!\n", __func__);
> > +             goto out;
> > +     }
> > +
> > +     return true;
> > +no_mem:
> > +     fprintf(stderr, "%s: Not enough memory!\n", __func__);
> > +out:
> > +     return false;
> > +}
> > +
> > +static bool vmcore_info_ready = false;
> > +
> > +bool read_vmcoreinfo_kallsyms(void)
> > +{
> > +     READ_SYMBOL("kallsyms_names", kallsyms_names);
> > +     READ_SYMBOL("kallsyms_num_syms", kallsyms_num_syms);
> > +     READ_SYMBOL("kallsyms_token_table", kallsyms_token_table);
> > +     READ_SYMBOL("kallsyms_token_index", kallsyms_token_index);
> > +     READ_SYMBOL("kallsyms_offsets", kallsyms_offsets);
> > +     READ_SYMBOL("kallsyms_relative_base", kallsyms_relative_base);
> > +     vmcore_info_ready = true;
> > +     return true;
> > +}
> > +
> > +bool init_kernel_kallsyms(void)
> > +{
> > +     const int token_index_size = (UINT8_MAX + 1) * sizeof(uint16_t);
> > +     uint64_t last_token, len;
> > +     unsigned char data, data_old;
> > +     int i;
> > +     bool ret = false;
> > +
> > +     if (vmcore_info_ready == false) {
> > +             fprintf(stderr, "%s: vmcoreinfo not ready for kallsyms!\n",
> > +                     __func__);
> > +             return ret;
> > +     }
> > +
> > +     readmem(VADDR, SYMBOL(kallsyms_num_syms), &kallsyms_num_syms,
> > +             sizeof(kallsyms_num_syms));
> > +     readmem(VADDR, SYMBOL(kallsyms_relative_base), &kallsyms_relative_base,
> > +             sizeof(kallsyms_relative_base));
> > +
> > +     kallsyms_offsets = malloc(sizeof(uint32_t) * kallsyms_num_syms);
> > +     if (!kallsyms_offsets)
> > +             goto no_mem;
> > +     readmem(VADDR, SYMBOL(kallsyms_offsets), kallsyms_offsets,
> > +             kallsyms_num_syms * sizeof(uint32_t));
> > +
> > +     kallsyms_token_index = malloc(token_index_size);
> > +     if (!kallsyms_token_index)
> > +             goto no_mem;
> > +     readmem(VADDR, SYMBOL(kallsyms_token_index), kallsyms_token_index,
> > +             token_index_size);
> > +
> > +     last_token = SYMBOL(kallsyms_token_table) + kallsyms_token_index[UINT8_MAX];
> > +     do {
> > +             readmem(VADDR, last_token++, &data, 1);
> > +     } while(data);
> > +     len = last_token - SYMBOL(kallsyms_token_table);
> > +     kallsyms_token_table = malloc(len);
> > +     if (!kallsyms_token_table)
> > +             goto no_mem;
> > +     readmem(VADDR, SYMBOL(kallsyms_token_table), kallsyms_token_table, len);
> > +
> > +     for (len = 0, i = 0; i < kallsyms_num_syms; i++) {
> > +             readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
> > +             if (data & 0x80) {
> > +                     len += 1;
> > +                     data_old = data;
> > +                     readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
> > +                     if (data & 0x80) {
> > +                             fprintf(stderr, "%s: BUG! Unexpected 3-byte length"
> > +                                     " encoding in kallsyms names\n", __func__);
> > +                             goto out;
> > +                     }
> > +                     data = (data_old & 0x7F) | (data << 7);
> > +             }
> > +             len += data + 1;
> > +     }
> > +     kallsyms_names = malloc(len);
> > +     if (!kallsyms_names)
> > +             goto no_mem;
> > +     readmem(VADDR, SYMBOL(kallsyms_names), kallsyms_names, len);
> > +
> > +     ret = parse_kernel_kallsyms();
> > +     goto out;
> > +
> > +no_mem:
> > +     fprintf(stderr, "%s: Not enough memory!\n", __func__);
> > +out:
> > +     if (kallsyms_offsets)
> > +             free(kallsyms_offsets);
> > +     if (kallsyms_token_index)
> > +             free(kallsyms_token_index);
> > +     if (kallsyms_token_table)
> > +             free(kallsyms_token_table);
> > +     if (kallsyms_names)
> > +             free(kallsyms_names);
> > +     return ret;
> > +}
> > diff --git a/kallsyms.h b/kallsyms.h
> > new file mode 100644
> > index 0000000..a4fbe10
> > --- /dev/null
> > +++ b/kallsyms.h
> > @@ -0,0 +1,17 @@
> > +#ifndef _KALLSYMS_H
> > +#define _KALLSYMS_H
> > +
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +
> > +struct __attribute__((packed)) syment {
> > +     uint64_t value;
> > +     char *name;
> > +     struct syment *name_hash_next;
> > +};
> > +
> > +bool read_vmcoreinfo_kallsyms(void);
> > +bool init_kernel_kallsyms(void);
> > +uint64_t get_kallsyms_value_by_name(char *);
> > +
> > +#endif /* _KALLSYMS_H */
> > \ No newline at end of file
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index 12fb0d8..dba3628 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -27,6 +27,7 @@
> >  #include <limits.h>
> >  #include <assert.h>
> >  #include <zlib.h>
> > +#include "kallsyms.h"
> >
> >  struct symbol_table  symbol_table;
> >  struct size_table    size_table;
> > @@ -3105,6 +3106,8 @@ read_vmcoreinfo_from_vmcore(off_t offset, unsigned long size, int flag_xen_hv)
> >               if (!read_vmcoreinfo())
> >                       goto out;
> >       }
> > +     read_vmcoreinfo_kallsyms();
> > +
> >       close_vmcoreinfo();
> >
> >       ret = TRUE;
> > diff --git a/makedumpfile.h b/makedumpfile.h
> > index 134eb7a..0dec50e 100644
> > --- a/makedumpfile.h
> > +++ b/makedumpfile.h
> > @@ -259,6 +259,7 @@ static inline int string_exists(char *s) { return (s ? TRUE : FALSE); }
> >  #define UINT(ADDR)   *((unsigned int *)(ADDR))
> >  #define ULONG(ADDR)  *((unsigned long *)(ADDR))
> >  #define ULONGLONG(ADDR)      *((unsigned long long *)(ADDR))
> > +#define VOID_PTR(ADDR)  *((void **)(ADDR))
> >
> >
> >  /*
> > @@ -1919,6 +1920,16 @@ struct symbol_table {
> >        * symbols on sparc64 arch
> >        */
> >       unsigned long long              vmemmap_table;
> > +
> > +     /*
> > +      * kallsyms related
> > +      */
> > +     unsigned long long              kallsyms_names;
> > +     unsigned long long              kallsyms_num_syms;
> > +     unsigned long long              kallsyms_token_table;
> > +     unsigned long long              kallsyms_token_index;
> > +     unsigned long long              kallsyms_offsets;
> > +     unsigned long long              kallsyms_relative_base;
> >  };
> >
> >  struct size_table {
> > --
> > 2.47.0
>




* Re: [PATCH v3 6/8] Add page filtering function
  2026-01-23  0:54   ` Stephen Brennan
@ 2026-01-27  3:21     ` Tao Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-01-27  3:21 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

On Fri, Jan 23, 2026 at 2:04 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Tao Liu <ltao@redhat.com> writes:
> > pfn and num are the data that extensions pass to makedumpfile for mm page
> > filtering. Since makedumpfile iterates over pfns in ascending order in
> > __exclude_unnecessary_pages(), pfn and num are stored in ft_page_info linked
> > lists sorted by ascending pfn. If one pfn is hit by a list entry, the next
> > pfn is most likely to be hit either by the same entry or by the one that
> > follows, so a cur variable saves the current list position to speed up
> > the pfn checking process.
>
> I'm wondering about the trade-off for using a linked list versus an
> array. Using the linked list, we are forced to maintain the sorted
> order as we construct the list, which is an O(N^2) insertion sort.
>
> If instead we used an array, we could sort it with qsort() once, at the
> end. Then we could merge any overlapping ranges. Lookup could be
> implemented cheaply with bsearch(), and we could continue to use the
> optimization where we maintain a "cur" pointer.  I believe the overall
> runtime complexity of the array approach would be O(N*log(N)) without
> requiring hand-implementing anything too complex, compared to O(N^2).
>
> Depending on the number of pages (and how fragmented they are), this
> may or may not be an issue.
>
> In my testing for userspace tasks, the number of pages retained can be
> on the order of ~100k. However -- my use case can't really use a list of
> PFNs, which I'll explain below. So my use case doesn't really matter too
> much here -- maybe your use case has relatively few page ranges, so the
> cost of O(N^2) is not bad.
>
> So I guess I don't have a strong preference - but it's worth
> considering.
>
> > In addition, 2 ft_page_info linked list chains are used, one for mm page
> > discarding and the other for page keeping.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> >  erase_info.c   | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  erase_info.h   | 12 +++++++
> >  makedumpfile.c | 28 ++++++++++++---
> >  3 files changed, 134 insertions(+), 4 deletions(-)
> >
> > diff --git a/erase_info.c b/erase_info.c
> > index b67d1d0..8838bea 100644
> > --- a/erase_info.c
> > +++ b/erase_info.c
> > @@ -2466,3 +2466,101 @@ get_size_eraseinfo(void)
> >       return size_eraseinfo;
> >  }
> >
> > +/* Pages to be discarded */
> > +static struct ft_page_info *ft_head_discard = NULL;
> > +/* Pages to be kept */
> > +static struct ft_page_info *ft_head_keep = NULL;
> > +
> > +/*
> > + * Insert the ft_page_info blocks into ft_head by ascending pfn.
> > + */
> > +bool
> > +update_filter_pages_info(unsigned long pfn, unsigned long num, bool to_discard)
> > +{
> > +     struct ft_page_info *p, **ft_head;
> > +     struct ft_page_info *new_p = malloc(sizeof(struct ft_page_info));
> > +
> > +     ft_head = to_discard ? &ft_head_discard : &ft_head_keep;
> > +
> > +     if (!new_p) {
> > +             ERRMSG("Can't allocate memory for ft_page_info at %lx\n", pfn);
> > +             return false;
> > +     }
> > +     new_p->pfn = pfn;
> > +     new_p->num = num;
> > +     new_p->next = NULL;
> > +
> > +     if (!(*ft_head) || (*ft_head)->pfn > new_p->pfn) {
> > +             new_p->next = (*ft_head);
> > +             (*ft_head) = new_p;
> > +             return true;
> > +     }
> > +
> > +     p = (*ft_head);
> > +     while (p->next != NULL && p->next->pfn < new_p->pfn) {
> > +             p = p->next;
> > +     }
> > +
> > +     new_p->next = p->next;
> > +     p->next = new_p;
>
> It might be wise to defensively handle the case of overlapping
> PFN ranges by merging them.
>
> > +     return true;
> > +}
> > +
> > +/*
> > + * Check if the pfn hit ft_page_info block.
> > + *
> > + * pfn and ft_head are in ascending order, so save the current ft_page_info
> > + * block into **p because it is likely to hit again next time.
> > + */
> > +bool
> > +filter_page(unsigned long pfn, struct ft_page_info **p, bool handle_discard)
> > +{
> > +     struct ft_page_info *ft_head;
> > +
> > +     ft_head = handle_discard ? ft_head_discard : ft_head_keep;
> > +
> > +     if (ft_head == NULL)
> > +             return false;
> > +
> > +     if (*p == NULL)
> > +             *p = ft_head;
> > +
> > +     /* The gap before 1st block */
> > +     if (pfn >= 0 && pfn < ft_head->pfn)
> > +             return false;
> > +
> > +     /* Handle 1~(n-1) blocks and following gaps */
> > +     while ((*p)->next) {
> > +             if (pfn >= (*p)->pfn && pfn < (*p)->pfn + (*p)->num)
> > +                     return true; // hit the block
> > +             if (pfn >= (*p)->pfn + (*p)->num && pfn < (*p)->next->pfn)
> > +                     return false; // the gap after the block
> > +             *p = (*p)->next;
> > +     }
> > +
> > +     /* The last block and gap */
> > +     if (pfn >= (*p)->pfn + (*p)->num)
> > +             return false;
> > +     else
> > +             return true;
> > +}
> > +
> > +static void
> > +do_cleanup(struct ft_page_info **ft_head)
> > +{
> > +     struct ft_page_info *p, *p_tmp;
> > +
> > +     for (p = *ft_head; p;) {
> > +             p_tmp = p;
> > +             p = p->next;
> > +             free(p_tmp);
> > +     }
> > +     *ft_head = NULL;
> > +}
> > +
> > +void
> > +cleanup_filter_pages_info(void)
> > +{
> > +     do_cleanup(&ft_head_discard);
> > +     do_cleanup(&ft_head_keep);
> > +}
> > diff --git a/erase_info.h b/erase_info.h
> > index b363a40..6c60706 100644
> > --- a/erase_info.h
> > +++ b/erase_info.h
> > @@ -20,6 +20,7 @@
> >  #define _ERASE_INFO_H
> >
> >  #define MAX_SIZE_STR_LEN (26)
> > +#include <stdbool.h>
> >
> >  /*
> >   * Erase information, original symbol expressions.
> > @@ -65,5 +66,16 @@ void filter_data_buffer_parallel(unsigned char *buf, unsigned long long paddr,
> >  unsigned long get_size_eraseinfo(void);
> >  int update_filter_info_raw(unsigned long long, int, int);
> >
> > +bool update_filter_pages_info(unsigned long, unsigned long, bool);
> > +
> > +struct ft_page_info {
> > +     unsigned long pfn;
> > +     unsigned long num;
> > +     struct ft_page_info *next;
> > +} __attribute__((packed));
> > +
> > +bool filter_page(unsigned long, struct ft_page_info **p, bool handle_discard);
> > +void cleanup_filter_pages_info(void);
> > +
> >  #endif /* _ERASE_INFO_H */
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index ca8ed8a..ebac8da 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -102,6 +102,7 @@ mdf_pfn_t pfn_free;
> >  mdf_pfn_t pfn_hwpoison;
> >  mdf_pfn_t pfn_offline;
> >  mdf_pfn_t pfn_elf_excluded;
> > +mdf_pfn_t pfn_extension;
> >
> >  mdf_pfn_t num_dumped;
> >
> > @@ -6459,6 +6460,8 @@ __exclude_unnecessary_pages(unsigned long mem_map,
> >       unsigned int order_offset, dtor_offset;
> >       unsigned long flags, mapping, private = 0;
> >       unsigned long compound_dtor, compound_head = 0;
> > +     struct ft_page_info *cur_discard = NULL;
> > +     struct ft_page_info *cur_keep = NULL;
> >
> >       /*
> >        * If a multi-page exclusion is pending, do it first
> > @@ -6495,6 +6498,13 @@ __exclude_unnecessary_pages(unsigned long mem_map,
> >               if (info->flag_cyclic && !is_cyclic_region(pfn, cycle))
> >                       continue;
> >
> > +             /*
> > +              * Keep pages that are specified by the user via
> > +              * makedumpfile extensions
> > +              */
> > +             if (filter_page(pfn, &cur_keep, false))
> > +                     continue;
> > +
>
> It makes sense to allow plugins to enumerate a list of PFNs to override
> and include. I like that - it's simple enough. But it's not flexible
> enough for my use case with userspace stacks :(
>
> The userspace stack region is an anon_vma. My plugin can enumerate the
> anon_vmas that it wants to save, but it's prohibitively expensive and
> complex to enumerate the list of pages associated with each anon_vma. We
> would need to do a page table walk for each process.
>
> There's a simpler way: from the struct page mapping and index fields,
> it's possible to determine which anon_vma the page is associated with,
> and what index it has within the VMA. And from this, we can make the
> determination of whether to include a page or not. This is what I had
> implemented in this patch:
>
> https://github.com/brenns10/makedumpfile/commit/1c0a828ef80962480f771915c2d494272721b659#diff-2593512d7ec329b34b1ca5686a7b6b073d0ca636df8ff20fea04684da2c8e063R6692-R12150
>
> So, I wonder if it makes sense to allow a plugin to register a callback
> to be called here, so the plugin can make the more complex decision?
> This would keep the logic outside of the core makedumpfile code, but
> allow the necessary flexibility.
>
> Something like:
>
> if (plugin_keep_page_callback && plugin_keep_page_callback(pfn, pcache))
>     continue;
>
> And then the extension system could allow an extension to register that
> callback. It would need to keep the extension loaded for the duration of
> the execution of makedumpfile (rather than calling dlclose()
> immediately).
>
> What do you think about this? I'm happy to implement this part of it
> separate from your patch series -- you could simply drop the stuff
> related to page inclusion, and I can add the necessary pieces when I
> submit my extension patches.

Thanks a lot for your comments and suggestions! For my GPU mm
filtering case, the pages are often allocated in large blocks, so a
single list entry like {discard: true, start_pfn: 100, pfn_nums: 100000}
covers a large range of pfns. The linked lists therefore stay short,
and overlap handling/merging isn't a performance bottleneck for me. I
agree that this can be a problem for scattered pages, as in your case.
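
For comparison, the array-based alternative (qsort() once, merge
overlapping ranges, then look up with bsearch()) could look roughly like
the sketch below. It is only an illustration under stated assumptions:
struct pfn_range and the ranges_*() helpers are hypothetical names, not
part of this series, and overflow of pfn + num is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array element: the half-open range [pfn, pfn + num). */
struct pfn_range {
	unsigned long pfn;	/* first pfn of the range */
	unsigned long num;	/* number of pages */
};

static int range_cmp(const void *a, const void *b)
{
	const struct pfn_range *x = a, *y = b;

	return (x->pfn > y->pfn) - (x->pfn < y->pfn);
}

/* Sort by start pfn, merge overlapping/adjacent ranges; returns new count. */
static size_t ranges_finalize(struct pfn_range *r, size_t n)
{
	size_t i, out = 0;

	if (n == 0)
		return 0;
	assert(r != NULL);
	qsort(r, n, sizeof(*r), range_cmp);
	for (i = 1; i < n; i++) {
		unsigned long end = r[out].pfn + r[out].num;

		if (r[i].pfn <= end) {
			/* Overlapping or adjacent: extend the current range. */
			unsigned long new_end = r[i].pfn + r[i].num;

			if (new_end > end)
				r[out].num = new_end - r[out].pfn;
		} else {
			r[++out] = r[i];
		}
	}
	return out + 1;
}

/* bsearch() comparator: key is a pfn, element is a range. */
static int pfn_in_range_cmp(const void *key, const void *elem)
{
	unsigned long pfn = *(const unsigned long *)key;
	const struct pfn_range *r = elem;

	if (pfn < r->pfn)
		return -1;
	if (pfn >= r->pfn + r->num)
		return 1;
	return 0;
}

/* True if pfn falls inside any range of a finalized array. */
static bool ranges_contain(const struct pfn_range *r, size_t n,
			   unsigned long pfn)
{
	return n && bsearch(&pfn, r, n, sizeof(*r), pfn_in_range_cmp) != NULL;
}
```

Because the merged array is sorted and non-overlapping, a "cur" index
could still be kept on top of this to skip the bsearch() when
consecutive pfns land in the same range.
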

Here is my plan: in v4, I will redesign the extensions part so that
extensions declare which kallsyms/btf symbols they are interested in,
and remove the hash tables for kallsyms as you suggested (many thanks
for that, I think it is a brilliant idea for saving kallsyms
memory!). I will leave the linked list / array page (ex)inclusion
function to you, so we can cooperate on this. Thanks again for your
support and help :)

Thanks,
Tao Liu

>
> Thanks,
> Stephen
>
> >               /*
> >                * Exclude the memory hole.
> >                */
> > @@ -6687,6 +6697,14 @@ check_order:
> >               else if (isOffline(flags, _mapcount)) {
> >                       pfn_counter = &pfn_offline;
> >               }
> > +             /*
> > +              * Exclude pages that are specified by the user via
> > +              * makedumpfile extensions
> > +              */
> > +             else if (filter_page(pfn, &cur_discard, true)) {
> > +                     nr_pages = 1;
> > +                     pfn_counter = &pfn_extension;
> > +             }
> >               /*
> >                * Unexcludable page
> >                */
> > @@ -6748,6 +6766,7 @@ exclude_unnecessary_pages(struct cycle *cycle)
> >               print_progress(PROGRESS_UNN_PAGES, info->num_mem_map, info->num_mem_map, NULL);
> >               print_execution_time(PROGRESS_UNN_PAGES, &ts_start);
> >       }
> > +     cleanup_filter_pages_info();
> >
> >       return TRUE;
> >  }
> > @@ -8234,7 +8253,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
> >        */
> >       if (info->flag_cyclic) {
> >               pfn_zero = pfn_cache = pfn_cache_private = 0;
> > -             pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
> > +             pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
> >               pfn_memhole = info->max_mapnr;
> >       }
> >
> > @@ -9579,7 +9598,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
> >                * Reset counter for debug message.
> >                */
> >               pfn_zero = pfn_cache = pfn_cache_private = 0;
> > -             pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
> > +             pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_extension = 0;
> >               pfn_memhole = info->max_mapnr;
> >
> >               /*
> > @@ -10528,7 +10547,7 @@ print_report(void)
> >       pfn_original = info->max_mapnr - pfn_memhole;
> >
> >       pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
> > -         + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
> > +         + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
> >
> >       REPORT_MSG("\n");
> >       REPORT_MSG("Original pages  : 0x%016llx\n", pfn_original);
> > @@ -10544,6 +10563,7 @@ print_report(void)
> >       REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
> >       REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
> >       REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
> > +     REPORT_MSG("    Extension filter pages  : 0x%016llx\n", pfn_extension);
> >       REPORT_MSG("  Remaining pages  : 0x%016llx\n",
> >           pfn_original - pfn_excluded);
> >
> > @@ -10584,7 +10604,7 @@ print_mem_usage(void)
> >       pfn_original = info->max_mapnr - pfn_memhole;
> >
> >       pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
> > -         + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
> > +         + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_extension;
> >       shrinking = (pfn_original - pfn_excluded) * 100;
> >       shrinking = shrinking / pfn_original;
> >       total_size = info->page_size * pfn_original;
> > --
> > 2.47.0
>




* Re: [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
                   ` (8 preceding siblings ...)
  2026-01-20  4:39 ` [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
@ 2026-01-29 10:19 ` YAMAZAKI MASAMITSU(山崎　真光)
  2026-02-04  8:50   ` Tao Liu
  9 siblings, 1 reply; 23+ messages in thread
From: YAMAZAKI MASAMITSU(山崎　真光) @ 2026-01-29 10:19 UTC (permalink / raw)
  To: Tao Liu, HAGIO KAZUHITO(萩尾　一仁),
	kexec@lists.infradead.org, stephen.s.brennan@oracle.com
  Cc: aravinda@linux.vnet.ibm.com

On 2026/01/20 11:54, Tao Liu wrote:
> A) This patchset will introduce the following features to makedumpfile:
>
>    1) Add .so extension support to makedumpfile
>    2) Enable btf and kallsyms for symbol type and address resolving.
>
> B) The purpose of the features are:
>
>    1) Currently makedumpfile filters mm pages based on page flags, because flags
>       can help to determine one page's usage. But this page-flag-checking method
>       lacks of flexibility in certain cases, e.g. if we want to filter those mm
>       pages occupied by GPU during vmcore dumping due to:
>
>       a) GPU may be taking a large memory and contains sensitive data;
>       b) GPU mm pages have no relations to kernel crash and useless for vmcore
>          analysis.
>
>       But there is no GPU mm page specific flags, and apparently we don't need
>       to create one just for kdump use. A programmable filtering tool is more
>       suitable for such cases. In addition, different GPU vendors may use
>       different ways for mm pages allocating, programmable filtering is better
>       than hard coding these GPU specific logics into makedumpfile in this case.
>
>    2) Currently makedumpfile already contains a programmable filtering tool, aka
>       eppic script, which allows user to write customized code for data erasing.
>       However it has the following drawbacks:
>
>       a) cannot do mm page filtering.
>       b) need to access to debuginfo of both kernel and modules, which is not
>          applicable in the 2nd kernel.
>       c) eppic library has memory leaks which are not all resolved [1]. This
>          is not acceptable in 2nd kernel.
>
>       makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
>       types and addresses. In recent kernel there are dwarf alternatives such
>       as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>       are already packed within vmcore, so we can use it directly.
>
>    With these, this patchset introduces makedumpfile extensions, which is based
>    on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>    The following section shows its usage and performance, please note the tests
>    are performed in 1st kernel.
>
>    3) Compile and run makedumpfile extensions:
>
>    $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on
>    $ make extensions
>    
>    $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>      /tmp/extension.out
>      Loaded extension: ./extensions/amdgpu_filter.so
>      makedumpfile Completed.
>          User time (seconds): 6.37
>          System time (seconds): 0.70
>          Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.10
>          Maximum resident set size (kbytes): 38024
>          ...
>   
>       To contrast with eppic script of v2 [2]:
>
>    $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>      /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>      makedumpfile Completed.
>          User time (seconds): 8.23
>          System time (seconds): 0.88
>          Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
>          Maximum resident set size (kbytes): 57128
>          ...
>
>    -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
>    -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
>    -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
>    1) GPU types: Currently only tested with amdgpu's mm page filtering, others
>       are not tested.
>    2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>       Others are not tested.
>
> D) Testing:
>
>       If you don't want to create your vmcore, you can find a vmcore which I
>       created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
>       allocated by program [4]. You can use the vmcore in 1st kernel to filter
>       the amdgpu mm pages by the previous performance testing cmdline. To
>       verify the pages are filtered in crash:
>
>       Unfiltered:
>       crash> search -c "!QAZXSW@#EDC"
>       ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>       ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>       crash> rd ffff96b7fa800000
>       ffff96b7fa800000:  405753585a415121                    !QAZXSW@
>       crash> rd ffff96b87c800000
>       ffff96b87c800000:  405753585a415121                    !QAZXSW@
>
>       Filtered:
>       crash> search -c "!QAZXSW@#EDC"
>       crash> rd ffff96b7fa800000
>       rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
>       crash> rd ffff96b87c800000
>       rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
>
> [1]: https://github.com/lucchouina/eppic/pull/32
> [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> [3]: https://people.redhat.com/~ltao/core/vmcore
> [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> v3 -> v2:
>
> 1) Removed btf/kallsyms support for eppic script, and introduced
>     makedumpfile .so extension instead. The reason of removing eppic
>     support is:
>     a) Native binary code as .so has better performance than scripting,
>        see the time consumption contrast above.
>     b) Eppic library has memory leaks which haven't been fully fixed;
>        memory leaks in 2nd kernel might be fatal.
>
> 2) Removed the code of manually parsing btf info, and used libbpf for
>     btf info parsing instead. The reason of removing manually parsing is:
>     a) Less code modification to makedumpfile, easier to maintain.
>     b) The performance of using libbpf is as good as manual parsing +
>        hash table indexing, as well as less memory consumption, see time
>        and memory consumption contrast above.
>
> 3) The patches are organized as follows:
>
>      --- <only for test purpose, don't merge> ---
>      8.Filter amdgpu mm pages
>      7.Add maple tree support to makedumpfile extension
>
>      --- <code should be merged> ---
>      6.Add page filtering function
>      5.Add makedumpfile extension support
>      4.Implement kernel modules' btf resolving
>      3.Implement kernel modules' kallsyms resolving
>      2.Implement kernel btf resolving
>      1.Implement kernel kallsyms resolving
>
>      Patch 7 & 8 are customization specific, which can be maintained separately.
>      Patch 1 ~ 6 are common code which should be integrate with makedumpfile.
>
> Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
>
> Tao Liu (8):
>    Implement kernel kallsyms resolving
>    Implement kernel btf resolving
>    Implement kernel modules' kallsyms resolving
>    Implement kernel modules' btf resolving
>    Add makedumpfile extension support
>    Add page filtering function
>    Add maple tree support to makedumpfile extension
>    Filter amdgpu mm pages
>
>   Makefile                   |   9 +-
>   btf_info.c                 | 260 +++++++++++++++++++++++++
>   btf_info.h                 |  66 +++++++
>   erase_info.c               |  98 ++++++++++
>   erase_info.h               |  12 ++
>   extension.c                |  82 ++++++++
>   extensions/Makefile        |  10 +
>   extensions/amdgpu_filter.c |  90 +++++++++
>   extensions/maple_tree.c    | 336 +++++++++++++++++++++++++++++++++
>   extensions/maple_tree.h    |   6 +
>   kallsyms.c                 | 376 +++++++++++++++++++++++++++++++++++++
>   kallsyms.h                 |  20 ++
>   makedumpfile.c             |  35 +++-
>   makedumpfile.h             |  11 ++
>   14 files changed, 1405 insertions(+), 6 deletions(-)
>   create mode 100644 btf_info.c
>   create mode 100644 btf_info.h
>   create mode 100644 extension.c
>   create mode 100644 extensions/Makefile
>   create mode 100644 extensions/amdgpu_filter.c
>   create mode 100644 extensions/maple_tree.c
>   create mode 100644 extensions/maple_tree.h
>   create mode 100644 kallsyms.c
>   create mode 100644 kallsyms.h

Thank you for the v3 patch, and thanks for the appropriate
improvements.

I believe the results will be significantly improved by stopping use
of eppic and using libbpf instead.

My concerns are as follows:

* Performance has improved compared to the version using eppic,
but I would also like to see a comparison against makedumpfile without
these patches applied.

* This mechanism is enabled whenever .so files (shared libraries) are present,
but I think it would be better to have an option to turn this feature
on or off. Otherwise, users who don't want the extensions have to delete or
move the files, which can be inconvenient.

I'm looking forward to v4, which will improve things like the calculation order.

Masa

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-01-22 13:43     ` Tao Liu
@ 2026-02-04  8:40       ` Tao Liu
  2026-03-11  0:38         ` Stephen Brennan
  0 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-02-04  8:40 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

Sorry, it took me some time to rework the v3 code. Please see the
draft v4 [1] for a preview.

I followed your suggestion to let extensions declare the kallsyms
symbols/btf types they need. During kallsyms/btf initialization, only
the declared symbols/types are resolved, which gets rid of the hash
table. Extensions no longer need to initialize the needed
symbols/types themselves.

In addition, users can specify which extensions to load on the
makedumpfile command line:

$ ./makedumpfile -d 31 -l \
    /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out \
    --extension amdgpu_filter.so

Also, "--extension" can be given several times, and amdgpu_filter.so can
be an absolute path as well. If no extensions are specified, the
btf/kallsyms code of makedumpfile is not initialized.
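
For readers following the thread, the repeated-option behaviour described
above could be sketched roughly as follows. This is only an illustration
built on getopt_long(); collect_extensions(), MAX_EXTENSIONS and
OPT_EXTENSION are hypothetical names and are not taken from the draft v4
branch.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

#define MAX_EXTENSIONS 8	/* hypothetical limit for this sketch */
#define OPT_EXTENSION  1000	/* hypothetical value for the long-only option */

/*
 * Collect every "--extension <name>" argument into names[].
 * Returns the number collected, or -1 on an unknown option or overflow.
 */
static int collect_extensions(int argc, char **argv,
			      const char **names, int max)
{
	static const struct option longopts[] = {
		{ "extension", required_argument, NULL, OPT_EXTENSION },
		{ NULL, 0, NULL, 0 },
	};
	int opt, count = 0;

	while ((opt = getopt_long(argc, argv, "d:l", longopts, NULL)) != -1) {
		switch (opt) {
		case OPT_EXTENSION:
			if (count >= max)
				return -1;
			/* May be a bare file name or an absolute path. */
			names[count++] = optarg;
			break;
		case 'd':
		case 'l':
			break;	/* existing options, ignored in this sketch */
		default:
			return -1;
		}
	}
	return count;
}
```

A return value of 0 would then be the signal to skip btf/kallsyms
initialization entirely.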

Is this what you had in mind? Any suggestions are welcome.
Thanks in advance!

Thanks,
Tao Liu







[1]: https://github.com/liutgnu/makedumpfile/commits/v4/

On Fri, Jan 23, 2026 at 2:43 AM Tao Liu <ltao@redhat.com> wrote:
>
> Hi Stephen,
>
> Thanks a lot for your quick reply and detailed information, I really
> appreciate it!
>
> On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
> >
> > Hi Tao,
> >
> > This series looks really great -- I'm excited to see the switch to
> > native .so extensions instead of epicc. I've applied the series locally
> > and I'll rebuild my userspace stack inclusion feature based on it, to
> > try it out myself.
>
> Awesome, looking forward to your feedback on the code/API designs etc...
>
> >
> > In the meantime, I'll share some of my feedback on the patches (though
> > I'm not a makedumpfile developer). This seems like the most important
> > patch in terms of design, so I'll start here.
> >
> > Tao Liu <ltao@redhat.com> writes:
> > > This patch will add .so extension support to makedumpfile, similar to crash
> > > extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
> > > and "./extensions" are searched for extensions. Once found, kallsyms and btf
> > > will be initialized so all extensions can benifit from it (Currently makedumpfile
> > > doesn't use these info, we can move the kallsyms/btf init code else where later
> > > if makedumpfile needs them).
> > >
> > > The makedumpfile extension is to help users to customize mm page filtering upon
> > > traditional mm page flag filtering, without make code modification on makedumpfile
> > > itself.
> > >
> > > Signed-off-by: Tao Liu <ltao@redhat.com>
> > > ---
> > >  Makefile            |  7 +++-
> > >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
> > >  extensions/Makefile | 10 ++++++
> > >  makedumpfile.c      |  4 +++
> > >  4 files changed, 102 insertions(+), 1 deletion(-)
> > >  create mode 100644 extension.c
> > >  create mode 100644 extensions/Makefile
> > >
> > > diff --git a/Makefile b/Makefile
> > > index f3f4da8..7e29220 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
> > >  endif
> > >
> > >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> > > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
> > > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
> > >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
> > >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
> > >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> > > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
> > >
> > >  clean:
> > >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
> > > +     $(MAKE) -C extensions clean
> > >
> > >  install:
> > >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
> > > @@ -135,3 +136,7 @@ install:
> > >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
> > >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
> > >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
> > > +
> > > +.PHONY: extensions
> > > +extensions:
> > > +     $(MAKE) -C extensions CC=$(CC)
> > > \ No newline at end of file
> > > diff --git a/extension.c b/extension.c
> > > new file mode 100644
> > > index 0000000..6ee7f4e
> > > --- /dev/null
> > > +++ b/extension.c
> > > @@ -0,0 +1,82 @@
> > > +#include <stdio.h>
> > > +#include <stdlib.h>
> > > +#include <string.h>
> > > +#include <dirent.h>
> > > +#include <dlfcn.h>
> > > +#include <stdbool.h>
> > > +#include "kallsyms.h"
> > > +#include "btf_info.h"
> > > +
> > > +static const char *dirs[] = {
> > > +     "/usr/lib64/makedumpfile/extensions",
> > > +     "./extensions",
> > > +};
> > > +
> > > +/* Will only init once */
> > > +static bool init_kallsyms_btf(void)
> > > +{
> > > +     static bool ret = false;
> > > +     static bool has_inited = false;
> > > +
> > > +     if (has_inited)
> > > +             goto out;
> > > +     if (!init_kernel_kallsyms())
> > > +             goto out;
> > > +     if (!init_kernel_btf())
> > > +             goto out;
> > > +     if (!init_module_kallsyms())
> > > +             goto out;
> > > +     if (!init_module_btf())
> > > +             goto out;
> > > +     ret = true;
> >
> > I feel it would be good practice to load as little information as is
> > necessary for the task. If "amdgpu" module is required, then load kernel
> > kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
> > debuginfo is required, then just the kernel would suffice.
> >
> > This would reduce memory usage and runtime, though I don't know if it
> > would show up in profiling. The main benefit could be reliability: by
> > handling less data, there are fewer chances to hit an error.
>
> OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
> module btf/kallsyms info is a reasonable design. So kernel modules'
> info can be loaded on demand.
>
> >
> > > +out:
> > > +     has_inited = true;
> > > +     return ret;
> > > +}
> > > +
> > > +static void cleanup_kallsyms_btf(void)
> > > +{
> > > +     cleanup_kallsyms();
> > > +     cleanup_btf();
> > > +}
> > > +
> > > +void run_extensions(void)
> > > +{
> > > +     DIR *dir;
> > > +     struct dirent *entry;
> > > +     size_t len;
> > > +     int i;
> > > +     void *handle;
> > > +     char path[512];
> > > +
> > > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
> > > +             if ((dir = opendir(dirs[i])) != NULL)
> > > +                     break;
> > > +     }
> > > +
> > > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
> > > +             /* No extensions found */
> > > +             return;
> >
> > It could be confusing that makedumpfile would behave differently with
> > the same command-line arguments depending on the presence or absence of
> > these extensions on the filesystem.
> >
> > I think it may fit users' expectations better if they are required to
> > specify extensions on the command line. Then we could load them by
> > searching each directory in order. This allows:
> >
> > (a) more expected behavior
> > (b) multiple extensions can exist without all being enabled, thus more
> >     flexibility
> > (c) extensions can be present in the local "extensions/" directory, or
> >     in the system directory
>
> Sure, it also sounds reasonable. My original thoughts are, user
> customization on mm filtering are specified in .so, and if user don't
> need one .so, e.g. amdgpu mm filtering for a nvidia machine, then he
> doesn't pack the amdgpu_filter.so into kdump's initramfs. I agree
> adding extra makedumpfile cmdline option to receive those needed .so
> is a better design.
>
> >
> > > +     while ((entry = readdir(dir)) != NULL) {
> > > +             len = strlen(entry->d_name);
> > > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
> > > +                     /* Will only init when .so exist */
> > > +                     if (!init_kallsyms_btf())
> > > +                             goto out;
> > > +
> > > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
> > > +                     handle = dlopen(path, RTLD_NOW);
> > > +                     if (!handle) {
> > > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
> > > +                                     __func__, path, dlerror());
> > > +                             continue;
> > > +                     }
> > > +                     printf("Loaded extension: %s\n", path);
> > > +                     dlclose(handle);
> >
> > Using the constructor/destructor of the shared object is clever! But we
> > lose some flexibility: by the time the dlopen() returns, the constructor
> > has executed and the plugin has thus executed.
> >
> > What if we instead use dlsym() to load some symbols from the DSO? In
> > particular, I think it would be useful if extensions could declare a
> > list of symbols and a list of structure information which they are
> > interested in receiving. We could use these lists to know which
> > kernel/module kallsyms & BTF we should load. We could even load the
> > information into the local variables of the extension, so the extension
> > would not need to manually load it.
> >
> > Of course this is more complex, but the benefit is:
> >
> > 1. Extensions can be written more simply, and would not need to manually
> > load each symbol & type.
> > 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
> > the loading of unnecessary module information. Instead, we'd just
> > populate the symbol addresses, struct offsets, and type sizes directly
> > into the local variables which request them.
>
> It is a clever idea! Though complex for code, I think it is doable.
>
> >
> > Again, while I don't want to prematurely optimize -- it's good to avoid
> > loading unnecessary information. I hope I've described my idea well. I
> > would be happy to work on an implementation of it based on your patches
> > here, if you're interested.
>
> Thanks again for your suggestions! I got your points and I think I can
> improve the code while waiting for maintainers ideas at the same time.
> I will let you know when done or encounter blockers if any.
>
> Thanks,
> Tao Liu
>
> >
> > Thanks,
> > Stephen
> >
> > > +             }
> > > +     }
> > > +out:
> > > +     closedir(dir);
> > > +     cleanup_kallsyms_btf();
> > > +}
> > > \ No newline at end of file
> > > diff --git a/extensions/Makefile b/extensions/Makefile
> > > new file mode 100644
> > > index 0000000..afbc61e
> > > --- /dev/null
> > > +++ b/extensions/Makefile
> > > @@ -0,0 +1,10 @@
> > > +CC ?= gcc
> > > +CONTRIB_SO :=
> > > +
> > > +all: $(CONTRIB_SO)
> > > +
> > > +$(CONTRIB_SO): %.so: %.c
> > > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
> > > +
> > > +clean:
> > > +     rm -f $(CONTRIB_SO)
> > > diff --git a/makedumpfile.c b/makedumpfile.c
> > > index dba3628..ca8ed8a 100644
> > > --- a/makedumpfile.c
> > > +++ b/makedumpfile.c
> > > @@ -10847,6 +10847,8 @@ update_dump_level(void)
> > >       }
> > >  }
> > >
> > > +void run_extensions(void);
> > > +
> > >  int
> > >  create_dumpfile(void)
> > >  {
> > > @@ -10884,6 +10886,8 @@ retry:
> > >       if (info->flag_refiltering)
> > >               update_dump_level();
> > >
> > > +     run_extensions();
> > > +
> > >       if ((info->name_filterconfig || info->name_eppic_config)
> > >                       && !gather_filter_info())
> > >               return FALSE;
> > > --
> > > 2.47.0
> >



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering
  2026-01-29 10:19 ` YAMAZAKI MASAMITSU(山崎　真光)
@ 2026-02-04  8:50   ` Tao Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-02-04  8:50 UTC (permalink / raw)
  To: YAMAZAKI MASAMITSU(山崎 真光)
  Cc: HAGIO KAZUHITO(萩尾 一仁),
	kexec@lists.infradead.org, stephen.s.brennan@oracle.com,
	aravinda@linux.vnet.ibm.com

Hi YAMAZAKI,

Sorry for the late reply.

On Thu, Jan 29, 2026 at 11:19 PM YAMAZAKI MASAMITSU(山崎　真光)
<yamazaki-msmt@nec.com> wrote:
>
> On 2026/01/20 11:54, Tao Liu wrote:
> > A) This patchset will introduce the following features to makedumpfile:
> >
> >    1) Add .so extension support to makedumpfile
> >    2) Enable btf and kallsyms for symbol type and address resolving.
> >
> > B) The purpose of the features are:
> >
> >    1) Currently makedumpfile filters mm pages based on page flags, because flags
> >       can help to determine one page's usage. But this page-flag-checking method
> >       lacks of flexibility in certain cases, e.g. if we want to filter those mm
> >       pages occupied by GPU during vmcore dumping due to:
> >
> >       a) GPU may be taking a large memory and contains sensitive data;
> >       b) GPU mm pages have no relations to kernel crash and useless for vmcore
> >          analysis.
> >
> >       But there is no GPU mm page specific flags, and apparently we don't need
> >       to create one just for kdump use. A programmable filtering tool is more
> >       suitable for such cases. In addition, different GPU vendors may use
> >       different ways for mm pages allocating, programmable filtering is better
> >       than hard coding these GPU specific logics into makedumpfile in this case.
> >
> >    2) Currently makedumpfile already contains a programmable filtering tool, aka
> >       eppic script, which allows user to write customized code for data erasing.
> >       However it has the following drawbacks:
> >
> >       a) cannot do mm page filtering.
> >       b) need to access to debuginfo of both kernel and modules, which is not
> >          applicable in the 2nd kernel.
> >       c) eppic library has memory leaks which are not all resolved [1]. This
> >          is not acceptable in 2nd kernel.
> >
> >       makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
> >       types and addresses. In recent kernel there are dwarf alternatives such
> >       as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
> >       are already packed within vmcore, so we can use it directly.
> >
> >    With these, this patchset introduces makedumpfile extensions, which is based
> >    on btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> >    The following section shows its usage and performance, please note the tests
> >    are performed in 1st kernel.
> >
> >    3) Compile and run makedumpfile extensions:
> >
> >    $ make LINKTYPE=dynamic USELZO=on USESNAPPY=on USEZSTD=on
> >    $ make extensions
> >
> >    $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >      /tmp/extension.out
> >      Loaded extension: ./extensions/amdgpu_filter.so
> >      makedumpfile Completed.
> >          User time (seconds): 6.37
> >          System time (seconds): 0.70
> >          Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.10
> >          Maximum resident set size (kbytes): 38024
> >          ...
> >
> >       To contrast with eppic script of v2 [2]:
> >
> >    $ /usr/bin/time -v ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >      /tmp/eppic.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> >      makedumpfile Completed.
> >          User time (seconds): 8.23
> >          System time (seconds): 0.88
> >          Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
> >          Maximum resident set size (kbytes): 57128
> >          ...
> >
> >    -rw------- 1 root root 367475074 Jan 19 19:01 /tmp/extension.out
> >    -rw------- 1 root root 367475074 Jan 19 19:48 /tmp/eppic.out
> >    -rw------- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
> >
> > C) Discussion:
> >
> >    1) GPU types: Currently only tested with amdgpu's mm page filtering, others
> >       are not tested.
> >    2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
> >       Others are not tested.
> >
> > D) Testing:
> >
> >       If you don't want to create your vmcore, you can find a vmcore which I
> >       created with amdgpu mm pages unfiltered [3], the amdgpu mm pages are
> >       allocated by program [4]. You can use the vmcore in 1st kernel to filter
> >       the amdgpu mm pages by the previous performance testing cmdline. To
> >       verify the pages are filtered in crash:
> >
> >       Unfiltered:
> >       crash> search -c "!QAZXSW@#EDC"
> >       ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >       ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >       crash> rd ffff96b7fa800000
> >       ffff96b7fa800000:  405753585a415121                    !QAZXSW@
> >       crash> rd ffff96b87c800000
> >       ffff96b87c800000:  405753585a415121                    !QAZXSW@
> >
> >       Filtered:
> >       crash> search -c "!QAZXSW@#EDC"
> >       crash> rd ffff96b7fa800000
> >       rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
> >       crash> rd ffff96b87c800000
> >       rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"
> >
> > [1]: https://github.com/lucchouina/eppic/pull/32
> > [2]: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> > [3]: https://people.redhat.com/~ltao/core/vmcore
> > [4]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
> >
> > v3 -> v2:
> >
> > 1) Removed btf/kallsyms support for eppic script, and introduced
> >     makedumpfile .so extension instead. The reason of removing eppic
> >     support is:
> >     a) Native binary code as .so has better performance than scripting,
> >        see the time consumption contrast above.
> >     b) Eppic library has memory leaks which hasn't been fixed totally,
> >        memeory leaks in 2nd kernel might be fatal.
> >
> > 2) Removed the code of manually parsing btf info, and used libbpf for
> >     btf info parsing instead. The reason of removing manually parsing is:
> >     a) Less code modification to makedumpfile, easier to maintain.
> >     b) The performance of using libbpf is as good as manual parsing +
> >        hash table indexing, as well as less memory consumption, see time
> >        and memory consumption contrast above.
> >
> > 3) The patches are organized as follows:
> >
> >      --- <only for test purpose, don't merge> ---
> >      8.Filter amdgpu mm pages
> >      7.Add maple tree support to makedumpfile extension
> >
> >      --- <code should be merged> ---
> >      6.Add page filtering function
> >      5.Add makedumpfile extension support
> >      4.Implement kernel modules' btf resolving
> >      3.Implement kernel modules' kallsyms resolving
> >      2.Implement kernel btf resolving
> >      1.Implement kernel kallsyms resolving
> >
> >      Patch 7 & 8 are customization specific, which can be maintained separately.
> >      Patch 1 ~ 6 are common code which should be integrate with makedumpfile.
> >
> > Link to v2: https://lore.kernel.org/kexec/20251020222410.8235-1-ltao@redhat.com/
> > Link to v1: https://lore.kernel.org/kexec/20250610095743.18073-1-ltao@redhat.com/
> >
> > Tao Liu (8):
> >    Implement kernel kallsyms resolving
> >    Implement kernel btf resolving
> >    Implement kernel modules' kallsyms resolving
> >    Implement kernel modules' btf resolving
> >    Add makedumpfile extension support
> >    Add page filtering function
> >    Add maple tree support to makedumpfile extension
> >    Filter amdgpu mm pages
> >
> >   Makefile                   |   9 +-
> >   btf_info.c                 | 260 +++++++++++++++++++++++++
> >   btf_info.h                 |  66 +++++++
> >   erase_info.c               |  98 ++++++++++
> >   erase_info.h               |  12 ++
> >   extension.c                |  82 ++++++++
> >   extensions/Makefile        |  10 +
> >   extensions/amdgpu_filter.c |  90 +++++++++
> >   extensions/maple_tree.c    | 336 +++++++++++++++++++++++++++++++++
> >   extensions/maple_tree.h    |   6 +
> >   kallsyms.c                 | 376 +++++++++++++++++++++++++++++++++++++
> >   kallsyms.h                 |  20 ++
> >   makedumpfile.c             |  35 +++-
> >   makedumpfile.h             |  11 ++
> >   14 files changed, 1405 insertions(+), 6 deletions(-)
> >   create mode 100644 btf_info.c
> >   create mode 100644 btf_info.h
> >   create mode 100644 extension.c
> >   create mode 100644 extensions/Makefile
> >   create mode 100644 extensions/amdgpu_filter.c
> >   create mode 100644 extensions/maple_tree.c
> >   create mode 100644 extensions/maple_tree.h
> >   create mode 100644 kallsyms.c
> >   create mode 100644 kallsyms.h
>
> Thank you for your the v3 patch. And thanks for the appropriate
> improvements.
>
> I believe the results will be significantly improved by stopping use
> of eppic and using libbpf instead.
>
> My concerns are as follows:
>
> * Performance has improved compared to the version using eppic,
> but I would like to see a comparison of performance without these patches
> applied.

Sure, I will include that performance comparison in v4. I'm currently
working with Stephen on a draft v4. Once we are both OK with it, I
will post the final v4 upstream.

>
> * This mechanism is enabled when .so (shared libraries) are present,
> but I think it would be better to have an option to turn this feature
> on or off. If you don't use shared libraries, you have to delete or move
> them,
>   which can be inconvenient.

I take it you mean that when no .so extensions are needed, the
btf/kallsyms code should not be executed at all. I'm adding an
"--extension" option to makedumpfile. The btf/kallsyms code of
makedumpfile will be initialized only when it is specified.

>
> I'm looking forward to v4, which will improve things like calculation Order.

Sure, no problem. Thanks again for your suggestions!

Thanks,
Tao Liu
>
> Masa



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-02-04  8:40       ` Tao Liu
@ 2026-03-11  0:38         ` Stephen Brennan
  2026-03-11 14:41           ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Brennan @ 2026-03-11  0:38 UTC (permalink / raw)
  To: Tao Liu; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Tao Liu <ltao@redhat.com> writes:
> Hi Stephen,
>
> Sorry I took some time to make modifications on v3 code. Please see a
> drafted v4 [1] for a preview.
>
> I followed your suggestion to let extensions declare the kallsyms
> symbol/btf types it needed, then during kallsyms/btf initialization,
> it will only resolve the declared symbol/types, thus getting rid of
> the hash table. Extensions now don't need to initial needed
> symbols/types by itself.
>
> In addition, users can specify which extensions to load at
> makedumpfile cmdline as:
>
> $ ./makedumpfile -d 31 -l
> /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out --extension
> amdgpu_filter.so
>
> Also "--extension" can be used several times, and amdgpu_filter.so can
> be an absolute path as well. If no extensions are specified, the
> btf/kallsyms of makedumpfile will not initialize.
>
> I don't know if this is what you wanted, or any suggestions for this?
> Thanks in advance!

Hi Tao,

Please accept my apologies for the long delay on this review.  I've gone through
each commit with my feedback. This looks like a really excellent job and I
think it is ready for the review of the maintainers. I've implemented my own
userspace stack extension on top of this support, and I find it to be great for
my use case as well, with one small change (adding flags for detecting whether a
struct or member is found).

Here is some more detailed feedback on a per-commit level; it's all pretty
minor.

d3aee7a ("Reserve sections for makedumpfile and extenions")

* This is just a note, but I've found that the linker script may not be
  necessary. GCC automatically creates the __start_SECTION and __stop_SECTION
  symbols for sections it creates.
* Otherwise, this is exactly what I was thinking!
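
  A minimal sketch of section-based registration without a linker script,
  under some assumptions: an ELF target linked with GNU ld or lld, which (as
  far as I know) define __start_NAME and __stop_NAME only when the section
  name is a valid C identifier, so the sketch uses "init_ksyms" without a
  leading dot. The struct layout and the INIT_KSYM() macro are hypothetical
  stand-ins, not the definitions from the branch. The bounds are declared as
  arrays of pointers because the section stores pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ksym_info struct in the draft v4 branch. */
struct ksym_info {
	const char *name;
	unsigned long long value;
};

/*
 * Define one request and place a pointer to it in the "init_ksyms"
 * section. The section name is a valid C identifier on purpose.
 */
#define INIT_KSYM(sym)						\
	static struct ksym_info ksym_##sym = { #sym, 0 };	\
	static struct ksym_info *ksym_ptr_##sym			\
	__attribute__((section("init_ksyms"), used)) = &ksym_##sym

INIT_KSYM(init_task);
INIT_KSYM(jiffies);

/* Provided by the linker; the section holds pointers, not structs. */
extern struct ksym_info *__start_init_ksyms[];
extern struct ksym_info *__stop_init_ksyms[];

/*
 * Walk every registered request and fill in a placeholder address.
 * A real resolver would look each name up in kallsyms instead.
 * Returns the number of requests visited.
 */
static size_t assign_fake_addresses(unsigned long long base)
{
	struct ksym_info **p;
	size_t n = 0;

	for (p = __start_init_ksyms; p < __stop_init_ksyms; p++) {
		assert(*p != NULL);
		(*p)->value = base + n;
		n++;
	}
	return n;
}
```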

102ae0b ("Implement kernel kallsyms resolving")

* Again, this is looking very close to the design I hoped for, thanks!
* I'm not sure whether is_unwanted_symbol() needs to exist anymore?  Given that
  users opt-in to specific symbols, we don't need to filter out noisy ones that
  would waste memory. What do you think?
* Unfortunately, upstream has made some changes to the vmlinux kallsyms encoding
  in 7.0. You may want to check what we did in drgn to support those changes:
  https://github.com/osandov/drgn/commit/744f36ec3c3f64d7e1323a0037898158698585c4
* In kallsyms.c:

  +/*
  + * Makedumpfile's .init_ksyms section
  +*/
  +extern struct ksym_info __start_init_ksyms[];
  +extern struct ksym_info __stop_init_ksyms[];

  I'm not sure it matters, but the type here is wrong. It should be
  "extern struct ksym_info *" because you're storing pointers, not the actual
  struct. That said, the type isn't used so I don't know that it matters.

4187b33 ("Implement kernel btf resolving")

* The get_ktype_info() function fails if either the struct or member is not
  found. This makes sense in a lot of cases, but there are other cases where we
  will want to use the presence or absence of a struct/struct member to detect
  which version of code to use. For example, my userspace stack code will use
  either maple or rbtree helper code for finding stack VMAs, depending on which
  is available. We can't know until runtime, when we check whether mm_rb or
  mm_mt is present.

  One solution is to handle this at runtime. We can have macros like
  "HAVE_MEMBER(S, M)" and "HAVE_STRUCT(S)", and then each extension can check
  for the members it expects to be present. I think the major downside
  to this approach is that it requires manual effort, which is likely to be
  forgotten when writing extensions.

  Alternatively, we could set a flag in the struct ktype_info if the type is
  "required" (or optional), and only fail get_ktype_info() for required
  structs/members. The concern with this approach is: what if plugin (A)
  requires a type which is not present, but plugin (B) does not? If both are
  loaded, the failure of A would cause B not to run. I'm not sure whether we
  should care about that situation... I don't know if we have a use case for
  using multiple plugins at the same time. Until we do, we probably won't have a
  good idea whether it should be allowed for one to fail, but the other to
  continue.

  I've implemented this second alternative in my branch.
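
  To make the second alternative concrete, here is a rough sketch. It is
  not the code from my branch: the struct ktype_info layout, the "required"
  and "found" fields, and the fake_btf table (with made-up offsets standing
  in for real BTF data) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ktype_info. */
struct ktype_info {
	const char *struct_name;
	const char *member_name;
	bool required;		/* a missing required member is an error */
	bool found;		/* set by the resolver */
	unsigned int offset;	/* valid only when found is true */
};

struct fake_member {
	const char *struct_name;
	const char *member_name;
	unsigned int offset;
};

/* Arbitrary data standing in for the kernel's BTF; offsets are made up. */
static const struct fake_member fake_btf[] = {
	{ "mm_struct", "mm_mt", 64 },
	{ "vm_area_struct", "vm_start", 0 },
};

/* Look one request up in the fake table and record the result. */
static bool lookup_member(struct ktype_info *req)
{
	size_t i;

	for (i = 0; i < sizeof(fake_btf) / sizeof(fake_btf[0]); i++) {
		if (strcmp(fake_btf[i].struct_name, req->struct_name) == 0 &&
		    strcmp(fake_btf[i].member_name, req->member_name) == 0) {
			req->offset = fake_btf[i].offset;
			req->found = true;
			return true;
		}
	}
	req->found = false;
	return false;
}

/* Fail only if a member marked as required could not be resolved. */
static bool resolve_ktypes(struct ktype_info *reqs, size_t n)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!lookup_member(&reqs[i]) && reqs[i].required)
			ok = false;
	}
	return ok;
}
```

  An extension could then test the "found" field of the mm_rb and mm_mt
  requests at runtime to pick the rbtree or maple tree helper.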

* Similar to the previous commit, __start_init_ktypes and __stop_init_ktypes are
  declared as structs but should probably be declared as pointers.

22097b7 ("Implement kernel module's kallsyms resolving")
edfa698 ("Implement kernel module's btf resolving")

* These commits are straightforward:
  Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>

ede22e8 ("Add makedumpfile extensions support")

* nit: the array "handlers" actually contains "handles" (no r) returned from
  dlopen(). The word handlers usually implies a function pointer or something
  like that, called to handle a certain situation. Maybe rename this to
  "handles".
* The run_extensions() function is called within a retry loop in
  create_dumpfile(). It seems possible that this could get called multiple
  times. But many of the global variables in the kallsyms, BTF, and extension
  code are not safe to be cleaned up and reinitialized. In particular, when
  array elements are freed, the lengths, capacities, and pointers are not reset
  to 0/NULL. I think it would be wise to make all the cleanup functions clear
  the globals, so that they may be reinitialized safely.
* Further, I think it might be helpful to split extension loading, running, and
  cleanup. Something like this in create_dumpfile():

        load_extensions(); /* loads everything and calls entry() */
    retry:
        /* ... create bitmap and dump ... */
        if (status == NOSPACE) {
            /* ... */
            goto retry;
        }
        /* ... */
        cleanup_extensions();
        return;

  For two reasons: (1) this avoids unnecessary work re-loading and
  re-initializing the BTF, kallsyms, and extensions. (Though I still think it's
  safer to ensure they can be re-initialized safely.) And (2), this allows for
  future use of extensions in the rest of the dump operation. For my userspace
  stack extension, I plan to add a callback which allows extensions to override
  the decision to filter a page, since my logic can't be easily done via erase
  info. So the extensions need to remain loaded during the creation of the
  dumpfile, and cleaned up after. I have tweaked this in my own patches,
  but I just wanted to share the use case.
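
A minimal sketch of the "cleanup clears the globals" point, assuming a
hypothetical growable array of dlopen() handles. The names handle_vec,
handle_vec_push() and handle_vec_cleanup() are illustrative only and are not
taken from the patches:

```c
#include <stdlib.h>

/* Hypothetical growable array; the series uses its own add_to_arr() helper. */
struct handle_vec {
	void **data;
	size_t len;
	size_t cap;
};

/* A global in the style of the extension code; zero-initialized at start. */
struct handle_vec ext_handles;

/* Append one item, doubling the capacity as needed. Returns 0, or -1 on OOM. */
int handle_vec_push(struct handle_vec *v, void *item)
{
	if (v->len == v->cap) {
		size_t new_cap = v->cap ? v->cap * 2 : 4;
		void **tmp = realloc(v->data, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		v->data = tmp;
		v->cap = new_cap;
	}
	v->data[v->len++] = item;
	return 0;
}

/*
 * Free the storage and reset every field, so that a second cleanup is a
 * no-op and a later push starts again from an empty array.
 */
void handle_vec_cleanup(struct handle_vec *v)
{
	free(v->data);
	v->data = NULL;
	v->len = 0;
	v->cap = 0;
}
```

With the fields reset, calling the cleanup twice or re-initializing after a
"goto retry" does not touch freed memory, whichever of the two structures
above ends up being used.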

c568635 ("btf/kallsyms based makedumpfile extension for mm page filtering")

* This looks good to me!
* I will say that for my userspace stack use case, while the "filter_page(...,
  false)" mechanism for specifying pages that are retained *looks* useful, I
  wouldn't be able to use it because I cannot determine the PFNs for each stack
  VMA. Instead, I have to use the page "mapping" and "index" fields to match the
  VMA and determine whether the PFN falls in a range I care to save. Of course,
  just because *I* won't use it doesn't mean it's not useful :)

2b252ec ("Filter amdgpu mm pages")

* I'm no expert in amdgpu, but the overall approach makes sense to me, and the
  helpers look good.


At a broader level, while the add_to_arr() function is useful, I do think the
dynamic array/vector pattern could be captured with a dedicated data structure.
For instance, drgn has this excellent LGPL-2.1+ header-only vector library:
https://github.com/osandov/drgn/blob/main/libdrgn/vector.h
I don't think it is high priority or must be addressed. Nor do I mean that this
particular implementation choice be used. It's just something to share & think
about.

To provide some context on my comments related to the userspace stack tracing,
here is a branch of mine which is based on yours, that adds my userspace stack
extension and a few tweaks:

https://github.com/brenns10/makedumpfile/commits/stepbren_userstack_v4/

Thank you,
Stephen

> Thanks,
> Tao Liu
>
>
>
>
>
>
>
> [1]: https://github.com/liutgnu/makedumpfile/commits/v4/
>
> On Fri, Jan 23, 2026 at 2:43 AM Tao Liu <ltao@redhat.com> wrote:
>>
>> Hi Stephen,
>>
>> Thanks a lot for your quick reply and detailed information, I really
>> appreciate it!
>>
>> On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
>> <stephen.s.brennan@oracle.com> wrote:
>> >
>> > Hi Tao,
>> >
>> > This series looks really great -- I'm excited to see the switch to
>> > native .so extensions instead of eppic. I've applied the series locally
>> > and I'll rebuild my userspace stack inclusion feature based on it, to
>> > try it out myself.
>>
>> Awesome, looking forward to your feedback on the code/API designs etc...
>>
>> >
>> > In the meantime, I'll share some of my feedback on the patches (though
>> > I'm not a makedumpfile developer). This seems like the most important
>> > patch in terms of design, so I'll start here.
>> >
>> > Tao Liu <ltao@redhat.com> writes:
>> > > This patch will add .so extension support to makedumpfile, similar to crash
>> > > extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
>> > > and "./extensions" are searched for extensions. Once found, kallsyms and btf
>> > > will be initialized so all extensions can benefit from it (Currently makedumpfile
>> > > doesn't use these info, we can move the kallsyms/btf init code elsewhere later
>> > > if makedumpfile needs them).
>> > >
>> > > The makedumpfile extension is to help users to customize mm page filtering upon
>> > > traditional mm page flag filtering, without make code modification on makedumpfile
>> > > itself.
>> > >
>> > > Signed-off-by: Tao Liu <ltao@redhat.com>
>> > > ---
>> > >  Makefile            |  7 +++-
>> > >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
>> > >  extensions/Makefile | 10 ++++++
>> > >  makedumpfile.c      |  4 +++
>> > >  4 files changed, 102 insertions(+), 1 deletion(-)
>> > >  create mode 100644 extension.c
>> > >  create mode 100644 extensions/Makefile
>> > >
>> > > diff --git a/Makefile b/Makefile
>> > > index f3f4da8..7e29220 100644
>> > > --- a/Makefile
>> > > +++ b/Makefile
>> > > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
>> > >  endif
>> > >
>> > >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
>> > > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
>> > > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
>> > >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
>> > >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
>> > >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
>> > > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
>> > >
>> > >  clean:
>> > >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
>> > > +     $(MAKE) -C extensions clean
>> > >
>> > >  install:
>> > >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
>> > > @@ -135,3 +136,7 @@ install:
>> > >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
>> > >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
>> > >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
>> > > +
>> > > +.PHONY: extensions
>> > > +extensions:
>> > > +     $(MAKE) -C extensions CC=$(CC)
>> > > \ No newline at end of file
>> > > diff --git a/extension.c b/extension.c
>> > > new file mode 100644
>> > > index 0000000..6ee7f4e
>> > > --- /dev/null
>> > > +++ b/extension.c
>> > > @@ -0,0 +1,82 @@
>> > > +#include <stdio.h>
>> > > +#include <stdlib.h>
>> > > +#include <string.h>
>> > > +#include <dirent.h>
>> > > +#include <dlfcn.h>
>> > > +#include <stdbool.h>
>> > > +#include "kallsyms.h"
>> > > +#include "btf_info.h"
>> > > +
>> > > +static const char *dirs[] = {
>> > > +     "/usr/lib64/makedumpfile/extensions",
>> > > +     "./extensions",
>> > > +};
>> > > +
>> > > +/* Will only init once */
>> > > +static bool init_kallsyms_btf(void)
>> > > +{
>> > > +     static bool ret = false;
>> > > +     static bool has_inited = false;
>> > > +
>> > > +     if (has_inited)
>> > > +             goto out;
>> > > +     if (!init_kernel_kallsyms())
>> > > +             goto out;
>> > > +     if (!init_kernel_btf())
>> > > +             goto out;
>> > > +     if (!init_module_kallsyms())
>> > > +             goto out;
>> > > +     if (!init_module_btf())
>> > > +             goto out;
>> > > +     ret = true;
>> >
>> > I feel it would be good practice to load as little information as is
>> > necessary for the task. If "amdgpu" module is required, then load kernel
>> > kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
>> > debuginfo is required, then just the kernel would suffice.
>> >
>> > This would reduce memory usage and runtime, though I don't know if it
>> > would show up in profiling. The main benefit could be reliability: by
>> > handling less data, there are fewer chances to hit an error.
>>
>> OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
>> module btf/kallsyms info is a reasonable design. So kernel modules'
>> info can be loaded on demand.
>>
>> >
>> > > +out:
>> > > +     has_inited = true;
>> > > +     return ret;
>> > > +}
>> > > +
>> > > +static void cleanup_kallsyms_btf(void)
>> > > +{
>> > > +     cleanup_kallsyms();
>> > > +     cleanup_btf();
>> > > +}
>> > > +
>> > > +void run_extensions(void)
>> > > +{
>> > > +     DIR *dir;
>> > > +     struct dirent *entry;
>> > > +     size_t len;
>> > > +     int i;
>> > > +     void *handle;
>> > > +     char path[512];
>> > > +
>> > > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
>> > > +             if ((dir = opendir(dirs[i])) != NULL)
>> > > +                     break;
>> > > +     }
>> > > +
>> > > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
>> > > +             /* No extensions found */
>> > > +             return;
>> >
>> > It could be confusing that makedumpfile would behave differently with
>> > the same command-line arguments depending on the presence or absence of
>> > these extensions on the filesystem.
>> >
>> > I think it may fit users' expectations better if they are required to
>> > specify extensions on the command line. Then we could load them by
>> > searching each directory in order. This allows:
>> >
>> > (a) more expected behavior
>> > (b) multiple extensions can exist without all being enabled, thus more
>> >     flexibility
>> > (c) extensions can be present in the local "extensions/" directory, or
>> >     in the system directory
>>
>> Sure, it also sounds reasonable. My original thoughts are, user
>> customization on mm filtering are specified in .so, and if user don't
>> need one .so, e.g. amdgpu mm filtering for a nvidia machine, then he
>> doesn't pack the amdgpu_filter.so into kdump's initramfs. I agree
>> adding extra makedumpfile cmdline option to receive those needed .so
>> is a better design.
>>
>> >
>> > > +     while ((entry = readdir(dir)) != NULL) {
>> > > +             len = strlen(entry->d_name);
>> > > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
>> > > +                     /* Will only init when .so exist */
>> > > +                     if (!init_kallsyms_btf())
>> > > +                             goto out;
>> > > +
>> > > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
>> > > +                     handle = dlopen(path, RTLD_NOW);
>> > > +                     if (!handle) {
>> > > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
>> > > +                                     __func__, path, dlerror());
>> > > +                             continue;
>> > > +                     }
>> > > +                     printf("Loaded extension: %s\n", path);
>> > > +                     dlclose(handle);
>> >
>> > Using the constructor/destructor of the shared object is clever! But we
>> > lose some flexibility: by the time the dlopen() returns, the constructor
>> > has executed and the plugin has thus executed.
>> >
>> > What if we instead use dlsym() to load some symbols from the DSO? In
>> > particular, I think it would be useful if extensions could declare a
>> > list of symbols and a list of structure information which they are
>> > interested in receiving. We could use these lists to know which
>> > kernel/module kallsyms & BTF we should load. We could even load the
>> > information into the local variables of the extension, so the extension
>> > would not need to manually load it.
>> >
>> > Of course this is more complex, but the benefit is:
>> >
>> > 1. Extensions can be written more simply, and would not need to manually
>> > load each symbol & type.
>> > 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
>> > the loading of unnecessary module information. Instead, we'd just
>> > populate the symbol addresses, struct offsets, and type sizes directly
>> > into the local variables which request them.
>>
>> It is a clever idea! Though complex for code, I think it is doable.
>>
>> >
>> > Again, while I don't want to prematurely optimize -- it's good to avoid
>> > loading unnecessary information. I hope I've described my idea well. I
>> > would be happy to work on an implementation of it based on your patches
>> > here, if you're interested.
>>
>> Thanks again for your suggestions! I got your points and I think I can
>> improve the code while waiting for maintainers ideas at the same time.
>> I will let you know when done or encounter blockers if any.
>>
>> Thanks,
>> Tao Liu
>>
>> >
>> > Thanks,
>> > Stephen
>> >
>> > > +             }
>> > > +     }
>> > > +out:
>> > > +     closedir(dir);
>> > > +     cleanup_kallsyms_btf();
>> > > +}
>> > > \ No newline at end of file
>> > > diff --git a/extensions/Makefile b/extensions/Makefile
>> > > new file mode 100644
>> > > index 0000000..afbc61e
>> > > --- /dev/null
>> > > +++ b/extensions/Makefile
>> > > @@ -0,0 +1,10 @@
>> > > +CC ?= gcc
>> > > +CONTRIB_SO :=
>> > > +
>> > > +all: $(CONTRIB_SO)
>> > > +
>> > > +$(CONTRIB_SO): %.so: %.c
>> > > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
>> > > +
>> > > +clean:
>> > > +     rm -f $(CONTRIB_SO)
>> > > diff --git a/makedumpfile.c b/makedumpfile.c
>> > > index dba3628..ca8ed8a 100644
>> > > --- a/makedumpfile.c
>> > > +++ b/makedumpfile.c
>> > > @@ -10847,6 +10847,8 @@ update_dump_level(void)
>> > >       }
>> > >  }
>> > >
>> > > +void run_extensions(void);
>> > > +
>> > >  int
>> > >  create_dumpfile(void)
>> > >  {
>> > > @@ -10884,6 +10886,8 @@ retry:
>> > >       if (info->flag_refiltering)
>> > >               update_dump_level();
>> > >
>> > > +     run_extensions();
>> > > +
>> > >       if ((info->name_filterconfig || info->name_eppic_config)
>> > >                       && !gather_filter_info())
>> > >               return FALSE;
>> > > --
>> > > 2.47.0
>> >



* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-03-11  0:38         ` Stephen Brennan
@ 2026-03-11 14:41           ` Tao Liu
  2026-03-12 22:24             ` Stephen Brennan
  0 siblings, 1 reply; 23+ messages in thread
From: Tao Liu @ 2026-03-11 14:41 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

On Wed, Mar 11, 2026 at 1:38 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Tao Liu <ltao@redhat.com> writes:
> > Hi Stephen,
> >
> > Sorry I took some time to make modifications on v3 code. Please see a
> > drafted v4 [1] for a preview.
> >
> > I followed your suggestion to let extensions declare the kallsyms
> > symbol/btf types it needed, then during kallsyms/btf initialization,
> > it will only resolve the declared symbol/types, thus getting rid of
> > the hash table. Extensions now don't need to initialize the needed
> > symbols/types by themselves.
> >
> > In addition, users can specify which extensions to load at
> > makedumpfile cmdline as:
> >
> > $ ./makedumpfile -d 31 -l
> > /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out --extension
> > amdgpu_filter.so
> >
> > Also "--extension" can be used several times, and amdgpu_filter.so can
> > be an absolute path as well. If no extensions are specified, the
> > btf/kallsyms of makedumpfile will not initialize.
> >
> > I don't know if this is what you wanted, or any suggestions for this?
> > Thanks in advance!
>
> Hi Tao,
>
> Please accept my apologies for the long delay on this review.  I've gone through
> each commit with my feedback. This looks like a really excellent job and I
> think it is ready for the review of the maintainers. I've implemented my own
> userspace stack extension on top of this support, and I find it to be great for
> my use case as well, with one small change (adding flags for detecting whether a
> struct or member is found).

No worries at all :) Thanks a lot for your detailed comments. I'm
looking through them as well as your code branch on GitHub. This may
take a while...

>
> Here is some more detailed feedback on a per-commit level; it's all pretty
> minor.
>
> d3aee7a ("Reserve sections for makedumpfile and extenions")
>
> * This is just a note, but I've found that the linker script may not be
>   necessary. GCC automatically creates the __start_SECTION and __stop_SECTION
>   symbols for sections it creates.

I have tried removing the makedumpfile.ld linker script. The automatic
__start/stop_SECTION symbols are only generated for makedumpfile, i.e.
the main program, but not for the .so extensions. This is a blocker
because I expect every .so file to have the __start/stop_SECTION
symbols so that it can be registered with the main program. I'm not an
expert in GCC's default behaviours, so I would prefer the current
approach of defining the symbols explicitly in the linker script, which
seems more robust...  Perhaps you can share your code for this, and I
will give it a try.

> * Otherwise, this is exactly what I was thinking!
>
> 102ae0b ("Implement kernel kallsyms resolving")
>
> * Again, this is looking very close to the design I hoped for, thanks!
> * I'm not sure whether is_unwanted_symbol() needs to exist anymore?  Given that
>   users opt-in to specific symbols, we don't need to filter out noisy ones that
>   would waste memory. What do you think?

Agreed, I can remove this function.

> * Unfortunately, upstream has made some changes to the vmlinux kallsyms encoding
>   in 7.0. You may want to check what we did in drgn to support those changes:
>   https://github.com/osandov/drgn/commit/744f36ec3c3f64d7e1323a0037898158698585c4
> * In kallsyms.c:

Thanks for the info, I will try this and update the code in v4.

>
>   +/*
>   + * Makedumpfile's .init_ksyms section
>   +*/
>   +extern struct ksym_info __start_init_ksyms[];
>   +extern struct ksym_info __stop_init_ksyms[];
>
>   I'm not sure it matters, but the type here is wrong. It should be
>   "extern struct ksym_info *" because you're storing pointers, not the actual
>   struct. That said, the type isn't used so I don't know that it matters.

Right, this is an error. Thanks for pointing that out.
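
To illustrate the declaration issue, here is a minimal sketch assuming a
GNU toolchain (GCC plus GNU ld). The section name demo_ksyms, the
DEMO_INIT_KSYM() macro and the two-field struct are hypothetical stand-ins,
not the definitions from the series. The section name deliberately has no
leading dot, because GNU ld only provides __start_/__stop_ symbols
automatically for section names that are valid C identifiers:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the series' struct ksym_info. */
struct ksym_info {
	const char *name;
	unsigned long long addr;
};

/*
 * Define a request and store a *pointer* to it in the demo_ksyms section.
 * "used" keeps the compiler from discarding the otherwise unreferenced
 * pointer variable.
 */
#define DEMO_INIT_KSYM(sym)						\
	static struct ksym_info demo_ksym_##sym = { #sym, 0 };		\
	static struct ksym_info *demo_ksym_ptr_##sym			\
	__attribute__((used, section("demo_ksyms"))) = &demo_ksym_##sym

DEMO_INIT_KSYM(init_task);
DEMO_INIT_KSYM(mem_map);

/* The section holds pointers, so its bounds are arrays of pointers. */
extern struct ksym_info *__start_demo_ksyms[];
extern struct ksym_info *__stop_demo_ksyms[];

/* Count the registered requests by walking the section. */
size_t demo_count_ksyms(void)
{
	struct ksym_info **p;
	size_t n = 0;

	for (p = __start_demo_ksyms; p < __stop_demo_ksyms; p++)
		n++;
	return n;
}

/* Return the registered request with the given name, or NULL. */
struct ksym_info *demo_find_ksym(const char *name)
{
	struct ksym_info **p;

	for (p = __start_demo_ksyms; p < __stop_demo_ksyms; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

If the bounds were declared as "struct ksym_info []" instead, any loop over
them would advance by sizeof(struct ksym_info) rather than by the size of a
pointer, so the distinction starts to matter as soon as the symbols are
iterated rather than only cast.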

>
> 4187b33 ("Implement kernel btf resolving")
>
> * The get_ktype_info() function fails if either the struct or member is not
>   found. This makes sense in a lot of cases, but there are other cases where we
>   will want to use the presence or absence of a struct/struct member to detect
>   which version of code to use. For example, my userspace stack code will use
>   either maple or rbtree helper code for finding stack VMAs, depending on which
>   is available. We can't know until runtime, when we check whether mm_rb or
>   mm_mt is present.
>
>   One solution is to handle this at runtime. We can have a macro like
>   "HAVE_MEMBER(S, M)" and "HAVE_STRUCT(S)", and then each extension can check
>   whether the members it expects are present. I think the major downside
>   to this approach is that it requires manual effort, which is likely to be
>   forgotten when writing extensions.
>
>   Alternatively, we could set a flag in the struct ktype_info if the type is
>   "required" (or optional), and only fail get_ktype_info() for required

Agreed, an optional flag looks better.

>   structs/members. The concern with this approach is: what if plugin (A)
>   requires a type which is not present, but plugin (B) does not? If both are
>   loaded, the failure of A would cause B not to run. I'm not sure whether we
>   should care about that situation... I don't know if we have a use case for
>   using multiple plugins at the same time. Until we do, we probably won't have a
>   good idea whether it should be allowed for one to fail, but the other to
>   continue.

Great question! I have thought about this previously. I suggest that
if plugin (A) fails, it should just fail and allow the execution of any
later plugins (B, C, ...) to continue. Each plugin is responsible for
one task, e.g. plugin (A) for amdgpu's mm page filtering, plugin (B)
for Intel's, and plugin (C) for NV's. Plugin (A) will certainly fail
on a machine that has no amdgpu: amdgpu.ko will never have been
loaded, so the related symbols/types are missing. This is expected
and shouldn't block the later plugins.

But the failure should be gentle, unlike a segfault, which would
crash the entire makedumpfile program. Since plugins are currently
native .so libraries, the quality of the code is ensured by each
plugin's author rather than by the makedumpfile maintainers. Ideally
the plugins are well tested in the 1st kernel before they are shipped
in the kdump image, but who knows. From makedumpfile's point of view,
do you think we need to introduce a sandbox to isolate plugins from
makedumpfile? This would prevent serious plugin errors from stopping
makedumpfile from generating the vmcore.
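
A minimal sketch of how the optional flag and the "one plugin fails, the
rest continue" policy could fit together. Everything here is hypothetical:
struct demo_ktype, struct demo_extension and demo_btf_has() are illustrative
stand-ins, not the series' struct ktype_info or its BTF lookup, and the
sketch only covers resolution failures, not crashes inside a plugin:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type request declared by an extension. */
struct demo_ktype {
	const char *struct_name;
	const char *member_name;
	bool required;	/* if missing, the owning extension is disabled */
	bool found;	/* set by the resolver; checked by the extension */
};

/* Hypothetical per-extension bookkeeping. */
struct demo_extension {
	const char *name;
	struct demo_ktype *types;
	size_t ntypes;
	bool enabled;
};

/* Stand-in for a BTF lookup: pretend only mm_struct.mm_mt exists. */
static bool demo_btf_has(const char *s, const char *m)
{
	return strcmp(s, "mm_struct") == 0 && strcmp(m, "mm_mt") == 0;
}

/* Resolve one extension; it stays enabled unless a required type is missing. */
bool demo_resolve(struct demo_extension *ext)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < ext->ntypes; i++) {
		struct demo_ktype *t = &ext->types[i];

		t->found = demo_btf_has(t->struct_name, t->member_name);
		if (!t->found && t->required)
			ok = false;
	}
	ext->enabled = ok;
	return ok;
}

/* Resolve each extension independently; return how many remain enabled. */
size_t demo_resolve_all(struct demo_extension *exts, size_t n)
{
	size_t i, enabled = 0;

	for (i = 0; i < n; i++)
		if (demo_resolve(&exts[i]))
			enabled++;
	return enabled;
}
```

In this shape an amdgpu filter that marks its types as required is simply
skipped on a machine without amdgpu.ko, while an extension that marks both
mm_rb and mm_mt as optional stays enabled and picks its code path from the
"found" flags.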

>
>   I've implemented this second alternative in my branch.
>
> * Similar to the previous commit, __start_init_ktypes and __stop_init_ktypes are
>   declared as structs but should probably be declared as pointers.
>
> 22097b7 ("Implement kernel module's kallsyms resolving")
> edfa698 ("Implement kernel module's btf resolving")
>
> * These commits are straightforward:
>   Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> ede22e8 ("Add makedumpfile extensions support")

I will address those later after reading through your code branch.
Thanks again for your detailed comments!

Thanks,
Tao Liu

>
> * nit: the array "handlers" actually contains "handles" (no r) returned from
>   dlopen(). The word handlers usually implies a function pointer or something
>   like that, called to handle a certain situation. Maybe rename this to
>   "handles".
> * The run_extensions() function is called within a retry loop in
>   create_dumpfile(). It seems possible that this could get called multiple
>   times. But many of the global variables in the kallsyms, BTF, and extension
>   code are not safe to be cleaned up and reinitialized. In particular, when
>   array elements are freed, the lengths, capacities, and pointers are not reset
>   to 0/NULL. I think it would be wise to make all the cleanup functions clear
>   the globals, so that they may be reinitialized safely.
> * Further, I think it might be helpful to split extension loading, running, and
>   cleanup. Something like this in create_dumpfile():
>
>         load_extensions(); /* loads everything and calls entry() */
>     retry:
>         /* ... create bitmap and dump ... */
>         if (status == NOSPACE) {
>             /* ... */
>             goto retry;
>         }
>         /* ... */
>         cleanup_extensions();
>         return;
>
>   For two reasons: (1) this avoids unnecessary work re-loading and
>   re-initializing the BTF, kallsyms, and extensions. (Though I still think it's
>   safer to ensure they can be re-initialized safely.) And (2), this allows for
>   future use of extensions in the rest of the dump operation. For my userspace
>   stack extension, I plan to add a callback which allows extensions to override
>   the decision to filter a page, since my logic can't be easily done via erase
>   info. So the extensions need to remain loaded during the creation of the
>   dumpfile, and cleaned up after. I have tweaked this in my own patches,
>   but I just wanted to share the use case.
>
> c568635 ("btf/kallsyms based makedumpfile extension for mm page filtering")
>
> * This looks good to me!
> * I will say that for my userspace stack use case, while the "filter_page(...,
>   false)" mechanism for specifying pages that are retained *looks* useful, I
>   wouldn't be able to use it because I cannot determine the PFNs for each stack
>   VMA. Instead, I have to use the page "mapping" and "index" fields to match the
>   VMA and determine whether the PFN falls in a range I care to save. Of course,
>   just because *I* won't use it doesn't mean it's not useful :)
>
> 2b252ec ("Filter amdgpu mm pages")
>
> * I'm no expert in amdgpu, but the overall approach makes sense to me, and the
>   helpers look good.
>
>
> At a broader level, while the add_to_arr() function is useful, I do think the
> dynamic array/vector pattern could be captured with a dedicated data structure.
> For instance, drgn has this excellent LGPL-2.1+ header-only vector library:
> https://github.com/osandov/drgn/blob/main/libdrgn/vector.h
> I don't think it is high priority or must be addressed. Nor do I mean that this
> particular implementation choice be used. It's just something to share & think
> about.
>
> To provide some context on my comments related to the userspace stack tracing,
> here is a branch of mine which is based on yours, that adds my userspace stack
> extension and a few tweaks:
>
> https://github.com/brenns10/makedumpfile/commits/stepbren_userstack_v4/
>
> Thank you,
> Stephen
>
> > Thanks,
> > Tao Liu
> >
> >
> >
> >
> >
> >
> >
> > [1]: https://github.com/liutgnu/makedumpfile/commits/v4/
> >
> > On Fri, Jan 23, 2026 at 2:43 AM Tao Liu <ltao@redhat.com> wrote:
> >>
> >> Hi Stephen,
> >>
> >> Thanks a lot for your quick reply and detailed information, I really
> >> appreciate it!
> >>
> >> On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
> >> <stephen.s.brennan@oracle.com> wrote:
> >> >
> >> > Hi Tao,
> >> >
> >> > This series looks really great -- I'm excited to see the switch to
> >> > native .so extensions instead of eppic. I've applied the series locally
> >> > and I'll rebuild my userspace stack inclusion feature based on it, to
> >> > try it out myself.
> >>
> >> Awesome, looking forward to your feedback on the code/API designs etc...
> >>
> >> >
> >> > In the meantime, I'll share some of my feedback on the patches (though
> >> > I'm not a makedumpfile developer). This seems like the most important
> >> > patch in terms of design, so I'll start here.
> >> >
> >> > Tao Liu <ltao@redhat.com> writes:
> >> > > This patch will add .so extension support to makedumpfile, similar to crash
> >> > > extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
> >> > > and "./extensions" are searched for extensions. Once found, kallsyms and btf
> >> > > will be initialized so all extensions can benefit from it (Currently makedumpfile
> >> > > doesn't use these info, we can move the kallsyms/btf init code elsewhere later
> >> > > if makedumpfile needs them).
> >> > >
> >> > > The makedumpfile extension is to help users to customize mm page filtering upon
> >> > > traditional mm page flag filtering, without make code modification on makedumpfile
> >> > > itself.
> >> > >
> >> > > Signed-off-by: Tao Liu <ltao@redhat.com>
> >> > > ---
> >> > >  Makefile            |  7 +++-
> >> > >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
> >> > >  extensions/Makefile | 10 ++++++
> >> > >  makedumpfile.c      |  4 +++
> >> > >  4 files changed, 102 insertions(+), 1 deletion(-)
> >> > >  create mode 100644 extension.c
> >> > >  create mode 100644 extensions/Makefile
> >> > >
> >> > > diff --git a/Makefile b/Makefile
> >> > > index f3f4da8..7e29220 100644
> >> > > --- a/Makefile
> >> > > +++ b/Makefile
> >> > > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
> >> > >  endif
> >> > >
> >> > >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> >> > > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
> >> > > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
> >> > >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
> >> > >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
> >> > >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> >> > > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
> >> > >
> >> > >  clean:
> >> > >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
> >> > > +     $(MAKE) -C extensions clean
> >> > >
> >> > >  install:
> >> > >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
> >> > > @@ -135,3 +136,7 @@ install:
> >> > >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
> >> > >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
> >> > >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
> >> > > +
> >> > > +.PHONY: extensions
> >> > > +extensions:
> >> > > +     $(MAKE) -C extensions CC=$(CC)
> >> > > \ No newline at end of file
> >> > > diff --git a/extension.c b/extension.c
> >> > > new file mode 100644
> >> > > index 0000000..6ee7f4e
> >> > > --- /dev/null
> >> > > +++ b/extension.c
> >> > > @@ -0,0 +1,82 @@
> >> > > +#include <stdio.h>
> >> > > +#include <stdlib.h>
> >> > > +#include <string.h>
> >> > > +#include <dirent.h>
> >> > > +#include <dlfcn.h>
> >> > > +#include <stdbool.h>
> >> > > +#include "kallsyms.h"
> >> > > +#include "btf_info.h"
> >> > > +
> >> > > +static const char *dirs[] = {
> >> > > +     "/usr/lib64/makedumpfile/extensions",
> >> > > +     "./extensions",
> >> > > +};
> >> > > +
> >> > > +/* Will only init once */
> >> > > +static bool init_kallsyms_btf(void)
> >> > > +{
> >> > > +     static bool ret = false;
> >> > > +     static bool has_inited = false;
> >> > > +
> >> > > +     if (has_inited)
> >> > > +             goto out;
> >> > > +     if (!init_kernel_kallsyms())
> >> > > +             goto out;
> >> > > +     if (!init_kernel_btf())
> >> > > +             goto out;
> >> > > +     if (!init_module_kallsyms())
> >> > > +             goto out;
> >> > > +     if (!init_module_btf())
> >> > > +             goto out;
> >> > > +     ret = true;
> >> >
> >> > I feel it would be good practice to load as little information as is
> >> > necessary for the task. If "amdgpu" module is required, then load kernel
> >> > kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
> >> > debuginfo is required, then just the kernel would suffice.
> >> >
> >> > This would reduce memory usage and runtime, though I don't know if it
> >> > would show up in profiling. The main benefit could be reliability: by
> >> > handling less data, there are fewer chances to hit an error.
> >>
> >> OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
> >> module btf/kallsyms info is a reasonable design. So kernel modules'
> >> info can be loaded on demand.
> >>
> >> >
> >> > > +out:
> >> > > +     has_inited = true;
> >> > > +     return ret;
> >> > > +}
> >> > > +
> >> > > +static void cleanup_kallsyms_btf(void)
> >> > > +{
> >> > > +     cleanup_kallsyms();
> >> > > +     cleanup_btf();
> >> > > +}
> >> > > +
> >> > > +void run_extensions(void)
> >> > > +{
> >> > > +     DIR *dir;
> >> > > +     struct dirent *entry;
> >> > > +     size_t len;
> >> > > +     int i;
> >> > > +     void *handle;
> >> > > +     char path[512];
> >> > > +
> >> > > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
> >> > > +             if ((dir = opendir(dirs[i])) != NULL)
> >> > > +                     break;
> >> > > +     }
> >> > > +
> >> > > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
> >> > > +             /* No extensions found */
> >> > > +             return;
> >> >
> >> > It could be confusing that makedumpfile would behave differently with
> >> > the same command-line arguments depending on the presence or absence of
> >> > these extensions on the filesystem.
> >> >
> >> > I think it may fit users' expectations better if they are required to
> >> > specify extensions on the command line. Then we could load them by
> >> > searching each directory in order. This allows:
> >> >
> >> > (a) more expected behavior
> >> > (b) multiple extensions can exist without all being enabled, thus more
> >> >     flexibility
> >> > (c) extensions can be present in the local "extensions/" directory, or
> >> >     in the system directory
> >>
> >> Sure, it also sounds reasonable. My original thoughts are, user
> >> customization on mm filtering are specified in .so, and if user don't
> >> need one .so, e.g. amdgpu mm filtering for a nvidia machine, then he
> >> doesn't pack the amdgpu_filter.so into kdump's initramfs. I agree
> >> adding extra makedumpfile cmdline option to receive those needed .so
> >> is a better design.
> >>
> >> >
> >> > > +     while ((entry = readdir(dir)) != NULL) {
> >> > > +             len = strlen(entry->d_name);
> >> > > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
> >> > > +                     /* Will only init when .so exist */
> >> > > +                     if (!init_kallsyms_btf())
> >> > > +                             goto out;
> >> > > +
> >> > > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
> >> > > +                     handle = dlopen(path, RTLD_NOW);
> >> > > +                     if (!handle) {
> >> > > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
> >> > > +                                     __func__, path, dlerror());
> >> > > +                             continue;
> >> > > +                     }
> >> > > +                     printf("Loaded extension: %s\n", path);
> >> > > +                     dlclose(handle);
> >> >
> >> > Using the constructor/destructor of the shared object is clever! But we
> >> > lose some flexibility: by the time the dlopen() returns, the constructor
> >> > has executed and the plugin has thus executed.
> >> >
> >> > What if we instead use dlsym() to load some symbols from the DSO? In
> >> > particular, I think it would be useful if extensions could declare a
> >> > list of symbols and a list of structure information which they are
> >> > interested in receiving. We could use these lists to know which
> >> > kernel/module kallsyms & BTF we should load. We could even load the
> >> > information into the local variables of the extension, so the extension
> >> > would not need to manually load it.
> >> >
> >> > Of course this is more complex, but the benefit is:
> >> >
> >> > 1. Extensions can be written more simply, and would not need to manually
> >> > load each symbol & type.
> >> > 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
> >> > the loading of unnecessary module information. Instead, we'd just
> >> > populate the symbol addresses, struct offsets, and type sizes directly
> >> > into the local variables which request them.
> >>
> >> It is a clever idea! Though complex for code, I think it is doable.
> >>
> >> >
> >> > Again, while I don't want to prematurely optimize -- it's good to avoid
> >> > loading unnecessary information. I hope I've described my idea well. I
> >> > would be happy to work on an implementation of it based on your patches
> >> > here, if you're interested.
> >>
> >> Thanks again for your suggestions! I got your points, and I think I can
> >> improve the code while waiting for the maintainers' ideas at the same time.
> >> I will let you know when I am done or if I encounter any blockers.
> >>
> >> Thanks,
> >> Tao Liu
> >>
> >> >
> >> > Thanks,
> >> > Stephen
> >> >
> >> > > +             }
> >> > > +     }
> >> > > +out:
> >> > > +     closedir(dir);
> >> > > +     cleanup_kallsyms_btf();
> >> > > +}
> >> > > \ No newline at end of file
> >> > > diff --git a/extensions/Makefile b/extensions/Makefile
> >> > > new file mode 100644
> >> > > index 0000000..afbc61e
> >> > > --- /dev/null
> >> > > +++ b/extensions/Makefile
> >> > > @@ -0,0 +1,10 @@
> >> > > +CC ?= gcc
> >> > > +CONTRIB_SO :=
> >> > > +
> >> > > +all: $(CONTRIB_SO)
> >> > > +
> >> > > +$(CONTRIB_SO): %.so: %.c
> >> > > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
> >> > > +
> >> > > +clean:
> >> > > +     rm -f $(CONTRIB_SO)
> >> > > diff --git a/makedumpfile.c b/makedumpfile.c
> >> > > index dba3628..ca8ed8a 100644
> >> > > --- a/makedumpfile.c
> >> > > +++ b/makedumpfile.c
> >> > > @@ -10847,6 +10847,8 @@ update_dump_level(void)
> >> > >       }
> >> > >  }
> >> > >
> >> > > +void run_extensions(void);
> >> > > +
> >> > >  int
> >> > >  create_dumpfile(void)
> >> > >  {
> >> > > @@ -10884,6 +10886,8 @@ retry:
> >> > >       if (info->flag_refiltering)
> >> > >               update_dump_level();
> >> > >
> >> > > +     run_extensions();
> >> > > +
> >> > >       if ((info->name_filterconfig || info->name_eppic_config)
> >> > >                       && !gather_filter_info())
> >> > >               return FALSE;
> >> > > --
> >> > > 2.47.0
> >> >
>




* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-03-11 14:41           ` Tao Liu
@ 2026-03-12 22:24             ` Stephen Brennan
  2026-03-17 15:31               ` Tao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Brennan @ 2026-03-12 22:24 UTC (permalink / raw)
  To: Tao Liu; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Tao Liu <ltao@redhat.com> writes:

> Hi Stephen,
>
> On Wed, Mar 11, 2026 at 1:38 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Tao Liu <ltao@redhat.com> writes:
>> > Hi Stephen,
>> >
>> > Sorry I took some time to make modifications on v3 code. Please see a
>> > drafted v4 [1] for a preview.
>> >
>> > I followed your suggestion to let extensions declare the kallsyms
>> > symbols/btf types they need; then, during kallsyms/btf initialization,
>> > only the declared symbols/types are resolved, thus getting rid of
>> > the hash table. Extensions no longer need to initialize the needed
>> > symbols/types themselves.
>> >
>> > In addition, users can specify which extensions to load at
>> > makedumpfile cmdline as:
>> >
>> > $ ./makedumpfile -d 31 -l
>> > /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out --extension
>> > amdgpu_filter.so
>> >
>> > Also "--extension" can be used several times, and amdgpu_filter.so can
>> > be an absolute path as well. If no extensions are specified, the
>> > btf/kallsyms of makedumpfile will not initialize.
>> >
>> > I don't know if this is what you wanted. Do you have any suggestions?
>> > Thanks in advance!
>>
>> Hi Tao,
>>
>> Please accept my apologies for the long delay on this review.  I've gone through
>> each commit with my feedback. This looks like a really excellent job and I
>> think it is ready for the review of the maintainers. I've implemented my own
>> userspace stack extension on top of this support, and I find it to be great for
>> my use case as well, with one small change (adding flags for detecting whether a
>> struct or member is found).
>
> No worries at all :) Thanks a lot for your detailed comments. I'm
> looking through them as well as your code branch in github. This may
> take a while...
>
>>
>> Here is some more detailed feedback on a per-commit level; it's all pretty
>> minor.
>>
>> d3aee7a ("Reserve sections for makedumpfile and extenions")
>>
>> * This is just a note, but I've found that the linker script may not be
>>   necessary. GCC automatically creates the __start_SECTION and __stop_SECTION
>>   symbols for sections it creates.
>
> I have tried removing the makedumpfile.ld linker script. The
> automatic __start/stop_SECTION symbols are only generated for
> makedumpfile, i.e. the main program, but not for the .so extensions.
> This is a blocker, because I expect every .so file to have the
> __start/stop_SECTION symbols so that it can be registered with the
> main program. I'm no expert in GCC's default behaviour, so I would
> prefer the current approach of defining the symbols explicitly in the
> linker script, which seems more robust... Perhaps you can share your
> code for this, and I will give it a try.

The key to having the toolchain generate the symbols for the sections
(it is actually the linker that provides them) is that the section names
must themselves be valid C identifiers (i.e. they must not start with a
'.'). Here's a patch to give the idea:

From 3618f51869e978328e49f6061eae94469bca43c5 Mon Sep 17 00:00:00 2001
From: Stephen Brennan <stephen.s.brennan@oracle.com>
Date: Wed, 11 Mar 2026 09:17:05 -0700
Subject: [PATCH] Use automatically generated section start symbols

GCC will create __start_SECTION and __stop_SECTION only if the section
names are valid C identifiers. This means we cannot use a leading dot.
Drop the leading dot from the section names so we can avoid defining a
custom linker script.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 Makefile            |  2 +-
 btf_info.h          |  4 ++--
 extensions/Makefile |  2 +-
 kallsyms.h          |  2 +-
 makedumpfile.ld     | 15 ---------------
 5 files changed, 5 insertions(+), 20 deletions(-)
 delete mode 100644 makedumpfile.ld

diff --git a/Makefile b/Makefile
index 8196aa6..e2cde8f 100644
--- a/Makefile
+++ b/Makefile
@@ -113,7 +113,7 @@ $(OBJ_ARCH): $(SRC_ARCH)
 	$(CC) $(CFLAGS_ARCH) -c -o ./$@ $(VPATH)$(@:.o=.c)
 
 makedumpfile: $(SRC_BASE) $(OBJ_PART) $(OBJ_ARCH)
-	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -Wl,-T,makedumpfile.ld -o $@ $< $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -o $@ $< $(LIBS)
 	@sed -e "s/@DATE@/$(DATE)/" \
 	     -e "s/@VERSION@/$(VERSION)/" \
 	     $(VPATH)makedumpfile.8.in > $(VPATH)makedumpfile.8
diff --git a/btf_info.h b/btf_info.h
index 2a78fbf..555f21e 100644
--- a/btf_info.h
+++ b/btf_info.h
@@ -29,7 +29,7 @@ void cleanup_btf(void);
 	struct ktype_info _##MOD##_##S##_##M = {		\
 		QUATE(MOD), QUATE(S), QUATE(M), 0, 0, 0, 0, R, R\
 	};							\
-	__attribute__((section(".init_ktypes"), used))		\
+	__attribute__((section("init_ktypes"), used))		\
 	struct ktype_info * _ptr_##MOD##_##S##_##M = &_##MOD##_##S##_##M
 
 #define INIT_MOD_STRUCT_MEMBER(MOD, S, M)			\
@@ -60,7 +60,7 @@ void cleanup_btf(void);
 	struct ktype_info _##MOD##_##S = {			\
 		QUATE(MOD), QUATE(S), 0, 0, 0, 0, 0, R, 0	\
 	};							\
-	__attribute__((section(".init_ktypes"), used))		\
+	__attribute__((section("init_ktypes"), used))		\
 	struct ktype_info * _ptr_##MOD##_##S = &_##MOD##_##S
 
 #define INIT_MOD_STRUCT(MOD, S) INIT_MOD_STRUCT_RQD(MOD, S, 1)
diff --git a/extensions/Makefile b/extensions/Makefile
index fc28577..a4dc86a 100644
--- a/extensions/Makefile
+++ b/extensions/Makefile
@@ -8,7 +8,7 @@ amdgpu_filter.so: maple_tree.c
 userstack.so: maple_tree.c vma_rbtree.c
 
 $(CONTRIB_SO): %.so: %.c
-	$(CC) -O2 -g -fPIC -shared -Wl,-T,../makedumpfile.ld -o $@ $^
+	$(CC) -O2 -g -fPIC -shared -o $@ $^
 
 clean:
 	rm -f $(CONTRIB_SO)
diff --git a/kallsyms.h b/kallsyms.h
index d94830e..1989dfe 100644
--- a/kallsyms.h
+++ b/kallsyms.h
@@ -17,7 +17,7 @@ struct ksym_info {
 	struct ksym_info _##MOD##_##SYM = {		\
 		QUATE(MOD), QUATE(SYM), 0		\
 	};						\
-	__attribute__((section(".init_ksyms"), used))	\
+	__attribute__((section("init_ksyms"), used))	\
 	struct ksym_info * _ptr_##MOD##_##SYM = &_##MOD##_##SYM
 
 #define GET_MOD_SYM(MOD, SYM) (_##MOD##_##SYM.value)
diff --git a/makedumpfile.ld b/makedumpfile.ld
deleted file mode 100644
index 231a162..0000000
--- a/makedumpfile.ld
+++ /dev/null
@@ -1,15 +0,0 @@
-SECTIONS
-{
-	.init_ksyms ALIGN(8) : {
-		__start_init_ksyms = .;
-		KEEP(*(.init_ksyms*))
-		__stop_init_ksyms = .;
-	}
-
-	.init_ktypes ALIGN(8) : {
-		__start_init_ktypes = .;
-		KEEP(*(.init_ktypes*))
-		__stop_init_ktypes = .;
-	}
-}
-INSERT AFTER .data;
\ No newline at end of file
-- 
2.47.3
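
For illustration, here is a minimal standalone sketch of the same
mechanism, separate from the patch above. It assumes an ELF target
linked with GNU ld or a compatible linker. struct demo_sym, DEMO_SYM()
and the "demo_syms" section are made-up names for this example, not the
ones from kallsyms.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type for this sketch (not struct ksym_info). */
struct demo_sym {
	const char *mod;
	const char *name;
	unsigned long value;
};

/*
 * Define a record and store a pointer to it in the "demo_syms" section.
 * The section name has no leading dot, so it is a valid C identifier.
 */
#define DEMO_SYM(MOD, SYM)						\
	static struct demo_sym _##MOD##_##SYM = { #MOD, #SYM, 0 };	\
	static struct demo_sym *_ptr_##MOD##_##SYM			\
	__attribute__((section("demo_syms"), used)) = &_##MOD##_##SYM

DEMO_SYM(vmlinux, init_task);
DEMO_SYM(vmlinux, jiffies);

/* Provided by the linker. The section stores pointers, hence the '*'. */
extern struct demo_sym *__start_demo_syms[];
extern struct demo_sym *__stop_demo_syms[];

/* Number of pointers the linker collected into the section. */
size_t demo_sym_count(void)
{
	return (size_t)(__stop_demo_syms - __start_demo_syms);
}

/* Walk the section and return the record with the given name, or NULL. */
struct demo_sym *demo_sym_find(const char *name)
{
	struct demo_sym **p;

	assert(name != NULL);
	for (p = __start_demo_syms; p < __stop_demo_syms; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

Note the extern declarations use "struct demo_sym *" elements, matching
what is actually stored. This sketch only covers a single executable;
how these symbols resolve between a main program and its .so files
depends on their visibility, so that case is worth checking separately.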



>> * Otherwise, this is exactly what I was thinking!
>>
>> 102ae0b ("Implement kernel kallsyms resolving")
>>
>> * Again, this is looking very close to the design I hoped for, thanks!
>> * I'm not sure whether is_unwanted_symbol() needs to exist anymore?  Given that
>>   users opt-in to specific symbols, we don't need to filter out noisy ones that
>>   would waste memory. What do you think?
>
> Agreed, I can remove this function.
>
>> * Unfortunately, upstream has made some changes to the vmlinux kallsyms encoding
>>   in 7.0. You may want to check what we did in drgn to support those changes:
>>   https://github.com/osandov/drgn/commit/744f36ec3c3f64d7e1323a0037898158698585c4
>> * In kallsyms.c:
>
> Thanks for the info, I will try this and update the code in v4.
>
>>
>>   +/*
>>   + * Makedumpfile's .init_ksyms section
>>   +*/
>>   +extern struct ksym_info __start_init_ksyms[];
>>   +extern struct ksym_info __stop_init_ksyms[];
>>
>>   I'm not sure it matters, but the type here is wrong. It should be
>>   "extern struct ksym_info *" because you're storing pointers, not the actual
>>   struct. That said, the type isn't used so I don't know that it matters.
>
> Right, this is an error. Thanks for pointing that out.
>
>>
>> 4187b33 ("Implement kernel btf resolving")
>>
>> * The get_ktype_info() function fails if either the struct or member is not
>>   found. This makes sense in a lot of cases, but there are other cases where we
>>   will want to use the presence or absence of a struct/struct member to detect
>>   which version of code to use. For example, my userspace stack code will use
>>   either maple or rbtree helper code for finding stack VMAs, depending on which
>>   is available. We can't know until runtime, when we check whether mm_rb or
>>   mm_mt is present.
>>
>>   One solution is to handle this at runtime. We can have a macro like
>>   "HAVE_MEMBER(S, M)" and "HAVE_STRUCT(S)", and then each extension can check
>>   whether the members it expects are present. I think the major downside
>>   to this approach is that it requires manual effort, which is likely to be
>>   forgotten when writing extensions.
>>
>>   Alternatively, we could set a flag in the struct ktype_info if the type is
>>   "required" (or optional), and only fail get_ktype_info() for required
>
> Agreed, an optional flag looks better.
>
>>   structs/members. The concern with this approach is: what if plugin (A)
>>   requires a type which is not present, but plugin (B) does not? If both are
>>   loaded, the failure of A would cause B not to run. I'm not sure whether we
>>   should care about that situation... I don't know if we have a use case for
>>   using multiple plugins at the same time. Until we do, we probably won't have a
>>   good idea whether it should be allowed for one to fail, but the other to
>>   continue.
>
> Great question! I have thought about this previously. I suggest that
> if plugin (A) fails, it should just fail and allow the execution of any
> later plugins (B, C, ...) to continue. Each plugin is responsible for
> one task, e.g. plugin (A) for dealing with amdgpu's mm page
> filtering, plugin (B) for Intel's and plugin (C) for NV's. Plugin (A)
> will certainly fail on a machine that has no amdgpu: amdgpu.ko is
> never loaded, so the related symbols/types are missing. This is
> expected and shouldn't block the later plugins.
>
> But the failure should be gentle, unlike a segfault, which would
> crash the entire makedumpfile program.

Totally agreed up to here! I think it makes sense for plugin failures to
be independent.

It seems like the BTF loading process should not fail if any type or
member is not found. But once we've tried to load everything, we can
then go through each plugin's list, and if we find a "required" type
or member which is not present, we log that failure and don't run the
plugin. This shouldn't be much work, because we already have all the
types in an array for each plugin.
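
To make the two-phase idea concrete, a rough sketch follows. The names
struct demo_ktype and demo_plugin_runnable() are hypothetical, and the
"required"/"found" fields are only a simplification of whatever struct
ktype_info ends up carrying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct ktype_info. */
struct demo_ktype {
	const char *struct_name;
	const char *member_name;	/* NULL if only the struct is requested */
	bool required;			/* declared by the plugin author */
	bool found;			/* filled in by the BTF loader */
};

/*
 * Called after BTF loading has been attempted for everything.
 * Returns true if every required entry of this plugin was resolved.
 * Missing optional entries are not an error here; the plugin is
 * expected to check those itself.
 */
bool demo_plugin_runnable(const char *plugin,
			  const struct demo_ktype *types, size_t n)
{
	bool ok = true;
	size_t i;

	assert(types != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (types[i].found || !types[i].required)
			continue;
		/* Log every missing required entry, not just the first. */
		fprintf(stderr, "%s: required type %s%s%s not found\n",
			plugin, types[i].struct_name,
			types[i].member_name ? "." : "",
			types[i].member_name ? types[i].member_name : "");
		ok = false;
	}
	return ok;
}
```

The loader would then skip only the plugins for which this returns
false, so one plugin's missing types never prevent another from running.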

> Since plugins are currently
> native .so libraries, code quality is the responsibility of each
> plugin's author rather than the makedumpfile maintainers. Ideally the
> plugins are well tested in the 1st kernel before they are shipped in
> the kdump image, but who knows. From makedumpfile's point of view, do
> you think we need to introduce a sandbox to isolate plugins from
> makedumpfile? This would prevent serious plugin errors from stopping
> makedumpfile from generating the vmcore.

Personally, I don't believe we need a sandbox. The plugins should be
written to handle the clear error cases:

- Optional types or symbols may not be present, so validate them.
- Values read from /proc/vmcore may not be valid, or the read may fail,
  so check all return values.
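
As a sketch of what I mean by those two checks (all names here are
hypothetical: demo_read_fn stands in for whatever read helper the plugin
uses, and demo_fake_read() is just an in-memory reader so the example
is self-contained):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader signature: returns true only if len bytes were read. */
typedef bool (*demo_read_fn)(uint64_t addr, void *buf, size_t len);

/* Hypothetical resolved struct member: offset is valid only if found. */
struct demo_member {
	bool found;
	uint64_t offset;
};

/* Fake "vmcore": 64 bytes of memory mapped at DEMO_BASE. */
#define DEMO_BASE 0x1000ULL
static uint8_t demo_mem[64];

bool demo_fake_read(uint64_t addr, void *buf, size_t len)
{
	/* Reject anything that is not fully inside the fake region. */
	if (addr < DEMO_BASE || len > sizeof(demo_mem) ||
	    addr - DEMO_BASE > sizeof(demo_mem) - len)
		return false;
	memcpy(buf, demo_mem + (addr - DEMO_BASE), len);
	return true;
}

/*
 * Read a u64 member of the object at 'base'. Fails cleanly, without
 * touching *out's meaning for the caller, if the optional member was
 * not resolved, the base pointer is NULL, or the read itself fails.
 */
bool demo_read_member_u64(demo_read_fn rd, uint64_t base,
			  const struct demo_member *m, uint64_t *out)
{
	assert(rd != NULL && m != NULL && out != NULL);
	if (!m->found)		/* optional member absent on this kernel */
		return false;
	if (base == 0)		/* NULL pointer read from the dump */
		return false;
	return rd(base + m->offset, out, sizeof(*out));
}
```

A plugin written this way just returns an error to makedumpfile when
something is missing or unreadable, instead of dereferencing garbage.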

That said, I'm not set against it, so if you believe it's necessary I
could be convinced :)

Thanks again,
Stephen

>>
>>   I've implemented this second alternative in my branch.
>>
>> * Similar to the previous commit, __start_init_ktypes and __stop_init_ktypes are
>>   declared as structs but should probably be declared as pointers.
>>
>> 22097b7 ("Implement kernel module's kallsyms resolving")
>> edfa698 ("Implement kernel module's btf resolving")
>>
>> * These commits are straightforward:
>>   Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>>
>> ede22e8 ("Add makedumpfile extensions support")
>
> I will address those later after reading through your code branch.
> Thanks again for your detailed comments!
>
> Thanks,
> Tao Liu
>
>>
>> * nit: the array "handlers" actually contains "handles" (no r) returned from
>>   dlopen(). The word handlers usually implies a function pointer or something
>>   like that, called to handle a certain situation. Maybe rename this to
>>   "handles".
>> * The run_extensions() function is called within a retry loop in
>>   create_dumpfile(). It seems possible that this could get called multiple
>>   times. But many of the global variables in the kallsyms, BTF, and extension
>>   code are not safe to be cleaned up and reinitialized. In particular, when
>>   array elements are freed, the lengths, capacities, and pointers are not reset
>>   to 0/NULL. I think it would be wise to make all the cleanup functions clear
>>   the globals, so that they may be reinitialized safely.
>> * Further, I think it might be helpful to split extension loading, running, and
>>   cleanup. Something like this in create_dumpfile():
>>
>>         load_extensions(); /* loads everything and calls entry() */
>>     retry:
>>         /* ... create bitmap and dump ... */
>>         if (status == NOSPACE) {
>>             /* ... */
>>             goto retry;
>>         }
>>         /* ... */
>>         cleanup_extensions();
>>         return;
>>
>>   For two reasons: (1) this avoids unnecessary work re-loading and
>>   re-initializing the BTF, kallsyms, and extensions. (Though I still think it's
>>   safer to ensure they can be re-initialized safely.) And (2), this allows for
>>   future use of extensions in the rest of the dump operation. For my userspace
>>   stack extension, I plan to add a callback which allows extensions to override
>>   the decision to filter a page, since my logic can't be easily done via erase
>>   info. So the extensions need to remain loaded during the creation of the
>>   dumpfile, and cleaned up after. I have tweaked this in my own patches,
>>   but I just wanted to share the use case.
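
A small sketch of the "safe to reinitialize" point above, assuming a
simple growable array of handles. struct demo_arr, demo_arr_push() and
demo_arr_cleanup() are hypothetical names, not the add_to_arr() code
from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array, e.g. for dlopen() handles. */
struct demo_arr {
	void **items;
	size_t len;
	size_t cap;
};

/* Hypothetical global, zero-initialized like any static object. */
struct demo_arr demo_handles;

/* Append one item, growing the array as needed. Returns 0 or -1. */
int demo_arr_push(struct demo_arr *a, void *item)
{
	assert(a != NULL);
	if (a->len == a->cap) {
		size_t ncap = a->cap ? a->cap * 2 : 4;
		void **n = realloc(a->items, ncap * sizeof(*n));

		if (!n)
			return -1;
		a->items = n;
		a->cap = ncap;
	}
	a->items[a->len++] = item;
	return 0;
}

/*
 * Free the storage and reset every field, so that a second cleanup is
 * harmless and a later demo_arr_push() starts again from a clean state.
 */
void demo_arr_cleanup(struct demo_arr *a)
{
	assert(a != NULL);
	free(a->items);
	a->items = NULL;
	a->len = 0;
	a->cap = 0;
}
```

With the fields reset, it no longer matters whether cleanup and init
run once or once per pass through the retry loop.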
>>
>> c568635 ("btf/kallsyms based makedumpfile extension for mm page filtering")
>>
>> * This looks good to me!
>> * I will say that for my userspace stack use case, while the "filter_page(...,
>>   false)" mechanism for specifying pages that are retained *looks* useful, I
>>   wouldn't be able to use it because I cannot determine the PFNs for each stack
>>   VMA. Instead, I have to use the page "mapping" and "index" fields to match the
>>   VMA and determine whether the PFN falls in a range I care to save. Of course,
>>   just because *I* won't use it doesn't mean it's not useful :)
>>
>> 2b252ec ("Filter amdgpu mm pages")
>>
>> * I'm no expert in amdgpu, but the overall approach makes sense to me, and the
>>   helpers look good.
>>
>>
>> At a broader level, while the add_to_arr() function is useful, I do think the
>> dynamic array/vector pattern could be captured with a dedicated data structure.
>> For instance, drgn has this excellent LGPL-2.1+ header-only vector library:
>> https://github.com/osandov/drgn/blob/main/libdrgn/vector.h
>> I don't think it is high priority or must be addressed. Nor do I mean that this
>> particular implementation choice be used. It's just something to share & think
>> about.
>>
>> To provide some context on my comments related to the userspace stack tracing,
>> here is a branch of mine which is based on yours, that adds my userspace stack
>> extension and a few tweaks:
>>
>> https://github.com/brenns10/makedumpfile/commits/stepbren_userstack_v4/
>>
>> Thank you,
>> Stephen
>>
>> > Thanks,
>> > Tao Liu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > [1]: https://github.com/liutgnu/makedumpfile/commits/v4/
>> >
>> > On Fri, Jan 23, 2026 at 2:43 AM Tao Liu <ltao@redhat.com> wrote:
>> >>
>> >> Hi Stephen,
>> >>
>> >> Thanks a lot for your quick reply and detailed information, I really
>> >> appreciate it!
>> >>
>> >> On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
>> >> <stephen.s.brennan@oracle.com> wrote:
>> >> >
>> >> > Hi Tao,
>> >> >
>> >> > This series looks really great -- I'm excited to see the switch to
>> >> > native .so extensions instead of eppic. I've applied the series locally
>> >> > and I'll rebuild my userspace stack inclusion feature based on it, to
>> >> > try it out myself.
>> >>
>> >> Awesome, looking forward to your feedback on the code/API designs etc...
>> >>
>> >> >
>> >> > In the meantime, I'll share some of my feedback on the patches (though
>> >> > I'm not a makedumpfile developer). This seems like the most important
>> >> > patch in terms of design, so I'll start here.
>> >> >
>> >> > Tao Liu <ltao@redhat.com> writes:
>> >> > > This patch will add .so extension support to makedumpfile, similar to crash
>> >> > > extension to crash utility. Currently only "/usr/lib64/makedumpfile/extensions"
>> >> > > and "./extensions" are searched for extensions. Once found, kallsyms and btf
>> >> > > will be initialized so all extensions can benefit from it (Currently makedumpfile
>> >> > > doesn't use this info; we can move the kallsyms/btf init code elsewhere later
>> >> > > if makedumpfile needs it).
>> >> > >
>> >> > > The makedumpfile extension is to help users customize mm page filtering on top of
>> >> > > traditional mm page flag filtering, without modifying the code of makedumpfile
>> >> > > itself.
>> >> > >
>> >> > > Signed-off-by: Tao Liu <ltao@redhat.com>
>> >> > > ---
>> >> > >  Makefile            |  7 +++-
>> >> > >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
>> >> > >  extensions/Makefile | 10 ++++++
>> >> > >  makedumpfile.c      |  4 +++
>> >> > >  4 files changed, 102 insertions(+), 1 deletion(-)
>> >> > >  create mode 100644 extension.c
>> >> > >  create mode 100644 extensions/Makefile
>> >> > >
>> >> > > diff --git a/Makefile b/Makefile
>> >> > > index f3f4da8..7e29220 100644
>> >> > > --- a/Makefile
>> >> > > +++ b/Makefile
>> >> > > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
>> >> > >  endif
>> >> > >
>> >> > >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
>> >> > > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
>> >> > > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
>> >> > >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
>> >> > >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
>> >> > >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
>> >> > > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
>> >> > >
>> >> > >  clean:
>> >> > >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
>> >> > > +     $(MAKE) -C extensions clean
>> >> > >
>> >> > >  install:
>> >> > >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
>> >> > > @@ -135,3 +136,7 @@ install:
>> >> > >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
>> >> > >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
>> >> > >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
>> >> > > +
>> >> > > +.PHONY: extensions
>> >> > > +extensions:
>> >> > > +     $(MAKE) -C extensions CC=$(CC)
>> >> > > \ No newline at end of file
>> >> > > diff --git a/extension.c b/extension.c
>> >> > > new file mode 100644
>> >> > > index 0000000..6ee7f4e
>> >> > > --- /dev/null
>> >> > > +++ b/extension.c
>> >> > > @@ -0,0 +1,82 @@
>> >> > > +#include <stdio.h>
>> >> > > +#include <stdlib.h>
>> >> > > +#include <string.h>
>> >> > > +#include <dirent.h>
>> >> > > +#include <dlfcn.h>
>> >> > > +#include <stdbool.h>
>> >> > > +#include "kallsyms.h"
>> >> > > +#include "btf_info.h"
>> >> > > +
>> >> > > +static const char *dirs[] = {
>> >> > > +     "/usr/lib64/makedumpfile/extensions",
>> >> > > +     "./extensions",
>> >> > > +};
>> >> > > +
>> >> > > +/* Will only init once */
>> >> > > +static bool init_kallsyms_btf(void)
>> >> > > +{
>> >> > > +     static bool ret = false;
>> >> > > +     static bool has_inited = false;
>> >> > > +
>> >> > > +     if (has_inited)
>> >> > > +             goto out;
>> >> > > +     if (!init_kernel_kallsyms())
>> >> > > +             goto out;
>> >> > > +     if (!init_kernel_btf())
>> >> > > +             goto out;
>> >> > > +     if (!init_module_kallsyms())
>> >> > > +             goto out;
>> >> > > +     if (!init_module_btf())
>> >> > > +             goto out;
>> >> > > +     ret = true;
>> >> >
>> >> > I feel it would be good practice to load as little information as is
>> >> > necessary for the task. If "amdgpu" module is required, then load kernel
>> >> > kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
>> >> > debuginfo is required, then just the kernel would suffice.
>> >> >
>> >> > This would reduce memory usage and runtime, though I don't know if it
>> >> > would show up in profiling. The main benefit could be reliability: by
>> >> > handling less data, there are fewer chances to hit an error.
>> >>
>> >> OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
>> >> module btf/kallsyms info is a reasonable design. So kernel modules'
>> >> info can be loaded on demand.
>> >>
>> >> >
>> >> > > +out:
>> >> > > +     has_inited = true;
>> >> > > +     return ret;
>> >> > > +}
>> >> > > +
>> >> > > +static void cleanup_kallsyms_btf(void)
>> >> > > +{
>> >> > > +     cleanup_kallsyms();
>> >> > > +     cleanup_btf();
>> >> > > +}
>> >> > > +
>> >> > > +void run_extensions(void)
>> >> > > +{
>> >> > > +     DIR *dir;
>> >> > > +     struct dirent *entry;
>> >> > > +     size_t len;
>> >> > > +     int i;
>> >> > > +     void *handle;
>> >> > > +     char path[512];
>> >> > > +
>> >> > > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
>> >> > > +             if ((dir = opendir(dirs[i])) != NULL)
>> >> > > +                     break;
>> >> > > +     }
>> >> > > +
>> >> > > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
>> >> > > +             /* No extensions found */
>> >> > > +             return;
>> >> >
>> >> > It could be confusing that makedumpfile would behave differently with
>> >> > the same command-line arguments depending on the presence or absence of
>> >> > these extensions on the filesystem.
>> >> >
>> >> > I think it may fit users' expectations better if they are required to
>> >> > specify extensions on the command line. Then we could load them by
>> >> > searching each directory in order. This allows:
>> >> >
>> >> > (a) more expected behavior
>> >> > (b) multiple extensions can exist without all being enabled, thus more
>> >> >     flexibility
>> >> > (c) extensions can be present in the local "extensions/" directory, or
>> >> >     in the system directory
>> >>
>> >> Sure, that also sounds reasonable. My original thought was that user
>> >> customization of mm filtering is specified in a .so, and if the user
>> >> doesn't need a given .so, e.g. amdgpu mm filtering on an nvidia machine,
>> >> they simply don't pack amdgpu_filter.so into kdump's initramfs. I agree
>> >> that adding an extra makedumpfile cmdline option to name the needed .so
>> >> files is a better design.
>> >>
>> >> >
>> >> > > +     while ((entry = readdir(dir)) != NULL) {
>> >> > > +             len = strlen(entry->d_name);
>> >> > > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
>> >> > > +                     /* Will only init when .so exist */
>> >> > > +                     if (!init_kallsyms_btf())
>> >> > > +                             goto out;
>> >> > > +
>> >> > > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
>> >> > > +                     handle = dlopen(path, RTLD_NOW);
>> >> > > +                     if (!handle) {
>> >> > > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
>> >> > > +                                     __func__, path, dlerror());
>> >> > > +                             continue;
>> >> > > +                     }
>> >> > > +                     printf("Loaded extension: %s\n", path);
>> >> > > +                     dlclose(handle);
>> >> >
>> >> > Using the constructor/destructor of the shared object is clever! But we
>> >> > lose some flexibility: by the time the dlopen() returns, the constructor
>> >> > has executed and the plugin has thus executed.
>> >> >
>> >> > What if we instead use dlsym() to load some symbols from the DSO? In
>> >> > particular, I think it would be useful if extensions could declare a
>> >> > list of symbols and a list of structure information which they are
>> >> > interested in receiving. We could use these lists to know which
>> >> > kernel/module kallsyms & BTF we should load. We could even load the
>> >> > information into the local variables of the extension, so the extension
>> >> > would not need to manually load it.
>> >> >
>> >> > Of course this is more complex, but the benefit is:
>> >> >
>> >> > 1. Extensions can be written more simply, and would not need to manually
>> >> > load each symbol & type.
>> >> > 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
>> >> > the loading of unnecessary module information. Instead, we'd just
>> >> > populate the symbol addresses, struct offsets, and type sizes directly
>> >> > into the local variables which request them.
>> >>
>> >> It is a clever idea! Though complex for code, I think it is doable.
>> >>
>> >> >
>> >> > Again, while I don't want to prematurely optimize -- it's good to avoid
>> >> > loading unnecessary information. I hope I've described my idea well. I
>> >> > would be happy to work on an implementation of it based on your patches
>> >> > here, if you're interested.
>> >>
>> >> Thanks again for your suggestions! I got your points, and I think I can
>> >> improve the code while waiting for the maintainers' ideas at the same time.
>> >> I will let you know when I am done or if I encounter any blockers.
>> >>
>> >> Thanks,
>> >> Tao Liu
>> >>
>> >> >
>> >> > Thanks,
>> >> > Stephen
>> >> >
>> >> > > +             }
>> >> > > +     }
>> >> > > +out:
>> >> > > +     closedir(dir);
>> >> > > +     cleanup_kallsyms_btf();
>> >> > > +}
>> >> > > \ No newline at end of file
>> >> > > diff --git a/extensions/Makefile b/extensions/Makefile
>> >> > > new file mode 100644
>> >> > > index 0000000..afbc61e
>> >> > > --- /dev/null
>> >> > > +++ b/extensions/Makefile
>> >> > > @@ -0,0 +1,10 @@
>> >> > > +CC ?= gcc
>> >> > > +CONTRIB_SO :=
>> >> > > +
>> >> > > +all: $(CONTRIB_SO)
>> >> > > +
>> >> > > +$(CONTRIB_SO): %.so: %.c
>> >> > > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
>> >> > > +
>> >> > > +clean:
>> >> > > +     rm -f $(CONTRIB_SO)
>> >> > > diff --git a/makedumpfile.c b/makedumpfile.c
>> >> > > index dba3628..ca8ed8a 100644
>> >> > > --- a/makedumpfile.c
>> >> > > +++ b/makedumpfile.c
>> >> > > @@ -10847,6 +10847,8 @@ update_dump_level(void)
>> >> > >       }
>> >> > >  }
>> >> > >
>> >> > > +void run_extensions(void);
>> >> > > +
>> >> > >  int
>> >> > >  create_dumpfile(void)
>> >> > >  {
>> >> > > @@ -10884,6 +10886,8 @@ retry:
>> >> > >       if (info->flag_refiltering)
>> >> > >               update_dump_level();
>> >> > >
>> >> > > +     run_extensions();
>> >> > > +
>> >> > >       if ((info->name_filterconfig || info->name_eppic_config)
>> >> > >                       && !gather_filter_info())
>> >> > >               return FALSE;
>> >> > > --
>> >> > > 2.47.0
>> >> >
>>



* Re: [PATCH v3 5/8] Add makedumpfile extension support
  2026-03-12 22:24             ` Stephen Brennan
@ 2026-03-17 15:31               ` Tao Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Liu @ 2026-03-17 15:31 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: yamazaki-msmt, k-hagio-ab, kexec, aravinda

Hi Stephen,

Thanks a lot for your comments. I have made plenty of code
improvements based on your suggestions and your GitHub code repo.
Sorry I didn't reply inline to all of your comments; please check out
the latest v4 patch I posted upstream.

Some notable improvements:

1) I adjusted the extension callback function for amdgpu_filter, so that
the same callback interface can serve both of our extensions.
2) I followed your implementation of the INIT_OPT_(...) macro for
optional syms/types and the INIT_(...) macro for must-have syms/types.
If an extension is missing any must-have sym/type, it will fail to load
and exit as early as possible (e.g. on a machine that doesn't have
amdgpu). Optional symbols are checked during extension_init() (e.g.
the existence of a kernel symbol for kernel version detection).
3) Updated the kallsyms address calculation for the v7.0 kernel.
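
To illustrate item 2, here is a minimal sketch of a required/optional
check that runs after symbol resolution has finished. It is not the v4
code: struct sym_request, its field names and extension_can_run() are
hypothetical names made up for this example, and the kallsyms/BTF
lookup that would set "resolved" is assumed to have run already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified request record; not the real ksym_info/ktype_info. */
struct sym_request {
	const char *modname;   /* e.g. "vmlinux" or "amdgpu" */
	const char *name;      /* symbol or type name */
	bool required;         /* true: must-have, false: optional */
	bool resolved;         /* set by the (not shown) kallsyms/BTF lookup */
};

/*
 * Return true if the extension may run, i.e. every required request was
 * resolved. Unresolved optional requests are not an error here; the
 * extension is expected to check them itself.
 */
bool extension_can_run(const char *ext_name,
		       const struct sym_request *reqs, size_t n)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (reqs[i].resolved || !reqs[i].required)
			continue;
		/* Log every missing must-have entry, then refuse to run. */
		fprintf(stderr, "%s: required %s:%s not found, skipping extension\n",
			ext_name, reqs[i].modname, reqs[i].name);
		ok = false;
	}
	return ok;
}
```

With a check like this, an extension whose must-have entries are
missing is skipped with a log message, while other extensions can
still run.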

On Fri, Mar 13, 2026 at 11:24 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Tao Liu <ltao@redhat.com> writes:
>
> > Hi Stephen,
> >
> > On Wed, Mar 11, 2026 at 1:38 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >>
> >> Tao Liu <ltao@redhat.com> writes:
> >> > Hi Stephen,
> >> >
> >> > Sorry I took some time to make modifications on v3 code. Please see a
> >> > drafted v4 [1] for a preview.
> >> >
> >> > I followed your suggestion to let extensions declare the kallsyms
> >> > symbols/btf types they need. During kallsyms/btf initialization,
> >> > only the declared symbols/types are resolved, which gets rid of
> >> > the hash table. Extensions no longer need to initialize the
> >> > symbols/types they need by themselves.
> >> >
> >> > In addition, users can specify which extensions to load at
> >> > makedumpfile cmdline as:
> >> >
> >> > $ ./makedumpfile -d 31 -l
> >> > /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out --extension
> >> > amdgpu_filter.so
> >> >
> >> > Also "--extension" can be used several times, and amdgpu_filter.so can
> >> > be an absolute path as well. If no extensions are specified, the
> >> > btf/kallsyms of makedumpfile will not initialize.
> >> >
> >> > I don't know if this is what you wanted, or any suggestions for this?
> >> > Thanks in advance!
> >>
> >> Hi Tao,
> >>
> >> Please accept my apologies for the long delay on this review.  I've gone through
> >> each commit with my feedback. This looks like a really excellent job and I
> >> think it is ready for the review of the maintainers. I've implemented my own
> >> userspace stack extension on top of this support, and I find it to be great for
> >> my use case as well, with one small change (adding flags for detecting whether a
> >> struct or member is found).
> >
> > No worries at all :) Thanks a lot for your detailed comments. I'm
> > looking through them as well as your code branch in github. This may
> > take a while...
> >
> >>
> >> Here is some more detailed feedback on a per-commit level; it's all pretty
> >> minor.
> >>
> >> d3aee7a ("Reserve sections for makedumpfile and extenions")
> >>
> >> * This is just a note, but I've found that the linker script may not be
> >>   necessary. GCC automatically creates the __start_SECTION and __stop_SECTION
> >>   symbols for sections it creates.
> >
> > I have tried removing the makedumpfile.ld linker script. The
> > automatic __start/stop_SECTION symbols are only generated for
> > makedumpfile, aka the main program, but not for .so extensions. This
> > is a blocker because I would expect all .so files to have the
> > __start/stop_SECTION symbols so they can be registered with the main
> > program. I'm no expert in GCC's default behaviours, so I guess I would
> > prefer the current approach of defining the symbols explicitly in the
> > linker script, which seems more robust...  Perhaps you can share your
> > code on this, and I will give it a try.
>
> The key to making GCC generate the symbols for the sections is that the
> section names should themselves be valid C identifiers (i.e. they should
> not start with a '.'). Here's a patch to give the idea:
>
> From 3618f51869e978328e49f6061eae94469bca43c5 Mon Sep 17 00:00:00 2001
> From: Stephen Brennan <stephen.s.brennan@oracle.com>
> Date: Wed, 11 Mar 2026 09:17:05 -0700
> Subject: [PATCH] Use automatically generated section start symbols
>
> GCC will create __start_SECTION and __stop_SECTION only if the section
> names are valid C identifiers. This means we cannot use a leading dot.
> Drop the leading dot from the section names so we can avoid defining a
> custom linker script.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>  Makefile            |  2 +-
>  btf_info.h          |  4 ++--
>  extensions/Makefile |  2 +-
>  kallsyms.h          |  2 +-
>  makedumpfile.ld     | 15 ---------------
>  5 files changed, 5 insertions(+), 20 deletions(-)
>  delete mode 100644 makedumpfile.ld
>
> diff --git a/Makefile b/Makefile
> index 8196aa6..e2cde8f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -113,7 +113,7 @@ $(OBJ_ARCH): $(SRC_ARCH)
>         $(CC) $(CFLAGS_ARCH) -c -o ./$@ $(VPATH)$(@:.o=.c)
>
>  makedumpfile: $(SRC_BASE) $(OBJ_PART) $(OBJ_ARCH)
> -       $(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -Wl,-T,makedumpfile.ld -o $@ $< $(LIBS)
> +       $(CC) $(CFLAGS) $(LDFLAGS) $(OBJ_PART) $(OBJ_ARCH) -rdynamic -o $@ $< $(LIBS)
>         @sed -e "s/@DATE@/$(DATE)/" \
>              -e "s/@VERSION@/$(VERSION)/" \
>              $(VPATH)makedumpfile.8.in > $(VPATH)makedumpfile.8
> diff --git a/btf_info.h b/btf_info.h
> index 2a78fbf..555f21e 100644
> --- a/btf_info.h
> +++ b/btf_info.h
> @@ -29,7 +29,7 @@ void cleanup_btf(void);
>         struct ktype_info _##MOD##_##S##_##M = {                \
>                 QUATE(MOD), QUATE(S), QUATE(M), 0, 0, 0, 0, R, R\
>         };                                                      \
> -       __attribute__((section(".init_ktypes"), used))          \
> +       __attribute__((section("init_ktypes"), used))           \
>         struct ktype_info * _ptr_##MOD##_##S##_##M = &_##MOD##_##S##_##M
>
>  #define INIT_MOD_STRUCT_MEMBER(MOD, S, M)                      \
> @@ -60,7 +60,7 @@ void cleanup_btf(void);
>         struct ktype_info _##MOD##_##S = {                      \
>                 QUATE(MOD), QUATE(S), 0, 0, 0, 0, 0, R, 0       \
>         };                                                      \
> -       __attribute__((section(".init_ktypes"), used))          \
> +       __attribute__((section("init_ktypes"), used))           \
>         struct ktype_info * _ptr_##MOD##_##S = &_##MOD##_##S
>
>  #define INIT_MOD_STRUCT(MOD, S) INIT_MOD_STRUCT_RQD(MOD, S, 1)
> diff --git a/extensions/Makefile b/extensions/Makefile
> index fc28577..a4dc86a 100644
> --- a/extensions/Makefile
> +++ b/extensions/Makefile
> @@ -8,7 +8,7 @@ amdgpu_filter.so: maple_tree.c
>  userstack.so: maple_tree.c vma_rbtree.c
>
>  $(CONTRIB_SO): %.so: %.c
> -       $(CC) -O2 -g -fPIC -shared -Wl,-T,../makedumpfile.ld -o $@ $^
> +       $(CC) -O2 -g -fPIC -shared -o $@ $^
>
>  clean:
>         rm -f $(CONTRIB_SO)
> diff --git a/kallsyms.h b/kallsyms.h
> index d94830e..1989dfe 100644
> --- a/kallsyms.h
> +++ b/kallsyms.h
> @@ -17,7 +17,7 @@ struct ksym_info {
>         struct ksym_info _##MOD##_##SYM = {             \
>                 QUATE(MOD), QUATE(SYM), 0               \
>         };                                              \
> -       __attribute__((section(".init_ksyms"), used))   \
> +       __attribute__((section("init_ksyms"), used))    \
>         struct ksym_info * _ptr_##MOD##_##SYM = &_##MOD##_##SYM
>
>  #define GET_MOD_SYM(MOD, SYM) (_##MOD##_##SYM.value)
> diff --git a/makedumpfile.ld b/makedumpfile.ld
> deleted file mode 100644
> index 231a162..0000000
> --- a/makedumpfile.ld
> +++ /dev/null
> @@ -1,15 +0,0 @@
> -SECTIONS
> -{
> -       .init_ksyms ALIGN(8) : {
> -               __start_init_ksyms = .;
> -               KEEP(*(.init_ksyms*))
> -               __stop_init_ksyms = .;
> -       }
> -
> -       .init_ktypes ALIGN(8) : {
> -               __start_init_ktypes = .;
> -               KEEP(*(.init_ktypes*))
> -               __stop_init_ktypes = .;
> -       }
> -}
> -INSERT AFTER .data;
> \ No newline at end of file
> --
> 2.47.3

Unfortunately, with the patch applied I cannot see the
__start_init_ksyms/__start_init_ktypes symbols for amdgpu_filter.so,
so I had to keep the linker script in v4...
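
For reference, the mechanism discussed in the quoted patch can be
sketched for a single ELF link unit as below. This only illustrates
the main-program case: GNU ld provides __start_NAME and __stop_NAME
when an output section's name is a valid C identifier. DECLARE_KSYM,
count_declared_ksyms(), declared_ksym() and the field names modname
and symname are hypothetical names for this example, and the sketch
does not reproduce or explain the amdgpu_filter.so result above.

```c
#include <stddef.h>

/* Simplified stand-in for the thread's struct ksym_info (field names assumed). */
struct ksym_info {
	const char *modname;
	const char *symname;
	unsigned long long value;
};

/*
 * Hypothetical declaration macro: defines one record and stores a pointer
 * to it in the "init_ksyms" section. The section name has no leading dot,
 * so it is a valid C identifier.
 */
#define DECLARE_KSYM(MOD, SYM)					\
	struct ksym_info _##MOD##_##SYM = { #MOD, #SYM, 0 };	\
	__attribute__((section("init_ksyms"), used))		\
	struct ksym_info *_ptr_##MOD##_##SYM = &_##MOD##_##SYM

DECLARE_KSYM(vmlinux, init_task);
DECLARE_KSYM(vmlinux, jiffies);

/* Provided by the linker; the section holds pointers, hence the pointer type. */
extern struct ksym_info *__start_init_ksyms[];
extern struct ksym_info *__stop_init_ksyms[];

/* Number of records declared in this link unit. */
size_t count_declared_ksyms(void)
{
	return (size_t)(__stop_init_ksyms - __start_init_ksyms);
}

/* Return the i-th declared record, or NULL if i is out of range. */
struct ksym_info *declared_ksym(size_t i)
{
	if (i >= count_declared_ksyms())
		return NULL;
	return __start_init_ksyms[i];
}
```

The order of records inside the section is not guaranteed, so callers
should not rely on declaration order.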

Thanks,
Tao Liu

>
>
>
> >> * Otherwise, this is exactly what I was thinking!
> >>
> >> 102ae0b ("Implement kernel kallsyms resolving")
> >>
> >> * Again, this is looking very close to the design I hoped for, thanks!
> >> * I'm not sure whether is_unwanted_symbol() needs to exist anymore?  Given that
> >>   users opt-in to specific symbols, we don't need to filter out noisy ones that
> >>   would waste memory. What do you think?
> >
> > Agreed, I can remove this function.
> >
> >> * Unfortunately, upstream has made some changes to the vmlinux kallsyms encoding
> >>   in 7.0. You may want to check what we did in drgn to support those changes:
> >>   https://github.com/osandov/drgn/commit/744f36ec3c3f64d7e1323a0037898158698585c4
> >> * In kallsyms.c:
> >
> > Thanks for the info, I will try this and update the code in v4.
> >
> >>
> >>   +/*
> >>   + * Makedumpfile's .init_ksyms section
> >>   +*/
> >>   +extern struct ksym_info __start_init_ksyms[];
> >>   +extern struct ksym_info __stop_init_ksyms[];
> >>
> >>   I'm not sure it matters, but the type here is wrong. It should be
> >>   "extern struct ksym_info *" because you're storing pointers, not the actual
> >>   struct. That said, the type isn't used so I don't know that it matters.
> >
> > Right, this is an error. Thanks for pointing that out.
> >
> >>
> >> 4187b33 ("Implement kernel btf resolving")
> >>
> >> * The get_ktype_info() function fails if either the struct or member is not
> >>   found. This makes sense in a lot of cases, but there are other cases where we
> >>   will want to use the presence or absence of a struct/struct member to detect
> >>   which version of code to use. For example, my userspace stack code will use
> >>   either maple or rbtree helper code for finding stack VMAs, depending on which
> >>   is available. We can't know until runtime, when we check whether mm_rb or
> >>   mm_mt is present.
> >>
> >>   One solution is to handle this at runtime. We can have macros like
> >>   "HAVE_MEMBER(S, M)" and "HAVE_STRUCT(S)", and then each extension can check
> >>   whether the members it expects are present. I think the major downside
> >>   to this approach is that it requires manual effort, which is likely to be
> >>   forgotten when writing extensions.
> >>
> >>   Alternatively, we could set a flag in the struct ktype_info if the type is
> >>   "required" (or optional), and only fail get_ktype_info() for required
> >
> > Agreed, an optional flag looks better.
> >
> >>   structs/members. The concern with this approach is: what if plugin (A)
> >>   requires a type which is not present, but plugin (B) does not? If both are
> >>   loaded, the failure of A would cause B not to run. I'm not sure whether we
> >>   should care about that situation... I don't know if we have a use case for
> >>   using multiple plugins at the same time. Until we do, we probably won't have a
> >>   good idea whether it should be allowed for one to fail, but the other to
> >>   continue.
> >
> > Great question! I have thought about this previously. I suggest that
> > if plugin(A) fails, it should just fail and allow the execution of any
> > later plugins (B, C....) to continue. Each plugin is responsible for
> > one task, like plugin(A) for dealing with amdgpu's mm page
> > filtering, plugin(B) for Intel's and plugin(C) for NV's. Plugin(A)
> > will certainly fail if a machine has no amdgpu: amdgpu.ko will never
> > be loaded, so the related symbols/types are missing. This is
> > expected and shouldn't block the later plugins.
> >
> > But the failure should be gentle, unlike a segfault, which would
> > crash the entire makedumpfile program.
>
> Totally agreed up to here! I think it makes sense for plugin failures to
> be independent.
>
> It seems like the BTF loading process should not fail if any type or
> member is not found. But once we've tried to load everything, we can
> then go through each plugin's list, and if we find a "required" type
> or member which is not present, we log that failure and don't run the
> plugin. This shouldn't be much work, because we already have all the
> types in an array for each plugin.
>
> > Since currently plugins
> > are native .so libraries, the quality of code is ensured by each
> > plugin's author, rather than by the makedumpfile maintainers. Ideally
> > the plugins are well tested in the 1st kernel before they are shipped
> > in the kdump img, but who knows. From makedumpfile's view, do you think
> > we need to introduce a sandbox to isolate plugins from makedumpfile?
> > This would prevent serious plugin errors from stopping makedumpfile
> > from generating the vmcore.
>
> Personally, I don't believe we need sandbox. The plugins should be
> written to handle the clear error cases:
>
> - Optional types or symbols may not be present, so validate them.
> - Values read from /proc/vmcore may not be valid, or the read may fail,
>   so check all return values.
>
> That said, I'm not set against it, so if you believe it's necessary I
> could be convinced :)
>
> Thanks again,
> Stephen
>
> >>
> >>   I've implemented this second alternative in my branch.
> >>
> >> * Similar to the previous commit, __start_init_ktypes and __stop_init_ktypes are
> >>   declared as structs but should probably be declared as pointers.
> >>
> >> 22097b7 ("Implement kernel module's kallsyms resolving")
> >> edfa698 ("Implement kernel module's btf resolving")
> >>
> >> * These commits are straightforward:
> >>   Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> >>
> >> ede22e8 ("Add makedumpfile extensions support")
> >
> > I will address those later after reading through your code branch.
> > Thanks again for your detailed comments!
> >
> > Thanks,
> > Tao Liu
> >
> >>
> >> * nit: the array "handlers" actually contains "handles" (no r) returned from
> >>   dlopen(). The word handlers usually implies a function pointer or something
> >>   like that, called to handle a certain situation. Maybe rename this to
> >>   "handles".
> >> * The run_extensions() function is called within a retry loop in
> >>   create_dumpfile(). It seems possible that this could get called multiple
> >>   times. But many of the global variables in the kallsyms, BTF, and extension
> >>   code are not safe to be cleaned up and reinitialized. In particular, when
> >>   array elements are freed, the lengths, capacities, and pointers are not reset
> >>   to 0/NULL. I think it would be wise to make all the cleanup functions clear
> >>   the globals, so that they may be reinitialized safely.
> >> * Further, I think it might be helpful to split extension loading, running, and
> >>   cleanup. Something like this in create_dumpfile():
> >>
> >>         load_extensions(); /* loads everything and calls entry() */
> >>     retry:
> >>         /* ... create bitmap and dump ... */
> >>         if (status == NOSPACE) {
> >>             /* ... */
> >>             goto retry;
> >>         }
> >>         /* ... */
> >>         cleanup_extensions();
> >>         return;
> >>
> >>   For two reasons: (1) this avoids unnecessary work re-loading and
> >>   re-initializing the BTF, kallsyms, and extensions. (Though I still think it's
> >>   safer to ensure they can be re-initialized safely.) And (2), this allows for
> >>   future use of extensions in the rest of the dump operation. For my userspace
> >>   stack extension, I plan to add a callback which allows extensions to override
> >>   the decision to filter a page, since my logic can't be easily done via erase
> >>   info. So the extensions need to remain loaded during the creation of the
> >>   dumpfile, and cleaned up after. I have tweaked this in my own patches,
> >>   but I just wanted to share the use case.
> >>
> >> c568635 ("btf/kallsyms based makedumpfile extension for mm page filtering")
> >>
> >> * This looks good to me!
> >> * I will say that for my userspace stack use case, while the "filter_page(...,
> >>   false)" mechanism for specifying pages that are retained *looks* useful, I
> >>   wouldn't be able to use it because I cannot determine the PFNs for each stack
> >>   VMA. Instead, I have to use the page "mapping" and "index" fields to match the
> >>   VMA and determine whether the PFN falls in a range I care to save. Of course,
> >>   just because *I* won't use it doesn't mean it's not useful :)
> >>
> >> 2b252ec ("Filter amdgpu mm pages")
> >>
> >> * I'm no expert in amdgpu, but the overall approach makes sense to me, and the
> >>   helpers look good.
> >>
> >>
> >> At a broader level, while the add_to_arr() function is useful, I do think the
> >> dynamic array/vector pattern could be captured with a dedicated data structure.
> >> For instance, drgn has this excellent LGPL-2.1+ header-only vector library:
> >> https://github.com/osandov/drgn/blob/main/libdrgn/vector.h
> >> I don't think it is high priority or must be addressed. Nor do I mean that this
> >> particular implementation choice be used. It's just something to share & think
> >> about.
> >>
> >> To provide some context on my comments related to the userspace stack tracing,
> >> here is a branch of mine which is based on yours, that adds my userspace stack
> >> extension and a few tweaks:
> >>
> >> https://github.com/brenns10/makedumpfile/commits/stepbren_userstack_v4/
> >>
> >> Thank you,
> >> Stephen
> >>
> >> > Thanks,
> >> > Tao Liu
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > [1]: https://github.com/liutgnu/makedumpfile/commits/v4/
> >> >
> >> > On Fri, Jan 23, 2026 at 2:43 AM Tao Liu <ltao@redhat.com> wrote:
> >> >>
> >> >> Hi Stephen,
> >> >>
> >> >> Thanks a lot for your quick reply and detailed information, I really
> >> >> appreciate it!
> >> >>
> >> >> On Thu, Jan 22, 2026 at 1:51 PM Stephen Brennan
> >> >> <stephen.s.brennan@oracle.com> wrote:
> >> >> >
> >> >> > Hi Tao,
> >> >> >
> >> >> > This series looks really great -- I'm excited to see the switch to
> >> >> > native .so extensions instead of eppic. I've applied the series locally
> >> >> > and I'll rebuild my userspace stack inclusion feature based on it, to
> >> >> > try it out myself.
> >> >>
> >> >> Awesome, looking forward to your feedback on the code/API designs etc...
> >> >>
> >> >> >
> >> >> > In the meantime, I'll share some of my feedback on the patches (though
> >> >> > I'm not a makedumpfile developer). This seems like the most important
> >> >> > patch in terms of design, so I'll start here.
> >> >> >
> >> >> > Tao Liu <ltao@redhat.com> writes:
> >> >> > > This patch adds .so extension support to makedumpfile, similar to the crash
> >> >> > > utility's extensions. Currently only "/usr/lib64/makedumpfile/extensions"
> >> >> > > and "./extensions" are searched for extensions. Once one is found, kallsyms and
> >> >> > > btf are initialized so all extensions can benefit from them (currently makedumpfile
> >> >> > > itself doesn't use this info; we can move the kallsyms/btf init code elsewhere
> >> >> > > later if makedumpfile needs it).
> >> >> > >
> >> >> > > The makedumpfile extension helps users customize mm page filtering on top of the
> >> >> > > traditional mm page flag filtering, without modifying the code of makedumpfile
> >> >> > > itself.
> >> >> > >
> >> >> > > Signed-off-by: Tao Liu <ltao@redhat.com>
> >> >> > > ---
> >> >> > >  Makefile            |  7 +++-
> >> >> > >  extension.c         | 82 +++++++++++++++++++++++++++++++++++++++++++++
> >> >> > >  extensions/Makefile | 10 ++++++
> >> >> > >  makedumpfile.c      |  4 +++
> >> >> > >  4 files changed, 102 insertions(+), 1 deletion(-)
> >> >> > >  create mode 100644 extension.c
> >> >> > >  create mode 100644 extensions/Makefile
> >> >> > >
> >> >> > > diff --git a/Makefile b/Makefile
> >> >> > > index f3f4da8..7e29220 100644
> >> >> > > --- a/Makefile
> >> >> > > +++ b/Makefile
> >> >> > > @@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
> >> >> > >  endif
> >> >> > >
> >> >> > >  SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
> >> >> > > -SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c
> >> >> > > +SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf_info.c extension.c
> >> >> > >  OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
> >> >> > >  SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
> >> >> > >  OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
> >> >> > > @@ -126,6 +126,7 @@ eppic_makedumpfile.so: extension_eppic.c
> >> >> > >
> >> >> > >  clean:
> >> >> > >       rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
> >> >> > > +     $(MAKE) -C extensions clean
> >> >> > >
> >> >> > >  install:
> >> >> > >       install -m 755 -d ${DESTDIR}/${SBINDIR} ${DESTDIR}/usr/share/man/man5 ${DESTDIR}/usr/share/man/man8
> >> >> > > @@ -135,3 +136,7 @@ install:
> >> >> > >       mkdir -p ${DESTDIR}/usr/share/makedumpfile/eppic_scripts
> >> >> > >       install -m 644 -D $(VPATH)makedumpfile.conf ${DESTDIR}/usr/share/makedumpfile/makedumpfile.conf.sample
> >> >> > >       install -m 644 -t ${DESTDIR}/usr/share/makedumpfile/eppic_scripts/ $(VPATH)eppic_scripts/*
> >> >> > > +
> >> >> > > +.PHONY: extensions
> >> >> > > +extensions:
> >> >> > > +     $(MAKE) -C extensions CC=$(CC)
> >> >> > > \ No newline at end of file
> >> >> > > diff --git a/extension.c b/extension.c
> >> >> > > new file mode 100644
> >> >> > > index 0000000..6ee7f4e
> >> >> > > --- /dev/null
> >> >> > > +++ b/extension.c
> >> >> > > @@ -0,0 +1,82 @@
> >> >> > > +#include <stdio.h>
> >> >> > > +#include <stdlib.h>
> >> >> > > +#include <string.h>
> >> >> > > +#include <dirent.h>
> >> >> > > +#include <dlfcn.h>
> >> >> > > +#include <stdbool.h>
> >> >> > > +#include "kallsyms.h"
> >> >> > > +#include "btf_info.h"
> >> >> > > +
> >> >> > > +static const char *dirs[] = {
> >> >> > > +     "/usr/lib64/makedumpfile/extensions",
> >> >> > > +     "./extensions",
> >> >> > > +};
> >> >> > > +
> >> >> > > +/* Will only init once */
> >> >> > > +static bool init_kallsyms_btf(void)
> >> >> > > +{
> >> >> > > +     static bool ret = false;
> >> >> > > +     static bool has_inited = false;
> >> >> > > +
> >> >> > > +     if (has_inited)
> >> >> > > +             goto out;
> >> >> > > +     if (!init_kernel_kallsyms())
> >> >> > > +             goto out;
> >> >> > > +     if (!init_kernel_btf())
> >> >> > > +             goto out;
> >> >> > > +     if (!init_module_kallsyms())
> >> >> > > +             goto out;
> >> >> > > +     if (!init_module_btf())
> >> >> > > +             goto out;
> >> >> > > +     ret = true;
> >> >> >
> >> >> > I feel it would be good practice to load as little information as is
> >> >> > necessary for the task. If "amdgpu" module is required, then load kernel
> >> >> > kallsyms, BTF, and then the amdgpu module kallsyms & BTF. If no module
> >> >> > debuginfo is required, then just the kernel would suffice.
> >> >> >
> >> >> > This would reduce memory usage and runtime, though I don't know if it
> >> >> > would show up in profiling. The main benefit could be reliability: by
> >> >> > handling less data, there are fewer chances to hit an error.
> >> >>
> >> >> OK, I agree, mandatory kernel btf/kallsyms info + optional kernel
> >> >> module btf/kallsyms info is a reasonable design. So kernel modules'
> >> >> info can be loaded on demand.
> >> >>
> >> >> >
> >> >> > > +out:
> >> >> > > +     has_inited = true;
> >> >> > > +     return ret;
> >> >> > > +}
> >> >> > > +
> >> >> > > +static void cleanup_kallsyms_btf(void)
> >> >> > > +{
> >> >> > > +     cleanup_kallsyms();
> >> >> > > +     cleanup_btf();
> >> >> > > +}
> >> >> > > +
> >> >> > > +void run_extensions(void)
> >> >> > > +{
> >> >> > > +     DIR *dir;
> >> >> > > +     struct dirent *entry;
> >> >> > > +     size_t len;
> >> >> > > +     int i;
> >> >> > > +     void *handle;
> >> >> > > +     char path[512];
> >> >> > > +
> >> >> > > +     for (i = 0; i < sizeof(dirs) / sizeof(char *); i++) {
> >> >> > > +             if ((dir = opendir(dirs[i])) != NULL)
> >> >> > > +                     break;
> >> >> > > +     }
> >> >> > > +
> >> >> > > +     if (!dir || i >= sizeof(dirs) / sizeof(char *))
> >> >> > > +             /* No extensions found */
> >> >> > > +             return;
> >> >> >
> >> >> > It could be confusing that makedumpfile would behave differently with
> >> >> > the same command-line arguments depending on the presence or absence of
> >> >> > these extensions on the filesystem.
> >> >> >
> >> >> > I think it may fit users' expectations better if they are required to
> >> >> > specify extensions on the command line. Then we could load them by
> >> >> > searching each directory in order. This allows:
> >> >> >
> >> >> > (a) more expected behavior
> >> >> > (b) multiple extensions can exist without all being enabled, thus more
> >> >> >     flexibility
> >> >> > (c) extensions can be present in the local "extensions/" directory, or
> >> >> >     in the system directory
> >> >>
> >> >> Sure, that also sounds reasonable. My original thought was that user
> >> >> customizations of mm filtering are specified in .so files, and if a
> >> >> user doesn't need one, e.g. amdgpu mm filtering on an nvidia machine,
> >> >> they simply don't pack amdgpu_filter.so into kdump's initramfs. I agree
> >> >> that adding an extra makedumpfile cmdline option to receive the needed
> >> >> .so files is a better design.
> >> >>
> >> >> >
> >> >> > > +     while ((entry = readdir(dir)) != NULL) {
> >> >> > > +             len = strlen(entry->d_name);
> >> >> > > +             if (len > 3 && strcmp(entry->d_name + len - 3, ".so") == 0) {
> >> >> > > +                     /* Will only init when .so exist */
> >> >> > > +                     if (!init_kallsyms_btf())
> >> >> > > +                             goto out;
> >> >> > > +
> >> >> > > +                     snprintf(path, sizeof(path), "%s/%s", dirs[i], entry->d_name);
> >> >> > > +                     handle = dlopen(path, RTLD_NOW);
> >> >> > > +                     if (!handle) {
> >> >> > > +                             fprintf(stderr, "%s: Failed to load %s: %s\n",
> >> >> > > +                                     __func__, path, dlerror());
> >> >> > > +                             continue;
> >> >> > > +                     }
> >> >> > > +                     printf("Loaded extension: %s\n", path);
> >> >> > > +                     dlclose(handle);
> >> >> >
> >> >> > Using the constructor/destructor of the shared object is clever! But we
> >> >> > lose some flexibility: by the time the dlopen() returns, the constructor
> >> >> > has executed and the plugin has thus executed.
> >> >> >
> >> >> > What if we instead use dlsym() to load some symbols from the DSO? In
> >> >> > particular, I think it would be useful if extensions could declare a
> >> >> > list of symbols and a list of structure information which they are
> >> >> > interested in receiving. We could use these lists to know which
> >> >> > kernel/module kallsyms & BTF we should load. We could even load the
> >> >> > information into the local variables of the extension, so the extension
> >> >> > would not need to manually load it.
> >> >> >
> >> >> > Of course this is more complex, but the benefit is:
> >> >> >
> >> >> > 1. Extensions can be written more simply, and would not need to manually
> >> >> > load each symbol & type.
> >> >> > 2. We could eliminate the hash tables for kallsyms & BTF, and eliminate
> >> >> > the loading of unnecessary module information. Instead, we'd just
> >> >> > populate the symbol addresses, struct offsets, and type sizes directly
> >> >> > into the local variables which request them.
> >> >>
> >> >> It is a clever idea! Though complex for code, I think it is doable.
> >> >>
> >> >> >
> >> >> > Again, while I don't want to prematurely optimize -- it's good to avoid
> >> >> > loading unnecessary information. I hope I've described my idea well. I
> >> >> > would be happy to work on an implementation of it based on your patches
> >> >> > here, if you're interested.
> >> >>
> >> >> Thanks again for your suggestions! I got your points and I think I can
> >> >> improve the code while waiting for the maintainers' ideas at the same time.
> >> >> I will let you know when I am done, or if I encounter any blockers.
> >> >>
> >> >> Thanks,
> >> >> Tao Liu
> >> >>
> >> >> >
> >> >> > Thanks,
> >> >> > Stephen
> >> >> >
> >> >> > > +             }
> >> >> > > +     }
> >> >> > > +out:
> >> >> > > +     closedir(dir);
> >> >> > > +     cleanup_kallsyms_btf();
> >> >> > > +}
> >> >> > > \ No newline at end of file
> >> >> > > diff --git a/extensions/Makefile b/extensions/Makefile
> >> >> > > new file mode 100644
> >> >> > > index 0000000..afbc61e
> >> >> > > --- /dev/null
> >> >> > > +++ b/extensions/Makefile
> >> >> > > @@ -0,0 +1,10 @@
> >> >> > > +CC ?= gcc
> >> >> > > +CONTRIB_SO :=
> >> >> > > +
> >> >> > > +all: $(CONTRIB_SO)
> >> >> > > +
> >> >> > > +$(CONTRIB_SO): %.so: %.c
> >> >> > > +     $(CC) -O2 -g -fPIC -shared -o $@ $^
> >> >> > > +
> >> >> > > +clean:
> >> >> > > +     rm -f $(CONTRIB_SO)
> >> >> > > diff --git a/makedumpfile.c b/makedumpfile.c
> >> >> > > index dba3628..ca8ed8a 100644
> >> >> > > --- a/makedumpfile.c
> >> >> > > +++ b/makedumpfile.c
> >> >> > > @@ -10847,6 +10847,8 @@ update_dump_level(void)
> >> >> > >       }
> >> >> > >  }
> >> >> > >
> >> >> > > +void run_extensions(void);
> >> >> > > +
> >> >> > >  int
> >> >> > >  create_dumpfile(void)
> >> >> > >  {
> >> >> > > @@ -10884,6 +10886,8 @@ retry:
> >> >> > >       if (info->flag_refiltering)
> >> >> > >               update_dump_level();
> >> >> > >
> >> >> > > +     run_extensions();
> >> >> > > +
> >> >> > >       if ((info->name_filterconfig || info->name_eppic_config)
> >> >> > >                       && !gather_filter_info())
> >> >> > >               return FALSE;
> >> >> > > --
> >> >> > > 2.47.0
> >> >> >
> >>
>




end of thread, other threads:[~2026-03-17 15:32 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-20  2:54 [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
2026-01-20  2:54 ` [PATCH v3 1/8] Implement kernel kallsyms resolving Tao Liu
2026-01-24  1:09   ` Stephen Brennan
2026-01-24  5:52     ` Tao Liu
2026-01-20  2:54 ` [PATCH v3 2/8] Implement kernel btf resolving Tao Liu
2026-01-20  2:54 ` [PATCH v3 3/8] Implement kernel modules' kallsyms resolving Tao Liu
2026-01-20  2:54 ` [PATCH v3 4/8] Implement kernel modules' btf resolving Tao Liu
2026-01-20  2:54 ` [PATCH v3 5/8] Add makedumpfile extension support Tao Liu
2026-01-22  0:51   ` Stephen Brennan
2026-01-22 13:43     ` Tao Liu
2026-02-04  8:40       ` Tao Liu
2026-03-11  0:38         ` Stephen Brennan
2026-03-11 14:41           ` Tao Liu
2026-03-12 22:24             ` Stephen Brennan
2026-03-17 15:31               ` Tao Liu
2026-01-20  2:54 ` [PATCH v3 6/8] Add page filtering function Tao Liu
2026-01-23  0:54   ` Stephen Brennan
2026-01-27  3:21     ` Tao Liu
2026-01-20  2:54 ` [PATCH v3 7/8] Add maple tree support to makedumpfile extension Tao Liu
2026-01-20  2:55 ` [PATCH v3 8/8] Filter amdgpu mm pages Tao Liu
2026-01-20  4:39 ` [PATCH v3 0/8] btf/kallsyms based makedumpfile extension for mm page filtering Tao Liu
2026-01-29 10:19 ` YAMAZAKI MASAMITSU(山崎　真光)
2026-02-04  8:50   ` Tao Liu
