* [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-09-08 11:11 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-09-29 6:30 ` HAGIO KAZUHITO(萩尾 一仁)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain Tao Liu
` (9 subsequent siblings)
10 siblings, 2 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
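Symbol lookups do not account for kernel address randomization (KASLR). This
patch fixes that by adding info->kaslr_offset to the addresses that
get_symbol_addr_all() returns for vmlinux symbols.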
Signed-off-by: Tao Liu <ltao@redhat.com>
---
erase_info.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/erase_info.c b/erase_info.c
index af6bfae..b67d1d0 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
symbol_addr = get_symbol_addr(name);
if (symbol_addr)
- return symbol_addr;
+ return symbol_addr + info->kaslr_offset;
vmlinux_searched = 1;
}
@@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
* this function is called with debuginfo set to a particular
* kernel module and we are looking for symbol in vmlinux
*/
- if (!vmlinux_searched)
- return get_symbol_addr(name);
- else
+ if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
+ return symbol_addr + info->kaslr_offset;
+ } else
return NOT_FOUND_SYMBOL;
}
--
2.47.0
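The second hunk only adds the offset when the lookup succeeded, so a failed lookup still returns the not-found value rather than the bare offset. A minimal sketch of that control flow follows. It assumes the not-found sentinel is 0, as the `if (symbol_addr)` test in the patch suggests; `struct sym`, `lookup_static()` and `lookup_runtime()` are hypothetical names used only for illustration and are not part of makedumpfile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed sentinel: 0 means "symbol not found". */
#define NOT_FOUND_SYMBOL 0ULL

/* Hypothetical stand-in for one entry of a static symbol table. */
struct sym {
	const char *name;
	unsigned long long addr;	/* link-time address */
};

/* Hypothetical lookup: link-time address, or NOT_FOUND_SYMBOL. */
static unsigned long long
lookup_static(const struct sym *tab, size_t n, const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < n; i++)
		if (!strcmp(tab[i].name, name))
			return tab[i].addr;
	return NOT_FOUND_SYMBOL;
}

/* Add the randomization offset only when the symbol was found. */
static unsigned long long
lookup_runtime(const struct sym *tab, size_t n, const char *name,
	       unsigned long long kaslr_offset)
{
	unsigned long long addr = lookup_static(tab, n, name);

	if (addr == NOT_FOUND_SYMBOL)
		return NOT_FOUND_SYMBOL;
	return addr + kaslr_offset;
}
```

With this shape, a caller that tests the result against the sentinel behaves the same with or without randomization.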
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization Tao Liu
@ 2025-09-08 11:11 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-09-09 5:24 ` Tao Liu
2025-09-29 6:30 ` HAGIO KAZUHITO(萩尾 一仁)
1 sibling, 1 reply; 26+ messages in thread
From: YAMAZAKI MASAMITSU(山崎 真光) @ 2025-09-08 11:11 UTC (permalink / raw)
To: Tao Liu, HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, devel@lists.crash-utility.osci.io
Hi Liu,
Thanks for your patch. In what kind of environment did you test
this KASLR problem? In particular, was it kernel version
6.11.8-300.fc41.x86_64? Have you checked other versions?
Additionally, do you know the version of eppic?
Also, if possible, please show me the output when the error occurs
and the output when it works correctly.
Sincerely
On 2025/06/10 18:57, Tao Liu wrote:
> There is a bug of not supporting randomized kernel address, this patch fix it.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
> erase_info.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/erase_info.c b/erase_info.c
> index af6bfae..b67d1d0 100644
> --- a/erase_info.c
> +++ b/erase_info.c
> @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
> if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
> symbol_addr = get_symbol_addr(name);
> if (symbol_addr)
> - return symbol_addr;
> + return symbol_addr + info->kaslr_offset;
>
> vmlinux_searched = 1;
> }
> @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
> * this function is called with debuginfo set to a particular
> * kernel module and we are looking for symbol in vmlinux
> */
> - if (!vmlinux_searched)
> - return get_symbol_addr(name);
> - else
> + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
> + return symbol_addr + info->kaslr_offset;
> + } else
> return NOT_FOUND_SYMBOL;
> }
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-09-08 11:11 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-09-09 5:24 ` Tao Liu
0 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-09-09 5:24 UTC (permalink / raw)
To: YAMAZAKI MASAMITSU(山崎 真光)
Cc: HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi YAMAZAKI,
On Mon, Sep 8, 2025 at 11:11 PM YAMAZAKI MASAMITSU(山崎 真光)
<yamazaki-msmt@nec.com> wrote:
>
> Hi,Liu
>
> Thanks your patch. What kind of environment did you test
> this KASLR problem ? Especialy is it kernel verssion
> 6.11.8-300.fc41.x86_64 ? Have you checked other versions?
> Additionally, do you know the version of eppic ?
I primarily tested the patchset on 6.11.8-300.fc41.x86_64. Since it
has been a while, I have updated my kernel to 6.16.3. The dwarf
issues (PATCH 01/10 & PATCH 02/10) can still be reproduced there;
see the following:
1) Env:
$ uname -a
Linux localhost.localdomain 6.16.3 #3 SMP PREEMPT_DYNAMIC Thu Aug 28
15:12:38 2025 x86_64 GNU/Linux
liutgnu@localhost:~/sources/up-makedumpfile$ git log --oneline | head -2
65bf4c9 [PATCH v2] Fix a data race in multi-threading mode (--num-threads=N)
liutgnu@localhost:~/sources/up-eppic$ git branch
* master
v5.0
liutgnu@localhost:~/sources/up-eppic$ git log --oneline | head -2
72da440 Merge pull request #20 from yselkowitz/master
5ecafb7 Fix build with glibc-2.42
liutgnu@localhost:~/sources/up-makedumpfile$ cat eppic_scripts/test.c
string test_opt() {...}
string test_usage() {...}
static void test_showusage() {...}
string test_help() {...}
int test()
{
printf("linux banner: %lx\n", (unsigned long)&linux_banner);
return 0;
}
2) The recursive issue as fixed in PATCH 02/10:
$ ./makedumpfile -d 31 -l vmcore /tmp/out -x vmlinux --eppic
eppic_scripts/test.c --dry-run
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7c3ca4b in pthread_rwlock_unlock@GLIBC_2.2.5 () from
target:/lib64/libc.so.6
(gdb) bt
...
#5 0x0000000000409a67 in search_domain (die=0x7fffff7ff220,
found=0x7fffffffcf24) at dwarf_info.c:904
#6 0x0000000000409978 in search_domain (die=0x7fffff7ff290,
found=0x7fffffffcf24) at dwarf_info.c:865
#7 0x0000000000409978 in search_domain (die=0x7fffff7ff300,
found=0x7fffffffcf24) at dwarf_info.c:865
...
#74820 0x0000000000409978 in search_domain (die=0x7fffffffcdb0,
found=0x7fffffffcf24) at dwarf_info.c:865
#74821 0x0000000000409978 in search_domain (die=0x7fffffffce60,
found=0x7fffffffcf24) at dwarf_info.c:865
#74822 0x0000000000409bfc in search_die_tree (die=0x7fffffffce60,
found=0x7fffffffcf24) at dwarf_info.c:953
#74823 0x0000000000409b2c in search_die_tree (die=0x7fffffffce90,
found=0x7fffffffcf24) at dwarf_info.c:935
#74824 0x0000000000409ec7 in get_debug_info () at dwarf_info.c:1010
#74825 0x000000000040af2a in get_domain (symname=0x90aeb0 "memset",
cmd=13, die=0x7fffffffd020) at dwarf_info.c:1381
#74826 0x00000000004143d0 in get_domain_all (symname=0x90aeb0
"memset", cmd=13, die=0x7fffffffd020) at erase_info.c:1965
#74827 0x00007ffff76c0fc1 in apigetctype (ctype=7, name=0x90aeb0
"memset", tout=0x48be80) at extension_eppic.c:280
#74828 0x00007ffff76cab1b in eppic_getctype (ctype=7, name=0x90aeb0
"memset", silent=1) at eppic_api.c:454
#74829 0x00007ffff76d56c0 in eppiclex () at lex.eppic.c:1831
#74830 0x00007ffff76d2ae5 in eppicparse () at eppic.tab.c:1804
#74831 0x00007ffff76c1f4d in eppic_parsexpr (expr=0x7ffff7707364 "int
memset(char *, int, int)") at eppic_func.c:437
#74832 0x00007ffff76c740e in eppic_builtin (proto=0x7ffff7707364 "int
memset(char *, int, int)", fp=0x7ffff76c1349 <eppic_memset>)
at eppic_builtin.c:252
#74833 0x00007ffff76c1420 in eppic_init (fun_ptr=0x473600 <eppic_cb>)
at extension_eppic.c:461
#74834 0x0000000000414f83 in process_eppic_file
(name_config=0x7fffffffe0b7 "eppic_scripts/test.c") at
erase_info.c:2226
#74835 0x0000000000415513 in gather_filter_info () at erase_info.c:2357
#74836 0x000000000044f237 in create_dumpfile () at makedumpfile.c:10885
#74837 0x000000000045569a in main (argc=11, argv=0x7fffffffdc08) at
makedumpfile.c:12478
(gdb) frame 74821
#74821 0x0000000000409978 in search_domain (die=0x7fffffffce60,
found=0x7fffffffcf24) at dwarf_info.c:865
865 search_domain(&child, found);
(gdb) p name
$1 = 0x7ffff1bdb782 "&core::num::fmt::Part"
...
(gdb) frame 6
#6 0x0000000000409978 in search_domain (die=0x7fffff7ff290,
found=0x7fffffffcf24) at dwarf_info.c:865
865 search_domain(&child, found);
(gdb) p name
$8 = 0x7ffff1bdbe1f "and_then<char, (),
core::net::parser::{impl#0}::read_given_char::{closure#0}::{closure_env#0}>"
So the stack overflows, and the overflow is caused by Rust symbols. I
have no idea why Rust symbols cause this, so in PATCH 02/10 I skip all
Rust symbols for now. With the patch applied, there are no segfaults.
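For comparison, a generic way to make a recursive walk terminate on cyclic references is to remember the nodes on the current path and to cap the depth. The sketch below is not what PATCH 02/10 does (the patch skips Rust symbols), and it does not claim to explain the root cause, which is unknown here. `struct node`, `search()`, `search_guarded()` and `MAX_DEPTH` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64	/* arbitrary cap chosen for this sketch */

/* Hypothetical stand-in for a DIE: an id and the entry it refers to. */
struct node {
	int id;
	struct node *inner;	/* may point back to an ancestor */
};

/*
 * Follow the inner chain looking for target_id. path[] holds the nodes
 * currently being visited; meeting one of them again means a cycle.
 */
static bool
search(const struct node *n, int target_id,
       const struct node **path, size_t depth)
{
	assert(depth <= MAX_DEPTH);
	if (n == NULL || depth >= MAX_DEPTH)
		return false;
	for (size_t i = 0; i < depth; i++)
		if (path[i] == n)
			return false;	/* cycle detected, stop here */
	if (n->id == target_id)
		return true;
	path[depth] = n;
	return search(n->inner, target_id, path, depth + 1);
}

/* Entry point: allocates the path buffer and starts at depth 0. */
static bool
search_guarded(const struct node *root, int target_id)
{
	const struct node *path[MAX_DEPTH];

	return search(root, target_id, path, 0);
}
```

The trade-off is extra bookkeeping per call. Skipping a whole class of entries, as the patch does, is simpler when those entries are not needed.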
3) The kaslr issue as fixed in PATCH 01/10:
$ ./makedumpfile -d 31 -l vmcore /tmp/out -x vmlinux --eppic
eppic_scripts/test.c --dry-run
__vtop4_x86_64: Can't get a valid pmd_pte.
readmem: Can't convert a virtual address(ffffffff828445f0) to physical address.
readmem: type_addr: 0, addr:ffffffff828445f0, size:1
linux banner: ffffffff828445f0
crash> sym linux_banner
ffffffff938445f0 (D) linux_banner
liutgnu@localhost:/lib/modules/6.16.3/build$ cat System.map |grep linux_banner
ffffffff828445f0 D linux_banner
So the value output by the eppic script does not include the KASLR
offset. With the patch applied, the eppic script outputs the correct
value, matching crash's:
$ ./makedumpfile -d 31 -l vmcore /tmp/out -x vmlinux --eppic
eppic_scripts/test.c --dry-run
linux banner: ffffffff938445f0
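The two addresses above differ by exactly the randomization offset, which the following sketch works out. Both constants are copied from the output quoted in this message; the macro and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses of linux_banner as quoted above. */
#define LINUX_BANNER_STATIC  UINT64_C(0xffffffff828445f0) /* System.map */
#define LINUX_BANNER_RUNTIME UINT64_C(0xffffffff938445f0) /* crash> sym */

/* The offset is the runtime address minus the link-time address. */
static uint64_t
kaslr_offset_of(uint64_t runtime, uint64_t link_time)
{
	return runtime - link_time;
}

/* Rebase a link-time address into the randomized address space. */
static uint64_t
rebase(uint64_t link_time, uint64_t kaslr_offset)
{
	return link_time + kaslr_offset;
}

/* Compile-time check of the arithmetic for this particular dump. */
static_assert(LINUX_BANNER_RUNTIME - LINUX_BANNER_STATIC ==
	      UINT64_C(0x11000000), "offset for this dump is 0x11000000");
```

For this dump the offset is 0x11000000, so any vmlinux symbol address read from System.map would need the same adjustment before readmem can translate it.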
I hope this debugging output helps with the review of PATCH 01/10 & PATCH 02/10.
Thanks,
Tao Liu
>
> Also, if possible, show this problem results when an error occurs
> and the results when it is correct. Please tell me.
>
> Sincerely
>
> On 2025/06/10 18:57, Tao Liu wrote:
> > There is a bug of not supporting randomized kernel address, this patch fix it.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> > erase_info.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/erase_info.c b/erase_info.c
> > index af6bfae..b67d1d0 100644
> > --- a/erase_info.c
> > +++ b/erase_info.c
> > @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
> > if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
> > symbol_addr = get_symbol_addr(name);
> > if (symbol_addr)
> > - return symbol_addr;
> > + return symbol_addr + info->kaslr_offset;
> >
> > vmlinux_searched = 1;
> > }
> > @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
> > * this function is called with debuginfo set to a particular
> > * kernel module and we are looking for symbol in vmlinux
> > */
> > - if (!vmlinux_searched)
> > - return get_symbol_addr(name);
> > - else
> > + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
> > + return symbol_addr + info->kaslr_offset;
> > + } else
> > return NOT_FOUND_SYMBOL;
> > }
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization Tao Liu
2025-09-08 11:11 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-09-29 6:30 ` HAGIO KAZUHITO(萩尾 一仁)
2025-09-30 0:34 ` Tao Liu
1 sibling, 1 reply; 26+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-09-29 6:30 UTC (permalink / raw)
To: Tao Liu,
YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, devel@lists.crash-utility.osci.io
Hi Tao,
On 2025/06/10 18:57, Tao Liu wrote:
> There is a bug of not supporting randomized kernel address, this patch fix it.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
Apologies for my long delay, and thank you for the patch.
This patch looks good to me and I would like to merge this separately
from the series, so
Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Thanks,
Kazu
> ---
> erase_info.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/erase_info.c b/erase_info.c
> index af6bfae..b67d1d0 100644
> --- a/erase_info.c
> +++ b/erase_info.c
> @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
> if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
> symbol_addr = get_symbol_addr(name);
> if (symbol_addr)
> - return symbol_addr;
> + return symbol_addr + info->kaslr_offset;
>
> vmlinux_searched = 1;
> }
> @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
> * this function is called with debuginfo set to a particular
> * kernel module and we are looking for symbol in vmlinux
> */
> - if (!vmlinux_searched)
> - return get_symbol_addr(name);
> - else
> + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
> + return symbol_addr + info->kaslr_offset;
> + } else
> return NOT_FOUND_SYMBOL;
> }
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-09-29 6:30 ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-09-30 0:34 ` Tao Liu
2025-09-30 1:28 ` HAGIO KAZUHITO(萩尾 一仁)
0 siblings, 1 reply; 26+ messages in thread
From: Tao Liu @ 2025-09-30 0:34 UTC (permalink / raw)
To: HAGIO KAZUHITO(萩尾 一仁)
Cc: YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi Kazu,
Thanks for your review of patches [1 & 2].
As for the remaining patches [3 - 10], I will redraft them and post v2 later.
Thanks,
Tao Liu
On Mon, Sep 29, 2025 at 7:30 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab@nec.com> wrote:
>
> Hi Tao,
>
> On 2025/06/10 18:57, Tao Liu wrote:
> > There is a bug of not supporting randomized kernel address, this patch fix it.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
>
> apologies for my long delay and thank you for the patch.
>
> This patch looks good to me and I would like to merge this separately
> from the series, so
>
> Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>
>
> Thanks,
> Kazu
>
> > ---
> > erase_info.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/erase_info.c b/erase_info.c
> > index af6bfae..b67d1d0 100644
> > --- a/erase_info.c
> > +++ b/erase_info.c
> > @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
> > if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
> > symbol_addr = get_symbol_addr(name);
> > if (symbol_addr)
> > - return symbol_addr;
> > + return symbol_addr + info->kaslr_offset;
> >
> > vmlinux_searched = 1;
> > }
> > @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
> > * this function is called with debuginfo set to a particular
> > * kernel module and we are looking for symbol in vmlinux
> > */
> > - if (!vmlinux_searched)
> > - return get_symbol_addr(name);
> > - else
> > + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
> > + return symbol_addr + info->kaslr_offset;
> > + } else
> > return NOT_FOUND_SYMBOL;
> > }
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-09-30 0:34 ` Tao Liu
@ 2025-09-30 1:28 ` HAGIO KAZUHITO(萩尾 一仁)
2025-09-30 1:44 ` Tao Liu
0 siblings, 1 reply; 26+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-09-30 1:28 UTC (permalink / raw)
To: Tao Liu
Cc: YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
On 2025/09/30 9:34, Tao Liu wrote:
> Hi Kazu,
>
> Thanks for your review of the patch [1 & 2].
Ah, sorry, the ack was for 01/10; I will review 02/10 later.
>
> As for the left [3 - 10] patches, I will redraft and post v2 later.
I see.
Thanks,
Kazu
>
> Thanks,
> Tao Liu
>
> On Mon, Sep 29, 2025 at 7:30 PM HAGIO KAZUHITO(萩尾 一仁)
> <k-hagio-ab@nec.com> wrote:
>>
>> Hi Tao,
>>
>> On 2025/06/10 18:57, Tao Liu wrote:
>>> There is a bug of not supporting randomized kernel address, this patch fix it.
>>>
>>> Signed-off-by: Tao Liu <ltao@redhat.com>
>>
>> apologies for my long delay and thank you for the patch.
>>
>> This patch looks good to me and I would like to merge this separately
>> from the series, so
>>
>> Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>
>>
>> Thanks,
>> Kazu
>>
>>> ---
>>> erase_info.c | 8 ++++----
>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/erase_info.c b/erase_info.c
>>> index af6bfae..b67d1d0 100644
>>> --- a/erase_info.c
>>> +++ b/erase_info.c
>>> @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
>>> if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
>>> symbol_addr = get_symbol_addr(name);
>>> if (symbol_addr)
>>> - return symbol_addr;
>>> + return symbol_addr + info->kaslr_offset;
>>>
>>> vmlinux_searched = 1;
>>> }
>>> @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
>>> * this function is called with debuginfo set to a particular
>>> * kernel module and we are looking for symbol in vmlinux
>>> */
>>> - if (!vmlinux_searched)
>>> - return get_symbol_addr(name);
>>> - else
>>> + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
>>> + return symbol_addr + info->kaslr_offset;
>>> + } else
>>> return NOT_FOUND_SYMBOL;
>>> }
>>>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization
2025-09-30 1:28 ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-09-30 1:44 ` Tao Liu
0 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-09-30 1:44 UTC (permalink / raw)
To: HAGIO KAZUHITO(萩尾 一仁)
Cc: YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
On Tue, Sep 30, 2025 at 2:29 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab@nec.com> wrote:
>
> On 2025/09/30 9:34, Tao Liu wrote:
> > Hi Kazu,
> >
> > Thanks for your review of the patch [1 & 2].
>
> ah sorry, the ack was for 01/10, I will review 02/10 later.
Oh I see, no worries, please take your time :)
Thanks,
Tao Liu
>
> >
> > As for the left [3 - 10] patches, I will redraft and post v2 later.
>
> I see.
>
> Thanks,
> Kazu
>
>
> >
> > Thanks,
> > Tao Liu
> >
> > On Mon, Sep 29, 2025 at 7:30 PM HAGIO KAZUHITO(萩尾 一仁)
> > <k-hagio-ab@nec.com> wrote:
> >>
> >> Hi Tao,
> >>
> >> On 2025/06/10 18:57, Tao Liu wrote:
> >>> There is a bug of not supporting randomized kernel address, this patch fix it.
> >>>
> >>> Signed-off-by: Tao Liu <ltao@redhat.com>
> >>
> >> apologies for my long delay and thank you for the patch.
> >>
> >> This patch looks good to me and I would like to merge this separately
> >> from the series, so
> >>
> >> Acked-by: Kazuhito Hagio <k-hagio-ab@nec.com>
> >>
> >> Thanks,
> >> Kazu
> >>
> >>> ---
> >>> erase_info.c | 8 ++++----
> >>> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/erase_info.c b/erase_info.c
> >>> index af6bfae..b67d1d0 100644
> >>> --- a/erase_info.c
> >>> +++ b/erase_info.c
> >>> @@ -1881,7 +1881,7 @@ get_symbol_addr_all(char *name) {
> >>> if (!strcmp(get_dwarf_module_name(), "vmlinux")) {
> >>> symbol_addr = get_symbol_addr(name);
> >>> if (symbol_addr)
> >>> - return symbol_addr;
> >>> + return symbol_addr + info->kaslr_offset;
> >>>
> >>> vmlinux_searched = 1;
> >>> }
> >>> @@ -1942,9 +1942,9 @@ get_symbol_addr_all(char *name) {
> >>> * this function is called with debuginfo set to a particular
> >>> * kernel module and we are looking for symbol in vmlinux
> >>> */
> >>> - if (!vmlinux_searched)
> >>> - return get_symbol_addr(name);
> >>> - else
> >>> + if (!vmlinux_searched && !!(symbol_addr = get_symbol_addr(name))) {
> >>> + return symbol_addr + info->kaslr_offset;
> >>> + } else
> >>> return NOT_FOUND_SYMBOL;
> >>> }
> >>>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-10-03 7:22 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-10-17 4:21 ` HAGIO KAZUHITO(萩尾 一仁)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 03/10] Add page filtering function Tao Liu
` (8 subsequent siblings)
10 siblings, 2 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
An infinite recursion was noticed with Rust symbols. The root cause is
unclear to me. This patch avoids the bug by skipping the recursion for Rust
symbols, since we currently don't need to deal with them.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
dwarf_info.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/dwarf_info.c b/dwarf_info.c
index a3a2fd6..73842ab 100644
--- a/dwarf_info.c
+++ b/dwarf_info.c
@@ -837,7 +837,7 @@ search_symbol(Dwarf_Die *die, int *found)
}
static void
-search_domain(Dwarf_Die *die, int *found)
+search_domain(Dwarf_Die *die, int *found, int lang)
{
int tag;
const char *name;
@@ -859,10 +859,11 @@ search_domain(Dwarf_Die *die, int *found)
if (is_container(&die_type)) {
Dwarf_Die child;
- if (dwarf_child(&die_type, &child) != 0)
+ if (dwarf_child(&die_type, &child) != 0 ||
+ lang == DW_LANG_Rust)
continue;
- search_domain(&child, found);
+ search_domain(&child, found, lang);
if (*found)
return;
@@ -924,7 +925,7 @@ search_die(Dwarf_Die *die, int *found)
}
static void
-search_die_tree(Dwarf_Die *die, int *found)
+search_die_tree(Dwarf_Die *die, int *found, int lang)
{
Dwarf_Die child;
@@ -932,7 +933,7 @@ search_die_tree(Dwarf_Die *die, int *found)
* start by looking at the children
*/
if (dwarf_child(die, &child) == 0)
- search_die_tree(&child, found);
+ search_die_tree(&child, found, lang);
if (*found)
return;
@@ -950,7 +951,7 @@ search_die_tree(Dwarf_Die *die, int *found)
search_typedef(die, found);
else if (is_search_domain(dwarf_info.cmd))
- search_domain(die, found);
+ search_domain(die, found, lang);
else if (is_search_die(dwarf_info.cmd))
search_die(die, found);
@@ -1007,7 +1008,7 @@ get_debug_info(void)
ERRMSG("Can't get CU die.\n");
goto out;
}
- search_die_tree(&cu_die, &found);
+ search_die_tree(&cu_die, &found, dwarf_srclang(&cu_die));
if (found)
break;
off = next_off;
--
2.47.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain Tao Liu
@ 2025-10-03 7:22 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-10-05 23:25 ` Tao Liu
2025-10-17 4:21 ` HAGIO KAZUHITO(萩尾 一仁)
1 sibling, 1 reply; 26+ messages in thread
From: YAMAZAKI MASAMITSU(山崎 真光) @ 2025-10-03 7:22 UTC (permalink / raw)
To: Tao Liu, HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, devel@lists.crash-utility.osci.io
On 2025/06/10 18:57, Tao Liu wrote:
> There is an infinite recursion bug noticed in rust symbols. The root cause is
> unclear to me. This patch will avoid the bug by skip the recursion of rust
> symbols, since currently we don't need to deal with those.
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
> dwarf_info.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/dwarf_info.c b/dwarf_info.c
> index a3a2fd6..73842ab 100644
> --- a/dwarf_info.c
> +++ b/dwarf_info.c
> @@ -837,7 +837,7 @@ search_symbol(Dwarf_Die *die, int *found)
> }
>
> static void
> -search_domain(Dwarf_Die *die, int *found)
> +search_domain(Dwarf_Die *die, int *found, int lang)
> {
> int tag;
> const char *name;
> @@ -859,10 +859,11 @@ search_domain(Dwarf_Die *die, int *found)
> if (is_container(&die_type)) {
> Dwarf_Die child;
>
> - if (dwarf_child(&die_type, &child) != 0)
> + if (dwarf_child(&die_type, &child) != 0 ||
> + lang == DW_LANG_Rust)
> continue;
>
> - search_domain(&child, found);
> + search_domain(&child, found, lang);
>
> if (*found)
> return;
> @@ -924,7 +925,7 @@ search_die(Dwarf_Die *die, int *found)
> }
>
> static void
> -search_die_tree(Dwarf_Die *die, int *found)
> +search_die_tree(Dwarf_Die *die, int *found, int lang)
> {
> Dwarf_Die child;
>
> @@ -932,7 +933,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> * start by looking at the children
> */
> if (dwarf_child(die, &child) == 0)
> - search_die_tree(&child, found);
> + search_die_tree(&child, found, lang);
>
> if (*found)
> return;
> @@ -950,7 +951,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> search_typedef(die, found);
>
> else if (is_search_domain(dwarf_info.cmd))
> - search_domain(die, found);
> + search_domain(die, found, lang);
>
> else if (is_search_die(dwarf_info.cmd))
> search_die(die, found);
> @@ -1007,7 +1008,7 @@ get_debug_info(void)
> ERRMSG("Can't get CU die.\n");
> goto out;
> }
> - search_die_tree(&cu_die, &found);
> + search_die_tree(&cu_die, &found, dwarf_srclang(&cu_die));
> if (found)
> break;
> off = next_off;
Hi Liu,
This problem needs to be solved, but I don't know how to reproduce it.
Is your server running a Rust program, or is Rust code running as a
kernel module? Please tell me how to reproduce it.
Thanks,
Masa
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain
2025-10-03 7:22 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-10-05 23:25 ` Tao Liu
0 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-10-05 23:25 UTC (permalink / raw)
To: YAMAZAKI MASAMITSU(山崎 真光)
Cc: HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi YAMAZAKI,
On Fri, Oct 3, 2025 at 8:23 PM YAMAZAKI MASAMITSU(山崎 真光)
<yamazaki-msmt@nec.com> wrote:
>
> On 2025/06/10 18:57, Tao Liu wrote:
> > There is an infinite recursion bug noticed in rust symbols. The root cause is
> > unclear to me. This patch will avoid the bug by skip the recursion of rust
> > symbols, since currently we don't need to deal with those.
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> > dwarf_info.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/dwarf_info.c b/dwarf_info.c
> > index a3a2fd6..73842ab 100644
> > --- a/dwarf_info.c
> > +++ b/dwarf_info.c
> > @@ -837,7 +837,7 @@ search_symbol(Dwarf_Die *die, int *found)
> > }
> >
> > static void
> > -search_domain(Dwarf_Die *die, int *found)
> > +search_domain(Dwarf_Die *die, int *found, int lang)
> > {
> > int tag;
> > const char *name;
> > @@ -859,10 +859,11 @@ search_domain(Dwarf_Die *die, int *found)
> > if (is_container(&die_type)) {
> > Dwarf_Die child;
> >
> > - if (dwarf_child(&die_type, &child) != 0)
> > + if (dwarf_child(&die_type, &child) != 0 ||
> > + lang == DW_LANG_Rust)
> > continue;
> >
> > - search_domain(&child, found);
> > + search_domain(&child, found, lang);
> >
> > if (*found)
> > return;
> > @@ -924,7 +925,7 @@ search_die(Dwarf_Die *die, int *found)
> > }
> >
> > static void
> > -search_die_tree(Dwarf_Die *die, int *found)
> > +search_die_tree(Dwarf_Die *die, int *found, int lang)
> > {
> > Dwarf_Die child;
> >
> > @@ -932,7 +933,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> > * start by looking at the children
> > */
> > if (dwarf_child(die, &child) == 0)
> > - search_die_tree(&child, found);
> > + search_die_tree(&child, found, lang);
> >
> > if (*found)
> > return;
> > @@ -950,7 +951,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> > search_typedef(die, found);
> >
> > else if (is_search_domain(dwarf_info.cmd))
> > - search_domain(die, found);
> > + search_domain(die, found, lang);
> >
> > else if (is_search_die(dwarf_info.cmd))
> > search_die(die, found);
> > @@ -1007,7 +1008,7 @@ get_debug_info(void)
> > ERRMSG("Can't get CU die.\n");
> > goto out;
> > }
> > - search_die_tree(&cu_die, &found);
> > + search_die_tree(&cu_die, &found, dwarf_srclang(&cu_die));
> > if (found)
> > break;
> > off = next_off;
>
> Hi Liu
>
> This problem need to be solve. But I don't know how to reproduce.
> If Your server is running rust program. Or Or is it running as a
> module by rust? Please tell me how to reproduce it.
Sure.
For example, use the following eppic program, /tmp/test.c:
string test_opt(){return "";}
string test_usage(){return "";}
static void test_showusage(){printf("");}
string test_help(){return "";}
int test()
{
    struct task_struct *p;
    unsigned long offset;
    p = (struct task_struct *)&init_task;
    offset = (unsigned long)&(p->tasks) - (unsigned long)p;
    do {
        printf("%d\n", (int)(p->pid));
        /* step to the next task: subtract the member offset, not p itself */
        p = (struct task_struct *)((unsigned long)(p->tasks.next) - offset);
    } while (p != &init_task);
    return 1;
}
$ ./makedumpfile --dry-run -d 31 -l
/var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/out -x
/lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux --eppic
/tmp/test.c
Segmentation fault (core dumped)
With the patch, no segfault.
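The list walk in the eppic script relies on the usual container-of idiom: the address of the enclosing structure is the address of the embedded list node minus the offset of that member. A small standalone sketch of the same traversal in plain C follows. `struct task` is a hypothetical two-field stand-in for the kernel's task_struct, and `collect_pids()` is an illustrative helper, not an existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, shaped like the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for task_struct with only the fields used. */
struct task {
	int pid;
	struct list_head tasks;
};

/*
 * Walk the circular list starting at init, storing up to cap pids.
 * Returns the number of tasks visited.
 */
static size_t
collect_pids(struct task *init, int *pids, size_t cap)
{
	const size_t offset = offsetof(struct task, tasks);
	struct task *p = init;
	size_t n = 0;

	assert(init != NULL);
	do {
		if (n < cap)
			pids[n] = p->pid;
		n++;
		/* container-of: embedded node address minus member offset */
		p = (struct task *)((char *)p->tasks.next - offset);
	} while (p != init);
	return n;
}
```

The walk stops when it arrives back at the starting structure, which is why the list must be circular.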
The vmcore/vmlinux should contain Rust symbols (CONFIG_RUST=y). To test,
you can use the vmcore at https://people.redhat.com/~ltao/core/vmcore
together with the vmlinux from
https://kojipkgs.fedoraproject.org//packages/kernel/6.11.8/300.fc41/x86_64/kernel-debuginfo-6.11.8-300.fc41.x86_64.rpm
Thanks,
Tao Liu
>
> Thanks,
> Masa
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain Tao Liu
2025-10-03 7:22 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-10-17 4:21 ` HAGIO KAZUHITO(萩尾 一仁)
2025-10-20 3:52 ` Tao Liu
1 sibling, 1 reply; 26+ messages in thread
From: HAGIO KAZUHITO(萩尾 一仁) @ 2025-10-17 4:21 UTC (permalink / raw)
To: Tao Liu,
YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, devel@lists.crash-utility.osci.io
Hi Tao,
thank you for the fix.
On 2025/06/10 18:57, Tao Liu wrote:
> There is an infinite recursion bug noticed in rust symbols. The root cause is
> unclear to me. This patch will avoid the bug by skip the recursion of rust
> symbols, since currently we don't need to deal with those.
I confirmed that the recursive dwarf_child() calls result in a loop, and
I'm also not sure how we can avoid it correctly. I think for now there
is no demand for eppic + Rust with makedumpfile, so I can agree about
skipping Rust DIEs.
I think that you probably tried to support them as much as possible, so
the patch has the lang check in search_domain(). However, halfway support
may be confusing, i.e. a user may not be able to tell whether something
is unsupported or a bug.
So could we skip it in get_debug_info() like below and describe that the
eppic extension does not support Rust's debug information in the man page?
@@ -1007,6 +1010,12 @@ get_debug_info(void)
ERRMSG("Can't get CU die.\n");
goto out;
}
+
+ /* NOTE: currently Rust is not supported. */
+ if (dwarf_srclang(&cu_die) == DW_LANG_Rust)
+ continue;
+
search_die_tree(&cu_die, &found);
if (found)
break;
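One detail worth checking with this approach: if, as the context lines of the original patch suggest, the loop advances `off` only at its bottom (`off = next_off;`), a bare `continue` would skip that advance. The sketch below models a per-CU loop that skips Rust units and still advances. It uses an array index in place of a real DWARF offset; `struct cu`, `has_target` and `find_in_cus()` are hypothetical, and the two language codes mirror DW_LANG_Rust and DW_LANG_C11 from DWARF 5.

```c
#include <assert.h>
#include <stddef.h>

/* Language codes mirroring DW_LANG_Rust and DW_LANG_C11 (DWARF 5). */
enum {
	SKETCH_LANG_RUST = 0x001c,
	SKETCH_LANG_C11  = 0x001d
};

/* Hypothetical compile unit: its language and whether the target is in it. */
struct cu {
	int lang;
	int has_target;
};

/* Return the index of the first non-Rust CU holding the target, or -1. */
static int
find_in_cus(const struct cu *cus, size_t n)
{
	size_t off = 0;

	assert(cus != NULL || n == 0);
	while (off < n) {
		size_t next_off = off + 1;

		if (cus[off].lang == SKETCH_LANG_RUST) {
			off = next_off;	/* advance before skipping */
			continue;
		}
		if (cus[off].has_target)
			return (int)off;
		off = next_off;
	}
	return -1;
}
```

Skipping at the CU level keeps the behaviour easy to document: Rust debug information is either searched or not, with no partial support.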
Thanks,
Kazu
>
> Signed-off-by: Tao Liu <ltao@redhat.com>
> ---
> dwarf_info.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/dwarf_info.c b/dwarf_info.c
> index a3a2fd6..73842ab 100644
> --- a/dwarf_info.c
> +++ b/dwarf_info.c
> @@ -837,7 +837,7 @@ search_symbol(Dwarf_Die *die, int *found)
> }
>
> static void
> -search_domain(Dwarf_Die *die, int *found)
> +search_domain(Dwarf_Die *die, int *found, int lang)
> {
> int tag;
> const char *name;
> @@ -859,10 +859,11 @@ search_domain(Dwarf_Die *die, int *found)
> if (is_container(&die_type)) {
> Dwarf_Die child;
>
> - if (dwarf_child(&die_type, &child) != 0)
> + if (dwarf_child(&die_type, &child) != 0 ||
> + lang == DW_LANG_Rust)
> continue;
>
> - search_domain(&child, found);
> + search_domain(&child, found, lang);
>
> if (*found)
> return;
> @@ -924,7 +925,7 @@ search_die(Dwarf_Die *die, int *found)
> }
>
> static void
> -search_die_tree(Dwarf_Die *die, int *found)
> +search_die_tree(Dwarf_Die *die, int *found, int lang)
> {
> Dwarf_Die child;
>
> @@ -932,7 +933,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> * start by looking at the children
> */
> if (dwarf_child(die, &child) == 0)
> - search_die_tree(&child, found);
> + search_die_tree(&child, found, lang);
>
> if (*found)
> return;
> @@ -950,7 +951,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> search_typedef(die, found);
>
> else if (is_search_domain(dwarf_info.cmd))
> - search_domain(die, found);
> + search_domain(die, found, lang);
>
> else if (is_search_die(dwarf_info.cmd))
> search_die(die, found);
> @@ -1007,7 +1008,7 @@ get_debug_info(void)
> ERRMSG("Can't get CU die.\n");
> goto out;
> }
> - search_die_tree(&cu_die, &found);
> + search_die_tree(&cu_die, &found, dwarf_srclang(&cu_die));
> if (found)
> break;
> off = next_off;
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain
2025-10-17 4:21 ` HAGIO KAZUHITO(萩尾 一仁)
@ 2025-10-20 3:52 ` Tao Liu
0 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-10-20 3:52 UTC (permalink / raw)
To: HAGIO KAZUHITO(萩尾 一仁)
Cc: YAMAZAKI MASAMITSU(山崎 真光),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi Kazu,
On Fri, Oct 17, 2025 at 5:21 PM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab@nec.com> wrote:
>
> Hi Tao,
>
> thank you for the fix.
>
> On 2025/06/10 18:57, Tao Liu wrote:
> > There is an infinite recursion bug noticed in rust symbols. The root cause is
> > unclear to me. This patch will avoid the bug by skip the recursion of rust
> > symbols, since currently we don't need to deal with those.
>
> I confirmed that the recursive dwarf_child() calls result in a loop, and
> I'm also not sure how we can avoid it correctly. I think for now there
> is no demand for eppic + Rust with makedumpfile, so I can agree about
> skipping Rust DIEs.
>
> I think you probably tried to support them as much as possible, so
> the patch has the lang check in search_domain(). But halfway support
> may be confusing, i.e. a user may not be able to tell whether something is
> unsupported or a bug.
Agreed.
>
> So could we skip it in get_debug_info() like below and describe in the man
> page that the eppic extension does not support Rust's debug information?
Sure, thanks for your suggestion, I will update it in v2.
>
> @@ -1007,6 +1010,12 @@ get_debug_info(void)
> ERRMSG("Can't get CU die.\n");
> goto out;
> }
> +
> + /* NOTE: currently Rust is not supported. */
> + if (dwarf_srclang(&cu_die) == DW_LANG_Rust)
> + continue;
> +
The following change works for me:
/* NOTE: currently Rust is not supported. */
if (dwarf_srclang(&cu_die) == DW_LANG_Rust) {
off = next_off;
continue;
}
I'm currently testing this and will include the improved version in v2.
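For readers following along, the extra `off = next_off;` matters presumably because the CU loop in get_debug_info() only advances `off` at its tail, so a bare `continue` would revisit the same CU forever. The sketch below simulates that with plain arrays instead of libdw; `struct fake_cu`, `walk_cus()` and the `FAKE_LANG_*` values are hypothetical names made up for this illustration, not makedumpfile or elfutils code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one compilation unit header. */
struct fake_cu {
	int lang;		/* source language of the CU */
	size_t next_off;	/* offset of the following CU */
};

enum { FAKE_LANG_C = 1, FAKE_LANG_RUST = 2 };

/*
 * Walk CUs from offset 0 and count how many are searched.
 * cus[] is indexed by offset for simplicity; end_off ends the walk.
 * max_steps guards against a walk that never advances.
 * Returns the number of searched CUs, or -1 if the guard trips.
 */
static int walk_cus(const struct fake_cu *cus, size_t end_off,
		    int advance_on_skip, int max_steps)
{
	size_t off = 0;
	int searched = 0;

	assert(cus != NULL);
	while (off < end_off) {
		if (max_steps-- <= 0)
			return -1;	/* stuck on the same CU */

		if (cus[off].lang == FAKE_LANG_RUST) {
			if (advance_on_skip)
				off = cus[off].next_off;
			continue;	/* skips the update at the loop tail */
		}

		searched++;
		off = cus[off].next_off;	/* normal loop-tail update */
	}
	return searched;
}
```

With a C, Rust, C sequence of CUs, calling `walk_cus()` with `advance_on_skip` set searches both C units, while clearing it trips the step guard on the Rust unit.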
Thanks,
Tao Liu
> search_die_tree(&cu_die, &found);
> if (found)
> break;
>
> Thanks,
> Kazu
>
> >
> > Signed-off-by: Tao Liu <ltao@redhat.com>
> > ---
> > dwarf_info.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/dwarf_info.c b/dwarf_info.c
> > index a3a2fd6..73842ab 100644
> > --- a/dwarf_info.c
> > +++ b/dwarf_info.c
> > @@ -837,7 +837,7 @@ search_symbol(Dwarf_Die *die, int *found)
> > }
> >
> > static void
> > -search_domain(Dwarf_Die *die, int *found)
> > +search_domain(Dwarf_Die *die, int *found, int lang)
> > {
> > int tag;
> > const char *name;
> > @@ -859,10 +859,11 @@ search_domain(Dwarf_Die *die, int *found)
> > if (is_container(&die_type)) {
> > Dwarf_Die child;
> >
> > - if (dwarf_child(&die_type, &child) != 0)
> > + if (dwarf_child(&die_type, &child) != 0 ||
> > + lang == DW_LANG_Rust)
> > continue;
> >
> > - search_domain(&child, found);
> > + search_domain(&child, found, lang);
> >
> > if (*found)
> > return;
> > @@ -924,7 +925,7 @@ search_die(Dwarf_Die *die, int *found)
> > }
> >
> > static void
> > -search_die_tree(Dwarf_Die *die, int *found)
> > +search_die_tree(Dwarf_Die *die, int *found, int lang)
> > {
> > Dwarf_Die child;
> >
> > @@ -932,7 +933,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> > * start by looking at the children
> > */
> > if (dwarf_child(die, &child) == 0)
> > - search_die_tree(&child, found);
> > + search_die_tree(&child, found, lang);
> >
> > if (*found)
> > return;
> > @@ -950,7 +951,7 @@ search_die_tree(Dwarf_Die *die, int *found)
> > search_typedef(die, found);
> >
> > else if (is_search_domain(dwarf_info.cmd))
> > - search_domain(die, found);
> > + search_domain(die, found, lang);
> >
> > else if (is_search_die(dwarf_info.cmd))
> > search_die(die, found);
> > @@ -1007,7 +1008,7 @@ get_debug_info(void)
> > ERRMSG("Can't get CU die.\n");
> > goto out;
> > }
> > - search_die_tree(&cu_die, &found);
> > + search_die_tree(&cu_die, &found, dwarf_srclang(&cu_die));
> > if (found)
> > break;
> > off = next_off;
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH RFC][makedumpfile 03/10] Add page filtering function
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 01/10] dwarf_info: Support kernel address randomization Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 02/10] dwarf_info: Fix a infinite recursion bug for search_domain Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 04/10] Add btf/kallsyms support for symbol type/address resolving Tao Liu
` (7 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
pfn and num are the data that eppic scripts pass to makedumpfile for mm page
filtering: any page within [pfn, pfn + num) can be filtered. Since
makedumpfile iterates over pfns in ascending order, the linked list of
(pfn, num) blocks is also kept in ascending order by pfn. If a pfn hits one
block, the following pfn is likely to hit either the same block again or the
one after it, so a cur variable saves the current block to speed up the
check.
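The idea above can be tried in isolation. What follows is a minimal standalone sketch, not the patch code: `struct range`, `range_insert()` and `range_hit()` are hypothetical names, and it assumes the blocks do not overlap and that queries arrive in ascending pfn order.

```c
#include <assert.h>
#include <stdlib.h>

struct range {
	unsigned long pfn;	/* first pfn of the block */
	unsigned long num;	/* number of pages in the block */
	struct range *next;
};

/* Insert [pfn, pfn + num), keeping the list sorted by ascending pfn. */
static int range_insert(struct range **head, unsigned long pfn,
			unsigned long num)
{
	struct range **pp = head;
	struct range *n;

	assert(head != NULL);
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	while (*pp && (*pp)->pfn < pfn)
		pp = &(*pp)->next;
	n->pfn = pfn;
	n->num = num;
	n->next = *pp;
	*pp = n;
	return 0;
}

/*
 * Return 1 if pfn lies in some block. *cur remembers where the previous
 * query stopped, so ascending queries never rescan earlier blocks.
 */
static int range_hit(struct range *head, struct range **cur,
		     unsigned long pfn)
{
	if (!*cur)
		*cur = head;
	/* Skip blocks that end at or before pfn, but stay on the last one. */
	while (*cur && (*cur)->next && pfn >= (*cur)->pfn + (*cur)->num)
		*cur = (*cur)->next;
	return *cur && pfn >= (*cur)->pfn && pfn < (*cur)->pfn + (*cur)->num;
}
```

Because the cursor only moves forward, a full ascending scan costs one pass over the list in total rather than one pass per pfn.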
Signed-off-by: Tao Liu <ltao@redhat.com>
---
erase_info.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++
erase_info.h | 8 ++++++
makedumpfile.c | 18 ++++++++++---
3 files changed, 91 insertions(+), 4 deletions(-)
diff --git a/erase_info.c b/erase_info.c
index b67d1d0..d68e1a2 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -2466,3 +2466,72 @@ get_size_eraseinfo(void)
return size_eraseinfo;
}
+static struct ft_page_info *ft_head = NULL;
+
+/*
+ * Insert the ft_page_info blocks into ft_head by ascending pfn.
+ */
+int
+update_filter_pages_info(unsigned long pfn, unsigned long num)
+{
+ struct ft_page_info *p;
+ struct ft_page_info *new_p = malloc(sizeof(struct ft_page_info));
+
+ if (!new_p) {
+ ERRMSG("Can't allocate memory for ft_page_info at %lx\n", pfn);
+ return 1;
+ }
+ new_p->pfn = pfn;
+ new_p->num = num;
+ new_p->next = NULL;
+
+ if (!ft_head || ft_head->pfn > new_p->pfn) {
+ new_p->next = ft_head;
+ ft_head = new_p;
+ return 0;
+ }
+
+ p = ft_head;
+ while (p->next != NULL && p->next->pfn < new_p->pfn) {
+ p = p->next;
+ }
+
+ new_p->next = p->next;
+ p->next = new_p;
+ return 0;
+}
+
+/*
+ * Check if the pfn page should be filtered.
+ *
+ * pfn and ft_head are in ascending order, so save the current ft_page_info
+ * block into **p because it is likely to hit again next time.
+ */
+int
+filter_page(unsigned long pfn, struct ft_page_info **p)
+{
+ if (ft_head == NULL)
+ return 0;
+
+ if (*p == NULL)
+ *p = ft_head;
+
+ /* Handle the 1st gap */
+ if (pfn < ft_head->pfn)
+ return 0;
+
+ /* Handle ft_page_info blocks and following gaps */
+ while ((*p)->next) {
+ if (pfn >= (*p)->pfn && pfn < (*p)->pfn + (*p)->num)
+ return 1; // filter this page
+ if (pfn >= (*p)->pfn + (*p)->num && pfn < (*p)->next->pfn)
+ return 0; // save this page
+ *p = (*p)->next;
+ }
+
+ /* Handle the last gap */
+ if (pfn >= (*p)->pfn + (*p)->num)
+ return 0;
+ else
+ return 1;
+}
\ No newline at end of file
diff --git a/erase_info.h b/erase_info.h
index b363a40..4552dfc 100644
--- a/erase_info.h
+++ b/erase_info.h
@@ -64,6 +64,14 @@ void filter_data_buffer_parallel(unsigned char *buf, unsigned long long paddr,
size_t size, pthread_mutex_t *mutex);
unsigned long get_size_eraseinfo(void);
int update_filter_info_raw(unsigned long long, int, int);
+int update_filter_pages_info(unsigned long, unsigned long);
+
+struct ft_page_info {
+ unsigned long pfn;
+ unsigned long num;
+ struct ft_page_info *next;
+};
+int filter_page(unsigned long, struct ft_page_info **p);
#endif /* _ERASE_INFO_H */
diff --git a/makedumpfile.c b/makedumpfile.c
index 2d3b08b..33fad32 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -100,6 +100,7 @@ mdf_pfn_t pfn_user;
mdf_pfn_t pfn_free;
mdf_pfn_t pfn_hwpoison;
mdf_pfn_t pfn_offline;
+mdf_pfn_t pfn_eppic;
mdf_pfn_t pfn_elf_excluded;
mdf_pfn_t num_dumped;
@@ -6453,6 +6454,7 @@ __exclude_unnecessary_pages(unsigned long mem_map,
unsigned int order_offset, dtor_offset;
unsigned long flags, mapping, private = 0;
unsigned long compound_dtor, compound_head = 0;
+ struct ft_page_info *cur = NULL;
/*
* If a multi-page exclusion is pending, do it first
@@ -6670,6 +6672,13 @@ check_order:
else if (isOffline(flags, _mapcount)) {
pfn_counter = &pfn_offline;
}
+ /*
+ * Exclude pages specified by the eppic script
+ */
+ else if (filter_page(pfn, &cur)) {
+ nr_pages = 1;
+ pfn_counter = &pfn_eppic;
+ }
/*
* Unexcludable page
*/
@@ -8217,7 +8226,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
*/
if (info->flag_cyclic) {
pfn_zero = pfn_cache = pfn_cache_private = 0;
- pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+ pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_eppic = 0;
pfn_memhole = info->max_mapnr;
}
@@ -9555,7 +9564,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
* Reset counter for debug message.
*/
pfn_zero = pfn_cache = pfn_cache_private = 0;
- pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+ pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_eppic = 0;
pfn_memhole = info->max_mapnr;
/*
@@ -10504,7 +10513,7 @@ print_report(void)
pfn_original = info->max_mapnr - pfn_memhole;
pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
- + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+ + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_eppic;
REPORT_MSG("\n");
REPORT_MSG("Original pages : 0x%016llx\n", pfn_original);
@@ -10520,6 +10529,7 @@ print_report(void)
REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free);
REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison);
REPORT_MSG(" Offline pages : 0x%016llx\n", pfn_offline);
+ REPORT_MSG(" Eppic filtered pages : 0x%016llx\n", pfn_eppic);
REPORT_MSG(" Remaining pages : 0x%016llx\n",
pfn_original - pfn_excluded);
@@ -10560,7 +10570,7 @@ print_mem_usage(void)
pfn_original = info->max_mapnr - pfn_memhole;
pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
- + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+ + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_eppic;
shrinking = (pfn_original - pfn_excluded) * 100;
shrinking = shrinking / pfn_original;
total_size = info->page_size * pfn_original;
--
2.47.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC][makedumpfile 04/10] Add btf/kallsyms support for symbol type/address resolving
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (2 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 03/10] Add page filtering function Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 05/10] Export necessary btf/kallsyms functions to eppic extension Tao Liu
` (6 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
BTF data contains symbols' types and sizes, which are essential for determining
structure/member sizes and offsets. kallsyms contains symbols' addresses.
Together they provide complete info for symbol resolving.
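As a rough illustration of how member offsets come out of BTF, the sketch below decodes the packed `info` word of a type record and the `offset` word of a struct member, following the bit layout in the kernel's BTF format documentation. The helper names are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of btf_type.info: vlen in bits 0-15, kind in 24-28, flag in 31. */
static inline uint32_t info_vlen(uint32_t info)      { return info & 0xffff; }
static inline uint32_t info_kind(uint32_t info)      { return (info >> 24) & 0x1f; }
static inline uint32_t info_kind_flag(uint32_t info) { return info >> 31; }

/* With kind_flag set, btf_member.offset packs bitfield size and bit offset. */
static inline uint32_t member_bit_offset(uint32_t v)    { return v & 0xffffff; }
static inline uint32_t member_bitfield_size(uint32_t v) { return v >> 24; }

/* Bit position of a member inside its struct, for either encoding. */
static uint32_t member_bit_position(uint32_t info, uint32_t offset)
{
	return info_kind_flag(info) ? member_bit_offset(offset) : offset;
}
```

A byte offset is then the bit position divided by 8 for members that are not bitfields.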
To initialize, we will:
1st) read vmcore info and parse the kernel's kallsyms data;
2nd) read the kernel data range "__start_BTF".."__stop_BTF" and parse the
kernel's BTF data; now we have complete type/address info for kernel symbols;
3rd) iterate over the kernel's modules and parse each module's kallsyms data;
4th) iterate over btf_modules and parse each module's BTF data; now we have
complete type/address info for kernel and module symbols.
Also, to speed up converting a symbol name to its address/type, the parsed
btf/kallsyms data are stored in hash tables.
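The hash-table lookup mentioned above could look roughly like the sketch below. It is a generic chained hash table keyed by symbol name; the `sym_*` names, the bucket count and the djb2 hash are assumptions for illustration and are not the hash_install()/hash_index() helpers used by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SYM_HASH 512	/* bucket count, hypothetical */

struct sym_entry {
	const char *name;
	uint64_t addr;
	struct sym_entry *hash_next;	/* chain within one bucket */
};

static struct sym_entry *sym_table[SYM_HASH];

/* djb2 string hash reduced to a bucket index. */
static unsigned int sym_hash_index(const char *name)
{
	unsigned long h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % SYM_HASH;
}

/* Push the entry onto the front of its bucket chain. */
static void sym_install(struct sym_entry *en)
{
	unsigned int i;

	assert(en != NULL && en->name != NULL);
	i = sym_hash_index(en->name);
	en->hash_next = sym_table[i];
	sym_table[i] = en;
}

/* Walk one bucket chain and compare names; NULL when not found. */
static struct sym_entry *sym_lookup(const char *name)
{
	struct sym_entry *en;

	for (en = sym_table[sym_hash_index(name)]; en; en = en->hash_next)
		if (!strcmp(en->name, name))
			return en;
	return NULL;
}
```

Each lookup touches only one bucket chain, which is what makes name-to-address resolution cheap once all kallsyms entries are installed.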
Signed-off-by: Tao Liu <ltao@redhat.com>
---
Makefile | 2 +-
btf.c | 919 +++++++++++++++++++++++++++++++++++++++++++++++++
btf.h | 176 ++++++++++
kallsyms.c | 371 ++++++++++++++++++++
kallsyms.h | 42 +++
makedumpfile.c | 3 +
makedumpfile.h | 11 +
7 files changed, 1523 insertions(+), 1 deletion(-)
create mode 100644 btf.c
create mode 100644 btf.h
create mode 100644 kallsyms.c
create mode 100644 kallsyms.h
diff --git a/Makefile b/Makefile
index 18d3a17..fbc9f5b 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
endif
SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c kallsyms.c btf.c
OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
diff --git a/btf.c b/btf.c
new file mode 100644
index 0000000..c91c841
--- /dev/null
+++ b/btf.c
@@ -0,0 +1,919 @@
+#include "btf.h"
+#include <stdio.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <dirent.h>
+#include "kallsyms.h"
+#include "makedumpfile.h"
+
+// btf_files_array[0] must be kernel itself
+// btf_files_array[1..] are kernel modules
+static struct btf_file **btf_files_array = NULL;
+static int btf_files_array_index = 0;
+
+static const char *kind_names[NR_BTF_KINDS] = {
+ [BTF_KIND_UNKN] = "UNKN",
+ [BTF_KIND_INT] = "INT",
+ [BTF_KIND_PTR] = "PTR",
+ [BTF_KIND_ARRAY] = "ARRAY",
+ [BTF_KIND_STRUCT] = "STRUCT",
+ [BTF_KIND_UNION] = "UNION",
+ [BTF_KIND_ENUM] = "ENUM",
+ [BTF_KIND_FWD] = "FWD",
+ [BTF_KIND_TYPEDEF] = "TYPEDEF",
+ [BTF_KIND_VOLATILE] = "VOLATILE",
+ [BTF_KIND_CONST] = "CONST",
+ [BTF_KIND_RESTRICT] = "RESTRICT",
+ [BTF_KIND_FUNC] = "FUNC",
+ [BTF_KIND_FUNC_PROTO] = "FUNC_PROTO",
+ [BTF_KIND_VAR] = "VAR",
+ [BTF_KIND_DATASEC] = "DATASEC",
+ [BTF_KIND_FLOAT] = "FLOAT",
+ [BTF_KIND_DECL_TAG] = "DECL_TAG",
+ [BTF_KIND_TYPE_TAG] = "TYPE_TAG",
+ [BTF_KIND_ENUM64] = "ENUM64",
+};
+
+static inline uint16_t btf_vlen(uint32_t info)
+{
+ return info & 0xFFFF;
+}
+
+static inline uint32_t btf_kind_flag(uint32_t info)
+{
+ return info & (1U << 31);
+}
+
+static inline uint32_t btf_member_bit_offset(uint32_t value)
+{
+ return value & 0xffffff;
+}
+
+static inline uint32_t btf_member_bitfield_size(uint32_t value)
+{
+ return value >> 24;
+}
+
+static struct btf_type *btf_next(struct btf_type *tp)
+{
+ uintptr_t next = (uintptr_t)&tp[1];
+
+ switch (btf_kind(tp->info)) {
+ case BTF_KIND_INT:
+ next += sizeof(uint32_t);
+ break;
+
+ case BTF_KIND_ARRAY:
+ next += sizeof(struct btf_array);
+ break;
+
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ next += btf_vlen(tp->info) * sizeof(struct btf_member);
+ break;
+
+ case BTF_KIND_ENUM:
+ next += btf_vlen(tp->info) * sizeof(struct btf_enum);
+ break;
+
+ case BTF_KIND_FUNC_PROTO:
+ next += btf_vlen(tp->info) * sizeof(struct btf_param);
+ break;
+
+ case BTF_KIND_VAR:
+ next += sizeof(struct btf_var);
+ break;
+
+ case BTF_KIND_DATASEC:
+ next += btf_vlen(tp->info) * sizeof(struct btf_var_secinfo);
+ break;
+
+ case BTF_KIND_DECL_TAG:
+ next += sizeof(struct btf_decl_tag);
+ break;
+
+ case BTF_KIND_ENUM64:
+ next += btf_vlen(tp->info) * sizeof(struct btf_enum64);
+ break;
+
+ case BTF_KIND_PTR:
+ case BTF_KIND_FWD:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_FUNC:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_TYPE_TAG:
+ break; // no extra data
+
+ default:
+ __builtin_unreachable();
+ }
+ return (struct btf_type *)next;
+}
+
+#define NAME_HASH 512
+static struct name_entry *name_hash_table[NAME_HASH] = {0};
+
+static void *get_name_entry_next(void *entry)
+{
+ return ((struct name_entry *)entry)->name_hash_next;
+}
+
+static void set_name_entry_next(void *entry, void *next)
+{
+ ((struct name_entry *)entry)->name_hash_next = next;
+}
+
+static void name_hash_install(struct name_entry *en)
+{
+ hash_install((void **)name_hash_table, NAME_HASH, en, en->name,
+ get_name_entry_next, set_name_entry_next);
+}
+
+static unsigned int name_hash_index(char *name)
+{
+ return hash_index(name, NAME_HASH);
+}
+
+static int read_file_at_offset(char *f, int f_off, int r_size, void *buf)
+{
+ int got = 0;
+ int r, fd, ret = -1;
+
+ fd = open(f, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, "%s: Failed to open file %s\n", __func__, f);
+ goto out;
+ }
+
+ while (got < r_size) {
+ r = pread(fd, buf + got, r_size - got, f_off + got);
+ if (r <= 0) {
+ fprintf(stderr, "%s: Failed to read file %s\n", __func__, f);
+ goto clean_fd;
+ }
+ got += r;
+ }
+ ret = got;
+
+clean_fd:
+ close(fd);
+out:
+ return ret;
+}
+
+static char *get_str_by_name_off(struct btf_file *bf, uint32_t name_off)
+{
+ struct btf_file *bf_vmlinux = btf_files_array[0];
+
+ if (bf != bf_vmlinux && name_off >= bf_vmlinux->str_cache_len)
+ return bf->str_cache + name_off - bf_vmlinux->str_cache_len;
+ else
+ return bf_vmlinux->str_cache + name_off;
+}
+
+int get_btf_type_by_type_id(struct btf_file *bf_in, uint32_t id_in,
+ struct btf_type *bt_out, struct name_entry **en_out)
+{
+ struct name_entry *en;
+ struct btf_file *bf_vmlinux = btf_files_array[0];
+
+ if (bf_in != bf_vmlinux && id_in > bf_vmlinux->array_len) {
+ en = bf_in->array[id_in - bf_vmlinux->array_len - 1];
+ } else {
+ en = bf_vmlinux->array[id_in - 1];
+ }
+ if (en_out)
+ *en_out = en;
+ return read_file_at_offset(en->bf->file_name,
+ en->btf_type_offset + en->bf->types_data_offset,
+ sizeof(*bt_out), bt_out);
+}
+
+/*
+* Parse the btf data and install elements into hashtable and array for quick
+* reference.
+*/
+static int parse_btf_data(char *file_path, char *data_start, uint32_t data_len)
+{
+ struct btf_header *hdr = (struct btf_header *)data_start;
+ struct btf_file *bf = NULL;
+ void *type_start;
+ char *str_start;
+ struct btf_type *tp;
+ struct name_entry *en;
+ int i, j, scale;
+
+ /* We do some check first */
+ if (hdr->magic != BTF_MAGIC) {
+ fprintf(stderr, "%s: Invalid BTF magic in %s\n",
+ __func__, file_path);
+ goto out;
+ }
+ if (hdr->hdr_len != sizeof(*hdr)) {
+ fprintf(stderr, "%s: Invalid BTF header length in %s\n",
+ __func__, file_path);
+ goto out;
+ }
+ if (hdr->hdr_len + hdr->str_off + hdr->str_len > data_len) {
+ fprintf(stderr, "%s: String section exceeds data length in %s\n",
+ __func__, file_path);
+ goto out;
+ }
+ if (hdr->hdr_len + hdr->type_off + hdr->type_len > data_len) {
+ fprintf(stderr, "%s: Type section exceeds data length in %s\n",
+ __func__, file_path);
+ goto out;
+ }
+
+ /* Let's start parsing */
+ bf = (struct btf_file *)malloc(sizeof(*bf));
+ if (!bf)
+ goto no_mem;
+ memset(bf, 0, sizeof(*bf));
+
+ /* Enlarge array when reach power of 2 */
+ btf_files_array[btf_files_array_index++] = bf;
+ if ((btf_files_array_index & (btf_files_array_index - 1)) == 0) {
+ struct btf_file **tmp = (struct btf_file **)reallocarray(btf_files_array,
+ btf_files_array_index << 1,
+ sizeof(struct btf_file *));
+ if (!tmp)
+ goto no_mem;
+ btf_files_array = tmp;
+ }
+
+ type_start = data_start + hdr->hdr_len + hdr->type_off;
+ str_start = data_start + hdr->hdr_len + hdr->str_off;
+
+ bf->str_cache = malloc(hdr->str_len);
+ if (!bf->str_cache)
+ goto no_mem;
+ memcpy(bf->str_cache, str_start, hdr->str_len);
+ bf->str_cache_len = hdr->str_len;
+
+ bf->array_len = 64;
+ bf->array = (struct name_entry **)calloc(bf->array_len,
+ sizeof(struct name_entry *));
+ if (!bf->array)
+ goto no_mem;
+
+ bf->file_name = strdup(file_path);
+ if (!bf->file_name)
+ goto no_mem;
+ bf->types_data_offset = (char *)type_start - data_start;
+
+ for (tp = (struct btf_type *)type_start, i = 0;
+ (void *)tp < type_start + hdr->type_len;
+ tp = btf_next(tp), i++) {
+ en = (struct name_entry *)malloc(sizeof(struct name_entry));
+ if (!en) {
+ bf->array_len = i;
+ goto no_mem;
+ }
+ memset(en, 0, sizeof(struct name_entry));
+
+ en->btf_type_offset = (void *)tp - type_start;
+ en->id = i + 1;
+ en->bf = bf;
+ if (tp->name_off) {
+ en->name = get_str_by_name_off(bf, tp->name_off);
+ if (en->name[0])
+ name_hash_install(en);
+ }
+ bf->array[i] = en;
+
+ /* Now deal with sub elements which also have a name */
+ if (btf_kind(tp->info) == BTF_KIND_ENUM ||
+ btf_kind(tp->info) == BTF_KIND_ENUM64) {
+ scale = btf_kind(tp->info) == BTF_KIND_ENUM ?
+ sizeof(struct btf_enum) : sizeof(struct btf_enum64);
+ for (j = 0; j < btf_vlen(tp->info); j++) {
+ en = (struct name_entry *)malloc(sizeof(struct name_entry));
+ if (!en) {
+ bf->array_len = i + 1;
+ goto no_mem;
+ }
+ memset(en, 0, sizeof(struct name_entry));
+ en->id = 0;
+ en->p_id = i + 1;
+ en->p_id += btf_files_array_index == 1 ?
+ 0 : btf_files_array[0]->array_len;
+ en->bf = bf;
+ en->name = get_str_by_name_off(bf,
+ *(uint32_t *)((char *)&tp[1] + j * scale));
+ // printf("%s\n", en->name);
+ name_hash_install(en);
+ }
+ }
+
+ /* Enlarge array when over 3/4 */
+ if (i > (bf->array_len >> 2) * 3) {
+ struct name_entry **tmp = (struct name_entry **)reallocarray(bf->array,
+ bf->array_len << 1, sizeof(struct name_entry *));
+ if (!tmp)
+ goto no_mem;
+ bf->array = tmp;
+ bf->array_len <<= 1;
+ }
+ }
+ bf->array_len = i;
+ return 0;
+no_mem:
+ /* All allocated memory will be freed later. */
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+ return -1;
+}
+
+/*
+ * Used by search_name_by_cond() for searching specific type:
+ * such as struct/union/typedef
+ */
+static bool is_type(struct name_entry *sp, void *data_in, void *data_out)
+{
+ struct btf_type *bt = (struct btf_type *)data_out;
+ int type = *(int *)data_in;
+ if (sp->id) {
+ read_file_at_offset(sp->bf->file_name,
+ sp->btf_type_offset + sp->bf->types_data_offset,
+ sizeof(*bt), bt);
+ return btf_kind(bt->info) == type;
+ }
+ return false;
+}
+
+static struct name_entry *search_name_by_cond(char *name,
+ bool (*fn)(struct name_entry *, void *, void *),
+ void *data_in, void *data_out)
+{
+ unsigned int index;
+ struct name_entry *sp;
+
+ index = name_hash_index(name);
+ for (sp = name_hash_table[index]; sp; sp = sp->name_hash_next) {
+ if (!strcmp(name, sp->name) && fn(sp, data_in, data_out)) {
+ return sp;
+ }
+ }
+ return sp;
+}
+
+// caller should prepare bt
+void resolve_typedef(struct name_entry *en_in, struct name_entry **en_out,
+ struct btf_type *bt)
+{
+ uint32_t id;
+ struct name_entry *en;
+
+ read_file_at_offset(en_in->bf->file_name,
+ en_in->btf_type_offset + en_in->bf->types_data_offset,
+ sizeof(*bt), bt);
+ if (btf_kind(bt->info) == BTF_KIND_TYPEDEF) {
+ id = bt->type;
+ get_btf_type_by_type_id(en_in->bf, id, bt, &en);
+ return resolve_typedef(en, en_out, bt);
+ } else {
+ *en_out = en_in;
+ }
+}
+
+struct cond_args {
+ int index;
+ char *name;
+};
+
+static bool cond(struct cond_args *c1, struct cond_args *c2)
+{
+ if (c1->name && c2->name)
+ /* Check if the member name is what we want */
+ return !strcmp(c1->name, c2->name);
+ else
+ /* Check if the member index is what we want */
+ return c1->index == c2->index;
+}
+
+static bool get_internal_member_info(struct name_entry *en, int base_position,
+ int *global_index, struct cond_args *tar_arg,
+ struct member_info *mi, bool initial_dive)
+{
+ struct btf_type bt, sub_bt;
+ int member_num;
+ char *member_array_buf = NULL;
+ int i;
+ struct btf_member *bm;
+ struct btf_array *ba;
+ int bm_position;
+ struct name_entry *sub_en;
+ bool res;
+ struct cond_args cur_arg = {0};
+
+ read_file_at_offset(en->bf->file_name,
+ en->btf_type_offset + en->bf->types_data_offset,
+ sizeof(bt), &bt); // this struct
+ if (initial_dive ||
+ ((!en->name || en->name[0] == '\0') &&
+ ((btf_kind(bt.info) == BTF_KIND_STRUCT) ||
+ (btf_kind(bt.info) == BTF_KIND_UNION)))) {
+ /* anonymous struct/union, dive into */
+ member_num = btf_vlen(bt.info);
+ member_array_buf = calloc(member_num, sizeof(struct btf_member));
+ if (!member_array_buf) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ return false;
+ }
+ read_file_at_offset(en->bf->file_name,
+ en->btf_type_offset + en->bf->types_data_offset + sizeof(bt),
+ member_num *sizeof(struct btf_member), member_array_buf);
+ for (i = 0, bm = (struct btf_member *)member_array_buf;
+ i < member_num; i++) {
+ if (btf_kind_flag(bt.info)) {
+ bm_position = base_position +
+ btf_member_bit_offset(bm[i].offset);
+ mi->bits = btf_member_bitfield_size(bm[i].offset);
+ } else {
+ bm_position = base_position +
+ bm[i].offset;
+ }
+ mi->sname = get_str_by_name_off(en->bf, bm[i].name_off);
+ get_btf_type_by_type_id(en->bf, bm[i].type,
+ &sub_bt, &sub_en);
+ // Dive into this member
+ res = get_internal_member_info(sub_en, bm_position, global_index,
+ tar_arg, mi, false);
+ if (res) {
+ free(member_array_buf);
+ return res;
+ }
+ }
+ free(member_array_buf);
+ return false;
+ }
+
+ cur_arg.index = *global_index;
+ cur_arg.name = mi->sname;
+ if (cond(&cur_arg, tar_arg)) {
+ mi->bit_pos = base_position;
+ resolve_typedef(en, &sub_en, &bt);
+ mi->uniq_id = id_to_uniq_id(sub_en->id, sub_en->bf);
+ if (btf_kind(bt.info) == BTF_KIND_PTR) {
+ /*
+ * BUG? No pointer size in btf,
+ * 32bit btf target on 64bit machine
+ */
+ mi->size = sizeof(void *);
+ mi->type = "char *";
+ } else if (btf_kind(bt.info) == BTF_KIND_ARRAY) {
+ en = sub_en;
+ ba = (struct btf_array *)malloc(sizeof(struct btf_array));
+ if (!ba) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ return false;
+ }
+ read_file_at_offset(en->bf->file_name,
+ en->btf_type_offset + en->bf->types_data_offset + sizeof(bt),
+ sizeof(struct btf_array), ba);
+ mi->size = ba->nelems; // array elements
+ get_btf_type_by_type_id(en->bf, ba->type, &sub_bt, &sub_en);
+ resolve_typedef(sub_en, &en, &bt);
+ free(ba);
+ mi->size *= bt.size; // element size
+ mi->type = NULL; // "char [64]? Leave it NULL for now"
+ } else {
+ mi->size = bt.size;
+ mi->type = get_str_by_name_off(sub_en->bf, bt.name_off);
+ }
+ return true;
+ } else {
+ (*global_index)++;
+ return false;
+ }
+}
+
+static bool get_member_info_by_index(struct name_entry *en, int target_index,
+ struct member_info *mi)
+{
+ int g_index = 1;
+ struct cond_args args = {0};
+ args.index = target_index;
+ memset(mi, 0, sizeof(*mi));
+ return get_internal_member_info(en, 0, &g_index, &args, mi, true);
+}
+
+static bool get_member_info_by_member_name(struct name_entry *en, char *member_name,
+ struct member_info *mi)
+{
+ int g_index = 1;
+ struct cond_args args = {0};
+ args.name = member_name;
+ memset(mi, 0, sizeof(*mi));
+ return get_internal_member_info(en, 0, &g_index, &args, mi, true);
+}
+
+/*
+ * Entry for query struct members
+ * Return: struct size, mi_out: member details
+ */
+uint32_t get_struct_member_by_name(char *struct_name, char *member_name,
+ struct member_info *mi_out)
+{
+ struct name_entry *en;
+ struct btf_type bt = {0};
+ int type = BTF_KIND_STRUCT;
+
+ en = search_name_by_cond(struct_name, is_type, (void *)&type, (void *)&bt);
+ if (!en)
+ goto out;
+ if (get_member_info_by_member_name(en, member_name, mi_out))
+ return bt.size;
+out:
+ return 0;
+}
+
+/*
+ * Entry for query type members, similar as above
+ * Return: found-the-type, mi_out: member details
+ */
+static uint32_t uniq_id_to_id(uint32_t, struct btf_file **);
+bool get_type_member_by_index(uint64_t type_uniq_id, int member_index,
+ struct member_info *mi_out)
+{
+ uint32_t id;
+ struct btf_file *bf;
+
+ id = uniq_id_to_id((uint32_t)type_uniq_id, &bf);
+ return get_member_info_by_index(bf->array[id - 1], member_index, mi_out);
+}
+
+uint32_t id_to_uniq_id(uint32_t id, struct btf_file *bf)
+{
+ int i;
+ uint32_t uniq_id = 0;
+ for (i = 0; i < btf_files_array_index; i++) {
+ if (btf_files_array[i] != bf)
+ uniq_id += btf_files_array[i]->array_len;
+ else
+ return id + uniq_id;
+ }
+ __builtin_unreachable();
+ assert(false);
+}
+
+static uint32_t uniq_id_to_id(uint32_t uid, struct btf_file **bf)
+{
+ int i;
+
+ for (i = 0; i < btf_files_array_index; i++) {
+ if (uid > btf_files_array[i]->array_len) {
+ uid -= btf_files_array[i]->array_len;
+ } else {
+ if (bf)
+ *bf = btf_files_array[i];
+ return uid;
+ }
+
+ }
+ __builtin_unreachable();
+ assert(false);
+}
+
+// api for eppic
+uint32_t get_type_size_by_name(char *type_name, int type, uint32_t *uniq_id)
+{
+ struct name_entry *en;
+ struct btf_type bt = {0};
+
+ en = search_name_by_cond(type_name, is_type, (void *)&type, (void *)&bt);
+ if (!en)
+ goto out;
+ if (uniq_id)
+ *uniq_id = id_to_uniq_id(en->id, en->bf);
+ return bt.size;
+out:
+ return 0;
+}
+
+// Caller should prepare bt_out
+struct name_entry *get_en_by_uniq_id(uint32_t uniq_id, struct btf_type *bt_out)
+{
+ uint32_t id;
+ struct btf_file *bf;
+ struct name_entry *en;
+ struct name_entry *en_sub;
+
+ id = uniq_id_to_id((uint32_t)uniq_id, &bf);
+ en = bf->array[id - 1];
+ resolve_typedef(en, &en_sub, bt_out);
+ return en_sub;
+}
+
+/* Deal with btf file */
+static int btf_file_init(int fd)
+{
+ struct stat f_stat;
+ char *buf = NULL;
+ char proc_path[32];
+ char real_path[512];
+ int ret = -1;
+
+ memset(real_path, 0, sizeof(real_path));
+ if (fstat(fd, &f_stat) < 0) {
+ fprintf(stderr, "%s: fstat fail!\n", __func__);
+ goto out;
+ }
+ buf = malloc(f_stat.st_size);
+ if (!buf) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ goto out;
+ }
+
+ snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
+ readlink(proc_path, real_path, sizeof(real_path) - 1);
+ if (read_file_at_offset(real_path, 0, f_stat.st_size, buf) < 0)
+ goto out;
+
+ ret = parse_btf_data(real_path, buf, f_stat.st_size);
+out:
+ if (buf)
+ free(buf);
+ return ret;
+}
+
+/* Deal with elf file which contains .BTF section */
+static int elf_file_init(int fd)
+{
+ char proc_path[32];
+ char real_path[512];
+ Elf *elf = NULL;
+ Elf_Scn *scn = NULL;
+ size_t shstrndx;
+ GElf_Shdr shdr;
+ Elf_Data *data = NULL;
+ char *str, *databuf = NULL;
+ int ret = -1;
+
+ snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
+ readlink(proc_path, real_path, sizeof(real_path) - 1);
+
+ if (elf_version(EV_CURRENT) == EV_NONE)
+ goto elf_err;
+ elf = elf_begin(fd, ELF_C_READ, NULL);
+ if (!elf || elf_getshdrstrndx(elf, &shstrndx))
+ goto elf_err;
+ while ((scn = elf_nextscn(elf, scn)) != NULL) {
+ if (!gelf_getshdr(scn, &shdr))
+ continue;
+ str = elf_strptr(elf, shstrndx, shdr.sh_name);
+ if (!strcmp(str, ".BTF"))
+ break;
+ }
+ if (!scn) {
+ fprintf(stderr, "%s: No .BTF found in %s!\n", __func__, real_path);
+ goto out;
+ }
+
+ data = elf_rawdata(scn, data);
+ databuf = malloc(data->d_size);
+ if (!databuf) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ goto out;
+ }
+ memcpy(databuf, data->d_buf, data->d_size);
+ elf_end(elf);
+ elf = NULL;
+ ret = parse_btf_data(real_path, databuf, data->d_size);
+ goto out;
+
+elf_err:
+ fprintf(stderr, "%s: elf error in %s!\n", __func__, real_path);
+out:
+ if (databuf)
+ free(databuf);
+ if (elf)
+ elf_end(elf);
+ return ret;
+}
+
+/*
+ * Entry for parse btf file and elf file.
+ */
+static int file_init(char *name)
+{
+ int fd = 0, ret = -1;
+ char buf[4] = {0};
+
+ /* Will be enlarged automatically */
+ if (!btf_files_array) {
+ btf_files_array = calloc(1, sizeof(struct btf_file *));
+ if (!btf_files_array) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ goto out;
+ }
+ }
+
+ fd = open(name, O_RDONLY);
+ if (read_file_at_offset(name, 0, sizeof(buf), buf) < 0)
+ goto out;
+ if (*(u_int16_t *)&buf == BTF_MAGIC) {
+ ret = btf_file_init(fd);
+ } else if (buf[0] == ELFMAG0 && /* byte-wise, so endian-independent */
+ buf[1] == ELFMAG1 &&
+ buf[2] == ELFMAG2 &&
+ buf[3] == ELFMAG3) {
+ ret = elf_file_init(fd);
+ } else {
+ fprintf(stderr, "%s: %s is neither BTF nor ELF!\n", __func__, name);
+ }
+
+out:
+ if (fd)
+ close(fd);
+ return ret;
+}
+
+/* For pure buffer data, write it to a temporary file and hand it over to file_init() */
+char temp[] = "/tmp/btf_XXXXXX";
+static int init_btf_from_buf(char *mod_name, char *buf, int size)
+{
+ int fd;
+ char name_buf[64];
+ int w, got = 0;
+ static char *temp_dir = NULL;
+
+ if (!temp_dir)
+ temp_dir = mkdtemp(temp);
+ snprintf(name_buf, sizeof(name_buf), "%s/%s", temp_dir, mod_name);
+ fd = open(name_buf, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: open fail in %s!\n", __func__, name_buf);
+ return -1;
+ }
+
+ while (got < size) {
+ w = pwrite(fd, buf + got, size - got, got);
+ if (w < 0) {
+ fprintf(stderr, "%s: pwrite fail in %s!\n", __func__, name_buf);
+ return -1;
+ }
+ got += w;
+ }
+ close(fd);
+ return file_init(name_buf);
+}
+
+int init_kernel_btf(void)
+{
+ uint64_t size;
+ char *buf;
+ int ret;
+
+ uint64_t start_btf = get_kallsyms_value_by_name("__start_BTF");
+ uint64_t stop_btf = get_kallsyms_value_by_name("__stop_BTF");
+ if (!start_btf || !stop_btf) {
+ fprintf(stderr, "%s: symbol __start/stop_BTF not found!\n", __func__);
+ return -1;
+ }
+
+ size = stop_btf - start_btf;
+ buf = (char *)malloc(size);
+ if (!buf) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ return -1;
+ }
+ readmem(VADDR, start_btf, buf, size);
+ ret = init_btf_from_buf("vmlinux", buf, size);
+ free(buf);
+
+ return ret;
+}
+
+int init_module_btf(void)
+{
+ uint64_t btf_modules, list;
+ struct member_info mi;
+ uint64_t btf = 0, data = 0, module = 0;
+ int data_size = 0;
+ char *btf_buf = NULL;
+ char *modname = NULL;
+
+ btf_modules = get_kallsyms_value_by_name("btf_modules");
+ if (!btf_modules)
+ /* Maybe module is not enabled, this is not an error */
+ return 0;
+
+ INIT_MEMBER_OFF_SIZE(btf_module, list);
+ INIT_MEMBER_OFF_SIZE(btf_module, btf);
+ INIT_MEMBER_OFF_SIZE(btf_module, module);
+ INIT_MEMBER_OFF_SIZE(module, name);
+ INIT_MEMBER_OFF_SIZE(btf, data);
+ INIT_MEMBER_OFF_SIZE(btf, data_size);
+ modname = (char *)malloc(GET_MEMBER_SIZE(module, name));
+ if (!modname) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ goto out;
+ }
+
+ for (list = next_list(btf_modules); list != btf_modules; list = next_list(list)) {
+ readmem(VADDR, list - GET_MEMBER_OFF(btf_module, list) +
+ GET_MEMBER_OFF(btf_module, btf),
+ &btf, GET_MEMBER_SIZE(btf_module, btf));
+ readmem(VADDR, list - GET_MEMBER_OFF(btf_module, list) +
+ GET_MEMBER_OFF(btf_module, module),
+ &module, GET_MEMBER_SIZE(btf_module, module));
+ readmem(VADDR, module + GET_MEMBER_OFF(module, name),
+ modname, GET_MEMBER_SIZE(module, name));
+ readmem(VADDR, btf + GET_MEMBER_OFF(btf, data),
+ &data, GET_MEMBER_SIZE(btf, data));
+ readmem(VADDR, btf + GET_MEMBER_OFF(btf, data_size),
+ &data_size, GET_MEMBER_SIZE(btf, data_size));
+ btf_buf = (char *)malloc(data_size);
+ if (!btf_buf) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ goto out;
+ }
+ readmem(VADDR, data, btf_buf, data_size);
+ if (init_btf_from_buf(modname, btf_buf, data_size))
+ goto out;
+ free(btf_buf);
+ }
+ return 0;
+out:
+ if (modname)
+ free(modname);
+ if (btf_buf)
+ free(btf_buf);
+ return -1;
+}
+
+void cleanup_btf(void)
+{
+ int bf_index, i;
+ struct btf_file *bf;
+ struct name_entry *en, *en_tmp;
+ DIR *temp_dir;
+ struct dirent *entry;
+ char path[512];
+
+ /* Free no-hashtable-installed elements */
+ for (bf_index = 0; bf_index < btf_files_array_index; bf_index++) {
+ bf = btf_files_array[bf_index];
+ if (bf && bf->array) {
+ for (i = 0; i < bf->array_len; i++) {
+ if (!bf->array[i]->name || !bf->array[i]->name[0])
+ free(bf->array[i]);
+ }
+ }
+ }
+
+ /* Free the hashtable-installed elements */
+ for (i = 0; i < NAME_HASH; i++) {
+ for (en = name_hash_table[i]; en;) {
+ en_tmp = en;
+ en = en->name_hash_next;
+ free(en_tmp);
+ }
+ }
+
+ /* Cleanup anything else */
+ for (bf_index = 0; bf_index < btf_files_array_index; bf_index++) {
+ bf = btf_files_array[bf_index];
+ if (bf && bf->array) {
+ free(bf->array);
+ }
+ if (bf && bf->file_name)
+ free(bf->file_name);
+ if (bf && bf->str_cache)
+ free(bf->str_cache);
+ if (bf)
+ free(bf);
+ }
+
+ if (btf_files_array)
+ free(btf_files_array);
+
+ /* Cleanup the dir /tmp/btf_XXXXXX */
+ temp_dir = opendir(temp);
+ if (!temp_dir)
+ return;
+ while ((entry = readdir(temp_dir)) != NULL) {
+ if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
+ continue;
+ snprintf(path, sizeof(path), "%s/%s", temp, entry->d_name);
+ if (remove(path)) {
+ fprintf(stderr, "%s: fail del %s!\n", __func__, path);
+ /* Stop at the first failure to avoid flooding the output */
+ break;
+ }
+ }
+ closedir(temp_dir);
+ rmdir(temp);
+}
diff --git a/btf.h b/btf.h
new file mode 100644
index 0000000..a0d13e6
--- /dev/null
+++ b/btf.h
@@ -0,0 +1,176 @@
+#ifndef _BTF_H
+#define _BTF_H
+#include <stdint.h>
+#include <stdbool.h>
+
+typedef uint8_t __u8;
+typedef uint16_t __u16;
+typedef uint32_t __u32;
+typedef int32_t __s32;
+
+#define BTF_MAGIC 0xeb9f
+
+enum {
+ BTF_KIND_UNKN = 0, /* Unknown */
+ BTF_KIND_INT = 1, /* Integer */
+ BTF_KIND_PTR = 2, /* Pointer */
+ BTF_KIND_ARRAY = 3, /* Array */
+ BTF_KIND_STRUCT = 4, /* Struct */
+ BTF_KIND_UNION = 5, /* Union */
+ BTF_KIND_ENUM = 6, /* Enumeration */
+ BTF_KIND_FWD = 7, /* Forward */
+ BTF_KIND_TYPEDEF = 8, /* Typedef */
+ BTF_KIND_VOLATILE = 9, /* Volatile */
+ BTF_KIND_CONST = 10, /* Const */
+ BTF_KIND_RESTRICT = 11, /* Restrict */
+ BTF_KIND_FUNC = 12, /* Function */
+ BTF_KIND_FUNC_PROTO = 13, /* Function Proto */
+ BTF_KIND_VAR = 14, /* Variable */
+ BTF_KIND_DATASEC = 15, /* Section */
+ BTF_KIND_FLOAT = 16, /* Floating point */
+ BTF_KIND_DECL_TAG = 17, /* Decl Tag */
+ BTF_KIND_TYPE_TAG = 18, /* Type Tag */
+ BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
+
+ NR_BTF_KINDS,
+ BTF_KIND_MAX = NR_BTF_KINDS - 1,
+};
+
+struct btf_type {
+ __u32 name_off;
+
+ /* "info" bits arrangement
+ * bits 0-15: vlen (e.g. # of struct's members)
+ * bits 16-23: unused
+ * bits 24-27: kind (e.g. int, ptr, array...etc)
+ * bits 28-30: unused
+ * bit 31: kind_flag, currently used by
+ * struct, union and fwd
+ */
+ __u32 info;
+
+ /* "size" is used by INT, ENUM, STRUCT, UNION and DATASEC.
+ * "size" tells the size of the type it is describing.
+ *
+ * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
+ * FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
+ * "type" is a type_id referring to another type.
+ */
+ union {
+ __u32 size;
+ __u32 type;
+ };
+};
+
+struct btf_header {
+ __u16 magic;
+ __u8 version;
+ __u8 flags;
+ __u32 hdr_len;
+
+ /* All offsets are in bytes relative to the end of this header */
+ __u32 type_off; /* offset of type section */
+ __u32 type_len; /* length of type section */
+ __u32 str_off; /* offset of string section */
+ __u32 str_len; /* length of string section */
+};
+
+struct btf_array {
+ __u32 type;
+ __u32 index_type;
+ __u32 nelems;
+};
+
+struct btf_member {
+ __u32 name_off;
+ __u32 type;
+ /* If the type info kind_flag is set, the btf_member offset
+ * contains both member bitfield size and bit offset. The
+ * bitfield size is set for bitfield members. If the type
+ * info kind_flag is not set, the offset contains only bit
+ * offset.
+ */
+ __u32 offset;
+};
+
+struct btf_enum {
+ __u32 name_off;
+ __s32 val;
+};
+
+struct btf_enum64 {
+ __u32 name_off;
+ __u32 val_lo32;
+ __u32 val_hi32;
+};
+
+struct btf_param {
+ __u32 name_off;
+ __u32 type;
+};
+
+struct btf_var {
+ __u32 linkage;
+};
+
+struct btf_var_secinfo {
+ __u32 type;
+ __u32 offset;
+ __u32 size;
+};
+
+struct btf_decl_tag {
+ __s32 component_idx;
+};
+
+/*************************************/
+struct btf_file {
+ char *file_name;
+ char *str_cache;
+ uint32_t str_cache_len;
+ uint32_t types_data_offset;
+ uint32_t array_len;
+ struct name_entry **array;
+};
+
+struct name_entry {
+ union {
+ uint32_t btf_type_offset;
+ uint32_t p_id;
+ };
+
+ uint32_t id;
+ char *name;
+ struct btf_file *bf;
+ struct name_entry *name_hash_next;
+};
+
+struct member_info {
+ char *sname; // member name
+ char *type; // member type: int, long etc
+ uint32_t bit_pos; // member position in bits
+ uint32_t bits; // member width in bits
+ uint32_t size; // member size in bytes
+ int uniq_id; // uniq_id of btf
+};
+
+/*************************************/
+
+int init_kernel_btf(void);
+uint32_t get_struct_member_by_name(char *, char *, struct member_info *);
+bool get_type_member_by_index(uint64_t, int, struct member_info *);
+uint32_t get_type_size_by_name(char *, int, uint32_t *);
+struct name_entry *get_en_by_uniq_id(uint32_t, struct btf_type *);
+void resolve_typedef(struct name_entry *, struct name_entry **, struct btf_type *);
+int get_btf_type_by_type_id(struct btf_file *, uint32_t,
+ struct btf_type *, struct name_entry **);
+uint32_t id_to_uniq_id(uint32_t, struct btf_file *);
+int init_module_btf(void);
+void cleanup_btf(void);
+
+static inline uint32_t btf_kind(uint32_t info)
+{
+ return (info & 0x1F000000) >> 24;
+}
+
+#endif /* _BTF_H */
diff --git a/kallsyms.c b/kallsyms.c
new file mode 100644
index 0000000..89f6bfa
--- /dev/null
+++ b/kallsyms.c
@@ -0,0 +1,371 @@
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <assert.h>
+#include "makedumpfile.h"
+#include "kallsyms.h"
+#include "btf.h"
+
+static uint32_t *kallsyms_offsets = NULL;
+static uint16_t *kallsyms_token_index = NULL;
+static uint8_t *kallsyms_token_table = NULL;
+static uint8_t *kallsyms_names = NULL;
+static unsigned long kallsyms_relative_base = 0;
+static unsigned int kallsyms_num_syms = 0;
+
+#define NAME_HASH 512
+static struct syment *symtable = NULL;
+static struct syment *name_hash_table[NAME_HASH] = {0};
+
+static uint64_t absolute_percpu(uint64_t base, int32_t val)
+{
+ if (val >= 0)
+ return (uint64_t)val;
+ else
+ return base - 1 - val;
+}
+
+static void *get_syment_next(void *entry)
+{
+ return ((struct syment *)entry)->name_hash_next;
+}
+
+static void set_syment_next(void *entry, void *next)
+{
+ ((struct syment *)entry)->name_hash_next = next;
+}
+
+static unsigned int name_hash_index(char *name)
+{
+ return hash_index(name, NAME_HASH);
+}
+
+static void name_hash_install(struct syment *en)
+{
+ hash_install((void **)name_hash_table, NAME_HASH, en, en->name,
+ get_syment_next, set_syment_next);
+}
+
+struct syment *search_kallsyms_by_name(char *name)
+{
+ unsigned int index;
+ struct syment *sp;
+
+ index = name_hash_index(name);
+ for (sp = name_hash_table[index]; sp; sp = sp->name_hash_next) {
+ if (!strcmp(name, sp->name)) {
+ return sp;
+ }
+ }
+ return sp;
+}
+
+uint64_t get_kallsyms_value_by_name(char *name)
+{
+ struct syment *sp;
+
+ sp = search_kallsyms_by_name(name);
+ if (!sp)
+ return 0;
+ return sp->value;
+}
+
+#define BUFLEN 512
+int parse_kernel_kallsyms(void)
+{
+ char buf[BUFLEN];
+ int len = 0;
+ int index = 0, i;
+ uint8_t *compressd_data;
+ uint8_t *uncompressd_data;
+ uint64_t stext;
+
+ symtable = calloc(kallsyms_num_syms, sizeof(struct syment));
+ if (!symtable)
+ goto no_mem;
+
+ for (i = 0; i < kallsyms_num_syms; i++) {
+ memset(buf, 0, BUFLEN);
+ len = kallsyms_names[index++];
+ compressd_data = &kallsyms_names[index];
+ index += len;
+ while (len--) {
+ uncompressd_data = &kallsyms_token_table[kallsyms_token_index[*compressd_data]];
+ assert(strlen(buf) + strlen((char *)uncompressd_data) < BUFLEN);
+ strcat(buf, (char *)uncompressd_data);
+ compressd_data++;
+ }
+ symtable[i].value = kallsyms_offsets[i];
+ symtable[i].type = buf[0];
+ symtable[i].name = strdup(&buf[1]);
+ if (!symtable[i].name)
+ goto no_mem;
+ name_hash_install(&symtable[i]);
+ }
+
+ /* Now convert each kallsyms offset into an absolute address */
+ stext = get_kallsyms_value_by_name("_stext");
+ if (SYMBOL(_stext) == absolute_percpu(kallsyms_relative_base, stext)) {
+ for (i = 0; i < kallsyms_num_syms; i++) {
+ symtable[i].value = absolute_percpu(kallsyms_relative_base,
+ symtable[i].value);
+ }
+ } else if (SYMBOL(_stext) == kallsyms_relative_base + stext) {
+ for (i = 0; i < kallsyms_num_syms; i++) {
+ symtable[i].value += kallsyms_relative_base;
+ }
+ } else {
+ fprintf(stderr, "%s: Failed to calculate kallsyms symbol values!\n", __func__);
+ goto out;
+ }
+
+ return 0;
+no_mem:
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+ return -1;
+}
+
+static bool vmcore_info_ready = false;
+
+int read_vmcoreinfo_kallsyms(void)
+{
+ READ_SYMBOL("kallsyms_names", kallsyms_names);
+ READ_SYMBOL("kallsyms_num_syms", kallsyms_num_syms);
+ READ_SYMBOL("kallsyms_token_table", kallsyms_token_table);
+ READ_SYMBOL("kallsyms_token_index", kallsyms_token_index);
+ READ_SYMBOL("kallsyms_offsets", kallsyms_offsets);
+ READ_SYMBOL("kallsyms_relative_base", kallsyms_relative_base);
+ vmcore_info_ready = true;
+ return true;
+}
+
+int init_kernel_kallsyms(void)
+{
+ const int token_index_size = (UINT8_MAX + 1) * sizeof(uint16_t);
+ uint64_t last_token, len;
+ char data;
+ int i;
+ int ret = -1;
+
+ if (vmcore_info_ready == false) {
+ fprintf(stderr, "%s: vmcoreinfo not ready for kallsyms!\n",
+ __func__);
+ return ret;
+ }
+
+ readmem(VADDR, SYMBOL(kallsyms_num_syms), &kallsyms_num_syms,
+ sizeof(kallsyms_num_syms));
+ readmem(VADDR, SYMBOL(kallsyms_relative_base), &kallsyms_relative_base,
+ sizeof(kallsyms_relative_base));
+
+ kallsyms_offsets = malloc(sizeof(uint32_t) * kallsyms_num_syms);
+ if (!kallsyms_offsets)
+ goto no_mem;
+ readmem(VADDR, SYMBOL(kallsyms_offsets), kallsyms_offsets,
+ kallsyms_num_syms * sizeof(uint32_t));
+
+ kallsyms_token_index = malloc(token_index_size);
+ if (!kallsyms_token_index)
+ goto no_mem;
+ readmem(VADDR, SYMBOL(kallsyms_token_index), kallsyms_token_index,
+ token_index_size);
+
+ last_token = SYMBOL(kallsyms_token_table) + kallsyms_token_index[UINT8_MAX];
+ do {
+ readmem(VADDR, last_token++, &data, 1);
+ } while(data);
+ len = last_token - SYMBOL(kallsyms_token_table);
+ kallsyms_token_table = malloc(len);
+ if (!kallsyms_token_table)
+ goto no_mem;
+ readmem(VADDR, SYMBOL(kallsyms_token_table), kallsyms_token_table, len);
+
+ for (len = 0, i = 0; i < kallsyms_num_syms; i++) {
+ readmem(VADDR, SYMBOL(kallsyms_names) + len, &data, 1);
+ if (data & 0x80) {
+ fprintf(stderr, "%s: BUG! long sym name\n", __func__);
+ goto out;
+ }
+ len += data + 1;
+ }
+ kallsyms_names = malloc(len);
+ if (!kallsyms_names)
+ goto no_mem;
+ readmem(VADDR, SYMBOL(kallsyms_names), kallsyms_names, len);
+
+ ret = parse_kernel_kallsyms();
+ goto out;
+
+no_mem:
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+ if (kallsyms_offsets)
+ free(kallsyms_offsets);
+ if (kallsyms_token_index)
+ free(kallsyms_token_index);
+ if (kallsyms_token_table)
+ free(kallsyms_token_table);
+ if (kallsyms_names)
+ free(kallsyms_names);
+ return ret;
+}
+
+uint64_t next_list(uint64_t list)
+{
+ static int list_head_next_offset = 0;
+ static int list_head_next_size = 0;
+
+ struct member_info mi;
+ uint64_t next = 0;
+
+ if (!list_head_next_size) {
+ get_struct_member_by_name("list_head", "next", &mi);
+ list_head_next_size = mi.size;
+ list_head_next_offset = mi.bit_pos / 8;
+ }
+ readmem(VADDR, list + list_head_next_offset, &next, list_head_next_size);
+ return next;
+}
+
+int init_module_kallsyms(void)
+{
+ struct member_info mi;
+ uint64_t modules, list, value = 0, symtab = 0,
+ strtab = 0, typetab = 0;
+ uint32_t st_name = 0;
+ int num_symtab, i, j;
+ struct syment *mod_syment;
+ char symname[512], ch;
+ int ret = -1;
+
+ modules = get_kallsyms_value_by_name("modules");
+ if (!modules) {
+ /* Not a failure if no module enabled */
+ ret = 0;
+ goto out;
+ }
+
+ INIT_MEMBER_OFF_SIZE(module, list);
+ INIT_MEMBER_OFF_SIZE(module, core_kallsyms);
+ INIT_MEMBER_OFF_SIZE(mod_kallsyms, symtab);
+ INIT_MEMBER_OFF_SIZE(mod_kallsyms, num_symtab);
+ INIT_MEMBER_OFF_SIZE(mod_kallsyms, strtab);
+ INIT_MEMBER_OFF_SIZE(mod_kallsyms, typetab);
+ INIT_MEMBER_OFF_SIZE(elf64_sym, st_name);
+ INIT_MEMBER_OFF_SIZE(elf64_sym, st_value);
+
+ for (list = next_list(modules); list != modules; list = next_list(list)) {
+ readmem(VADDR, list - GET_MEMBER_OFF(module, list) +
+ GET_MEMBER_OFF(module, core_kallsyms) +
+ GET_MEMBER_OFF(mod_kallsyms, num_symtab),
+ &num_symtab, GET_MEMBER_SIZE(mod_kallsyms, num_symtab));
+ readmem(VADDR, list - GET_MEMBER_OFF(module, list) +
+ GET_MEMBER_OFF(module, core_kallsyms) +
+ GET_MEMBER_OFF(mod_kallsyms, symtab),
+ &symtab, GET_MEMBER_SIZE(mod_kallsyms, symtab));
+ readmem(VADDR, list - GET_MEMBER_OFF(module, list) +
+ GET_MEMBER_OFF(module, core_kallsyms) +
+ GET_MEMBER_OFF(mod_kallsyms, strtab),
+ &strtab, GET_MEMBER_SIZE(mod_kallsyms, strtab));
+ readmem(VADDR, list - GET_MEMBER_OFF(module, list) +
+ GET_MEMBER_OFF(module, core_kallsyms) +
+ GET_MEMBER_OFF(mod_kallsyms, typetab),
+ &typetab, GET_MEMBER_SIZE(mod_kallsyms, typetab));
+ for (i = 0; i < num_symtab; i++) {
+ j = 0;
+ readmem(VADDR, symtab + i * GET_STRUCT_SIZE(elf64_sym, st_value) +
+ GET_MEMBER_OFF(elf64_sym, st_value),
+ &value, GET_MEMBER_SIZE(elf64_sym, st_value));
+ readmem(VADDR, symtab + i * GET_STRUCT_SIZE(elf64_sym, st_name) +
+ GET_MEMBER_OFF(elf64_sym, st_name),
+ &st_name, GET_MEMBER_SIZE(elf64_sym, st_name));
+ do {
+ readmem(VADDR, strtab + st_name + j++, &ch, 1);
+ } while (ch != '\0');
+ if (j == 1)
+ /* Skip empty string */
+ continue;
+ assert(j <= sizeof(symname));
+ mod_syment = (struct syment *)calloc(1, sizeof(struct syment));
+ if (!mod_syment)
+ goto no_mem;
+ readmem(VADDR, strtab + st_name, symname, j);
+ mod_syment->name = strdup(symname);
+ if (!mod_syment->name) {
+ free(mod_syment);
+ goto no_mem;
+ }
+ mod_syment->value = value;
+ readmem(VADDR, typetab + i, &mod_syment->type, 1);
+ name_hash_install(mod_syment);
+ }
+ }
+ ret = 0;
+ goto out;
+no_mem:
+ /* Hashtable will be cleaned later */
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+out:
+ return ret;
+}
+
+void cleanup_kallsyms(void)
+{
+ struct syment *en, *en_tmp;
+ int i;
+
+ /* Free the module's kallsyms first */
+ for (i = 0; i < NAME_HASH; i++) {
+ for (en = name_hash_table[i]; en;) {
+ en_tmp = en;
+ en = en->name_hash_next;
+ if (en_tmp <= &symtable[kallsyms_num_syms - 1] &&
+ en_tmp >= &symtable[0])
+ continue;
+ free(en_tmp->name);
+ free(en_tmp);
+ }
+ }
+
+ /* Free the kernel ones */
+ for (i = 0; i < kallsyms_num_syms; i++) {
+ if (symtable[i].name)
+ free(symtable[i].name);
+ }
+ free(symtable);
+}
+
+/* Hash table utils */
+unsigned int hash_index(const char *name, unsigned int hash_size)
+{
+ unsigned int len, value;
+
+ len = strlen(name);
+ value = name[len - 1] * name[len / 2];
+
+ return (name[0] ^ value) % hash_size;
+}
+
+void hash_install(void **hash_table, unsigned int hash_size,
+ void *entry, const char *name,
+ void *(*get_next)(void *),
+ void (*set_next)(void *, void *))
+{
+ unsigned int index = hash_index(name, hash_size);
+ void *sp = hash_table[index];
+
+ assert(index < hash_size);
+ if (sp == NULL) {
+ hash_table[index] = entry;
+ } else {
+ while (sp) {
+ if (get_next(sp)) {
+ sp = get_next(sp);
+ } else {
+ set_next(sp, entry);
+ break;
+ }
+ }
+ }
+}
\ No newline at end of file
diff --git a/kallsyms.h b/kallsyms.h
new file mode 100644
index 0000000..96ea970
--- /dev/null
+++ b/kallsyms.h
@@ -0,0 +1,42 @@
+#ifndef _KALLSYMS_H
+#define _KALLSYMS_H
+
+#include <stdint.h>
+
+struct syment {
+ uint64_t value;
+ char *name;
+ struct syment *name_hash_next;
+ char type;
+};
+
+int init_kernel_kallsyms(void);
+int read_vmcoreinfo_kallsyms(void);
+int parse_kernel_kallsyms(void);
+struct syment *search_kallsyms_by_name(char *);
+uint64_t next_list(uint64_t);
+uint64_t get_kallsyms_value_by_name(char *);
+int init_module_kallsyms(void);
+void cleanup_kallsyms(void);
+int read_vmcoreinfo_kallsyms(void);
+
+struct member_off_size {
+ int m_off;
+ int m_size;
+ int s_size;
+};
+#define QUATE(x) #x
+#define INIT_MEMBER_OFF_SIZE(S, M) \
+ struct member_off_size S##_##M; \
+ S##_##M.s_size = get_struct_member_by_name(QUATE(S), QUATE(M), &mi); \
+ S##_##M.m_off = mi.bit_pos / 8; \
+ S##_##M.m_size = mi.size;
+#define GET_MEMBER_OFF(S, M) (S##_##M.m_off)
+#define GET_MEMBER_SIZE(S, M) (S##_##M.m_size)
+#define GET_STRUCT_SIZE(S, M) (S##_##M.s_size)
+
+unsigned int hash_index(const char *, unsigned int);
+void hash_install(void **, unsigned int, void *, const char *,
+ void *(*)(void *), void (*)(void *, void *));
+
+#endif /* _KALLSYMS_H */
\ No newline at end of file
diff --git a/makedumpfile.c b/makedumpfile.c
index 33fad32..cdfcfeb 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,7 @@
#include <limits.h>
#include <assert.h>
#include <zlib.h>
+#include "kallsyms.h"
struct symbol_table symbol_table;
struct size_table size_table;
@@ -3103,6 +3104,8 @@ read_vmcoreinfo_from_vmcore(off_t offset, unsigned long size, int flag_xen_hv)
if (!read_vmcoreinfo())
goto out;
}
+ read_vmcoreinfo_kallsyms();
+
close_vmcoreinfo();
ret = TRUE;
diff --git a/makedumpfile.h b/makedumpfile.h
index 944397a..cc474ad 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -257,6 +257,7 @@ static inline int string_exists(char *s) { return (s ? TRUE : FALSE); }
#define UINT(ADDR) *((unsigned int *)(ADDR))
#define ULONG(ADDR) *((unsigned long *)(ADDR))
#define ULONGLONG(ADDR) *((unsigned long long *)(ADDR))
+#define VOID_PTR(ADDR) *((void **)(ADDR))
/*
@@ -1917,6 +1918,16 @@ struct symbol_table {
* symbols on sparc64 arch
*/
unsigned long long vmemmap_table;
+
+ /*
+ * kallsyms related
+ */
+ unsigned long long kallsyms_names;
+ unsigned long long kallsyms_num_syms;
+ unsigned long long kallsyms_token_table;
+ unsigned long long kallsyms_token_index;
+ unsigned long long kallsyms_offsets;
+ unsigned long long kallsyms_relative_base;
};
struct size_table {
--
2.47.0
* [PATCH RFC][makedumpfile 05/10] Export necessary btf/kallsyms functions to eppic extension
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (3 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 04/10] Add btf/kallsyms support for symbol type/address resolving Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 06/10] Port the maple tree data structures and functions Tao Liu
` (5 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
Signed-off-by: Tao Liu <ltao@redhat.com>
---
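Editorial note, not part of the patch: the hunks below append the new
btf/kallsyms entry points to the end of struct call_back and to the
positional initializer of eppic_cb, so the two lists have to stay in the
same order. The sketch below shows one way an extension could call through
such a table. Every demo_* name is hypothetical, the struct is a trimmed-down
stand-in for struct call_back, and the "DWARF first, kallsyms as fallback"
policy is an assumption made for illustration, not something this series
states.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct call_back. */
struct demo_call_back {
	unsigned long long (*get_symbol_addr_all)(char *symname);
	uint64_t (*get_kallsyms_value_by_name)(char *name);
};

/* Stub resolvers so the sketch is self-contained (values are made up). */
static unsigned long long demo_dwarf_lookup(char *symname)
{
	(void)symname;
	return 0;	/* pretend DWARF could not resolve the symbol */
}

static uint64_t demo_kallsyms_lookup(char *name)
{
	(void)name;
	return 0xffffffff81000000ULL;
}

/* Designated initializers avoid depending on member order. */
static struct demo_call_back demo_cb = {
	.get_symbol_addr_all = demo_dwarf_lookup,
	.get_kallsyms_value_by_name = demo_kallsyms_lookup,
};

/* Assumed policy: try the DWARF callback, then fall back to kallsyms. */
static uint64_t demo_resolve(struct demo_call_back *cb, char *name)
{
	uint64_t addr = 0;

	if (cb->get_symbol_addr_all)
		addr = cb->get_symbol_addr_all(name);
	if (!addr && cb->get_kallsyms_value_by_name)
		addr = cb->get_kallsyms_value_by_name(name);
	return addr;
}
```

Checking each pointer for NULL before the call lets an extension built
against the longer table degrade gracefully if it is handed a table whose
trailing members were left unset.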
erase_info.c | 16 +++++++++++++++-
erase_info.h | 14 ++++++++++++++
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/erase_info.c b/erase_info.c
index d68e1a2..9ec1813 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -20,6 +20,8 @@
#include "print_info.h"
#include "dwarf_info.h"
#include "erase_info.h"
+#include "kallsyms.h"
+#include "btf.h"
#include <dlfcn.h>
@@ -36,7 +38,19 @@ struct call_back eppic_cb = {
&get_die_member_all,
&get_die_nfields_all,
&get_symbol_addr_all,
- &update_filter_info_raw
+ &update_filter_info_raw,
+ &get_structure_size,
+ &get_member_offset,
+ /**********************/
+ &update_filter_pages_info,
+ &get_kallsyms_value_by_name,
+ &get_struct_member_by_name,
+ &get_type_member_by_index,
+ &get_type_size_by_name,
+ &get_en_by_uniq_id,
+ &resolve_typedef,
+ &get_btf_type_by_type_id,
+ &id_to_uniq_id,
};
diff --git a/erase_info.h b/erase_info.h
index 4552dfc..6797ed1 100644
--- a/erase_info.h
+++ b/erase_info.h
@@ -19,6 +19,7 @@
#ifndef _ERASE_INFO_H
#define _ERASE_INFO_H
+#include "btf.h"
#define MAX_SIZE_STR_LEN (26)
/*
@@ -52,6 +53,19 @@ struct call_back {
int (*get_die_nfields_all)(unsigned long long die_off);
unsigned long long (*get_symbol_addr_all)(char *symname);
int (*update_filter_info_raw)(unsigned long long, int, int);
+ long (*get_structure_size)(char *, int);
+ long (*get_member_offset)(char *, char *, int);
+ /********************************/
+ int (*update_filter_pages_info)(unsigned long, unsigned long);
+ uint64_t (*get_kallsyms_value_by_name)(char *);
+ uint32_t (*get_struct_member_by_name)(char *, char *, struct member_info *);
+ bool (*get_type_member_by_index)(uint64_t, int, struct member_info *);
+ uint32_t (*get_type_size_by_name)(char *, int, uint32_t *);
+ struct name_entry *(*get_en_by_uniq_id)(uint32_t, struct btf_type *);
+ void (*resolve_typedef)(struct name_entry *, struct name_entry **, struct btf_type *);
+ int (*get_btf_type_by_type_id)(struct btf_file *, uint32_t,
+ struct btf_type *, struct name_entry **);
+ uint32_t (*id_to_uniq_id)(uint32_t, struct btf_file *);
};
extern struct erase_info *erase_info;
--
2.47.0
* [PATCH RFC][makedumpfile 06/10] Port the maple tree data structures and functions
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (4 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 05/10] Export necessary btf/kallsyms functions to eppic extension Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 07/10] Supporting main() as the entry of eppic script Tao Liu
` (4 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
Most of the maple tree code is copied from the crash utility. Since
makedumpfile itself does not currently need it, the maple tree code is
integrated into the eppic extension only.
The minor changes to the maple tree code are:
1) a cache buffer for maple tree data, because an eppic script currently
cannot allocate a buffer;
2) an interface for eppic scripts.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
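Editorial note, not part of the patch: the traversal code in eppic_maple.c
relies on encoded node pointers, where the low bits of a slot value carry
the node type and the remaining bits carry the node address. The sketch
below restates that decoding with the masks and shifts defined in this
patch. The demo_* names are hypothetical, and demo_encode() is only an
assumed inverse used to build test values; its tag value of 2 is chosen so
that the xa_is_internal()-style check passes and is not taken from the
patch.

```c
#include <stdbool.h>

/* Constants copied from eppic_maple.c in this patch. */
#define MAPLE_NODE_MASK        255UL
#define MAPLE_NODE_TYPE_MASK   0x0F
#define MAPLE_NODE_TYPE_SHIFT  0x03

enum maple_type {
	maple_dense,
	maple_leaf_64,
	maple_range_64,
	maple_arange_64,
};

/* Same test as xa_is_node() in the patch: internal tag and above 4096. */
static bool demo_is_node(unsigned long entry)
{
	return (entry & 3) == 2 && entry > 4096;
}

/* Same as mte_to_node(): clear the low byte to get the node address. */
static unsigned long demo_node_addr(unsigned long enode)
{
	return enode & ~MAPLE_NODE_MASK;
}

/* Same as mte_node_type(): the type is stored above the low three bits. */
static enum maple_type demo_node_type(unsigned long enode)
{
	return (enum maple_type)((enode >> MAPLE_NODE_TYPE_SHIFT) &
				 MAPLE_NODE_TYPE_MASK);
}

/* Same as ma_is_leaf(): dense and leaf_64 nodes hold entries directly. */
static bool demo_is_leaf(unsigned long enode)
{
	return demo_node_type(enode) < maple_range_64;
}

/* Assumed inverse, for demonstration only (addr must be 256-byte aligned). */
static unsigned long demo_encode(unsigned long addr, enum maple_type type)
{
	return addr | ((unsigned long)type << MAPLE_NODE_TYPE_SHIFT) | 2;
}
```

For example, demo_encode(0x12345600UL, maple_leaf_64) yields a value for
which demo_is_node() and demo_is_leaf() both hold and demo_node_addr()
returns 0x12345600.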
Makefile | 4 +-
eppic_maple.c | 431 ++++++++++++++++++++++++++++++++++++++++++++++++++
eppic_maple.h | 8 +
3 files changed, 441 insertions(+), 2 deletions(-)
create mode 100644 eppic_maple.c
create mode 100644 eppic_maple.h
diff --git a/Makefile b/Makefile
index fbc9f5b..216749f 100644
--- a/Makefile
+++ b/Makefile
@@ -121,8 +121,8 @@ makedumpfile: $(SRC_BASE) $(OBJ_PART) $(OBJ_ARCH)
-e "s/@VERSION@/$(VERSION)/" \
$(VPATH)makedumpfile.conf.5.in > $(VPATH)makedumpfile.conf.5
-eppic_makedumpfile.so: extension_eppic.c
- $(CC) $(CFLAGS) $(LDFLAGS) -shared -rdynamic -o $@ extension_eppic.c -fPIC -leppic -ltinfo
+eppic_makedumpfile.so: extension_eppic.c eppic_maple.c
+ $(CC) $(CFLAGS) $(LDFLAGS) -shared -rdynamic -o $@ $^ -fPIC -leppic -ltinfo
clean:
rm -f $(OBJ) $(OBJ_PART) $(OBJ_ARCH) makedumpfile makedumpfile.8 makedumpfile.conf.5
diff --git a/eppic_maple.c b/eppic_maple.c
new file mode 100644
index 0000000..ee8d23d
--- /dev/null
+++ b/eppic_maple.c
@@ -0,0 +1,431 @@
+
+#include <stdbool.h>
+#include <limits.h>
+#include <sys/types.h>
+#include <assert.h>
+#include "makedumpfile.h"
+#include "extension_eppic.h"
+#include "btf.h"
+#include "kallsyms.h"
+#include "erase_info.h"
+
+enum maple_type {
+ maple_dense,
+ maple_leaf_64,
+ maple_range_64,
+ maple_arange_64,
+};
+
+#define MAPLE_TREE_COUNT (1)
+#define MAPLE_TREE_SEARCH (2)
+#define MAPLE_TREE_DUMP (3)
+#define MAPLE_TREE_GATHER (4)
+#define MAPLE_TREE_DUMP_CB (5)
+
+#define MAPLE_NODE_MASK 255UL
+#define MT_FLAGS_HEIGHT_OFFSET 0x02
+#define MT_FLAGS_HEIGHT_MASK 0x7C
+#define MAPLE_NODE_TYPE_MASK 0x0F
+#define MAPLE_NODE_TYPE_SHIFT 0x03
+#define MAPLE_BUFSIZE 512
+
+static unsigned char *mt_slots = NULL;
+
+static ulong mt_max[4] = {0};
+
+static long g_maple_tree;
+static long g_maple_node;
+static long g_maple_tree_ma_root;
+static long g_maple_node_ma64;
+static long g_maple_node_mr64;
+static long g_maple_node_slot;
+static long g_maple_arange_64_pivot;
+static long g_maple_arange_64_slot;
+static long g_maple_range_64_pivot;
+static long g_maple_range_64_slot;
+
+/********************maple tree internal**************************/
+
+static inline bool xa_is_internal(ulong entry)
+{
+ return (entry & 3) == 2;
+}
+
+static inline bool xa_is_node(ulong entry)
+{
+ return xa_is_internal(entry) && entry > 4096;
+}
+
+static inline ulong mte_to_node(ulong maple_enode_entry)
+{
+ return maple_enode_entry & ~MAPLE_NODE_MASK;
+}
+
+static inline enum maple_type mte_node_type(ulong maple_enode_entry)
+{
+ return (maple_enode_entry >> MAPLE_NODE_TYPE_SHIFT) &
+ MAPLE_NODE_TYPE_MASK;
+}
+
+static inline ulong mt_slot(void **slots, unsigned char offset)
+{
+ return (ulong)slots[offset];
+}
+
+static inline bool ma_is_leaf(const enum maple_type type)
+{
+ return type < maple_range_64;
+}
+
+struct do_maple_tree_info {
+ ulong count;
+ void *data;
+};
+
+struct maple_tree_ops {
+ void (*entry)(ulong node, void *private);
+ void *private;
+};
+
+static void do_mt_range64(ulong, ulong, ulong,
+ struct maple_tree_ops *);
+static void do_mt_arange64(ulong, ulong, ulong,
+ struct maple_tree_ops *);
+static void do_mt_entry(ulong, struct maple_tree_ops *);
+static void do_mt_node(ulong, ulong, ulong,
+ struct maple_tree_ops *);
+
+static inline bool mte_is_leaf(ulong maple_enode_entry)
+{
+ return ma_is_leaf(mte_node_type(maple_enode_entry));
+}
+
+static void do_mt_range64(ulong entry, ulong min, ulong max,
+ struct maple_tree_ops *ops)
+{
+ ulong maple_node_m_node = mte_to_node(entry);
+ char node_buf[MAPLE_BUFSIZE];
+ bool leaf = mte_is_leaf(entry);
+ ulong first = min, last;
+ int i;
+ char *mr64_buf;
+
+ READMEM(VADDR, maple_node_m_node, node_buf, g_maple_node);
+
+ mr64_buf = node_buf + g_maple_node_mr64;
+
+ for (i = 0; i < mt_slots[maple_range_64]; i++) {
+ last = max;
+
+ if (i < (mt_slots[maple_range_64] - 1))
+ last = ULONG(mr64_buf + g_maple_range_64_pivot +
+ sizeof(ulong) * i);
+
+ else if (!VOID_PTR(mr64_buf + g_maple_range_64_slot +
+ sizeof(void *) * i) &&
+ max != mt_max[mte_node_type(entry)])
+ break;
+ if (last == 0 && i > 0)
+ break;
+ if (leaf)
+ do_mt_entry(mt_slot((void **)(mr64_buf +
+ g_maple_range_64_slot), i),
+ ops);
+ else if (VOID_PTR(mr64_buf + g_maple_range_64_slot +
+ sizeof(void *) * i)) {
+ do_mt_node(mt_slot((void **)(mr64_buf +
+ g_maple_range_64_slot), i),
+ first, last, ops);
+ }
+
+ if (last == max)
+ break;
+ if (last > max) {
+ printf("node %p last (%lu) > max (%lu) at pivot %d!\n",
+ mr64_buf, last, max, i);
+ break;
+ }
+ first = last + 1;
+ }
+}
+
+static void do_mt_arange64(ulong entry, ulong min, ulong max,
+ struct maple_tree_ops *ops)
+{
+ ulong maple_node_m_node = mte_to_node(entry);
+ char node_buf[MAPLE_BUFSIZE];
+ bool leaf = mte_is_leaf(entry);
+ ulong first = min, last;
+ int i;
+ char *ma64_buf;
+
+ READMEM(VADDR, maple_node_m_node, node_buf, g_maple_node);
+
+ ma64_buf = node_buf + g_maple_node_ma64;
+
+ for (i = 0; i < mt_slots[maple_arange_64]; i++) {
+ last = max;
+
+ if (i < (mt_slots[maple_arange_64] - 1))
+ last = ULONG(ma64_buf + g_maple_arange_64_pivot +
+ sizeof(ulong) * i);
+ else if (!VOID_PTR(ma64_buf + g_maple_arange_64_slot +
+ sizeof(void *) * i))
+ break;
+ if (last == 0 && i > 0)
+ break;
+
+ if (leaf)
+ do_mt_entry(mt_slot((void **)(ma64_buf +
+ g_maple_arange_64_slot), i),
+ ops);
+ else if (VOID_PTR(ma64_buf + g_maple_arange_64_slot +
+ sizeof(void *) * i)) {
+ do_mt_node(mt_slot((void **)(ma64_buf +
+ g_maple_arange_64_slot), i),
+ first, last, ops);
+ }
+
+ if (last == max)
+ break;
+ if (last > max) {
+ printf("node %p last (%lu) > max (%lu) at pivot %d!\n",
+ ma64_buf, last, max, i);
+ break;
+ }
+ first = last + 1;
+ }
+}
+
+static void do_mt_entry(ulong entry, struct maple_tree_ops *ops)
+{
+ if (ops->entry && entry)
+ ops->entry(entry, ops->private);
+}
+
+static void do_mt_node(ulong entry, ulong min, ulong max,
+ struct maple_tree_ops *ops)
+{
+ ulong maple_node = mte_to_node(entry);
+ int i, type = mte_node_type(entry);
+ char node_buf[MAPLE_BUFSIZE];
+
+ READMEM(VADDR, maple_node, node_buf, g_maple_node);
+
+ switch (type) {
+ case maple_dense:
+ for (i = 0; i < mt_slots[maple_dense]; i++) {
+ if (min + i > max)
+ printf("OUT OF RANGE: ");
+ do_mt_entry(mt_slot((void **)(node_buf +
+ g_maple_node_slot), i), ops);
+ }
+ break;
+ case maple_leaf_64:
+ case maple_range_64:
+ do_mt_range64(entry, min, max, ops);
+ break;
+ case maple_arange_64:
+ do_mt_arange64(entry, min, max, ops);
+ break;
+ default:
+ printf(" UNKNOWN TYPE\n");
+ }
+}
+
+static int do_maple_tree_traverse(ulong ptr, struct maple_tree_ops *ops)
+{
+ char tree_buf[MAPLE_BUFSIZE];
+ ulong entry;
+
+ assert(MAPLE_BUFSIZE >= g_maple_tree);
+
+ READMEM(VADDR, ptr, tree_buf, g_maple_tree);
+ entry = ULONG(tree_buf + g_maple_tree_ma_root);
+
+ if (!xa_is_node(entry))
+ do_mt_entry(entry, ops);
+ else if (entry) {
+ do_mt_node(entry, 0, mt_max[mte_node_type(entry)], ops);
+ }
+
+ return 0;
+}
+
+static void do_maple_tree_count(ulong node, void *private)
+{
+ struct do_maple_tree_info *info = private;
+ info->count++;
+}
+
+static void do_maple_tree_gather(ulong node, void *private)
+{
+ struct do_maple_tree_info *info = private;
+ ulong *buf = info->data;
+
+ buf[info->count] = node;
+ info->count++;
+}
+
+static ulong do_maple_tree(ulong root, int flag, ulong *buf)
+{
+ struct do_maple_tree_info info = {
+ .count = 0,
+ .data = buf,
+ };
+ struct maple_tree_ops ops = {
+ .private = &info,
+ };
+
+ switch (flag)
+ {
+ case MAPLE_TREE_COUNT:
+ ops.entry = do_maple_tree_count;
+ break;
+
+ case MAPLE_TREE_GATHER:
+ ops.entry = do_maple_tree_gather;
+ break;
+
+ default:
+ fprintf(stderr, "do_maple_tree: invalid flag: %x\n", flag);
+ return 0;
+ }
+
+ do_maple_tree_traverse(root, &ops);
+ return info.count;
+}
+
+#define MAPLE_SIZE_INIT(X, Y) \
+do { \
+ if (is_btf) { \
+ if ((X = cb->get_type_size_by_name(Y, BTF_KIND_STRUCT, NULL)) \
+ == 0) \
+ return FALSE; \
+ } else { \
+ if ((X = cb->get_structure_size(Y, DWARF_INFO_GET_STRUCT_SIZE)) \
+ == FAILED_DWARFINFO) \
+ return FALSE; \
+ } \
+} while (0)
+
+#define MAPLE_OFFSET_INIT(X, Y, Z) \
+do { \
+ if (is_btf) { \
+ struct member_info mi; \
+ memset(&mi, 0, sizeof(mi)); \
+ if (cb->get_struct_member_by_name(Y, Z, &mi) == 0) \
+ return FALSE; \
+ X = mi.bit_pos / 8; \
+ } else { \
+ if ((X = cb->get_member_offset(Y, Z, DWARF_INFO_GET_MEMBER_OFFSET)) \
+ == FAILED_DWARFINFO) \
+ return FALSE; \
+ } \
+} while (0)
+
+/*******************maple tree api***************************/
+
+int maple_init(bool is_btf)
+{
+ int array_len = 16;
+
+ MAPLE_SIZE_INIT(g_maple_tree, "maple_tree");
+ MAPLE_SIZE_INIT(g_maple_node, "maple_node");
+ MAPLE_OFFSET_INIT(g_maple_tree_ma_root, "maple_tree", "ma_root");
+ MAPLE_OFFSET_INIT(g_maple_node_ma64, "maple_node", "ma64");
+ MAPLE_OFFSET_INIT(g_maple_node_mr64, "maple_node", "mr64");
+ MAPLE_OFFSET_INIT(g_maple_node_slot, "maple_node", "slot");
+ MAPLE_OFFSET_INIT(g_maple_arange_64_pivot, "maple_arange_64", "pivot");
+ MAPLE_OFFSET_INIT(g_maple_arange_64_slot, "maple_arange_64", "slot");
+ MAPLE_OFFSET_INIT(g_maple_range_64_pivot, "maple_range_64", "pivot");
+ MAPLE_OFFSET_INIT(g_maple_range_64_slot, "maple_range_64", "slot");
+ mt_slots = calloc(array_len, sizeof(char));
+ if (!mt_slots) {
+ fprintf(stderr, "%s: Not enough memory!\n", __func__);
+ return FALSE;
+ }
+ if (is_btf) {
+ READMEM(VADDR, cb->get_kallsyms_value_by_name("mt_slots"),
+ mt_slots, array_len * sizeof(char));
+ } else {
+ READMEM(VADDR, cb->get_symbol_addr_all("mt_slots"),
+ mt_slots, array_len * sizeof(char));
+ }
+
+ mt_max[maple_dense] = mt_slots[maple_dense];
+ mt_max[maple_leaf_64] = ULONG_MAX;
+ mt_max[maple_range_64] = ULONG_MAX;
+ mt_max[maple_arange_64] = ULONG_MAX;
+ return TRUE;
+}
+
+#define MAPLE_CACHE 16
+
+static struct maple_cache {
+ uint64_t tree;
+ uint64_t hits;
+ int elems_count;
+ uint64_t *elems;
+} maple_cache[MAPLE_CACHE] = {0};
+
+static VALUE_S *maple_tree(VALUE_S *ep_tree, int cmd, VALUE_S *ep_index)
+{
+ uint64_t tree = eppic_getval(ep_tree);
+ int index = eppic_getval(ep_index);
+ int found = -1, target = -1;
+ int min_hits = 0;
+ for (int i = 0; i < MAPLE_CACHE; i++) {
+ min_hits = maple_cache[i].hits < maple_cache[min_hits].hits ?
+ i : min_hits;
+ if (tree == maple_cache[i].tree) {
+ found = i;
+ }
+ }
+
+ if (found >= 0) {
+ maple_cache[found].hits++;
+ target = found;
+ } else {
+ target = min_hits;
+ }
+
+ switch (cmd)
+ {
+ case MAPLE_TREE_COUNT:
+ if (found < 0) {
+ if (maple_cache[target].elems) {
+ free(maple_cache[target].elems);
+ memset(&maple_cache[target], 0, sizeof(struct maple_cache));
+ }
+ found = do_maple_tree(tree, MAPLE_TREE_COUNT, NULL);
+ maple_cache[target].elems = malloc(found * sizeof(u_int64_t));
+ do_maple_tree(tree, MAPLE_TREE_GATHER, maple_cache[target].elems);
+ maple_cache[target].elems_count = found;
+ return eppic_makebtype(found);
+ } else {
+ return eppic_makebtype(maple_cache[target].elems_count);
+ }
+ case MAPLE_TREE_GATHER:
+ if (index < 0 || index >= maple_cache[target].elems_count) {
+ printf("Invalid maple index %d (count %d) for tree %lx\n",
+ index, maple_cache[target].elems_count, tree);
+ return eppic_makebtype(0);
+ }
+ return eppic_makebtype(maple_cache[target].elems[index]);
+ default:
+ return eppic_makebtype(0);
+ }
+}
+
+VALUE_S *
+maple_count(VALUE_S *ep_tree)
+{
+ return maple_tree(ep_tree, MAPLE_TREE_COUNT, NULL);
+}
+
+VALUE_S *
+maple_elem(VALUE_S *ep_tree, VALUE_S *ep_index)
+{
+ return maple_tree(ep_tree, MAPLE_TREE_GATHER, ep_index);
+}
diff --git a/eppic_maple.h b/eppic_maple.h
new file mode 100644
index 0000000..61bb32f
--- /dev/null
+++ b/eppic_maple.h
@@ -0,0 +1,8 @@
+#ifndef _EPPIC_MAPLE_H
+#define _EPPIC_MAPLE_H
+#include "makedumpfile.h"
+int maple_init(bool);
+VALUE_S *maple_count(VALUE_S *);
+VALUE_S *maple_elem(VALUE_S *, VALUE_S *);
+#endif /*_EPPIC_MAPLE_H*/
+
--
2.47.0
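The pivot handling in do_mt_arange64() above can be sketched in isolation. This is a minimal illustration, not code from the patch: the names key_range and pivots_to_ranges are hypothetical, and the slot-pointer checks and memory reads are left out. It only shows how a node's pivots split [min, max] into per-slot key ranges.

```c
#include <stddef.h>

/* Hypothetical helper type: the key range covered by one slot. */
struct key_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Split [min, max] into per-slot ranges. pivots[] holds nslots - 1
 * entries; the final slot has no pivot and inherits max. out[] must
 * have room for nslots entries. Returns the number of ranges written.
 */
static size_t pivots_to_ranges(const unsigned long *pivots, size_t nslots,
			       unsigned long min, unsigned long max,
			       struct key_range *out)
{
	unsigned long first = min, last;
	size_t i, n = 0;

	for (i = 0; i < nslots; i++) {
		last = (i < nslots - 1) ? pivots[i] : max;
		if (last == 0 && i > 0)
			break;	/* a zero pivot after slot 0 marks the end */
		if (last > max)
			break;	/* inconsistent node: reject before recording
				 * (the patch visits the slot first, then warns) */
		out[n].first = first;
		out[n].last = last;
		n++;
		if (last == max)
			break;	/* the whole range is covered */
		first = last + 1;
	}
	return n;
}
```

With pivots {9, 19, 0}, four slots, and the range [0, 100], the sketch yields [0, 9] and [10, 19] and then stops at the zero pivot.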
* [PATCH RFC][makedumpfile 07/10] Supporting main() as the entry of eppic script
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (5 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 06/10] Port the maple tree data structures and functions Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 08/10] Enable page filtering for dwarf eppic Tao Liu
` (3 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
Previously, a function was treated as the entry point of an eppic script
only if matching usage and help functions were also defined. This constraint
is surprising because main() is the widely accepted entry point, and script
authors can easily be confused or blocked by it. This patch also accepts
main() as an entry point.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
extension_eppic.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/extension_eppic.c b/extension_eppic.c
index 8aa9ed2..c4a13b9 100644
--- a/extension_eppic.c
+++ b/extension_eppic.c
@@ -49,6 +49,8 @@ static int apigetctype(int, char *, type_t *);
* entry point and user will not have any option to execute the usage
* or help functions. However they are required to identify the entry
* points in the eppic macro.
+ *
+ * "main" can also work as the entry point of eppic macro.
*/
void
reg_callback(char *name, int load)
@@ -59,6 +61,11 @@ reg_callback(char *name, int load)
if (!load)
return;
+ if (!strcmp(name, "main")) {
+ eppic_cmd(name, NULL, 0);
+ return;
+ }
+
snprintf(fname, sizeof(fname), "%s_help", name);
if (eppic_chkfname(fname, 0)) {
snprintf(fname, sizeof(fname), "%s_usage", name);
--
2.47.0
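The entry-point rule described in this patch can be sketched as a small classifier. This is an illustration under stated assumptions, not the extension's code: classify_entry, has_fn_t and demo_has_fn are hypothetical names, and the callback stands in for whatever lookup the eppic library provides. It assumes that a non-main function counts as an entry point only when both its _help and _usage companions exist.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup callback: returns non-zero if fname is defined. */
typedef int (*has_fn_t)(const char *fname);

enum entry_kind { ENTRY_NONE = 0, ENTRY_MAIN, ENTRY_HELP_USAGE };

/* Decide whether a freshly loaded function should be run as an entry point. */
static enum entry_kind classify_entry(const char *name, has_fn_t has_fn)
{
	char fname[128];

	/* New rule: "main" is always an entry point. */
	if (strcmp(name, "main") == 0)
		return ENTRY_MAIN;

	/* Old rule: both <name>_help and <name>_usage must be present. */
	snprintf(fname, sizeof(fname), "%s_help", name);
	if (!has_fn(fname))
		return ENTRY_NONE;
	snprintf(fname, sizeof(fname), "%s_usage", name);
	if (!has_fn(fname))
		return ENTRY_NONE;
	return ENTRY_HELP_USAGE;
}

/* Demo lookup table: "foo" has both companions, "bar" only has a help. */
static int demo_has_fn(const char *fname)
{
	return strcmp(fname, "foo_help") == 0 ||
	       strcmp(fname, "foo_usage") == 0 ||
	       strcmp(fname, "bar_help") == 0;
}
```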
* [PATCH RFC][makedumpfile 08/10] Enable page filtering for dwarf eppic
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (6 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 07/10] Supporting main() as the entry of eppic script Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 09/10] Enable page filtering for btf/kallsyms eppic Tao Liu
` (2 subsequent siblings)
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
Since dwarf and btf have different apiops, pass the apiops as a parameter to
eppic_init so that each backend can be initialized separately.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
erase_info.c | 8 +++++---
extension_eppic.c | 34 +++++++++++++++++++++++++++++++---
extension_eppic.h | 6 ++++--
3 files changed, 40 insertions(+), 8 deletions(-)
diff --git a/erase_info.c b/erase_info.c
index 9ec1813..eeb2c3b 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -2202,7 +2202,7 @@ get_die_member_all(unsigned long long die_off, int index, long *offset,
/* Process the eppic macro using eppic library */
static int
-process_eppic_file(char *name_config)
+process_eppic_file(char *name_config, bool is_btf)
{
void *handle;
void (*eppic_load)(char *), (*eppic_unload)(char *);
@@ -2221,7 +2221,9 @@ process_eppic_file(char *name_config)
* Support specifying eppic macros in makedumpfile.conf file
*/
- eppic_init = dlsym(handle, "eppic_init");
+ if (!is_btf) {
+ eppic_init = dlsym(handle, "eppic_dwarf_init");
+ }
if (!eppic_init) {
ERRMSG("Could not find eppic_init function\n");
return FALSE;
@@ -2368,7 +2370,7 @@ gather_filter_info(void)
ret = process_config_file(info->name_filterconfig);
if (info->name_eppic_config)
- ret &= process_eppic_file(info->name_eppic_config);
+ ret &= process_eppic_file(info->name_eppic_config, false);
/*
* Remove modules symbol information, we dont need now.
diff --git a/extension_eppic.c b/extension_eppic.c
index c4a13b9..59178e5 100644
--- a/extension_eppic.c
+++ b/extension_eppic.c
@@ -23,8 +23,10 @@
#include "makedumpfile.h"
#include "extension_eppic.h"
+#include "eppic_maple.h"
static int apigetctype(int, char *, type_t *);
+struct call_back *cb;
/*
* Most of the functions included in this file performs similar
@@ -416,7 +418,7 @@ apifindsym(char *p)
return NULL;
}
-apiops icops = {
+apiops dwarf_icops = {
apigetmem,
apiputmem,
apimember,
@@ -449,17 +451,27 @@ eppic_memset(VALUE_S *vaddr, VALUE_S *vch, VALUE_S *vlen)
return eppic_makebtype(1);
}
+VALUE_S *
+eppic_filter_pages(VALUE_S *p, VALUE_S *n)
+{
+ unsigned long pfn = eppic_getval(p);
+ unsigned long num = eppic_getval(n);
+
+ UPDATE_FILTER_PAGE_INFO(pfn, num);
+ return eppic_makebtype(1);
+}
+
/* Initialize eppic */
int
-eppic_init(void *fun_ptr)
+eppic_init(void *fun_ptr, apiops *ops, bool is_btf)
{
cb = (struct call_back *)fun_ptr;
if (eppic_open() >= 0) {
/* Register call back functions */
- eppic_apiset(&icops, 3, sizeof(long), 0);
+ eppic_apiset(ops, 3, sizeof(long), 0);
/* set the new function callback */
eppic_setcallback(reg_callback);
@@ -468,8 +480,24 @@ eppic_init(void *fun_ptr)
eppic_builtin("int memset(char *, int, int)",
(bf_t *)eppic_memset);
+ if (maple_init(is_btf) == FALSE)
+ return 1;
+
+ eppic_builtin("int maple_count(char *)",
+ (bf_t *)maple_count);
+
+ eppic_builtin("unsigned long maple_elem(char *, int)",
+ (bf_t *)maple_elem);
+
+ eppic_builtin("unsigned long filter_pages(unsigned long, unsigned long)",
+ (bf_t *)eppic_filter_pages);
+
return 0;
}
return 1;
}
+int eppic_dwarf_init(void *fun_ptr)
+{
+ return eppic_init(fun_ptr, &dwarf_icops, false);
+}
diff --git a/extension_eppic.h b/extension_eppic.h
index 08f1db0..ff92f01 100644
--- a/extension_eppic.h
+++ b/extension_eppic.h
@@ -79,8 +79,7 @@ do { \
fprintf(stderr, x); \
} while (0)
-
-struct call_back *cb;
+extern struct call_back *cb;
#define GET_DOMAIN_ALL cb->get_domain_all
#define READMEM cb->readmem
@@ -92,5 +91,8 @@ struct call_back *cb;
#define GET_DIE_NFIELDS_ALL cb->get_die_nfields_all
#define GET_SYMBOL_ADDR_ALL cb->get_symbol_addr_all
#define UPDATE_FILTER_INFO_RAW cb->update_filter_info_raw
+#define UPDATE_FILTER_PAGE_INFO cb->update_filter_pages_info
+
+int eppic_init(void *, apiops *, bool);
#endif /* _EXTENSION_EPPIC_H */
--
2.47.0
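The shape of this change (one shared init routine that receives its operations table, plus one thin wrapper per backend) can be sketched without the eppic library. All names below (backend_ops, demo_common_init, and so on) are hypothetical stand-ins, and the table is reduced to a single callback. The return convention of 0 for success and 1 for failure follows eppic_init in the patch.

```c
#include <stddef.h>

/* Hypothetical, reduced operations table: one callback per backend. */
struct backend_ops {
	const char *name;
	unsigned long (*symbol_addr)(const char *sym);
};

/* Stub backends that return fixed addresses, for illustration only. */
static unsigned long demo_dwarf_symbol_addr(const char *sym)
{
	(void)sym;
	return 0x1000UL;
}

static unsigned long demo_btf_symbol_addr(const char *sym)
{
	(void)sym;
	return 0x2000UL;
}

static const struct backend_ops demo_dwarf_ops = { "dwarf", demo_dwarf_symbol_addr };
static const struct backend_ops demo_btf_ops = { "btf", demo_btf_symbol_addr };

/* The table selected by the most recent successful init. */
static const struct backend_ops *active_ops;

/* Shared init: validates and installs whichever table the wrapper passes. */
static int demo_common_init(const struct backend_ops *ops)
{
	if (ops == NULL || ops->symbol_addr == NULL)
		return 1;
	active_ops = ops;
	return 0;
}

/* Per-backend wrappers; a loader would look these up by name. */
int demo_dwarf_init(void) { return demo_common_init(&demo_dwarf_ops); }
int demo_btf_init(void) { return demo_common_init(&demo_btf_ops); }
```

Keeping the wrappers as separately named symbols lets the caller choose a backend by symbol lookup alone, which is what process_eppic_file() does with dlsym().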
* [PATCH RFC][makedumpfile 09/10] Enable page filtering for btf/kallsyms eppic
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (7 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 08/10] Enable page filtering for dwarf eppic Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 10/10] Introducing 2 eppic scripts to test the dwarf/btf eppic extension Tao Liu
2025-08-05 3:16 ` [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
This patch mainly implements the eppic interfaces using btf/kallsyms functions.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
Makefile | 2 +-
erase_info.c | 22 +++++
extension_btf.c | 218 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 241 insertions(+), 1 deletion(-)
create mode 100644 extension_btf.c
diff --git a/Makefile b/Makefile
index 216749f..a2d37b1 100644
--- a/Makefile
+++ b/Makefile
@@ -121,7 +121,7 @@ makedumpfile: $(SRC_BASE) $(OBJ_PART) $(OBJ_ARCH)
-e "s/@VERSION@/$(VERSION)/" \
$(VPATH)makedumpfile.conf.5.in > $(VPATH)makedumpfile.conf.5
-eppic_makedumpfile.so: extension_eppic.c eppic_maple.c
+eppic_makedumpfile.so: extension_eppic.c eppic_maple.c extension_btf.c
$(CC) $(CFLAGS) $(LDFLAGS) -shared -rdynamic -o $@ $^ -fPIC -leppic -ltinfo
clean:
diff --git a/erase_info.c b/erase_info.c
index eeb2c3b..e0fed6b 100644
--- a/erase_info.c
+++ b/erase_info.c
@@ -2223,6 +2223,8 @@ process_eppic_file(char *name_config, bool is_btf)
if (!is_btf) {
eppic_init = dlsym(handle, "eppic_dwarf_init");
+ } else {
+ eppic_init = dlsym(handle, "eppic_btf_init");
}
if (!eppic_init) {
ERRMSG("Could not find eppic_init function\n");
@@ -2351,6 +2353,26 @@ gather_filter_info(void)
{
int ret = TRUE;
+ if (!info->name_vmlinux && info->name_eppic_config) {
+ /* No vmlinux is present, use btf & kallsyms instead. */
+ if (init_kernel_kallsyms())
+ goto fail;
+ if (init_kernel_btf())
+ goto fail;
+ if (init_module_kallsyms())
+ goto fail;
+ if (init_module_btf())
+ goto fail;
+ ret = process_eppic_file(info->name_eppic_config, true);
+ goto out;
+fail:
+ ret = FALSE;
+out:
+ cleanup_btf();
+ cleanup_kallsyms();
+ return ret;
+ }
+
/*
* Before processing filter config file, load the symbol data of
* loaded modules from vmcore.
diff --git a/extension_btf.c b/extension_btf.c
new file mode 100644
index 0000000..c09625f
--- /dev/null
+++ b/extension_btf.c
@@ -0,0 +1,218 @@
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include "btf.h"
+#include "kallsyms.h"
+
+#include "makedumpfile.h"
+#include "extension_eppic.h"
+#include "eppic_maple.h"
+
+static def_t *
+apigetdefs(void)
+{
+ return 0;
+}
+
+static int
+apigetval(char *name, ull *val, VALUE_S *value)
+{
+ uint64_t v = cb->get_kallsyms_value_by_name(name);
+ if (!v)
+ return 0;
+ *val = v;
+ if (!value)
+ return 1;
+ eppic_setmemaddr(value, *val);
+ return 1;
+}
+
+static char *drilldown(ull, type_t *);
+static char *
+apimember(char *mname, ull idx, type_t *tm, member_t *m, ull *last_index)
+{
+ struct member_info mi = {0};
+
+ if (cb->get_type_member_by_index(idx, ++(*last_index), &mi) == false)
+ return NULL;
+ eppic_member_soffset(m, mi.bit_pos / 8);
+ eppic_member_ssize(m, mi.size);
+ eppic_member_snbits(m, mi.bits);
+ eppic_member_sfbit(m, mi.bit_pos % 8);
+ eppic_member_sname(m, mi.sname);
+
+ return drilldown(mi.uniq_id, tm);
+}
+
+static int
+apigetmem(ull iaddr, void *p, int nbytes)
+{
+ return cb->readmem(VADDR, iaddr, p, nbytes);
+}
+
+static int
+apigetctype(int ctype, char *name, type_t *tout)
+{
+ long size = 0;
+ uint32_t idx = 0;
+
+ switch (ctype) {
+ case V_STRUCT:
+ size = cb->get_type_size_by_name(name, BTF_KIND_STRUCT, &idx);
+ break;
+ case V_UNION:
+ size = cb->get_type_size_by_name(name, BTF_KIND_UNION, &idx);
+ break;
+ }
+
+ if (size <= 0 || !idx)
+ return 0;
+
+ /* populate */
+ eppic_type_settype(tout, ctype);
+ eppic_type_setsize(tout, size);
+ eppic_type_setidx(tout, (ull)idx);
+ eppic_pushref(tout, 0);
+ return 1;
+}
+
+static char *
+apigetrtype(ull idx, TYPE_S *t)
+{
+ return 0;
+}
+
+static uint8_t
+apigetuint8(void *ptr)
+{
+ uint8_t val;
+ if (!READMEM(VADDR, (unsigned long)ptr, (char *)&val, sizeof(val)))
+ return (uint8_t) -1;
+ return val;
+}
+
+static uint16_t
+apigetuint16(void *ptr)
+{
+ uint16_t val;
+ if (!READMEM(VADDR, (unsigned long)ptr, (char *)&val, sizeof(val)))
+ return (uint16_t) -1;
+ return val;
+}
+
+static uint32_t
+apigetuint32(void *ptr)
+{
+ uint32_t val;
+ if (!READMEM(VADDR, (unsigned long)ptr, (char *)&val, sizeof(val)))
+ return (uint32_t) -1;
+ return val;
+}
+
+static uint64_t
+apigetuint64(void *ptr)
+{
+ uint64_t val;
+ if (!READMEM(VADDR, (unsigned long)ptr, (char *)&val, sizeof(val)))
+ return (uint64_t) -1;
+ return val;
+}
+
+static char *
+apifindsym(char *p)
+{
+ return NULL;
+}
+
+static enum_t *
+apigetenum(char *name)
+{
+ return 0;
+}
+
+static int
+apialignment(ull idx)
+{
+ return 0;
+}
+
+static int
+apiputmem(ull iaddr, void *p, int nbytes)
+{
+ return 1;
+}
+
+static apiops btf_icops = {
+ apigetmem,
+ apiputmem,
+ apimember,
+ apigetctype,
+ apigetrtype,
+ apialignment,
+ apigetval,
+ apigetenum,
+ apigetdefs,
+ apigetuint8,
+ apigetuint16,
+ apigetuint32,
+ apigetuint64,
+ apifindsym,
+};
+
+static char *drilldown(ull idx, type_t *t)
+{
+ struct btf_type bt, sub_bt;
+ struct name_entry *en, *sub_en;
+ int ref = 0;
+ int tmp = idx;
+
+dive_ptr:
+ en = cb->get_en_by_uniq_id(tmp, &bt);
+ if (btf_kind(bt.info) == BTF_KIND_PTR) {
+ ref++;
+ if (!bt.type)
+ eppic_parsetype("char", t, ref);
+ else {
+ cb->get_btf_type_by_type_id(en->bf, bt.type, &sub_bt, &sub_en);
+ tmp = cb->id_to_uniq_id(sub_en->id, sub_en->bf);
+ goto dive_ptr;
+ }
+ } else if (btf_kind(bt.info) == BTF_KIND_INT) {
+ eppic_parsetype("int", t, ref);
+ eppic_type_setsize(t, bt.size);
+ } else if (btf_kind(bt.info) == BTF_KIND_STRUCT) {
+ eppic_type_mkstruct(t);
+ eppic_type_setsize(t, bt.size);
+ eppic_type_setidx(t, (ull)idx);
+ if (en->name && en->name[0])
+ apigetctype(V_STRUCT, en->name, t);
+ eppic_pushref(t, ref);
+ } else if (btf_kind(bt.info) == BTF_KIND_UNION) {
+ eppic_type_mkunion(t);
+ eppic_type_setsize(t, bt.size);
+ eppic_type_setidx(t, (ull)idx);
+ if (en->name && en->name[0])
+ apigetctype(V_UNION, en->name, t);
+ eppic_pushref(t, ref);
+ } else if (btf_kind(bt.info) == BTF_KIND_ENUM) {
+ eppic_type_mkenum(t);
+ eppic_type_setsize(t, bt.size);
+ eppic_type_setidx(t, (ull)idx);
+ if (en->name && en->name[0])
+ apigetctype(V_UNION, en->name, t);
+ eppic_pushref(t, ref);
+ } else if (btf_kind(bt.info) == BTF_KIND_CONST) {
+ cb->get_btf_type_by_type_id(en->bf, bt.type, &sub_bt, &sub_en);
+ tmp = cb->id_to_uniq_id(sub_en->id, sub_en->bf);
+ goto dive_ptr;
+ } else {
+ printf("%s: Drilldown unsupported btf kind %d\n",
+ en->name, btf_kind(bt.info));
+ }
+ return eppic_strdup("");
+}
+
+int eppic_btf_init(void *fun_ptr)
+{
+ return eppic_init(fun_ptr, &btf_icops, true);
+}
\ No newline at end of file
--
2.47.0
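The pointer and const handling in drilldown() above amounts to following a chain of type references while counting pointer levels. The sketch below shows that idea on a toy type table. The demo_type layout and the demo_drilldown name are hypothetical and are not the BTF structures used by the patch. The hop limit is an addition of this sketch so that a malformed, cyclic table cannot loop forever.

```c
#include <stddef.h>

/* Toy type kinds; index 0 of a table plays the role of "void". */
enum demo_kind { DEMO_VOID, DEMO_INT, DEMO_STRUCT, DEMO_PTR, DEMO_CONST };

struct demo_type {
	enum demo_kind kind;
	size_t target;	/* referenced type index, used by PTR and CONST */
};

/*
 * Follow PTR/CONST links starting at id. On success, return 0 and store
 * the number of pointer levels in *ptr_depth and the index of the
 * underlying type in *final_id. Return -1 on a bad index or a cycle.
 */
static int demo_drilldown(const struct demo_type *types, size_t ntypes,
			  size_t id, int *ptr_depth, size_t *final_id)
{
	size_t hops;

	*ptr_depth = 0;
	for (hops = 0; hops <= ntypes; hops++) {
		if (id >= ntypes)
			return -1;
		switch (types[id].kind) {
		case DEMO_PTR:
			(*ptr_depth)++;		/* one more level of indirection */
			id = types[id].target;
			break;
		case DEMO_CONST:
			id = types[id].target;	/* qualifier: skip through */
			break;
		default:
			*final_id = id;		/* reached a concrete type */
			return 0;
		}
	}
	return -1;	/* more hops than types: the chain must be cyclic */
}
```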
* [PATCH RFC][makedumpfile 10/10] Introducing 2 eppic scripts to test the dwarf/btf eppic extension
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (8 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 09/10] Enable page filtering for btf/kallsyms eppic Tao Liu
@ 2025-06-10 9:57 ` Tao Liu
2025-08-05 3:16 ` [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
10 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-06-10 9:57 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel, Tao Liu
This patch introduces 2 eppic scripts. One filters out amdgpu mm pages; the
other prints the VMAs of all tasks. The dwarf and btf eppic extensions should
produce the same result for every eppic script, so these scripts are mainly
intended for testing.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
eppic_scripts/filter_amdgpu_mm_pages.c | 36 ++++++++++++++++++++++++++
eppic_scripts/print_all_vma.c | 29 +++++++++++++++++++++
2 files changed, 65 insertions(+)
create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c
create mode 100644 eppic_scripts/print_all_vma.c
diff --git a/eppic_scripts/filter_amdgpu_mm_pages.c b/eppic_scripts/filter_amdgpu_mm_pages.c
new file mode 100644
index 0000000..2936a54
--- /dev/null
+++ b/eppic_scripts/filter_amdgpu_mm_pages.c
@@ -0,0 +1,36 @@
+int main()
+{
+ struct task_struct *p;
+ unsigned long p_off;
+ int i, c;
+ struct vm_area_struct *vma;
+ struct ttm_buffer_object *tbo;
+ unsigned long pfn, num, mt;
+
+ p = (struct task_struct *)&init_task;
+ p_off = (unsigned long)&(p->tasks) - (unsigned long)p;
+
+ do {
+ if (!(p->mm)) {
+ p = (struct task_struct *)((unsigned long)(p->tasks.next) - p_off);
+ continue;
+ }
+ mt = (unsigned long)&(p->mm->mm_mt);
+
+ c = maple_count((char *)mt);
+ for (i = 0; i < c; i++) {
+ vma = (struct vm_area_struct *)maple_elem((char *)mt, i);
+ if (vma->vm_ops == &amdgpu_gem_vm_ops) {
+ tbo = (struct ttm_buffer_object *)(vma->vm_private_data);
+ if (tbo->ttm) {
+ num = (unsigned long)(tbo->ttm->num_pages);
+ pfn = ((unsigned long)(tbo->ttm->pages[0]) - *(unsigned long *)&vmemmap_base) / sizeof(struct page);
+ filter_pages(pfn, num);
+ }
+ }
+ }
+ p = (struct task_struct *)((unsigned long)(p->tasks.next) - p_off);
+ } while(p != &init_task);
+
+ return 1;
+}
diff --git a/eppic_scripts/print_all_vma.c b/eppic_scripts/print_all_vma.c
new file mode 100644
index 0000000..e8e49c2
--- /dev/null
+++ b/eppic_scripts/print_all_vma.c
@@ -0,0 +1,29 @@
+int main()
+{
+ struct task_struct *p;
+ unsigned long p_off;
+ int i, c;
+ struct vm_area_struct *vma;
+ unsigned long mt;
+
+ p = (struct task_struct *)&init_task;
+ p_off = (unsigned long)&(p->tasks) - (unsigned long)p;
+
+ do {
+ if (!(p->mm)) {
+ p = (struct task_struct *)((unsigned long)(p->tasks.next) - p_off);
+ continue;
+ }
+ printf("PID: %d\n", (int)(p->pid));
+ mt = (unsigned long)&(p->mm->mm_mt);
+
+ c = maple_count((char *)mt);
+ for (i = 0; i < c; i++) {
+ vma = (struct vm_area_struct *)maple_elem((char *)mt, i);
+ printf("%lx\n", vma);
+ }
+ p = (struct task_struct *)((unsigned long)(p->tasks.next) - p_off);
+ } while(p != &init_task);
+
+ return 1;
+}
--
2.47.0
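Both scripts rely on two pieces of address arithmetic: stepping from an embedded list member back to its containing structure, and turning a struct page address into a page frame number. The sketch below isolates them with plain integers. The function names are hypothetical, and the sizes and offsets are example values rather than numbers taken from a real kernel. It assumes a vmemmap-style layout in which struct page entries are contiguous from vmemmap_base, which is what filter_amdgpu_mm_pages.c assumes as well.

```c
#include <stdint.h>

/*
 * Index of a struct page within the vmemmap array, which the script
 * uses as the pfn. page_struct_size must be non-zero.
 */
static uint64_t demo_page_to_pfn(uint64_t page_addr, uint64_t vmemmap_base,
				 uint64_t page_struct_size)
{
	return (page_addr - vmemmap_base) / page_struct_size;
}

/*
 * Address of the structure that embeds a member, given the member's
 * address and its offset inside the structure. This is the same step
 * as "(unsigned long)(p->tasks.next) - p_off" in the scripts.
 */
static uint64_t demo_member_to_container(uint64_t member_addr,
					 uint64_t member_offset)
{
	return member_addr - member_offset;
}
```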
* Re: [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering
2025-06-10 9:57 [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
` (9 preceding siblings ...)
2025-06-10 9:57 ` [PATCH RFC][makedumpfile 10/10] Introducing 2 eppic scripts to test the dwarf/btf eppic extension Tao Liu
@ 2025-08-05 3:16 ` Tao Liu
2025-08-07 11:42 ` YAMAZAKI MASAMITSU(山崎 真光)
10 siblings, 1 reply; 26+ messages in thread
From: Tao Liu @ 2025-08-05 3:16 UTC (permalink / raw)
To: yamazaki-msmt, k-hagio-ab, kexec; +Cc: aravinda, devel
Kindly ping...
Any comments for this patchset?
Thanks,
Tao Liu
On Tue, Jun 10, 2025 at 9:57 PM Tao Liu <ltao@redhat.com> wrote:
>
> A) This patchset will introduce the following features to makedumpfile:
>
> 1) Enable eppic script for memory pages filtering.
> 2) Enable btf and kallsyms for symbol type and address resolving.
> 3) Port maple tree data structures and functions, primarily used for
> vma iteration.
>
> B) The purpose of the features are:
>
> 1) Currently makedumpfile filters mm pages based on page flags, because flags
> can help to determine one page's usage. But this page-flag-checking method
> lacks flexibility in certain cases, e.g. if we want to filter those mm
> pages occupied by GPU during vmcore dumping due to:
>
> a) GPU may be taking a large memory and contains sensitive data;
> b) GPU mm pages have no relations to kernel crash and useless for vmcore
> analysis.
>
> But there is no GPU mm page specific flags, and apparently we don't need
> to create one just for kdump use. A programmable filtering tool is more
> suitable for such cases. In addition, different GPU vendors may use
> different ways for mm pages allocating, programmable filtering is better
> than hard coding these GPU specific logics into makedumpfile in this case.
>
> 2) Currently makedumpfile already contains a programmable filtering tool, aka
> eppic script, which allows user to write customized code for data erasing.
> However it has the following drawbacks:
>
> a) cannot do mm page filtering.
> b) need to access to debuginfo of both kernel and modules, which is not
> applicable in the 2nd kernel.
> c) Poor performance, making vmcore dumping time unacceptable (See
> the following performance testing).
>
> makedumpfile needs to resolve the dwarf data from debuginfo to get symbol
> types and addresses. In recent kernels there are dwarf alternatives such
> as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
> is already packed within vmcore, so we can use it directly.
>
> 3) Maple tree data structures are used in recent kernels, such as vma
> iteration. So maple tree porting is needed.
>
> With these, this patchset introduces an upgraded eppic, which is based on
> btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> The following info shows its usage and performance, please note the tests
> are performed in 1st kernel:
>
> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux
> --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> real 14m6.894s
> user 4m16.900s
> sys 9m44.695s
>
> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> real 0m10.672s
> user 0m9.270s
> sys 0m1.130s
>
> -rw------- 1 root root 367475074 Jun 10 18:06 btf.out
> -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out
> -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>
> C) Discussion:
>
> 1) GPU types: Currently only tested with amdgpu's mm page filtering, others
> are not tested.
> 2) Code structure: Some similar code is shared by makedumpfile and
> crash, such as the maple tree data structure. I also plan to port the
> btf/kallsyms code to crash, so there will be code duplication between
> crash & makedumpfile. Since I haven't worked on the crash porting yet, code
> changes on btf/kallsyms are expected. How can we share the code: by creating
> a common library, or by keeping the duplication as it is?
> 3) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
> Others are not tested.
>
> D) Testing:
>
> 1) If you don't want to create your vmcore, you can find a vmcore which I
> created with amdgpu mm pages unfiltered [1], the amdgpu mm pages are
> allocated by program [2]. You can use the vmcore in 1st kernel to filter
> the amdgpu mm pages by the previous performance testing cmdline. To
> verify the pages are filtered in crash:
>
> Unfiltered:
> crash> search -c "!QAZXSW@#EDC"
> ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> crash> rd ffff96b7fa800000
> ffff96b7fa800000: 405753585a415121 !QAZXSW@
> crash> rd ffff96b87c800000
> ffff96b87c800000: 405753585a415121 !QAZXSW@
>
> Filtered:
> crash> search -c "!QAZXSW@#EDC"
> crash> rd ffff96b7fa800000
> rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR"
> crash> rd ffff96b87c800000
> rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR"
>
> 2) If no amdgpu vmcore or machine is available, you can run
> eppic_scripts/print_all_vma.c against an ordinary vmcore to test only the
> btf/kallsyms functions by printing all VMAs.
>
> [1]: https://people.redhat.com/~ltao/core/
> [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>
> Tao Liu (10):
> dwarf_info: Support kernel address randomization
> dwarf_info: Fix a infinite recursion bug for search_domain
> Add page filtering function
> Add btf/kallsyms support for symbol type/address resolving
> Export necessary btf/kallsyms functions to eppic extension
> Port the maple tree data structures and functions
> Supporting main() as the entry of eppic script
> Enable page filtering for dwarf eppic
> Enable page filtering for btf/kallsyms eppic
> Introducing 2 eppic scripts to test the dwarf/btf eppic extension
>
> Makefile | 6 +-
> btf.c | 919 +++++++++++++++++++++++++
> btf.h | 176 +++++
> dwarf_info.c | 15 +-
> eppic_maple.c | 431 ++++++++++++
> eppic_maple.h | 8 +
> eppic_scripts/filter_amdgpu_mm_pages.c | 36 +
> eppic_scripts/print_all_vma.c | 29 +
> erase_info.c | 123 +++-
> erase_info.h | 22 +
> extension_btf.c | 218 ++++++
> extension_eppic.c | 41 +-
> extension_eppic.h | 6 +-
> kallsyms.c | 371 ++++++++++
> kallsyms.h | 42 ++
> makedumpfile.c | 21 +-
> makedumpfile.h | 11 +
> 17 files changed, 2448 insertions(+), 27 deletions(-)
> create mode 100644 btf.c
> create mode 100644 btf.h
> create mode 100644 eppic_maple.c
> create mode 100644 eppic_maple.h
> create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c
> create mode 100644 eppic_scripts/print_all_vma.c
> create mode 100644 extension_btf.c
> create mode 100644 kallsyms.c
> create mode 100644 kallsyms.h
>
> --
> 2.47.0
>
* Re: [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering
2025-08-05 3:16 ` [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering Tao Liu
@ 2025-08-07 11:42 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-08-11 0:04 ` Tao Liu
0 siblings, 1 reply; 26+ messages in thread
From: YAMAZAKI MASAMITSU(山崎 真光) @ 2025-08-07 11:42 UTC (permalink / raw)
To: Tao Liu, HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org
Cc: aravinda@linux.vnet.ibm.com, devel@lists.crash-utility.osci.io
Thank you for the suggestion.
I think it's a good idea, but eppic needs careful consideration.
I'm sorry, but please give me some time to check.
Thanks,
Masa
On 2025/08/05 12:16, Tao Liu wrote:
> Kindly ping...
>
> Any comments for this patchset?
>
> Thanks,
> Tao Liu
>
>
> On Tue, Jun 10, 2025 at 9:57 PM Tao Liu <ltao@redhat.com> wrote:
>> A) This patchset will introduce the following features to makedumpfile:
>>
>> 1) Enable eppic script for memory pages filtering.
>> 2) Enable btf and kallsyms for symbol type and address resolving.
>> 3) Port maple tree data structures and functions, primarily used for
>> vma iteration.
>>
>> B) The purpose of the features are:
>>
>> 1) Currently makedumpfile filters mm pages based on page flags, because flags
>> can help to determine one page's usage. But this page-flag-checking method
>> lacks flexibility in certain cases, e.g. if we want to filter those mm
>> pages occupied by GPU during vmcore dumping due to:
>>
>> a) GPU may be taking a large memory and contains sensitive data;
>> b) GPU mm pages have no relations to kernel crash and useless for vmcore
>> analysis.
>>
>> But there is no GPU mm page specific flags, and apparently we don't need
>> to create one just for kdump use. A programmable filtering tool is more
>> suitable for such cases. In addition, different GPU vendors may use
>> different ways for mm pages allocating, programmable filtering is better
>> than hard coding these GPU specific logics into makedumpfile in this case.
>>
>> 2) makedumpfile already contains a programmable filtering tool, namely
>> eppic scripts, which allow users to write customized code for data erasing.
>> However it has the following drawbacks:
>>
>> a) It cannot do mm page filtering.
>> b) It needs access to the debuginfo of both the kernel and modules, which
>> is not feasible in the 2nd kernel.
>> c) Poor performance, making vmcore dumping time unacceptable (see
>> the following performance testing).
>>
>> makedumpfile needs to resolve the dwarf data from debuginfo to get symbol
>> types and addresses. Recent kernels provide dwarf alternatives such
>> as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>> is already packed within the vmcore, so we can use it directly.
>>
>> 3) Maple tree data structures are used in recent kernels, e.g. for vma
>> iteration. So maple tree porting is needed.
>>
>> With these, this patchset introduces an upgraded eppic, which is based on
>> btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>> The following shows its usage and performance. Please note the tests
>> were performed in the 1st kernel:
>>
>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>> /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux
>> --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>> real 14m6.894s
>> user 4m16.900s
>> sys 9m44.695s
>>
>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>> /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>> real 0m10.672s
>> user 0m9.270s
>> sys 0m1.130s
>>
>> -rw------- 1 root root 367475074 Jun 10 18:06 btf.out
>> -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out
>> -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>>
>> C) Discussion:
>>
>> 1) GPU types: Currently only tested with amdgpu's mm page filtering; others
>> are not tested.
>> 2) Code structure: Some similar code is shared by makedumpfile and
>> crash, such as the maple tree data structures. I also plan to port the
>> btf/kallsyms code to crash, so there will be code duplication between
>> crash & makedumpfile. Since I haven't worked on the crash porting yet, code
>> changes to btf/kallsyms are expected. How should we share the code: create a
>> common library, or keep the duplication as it is?
>> 3) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>> Others are not tested.
>>
>> D) Testing:
>>
>> 1) If you don't want to create your own vmcore, you can find a vmcore which I
>> created with amdgpu mm pages unfiltered [1]; the amdgpu mm pages were
>> allocated by program [2]. You can use the vmcore in the 1st kernel to filter
>> the amdgpu mm pages with the previous performance testing cmdline. To
>> verify the pages are filtered in crash:
>>
>> Unfiltered:
>> crash> search -c "!QAZXSW@#EDC"
>> ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> crash> rd ffff96b7fa800000
>> ffff96b7fa800000: 405753585a415121 !QAZXSW@
>> crash> rd ffff96b87c800000
>> ffff96b87c800000: 405753585a415121 !QAZXSW@
>>
>> Filtered:
>> crash> search -c "!QAZXSW@#EDC"
>> crash> rd ffff96b7fa800000
>> rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR"
>> crash> rd ffff96b87c800000
>> rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR"
>>
>> 2) If no amdgpu vmcore/machine is available, you can use
>> eppic_scripts/print_all_vma.c against an ordinary vmcore to test only the
>> btf/kallsyms functions by outputting all VMAs.
>>
>> [1]: https://people.redhat.com/~ltao/core/
>> [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>>
>> Tao Liu (10):
>> dwarf_info: Support kernel address randomization
>> dwarf_info: Fix a infinite recursion bug for search_domain
>> Add page filtering function
>> Add btf/kallsyms support for symbol type/address resolving
>> Export necessary btf/kallsyms functions to eppic extension
>> Port the maple tree data structures and functions
>> Supporting main() as the entry of eppic script
>> Enable page filtering for dwarf eppic
>> Enable page filtering for btf/kallsyms eppic
>> Introducing 2 eppic scripts to test the dwarf/btf eppic extension
>>
>> Makefile | 6 +-
>> btf.c | 919 +++++++++++++++++++++++++
>> btf.h | 176 +++++
>> dwarf_info.c | 15 +-
>> eppic_maple.c | 431 ++++++++++++
>> eppic_maple.h | 8 +
>> eppic_scripts/filter_amdgpu_mm_pages.c | 36 +
>> eppic_scripts/print_all_vma.c | 29 +
>> erase_info.c | 123 +++-
>> erase_info.h | 22 +
>> extension_btf.c | 218 ++++++
>> extension_eppic.c | 41 +-
>> extension_eppic.h | 6 +-
>> kallsyms.c | 371 ++++++++++
>> kallsyms.h | 42 ++
>> makedumpfile.c | 21 +-
>> makedumpfile.h | 11 +
>> 17 files changed, 2448 insertions(+), 27 deletions(-)
>> create mode 100644 btf.c
>> create mode 100644 btf.h
>> create mode 100644 eppic_maple.c
>> create mode 100644 eppic_maple.h
>> create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c
>> create mode 100644 eppic_scripts/print_all_vma.c
>> create mode 100644 extension_btf.c
>> create mode 100644 kallsyms.c
>> create mode 100644 kallsyms.h
>>
>> --
>> 2.47.0
>>
* Re: [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering
2025-08-07 11:42 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-08-11 0:04 ` Tao Liu
2025-09-05 12:41 ` YAMAZAKI MASAMITSU(山崎 真光)
0 siblings, 1 reply; 26+ messages in thread
From: Tao Liu @ 2025-08-11 0:04 UTC (permalink / raw)
To: YAMAZAKI MASAMITSU(山崎 真光)
Cc: HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi YAMAZAKI,
Thanks for your comments.
On Thu, Aug 7, 2025 at 11:42 PM YAMAZAKI MASAMITSU(山崎 真光)
<yamazaki-msmt@nec.com> wrote:
>
> Thank you for the suggestion.
> I think it's a good idea,
> but eppic needs careful consideration.
Do you mean there are drawbacks to using eppic?
From my side, one advantage of eppic is that users can use kernel data
structures/variables directly without redefining them, like:
struct task_struct *p;
p = (struct task_struct *)&init_task;
Users don't need to include headers to define what task_struct looks
like, and can use the global variable init_task without declaring it. The
eppic callbacks resolve the missing info via dwarf/btf/kallsyms at
runtime. I think this feature makes eppic scripts tidy and convenient.
We could use another C interpreter instead, but I haven't encountered one
with a similar feature. Also, eppic has been in makedumpfile for years, so
people may already be familiar with it...
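To illustrate the idea of resolving an undeclared name at runtime, here is a
minimal standalone sketch. It is not code from eppic or makedumpfile: the
table, the names sym_entry and lookup_symbol, and the addresses are all made
up for illustration. A real callback would consult kallsyms (or dwarf)
instead of a fixed table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry; a real backend would read kallsyms. */
struct sym_entry {
	const char *name;
	uint64_t addr;
};

/* Dummy addresses for illustration only, not real kernel values. */
static const struct sym_entry symtab[] = {
	{ "init_task", 0x1000ULL },
	{ "init_mm",   0x2000ULL },
};

/*
 * Called when a script uses a name it never declared.
 * Returns the symbol's address, or 0 if the name is unknown.
 */
static uint64_t lookup_symbol(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (strcmp(symtab[i].name, name) == 0)
			return symtab[i].addr;
	}
	return 0;
}
```

With this table, lookup_symbol("init_task") yields the dummy address 0x1000,
and an unknown name yields 0, which the interpreter could report as an
undefined symbol.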
> I'm sorry, but please let me check for a moment.
>
Sure, no problem, please take your time.
Thanks,
Tao Liu
* Re: [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering
2025-08-11 0:04 ` Tao Liu
@ 2025-09-05 12:41 ` YAMAZAKI MASAMITSU(山崎 真光)
2025-09-09 1:56 ` Tao Liu
0 siblings, 1 reply; 26+ messages in thread
From: YAMAZAKI MASAMITSU(山崎 真光) @ 2025-09-05 12:41 UTC (permalink / raw)
To: Tao Liu
Cc: HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi Liu,
I'm sorry it took so long.
Your proposal regarding GPU memory is useful, but it may not
be a frequently used feature. I think it would be better to use an external
library, script, or option whenever possible.
As you already know, I am concerned that code related to btf, kallsyms and
maple tree has been ported. These newer features are appealing. However, I
feel that sharing code with crash will increase future maintenance effort
if we try to keep the two consistent. On the other hand, if the copies are
not kept identical, maintenance will be less frequent and the feature may
not be updated.
I also think that the eppic script would be better treated as an additional
script. Although some time has passed since the eppic function was added,
few new scripts have been added and updates to supported versions are limited.
I think it would be better to add documentation introducing the script as
a related script rather than including it directly in the makedumpfile
repository.
However, this proposal also includes fixes for some bugs and
speedups for existing eppic functions. It seems that these need to be
incorporated.
In other words, I think it's better to separate things that can be
separated.
Sincerely
* Re: [PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering
2025-09-05 12:41 ` YAMAZAKI MASAMITSU(山崎 真光)
@ 2025-09-09 1:56 ` Tao Liu
0 siblings, 0 replies; 26+ messages in thread
From: Tao Liu @ 2025-09-09 1:56 UTC (permalink / raw)
To: YAMAZAKI MASAMITSU(山崎 真光)
Cc: HAGIO KAZUHITO(萩尾 一仁),
kexec@lists.infradead.org, aravinda@linux.vnet.ibm.com,
devel@lists.crash-utility.osci.io
Hi YAMAZAKI,
Thanks a lot for your comments!
On Sat, Sep 6, 2025 at 12:41 AM YAMAZAKI MASAMITSU(山崎 真光)
<yamazaki-msmt@nec.com> wrote:
>
> Hi Liu
>
> I'm sorry it took so long.
No worries. :)
>
> Your proposed issue with regard to GPU memory is useful, but it may not
> be a frequently used feature. I think it would be better to use an external
> library, script, or option whenever possible.
To add some context, I have received several issue reports asking to
filter GPU memory pages out of the vmcore, because otherwise the vmcore is
too large. And I think this is a to-do feature, as GPUs are now used more
and more for AI training, and frankly we will have to address the
issue eventually.
>
> As you already know, I am concerned that code related to btf, kallsyms and
> maple tree has been ported. These newer features are appealing. However, I
> feel that sharing code with crash will increase future maintenance efforts
> if we try to maintain consistency. On the other hand, if they are not
> identical, maintenance will be less frequent and the feature may
> not be updated.
OK, understood. Then I will not make it share code with crash.
>
> I also think that the eppic script would be better treated as an additional
> script. Although some time has passed since the eppic function was added,
> the addition of new scripts and updates to supported versions are limited.
>
> I think it would be better to add documentation introducing the script as
> a related script rather than including it directly in the makedumpfile
> repository.
Yes, I totally agree. filter_amdgpu_mm_pages.c is just a POC to
demo the end result of the patchset; all the code
modifications serve this eppic script for filtering amdgpu's mm
pages. We won't ship the eppic script to everyone, because it is
not generic. My thought is that we give makedumpfile the
ability to filter GPU mm pages via BTF/eppic, and end users can use
this feature to address their specific cases, so it's the users' duty to
maintain/adapt the eppic scripts.
>
> However, this proposal also includes suggestions for fixing some bugs and
> speeding up existing eppic functions. It seems that these things need to be
> incorporated.
The bug fixes related to dwarf_info aren't a must-have for filtering
gpu mm pages via eppic. In fact, either dwarf or BTF can be the source
of debuginfo used to determine the details of kernel
data structures. In other words, we can do gpu mm page filtering via
(dwarf + eppic + maple_tree), or via (BTF/kallsyms + eppic +
maple_tree). Either path produces the final vmcore with gpu mm pages
filtered. But the 2nd path (BTF/kallsyms + eppic + maple_tree) is more
suitable for kdump in the 2nd kernel, because BTF/kallsyms are available
from the kernel (in memory & smaller), while dwarf is only available from
debuginfo files (on disk & larger). Also, we can compare the performance
of the 2 paths. That is why I added the dwarf_info bug fixes to
this patch set.
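As a rough sketch of the two paths (an assumption for illustration only, not
how the patchset actually selects its backend): the test commands in the
cover letter pass a vmlinux with -x for the dwarf run and omit it for the btf
run, so a selector could key off that. The names debuginfo_source,
pick_source and source_name are hypothetical.

```c
#include <stddef.h>

/* The two possible sources of type/symbol information. */
enum debuginfo_source {
	SRC_DWARF,        /* read from a vmlinux debuginfo file on disk */
	SRC_BTF_KALLSYMS, /* read from data carried in the vmcore itself */
};

/*
 * Hypothetical helper: use dwarf when a debuginfo file was given
 * (as with -x in the test commands), otherwise fall back to
 * btf/kallsyms.
 */
static enum debuginfo_source pick_source(const char *vmlinux_path)
{
	if (vmlinux_path != NULL && vmlinux_path[0] != '\0')
		return SRC_DWARF;
	return SRC_BTF_KALLSYMS;
}

/* Human-readable name of the chosen source, e.g. for log messages. */
static const char *source_name(enum debuginfo_source src)
{
	return src == SRC_DWARF ? "dwarf" : "btf/kallsyms";
}
```

Everything after the selection (eppic + maple_tree) stays the same on both
paths; only the provider of type and symbol information differs.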
>
> In other words, I think it's better to separate things that can be
> separated.
Yes, sure! I will try to separate them in v2.
Thanks,
Tao Liu
>
> Sincerely
>
> On 2025/08/11 9:04, Tao Liu wrote:
> > Hi YAMAZAKI,
> >
> > Thanks for your comments.
> >
> > On Thu, Aug 7, 2025 at 11:42 PM YAMAZAKI MASAMITSU(山崎 真光)
> > <yamazaki-msmt@nec.com> wrote:
> >> Thank you for the suggestion.
> >> I think it's a good idea,
> >> but eppic needs careful consideration.
> > Do you mean there are drawbacks of using eppic?
> >
> > From my side, one advantage of eppic is, users can use the kernel data
> > structures/variables directly without redefine it, like:
> >
> > struct task_struct *p;
> > p = (struct task_struct *)&init_task;
> >
> > Users don't need to include headers to define what task_struct looks
> > like, and can use global variable init_task without declaring it. The
> > callbacks of eppic can resolve these missing info via
> > dwarf/btf/kallsyms during runtime. I think this feature can make the
> > eppic scripts tidy and convenient. Also we can use other similar c
> > interpreters, but I haven't encountered one with the similar feature,
> > also considering eppic has been in makedumpfile for years, so people
> > may already be familiar with it...
> >
> >> I'm sorry, but please let me check for a moment.
> >>
> > Sure, no problem, please take your time.
> >
> > Thanks,
> > Tao Liu
> >
> >> Thanks,
> >> Masa
> >>
> >> On 2025/08/05 12:16, Tao Liu wrote:
> >>> Kindly ping...
> >>>
> >>> Any comments for this patchset?
> >>>
> >>> Thanks,
> >>> Tao Liu
> >>>
> >>>
> >>> On Tue, Jun 10, 2025 at 9:57 PM Tao Liu <ltao@redhat.com> wrote:
> >>>> A) This patchset will introduce the following features to makedumpfile:
> >>>>
> >>>> 1) Enable eppic script for memory pages filtering.
> >>>> 2) Enable btf and kallsyms for symbol type and address resolving.
> >>>> 3) Port maple tree data structures and functions, primarily used for
> >>>> vma iteration.
> >>>>
> >>>> B) The purpose of the features are:
> >>>>
> >>>> 1) Currently makedumpfile filters mm pages based on page flags, because flags
> >>>> can help to determine one page's usage. But this page-flag-checking method
> >>>> lacks of flexibility in certain cases, e.g. if we want to filter those mm
> >>>> pages occupied by GPU during vmcore dumping due to:
> >>>>
> >>>> a) GPU may be taking a large memory and contains sensitive data;
> >>>> b) GPU mm pages have no relations to kernel crash and useless for vmcore
> >>>> analysis.
> >>>>
> >>>> But there is no GPU mm page specific flags, and apparently we don't need
> >>>> to create one just for kdump use. A programmable filtering tool is more
> >>>> suitable for such cases. In addition, different GPU vendors may use
> >>>> different ways for mm pages allocating, programmable filtering is better
> >>>> than hard coding these GPU specific logics into makedumpfile in this case.
> >>>>
> >>>> 2) Currently makedumpfile already contains a programmable filtering tool, aka
> >>>> eppic script, which allows user to write customized code for data erasing.
> >>>> However it has the following drawbacks:
> >>>>
> >>>> a) cannot do mm page filtering.
> >>>> b) need to access to debuginfo of both kernel and modules, which is not
> >>>> applicable in the 2nd kernel.
> >>>> c) Poor performance, making vmcore dumping time unacceptable (See
> >>>> the following performance testing).
> >>>>
> >>>> makedumpfile need to resolve the dwarf data from debuginfo, to get symbols
> >>>> types and addresses. In recent kernel there are dwarf alternatives such
> >>>> as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
> >>>> are already packed within vmcore, so we can use it directly.
> >>>>
> >>>> 3) Maple tree data structures are used in recent kernels, such as vma
> >>>> iteration. So maple tree porting is needed.
> >>>>
> >>>> With these, this patchset introduces an upgraded eppic, which is based on
> >>>> btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> >>>> The following info shows its usage and performance, please note the tests
> >>>> are performed in 1st kernel:
> >>>>
> >>>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >>>> /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux
> >>>> --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> >>>> real 14m6.894s
> >>>> user 4m16.900s
> >>>> sys 9m44.695s
> >>>>
> >>>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
> >>>> /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
> >>>> real 0m10.672s
> >>>> user 0m9.270s
> >>>> sys 0m1.130s
> >>>>
> >>>> -rw------- 1 root root 367475074 Jun 10 18:06 btf.out
> >>>> -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out
> >>>> -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
> >>>>
> >>>> C) Discussion:
> >>>>
> >>>> 1) GPU types: Currently only tested with amdgpu's mm page filtering, others
> >>>> are not tested.
> >>>> 2) Code structure: Some similar code is shared by makedumpfile and
> >>>> crash, such as the maple tree data structure. I also plan to port the
> >>>> btf/kallsyms code to crash, so there will be code duplication between
> >>>> crash & makedumpfile. Since I haven't worked on the crash porting yet,
> >>>> code changes to btf/kallsyms are expected. How should we share the code:
> >>>> create a common library, or keep the duplication as it is?
> >>>> 3) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
> >>>> Others are not tested.
> >>>>
> >>>> D) Testing:
> >>>>
> >>>> 1) If you don't want to create your own vmcore, you can find a vmcore which
> >>>> I created with amdgpu mm pages unfiltered [1]; the amdgpu mm pages are
> >>>> allocated by program [2]. You can use the vmcore in the 1st kernel to filter
> >>>> the amdgpu mm pages with the previous performance testing cmdline. To
> >>>> verify the pages are filtered in crash:
> >>>>
> >>>> Unfiltered:
> >>>> crash> search -c "!QAZXSW@#EDC"
> >>>> ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >>>> ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >>>> crash> rd ffff96b7fa800000
> >>>> ffff96b7fa800000: 405753585a415121 !QAZXSW@
> >>>> crash> rd ffff96b87c800000
> >>>> ffff96b87c800000: 405753585a415121 !QAZXSW@
> >>>>
> >>>> Filtered:
> >>>> crash> search -c "!QAZXSW@#EDC"
> >>>> crash> rd ffff96b7fa800000
> >>>> rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR"
> >>>> crash> rd ffff96b87c800000
> >>>> rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR"
> >>>>
> >>>> 2) You can use eppic_scripts/print_all_vma.c against an ordinary vmcore to
> >>>> test only the btf/kallsyms functions by outputting all VMAs if no amdgpu
> >>>> vmcore/machine is available.
> >>>>
> >>>> [1]: https://people.redhat.com/~ltao/core/
> >>>> [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
> >>>>
> >>>> Tao Liu (10):
> >>>> dwarf_info: Support kernel address randomization
> >>>> dwarf_info: Fix a infinite recursion bug for search_domain
> >>>> Add page filtering function
> >>>> Add btf/kallsyms support for symbol type/address resolving
> >>>> Export necessary btf/kallsyms functions to eppic extension
> >>>> Port the maple tree data structures and functions
> >>>> Supporting main() as the entry of eppic script
> >>>> Enable page filtering for dwarf eppic
> >>>> Enable page filtering for btf/kallsyms eppic
> >>>> Introducing 2 eppic scripts to test the dwarf/btf eppic extension
> >>>>
> >>>> Makefile | 6 +-
> >>>> btf.c | 919 +++++++++++++++++++++++++
> >>>> btf.h | 176 +++++
> >>>> dwarf_info.c | 15 +-
> >>>> eppic_maple.c | 431 ++++++++++++
> >>>> eppic_maple.h | 8 +
> >>>> eppic_scripts/filter_amdgpu_mm_pages.c | 36 +
> >>>> eppic_scripts/print_all_vma.c | 29 +
> >>>> erase_info.c | 123 +++-
> >>>> erase_info.h | 22 +
> >>>> extension_btf.c | 218 ++++++
> >>>> extension_eppic.c | 41 +-
> >>>> extension_eppic.h | 6 +-
> >>>> kallsyms.c | 371 ++++++++++
> >>>> kallsyms.h | 42 ++
> >>>> makedumpfile.c | 21 +-
> >>>> makedumpfile.h | 11 +
> >>>> 17 files changed, 2448 insertions(+), 27 deletions(-)
> >>>> create mode 100644 btf.c
> >>>> create mode 100644 btf.h
> >>>> create mode 100644 eppic_maple.c
> >>>> create mode 100644 eppic_maple.h
> >>>> create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c
> >>>> create mode 100644 eppic_scripts/print_all_vma.c
> >>>> create mode 100644 extension_btf.c
> >>>> create mode 100644 kallsyms.c
> >>>> create mode 100644 kallsyms.h
> >>>>
> >>>> --
> >>>> 2.47.0
> >>>>
^ permalink raw reply [flat|nested] 26+ messages in thread