* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-25 9:10 parallel pahole hangs while building modules from nvidia-open-kernel-dkms Domenico Andreoli
@ 2025-03-25 11:32 ` Alan Maguire
2025-03-26 20:48 ` Ihor Solodrai
2025-03-28 20:25 ` Ihor Solodrai
From: Alan Maguire @ 2025-03-25 11:32 UTC (permalink / raw)
To: Domenico Andreoli, dwarves
On 25/03/2025 09:10, Domenico Andreoli wrote:
> Hi,
>
> This is a forward of Debian bug report [0], where you can find more
> details. At [1] and [2] you can get the kernel and module to reproduce.
> I could reproduce on both amd64 and arm64 using pahole 1.29.
>
> This is marked as serious severity because it makes the autobuilder hang
> as well [3].
>
> Could you please help?
>
I tried building https://github.com/NVIDIA/open-gpu-kernel-modules
with bpf-next and the latest pahole v1.29 (based on the next branch of
https://web.git.kernel.org/pub/scm/devel/pahole/pahole.git/)
using gcc 12, and everything worked fine, so I suspect I need to
reproduce your environment more closely.
Can you provide the following information to help reproduce this:
- the pahole head commit you tested with; it's described as 1.29-2, but I'd
like to understand exactly which commits are in that package
- which version of open-gpu-kernel-modules you used (which commit)
- the kernel your vmlinux was based on (which kernel tree, head
commit, etc.). The bug report says
/usr/src/linux-headers-6.12.17-amd64/vmlinux is used; does that
correspond to the 6.12.17 release of the stable tree?
- which version of gcc you're building with
The more closely we can reproduce this, the better. Unfortunately, just
having the module without the base vmlinux doesn't allow us to
run pahole, as we need both the base BTF and the module for module BTF generation.
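Module BTF is encoded as split BTF on top of the base vmlinux BTF: the module's own types get ids that continue after the base's last type id, and module types can refer to base types by id, which is why the base is needed. A minimal sketch of that numbering; the helper names are hypothetical (not libbpf or pahole functions), and the relocation that --btf_features=distilled_base adds is ignored:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers modelling split BTF numbering: a module's own
 * types are numbered after the last type id of the base (vmlinux) BTF.
 * Id 0 is reserved for void and belongs to the base.
 */

/* Id of the local_index-th (0-based) type defined by the module itself. */
static unsigned int module_type_id(unsigned int base_last_id,
				   unsigned int local_index)
{
	return base_last_id + 1 + local_index;
}

/* True if id refers into the base BTF rather than to a module type. */
static bool id_is_in_base(unsigned int base_last_id, unsigned int id)
{
	return id <= base_last_id;
}
```

For example, with a base whose last type id is 1000, the module's first own type gets id 1001, and any reference to an id of 1000 or below resolves into the base.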
It's possible that extracting the base BTF would be enough here;
running something like this on the vmlinux you used:
pahole -J -j --btf_features=default --lang_exclude=rust \
--btf_encode_detached=vmlinux.btf \
/usr/src/linux-headers-6.12.17-amd64/vmlinux
...would give us the vmlinux BTF, and we should then be able to
reproduce this more closely.
Thanks!
Alan
> Regards,
> Domenico
>
>
> The command to succeed:
>
> This simplified (sequential) command succeeds:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j1
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> 0
> ===== >8 =====
>
>
> While this (parallel) command hangs:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j2
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> Terminated
> 143
> ===== >8 =====
>
>
> [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1100503
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=vmlinux.zst;msg=19
> [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=nvidia-modeset.base.ko.zst;msg=12
> [3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101262
>
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-25 9:10 parallel pahole hangs while building modules from nvidia-open-kernel-dkms Domenico Andreoli
2025-03-25 11:32 ` Alan Maguire
@ 2025-03-26 20:48 ` Ihor Solodrai
2025-03-28 9:05 ` Domenico Andreoli
2025-03-28 20:25 ` Ihor Solodrai
From: Ihor Solodrai @ 2025-03-26 20:48 UTC (permalink / raw)
To: Domenico Andreoli, dwarves
On 3/25/25 2:10 AM, Domenico Andreoli wrote:
> Hi,
>
> This is a forward of Debian bug report [0], where you can find more
> details. At [1] and [2] you can get the kernel and module to reproduce.
> I could reproduce on both amd64 and arm64 using pahole 1.29.
>
> This is marked as serious severity because it makes the autobuilder hang
> as well [3].
>
> Could you please help?
>
> Regards,
> Domenico
Hi Domenico, thanks for the bug report.
I debugged the hanging, and it appears that "abort" handling in case
of a BTF encoding error was overlooked in recent changes to speed up
parallel encoding.
Could you please try the diff below, and check if it resolves the
hanging?
diff --git a/dwarf_loader.c b/dwarf_loader.c
index 84122d0..e1ba7bc 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3459,6 +3459,7 @@ static struct {
*/
uint32_t next_cu_id;
struct list_head jobs;
+ bool abort;
} cus_processing_queue;
enum job_type {
@@ -3479,6 +3480,7 @@ static void cus_queue__init(void)
pthread_cond_init(&cus_processing_queue.job_added, NULL);
INIT_LIST_HEAD(&cus_processing_queue.jobs);
cus_processing_queue.next_cu_id = 0;
+ cus_processing_queue.abort = false;
}
static void cus_queue__destroy(void)
@@ -3535,8 +3537,9 @@ static struct cu_processing_job *cus_queue__enqdeq_job(struct cu_processing_job
pthread_cond_signal(&cus_processing_queue.job_added);
}
for (;;) {
+ bool abort = __atomic_load_n(&cus_processing_queue.abort, __ATOMIC_SEQ_CST);
job = cus_queue__try_dequeue();
- if (job)
+ if (job || abort)
break;
/* No jobs or only steals out of order */
pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
@@ -3653,6 +3656,9 @@ static void *dwarf_loader__worker_thread(void *arg)
while (!stop) {
job = cus_queue__enqdeq_job(job);
+ if (!job)
+ goto out_abort;
+
switch (job->type) {
case JOB_DECODE:
@@ -3688,6 +3694,8 @@ static void *dwarf_loader__worker_thread(void *arg)
return (void *)DWARF_CB_OK;
out_abort:
+ __atomic_store_n(&cus_processing_queue.abort, true, __ATOMIC_SEQ_CST);
+ pthread_cond_signal(&cus_processing_queue.job_added);
return (void *)DWARF_CB_ABORT;
}
@@ -4028,7 +4036,7 @@ static int cus__process_file(struct cus *cus, struct conf_load *conf, int fd,
/* Process the one or more modules gleaned from this file. */
int err = dwfl_getmodules(dwfl, cus__process_dwflmod, &parms, 0);
- if (err < 0)
+ if (err)
return -1;
// We can't call dwfl_end(dwfl) here, as we keep pointers to strings
--
2.48.1
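As the diff suggests, the hang comes from worker threads that stay blocked in pthread_cond_wait() after the worker that hit the encoding error exits without waking them. A minimal sketch of that abort pattern in plain C with pthreads; the names (job_queue, queue_take, queue_abort, process_job) are hypothetical and a pending-job counter stands in for pahole's job list, so this is an illustration, not the dwarf_loader.c code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical queue: a pending-job counter stands in for a job list. */
struct job_queue {
	pthread_mutex_t mutex;
	pthread_cond_t job_added;
	int pending;    /* jobs not yet taken by a worker */
	bool abort;     /* set once any worker hits a fatal error */
};

struct worker_ctx {
	struct job_queue *q;
	int processed;  /* jobs taken by this worker */
};

static void queue_init(struct job_queue *q, int pending)
{
	pthread_mutex_init(&q->mutex, NULL);
	pthread_cond_init(&q->job_added, NULL);
	q->pending = pending;
	q->abort = false;
}

/* Wait for a job; return false if the queue was aborted instead. */
static bool queue_take(struct job_queue *q)
{
	bool got = false;

	pthread_mutex_lock(&q->mutex);
	while (!q->abort && q->pending == 0)
		pthread_cond_wait(&q->job_added, &q->mutex);
	if (!q->abort) {
		q->pending--;
		got = true;
	}
	pthread_mutex_unlock(&q->mutex);
	return got;
}

/* Mark the queue aborted and wake every waiting worker. */
static void queue_abort(struct job_queue *q)
{
	pthread_mutex_lock(&q->mutex);
	q->abort = true;
	pthread_cond_broadcast(&q->job_added);
	pthread_mutex_unlock(&q->mutex);
}

/* Hypothetical job handler: always fails, modelling an encoding error
 * such as an unsupported DWARF tag. */
static bool process_job(void)
{
	return false;
}

static void *worker(void *arg)
{
	struct worker_ctx *ctx = arg;

	while (queue_take(ctx->q)) {
		ctx->processed++;
		if (!process_job()) {
			queue_abort(ctx->q);  /* stop the others, then ourselves */
			break;
		}
	}
	return NULL;
}

/* Run n workers on q to completion; return the number of jobs taken. */
static int run_workers(struct job_queue *q, int n)
{
	pthread_t tid[MAX_WORKERS];
	struct worker_ctx ctx[MAX_WORKERS];
	int total = 0;

	assert(n > 0 && n <= MAX_WORKERS);
	for (int i = 0; i < n; i++) {
		ctx[i].q = q;
		ctx[i].processed = 0;
		pthread_create(&tid[i], NULL, worker, &ctx[i]);
	}
	for (int i = 0; i < n; i++) {
		pthread_join(tid[i], NULL);
		total += ctx[i].processed;
	}
	return total;
}
```

For example, queue_init(&q, 1) followed by run_workers(&q, 4) returns 1: one worker takes the only job and fails, and the broadcast releases the other three. If the failing worker returned without calling queue_abort(), the others would wait forever, much like the -j2 hang. Unlike the diff, which sets the flag with __atomic_store_n() and wakes a single waiter with pthread_cond_signal() (each woken worker then passes the wakeup on via out_abort), the sketch sets the flag while holding the mutex and uses pthread_cond_broadcast(), so a worker cannot check the flag, miss the wakeup, and then block.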
>
>
> The command to succeed:
>
> This simplified (sequential) command succeeds:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j1
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> 0
> ===== >8 =====
>
>
> While this (parallel) command hangs:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j2
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> Terminated
> 143
> ===== >8 =====
Please note that even though the sequential command exits with status 0,
the BTF output is going to be incomplete (and potentially invalid). The
underlying issue is a DW_TAG that the BTF encoder does not handle, and
encoding stops on errors like this.
It would help if you could provide all the input (base vmlinux and the
module) that led to this error, so we can investigate.
Thank you!
>
>
> [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1100503
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=vmlinux.zst;msg=19
> [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=nvidia-modeset.base.ko.zst;msg=12
> [3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101262
>
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-26 20:48 ` Ihor Solodrai
@ 2025-03-28 9:05 ` Domenico Andreoli
2025-03-28 16:25 ` Ihor Solodrai
From: Domenico Andreoli @ 2025-03-28 9:05 UTC (permalink / raw)
To: Ihor Solodrai; +Cc: dwarves
On Wed, Mar 26, 2025 at 08:48:51PM +0000, Ihor Solodrai wrote:
> On 3/25/25 2:10 AM, Domenico Andreoli wrote:
> > Hi,
> >
> > This is a forward of Debian bug report [0], where you can find more
> > details. At [1] and [2] you can get the kernel and module to reproduce.
> > I could reproduce on both amd64 and arm64 using pahole 1.29.
> >
> > This is marked as serious severity because it makes the autobuilder hang
> > as well [3].
> >
> > Could you please help?
> >
> > Regards,
> > Domenico
>
> Hi Domenico, thanks for the bug report.
Hi Ihor,
>
> I debugged the hanging, and it appears that "abort" handling in case
> of a BTF encoding error was overlooked in recent changes to speed up
> parallel encoding.
>
> Could you please try the diff below, and check if it resolves the
> hanging?
>
Yes, I tried it and the hanging is gone.
Now both parallel and sequential invocations fail with this error:
dwarf_expr: unhandled 0x12 DW_OP_ operation
Unsupported DW_TAG_reference_type(0x10): type: 0x28172
Error while encoding BTF.
dwarf_expr: unhandled 0x12 DW_OP_ operation
dwarf_expr: unhandled 0x12 DW_OP_ operation
dwarf_expr: unhandled 0x12 DW_OP_ operation
libbpf: failed to find '.BTF' ELF section in nvidia-modeset.ko
pahole: nvidia-modeset.ko: Invalid argument
I guess this is a separate issue that was simply masked by the previous bug.
> [...]
>
Is this patch final, or would you prefer that I wait for it to be reviewed and merged first?
I would apply it on top of Debian's 1.29 and release a new 1.29-3 package.
Thanks,
dom
--
rsa4096: 3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
ed25519: FFB4 0CC3 7F2E 091D F7DA 356E CC79 2832 ED38 CB05
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-28 9:05 ` Domenico Andreoli
@ 2025-03-28 16:25 ` Ihor Solodrai
2025-03-28 17:55 ` Ihor Solodrai
From: Ihor Solodrai @ 2025-03-28 16:25 UTC (permalink / raw)
To: Domenico Andreoli; +Cc: dwarves, alan.maguire, acme, andrii, bpf
On 3/28/25 2:05 AM, Domenico Andreoli wrote:
> On Wed, Mar 26, 2025 at 08:48:51PM +0000, Ihor Solodrai wrote:
>> On 3/25/25 2:10 AM, Domenico Andreoli wrote:
>>> [...]
>>
>> Hi Domenico, thanks for the bug report.
>
> Hi Ihor,
>
>>
>> I debugged the hanging, and it appears that "abort" handling in case
>> of a BTF encoding error was overlooked in recent changes to speed up
>> parallel encoding.
>>
>> Could you please try the diff below, and check if it resolves the
>> hanging?
>>
>
> Yes, I tried it and the hanging is gone.
>
> Now both parallel and sequential invocations fail with this error:
>
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> libbpf: failed to find '.BTF' ELF section in nvidia-modeset.ko
> pahole: nvidia-modeset.ko: Invalid argument
>
> I guess this is a separate issue that was simply masked by the previous bug.
The "invalid argument" message and non-0 exit code are now appearing
because the encoding error is correctly propagated with the patch I
sent you.
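The last hunk of the diff is what turns the abort into an error: as documented in elfutils' libdwfl.h, dwfl_getmodules() returns 0 once all modules have been visited, -1 on error, and a positive value (which can be passed back as the offset to resume the walk) when a callback returns something other than DWARF_CB_OK. An aborted walk is therefore positive, and `if (err < 0)` treats it as success. A minimal sketch of that convention, using a hypothetical mock_getmodules() and hypothetical callbacks instead of libdw itself:

```c
#include <assert.h>
#include <stddef.h>

/* Callback results, modelled on elfutils' DWARF_CB_OK / DWARF_CB_ABORT. */
enum { MOCK_CB_OK = 0, MOCK_CB_ABORT = 1 };

typedef int (*module_cb)(int module_index, void *arg);

/*
 * Hypothetical stand-in for dwfl_getmodules(): walks nmodules modules from
 * offset. Returns 0 when every module was visited, -1 on error, or a
 * positive offset to resume from when the callback stopped the walk.
 */
static ptrdiff_t mock_getmodules(int nmodules, module_cb cb, void *arg,
				 ptrdiff_t offset)
{
	assert(cb != NULL);
	if (offset < 0 || offset > nmodules)
		return -1;
	for (ptrdiff_t i = offset; i < nmodules; i++) {
		if (cb((int)i, arg) != MOCK_CB_OK)
			return i + 1;
	}
	return 0;
}

static int cb_ok(int idx, void *arg)
{
	(void)idx;
	(void)arg;
	return MOCK_CB_OK;
}

/* Models a worker that failed while encoding the first module. */
static int cb_abort(int idx, void *arg)
{
	(void)idx;
	(void)arg;
	return MOCK_CB_ABORT;
}

/* Old check from the diff: only negative values count as failure. */
static int process_file_old(module_cb cb)
{
	ptrdiff_t err = mock_getmodules(3, cb, NULL, 0);
	return err < 0 ? -1 : 0;
}

/* New check from the diff: errors and aborted walks both fail. */
static int process_file_new(module_cb cb)
{
	ptrdiff_t err = mock_getmodules(3, cb, NULL, 0);
	return err ? -1 : 0;
}
```

With cb_abort, process_file_old() returns 0 (the abort is silently swallowed) while process_file_new() returns -1, which is the behaviour change that now surfaces as "Invalid argument" and a non-zero exit code.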
The root cause of the problem you're seeing is DWARF input that pahole
does not handle. The resulting BTF is wrong in all cases: with or without
the patch, parallel or sequential.
The important part of the error message is:
Unsupported DW_TAG_reference_type(0x10): type: 0x28172
This makes the BTF encoder abort.
Could you please share the base vmlinux passed to the command that
produces this error? Maybe you could provide instructions on how to
build it?
>
>> [...]
>>
>
> Is this patch final, or would you prefer that I wait for it to be reviewed and merged first?
I'd prefer that the patch be reviewed and merged into pahole/next
first; I'll submit it separately.
But as I said, it fixes only the hang, not the root cause of
the error.
>
> I would apply it on top of Debian's 1.29 and release a new 1.29-3 package.
>
> Thanks,
> dom
>
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-28 16:25 ` Ihor Solodrai
@ 2025-03-28 17:55 ` Ihor Solodrai
From: Ihor Solodrai @ 2025-03-28 17:55 UTC (permalink / raw)
To: Domenico Andreoli; +Cc: dwarves, alan.maguire, acme, andrii, bpf
On 3/28/25 9:25 AM, Ihor Solodrai wrote:
> [...]
>
> Could you please share the base vmlinux passed to the command that
> produces this error? Maybe you could provide instructions on how to
> build it?
I just noticed that you shared some input files in the original
report. I'm going to try and reproduce the issue with that, thanks.
>> [...]
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-25 9:10 parallel pahole hangs while building modules from nvidia-open-kernel-dkms Domenico Andreoli
2025-03-25 11:32 ` Alan Maguire
2025-03-26 20:48 ` Ihor Solodrai
@ 2025-03-28 20:25 ` Ihor Solodrai
2025-03-31 13:17 ` Alan Maguire
From: Ihor Solodrai @ 2025-03-28 20:25 UTC (permalink / raw)
To: Domenico Andreoli, alan.maguire, acme; +Cc: dwarves, bpf
On 3/25/25 2:10 AM, Domenico Andreoli wrote:
> Hi,
>
> This is a forward of Debian bug report [0], where you can find more
> details. At [1] and [2] you can get the kernel and module to reproduce.
> I could reproduce on both amd64 and arm64 using pahole 1.29.
>
> This is marked as serious severity because it makes the autobuilder hang
> as well [3].
>
> Could you please help?
>
> Regards,
> Domenico
>
>
> The command to succeed:
>
> This simplified (sequential) command succeeds:
>
> cp nvidia-modeset.base.ko nvidia-modeset.ko
> LLVM_OBJCOPY="x86_64-linux-gnu-objcopy" pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base vmlinux nvidia-modeset.ko -j1
> echo $?
>
> producing this output:
> ===== 8< =====
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
Domenico, Alan, Arnaldo,
I was able to reproduce this error using the input files provided by
Domenico [1][2].
./build/pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base debian-repro/vmlinux debian-repro/nvidia-modeset.base.ko -j1
dwarf_expr: unhandled 0x12 DW_OP_ operation
Unsupported DW_TAG_reference_type(0x10): type: 0x28172
Error while encoding BTF.
libbpf: failed to find '.BTF' ELF section in debian-repro/nvidia-modeset.base.ko
pahole: debian-repro/nvidia-modeset.base.ko: Invalid argument
The unhandled tag comes from src/common/displayport/src/dp_auxretry.cpp
[3] in nvidia-modeset.base.ko.
Now, as far as I know, BTF can't represent C++-style references
directly (maybe indirectly with tags?).
According to the code, pahole simply bails out when it encounters
`DW_TAG_reference_type` during BTF encoding. So the question is: why is
BTF generation even attempted for a module written in C++? It does
not appear to be a supported use case.
Please correct me if I'm wrong about this.
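To illustrate why the encoder has to stop: BTF has kinds for pointers, const, volatile, typedefs, structs and so on, but none for C++ lvalue references (DW_TAG_reference_type, 0x10, the tag in the error message) or rvalue references (DW_TAG_rvalue_reference_type, 0x42). A minimal sketch of such a tag-to-kind mapping; the enum and function names are hypothetical and this is not pahole's btf_encoder code:

```c
#include <assert.h>

/* DWARF type tags (values from the DWARF specification). */
#define DW_TAG_class_type            0x02
#define DW_TAG_pointer_type          0x0f
#define DW_TAG_reference_type        0x10
#define DW_TAG_structure_type        0x13
#define DW_TAG_typedef               0x16
#define DW_TAG_base_type             0x24
#define DW_TAG_const_type            0x26
#define DW_TAG_volatile_type         0x35
#define DW_TAG_rvalue_reference_type 0x42

/* A subset of BTF kinds (values as in the kernel's uapi linux/btf.h). */
enum sketch_btf_kind {
	SKETCH_KIND_INT      = 1,
	SKETCH_KIND_PTR      = 2,
	SKETCH_KIND_STRUCT   = 4,
	SKETCH_KIND_TYPEDEF  = 8,
	SKETCH_KIND_VOLATILE = 9,
	SKETCH_KIND_CONST    = 10,
};

/* Map a DWARF type tag to a BTF kind, or return -1 if BTF has no
 * equivalent and encoding must stop. Simplified: real base types can also
 * map to BTF_KIND_FLOAT, and treating classes as structs is an assumption. */
static int btf_kind_for_tag(unsigned int tag)
{
	switch (tag) {
	case DW_TAG_base_type:
		return SKETCH_KIND_INT;
	case DW_TAG_pointer_type:
		return SKETCH_KIND_PTR;
	case DW_TAG_structure_type:
	case DW_TAG_class_type:
		return SKETCH_KIND_STRUCT;
	case DW_TAG_typedef:
		return SKETCH_KIND_TYPEDEF;
	case DW_TAG_volatile_type:
		return SKETCH_KIND_VOLATILE;
	case DW_TAG_const_type:
		return SKETCH_KIND_CONST;
	case DW_TAG_reference_type:         /* C++ lvalue reference, T & */
	case DW_TAG_rvalue_reference_type:  /* C++ rvalue reference, T && */
	default:
		return -1;
	}
}
```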
Alan, sorry for jumping into this uninvited. I trust you'll take over
from here. Thanks!
I've sent a patch with a fix for the hanging [4].
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=vmlinux.zst;msg=19
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=nvidia-modeset.base.ko.zst;msg=12
[3] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/displayport/src/dp_auxretry.cpp
[4] https://lore.kernel.org/bpf/20250328174003.3945581-1-ihor.solodrai@linux.dev/
> Error while encoding BTF.
> 0
> ===== >8 =====
>
> [...]
* Re: parallel pahole hangs while building modules from nvidia-open-kernel-dkms
2025-03-28 20:25 ` Ihor Solodrai
@ 2025-03-31 13:17 ` Alan Maguire
From: Alan Maguire @ 2025-03-31 13:17 UTC (permalink / raw)
To: Ihor Solodrai, Domenico Andreoli, acme; +Cc: dwarves, bpf
On 28/03/2025 20:25, Ihor Solodrai wrote:
> On 3/25/25 2:10 AM, Domenico Andreoli wrote:
>> [...]
>>
>> producing this output:
>> ===== 8< =====
>> dwarf_expr: unhandled 0x12 DW_OP_ operation
>> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
>
> Domenico, Alan, Arnaldo,
>
> I was able to reproduce this error using the input files provided by
> Domenico [1][2].
>
> ./build/pahole -J --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_features=distilled_base --btf_base debian-repro/vmlinux debian-repro/nvidia-modeset.base.ko -j1
> dwarf_expr: unhandled 0x12 DW_OP_ operation
> Unsupported DW_TAG_reference_type(0x10): type: 0x28172
> Error while encoding BTF.
> libbpf: failed to find '.BTF' ELF section in debian-repro/nvidia-modeset.base.ko
> pahole: debian-repro/nvidia-modeset.base.ko: Invalid argument
>
>
> The unhandled tag comes from src/common/displayport/src/dp_auxretry.cpp
> [3] in nvidia-modeset.base.ko.
>
> Now, as far as I know, BTF can't represent C++-style references
> directly (maybe indirectly with tags?).
>
> According to the code, pahole simply bails out when it encounters
> `DW_TAG_reference_type` during BTF encoding. So the question is: why is
> BTF generation even attempted for a module written in C++? It does
> not appear to be a supported use case.
>
> Please correct me if I'm wrong about this.
>
> Alan, sorry for jumping into this uninvited. I trust you'll take over
> from here. Thanks!
>
> I've sent a patch with a fix for the hanging [4].
>
Thanks for the update; I'll test the fix shortly. I can now reproduce
this failure at my end too. For me, adding --lang_exclude=c++11
resolves the issue and BTF encoding succeeds, i.e. the following:
pahole -J \
--btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs \
--btf_features=distilled_base --lang_exclude=rust,c++11 \
--btf_base vmlinux nvidia-modeset.base.ko
...works. I see, though, that pahole has some support for C++
constructs, so we should also figure out what we can and should support.
In this particular case, the reference types and rvalue reference types
all appear to be top-level DWARF tags, e.g.
<1></>: Abbrev Number: 16 (DW_TAG_reference_type)
<2b6053> DW_AT_byte_size : 8
<2b6053> DW_AT_type : <0x2b5805>
...refers to a const type:
<2><2b5805>: Abbrev Number: 10 (DW_TAG_const_type)
<2b5806> DW_AT_type : <0x2b561f>
...which in turn refers to the Buffer class:
<2><2b561f>: Abbrev Number: 43 (DW_TAG_class_type)
<2b5620> DW_AT_name : (indirect string, offset: 0x20bf79): Buffer
<2b5624> DW_AT_byte_size : 24
<2b5625> DW_AT_decl_file : 10
<2b5626> DW_AT_decl_line : 38
<2b5627> DW_AT_decl_column : 11
<2b5627> DW_AT_sibling : <0x2b5805>
I tried the simple approach of skipping them during BTF encoding, but we
then fail during deduplication, which tells us they are being swept up
into the type hierarchy.
Further investigation shows the DW_TAG_subprogram associated with Buffer
refers to the above reference type as a formal parameter.
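One plausible way that skipping the reference types goes wrong, sketched with a hypothetical type table (made-up ids, not real BTF or pahole data structures): dropping a type without rewriting its users leaves the constructor's parameter pointing at a type that no longer exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type table: each entry may refer to one other type by id
 * (0 means no reference, as id 0 is void in BTF). */
struct type_entry {
	unsigned int id;
	unsigned int ref;   /* id of the referenced type, or 0 */
	bool skipped;       /* dropped by the encoder */
};

/* Chain from the DWARF dump above, with made-up ids:
 * class Buffer <- const Buffer <- const Buffer & <- constructor parameter. */
static struct type_entry buffer_chain[] = {
	{ .id = 1, .ref = 0, .skipped = false },  /* class Buffer */
	{ .id = 2, .ref = 1, .skipped = false },  /* const Buffer */
	{ .id = 3, .ref = 2, .skipped = true  },  /* const Buffer & (skipped) */
	{ .id = 4, .ref = 3, .skipped = false },  /* Buffer(const Buffer &) param */
};

/* Return true if a kept entry refers to an entry that was skipped. */
static bool has_dangling_ref(const struct type_entry *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (t[i].skipped || t[i].ref == 0)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (t[j].id == t[i].ref && t[j].skipped)
				return true;
		}
	}
	return false;
}
```

With the reference entry marked as skipped, has_dangling_ref() reports the parameter's dangling link; handling these types properly would mean either encoding the reference somehow or rewriting every user of it, not just dropping it.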
I think these are class constructors, possibly the copy constructor in
https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/displayport/inc/dp_buffer.h
i.e.
Buffer(const Buffer & other);
In C++, & denotes an lvalue reference type (&& denotes an rvalue reference
type, which we'd need to handle too).
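Conceptually, the --lang_exclude=rust,c++11 workaround Alan describes keeps the C++ compilation units away from the BTF encoder by filtering on each CU's DW_AT_language value. A minimal sketch of that kind of filter; the language-name table and helper names are illustrative assumptions, not pahole's actual option parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* DW_AT_language codes from the DWARF 5 specification (subset). */
#define DW_LANG_C89            0x0001
#define DW_LANG_C              0x0002
#define DW_LANG_C_plus_plus    0x0004
#define DW_LANG_C99            0x000c
#define DW_LANG_C_plus_plus_11 0x001a
#define DW_LANG_Rust           0x001c
#define DW_LANG_C11            0x001d
#define DW_LANG_C_plus_plus_14 0x0021

#define MAX_EXCLUDED 8

/* Illustrative name table; not pahole's actual list of language names. */
static const struct {
	const char *name;
	unsigned int code;
} lang_names[] = {
	{ "c89", DW_LANG_C89 },
	{ "c", DW_LANG_C },
	{ "c++", DW_LANG_C_plus_plus },
	{ "c99", DW_LANG_C99 },
	{ "c++11", DW_LANG_C_plus_plus_11 },
	{ "rust", DW_LANG_Rust },
	{ "c11", DW_LANG_C11 },
	{ "c++14", DW_LANG_C_plus_plus_14 },
};

struct lang_filter {
	unsigned int codes[MAX_EXCLUDED];
	size_t n;
};

/* Parse a list such as "rust,c++11"; false on an unknown name or overflow. */
static bool lang_filter_parse(struct lang_filter *f, const char *list)
{
	size_t nnames = sizeof(lang_names) / sizeof(lang_names[0]);
	char buf[128];

	assert(f != NULL && list != NULL);
	f->n = 0;
	if (strlen(list) >= sizeof(buf))
		return false;
	strcpy(buf, list);
	for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		size_t i;

		for (i = 0; i < nnames; i++)
			if (strcmp(tok, lang_names[i].name) == 0)
				break;
		if (i == nnames || f->n == MAX_EXCLUDED)
			return false;
		f->codes[f->n++] = lang_names[i].code;
	}
	return true;
}

/* True if a CU with this DW_AT_language value should be skipped. */
static bool lang_filter_excludes(const struct lang_filter *f,
				 unsigned int cu_lang)
{
	for (size_t i = 0; i < f->n; i++)
		if (f->codes[i] == cu_lang)
			return true;
	return false;
}
```

In this sketch "c++11" matches only DW_LANG_C_plus_plus_11 exactly, so a CU tagged DW_LANG_C_plus_plus_14 would still be encoded; which code a given CU carries depends on the compiler version and -std setting.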
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=vmlinux.zst;msg=19
> [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1100503;filename=nvidia-modeset.base.ko.zst;msg=12
> [3] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/displayport/src/dp_auxretry.cpp
> [4] https://lore.kernel.org/bpf/20250328174003.3945581-1-ihor.solodrai@linux.dev/
>
>> Error while encoding BTF.
>> 0
>> ===== >8 =====
>>
>> [...]
>