* [PATCH dwarves] dwarf_loader: fix termination on BTF encoding error
@ 2025-03-28 17:40 Ihor Solodrai
From: Ihor Solodrai @ 2025-03-28 17:40 UTC
To: dwarves, bpf
Cc: alan.maguire, domenico.andreoli, acme, andrii, eddyz87, mykolal,
kernel-team

When BTF encoding thread aborts because of an error, dwarf loader
worker threads get stuck in cus_queue__enqdeq_job() at:

    pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);

To avoid this, introduce an abort flag into cus_processing_queue, and
atomically check for it in the deq loop. The flag is only set in case
of a worker thread exiting on error. Make sure to pthread_cond_signal
to the waiting threads to let them exit too.

In cus__process_file fix the check of an error returned from
dwfl_getmodules: it may return a positive number when a
callback (cus__process_dwflmod in our case) returns an error.

Link: https://lore.kernel.org/dwarves/Z-JzFrXaopQCYd6h@localhost/

Reported-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 dwarf_loader.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 84122d0..e1ba7bc 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3459,6 +3459,7 @@ static struct {
 	 */
 	uint32_t next_cu_id;
 	struct list_head jobs;
+	bool abort;
 } cus_processing_queue;

 enum job_type {
@@ -3479,6 +3480,7 @@ static void cus_queue__init(void)
 	pthread_cond_init(&cus_processing_queue.job_added, NULL);
 	INIT_LIST_HEAD(&cus_processing_queue.jobs);
 	cus_processing_queue.next_cu_id = 0;
+	cus_processing_queue.abort = false;
 }

 static void cus_queue__destroy(void)
@@ -3535,8 +3537,9 @@ static struct cu_processing_job *cus_queue__enqdeq_job(struct cu_processing_job
 		pthread_cond_signal(&cus_processing_queue.job_added);
 	}
 	for (;;) {
+		bool abort = __atomic_load_n(&cus_processing_queue.abort, __ATOMIC_SEQ_CST);
 		job = cus_queue__try_dequeue();
-		if (job)
+		if (job || abort)
 			break;
 		/* No jobs or only steals out of order */
 		pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
@@ -3653,6 +3656,9 @@ static void *dwarf_loader__worker_thread(void *arg)

 	while (!stop) {
 		job = cus_queue__enqdeq_job(job);
+		if (!job)
+			goto out_abort;
+
 		switch (job->type) {

 		case JOB_DECODE:
@@ -3688,6 +3694,8 @@ static void *dwarf_loader__worker_thread(void *arg)

 	return (void *)DWARF_CB_OK;
 out_abort:
+	__atomic_store_n(&cus_processing_queue.abort, true, __ATOMIC_SEQ_CST);
+	pthread_cond_signal(&cus_processing_queue.job_added);
 	return (void *)DWARF_CB_ABORT;
 }

@@ -4028,7 +4036,7 @@ static int cus__process_file(struct cus *cus, struct conf_load *conf, int fd,

 	/* Process the one or more modules gleaned from this file. */
 	int err = dwfl_getmodules(dwfl, cus__process_dwflmod, &parms, 0);
-	if (err < 0)
+	if (err)
 		return -1;

 	// We can't call dwfl_end(dwfl) here, as we keep pointers to strings
--
2.48.1
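
For readers unfamiliar with the pattern, the abort-flag idea described in the commit message can be illustrated in isolation. The following is a minimal sketch, not pahole's code: `demo_queue`, `demo_queue_take`, `demo_queue_abort`, `demo_worker` and `demo_run` are hypothetical names invented for this example. It also deviates from the patch on purpose: the flag is set while holding the mutex and waiters are woken with pthread_cond_broadcast, whereas the patch uses an atomic flag plus pthread_cond_signal and relies on each exiting worker signalling the next one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature work queue; names are illustrative only. */
struct demo_queue {
	pthread_mutex_t mutex;
	pthread_cond_t job_added;
	int pending;	/* number of queued jobs */
	bool abort;	/* set once when a worker fails */
};

static void demo_queue_init(struct demo_queue *q)
{
	pthread_mutex_init(&q->mutex, NULL);
	pthread_cond_init(&q->job_added, NULL);
	q->pending = 0;
	q->abort = false;
}

/* Returns true if a job was taken, false if the queue was aborted. */
static bool demo_queue_take(struct demo_queue *q)
{
	bool got = false;

	pthread_mutex_lock(&q->mutex);
	for (;;) {
		if (q->abort)
			break;		/* stop waiting, let the caller exit */
		if (q->pending > 0) {
			q->pending--;
			got = true;
			break;
		}
		/* Nothing to do yet: sleep until a job or an abort arrives. */
		pthread_cond_wait(&q->job_added, &q->mutex);
	}
	pthread_mutex_unlock(&q->mutex);
	return got;
}

/* Set the flag under the mutex and wake every waiter. */
static void demo_queue_abort(struct demo_queue *q)
{
	pthread_mutex_lock(&q->mutex);
	q->abort = true;
	pthread_cond_broadcast(&q->job_added);
	pthread_mutex_unlock(&q->mutex);
}

static void *demo_worker(void *arg)
{
	struct demo_queue *q = arg;

	while (demo_queue_take(q))
		;	/* a real worker would process the job here */
	return NULL;
}

/* Start up to 16 workers on an empty queue, abort, and count the joins. */
static int demo_run(int n)
{
	struct demo_queue q;
	pthread_t tids[16];
	int i, started, joined = 0;

	if (n > 16)
		n = 16;
	demo_queue_init(&q);
	for (i = 0; i < n; i++)
		if (pthread_create(&tids[i], NULL, demo_worker, &q) != 0)
			break;
	started = i > 0 ? i : 0;
	demo_queue_abort(&q);
	for (i = 0; i < started; i++)
		if (pthread_join(tids[i], NULL) == 0)
			joined++;
	pthread_cond_destroy(&q.job_added);
	pthread_mutex_destroy(&q.mutex);
	return joined;
}
```

Calling `demo_run(4)` starts four workers that all block on an empty queue; without the abort check in the wait loop, the joins would never return. Setting the flag under the mutex is one way to rule out a missed wakeup between the flag check and pthread_cond_wait; whether that matters for pahole's queue depends on locking details not shown in this thread.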

* Re: [PATCH dwarves] dwarf_loader: fix termination on BTF encoding error
From: Alan Maguire @ 2025-04-01 12:57 UTC
To: Ihor Solodrai, dwarves, bpf
Cc: domenico.andreoli, acme, andrii, eddyz87, mykolal, kernel-team
On 28/03/2025 17:40, Ihor Solodrai wrote:
> When BTF encoding thread aborts because of an error, dwarf loader
> worker threads get stuck in cus_queue__enqdeq_job() at:
>
> pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
>
> To avoid this, introduce an abort flag into cus_processing_queue, and
> atomically check for it in the deq loop. The flag is only set in case
> of a worker thread exiting on error. Make sure to pthread_cond_signal
> to the waiting threads to let them exit too.
>
> In cus__process_file fix the check of an error returned from
> dwfl_getmodules: it may return a positive number when a
> callback (cus__process_dwflmod in our case) returns an error.
>
> Link: https://lore.kernel.org/dwarves/Z-JzFrXaopQCYd6h@localhost/
>
> Reported-by: Domenico Andreoli <domenico.andreoli@linux.com>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>

Thanks for the fix! I've tested this with the problematic module+vmlinux
BTF, and the previously hanging pahole now goes on to fail as expected. I
also ran it through the work-in-progress CI, building and testing on x86_64
and aarch64, and found no issues. If anyone else has a chance to ack or
test it, that would be great. Thanks!

Alan
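
The "fail as expected" behaviour also depends on the second half of the patch, the changed check on the dwfl_getmodules return value. The sketch below illustrates why `err < 0` can miss a callback failure, using a hypothetical stand-in walker (`demo_walk`) instead of libdwfl itself. Its return convention (0 when done, negative on an internal error, positive when a callback stops the walk) is an assumption modelled on the patch description, and all `demo_*` names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_CB_OK = 0, DEMO_CB_ABORT = 1 };	/* illustrative values */

typedef int (*demo_cb)(int module, void *arg);

/*
 * Hypothetical walker with a dwfl_getmodules-like convention:
 *   0   all modules visited
 *  -1   internal error
 *  >0   a callback stopped the walk (offset to resume from)
 */
static long demo_walk(int nr_modules, demo_cb cb, void *arg)
{
	if (nr_modules < 0)
		return -1;
	for (int i = 0; i < nr_modules; i++)
		if (cb(i, arg) != DEMO_CB_OK)
			return i + 1;
	return 0;
}

static int demo_always_ok(int module, void *arg)
{
	(void)module;
	(void)arg;
	return DEMO_CB_OK;
}

static int demo_fail_on_second(int module, void *arg)
{
	(void)arg;
	return module == 1 ? DEMO_CB_ABORT : DEMO_CB_OK;
}

/* Old-style check: only negative values count as failure. */
static int demo_process_old(int nr_modules, demo_cb cb)
{
	long err = demo_walk(nr_modules, cb, NULL);

	if (err < 0)
		return -1;
	return 0;	/* a callback abort is silently treated as success */
}

/* New-style check: any non-zero value counts as failure. */
static int demo_process_new(int nr_modules, demo_cb cb)
{
	long err = demo_walk(nr_modules, cb, NULL);

	if (err)
		return -1;
	return 0;
}
```

With a callback that aborts on the second module, `demo_process_old` reports success while `demo_process_new` reports failure; both agree when every callback succeeds or when the walker itself fails.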

* Re: [PATCH dwarves] dwarf_loader: fix termination on BTF encoding error
From: Domenico Andreoli @ 2025-04-01 13:43 UTC
To: Alan Maguire
Cc: Ihor Solodrai, dwarves, bpf, acme, andrii, eddyz87, mykolal,
kernel-team
On Tue, Apr 01, 2025 at 01:57:25PM +0100, Alan Maguire wrote:
> On 28/03/2025 17:40, Ihor Solodrai wrote:
> > When BTF encoding thread aborts because of an error, dwarf loader
> > worker threads get stuck in cus_queue__enqdeq_job() at:
> >
> > pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
> >
> > To avoid this, introduce an abort flag into cus_processing_queue, and
> > atomically check for it in the deq loop. The flag is only set in case
> > of a worker thread exiting on error. Make sure to pthread_cond_signal
> > to the waiting threads to let them exit too.
> >
> > In cus__process_file fix the check of an error returned from
> > dwfl_getmodules: it may return a positive number when a
> > callback (cus__process_dwflmod in our case) returns an error.
> >
> > Link: https://lore.kernel.org/dwarves/Z-JzFrXaopQCYd6h@localhost/
> >
> > Reported-by: Domenico Andreoli <domenico.andreoli@linux.com>
> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>
> Thanks for the fix! I've tested this with the problematic module+vmlinux
> BTF, and the previously hanging pahole now goes on to fail as expected. I
> also ran it through the work-in-progress CI, building and testing on x86_64
> and aarch64, and found no issues. If anyone else has a chance to ack or
> test it, that would be great. Thanks!

Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>

I rebuilt the Debian package with the patch applied, and it then started
to fail consistently because of the extra C++ symbols.

With the switch --lang_exclude=rust,c++11, it works without errors.

Thank you Alan and Ihor for the fast support!

Dom

--
rsa4096: 3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
ed25519: FFB4 0CC3 7F2E 091D F7DA 356E CC79 2832 ED38 CB05

* Re: [PATCH dwarves] dwarf_loader: fix termination on BTF encoding error
From: Alan Maguire @ 2025-04-02 9:56 UTC
To: Domenico Andreoli
Cc: Ihor Solodrai, dwarves, bpf, acme, andrii, eddyz87, mykolal,
kernel-team
On 01/04/2025 14:43, Domenico Andreoli wrote:
> On Tue, Apr 01, 2025 at 01:57:25PM +0100, Alan Maguire wrote:
>> On 28/03/2025 17:40, Ihor Solodrai wrote:
>>> When BTF encoding thread aborts because of an error, dwarf loader
>>> worker threads get stuck in cus_queue__enqdeq_job() at:
>>>
>>> pthread_cond_wait(&cus_processing_queue.job_added, &cus_processing_queue.mutex);
>>>
>>> To avoid this, introduce an abort flag into cus_processing_queue, and
>>> atomically check for it in the deq loop. The flag is only set in case
>>> of a worker thread exiting on error. Make sure to pthread_cond_signal
>>> to the waiting threads to let them exit too.
>>>
>>> In cus__process_file fix the check of an error returned from
>>> dwfl_getmodules: it may return a positive number when a
>>> callback (cus__process_dwflmod in our case) returns an error.
>>>
>>> Link: https://lore.kernel.org/dwarves/Z-JzFrXaopQCYd6h@localhost/
>>>
>>> Reported-by: Domenico Andreoli <domenico.andreoli@linux.com>
>>> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>>
>> Thanks for the fix! I've tested this with the problematic module+vmlinux
>> BTF, and the previously hanging pahole now goes on to fail as expected. I
>> also ran it through the work-in-progress CI, building and testing on x86_64
>> and aarch64, and found no issues. If anyone else has a chance to ack or
>> test it, that would be great. Thanks!
>
> Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
>
> I rebuilt the Debian package with the patch applied, and it then started
> to fail consistently because of the extra C++ symbols.
>
> With the switch --lang_exclude=rust,c++11, it works without errors.
>
> Thank you Alan and Ihor for the fast support!
>
Fix applied to next branch at
https://git.kernel.org/pub/scm/devel/pahole/pahole.git , thanks!
Alan