* [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue
@ 2026-03-17 4:33 Nithurshen
2026-03-17 4:38 ` Nithurshen Karthikeyan
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Nithurshen @ 2026-03-17 4:33 UTC (permalink / raw)
To: linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28, Nithurshen
Currently, erofs_destroy_workqueue returns immediately if a single
pthread_join fails. This halts the teardown process, potentially
leaving orphaned threads and leaking the workqueue's mutexes and
worker array.
Refactor the joining logic to unconditionally join all worker
threads. Capture the first error encountered, continue joining the
remaining threads, and ensure all workqueue resources are properly
freed before returning the captured error.
Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
---
lib/workqueue.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/lib/workqueue.c b/lib/workqueue.c
index 18ee0f9..23eb460 100644
--- a/lib/workqueue.c
+++ b/lib/workqueue.c
@@ -42,6 +42,8 @@ static void *worker_thread(void *arg)
int erofs_destroy_workqueue(struct erofs_workqueue *wq)
{
+ int err = 0;
+
if (!wq)
return -EINVAL;
@@ -53,15 +55,17 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
while (wq->nworker) {
int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
- if (ret)
- return ret;
+ if (ret && !err)
+ err = ret;
+
--wq->nworker;
}
free(wq->workers);
pthread_mutex_destroy(&wq->lock);
pthread_cond_destroy(&wq->cond_empty);
pthread_cond_destroy(&wq->cond_full);
- return 0;
+
+ return err;
}
int erofs_alloc_workqueue(struct erofs_workqueue *wq, unsigned int nworker,
--
2.51.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-17 4:33 [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue Nithurshen
@ 2026-03-17 4:38 ` Nithurshen Karthikeyan
2026-03-18 2:15 ` 赵逸凡
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Nithurshen Karthikeyan @ 2026-03-17 4:38 UTC (permalink / raw)
To: linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28
Subject: Re: [PATCH] erofs-utils: fix thread join loop in
erofs_destroy_workqueue
Hi,
I have submitted a patch to fix the thread joining logic within
erofs_destroy_workqueue. To ensure the robustness of the teardown
process and guarantee no resources are leaked, I performed the
following verification and testing:
1) I built mkfs.erofs and successfully compressed a test directory
containing various text files using lz4 compression. The image
build completed successfully without any hangs or crashes.
2) I ran the same mkfs.erofs compression workload through Valgrind
with --leak-check=full and --show-leak-kinds=all. Valgrind
reported that "All heap blocks were freed -- no leaks are
possible," confirming that the modified teardown loop correctly
frees the worker array and other resources.
3) To ensure no orphaned threads or synchronization issues occur
during the destruction phase, I ran the workload through Helgrind.
The workqueue teardown sequence executed cleanly. (Note:
Helgrind did catch a pre-existing benign data race in
compressor_lz4_init unrelated to this patch which I am happy to
send a patch for).
4) Finally, I ran the built-in ./contrib/mkstress.sh script to verify
behavior under heavier loads, which also completed without issues.
Please let me know if you need any additional testing logs or
further modifications to the patch.
Thanks,
Nithurshen
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-17 4:33 [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue Nithurshen
2026-03-17 4:38 ` Nithurshen Karthikeyan
@ 2026-03-18 2:15 ` 赵逸凡
2026-03-18 3:10 ` Gao Xiang
2026-03-18 4:03 ` [PATCH v2] " Nithurshen
2026-03-18 6:03 ` [PATCH v3] " Nithurshen
3 siblings, 1 reply; 9+ messages in thread
From: 赵逸凡 @ 2026-03-18 2:15 UTC (permalink / raw)
To: Nithurshen, linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28
On 3/17/2026 12:33 PM, Nithurshen wrote:
> Currently, erofs_destroy_workqueue returns immediately if a single
> pthread_join fails. This halts the teardown process, potentially
> leaving orphaned threads and leaking the workqueue's mutexes and
> worker array.
>
> Refactor the joining logic to unconditionally join all worker
> threads. Capture the first error encountered, continue joining the
> remaining threads, and ensure all workqueue resources are properly
> freed before returning the captured error.
>
> Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
Hi Nithurshe,
I'm afraid I cannot agree with this change. If pthread_join() fails, it
implies that the worker thread is still alive.
Cleaning up the synchronization primitives and the workers array at this
point would lead to a Use-After-Free issue, which is far more severe
than the current resource leak.
I believe pthread_join() seldom fails, otherwise it indicates bugs in
our codebase.
How about just leaving an error log print for this scenario? cc @hsiangkao
Thanks,
Yifan
> ---
> lib/workqueue.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/lib/workqueue.c b/lib/workqueue.c
> index 18ee0f9..23eb460 100644
> --- a/lib/workqueue.c
> +++ b/lib/workqueue.c
> @@ -42,6 +42,8 @@ static void *worker_thread(void *arg)
>
> int erofs_destroy_workqueue(struct erofs_workqueue *wq)
> {
> + int err = 0;
> +
> if (!wq)
> return -EINVAL;
>
> @@ -53,15 +55,17 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
> while (wq->nworker) {
> int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
>
> - if (ret)
> - return ret;
> + if (ret && !err)
> + err = ret;
> +
> --wq->nworker;
> }
> free(wq->workers);
> pthread_mutex_destroy(&wq->lock);
> pthread_cond_destroy(&wq->cond_empty);
> pthread_cond_destroy(&wq->cond_full);
> - return 0;
> +
> + return err;
> }
>
> int erofs_alloc_workqueue(struct erofs_workqueue *wq, unsigned int nworker,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-18 2:15 ` 赵逸凡
@ 2026-03-18 3:10 ` Gao Xiang
0 siblings, 0 replies; 9+ messages in thread
From: Gao Xiang @ 2026-03-18 3:10 UTC (permalink / raw)
To: 赵逸凡, Nithurshen, linux-erofs; +Cc: xiang, zhaoyifan28
On 2026/3/18 10:15, 赵逸凡 wrote:
> On 3/17/2026 12:33 PM, Nithurshen wrote:
>> Currently, erofs_destroy_workqueue returns immediately if a single
>> pthread_join fails. This halts the teardown process, potentially
>> leaving orphaned threads and leaking the workqueue's mutexes and
>> worker array.
>>
>> Refactor the joining logic to unconditionally join all worker
>> threads. Capture the first error encountered, continue joining the
>> remaining threads, and ensure all workqueue resources are properly
>> freed before returning the captured error.
>>
>> Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
>
> Hi Nithurshe,
>
>
> I'm afraid I cannot agree with this change. If pthread_join() fails, it
> implies that the worker thread is still alive.
>
> Cleaning up the synchronization primitives and the workers array at this
> point would lead to a Use-After-Free issue, which is far more severe
> than the current resource leak.
>
> I believe pthread_join() seldom fails, otherwise it indicates bugs in
> our codebase.
Yes, I wonder the real error number here, it seems there
is another bug somewhere.
>
> How about just leaving an error log print for this scenario? cc @hsiangkao
Agreed, we should print the error message for each failure
instead.
Thanks,
Gao Xiang
>
>
>
> Thanks,
>
> Yifan
>
>
>
>> ---
>> lib/workqueue.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/workqueue.c b/lib/workqueue.c
>> index 18ee0f9..23eb460 100644
>> --- a/lib/workqueue.c
>> +++ b/lib/workqueue.c
>> @@ -42,6 +42,8 @@ static void *worker_thread(void *arg)
>>
>> int erofs_destroy_workqueue(struct erofs_workqueue *wq)
>> {
>> + int err = 0;
>> +
>> if (!wq)
>> return -EINVAL;
>>
>> @@ -53,15 +55,17 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
>> while (wq->nworker) {
>> int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
>>
>> - if (ret)
>> - return ret;
>> + if (ret && !err)
>> + err = ret;
>> +
>> --wq->nworker;
>> }
>> free(wq->workers);
>> pthread_mutex_destroy(&wq->lock);
>> pthread_cond_destroy(&wq->cond_empty);
>> pthread_cond_destroy(&wq->cond_full);
>> - return 0;
>> +
>> + return err;
>> }
>>
>> int erofs_alloc_workqueue(struct erofs_workqueue *wq, unsigned int nworker,
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-17 4:33 [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue Nithurshen
2026-03-17 4:38 ` Nithurshen Karthikeyan
2026-03-18 2:15 ` 赵逸凡
@ 2026-03-18 4:03 ` Nithurshen
2026-03-18 4:20 ` Yifan Zhao
2026-03-18 6:03 ` [PATCH v3] " Nithurshen
3 siblings, 1 reply; 9+ messages in thread
From: Nithurshen @ 2026-03-18 4:03 UTC (permalink / raw)
To: linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28, stopire, Nithurshen
Currently, `erofs_destroy_workqueue` returns immediately if a single
`pthread_join` fails. This halts the teardown process, potentially
leaving orphanedd threads and leaking the workqueue's mutexes and
worker array.
Refactor the joining logic to unconditionally join all worker
threads. If a thread fails to join, print an error message with
the return code. Crucially, if any join fails, skip cleaning up
the synchronization primitives and worker array to prevent a severe
UAF vulnerability caused by still-living threads.
Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
---
lib/workqueue.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/lib/workqueue.c b/lib/workqueue.c
index 18ee0f9..98c181b 100644
--- a/lib/workqueue.c
+++ b/lib/workqueue.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0
#include <pthread.h>
#include <stdlib.h>
+#include "erofs/print.h"
#include "erofs/workqueue.h"
static void *worker_thread(void *arg)
@@ -42,6 +43,8 @@ static void *worker_thread(void *arg)
int erofs_destroy_workqueue(struct erofs_workqueue *wq)
{
+ int err = 0;
+
if (!wq)
return -EINVAL;
@@ -51,12 +54,20 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
pthread_mutex_unlock(&wq->lock);
while (wq->nworker) {
- int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
+ int ret = pthread_join(wq->workers[wq->nworker - 1], NULL);
- if (ret)
- return ret;
+ if (ret) {
+ erofs_err("failed to join worker thread %u: %d",
+ wq->nworker - 1, ret);
+ if (!err)
+ err = -ret;
+ }
--wq->nworker;
}
+
+ if (err)
+ return err;
+
free(wq->workers);
pthread_mutex_destroy(&wq->lock);
pthread_cond_destroy(&wq->cond_empty);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-18 4:03 ` [PATCH v2] " Nithurshen
@ 2026-03-18 4:20 ` Yifan Zhao
0 siblings, 0 replies; 9+ messages in thread
From: Yifan Zhao @ 2026-03-18 4:20 UTC (permalink / raw)
To: Nithurshen, linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28
[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]
On 3/18/2026 12:03 PM, Nithurshen wrote:
Currently, `erofs_destroy_workqueue` returns immediately if a single
`pthread_join` fails. This halts the teardown process, potentially
leaving orphanedd threads and leaking the workqueue's mutexes and
worker array.
Refactor the joining logic to unconditionally join all worker
threads. If a thread fails to join, print an error message with
the return code. Crucially, if any join fails, skip cleaning up
the synchronization primitives and worker array to prevent a severe
UAF vulnerability caused by still-living threads.
Signed-off-by: Nithurshen <nithurshen.dev@gmail.com> <nithurshen.dev@gmail.com>
---
lib/workqueue.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/lib/workqueue.c b/lib/workqueue.c
index 18ee0f9..98c181b 100644
--- a/lib/workqueue.c
+++ b/lib/workqueue.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0
#include <pthread.h>
#include <stdlib.h>
+#include "erofs/print.h"
#include "erofs/workqueue.h"
static void *worker_thread(void *arg)
@@ -42,6 +43,8 @@ static void *worker_thread(void *arg)
int erofs_destroy_workqueue(struct erofs_workqueue *wq)
{
+ int err = 0;
+
if (!wq)
return -EINVAL;
@@ -51,12 +54,20 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
pthread_mutex_unlock(&wq->lock);
while (wq->nworker) {
- int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
+ int ret = pthread_join(wq->workers[wq->nworker - 1], NULL);
- if (ret)
I think just a erofs_err() here is enough.., as joining other workers
brings very little benefit here.
Yifan
- return ret;
+ if (ret) {
+ erofs_err("failed to join worker thread %u: %d",
+ wq->nworker - 1, ret);
+ if (!err)
+ err = -ret;
+ }
--wq->nworker;
}
+
+ if (err)
+ return err;
+
free(wq->workers);
pthread_mutex_destroy(&wq->lock);
pthread_cond_destroy(&wq->cond_empty);
[-- Attachment #2: Type: text/html, Size: 2324 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v3] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-17 4:33 [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue Nithurshen
` (2 preceding siblings ...)
2026-03-18 4:03 ` [PATCH v2] " Nithurshen
@ 2026-03-18 6:03 ` Nithurshen
[not found] ` <CABra5+2gf8+T4ker-7rdABiy25VVPjPUSLjzarttxajYkjU2xg@mail.gmail.com>
3 siblings, 1 reply; 9+ messages in thread
From: Nithurshen @ 2026-03-18 6:03 UTC (permalink / raw)
To: linux-erofs; +Cc: xiang, hsiangkao, zhaoyifan28, stopire, Nithurshen
Currently, `erofs_destroy_workqueue` returns immediately if a single
`pthread_join` fails. However, it does not log the failure, making it
difficult to debug if a worker thread gets stuck.
Add an error log when `pthread_join` fails. Retain the early return
behavior to safely abort the teardown process, ensuring we do not
free the synchronization primitives and worker array while threads
are potentially still alive (avoiding a use after free).
Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
---
lib/workqueue.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lib/workqueue.c b/lib/workqueue.c
index 18ee0f9..4a1a957 100644
--- a/lib/workqueue.c
+++ b/lib/workqueue.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0
#include <pthread.h>
#include <stdlib.h>
+#include "erofs/print.h"
#include "erofs/workqueue.h"
static void *worker_thread(void *arg)
@@ -53,10 +54,14 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
while (wq->nworker) {
int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
- if (ret)
+ if (ret) {
+ erofs_err("failed to join worker thread %u: %d",
+ wq->nworker - 1, ret);
return ret;
+ }
--wq->nworker;
}
+
free(wq->workers);
pthread_mutex_destroy(&wq->lock);
pthread_cond_destroy(&wq->cond_empty);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3] erofs-utils: fix thread join loop in erofs_destroy_workqueue
[not found] ` <CABra5+2gf8+T4ker-7rdABiy25VVPjPUSLjzarttxajYkjU2xg@mail.gmail.com>
@ 2026-03-18 8:36 ` Nithurshen Karthikeyan
2026-03-18 8:40 ` Gao Xiang
0 siblings, 1 reply; 9+ messages in thread
From: Nithurshen Karthikeyan @ 2026-03-18 8:36 UTC (permalink / raw)
To: Yifan Zhao, linux-erofs
Cc: Gao Xiang, xiang, zhaoyifan (H), Nithurshen Karthikeyan
On Wed, Mar 18, 2026 at 1:47 PM Yifan Zhao <stopire@gmail.com> wrote:
>>
>> On 3/18/2026 2:03 PM, Nithurshen wrote:
>>
>> Currently, `erofs_destroy_workqueue` returns immediately if a single
>> `pthread_join` fails. However, it does not log the failure, making it
>> difficult to debug if a worker thread gets stuck.
>>
>> Add an error log when `pthread_join` fails. Retain the early return
>> behavior to safely abort the teardown process, ensuring we do not
>> free the synchronization primitives and worker array while threads
>> are potentially still alive (avoiding a use after free).
>>
>> Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
>> ---
>> lib/workqueue.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/workqueue.c b/lib/workqueue.c
>> index 18ee0f9..4a1a957 100644
>> --- a/lib/workqueue.c
>> +++ b/lib/workqueue.c
>> @@ -1,6 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0+ OR Apache-2.0
>> #include <pthread.h>
>> #include <stdlib.h>
>> +#include "erofs/print.h"
>> #include "erofs/workqueue.h"
>>
>> static void *worker_thread(void *arg)
>> @@ -53,10 +54,14 @@ int erofs_destroy_workqueue(struct erofs_workqueue *wq)
>> while (wq->nworker) {
>> int ret = -pthread_join(wq->workers[wq->nworker - 1], NULL);
>>
>> - if (ret)
>> + if (ret) {
>> + erofs_err("failed to join worker thread %u: %d",
>> + wq->nworker - 1, ret);
>> return ret;
>> + }
>> --wq->nworker;
>> }
>> +
>> free(wq->workers);
>> pthread_mutex_destroy(&wq->lock);
>> pthread_cond_destroy(&wq->cond_empty);
>>
>> Reviewed-by: Yifan Zhao <stopire@gmail.com>
Thanks for the review,
It looks like the CC list was dropped on your last reply,
so I'm adding the mailing list back.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v3] erofs-utils: fix thread join loop in erofs_destroy_workqueue
2026-03-18 8:36 ` Nithurshen Karthikeyan
@ 2026-03-18 8:40 ` Gao Xiang
0 siblings, 0 replies; 9+ messages in thread
From: Gao Xiang @ 2026-03-18 8:40 UTC (permalink / raw)
To: Nithurshen Karthikeyan, Yifan Zhao, linux-erofs; +Cc: xiang, zhaoyifan (H)
On 2026/3/18 16:36, Nithurshen Karthikeyan wrote:
> On Wed, Mar 18, 2026 at 1:47 PM Yifan Zhao <stopire@gmail.com> wrote:
>>>
...
>>>
>>> Reviewed-by: Yifan Zhao <stopire@gmail.com>
>
> Thanks for the review,
> It looks like the CC list was dropped on your last reply,
> so I'm adding the mailing list back.
Will apply, busy now.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-03-18 8:40 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-17 4:33 [PATCH] erofs-utils: fix thread join loop in erofs_destroy_workqueue Nithurshen
2026-03-17 4:38 ` Nithurshen Karthikeyan
2026-03-18 2:15 ` 赵逸凡
2026-03-18 3:10 ` Gao Xiang
2026-03-18 4:03 ` [PATCH v2] " Nithurshen
2026-03-18 4:20 ` Yifan Zhao
2026-03-18 6:03 ` [PATCH v3] " Nithurshen
[not found] ` <CABra5+2gf8+T4ker-7rdABiy25VVPjPUSLjzarttxajYkjU2xg@mail.gmail.com>
2026-03-18 8:36 ` Nithurshen Karthikeyan
2026-03-18 8:40 ` Gao Xiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox