public inbox for dev@dpdk.org
* [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
@ 2025-01-10  6:47 Gagandeep Singh
  2025-01-10 17:19 ` Stephen Hemminger
  2026-03-30 20:39 ` Stephen Hemminger
  0 siblings, 2 replies; 8+ messages in thread
From: Gagandeep Singh @ 2025-01-10  6:47 UTC (permalink / raw)
  To: dev

This patch introduces a worker thread cleanup function in the EAL library,
ensuring proper termination of created pthreads and invocation of
registered pthread destructors.
This guarantees the correct cleanup of thread-specific resources,
used by drivers or applications.

Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
---
 lib/eal/linux/eal.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index a6220524a4..a828905783 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -913,6 +913,32 @@ eal_worker_thread_create(unsigned int lcore_id)
 	return ret;
 }
 
+static void
+eal_worker_thread_cleanup(void)
+{
+	unsigned int lcore_id;
+	int ret = 0;
+
+	/* Cancel all the worker pthreads */
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		/* Check for non zero id */
+		if (!lcore_config[lcore_id].thread_id.opaque_id)
+			continue;
+
+		ret = pthread_cancel((pthread_t)lcore_config[lcore_id].thread_id.opaque_id);
+		if (ret != 0) {
+			EAL_LOG(WARNING, "Pthread cancel fails for lcore %d",
+					lcore_id);
+		}
+		ret = pthread_join((pthread_t)lcore_config[lcore_id].thread_id.opaque_id, NULL);
+		if (ret != 0) {
+			EAL_LOG(WARNING, "Pthread join fails for lcore %d",
+					lcore_id);
+		}
+	}
+	EAL_LOG(DEBUG, "Worker thread clean up done");
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -1321,6 +1347,7 @@ rte_eal_cleanup(void)
 #endif
 	rte_mp_channel_cleanup();
 	eal_bus_cleanup();
+	eal_worker_thread_cleanup();
 	rte_trace_save();
 	eal_trace_fini();
 	eal_mp_dev_hotplug_cleanup();
-- 
2.25.1



* Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-10  6:47 [PATCH] eal: add worker threads cleanup in rte_eal_cleanup() Gagandeep Singh
@ 2025-01-10 17:19 ` Stephen Hemminger
  2025-01-13  5:13   ` Gagandeep Singh
  2026-03-30 20:39 ` Stephen Hemminger
  1 sibling, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2025-01-10 17:19 UTC (permalink / raw)
  To: Gagandeep Singh; +Cc: dev

On Fri, 10 Jan 2025 12:17:17 +0530
Gagandeep Singh <g.singh@nxp.com> wrote:

> This patch introduces a worker thread cleanup function in the EAL library,
> ensuring proper termination of created pthreads and invocation of
> registered pthread destructors.
> This guarantees the correct cleanup of thread-specific resources,
> used by drivers or applications.
> 
> Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> ---

What problem is this trying to solve?

Canceling threads sends signals and can be problematic.
Many of the operations done in drivers are not signal safe.


* RE: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-10 17:19 ` Stephen Hemminger
@ 2025-01-13  5:13   ` Gagandeep Singh
  2025-01-13 16:40     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Gagandeep Singh @ 2025-01-13  5:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org

Hi,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, January 10, 2025 10:49 PM
> To: Gagandeep Singh <G.Singh@nxp.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
> 
> On Fri, 10 Jan 2025 12:17:17 +0530
> Gagandeep Singh <g.singh@nxp.com> wrote:
> 
> > This patch introduces a worker thread cleanup function in the EAL
> > library, ensuring proper termination of created pthreads and
> > invocation of registered pthread destructors.
> > This guarantees the correct cleanup of thread-specific resources, used
> > by drivers or applications.
> >
> > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > ---
> 
> What problem is this trying to solve?
> 
> Canceling threads sends signals and can be problematic.
> Many of the operations done in drivers are not signal safe.

The DPAA driver registers per-thread destructors with pthread_key_create() to
clean up thread-specific resources. These destructors run only when a thread
terminates. Since the worker threads are not terminated when the application is
killed, the destructors never run and the resources leak.
To address this, we propose adding thread termination code to rte_eal_cleanup()
so that the worker threads are terminated properly, which triggers the
pthread-specific destructors.

Is there an alternative if pthread_cancel() is not a good solution? We could add
a timeout to the pthread join to avoid blocking on a stuck thread, or is there
some way to call pthread_exit()?
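
As a minimal illustration of the destructor behaviour discussed here, the sketch below uses plain pthreads, not DPAA or EAL code. All `demo_*` names are hypothetical. It shows a key destructor running when a thread exits normally. Per POSIX, pthread_key_delete() does not invoke destructors, and nothing runs them if the process is killed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_key_t demo_key;       /* hypothetical per-thread resource key */
static atomic_int destructor_calls;  /* counts destructor invocations */

static void demo_destructor(void *value)
{
	(void)value;
	atomic_fetch_add(&destructor_calls, 1);
}

static void *demo_worker(void *arg)
{
	/* The destructor is called only for threads holding a non-NULL value. */
	pthread_setspecific(demo_key, arg);
	return NULL; /* returning from the start routine ends the thread */
}

/* Returns the number of destructor calls seen after one worker exits. */
static int demo_run_one_worker(void)
{
	pthread_t tid;
	int token = 1;

	atomic_store(&destructor_calls, 0);
	if (pthread_key_create(&demo_key, demo_destructor) != 0)
		return -1;
	if (pthread_create(&tid, NULL, demo_worker, &token) != 0) {
		pthread_key_delete(demo_key);
		return -1;
	}
	pthread_join(tid, NULL);
	/* Deleting the key does not run any destructor. */
	pthread_key_delete(demo_key);
	return atomic_load(&destructor_calls);
}
```

Calling demo_run_one_worker() should report exactly one destructor call, made while the worker thread was exiting.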


* Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-13  5:13   ` Gagandeep Singh
@ 2025-01-13 16:40     ` Stephen Hemminger
  2025-01-14  4:53       ` Gagandeep Singh
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2025-01-13 16:40 UTC (permalink / raw)
  To: Gagandeep Singh; +Cc: dev@dpdk.org

On Mon, 13 Jan 2025 05:13:01 +0000
Gagandeep Singh <G.Singh@nxp.com> wrote:

> Hi,
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Friday, January 10, 2025 10:49 PM
> > To: Gagandeep Singh <G.Singh@nxp.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
> > 
> > On Fri, 10 Jan 2025 12:17:17 +0530
> > Gagandeep Singh <g.singh@nxp.com> wrote:
> >   
> > > This patch introduces a worker thread cleanup function in the EAL
> > > library, ensuring proper termination of created pthreads and
> > > invocation of registered pthread destructors.
> > > This guarantees the correct cleanup of thread-specific resources, used
> > > by drivers or applications.
> > >
> > > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > > ---  
> > 
> > What problem is this trying to solve?
> > 
> > Canceling threads sends signals and can be problematic.
> > Many of the operations done in drivers are not signal safe.  
> 
> To ensure the proper cleanup of thread-specific resources, the DPAA driver initializes pthread-specific destructors using pthread_key_create(). These destructors are executed only when a thread terminates or the key is deleted. However, since threads are not terminated when the application is killed, these destructors are not executed, resulting in resource leaks.
> To address this issue, we propose adding thread termination code to rte_eal_cleanup() to ensure that threads are properly terminated, thereby triggering the execution of pthread-specific destructors
> 
> Any alternate suggestion in case pthread_cancel is not a better solution? We can add pthread join timeout to avoid blocking on thread stuck or
> May be any way to call pthread_exit?

The DPAA driver is the problem here. It should not be using pthread_key.
Other drivers don't do this. Other drivers do setup on probe and cleanup on close.

An application doing a clean shutdown should do what testpmd and l3fwd already do:
	- wait for worker threads to go idle
	- stop all ports
	- close all ports
	- call eal cleanup

If the DPAA driver needs pthread_key, it should handle that in close.
But it really should be using DPDK thread local storage for this.
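
The shutdown order listed above can be sketched with plain pthreads. This is an assumption-laden illustration, not testpmd or l3fwd code: the `demo_*` names and the quit flag are hypothetical, and the port stop/close and rte_eal_cleanup() steps appear only as a comment. It shows workers leaving their loop on request, so each thread releases its own resources and no cancellation is needed.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_WORKERS 2            /* hypothetical worker count */

static atomic_bool demo_quit;     /* set by main to ask workers to stop */
static atomic_int demo_released;  /* workers that cleaned up after themselves */

static void *demo_poll_loop(void *arg)
{
	(void)arg;
	while (!atomic_load(&demo_quit))
		sched_yield();        /* stand-in for one rx/tx burst */
	/* Per-thread resources are released here, by the owning thread. */
	atomic_fetch_add(&demo_released, 1);
	return NULL;
}

/* Returns the number of workers that exited on their own, or -1 on error. */
static int demo_clean_shutdown(void)
{
	pthread_t tid[DEMO_WORKERS];
	int started = 0;
	int i;

	atomic_store(&demo_quit, false);
	atomic_store(&demo_released, 0);
	for (i = 0; i < DEMO_WORKERS; i++) {
		if (pthread_create(&tid[i], NULL, demo_poll_loop, NULL) != 0)
			break;
		started++;
	}

	/* Step 1: ask workers to go idle and wait for them. */
	atomic_store(&demo_quit, true);
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	/* Steps 2-4: a DPDK application would now stop all ports, close
	 * all ports, and call rte_eal_cleanup(); omitted in this sketch. */
	return started == DEMO_WORKERS ? atomic_load(&demo_released) : -1;
}
```

In a real DPDK application the wait step would presumably be done with the EAL lcore wait API rather than pthread_join().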




* RE: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-13 16:40     ` Stephen Hemminger
@ 2025-01-14  4:53       ` Gagandeep Singh
  2025-01-14 16:26         ` Stephen Hemminger
  2026-03-30 20:38         ` Stephen Hemminger
  0 siblings, 2 replies; 8+ messages in thread
From: Gagandeep Singh @ 2025-01-14  4:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org

Hi,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, January 13, 2025 10:10 PM
> To: Gagandeep Singh <G.Singh@nxp.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
> 
> On Mon, 13 Jan 2025 05:13:01 +0000
> Gagandeep Singh <G.Singh@nxp.com> wrote:
> 
> > Hi,
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Friday, January 10, 2025 10:49 PM
> > > To: Gagandeep Singh <G.Singh@nxp.com>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [PATCH] eal: add worker threads cleanup in
> > > rte_eal_cleanup()
> > >
> > > On Fri, 10 Jan 2025 12:17:17 +0530
> > > Gagandeep Singh <g.singh@nxp.com> wrote:
> > >
> > > > This patch introduces a worker thread cleanup function in the EAL
> > > > library, ensuring proper termination of created pthreads and
> > > > invocation of registered pthread destructors.
> > > > This guarantees the correct cleanup of thread-specific resources,
> > > > used by drivers or applications.
> > > >
> > > > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > > > ---
> > >
> > > What problem is this trying to solve?
> > >
> > > Canceling threads sends signals and can be problematic.
> > > Many of the operations done in drivers are not signal safe.
> >
> > To ensure the proper cleanup of thread-specific resources, the DPAA driver
> initializes pthread-specific destructors using pthread_key_create(). These
> destructors are executed only when a thread terminates or the key is deleted.
> However, since threads are not terminated when the application is killed, these
> destructors are not executed, resulting in resource leaks.
> > To address this issue, we propose adding thread termination code to
> > rte_eal_cleanup() to ensure that threads are properly terminated,
> > thereby triggering the execution of pthread-specific destructors
> >
> > Any alternate suggestion in case pthread_cancel is not a better
> > solution? We can add pthread join timeout to avoid blocking on thread stuck or
> May be any way to call pthread_exit?
> 
> The DPAA driver is the problem here. It should not be using pthread_key.
> Other drivers don't do this. Other drivers do setup on probe and cleanup on close.
> 
> An application doing a clean shutdown should do what existing testpmd, l3fwd, do
> 	- wait for worker threads to go idle
> 	- stop all ports
> 	- close all ports
> 	- call eal cleanup
> 
> If DPAA driver needs pthread_key it should handling that in the close.
> But it really should be using DPDK thread local storage for this.
> 
This is fine for graceful application termination, but what about non-graceful termination?
I have a DPAA bus cleanup patch ready and will submit it soon, but we still need a
destructor in case the application exits without calling rte_eal_cleanup().

I can see that DPDK provides an API, rte_thread_key_create(), which is defined only for UNIX and Windows.
Why not for Linux?

I know the DPAA driver uses pthread_key where it should use
rte_thread_key_create() provided by DPDK, and we can replace that.
Having a pthread destructor is still a valid case in my opinion.




* Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-14  4:53       ` Gagandeep Singh
@ 2025-01-14 16:26         ` Stephen Hemminger
  2026-03-30 20:38         ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2025-01-14 16:26 UTC (permalink / raw)
  To: Gagandeep Singh; +Cc: dev@dpdk.org

On Tue, 14 Jan 2025 04:53:51 +0000
Gagandeep Singh <G.Singh@nxp.com> wrote:

> Hi,
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Monday, January 13, 2025 10:10 PM
> > To: Gagandeep Singh <G.Singh@nxp.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
> > 
> > On Mon, 13 Jan 2025 05:13:01 +0000
> > Gagandeep Singh <G.Singh@nxp.com> wrote:
> >   
> > > Hi,
> > >  
> > > > -----Original Message-----
> > > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > > Sent: Friday, January 10, 2025 10:49 PM
> > > > To: Gagandeep Singh <G.Singh@nxp.com>
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [PATCH] eal: add worker threads cleanup in
> > > > rte_eal_cleanup()
> > > >
> > > > On Fri, 10 Jan 2025 12:17:17 +0530
> > > > Gagandeep Singh <g.singh@nxp.com> wrote:
> > > >  
> > > > > This patch introduces a worker thread cleanup function in the EAL
> > > > > library, ensuring proper termination of created pthreads and
> > > > > invocation of registered pthread destructors.
> > > > > This guarantees the correct cleanup of thread-specific resources,
> > > > > used by drivers or applications.
> > > > >
> > > > > Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> > > > > ---  
> > > >
> > > > What problem is this trying to solve?
> > > >
> > > > Canceling threads sends signals and can be problematic.
> > > > Many of the operations done in drivers are not signal safe.  
> > >
> > > To ensure the proper cleanup of thread-specific resources, the DPAA driver  
> > initializes pthread-specific destructors using pthread_key_create(). These
> > destructors are executed only when a thread terminates or the key is deleted.
> > However, since threads are not terminated when the application is killed, these
> > destructors are not executed, resulting in resource leaks.  
> > > To address this issue, we propose adding thread termination code to
> > > rte_eal_cleanup() to ensure that threads are properly terminated,
> > > thereby triggering the execution of pthread-specific destructors
> > >
> > > Any alternate suggestion in case pthread_cancel is not a better
> > > solution? We can add pthread join timeout to avoid blocking on thread stuck or  
> > May be any way to call pthread_exit?
> > 
> > The DPAA driver is the problem here. It should not be using pthread_key.
> > Other drivers don't do this. Other drivers do setup on probe and cleanup on close.
> > 
> > An application doing a clean shutdown should do what existing testpmd, l3fwd, do
> > 	- wait for worker threads to go idle
> > 	- stop all ports
> > 	- close all ports
> > 	- call eal cleanup
> > 
> > If DPAA driver needs pthread_key it should handling that in the close.
> > But it really should be using DPDK thread local storage for this.
> >   
> This is fine for graceful application termination, but what about non-graceful application termination?
> I have a DPAA bus cleanup patch ready, and will submit it soon, But we still need
> destructor in case application exit without calling the rte_eal_cleanup().

If the application crashes, this code isn't going to be called, so the pthread destructor won't help.
You need to build the driver so that, if the application exits ungracefully, the hardware can be
reset on restart. Relying on the application to do anything while it is crashing is not going
to work. Alternatively, a separate watcher process (or systemd) that does a forceful
cleanup when the application crashes can work. The point is that you can't handle non-graceful
shutdown cleanup in a DPDK driver and have it work reliably.
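
A rough sketch of the separate watcher idea, assuming a parent process that forks the application and inspects its exit status. Every `demo_*` name is hypothetical, and demo_force_cleanup() merely counts calls where a real watcher would reset the hardware.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int demo_forced_cleanups;  /* how often the watcher had to step in */

/* Hypothetical hook: a real watcher would reset the device here. */
static void demo_force_cleanup(void)
{
	demo_forced_cleanups++;
}

static void demo_clean_app(void)
{
	/* Returns normally, so the child exits with status 0. */
}

static void demo_crashing_app(void)
{
	raise(SIGKILL);  /* dies at once; no destructor or handler can run */
}

/* Runs app() in a child process. Returns 0 after a clean exit, 1 if the
 * watcher had to force cleanup, -1 if fork() or waitpid() failed. */
static int demo_watch(void (*app)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		app();
		_exit(EXIT_SUCCESS);
	}
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	demo_force_cleanup();  /* runs in the watcher, not the dead child */
	return 1;
}
```

The cleanup runs in a process that is still healthy, which is why it does not depend on anything the crashed application managed to do.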


> 
> I can see DPDK is giving an API rte_thread_key_create() which is defined only in UNIX and windows.
> Why not for Linux?

I was thinking of the recent per-thread allocator.

> 
> I know DPAA driver is using pthread_key which it should not use but instead use
> rte_thread_key_create() provided by DPDK which we can replace.
> Having pthread destructor is still a valid case in my opinion.




* Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-14  4:53       ` Gagandeep Singh
  2025-01-14 16:26         ` Stephen Hemminger
@ 2026-03-30 20:38         ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2026-03-30 20:38 UTC (permalink / raw)
  To: Gagandeep Singh; +Cc: dev@dpdk.org

On Tue, 14 Jan 2025 04:53:51 +0000
Gagandeep Singh <G.Singh@nxp.com> wrote:

> This is fine for graceful application termination, but what about non-graceful application termination?
> I have a DPAA bus cleanup patch ready, and will submit it soon, But we still need
> destructor in case application exit without calling the rte_eal_cleanup().

Any destructor has to use only signal-safe functions if you want to handle non-graceful shutdown.
That is really hard to do.


* Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
  2025-01-10  6:47 [PATCH] eal: add worker threads cleanup in rte_eal_cleanup() Gagandeep Singh
  2025-01-10 17:19 ` Stephen Hemminger
@ 2026-03-30 20:39 ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2026-03-30 20:39 UTC (permalink / raw)
  To: Gagandeep Singh; +Cc: dev

On Fri, 10 Jan 2025 12:17:17 +0530
Gagandeep Singh <g.singh@nxp.com> wrote:

> This patch introduces a worker thread cleanup function in the EAL library,
> ensuring proper termination of created pthreads and invocation of
> registered pthread destructors.
> This guarantees the correct cleanup of thread-specific resources,
> used by drivers or applications.
> 
> Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> ---

This does not seem to have gotten the review it needs. The AI review process found several issues.

Review: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()

Patch 1/1 - eal: add worker threads cleanup in rte_eal_cleanup()

Error: pthread_join called unconditionally after pthread_cancel failure.
If pthread_cancel() fails (returns non-zero), the thread was not
cancelled. Calling pthread_join() on a still-running worker thread
that is blocked in read() on its pipe will block the cleanup
indefinitely -- the worker is waiting for a command that will never
come, and join will wait for the worker that will never exit.
The join should be skipped when cancel fails, or the cancel failure
should be treated as fatal for that lcore.

  Suggested fix:
    ret = pthread_cancel((pthread_t)lcore_config[lcore_id].thread_id.opaque_id);
    if (ret != 0) {
        EAL_LOG(WARNING, "Pthread cancel fails for lcore %d",
                lcore_id);
        continue;   /* skip join -- thread is still running */
    }
    ret = pthread_join(...);

Error: Cleanup ordering -- worker threads cancelled after eal_bus_cleanup().
The patch inserts eal_worker_thread_cleanup() after eal_bus_cleanup().
Bus cleanup may trigger device close/release callbacks. If a worker
lcore is currently executing a function dispatched via
rte_eal_remote_launch() that touches bus/device resources, cancelling
the thread after those resources are torn down risks use-after-free.
Worker threads should be terminated first, before any subsystem
teardown, to ensure no worker is mid-execution when resources are
freed. Move eal_worker_thread_cleanup() to the beginning of
rte_eal_cleanup(), after the run_once guard but before
rte_service_finalize() / eal_bus_cleanup().

Warning: Uses raw pthread_cancel()/pthread_join() instead of DPDK thread API.
AGENTS.md forbidden tokens list requires rte_thread_join() instead
of pthread_join(). The existing mp_channel_cleanup code uses the
same pattern (pthread_cancel + rte_thread_join), so at minimum
the join should use rte_thread_join() for consistency:

    rte_thread_join(lcore_config[lcore_id].thread_id, NULL);

There is no rte_thread_cancel() wrapper, so pthread_cancel() is
acceptable here (same as rte_mp_channel_cleanup does).

Warning: No pipe fd cleanup after thread cancellation.
Each worker has pipe_main2worker and pipe_worker2main fds created
during rte_eal_init(). After cancelling and joining the worker
threads, these pipe fds are never closed. This leaks 2 pipes
(4 file descriptors) per worker lcore. The cleanup function should
close these fds after the join succeeds.
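
A minimal sketch of the cancel, join, and close sequence, using plain pthreads and a hypothetical `struct demo_worker` in place of the EAL lcore state. Only the pipe field names are borrowed from the text above. It assumes the worker is blocked in read(), which is a cancellation point, and it skips the join when the cancel request fails.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-lcore state; not the EAL structure. */
struct demo_worker {
	pthread_t tid;
	int pipe_main2worker[2];
	int pipe_worker2main[2];
};

static void *demo_wait_for_command(void *arg)
{
	struct demo_worker *w = arg;
	char cmd;

	/* read() is a cancellation point, so a pending cancel is acted on here. */
	while (read(w->pipe_main2worker[0], &cmd, 1) > 0)
		;
	return NULL;
}

static void demo_close_pipes(struct demo_worker *w)
{
	close(w->pipe_main2worker[0]);
	close(w->pipe_main2worker[1]);
	close(w->pipe_worker2main[0]);
	close(w->pipe_worker2main[1]);
}

static int demo_start(struct demo_worker *w)
{
	if (pipe(w->pipe_main2worker) != 0)
		return -1;
	if (pipe(w->pipe_worker2main) != 0) {
		close(w->pipe_main2worker[0]);
		close(w->pipe_main2worker[1]);
		return -1;
	}
	if (pthread_create(&w->tid, NULL, demo_wait_for_command, w) != 0) {
		demo_close_pipes(w);
		return -1;
	}
	return 0;
}

/* Returns 0 if the worker was cancelled, joined, and its fds were closed. */
static int demo_stop(struct demo_worker *w)
{
	void *res = NULL;

	if (pthread_cancel(w->tid) != 0)
		return -1;  /* skip join: the thread may still be running */
	if (pthread_join(w->tid, &res) != 0)
		return -1;
	demo_close_pipes(w);  /* all four descriptors, only after the join */
	return res == PTHREAD_CANCELED ? 0 : -1;
}

static int demo_cancel_roundtrip(void)
{
	struct demo_worker w;

	if (demo_start(&w) != 0)
		return -1;
	return demo_stop(&w);
}
```

Closing the descriptors only after a successful join avoids pulling a pipe out from under a thread that is still reading from it.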

Warning: Comparing opaque_id against zero to detect uninitialized threads.
The check `if (!lcore_config[lcore_id].thread_id.opaque_id)` assumes
that zero means "no thread was created." On Linux, pthread_t is an
unsigned long and a valid thread ID could theoretically be 0 (though
glibc never produces this). A more robust approach is to track which
lcores had threads successfully created, or check the lcore state.
The existing mp_channel code uses a similar opaque_id != 0 guard,
so this is minor -- mentioning for completeness.

Info: The commit message says "pthreads" and "pthread destructors" but
does not explain *which* thread-specific resources motivate this
change. A concrete example (e.g., a specific driver TLS destructor
that leaks without this) would strengthen the justification.


