Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()

From: Stephen Hemminger <stephen@networkplumber.org>
To: Gagandeep Singh <g.singh@nxp.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()
Date: Mon, 30 Mar 2026 13:39:38 -0700
Message-ID: <20260330133938.2c0b6b8f@phoenix.local>
In-Reply-To: <20250110064717.1372216-1-g.singh@nxp.com>

On Fri, 10 Jan 2025 12:17:17 +0530
Gagandeep Singh <g.singh@nxp.com> wrote:

> This patch introduces a worker thread cleanup function in the EAL library,
> ensuring proper termination of created pthreads and invocation of
> registered pthread destructors.
> This guarantees the correct cleanup of thread-specific resources,
> used by drivers or applications.
> 
> Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
> ---

This does not seem to have gotten the review it needs. The AI review process found several issues.

Review: [PATCH] eal: add worker threads cleanup in rte_eal_cleanup()

Patch 1/1 - eal: add worker threads cleanup in rte_eal_cleanup()

Error: pthread_join called unconditionally after pthread_cancel failure.
If pthread_cancel() fails (returns non-zero), the thread was not
cancelled. Calling pthread_join() on a still-running worker thread
that is blocked in read() on its pipe will block the cleanup
indefinitely -- the worker is waiting for a command that will never
come, and join will wait for the worker that will never exit.
The join should be skipped when cancel fails, or the cancel failure
should be treated as fatal for that lcore.

  Suggested fix:
    ret = pthread_cancel((pthread_t)lcore_config[lcore_id].thread_id.opaque_id);
    if (ret != 0) {
        EAL_LOG(WARNING, "Pthread cancel fails for lcore %d",
                lcore_id);
        continue;   /* skip join -- thread is still running */
    }
    ret = pthread_join(...);
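
To illustrate the cancel-then-join pattern outside of EAL, here is a
minimal standalone sketch. It is not the patch's code: worker_loop()
and cancel_and_join_demo() are hypothetical names, and a plain pipe
stands in for the worker's command pipe. It relies on read() being a
POSIX cancellation point, so a successful pthread_cancel() lets the
following join return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for a worker lcore: blocks in read() waiting
 * for a command byte that never arrives. */
static void *worker_loop(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char cmd;

    /* read() is a cancellation point, so a pending cancel is acted on here. */
    while (read(fd, &cmd, 1) > 0)
        ;
    return NULL;
}

/* Returns 0 if the worker was cancelled and joined, -1 otherwise. */
static int cancel_and_join_demo(void)
{
    int pipefd[2];
    pthread_t tid;
    void *res = NULL;
    int ret;

    if (pipe(pipefd) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker_loop,
                       (void *)(intptr_t)pipefd[0]) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    ret = pthread_cancel(tid);
    if (ret != 0) {
        /* Cancel failed: the worker is still blocked in read(), so a
         * join here would never return. Skip it. */
        fprintf(stderr, "pthread_cancel failed: %d\n", ret);
        return -1;
    }

    /* Cancel was delivered, so the join is expected to complete. */
    ret = pthread_join(tid, &res);
    close(pipefd[0]);
    close(pipefd[1]);
    return (ret == 0 && res == PTHREAD_CANCELED) ? 0 : -1;
}
```

The join result is compared against PTHREAD_CANCELED only to confirm
in the sketch that the thread ended through cancellation.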

Error: Cleanup ordering -- worker threads cancelled after eal_bus_cleanup().
The patch inserts eal_worker_thread_cleanup() after eal_bus_cleanup().
Bus cleanup may trigger device close/release callbacks. If a worker
lcore is currently executing a function dispatched via
rte_eal_remote_launch() that touches bus/device resources, cancelling
the thread after those resources are torn down risks use-after-free.
Worker threads should be terminated first, before any subsystem
teardown, to ensure no worker is mid-execution when resources are
freed. Move eal_worker_thread_cleanup() to the beginning of
rte_eal_cleanup(), after the run_once guard but before
rte_service_finalize() / eal_bus_cleanup().
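
The ordering argument can be shown with a small standalone sketch.
Everything here is hypothetical (struct fake_device, worker_fn(),
ordered_cleanup_demo()), and it uses a cooperative stop flag rather
than pthread_cancel() to keep the example short. The point is only
the order: the worker is joined before the resource it touches is
freed.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resource that a launched function keeps touching. */
struct fake_device {
    atomic_ulong counter;
};

static atomic_bool stop_requested;

/* Hypothetical worker body: uses the device until asked to stop. */
static void *worker_fn(void *arg)
{
    struct fake_device *dev = arg;

    while (!atomic_load(&stop_requested))
        atomic_fetch_add(&dev->counter, 1);
    return NULL;
}

/* Returns 0 when the worker was stopped before the device was freed. */
static int ordered_cleanup_demo(void)
{
    struct fake_device *dev = calloc(1, sizeof(*dev));
    pthread_t tid;

    if (dev == NULL)
        return -1;
    atomic_store(&stop_requested, false);
    if (pthread_create(&tid, NULL, worker_fn, dev) != 0) {
        free(dev);
        return -1;
    }

    /* Wait until the worker is demonstrably using the device. */
    while (atomic_load(&dev->counter) == 0)
        sched_yield();

    /* Step 1: terminate the worker and wait for it to exit. */
    atomic_store(&stop_requested, true);
    if (pthread_join(tid, NULL) != 0)
        return -1;

    /* Step 2: only now release what the worker was touching. */
    free(dev);
    return 0;
}
```

Swapping steps 1 and 2 would leave the worker incrementing a counter
in freed memory, which is the use-after-free risk described above.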

Warning: Uses raw pthread_cancel()/pthread_join() instead of DPDK thread API.
AGENTS.md forbidden tokens list requires rte_thread_join() instead
of pthread_join(). The existing rte_mp_channel_cleanup() code pairs
pthread_cancel() with rte_thread_join(), so at minimum the join
here should use rte_thread_join() for consistency:

    rte_thread_join(lcore_config[lcore_id].thread_id, NULL);

There is no rte_thread_cancel() wrapper, so pthread_cancel() is
acceptable here (rte_mp_channel_cleanup() does the same).

Warning: No pipe fd cleanup after thread cancellation.
Each worker has pipe_main2worker and pipe_worker2main fds created
during rte_eal_init(). After cancelling and joining the worker
threads, these pipe fds are never closed. This leaks 2 pipes
(4 file descriptors) per worker lcore. The cleanup function should
close these fds after the join succeeds.
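
A possible shape for that fd cleanup, as a standalone sketch. The
struct below is a hypothetical stand-in for the per-lcore state and
assumes each pipe is stored as an int[2] pair; worker_pipes_close()
and fd_is_open() are illustrative helpers, not EAL functions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical subset of the per-lcore state: two pipes, four fds. */
struct worker_pipes {
    int pipe_main2worker[2];
    int pipe_worker2main[2];
};

/* Close both ends of one pipe and mark them invalid so a second
 * cleanup call does not close an unrelated, reused fd number. */
static void close_fd_pair(int fds[2])
{
    for (int i = 0; i < 2; i++) {
        if (fds[i] >= 0) {
            close(fds[i]);
            fds[i] = -1;
        }
    }
}

/* Intended to run only after the worker thread has been joined. */
static void worker_pipes_close(struct worker_pipes *wp)
{
    close_fd_pair(wp->pipe_main2worker);
    close_fd_pair(wp->pipe_worker2main);
}

/* Helper for checking the result: non-zero if fd is still open. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

Resetting each fd to -1 keeps the function idempotent, which matters
if cleanup can be reached more than once.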

Warning: Comparing opaque_id against zero to detect uninitialized threads.
The check `if (!lcore_config[lcore_id].thread_id.opaque_id)` assumes
that zero means "no thread was created." On Linux, pthread_t is an
unsigned long and a valid thread ID could theoretically be 0 (though
glibc never produces this). A more robust approach is to track which
lcores had threads successfully created, or check the lcore state.
The existing mp_channel code uses a similar opaque_id != 0 guard,
so this is minor -- mentioning for completeness.
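
One way to track creation explicitly, sketched standalone. The
struct demo_worker type and its thread_created flag are hypothetical
and do not exist in lcore_config; the sketch only shows the idea of
recording success at creation time instead of inspecting the thread
ID value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_MAX_WORKERS 4

/* Hypothetical per-worker record with an explicit "created" flag. */
struct demo_worker {
    pthread_t tid;
    bool thread_created;
};

static void *demo_idle(void *arg)
{
    return arg;
}

/* Set the flag only when pthread_create() actually succeeded. */
static int demo_start(struct demo_worker *w)
{
    if (pthread_create(&w->tid, NULL, demo_idle, NULL) != 0)
        return -1;
    w->thread_created = true;
    return 0;
}

/* Join only the workers that were really created; returns the count. */
static int demo_cleanup(struct demo_worker *workers, int n)
{
    int joined = 0;

    for (int i = 0; i < n; i++) {
        if (!workers[i].thread_created)
            continue;
        if (pthread_join(workers[i].tid, NULL) == 0) {
            workers[i].thread_created = false;
            joined++;
        }
    }
    return joined;
}
```

Clearing the flag after a successful join also makes a repeated
cleanup call harmless.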

Info: The commit message says "pthreads" and "pthread destructors" but
does not explain *which* thread-specific resources motivate this
change. A concrete example (e.g., a specific driver TLS destructor
that leaks without this) would strengthen the justification.
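
For reference, the general mechanism the commit message alludes to
looks like the sketch below. This is a generic illustration, not the
driver case that motivated the patch: demo_key, demo_destructor() and
tls_destructor_demo() are hypothetical names. It shows that a
destructor registered with pthread_key_create() runs when a thread
is cancelled, which is why terminating the workers releases their
thread-specific data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_key_t demo_key;
static atomic_int destructor_calls;

/* Runs at thread termination for each non-NULL value bound to demo_key. */
static void demo_destructor(void *ptr)
{
    free(ptr);
    atomic_fetch_add(&destructor_calls, 1);
}

/* Hypothetical worker: binds a thread-specific buffer, then blocks. */
static void *demo_worker(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char c;

    pthread_setspecific(demo_key, malloc(64));
    /* read() is a cancellation point; the cancel is acted on here. */
    while (read(fd, &c, 1) > 0)
        ;
    return NULL;
}

/* Returns the number of destructor invocations seen, or -1 on error. */
static int tls_destructor_demo(void)
{
    int pipefd[2];
    pthread_t tid;

    if (pthread_key_create(&demo_key, demo_destructor) != 0)
        return -1;
    if (pipe(pipefd) != 0)
        return -1;
    if (pthread_create(&tid, NULL, demo_worker,
                       (void *)(intptr_t)pipefd[0]) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    close(pipefd[0]);
    close(pipefd[1]);
    pthread_key_delete(demo_key);
    return atomic_load(&destructor_calls);
}
```

Without the cancel and join, the worker would still be blocked when
the process exits and the destructor would never run.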

