DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 1/5] eal: fix wrong log message in async IPC request
@ 2026-03-19 16:07 Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-03-19 16:07 UTC (permalink / raw)
  To: dev, Jianfeng Tan

The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.

Fix the log to say "async request".

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
 	if (pending_req == NULL || reply_msg == NULL) {
-		EAL_LOG(ERR, "Could not allocate space for sync request");
+		EAL_LOG(ERR, "Could not allocate space for async request");
 		rte_errno = ENOMEM;
 		ret = -1;
 		goto fail;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v1 2/5] eal: fix async IPC callback not fired when no peers
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-03-19 16:07 ` Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-03-19 16:07 UTC (permalink / raw)
  To: dev, Jianfeng Tan

Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.

The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.

This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.

Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..0ec79336a5 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,15 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
 
-		/* if we didn't send anything, put dummy request on the queue */
+		/* if we didn't send anything, put dummy request on the queue
+		 * and set a minimum-delay alarm so the callback fires immediately.
+		 */
 		if (ret == 0 && reply->nb_sent == 0) {
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
+			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 		}
 
 		pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1236,14 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
-	/* if we didn't send anything, put dummy request on the queue */
+	/* if we didn't send anything, put dummy request on the queue
+	 * and set a minimum-delay alarm so the callback fires immediately.
+	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
+		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
 	/* finally, unlock the queue */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v1 3/5] eal: fix memory leak in async IPC secondary path
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-03-19 16:07 ` Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-03-19 16:07 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.

Free dummy before returning on the success path when it was not inserted
into the queue.

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 0ec79336a5..24d3557859 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1203,6 +1203,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		/* if we couldn't send anything, clean up */
 		if (ret != 0)
 			goto fail;
+		if (!dummy_used)
+			free(dummy);
 		return 0;
 	}
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
  2026-03-19 16:07 ` [PATCH v1 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
@ 2026-03-19 16:07 ` Anatoly Burakov
  2026-05-28 14:24   ` Thomas Monjalon
  2026-03-19 16:07 ` [PATCH v1 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Anatoly Burakov @ 2026-03-19 16:07 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.

On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.

Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.

Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 106 +++++++++++++++++++++++--------
 1 file changed, 81 insertions(+), 25 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 24d3557859..d1a041b707 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
 
 struct pending_request {
 	TAILQ_ENTRY(pending_request) next;
+	unsigned long id;
 	enum {
 		REQUEST_TYPE_SYNC,
 		REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
 	};
 };
 
+static unsigned long next_request_id;
+
 TAILQ_HEAD(pending_request_list, pending_request);
 
 static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 static void
 async_reply_handle(void *arg);
 
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
 
 static void
 trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
 	return r;
 }
 
+static struct pending_request *
+find_pending_request_by_id(unsigned long id)
+{
+	struct pending_request *r;
+
+	TAILQ_FOREACH(r, &pending_requests.requests, next) {
+		if (r->id == id)
+			return r;
+	}
+
+	return NULL;
+}
+
 /*
  * Combine prefix and name(optional) to return unix domain socket path
  * return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
 }
 
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
 {
-	struct pending_request *req = (struct pending_request *)arg;
 	enum async_action action;
 	struct timespec ts_now;
 
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+	if (rte_eal_alarm_cancel(async_reply_handle,
+			(void *)(uintptr_t)req->id) < 0) {
 		/* if we failed to cancel the alarm because it's already in
 		 * progress, don't proceed because otherwise we will end up
 		 * handling the same message twice.
@@ -557,9 +573,13 @@ static void
 async_reply_handle(void *arg)
 {
 	struct pending_request *req;
+	/* alarm arg carries the request ID packed into a void * via uintptr_t */
+	unsigned long id = (uintptr_t)arg;
 
 	pthread_mutex_lock(&pending_requests.lock);
-	req = async_reply_handle_thread_unsafe(arg);
+	req = find_pending_request_by_id(id);
+	if (req != NULL)
+		req = async_reply_handle_thread_unsafe(req);
 	pthread_mutex_unlock(&pending_requests.lock);
 
 	if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 {
 	struct rte_mp_msg *reply_msg;
 	struct pending_request *pending_req, *exist;
-	int ret = -1;
+	unsigned long id;
+	int ret;
+
+	/* queue already locked by caller */
+
+	exist = find_pending_request(dst, req->name);
+	if (exist) {
+		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+		rte_errno = EEXIST;
+		return -1;
+	}
+
+	/* Set alarm before allocating or sending so request timeout tracking
+	 * is active as soon as this request ID is reserved.
+	 */
+	id = ++next_request_id;
+	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+			async_reply_handle,
+			(void *)(uintptr_t)id) < 0) {
+		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+			dst, req->name);
+		return -1;
+	}
 
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 
 	pending_req->type = REQUEST_TYPE_ASYNC;
+	pending_req->id = id;
 	strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
 	pending_req->request = req;
 	pending_req->reply = reply_msg;
 	pending_req->async.param = param;
 
-	/* queue already locked by caller */
-
-	exist = find_pending_request(dst, req->name);
-	if (exist) {
-		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
-		rte_errno = EEXIST;
-		ret = -1;
-		goto fail;
-	}
-
 	ret = send_msg(dst, req, MP_REQ);
 	if (ret < 0) {
 		EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 	param->user_reply.nb_sent++;
 
-	/* if alarm set fails, we simply ignore the reply */
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
-		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
-			dst, req->name);
-		ret = -1;
-		goto fail;
-	}
 	TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
 
 	return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	 * it, and put it on the queue if we don't send any requests.
 	 */
 	dummy->type = REQUEST_TYPE_ASYNC;
+	dummy->id = ++next_request_id;
 	dummy->request = copy;
 	dummy->reply = NULL;
 	dummy->async.param = param;
@@ -1238,6 +1264,31 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
+
+	/*
+	 * On partial failure, roll back all queued requests. We hold the lock
+	 * so no one else touches the queue. All requests in this batch share
+	 * the same param pointer. Stale alarms will fire and harmlessly find
+	 * nothing via ID-based lookup.
+	 */
+	if (ret != 0 && reply->nb_sent > 0) {
+		struct pending_request *r, *next;
+
+		for (r = TAILQ_FIRST(&pending_requests.requests);
+				r != NULL; r = next) {
+			next = TAILQ_NEXT(r, next);
+			if (r->type == REQUEST_TYPE_ASYNC &&
+					r->async.param == param) {
+				TAILQ_REMOVE(&pending_requests.requests,
+						r, next);
+				free(r->reply);
+				/* r->request == copy, freed below after the loop */
+				free(r);
+			}
+		}
+		reply->nb_sent = 0;
+	}
+
 	/* if we didn't send anything, put dummy request on the queue
 	 * and set a minimum-delay alarm so the callback fires immediately.
 	 */
@@ -1260,6 +1311,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	/* if dummy was unused, free it */
 	if (!dummy_used)
 		free(dummy);
+	/* if nothing was sent, nobody owns copy/param */
+	if (ret != 0) {
+		free(param);
+		free(copy);
+	}
 
 	return ret;
 closedir_fail:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v1 5/5] eal: avoid deadlock in async IPC alarm callback
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                   ` (2 preceding siblings ...)
  2026-03-19 16:07 ` [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-03-19 16:07 ` Anatoly Burakov
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-03-19 16:07 UTC (permalink / raw)
  To: dev, Jianfeng Tan

async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a noop.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index d1a041b707..830c11f4ac 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle,
-			(void *)(uintptr_t)req->id) < 0) {
-		/* if we failed to cancel the alarm because it's already in
-		 * progress, don't proceed because otherwise we will end up
-		 * handling the same message twice.
-		 */
-		if (rte_errno == EINPROGRESS) {
-			EAL_LOG(DEBUG, "Request handling is already in progress");
-			goto no_trigger;
-		}
-		EAL_LOG(ERR, "Failed to cancel alarm");
-	}
-
 	if (action == ACTION_TRIGGER)
 		return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 		return -1;
 	}
 
-	/* Set alarm before allocating or sending so request timeout tracking
-	 * is active as soon as this request ID is reserved.
+	/* Set alarm before allocating or sending. The alarm is never cancelled:
+	 * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+	 * which deadlocks if we hold pending_requests.lock while the callback
+	 * is blocked on it. Instead, let stale alarms fire; with ID-based
+	 * lookup the callback will simply not find the request and return
+	 * harmlessly.
 	 */
 	id = ++next_request_id;
 	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure
  2026-03-19 16:07 ` [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-05-28 14:24   ` Thomas Monjalon
  2026-05-29 15:10     ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2026-05-28 14:24 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jianfeng Tan

Hello Anatoly,

19/03/2026 17:07, Anatoly Burakov:
> Use a numeric request ID for alarm callback lookup so stale callbacks
> from rolled-back requests become harmless no-ops.

It seems you forgot to convert 2 calls to rte_eal_alarm_set().

[...]
>  struct pending_request {
>  	TAILQ_ENTRY(pending_request) next;
> +	unsigned long id;
[...]
> +static struct pending_request *
> +find_pending_request_by_id(unsigned long id)
> +{
> +	struct pending_request *r;
> +
> +	TAILQ_FOREACH(r, &pending_requests.requests, next) {
> +		if (r->id == id)
> +			return r;
> +	}
> +
> +	return NULL;
> +}
[...]
>  async_reply_handle(void *arg)
>  {
>  	struct pending_request *req;
> +	/* alarm arg carries the request ID packed into a void * via uintptr_t */

No, that's wrong.
The pointer passed in some rte_eal_alarm_set() from patch 2
is a struct pending_request.

> +	unsigned long id = (uintptr_t)arg;

Either you get the id field from arg,
or you pass dummy->id to rte_eal_alarm_set().

>  	pthread_mutex_lock(&pending_requests.lock);
> -	req = async_reply_handle_thread_unsafe(arg);
> +	req = find_pending_request_by_id(id);
> +	if (req != NULL)
> +		req = async_reply_handle_thread_unsafe(req);
>  	pthread_mutex_unlock(&pending_requests.lock);




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure
  2026-05-28 14:24   ` Thomas Monjalon
@ 2026-05-29 15:10     ` Burakov, Anatoly
  0 siblings, 0 replies; 28+ messages in thread
From: Burakov, Anatoly @ 2026-05-29 15:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Jianfeng Tan

On 5/28/2026 4:24 PM, Thomas Monjalon wrote:
> Hello Anatoly,
> 
> 19/03/2026 17:07, Anatoly Burakov:
>> Use a numeric request ID for alarm callback lookup so stale callbacks
>> from rolled-back requests become harmless no-ops.
> 
> It seems you forgot to convert 2 calls to rte_eal_alarm_set().
> 
> [...]
>>   struct pending_request {
>>   	TAILQ_ENTRY(pending_request) next;
>> +	unsigned long id;
> [...]
>> +static struct pending_request *
>> +find_pending_request_by_id(unsigned long id)
>> +{
>> +	struct pending_request *r;
>> +
>> +	TAILQ_FOREACH(r, &pending_requests.requests, next) {
>> +		if (r->id == id)
>> +			return r;
>> +	}
>> +
>> +	return NULL;
>> +}
> [...]
>>   async_reply_handle(void *arg)
>>   {
>>   	struct pending_request *req;
>> +	/* alarm arg carries the request ID packed into a void * via uintptr_t */
> 
> No, that's wrong.
> The pointer passed in some rte_eal_alarm_set() from patch 2
> is a struct pending_request.
> 
>> +	unsigned long id = (uintptr_t)arg;
> 
> Either you get the id field from arg,
> or you pass dummy->id to rte_eal_alarm_set().

The intent was to do the latter, good catch. Thanks!

> 
>>   	pthread_mutex_lock(&pending_requests.lock);
>> -	req = async_reply_handle_thread_unsafe(arg);
>> +	req = find_pending_request_by_id(id);
>> +	if (req != NULL)
>> +		req = async_reply_handle_thread_unsafe(req);
>>   	pthread_mutex_unlock(&pending_requests.lock);
> 
> 
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/5] eal: fix wrong log message in async IPC request
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                   ` (3 preceding siblings ...)
  2026-03-19 16:07 ` [PATCH v1 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
@ 2026-05-29 15:26 ` Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
                     ` (3 more replies)
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  6 siblings, 4 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-05-29 15:26 UTC (permalink / raw)
  To: dev, Jianfeng Tan

The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.

Fix the log to say "async request".

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
 	if (pending_req == NULL || reply_msg == NULL) {
-		EAL_LOG(ERR, "Could not allocate space for sync request");
+		EAL_LOG(ERR, "Could not allocate space for async request");
 		rte_errno = ENOMEM;
 		ret = -1;
 		goto fail;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-05-29 15:26   ` Anatoly Burakov
  2026-06-01 12:21     ` Thomas Monjalon
  2026-05-29 15:26   ` [PATCH v2 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Anatoly Burakov @ 2026-05-29 15:26 UTC (permalink / raw)
  To: dev, Jianfeng Tan

Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.

The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.

This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.

Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..0ec79336a5 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,15 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
 
-		/* if we didn't send anything, put dummy request on the queue */
+		/* if we didn't send anything, put dummy request on the queue
+		 * and set a minimum-delay alarm so the callback fires immediately.
+		 */
 		if (ret == 0 && reply->nb_sent == 0) {
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
+			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 		}
 
 		pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1236,14 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
-	/* if we didn't send anything, put dummy request on the queue */
+	/* if we didn't send anything, put dummy request on the queue
+	 * and set a minimum-delay alarm so the callback fires immediately.
+	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
+		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
 	/* finally, unlock the queue */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/5] eal: fix memory leak in async IPC secondary path
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-05-29 15:26   ` Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-05-29 15:26 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.

Free dummy before returning on the success path when it was not inserted
into the queue.

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 0ec79336a5..24d3557859 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1203,6 +1203,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		/* if we couldn't send anything, clean up */
 		if (ret != 0)
 			goto fail;
+		if (!dummy_used)
+			free(dummy);
 		return 0;
 	}
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
  2026-05-29 15:26   ` [PATCH v2 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
@ 2026-05-29 15:26   ` Anatoly Burakov
  2026-06-01 12:16     ` Thomas Monjalon
  2026-05-29 15:26   ` [PATCH v2 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 1 reply; 28+ messages in thread
From: Anatoly Burakov @ 2026-05-29 15:26 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.

On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.

Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.

Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 112 +++++++++++++++++++++++--------
 1 file changed, 85 insertions(+), 27 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 24d3557859..967186382e 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
 
 struct pending_request {
 	TAILQ_ENTRY(pending_request) next;
+	unsigned long id;
 	enum {
 		REQUEST_TYPE_SYNC,
 		REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
 	};
 };
 
+static unsigned long next_request_id;
+
 TAILQ_HEAD(pending_request_list, pending_request);
 
 static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 static void
 async_reply_handle(void *arg);
 
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
 
 static void
 trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
 	return r;
 }
 
+static struct pending_request *
+find_pending_request_by_id(unsigned long id)
+{
+	struct pending_request *r;
+
+	TAILQ_FOREACH(r, &pending_requests.requests, next) {
+		if (r->id == id)
+			return r;
+	}
+
+	return NULL;
+}
+
 /*
  * Combine prefix and name(optional) to return unix domain socket path
  * return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
 }
 
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
 {
-	struct pending_request *req = (struct pending_request *)arg;
 	enum async_action action;
 	struct timespec ts_now;
 
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+	if (rte_eal_alarm_cancel(async_reply_handle,
+			(void *)(uintptr_t)req->id) < 0) {
 		/* if we failed to cancel the alarm because it's already in
 		 * progress, don't proceed because otherwise we will end up
 		 * handling the same message twice.
@@ -557,9 +573,13 @@ static void
 async_reply_handle(void *arg)
 {
 	struct pending_request *req;
+	/* alarm arg carries the request ID packed into a void * via uintptr_t */
+	unsigned long id = (uintptr_t)arg;
 
 	pthread_mutex_lock(&pending_requests.lock);
-	req = async_reply_handle_thread_unsafe(arg);
+	req = find_pending_request_by_id(id);
+	if (req != NULL)
+		req = async_reply_handle_thread_unsafe(req);
 	pthread_mutex_unlock(&pending_requests.lock);
 
 	if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 {
 	struct rte_mp_msg *reply_msg;
 	struct pending_request *pending_req, *exist;
-	int ret = -1;
+	unsigned long id;
+	int ret;
+
+	/* queue already locked by caller */
+
+	exist = find_pending_request(dst, req->name);
+	if (exist) {
+		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+		rte_errno = EEXIST;
+		return -1;
+	}
+
+	/* Set alarm before allocating or sending so request timeout tracking
+	 * is active as soon as this request ID is reserved.
+	 */
+	id = ++next_request_id;
+	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+			async_reply_handle,
+			(void *)(uintptr_t)id) < 0) {
+		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+			dst, req->name);
+		return -1;
+	}
 
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 
 	pending_req->type = REQUEST_TYPE_ASYNC;
+	pending_req->id = id;
 	strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
 	pending_req->request = req;
 	pending_req->reply = reply_msg;
 	pending_req->async.param = param;
 
-	/* queue already locked by caller */
-
-	exist = find_pending_request(dst, req->name);
-	if (exist) {
-		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
-		rte_errno = EEXIST;
-		ret = -1;
-		goto fail;
-	}
-
 	ret = send_msg(dst, req, MP_REQ);
 	if (ret < 0) {
 		EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 	param->user_reply.nb_sent++;
 
-	/* if alarm set fails, we simply ignore the reply */
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
-		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
-			dst, req->name);
-		ret = -1;
-		goto fail;
-	}
 	TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
 
 	return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	 * it, and put it on the queue if we don't send any requests.
 	 */
 	dummy->type = REQUEST_TYPE_ASYNC;
+	dummy->id = ++next_request_id;
 	dummy->request = copy;
 	dummy->reply = NULL;
 	dummy->async.param = param;
@@ -1194,7 +1220,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
-			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+			if (rte_eal_alarm_set(1, async_reply_handle,
+					(void *)(uintptr_t)dummy->id) < 0)
 				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 		}
 
@@ -1238,13 +1265,39 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
+
+	/*
+	 * On partial failure, roll back all queued requests. We hold the lock
+	 * so no one else touches the queue. All requests in this batch share
+	 * the same param pointer. Stale alarms will fire and harmlessly find
+	 * nothing via ID-based lookup.
+	 */
+	if (ret != 0 && reply->nb_sent > 0) {
+		struct pending_request *r, *next;
+
+		for (r = TAILQ_FIRST(&pending_requests.requests);
+				r != NULL; r = next) {
+			next = TAILQ_NEXT(r, next);
+			if (r->type == REQUEST_TYPE_ASYNC &&
+					r->async.param == param) {
+				TAILQ_REMOVE(&pending_requests.requests,
+						r, next);
+				free(r->reply);
+				/* r->request == copy, freed below after the loop */
+				free(r);
+			}
+		}
+		reply->nb_sent = 0;
+	}
+
 	/* if we didn't send anything, put dummy request on the queue
 	 * and set a minimum-delay alarm so the callback fires immediately.
 	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
-		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+		if (rte_eal_alarm_set(1, async_reply_handle,
+				(void *)(uintptr_t)dummy->id) < 0)
 			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
@@ -1260,6 +1313,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	/* if dummy was unused, free it */
 	if (!dummy_used)
 		free(dummy);
+	/* if nothing was sent, nobody owns copy/param */
+	if (ret != 0) {
+		free(param);
+		free(copy);
+	}
 
 	return ret;
 closedir_fail:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/5] eal: avoid deadlock in async IPC alarm callback
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                     ` (2 preceding siblings ...)
  2026-05-29 15:26   ` [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-05-29 15:26   ` Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-05-29 15:26 UTC (permalink / raw)
  To: dev, Jianfeng Tan

async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a noop.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 967186382e..67a393f53e 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle,
-			(void *)(uintptr_t)req->id) < 0) {
-		/* if we failed to cancel the alarm because it's already in
-		 * progress, don't proceed because otherwise we will end up
-		 * handling the same message twice.
-		 */
-		if (rte_errno == EINPROGRESS) {
-			EAL_LOG(DEBUG, "Request handling is already in progress");
-			goto no_trigger;
-		}
-		EAL_LOG(ERR, "Failed to cancel alarm");
-	}
-
 	if (action == ACTION_TRIGGER)
 		return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 		return -1;
 	}
 
-	/* Set alarm before allocating or sending so request timeout tracking
-	 * is active as soon as this request ID is reserved.
+	/* Set alarm before allocating or sending. The alarm is never cancelled:
+	 * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+	 * which deadlocks if we hold pending_requests.lock while the callback
+	 * is blocked on it. Instead, let stale alarms fire; with ID-based
+	 * lookup the callback will simply not find the request and return
+	 * harmlessly.
 	 */
 	id = ++next_request_id;
 	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure
  2026-05-29 15:26   ` [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-06-01 12:16     ` Thomas Monjalon
  2026-06-03  8:28       ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2026-06-01 12:16 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jianfeng Tan

29/05/2026 17:26, Anatoly Burakov:
> Use a numeric request ID for alarm callback lookup so stale callbacks
> from rolled-back requests become harmless no-ops.
[...]
> +static struct pending_request *
> +find_pending_request_by_id(unsigned long id)
> +{
> +	struct pending_request *r;
> +
> +	TAILQ_FOREACH(r, &pending_requests.requests, next) {
> +		if (r->id == id)
> +			return r;
> +	}
> +
> +	return NULL;
> +}

This function is supposed to find only async requests?
What will happen if id wraparound and becomes 0,
matching sync requests?

I feel we should filter with r->type == REQUEST_TYPE_ASYNC



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers
  2026-05-29 15:26   ` [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-06-01 12:21     ` Thomas Monjalon
  2026-06-01 12:40       ` Thomas Monjalon
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2026-06-01 12:21 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jianfeng Tan

29/05/2026 17:26, Anatoly Burakov:
> Currently, when rte_mp_request_async() is called and no peer processes
> are connected (nb_sent == 0), the user callback is never invoked.
> 
> The original implementation used a dedicated background thread and
> pthread_cond_signal() to wake it after queuing the dummy request. When
> that thread was replaced with per-message alarms, no alarm was set for
> the dummy request, silently breaking the nb_sent == 0 path.
> 
> This was not noticed because async requests are used while handling
> secondary process requests, where peers are typically already present.
> 
> Fix it by setting a 1us alarm on the dummy request, so the callback path
> immediately triggers and processes it.
> 
> Fixes: daf9bfca717e ("ipc: remove thread for async requests")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/eal/common/eal_common_proc.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
> index 799c6e81b0..0ec79336a5 100644
> --- a/lib/eal/common/eal_common_proc.c
> +++ b/lib/eal/common/eal_common_proc.c
> @@ -1187,11 +1187,15 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
>  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>  		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
>  
> -		/* if we didn't send anything, put dummy request on the queue */
> +		/* if we didn't send anything, put dummy request on the queue
> +		 * and set a minimum-delay alarm so the callback fires immediately.
> +		 */
>  		if (ret == 0 && reply->nb_sent == 0) {
>  			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
>  					next);
>  			dummy_used = true;
> +			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
> +				EAL_LOG(ERR, "Fail to set alarm for dummy request");

Shouldn't we return an error?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers
  2026-06-01 12:21     ` Thomas Monjalon
@ 2026-06-01 12:40       ` Thomas Monjalon
  2026-06-04 16:21         ` Burakov, Anatoly
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Monjalon @ 2026-06-01 12:40 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jianfeng Tan

01/06/2026 14:21, Thomas Monjalon:
> 29/05/2026 17:26, Anatoly Burakov:
> > Currently, when rte_mp_request_async() is called and no peer processes
> > are connected (nb_sent == 0), the user callback is never invoked.
> > 
> > The original implementation used a dedicated background thread and
> > pthread_cond_signal() to wake it after queuing the dummy request. When
> > that thread was replaced with per-message alarms, no alarm was set for
> > the dummy request, silently breaking the nb_sent == 0 path.
> > 
> > This was not noticed because async requests are used while handling
> > secondary process requests, where peers are typically already present.
> > 
> > Fix it by setting a 1us alarm on the dummy request, so the callback path
> > immediately triggers and processes it.
> > 
> > Fixes: daf9bfca717e ("ipc: remove thread for async requests")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> >  lib/eal/common/eal_common_proc.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
> > index 799c6e81b0..0ec79336a5 100644
> > --- a/lib/eal/common/eal_common_proc.c
> > +++ b/lib/eal/common/eal_common_proc.c
> > @@ -1187,11 +1187,15 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
> >  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> >  		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
> >  
> > -		/* if we didn't send anything, put dummy request on the queue */
> > +		/* if we didn't send anything, put dummy request on the queue
> > +		 * and set a minimum-delay alarm so the callback fires immediately.
> > +		 */
> >  		if (ret == 0 && reply->nb_sent == 0) {
> >  			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
> >  					next);
> >  			dummy_used = true;
> > +			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
> > +				EAL_LOG(ERR, "Fail to set alarm for dummy request");
> 
> Shouldn't we return an error?

AI suggests this:

      if (rte_eal_alarm_set(1, async_reply_handle,
              (void *)(uintptr_t)dummy->id) < 0) {
          EAL_LOG(ERR, "Fail to set alarm for dummy request");
          TAILQ_REMOVE(&pending_requests.requests, dummy, next);
          dummy_used = false;
          ret = -1;
      }



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure
  2026-06-01 12:16     ` Thomas Monjalon
@ 2026-06-03  8:28       ` Burakov, Anatoly
  0 siblings, 0 replies; 28+ messages in thread
From: Burakov, Anatoly @ 2026-06-03  8:28 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Jianfeng Tan

On 6/1/2026 2:16 PM, Thomas Monjalon wrote:
> 29/05/2026 17:26, Anatoly Burakov:
>> Use a numeric request ID for alarm callback lookup so stale callbacks
>> from rolled-back requests become harmless no-ops.
> [...]
>> +static struct pending_request *
>> +find_pending_request_by_id(unsigned long id)
>> +{
>> +	struct pending_request *r;
>> +
>> +	TAILQ_FOREACH(r, &pending_requests.requests, next) {
>> +		if (r->id == id)
>> +			return r;
>> +	}
>> +
>> +	return NULL;
>> +}
> 
> This function is supposed to find only async requests?
> What will happen if id wraparound and becomes 0,
> matching sync requests?
> 
> I feel we should filter with r->type == REQUEST_TYPE_ASYNC
> 
> 

I mean it could happen if you submit `unsigned long` worth of IPC 
requests, but I didn't think it was a problem that needed solving. But 
sure, we can add an additional check.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers
  2026-06-01 12:40       ` Thomas Monjalon
@ 2026-06-04 16:21         ` Burakov, Anatoly
  0 siblings, 0 replies; 28+ messages in thread
From: Burakov, Anatoly @ 2026-06-04 16:21 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Jianfeng Tan

On 6/1/2026 2:40 PM, Thomas Monjalon wrote:
> 01/06/2026 14:21, Thomas Monjalon:
>> 29/05/2026 17:26, Anatoly Burakov:
>>> Currently, when rte_mp_request_async() is called and no peer processes
>>> are connected (nb_sent == 0), the user callback is never invoked.
>>>
>>> The original implementation used a dedicated background thread and
>>> pthread_cond_signal() to wake it after queuing the dummy request. When
>>> that thread was replaced with per-message alarms, no alarm was set for
>>> the dummy request, silently breaking the nb_sent == 0 path.
>>>
>>> This was not noticed because async requests are used while handling
>>> secondary process requests, where peers are typically already present.
>>>
>>> Fix it by setting a 1us alarm on the dummy request, so the callback path
>>> immediately triggers and processes it.
>>>
>>> Fixes: daf9bfca717e ("ipc: remove thread for async requests")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>>>   lib/eal/common/eal_common_proc.c | 12 ++++++++++--
>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
>>> index 799c6e81b0..0ec79336a5 100644
>>> --- a/lib/eal/common/eal_common_proc.c
>>> +++ b/lib/eal/common/eal_common_proc.c
>>> @@ -1187,11 +1187,15 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
>>>   	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>>>   		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
>>>   
>>> -		/* if we didn't send anything, put dummy request on the queue */
>>> +		/* if we didn't send anything, put dummy request on the queue
>>> +		 * and set a minimum-delay alarm so the callback fires immediately.
>>> +		 */
>>>   		if (ret == 0 && reply->nb_sent == 0) {
>>>   			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
>>>   					next);
>>>   			dummy_used = true;
>>> +			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
>>> +				EAL_LOG(ERR, "Fail to set alarm for dummy request");
>>
>> Shouldn't we return an error?
> 
> AI suggests this:
> 
>        if (rte_eal_alarm_set(1, async_reply_handle,
>                (void *)(uintptr_t)dummy->id) < 0) {
>            EAL_LOG(ERR, "Fail to set alarm for dummy request");
>            TAILQ_REMOVE(&pending_requests.requests, dummy, next);
>            dummy_used = false;
>            ret = -1;
>        }
> 
Dummy id is not added till later in the series, but I'll integrate this 
otherwise.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/5] eal: fix wrong log message in async IPC request
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                   ` (4 preceding siblings ...)
  2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-06-04 16:32 ` Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
                     ` (3 more replies)
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  6 siblings, 4 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-04 16:32 UTC (permalink / raw)
  To: dev, Jianfeng Tan

The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.

Fix the log to say "async request".

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
 	if (pending_req == NULL || reply_msg == NULL) {
-		EAL_LOG(ERR, "Could not allocate space for sync request");
+		EAL_LOG(ERR, "Could not allocate space for async request");
 		rte_errno = ENOMEM;
 		ret = -1;
 		goto fail;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-06-04 16:32   ` Anatoly Burakov
  2026-06-05 18:15     ` Stephen Hemminger
  2026-06-04 16:32   ` [PATCH v3 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-04 16:32 UTC (permalink / raw)
  To: dev, Jianfeng Tan

Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.

The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.

This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.

Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..5cc15a0f78 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,21 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
 
-		/* if we didn't send anything, put dummy request on the queue */
+		/* if we didn't send anything, put dummy request on the queue
+		 * and set a minimum-delay alarm so the callback fires immediately.
+		 */
 		if (ret == 0 && reply->nb_sent == 0) {
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
+
+			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+				EAL_LOG(ERR, "Fail to set alarm for dummy request");
+				/* roll back the changes */
+				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+				dummy_used = false;
+				ret = -1;
+			}
 		}
 
 		pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1242,14 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
-	/* if we didn't send anything, put dummy request on the queue */
+	/* if we didn't send anything, put dummy request on the queue
+	 * and set a minimum-delay alarm so the callback fires immediately.
+	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
+		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
 	/* finally, unlock the queue */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/5] eal: fix memory leak in async IPC secondary path
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-06-04 16:32   ` Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-04 16:32 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.

Free dummy before returning on the success path when it was not inserted
into the queue.

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 5cc15a0f78..02adcb9ba8 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1209,6 +1209,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		/* if we couldn't send anything, clean up */
 		if (ret != 0)
 			goto fail;
+		if (!dummy_used)
+			free(dummy);
 		return 0;
 	}
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/5] eal: fix async IPC resource leaks on partial failure
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
@ 2026-06-04 16:32   ` Anatoly Burakov
  2026-06-04 16:32   ` [PATCH v3 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-04 16:32 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.

On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.

Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.

Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 113 +++++++++++++++++++++++--------
 1 file changed, 85 insertions(+), 28 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 02adcb9ba8..f023991438 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
 
 struct pending_request {
 	TAILQ_ENTRY(pending_request) next;
+	unsigned long id;
 	enum {
 		REQUEST_TYPE_SYNC,
 		REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
 	};
 };
 
+static unsigned long next_request_id;
+
 TAILQ_HEAD(pending_request_list, pending_request);
 
 static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 static void
 async_reply_handle(void *arg);
 
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
 
 static void
 trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
 	return r;
 }
 
+static struct pending_request *
+find_async_request_by_id(unsigned long id)
+{
+	struct pending_request *r;
+
+	TAILQ_FOREACH(r, &pending_requests.requests, next) {
+		if (r->id == id && r->type == REQUEST_TYPE_ASYNC)
+			return r;
+	}
+
+	return NULL;
+}
+
 /*
  * Combine prefix and name(optional) to return unix domain socket path
  * return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
 }
 
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
 {
-	struct pending_request *req = (struct pending_request *)arg;
 	enum async_action action;
 	struct timespec ts_now;
 
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+	if (rte_eal_alarm_cancel(async_reply_handle,
+			(void *)(uintptr_t)req->id) < 0) {
 		/* if we failed to cancel the alarm because it's already in
 		 * progress, don't proceed because otherwise we will end up
 		 * handling the same message twice.
@@ -557,9 +573,13 @@ static void
 async_reply_handle(void *arg)
 {
 	struct pending_request *req;
+	/* alarm arg carries the request ID packed into a void * via uintptr_t */
+	unsigned long id = (uintptr_t)arg;
 
 	pthread_mutex_lock(&pending_requests.lock);
-	req = async_reply_handle_thread_unsafe(arg);
+	req = find_async_request_by_id(id);
+	if (req != NULL)
+		req = async_reply_handle_thread_unsafe(req);
 	pthread_mutex_unlock(&pending_requests.lock);
 
 	if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 {
 	struct rte_mp_msg *reply_msg;
 	struct pending_request *pending_req, *exist;
-	int ret = -1;
+	unsigned long id;
+	int ret;
+
+	/* queue already locked by caller */
+
+	exist = find_pending_request(dst, req->name);
+	if (exist) {
+		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+		rte_errno = EEXIST;
+		return -1;
+	}
+
+	/* Set alarm before allocating or sending so request timeout tracking
+	 * is active as soon as this request ID is reserved.
+	 */
+	id = ++next_request_id;
+	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+			async_reply_handle,
+			(void *)(uintptr_t)id) < 0) {
+		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+			dst, req->name);
+		return -1;
+	}
 
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 
 	pending_req->type = REQUEST_TYPE_ASYNC;
+	pending_req->id = id;
 	strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
 	pending_req->request = req;
 	pending_req->reply = reply_msg;
 	pending_req->async.param = param;
 
-	/* queue already locked by caller */
-
-	exist = find_pending_request(dst, req->name);
-	if (exist) {
-		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
-		rte_errno = EEXIST;
-		ret = -1;
-		goto fail;
-	}
-
 	ret = send_msg(dst, req, MP_REQ);
 	if (ret < 0) {
 		EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 	param->user_reply.nb_sent++;
 
-	/* if alarm set fails, we simply ignore the reply */
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
-		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
-			dst, req->name);
-		ret = -1;
-		goto fail;
-	}
 	TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
 
 	return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	 * it, and put it on the queue if we don't send any requests.
 	 */
 	dummy->type = REQUEST_TYPE_ASYNC;
+	dummy->id = ++next_request_id;
 	dummy->request = copy;
 	dummy->reply = NULL;
 	dummy->async.param = param;
@@ -1194,8 +1220,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
-
-			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+			if (rte_eal_alarm_set(1, async_reply_handle,
+					(void *)(uintptr_t)dummy->id) < 0) {
 				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 				/* roll back the changes */
 				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1244,13 +1270,39 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
+
+	/*
+	 * On partial failure, roll back all queued requests. We hold the lock
+	 * so no one else touches the queue. All requests in this batch share
+	 * the same param pointer. Stale alarms will fire and harmlessly find
+	 * nothing via ID-based lookup.
+	 */
+	if (ret != 0 && reply->nb_sent > 0) {
+		struct pending_request *r, *next;
+
+		for (r = TAILQ_FIRST(&pending_requests.requests);
+				r != NULL; r = next) {
+			next = TAILQ_NEXT(r, next);
+			if (r->type == REQUEST_TYPE_ASYNC &&
+					r->async.param == param) {
+				TAILQ_REMOVE(&pending_requests.requests,
+						r, next);
+				free(r->reply);
+				/* r->request == copy, freed below after the loop */
+				free(r);
+			}
+		}
+		reply->nb_sent = 0;
+	}
+
 	/* if we didn't send anything, put dummy request on the queue
 	 * and set a minimum-delay alarm so the callback fires immediately.
 	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
-		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+		if (rte_eal_alarm_set(1, async_reply_handle,
+				(void *)(uintptr_t)dummy->id) < 0)
 			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
@@ -1266,6 +1318,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	/* if dummy was unused, free it */
 	if (!dummy_used)
 		free(dummy);
+	/* if nothing was sent, nobody owns copy/param */
+	if (ret != 0) {
+		free(param);
+		free(copy);
+	}
 
 	return ret;
 closedir_fail:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 5/5] eal: avoid deadlock in async IPC alarm callback
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                     ` (2 preceding siblings ...)
  2026-06-04 16:32   ` [PATCH v3 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-06-04 16:32   ` Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-04 16:32 UTC (permalink / raw)
  To: dev, Jianfeng Tan

async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a noop.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index f023991438..012a03440c 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle,
-			(void *)(uintptr_t)req->id) < 0) {
-		/* if we failed to cancel the alarm because it's already in
-		 * progress, don't proceed because otherwise we will end up
-		 * handling the same message twice.
-		 */
-		if (rte_errno == EINPROGRESS) {
-			EAL_LOG(DEBUG, "Request handling is already in progress");
-			goto no_trigger;
-		}
-		EAL_LOG(ERR, "Failed to cancel alarm");
-	}
-
 	if (action == ACTION_TRIGGER)
 		return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 		return -1;
 	}
 
-	/* Set alarm before allocating or sending so request timeout tracking
-	 * is active as soon as this request ID is reserved.
+	/* Set alarm before allocating or sending. The alarm is never cancelled:
+	 * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+	 * which deadlocks if we hold pending_requests.lock while the callback
+	 * is blocked on it. Instead, let stale alarms fire; with ID-based
+	 * lookup the callback will simply not find the request and return
+	 * harmlessly.
 	 */
 	id = ++next_request_id;
 	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 1/5] eal: fix wrong log message in async IPC request
  2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                   ` (5 preceding siblings ...)
  2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-06-05 14:29 ` Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
                     ` (3 more replies)
  6 siblings, 4 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-05 14:29 UTC (permalink / raw)
  To: dev, Jianfeng Tan

The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.

Fix the log to say "async request".

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
 	if (pending_req == NULL || reply_msg == NULL) {
-		EAL_LOG(ERR, "Could not allocate space for sync request");
+		EAL_LOG(ERR, "Could not allocate space for async request");
 		rte_errno = ENOMEM;
 		ret = -1;
 		goto fail;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 2/5] eal: fix async IPC callback not fired when no peers
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
@ 2026-06-05 14:29   ` Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-05 14:29 UTC (permalink / raw)
  To: dev, Jianfeng Tan

Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.

The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.

This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.

Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..2f4a939c68 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,22 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
 
-		/* if we didn't send anything, put dummy request on the queue */
+		/* if we didn't send anything, put dummy request on the queue
+		 * and set a minimum-delay alarm so the callback fires immediately.
+		 */
 		if (ret == 0 && reply->nb_sent == 0) {
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
+
+			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+				EAL_LOG(ERR, "Fail to set alarm for dummy request");
+				/* roll back the changes */
+				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+				dummy_used = false;
+				ret = -1;
+				goto fail;
+			}
 		}
 
 		pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1243,14 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
-	/* if we didn't send anything, put dummy request on the queue */
+	/* if we didn't send anything, put dummy request on the queue
+	 * and set a minimum-delay alarm so the callback fires immediately.
+	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
+		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
 	/* finally, unlock the queue */
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 3/5] eal: fix memory leak in async IPC secondary path
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-06-05 14:29   ` Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-05 14:29 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.

Free dummy before returning on the success path when it was not inserted
into the queue.

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 2f4a939c68..c8e59967d9 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1210,6 +1210,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		/* if we couldn't send anything, clean up */
 		if (ret != 0)
 			goto fail;
+		if (!dummy_used)
+			free(dummy);
 		return 0;
 	}
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 4/5] eal: fix async IPC resource leaks on partial failure
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
@ 2026-06-05 14:29   ` Anatoly Burakov
  2026-06-05 14:29   ` [PATCH v4 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-05 14:29 UTC (permalink / raw)
  To: dev, Jianfeng Tan

When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.

On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.

Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.

Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 112 +++++++++++++++++++++++--------
 1 file changed, 84 insertions(+), 28 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index c8e59967d9..64812bcfd7 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
 
 struct pending_request {
 	TAILQ_ENTRY(pending_request) next;
+	unsigned long id;
 	enum {
 		REQUEST_TYPE_SYNC,
 		REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
 	};
 };
 
+static unsigned long next_request_id;
+
 TAILQ_HEAD(pending_request_list, pending_request);
 
 static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 static void
 async_reply_handle(void *arg);
 
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
 
 static void
 trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
 	return r;
 }
 
+static struct pending_request *
+find_async_request_by_id(unsigned long id)
+{
+	struct pending_request *r;
+
+	TAILQ_FOREACH(r, &pending_requests.requests, next) {
+		if (r->id == id && r->type == REQUEST_TYPE_ASYNC)
+			return r;
+	}
+
+	return NULL;
+}
+
 /*
  * Combine prefix and name(optional) to return unix domain socket path
  * return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
 }
 
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
 {
-	struct pending_request *req = (struct pending_request *)arg;
 	enum async_action action;
 	struct timespec ts_now;
 
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+	if (rte_eal_alarm_cancel(async_reply_handle,
+			(void *)(uintptr_t)req->id) < 0) {
 		/* if we failed to cancel the alarm because it's already in
 		 * progress, don't proceed because otherwise we will end up
 		 * handling the same message twice.
@@ -557,9 +573,13 @@ static void
 async_reply_handle(void *arg)
 {
 	struct pending_request *req;
+	/* alarm arg carries the request ID packed into a void * via uintptr_t */
+	unsigned long id = (uintptr_t)arg;
 
 	pthread_mutex_lock(&pending_requests.lock);
-	req = async_reply_handle_thread_unsafe(arg);
+	req = find_async_request_by_id(id);
+	if (req != NULL)
+		req = async_reply_handle_thread_unsafe(req);
 	pthread_mutex_unlock(&pending_requests.lock);
 
 	if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 {
 	struct rte_mp_msg *reply_msg;
 	struct pending_request *pending_req, *exist;
-	int ret = -1;
+	unsigned long id;
+	int ret;
+
+	/* queue already locked by caller */
+
+	exist = find_pending_request(dst, req->name);
+	if (exist) {
+		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+		rte_errno = EEXIST;
+		return -1;
+	}
+
+	/* Set alarm before allocating or sending so request timeout tracking
+	 * is active as soon as this request ID is reserved.
+	 */
+	id = ++next_request_id;
+	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+			async_reply_handle,
+			(void *)(uintptr_t)id) < 0) {
+		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+			dst, req->name);
+		return -1;
+	}
 
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 
 	pending_req->type = REQUEST_TYPE_ASYNC;
+	pending_req->id = id;
 	strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
 	pending_req->request = req;
 	pending_req->reply = reply_msg;
 	pending_req->async.param = param;
 
-	/* queue already locked by caller */
-
-	exist = find_pending_request(dst, req->name);
-	if (exist) {
-		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
-		rte_errno = EEXIST;
-		ret = -1;
-		goto fail;
-	}
-
 	ret = send_msg(dst, req, MP_REQ);
 	if (ret < 0) {
 		EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 	param->user_reply.nb_sent++;
 
-	/* if alarm set fails, we simply ignore the reply */
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
-		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
-			dst, req->name);
-		ret = -1;
-		goto fail;
-	}
 	TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
 
 	return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	 * it, and put it on the queue if we don't send any requests.
 	 */
 	dummy->type = REQUEST_TYPE_ASYNC;
+	dummy->id = ++next_request_id;
 	dummy->request = copy;
 	dummy->reply = NULL;
 	dummy->async.param = param;
@@ -1194,8 +1220,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
-
-			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+			if (rte_eal_alarm_set(1, async_reply_handle,
+					(void *)(uintptr_t)dummy->id) < 0) {
 				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 				/* roll back the changes */
 				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1245,13 +1271,38 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
+
+	/*
+	 * On partial failure, roll back all queued requests in this batch while
+	 * holding pending_requests.lock. Any alarm callback that runs later for
+	 * these removed IDs will not find a pending request and will return.
+	 */
+	if (ret != 0 && reply->nb_sent > 0) {
+		struct pending_request *r, *next;
+
+		for (r = TAILQ_FIRST(&pending_requests.requests);
+				r != NULL; r = next) {
+			next = TAILQ_NEXT(r, next);
+			if (r->type == REQUEST_TYPE_ASYNC &&
+					r->async.param == param) {
+				TAILQ_REMOVE(&pending_requests.requests,
+						r, next);
+				free(r->reply);
+				/* r->request == copy, freed below after the loop */
+				free(r);
+			}
+		}
+		reply->nb_sent = 0;
+	}
+
 	/* if we didn't send anything, put dummy request on the queue
 	 * and set a minimum-delay alarm so the callback fires immediately.
 	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
-		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
+		if (rte_eal_alarm_set(1, async_reply_handle,
+				(void *)(uintptr_t)dummy->id) < 0)
 			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 	}
 
@@ -1267,6 +1318,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	/* if dummy was unused, free it */
 	if (!dummy_used)
 		free(dummy);
+	/* if nothing was sent, nobody owns copy/param */
+	if (ret != 0) {
+		free(param);
+		free(copy);
+	}
 
 	return ret;
 closedir_fail:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 5/5] eal: avoid deadlock in async IPC alarm callback
  2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
                     ` (2 preceding siblings ...)
  2026-06-05 14:29   ` [PATCH v4 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
@ 2026-06-05 14:29   ` Anatoly Burakov
  3 siblings, 0 replies; 28+ messages in thread
From: Anatoly Burakov @ 2026-06-05 14:29 UTC (permalink / raw)
  To: dev, Jianfeng Tan

async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a noop.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 64812bcfd7..e459109fec 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle,
-			(void *)(uintptr_t)req->id) < 0) {
-		/* if we failed to cancel the alarm because it's already in
-		 * progress, don't proceed because otherwise we will end up
-		 * handling the same message twice.
-		 */
-		if (rte_errno == EINPROGRESS) {
-			EAL_LOG(DEBUG, "Request handling is already in progress");
-			goto no_trigger;
-		}
-		EAL_LOG(ERR, "Failed to cancel alarm");
-	}
-
 	if (action == ACTION_TRIGGER)
 		return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 		return -1;
 	}
 
-	/* Set alarm before allocating or sending so request timeout tracking
-	 * is active as soon as this request ID is reserved.
+	/* Set alarm before allocating or sending. The alarm is never cancelled:
+	 * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+	 * which deadlocks if we hold pending_requests.lock while the callback
+	 * is blocked on it. Instead, let stale alarms fire; with ID-based
+	 * lookup the callback will simply not find the request and return
+	 * harmlessly.
 	 */
 	id = ++next_request_id;
 	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
@@ -1273,9 +1264,10 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	}
 
 	/*
-	 * On partial failure, roll back all queued requests in this batch while
-	 * holding pending_requests.lock. Any alarm callback that runs later for
-	 * these removed IDs will not find a pending request and will return.
+	 * On partial failure, roll back all queued requests. We hold the lock
+	 * so no one else touches the queue. All requests in this batch share
+	 * the same param pointer. Stale alarms will fire and harmlessly find
+	 * nothing via ID-based lookup.
 	 */
 	if (ret != 0 && reply->nb_sent > 0) {
 		struct pending_request *r, *next;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers
  2026-06-04 16:32   ` [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
@ 2026-06-05 18:15     ` Stephen Hemminger
  0 siblings, 0 replies; 28+ messages in thread
From: Stephen Hemminger @ 2026-06-05 18:15 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jianfeng Tan

On Thu,  4 Jun 2026 17:32:16 +0100
Anatoly Burakov <anatoly.burakov@intel.com> wrote:

> Currently, when rte_mp_request_async() is called and no peer processes
> are connected (nb_sent == 0), the user callback is never invoked.
> 
> The original implementation used a dedicated background thread and
> pthread_cond_signal() to wake it after queuing the dummy request. When
> that thread was replaced with per-message alarms, no alarm was set for
> the dummy request, silently breaking the nb_sent == 0 path.
> 
> This was not noticed because async requests are used while handling
> secondary process requests, where peers are typically already present.
> 
> Fix it by setting a 1us alarm on the dummy request, so the callback path
> immediately triggers and processes it.
> 
> Fixes: daf9bfca717e ("ipc: remove thread for async requests")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/eal/common/eal_common_proc.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
> index 799c6e81b0..5cc15a0f78 100644
> --- a/lib/eal/common/eal_common_proc.c
> +++ b/lib/eal/common/eal_common_proc.c
> @@ -1187,11 +1187,21 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
>  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>  		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
>  
> -		/* if we didn't send anything, put dummy request on the queue */
> +		/* if we didn't send anything, put dummy request on the queue
> +		 * and set a minimum-delay alarm so the callback fires immediately.
> +		 */
>  		if (ret == 0 && reply->nb_sent == 0) {
>  			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
>  					next);
>  			dummy_used = true;
> +
> +			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
> +				EAL_LOG(ERR, "Fail to set alarm for dummy request");
> +				/* roll back the changes */
> +				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
> +				dummy_used = false;
> +				ret = -1;
> +			}
>  		}
>  
>  		pthread_mutex_unlock(&pending_requests.lock);
> @@ -1232,10 +1242,14 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
>  		} else if (mp_request_async(path, copy, param, ts))
>  			ret = -1;
>  	}
> -	/* if we didn't send anything, put dummy request on the queue */
> +	/* if we didn't send anything, put dummy request on the queue
> +	 * and set a minimum-delay alarm so the callback fires immediately.
> +	 */
>  	if (ret == 0 && reply->nb_sent == 0) {
>  		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
>  		dummy_used = true;
> +		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0)
> +			EAL_LOG(ERR, "Fail to set alarm for dummy request");
>  	}
>  
>  	/* finally, unlock the queue */


AI spotted potential issue:

The bug in 2/5: in the primary-process path, if rte_eal_alarm_set() fails for the dummy request, the code only logs it.
The dummy stays on the queue with no alarm, the function returns 0 (success),
the callback never fires, and dummy/copy/param leak.

The secondary path right above it handles this correctly (rolls back, returns -1).
Fix is to make the primary path do the same. This corner is never fixed by the later patches.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-06-05 18:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-19 16:07 [PATCH v1 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
2026-03-19 16:07 ` [PATCH v1 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
2026-03-19 16:07 ` [PATCH v1 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
2026-03-19 16:07 ` [PATCH v1 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
2026-05-28 14:24   ` Thomas Monjalon
2026-05-29 15:10     ` Burakov, Anatoly
2026-03-19 16:07 ` [PATCH v1 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
2026-05-29 15:26 ` [PATCH v2 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
2026-05-29 15:26   ` [PATCH v2 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
2026-06-01 12:21     ` Thomas Monjalon
2026-06-01 12:40       ` Thomas Monjalon
2026-06-04 16:21         ` Burakov, Anatoly
2026-05-29 15:26   ` [PATCH v2 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
2026-05-29 15:26   ` [PATCH v2 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
2026-06-01 12:16     ` Thomas Monjalon
2026-06-03  8:28       ` Burakov, Anatoly
2026-05-29 15:26   ` [PATCH v2 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
2026-06-04 16:32 ` [PATCH v3 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
2026-06-04 16:32   ` [PATCH v3 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
2026-06-05 18:15     ` Stephen Hemminger
2026-06-04 16:32   ` [PATCH v3 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
2026-06-04 16:32   ` [PATCH v3 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
2026-06-04 16:32   ` [PATCH v3 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov
2026-06-05 14:29 ` [PATCH v4 1/5] eal: fix wrong log message in async IPC request Anatoly Burakov
2026-06-05 14:29   ` [PATCH v4 2/5] eal: fix async IPC callback not fired when no peers Anatoly Burakov
2026-06-05 14:29   ` [PATCH v4 3/5] eal: fix memory leak in async IPC secondary path Anatoly Burakov
2026-06-05 14:29   ` [PATCH v4 4/5] eal: fix async IPC resource leaks on partial failure Anatoly Burakov
2026-06-05 14:29   ` [PATCH v4 5/5] eal: avoid deadlock in async IPC alarm callback Anatoly Burakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox