DPDK-dev Archive on lore.kernel.org
* Re: [PATCH] dma/cnxk: fix crash on secondary process cleanup
From: Jerin Jacob @ 2026-06-08 16:09 UTC (permalink / raw)
  To: pbhagavatula
  Cc: jerinj, Vamsi Attunuru, Anatoly Burakov, Radha Mohan Chintakuntla,
	dev, stable
In-Reply-To: <20260605081620.97056-1-pbhagavatula@marvell.com>

On Fri, Jun 5, 2026 at 2:11 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> cnxk_dmadev_probe() ran in secondary processes too, overwriting the
> shared rdpi->pci_dev with a process-local pointer and marking the
> device ready. With buses now cleaned up on shutdown, the primary's
> roc_dpi_dev_fini() dereferences that stale pointer and crashes.
>
> Skip HW init in secondary processes: attach to the shared device data
> and return, leaving rdpi and the device state untouched.
>
> Fixes: 53f6d7328bf4 ("dma/cnxk: create and initialize device on PCI probing")
> Cc: stable@dpdk.org
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

Applied to dpdk-next-net-mrvl/for-main. Thanks


* Re: [PATCH v2 2/3] event/cnxk: add pause to spinloops
From: Jerin Jacob @ 2026-06-08 15:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Pavan Nikhilesh, Shijith Thotton
In-Reply-To: <20260413164652.33291-3-stephen@networkplumber.org>

On Mon, Apr 13, 2026 at 10:36 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On SMT systems when a spinloop is done without a pause
> it may cause excessive latency. This problem was found
> by the fix_empty_spinloops coccinelle script.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

rte_pause() translates to the YIELD instruction. Since cnxk is an
integrated SoC with single-threaded cores, it won't help with anything;
it only adds one more instruction's worth of latency.
In general, the 3/3 devtool patch is good. Please send it as a separate
version so that the 3/3 patch can be merged through the main tree.



---------------
The YIELD instruction write-up from the ARMv8 manual:

YIELD is a hint instruction in the ARMv8-A architecture. It tells the
CPU that the current hardware thread is doing nothing useful right now
(typically spinning in a busy-wait loop), so the processor may
reallocate shared execution resources to other hardware threads.

On SMT (multithreaded) cores, this can give sibling hardware threads
more resources, improving overall throughput.
On most current single-threaded ARM cores, YIELD executes as a NOP;
it has no microarchitectural effect, but it's architecturally valid
and harmless. It does not put the core to sleep (unlike WFE/WFI).
It's a pure hint: it never changes program correctness, only
potentially performance/fairness.
-------------
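
For reference, a minimal sketch of a bounded spin-wait that issues the
hint on each iteration. demo_cpu_relax() and demo_spin_until_set() are
hypothetical names made up for this illustration, not DPDK APIs; the
inline asm assumes a GCC/Clang-style compiler. On a single-threaded core
the hint is expected to behave as a NOP, as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: emit the architecture's spin-wait hint, if any. */
static inline void demo_cpu_relax(void)
{
#if defined(__GNUC__) && defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__asm__ volatile("pause" ::: "memory");
#endif
	/* Other targets: no hint, the caller's loop simply re-reads the flag. */
}

/*
 * Spin until *flag becomes true or max_spins iterations have elapsed.
 * Returns the number of iterations spent waiting.
 */
static unsigned int demo_spin_until_set(atomic_bool *flag,
					unsigned int max_spins)
{
	unsigned int spins = 0;

	assert(flag != NULL);
	while (!atomic_load_explicit(flag, memory_order_acquire) &&
	       spins < max_spins) {
		demo_cpu_relax();
		spins++;
	}
	return spins;
}
```

The hint never affects the result of the loop; it only changes how the
core spends the iterations, which is why it is harmless but also not
useful where there is no sibling hardware thread.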


* Re: [PATCH] eal: fix core_index for non-EAL registered threads
From: David Marchand @ 2026-06-08 15:49 UTC (permalink / raw)
  To: Maxime Peim, Dariusz Sosnowski, Slava Ovsiienko
  Cc: dev, Matan Azrad, Thomas Monjalon
In-Reply-To: <20260422075414.2528455-1-maxime.peim@gmail.com>

Hello,

On Wed, 22 Apr 2026 at 09:54, Maxime Peim <maxime.peim@gmail.com> wrote:
>
> Threads registered via rte_thread_register() are assigned a valid
> lcore_id by eal_lcore_non_eal_allocate(), but their core_index in
> lcore_config is left at -1. This value was set during rte_eal_cpu_init()
> for lcores with ROLE_OFF (undetected CPUs) and is never updated when the
> lcore is later allocated to a non-EAL thread.
>
> As a result, rte_lcore_index() returns -1 for registered non-EAL
> threads. Libraries that use rte_lcore_index() to select per-lcore
> caches fall back to a shared global path when it returns -1, causing
> severe contention under concurrent access from multiple registered
> threads.
>
> A concrete example is the mlx5 indexed memory pool (mlx5_ipool), which
> uses rte_lcore_index() in mlx5_ipool_malloc_cache() to select a per-core
> cache slot. When core_index is -1, all registered threads are funneled
> into a single shared slot protected by a spinlock. In testing with VPP
> (which registers worker threads via rte_thread_register()), this caused
> async flow rule insertion throughput to drop from ~6.4M rules/sec to
> ~1.2M rules/sec with 4 workers -- a 5x regression attributable entirely
> to spinlock contention in the ipool allocator.
>
> Fix by setting core_index to the next sequential index (cfg->lcore_count)
> in eal_lcore_non_eal_allocate() before incrementing the count. Also reset
> core_index back to -1 on the error rollback path and in
> eal_lcore_non_eal_release() for correctness.
>
> Fixes: 5c307ba2a5b1 ("eal: register non-EAL threads as lcores")
> Signed-off-by: Maxime Peim <maxime.peim@gmail.com>

Thanks for the fix, Maxime. It looks correct, though I am a bit
skeptical about the usage of this API with dynamic thread allocation.

In the net/mlx5 context, for example, I expect no memory saving from
using the lcore "index": mlx5 is allocating an array with
RTE_MAX_LCORE+1 entries.
Using rte_lcore_id() would probably be good enough.
Dariusz, Slava, any opinion?


-- 
David Marchand



* Re: [PATCH v4] ethdev: support inline calculating masked item value
From: Stephen Hemminger @ 2026-06-08 15:45 UTC (permalink / raw)
  To: Bing Zhao
  Cc: viacheslavo, dev, rasland, orika, dsosnowski, suanmingm, matan,
	thomas
In-Reply-To: <20260603092805.9837-1-bingz@nvidia.com>

On Wed, 3 Jun 2026 12:28:05 +0300
Bing Zhao <bingz@nvidia.com> wrote:

> In the asynchronous API definition and some drivers, the
> rte_flow_item spec value may not be calculated by the driver due to the
> reason of speed of light rule insertion rate and sometimes the input
> parameters will be copied and changed internally.
> 
> After copying, the spec and last will be protected by the keyword
> const and cannot be changed in the code itself. And also the driver
> needs some extra memory to do the calculation and extra conditions
> to understand the length of each item spec. This is not efficient.
> 
> To solve the issue and support usage of the following fix, a new OP
> was introduced to calculate the spec and last values after applying
> the mask inline.
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---

More detailed AI review found some things that still need addressing.

On Wed,  3 Jun 2026 12:28:05 +0300, Bing Zhao wrote:
> Subject: [PATCH v4] ethdev: support inline calculating masked item value

Error: byte-wise masking corrupts embedded pointers in deep-copy item
types (RAW, FLEX, GENEVE_OPT).

In rte_flow_conv_pattern(), the new mask application runs over the fixed
item struct:

	size_t item_mask_size = mask ? rte_flow_conv_item_mask_size(src) : 0;
	...
	size_t mask_size = RTE_MIN(ret, item_mask_size);

	for (j = 0; j < mask_size; j++)
		c_spec[j] &= mask[j];

item_mask_size is rte_flow_desc_item[type].size, the size of the fixed
item struct. For RTE_FLOW_ITEM_TYPE_RAW, FLEX, and GENEVE_OPT, that fixed
struct ends in an embedded pointer that rte_flow_conv_item_spec() has just
populated to point at the deep-copied trailing data (rte_flow_item_raw.pattern,
rte_flow_item_flex.pattern, rte_flow_item_geneve_opt.data). Because the masked
range covers the whole fixed struct, the loop ANDs the bytes of that pointer
with the mask's corresponding bytes (typically a NULL mask pointer), zeroing
or garbling it.

The converted item's pattern/data pointer is clobbered while the copied
payload it should reference is left unreachable. A consumer that follows
conv->pattern then dereferences NULL or a corrupt address. Plain value items
(eth, ipv4, ...) are unaffected; only the deep-copy item types break, which
is exactly what the test does not exercise.

Suggested fix: do not blind-mask the entire fixed struct for items that carry
an embedded pointer / desc_fn deep copy. Either skip masking when
rte_flow_desc_item[type].desc_fn != NULL, or mask only the leading plain-data
region and leave the pointer field (and trailing copied bytes) intact.
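
A minimal sketch of the second option, assuming a made-up item layout.
struct demo_item and demo_mask_plain_prefix() are hypothetical and only
stand in for a deep-copy item such as RAW; they are not the real ethdev
structures. The mask is applied up to, but not including, the embedded
pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an item whose fixed struct ends in a pointer. */
struct demo_item {
	uint32_t flags;
	uint16_t length;
	const uint8_t *pattern; /* points at deep-copied trailing data */
};

/*
 * AND the first plain_len bytes of spec with mask, byte by byte.
 * Bytes at or beyond plain_len (here: the pointer) are left untouched.
 */
static void demo_mask_plain_prefix(void *spec, const void *mask,
				   size_t plain_len)
{
	uint8_t *s = spec;
	const uint8_t *m = mask;
	size_t i;

	assert(spec != NULL && mask != NULL);
	for (i = 0; i < plain_len; i++)
		s[i] &= m[i];
}
```

A caller would pass offsetof(struct demo_item, pattern) as plain_len, so
a NULL pattern pointer in the mask can no longer zero the pointer in the
converted spec.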

Warning: the new test validates only an ETH pattern, so the RAW/FLEX/GENEVE_OPT
path above is untested. A RAW item case would have surfaced the pointer
corruption.

Info: the Doxygen block for RTE_FLOW_CONV_OP_PATTERN_MASKED uses @p mask,
@p spec, @p last, but those are item fields, not parameters of the op; the
neighboring enum entries only document the @p src / @p dst types.


* Re: [EXTERNAL] [PATCH] net/octeontx/base: fix out-of-bounds read in DQ range lookup
From: Jerin Jacob @ 2026-06-08 15:37 UTC (permalink / raw)
  To: Sergei Iashin, Harman Kalra, Santosh Shukla
  Cc: dev@dpdk.org, stable@dpdk.org, jerin.jacob@caviumnetworks.com
In-Reply-To: <20260407113001.1217481-1-yashin.sergey@gmail.com>



Applied to dpdk-next-net-mrvl/for-main. Thanks

________________________________
From: Sergei Iashin <yashin.sergey@gmail.com>
Sent: Tuesday, April 7, 2026 5:00 PM
To: Harman Kalra <hkalra@marvell.com>; Santosh Shukla <santosh.shukla@caviumnetworks.com>; Jerin Jacob <jerinj@marvell.com>
Cc: dev@dpdk.org <dev@dpdk.org>; stable@dpdk.org <stable@dpdk.org>; jerin.jacob@caviumnetworks.com <jerin.jacob@caviumnetworks.com>; Sergei Iashin <yashin.sergey@gmail.com>
Subject: [EXTERNAL] [PATCH] net/octeontx/base: fix out-of-bounds read in DQ range lookup

In octeontx_pko_dq_range_lookup(), the inner while loop evaluates the
array access ctl->dq_map[dq].chanid before the bounds check
dq < RTE_DIM(ctl->dq_map). When dq is incremented to 256 inside the
loop, the next iteration reads one element past the end of the
256-element dq_map array before the bounds condition can short-circuit.

Swap the two conjuncts so the bounds check is evaluated first, matching
the pattern already used in the outer loop.
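
As an illustration of why the order of the conjuncts matters, here is a
minimal sketch with a plain array. demo_run_length() is a hypothetical
helper, not the driver function; it relies on && evaluating its left
operand first and skipping the right one when the left is false.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many consecutive entries starting at 'start' equal 'want'.
 * The index is checked before the element is read, so map[len] is never
 * accessed, even when the run extends to the last element.
 */
static size_t demo_run_length(const uint64_t *map, size_t len,
			      size_t start, uint64_t want)
{
	size_t i = start;

	assert(map != NULL);
	while (i < len && map[i] == want)
		i++;
	return i - start;
}
```

With the operands reversed, the final iteration would read map[len]
before the index test had a chance to stop the loop.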

Fixes: cad78ca23818 ("net/octeontx/base: add base PKO operations")
Cc: jerin.jacob@caviumnetworks.com
Cc: stable@dpdk.org

Signed-off-by: Sergei Iashin <yashin.sergey@gmail.com>
---
 drivers/net/octeontx/base/octeontx_pkovf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/octeontx/base/octeontx_pkovf.c b/drivers/net/octeontx/base/octeontx_pkovf.c
index 7aec84a813..5326fe24b9 100644
--- a/drivers/net/octeontx/base/octeontx_pkovf.c
+++ b/drivers/net/octeontx/base/octeontx_pkovf.c
@@ -196,8 +196,8 @@ octeontx_pko_dq_range_lookup(struct octeontx_pko_vf_ctl_s *ctl, uint64_t chanid,
        while (dq < RTE_DIM(ctl->dq_map)) {
                dq_base = dq;
                dq_cnt = 0;
-               while (ctl->dq_map[dq].chanid == ~chanid &&
-                       dq < RTE_DIM(ctl->dq_map)) {
+               while (dq < RTE_DIM(ctl->dq_map) &&
+                       ctl->dq_map[dq].chanid == ~chanid) {
                        dq_cnt++;
                        if (dq_cnt == dq_num)
                                return dq_base;
--
2.39.5




* Re: [PATCH] net/nfp: fix null dereference in flower ctrl NFD3 Tx
From: Stephen Hemminger @ 2026-06-08 15:30 UTC (permalink / raw)
  To: Denis Sergeev; +Cc: dev, chaoyong.he, stable, sdl.dpdk
In-Reply-To: <20260603055211.120315-1-denserg.edu@gmail.com>

On Wed,  3 Jun 2026 08:51:56 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:

> In nfp_flower_ctrl_vnic_nfd3_xmit(), when txq is NULL, goto xmit_end
> leads to unconditional dereference of txq->qcp_q in nfp_qcp_ptr_add().
> The same goto from the "no free descriptors" path incorrectly increments
> the hardware write pointer despite no descriptor being written.
> 
> Replace both gotos with early return, removing the unused xmit_end label.
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Fixes: a36634e87e16 ("net/nfp: add flower ctrl VNIC Rx/Tx")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>

Applied to next-net


* Re: [EXTERNAL] [PATCH] net/octeontx: fix buffer overflow in device name formatting
From: Jerin Jacob @ 2026-06-08 15:29 UTC (permalink / raw)
  To: Sergei Iashin, Harman Kalra, Santosh Shukla
  Cc: dev@dpdk.org, stable@dpdk.org, jerin.jacob@caviumnetworks.com
In-Reply-To: <20260407075732.1175609-1-yashin.sergey@gmail.com>



Applied to dpdk-next-net-mrvl/for-main. Thanks


________________________________
From: Sergei Iashin <yashin.sergey@gmail.com>
Sent: Tuesday, April 7, 2026 1:27 PM
To: Harman Kalra <hkalra@marvell.com>; Jerin Jacob <jerinj@marvell.com>; Santosh Shukla <santosh.shukla@caviumnetworks.com>
Cc: dev@dpdk.org <dev@dpdk.org>; stable@dpdk.org <stable@dpdk.org>; jerin.jacob@caviumnetworks.com <jerin.jacob@caviumnetworks.com>; Sergei Iashin <yashin.sergey@gmail.com>
Subject: [EXTERNAL] [PATCH] net/octeontx: fix buffer overflow in device name formatting

Replace sprintf with snprintf when formatting into the fixed-size
octtx_name buffer in octeontx_create and octeontx_remove. The device
name can be up to 63 bytes (RTE_DEV_NAME_MAX_LEN) while the buffer
is only 32 bytes (OCTEONTX_MAX_NAME_LEN), which may cause a stack
buffer overflow with a long user-provided --vdev name.
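
A minimal sketch of the bounded formatting, with an extra truncation
check that this patch itself does not add. demo_format_port_name() is a
hypothetical helper written for illustration only; snprintf() never
writes more than dst_len bytes and returns the length the full string
would have needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<base>_<port>" into dst without overflowing it.
 * Returns 0 on success, -1 if the output was truncated or on error.
 * On truncation dst still holds a NUL-terminated prefix.
 */
static int demo_format_port_name(char *dst, size_t dst_len,
				 const char *base, int port)
{
	int n;

	assert(dst != NULL && base != NULL && dst_len > 0);
	n = snprintf(dst, dst_len, "%s_%d", base, port);
	if (n < 0 || (size_t)n >= dst_len)
		return -1;
	return 0;
}
```

Whether a truncated name should be rejected or silently accepted is a
design choice for the driver; the sketch only shows that the condition
can be detected from the return value.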

Fixes: f18b146c498d ("net/octeontx: create ethdev ports")
Cc: stable@dpdk.org

Signed-off-by: Sergei Iashin <yashin.sergey@gmail.com>
---
 drivers/net/octeontx/octeontx_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/octeontx/octeontx_ethdev.c b/drivers/net/octeontx/octeontx_ethdev.c
index 21e3e56901..e4dca30d9d 100644
--- a/drivers/net/octeontx/octeontx_ethdev.c
+++ b/drivers/net/octeontx/octeontx_ethdev.c
@@ -1555,7 +1555,7 @@ octeontx_create(struct rte_vdev_device *dev, int port, uint8_t evdev,

        PMD_INIT_FUNC_TRACE();

-       sprintf(octtx_name, "%s_%d", name, port);
+       snprintf(octtx_name, sizeof(octtx_name), "%s_%d", name, port);
        if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
                eth_dev = rte_eth_dev_attach_secondary(octtx_name);
                if (eth_dev == NULL)
@@ -1711,7 +1711,7 @@ octeontx_remove(struct rte_vdev_device *dev)
                return -EINVAL;

        for (i = 0; i < OCTEONTX_VDEV_DEFAULT_MAX_NR_PORT; i++) {
-               sprintf(octtx_name, "eth_octeontx_%d", i);
+               snprintf(octtx_name, sizeof(octtx_name), "eth_octeontx_%d", i);

                eth_dev = rte_eth_dev_allocated(octtx_name);
                if (eth_dev == NULL)
--
2.39.5




* Re: [PATCH v2] net/ark: fix unsafe env variable in extension loading
From: Stephen Hemminger @ 2026-06-08 15:24 UTC (permalink / raw)
  To: Denis Sergeev
  Cc: dev, shepard.siegel, ed.czeck, john.miller, stable, sdl.dpdk
In-Reply-To: <20260603053313.119342-1-denserg.edu@gmail.com>

On Wed,  3 Jun 2026 08:32:45 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:

> The ARK_EXT_PATH environment variable is passed to dlopen without
> verifying process privileges. In a setuid/setgid scenario, this
> could allow loading an arbitrary shared library with elevated
> privileges.
> 
> Add a check that effective user/group IDs match real IDs before
> trusting the environment variable, consistent with the same
> protection already present in the mlx5 driver.
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Fixes: 727b3fe292bc ("net/ark: integrate PMD")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>

Thanks for the report, but it makes no sense.
DPDK already loads shared libraries via the -d command-line argument
without checking, and running a DPDK application as setuid would be
completely unsafe. The startup is not hardened in any way.

NAK

That said, it would be good if DPDK had some security documentation
about what the trust boundary is and what capabilities are needed.


* Re: [PATCH 1/1] ml/cnxk: support for 64-bit int type in metadata
From: Jerin Jacob @ 2026-06-08 15:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev
In-Reply-To: <20260331085529.1105898-1-syalavarthi@marvell.com>

On Tue, Mar 31, 2026 at 6:41 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Added support for 64-bit integer data type in model metadata.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>

Applied to dpdk-next-net-mrvl/for-main. Thanks


* Re: [PATCH 1/1] ml/cnxk: enable data caching for all MRVL layers
From: Jerin Jacob @ 2026-06-08 15:15 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, Anup Prabhu
In-Reply-To: <20260331085350.1105103-1-syalavarthi@marvell.com>

On Tue, Mar 31, 2026 at 2:30 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> From: Anup Prabhu <aprabhu@marvell.com>
>
> Enabled data caching for all MRVL layers in TVM models.
>
> Signed-off-by: Anup Prabhu <aprabhu@marvell.com>


Applied to dpdk-next-net-mrvl/for-main. Thanks

> ---
>  drivers/ml/cnxk/cn10k_ml_ops.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
> index b30af7c7a44..628ff963c3c 100644
> --- a/drivers/ml/cnxk/cn10k_ml_ops.c
> +++ b/drivers/ml/cnxk/cn10k_ml_ops.c
> @@ -997,13 +997,8 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
>         if (ret < 0) {
>                 cn10k_ml_layer_stop(device, model_id, layer_name);
>         } else {
> -               if (cn10k_mldev->cache_model_data) {
> -                       if ((model->type == ML_CNXK_MODEL_TYPE_GLOW &&
> -                            model->subtype == ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL) ||
> -                           (model->type == ML_CNXK_MODEL_TYPE_TVM &&
> -                            model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
> -                               ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
> -               }
> +               if (cn10k_mldev->cache_model_data)
> +                       ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
>         }
>
>         return ret;
> --
> 2.47.0
>


* Re: [PATCH 1/1] ml/cnxk: avoid overwriting layer name during load
From: Jerin Jacob @ 2026-06-08 15:14 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev
In-Reply-To: <20260331085445.1105590-1-syalavarthi@marvell.com>

On Tue, Mar 31, 2026 at 7:10 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Layer name is initialized during metadata fetch and
> parsing stage. Avoid overwriting the layer name during
> layer load.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>


Please change the title to "ml/cnxk: fix ..." and add a Fixes: tag.

> ---
>  drivers/ml/cnxk/cn10k_ml_ops.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
> index 628ff963c3c..77947120f25 100644
> --- a/drivers/ml/cnxk/cn10k_ml_ops.c
> +++ b/drivers/ml/cnxk/cn10k_ml_ops.c
> @@ -671,9 +671,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
>         rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
>         cn10k_ml_model_metadata_update(&layer->glow.metadata);
>
> -       /* Set layer name */
> -       rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
> -
>         /* Enable support for batch_size of 256 */
>         if (layer->glow.metadata.model.batch_size == 0)
>                 layer->batch_size = 256;
> --
> 2.34.1
>


* Re: [PATCH v2] common/cnxk: allow typecasting to CN20K NPA structures
From: Jerin Jacob @ 2026-06-08 15:13 UTC (permalink / raw)
  To: Nawal Kishor
  Cc: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, jerinj, asekhar
In-Reply-To: <20260325062114.2595888-1-nkishor@marvell.com>

On Wed, Mar 25, 2026 at 12:10 PM Nawal Kishor <nkishor@marvell.com> wrote:
>
> Add __attribute__((may_alias)) to the CN20K-specific NPA structures
> (npa_cn20k_aura_s, npa_cn20k_pool_s, and npa_cn20k_halo_s) to allow
> safe type punning when casting between these structures and their
> base types (npa_aura_s and npa_pool_s).
>
> This attribute tells the compiler that these structures may alias
> with other types, which is necessary when casting pointers between
> compatible hardware register structures that share the same memory
> layout. Without this attribute, such casts violate strict aliasing
> rules and can lead to incorrect compiler optimizations.
>
> Signed-off-by: Nawal Kishor <nkishor@marvell.com>

Applied to dpdk-next-net-mrvl/for-main. Thanks


* [PATCH 3/3] net/iavf: fix event handler refcount leak on HW reset
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus, stable
In-Reply-To: <20260608145518.1705524-1-ciara.loftus@intel.com>

Currently, when handling a hardware reset, the uninit path skips
releasing the event handler reference while in_reset_recovery is set,
to prevent premature teardown of the event handler thread. However, the
subsequent re-init call unconditionally increments the reference count,
inflating ndev on every reset cycle. On the final device removal, the
count never reaches zero and the event handler thread is never joined.

Fix it by also skipping the event handler reference acquisition during
reset recovery, matching the symmetric skip in the uninit path so the
count stays stable across each reset cycle.
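
The imbalance and the fix can be modelled with a plain counter. This is
a minimal sketch with hypothetical names (struct demo_handler and the
demo_* functions); it models only the reference count, not the event
handler thread or the real driver state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared handler: only its reference count. */
struct demo_handler {
	unsigned int ndev;
};

/* Take a reference on init, except while recovering from a reset. */
static void demo_dev_init(struct demo_handler *h, bool in_reset_recovery)
{
	assert(h != NULL);
	if (!in_reset_recovery)
		h->ndev++;
}

/* Drop a reference on uninit, except while recovering from a reset. */
static void demo_dev_uninit(struct demo_handler *h, bool in_reset_recovery)
{
	assert(h != NULL);
	if (!in_reset_recovery && h->ndev > 0)
		h->ndev--;
}

/* One reset cycle: uninit followed by re-init, both in recovery mode. */
static void demo_reset_cycle(struct demo_handler *h)
{
	demo_dev_uninit(h, true);
	demo_dev_init(h, true);
}
```

Because both sides skip the count during recovery, any number of reset
cycles leaves ndev unchanged, and the final uninit brings it back to
zero.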

Fixes: 3e6a5d2d310a ("net/iavf: add devargs to enable VF auto-reset")
Cc: stable@dpdk.org

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/iavf/iavf_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
index a38132e80e..ec1ad02826 100644
--- a/drivers/net/intel/iavf/iavf_ethdev.c
+++ b/drivers/net/intel/iavf/iavf_ethdev.c
@@ -3031,7 +3031,7 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
 	adapter->tpid = RTE_ETHER_TYPE_VLAN; /* VLAN TPID set to 0x8100 by default */
 	rte_spinlock_init(&adapter->phc_sync_lock);
 
-	if (iavf_dev_event_handler_init())
+	if (!vf->in_reset_recovery && iavf_dev_event_handler_init())
 		goto init_vf_err;
 
 	if (iavf_init_vf(eth_dev) != 0) {
-- 
2.43.0



* [PATCH 2/3] net/iavf: wait for PF reset start before reinitializing
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus, stable, Talluri Chaitanyababu
In-Reply-To: <20260608145518.1705524-1-ciara.loftus@intel.com>

Commit 1428895ad417 ("net/iavf: fix disabling of promiscuous modes on
close") added a synchronous VIRTCHNL round-trip on the close path
before the reset request is sent. This delays the reset just long
enough that `IAVF_VFGEN_RSTAT` still reads `VIRTCHNL_VFR_VFACTIVE`
when the re-init path polls it for reset completion. The driver
interprets this as the reset being complete, when in fact it has not
yet started, and proceeds to issue VIRTCHNL commands before the PF
has disabled the VF mailbox.

Fix by polling `IAVF_VF_ARQLEN1.ARQENABLE` immediately after the reset
request and before shutting down the admin queue, when the close is
triggered by a reset. The PF clears this bit as its first reset action,
providing an unambiguous signal that the reset is in progress.
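
The general shape of such a wait is a bounded poll for a bit to clear.
The sketch below is illustrative only: the register accessor, the bit
value and all demo_* names are assumptions, and a real driver would
read the hardware register and delay between polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ENABLE_BIT 0x80000000u /* hypothetical bit position */

typedef uint32_t (*demo_reg_read_fn)(void *ctx);

/* Fake register: reports the bit as set for a fixed number of reads. */
struct demo_fake_reg {
	unsigned int reads_until_clear;
};

static uint32_t demo_fake_read(void *ctx)
{
	struct demo_fake_reg *r = ctx;

	if (r->reads_until_clear == 0)
		return 0;
	r->reads_until_clear--;
	return DEMO_ENABLE_BIT;
}

/*
 * Poll until (reg & bit_mask) == 0 or max_polls reads have been made.
 * Returns true if the bit was seen clear, false on timeout.
 */
static bool demo_wait_bit_clear(demo_reg_read_fn read_reg, void *ctx,
				uint32_t bit_mask, unsigned int max_polls)
{
	unsigned int i;

	assert(read_reg != NULL);
	for (i = 0; i < max_polls; i++) {
		if ((read_reg(ctx) & bit_mask) == 0)
			return true;
		/* A real driver would sleep or delay here between polls. */
	}
	return false;
}
```

The bound matters: if the PF never starts the reset, the caller gets a
timeout instead of spinning forever.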

Fixes: 1428895ad417 ("net/iavf: fix disabling of promiscuous modes on close")
Cc: stable@dpdk.org

Reported-by: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/iavf/iavf.h        |  1 +
 drivers/net/intel/iavf/iavf_ethdev.c | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/intel/iavf/iavf.h b/drivers/net/intel/iavf/iavf.h
index 2615b6f034..4444602a30 100644
--- a/drivers/net/intel/iavf/iavf.h
+++ b/drivers/net/intel/iavf/iavf.h
@@ -291,6 +291,7 @@ struct iavf_info {
 	struct rte_eth_dev *eth_dev;
 
 	bool in_reset_recovery;
+	bool reset_pending;
 
 	uint32_t ptp_caps;
 	rte_spinlock_t phc_time_aq_lock;
diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
index a8031e23a5..a38132e80e 100644
--- a/drivers/net/intel/iavf/iavf_ethdev.c
+++ b/drivers/net/intel/iavf/iavf_ethdev.c
@@ -106,6 +106,7 @@ static int iavf_dev_start(struct rte_eth_dev *dev);
 static int iavf_dev_stop(struct rte_eth_dev *dev);
 static int iavf_dev_close(struct rte_eth_dev *dev);
 static int iavf_dev_reset(struct rte_eth_dev *dev);
+static bool iavf_is_reset_detected(struct iavf_adapter *adapter);
 static int iavf_dev_info_get(struct rte_eth_dev *dev,
 			     struct rte_eth_dev_info *dev_info);
 static const uint32_t *iavf_dev_supported_ptypes_get(struct rte_eth_dev *dev,
@@ -3196,6 +3197,14 @@ iavf_dev_close(struct rte_eth_dev *dev)
 	iavf_flow_uninit(adapter);
 
 	iavf_vf_reset(hw);
+	/*
+	 * If a reset is pending, wait for the PF to disable the VF's admin
+	 * receive queue (its first reset action) before we shut it down
+	 * ourselves.  This ensures iavf_check_vf_reset_done() does not see
+	 * a stale VFACTIVE value on the re-init path.
+	 */
+	if (vf->reset_pending)
+		iavf_is_reset_detected(adapter);
 	vf->aq_intr_enabled = false;
 	iavf_shutdown_adminq(hw);
 	if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR) {
@@ -3273,6 +3282,7 @@ iavf_dev_reset(struct rte_eth_dev *dev)
 	struct iavf_adapter *adapter =
 		IAVF_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 	struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 	/*
 	 * Check whether the VF reset has been done and inform application,
 	 * to avoid calling the virtual channel command, which may cause
@@ -3285,8 +3295,10 @@ iavf_dev_reset(struct rte_eth_dev *dev)
 	}
 	iavf_set_no_poll(adapter, false);
 
+	vf->reset_pending = true;
 	PMD_DRV_LOG(DEBUG, "Start dev_reset ...");
 	ret = iavf_dev_uninit(dev);
+	vf->reset_pending = false;
 	if (ret)
 		return ret;
 
-- 
2.43.0



* [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
  To: dev; +Cc: Talluri Chaitanyababu
In-Reply-To: <20260608145518.1705524-1-ciara.loftus@intel.com>

From: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>

After admin queue reinitialisation, completions from uninitialised
ARQ ring descriptor memory may arrive before any real PF response.
These carry opcode 0 (`VIRTCHNL_OP_UNKNOWN`) and trigger a WARNING
log on every poll iteration, flooding the log during reset recovery.

Treat opcode 0 as a distinct case and log it at DEBUG level, while
retaining WARNING for genuine opcode mismatches.

Signed-off-by: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
---
 drivers/net/intel/iavf/iavf_vchnl.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/intel/iavf/iavf_vchnl.c b/drivers/net/intel/iavf/iavf_vchnl.c
index 94ccfb5d6e..cd90d35023 100644
--- a/drivers/net/intel/iavf/iavf_vchnl.c
+++ b/drivers/net/intel/iavf/iavf_vchnl.c
@@ -299,8 +299,15 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
 		/* async reply msg on command issued by vf previously */
 		result = IAVF_MSG_CMD;
 		if (opcode != vf->pend_cmd) {
-			PMD_DRV_LOG(WARNING, "command mismatch, expect %u, get %u",
-					vf->pend_cmd, opcode);
+			if (opcode == VIRTCHNL_OP_UNKNOWN)
+				PMD_DRV_LOG(DEBUG,
+					    "Spurious msg with opcode 0, pending cmd %u",
+					    vf->pend_cmd);
+			else
+				PMD_DRV_LOG(WARNING,
+					    "command mismatch, expect %u, get %u",
+					    vf->pend_cmd, opcode);
+
 			result = IAVF_MSG_ERR;
 		}
 	}
-- 
2.43.0



* [PATCH 0/3] net/iavf: vf reset fixes
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus

The patch [1] aimed to address a race condition in the iavf driver
during a reset and also reduced noisy logging during resets.
Patch 1 of this series extracts the noisy logging fix into its own
commit.
Patch 2 offers an alternative approach to fixing the race condition.
Patch 3 fixes a pre-existing refcount imbalance in the shared event
handler thread that became visible while investigating the reset path.

[1] https://patches.dpdk.org/project/dpdk/patch/20260605123646.1328492-1-chaitanyababux.talluri@intel.com/

Ciara Loftus (2):
  net/iavf: wait for PF reset start before reinitializing
  net/iavf: fix event handler refcount leak on HW reset

Talluri Chaitanyababu (1):
  net/iavf: downgrade opcode 0 ARQ log to debug

 drivers/net/intel/iavf/iavf.h        |  1 +
 drivers/net/intel/iavf/iavf_ethdev.c | 14 +++++++++++++-
 drivers/net/intel/iavf/iavf_vchnl.c  | 11 +++++++++--
 3 files changed, 23 insertions(+), 3 deletions(-)

-- 
2.43.0



* Re: [PATCH v4] ethdev: support inline calculating masked item value
From: Dariusz Sosnowski @ 2026-06-08 14:49 UTC (permalink / raw)
  To: Bing Zhao, orika
  Cc: viacheslavo, dev, rasland, stephen, suanmingm, matan, thomas
In-Reply-To: <20260603092805.9837-1-bingz@nvidia.com>

On Wed, Jun 03, 2026 at 12:28:05PM +0300, Bing Zhao wrote:
> In the asynchronous API definition and some drivers, the
> rte_flow_item spec value may not be calculated by the driver due to the
> reason of speed of light rule insertion rate and sometimes the input
> parameters will be copied and changed internally.
> 
> After copying, the spec and last will be protected by the keyword
> const and cannot be changed in the code itself. And also the driver
> needs some extra memory to do the calculation and extra conditions
> to understand the length of each item spec. This is not efficient.
> 
> To solve the issue and support usage of the following fix, a new OP
> was introduced to calculate the spec and last values after applying
> the mask inline.
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>

Ori has some technical issues with plain text emails on his side.
On his behalf:

Acked-by: Ori Kam <orika@nvidia.com>

Best regards,
Dariusz Sosnowski


* RE: [PATCH v2] net/mlx5: fix counter TAILQ race between free and query callback
From: Dariusz Sosnowski @ 2026-06-08 14:11 UTC (permalink / raw)
  To: Linhu Li, dev@dpdk.org; +Cc: stable@dpdk.org
In-Reply-To: <20260608132555.31439-1-lilinhu618@gmail.com>



> -----Original Message-----
> From: Linhu Li <lilinhu618@gmail.com>
> Sent: Monday, June 8, 2026 3:26 PM
> To: dev@dpdk.org
> Cc: stable@dpdk.org; Dariusz Sosnowski <dsosnowski@nvidia.com>; Linhu Li
> <lilinhu618@gmail.com>
> Subject: [PATCH v2] net/mlx5: fix counter TAILQ race between free and query
> callback
> 
> flow_dv_counter_free() inserts counters into
> pool->counters[pool->query_gen] under pool->csl. Meanwhile,
> mlx5_flow_async_pool_query_handle() moves counters from
> pool->counters[query_gen ^ 1] to the global free list via
> TAILQ_CONCAT while holding only cmng->csl, not pool->csl.
> 
> The comment in flow_dv_counter_free() claims the lock is not needed
> because the query callback and the release function operate on
> different lists. That holds only if the free path always observes the
> up-to-date query_gen. It can be violated:
> 
> 1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
>    pool->query_gen == 0 and is about to insert into counters[0].
> 2. The free thread is preempted by the OS scheduler; it is a regular
>    pthread, not pinned to a core.
> 3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
>    query is sent.
> 4. Hardware completes the query and the callback runs TAILQ_CONCAT on
>    counters[0] (= query_gen ^ 1).
> 5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
>    concurrently with step 4 on another core.
> 
> Because the two paths take different locks, TAILQ_INSERT_TAIL and
> TAILQ_CONCAT run concurrently on the same list with no synchronization
> and corrupt it: the pool-local list ends up with a NULL head but a
> dangling tqh_last, and the global free list tail no longer points to
> the real tail. The just-freed counter and every counter inserted
> afterwards become unreachable and are leaked.
> 
> Non-PMD threads can be preempted for hundreds of microseconds under
> CPU pressure, which is well within the async query round-trip time, so the
> window is reachable in practice.
> 
> Fix it by taking pool->csl in the query completion callback before
> operating on pool->counters[query_gen], serializing the CONCAT with any
> concurrent INSERT. The lock is taken once per pool per query completion
> in the eal-intr-thread context, not on the datapath, so the cost is
> negligible. Lock order is pool->csl then cmng->csl, matching all other
> sites.
> 
> Also handle the error path: previously the counters accumulated in
> pool->counters[query_gen] were abandoned when a query failed. Move
> them back to the global free list to avoid a leak on persistent query
> failures.
> 
> Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Linhu Li <lilinhu618@gmail.com>

Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
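
For readers following the race description quoted above, here is a minimal
sketch of the locking pattern the fix describes, not the actual mlx5 code.
All structure and function names (struct pool, struct mng, counter_free,
query_done, list_len) are hypothetical stand-ins, and pthread mutexes are
assumed in place of whatever lock type the driver uses. The free path takes
the generation it observed as an explicit argument so that a stale value can
be modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct counter {
	TAILQ_ENTRY(counter) next;
};
TAILQ_HEAD(counter_list, counter);

struct pool {
	pthread_mutex_t csl;            /* protects counters[] */
	unsigned int query_gen;
	struct counter_list counters[2];
};

struct mng {
	pthread_mutex_t csl;            /* protects free_list */
	struct counter_list free_list;
};

static void
pool_init(struct pool *p)
{
	pthread_mutex_init(&p->csl, NULL);
	p->query_gen = 0;
	TAILQ_INIT(&p->counters[0]);
	TAILQ_INIT(&p->counters[1]);
}

static void
mng_init(struct mng *m)
{
	pthread_mutex_init(&m->csl, NULL);
	TAILQ_INIT(&m->free_list);
}

/* Free path: 'gen' is the generation the caller observed; it may be stale. */
static void
counter_free(struct pool *p, struct counter *c, unsigned int gen)
{
	pthread_mutex_lock(&p->csl);
	TAILQ_INSERT_TAIL(&p->counters[gen & 1], c, next);
	pthread_mutex_unlock(&p->csl);
}

/*
 * Query completion: move one pool-local list to the global free list.
 * Taking pool->csl first (then mng->csl) serializes the CONCAT with any
 * concurrent INSERT on the same pool-local list.
 */
static void
query_done(struct mng *m, struct pool *p, unsigned int done_gen)
{
	pthread_mutex_lock(&p->csl);
	pthread_mutex_lock(&m->csl);
	TAILQ_CONCAT(&m->free_list, &p->counters[done_gen & 1], next);
	pthread_mutex_unlock(&m->csl);
	pthread_mutex_unlock(&p->csl);
}

/* Count list elements; only used to check list consistency. */
static size_t
list_len(struct counter_list *l)
{
	struct counter *c;
	size_t n = 0;

	TAILQ_FOREACH(c, l, next)
		n++;
	return n;
}
```

With both paths holding the pool lock, a late insert that used a stale
generation lands either before the CONCAT (and is moved to the free list) or
after it (and stays on the pool-local list until the next completion). In
both orders the list heads and tails stay consistent.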

^ permalink raw reply

* DPDK Tech Board meeting minutes 27-May-2026
From: Konstantin Ananyev @ 2026-06-08 14:06 UTC (permalink / raw)
  To: dev@dpdk.org; +Cc: techboard@dpdk.org


Members Attending
=================

Aaron Conole
Bruce Richardson
Jerin Jacob Kollanukkaran
Kevin Traynor
Konstantin Ananyev (chair)
Maxime Coquelin
Morten Brørup
Stephen Hemminger
Thomas Monjalon

NOTE
====
The Technical Board meetings take place every second Wednesday at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and previous minutes:
http://core.dpdk.org/techboard/minutes
The next meeting will follow the regular schedule.

1. DPDK Vulnerability Management - request for more engineers (Maxime, Thomas)
---------------------------------------------------------------------------------------------------------------
    - 15 unprocessed CVEs in the backlog
    - One of the current DPDK security maintainers is no longer active
    - The DPDK security group needs more people to cope with existing and new CVEs
    - Options considered:
        - Intel and Marvell will look for internal resources
        - Try to reach universities that specialize in that topic
        - Hire research interns for that role:
          AR to the current TB representative in the DPDK GB:
          bring this problem to the DPDK GB's attention and request funding

2. Excessive usage of __rte_always_inline (Stephen)
---------------------------------------------------------------------
The ``__rte_always_inline`` attribute forces the compiler to inline a function regardless of its size or call-graph heuristics.
Excessive use of forced inlining can hurt performance by inflating function bodies, increasing register pressure,
and overriding profile-guided optimization.
In most cases the preferred approach is plain ``inline`` (or no annotation at all for static functions), letting the compiler decide.
Modern compilers at ``-O2`` make good inlining decisions for small ``static inline`` functions.
New uses of ``__rte_always_inline`` must be properly justified by the submitter of the patch.
Stephen to submit a new patch for the DPDK coding guidelines to address this:
https://patchwork.dpdk.org/project/dpdk/patch/20260601172104.311909-1-stephen@networkplumber.org/
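
As an illustration of the two styles discussed in item 2, here is a minimal
sketch. The macro ``demo_always_inline`` and both helper functions are
hypothetical; the macro uses the GCC/Clang ``always_inline`` function
attribute as a stand-in for forced inlining, and DPDK's actual definition of
``__rte_always_inline`` is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a forced-inline macro (GCC/Clang attribute). */
#define demo_always_inline inline __attribute__((always_inline))

/* Preferred in most cases: plain static inline, the compiler decides. */
static inline uint32_t
ring_next(uint32_t idx, uint32_t mask)
{
	return (idx + 1) & mask;
}

/* Forced inlining: the compiler must inline this at every call site. */
static demo_always_inline uint32_t
ring_next_forced(uint32_t idx, uint32_t mask)
{
	return (idx + 1) & mask;
}
```

Both helpers compute the same value; the difference is only in how much
freedom the compiler keeps. For a function this small the plain
``static inline`` form is normally inlined at ``-O2`` anyway, which is why
the forced form would need a separate justification.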


^ permalink raw reply

* [PATCH v5 5/5] eal: avoid deadlock in async IPC alarm callback
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
  To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>

async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().

rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.

Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a noop.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)
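
The reasoning in the commit message above (a callback that fires without a
pending request is a no-op, so the alarm never needs cancelling) can be
sketched as follows. This is a simplified model under stated assumptions,
not the EAL code: the fixed-size ID table, add_pending(), reply_arrived()
and timeout_cb() are all hypothetical names, and no real alarm is armed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQ 8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long pending_ids[MAX_REQ];	/* 0 marks an empty slot */
static unsigned int handled;			/* requests actually handled */

/* Register a request ID; returns 0 on success, -1 if the table is full. */
static int
add_pending(unsigned long id)
{
	int ret = -1;
	size_t i;

	pthread_mutex_lock(&lock);
	for (i = 0; i < MAX_REQ; i++) {
		if (pending_ids[i] == 0) {
			pending_ids[i] = id;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&lock);
	return ret;
}

/* Caller must hold the lock. Returns 1 if the ID was pending and removed. */
static int
remove_pending_locked(unsigned long id)
{
	size_t i;

	for (i = 0; i < MAX_REQ; i++) {
		if (pending_ids[i] == id) {
			pending_ids[i] = 0;
			return 1;
		}
	}
	return 0;
}

/* Reply path: handles the request under the lock, cancels nothing. */
static void
reply_arrived(unsigned long id)
{
	pthread_mutex_lock(&lock);
	if (remove_pending_locked(id))
		handled++;
	pthread_mutex_unlock(&lock);
}

/* Timeout callback: the ID travels in the void * argument via uintptr_t. */
static void
timeout_cb(void *arg)
{
	unsigned long id = (uintptr_t)arg;

	pthread_mutex_lock(&lock);
	if (remove_pending_locked(id))
		handled++;	/* request timed out before any reply */
	/* else: stale alarm, the request is already gone, nothing to do */
	pthread_mutex_unlock(&lock);
}
```

Because neither path waits for the other while holding the lock, there is no
spin-wait that could deadlock; a late timeout simply finds no matching ID
and returns.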

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index ddcaa2f20b..908e86f6b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle,
-			(void *)(uintptr_t)req->id) < 0) {
-		/* if we failed to cancel the alarm because it's already in
-		 * progress, don't proceed because otherwise we will end up
-		 * handling the same message twice.
-		 */
-		if (rte_errno == EINPROGRESS) {
-			EAL_LOG(DEBUG, "Request handling is already in progress");
-			goto no_trigger;
-		}
-		EAL_LOG(ERR, "Failed to cancel alarm");
-	}
-
 	if (action == ACTION_TRIGGER)
 		return req;
 no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 		return -1;
 	}
 
-	/* Set alarm before allocating or sending so request timeout tracking
-	 * is active as soon as this request ID is reserved.
+	/* Set alarm before allocating or sending. The alarm is never cancelled:
+	 * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+	 * which deadlocks if we hold pending_requests.lock while the callback
+	 * is blocked on it. Instead, let stale alarms fire; with ID-based
+	 * lookup the callback will simply not find the request and return
+	 * harmlessly.
 	 */
 	id = ++next_request_id;
 	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
@@ -1273,9 +1264,10 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	}
 
 	/*
-	 * On partial failure, roll back all queued requests in this batch while
-	 * holding pending_requests.lock. Any alarm callback that runs later for
-	 * these removed IDs will not find a pending request and will return.
+	 * On partial failure, roll back all queued requests. We hold the lock
+	 * so no one else touches the queue. All requests in this batch share
+	 * the same param pointer. Stale alarms will fire and harmlessly find
+	 * nothing via ID-based lookup.
 	 */
 	if (ret != 0 && reply->nb_sent > 0) {
 		struct pending_request *r, *next;
-- 
2.47.3


^ permalink raw reply related

* [PATCH v5 4/5] eal: fix async IPC resource leaks on partial failure
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
  To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>

When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.

On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.

Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.

Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 112 +++++++++++++++++++++++--------
 1 file changed, 84 insertions(+), 28 deletions(-)
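
The rollback described in the commit message above can be sketched in
isolation as follows. This is a simplified model, not the patched function:
the reduced struct pending_request, queue_request(), rollback_batch() and
find_request_by_id() are hypothetical helpers, locking is omitted, and a
plain pointer stands in for the shared async parameter.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Reduced, hypothetical version of the pending request record. */
struct pending_request {
	TAILQ_ENTRY(pending_request) next;
	unsigned long id;
	void *param;		/* shared by all requests of one batch */
};
TAILQ_HEAD(pending_request_list, pending_request);

static struct pending_request_list requests =
	TAILQ_HEAD_INITIALIZER(requests);
static unsigned long next_request_id;

/* Queue one request for the given batch; returns NULL on allocation failure. */
static struct pending_request *
queue_request(void *param)
{
	struct pending_request *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return NULL;
	r->id = ++next_request_id;
	r->param = param;
	TAILQ_INSERT_TAIL(&requests, r, next);
	return r;
}

/*
 * Remove and free every queued request that belongs to the batch.
 * The successor is read before the current element is freed, so removal
 * during iteration is safe. Returns the number of removed requests.
 */
static unsigned int
rollback_batch(void *param)
{
	struct pending_request *r, *next;
	unsigned int removed = 0;

	for (r = TAILQ_FIRST(&requests); r != NULL; r = next) {
		next = TAILQ_NEXT(r, next);
		if (r->param == param) {
			TAILQ_REMOVE(&requests, r, next);
			free(r);
			removed++;
		}
	}
	return removed;
}

/* ID-based lookup: a callback for a rolled-back ID finds nothing. */
static struct pending_request *
find_request_by_id(unsigned long id)
{
	struct pending_request *r;

	TAILQ_FOREACH(r, &requests, next) {
		if (r->id == id)
			return r;
	}
	return NULL;
}
```

Only after such a rollback is it safe to free the shared parameter, since no
queued request references it any more. Looking requests up by numeric ID
rather than by pointer is what keeps a late callback from touching freed
memory.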

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 0dd25bef8b..ddcaa2f20b 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
 
 struct pending_request {
 	TAILQ_ENTRY(pending_request) next;
+	unsigned long id;
 	enum {
 		REQUEST_TYPE_SYNC,
 		REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
 	};
 };
 
+static unsigned long next_request_id;
+
 TAILQ_HEAD(pending_request_list, pending_request);
 
 static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 static void
 async_reply_handle(void *arg);
 
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
 
 static void
 trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
 	return r;
 }
 
+static struct pending_request *
+find_async_request_by_id(unsigned long id)
+{
+	struct pending_request *r;
+
+	TAILQ_FOREACH(r, &pending_requests.requests, next) {
+		if (r->id == id && r->type == REQUEST_TYPE_ASYNC)
+			return r;
+	}
+
+	return NULL;
+}
+
 /*
  * Combine prefix and name(optional) to return unix domain socket path
  * return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
 }
 
 static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
 {
-	struct pending_request *req = (struct pending_request *)arg;
 	enum async_action action;
 	struct timespec ts_now;
 
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
 
 	TAILQ_REMOVE(&pending_requests.requests, req, next);
 
-	if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+	if (rte_eal_alarm_cancel(async_reply_handle,
+			(void *)(uintptr_t)req->id) < 0) {
 		/* if we failed to cancel the alarm because it's already in
 		 * progress, don't proceed because otherwise we will end up
 		 * handling the same message twice.
@@ -557,9 +573,13 @@ static void
 async_reply_handle(void *arg)
 {
 	struct pending_request *req;
+	/* alarm arg carries the request ID packed into a void * via uintptr_t */
+	unsigned long id = (uintptr_t)arg;
 
 	pthread_mutex_lock(&pending_requests.lock);
-	req = async_reply_handle_thread_unsafe(arg);
+	req = find_async_request_by_id(id);
+	if (req != NULL)
+		req = async_reply_handle_thread_unsafe(req);
 	pthread_mutex_unlock(&pending_requests.lock);
 
 	if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 {
 	struct rte_mp_msg *reply_msg;
 	struct pending_request *pending_req, *exist;
-	int ret = -1;
+	unsigned long id;
+	int ret;
+
+	/* queue already locked by caller */
+
+	exist = find_pending_request(dst, req->name);
+	if (exist) {
+		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+		rte_errno = EEXIST;
+		return -1;
+	}
+
+	/* Set alarm before allocating or sending so request timeout tracking
+	 * is active as soon as this request ID is reserved.
+	 */
+	id = ++next_request_id;
+	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+			async_reply_handle,
+			(void *)(uintptr_t)id) < 0) {
+		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+			dst, req->name);
+		return -1;
+	}
 
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 
 	pending_req->type = REQUEST_TYPE_ASYNC;
+	pending_req->id = id;
 	strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
 	pending_req->request = req;
 	pending_req->reply = reply_msg;
 	pending_req->async.param = param;
 
-	/* queue already locked by caller */
-
-	exist = find_pending_request(dst, req->name);
-	if (exist) {
-		EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
-		rte_errno = EEXIST;
-		ret = -1;
-		goto fail;
-	}
-
 	ret = send_msg(dst, req, MP_REQ);
 	if (ret < 0) {
 		EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	}
 	param->user_reply.nb_sent++;
 
-	/* if alarm set fails, we simply ignore the reply */
-	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
-			      async_reply_handle, pending_req) < 0) {
-		EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
-			dst, req->name);
-		ret = -1;
-		goto fail;
-	}
 	TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
 
 	return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	 * it, and put it on the queue if we don't send any requests.
 	 */
 	dummy->type = REQUEST_TYPE_ASYNC;
+	dummy->id = ++next_request_id;
 	dummy->request = copy;
 	dummy->reply = NULL;
 	dummy->async.param = param;
@@ -1194,8 +1220,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
-
-			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+			if (rte_eal_alarm_set(1, async_reply_handle,
+					(void *)(uintptr_t)dummy->id) < 0) {
 				EAL_LOG(ERR, "Fail to set alarm for dummy request");
 				/* roll back the changes */
 				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1245,6 +1271,30 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
+
+	/*
+	 * On partial failure, roll back all queued requests in this batch while
+	 * holding pending_requests.lock. Any alarm callback that runs later for
+	 * these removed IDs will not find a pending request and will return.
+	 */
+	if (ret != 0 && reply->nb_sent > 0) {
+		struct pending_request *r, *next;
+
+		for (r = TAILQ_FIRST(&pending_requests.requests);
+				r != NULL; r = next) {
+			next = TAILQ_NEXT(r, next);
+			if (r->type == REQUEST_TYPE_ASYNC &&
+					r->async.param == param) {
+				TAILQ_REMOVE(&pending_requests.requests,
+						r, next);
+				free(r->reply);
+				/* r->request == copy, freed below after the loop */
+				free(r);
+			}
+		}
+		reply->nb_sent = 0;
+	}
+
 	/* if we didn't send anything, put dummy request on the queue
 	 * and set a minimum-delay alarm so the callback fires immediately.
 	 */
@@ -1252,7 +1302,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
 
-		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+		if (rte_eal_alarm_set(1, async_reply_handle,
+				(void *)(uintptr_t)dummy->id) < 0) {
 			EAL_LOG(ERR, "Fail to set alarm for dummy request");
 			/* roll back the changes */
 			TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1274,6 +1325,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	/* if dummy was unused, free it */
 	if (!dummy_used)
 		free(dummy);
+	/* if nothing was sent, nobody owns copy/param */
+	if (ret != 0) {
+		free(param);
+		free(copy);
+	}
 
 	return ret;
 closedir_fail:
-- 
2.47.3


^ permalink raw reply related

* [PATCH v5 3/5] eal: fix memory leak in async IPC secondary path
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
  To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>

When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.

Free dummy before returning on the success path when it was not inserted
into the queue.

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 2a99162a21..0dd25bef8b 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1210,6 +1210,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		/* if we couldn't send anything, clean up */
 		if (ret != 0)
 			goto fail;
+		if (!dummy_used)
+			free(dummy);
 		return 0;
 	}
 
-- 
2.47.3


^ permalink raw reply related

* [PATCH v5 2/5] eal: fix async IPC callback not fired when no peers
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
  To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>

Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.

The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.

This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.

Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.

Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..2a99162a21 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,22 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
 
-		/* if we didn't send anything, put dummy request on the queue */
+		/* if we didn't send anything, put dummy request on the queue
+		 * and set a minimum-delay alarm so the callback fires immediately.
+		 */
 		if (ret == 0 && reply->nb_sent == 0) {
 			TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
 					next);
 			dummy_used = true;
+
+			if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+				EAL_LOG(ERR, "Fail to set alarm for dummy request");
+				/* roll back the changes */
+				TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+				dummy_used = false;
+				ret = -1;
+				goto unlock_fail;
+			}
 		}
 
 		pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1243,21 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 		} else if (mp_request_async(path, copy, param, ts))
 			ret = -1;
 	}
-	/* if we didn't send anything, put dummy request on the queue */
+	/* if we didn't send anything, put dummy request on the queue
+	 * and set a minimum-delay alarm so the callback fires immediately.
+	 */
 	if (ret == 0 && reply->nb_sent == 0) {
 		TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
 		dummy_used = true;
+
+		if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+			EAL_LOG(ERR, "Fail to set alarm for dummy request");
+			/* roll back the changes */
+			TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+			dummy_used = false;
+			ret = -1;
+			goto closedir_fail;
+		}
 	}
 
 	/* finally, unlock the queue */
-- 
2.47.3


^ permalink raw reply related

* [PATCH v5 1/5] eal: fix wrong log message in async IPC request
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
  To: dev, Jianfeng Tan
In-Reply-To: <740b39c5098b4d40cafb9881ad70865a3c889012.1773936429.git.anatoly.burakov@intel.com>

The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.

Fix the log to say "async request".

Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/eal/common/eal_common_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
 	pending_req = calloc(1, sizeof(*pending_req));
 	reply_msg = calloc(1, sizeof(*reply_msg));
 	if (pending_req == NULL || reply_msg == NULL) {
-		EAL_LOG(ERR, "Could not allocate space for sync request");
+		EAL_LOG(ERR, "Could not allocate space for async request");
 		rte_errno = ENOMEM;
 		ret = -1;
 		goto fail;
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH] doc: move firmware instructions in mlx5 guide
From: Dariusz Sosnowski @ 2026-06-08 12:46 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad
In-Reply-To: <20260608120531.1037367-1-thomas@monjalon.net>

On Mon, Jun 08, 2026 at 02:05:31PM +0200, Thomas Monjalon wrote:
> Having firmware update instructions before firmware config
> looks simpler to find than in compilation prerequisites.
> 
> A link is also added after listing minimum firmware versions.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

^ permalink raw reply

