RE: [PATCH v2] net/null: Add fast mbuf release TX offload

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
	"Ivan Malov" <ivan.malov@arknetworks.am>
Cc: <dev@dpdk.org>, "Tetsuya Mukawa" <mtetsuyah@gmail.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Vipin Varghese" <Vipin.Varghese@amd.com>,
	"Thiyagarjan P" <Thiyagarajan.P@amd.com>
Subject: RE: [PATCH v2] net/null: Add fast mbuf release TX offload
Date: Mon, 28 Jul 2025 18:42:21 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FDCE@smartserver.smartshare.dk> (raw)
In-Reply-To: <b9f113ebc01f46cc87f7cd0ff7502add@huawei.com>

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Monday, 28 July 2025 17.42
> 
> > > Hi Morten,
> > >
> > > Good patch. Please see below.
> > >
> > > On Sat, 26 Jul 2025, Morten Brørup wrote:
> > >
> > > > Added fast mbuf release, re-using the existing mbuf pool pointer
> > > > in the queue structure.
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> > > > v2:
> > > > * Also announce the offload as a per-queue capability.
> > > > * Added missing test of per-device offload configuration when
> > > configuring
> > > >  the queue.
> > > > ---
> > > > drivers/net/null/rte_eth_null.c | 33 ++++++++++++++++++++++++++++++---
> > > > 1 file changed, 30 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/null/rte_eth_null.c
> > > b/drivers/net/null/rte_eth_null.c
> > > > index 8a9b74a03b..09cfc74494 100644
> > > > --- a/drivers/net/null/rte_eth_null.c
> > > > +++ b/drivers/net/null/rte_eth_null.c
> > > > @@ -34,6 +34,17 @@ struct pmd_internals;
> > > > struct null_queue {
> > > > 	struct pmd_internals *internals;
> > > >
> > > > +	/**
> > > > +	 * For RX queue:
> > > > +	 *  Mempool to allocate mbufs from.
> > > > +	 *
> > > > +	 * For TX queue:
> > >
> > > Perhaps spell it 'Rx', 'Tx', but this is up to you.
> >
> > I just checked, and it seems all three spellings "rx", "Rx" and "RX" are
> being used in DPDK.
> > I personally prefer RX, so I'll keep that.
> >
> > >
> > > > +	 *  Mempool to free mbufs to, if fast release of mbufs is
> enabled.
> > > > +	 *  UINTPTR_MAX if the mempool for fast release of mbufs has not
> > > yet been detected.
> > > > +	 *  NULL if fast release of mbufs is not enabled.
> > > > +	 *
> > > > +	 *  @see RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> > > > +	 */
> > >
> > > May be it would be better to have a separate 'tx_pkt_burst' callback, to
> > > avoid
> > > conditional checks below. Though, I understand this will downgrade the
> > > per-queue
> > > capability to the per-port only, so feel free to disregard this point.
> >
> > I considered this, and I can imagine an application using FAST_FREE for its
> primary queues, and normal free for some secondary
> > queues.
> > Looking at other drivers - which have implemented a runtime check, not
> separate callbacks for FAST_FREE - I guess they came to the
> > same conclusion.
> >
> > >
> > > > 	struct rte_mempool *mb_pool;
> > > > 	void *dummy_packet;
> > > >
> > > > @@ -151,7 +162,16 @@ eth_null_tx(void *q, struct rte_mbuf **bufs,
> > > uint16_t nb_bufs)
> > > > 	for (i = 0; i < nb_bufs; i++)
> > > > 		bytes += rte_pktmbuf_pkt_len(bufs[i]);
> > > >
> > > > -	rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > > +	if (h->mb_pool != NULL) { /* RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE */
> > > > +		if (unlikely(h->mb_pool == (void *)UINTPTR_MAX)) {
> > > > +			if (unlikely(nb_bufs == 0))
> > > > +				return 0; /* Do not dereference uninitialized
> > > bufs[0]. */
> > > > +			h->mb_pool = bufs[0]->pool;
> > > > +		}
> > > > +		rte_mbuf_raw_free_bulk(h->mb_pool, bufs, nb_bufs);
> > > > +	} else {
> > > > +		rte_pktmbuf_free_bulk(bufs, nb_bufs);
> > > > +	}
> > > > 	rte_atomic_fetch_add_explicit(&h->tx_pkts, nb_bufs,
> > > rte_memory_order_relaxed);
> > > > 	rte_atomic_fetch_add_explicit(&h->tx_bytes, bytes,
> > > rte_memory_order_relaxed);
> > > >
> > > > @@ -259,7 +279,7 @@ static int
> > > > eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> > > > 		uint16_t nb_tx_desc __rte_unused,
> > > > 		unsigned int socket_id __rte_unused,
> > > > -		const struct rte_eth_txconf *tx_conf __rte_unused)
> > > > +		const struct rte_eth_txconf *tx_conf)
> > > > {
> > > > 	struct rte_mbuf *dummy_packet;
> > > > 	struct pmd_internals *internals;
> > > > @@ -284,6 +304,10 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
> > > uint16_t tx_queue_id,
> > > >
> > > > 	internals->tx_null_queues[tx_queue_id].internals = internals;
> > > > 	internals->tx_null_queues[tx_queue_id].dummy_packet =
> > > dummy_packet;
> > > > +	internals->tx_null_queues[tx_queue_id].mb_pool =
> > > > +			(dev->data->dev_conf.txmode.offloads | tx_conf-
> > > >offloads) &
> > > > +			RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE ?
> > > > +			(void *)UINTPTR_MAX : NULL;
> > >
> > > Given the fact that FAST_FREE and MULTI_SEGS are effectively
> > > conflicting,
> > > wouldn't it be better to have a check for the presence of both flags
> > > here? I'm
> > > not sure whether this is already checked at the generic layer above the
> > > PMD.
> >
> > Interesting thought - got me looking deeper into this.
> >
> > It seems MULTI_SEGS is primarily a capability flag.
> > The description of the MULTI_SEGS flag [1] could be interpreted that way
> too:
> > /** Device supports multi segment send. */
> >
> > [1]:
> https://elixir.bootlin.com/dpdk/v25.07/source/lib/ethdev/rte_ethdev.h#L1614
> 
> In fact, I believe it serves both purposes: report capabilities and request
> for offloads to enable.
> Few example, I believe request this offload:
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ip_fragmentation/main.c
> #L156
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ipsec-secgw/ipsec-
> secgw.c#L1985
> https://elixir.bootlin.com/dpdk/v25.07/source/examples/ip_reassembly/main.c#L1
> 77

Another flag with unclear description. :-(

> 
> >
> > E.g. the i40e driver offers MULTI_SEGS capability per-device, but not per-
> queue. And it doesn't use the MULTI_SEGS flag for any
> > purpose (beyond capability reporting).
> >
> > Furthermore, enabling MULTI_SEGS on TX (per device or per queue) wouldn't
> mean that all transmitted packets are segmented; it
> > only means that the driver must be able to handle segmented packets.
> 
> Yep.
> 
> > I.e. MULTI_SEGS could be enabled on a device, and yet it would be acceptable
> to enable FAST_FREE on a queue on that device.
> 
> In theory yes... you probably can have one TX queue with FAST_FREE (no multi-
> seg packets) and another TX queue serving mulit-seg packets.
> Again, probably some drivers can even support both offloads on the same TX
> queue,
> as long as conditions for the FAST_FREE offload are still satisfied: single
> mempool, refcnt==1, no indirect mbufs, etc.
> Though in practice, using MULTI_SEG usually implies usage all these mbuf
> features that are not-compatible with FAST_FREE.
> BTW,  I see many of DPDK examples - do use both FAST_FREE and MULTI_SEG.
> TBH - I don't understand how it works together, from my memories - for many
> cases it just shouldn't.

Agree: Using FAST_FREE and MULTI_SEG together shouldn't work.
But if a driver (e.g. i40e) doesn't support configuring MULTI_SEG per queue, only per device, it would be impossible to configure FAST_FREE for one queue and MULTI_SEG for another queue.
Cleaning up this mess would break applications that assume MULTI_SEG is for capability only, and thus don't set it when configuring the device or queue. And applications that configure MULTI_SEG on a device and FAST_FREE on a queue.

Slightly related: I suspect that FAST_FREE might not be implemented 100 % correctly in all drivers, so I submitted a patch [2] to verify that FAST_FREE'ed mbufs conform to the required state of mbufs held in the mbuf pool. (No specific drivers in mind, just a weak hunch.)
Any drivers implementing FAST_FREE should use rte_mbuf_raw_free_bulk() instead of rte_mempool_put_bulk() to benefit from this patch, especially when conformance testing the driver.

[2]: https://inbox.dpdk.org/dev/20250722093431.555214-1-mb@smartsharesystems.com/

> 
> >
> > >
> > > Thank you.
> >
> > Thank you for reviewing.
> >
> > >
> > > >
> > > > 	return 0;
> > > > }
> > > > @@ -309,7 +333,10 @@ eth_dev_info(struct rte_eth_dev *dev,
> > > > 	dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
> > > > 	dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
> > > > 	dev_info->min_rx_bufsize = 0;
> > > > -	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
> > > RTE_ETH_TX_OFFLOAD_MT_LOCKFREE;
> > > > +	dev_info->tx_queue_offload_capa =
> > > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
> > > > +	dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
> > > > +			RTE_ETH_TX_OFFLOAD_MT_LOCKFREE |
> > > > +			dev_info->tx_queue_offload_capa;
> > > >
> > > > 	dev_info->reta_size = internals->reta_size;
> > > > 	dev_info->flow_type_rss_offloads = internals-
> > > >flow_type_rss_offloads;
> > > > --
> > > > 2.43.0
> > > >
> > > >

next prev parent reply	other threads:[~2025-07-28 16:42 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-24 18:14 [PATCH] net/null: Add fast mbuf release TX offload Morten Brørup
2025-06-26 14:05 ` Stephen Hemminger
2025-06-26 15:44   ` Morten Brørup
2025-06-27 12:07     ` Varghese, Vipin
2025-07-26  4:34       ` Morten Brørup
2025-07-28  8:22         ` Varghese, Vipin
2025-07-26  4:48 ` [PATCH v2] " Morten Brørup
2025-07-26  6:15   ` Ivan Malov
2025-07-28 13:27     ` Morten Brørup
2025-07-28 13:51       ` Ivan Malov
2025-07-28 15:42       ` Konstantin Ananyev
2025-07-28 16:42         ` Morten Brørup [this message]
2025-07-30 13:50 ` [PATCH v3] " Morten Brørup
2025-07-30 14:04 ` [PATCH v4] " Morten Brørup
2025-08-04  9:23   ` Varghese, Vipin
2025-08-04 15:39   ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9FDCE@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=Thiyagarajan.P@amd.com \
    --cc=Vipin.Varghese@amd.com \
    --cc=dev@dpdk.org \
    --cc=ivan.malov@arknetworks.am \
    --cc=konstantin.ananyev@huawei.com \
    --cc=mtetsuyah@gmail.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.