* [PATCH net-next] idpf: do not linearize big TSO packets
@ 2025-08-18 19:59 Eric Dumazet
  2025-08-20  4:27 ` Hay, Joshua A
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Dumazet @ 2025-08-18 19:59 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet, Tony Nguyen,
	Przemek Kitszel, Jacob Keller, Madhu Chittim, Pavan Kumar Linga,
	Joshua Hay, Brian Vazquez, Willem de Bruijn, Andrew Lunn

idpf has a limit on the number of scatter-gather frags
that can be used per segment.

Currently, idpf_tx_start() checks if the limit is hit
and forces a linearization of the whole packet.

This requires high-order allocations that can fail
under memory pressure. A full-size BIG-TCP packet
would require an order-7 allocation on x86_64 :/
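(For scale: BIG TCP raises the GSO size limit to roughly 512 KiB,
and linearizing 512 KiB of payload means finding 512 KiB / 4 KiB =
128 = 2^7 contiguous pages, hence order 7.)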

We can move this check earlier, into idpf_features_check(),
and force GSO for such TSO packets, removing the cost of
a big copy.

Any linearization that still happens will then be done
on sizes smaller than one MSS.

__idpf_chk_linearize() is renamed to idpf_chk_tso_segment()
and moved to idpf_lib.c.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Cc: Joshua Hay <joshua.a.hay@intel.com>
Cc: Brian Vazquez <brianvv@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
---
 drivers/net/ethernet/intel/idpf/idpf.h      |   2 +
 drivers/net/ethernet/intel/idpf/idpf_lib.c  | 102 +++++++++++++++-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c | 129 ++++----------------
 3 files changed, 120 insertions(+), 113 deletions(-)
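
For illustration only (not part of the change): the invariant that
idpf_chk_tso_segment() enforces can be modelled in plain userspace C.
The sketch below is a simplification under its own assumptions: it
generalizes the check to a sliding window of max_bufs - 2 frags per
would-be segment, and it omits the frag-offset alignment and one-byte
carry handling done by the real function; needs_sw_gso() is a made-up
name, not a driver helper.

#include <stdbool.h>
#include <stdio.h>

/* Return true when a TSO skb with these frag sizes would need software
 * segmentation: if any window of (max_bufs - 2) consecutive frags carries
 * less than one gso_size of payload, a single hardware segment could need
 * more than max_bufs descriptors (1 for the header, 1 for the first
 * payload chunk, plus the frags).
 */
static bool needs_sw_gso(const unsigned int *frag_size, int nr_frags,
			 int max_bufs, unsigned int gso_size)
{
	int win = max_bufs - 2;
	unsigned long sum = 0;

	if (nr_frags < max_bufs - 1)	/* trivially fits in max_bufs */
		return false;

	for (int i = 0; i < nr_frags; i++) {
		sum += frag_size[i];			/* add newest frag */
		if (i >= win)
			sum -= frag_size[i - win];	/* drop stale frag */
		if (i >= win - 1 && sum < gso_size)
			return true;	/* window made too little progress */
	}
	return false;
}

int main(void)
{
	/* ten 128-byte frags: each 6-frag window sums to 768 < 1448 */
	unsigned int tiny[10] = { 128, 128, 128, 128, 128,
				  128, 128, 128, 128, 128 };
	/* ten 4 KiB frags: every window easily covers one MSS */
	unsigned int big[10] = { 4096, 4096, 4096, 4096, 4096,
				 4096, 4096, 4096, 4096, 4096 };

	printf("tiny -> %d\n", needs_sw_gso(tiny, 10, 8, 1448));  /* 1 */
	printf("big  -> %d\n", needs_sw_gso(big, 10, 8, 1448));   /* 0 */
	return 0;
}

Clearing NETIF_F_GSO_MASK in idpf_features_check() has the same effect
as returning true here: the core stack segments the skb in software
before it reaches idpf_tx_start(), so any skb_linearize() copy that
remains is bounded by about one MSS rather than the full BIG-TCP
payload.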

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index f4c0eaf9bde336ec77c9d61b5480f20fe7929e02..aafbb280c2e7387745bef10bd1fbfead07a3666b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -148,6 +148,7 @@ enum idpf_vport_state {
  * @link_speed_mbps: Link speed in mbps
  * @vport_idx: Relative vport index
  * @max_tx_hdr_size: Max header length hardware can support
+ * @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
  * @state: See enum idpf_vport_state
  * @netstats: Packet and byte stats
  * @stats_lock: Lock to protect stats update
@@ -159,6 +160,7 @@ struct idpf_netdev_priv {
 	u32 link_speed_mbps;
 	u16 vport_idx;
 	u16 max_tx_hdr_size;
+	u16 tx_max_bufs;
 	enum idpf_vport_state state;
 	struct rtnl_link_stats64 netstats;
 	spinlock_t stats_lock;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 2c2a3e85d693013be95b71d2032817fb1e0a63d7..565c521c88cfb39cec31edbc0dc2ca9418fc3f09 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -776,6 +776,7 @@ static int idpf_cfg_netdev(struct idpf_vport *vport)
 	np->vport_idx = vport->idx;
 	np->vport_id = vport->vport_id;
 	np->max_tx_hdr_size = idpf_get_max_tx_hdr_size(adapter);
+	np->tx_max_bufs = idpf_get_max_tx_bufs(adapter);
 
 	spin_lock_init(&np->stats_lock);
 
@@ -2271,6 +2272,92 @@ static int idpf_change_mtu(struct net_device *netdev, int new_mtu)
 	return err;
 }
 
+/**
+ * idpf_chk_tso_segment - Check skb is not using too many buffers
+ * @skb: send buffer
+ * @max_bufs: maximum number of buffers
+ *
+ * For TSO we need to count the TSO header and segment payload separately.  As
+ * such we need to check cases where we have max_bufs-1 fragments or more as we
+ * can potentially require max_bufs+1 DMA transactions, 1 for the TSO header, 1
+ * for the segment payload in the first descriptor, and another max_buf-1 for
+ * the fragments.
+ *
+ * Returns true if the packet needs to be software segmented by core stack.
+ */
+static bool idpf_chk_tso_segment(const struct sk_buff *skb,
+				 unsigned int max_bufs)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	const skb_frag_t *frag, *stale;
+	int nr_frags, sum;
+
+	/* no need to check if number of frags is less than max_bufs - 1 */
+	nr_frags = shinfo->nr_frags;
+	if (nr_frags < (max_bufs - 1))
+		return false;
+
+	/* We need to walk through the list and validate that each group
+	 * of max_bufs-2 fragments totals at least gso_size.
+	 */
+	nr_frags -= max_bufs - 2;
+	frag = &shinfo->frags[0];
+
+	/* Initialize size to the negative value of gso_size minus 1.  We use
+	 * this as the worst case scenario in which the frag ahead of us only
+	 * provides one byte which is why we are limited to max_bufs-2
+	 * descriptors for a single transmit as the header and previous
+	 * fragment are already consuming 2 descriptors.
+	 */
+	sum = 1 - shinfo->gso_size;
+
+	/* Add size of frags 0 through 4 to create our initial sum */
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+
+	/* Walk through fragments adding latest fragment, testing it, and
+	 * then removing stale fragments from the sum.
+	 */
+	for (stale = &shinfo->frags[0];; stale++) {
+		int stale_size = skb_frag_size(stale);
+
+		sum += skb_frag_size(frag++);
+
+		/* The stale fragment may present us with a smaller
+		 * descriptor than the actual fragment size. To account
+		 * for that we need to remove all the data on the front and
+		 * figure out what the remainder would be in the last
+		 * descriptor associated with the fragment.
+		 */
+		if (stale_size > IDPF_TX_MAX_DESC_DATA) {
+			int align_pad = -(skb_frag_off(stale)) &
+					(IDPF_TX_MAX_READ_REQ_SIZE - 1);
+
+			sum -= align_pad;
+			stale_size -= align_pad;
+
+			do {
+				sum -= IDPF_TX_MAX_DESC_DATA_ALIGNED;
+				stale_size -= IDPF_TX_MAX_DESC_DATA_ALIGNED;
+			} while (stale_size > IDPF_TX_MAX_DESC_DATA);
+		}
+
+		/* if sum is negative we failed to make sufficient progress */
+		if (sum < 0)
+			return true;
+
+		if (!nr_frags--)
+			break;
+
+		sum -= stale_size;
+	}
+
+	return false;
+}
+
 /**
  * idpf_features_check - Validate packet conforms to limits
  * @skb: skb buffer
@@ -2292,12 +2379,15 @@ static netdev_features_t idpf_features_check(struct sk_buff *skb,
 	if (skb->ip_summed != CHECKSUM_PARTIAL)
 		return features;
 
-	/* We cannot support GSO if the MSS is going to be less than
-	 * 88 bytes. If it is then we need to drop support for GSO.
-	 */
-	if (skb_is_gso(skb) &&
-	    (skb_shinfo(skb)->gso_size < IDPF_TX_TSO_MIN_MSS))
-		features &= ~NETIF_F_GSO_MASK;
+	if (skb_is_gso(skb)) {
+		/* We cannot support GSO if the MSS is going to be less than
+		 * 88 bytes. If it is then we need to drop support for GSO.
+		 */
+		if (skb_shinfo(skb)->gso_size < IDPF_TX_TSO_MIN_MSS)
+			features &= ~NETIF_F_GSO_MASK;
+		else if (idpf_chk_tso_segment(skb, np->tx_max_bufs))
+			features &= ~NETIF_F_GSO_MASK;
+	}
 
 	/* Ensure MACLEN is <= 126 bytes (63 words) and not an odd size */
 	len = skb_network_offset(skb);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 66a1b040639d1110519ca8fd57c2bdf0fdb2d822..b868761fad4d7ec8be414a35dc21640425abf09b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -16,8 +16,28 @@ struct idpf_tx_stash {
 #define idpf_tx_buf_compl_tag(buf)	(*(u32 *)&(buf)->priv)
 LIBETH_SQE_CHECK_PRIV(u32);
 
-static bool idpf_chk_linearize(struct sk_buff *skb, unsigned int max_bufs,
-			       unsigned int count);
+/**
+ * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
+ * @skb: send buffer
+ * @max_bufs: maximum scatter gather buffers for single packet
+ * @count: number of buffers this packet needs
+ *
+ * Make sure we don't exceed maximum scatter gather buffers for a single
+ * packet.
+ * TSO case has been handled earlier from idpf_features_check().
+ */
+static bool idpf_chk_linearize(const struct sk_buff *skb,
+			       unsigned int max_bufs,
+			       unsigned int count)
+{
+	if (likely(count <= max_bufs))
+		return false;
+
+	if (skb_is_gso(skb))
+		return false;
+
+	return true;
+}
 
 /**
  * idpf_buf_lifo_push - push a buffer pointer onto stack
@@ -2627,111 +2647,6 @@ int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off)
 	return 1;
 }
 
-/**
- * __idpf_chk_linearize - Check skb is not using too many buffers
- * @skb: send buffer
- * @max_bufs: maximum number of buffers
- *
- * For TSO we need to count the TSO header and segment payload separately.  As
- * such we need to check cases where we have max_bufs-1 fragments or more as we
- * can potentially require max_bufs+1 DMA transactions, 1 for the TSO header, 1
- * for the segment payload in the first descriptor, and another max_buf-1 for
- * the fragments.
- */
-static bool __idpf_chk_linearize(struct sk_buff *skb, unsigned int max_bufs)
-{
-	const struct skb_shared_info *shinfo = skb_shinfo(skb);
-	const skb_frag_t *frag, *stale;
-	int nr_frags, sum;
-
-	/* no need to check if number of frags is less than max_bufs - 1 */
-	nr_frags = shinfo->nr_frags;
-	if (nr_frags < (max_bufs - 1))
-		return false;
-
-	/* We need to walk through the list and validate that each group
-	 * of max_bufs-2 fragments totals at least gso_size.
-	 */
-	nr_frags -= max_bufs - 2;
-	frag = &shinfo->frags[0];
-
-	/* Initialize size to the negative value of gso_size minus 1.  We use
-	 * this as the worst case scenario in which the frag ahead of us only
-	 * provides one byte which is why we are limited to max_bufs-2
-	 * descriptors for a single transmit as the header and previous
-	 * fragment are already consuming 2 descriptors.
-	 */
-	sum = 1 - shinfo->gso_size;
-
-	/* Add size of frags 0 through 4 to create our initial sum */
-	sum += skb_frag_size(frag++);
-	sum += skb_frag_size(frag++);
-	sum += skb_frag_size(frag++);
-	sum += skb_frag_size(frag++);
-	sum += skb_frag_size(frag++);
-
-	/* Walk through fragments adding latest fragment, testing it, and
-	 * then removing stale fragments from the sum.
-	 */
-	for (stale = &shinfo->frags[0];; stale++) {
-		int stale_size = skb_frag_size(stale);
-
-		sum += skb_frag_size(frag++);
-
-		/* The stale fragment may present us with a smaller
-		 * descriptor than the actual fragment size. To account
-		 * for that we need to remove all the data on the front and
-		 * figure out what the remainder would be in the last
-		 * descriptor associated with the fragment.
-		 */
-		if (stale_size > IDPF_TX_MAX_DESC_DATA) {
-			int align_pad = -(skb_frag_off(stale)) &
-					(IDPF_TX_MAX_READ_REQ_SIZE - 1);
-
-			sum -= align_pad;
-			stale_size -= align_pad;
-
-			do {
-				sum -= IDPF_TX_MAX_DESC_DATA_ALIGNED;
-				stale_size -= IDPF_TX_MAX_DESC_DATA_ALIGNED;
-			} while (stale_size > IDPF_TX_MAX_DESC_DATA);
-		}
-
-		/* if sum is negative we failed to make sufficient progress */
-		if (sum < 0)
-			return true;
-
-		if (!nr_frags--)
-			break;
-
-		sum -= stale_size;
-	}
-
-	return false;
-}
-
-/**
- * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
- * @skb: send buffer
- * @max_bufs: maximum scatter gather buffers for single packet
- * @count: number of buffers this packet needs
- *
- * Make sure we don't exceed maximum scatter gather buffers for a single
- * packet. We have to do some special checking around the boundary (max_bufs-1)
- * if TSO is on since we need count the TSO header and payload separately.
- * E.g.: a packet with 7 fragments can require 9 DMA transactions; 1 for TSO
- * header, 1 for segment payload, and then 7 for the fragments.
- */
-static bool idpf_chk_linearize(struct sk_buff *skb, unsigned int max_bufs,
-			       unsigned int count)
-{
-	if (likely(count < max_bufs))
-		return false;
-	if (skb_is_gso(skb))
-		return __idpf_chk_linearize(skb, max_bufs);
-
-	return count > max_bufs;
-}
 
 /**
  * idpf_tx_splitq_get_ctx_desc - grab next desc and update buffer ring
-- 
2.51.0.rc1.167.g924127e9c0-goog



* RE: [PATCH net-next] idpf: do not linearize big TSO packets
  2025-08-18 19:59 [PATCH net-next] idpf: do not linearize big TSO packets Eric Dumazet
@ 2025-08-20  4:27 ` Hay, Joshua A
  2025-08-21 14:14   ` Brian Vazquez
  2025-08-22 16:02 ` Tony Nguyen
  2025-08-22 17:40 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 5+ messages in thread
From: Hay, Joshua A @ 2025-08-20  4:27 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Keller, Jacob E,
	Chittim, Madhu, Linga, Pavan Kumar, Brian Vazquez,
	Willem de Bruijn, Andrew Lunn

> From: Eric Dumazet <edumazet@google.com>
> 
> idpf has a limit on the number of scatter-gather frags
> that can be used per segment.
> 
> Currently, idpf_tx_start() checks if the limit is hit
> and forces a linearization of the whole packet.
> 
> This requires high-order allocations that can fail
> under memory pressure. A full-size BIG-TCP packet
> would require an order-7 allocation on x86_64 :/
> 
> We can move this check earlier, into idpf_features_check(),
> and force GSO for such TSO packets, removing the cost of
> a big copy.
> 
> Any linearization that still happens will then be done
> on sizes smaller than one MSS.
> 
> __idpf_chk_linearize() is renamed to idpf_chk_tso_segment()
> and moved to idpf_lib.c.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Madhu Chittim <madhu.chittim@intel.com>
> Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Cc: Joshua Hay <joshua.a.hay@intel.com>
> Cc: Brian Vazquez <brianvv@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> ---
Reviewed-by: Joshua Hay <joshua.a.hay@intel.com>


* Re: [PATCH net-next] idpf: do not linearize big TSO packets
  2025-08-20  4:27 ` Hay, Joshua A
@ 2025-08-21 14:14   ` Brian Vazquez
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Vazquez @ 2025-08-21 14:14 UTC (permalink / raw)
  To: Hay, Joshua A
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Keller, Jacob E,
	Chittim, Madhu, Linga, Pavan Kumar, Willem de Bruijn, Andrew Lunn

Tested-by: Brian Vazquez <brianvv@google.com>


On Wed, Aug 20, 2025 at 12:27 AM Hay, Joshua A <joshua.a.hay@intel.com> wrote:
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > idpf has a limit on the number of scatter-gather frags
> > that can be used per segment.
> >
> > Currently, idpf_tx_start() checks if the limit is hit
> > and forces a linearization of the whole packet.
> >
> > This requires high-order allocations that can fail
> > under memory pressure. A full-size BIG-TCP packet
> > would require an order-7 allocation on x86_64 :/
> >
> > We can move this check earlier, into idpf_features_check(),
> > and force GSO for such TSO packets, removing the cost of
> > a big copy.
> >
> > Any linearization that still happens will then be done
> > on sizes smaller than one MSS.
> >
> > __idpf_chk_linearize() is renamed to idpf_chk_tso_segment()
> > and moved to idpf_lib.c.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > Cc: Jacob Keller <jacob.e.keller@intel.com>
> > Cc: Madhu Chittim <madhu.chittim@intel.com>
> > Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> > Cc: Joshua Hay <joshua.a.hay@intel.com>
> > Cc: Brian Vazquez <brianvv@google.com>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> > ---
> Reviewed-by: Joshua Hay <joshua.a.hay@intel.com>


* Re: [PATCH net-next] idpf: do not linearize big TSO packets
  2025-08-18 19:59 [PATCH net-next] idpf: do not linearize big TSO packets Eric Dumazet
  2025-08-20  4:27 ` Hay, Joshua A
@ 2025-08-22 16:02 ` Tony Nguyen
  2025-08-22 17:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: Tony Nguyen @ 2025-08-22 16:02 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Przemek Kitszel, Jacob Keller,
	Madhu Chittim, Pavan Kumar Linga, Joshua Hay, Brian Vazquez,
	Willem de Bruijn, Andrew Lunn



On 8/18/2025 12:59 PM, Eric Dumazet wrote:
> idpf has a limit on the number of scatter-gather frags
> that can be used per segment.
> 
> Currently, idpf_tx_start() checks if the limit is hit
> and forces a linearization of the whole packet.
> 
> This requires high-order allocations that can fail
> under memory pressure. A full-size BIG-TCP packet
> would require an order-7 allocation on x86_64 :/
> 
> We can move this check earlier, into idpf_features_check(),
> and force GSO for such TSO packets, removing the cost of
> a big copy.
> 
> Any linearization that still happens will then be done
> on sizes smaller than one MSS.
> 
> __idpf_chk_linearize() is renamed to idpf_chk_tso_segment()
> and moved to idpf_lib.c.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tony Nguyen <anthony.l.nguyen@intel.com>

Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>

> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Madhu Chittim <madhu.chittim@intel.com>
> Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Cc: Joshua Hay <joshua.a.hay@intel.com>
> Cc: Brian Vazquez <brianvv@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>




* Re: [PATCH net-next] idpf: do not linearize big TSO packets
  2025-08-18 19:59 [PATCH net-next] idpf: do not linearize big TSO packets Eric Dumazet
  2025-08-20  4:27 ` Hay, Joshua A
  2025-08-22 16:02 ` Tony Nguyen
@ 2025-08-22 17:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-22 17:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, netdev, eric.dumazet,
	anthony.l.nguyen, przemyslaw.kitszel, jacob.e.keller,
	madhu.chittim, pavan.kumar.linga, joshua.a.hay, brianvv, willemb,
	andrew+netdev

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 18 Aug 2025 19:59:34 +0000 you wrote:
> idpf has a limit on the number of scatter-gather frags
> that can be used per segment.
> 
> Currently, idpf_tx_start() checks if the limit is hit
> and forces a linearization of the whole packet.
> 
> This requires high-order allocations that can fail
> under memory pressure. A full-size BIG-TCP packet
> would require an order-7 allocation on x86_64 :/
> 
> [...]

Here is the summary with links:
  - [net-next] idpf: do not linearize big TSO packets
    https://git.kernel.org/netdev/net-next/c/02614eee26fb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



