Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

Linux RDMA and InfiniBand development

From: David Laight <david.laight.linux@gmail.com>
To: Tariq Toukan <tariqt@nvidia.com>
Cc: Christoph Paasch <cpaasch@openai.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"Mark Bloch" <mbloch@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, <netdev@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Amery Hung <ameryhung@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
Date: Fri, 8 May 2026 13:43:43 +0100
Message-ID: <20260508134343.6651d7c6@pumpkin>
In-Reply-To: <20260507095330.318892-3-tariqt@nvidia.com>

On Thu, 7 May 2026 12:53:29 +0300
Tariq Toukan <tariqt@nvidia.com> wrote:

> From: Christoph Paasch <cpaasch@openai.com>
> 
> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
> bytes from the page-pool to the skb's linear part. Those 256 bytes
> include part of the payload.
> 
> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
> (and skb->head_frag is not set), we end up aggregating packets in the
> frag_list.
> 
> This is of course not good when we are CPU-limited. Also causes a worse
> skb->len/truesize ratio,...
> 
> So, let's avoid copying parts of the payload to the linear part. We use
> eth_get_headlen() to parse the headers and compute the length of the
> protocol headers, which will be used to copy the relevant bits of the
> skb's linear part.
> 
> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
> stack needs to call pskb_may_pull() later on, we don't need to reallocate
> memory.
> 
> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
> LRO enabled):
> 
> BEFORE:
> =======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.01    32547.82
> 
> (netserver pinned to adjacent core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    52531.67
> 
> AFTER:
> ======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    52896.06
> 
> (netserver pinned to adjacent core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    85094.90
> 
> Additional tests across a larger range of parameters w/ and w/o LRO, w/
> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
> better performance with this patch.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 75ccf40a7f17..301b33419207 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
...
> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>  				pagep->frags++;
>  			while (++pagep < frag_page);
>  
> -			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
> -					skb->data_len);
> +			headlen = min_t(u16, headlen - len, skb->data_len);

That looks entirely broken.
min_t(u16, ...) casts both operands to u16 before comparing, and
skb->data_len can be larger than 65535, so (u16)skb->data_len can
discard significant bits.
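
To make the truncation concrete, here is a minimal userspace sketch. It is
not driver code: the min_t() below is a simplified stand-in for the kernel
macro (it only reproduces the "cast both operands, then compare" behaviour),
and headlen_min_t() / headlen_wide() are hypothetical helper names chosen
for illustration.

```c
#include <stdint.h>

/*
 * Simplified stand-in for the kernel's min_t(): cast BOTH operands to
 * 'type', then compare. Unlike the kernel macro, this version evaluates
 * its arguments more than once, which is fine for this demo.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper: same shape as the expression under review. */
static unsigned int headlen_min_t(unsigned int room, unsigned int data_len)
{
	/* data_len is narrowed to 16 bits before the comparison. */
	return min_t(uint16_t, room, data_len);
}

/* Hypothetical helper: compare in the operands' own (wider) type. */
static unsigned int headlen_wide(unsigned int room, unsigned int data_len)
{
	return room < data_len ? room : data_len;
}
```

With room = 256 and data_len = 65546, the u16 variant compares 256 against
(u16)65546 == 10 and returns 10, while the wide comparison returns 256.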

I can't quite see why the subtract can't overflow either.
It is entirely non-obvious.
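
A short userspace sketch of what goes wrong if the subtraction ever does go
negative. Whether len can exceed headlen in the driver depends on invariants
that are not visible in the quoted hunk; the values below are made up, and
sub_u16() / sub_checked() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Both operands are promoted to int, so headlen - len can be negative.
 * Converting the result to u16 (as min_t(u16, ...) would) wraps it to a
 * large positive value instead.
 */
static uint16_t sub_u16(uint16_t headlen, uint16_t len)
{
	return (uint16_t)(headlen - len);
}

/*
 * One possible way to make the invariant explicit: refuse the
 * subtraction when it would go negative.
 */
static bool sub_checked(unsigned int headlen, unsigned int len,
			unsigned int *out)
{
	if (len > headlen)
		return false;
	*out = headlen - len;
	return true;
}
```

For example, sub_u16(64, 100) yields 65500 rather than -36, whereas
sub_checked(64, 100, &out) reports the problem and leaves out untouched.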

There seem to be far too many u16 local variables in this code.
Typically they just make the code larger because they require the
compiler to mask arithmetic results to 16 bits all the time.
(Only x86 and m68k have instructions for 8-bit and 16-bit arithmetic.)
The same is true for function parameters and results.
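
The masking comes from C semantics rather than from anything specific to
this driver. A small userspace sketch (add_u16() and add_uint() are
hypothetical names) shows the observable effect the compiler has to
preserve whenever a result is stored back into a u16:

```c
#include <stdint.h>

/*
 * a + b is computed in int after integer promotion; returning it as u16
 * keeps only the low 16 bits, so the compiler must truncate the result
 * unless it can prove the sum already fits.
 */
static uint16_t add_u16(uint16_t a, uint16_t b)
{
	return (uint16_t)(a + b);
}

/* With a native-width type no truncation step is needed. */
static unsigned int add_uint(unsigned int a, unsigned int b)
{
	return a + b;
}
```

Here add_u16(40000, 40000) is 14464 (80000 modulo 65536), while
add_uint(40000, 40000) is 80000.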

I think all the min_t() in this file can easily be changed to min().

-- David

>  			__pskb_pull_tail(skb, headlen);
>  		}
>  	} else {


Thread overview: 14+ messages
2026-05-07  9:53 [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing Tariq Toukan
2026-05-07  9:53 ` [PATCH net-next V6 1/3] net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear Tariq Toukan
2026-05-07  9:53 ` [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part Tariq Toukan
2026-05-07 13:53   ` Amery Hung
2026-05-07 15:49     ` Dragos Tatulea
2026-05-07 20:50       ` Amery Hung
2026-05-08  9:15         ` Dragos Tatulea
2026-05-08 17:44           ` Amery Hung
2026-05-08 18:42             ` Dragos Tatulea
2026-05-10  6:50               ` Dragos Tatulea
2026-05-08 12:43   ` David Laight [this message]
2026-05-08 13:30     ` Dragos Tatulea
2026-05-07  9:53 ` [PATCH net-next V6 3/3] net/mlx5e: Align header copy to cache line for Striding RQ non-linear Tariq Toukan
2026-05-07 19:58 ` [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing Christoph Paasch
