Linux RDMA and InfiniBand development
From: Dragos Tatulea <dtatulea@nvidia.com>
To: David Laight <david.laight.linux@gmail.com>,
	Tariq Toukan <tariqt@nvidia.com>
Cc: Christoph Paasch <cpaasch@openai.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Mark Bloch <mbloch@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Amery Hung <ameryhung@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
Date: Fri, 8 May 2026 15:30:40 +0200
Message-ID: <ec8e758a-6f03-43af-afa8-31632f18737e@nvidia.com>
In-Reply-To: <20260508134343.6651d7c6@pumpkin>



On 08.05.26 14:43, David Laight wrote:
> On Thu, 7 May 2026 12:53:29 +0300
> Tariq Toukan <tariqt@nvidia.com> wrote:
> 
>> From: Christoph Paasch <cpaasch@openai.com>
>>
>> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
>> bytes from the page-pool to the skb's linear part. Those 256 bytes
>> include part of the payload.
>>
>> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
>> (and skb->head_frag is not set), we end up aggregating packets in the
>> frag_list.
>>
>> This is of course not good when we are CPU-limited. It also causes a
>> worse skb->len/truesize ratio.
>>
>> So, let's avoid copying parts of the payload to the linear part. We use
>> eth_get_headlen() to parse the headers and compute the length of the
>> protocol headers, which will be used to copy the relevant bits to the
>> skb's linear part.
>>
>> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
>> stack needs to call pskb_may_pull() later on, we don't need to reallocate
>> memory.
>>
>> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
>> LRO enabled):
>>
>> BEFORE:
>> =======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>>  87380  16384 262144    60.01    32547.82
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>>  87380  16384 262144    60.00    52531.67
>>
>> AFTER:
>> ======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>>  87380  16384 262144    60.00    52896.06
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>>  87380  16384 262144    60.00    85094.90
>>
>> Additional tests across a larger range of parameters w/ and w/o LRO, w/
>> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
>> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
>> better performance with this patch.
>>
>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index 75ccf40a7f17..301b33419207 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> ...
>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>>  				pagep->frags++;
>>  			while (++pagep < frag_page);
>>  
>> -			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>> -					skb->data_len);
>> +			headlen = min_t(u16, headlen - len, skb->data_len);
> 
> That looks entirely broken.
> skb->data_len can be larger than 65535 so (u16)skb->data_len can
> discard significant bits.
> 
> I can't quite see why the subtract can't overflow either.
> It is entirely non-obvious.
>
A check will be added for that.
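
To make the two concerns above concrete, here is a small userspace sketch
(not the driver code). MIN_T() is a simplified stand-in for the kernel's
min_t(), and head_copy_len_u16() is a hypothetical helper that only mirrors
the shape of the patched line:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's min_t(): both operands
 * are cast to the named type before they are compared. This is an
 * assumption for illustration, not the kernel macro itself.
 */
#define MIN_T(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Hypothetical helper mirroring the shape of:
 *	headlen = min_t(u16, headlen - len, skb->data_len);
 */
static uint16_t head_copy_len_u16(uint16_t headlen, uint16_t len,
				  unsigned int data_len)
{
	return MIN_T(uint16_t, headlen - len, data_len);
}

/* The two failure modes, plus one ordinary case. */
static void head_copy_len_u16_examples(void)
{
	/* data_len = 65536 + 10: the u16 cast keeps only the low 16 bits. */
	assert(head_copy_len_u16(128, 0, 65546) == 10);

	/* len > headlen: 64 - 100 wraps to 65500 once cast to u16. */
	assert(head_copy_len_u16(64, 100, 60000) == 60000);

	/* Ordinary case: nothing is truncated and nothing wraps. */
	assert(head_copy_len_u16(128, 14, 1000) == 114);
}
```

In the first case the result is 10 instead of 128, and in the second it is
60000 where nothing should be copied at all.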
 
> There seem to be far too many u16 local variables in this code.
> Typically they just make the code larger because they require the
> compiler to mask arithmetic results to 16 bits all the time.
> (Only x86 and m68k have instructions for 8- and 16-bit arithmetic.)
> The same is true for function parameters and results.
> 
> I think all the min_t() in this file can easily be changed to min().
>
I will use min() here. For the rest of the datapath files, I will look
into a follow-up patch.
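
As a rough userspace sketch of the direction, and not the actual follow-up
change: keep the lengths as unsigned int, guard the subtraction, and compare
without a narrowing cast. min_uint() and head_copy_len_checked() are
hypothetical names used only for this illustration; in the kernel the plain
min() macro would take the place of min_uint().

```c
#include <assert.h>

/* Userspace stand-in for min() on two values of the same unsigned type. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: number of bytes to copy into the linear part.
 * Returns 0 when len already covers headlen, so the subtraction below
 * can never wrap around.
 */
static unsigned int head_copy_len_checked(unsigned int headlen,
					  unsigned int len,
					  unsigned int data_len)
{
	if (len >= headlen)
		return 0;

	return min_uint(headlen - len, data_len);
}

static void head_copy_len_checked_examples(void)
{
	/* A data_len above 65535 is no longer truncated. */
	assert(head_copy_len_checked(128, 0, 65546) == 128);

	/* len > headlen yields 0 instead of a wrapped value. */
	assert(head_copy_len_checked(64, 100, 60000) == 0);

	/* Ordinary cases behave as before. */
	assert(head_copy_len_checked(128, 14, 1000) == 114);
	assert(head_copy_len_checked(128, 14, 50) == 50);
}
```

Whether the real check should return 0 or bail out differently depends on
the surrounding driver logic, which this sketch does not model.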

Thanks,
Dragos
