From: Dragos Tatulea <dtatulea@nvidia.com>
To: David Laight <david.laight.linux@gmail.com>,
Tariq Toukan <tariqt@nvidia.com>
Cc: Christoph Paasch <cpaasch@openai.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Saeed Mahameed <saeedm@nvidia.com>,
Mark Bloch <mbloch@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Amery Hung <ameryhung@gmail.com>,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
Date: Fri, 8 May 2026 15:30:40 +0200
Message-ID: <ec8e758a-6f03-43af-afa8-31632f18737e@nvidia.com>
In-Reply-To: <20260508134343.6651d7c6@pumpkin>
On 08.05.26 14:43, David Laight wrote:
> On Thu, 7 May 2026 12:53:29 +0300
> Tariq Toukan <tariqt@nvidia.com> wrote:
>
>> From: Christoph Paasch <cpaasch@openai.com>
>>
>> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
>> bytes from the page-pool to the skb's linear part. Those 256 bytes
>> include part of the payload.
>>
>> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
>> (and skb->head_frag is not set), we end up aggregating packets in the
>> frag_list.
>>
>> This is of course not good when we are CPU-limited. Also causes a worse
>> skb->len/truesize ratio,...
>>
>> So, let's avoid copying parts of the payload to the linear part. We use
>> eth_get_headlen() to parse the headers and compute the length of the
>> protocol headers, which will be used to copy the relevant bits of the
>> skb's linear part.
>>
>> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
>> stack needs to call pskb_may_pull() later on, we don't need to reallocate
>> memory.
>>
>> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
>> LRO enabled):
>>
>> BEFORE:
>> =======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.01 32547.82
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 52531.67
>>
>> AFTER:
>> ======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 52896.06
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 85094.90
>>
>> Additional tests across a larger range of parameters w/ and w/o LRO, w/
>> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
>> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
>> better performance with this patch.
>>
>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index 75ccf40a7f17..301b33419207 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> ...
>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>> pagep->frags++;
>> while (++pagep < frag_page);
>>
>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>> - skb->data_len);
>> + headlen = min_t(u16, headlen - len, skb->data_len);
>
> That looks entirely broken.
> skb->data_len can be larger than 65535 so (u16)skb->data_len can
> discard significant bits.
>
> I can't quite see why the subtract can't overflow either.
> It is entirely non-obvious.
>
A check will be added for that.
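
To make the two concerns concrete, here is a small user-space sketch. It is
not the driver code: min_t() below is a simplified stand-in for the kernel
macro, and headlen_min_t() is a hypothetical helper that only mirrors the
shape of the patched expression.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's min_t(): cast both operands first. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical helper mirroring min_t(u16, headlen - len, skb->data_len). */
static uint16_t headlen_min_t(uint16_t headlen, uint16_t len,
			      unsigned int data_len)
{
	return min_t(uint16_t, headlen - len, data_len);
}

/* Assumed example values, chosen only to trigger each hazard. */
static void show_hazards(void)
{
	/* data_len = 65536 + 100: the u16 cast keeps only the low 16 bits. */
	assert(headlen_min_t(256, 0, 65636) == 100);

	/* len > headlen: -64 becomes 65472 as u16, so data_len wins. */
	assert(headlen_min_t(64, 128, 1000) == 1000);
}
```

In the first case the result is 100 even though 256 bytes of head room were
requested. In the second case the result is 1000, which is larger than the
64-byte headlen that was passed in.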
> There seem to be far too many u16 local variables in this code.
> Typically they just make the code larger because they require the
> compiler to mask arithmetic results to 16 bits all the time.
> (Only x86 and m68k have instructions for 8 and 16bit arithmetic.)
> The same is true for function parameters and results.
>
> I think all the min_t() in this file can easily be changed to min().
>
Will use min() here. For the rest of the datapath files, I will look
into converting min_t() to min() in a follow-up patch.
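
As a rough illustration of the direction only (a user-space sketch, not the
actual next revision): the helper names below are hypothetical, the locals
are assumed to be unsigned int rather than u16, and min_uint() stands in for
the kernel's min() on two operands of the same unsigned type.

```c
#include <assert.h>

/* Stand-in for the kernel's min() on two unsigned int operands. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: guard the subtraction so it cannot wrap, then take
 * the minimum without narrowing either operand to 16 bits.
 */
static unsigned int headlen_checked(unsigned int headlen, unsigned int len,
				    unsigned int data_len)
{
	if (len >= headlen)
		return 0;

	return min_uint(headlen - len, data_len);
}

/* Same assumed example values as one might use to test the u16 version. */
static void show_checked(void)
{
	assert(headlen_checked(256, 0, 65636) == 256);
	assert(headlen_checked(64, 128, 1000) == 0);
}
```

With unsigned int operands, a data_len above 65535 no longer loses its high
bits, and the explicit len >= headlen test makes the no-underflow condition
visible at the call site.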
Thanks,
Dragos