From: Dragos Tatulea <dtatulea@nvidia.com>
To: Amery Hung <ameryhung@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>,
Christoph Paasch <cpaasch@openai.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Saeed Mahameed <saeedm@nvidia.com>,
Mark Bloch <mbloch@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
Date: Tue, 12 May 2026 17:33:08 +0200 [thread overview]
Message-ID: <47630ed4-3028-4716-816e-d4f803423b37@nvidia.com> (raw)
In-Reply-To: <CAMB2axN6USwsGaUQWkL52G=9V=kSe2La_gE-ppOFLWbPCnaVKQ@mail.gmail.com>
On 12.05.26 01:08, Amery Hung wrote:
> On Sat, May 9, 2026 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>>
>>
>> On 08.05.26 20:42, Dragos Tatulea wrote:
>>>
>>>
>>> On 08.05.26 19:44, Amery Hung wrote:
>>>> On Fri, May 8, 2026 at 2:15 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 07.05.26 22:50, Amery Hung wrote:
>>>>>> On Thu, May 7, 2026 at 4:50 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Amery,
>>>>>>>
>>>>>>> On 07.05.26 15:53, Amery Hung wrote:
>>>>>>>> [...]
>>>>>>>> Am I understanding correctly that the better performance comes with
>>>>>>>> the assumption that the XDP does not change headers?
>>>>>>>>
>>>>>>>> headlen is determined before the XDP program runs. If it push/pop
>>>>>>>> headers, there could be headers in frags or data in the linear region
>>>>>>>> after __pskb_pull_tail().
>>>>>>>>
>>>>>>> That's right.
>>>>>>>
>>>>>>>>> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>>>>>>>>> struct mlx5e_frag_page *pfp;
>>>>>>>>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>>>>>>>>> pagep->frags++;
>>>>>>>>> while (++pagep < frag_page);
>>>>>>>>>
>>>>>>>>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>>>>>>>>> - skb->data_len);
>>>>>>>>> + headlen = min_t(u16, headlen - len, skb->data_len);
>>>>>>>>
>>>>>>>> headlen - len can underflow but will be capped by skb->data_len, so
>>>>>>>> this should be okay, right?
>>>>>>> It is safe. But it might trigger an extra allocation in the pull when
>>>>>>> len > headlen. We could also skip the pull in that case. Or do a
>>>>>>> min(headlen - len, min(skb->data_len, MLX5E_RX_MAX_HEAD)). WDYT?
>>>>>>
>>>>>> Make sense, but this line took me a bit to understand. Maybe consider
>>>>>> checking len < headlen first?
>>>>>>
>>>>>> if (len < headlen) {
>>>>>> headlen = min_t(u32, headlen - len, skb->data_len);
>>>>>> __pskb_pull_tail(skb, headlen);
>>>>>> }
>>>>>>
>>>>> Yes, that's what I had in mind when skipping the pull. I would also
>>>>> tag this as likely.
>>>>>
>>>>>> Another clarifying question. So this patch will improve the
>>>>>> performance when the XDP programs don't change header length. For
>>>>>> those that encap/decap, they should precisely pull only headers into
>>>>>> the linear area for optimal performance. Is it correct?
>>>>>>
>>>>> Right for encap, but for decap not quite:
>>>>>
>>>>> Let's say that the XDP program pulls 64B header into the linear part
>>>>> and snips 4B of the encap out. This would result in a pull of an
>>>>> additional 4B (headlen (64B) - len (60B) = 4B) which are now
>>>>> data bytes => sub-optimal layout.
>>>>>
>>>>> I don't see how we can improve this corner case though.
>>>>
>>>> I see. Thanks for the clarification.
>>>>
>>>> I think the "if (len < headlen)" makes too many assumptions about what
>>>> the XDP program did.
>>>>
>>>> How about this policy instead: If the XDP program did not create/pull
>>>> data into the linear area, pull the parsed headers; otherwise, assume
>>>> the XDP program owns the geometry. min() is still needed since the
>>>> program can shrink the packet.
>>>>
>>>> if (!len) {
>>>> headlen = min(headlen, skb->data_len);
>>>> __pskb_pull_tail(skb, headen);
>>>> }
>>>>
>>>> This preserves the optimization for the default no-modification case,
>>>> and most importantly allow XDP program to get the optimal performance
>>>> if it gets the final geometry right.
>>>>
>>> I like this. It will also save us some neurons next time we need to
>>> touch these lines.
>>>
>> Sashiko disagrees:
>>
>> """
>> If an XDP program changes the packet geometry by prepending data, len will
>> be greater than 0, which skips the __pskb_pull_tail() call entirely.
>> The resulting SKB's linear part will only contain the prepended data, with
>> the Ethernet headers remaining in the fragments.
>> When the network stack later calls eth_type_trans(), it unconditionally
>> pulls ETH_HLEN:
>> net/ethernet/eth.c:eth_type_trans() {
>> ...
>> skb_pull_inline(skb, ETH_HLEN);
>> ...
>> }
>> Could pulling 14 bytes from a smaller linear area cause skb->len to drop
>> below skb->data_len and trigger a BUG() in __skb_pull()?
>> """
>>
>> So I think we need an else where we preserve the old behavior:
>> if (!len)
>> headlen = min(headlen, skb->data_len);
>> else
>> headlen = min(MLX5E_RX_MAX_HEAD - len, skb->data_len);
>>
>> __pskb_pull_tail(skb, headlen);
>
> I see. I am okay with the fallback.
>
> One last question: if the fallback is mainly to preserve the minimum
> linear head needed by eth_type_trans(), can we make that explicit
> instead?
>
> unsigned int pull_len = 0;
>
> if (!len)
> pull_len = headlen;
> else if (len < ETH_HLEN)
> pull_len = ETH_HLEN - len;
>
> if (pull_len)
> __pskb_pull_tail(skb, min(pull_len, skb->data_len));
>
> This keeps the parsed-header pull for the fully nonlinear case, but once
> XDP leaves some linear data, the driver only pulls enough to satisfy the
> Ethernet header invariant and otherwise leaves the final geometry to XDP.
>
That would work, but maybe with one less conditional:
if (!len)
__pskb_pull_tail(skb, min(headlen, skb->data_len);
else if (len < ETH_HLEN)
__pskb_pull_tail(skb, min(ETH_LEN - len, skb->data_len));
Tariq suggested to make sure that we have an xdp selftest for this.
Will take it as a follow-up after this series.
Thanks,
Dragos
next prev parent reply other threads:[~2026-05-12 15:33 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 9:53 [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing Tariq Toukan
2026-05-07 9:53 ` [PATCH net-next V6 1/3] net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear Tariq Toukan
2026-05-07 9:53 ` [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part Tariq Toukan
2026-05-07 13:53 ` Amery Hung
2026-05-07 15:49 ` Dragos Tatulea
2026-05-07 20:50 ` Amery Hung
2026-05-08 9:15 ` Dragos Tatulea
2026-05-08 17:44 ` Amery Hung
2026-05-08 18:42 ` Dragos Tatulea
2026-05-10 6:50 ` Dragos Tatulea
2026-05-11 23:08 ` Amery Hung
2026-05-12 15:33 ` Dragos Tatulea [this message]
2026-05-12 16:30 ` Amery Hung
2026-05-08 12:43 ` David Laight
2026-05-08 13:30 ` Dragos Tatulea
2026-05-07 9:53 ` [PATCH net-next V6 3/3] net/mlx5e: Align header copy to cache line for Striding RQ non-linear Tariq Toukan
2026-05-07 19:58 ` [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing Christoph Paasch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47630ed4-3028-4716-816e-d4f803423b37@nvidia.com \
--to=dtatulea@nvidia.com \
--cc=ameryhung@gmail.com \
--cc=andrew+netdev@lunn.ch \
--cc=ast@kernel.org \
--cc=cpaasch@openai.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gal@nvidia.com \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=sdf@fomichev.me \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox