From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>,
Jesper Dangaard Brouer <jbrouer@redhat.com>,
Liang Chen <liangchen.linux@gmail.com>,
hawk@kernel.org, horms@kernel.org, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: ilias.apalodimas@linaro.org, daniel@iogearbox.net,
ast@kernel.org, netdev@vger.kernel.org,
Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
Stanislav Fomichev <sdf@google.com>,
Maryam Tahhan <mtahhan@redhat.com>
Subject: Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
Date: Tue, 22 Aug 2023 20:13:17 +0200 [thread overview]
Message-ID: <21cc97c9-1778-1b73-a05e-4ebbca39c861@kernel.org> (raw)
In-Reply-To: <ef4ca8d3-3127-f6dd-032a-e04d367fd49c@redhat.com>
On 22/08/2023 15.05, Jesper Dangaard Brouer wrote:
>
> On 22/08/2023 14.24, Yunsheng Lin wrote:
>> On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
>>> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>>>
>>>> On 16/08/2023 14.30, Liang Chen wrote:
>>>>> Page pool is supported for veth, but at the moment pages are not properly
>>>>> recycled for XDP_TX and XDP_REDIRECT. That prevents veth XDP from fully
>>>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>>>> to make recycling work for those cases. With that in place, it can be
>>>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>>>> presented in each commit message, and together they demonstrate quite a
>>>>> noticeable improvement.
>>>>>
>>>>
>>>> I'm digging into this code path today.
>>>>
>>>> I'm trying to extend this and find a way to support SKBs that used
>>>> kmalloc (skb->head_frag=0), such that we can remove the
>>>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>>>> allow more SKBs to avoid realloc, as long as they have enough headroom.
>>>> We can control that dynamically for netdev TX packets by adjusting
>>>> netdev->needed_headroom, e.g. when loading an XDP prog.
>>>>
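A minimal sketch of the needed_headroom idea, assuming a hypothetical veth
helper (the function name and call site are illustrative, not from any
posted patch):

#include <linux/netdevice.h>
#include <linux/bpf.h>          /* XDP_PACKET_HEADROOM */

/* Hypothetical: called when an XDP program is attached or detached, so
 * the stack reserves enough headroom in TX SKBs for veth to later build
 * an xdp_buff around them without reallocating.
 */
static void veth_update_needed_headroom(struct net_device *dev, bool xdp_attached)
{
        dev->needed_headroom = xdp_attached ? XDP_PACKET_HEADROOM : 0;
}
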
>>>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>>>> handle kmalloc'ed SKBs (skb->head_frag=0). Going through the code, I don't
>>>> think it is a bug that generic-XDP allows this.
>>
>> Is it possible to relax the other checks too, and implement something like
>> pskb_expand_head() in the XDP core, if the XDP core needs to modify the data?
>>
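A rough sketch of that suggestion; the helper name and placement are purely
illustrative, only pskb_expand_head() itself is existing API:

#include <linux/skbuff.h>

/* Illustrative only: grow the SKB head when headroom is too small for
 * XDP, instead of rejecting the SKB outright.  The SKB must not be
 * shared when calling pskb_expand_head().
 */
static int xdp_ensure_headroom(struct sk_buff *skb, unsigned int need)
{
        if (skb_headroom(skb) >= need)
                return 0;

        /* Reallocates skb->head with the extra bytes placed in front. */
        return pskb_expand_head(skb, need - skb_headroom(skb), 0, GFP_ATOMIC);
}
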
>
> Yes, I definitely hope (and plan) to relax other checks.
>
> The XDP_PACKET_HEADROOM (256 bytes) check has IMHO become obsolete and
> wrong, as many drivers today use a 192-byte headroom for XDP (which we
> allowed). Thus, there is no reason for veth to insist on this
> XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (thanks
> to these drivers).
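As a rough sketch, the relaxed check could look like this (the 192-byte
value mirrors what several native-XDP drivers reserve today; the constant
name is made up here):

#define VETH_XDP_HEADROOM_MIN   192     /* hypothetical, not an agreed constant */

static bool veth_skb_headroom_ok(const struct sk_buff *skb)
{
        /* Accept anything >= 192 bytes instead of insisting on the full
         * XDP_PACKET_HEADROOM (256 bytes).
         */
        return skb_headroom(skb) >= VETH_XDP_HEADROOM_MIN;
}
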
>
>
>>
>>>>
>>>> Deep into this rabbit hole, I'm starting to question our approach.
>>>> - Perhaps the veth XDP approach for SKBs is wrong?
>>>>
>>>> The root cause of this issue is that the veth_xdp_rcv_skb() code path
>>>> (which handles SKBs) is calling the XDP-native function "xdp_do_redirect()".
>>>> I question why it isn't using "xdp_do_generic_redirect()" instead.
>>>> (I will jump into this rabbit hole now...)
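For reference, the two redirect entry points being contrasted here, as
declared in include/linux/filter.h (prototypes quoted from memory, worth
double-checking against the tree):

/* Native XDP: takes over the frame described by the xdp_buff. */
int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
                    struct bpf_prog *prog);

/* Generic/SKB XDP: redirects while keeping the sk_buff itself alive. */
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
                            struct xdp_buff *xdp, struct bpf_prog *prog);
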
>>
>> Is there any reason why xdp_do_redirect() cannot handle slab-allocated
>> data? Can we change xdp_do_redirect() to handle slab-allocated data, so
>> that it can benefit other cases besides veth too?
>>
>
> I started coding this up, but realized that it was the wrong approach.
>
> The xdp_do_redirect() call is for native XDP with a proper xdp_buff.
> When dealing with SKBs that we pretend are xdp_buffs, we have the API
> xdp_do_generic_redirect(). IMHO it is wrong to "steal" the packet data
> from an SKB just in order to use the native-XDP API xdp_do_redirect().
> In the use-cases I see, the next layer will often allocate a new SKB and
> attach the stolen packet data, which is pure waste, since
> xdp_do_generic_redirect() keeps the SKB intact, so no new SKB allocs
> are needed.
>
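A minimal sketch of that direction, assuming a hypothetical wrapper in the
veth SKB path (not actual veth code; the error handling is a guess at
matching what the generic-XDP path in do_xdp_generic() does):

#include <linux/skbuff.h>
#include <linux/filter.h>       /* xdp_do_generic_redirect() */

/* Hypothetical: handle XDP_REDIRECT for the SKB path without stealing
 * the packet data -- the sk_buff stays intact and is reused downstream.
 */
static int veth_skb_xdp_redirect(struct net_device *dev, struct sk_buff *skb,
                                 struct xdp_buff *xdp, struct bpf_prog *prog)
{
        int err;

        err = xdp_do_generic_redirect(dev, skb, xdp, prog);
        if (err)
                kfree_skb(skb);         /* on failure the SKB is ours to drop */

        return err;
}
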
Please see my RFC-v1 patchset[1] for a different approach, which avoids
the realloc and page_pool usage altogether (but also results in correct
page_pool recycling when realloc cannot be avoided).
[1]
https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/
--Jesper