Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Valerio <pvalerio@redhat.com>
To: "Théo Lebrun" <theo.lebrun@bootlin.com>, netdev@vger.kernel.org
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>,
	Claudiu Beznea <claudiu.beznea@tuxon.dev>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Lorenzo Bianconi <lorenzo@kernel.org>
Subject: Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration
Date: Tue, 02 Dec 2025 18:24:50 +0100	[thread overview]
Message-ID: <87fr9szti5.fsf@redhat.com> (raw)
In-Reply-To: <DEITSIO441QL.X81MVLL3EIV4@bootlin.com>

Hello Théo,

thank you for the feedback

On 26 Nov 2025 at 07:08:14 PM, Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> Hello Paolo,
>
> So this is an initial review, I'll start here with five series-wide
> topics and send small per-line comments (ie nitpicks) in a second stage.
>
>
>
> ### Rx buffer size computation
>
> The buffer size computation should be reworked. At the end of the series
> it looks like:
>
> static int macb_open(struct net_device *dev)
> {
>     size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
>
>     // ...
>
>     macb_init_rx_buffer_size(bp, bufsz);
>
>     // ...
> }
>
> static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
> {
>     if (!macb_is_gem(bp)) {
>         bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>     } else {
>         bp->rx_buffer_size = size
>             + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>             + MACB_PP_HEADROOM;
>
>         if (bp->rx_buffer_size > PAGE_SIZE)
>             bp->rx_buffer_size = PAGE_SIZE;
>
>         if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
>             bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
>     }
> }
>
> Most of the issues with this code is not stemming from your series, but
> this big rework is the right moment to fix it all.
>
>  - NET_IP_ALIGN is accounted for in the headroom even though it isn't
>    present if !RSC.
>

that's something I noticed and I was a unsure about the reason.

>  - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
>    headroom+data. We should probably do the same. In our case it would
>    be:
>
>    bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
>                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>    // or
>    bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);
>
>    I've not computed if it can ever give a different value to your
>    series in terms of total size or shinfo alignment. I'd guess it is
>    unlikely.
>

I'll take a look at this

>  - If the size clamping to PAGE_SIZE comes into play, we are probably
>    doomed. It means we cannot deal with the MTU and we'll probably get
>    corruption. If we do put a check in place, it should loudly fail
>    rather than silently clamp.
>

That should not happen, unless I'm missing something.
E.g., 9000B mtu on a 4K PAGE_SIZE kernel should be handled with multiple
descriptors. The clamping is there because according with how the series
creates the pool, the maximum buffer size is page order 0.

Hardware-wise bp->rx_buffer_size should also be taken into account for
the receive buffer size.

> TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
> rx buffer size computation. It can use bp->rx_offset and add on top
> MTU & co. It might start failing if >PAGE_SIZE (?).
>

ack, I'll move that part

>
>
> ### Buffer variable names
>
> Related: so many variables, fields or constants have ambiguous names,
> can we do something about it?
>
>  - bp->rx_offset is named oddly to my ears. Offset to what?
>    Maybe bp->rx_head or bp->rx_headroom?
>

bp->rx_headroom sounds a good choice to me, but if you have a stronger
preference for bp->rx_head just let me know.

>  - bp->rx_buffer_size: it used to be approximately the payload size
>    (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
>    That's on GEM, because on MACB it means something different.
>
>    This line is a bit ironic and illustrates the trouble:
>       buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset
>
>  - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
>    could call skb->head, ie start of buffer & start of XDP headroom.
>    Its name implied skb->data to me, ie after the headroom.
>
>    It also made `data - page_address(page) + bp->rx_offset` hard to
>    understand. It is easier for me to understand that the following is
>    the page fragment offset till skb->data:
>
>       buff_head + bp->rx_headroom - page_address(page)
>

ack, will change this

>  - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
>    default SKB headroom but here it contains part of the headroom
>    (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).
>

uhm, looking at this, I think NET_IP_ALIGN should be part of it (for the
!rsc case).
I'll revisit this part, though.

>  - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
>    initial thought was skb->data pointer (ie after headroom).
>    What about va or ptr or buff or frame or ...?
>

I see. At some point I considered buff, but then I realized
tx_buff->buff was not perfect, hence data :)

I guess one between frame or va works, thanks.

> TLDR: I had a hard time understanding size/offset expressions (both from
> existing code and the series) because of variable names.
>

ack. Will revisit this aspect based on your suggestions.

>
>
> ### Duplicated buffer size computations
>
> Last point related to buffer size computation:
>
>  - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
>    SKB_DATA_ALIGN() and,
>
>  - it is duplicated in gem_xdp_setup() but I don't see why because the
>    buffer size is computed to work fine with/without XDP enabled. Also
>    this check means we cannot load an XDP program before macb_open()
>    because bp->rx_buffer_size == 0.
>
> TLDR: Let's deduplicate size computations to minimise chances of getting
> it wrong.
>

ack

>
>
> ### Allocation changes
>
> I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
> in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
> page_pool_alloc_frag().
>
>  - The transient solution isn't acceptable when PAGE_SIZE is large.
>    We have 16K and would never want one buffer per page.
>
>  - It forces you to introduce temporary code & constants which is added
>    noise IMO. MACB_PP_MAX_BUF_SIZE() is odd as is the alignment of
>    buffer sizes to page sizes. It forces you to deal with
>    `bp->rx_buffer_size > PAGE_SIZE` which we could ignore. Right?
>
> TLDR: do alloc changes in one step.
>

yes, makes sense I'll squash them.

>
>
> ### XDP_SETUP_PROG if netif_running()
>
> I'd like to start a discussion on the expected behavior on XDP program
> change if netif_running(). Summarised:
>
> static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
>              struct netlink_ext_ack *extack)
> {
>     bool running = netif_running(dev);
>     bool need_update = !!bp->prog != !!prog;
>
>     if (running && need_update)
>         macb_close(dev);
>     old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
>     if (running && need_update)
>         return macb_open(dev);
> }
>
> Have you experimented with that? I don't see anything graceful in our
> close operation, it looks like we'll get corruption or dropped packets
> or both. We shouldn't impose that on the user who just wanted to swap
> the program.
>
> I cannot find any good reason that implies we wouldn't be able to swap
> our XDP program on the fly. If we think it is unsafe, I'd vote for
> starting with a -EBUSY return code and iterating on that.
>

I didn't experiment much with this, other than simply adding and
removing programs as needed during my tests. Didn't experience
particular issues.

The reason a close/open sequence was added here was mostly because I was
considering to account XDP_PACKET_HEADROOM only when a program was
present. I later decided to not proceed with that (mostly to avoid
changing too many things at once).

Given the geometry of the buffer remains untouched in either case, I
see no particular reasons we can't swap on the fly as you suggest.

I'll try this and change it, thanks!

> TLDR: macb_close() on XDP program change is probably a bad idea.
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

next prev parent reply	other threads:[~2025-12-02 17:25 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-19 13:53 [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support Paolo Valerio
2025-11-27 13:21   ` Théo Lebrun
2025-12-02 17:30     ` Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
2025-11-27 13:38   ` Théo Lebrun
2025-12-02 17:32     ` Paolo Valerio
2025-12-08 10:21       ` Théo Lebrun
2025-12-08 10:22         ` [PATCH 1/8] net: macb: move Rx buffers alloc from link up to open Théo Lebrun
2025-12-08 12:53         ` [PATCH RFC net-next 2/6] cadence: macb/gem: handle multi-descriptor frame reception Paolo Valerio
2025-12-09  9:01           ` Théo Lebrun
2025-11-19 13:53 ` [PATCH RFC net-next 3/6] cadence: macb/gem: use the current queue number for stats Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 4/6] cadence: macb/gem: add XDP support for gem Paolo Valerio
2025-11-27 14:41   ` Théo Lebrun
2025-12-02 17:32     ` Paolo Valerio
2025-12-08 10:59       ` Théo Lebrun
2025-11-19 13:53 ` [PATCH RFC net-next 5/6] cadence: macb/gem: make tx path skb agnostic Paolo Valerio
2025-11-27 14:49   ` Théo Lebrun
2025-12-02 17:33     ` Paolo Valerio
2025-11-19 13:53 ` [PATCH RFC net-next 6/6] cadence: macb/gem: introduce xmit support Paolo Valerio
2025-11-27 15:07   ` Théo Lebrun
2025-12-02 17:34     ` Paolo Valerio
2025-12-08 11:01       ` Théo Lebrun
2025-11-25 16:50 ` [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration Théo Lebrun
2025-11-25 23:11   ` Paolo Valerio
2025-11-26 18:08 ` Théo Lebrun
2025-12-02 17:24   ` Paolo Valerio [this message]
2025-12-03 14:28     ` Théo Lebrun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fr9szti5.fsf@redhat.com \
    --to=pvalerio@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=claudiu.beznea@tuxon.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=lorenzo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.ferre@microchip.com \
    --cc=pabeni@redhat.com \
    --cc=theo.lebrun@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).