From: Jakub Kicinski <kuba@kernel.org>
To: nb@tipi-net.de
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, justin.chen@broadcom.com,
	horms@kernel.org, mohsin.bashr@gmail.com, opendmb@gmail.com,
	florian.fainelli@broadcom.com,
	bcm-kernel-feedback-list@broadcom.com, andrew+netdev@lunn.ch,
	edumazet@google.com, pabeni@redhat.com, hmohsin@meta.com,
	davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com, sdf@fomichev.me,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool
Date: Sun, 10 May 2026 11:46:37 -0700	[thread overview]
Message-ID: <20260510184637.3952320-1-kuba@kernel.org> (raw)
In-Reply-To: <20260506095553.55357-2-nb@tipi-net.de>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: convert RX path to page_pool

This patch replaces the per-packet __netdev_alloc_skb() + dma_map_single()
in the RX path with page_pool, and builds skbs via napi_build_skb() with
skb_mark_for_recycle() so pages are returned to the pool automatically.
A per-ring page_pool is created with PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
and the buffer layout reserves XDP_PACKET_HEADROOM before the HW RSB.

> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 54f71b1e85fc..df11c4977e8f 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c

[ ... ]

> @@ -2341,25 +2325,29 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
>  	while ((rxpktprocessed < rxpkttoprocess) &&
>  	       (rxpktprocessed < budget)) {
>  		struct status_64 *status;
> +		struct page *rx_page;
> +		unsigned int rx_off;
>  		__be16 rx_csum;
> +		void *hard_start;
>  
>  		cb = &priv->rx_cbs[ring->read_ptr];
> -		skb = bcmgenet_rx_refill(priv, cb);
>  
> -		if (unlikely(!skb)) {
> +		/* Save the received page before refilling */
> +		rx_page = cb->rx_page;
> +		rx_off = cb->rx_page_offset;
> +
> +		if (bcmgenet_rx_refill(ring, cb)) {
>  			BCMGENET_STATS64_INC(stats, dropped);
>  			goto next;
>  		}
>  
> -		status = (struct status_64 *)skb->data;
> +		/* Sync the RSB first to read the frame length */
> +		page_pool_dma_sync_for_cpu(ring->page_pool, rx_page, 0,
> +					   sizeof(struct status_64));
> +
> +		hard_start = page_address(rx_page) + rx_off;
> +		status = (struct status_64 *)hard_start;
>  		dma_length_status = status->length_status;

rx_off is saved from cb->rx_page_offset, but bcmgenet_rx_refill() always
sets that field to XDP_PACKET_HEADROOM.  The adjacent napi_build_skb()
call further down hardcodes the same constant as:

    napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);

Is the intent that rx_page_offset will eventually vary (for XDP layouts),
or should the two sites be kept in sync, either both using the constant
or both deriving the size from rx_off (for example PAGE_SIZE - rx_off)?
If the offset ever changes without the hardcoded napi_build_skb()
argument being updated, the skb_shared_info placement becomes incorrect.

>  
>  		/* DMA flags and length are still valid no matter how
>  		 * we got the Receive Status Vector (64B RSB or register)
> @@ -2367,15 +2355,23 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
>  		dma_flag = dma_length_status & 0xffff;
>  		len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
>  
> +		/* Sync the rest of the actual received frame */
> +		if (len > sizeof(struct status_64))
> +			page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
> +						   sizeof(struct status_64),
> +						   len - sizeof(struct status_64));
> +

Can this drive a cache-maintenance operation with an unbounded length?

len is extracted from the HW-written RSB as:

    len = dma_length_status >> DMA_BUFLENGTH_SHIFT;

which is a 16-bit field (up to 65535), and the bounds check against
RX_BUF_LENGTH only happens in the next hunk.  page_pool_dma_sync_for_cpu()
on a non-coherent system ends up invalidating the cache over
[dma_addr + 64, dma_addr + 64 + (len - 64)].  If the hardware ever reports
a bogus length (glitch, stale RSB on an error condition, etc.) the
invalidation would drop dirty cache lines belonging to whatever kernel
allocation follows this page.

The previous code used a fixed priv->rx_buf_len for dma_unmap_single(),
so the HW-controlled length never sized a DMA-maintenance operation.

Would it be safer to validate len against RX_BUF_LENGTH / GENET_RSB_PAD
before this second page_pool_dma_sync_for_cpu() call, or clamp the sync
size to min(len - 64, RX_BUF_LENGTH - 64)?

>  		netif_dbg(priv, rx_status, dev,
>  			  "%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
>  			  __func__, p_index, ring->c_index,
>  			  ring->read_ptr, dma_length_status);
>  
> -		if (unlikely(len > RX_BUF_LENGTH)) {
> -			netif_err(priv, rx_status, dev, "oversized packet\n");
> +		if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
> +			netif_err(priv, rx_status, dev,
> +				  "invalid packet length %d\n", len);
>  			BCMGENET_STATS64_INC(stats, length_errors);
> -			dev_kfree_skb_any(skb);
> +			page_pool_put_full_page(ring->page_pool, rx_page,
> +						true);
>  			goto next;
>  		}

Is the lower bound here tight enough?

len == GENET_RSB_PAD (66) is accepted and then:

    __skb_put(skb, len - GENET_RSB_PAD);

yields a 0-byte skb.  eth_type_trans() -> eth_skb_pull_mac() will still
read h_proto from skb->data, which points into page_pool memory that the
hardware did not write for this packet.

Similarly, if priv->crc_fwd_en is set and skb->len < ETH_FCS_LEN (len
in [66..69]):

    skb_trim(skb, skb->len - ETH_FCS_LEN);

underflows to a huge unsigned value, but skb_trim()'s internal
"if (skb->len > len)" check evaluates to false, so the trim silently
no-ops and the FCS bytes remain in the payload.

Would a bound like

    len < GENET_RSB_PAD + ETH_ZLEN + (priv->crc_fwd_en ? ETH_FCS_LEN : 0)

reject these runts more cleanly?

[ ... ]

> @@ -2411,24 +2408,47 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,

[ ... ]

> -		skb_put(skb, len);
> +		/* Build SKB from the page - data starts at hard_start,
> +		 * frame begins after RSB(64) + pad(2) = 66 bytes.
> +		 */
> +		skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
> +		if (unlikely(!skb)) {
> +			BCMGENET_STATS64_INC(stats, dropped);
> +			page_pool_put_full_page(ring->page_pool, rx_page,
> +						true);
> +			goto next;
> +		}
> +
> +		skb_mark_for_recycle(skb);
>  
> -		/* remove RSB and hardware 2bytes added for IP alignment */
> -		skb_pull(skb, 66);
> -		len -= 66;
> +		/* Reserve the RSB + pad, then set the data length */
> +		skb_reserve(skb, GENET_RSB_PAD);
> +		__skb_put(skb, len - GENET_RSB_PAD);
>  
>  		if (priv->crc_fwd_en) {
> -			skb_trim(skb, len - ETH_FCS_LEN);
> -			len -= ETH_FCS_LEN;
> +			skb_trim(skb, skb->len - ETH_FCS_LEN);
>  		}

Separately, the pre-patch code incremented priv->mib.rx_dma_failed on
dma_mapping_error() inside bcmgenet_rx_refill().  With page_pool taking
over DMA mapping, nothing increments rx_dma_failed anymore, but the
counter is still exported via ethtool -S through:

    STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed)

So the field will be permanently zero after this patch, which is an
implicit user-visible change not mentioned in the commit message.  Should
rx_dma_failed be removed from the stats table (or folded into
alloc_rx_buff_failed) as part of this conversion, and the change noted in
the commit log?

[ ... ]
-- 
pw-bot: cr
