From: Jakub Kicinski <kuba@kernel.org>
To: nb@tipi-net.de
Cc: Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, justin.chen@broadcom.com,
horms@kernel.org, mohsin.bashr@gmail.com, opendmb@gmail.com,
florian.fainelli@broadcom.com,
bcm-kernel-feedback-list@broadcom.com, andrew+netdev@lunn.ch,
edumazet@google.com, pabeni@redhat.com, hmohsin@meta.com,
davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com, sdf@fomichev.me,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v9 1/7] net: bcmgenet: convert RX path to page_pool
Date: Sun, 10 May 2026 11:46:37 -0700
Message-ID: <20260510184637.3952320-1-kuba@kernel.org>
In-Reply-To: <20260506095553.55357-2-nb@tipi-net.de>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: bcmgenet: convert RX path to page_pool
This patch replaces the per-packet __netdev_alloc_skb() + dma_map_single()
in the RX path with page_pool, and builds skbs via napi_build_skb() with
skb_mark_for_recycle() so pages are returned to the pool automatically.
A per-ring page_pool is created with PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
and the buffer layout reserves XDP_PACKET_HEADROOM before the HW RSB.
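For context, the per-ring pool setup described above would look roughly
like the sketch below; the .dev, .pool_size, and other field values are
assumptions for illustration, not copied from the patch:

	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= ring->size,	/* one page per RX descriptor (assumption) */
		.nid		= NUMA_NO_NODE,
		.dev		= &priv->pdev->dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.offset		= XDP_PACKET_HEADROOM,	/* headroom before the HW RSB */
		.max_len	= PAGE_SIZE - XDP_PACKET_HEADROOM,
	};

	ring->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(ring->page_pool))
		return PTR_ERR(ring->page_pool);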
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 54f71b1e85fc..df11c4977e8f 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
[ ... ]
> @@ -2341,25 +2325,29 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> while ((rxpktprocessed < rxpkttoprocess) &&
> (rxpktprocessed < budget)) {
> struct status_64 *status;
> + struct page *rx_page;
> + unsigned int rx_off;
> __be16 rx_csum;
> + void *hard_start;
>
> cb = &priv->rx_cbs[ring->read_ptr];
> - skb = bcmgenet_rx_refill(priv, cb);
>
> - if (unlikely(!skb)) {
> + /* Save the received page before refilling */
> + rx_page = cb->rx_page;
> + rx_off = cb->rx_page_offset;
> +
> + if (bcmgenet_rx_refill(ring, cb)) {
> BCMGENET_STATS64_INC(stats, dropped);
> goto next;
> }
>
> - status = (struct status_64 *)skb->data;
> + /* Sync the RSB first to read the frame length */
> + page_pool_dma_sync_for_cpu(ring->page_pool, rx_page, 0,
> + sizeof(struct status_64));
> +
> + hard_start = page_address(rx_page) + rx_off;
> + status = (struct status_64 *)hard_start;
> dma_length_status = status->length_status;
rx_off is saved from cb->rx_page_offset, but bcmgenet_rx_refill() always
sets that field to XDP_PACKET_HEADROOM. The napi_build_skb() call
further down hardcodes the same constant:
napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
Is the intent that rx_page_offset eventually varies (for XDP layouts),
or should the two sites either both use the constant or both derive the
size from rx_off (for example PAGE_SIZE - rx_off)? If the offset ever
changes without updating the hardcoded napi_build_skb() argument, the
skb_shared_info placement becomes incorrect.
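If the two sites are meant to stay in sync, a minimal variant deriving
both from the saved offset could look like this (sketch only):

	unsigned int truesize = PAGE_SIZE - rx_off;

	hard_start = page_address(rx_page) + rx_off;
	skb = napi_build_skb(hard_start, truesize);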
>
> /* DMA flags and length are still valid no matter how
> * we got the Receive Status Vector (64B RSB or register)
> @@ -2367,15 +2355,23 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
> dma_flag = dma_length_status & 0xffff;
> len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
>
> + /* Sync the rest of the actual received frame */
> + if (len > sizeof(struct status_64))
> + page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
> + sizeof(struct status_64),
> + len - sizeof(struct status_64));
> +
Can this drive a cache-maintenance operation with an unbounded length?
len is extracted from the HW-written RSB as:
len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
which is a 16-bit field (up to 65535), and the bounds check against
RX_BUF_LENGTH only happens in the next hunk. page_pool_dma_sync_for_cpu()
on a non-coherent system ends up invalidating the cache over
[dma_addr + 64, dma_addr + 64 + (len - 64)]. If the hardware ever reports
a bogus length (glitch, stale RSB on an error condition, etc.), the
invalidation would drop dirty cache lines belonging to whatever kernel
allocation follows this page.
The previous code used a fixed priv->rx_buf_len for dma_unmap_single(),
so the HW-controlled length never sized a DMA-maintenance operation.
Would it be safer to validate len against RX_BUF_LENGTH / GENET_RSB_PAD
before this second page_pool_dma_sync_for_cpu() call, or clamp the sync
size to min(len - 64, RX_BUF_LENGTH - 64)?
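One way to clamp it, assuming the full validation against RX_BUF_LENGTH
still runs before the skb is built (sketch):

	if (len > sizeof(struct status_64)) {
		/* never let a HW-reported length size the cache maintenance */
		u32 sync_len = min_t(u32, len, RX_BUF_LENGTH);

		page_pool_dma_sync_for_cpu(ring->page_pool, rx_page,
					   sizeof(struct status_64),
					   sync_len - sizeof(struct status_64));
	}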
> netif_dbg(priv, rx_status, dev,
> "%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
> __func__, p_index, ring->c_index,
> ring->read_ptr, dma_length_status);
>
> - if (unlikely(len > RX_BUF_LENGTH)) {
> - netif_err(priv, rx_status, dev, "oversized packet\n");
> + if (unlikely(len > RX_BUF_LENGTH || len < GENET_RSB_PAD)) {
> + netif_err(priv, rx_status, dev,
> + "invalid packet length %d\n", len);
> BCMGENET_STATS64_INC(stats, length_errors);
> - dev_kfree_skb_any(skb);
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> goto next;
> }
Is the lower bound here tight enough?
len == GENET_RSB_PAD (66) is accepted and then:
__skb_put(skb, len - GENET_RSB_PAD);
yields a 0-byte skb. eth_type_trans() -> eth_skb_pull_mac() will still
read h_proto from skb->data, which points into page_pool memory that the
hardware did not write for this packet.
Similarly, if priv->crc_fwd_en is set and skb->len < ETH_FCS_LEN (len
in [66..69]):
skb_trim(skb, skb->len - ETH_FCS_LEN);
underflows to a huge unsigned value, but skb_trim()'s internal
"if (skb->len > len)" evaluates to false so the trim silently no-ops and
FCS bytes remain in the payload.
Would a bound like
len < GENET_RSB_PAD + ETH_ZLEN + (crc_fwd_en ? ETH_FCS_LEN : 0)
reject these runts more cleanly?
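Concretely, something along these lines (sketch, reusing the error path
the patch already has):

	/* too short to hold a minimal Ethernet frame (+ FCS if forwarded) */
	unsigned int min_len = GENET_RSB_PAD + ETH_ZLEN +
			       (priv->crc_fwd_en ? ETH_FCS_LEN : 0);

	if (unlikely(len > RX_BUF_LENGTH || len < min_len)) {
		netif_err(priv, rx_status, dev,
			  "invalid packet length %d\n", len);
		BCMGENET_STATS64_INC(stats, length_errors);
		page_pool_put_full_page(ring->page_pool, rx_page, true);
		goto next;
	}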
[ ... ]
> @@ -2411,24 +2408,47 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,
[ ... ]
> - skb_put(skb, len);
> + /* Build SKB from the page - data starts at hard_start,
> + * frame begins after RSB(64) + pad(2) = 66 bytes.
> + */
> + skb = napi_build_skb(hard_start, PAGE_SIZE - XDP_PACKET_HEADROOM);
> + if (unlikely(!skb)) {
> + BCMGENET_STATS64_INC(stats, dropped);
> + page_pool_put_full_page(ring->page_pool, rx_page,
> + true);
> + goto next;
> + }
> +
> + skb_mark_for_recycle(skb);
>
> - /* remove RSB and hardware 2bytes added for IP alignment */
> - skb_pull(skb, 66);
> - len -= 66;
> + /* Reserve the RSB + pad, then set the data length */
> + skb_reserve(skb, GENET_RSB_PAD);
> + __skb_put(skb, len - GENET_RSB_PAD);
>
> if (priv->crc_fwd_en) {
> - skb_trim(skb, len - ETH_FCS_LEN);
> - len -= ETH_FCS_LEN;
> + skb_trim(skb, skb->len - ETH_FCS_LEN);
> }
Separately, the pre-patch code incremented priv->mib.rx_dma_failed on
dma_mapping_error() inside bcmgenet_rx_refill(). With page_pool taking
over DMA mapping, nothing increments rx_dma_failed anymore, but the
counter is still exported via ethtool -S through:
STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed)
So the field will be permanently zero after this patch, which is an
implicit user-visible change not mentioned in the commit message. Should
rx_dma_failed be removed from the stats table (or folded into
alloc_rx_buff_failed) as part of this conversion, and the change noted in
the commit log?
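If removal is the route taken, it would just be dropping the table entry
(and the now-dead mib field):

	-	STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed),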
[ ... ]
--
pw-bot: cr