Netdev List
 help / color / mirror / Atom feed
From: Nicolai Buchwitz <nb@tipi-net.de>
To: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org, justin.chen@broadcom.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, opendmb@gmail.com,
	florian.fainelli@broadcom.com,
	bcm-kernel-feedback-list@broadcom.com, vikas.gupta@broadcom.com,
	rajashekar.hudumula@broadcom.com, bhargava.marreddy@broadcom.com,
	fmancera@suse.de, markus@blochl.de, arnd@arndb.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] net: bcmgenet: convert RX path to page_pool
Date: Thu, 04 Jun 2026 22:05:29 +0200	[thread overview]
Message-ID: <e6694208872ec000c8db73dedd5931a2@tipi-net.de> (raw)
In-Reply-To: <20260604173205.3989025-3-horms@kernel.org>

Hi Simon

On 4.6.2026 19:32, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> Now I have the exact diff. Let me produce the report.
> 
> commit ff8ce4b2334ded9490897cdd2cff5a25a4a0a438
> Author: Nicolai Buchwitz <nb@tipi-net.de>
> 
> net: bcmgenet: convert RX path to page_pool
> 
> This patch converts the bcmgenet RX path from per-packet
> __netdev_alloc_skb() + dma_map_single() to page_pool, builds SKBs from
> pool pages via napi_build_skb() with skb_mark_for_recycle(), and 
> removes
> the now-unused priv->rx_buf_len field and rx_dma_failed soft MIB
> counter.
> 
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
>> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c

> [...]

>> @@ -1153,7 +1159,6 @@ static const struct bcmgenet_stats 
>> bcmgenet_gstrings_stats[] = {
>>  			UMAC_RBUF_ERR_CNT_V1),
>>  	STAT_GENET_MISC("mdf_err_cnt", mib.mdf_err_cnt, UMAC_MDF_ERR_CNT),
>>  	STAT_GENET_SOFT_MIB("alloc_rx_buff_failed", 
>> mib.alloc_rx_buff_failed),
>> -	STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed),
>>  	STAT_GENET_SOFT_MIB("tx_dma_failed", mib.tx_dma_failed),
> 
> [Medium]
> The string "rx_dma_failed" exposed via ethtool -S is user-visible: 
> monitoring
> agents (collectd, telegraf, Prometheus exporters, ad-hoc scripts) may 
> key off
> that name, and the strings list is ordered, so removing one shifts the 
> index
> of every counter that follows for tools that index by offset rather 
> than name.
> 
> Should the changelog mention this user-visible change explicitly, or 
> could
> the entry be retained reporting 0 to preserve the existing string set?

Nothing in the driver can increment rx_dma_failed after this patch. What 
would be
the point in keeping it static 0? AFAIU removing dead soft-MIB counters 
in the same
patch is not so special for page_pool conversions (e.g. ionic)?

I will mention it in the commit message of v2 though.

> 
> Worth noting that the sibling driver 
> drivers/net/ethernet/broadcom/bcmsysport.c
> still exposes rx_dma_failed, so the two drivers in the same Broadcom 
> directory
> will diverge after this patch.

bcmsysport still does per-packet dma_map_single() and not PP. The two 
drivers
diverge because they do different things now, not because removing the 
counter
here is wrong?

> [...]

> [High]
> Has a one-page-per-RX-descriptor scheme been considered against the
> platforms bcmgenet actually runs on?
> 
> The pool is created with order=0 and max_len=RX_BUF_LENGTH (2048):
> 
> 	struct page_pool_params pp_params = {
> 		.order = 0,
> 		...
> 		.max_len = RX_BUF_LENGTH,
> 	};
> 
> bcmgenet_rx_refill() always sets cb->rx_page_offset = 0, and
> bcmgenet_desc_rx() then does:
> 
> 	skb = napi_build_skb(hard_start, PAGE_SIZE - rx_off);
> 
> so each ~1.5KB frame ends up with skb->truesize == PAGE_SIZE.
> 
> bcmgenet runs on BCM2711/2712 (Raspberry Pi 4/5) and Brahma-B53 STB 
> SoCs,
> which are commonly built with 16K or 64K page sizes. With ring->size of

Pi5 uses BCM2712, but doesn't expose the genet and instead uses cadence 
macb
on it's RP1. AFAIK only pi5 uses 16k pages and Pi4 and other STB 
platforms
use 4k pages? With 4k we would end up at 1 MB per ring, which seems fine 
for
a GBit MAC?

> 256 and one full page per descriptor, that is e.g. 256 * 64K = 16MB of
> DMA-mapped memory per ring on a 64K-page kernel, for ~512KB of frame
> data. Every ~1.5KB packet also bills PAGE_SIZE bytes against sk_rcvbuf
> and per-socket memory accounting, which can drive TCP receivers into
> memory pressure on a Gigabit MAC.
> 
> The newly added struct enet_cb::rx_page_offset is set but never to a
> non-zero value:
> 
> 	cb->rx_page = page;
> 	cb->rx_page_offset = 0;
> 
> Was a fragmented allocation (page_pool_dev_alloc_frag(), or a higher
> .order) the original intent here? If so, would it make sense to either
> implement that in this patch, or drop the unused rx_page_offset field
> until it is needed?

Frag allocation is a reasonable follow-up but I'd keep this patch
focused on the page_pool conversion. rx_page_offset is an artifact from
the original XDP series. It is meant for the later XDP_PACKET_HEADROOM, 
but
I can drop it in v2 for now.

> 
>> @@ -2340,25 +2321,31 @@ static unsigned int bcmgenet_desc_rx(struct 
>> bcmgenet_rx_ring *ring,
> 
> [ ... ]
> 
>> -		if (unlikely(len > RX_BUF_LENGTH)) {
>> -			netif_err(priv, rx_status, dev, "oversized packet\n");
>> +		/* Reject obviously bogus lengths to keep the SKB build path
>> +		 * safe against runt frames.
>> +		 */
>> +		min_len = GENET_RSB_PAD + ETH_ZLEN +
>> +			  (priv->crc_fwd_en ? ETH_FCS_LEN : 0);
>> +		if (unlikely(len > RX_BUF_LENGTH || len < min_len)) {
>> +			netif_err(priv, rx_status, dev,
>> +				  "invalid packet length %d\n", len);
>>  			BCMGENET_STATS64_INC(stats, length_errors);
>> -			dev_kfree_skb_any(skb);
>> +			page_pool_put_full_page(ring->page_pool, rx_page,
>> +						true);
>>  			goto next;
>>  		}
> 
> [Low, Low]
> This combined check changes the per-counter attribution for short 
> frames.
> Previously, only len > RX_BUF_LENGTH short-circuited into 
> length_errors;
> runts that also had DMA_RX_CRC_ERROR / DMA_RX_OV / DMA_RX_NO / 
> DMA_RX_LG /
> DMA_RX_RXER set would fall through to the dma_flag-driven block and be
> counted as crc_errors / over_errors / frame_errors / errors as 
> appropriate.
> 
> After this patch any sub-min_len frame is bucketed only as 
> length_errors,
> even if dma_flag indicates a CRC or RXER condition. Is the loss of 
> finer
> attribution intentional, and could the changelog mention it?
> 
> A related question on the lower bound itself: min_len is computed as
> GENET_RSB_PAD + ETH_ZLEN (+ ETH_FCS_LEN if crc_fwd_en), so any 
> HW-reported
> frame below ETH_ZLEN on the wire is dropped. Frames below the minimum
> that the MAC nevertheless delivers (for example with promiscuous mode 
> or
> runt-pass enabled, or some PHY-side debug captures) used to reach the
> stack with DMA_RX_LG / DMA_RX_NO flagging.
> 
> The commit message describes this as "a runt cannot underflow the SKB
> build path". The actual underflow precondition for the skb_reserve() /
> __skb_put() sequence is len < GENET_RSB_PAD (= 66):
> 
> 		skb_reserve(skb, GENET_RSB_PAD);
> 		__skb_put(skb, len - GENET_RSB_PAD);
> 
> Would a tighter check of len < GENET_RSB_PAD satisfy the stated safety
> goal without filtering legitimate sub-ETH_ZLEN frames the HW chose to
> deliver?

Resonable. Will use len < GENET_RSB_PAD in v2.

Thanks,
Nicolai

      parent reply	other threads:[~2026-06-04 20:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-02  9:42 [PATCH net-next] net: bcmgenet: convert RX path to page_pool Nicolai Buchwitz
2026-06-04 17:32 ` Simon Horman
2026-06-04 18:54   ` Simon Horman
2026-06-04 19:39   ` Nicolai Buchwitz
2026-06-04 21:31     ` Jacob Keller
2026-06-04 20:05   ` Nicolai Buchwitz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e6694208872ec000c8db73dedd5931a2@tipi-net.de \
    --to=nb@tipi-net.de \
    --cc=andrew+netdev@lunn.ch \
    --cc=arnd@arndb.de \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=bhargava.marreddy@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=florian.fainelli@broadcom.com \
    --cc=fmancera@suse.de \
    --cc=horms@kernel.org \
    --cc=justin.chen@broadcom.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=markus@blochl.de \
    --cc=netdev@vger.kernel.org \
    --cc=opendmb@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=rajashekar.hudumula@broadcom.com \
    --cc=vikas.gupta@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox