From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 149B84C954A;
	Thu,  4 Jun 2026 17:33:13 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780594396; cv=none; b=mxyakoIglgNy7l9ujQ8Xj0IRz/dFuhRsejjGaL1eFXy+28uWLxog9G0sqP9oK0fUgG0OUU9pwWOR2AhplcMzyQmXTJk2pf75jKbuaeX5f8j38eKD6h8RKN3qk83+L2UqDQibOyH+q1mGdZ7GfS9DaD46uunhzgk4KyceDOlRvf0=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780594396; c=relaxed/simple;
	bh=K7kw/I3eF2aRMgsjq6hJCOCmmCvvffqpzX0AsR7dY2E=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=OOO30mkRDHVVxV06wgVZS78G/VtS7HeO90hoKdWnbt8M5HvqcXdcEaoy7ohclkV1ljsp/6RvULYe8GORy1XyjdU9dPQdXWCsDoH6aFfGywSoOWOocwejJ8vOlXZAM2EmpMXJ32/QLP1nDk6D/Ahy6g3UD0qJygQ5WoBYxIoARCE=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KAMvaTYv; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KAMvaTYv"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 134521F00893;
	Thu,  4 Jun 2026 17:33:09 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1780594393;
	bh=KE+uMkWXsI4pHqJPpGFmy8f2FxY+qMoC/T/rfy5gO7Q=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References;
	b=KAMvaTYvDvB4hnp/67u6AT0NyXI8HNFOOSF7atosC10cAnrq6Ek44404qw48A/SbE
	 UZ1ruN3mLI1zxR9JzULFaAir6JN6DU/gwjv2fIKewb8ZloyVDrbahkjF89Kw1YkS4u
	 CjzudcGsriEIhruwFzo64urOtOFSBOq7fpR/clqI63xDZ9qp6GXmeaXRH73yhCdjAp
	 V5jUU3qk9Fg5a6vdSQrO9VhJyTKqN8+X7aS1pscBiMWqWK1/2bbxIYV4PCxFKoBwZa
	 j8A2VOVOcD8Ue7wG2RTcicXZq5sJnoIWW0M8dtaA4/f+1NrdZppaS0bgEIi8o1yw6B
	 Ab6Qjrrsuw7cA==
From: Simon Horman <horms@kernel.org>
To: nb@tipi-net.de
Cc: 'Simon Horman' <horms@kernel.org>,
	netdev@vger.kernel.org,
	justin.chen@broadcom.com,
	andrew+netdev@lunn.ch,
	davem@davemloft.net,
	edumazet@google.com,
	kuba@kernel.org,
	pabeni@redhat.com,
	opendmb@gmail.com,
	florian.fainelli@broadcom.com,
	bcm-kernel-feedback-list@broadcom.com,
	vikas.gupta@broadcom.com,
	rajashekar.hudumula@broadcom.com,
	bhargava.marreddy@broadcom.com,
	fmancera@suse.de,
	markus@blochl.de,
	arnd@arndb.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] net: bcmgenet: convert RX path to page_pool
Date: Thu,  4 Jun 2026 18:32:06 +0100
Message-ID: <20260604173205.3989025-3-horms@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260602094248.4130712-1-nb@tipi-net.de>
References: <20260602094248.4130712-1-nb@tipi-net.de>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
Now I have the exact diff. Let me produce the report.

commit ff8ce4b2334ded9490897cdd2cff5a25a4a0a438
Author: Nicolai Buchwitz <nb@tipi-net.de>

net: bcmgenet: convert RX path to page_pool

This patch converts the bcmgenet RX path from per-packet
__netdev_alloc_skb() + dma_map_single() to page_pool, builds SKBs from
pool pages via napi_build_skb() with skb_mark_for_recycle(), and removes
the now-unused priv->rx_buf_len field and rx_dma_failed soft MIB
counter.

> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 7c11cf916762..80dbfba9fa88 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c

[ ... ]

> @@ -1153,7 +1159,6 @@ static const struct bcmgenet_stats bcmgenet_gstrings_stats[] = {
>  			UMAC_RBUF_ERR_CNT_V1),
>  	STAT_GENET_MISC("mdf_err_cnt", mib.mdf_err_cnt, UMAC_MDF_ERR_CNT),
>  	STAT_GENET_SOFT_MIB("alloc_rx_buff_failed", mib.alloc_rx_buff_failed),
> -	STAT_GENET_SOFT_MIB("rx_dma_failed", mib.rx_dma_failed),
>  	STAT_GENET_SOFT_MIB("tx_dma_failed", mib.tx_dma_failed),

[Medium]
The string "rx_dma_failed" exposed via ethtool -S is user-visible: monitoring
agents (collectd, telegraf, Prometheus exporters, ad-hoc scripts) may key off
that name, and the strings list is ordered, so removing one shifts the index
of every counter that follows for tools that index by offset rather than name.

Should the changelog mention this user-visible change explicitly, or could
the entry be retained reporting 0 to preserve the existing string set?

Worth noting that the sibling driver drivers/net/ethernet/broadcom/bcmsysport.c
still exposes rx_dma_failed, so the two drivers in the same Broadcom directory
will diverge after this patch.

> @@ -2249,46 +2246,30 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
>  	goto out;
>  }
>  
> -static struct sk_buff *bcmgenet_rx_refill(struct bcmgenet_priv *priv,
> -					  struct enet_cb *cb)
> +static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
> +			      struct enet_cb *cb)
>  {
> -	struct device *kdev = &priv->pdev->dev;
> -	struct sk_buff *skb;
> -	struct sk_buff *rx_skb;
> +	struct bcmgenet_priv *priv = ring->priv;
>  	dma_addr_t mapping;
> +	struct page *page;
>  
> -	/* Allocate a new Rx skb */
> -	skb = __netdev_alloc_skb(priv->dev, priv->rx_buf_len + SKB_ALIGNMENT,
> -				 GFP_ATOMIC | __GFP_NOWARN);
> -	if (!skb) {
> +	page = page_pool_alloc_pages(ring->page_pool,
> +				     GFP_ATOMIC);
> +	if (!page) {
>  		priv->mib.alloc_rx_buff_failed++;
>  		netif_err(priv, rx_err, priv->dev,
> -			  "%s: Rx skb allocation failed\n", __func__);
> -		return NULL;
> -	}
> -
> -	/* DMA-map the new Rx skb */
> -	mapping = dma_map_single(kdev, skb->data, priv->rx_buf_len,
> -				 DMA_FROM_DEVICE);
> -	if (dma_mapping_error(kdev, mapping)) {
> -		priv->mib.rx_dma_failed++;
> -		dev_kfree_skb_any(skb);
> -		netif_err(priv, rx_err, priv->dev,
> -			  "%s: Rx skb DMA mapping failed\n", __func__);
> -		return NULL;
> +			  "%s: Rx page allocation failed\n", __func__);
> +		return -ENOMEM;
>  	}
>  
> -	/* Grab the current Rx skb from the ring and DMA-unmap it */
> -	rx_skb = bcmgenet_free_rx_cb(kdev, cb);
> +	/* page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
> +	mapping = page_pool_get_dma_addr(page);
>  
> -	/* Put the new Rx skb on the ring */
> -	cb->skb = skb;
> -	dma_unmap_addr_set(cb, dma_addr, mapping);
> -	dma_unmap_len_set(cb, dma_len, priv->rx_buf_len);
> +	cb->rx_page = page;
> +	cb->rx_page_offset = 0;
>  	dmadesc_set_addr(priv, cb->bd_addr, mapping);
>  
> -	/* Return the current Rx skb to caller */
> -	return rx_skb;
> +	return 0;
>  }

[High]
Has a one-page-per-RX-descriptor scheme been considered against the
platforms bcmgenet actually runs on?

The pool is created with order=0 and max_len=RX_BUF_LENGTH (2048):

	struct page_pool_params pp_params = {
		.order = 0,
		...
		.max_len = RX_BUF_LENGTH,
	};

bcmgenet_rx_refill() always sets cb->rx_page_offset = 0, and
bcmgenet_desc_rx() then does:

	skb = napi_build_skb(hard_start, PAGE_SIZE - rx_off);

so each ~1.5KB frame ends up with skb->truesize == PAGE_SIZE.

bcmgenet runs on BCM2711/2712 (Raspberry Pi 4/5) and Brahma-B53 STB SoCs,
which are commonly built with 16K or 64K page sizes. With ring->size of
256 and one full page per descriptor, that is e.g. 256 * 64K = 16MB of
DMA-mapped memory per ring on a 64K-page kernel, for ~512KB of frame
data. Every ~1.5KB packet also bills PAGE_SIZE bytes against sk_rcvbuf
and per-socket memory accounting, which can drive TCP receivers into
memory pressure on a Gigabit MAC.

The newly added struct enet_cb::rx_page_offset is set but never to a
non-zero value:

	cb->rx_page = page;
	cb->rx_page_offset = 0;

Was a fragmented allocation (page_pool_dev_alloc_frag(), or a higher
.order) the original intent here? If so, would it make sense to either
implement that in this patch, or drop the unused rx_page_offset field
until it is needed?

> @@ -2340,25 +2321,31 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_rx_ring *ring,

[ ... ]

> -		if (unlikely(len > RX_BUF_LENGTH)) {
> -			netif_err(priv, rx_status, dev, "oversized packet\n");
> +		/* Reject obviously bogus lengths to keep the SKB build path
> +		 * safe against runt frames.
> +		 */
> +		min_len = GENET_RSB_PAD + ETH_ZLEN +
> +			  (priv->crc_fwd_en ? ETH_FCS_LEN : 0);
> +		if (unlikely(len > RX_BUF_LENGTH || len < min_len)) {
> +			netif_err(priv, rx_status, dev,
> +				  "invalid packet length %d\n", len);
>  			BCMGENET_STATS64_INC(stats, length_errors);
> -			dev_kfree_skb_any(skb);
> +			page_pool_put_full_page(ring->page_pool, rx_page,
> +						true);
>  			goto next;
>  		}

[Low, Low]
This combined check changes the per-counter attribution for short frames.
Previously, only len > RX_BUF_LENGTH short-circuited into length_errors;
runts that also had DMA_RX_CRC_ERROR / DMA_RX_OV / DMA_RX_NO / DMA_RX_LG /
DMA_RX_RXER set would fall through to the dma_flag-driven block and be
counted as crc_errors / over_errors / frame_errors / errors as appropriate.

After this patch any sub-min_len frame is bucketed only as length_errors,
even if dma_flag indicates a CRC or RXER condition. Is the loss of finer
attribution intentional, and could the changelog mention it?

A related question on the lower bound itself: min_len is computed as
GENET_RSB_PAD + ETH_ZLEN (+ ETH_FCS_LEN if crc_fwd_en), so any HW-reported
frame below ETH_ZLEN on the wire is dropped. Frames below the minimum
that the MAC nevertheless delivers (for example with promiscuous mode or
runt-pass enabled, or some PHY-side debug captures) used to reach the
stack with DMA_RX_LG / DMA_RX_NO flagging.

The commit message describes this as "a runt cannot underflow the SKB
build path". The actual underflow precondition for the skb_reserve() /
__skb_put() sequence is len < GENET_RSB_PAD (= 66):

		skb_reserve(skb, GENET_RSB_PAD);
		__skb_put(skb, len - GENET_RSB_PAD);

Would a tighter check of len < GENET_RSB_PAD satisfy the stated safety
goal without filtering legitimate sub-ETH_ZLEN frames the HW chose to
deliver?