Netdev List
* Re: [PATCH net-next v6 3/7] net: bcmgenet: add basic XDP support (PASS/DROP)
From: Jakub Kicinski @ 2026-04-12 19:22 UTC (permalink / raw)
  To: Nicolai Buchwitz
  Cc: netdev, Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <20260406083536.839517-4-nb@tipi-net.de>

On Mon,  6 Apr 2026 10:35:27 +0200 Nicolai Buchwitz wrote:
> Add XDP program attachment via ndo_bpf and execute XDP programs in the
> RX path. XDP_PASS builds an SKB from the xdp_buff (handling
> xdp_adjust_head/tail), XDP_DROP returns the page to page_pool without
> SKB allocation.
> 
> XDP_TX and XDP_REDIRECT are not yet supported and return XDP_ABORTED.
> 
> Advertise NETDEV_XDP_ACT_BASIC in xdp_features.


> -		skb_mark_for_recycle(skb);
> -
> -		/* Reserve the RSB + pad, then set the data length */
> -		skb_reserve(skb, GENET_RSB_PAD);
> -		__skb_put(skb, len - GENET_RSB_PAD);
> +		{

Floating code blocks are considered poor coding style in the kernel.
Why not push the variables up into the outer scope or make this
a helper?

> +			struct xdp_buff xdp;
> +			unsigned int xdp_act;
> +			int pkt_len;
> +
> +			pkt_len = len - GENET_RSB_PAD;
> +			if (priv->crc_fwd_en)
> +				pkt_len -= ETH_FCS_LEN;
> +
> +			/* Save rx_csum before XDP runs - an XDP program
> +			 * could overwrite the RSB via bpf_xdp_adjust_head.
> +			 */
> +			if (dev->features & NETIF_F_RXCSUM)
> +				rx_csum = (__force __be16)(status->rx_csum
> +							   & 0xffff);

FWIW this could be before the block

> +			xdp_init_buff(&xdp, PAGE_SIZE, &ring->xdp_rxq);
> +			xdp_prepare_buff(&xdp, page_address(rx_page),
> +					 GENET_RX_HEADROOM, pkt_len, true);
> +
> +			if (xdp_prog) {
> +				xdp_act = bcmgenet_run_xdp(ring, xdp_prog,
> +							   &xdp, rx_page);

Since you pass the xdp_prog in you can save yourself the indentation by
making bcmgenet_run_xdp() return PASS when no program is set.
bcmgenet_run_xdp() has one caller, it's going to get inlined.
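
A minimal userspace sketch of the suggestion above: the helper itself returns XDP_PASS when no program is attached, so the caller needs a single flat check instead of a nested `if (xdp_prog)`. The names `struct sim_prog`, `sim_run_xdp()`, `sim_rx_one()` and `sim_drop_all()` are hypothetical stand-ins and not the driver's code; only the verdict names follow the kernel's `enum xdp_action`.

```c
#include <assert.h>
#include <stddef.h>

/* Verdicts in the same order as the kernel's enum xdp_action. */
enum sim_verdict { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical stand-in for struct bpf_prog: just a verdict callback. */
struct sim_prog {
	enum sim_verdict (*run)(const void *pkt, size_t len);
};

/* Returns XDP_PASS when no program is attached, so callers do not
 * have to test the program pointer themselves.
 */
static enum sim_verdict sim_run_xdp(const struct sim_prog *prog,
				    const void *pkt, size_t len)
{
	if (!prog)
		return XDP_PASS;

	assert(prog->run);
	return prog->run(pkt, len);
}

/* Caller with one level of indentation less.
 * Returns 1 if an skb would be built, 0 if the packet was consumed.
 */
static int sim_rx_one(const struct sim_prog *prog, const void *pkt, size_t len)
{
	if (sim_run_xdp(prog, pkt, len) != XDP_PASS)
		return 0;	/* dropped or aborted: go to the next descriptor */

	return 1;		/* continue to skb construction */
}

/* Example program that drops every packet. */
static enum sim_verdict sim_drop_all(const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return XDP_DROP;
}
```

With a single caller the compiler is likely to inline the helper, so the NULL test should cost the same as the open-coded version.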

> +				if (xdp_act != XDP_PASS)
> +					goto next;
> +			}
>  
> -		if (priv->crc_fwd_en) {
> -			skb_trim(skb, skb->len - ETH_FCS_LEN);
> +			skb = bcmgenet_xdp_build_skb(ring, &xdp);
> +			if (unlikely(!skb)) {
> +				BCMGENET_STATS64_INC(stats, dropped);
> +				page_pool_put_full_page(ring->page_pool,
> +							rx_page, true);
> +				goto next;
> +			}
>  		}
>  
>  		/* Set up checksum offload */
>  		if (dev->features & NETIF_F_RXCSUM) {
> -			rx_csum = (__force __be16)(status->rx_csum & 0xffff);
>  			if (rx_csum) {
>  				skb->csum = (__force __wsum)ntohs(rx_csum);
>  				skb->ip_summed = CHECKSUM_COMPLETE;
> @@ -3744,6 +3810,37 @@ static int bcmgenet_change_carrier(struct net_device *dev, bool new_carrier)
>  	return 0;
>  }
>  
> +static int bcmgenet_xdp_setup(struct net_device *dev,
> +			      struct netdev_bpf *xdp)
> +{
> +	struct bcmgenet_priv *priv = netdev_priv(dev);
> +	struct bpf_prog *old_prog;
> +	struct bpf_prog *prog = xdp->prog;
> +
> +	if (prog && dev->mtu > PAGE_SIZE - GENET_RX_HEADROOM -
> +	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) {

I'm confused by this check, it appears that the max page size this
driver can Rx in the first place is 2kB (RX_BUF_LENGTH). And max_mtu 
is 1.5kB.

If GENET_RX_HEADROOM + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
is larger than 2kB the Rx path will break completely whether XDP was
attached or not.

This check seems to be cargo culting what other drivers do?
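
To make the arithmetic behind this objection concrete, here is a rough sketch. The 256-byte XDP headroom, the 64B RSB + 2B pad and the 2kB RX buffer come from this thread; the 4096-byte page, the 320-byte `struct skb_shared_info`, the 64-byte cache line and reading "1.5kB" as 1500 are assumptions that vary by architecture and config. All function names are hypothetical.

```c
#include <assert.h>

/* Values stated in the thread. */
#define SIM_XDP_HEADROOM	256u		/* XDP_PACKET_HEADROOM */
#define SIM_RSB_PAD		(64u + 2u)	/* 64B RSB + 2B alignment pad */
#define SIM_RX_HEADROOM		(SIM_XDP_HEADROOM + SIM_RSB_PAD)
#define SIM_RX_BUF_LENGTH	2048u		/* "2kB (RX_BUF_LENGTH)" */

/* Assumptions, not taken from the thread. */
#define SIM_PAGE_SIZE		4096u
#define SIM_SHINFO_SIZE		320u	/* sizeof(struct skb_shared_info) */
#define SIM_CACHE_LINE		64u
#define SIM_MAX_MTU		1500u	/* "max_mtu is 1.5kB" */

/* Round up to a cache line, in the spirit of SKB_DATA_ALIGN(). */
static unsigned int sim_data_align(unsigned int x)
{
	assert((SIM_CACHE_LINE & (SIM_CACHE_LINE - 1)) == 0);
	return (x + SIM_CACHE_LINE - 1) & ~(SIM_CACHE_LINE - 1);
}

/* Largest MTU the quoted check accepts while a program is attached. */
static unsigned int sim_xdp_mtu_limit(void)
{
	return SIM_PAGE_SIZE - SIM_RX_HEADROOM - sim_data_align(SIM_SHINFO_SIZE);
}

/* Fixed per-buffer overhead that the RX path pays with or without XDP. */
static unsigned int sim_rx_overhead(void)
{
	return SIM_RX_HEADROOM + sim_data_align(SIM_SHINFO_SIZE);
}

/* Non-zero if the quoted check could ever reject a legal MTU. */
static int sim_check_can_fire(void)
{
	return SIM_MAX_MTU > sim_xdp_mtu_limit();
}
```

Under these assumptions the limit works out to 3454 bytes, well above the maximum MTU, so the check would never trigger; the overhead of 642 bytes also fits comfortably inside the 2kB buffer.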

* Re: [PATCH net-next v6 1/7] net: bcmgenet: convert RX path to page_pool
From: Jakub Kicinski @ 2026-04-12 19:10 UTC (permalink / raw)
  To: Nicolai Buchwitz
  Cc: netdev, Justin Chen, Simon Horman, Mohsin Bashir, Doug Berger,
	Florian Fainelli, Broadcom internal kernel review list,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, David S. Miller,
	Vikas Gupta, Bhargava Marreddy, Rajashekar Hudumula,
	Arnd Bergmann, Fernando Fernandez Mancera, Markus Blöchl,
	linux-kernel
In-Reply-To: <20260406083536.839517-2-nb@tipi-net.de>

On Mon,  6 Apr 2026 10:35:25 +0200 Nicolai Buchwitz wrote:
> Replace the per-packet __netdev_alloc_skb() + dma_map_single() in the
> RX path with page_pool, which provides efficient page recycling and
> DMA mapping management. This is a prerequisite for XDP support (which
> requires stable page-backed buffers rather than SKB linear data).
> 
> Key changes:
> - Create a page_pool per RX ring (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
> - bcmgenet_rx_refill() allocates pages via page_pool_alloc_pages()
> - bcmgenet_desc_rx() builds SKBs from pages via napi_build_skb() with
>   skb_mark_for_recycle() for automatic page_pool return
> - Buffer layout reserves XDP_PACKET_HEADROOM (256 bytes) before the HW
>   RSB (64 bytes) + alignment pad (2 bytes) for future XDP headroom

some nits here, since I have more "real" comments on later patches

> +/* Page pool RX buffer layout:
> + * XDP_PACKET_HEADROOM | RSB(64) + pad(2) | frame data | skb_shared_info
> + * The HW writes the 64B RSB + 2B alignment padding before the frame.
> + */
> +#define GENET_XDP_HEADROOM	XDP_PACKET_HEADROOM

subjective but IDK what value this define adds vs using
XDP_PACKET_HEADROOM directly.

> +#define GENET_RSB_PAD		(sizeof(struct status_64) + 2)
> +#define GENET_RX_HEADROOM	(GENET_XDP_HEADROOM + GENET_RSB_PAD)

> +static int bcmgenet_rx_refill(struct bcmgenet_rx_ring *ring,
> +			      struct enet_cb *cb)
>  {
> -	struct device *kdev = &priv->pdev->dev;
> -	struct sk_buff *skb;
> -	struct sk_buff *rx_skb;
> +	struct bcmgenet_priv *priv = ring->priv;
>  	dma_addr_t mapping;
> +	struct page *page;
>  
> -	/* Allocate a new Rx skb */
> -	skb = __netdev_alloc_skb(priv->dev, priv->rx_buf_len + SKB_ALIGNMENT,
> -				 GFP_ATOMIC | __GFP_NOWARN);

page pool adds __GFP_NOWARN automatically, you can drop it now

> -	if (!skb) {
> +	page = page_pool_alloc_pages(ring->page_pool,
> +				     GFP_ATOMIC | __GFP_NOWARN);
> +	if (!page) {
>  		priv->mib.alloc_rx_buff_failed++;
>  		netif_err(priv, rx_err, priv->dev,
> -			  "%s: Rx skb allocation failed\n", __func__);
> -		return NULL;
> -	}



* Re: [PATCH net-next 00/11] netfilter: updates for net-next
From: Pablo Neira Ayuso @ 2026-04-12 18:58 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jakub Kicinski, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, netfilter-devel
In-Reply-To: <advTosG9qZ_ZW355@strlen.de>

On Sun, Apr 12, 2026 at 07:17:22PM +0200, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Jakub Kicinski <kuba@kernel.org> wrote:
> > https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
> 
> Forgot to mention this:
> 
> ---------------
> AF_PACKET raw sockets or tun devices, the network_header might be
> uninitialized (~0U). In this state, skb_mac_header_len() will evaluate to
> a very large number, bypassing the ETH_HLEN check completely.
> ---------------
> 
> Really?  TIL.
> 
> ---------------------
> Furthermore, skb_mac_header_len() only verifies the logical distance between
> header offsets, rather than ensuring the bytes are actually present in the
> physical linear buffer.
> 
> ---------------
> 
> Really?  Total news to me :-(

No problem, taking a look into this.

* Re: [PATCH net 1/2] NFC: digital: Bounds check NFC-A cascade depth in SDD response handler
From: patchwork-bot+netdevbpf @ 2026-04-12 18:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: netdev, linux-kernel, davem, edumazet, kuba, pabeni, horms, kees,
	thierry.escande, sameo, stable
In-Reply-To: <2026040913-figure-seducing-bd3f@gregkh>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  9 Apr 2026 17:18:14 +0200 you wrote:
> The NFC-A anti-collision cascade in digital_in_recv_sdd_res() appends 3
> or 4 bytes to target->nfcid1 on each round, but the number of cascade
> rounds is controlled entirely by the peer device.  The peer sets the
> cascade tag in the SDD_RES (deciding 3 vs 4 bytes) and the
> cascade-incomplete bit in the SEL_RES (deciding whether another round
> follows).
> 
> [...]
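
The kind of bound described in the quoted commit message can be sketched as follows. This is an illustration only, not the applied patch: `struct sim_target`, `sim_append_cascade()` and the capacity of 10 bytes are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_NFCID1_MAXSIZE 10	/* assumed capacity of the nfcid1 array */

struct sim_target {
	unsigned char nfcid1[SIM_NFCID1_MAXSIZE];
	size_t nfcid1_len;	/* invariant: never exceeds the capacity */
};

/* Append one cascade round; the peer decides whether it is 3 or 4 bytes
 * and whether another round follows, so every round must be checked.
 * Returns 0 on success, -1 if the round would overflow nfcid1.
 */
static int sim_append_cascade(struct sim_target *t,
			      const unsigned char *src, size_t n)
{
	assert(n == 3 || n == 4);

	if (n > sizeof(t->nfcid1) - t->nfcid1_len)
		return -1;

	memcpy(t->nfcid1 + t->nfcid1_len, src, n);
	t->nfcid1_len += n;
	return 0;
}
```

Comparing against the remaining space, rather than adding first and comparing afterwards, keeps the check free of overflow as long as the length invariant holds.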

Here is the summary with links:
  - [net,1/2] NFC: digital: Bounds check NFC-A cascade depth in SDD response handler
    https://git.kernel.org/netdev/net/c/46ce8be2ced3
  - [net,2/2] NFC: digital: Bounds check Felica response before sensf_res memcpy
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

* Re: [PATCH net v2] net_sched: fix skb memory leak in deferred qdisc drops
From: patchwork-bot+netdevbpf @ 2026-04-12 18:50 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netdev, horms, pabeni, kuba, edumazet, davem, damilola
In-Reply-To: <20260408100044.4530-1-fmancera@suse.de>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 12:00:44 +0200 you wrote:
> When the network stack cleans up the deferred list via qdisc_run_end(),
> it operates on the root qdisc. If the root qdisc does not implement the
> TCQ_F_DEQUEUE_DROPS flag, the packets queued to be freed are never freed
> and get stranded on the child's local to_free list.
> 
> Fix this by making qdisc_dequeue_drop() aware of the root qdisc. It
> fetches the root qdisc and checks for the TCQ_F_DEQUEUE_DROPS flag. If
> the flag is present, the packet is appended directly to the root's
> to_free list. Otherwise, drop it directly as it was done before the
> optimization was implemented.
> 
> [...]
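
As a rough model of the decision described in the quoted commit message: a dropped packet is deferred only when the root qdisc will actually drain the list, and is freed on the spot otherwise. The structure, the counters and the function name below are hypothetical; `SIM_DEQUEUE_DROPS` merely stands in for TCQ_F_DEQUEUE_DROPS.

```c
#include <assert.h>

#define SIM_DEQUEUE_DROPS 0x1u	/* stand-in for TCQ_F_DEQUEUE_DROPS */

struct sim_qdisc {
	unsigned int flags;
	struct sim_qdisc *root;	/* the root points to itself */
	int to_free;		/* packets waiting for deferred freeing */
};

static int sim_freed_immediately;

/* Drop one packet on behalf of qdisc q. */
static void sim_dequeue_drop(struct sim_qdisc *q)
{
	struct sim_qdisc *root = q->root;

	assert(root);
	if (root->flags & SIM_DEQUEUE_DROPS)
		root->to_free++;		/* root cleanup will free it */
	else
		sim_freed_immediately++;	/* nobody would drain a child list */
}
```

Note that the child's own flag no longer matters in this model: only the root's flag decides where the packet goes, which is what prevents it from being stranded.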

Here is the summary with links:
  - [net,v2] net_sched: fix skb memory leak in deferred qdisc drops
    https://git.kernel.org/netdev/net/c/a6bd339dbb35

* Re: [PATCH net 2/2] NFC: digital: Bounds check Felica response before sensf_res memcpy
From: Jakub Kicinski @ 2026-04-12 18:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: netdev, linux-kernel, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Kees Cook, Thierry Escande, Samuel Ortiz, stable
In-Reply-To: <2026040913-rearrange-unseeing-fa85@gregkh>

On Thu,  9 Apr 2026 17:18:15 +0200 Greg Kroah-Hartman wrote:
> A malicious NFC peer can send a SENSF_RES that is longer than the
> NFC_SENSF_RES_MAXSIZE (18 byte) sensf_res field in the onstack struct
> nfc_target.  digital_in_recv_sensf_res() validates that the response is
> at least DIGITAL_SENSF_RES_MIN_LENGTH bytes but applies no upper bound
> before memcpy(target.sensf_res, sensf_res, resp->len) is called,
> allowing a stack buffer overflow with attacker-controlled length and
> content.
> 
> Commit e329e71013c9 ("NFC: nci: Bounds check struct nfc_target arrays")
> fixed identical missing checks for the same target->sensf_res field on
> the NCI path; the Digital Protocol path was never patched.
> 
> Fix this all up by just rejecting responses that exceed
> NFC_SENSF_RES_MAXSIZE.

This driver's local definition of the sensf_res struct seems to 
be larger than NFC_SENSF_RES_MAXSIZE. Something is off here.
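
For reference, the upper-bound check that the quoted commit message describes looks roughly like this in isolation. It is a sketch with hypothetical names (`struct sim_nfc_target`, `sim_copy_sensf_res()`), it uses the 18-byte size quoted above, and it deliberately leaves open the size mismatch raised in this reply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_SENSF_RES_MAXSIZE 18	/* "NFC_SENSF_RES_MAXSIZE (18 byte)" */

struct sim_nfc_target {
	unsigned char sensf_res[SIM_SENSF_RES_MAXSIZE];
	size_t sensf_res_len;
};

/* Copy a peer-supplied response into the fixed-size field.
 * Returns 0 on success, -1 if the response does not fit.
 */
static int sim_copy_sensf_res(struct sim_nfc_target *t,
			      const unsigned char *resp, size_t len)
{
	assert(t && resp);

	if (len > sizeof(t->sensf_res))
		return -1;	/* reject instead of overflowing the stack */

	memcpy(t->sensf_res, resp, len);
	t->sensf_res_len = len;
	return 0;
}
```

If the structure that the response is parsed into is itself larger than the destination field, as suspected above, a check against the destination size would reject otherwise well-formed responses, so the two sizes need to be reconciled first.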

* Re: [PATCH net-next v2 0/5] ynl/ethtool/netlink: fix nla_len overflow for large string sets
From: patchwork-bot+netdevbpf @ 2026-04-12 18:40 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: donald.hunter, kuba, davem, edumazet, pabeni, horms, andrew,
	netdev, linux-kernel
In-Reply-To: <20260408-b4-ynl_ethtool-v2-0-7623a5e8f70b@gmail.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 08 Apr 2026 15:08:48 +0800 you wrote:
> This series addresses a silent data corruption issue triggered when ynl
> retrieves string sets from NICs with a large number of statistics entries
> (e.g. mlx5_core with thousands of ETH_SS_STATS strings).
> 
> The root cause is that struct nlattr.nla_len is a __u16 (max 65535
> bytes). When a NIC exports enough statistics strings, the
> ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
> this limit. nla_nest_end() silently truncates the length on assignment,
> producing a corrupted netlink message.
> 
> [...]
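
The truncation described in the quoted cover letter is ordinary 16-bit narrowing, which a small userspace sketch can reproduce. The header layout mirrors the field widths of `struct nlattr`; `sim_nest_end()` and `sim_nest_end_checked()` are hypothetical helpers and do not claim to match the implementation of nla_nest_end_safe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field widths as struct nlattr: 16-bit length, 16-bit type. */
struct sim_attr {
	uint16_t len;
	uint16_t type;
};

/* Unchecked variant: the assignment keeps only the low 16 bits. */
static void sim_nest_end(struct sim_attr *a, size_t nest_len)
{
	assert(a);
	a->len = (uint16_t)nest_len;
}

/* Checked variant: refuse lengths that do not fit in the field.
 * Returns 0 on success, -1 on overflow (the header is left untouched).
 */
static int sim_nest_end_checked(struct sim_attr *a, size_t nest_len)
{
	assert(a);
	if (nest_len > UINT16_MAX)
		return -1;

	a->len = (uint16_t)nest_len;
	return 0;
}
```

For example, a 70000-byte nest is recorded as 70000 - 65536 = 4464 bytes by the unchecked variant, so a parser would stop far too early and misread the rest of the message.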

Here is the summary with links:
  - [net-next,v2,1/5] tools: ynl: move ethtool.py to selftest
    https://git.kernel.org/netdev/net-next/c/22ef8a263c17
  - [net-next,v2,2/5] tools: ynl: ethtool: use doit instead of dumpit for per-device GET
    https://git.kernel.org/netdev/net-next/c/1c43d471a513
  - [net-next,v2,3/5] tools: ynl: ethtool: add --dbg-small-recv option
    https://git.kernel.org/netdev/net-next/c/594ba4477164
  - [net-next,v2,4/5] netlink: add a nla_nest_end_safe() helper
    https://git.kernel.org/netdev/net-next/c/1346586a9ac9
  - [net-next,v2,5/5] ethtool: strset: check nla_len overflow
    https://git.kernel.org/netdev/net-next/c/b2fb1a336383

* Re: [PATCH net] net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()
From: patchwork-bot+netdevbpf @ 2026-04-12 18:40 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, linux-arm-kernel,
	linux-mediatek, netdev
In-Reply-To: <20260408-airoha-cpu-idx-airoha_qdma_cleanup_rx_queue-v1-1-8efa64844308@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 08 Apr 2026 20:26:56 +0200 you wrote:
> When the descriptor index written in REG_RX_CPU_IDX() is equal to the one
> stored in REG_RX_DMA_IDX(), the hw will stop since the QDMA RX ring is
> empty.
> Add missing REG_RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue
> routine during QDMA RX ring cleanup.
> 
> Fixes: 514aac359987 ("net: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue()")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net] net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()
    https://git.kernel.org/netdev/net/c/656121b15503

* Re: [PATCH net 0/2] net: mana: Fix debugfs directory naming and file lifecycle
From: patchwork-bot+netdevbpf @ 2026-04-12 18:40 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shradhagupta, kees, kotaranov, yury.norov, linux-hyperv, netdev,
	linux-kernel
In-Reply-To: <20260408081224.302308-1-ernis@linux.microsoft.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 01:12:18 -0700 you wrote:
> This series fixes two pre-existing debugfs issues in the MANA driver.
> 
> Patch 1 fixes the per-device debugfs directory naming to use the unique
> PCI BDF address via pci_name(), avoiding a potential NULL pointer
> dereference when pdev->slot is NULL (e.g. VFIO passthrough, nested KVM)
> and preventing name collisions across multiple PFs or VFs.
> 
> [...]

Here is the summary with links:
  - [net,1/2] net: mana: Use pci_name() for debugfs directory naming
    https://git.kernel.org/netdev/net/c/c116f07ab9d2
  - [net,2/2] net: mana: Move current_speed debugfs file to mana_init_port()
    https://git.kernel.org/netdev/net/c/3b7c7fc97aea

* Re: [PATCH net-next 0/3] net: phy: add support for disabling autonomous EEE
From: Jakub Kicinski @ 2026-04-12 18:35 UTC (permalink / raw)
  To: Andrew Lunn, Russell King, Florian Fainelli
  Cc: Nicolai Buchwitz, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Paolo Abeni, Broadcom internal kernel review list, netdev,
	linux-kernel
In-Reply-To: <20260406-devel-autonomous-eee-v1-0-b335e7143711@tipi-net.de>

On Mon, 06 Apr 2026 09:13:06 +0200 Nicolai Buchwitz wrote:
> Some PHYs implement autonomous EEE where the PHY manages EEE
> independently, preventing the MAC from controlling LPI signaling.
> This conflicts with MACs that implement their own LPI control.

AFAIU the discussion that followed was about.. future work?
So this series is good as is. Applied, please LMK if I misread,
I'll drop it.

* Re: [PATCH net] nfc: llcp: add missing return after LLCP_CLOSED checks
From: patchwork-bot+netdevbpf @ 2026-04-12 18:30 UTC (permalink / raw)
  To: Junxi Qian; +Cc: netdev, davem, edumazet, kuba, pabeni, horms
In-Reply-To: <20260408081006.3723-1-qjx1298677004@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 16:10:06 +0800 you wrote:
> In nfc_llcp_recv_hdlc() and nfc_llcp_recv_disc(), when the socket
> state is LLCP_CLOSED, the code correctly calls release_sock() and
> nfc_llcp_sock_put() but fails to return. Execution falls through to
> the remainder of the function, which calls release_sock() and
> nfc_llcp_sock_put() again. This results in a double release_sock()
> and a refcount underflow via double nfc_llcp_sock_put(), leading to
> a use-after-free.
> 
> [...]
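
The fall-through described in the quoted commit message can be modelled in a few lines. Everything here is a simplified stand-in (`struct sim_sock` and the `sim_*` helpers are hypothetical, with plain counters instead of real locking and refcounting), intended only to show why the missing return doubles both operations.

```c
#include <assert.h>

enum sim_state { SIM_OPEN, SIM_CLOSED };

struct sim_sock {
	enum sim_state state;
	int refcnt;	/* references currently held */
	int unlocks;	/* how many times the lock was released */
};

static void sim_release(struct sim_sock *s) { s->unlocks++; }
static void sim_put(struct sim_sock *s) { s->refcnt--; }

/* Lookup takes one reference (and the lock); each must be dropped once. */
static void sim_lookup_and_lock(struct sim_sock *s)
{
	assert(s);
	s->refcnt++;
}

static void sim_recv_buggy(struct sim_sock *s)
{
	sim_lookup_and_lock(s);
	if (s->state == SIM_CLOSED) {
		sim_release(s);
		sim_put(s);
		/* missing return: execution falls through */
	}
	/* ... normal frame handling would go here ... */
	sim_release(s);
	sim_put(s);
}

static void sim_recv_fixed(struct sim_sock *s)
{
	sim_lookup_and_lock(s);
	if (s->state == SIM_CLOSED) {
		sim_release(s);
		sim_put(s);
		return;
	}
	/* ... normal frame handling would go here ... */
	sim_release(s);
	sim_put(s);
}
```

Starting from one reference held by the owner, the buggy path ends with a count of zero while the owner still believes it holds the socket, which is the precondition for the use-after-free.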

Here is the summary with links:
  - [net] nfc: llcp: add missing return after LLCP_CLOSED checks
    https://git.kernel.org/netdev/net/c/2b5dd4632966

* Re: [PATCH] net: check qdisc_pkt_len_segs_init() return value on ingress
From: Jakub Kicinski @ 2026-04-12 18:21 UTC (permalink / raw)
  To: David Carlier
  Cc: davem, edumazet, pabeni, horms, sdf, kuniyu, skhawaja, liuhangbin,
	krikku, netdev, linux-kernel
In-Reply-To: <20260408172307.172736-1-devnexen@gmail.com>

On Wed,  8 Apr 2026 18:23:07 +0100 David Carlier wrote:
> Commit 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
> changed qdisc_pkt_len_segs_init() to return an skb drop reason when
> it detects malicious GSO packets. The egress path in __dev_queue_xmit()
> checks this return value and drops bad packets, but the ingress path in
> sch_handle_ingress() ignores it.
> 
> This means malformed GSO packets entering via TC ingress are not dropped
> and could be redirected to another interface or cause incorrect qdisc
> accounting.
> 
> Check the return value and drop the packet when a bad GSO is detected.
> 
> Fixes: 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
> Signed-off-by: David Carlier <devnexen@gmail.com>

Not sure this can happen today, but okay.
Hopefully we won't get a patch for every Sashiko report we knowingly
ignored :|

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5a31f9d2128c..2b5f508fc479 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4459,7 +4459,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		   struct net_device *orig_dev, bool *another)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> -	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
> +	enum skb_drop_reason drop_reason;

this needs to move one line down now to keep the variable ordering.

>  	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
>  	int sch_ret;
>  
> @@ -4472,7 +4472,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		*pt_prev = NULL;
>  	}
>  
> -	qdisc_pkt_len_segs_init(skb);
> +	drop_reason = qdisc_pkt_len_segs_init(skb);
> +	if (unlikely(drop_reason)) {
> +		kfree_skb_reason(skb, drop_reason);
> +		*ret = NET_RX_DROP;
> +		bpf_net_ctx_clear(bpf_net_ctx);
> +		return NULL;
> +	}
> +
> +	drop_reason = SKB_DROP_REASON_TC_INGRESS;
>  	tcx_set_ingress(skb, true);
>  
>  	if (static_branch_unlikely(&tcx_needed_key)) {
-- 
pw-bot: cr

* Re: [PATCH net-next v12 00/10] bng_en: add link management and statistics support
From: patchwork-bot+netdevbpf @ 2026-04-12 18:20 UTC (permalink / raw)
  To: Bhargava Marreddy
  Cc: davem, edumazet, kuba, pabeni, andrew+netdev, horms, netdev,
	linux-kernel, michael.chan, pavan.chebbi, vsrama-krishna.nemani,
	vikas.gupta
In-Reply-To: <20260406180420.279470-1-bhargava.marreddy@broadcom.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  6 Apr 2026 23:34:10 +0530 you wrote:
> Hi,
> 
> This series enhances the bng_en driver by adding:
> 1. Link/PHY support
>    a. Link query
>    b. Async Link events
>    c. Ethtool link set/get functionality
> 2. Hardware statistics reporting via ethtool -S
> 
> [...]

Here is the summary with links:
  - [net-next,v12,01/10] bng_en: add per-PF workqueue, timer, and slow-path task
    https://git.kernel.org/netdev/net-next/c/2095da234017
  - [net-next,v12,02/10] bng_en: query PHY capabilities and report link status
    https://git.kernel.org/netdev/net-next/c/7626cd3d53be
  - [net-next,v12,03/10] bng_en: add ethtool link settings, get_link, and nway_reset
    https://git.kernel.org/netdev/net-next/c/169f6e8dd149
  - [net-next,v12,04/10] bng_en: implement ethtool pauseparam operations
    https://git.kernel.org/netdev/net-next/c/dc85e8a51f5a
  - [net-next,v12,05/10] bng_en: add support for link async events
    https://git.kernel.org/netdev/net-next/c/4a75900989c9
  - [net-next,v12,06/10] bng_en: add HW stats infra and structured ethtool ops
    https://git.kernel.org/netdev/net-next/c/8438239bd2b2
  - [net-next,v12,07/10] bng_en: periodically fetch and accumulate hardware statistics
    https://git.kernel.org/netdev/net-next/c/50c885cb2ebe
  - [net-next,v12,08/10] bng_en: implement ndo_get_stats64
    https://git.kernel.org/netdev/net-next/c/d4f802eb4e7d
  - [net-next,v12,09/10] bng_en: implement netdev_stat_ops
    https://git.kernel.org/netdev/net-next/c/c1da271f0d35
  - [net-next,v12,10/10] bng_en: add support for ethtool -S stats display
    https://git.kernel.org/netdev/net-next/c/bcc0f4c0f257

* [PATCH net] sctp: disable BH before calling udp_tunnel_xmit_skb()
From: Xin Long @ 2026-04-12 18:15 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Simon Horman,
	Marcelo Ricardo Leitner, Weiming Shi

udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with
BH disabled.  After commit 6f1a9140ecda ("add xmit recursion limit to
tunnel xmit functions"), on the path:

  udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit()

dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU.

Without local_bh_disable(), the context may move between CPUs, which can
break the inc/dec pairing. This may lead to incorrect recursion level
detection and cause packets to be dropped in ip(6)_tunnel_xmit() or
__dev_queue_xmit().

Fix it by disabling BH around both IPv4 and IPv6 SCTP UDP xmit paths.
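
The failure mode can be illustrated with a toy model of a per-CPU recursion counter. This is a single-threaded simulation, not kernel code: the CPU indices are passed in by hand to mimic migration, the limit of 8 is an assumption about the recursion limit, and all names are hypothetical.

```c
#include <assert.h>

#define SIM_CPUS		2
#define SIM_RECURSION_LIMIT	8	/* assumed value of the xmit recursion limit */

static int sim_recursion[SIM_CPUS];	/* one counter per simulated CPU */

/* One tunnel transmit. cpu_in is where the increment runs, cpu_out is
 * where the decrement runs; they differ only if the task migrated.
 * Returns 1 if the packet was sent, 0 if the recursion check dropped it.
 */
static int sim_tunnel_xmit(int cpu_in, int cpu_out)
{
	assert(cpu_in >= 0 && cpu_in < SIM_CPUS);
	assert(cpu_out >= 0 && cpu_out < SIM_CPUS);

	if (sim_recursion[cpu_in] > SIM_RECURSION_LIMIT)
		return 0;

	sim_recursion[cpu_in]++;
	/* ... transmit; without BH disabled the task may migrate here ... */
	sim_recursion[cpu_out]--;
	return 1;
}

/* Send n packets from CPU 0, optionally migrating to CPU 1 each time. */
static int sim_send_many(int n, int migrate)
{
	int sent = 0;

	for (int i = 0; i < n; i++)
		sent += sim_tunnel_xmit(0, migrate ? 1 : 0);
	return sent;
}
```

In this model, balanced calls leave the counter at zero and every packet goes out, while unbalanced calls let the counter on CPU 0 creep past the limit, after which every further packet is dropped even though no recursion ever happened.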

In my testing, after enabling SCTP over UDP:

  # ip net exec ha sysctl -w net.sctp.udp_port=9899
  # ip net exec ha sysctl -w net.sctp.encap_port=9899
  # ip net exec hb sysctl -w net.sctp.udp_port=9899
  # ip net exec hb sysctl -w net.sctp.encap_port=9899

  # ip net exec ha iperf3 -s

- without this patch:

  # ip net exec hb iperf3 -c 192.168.0.1 --sctp
  [  5]   0.00-10.00  sec  37.2 MBytes  31.2 Mbits/sec  sender
  [  5]   0.00-10.00  sec  37.1 MBytes  31.1 Mbits/sec  receiver

- with this patch:

  # ip net exec hb iperf3 -c 192.168.0.1 --sctp
  [  5]   0.00-10.00  sec  3.14 GBytes  2.69 Gbits/sec  sender
  [  5]   0.00-10.00  sec  3.14 GBytes  2.69 Gbits/sec  receiver

Fixes: 6f1a9140ecda ("add xmit recursion limit to tunnel xmit functions")
Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/ipv6.c     | 2 ++
 net/sctp/protocol.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 53a5c027f8e3..cd15b695607e 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -261,9 +261,11 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
 	label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, fl6);
 
+	local_bh_disable();
 	udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr, &fl6->daddr,
 			     tclass, ip6_dst_hoplimit(dst), label,
 			     sctp_sk(sk)->udp_port, t->encap_port, false, 0);
+	local_bh_enable();
 	return 0;
 }
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 828a59b8e7bf..5800e7ee7ea0 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1070,10 +1070,12 @@ static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 	skb_reset_inner_mac_header(skb);
 	skb_reset_inner_transport_header(skb);
 	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
+	local_bh_disable();
 	udp_tunnel_xmit_skb(dst_rtable(dst), sk, skb, fl4->saddr,
 			    fl4->daddr, dscp, ip4_dst_hoplimit(dst), df,
 			    sctp_sk(sk)->udp_port, t->encap_port, false, false,
 			    0);
+	local_bh_enable();
 	return 0;
 }
 
-- 
2.47.1


* [PATCH net] sctp: fix missing encap_port propagation for GSO fragments
From: Xin Long @ 2026-04-12 18:13 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Simon Horman,
	Marcelo Ricardo Leitner

encap_port in SCTP_INPUT_CB(skb) is used by sctp_vtag_verify() for
SCTP-over-UDP processing. In the GSO case, it is only set on the head
skb, while fragment skbs leave it 0.

This results in fragment skbs seeing encap_port == 0, breaking
SCTP-over-UDP connections.

Fix it by propagating encap_port from the head skb cb when initializing
fragment skbs in sctp_inq_pop().

Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/inqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index f5a7d5a38755..a024c0843247 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -201,6 +201,7 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
 
 			cb->chunk = head_cb->chunk;
 			cb->af = head_cb->af;
+			cb->encap_port = head_cb->encap_port;
 		}
 	}
 
-- 
2.47.1


* Re: [net-next v10 00/10] Add TSO map-once DMA helpers and bnxt SW USO support
From: patchwork-bot+netdevbpf @ 2026-04-12 18:10 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, horms,
	michael.chan, pavan.chebbi, linux-kernel, leon
In-Reply-To: <20260408230607.2019402-1-joe@dama.to>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 16:05:49 -0700 you wrote:
> Greetings:
> 
> This series extends net/tso to add a data structure and some helpers allowing
> drivers to DMA map headers and packet payloads a single time. The helpers can
> then be used to reference slices of the shared mapping for each segment. This
> helps to avoid the cost of repeated DMA mappings, especially on systems which
> use an IOMMU. N per-packet DMA maps are replaced with a single map for the
> entire GSO skb. As of v3, the series uses the DMA IOVA API (as suggested by
> Leon [1]) and provides a fallback path when an IOMMU is not in use. The DMA
> IOVA API provides even better efficiency than the v2; see below.
> 
> [...]

Here is the summary with links:
  - [net-next,v10,01/10] net: tso: Introduce tso_dma_map and helpers
    https://git.kernel.org/netdev/net-next/c/82db77f6fb16
  - [net-next,v10,02/10] net: bnxt: Export bnxt_xmit_get_cfa_action
    https://git.kernel.org/netdev/net-next/c/268c63f2c6b2
  - [net-next,v10,03/10] net: bnxt: Add a helper for tx_bd_ext
    https://git.kernel.org/netdev/net-next/c/637237d3d93c
  - [net-next,v10,04/10] net: bnxt: Use dma_unmap_len for TX completion unmapping
    https://git.kernel.org/netdev/net-next/c/3cb430e62c83
  - [net-next,v10,05/10] net: bnxt: Add TX inline buffer infrastructure
    https://git.kernel.org/netdev/net-next/c/0c26a0e765e7
  - [net-next,v10,06/10] net: bnxt: Add boilerplate GSO code
    https://git.kernel.org/netdev/net-next/c/0440e27eedac
  - [net-next,v10,07/10] net: bnxt: Implement software USO
    https://git.kernel.org/netdev/net-next/c/cc5d90667db8
  - [net-next,v10,08/10] net: bnxt: Add SW GSO completion and teardown support
    https://git.kernel.org/netdev/net-next/c/87550ba2dc39
  - [net-next,v10,09/10] net: bnxt: Dispatch to SW USO
    https://git.kernel.org/netdev/net-next/c/28f2c22398fb
  - [net-next,v10,10/10] selftests: drv-net: Add USO test
    https://git.kernel.org/netdev/net-next/c/5d3b12d1a24b

* Re: [PATCH net-next 00/11] netfilter: updates for net-next
From: Julian Anastasov @ 2026-04-12 18:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, netfilter-devel, pablo
In-Reply-To: <20260412105344.5e14fe70@kernel.org>


	Hello,

On Sun, 12 Apr 2026, Jakub Kicinski wrote:

> On Sun, 12 Apr 2026 18:54:49 +0200 Florian Westphal wrote:
> > Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Fri, 10 Apr 2026 13:23:41 +0200 Florian Westphal wrote:  
> > > > 1-3) IPVS updates from Julian Anastasov to enhance visibility into
> > > >      IPVS internal state by exposing hash size, load factor etc and
> > > >      allows userspace to tune the load factor used for resizing hash
> > > >      tables.  
> > > 
> > > Someone should take a look at the Sashiko reports for those, please?  
> > 
> > https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
> > 
> > Sorry Pablo I am dumping this on you.  Already wasted 3h on saturday
> > on LLM crap 8-(
> 
> Sorry, I was quoting the IPVS section of the PR because I meant that
> someone should look at the IPVS portion. The rest looked like a waste
> of time, indeed. The netns dismantle vs ipvs smelled like it could be
> legit.

	I'll check the IPVS part, there are probably
some problems to fix...

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [PATCH net-next] net: lan743x: rename chip_rev to fpga_rev
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Thangaraj Samynathan
  Cc: bryan.whitehead, UNGLinuxDriver, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20260410085710.9246-1-thangaraj.s@microchip.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 14:27:10 +0530 you wrote:
> The variable chip_rev stores the value read from the FPGA_REV
> register and represents the FPGA revision. Rename it to fpga_rev
> to better reflect its meaning.
> 
> No functional change intended.
> 
> Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
> 
> [...]

Here is the summary with links:
  - [net-next] net: lan743x: rename chip_rev to fpga_rev
    https://git.kernel.org/netdev/net-next/c/469faa546e7a

* Re: [PATCH net-next] net: skb: clean up dead code after skb_kfree_head() simplification
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, kerneljasonxing,
	kuniyu, mhal, almasrymina, ebiggers, toke, linux-kernel
In-Reply-To: <20260410034736.297900-1-jiayuan.chen@linux.dev>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 11:47:32 +0800 you wrote:
> Since commit 0f42e3f4fe2a ("net: skb: fix cross-cache free of
> KFENCE-allocated skb head"), skb_kfree_head() always calls kfree()
> and no longer uses end_offset to distinguish between skb_small_head_cache
> and generic kmalloc caches.
> 
> Clean up the leftovers:
> 
> [...]

Here is the summary with links:
  - [net-next] net: skb: clean up dead code after skb_kfree_head() simplification
    https://git.kernel.org/netdev/net-next/c/5758be283ff8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] selftests: netfilter: nft_tproxy.sh: adjust to socat changes
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, kuba
In-Reply-To: <20260409224506.27072-1-fw@strlen.de>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 00:45:02 +0200 you wrote:
> Like e65d8b6f3092 ("selftests: drv-net: adjust to socat changes") we
> need to add shut-none for this test too.
> 
> The extra 0-packet can trigger a second (unexpected) reply from the server.
> 
> Fixes: 7e37e0eacd22 ("selftests: netfilter: nft_tproxy.sh: add tcp tests")
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20260408152432.24b8ad0d@kernel.org/
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> [...]

Here is the summary with links:
  - [net] selftests: netfilter: nft_tproxy.sh: adjust to socat changes
    https://git.kernel.org/netdev/net-next/c/61119542663c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v2] vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Luigi Leonardi
  Cc: mst, jasowang, xuanzhuo, eperezma, stefanha, sgarzare, davem,
	edumazet, kuba, pabeni, horms, avkrasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <20260408-remove_parameter-v2-1-e00f31cf7a17@redhat.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 08 Apr 2026 17:21:02 +0200 you wrote:
> `virtio_transport_send_pkt_info` gets all the transport information
> from the parameter `t_ops`. There is no need to call
> `virtio_transport_get_ops()`.
> 
> Remove it.
> 
> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
> 
> [...]

Here is the summary with links:
  - [net-next,v2] vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`
    https://git.kernel.org/netdev/net-next/c/006679268a29

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] netkit: Don't emit scrub attribute for single device mode
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, bpf, kuba, dw, pabeni, razor
In-Reply-To: <20260410072334.548232-1-daniel@iogearbox.net>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 09:23:34 +0200 you wrote:
> When userspace reads a single mode netkit device via RTM_GETLINK,
> it receives IFLA_NETKIT_SCRUB=NETKIT_SCRUB_DEFAULT attribute from
> netkit_fill_info(). If that attribute is echoed back to recreate
> the device, the seen_scrub presence check in netkit_new_link()
> causes creation to fail with -EOPNOTSUPP. Since it has no meaning
> for single devices at this point, just don't dump it.
> 
> [...]

Here is the summary with links:
  - [net-next] netkit: Don't emit scrub attribute for single device mode
    https://git.kernel.org/netdev/net-next/c/e530b484b705

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 01/11] ipvs: show the current conn_tab size to users
From: patchwork-bot+netdevbpf @ 2026-04-12 18:00 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
In-Reply-To: <20260410112352.23599-2-fw@strlen.de>

Hello:

This series was applied to netdev/net-next.git (main)
by Florian Westphal <fw@strlen.de>:

On Fri, 10 Apr 2026 13:23:42 +0200 you wrote:
> From: Julian Anastasov <ja@ssi.bg>
> 
> As conn_tab is per-net, better to show the current hash table size
> to users instead of the ip_vs_conn_tab_size (max).
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> [...]

Here is the summary with links:
  - [net-next,01/11] ipvs: show the current conn_tab size to users
    https://git.kernel.org/netdev/net-next/c/22e620fe8455
  - [net-next,02/11] ipvs: add ip_vs_status info
    https://git.kernel.org/netdev/net-next/c/9a9ccef907a7
  - [net-next,03/11] ipvs: add conn_lfactor and svc_lfactor sysctl vars
    https://git.kernel.org/netdev/net-next/c/8d7de5477e47
  - [net-next,04/11] netfilter: x_physdev: reject empty or not-nul terminated device names
    https://git.kernel.org/netdev/net-next/c/8df772afc9d0
  - [net-next,05/11] netfilter: nfnetlink: prefer skb_mac_header helpers
    https://git.kernel.org/netdev/net-next/c/74feb7d373b3
  - [net-next,06/11] netfilter: xt_HL: add pr_fmt and checkentry validation
    https://git.kernel.org/netdev/net-next/c/24bd5c2679ca
  - [net-next,07/11] netfilter: xt_socket: enable defrag after all other checks
    https://git.kernel.org/netdev/net-next/c/542be3fa5aff
  - [net-next,08/11] netfilter: conntrack: remove UDP-Lite conntrack support
    https://git.kernel.org/netdev/net-next/c/84dee05d9d61
  - [net-next,09/11] netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
    https://git.kernel.org/netdev/net-next/c/f30e5a7291a8
  - [net-next,10/11] netfilter: nft_fwd_netdev: check ttl/hl before forwarding
    https://git.kernel.org/netdev/net-next/c/1dfd95bdf4d1
  - [net-next,11/11] netfilter: require Ethernet MAC header before using eth_hdr()
    https://git.kernel.org/netdev/net-next/c/62443dc21114

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 00/11] netfilter: updates for net-next
From: Jakub Kicinski @ 2026-04-12 17:53 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	netfilter-devel, pablo
In-Reply-To: <advOUl92VLlqaiCJ@strlen.de>

On Sun, 12 Apr 2026 18:54:49 +0200 Florian Westphal wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri, 10 Apr 2026 13:23:41 +0200 Florian Westphal wrote:  
> > > 1-3) IPVS updates from Julian Anastasov to enhance visibility into
> > >      IPVS internal state by exposing hash size, load factor etc and
> > >      allows userspace to tune the load factor used for resizing hash
> > >      tables.  
> > 
> > Someone should take a look at the Sashiko reports for those, please?  
> 
> https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de
> 
> Sorry Pablo I am dumping this on you.  Already wasted 3h on saturday
> on LLM crap 8-(

Sorry, I was quoting the IPVS section of the PR because I meant that
someone should look at the IPVS portion. The rest looked like a waste
of time, indeed. The netns dismantle vs ipvs smelled like it could be
legit.

^ permalink raw reply

* Re: [net,PATCH v2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Jakub Kicinski @ 2026-04-12 17:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Marek Vasut, netdev, stable, David S. Miller, Andrew Lunn,
	Eric Dumazet, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
	Yicong Hui, linux-kernel, Thomas Gleixner
In-Reply-To: <2558832d-c821-436d-898d-b708c5e0a228@nabladev.com>

On Sun, 12 Apr 2026 18:27:28 +0200 Marek Vasut wrote:
> On 4/12/26 6:01 PM, Jakub Kicinski wrote:
> > On Wed,  8 Apr 2026 18:24:58 +0200 Marek Vasut wrote:  
> >> If CONFIG_PREEMPT_RT=y is set AND the driver executes ks8851_irq() AND
> >> KSZ_ISR register bit IRQ_RXI is set AND ks8851_rx_pkts() detects that
> >> there are packets in the RX FIFO, then netdev_alloc_skb_ip_align() is
> >> called to allocate SKBs. If netdev_alloc_skb_ip_align() is called with
> >> BH enabled, local_bh_enable() at the end of netdev_alloc_skb_ip_align()
> >> will call __local_bh_enable_ip(), which will call __do_softirq(), which
> >> may trigger net_tx_action() softirq, which may ultimately call the xmit
> >> callback ks8851_start_xmit_par(). The ks8851_start_xmit_par() will try
> >> to lock struct ks8851_net_par .lock spinlock, which is already locked
> >> by ks8851_irq() from which ks8851_start_xmit_par() was called. This
> >> leads to a deadlock, which is reported by the kernel, including a trace
> >> listed below.  
> > 
> > lock_par is a spinlock, and AFAIU softirqs run in their own thread on RT.
> > I'm not following.  
> 
> Please look at this part of the backtrace from the commit message, and
> read it from bottom to top to follow the failure in chronological order.
> It does not seem that handle_softirqs() is running in its own thread,
> separate from the IRQ thread?
> 
>    rt_spin_lock from ks8851_start_xmit_par+0x68/0x1a0
>    ks8851_start_xmit_par from netdev_start_xmit+0x1c/0x40 <---- this tries to grab the same PAR spinlock, and deadlocks
>    netdev_start_xmit from dev_hard_start_xmit+0xec/0x1b0
>    dev_hard_start_xmit from sch_direct_xmit+0xb8/0x25c
>    sch_direct_xmit from __qdisc_run+0x20c/0x4fc
>    __qdisc_run from qdisc_run+0x1c/0x28
>    qdisc_run from net_tx_action+0x1f4/0x244
>    net_tx_action from handle_softirqs+0x1c0/0x29c
>    handle_softirqs from __local_bh_enable_ip+0xdc/0xf4
>    __local_bh_enable_ip from __netdev_alloc_skb+0x140/0x194
>    __netdev_alloc_skb from ks8851_irq+0x348/0x4d8 <---- this is called from ks8851_rx_pkts() via netdev_alloc_skb_ip_align()
>    ks8851_irq from irq_thread_fn+0x24/0x64 <---- this here runs with the PAR spinlock held
> 
> > The patch looks way too "advanced" for a driver. Something is going
> > very wrong here. Or the commit message must be updated to explain
> > it better to people like me. Or both.  
> 
> Does the backtrace make the problem clearer, with the annotation above ?

Sebastian, do you have any recommendation here? tl;dr is that the driver does

	spin_lock_irqsave()
	__netdev_alloc_skb()
	spin_unlock_irqrestore()

And __netdev_alloc_skb() does:

	if (in_hardirq() || irqs_disabled()) {
		nc = this_cpu_ptr(&netdev_alloc_cache);
		data = page_frag_alloc(nc, len, gfp_mask);
		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
	} else {
		local_bh_disable();
		local_lock_nested_bh(&napi_alloc_cache.bh_lock);

		nc = this_cpu_ptr(&napi_alloc_cache.page);
		data = page_frag_alloc(nc, len, gfp_mask);
		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);

		local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
		local_bh_enable();
	}

The local_bh_enable() seems to kick off BH processing inline,
and that BH processing takes the same spin lock the driver is already
holding.
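
The sequence described above can be modelled in user space. The sketch
below is only an illustration under stated assumptions: every name in it
(struct model, model_xmit(), model_alloc_skb(), model_irq() and so on) is
hypothetical and is not a kernel API. A boolean stands in for the driver's
spin lock and a counter stands in for the BH-disable nesting depth. It
assumes, as the backtrace suggests, that the allocation takes the
local_bh_disable()/local_bh_enable() branch while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model {
	bool lock_held;       /* stands in for the driver's spin lock */
	bool tx_pending;      /* a raised TX softirq waiting to run */
	int bh_disable_depth; /* nesting depth of BH-disabled sections */
	bool deadlock;        /* set when the lock is taken recursively */
};

/* Models the xmit callback, which needs the driver lock. */
static void model_xmit(struct model *m)
{
	if (m->lock_held) {
		/* A real, non-recursive lock would hang here. */
		m->deadlock = true;
		return;
	}
	m->lock_held = true;
	m->lock_held = false;
}

static void model_bh_disable(struct model *m)
{
	m->bh_disable_depth++;
}

/* Pending softirq work runs inline once the outermost section ends. */
static void model_bh_enable(struct model *m)
{
	assert(m->bh_disable_depth > 0);
	if (--m->bh_disable_depth == 0 && m->tx_pending) {
		m->tx_pending = false;
		model_xmit(m);
	}
}

/* Models the non-hardirq branch of the allocation shown above. */
static void model_alloc_skb(struct model *m)
{
	model_bh_disable(m);
	/* the page fragment allocation would happen here */
	model_bh_enable(m);
}

/*
 * Models the IRQ handler body: lock, allocate, unlock. With
 * outer_bh_off set, BHs are disabled around the whole locked region.
 * Returns true if the lock was taken recursively.
 */
static bool model_irq(bool outer_bh_off)
{
	struct model m = { .tx_pending = true };

	if (outer_bh_off)
		model_bh_disable(&m);

	m.lock_held = true;
	model_alloc_skb(&m);
	m.lock_held = false;

	if (outer_bh_off)
		model_bh_enable(&m);

	return m.deadlock;
}
```

In this model, model_irq(false) reports the recursive lock attempt, because
the inner enable drops the depth to zero while the lock is still held.
model_irq(true) does not, because the pending work only runs after the
lock has been released. That is a simplification of what disabling BHs
around the handler is meant to achieve, not a description of the patch.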


^ permalink raw reply

