Netdev List


* Re: [PATCH v3] net: fman: fix use-after-free on IRQF_SHARED handler after probe failure
From: 赵金明 @ 2026-06-26  9:53 UTC
  To: horms
  Cc: horms, Andrew Lunn, andrew+netdev, davem, edumazet, kuba,
	linux-kernel, madalin.bucur, netdev, pabeni, sean.anderson
In-Reply-To: <20260625164237.1222477-1-horms@kernel.org>

Hi,

The analysis is logically correct. Since fman is zero-initialized by
kzalloc_obj(), both fman->cfg and fman->fpm_regs are NULL when
devm_request_irq() registers the shared IRQ handler. The guard in
fman_irq():

    if (!is_init_done(fman->cfg))
        return IRQ_NONE;

does not protect against this case because is_init_done(NULL) returns
true, so the handler would proceed to dereference the NULL
fpm_regs pointer via ioread32be().
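
To make the failure mode concrete, here is a minimal user-space model of the guard. It is only a sketch: the struct layouts and the names fman_model, fman_cfg_model, is_init_done_model(), irq_would_touch_regs() and irq_would_deref_null() are hypothetical stand-ins, and the only behaviour taken from the review is that a NULL cfg is treated as "init done".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real layout. */
struct fman_cfg_model { int unused; };

struct fman_model {
	struct fman_cfg_model *cfg;	/* NULL right after a zeroed allocation */
	void *fpm_regs;			/* NULL until the registers are set up */
};

/* Mirrors the behaviour quoted in the review: NULL cfg reads as "done". */
static bool is_init_done_model(const struct fman_cfg_model *cfg)
{
	if (!cfg)
		return true;
	return false;
}

/* True when the modelled handler would go past the early return. */
static bool irq_would_touch_regs(const struct fman_model *fman)
{
	assert(fman);
	if (!is_init_done_model(fman->cfg))
		return false;	/* handler would return IRQ_NONE */
	return true;		/* handler would read through fman->fpm_regs */
}

/* True when a shared interrupt at this point would dereference NULL. */
static bool irq_would_deref_null(const struct fman_model *fman)
{
	return irq_would_touch_regs(fman) && fman->fpm_regs == NULL;
}
```

In this model, a freshly zeroed object is the only state in which the guard lets the handler through while fpm_regs is still NULL.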

However, this is a pre-existing issue unrelated to the UAF fix in this
patch. The window is very short -- between devm_request_irq() and the
completion of fman_init() -- and would require another device on the
same shared IRQ line to fire an interrupt during that interval.

If this should be addressed, I will send a separate patch for it.
The current patch is focused solely on the post-IRQ-registration UAF
on error paths.

Please let me know if you would like me to handle this separately.

Best regards,
ZhaoJinming



>This is an AI-generated review of your patch. The human sending this
>email has considered the AI review valid, or at least plausible.
>Full review at: https://sashiko.dev
>---
>net: fman: fix use-after-free on IRQF_SHARED handler after probe failure
>
>This patch fixes a use-after-free in the error path of read_dts_node(). It
>manually releases the registered IRQ handlers before freeing the fman
>structure, preventing the devres cleanup from leaving a registered IRQ
>pointing to freed memory.
>
>> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
>> index 013273a2de32f..ba2338da0cea1 100644
>> --- a/drivers/net/ethernet/freescale/fman/fman.c
>> +++ b/drivers/net/ethernet/freescale/fman/fman.c
>[ ... ]
>> @@ -2786,7 +2790,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
>>  		if (err < 0) {
>>  			dev_err(&of_dev->dev, "%s: irq %d allocation failed (error = %d)\n",
>>  				__func__, fman->dts_params.err_irq, err);
>> -			goto fman_free;
>> +			goto free_main_irq;
>
>[Severity: High]
>This is a pre-existing issue, but since we are modifying the IRQ error
>handling here, could the shared interrupt handler crash if it fires during
>this setup phase?
>
>Just before this snippet in read_dts_node(), devm_request_irq() is called
>with IRQF_SHARED while fman is still partially initialized. Since fman was
>allocated with kzalloc_obj(), fman->cfg and fman->fpm_regs are NULL.
>
>If a shared interrupt fires immediately, fman_irq() is called:
>
>drivers/net/ethernet/freescale/fman/fman.c:fman_irq() {
>        ...
>        if (!is_init_done(fman->cfg))
>                return IRQ_NONE;
>
>        fpm_rg = fman->fpm_regs;
>        pending = ioread32be(&fpm_rg->fm_npi);
>        ...
>}
>
>Since fman->cfg is NULL, is_init_done() behaves like this:
>
>drivers/net/ethernet/freescale/fman/fman.c:is_init_done() {
>        if (!cfg)
>                return true;
>        ...
>}
>
>Would this cause fman_irq() to bypass the early return, proceed with a NULL
>fpm_rg pointer, and dereference it during the ioread32be() call?
>
>>  		}
>>  	}
>


* [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: xuanqiang.luo @ 2026-06-26  9:38 UTC
  To: Eric Dumazet, Neal Cardwell, netdev
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, luoxuanqiang

From: luoxuanqiang <luoxuanqiang@kylinos.cn>

IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
range on a per-socket basis.  __inet_hash_connect() already treats such a
range as an explicit application partition and scans it with step 1 [1].

Do the same in inet_csk_find_open_port(): when a socket-local range is set,
walk the whole selected range instead of first splitting it by parity.
Keep the existing step-2 parity behavior for sockets using the netns range,
so the default bind/connect separation remains unchanged.
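
As an illustration only, the probe order described above can be modelled in user space. The function scan_order() and its parameters are hypothetical; 'offset' stands in for the random starting offset, and the half-range and reserved-port handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the order in which ports in [low, high) would be probed.
 * Returns the number of ports written to 'out' (at most 'cap').
 */
static size_t scan_order(uint32_t low, uint32_t high, uint32_t offset,
			 bool local_ports, uint32_t *out, size_t cap)
{
	uint32_t step = local_ports ? 1 : 2;
	uint32_t remaining, i, port;
	size_t n = 0;

	assert(out && high > low);
	remaining = high - low;
	if (!local_ports && remaining > 1)
		remaining &= ~1U;	/* even count for the two parity passes */
	assert(offset < remaining);
	if (!local_ports)
		offset |= 1U;		/* start on the parity opposite to low */

again:
	port = low + offset;
	for (i = 0; i < remaining; i += step, port += step) {
		if (port >= high)
			port -= remaining;	/* wrap inside the range */
		if (n < cap)
			out[n++] = port;
	}
	if (!local_ports) {
		offset--;
		if (!(offset & 1))
			goto again;	/* second pass: the other parity */
	}
	return n;
}
```

With low = 100, high = 104 and offset = 0, the socket-local case walks 100, 101, 102, 103 in a single pass, while the netns case walks 101, 103 and then 100, 102.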

[1] https://lore.kernel.org/r/20231214192939.1962891-3-edumazet@google.com

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: luoxuanqiang <luoxuanqiang@kylinos.cn>
---
 net/ipv4/inet_connection_sock.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 56902bba54838..ad8af70c92ca3 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -323,13 +323,16 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 	struct inet_bind2_bucket *tb2;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
+	bool local_ports;
 	bool relax = false;
+	int step;
 
 	l3mdev = inet_sk_bound_l3mdev(sk);
 ports_exhausted:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
-	inet_sk_get_local_port_range(sk, &low, &high);
+	local_ports = inet_sk_get_local_port_range(sk, &low, &high);
+	step = local_ports ? 1 : 2;
 	high++; /* [32768, 60999] -> [32768, 61000[ */
 	if (high - low < 4)
 		attempt_half = 0;
@@ -342,18 +345,19 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 			low = half;
 	}
 	remaining = high - low;
-	if (likely(remaining > 1))
+	if (!local_ports && remaining > 1)
 		remaining &= ~1U;
 
 	offset = get_random_u32_below(remaining);
 	/* __inet_hash_connect() favors ports having @low parity
 	 * We do the opposite to not pollute connect() users.
 	 */
-	offset |= 1U;
+	if (!local_ports)
+		offset |= 1U;
 
 other_parity_scan:
 	port = low + offset;
-	for (i = 0; i < remaining; i += 2, port += 2) {
+	for (i = 0; i < remaining; i += step, port += step) {
 		if (unlikely(port >= high))
 			port -= remaining;
 		if (inet_is_local_reserved_port(net, port))
@@ -384,9 +388,11 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 		cond_resched();
 	}
 
-	offset--;
-	if (!(offset & 1))
-		goto other_parity_scan;
+	if (!local_ports) {
+		offset--;
+		if (!(offset & 1))
+			goto other_parity_scan;
+	}
 
 	if (attempt_half == 1) {
 		/* OK we now try the upper half of the range */
-- 
2.43.0



* Re: [PATCH v2] netfilter: nf_log: validate MAC header was set before dumping it
From: Alexander Martyniuk @ 2026-06-26  9:38 UTC
  To: gregkh
  Cc: alexevgmart, bestswngs, coreteam, davem, fw, kaber, kadlec, kuba,
	kuznet, linux-kernel, netdev, netfilter-devel, pablo, sashal,
	stable, xmei5, yoshfuji
In-Reply-To: <2026062658-pregame-buggy-ccbc@gregkh>

> What kernel(s) is this for?

5.10


* [PATCH net v2] net: libwx: fix VMDQ mask for 1-queue mode
From: Jiawen Wu @ 2026-06-26  9:25 UTC
  To: netdev
  Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jacob Keller,
	Kees Cook, Jiawen Wu, Larysa Zaremba

In wx_set_vmdq_queues(), the VMDQ mask was not set for devices that do not
support WX_FLAG_MULTI_64_FUNC, i.e., NGBE devices. A mask of 0 causes
__ALIGN_MASK(1, ~vmdq->mask) to return 0, which incorrectly sets
q_per_pool to 0 in wx_write_qde().

Set the VMDQ 1-queue mask to 0x7F, which ensures that __ALIGN_MASK(1,
~0x7F) correctly evaluates to 1.
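
The arithmetic can be checked with a small user-space sketch. ALIGN_MASK_MODEL() repeats the usual ((x) + (mask)) & ~(mask) definition of __ALIGN_MASK(), and q_per_pool_model() is a hypothetical helper that evaluates the expression from the text in 32-bit unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's __ALIGN_MASK(x, mask). */
#define ALIGN_MASK_MODEL(x, mask)	(((x) + (mask)) & ~(mask))

/* Evaluate __ALIGN_MASK(1, ~mask) for a given VMDQ mask. */
static uint32_t q_per_pool_model(uint16_t vmdq_mask)
{
	uint32_t inv = ~(uint32_t)vmdq_mask;

	/* (1 + inv) wraps modulo 2^32, then only the mask bits are kept. */
	return ALIGN_MASK_MODEL((uint32_t)1, inv);
}
```

For the masks in wx_type.h this gives 4 for 0x7C, 2 for 0x7E and 1 for 0x7F, while an unset mask of 0 gives 0.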

Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
v2: Fix the commit message.
---
 drivers/net/ethernet/wangxun/libwx/wx_lib.c  | 1 +
 drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index d042567b8128..814d88d2aee4 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -1802,6 +1802,7 @@ static bool wx_set_vmdq_queues(struct wx *wx)
 			rss_i = 4;
 		}
 	} else {
+		vmdq_m = WX_VMDQ_1Q_MASK;
 		/* double check we are limited to maximum pools */
 		vmdq_i = min_t(u16, 8, vmdq_i);
 
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index c7befe4cdfe9..65e3e55db1cf 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -486,6 +486,7 @@ enum WX_MSCA_CMD_value {
 
 #define WX_VMDQ_4Q_MASK              0x7C
 #define WX_VMDQ_2Q_MASK              0x7E
+#define WX_VMDQ_1Q_MASK              0x7F
 
 /****************** Manageablility Host Interface defines ********************/
 #define WX_HI_MAX_BLOCK_BYTE_LENGTH  256 /* Num of bytes in range */
-- 
2.51.0



* Re: [PATCH] xsk: fix memory corruptions in net/core/xdp.c
From: Clement Lecigne @ 2026-06-26  9:12 UTC
  To: Alexander Lobakin
  Cc: edumazet, netdev, bpf, linux-kernel, kuba, sdf, horms,
	john.fastabend, ast, daniel
In-Reply-To: <0922ce5d-48d8-44e7-983c-e547f3126ef4@intel.com>

Thanks, Olek!

I submitted a v2 of the patch directly as a reply to this thread by
mistake. Do you still want me to post this new version separately?

On Thu, Jun 25, 2026 at 5:14 PM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Clement Lecigne <clecigne@google.com>
> Date: Wed, 24 Jun 2026 08:41:28 +0000
>
> > From: Clément Lecigne <clecigne@google.com>
> >
> > Commit 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > introduced a vulnerability in the handling of XDP_PASS for AF_XDP zero-copy
> > frames.
> >
> > Note: Currently, this specific AF_XDP zero-copy conversion path is only
> > reachable from the drivers/net/ethernet/intel/ice driver.
>
> idpf uses this, too (every driver based on libeth_xdp in general,
> currently these two).

Done.

>
> >
> > When building an skb, xdp_build_skb_from_zc() uses the chunk size
> > (xdp->frame_sz) for the allocation. However, napi_build_skb() automatically
> > reserves space at the end of the allocation for the skb_shared_info
> > structure.
> >
> > Most high performance UMEM applications use 4K chunks, where the
> > corruption cannot happen. However, if the UMEM is configured with 2KB
> > chunks (a very common configuration to maximize packet density in memory),
> > a standard 1500 MTU packet will trigger the corruption because the required
> > space exceeds the 2048 byte chunk size:
> >
> > Headroom (256) + Packet (1514) + skb_shared_info (320) = 2090 bytes
> >
> > Because 2090 bytes > 2048 bytes and __skb_put() does not perform bounds
> > checking, the memcpy() writes past the available linear data area and
> > corrupts the skb_shared_info structure. This can lead to arbitrary code
> > execution if pointers like destructor_arg are overwritten.
> >
> > Additionally, in xdp_copy_frags_from_zc(), the allocation size is set
> > strictly to the fragment size (len), but the subsequent memcpy() uses
> > LARGEST_ALIGN(len). This mismatch results in an out-of-bounds write of
> > up to 7 bytes, which triggers KASAN warnings and is unsafe despite typical
> > page pool allocator padding.
> >
> > Fix the skb allocation in xdp_build_skb_from_zc() by dynamically
> > calculating the exact truesize required: the sum of the headroom, the
> > packet length, and the skb_shared_info overhead, properly aligned via
> > SKB_DATA_ALIGN.
> >
> > Fix the out-of-bounds write in xdp_copy_frags_from_zc() by rounding up
> > the allocation request using LARGEST_ALIGN(len) to match the copy
> > operation.
> >
> > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> > CC: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Clément Lecigne <clecigne@google.com>
> > ---
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 9890a30584ba..f36d1fb875ab 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -699,7 +699,7 @@ static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,
> >       for (u32 i = 0; i < nr_frags; i++) {
> >               const skb_frag_t *frag = &xinfo->frags[i];
> >               u32 len = skb_frag_size(frag);
> > -             u32 offset, truesize = len;
> > +             u32 offset, truesize = LARGEST_ALIGN(len);
>
> I think you need to re-sort this to keep RCT, now that the truesize
> initialization is way longer than it was.

Done.

>
>                 const skb_frag_t *frag = &xinfo->frags[i];
>                 u32 offset, len = skb_frag_size(frag);
>                 u32 truesize = LARGEST_ALIGN(len);
>                 struct page *page;
>
> >               struct page *page;
> >
> >               page = page_pool_dev_alloc(pp, &offset, &truesize);
>
> BTW usually LARGEST_ALIGN() aligns to 16, I've never seen a bigger one.
> IIRC Page Pool never returns a truesize aligned to a smaller value. But
> if you're really able to trigger this, it probably does?
>
> > @@ -740,7 +740,9 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
> >  {
> >       const struct xdp_rxq_info *rxq = xdp->rxq;
> >       u32 len = xdp->data_end - xdp->data_meta;
> > -     u32 truesize = xdp->frame_sz;
> > +     u32 headroom = xdp->data_meta - xdp->data_hard_start;
> > +     u32 truesize = SKB_DATA_ALIGN(headroom + len) +
> > +                    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> Ah now I get it: xdp->frame_sz doesn't account the shinfo for
> single-buffer frames, only for multi-buffer ones. The fix looks correct,
> but I'd use SKB_HEAD_ALIGN() since it does exactly what you're
> open-coding here and sort the declarations:

Good idea, done.


>
> {
>         u32 hr = xdp->data_meta - xdp->data_hard_start;
>         const struct xdp_rxq_info *rxq = xdp->rxq;
>         u32 len = xdp->data_end - xdp->data_meta;
>         u32 truesize = SKB_HEAD_ALIGN(hr + len);
>         struct sk_buff *skb = NULL;
>         struct page_pool *pp;
>         int metalen;
>         void *data;
>
>         if (!IS_ENABLED(CONFIG_PAGE_POOL))
>                 return NULL;
>
>         ...
>
> >       struct sk_buff *skb = NULL;
> >       struct page_pool *pp;
> >       int metalen;
> > @@ -762,7 +764,7 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
> >       }
> >
> >       skb_mark_for_recycle(skb);
> > -     skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> > +     skb_reserve(skb, headroom);
> >
> >       memcpy(__skb_put(skb, len), xdp->data_meta, LARGEST_ALIGN(len));
>
> Thanks,
> Olek

Thanks,
-clem


* [PATCH net v2] nfc: nci: fix uninit-value in the RF discover/activated NTF handlers
From: Samuel Page @ 2026-06-26  9:03 UTC
  To: David Heidelberg
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, oe-linux-nfc, netdev, linux-kernel, stable

nci_rf_discover_ntf_packet() and nci_rf_intf_activated_ntf_packet() each
parse a notification into an on-stack struct (nci_rf_discover_ntf /
nci_rf_intf_activated_ntf) that is not initialised. The RF
technology-specific parameters are only extracted when
rf_tech_specific_params_len is non-zero, so a notification that reports a
zero length leaves the rf_tech_specific_params union uninitialised - and
both handlers then pass it to nci_add_new_protocol(), which reads it:

 - discover:  nci_add_new_target() -> nci_add_new_protocol();
 - activated: nci_target_auto_activated() -> nci_add_new_protocol().

nci_add_new_protocol() uses nfca_poll->nfcid1_len as both a branch
condition and a memcpy() length and copies nfcid1/sens_res/sel_res into
ndev->targets, which is later exposed to user space via NFC_CMD_GET_TARGET.

  BUG: KMSAN: uninit-value in nci_add_new_protocol+0x624/0x6c0
   nci_add_new_protocol+0x624/0x6c0
   nci_ntf_packet+0x25b2/0x3c30
   nci_rx_work+0x318/0x5d0
   process_scheduled_works+0x84b/0x17a0
   worker_thread+0xc10/0x11b0
   kthread+0x376/0x500
  Local variable ntf.i created at:
   nci_ntf_packet+0xbc2/0x3c30

Zero-initialise both on-stack notifications so the union reads back as
zero when no technology-specific parameters are present.
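
A reduced user-space sketch of the pattern, under stated assumptions: struct ntf_model, parse_ntf() and handle_ntf() are hypothetical and much smaller than the real NCI structures, and the sketch uses "= {0}" on a plain struct for portability where the patch uses "= {}".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down notification; not the real NCI layout. */
struct ntf_model {
	uint8_t params_len;	/* analogue of rf_tech_specific_params_len */
	struct {
		uint8_t nfcid1_len;
		uint8_t nfcid1[10];
	} params;
};

/*
 * Parse one length byte followed by optional parameters. The parameters
 * are only written when the reported length is non-zero.
 */
static void parse_ntf(struct ntf_model *ntf, const uint8_t *data, size_t size)
{
	size_t n;

	assert(ntf && data && size >= 1);
	ntf->params_len = data[0];
	if (ntf->params_len == 0 || size < 2)
		return;		/* params stay exactly as the caller left them */

	n = data[1];
	if (n > sizeof(ntf->params.nfcid1))
		n = sizeof(ntf->params.nfcid1);
	if (n > size - 2)
		n = size - 2;
	ntf->params.nfcid1_len = (uint8_t)n;
	memcpy(ntf->params.nfcid1, data + 2, n);
}

/* Returns the nfcid1_len a later consumer would read. */
static uint8_t handle_ntf(const uint8_t *data, size_t size)
{
	struct ntf_model ntf = {0};	/* start from an all-zero notification */

	parse_ntf(&ntf, data, size);
	return ntf.params.nfcid1_len;
}
```

Without the initialiser, the zero-length case would return whatever happened to be on the stack; with it, the consumer sees a length of 0.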

Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support")
Fixes: e8c0dacd9836 ("NFC: Update names and structs to NCI spec 1.0 d18")
Link: https://lore.kernel.org/netdev/20260623172109.1105965-2-horms@kernel.org/
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
v2: Drop the inaccurate activation_params / NFC_ATTR_TARGET_ATS scenario
    from the commit message. No code change; the ntf = {} fix is unchanged.

 net/nfc/nci/ntf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index c96512bb8653..274d9a4202c9 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -440,7 +440,7 @@ void nci_clear_target_list(struct nci_dev *ndev)
 static int nci_rf_discover_ntf_packet(struct nci_dev *ndev,
 				      const struct sk_buff *skb)
 {
-	struct nci_rf_discover_ntf ntf;
+	struct nci_rf_discover_ntf ntf = {};
 	const __u8 *data;
 	bool add_target = true;
 
@@ -688,7 +688,7 @@ static int nci_rf_intf_activated_ntf_packet(struct nci_dev *ndev,
 					    const struct sk_buff *skb)
 {
 	struct nci_conn_info *conn_info;
-	struct nci_rf_intf_activated_ntf ntf;
+	struct nci_rf_intf_activated_ntf ntf = {};
 	const __u8 *data;
 	int err = NCI_STATUS_OK;
 

base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
-- 
2.54.0



* [PATCH net-next] ipv4: fib: fix route re-dump in inet_dump_fib() on multi-batch dump
From: Pengfei Zhang @ 2026-06-26  8:56 UTC
  To: dsahern, idosch
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
	chenzhangqi, baohua, zhangpengfei16, Pengfei Zhang

inet_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain.  Between batches, a concurrent
fib_new_table() can insert a new table at the chain head, shifting
all existing entries.  On resume the saved index lands on a different
table, causing already-dumped tables to be re-dumped and the
originally suspended table to restart from the beginning.

Fix by storing tb->tb_id in cb->args[1] instead of a positional
index, mirroring the fix applied to inet6_dump_fib().

Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
Consider a hash slot containing two tables [A(pos=0), B(pos=1)] where
B is large enough to require multiple batches.  On the first batch, B
suspends mid-walk and the loop saves:

  cb->args[1] = e;   /* e=1, position of B in the chain */

The lock is then released.  At this point a concurrent fib_new_table()
inserts table C at the chain head via hlist_add_head_rcu(), making the
chain [C(pos=0), A(pos=1), B(pos=2)].

On the next batch, inet_dump_fib() resumes with s_e=1 and iterates:

  s_e = cb->args[1];   /* s_e = 1 */
  hlist_for_each_entry_rcu(tb, head, tb_hlist) {
      if (e < s_e)     /* skip C at pos=0 */
          goto next;
      /* e=1: tb now points to A, not B */
      if (dumped)
          memset(...);  /* resets B's suspended progress */
      fib_table_dump(tb, ...);   /* re-dumps A from scratch */
      dumped = 1;
      /* e=2: tb now points to B */
      fib_table_dump(tb, ...);   /* re-dumps B from beginning */
  }

Routes from A are dumped twice, and the portion of B that was already
dumped in the first batch is dumped again.
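
The difference between the two resume strategies can be shown with a small user-space model of one hash chain. This is a sketch, not the kernel code: the chain is a plain array of table ids with the head at index 0, the helper names are hypothetical, and id 0 is assumed to mean "no table".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Resume by saved position: returns the id of the table the walk lands
 * on, or 0 if the position is past the end of the chain.
 */
static uint32_t resume_by_pos(const uint32_t *chain, size_t n, size_t saved_pos)
{
	assert(chain);
	return saved_pos < n ? chain[saved_pos] : 0;
}

/*
 * Resume by saved table id: skip entries until the id matches. Returns 0
 * if the table is no longer in the chain.
 */
static uint32_t resume_by_id(const uint32_t *chain, size_t n, uint32_t saved_id)
{
	size_t i;

	assert(chain);
	for (i = 0; i < n; i++) {
		if (saved_id && chain[i] != saved_id)
			continue;
		return chain[i];
	}
	return 0;
}
```

With A = 10, B = 20 and C = 30, the chain [A, B] becomes [C, A, B] after the insertion. Position 1 then resolves to A, while the saved id 20 still resolves to B.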

 net/ipv4/fib_frontend.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 42212970d..65fa245af 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1019,10 +1019,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 		.dump_routes = true,
 		.dump_exceptions = true,
 	};
-	unsigned int e = 0, s_e, h, s_h;
 	struct hlist_head *head;
 	int dumped = 0, err = 0;
+	unsigned int h, s_h;
 	struct fib_table *tb;
+	u32 s_id;
 
 	rcu_read_lock();
 	if (cb->strict_check) {
@@ -1054,29 +1055,28 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
 	err = 0;
-	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv4.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb_id != s_id)
+				continue;
+
+			s_id = 0;
 			if (dumped)
 				memset(&cb->args[2], 0, sizeof(cb->args) -
 						 2 * sizeof(cb->args[0]));
+			cb->args[1] = tb->tb_id;
 			err = fib_table_dump(tb, skb, cb, &filter);
 			if (err < 0)
 				goto out;
 			dumped = 1;
-next:
-			e++;
 		}
 	}
 out:
 
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
-- 
2.34.1



* [PATCH v2] xsk: fix memory corruptions in net/core/xdp.c
From: Clement Lecigne @ 2026-06-26  8:52 UTC
  To: aleksander.lobakin, edumazet, netdev
  Cc: clecigne, bpf, linux-kernel, kuba, sdf, horms, john.fastabend,
	ast, daniel
In-Reply-To: <0922ce5d-48d8-44e7-983c-e547f3126ef4@intel.com>

From: Clément Lecigne <clecigne@google.com>

Commit 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
introduced a vulnerability in the handling of XDP_PASS for AF_XDP zero-copy
frames.

Note: Currently, this specific AF_XDP zero-copy conversion path is only
reachable from the drivers/net/ethernet/intel/ice and
drivers/net/ethernet/intel/idpf drivers.

When building an skb, xdp_build_skb_from_zc() uses the chunk size
(xdp->frame_sz) for the allocation. However, napi_build_skb() automatically
reserves space at the end of the allocation for the skb_shared_info
structure. 

Most high performance UMEM applications use 4K chunks, where the
corruption cannot happen. However, if the UMEM is configured with 2KB
chunks (a very common configuration to maximize packet density in memory),
a standard 1500 MTU packet will trigger the corruption because the required
space exceeds the 2048 byte chunk size:

Headroom (256) + Packet (1514) + skb_shared_info (320) = 2090 bytes

Because 2090 bytes > 2048 bytes and __skb_put() does not perform bounds
checking, the memcpy() writes past the available linear data area and
corrupts the skb_shared_info structure. This can lead to arbitrary code
execution if pointers like destructor_arg are overwritten.
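
The size check above can be restated as a short user-space sketch. The constants are the figures quoted in this message, the helper names are hypothetical, and alignment padding is deliberately ignored, so the real requirement can be somewhat larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Figures quoted in the commit message; real values depend on the build. */
enum {
	HEADROOM_BYTES = 256,
	PACKET_BYTES   = 1514,
	SHINFO_BYTES   = 320,
};

/* Bytes needed for the headroom, the packet and the trailing shared info. */
static uint32_t required_bytes(uint32_t headroom, uint32_t len, uint32_t shinfo)
{
	return headroom + len + shinfo;
}

/* True when a buffer of chunk_size bytes cannot hold the whole layout. */
static bool overflows_chunk(uint32_t chunk_size, uint32_t headroom,
			    uint32_t len, uint32_t shinfo)
{
	assert(chunk_size > 0);
	return required_bytes(headroom, len, shinfo) > chunk_size;
}
```

With these figures a 2048-byte chunk is too small by 42 bytes, while a 4096-byte chunk has room to spare.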

Additionally, in xdp_copy_frags_from_zc(), the allocation size is set
strictly to the fragment size (len), but the subsequent memcpy() uses
LARGEST_ALIGN(len). This mismatch results in an out-of-bounds write of
up to 7 bytes, which triggers KASAN warnings and is unsafe despite typical
page pool allocator padding.

Fix the skb allocation in xdp_build_skb_from_zc() by dynamically
calculating the exact truesize required using SKB_HEAD_ALIGN() to
properly account for the headroom, the packet length, and the
skb_shared_info overhead.

Fix the out-of-bounds write in xdp_copy_frags_from_zc() by rounding up
the allocation request using LARGEST_ALIGN(len) to match the copy
operation.

Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Clément Lecigne <clecigne@google.com>
---
Changes since v1:
 - Used SKB_HEAD_ALIGN to properly calculate the required allocation size
   including the skb_shared_info overhead.
 - Re-ordered variable declarations.

---
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 9890a30584ba..52546746378a 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -698,8 +698,8 @@ static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,

 	for (u32 i = 0; i < nr_frags; i++) {
 		const skb_frag_t *frag = &xinfo->frags[i];
-		u32 len = skb_frag_size(frag);
-		u32 offset, truesize = len;
+		u32 offset, len = skb_frag_size(frag);
+		u32 truesize = LARGEST_ALIGN(len);
 		struct page *page;

 		page = page_pool_dev_alloc(pp, &offset, &truesize);
@@ -738,9 +738,10 @@ static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,
  */
 struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
 {
+	u32 headroom = xdp->data_meta - xdp->data_hard_start;
 	const struct xdp_rxq_info *rxq = xdp->rxq;
 	u32 len = xdp->data_end - xdp->data_meta;
-	u32 truesize = xdp->frame_sz;
+	u32 truesize = SKB_HEAD_ALIGN(headroom + len);
 	struct sk_buff *skb = NULL;
 	struct page_pool *pp;
 	int metalen;
@@ -762,7 +763,7 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
 	}

 	skb_mark_for_recycle(skb);
-	skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
+	skb_reserve(skb, headroom);

 	memcpy(__skb_put(skb, len), xdp->data_meta, LARGEST_ALIGN(len));


* Re: [patch 09/24] timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()
From: Thomas Weißschuh @ 2026-06-26  8:48 UTC
  To: Thomas Gleixner
  Cc: LKML, David Woodhouse, Miroslav Lichvar, John Stultz,
	Stephen Boyd, Anna-Maria Behnsen, Frederic Weisbecker,
	Arthur Kiyanovski, Rodolfo Giometti, Vincent Donnefort,
	Marc Zyngier, Oliver Upton, kvmarm, Oliver Upton, Richard Cochran,
	netdev, Takashi Iwai, Miri Korenblit, Johannes Berg, Jacob Keller,
	Tony Nguyen, Saeed Mahameed, Peter Hilber, Michael S. Tsirkin,
	virtualization, linux-wireless, linux-sound
In-Reply-To: <20260526171223.374814973@kernel.org>

On Tue, May 26, 2026 at 07:14:13PM +0200, Thomas Gleixner wrote:
(...)

>  static inline void tk_update_aux_offs(struct timekeeper *tk, ktime_t offs)
> @@ -1218,6 +1223,12 @@ bool ktime_get_snapshot_id(struct system
>  		tkd = &tk_core;
>  		offs = &tk_core.timekeeper.offs_boot;
>  		break;
> +	case CLOCK_AUX ... CLOCK_AUX_LAST:
> +		tkd = aux_get_tk_data(clock_id);
> +		if (!tkd)
> +			return false;
> +		offs = &tkd->timekeeper.offs_aux;
> +		break;

'tkd' is also used to compute 'monoraw'. However, 'tkr_raw' and 'tkr_mono'
are the same for auxiliary clocks, so this will compute a wrong 'monoraw'.
Instead, 'monoraw' should be computed based on 'tk_core', which then also
requires the sequence locking of 'tk_core'.

As you know I have a series which unifies the locking between the
different timekeepers. Maybe we revert this patch for 7.2 and I send
a fixed variant including the prerequisites for 7.3.

(The same goes for get_device_system_crosststamp())

>  	default:
>  		WARN_ON_ONCE(1);
>  		return false;
> @@ -1228,6 +1239,10 @@ bool ktime_get_snapshot_id(struct system
>  	do {
>  		seq = read_seqcount_begin(&tkd->seq);
>  
> +		/* Aux clocks can be invalid */
> +		if (!tk->clock_valid)
> +			return false;
> +
>  		now = tk_clock_read(&tk->tkr_mono);
>  		systime_snapshot->cs_id = tk->tkr_mono.clock->id;
>  		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
> 


* [PATCH v2] net: gro: fix double aggregation of flush-marked skbs
From: Shiming Cheng @ 2026-06-26  8:44 UTC
  To: netdev, davem, edumazet, kuba, pabeni, horms, matthias.bgg,
	angelogioacchino.delregno, willemb, imv4bel, alice,
	eilaimemedsnaimel, sd
  Cc: lena.wang, stable, Shiming Cheng

The new skb_gro_receive_list() function is missing a critical safety check
present in the legacy skb_gro_receive() path. Specifically, it does not
validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.

This allows already-GRO'd packets with existing frag_list to be
re-aggregated into a new GRO session, corrupting the frag_list chain
structure. When skb_segment() attempts to unpack these malformed packets,
it encounters invalid state and triggers a kernel panic.

Scenario (Tethering/Device forwarding):
  1. Driver: generates aggregated packet P1 via LRO, with a frag_list
  2. Dev A: receives the aggregated fraglist packet with the flush flag set
  3. Dev A: re-enters GRO and skb_gro_receive_list() is called
  4. The missing flush check allows re-aggregation despite the flush flag
  5. The frag_list chain becomes corrupted (loops or dangling refs)
  6. Dev B: the TX path calls skb_segment(), which crashes on the corrupted
     frag_list

Root cause in skb_segment():
  The check at line ~4891:
    if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
        (skb_headlen(list_skb) == len || sg)) {

  When the frag_list has been corrupted by double aggregation, list_skb can
  be a NULL pointer taken from skb->next, so skb_headlen(list_skb)
  dereferences a NULL or corrupted pointer.

Call Trace:
 skb_headlen(NULL skb)
 skb_segment
 tcp_gso_segment
 tcp4_gso_segment
 inet_gso_segment
 skb_mac_gso_segment
 __skb_gso_segment
 skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 qdisc_restart
 __qdisc_run
 qdisc_run
 net_tx_action

Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
skb_gro_receive_list(), matching the defensive programming pattern of
skb_gro_receive().
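
The intended early-return condition can be sketched in user space as follows. struct skb_model and may_aggregate() are hypothetical; the 'flush' field stands in for NAPI_GRO_CB(skb)->flush, and nothing else about the real skb is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two inputs the check looks at. */
struct skb_model {
	uint32_t len;
	bool flush;	/* stands in for NAPI_GRO_CB(skb)->flush */
};

/* Returns 0 when skb may be appended to p, -E2BIG otherwise. */
static int may_aggregate(const struct skb_model *p, const struct skb_model *skb)
{
	assert(p && skb);
	if (p->len + skb->len >= 65536 || skb->flush)
		return -E2BIG;
	return 0;
}
```

A flush-marked packet is refused here even when the combined length is well under the 64 KiB limit.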

Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
Cc: stable@vger.kernel.org
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
---
 net/core/gro.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/gro.c b/net/core/gro.c
index 35f2f708f010..076247c1e662 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)

 int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
 {
-	if (unlikely(p->len + skb->len >= 65536))
+	if (unlikely(p->len + skb->len >= 65536 ||
+		     NAPI_GRO_CB(skb)->flush))
 		return -E2BIG;

 	if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
-- 
2.45.2


* [PATCH net] e1000e: fix IRQ leak when request_irq() fails in e1000_request_msix()
From: Jiayuan Chen @ 2026-06-26  8:39 UTC
  To: netdev
  Cc: Jiayuan Chen, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jeff Garzik, Bruce Allan, intel-wired-lan, linux-kernel

An internal syzbot instance reported the warning below.

comedi (comedi_parport) lets userspace request_irq() an arbitrary IRQ
number and can thus grab one of e1000e's MSI-X vectors. When
e1000_request_msix() then fails partway through, it returns without
freeing the vectors it has already requested; pci_disable_msix() later
tears those descriptors down while their irqaction is still attached,
leaking the /proc/irq entry.

Free the already requested IRQs on the error path.
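
The general unwind pattern looks roughly like the user-space sketch below. It is not the driver change itself: struct irq_model and the model_*() helpers are hypothetical bookkeeping that stands in for the IRQ core, and the vector count is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NVEC 3

/* Hypothetical bookkeeping that stands in for the IRQ core. */
struct irq_model {
	bool requested[MODEL_NVEC];
	int fail_at;	/* vector index whose request fails, or -1 */
};

static int model_request_irq(struct irq_model *m, int vector)
{
	if (vector == m->fail_at)
		return -1;
	m->requested[vector] = true;
	return 0;
}

static void model_free_irq(struct irq_model *m, int vector)
{
	m->requested[vector] = false;
}

/* Request every vector; on failure release the ones already taken. */
static int request_all(struct irq_model *m)
{
	int vector, err = 0;

	assert(m);
	for (vector = 0; vector < MODEL_NVEC; vector++) {
		err = model_request_irq(m, vector);
		if (err)
			goto err_free;
	}
	return 0;

err_free:
	while (--vector >= 0)
		model_free_irq(m, vector);
	return err;
}

/* Number of vectors still held after a call. */
static int held(const struct irq_model *m)
{
	int i, n = 0;

	for (i = 0; i < MODEL_NVEC; i++)
		n += m->requested[i];
	return n;
}
```

When the last request fails, the two earlier vectors are released before the error is returned, so no handler stays attached.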

genirq: Flags mismatch irq 28. 00200000 (eth1-tx-0) vs. 00200000 (comedi_parport)

remove_proc_entry: removing non-empty directory 'irq/27', leaking at least 'eth1-rx-0'
WARNING: fs/proc/generic.c:742 at remove_proc_entry+0x436/0x560, CPU#3: ip/445
Modules linked in:
CPU: 3 UID: 0 PID: 445 Comm: ip Not tainted 7.1.0+ #284 PREEMPT
RIP: 0010:remove_proc_entry (fs/proc/generic.c:742 (discriminator 4))
PKRU: 55555554
Call Trace:
<TASK>
unregister_irq_proc (kernel/irq/proc.c:406)
free_desc (kernel/irq/irqdesc.c:482)
irq_free_descs (kernel/irq/irqdesc.c:874 kernel/irq/irqdesc.c:865)
irq_domain_free_irqs (kernel/irq/irqdomain.c:1917)
msi_domain_free_locked.part.0 (kernel/irq/msi.c:1619 kernel/irq/msi.c:1645)
msi_domain_free_irqs_all_locked (kernel/irq/msi.c:1632)
pci_msi_teardown_msi_irqs (drivers/pci/msi/irqdomain.c:28)
pci_free_msi_irqs (drivers/pci/msi/msi.c:925)
pci_disable_msix (drivers/pci/msi/api.c:200 drivers/pci/msi/api.c:193)
e1000_request_irq (drivers/net/ethernet/intel/e1000e/netdev.c:2028)
e1000e_open (drivers/net/ethernet/intel/e1000e/netdev.c:4681)
__dev_open (net/core/dev.c:1702)
netif_change_flags (net/core/dev.c:9806)
do_setlink.isra.0 (net/core/rtnetlink.c:3207 (discriminator 1))
rtnetlink_rcv_msg (net/core/rtnetlink.c:7068)
netlink_rcv_skb (net/netlink/af_netlink.c:2556)

Fixes: 4662e82b2cb4 ("e1000e: add support for new 82574L part")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Assisted-by: Claude:claude-opus-4-8
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 808e5cddd6a9..19b9823c5679 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2099,7 +2099,7 @@ void e1000e_set_interrupt_capability(struct e1000_adapter *adapter)
 static int e1000_request_msix(struct e1000_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	int err = 0, vector = 0;
+	int err = 0, vector = 0, i;
 
 	if (strlen(netdev->name) < (IFNAMSIZ - 5))
 		snprintf(adapter->rx_ring->name,
@@ -2111,7 +2111,7 @@ static int e1000_request_msix(struct e1000_adapter *adapter)
 			  e1000_intr_msix_rx, 0, adapter->rx_ring->name,
 			  netdev);
 	if (err)
-		return err;
+		goto err_free;
 	adapter->rx_ring->itr_register = adapter->hw.hw_addr +
 	    E1000_EITR_82574(vector);
 	adapter->rx_ring->itr_val = adapter->itr;
@@ -2127,7 +2127,7 @@ static int e1000_request_msix(struct e1000_adapter *adapter)
 			  e1000_intr_msix_tx, 0, adapter->tx_ring->name,
 			  netdev);
 	if (err)
-		return err;
+		goto err_free;
 	adapter->tx_ring->itr_register = adapter->hw.hw_addr +
 	    E1000_EITR_82574(vector);
 	adapter->tx_ring->itr_val = adapter->itr;
@@ -2136,11 +2136,16 @@ static int e1000_request_msix(struct e1000_adapter *adapter)
 	err = request_irq(adapter->msix_entries[vector].vector,
 			  e1000_msix_other, 0, netdev->name, netdev);
 	if (err)
-		return err;
+		goto err_free;
 
 	e1000_configure_msix(adapter);
 
 	return 0;
+
+err_free:
+	for (i = vector - 1; i >= 0; i--)
+		free_irq(adapter->msix_entries[i].vector, netdev);
+	return err;
 }
 
 /**
-- 
2.43.0
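
The unwind pattern used by this patch can be illustrated outside the kernel. The following is a minimal Python model, not driver code: FakeIrqCore, request_all and the vector numbers are hypothetical names made up for the illustration. It only shows that when the N-th request fails, the earlier requests are released in reverse order, so no handler stays attached.

```python
class FakeIrqCore:
    """Toy stand-in for the IRQ core: tracks which vectors have a handler."""

    def __init__(self, busy=()):
        # Vectors already claimed by someone else (e.g. another driver).
        self.busy = set(busy)
        self.attached = []

    def request_irq(self, vector):
        if vector in self.busy or vector in self.attached:
            return -16  # -EBUSY
        self.attached.append(vector)
        return 0

    def free_irq(self, vector):
        self.attached.remove(vector)


def request_all(core, vectors):
    """Request every vector; on failure free the ones already requested."""
    for idx, vector in enumerate(vectors):
        err = core.request_irq(vector)
        if err:
            # Mirror of the err_free label: walk back over vectors 0..idx-1.
            for done in reversed(vectors[:idx]):
                core.free_irq(done)
            return err
    return 0


core = FakeIrqCore(busy={28})
result = request_all(core, [27, 28, 29])
leaked = list(core.attached)
```

In this model the second request fails because vector 28 is already taken, and the first vector is released before the error is returned.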


^ permalink raw reply related

* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Maxime Chevallier @ 2026-06-26  8:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni, Simon Horman,
	Russell King, Heiner Kallweit, Jonathan Corbet, Shuah Khan,
	Oleksij Rempel, Vladimir Oltean, Florian Fainelli,
	thomas.petazzoni, netdev, linux-kernel, linux-doc
In-Reply-To: <58f37d6e-973b-4242-be82-0561ccdb1a6f@lunn.ch>


> Sphinx follows Python's object-oriented structure. So you could have a
> class test_ethtool_pause_advertising, with class documentation. And
> then methods within the class which are individual tests.  The
> commented out section would then be method documentation.

Good point, so maybe something along these lines:

 - A class for the test
 - Methods for individual tests
 - For readability, I've written what the internal test helper would look
   like (_adv_test), and what a test would look like without the helper in
   adv_rx_on_tx_on().

I'm already diving into coding, but it helps me a bit in the definition of the
"description" format :)

This is what the class would look like:


class test_ethtool_pause_advertising:
    """Pause advertisement

    Validate that changing pause params through the ETHTOOL_MSG_PAUSE command
    translates to a change in the advertised pause params, and that these
    parameters are correct w.r.t the supported pause params and requested pause
    params.
    
    This exercises the .set_pauseparam() ethtool op for MAC configuration,
    as well as the reconfiguration of the PHY's advertising and negotiation.

    On non-phylink MACs, the MAC should call phy_set_sym_pause() to update the
    PHY's advertising, and restart a negotiation with phy_start_aneg() if
    need be. Failure to do so will result in the wrong advertising parameters.

    On phylink-enabled MACs, phylink deals with the PHY reconfiguration provided
    the MAC driver calls phylink_ethtool_set_pauseparam().

    Failing this test likely means that the PHY driver is not correctly advertising
    pause settings, either due to the MAC not triggering a PHY reconfiguration,
    a misconfiguration of the advertising registers by the PHY, or by
    mis-handling the phydev->advertising bitfield in the PHY driver directly.
    
    The validation is made by looking at the advertised modes locally, as well as
    what the peer's 'lp_advertising' values report.

    cfg -- local device's interface configuration
    peer -- peer device handle
    """

    def _adv_test(cfg, peer, rx, tx, adv, not_adv):
        ret = cfg.run(f"ethtool -A ethX rx {rx} tx {tx} autoneg on")
        ksft_eq(ret, 0)

        linkmodes = cfg.get_advertising()
        if adv:
            ksft_in(adv, linkmodes, f"rx {rx} tx {tx} must advertise {adv}")

        if not_adv:
            ksft_not_in(not_adv, linkmodes, f"rx {rx} tx {tx} must not advertise {not_adv}")

        remote_linkmodes = peer.get_lp_advertising()

        if adv:
            ksft_in(adv, remote_linkmodes, f"PHY does not advertise {adv}")

        if not_adv:
            ksft_not_in(not_adv, remote_linkmodes, f"PHY incorrectly advertises {not_adv}")


    @ksft_ethtool_needs_supported_allof([Pause])
    def adv_rx_on_tx_on(cfg, peer) -> None:
        """Advertising test with rx on tx on

        - run 'ethtool -A ethX rx on tx on autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values does not contain
          "Pause" or contains "Asym_Pause"
        - FAIL if peer's lp_advertising doesn't contain "Pause" or contains
          "Asym_Pause"
        - Succeed otherwise
        """
        ret = cfg.run('ethtool -A ethX rx on tx on autoneg on')
        ksft_eq(ret, 0)

        linkmodes = cfg.get_advertising()
        ksft_in('Pause', linkmodes, "rx on tx on must advertise Pause")
        ksft_not_in('Asym_Pause', linkmodes, "rx on tx on must not advertise Asym_Pause")

        remote_linkmodes = peer.get_lp_advertising()
        ksft_in('Pause', remote_linkmodes, "PHY does not advertise Pause")
        ksft_not_in('Asym_Pause', remote_linkmodes, "PHY incorrectly advertises Asym_Pause")


    @ksft_ethtool_needs_supported_allof([Pause, Asym_Pause])
    def adv_rx_on_tx_off(cfg, peer) -> None:
        """Advertising test with rx on tx off

        - run 'ethtool -A ethX rx on tx off autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values does not contain
          "Pause" and "Asym_Pause"
        - FAIL if peer's lp_advertising doesn't contain "Pause" and "Asym_Pause"
        - Succeed otherwise
        """

        _adv_test(cfg, peer, 'on', 'off', ["Pause", "Asym_Pause"], [])

    @ksft_ethtool_needs_supported_allof([Asym_Pause])
    def adv_rx_off_tx_on(cfg, peer) -> None:
        """Advertising test with rx off tx on

        - run 'ethtool -A ethX rx off tx on autoneg on'
        - FAIL if the return isn't 0
        - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values does not contain
          "Asym_Pause" or contains "Pause"
        - FAIL if peer's lp_advertising doesn't contain "Asym_Pause" or contains
          "Pause"
        - Succeed otherwise
        """

        _adv_test(cfg, peer, 'off', 'on', ["Asym_Pause"], ["Pause"])
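
The three cases above follow one rule: "Pause" is expected when rx is on, and "Asym_Pause" is expected when rx and tx differ. The sketch below derives the adv / not_adv arguments from that rule. It is an illustration only; expected_pause_adv is a hypothetical helper, and the rule is assumed to match what the kernel's pause linkmode helpers do rather than being taken from them.

```python
def expected_pause_adv(rx, tx):
    """Return (must_advertise, must_not_advertise) for an rx/tx pause request."""
    wanted = {"Pause": rx, "Asym_Pause": rx != tx}
    adv = [name for name, on in wanted.items() if on]
    not_adv = [name for name, on in wanted.items() if not on]
    return adv, not_adv


# Build the full table for all four rx/tx combinations.
table = {(rx, tx): expected_pause_adv(rx, tx)
         for rx in (True, False) for tx in (True, False)}
```

With such a helper, each test method could pass the two lists straight to _adv_test() instead of spelling them out by hand.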


Maxime

^ permalink raw reply

* [PATCH net v2 1/1] net: sched: ets: avoid deficit wrap and bound empty dequeue rounds
From: Ren Wei @ 2026-06-26  8:32 UTC (permalink / raw)
  To: netdev
  Cc: jhs, jiri, davem, petrm, yuantan098, yifanwucs, tomapufckgml,
	zcliangcn, bird, bronzed_45_vested, n05ec

From: Wyatt Feng <bronzed_45_vested@icloud.com>

ETS keeps each DRR-style deficit in a u32 and replenishes it with
the configured quantum whenever the head packet is too large. Both
the quantum and qdisc_pkt_len() are user-controlled inputs: a large
quantum can wrap the deficit counter, while a tiny quantum combined
with an inflated qdisc_pkt_len() can force billions of iterations in
softirq context before any packet becomes eligible.

Store the deficit in u64 so replenishment cannot wrap the counter.
This keeps the existing dequeue logic unchanged while fixing the
overflow condition.

Bound one dequeue attempt to at most nbands * 2 ETS rotations, as
suggested in review. This avoids the livelock without adding heavier
logic to the fast path.

Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
changes in v2:
  - Instead of doing a div() in the fast path, simply bound the loop per
    dequeue
  - v1 Link: https://lore.kernel.org/all/20260615103759.2404228-2-n05ec@lzu.edu.cn/


 net/sched/sch_ets.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index cb8cf437ce87..12a156ccb0a6 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -40,7 +40,7 @@ struct ets_class {
 	struct list_head alist; /* In struct ets_sched.active. */
 	struct Qdisc *qdisc;
 	u32 quantum;
-	u32 deficit;
+	u64 deficit;
 	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
 };
@@ -463,6 +463,8 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
 static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
 {
 	struct ets_sched *q = qdisc_priv(sch);
+	unsigned int max_loops = READ_ONCE(q->nbands) * 2;
+	unsigned int loops = 0;
 	struct ets_class *cl;
 	struct sk_buff *skb;
 	unsigned int band;
@@ -499,6 +501,8 @@ static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
 
 		cl->deficit += READ_ONCE(cl->quantum);
 		list_move_tail(&cl->alist, &q->active);
+		if (++loops > max_loops)
+			goto out;
 	}
 out:
 	return NULL;
-- 
2.47.3
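
The two failure modes described in the commit message can be reproduced with plain arithmetic. The sketch below is a simplified Python model, not the kernel code: replenish_u32 and rounds_until_eligible are hypothetical helper names, a single class is assumed, and the numbers are illustrative.

```python
U32_MASK = 0xFFFFFFFF


def replenish_u32(deficit, quantum):
    """Add quantum to a deficit stored in 32 bits, wrapping like a u32."""
    return (deficit + quantum) & U32_MASK


def rounds_until_eligible(pkt_len, quantum, deficit=0):
    """Replenish rounds needed before deficit >= pkt_len (no wrap, u64-like)."""
    needed = max(pkt_len - deficit, 0)
    return -(-needed // quantum)  # ceiling division


# A large quantum wraps a 32-bit deficit: the counter gets smaller, not bigger.
wrapped = replenish_u32(0xF0000000, 0x20000000)

# A tiny quantum with an inflated packet length needs billions of rounds.
rounds = rounds_until_eligible(pkt_len=0xFFFFFFFF, quantum=1)

# The patch caps one dequeue attempt at nbands * 2 rotations instead.
nbands = 16  # illustrative value
max_loops = nbands * 2
```

Widening the deficit removes the wrap, while the loop bound limits how much work a single dequeue call can do when no packet is eligible yet.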


^ permalink raw reply related

* Re: [PATCH net] net: airoha: fix max receive size configuration
From: Lorenzo Bianconi @ 2026-06-26  8:25 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-arm-kernel, linux-mediatek, netdev, Madhur Agrawal
In-Reply-To: <20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org>


> Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
> during hardware initialization instead of updating it according to the
> configured MTU. This avoids dropping incoming frames that exceed the
> current MTU but could still be processed by the networking stack, which
> is able to fragment the reply on the TX side (e.g. ICMP echo requests).
> Move the per-port MTU configuration to the PPE egress path where it
> belongs, and set the tx frame size running airoha_ppe_set_xmit_frame_size()
> to dynamically track the maximum MTU across running interfaces sharing
> the same PPE instance.
> Fix the PPE MTU register addressing to pack two port entries per
> register word and add WAN_MTU0 configuration for non-LAN GDM devices.
> 
> Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
> Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Commenting on sashiko's report:
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d%40kernel.org

> ---
>  drivers/net/ethernet/airoha/airoha_eth.c  | 68 ++++++++++---------------------
>  drivers/net/ethernet/airoha/airoha_eth.h  |  2 +
>  drivers/net/ethernet/airoha/airoha_ppe.c  | 39 +++++++++++++-----
>  drivers/net/ethernet/airoha/airoha_regs.h |  9 ++--
>  4 files changed, 58 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 932b3a3df2e5..3f451c2d4c24 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -178,10 +178,15 @@ static void airoha_fe_maccr_init(struct airoha_eth *eth)
>  {
>  	int p;
>  
> -	for (p = 1; p <= ARRAY_SIZE(eth->ports); p++)
> +	for (p = 1; p <= ARRAY_SIZE(eth->ports); p++) {
>  		airoha_fe_set(eth, REG_GDM_FWD_CFG(p),
>  			      GDM_TCP_CKSUM_MASK | GDM_UDP_CKSUM_MASK |
>  			      GDM_IP4_CKSUM_MASK | GDM_DROP_CRC_ERR_MASK);
> +		airoha_fe_rmw(eth, REG_GDM_LEN_CFG(p),
> +			      GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> +			      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> +			      FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_RX_SIZE));
> +	}
>  
>  	airoha_fe_rmw(eth, REG_CDM_VLAN_CTRL(1), CDM_VLAN_MASK,
>  		      FIELD_PREP(CDM_VLAN_MASK, 0x8100));
> @@ -1831,13 +1836,24 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
>  	spin_unlock(&port->stats_lock);
>  }
>  
> +static void airoha_dev_set_xmit_frame_size(struct net_device *netdev)
> +{
> +	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +
> +	airoha_ppe_set_xmit_frame_size(dev);
> +	if (!airoha_is_lan_gdm_dev(dev))
> +		airoha_fe_rmw(dev->eth, REG_WAN_MTU0, WAN_MTU0_MASK,
> +			      FIELD_PREP(WAN_MTU0_MASK,
> +					 VLAN_ETH_HLEN + netdev->mtu));
> +}

- Could the WAN_MTU0 update here use the same max-across-siblings
  aggregation as airoha_ppe_set_xmit_frame_size()?
  - This is the same issue reported by sashiko-gemini. There is just one WAN
    device in the system, so we do not need to calculate the max MTU here.

> +
>  static int airoha_dev_open(struct net_device *netdev)
>  {
> -	int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
>  	struct airoha_gdm_port *port = dev->port;
> -	u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
>  	struct airoha_qdma *qdma = dev->qdma;
> +	u32 pse_port = FE_PSE_PORT_PPE1;
> +	int err;
>  
>  	netif_tx_start_all_queues(netdev);
>  	err = airoha_set_vip_for_gdm_port(dev, true);
> @@ -1851,19 +1867,7 @@ static int airoha_dev_open(struct net_device *netdev)
>  		airoha_fe_clear(qdma->eth, REG_GDM_INGRESS_CFG(port->id),
>  				GDM_STAG_EN_MASK);
>  
> -	cur_len = airoha_fe_get(qdma->eth, REG_GDM_LEN_CFG(port->id),
> -				GDM_LONG_LEN_MASK);
> -	if (!port->users || len > cur_len) {
> -		/* Opening a sibling net_device with a larger MTU updates the
> -		 * MTU of already running devices. This is required to allow
> -		 * multiple net_devices with different MTUs to share the same
> -		 * GDM port.
> -		 */
> -		airoha_fe_rmw(qdma->eth, REG_GDM_LEN_CFG(port->id),
> -			      GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> -			      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> -			      FIELD_PREP(GDM_LONG_LEN_MASK, len));
> -	}
> +	airoha_dev_set_xmit_frame_size(netdev);
>  	port->users++;
>  
>  	if (!airoha_is_lan_gdm_dev(dev) &&
> @@ -1875,30 +1879,6 @@ static int airoha_dev_open(struct net_device *netdev)
>  	return 0;
>  }
>  
> -static void airoha_set_port_mtu(struct airoha_eth *eth,
> -				struct airoha_gdm_port *port)
> -{
> -	u32 len = 0;
> -	int i;
> -
> -	for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> -		struct airoha_gdm_dev *dev = port->devs[i];
> -		struct net_device *netdev;
> -
> -		if (!dev)
> -			continue;
> -
> -		netdev = netdev_from_priv(dev);
> -		if (netif_running(netdev))
> -			len = max_t(u32, len, netdev->mtu);
> -	}
> -	len += ETH_HLEN + ETH_FCS_LEN;
> -
> -	airoha_fe_rmw(eth, REG_GDM_LEN_CFG(port->id),
> -		      GDM_LONG_LEN_MASK,
> -		      FIELD_PREP(GDM_LONG_LEN_MASK, len));
> -}
> -
>  static int airoha_dev_stop(struct net_device *netdev)
>  {
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> @@ -1909,7 +1889,7 @@ static int airoha_dev_stop(struct net_device *netdev)
>  	airoha_set_vip_for_gdm_port(dev, false);
>  
>  	if (--port->users)
> -		airoha_set_port_mtu(dev->eth, port);
> +		airoha_ppe_set_xmit_frame_size(dev);

- On the close path, the call is to airoha_ppe_set_xmit_frame_size()
  directly rather than the airoha_dev_set_xmit_frame_size() wrapper.
  Does this mean WAN_MTU0 is never refreshed when a WAN dev is closed?
  For example, if a small-MTU sibling is closed while a larger-MTU dev
  remains running, the PPE MTU register gets recomputed to the larger
  value but WAN_MTU0 retains the smaller value written at the last open
  or change_mtu.
  The commit message states:
    set the tx frame size running airoha_ppe_set_xmit_frame_size()
    to dynamically track the maximum MTU across running interfaces sharing
    the same PPE instance.
  Is the asymmetry between PPE MTU (max across siblings) and WAN_MTU0
  (per-netdev write) intentional?
  - This is the same issue reported by sashiko-gemini. There is just one WAN
    device in the system, so there is no point in updating WAN_MTU0 if the WAN
    device is stopped.

Regards,
Lorenzo

>  	else
>  		airoha_set_gdm_port_fwd_cfg(qdma->eth,
>  					    REG_GDM_FWD_CFG(port->id),
> @@ -1962,10 +1942,6 @@ static int airoha_enable_gdm2_loopback(struct airoha_gdm_dev *dev)
>  		      FIELD_PREP(LPBK_CHAN_MASK, chan) |
>  		      LBK_GAP_MODE_MASK | LBK_LEN_MODE_MASK |
>  		      LBK_CHAN_MODE_MASK | LPBK_EN_MASK);
> -	airoha_fe_rmw(eth, REG_GDM_LEN_CFG(AIROHA_GDM2_IDX),
> -		      GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> -		      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> -		      FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_MTU));
>  	/* Forward the traffic to the proper GDM port */
>  	pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
>  					       : FE_PSE_PORT_GDM4;
> @@ -2098,7 +2074,7 @@ static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>  
>  	WRITE_ONCE(netdev->mtu, mtu);
>  	if (port->users)
> -		airoha_set_port_mtu(dev->eth, port);
> +		airoha_dev_set_xmit_frame_size(netdev);
>  
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index d7ff8c5200e2..0c3fb6e5d7f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -23,6 +23,7 @@
>  #define AIROHA_MAX_DSA_PORTS		7
>  #define AIROHA_MAX_NUM_RSTS		3
>  #define AIROHA_MAX_MTU			9220
> +#define AIROHA_MAX_RX_SIZE		16128
>  #define AIROHA_MAX_PACKET_SIZE		2048
>  #define AIROHA_NUM_QOS_CHANNELS		4
>  #define AIROHA_NUM_QOS_QUEUES		8
> @@ -676,6 +677,7 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev);
>  bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
>  			     struct airoha_gdm_dev *dev);
>  
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev);
>  void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
>  bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
>  void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 42f4b0f21d17..e7c78293002a 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -97,6 +97,33 @@ void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport)
>  		      __field_prep(DFT_CPORT_MASK(fport), fe_cpu_port));
>  }
>  
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev)
> +{
> +	struct airoha_gdm_port *port = dev->port;
> +	struct airoha_eth *eth = dev->eth;
> +	int i, ppe_id, index;
> +	u32 len = 0;
> +
> +	for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> +		struct airoha_gdm_dev *d = port->devs[i];
> +		struct net_device *netdev;
> +
> +		if (!d)
> +			continue;
> +
> +		netdev = netdev_from_priv(d);
> +		if (netif_running(netdev))
> +			len = max_t(u32, len, netdev->mtu);
> +	}
> +	len += VLAN_ETH_HLEN;
> +
> +	ppe_id = !airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(eth, 1);
> +	index = port->id == AIROHA_GDM4_IDX ? 7 : port->id;
> +	airoha_fe_rmw(eth, REG_PPE_MTU(ppe_id, index),
> +		      FP_EGRESS_MTU_MASK(index),
> +		      __field_prep(FP_EGRESS_MTU_MASK(index), len));
> +}
> +
>  static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  {
>  	u32 sram_ppe_num_data_entries = PPE_SRAM_NUM_ENTRIES, sram_num_entries;
> @@ -115,8 +142,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  		PPE_RAM_NUM_ENTRIES_SHIFT(sram_ppe_num_data_entries);
>  
>  	for (i = 0; i < eth->soc->num_ppe; i++) {
> -		int p;
> -
>  		airoha_fe_wr(eth, REG_PPE_TB_BASE(i),
>  			     ppe->foe_dma + sram_tb_size);
>  
> @@ -166,15 +191,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  		airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
>  		airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
>  				PPE_FLOW_CFG_IP6_6RD_MASK);
> -
> -		for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
> -			airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
> -				      FP0_EGRESS_MTU_MASK |
> -				      FP1_EGRESS_MTU_MASK,
> -				      FIELD_PREP(FP0_EGRESS_MTU_MASK,
> -						 AIROHA_MAX_MTU) |
> -				      FIELD_PREP(FP1_EGRESS_MTU_MASK,
> -						 AIROHA_MAX_MTU));
>  	}
>  
>  	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> @@ -196,6 +212,7 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  				 airoha_ppe_is_enabled(eth, 1);
>  			fport = airoha_get_fe_port(dev);
>  			airoha_ppe_set_cpu_port(dev, ppe_id, fport);
> +			airoha_ppe_set_xmit_frame_size(dev);
>  		}
>  	}
>  }
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..6fed63d013b4 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -327,9 +327,8 @@
>  #define PPE_SRAM_TABLE_EN_MASK			BIT(0)
>  
>  #define REG_PPE_MTU_BASE(_n)			(((_n) ? PPE2_BASE : PPE1_BASE) + 0x304)
> -#define REG_PPE_MTU(_m, _n)			(REG_PPE_MTU_BASE(_m) + ((_n) << 2))
> -#define FP1_EGRESS_MTU_MASK			GENMASK(29, 16)
> -#define FP0_EGRESS_MTU_MASK			GENMASK(13, 0)
> +#define REG_PPE_MTU(_m, _n)			(REG_PPE_MTU_BASE(_m) + (((_n) / 2) << 2))
> +#define FP_EGRESS_MTU_MASK(_n)			GENMASK(13 + (((_n) % 2) << 4), ((_n) % 2) << 4)
>  
>  #define REG_PPE_RAM_CTRL(_n)			(((_n) ? PPE2_BASE : PPE1_BASE) + 0x31c)
>  #define PPE_SRAM_CTRL_ACK_MASK			BIT(31)
> @@ -377,6 +376,10 @@
>  #define REG_SRC_PORT_FC_MAP6		0x2298
>  #define FC_ID_OF_SRC_PORT_MASK(_n)	GENMASK(4 + ((_n) << 3), ((_n) << 3))
>  
> +#define REG_WAN_MTU0			0x2300
> +#define WAN_MTU1_MASK			GENMASK(29, 16)
> +#define WAN_MTU0_MASK			GENMASK(13, 0)
> +
>  #define REG_CDM5_RX_OQ1_DROP_CNT	0x29d4
>  
>  /* QDMA */
> 
> ---
> base-commit: fd1269e454089abda0e4f9e5e25ecd02a90ab009
> change-id: 20260618-airoha-fix-rx-max-len-57654b661646
> 
> Best regards,
> -- 
> Lorenzo Bianconi <lorenzo@kernel.org>
> 
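
The new PPE MTU addressing packs two port entries into each 32-bit register word. The sketch below redoes the arithmetic of the REG_PPE_MTU() and FP_EGRESS_MTU_MASK() macros from the patch in Python, as an illustration only: genmask, ppe_mtu_offset, fp_egress_mtu_mask and pack_mtu are hypothetical helper names, offsets are relative to the MTU base register, and the 1500-byte MTU is an example value.

```python
def genmask(high, low):
    """Python equivalent of the kernel GENMASK(high, low) for 32-bit values."""
    return ((1 << (high - low + 1)) - 1) << low


def ppe_mtu_offset(index):
    """Byte offset of the MTU word for a port index, relative to the MTU base."""
    return (index // 2) << 2


def fp_egress_mtu_mask(index):
    """14-bit field: bits 13..0 for even indexes, bits 29..16 for odd ones."""
    shift = (index % 2) << 4
    return genmask(13 + shift, shift)


def pack_mtu(word, index, length):
    """Read-modify-write of one port's MTU field inside a 32-bit word."""
    mask = fp_egress_mtu_mask(index)
    shift = (index % 2) << 4
    return (word & ~mask & 0xFFFFFFFF) | ((length << shift) & mask)


# Index 7 (used for GDM4 in the patch) lands in the high half of word 3.
offset_7 = ppe_mtu_offset(7)
mask_7 = fp_egress_mtu_mask(7)
word = pack_mtu(0, 7, 18 + 1500)  # VLAN_ETH_HLEN + MTU
```

Writing index 6 into the same word afterwards leaves the field for index 7 untouched, which is the point of using a per-index mask in the read-modify-write.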


^ permalink raw reply

* Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-26  8:15 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Simon Horman, Grzegorz Nitka, anthony.l.nguyen, intel-wired-lan,
	netdev
In-Reply-To: <CADcc-bwd2CcWJ1AFDm1GR1HBzo2OOh=Xr3moNS+-RVuai6yVBA@mail.gmail.com>

Hey Przemek,
I ran some tests and unfortunately, the following sentence from the
datasheet is true:
"For specific resources, such as Change Lock (0x0003) and Global Config Lock
(0x0004), this field is used by software to override the default timeout for the
operation, and also to specify the timeout used for this operation."

This means we can only change the default timeout for 0x0003 and 0x0004,
but not for 0x0001 (NVM resource).
Whatever timeout I provide, the FW defaults to 0xBB8.
Input:
[ 2209.656758] ice 0000:31:00.0: CQ CMD: opcode 0x0008, flags 0x2000,
datalen 0x0000, retval 0x0000
[ 2209.656760] ice 0000:31:00.0:        cookie (h,l) 0x00000000 0x00000000
[ 2209.656761] ice 0000:31:00.0:        param (0,1)  0x00010001 0x00000BB9
Output:
[ 2209.656927] ice 0000:31:00.0: CQ CMD: opcode 0x0008, flags 0x2003,
datalen 0x0000, retval 0x0000
[ 2209.656929] ice 0000:31:00.0:        cookie (h,l) 0x00000000 0x00000000
[ 2209.656931] ice 0000:31:00.0:        param (0,1)  0x00010001 0x00000BB8

Correct me if I'm wrong, but the only way to properly handle it is to
ensure the resource is locked and released between every
ice_acquire_nvm call.
I'll start working on this.
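
For reference, the timeout values in the logs decode as follows. This is only a small Python sketch for reading the second "param" dword as milliseconds; decode_timeout_ms is a hypothetical helper, not driver code, and 0x0002BF20 is the value from the earlier log quoted further down in this thread.

```python
def decode_timeout_ms(param1):
    """Interpret the second AQ parameter dword as a timeout in milliseconds."""
    return param1 & 0xFFFFFFFF


requested_ms = decode_timeout_ms(0x0002BF20)  # sent by the driver (earlier log)
test_ms = decode_timeout_ms(0x00000BB9)       # sent in the test above
granted_ms = decode_timeout_ms(0x00000BB8)    # returned by the firmware
```

Both requests come back with the same 3000 ms value, which is what suggests the FW ignores the requested timeout for the NVM resource.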

Regards,
Robert


On Thu, Jun 25, 2026 at 12:14 PM Robert Malz <robert.malz@canonical.com> wrote:
>
> Hey Przemek,
> Thanks a lot for the feedback.
> I was sure that we use ICE_NVM_TIMEOUT (180s) as a timeout every time
> (ice_acquire_nvm) but your proposal made me rethink it a little.
> First of all, the datasheet for E810 specifies the timeout as: "As an
> input, the software might specify timeout longer than the default
> taken for this resource, and up to one minute."
> 180s is greater than one minute so I took a look into AQC logs:
> [  110.698471] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2000,
> datalen 0x0000, retval 0x0000
> [  110.698474] ice 0000:05:00.0:        cookie (h,l) 0x00000000 0x00000000
> [  110.698477] ice 0000:05:00.0:        param (0,1)  0x00010001 0x0002BF20
> [  110.698480] ice 0000:05:00.0:        addr (h,l)   0x00000000 0x00000000
> [  110.698645] ice 0000:05:00.0: ATQ: desc and buffer writeback:
> [  110.698648] ice 0000:05:00.0: CQ CMD: opcode 0x0008, flags 0x2003,
> datalen 0x0000, retval 0x0000
> [  110.698651] ice 0000:05:00.0:        cookie (h,l) 0x00000000 0x00000000
> [  110.698654] ice 0000:05:00.0:        param (0,1)  0x00010001 0x00000BB8
> [  110.698657] ice 0000:05:00.0:        addr (h,l)   0x00000000 0x00000000
> Based on the above, the driver requested a 0x0002BF20 timeout (180 000
> ms) but the FW returned only 0x00000BB8 (3s).
> I'm assuming this is expected behavior since the maximum timeout for
> NVM read should be 60,000 ms.
>
> If changing the timeout requested by the driver to 60s for read ops is
> handled correctly by the FW and the FW respects that lock, the retry
> patch submitted in this email thread might not be required at all.
> Let me quickly prepare a new patch and test it. I'll update this
> thread once I have results.
>
> Regards,
> Robert
>
>
> On Thu, Jun 25, 2026 at 9:51 AM Przemek Kitszel
> <przemyslaw.kitszel@intel.com> wrote:
> >
> > >> This is an AI-generated review of your patch. The human sending this
> >
> >
> > >> [Severity: Low]
> > >> Does this check allow an extra retry execution? Since retry_cnt starts at 0,
> > >> using a strictly greater-than check allows the loop to retry 4 times
> > >> (5 total executions) instead of the 3 executions intended by
> > >> ICE_SQ_SEND_MAX_EXECUTE.
> > >>
> > >>> +                             break;
> > >>> +                     ice_debug(hw, ICE_DBG_NVM,
> > >>> +                               "NVM read EBUSY error, retry %d\n",
> > >>> +                               retry_cnt + 1);
> > >>> +                     last_cmd = false;
> > >>> +                     ice_release_nvm(hw);
> > >>> +                     msleep(ICE_SQ_SEND_DELAY_TIME_MS);
> > >>
> > > The review is correct that this allows one extra execution. However, I
> > > intentionally kept this verbatim from the OOT driver (1.15.4) to
> >
> > This OOT change was made entirely by Grzegorz, for E825C. Would be great
> > to acknowledge his authorship (+ SoB ofc, below).
> > Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
> >
> > What was the platform that you have encountered the bug on?
> >
> > > retain the significant test exposure it has there. The absolute worst
> > > case is one harmless extra retry before giving up. If you strongly
> > > prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the
> > > condition in a v4.
> >
> > Please update (to have one attempt less).
> >
> > >
> > >> [Severity: High]
> > >> Does dropping the NVM lock here violate the caller's assumption of mutual
> > >> exclusion?
> > >>
> > >> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to
> > >> ensure an atomic read of multiple sectors without interleaved writes. By
> > >> explicitly dropping the lock during the msleep() window, another thread, VF,
> > >> or PF could acquire the NVM lock and modify the NVM before the read resumes,
> > >> potentially resulting in reading torn or inconsistent NVM data.
> > >>
> > >>> +                     status = ice_acquire_nvm(hw, ICE_RES_READ);
> > >>> +                     if (status)
> > >>> +                             break;
> > >>
> > >
> > > I don't think this is an issue. The NVM resource (req_res 0x0008) is
> > > an ownership lease, not a content snapshot. The FW grants it with a
> > > hold timeout (see the timeout field handling in ice_aq_req_res). It
> > > never guaranteed a point-in-time view of the flash.
> > > Very similar behavior already exists in ice_devlink_nvm_snapshot(),
> > > which reads the NVM in chunks via ice_read_flat_nvm() and deliberately
> > > acquires/releases the lock between chunks (with a comment explaining
> > > the FW reclaim).
> > > Additionally, the lock is only dropped on the failure path, where
> > > neither bytes_read nor offset have been advanced. The driver simply
> > > re-reads the failed chunk after re-acquiring. So no data that is
> > > returned to the caller is committed across the drop window.
> > > Overall the lock drop does not weaken any guarantee callers actually
> > > have. If we wanted to introduce a true snapshot guarantee, that would
> > > probably require different approach and separate patch.
> > >
> > >
> > >> [Severity: High]
> > >> Can this lead to an unbalanced NVM resource lock release and potential data
> > >> corruption?
> > >>
> > >> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the
> > >> lock currently unheld. However, callers like ice_read_flash_module() are
> > >> unaware of this internal lock state change and will unconditionally call
> > >> ice_release_nvm() upon failure:
> > >>
> > >> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() {
> > >>          status = ice_acquire_nvm(hw, ICE_RES_READ);
> > >>          if (status)
> > >>                  return status;
> > >>
> > >>          status = ice_read_flat_nvm(hw, start + offset, &length, data, false);
> > >>
> > >>          ice_release_nvm(hw);
> > >>      ...
> > >>
> > >> Because firmware tracks resource locks per-PF rather than per-OS-thread, this
> > >> errant second release could forcefully strip the lock from another thread on
> > >> the same PF that successfully acquired it, exposing the NVM to concurrent
> > >> modification.
> > >>
> > >
> > > Agreed, this might be a real bug, and the one of the three I think is
> > > worth investigating.
> > > If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns
> > > with the lock unheld while callers unconditionally call
> > > ice_release_nvm(), so a stray release is issued.
> > >
> > > On probability, though, the window is very small. Reaching it requires
> > > sustained EBUSY across the retry budget plus a failed re-acquire
> > > (which itself polls up to ICE_NVM_TIMEOUT), and concurrently another
> > > requester taking the lock. Most reads happen during init (ice_probe,
> > > and reset/rebuild via ice_init_nvm), and NVM writes only happen on an
> > > already initialized driver. The devlink/ethtool nvm_read paths are
> > > also exposed, but hitting this race would require precise timing
> > > against a concurrent NVM owner on the device.
> > >
> > > I'd prefer to keep the scope of this patch limited to the EBUSY retry
> > > path and not take on the unbalanced-release fix here. A proper fix
> > > should change the lock-ownership contract of ice_read_flat_nvm() (on
> > > error, the lock must be released by ice_read_flat_nvm(), callers
> > > release only on success) and update all callers. Code change should be
> > > simple for all callers but ice_discover_flash_size(), it intentionally
> > > holds one lease across a read loop and would need to re-acquire after
> > > each expected boundary failure.
> > >
> > > Given how small the original window is, I'd rather not trade tested
> > > OOT behavior for the risk of a complex unbalanced NVM lock fix. I
> > > actually have a patch mostly ready that fixes the lock-ownership
> > > contract, but I really don't like it. It changes the design of
> > > ice_read_flat_nvm(), making it less intuitive for callers. More
> > > importantly, I just don't have the resources or test coverage right
> > > now to properly verify such change.
> > >
> > > However, I can modify the failure path for ice_acquire_nvm inside
> > > ice_read_flat_nvm. Instead of bailing out immediately, we can just
> > > retry it within the existing retry budget. In this case, the
> > > probability of leaving ice_read_flat_nvm without holding the lock is
> > > reduced even further without needing a refactor.
> > >
> > > Please let me know what you think about my thought process on this.
> >
> > I think that both AI-reported issues against the lock are valid
> > concerns.
> >
> > I think that the sleep was the actual fix, and the re-locking was merely
> > a necessity due to the lock's expiration (as you said).
> >
> > A proper fix would be to just increase lock-timeout to accommodate all
> > attempts (and still do the retries&sleep, but without unlocking).
> >
> >
> > >
> > >
> > >
> > >>> +                     retry_cnt++;
> > >>> +             } else {
> > >>> +                     bytes_read += read_size;
> > >>> +                     offset += read_size;
> > >>> +                     retry_cnt = 0;
> > >>> +             }
> > >>>        } while (!last_cmd);
> > >>>
> > >>>        *length = bytes_read;
> > >
> > > Thanks,
> > > Robert
> >

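The unbalanced-release concern discussed in this thread can be modeled in a
small userspace sketch. Everything here is an assumption made for
illustration: the struct pf_lock type and the model_* helpers are
hypothetical names, not the ice driver's API, and the per-PF lock is reduced
to a single flag that any requester can clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: one lock per PF, with no record of who holds it. */
struct pf_lock {
	bool held;        /* firmware view: is the PF's NVM lock taken? */
	bool other_owner; /* a second requester believes it owns the lock */
};

static int model_acquire(struct pf_lock *l)
{
	if (l->held)
		return -EBUSY;
	l->held = true;
	return 0;
}

/* The model releases unconditionally, whoever asks. */
static void model_release(struct pf_lock *l)
{
	l->held = false;
}

/*
 * Models the retry path: drop the lock, then re-acquire it.
 * other_takes_lock simulates another requester on the same PF winning
 * the lock in between. On failure the caller no longer holds the lock.
 */
static int model_read_flat_retry(struct pf_lock *l, bool other_takes_lock)
{
	model_release(l);
	if (other_takes_lock && model_acquire(l) == 0)
		l->other_owner = true;
	return model_acquire(l);
}

/* Caller pattern quoted above: release regardless of the read status. */
static int model_caller(struct pf_lock *l, bool other_takes_lock)
{
	int status = model_acquire(l);

	if (status)
		return status;
	status = model_read_flat_retry(l, other_takes_lock);
	model_release(l); /* stray release when status != 0 */
	return status;
}

/* Alternative contract: the caller releases only on success. */
static int model_caller_release_on_success(struct pf_lock *l,
					   bool other_takes_lock)
{
	int status = model_acquire(l);

	if (status)
		return status;
	status = model_read_flat_retry(l, other_takes_lock);
	if (!status)
		model_release(l);
	return status;
}

/* True when the second requester lost a lock it still believes it holds. */
static bool lock_stripped(const struct pf_lock *l)
{
	return l->other_owner && !l->held;
}
```

In this model, model_caller() with other_takes_lock set leaves lock_stripped()
true, while model_caller_release_on_success() leaves the second requester's
lock intact. It says nothing about how likely the race is on real hardware.
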
^ permalink raw reply

* Re: [PATCH v5 0/9] driver core: Fix some race conditions
From: patchwork-bot+linux-riscv @ 2026-06-26  8:21 UTC (permalink / raw)
  To: Doug Anderson
  Cc: linux-riscv, gregkh, rafael, dakr, stern, aik, johan, edumazet,
	leon, hch, robin.murphy, maz, aleksander.lobakin, saravanak, akpm,
	Frank.Li, jgg, alex, alexander.stein, andre.przywara, andrew,
	andrew, andriy.shevchenko, aou, ardb, astewart, bhelgaas, brgl,
	broonie, catalin.marinas, chleroy, davem, david, devicetree,
	dmaengine, driver-core, gbatra, gregory.clement, hkallweit1,
	iommu, jirislaby, joel, joro, kees, kevin.brodsky, kuba, lenb,
	lgirdwood, linux-acpi, linux-arm-kernel, linux-aspeed, linux-cxl,
	linux-kernel, linux-mips, linux-mm, linux-pci, linux-serial,
	linux-snps-arc, linux-usb, linux, linuxppc-dev, m.szyprowski,
	maddy, mani, miko.lenczewski, mpe, netdev, npiggin, osalvador,
	oupton, pabeni, palmer, peter.ujfalusi, peterz, pjw, robh,
	sebastian.hesselbarth, tglx, tsbogend, vgupta, vkoul, will, willy,
	yangyicong, yeoreum.yun
In-Reply-To: <20260406232444.3117516-1-dianders@chromium.org>

Hello:

This patch was applied to riscv/linux.git (fixes)
by Danilo Krummrich <dakr@kernel.org>:

On Mon,  6 Apr 2026 16:22:53 -0700 you wrote:
> The main goal of this series is to fix the observed bug talked about
> in the first patch ("driver core: Don't let a device probe until it's
> ready"). That patch fixes a problem that has been observed in the real
> world and could land even if the rest of the patches are found
> unacceptable or need to be spun.
> 
> That said, during patch review Danilo correctly pointed out that many
> of the bitfield accesses in "struct device" are unsafe. I added a
> bunch of patches in the series to address each one.
> 
> [...]

Here is the summary with links:
  - [v5,7/9] driver core: Replace dev->dma_coherent with dev_dma_coherent()
    https://git.kernel.org/riscv/c/3e2c1e213ac2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 1/5] arch: select HAVE_ARCH_BITREVERSE conditionally on BITREVERSE
From: patchwork-bot+linux-riscv @ 2026-06-26  8:20 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-riscv, pjw, palmer, aou, alex, yury.norov, linux, arnd,
	ebiggers, andrew+netdev, davem, edumazet, kuba, pabeni, akpm, ast,
	daniel, hawk, john.fastabend, sdf, ruanjinjie, linux-kernel,
	linux-arch, netdev, bpf
In-Reply-To: <20260506175207.110893-2-ynorov@nvidia.com>

Hello:

This series was applied to riscv/linux.git (fixes)
by Yury Norov <ynorov@nvidia.com>:

On Wed,  6 May 2026 13:52:02 -0400 you wrote:
> Architectures may have bit reversal instructions, but if the API is not
> needed, the corresponding option should not be selected because it may
> lead to generating unneeded code.
> 
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---
>  arch/arm/Kconfig       | 2 +-
>  arch/arm64/Kconfig     | 2 +-
>  arch/loongarch/Kconfig | 2 +-
>  arch/mips/Kconfig      | 2 +-
>  lib/Kconfig            | 1 +
>  5 files changed, 5 insertions(+), 4 deletions(-)

Here is the summary with links:
  - [v2,1/5] arch: select HAVE_ARCH_BITREVERSE conditionally on BITREVERSE
    https://git.kernel.org/riscv/c/42d9c75e8b9c
  - [v2,2/5] lib/bitrev: Introduce GENERIC_BITREVERSE
    https://git.kernel.org/riscv/c/00751d655ece
  - [v2,3/5] bitops: Define generic___bitrev8/16/32 for reuse
    https://git.kernel.org/riscv/c/83aede8131af
  - [v2,4/5] arch/riscv: Add bitrev.h file to support rev8 and brev8
    https://git.kernel.org/riscv/c/e8620bd7e5e0
  - [v2,5/5] MAINTAINERS: BITOPS: include bitrev.[ch]
    https://git.kernel.org/riscv/c/7b2c5b4e43aa

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH 0/6] lib: rework bitreverse
From: patchwork-bot+linux-riscv @ 2026-06-26  8:20 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-riscv, pjw, palmer, aou, alex, yury.norov, linux, arnd,
	andrew+netdev, davem, edumazet, kuba, pabeni, akpm, ast, daniel,
	hawk, john.fastabend, sdf, ruanjinjie, linux-kernel, linux-arch,
	netdev, bpf
In-Reply-To: <20260430211351.658193-1-ynorov@nvidia.com>

Hello:

This series was applied to riscv/linux.git (fixes)
by Yury Norov <ynorov@nvidia.com>:

On Thu, 30 Apr 2026 17:13:44 -0400 you wrote:
> This series is a resend for Jinjie Ruan's "arch/riscv: Add bitrev.h file
> to support rev8 and brev8" [1], my follow-up "lib: compile generic
> bitrev based on GENERIC_BITREVERSE" [2], and the fix for a build error
> reported by Nathan Chancellor [3].
> 
> No changes, except for combining pieces together and rebasing on top of
> the tree.
> 
> [...]

Here is the summary with links:
  - [1/6] lib: include crc32.h conditionally on CONFIG_CRC32
    (no matching commit)
  - [2/6] lib/bitrev: Introduce GENERIC_BITREVERSE and cleanup Kconfig
    (no matching commit)
  - [3/6] bitops: Define generic __bitrev8/16/32 for reuse
    (no matching commit)
  - [4/6] arch/riscv: Add bitrev.h file to support rev8 and brev8
    (no matching commit)
  - [5/6] lib: compile generic bitrev.c conditionally on GENERIC_BITREVERSE
    (no matching commit)
  - [6/6] MAINTAINERS: BITOPS: include bitrev.[ch]
    https://git.kernel.org/riscv/c/7b2c5b4e43aa

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: airoha: fix max receive size configuration
From: Lorenzo Bianconi @ 2026-06-26  8:18 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-arm-kernel, linux-mediatek, netdev, Madhur Agrawal
In-Reply-To: <20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 12249 bytes --]

> Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
> during hardware initialization instead of updating it according to the
> configured MTU. This avoids dropping incoming frames that exceed the
> current MTU but could still be processed by the networking stack, which
> is able to fragment the reply on the TX side (e.g. ICMP echo requests).
> Move the per-port MTU configuration to the PPE egress path where it
> belongs, and set the tx frame size by running airoha_ppe_set_xmit_frame_size()
> to dynamically track the maximum MTU across running interfaces sharing
> the same PPE instance.
> Fix the PPE MTU register addressing to pack two port entries per
> register word and add WAN_MTU0 configuration for non-LAN GDM devices.

commenting on sashiko's report:
https://sashiko.dev/#/patchset/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d%40kernel.org

> 
> Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
> Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/airoha/airoha_eth.c  | 68 ++++++++++---------------------
>  drivers/net/ethernet/airoha/airoha_eth.h  |  2 +
>  drivers/net/ethernet/airoha/airoha_ppe.c  | 39 +++++++++++++-----
>  drivers/net/ethernet/airoha/airoha_regs.h |  9 ++--
>  4 files changed, 58 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 932b3a3df2e5..3f451c2d4c24 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -178,10 +178,15 @@ static void airoha_fe_maccr_init(struct airoha_eth *eth)
>  {
>  	int p;
>  
> -	for (p = 1; p <= ARRAY_SIZE(eth->ports); p++)
> +	for (p = 1; p <= ARRAY_SIZE(eth->ports); p++) {
>  		airoha_fe_set(eth, REG_GDM_FWD_CFG(p),
>  			      GDM_TCP_CKSUM_MASK | GDM_UDP_CKSUM_MASK |
>  			      GDM_IP4_CKSUM_MASK | GDM_DROP_CRC_ERR_MASK);
> +		airoha_fe_rmw(eth, REG_GDM_LEN_CFG(p),
> +			      GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> +			      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> +			      FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_RX_SIZE));
> +	}
>  
>  	airoha_fe_rmw(eth, REG_CDM_VLAN_CTRL(1), CDM_VLAN_MASK,
>  		      FIELD_PREP(CDM_VLAN_MASK, 0x8100));
> @@ -1831,13 +1836,24 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
>  	spin_unlock(&port->stats_lock);

- This is a pre-existing issue, but can the spin_lock() used in
  airoha_update_hw_stats() cause a deadlock?
  If a process context holds port->stats_lock via spin_lock() and is preempted
  by a networking softirq on the same CPU that calls dev_get_stats()
  (which invokes ndo_get_stats64 -> airoha_update_hw_stats()), will the softirq
  spin forever trying to acquire the same lock? Should this use spin_lock_bh()
  instead?
  - The reported issue has not been introduced by this patch. Moreover, I do
    not think this is a real problem since in the current codebase
    airoha_update_hw_stats() is always run in process context and not in-irq
    context.

>  }
>  
> +static void airoha_dev_set_xmit_frame_size(struct net_device *netdev)
> +{
> +	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +
> +	airoha_ppe_set_xmit_frame_size(dev);
> +	if (!airoha_is_lan_gdm_dev(dev))
> +		airoha_fe_rmw(dev->eth, REG_WAN_MTU0, WAN_MTU0_MASK,
> +			      FIELD_PREP(WAN_MTU0_MASK,
> +					 VLAN_ETH_HLEN + netdev->mtu));
> +}

- Does this unconditional write to REG_WAN_MTU0 break sibling network devices
  sharing the same WAN port? 
  If multiple interfaces share the same hardware port, this appears to overwrite
  the shared register using only the current interface's MTU, ignoring the
  maximum MTU of any active sibling interfaces. Could this cause the hardware to
  drop frames for sibling interfaces if their MTU is larger than the most
  recently configured interface?
  - This is not a real issue since we can have at most a single WAN port in the
    system

> +
>  static int airoha_dev_open(struct net_device *netdev)
>  {
> -	int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
>  	struct airoha_gdm_port *port = dev->port;
> -	u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
>  	struct airoha_qdma *qdma = dev->qdma;
> +	u32 pse_port = FE_PSE_PORT_PPE1;
> +	int err;
>  
>  	netif_tx_start_all_queues(netdev);
>  	err = airoha_set_vip_for_gdm_port(dev, true);

[...]

>  static int airoha_dev_stop(struct net_device *netdev)
>  {
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
> @@ -1909,7 +1889,7 @@ static int airoha_dev_stop(struct net_device *netdev)
>  	airoha_set_vip_for_gdm_port(dev, false);
>  
>  	if (--port->users)
> -		airoha_set_port_mtu(dev->eth, port);
> +		airoha_ppe_set_xmit_frame_size(dev);

- Does this stop path fail to update the WAN MTU limit?
  When an interface is stopped, airoha_ppe_set_xmit_frame_size() recalculates
  the PPE MTU, but it looks like the global REG_WAN_MTU0 register is not updated
  here. Will this prevent the MTU limit from correctly shrinking when an
  interface is brought down?
  - This is not a real issue since, as pointed out above, we can have at most a
    single WAN port in the system, so there is no point to reconfigure
    REG_WAN_MTU0 if this interface is stopped.

>  	else
>  		airoha_set_gdm_port_fwd_cfg(qdma->eth,
>  					    REG_GDM_FWD_CFG(port->id),
> @@ -1962,10 +1942,6 @@ static int airoha_enable_gdm2_loopback(struct airoha_gdm_dev *dev)
>  		      FIELD_PREP(LPBK_CHAN_MASK, chan) |
>  		      LBK_GAP_MODE_MASK | LBK_LEN_MODE_MASK |
>  		      LBK_CHAN_MODE_MASK | LPBK_EN_MASK);
> -	airoha_fe_rmw(eth, REG_GDM_LEN_CFG(AIROHA_GDM2_IDX),
> -		      GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> -		      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> -		      FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_MTU));
>  	/* Forward the traffic to the proper GDM port */
>  	pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
>  					       : FE_PSE_PORT_GDM4;
> @@ -2098,7 +2074,7 @@ static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>  
>  	WRITE_ONCE(netdev->mtu, mtu);
>  	if (port->users)
> -		airoha_set_port_mtu(dev->eth, port);
> +		airoha_dev_set_xmit_frame_size(netdev);
>  
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index d7ff8c5200e2..0c3fb6e5d7f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -23,6 +23,7 @@
>  #define AIROHA_MAX_DSA_PORTS		7
>  #define AIROHA_MAX_NUM_RSTS		3
>  #define AIROHA_MAX_MTU			9220
> +#define AIROHA_MAX_RX_SIZE		16128
>  #define AIROHA_MAX_PACKET_SIZE		2048
>  #define AIROHA_NUM_QOS_CHANNELS		4
>  #define AIROHA_NUM_QOS_QUEUES		8
> @@ -676,6 +677,7 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev);
>  bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
>  			     struct airoha_gdm_dev *dev);
>  
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev);
>  void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
>  bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
>  void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 42f4b0f21d17..e7c78293002a 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -97,6 +97,33 @@ void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport)
>  		      __field_prep(DFT_CPORT_MASK(fport), fe_cpu_port));
>  }
>  
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev)
> +{
> +	struct airoha_gdm_port *port = dev->port;
> +	struct airoha_eth *eth = dev->eth;
> +	int i, ppe_id, index;
> +	u32 len = 0;
> +
> +	for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> +		struct airoha_gdm_dev *d = port->devs[i];
> +		struct net_device *netdev;
> +
> +		if (!d)
> +			continue;
> +
> +		netdev = netdev_from_priv(d);
> +		if (netif_running(netdev))
> +			len = max_t(u32, len, netdev->mtu);
> +	}
> +	len += VLAN_ETH_HLEN;
> +
> +	ppe_id = !airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(eth, 1);
> +	index = port->id == AIROHA_GDM4_IDX ? 7 : port->id;
> +	airoha_fe_rmw(eth, REG_PPE_MTU(ppe_id, index),
> +		      FP_EGRESS_MTU_MASK(index),
> +		      __field_prep(FP_EGRESS_MTU_MASK(index), len));

- Does this leave the egress MTU limit uninitialized for other PPE engines?
  The patch removes the loop in airoha_ppe_hw_init() that previously initialized
  REG_PPE_MTU for all ports on all available PPEs. This function now only
  configures it for a single ppe_id.
  During cross-PPE routing (such as WAN-to-LAN), if PPE1 (WAN) forwards a packet
  to a LAN port, it will check REG_PPE_MTU(1, LAN_port). Since this register was
  only configured for PPE0, will the uninitialized limit (0) cause the packet to
  be dropped?
  - This is not a real issue since every airoha_gdm_dev/net_device is
    associated to a PPE engine/QDMA according to the logic in
    airoha_dev_open()/airoha_dev_set_qdma(). The other PPE engine's MTU will be
    updated according to the assigned net_device.

Regards,
Lorenzo

> +}
> +
>  static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  {
>  	u32 sram_ppe_num_data_entries = PPE_SRAM_NUM_ENTRIES, sram_num_entries;
> @@ -115,8 +142,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  		PPE_RAM_NUM_ENTRIES_SHIFT(sram_ppe_num_data_entries);
>  
>  	for (i = 0; i < eth->soc->num_ppe; i++) {
> -		int p;
> -
>  		airoha_fe_wr(eth, REG_PPE_TB_BASE(i),
>  			     ppe->foe_dma + sram_tb_size);
>  
> @@ -166,15 +191,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  		airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
>  		airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
>  				PPE_FLOW_CFG_IP6_6RD_MASK);
> -
> -		for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
> -			airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
> -				      FP0_EGRESS_MTU_MASK |
> -				      FP1_EGRESS_MTU_MASK,
> -				      FIELD_PREP(FP0_EGRESS_MTU_MASK,
> -						 AIROHA_MAX_MTU) |
> -				      FIELD_PREP(FP1_EGRESS_MTU_MASK,
> -						 AIROHA_MAX_MTU));
>  	}
>  
>  	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> @@ -196,6 +212,7 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
>  				 airoha_ppe_is_enabled(eth, 1);
>  			fport = airoha_get_fe_port(dev);
>  			airoha_ppe_set_cpu_port(dev, ppe_id, fport);
> +			airoha_ppe_set_xmit_frame_size(dev);
>  		}
>  	}
>  }
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..6fed63d013b4 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -327,9 +327,8 @@
>  #define PPE_SRAM_TABLE_EN_MASK			BIT(0)
>  
>  #define REG_PPE_MTU_BASE(_n)			(((_n) ? PPE2_BASE : PPE1_BASE) + 0x304)
> -#define REG_PPE_MTU(_m, _n)			(REG_PPE_MTU_BASE(_m) + ((_n) << 2))
> -#define FP1_EGRESS_MTU_MASK			GENMASK(29, 16)
> -#define FP0_EGRESS_MTU_MASK			GENMASK(13, 0)
> +#define REG_PPE_MTU(_m, _n)			(REG_PPE_MTU_BASE(_m) + (((_n) / 2) << 2))
> +#define FP_EGRESS_MTU_MASK(_n)			GENMASK(13 + (((_n) % 2) << 4), ((_n) % 2) << 4)
>  
>  #define REG_PPE_RAM_CTRL(_n)			(((_n) ? PPE2_BASE : PPE1_BASE) + 0x31c)
>  #define PPE_SRAM_CTRL_ACK_MASK			BIT(31)
> @@ -377,6 +376,10 @@
>  #define REG_SRC_PORT_FC_MAP6		0x2298
>  #define FC_ID_OF_SRC_PORT_MASK(_n)	GENMASK(4 + ((_n) << 3), ((_n) << 3))
>  
> +#define REG_WAN_MTU0			0x2300
> +#define WAN_MTU1_MASK			GENMASK(29, 16)
> +#define WAN_MTU0_MASK			GENMASK(13, 0)
> +
>  #define REG_CDM5_RX_OQ1_DROP_CNT	0x29d4
>  
>  /* QDMA */
> 
> ---
> base-commit: fd1269e454089abda0e4f9e5e25ecd02a90ab009
> change-id: 20260618-airoha-fix-rx-max-len-57654b661646
> 
> Best regards,
> -- 
> Lorenzo Bianconi <lorenzo@kernel.org>
> 

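As a worked illustration of the REG_PPE_MTU() and FP_EGRESS_MTU_MASK() macros
in the quoted patch, the userspace sketch below computes the register offset
and field mask for a given port index. The helper names (genmask32,
ppe_mtu_reg_offset, fp_egress_mtu_mask, set_port_mtu) are hypothetical
stand-ins, and offsets are relative to REG_PPE_MTU_BASE() because the PPE
base addresses are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l) on 32-bit values. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	assert(h < 32 && l <= h);
	return (UINT32_MAX >> (31 - h)) & (UINT32_MAX << l);
}

/* Byte offset from the MTU base: two port entries share one 32-bit word. */
static uint32_t ppe_mtu_reg_offset(unsigned int index)
{
	return (index / 2) << 2;
}

/* 14-bit field: bits 13:0 for an even index, bits 29:16 for an odd one. */
static uint32_t fp_egress_mtu_mask(unsigned int index)
{
	unsigned int shift = (index % 2) << 4;

	return genmask32(13 + shift, shift);
}

/* Read-modify-write of one port's field, leaving its neighbour untouched. */
static uint32_t set_port_mtu(uint32_t word, unsigned int index, uint32_t len)
{
	unsigned int shift = (index % 2) << 4;
	uint32_t mask = fp_egress_mtu_mask(index);

	return (word & ~mask) | ((len << shift) & mask);
}
```

For index 7 (the value the patch uses for AIROHA_GDM4_IDX) this gives offset
0xc and mask 0x3fff0000; index 6 shares the same word with mask 0x00003fff.
With an MTU of 1500 plus VLAN_ETH_HLEN (18), the field value is 1518.
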
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [RFC net-next 00/17] MPTCP KTLS support
From: Geliang Tang @ 2026-06-26  7:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Matthieu Baerts, Mat Martineau, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Neal Cardwell, Kuniyuki Iwashima,
	John Fastabend, Sabrina Dubroca, Hannes Reinecke, Geliang Tang,
	netdev, mptcp, Gang Yan, Zqiang
In-Reply-To: <20260622090059.5d1813dd@kernel.org>

On Mon, 2026-06-22 at 09:00 -0700, Jakub Kicinski wrote:
> On Mon, 22 Jun 2026 18:43:20 +0800 Geliang Tang wrote:
> > Subject: [RFC net-next 00/17] MPTCP KTLS support
> 
> Please no. We have a ton of unfixed bugs and may have to revert some of
> the features we dropped back in. I'd prefer to avoid large new bug
> surfaces until we reach an LTS release.

Sure, I can wait. During this time, I'll go over the implementation
more carefully to make sure there are no issues on the MPTCP side.

Thanks,
-Geliang

^ permalink raw reply

* Re: [PATCH] Subject: [PATCH] net: gro: fix double aggregation of flush-marked skbs
From: Greg KH @ 2026-06-26  7:47 UTC (permalink / raw)
  To: Shiming Cheng
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, matthias.bgg,
	angelogioacchino.delregno, willemb, imv4bel, alice,
	eilaimemedsnaimel, sd, lena.wang, stable
In-Reply-To: <20260626074059.25244-1-shiming.cheng@mediatek.com>

On Fri, Jun 26, 2026 at 03:40:59PM +0800, Shiming Cheng wrote:
> The new skb_gro_receive_list() function is missing a critical safety check
> present in the legacy skb_gro_receive() path. Specifically, it does not
> validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.
> 
> This allows already-GRO'd packets with existing frag_list to be
> re-aggregated into a new GRO session, corrupting the frag_list chain
> structure. When skb_segment() attempts to unpack these malformed packets,
> it encounters invalid state and triggers a kernel panic.
> 
> Scenario (Tethering/Device forwarding):
>   1. Driver: Generates aggregated packet P1 via LRO with frag_list
>   2. Dev A: Receives the aggregated fraglist packet with the flush flag set
>   3. Dev A: Re-enters GRO, skb_gro_receive_list() is called
>   4. Missing flush check allows re-aggregation despite flush flag
>   5. Frag_list chain becomes corrupted (loops or dangling refs)
>   6. Dev B: TX path calls skb_segment(), crashes on corrupted frag_list
> 
> Root cause in skb_segment():
>   The check at line ~4891:
>     if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
>         (skb_headlen(list_skb) == len || sg)) {
> 
>   When frag_list is corrupted by double aggregation and list_skb is a
>   NULL pointer taken from skb->next, skb_headlen(list_skb) dereferences
>   a NULL or corrupted pointer.
> 
> Call Trace:
>  skb_headlen(NULL skb)
>  skb_segment
>  tcp_gso_segment
>  tcp4_gso_segment
>  inet_gso_segment
>  skb_mac_gso_segment
>  __skb_gso_segment
>  skb_gso_segment
>  validate_xmit_skb
>  validate_xmit_skb_list
>  sch_direct_xmit
>  qdisc_restart
>  __qdisc_run
>  qdisc_run
>  net_tx_action
> 
> Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
> skb_gro_receive_list(), matching the defensive programming pattern of
> skb_gro_receive().
> 
> Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> ---
>  net/core/gro.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/gro.c b/net/core/gro.c
> index 35f2f708f010..076247c1e662 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  
>  int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
>  {
> -	if (unlikely(p->len + skb->len >= 65536))
> +	if (unlikely(p->len + skb->len >= 65536 ||
> +		     NAPI_GRO_CB(skb)->flush))
>  		return -E2BIG;
>  
>  	if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
> -- 
> 2.45.2
> 
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply

* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Maciej Fijalkowski @ 2026-06-26  7:42 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jason Xing, netdev, bpf, magnus.karlsson, stfomichev, kuba,
	pabeni, horms, bjorn
In-Reply-To: <aj1O4vKzCuodwgYL@devvm7509.cco0.facebook.com>

On Thu, Jun 25, 2026 at 09:05:28AM -0700, Stanislav Fomichev wrote:
> On 06/25, Jason Xing wrote:
> > On Thu, Jun 25, 2026 at 12:37 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> > > > On 06/23, Maciej Fijalkowski wrote:
> > > > > Hi,
> > > > >
> > > > > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > > > > consumed from the Tx ring are not consistently returned to userspace
> > > > > through the completion ring when the packet is later dropped as invalid.
> > > > >
> > > > > The affected cases are invalid or oversized multi-buffer Tx packets in
> > > > > both the generic and zero-copy paths. In these cases, the kernel can
> > > > > consume one or more Tx descriptors while building or validating a
> > > > > multi-buffer packet, then drop the packet before it reaches the device.
> > > > > Userspace still owns the UMEM buffers only after the corresponding
> > > > > addresses are returned through the CQ. Missing completions therefore
> > > > > make userspace lose track of those buffers.
> > > > >
> > > > > The generic path fixes cover three related cases:
> > > > > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> > > > > * continuation descriptors left in the Tx ring after xsk_build_skb()
> > > > >   reports overflow;
> > > > > * invalid descriptors encountered in the middle of a multi-buffer
> > > > >   packet, including the offending invalid descriptor itself.
> > > > >
> > > > > The zero-copy path is handled separately. The batched Tx parser now
> > > > > distinguishes descriptors that can be passed to the driver from
> > > > > descriptors that are consumed only because they belong to an invalid
> > > > > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > > > > address area and published in completion order, after any earlier
> > > > > driver-visible Tx descriptors.
> > > > >
> > > > > The ZC batching path can also retain drain state when userspace has not
> > > > > yet provided the end of an invalid multi-buffer packet. To keep this
> > > > > state local to the singular batched path, the series prevents a second
> > > > > Tx socket from joining the same pool while such drain state exists.
> > > > > During the singular-to-shared transition, Tx batching is gated,
> > > > > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > > > > existing socket still has pending drain state. This avoids adding
> > > > > multi-buffer drain handling to the shared-UMEM fallback path.
> > > > >
> > > > > The last two patches update xskxceiver so the tests account invalid
> > > > > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > > > > still not expecting those invalid packets on the Rx side.
> > > > >
> > > > > This is a follow-up to Jason's changes [0] which were addressing generic
> > > > > xmit only and this set allows me to pass full xskxceiver test suite run
> > > > > against ice driver.
> > > >
> > > > There is a fair amount of feedback from sashiko already :-( So the meta
> > > > question from me is: is it time to scrap our current approach where
> > > > we parse descriptor by descriptor? (and maintain half-baked skb and
> > > > half-consumed descriptor queues)
> > > >
> > > > Should we:
> > > >
> > > > 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> > > > PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> > > > and do a full stop here)
> > > > 2. now that we really know the number of valid descriptors -> reserve
> > > > the cq space (if not -> EAGAIN)
> > > > 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> > > > locally, don't ever create semi-initialized skb)
> > > > 4. construct the skb
> > > > 5. xmit
> > >
> > > Yeah generic xmit became utterly horrible, haven't gone through sashiko
> > > reviews yet, but bear in mind this set also aligns zc side to what was
> > > previously being addressed by Jason.
> > >
> > > I believe planned logistics were to get these fixes onto net and then
> > > Jason had an implementation of batching on generic xmit, directed towards
> > > -next and that's where we could address current flow.
> > 
> > Agreed. That's what I'm hoping for. There would be much more
> > discussion on how to do batch xmit in an elegant way, I believe.
> 
> This doesn't have to depend on the batch rewrite, we should be able to rewrite
> this non-zc in net, this is still technically fixes, not feature work..
> 
> There were already a couple of revisions with this drain_cont approach
> and every time I look at it, it feels like the cure is worse than the
> disease :-( Obviously not gonna stop you from going with the current approach,
> but these fixes feel a bit of a wasted effort to me (since the bugs keep
> coming and we are piling more complexity).

Well this is my fault as I took Jason's patches as-is and did not realize
Sashiko had issues with it. I *think* I got ZC side almost right so I'd
like to have at least one last round with trying to make the generic side
right...

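Steps 1 and 2 of the flow Stanislav proposes above (peek the whole packet
first, then check completion-queue space, and consume nothing otherwise) can
be sketched in userspace as follows. This is only an illustration under
stated assumptions: struct model_desc, MODEL_MAX_FRAGS and the model_*
functions are hypothetical, the ring is a plain array, and steps 3 to 5
(allocation, skb construction, transmit) are left as a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4 /* hypothetical stand-in for MAX_SKB_FRAGS */

struct model_desc {
	bool pkt_cont; /* stand-in for the PKT_CONT continuation flag */
};

/*
 * Step 1: peek, without consuming, until a descriptor without the
 * continuation flag is found. Returns the descriptor count of the packet,
 * 0 if the ring is empty, or -EOVERFLOW if the packet needs more than
 * MODEL_MAX_FRAGS descriptors or the ring ends mid-packet.
 */
static int model_peek_packet(const struct model_desc *ring, size_t avail)
{
	size_t n;

	if (avail == 0)
		return 0;
	for (n = 0; n < avail && n < MODEL_MAX_FRAGS; n++)
		if (!ring[n].pkt_cont)
			return (int)(n + 1);
	return -EOVERFLOW;
}

/*
 * Steps 1 and 2 combined: *consumed advances only when the whole packet
 * is known and the completion queue has room for every descriptor.
 */
static int model_xmit_one(const struct model_desc *ring, size_t avail,
			  size_t cq_free, size_t *consumed)
{
	int n = model_peek_packet(ring, avail);

	if (n <= 0)
		return n;
	if ((size_t)n > cq_free)
		return -EAGAIN;
	/* Steps 3-5 would go here: pre-allocate, build the skb, transmit. */
	*consumed += (size_t)n;
	return n;
}
```

The point of the ordering is that every error return happens before any
descriptor is consumed, so there is no half-built packet to unwind.
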

^ permalink raw reply

* [PATCH] Subject: [PATCH] net: gro: fix double aggregation of flush-marked skbs
From: Shiming Cheng @ 2026-06-26  7:40 UTC (permalink / raw)
  To: netdev, davem, edumazet, kuba, pabeni, horms, matthias.bgg,
	angelogioacchino.delregno, willemb, imv4bel, alice,
	eilaimemedsnaimel, sd
  Cc: lena.wang, stable, Shiming Cheng

The new skb_gro_receive_list() function is missing a critical safety check
present in the legacy skb_gro_receive() path. Specifically, it does not
validate NAPI_GRO_CB(skb)->flush before allowing packet aggregation.

This allows already-GRO'd packets with existing frag_list to be
re-aggregated into a new GRO session, corrupting the frag_list chain
structure. When skb_segment() attempts to unpack these malformed packets,
it encounters invalid state and triggers a kernel panic.

Scenario (Tethering/Device forwarding):
  1. Driver: Generates aggregated packet P1 via LRO with frag_list
  2. Dev A: Receives the aggregated fraglist packet with the flush flag set
  3. Dev A: Re-enters GRO, skb_gro_receive_list() is called
  4. Missing flush check allows re-aggregation despite flush flag
  5. Frag_list chain becomes corrupted (loops or dangling refs)
  6. Dev B: TX path calls skb_segment(), crashes on corrupted frag_list

Root cause in skb_segment():
  The check at line ~4891:
    if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
        (skb_headlen(list_skb) == len || sg)) {

  When frag_list is corrupted by double aggregation and list_skb is a
  NULL pointer taken from skb->next, skb_headlen(list_skb) dereferences
  a NULL or corrupted pointer.

Call Trace:
 skb_headlen(NULL skb)
 skb_segment
 tcp_gso_segment
 tcp4_gso_segment
 inet_gso_segment
 skb_mac_gso_segment
 __skb_gso_segment
 skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 qdisc_restart
 __qdisc_run
 qdisc_run
 net_tx_action

Fix: Add NAPI_GRO_CB(skb)->flush validation to the early-return check in
skb_gro_receive_list(), matching the defensive programming pattern of
skb_gro_receive().

Fixes: 9dc2c3cd6c11 ("net: add fraglist GRO/GSO support")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
---
 net/core/gro.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/gro.c b/net/core/gro.c
index 35f2f708f010..076247c1e662 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -229,7 +229,8 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)

 int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
 {
-	if (unlikely(p->len + skb->len >= 65536))
+	if (unlikely(p->len + skb->len >= 65536 ||
+		     NAPI_GRO_CB(skb)->flush))
 		return -E2BIG;

 	if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
-- 
2.45.2


* [PATCH net] nfc: pn533: hold a reference to the request skb during send_frame
From: Yinhao Hu @ 2026-06-26  7:34 UTC (permalink / raw)
  To: David Heidelberg
  Cc: Kees Cook, Krzysztof Kozlowski, Dan Carpenter, Jakub Kicinski,
	Samuel Ortiz, Michael Thalmeier, netdev, dzm91,
	hust-os-kernel-patches, Yinhao Hu

__pn533_send_async() publishes the command and then calls
dev->phy_ops->send_frame(). Once dev->cmd is set, an incoming frame
can be matched to this command: the I2C threaded IRQ runs
pn533_recv_frame(), which queues cmd_complete_work, and
pn533_send_async_complete() frees cmd->req with consume_skb().

On the I2C transport, pn533_i2c_send_frame() still dereferences the same
skb after i2c_master_send() returns, so a completion that races the
send can free the skb while the transport is still using it.

The request skb is owned by the command object and may be freed by
command completion at any time after dev->cmd is published, so the
transport send path must not assume it stays alive. Hold a temporary
reference to the request skb across the send_frame() call so the
transport always sees a live skb even if completion races the send.
Add a pn533_send_cmd_frame() helper and use it from all three send
paths.

Fixes: 9815c7cf22da ("NFC: pn533: Separate physical layer from the core implementation")
Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
---
 drivers/nfc/pn533/pn533.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
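
The reference-holding pattern described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct model_buf, model_get(), model_put(), model_racy_send() and
model_send_cmd_frame() are hypothetical names, and the buffer records a
"freed" flag instead of actually releasing memory so that the outcome can
be inspected safely.

```c
#include <assert.h>

/* Hypothetical refcounted buffer standing in for the request skb. */
struct model_buf {
	int refcnt;
	int freed;              /* set when the last reference is dropped */
	int touched_after_free; /* set if the transport reads a freed buffer */
};

static void model_get(struct model_buf *b)
{
	b->refcnt++;
}

static void model_put(struct model_buf *b)
{
	if (--b->refcnt == 0)
		b->freed = 1;
}

/*
 * Transport whose send is raced by command completion: the completion
 * path drops the command's reference, then the transport looks at the
 * buffer again.
 */
static int model_racy_send(struct model_buf *req)
{
	model_put(req); /* completion consumes the command's reference */
	if (req->freed)
		req->touched_after_free = 1; /* would be a use-after-free */
	return 0;
}

/* Hold a temporary reference across the send, as the patch's helper does. */
static int model_send_cmd_frame(struct model_buf *req,
				int (*send)(struct model_buf *))
{
	int rc;

	model_get(req);
	rc = send(req);
	model_put(req);
	return rc;
}
```

Calling model_racy_send() directly on a buffer with a single reference
marks it as touched after free, while going through
model_send_cmd_frame() keeps it alive until the send returns.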

diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c
index d7bdbc82e2ba..55bbfa32d695 100644
--- a/drivers/nfc/pn533/pn533.c
+++ b/drivers/nfc/pn533/pn533.c
@@ -434,6 +434,18 @@ static int pn533_send_async_complete(struct pn533 *dev)
 	return rc;
 }

+static int pn533_send_cmd_frame(struct pn533 *dev, struct pn533_cmd *cmd)
+{
+	struct sk_buff *req = cmd->req;
+	int rc;
+
+	skb_get(req);
+	dev->cmd = cmd;
+	rc = dev->phy_ops->send_frame(dev, req);
+	dev_kfree_skb(req);
+	return rc;
+}
+
 static int __pn533_send_async(struct pn533 *dev, u8 cmd_code,
 			      struct sk_buff *req,
 			      pn533_send_async_complete_t complete_cb,
@@ -458,8 +470,7 @@ static int __pn533_send_async(struct pn533 *dev, u8 cmd_code,
 	mutex_lock(&dev->cmd_lock);

 	if (!dev->cmd_pending) {
-		dev->cmd = cmd;
-		rc = dev->phy_ops->send_frame(dev, req);
+		rc = pn533_send_cmd_frame(dev, cmd);
 		if (rc) {
 			dev->cmd = NULL;
 			goto error;
@@ -529,8 +540,7 @@ static int pn533_send_cmd_direct_async(struct pn533 *dev, u8 cmd_code,

 	pn533_build_cmd_frame(dev, cmd_code, req);

-	dev->cmd = cmd;
-	rc = dev->phy_ops->send_frame(dev, req);
+	rc = pn533_send_cmd_frame(dev, cmd);
 	if (rc < 0) {
 		dev->cmd = NULL;
 		kfree(cmd);
@@ -569,8 +579,7 @@ static void pn533_wq_cmd(struct work_struct *work)

 	mutex_unlock(&dev->cmd_lock);

-	dev->cmd = cmd;
-	rc = dev->phy_ops->send_frame(dev, cmd->req);
+	rc = pn533_send_cmd_frame(dev, cmd);
 	if (rc < 0) {
 		dev->cmd = NULL;
 		dev_kfree_skb(cmd->req);
-- 
2.43.0


* [PATCH net] net: enetc: check the number of BDs needed for xdp_frame
From: wei.fang @ 2026-06-26  7:32 UTC (permalink / raw)
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend,
	sdf
  Cc: wei.fang, imx, netdev, linux-kernel, bpf

From: Wei Fang <wei.fang@nxp.com>

xdp_redirect_arr has ENETC_MAX_SKB_FRAGS entries, and each xdp_frame
needs one entry for its linear part plus one per fragment. A frame with
ENETC_MAX_SKB_FRAGS or more fragments therefore overruns the array.
Check the number of BDs a frame needs before converting it, and stop
transmitting at the first frame that does not fit.

Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 7 +++++++
 1 file changed, 7 insertions(+)
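
As a rough illustration of the bound being enforced, the sketch below
models the check in userspace C. MODEL_MAX_BDS is a hypothetical value
standing in for ENETC_MAX_SKB_FRAGS, and struct model_frame,
model_frame_fits() and model_xmit() are likewise made-up names; only the
"nr_frags + 1 must not exceed the array size" rule is taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_BDS 4 /* hypothetical; stands in for ENETC_MAX_SKB_FRAGS */

/* Hypothetical frame descriptor: only the fragment count matters here. */
struct model_frame {
	unsigned int nr_frags; /* fragments in addition to the linear part */
};

/* A frame needs one BD for the linear part plus one per fragment. */
static int model_frame_fits(const struct model_frame *f)
{
	return f->nr_frags + 1 <= MODEL_MAX_BDS;
}

/* Count frames accepted before the first one that does not fit. */
static size_t model_xmit(const struct model_frame *frames, size_t n)
{
	size_t sent = 0;

	for (size_t k = 0; k < n; k++) {
		if (!model_frame_fits(&frames[k]))
			break;
		sent++;
	}

	return sent;
}
```

As in the patch, the loop stops at the first oversized frame, so frames
queued behind it are not transmitted in that call.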

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index aa8a87124b10..8e3f345dd9aa 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1783,6 +1783,7 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
 {
 	struct enetc_tx_swbd xdp_redirect_arr[ENETC_MAX_SKB_FRAGS] = {0};
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
+	struct skb_shared_info *shinfo;
 	struct enetc_bdr *tx_ring;
 	int xdp_tx_bd_cnt, i, k;
 	int xdp_tx_frm_cnt = 0;
@@ -1798,6 +1799,12 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
 	prefetchw(ENETC_TXBD(*tx_ring, tx_ring->next_to_use));
 
 	for (k = 0; k < num_frames; k++) {
+		if (xdp_frame_has_frags(frames[k])) {
+			shinfo = xdp_get_shared_info_from_frame(frames[k]);
+			if (unlikely((shinfo->nr_frags + 1) > ENETC_MAX_SKB_FRAGS))
+				break;
+		}
+
 		xdp_tx_bd_cnt = enetc_xdp_frame_to_xdp_tx_swbd(tx_ring,
 							       xdp_redirect_arr,
 							       frames[k]);
-- 
2.34.1


