Netdev List


* [PATCH v2 net 11/11] octeontx2-af: npc: cn20k: Reject missing default-rule MCAM indices
From: Ratheesh Kannoth @ 2026-04-20  2:34 UTC
  To: netdev, linux-kernel
  Cc: sgoutham, davem, edumazet, kuba, pabeni, andrew+netdev,
	dan.carpenter, Ratheesh Kannoth, Suman Ghosh
In-Reply-To: <20260420023442.3295891-1-rkannoth@marvell.com>

npc_get_nixlf_mcam_index() returned USHRT_MAX from npc_cn20k_dft_rules_idx_get()
for broadcast, multicast, promiscuous, and unicast default-rule slots when no
rule was installed.  Callers could use that sentinel as a real MCAM index.
Return -EINVAL from the cn20k cases when the reserved entry is still USHRT_MAX.
Teach rvu_npc_update_flowkey_alg_idx() to handle negative indices and skip RSS
action updates when the lookup fails.  Add range checks to cn20k NPC MCAM helpers
so bogus indices are rejected before touching hardware.
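A minimal user-space sketch of the two checks described above follows. It assumes a 4096-entry MCAM, and the `sketch_*` names and that size are hypothetical, not taken from the driver. The unset-slot sentinel becomes -EINVAL, and any negative or out-of-range index is rejected before it is used.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for mcam->total_entries. */
#define SKETCH_TOTAL_ENTRIES 4096

/*
 * Map a reserved default-rule slot to an MCAM index. USHRT_MAX marks
 * "no rule installed" and is turned into -EINVAL instead of being
 * handed back as if it were a real index.
 */
static int sketch_dft_rule_index(unsigned short reserved)
{
	if (reserved == USHRT_MAX)
		return -EINVAL;
	return reserved;
}

/* Reject negative (error) and out-of-range indices before any HW access. */
static bool sketch_mcam_index_valid(int index, int total_entries)
{
	return index >= 0 && index < total_entries;
}
```

Without the sentinel check, USHRT_MAX (65535) is a non-negative int, so only a range check against the table size would stop it.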

Cc: Suman Ghosh <sumang@marvell.com>
Fixes: 6d1e70282f76 ("octeontx2-af: npc: cn20k: Use common APIs")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 12 ++++++++++
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 23 ++++++++++++-------
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index cb2fc21e0583..59d22511b742 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -808,6 +808,9 @@ npc_cn20k_enable_mcam_entry(struct rvu *rvu, int blkaddr,
 	u64 cfg, hw_prio;
 	u8 kw_type;
 
+	if (index < 0 || index >= mcam->total_entries)
+		return -EINVAL;
+
 	if (npc_mcam_idx_2_key_type(rvu, index, &kw_type))
 		return -EINVAL;
 
@@ -1055,6 +1058,9 @@ int npc_cn20k_config_mcam_entry(struct rvu *rvu, int blkaddr, int index,
 	int kw = 0;
 	u8 kw_type;
 
+	if (index < 0 || index >= mcam->total_entries)
+		return -EINVAL;
+
 	if (npc_mcam_idx_2_key_type(rvu, index, &kw_type))
 		return -EINVAL;
 
@@ -1147,6 +1153,9 @@ int npc_cn20k_copy_mcam_entry(struct rvu *rvu, int blkaddr, u16 src, u16 dest)
 	int bank, i, sb, db;
 	int dbank, sbank;
 
+	if (src >= mcam->total_entries || dest >= mcam->total_entries)
+		return -EINVAL;
+
 	dbank = npc_get_bank(mcam, dest);
 	sbank = npc_get_bank(mcam, src);
 
@@ -1212,6 +1221,9 @@ int npc_cn20k_read_mcam_entry(struct rvu *rvu, int blkaddr, u16 index,
 	int kw = 0, bank;
 	u8 kw_type;
 
+	if (index >= mcam->total_entries)
+		return -EINVAL;
+
 	if (npc_mcam_idx_2_key_type(rvu, index, &kw_type))
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index b2a9c8d0075b..be79a51b4b48 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -165,12 +165,20 @@ int npc_get_nixlf_mcam_index(struct npc_mcam *mcam,
 
 		switch (type) {
 		case NIXLF_BCAST_ENTRY:
+			if (bcast == USHRT_MAX)
+				return -EINVAL;
 			return bcast;
 		case NIXLF_ALLMULTI_ENTRY:
+			if (mcast == USHRT_MAX)
+				return -EINVAL;
 			return mcast;
 		case NIXLF_PROMISC_ENTRY:
+			if (promisc == USHRT_MAX)
+				return -EINVAL;
 			return promisc;
 		case NIXLF_UCAST_ENTRY:
+			if (ucast == USHRT_MAX)
+				return -EINVAL;
 			return ucast;
 		default:
 			return -EINVAL;
@@ -238,9 +246,6 @@ void npc_enable_mcam_entry(struct rvu *rvu, struct npc_mcam *mcam,
 	int actbank = bank;
 
 	if (is_cn20k(rvu->pdev)) {
-		if (index < 0 || index >= mcam->banksize * mcam->banks)
-			return;
-
 		npc_cn20k_enable_mcam_entry(rvu, blkaddr, index, enable);
 		return;
 	}
@@ -1128,7 +1133,7 @@ void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 		index = mcam_index;
 	}
 
-	if (index >= mcam->total_entries)
+	if (index < 0 || index >= mcam->total_entries)
 		return;
 
 	bank = npc_get_bank(mcam, index);
@@ -1173,16 +1178,18 @@ void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf,
 		/* If PF's promiscuous  entry is enabled,
 		 * Set RSS action for that entry as well
 		 */
-		npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index,
-						  blkaddr, alg_idx);
+		if (index >= 0)
+			npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index,
+							  blkaddr, alg_idx);
 
 		index = npc_get_nixlf_mcam_index(mcam, pcifunc,
 						 nixlf, NIXLF_ALLMULTI_ENTRY);
 		/* If PF's allmulti  entry is enabled,
 		 * Set RSS action for that entry as well
 		 */
-		npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index,
-						  blkaddr, alg_idx);
+		if (index >= 0)
+			npc_update_rx_action_with_alg_idx(rvu, action, pfvf, index,
+							  blkaddr, alg_idx);
 	}
 }
 
-- 
2.43.0



* Re: Re: [PATCH] net: hamachi: fix divide by zero in hamachi_init_one
From: 王明煜 @ 2026-04-20  2:41 UTC
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, tglx, mingo, netdev,
	linux-kernel
In-Reply-To: <e1d937ec-d909-4553-9f1e-c89f9b556d6a@lunn.ch>

Hi Andrew,

Regarding the "halt" and exploitability:
While a clean panic on bad hardware is a safe defense, this is an unhandled exception that breaks the module error-handling path. It causes memory corruption rather than a safe halt, which is dangerous for hot-pluggable VMs. The trace shows a KASAN out-of-bounds read and a subsequent soft lockup:

==================================================================
BUG: KASAN: slab-out-of-bounds in idempotent_init_module+0x54a/0x620
Read of size 8 at addr ffff8881191e7e08 by task systemd-udevd/169
...
watchdog: BUG: soft lockup - CPU#2 stuck for 245s! [systemd-udevd:174]
RIP: 0010:queued_spin_lock_slowpath+0x243/0xb90
==================================================================

Regarding reporting the attack:
You are entirely correct that the v1 patch passively masked the math error. This v2 patch actively detects the untrusted hardware state ((i & 0x80) && !(i & 0x7f)), loudly reports it via dev_err(), and safely aborts the probe with -EIO.
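The following small sketch shows why the v2 condition targets exactly the faulting case. It compares that condition with the divide-by-zero condition of the original `i ? 2000/(i&0x7f) : 0` expression over all 256 byte values. The `sketch_*` helpers are hypothetical, and the sketch assumes the register value fits in a byte, as returned by readb().

```c
#include <stdbool.h>

/* True when the original `i ? 2000/(i&0x7f) : 0` would divide by zero. */
static bool sketch_orig_divides_by_zero(unsigned char i)
{
	return i != 0 && (i & 0x7f) == 0;
}

/* The condition the v2 patch rejects with -EIO. */
static bool sketch_v2_rejects(unsigned char i)
{
	return (i & 0x80) && !(i & 0x7f);
}

/* Count byte values where the two predicates disagree (expected: none). */
static int sketch_count_mismatches(void)
{
	int mismatches = 0;

	for (int v = 0; v <= 0xff; v++)
		if (sketch_orig_divides_by_zero((unsigned char)v) !=
		    sketch_v2_rejects((unsigned char)v))
			mismatches++;
	return mismatches;
}
```

Both predicates are true only for 0x80, so the probe is aborted in exactly the case that previously faulted.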

Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
---
Changes in v2:
- Addressed feedback from Andrew Lunn regarding trusting bad hardware.
- Added explicit detection for the untrusted hardware state.
- Added dev_err() to report the attack and aborted the probe with -EIO to prevent error-path memory corruption. 

 drivers/net/ethernet/packetengines/hamachi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/packetengines/hamachi.c b/drivers/net/ethernet/packetengines/hamachi.c
index b0de7e9f12a5..2f487db2fb33 100644
--- a/drivers/net/ethernet/packetengines/hamachi.c
+++ b/drivers/net/ethernet/packetengines/hamachi.c
@@ -745,6 +745,14 @@ static int hamachi_init_one(struct pci_dev *pdev,
 		   dev->name, chip_tbl[chip_id].name, readl(ioaddr + ChipRev),
 		   ioaddr, dev->dev_addr, irq);
 	i = readb(ioaddr + PCIClkMeas);
+
+	if ((i & 0x80) && !(i & 0x7f)) {
+		dev_err(&pdev->dev, "Invalid PCIClkMeas value (0x%02x), hardware untrusted.\n", i);
+		unregister_netdev(dev);
+		ret = -EIO;
+		goto err_out_unmap_rx;
+	}
+
 	printk(KERN_INFO "%s:  %d-bit %d Mhz PCI bus (%d), Virtual Jumpers "
 		   "%2.2x, LPA %4.4x.\n",
 		   dev->name, readw(ioaddr + MiscStatus) & 1 ? 64 : 32,
-- 
2.34.1

2026-04-18 23:34:26 "Andrew Lunn" <andrew@lunn.ch> wrote:
> On Sat, Apr 18, 2026 at 08:18:04PM +0800, Mingyu Wang wrote:
> > During the hardware initialization phase in hamachi_init_one(), the driver
> > reads the PCIClkMeas register to calculate the PCI bus frequency.
> > 
> > The current code attempts to prevent a divide-by-zero error using a ternary
> > operator: `i ? 2000/(i&0x7f) : 0`. However, this check is flawed. The highest
> > bit of `i` (0x80) acts as a ready flag. If unreliable hardware or a malicious
> > virtual device returns a value where the ready bit is set but the lower 7 bits
> > are zero (e.g., 0x80), the condition `i` evaluates to true, but `(i & 0x7f)`
> > evaluates to 0. This results in a fatal divide-by-zero exception.
> > 
> > This bug was discovered during an automated virtual device fuzzing campaign
> > testing the hardware-software trust boundary. When the hardware returns 0x80,
> > it bypassed the readiness while-loop but triggered the divide error. In our
> > tests, this panic interrupted the module loading process, further triggering
> > a KASAN slab-out-of-bounds in the module error path, and ultimately leading
> > to a multi-core soft lockup and RCU stall.
> 
> Isn't that a good result of somebody trying to use emulated hardware
> with bad behaviour? The machine grinds to a halt? So it is not
> exploitable.
> 
> What happens with your patch in place? How are you reporting the
> hardware is attacking the machine, and the hardware should not be
> trusted?
> 
> 	Andrew


* [PATCH net 1/2] tcp: send a challenge ACK on SEG.ACK > SND.NXT
From: Jiayuan Chen @ 2026-04-20  2:54 UTC
  To: netdev
  Cc: Jiayuan Chen, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, linux-kernel, linux-kselftest
In-Reply-To: <20260420025428.101192-1-jiayuan.chen@linux.dev>

RFC 5961 Section 5.2 validates an incoming segment's ACK value
against the range [SND.UNA - MAX.SND.WND, SND.NXT] and states:

  "All incoming segments whose ACK value doesn't satisfy the above
   condition MUST be discarded and an ACK sent back."

Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack
Mitigation") opted Linux into this mitigation and implements the
challenge ACK on the lower side (SEG.ACK < SND.UNA - MAX.SND.WND),
but the symmetric upper side (SEG.ACK > SND.NXT) still takes the
pre-RFC-5961 path and silently returns
SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, even though RFC 793 Section 3.9
(now RFC 9293 Section 3.10.7.4) has always required:

  "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT)
   then send an ACK, drop the segment, and return."

Complete the mitigation by sending a challenge ACK on that branch,
reusing the existing tcp_send_challenge_ack() path which already
enforces the per-socket RFC 5961 Section 7 rate limit via
__tcp_oow_rate_limited().  FLAG_NO_CHALLENGE_ACK is honoured for
symmetry with the lower-edge case.
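Below is a sketch of the full RFC 5961 Section 5.2 acceptance check. It uses wraparound-safe comparisons in the style of the kernel's before()/after() helpers. The enum and function names are hypothetical. The sketch only classifies SEG.ACK and does not model the other branches of tcp_ack().

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons, modelled on before()/after(). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

enum sketch_ack_verdict {
	SKETCH_ACK_OK,		/* inside [SND.UNA - MAX.SND.WND, SND.NXT] */
	SKETCH_ACK_TOO_OLD,	/* lower edge: challenge ACK (already present) */
	SKETCH_ACK_UNSENT,	/* upper edge: challenge ACK (this patch) */
};

static enum sketch_ack_verdict
sketch_classify_ack(uint32_t seg_ack, uint32_t snd_una, uint32_t snd_nxt,
		    uint32_t max_snd_wnd)
{
	if (seq_before(seg_ack, snd_una - max_snd_wnd))
		return SKETCH_ACK_TOO_OLD;
	if (seq_after(seg_ack, snd_nxt))
		return SKETCH_ACK_UNSENT;
	return SKETCH_ACK_OK;
}
```

Both out-of-range verdicts lead to the same action (drop plus a rate-limited challenge ACK), which is the symmetry the patch restores.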

Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

---
I'm not sure whether the Fixes: tag (blamed commit) is appropriate here,
since I think the problem is a missing part of that implementation
rather than a regression; this may be better targeted at net-next.
---
 net/ipv4/tcp_input.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 021f745747c5..c2b6f05acdfa 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4284,11 +4284,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		goto old_ack;
 	}
 
-	/* If the ack includes data we haven't sent yet, discard
-	 * this segment (RFC793 Section 3.9).
+	/* If the ack includes data we haven't sent yet, drop the
+	 * segment.  RFC 793 Section 3.9 and RFC 5961 Section 5.2
+	 * require us to send an ACK back in that case.
 	 */
-	if (after(ack, tp->snd_nxt))
+	if (after(ack, tp->snd_nxt)) {
+		if (!(flag & FLAG_NO_CHALLENGE_ACK))
+			tcp_send_challenge_ack(sk, false);
 		return -SKB_DROP_REASON_TCP_ACK_UNSENT_DATA;
+	}
 
 	if (after(ack, prior_snd_una)) {
 		flag |= FLAG_SND_UNA_ADVANCED;
-- 
2.43.0



* [PATCH net 0/2] tcp: symmetric challenge ACK for SEG.ACK > SND.NXT
From: Jiayuan Chen @ 2026-04-20  2:54 UTC
  To: netdev
  Cc: Jiayuan Chen, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, linux-kernel, linux-kselftest

Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack
Mitigation") quotes RFC 5961 Section 5.2 in full, which requires
that any incoming segment whose ACK value falls outside
[SND.UNA - MAX.SND.WND, SND.NXT] MUST be discarded and an ACK sent
back.  Linux currently sends that challenge ACK only on the lower
edge (SEG.ACK < SND.UNA - MAX.SND.WND); on the symmetric upper edge
(SEG.ACK > SND.NXT) the segment is silently dropped with
SKB_DROP_REASON_TCP_ACK_UNSENT_DATA.

Patch 1 completes the mitigation by emitting a rate-limited challenge
ACK on that branch, reusing tcp_send_challenge_ack() and honouring
FLAG_NO_CHALLENGE_ACK for consistency with the lower-edge case.
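The per-socket rate limit can be sketched as shown below, loosely following the shape of __tcp_oow_rate_limited(). A challenge ACK is suppressed if the previous one was sent less than a minimum interval ago. The tick unit, the interval parameter and the `sketch_*` name are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if this out-of-window ACK must be suppressed.
 * *last_ack_time == 0 means "no ACK sent yet" (assumed convention).
 */
static bool sketch_oow_rate_limited(uint32_t now, uint32_t *last_ack_time,
				    uint32_t min_interval)
{
	if (*last_ack_time) {
		int32_t elapsed = (int32_t)(now - *last_ack_time);

		/* Too soon since the last challenge ACK: suppress this one. */
		if (elapsed >= 0 && (uint32_t)elapsed < min_interval)
			return true;
	}

	/* Allowed: remember when it was sent. */
	*last_ack_time = now;
	return false;
}
```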

Patch 2 adds a packetdrill selftest under
tools/testing/selftests/net/packetdrill/ that verifies the new
behaviour.

Jiayuan Chen (2):
  tcp: send a challenge ACK on SEG.ACK > SND.NXT
  selftests/net: packetdrill: cover challenge ACK on SEG.ACK > SND.NXT

 net/ipv4/tcp_input.c                          | 10 ++++--
 .../tcp_rfc5961_ack-beyond-snd-nxt.pkt        | 31 +++++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-beyond-snd-nxt.pkt

-- 
2.43.0


* [PATCH net 2/2] selftests/net: packetdrill: cover challenge ACK on SEG.ACK > SND.NXT
From: Jiayuan Chen @ 2026-04-20  2:54 UTC
  To: netdev
  Cc: Jiayuan Chen, Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima,
	David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, linux-kernel, linux-kselftest
In-Reply-To: <20260420025428.101192-1-jiayuan.chen@linux.dev>

Exercise the RFC 5961 Section 5.2 / RFC 793 Section 3.9 requirement
on the upper edge of the acceptable ACK range, mirroring the existing
coverage of the SEG.ACK < SND.UNA - MAX.SND.WND case.

After the peer ACKs data the receiver has never sent, the receiver
must respond with <SEQ = SND.NXT, ACK = RCV.NXT, CTL = ACK> and drop
the offending segment.  The script validates this exact response.
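The expected response can be expressed as a small sketch using the script's relative sequence numbers (SND.UNA = SND.NXT = RCV.NXT = 1). The `sketch_*` structs and function are hypothetical. They model only the reply fields, not packetdrill itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal view of the connection state used by the test. */
struct sketch_tcb {
	uint32_t snd_una, snd_nxt, rcv_nxt;
};

struct sketch_ack_seg {
	uint32_t seq, ack;	/* CTL = ACK is implied */
};

/*
 * If SEG.ACK acknowledges unsent data, fill *reply with
 * <SEQ = SND.NXT, ACK = RCV.NXT, CTL = ACK> and return true,
 * meaning the incoming segment must be dropped.
 */
static bool sketch_challenge_for_unsent(const struct sketch_tcb *tcb,
					uint32_t seg_ack,
					struct sketch_ack_seg *reply)
{
	if ((int32_t)(tcb->snd_nxt - seg_ack) >= 0)
		return false;	/* SEG.ACK <= SND.NXT: not this case */

	reply->seq = tcb->snd_nxt;
	reply->ack = tcb->rcv_nxt;
	return true;
}
```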

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../tcp_rfc5961_ack-beyond-snd-nxt.pkt        | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-beyond-snd-nxt.pkt

diff --git a/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-beyond-snd-nxt.pkt b/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-beyond-snd-nxt.pkt
new file mode 100644
index 000000000000..29d61a98d5ef
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-beyond-snd-nxt.pkt
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// RFC 5961 Section 5.2 / RFC 793 Section 3.9: when a received
+// segment's SEG.ACK acknowledges data we have never sent
+// (SEG.ACK > SND.NXT), the receiver MUST discard the segment and
+// send a challenge ACK back.
+//
+// Complement the existing mitigation for the symmetric
+// (SEG.ACK < SND.UNA - MAX.SND.WND) case.
+
+`./defaults.sh`
+
+   // Open listener socket
+    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   // Three-way handshake
+   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+   +0 > S. 0:0(0) ack 1 <...>
+ +.1 < . 1:1(0) ack 1 win 257
+   +0 accept(3, ..., ...) = 4
+
+   // Server has sent no data yet: SND.UNA = SND.NXT = 1, RCV.NXT = 1.
+   // Peer sends a pure ACK with SEG.ACK = 2, which is > SND.NXT.
+   +0 < . 1:1(0) ack 2 win 257
+
+   // Expect an immediate challenge ACK:
+   //   <SEQ = SND.NXT = 1, ACK = RCV.NXT = 1, CTL = ACK>
+   +0 > . 1:1(0) ack 1
-- 
2.43.0



* Re: [net-next v2 4/5] net: stmmac: starfive: Add JHB100 SGMII interface
From: Minda Chen @ 2026-04-20  2:25 UTC
  To: Sai Krishna Gajula, Alexandre Torgue, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maxime Coquelin, Emil Renner Berthing, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	devicetree@vger.kernel.org
In-Reply-To: <BYAPR18MB3735BA5A7D35AFFE8DF4BE67A0202@BYAPR18MB3735.namprd18.prod.outlook.com>


> 
> > -----Original Message-----
> > From: Minda Chen <minda.chen@starfivetech.com>
> > Sent: Friday, April 17, 2026 8:15 AM
> > To: Alexandre Torgue <alexandre.torgue@foss.st.com>; Andrew Lunn
> > <andrew+netdev@lunn.ch>; David S . Miller <davem@davemloft.net>; Eric
> > Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> > Abeni <pabeni@redhat.com>; Maxime Coquelin
> > <mcoquelin.stm32@gmail.com>; Emil Renner Berthing
> > <emil.renner.berthing@canonical.com>; Rob Herring
> > <robh+dt@kernel.org>; Krzysztof Kozlowski <krzk+dt@kernel.org>; Conor
> > Dooley <conor@kernel.org>; netdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org; linux-stm32@st-md-
> > mailman.stormreply.com; devicetree@vger.kernel.org; Minda Chen
> > <minda.chen@starfivetech.com>
> > Subject: [net-next v2 4/5] net: stmmac: starfive: Add JHB100 SGMII
> > interface
> >
> > Add JHB100 compatible and SGMII support. JHB100 soc contains
> > 2 SGMII interfaces and integrated with serdes PHY. SGMII with split
> > TX/RX MAC clock and need to set 2.5M/25M/125M TX/RX clock rate in
> > 10M/100M/1000M speed mode.
> >
> > Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> > ---
> >  .../ethernet/stmicro/stmmac/dwmac-starfive.c  | 54 ++++++++++++++-----
> >  1 file changed, 42 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> > index 16b955a6d77b..91698c763dac 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
> > @@ -26,6 +26,7 @@ struct starfive_dwmac_data {
> >  struct starfive_dwmac {
> >  	struct device *dev;
> >  	const struct starfive_dwmac_data *data;
> > +	struct clk *sgmii_rx;
> >  };
> >
> >  static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
> > @@ -68,6 +69,24 @@ static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
> >  	return 0;
> >  }
> >
> > +static int stmmac_starfive_sgmii_set_clk_rate(void *bsp_priv, struct clk *clk_tx_i,
> > +					      phy_interface_t interface, int speed)
> > +{
> 
> phy_interface_t interface is likely unused in stmmac_starfive_sgmii_set_clk_rate
> → may need __maybe_unused or (void)interface to avoid
> -Werror=unused-parameter on strict builds.
> 
Okay. Thanks for the review!

> > +	struct starfive_dwmac *dwmac = (void *)bsp_priv;
> > +	long rate = rgmii_clock(speed);
> > +	int ret;
> > +
> > +	/* MAC clock rate the same as RGMII */
> > +	if (rate < 0)
> > +		return 0;
> > +
> > +	ret = clk_set_rate(clk_tx_i, rate);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return clk_set_rate(dwmac->sgmii_rx, rate);
> > +}
> > +
> >  static int starfive_dwmac_probe(struct platform_device *pdev)
> >  {
> >  	struct plat_stmmacenet_data *plat_dat;
> > @@ -102,24 +121,34 @@ static int starfive_dwmac_probe(struct platform_device *pdev)
> >  		return dev_err_probe(&pdev->dev, PTR_ERR(clk_gtx),
> >  				     "error getting gtx clock\n");
> >
> > -	/* Generally, the rgmii_tx clock is provided by the internal clock,
> > -	 * which needs to match the corresponding clock frequency according
> > -	 * to different speeds. If the rgmii_tx clock is provided by the
> > -	 * external rgmii_rxin, there is no need to configure the clock
> > -	 * internally, because rgmii_rxin will be adaptively adjusted.
> > -	 */
> > -	if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
> > -		plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
> > +	if (plat_dat->phy_interface == PHY_INTERFACE_MODE_SGMII) {
> > +		dwmac->sgmii_rx = devm_clk_get_enabled(&pdev->dev, "sgmii_rx");
> > +		if (IS_ERR(dwmac->sgmii_rx))
> > +			return dev_err_probe(&pdev->dev,
> > +					     PTR_ERR(dwmac->sgmii_rx),
> > +					     "error getting sgmii rx clock\n");
> > +		plat_dat->set_clk_tx_rate = stmmac_starfive_sgmii_set_clk_rate;
> > +	} else {
> > +		/*
> > +		 * Generally, the rgmii_tx clock is provided by the internal clock,
> > +		 * which needs to match the corresponding clock frequency according
> > +		 * to different speeds. If the rgmii_tx clock is provided by the
> > +		 * external rgmii_rxin, there is no need to configure the clock
> > +		 * internally, because rgmii_rxin will be adaptively adjusted.
> > +		 */
> > +		if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
> > +			plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
> > +
> > +		err = starfive_dwmac_set_mode(plat_dat);
> > +		if (err)
> > +			return err;
> > +	}
> >
> >  	dwmac->dev = &pdev->dev;
> >  	plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
> >  	plat_dat->bsp_priv = dwmac;
> >  	plat_dat->dma_cfg->dche = true;
> >
> > -	err = starfive_dwmac_set_mode(plat_dat);
> > -	if (err)
> > -		return err;
> > -
> >  	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
> >  }
> >
> > @@ -130,6 +159,7 @@ static const struct starfive_dwmac_data jh7100_data = {
> >  static const struct of_device_id starfive_dwmac_match[] = {
> >  	{ .compatible = "starfive,jh7100-dwmac", .data = &jh7100_data },
> >  	{ .compatible = "starfive,jh7110-dwmac" },
> > +	{ .compatible = "starfive,jhb100-dwmac" },
> >  	{ /* sentinel */ }
> >  };
> >  MODULE_DEVICE_TABLE(of, starfive_dwmac_match);
> > --
> > 2.17.1
> >
> Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
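Here is a user-space sketch of the SGMII clock programming described in the commit message. The speed is mapped to 2.5/25/125 MHz, as the patch does via rgmii_clock(), and the same rate is applied to the separate TX and RX clocks. The `sketch_clk` type and the `sketch_*` functions are hypothetical stand-ins for the clk API.

```c
#include <errno.h>

/* Hypothetical stand-in for a struct clk: records the last rate set. */
struct sketch_clk {
	long rate;
};

/* Speed (Mb/s) to MAC clock rate (Hz): 10 -> 2.5M, 100 -> 25M, 1000 -> 125M. */
static long sketch_rgmii_clock(int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
		return 2500000;
	case 100:
		return 25000000;
	case 1000:
		return 125000000;
	default:
		return -EINVAL;
	}
}

static int sketch_clk_set_rate(struct sketch_clk *clk, long rate)
{
	clk->rate = rate;
	return 0;
}

/* Split TX/RX MAC clocks get the same rate; unknown speeds are ignored. */
static int sketch_sgmii_set_clk_rate(struct sketch_clk *tx,
				     struct sketch_clk *rx, int speed_mbps)
{
	long rate = sketch_rgmii_clock(speed_mbps);
	int ret;

	if (rate < 0)
		return 0;

	ret = sketch_clk_set_rate(tx, rate);
	if (ret)
		return ret;

	return sketch_clk_set_rate(rx, rate);
}
```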


* Re: [PATCH iwl-next v2 3/3] igc: add support for forcing link speed without autonegotiation
From: Abdul Rahim, Faizal @ 2026-04-20  3:20 UTC
  To: Simon Horman, khai.wen.tan
  Cc: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
	edumazet, kuba, pabeni, intel-wired-lan, netdev, linux-kernel,
	faizal.abdul.rahim, hong.aun.looi, khai.wen.tan
In-Reply-To: <20260418164837.380985-2-horms@kernel.org>



On 19/4/2026 12:48 am, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> igc: add support for forcing link speed without autonegotiation
> 
> This commit adds support to allow users to force 10/100 Mb/s link speed
> and duplex via ethtool when autonegotiation is disabled.
> 
>> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> index cfcbf2fdad6ea..5bd37d1be1688 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> 
> [ ... ]
> 
>> @@ -2000,6 +2013,41 @@ static int igc_ethtool_get_link_ksettings(struct net_device *netdev,
>>   	return 0;
>>   }
>>   
>> +/**
>> + * igc_handle_autoneg_disabled - Configure forced speed/duplex settings
>> + * @adapter: private driver structure
>> + * @speed: requested speed (must be SPEED_10 or SPEED_100)
>> + * @duplex: requested duplex
>> + *
>> + * Records forced speed/duplex when autoneg is disabled.
>> + * Caller must validate speed before calling this function.
>> + */
>> +static void igc_handle_autoneg_disabled(struct igc_adapter *adapter, u32 speed,
>> +					u8 duplex)
>> +{
>> +	struct igc_mac_info *mac = &adapter->hw.mac;
>> +
>> +	switch (speed) {
>> +	case SPEED_10:
>> +		mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
>> +			IGC_FORCED_10F : IGC_FORCED_10H;
>> +		break;
>> +	case SPEED_100:
>> +		mac->forced_speed_duplex = (duplex == DUPLEX_FULL) ?
>> +			IGC_FORCED_100F : IGC_FORCED_100H;
>> +		break;
>> +	default:
>> +		WARN_ONCE(1, "Unsupported speed %u\n", speed);
>> +		return;
>> +	}
>> +
>> +	mac->autoneg_enabled = false;
>> +
>> +	/* Half-duplex cannot support flow control per IEEE 802.3 */
>> +	if (duplex == DUPLEX_HALF)
>> +		adapter->hw.fc.requested_mode = igc_fc_none;
>> +}
> 
> If userspace issues an ethtool command without specifying duplex, the duplex
> value can be DUPLEX_UNKNOWN. In this case, forced_speed_duplex correctly
> defaults to half-duplex since duplex == DUPLEX_FULL evaluates to false.
> 
> However, since DUPLEX_UNKNOWN is not equal to DUPLEX_HALF, the check above
> fails, and the requested flow control mode remains unchanged. Could this leave
> flow control enabled on a forced half-duplex link, violating IEEE 802.3?
> Should this check be based on the resolved mac->forced_speed_duplex state
> instead?
>

You're right, thanks for pointing that out.

That said, it feels simpler to address it with [1]:
  if (duplex != DUPLEX_FULL)
          adapter->hw.fc.requested_mode = igc_fc_none;

Rather than [2]:
  if (mac->forced_speed_duplex == IGC_FORCED_10H ||
      mac->forced_speed_duplex == IGC_FORCED_100H)
          adapter->hw.fc.requested_mode = igc_fc_none;

Are you okay with [1]?
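A standalone sketch of option [1] follows. It shows that DUPLEX_UNKNOWN also clears flow control. All constants and names are hypothetical stand-ins for the ethtool and igc definitions quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the ethtool/igc constants. */
enum { SK_SPEED_10 = 10, SK_SPEED_100 = 100 };
enum { SK_DUPLEX_HALF = 0, SK_DUPLEX_FULL = 1, SK_DUPLEX_UNKNOWN = 0xff };
enum sk_forced { SK_FORCED_NONE, SK_FORCED_10H, SK_FORCED_10F,
		 SK_FORCED_100H, SK_FORCED_100F };
enum sk_fc { SK_FC_NONE, SK_FC_FULL };

struct sk_link {
	enum sk_forced forced;
	enum sk_fc fc;
};

/* Option [1]: anything not explicitly full duplex disables flow control. */
static void sk_force_link(struct sk_link *l, uint32_t speed, uint8_t duplex)
{
	int full = (duplex == SK_DUPLEX_FULL);

	switch (speed) {
	case SK_SPEED_10:
		l->forced = full ? SK_FORCED_10F : SK_FORCED_10H;
		break;
	case SK_SPEED_100:
		l->forced = full ? SK_FORCED_100F : SK_FORCED_100H;
		break;
	default:
		return;	/* caller is expected to have rejected this */
	}

	if (!full)
		l->fc = SK_FC_NONE;
}
```

The flow-control decision uses the same `full` test as the speed/duplex selection, so the two can no longer disagree.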

> [ ... ]
> 
>> @@ -2071,14 +2120,20 @@ igc_ethtool_set_link_ksettings(struct net_device *netdev,
>>   		}
>>   	}
>>   
>> +	if (cmd->base.autoneg == AUTONEG_DISABLE &&
>> +	    cmd->base.speed != SPEED_10 && cmd->base.speed != SPEED_100) {
>> +		netdev_info(dev, "Unsupported speed for forced link\n");
>> +		return -EINVAL;
>> +	}
>> +
>>   	while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
>>   		usleep_range(1000, 2000);
> 
> This isn't a bug introduced by this patch, but does this ad-hoc
> synchronization using bit flags circumvent proper locking guarantees? The
> networking subsystem guidelines forbid using bit flags as ad-hoc locks.
> 
> Using a bit flag in a polling loop bypasses lockdep analysis, fairness, and
> strict memory ordering. Could this be replaced with a real synchronization
> mechanism like a mutex?

It looks like a worthwhile cleanup. However, it likely doesn't belong in
this series, since the synchronization pattern predates these patches and
is used throughout the igc driver (set_ringparam, set_pauseparam,
set_channels, etc.). Could we address it in a separate patch series and
align the other code paths at the same time?



* [PATCH net v2 2/2] 8021q: delete cleared egress QoS mappings
From: Ren Wei @ 2026-04-20  3:18 UTC
  To: netdev, horms, edumazet
  Cc: andrew+netdev, davem, kuba, pabeni, kees, yuantan098, ylong030,
	n05ec, yifanwucs, tomapufckgml, bird
In-Reply-To: <9136768189f8c6d3f824f476c62d2fa1111688e8.1776647968.git.yuantan098@gmail.com>

From: Longxuan Yu <ylong030@ucr.edu>

vlan_dev_set_egress_priority() currently keeps cleared egress
priority mappings in the hash as tombstones. Repeated set/clear cycles
with distinct skb priorities therefore accumulate mapping nodes until
device teardown and leak memory.

Delete mappings when vlan_prio is cleared instead of keeping tombstones.
Now that the egress mapping lists are RCU protected, the node can be
unlinked safely and freed after a grace period.
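The pointer-to-pointer walk used by the patch can be sketched in user space as follows. This is a single-threaded simplification with hypothetical `sk_*` names. The node is freed immediately here, whereas the patch uses rcu_assign_pointer() and kfree_rcu() so that concurrent readers remain safe.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified mapping node; the kernel version also embeds an rcu_head. */
struct sk_map {
	uint32_t priority;
	uint16_t vlan_qos;
	struct sk_map *next;
};

/*
 * Set or clear the mapping for skb_prio in one bucket. Walking with a
 * pointer to the previous "next" field (mpp) lets a match be unlinked
 * without special-casing the list head.
 */
static int sk_set_egress(struct sk_map **head, uint32_t skb_prio,
			 uint16_t vlan_qos, unsigned int *nr)
{
	struct sk_map **mpp = head;
	struct sk_map *mp;

	for (mp = *mpp; mp; mpp = &mp->next, mp = *mpp) {
		if (mp->priority != skb_prio)
			continue;
		if (!vlan_qos) {
			*mpp = mp->next;	/* unlink instead of leaving a tombstone */
			(*nr)--;
			free(mp);
		} else {
			mp->vlan_qos = vlan_qos;
		}
		return 0;
	}

	if (!vlan_qos)
		return 0;	/* clearing a non-existent mapping: nothing to do */

	mp = malloc(sizeof(*mp));
	if (!mp)
		return -1;
	mp->priority = skb_prio;
	mp->vlan_qos = vlan_qos;
	mp->next = *head;	/* insert at the bucket head */
	*head = mp;
	(*nr)++;
	return 0;
}
```

With deletion in place, repeated set/clear cycles with distinct priorities leave the bucket empty instead of accumulating nodes.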

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@kernel.org
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/8021q/vlan_dev.c     | 20 ++++++++++++++------
 net/8021q/vlan_netlink.c |  4 ----
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index a5340932b657..7aa3af8b10ea 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -172,26 +172,34 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
 				 u32 skb_prio, u16 vlan_prio)
 {
 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+	struct vlan_priority_tci_mapping __rcu **mpp;
 	struct vlan_priority_tci_mapping *mp;
 	struct vlan_priority_tci_mapping *np;
 	u32 bucket = skb_prio & 0xF;
 	u32 vlan_qos = (vlan_prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK;
 
 	/* See if a priority mapping exists.. */
-	mp = rtnl_dereference(vlan->egress_priority_map[bucket]);
+	mpp = &vlan->egress_priority_map[bucket];
+	mp = rtnl_dereference(*mpp);
 	while (mp) {
 		if (mp->priority == skb_prio) {
-			if (mp->vlan_qos && !vlan_qos)
+			if (!vlan_qos) {
+				rcu_assign_pointer(*mpp, rtnl_dereference(mp->next));
 				vlan->nr_egress_mappings--;
-			else if (!mp->vlan_qos && vlan_qos)
-				vlan->nr_egress_mappings++;
-			WRITE_ONCE(mp->vlan_qos, vlan_qos);
+				kfree_rcu(mp, rcu);
+			} else {
+				WRITE_ONCE(mp->vlan_qos, vlan_qos);
+			}
 			return 0;
 		}
-		mp = rtnl_dereference(mp->next);
+		mpp = &mp->next;
+		mp = rtnl_dereference(*mpp);
 	}
 
 	/* Create a new mapping then. */
+	if (!vlan_qos)
+		return 0;
+
 	np = kmalloc_obj(struct vlan_priority_tci_mapping);
 	if (!np)
 		return -ENOBUFS;
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index a5b16833e2ce..368d53ca7d87 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -263,10 +263,6 @@ static int vlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 			for (pm = rcu_dereference_rtnl(vlan->egress_priority_map[i]); pm;
 			     pm = rcu_dereference_rtnl(pm->next)) {
 				u16 vlan_qos = READ_ONCE(pm->vlan_qos);
-
-				if (!vlan_qos)
-					continue;
-
 				m.from = pm->priority;
 				m.to   = (vlan_qos >> 13) & 0x7;
 				if (nla_put(skb, IFLA_VLAN_QOS_MAPPING,
-- 
2.43.0



* [PATCH net v2 1/2] 8021q: use RCU for egress QoS mappings
From: Ren Wei @ 2026-04-20  3:18 UTC
  To: netdev, horms, edumazet
  Cc: andrew+netdev, davem, kuba, pabeni, kees, yuantan098, ylong030,
	n05ec, yifanwucs, tomapufckgml, bird

From: Longxuan Yu <ylong030@ucr.edu>

The TX fast path and reporting paths walk egress QoS mappings without
RTNL. Convert the mapping lists to RCU-protected pointers, use RCU
reader annotations in readers, and defer freeing mapping nodes with an
embedded rcu_head.

This prepares the egress QoS mapping code for safe removal of mapping
nodes in a follow-up change while preserving the current behavior.
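A rough user-space analogue of the conversion follows. It uses C11 atomics and hypothetical `sk_*` names. Writers fully initialise a node before publishing it with a release store, standing in for rcu_assign_pointer(). Readers use acquire loads as a stronger substitute for rcu_dereference(). Grace periods and freeing are deliberately not modelled.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct sk_node {
	uint32_t priority;
	_Atomic uint16_t vlan_qos;	/* READ_ONCE()/WRITE_ONCE() analogue */
	struct sk_node *_Atomic next;
};

/* 16 buckets indexed by skb priority & 0xF, as in egress_priority_map[]. */
static struct sk_node *_Atomic sk_map[16];

/* Reader: safe against a concurrent sk_add_mapping() on the same bucket. */
static uint16_t sk_get_egress_qos(uint32_t skprio)
{
	struct sk_node *mp = atomic_load_explicit(&sk_map[skprio & 0xF],
						  memory_order_acquire);

	for (; mp; mp = atomic_load_explicit(&mp->next, memory_order_acquire))
		if (mp->priority == skprio)
			return atomic_load_explicit(&mp->vlan_qos,
						    memory_order_relaxed);
	return 0;
}

/* Writer (assumed serialised, as RTNL does in the kernel). */
static int sk_add_mapping(uint32_t skprio, uint16_t vlan_qos)
{
	struct sk_node *np = malloc(sizeof(*np));
	uint32_t bucket = skprio & 0xF;

	if (!np)
		return -1;
	np->priority = skprio;
	atomic_init(&np->vlan_qos, vlan_qos);
	atomic_init(&np->next, atomic_load_explicit(&sk_map[bucket],
						     memory_order_relaxed));
	/* Publish only after every field is set. */
	atomic_store_explicit(&sk_map[bucket], np, memory_order_release);
	return 0;
}
```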

Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---

changes in v2:
- split the original single patch into RCU preparation and bug-fix patches
- embed an rcu_head in mapping nodes and use kfree_rcu(pm, rcu)
- keep the existing tombstone-skipping check in vlan_fill_info() so this
patch preserves behavior
- use rcu_dereference_rtnl() for netlink dump traversal
- In patch 1, sample pm->vlan_qos once with READ_ONCE() in vlan_fill_info()
  and reuse the local value, so the RCU preparatory change does not mix a
  plain load for the tombstone check with a separate READ_ONCE() for the
  exported value
- v1 Link:
https://lore.kernel.org/all/b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu/

We also still think the Fixes: tag should point to 1da177e4c3f4, not
b020cb488586, because the underlying bug already exists in 1da177e4c3f4:
clearing an egress QoS mapping leaves the mapping node in the hash instead
of deleting it. Commit b020cb488586 only added nr_egress_mappings
accounting for that pre-existing behavior; it did not introduce the bug
itself.

 include/linux/if_vlan.h  | 25 ++++++++++++++++---------
 net/8021q/vlan_dev.c     | 31 ++++++++++++++++---------------
 net/8021q/vlan_netlink.c | 10 ++++++----
 net/8021q/vlanproc.c     | 12 ++++++++----
 4 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index e6272f9c5e42..20cc16ea4e5a 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -147,11 +147,13 @@ extern __be16 vlan_dev_vlan_proto(const struct net_device *dev);
  *	@priority: skb priority
  *	@vlan_qos: vlan priority: (skb->priority << 13) & 0xE000
  *	@next: pointer to next struct
+ *	@rcu: used for deferred freeing of mapping nodes
  */
 struct vlan_priority_tci_mapping {
 	u32					priority;
 	u16					vlan_qos;
-	struct vlan_priority_tci_mapping	*next;
+	struct vlan_priority_tci_mapping __rcu	*next;
+	struct rcu_head			rcu;
 };
 
 struct proc_dir_entry;
@@ -177,7 +179,7 @@ struct vlan_dev_priv {
 	unsigned int				nr_ingress_mappings;
 	u32					ingress_priority_map[8];
 	unsigned int				nr_egress_mappings;
-	struct vlan_priority_tci_mapping	*egress_priority_map[16];
+	struct vlan_priority_tci_mapping __rcu	*egress_priority_map[16];
 
 	__be16					vlan_proto;
 	u16					vlan_id;
@@ -209,19 +211,24 @@ static inline u16
 vlan_dev_get_egress_qos_mask(struct net_device *dev, u32 skprio)
 {
 	struct vlan_priority_tci_mapping *mp;
+	u16 vlan_qos = 0;
 
-	smp_rmb(); /* coupled with smp_wmb() in vlan_dev_set_egress_priority() */
+	rcu_read_lock();
 
-	mp = vlan_dev_priv(dev)->egress_priority_map[(skprio & 0xF)];
+	mp = rcu_dereference(vlan_dev_priv(dev)->egress_priority_map[skprio & 0xF]);
 	while (mp) {
 		if (mp->priority == skprio) {
-			return mp->vlan_qos; /* This should already be shifted
-					      * to mask correctly with the
-					      * VLAN's TCI */
+			vlan_qos = READ_ONCE(mp->vlan_qos);
+			break;
 		}
-		mp = mp->next;
+		mp = rcu_dereference(mp->next);
 	}
-	return 0;
+	rcu_read_unlock();
+
+	/* This should already be shifted to mask correctly with
+	 * the VLAN's TCI.
+	 */
+	return vlan_qos;
 }
 
 extern bool vlan_do_receive(struct sk_buff **skb);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index c40f7d5c4fca..a5340932b657 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -172,39 +172,34 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
 				 u32 skb_prio, u16 vlan_prio)
 {
 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
-	struct vlan_priority_tci_mapping *mp = NULL;
+	struct vlan_priority_tci_mapping *mp;
 	struct vlan_priority_tci_mapping *np;
+	u32 bucket = skb_prio & 0xF;
 	u32 vlan_qos = (vlan_prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK;
 
 	/* See if a priority mapping exists.. */
-	mp = vlan->egress_priority_map[skb_prio & 0xF];
+	mp = rtnl_dereference(vlan->egress_priority_map[bucket]);
 	while (mp) {
 		if (mp->priority == skb_prio) {
 			if (mp->vlan_qos && !vlan_qos)
 				vlan->nr_egress_mappings--;
 			else if (!mp->vlan_qos && vlan_qos)
 				vlan->nr_egress_mappings++;
-			mp->vlan_qos = vlan_qos;
+			WRITE_ONCE(mp->vlan_qos, vlan_qos);
 			return 0;
 		}
-		mp = mp->next;
+		mp = rtnl_dereference(mp->next);
 	}
 
 	/* Create a new mapping then. */
-	mp = vlan->egress_priority_map[skb_prio & 0xF];
 	np = kmalloc_obj(struct vlan_priority_tci_mapping);
 	if (!np)
 		return -ENOBUFS;
 
-	np->next = mp;
 	np->priority = skb_prio;
 	np->vlan_qos = vlan_qos;
-	/* Before inserting this element in hash table, make sure all its fields
-	 * are committed to memory.
-	 * coupled with smp_rmb() in vlan_dev_get_egress_qos_mask()
-	 */
-	smp_wmb();
-	vlan->egress_priority_map[skb_prio & 0xF] = np;
+	RCU_INIT_POINTER(np->next, rtnl_dereference(vlan->egress_priority_map[bucket]));
+	rcu_assign_pointer(vlan->egress_priority_map[bucket], np);
 	if (vlan_qos)
 		vlan->nr_egress_mappings++;
 	return 0;
@@ -604,11 +599,17 @@ void vlan_dev_free_egress_priority(const struct net_device *dev)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
-		while ((pm = vlan->egress_priority_map[i]) != NULL) {
-			vlan->egress_priority_map[i] = pm->next;
-			kfree(pm);
+		pm = rtnl_dereference(vlan->egress_priority_map[i]);
+		RCU_INIT_POINTER(vlan->egress_priority_map[i], NULL);
+		while (pm) {
+			struct vlan_priority_tci_mapping *next;
+
+			next = rtnl_dereference(pm->next);
+			kfree_rcu(pm, rcu);
+			pm = next;
 		}
 	}
+	vlan->nr_egress_mappings = 0;
 }
 
 static void vlan_dev_uninit(struct net_device *dev)
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index a000b1ef0520..a5b16833e2ce 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -260,13 +260,15 @@ static int vlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 			goto nla_put_failure;
 
 		for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
-			for (pm = vlan->egress_priority_map[i]; pm;
-			     pm = pm->next) {
-				if (!pm->vlan_qos)
+			for (pm = rcu_dereference_rtnl(vlan->egress_priority_map[i]); pm;
+			     pm = rcu_dereference_rtnl(pm->next)) {
+				u16 vlan_qos = READ_ONCE(pm->vlan_qos);
+
+				if (!vlan_qos)
 					continue;
 
 				m.from = pm->priority;
-				m.to   = (pm->vlan_qos >> 13) & 0x7;
+				m.to   = (vlan_qos >> 13) & 0x7;
 				if (nla_put(skb, IFLA_VLAN_QOS_MAPPING,
 					    sizeof(m), &m))
 					goto nla_put_failure;
diff --git a/net/8021q/vlanproc.c b/net/8021q/vlanproc.c
index fa67374bda49..0e424e0895b7 100644
--- a/net/8021q/vlanproc.c
+++ b/net/8021q/vlanproc.c
@@ -262,15 +262,19 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset)
 		   vlan->ingress_priority_map[7]);
 
 	seq_printf(seq, " EGRESS priority mappings: ");
+	rcu_read_lock();
 	for (i = 0; i < 16; i++) {
-		const struct vlan_priority_tci_mapping *mp
-			= vlan->egress_priority_map[i];
+		const struct vlan_priority_tci_mapping *mp =
+			rcu_dereference(vlan->egress_priority_map[i]);
 		while (mp) {
+			u16 vlan_qos = READ_ONCE(mp->vlan_qos);
+
 			seq_printf(seq, "%u:%d ",
-				   mp->priority, ((mp->vlan_qos >> 13) & 0x7));
-			mp = mp->next;
+				   mp->priority, ((vlan_qos >> 13) & 0x7));
+			mp = rcu_dereference(mp->next);
 		}
 	}
+	rcu_read_unlock();
 	seq_puts(seq, "\n");
 
 	return 0;
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] net: lpc_eth: Fix a possible memory leak in lpc_mii_probe()
From: Ma Ke @ 2026-04-20  3:24 UTC (permalink / raw)
  To: vz
  Cc: alexandre.belloni, andrew+netdev, davem, edumazet, kuba,
	linux-arm-kernel, linux-kernel, make24, netdev, pabeni,
	piotr.wojtaszczyk, stable
In-Reply-To: <60dea9e5-9890-49ab-b806-713c388d6e08@mleia.com>

>Hello Ma Ke.
>
>On 4/1/26 16:18, Ma Ke wrote:
>> On 3/30/26 13:04, Vladimir Zapolskiy wrote:
>>> On 3/30/26 11:16, Ma Ke wrote:
>>>> lpc_mii_probe() calls of_phy_find_device() to obtain a phy_device
>>>> pointer. of_phy_find_device() increments the refcount of the device.
>>>> The current implementation does not decrement the refcount after using
>>>> the pointer, which leads to a memory leak.
>>>
>>> this is correct, there is an actual detected bug.
>>>
>>>>
>>>> Add phy_device_free() to balance the refcount.
>>>
>>> But this does not sound right, you should use of_node_put(pldat->phy_node).
>>>
>>>>
>>>> Found by code review.
>>>>
>>>> Signed-off-by: Ma Ke <make24@iscas.ac.cn>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 3503bf024b3e ("net: lpc_eth: parse phy nodes from device tree")
>>>> ---
>>>>    drivers/net/ethernet/nxp/lpc_eth.c | 11 ++++++-----
>>>>    1 file changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
>>>> index 8b9a3e3bba30..8ce7c9bb6dd6 100644
>>>> --- a/drivers/net/ethernet/nxp/lpc_eth.c
>>>> +++ b/drivers/net/ethernet/nxp/lpc_eth.c
>>>> @@ -751,7 +751,7 @@ static void lpc_handle_link_change(struct net_device *ndev)
>>>>    static int lpc_mii_probe(struct net_device *ndev)
>>>>    {
>>>>    	struct netdata_local *pldat = netdev_priv(ndev);
>>>> -	struct phy_device *phydev;
>>>> +	struct phy_device *phydev, *phydev_tmp;
>>>>    
>>>>    	/* Attach to the PHY */
>>>>    	if (lpc_phy_interface_mode(&pldat->pdev->dev) == PHY_INTERFACE_MODE_MII)
>>>> @@ -760,17 +760,18 @@ static int lpc_mii_probe(struct net_device *ndev)
>>>>    		netdev_info(ndev, "using RMII interface\n");
>>>>    
>>>>    	if (pldat->phy_node)
>>>> -		phydev =  of_phy_find_device(pldat->phy_node);
>>>> +		phydev_tmp =  of_phy_find_device(pldat->phy_node);
>>>>    	else
>>>> -		phydev = phy_find_first(pldat->mii_bus);
>>>> -	if (!phydev) {
>>>> +		phydev_tmp = phy_find_first(pldat->mii_bus);
>>>> +	if (!phydev_tmp) {
>>>
>>> I didn't get it, why the new phydev_tmp is needed above, please
>>> restore the original code above.
>>>
>>>>    		netdev_err(ndev, "no PHY found\n");
>>>>    		return -ENODEV;
>>>>    	}
>>>>    
>>>> -	phydev = phy_connect(ndev, phydev_name(phydev),
>>>> +	phydev = phy_connect(ndev, phydev_name(phydev_tmp),
>>>>    			     &lpc_handle_link_change,
>>>>    			     lpc_phy_interface_mode(&pldat->pdev->dev));
>>>> +	phy_device_free(phydev_tmp);
>>>
>>> This is plainly wrong and has to be dropped or changed to
>>>
>>> 	if (pldat->phy_node)
>>> 		of_node_put(pldat->phy_node);
>>>
>>>>    	if (IS_ERR(phydev)) {
>>>>    		netdev_err(ndev, "Could not attach to PHY\n");
>>>>    		return PTR_ERR(phydev);
>>>
>>> Is it AI generated fix or what?.. The change looks bad, it introduces
>>> more severe issues than it fixes.
>>>
>>> If you think you cannot create a proper change, let me know.
>>>
>> Thank you very much for your detailed review and guidance.
>> 
>> Now I think your point probably is: you are saying that the real leak
>> is not from of_phy_find_device(), but from the device node
>
>I was pretty indelicate in my comment, let's split the change into parts.
>
>1) I still do not understand, why phydev_tmp is introduced, please explain
>or remove this part of the change;
>
>2) phydev = of_phy_find_device() requires phy_device_free(phydev), but
>I do not see why phy_find_first() requires it, while it was added in your
>change.
>
>Let's start from resolving these two points.
>
>> pldat->phy_node which was obtained earlier (probably by
>> of_parse_phandle()) and never freed by of_node_put(). And you suggest
>> to add of_node_put(pldat->phy_node) instead of my wrong
>> phy_device_free().
>> 
>> However, I am still a little confused. In lpc_mii_probe(),
>> of_phy_find_device() is called. From my understanding, this function
>> increases the reference count of the device. To balance it, I thought
>> phy_device_free() (which calls put_device()) should be used.
>> 
>> Could you please kindly advise the correct patch? I will follow your
>> guidance and submit a proper fix.
>> 
>> I apologize again for my previous wrong patch. Thank you very much for
>> your help.
>
> -- 
> Best wishes,
> Vladimir
Hello Vladimir,

Thank you for the detailed explanation and for pointing out my mistakes.

> 1) I still do not understand, why phydev_tmp is introduced, please explain
> or remove this part of the change;

I added phydev_tmp because I thought I needed to keep the original 
phy_device pointer for releasing after phy_connect(). But as you 
implied, it's perhaps unnecessary and only makes the code less 
readable. I will drop this change completely in the next version.

> 2) phydev = of_phy_find_device() requires phy_device_free(phydev), but
> I do not see why phy_find_first() requires it, while it was added in your
> change.

You are absolutely right. I mistakenly assumed that both functions 
return a reference-counted pointer. phy_find_first() does not 
increment the refcount, so calling phy_device_free() on it is wrong 
and dangerous. My patch introduced a new bug there.

Now I understand that only the of_phy_find_device() branch needs a 
corresponding put_device(). I will prepare a corrected patch that only
releases the reference in that specific path (including on the error 
path after phy_connect() failure). I will also look at the phy_node 
reference leak you hinted at.
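
A toy model of the reference-counting rule discussed above: only the of_phy_find_device() branch takes a reference, so only that branch drops it, on both the success and failure paths after phy_connect(). All names are hypothetical stand-ins. The model assumes phy_connect() manages its own reference separately, and it does not model the separate phy_node leak.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PHY device's reference count. */
struct fake_phy {
	int refcount;
};

/* Models of_phy_find_device(): returns the PHY with an extra reference. */
static struct fake_phy *find_by_node(struct fake_phy *phy)
{
	phy->refcount++;
	return phy;
}

/* Models phy_find_first(): returns the PHY without taking a reference. */
static struct fake_phy *find_first(struct fake_phy *phy)
{
	return phy;
}

/* Models put_device(): drop one reference. */
static void put_phy(struct fake_phy *phy)
{
	phy->refcount--;
}

/* Probe flow: the lookup reference is released only when the lookup took
 * one, whether or not the modeled connect step succeeds. */
static int probe(struct fake_phy *phy, bool have_node, bool connect_ok)
{
	struct fake_phy *phydev = have_node ? find_by_node(phy) : find_first(phy);
	int ret = connect_ok ? 0 : -19;         /* -ENODEV stand-in */

	if (have_node)
		put_phy(phydev);
	return ret;
}
```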

Thank you again for your guidance. I will send a v2 after fixing it 
properly.

Best regards,
Ma Ke


^ permalink raw reply

* [PATCH net] ipv6: addrconf: skip ERRDAD transition when address already DEAD
From: Linmao Li @ 2026-04-20  3:28 UTC (permalink / raw)
  To: davem, dsahern, edumazet, kuba, pabeni
  Cc: horms, netdev, linux-kernel, Linmao Li

addrconf_dad_end() transitions ifp->state from DAD to POSTDAD under
ifp->lock and then releases the lock.  addrconf_dad_failure() re-acquires
ifp->lock with spin_lock_bh() after the net_info_ratelimited()
duplicate-address log.  A concurrent ipv6_del_addr() can take the lock
in that window, set ifp->state to DEAD, and run
list_del_rcu(&ifp->if_list).

addrconf_dad_failure() then overwrites DEAD with ERRDAD at errdad:
and schedules a new dad_work.  The work calls ipv6_del_addr() again,
hitting the already-poisoned list entry:

  general protection fault: 0000 [#1] SMP NOPTI
  CPU: 4 PID: 217 Comm: kworker/4:1
  Workqueue: ipv6_addrconf addrconf_dad_work
  RIP: 0010:ipv6_del_addr+0xe9/0x280
  RAX: dead000000000122
  Call Trace:
   addrconf_dad_stop+0x113/0x140
   addrconf_dad_work+0x28c/0x430
   process_one_work+0x1eb/0x3b0
   worker_thread+0x4d/0x400
   kthread+0x104/0x140
   ret_from_fork+0x35/0x40

Bail out at errdad: when ifp->state is already DEAD. The existing
in6_ifa_put() releases the reference taken for this invocation.
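
The fix amounts to re-checking the state after re-acquiring the lock. A single-threaded sketch of the two orderings follows. The names are hypothetical, and the lock is only indicated in comments, not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the INET6_IFADDR_STATE_* values. */
enum ifa_state { ST_DAD, ST_POSTDAD, ST_ERRDAD, ST_DEAD };

/* Hypothetical stand-in for struct inet6_ifaddr; in the kernel, state is
 * protected by ifp->lock, which this single-threaded model omits. */
struct fake_ifa {
	enum ifa_state state;
	int refcnt;
	bool work_scheduled;
};

/* Models ipv6_del_addr() running while addrconf_dad_failure() had
 * dropped the lock. */
static void del_addr(struct fake_ifa *ifp)
{
	ifp->state = ST_DEAD;
}

/* Models the errdad: tail with the fix applied: bail out if the address
 * is already DEAD instead of overwriting it with ERRDAD. */
static bool dad_failure_tail(struct fake_ifa *ifp)
{
	/* spin_lock_bh(&ifp->lock) would be held here */
	if (ifp->state == ST_DEAD) {
		/* spin_unlock_bh(), then in6_ifa_put() */
		ifp->refcnt--;
		return false;
	}
	ifp->state = ST_ERRDAD;
	/* spin_unlock_bh() */
	ifp->work_scheduled = true;     /* addrconf_mod_dad_work() stand-in */
	ifp->refcnt--;                  /* in6_ifa_put() */
	return true;
}
```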

Signed-off-by: Linmao Li <lilinmao@kylinos.cn>
---
 net/ipv6/addrconf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5476b6536eb7..14b1ab43da87 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2227,6 +2227,12 @@ void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp)

 errdad:
 	/* transition from _POSTDAD to _ERRDAD */
+	if (ifp->state == INET6_IFADDR_STATE_DEAD) {
+		/* ipv6_del_addr() already removed ifp while lock was dropped */
+		spin_unlock_bh(&ifp->lock);
+		in6_ifa_put(ifp);
+		return;
+	}
 	ifp->state = INET6_IFADDR_STATE_ERRDAD;
 	spin_unlock_bh(&ifp->lock);

-- 
2.25.1

^ permalink raw reply related

* [PATCH net 1/1] net: nsh: handle nested NSH headers during GSO
From: Ren Wei @ 2026-04-20  3:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jbenc, yuantan098,
	yifanwucs, tomapufckgml, bird, lx24, caoruide123, n05ec
In-Reply-To: <cover.1776597597.git.caoruide123@gmail.com>

From: Ruide Cao <caoruide123@gmail.com>

nsh_gso_segment() currently redispatches the inner payload through
skb_mac_gso_segment() after stripping one NSH header. For nested NSH
payloads, including Ethernet-encapsulated NSH payloads, this can cause
repeated re-entry into nsh_gso_segment().

The existing validation added by commit af50e4ba34f4 ("nsh: fix infinite
loop") only covers invalid header lengths and does not address repeated
self-dispatch across nested NSH payload chains.

Handle nested NSH headers iteratively in a single nsh_gso_segment()
invocation. Unwrap consecutive NSH headers, including NSH carried inside
an ETH_P_TEB (Ethernet) payload, until the first non-NSH payload is
reached. Segment that payload once, then restore the full outer
encapsulation on each output segment. Also route validation failures
through the same unwind path so the skb state is restored after any
partial unwrap.

This keeps nested NSH GSO handling correct while avoiding recursive
redispatch.
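
A simplified userspace sketch of the iterative unwrap follows. It assumes the NSH layout from RFC 8300 (a 6-bit length field counted in 4-byte words, and the next-protocol value in byte 3) and the TUN_P_* and ETH_P_NSH values from the kernel headers. unwrap_nsh() is hypothetical. It only computes the total pulled length (total_pull_len in the patch), which would later be pushed back on every segment. Unlike skb_network_protocol(), it does not skip VLAN tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed from the kernel's nsh.h / tun_proto.h / if_ether.h. */
#define NSH_BASE_HDR_LEN 8
#define TUN_P_ETHERNET   0x03
#define TUN_P_NSH        0x04
#define ETH_HLEN         14
#define ETH_P_NSH        0x894F

/* NSH length: low 6 bits of the first 16-bit word, in 4-byte units. */
static size_t nsh_len(const uint8_t *h)
{
	return (size_t)(((h[0] << 8 | h[1]) & 0x3f) << 2);
}

/* Strip consecutive NSH headers (and Ethernet frames that wrap NSH) in a
 * loop. Returns the total bytes pulled, or -1 for a malformed header.
 * *next_proto receives the next-protocol byte of the innermost NSH. */
static long unwrap_nsh(const uint8_t *buf, size_t len, uint8_t *next_proto)
{
	size_t off = 0;

	for (;;) {
		if (len - off < NSH_BASE_HDR_LEN)
			return -1;
		size_t hl = nsh_len(buf + off);
		if (hl < NSH_BASE_HDR_LEN || hl > len - off)
			return -1;
		uint8_t np = buf[off + 3];
		off += hl;

		if (np == TUN_P_NSH)
			continue;
		if (np == TUN_P_ETHERNET && len - off >= ETH_HLEN &&
		    (buf[off + 12] << 8 | buf[off + 13]) == ETH_P_NSH) {
			off += ETH_HLEN;        /* untagged Ethernet only */
			continue;
		}
		*next_proto = np;               /* first non-NSH payload */
		return (long)off;
	}
}
```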

Fixes: c411ed854584 ("nsh: add GSO support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by: Xiao Liu <lx24@stu.ynu.edu.cn>
Signed-off-by: Xiao Liu <lx24@stu.ynu.edu.cn>
Signed-off-by: Ruide Cao <caoruide123@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
---
 net/nsh/nsh.c | 75 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 21 deletions(-)

diff --git a/net/nsh/nsh.c b/net/nsh/nsh.c
index bfb7758063f3..89c240c2ed7c 100644
--- a/net/nsh/nsh.c
+++ b/net/nsh/nsh.c
@@ -77,7 +77,7 @@ EXPORT_SYMBOL_GPL(nsh_pop);
 static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
 				       netdev_features_t features)
 {
-	unsigned int outer_hlen, mac_len, nsh_len;
+	unsigned int outer_hlen, mac_len, nsh_len, total_pull_len = 0;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	u16 mac_offset = skb->mac_header;
 	__be16 outer_proto, proto;
@@ -88,40 +88,73 @@ static struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
 	outer_hlen = skb_mac_header_len(skb);
 	mac_len = skb->mac_len;
 
-	if (unlikely(!pskb_may_pull(skb, NSH_BASE_HDR_LEN)))
-		goto out;
-	nsh_len = nsh_hdr_len(nsh_hdr(skb));
-	if (nsh_len < NSH_BASE_HDR_LEN)
-		goto out;
-	if (unlikely(!pskb_may_pull(skb, nsh_len)))
-		goto out;
+	while (true) {
+		if (unlikely(!pskb_may_pull(skb, NSH_BASE_HDR_LEN)))
+			goto err;
+		nsh_len = nsh_hdr_len(nsh_hdr(skb));
+		if (nsh_len < NSH_BASE_HDR_LEN)
+			goto err;
+		if (unlikely(!pskb_may_pull(skb, nsh_len)))
+			goto err;
 
-	proto = tun_p_to_eth_p(nsh_hdr(skb)->np);
-	if (!proto)
-		goto out;
+		proto = tun_p_to_eth_p(nsh_hdr(skb)->np);
+		if (!proto)
+			goto err;
 
-	__skb_pull(skb, nsh_len);
+		__skb_pull(skb, nsh_len);
+		total_pull_len += nsh_len;
 
-	skb_reset_mac_header(skb);
-	skb->mac_len = proto == htons(ETH_P_TEB) ? ETH_HLEN : 0;
-	skb->protocol = proto;
+		skb_reset_mac_header(skb);
+		skb->mac_len = proto == htons(ETH_P_TEB) ? ETH_HLEN : 0;
+		skb->protocol = proto;
+
+		/* Keep unwrapping any payload that would redispatch back into
+		 * nsh_gso_segment(), including Ethernet-encapsulated NSH.
+		 */
+		if (proto == htons(ETH_P_NSH))
+			continue;
+
+		if (proto == htons(ETH_P_TEB)) {
+			int depth = skb->mac_len;
+
+			proto = skb_network_protocol(skb, &depth);
+			if (!proto)
+				goto err;
+
+			if (proto == htons(ETH_P_NSH)) {
+				__skb_pull(skb, depth);
+				total_pull_len += depth;
+
+				skb_reset_mac_header(skb);
+				skb->mac_len = 0;
+				skb->protocol = proto;
+				continue;
+			}
+		}
+
+		break;
+	}
 
 	features &= NETIF_F_SG;
 	segs = skb_mac_gso_segment(skb, features);
-	if (IS_ERR_OR_NULL(segs)) {
-		skb_gso_error_unwind(skb, htons(ETH_P_NSH), nsh_len,
-				     mac_offset, mac_len);
-		goto out;
-	}
+	if (IS_ERR_OR_NULL(segs))
+		goto err;
 
 	for (skb = segs; skb; skb = skb->next) {
 		skb->protocol = outer_proto;
-		__skb_push(skb, nsh_len + outer_hlen);
+		__skb_push(skb, total_pull_len + outer_hlen);
 		skb_reset_mac_header(skb);
 		skb_set_network_header(skb, outer_hlen);
 		skb->mac_len = mac_len;
 	}
 
+	goto out;
+
+err:
+	if (total_pull_len)
+		skb_gso_error_unwind(skb, outer_proto, total_pull_len,
+				     mac_offset, mac_len);
+
 out:
 	return segs;
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH 6.12.y] ipv6: add NULL checks for idev in SRv6 paths
From: Li hongliang @ 2026-04-20  5:42 UTC (permalink / raw)
  To: gregkh, stable, heminhong
  Cc: patches, linux-kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, david.lebrun, netdev, andrea.mayer

From: Minhong He <heminhong@kylinos.cn>

[ Upstream commit 06413793526251870e20402c39930804f14d59c0 ]

__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).

Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences.
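
The effect of the added check on the seg6_enabled computation can be sketched as follows. The names are hypothetical, and -1 stands in for the drop path that frees the skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device IPv6 config (struct inet6_dev). */
struct fake_idev {
	int seg6_enabled;
};

/* Returns the effective seg6_enabled value (the minimum of the global and
 * per-device settings, as ipv6_srh_rcv() computes it), or -1 when the
 * device has no IPv6 state and the packet must be dropped. */
static int effective_seg6_enabled(int all_enabled, const struct fake_idev *idev)
{
	if (!idev)
		return -1;
	return all_enabled < idev->seg6_enabled ? all_enabled : idev->seg6_enabled;
}
```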

Fixes: 1ababeba4a21 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li hongliang <1468888505@139.com>
---
 net/ipv6/exthdrs.c   | 4 ++++
 net/ipv6/seg6_hmac.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 1a627c24e4c3..8a30dd83cf0b 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -379,6 +379,10 @@ static int ipv6_srh_rcv(struct sk_buff *skb)
 	hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev) {
+		kfree_skb(skb);
+		return -1;
+	}
 
 	accept_seg6 = min(READ_ONCE(net->ipv6.devconf_all->seg6_enabled),
 			  READ_ONCE(idev->cnf.seg6_enabled));
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index 5d21a74c1165..214d137d545e 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -245,6 +245,8 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
 	int require_hmac;
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev)
+		return false;
 
 	srh = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
-- 
2.34.1



^ permalink raw reply related

* [PATCH 6.6.y] ipv6: add NULL checks for idev in SRv6 paths
From: Li hongliang @ 2026-04-20  5:42 UTC (permalink / raw)
  To: gregkh, stable, heminhong
  Cc: patches, linux-kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, david.lebrun, netdev, andrea.mayer

From: Minhong He <heminhong@kylinos.cn>

[ Upstream commit 06413793526251870e20402c39930804f14d59c0 ]

__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).

Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences.

Fixes: 1ababeba4a21 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li hongliang <1468888505@139.com>
---
 net/ipv6/exthdrs.c   | 4 ++++
 net/ipv6/seg6_hmac.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 676284b6efe8..a8790163e8b6 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -378,6 +378,10 @@ static int ipv6_srh_rcv(struct sk_buff *skb)
 	hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev) {
+		kfree_skb(skb);
+		return -1;
+	}
 
 	accept_seg6 = net->ipv6.devconf_all->seg6_enabled;
 	if (accept_seg6 > idev->cnf.seg6_enabled)
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index 6e15a65faecc..bf97bf5ac138 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -244,6 +244,8 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
 	struct inet6_dev *idev;
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev)
+		return false;
 
 	srh = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
-- 
2.34.1



^ permalink raw reply related

* [PATCH 6.1.y] ipv6: add NULL checks for idev in SRv6 paths
From: Li hongliang @ 2026-04-20  5:43 UTC (permalink / raw)
  To: gregkh, stable, heminhong
  Cc: patches, linux-kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, david.lebrun, netdev, andrea.mayer

From: Minhong He <heminhong@kylinos.cn>

[ Upstream commit 06413793526251870e20402c39930804f14d59c0 ]

__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).

Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences.

Fixes: 1ababeba4a21 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li hongliang <1468888505@139.com>
---
 net/ipv6/exthdrs.c   | 4 ++++
 net/ipv6/seg6_hmac.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 61e0060185f4..5fb97a87d2cb 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -381,6 +381,10 @@ static int ipv6_srh_rcv(struct sk_buff *skb)
 	hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev) {
+		kfree_skb(skb);
+		return -1;
+	}
 
 	accept_seg6 = net->ipv6.devconf_all->seg6_enabled;
 	if (accept_seg6 > idev->cnf.seg6_enabled)
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index b90c286d77ed..e784f539194a 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -244,6 +244,8 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
 	struct inet6_dev *idev;
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev)
+		return false;
 
 	srh = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
-- 
2.34.1



^ permalink raw reply related

* [PATCH 5.15.y] ipv6: add NULL checks for idev in SRv6 paths
From: Li hongliang @ 2026-04-20  5:43 UTC (permalink / raw)
  To: gregkh, stable, heminhong
  Cc: patches, linux-kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, david.lebrun, netdev, andrea.mayer

From: Minhong He <heminhong@kylinos.cn>

[ Upstream commit 06413793526251870e20402c39930804f14d59c0 ]

__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).

Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences.

Fixes: 1ababeba4a21 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li hongliang <1468888505@139.com>
---
 net/ipv6/exthdrs.c   | 4 ++++
 net/ipv6/seg6_hmac.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 10772dab66bb..3d249c10e3e9 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -373,6 +373,10 @@ static int ipv6_srh_rcv(struct sk_buff *skb)
 	hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev) {
+		kfree_skb(skb);
+		return -1;
+	}
 
 	accept_seg6 = net->ipv6.devconf_all->seg6_enabled;
 	if (accept_seg6 > idev->cnf.seg6_enabled)
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index 7e3a85769932..68acff337e41 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -244,6 +244,8 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
 	struct inet6_dev *idev;
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev)
+		return false;
 
 	srh = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
-- 
2.34.1



^ permalink raw reply related

* [PATCH 5.10.y] ipv6: add NULL checks for idev in SRv6 paths
From: Li hongliang @ 2026-04-20  5:43 UTC (permalink / raw)
  To: gregkh, stable, heminhong
  Cc: patches, linux-kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, david.lebrun, netdev, andrea.mayer

From: Minhong He <heminhong@kylinos.cn>

[ Upstream commit 06413793526251870e20402c39930804f14d59c0 ]

__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).

Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences.

Fixes: 1ababeba4a21 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Signed-off-by: Minhong He <heminhong@kylinos.cn>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20260316073301.106643-1-heminhong@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Li hongliang <1468888505@139.com>
---
 net/ipv6/exthdrs.c   | 4 ++++
 net/ipv6/seg6_hmac.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index cdad9019c77c..dfa0fb3d6c35 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -361,6 +361,10 @@ static int ipv6_srh_rcv(struct sk_buff *skb)
 	hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev) {
+		kfree_skb(skb);
+		return -1;
+	}
 
 	accept_seg6 = net->ipv6.devconf_all->seg6_enabled;
 	if (accept_seg6 > idev->cnf.seg6_enabled)
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index f82fcd8908e1..b52985c867c2 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -245,6 +245,8 @@ bool seg6_hmac_validate_skb(struct sk_buff *skb)
 	struct inet6_dev *idev;
 
 	idev = __in6_dev_get(skb->dev);
+	if (!idev)
+		return false;
 
 	srh = (struct ipv6_sr_hdr *)skb_transport_header(skb);
 
-- 
2.34.1



^ permalink raw reply related

* Re: [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
From: Masami Hiramatsu @ 2026-04-20  5:44 UTC (permalink / raw)
  To: chensong_2000
  Cc: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin, jpoimboe,
	jikos, mbenes, pmladek, joe.lawrence, rostedt, mark.rutland,
	mathieu.desnoyers, linux-modules, linux-kernel,
	linux-trace-kernel, linux-acpi, linux-clk, linux-pm,
	live-patching, dm-devel, linux-raid, kgdb-bugreport, netdev
In-Reply-To: <20260415070137.17860-1-chensong_2000@189.cn>

Hi Song,

On Wed, 15 Apr 2026 15:01:37 +0800
chensong_2000@189.cn wrote:

> From: Song Chen <chensong_2000@189.cn>
> 
> The current notifier chain implementation uses a single-linked list
> (struct notifier_block *next), which only supports forward traversal
> in priority order. This makes it difficult to handle cleanup/teardown
> scenarios that require notifiers to be called in reverse priority order.

What about introducing a new notification callback API that allows you
to describe dependencies between callback functions?

For example, when registering a callback, you could register a string
as an ID and specify whether to call it before or after that ID,
or you could register a comparison function that is called when adding
to a list. (I prefer @name and @depends fields so that it can be easily
maintained.)

This would allow for better dependency building when adding to the list.
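
One possible shape for the @name/@depends idea, purely as a hypothetical sketch: named_nb and named_nb_register() are invented here and are not an existing kernel API. Each entry is inserted right after the entry it depends on, so a dependency is always called before its dependents. Only "after" dependencies are handled, and there is no cycle detection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical notifier entry: @name identifies it, @depends names an entry
 * that must be called before it (NULL if none). */
struct named_nb {
	const char *name;
	const char *depends;
	struct named_nb *next;
};

/* Insert @nb right after its dependency, or at the head if it has none.
 * Returns 0, or -1 if the dependency is not registered yet. */
static int named_nb_register(struct named_nb **head, struct named_nb *nb)
{
	struct named_nb **pos = head;

	if (nb->depends) {
		while (*pos && strcmp((*pos)->name, nb->depends) != 0)
			pos = &(*pos)->next;
		if (!*pos)
			return -1;
		pos = &(*pos)->next;
	}
	nb->next = *pos;
	*pos = nb;
	return 0;
}
```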

> 
> A concrete example is the ordering dependency between ftrace and
> livepatch during module load/unload. See the details here [1].

If this only concerns notification callback issues with the ftrace
and livepatch modules, it's far more robust to simply call the
necessary processing directly when the modules load and unload,
rather than registering notification callbacks externally.

There are fprobe, kprobe, and their trace events, all of which use
ftrace as their foundation layer. Because of this, I always need to
consider callback order when a module is unloaded.

If ftrace works as part of the module callbacks, it will conflict
with the fprobe/kprobe module callbacks. Of course we can reorder them
by modifying their priorities. But this is ugly, because whenever we
introduce a new feature that depends on another layer, we need to
renumber the callback priorities on the list.

Based on the above, I don't think this can be resolved simply by
changing the list of notification callbacks to a bidirectional list.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH net] net: mctp: fix don't require received header reserved bits to be zero
From: Jeremy Kerr @ 2026-04-20  5:41 UTC (permalink / raw)
  To: wit_yuan
  Cc: yuanzm2, matt, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, stable
In-Reply-To: <20260417141340.5306-1-yuanzhaoming901030@126.com>

Hi,

> From the MCTP Base specification (DSP0236 v1.2.1), the first byte of
> the MCTP header contains a 4 bit reserved field, and 4 bit version.
> 
> On our current receive path, we require those 4 reserved bits to be
> zero, but the 9500-8i card is non-conformant, and may set these
> reserved bits.
> 
> DSP0236 states that the reserved bits must be written as zero, and
> ignored when read. While the device might not conform to the former,
> we should accept these messages to conform to the latter.
> 
> Relax our check on the MCTP version byte to allow non-zero bits in the
> reserved field.
> 
> Fixes: 889b7da23abf ("mctp: Add initial routing framework")
> Signed-off-by: Yuan Zhaoming <yuanzm2@lenovo.com>
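
The difference between the old strict check and the relaxed one can be sketched as follows. Based on the description above, this assumes that the reserved bits occupy the high nibble of the first header byte and that the header version occupies the low nibble. The helper and macro names are hypothetical, not the ones used in net/mctp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: high nibble reserved, low nibble = header version (1). */
#define MCTP_HDR_VER_MASK	0x0f
#define MCTP_HDR_VER_1		0x01

/* Strict check: any non-zero reserved bit causes the packet to be dropped. */
static bool mctp_ver_ok_strict(uint8_t ver_byte)
{
	return ver_byte == MCTP_HDR_VER_1;
}

/* Relaxed check: reserved bits are ignored on read, per DSP0236. */
static bool mctp_ver_ok_relaxed(uint8_t ver_byte)
{
	return (ver_byte & MCTP_HDR_VER_MASK) == MCTP_HDR_VER_1;
}
```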

Looks good, thanks for the contribution!

Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>

Cheers,


Jeremy

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
From: Loktionov, Aleksandr @ 2026-04-20  6:20 UTC (permalink / raw)
  To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
	Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
	Brian Vazquez, Tantilov, Emil S
In-Reply-To: <20260419192555.3631327-1-boolli@google.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Li Li via Intel-wired-lan
> Sent: Sunday, April 19, 2026 9:26 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet
> <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> Decotigny <decot@google.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li
> Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when
> netdev is detached
> 
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset. When this happens, we could get kernel crashes like
> the following:
> 
> [ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
> [ 4045.794420] #PF: supervisor read access in kernel mode
> [ 4045.799580] #PF: error_code(0x0000) - not-present page
> [ 4045.804739] PGD 0
> [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
> [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
> ...
> [ 4045.946323] Call Trace:
> [ 4045.948796]  <TASK>
> [ 4045.950915]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.955293]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.959672]  ? idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.964142]  ? __die_body.cold+0x8/0x12
> [ 4045.968000]  ? page_fault_oops+0x148/0x160
> [ 4045.972117]  ? exc_page_fault+0x6f/0x160
> [ 4045.976060]  ? asm_exc_page_fault+0x22/0x30
> [ 4045.980262]  ? idpf_del_flow_steer+0x24/0x170 [idpf]
> [ 4045.985245]  idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.989535]  af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
> [ 4045.995069]  oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
> [ 4046.000589]  oo_hw_filter_update+0x65/0x210 [onload]
> [ 4046.005587]  oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
> [ 4046.011716]  oof_manager_update_all_filters+0xad/0x270 [onload]
> [ 4046.017671]  __oof_do_deferred_work+0x15e/0x190 [onload]
> [ 4046.023014]  oof_do_deferred_work+0x2c/0x40 [onload]
> [ 4046.028018]  oof_do_deferred_work_fn+0x12/0x30 [onload]
> [ 4046.033277]  process_one_work+0x174/0x330
> [ 4046.037304]  worker_thread+0x246/0x390
> [ 4046.041074]  ? __pfx_worker_thread+0x10/0x10
> [ 4046.045364]  kthread+0xf6/0x240
> [ 4046.048530]  ? __pfx_kthread+0x10/0x10
> [ 4046.052297]  ret_from_fork+0x2d/0x50
> [ 4046.055896]  ? __pfx_kthread+0x10/0x10
> [ 4046.059664]  ret_from_fork_asm+0x1a/0x30
> [ 4046.063613]  </TASK>
> 
> To prevent this, we need to add checks in idpf_set_rxnfc and
> idpf_get_rxnfc to error out if the netdev is already detached.
> 
> Tested: implemented the following patch to synthetically force idpf
> into a HW reset:
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index 4fc0bb14c5b1..27476d57bcf0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -10,6 +10,9 @@
>  #define idpf_tx_buf_next(buf)		(*(u32 *)&(buf)->priv)
>  LIBETH_SQE_CHECK_PRIV(u32);
> 
> +static bool SIMULATE_TX_TIMEOUT;
> +module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> +
>  /**
>   * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
>   * @skb: send buffer
> @@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> 
>  	adapter->tx_timeout_count++;
> 
> +	SIMULATE_TX_TIMEOUT = false;
> +
>  	netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
>  		   adapter->tx_timeout_count, txqueue);
>  	if (!idpf_is_reset_in_prog(adapter)) {
> @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
>  			goto fetch_next_desc;
>  		}
>  		tx_q = complq->txq_grp->txqs[rel_tx_qid];
> +		if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> +			goto fetch_next_desc;
> 
>  		/* Determine completion type */
>  		ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101..ba5da2a86c15 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -8,6 +8,9 @@
>  #include "idpf_virtchnl.h"
>  #include "idpf_ptp.h"
> 
> +static bool VIRTCHNL_FAILED;
> +module_param(VIRTCHNL_FAILED, bool, 0644);
> +
>  /**
>   * struct idpf_vc_xn_manager - Manager for tracking transactions
>   * @ring: backing and lookup for transactions
> @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  		switch (adapter->state) {
>  		case __IDPF_VER_CHECK:
>  			err = idpf_send_ver_msg(adapter);
> +
> +			if (unlikely(VIRTCHNL_FAILED)) {
> +				err = -EIO;
> +			}
> +
>  			switch (err) {
>  			case 0:
>  				/* success, move state machine forward */
> 
> And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> idpf_get_rxnfc() right after the HW reset.
> 
> Without the patch: encountered NULL pointer and kernel crash.
> 
> With the patch: no crashes.
> 
> Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> Signed-off-by: Li Li <boolli@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d..8368a7e6a754 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
>  	unsigned int cnt = 0;
>  	int err = 0;
> 
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +
>  	idpf_vport_ctrl_lock(netdev);
>  	vport = idpf_netdev_to_vport(netdev);
>  	vport_config = np->adapter->vport_config[np->vport_idx];
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
>  {
>  	int ret = -EOPNOTSUPP;
> 
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +
>  	idpf_vport_ctrl_lock(netdev);
>  	switch (cmd->cmd) {
>  	case ETHTOOL_SRXCLSRLINS:
> --
> 2.54.0.rc1.513.gad8abe7a5a-goog

Please add Cc: stable@vger.kernel.org

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply

* Re: [PATCH net V2 3/3] net/mlx5e: SD, Fix race condition in secondary device probe/remove
From: Shay Drori @ 2026-04-20  6:21 UTC (permalink / raw)
  To: Paolo Abeni, Tariq Toukan, Eric Dumazet, Jakub Kicinski,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Mark Bloch, Leon Romanovsky, Simon Horman,
	Kees Cook, Parav Pandit, Patrisious Haddad, Gal Pressman, netdev,
	linux-rdma, linux-kernel, Dragos Tatulea
In-Reply-To: <b429c6d3-8f6a-41fb-a9e6-9867a8ee1ad8@redhat.com>



On 16/04/2026 14:07, Paolo Abeni wrote:
> On 4/13/26 12:53 PM, Tariq Toukan wrote:
>> From: Shay Drory <shayd@nvidia.com>
>>
>> When utilizing Socket-Direct single netdev functionality the driver
>> resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
>> the current implementation returns the primary ETH auxiliary device
>> without holding the device lock, leading to a potential race condition
>> where the ETH device could be unbound or removed concurrently during
>> probe, suspend, resume, or remove operations.[1]
>>
>> Fix this by introducing mlx5_sd_put_adev() and updating
>> mlx5_sd_get_adev() so that secondary devices acquire the device
>> lock of the returned auxiliary device. After the lock is acquired, a
>> second devcom check is needed[2].
>> In addition, update the callers to pair the get operation with the new
>> put operation, ensuring the lock is held while the auxiliary device is
>> being operated on and released afterwards.
>>
>> The "primary" designation is determined once in sd_register(). It's set
>> before devcom is marked ready, and it never changes after that.
>> In addition, the primary path never locks a secondary: when the primary
>> device invokes mlx5_sd_get_adev(), it sees dev == primary and returns;
>> no additional lock is taken.
>> Therefore lock ordering is always: secondary_lock -> primary_lock. The
>> reverse never happens, so ABBA deadlock is impossible.
>>
>> [1]
>> for example:
>> BUG: kernel NULL pointer dereference, address: 0000000000000370
>> PGD 0 P4D 0
>> Oops: Oops: 0000 [#1] SMP
>> CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>> RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
>> Call Trace:
>>   <TASK>
>>   mlx5e_remove+0x82/0x12a [mlx5_core]
>>   device_release_driver_internal+0x194/0x1f0
>>   bus_remove_device+0xc6/0x140
>>   device_del+0x159/0x3c0
>>   ? devl_param_driverinit_value_get+0x29/0x80
>>   mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
>>   mlx5_unregister_device+0x34/0x50 [mlx5_core]
>>   mlx5_uninit_one+0x43/0xb0 [mlx5_core]
>>   remove_one+0x4e/0xc0 [mlx5_core]
>>   pci_device_remove+0x39/0xa0
>>   device_release_driver_internal+0x194/0x1f0
>>   unbind_store+0x99/0xa0
>>   kernfs_fop_write_iter+0x12e/0x1e0
>>   vfs_write+0x215/0x3d0
>>   ksys_write+0x5f/0xd0
>>   do_syscall_64+0x55/0xe90
>>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>
>> [2]
>>      CPU0 (primary)                     CPU1 (secondary)
>> ==========================================================================
>> mlx5e_remove() (device_lock held)
>>                                       mlx5e_remove() (2nd device_lock held)
>>                                        mlx5_sd_get_adev()
>>                                         mlx5_devcom_comp_is_ready() => true
>>                                         device_lock(primary)
>>   mlx5_sd_get_adev() ==> ret adev
>>   _mlx5e_remove()
>>   mlx5_sd_cleanup()
>>   // mlx5e_remove finished
>>   // releasing device_lock
>>                                         //need another check here...
>>                                         mlx5_devcom_comp_is_ready() => false
>>
>> Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
>> Signed-off-by: Shay Drory <shayd@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   .../net/ethernet/mellanox/mlx5/core/en_main.c  | 18 ++++++++++++++----
>>   .../net/ethernet/mellanox/mlx5/core/lib/sd.c   | 17 +++++++++++++++++
>>   .../net/ethernet/mellanox/mlx5/core/lib/sd.h   |  2 ++
>>   3 files changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index 0b8b44bbcb9e..11f80158e107 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -6657,8 +6657,11 @@ static int mlx5e_resume(struct auxiliary_device *adev)
>>                return err;
>>
>>        actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
>> -     if (actual_adev)
>> -             return _mlx5e_resume(actual_adev);
>> +     if (actual_adev) {
>> +             err = _mlx5e_resume(actual_adev);
>> +             mlx5_sd_put_adev(actual_adev, adev);
>> +             return err;
>> +     }
>>        return 0;
>>   }
>>
>> @@ -6698,6 +6701,8 @@ static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
>>                err = _mlx5e_suspend(actual_adev, false);
>>
>>        mlx5_sd_cleanup(mdev);
>> +     if (actual_adev)
>> +             mlx5_sd_put_adev(actual_adev, adev);
>>        return err;
>>   }
>>
>> @@ -6795,8 +6800,11 @@ static int mlx5e_probe(struct auxiliary_device *adev,
>>                return err;
>>
>>        actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
>> -     if (actual_adev)
>> -             return _mlx5e_probe(actual_adev);
>> +     if (actual_adev) {
>> +             err = _mlx5e_probe(actual_adev);
>> +             mlx5_sd_put_adev(actual_adev, adev);
> 
> Sashiko says:
> 
> ---
> If _mlx5e_probe(actual_adev) fails, it frees mlx5e_dev but leaves the
> auxiliary device's drvdata pointing to it.

After probe fails, the driver core marks the device as unbound.
Hence, the stale drvdata pointer is unreachable.

> Furthermore, mlx5e_probe()
> returns the error without calling mlx5_sd_cleanup(mdev), leaving devcom
> incorrectly marked as ready.
> If the primary device is later unbound, mlx5e_remove() will see that
> devcom is ready, call _mlx5e_remove(), and blindly dereference the
> dangling mlx5e_dev pointer.
> Is there a missing cleanup step here to clear drvdata or reset the sd
> state on failure?

This is an existing bug that will be addressed in a new patch in V3.

> ---
> 
> Please try to address AI comments (i.e. explaining why not relevant)
> proactively.

Ack.

Thanks

> 
> Thanks,
> 
> Paolo
> 
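
Below is a rough userspace model of the get/put pairing and lock ordering described in the commit message. It uses pthread mutexes in place of the device locks and an atomic flag in place of the devcom ready state. All names are hypothetical. This is a sketch of the pattern (check, lock the primary, re-check), not the mlx5 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not mlx5 types or functions. */
struct model_dev {
	pthread_mutex_t lock;       /* stands in for the auxiliary device lock */
	struct model_dev *primary;  /* chosen once at registration, never changes */
	atomic_bool *group_ready;   /* stands in for the devcom "ready" state */
};

/* In the driver, the caller already holds its own device lock here. */
static struct model_dev *model_sd_get_adev(struct model_dev *dev)
{
	struct model_dev *primary = dev->primary;

	if (!atomic_load(dev->group_ready))
		return NULL;
	if (dev == primary)
		return dev;                        /* primary takes no extra lock */

	pthread_mutex_lock(&primary->lock);        /* always secondary -> primary */
	if (!atomic_load(dev->group_ready)) {      /* primary may have torn down meanwhile */
		pthread_mutex_unlock(&primary->lock);
		return NULL;
	}
	return primary;                            /* returned with primary->lock held */
}

/* Pair every successful get with a put. */
static void model_sd_put_adev(struct model_dev *actual, struct model_dev *dev)
{
	if (actual != dev)
		pthread_mutex_unlock(&actual->lock);
}
```

The primary path returns before taking any lock. As a result, only the secondary-to-primary lock order can occur, which is the ABBA argument made in the commit message.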


^ permalink raw reply

* Re: [PATCH net v1] net/mlx5: Fix HCA caps leak on notifier init failure
From: Tariq Toukan @ 2026-04-20  6:57 UTC (permalink / raw)
  To: Prathamesh Deshpande, Saeed Mahameed, Leon Romanovsky,
	Carolina Jubran
  Cc: Cosmin Ratiu, Tariq Toukan, Jakub Kicinski, netdev, linux-rdma,
	linux-kernel
In-Reply-To: <20260415005022.34764-1-prathameshdeshpande7@gmail.com>



On 15/04/2026 3:49, Prathamesh Deshpande wrote:
> mlx5_mdev_init() allocates HCA caps via mlx5_hca_caps_alloc() before
> calling mlx5_notifiers_init(). If notifier initialization fails, the
> error path jumps to err_hca_caps and skips mlx5_hca_caps_free(), leaking
> allocated caps.
> 
> Add a dedicated unwind label for notifier-init failure that frees HCA
> caps before continuing the existing cleanup sequence.
> 
> Fixes: b6b03097f982 ("net/mlx5: Initialize events outside devlink lock")
> Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 3f73d9b1115d..fab80c79ff07 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1907,7 +1907,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
>   
>   	err = mlx5_notifiers_init(dev);
>   	if (err)
> -		goto err_hca_caps;
> +		goto err_notifiers_init;
>   
>   	/* The conjunction of sw_vhca_id with sw_owner_id will be a global
>   	 * unique id per function which uses mlx5_core.
> @@ -1923,6 +1923,8 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
>   
>   	return 0;
>   
> +err_notifiers_init:
> +	mlx5_hca_caps_free(dev);
>   err_hca_caps:
>   	mlx5_adev_cleanup(dev);
>   err_adev_init:

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Thanks.
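
The reverse-order unwind ladder that this fix restores can be illustrated with a small standalone model. It mirrors the label layout shown in the diff: adev, then HCA caps, then notifiers. The toy_* names are hypothetical, and malloc()/free() stand in for the real init/cleanup helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the three init steps in the quoted diff. */
struct toy_dev {
	void *adev;          /* like the adev init step */
	void *caps;          /* like mlx5_hca_caps_alloc() */
	void *notifiers;     /* like mlx5_notifiers_init() */
	int fail_notifiers;  /* test knob: force the notifier step to fail */
};

static int toy_mdev_init(struct toy_dev *d)
{
	d->adev = malloc(16);
	if (!d->adev)
		goto err_adev_init;

	d->caps = malloc(16);
	if (!d->caps)
		goto err_hca_caps;

	d->notifiers = d->fail_notifiers ? NULL : malloc(16);
	if (!d->notifiers)
		goto err_notifiers_init;

	return 0;

	/* Unwind in reverse order: each label undoes every step that
	 * succeeded before the one that failed. */
err_notifiers_init:
	free(d->caps);       /* the step the original error path skipped */
	d->caps = NULL;
err_hca_caps:
	free(d->adev);
	d->adev = NULL;
err_adev_init:
	return -ENOMEM;
}
```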

^ permalink raw reply

* Re: [PATCH net-next v3 12/12] selftests: net: Add a test for BIG TCP in UDP tunnels
From: Willem de Bruijn @ 2026-04-20  7:00 UTC (permalink / raw)
  To: Alice Mikityanska, Willem de Bruijn
  Cc: Alice Mikityanska, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Xin Long, David Ahern,
	Nikolay Aleksandrov, Shuah Khan, Stanislav Fomichev, Andrew Lunn,
	Simon Horman, Florian Westphal, netdev
In-Reply-To: <CAD0BsJXyHQADVymJw7L0sA72P-=E8Wz7p3CedyvCbqEHBu17JA@mail.gmail.com>

Alice Mikityanska wrote:
> On Thu, 16 Apr 2026 at 14:06, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Alice Mikityanska wrote:
> > > From: Alice Mikityanska <alice@isovalent.com>
> > >
> > > The test sets up VXLAN and GENEVE tunnels over IPv4 and IPv6 and runs
> > > IPv4 and IPv6 traffic through them with BIG TCP enabled. It checks that
> > > a non-negligible number of big aggregated packets is seen in tcpdump.
> > >
> > > Signed-off-by: Alice Mikityanska <alice@isovalent.com>
> > > ---
> > >  tools/testing/selftests/net/Makefile          |   1 +
> > >  .../testing/selftests/net/big_tcp_tunnels.sh  | 145 ++++++++++++++++++
> > >  2 files changed, 146 insertions(+)
> > >  create mode 100755 tools/testing/selftests/net/big_tcp_tunnels.sh
> > >
> > > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > > index cab74ebdaced..c8ea9d4bb94f 100644
> > > --- a/tools/testing/selftests/net/Makefile
> > > +++ b/tools/testing/selftests/net/Makefile
> > > @@ -13,6 +13,7 @@ TEST_PROGS := \
> > >       arp_ndisc_untracked_subnets.sh \
> > >       bareudp.sh \
> > >       big_tcp.sh \
> > > +     big_tcp_tunnels.sh \
> > >       bind_bhash.sh \
> > >       bpf_offload.py \
> > >       bridge_vlan_dump.sh \
> > > diff --git a/tools/testing/selftests/net/big_tcp_tunnels.sh b/tools/testing/selftests/net/big_tcp_tunnels.sh
> > > new file mode 100755
> > > index 000000000000..b819911519ac
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/net/big_tcp_tunnels.sh
> > > @@ -0,0 +1,145 @@
> > > +#!/usr/bin/env bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +#
> > > +# Testing for IPv4 and IPv6 BIG TCP over VXLAN and GENEVE tunnels.
> > > +
> > > +SERVER_NS=$(mktemp -u server-XXXXXXXX)
> > > +SERVER_IP4="192.168.1.1"
> > > +SERVER_IP6="2001:db8::1:1"
> > > +SERVER_IP4_TUN="192.168.2.1"
> > > +SERVER_IP6_TUN="2001:db8::2:1"
> > > +
> > > +CLIENT_NS=$(mktemp -u client-XXXXXXXX)
> > > +CLIENT_IP4="192.168.1.2"
> > > +CLIENT_IP6="2001:db8::1:2"
> > > +CLIENT_IP4_TUN="192.168.2.2"
> > > +CLIENT_IP6_TUN="2001:db8::2:2"
> > > +
> > > +PACKETS_THRESHOLD=10000
> > > +
> > > +# Kselftest framework requirement - SKIP code is 4.
> > > +ksft_skip=4
> > > +
> > > +setup() {
> > > +     ip netns add "$SERVER_NS"
> > > +     ip netns add "$CLIENT_NS"
> > > +     ip -netns "$SERVER_NS" link add link1 type veth peer name link0 netns "$CLIENT_NS"
> > > +
> > > +     ip -netns "$CLIENT_NS" link set link0 up
> > > +     ip -netns "$CLIENT_NS" addr replace "$CLIENT_IP4/24" dev link0
> > > +     ip -netns "$CLIENT_NS" addr replace "$CLIENT_IP6/112" dev link0 nodad
> > > +     ip -netns "$CLIENT_NS" link set link0 \
> > > +             gso_max_size 196608 gso_ipv4_max_size 196608 \
> > > +             gro_max_size 196608 gro_ipv4_max_size 196608
> > > +     ip -netns "$SERVER_NS" link set link1 up
> > > +     ip -netns "$SERVER_NS" addr replace "$SERVER_IP4/24" dev link1
> > > +     ip -netns "$SERVER_NS" addr replace "$SERVER_IP6/112" dev link1 nodad
> > > +     ip -netns "$SERVER_NS" link set link1 \
> > > +             gso_max_size 196608 gso_ipv4_max_size 196608 \
> > > +             gro_max_size 196608 gro_ipv4_max_size 196608
> > > +
> > > +     ip netns exec "$SERVER_NS" netserver >/dev/null
> > > +}
> > > +
> > > +setup_tunnel() {
> > > +     if [ "$2" = 4 ]; then
> > > +             SERVER_IP="$SERVER_IP4"
> > > +             CLIENT_IP="$CLIENT_IP4"
> > > +             echo "Setting up ${1^^} over IPv4"
> > > +     else
> > > +             SERVER_IP="$SERVER_IP6"
> > > +             CLIENT_IP="$CLIENT_IP6"
> > > +             echo "Setting up ${1^^} over IPv6"
> > > +     fi
> > > +
> > > +     if [ "$1" = vxlan ]; then
> > > +             ip -netns "$CLIENT_NS" link add tun0 type vxlan \
> > > +                     id 5001 remote "$SERVER_IP" local "$CLIENT_IP" dev link0 dstport 4789
> > > +     else
> > > +             ip -netns "$CLIENT_NS" link add tun0 type geneve \
> > > +                     id 5001 remote "$SERVER_IP"
> > > +     fi
> > > +     ip -netns "$CLIENT_NS" link set tun0 up
> > > +     ip -netns "$CLIENT_NS" addr replace "$CLIENT_IP4_TUN/24" dev tun0
> > > +     ip -netns "$CLIENT_NS" addr replace "$CLIENT_IP6_TUN/112" dev tun0 nodad
> > > +     ip -netns "$CLIENT_NS" link set tun0 \
> > > +             gso_max_size 196608 gso_ipv4_max_size 196608 \
> > > +             gro_max_size 196608 gro_ipv4_max_size 196608
> > > +     if [ "$1" = vxlan ]; then
> > > +             ip -netns "$SERVER_NS" link add tun1 type vxlan \
> > > +                     id 5001 remote "$CLIENT_IP" local "$SERVER_IP" dev link1 dstport 4789
> > > +     else
> > > +             ip -netns "$SERVER_NS" link add tun1 type geneve \
> > > +                     id 5001 remote "$CLIENT_IP"
> > > +     fi
> > > +     ip -netns "$SERVER_NS" link set tun1 up
> > > +     ip -netns "$SERVER_NS" addr replace "$SERVER_IP4_TUN/24" dev tun1
> > > +     ip -netns "$SERVER_NS" addr replace "$SERVER_IP6_TUN/112" dev tun1 nodad
> > > +     ip -netns "$SERVER_NS" link set tun1 \
> > > +             gso_max_size 196608 gso_ipv4_max_size 196608 \
> > > +             gro_max_size 196608 gro_ipv4_max_size 196608
> > > +}
> > > +
> > > +cleanup_tunnel() {
> > > +     ip -netns "$CLIENT_NS" link del tun0
> > > +     ip -netns "$SERVER_NS" link del tun1
> > > +}
> > > +
> > > +cleanup() {
> > > +     ip netns exec "$SERVER_NS" killall netserver
> > > +     ip netns del "$SERVER_NS"
> > > +     ip netns del "$CLIENT_NS"
> > > +}
> > > +
> > > +do_test() {
> > > +     exec 3< <(ip netns exec "$SERVER_NS" tcpdump -nn -i link1 greater 65536 2> /dev/null)
> > > +     TCPDUMP_SERVER_PID="$!"
> > > +     exec 4< <(wc -l <&3)
> > > +     exec 5< <(ip netns exec "$CLIENT_NS" tcpdump -nn -i link0 greater 65536 2> /dev/null)
> > > +     TCPDUMP_CLIENT_PID="$!"
> > > +     exec 6< <(wc -l <&5)
> > > +
> > > +     if [ "$1" = 4 ]; then
> > > +             SERVER_IP="$SERVER_IP4_TUN"
> > > +             echo "Running IPv4 traffic in the tunnel"
> > > +     else
> > > +             SERVER_IP="$SERVER_IP6_TUN"
> > > +             echo "Running IPv6 traffic in the tunnel"
> > > +     fi
> > > +
> > > +     ip netns exec "$CLIENT_NS" netperf -t TCP_STREAM -l 5 -H "$SERVER_IP" -- \
> > > +             -r 80000:80000 > /dev/null
> >
> > is -r valid for TCP_STREAM
> 
> Yes, it controls how much data is sent in one send() syscall. If I set
> a smaller size, the kernel will still try to aggregate the packets,
> but there will be far fewer BIG TCP packets.

Not according to the netperf care & feeding doc, where it is listed as an
RR option. Maybe it is used in practice; it may also depend on whether
netperf runs in omni or BSD mode. But just a note.

^ permalink raw reply

* Re: [PATCH net V2 1/3] net/mlx5: SD: Serialize init/cleanup
From: Shay Drori @ 2026-04-20  7:01 UTC (permalink / raw)
  To: Paolo Abeni, Tariq Toukan, Eric Dumazet, Jakub Kicinski,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Mark Bloch, Leon Romanovsky, Simon Horman,
	Kees Cook, Parav Pandit, Patrisious Haddad, Gal Pressman, netdev,
	linux-rdma, linux-kernel, Dragos Tatulea
In-Reply-To: <77503dd8-d882-4079-9dc8-f0cab89c0a7b@redhat.com>



On 16/04/2026 14:00, Paolo Abeni wrote:
> On 4/13/26 12:53 PM, Tariq Toukan wrote:
>> @@ -491,23 +508,35 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
>>   {
>>        struct mlx5_sd *sd = mlx5_get_sd(dev);
>>        struct mlx5_core_dev *primary, *pos;
>> +     struct mlx5_sd *primary_sd = NULL;
>>        int i;
>>
>>        if (!sd)
>>                return;
>>
>> +     mlx5_devcom_comp_lock(sd->devcom);
>>        if (!mlx5_devcom_comp_is_ready(sd->devcom))
>> -             goto out;
>> +             goto out_unlock;
>>
>>        primary = mlx5_sd_get_primary(dev);
>> +     primary_sd = mlx5_get_sd(primary);
>> +
>> +     if (primary_sd->state != MLX5_SD_STATE_UP)
>> +             goto out_unlock;
>> +
>>        mlx5_sd_for_each_secondary(i, primary, pos)
>>                sd_cmd_unset_secondary(pos);
>>        sd_cmd_unset_primary(primary);
>>        debugfs_remove_recursive(sd->dfs);
>>
>>        sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
>> -out:
>> +     primary_sd->state = MLX5_SD_STATE_DESTROYING;
>> +out_unlock:
>> +     mlx5_devcom_comp_unlock(sd->devcom);
>>        sd_unregister(dev);
>> +     if (primary_sd)
>> +             /* devcom isn't ready, reset the state */
>> +             primary_sd->state = MLX5_SD_STATE_DOWN;
> 
> Sashiko says:
> 
> ---
> Since primary_sd is only populated if devcom is ready, this condition will
> never trigger when devcom isn't ready, contrary to the comment. Instead,
> it triggers on the normal path after devcom is ready, immediately
> overwriting MLX5_SD_STATE_DESTROYING with MLX5_SD_STATE_DOWN outside the
> lock.
> Could this allow concurrent mlx5_sd_init() calls to see the down state and
> attempt hardware re-initialization while the group is still being torn down?
> Also, can this race and cause a use-after-free regression?
> During a concurrent Socket-Direct group teardown, the primary PF and
> secondary PF can execute mlx5_sd_cleanup() in parallel.
> If the primary PF completes its cleanup first, it will call
> sd_cleanup(primary), which calls kfree() on the sd structure, freeing the
> primary_sd memory.
> If the secondary PF is preempted just after releasing the devcom lock,
> it will resume, evaluate its local non-NULL primary_sd pointer, and
> locklessly write to primary_sd->state. Does this dereference the freed
> memory of the primary PF?

After giving this more thought following the above comments, I changed
the logic here so that devcom_set_ready(false) is done under
mlx5_devcom_comp_lock(), making the is_ready() check in sd_init() a
reliable gate for the cleanup/init race cases.

> ---
> 


^ permalink raw reply

* [syzbot] [wireless?] KASAN: slab-use-after-free Write in rsi_91x_deinit
From: syzbot @ 2026-04-20  7:05 UTC (permalink / raw)
  To: linux-kernel, linux-usb, linux-wireless, netdev, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    87117347a0e7 usb: dwc3: starfive: Add JHB100 USB 2.0 DRD c..
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
console output: https://syzkaller.appspot.com/x/log.txt?x=171b04ce580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2056c1e3f6d3b0bc
dashboard link: https://syzkaller.appspot.com/bug?extid=5de83f57cd8531f55596
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6168cf3e4727/disk-87117347.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/36e554f1750a/vmlinux-87117347.xz
kernel image: https://storage.googleapis.com/syzbot-assets/7aa92e741f66/bzImage-87117347.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5de83f57cd8531f55596@syzkaller.appspotmail.com

rsi_91x: rsi_probe: Failed to init usb interface
==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
BUG: KASAN: slab-use-after-free in atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
BUG: KASAN: slab-use-after-free in __refcount_add include/linux/refcount.h:283 [inline]
BUG: KASAN: slab-use-after-free in __refcount_inc include/linux/refcount.h:366 [inline]
BUG: KASAN: slab-use-after-free in refcount_inc include/linux/refcount.h:383 [inline]
BUG: KASAN: slab-use-after-free in get_task_struct include/linux/sched/task.h:116 [inline]
BUG: KASAN: slab-use-after-free in kthread_stop+0x8f/0x680 kernel/kthread.c:754
Write of size 4 at addr ffff88813d339da8 by task kworker/0:3/10583

CPU: 0 UID: 0 PID: 10583 Comm: kworker/0:3 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: usb_hub_wq hub_event
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x156/0x4c9 mm/kasan/report.c:482
 kasan_report+0xdf/0x1e0 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:186 [inline]
 kasan_check_range+0x10f/0x1e0 mm/kasan/generic.c:200
 instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
 atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
 __refcount_add include/linux/refcount.h:283 [inline]
 __refcount_inc include/linux/refcount.h:366 [inline]
 refcount_inc include/linux/refcount.h:383 [inline]
 get_task_struct include/linux/sched/task.h:116 [inline]
 kthread_stop+0x8f/0x680 kernel/kthread.c:754
 rsi_kill_thread drivers/net/wireless/rsi/rsi_common.h:78 [inline]
 rsi_91x_deinit+0x102/0x1f0 drivers/net/wireless/rsi/rsi_91x_main.c:405
 rsi_probe+0xd27/0x1aa0 drivers/net/wireless/rsi/rsi_91x_usb.c:861
 usb_probe_interface+0x303/0x8f0 drivers/usb/core/driver.c:396
 call_driver_probe drivers/base/dd.c:643 [inline]
 really_probe+0x241/0xa60 drivers/base/dd.c:721
 __driver_probe_device+0x1de/0x400 drivers/base/dd.c:863
 driver_probe_device+0x4c/0x1b0 drivers/base/dd.c:893
 __device_attach_driver+0x1df/0x340 drivers/base/dd.c:1021
 bus_for_each_drv+0x159/0x1e0 drivers/base/bus.c:500
 __device_attach+0x1e4/0x4d0 drivers/base/dd.c:1093
 device_initial_probe+0xaf/0xd0 drivers/base/dd.c:1148
 bus_probe_device+0x64/0x160 drivers/base/bus.c:613
 device_add+0x11d9/0x1950 drivers/base/core.c:3691
 usb_set_configuration+0xd97/0x1c60 drivers/usb/core/message.c:2268
 usb_generic_driver_probe+0xa1/0xe0 drivers/usb/core/generic.c:250
 usb_probe_device+0xef/0x400 drivers/usb/core/driver.c:291
 call_driver_probe drivers/base/dd.c:643 [inline]
 really_probe+0x241/0xa60 drivers/base/dd.c:721
 __driver_probe_device+0x1de/0x400 drivers/base/dd.c:863
 driver_probe_device+0x4c/0x1b0 drivers/base/dd.c:893
 __device_attach_driver+0x1df/0x340 drivers/base/dd.c:1021
 bus_for_each_drv+0x159/0x1e0 drivers/base/bus.c:500
 __device_attach+0x1e4/0x4d0 drivers/base/dd.c:1093
 device_initial_probe+0xaf/0xd0 drivers/base/dd.c:1148
 bus_probe_device+0x64/0x160 drivers/base/bus.c:613
 device_add+0x11d9/0x1950 drivers/base/core.c:3691
 usb_new_device.cold+0x685/0x115c drivers/usb/core/hub.c:2695
 hub_port_connect drivers/usb/core/hub.c:5567 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5707 [inline]
 port_event drivers/usb/core/hub.c:5871 [inline]
 hub_event+0x314d/0x4af0 drivers/usb/core/hub.c:5953
 process_one_work+0xa23/0x19a0 kernel/workqueue.c:3276
 process_scheduled_works kernel/workqueue.c:3359 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3440
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

Allocated by task 2:
 kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
 kasan_save_track+0x14/0x30 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6e/0x70 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4538 [inline]
 slab_alloc_node mm/slub.c:4866 [inline]
 kmem_cache_alloc_node_noprof+0x26b/0x6b0 mm/slub.c:4918
 alloc_task_struct_node kernel/fork.c:185 [inline]
 dup_task_struct kernel/fork.c:916 [inline]
 copy_process+0x48b/0x7820 kernel/fork.c:2050
 kernel_clone+0xfc/0x9a0 kernel/fork.c:2653
 kernel_thread+0xdb/0x120 kernel/fork.c:2714
 create_kthread kernel/kthread.c:459 [inline]
 kthreadd+0x498/0x7a0 kernel/kthread.c:817
 ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 14:
 kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
 kasan_save_track+0x14/0x30 mm/kasan/common.c:78
 kasan_save_free_info+0x3b/0x70 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x43/0x70 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2685 [inline]
 slab_free mm/slub.c:6165 [inline]
 kmem_cache_free+0x105/0x640 mm/slub.c:6295
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x5a2/0x10d0 kernel/rcu/tree.c:2869
 handle_softirqs+0x1de/0x9d0 kernel/softirq.c:622
 run_ksoftirqd kernel/softirq.c:1063 [inline]
 run_ksoftirqd+0x38/0x60 kernel/softirq.c:1055
 smpboot_thread_fn+0x3d3/0xaa0 kernel/smpboot.c:160
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Last potentially related work creation:
 kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
 kasan_record_aux_stack+0x8c/0xa0 mm/kasan/generic.c:556
 __call_rcu_common.constprop.0+0xa5/0x9b0 kernel/rcu/tree.c:3131
 put_task_struct include/linux/sched/task.h:159 [inline]
 put_task_struct include/linux/sched/task.h:128 [inline]
 delayed_put_task_struct+0xe4/0x2e0 kernel/exit.c:231
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x5a2/0x10d0 kernel/rcu/tree.c:2869
 handle_softirqs+0x1de/0x9d0 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 invoke_softirq kernel/softirq.c:496 [inline]
 __irq_exit_rcu+0xed/0x150 kernel/softirq.c:723
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
 sysvec_apic_timer_interrupt+0x8f/0xb0 arch/x86/kernel/apic/apic.c:1056
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697

Second to last potentially related work creation:
 kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
 kasan_record_aux_stack+0x8c/0xa0 mm/kasan/generic.c:556
 __call_rcu_common.constprop.0+0xa5/0x9b0 kernel/rcu/tree.c:3131
 put_task_struct_rcu_user kernel/exit.c:237 [inline]
 put_task_struct_rcu_user+0x6c/0xc0 kernel/exit.c:234
 context_switch kernel/sched/core.c:5301 [inline]
 __schedule+0xeb9/0x4220 kernel/sched/core.c:6911
 preempt_schedule_common+0x42/0xc0 kernel/sched/core.c:7095
 preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
 __raw_spin_unlock include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:389 [inline]
 filemap_map_pages+0x130f/0x1e50 mm/filemap.c:3936
 do_fault_around mm/memory.c:5757 [inline]
 do_read_fault mm/memory.c:5790 [inline]
 do_fault mm/memory.c:5933 [inline]
 do_pte_missing mm/memory.c:4477 [inline]
 handle_pte_fault mm/memory.c:6317 [inline]
 __handle_mm_fault+0x1e2e/0x2d60 mm/memory.c:6455
 handle_mm_fault+0x36d/0xa20 mm/memory.c:6624
 do_user_addr_fault+0x5ae/0x11d0 arch/x86/mm/fault.c:1334
 handle_page_fault arch/x86/mm/fault.c:1474 [inline]
 exc_page_fault+0x66/0xc0 arch/x86/mm/fault.c:1527
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618

The buggy address belongs to the object at ffff88813d339d80
 which belongs to the cache task_struct of size 7296
The buggy address is located 40 bytes inside of
 freed 7296-byte region [ffff88813d339d80, ffff88813d33ba00)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13d338
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff88813d33f601
flags: 0x200000000000040(head|node=0|zone=2)
page_type: f5(slab)
raw: 0200000000000040 ffff8881012d9500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800040004 00000000f5000000 ffff88813d33f601
head: 0200000000000040 ffff8881012d9500 dead000000000100 dead000000000122
head: 0000000000000000 0000000800040004 00000000f5000000 ffff88813d33f601
head: 0200000000000003 ffffea0004f4ce01 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5914, tgid 5914 (dhcpcd-run-hook), ts 193584350604, free_ts 193494512770
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x153/0x170 mm/page_alloc.c:1889
 prep_new_page mm/page_alloc.c:1897 [inline]
 get_page_from_freelist+0xf10/0x39f0 mm/page_alloc.c:3962
 __alloc_frozen_pages_noprof+0x273/0x2860 mm/page_alloc.c:5250
 alloc_slab_page mm/slub.c:3292 [inline]
 allocate_slab mm/slub.c:3481 [inline]
 new_slab+0xa6/0x6c0 mm/slub.c:3539
 refill_objects+0x26b/0x400 mm/slub.c:7175
 refill_sheaf mm/slub.c:2812 [inline]
 __pcs_replace_empty_main+0x1ab/0x660 mm/slub.c:4615
 alloc_from_pcs mm/slub.c:4717 [inline]
 slab_alloc_node mm/slub.c:4851 [inline]
 kmem_cache_alloc_node_noprof+0x4e9/0x6b0 mm/slub.c:4918
 alloc_task_struct_node kernel/fork.c:185 [inline]
 dup_task_struct kernel/fork.c:916 [inline]
 copy_process+0x48b/0x7820 kernel/fork.c:2050
 kernel_clone+0xfc/0x9a0 kernel/fork.c:2653
 __do_sys_clone+0xd9/0x120 kernel/fork.c:2794
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0x7b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 2861 tgid 2861 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1433 [inline]
 __free_frozen_pages+0x7b1/0xfb0 mm/page_alloc.c:2978
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x47/0xe0 mm/kasan/quarantine.c:179
 kasan_quarantine_reduce+0x1a0/0x1f0 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x4e/0x70 mm/kasan/common.c:350
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4538 [inline]
 slab_alloc_node mm/slub.c:4866 [inline]
 kmem_cache_alloc_noprof+0x2e7/0x6a0 mm/slub.c:4873
 alloc_filename fs/namei.c:142 [inline]
 do_getname+0x35/0x390 fs/namei.c:182
 getname include/linux/fs.h:2512 [inline]
 class_filename_constructor include/linux/fs.h:2539 [inline]
 do_sys_openat2+0xc5/0x1e0 fs/open.c:1365
 do_sys_open fs/open.c:1372 [inline]
 __do_sys_openat fs/open.c:1388 [inline]
 __se_sys_openat fs/open.c:1383 [inline]
 __x64_sys_openat+0x12d/0x210 fs/open.c:1383
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0x7b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88813d339c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88813d339d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88813d339d80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                  ^
 ffff88813d339e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88813d339e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
