Netdev List
* Re: [PATCH net v8 3/4] selftests: Add MACsec VLAN propagation traffic test
From: Sabrina Dubroca @ 2026-04-08 18:26 UTC
  To: Cosmin Ratiu
  Cc: netdev, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Stanislav Fomichev,
	David Wei, Shuah Khan, linux-kselftest, Dragos Tatulea
In-Reply-To: <20260408115240.1636047-4-cratiu@nvidia.com>

2026-04-08, 14:52:39 +0300, Cosmin Ratiu wrote:
> Add VLAN filter propagation tests through offloaded MACsec devices via
> actual traffic.
> 
> The tests create MACsec tunnels with matching SAs on both endpoints,
> stack VLANs on top, and verify connectivity with ping. Covered:
> - Offloaded MACsec with VLAN (filters propagate to HW)
> - Software MACsec with VLAN (no HW filter propagation)
> - Offload on/off toggle and verifying traffic still works
> 
> On netdevsim this makes use of the VLAN filter debugfs file to actually
> validate that filters are applied/removed correctly.
> On real hardware the traffic should validate actual VLAN filter
> propagation.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
>  tools/testing/selftests/drivers/net/config    |   1 +
>  .../selftests/drivers/net/lib/py/env.py       |   9 ++
>  tools/testing/selftests/drivers/net/macsec.py | 141 ++++++++++++++++++
>  3 files changed, 151 insertions(+)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina


* [PATCH net] net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()
From: Lorenzo Bianconi @ 2026-04-08 18:26 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: linux-arm-kernel, linux-mediatek, netdev

When the descriptor index written in REG_RX_CPU_IDX() is equal to the one
stored in REG_RX_DMA_IDX(), the hw will stop since the QDMA RX ring is
empty.
Add the missing REG_RX_CPU_IDX() configuration to the
airoha_qdma_cleanup_rx_queue() routine during QDMA RX ring cleanup.
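
For illustration only, the index relationship described above can be
sketched as a small userspace model. The struct and function names are
hypothetical and are not part of the airoha driver; the model only
assumes what the description states, namely that the hardware stops once
both indices match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two RX ring index registers. */
struct rx_ring_model {
	uint16_t cpu_idx; /* index last written by software */
	uint16_t dma_idx; /* index the hardware is working on */
};

/* Per the description, the hardware stops when both indices are equal. */
static bool rx_ring_hw_stopped(const struct rx_ring_model *r)
{
	return r->cpu_idx == r->dma_idx;
}

/* Cleanup: point both indices at the same slot so the ring reads as empty. */
static void rx_ring_cleanup(struct rx_ring_model *r, uint16_t idx)
{
	r->cpu_idx = idx;
	r->dma_idx = idx;
}
```

Updating only one of the two indices during cleanup would leave them
unequal in this model, so the ring would not read as empty.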

Fixes: 514aac359987 ("net: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 91cb63a32d99..919b7009cbe5 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -819,6 +819,11 @@ static void airoha_qdma_cleanup_rx_queue(struct airoha_queue *q)
 	}
 
 	q->head = q->tail;
+	/* Set RX_DMA_IDX to RX_CPU_IDX to notify the hw the QDMA RX ring is
+	 * empty.
+	 */
+	airoha_qdma_rmw(qdma, REG_RX_CPU_IDX(qid), RX_RING_CPU_IDX_MASK,
+			FIELD_PREP(RX_RING_CPU_IDX_MASK, q->head));
 	airoha_qdma_rmw(qdma, REG_RX_DMA_IDX(qid), RX_RING_DMA_IDX_MASK,
 			FIELD_PREP(RX_RING_DMA_IDX_MASK, q->tail));
 }

---
base-commit: f821664dde29302e8450aa0597bf1e4c7c5b0a22
change-id: 20260331-airoha-cpu-idx-airoha_qdma_cleanup_rx_queue-efde9e4ab786

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>



* [PATCH net] ipv4: nexthop: update has_v4 flag for any family change in replace_nexthop_single()
From: Xiang Mei @ 2026-04-08 18:28 UTC
  To: netdev; +Cc: dsahern, davem, edumazet, kuba, pabeni, idosch, bestswngs,
	Xiang Mei

When a nexthop within a group is replaced, nh_group_v4_update() is only
called when the old nexthop is AF_INET and the new one is AF_INET6. The
reverse direction (AF_INET6 to AF_INET) is not handled, leaving the
group's has_v4 flag stale at false.

This causes fib6_check_nexthop() to incorrectly accept an IPv6 route
referencing a group that now contains an AF_INET nexthop. During route
lookup, nexthop_fib6_nh() returns NULL for the AF_INET nexthop and the
subsequent dereference in rt6_find_cached_rt() crashes with a general
protection fault:

 Oops: general protection fault, probably for non-canonical address [...]
 KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
 RIP: 0010:ip6_pol_route
 Call Trace:
  fib6_rule_lookup
  ip6_route_output_flags
  inet6_rtm_getroute
  rtnetlink_rcv_msg
  [...]

Fix by calling nh_group_v4_update() whenever the old and new nexthops
have different address families, not just for AF_INET to AF_INET6.
Using a general inequality is safe here: individual nexthops can only
be AF_INET or AF_INET6 (enforced at parse time), so the only family
transitions possible are between these two, and nh_group_v4_update()
is a full rescan that always produces the correct has_v4 value.
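
A minimal userspace sketch of the reasoning above, assuming a simplified
group that only tracks member families. All names here are hypothetical
stand-ins and do not mirror the real nexthop structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AF_INET / AF_INET6. */
enum nh_family { NH_INET, NH_INET6 };

struct nh_group_model {
	enum nh_family members[4];
	size_t count;
	bool has_v4;
};

/* Full rescan: has_v4 is true if any member is IPv4. */
static void group_v4_update(struct nh_group_model *g)
{
	g->has_v4 = false;
	for (size_t i = 0; i < g->count; i++) {
		if (g->members[i] == NH_INET) {
			g->has_v4 = true;
			break;
		}
	}
}

/* Replace one member and rescan on any family change, in either direction. */
static void group_replace_member(struct nh_group_model *g, size_t idx,
				 enum nh_family new_family)
{
	enum nh_family old_family;

	assert(idx < g->count);
	old_family = g->members[idx];
	g->members[idx] = new_family;
	if (old_family != new_family)
		group_v4_update(g);
}
```

Because the rescan recomputes the flag from scratch, the inequality
check cannot leave has_v4 stale regardless of the direction of the
change.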

Fixes: 885a3b15791d ("ipv4: nexthop: Correctly update nexthop group when replacing a nexthop")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
 net/ipv4/nexthop.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 2c9036c719b6..b2ea15446cd2 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2466,10 +2466,10 @@ static int replace_nexthop_single(struct net *net, struct nexthop *old,
 			goto err_notify;
 	}
 
-	/* When replacing an IPv4 nexthop with an IPv6 nexthop, potentially
-	 * update IPv4 indication in all the groups using the nexthop.
+	/* When the nexthop family changes, update the IPv4 indication in all
+	 * the groups using the nexthop.
 	 */
-	if (oldi->family == AF_INET && newi->family == AF_INET6) {
+	if (oldi->family != newi->family) {
 		list_for_each_entry(nhge, &old->grp_list, nh_list) {
 			struct nexthop *nhp = nhge->nh_parent;
 			struct nh_group *nhg;
-- 
2.43.0



* [PATCH net] net/mlx5e: Fix use-after-free in mlx5e_tx_reporter_timeout_recover
From: Matt Fleming @ 2026-04-08 18:44 UTC
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-rdma, linux-kernel, kernel-team,
	Matt Fleming

From: Matt Fleming <mfleming@cloudflare.com>

mlx5e_tx_reporter_timeout_recover() accesses sq->netdev after
mlx5e_safe_reopen_channels() has torn down and freed the channel (and
its embedded SQs). Replace the three sq->netdev references with
priv->netdev, which is safe because priv outlives channel teardown.

The netdev_err() call already used priv->netdev for this reason; make
the trylock/unlock and health_channel_eq_recover calls consistent.
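
The lifetime problem can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the mlx5e code: the
*_model types and both functions are hypothetical, and the "lock" is a
plain flag.

```c
#include <stdlib.h>

struct netdev_model { int locked; };
struct sq_model { struct netdev_model *netdev; };

/* priv outlives the queues it owns. */
struct priv_model {
	struct netdev_model *netdev;
	struct sq_model *sq;
};

/* Tear down and rebuild the queue; pointers to the old sq become stale. */
static int reopen_channels(struct priv_model *priv)
{
	struct sq_model *new_sq = malloc(sizeof(*new_sq));

	if (!new_sq)
		return -1;
	new_sq->netdev = priv->netdev;
	free(priv->sq);
	priv->sq = new_sq;
	return 0;
}

static int timeout_recover(struct priv_model *priv)
{
	int err;

	priv->netdev->locked = 1;
	err = reopen_channels(priv);
	/* Unlock via priv: an sq pointer captured before the reopen
	 * may now refer to freed memory.
	 */
	priv->netdev->locked = 0;
	return err;
}
```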

This fixes the following KASAN splat:

  BUG: KASAN: use-after-free in mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
  Read of size 8 at addr ffff889860ed0b28 by task kworker/u113:2/5277

  Call Trace:
   mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
   devlink_health_reporter_recover+0xa2/0x150
   devlink_health_report+0x254/0x7c0
   mlx5e_reporter_tx_timeout+0x297/0x380 [mlx5_core]
   mlx5e_tx_timeout_work+0x109/0x170 [mlx5_core]
   process_one_work+0x677/0xf20
   worker_thread+0x51f/0xd90
   kthread+0x3a5/0x810
   ret_from_fork+0x208/0x400
   ret_from_fork_asm+0x1a/0x30

Fixes: 83ac0304a2d7 ("net/mlx5e: Fix deadlocks between devlink and netdev instance locks")
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index afdeb1b3d425..8409ae73768f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -160,13 +160,13 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	 * channels are being closed for other reason and this work is not
 	 * relevant anymore.
 	 */
-	while (!netdev_trylock(sq->netdev)) {
+	while (!netdev_trylock(priv->netdev)) {
 		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
 			return 0;
 		msleep(20);
 	}
 
-	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
+	err = mlx5e_health_channel_eq_recover(priv->netdev, eq, sq->cq.ch_stats);
 	if (!err) {
 		to_ctx->status = 0; /* this sq recovered */
 		goto out;
@@ -186,7 +186,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 		   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
 		   err);
 out:
-	netdev_unlock(sq->netdev);
+	netdev_unlock(priv->netdev);
 	return err;
 }
 
-- 
2.43.0



* [PATCH iwl-net 0/4] ice: E825C missing PHY timestamp interrupt fixes
From: Jacob Keller @ 2026-04-08 18:46 UTC
  To: Anthony Nguyen, Intel Wired LAN, netdev
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov,
	Jacob Keller

We recently ran into a nasty corner case issue with a customer operating
E825C cards who was seeing strange behavior with missing Tx timestamps.
This series contains a few fixes found during the course of debugging.

The primary issue discovered in the investigation is a misconfiguration of
the E825C PHY timestamp interrupt register, PHY_REG_TS_INT_CONFIG. This
register is responsible for programming the Tx timestamp behavior of a PHY
port. The driver programs two values here: a threshold for when to
interrupt and whether the interrupt is enabled.

The threshold value is used by hardware to determine when to trigger a Tx
timestamp interrupt. The interrupt cause for the port is raised when the
number of outstanding timestamps in the PHY port timestamp memory meets the
threshold. The interrupt cause is not cleared until the number of
outstanding timestamps drops *below* the threshold.

It is considered a misconfiguration if the threshold is programmed to 0. If
the interrupt is enabled while the threshold is zero, hardware will raise
the interrupt cause at the next time it checks. Once raised, the interrupt
cause for the port will never lower, since you cannot have fewer than zero
outstanding timestamps.
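
As a rough illustration of the raise/clear rule just described, here is
a userspace sketch. It models only the threshold comparison; the struct
and function names are hypothetical and the real PHY logic is certainly
more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one PHY port's timestamp interrupt logic. */
struct phy_port_model {
	uint8_t threshold;   /* programmed interrupt threshold */
	uint8_t outstanding; /* timestamps waiting in the port's memory */
	bool enabled;        /* interrupt enable bit */
	bool cause;          /* interrupt cause for the port */
};

/* Raise the cause when outstanding meets the threshold, and lower it
 * only when outstanding drops below the threshold.
 */
static void phy_port_evaluate(struct phy_port_model *p)
{
	if (!p->enabled)
		return;
	p->cause = p->outstanding >= p->threshold;
}
```

With a threshold of 0 the comparison is true for every possible count,
so in this model the cause can never lower once the interrupt is
enabled.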

Worse, the timestamp status for the port will remain high even if the
PHY_REG_TS_INT_CONFIG is reprogrammed with a new threshold. The PHY is a
separate hardware block from the MAC, and thus the interrupt status for the
port will remain high even if you reset the device MAC with a PF reset,
CORE reset, or GLOBAL reset.

PHY ports are connected together into quads. Each quad muxes the PHY
interrupt status for the 4 ports on the quad together before connecting
that to the MAC's miscellaneous interrupt vector. As a result, if a single
PHY port in the quad is stuck, no timestamp interrupts will be generated
for any timestamp on any port on that quad.
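
The effect of the quad mux can be sketched as below, assuming the muxed
line is a logical OR of the four port statuses and that an interrupt is
generated only when the line goes from low to high. Both assumptions are
a simplification for illustration, and all names are hypothetical.

```c
#include <stdbool.h>

#define PORTS_PER_QUAD 4

struct quad_model {
	bool port_status[PORTS_PER_QUAD]; /* per-port interrupt status */
	bool line;                        /* last observed muxed level */
	unsigned int irqs;                /* interrupts generated so far */
};

/* The muxed line is high if any port in the quad is high. */
static bool quad_mux(const struct quad_model *q)
{
	bool level = false;

	for (int i = 0; i < PORTS_PER_QUAD; i++)
		level = level || q->port_status[i];
	return level;
}

/* Count an interrupt only on a low-to-high transition of the line. */
static void quad_update(struct quad_model *q)
{
	bool level = quad_mux(q);

	if (level && !q->line)
		q->irqs++;
	q->line = level;
}
```

If port 0 stays high, later activity on ports 1 to 3 never produces a
new transition, which matches the stuck behavior described above.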

The ice driver never directly writes a value of 0 for the threshold.
Indeed, the desired behavior is to set the threshold to 1, so that
interrupts are generated as soon as a single timestamp is captured.
Unfortunately, it turns out that for the E825C PHY, programming the
threshold and enable bit in the same write may cause a race in the PHY
timestamp block. The PHY may "see" the interrupt as enabled first before it
sees the threshold value. If the previous threshold value is zero (such as
when the register is initialized to zero at a cold power on), the hardware
may race with programming the threshold and set the PHY interrupt status to
high as described above.

The first patch in this series corrects that programming order, ensuring
that the threshold is always written first in a separate transaction from
enabling the interrupt bit. Additionally, an explicit check against writing
a 0 is added to make it clear to future readers that writing 0 to the
threshold while enabling the interrupt is not safe.

The PHY timestamp block does not reset with the MAC, and seems to only
reset during cold power on. This makes recovery from the faulty
configuration difficult. To address this, perform an explicit reset of the
PHY PTP block during initialization. This is achieved by writing the
PHY_REG_GLOBAL register. This performs a PHY soft reset, which completely
resets the timestamp block. This includes clearing the timestamp memory,
the PHY timestamp interrupt status, and the PHY PTP counter. A soft reset
of all ports on the device is done as part of ice_ptp_init_phc() during
early initialization of the PTP functionality by the PTP clock owner, prior
to programming each PHY. The ice_ptp_init_phc() function is called at
driver init and during reinitialization after all forms of device reset.
This ensures that the driver begins operation at a clean slate, rather than
carrying over the stale and potentially buggy configuration of a previous
driver.

While attempting to root cause the issue with the PHY timestamp interrupt,
we also discovered that the driver incorrectly assumes that it is operating
on E822 hardware when reading the PHY timestamp memory status registers in
a few places. This includes the check at the end of the interrupt handler,
as well as the check done inside the PTP auxiliary function. This prevented
the driver from detecting waiting timestamps on ports other than the first
two.

Finally, the ice_ptp_read_tx_hwtstamp_status_eth56g() function was
discovered to only read the timestamp interrupt status value from the first
quad due to mistaking the port index for a PHY quad index. This resulted in
reporting the timestamp status for the second quad as identical to the
first quad instead of properly reporting its value. This is a minor fix
since the function currently is only used for diagnostic purposes and does
not impact driver decision logic.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
Grzegorz Nitka (2):
      ice: fix timestamp interrupt configuration for E825C
      ice: perform PHY soft reset for E825C ports at initialization

Jacob Keller (2):
      ice: fix ready bitmap check for non-E822 devices
      ice: fix ice_ptp_read_tx_hwtstamp_status_eth56g

 drivers/net/ethernet/intel/ice/ice_ptp_hw.h |   5 +
 drivers/net/ethernet/intel/ice/ice_ptp.c    |  40 ++---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 253 +++++++++++++++++++++++++++-
 3 files changed, 265 insertions(+), 33 deletions(-)
---
base-commit: e3b6e4778608889866917014b7dfe88425073fe5
change-id: 20260408-jk-even-more-e825c-fixes-9a6dd7311bd6

Best regards,
--  
Jacob Keller <jacob.e.keller@intel.com>



* [PATCH iwl-net 4/4] ice: fix ice_ptp_read_tx_hwtstamp_status_eth56g
From: Jacob Keller @ 2026-04-08 18:46 UTC
  To: Anthony Nguyen, Intel Wired LAN, netdev
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov,
	Jacob Keller
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-0-b959da91a81f@intel.com>

The ice_ptp_read_tx_hwtstamp_status_eth56g function calls
ice_read_phy_eth56g with a PHY index. However, the function actually expects
a port index. This causes the function to read the wrong PHY_PTP_INT_STATUS
registers, and effectively makes the status wrong for the second set of
ports from 4 to 7.

The ice_read_phy_eth56g function uses the provided port index to determine
which PHY device to read. We could refactor the entire chain to take a PHY
index, but this would impact many code sites. Instead, multiply the PHY
index by the number of ports, so that we read from the first port of each
PHY.
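
The index arithmetic can be illustrated with a small standalone sketch.
The helper names are hypothetical, and the mask is assumed here to cover
the low ports_per_phy bits of each PHY's status; the driver defines its
own mask.

```c
#include <stdint.h>

/* First port index belonging to a given PHY. */
static uint8_t phy_first_port(uint8_t phy, uint8_t ports_per_phy)
{
	return (uint8_t)(phy * ports_per_phy);
}

/* Fold per-PHY status words into one bitmap with one bit per port.
 * Assumes ports_per_phy is small enough that the shifts stay in range.
 */
static uint32_t combine_status(const uint32_t *per_phy_status,
			       uint8_t num_phys, uint8_t ports_per_phy)
{
	uint32_t mask = (1u << ports_per_phy) - 1;
	uint32_t out = 0;

	for (uint8_t phy = 0; phy < num_phys; phy++) {
		uint8_t port = phy_first_port(phy, ports_per_phy);

		out |= (per_phy_status[phy] & mask) << port;
	}
	return out;
}
```

With two PHYs of four ports each, PHY 1 maps to port 4, so its status
bits land in bits 4 to 7 of the combined value.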

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 64ad5ed5c688..672218e5d1f9 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -2219,13 +2219,19 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status)
 	*ts_status = 0;
 
 	for (phy = 0; phy < params->num_phys; phy++) {
+		u8 port;
 		int err;
 
-		err = ice_read_phy_eth56g(hw, phy, PHY_PTP_INT_STATUS, &status);
+		/* ice_read_phy_eth56g expects a port index, so use the first
+		 * port of the PHY
+		 */
+		port = phy * hw->ptp.ports_per_phy;
+
+		err = ice_read_phy_eth56g(hw, port, PHY_PTP_INT_STATUS, &status);
 		if (err)
 			return err;
 
-		*ts_status |= (status & mask) << (phy * hw->ptp.ports_per_phy);
+		*ts_status |= (status & mask) << port;
 	}
 
 	ice_debug(hw, ICE_DBG_PTP, "PHY interrupt err: %x\n", *ts_status);

-- 
2.53.0.1066.g1eceb487f285



* [PATCH iwl-net 3/4] ice: fix ready bitmap check for non-E822 devices
From: Jacob Keller @ 2026-04-08 18:46 UTC
  To: Anthony Nguyen, Intel Wired LAN, netdev
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov,
	Jacob Keller
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-0-b959da91a81f@intel.com>

The E800 hardware (apart from E810) has a ready bitmap for the PHY
indicating which timestamp slots currently have an outstanding timestamp
waiting to be read by software.

This bitmap is checked in multiple places using
ice_get_phy_tx_tstamp_ready():

 * ice_ptp_process_tx_tstamp() calls it to determine which timestamps to
   attempt reading from the PHY
 * ice_ptp_tx_tstamps_pending() calls it in a loop at the end of the
   miscellaneous IRQ to check if new timestamps came in while the interrupt
   handler was executing.
 * ice_ptp_maybe_trigger_tx_interrupt() calls it in the auxiliary work task
   to trigger a software interrupt in the event that the hardware logic
   gets stuck.

For E82x devices, multiple PHYs share the same block, and the parameter
passed to the ready bitmap function is a block number associated with the
given port. For E825-C devices, the PHYs have their own independent blocks
and do not share, so the parameter passed needs to be the port number. For
E810 devices, ice_get_phy_tx_tstamp_ready() always returns all 1s regardless
of port, since this hardware does not have a ready bitmap. Finally, for E830
devices, each PF has its own ready bitmap accessible via register, and the
block parameter is unused.

The first call correctly uses the Tx timestamp tracker block parameter to
check the appropriate timestamp block. This works because the tracker is
set up correctly for each timestamp device type.

The other two callers behave incorrectly for all device types other than
the older E822 devices. They both iterate in a loop using
ICE_GET_QUAD_NUM(), a macro that is only meaningful for E822 devices.

For E810 the calls would always return true, causing E810 devices to always
attempt to trigger a software interrupt even when they have no reason to.
For E830, this results in duplicate work as the ready bitmap is checked
once per number of quads. Finally, for E825-C, this results in the pending
checks failing to detect timestamps on ports other than the first two.

Fix this by introducing a new hardware API function to ice_ptp_hw.c,
ice_check_phy_tx_tstamp_ready(). This function checks whether any
timestamps are available and returns a positive value if any are pending.
For E810, the function always returns 0, so that the re-trigger checks
never happen. For E830, check the ready bitmap just once. For E82x
hardware, check each quad. Finally, for E825-C, check every port.

The interface function returns an integer to enable reporting of an error
code if the driver is unable to read the ready bitmap. This enables callers
to handle this case properly. The previous implementation assumed that
timestamps are available if it failed to read the bitmap. This is
problematic as it could lead to continuous software IRQ triggering if the
PHY timestamp registers somehow become inaccessible.
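
A hedged userspace sketch of this return convention follows. The
block_read type simulates the outcome of reading one ready bitmap and is
purely hypothetical, as are the function names; only the tri-state
return (negative, 0, 1) is taken from the description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of reading one block's ready bitmap. */
struct block_read {
	int err;        /* 0 on success, negative errno on failure */
	uint64_t ready; /* ready bitmap, valid only when err == 0 */
};

/* Return 1 if any block has a ready bit set, 0 if none do, or a
 * negative error code as soon as a read fails.
 */
static int check_ready(const struct block_read *blocks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].err)
			return blocks[i].err;
		if (blocks[i].ready)
			return 1;
	}
	return 0;
}

/* Caller side: a read failure is not treated as "timestamps pending",
 * so an unreadable PHY does not keep re-triggering the interrupt.
 */
static bool tstamps_pending(const struct block_read *blocks, size_t n)
{
	int ret = check_ready(blocks, n);

	if (ret < 0)
		return false;
	return ret > 0;
}
```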

This change is especially important for E825-C devices, as the missing
checks could leave a window open where a new timestamp could arrive while
the existing timestamps aren't completed. As a result, the hardware
threshold logic would not trigger a new interrupt. Without the check, the
timestamp is left unhandled, and new timestamps will not cause an interrupt
again until the timestamp is handled. Since both the interrupt check and
the backup check in the auxiliary task do not function properly, the device
may have Tx timestamps permanently stuck failing on a given port.

The faulty checks originate from commit d938a8cca88a ("ice: Auxbus devices
& driver for E822 TS") and commit 712e876371f8 ("ice: periodically kick Tx
timestamp interrupt"). However, at the time of the original coding, both
functions only operated on E822 hardware. This is no longer the case, and
hasn't been since the introduction of the ETH56G PHY model in commit
7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products").

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h |   1 +
 drivers/net/ethernet/intel/ice/ice_ptp.c    |  40 ++++------
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 117 ++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 9d7acc7eb2ce..1b58b054f4a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -300,6 +300,7 @@ void ice_ptp_reset_ts_memory(struct ice_hw *hw);
 int ice_ptp_init_phc(struct ice_hw *hw);
 void ice_ptp_init_hw(struct ice_hw *hw);
 int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready);
+int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw);
 int ice_ptp_one_port_cmd(struct ice_hw *hw, u8 configured_port,
 			 enum ice_ptp_tmr_cmd configured_cmd);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index ada42bcc4d0b..34906f972d17 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -2718,7 +2718,7 @@ static bool ice_any_port_has_timestamps(struct ice_pf *pf)
 bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
 {
 	struct ice_hw *hw = &pf->hw;
-	unsigned int i;
+	int ret;
 
 	/* Check software indicator */
 	switch (pf->ptp.tx_interrupt_mode) {
@@ -2739,16 +2739,15 @@ bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
 	}
 
 	/* Check hardware indicator */
-	for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
-		u64 tstamp_ready = 0;
-		int err;
-
-		err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
-		if (err || tstamp_ready)
-			return true;
+	ret = ice_check_phy_tx_tstamp_ready(hw);
+	if (ret < 0) {
+		dev_dbg(ice_pf_to_dev(pf), "Unable to read PHY Tx timestamp ready bitmap, err %d\n",
+			ret);
+		/* Stop triggering IRQs if we're unable to read PHY */
+		return false;
 	}
 
-	return false;
+	return ret;
 }
 
 /**
@@ -2832,8 +2831,7 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
-	bool trigger_oicr = false;
-	unsigned int i;
+	int ret;
 
 	if (!pf->ptp.port.tx.has_ready_bitmap)
 		return;
@@ -2841,21 +2839,11 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
 	if (!ice_pf_src_tmr_owned(pf))
 		return;
 
-	for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
-		u64 tstamp_ready;
-		int err;
-
-		err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
-		if (!err && tstamp_ready) {
-			trigger_oicr = true;
-			break;
-		}
-	}
-
-	if (trigger_oicr) {
-		/* Trigger a software interrupt, to ensure this data
-		 * gets processed.
-		 */
+	ret = ice_check_phy_tx_tstamp_ready(hw);
+	if (ret < 0) {
+		dev_dbg(dev, "PTP periodic task unable to read PHY timestamp ready bitmap, err %d\n",
+			ret);
+	} else if (ret) {
 		dev_dbg(dev, "PTP periodic task detected waiting timestamps. Triggering Tx timestamp interrupt now.\n");
 
 		wr32(hw, PFINT_OICR, PFINT_OICR_TSYN_TX_M);
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 441b5f10e4bb..64ad5ed5c688 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -2168,6 +2168,35 @@ int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port)
 	return 0;
 }
 
+/**
+ * ice_check_phy_tx_tstamp_ready_eth56g - Check Tx memory status for all ports
+ * @hw: pointer to the HW struct
+ *
+ * Check the PHY_REG_TX_MEMORY_STATUS for all ports. A set bit indicates
+ * a waiting timestamp.
+ *
+ * Return: 1 if any port has at least one timestamp ready bit set,
+ * 0 otherwise, and a negative error code if unable to read the bitmap.
+ */
+static int ice_check_phy_tx_tstamp_ready_eth56g(struct ice_hw *hw)
+{
+	int port;
+
+	for (port = 0; port < hw->ptp.num_lports; port++) {
+		u64 tstamp_ready;
+		int err;
+
+		err = ice_get_phy_tx_tstamp_ready(hw, port, &tstamp_ready);
+		if (err)
+			return err;
+
+		if (tstamp_ready)
+			return 1;
+	}
+
+	return 0;
+}
+
 /**
  * ice_ptp_read_tx_hwtstamp_status_eth56g - Get TX timestamp status
  * @hw: pointer to the HW struct
@@ -4318,6 +4347,35 @@ ice_get_phy_tx_tstamp_ready_e82x(struct ice_hw *hw, u8 quad, u64 *tstamp_ready)
 	return 0;
 }
 
+/**
+ * ice_check_phy_tx_tstamp_ready_e82x - Check Tx memory status for all quads
+ * @hw: pointer to the HW struct
+ *
+ * Check the Q_REG_TX_MEMORY_STATUS for all quads. A set bit indicates
+ * a waiting timestamp.
+ *
+ * Return: 1 if any quad has at least one timestamp ready bit set,
+ * 0 otherwise, and a negative error value if unable to read the bitmap.
+ */
+static int ice_check_phy_tx_tstamp_ready_e82x(struct ice_hw *hw)
+{
+	int quad;
+
+	for (quad = 0; quad < ICE_GET_QUAD_NUM(hw->ptp.num_lports); quad++) {
+		u64 tstamp_ready;
+		int err;
+
+		err = ice_get_phy_tx_tstamp_ready(hw, quad, &tstamp_ready);
+		if (err)
+			return err;
+
+		if (tstamp_ready)
+			return 1;
+	}
+
+	return 0;
+}
+
 /**
  * ice_phy_cfg_intr_e82x - Configure TX timestamp interrupt
  * @hw: pointer to the HW struct
@@ -4870,6 +4928,23 @@ ice_get_phy_tx_tstamp_ready_e810(struct ice_hw *hw, u8 port, u64 *tstamp_ready)
 	return 0;
 }
 
+/**
+ * ice_check_phy_tx_tstamp_ready_e810 - Check Tx memory status register
+ * @hw: pointer to the HW struct
+ *
+ * The E810 devices do not have a Tx memory status register. Note this is
+ * intentionally different behavior from ice_get_phy_tx_tstamp_ready_e810
+ * which always says that all bits are ready. This function is called in cases
+ * where code will trigger interrupts if timestamps are waiting, and should
+ * not be called for E810 hardware.
+ *
+ * Return: 0.
+ */
+static int ice_check_phy_tx_tstamp_ready_e810(struct ice_hw *hw)
+{
+	return 0;
+}
+
 /* E810 SMA functions
  *
  * The following functions operate specifically on E810 hardware and are used
@@ -5124,6 +5199,21 @@ static void ice_get_phy_tx_tstamp_ready_e830(const struct ice_hw *hw, u8 port,
 	*tstamp_ready |= rd32(hw, E830_PRTMAC_TS_TX_MEM_VALID_L);
 }
 
+/**
+ * ice_check_phy_tx_tstamp_ready_e830 - Check Tx memory status register
+ * @hw: pointer to the HW struct
+ *
+ * Return: 1 if the device has waiting timestamps, 0 otherwise.
+ */
+static int ice_check_phy_tx_tstamp_ready_e830(struct ice_hw *hw)
+{
+	u64 tstamp_ready;
+
+	ice_get_phy_tx_tstamp_ready_e830(hw, 0, &tstamp_ready);
+
+	return !!tstamp_ready;
+}
+
 /**
  * ice_ptp_init_phy_e830 - initialize PHY parameters
  * @ptp: pointer to the PTP HW struct
@@ -5716,6 +5806,33 @@ int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready)
 	}
 }
 
+/**
+ * ice_check_phy_tx_tstamp_ready - Check PHY Tx timestamp memory status
+ * @hw: pointer to the HW struct
+ *
+ * Check the PHY for Tx timestamp memory status on all ports. If you need to
+ * see individual timestamp status for each index, use
+ * ice_get_phy_tx_tstamp_ready() instead.
+ *
+ * Return: 1 if any port has timestamps available, 0 if there are no timestamps
+ * available, and a negative error code on failure.
+ */
+int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw)
+{
+	switch (hw->mac_type) {
+	case ICE_MAC_E810:
+		return ice_check_phy_tx_tstamp_ready_e810(hw);
+	case ICE_MAC_E830:
+		return ice_check_phy_tx_tstamp_ready_e830(hw);
+	case ICE_MAC_GENERIC:
+		return ice_check_phy_tx_tstamp_ready_e82x(hw);
+	case ICE_MAC_GENERIC_3K_E825:
+		return ice_check_phy_tx_tstamp_ready_eth56g(hw);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 /**
  * ice_cgu_get_pin_desc_e823 - get pin description array
  * @hw: pointer to the hw struct

-- 
2.53.0.1066.g1eceb487f285



* [PATCH iwl-net 1/4] ice: fix timestamp interrupt configuration for E825C
From: Jacob Keller @ 2026-04-08 18:46 UTC
  To: Anthony Nguyen, Intel Wired LAN, netdev
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov,
	Jacob Keller
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-0-b959da91a81f@intel.com>

From: Grzegorz Nitka <grzegorz.nitka@intel.com>

The E825C ice_phy_cfg_intr_eth56g() function is responsible for programming
the PHY interrupt for a given port. This function writes to the
PHY_REG_TS_INT_CONFIG register of the port. The register is responsible for
configuring whether the port interrupt logic is enabled, as well as
programming the threshold of waiting timestamps that will trigger an
interrupt from this port.

This threshold value must not be programmed to zero while the interrupt is
enabled. Doing so puts the port in a misconfigured state where the PHY
timestamp interrupt for the quad of connected ports will become stuck.

This occurs because a threshold of zero results in the timestamp interrupt
status for the port becoming stuck high. The four ports in the connected
quad have their timestamp status indicators muxed together. A new interrupt
cannot be generated until the timestamp status indicators return low for
all four ports.

Normally, the timestamp status for a port will clear once there are fewer
timestamps in that port's timestamp memory bank than the threshold. A
threshold of zero makes this impossible, so the timestamp status for the
port does not clear.

The ice driver never intentionally programs the threshold to zero. Indeed,
the driver always programs it to a value of 1, intending to get an
interrupt immediately as soon as even a single packet is waiting for a
timestamp.

However, there is a subtle flaw in the programming logic in the
ice_phy_cfg_intr_eth56g() function, due to the way that the hardware
handles enabling the PHY interrupt. If the threshold value is modified at
the same time as the interrupt is enabled, the HW PHY state machine might
enable the interrupt before the new threshold value is actually updated.
This leaves a potential race condition caused by the hardware logic where
a PHY timestamp interrupt might be triggered before the non-zero threshold
is written, resulting in the PHY timestamp logic becoming stuck.

Once the PHY timestamp status is stuck high, it will remain stuck even
after attempting to reprogram the PHY block by changing its threshold or
disabling the interrupt. Even a typical PF or CORE reset will not reset the
particular block of the PHY that becomes stuck. Even a warm power cycle is
not guaranteed to cause the PHY block to reset, and a cold power cycle is
required.

Prevent this by always writing the PHY_REG_TS_INT_CONFIG in two stages.
First write the threshold value with the interrupt disabled, and only write
the enable bit after the threshold has been programmed. When disabling the
interrupt, leave the threshold unchanged. Additionally, re-read the
register after writing it to guarantee that the write to the PHY has been
flushed upon exit of the function.
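
The ordering can be illustrated with a userspace model. This is a sketch
under loud assumptions: the bit layout, the phy_reg_model type, and all
function names are hypothetical, the race is modeled pessimistically
(the enable bit is always evaluated against the previously stored
threshold), and the read-back flush is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout, not the real PHY_REG_TS_INT_CONFIG. */
#define CFG_ENA          (1u << 0)
#define CFG_THRESH_SHIFT 1
#define CFG_THRESH_MASK  (0x3Fu << CFG_THRESH_SHIFT)

struct phy_reg_model {
	uint32_t value;
	unsigned int writes;
	bool stuck; /* set once enable is seen with a zero old threshold */
};

/* Pessimistic model: the enable bit takes effect before the new
 * threshold, so it is checked against the threshold already stored.
 */
static void reg_write(struct phy_reg_model *r, uint32_t val)
{
	uint32_t old_thresh = r->value & CFG_THRESH_MASK;

	if ((val & CFG_ENA) && old_thresh == 0)
		r->stuck = true;
	r->value = val;
	r->writes++;
}

/* Original scheme: threshold and enable bit in a single write. */
static void cfg_intr_single_write(struct phy_reg_model *r, uint8_t threshold)
{
	uint32_t val = r->value & ~CFG_THRESH_MASK;

	val |= ((uint32_t)threshold << CFG_THRESH_SHIFT) & CFG_THRESH_MASK;
	reg_write(r, val | CFG_ENA);
}

/* Two-stage scheme: write the threshold with the interrupt disabled,
 * then enable. Disabling leaves the threshold unchanged.
 */
static int cfg_intr_two_stage(struct phy_reg_model *r, bool ena,
			      uint8_t threshold)
{
	uint32_t val;

	if (ena && !threshold)
		return -EINVAL;

	val = r->value & ~CFG_ENA;
	if (ena) {
		val &= ~CFG_THRESH_MASK;
		val |= ((uint32_t)threshold << CFG_THRESH_SHIFT) &
		       CFG_THRESH_MASK;
		reg_write(r, val); /* stage 1: threshold only */
		val |= CFG_ENA;
	}
	reg_write(r, val); /* stage 2: enable, or the disable write */
	return 0;
}
```

Starting from a zeroed register, the single write trips the stuck
condition in this model, while the two-stage sequence does not.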

While we're modifying this function implementation, explicitly reject
programming a threshold of 0 when enabling the interrupt. No caller does
this today, but the consequences of doing so are significant. An explicit
rejection in the code makes this clear.
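
As a rough userspace illustration of the two-stage ordering described
above (not the driver code itself), the sketch below records every value
written to a stand-in register so the ordering can be inspected. The
struct fake_phy type, the fake_write() helper and the TS_INT_* field
layout are assumptions made up for this example; the real register
layout is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define TS_INT_ENA		(1u << 0)
#define TS_INT_THRESH_SHIFT	1
#define TS_INT_THRESH_MASK	(0x3Fu << TS_INT_THRESH_SHIFT)

/* Stand-in for the PHY register; remembers the first few writes. */
struct fake_phy {
	uint32_t reg;
	uint32_t history[4];
	int nwrites;
};

static void fake_write(struct fake_phy *phy, uint32_t val)
{
	phy->reg = val;
	if (phy->nwrites < 4)
		phy->history[phy->nwrites] = val;
	phy->nwrites++;
}

static int cfg_intr_two_stage(struct fake_phy *phy, bool ena, uint8_t threshold)
{
	uint32_t val;

	/* A zero threshold with the interrupt enabled is rejected up front. */
	if (ena && !threshold)
		return -EINVAL;

	/* Always start from a value with the enable bit cleared. */
	val = phy->reg & ~TS_INT_ENA;
	if (ena) {
		val &= ~TS_INT_THRESH_MASK;
		val |= ((uint32_t)threshold << TS_INT_THRESH_SHIFT) &
		       TS_INT_THRESH_MASK;
		/* Stage 1: new threshold, interrupt still disabled. */
		fake_write(phy, val);
		val |= TS_INT_ENA;
	}

	/* Stage 2: set the enable bit, or just clear it when disabling. */
	fake_write(phy, val);
	return 0;
}
```

With this model, enabling produces two writes (threshold first, enable
bit second) and disabling produces a single write that leaves the
threshold field untouched.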

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 36 +++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index e3db252c3918..67775beb9449 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -1847,6 +1847,8 @@ static int ice_phy_cfg_mac_eth56g(struct ice_hw *hw, u8 port)
  * @ena: enable or disable interrupt
  * @threshold: interrupt threshold
  *
+ * The threshold cannot be 0 while the interrupt is enabled.
+ *
  * Configure TX timestamp interrupt for the specified port
  *
  * Return:
@@ -1858,19 +1860,45 @@ int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold)
 	int err;
 	u32 val;
 
+	if (ena && !threshold)
+		return -EINVAL;
+
 	err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val);
 	if (err)
 		return err;
 
+	val &= ~PHY_TS_INT_CONFIG_ENA_M;
 	if (ena) {
-		val |= PHY_TS_INT_CONFIG_ENA_M;
 		val &= ~PHY_TS_INT_CONFIG_THRESHOLD_M;
 		val |= FIELD_PREP(PHY_TS_INT_CONFIG_THRESHOLD_M, threshold);
-	} else {
-		val &= ~PHY_TS_INT_CONFIG_ENA_M;
+		err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG,
+					       val);
+		if (err) {
+			ice_debug(hw, ICE_DBG_PTP,
+				  "Failed to update 'threshold' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
+				  port, !!ena, threshold);
+			return err;
+		}
+		val |= PHY_TS_INT_CONFIG_ENA_M;
 	}
 
-	return ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val);
+	err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val);
+	if (err) {
+		ice_debug(hw, ICE_DBG_PTP,
+			  "Failed to update 'ena' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
+			  port, !!ena, threshold);
+		return err;
+	}
+
+	err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val);
+	if (err) {
+		ice_debug(hw, ICE_DBG_PTP,
+			  "Failed to read PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
+			  port, !!ena, threshold);
+		return err;
+	}
+
+	return 0;
 }
 
 /**

-- 
2.53.0.1066.g1eceb487f285


^ permalink raw reply related

* [PATCH iwl-net 2/4] ice: perform PHY soft reset for E825C ports at initialization
From: Jacob Keller @ 2026-04-08 18:46 UTC (permalink / raw)
  To: Anthony Nguyen, Intel Wired LAN, netdev
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov,
	Jacob Keller
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-0-b959da91a81f@intel.com>

From: Grzegorz Nitka <grzegorz.nitka@intel.com>

In some cases the PHY timestamp block of the E825C can become stuck. This
is known to occur if the software writes 0 to the Tx timestamp threshold,
and with older versions of the ice driver the threshold configuration is
buggy and can race such that hardware briefly operates with a zero
threshold enabled. There are no other known ways to trigger this behavior,
but once it occurs, the hardware is not recovered by normal reset, a driver
reload, or even a warm power cycle of the system. A cold power cycle is
sufficient to recover hardware, but this is extremely invasive and can
result in significant downtime on customer deployments.

The PHY for each port has a timestamping block which has its own reset
functionality accessible by programming the PHY_REG_GLOBAL register.
Writing to the PHY_REG_GLOBAL_SOFT_RESET_M bit triggers the hardware to
perform a complete reset of the timestamping block of the PHY. This
includes clearing the timestamp status for the port, clearing all
outstanding timestamps in the memory bank, and resetting the PHY timer.

The new ice_ptp_phy_soft_reset_eth56g() function toggles the
PHY_REG_GLOBAL soft reset bit with the required delays, ensuring the
PHY is properly reinitialized without requiring a full device reset.
The sequence clears the reset bit, asserts it, then clears it again,
with short waits between transitions to allow hardware stabilization.
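
The clear/assert/clear sequence can be modelled in plain C as below.
This is only a sketch: struct fake_global, global_write() and settle()
are invented stand-ins for the register access and delay, and only the
BIT(11) mask is taken from the patch. The settle() stub merely counts
calls where the driver sleeps.

```c
#include <stdint.h>

#define SOFT_RESET_M	(1u << 11)	/* BIT(11), as defined in the patch */

/* Stand-in for PHY_REG_GLOBAL; records the three writes and the delays. */
struct fake_global {
	uint32_t reg;
	uint32_t writes[3];
	int nwrites;
	int ndelays;
};

static void global_write(struct fake_global *g, uint32_t val)
{
	g->reg = val;
	if (g->nwrites < 3)
		g->writes[g->nwrites] = val;
	g->nwrites++;
}

/* Placeholder for the short sleep between transitions. */
static void settle(struct fake_global *g)
{
	g->ndelays++;
}

static void soft_reset_sketch(struct fake_global *g)
{
	uint32_t val = g->reg;

	val &= ~SOFT_RESET_M;		/* 1. clear the soft reset bit */
	global_write(g, val);
	settle(g);

	val |= SOFT_RESET_M;		/* 2. assert the soft reset bit */
	global_write(g, val);
	settle(g);

	val &= ~SOFT_RESET_M;		/* 3. clear it again */
	global_write(g, val);
}
```

Note that the other bits of the register are preserved across all three
writes; only bit 11 toggles.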

Call this function in the new ice_ptp_init_phc_e825c(), implementing the
E825C device specific variant of the ice_ptp_init_phc(). Note that if
ice_ptp_init_phc() fails, PTP functionality may be disabled, but the driver
will still load to allow basic functionality to continue.

This causes the clock owning PF driver to perform a PHY soft reset for
every port during initialization. This ensures the driver begins life in a
known functional state regardless of how it was previously programmed.

This ensures that we properly reconfigure the hardware after a device reset
or when loading the driver, even if it was previously misconfigured with an
out-of-date or modified driver.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Timothy Miskell <timothy.miskell@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h |  4 ++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 90 ++++++++++++++++++++++++++++-
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
index 5896b346e579..9d7acc7eb2ce 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
@@ -374,6 +374,7 @@ int ice_stop_phy_timer_eth56g(struct ice_hw *hw, u8 port, bool soft_reset);
 int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port);
 int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold);
 int ice_phy_cfg_ptp_1step_eth56g(struct ice_hw *hw, u8 port);
+int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port);
 
 #define ICE_ETH56G_NOMINAL_INCVAL	0x140000000ULL
 #define ICE_ETH56G_NOMINAL_PCS_REF_TUS	0x100000000ULL
@@ -676,6 +677,9 @@ static inline u64 ice_get_base_incval(struct ice_hw *hw)
 #define ICE_P0_GNSS_PRSNT_N	BIT(4)
 
 /* ETH56G PHY register addresses */
+#define PHY_REG_GLOBAL			0x0
+#define PHY_REG_GLOBAL_SOFT_RESET_M	BIT(11)
+
 /* Timestamp PHY incval registers */
 #define PHY_REG_TIMETUS_L		0x8
 #define PHY_REG_TIMETUS_U		0xC
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 67775beb9449..441b5f10e4bb 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -377,6 +377,31 @@ static void ice_ptp_cfg_sync_delay(const struct ice_hw *hw, u32 delay)
  * The following functions operate on devices with the ETH 56G PHY.
  */
 
+/**
+ * ice_ptp_init_phc_e825c - Perform E825C specific PHC initialization
+ * @hw: pointer to HW struct
+ *
+ * Perform E825C-specific PTP hardware clock initialization steps.
+ *
+ * Return: 0 on success, or a negative error value on failure.
+ */
+static int ice_ptp_init_phc_e825c(struct ice_hw *hw)
+{
+	int err;
+
+	/* Soft reset all ports, to ensure everything is at a clean state */
+	for (int port = 0; port < hw->ptp.num_lports; port++) {
+		err = ice_ptp_phy_soft_reset_eth56g(hw, port);
+		if (err) {
+			ice_debug(hw, ICE_DBG_PTP, "Failed to soft reset port %d, err %d\n",
+				  port, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
 /**
  * ice_ptp_get_dest_dev_e825 - get destination PHY for given port number
  * @hw: pointer to the HW struct
@@ -2179,6 +2204,69 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status)
 	return 0;
 }
 
+/**
+ * ice_ptp_phy_soft_reset_eth56g - Perform a PHY soft reset on ETH56G
+ * @hw: pointer to the HW structure
+ * @port: PHY port number
+ *
+ * Trigger a soft reset of the ETH56G PHY by toggling the soft reset
+ * bit in the PHY global register. The reset sequence consists of:
+ *   1. Clearing the soft reset bit
+ *   2. Asserting the soft reset bit
+ *   3. Clearing the soft reset bit again
+ *
+ * Short delays are inserted between each step to allow the hardware
+ * to settle. This provides a controlled way to reinitialize the PHY
+ * without requiring a full device reset.
+ *
+ * Return: 0 on success, or a negative error code on failure when
+ *         reading or writing the PHY register.
+ */
+int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port)
+{
+	u32 global_val;
+	int err;
+
+	err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, &global_val);
+	if (err) {
+		ice_debug(hw, ICE_DBG_PTP, "Failed to read PHY_REG_GLOBAL for port %d, err %d\n",
+			  port, err);
+		return err;
+	}
+
+	global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M;
+	ice_debug(hw, ICE_DBG_PTP, "Clearing soft reset bit for port %d, val: 0x%x\n",
+		  port, global_val);
+	err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
+	if (err) {
+		ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
+			  port, err);
+		return err;
+	}
+
+	usleep_range(5000, 6000);
+
+	global_val |= PHY_REG_GLOBAL_SOFT_RESET_M;
+	ice_debug(hw, ICE_DBG_PTP, "Set soft reset bit for port %d, val: 0x%x\n",
+		  port, global_val);
+	err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
+	if (err) {
+		ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
+			  port, err);
+		return err;
+	}
+	usleep_range(5000, 6000);
+
+	global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M;
+	ice_debug(hw, ICE_DBG_PTP, "Clear soft reset bit for port %d, val: 0x%x\n",
+		  port, global_val);
+	err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
+	if (err)
+		ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
+			  port, err);
+	return err;
+}
+
 /**
  * ice_get_phy_tx_tstamp_ready_eth56g - Read the Tx memory status register
  * @hw: pointer to the HW struct
@@ -5591,7 +5679,7 @@ int ice_ptp_init_phc(struct ice_hw *hw)
 	case ICE_MAC_GENERIC:
 		return ice_ptp_init_phc_e82x(hw);
 	case ICE_MAC_GENERIC_3K_E825:
-		return 0;
+		return ice_ptp_init_phc_e825c(hw);
 	default:
 		return -EOPNOTSUPP;
 	}

-- 
2.53.0.1066.g1eceb487f285


^ permalink raw reply related

* Re: [net-next v9 07/10] net: bnxt: Implement software USO
From: Joe Damato @ 2026-04-08 18:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, horms, linux-kernel, leon
In-Reply-To: <20260408110652.652e4732@kernel.org>

On Wed, Apr 08, 2026 at 11:06:52AM -0700, Jakub Kicinski wrote:
> On Wed, 8 Apr 2026 10:04:55 -0700 Joe Damato wrote:

[...]

> >    2. Or, keep the smaller buffer that we have now (BNXT_SW_USO_MAX_SEGS (64)
> >       * 256b = 16kb per ring) and fix the try_stop like this:
> > 
> > +static inline u16 bnxt_inline_avail(struct bnxt_tx_ring_info *txr)
> > +{
> > +       return BNXT_SW_USO_MAX_SEGS -
> > +              (u16)(txr->tx_inline_prod - READ_ONCE(txr->tx_inline_cons));
> > +}
> > +
> > 
> > -       slots = txr->tx_inline_prod - txr->tx_inline_cons;
> > -       slots = BNXT_SW_USO_MAX_SEGS - slots;
> > -
> > -       if (unlikely(slots < num_segs)) {
> > -               netif_txq_try_stop(txq, slots, num_segs);
> > +       if (unlikely(bnxt_inline_avail(txr) < num_segs)) {
> > +               netif_txq_try_stop(txq, bnxt_inline_avail(txr), num_segs);
> 
> I think option 2 makes sense. The point (which I think you got) is that
> the condition must be evaluated after the memory barrier.
> 
> Since the condition is repeated in your latest snippet - you can
> probably use netif_txq_maybe_stop() ?

Yea, that's better, thanks.

I'll go with the inline wrapper above and:

-       slots = txr->tx_inline_prod - txr->tx_inline_cons;
-       slots = BNXT_SW_USO_MAX_SEGS - slots;
-
-       if (unlikely(slots < num_segs)) {
-               netif_txq_try_stop(txq, slots, num_segs);
+       if (!netif_txq_maybe_stop(txq, bnxt_inline_avail(txr),
+                                 num_segs, num_segs))
                return NETDEV_TX_BUSY;
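
For reference, the free-slot arithmetic in the wrapper can be checked
in userspace with a small sketch like the one below. It assumes the
producer and consumer indices are free-running 16-bit counters (which
the u16 cast suggests, but that is my assumption), and it uses 64 for
the ring size as quoted above. MAX_SEGS and inline_avail() are names
made up for this example.

```c
#include <stdint.h>

#define MAX_SEGS 64u	/* BNXT_SW_USO_MAX_SEGS as quoted in the thread */

/*
 * Free slots given free-running producer/consumer counters. The cast to
 * uint16_t makes the difference wrap modulo 65536, so the result stays
 * correct after the producer index wraps past the consumer index.
 */
static uint16_t inline_avail(uint16_t prod, uint16_t cons)
{
	return (uint16_t)(MAX_SEGS - (uint16_t)(prod - cons));
}
```

For example, prod = 2 and cons = 65530 means 8 slots are in use even
though prod is numerically smaller than cons, so 56 are available.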

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
From: David Laight @ 2026-04-08 18:56 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team
In-Reply-To: <adZcnNgxhsUjAgZW@gmail.com>

On Wed, 8 Apr 2026 06:52:54 -0700
Breno Leitao <leitao@debian.org> wrote:

> Hello David,
> 
> On Wed, Apr 08, 2026 at 12:26:53PM +0100, David Laight wrote:
> > On Wed, 08 Apr 2026 03:30:28 -0700
> > Breno Leitao <leitao@debian.org> wrote:
> >  
> > > Currently, the .getsockopt callback requires __user pointers:
> > >
> > >   int (*getsockopt)(struct socket *sock, int level,
> > >                     int optname, char __user *optval, int __user *optlen);
> > >
> > > This prevents kernel callers (io_uring, BPF) from using getsockopt on
> > > levels other than SOL_SOCKET, since they pass kernel pointers.
> > >
> > > Following Linus' suggestion [0], this series introduces sockopt_t, a
> > > type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> > > works with both user and kernel buffers. AF_PACKET and CAN raw are
> > > converted as initial users, with selftests covering the trickiest
> > > conversion patterns.  
> >
> > What are you doing about the cases where 'optlen' is a complete lie?  
> 
> Is this incorrect optlen originating from userspace, and getting into
> the .getsockopt callbacks?

Look at tcp_ao_copy_mptks_to_user() in net/ipv4/tcp_ao.c
This isn't 'old code'; it was added in 2023.

Basically what is being transferred is an array and 'optlen' is the
size of one element.
The number of elements is in the first one.

Yes, it is completely broken.

There was also some very old code that just didn't check the length
(probably only for 'int' sized parameters).
That might all have disappeared when decnet support was removed.

There was also a very longstanding bug that pretty much all the IP
protocols would treat negative lengths as 4.
That got 'fixed' not long ago, I do wonder how many applications that
broke! Passing an uninitialised on-stack variable would have worked
(for 'int' parameters) provided it wasn't in [0..3].
Even then there is code that will copy 1 byte (instead of 4) when
a short length is passed - but it only does something sensible on LE.

I've been through all this code trying to replace the 'int *optlen'
with 'unsigned int optlen' and then returning the updated length
(or -ERRNO) to the wrapper.
That simplifies 99% of the code.
However there are a very small number of places that want to return
an error with a corrected length.
If you were starting from scratch you could say that returning a bigger
length would return a specific errno (maybe -ERANGE) and the updated
length - but there is no consistency.

I pretty much decided that the getsockopt() functions would have to
be able to return one of:
	-errno
	length
	GETSOCKOPT_RVAL(errno, length)
with the wrapper separating out the merged value.
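
How the merged value would be packed is not spelled out here. Purely
as a hypothetical sketch, one possible encoding keeps plain errno
values in the usual [-4095, -1] range and pushes the merged
(errno, length) pair further negative. Every name below (the SKETCH_*
macros, struct sockopt_result, sketch_split_rval()) is invented for
this example, and it assumes lengths below 65536.

```c
#include <errno.h>

#define SKETCH_MAX_ERRNO	4095
#define SKETCH_LEN_BITS		16

/* Hypothetical packing: always <= -65536, so it cannot be a plain -errno. */
#define SKETCH_GETSOCKOPT_RVAL(err, len) \
	(-(((err) << SKETCH_LEN_BITS) | (len)))

struct sockopt_result {
	int err;	/* positive errno, or 0 */
	int len;	/* length to report back, or 0 */
};

/* What a wrapper might do to separate the three kinds of return value. */
static struct sockopt_result sketch_split_rval(int rv)
{
	struct sockopt_result r = { 0, 0 };

	if (rv >= 0) {
		r.len = rv;			/* plain length */
	} else if (rv >= -SKETCH_MAX_ERRNO) {
		r.err = -rv;			/* plain -errno */
	} else {
		int merged = -rv;		/* errno plus corrected length */

		r.err = merged >> SKETCH_LEN_BITS;
		r.len = merged & ((1 << SKETCH_LEN_BITS) - 1);
	}
	return r;
}
```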

	David


> 
> > IIRC there is one related to some form of async io where it is just
> > the length of the header, the actual buffer length depends on
> > data in the header.  
> 
> Could you point me to the relevant code so I can examine this case?
> 
> > This doesn't matter with the existing code for applications, when they
> > get it wrong they just crash.  
> 
> Is this crash being triggered by the protocol callbacks?
> 
> I tried searching for this but couldn't find it. I'd appreciate any
> hints you could provide about this case.
> 
> Thanks
> --breno


^ permalink raw reply

* [PATCH net] ice: Fix missing 1's complement negation in GCS raw checksum
From: Matt Fleming @ 2026-04-08 19:02 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, intel-wired-lan, netdev, linux-kernel, kernel-team,
	Matt Fleming

From: Matt Fleming <mfleming@cloudflare.com>

Commit 905d1a220e8d ("ice: Add E830 checksum offload support") added
Generic Checksum (GCS) support for E830 NICs but omitted the 1's
complement negation (~) when converting the hardware raw_csum to
skb->csum for CHECKSUM_COMPLETE.

Without the negation, every CHECKSUM_COMPLETE packet fails the
fast-path validation in nf_ip_checksum() and falls through to software
checksumming via __skb_checksum_complete(), which triggers the
rate-limited "hw csum failure" warning. Packets are still accepted
(the software recheck passes) but hardware checksum offload is
effectively disabled and the warning floods dmesg on systems running
nf_conntrack on VLAN sub-interfaces.

Multiple other drivers (idpf, ehea, iwlwifi, cassini, sunhme, enetc)
also apply ~ for CHECKSUM_COMPLETE. The ice driver was the only in-tree
user of csum_unfold() for CHECKSUM_COMPLETE that omitted it.
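
To illustrate why the negation matters, here is a small userspace
sketch of 16-bit ones' complement arithmetic. It assumes, as the
description above states, that the device reports the complemented
sum; byte swapping is left out, and all helper names are invented for
the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of a buffer read as big-endian 16-bit words. */
static uint16_t ones_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte */
	return fold16(sum);
}

/* Assumed device behaviour: the raw value is the complemented sum. */
static uint16_t device_raw_csum(const uint8_t *data, size_t len)
{
	return (uint16_t)~ones_sum(data, len);
}

/* What the receive path has to do to get the plain sum back. */
static uint16_t raw_to_sum(uint16_t raw)
{
	return (uint16_t)~raw;
}
```

Using the raw value without the negation hands the stack the
complement of the sum it expects, which is why validation then falls
back to software.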

Fixes: 905d1a220e8d ("ice: Add E830 checksum offload support")
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index e695a664e53d..c177579e0114 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -92,7 +92,7 @@ static void ice_rx_gcs(struct sk_buff *skb,
 	desc = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
 	skb->ip_summed = CHECKSUM_COMPLETE;
 	csum = (__force u16)desc->raw_csum;
-	skb->csum = csum_unfold((__force __sum16)swab16(csum));
+	skb->csum = csum_unfold((__force __sum16)~swab16(csum));
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next 0/3] psp: add crypt-offset and spi-threshold attributes
From: Akhilesh Samineni @ 2026-04-08 19:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, willemb,
	daniel.zahka, netdev, linux-kernel, jayakrishnan.udayavarma,
	ajit.khaparde, kiran.kella, sachin.suman
In-Reply-To: <20260407180928.3ce5ddca@kernel.org>

On Wed, Apr 8, 2026 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 7 Apr 2026 21:09:38 +0530 Akhilesh Samineni wrote:
> > > Please read this document:
> > > https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
> >
> > Thank you for the link. I have reviewed the netdev process documentation.
>
> It is one thing to make an unknowing mistake and another thing
> to ignore someone asking you to read the documentation.
> Please read the doc top to bottom and tell your entire team to read it.

Hi Jakub, Daniel,

My apologies. I missed the requirement in the documentation regarding
the necessity of a real driver implementation alongside netdevsim for
new APIs.

I will submit the next version of the patch after the PSP driver is upstreamed.

^ permalink raw reply

* Re: [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
From: Simon Schippers @ 2026-04-08 19:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: willemdebruijn.kernel, andrew+netdev, davem, edumazet, kuba,
	pabeni, mst, eperezma, leiyang, stephen, jon, tim.gebauer, netdev,
	linux-kernel, kvm, virtualization
In-Reply-To: <b9d84d88-46d5-4fd3-a5b2-d914f54766f6@tu-dortmund.de>

Hi Jason,
what do you think about my previous reply?

I'm hoping to bring this patchset to an end soon, but I need your
feedback for that.

Thanks :)


^ permalink raw reply

* Re: [PATCH] nfc: nci: fix OOB heap read in nci_core_init_rsp_packet_v1()
From: Simon Horman @ 2026-04-08 19:05 UTC (permalink / raw)
  To: Lekë Hapçiu
  Cc: netdev, kuba, davem, edumazet, pabeni, security,
	Lekë Hapçiu
In-Reply-To: <20260404103016.1292588-1-snowwlake@icloud.com>

On Sat, Apr 04, 2026 at 12:30:16PM +0200, Lekë Hapçiu wrote:
> From: Lekë Hapçiu <framemain@outlook.com>
> 
> nci_core_init_rsp_packet_v1() uses the raw chip-supplied
> num_supported_rf_interfaces byte to compute the rsp_2 pointer, but
> the preceding min() already stores the capped value in
> ndev->num_supported_rf_interfaces.  When a hostile chip returns
> num_supported_rf_interfaces > 4 the memcpy is safe (capped) but rsp_2
> lands past the end of the skb, and the fields copied out of it corrupt
> nci_dev with data from adjacent kernel heap.
> 
> Use the already-capped ndev->num_supported_rf_interfaces for both the
> length check and the pointer, making the relationship between the two
> explicit.
> 
> Fixes: e8c0dacd9836 ("NFC: Update names and structs to NCI spec 1.0 d18")
> Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
> ---
> v2: drop intermediate offset variable, check skb->len directly against
>     ndev->num_supported_rf_interfaces + sizeof(*rsp_2) (Jakub Kicinski)

I'm unable to locate v1 and Jakub's review of it.
Could you provide a link?

> 
>  net/nfc/nci/rsp.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/nfc/nci/rsp.c b/net/nfc/nci/rsp.c
> index 9eeb86282..4aaf362b9 100644
> --- a/net/nfc/nci/rsp.c
> +++ b/net/nfc/nci/rsp.c
> @@ -66,7 +66,16 @@ static u8 nci_core_init_rsp_packet_v1(struct nci_dev *ndev,
>  	       rsp_1->supported_rf_interfaces,
>  	       ndev->num_supported_rf_interfaces);
>  
> -	rsp_2 = (void *) (skb->data + 6 + rsp_1->num_supported_rf_interfaces);
> +	if (skb->len < sizeof(*rsp_1) + ndev->num_supported_rf_interfaces +
> +		       sizeof(*rsp_2)) {

There are accesses to skb->data before this check.
And it seems they could also overrun the length of the skb.
So I think a check needs to be placed towards the beginning
of this function. (Sashiko has something similar to say.)

Also, I don't think that checking skb->len is sufficient,
as data may be present in the non-linear portion of the skb.
I suspect pskb_may_pull is needed.

If so, I think the same problem exists in the call path.

> +		pr_err("CORE_INIT_RSP too short: len=%u need=%zu\n",
> +		       skb->len,
> +		       sizeof(*rsp_1) + ndev->num_supported_rf_interfaces +
> +		       sizeof(*rsp_2));
> +		return NCI_STATUS_SYNTAX_ERROR;
> +	}
> +	rsp_2 = (void *)(skb->data + sizeof(*rsp_1) +
> +			 ndev->num_supported_rf_interfaces);
>  
>  	ndev->max_logical_connections = rsp_2->max_logical_connections;
>  	ndev->max_routing_table_size =

Sashiko also asks if it is valid to read the packet
if num_supported_rf_interfaces is truncated. Won't the beginning
of rsp_2 lie in the trailing supported_rf_interfaces of the rsp_1
header?
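
To make the ordering concern concrete, here is a userspace sketch of a
parser that validates the length before touching any byte, and that
rejects an oversized count instead of truncating it. This is one
possible policy, not what the patch does. The layout constants
(HDR_LEN, COUNT_OFF, TRAILER_LEN) are illustrative assumptions rather
than the real NCI structures, and the sketch only models a linear
buffer, so it says nothing about the non-linear skb case.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN		6	/* hypothetical fixed header size */
#define COUNT_OFF	5	/* hypothetical offset of the count byte */
#define MAX_IFACES	4	/* cap mentioned in the commit message */
#define TRAILER_LEN	4	/* hypothetical size of the trailing struct */

/* Returns a pointer to the trailer, or NULL if the buffer is malformed. */
static const uint8_t *find_trailer(const uint8_t *buf, size_t len)
{
	uint8_t count;

	/* Check the fixed header is present before reading the count. */
	if (len < HDR_LEN)
		return NULL;

	count = buf[COUNT_OFF];
	if (count > MAX_IFACES)
		return NULL;	/* reject rather than silently truncate */

	/* The variable part plus the trailer must fit in what remains. */
	if (len - HDR_LEN < (size_t)count + TRAILER_LEN)
		return NULL;

	return buf + HDR_LEN + count;
}
```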

^ permalink raw reply

* [PATCH net-next v2] net: Add net_cookie to Dead loop messages
From: Chris J Arges @ 2026-04-08 19:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern
  Cc: kernel-team, Chris J Arges, netdev, linux-kernel

Network devices can have the same name within different network namespaces.
To help distinguish these devices, add the net_cookie value which can be
used to identify the netns.

Signed-off-by: Chris J Arges <carges@cloudflare.com>
---
v2: rebased on net-next
---
 net/core/dev.c            | 5 +++--
 net/ipv4/ip_tunnel_core.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5a31f9d2128c..3c752ba3aa29 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4886,8 +4886,9 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	} else {
 		/* Recursion is detected! It is possible unfortunately. */
 recursion_alert:
-		net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
-				     dev->name);
+		net_crit_ratelimited("Dead loop on virtual device %s (net %llu), fix it urgently!\n",
+				     dev->name, dev_net(dev)->net_cookie);
+
 		rc = -ENETDOWN;
 	}
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index f430d6f0463e..2667f53482bd 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -60,8 +60,8 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 
 	if (unlikely(dev_recursion_level() > IP_TUNNEL_RECURSION_LIMIT)) {
 		if (dev) {
-			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
-					     dev->name);
+			net_crit_ratelimited("Dead loop on virtual device %s (net %llu), fix it urgently!\n",
+					     dev->name, dev_net(dev)->net_cookie);
 			DEV_STATS_INC(dev, tx_errors);
 		}
 		ip_rt_put(rt);
-- 
2.43.0


^ permalink raw reply related

* Re: [net,PATCH] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Nicolai Buchwitz @ 2026-04-08 19:15 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Ronald Wahl, Yicong Hui,
	linux-kernel
In-Reply-To: <a9845b8e-5d3f-472b-8f03-bba699ba3882@nabladev.com>

On 8.4.2026 17:41, Marek Vasut wrote:
> On 4/8/26 12:54 PM, Nicolai Buchwitz wrote:
> 
> Hello Nicolai,
> 
> thank you for testing on the SPI variant, that helped a lot.
> 
>> In order to make this work I would propose something like this (which 
>> works in my SPI setup):
>> 
>> --- a/drivers/net/ethernet/micrel/ks8851_par.c
>> +++ b/drivers/net/ethernet/micrel/ks8851_par.c
>> @@ -60,12 +60,14 @@ static void ks8851_lock_par(struct ks8851_net *ks, 
>> unsigned long *flags)
>>   {
>>       struct ks8851_net_par *ksp = to_ks8851_par(ks);
>> 
>> +    local_bh_disable();
>>       spin_lock_irqsave(&ksp->lock, *flags);
>>   }
>> 
>>   static void ks8851_unlock_par(struct ks8851_net *ks, unsigned long 
>> *flags)
>>   {
>>       struct ks8851_net_par *ksp = to_ks8851_par(ks);
>> 
>>       spin_unlock_irqrestore(&ksp->lock, *flags);
>> +    local_bh_enable();
>>   }
>> 
>> Tested-by: Nicolai Buchwitz <nb@tipi-net.de>  # KS8851 SPI, non-RT 
>> (regression + proposed fix)
> 
> Are you also able to test the KS8851 driver with PREEMPT_RT enabled and 
> heavy iperf3 traffic on the SPI variant ? Does that trigger any issues 
> ? I ran 'iperf3 -s' on the KS8851 end and 'iperf3 -c 192.168.1.300 -t 0 
> --bidir' on the host PC side.

Successfully tested with both PREEMPT_RT and non-RT kernels using the 
iperf3 command above - no issues observed. Both builds included the fix 
from my previous message.
If there is anything else worth testing on the KS8851 SPI variant, 
please let me know.

> 
> Let me prepare a slightly updated fix and send a V2.

Regards
Nicolai

^ permalink raw reply

* Re: [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk()
From: Martin KaFai Lau @ 2026-04-08 19:22 UTC (permalink / raw)
  To: Jiayuan Chen, Kuniyuki Iwashima
  Cc: Eric Dumazet, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, John Fastabend,
	Stanislav Fomichev, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, netdev, linux-kernel, linux-kselftest
In-Reply-To: <CAAVpQUDz_rUFF1A8XDyE13fLTsgdP5k0XWGtdB1V3r=Z_mJW+g@mail.gmail.com>

On Tue, Apr 07, 2026 at 09:25:06PM -0700, Kuniyuki Iwashima wrote:
> > > sashiko has flagged a similar issue with larger scope.
> > > Please take a look. Thanks.
> > >
> > > https://sashiko.dev/#/patchset/20260403015851.148209-1-jiayuan.chen%40linux.dev
> >
> >
> > Thanks a lot Martin, sashiko actually dug into a deeper issue here.
> >
> > Eric and Kuniyuki,
> >
> > I think the AI review has a point. Since BPF can modify skb fields, the
> > following sequence still bypasses the protocol check in
> > bpf_sk_assign_tcp_reqsk():
> >
> >     // for a UDP skb
> >     iph->protocol = TCP
> >     bpf_sk_assign_tcp_reqsk()
> >     iph->protocol = UDP
> >
> > On top of that, bpf_sk_assign() already has the same problem — it doesn't
> > validate L4 protocol at all.
> 
> Sigh... honestly it does not make sense to me to add changes
> in the common fast path to protect someone with bpf capability
> shooting oneself in the foot.
> 
> On top of L4 validation in bpf_sk_assign() and bpf_sk_assign_tcp_reqsk(),
> can't we mark such an skb immutable after the helpers and catch
> subsequent writes to skb->data on the verifier ?

Clearing the skb->sk in a helper like bpf_skb_store_bytes or
rejecting direct writes to skb->data could break existing
bpf program.

I suspect adding a simple iph->protocol/ip6h->nexthdr check to
the helper (e.g. bpf_sk_assign) could also break some
tunneling use cases (e.g. ipip).

> 
> 
> >
> > So I think we should add a check matching skb against sk in
> > skb_steal_sock() instead of adding check in bpf helper.

Maybe limit the check to the '*prefetched' case in skb_steal_sock().

FWIW, in the early days of bpf_sk_assign, a tc bpf program could only
get hold of a tcp_sock. Later, bpf_map_lookup_elem(&sock_map) was
allowed in tc, and then udp/unix sock support was also added to sock_map.

There have been discussions on tc bpf programs being able to do
bpf_map_lookup_elem(&sock_map) to get a unix_sock. AFAIK, this
looked-up unix_sock can be used in bpf_sk_assign. It probably
makes sense for bpf_sk_assign to reject all non-tcp/non-udp sk.

^ permalink raw reply

* Re: [PATCH net] ipv4: nexthop: update has_v4 flag for any family change in replace_nexthop_single()
From: David Ahern @ 2026-04-08 19:22 UTC (permalink / raw)
  To: Xiang Mei, netdev; +Cc: davem, edumazet, kuba, pabeni, idosch, bestswngs
In-Reply-To: <20260408182850.2618488-1-xmei5@asu.edu>

On 4/8/26 12:28 PM, Xiang Mei wrote:
> When a nexthop within a group is replaced, nh_group_v4_update() is only
> called when the old nexthop is AF_INET and the new one is AF_INET6. The
> reverse direction (AF_INET6 to AF_INET) is not handled, leaving the
> group's has_v4 flag stale at false.
> 
> This causes fib6_check_nexthop() to incorrectly accept an IPv6 route
> referencing a group that now contains an AF_INET nexthop. During route
> lookup, nexthop_fib6_nh() returns NULL for the AF_INET nexthop and the
> subsequent dereference in rt6_find_cached_rt() crashes with a general
> protection fault:
> 

...

> 
> Fix by calling nh_group_v4_update() whenever the old and new nexthops
> have different address families, not just for AF_INET to AF_INET6.
> Using a general inequality is safe here: individual nexthops can only
> be AF_INET or AF_INET6 (enforced at parse time), so the only family
> transitions possible are between these two, and nh_group_v4_update()
> is a full rescan that always produces the correct has_v4 value.
> 

please add a test to tools/testing/selftests/net/fib_nexthops.sh
covering this use case.
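
As a side note, the "full rescan" behaviour the commit message relies
on can be modelled in a few lines of userspace C. This is only a
sketch of the idea (recompute the flag on any family change, in either
direction); the enum, group_has_v4() and replace_member() are invented
names and do not mirror the kernel data structures.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_family { SKETCH_AF_INET, SKETCH_AF_INET6 };

/* Full rescan: true if any member of the group is IPv4. */
static bool group_has_v4(const enum sketch_family *fam, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (fam[i] == SKETCH_AF_INET)
			return true;
	return false;
}

/* Replace one member and refresh the cached flag on any family change. */
static void replace_member(enum sketch_family *fam, size_t n, size_t idx,
			   enum sketch_family new_fam, bool *has_v4)
{
	bool changed = fam[idx] != new_fam;

	fam[idx] = new_fam;
	if (changed)
		*has_v4 = group_has_v4(fam, n);
}
```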


^ permalink raw reply

* Re: [PATCH net] tcp: update window_clamp when SO_RCVBUF is set
From: Eric Dumazet @ 2026-04-08 19:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, pabeni, andrew+netdev, horms, ncardwell, kuniyu,
	willemb, dsahern, quic_subashab, quic_stranche
In-Reply-To: <20260408001438.129165-1-kuba@kernel.org>

On Tue, Apr 7, 2026 at 5:14 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit under Fixes moved recomputing the window clamp to
> tcp_measure_rcv_mss() (when scaling_ratio changes).
> I suspect it missed the fact that we don't recompute the clamp
> when rcvbuf is set. Until scaling_ratio changes we are
> stuck with the old window clamp which may be based on
> the small initial buffer. scaling_ratio may never change.
>
> Inspired by Eric's recent commit d1361840f8c5 ("tcp: fix
> SO_RCVLOWAT and RCVBUF autotuning") plumb the user action
> thru to TCP and have it update the clamp.
>
> A smaller fix would be to just have tcp_rcvbuf_grow()
> adjust the clamp even if SOCK_RCVBUF_LOCK is set.
> But IIUC this is what we were trying to get away from
> in the first place.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Eric Dumazet <edumazet@google.com>
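
The stale-clamp problem described in the quoted commit message can be illustrated with a small user-space model. This is a sketch under assumptions, not the patch itself: `struct tcp_model` and the two setter functions are hypothetical, `RMEM_TO_WIN_SCALE` is assumed to mirror the kernel's `TCP_RMEM_TO_WIN_SCALE`, and the model ignores the adjustments the kernel applies to the value passed to SO_RCVBUF.

```c
#include <stdint.h>

#define RMEM_TO_WIN_SCALE 8	/* assumption: mirrors TCP_RMEM_TO_WIN_SCALE */

/* Hypothetical, heavily reduced view of the receive-side state. */
struct tcp_model {
	int rcvbuf;
	uint8_t scaling_ratio;	/* fraction of rcvbuf usable as window, /256 */
	int window_clamp;
};

/* Window available from a given amount of buffer space. */
static int win_from_space(uint8_t scaling_ratio, int space)
{
	return (int)(((int64_t)space * scaling_ratio) >> RMEM_TO_WIN_SCALE);
}

/* Problem case: the buffer grows but the clamp keeps its old value. */
static void set_rcvbuf_stale(struct tcp_model *tp, int val)
{
	tp->rcvbuf = val;
}

/* Idea of the fix: recompute the clamp when the user sets the buffer. */
static void set_rcvbuf_update_clamp(struct tcp_model *tp, int val)
{
	tp->rcvbuf = val;
	tp->window_clamp = win_from_space(tp->scaling_ratio, val);
}
```

In this model the clamp otherwise changes only when `scaling_ratio` does, which is the situation the commit message says may never occur.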

^ permalink raw reply

* Re: [PATCH net] can: raw: fix ro->uniq use-after-free in raw_rcv()
From: Oliver Hartkopp @ 2026-04-08 19:31 UTC (permalink / raw)
  To: Sam P, netdev; +Cc: mkl, linux-kernel, linux-can
In-Reply-To: <430cc8b9-21f0-4954-ae36-ec5e63f3ef9d@bynar.io>



On 08.04.26 19:22, Sam P wrote:
> On 08/04/2026 17:28, Oliver Hartkopp wrote:

>>
>> Can you tell why you preferred the destructor solution now?
> 
> Thank you :) I preferred the destructor solution as it seemed to match 
> the socket lifetime model better and I wasn't sure if the blocking sync 
> in the raw_release() was too heavy-handed for this specific issue, given 
> raw_release() already holds rtnl_lock() and lock_sock(sk). That said, 
> I'm happy to defer to your experience if the sync fix is better suited, 
> I have tested both of them.

Thanks. I think holding rtnl_lock() might really have a performance impact on
other networking code while synchronize_rcu() waits for its grace period.

>> And if I see it correctly the UAF problem might also show up with the
>> kfree(ro->filter) statement we can see at the beginning of the above 
>> patch.
>>
>> So both free_percpu(ro->uniq) and kfree(ro->filter) should be
>> handled after synchronize_rcu() has completed, right?
> 
> ro->filter isn't accessed in the racy raw_rcv() path as far as I can
> tell, and I don't *think* there are other racy paths, but it wouldn't
> hurt to handle it just in case. I think this would be simple with the
> synchronize_rcu() patch, as you mentioned, but I'm not sure with the
> destructor.

ro->filter contains all the CAN_RAW specific CAN ID filters and is 
allocated if more than the single default filter is required.

It is last used in the raw_disable_allfilters() above.

So after the good discussion I tend towards your original approach with the
destructor ;-)

I will add my Acked-by: to the originally posted patch.

Many thanks,
Oliver


^ permalink raw reply

* Re: [PATCH net] can: raw: fix ro->uniq use-after-free in raw_rcv()
From: Oliver Hartkopp @ 2026-04-08 19:32 UTC (permalink / raw)
  To: Sam P, netdev; +Cc: mkl, linux-kernel, linux-can
In-Reply-To: <26ec626d-cae7-4418-9782-7198864d070c@bynar.io>



On 08.04.26 16:30, Sam P wrote:
> raw_release() unregisters raw CAN receive filters via can_rx_unregister(),
> but receiver deletion is deferred with call_rcu(). This leaves a window
> where raw_rcv() may still be running in an RCU read-side critical section
> after raw_release() frees ro->uniq, leading to a use-after-free of the
> percpu uniq storage.
> 
> Move free_percpu(ro->uniq) out of raw_release() and into a raw-specific
> socket destructor. can_rx_unregister() takes an extra reference to the
> socket and only drops it from the RCU callback, so freeing uniq from
> sk_destruct ensures the percpu area is not released until the relevant
> callbacks have drained.
> 
> Fixes: 514ac99c64b2 ("can: fix multiple delivery of a single CAN frame for overlapping CAN filters")
> Cc: stable@vger.kernel.org # v4.1+
> Assisted-by: Bynario AI
> Signed-off-by: Samuel Page <sam@bynar.io>

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>

> 
> ---
>   net/can/raw.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/net/can/raw.c b/net/can/raw.c
> index eee244ffc31e..f042c4316890 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -361,6 +361,14 @@ static int raw_notifier(struct notifier_block *nb, unsigned long msg,
>       return NOTIFY_DONE;
>   }
> 
> +static void raw_sock_destruct(struct sock *sk)
> +{
> +    struct raw_sock *ro = raw_sk(sk);
> +
> +    free_percpu(ro->uniq);
> +    can_sock_destruct(sk);
> +}
> +
>   static int raw_init(struct sock *sk)
>   {
>       struct raw_sock *ro = raw_sk(sk);
> @@ -387,6 +395,8 @@ static int raw_init(struct sock *sk)
>       if (unlikely(!ro->uniq))
>           return -ENOMEM;
> 
> +    sk->sk_destruct = raw_sock_destruct;
> +
>       /* set notifier */
>       spin_lock(&raw_notifier_lock);
>       list_add_tail(&ro->notifier, &raw_notifier_list);
> @@ -436,7 +446,6 @@ static int raw_release(struct socket *sock)
>       ro->bound = 0;
>       ro->dev = NULL;
>       ro->count = 0;
> -    free_percpu(ro->uniq);
> 
>       sock_orphan(sk);
>       sock->sk = NULL;
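
The lifetime argument in the commit message above (the extra socket reference is dropped only from the RCU callback, so the destructor cannot run earlier) can be modelled in user space roughly like this. It is a sketch under assumptions: `struct sock_model`, `model_hold()`, `model_put()` and `model_destruct()` are hypothetical names, a plain integer replaces the kernel's reference counting, and `malloc()`/`free()` stand in for the percpu allocation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a socket with a destructor hook. */
struct sock_model {
	int refcnt;
	int *uniq;			/* stands in for the percpu uniq area */
	void (*destruct)(struct sock_model *sk);
};

/* Plays the role of raw_sock_destruct(): free uniq at final teardown. */
static void model_destruct(struct sock_model *sk)
{
	free(sk->uniq);
	sk->uniq = NULL;
}

/* Take a reference, e.g. on behalf of a pending deferred callback. */
static void model_hold(struct sock_model *sk)
{
	sk->refcnt++;
}

/* Drop a reference; the destructor runs only when the last one goes. */
static void model_put(struct sock_model *sk)
{
	if (--sk->refcnt == 0 && sk->destruct)
		sk->destruct(sk);
}
```

In this model the release path drops its own reference while the deferred callback still holds one, so `uniq` stays valid for any receive handler that is still running until the callback's final `model_put()`.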


^ permalink raw reply

* Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
From: Russell King (Oracle) @ 2026-04-08 19:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: netdev, linux-arm-kernel, linux-kernel, iommu, linux-ext4,
	Linus Torvalds, dmaengine, Marek Szyprowski, Theodore Ts'o,
	Andreas Dilger, Vinod Koul, Frank Li
In-Reply-To: <3a1d0520-3402-47b2-9d7b-4e14a3cd07a4@arm.com>

On Wed, Apr 08, 2026 at 05:40:48PM +0100, Robin Murphy wrote:
> On 2026-04-08 5:16 pm, Russell King (Oracle) wrote:
> > On Wed, Apr 08, 2026 at 05:08:34PM +0100, Russell King (Oracle) wrote:
> > > The rebase is still progressing, but it's landed on:
> > > 
> > > c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
> 
> FWIW I don't see a Tegra having the Xilinx IP in it anyway - judging by the
> DT it has their own tegra-gpcdma engine...
> 
> There's a fair chance this could be 90c5def10bea ("iommu: Do not call
> drivers for empty gathers"), which JonH also reported causing boot issues on
> Tegras - in short, SMMU TLB maintenance may not be completed properly which
> could lead to recycled DMA addresses causing exactly this kind of random
> memory corruption. I CC'd you on a patch:
> 
> https://lore.kernel.org/linux-iommu/20260408162846.GE3357077@nvidia.com/T/#t

Okay, bisect complete, and... no idea. It seems to suggest that 7.0-rc6
is actually fine - it ended up blaming Linus' tagging of 7.0-rc6, which
only changed the Makefile. I had assumed rc6 would fail too, because
net-next with rc6 merged (last Thursday) fails, net-next+rc7 fails, and
rc7 itself fails. That assumption seems to be false.

Right, rc7 built with the same .config that rc6 was built with
definitely fails, this time with:

Root device found: PARTUUID=741c0777-391a-4bce-a222-455e180ece2a
depmod: ERROR: could not open directory /lib/modules/7.0.0-rc7-bisect: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
usb 2-3: new SuperSpeed Plus Gen 2x1 USB device number 2 using tegra-xusb
hub 2-3:1.0: USB hub found
hub 2-3:1.0: 4 ports detected
usb 1-3: new full-speed USB device number 3 using tegra-xusb
EXT4-fs (mmcblk0p1): VFS: Can't find ext4 filesystem
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1, missing codepage or helper program, or other error.
mount: /mnt/: can't find PARTUUID=741c0777-391a-4bce-a222-455e180ece2a.
get_swap_device: Bad swap file entry 1800c00008
get_swap_device: Bad swap file entry 1800c00008
get_swap_device: Bad swap file entry 1800c00008

So, it seems rc6 -> rc7 => fails
net-next with rc5 -> net-next with rc6 => fails

However, before I test anything else, I've just built the same rc7
which failed above with your patch applied - and that boots fine.

Now, each Thursday, net-next gets updated as that's the day that the
net tree gets sent for merging into mainline. This causes net-next's
version to increase. So something in current net-next plus in rc7 is
causing this problem.

The commit you claim needs fixing is:

$ git describe --contains 90c5def10bea
v7.0-rc7~29^2~2

which I had assumed wouldn't be in net-next.

Now, mainline had this on Thursday:

commit f8f5627a8aeab15183eef8930bf75ba88a51622f
Merge: 4c2c526b5adf ec7067e66119
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Apr 2 09:57:06 2026 -0700

    Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

commit 4c2c526b5adfb580bd95316bf179327d5ee26da8
Merge: 2ec9074b28a0 8b72aa5704c7
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Apr 2 09:53:16 2026 -0700

    Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

and merging iommu-fixes-v7.0-rc6 introduced the buggy 90c5def10bea
commit into -rc7.

However, as soon as Linus merged net-7.0-rc7, netdev maintainers merged
that exact commit back into net-next:

commit 8ffb33d7709b59ff60560f48960a73bd8a55be95
Merge: 269389ba5398 f8f5627a8aea
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Apr 2 10:57:09 2026 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Thereby bringing that buggy commit into net-next, but with net-next
identifying itself as 7.0-rc6.

That's... confusing, but explains why current net-next which reports
itself as 7.0-rc6 _and_ rc7 both fail, but rc6 itself does not. It
also means I've wasted an entire afternoon running a useless bisect
between rc5 and rc6 due to the version numbers in net-next being
meaningless.

What's the status on the iommu fix? Is it merged into mainline yet?
If it isn't already, that means net-next remains unbootable going
into the merge window without manually carrying the fix locally.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [net-next PATCH 0/2] net: dsa: tag_rtl8_4: fixes doc and set keep
From: Luiz Angelo Daros de Luca @ 2026-04-08 20:31 UTC (permalink / raw)
  To: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, Luiz Angelo Daros de Luca,
	Alvin Šipraga, Linus Walleij

This small series addresses two points in the rtl8_4 tagger used by the
realtek rtl8365mb driver.

The first patch updates the documentation of the tag format while the
second patch sets the KEEP flag bit, ensuring that the switch
respects the frame's VLAN format as provided by the kernel.

These patches were previously part of a larger series but are being
submitted independently as they are self-contained and have already
been reviewed.

Changes in v1:
- Resubmitted as a standalone series from the previous larger set.
- Kept Reviewed-by tags from Linus Walleij.

Link: https://patch.msgid.link/CAD++jLmX31KfhGXA6SMAPXb14dHSC1t4JQZ=PQvjh-3hUcnzJA@mail.gmail.com
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
---
Alvin Šipraga (1):
      net: dsa: tag_rtl8_4: update format description

Luiz Angelo Daros de Luca (1):
      net: dsa: tag_rtl8_4: set KEEP flag

 net/dsa/tag_rtl8_4.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
---
base-commit: 1a8dd88469bf742fd5eda91cd8e0f720a983ec5a
change-id: 20260407-realtek_fixes-e1a81e477cbb

Best regards,
--  
Luiz Angelo Daros de Luca <luizluca@gmail.com>


^ permalink raw reply

* [net-next PATCH 1/2] net: dsa: tag_rtl8_4: update format description
From: Luiz Angelo Daros de Luca @ 2026-04-08 20:31 UTC (permalink / raw)
  To: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, Luiz Angelo Daros de Luca,
	Alvin Šipraga, Linus Walleij
In-Reply-To: <20260408-realtek_fixes-v1-0-915ff1404d56@gmail.com>

From: Alvin Šipraga <alsi@bang-olufsen.dk>

Document the updated tag layout fields (EFID, VSEL/VIDX) and clarify
which bits are set/cleared when emitting tags.

Co-developed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
---
 net/dsa/tag_rtl8_4.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/dsa/tag_rtl8_4.c b/net/dsa/tag_rtl8_4.c
index 2464545da4d2..b7ed39c5419f 100644
--- a/net/dsa/tag_rtl8_4.c
+++ b/net/dsa/tag_rtl8_4.c
@@ -17,8 +17,8 @@
  *  |              (8-bit)              |              (8-bit)              |
  *  |          Protocol [0x04]          |              REASON               | b
  *  |-----------------------------------+-----------------------------------| y
- *  |   (1)  | (1) | (2) |   (1)  | (3) | (1)  | (1) |    (1)    |   (5)    | t
- *  | FID_EN |  X  | FID | PRI_EN | PRI | KEEP |  X  | LEARN_DIS |    X     | e
+ *  |   (1)   |   (3)  |   (1)  |  (3)  | (1)  | (1)  |    (1)     |  (5)   | t
+ *  | EFID_EN |  EFID  | PRI_EN |  PRI  | KEEP | VSEL | LEARN_DIS  |  VIDX  | e
  *  |-----------------------------------+-----------------------------------| s
  *  |   (1)  |                       (15-bit)                               | |
  *  |  ALLOW |                        TX/RX                                 | v
@@ -32,19 +32,22 @@
  *     EtherType |         note that Realtek uses the same EtherType for
  *               |         other incompatible tag formats (e.g. tag_rtl4_a.c)
  *    Protocol   | 0x04: indicates that this tag conforms to this format
- *    X          | reserved
  *   ------------+-------------
  *    REASON     | reason for forwarding packet to CPU
  *               | 0: packet was forwarded or flooded to CPU
  *               | 80: packet was trapped to CPU
- *    FID_EN     | 1: packet has an FID
- *               | 0: no FID
- *    FID        | FID of packet (if FID_EN=1)
+ *    EFID_EN    | 1: packet has an EFID
+ *               | 0: no EFID
+ *    EFID       | Extended filter ID (EFID) of packet (if EFID_EN=1)
  *    PRI_EN     | 1: force priority of packet
  *               | 0: don't force priority
  *    PRI        | priority of packet (if PRI_EN=1)
  *    KEEP       | preserve packet VLAN tag format
+ *    VSEL       | 0: switch should classify packet according to VLAN tag
+ *               | 1: switch should classify packet according to VLAN membership
+ *               |    configuration with index VIDX
  *    LEARN_DIS  | don't learn the source MAC address of the packet
+ *    VIDX       | index of a VLAN membership configuration to use with VSEL
  *    ALLOW      | 1: treat TX/RX field as an allowance port mask, meaning the
  *               |    packet may only be forwarded to ports specified in the
  *               |    mask
@@ -111,7 +114,7 @@ static void rtl8_4_write_tag(struct sk_buff *skb, struct net_device *dev,
 	/* Set Protocol; zero REASON */
 	tag16[1] = htons(FIELD_PREP(RTL8_4_PROTOCOL, RTL8_4_PROTOCOL_RTL8365MB));
 
-	/* Zero FID_EN, FID, PRI_EN, PRI, KEEP; set LEARN_DIS */
+	/* Zero EFID_EN, EFID, PRI_EN, PRI, VSEL, VIDX, KEEP; set LEARN_DIS */
 	tag16[2] = htons(FIELD_PREP(RTL8_4_LEARN_DIS, 1));
 
 	/* Zero ALLOW; set RX (CPU->switch) forwarding port mask */

-- 
2.53.0


^ permalink raw reply related

