Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2] net: ethernet: ti: icssg: guard PA stat lookups
From: Philippe Schenker @ 2026-06-18  9:30 UTC (permalink / raw)
  To: netdev
  Cc: Philippe Schenker, Simon Horman, danishanwar, rogerq,
	linux-arm-kernel, stable, Andrew Lunn, David Carlier,
	David S. Miller, Eric Dumazet, Jacob Keller, Jakub Kicinski,
	Kevin Hao, Meghana Malladi, Paolo Abeni, Vadim Fedorenko,
	linux-kernel

From: Philippe Schenker <philippe.schenker@impulsing.ch>

icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
with FW PA stat names regardless of whether the PA stats block is
present on the hardware.  emac_get_stat_by_name() already guards the
PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
is NULL the lookup falls through to netdev_err() and returns -EINVAL.
Because ndo_get_stats64 is polled regularly by the networking stack
this produces thousands of log entries of the form:

  icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR

A secondary consequence is that the int(-EINVAL) return value is
implicitly widened to a near-ULLONG_MAX unsigned value when accumulated
into the __u64 fields of rtnl_link_stats64, silently corrupting the
rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`.

Every other PA-aware code path in the driver is already guarded with
the same `if (emac->prueth->pa_stats)` check.  Apply the same guard
here.

Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
Reviewed-by: Simon Horman <horms@kernel.org>

Cc: danishanwar@ti.com
Cc: rogerq@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org

---

Changes in v2:
- Removed newline between Fixes tag and Signed-off-by
- Use return in if statement to guard so we get rid
  of the 80 char warnings.
- Added Simon's Reviewed-by. Thanks!

 drivers/net/ethernet/ti/icssg/icssg_common.c | 49 +++++++++++---------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index a28a608f9bf4..d9af6419e032 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -1628,28 +1628,35 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
 	stats->rx_over_errors = emac_get_stat_by_name(emac, "rx_over_errors");
 	stats->multicast      = emac_get_stat_by_name(emac, "rx_multicast_frames");
 
-	stats->rx_errors  = ndev->stats.rx_errors +
-			    emac_get_stat_by_name(emac, "FW_RX_ERROR") +
-			    emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
-			    emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
-			    emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
-			    emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
-	stats->rx_dropped = ndev->stats.rx_dropped +
-			    emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
-			    emac_get_stat_by_name(emac, "FW_INF_SAV") +
-			    emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->rx_errors  = ndev->stats.rx_errors;
+	stats->rx_dropped = ndev->stats.rx_dropped;
 	stats->tx_errors  = ndev->stats.tx_errors;
-	stats->tx_dropped = ndev->stats.tx_dropped +
-			    emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
-			    emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
+	stats->tx_dropped = ndev->stats.tx_dropped;
+
+	if (!emac->prueth->pa_stats)
+		return;
+
+	stats->rx_errors  +=
+			emac_get_stat_by_name(emac, "FW_RX_ERROR") +
+			emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
+			emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
+			emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
+			emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
+	stats->rx_dropped +=
+			emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
+			emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
+			emac_get_stat_by_name(emac, "FW_INF_SAV") +
+			emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
+			emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
+			emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->tx_dropped +=
+			emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
+			emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
+			emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
+			emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
 }
 EXPORT_SYMBOL_GPL(icssg_ndo_get_stats64);
 
-- 
2.54.0

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
branch: fix-icssg_common-pa-stats-errors__master-7-1

^ permalink raw reply related

* Re: [PATCH net] net: ethernet: ti: icssg: guard PA stat lookups
From: Philippe Schenker @ 2026-06-18  9:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, danishanwar, rogerq, linux-arm-kernel, stable,
	Andrew Lunn, David Carlier, David S. Miller, Eric Dumazet,
	Jacob Keller, Jakub Kicinski, Kevin Hao, Meghana Malladi,
	Paolo Abeni, Vadim Fedorenko, linux-kernel
In-Reply-To: <20260618091004.GG827683@horms.kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

Hi Simon

Thanks for the review and I'll send a v2 with that blank line removed.
Saw it right after sending the patch.

Philippe

On Thu, 2026-06-18 at 10:10 +0100, Simon Horman wrote:
> On Tue, Jun 16, 2026 at 04:35:34PM +0200, Philippe Schenker wrote:
> > From: Philippe Schenker <philippe.schenker@impulsing.ch>
> > 
> > icssg_ndo_get_stats64() unconditionally calls
> > emac_get_stat_by_name()
> > with FW PA stat names regardless of whether the PA stats block is
> > present on the hardware.  emac_get_stat_by_name() already guards
> > the
> > PA stats lookup with `if (emac->prueth->pa_stats)`; when that
> > pointer
> > is NULL the lookup falls through to netdev_err() and returns -
> > EINVAL.
> > Because ndo_get_stats64 is polled regularly by the networking stack
> > this produces thousands of log entries of the form:
> > 
> >   icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR
> > 
> > A secondary consequence is that the int(-EINVAL) return value is
> > implicitly widened to a near-ULLONG_MAX unsigned value when
> > accumulated
> > into the __u64 fields of rtnl_link_stats64, silently corrupting the
> > rx_errors, rx_dropped and tx_dropped counters reported by `ip -s
> > link`.
> > 
> > Every other PA-aware code path in the driver is already guarded
> > with
> > the same `if (emac->prueth->pa_stats)` check.  Apply the same guard
> > here.
> > 
> > Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
> 
> nit: no blank line between tags
> 
> > 
> > Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
> > 
> > Cc: danishanwar@ti.com
> > Cc: rogerq@kernel.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: stable@vger.kernel.org
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply

* Re: [PATCH net 5/5] afs: Fix uncancelled rxrpc OOB message handler
From: Simon Horman @ 2026-06-18  9:29 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-6-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
afs: Fix uncancelled rxrpc OOB message handler

This commit cancels AFS OOB message processing during shutdown and moves it
to afs_wq. It also prevents queuing if the net namespace is no longer live.

Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com

> diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
> index fd2d260fb25f9..6241f9349f6b6 100644
> --- a/fs/afs/rxrpc.c
> +++ b/fs/afs/rxrpc.c
[ ... ]
> @@ -128,6 +128,7 @@ void afs_close_socket(struct afs_net *net)
>  	_enter("");
>  
>  	cancel_work_sync(&net->charge_preallocation_work);
> +	cancel_work_sync(&net->rx_oob_work);
>  	kernel_listen(net->socket, 0);
>  	flush_workqueue(afs_async_calls);
>  	flush_workqueue(afs_wq);

[Severity: High]
This is a pre-existing issue, but does placing the cancellation and flush
before kernel_sock_shutdown() leave a window open for the work to be queued
afterwards?

If kernel_sock_shutdown() is what synchronously disables the notify_oob
callback from firing, couldn't a delayed callback still queue work on afs_wq
after we've already flushed it here?

[ ... ]
> @@ -985,5 +986,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
>  {
>  	struct afs_net *net = sk->sk_user_data;
>  
> -	schedule_work(&net->rx_oob_work);
> +	if (net->live)
> +		queue_work(afs_wq, &net->rx_oob_work);
>  }

[Severity: High]
This is a pre-existing issue, but does this lockless check of net->live allow
a race condition that leads to a use-after-free?

Consider this sequence:

CPU1 (teardown)
afs_net_exit()
    net->live = false;
    afs_close_socket()
        cancel_work_sync(&net->rx_oob_work);
        flush_workqueue(afs_wq);
        kernel_sock_shutdown()  // blocks waiting for recvmsg_lock

CPU2 (incoming OOB)
rxrpc_notify_socket_oob()
    spin_lock_irq(&rx->recvmsg_lock);
    afs_rx_notify_oob()
        // Reads net->live == true (stale read, or delayed execution)
        queue_work(afs_wq, &net->rx_oob_work);
    spin_unlock_irq(&rx->recvmsg_lock);

CPU1 then finishes kernel_sock_shutdown() and frees the net namespace, but
afs_wq is never flushed again.

Would moving cancel_work_sync() and flush_workqueue() to the end of
afs_close_socket(), after kernel_sock_shutdown() completes, prevent this race?

^ permalink raw reply

* Re: [PATCH net 4/5] afs: Fix further netns teardown to cancel the preallocation charger
From: Simon Horman @ 2026-06-18  9:29 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-5-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
afs: Fix further netns teardown to cancel the preallocation charger

This commit modifies the afs network namespace teardown process to wait for
the preallocation charger work item. It addresses a small window where the
work item could be requeued after being cancelled by flushing the workqueue
on which the charger runs.

Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com

> diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
> --- a/fs/afs/rxrpc.c
> +++ b/fs/afs/rxrpc.c
> @@ -130,6 +130,7 @@ void afs_close_socket(struct afs_net *net)
>  	cancel_work_sync(&net->charge_preallocation_work);
>  	kernel_listen(net->socket, 0);
>  	flush_workqueue(afs_async_calls);
> +	flush_workqueue(afs_wq);

[Severity: High]
Will flushing the system-wide global workqueue here cause cross-namespace
stalls during teardown?

It appears afs_wq is allocated once in afs_init() as a system-wide workqueue
shared by all AFS network namespaces:

fs/afs/main.c:afs_init() {
    ...
    afs_wq = alloc_workqueue("afs", WQ_PERCPU, 0);
    ...
}

Other namespaces use afs_wq for operations like cell management, which can
block for tens of seconds waiting for userspace DNS lookups via the
request_key mechanism.

Since afs_close_socket() runs in the context of the global cleanup_net
worker thread, does this stall the destruction of all network namespaces
system-wide until unrelated AFS work across the entire system finishes?

Would it be safer to explicitly call
cancel_work_sync(&net->charge_preallocation_work) a second time here instead
of flushing the shared queue?

^ permalink raw reply

* Re: [PATCH net 1/5] rxrpc: input: reject ACKALL outside transmit phase
From: Simon Horman @ 2026-06-18  9:27 UTC (permalink / raw)
  To: dhowells
  Cc: Simon Horman, netdev, marc.dionne, kuba, davem, edumazet, pabeni,
	linux-afs, linux-kernel, d4n.for.sec, n05ec, jaltman, stable
In-Reply-To: <20260616155749.2125907-2-dhowells@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
rxrpc: input: reject ACKALL outside transmit phase

rxrpc_input_ackall() accepts ACKALL packets without checking whether
the call is in a state that can legitimately have outstanding transmit
buffers. A forged ACKALL can reach a new service call and trigger a null
pointer dereference.

> diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
> index ce761466b02d6..37881dffa898b 100644
> --- a/net/rxrpc/input.c
> +++ b/net/rxrpc/input.c

[ ... ]

> @@ -1214,8 +1214,22 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
>  static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
>  {
>  	struct rxrpc_ack_summary summary = { 0 };
> +	rxrpc_seq_t top = READ_ONCE(call->tx_top);
> +
> +	switch (__rxrpc_call_state(call)) {
> +	case RXRPC_CALL_CLIENT_SEND_REQUEST:
> +	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
> +	case RXRPC_CALL_SERVER_SEND_REPLY:
> +	case RXRPC_CALL_SERVER_AWAIT_ACK:
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	if (call->tx_bottom == top)
> +		return;
>  
> -	if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
> +	if (rxrpc_rotate_tx_window(call, top, &summary))

[Severity: High]
This isn't a bug introduced by this patch, but it appears the fix might be
incomplete. The patch prevents a NULL pointer dereference in
rxrpc_input_ackall() by checking if call->tx_bottom == top before calling
rxrpc_rotate_tx_window().

However, does the same vulnerability remain in rxrpc_receiving_reply()?

If a client in the RXRPC_CALL_CLIENT_SEND_REQUEST state receives a forged or
early DATA packet (acting as a reply) before it has queued any transmit data,
call->tx_top and call->tx_bottom will both be 0, and call->tx_queue will be
NULL.

rxrpc_receiving_reply() will then unconditionally call:

    if (!rxrpc_rotate_tx_window(call, top, &summary)) {

Because rxrpc_rotate_tx_window() uses a do { ... } while (before_eq(seq, to))
loop, it executes exactly once for seq = 1 and to = 0, which dereferences
the NULL tq:

    if (tq->bufs[ix]->flags & RXRPC_LAST_PACKET) {

Should the empty window check be placed at the top of rxrpc_rotate_tx_window()
instead, or mirrored in rxrpc_receiving_reply() to protect against early
replies?

>  		rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
>  }

^ permalink raw reply

* [PATCH net v2 2/2] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260618092813.432535-1-ioana.ciornei@nxp.com>

The blamed commit refactored the prechangeupper event handling but
failed to actually return an error in case
dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
on a port which tries to join a bridge. Fix this by returning err
instead of 0.

Fixes: 45035febc495 ("net: dpaa2-switch: refactor prechangeupper sanity checks")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
---
Changes in v2:
- none

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 83ccefdac59f..858ba844ac51 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -2212,7 +2212,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Cannot join a bridge while VLAN uppers are present");
-		return 0;
+		return err;
 	}
 
 	netdev_for_each_lower_dev(upper_dev, other_dev, iter) {
-- 
2.25.1


^ permalink raw reply related

* [PATCH net v2 1/2] dpaa2-switch: do not accept VLAN uppers while bridged
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260618092813.432535-1-ioana.ciornei@nxp.com>

The dpaa2-switch driver does not support VLAN uppers while its ports are
bridged. This scenario tried to be prevented by rejecting a bridge join
while VLAN uppers exist but the reverse order was still possible.

This patches adds a check so that the dpaa2-switch also does not accept
VLAN uppers while bridged.

Fixes: f48298d3fbfa ("staging: dpaa2-switch: move the driver out of staging")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
---
Changes in v2:
- patch is new

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 45f276c2c3ec..83ccefdac59f 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -2233,6 +2233,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 					    struct netdev_notifier_changeupper_info *info)
 {
+	struct ethsw_port_priv *port_priv;
 	struct netlink_ext_ack *extack;
 	struct net_device *upper_dev;
 	int err;
@@ -2251,6 +2252,13 @@ static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 
 		if (!info->linking)
 			dpaa2_switch_port_pre_bridge_leave(netdev);
+	} else if (is_vlan_dev(upper_dev)) {
+		port_priv = netdev_priv(netdev);
+		if (port_priv->fdb->bridge_dev) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Cannot accept VLAN uppers while bridged");
+			return -EOPNOTSUPP;
+		}
 	}
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related

* [PATCH net v2 0/2] dpaa2-switch: reject VLAN uppers while bridged
From: Ioana Ciornei @ 2026-06-18  9:28 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, netdev
  Cc: f.fainelli, vladimir.oltean, linux-kernel

The dpaa2-switch driver does not support VLAN uppers on its ports while
they are bridged. The check which should have prevented a port with a
VLAN upper to join bridge was poorly refactored and didn't actually
return an error. Patch 2/2 fixes that.

On the other hand, the driver didn't reject the addition of a VLAN upper
while bridged. Patch 1/2 fixes that.

Changes in v2:
- added patch 1/2

Ioana Ciornei (2):
  dpaa2-switch: do not accept VLAN uppers while bridged
  dpaa2-switch: fix VLAN upper check not rejecting bridge join

 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply

* Re: [PATCH 1/1] selftests: net: fix file owner for broadcast_ether_dst test
From: Simon Horman @ 2026-06-18  9:21 UTC (permalink / raw)
  To: Ross Porter
  Cc: linux-kselftest, netdev, stable, Edoardo Canepa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shuah Khan, Oscar Maes,
	Brett A C Sheffield, linux-kernel
In-Reply-To: <20260610062230.71573-2-ross.porter@canonical.com>

On Wed, Jun 10, 2026 at 06:22:29PM +1200, Ross Porter wrote:
> Ensure the output file is always owned by root (even if tcpdump was 
> compiled with `--with-user`), by passing the `-Z root` argument when 
> invoking it.

Hi Ross,

I think that the motivation, described in the cover letter,
belongs here so it can be found more easily using git..

Also, as there is only one patch in the series, the cover letter
could be dropped.

And lastly, this should be targeted at net as it's a but fix
for code present there.

Subject: [PATCH net] ...

For more information on the Networking development workflow see
https://docs.kernel.org/process/maintainer-netdev.html

> 
> Cc: stable@vger.kernel.org
> Reported-by: Edoardo Canepa <edoardo.canepa@canonical.com>
> Closes: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2129815
> Fixes: bf59028ea8d4 ("selftests: net: add test for destination in broadcast packets")
> Suggested-by: Edoardo Canepa <edoardo.canepa@canonical.com>
> Tested-by: Ross Porter <ross.porter@canonical.com>
> Signed-off-by: Ross Porter <ross.porter@canonical.com>

...

-- 
pw-bot: changes-requested

^ permalink raw reply

* [PATCH net v2] sfc: Use acquire/release for irq_soft_enabled
From: Gui-Dong Han @ 2026-06-18  9:16 UTC (permalink / raw)
  To: netdev, linux-net-drivers, ecree.xilinx
  Cc: linux-kernel, andrew+netdev, davem, edumazet, kuba, pabeni, horms,
	baijiaju1990, Gui-Dong Han

irq_soft_enabled is a lockless gate for interrupt handlers. When it is
false, handlers acknowledge interrupts but must not touch channel state.

Channel reallocation disables the gate, swaps and initializes channel
pointers, frees old channels, and then enables the gate again. Once a
handler observes irq_soft_enabled as true, it can dereference
efx->channel[] and other channel state. That observation must therefore
be ordered after the channel state was published.

READ_ONCE() does not provide that acquire ordering. The existing
smp_wmb() in the soft-enable paths also cannot provide it because it is
after the irq_soft_enabled=true store, so it cannot publish prior channel
state before the gate becomes visible.

Use a release store only when opening the software IRQ gate, and use
acquire loads in interrupt handlers before touching channels. Use
WRITE_ONCE() when closing the gate; handlers that observe false do not
touch channel state.

Keep the existing smp_wmb() after gate updates. It preserves the
previous ordering between the software IRQ gate and subsequent event
queue setup, start and stop operations, which is separate from the
release/acquire ordering added here.

Fixes: d829118705f8 ("sfc: Rework IRQ enable/disable")
Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family")
Fixes: 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
Fixes: 51b35a454efd ("sfc: skeleton EF100 PF driver")
Fixes: 6e173d3b4af9 ("sfc: Copy shared files needed for Siena (part 1)")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
---
v2:
- Use release ordering only when enabling the software IRQ gate.
- Use WRITE_ONCE() when disabling it.
- Expand the commit message to address review comments from Jakub
  Kicinski and Edward Cree about the release pairing and the existing
  smp_wmb().
v1: https://lore.kernel.org/netdev/20260528092838.2099352-1-hanguidong02@gmail.com/
---
 drivers/net/ethernet/sfc/ef10.c               |  4 ++--
 drivers/net/ethernet/sfc/ef100_nic.c          |  2 +-
 drivers/net/ethernet/sfc/efx_channels.c       |  4 ++--
 drivers/net/ethernet/sfc/falcon/efx.c         |  4 ++--
 drivers/net/ethernet/sfc/falcon/falcon.c      |  2 +-
 drivers/net/ethernet/sfc/falcon/farch.c       |  4 ++--
 drivers/net/ethernet/sfc/falcon/net_driver.h  | 17 +++++++++++++++++
 drivers/net/ethernet/sfc/net_driver.h         | 17 +++++++++++++++++
 drivers/net/ethernet/sfc/siena/efx_channels.c |  4 ++--
 drivers/net/ethernet/sfc/siena/farch.c        |  4 ++--
 drivers/net/ethernet/sfc/siena/net_driver.h   | 17 +++++++++++++++++
 11 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 7e04f115bbaa..a907303497f9 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2143,7 +2143,7 @@ static irqreturn_t efx_ef10_msi_interrupt(int irq, void *dev_id)
 	netif_vdbg(efx, intr, efx->net_dev,
 		   "IRQ %d on CPU %d\n", irq, raw_smp_processor_id());
 
-	if (likely(READ_ONCE(efx->irq_soft_enabled))) {
+	if (likely(efx_irq_soft_enabled(efx))) {
 		/* Note test interrupts */
 		if (context->index == efx->irq_level)
 			efx->last_irq_cpu = raw_smp_processor_id();
@@ -2158,7 +2158,7 @@ static irqreturn_t efx_ef10_msi_interrupt(int irq, void *dev_id)
 static irqreturn_t efx_ef10_legacy_interrupt(int irq, void *dev_id)
 {
 	struct efx_nic *efx = dev_id;
-	bool soft_enabled = READ_ONCE(efx->irq_soft_enabled);
+	bool soft_enabled = efx_irq_soft_enabled(efx);
 	struct efx_channel *channel;
 	efx_dword_t reg;
 	u32 queues;
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index 00050f786cae..7885b3a5a398 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -333,7 +333,7 @@ static irqreturn_t ef100_msi_interrupt(int irq, void *dev_id)
 	netif_vdbg(efx, intr, efx->net_dev,
 		   "IRQ %d on CPU %d\n", irq, raw_smp_processor_id());
 
-	if (likely(READ_ONCE(efx->irq_soft_enabled))) {
+	if (likely(efx_irq_soft_enabled(efx))) {
 		/* Note test interrupts */
 		if (context->index == efx->irq_level)
 			efx->last_irq_cpu = raw_smp_processor_id();
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index f4dc3f3f4416..103d2a02bf5f 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -972,7 +972,7 @@ int efx_soft_enable_interrupts(struct efx_nic *efx)
 
 	BUG_ON(efx->state == STATE_DISABLED);
 
-	efx->irq_soft_enabled = true;
+	efx_irq_soft_enable(efx);
 	smp_wmb();
 
 	efx_for_each_channel(channel, efx) {
@@ -1009,7 +1009,7 @@ void efx_soft_disable_interrupts(struct efx_nic *efx)
 
 	efx_mcdi_mode_poll(efx);
 
-	efx->irq_soft_enabled = false;
+	efx_irq_soft_disable(efx);
 	smp_wmb();
 
 	if (efx->legacy_irq)
diff --git a/drivers/net/ethernet/sfc/falcon/efx.c b/drivers/net/ethernet/sfc/falcon/efx.c
index 0c197b448645..a61cc2c84b78 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.c
+++ b/drivers/net/ethernet/sfc/falcon/efx.c
@@ -1460,7 +1460,7 @@ static int ef4_soft_enable_interrupts(struct ef4_nic *efx)
 
 	BUG_ON(efx->state == STATE_DISABLED);
 
-	efx->irq_soft_enabled = true;
+	ef4_irq_soft_enable(efx);
 	smp_wmb();
 
 	ef4_for_each_channel(channel, efx) {
@@ -1493,7 +1493,7 @@ static void ef4_soft_disable_interrupts(struct ef4_nic *efx)
 	if (efx->state == STATE_DISABLED)
 		return;
 
-	efx->irq_soft_enabled = false;
+	ef4_irq_soft_disable(efx);
 	smp_wmb();
 
 	if (efx->legacy_irq)
diff --git a/drivers/net/ethernet/sfc/falcon/falcon.c b/drivers/net/ethernet/sfc/falcon/falcon.c
index fb1d19b7c419..0c0e00412689 100644
--- a/drivers/net/ethernet/sfc/falcon/falcon.c
+++ b/drivers/net/ethernet/sfc/falcon/falcon.c
@@ -449,7 +449,7 @@ static irqreturn_t falcon_legacy_interrupt_a1(int irq, void *dev_id)
 		   "IRQ %d on CPU %d status " EF4_OWORD_FMT "\n",
 		   irq, raw_smp_processor_id(), EF4_OWORD_VAL(*int_ker));
 
-	if (!likely(READ_ONCE(efx->irq_soft_enabled)))
+	if (!likely(ef4_irq_soft_enabled(efx)))
 		return IRQ_HANDLED;
 
 	/* Check to see if we have a serious error condition */
diff --git a/drivers/net/ethernet/sfc/falcon/farch.c b/drivers/net/ethernet/sfc/falcon/farch.c
index 23d507a3820d..291165db7933 100644
--- a/drivers/net/ethernet/sfc/falcon/farch.c
+++ b/drivers/net/ethernet/sfc/falcon/farch.c
@@ -1500,7 +1500,7 @@ irqreturn_t ef4_farch_fatal_interrupt(struct ef4_nic *efx)
 irqreturn_t ef4_farch_legacy_interrupt(int irq, void *dev_id)
 {
 	struct ef4_nic *efx = dev_id;
-	bool soft_enabled = READ_ONCE(efx->irq_soft_enabled);
+	bool soft_enabled = ef4_irq_soft_enabled(efx);
 	ef4_oword_t *int_ker = efx->irq_status.addr;
 	irqreturn_t result = IRQ_NONE;
 	struct ef4_channel *channel;
@@ -1592,7 +1592,7 @@ irqreturn_t ef4_farch_msi_interrupt(int irq, void *dev_id)
 		   "IRQ %d on CPU %d status " EF4_OWORD_FMT "\n",
 		   irq, raw_smp_processor_id(), EF4_OWORD_VAL(*int_ker));
 
-	if (!likely(READ_ONCE(efx->irq_soft_enabled)))
+	if (!likely(ef4_irq_soft_enabled(efx)))
 		return IRQ_HANDLED;
 
 	/* Handle non-event-queue sources */
diff --git a/drivers/net/ethernet/sfc/falcon/net_driver.h b/drivers/net/ethernet/sfc/falcon/net_driver.h
index 7ab0db44720d..9880fff59f9d 100644
--- a/drivers/net/ethernet/sfc/falcon/net_driver.h
+++ b/drivers/net/ethernet/sfc/falcon/net_driver.h
@@ -1305,6 +1305,23 @@ static inline netdev_features_t ef4_supported_features(const struct ef4_nic *efx
 	return net_dev->features | net_dev->hw_features;
 }
 
+static inline void ef4_irq_soft_enable(struct ef4_nic *efx)
+{
+	/* Publish channel state before opening the IRQ handler gate. */
+	smp_store_release(&efx->irq_soft_enabled, true);
+}
+
+static inline void ef4_irq_soft_disable(struct ef4_nic *efx)
+{
+	WRITE_ONCE(efx->irq_soft_enabled, false);
+}
+
+static inline bool ef4_irq_soft_enabled(struct ef4_nic *efx)
+{
+	/* Pair with ef4_irq_soft_enable() before touching channels. */
+	return smp_load_acquire(&efx->irq_soft_enabled);
+}
+
 /* Get the current TX queue insert index. */
 static inline unsigned int
 ef4_tx_queue_get_insert_index(const struct ef4_tx_queue *tx_queue)
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index b98c259f672d..c172b3504e61 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1731,6 +1731,23 @@ static inline void efx_xmit_hwtstamp_pending(struct sk_buff *skb)
 	skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 }
 
+static inline void efx_irq_soft_enable(struct efx_nic *efx)
+{
+	/* Publish channel state before opening the IRQ handler gate. */
+	smp_store_release(&efx->irq_soft_enabled, true);
+}
+
+static inline void efx_irq_soft_disable(struct efx_nic *efx)
+{
+	WRITE_ONCE(efx->irq_soft_enabled, false);
+}
+
+static inline bool efx_irq_soft_enabled(struct efx_nic *efx)
+{
+	/* Pair with efx_irq_soft_enable() before touching channels. */
+	return smp_load_acquire(&efx->irq_soft_enabled);
+}
+
 /* Get the max fill level of the TX queues on this channel */
 static inline unsigned int
 efx_channel_tx_fill_level(struct efx_channel *channel)
diff --git a/drivers/net/ethernet/sfc/siena/efx_channels.c b/drivers/net/ethernet/sfc/siena/efx_channels.c
index 1fc343598771..55123f8322a7 100644
--- a/drivers/net/ethernet/sfc/siena/efx_channels.c
+++ b/drivers/net/ethernet/sfc/siena/efx_channels.c
@@ -1004,7 +1004,7 @@ static int efx_soft_enable_interrupts(struct efx_nic *efx)
 
 	BUG_ON(efx->state == STATE_DISABLED);
 
-	efx->irq_soft_enabled = true;
+	efx_irq_soft_enable(efx);
 	smp_wmb();
 
 	efx_for_each_channel(channel, efx) {
@@ -1041,7 +1041,7 @@ static void efx_soft_disable_interrupts(struct efx_nic *efx)
 
 	efx_siena_mcdi_mode_poll(efx);
 
-	efx->irq_soft_enabled = false;
+	efx_irq_soft_disable(efx);
 	smp_wmb();
 
 	if (efx->legacy_irq)
diff --git a/drivers/net/ethernet/sfc/siena/farch.c b/drivers/net/ethernet/sfc/siena/farch.c
index 7613d7988894..208cc499c747 100644
--- a/drivers/net/ethernet/sfc/siena/farch.c
+++ b/drivers/net/ethernet/sfc/siena/farch.c
@@ -1514,7 +1514,7 @@ irqreturn_t efx_farch_fatal_interrupt(struct efx_nic *efx)
 irqreturn_t efx_farch_legacy_interrupt(int irq, void *dev_id)
 {
 	struct efx_nic *efx = dev_id;
-	bool soft_enabled = READ_ONCE(efx->irq_soft_enabled);
+	bool soft_enabled = efx_irq_soft_enabled(efx);
 	efx_oword_t *int_ker = efx->irq_status.addr;
 	irqreturn_t result = IRQ_NONE;
 	struct efx_channel *channel;
@@ -1606,7 +1606,7 @@ irqreturn_t efx_farch_msi_interrupt(int irq, void *dev_id)
 		   "IRQ %d on CPU %d status " EFX_OWORD_FMT "\n",
 		   irq, raw_smp_processor_id(), EFX_OWORD_VAL(*int_ker));
 
-	if (!likely(READ_ONCE(efx->irq_soft_enabled)))
+	if (!likely(efx_irq_soft_enabled(efx)))
 		return IRQ_HANDLED;
 
 	/* Handle non-event-queue sources */
diff --git a/drivers/net/ethernet/sfc/siena/net_driver.h b/drivers/net/ethernet/sfc/siena/net_driver.h
index 4cf556782133..73bc42a854e2 100644
--- a/drivers/net/ethernet/sfc/siena/net_driver.h
+++ b/drivers/net/ethernet/sfc/siena/net_driver.h
@@ -1624,6 +1624,23 @@ static inline void efx_xmit_hwtstamp_pending(struct sk_buff *skb)
 	skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 }
 
+static inline void efx_irq_soft_enable(struct efx_nic *efx)
+{
+	/* Publish channel state before opening the IRQ handler gate. */
+	smp_store_release(&efx->irq_soft_enabled, true);
+}
+
+static inline void efx_irq_soft_disable(struct efx_nic *efx)
+{
+	WRITE_ONCE(efx->irq_soft_enabled, false);
+}
+
+static inline bool efx_irq_soft_enabled(struct efx_nic *efx)
+{
+	/* Pair with efx_irq_soft_enable() before touching channels. */
+	return smp_load_acquire(&efx->irq_soft_enabled);
+}
+
 /* Get the max fill level of the TX queues on this channel */
 static inline unsigned int
 efx_channel_tx_fill_level(struct efx_channel *channel)
-- 
2.34.1

^ permalink raw reply related

* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Zhou, Yun @ 2026-06-18  9:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
In-Reply-To: <20260618083952.IbGzrvJL@linutronix.de>


On 6/18/26 16:39, Sebastian Andrzej Siewior wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
> There is also the `threadirqs' option.
>
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
> That would be irq_desc::depth.
>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND).
> The interrupt should be masked while the depth counter goes 0->1, no?
>
>>                                   On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
> But why not? In my naive assumption, we get into suspend with
> irq_desc::depth = 2 and the threaded should be woken up. Once the
> treaded handler is done the counter should decrement by one. Then again
> during resume reaching 0 leading to the unmask. If the thread handler is
> frozen and defrosted on resume then it should still happen but in
> different order.
>
> Something is missing here based on my naive assumption.
>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> The igb shutdowns the device entirely, not just freeing the IRQ.
You are right. The original analysis was wrong — mvneta uses
request_percpu_irq() which sets IRQF_NO_SUSPEND, so the PM framework
never touches this IRQ. The depth never changes from 1.

The actual root cause is simpler: mvneta_percpu_isr() calls
disable_percpu_irq() before scheduling NAPI, and enable_percpu_irq()
is called in napi_complete_done(). If suspend hits during active NAPI
polling, the MPIC percpu IRQ stays masked after resume because
mvneta_start_dev() doesn't restore it.

Will send a v3 with the correct one-liner fix (enable_percpu_irq in
the resume path). Apologies for the incorrect analysis.

BR,
Yun

^ permalink raw reply

* Re: [PATCH bpf] bpf: zero-initialize the fib lookup flow struct
From: Toke Høiland-Jørgensen @ 2026-06-18  9:13 UTC (permalink / raw)
  To: Avinash Duduskar, ast, daniel, andrii
  Cc: bpf, davem, dsahern, eddyz87, edumazet, emil, horms,
	john.fastabend, jolsa, kuba, linux-kernel, martin.lau, memxor,
	netdev, pabeni, sdf, song, yonghong.song
In-Reply-To: <20260617224719.1428599-1-avinash.duduskar@gmail.com>

Avinash Duduskar <avinash.duduskar@gmail.com> writes:

> bpf_ipv4_fib_lookup() and bpf_ipv6_fib_lookup() build the flow key on
> the stack with a bare "struct flowi4 fl4;" / "struct flowi6 fl6;" and
> fill it field by field, but never set flowi4_l3mdev / flowi6_l3mdev.
>
> On the non-DIRECT path the lookup goes through the fib rules whenever the
> netns has custom rules, which a VRF installs:
>
> 	bpf_ipv4_fib_lookup() -> fib_lookup() -> __fib_lookup()
> 	  -> l3mdev_update_flow()   reads !fl->flowi_l3mdev
> 	  -> fib_rules_lookup() -> fib_rule_match()
> 	       -> l3mdev_fib_rule_match()   uses fl->flowi_l3mdev
>
> l3mdev_update_flow() resolves the l3mdev master from the ingress device
> only while the field is still zero. Left at a nonzero stack value the
> resolution is skipped, and l3mdev_fib_rule_match() then tests that value
> as an ifindex, so the VRF master is not resolved and the rule fails to
> match: an ingress enslaved to a VRF can fail to select its table. FIB
> rules matching on an L3 master device (l3mdev_fib_rule_iif_match()/
> _oif_match()) read the same value, so an "ip rule iif/oif <vrf>"
> mismatches the same way.
>
> Zero-initialize the whole flow struct rather than adding one more
> field assignment, so any flowi field added later is covered too.
> ip_route_input_slow() likewise zeroes the field before its input lookup.
>
> CONFIG_INIT_STACK_ALL_ZERO masks this by default, but it depends on
> compiler support (CC_HAS_AUTO_VAR_INIT_ZERO), so INIT_STACK_NONE builds,
> including older toolchains that fall back to it, are exposed. Built with
> INIT_STACK_ALL_PATTERN, a plain bpf_fib_lookup (no VLAN, no DIRECT) over a
> VRF slave whose destination is routed only in the VRF table returns
> BPF_FIB_LKUP_RET_NOT_FWDED, and resolves with this patch. On the default
> config the lookup succeeds either way, so ordinary testing does not catch
> the bug.
>
> Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply

* Re: [PATCH net] netconsole: don't drop the last byte of a full-sized message
From: Simon Horman @ 2026-06-18  9:13 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, asantostc, gustavold,
	kernel-team
In-Reply-To: <20260616-max_print_chunk-v1-1-8dc125d67083@debian.org>

On Tue, Jun 16, 2026 at 09:09:52AM -0700, Breno Leitao wrote:
> nt->buf is exactly MAX_PRINT_CHUNK bytes, but scnprintf() reserves one
> byte for its NUL terminator, so a non-fragmented payload of exactly
> MAX_PRINT_CHUNK loses its last byte (emitted as a stray NUL in the
> release path). Grow nt->buf to MAX_PRINT_CHUNK + 1 and bound the
> scnprintf() calls with sizeof(nt->buf); the transmitted length stays
> capped at MAX_PRINT_CHUNK.
> 
> Alternatively, nt->buf could be left at MAX_PRINT_CHUNK and the NUL byte
> reserved by routing exactly-MAX_PRINT_CHUNK payloads to fragmentation
> ('len < MAX_PRINT_CHUNK'), at the cost of fragmenting those messages.
> But it would look less sane, thus the current approach.
> 
> Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message")
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH net] net: ethernet: ti: icssg: guard PA stat lookups
From: Simon Horman @ 2026-06-18  9:10 UTC (permalink / raw)
  To: Philippe Schenker
  Cc: netdev, Philippe Schenker, danishanwar, rogerq, linux-arm-kernel,
	stable, Andrew Lunn, David Carlier, David S. Miller, Eric Dumazet,
	Jacob Keller, Jakub Kicinski, Kevin Hao, Meghana Malladi,
	Paolo Abeni, Vadim Fedorenko, linux-kernel
In-Reply-To: <20260616143642.1972071-1-dev@pschenker.ch>

On Tue, Jun 16, 2026 at 04:35:34PM +0200, Philippe Schenker wrote:
> From: Philippe Schenker <philippe.schenker@impulsing.ch>
> 
> icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
> with FW PA stat names regardless of whether the PA stats block is
> present on the hardware.  emac_get_stat_by_name() already guards the
> PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
> is NULL the lookup falls through to netdev_err() and returns -EINVAL.
> Because ndo_get_stats64 is polled regularly by the networking stack
> this produces thousands of log entries of the form:
> 
>   icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR
> 
> A secondary consequence is that the int(-EINVAL) return value is
> implicitly widened to a near-ULLONG_MAX unsigned value when accumulated
> into the __u64 fields of rtnl_link_stats64, silently corrupting the
> rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`.
> 
> Every other PA-aware code path in the driver is already guarded with
> the same `if (emac->prueth->pa_stats)` check.  Apply the same guard
> here.
> 
> Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")

nit: no blank line between tags

> 
> Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
> 
> Cc: danishanwar@ti.com
> Cc: rogerq@kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: stable@vger.kernel.org

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Zhou, Yun @ 2026-06-18  9:03 UTC (permalink / raw)
  To: Maxime Chevallier, marcin.s.wojtas, andrew+netdev, davem,
	edumazet, kuba, pabeni, bigeasy, clrkwllms, rostedt
  Cc: netdev, linux-kernel, linux-rt-devel
In-Reply-To: <95249596-5f05-421c-9c8a-693c7b26c4f6@bootlin.com>

On 6/17/26 20:49, Maxime Chevallier wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> Hi,
>
> On 6/17/26 11:20, Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
>>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
>>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> This description makes it sound like it's not really a mvneta problem,
> but rather a broader effect from  preempt-rt / irq management / suspend
> interactions.
>
> Is this the expected way to deal with that ?
>
You were right to question this. After deeper investigation, I found
that the original analysis was incorrect.

The real root cause is entirely within the mvneta driver:

mvneta_percpu_isr() calls disable_percpu_irq() to mask the MPIC percpu
IRQ before scheduling NAPI. The corresponding enable_percpu_irq() is
called in napi_complete_done(). If suspend occurs during active NAPI
polling (between disable and enable), the MPIC percpu IRQ remains
masked after resume — mvneta_start_dev() only restores the NIC-level
INTR_NEW_MASK register, not the irqchip-level per-CPU mask.

The fix is a one-liner: call on_each_cpu(mvneta_percpu_enable) in the
resume path to ensure the MPIC percpu IRQ is unmasked. I will send a
v3 with the correct fix and updated description.

The previous free_irq/request_irq approach happened to work as a
side-effect (request_percpu_irq → enable_percpu_irq restores the mask),
but it was fixing the symptom rather than the actual cause.

Thank you very much for your rigorous review,
Yun





^ permalink raw reply

* [PATCH v6.6-v6.1] netfilter: nf_tables: always walk all pending catchall elements
From: Shivani Agarwal @ 2026-06-18  8:34 UTC (permalink / raw)
  To: stable, gregkh
  Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Yiming Qian, Sasha Levin, Shivani Agarwal

From: Florian Westphal <fw@strlen.de>

[ Upstream commit 7cb9a23d7ae40a702577d3d8bacb7026f04ac2a9 ]

During transaction processing we might have more than one catchall element:
1 live catchall element and 1 pending element that is coming as part of the
new batch.

If the map holding the catchall elements is also going away, its
required to toggle all catchall elements and not just the first viable
candidate.

Otherwise, we get:
 WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404
 RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables]
 [..]
 __nft_set_elem_destroy+0x106/0x380 [nf_tables]
 nf_tables_abort_release+0x348/0x8d0 [nf_tables]
 nf_tables_abort+0xcf2/0x3ac0 [nf_tables]
 nfnetlink_rcv_batch+0x9c9/0x20e0 [..]

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Shivani: Modified to apply on v6.6.y-v6.1.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
 net/netfilter/nf_tables_api.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 196ac4e76..0581f6479 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -620,7 +620,6 @@ static void nft_map_catchall_deactivate(const struct nft_ctx *ctx,
 
 		elem.priv = catchall->elem;
 		nft_setelem_data_deactivate(ctx->net, set, &elem);
-		break;
 	}
 }
 
@@ -5241,7 +5240,6 @@ static void nft_map_catchall_activate(const struct nft_ctx *ctx,
 
 		elem.priv = catchall->elem;
 		nft_setelem_data_activate(ctx->net, set, &elem);
-		break;
 	}
 }
 
-- 
2.53.0


^ permalink raw reply related

* RE: [EXTERNAL] [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
From: Elad Nachman @ 2026-06-18  8:55 UTC (permalink / raw)
  To: Ruoyu Wang, Taras Chornyi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King,
	Oleksandr Mazur, Yevhen Orlov, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260617193228.1653582-1-ruoyuw560@gmail.com>

> 
> 
> From: Ruoyu Wang <ruoyuw560@gmail.com>
> Sent: Wednesday, June 17, 2026 10:32 PM
> To: Taras Chornyi <taras.chornyi@plvision.eu>; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Russell King <linux@armlinux.org.uk>; Oleksandr Mazur <oleksandr.mazur@plvision.eu>; Yevhen Orlov <yevhen.orlov@plvision.eu>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [EXTERNAL] [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
> 
> prestera_port_sfp_bind() returns err after walking the ports node. If no child node matches the port's front-panel id, err is never assigned. Initialize err to 0 because absence of a matching optional port device tree node is not an error. In
> 
> prestera_port_sfp_bind() returns err after walking the ports node. If no
> child node matches the port's front-panel id, err is never assigned.
> 
> Initialize err to 0 because absence of a matching optional port device
> tree node is not an error. In that case no phylink is created and port
> creation should continue with port->phy_link left NULL. Errors from
> malformed matched nodes and phylink_create() still propagate.
> 
> Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
> Signed-off-by: Ruoyu Wang <mailto:ruoyuw560@gmail.com>
> ---
> v2:
> - Add net tree target to the subject.
> - Explain why the no-match path returns 0 instead of -ENODEV.
> 
>  drivers/net/ethernet/marvell/prestera/prestera_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c
> index 41e19e9ad28d4..a82e7a8029851 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c
> @@ -373,7 +373,7 @@ static int prestera_port_sfp_bind(struct prestera_port *port)
>  	struct device_node *ports, *node;
>  	struct fwnode_handle *fwnode;
>  	struct phylink *phy_link;
> -	int err;
> +	int err = 0;
> 
>  	if (!sw->np)
>  		return 0;
> --
> 2.51.0
> 

prestera_port_sfp_bind() iterates only SFP ports.
Although all currently existing switch boards have at least one SFP uplink port,
In theory a manufacturer might produce a switch board without any SFP ports,
which will unnecessarily fail this function call, so for resolving this case indeed
err should be initialized to zero to make this function return 0 and not an error.

Acked-by: Elad Nachman <enachman@marvell.com>

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Ruinskiy, Dima @ 2026-06-18  8:51 UTC (permalink / raw)
  To: Loktionov, Aleksandr, kao, acelan, Nguyen, Anthony L,
	Kitszel, Przemyslaw
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <IA3PR11MB8986B77F49DF672178FEE4BCE5E32@IA3PR11MB8986.namprd11.prod.outlook.com>

On 18/06/2026 10:55, Loktionov, Aleksandr wrote:
> 
> 
>> -----Original Message-----
>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>> Of Chia-Lin Kao (AceLan) via Intel-wired-lan
>> Sent: Thursday, June 18, 2026 9:33 AM
>> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
>> Przemyslaw <przemyslaw.kitszel@intel.com>
>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller
>> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
>> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; intel-
>> wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough
>> after reset
>>
>> Some systems support MAC passthrough for dock Ethernet controllers by
>> having firmware rewrite the receive address registers after the
>> controller reset completes.
>>
>> igc resets the controller before reading RAL0/RAH0, so that reset can
>> restore the controller native MAC address temporarily. If the driver
>> reads the registers immediately, it can race the firmware rewrite and
>> keep the native dock MAC instead of the host passthrough MAC.
>>
>> For LMVP devices, poll RAL0/RAH0 after reset and before reading the
>> MAC address. Stop once the address registers change to another valid
>> Ethernet address, allowing firmware a bounded window to complete the
>> passthrough update.
>>
> Good day, Chia-Lin
> 
> It'd be great if you could share more details on how to reproduce the issue.
> 
> What exact hardware setup is affected (dock model, NIC, system)?
> Which firmware/BIOS version?
> How often does the race trigger?
> Do you have a way to reliably reproduce it?
> 
> Also, what is the observed behavior vs. expected behavior? For example,
> which MAC address is seen and which one should be used?
> 
In addition to that - I would ask - when the race triggers - how much 
wait time do you need to reliably resolve it (i.e., for the FW to have 
completed the MAC update)?

Because 100 iterations of 100msec each - this translates to up-to 10 
seconds, no?
The weak spot here is what if you are on an LMvP system where MAC 
passthrough has not been enabled. You will always wait for the full 10 
seconds after every reset until you give up and just continue with the 
default MAC. Hardly desirable behavior.

We've implemented something like this in another driver at one point, 
and the default polling timeout there is 1 second (which does not affect 
the UX too much).

A better way may be using a FW interrupt to notify the driver when the 
MAC address has been updated. The usability of this approach depends on 
whether it is possible to update the MAC address up the stack after the 
device has already been initialized. Does the framework support this?

Thanks,
Dima.

> 
>> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
>> ---
>>   drivers/net/ethernet/intel/igc/igc_main.c | 48
>> +++++++++++++++++++++++
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 2c9e2dfd8499..fa9752ed8bc5 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -11,6 +11,7 @@
>>   #include <net/pkt_sched.h>
>>   #include <linux/bpf_trace.h>
>>   #include <net/xdp_sock_drv.h>
>> +#include <linux/etherdevice.h>
>>   #include <linux/pci.h>
>>   #include <linux/mdio.h>
>>
>> @@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
>>
>>   MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
>>
>> +static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32
>> +*rah) {
>> +	*ral = rd32(IGC_RAL(0));
>> +	*rah = rd32(IGC_RAH(0));
>> +
>> +	addr[0] = *ral & 0xff;
>> +	addr[1] = (*ral >> 8) & 0xff;
>> +	addr[2] = (*ral >> 16) & 0xff;
>> +	addr[3] = (*ral >> 24) & 0xff;
>> +	addr[4] = *rah & 0xff;
>> +	addr[5] = (*rah >> 8) & 0xff;
>> +}
>> +
>> +static bool igc_is_lmvp_device(struct pci_dev *pdev) {
>> +	switch (pdev->device) {
>> +	case IGC_DEV_ID_I225_LMVP:
>> +	case IGC_DEV_ID_I226_LMVP:
>> +		return true;
>> +	default:
>> +		return false;
>> +	}
>> +}
>> +
>> +static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
>> +					      struct igc_hw *hw)
>> +{
>> +	u8 addr[ETH_ALEN] __aligned(2);
>> +	u32 orig_ral, orig_rah;
>> +	u32 ral, rah;
>> +	int i;
>> +
>> +	if (!igc_is_lmvp_device(pdev))
>> +		return;
>> +
>> +	igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
>> +
>> +	for (i = 0; i < 100; i++) {
>> +		msleep(100);
>> +		igc_read_rar0(hw, addr, &ral, &rah);
>> +		if ((ral != orig_ral || rah != orig_rah) &&
>> +		    is_valid_ether_addr(addr))
>> +			return;
>> +	}
>> +}
>> +
>>   enum latency_range {
>>   	lowest_latency = 0,
>>   	low_latency = 1,
>> @@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
>>   	 * known good starting state
>>   	 */
>>   	hw->mac.ops.reset_hw(hw);
>> +	igc_wait_for_lmvp_mac_passthrough(pdev, hw);
>>
>>   	if (igc_get_flash_presence_i225(hw)) {
>>   		if (hw->nvm.ops.validate(hw) < 0) {
>> --
>> 2.53.0
> 


^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-18  8:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Jakub Kicinski, John Ogness, Sergey Senozhatsky,
	Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111504.GK49951@noisy.programming.kicks-ass.net>

On 2026-06-17 13:15:04 [+0200], Peter Zijlstra wrote:
> 
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?

That would be 

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 85fbf1801cbe0..c72f8d7027aee 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -27,11 +27,7 @@ int devkmsg_sysctl_set_loglvl(const struct ctl_table *table, int write,
  * nbcon consoles have had their chance to print the panic messages
  * first.
  */
-#ifdef CONFIG_PREEMPT_RT
 # define force_legacy_kthread()	(true)
-#else
-# define force_legacy_kthread()	(false)
-#endif
 
 #ifdef CONFIG_PRINTK
 
and if I remember correctly it was due to delayed CI output limited to
RT. But this does not fix stable down to 5.10 LTS.

Sebastian

^ permalink raw reply related

* RE: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Kwapulinski, Piotr @ 2026-06-18  8:49 UTC (permalink / raw)
  To: kao, acelan, Nguyen, Anthony L, Kitszel, Przemyslaw
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260618073324.1843310-1-acelan.kao@canonical.com>

>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Chia-Lin Kao (AceLan) via Intel-wired-lan
>Sent: Thursday, June 18, 2026 9:33 AM
>To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>
>Cc: Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>Subject: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
>
>Some systems support MAC passthrough for dock Ethernet controllers by having firmware rewrite the receive address registers after the controller reset completes.
>
>igc resets the controller before reading RAL0/RAH0, so that reset can restore the controller native MAC address temporarily. If the driver reads the registers immediately, it can race the firmware rewrite and keep the native dock MAC instead of the host passthrough MAC.
>
>For LMVP devices, poll RAL0/RAH0 after reset and before reading the MAC address. Stop once the address registers change to another valid Ethernet address, allowing firmware a bounded window to complete the passthrough update.
>
>Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
>---
> drivers/net/ethernet/intel/igc/igc_main.c | 48 +++++++++++++++++++++++
> 1 file changed, 48 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>index 2c9e2dfd8499..fa9752ed8bc5 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -11,6 +11,7 @@
> #include <net/pkt_sched.h>
> #include <linux/bpf_trace.h>
> #include <net/xdp_sock_drv.h>
>+#include <linux/etherdevice.h>
> #include <linux/pci.h>
> #include <linux/mdio.h>
> 
>@@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
> 
> MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
> 
>+static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32 
>+*rah) {
>+	*ral = rd32(IGC_RAL(0));
>+	*rah = rd32(IGC_RAH(0));
>+
>+	addr[0] = *ral & 0xff;
>+	addr[1] = (*ral >> 8) & 0xff;
>+	addr[2] = (*ral >> 16) & 0xff;
>+	addr[3] = (*ral >> 24) & 0xff;
>+	addr[4] = *rah & 0xff;
>+	addr[5] = (*rah >> 8) & 0xff;
>+}
>+
>+static bool igc_is_lmvp_device(struct pci_dev *pdev) {
>+	switch (pdev->device) {
>+	case IGC_DEV_ID_I225_LMVP:
>+	case IGC_DEV_ID_I226_LMVP:
>+		return true;
>+	default:
>+		return false;
>+	}
>+}
>+
>+static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
>+					      struct igc_hw *hw)
>+{
>+	u8 addr[ETH_ALEN] __aligned(2);
>+	u32 orig_ral, orig_rah;
>+	u32 ral, rah;
>+	int i;
Hello AceLan
Please move ral, rah and 'i' right into the loop.
Thank you.
Piotr
>+
>+	if (!igc_is_lmvp_device(pdev))
>+		return;
>+
>+	igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
>+
>+	for (i = 0; i < 100; i++) {
>+		msleep(100);
>+		igc_read_rar0(hw, addr, &ral, &rah);
>+		if ((ral != orig_ral || rah != orig_rah) &&
>+		    is_valid_ether_addr(addr))
>+			return;
>+	}
>+}
>+
> enum latency_range {
> 	lowest_latency = 0,
> 	low_latency = 1,
>@@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
> 	 * known good starting state
> 	 */
> 	hw->mac.ops.reset_hw(hw);
>+	igc_wait_for_lmvp_mac_passthrough(pdev, hw);
> 
> 	if (igc_get_flash_presence_i225(hw)) {
> 		if (hw->nvm.ops.validate(hw) < 0) {
>--
>2.53.0
>
>

^ permalink raw reply

* Re: [PATCH bpf v3 1/2] bpf: Fix partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-18  8:46 UTC (permalink / raw)
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo
In-Reply-To: <20260617093557.63880-2-sun.jian.kdev@gmail.com>

On Wed, Jun 17, 2026 at 05:35:56PM +0800, Sun Jian wrote:
> For non-linear test_run output, bpf_test_finish() derives the linear
> data copy length from copy_size - frag_size. This only matches the
> linear data length when copy_size is the full packet size.
> 
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.
> 
> Compute the linear data length from the packet layout instead, and clamp
> the linear copy length to copy_size. This preserves the expected
> partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> in data_out, and report the full packet length through data_size_out.
> 
> Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  net/bpf/test_run.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..f15c613aaa4e 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,12 +453,8 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>  	}
>  
>  	if (data_out) {
> -		int len = sinfo ? copy_size - frag_size : copy_size;
> -
> -		if (len < 0) {
> -			err = -ENOSPC;
> -			goto out;
> -		}
> +		u32 head_len = size - frag_size;
> +		u32 len = min(copy_size, head_len);
>  
>  		if (copy_to_user(data_out, data, len))
>  			goto out;

Acked-by: Paul Chaignon <paul.chaignon@gmail.com>


^ permalink raw reply

* RE: [PATCH net v3] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
From: Tung Quang Nguyen @ 2026-06-18  8:45 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, Xiang Mei, Jon Maloy,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20260617135744.3383175-3-bestswngs@gmail.com>

>Subject: [PATCH net v3] tipc: fix use-after-free of the discoverer in
>tipc_disc_rcv()
>
>bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(), but
>tipc_disc_rcv() still dereferences b->disc in RX softirq under
>rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
>
>L2 bearers are safe thanks to the synchronize_net() in tipc_disable_l2_media(),
>but the UDP bearer defers that call to the
>cleanup_bearer() workqueue, so the discoverer is freed with no grace
>period:
>
> BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
>Read of size 8 at addr ffff88802348b728 by task poc_tipc/184  <IRQ>
>  tipc_disc_rcv (net/tipc/discover.c:149)
>  tipc_rcv (net/tipc/node.c:2126)
>  tipc_udp_recv (net/tipc/udp_media.c:391)
>  udp_rcv (net/ipv4/udp.c:2643)
>  ip_local_deliver_finish (net/ipv4/ip_input.c:241)  </IRQ>  Freed by task 181:
>  kfree (mm/slub.c:6565)
>  bearer_disable (net/tipc/bearer.c:418)
>  tipc_nl_bearer_disable (net/tipc/bearer.c:1001)
>
>The bearer is freed with kfree_rcu(); free the discoverer the same way.
>Add an rcu_head to struct tipc_discoverer and free it and its skb from an RCU
>callback.
>
>Because the RCU callback (tipc_disc_free_rcu) lives in module text, a
>call_rcu() that is still pending when the tipc module is unloaded would invoke a
>freed function. Add an rcu_barrier() to tipc_exit() after the bearer subsystem
>has been torn down, so all pending discoverer callbacks have run before the
>module text goes away.
>
>Reachable from an unprivileged user namespace: the TIPCv2 genl family is
>netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs
>CONFIG_TIPC and CONFIG_TIPC_MEDIA_UDP.
>
>Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash
>values")
>Reported-by: Xiang Mei <xmei5@asu.edu>
>Assisted-by: Claude:claude-opus-4-8
>Signed-off-by: Weiming Shi <bestswngs@gmail.com>
>---
>v3:
> - Reword the rcu_barrier() comment as a TODO (Tung Quang Nguyen).
>v2:
> - split the over-80-column container_of() line (Tung Quang Nguyen)
> - add rcu_barrier() to tipc_exit() so a pending call_rcu() cannot fire
>   into freed module text after rmmod (Eric Dumazet)
>
> net/tipc/core.c     |  5 +++++
> net/tipc/discover.c | 14 ++++++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
>diff --git a/net/tipc/core.c b/net/tipc/core.c index
>434e70eabe08..1ddecea1df6e 100644
>--- a/net/tipc/core.c
>+++ b/net/tipc/core.c
>@@ -218,6 +218,11 @@ static void __exit tipc_exit(void)
> 	unregister_pernet_device(&tipc_net_ops);
> 	tipc_unregister_sysctl();
>
>+	/* TODO: Wait for all timers that called call_rcu() to finish before
>+	 * calling rcu_barrier().
>+	 */
>+	rcu_barrier();
>+
> 	pr_info("Deactivated\n");
> }
>
>diff --git a/net/tipc/discover.c b/net/tipc/discover.c index
>3e54d2df5683..b9d06595b067 100644
>--- a/net/tipc/discover.c
>+++ b/net/tipc/discover.c
>@@ -58,6 +58,7 @@
>  * @skb: request message to be (repeatedly) sent
>  * @timer: timer governing period between requests
>  * @timer_intv: current interval between requests (in ms)
>+ * @rcu: RCU head for deferred freeing
>  */
> struct tipc_discoverer {
> 	u32 bearer_id;
>@@ -69,6 +70,7 @@ struct tipc_discoverer {
> 	struct sk_buff *skb;
> 	struct timer_list timer;
> 	unsigned long timer_intv;
>+	struct rcu_head rcu;
> };
>
> /**
>@@ -382,6 +384,15 @@ int tipc_disc_create(struct net *net, struct tipc_bearer
>*b,
> 	return 0;
> }
>
>+static void tipc_disc_free_rcu(struct rcu_head *rp) {
>+	struct tipc_discoverer *d = container_of(rp, struct tipc_discoverer,
>+						 rcu);
>+
>+	kfree_skb(d->skb);
>+	kfree(d);
>+}
>+
> /**
>  * tipc_disc_delete - destroy object sending periodic link setup requests
>  * @d: ptr to link dest structure
>@@ -389,8 +400,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer
>*b,  void tipc_disc_delete(struct tipc_discoverer *d)  {
> 	timer_shutdown_sync(&d->timer);
>-	kfree_skb(d->skb);
>-	kfree(d);
>+	call_rcu(&d->rcu, tipc_disc_free_rcu);
> }
>
> /**
>--
>2.43.0
>

Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>

^ permalink raw reply

* Re: [PATCH bpf v3 2/2] selftests/bpf: Cover partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-18  8:44 UTC (permalink / raw)
  To: sun jian
  Cc: bot+bpf-ci, bpf, netdev, linux-kselftest, linux-kernel, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
	jolsa, davem, edumazet, kuba, pabeni, horms, shuah, hawk,
	john.fastabend, sdf, toke, lorenzo, martin.lau, clm,
	ihor.solodrai
In-Reply-To: <CABFUUZFeh18OjQ6EhjD17ZwQKb1aiVNkYKv-hAkVGrHhPpjE4Q@mail.gmail.com>

On Wed, Jun 17, 2026 at 10:19:52PM +0800, sun jian wrote:
> On Wed, Jun 17, 2026 at 6:31 PM <bot+bpf-ci@kernel.org> wrote:
> >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > index 01f1d1b6715a..9cc898e6a9f7 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > @@ -4,6 +4,10 @@
> > >
> > >  #include "test_pkt_access.skel.h"
> > >
> > > +#define NONLINEAR_PKT_LEN 9000
> > > +#define NONLINEAR_LINEAR_DATA_LEN 64
> > > +#define SHORT_OUT_LEN 100
> > > +
> >
> > [ ... ]
> >
> > > @@ -20,6 +24,69 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
> > >             "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
> > >  }
> > >
> > > +static void init_pkt(__u8 *pkt, size_t len)
> > > +{
> > > +     size_t i;
> > > +
> > > +     for (i = 0; i < len; i++)
> > > +             pkt[i] = i & 0xff;
> > > +}
> >
> > A question was raised on v2 about whether pkt_v4 could be reused by
> > reducing the linear area to ETH_HLEN, rather than introducing a custom
> > init_pkt() with a 9000-byte stack packet.
> >
> > Can't we reuse pkt_v4 here by reducing the linear area to ETH_HLEN?
> > The v3 still adds init_pkt() and the NONLINEAR_PKT_LEN packet, so this
> > doesn't seem to have been picked up.
> >
> > > +
> > > +static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
> > > +{
> > > +     LIBBPF_OPTS(bpf_test_run_opts, topts);
> > > +     __u8 pkt[NONLINEAR_PKT_LEN];
> > > +     __u8 out[SHORT_OUT_LEN];
> > > +     struct __sk_buff skb = {};
> > > +     int prog_fd, err;
> > > +
> > > +     init_pkt(pkt, sizeof(pkt));
> > > +
> > > +     skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
> > > +
> > > +     topts.data_in = pkt;
> > > +     topts.data_size_in = sizeof(pkt);
> > > +     topts.data_out = out;
> > > +     topts.data_size_out = sizeof(out);
> > > +     topts.ctx_in = &skb;
> > > +     topts.ctx_size_in = sizeof(skb);
> > > +
> > > +     prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
> >
> > [ ... ]
> >
> > > diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > index bce7173152c6..cd284401eebd 100644
> > > --- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > +++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > @@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
> > >
> > >       return TC_ACT_UNSPEC;
> > >  }
> > > +
> > > +SEC("tc")
> > > +int tc_pass_prog(struct __sk_buff *skb)
> > > +{
> > > +     return TC_ACT_OK;
> > > +}
> > > +
> > > +SEC("xdp.frags")
> > > +int xdp_frags_pass_prog(struct xdp_md *ctx)
> > > +{
> > > +     return XDP_PASS;
> > > +}
> >
> > A related suggestion on v2 was that, once pkt_v4 is reused, the existing
> > BPF program could be reused instead of adding new pass-through programs.
> >
> > Could tc_pass_prog and xdp_frags_pass_prog be dropped in favour of the
> > existing program? The v3 still adds both of these, so this point also
> > seems to be open.
> >
> >
> > ---
> > AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> > See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> >
> > CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27680511802
> 
> Hi,
> 
> Thanks for checking this.

Hi Sun Jian,

It would help if you could reply inline instead of at the end of the
messages, especially when there are multiple comments. See [1] for an
explanation of how that works.

1: https://kernelnewbies.org/FirstKernelPatch#Responding_inline

> 
> I tried reusing pkt_v4 and the existing TC program, but they do not fit
> the skb case this test is trying to cover.
> 
> For skb test_run, IPv4/IPv6 inputs with a too-short L3 header in the
> linear area are rejected before bpf_test_finish(). With pkt_v4 and a
> linear area of ETH_HLEN, the test fails with -EINVAL before reaching the
> partial copy-out path. If the linear area is increased enough to pass the
> IPv4 check, pkt_v4 is too small to both trigger the old
> copy_size - frag_size path and verify that the copied prefix spans the
> linear data and the first fragment. pkt_v6 has the same issue: after
> making the IPv6 header linear, only 20 bytes remain in frags.
> 
> The existing test_pkt_access program has its own packet-access coverage
> goals and is not just a pass-through carrier. With such a short linear
> area or small packet fixture, it can fail before the test hits the
> bpf_test_finish()'s partial copy-out path. A pass-through TC program is
> therefore a better fit, because it keeps the test focused on the
> bpf_test_finish() copy-out semantics.

If we're keeping tc_pass_prog() then can't we use pkt_v4 and get rid of
init_pkt?

> 
> For XDP, this object does not have an existing xdp.frags pass-through
> program, so the small XDP frags program is needed to cover the other
> caller of the shared bpf_test_finish() path.
> 
> Thanks,
> Sun Jian

^ permalink raw reply

* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Sebastian Andrzej Siewior @ 2026-06-18  8:39 UTC (permalink / raw)
  To: Yun Zhou
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
	clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
In-Reply-To: <20260617092028.1722407-1-yun.zhou@windriver.com>

On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high

There is also the `threadirqs' option.

> network traffic, the IRQ can enter suspend with desc->depth == 1
> (masked by the oneshot mechanism between handler invocations).

That would be irq_desc::depth.

> During suspend, the kernel increments depth to 2 and masks the
> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
> due to IRQCHIP_MASK_ON_SUSPEND). 

The interrupt should be masked while the depth counter goes 0->1, no?

>                                  On resume, depth is decremented
> back to 1, but since it does not reach 0, the unmask is never
> called. The MPIC CPU routing remains cleared, permanently disabling
> interrupt delivery.

But why not? In my naive assumption, we get into suspend with
irq_desc::depth = 2 and the threaded should be woken up. Once the
treaded handler is done the counter should decrement by one. Then again
during resume reaching 0 leading to the unmask. If the thread handler is
frozen and defrosted on resume then it should still happen but in
different order.

Something is missing here based on my naive assumption.

> Fix by freeing the IRQ in suspend and re-requesting it in resume.
> This ensures a clean IRQ state (depth=0, proper hardware routing)
> on every resume cycle, regardless of the pre-suspend depth. This
> follows the approach used by other drivers (e.g. igb).

The igb shutdowns the device entirely, not just freeing the IRQ.

> Fixes: 9768b45ceb0b ("net: mvneta: support suspend and resume")
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>

Sebastian

^ permalink raw reply

* [PATCH v5.10 2/2] net/sched: cls_u32: use skb_header_pointer_careful()
From: Shivani Agarwal @ 2026-06-18  8:08 UTC (permalink / raw)
  To: stable, gregkh
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
	xiaosuo, iri, jhs, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu, GangMin Kim,
	Bin Lan, Shivani Agarwal
In-Reply-To: <20260618080807.1269070-1-shivani.agarwal@broadcom.com>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit cabd1a976375780dabab888784e356f574bbaed8 ]

skb_header_pointer() does not fully validate negative @offset values.

Use skb_header_pointer_careful() instead.

GangMin Kim provided a report and a repro fooling u32_classify():

BUG: KASAN: slab-out-of-bounds in u32_classify+0x1180/0x11b0
net/sched/cls_u32.c:221

Fixes: fbc2e7d9cf49 ("cls_u32: use skb_header_pointer() to dereference data safely")
Reported-by: GangMin Kim <km.kim1503@gmail.com>
Closes: https://lore.kernel.org/netdev/CANn89iJkyUZ=mAzLzC4GdcAgLuPnUoivdLaOs6B9rq5_erj76w@mail.gmail.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260128141539.3404400-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Bin Lan <lanbincn@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Shivani: Modified to apply on 5.10.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
 net/sched/cls_u32.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f2a0c1068..e501390cc 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -149,10 +149,8 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			int toff = off + key->off + (off2 & key->offmask);
 			__be32 *data, hdata;
 
-			if (skb_headroom(skb) + toff > INT_MAX)
-				goto out;
-
-			data = skb_header_pointer(skb, toff, 4, &hdata);
+			data = skb_header_pointer_careful(skb, toff, 4,
+							  &hdata);
 			if (!data)
 				goto out;
 			if ((*data ^ key->val) & key->mask) {
@@ -202,8 +200,9 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		if (ht->divisor) {
 			__be32 *data, hdata;
 
-			data = skb_header_pointer(skb, off + n->sel.hoff, 4,
-						  &hdata);
+			data = skb_header_pointer_careful(skb,
+							  off + n->sel.hoff,
+							  4, &hdata);
 			if (!data)
 				goto out;
 			sel = ht->divisor & u32_hash_fold(*data, &n->sel,
@@ -217,7 +216,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			if (n->sel.flags & TC_U32_VAROFFSET) {
 				__be16 *data, hdata;
 
-				data = skb_header_pointer(skb,
+				data = skb_header_pointer_careful(skb,
 							  off + n->sel.offoff,
 							  2, &hdata);
 				if (!data)
-- 
2.53.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox