* Re: [PATCH 1/9] bitfield: add FIELD_GET_SIGNED()
From: Peter Zijlstra @ 2026-04-20 11:19 UTC (permalink / raw)
To: Yury Norov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Jonathan Cameron, David Lechner,
Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
netdev, linux-rtc
In-Reply-To: <20260417173621.368914-2-ynorov@nvidia.com>
On Fri, Apr 17, 2026 at 01:36:12PM -0400, Yury Norov wrote:
> The bitfields are designed in assumption that fields contain unsigned
> integer values, thus extracting the values from the field implies
> zero-extending.
>
> Some drivers need to sign-extend their fields, and currently do it like:
>
> dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
> dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
>
> It's error-prone because it relies on user to provide the correct
> index of the most significant bit and proper 32 vs 64 function flavor.
>
> Thus, introduce a FIELD_GET_SIGNED() macro, which is the more
> convenient and compiles (on x86_64) to just a couple instructions:
> shl and sar.
>
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---
> include/linux/bitfield.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
> index 54aeeef1f0ec..35ef63972810 100644
> --- a/include/linux/bitfield.h
> +++ b/include/linux/bitfield.h
> @@ -178,6 +178,22 @@
> __FIELD_GET(_mask, _reg, "FIELD_GET: "); \
> })
>
> +/**
> + * FIELD_GET_SIGNED() - extract a signed bitfield element
> + * @mask: shifted mask defining the field's length and position
> + * @reg: value of entire bitfield
> + *
> + * Returns the sign-extended field specified by @_mask from the
> + * bitfield passed in as @_reg by masking and shifting it down.
> + */
> +#define FIELD_GET_SIGNED(mask, reg) \
> + ({ \
> + __BF_FIELD_CHECK(mask, reg, 0U, "FIELD_GET_SIGNED: "); \
> + ((__signed_scalar_typeof(mask))((long long)(reg) << \
> + __builtin_clzll(mask) >> (__builtin_clzll(mask) + \
> + __builtin_ctzll(mask))));\
> + })
IIRC clz is count-leading-zeros and ctz is count-trailing-zeros. Most of
the other FIELD things use __bf_shf() which is defined in terms of ffs -
1 (which is another way of writing ctz).
So how about you start by redefining __bf_shf() in ctz, and then add
another helper for the clz and write the thing something like:
((long long)(reg) << __bf_clz(mask)) >> (__bf_clz(mask) + __bf_shf(mask));
Also, since the order of the shifts is rather important, I think it
makes sense to add this extra pair of (), even when not strictly needed,
just to make it easier to read.
^ permalink raw reply
* Re: [PATCH 2/9] x86/extable: switch to using FIELD_GET_SIGNED()
From: Peter Zijlstra @ 2026-04-20 11:24 UTC (permalink / raw)
To: Yury Norov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Andy Lutomirski, Jonathan Cameron, David Lechner,
Nuno Sá, Andy Shevchenko, Ping-Ke Shih, Richard Cochran,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Belloni, Yury Norov, Rasmus Villemoes,
Hans de Goede, Linus Walleij, Sakari Ailus, Salah Triki,
Achim Gratz, Ben Collins, linux-kernel, linux-iio, linux-wireless,
netdev, linux-rtc
In-Reply-To: <20260417173621.368914-3-ynorov@nvidia.com>
On Fri, Apr 17, 2026 at 01:36:13PM -0400, Yury Norov wrote:
> The EX_DATA register is laid out such that EX_DATA_IMM occupied MSB.
> It's done to make sure that FIELD_GET() will sign-extend the IMM
> field during extraction.
>
> To enforce that, all EX_DATA masks are made signed integers. This
> works, but relies on the particular implementation of FIELD_GET(),
> i.e. masking then shifting, not vice versa; and the particular
> placement of the fields in the register.
I don't think the order of the mask and shift matters in this case. If
we were to first shift down and then mask, it would still work (after
all, the mask would also need to be shifted and would also get sign
extended, effectively ending up as -1).
But yes, this very much depends on the signed field being the topmost
field and including the MSB.
^ permalink raw reply
* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: Jiayuan Chen @ 2026-04-20 11:27 UTC (permalink / raw)
To: Kohei Enju, netdev, linux-trace-kernel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
In-Reply-To: <20260420105427.162816-1-kohei@enjuk.jp>
On 4/20/26 6:54 PM, Kohei Enju wrote:
> Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
> skb->napi_id shares storage with sender_cpu. RX tracepoints using
> net_dev_rx_verbose_template read skb->napi_id directly and can therefore
> report sender_cpu values as if they were NAPI IDs.
>
> For example, on the loopback path this can report 1 as napi_id, where 1
So I think veth_forward_skb->__netif_rx could be affected as well?
> comes from raw_smp_processor_id() + 1 in the XPS path:
>
> # bpftrace -e 'tracepoint:net:netif_rx_entry{ print(args->napi_id); }'
> # taskset -c 0 ping -c 1 ::1
>
> Report only valid NAPI IDs in these tracepoints and use 0 otherwise.
>
> Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
> Signed-off-by: Kohei Enju <kohei@enjuk.jp>
> ---
> include/trace/events/net.h | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index fdd9ad474ce3..dbc2c5598e35 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -10,6 +10,7 @@
> #include <linux/if_vlan.h>
> #include <linux/ip.h>
> #include <linux/tracepoint.h>
> +#include <net/busy_poll.h>
>
> TRACE_EVENT(net_dev_start_xmit,
>
> @@ -208,7 +209,8 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
> TP_fast_assign(
> __assign_str(name);
> #ifdef CONFIG_NET_RX_BUSY_POLL
> - __entry->napi_id = skb->napi_id;
> + __entry->napi_id = napi_id_valid(skb->napi_id) ?
> + skb->napi_id : 0;
> #else
> __entry->napi_id = 0;
> #endif
^ permalink raw reply
* Re: [PATCH net v3 1/1] net: l3mdev: Reject non-L3 uppers in slave helpers
From: Ido Schimmel @ 2026-04-20 11:32 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, idosch, dsahern, davem, edumazet, kuba, pabeni, horms,
jiri, yifanwucs, tomapufckgml, yuantan098, bird, royenheart
In-Reply-To: <20260419145332.3988923-1-n05ec@lzu.edu.cn>
On Sun, Apr 19, 2026 at 10:53:32PM +0800, Ren Wei wrote:
> From: Haoze Xie <royenheart@gmail.com>
>
> Several l3mdev slave-side helpers resolve an upper device and then use
> l3mdev_ops without first proving that the resolved device is still a
> valid L3 master.
>
> During slave transition, an RCU reader can transiently observe an upper
> that is not an L3 master. Guard the affected slave-resolved paths by
> requiring the resolved upper to still be an L3 master before using
> l3mdev_ops, while keeping existing L3 RX handler providers intact.
>
> Fixes: fdeea7be88b1 ("net: vrf: Set slave's private flag before linking")
> Cc: stable@kernel.org
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Tested-by: Haoze Xie <royenheart@gmail.com>
> Signed-off-by: Haoze Xie <royenheart@gmail.com>
> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn>
I think it's fine for net:
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
We can try a more general approach in net-next which ensures that RCU
readers don't see different master devices within an RCU read-side
critical section. Something like [1]. I tested it (without your patch)
using the reproducer and was not able to trigger the bug.
I am aware that there are more places where we can unlink a device from
its master (e.g., deleting a master device), but most / all of them
(needs auditing) result in a synchronize_rcu() before releasing RTNL.
On my machine, the time it takes to unlink 1k devices from their master
increased from 0.96 seconds to 1.22 seconds (on average), which is
probably fine.
[1]
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 69daba3ddaf0..23657a11c66d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2970,6 +2970,7 @@ static int do_set_master(struct net_device *dev, int ifindex,
netdev_lock_ops(dev);
if (err)
return err;
+ synchronize_net();
} else {
return -EOPNOTSUPP;
}
^ permalink raw reply related
* Re: [PATCH bpf-next v4 1/6] net: move netfilter nf_reject_fill_skb_dst to core ipv4
From: bot+bpf-ci @ 2026-04-20 11:36 UTC (permalink / raw)
To: mahe.tardy, mahe.tardy
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260420105816.72168-2-mahe.tardy@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2271 bytes --]
> diff --git a/include/net/route.h b/include/net/route.h
> index f90106f383c5..ec2466fd0bec 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -173,6 +173,7 @@ struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
> const struct sock *sk);
> struct dst_entry *ipv4_blackhole_route(struct net *net,
> struct dst_entry *dst_orig);
> +int ip_route_reply_fetch_dst(struct sk_buff *skb);
>
> static inline struct rtable *ip_route_output_key(struct net *net, struct flowi4 *flp)
> {
[ ... ]
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index bc1296f0ea69..7091ef936073 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2945,6 +2945,21 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
> }
> EXPORT_SYMBOL_GPL(ip_route_output_flow);
>
> +int ip_route_reply_fetch_dst(struct sk_buff *skb)
> +{
> + struct rtable *rt;
> + struct flowi4 fl4 = {
> + .daddr = ip_hdr(skb)->saddr
> + };
> +
> + rt = ip_route_output_key(dev_net(skb->dev), &fl4);
> + if (IS_ERR(rt))
> + return PTR_ERR(rt);
> + skb_dst_set(skb, &rt->dst);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ip_route_reply_fetch_dst);
Is ip_route_reply_fetch_dst() actually needed before calling icmp_send()?
Martin KaFai Lau raised this question in v3 of the patch series and
noted that icmpv6_send() does its own route lookup and doesn't use
skb_dst() of the original skb. He also pointed out that at
cgroup/egress, skb->_skb_refdst should already be set.
Looking at icmpv6_send(), the fl6.{daddr,saddr} are filled and passed
to icmpv6_route_lookup, which performs the reverse/invert route lookup.
The function doesn't appear to use skb_dst() of the original skb.
Since this patch moves ip_route_reply_fetch_dst() from netfilter to
core specifically for the BPF kfunc, can you clarify why this routing
step is necessary? Does the kfunc not work without
ip[6]_route_reply_fetch_dst()?
Reference: https://lore.kernel.org/bpf/df4b0996-3e88-4ea4-983b-82866455a6fc@linux.dev/
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
* Re: [PATCH bpf-next v4 4/6] selftests/bpf: add icmp_send_unreach kfunc tests
From: bot+bpf-ci @ 2026-04-20 11:36 UTC (permalink / raw)
To: mahe.tardy, mahe.tardy
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260420105816.72168-5-mahe.tardy@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]
> diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> new file mode 100644
> index 000000000000..24d5e01cfe80
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> @@ -0,0 +1,136 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +#include <linux/errqueue.h>
> +#include "icmp_send_unreach.skel.h"
> +
> +#define TIMEOUT_MS 1000
> +#define SRV_PORT 54321
> +
> +#define ICMP_DEST_UNREACH 3
> +
> +#define ICMP_FRAG_NEEDED 4
> +#define NR_ICMP_UNREACH 15
Jordan Rife previously suggested in v3 that these ICMP constants should be
imported from <linux/icmp.h> rather than redefined here. Would it be cleaner
to include the header and avoid the redefinitions?
Reference: https://lore.kernel.org/bpf/usz5bhydsiejr37owgt3zypckzh7fa7ygmhsyaaiprsljx7iy5@ipopnr5n4ds7/
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
* Re: [PATCH bpf-next v4 3/6] bpf: add bpf_icmp_send_unreach kfunc
From: bot+bpf-ci @ 2026-04-20 11:36 UTC (permalink / raw)
To: mahe.tardy, mahe.tardy
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260420105816.72168-4-mahe.tardy@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3581 bytes --]
> diff --git a/net/core/filter.c b/net/core/filter.c
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
[ ... ]
> @@ -12423,6 +12427,86 @@
> return 0;
> }
>
> +static DEFINE_PER_CPU(bool, bpf_icmp_send_in_progress);
> +
> +/**
> + * bpf_icmp_send_unreach - Send ICMP destination unreachable error
> + * @skb: Packet that triggered the error
> + * @code: ICMP unreachable code (0-15 for IPv4, 0-6 for IPv6)
> + *
> + * Sends an ICMP destination unreachable message in response to the
> + * packet. The original packet is cloned before sending the ICMP error,
> + * so the BPF program can still let the packet pass if desired.
> + *
> + * Recursion protection: If called from a context that would trigger
> + * recursion (e.g., root cgroup processing its own ICMP packets),
> + * returns -EBUSY on re-entry.
> + *
> + * Return: 0 on success, negative error code on failure:
> + * -EINVAL: Invalid code parameter
> + * -ENOMEM: Memory allocation failed
> + * -EHOSTUNREACH: Routing lookup failed
> + * -EBUSY: Recursion detected
> + * -EPROTONOSUPPORT: Non-IP protocol
> + */
> +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> +{
> + struct sk_buff *skb = (struct sk_buff *)__skb;
> + struct sk_buff *nskb;
> + bool *in_progress;
> +
> + in_progress = this_cpu_ptr(&bpf_icmp_send_in_progress);
> + if (*in_progress)
> + return -EBUSY;
> +
> + switch (skb->protocol) {
> +#if IS_ENABLED(CONFIG_INET)
> + case htons(ETH_P_IP):
> + if (code < 0 || code > NR_ICMP_UNREACH)
> + return -EINVAL;
> +
> + nskb = skb_clone(skb, GFP_ATOMIC);
> + if (!nskb)
> + return -ENOMEM;
> +
> + if (!skb_dst(nskb) && ip_route_reply_fetch_dst(nskb) < 0) {
> + kfree_skb(nskb);
> + return -EHOSTUNREACH;
> + }
> +
> + *in_progress = true;
> + icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> + *in_progress = false;
> + kfree_skb(nskb);
> + break;
> +#endif
> +#if IS_ENABLED(CONFIG_IPV6)
> + case htons(ETH_P_IPV6):
> + if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> + return -EINVAL;
> +
> + nskb = skb_clone(skb, GFP_ATOMIC);
> + if (!nskb)
> + return -ENOMEM;
> +
> + if (!skb_dst(nskb) && ip6_route_reply_fetch_dst(nskb) < 0) {
> + kfree_skb(nskb);
> + return -EHOSTUNREACH;
> + }
Is the route lookup needed here? Looking at icmpv6_send() in
net/ipv6/icmp.c, it fills fl6.{daddr,saddr} and passes them to
icmpv6_route_lookup() which performs its own route lookup. The function
doesn't appear to use skb_dst() from the original skb:
icmpv6_send() {
...
fl6.daddr = hdr->saddr;
fl6.saddr = saddr;
...
dst = icmpv6_route_lookup(net, skb, sk, &fl6);
...
}
The same applies to the IPv4 path with icmp_send(). Additionally, at
cgroup/egress the skb->_skb_refdst should already be set, and the same
should be true for cgroup/ingress for inet proto.
Does the kfunc work without ip_route_reply_fetch_dst() and
ip6_route_reply_fetch_dst()? If so, can these calls be removed?
Reference: Martin KaFai Lau's comment in v3 review
https://lore.kernel.org/bpf/df4b0996-3e88-4ea4-983b-82866455a6fc@linux.dev/
> +
> + *in_progress = true;
> + icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> + *in_progress = false;
> + kfree_skb(nskb);
> + break;
> +#endif
> + default:
> + return -EPROTONOSUPPORT;
> + }
> +
> + return 0;
> +}
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
* Question/proposal for DPLL NCO (DCO) mode
From: Ivan Vecera @ 2026-04-20 11:42 UTC (permalink / raw)
To: netdev@vger.kernel.org, Jiri Pirko, Arkadiusz Kubalewski,
Vadim Fedorenko, Jakub Kicinski
Hi all,
I am currently working on adding PTP clock support (PHC) to the ZL3073x
driver, and I would like to discuss an architectural addition to the
DPLL subsystem enum dpll_mode before sending the formal patch series.
To support IEEE 1588 PTP, the hardware DPLL must be decoupled from
tracking physical reference pins. Instead, its output frequency needs to
be continuously steered by a software PTP stack (e.g., linuxptp) using
the .adjfine() callback. In the industry, this specific hardware state
is widely known as NCO (Numerically Controlled Oscillator) or DCO
(Digitally Controlled Oscillator) mode.
Currently, the DPLL subsystem defines two modes in enum dpll_mode:
MANUAL: The user explicitly selects a physical input pin to track.
AUTOMATIC: The hardware automatically selects the best physical input
pin based on priorities.
Neither of these accurately represents the NCO/DCO state, where physical
pins are ignored, and the loop is fully software-steered.
Without a dedicated mode, drivers are forced into "automagic"
workarounds. A driver might implicitly switch the hardware to NCO mode
in the background as soon as .adjfine() is called, while misreporting
its mode as MANUAL to the DPLL subsystem.
I strongly believe this hidden "automagic" behavior is a bad design. The
user-space should have full visibility and explicit control over what
the DPLL state machine is actually doing.
The Proposal:
I would like to propose adding a new mode: DPLL_MODE_NCO (or
DPLL_MODE_DCO, I am open to suggestions regarding the terminology).
The expected workflow would be explicit:
If a user wants to actively steer the DPLL frequency via software (for
PTP or other Time-over-Packet applications), they must explicitly set
the DPLL device to this mode via Netlink first. Only then will the
hardware accept frequency adjustments via .adjfine().
I would appreciate any feedback from the DPLL subsystem maintainers and
the community on this approach.
Thanks,
Ivan
^ permalink raw reply
* [PATCH] docs: maintainer-netdev: fix typo in "targeting"
From: Ariful Islam Shoikot @ 2026-04-20 11:45 UTC (permalink / raw)
To: netdev; +Cc: linux-doc, workflows, linux-kernel, Ariful Islam Shoikot
Fix spelling mistake "targgeting" -> "targeting" in
maintainer-netdev.rst
No functional change.
Signed-off-by: Ariful Islam Shoikot <islamarifulshoikat@gmail.com>
---
Documentation/process/maintainer-netdev.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst
index bda93b459a05..ec7b9aa2877f 100644
--- a/Documentation/process/maintainer-netdev.rst
+++ b/Documentation/process/maintainer-netdev.rst
@@ -528,7 +528,7 @@ The exact rules a driver must follow to acquire the ``Supported`` status:
status will be withdrawn.
5. Test failures due to bugs either in the driver or the test itself,
- or lack of support for the feature the test is targgeting are
+ or lack of support for the feature the test is targeting are
*not* a basis for losing the ``Supported`` status.
netdev CI will maintain an official page of supported devices, listing their
--
2.43.0
^ permalink raw reply related
* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: Kohei Enju @ 2026-04-20 11:54 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, linux-trace-kernel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers
In-Reply-To: <b943ec1e-417c-4157-ab19-b34aa6d63688@linux.dev>
On 04/20 19:27, Jiayuan Chen wrote:
>
> On 4/20/26 6:54 PM, Kohei Enju wrote:
> > Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
> > skb->napi_id shares storage with sender_cpu. RX tracepoints using
> > net_dev_rx_verbose_template read skb->napi_id directly and can therefore
> > report sender_cpu values as if they were NAPI IDs.
> >
> > For example, on the loopback path this can report 1 as napi_id, where 1
> So I think veth_forward_skb->__netif_rx could be affected as well?
Yes. Just in case, I've confirmed the same behavior in the veth path.
The mentioned loopback path is just a single example of possibly
affected paths.
Thanks,
Kohei
> > comes from raw_smp_processor_id() + 1 in the XPS path:
> >
> > # bpftrace -e 'tracepoint:net:netif_rx_entry{ print(args->napi_id); }'
> > # taskset -c 0 ping -c 1 ::1
> >
> > Report only valid NAPI IDs in these tracepoints and use 0 otherwise.
> >
> > Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
> > Signed-off-by: Kohei Enju <kohei@enjuk.jp>
> > ---
> > include/trace/events/net.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> > index fdd9ad474ce3..dbc2c5598e35 100644
> > --- a/include/trace/events/net.h
> > +++ b/include/trace/events/net.h
> > @@ -10,6 +10,7 @@
> > #include <linux/if_vlan.h>
> > #include <linux/ip.h>
> > #include <linux/tracepoint.h>
> > +#include <net/busy_poll.h>
> > TRACE_EVENT(net_dev_start_xmit,
> > @@ -208,7 +209,8 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
> > TP_fast_assign(
> > __assign_str(name);
> > #ifdef CONFIG_NET_RX_BUSY_POLL
> > - __entry->napi_id = skb->napi_id;
> > + __entry->napi_id = napi_id_valid(skb->napi_id) ?
> > + skb->napi_id : 0;
> > #else
> > __entry->napi_id = 0;
> > #endif
^ permalink raw reply
* Re: [PATCH net v3 4/5] net: mana: Don't overwrite port probe error with add_adev result
From: Erni Sri Satya Vennela @ 2026-04-20 12:45 UTC (permalink / raw)
To: Simon Horman
Cc: haiyangz, davem, longli, ssengar, linux-kernel, kuba, kotaranov,
stephen, andrew+netdev, kys, wei.liu, shirazsaleem, kees, pabeni,
linux-hyperv, edumazet, decui, leon, gargaditya, dipayanroy,
netdev, shacharr
In-Reply-To: <20260417141014.218936-1-horms@kernel.org>
On Fri, Apr 17, 2026 at 03:10:14PM +0100, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: mana: Don't overwrite port probe error with add_adev result
>
> In mana_probe(), if mana_probe_port() fails for any port, the error
> is stored in 'err' and the loop breaks. However, the subsequent
> unconditional 'err = add_adev(gd, "eth")' overwrites this error.
> If add_adev() succeeds, mana_probe() returns success despite ports
> being left in a partially initialized state (ac->ports[i] == NULL).
> Only call add_adev() when there is no prior error, so the probe
> correctly fails and triggers mana_remove() cleanup.
>
> > In mana_probe(), if mana_probe_port() fails for any port, the error
> > is stored in 'err' and the loop breaks. However, the subsequent
> > unconditional 'err = add_adev(gd, "eth")' overwrites this error.
> > If add_adev() succeeds, mana_probe() returns success despite ports
> > being left in a partially initialized state (ac->ports[i] == NULL).
> >
> > Only call add_adev() when there is no prior error, so the probe
> > correctly fails and triggers mana_remove() cleanup.
> >
> > Fixes: ced82fce77e9 ("net: mana: Probe rdma device in mana driver")
>
> The existing Fixes: tag points to commit ced82fce77e9 ("net: mana: Probe
> rdma device in mana driver"), but was the bug actually introduced there?
>
> Looking at the commit history, commit a69839d4327d ("net: mana: Add support
> for auxiliary device") added the unconditional 'err = add_adev(gd);' call
> that overwrites the error from mana_probe_port(). Commit ced82fce77e9 only
> modified the add_adev signature from add_adev(gd) to add_adev(gd, "eth")
> but did not introduce the buggy pattern.
>
> Should the Fixes: tag be:
> Fixes: a69839d4327d ("net: mana: Add support for auxiliary device")
Thankyou for the correction, Simon.
I'll make this change in the next version.
- Vennela
^ permalink raw reply
* Re: [PATCH net v3 3/5] net: mana: Guard mana_remove against double invocation
From: Erni Sri Satya Vennela @ 2026-04-20 12:45 UTC (permalink / raw)
To: Simon Horman
Cc: haiyangz, davem, longli, ssengar, linux-kernel, kuba, kotaranov,
stephen, andrew+netdev, kys, wei.liu, shirazsaleem, kees, pabeni,
linux-hyperv, edumazet, decui, leon, gargaditya, dipayanroy,
netdev, shacharr
In-Reply-To: <20260417140953.218832-1-horms@kernel.org>
On Fri, Apr 17, 2026 at 03:09:53PM +0100, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: mana: Guard mana_remove against double invocation
>
> If PM resume fails (e.g., mana_attach() returns an error), mana_probe()
> calls mana_remove(), which tears down the device and sets
> gd->gdma_context = NULL and gd->driver_data = NULL. However, a failed
> resume callback does not automatically unbind the driver. When the device
> is eventually unbound, mana_remove() is invoked a second time. Without a
> NULL check, it dereferences gc->dev with gc == NULL, causing a kernel
> panic. Add an early return if gdma_context or driver_data is NULL so the
> second invocation is harmless.
>
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
>
> The Fixes: tag points to ca9c54d2d6a5, which introduced mana_remove()
> without NULL checks. However, the exploitable double invocation scenario
> was actually introduced by commit 635096a86edb ("net: mana: Support
> hibernation and kexec"), which added suspend/resume support and made
> mana_probe() call mana_remove() on error.
>
> Should the Fixes: tag be:
>
> Fixes: 635096a86edb ("net: mana: Support hibernation and kexec")
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 9d095a6fb56c..2ab7c89e2fed 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3685,11 +3685,16 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
> > struct gdma_context *gc = gd->gdma_context;
> > struct mana_context *ac = gd->driver_data;
> > struct mana_port_context *apc;
> > - struct device *dev = gc->dev;
> > + struct device *dev;
> > struct net_device *ndev;
> > int err;
> > int i;
> >
> > + if (!gc || !ac)
> > + return;
> > +
> > + dev = gc->dev;
> > +
> > disable_work_sync(&ac->link_change_work);
> > cancel_delayed_work_sync(&ac->gf_stats_work);
Thankyou for the correction, Simon.
I'll make this change in the next version.
- Vennela
^ permalink raw reply
* Re: Re: [PATCH] net: hamachi: fix divide by zero in hamachi_init_one
From: Andrew Lunn @ 2026-04-20 12:46 UTC (permalink / raw)
To: 王明煜
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, tglx, mingo, netdev,
linux-kernel
In-Reply-To: <2543c18f.5194.19da8c3b8d1.Coremail.25181214217@stu.xidian.edu.cn>
So what is your security model here? How did i get the hacked card
into the machine? If i have physical access, isn't it already too
late?
As far as i can see, this devices does not load firmware. Even if it
did load firmware from /lib/firmware, i need root to put a hacked
version there.
The PacketEngine Hamachi was in use around year 2000. It needs a PCI
slot. Not PCIe but PCI. Do they even exist any more? Both the card and
a machine with a PCI slot?
Maybe, rather than fix this /0, you should look around and see if you
can find any indication this hardware is still in use, and if not,
submit a patch removing the whole driver?
> +++ b/drivers/net/ethernet/packetengines/hamachi.c
> @@ -745,6 +745,14 @@ static int hamachi_init_one(struct pci_dev *pdev,
> dev->name, chip_tbl[chip_id].name, readl(ioaddr + ChipRev),
> ioaddr, dev->dev_addr, irq);
> i = readb(ioaddr + PCIClkMeas);
> +
> + if ((i & 0x80) && !(i & 0x7f)) {
> + dev_err(&pdev->dev, "Invalid PCIClkMeas value (0x%02x), hardware untrusted.\n", i);
> + unregister_netdev(dev);
> + ret = -EIO;
> + goto err_out_unmap_rx;
> + }
> +
> printk(KERN_INFO "%s: %d-bit %d Mhz PCI bus (%d), Virtual Jumpers "
> "%2.2x, LPA %4.4x.\n",
> dev->name, readw(ioaddr + MiscStatus) & 1 ? 64 : 32,
If you do decide it is worth keeping the driver, please add another
label at the end of the function, and do the unregister_netdev there.
Could i also suggest you consider more likely scenarios. It is much
easier to produce a hacked USB dongle than a PCI card. Users plug in
USB dongles without thinking, where as few users plug in a PCI
card. You are more likely to fix security problems which affect real
systems if you look at USB devices. I would also suggest that / 0 is
not that important. But can you trigger a buffer overrun?
Andrew
---
pw-bot: cr
^ permalink raw reply
* [PATCH net v4 0/5] net: mana: Fix probe/remove error path bugs
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
Fix five bugs in mana_probe()/mana_remove() error handling that can
cause warnings on uninitialized work structs, NULL pointer dereferences,
masked errors, and resource leaks when early probe steps fail.
Patches 1-2 move work struct initialization (link_change_work and
gf_stats_work) to before any error path that could trigger
mana_remove(), preventing WARN_ON in __flush_work() or debug object
warnings when sync cancellation runs on uninitialized work structs.
Patch 3 guards mana_remove() against double invocation. If PM resume
fails, mana_probe() calls mana_remove() which sets gdma_context and
driver_data to NULL. A failed resume does not unbind the driver, so
when the device is eventually unbound, mana_remove() is called again
and dereferences NULL, causing a kernel panic. An early return on
NULL gdma_context or driver_data makes the second call harmless.
Patch 4 prevents add_adev() from overwriting a port probe error,
which could leave the driver in a broken state with NULL ports while
reporting success.
Patch 5 changes 'goto out' to 'break' in mana_remove()'s port loop
so that mana_destroy_eq() is always reached, preventing EQ leaks when
a NULL port is encountered.
---
Changes in v4:
* Correct Fixes tag from ca9c54d2d6a5 to 635096a86edb
* Correct Fixes tag from ced82fce77e9 to a69839d4327d
Changes in v3:
* Add patch 3: net: mana: Guard mana_remove against double invocation.
* Fix inaccurate comments.
* Correct Fixes tag from ca9c54d2d6a5 to 1e2d0824a9c3.
Changes in v2:
* Apply the patchset in net instead of net-next.
---
Erni Sri Satya Vennela (5):
net: mana: Init link_change_work before potential error paths in probe
net: mana: Init gf_stats_work before potential error paths in probe
net: mana: Guard mana_remove against double invocation
net: mana: Don't overwrite port probe error with add_adev result
net: mana: Fix EQ leak in mana_remove on NULL port
drivers/net/ethernet/microsoft/mana/mana_en.c | 35 +++++++++++--------
1 file changed, 20 insertions(+), 15 deletions(-)
--
2.34.1
^ permalink raw reply
* [PATCH net v4 1/5] net: mana: Init link_change_work before potential error paths in probe
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>
Move INIT_WORK(link_change_work) to right after the mana_context
allocation, before any error path that could reach mana_remove().
Previously, if mana_create_eq() or mana_query_device_cfg() failed,
mana_probe() would jump to the error path which calls mana_remove().
mana_remove() unconditionally calls disable_work_sync(link_change_work),
but the work struct had not been initialized yet. This can trigger
CONFIG_DEBUG_OBJECTS_WORK enabled.
Fixes: 54133f9b4b53 ("net: mana: Support HW link state events")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v3,v4:
* No change.
Changes in v2:
* Apply the patch in net instead of net-next.
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 6302432b9bf6..e3e4b6de6668 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3631,6 +3631,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
ac->gdma_dev = gd;
gd->driver_data = ac;
+
+ INIT_WORK(&ac->link_change_work, mana_link_state_handle);
}
err = mana_create_eq(ac);
@@ -3648,8 +3650,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
if (!resuming) {
ac->num_ports = num_ports;
-
- INIT_WORK(&ac->link_change_work, mana_link_state_handle);
} else {
if (ac->num_ports != num_ports) {
dev_err(dev, "The number of vPorts changed: %d->%d\n",
--
2.34.1
^ permalink raw reply related
* [PATCH net v4 2/5] net: mana: Init gf_stats_work before potential error paths in probe
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>
Move INIT_DELAYED_WORK(gf_stats_work) to before mana_create_eq(),
while keeping schedule_delayed_work() at its original location.
Previously, if any function between mana_create_eq() and the
INIT_DELAYED_WORK call failed, mana_probe() would call mana_remove()
which unconditionally calls cancel_delayed_work_sync(gf_stats_work)
in __flush_work() or debug object warnings with
CONFIG_DEBUG_OBJECTS_WORK enabled.
Fixes: be4f1d67ec56 ("net: mana: Add standard counter rx_missed_errors")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v3,v4:
* No change.
Changes in v2:
* Apply the patch in net instead of net-next.
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index e3e4b6de6668..468ed60a8a00 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3635,6 +3635,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
INIT_WORK(&ac->link_change_work, mana_link_state_handle);
}
+ INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
+
err = mana_create_eq(ac);
if (err) {
dev_err(dev, "Failed to create EQs: %d\n", err);
@@ -3709,7 +3711,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
err = add_adev(gd, "eth");
- INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
out:
--
2.34.1
^ permalink raw reply related
* [PATCH net v4 3/5] net: mana: Guard mana_remove against double invocation
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>
If PM resume fails (e.g., mana_attach() returns an error), mana_probe()
calls mana_remove(), which tears down the device and sets
gd->gdma_context = NULL and gd->driver_data = NULL.
However, a failed resume callback does not automatically unbind the
driver. When the device is eventually unbound, mana_remove() is invoked
a second time. Without a NULL check, it dereferences gc->dev with
gc == NULL, causing a kernel panic.
Add an early return if gdma_context or driver_data is NULL so the second
invocation is harmless. Move the dev = gc->dev assignment after the
guard so it cannot dereference NULL.
Fixes: 635096a86edb ("net: mana: Support hibernation and kexec")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v4:
* Update Fixes tag to 635096a86edb
Changes in v3:
* Add this patch to the patchset
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 468ed60a8a00..ce1b7ec46a27 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3731,11 +3731,16 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
struct gdma_context *gc = gd->gdma_context;
struct mana_context *ac = gd->driver_data;
struct mana_port_context *apc;
- struct device *dev = gc->dev;
+ struct device *dev;
struct net_device *ndev;
int err;
int i;
+ if (!gc || !ac)
+ return;
+
+ dev = gc->dev;
+
disable_work_sync(&ac->link_change_work);
cancel_delayed_work_sync(&ac->gf_stats_work);
--
2.34.1
^ permalink raw reply related
* [PATCH net v4 4/5] net: mana: Don't overwrite port probe error with add_adev result
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>
In mana_probe(), if mana_probe_port() fails for any port, the error
is stored in 'err' and the loop breaks. However, the subsequent
unconditional 'err = add_adev(gd, "eth")' overwrites this error.
If add_adev() succeeds, mana_probe() returns success despite ports
being left in a partially initialized state (ac->ports[i] == NULL).
Only call add_adev() when there is no prior error, so the probe
correctly fails and triggers mana_remove() cleanup.
Fixes: a69839d4327d ("net: mana: Add support for auxiliary device")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v4:
* Update Fixes tag to a69839d4327d.
Changes in v3:
* Fix inaccurate comments.
Changes in v2:
* Apply the patch in net instead of net-next.
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ce1b7ec46a27..39b18577fb51 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3680,10 +3680,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
if (!resuming) {
for (i = 0; i < ac->num_ports; i++) {
err = mana_probe_port(ac, i, &ac->ports[i]);
- /* we log the port for which the probe failed and stop
- * probes for subsequent ports.
- * Note that we keep running ports, for which the probes
- * were successful, unless add_adev fails too
+ /* Log the port for which the probe failed, stop probing
+ * subsequent ports, and skip add_adev.
+ * mana_remove() will clean up already-probed ports.
*/
if (err) {
dev_err(dev, "Probe Failed for port %d\n", i);
@@ -3697,10 +3696,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
enable_work(&apc->queue_reset_work);
err = mana_attach(ac->ports[i]);
rtnl_unlock();
- /* we log the port for which the attach failed and stop
- * attach for subsequent ports
- * Note that we keep running ports, for which the attach
- * were successful, unless add_adev fails too
+ /* Log the port for which the attach failed, stop
+ * attaching subsequent ports, and skip add_adev.
+ * mana_remove() will clean up already-attached ports.
*/
if (err) {
dev_err(dev, "Attach Failed for port %d\n", i);
@@ -3709,7 +3707,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
}
}
- err = add_adev(gd, "eth");
+ if (!err)
+ err = add_adev(gd, "eth");
schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
--
2.34.1
^ permalink raw reply related
* [PATCH net v4 5/5] net: mana: Fix EQ leak in mana_remove on NULL port
From: Erni Sri Satya Vennela @ 2026-04-20 12:47 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260420124741.1056179-1-ernis@linux.microsoft.com>
In mana_remove(), when a NULL port is encountered in the port iteration
loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
queues allocated earlier by mana_create_eq().
This can happen when mana_probe_port() fails for port 0, leaving
ac->ports[0] as NULL. On driver unload or error cleanup, mana_remove()
hits the NULL entry and jumps past mana_destroy_eq().
Change 'goto out' to 'break' so the for-loop exits normally and
mana_destroy_eq() is always reached. Remove the now-unreferenced out:
label.
Fixes: 1e2d0824a9c3 ("net: mana: Add support for EQ sharing")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v4:
* No change
Changes in v3;
* Update Fixes tag to appropriate commit id.
Changes in v2:
* Apply the patch in net instead of net-next.
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 39b18577fb51..98e2fcc797ca 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3752,7 +3752,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
if (!ndev) {
if (i == 0)
dev_err(dev, "No net device to remove\n");
- goto out;
+ break;
}
apc = netdev_priv(ndev);
@@ -3783,7 +3783,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
}
mana_destroy_eq(ac);
-out:
+
if (ac->per_port_queue_reset_wq) {
destroy_workqueue(ac->per_port_queue_reset_wq);
ac->per_port_queue_reset_wq = NULL;
--
2.34.1
^ permalink raw reply related
* Re: [PATCH v2 1/1] net: phy: realtek: Add support for PHY LEDs on RTL8221B
From: Andrew Lunn @ 2026-04-20 12:55 UTC (permalink / raw)
To: Chukun Pan
Cc: David S . Miller, Paolo Abeni, Jakub Kicinski, Eric Dumazet,
Russell King, Daniel Golle, Heiner Kallweit, linux-kernel, netdev
In-Reply-To: <20260420100800.2435204-1-amadeus@jmu.edu.cn>
On Mon, Apr 20, 2026 at 06:08:00PM +0800, Chukun Pan wrote:
> Realtek RTL8221B Ethernet PHY supports three LED pins which are used to
> indicate link status and activity. Add netdev trigger support for them.
>
> Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
> ---
> Changes in v2:
> - Invert the LED polarity in led_brightness_set to achieve LED_ON.
Nice idea.
> +static int rtl822xb_led_hw_control_set(struct phy_device *phydev, u8 index,
> + unsigned long rules)
> +{
> + u16 val = 0;
> + bool act;
> + int ret;
> +
> + if (index >= RTL8211x_LED_COUNT)
> + return -EINVAL;
> +
> + if (test_bit(TRIGGER_NETDEV_LINK, &rules) ||
> + test_bit(TRIGGER_NETDEV_LINK_10, &rules))
> + val |= RTL822X_VND2_LCR_LINK_10;
> +
> + if (test_bit(TRIGGER_NETDEV_LINK, &rules) ||
> + test_bit(TRIGGER_NETDEV_LINK_100, &rules))
> + val |= RTL822X_VND2_LCR_LINK_100;
> +
> + if (test_bit(TRIGGER_NETDEV_LINK, &rules) ||
> + test_bit(TRIGGER_NETDEV_LINK_1000, &rules))
> + val |= RTL822X_VND2_LCR_LINK_1000;
> +
> + if (test_bit(TRIGGER_NETDEV_LINK, &rules) ||
> + test_bit(TRIGGER_NETDEV_LINK_2500, &rules))
> + val |= RTL822X_VND2_LCR_LINK_2500;
> +
> + ret = phy_write_mmd(phydev, MDIO_MMD_VEND2,
> + RTL822X_VND2_LED(index), val);
> + if (ret < 0)
> + return ret;
> +
> + act = test_bit(TRIGGER_NETDEV_RX, &rules) ||
> + test_bit(TRIGGER_NETDEV_TX, &rules);
> +
> + return phy_modify_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_LCR6,
> + RTL822X_VND2_LED_ACT(index), act ?
> + RTL822X_VND2_LED_ACT(index) : 0);
I think to be safe, you probably should set the polarity back to its
default. Or you need to add to the commit message that you have
verified that ledtrig-netdev will always turn the LED off before
offloading to hardware, so resetting the polarity.
Andrew
---
pw-bot: cr
^ permalink raw reply
* Re: [PATCH] net: rose: use pskb_may_pull() in CLEAR_REQUEST length check
From: Andrew Lunn @ 2026-04-20 13:04 UTC (permalink / raw)
To: Ashutosh Desai
Cc: netdev, linux-hams, davem, edumazet, kuba, pabeni, horms,
linux-kernel
In-Reply-To: <20260420015723.462479-1-ashutoshdesai993@gmail.com>
On Mon, Apr 20, 2026 at 01:57:23AM +0000, Ashutosh Desai wrote:
> Commit 2835750dd647 ("net: rose: reject truncated CLEAR_REQUEST frames
> in state machines") guards against short CLEAR_REQUEST frames using a
> plain skb->len comparison. Use pskb_may_pull() instead, which both
> enforces the length requirement and ensures the bytes are in the linear
> part of the skb, making the subsequent accesses to skb->data[3] and
> skb->data[4] safe for non-linear buffers.
Did you review all the other comparisons on skb->len in rose?
rose_route.c- if (frametype == ROSE_CALL_REQUEST &&
rose_route.c: (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
rose_route.c- skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
rose_route.c- ROSE_CALL_REQ_ADDR_LEN_VAL))
rose_loopback.c- if (frametype == ROSE_CALL_REQUEST &&
rose_loopback.c: (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
rose_loopback.c- skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
Do these need the same fix? Are there other places non linear buffers
should be considered?
Andrew
^ permalink raw reply
* Re: [PATCH bpf-next v4 1/6] net: move netfilter nf_reject_fill_skb_dst to core ipv4
From: Mahe Tardy @ 2026-04-20 13:04 UTC (permalink / raw)
To: bot+bpf-ci
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <c0aa7f264055f44ac4c93aa4718473e89ed13b8bb11de3179ed40d2f88a99257@mail.kernel.org>
On Mon, Apr 20, 2026 at 11:36:12AM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/include/net/route.h b/include/net/route.h
> > index f90106f383c5..ec2466fd0bec 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -173,6 +173,7 @@ struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
> > const struct sock *sk);
> > struct dst_entry *ipv4_blackhole_route(struct net *net,
> > struct dst_entry *dst_orig);
> > +int ip_route_reply_fetch_dst(struct sk_buff *skb);
> >
> > static inline struct rtable *ip_route_output_key(struct net *net, struct flowi4 *flp)
> > {
>
> [ ... ]
>
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index bc1296f0ea69..7091ef936073 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2945,6 +2945,21 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
> > }
> > EXPORT_SYMBOL_GPL(ip_route_output_flow);
> >
> > +int ip_route_reply_fetch_dst(struct sk_buff *skb)
> > +{
> > + struct rtable *rt;
> > + struct flowi4 fl4 = {
> > + .daddr = ip_hdr(skb)->saddr
> > + };
> > +
> > + rt = ip_route_output_key(dev_net(skb->dev), &fl4);
> > + if (IS_ERR(rt))
> > + return PTR_ERR(rt);
> > + skb_dst_set(skb, &rt->dst);
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(ip_route_reply_fetch_dst);
>
> Is ip_route_reply_fetch_dst() actually needed before calling icmp_send()?
>
> Martin KaFai Lau raised this question in v3 of the patch series and
> noted that icmpv6_send() does its own route lookup and doesn't use
> skb_dst() of the original skb. He also pointed out that at
> cgroup/egress, skb->_skb_refdst should already be set.
>
> Looking at icmpv6_send(), the fl6.{daddr,saddr} are filled and passed
> to icmpv6_route_lookup, which performs the reverse/invert route lookup.
> The function doesn't appear to use skb_dst() of the original skb.
>
> Since this patch moves ip_route_reply_fetch_dst() from netfilter to
> core specifically for the BPF kfunc, can you clarify why this routing
> step is necessary? Does the kfunc not work without
> ip[6]_route_reply_fetch_dst()?
>
> Reference: https://lore.kernel.org/bpf/df4b0996-3e88-4ea4-983b-82866455a6fc@linux.dev/
Yes I replied to this in the cover letter, let me quote it:
Initially, this kfunc was added only to cgroup_skb programs, Alexei
suggested not creating its own kfunc set and adding it to the more
global bpf_kfunc_set_skb. Now that recursion is handled and I realized,
thanks to Martin, that fetching the dst route might be only useful in
situation in which the packet was not yet routed, I decided to extend
the kfunc to more program types and route the packet only if needed.
So in short, if we decide to go back to cgroup_skb only, we can remove
this, if we expand to other program types like we do, it still makes
sense to do it.
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
* [PATCH 1/1] tipc: fix double-free in tipc_buf_append()
From: Lee Jones @ 2026-04-20 13:05 UTC (permalink / raw)
To: lee, Jon Maloy, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Ying Xue, netdev, tipc-discussion,
linux-kernel
The tipc_msg_validate() function can potentially reallocate the skb it
is validating, freeing the old one. In tipc_buf_append(), it was being
called with a pointer to a local variable which was a copy of the
caller's skb pointer.
If the skb was reallocated and validation subsequently failed, the error
handling path would free the original skb pointer, which had already
been freed, leading to double-free.
Fix this by passing the caller's skb pointer-pointer directly to
tipc_msg_validate(), ensuring any modification is reflected correctly.
The local skb pointer is then updated from the (possibly modified)
caller's pointer.
Fixes: d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Lee Jones <lee@kernel.org>
---
net/tipc/msg.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 76284fc538eb..9f4f612ee027 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -177,8 +177,9 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
if (fragid == LAST_FRAGMENT) {
TIPC_SKB_CB(head)->validated = 0;
- if (unlikely(!tipc_msg_validate(&head)))
+ if (unlikely(!tipc_msg_validate(headbuf)))
goto err;
+ head = *headbuf;
*buf = head;
TIPC_SKB_CB(head)->tail = NULL;
*headbuf = NULL;
--
2.54.0.rc1.513.gad8abe7a5a-goog
^ permalink raw reply related
* Re: [PATCH bpf-next v4 3/6] bpf: add bpf_icmp_send_unreach kfunc
From: Mahe Tardy @ 2026-04-20 13:07 UTC (permalink / raw)
To: bot+bpf-ci
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <c706a726c2e82dfc6162f96d7c04313301c24d1a4e36af5c421fda7a1c01b1a1@mail.kernel.org>
On Mon, Apr 20, 2026 at 11:36:15AM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
>
> [ ... ]
>
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + case htons(ETH_P_IPV6):
> > + if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > + return -EINVAL;
> > +
> > + nskb = skb_clone(skb, GFP_ATOMIC);
> > + if (!nskb)
> > + return -ENOMEM;
> > +
> > + if (!skb_dst(nskb) && ip6_route_reply_fetch_dst(nskb) < 0) {
> > + kfree_skb(nskb);
> > + return -EHOSTUNREACH;
> > + }
>
> Is the route lookup needed here? Looking at icmpv6_send() in
> net/ipv6/icmp.c, it fills fl6.{daddr,saddr} and passes them to
> icmpv6_route_lookup() which performs its own route lookup. The function
> doesn't appear to use skb_dst() from the original skb:
>
> icmpv6_send() {
> ...
> fl6.daddr = hdr->saddr;
> fl6.saddr = saddr;
> ...
> dst = icmpv6_route_lookup(net, skb, sk, &fl6);
> ...
> }
>
> The same applies to the IPv4 path with icmp_send(). Additionally, at
> cgroup/egress the skb->_skb_refdst should already be set, and the same
> should be true for cgroup/ingress for inet proto.
>
> Does the kfunc work without ip_route_reply_fetch_dst() and
> ip6_route_reply_fetch_dst()? If so, can these calls be removed?
>
> Reference: Martin KaFai Lau's comment in v3 review
> https://lore.kernel.org/bpf/df4b0996-3e88-4ea4-983b-82866455a6fc@linux.dev/
Same reply as https://lore.kernel.org/bpf/aeYkdKm7B4NQ3BDo@gmail.com/,
too bad the LLM can't access the cover letter.
>
> > +
> > + *in_progress = true;
> > + icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> > + *in_progress = false;
> > + kfree_skb(nskb);
> > + break;
> > +#endif
> > + default:
> > + return -EPROTONOSUPPORT;
> > + }
> > +
> > + return 0;
> > +}
>
> [ ... ]
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
* Re: [PATCH bpf-next v4 4/6] selftests/bpf: add icmp_send_unreach kfunc tests
From: Mahe Tardy @ 2026-04-20 13:08 UTC (permalink / raw)
To: bot+bpf-ci
Cc: alexei.starovoitov, andrii, ast, bpf, coreteam, daniel, fw,
john.fastabend, lkp, martin.lau, netdev, netfilter-devel,
oe-kbuild-all, pablo, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <4cfe307cc6b1197fc7bb3a11e79382764ac1d269a42eb4e4ffec9e991e6868e6@mail.kernel.org>
On Mon, Apr 20, 2026 at 11:36:13AM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> > new file mode 100644
> > index 000000000000..24d5e01cfe80
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_unreach_kfunc.c
> > @@ -0,0 +1,136 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <test_progs.h>
> > +#include <network_helpers.h>
> > +#include <linux/errqueue.h>
> > +#include "icmp_send_unreach.skel.h"
> > +
> > +#define TIMEOUT_MS 1000
> > +#define SRV_PORT 54321
> > +
> > +#define ICMP_DEST_UNREACH 3
> > +
> > +#define ICMP_FRAG_NEEDED 4
> > +#define NR_ICMP_UNREACH 15
>
> Jordan Rife previously suggested in v3 that these ICMP constants should be
> imported from <linux/icmp.h> rather than redefined here. Would it be cleaner
> to include the header and avoid the redefinitions?
Yes it would be cleaner, but looks like I can't do it simply, maybe
there's a solution, see the answer from the cover-letter:
to Jordan: I couldn't include <linux/icmp.h> because of redefines from
<network_helpers.h>.
>
> Reference: https://lore.kernel.org/bpf/usz5bhydsiejr37owgt3zypckzh7fa7ygmhsyaaiprsljx7iy5@ipopnr5n4ds7/
>
> [ ... ]
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24663313503
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox