Netdev List


* Re: [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Jakub Kicinski @ 2026-04-23  2:59 UTC (permalink / raw)
  To: Cosmin Ratiu
  Cc: Boris Pismenny, willemdebruijn.kernel@gmail.com,
	andrew+netdev@lunn.ch, daniel.zahka@gmail.com,
	davem@davemloft.net, leon@kernel.org,
	linux-kernel@vger.kernel.org, edumazet@google.com,
	linux-rdma@vger.kernel.org, Rahul Rameshbabu, Raed Salem,
	Dragos Tatulea, kees@kernel.org, Mark Bloch, pabeni@redhat.com,
	Tariq Toukan, Saeed Mahameed, netdev@vger.kernel.org,
	Gal Pressman
In-Reply-To: <5167f0714e3ddf750f80740bf2ab18a7bb567b16.camel@nvidia.com>

On Wed, 22 Apr 2026 15:13:04 +0000 Cosmin Ratiu wrote:
> > > Can you call mlx5e_psp_cleanup() when register fails for now?  
> > 
> > Done for the next version, currently undergoing testing.  
> 
> There's a snag: priv->psp may be accessed concurrently from
> mlx5e_get_stats() -> mlx5e_fold_sw_stats64() so we'd need to play
> tricks with RCU and that goes beyond what a net fix should be: It's a
> redesign of how priv->psp is handled in the driver. There's a risk we
> are missing things, or it becomes more intrusive than what a fix should
> be.

Questionable.

> I would like to ask you: let's please not do this redesign of priv->psp
> in a rush, and leave it for the net-next series I mentioned...
> 
> To reiterate, would you like to take patch 2?

Sure, whatever. But it has to be reposted, of course.

^ permalink raw reply

* Re: [PATCH net] tcp: make probe0 timer handle expired user timeout
From: Altan Hacigumus @ 2026-04-23  3:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neal Cardwell, Kuniyuki Iwashima, David S . Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Enke Chen
In-Reply-To: <CANn89iKwgoxM==synaLGKEP3k0jtCJ4b+Bap8ZF9k8yHJGnORg@mail.gmail.com>

On Wed, Apr 22, 2026 at 12:39 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Apr 21, 2026 at 8:31 PM Altan Hacigumus <ahacigu.linux@gmail.com> wrote:
> >
> > On Mon, Apr 20, 2026 at 9:58 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Apr 13, 2026 at 6:36 PM Altan Hacigumus <ahacigu.linux@gmail.com> wrote:
> > > >
> > > > tcp_clamp_probe0_to_user_timeout() computes remaining time in jiffies
> > > > using subtraction with an unsigned lvalue.  If elapsed probing time
> > > > already exceeds the configured TCP_USER_TIMEOUT, the subtraction
> > > > underflows and yields a large value.
> > > >
> > > > Handle this expiration case similarly to tcp_clamp_rto_to_user_timeout().
> > > >
> > > > Fixes: 344db93ae3ee ("tcp: make TCP_USER_TIMEOUT accurate for zero window probes")
> > > > Signed-off-by: Altan Hacigumus <ahacigu.linux@gmail.com>
> > > > ---
> > > >  net/ipv4/tcp_timer.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> > > > index 5a14a53a3c9e..4a43356a4e06 100644
> > > > --- a/net/ipv4/tcp_timer.c
> > > > +++ b/net/ipv4/tcp_timer.c
> > > > @@ -50,7 +50,8 @@ static u32 tcp_clamp_rto_to_user_timeout(const struct sock *sk)
> > > >  u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when)
> > > >  {
> > > >         const struct inet_connection_sock *icsk = inet_csk(sk);
> > > > -       u32 remaining, user_timeout;
> > > > +       u32 user_timeout;
> > > > +       s32 remaining;
> > > >         s32 elapsed;
> > > >
> > > >         user_timeout = READ_ONCE(icsk->icsk_user_timeout);
> > > > @@ -61,6 +62,8 @@ u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when)
> > > >         if (unlikely(elapsed < 0))
> > > >                 elapsed = 0;
> > > >         remaining = msecs_to_jiffies(user_timeout) - elapsed;
> > > > +       if (remaining <= 0)
> > > > +               return 1;
> > >
> > > I do not think this chunk is needed ?
> > > If @remaining is signed, then perhaps change the following line to:
> > >
> > > remaining = max_t(int, remaining, TCP_TIMEOUT_MIN);
> > >
> >
> > The if (remaining <= 0) return 1 handles the already-expired case and
> > mirrors the logic in tcp_clamp_rto_to_user_timeout().
>
> tcp_clamp_rto_to_user_timeout() has two conditionals.
>
> if (remaining <= 0)
>    return 1;
> return min_t(u32, icsk->icsk_rto, msecs_to_jiffies(remaining));
>
> The first one was there to avoid calling msecs_to_jiffies() with a negative number.
>
> tcp_clamp_probe0_to_user_timeout() has 3 tests already, you want to
> add a fourth one...
> I would prefer something less obscure.
> We do not care if we return 1 or 2 (TCP_TIMEOUT_MIN) jiffies
>

okay

> >
> > max_t(s32, remaining, TCP_TIMEOUT_MIN) with remaining changed to s32
> > also fixes the underflow; I used the explicit early return for symmetry
> > with the RTO helper, but I can switch that in v2 if you prefer.
> >
> > > Also, it would be great to have a  new packetdrill test.
> > >
> >
> > With packetdrill, AFAICS it would require a late ACK to be processed
> > after TCP_USER_TIMEOUT has elapsed but before tcp_probe_timer() runs the
> > abort check - i.e. a race between the RX softirq path and the timer.
> > This is not purely an event/packet ordering problem, so it is not clear
> > to me how a deterministic behavior can be simulated with packetdrill
> > without tying it to exact probe timing.
>
> I was trying to sense if the bug was serious or not, considering last
> tcp_clamp_probe0_to_user_timeout()
> statement is:
>
> return min_t(u32, remaining, when);
>
> Perhaps you should give more details in the changelog. What were the
> symptoms of this bug, for TCP_USER_TIMEOUT users.

Yes, it at least clamps to @when, but still caused connection teardowns
to be intermittently delayed beyond the user's explicit timeout sockopt.

Will send a v2 accordingly.

Thanks.

^ permalink raw reply
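
A minimal userspace sketch of the underflow discussed in the thread above,
assuming simplified inputs: the timeout is already converted to jiffies and
the elapsed time is passed in directly. The mock_* names and MOCK_TIMEOUT_MIN
are hypothetical stand-ins, not kernel symbols, and the signed variant follows
the max_t(s32, ...) shape suggested in review rather than any final v2.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for TCP_TIMEOUT_MIN, which the thread gives as 2 jiffies. */
#define MOCK_TIMEOUT_MIN 2U

/* Old shape: 'remaining' is unsigned, so an expired timeout wraps around. */
static uint32_t mock_clamp_unsigned(uint32_t timeout, int32_t elapsed,
				    uint32_t when)
{
	uint32_t remaining;

	if (elapsed < 0)
		elapsed = 0;
	remaining = timeout - (uint32_t)elapsed; /* wraps if elapsed > timeout */
	if (remaining < MOCK_TIMEOUT_MIN)
		remaining = MOCK_TIMEOUT_MIN;
	return remaining < when ? remaining : when;
}

/* Suggested shape: keep 'remaining' signed and clamp it to the minimum. */
static uint32_t mock_clamp_signed(uint32_t timeout, int32_t elapsed,
				  uint32_t when)
{
	int32_t remaining;

	assert(timeout <= INT32_MAX); /* keeps the signed subtraction in range */
	if (elapsed < 0)
		elapsed = 0;
	remaining = (int32_t)timeout - elapsed; /* negative once expired */
	if (remaining < (int32_t)MOCK_TIMEOUT_MIN)
		remaining = (int32_t)MOCK_TIMEOUT_MIN;
	return (uint32_t)remaining < when ? (uint32_t)remaining : when;
}
```

With timeout = 100, elapsed = 150 and when = 500, the unsigned variant returns
500 (the wrapped value is only bounded by @when), while the signed variant
returns the 2-jiffy minimum, so the timer fires almost immediately and the
abort check can run.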

* Re: Path forward for NFC in the kernel
From: Jakub Kicinski @ 2026-04-23  3:03 UTC (permalink / raw)
  To: David Heidelberg
  Cc: Krzysztof Kozlowski, Michael Thalmeier, Raymond Hackley,
	Michael Walle, Bongsu Jeon, Mark Greer, netdev, oe-linux-nfc
In-Reply-To: <938496c6-84c1-4d53-bb56-73bbd7b2bdd7@ixit.cz>

On Wed, 22 Apr 2026 15:11:16 +0200 David Heidelberg wrote:
> Yes, this is broadly in line with what I had in mind. To clarify the “limited 
> reviews and basic maintenance” phrasing: that was more an attempt to set 
> expectations conservatively. I’m prepared to take on the responsibilities you 
> outlined — maintaining a tree, collecting and triaging patches, and sending 
> regular pull requests.
> 
> Regarding reviews and responsiveness: I can do the 48h turnaround for initial 
> feedback on submissions (excluding weekends, and occasional travel), and I’ll 
> make sure no patch sits unattended. For more complex changes where my current 
> NFC-specific knowledge may be a limiting factor, I’ll seek input rather than let 
> things stall.

Sounds good. The discussion taking long is perfectly fine. Main concern
to us is not hearing any response for a long time. Then we have to guess
whether the maintainer is planning to respond, or AWOL.

> I’m also planning to ramp up my familiarity with the NFC stack as I go, so I 
> expect both the quality and depth of my reviews to improve over time.
> 
> If that works for you, I’ll proceed with setting up a public tree and start 
> tracking incoming patches.

Works! :)

^ permalink raw reply

* Re: [PATCH net-deletions] net: remove ax25 and amateur radio (hamradio) subsystem
From: Kuniyuki Iwashima @ 2026-04-23  3:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, Simon Horman,
	corbet, skhan, federico.vaga, carlos.bilbao, avadhut.naik, alexs,
	si.yanteng, dzm91, 2023002089, tsbogend, dsahern, jani.nikula,
	mchehab+huawei, gregkh, jirislaby, tytso, herbert, ebiggers,
	johannes.berg, geert, pablo, tglx, mashiro.chen, mingo, dqfext,
	jreuter, sdf, pkshih, enelsonmoore, mkl, toke, kees, crossd,
	jlayton, wangliang74, aha310510, takamitz, linux-doc, linux-mips
In-Reply-To: <20260422104522.GK651125@horms.kernel.org>

On Mon, Apr 20, 2026 at 07:18:23PM -0700, Jakub Kicinski wrote:
> Remove the amateur radio (AX.25, NET/ROM, ROSE) protocol implementation
> and all associated hamradio device drivers from the kernel tree.
> This set of protocols has long been a huge bug/syzbot magnet,
> and since nobody stepped up to help us deal with the influx
> of the AI-generated bug reports we need to move it out of tree
> to protect our sanity.
>
> The code is moved to an out-of-tree repo:
> https://github.com/linux-netdev/mod-orphan
> if it's cleaned up and reworked there we can accept it back.
>
> Minimal stub headers are kept for include/net/ax25.h (AX25_P_IP,
> AX25_ADDR_LEN, ax25_address) and include/net/rose.h (ROSE_ADDR_LEN)
> so that the conditional integration code in arp.c and tun.c continues
> to compile and work when the out-of-tree modules are loaded.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* [PATCH bpf 1/2] bpf, tcx: reject offloaded programs on attach
From: Jiayuan Chen @ 2026-04-23  3:36 UTC (permalink / raw)
  To: bpf
  Cc: Jiayuan Chen, Yinhao Hu, Kaiyan Mei, Dongliang Mu,
	Daniel Borkmann, Nikolay Aleksandrov, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Martin KaFai Lau, John Fastabend, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev,
	linux-kernel
In-Reply-To: <20260423033609.252464-1-jiayuan.chen@linux.dev>

An offloaded prog's bpf_func is replaced by bpf_prog_warn_on_exec(),
since it's supposed to run on the NIC, not the host. But tcx doesn't
check this and happily attaches it to the software path, so the first
packet hits the WARN.

XDP already guards this in dev_xdp_attach(); tcx just never got the
same check. Add it to both tcx_prog_attach() and tcx_link_attach().

Use bpf_prog_is_offloaded() instead of bpf_prog_is_dev_bound() so that
dev-bound-only programs (still JITed for host) keep working.

Fixes: e420bed025071 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/64d8e2b5-a214-4f3c-b9e8-bcedbcb2c602@hust.edu.cn/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 kernel/bpf/tcx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
index 02db0113b8e7c..86f4636d5a677 100644
--- a/kernel/bpf/tcx.c
+++ b/kernel/bpf/tcx.c
@@ -16,6 +16,9 @@ int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	struct net_device *dev;
 	int ret;
 
+	if (bpf_prog_is_offloaded(prog->aux))
+		return -EINVAL;
+
 	rtnl_lock();
 	dev = __dev_get_by_index(net, attr->target_ifindex);
 	if (!dev) {
@@ -315,6 +318,9 @@ int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	struct tcx_link *tcx;
 	int ret;
 
+	if (bpf_prog_is_offloaded(prog->aux))
+		return -EINVAL;
+
 	rtnl_lock();
 	dev = __dev_get_by_index(net, attr->link_create.target_ifindex);
 	if (!dev) {
-- 
2.43.0


^ permalink raw reply related
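
A minimal userspace sketch of the attach-time guard described in the patch
above. Everything here is a hypothetical model: struct mock_prog_aux and the
mock_* helpers only imitate the distinction between bpf_prog_is_offloaded()
and bpf_prog_is_dev_bound(); they are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the program's aux data. */
struct mock_prog_aux {
	bool dev_bound;		/* loaded against a specific device */
	bool offload_requested;	/* compiled to run on the NIC, not the host */
};

static bool mock_prog_is_dev_bound(const struct mock_prog_aux *aux)
{
	return aux->dev_bound;
}

static bool mock_prog_is_offloaded(const struct mock_prog_aux *aux)
{
	return aux->offload_requested;
}

/* Models the new check at the top of the attach path. */
static int mock_tcx_attach(const struct mock_prog_aux *aux)
{
	assert(aux != NULL);

	/* Offloaded programs have no host-executable body: reject early. */
	if (mock_prog_is_offloaded(aux))
		return -EINVAL;

	/* Dev-bound-only programs are still JITed for the host: allowed. */
	return 0;
}
```

Checking the offloaded flag rather than the dev-bound flag is the design point
the commit message calls out: a program that is merely dev-bound would also
satisfy mock_prog_is_dev_bound(), and rejecting on that would break it.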

* [PATCH bpf 0/2] bpf: prevent offloaded programs from running on host via tcx/netkit
From: Jiayuan Chen @ 2026-04-23  3:36 UTC (permalink / raw)
  To: bpf
  Cc: Jiayuan Chen, Daniel Borkmann, Nikolay Aleksandrov, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Martin KaFai Lau, John Fastabend, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev,
	linux-kernel

Yinhao reported a splat [1] when attaching a BPF program loaded with
prog_ifindex (targeted at an offload-capable device such as netdevsim)
to the software path via BPF_TCX_EGRESS. The program's bpf_func had
already been replaced by bpf_prog_warn_on_exec() during offload compile,
so the first packet that reaches tcx_run() trips the WARN:

[   19.592982] ------------[ cut here ]------------
[   19.594654] attempt to execute device eBPF program on the host!
[   19.594659] WARNING: kernel/bpf/offload.c:420 at 0x0, CPU#0: poc/337
[   19.599906] Modules linked in:
[   19.600680] CPU: 0 UID: 0 PID: 337 Comm: poc Not tainted 6.18.0-rc7-next-20251125 #10 PREEMPT(none)
[   19.601659] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   19.602684] RIP: 0010:bpf_prog_warn_on_exec+0xc/0x20
[   19.603241] Code: 28 00 48 89 ef e8 74 44 2f 00 eb d7 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8d 3d a4 eb 95 06 <67> 48 0f b9 3a 31 c0 e9 83 76 44 ff 0f 1f 84 00 00 00 00 00 90 90
[   19.605093] RSP: 0018:ffff8881066e73d8 EFLAGS: 00010246
[   19.605663] RAX: ffffffff81cbca70 RBX: ffff8881013c4210 RCX: 0000000000000004
[   19.606378] RDX: 1ffff11020278842 RSI: ffffc90000563060 RDI: ffffffff8861b620
[   19.607107] RBP: ffff8881010d0640 R08: ffff8881013c4210 R09: ffff8881010d06b0
[   19.607827] R10: ffff8881010d06c3 R11: ffffc90000563000 R12: ffffc90000563000
[   19.608751] R13: ffff8881010d06b4 R14: ffff888115eb1a34 R15: dffffc0000000000
[   19.609478] FS:  000000000294c380(0000) GS:ffff8881911e9000(0000) knlGS:0000000000000000
[   19.610316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.610943] CR2: 000057f6b9eb38c0 CR3: 00000001010ea000 CR4: 0000000000750ef0
[   19.611712] PKRU: 55555554
[   19.612006] Call Trace:
[   19.612281]  <TASK>
[   19.612523]  __dev_queue_xmit+0x22cb/0x3530
[   19.617607]  ip_finish_output2+0x621/0x1a60
[   19.621371]  ip_output+0x170/0x2e0
[   19.624586]  ip_send_skb+0x129/0x180
[   19.624940]  udp_send_skb+0x65d/0x1300
[   19.625316]  udp_sendmsg+0x13bf/0x2000
[   19.629960]  __sys_sendto+0x396/0x470
[   19.633720]  __x64_sys_sendto+0xdc/0x1b0
[   19.635066]  do_syscall_64+0x76/0x10a0
[   19.641701]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   19.642240] RIP: 0033:0x4240d7
[   19.642597] Code: 00 89 01 e9 c1 fe ff ff e8 f6 03 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 80 3d 8d 3f 09 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[   19.646088] RSP: 002b:00007fffcb9ecb68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[   19.648938] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004240d7
[   19.652116] RDX: 0000000000000008 RSI: 00007fffcb9ecce0 RDI: 0000000000000005
[   19.653148] RBP: 00007fffcb9eccf0 R08: 00007fffcb9ecbb0 R09: 0000000000000010
[   19.653951] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fffcb9ece08
[   19.654760] R13: 00007fffcb9ece18 R14: 00000000004b2868 R15: 0000000000000001
[   19.657462]  </TASK>
[   19.657703] ---[ end trace 0000000000000000 ]---

The reason is that tcx and netkit never checked for offloaded programs
on attach, unlike XDP which already rejects this in dev_xdp_attach().
This series adds the same guard to both so offloaded programs can't be
attached to the software path.

[1]: https://lore.kernel.org/bpf/64d8e2b5-a214-4f3c-b9e8-bcedbcb2c602@hust.edu.cn/

Jiayuan Chen (2):
  bpf, tcx: reject offloaded programs on attach
  bpf, netkit: reject offloaded programs on attach

 drivers/net/netkit.c | 6 ++++++
 kernel/bpf/tcx.c     | 6 ++++++
 2 files changed, 12 insertions(+)

-- 
2.43.0


^ permalink raw reply

* [PATCH bpf 2/2] bpf, netkit: reject offloaded programs on attach
From: Jiayuan Chen @ 2026-04-23  3:36 UTC (permalink / raw)
  To: bpf
  Cc: Jiayuan Chen, Daniel Borkmann, Nikolay Aleksandrov, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Martin KaFai Lau, John Fastabend, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev,
	linux-kernel
In-Reply-To: <20260423033609.252464-1-jiayuan.chen@linux.dev>

Same issue as the tcx fix: netkit accepts SCHED_CLS programs but never
checks if they were loaded for hardware offload. If someone loads a
program with prog_ifindex pointing to an offload-capable device and then
attaches it to a netkit peer, the bpf_func is bpf_prog_warn_on_exec()
and the first packet triggers the WARN.

Reject offloaded programs in both netkit_prog_attach() and
netkit_link_attach().

Fixes: 35dfaad7188cd ("netkit, bpf: Add bpf programmable net device")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 drivers/net/netkit.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 5c0e01396e064..c4f764034c90f 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -533,6 +533,9 @@ int netkit_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	struct net_device *dev;
 	int ret;
 
+	if (bpf_prog_is_offloaded(prog->aux))
+		return -EINVAL;
+
 	rtnl_lock();
 	dev = netkit_dev_fetch(current->nsproxy->net_ns, attr->target_ifindex,
 			       attr->attach_type);
@@ -788,6 +791,9 @@ int netkit_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	struct net_device *dev;
 	int ret;
 
+	if (bpf_prog_is_offloaded(prog->aux))
+		return -EINVAL;
+
 	rtnl_lock();
 	dev = netkit_dev_fetch(current->nsproxy->net_ns,
 			       attr->link_create.target_ifindex,
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: patchwork-bot+netdevbpf @ 2026-04-23  3:40 UTC (permalink / raw)
  To: Kohei Enju
  Cc: netdev, linux-trace-kernel, davem, edumazet, kuba, pabeni, horms,
	rostedt, mhiramat, mathieu.desnoyers
In-Reply-To: <20260420105427.162816-1-kohei@enjuk.jp>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 20 Apr 2026 10:54:23 +0000 you wrote:
> Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
> skb->napi_id shares storage with sender_cpu. RX tracepoints using
> net_dev_rx_verbose_template read skb->napi_id directly and can therefore
> report sender_cpu values as if they were NAPI IDs.
> 
> For example, on the loopback path this can report 1 as napi_id, where 1
> comes from raw_smp_processor_id() + 1 in the XPS path:
> 
> [...]

Here is the summary with links:
  - [net,v1] net: validate skb->napi_id in RX tracepoints
    https://git.kernel.org/netdev/net/c/3bfcf396081a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
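
A minimal userspace sketch of why the raw field is ambiguous, assuming a
simplified skb in which napi_id and sender_cpu overlap, and assuming NAPI IDs
start above the CPU count. The mock_* names, MOCK_NR_CPUS, and the choice to
report 0 for a non-NAPI value are illustrative assumptions; the applied patch
may validate or report differently.

```c
#include <stdbool.h>

#define MOCK_NR_CPUS 64U			/* hypothetical CPU count */
#define MOCK_MIN_NAPI_ID (MOCK_NR_CPUS + 1U)	/* first value treated as a NAPI ID */

/* Hypothetical skb fragment: both fields occupy the same storage. */
struct mock_skb {
	union {
		unsigned int napi_id;
		unsigned int sender_cpu;
	};
};

/* A value in the sender_cpu range (cpu + 1) is not a NAPI ID. */
static bool mock_napi_id_valid(unsigned int id)
{
	return id >= MOCK_MIN_NAPI_ID;
}

/* What a tracepoint could report instead of the raw field. */
static unsigned int mock_trace_napi_id(const struct mock_skb *skb)
{
	return mock_napi_id_valid(skb->napi_id) ? skb->napi_id : 0;
}
```

In this model, a loopback skb whose storage holds sender_cpu = 1 (CPU 0 plus
one) is reported as 0 rather than as NAPI ID 1.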

* Re: [PATCH net] net/packet: fix TOCTOU race on mmap'd vnet_hdr in tpacket_snd()
From: patchwork-bot+netdevbpf @ 2026-04-23  3:40 UTC (permalink / raw)
  To: Bingquan Chen
  Cc: willemdebruijn.kernel, gregkh, stephen, security, davem, kuba,
	edumazet, netdev
In-Reply-To: <20260418112006.78823-1-patzilla007@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 18 Apr 2026 19:20:06 +0800 you wrote:
> In tpacket_snd(), when PACKET_VNET_HDR is enabled, vnet_hdr points
> directly into the mmap'd TX ring buffer shared with userspace. The
> kernel validates the header via __packet_snd_vnet_parse() but then
> re-reads all fields later in virtio_net_hdr_to_skb(). A concurrent
> userspace thread can modify the vnet_hdr fields between validation
> and use, bypassing all safety checks.
> 
> [...]

Here is the summary with links:
  - [net] net/packet: fix TOCTOU race on mmap'd vnet_hdr in tpacket_snd()
    https://git.kernel.org/netdev/net/c/2c054e17d9d4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
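
A minimal userspace sketch of the general remedy for this class of
time-of-check/time-of-use race: copy the shared header into private memory
once, then validate and use only the copy. struct mock_vnet_hdr, its fields,
and the validation rule are hypothetical simplifications, and the sketch is not
taken from the applied patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced header; the real virtio_net_hdr has more fields. */
struct mock_vnet_hdr {
	uint16_t hdr_len;
	uint16_t gso_size;
};

/* Hypothetical validation rule: the header must fit inside the frame. */
static bool mock_hdr_ok(const struct mock_vnet_hdr *h, uint16_t frame_len)
{
	return h->hdr_len <= frame_len;
}

/*
 * Snapshot the header from memory another thread can still write to,
 * validate the snapshot, and hand only the snapshot to the caller.
 * Returns 0 and fills *out on success, -1 (leaving *out untouched) otherwise.
 */
static int mock_snapshot_and_validate(const void *shared, uint16_t frame_len,
				      struct mock_vnet_hdr *out)
{
	struct mock_vnet_hdr local;

	memcpy(&local, shared, sizeof(local));	/* single read of shared data */
	if (!mock_hdr_ok(&local, frame_len))
		return -1;
	*out = local;				/* later code never re-reads 'shared' */
	return 0;
}
```

Once the snapshot is taken, a concurrent write to the shared ring can no longer
change the values that passed validation.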

* Re: [PATCH] docs: maintainer-netdev: fix typo in "targeting"
From: patchwork-bot+netdevbpf @ 2026-04-23  3:40 UTC (permalink / raw)
  To: Ariful Islam Shoikot; +Cc: netdev, linux-doc, workflows, linux-kernel
In-Reply-To: <20260420114554.1026-1-islamarifulshoikat@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 20 Apr 2026 17:45:53 +0600 you wrote:
> Fix spelling mistake "targgeting" -> "targeting" in
> maintainer-netdev.rst
> 
> No functional change.
> 
> Signed-off-by: Ariful Islam Shoikot <islamarifulshoikat@gmail.com>
> 
> [...]

Here is the summary with links:
  - docs: maintainer-netdev: fix typo in "targeting"
    https://git.kernel.org/netdev/net/c/645d044d7e5c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] mctp i2c: check packet length before marking flow active
From: Jeremy Kerr @ 2026-04-23  3:47 UTC (permalink / raw)
  To: William A. Kennington III, Matt Johnston, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Wolfram Sang
  Cc: netdev, linux-kernel
In-Reply-To: <20260423001517.79219-1-william@wkennington.com>

Hi William,

> Move the mctp_i2c_get_tx_flow_state() call to after the length sanity
> check to ensure we only transition the flow state if we are actually
> going to proceed with the transmission and locking.

Good catch, thanks!

> Subject: [PATCH] mctp i2c: check packet length before marking flow active

You'll want to indicate that this is for the net tree, rather than
net-next, so something like:

   Subject: [PATCH net] net: mctp i2c: check packet length [...]

With that change:

Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>

Out of curiosity though, how did you hit the hdr_byte_count mismatch in
the first place?

Cheers,


Jeremy

^ permalink raw reply

* Re: [PATCH net] sctp: fix sockets_allocated imbalance after sk_clone()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:10 UTC (permalink / raw)
  To: Xin Long
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni, horms,
	marcelo.leitner, kuniyu
In-Reply-To: <af8d66f928dec3e9fcbee8d4a85b7d5a6b86f515.1776460180.git.lucien.xin@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 17 Apr 2026 17:09:40 -0400 you wrote:
> sk_clone() increments sockets_allocated and sets the socket refcount to 2.
> SCTP performs additional accounting in sctp_clone_sock(), so the clone-time
> increment must be undone to avoid double counting.
> 
> Note we cannot simply remove the SCTP-side increment, because the SCTP
> destroy path in sctp_destroy_sock() only decrements sockets_allocated when
> sp->ep is set, which may not be true for all failure paths in
> sctp_clone_sock().
> 
> [...]

Here is the summary with links:
  - [net] sctp: fix sockets_allocated imbalance after sk_clone()
    https://git.kernel.org/netdev/net/c/7c9b012d6367

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] seg6: fix seg6 lwtunnel output redirect for L2 reduced encap mode
From: patchwork-bot+netdevbpf @ 2026-04-23  4:10 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, anton.makarov11235,
	stefano.salsano, netdev, linux-kernel, stable
In-Reply-To: <20260418162838.31979-1-andrea.mayer@uniroma2.it>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 18 Apr 2026 18:28:38 +0200 you wrote:
> When SEG6_IPTUN_MODE_L2ENCAP_RED (L2ENCAP_RED) was introduced, the
> condition in seg6_build_state() that excludes L2 encap modes from
> setting LWTUNNEL_STATE_OUTPUT_REDIRECT was not updated to account for
> the new mode.
> As a consequence, L2ENCAP_RED routes incorrectly trigger seg6_output()
> on the output path, where the packet is silently dropped because
> skb_mac_header_was_set() fails on L3 packets.
> 
> [...]

Here is the summary with links:
  - [net] seg6: fix seg6 lwtunnel output redirect for L2 reduced encap mode
    https://git.kernel.org/netdev/net/c/ade67d5f5888

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2] net/rds: zero per-item info buffer before handing it to visitors
From: patchwork-bot+netdevbpf @ 2026-04-23  4:10 UTC (permalink / raw)
  To: Michael Bommarito
  Cc: achender, davem, edumazet, kuba, pabeni, sharath.srinivasan,
	horms, netdev, linux-rdma, rds-devel, linux-kernel, stable
In-Reply-To: <20260418141047.3398203-1-michael.bommarito@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 18 Apr 2026 10:10:47 -0400 you wrote:
> rds_for_each_conn_info() and rds_walk_conn_path_info() both hand a
> caller-allocated on-stack u64 buffer to a per-connection visitor and
> then copy the full item_len bytes back to user space via
> rds_info_copy() regardless of how much of the buffer the visitor
> actually wrote.
> 
> rds_ib_conn_info_visitor() and rds6_ib_conn_info_visitor() only
> write a subset of their output struct when the underlying
> rds_connection is not in state RDS_CONN_UP (src/dst addr, tos, sl
> and the two GIDs via explicit memsets). Several u32 fields
> (max_send_wr, max_recv_wr, max_send_sge, rdma_mr_max, rdma_mr_size,
> cache_allocs) and the 2-byte alignment hole between sl and
> cache_allocs remain as whatever stack contents preceded the visitor
> call and are then memcpy_to_user()'d out to user space.
> 
> [...]

Here is the summary with links:
  - [net,v2] net/rds: zero per-item info buffer before handing it to visitors
    https://git.kernel.org/netdev/net/c/c88eb7e8d839

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
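
A minimal userspace sketch of the pattern named in the subject line: zero the
per-item buffer before a visitor fills it, so that fields the visitor skips are
copied out as zeroes rather than stale stack contents. The struct layout, the
visitor, and the mock_* names are hypothetical; a plain memcpy() stands in for
the copy to user space.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced info struct. */
struct mock_conn_info {
	uint32_t src;
	uint32_t dst;
	uint8_t  sl;
	uint32_t max_send_wr;
	uint32_t cache_allocs;
};

typedef void (*mock_visitor)(int conn_up, struct mock_conn_info *out);

/* Writes every field only when the connection is up. */
static void mock_partial_visitor(int conn_up, struct mock_conn_info *out)
{
	out->src = 1;
	out->dst = 2;
	out->sl = 0;
	if (conn_up) {
		out->max_send_wr = 128;
		out->cache_allocs = 7;
	}
}

/* Visit one item and copy the whole struct to the caller's buffer. */
static void mock_walk_one(int conn_up, mock_visitor visit, void *user_buf)
{
	struct mock_conn_info item;

	memset(&item, 0, sizeof(item));	/* the fix: no stale stack bytes */
	visit(conn_up, &item);
	memcpy(user_buf, &item, sizeof(item));
}
```

Zeroing in the walker rather than in each visitor means every current and
future visitor is covered by a single memset().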

* Re: [PATCH net v3 0/2] bnge fixes
From: patchwork-bot+netdevbpf @ 2026-04-23  4:10 UTC (permalink / raw)
  To: Vikas Gupta
  Cc: davem, edumazet, kuba, pabeni, andrew+netdev, horms, netdev,
	linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta
In-Reply-To: <20260418023438.1597876-1-vikas.gupta@broadcom.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 18 Apr 2026 08:04:36 +0530 you wrote:
> Hi,
>  This series fixes two issues.
> 
> Patch-1:
>     Due to a wrong HWRM sequence, the driver does not get the correct
>     information regarding resources and capabilities.
>     The patch fixes the initial HWRM sequence.
> Patch-2:
>     Remove the unsupported backing store type initialization, which is
>     not supported in Thor Ultra devices.
> 
> [...]

Here is the summary with links:
  - [net,v3,1/2] bnge: fix initial HWRM sequence
    https://git.kernel.org/netdev/net/c/70d7c905a07a
  - [net,v3,2/2] bnge: remove unsupported backing store type
    https://git.kernel.org/netdev/net/c/c6b34add67a5

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 0/4] Intel Wired LAN Driver Updates 2026-04-20 (ice)
From: patchwork-bot+netdevbpf @ 2026-04-23  4:20 UTC (permalink / raw)
  To: Jacob Keller
  Cc: przemyslaw.kitszel, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, grzegorz.nitka, aleksandr.loktionov, poros,
	sunithax.d.mekala, timothy.miskell
In-Reply-To: <20260420-jk-iwl-net-2026-04-20-ptp-e825c-phy-interrupt-fixes-v1-0-bc2240f42251@intel.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 20 Apr 2026 17:51:24 -0700 you wrote:
> Since this is a set of related fixes for just the ice driver, Jake provides
> the following description for the series:
> 
> We recently ran into a nasty corner case issue with a customer operating
> E825C cards seeing some strange behavior with missing Tx timestamps. This
> series contains a few fixes found during the course of debugging.
> 
> [...]

Here is the summary with links:
  - [net,1/4] ice: fix timestamp interrupt configuration for E825C
    https://git.kernel.org/netdev/net/c/c0a575a801a2
  - [net,2/4] ice: perform PHY soft reset for E825C ports at initialization
    https://git.kernel.org/netdev/net/c/3ec46e157c7f
  - [net,3/4] ice: fix ready bitmap check for non-E822 devices
    https://git.kernel.org/netdev/net/c/359dc1d41358
  - [net,4/4] ice: fix ice_ptp_read_tx_hwtstamp_status_eth56g
    https://git.kernel.org/netdev/net/c/1f75dbc53f68

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/sched: sch_red: annotate data-races in red_dump_stats()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet
In-Reply-To: <20260421142309.3964322-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 21 Apr 2026 14:23:09 +0000 you wrote:
> red_dump_stats() only runs with RTNL held,
> reading fields that can be changed in qdisc fast path.
> 
> Add READ_ONCE()/WRITE_ONCE() annotations.
> 
> Alternative would be to acquire the qdisc spinlock, but our long-term
> goal is to make qdisc dump operations lockless as much as we can.
> 
> [...]

Here is the summary with links:
  - [net] net/sched: sch_red: annotate data-races in red_dump_stats()
    https://git.kernel.org/netdev/net/c/a8f5192809ca

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
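
A minimal userspace sketch of the annotation pattern used by this group of
qdisc patches: the fast path updates counters with a WRITE_ONCE()-style store
and the dump path reads them with a READ_ONCE()-style load, without taking a
lock. The MOCK_* macros are simplified approximations built on volatile
accesses (using the GCC/Clang __typeof__ extension), and the struct and field
names are hypothetical, not those of sch_red.

```c
/* Simplified approximations of the kernel macros. */
#define MOCK_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define MOCK_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical counters touched by the qdisc fast path. */
struct mock_qdisc_stats {
	unsigned int prob_drop;
	unsigned int forced_drop;
};

/* Fast path: a single marked store per counter update. */
static void mock_fast_path_drop(struct mock_qdisc_stats *st)
{
	MOCK_WRITE_ONCE(st->prob_drop, st->prob_drop + 1);
}

/* Dump path: lockless snapshot, one marked load per field. */
static void mock_dump_stats(const struct mock_qdisc_stats *st,
			    struct mock_qdisc_stats *out)
{
	out->prob_drop = MOCK_READ_ONCE(st->prob_drop);
	out->forced_drop = MOCK_READ_ONCE(st->forced_drop);
}
```

The annotations keep the compiler from tearing, fusing, or re-reading the
accesses, so each dumped field is a value that was actually stored. They do not
make the snapshot consistent across fields, a trade-off the commit messages
accept in exchange for lockless dumps.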

* Re: [PATCH net] net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet
In-Reply-To: <20260421143349.4052215-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 21 Apr 2026 14:33:49 +0000 you wrote:
> hhf_dump_stats() only runs with RTNL held,
> reading fields that can be changed in qdisc fast path.
> 
> Add READ_ONCE()/WRITE_ONCE() annotations.
> 
> Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> [...]

Here is the summary with links:
  - [net] net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
    https://git.kernel.org/netdev/net/c/a6edf2cd4156

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/sched: sch_pie: annotate data-races in pie_dump_stats()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet
In-Reply-To: <20260421142944.4009941-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 21 Apr 2026 14:29:44 +0000 you wrote:
> pie_dump_stats() only runs with RTNL held,
> reading fields that can be changed in qdisc fast path.
> 
> Add READ_ONCE()/WRITE_ONCE() annotations.
> 
> Alternative would be to acquire the qdisc spinlock, but our long-term
> goal is to make qdisc dump operations lockless as much as we can.
> 
> [...]

Here is the summary with links:
  - [net] net/sched: sch_pie: annotate data-races in pie_dump_stats()
    https://git.kernel.org/netdev/net/c/5154561d9b11

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet
In-Reply-To: <20260421142509.3967231-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 21 Apr 2026 14:25:09 +0000 you wrote:
> fq_codel_dump_stats() acquires the qdisc spinlock a bit too late.
> 
> Move this acquisition before we fill st.qdisc_stats with live data.
> 
> Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> [...]

Here is the summary with links:
  - [net] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
    https://git.kernel.org/netdev/net/c/bbfaa73ea687

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
From: patchwork-bot+netdevbpf @ 2026-04-23  4:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet
In-Reply-To: <20260421141655.3953721-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 21 Apr 2026 14:16:55 +0000 you wrote:
> sfb_dump_stats() only runs with RTNL held,
> reading fields that can be changed in qdisc fast path.
> 
> Add READ_ONCE()/WRITE_ONCE() annotations.
> 
> Alternative would be to acquire the qdisc spinlock, but our long-term
> goal is to make qdisc dump operations lockless as much as we can.
> 
> [...]

Here is the summary with links:
  - [net] net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
    https://git.kernel.org/netdev/net/c/1ada03fdef82

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v3] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
From: Ido Schimmel @ 2026-04-23  5:24 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: kuba, edumazet, dsahern, tom, willemdebruijn.kernel,
	justin.iurman, pabeni, netdev
In-Reply-To: <20260421202406.717885-1-daniel@iogearbox.net>

On Tue, Apr 21, 2026 at 10:24:06PM +0200, Daniel Borkmann wrote:
> Commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and
> Destination options") added net.ipv6.max_{hbh,dst}_opts_{cnt,len}
> and applied them in ip6_parse_tlv(), the generic TLV walker
> invoked from ipv6_destopt_rcv() and ipv6_parse_hopopts().
> 
> ip6_tnl_parse_tlv_enc_lim() does not go through ip6_parse_tlv();
> it has its own hand-rolled TLV scanner inside its NEXTHDR_DEST
> branch which looks for IPV6_TLV_TNL_ENCAP_LIMIT. That inner
> loop is bounded only by optlen, which can be up to 2048 bytes.
> Stuffing the Destination Options header with 2046 Pad1 (type=0)
> entries advances the scanner a single byte at a time, yielding
> ~2000 TLV iterations per extension header.
> 
> Reusing max_dst_opts_cnt to bound the TLV iterations, matching
> the semantics from 47d3d7ac656a, would require duplicating
> ip6_parse_tlv() to also validate Pad1/PadN payload. It would
> also mandate enforcing max_dst_opts_len, since otherwise an
> attacker shifts the axis to few options with a giant PadN and
> recovers the original DoS. Allowing up to 8 options before the
> tunnel encapsulation limit TLV is liberal enough; in practice
> encap limit is the first TLV. Thus, go with a hard-coded limit
> IP6_TUNNEL_MAX_DEST_TLVS (8).
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Given that you are targeting net and that the issue was always present,
I would use:

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

^ permalink raw reply
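
For readers following the thread, the capped TLV scan described in the
quoted commit message could look roughly like the userspace sketch below.
It is not the code from the patch: find_encap_limit() and the macro names
are hypothetical, and whether the cap admits the encap limit as the 8th or
the 9th option is an assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1        0   /* one-byte padding option, no length field */
#define TLV_ENCAP_LIMIT 4   /* value of IPV6_TLV_TNL_ENCAP_LIMIT */
#define MAX_DEST_TLVS   8   /* mirrors IP6_TUNNEL_MAX_DEST_TLVS (8) */

/*
 * Scan the option area of a Destination Options header (the bytes after
 * the two-byte next-header/length prefix). Returns the offset of the
 * encap limit TLV, or -1 if it is absent, malformed, or not among the
 * first MAX_DEST_TLVS options.
 */
static int find_encap_limit(const uint8_t *opts, size_t optlen)
{
	size_t i = 0;
	int seen = 0;

	while (i < optlen) {
		if (seen++ >= MAX_DEST_TLVS)
			return -1;              /* cap reached: stop scanning */

		if (opts[i] == TLV_PAD1) {      /* Pad1 advances a single byte */
			i++;
			continue;
		}
		if (i + 2 > optlen)             /* no room for type + length */
			return -1;

		size_t len = opts[i + 1];
		if (i + 2 + len > optlen)       /* payload overruns the header */
			return -1;

		if (opts[i] == TLV_ENCAP_LIMIT && len == 1)
			return (int)i;

		i += 2 + len;
	}
	return -1;
}
```

Because the loop counts options rather than bytes, a header stuffed with
Pad1 entries is abandoned after MAX_DEST_TLVS iterations instead of being
walked one byte at a time up to optlen.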

* [PATCH v1 net] ipmr: Free mr_table after RCU grace period.
From: Kuniyuki Iwashima @ 2026-04-23  5:34 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

With CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup()
does not check if net->ipv4.mrt is NULL.

Since default_device_exit_batch() is called after ->exit_rtnl(),
a device could receive IGMP packets and access net->ipv4.mrt
during/after ipmr_rules_exit_rtnl().

If ipmr_rules_exit_rtnl() had already cleared it and freed the
memory, the access would trigger null-ptr-deref or use-after-free.

Let's fix it by using RCU helpers and freeing mrt after an RCU
grace period.

In addition, check_net(net) is added to mroute_clean_tables()
and ipmr_cache_unresolved() to synchronise via mfc_unres_lock.
This prevents ipmr_cache_unresolved() from putting skb into
c->_c.mfc_un.unres.unresolved after mroute_clean_tables()
purges it.

For the same reason, timer_shutdown_sync() is moved after
mroute_clean_tables().

Since rhltable_destroy() takes a mutex internally, rcu_work is
used, and it is placed as the first member because rcu_head
must be placed within <4K offset.  mr_table is already 3864
bytes without rcu_work.

Note that IP6MR is not yet converted to ->exit_rtnl(), so this
change is not needed for now but will be.

Fixes: b22b01867406 ("ipmr: Convert ipmr_net_exit_batch() to ->exit_rtnl().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 include/linux/mroute_base.h |   3 +
 net/ipv4/ipmr.c             | 108 +++++++++++++++++++-----------------
 net/ipv4/ipmr_base.c        |  16 ++++++
 3 files changed, 77 insertions(+), 50 deletions(-)

diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index cf3374580f74..5d75cc5b057e 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -226,6 +226,7 @@ struct mr_table_ops {
 
 /**
  * struct mr_table - a multicast routing table
+ * @work: used for table destruction
  * @list: entry within a list of multicast routing tables
  * @net: net where this table belongs
  * @ops: protocol specific operations
@@ -243,6 +244,7 @@ struct mr_table_ops {
  * @mroute_reg_vif_num: PIM-device vif index
  */
 struct mr_table {
+	struct rcu_work		work;
 	struct list_head	list;
 	possible_net_t		net;
 	struct mr_table_ops	ops;
@@ -274,6 +276,7 @@ void vif_device_init(struct vif_device *v,
 		     unsigned short flags,
 		     unsigned short get_iflink_mask);
 
+void mr_table_free(struct mr_table *mrt);
 struct mr_table *
 mr_table_alloc(struct net *net, u32 id,
 	       struct mr_table_ops *ops,
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8a08d09b4c30..2058ca860294 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -151,16 +151,6 @@ static struct mr_table *__ipmr_get_table(struct net *net, u32 id)
 	return NULL;
 }
 
-static struct mr_table *ipmr_get_table(struct net *net, u32 id)
-{
-	struct mr_table *mrt;
-
-	rcu_read_lock();
-	mrt = __ipmr_get_table(net, id);
-	rcu_read_unlock();
-	return mrt;
-}
-
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
@@ -293,7 +283,7 @@ static void __net_exit ipmr_rules_exit_rtnl(struct net *net,
 	struct mr_table *mrt, *next;
 
 	list_for_each_entry_safe(mrt, next, &net->ipv4.mr_tables, list) {
-		list_del(&mrt->list);
+		list_del_rcu(&mrt->list);
 		ipmr_free_table(mrt, dev_kill_list);
 	}
 }
@@ -315,28 +305,30 @@ bool ipmr_rule_default(const struct fib_rule *rule)
 }
 EXPORT_SYMBOL(ipmr_rule_default);
 #else
-#define ipmr_for_each_table(mrt, net) \
-	for (mrt = net->ipv4.mrt; mrt; mrt = NULL)
-
 static struct mr_table *ipmr_mr_table_iter(struct net *net,
 					   struct mr_table *mrt)
 {
 	if (!mrt)
-		return net->ipv4.mrt;
+		return rcu_dereference(net->ipv4.mrt);
 	return NULL;
 }
 
-static struct mr_table *ipmr_get_table(struct net *net, u32 id)
+static struct mr_table *__ipmr_get_table(struct net *net, u32 id)
 {
-	return net->ipv4.mrt;
+	return rcu_dereference_check(net->ipv4.mrt,
+				     lockdep_rtnl_is_held() ||
+				     !rcu_access_pointer(net->ipv4.mrt));
 }
 
-#define __ipmr_get_table ipmr_get_table
+#define ipmr_for_each_table(mrt, net)				\
+	for (mrt = __ipmr_get_table(net, 0); mrt; mrt = NULL)
 
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
-	*mrt = net->ipv4.mrt;
+	*mrt = rcu_dereference(net->ipv4.mrt);
+	if (!*mrt)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -347,7 +339,8 @@ static int __net_init ipmr_rules_init(struct net *net)
 	mrt = ipmr_new_table(net, RT_TABLE_DEFAULT);
 	if (IS_ERR(mrt))
 		return PTR_ERR(mrt);
-	net->ipv4.mrt = mrt;
+
+	rcu_assign_pointer(net->ipv4.mrt, mrt);
 	return 0;
 }
 
@@ -358,9 +351,10 @@ static void __net_exit ipmr_rules_exit(struct net *net)
 static void __net_exit ipmr_rules_exit_rtnl(struct net *net,
 					    struct list_head *dev_kill_list)
 {
-	ipmr_free_table(net->ipv4.mrt, dev_kill_list);
+	struct mr_table *mrt = rcu_dereference_protected(net->ipv4.mrt, 1);
 
-	net->ipv4.mrt = NULL;
+	RCU_INIT_POINTER(net->ipv4.mrt, NULL);
+	ipmr_free_table(mrt, dev_kill_list);
 }
 
 static int ipmr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -381,6 +375,17 @@ bool ipmr_rule_default(const struct fib_rule *rule)
 EXPORT_SYMBOL(ipmr_rule_default);
 #endif
 
+static struct mr_table *ipmr_get_table(struct net *net, u32 id)
+{
+	struct mr_table *mrt;
+
+	rcu_read_lock();
+	mrt = __ipmr_get_table(net, id);
+	rcu_read_unlock();
+
+	return mrt;
+}
+
 static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
 				const void *ptr)
 {
@@ -441,12 +446,11 @@ static void ipmr_free_table(struct mr_table *mrt, struct list_head *dev_kill_lis
 
 	WARN_ON_ONCE(!mr_can_free_table(net));
 
-	timer_shutdown_sync(&mrt->ipmr_expire_timer);
 	mroute_clean_tables(mrt, MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC |
 			    MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC,
 			    &ipmr_dev_kill_list);
-	rhltable_destroy(&mrt->mfc_hash);
-	kfree(mrt);
+	timer_shutdown_sync(&mrt->ipmr_expire_timer);
+	mr_table_free(mrt);
 
 	WARN_ON_ONCE(!net_initialized(net) && !list_empty(&ipmr_dev_kill_list));
 	list_splice(&ipmr_dev_kill_list, dev_kill_list);
@@ -1135,12 +1139,19 @@ static int ipmr_cache_report(const struct mr_table *mrt,
 static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
+	struct net *net = read_pnet(&mrt->net);
 	const struct iphdr *iph = ip_hdr(skb);
-	struct mfc_cache *c;
+	struct mfc_cache *c = NULL;
 	bool found = false;
 	int err;
 
 	spin_lock_bh(&mfc_unres_lock);
+
+	if (!check_net(net)) {
+		err = -EINVAL;
+		goto err;
+	}
+
 	list_for_each_entry(c, &mrt->mfc_unres_queue, _c.list) {
 		if (c->mfc_mcastgrp == iph->daddr &&
 		    c->mfc_origin == iph->saddr) {
@@ -1153,10 +1164,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 		/* Create a new entry if allowable */
 		c = ipmr_cache_alloc_unres();
 		if (!c) {
-			spin_unlock_bh(&mfc_unres_lock);
-
-			kfree_skb(skb);
-			return -ENOBUFS;
+			err = -ENOBUFS;
+			goto err;
 		}
 
 		/* Fill in the new cache entry */
@@ -1166,17 +1175,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 		/* Reflect first query at mrouted. */
 		err = ipmr_cache_report(mrt, skb, vifi, IGMPMSG_NOCACHE);
-
-		if (err < 0) {
-			/* If the report failed throw the cache entry
-			   out - Brad Parker
-			 */
-			spin_unlock_bh(&mfc_unres_lock);
-
-			ipmr_cache_free(c);
-			kfree_skb(skb);
-			return err;
-		}
+		if (err < 0)
+			goto err;
 
 		atomic_inc(&mrt->cache_resolve_queue_len);
 		list_add(&c->_c.list, &mrt->mfc_unres_queue);
@@ -1189,18 +1189,26 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 	/* See if we can append the packet */
 	if (c->_c.mfc_un.unres.unresolved.qlen > 3) {
-		kfree_skb(skb);
+		c = NULL;
 		err = -ENOBUFS;
-	} else {
-		if (dev) {
-			skb->dev = dev;
-			skb->skb_iif = dev->ifindex;
-		}
-		skb_queue_tail(&c->_c.mfc_un.unres.unresolved, skb);
-		err = 0;
+		goto err;
+	}
+
+	if (dev) {
+		skb->dev = dev;
+		skb->skb_iif = dev->ifindex;
 	}
 
+	skb_queue_tail(&c->_c.mfc_un.unres.unresolved, skb);
+
 	spin_unlock_bh(&mfc_unres_lock);
+	return 0;
+
+err:
+	spin_unlock_bh(&mfc_unres_lock);
+	if (c)
+		ipmr_cache_free(c);
+	kfree_skb(skb);
 	return err;
 }
 
@@ -1346,7 +1354,7 @@ static void mroute_clean_tables(struct mr_table *mrt, int flags,
 	}
 
 	if (flags & MRT_FLUSH_MFC) {
-		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+		if (atomic_read(&mrt->cache_resolve_queue_len) != 0 || !check_net(net)) {
 			spin_lock_bh(&mfc_unres_lock);
 			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
 				list_del(&c->list);
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 37a3c144276c..3930d612c3de 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -28,6 +28,20 @@ void vif_device_init(struct vif_device *v,
 		v->link = dev->ifindex;
 }
 
+static void __mr_free_table(struct work_struct *work)
+{
+	struct mr_table *mrt = container_of(to_rcu_work(work),
+					    struct mr_table, work);
+
+	rhltable_destroy(&mrt->mfc_hash);
+	kfree(mrt);
+}
+
+void mr_table_free(struct mr_table *mrt)
+{
+	queue_rcu_work(system_unbound_wq, &mrt->work);
+}
+
 struct mr_table *
 mr_table_alloc(struct net *net, u32 id,
 	       struct mr_table_ops *ops,
@@ -50,6 +64,8 @@ mr_table_alloc(struct net *net, u32 id,
 		kfree(mrt);
 		return ERR_PTR(err);
 	}
+
+	INIT_RCU_WORK(&mrt->work, __mr_free_table);
 	INIT_LIST_HEAD(&mrt->mfc_cache_list);
 	INIT_LIST_HEAD(&mrt->mfc_unres_queue);
 
-- 
2.54.0.rc2.533.g4f5dca5207-goog


^ permalink raw reply related
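
The layout argument in the commit message above (deferred-free member
first, owning object recovered via container_of()) can be illustrated with
a small userspace sketch. Everything here is a stand-in: demo_work,
demo_table and the helpers are hypothetical names, no real grace period or
workqueue is involved, and the 4096-byte bound is simply the figure quoted
in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_work (callback storage only). */
struct demo_work {
	void (*fn)(struct demo_work *work);
};

/* Hypothetical table: a large object with the work item placed first. */
struct demo_table {
	struct demo_work work;
	char payload[3864];     /* size quoted in the commit message */
	int freed;
};

/* The 4K limit cited in the commit message, checked at compile time. */
_Static_assert(offsetof(struct demo_table, work) < 4096,
	       "deferred-free member must sit within the first 4K");

/* Deferred callback: map the embedded work item back to its table. */
static void demo_table_free_cb(struct demo_work *work)
{
	struct demo_table *t = container_of(work, struct demo_table, work);

	t->freed = 1;   /* the kernel version destroys the hash and frees mrt */
}

/* Run a queued work item, as a workqueue would after the grace period. */
static void demo_run_work(struct demo_work *work)
{
	work->fn(work);
}
```

Placing the member first keeps its offset at zero regardless of how large
the rest of the structure grows, so the compile-time check cannot start
failing when fields are added later.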

* Re: [PATCH net 1/1] net: rds: fix MR cleanup on copy error
From: Allison Henderson @ 2026-04-23  5:39 UTC (permalink / raw)
  To: Ren Wei, netdev, linux-rdma, rds-devel
  Cc: davem, edumazet, kuba, pabeni, horms, leon, santosh.shilimkar,
	jhubbard, yuantan098, yifanwucs, tomapufckgml, bird, draw51280
In-Reply-To: <79c8ef73ec8e5844d71038983940cc2943099baf.1776764247.git.draw51280@163.com>

On Wed, 2026-04-22 at 22:52 +0800, Ren Wei wrote:
> From: Ao Zhou <draw51280@163.com>
> 
> __rds_rdma_map() hands sg/pages ownership to the transport after
> get_mr() succeeds. If copying the generated cookie back to user space
> fails after that point, the error path must not free those resources
> again before dropping the MR reference.
> 
> Remove the duplicate unpin/free from the put_user() failure branch so
> that MR teardown is handled only through the existing final cleanup
> path.
> 
> Fixes: 0d4597c8c5ab ("net/rds: Track user mapped pages through special API")
> Cc: stable@kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Ao Zhou <draw51280@163.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>

Hi Ao,

This fix looks good to me.  Since this is a bug fix, this patch should be cc'd to stable@vger.kernel.org.  Also be sure
to note the target tree and component in the subject line like this:  

[PATCH net v2 1/1] net/net: rds: fix MR cleanup on copy error

Other than that, the patch looks good to me.  Thanks Ao.

Reviewed-by: Allison Henderson <achender@kernel.org>

Allison

> ---
>  net/rds/rdma.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index aa6465dc742c..61fb6e45281b 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -326,10 +326,6 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
>  
>  	if (args->cookie_addr &&
>  	    put_user(cookie, (u64 __user *)(unsigned long)args->cookie_addr)) {
> -		if (!need_odp) {
> -			unpin_user_pages(pages, nr_pages);
> -			kfree(sg);
> -		}
>  		ret = -EFAULT;
>  		goto out;
>  	}


^ permalink raw reply
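
The ownership rule behind this fix can be sketched in userspace C: once
the MR owns the sg list, an error path should only drop its reference and
let the final put release everything. The names below (demo_mr,
demo_mr_put, demo_map) are hypothetical, and the sketch simplifies by
dropping the only reference on both the success and the failure path.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_sg_frees;       /* counts how often the sg list is released */

/* Hypothetical MR that owns its scatterlist once registration succeeds. */
struct demo_mr {
	int refcount;
	void *sg;
};

static void demo_mr_put(struct demo_mr *mr)
{
	if (--mr->refcount == 0) {
		free(mr->sg);           /* the only place sg is released */
		demo_sg_frees++;
		free(mr);
	}
}

/* Returns 0 on success, -1 on failure; copy_fails simulates put_user() failing. */
static int demo_map(int copy_fails)
{
	struct demo_mr *mr = malloc(sizeof(*mr));
	void *sg = malloc(64);
	int ret = 0;

	if (!mr || !sg) {
		free(mr);
		free(sg);
		return -1;
	}

	mr->refcount = 1;
	mr->sg = sg;                    /* ownership moves to the MR here */

	if (copy_fails)
		ret = -1;               /* no free(sg) here: the MR still owns it */

	demo_mr_put(mr);                /* final cleanup path releases everything */
	return ret;
}
```

Adding a free(sg) to the copy_fails branch would reproduce the double
release that the patch removes.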

* Re: [PATCH] smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/
From: Christoph Hellwig @ 2026-04-23  5:52 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: Christoph Hellwig, linux-cifs, linux-rdma, netdev,
	samba-technical, Tom Talpey, Steve French, Linus Torvalds,
	Namjae Jeon, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
	ceph-devel, Jeff Layton, linux-nfs
In-Reply-To: <9cb0901c-18c5-4858-941c-3b37ee112af9@samba.org>

On Wed, Apr 22, 2026 at 10:16:41AM +0200, Stefan Metzmacher wrote:
> > Why is this not in net/smbdirect/ or drivers/infiniband/ulp/smbdirect?
> 
> Yes, I also thought about net/smbdirect.
> 
> As IPPROTO_SMBDIRECT or PF_SMBDIRECT will be the next step,
> see the open discussion here:
> https://lore.kernel.org/linux-cifs/cover.1775571957.git.metze@samba.org/
> (I'll follow with that discussion soon)

Seems like it is the right fit then.

> I was just unsure about the consequences, e.g. would
> the maintainer/pull request flow have to change in that case?
> Or would Steve be able to take the changes via his trees?
> And I also didn't want to offend anybody, so I just took
> what Linus proposed.

You might want to ask the sunrpc or ceph maintainers as they have a
similar split.


^ permalink raw reply
