* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Jakub Kicinski @ 2026-06-16 15:49 UTC
To: Carsten Strotmann
Cc: John Paul Adrian Glaubitz, davem, netdev, edumazet, pabeni,
andrew+netdev, horms, geert, chleroy, npiggin, mpe, maddy,
linux-mips, linux-m68k, linuxppc-dev
In-Reply-To: <A3590144-073C-46D6-8425-90EE0C4D48E8@strotmann.de>
On Tue, 16 Jun 2026 09:13:46 +0200 Carsten Strotmann wrote:
> I'm a user of AppleTalk and other "Retro"-Features in the Linux Kernel.
>
> On 16 Jun 2026, at 2:55, Jakub Kicinski wrote:
>
> > We can complain about the AI slop till the cows come home.
> > I don't like it, you don't like it. What difference does it make?
> >
> > If y'all have real solutions please share. Complaining about
> > "commercial interests" and "nuk[ing] everything in a panic reaction"
> > is not helpful.
>
> the solution, as Adrian pointed out, is to leave these features in
> the Linux kernel but have them disabled by default.
I think y'all need to internalize that "just leave it in" means work.
_Someone_ has to handle the reports and patches. And since nobody is
doing that the code is going to GitHub, where it can continue to "just
be left" or whatever, without racking up CVEs for the Linux kernel
and leading to maintainer burnout :/
> Maybe put a warning message in the kernel config tools that people
> should only enable these if they know what they are doing.
>
> These "retro"-features should not pose any security risk if they are
> not compiled into a kernel.
Nobody is stopping you from using this code! It's perfectly suitable
to be an out of tree module. Maybe it'd be harder if someone wanted to
remove a CPU architecture you want to use, but protocols are perfectly
fine as loadable modules. You can continue to use the code from:
https://github.com/linux-netdev/mod-orphan
Presumably you could get Debian to package that and you wouldn't even
know the sources no longer live in the kernel tree.
* [PATCH net-next 0/3] selftests/xsk: stabilize timeout test behavior
From: Tushar Vyavahare @ 2026-06-16 15:49 UTC
To: netdev, magnus.karlsson, maciej.fijalkowski, stfomichev,
kernelxing, davem, kuba, pabeni, ast, daniel, tirthendu.sarkar,
tushar.vyavahare
Cc: bpf
This series improves AF_XDP selftests by making timeout handling
explicit and fixing sources of non-determinism in xsk timeout tests.
Patch 1 introduces test_spec::poll_tmout and removes implicit
dependence on RX UMEM setup state for timeout behavior.
Patch 2 fixes thread harness sequencing by attaching XDP programs
before worker startup, removing signal-based termination, and using
barrier synchronization only for dual-thread runs.
Patch 3 restores shared_umem after POLL_TXQ_FULL so test-local
configuration does not leak into subsequent cases on shared-netdev
runs.
Together these changes make timeout handling easier to follow and
improve selftest stability, especially on real NIC runs.
Tushar Vyavahare (3):
selftests/xsk: make poll timeout mode explicit
selftests/xsk: fix timeout thread harness sequencing
selftests/xsk: restore shared_umem after POLL_TXQ_FULL
.../selftests/bpf/prog_tests/test_xsk.c | 96 +++++++++++--------
.../selftests/bpf/prog_tests/test_xsk.h | 2 +
2 files changed, 56 insertions(+), 42 deletions(-)
--
2.43.0
* [PATCH net-next 1/3] selftests/xsk: make poll timeout mode explicit
From: Tushar Vyavahare @ 2026-06-16 15:49 UTC
To: netdev, magnus.karlsson, maciej.fijalkowski, stfomichev,
kernelxing, davem, kuba, pabeni, ast, daniel, tirthendu.sarkar,
tushar.vyavahare
Cc: bpf
In-Reply-To: <20260616154955.1492560-1-tushar.vyavahare@intel.com>
Stop inferring timeout behavior from RX UMEM initialization state.
That ties timeout semantics to setup internals and obscures intent.
Use test_spec::poll_tmout as the explicit timeout-mode selector in
TX and RX paths.
In RX, treat poll timeout as expected only in timeout mode.
In TX, let send_pkts() own loop completion in non-timeout mode
and use __send_pkts() only for progress and timeout detection.
This makes timeout logic explicit and keeps control flow predictable.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
---
.../selftests/bpf/prog_tests/test_xsk.c | 44 +++++++++----------
.../selftests/bpf/prog_tests/test_xsk.h | 1 +
2 files changed, 21 insertions(+), 24 deletions(-)
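
Not part of the patch: a minimal, self-contained sketch of the idea in the
commit message above, where an explicit flag (rather than setup state) decides
whether a poll() timeout counts as a pass. All names here (sketch_spec,
sketch_poll_once, sketch_run_on_idle_pipe) are hypothetical stand-ins for
test_spec::poll_tmout and the selftest's poll handling; the real code paths
are more involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

enum { SKETCH_PASS = 0, SKETCH_FAILURE = 1 };

/* Hypothetical stand-in for the test_spec::poll_tmout selector. */
struct sketch_spec {
	bool poll_tmout;
};

/* Poll one descriptor; a timeout (poll() == 0) passes only in timeout mode. */
static int sketch_poll_once(const struct sketch_spec *spec, int fd, int tmout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, tmout_ms);

	if (ret < 0)
		return SKETCH_FAILURE;
	if (ret == 0)
		return spec->poll_tmout ? SKETCH_PASS : SKETCH_FAILURE;
	return SKETCH_PASS;
}

/* Run the sketch against an idle pipe, whose read end never becomes readable. */
static int sketch_run_on_idle_pipe(bool poll_tmout)
{
	struct sketch_spec spec = { .poll_tmout = poll_tmout };
	int fds[2], res;

	if (pipe(fds))
		return SKETCH_FAILURE;
	res = sketch_poll_once(&spec, fds[0], 10);
	close(fds[0]);
	close(fds[1]);
	return res;
}
```

The same timeout produces opposite verdicts depending only on the flag, which
is the property the patch relies on: sketch_run_on_idle_pipe(true) reports a
pass, sketch_run_on_idle_pipe(false) reports a failure.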
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..ca47a16ceb1a 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -65,11 +65,6 @@ static void gen_eth_hdr(struct xsk_socket_info *xsk, struct ethhdr *eth_hdr)
eth_hdr->h_proto = htons(ETH_P_LOOPBACK);
}
-static bool is_umem_valid(struct xsk_socket_info *xsk)
-{
- return !!xsk->umem->umem;
-}
-
static u32 mode_to_xdp_flags(enum test_mode mode)
{
return (mode == TEST_MODE_SKB) ? XDP_FLAGS_SKB_MODE : XDP_FLAGS_DRV_MODE;
@@ -1010,7 +1005,7 @@ static int __receive_pkts(struct test_spec *test, struct xsk_socket_info *xsk)
return TEST_FAILURE;
if (!ret) {
- if (!is_umem_valid(test->ifobj_tx->xsk))
+ if (test->poll_tmout)
return TEST_PASS;
ksft_print_msg("ERROR: [%s] Poll timed out\n", __func__);
@@ -1149,7 +1144,7 @@ static int receive_pkts(struct test_spec *test)
break;
res = __receive_pkts(test, xsk);
- if (!(res == TEST_PASS || res == TEST_CONTINUE))
+ if (res != TEST_CONTINUE)
return res;
ret = gettimeofday(&tv_now, NULL);
@@ -1166,7 +1161,8 @@ static int receive_pkts(struct test_spec *test)
return TEST_PASS;
}
-static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
+static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk,
+ bool test_timeout)
{
u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
struct pkt_stream *pkt_stream = xsk->pkt_stream;
@@ -1178,7 +1174,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
buffer_len = pkt_get_buffer_len(umem, pkt_stream->max_pkt_len);
/* pkts_in_flight might be negative if many invalid packets are sent */
if (pkts_in_flight >= (int)((umem_size(umem) - xsk->batch_size * buffer_len) /
- buffer_len)) {
+ buffer_len) && !test_timeout) {
ret = kick_tx(xsk);
if (ret)
return TEST_FAILURE;
@@ -1191,7 +1187,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
while (xsk_ring_prod__reserve(&xsk->tx, xsk->batch_size, &idx) < xsk->batch_size) {
if (use_poll) {
ret = poll(&fds, 1, POLL_TMOUT);
- if (timeout) {
+ if (test_timeout) {
if (ret < 0) {
ksft_print_msg("ERROR: [%s] Poll error %d\n",
__func__, errno);
@@ -1271,7 +1267,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
if (use_poll) {
ret = poll(&fds, 1, POLL_TMOUT);
if (ret <= 0) {
- if (ret == 0 && timeout)
+ if (ret == 0 && test_timeout)
return TEST_PASS;
ksft_print_msg("ERROR: [%s] Poll error %d\n", __func__, ret);
@@ -1279,14 +1275,14 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
}
}
- if (!timeout) {
+ if (!test_timeout) {
if (complete_pkts(xsk, i))
return TEST_FAILURE;
usleep(10);
- return TEST_PASS;
}
+ /* Loop completion is driven by send_pkts() stream progress checks. */
return TEST_CONTINUE;
}
@@ -1322,7 +1318,6 @@ bool all_packets_sent(struct test_spec *test, unsigned long *bitmap)
static int send_pkts(struct test_spec *test, struct ifobject *ifobject)
{
- bool timeout = !is_umem_valid(test->ifobj_rx->xsk);
DECLARE_BITMAP(bitmap, test->nb_sockets);
u32 i, ret;
@@ -1337,19 +1332,18 @@ static int send_pkts(struct test_spec *test, struct ifobject *ifobject)
__set_bit(i, bitmap);
continue;
}
- ret = __send_pkts(ifobject, &ifobject->xsk_arr[i], timeout);
- if (ret == TEST_CONTINUE && !test->fail)
- continue;
-
- if ((ret || test->fail) && !timeout)
- return TEST_FAILURE;
-
- if (ret == TEST_PASS && timeout)
+ ret = __send_pkts(ifobject, &ifobject->xsk_arr[i], test->poll_tmout);
+ if (ret != TEST_CONTINUE)
return ret;
- ret = wait_for_tx_completion(&ifobject->xsk_arr[i]);
- if (ret)
+ if (test->fail)
return TEST_FAILURE;
+
+ if (!test->poll_tmout) {
+ ret = wait_for_tx_completion(&ifobject->xsk_arr[i]);
+ if (ret)
+ return TEST_FAILURE;
+ }
}
}
@@ -2231,6 +2225,7 @@ int testapp_xdp_shared_umem(struct test_spec *test)
int testapp_poll_txq_tmout(struct test_spec *test)
{
+ test->poll_tmout = true;
test->ifobj_tx->use_poll = true;
/* create invalid frame by set umem frame_size and pkt length equal to 2048 */
test->ifobj_tx->xsk->umem->frame_size = 2048;
@@ -2241,6 +2236,7 @@ int testapp_poll_txq_tmout(struct test_spec *test)
int testapp_poll_rxq_tmout(struct test_spec *test)
{
+ test->poll_tmout = true;
test->ifobj_rx->use_poll = true;
return testapp_validate_traffic_single_thread(test, test->ifobj_rx);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.h b/tools/testing/selftests/bpf/prog_tests/test_xsk.h
index 4313d0d87235..20eaaa254998 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.h
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.h
@@ -207,6 +207,7 @@ struct test_spec {
bool set_ring;
bool adjust_tail;
bool adjust_tail_support;
+ bool poll_tmout;
enum test_mode mode;
char name[MAX_TEST_NAME_SIZE];
};
--
2.43.0
* [PATCH net-next 2/3] selftests/xsk: fix timeout thread harness sequencing
From: Tushar Vyavahare @ 2026-06-16 15:49 UTC
To: netdev, magnus.karlsson, maciej.fijalkowski, stfomichev,
kernelxing, davem, kuba, pabeni, ast, daniel, tirthendu.sarkar,
tushar.vyavahare
Cc: bpf
In-Reply-To: <20260616154955.1492560-1-tushar.vyavahare@intel.com>
Prevent workers from running before XDP program attachment completes.
The previous ordering allowed races between worker startup and setup.
Attach XDP programs before entering traffic validation.
Remove SIGUSR1-based worker termination and use pthread_join() for
thread shutdown so blocking syscalls are not interrupted.
Use barriers only for dual-thread runs so participants match and
teardown ordering stays deterministic.
This removes setup/startup races and stabilizes harness sequencing.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
---
.../selftests/bpf/prog_tests/test_xsk.c | 33 ++++++++++---------
.../selftests/bpf/prog_tests/test_xsk.h | 1 +
2 files changed, 18 insertions(+), 16 deletions(-)
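
Not part of the patch: a rough sketch of the sequencing described in the commit
message above, assuming a simplified harness. The barrier is initialized and
waited on only when two participants exist, and workers are reaped with
pthread_join() instead of being signalled. The names (sketch_harness,
sketch_rx_worker, sketch_tx_worker, sketch_run) are hypothetical. Unlike the
patch, this sketch destroys the barrier only after both joins, to keep the
example simple.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical harness state; use_barrier mirrors the flag added by the patch. */
struct sketch_harness {
	pthread_barrier_t barr;
	bool use_barrier;
	int rx_done;
	int tx_done;
};

static void *sketch_rx_worker(void *arg)
{
	struct sketch_harness *h = arg;

	/* Wait only when a second participant is known to arrive. */
	if (h->use_barrier)
		pthread_barrier_wait(&h->barr);
	h->rx_done = 1;
	return NULL;
}

static void *sketch_tx_worker(void *arg)
{
	struct sketch_harness *h = arg;

	h->tx_done = 1;
	return NULL;
}

/* Returns 0 on success; dual selects the two-thread variant. */
static int sketch_run(struct sketch_harness *h, bool dual)
{
	pthread_t t0, t1;
	int err = 0;

	h->rx_done = 0;
	h->tx_done = 0;
	h->use_barrier = dual;

	if (h->use_barrier && pthread_barrier_init(&h->barr, NULL, 2))
		return -1;

	if (pthread_create(&t0, NULL, sketch_rx_worker, h)) {
		if (h->use_barrier)
			pthread_barrier_destroy(&h->barr);
		return -1;
	}

	if (h->use_barrier) {
		/* The calling thread is the second barrier participant. */
		pthread_barrier_wait(&h->barr);

		if (pthread_create(&t1, NULL, sketch_tx_worker, h))
			err = -1;
		else
			pthread_join(t1, NULL);
	}

	/* Plain join: the worker returns on its own, no signal is sent. */
	pthread_join(t0, NULL);

	if (h->use_barrier)
		pthread_barrier_destroy(&h->barr);
	return err;
}
```

In the single-thread case nobody ever waits on the barrier, so the lone worker
cannot block on a participant that never shows up.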
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index ca47a16ceb1a..d4702d2aac5e 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -7,7 +7,6 @@
#include <linux/netdev.h>
#include <poll.h>
#include <pthread.h>
-#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
@@ -1671,7 +1670,8 @@ void *worker_testapp_validate_rx(void *arg)
strerror(-err));
}
- pthread_barrier_wait(&barr);
+ if (test->use_barrier)
+ pthread_barrier_wait(&barr);
/* We leave only now in case of error to avoid getting stuck in the barrier */
if (err) {
@@ -1710,11 +1710,6 @@ static void testapp_clean_xsk_umem(struct ifobject *ifobj)
munmap(umem->buffer, umem->mmap_size);
}
-static void handler(int signum)
-{
- pthread_exit(NULL);
-}
-
static bool xdp_prog_changed_rx(struct test_spec *test)
{
struct ifobject *ifobj = test->ifobj_rx;
@@ -1819,9 +1814,18 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i
return TEST_FAILURE;
}
- if (ifobj2) {
+ err = xsk_attach_xdp_progs(test, ifobj1, ifobj2);
+ if (err) {
+ ksft_print_msg("Error: failed to attach XDP programs: %d (%s)\n",
+ err, strerror(-err));
+ return TEST_FAILURE;
+ }
+ test->use_barrier = !!ifobj2;
+
+ if (test->use_barrier) {
if (pthread_barrier_init(&barr, NULL, 2))
return TEST_FAILURE;
+
pkt_stream_reset(ifobj2->xsk->pkt_stream);
}
@@ -1829,27 +1833,26 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i
pkt_stream_reset(ifobj1->xsk->pkt_stream);
pkts_in_flight = 0;
- signal(SIGUSR1, handler);
/*Spawn RX thread */
pthread_create(&t0, NULL, ifobj1->func_ptr, test);
- if (ifobj2) {
+ if (test->use_barrier) {
pthread_barrier_wait(&barr);
if (pthread_barrier_destroy(&barr)) {
- pthread_kill(t0, SIGUSR1);
+ test->use_barrier = false;
+ pthread_join(t0, NULL);
clean_sockets(test, ifobj1);
clean_umem(test, ifobj1, NULL);
return TEST_FAILURE;
}
+ }
+ if (ifobj2) {
/*Spawn TX thread */
pthread_create(&t1, NULL, ifobj2->func_ptr, test);
-
pthread_join(t1, NULL);
}
- if (!ifobj2)
- pthread_kill(t0, SIGUSR1);
pthread_join(t0, NULL);
if (test->total_steps == test->current_step || test->fail) {
@@ -1887,8 +1890,6 @@ static int testapp_validate_traffic(struct test_spec *test)
}
}
- if (xsk_attach_xdp_progs(test, ifobj_rx, ifobj_tx))
- return TEST_FAILURE;
return __testapp_validate_traffic(test, ifobj_rx, ifobj_tx);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.h b/tools/testing/selftests/bpf/prog_tests/test_xsk.h
index 20eaaa254998..03753ddc5dcd 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.h
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.h
@@ -208,6 +208,7 @@ struct test_spec {
bool adjust_tail;
bool adjust_tail_support;
bool poll_tmout;
+ bool use_barrier;
enum test_mode mode;
char name[MAX_TEST_NAME_SIZE];
};
--
2.43.0
* [PATCH net-next 3/3] selftests/xsk: restore shared_umem after POLL_TXQ_FULL
From: Tushar Vyavahare @ 2026-06-16 15:49 UTC
To: netdev, magnus.karlsson, maciej.fijalkowski, stfomichev,
kernelxing, davem, kuba, pabeni, ast, daniel, tirthendu.sarkar,
tushar.vyavahare
Cc: bpf
In-Reply-To: <20260616154955.1492560-1-tushar.vyavahare@intel.com>
POLL_TXQ_FULL temporarily disables shared_umem on TX to exercise the
TX timeout path in isolation.
With shared_umem enabled, TX setup expects RX UMEM to be initialized
first and fails with: "RX UMEM is not initialized before shared-UMEM TX
setup".
Save and restore shared_umem around POLL_TXQ_FULL execution, and restore
it on both success and pkt_stream_replace() failure paths.
Also add an in-code comment explaining why shared_umem is temporarily
disabled in this test.
This keeps timeout setup local and prevents cross-test state leakage.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
---
.../selftests/bpf/prog_tests/test_xsk.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
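
Not part of the patch: a small sketch of the save/restore pattern the commit
message above describes, assuming a simplified interface object. The flag is
saved, cleared for the duration of one test body, and restored whether that
body succeeds or fails. sketch_ifobj, sketch_body_ok, sketch_body_fail and
sketch_run_isolated are hypothetical names, not selftest functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface configuration shared across test cases. */
struct sketch_ifobj {
	bool shared_umem;
};

typedef int (*sketch_body_fn)(struct sketch_ifobj *ifobj);

/* Records which mode the most recent test body observed. */
static bool sketch_seen_shared;

static int sketch_body_ok(struct sketch_ifobj *ifobj)
{
	sketch_seen_shared = ifobj->shared_umem;
	return 0;
}

static int sketch_body_fail(struct sketch_ifobj *ifobj)
{
	sketch_seen_shared = ifobj->shared_umem;
	return 1;
}

/* Run body with shared_umem forced off, then put the old value back. */
static int sketch_run_isolated(struct sketch_ifobj *ifobj, sketch_body_fn body)
{
	bool saved = ifobj->shared_umem;
	int ret;

	ifobj->shared_umem = false;
	ret = body(ifobj);
	/* Restored on success and failure alike, so nothing leaks onward. */
	ifobj->shared_umem = saved;

	return ret;
}
```

Funnelling every exit through a single restore point is one way to avoid
repeating the restore on each early return; the patch itself restores
explicitly on both of its paths.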
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index d4702d2aac5e..6eb9096d084c 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -2226,13 +2226,28 @@ int testapp_xdp_shared_umem(struct test_spec *test)
int testapp_poll_txq_tmout(struct test_spec *test)
{
+ bool shared_umem = test->ifobj_tx->shared_umem;
+ int ret;
+
test->poll_tmout = true;
+ /*
+ * POLL_TXQ_FULL exercises TX timeout setup in isolation.
+ * Keep TX out of shared-UMEM mode here so TX setup does not require
+ * RX UMEM to be initialized first.
+ */
+ test->ifobj_tx->shared_umem = false;
test->ifobj_tx->use_poll = true;
/* create invalid frame by set umem frame_size and pkt length equal to 2048 */
test->ifobj_tx->xsk->umem->frame_size = 2048;
- if (pkt_stream_replace(test, 2 * DEFAULT_PKT_CNT, 2048))
+ if (pkt_stream_replace(test, 2 * DEFAULT_PKT_CNT, 2048)) {
+ test->ifobj_tx->shared_umem = shared_umem;
return TEST_FAILURE;
- return testapp_validate_traffic_single_thread(test, test->ifobj_tx);
+ }
+
+ ret = testapp_validate_traffic_single_thread(test, test->ifobj_tx);
+ test->ifobj_tx->shared_umem = shared_umem;
+
+ return ret;
}
int testapp_poll_rxq_tmout(struct test_spec *test)
--
2.43.0
* Re: [PATCH net-next v5 0/6] pds_core: Add PLDM firmware update and host backed memory support
From: Jakub Kicinski @ 2026-06-16 15:54 UTC
To: Nikhil P. Rao
Cc: netdev, brett.creeley, eric.joyner, andrew+netdev, davem,
edumazet, pabeni, jacob.e.keller
In-Reply-To: <20260616023554.258764-1-nikhil.rao@amd.com>
On Tue, 16 Jun 2026 02:35:48 +0000 Nikhil P. Rao wrote:
> This series adds PLDM-based firmware update support to the pds_core
> driver. PLDM (Platform Level Data Model) is a DMTF standard for firmware
> management that provides a vendor-neutral interface for firmware updates.
net-next is closed for the duration of the merge window.
Please see: https://netdev.bots.linux.dev/net-next.html
And of course:
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html#development-cycle
--
pw-bot: defer
* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in fib_rules_lookup
From: Eric Dumazet @ 2026-06-16 15:55 UTC
To: Ido Schimmel
Cc: syzbot, kuniyu, davem, dsahern, horms, kuba, linux-kernel, netdev,
pabeni, syzkaller-bugs
In-Reply-To: <20260616153110.GA876739@shredder>
On Tue, Jun 16, 2026 at 8:31 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Tue, Jun 16, 2026 at 07:05:24AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 72dfa4700f78 net: dsa: sja1105: fix lastused timestamp in ..
>
> This includes commit 759923cf03b0 ("ipv4: fib: Convert
> fib_net_exit_batch() to ->exit_rtnl().") that moved ip_fib_net_exit()
> (and therefore fib4_rules_exit()) earlier in the netns dismantle path.
>
> Kuniyuki, can you please take a look?
>
> You can use this to reproduce:
>
> #!/bin/bash
>
> while true; do
> ip netns add ns1
> ip -n ns1 link set dev lo up
> ip -n ns1 address add 192.0.2.1/24 dev lo
> ip -n ns1 link add name dummy1 up type dummy
> ip -n ns1 address add 198.51.100.1/24 dev dummy1
> ip -n ns1 rule add ipproto tcp sport 12345 table 12345
> ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
> ip netns del ns1
> done
>
Oh right.
While looking at this syzbot report I also found an old issue.
https://lore.kernel.org/netdev/20260616141317.407791-1-edumazet@google.com/T/#u
I guess adding some delays in enqueue_to_backlog() could trigger a
similar bug even if we revert Kuniyuki's patch.
> Thanks
>
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15794bd2580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
> > dashboard link: https://syzkaller.appspot.com/bug?extid=965506b59a2de0b6905c
> > compiler: Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/d4e16f50a97c/disk-72dfa470.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/6cd4a736e796/vmlinux-72dfa470.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/548b0011c8e8/bzImage-72dfa470.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
> >
> > bond0 (unregistering): Released all slaves
> > bond1 (unregistering): Released all slaves
> > bond2 (unregistering): (slave dummy0): Releasing active interface
> > bond2 (unregistering): Released all slaves
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641
> >
> > CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
> > Workqueue: netns cleanup_net
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> > print_address_description+0x55/0x1e0 mm/kasan/report.c:378
> > print_report+0x58/0x70 mm/kasan/report.c:482
> > kasan_report+0x117/0x150 mm/kasan/report.c:595
> > fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
> > __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
> > ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
> > ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
> > __ip_route_output_key include/net/route.h:169 [inline]
> > ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
> > ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
> > release_sock+0x206/0x260 net/core/sock.c:3861
> > inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
> > udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> > fou_release net/ipv4/fou_core.c:562 [inline]
> > fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
> > ops_exit_list net/core/net_namespace.c:199 [inline]
> > ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
> > cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > process_one_work kernel/workqueue.c:3314 [inline]
> > process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > kthread+0x389/0x470 kernel/kthread.c:436
> > ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > </TASK>
> >
> > Allocated by task 19121:
> > kasan_save_stack mm/kasan/common.c:57 [inline]
> > kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> > __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> > kasan_kmalloc include/linux/kasan.h:263 [inline]
> > __do_kmalloc_node mm/slub.c:5296 [inline]
> > __kmalloc_node_track_caller_noprof+0x4d7/0x7b0 mm/slub.c:5408
> > kmemdup_noprof+0x2b/0x70 mm/util.c:138
> > kmemdup_noprof include/linux/fortify-string.h:763 [inline]
> > fib_rules_register+0x2f/0x400 net/core/fib_rules.c:170
> > fib4_rules_init+0x21/0x160 net/ipv4/fib_rules.c:508
> > ip_fib_net_init net/ipv4/fib_frontend.c:1578 [inline]
> > fib_net_init+0x17a/0x3e0 net/ipv4/fib_frontend.c:1628
> > ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
> > setup_net+0x118/0x350 net/core/net_namespace.c:446
> > copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
> > create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
> > unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
> > ksys_unshare+0x57d/0xa00 kernel/fork.c:3242
> > __do_sys_unshare kernel/fork.c:3316 [inline]
> > __se_sys_unshare kernel/fork.c:3314 [inline]
> > __x64_sys_unshare+0x38/0x50 kernel/fork.c:3314
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Freed by task 12641:
> > kasan_save_stack mm/kasan/common.c:57 [inline]
> > kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> > kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
> > poison_slab_object mm/kasan/common.c:253 [inline]
> > __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> > kasan_slab_free include/linux/kasan.h:235 [inline]
> > slab_free_hook mm/slub.c:2689 [inline]
> > __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2940
> > rcu_free_sheaf+0x31/0x200 mm/slub.c:5850
> > rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > do_softirq+0x76/0xd0 kernel/softirq.c:523
> > __local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
> > unregister_netdevice_many_notify+0x1874/0x2150 net/core/dev.c:12445
> > ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
> > ops_undo_list+0x391/0x8d0 net/core/net_namespace.c:248
> > cleanup_net+0x572/0x810 net/core/net_namespace.c:702
> > process_one_work kernel/workqueue.c:3314 [inline]
> > process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
> > worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
> > kthread+0x389/0x470 kernel/kthread.c:436
> > ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >
> > The buggy address belongs to the object at ffff88804ec4c600
> > which belongs to the cache kmalloc-192 of size 192
> > The buggy address is located 128 bytes inside of
> > freed 192-byte region [ffff88804ec4c600, ffff88804ec4c6c0)
> >
> > The buggy address belongs to the physical page:
> > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ec4c
> > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > page_type: f5(slab)
> > raw: 00fff00000000000 ffff88813fe163c0 dead000000000100 dead000000000122
> > raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
> > page dumped because: kasan: bad access detected
> > page_owner tracks the page as allocated
> > page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 13856, tgid 13853 (syz.3.2144), ts 351172300879, free_ts 351133053454
> > set_page_owner include/linux/page_owner.h:32 [inline]
> > post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
> > prep_new_page mm/page_alloc.c:1861 [inline]
> > get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
> > __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
> > alloc_slab_page mm/slub.c:3278 [inline]
> > allocate_slab+0x77/0x660 mm/slub.c:3467
> > new_slab mm/slub.c:3525 [inline]
> > refill_objects+0x336/0x3d0 mm/slub.c:7272
> > refill_sheaf mm/slub.c:2816 [inline]
> > __pcs_replace_empty_main+0x320/0x720 mm/slub.c:4652
> > alloc_from_pcs mm/slub.c:4750 [inline]
> > slab_alloc_node mm/slub.c:4884 [inline]
> > __do_kmalloc_node mm/slub.c:5295 [inline]
> > __kmalloc_noprof+0x464/0x750 mm/slub.c:5308
> > kmalloc_noprof include/linux/slab.h:954 [inline]
> > kzalloc_noprof include/linux/slab.h:1188 [inline]
> > new_dir fs/proc/proc_sysctl.c:966 [inline]
> > get_subdir fs/proc/proc_sysctl.c:1010 [inline]
> > sysctl_mkdir_p fs/proc/proc_sysctl.c:1320 [inline]
> > __register_sysctl_table+0xc02/0x1370 fs/proc/proc_sysctl.c:1395
> > neigh_sysctl_register+0x9b1/0xa90 net/core/neighbour.c:3915
> > addrconf_sysctl_register+0xb3/0x1c0 net/ipv6/addrconf.c:7396
> > ipv6_add_dev+0xd26/0x13a0 net/ipv6/addrconf.c:460
> > addrconf_notify+0x771/0x1050 net/ipv6/addrconf.c:3679
> > notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
> > call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
> > call_netdevice_notifiers net/core/dev.c:2302 [inline]
> > register_netdevice+0x18db/0x1f00 net/core/dev.c:11474
> > macsec_newlink+0x706/0x1200 drivers/net/macsec.c:4218
> > rtnl_newlink_create+0x310/0xb00 net/core/rtnetlink.c:3905
> > page last free pid 12657 tgid 12657 stack trace:
> > reset_page_owner include/linux/page_owner.h:25 [inline]
> > __free_pages_prepare mm/page_alloc.c:1397 [inline]
> > __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
> > __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
> > tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
> > rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> > rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2869
> > handle_softirqs+0x225/0x840 kernel/softirq.c:622
> > __do_softirq kernel/softirq.c:656 [inline]
> > invoke_softirq kernel/softirq.c:496 [inline]
> > __irq_exit_rcu+0xca/0x220 kernel/softirq.c:735
> > irq_exit_rcu+0x9/0x30 kernel/softirq.c:752
> > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1061 [inline]
> > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1061
> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> >
> > Memory state around the buggy address:
> > ffff88804ec4c580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
> > ffff88804ec4c600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >ffff88804ec4c680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > ^
> > ffff88804ec4c700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ffff88804ec4c780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
> > ==================================================================
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
* [PATCH net 0/5] rxrpc: Miscellaneous fixes
From: David Howells @ 2026-06-16 15:57 UTC
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel
Here are some miscellaneous AF_RXRPC fixes for more stuff found by Sashiko[1]:
(1) Reject ACKALL packets for calls not in Tx or immediate post-Tx state.
(2) Fix connection leak from AF_RXRPC recvmsg userspace OOB handling.
(3) Fix double unlock in AF_RXRPC recvmsg userspace OOB handling.
(4) Fix AFS preallocate charge to flush the waitqueue after unlistening
the socket so that any charging thread that does manage to get started
will be waited for before socket destruction.
(5) Fix AFS OOB notify handling to cancel in-progress OOB notification
handling and then to flush the workqueue it's on.
David
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
[1] https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
David Howells (4):
rxrpc: Fix leak of connection from OOB challenge
rxrpc: Fix double unlock in rxrpc_recvmsg()
afs: Fix further netns teardown to cancel the preallocation charger
afs: Fix uncancelled rxrpc OOB message handler
Wyatt Feng (1):
rxrpc: input: reject ACKALL outside transmit phase
fs/afs/cm_security.c | 3 ++-
fs/afs/rxrpc.c | 5 ++++-
net/rxrpc/input.c | 16 +++++++++++++++-
net/rxrpc/oob.c | 5 +++++
net/rxrpc/recvmsg.c | 2 +-
5 files changed, 27 insertions(+), 4 deletions(-)
* [PATCH net 1/5] rxrpc: input: reject ACKALL outside transmit phase
From: David Howells @ 2026-06-16 15:57 UTC
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Wyatt Feng, stable, Yuan Tan, Yifan Wu, Juefei Pu,
Zhengchuan Liang, Xin Liu, Ren Wei
In-Reply-To: <20260616155749.2125907-1-dhowells@redhat.com>
From: Wyatt Feng <bronzed_45_vested@icloud.com>
rxrpc_input_ackall() accepts ACKALL packets without checking whether
the call is in a state that can legitimately have outstanding transmit
buffers. A forged ACKALL can therefore reach a new service call in
RXRPC_CALL_SERVER_RECV_REQUEST before any reply packets have been
queued.
In that state call->tx_top is zero and call->tx_queue is NULL, so
rxrpc_rotate_tx_window() dereferences a NULL txqueue and triggers a
null-pointer dereference.
Fix rxrpc_input_ackall() to mirror the transmit-state gating already
used for normal ACK processing, and ignore ACKALL when there is no
outstanding transmit window to rotate.
Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
---
net/rxrpc/input.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ce761466b02d..37881dffa898 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1214,8 +1214,22 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_ack_summary summary = { 0 };
+ rxrpc_seq_t top = READ_ONCE(call->tx_top);
+
+ switch (__rxrpc_call_state(call)) {
+ case RXRPC_CALL_CLIENT_SEND_REQUEST:
+ case RXRPC_CALL_CLIENT_AWAIT_REPLY:
+ case RXRPC_CALL_SERVER_SEND_REPLY:
+ case RXRPC_CALL_SERVER_AWAIT_ACK:
+ break;
+ default:
+ return;
+ }
+
+ if (call->tx_bottom == top)
+ return;
- if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
+ if (rxrpc_rotate_tx_window(call, top, &summary))
rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
}
^ permalink raw reply related
* [PATCH net 2/5] rxrpc: Fix leak of connection from OOB challenge
From: David Howells @ 2026-06-16 15:57 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
stable
In-Reply-To: <20260616155749.2125907-1-dhowells@redhat.com>
Fix leak of connection object from OOB challenge queue when response is
provided by userspace.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/oob.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 05ca9c1faa57..3318c8bd82ad 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -210,6 +210,11 @@ static int rxrpc_respond_to_oob(struct rxrpc_sock *rx,
break;
}
+ switch (skb->mark) {
+ case RXRPC_OOB_CHALLENGE:
+ rxrpc_put_connection(sp->chall.conn, rxrpc_conn_put_oob);
+ break;
+ }
rxrpc_free_skb(skb, rxrpc_skb_put_oob);
return ret;
}
^ permalink raw reply related
* [PATCH net 3/5] rxrpc: Fix double unlock in rxrpc_recvmsg()
From: David Howells @ 2026-06-16 15:57 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
stable
In-Reply-To: <20260616155749.2125907-1-dhowells@redhat.com>
Fix a double unlock in rxrpc_recvmsg() when dealing with OOB messages.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/recvmsg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 82614cbdb60f..39a03684432d 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -471,7 +471,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
release_sock(&rx->sk);
if (ret == -EAGAIN)
goto try_again;
- goto error_no_call;
+ goto error_trace;
}
/* Find the next call and dequeue it if we're not just peeking. If we
^ permalink raw reply related
* [PATCH net 4/5] afs: Fix further netns teardown to cancel the preallocation charger
From: David Howells @ 2026-06-16 15:57 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260616155749.2125907-1-dhowells@redhat.com>
When an afs network namespace is torn down, it cancels and waits for the
work item that keeps the preallocated rxrpc call/conn/peer queue charged
before disabling incoming (i.e. listen 0), but there's a small window in
which it can be requeued by an incoming call wending through the I/O
thread.
Fix this by flushing the workqueue on which the charger runs after reducing
the listen backlog to zero.
Fixes: 47694fbc9d24 ("afs: Fix netns teardown to cancel the preallocation charger")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
fs/afs/rxrpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index d5cfd24e815b..fd2d260fb25f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -130,6 +130,7 @@ void afs_close_socket(struct afs_net *net)
cancel_work_sync(&net->charge_preallocation_work);
kernel_listen(net->socket, 0);
flush_workqueue(afs_async_calls);
+ flush_workqueue(afs_wq);
if (net->spare_incoming_call) {
afs_put_call(net->spare_incoming_call);
^ permalink raw reply related
* [PATCH net 5/5] afs: Fix uncancelled rxrpc OOB message handler
From: David Howells @ 2026-06-16 15:57 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260616155749.2125907-1-dhowells@redhat.com>
Fix AFS to cancel its OOB message processing (typically to respond to
security challenges). Also move OOB message processing to afs_wq so that
it's also waited for and make the OOB handler just return if the net
namespace is no longer live.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
fs/afs/cm_security.c | 3 ++-
fs/afs/rxrpc.c | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/afs/cm_security.c b/fs/afs/cm_security.c
index edcbd249d202..103168c70dd4 100644
--- a/fs/afs/cm_security.c
+++ b/fs/afs/cm_security.c
@@ -101,7 +101,8 @@ void afs_process_oob_queue(struct work_struct *work)
struct sk_buff *oob;
enum rxrpc_oob_type type;
- while ((oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
+ while (READ_ONCE(net->live) &&
+ (oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
switch (type) {
case RXRPC_OOB_CHALLENGE:
afs_respond_to_challenge(oob);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index fd2d260fb25f..6241f9349f6b 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,6 +128,7 @@ void afs_close_socket(struct afs_net *net)
_enter("");
cancel_work_sync(&net->charge_preallocation_work);
+ cancel_work_sync(&net->rx_oob_work);
kernel_listen(net->socket, 0);
flush_workqueue(afs_async_calls);
flush_workqueue(afs_wq);
@@ -985,5 +986,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
{
struct afs_net *net = sk->sk_user_data;
- schedule_work(&net->rx_oob_work);
+ if (net->live)
+ queue_work(afs_wq, &net->rx_oob_work);
}
^ permalink raw reply related
* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
From: Andrey Drobyshev @ 2026-06-16 15:58 UTC (permalink / raw)
To: Stefano Garzarella
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <ajFUk7quPhbI7Te-@sgarzare-redhat>
On 6/16/26 5:18 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when
>
> Please follow
> https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
> on how to refer to a commit.
>
I omitted the hash on purpose as the commit is not yet in the mainline
tree, although our series is based and depends on it, as I mentioned:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b
So it's a different (Michael's) repo and the commit is about to get
merged (but not yet there). But maybe usual reference style + repo link
would be better.
>> guest isn't ready") added a fast-fail in vhost_transport_send_pkt(). It
>> rejects every host send with -EHOSTUNREACH until the destination calls
>> SET_RUNNING(1). The fast-fail condition checks whether device's backends
>> are dropped, and if they are, the guest is considered not ready.
>
> Okay, so it's not a regression, I mean without this series that patch is
> not adding any regression, no?
>
> If it's the case, I'll change the wording in the cover letter.
>
Agreed.
>>
>> However, there might be other reasons for backends to be nulled. In
>> particular, when QEMU is performing CPR (checkpoint-restore) migration,
>> device ownership is being RESET and SET again, which leads to backends
>> drop and reattach. If we end up connecting during this window, an
>> AF_VSOCK client gets -EHOSTUNREACH, which is wrong.
>
> Please add this change before starting to support VHOST_RESET_OWNER
> ioctl in vhost-vsock, otherwise we are breaking the bisectability.
>
Agreed.
>>
>> Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
>> backend was previously live, cleared by vhost_vsock_start(). When set,
>> vhost_transport_send_pkt() queues the skb instead of fast-failing; the
>> existing kick of send_pkt_work in vhost_vsock_start() drains it on
>> resume. A device that has never run keeps cpr_paused == false and the
>> boot-time fast-fail behaviour is preserved.
>>
>> Pair the cpr_paused store with the backend store using an
>> smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
>> architecture never observes (NULL backend, !paused):
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> ---
>> drivers/vhost/vsock.c | 22 +++++++++++++++++++---
>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index e629886e5cf8..bcaba36becd7 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -61,6 +61,7 @@ struct vhost_vsock {
>>
>> u32 guest_cid;
>> bool seqpacket_allow;
>> + bool cpr_paused; /* between stop and next start */
>> };
>>
>> static u32 vhost_transport_get_local_cid(void)
>> @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>> * the mutex would be too expensive in this hot path, and we already have
>> * all the outcomes covered: if the backend becomes NULL right after the check,
>> * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>> + *
>> + * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>> + * The kick in vhost_vsock_start() will drain them on resume.
>> */
>> if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>> - rcu_read_unlock();
>> - kfree_skb(skb);
>> - return -EHOSTUNREACH;
>> + smp_rmb(); /* pairs with smp_wmb() in start/drop_backends */
>> + if (!READ_ONCE(vsock->cpr_paused)) {
>
> Can we avoid this which is not really readable and maybe add a single
> variable to control the fast-fail at all?
>
> I mean replacing both cpr_paused + backend-pointer with a single
> `started` flag: set it to false at open, true on start via
> smp_store_release(), back to false on normal stop, and leave it true
> during CPR pause.
>
> The reader in send_pkt can do just:
>
> if (!smp_load_acquire(&vsock->started))
> return -EHOSTUNREACH;
>
> WDYT?
>
I don't think it's gonna work as suggested. As I understand, the order
during CPR migration is:
1) SET_RUNNING(0)
-> vhost_vsock_stop()
-> vhost_vsock_drop_backends()
2) RESET_OWNER
-> vhost_vsock_drop_backends()
3) SET_OWNER
4) SET_RUNNING(1)
-> vhost_vsock_start
-> for (...) vhost_vq_set_backend()
(Btw, I just noticed that the backends are already NULL at step 2), but
that's just our CPR case; for any other potential RESET_OWNER users it
might not be the case.)
So the race window starts at step 1), not at step 2). We have no way of
differentiating whether device is actually being stopped for good, or
we're in the middle of CPR. If we set the flag to false on stop as you
suggested, we'll still hit the -EHOSTUNREACH case eventually, and
avoiding it is the whole purpose of this patch.
The fast-fail with -EHOSTUNREACH relies on the presence of backends.
IIUC the backend will only become set after initial SET_RUNNING(1),
which will only happen once the guest driver writes something to the virtio
config register, QEMU catches it and calls SET_RUNNING(1). So we have
ordering with the guest's actions here, which is logical. But for our
issue that means that the only true marker of paused/not paused is the
presence of backends - and that's why the flag is set in
vhost_vsock_drop_backends().
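To make the argument above concrete, here is a small single-threaded model of the two schemes. It is a sketch under stated assumptions: struct dev_model and all model_*/send_* helpers are hypothetical names invented for this illustration, the backend pointer is reduced to a bool, and memory ordering is ignored entirely. It only shows which error the sender would see at each step of the CPR sequence.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state relevant to the fast-fail. */
struct dev_model {
	bool backend;		/* stands in for the RX vq backend pointer */
	bool cpr_paused;	/* flag added by the patch under review */
	bool started;		/* single-flag alternative from the review */
};

static void model_open(struct dev_model *d)
{
	d->backend = false;
	d->cpr_paused = false;
	d->started = false;
}

/* SET_RUNNING(1) */
static void model_start(struct dev_model *d)
{
	d->backend = true;
	d->cpr_paused = false;
	d->started = true;
}

/* Models vhost_vsock_drop_backends(): mark paused only if we were live. */
static void model_drop_backends(struct dev_model *d)
{
	if (d->backend)
		d->cpr_paused = true;
	d->backend = false;
}

/* SET_RUNNING(0): the kernel cannot tell a final stop from a CPR pause. */
static void model_stop(struct dev_model *d)
{
	model_drop_backends(d);
	d->started = false;
}

/* Sender check as in the patch: fast-fail only if never started. */
static int send_with_cpr_paused(const struct dev_model *d)
{
	if (!d->backend && !d->cpr_paused)
		return -EHOSTUNREACH;
	return 0;	/* skb would be queued */
}

/* Sender check with the single `started` flag cleared on stop. */
static int send_with_started(const struct dev_model *d)
{
	return d->started ? 0 : -EHOSTUNREACH;
}
```

Walking the model through open, start, stop, start shows the difference: after the stop in step 1) the `started` variant already returns -EHOSTUNREACH, while the cpr_paused variant keeps queueing until the next start.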
>> + rcu_read_unlock();
>> + kfree_skb(skb);
>> + return -EHOSTUNREACH;
>> + }
>
>
> That said claude here is reporting a potential issue that I think we
> should consider:
> After VHOST_RESET_OWNER, the guest CID stays in the hash, so
> vhost_transport_send_pkt() can still find the vsock, skip the
> fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while
> vhost_workers_free() is freeing workers without a synchronize_rcu()
> — risking a use-after-free. Also, any send_pkt_work queued between
> the last flush and worker teardown gets its VHOST_WORK_QUEUED bit
> stuck (the vhost task exits without draining), deadlocking
> host→guest traffic after restart.
>
> A synchronize_rcu() in vhost_workers_free() between the
> rcu_assign_pointer(NULL) loop and the destroy loop would close the
> use-after-free, and reinitializing send_pkt_work via
> vhost_work_init() after vhost_dev_reset_owner() returns would clear
> the stuck QUEUED bit.
>
>
Yes, this looks real indeed. Though I couldn't hit the UAF issue while
testing host->guest transfer under KASAN.
>> }
>>
>> if (virtio_vsock_skb_reply(skb))
>> @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>> mutex_unlock(&vq->mutex);
>> }
>>
>> + smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>> + WRITE_ONCE(vsock->cpr_paused, false);
>> +
>> /* Some packets may have been queued before the device was started,
>> * let's kick the send worker to send them.
>> */
>> @@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>>
>> lockdep_assert_held(&vsock->dev.mutex);
>>
>> + if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>> + WRITE_ONCE(vsock->cpr_paused, true);
>> + smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>> + }
>
> Why here and not in vhost_vsock_reset_owner()?
>
> Also having this here will set it to true also with
> VHOST_VSOCK_SET_RUNNING(0), is that right?
>
That was added here precisely to cover the vhost_vsock_stop() case (see
above).
> Thanks,
> Stefano
>
>> +
>> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>> vq = &vsock->vqs[i];
>>
>> @@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>>
>> vsock->guest_cid = 0; /* no CID assigned yet */
>> vsock->seqpacket_allow = false;
>> + vsock->cpr_paused = false;
>>
>> atomic_set(&vsock->queued_replies, 0);
>>
>> --
>> 2.47.1
>>
>
^ permalink raw reply
* Re: [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
From: Andrey Drobyshev @ 2026-06-16 15:58 UTC (permalink / raw)
To: Stefano Garzarella
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <ajFbT6sDESh9FDOl@sgarzare-redhat>
On 6/16/26 5:23 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:18PM +0300, Andrey Drobyshev wrote:
>> During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
>> keeps running while the host drops and later re-attaches vhost backends.
>> If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
>> while the backend is temporarily NULL (between vhost_vsock_drop_backends()
>> and the next vhost_vsock_start()), then the kick is delivered to the
>> vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
>> kick signal is consumed. The buffer is then left in the ring.
>>
>> Then upon device start vhost_vsock_start() only re-kicks the RX send
>> worker, never the TX VQ, so the buffer is processed only if the guest
>> happens to kick again. But if the guest itself is now waiting for data
>> from the host, it will never kick TX VQ again, and we end up in a
>> deadlock.
>>
>> The deadlock is reproduced during active host->guest socat data transfer
>> under multiple consecutive CPR live-updates.
>>
>> To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
>> queue the TX vq poll so any buffers the guest enqueued while we were paused
>> get scanned.
>
> Again, it seems like we're fixing an issue that existed before this
> series, but IIUC without support for VHOST_RESET_OWNER, this could never
> have happened, so the wording should be changed to make it clear that
> this can happen only with the new VHOST_RESET_OWNER support.
>
> In addition, this patch must also be applied before the
> VHOST_RESET_OWNER support or merged into it.
>
Agreed.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>> drivers/vhost/vsock.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index bcaba36becd7..1fcfe71d18be 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>> */
>> vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
>>
>> + /*
>> + * Some packets might've also been queued in TX VQ. Re-scan it here,
>> + * mirroring the RX send-worker kick above.
>> + */
>
> Can we also mention that this is related to VHOST_RESET_OWNER?
>
Agreed.
> Thanks,
> Stefano
>
>> + vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
>> +
>> mutex_unlock(&vsock->dev.mutex);
>> return 0;
>>
>> --
>> 2.47.1
>>
>
^ permalink raw reply
* Re: [PATCH net-next v2 0/9] atm: remove more dead code
From: patchwork-bot+netdevbpf @ 2026-06-16 16:00 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, 3chas3,
mitch, linux-atm-general, dwmw2
In-Reply-To: <20260615194416.752559-1-kuba@kernel.org>
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 15 Jun 2026 12:44:07 -0700 you wrote:
> Commit 6deb53595092 ("net: remove unused ATM protocols and legacy
> ATM device drivers") removed a good chunk of old ATM drivers.
> Our goal going forward is to limit the ATM support to PPPoATM
> used in ADSL deployments.
>
> A recent burst of AI generated fixes for net/atm/signaling.c and
> net/atm/svc.c made me look closer at the remaining code. PPPoATM runs
> over permanent virtual circuits (PF_ATMPVC) with a statically
> configured VPI/VCI. We can drop switched virtual circuits (SVCs)
> and user-space signaling (atmsigd) support. While digging around
> I noticed a few more obviously dead pieces of code.
>
> [...]
Here is the summary with links:
- [net-next,v2,1/9] atm: remove AAL3/4 transport support
https://git.kernel.org/netdev/net-next/c/c1468145ce75
- [net-next,v2,2/9] atm: remove the unused send_oam / push_oam callbacks
https://git.kernel.org/netdev/net-next/c/b20aa9eded10
- [net-next,v2,3/9] atm: remove dead SONET PHY ioctls
https://git.kernel.org/netdev/net-next/c/277fb497d101
- [net-next,v2,4/9] atm: remove the local ATM (NSAP) address registry
https://git.kernel.org/netdev/net-next/c/a5a12d76d2cb
- [net-next,v2,5/9] atm: remove SVC socket support and the signaling daemon interface
https://git.kernel.org/netdev/net-next/c/aa582dc25ace
- [net-next,v2,6/9] atm: remove the unused change_qos device operation
https://git.kernel.org/netdev/net-next/c/6719d57ee047
- [net-next,v2,7/9] atm: remove the unused pre_send and send_bh device operations
https://git.kernel.org/netdev/net-next/c/ae6e653514d1
- [net-next,v2,8/9] atm: remove unused ATM PHY operations
https://git.kernel.org/netdev/net-next/c/e44e224e2f44
- [net-next,v2,9/9] atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
https://git.kernel.org/netdev/net-next/c/8f9616500c59
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* [PATCH net v2] ice: fix memory leak in ice_lbtest_prepare_rings()
From: Dawei Feng @ 2026-06-16 15:57 UTC (permalink / raw)
To: Tony Nguyen
Cc: Przemek Kitszel, Andrew Lunn, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, intel-wired-lan, netdev,
linux-kernel, jianhao.xu, Dawei Feng, stable
ice_lbtest_prepare_rings() frees Rx rings only when
ice_vsi_start_all_rx_rings() fails. If ice_vsi_setup_rx_rings() fails
after allocating some descriptors, or if ice_vsi_cfg_lan() fails after
the Rx rings were prepared, the function reaches the Tx cleanup path
without releasing the initialized Rx resources.
Fix this by adding separate unwind paths for Rx setup failure and LAN
configuration failure. The Rx setup failure path releases the partially
prepared Rx rings before freeing Tx rings, while later failures first
undo the LAN Tx configuration and then release the Rx rings in reverse
setup order.
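The unwind ordering described above can be illustrated with a small userspace model. This is a sketch only: struct fake_vsi, setup(), teardown() and the RES_*/STEP_* constants are hypothetical and exist just to track which resources are held. Each setup step is assumed to leave partial state behind even when it fails, which is the situation the commit message describes for the Rx rings.

```c
/* Hypothetical resource bits; one per setup step that holds something. */
enum { RES_TX = 1, RES_RX = 2, RES_LAN = 4, STEP_START = 8 };

struct fake_vsi {
	unsigned int held;	/* resources currently held (RES_* bits) */
	unsigned int fail_at;	/* step that should fail, 0 for none */
};

/* A step may fail after partially allocating, so mark it held first. */
static int setup(struct fake_vsi *v, unsigned int res)
{
	v->held |= res;
	return v->fail_at == res ? -1 : 0;
}

static void teardown(struct fake_vsi *v, unsigned int res)
{
	v->held &= ~res;
}

/* Labels are ordered so that cleanup runs in reverse setup order. */
static int prepare_rings(struct fake_vsi *v)
{
	int status;

	status = setup(v, RES_TX);
	if (status)
		goto err_setup_tx_ring;

	status = setup(v, RES_RX);
	if (status)
		goto err_setup_rx_ring;

	status = setup(v, RES_LAN);
	if (status)
		goto err_cfg_lan;

	/* Final step holds nothing of its own but can still fail. */
	status = (v->fail_at == STEP_START) ? -1 : 0;
	if (status)
		goto err_cfg_lan;

	return 0;

err_cfg_lan:
	teardown(v, RES_LAN);
err_setup_rx_ring:
	teardown(v, RES_RX);
err_setup_tx_ring:
	teardown(v, RES_TX);
	return status;
}
```

With this label order, a failure at any step leaves held == 0 in the model, whereas the pre-patch order would skip the Rx teardown for the two middle steps.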
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc7.
An x86_64 allyesconfig build showed no new warnings. As we do not have an
Intel E800 Series adapter available to run the ethtool offline loopback
selftest, no runtime testing could be performed.
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
---
Changes in v2:
- Fix cleanup order
drivers/net/ethernet/intel/ice/ice_ethtool.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index f28416a707d7..10a4abc66974 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -1069,18 +1069,18 @@ static int ice_lbtest_prepare_rings(struct ice_vsi *vsi)
status = ice_vsi_cfg_lan(vsi);
if (status)
- goto err_setup_rx_ring;
+ goto err_cfg_lan;
status = ice_vsi_start_all_rx_rings(vsi);
if (status)
- goto err_start_rx_ring;
+ goto err_cfg_lan;
return 0;
-err_start_rx_ring:
- ice_vsi_free_rx_rings(vsi);
-err_setup_rx_ring:
+err_cfg_lan:
ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, 0);
+err_setup_rx_ring:
+ ice_vsi_free_rx_rings(vsi);
err_setup_tx_ring:
ice_vsi_free_tx_rings(vsi);
--
2.34.1
^ permalink raw reply related
* [PATCH 6.18 266/325] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Greg Kroah-Hartman @ 2026-06-16 15:01 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Michael Bommarito, David Howells,
Marc Dionne, Jeffrey Altman, Eric Dumazet, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, linux-afs, netdev,
stable
In-Reply-To: <20260616145057.827196531@linuxfoundation.org>
6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells <dhowells@redhat.com>
commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 upstream.
Fix modification of the received skbuff in rxrpc_input_soft_acks() and a
potential incorrect access of the buffer in a fragmented UDP packet (the
packet would probably have to be deliberately pre-generated as fragmented)
when AF_RXRPC tries to extract the contents of the SACK table by copying
out the contents of the SACK table into a buffer before attempting to parse it.
AF_RXRPC assumes that it can just call skb_condense() and then validly
access the SACK table from skb->data and that it will be a flat buffer -
but skb_condense() can silently fail to do anything under some
circumstances.
Note that whilst rxrpc_input_soft_acks() should be able to parse extended
ACKs, the rest of AF_RXRPC doesn't currently support that.
Further, there's then no need to call skb_condense() in rxrpc_input_ack(),
so don't.
Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
Reported-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/r/20260513180907.2061972-1-michael.bommarito@gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: stable@kernel.org
Link: https://patch.msgid.link/105362.1780573560@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/rxrpc/input.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -963,23 +963,34 @@ static void rxrpc_input_soft_acks(struct
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_txqueue *tq = call->tx_queue;
unsigned long extracted = ~0UL;
- unsigned int nr = 0;
+ unsigned int nr = 0, nsack;
rxrpc_seq_t seq = call->acks_hard_ack + 1;
rxrpc_seq_t lowest_nak = seq + sp->ack.nr_acks;
- u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
+ u8 sack[256] __aligned(sizeof(unsigned long));
+ u8 *acks = sack;
_enter("%x,%x,%u", tq->qbase, seq, sp->ack.nr_acks);
while (after(seq, tq->qbase + RXRPC_NR_TXQUEUE - 1))
tq = tq->next;
+ /* Extract an individual SACK table. A normal SACK table is up to 255
+ * bytes with 1 ACK flag per byte, but an extended SACK table can be up
+ * to 256 bytes with up to 8 ACK/NACK flags per byte. The ACK flags go
+ * across all bit 0's then all bit 1's, then all bit 2's, ...
+ */
+ memset(sack, 0, sizeof(sack));
+ nsack = umin(sp->ack.nr_acks, 256);
+ if (skb_copy_bits(skb,
+ sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket),
+ sack, nsack) < 0)
+ return;
+
for (unsigned int i = 0; i < sp->ack.nr_acks; i++) {
/* Decant ACKs until we hit a txqueue boundary. */
+ if ((i & 255) == 0)
+ acks = sack;
shiftr_adv_rotr(acks, extracted);
- if (i == 256) {
- acks -= i;
- i = 0;
- }
seq++;
nr++;
if ((seq & RXRPC_TXQ_MASK) != 0)
@@ -1117,9 +1128,6 @@ static void rxrpc_input_ack(struct rxrpc
skb_copy_bits(skb, ioffset, &trailer, sizeof(trailer)) < 0)
return rxrpc_proto_abort(call, 0, rxrpc_badmsg_short_ack_trailer);
- if (nr_acks > 0)
- skb_condense(skb);
-
call->acks_latest_ts = ktime_get_real();
call->acks_hard_ack = hard_ack;
call->acks_prev_seq = prev_pkt;
^ permalink raw reply
* Re: [PATCH net-next 0/5] tls: reject the combination of TLS and sockmap
From: patchwork-bot+netdevbpf @ 2026-06-16 16:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, bpf, jakub,
john.fastabend, sd
In-Reply-To: <20260614014102.461064-1-kuba@kernel.org>
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sat, 13 Jun 2026 18:40:55 -0700 you wrote:
> There are no known TLS+sockmap users and it has some known
> hard to solve bugs. Let's reject this configuration as we
> discussed a number of times.
>
> Jakub Kicinski (5):
> tls: reject the combination of TLS and sockmap
> tls: remove dead sockmap (psock) handling from the SW path
> selftests/bpf: remove sockmap + ktls tests
> selftests/bpf: drop the unused kTLS program from test_sockmap
> selftests/bpf: test that TLS crypto is rejected on a sockmap socket
>
> [...]
Here is the summary with links:
- [net-next,1/5] tls: reject the combination of TLS and sockmap
https://git.kernel.org/netdev/net-next/c/460e6486617c
- [net-next,2/5] tls: remove dead sockmap (psock) handling from the SW path
https://git.kernel.org/netdev/net-next/c/79511603a65b
- [net-next,3/5] selftests/bpf: remove sockmap + ktls tests
https://git.kernel.org/netdev/net-next/c/faf89584e436
- [net-next,4/5] selftests/bpf: drop the unused kTLS program from test_sockmap
https://git.kernel.org/netdev/net-next/c/6af8971d910e
- [net-next,5/5] selftests/bpf: test that TLS crypto is rejected on a sockmap socket
https://git.kernel.org/netdev/net-next/c/5949a7cf11e6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* [PATCH net] netconsole: don't drop the last byte of a full-sized message
From: Breno Leitao @ 2026-06-16 16:09 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-kernel, asantostc, gustavold, kernel-team,
Breno Leitao
nt->buf is exactly MAX_PRINT_CHUNK bytes, but scnprintf() reserves one
byte for its NUL terminator, so a non-fragmented payload of exactly
MAX_PRINT_CHUNK loses its last byte (emitted as a stray NUL in the
release path). Grow nt->buf to MAX_PRINT_CHUNK + 1 and bound the
scnprintf() calls with sizeof(nt->buf); the transmitted length stays
capped at MAX_PRINT_CHUNK.
Alternatively, nt->buf could be left at MAX_PRINT_CHUNK and the NUL byte
reserved by routing exactly-MAX_PRINT_CHUNK payloads to fragmentation
('len < MAX_PRINT_CHUNK'), at the cost of fragmenting those messages.
But it would look less sane, thus the current approach.
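The off-by-one can be reproduced in userspace with snprintf(), which reserves a byte for the NUL terminator in the same way. This is an illustrative sketch, not the netconsole code: CHUNK is a tiny hypothetical stand-in for MAX_PRINT_CHUNK, and format_payload() is a made-up helper that clamps the return value to the number of bytes actually stored, roughly what scnprintf() reports in the kernel.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 8	/* hypothetical stand-in for MAX_PRINT_CHUNK */

/*
 * Format msg into buf and return how many payload bytes were actually
 * stored, not counting the NUL terminator.
 */
static size_t format_payload(char *buf, size_t buf_size, const char *msg)
{
	int n;

	if (buf_size == 0)
		return 0;

	n = snprintf(buf, buf_size, "%s", msg);
	if (n < 0)
		return 0;
	if ((size_t)n >= buf_size)
		return buf_size - 1;	/* truncated: last byte went to NUL */
	return (size_t)n;
}
```

Formatting an 8-byte message into a CHUNK-sized buffer stores only 7 payload bytes, while a CHUNK + 1 buffer keeps all 8, which is why growing the buffer by one byte is enough as long as the transmitted length stays capped at the chunk size.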
Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
drivers/net/netconsole.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 57dd6821a8aa9..bfab0a47678c9 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -184,8 +184,10 @@ struct netconsole_target {
bool extended;
bool release;
struct netpoll np;
- /* protected by target_list_lock */
- char buf[MAX_PRINT_CHUNK];
+ /* protected by target_list_lock; +1 gives scnprintf() room for its
+ * NUL terminator so a full MAX_PRINT_CHUNK payload is not truncated
+ */
+ char buf[MAX_PRINT_CHUNK + 1];
struct work_struct resume_wq;
};
@@ -1692,7 +1694,7 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt,
if (release_len) {
release = init_utsname()->release;
- scnprintf(nt->buf, MAX_PRINT_CHUNK, "%s,%.*s", release,
+ scnprintf(nt->buf, sizeof(nt->buf), "%s,%.*s", release,
msg_len, msg);
msg_len += release_len;
} else {
@@ -1701,12 +1703,12 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt,
if (userdata)
msg_len += scnprintf(&nt->buf[msg_len],
- MAX_PRINT_CHUNK - msg_len, "%s",
+ sizeof(nt->buf) - msg_len, "%s",
userdata);
if (sysdata)
msg_len += scnprintf(&nt->buf[msg_len],
- MAX_PRINT_CHUNK - msg_len, "%s",
+ sizeof(nt->buf) - msg_len, "%s",
sysdata);
send_udp(nt, nt->buf, msg_len);
---
base-commit: fbc6a80cb5d3fd4ac4b56e8c9d791dd17be890c4
change-id: 20260616-max_print_chunk-0a8cea1b1ed7
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related
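To illustrate the off-by-one described in the commit message above, here
is a minimal userspace sketch. It assumes a tiny illustrative
MAX_PRINT_CHUNK of 8 (not the kernel's value) and emulates scnprintf()
with a hypothetical helper, scn_copy(), built on snprintf(). None of
this is the netconsole code itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PRINT_CHUNK 8 /* illustrative value only, not the kernel's */

/*
 * Hypothetical userspace stand-in for scnprintf(): returns the number of
 * bytes actually stored in buf, excluding the NUL terminator.
 */
int scn_copy(char *buf, size_t size, const char *s)
{
	int n;

	assert(size > 0);
	n = snprintf(buf, size, "%s", s);
	if (n < 0)
		return 0;
	/* snprintf() reports the untruncated length; clamp to what fit. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Buffer of exactly MAX_PRINT_CHUNK bytes: one byte goes to the NUL. */
int stored_with_exact_buf(const char *payload)
{
	char buf[MAX_PRINT_CHUNK];

	return scn_copy(buf, MAX_PRINT_CHUNK, payload);
}

/* Buffer grown by one byte and bounded by sizeof(), as in the patch. */
int stored_with_grown_buf(const char *payload)
{
	char buf[MAX_PRINT_CHUNK + 1];

	return scn_copy(buf, sizeof(buf), payload);
}
```

For a payload of exactly MAX_PRINT_CHUNK bytes, stored_with_exact_buf()
keeps one byte fewer than the payload length, while
stored_with_grown_buf() keeps all of it. Shorter payloads are unaffected
in both cases.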
* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
From: Stefano Garzarella @ 2026-06-16 16:13 UTC (permalink / raw)
To: Andrey Drobyshev
Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den
In-Reply-To: <021a6604-289c-4dd8-a0be-33c7812c0105@virtuozzo.com>
On Tue, Jun 16, 2026 at 06:58:40PM +0300, Andrey Drobyshev wrote:
>On 6/16/26 5:18 PM, Stefano Garzarella wrote:
>> On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
[...]
>>> static u32 vhost_transport_get_local_cid(void)
>>> @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>>> * the mutex would be too expensive in this hot path, and we already have
>>> * all the outcomes covered: if the backend becomes NULL right after the check,
>>> * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>>> + *
>>> + * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>>> + * The kick in vhost_vsock_start() will drain them on resume.
>>> */
>>> if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>>> - rcu_read_unlock();
>>> - kfree_skb(skb);
>>> - return -EHOSTUNREACH;
>>> + smp_rmb(); /* pairs with smp_wmb() in start/drop_backends */
>>> + if (!READ_ONCE(vsock->cpr_paused)) {
>>
>> Can we avoid this, since it is not really readable, and maybe add a
>> single variable that controls the fast-fail entirely?
>>
>> I mean replacing both cpr_paused + backend-pointer with a single
>> `started` flag: set it to false at open, true on start via
>> smp_store_release(), back to false on normal stop, and leave it true
>> during CPR pause.
>>
>> The reader in send_pkt can do just:
>>
>> if (!smp_load_acquire(&vsock->started))
>> return -EHOSTUNREACH;
>>
>> WDYT?
>>
>
>I don't think it's gonna work as suggested. As I understand, the order
>during CPR migration is:
>
>1) SET_RUNNING(0)
> -> vhost_vsock_stop()
> -> vhost_vsock_drop_backends()
>2) RESET_OWNER
> -> vhost_vsock_drop_backends()
>3) SET_OWNER
>4) SET_RUNNING(1)
> -> vhost_vsock_start
> -> for (...) vhost_vq_set_backend()
>
>(Btw, I just noticed the backends are already NULL at step 2, but that's
>just our CPR case; for any other potential RESET_OWNER users that might
>not be the case.)
>
>So the race window starts at step 1 (not at step 2). We have no way of
>differentiating whether device is actually being stopped for good, or
>we're in the middle of CPR. If we set the flag to false on stop as you
>suggested, we'll still hit the -EHOSTUNREACH case eventually, and
>avoiding it is the whole purpose of this patch.
>
>The fast-fail with -EHOSTUNREACH relies on the presence of backends.
>IIUC the backend will only become set after initial SET_RUNNING(1),
>which will only happen once the guest driver writes something to the
>virtio config register, QEMU catches it and calls SET_RUNNING(1). So we have
>ordering with the guest's actions here, which is logical. But for our
>issue that means that the only true marker of paused/not paused is the
>presence of backends - and that's why the flag is set in
>vhost_vsock_drop_backends().
Okay, so what about not setting `started` to false in SET_RUNNING(0)?
I mean, use it just to track the first SET_RUNNING(1) (and maybe rename
the variable accordingly).
Apart from CPR, when can SET_RUNNING(0) occur?
In the end, the fast-fail was just an optimization; if we queue the
packet instead, it is not a big issue IMO.
>
>>> + rcu_read_unlock();
>>> + kfree_skb(skb);
>>> + return -EHOSTUNREACH;
>>> + }
>>
>>
>> That said claude here is reporting a potential issue that I think we
>> should consider:
>> After VHOST_RESET_OWNER, the guest CID stays in the hash, so
>> vhost_transport_send_pkt() can still find the vsock, skip the
>> fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while
>> vhost_workers_free() is freeing workers without a synchronize_rcu()
>> — risking a use-after-free. Also, any send_pkt_work queued between
>> the last flush and worker teardown gets its VHOST_WORK_QUEUED bit
>> stuck (the vhost task exits without draining), deadlocking
>> host→guest traffic after restart.
>>
>> A synchronize_rcu() in vhost_workers_free() between the
>> rcu_assign_pointer(NULL) loop and the destroy loop would close the
>> use-after-free, and reinitializing send_pkt_work via
>> vhost_work_init() after vhost_dev_reset_owner() returns would clear
>> the stuck QUEUED bit.
>>
>>
>
>Yes, this looks real indeed. Though I couldn't hit the UAF issue while
>testing host->guest transfer under KASAN.
>
>>> }
>>>
>>> if (virtio_vsock_skb_reply(skb))
>>> @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>>> mutex_unlock(&vq->mutex);
>>> }
>>>
>>> + smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>>> + WRITE_ONCE(vsock->cpr_paused, false);
>>> +
>>> /* Some packets may have been queued before the device was started,
>>> * let's kick the send worker to send them.
>>> */
>>> @@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>>>
>>> lockdep_assert_held(&vsock->dev.mutex);
>>>
>>> + if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>>> + WRITE_ONCE(vsock->cpr_paused, true);
>>> + smp_wmb(); /* pairs with smp_rmb() in send_pkt */
>>> + }
>>
>> Why here and not in vhost_vsock_reset_owner()?
>>
>> Also having this here will set it to true also with
>> VHOST_VSOCK_SET_RUNNING(0), is that right?
>>
>
>That was added here precisely to cover the vhost_vsock_stop() case (see
>above).
I see now, a comment or something in the commit would have helped.
Thanks,
Stefano
^ permalink raw reply
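The single `started` flag proposed in the message above can be sketched
in userspace with C11 atomics, which stand in here for the kernel's
smp_store_release() and smp_load_acquire(). This is only an
illustration: struct fake_vsock and the fake_*() helpers are
hypothetical names, and the sketch does not address the open question in
the thread of how a normal stop is told apart from a CPR pause.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the vhost structure. */
struct fake_vsock {
	atomic_bool started;
	int queued; /* counts packets accepted by fake_send_pkt() */
};

/* Open: the device has never been started. */
void fake_open(struct fake_vsock *v)
{
	atomic_init(&v->started, false);
	v->queued = 0;
}

/* Start: publish the flag with release semantics. */
void fake_start(struct fake_vsock *v)
{
	atomic_store_explicit(&v->started, true, memory_order_release);
}

/* Normal stop: clear the flag. A CPR pause would leave it set. */
void fake_stop(struct fake_vsock *v)
{
	atomic_store_explicit(&v->started, false, memory_order_release);
}

/* Sender: fast-fail unless the device has been started. */
int fake_send_pkt(struct fake_vsock *v)
{
	if (!atomic_load_explicit(&v->started, memory_order_acquire))
		return -EHOSTUNREACH;

	v->queued++;
	return 0;
}
```

The sender reads one flag with acquire semantics instead of combining a
backend-pointer check, an explicit read barrier and a second flag, which
is the readability gain the proposal is after.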
* Re: Ethtool : PRBS feature
From: Alexander H Duyck @ 2026-06-16 16:14 UTC (permalink / raw)
To: Das, Shubham, Andrew Lunn
Cc: netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
Chintalapalle, Balaji
In-Reply-To: <SN7PR11MB81099B4885C10E16D52A2EB9FFE52@SN7PR11MB8109.namprd11.prod.outlook.com>
On Tue, 2026-06-16 at 12:14 +0000, Das, Shubham wrote:
> Hi Andrew,
>
> Thanks for the feedback.
>
> Yes, for multi-lane ports we can accept the lane number as an argument like:
>
> ethtool --phy-test eth1 lane 0 tx-prbs prbs7
> ethtool --phy-test eth2 lane 0 rx-prbs prbs7
>
> We referred to "Lee Trager's" "Open-Source Tooling for PHY Management and Testing" session:
> https://netdevconf.info/0x19/sessions/talk/open-source-tooling-for-phy-management-and-testing.html?.
> We have been trying to reach "Lee Trager" to seek more input, latest update on the approach and understand if there is a parallel effort in active so we can collaborate.
> If you can, please help me connect with "Lee Trager" and others who expressed interest in Ethernet PRBS. We are happy to align and start implementation.
>
You aren't going to have much luck if you are trying to reach him via
his Meta address, as he has moved on to Nvidia and is no longer working
on the fbnic driver.
As for the work done, most of it was internal and made use of debugfs.
I don't believe any of the fbnic work began to approach the suggested
methods for upstreaming the feature, as Lee had been pulled into other
efforts.
> About standardizing across other bus like PCIe and USB, I had a quick discussion with our internal designers, but I didn't observe any such SW-level config knobs interest.
> Looks like Ethernet has clear interest and we are joining that Ethernet PRBS community too.
I think it largely depends on what your implementation looks like. The
point being made was that many of the SerDes PHYs out there are capable
of use in multiple applications. So instead of being a networking
device you would be looking at a SerDes PHY such as those in
"/drivers/phy/".
Also, do you know at which layer in the PHY you are injecting this PRBS?
I would be curious whether it is at the PCS or the PMD level.
If you are referring to the PCS level then yes, it would make sense to
have it in the networking subsystem as the PCS at this point is more a
netdev specific set of drivers, see "/drivers/net/pcs/".
In the case of the PMD that is where things get a bit more interesting.
There is an IEEE c45 register definition that includes PRBS testing
registers, however in the case of our implementation the PMD doesn't
follow that specification and follows more the "/drivers/phy/" model.
> Ethernet PRBS configuration and diagnostics support is well established and already widely used in existing Ethernet SERDES deployments.
> We think Ethernet is the most natural starting point within netdev, as it aligns with current driver practice and existing validation workflows.
The problem is many of these parts used as an Ethernet Serdes PMD are
really a multiuse part. So for example in the case of the hardware in
FBNIC we use the same part on the Ethernet PHY as we do for the PCIe
Gen5 PHY.
The complication in our case is that both are buried behind our FW due
to the fact that both are shared between slices. However for testing
purposes and such we could look at disabling the odd slices to
essentially unshare the hardware if you need another platform to test
something like this with.
^ permalink raw reply
* Re: [PATCH RFC 4/9] net: stmmac: qcom-ethqos: add per-platform NOC clock voting
From: Mohd Ayaan Anwar @ 2026-06-16 16:17 UTC (permalink / raw)
To: Konrad Dybcio
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <45d7faac-7c0f-4f89-808e-06129e8420e4@oss.qualcomm.com>
Hi Konrad,
On Mon, Jun 15, 2026 at 02:13:05PM +0200, Konrad Dybcio wrote:
> On 6/11/26 8:37 PM, Mohd Ayaan Anwar wrote:
> > Some SoCs gate the EMAC's path to the System NOC behind dedicated clocks
> > that must be enabled before the DMA can reach memory. Add
> > ethqos_noc_clk_cfg and the corresponding fields in the driver-data and
> > runtime structs so each compatible can declare its own set with per-clock
> > rates. The clocks are acquired during probe and enabled/disabled
> > alongside the existing link clock in ethqos_clks_config().
>
> Sounds like we should use an OPP table instead, we can't just do
> set_rate() on qcom, as that will not propagate the required perf
> state to the clock controller's supplier power domain (i.e. VDDCX)
>
Understood, I will test this out for v2.
Ayaan
^ permalink raw reply
* Re: [Intel-wired-lan] e1000e: Report link down after "Detected Hardware Unit Hang" ?
From: Ruinskiy, Dima @ 2026-06-16 16:20 UTC (permalink / raw)
To: Helge Deller, Andrew Lunn, Helge Deller
Cc: Tony Nguyen, Przemek Kitszel, intel-wired-lan, netdev
In-Reply-To: <9d80ed59-5483-4c33-9d27-52fdf24aac6e@gmx.de>
On 15/06/2026 23:36, Helge Deller wrote:
> On 6/15/26 18:41, Andrew Lunn wrote:
>> On Sun, Jun 14, 2026 at 11:48:08PM +0200, Helge Deller wrote:
>>> I'm regularly facing the known "eno1: Detected Hardware Unit Hang:"
>>> with my on-board intel e1000e NIC hardware.
>>> Since none of the various tips on the internet helped, I had the idea
>>> to set up a master/slave bond networking to fail over to another NIC when
>>> the Intel chip hangs.
>>>
>>> Sadly this doesn't work as intended, because the link of the intel NIC
>>> isn't reported "down", so the failover never happens, unless I manually
>>> start "ifconfig eno1 down".
>>>
>>> My question: Shouldn't the intel NIC ideally report Link Down if we know
>>> it hangs? That way a fail-over should at least happen, right?
>>>
>>> Below is a completely untested patch.
>>> Does it make sense that I try to test and/or develop such a patch, or
>>> are there things I miss?
>>
>> If the interface is dead, then setting the carrier down makes a lot of
>> sense.
>
> That's what I think as well. Thanks for confirming.
>
>> One question i have is, what do you need to do to recover the
>> hardware? Will it correctly set the carrier up when you do the
>> recovery?
>
> The only way I could recover was to unplug the network cable and
> re-insert it.
> I have not tested bringing the NIC down.
> But in both cases the driver will need to re-detect the media & link.
>
>> Also, just looking at your proposed change, it is not clear to me why
>> such an assignment will result in carrier down. It would be good to
>> explain it in the commit message.
>
> Sure. The patch I attached was completely untested and just based on
> the analysis of the flow and how to make the Link possibly report to be
> down.
> Maybe someone knowledgeable of the driver has a better suggestion how to
> report the link down situation in a clean way?
>
> Helge
This does not seem like the right direction to me.
The "Detected Hardware Unit Hang" print does not indicate that the
interface is dead, but that the transmitter is stalled.
This can be due to an unusually high load, or a HW fault / race
condition with another component, etc.
When a hang is detected, the transmitter is stopped with
netif_stop_queue() and eventually ndo_tx_timeout triggers a full reset
to the device, which in many cases recovers it from the hang.
If the hang is persistent, we try to understand the cause and debug it.
Permanently marking the device as 'down' because it hung once is not
going to be the optimal solution.
^ permalink raw reply
* Re: [PATCH net-next 2/2] udp: convert udp_lib_getsockopt to sockopt_t
From: Breno Leitao @ 2026-06-16 16:22 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Willem de Bruijn, Shuah Khan, netdev, linux-kernel,
linux-kselftest, kernel-team
In-Reply-To: <aiy7ZR7Yz2Z4Ioyd@devvm7509.cco0.facebook.com>
On Fri, Jun 12, 2026 at 07:10:15PM -0700, Stanislav Fomichev wrote:
> On 06/12, Breno Leitao wrote:
> > int udp_lib_getsockopt(struct sock *sk, int level, int optname,
> > - char __user *optval, int __user *optlen)
> > + sockopt_t *opt)
> > {
> > struct udp_sock *up = udp_sk(sk);
> > int val, len;
> >
> > - if (get_user(len, optlen))
> > - return -EFAULT;
>
> [..]
>
> > - if (len < 0)
> > - return -EINVAL;
>
> I see this part now in sockopt_init_user, but you mention that it's a
> transitional helper. When we drop it, will we lose this <0 check?
> Maybe keep `if ((int)opt->optlen < 0)` here for backwards
> compatibility?
Good idea. I will do it and respin (once net-next reopens).
Thanks for the review,
--breno
^ permalink raw reply
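The `(int)opt->optlen < 0` check suggested in the review above can be
sketched as follows. The layout of sockopt_t is not shown in this
thread, so struct fake_sockopt and its unsigned optlen field are
assumptions made purely for illustration, as is the helper name.

```c
#include <errno.h>

/* Hypothetical stand-in; the real sockopt_t layout is not shown here. */
struct fake_sockopt {
	unsigned int optlen; /* assumption: the length is stored unsigned */
};

/*
 * Reject lengths that user space passed as a negative int. With an
 * unsigned field, a plain "optlen < 0" is always false, hence the cast.
 * The conversion of out-of-range values to int is implementation-defined
 * in ISO C; gcc and clang wrap modulo 2^32, which this sketch relies on.
 */
int fake_check_optlen(const struct fake_sockopt *opt)
{
	if ((int)opt->optlen < 0)
		return -EINVAL;
	return 0;
}
```

A caller that passed -1 as the length would be rejected with -EINVAL,
matching the behaviour of the removed `if (len < 0)` test, while ordinary
small lengths pass through unchanged.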