Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Eric Dumazet @ 2026-06-24 12:42 UTC (permalink / raw)
  To: Tung Quang Nguyen
  Cc: Simon Horman, Kuniyuki Iwashima, Xin Long, Jon Maloy,
	tipc-discussion@lists.sourceforge.net, netdev@vger.kernel.org,
	eric.dumazet@gmail.com,
	syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com,
	David S . Miller, Jakub Kicinski, Paolo Abeni
In-Reply-To: <GV1P189MB198820F9C95A5FF9A110EA19C6ED2@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>

On Wed, Jun 24, 2026 at 5:04 AM Tung Quang Nguyen
<tung.quang.nguyen@est.tech> wrote:
>
> >Subject: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature
> >dst_cache_destroy()
> >
> >TIPC UDP media bearer teardown calls dst_cache_destroy() on its replicast
> >caches before calling synchronize_net() to wait for concurrent RCU readers
> >(transmitters) to finish:
> >
> >static void cleanup_bearer(struct work_struct *work) { ...
> >       list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
> >               dst_cache_destroy(&rcast->dst_cache);
> >               list_del_rcu(&rcast->list);
> >               kfree_rcu(rcast, rcu);
> >       }
> >...
> >       dst_cache_destroy(&ub->rcast.dst_cache);
> >       udp_tunnel_sock_release(ub->sk);
> >       synchronize_net();
> >...
> >}
> >
> >This is highly buggy because dst_cache_destroy() immediately frees the per-
> >CPU cache memory (free_percpu()) and releases the cached dst entries
> >without any synchronization.
> >
> >If a concurrent transmitter (e.g., tipc_udp_xmit()) is running on another CPU
> >under RCU protection, it can call dst_cache_get() concurrently, leading to:
> >1. Use-After-Free on the per-CPU cache pointer itself (crash).
> >2. "rcuref - imbalanced put()" warning if it attempts to release a
> >   dst that was concurrently released by dst_cache_destroy().
> >
> >Furthermore, calling kfree(ub) immediately after synchronize_net() without
> >closing the socket first (or waiting after closing it) leaves a window where a
> >concurrent receiver (tipc_udp_recv()) could start after synchronize_net(),
> >access ub, and suffer a UAF when kfree(ub) runs.
> >
> >To fix this, we must defer dst_cache_destroy() and kfree(ub) until after we have
> >ensured that no more readers can see the bearer/socket and all existing
> >readers have finished:
> >
> >1. Defer rcast entry destruction (both dst_cache_destroy() and kfree())
> >   to an RCU callback using call_rcu_hurry().
> >   Using call_rcu_hurry() ensures the dst entries are released quickly.
> >
> >2. Release the bearer socket using udp_tunnel_sock_release() (stops
> >   new receive readers).
> >
> >3. Call synchronize_net() to wait for all outstanding RCU readers
> >   (both transmit and receive) to finish.
> >
> >4. Now that it is safe, call dst_cache_destroy() on the main bearer
> >   cache, and free ub.
> >
> >Note: 3) and 4) can be changed later in net-next to also use
> >call_rcu_hurry() and get rid of the synchronize_net() latency.
> >
> >Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
> >Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com
> >Closes:
> >https://lore.kernel.org/netdev/6a396a66.52ae72c2.136ac7.0003.GAE@google.
> >com/T/#u
> >Signed-off-by: Eric Dumazet <edumazet@google.com>
> >Cc: Xin Long <lucien.xin@gmail.com>
> >Cc: Jon Maloy <jmaloy@redhat.com>
> >Cc: tipc-discussion@lists.sourceforge.net
> >---
> >v2: addressed Xin Long feedback
> >v1:
> >https://lore.kernel.org/netdev/CANn89i+dkbrSAwvaWXW7yWMfcwUebuTBLG
> >5T7AGZaZcpVYGyfQ@mail.gmail.com/T/#m7bbeedffe3bedb69e33236410e383
> >3c7ce809850
> > net/tipc/udp_media.c | 15 +++++++++++----
> > 1 file changed, 11 insertions(+), 4 deletions(-)
> >
> >diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index
> >988b8a7f953ad6da860e6190f1f244650f121dce..66f3cb87a0aaaac8f40e8f237a
> >b9a44d539b1cd8 100644
> >--- a/net/tipc/udp_media.c
> >+++ b/net/tipc/udp_media.c
> >@@ -803,6 +803,14 @@ static int tipc_udp_enable(struct net *net, struct
> >tipc_bearer *b,
> >       return err;
> > }
> >
> >+static void rcast_free_rcu(struct rcu_head *rcu) {
> >+      struct udp_replicast *rcast = container_of(rcu, struct udp_replicast,
> >+rcu);
>
> This line is long (over 80 columns). Please break it into 2 lines (refer to linux/Documentation/process/coding-style.rst).

I prefer it this way.

BTW:

commit bdc48fa11e46f867ea4d75fa59ee87a7f48be144
Author: Joe Perches <joe@perches.com>
Date:   Fri May 29 16:12:21 2020 -0700

    checkpatch/coding-style: deprecate 80-column warning

    Yes, staying withing 80 columns is certainly still _preferred_.  But
    it's not the hard limit that the checkpatch warnings imply, and other
    concerns can most certainly dominate.

    Increase the default limit to 100 characters.  Not because 100
    characters is some hard limit either, but that's certainly a "what are
    you doing" kind of value and less likely to be about the occasional
    slightly longer lines.

    Miscellanea:

     - to avoid unnecessary whitespace changes in files, checkpatch will no
       longer emit a warning about line length when scanning files unless
       --strict is also used

     - Add a bit to coding-style about alignment to open parenthesis

    Signed-off-by: Joe Perches <joe@perches.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

^ permalink raw reply

* [PATCH] selftests: wireguard: avoid extglob in netns CPU count
From: Yousef Alhouseen @ 2026-06-24 12:30 UTC (permalink / raw)
  To: Jason
  Cc: shuah, wireguard, netdev, linux-kselftest, linux-kernel,
	Yousef Alhouseen

netns.sh enables extglob before using cpu+([0-9]) to count CPUs, but
bash -n parses the whole script without executing that shopt command. This
makes syntax checks fail even though the script is intended for bash.

Use the ordinary cpu[0-9]* glob instead. It matches the same CPU directory
names without requiring extglob, and lets bash validate the script syntax.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 tools/testing/selftests/wireguard/netns.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index a8f550aec..98b423494 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -22,12 +22,11 @@
 # interfaces in $ns1 and $ns2. See https://www.wireguard.com/netns/ for further
 # details on how this is accomplished.
 set -e
-shopt -s extglob
 
 exec 3>&1
 export LANG=C
 export WG_HIDE_KEYS=never
-NPROC=( /sys/devices/system/cpu/cpu+([0-9]) ); NPROC=${#NPROC[@]}
+NPROC=( /sys/devices/system/cpu/cpu[0-9]* ); NPROC=${#NPROC[@]}
 netns0="wg-test-$$-0"
 netns1="wg-test-$$-1"
 netns2="wg-test-$$-2"
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-24 12:29 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Alexei Starovoitov, Andrew Morton, David Hildenbrand, Jens Axboe,
	Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Paul E. McKenney, Shakeel Butt, Christian König,
	David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
	Philipp Stanner, linux-block, LKML,
	open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
	io-uring, audit, bpf, Network Development, dri-devel,
	linux-perf-use., linux-trace-kernel, kexec, live-patching,
	linux-modules, Linux Crypto Mailing List, Linux Power Management,
	rcu, sched-ext, linux-mm, virtualization, damon,
	clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <ajkSftEbdGoiJXYs@ashevche-desk.local>



在 2026/6/22 18:46, Andy Shevchenko 写道:
> On Mon, Jun 22, 2026 at 02:15:01PM +0800, Kaitao Cheng wrote:
>> 在 2026/6/22 13:28, Alexei Starovoitov 写道:
>>> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> 
> ...
> 
>>>>  block/bfq-iosched.c                 |  17 +-
>>>>  block/blk-cgroup.c                  |  12 +-
>>>>  block/blk-flush.c                   |   4 +-
>>>>  block/blk-iocost.c                  |  18 +-
>>>>  block/blk-mq.c                      |   8 +-
>>>>  block/blk-throttle.c                |   4 +-
>>>>  block/kyber-iosched.c               |   4 +-
>>>>  block/partitions/ldm.c              |   8 +-
>>>>  block/sed-opal.c                    |   4 +-
>>>>  include/linux/list.h                | 269 ++++++++++++++++++++++++----
>>>>  include/linux/llist.h               |  81 +++++++--
>>>>  init/initramfs.c                    |   5 +-
>>>>  io_uring/cancel.c                   |   6 +-
>>>>  io_uring/poll.c                     |   3 +-
>>>>  io_uring/rw.c                       |   4 +-
>>>>  io_uring/timeout.c                  |   8 +-
>>>>  io_uring/uring_cmd.c                |   3 +-
>>>>  kernel/audit_tree.c                 |   4 +-
>>>>  kernel/audit_watch.c                |  16 +-
>>>>  kernel/auditfilter.c                |   4 +-
>>>>  kernel/auditsc.c                    |   4 +-
>>>>  kernel/bpf/arena.c                  |  10 +-
>>>>  kernel/bpf/arraymap.c               |   8 +-
>>>>  kernel/bpf/bpf_local_storage.c      |   3 +-
>>>>  kernel/bpf/bpf_lru_list.c           |  25 ++-
>>>>  kernel/bpf/btf.c                    |  18 +-
>>>>  kernel/bpf/cgroup.c                 |   7 +-
>>>>  kernel/bpf/cpumap.c                 |   4 +-
>>>>  kernel/bpf/devmap.c                 |  10 +-
>>>>  kernel/bpf/helpers.c                |   8 +-
>>>>  kernel/bpf/local_storage.c          |   4 +-
>>>>  kernel/bpf/memalloc.c               |  16 +-
>>>>  kernel/bpf/offload.c                |   8 +-
>>>>  kernel/bpf/states.c                 |   4 +-
>>>>  kernel/bpf/stream.c                 |   4 +-
>>>>  kernel/bpf/verifier.c               |   6 +-
>>>>  kernel/cgroup/cgroup-v1.c           |   4 +-
>>>>  kernel/cgroup/cgroup.c              |  54 +++---
>>>>  kernel/cgroup/dmem.c                |  12 +-
>>>>  kernel/cgroup/rdma.c                |   8 +-
>>>>  kernel/events/core.c                |  44 +++--
>>>>  kernel/events/uprobes.c             |  12 +-
>>>>  kernel/exit.c                       |   8 +-
>>>>  kernel/fail_function.c              |   4 +-
>>>>  kernel/gcov/clang.c                 |   4 +-
>>>>  kernel/irq_work.c                   |   4 +-
>>>>  kernel/kexec_core.c                 |   4 +-
>>>>  kernel/kprobes.c                    |  16 +-
>>>>  kernel/livepatch/core.c             |   4 +-
>>>>  kernel/livepatch/core.h             |   4 +-
>>>>  kernel/liveupdate/kho_block.c       |   4 +-
>>>>  kernel/liveupdate/luo_flb.c         |   4 +-
>>>>  kernel/locking/rwsem.c              |   2 +-
>>>>  kernel/locking/test-ww_mutex.c      |   2 +-
>>>>  kernel/module/main.c                |  11 +-
>>>>  kernel/padata.c                     |   4 +-
>>>>  kernel/power/snapshot.c             |   8 +-
>>>>  kernel/power/wakelock.c             |   4 +-
>>>>  kernel/printk/printk.c              |  11 +-
>>>>  kernel/ptrace.c                     |   4 +-
>>>>  kernel/rcu/rcutorture.c             |   3 +-
>>>>  kernel/rcu/tasks.h                  |   9 +-
>>>>  kernel/rcu/tree.c                   |   6 +-
>>>>  kernel/resource.c                   |   4 +-
>>>>  kernel/sched/core.c                 |   4 +-
>>>>  kernel/sched/ext.c                  |  22 +--
>>>>  kernel/sched/fair.c                 |  28 +--
>>>>  kernel/sched/topology.c             |   4 +-
>>>>  kernel/sched/wait.c                 |   4 +-
>>>>  kernel/seccomp.c                    |   4 +-
>>>>  kernel/signal.c                     |  11 +-
>>>>  kernel/smp.c                        |   4 +-
>>>>  kernel/taskstats.c                  |   8 +-
>>>>  kernel/time/clockevents.c           |   6 +-
>>>>  kernel/time/clocksource.c           |   4 +-
>>>>  kernel/time/posix-cpu-timers.c      |   4 +-
>>>>  kernel/time/posix-timers.c          |   3 +-
>>>>  kernel/torture.c                    |   3 +-
>>>>  kernel/trace/bpf_trace.c            |   4 +-
>>>>  kernel/trace/ftrace.c               |  49 +++--
>>>>  kernel/trace/ring_buffer.c          |  25 ++-
>>>>  kernel/trace/trace.c                |  12 +-
>>>>  kernel/trace/trace_dynevent.c       |   6 +-
>>>>  kernel/trace/trace_dynevent.h       |   5 +-
>>>>  kernel/trace/trace_events.c         |  35 ++--
>>>>  kernel/trace/trace_events_filter.c  |   4 +-
>>>>  kernel/trace/trace_events_hist.c    |   8 +-
>>>>  kernel/trace/trace_events_trigger.c |  17 +-
>>>>  kernel/trace/trace_events_user.c    |  16 +-
>>>>  kernel/trace/trace_stat.c           |   4 +-
>>>>  kernel/user-return-notifier.c       |   3 +-
>>>>  kernel/workqueue.c                  |  16 +-
>>>>  mm/backing-dev.c                    |   8 +-
>>>>  mm/balloon.c                        |   8 +-
>>>>  mm/cma.c                            |   4 +-
>>>>  mm/compaction.c                     |   4 +-
>>>>  mm/damon/core.c                     |   4 +-
>>>>  mm/damon/sysfs-schemes.c            |   4 +-
>>>>  mm/dmapool.c                        |   4 +-
>>>>  mm/huge_memory.c                    |   8 +-
>>>>  mm/hugetlb.c                        |  56 +++---
>>>>  mm/hugetlb_vmemmap.c                |  16 +-
>>>>  mm/khugepaged.c                     |  14 +-
>>>>  mm/kmemleak.c                       |   7 +-
>>>>  mm/ksm.c                            |  25 +--
>>>>  mm/list_lru.c                       |   4 +-
>>>>  mm/memcontrol-v1.c                  |   8 +-
>>>>  mm/memory-failure.c                 |  12 +-
>>>>  mm/memory-tiers.c                   |   4 +-
>>>>  mm/migrate.c                        |  23 ++-
>>>>  mm/mmu_notifier.c                   |   9 +-
>>>>  mm/page_alloc.c                     |   8 +-
>>>>  mm/page_reporting.c                 |   2 +-
>>>>  mm/percpu.c                         |  11 +-
>>>>  mm/pgtable-generic.c                |   4 +-
>>>>  mm/rmap.c                           |  10 +-
>>>>  mm/shmem.c                          |   9 +-
>>>>  mm/slab_common.c                    |  14 +-
>>>>  mm/slub.c                           |  33 ++--
>>>>  mm/swapfile.c                       |   4 +-
>>>>  mm/userfaultfd.c                    |  12 +-
>>>>  mm/vmalloc.c                        |  24 +--
>>>>  mm/vmscan.c                         |   7 +-
>>>>  mm/zsmalloc.c                       |   4 +-
>>>>  124 files changed, 875 insertions(+), 681 deletions(-)
>>>
>>> Not sure what you were thinking, but this diff stat
>>> is not landable.
>>
>> [PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
>> be merged directly. They are also compatible with the old API.
>> [PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
>> replacements and do not change any functional logic. They can be
>> left unmerged for now; individual modules can pick them up later
>> if needed.
>>
>> In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
>> during the day when he prepares -rc1, it's fine."
> 
> Yes, but you need to get his blessing first to go with this.
> Have you communicated with him on this?

Not yet, because the overall approach is still not mature. People
have different opinions on the implementation details and on how
to move this forward, so I think we should iterate through a few
versions first before making a final decision.

>> Even so, the
>> changes in this patch series are indeed quite large and touch
>> almost every subsystem. I have only converted part of them for
>> now, so I wanted to send this out first and see what people think.
> 
> That's why it's better to provide a script to convert (e.g., coccinelle)
> instead of tons of patches.

I tried writing conversion scripts with Coccinelle, but there were
always cases that got missed. In contrast, I found that using AI
for focused replacements was actually more efficient.

As David Hildenbrand mentioned, "If we decide we want this, I guess
we should target per-subsystem conversions." I would like to provide
the new interface first; adapting each subsystem on demand later may
be easier to achieve.
-- 
Thanks
Kaitao Cheng


^ permalink raw reply

* Re: [PATCH net v7 3/4] iavf: send MAC change request synchronously
From: Przemek Kitszel @ 2026-06-24 12:28 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez, netdev
  Cc: intel-wired-lan, aleksandr.loktionov, jacob.e.keller, horms,
	anthony.l.nguyen, davem, edumazet, kuba, pabeni, stable
In-Reply-To: <20260623101800.991293-4-jtornosm@redhat.com>

On 6/23/26 12:17, Jose Ignacio Tornos Martinez wrote:
> After commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs
> operations"), iavf_set_mac() is called with the netdev instance lock
> already held.
> 
> The function queues a MAC address change request via
> iavf_replace_primary_mac() and then waits for completion. However, in
> the current flow, the actual virtchnl message is sent by the watchdog
> task, which also needs to acquire the netdev lock to run. Additionally,
> the adminq_task which processes virtchnl responses also needs the netdev
> lock.
> 
> This creates a deadlock scenario:
> 1. iavf_set_mac() holds netdev lock and waits for MAC change
> 2. Watchdog needs netdev lock to send the request -> blocked
> 3. Even if request is sent, adminq_task needs netdev lock to process
>     PF response -> blocked
> 4. MAC change times out after 2.5 seconds
> 5. iavf_set_mac() returns -EAGAIN
> 
> This particularly affects VFs during bonding setup when multiple VFs are
> enslaved in quick succession.
> 
> Fix by implementing a synchronous MAC change operation similar to the
> approach used in commit fdadbf6e84c4 ("iavf: fix incorrect reset handling
> in callbacks").
> 
> The solution:
> 1. Send the virtchnl ADD_ETH_ADDR message directly (not via watchdog)
> 2. Poll the admin queue hardware directly for responses
> 3. Process all received messages (including non-MAC messages)
> 4. Return when MAC change completes or times out
> 
> A new generic function iavf_poll_virtchnl_response() is introduced that
> can be reused for any future synchronous virtchnl operations. It takes a
> callback to check completion, allowing flexible condition checking.
> 
> This allows the operation to complete synchronously while holding
> netdev_lock, without relying on watchdog or adminq_task. The function
> can sleep for up to 2.5 seconds polling hardware, but this is acceptable
> since netdev_lock is per-device and only serializes operations on the
> same interface.
> 
> To support this, change iavf_add_ether_addrs() to return an error code
> instead of void, allowing callers to detect failures. Additionally,
> export iavf_mac_add_reject() to enable proper rollback on local failures
> (timeouts, send errors) - PF rejections are already handled automatically
> by iavf_virtchnl_completion().
> 
> Remove vc_waitqueue entirely because iavf_set_mac was the only waiter on
> this waitqueue and after the changes it is not needed.
> 
> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
> cc: stable@vger.kernel.org
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> v7: Rebase on current net tree
>      Remove the multi-batch processing loop from version 6 according to Przemek
>      Kitszel review: the loop cannot work without polling between iterations
>      since the second call would fail the current_op check. Multi-batch scenario
>      is extremely rare; send first batch and let watchdog handle remainder as v5
>      did

I was fine with v5 already, so:
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

(we will see if Sashiko reads changelog notes (--- section here))

^ permalink raw reply

* [PATCH net] sctp: fix SCTP_RESET_STREAMS stream list length limit
From: Yousef Alhouseen @ 2026-06-24 12:22 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Xin Long
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-sctp, netdev, linux-kernel, Yousef Alhouseen

SCTP_RESET_STREAMS carries a flexible array of u16 stream IDs, but the
optlen clamps treat USHRT_MAX as a byte count and then multiply
sizeof(__u16) by the fixed header size.

That caps the copied and validated option buffer at about 64 KiB, which
rejects valid requests containing more than about half of the u16 stream
ID range.

Use struct_size_t() for the maximum struct sctp_reset_streams layout
instead, so the bound matches the flexible array described by
srs_number_streams.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 net/sctp/socket.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 66e12fb0c..b8f13044a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4111,8 +4111,9 @@ static int sctp_setsockopt_reset_streams(struct sock *sk,
 	if (optlen < sizeof(*params))
 		return -EINVAL;
 	/* srs_number_streams is u16, so optlen can't be bigger than this. */
-	optlen = min_t(unsigned int, optlen, USHRT_MAX +
-					     sizeof(__u16) * sizeof(*params));
+	optlen = min_t(unsigned int, optlen,
+		       struct_size_t(struct sctp_reset_streams, srs_stream_list,
+				     USHRT_MAX));
 
 	if (params->srs_number_streams * sizeof(__u16) >
 	    optlen - sizeof(*params))
@@ -4598,8 +4599,8 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
 	if (optlen > 0) {
 		/* Trim it to the biggest size sctp sockopt may need if necessary */
 		optlen = min_t(unsigned int, optlen,
-			       PAGE_ALIGN(USHRT_MAX +
-					  sizeof(__u16) * sizeof(struct sctp_reset_streams)));
+			       PAGE_ALIGN(struct_size_t(struct sctp_reset_streams,
+							srs_stream_list, USHRT_MAX)));
 		kopt = memdup_sockptr(optval, optlen);
 		if (IS_ERR(kopt))
 			return PTR_ERR(kopt);
-- 
2.54.0


^ permalink raw reply related

* [PATCH 5.15/6.1/6.6] af_unix: Reject SIOCATMARK on non-stream sockets
From: Alexander Martyniuk @ 2026-06-24 15:16 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Alexander Martyniuk, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Jann Horn, Lee Jones, Sasha Levin, Rao Shoaib,
	netdev, linux-kernel, stable, Yuan Tan, Yifan Wu, Juefei Pu,
	Xin Liu, Jiexun Wang, Ren Wei

From: Jiexun Wang <wangjiexun2025@gmail.com>

commit d119775f2bad827edc28071c061fdd4a91f889a5 upstream.

SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.

In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.

Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
---
Backport fix for CVE-2026-52928
 net/unix/af_unix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 32892a40d139..8bd78cad69e7 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3139,6 +3139,9 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 			struct sk_buff *skb;
 			int answ = 0;
 
+			if (sk->sk_type != SOCK_STREAM)
+				return -EOPNOTSUPP;
+
 			skb = skb_peek(&sk->sk_receive_queue);
 			if (skb && skb == READ_ONCE(unix_sk(sk)->oob_skb))
 				answ = 1;
-- 
2.43.0


^ permalink raw reply related

* [syzbot ci] Re: net: clear transport header during tunnel decapsulation
From: syzbot ci @ 2026-06-24 12:14 UTC (permalink / raw)
  To: davem, dsahern, edumazet, eric.dumazet, horms, idosch, kuba,
	netdev, pabeni, syzbot
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260624073209.3703492-1-edumazet@google.com>

syzbot ci has tested the following series

[v1] net: clear transport header during tunnel decapsulation
https://lore.kernel.org/all/20260624073209.3703492-1-edumazet@google.com
* [PATCH net] net: clear transport header during tunnel decapsulation

and found the following issue:
WARNING in geneve_udp_encap_recv

Full report is available here:
https://ci.syzbot.org/series/1f6dc47e-354f-4904-bc18-c2b7ea4d79b2

***

WARNING in geneve_udp_encap_recv

tree:      net
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net.git
base:      d87363b0edfc7504ff2b144fe4cdd8154f90f42e
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/7bb83c16-d55d-4d99-8c2b-6050e0022ef6/config

------------[ cut here ]------------
!skb_transport_header_was_set(skb)
WARNING: ./include/linux/skbuff.h:3094 at geneve_udp_encap_recv+0x26ed/0x4130, CPU#1: kworker/1:3/5072
Modules linked in:
CPU: 1 UID: 0 PID: 5072 Comm: kworker/1:3 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: mld mld_ifc_work
RIP: 0010:geneve_udp_encap_recv+0x26ed/0x4130
Code: c8 00 00 00 0f b6 04 01 84 c0 0f 85 f2 18 00 00 48 8b 7c 24 78 66 44 89 6f 0a e8 7e 9b 7a 03 e9 8c 00 00 00 e8 f4 fc 33 fb 90 <0f> 0b 90 e9 2e e3 ff ff 49 83 c6 06 4c 89 f0 48 c1 e8 03 48 b9 00
RSP: 0018:ffffc90000a08620 EFLAGS: 00010246
RAX: ffffffff8691f92c RBX: ffff8881bcc8b5d0 RCX: ffff88816a7abb80
RDX: 0000000000000100 RSI: 000000000000ffff RDI: 000000000000ffff
RBP: ffffc90000a08790 R08: ffffffff903114f7 R09: 1ffffffff206229e
R10: dffffc0000000000 R11: fffffbfff206229f R12: ffff888109f6a108
R13: 1ffff110213ed5df R14: 0000000000000010 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8882a927b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6956ea5440 CR3: 000000016f55c000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
 udp_queue_rcv_one_skb+0xfc5/0x10e0
 udp_unicast_rcv_skb+0x21a/0x3a0
 udp_rcv+0xecb/0x1db0
 ip_protocol_deliver_rcu+0x27e/0x440
 ip_local_deliver_finish+0x3bb/0x6f0
 NF_HOOK+0x336/0x3c0
 NF_HOOK+0x336/0x3c0
 process_backlog+0xa34/0x1860
 __napi_poll+0xaa/0x330
 net_rx_action+0x61d/0xf50
 handle_softirqs+0x225/0x840
 do_softirq+0x76/0xd0
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130
 __dev_queue_xmit+0x1ed7/0x37f0
 ip6_output+0x337/0x540
 NF_HOOK+0x177/0x4f0
 mld_sendpack+0x890/0xe10
 mld_ifc_work+0x839/0xe70
 process_scheduled_works+0xa8e/0x14e0
 worker_thread+0xa47/0xfb0
 kthread+0x388/0x470
 ret_from_fork+0x514/0xb70
 ret_from_fork_asm+0x1a/0x30
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply

* Re: [PATCH v12 02/12] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Nikolay Borisov @ 2026-06-24 12:12 UTC (permalink / raw)
  To: Pawan Gupta, x86, Jon Kohler, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260622-vmscape-bhb-v12-2-76cbda0ae3e5@linux.intel.com>



On 23.06.26 г. 20:33 ч., Pawan Gupta wrote:
> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> the Branch History Buffer (BHB). On Alder Lake and newer parts this
> sequence is not sufficient because it doesn't clear enough entries. This
> was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> in the kernel.
> 
> Now with VMSCAPE (BHI variant) it is also required to isolate branch
> history between guests and userspace. Since BHI_DIS_S only protects the
> kernel, the newer CPUs also use IBPB.
> 
> A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> But it currently does not clear enough BHB entries to be effective on newer
> CPUs with larger BHB. At boot, dynamically set the loop count of
> clear_bhb_loop() such that it is effective on newer CPUs too.
> 
> Introduce global loop counts, initializing them with appropriate value
> based on the hardware feature X86_FEATURE_BHI_CTRL.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

Although AI brings up a valid argument about whether guests should be 
pessimized and fallback to the longer sequence ?

^ permalink raw reply

* RE: [PATCH v2 net 2/2] tipc: avoid busy looping in tipc_exit_net()
From: Tung Quang Nguyen @ 2026-06-24 12:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Horman, Kuniyuki Iwashima, Xin Long, Jon Maloy,
	tipc-discussion@lists.sourceforge.net, netdev@vger.kernel.org,
	eric.dumazet@gmail.com, David S . Miller, Jakub Kicinski,
	Paolo Abeni
In-Reply-To: <20260623173030.2925059-3-edumazet@google.com>

>Subject: [PATCH v2 net 2/2] tipc: avoid busy looping in tipc_exit_net()
>
>Blamed commit introduced a busy-wait loop in tipc_exit_net() to wait for
>pending UDP bearer cleanup works to complete:
>
>       while (atomic_read(&tn->wq_count))
>               cond_resched();
>
>This loop can busy-wait for a long time if cond_resched() is a NOP. This
>typically happens if the netns exit is executed by a high priority task, or under
>kernels configured without preemption (CONFIG_PREEMPT_NONE). In such
>cases, it wastes CPU cycles and can lead to soft lockups.
>
>Fix this by replacing the busy loop with wait_var_event(), allowing the thread
>to sleep properly until the work queue count reaches zero.
>
>Accordingly, update cleanup_bearer() to use atomic_dec_and_test() and
>wake_up_var() to wake up the waiter when the count drops to zero.
>
>This uses the global wait queue hash table, avoiding the need to bloat struct
>tipc_net with a wait_queue_head_t. The atomic_dec_and_test() provides the
>necessary memory barrier to ensure the wakeup is not missed.
>
>Fixes: 04c26faa51d1 ("tipc: wait and exit until all work queues are done")
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Xin Long <lucien.xin@gmail.com>
>Cc: Jon Maloy <jmaloy@redhat.com>
>Cc: tipc-discussion@lists.sourceforge.net
>---
> net/tipc/core.c      | 4 ++--
> net/tipc/udp_media.c | 4 +++-
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
>diff --git a/net/tipc/core.c b/net/tipc/core.c index
>1ddecea1df6e9100334c47a28ff6c065292fb9ad..315975c3be8186784e9c44c9ff
>69d62c17ffd4b9 100644
>--- a/net/tipc/core.c
>+++ b/net/tipc/core.c
>@@ -45,6 +45,7 @@
> #include "crypto.h"
>
> #include <linux/module.h>
>+#include <linux/wait_bit.h>
>
> /* configurable TIPC parameters */
> unsigned int tipc_net_id __read_mostly; @@ -118,8 +119,7 @@ static void
>__net_exit tipc_exit_net(struct net *net)  #ifdef CONFIG_TIPC_CRYPTO
> 	tipc_crypto_stop(&tipc_net(net)->crypto_tx);
> #endif
>-	while (atomic_read(&tn->wq_count))
>-		cond_resched();
>+	wait_var_event(&tn->wq_count, atomic_read(&tn->wq_count) == 0);

It could be nicer if you change to this simple call:
wait_var_event(&tn->wq_count, !atomic_read(&tn->wq_count));

> }
>
> static void __net_exit tipc_pernet_pre_exit(struct net *net) diff --git
>a/net/tipc/udp_media.c b/net/tipc/udp_media.c index
>66f3cb87a0aaaac8f40e8f237ab9a44d539b1cd8..62ae7f5b58409c89798c915de
>e752ac42487581f 100644
>--- a/net/tipc/udp_media.c
>+++ b/net/tipc/udp_media.c
>@@ -40,6 +40,7 @@
> #include <linux/igmp.h>
> #include <linux/kernel.h>
> #include <linux/workqueue.h>
>+#include <linux/wait_bit.h>
> #include <linux/list.h>
> #include <net/sock.h>
> #include <net/ip.h>
>@@ -830,7 +831,8 @@ static void cleanup_bearer(struct work_struct *work)
> 	synchronize_net();
>
> 	dst_cache_destroy(&ub->rcast.dst_cache);
>-	atomic_dec(&tn->wq_count);
>+	if (atomic_dec_and_test(&tn->wq_count))
>+		wake_up_var(&tn->wq_count);
> 	kfree(ub);
> }
>
>--
>2.55.0.rc0.799.gd6f94ed593-goog
>


^ permalink raw reply

* RE: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Tung Quang Nguyen @ 2026-06-24 12:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Horman, Kuniyuki Iwashima, Xin Long, Jon Maloy,
	tipc-discussion@lists.sourceforge.net, netdev@vger.kernel.org,
	eric.dumazet@gmail.com,
	syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com,
	David S . Miller, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20260623173030.2925059-2-edumazet@google.com>

>Subject: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature
>dst_cache_destroy()
>
>TIPC UDP media bearer teardown calls dst_cache_destroy() on its replicast
>caches before calling synchronize_net() to wait for concurrent RCU readers
>(transmitters) to finish:
>
>static void cleanup_bearer(struct work_struct *work) { ...
>	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
>		dst_cache_destroy(&rcast->dst_cache);
>		list_del_rcu(&rcast->list);
>		kfree_rcu(rcast, rcu);
>	}
>...
>	dst_cache_destroy(&ub->rcast.dst_cache);
>	udp_tunnel_sock_release(ub->sk);
>	synchronize_net();
>...
>}
>
>This is highly buggy because dst_cache_destroy() immediately frees the per-
>CPU cache memory (free_percpu()) and releases the cached dst entries
>without any synchronization.
>
>If a concurrent transmitter (e.g., tipc_udp_xmit()) is running on another CPU
>under RCU protection, it can call dst_cache_get() concurrently, leading to:
>1. Use-After-Free on the per-CPU cache pointer itself (crash).
>2. "rcuref - imbalanced put()" warning if it attempts to release a
>   dst that was concurrently released by dst_cache_destroy().
>
>Furthermore, calling kfree(ub) immediately after synchronize_net() without
>closing the socket first (or waiting after closing it) leaves a window where a
>concurrent receiver (tipc_udp_recv()) could start after synchronize_net(),
>access ub, and suffer a UAF when kfree(ub) runs.
>
>To fix this, we must defer dst_cache_destroy() and kfree(ub) until after we have
>ensured that no more readers can see the bearer/socket and all existing
>readers have finished:
>
>1. Defer rcast entry destruction (both dst_cache_destroy() and kfree())
>   to an RCU callback using call_rcu_hurry().
>   Using call_rcu_hurry() ensures the dst entries are released quickly.
>
>2. Release the bearer socket using udp_tunnel_sock_release() (stops
>   new receive readers).
>
>3. Call synchronize_net() to wait for all outstanding RCU readers
>   (both transmit and receive) to finish.
>
>4. Now that it is safe, call dst_cache_destroy() on the main bearer
>   cache, and free ub.
>
>Note: 3) and 4) can be changed later in net-next to also use
>call_rcu_hurry() and get rid of the synchronize_net() latency.
>
>Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
>Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com
>Closes:
>https://lore.kernel.org/netdev/6a396a66.52ae72c2.136ac7.0003.GAE@google.
>com/T/#u
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Xin Long <lucien.xin@gmail.com>
>Cc: Jon Maloy <jmaloy@redhat.com>
>Cc: tipc-discussion@lists.sourceforge.net
>---
>v2: addressed Xin Long feedback
>v1:
>https://lore.kernel.org/netdev/CANn89i+dkbrSAwvaWXW7yWMfcwUebuTBLG
>5T7AGZaZcpVYGyfQ@mail.gmail.com/T/#m7bbeedffe3bedb69e33236410e383
>3c7ce809850
> net/tipc/udp_media.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
>diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index
>988b8a7f953ad6da860e6190f1f244650f121dce..66f3cb87a0aaaac8f40e8f237a
>b9a44d539b1cd8 100644
>--- a/net/tipc/udp_media.c
>+++ b/net/tipc/udp_media.c
>@@ -803,6 +803,14 @@ static int tipc_udp_enable(struct net *net, struct
>tipc_bearer *b,
> 	return err;
> }
>
>+static void rcast_free_rcu(struct rcu_head *rcu) {
>+	struct udp_replicast *rcast = container_of(rcu, struct udp_replicast,
>+rcu);

This line is long (over 80 columns). Please break it into 2 lines (refer to linux/Documentation/process/coding-style.rst).

>+
>+	dst_cache_destroy(&rcast->dst_cache);
>+	kfree(rcast);
>+}
>+
> /* cleanup_bearer - break the socket/bearer association */  static void
>cleanup_bearer(struct work_struct *work)  { @@ -811,18 +819,17 @@ static
>void cleanup_bearer(struct work_struct *work)
> 	struct tipc_net *tn;
>
> 	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
>-		dst_cache_destroy(&rcast->dst_cache);
> 		list_del_rcu(&rcast->list);
>-		kfree_rcu(rcast, rcu);
>+		call_rcu_hurry(&rcast->rcu, rcast_free_rcu);
> 	}
>
> 	tn = tipc_net(sock_net(ub->sk));
>
>-	dst_cache_destroy(&ub->rcast.dst_cache);
> 	udp_tunnel_sock_release(ub->sk);
>
>-	/* Note: could use a call_rcu() to avoid another synchronize_net() */
> 	synchronize_net();
>+
>+	dst_cache_destroy(&ub->rcast.dst_cache);
> 	atomic_dec(&tn->wq_count);
> 	kfree(ub);
> }
>--
>2.55.0.rc0.799.gd6f94ed593-goog
>


^ permalink raw reply

* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Greg Kroah-Hartman @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexander Martyniuk, stable, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <ajvEDFOlP7Bqb-3j@chamomile>

On Wed, Jun 24, 2026 at 01:48:28PM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> Thanks but why only 5.10?

It's already in the following releases:
	5.15.210 6.1.176 6.6.143 6.12.94 6.18.36 7.0.13 7.1

so 5.10.y seems like the only one missing it at the moment.

thanks,

greg k-h

^ permalink raw reply

* RE: [PATCH net] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Tung Quang Nguyen @ 2026-06-24 11:56 UTC (permalink / raw)
  To: Samuel Page
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, Jon Maloy
In-Reply-To: <20260623135443.3662041-1-sam@bynar.io>

>Subject: [PATCH net] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
>
>A broadcast PROTOCOL/STATE_MSG can carry a Gap ACK blocks record in its
>data area. tipc_get_gap_ack_blks() only verifies that the record's len field is
>self-consistent with its ugack_cnt/bgack_cnt counts (sz == struct_size(p, gacks,
>ugack_cnt + bgack_cnt)); it does not check that the record actually fits in the
>message data area, msg_data_sz().
>
>The unicast caller tipc_link_proto_rcv() bounds it ("if (glen > dlen) break;"), but
>the broadcast caller tipc_bcast_sync_rcv() discards the returned size, so
>tipc_link_advance_transmq() copies the record off the receive skb with an
>attacker-controlled count:
>
>	this_ga = kmemdup(ga, struct_size(ga, gacks, ga->bgack_cnt),
>			  GFP_ATOMIC);
>
>A TIPC neighbour that negotiated TIPC_GAP_ACK_BLOCK triggers it with one
>ordinary broadcast STATE_MSG (msg_bc_ack_invalid() clear), sized so its data
>area is short, carrying a Gap ACK record with len = 0x400, bgack_cnt = 0xff and
>ugack_cnt = 0. len then equals struct_size(p, gacks, 255), so the consistency
>check passes and ga is non-NULL; kmemdup() reads struct_size(ga, gacks, 255)
>= 1024 bytes out of the much smaller skb:
>
>  BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x48/0x60
>  Read of size 1024 at addr ffff0000c7030d38 by task poc864/69
>  Call trace:
>   kmemdup_noprof+0x48/0x60
>   tipc_link_advance_transmq+0x86c/0xb80
>   tipc_link_bc_ack_rcv+0x19c/0x1e0
>   tipc_bcast_sync_rcv+0x1c4/0x2c4
>   tipc_rcv+0x85c/0x1340
>   tipc_l2_rcv_msg+0xac/0x104
>  The buggy address belongs to the object at ffff0000c7030d00
>   which belongs to the cache skbuff_small_head of size 704
>  The buggy address is located 56 bytes inside of
>   allocated 704-byte region [ffff0000c7030d00, ffff0000c7030fc0)
>
>The copied-out bytes are subsequently consumed as gap/ack values, but the
>read is already out of bounds at the kmemdup() regardless of how they are
>used.
>
>Apply the same bound the unicast path uses to the broadcast caller: drop the
>Gap ACK blocks when the reported size exceeds the message data size.
>A NULL ga is already the defined "no Gap ACK blocks" case, so well-formed
>state messages are unaffected.
>
>Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
>Cc: stable@vger.kernel.org
>Assisted-by: Bynario AI
>Signed-off-by: Samuel Page <sam@bynar.io>
>---
>Before posting I found an earlier thread for what looks like the same (or a very
>closely related) issue:
>
>
>https://lore.kernel.org/netdev/1316452e465e9a96fce44ec15130a14f3872149f.
>1775809727.git.caoruide123@gmail.com/
>  [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
>
>That one added the validation inside tipc_get_gap_ack_blks() and the thread
>stalled on whether the extra checks were redundant. This patch instead adds,
>on the broadcast caller, only the same bound the unicast path already applies,
>and includes the KASAN reproducer that was asked for there.
>
> net/tipc/bcast.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index
>76a1585d3f6b..61c83bd95755 100644
>--- a/net/tipc/bcast.c
>+++ b/net/tipc/bcast.c
>@@ -502,6 +502,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link
>*l,
> 	struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
> 	struct tipc_gap_ack_blks *ga;
> 	struct sk_buff_head xmitq;
>+	u16 glen;
> 	int rc = 0;
>
> 	__skb_queue_head_init(&xmitq);
>@@ -510,7 +511,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link
>*l,
> 	if (msg_type(hdr) != STATE_MSG) {
> 		tipc_link_bc_init_rcv(l, hdr);
> 	} else if (!msg_bc_ack_invalid(hdr)) {
>-		tipc_get_gap_ack_blks(&ga, l, hdr, false);
>+		/* Validate Gap ACK blocks, drop if invalid */
>+		glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
>+		if (glen > msg_data_sz(hdr))
>+			ga = NULL;

This is wrong because the skb is not dropped as it should be.
Note that 'ga' is NULL just for legacy TIPC that does not support Selective ACK.
To correctly fix this issue, you need to set a flag (for example, a Boolean output parameter) to TRUE instead of 'ga=NULL'.
Then, immediately return and repeatedly pass the flag to tipc_rcv() in order to drop the skb.

> 		if (!sysctl_tipc_bc_retruni)
> 			retrq = &xmitq;
> 		rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>
>base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
>--
>2.54.0


^ permalink raw reply

* Re: [PATCH bpf-next v5 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: Avinash Duduskar @ 2026-06-24 11:54 UTC (permalink / raw)
  To: toke, ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	dsahern
In-Reply-To: <87pl1gcmgf.fsf@toke.dk>

> > +	if (flags & BPF_FIB_LOOKUP_VLAN)
> > +		return -EINVAL;
> > +
>
> This is fine, but we should probably reject the input flag as well in
> the next patch (for symmetry).

I dug into this and I don't think the two are symmetric. The egress
reject is right for exactly the reason you gave: in tc you can redirect
to the VLAN device directly, so reducing the egress to the physical
parent is only needed for XDP. But that is a transmit argument, and
VLAN_INPUT never touches the egress or redirect side. It only sets the
lookup's ingress (flowi_iif), which picks the iif policy rule and the VRF
table, and there is no XDP-only constraint there for the symmetry to
mirror.

tc also has a real user for it. In __netif_receive_skb_core() the tcx
ingress hook runs before vlan_do_receive() demuxes the frame, so a clsact
program on the physical port sees a tagged frame with skb->dev still the
physical device and the tag in skb->vlan_tci. That is exactly the
physical-ifindex-plus-tag input VLAN_INPUT takes, and it wants the
subinterface-scoped answer. The 2/3 selftest already runs the VLAN_INPUT
cases on the tc path, including the VRF-table-selection ones, and they
pass, so this isn't a theoretical tc path.

So I would keep VLAN_INPUT allowed on both. If you would rather hold a
uniform "both VLAN flags are XDP-only" line under the same
restrict-now-relax-later rule as the TBID and OUTPUT combos, I will add
the reject, but unlike the egress case it removes a working tc path
rather than one tc cannot use. Happy either way, just let me know.

Thanks for the review.

-Avinash

^ permalink raw reply

* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Pablo Neira Ayuso @ 2026-06-24 11:51 UTC (permalink / raw)
  To: Alexander Martyniuk
  Cc: stable, Greg Kroah-Hartman, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <ajvEDFOlP7Bqb-3j@chamomile>

BTW, fixing Cc: to netfilter-devel@vger.kernel.org

On Wed, Jun 24, 2026 at 01:48:31PM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> Thanks but why only 5.10?
> 
> On Wed, Jun 24, 2026 at 02:01:15PM +0000, Alexander Martyniuk wrote:
> > From: Xiang Mei <xmei5@asu.edu>
> > 
> > commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream
> > 
> > The fallback path of dump_mac_header() guards the MAC header access
> > only with "skb->mac_header != skb->network_header", without checking
> > skb_mac_header_was_set(). When the MAC header is unset, mac_header is
> > 0xffff, so the test passes and skb_mac_header(skb) returns
> > skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
> > dev->hard_header_len bytes out of bounds into the kernel log.
> > 
> > This is reachable via the netdev logger: nf_log_unknown_packet() calls
> > dump_mac_header() unconditionally, and an skb sent through AF_PACKET
> > with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
> > unset (__dev_queue_xmit(), which would reset it, is bypassed).
> > 
> > Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
> > uses, and replace the open-coded MAC header length test with
> > skb_mac_header_len(). Only skbs with an unset MAC header are affected;
> > valid ones are dumped as before.
> > 
> >  BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> >  Read of size 1 at addr ffff88800ea49d3f by task exploit/148
> >  Call Trace:
> >   kasan_report (mm/kasan/report.c:595)
> >   dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> >   nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
> >   nf_log_packet (net/netfilter/nf_log.c:260)
> >   nft_log_eval (net/netfilter/nft_log.c:60)
> >   nft_do_chain (net/netfilter/nf_tables_core.c:285)
> >   nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
> >   nf_hook_slow (net/netfilter/core.c:619)
> >   nf_hook_direct_egress (net/packet/af_packet.c:257)
> >   packet_xmit (net/packet/af_packet.c:280)
> >   packet_sendmsg (net/packet/af_packet.c:3114)
> >   __sys_sendto (net/socket.c:2265)
> > 
> > Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Assisted-by: Claude:claude-opus-4-8
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
> > ---
> > Backport fix for CVE-2026-52942
> >  net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
> > index d07583fac8f8..d6164e8e2c73 100644
> > --- a/net/ipv4/netfilter/nf_log_ipv4.c
> > +++ b/net/ipv4/netfilter/nf_log_ipv4.c
> > @@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
> >  
> >  fallback:
> >  	nf_log_buf_add(m, "MAC=");
> > -	if (dev->hard_header_len &&
> > -	    skb->mac_header != skb->network_header) {
> > +	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> > +	    skb_mac_header_len(skb) != 0) {
> >  		const unsigned char *p = skb_mac_header(skb);
> >  		unsigned int i;
> >  
> > -- 
> > 2.43.0
> > 

^ permalink raw reply

* Re: [PATCH 3/7] dt-bindings: net: rockchip-dwmac: Allow 9 clocks
From: Heiko Stübner @ 2026-06-24 11:50 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wu, Maxime Coquelin, Alexandre Torgue, Yanan He
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32, Yanan He
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-3-5aef608a3f64@gmail.com>

Am Mittwoch, 24. Juni 2026, 10:44:40 Mitteleuropäische Sommerzeit schrieb Yanan He:
> RV1126 has a separate GMAC Ethernet output clock used as the external
> PHY reference clock. This clock is described in addition to the existing
> GMAC clocks.

AS stated in the driver patch, this is the input clock for the phy
and ethernet phys are perfectly capable of handling their input
clocks. See reference in the driver patch.


Heiko


> Signed-off-by: Yanan He <grumpycat921013@gmail.com>
> ---
>  Documentation/devicetree/bindings/net/rockchip-dwmac.yaml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> index 80c252845349..86a7e83675ae 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> @@ -71,7 +71,7 @@ properties:
>  
>    clocks:
>      minItems: 4
> -    maxItems: 8
> +    maxItems: 9
>  
>    clock-names:
>      contains:
> 
> 





^ permalink raw reply

* Re: [PATCH 1/1] xfrm: nat_keepalive: avoid double free on send error
From: Steffen Klassert @ 2026-06-24 11:49 UTC (permalink / raw)
  To: Qianyu Luo; +Cc: Eyal Birger, Ren Wei, netdev, herbert, davem, yuantan098, bird
In-Reply-To: <CAPOno=q0NxA_LOJ3QaM9Jgk1oM4ZQiW9igipGfxV7+yGvjr+0g@mail.gmail.com>

On Fri, Jun 19, 2026 at 05:06:40PM +0800, Qianyu Luo wrote:
> On Fri, Jun 19, 2026 at 1:21 AM Eyal Birger <eyal.birger@gmail.com> wrote:
> >
> > On Thu, Jun 18, 2026 at 9:36 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> > >
> > > From: Qianyu Luo <qianyuluo3@gmail.com>
> > >
> > >         skb_dst_set(skb, &rt->dst);
> > >
> > > @@ -100,6 +102,7 @@ static int nat_keepalive_send_ipv6(struct sk_buff *skb,
> > >         sock_net_set(sk, net);
> > >         dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);
> > >         if (IS_ERR(dst)) {
> > > +               kfree_skb(skb);
> > >                 local_unlock_nested_bh(&nat_keepalive_sk_ipv6.bh_lock);
> >
> > Any reason to do the kfree under lock?
> >
> 
> Thank you for your reply! I did this without particular reason.
> Just to keep the pre-handoff cleanup in the same error path.
> kfree_skb() does not need the bh lock there, so I can move it after the
> unlock in v2 if you need!

Then move it after the unlock to make it clear it is not needed.

^ permalink raw reply

* Re: [PATCH 4/7] net: stmmac: dwmac-rk: Enable refout clock for RGMII
From: Heiko Stübner @ 2026-06-24 11:49 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wu, Maxime Coquelin, Alexandre Torgue, Yanan He
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32, Yanan He
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-4-5aef608a3f64@gmail.com>

Hi,

Am Mittwoch, 24. Juni 2026, 10:44:41 Mitteleuropäische Sommerzeit schrieb Yanan He:
> Some Rockchip GMAC integrations use clk_mac_refout as an external PHY
> reference clock even when the MAC is configured for RGMII.
> 
> RV1126 boards can route CLK_GMAC_ETHERNET_OUT to the external PHY as a
> 25 MHz reference clock. If the driver does not acquire and enable this
> clock in RGMII mode, the common clock framework may disable it as unused
> and the PHY can lose its reference clock.
> 
> Enable the refout clock handling for RGMII in addition to RMII.

the clock your referencing is not limited to your rv1126 but instead
present on most (all?) Rockchip SoCs.

And it is an input clock for the phy itself, so should be handled there.

See for example
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588-tiger.dtsi#n313
as a reference.


Heiko

> Signed-off-by: Yanan He <grumpycat921013@gmail.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index 8d7042e68926..f6fdc0c5b475 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -1112,7 +1112,8 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
>  	bsp_priv->clk_enabled = false;
>  
>  	bsp_priv->num_clks = ARRAY_SIZE(rk_clocks);
> -	if (phy_iface == PHY_INTERFACE_MODE_RMII)
> +	if (phy_iface == PHY_INTERFACE_MODE_RMII ||
> +	    phy_iface == PHY_INTERFACE_MODE_RGMII)
>  		bsp_priv->num_clks += ARRAY_SIZE(rk_rmii_clocks);
>  
>  	bsp_priv->clks = devm_kcalloc(dev, bsp_priv->num_clks,
> @@ -1123,7 +1124,8 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
>  	for (i = 0; i < ARRAY_SIZE(rk_clocks); i++)
>  		bsp_priv->clks[i].id = rk_clocks[i];
>  
> -	if (phy_iface == PHY_INTERFACE_MODE_RMII) {
> +	if (phy_iface == PHY_INTERFACE_MODE_RMII ||
> +	    phy_iface == PHY_INTERFACE_MODE_RGMII) {
>  		for (j = 0; j < ARRAY_SIZE(rk_rmii_clocks); j++)
>  			bsp_priv->clks[i++].id = rk_rmii_clocks[j];
>  	}
> 
> 





^ permalink raw reply

* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Pablo Neira Ayuso @ 2026-06-24 11:48 UTC (permalink / raw)
  To: Alexander Martyniuk
  Cc: stable, Greg Kroah-Hartman, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <20260624140117.19799-1-alexevgmart@gmail.com>

Hi,

Thanks but why only 5.10?

On Wed, Jun 24, 2026 at 02:01:15PM +0000, Alexander Martyniuk wrote:
> From: Xiang Mei <xmei5@asu.edu>
> 
> commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream
> 
> The fallback path of dump_mac_header() guards the MAC header access
> only with "skb->mac_header != skb->network_header", without checking
> skb_mac_header_was_set(). When the MAC header is unset, mac_header is
> 0xffff, so the test passes and skb_mac_header(skb) returns
> skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
> dev->hard_header_len bytes out of bounds into the kernel log.
> 
> This is reachable via the netdev logger: nf_log_unknown_packet() calls
> dump_mac_header() unconditionally, and an skb sent through AF_PACKET
> with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
> unset (__dev_queue_xmit(), which would reset it, is bypassed).
> 
> Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
> uses, and replace the open-coded MAC header length test with
> skb_mac_header_len(). Only skbs with an unset MAC header are affected;
> valid ones are dumped as before.
> 
>  BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
>  Read of size 1 at addr ffff88800ea49d3f by task exploit/148
>  Call Trace:
>   kasan_report (mm/kasan/report.c:595)
>   dump_mac_header (net/netfilter/nf_log_syslog.c:831)
>   nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
>   nf_log_packet (net/netfilter/nf_log.c:260)
>   nft_log_eval (net/netfilter/nft_log.c:60)
>   nft_do_chain (net/netfilter/nf_tables_core.c:285)
>   nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
>   nf_hook_slow (net/netfilter/core.c:619)
>   nf_hook_direct_egress (net/packet/af_packet.c:257)
>   packet_xmit (net/packet/af_packet.c:280)
>   packet_sendmsg (net/packet/af_packet.c:3114)
>   __sys_sendto (net/socket.c:2265)
> 
> Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
> ---
> Backport fix for CVE-2026-52942
>  net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
> index d07583fac8f8..d6164e8e2c73 100644
> --- a/net/ipv4/netfilter/nf_log_ipv4.c
> +++ b/net/ipv4/netfilter/nf_log_ipv4.c
> @@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
>  
>  fallback:
>  	nf_log_buf_add(m, "MAC=");
> -	if (dev->hard_header_len &&
> -	    skb->mac_header != skb->network_header) {
> +	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> +	    skb_mac_header_len(skb) != 0) {
>  		const unsigned char *p = skb_mac_header(skb);
>  		unsigned int i;
>  
> -- 
> 2.43.0
> 

^ permalink raw reply

* Re: [PATCH bpf-next v8 3/7] bpf: add bpf_icmp_send kfunc
From: Mahe Tardy @ 2026-06-24 11:45 UTC (permalink / raw)
  To: Emil Tsalapatis
  Cc: bpf, andrii, ast, daniel, edumazet, john.fastabend, jordan, kuba,
	martin.lau, netdev, netfilter-devel, pabeni, yonghong.song
In-Reply-To: <ajuqZMzqACLOijoC@gmail.com>

On Wed, Jun 24, 2026 at 11:59:00AM +0200, Mahe Tardy wrote:
> On Tue, Jun 23, 2026 at 10:09:20PM -0400, Emil Tsalapatis wrote:
> > On Mon Jun 22, 2026 at 8:05 AM EDT, Mahe Tardy wrote:
> 
> [...]
> 
> > > +#if IS_ENABLED(CONFIG_IPV6)
> > > +	case htons(ETH_P_IPV6):
> > > +		if (type != ICMPV6_DEST_UNREACH)
> > > +			return -EOPNOTSUPP;
> > > +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > > +			return -EINVAL;
> > > +
> > > +		nskb = skb_clone(skb, GFP_ATOMIC);
> > > +		if (!nskb)
> > > +			return -ENOMEM;
> > > +
> > > +		if (!pskb_network_may_pull(nskb, sizeof(struct ipv6hdr))) {
> > 
> > Minor nit, but this may also fail with SKB_DROP_REASON_NOMEM. Now this is only
> > possible if the IP header is not in the linear space which may well be
> > impossible (?), but do we want to differentiate with
> > pskb_network_may_pull_reason()?
> 
> Indeed, I think for the IP header is should be fine, but I replaced it
> with the reason variant. Thanks!
>  
> > > +			kfree_skb(nskb);
> > > +			return -EBADMSG;
> > > +		}
> > > +
> 
> [...]
> 
> > >  static int __init bpf_kfunc_init(void)
> > >  {
> > >  	int ret;
> > > @@ -12639,6 +12745,9 @@ static int __init bpf_kfunc_init(void)
> > >  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > >  					       &bpf_kfunc_set_sock_addr);
> > >  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_icmp_send);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_icmp_send);
> > 
> > Based on Sashiko's feedback, since we mostly care about cgroup_skb
> > should we just make it exclusive to them and drop CLS_ACT?
> 
> This would indeed simplify this patchset, I could drop most of the
> complication induced by tc ingress routing. But I think having both
> cgroup_skb and tc support would be nice as a first implem. I'll try
> again in a new version as I added a test for ingress tc and could
> actually fix the routing based on sashiko's feedback (this also drop the
> first two patches that were partially wrong).

tl;dr: I'll remove the tc support as it feels difficult (impossible
without major plumbing changes?) to get right.

Here are the details:

Initially I ended up removing the first two patch set as they were
technically wrong (see explanations after), I added this small helper:

	#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
	static bool skb_dst_validate_and_hold(struct sk_buff *skb)
	{
	       bool ret;

	       rcu_read_lock();
	       ret = skb_valid_dst(skb) && skb_dst_force(skb);
	       rcu_read_unlock();

	       return ret;
	}
	#endif

And then the body of the kfunc would do something like this (instead of
calling the removed helpers):

	reason = pskb_network_may_pull_reason(nskb, sizeof(struct iphdr));
	if (reason) {
		kfree_skb_reason(nskb, reason);
		return -EBADMSG;
	}

	memset(IPCB(nskb), 0, sizeof(struct inet_skb_parm));
	IPCB(nskb)->iif = nskb->skb_iif;

	if (!skb_dst_validate_and_hold(nskb)) {
		if (!nskb->dev) {
			kfree_skb(nskb);
			return -ENODEV;
		}

		iph = ip_hdr(nskb);
		reason = ip_route_input(nskb, iph->daddr, iph->saddr,
					ip4h_dscp(iph), nskb->dev);
		if (reason) {
			kfree_skb_reason(nskb, reason);
			return -EHOSTUNREACH;
		}
	}

	icmp_send(nskb, type, code, 0);

Then I added a tc ingress test to showcase the issue with the previous
helpers and trigger the routing in the kfunc, with this steup:

	  client ns:                    test ns:
	  icmp_peer                     ns_icmp_send_unreach_route_ingress
	+------------+                +-------------------------+
	| icmp_cli   |                | icmp_srv                |
	| 198.18.0.1 |--------------->| primary:   198.18.0.254 |
	+------------+ TCP SYN        | local dst: 198.18.0.2   |
		       dst=198.18.0.2 +-------------------------+
					  | tc ingress BPF
					  | calls bpf_icmp_send()
					  | -> icmp src=198.18.0.2
					  v

With the previous helpers, the icmp would route by reverting the daddr
immediately, and then asking for the route. Thus we would "forget" about
the actual source address, and in this case we would end up with the
icmp control message src IP being the primary address and not the one we
wanted: 198.18.9.2. Turns out route input was sufficient to give the
needed _correct_ information to icmp_send that would invert the address
for us and route the packet again.

Then I submitted that for more review by sashiko and it found this new
thing (which is orthogonal to the previous issue):

	> @@ -12639,6 +12788,9 @@ static int __init bpf_kfunc_init(void)
	>  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
	>  					       &bpf_kfunc_set_sock_addr);
	>  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_icmp_send);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_icmp_send);

	Can exposing net/core/filter.c:bpf_icmp_send() to sched_cls and sched_act
	deadlock a non-lockless qdisc?

	A cls_bpf program can run from a qdisc enqueue classifier while the qdisc
	root lock is held.  If it calls bpf_icmp_send(), the kfunc synchronously
	goes through icmp_send()/icmpv6_send() and then normal transmit.  If the
	reply routes back through the same qdisc, the inner transmit can try to take
	the same root lock again.

	One possible path is:

	__dev_xmit_skb()
	  q->enqueue()
	  prio_enqueue()
	  tcf_classify()
	  cls_bpf_classify()
	  bpf_icmp_send()
	  icmp_send()/icmpv6_send()
	  dev_queue_xmit()
	  __dev_xmit_skb()

	Is this kfunc safe in enqueue classifier/action contexts, or should this
	registration be limited to contexts that cannot run under the qdisc root
	lock?

I managed to indeed reproduce this deadlock. So I think there's no way
to implement this safely, we would either need:
- make the kfunc only available to tcx only (and then prevent program
  verified as TCX from being reused as legacy qdisc classifiers...)
- do some crazy runtime guard, exposing the lock.

So I will give up on tc support for now as it's more difficult than
expected.

> 
> > >  	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > >  }
> > >  late_initcall(bpf_kfunc_init);
> > > --
> > > 2.34.1
> > 

^ permalink raw reply

* Re: [PATCH net] net: clear transport header during tunnel decapsulation
From: Eric Dumazet @ 2026-06-24 11:44 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Ido Schimmel, David Ahern, netdev, eric.dumazet,
	syzbot+d5d0d598a4cfdfafdc3b
In-Reply-To: <95a719af-c9d3-4bce-995b-c6ffce15739c@linux.dev>

On Wed, Jun 24, 2026 at 3:41 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
>
> On 6/24/26 3:32 PM, Eric Dumazet wrote:
> > Syzbot triggered a DEBUG_NET_WARN_ON_ONCE(len > INT_MAX) assertion in
> > pskb_may_pull_reason() called from qdisc_pkt_len_segs_init().
> >
> > The root cause is a stale, negative transport header offset carried over
> > during tunnel decapsulation. When a tunnel receiver (e.g., VXLAN or Geneve)
> > decapsulates a packet, it pulls the outer headers but leaves the transport
> > header pointing to the outer UDP header. This offset becomes negative
> > relative to the new skb->data (inner IP header).
> >
> > If the packet bypasses GRO (e.g., an untrusted GSO packet flagged as
> > "unexpected GSO" by udp_unexpected_gso() due to missing tunnel GSO bits),
> > it is flushed directly to the stack as GRO_NORMAL. On ingress, Layer 2 Qdisc
> > processing (sch_handle_ingress) happens before Layer 3 IP reception
> > (ip_rcv_core) can run and reset the transport header. Consequently,
> > qdisc_pkt_len_segs_init() attempts to validate the transport header using
> > pskb_may_pull(skb, hdr_len + sizeof(tcphdr)). The negative hdr_len overflows
> > the unsigned cast in pskb_may_pull(), triggering the assertion.
> >
> > Fix this by clearing the transport header to the ~0U sentinel value during
> > decapsulation. This ensures that:
> > 1) The ingress Qdisc safely skips validation via !skb_transport_header_was_set()
> >     and returns early without warning.
> > 2) The IP layer (ip_rcv_core) later correctly resets the transport header
> >     to the inner L4 header offset.
> >
> > Introduce skb_unset_transport_header() helper and apply it in the main
> > decapsulation paths:
> > 1) __iptunnel_pull_header() (covering Geneve, GRE, IPIP, SIT, etc.)
> > 2) vxlan_rcv() (covering VXLAN)
> >
> > This restores skb invariants at the decapsulation boundary without adding
> > overhead to the Qdisc fast path.
> >
> > Fixes: 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
> > Reported-by: syzbot+d5d0d598a4cfdfafdc3b@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/netdev/6a3b853b.52ae72c2.136ac7.000c.GAE@google.com/T/#u
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Assisted-by: Gemini:gemini-3.1-pro
>
>
> I think a negative skb_transport_offset() should break something else too,
> so the Fixes tag looks wrong, but I couldn't find any actual breakage
> (luck, or I'm missing it).

Read again the changelog: transport header is set (in ingress) a bit
later in the stack.

Nothing needs it before, but  qdisc_pkt_len_segs_init() if/when it is
called in ingress.

>
> Hope sashiko read this reply and confirm it....

On older kernels (before  7fb4c1967011 ("net: pull headers in
qdisc_pkt_len_segs_init()"),
the bug is completely latent and harmless.

This prevents unnecessary backporting churn and potential merge conflicts on
very old kernels where skb_unset_transport_header() doesn't exist.

The Historical Option (a6d5bbf34efa / d342894c5d28):

If we point to the original commits that introduced the tunnels,
we are historically accurate, but we risk stable scripts trying to
backport this fix all the way back to 2012/2016
(e.g. kernel 3.7 or 4.6), which is unnecessary and highly likely to
fail to apply.

^ permalink raw reply

* Re: [PATCH net v2] fsl/fman: Free init resources on KeyGen failure in fman_init()
From: Breno Leitao @ 2026-06-24 11:40 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
	kuba, pabeni, florinel.iordache, netdev, linux-kernel, stable,
	Pavan Chebbi
In-Reply-To: <20260624055119.2776641-1-haoxiang_li2024@163.com>

On Wed, Jun 24, 2026 at 01:51:19PM +0800, Haoxiang Li wrote:
> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
> index 013273a2de32..3a2a57207e55 100644
> --- a/drivers/net/ethernet/freescale/fman/fman.c
> +++ b/drivers/net/ethernet/freescale/fman/fman.c
> @@ -1995,12 +1995,12 @@ static int fman_init(struct fman *fman)
>  
>  	/* Init KeyGen */
>  	fman->keygen = keygen_init(fman->kg_regs);
> -	if (!fman->keygen)
> +	if (!fman->keygen) {
> +		free_init_resources(fman);

That makes sense, fman_init() is doing the same earlier when "MURAM
alloc for BMI FIFO failed".

For this patch only, please feel free to add Reviewed-by: Breno Leitao <leitao@debian.org>

>  		return -EINVAL;
> +	}
>  
> -	err = enable(fman, cfg);
> -	if (err != 0)
> -		return err;
> +	enable(fman, cfg);

I understand the "while at it", but this should be a separate patch,
and it isn't a fix for 7472f4f281d0 ("fsl/fman: enable FMan Keygen")

Separate this in a different patch, targeting net-next. Also, enable()
might return "void" instead of "int", given it only returns 0.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net v2] ice: eswitch: fix use-after-free of metadata_dst in repr release
From: Marcin Szycik @ 2026-06-24 11:36 UTC (permalink / raw)
  To: Doruk Tan Ozturk, anthony.l.nguyen, przemyslaw.kitszel,
	andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: michal.swiatkowski, wojciech.drewek, intel-wired-lan, netdev,
	linux-kernel, stable, horms
In-Reply-To: <20260618145003.47471-1-doruk@0sec.ai>



On 18/06/2026 16:50, Doruk Tan Ozturk wrote:
> ice_eswitch_release_repr() frees the port representor metadata_dst via
> metadata_dst_free(), which directly kfree()s the object and ignores the
> dst_entry refcount. The eswitch slow-path TX routine
> ice_eswitch_port_start_xmit() takes a reference on this dst with
> dst_hold() and attaches it to the skb via skb_dst_set(). If such an skb
> is still in flight (e.g. queued in a qdisc) when the representor is torn
> down, the metadata_dst is freed while the skb still points at it. When
> the skb is later freed, dst_release() operates on already-freed memory.
> 
> Replace metadata_dst_free() with dst_release() so the metadata_dst is
> freed only after the last reference is dropped. The dst subsystem frees
> metadata_dst objects from dst_destroy() once the refcount reaches zero
> (DST_METADATA is set by metadata_dst_alloc()).
> 
> Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
> nft_tunnel: fix use-after-free on object destroy").
> 
> Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
> Cc: stable@vger.kernel.org
> Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
> Reviewed-by: Simon Horman <horms@kernel.org>

Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>

> ---
> v2:
>  - Correct the Fixes: tag to 1a1c40df2e80 ("ice: set and release
>    switchdev environment"); the previously cited fff292b47ac1 only moved
>    the affected code rather than introducing the unbalanced free, and the
>    bug dates back to when switchdev support was added (Simon Horman).
>  - Add Simon Horman's Reviewed-by. No functional change.
> 
>  drivers/net/ethernet/intel/ice/ice_eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> index 2e4f0969035f..41b30a7ca4a9 100644
> --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
> +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> @@ -95,7 +95,7 @@ ice_eswitch_release_repr(struct ice_pf *pf, struct ice_repr *repr)
>  		return;
> 
>  	ice_vsi_update_security(vsi, ice_vsi_ctx_set_antispoof);
> -	metadata_dst_free(repr->dst);
> +	dst_release(&repr->dst->dst);
>  	repr->dst = NULL;
>  	ice_fltr_add_mac_and_broadcast(vsi, repr->parent_mac,
>  				       ICE_FWD_TO_VSI);
> --
> 2.43.0


^ permalink raw reply

* Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-24 11:15 UTC (permalink / raw)
  To: Simon Horman
  Cc: anthony.l.nguyen, przemyslaw.kitszel, intel-wired-lan, netdev
In-Reply-To: <20260618152003.909400-1-horms@kernel.org>

Hey Simon,
Thanks for the review. I'll try to address these comments one by one.

On Thu, Jun 18, 2026 at 5:20 PM Simon Horman <horms@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> ice: retry reading NVM if admin queue returns EBUSY
>
> When the admin queue command to read NVM returns EBUSY, the driver
> currently treats it as a fatal error and aborts the entire read operation.
> This patch adds retry logic to handle this condition.
>
> > diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
> > --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
>
> [ ... ]
>
> > @@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
> >                                        offset, read_size,
> >                                        data + bytes_read, last_cmd,
> >                                        read_shadow_ram, NULL);
> > -             if (status)
> > -                     break;
> > -
> > -             bytes_read += read_size;
> > -             offset += read_size;
> > +             if (status) {
> > +                     if (hw->adminq.sq_last_status != LIBIE_AQ_RC_EBUSY ||
> > +                         retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
>
> [Severity: Low]
> Does this check allow an extra retry execution? Since retry_cnt starts at 0,
> using a strictly greater-than check allows the loop to retry 4 times
> (5 total executions) instead of the 3 executions intended by
> ICE_SQ_SEND_MAX_EXECUTE.
>
> > +                             break;
> > +                     ice_debug(hw, ICE_DBG_NVM,
> > +                               "NVM read EBUSY error, retry %d\n",
> > +                               retry_cnt + 1);
> > +                     last_cmd = false;
> > +                     ice_release_nvm(hw);
> > +                     msleep(ICE_SQ_SEND_DELAY_TIME_MS);
>
The review is correct that this allows one extra execution. However, I
intentionally kept this verbatim from the OOT driver (1.15.4) to
retain the significant test exposure it has there. The absolute worst
case is one harmless extra retry before giving up. If you strongly
prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the
condition in a v4.

> [Severity: High]
> Does dropping the NVM lock here violate the caller's assumption of mutual
> exclusion?
>
> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to
> ensure an atomic read of multiple sectors without interleaved writes. By
> explicitly dropping the lock during the msleep() window, another thread, VF,
> or PF could acquire the NVM lock and modify the NVM before the read resumes,
> potentially resulting in reading torn or inconsistent NVM data.
>
> > +                     status = ice_acquire_nvm(hw, ICE_RES_READ);
> > +                     if (status)
> > +                             break;
>

I don't think this is an issue. The NVM resource (req_res 0x0008) is
an ownership lease, not a content snapshot. The FW grants it with a
hold timeout (see the timeout field handling in ice_aq_req_res). It
never guaranteed a point-in-time view of the flash.
Very similar behavior already exists in ice_devlink_nvm_snapshot(),
which reads the NVM in chunks via ice_read_flat_nvm() and deliberately
acquires/releases the lock between chunks (with a comment explaining
the FW reclaim).
Additionally, the lock is only dropped on the failure path, where
neither bytes_read nor offset have been advanced. The driver simply
re-reads the failed chunk after re-acquiring. So no data that is
returned to the caller is committed across the drop window.
Overall the lock drop does not weaken any guarantee callers actually
have. If we wanted to introduce a true snapshot guarantee, that would
probably require different approach and separate patch.

> [Severity: High]
> Can this lead to an unbalanced NVM resource lock release and potential data
> corruption?
>
> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the
> lock currently unheld. However, callers like ice_read_flash_module() are
> unaware of this internal lock state change and will unconditionally call
> ice_release_nvm() upon failure:
>
> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() {
>         status = ice_acquire_nvm(hw, ICE_RES_READ);
>         if (status)
>                 return status;
>
>         status = ice_read_flat_nvm(hw, start + offset, &length, data, false);
>
>         ice_release_nvm(hw);
>     ...
>
> Because firmware tracks resource locks per-PF rather than per-OS-thread, this
> errant second release could forcefully strip the lock from another thread on
> the same PF that successfully acquired it, exposing the NVM to concurrent
> modification.
>

Agreed, this might be a real bug, and the one of the three I think is
worth investigating.
If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns
with the lock unheld while callers unconditionally call
ice_release_nvm(), so a stray release is issued.

On probability, though, the window is very small. Reaching it requires
sustained EBUSY across the retry budget plus a failed re-acquire
(which itself polls up to ICE_NVM_TIMEOUT), and concurrently another
requester taking the lock. Most reads happen during init (ice_probe,
and reset/rebuild via ice_init_nvm), and NVM writes only happen on an
already initialized driver. The devlink/ethtool nvm_read paths are
also exposed, but hitting this race would require precise timing
against a concurrent NVM owner on the device.

I'd prefer to keep the scope of this patch limited to the EBUSY retry
path and not take on the unbalanced-release fix here. A proper fix
should change the lock-ownership contract of ice_read_flat_nvm() (on
error, the lock must be released by ice_read_flat_nvm(), callers
release only on success) and update all callers. Code change sould be
simple for all callers but ice_discover_flash_size(), it intentionally
holds one lease across a read loop and would need to re-acquire after
each expected boundary failure.

Given how small the original window is, I'd rather not trade tested
OOT behavior for the risk of a complex unbalanced NVM lock fix. I
actually have a patch mostly ready that fixes the lock-ownership
contract, but I really don't like it. It changes the design of
ice_read_flat_nvm(), making it less intuitive for callers. More
importantly, I just don't have the resources or test coverage right
now to properly verify such change.

However, I can modify the failure path for ice_acquire_nvm inside
ice_read_flat_nvm. Instead of bailing out immediately, we can just
retry it within the existing retry budget. In this case, the
probability of leaving ice_read_flat_nvm without holding the lock is
reduced even further without needing a refactor.

Please let me know what you think about my thought process on this.

> > +                     retry_cnt++;
> > +             } else {
> > +                     bytes_read += read_size;
> > +                     offset += read_size;
> > +                     retry_cnt = 0;
> > +             }
> >       } while (!last_cmd);
> >
> >       *length = bytes_read;

Thanks,
Robert

^ permalink raw reply

* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-24 11:03 UTC (permalink / raw)
  To: Breno Leitao
  Cc: linux-arch, linux-kernel, sched-ext, netdev, David S . Miller,
	Andrea Righi, Andrew Morton, Arnd Bergmann, Ben Segall,
	Changwoo Min, David Vernet, Dietmar Eggemann, Eric Dumazet,
	Ingo Molnar, Jakub Kicinski, John Ogness, Juri Lelli,
	K Prateek Nayak, Paolo Abeni, Peter Zijlstra, Petr Mladek,
	Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
	Vincent Guittot, Vlad Poenaru
In-Reply-To: <ajuWnKsQR0Z825Wn@gmail.com>

On 2026-06-24 01:37:53 [-0700], Breno Leitao wrote:
Hi Breno,

> Have you considered an approach similar to printk_deferred_enter(),
> where you mark the code region that needs deferral and all WARN() calls
> within that region are automatically deferred?

Doing this at rq-lock site is not something the scheduler department
takes. It increases/ bloats the code sides more than what we have now.

Not everything is in __sched section so we can't check for this from
within printk. So this turd was the only idea I had.

> The current proposal requires changing individual WARN() call sites,
> but whether they need deferral might depend on the calling context. This
> means you'd need to convert many call sites and ensure all nested
> warnings are also converted to the deferred variant.

I hope for the forced-threaded-legacy the default but this camp has not
a lot members. It would increase the pressure to provide nbcon so it
could be a good thing.

To accept this series and make it more bullet-proof we could do
s/WARN_ON\>/WARN_ON_DEFERRED/ for all sched/ and require it regardless
if the rq-lock is held. So you wouldn't have to audit it each and every
time. Due to that preempt-disable thingy it can be used in preemptible
sections without breaking anything.

> 
> Thanks,
> --breno

Sebastian

^ permalink raw reply

* [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Alexander Martyniuk @ 2026-06-24 14:01 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Alexander Martyniuk, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Patrick McHardy,
	netfilter-devel, coreteam, netdev, linux-kernel, Weiming Shi,
	Xiang Mei

From: Xiang Mei <xmei5@asu.edu>

commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream

The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.

This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).

Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.

 BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
 Read of size 1 at addr ffff88800ea49d3f by task exploit/148
 Call Trace:
  kasan_report (mm/kasan/report.c:595)
  dump_mac_header (net/netfilter/nf_log_syslog.c:831)
  nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
  nf_log_packet (net/netfilter/nf_log.c:260)
  nft_log_eval (net/netfilter/nft_log.c:60)
  nft_do_chain (net/netfilter/nf_tables_core.c:285)
  nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
  nf_hook_slow (net/netfilter/core.c:619)
  nf_hook_direct_egress (net/packet/af_packet.c:257)
  packet_xmit (net/packet/af_packet.c:280)
  packet_sendmsg (net/packet/af_packet.c:3114)
  __sys_sendto (net/socket.c:2265)

Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
---
Backport fix for CVE-2026-52942
 net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
index d07583fac8f8..d6164e8e2c73 100644
--- a/net/ipv4/netfilter/nf_log_ipv4.c
+++ b/net/ipv4/netfilter/nf_log_ipv4.c
@@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
 
 fallback:
 	nf_log_buf_add(m, "MAC=");
-	if (dev->hard_header_len &&
-	    skb->mac_header != skb->network_header) {
+	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
+	    skb_mac_header_len(skb) != 0) {
 		const unsigned char *p = skb_mac_header(skb);
 		unsigned int i;
 
-- 
2.43.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox