Netdev List


* [syzbot ci] Re: net: clear transport header during tunnel decapsulation
From: syzbot ci @ 2026-06-24 12:14 UTC (permalink / raw)
  To: davem, dsahern, edumazet, eric.dumazet, horms, idosch, kuba,
	netdev, pabeni, syzbot
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260624073209.3703492-1-edumazet@google.com>

syzbot ci has tested the following series

[v1] net: clear transport header during tunnel decapsulation
https://lore.kernel.org/all/20260624073209.3703492-1-edumazet@google.com
* [PATCH net] net: clear transport header during tunnel decapsulation

and found the following issue:
WARNING in geneve_udp_encap_recv

Full report is available here:
https://ci.syzbot.org/series/1f6dc47e-354f-4904-bc18-c2b7ea4d79b2

***

WARNING in geneve_udp_encap_recv

tree:      net
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net.git
base:      d87363b0edfc7504ff2b144fe4cdd8154f90f42e
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/7bb83c16-d55d-4d99-8c2b-6050e0022ef6/config

------------[ cut here ]------------
!skb_transport_header_was_set(skb)
WARNING: ./include/linux/skbuff.h:3094 at geneve_udp_encap_recv+0x26ed/0x4130, CPU#1: kworker/1:3/5072
Modules linked in:
CPU: 1 UID: 0 PID: 5072 Comm: kworker/1:3 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: mld mld_ifc_work
RIP: 0010:geneve_udp_encap_recv+0x26ed/0x4130
Code: c8 00 00 00 0f b6 04 01 84 c0 0f 85 f2 18 00 00 48 8b 7c 24 78 66 44 89 6f 0a e8 7e 9b 7a 03 e9 8c 00 00 00 e8 f4 fc 33 fb 90 <0f> 0b 90 e9 2e e3 ff ff 49 83 c6 06 4c 89 f0 48 c1 e8 03 48 b9 00
RSP: 0018:ffffc90000a08620 EFLAGS: 00010246
RAX: ffffffff8691f92c RBX: ffff8881bcc8b5d0 RCX: ffff88816a7abb80
RDX: 0000000000000100 RSI: 000000000000ffff RDI: 000000000000ffff
RBP: ffffc90000a08790 R08: ffffffff903114f7 R09: 1ffffffff206229e
R10: dffffc0000000000 R11: fffffbfff206229f R12: ffff888109f6a108
R13: 1ffff110213ed5df R14: 0000000000000010 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8882a927b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6956ea5440 CR3: 000000016f55c000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
 udp_queue_rcv_one_skb+0xfc5/0x10e0
 udp_unicast_rcv_skb+0x21a/0x3a0
 udp_rcv+0xecb/0x1db0
 ip_protocol_deliver_rcu+0x27e/0x440
 ip_local_deliver_finish+0x3bb/0x6f0
 NF_HOOK+0x336/0x3c0
 NF_HOOK+0x336/0x3c0
 process_backlog+0xa34/0x1860
 __napi_poll+0xaa/0x330
 net_rx_action+0x61d/0xf50
 handle_softirqs+0x225/0x840
 do_softirq+0x76/0xd0
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0xf8/0x130
 __dev_queue_xmit+0x1ed7/0x37f0
 ip6_output+0x337/0x540
 NF_HOOK+0x177/0x4f0
 mld_sendpack+0x890/0xe10
 mld_ifc_work+0x839/0xe70
 process_scheduled_works+0xa8e/0x14e0
 worker_thread+0xa47/0xfb0
 kthread+0x388/0x470
 ret_from_fork+0x514/0xb70
 ret_from_fork_asm+0x1a/0x30
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.


* Re: [PATCH v12 02/12] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Nikolay Borisov @ 2026-06-24 12:12 UTC (permalink / raw)
  To: Pawan Gupta, x86, Jon Kohler, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, Jason Baron, Alice Ryhl,
	Steven Rostedt, Ard Biesheuvel, Shuah Khan
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260622-vmscape-bhb-v12-2-76cbda0ae3e5@linux.intel.com>



On 23.06.26 г. 20:33 ч., Pawan Gupta wrote:
> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> the Branch History Buffer (BHB). On Alder Lake and newer parts this
> sequence is not sufficient because it doesn't clear enough entries. This
> was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> in the kernel.
> 
> Now with VMSCAPE (BHI variant) it is also required to isolate branch
> history between guests and userspace. Since BHI_DIS_S only protects the
> kernel, the newer CPUs also use IBPB.
> 
> A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> But it currently does not clear enough BHB entries to be effective on newer
> CPUs with larger BHB. At boot, dynamically set the loop count of
> clear_bhb_loop() such that it is effective on newer CPUs too.
> 
> Introduce global loop counts, initializing them with appropriate value
> based on the hardware feature X86_FEATURE_BHI_CTRL.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

Although AI raises a valid question: should guests be pessimized and
fall back to the longer sequence?


* RE: [PATCH v2 net 2/2] tipc: avoid busy looping in tipc_exit_net()
From: Tung Quang Nguyen @ 2026-06-24 12:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Horman, Kuniyuki Iwashima, Xin Long, Jon Maloy,
	tipc-discussion@lists.sourceforge.net, netdev@vger.kernel.org,
	eric.dumazet@gmail.com, David S . Miller, Jakub Kicinski,
	Paolo Abeni
In-Reply-To: <20260623173030.2925059-3-edumazet@google.com>

>Subject: [PATCH v2 net 2/2] tipc: avoid busy looping in tipc_exit_net()
>
>Blamed commit introduced a busy-wait loop in tipc_exit_net() to wait for
>pending UDP bearer cleanup works to complete:
>
>       while (atomic_read(&tn->wq_count))
>               cond_resched();
>
>This loop can busy-wait for a long time if cond_resched() is a NOP. This
>typically happens if the netns exit is executed by a high priority task, or under
>kernels configured without preemption (CONFIG_PREEMPT_NONE). In such
>cases, it wastes CPU cycles and can lead to soft lockups.
>
>Fix this by replacing the busy loop with wait_var_event(), allowing the thread
>to sleep properly until the work queue count reaches zero.
>
>Accordingly, update cleanup_bearer() to use atomic_dec_and_test() and
>wake_up_var() to wake up the waiter when the count drops to zero.
>
>This uses the global wait queue hash table, avoiding the need to bloat struct
>tipc_net with a wait_queue_head_t. The atomic_dec_and_test() provides the
>necessary memory barrier to ensure the wakeup is not missed.
>
>Fixes: 04c26faa51d1 ("tipc: wait and exit until all work queues are done")
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Xin Long <lucien.xin@gmail.com>
>Cc: Jon Maloy <jmaloy@redhat.com>
>Cc: tipc-discussion@lists.sourceforge.net
>---
> net/tipc/core.c      | 4 ++--
> net/tipc/udp_media.c | 4 +++-
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
>diff --git a/net/tipc/core.c b/net/tipc/core.c
>index 1ddecea1df6e9100334c47a28ff6c065292fb9ad..315975c3be8186784e9c44c9ff69d62c17ffd4b9 100644
>--- a/net/tipc/core.c
>+++ b/net/tipc/core.c
>@@ -45,6 +45,7 @@
> #include "crypto.h"
>
> #include <linux/module.h>
>+#include <linux/wait_bit.h>
>
> /* configurable TIPC parameters */
> unsigned int tipc_net_id __read_mostly;
>@@ -118,8 +119,7 @@ static void __net_exit tipc_exit_net(struct net *net)
> #ifdef CONFIG_TIPC_CRYPTO
> 	tipc_crypto_stop(&tipc_net(net)->crypto_tx);
> #endif
>-	while (atomic_read(&tn->wq_count))
>-		cond_resched();
>+	wait_var_event(&tn->wq_count, atomic_read(&tn->wq_count) == 0);

It would be nicer to use this simpler call:
wait_var_event(&tn->wq_count, !atomic_read(&tn->wq_count));

> }
>
> static void __net_exit tipc_pernet_pre_exit(struct net *net)
>diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>index 66f3cb87a0aaaac8f40e8f237ab9a44d539b1cd8..62ae7f5b58409c89798c915dee752ac42487581f 100644
>--- a/net/tipc/udp_media.c
>+++ b/net/tipc/udp_media.c
>@@ -40,6 +40,7 @@
> #include <linux/igmp.h>
> #include <linux/kernel.h>
> #include <linux/workqueue.h>
>+#include <linux/wait_bit.h>
> #include <linux/list.h>
> #include <net/sock.h>
> #include <net/ip.h>
>@@ -830,7 +831,8 @@ static void cleanup_bearer(struct work_struct *work)
> 	synchronize_net();
>
> 	dst_cache_destroy(&ub->rcast.dst_cache);
>-	atomic_dec(&tn->wq_count);
>+	if (atomic_dec_and_test(&tn->wq_count))
>+		wake_up_var(&tn->wq_count);
> 	kfree(ub);
> }
>
>--
>2.55.0.rc0.799.gd6f94ed593-goog
>
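
Editor's sketch of the decrement-and-test pattern the patch relies on:
only the decrement that brings the counter to zero issues a wakeup, so
the waiter is woken once rather than polling. This is a userspace model
using C11 atomics, not kernel code; dec_and_test() and run_cleanups()
are hypothetical names standing in for atomic_dec_and_test() and the
cleanup_bearer() works.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's atomic_dec_and_test():
 * decrement the counter and report whether the new value is zero.
 */
static bool dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(v, 1) == 1;
}

/* Run 'pending' cleanup works in turn; return how many wakeups were issued. */
static int run_cleanups(int pending)
{
	atomic_int wq_count;
	int wakeups = 0;
	int i;

	atomic_init(&wq_count, pending);
	for (i = 0; i < pending; i++) {
		/* The kernel code would call wake_up_var() at this point. */
		if (dec_and_test(&wq_count))
			wakeups++;
	}
	assert(atomic_load(&wq_count) == 0);
	return wakeups;
}
```

Calling run_cleanups(3) issues exactly one wakeup, on the third
decrement; the first two decrements leave the waiter asleep.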



* RE: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Tung Quang Nguyen @ 2026-06-24 12:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Horman, Kuniyuki Iwashima, Xin Long, Jon Maloy,
	tipc-discussion@lists.sourceforge.net, netdev@vger.kernel.org,
	eric.dumazet@gmail.com,
	syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com,
	David S . Miller, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20260623173030.2925059-2-edumazet@google.com>

>Subject: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature
>dst_cache_destroy()
>
>TIPC UDP media bearer teardown calls dst_cache_destroy() on its replicast
>caches before calling synchronize_net() to wait for concurrent RCU readers
>(transmitters) to finish:
>
>static void cleanup_bearer(struct work_struct *work)
>{
>...
>	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
>		dst_cache_destroy(&rcast->dst_cache);
>		list_del_rcu(&rcast->list);
>		kfree_rcu(rcast, rcu);
>	}
>...
>	dst_cache_destroy(&ub->rcast.dst_cache);
>	udp_tunnel_sock_release(ub->sk);
>	synchronize_net();
>...
>}
>
>This is highly buggy because dst_cache_destroy() immediately frees the per-
>CPU cache memory (free_percpu()) and releases the cached dst entries
>without any synchronization.
>
>If a concurrent transmitter (e.g., tipc_udp_xmit()) is running on another CPU
>under RCU protection, it can call dst_cache_get() concurrently, leading to:
>1. Use-After-Free on the per-CPU cache pointer itself (crash).
>2. "rcuref - imbalanced put()" warning if it attempts to release a
>   dst that was concurrently released by dst_cache_destroy().
>
>Furthermore, calling kfree(ub) immediately after synchronize_net() without
>closing the socket first (or waiting after closing it) leaves a window where a
>concurrent receiver (tipc_udp_recv()) could start after synchronize_net(),
>access ub, and suffer a UAF when kfree(ub) runs.
>
>To fix this, we must defer dst_cache_destroy() and kfree(ub) until after we have
>ensured that no more readers can see the bearer/socket and all existing
>readers have finished:
>
>1. Defer rcast entry destruction (both dst_cache_destroy() and kfree())
>   to an RCU callback using call_rcu_hurry().
>   Using call_rcu_hurry() ensures the dst entries are released quickly.
>
>2. Release the bearer socket using udp_tunnel_sock_release() (stops
>   new receive readers).
>
>3. Call synchronize_net() to wait for all outstanding RCU readers
>   (both transmit and receive) to finish.
>
>4. Now that it is safe, call dst_cache_destroy() on the main bearer
>   cache, and free ub.
>
>Note: 3) and 4) can be changed later in net-next to also use
>call_rcu_hurry() and get rid of the synchronize_net() latency.
>
>Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
>Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com
>Closes: https://lore.kernel.org/netdev/6a396a66.52ae72c2.136ac7.0003.GAE@google.com/T/#u
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Xin Long <lucien.xin@gmail.com>
>Cc: Jon Maloy <jmaloy@redhat.com>
>Cc: tipc-discussion@lists.sourceforge.net
>---
>v2: addressed Xin Long feedback
>v1: https://lore.kernel.org/netdev/CANn89i+dkbrSAwvaWXW7yWMfcwUebuTBLG5T7AGZaZcpVYGyfQ@mail.gmail.com/T/#m7bbeedffe3bedb69e33236410e3833c7ce809850
> net/tipc/udp_media.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
>diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>index 988b8a7f953ad6da860e6190f1f244650f121dce..66f3cb87a0aaaac8f40e8f237ab9a44d539b1cd8 100644
>--- a/net/tipc/udp_media.c
>+++ b/net/tipc/udp_media.c
>@@ -803,6 +803,14 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
> 	return err;
> }
>
>+static void rcast_free_rcu(struct rcu_head *rcu)
>+{
>+	struct udp_replicast *rcast = container_of(rcu, struct udp_replicast, rcu);

This line is long (over 80 columns). Please break it into 2 lines (refer to linux/Documentation/process/coding-style.rst).

>+
>+	dst_cache_destroy(&rcast->dst_cache);
>+	kfree(rcast);
>+}
>+
> /* cleanup_bearer - break the socket/bearer association */
> static void cleanup_bearer(struct work_struct *work)
> {
>@@ -811,18 +819,17 @@ static void cleanup_bearer(struct work_struct *work)
> 	struct tipc_net *tn;
>
> 	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
>-		dst_cache_destroy(&rcast->dst_cache);
> 		list_del_rcu(&rcast->list);
>-		kfree_rcu(rcast, rcu);
>+		call_rcu_hurry(&rcast->rcu, rcast_free_rcu);
> 	}
>
> 	tn = tipc_net(sock_net(ub->sk));
>
>-	dst_cache_destroy(&ub->rcast.dst_cache);
> 	udp_tunnel_sock_release(ub->sk);
>
>-	/* Note: could use a call_rcu() to avoid another synchronize_net() */
> 	synchronize_net();
>+
>+	dst_cache_destroy(&ub->rcast.dst_cache);
> 	atomic_dec(&tn->wq_count);
> 	kfree(ub);
> }
>--
>2.55.0.rc0.799.gd6f94ed593-goog
>



* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Greg Kroah-Hartman @ 2026-06-24 11:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexander Martyniuk, stable, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <ajvEDFOlP7Bqb-3j@chamomile>

On Wed, Jun 24, 2026 at 01:48:28PM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> Thanks but why only 5.10?

It's already in the following releases:
	5.15.210 6.1.176 6.6.143 6.12.94 6.18.36 7.0.13 7.1

so 5.10.y seems like the only one missing it at the moment.

thanks,

greg k-h


* RE: [PATCH net] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Tung Quang Nguyen @ 2026-06-24 11:56 UTC (permalink / raw)
  To: Samuel Page
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, Jon Maloy
In-Reply-To: <20260623135443.3662041-1-sam@bynar.io>

>Subject: [PATCH net] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
>
>A broadcast PROTOCOL/STATE_MSG can carry a Gap ACK blocks record in its
>data area. tipc_get_gap_ack_blks() only verifies that the record's len field is
>self-consistent with its ugack_cnt/bgack_cnt counts (sz == struct_size(p, gacks,
>ugack_cnt + bgack_cnt)); it does not check that the record actually fits in the
>message data area, msg_data_sz().
>
>The unicast caller tipc_link_proto_rcv() bounds it ("if (glen > dlen) break;"), but
>the broadcast caller tipc_bcast_sync_rcv() discards the returned size, so
>tipc_link_advance_transmq() copies the record off the receive skb with an
>attacker-controlled count:
>
>	this_ga = kmemdup(ga, struct_size(ga, gacks, ga->bgack_cnt),
>			  GFP_ATOMIC);
>
>A TIPC neighbour that negotiated TIPC_GAP_ACK_BLOCK triggers it with one
>ordinary broadcast STATE_MSG (msg_bc_ack_invalid() clear), sized so its data
>area is short, carrying a Gap ACK record with len = 0x400, bgack_cnt = 0xff and
>ugack_cnt = 0. len then equals struct_size(p, gacks, 255), so the consistency
>check passes and ga is non-NULL; kmemdup() reads struct_size(ga, gacks, 255)
>= 1024 bytes out of the much smaller skb:
>
>  BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x48/0x60
>  Read of size 1024 at addr ffff0000c7030d38 by task poc864/69
>  Call trace:
>   kmemdup_noprof+0x48/0x60
>   tipc_link_advance_transmq+0x86c/0xb80
>   tipc_link_bc_ack_rcv+0x19c/0x1e0
>   tipc_bcast_sync_rcv+0x1c4/0x2c4
>   tipc_rcv+0x85c/0x1340
>   tipc_l2_rcv_msg+0xac/0x104
>  The buggy address belongs to the object at ffff0000c7030d00
>   which belongs to the cache skbuff_small_head of size 704
>  The buggy address is located 56 bytes inside of
>   allocated 704-byte region [ffff0000c7030d00, ffff0000c7030fc0)
>
>The copied-out bytes are subsequently consumed as gap/ack values, but the
>read is already out of bounds at the kmemdup() regardless of how they are
>used.
>
>Apply the same bound the unicast path uses to the broadcast caller: drop the
>Gap ACK blocks when the reported size exceeds the message data size.
>A NULL ga is already the defined "no Gap ACK blocks" case, so well-formed
>state messages are unaffected.
>
>Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
>Cc: stable@vger.kernel.org
>Assisted-by: Bynario AI
>Signed-off-by: Samuel Page <sam@bynar.io>
>---
>Before posting I found an earlier thread for what looks like the same (or a very
>closely related) issue:
>
>
>https://lore.kernel.org/netdev/1316452e465e9a96fce44ec15130a14f3872149f.1775809727.git.caoruide123@gmail.com/
>  [PATCH net 1/1] tipc: validate Gap ACK blocks in STATE message
>
>That one added the validation inside tipc_get_gap_ack_blks() and the thread
>stalled on whether the extra checks were redundant. This patch instead adds,
>on the broadcast caller, only the same bound the unicast path already applies,
>and includes the KASAN reproducer that was asked for there.
>
> net/tipc/bcast.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
>index 76a1585d3f6b..61c83bd95755 100644
>--- a/net/tipc/bcast.c
>+++ b/net/tipc/bcast.c
>@@ -502,6 +502,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> 	struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
> 	struct tipc_gap_ack_blks *ga;
> 	struct sk_buff_head xmitq;
>+	u16 glen;
> 	int rc = 0;
>
> 	__skb_queue_head_init(&xmitq);
>@@ -510,7 +511,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> 	if (msg_type(hdr) != STATE_MSG) {
> 		tipc_link_bc_init_rcv(l, hdr);
> 	} else if (!msg_bc_ack_invalid(hdr)) {
>-		tipc_get_gap_ack_blks(&ga, l, hdr, false);
>+		/* Validate Gap ACK blocks, drop if invalid */
>+		glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
>+		if (glen > msg_data_sz(hdr))
>+			ga = NULL;

This is wrong because the skb is not dropped as it should be.
Note that 'ga' is NULL only for legacy TIPC that does not support Selective ACK.
To fix this correctly, set a flag (for example, a Boolean output parameter) to true instead of setting 'ga = NULL'.
Then return immediately and propagate the flag back to tipc_rcv() so that it drops the skb.

> 		if (!sysctl_tipc_bc_retruni)
> 			retrq = &xmitq;
> 		rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
>
>base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
>--
>2.54.0
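
Editor's sketch of the size arithmetic and bound discussed above. The
struct layout is a simplified assumption for illustration (a 4-byte
header followed by 4-byte entries, all in host byte order); the names
gap_ack_blks_size() and gap_ack_blks_usable() are hypothetical and are
not TIPC functions. What the caller does with a rejected record
(clearing 'ga' as in the patch, or dropping the skb as requested in the
review) is outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one 4-byte gap/ack entry. */
struct gap_ack {
	uint16_t ack;
	uint16_t gap;
};

/* Assumed layout: 4-byte header followed by a flexible array of entries. */
struct gap_ack_blks {
	uint16_t len;
	uint8_t ugack_cnt;
	uint8_t bgack_cnt;
	struct gap_ack gacks[];
};

static_assert(sizeof(struct gap_ack_blks) == 4, "header is 4 bytes");

/* Userspace equivalent of struct_size(p, gacks, n) for this layout. */
static size_t gap_ack_blks_size(unsigned int n)
{
	return sizeof(struct gap_ack_blks) + (size_t)n * sizeof(struct gap_ack);
}

/*
 * A record is usable only if its len field matches its counts AND the
 * whole record fits inside the message data area.
 */
static bool gap_ack_blks_usable(uint16_t len, uint8_t ugack_cnt,
				uint8_t bgack_cnt, size_t data_sz)
{
	size_t sz = gap_ack_blks_size((unsigned int)ugack_cnt + bgack_cnt);

	if (len != sz)
		return false;	/* not self-consistent */
	return sz <= data_sz;	/* the bound the broadcast path lacked */
}
```

With bgack_cnt = 0xff and ugack_cnt = 0, gap_ack_blks_size(255) is
4 + 255 * 4 = 1024 bytes, which matches len = 0x400, so the consistency
check alone passes; only the comparison against the data size rejects
the record when the data area is shorter than 1024 bytes.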



* Re: [PATCH bpf-next v5 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: Avinash Duduskar @ 2026-06-24 11:54 UTC (permalink / raw)
  To: toke, ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	dsahern
In-Reply-To: <87pl1gcmgf.fsf@toke.dk>

> > +	if (flags & BPF_FIB_LOOKUP_VLAN)
> > +		return -EINVAL;
> > +
>
> This is fine, but we should probably reject the input flag as well in
> the next patch (for symmetry).

I dug into this and I don't think the two are symmetric. The egress
reject is right for exactly the reason you gave: in tc you can redirect
to the VLAN device directly, so reducing the egress to the physical
parent is only needed for XDP. But that is a transmit argument, and
VLAN_INPUT never touches the egress or redirect side. It only sets the
lookup's ingress (flowi_iif), which picks the iif policy rule and the VRF
table, and there is no XDP-only constraint there for the symmetry to
mirror.

tc also has a real user for it. In __netif_receive_skb_core() the tcx
ingress hook runs before vlan_do_receive() demuxes the frame, so a clsact
program on the physical port sees a tagged frame with skb->dev still the
physical device and the tag in skb->vlan_tci. That is exactly the
physical-ifindex-plus-tag input VLAN_INPUT takes, and it wants the
subinterface-scoped answer. The 2/3 selftest already runs the VLAN_INPUT
cases on the tc path, including the VRF-table-selection ones, and they
pass, so this isn't a theoretical tc path.

So I would keep VLAN_INPUT allowed on both. If you would rather hold a
uniform "both VLAN flags are XDP-only" line under the same
restrict-now-relax-later rule as the TBID and OUTPUT combos, I will add
the reject, but unlike the egress case it removes a working tc path
rather than one tc cannot use. Happy either way, just let me know.

Thanks for the review.

-Avinash


* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Pablo Neira Ayuso @ 2026-06-24 11:51 UTC (permalink / raw)
  To: Alexander Martyniuk
  Cc: stable, Greg Kroah-Hartman, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <ajvEDFOlP7Bqb-3j@chamomile>

BTW, fixing Cc: to netfilter-devel@vger.kernel.org

On Wed, Jun 24, 2026 at 01:48:31PM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> Thanks but why only 5.10?
> 
> On Wed, Jun 24, 2026 at 02:01:15PM +0000, Alexander Martyniuk wrote:
> > From: Xiang Mei <xmei5@asu.edu>
> > 
> > commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream
> > 
> > The fallback path of dump_mac_header() guards the MAC header access
> > only with "skb->mac_header != skb->network_header", without checking
> > skb_mac_header_was_set(). When the MAC header is unset, mac_header is
> > 0xffff, so the test passes and skb_mac_header(skb) returns
> > skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
> > dev->hard_header_len bytes out of bounds into the kernel log.
> > 
> > This is reachable via the netdev logger: nf_log_unknown_packet() calls
> > dump_mac_header() unconditionally, and an skb sent through AF_PACKET
> > with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
> > unset (__dev_queue_xmit(), which would reset it, is bypassed).
> > 
> > Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
> > uses, and replace the open-coded MAC header length test with
> > skb_mac_header_len(). Only skbs with an unset MAC header are affected;
> > valid ones are dumped as before.
> > 
> >  BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> >  Read of size 1 at addr ffff88800ea49d3f by task exploit/148
> >  Call Trace:
> >   kasan_report (mm/kasan/report.c:595)
> >   dump_mac_header (net/netfilter/nf_log_syslog.c:831)
> >   nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
> >   nf_log_packet (net/netfilter/nf_log.c:260)
> >   nft_log_eval (net/netfilter/nft_log.c:60)
> >   nft_do_chain (net/netfilter/nf_tables_core.c:285)
> >   nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
> >   nf_hook_slow (net/netfilter/core.c:619)
> >   nf_hook_direct_egress (net/packet/af_packet.c:257)
> >   packet_xmit (net/packet/af_packet.c:280)
> >   packet_sendmsg (net/packet/af_packet.c:3114)
> >   __sys_sendto (net/socket.c:2265)
> > 
> > Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Assisted-by: Claude:claude-opus-4-8
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
> > ---
> > Backport fix for CVE-2026-52942
> >  net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
> > index d07583fac8f8..d6164e8e2c73 100644
> > --- a/net/ipv4/netfilter/nf_log_ipv4.c
> > +++ b/net/ipv4/netfilter/nf_log_ipv4.c
> > @@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
> >  
> >  fallback:
> >  	nf_log_buf_add(m, "MAC=");
> > -	if (dev->hard_header_len &&
> > -	    skb->mac_header != skb->network_header) {
> > +	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> > +	    skb_mac_header_len(skb) != 0) {
> >  		const unsigned char *p = skb_mac_header(skb);
> >  		unsigned int i;
> >  
> > -- 
> > 2.43.0
> > 


* Re: [PATCH 3/7] dt-bindings: net: rockchip-dwmac: Allow 9 clocks
From: Heiko Stübner @ 2026-06-24 11:50 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wu, Maxime Coquelin, Alexandre Torgue, Yanan He
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32, Yanan He
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-3-5aef608a3f64@gmail.com>

Am Mittwoch, 24. Juni 2026, 10:44:40 Mitteleuropäische Sommerzeit schrieb Yanan He:
> RV1126 has a separate GMAC Ethernet output clock used as the external
> PHY reference clock. This clock is described in addition to the existing
> GMAC clocks.

As stated in the driver patch, this is the input clock for the PHY,
and Ethernet PHYs are perfectly capable of handling their own input
clocks. See the reference in the driver patch.


Heiko


> Signed-off-by: Yanan He <grumpycat921013@gmail.com>
> ---
>  Documentation/devicetree/bindings/net/rockchip-dwmac.yaml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> index 80c252845349..86a7e83675ae 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
> @@ -71,7 +71,7 @@ properties:
>  
>    clocks:
>      minItems: 4
> -    maxItems: 8
> +    maxItems: 9
>  
>    clock-names:
>      contains:
> 
> 






* Re: [PATCH 1/1] xfrm: nat_keepalive: avoid double free on send error
From: Steffen Klassert @ 2026-06-24 11:49 UTC (permalink / raw)
  To: Qianyu Luo; +Cc: Eyal Birger, Ren Wei, netdev, herbert, davem, yuantan098, bird
In-Reply-To: <CAPOno=q0NxA_LOJ3QaM9Jgk1oM4ZQiW9igipGfxV7+yGvjr+0g@mail.gmail.com>

On Fri, Jun 19, 2026 at 05:06:40PM +0800, Qianyu Luo wrote:
> On Fri, Jun 19, 2026 at 1:21 AM Eyal Birger <eyal.birger@gmail.com> wrote:
> >
> > On Thu, Jun 18, 2026 at 9:36 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
> > >
> > > From: Qianyu Luo <qianyuluo3@gmail.com>
> > >
> > >         skb_dst_set(skb, &rt->dst);
> > >
> > > @@ -100,6 +102,7 @@ static int nat_keepalive_send_ipv6(struct sk_buff *skb,
> > >         sock_net_set(sk, net);
> > >         dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL);
> > >         if (IS_ERR(dst)) {
> > > +               kfree_skb(skb);
> > >                 local_unlock_nested_bh(&nat_keepalive_sk_ipv6.bh_lock);
> >
> > Any reason to do the kfree under lock?
> >
> 
> Thank you for your reply! I did this without particular reason.
> Just to keep the pre-handoff cleanup in the same error path.
> kfree_skb() does not need the bh lock there, so I can move it after the
> unlock in v2 if you need!

Then move it after the unlock to make it clear it is not needed.


* Re: [PATCH 4/7] net: stmmac: dwmac-rk: Enable refout clock for RGMII
From: Heiko Stübner @ 2026-06-24 11:49 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Wu, Maxime Coquelin, Alexandre Torgue, Yanan He
  Cc: devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32, Yanan He
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-4-5aef608a3f64@gmail.com>

Hi,

Am Mittwoch, 24. Juni 2026, 10:44:41 Mitteleuropäische Sommerzeit schrieb Yanan He:
> Some Rockchip GMAC integrations use clk_mac_refout as an external PHY
> reference clock even when the MAC is configured for RGMII.
> 
> RV1126 boards can route CLK_GMAC_ETHERNET_OUT to the external PHY as a
> 25 MHz reference clock. If the driver does not acquire and enable this
> clock in RGMII mode, the common clock framework may disable it as unused
> and the PHY can lose its reference clock.
> 
> Enable the refout clock handling for RGMII in addition to RMII.

The clock you're referencing is not limited to your RV1126 but is
instead present on most (all?) Rockchip SoCs.

It is also an input clock for the PHY itself, so it should be handled there.

See for example
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588-tiger.dtsi#n313
as a reference.


Heiko

> Signed-off-by: Yanan He <grumpycat921013@gmail.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index 8d7042e68926..f6fdc0c5b475 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -1112,7 +1112,8 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
>  	bsp_priv->clk_enabled = false;
>  
>  	bsp_priv->num_clks = ARRAY_SIZE(rk_clocks);
> -	if (phy_iface == PHY_INTERFACE_MODE_RMII)
> +	if (phy_iface == PHY_INTERFACE_MODE_RMII ||
> +	    phy_iface == PHY_INTERFACE_MODE_RGMII)
>  		bsp_priv->num_clks += ARRAY_SIZE(rk_rmii_clocks);
>  
>  	bsp_priv->clks = devm_kcalloc(dev, bsp_priv->num_clks,
> @@ -1123,7 +1124,8 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
>  	for (i = 0; i < ARRAY_SIZE(rk_clocks); i++)
>  		bsp_priv->clks[i].id = rk_clocks[i];
>  
> -	if (phy_iface == PHY_INTERFACE_MODE_RMII) {
> +	if (phy_iface == PHY_INTERFACE_MODE_RMII ||
> +	    phy_iface == PHY_INTERFACE_MODE_RGMII) {
>  		for (j = 0; j < ARRAY_SIZE(rk_rmii_clocks); j++)
>  			bsp_priv->clks[i++].id = rk_rmii_clocks[j];
>  	}
> 
> 






* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Pablo Neira Ayuso @ 2026-06-24 11:48 UTC (permalink / raw)
  To: Alexander Martyniuk
  Cc: stable, Greg Kroah-Hartman, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Patrick McHardy, netfilter-devel, coreteam,
	netdev, linux-kernel, Weiming Shi, Xiang Mei
In-Reply-To: <20260624140117.19799-1-alexevgmart@gmail.com>

Hi,

Thanks but why only 5.10?

On Wed, Jun 24, 2026 at 02:01:15PM +0000, Alexander Martyniuk wrote:
> From: Xiang Mei <xmei5@asu.edu>
> 
> commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream
> 
> The fallback path of dump_mac_header() guards the MAC header access
> only with "skb->mac_header != skb->network_header", without checking
> skb_mac_header_was_set(). When the MAC header is unset, mac_header is
> 0xffff, so the test passes and skb_mac_header(skb) returns
> skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
> dev->hard_header_len bytes out of bounds into the kernel log.
> 
> This is reachable via the netdev logger: nf_log_unknown_packet() calls
> dump_mac_header() unconditionally, and an skb sent through AF_PACKET
> with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
> unset (__dev_queue_xmit(), which would reset it, is bypassed).
> 
> Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
> uses, and replace the open-coded MAC header length test with
> skb_mac_header_len(). Only skbs with an unset MAC header are affected;
> valid ones are dumped as before.
> 
>  BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
>  Read of size 1 at addr ffff88800ea49d3f by task exploit/148
>  Call Trace:
>   kasan_report (mm/kasan/report.c:595)
>   dump_mac_header (net/netfilter/nf_log_syslog.c:831)
>   nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
>   nf_log_packet (net/netfilter/nf_log.c:260)
>   nft_log_eval (net/netfilter/nft_log.c:60)
>   nft_do_chain (net/netfilter/nf_tables_core.c:285)
>   nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
>   nf_hook_slow (net/netfilter/core.c:619)
>   nf_hook_direct_egress (net/packet/af_packet.c:257)
>   packet_xmit (net/packet/af_packet.c:280)
>   packet_sendmsg (net/packet/af_packet.c:3114)
>   __sys_sendto (net/socket.c:2265)
> 
> Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
> ---
> Backport fix for CVE-2026-52942
>  net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
> index d07583fac8f8..d6164e8e2c73 100644
> --- a/net/ipv4/netfilter/nf_log_ipv4.c
> +++ b/net/ipv4/netfilter/nf_log_ipv4.c
> @@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
>  
>  fallback:
>  	nf_log_buf_add(m, "MAC=");
> -	if (dev->hard_header_len &&
> -	    skb->mac_header != skb->network_header) {
> +	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
> +	    skb_mac_header_len(skb) != 0) {
>  		const unsigned char *p = skb_mac_header(skb);
>  		unsigned int i;
>  
> -- 
> 2.43.0
> 

^ permalink raw reply

* Re: [PATCH bpf-next v8 3/7] bpf: add bpf_icmp_send kfunc
From: Mahe Tardy @ 2026-06-24 11:45 UTC (permalink / raw)
  To: Emil Tsalapatis
  Cc: bpf, andrii, ast, daniel, edumazet, john.fastabend, jordan, kuba,
	martin.lau, netdev, netfilter-devel, pabeni, yonghong.song
In-Reply-To: <ajuqZMzqACLOijoC@gmail.com>

On Wed, Jun 24, 2026 at 11:59:00AM +0200, Mahe Tardy wrote:
> On Tue, Jun 23, 2026 at 10:09:20PM -0400, Emil Tsalapatis wrote:
> > On Mon Jun 22, 2026 at 8:05 AM EDT, Mahe Tardy wrote:
> 
> [...]
> 
> > > +#if IS_ENABLED(CONFIG_IPV6)
> > > +	case htons(ETH_P_IPV6):
> > > +		if (type != ICMPV6_DEST_UNREACH)
> > > +			return -EOPNOTSUPP;
> > > +		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > > +			return -EINVAL;
> > > +
> > > +		nskb = skb_clone(skb, GFP_ATOMIC);
> > > +		if (!nskb)
> > > +			return -ENOMEM;
> > > +
> > > +		if (!pskb_network_may_pull(nskb, sizeof(struct ipv6hdr))) {
> > 
> > Minor nit, but this may also fail with SKB_DROP_REASON_NOMEM. Now this is only
> > possible if the IP header is not in the linear space which may well be
> > impossible (?), but do we want to differentiate with
> > pskb_network_may_pull_reason()?
> 
> Indeed, I think for the IP header it should be fine, but I replaced it
> with the reason variant. Thanks!
>  
> > > +			kfree_skb(nskb);
> > > +			return -EBADMSG;
> > > +		}
> > > +
> 
> [...]
> 
> > >  static int __init bpf_kfunc_init(void)
> > >  {
> > >  	int ret;
> > > @@ -12639,6 +12745,9 @@ static int __init bpf_kfunc_init(void)
> > >  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > >  					       &bpf_kfunc_set_sock_addr);
> > >  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_icmp_send);
> > > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_icmp_send);
> > 
> > Based on Sashiko's feedback, since we mostly care about cgroup_skb
> > should we just make it exclusive to them and drop CLS_ACT?
> 
> This would indeed simplify this patchset, I could drop most of the
> complication induced by tc ingress routing. But I think having both
> cgroup_skb and tc support would be nice as a first implementation. I'll try
> again in a new version as I added a test for ingress tc and could
> actually fix the routing based on sashiko's feedback (this also drops the
> first two patches that were partially wrong).

tl;dr: I'll remove the tc support as it feels difficult (impossible
without major plumbing changes?) to get right.

Here are the details:

Initially I ended up removing the first two patches of the set as they were
technically wrong (see the explanation below), and I added this small helper:

	#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
	static bool skb_dst_validate_and_hold(struct sk_buff *skb)
	{
	       bool ret;

	       rcu_read_lock();
	       ret = skb_valid_dst(skb) && skb_dst_force(skb);
	       rcu_read_unlock();

	       return ret;
	}
	#endif

And then the body of the kfunc would do something like this (instead of
calling the removed helpers):

	reason = pskb_network_may_pull_reason(nskb, sizeof(struct iphdr));
	if (reason) {
		kfree_skb_reason(nskb, reason);
		return -EBADMSG;
	}

	memset(IPCB(nskb), 0, sizeof(struct inet_skb_parm));
	IPCB(nskb)->iif = nskb->skb_iif;

	if (!skb_dst_validate_and_hold(nskb)) {
		if (!nskb->dev) {
			kfree_skb(nskb);
			return -ENODEV;
		}

		iph = ip_hdr(nskb);
		reason = ip_route_input(nskb, iph->daddr, iph->saddr,
					ip4h_dscp(iph), nskb->dev);
		if (reason) {
			kfree_skb_reason(nskb, reason);
			return -EHOSTUNREACH;
		}
	}

	icmp_send(nskb, type, code, 0);

Then I added a tc ingress test to showcase the issue with the previous
helpers and trigger the routing in the kfunc, with this setup:

	  client ns:                    test ns:
	  icmp_peer                     ns_icmp_send_unreach_route_ingress
	+------------+                +-------------------------+
	| icmp_cli   |                | icmp_srv                |
	| 198.18.0.1 |--------------->| primary:   198.18.0.254 |
	+------------+ TCP SYN        | local dst: 198.18.0.2   |
		       dst=198.18.0.2 +-------------------------+
					  | tc ingress BPF
					  | calls bpf_icmp_send()
					  | -> icmp src=198.18.0.2
					  v

With the previous helpers, the ICMP message would be routed by inverting the
daddr immediately and then asking for the route. Thus we would "forget" about
the actual source address, and in this case we would end up with the
ICMP control message src IP being the primary address and not the one we
wanted: 198.18.0.2. It turns out route input was sufficient to give the
needed _correct_ information to icmp_send, which would then invert the
addresses for us and route the packet again.

Then I submitted that for more review by sashiko and it found this new
thing (which is orthogonal to the previous issue):

	> @@ -12639,6 +12788,9 @@ static int __init bpf_kfunc_init(void)
	>  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
	>  					       &bpf_kfunc_set_sock_addr);
	>  	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_icmp_send);
	> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_icmp_send);

	Can exposing net/core/filter.c:bpf_icmp_send() to sched_cls and sched_act
	deadlock a non-lockless qdisc?

	A cls_bpf program can run from a qdisc enqueue classifier while the qdisc
	root lock is held.  If it calls bpf_icmp_send(), the kfunc synchronously
	goes through icmp_send()/icmpv6_send() and then normal transmit.  If the
	reply routes back through the same qdisc, the inner transmit can try to take
	the same root lock again.

	One possible path is:

	__dev_xmit_skb()
	  q->enqueue()
	  prio_enqueue()
	  tcf_classify()
	  cls_bpf_classify()
	  bpf_icmp_send()
	  icmp_send()/icmpv6_send()
	  dev_queue_xmit()
	  __dev_xmit_skb()

	Is this kfunc safe in enqueue classifier/action contexts, or should this
	registration be limited to contexts that cannot run under the qdisc root
	lock?

I did manage to reproduce this deadlock. So I think there's no way
to implement this safely; we would need to either:
- make the kfunc available to tcx only (and then prevent programs
  verified as TCX from being reused as legacy qdisc classifiers...)
- add some crazy runtime guard, exposing the lock.

So I will give up on tc support for now as it's more difficult than
expected.
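
For readers following along, the re-entrancy problem sashiko describes can
be sketched as a small userspace model. This is only an illustration under
stated assumptions: struct model_qdisc, model_dev_xmit() and
model_classify() are hypothetical names, not kernel APIs, and a boolean
stands in for the qdisc root lock so the second acquisition is reported as
-EDEADLK instead of spinning forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a non-lockless qdisc; not a kernel structure. */
struct model_qdisc {
	bool root_lock_held;        /* stands in for the qdisc root spinlock */
	bool classifier_sends_icmp; /* classifier calls the kfunc once */
};

static int model_dev_xmit(struct model_qdisc *q);

/*
 * Models cls_bpf_classify() calling bpf_icmp_send(): the ICMP reply is
 * assumed to route back through the same qdisc, so transmit is re-entered.
 */
static int model_classify(struct model_qdisc *q)
{
	if (!q->classifier_sends_icmp)
		return 0;
	q->classifier_sends_icmp = false; /* the reply itself is not answered */
	return model_dev_xmit(q);
}

/* Models __dev_xmit_skb(): take the root lock, then run the classifier. */
static int model_dev_xmit(struct model_qdisc *q)
{
	int ret;

	if (q->root_lock_held)
		return -EDEADLK; /* a real spinlock would never return here */

	q->root_lock_held = true;
	ret = model_classify(q);
	q->root_lock_held = false;
	return ret;
}
```

Calling model_dev_xmit() with classifier_sends_icmp set returns -EDEADLK,
which corresponds to the nested __dev_xmit_skb() in the call chain above; 
with the flag clear the call returns 0.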

> 
> > >  	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > >  }
> > >  late_initcall(bpf_kfunc_init);
> > > --
> > > 2.34.1
> > 

^ permalink raw reply

* Re: [PATCH net] net: clear transport header during tunnel decapsulation
From: Eric Dumazet @ 2026-06-24 11:44 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Ido Schimmel, David Ahern, netdev, eric.dumazet,
	syzbot+d5d0d598a4cfdfafdc3b
In-Reply-To: <95a719af-c9d3-4bce-995b-c6ffce15739c@linux.dev>

On Wed, Jun 24, 2026 at 3:41 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
>
> On 6/24/26 3:32 PM, Eric Dumazet wrote:
> > Syzbot triggered a DEBUG_NET_WARN_ON_ONCE(len > INT_MAX) assertion in
> > pskb_may_pull_reason() called from qdisc_pkt_len_segs_init().
> >
> > The root cause is a stale, negative transport header offset carried over
> > during tunnel decapsulation. When a tunnel receiver (e.g., VXLAN or Geneve)
> > decapsulates a packet, it pulls the outer headers but leaves the transport
> > header pointing to the outer UDP header. This offset becomes negative
> > relative to the new skb->data (inner IP header).
> >
> > If the packet bypasses GRO (e.g., an untrusted GSO packet flagged as
> > "unexpected GSO" by udp_unexpected_gso() due to missing tunnel GSO bits),
> > it is flushed directly to the stack as GRO_NORMAL. On ingress, Layer 2 Qdisc
> > processing (sch_handle_ingress) happens before Layer 3 IP reception
> > (ip_rcv_core) can run and reset the transport header. Consequently,
> > qdisc_pkt_len_segs_init() attempts to validate the transport header using
> > pskb_may_pull(skb, hdr_len + sizeof(tcphdr)). The negative hdr_len overflows
> > the unsigned cast in pskb_may_pull(), triggering the assertion.
> >
> > Fix this by clearing the transport header to the ~0U sentinel value during
> > decapsulation. This ensures that:
> > 1) The ingress Qdisc safely skips validation via !skb_transport_header_was_set()
> >     and returns early without warning.
> > 2) The IP layer (ip_rcv_core) later correctly resets the transport header
> >     to the inner L4 header offset.
> >
> > Introduce skb_unset_transport_header() helper and apply it in the main
> > decapsulation paths:
> > 1) __iptunnel_pull_header() (covering Geneve, GRE, IPIP, SIT, etc.)
> > 2) vxlan_rcv() (covering VXLAN)
> >
> > This restores skb invariants at the decapsulation boundary without adding
> > overhead to the Qdisc fast path.
> >
> > Fixes: 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
> > Reported-by: syzbot+d5d0d598a4cfdfafdc3b@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/netdev/6a3b853b.52ae72c2.136ac7.000c.GAE@google.com/T/#u
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Assisted-by: Gemini:gemini-3.1-pro
>
>
> I think a negative skb_transport_offset() should break something else too,
> so the Fixes tag looks wrong, but I couldn't find any actual breakage
> (luck, or I'm missing it).

Read the changelog again: the transport header is set (in ingress) a bit
later in the stack.

Nothing needs it before that, except qdisc_pkt_len_segs_init() if/when it is
called in ingress.

>
> Hope sashiko reads this reply and confirms it....

On older kernels (before 7fb4c1967011 ("net: pull headers in
qdisc_pkt_len_segs_init()")),
the bug is completely latent and harmless.

Pointing the Fixes: tag at that commit prevents unnecessary backporting churn
and potential merge conflicts on very old kernels where
skb_unset_transport_header() doesn't exist.

The Historical Option (a6d5bbf34efa / d342894c5d28):

If we point to the original commits that introduced the tunnels,
we are historically accurate, but we risk stable scripts trying to
backport this fix all the way back to 2012/2016
(e.g. kernel 3.7 or 4.6), which is unnecessary and highly likely to
fail to apply.
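
The arithmetic in the quoted changelog can be checked with a small
userspace model. This is a sketch under stated assumptions, not kernel
code: struct model_skb and the model_* helpers are hypothetical, the TCP
header is taken as 20 bytes, and the example pull of 30 bytes (UDP 8 +
Geneve 8 + inner Ethernet 14) is an illustrative choice.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TCP_HDR_LEN 20            /* TCP header without options */
#define MODEL_HDR_UNSET ((uint16_t)~0U) /* "unset" sentinel, 0xffff */

/* Hypothetical model of the two sk_buff offsets involved. */
struct model_skb {
	int data_off;              /* offset of skb->data from skb->head */
	uint16_t transport_header; /* offset from skb->head, or the sentinel */
};

static bool model_transport_header_was_set(const struct model_skb *skb)
{
	return skb->transport_header != MODEL_HDR_UNSET;
}

/* Models skb_transport_offset(): transport header relative to skb->data. */
static int model_transport_offset(const struct model_skb *skb)
{
	return (int)skb->transport_header - skb->data_off;
}

/* Returns true when the modelled "len > INT_MAX" check would fire. */
static bool model_qdisc_check_warns(const struct model_skb *skb)
{
	unsigned int len;

	if (!model_transport_header_was_set(skb))
		return false; /* early return, nothing to validate */

	/* A negative offset turns into a huge unsigned length. */
	len = (unsigned int)(model_transport_offset(skb) + MODEL_TCP_HDR_LEN);
	return len > (unsigned int)INT_MAX;
}

/* Models decapsulation: pull outer headers, optionally clear the offset. */
static void model_decap(struct model_skb *skb, int pulled, bool clear)
{
	skb->data_off += pulled;
	if (clear)
		skb->transport_header = MODEL_HDR_UNSET;
}
```

With the transport header left pointing at the outer UDP header, the
offset after a 30-byte pull is -30 and the modelled check fires; with the
header cleared during decapsulation, the check is skipped.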

^ permalink raw reply

* Re: [PATCH net v2] fsl/fman: Free init resources on KeyGen failure in fman_init()
From: Breno Leitao @ 2026-06-24 11:40 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
	kuba, pabeni, florinel.iordache, netdev, linux-kernel, stable,
	Pavan Chebbi
In-Reply-To: <20260624055119.2776641-1-haoxiang_li2024@163.com>

On Wed, Jun 24, 2026 at 01:51:19PM +0800, Haoxiang Li wrote:
> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
> index 013273a2de32..3a2a57207e55 100644
> --- a/drivers/net/ethernet/freescale/fman/fman.c
> +++ b/drivers/net/ethernet/freescale/fman/fman.c
> @@ -1995,12 +1995,12 @@ static int fman_init(struct fman *fman)
>  
>  	/* Init KeyGen */
>  	fman->keygen = keygen_init(fman->kg_regs);
> -	if (!fman->keygen)
> +	if (!fman->keygen) {
> +		free_init_resources(fman);

That makes sense, fman_init() is doing the same earlier when "MURAM
alloc for BMI FIFO failed".

For this patch only, please feel free to add Reviewed-by: Breno Leitao <leitao@debian.org>

>  		return -EINVAL;
> +	}
>  
> -	err = enable(fman, cfg);
> -	if (err != 0)
> -		return err;
> +	enable(fman, cfg);

I understand the "while at it", but this should be a separate patch,
and it isn't a fix for 7472f4f281d0 ("fsl/fman: enable FMan Keygen")

Separate this in a different patch, targeting net-next. Also, enable()
might return "void" instead of "int", given it only returns 0.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net v2] ice: eswitch: fix use-after-free of metadata_dst in repr release
From: Marcin Szycik @ 2026-06-24 11:36 UTC (permalink / raw)
  To: Doruk Tan Ozturk, anthony.l.nguyen, przemyslaw.kitszel,
	andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: michal.swiatkowski, wojciech.drewek, intel-wired-lan, netdev,
	linux-kernel, stable, horms
In-Reply-To: <20260618145003.47471-1-doruk@0sec.ai>



On 18/06/2026 16:50, Doruk Tan Ozturk wrote:
> ice_eswitch_release_repr() frees the port representor metadata_dst via
> metadata_dst_free(), which directly kfree()s the object and ignores the
> dst_entry refcount. The eswitch slow-path TX routine
> ice_eswitch_port_start_xmit() takes a reference on this dst with
> dst_hold() and attaches it to the skb via skb_dst_set(). If such an skb
> is still in flight (e.g. queued in a qdisc) when the representor is torn
> down, the metadata_dst is freed while the skb still points at it. When
> the skb is later freed, dst_release() operates on already-freed memory.
> 
> Replace metadata_dst_free() with dst_release() so the metadata_dst is
> freed only after the last reference is dropped. The dst subsystem frees
> metadata_dst objects from dst_destroy() once the refcount reaches zero
> (DST_METADATA is set by metadata_dst_alloc()).
> 
> Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
> nft_tunnel: fix use-after-free on object destroy").
> 
> Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
> Cc: stable@vger.kernel.org
> Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
> Reviewed-by: Simon Horman <horms@kernel.org>

Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>

> ---
> v2:
>  - Correct the Fixes: tag to 1a1c40df2e80 ("ice: set and release
>    switchdev environment"); the previously cited fff292b47ac1 only moved
>    the affected code rather than introducing the unbalanced free, and the
>    bug dates back to when switchdev support was added (Simon Horman).
>  - Add Simon Horman's Reviewed-by. No functional change.
> 
>  drivers/net/ethernet/intel/ice/ice_eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> index 2e4f0969035f..41b30a7ca4a9 100644
> --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
> +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> @@ -95,7 +95,7 @@ ice_eswitch_release_repr(struct ice_pf *pf, struct ice_repr *repr)
>  		return;
> 
>  	ice_vsi_update_security(vsi, ice_vsi_ctx_set_antispoof);
> -	metadata_dst_free(repr->dst);
> +	dst_release(&repr->dst->dst);
>  	repr->dst = NULL;
>  	ice_fltr_add_mac_and_broadcast(vsi, repr->parent_mac,
>  				       ICE_FWD_TO_VSI);
> --
> 2.43.0
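
The difference between the two free paths described in the commit message
can be sketched with a toy reference count. This is a userspace model
under stated assumptions: struct model_dst and the model_* functions are
hypothetical stand-ins, and the object is assumed to start with one
reference owned by the representor.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted dst; not the kernel structure. */
struct model_dst {
	int refcnt;
	bool freed;
};

static void model_dst_init(struct model_dst *d)
{
	d->refcnt = 1; /* assumed: allocation hands the owner one reference */
	d->freed = false;
}

/* Models dst_hold(), as done on the slow-path TX for an skb. */
static void model_dst_hold(struct model_dst *d)
{
	d->refcnt++;
}

/* Models dst_release(): the object is freed only on the last put. */
static void model_dst_release(struct model_dst *d)
{
	if (--d->refcnt == 0)
		d->freed = true;
}

/* Models metadata_dst_free(): frees without looking at the refcount. */
static void model_metadata_dst_free(struct model_dst *d)
{
	d->freed = true;
}

/* True when some holder still references memory that was freed. */
static bool model_dangling(const struct model_dst *d)
{
	return d->freed && d->refcnt > 0;
}
```

With an skb still holding a reference, model_metadata_dst_free() leaves a
dangling holder, while model_dst_release() defers the free until the skb
drops its reference.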


^ permalink raw reply

* Re: [PATCH iwl v3] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-24 11:15 UTC (permalink / raw)
  To: Simon Horman
  Cc: anthony.l.nguyen, przemyslaw.kitszel, intel-wired-lan, netdev
In-Reply-To: <20260618152003.909400-1-horms@kernel.org>

Hey Simon,
Thanks for the review. I'll try to address these comments one by one.

On Thu, Jun 18, 2026 at 5:20 PM Simon Horman <horms@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> ice: retry reading NVM if admin queue returns EBUSY
>
> When the admin queue command to read NVM returns EBUSY, the driver
> currently treats it as a fatal error and aborts the entire read operation.
> This patch adds retry logic to handle this condition.
>
> > diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
> > --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
>
> [ ... ]
>
> > @@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
> >                                        offset, read_size,
> >                                        data + bytes_read, last_cmd,
> >                                        read_shadow_ram, NULL);
> > -             if (status)
> > -                     break;
> > -
> > -             bytes_read += read_size;
> > -             offset += read_size;
> > +             if (status) {
> > +                     if (hw->adminq.sq_last_status != LIBIE_AQ_RC_EBUSY ||
> > +                         retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
>
> [Severity: Low]
> Does this check allow an extra retry execution? Since retry_cnt starts at 0,
> using a strictly greater-than check allows the loop to retry 4 times
> (5 total executions) instead of the 3 executions intended by
> ICE_SQ_SEND_MAX_EXECUTE.
>
> > +                             break;
> > +                     ice_debug(hw, ICE_DBG_NVM,
> > +                               "NVM read EBUSY error, retry %d\n",
> > +                               retry_cnt + 1);
> > +                     last_cmd = false;
> > +                     ice_release_nvm(hw);
> > +                     msleep(ICE_SQ_SEND_DELAY_TIME_MS);
>
The review is correct that this allows one extra execution. However, I
intentionally kept this verbatim from the OOT driver (1.15.4) to
retain the significant test exposure it has there. The absolute worst
case is one harmless extra retry before giving up. If you strongly
prefer strict adherence to ICE_SQ_SEND_MAX_EXECUTE, I can update the
condition in a v4.
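
To make the counting concrete, here is a minimal userspace sketch of the
loop when every attempt fails with EBUSY. The names are hypothetical, and
the value 3 for the retry limit is taken from the review text above rather
than from the driver headers.

```c
#include <stdbool.h>

#define MODEL_MAX_EXECUTE 3 /* value as stated in the review above */

/*
 * Counts how many times the read command is issued when every attempt
 * fails with EBUSY. strict_gt selects "retry_cnt > MAX" (as in the patch);
 * otherwise "retry_cnt >= MAX" is used.
 */
static int model_count_executions(bool strict_gt)
{
	int retry_cnt = 0;
	int executions = 0;

	for (;;) {
		bool give_up;

		executions++; /* issue the command; it always fails here */
		give_up = strict_gt ? retry_cnt > MODEL_MAX_EXECUTE
				    : retry_cnt >= MODEL_MAX_EXECUTE;
		if (give_up)
			break;
		retry_cnt++;
	}
	return executions;
}
```

In this model the strict comparison issues the command 5 times (4
retries), which matches the review. Switching to ">=" gives 4 executions,
so a limit of exactly 3 total executions would need a different condition
again.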

> [Severity: High]
> Does dropping the NVM lock here violate the caller's assumption of mutual
> exclusion?
>
> Callers of ice_read_flat_nvm() typically acquire the NVM lock beforehand to
> ensure an atomic read of multiple sectors without interleaved writes. By
> explicitly dropping the lock during the msleep() window, another thread, VF,
> or PF could acquire the NVM lock and modify the NVM before the read resumes,
> potentially resulting in reading torn or inconsistent NVM data.
>
> > +                     status = ice_acquire_nvm(hw, ICE_RES_READ);
> > +                     if (status)
> > +                             break;
>

I don't think this is an issue. The NVM resource (req_res 0x0008) is
an ownership lease, not a content snapshot. The FW grants it with a
hold timeout (see the timeout field handling in ice_aq_req_res). It
never guaranteed a point-in-time view of the flash.
Very similar behavior already exists in ice_devlink_nvm_snapshot(),
which reads the NVM in chunks via ice_read_flat_nvm() and deliberately
acquires/releases the lock between chunks (with a comment explaining
the FW reclaim).
Additionally, the lock is only dropped on the failure path, where
neither bytes_read nor offset have been advanced. The driver simply
re-reads the failed chunk after re-acquiring. So no data that is
returned to the caller is committed across the drop window.
Overall the lock drop does not weaken any guarantee callers actually
have. If we wanted to introduce a true snapshot guarantee, that would
probably require a different approach and a separate patch.

> [Severity: High]
> Can this lead to an unbalanced NVM resource lock release and potential data
> corruption?
>
> If ice_acquire_nvm() fails here, the loop breaks and returns an error with the
> lock currently unheld. However, callers like ice_read_flash_module() are
> unaware of this internal lock state change and will unconditionally call
> ice_release_nvm() upon failure:
>
> drivers/net/ethernet/intel/ice/ice_nvm.c:ice_read_flash_module() {
>         status = ice_acquire_nvm(hw, ICE_RES_READ);
>         if (status)
>                 return status;
>
>         status = ice_read_flat_nvm(hw, start + offset, &length, data, false);
>
>         ice_release_nvm(hw);
>     ...
>
> Because firmware tracks resource locks per-PF rather than per-OS-thread, this
> errant second release could forcefully strip the lock from another thread on
> the same PF that successfully acquired it, exposing the NVM to concurrent
> modification.
>

Agreed, this might be a real bug, and it is the one of the three that I
think is worth investigating.
If ice_acquire_nvm() fails after the drop, ice_read_flat_nvm() returns
with the lock unheld while callers unconditionally call
ice_release_nvm(), so a stray release is issued.
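
A small userspace model of that sequence, for illustration only. The
struct and function names are hypothetical, the first read attempt is
assumed to always hit EBUSY, and the re-acquire failure is forced by a
flag rather than by any real firmware behavior.

```c
#include <stdbool.h>

/* Hypothetical model of the per-PF NVM lease state. */
struct model_hw {
	bool lock_held;       /* this PF currently holds the NVM lease */
	int stray_releases;   /* releases issued while the lease was not held */
	bool reacquire_fails; /* force the re-acquire in the retry path to fail */
};

static int model_acquire(struct model_hw *hw)
{
	hw->lock_held = true;
	return 0;
}

static void model_release(struct model_hw *hw)
{
	if (!hw->lock_held)
		hw->stray_releases++;
	hw->lock_held = false;
}

/* Models the EBUSY retry path: drop the lease, then try to re-acquire. */
static int model_read_flat(struct model_hw *hw)
{
	model_release(hw); /* first attempt failed with EBUSY */
	if (hw->reacquire_fails)
		return -1; /* returns with the lease NOT held */
	model_acquire(hw);
	return 0; /* the retried read is assumed to succeed */
}

/* Models a caller that releases unconditionally after the read. */
static int model_caller(struct model_hw *hw)
{
	int status = model_acquire(hw);

	if (status)
		return status;
	status = model_read_flat(hw);
	model_release(hw); /* unconditional, even if the lease was lost */
	return status;
}
```

When the re-acquire fails, model_caller() returns an error and records one
stray release; in the normal retry case the acquire/release calls stay
balanced.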

On probability, though, the window is very small. Reaching it requires
sustained EBUSY across the retry budget plus a failed re-acquire
(which itself polls up to ICE_NVM_TIMEOUT), and concurrently another
requester taking the lock. Most reads happen during init (ice_probe,
and reset/rebuild via ice_init_nvm), and NVM writes only happen on an
already initialized driver. The devlink/ethtool nvm_read paths are
also exposed, but hitting this race would require precise timing
against a concurrent NVM owner on the device.

I'd prefer to keep the scope of this patch limited to the EBUSY retry
path and not take on the unbalanced-release fix here. A proper fix
should change the lock-ownership contract of ice_read_flat_nvm() (on
error, the lock must be released by ice_read_flat_nvm(), callers
release only on success) and update all callers. The code change should be
simple for all callers except ice_discover_flash_size(), which intentionally
holds one lease across a read loop and would need to re-acquire after
each expected boundary failure.

Given how small the original window is, I'd rather not trade tested
OOT behavior for the risk of a complex unbalanced NVM lock fix. I
actually have a patch mostly ready that fixes the lock-ownership
contract, but I really don't like it. It changes the design of
ice_read_flat_nvm(), making it less intuitive for callers. More
importantly, I just don't have the resources or test coverage right
now to properly verify such a change.

However, I can modify the failure path for ice_acquire_nvm inside
ice_read_flat_nvm. Instead of bailing out immediately, we can just
retry it within the existing retry budget. In this case, the
probability of leaving ice_read_flat_nvm without holding the lock is
reduced even further without needing a refactor.

Please let me know what you think about my thought process on this.

> > +                     retry_cnt++;
> > +             } else {
> > +                     bytes_read += read_size;
> > +                     offset += read_size;
> > +                     retry_cnt = 0;
> > +             }
> >       } while (!last_cmd);
> >
> >       *length = bytes_read;

Thanks,
Robert

^ permalink raw reply

* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-24 11:03 UTC (permalink / raw)
  To: Breno Leitao
  Cc: linux-arch, linux-kernel, sched-ext, netdev, David S . Miller,
	Andrea Righi, Andrew Morton, Arnd Bergmann, Ben Segall,
	Changwoo Min, David Vernet, Dietmar Eggemann, Eric Dumazet,
	Ingo Molnar, Jakub Kicinski, John Ogness, Juri Lelli,
	K Prateek Nayak, Paolo Abeni, Peter Zijlstra, Petr Mladek,
	Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
	Vincent Guittot, Vlad Poenaru
In-Reply-To: <ajuWnKsQR0Z825Wn@gmail.com>

On 2026-06-24 01:37:53 [-0700], Breno Leitao wrote:
Hi Breno,

> Have you considered an approach similar to printk_deferred_enter(),
> where you mark the code region that needs deferral and all WARN() calls
> within that region are automatically deferred?

Doing this at the rq-lock sites is not something the scheduler department
would accept. It bloats the code size more than what we have now.

Not everything is in __sched section so we can't check for this from
within printk. So this turd was the only idea I had.

> The current proposal requires changing individual WARN() call sites,
> but whether they need deferral might depend on the calling context. This
> means you'd need to convert many call sites and ensure all nested
> warnings are also converted to the deferred variant.

I am hoping for forced-threaded-legacy to become the default, but this camp
does not have a lot of members. It would increase the pressure to provide
nbcon, so it could be a good thing.

To accept this series and make it more bullet-proof we could do
s/WARN_ON\>/WARN_ON_DEFERRED/ for all of sched/ and require it regardless
of whether the rq-lock is held. So you wouldn't have to audit it each and
every time. Due to that preempt-disable thingy it can be used in preemptible
sections without breaking anything.

> 
> Thanks,
> --breno

Sebastian

^ permalink raw reply

* [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Alexander Martyniuk @ 2026-06-24 14:01 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Alexander Martyniuk, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Patrick McHardy,
	netfilter-devel, coreteam, netdev, linux-kernel, Weiming Shi,
	Xiang Mei

From: Xiang Mei <xmei5@asu.edu>

commit a84b6fedbc97078788be78dbdd7517d143ad1a77 upstream

The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.

This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).

Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.

 BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
 Read of size 1 at addr ffff88800ea49d3f by task exploit/148
 Call Trace:
  kasan_report (mm/kasan/report.c:595)
  dump_mac_header (net/netfilter/nf_log_syslog.c:831)
  nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
  nf_log_packet (net/netfilter/nf_log.c:260)
  nft_log_eval (net/netfilter/nft_log.c:60)
  nft_do_chain (net/netfilter/nf_tables_core.c:285)
  nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
  nf_hook_slow (net/netfilter/core.c:619)
  nf_hook_direct_egress (net/packet/af_packet.c:257)
  packet_xmit (net/packet/af_packet.c:280)
  packet_sendmsg (net/packet/af_packet.c:3114)
  __sys_sendto (net/socket.c:2265)

Fixes: 7eb9282cd0ef ("netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
---
Backport fix for CVE-2026-52942
 net/ipv4/netfilter/nf_log_ipv4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
index d07583fac8f8..d6164e8e2c73 100644
--- a/net/ipv4/netfilter/nf_log_ipv4.c
+++ b/net/ipv4/netfilter/nf_log_ipv4.c
@@ -296,8 +296,8 @@ static void dump_ipv4_mac_header(struct nf_log_buf *m,
 
 fallback:
 	nf_log_buf_add(m, "MAC=");
-	if (dev->hard_header_len &&
-	    skb->mac_header != skb->network_header) {
+	if (dev->hard_header_len && skb_mac_header_was_set(skb) &&
+	    skb_mac_header_len(skb) != 0) {
 		const unsigned char *p = skb_mac_header(skb);
 		unsigned int i;
 
-- 
2.43.0
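
For illustration, the effect of the old and new guards can be compared in
a userspace model. This is a sketch under stated assumptions: struct
model_skb and the model_* helpers are hypothetical and only mirror the
fields and helpers named in the commit message, with offsets stored as
16-bit values and 0xffff as the "unset" sentinel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HDR_UNSET ((uint16_t)~0U) /* 0xffff, the "unset" sentinel */

/* Hypothetical model of the two sk_buff header offsets involved. */
struct model_skb {
	uint16_t mac_header;     /* offset from skb->head, or the sentinel */
	uint16_t network_header; /* offset from skb->head */
};

/* Models skb_mac_header_was_set(). */
static bool model_mac_header_was_set(const struct model_skb *skb)
{
	return skb->mac_header != MODEL_HDR_UNSET;
}

/* Models skb_mac_header_len(): bytes between MAC and network headers. */
static unsigned int model_mac_header_len(const struct model_skb *skb)
{
	return (unsigned int)(skb->network_header - skb->mac_header);
}

/* Guard before the patch: only compares the two offsets. */
static bool model_old_guard(const struct model_skb *skb,
			    unsigned int hard_header_len)
{
	return hard_header_len && skb->mac_header != skb->network_header;
}

/* Guard after the patch: also requires the MAC header to be set. */
static bool model_new_guard(const struct model_skb *skb,
			    unsigned int hard_header_len)
{
	return hard_header_len && model_mac_header_was_set(skb) &&
	       model_mac_header_len(skb) != 0;
}
```

For an skb whose MAC header is unset, the old guard passes (0xffff differs
from the network header offset) while the new guard rejects it. For an skb
with a valid 14-byte MAC header, both guards pass, so valid packets are
dumped as before.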


^ permalink raw reply related

* Re: [PATCH net] net: clear transport header during tunnel decapsulation
From: Jiayuan Chen @ 2026-06-24 10:41 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Ido Schimmel, David Ahern, netdev, eric.dumazet,
	syzbot+d5d0d598a4cfdfafdc3b
In-Reply-To: <20260624073209.3703492-1-edumazet@google.com>


On 6/24/26 3:32 PM, Eric Dumazet wrote:
> Syzbot triggered a DEBUG_NET_WARN_ON_ONCE(len > INT_MAX) assertion in
> pskb_may_pull_reason() called from qdisc_pkt_len_segs_init().
>
> The root cause is a stale, negative transport header offset carried over
> during tunnel decapsulation. When a tunnel receiver (e.g., VXLAN or Geneve)
> decapsulates a packet, it pulls the outer headers but leaves the transport
> header pointing to the outer UDP header. This offset becomes negative
> relative to the new skb->data (inner IP header).
>
> If the packet bypasses GRO (e.g., an untrusted GSO packet flagged as
> "unexpected GSO" by udp_unexpected_gso() due to missing tunnel GSO bits),
> it is flushed directly to the stack as GRO_NORMAL. On ingress, Layer 2 Qdisc
> processing (sch_handle_ingress) happens before Layer 3 IP reception
> (ip_rcv_core) can run and reset the transport header. Consequently,
> qdisc_pkt_len_segs_init() attempts to validate the transport header using
> pskb_may_pull(skb, hdr_len + sizeof(tcphdr)). The negative hdr_len overflows
> the unsigned cast in pskb_may_pull(), triggering the assertion.
>
> Fix this by clearing the transport header to the ~0U sentinel value during
> decapsulation. This ensures that:
> 1) The ingress Qdisc safely skips validation via !skb_transport_header_was_set()
>     and returns early without warning.
> 2) The IP layer (ip_rcv_core) later correctly resets the transport header
>     to the inner L4 header offset.
>
> Introduce skb_unset_transport_header() helper and apply it in the main
> decapsulation paths:
> 1) __iptunnel_pull_header() (covering Geneve, GRE, IPIP, SIT, etc.)
> 2) vxlan_rcv() (covering VXLAN)
>
> This restores skb invariants at the decapsulation boundary without adding
> overhead to the Qdisc fast path.
>
> Fixes: 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
> Reported-by: syzbot+d5d0d598a4cfdfafdc3b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6a3b853b.52ae72c2.136ac7.000c.GAE@google.com/T/#u
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Assisted-by: Gemini:gemini-3.1-pro


I think a negative skb_transport_offset() should break something else too,
so the Fixes tag looks wrong, but I couldn't find any actual breakage 
(luck, or I'm missing it).

Hopefully sashiko reads this reply and confirms it.




* Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
From: Jakub Sitnicki @ 2026-06-24 10:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Amery Hung, Kuniyuki Iwashima, bpf, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jiayuan Chen, John Fastabend,
	Network Development, kernel-team
In-Reply-To: <CAADnVQKr1XisnigNsBw7CsXxY3Xn5KOGtX_YDdXmNMZyJy4_Cw@mail.gmail.com>

On Tue, Jun 23, 2026 at 02:26 PM -07, Alexei Starovoitov wrote:
> On Tue, Jun 23, 2026 at 1:36 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> This is a follow up from discussions at BPF summit with Alexei & John.
>
> Not quite. The discussion was to disable pieces of sockmap
> that are causing trouble.
> Not to move them under config knobs, but disable them.

I really don't see how we can remove a feature that has users.
I guess we will just have to weather out the bug wave.


* Re: [PATCH net] net: ethernet: qualcomm: ppe: Demote from supported and fix maintainer addresses
From: Jie Luo @ 2026-06-24 10:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Krzysztof Kozlowski, Bjorn Andersson, Michael Turquette,
	Stephen Boyd, Brian Masney, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Lei Wei, Suruchi Agarwal, Pavithra R,
	linux-kernel, linux-arm-msm, linux-clk, devicetree, netdev,
	Kiran Kumar C.S.K, quic_linchen
In-Reply-To: <7095f7ba-bacb-4d03-89cf-ed43882d8213@lunn.ch>

On 6/23/2026 7:31 PM, Andrew Lunn wrote:
> On Tue, Jun 23, 2026 at 05:42:34PM +0800, Jie Luo wrote:
>>
>>
>> On 6/23/2026 4:10 PM, Andrew Lunn wrote:
>>>> Driver is not supported - in terms of how netdev understands supported
>>>> commitment - if maintainer does not care to receive the patches for its
>>>> code, so demote it to "maintained" to reflect true status.
>>>
>>> Maybe "Orphan" would be better, if the listed Maintainer is not doing
>>> any Maintainer work?
>>>
>>> 	   Andrew	   
>>
>> Hello Andrew, Krzysztof,
>> I will continue to maintain the listed drivers, so their status can
>> remain Supported.
> 
> Please understand that being a Maintainer requires that you respond to
> patches and questions about this driver, give Reviewed-by:, ask for
> patches to be changed etc. If you don't respond, ideally within 2 to 3
> days, the driver will be set to Orphaned.
> 
> If you want to maintain the Supported status, we can help you set up
> the needed CI system, and get it registered so it reports the results.
> 
>     Andrew

Thank you Andrew, Krzysztof, for the clarification on what "Supported"
status entails and for the offer to help with CI setup. I would very
much appreciate the community's help in getting the CI system set up
and registered for this driver. In the meantime, we will also look at
resources internally within Qualcomm, to understand how to support
testing using kernelCI/netdevCI for IPQ SoC. This will help us test
the driver continuously as well.

I fully understand and accept the maintainer responsibilities for this
driver, and commit to the below:
- Responding to patches and questions in a timely manner.
- Providing review comments and requesting changes where appropriate,
  and providing Reviewed-by tags when needed.

I would also like to take a moment to provide an update on our current
efforts for IPQ SoC, if it can be of help. We have already restarted
our work on the drivers and are actively extending the IPQ drivers to
support more functionality and newer SoCs in the same family. We plan
to post these updates to the current drivers once the review window
reopens.

We feel maintaining the "Supported" status is appropriate and reflects
our genuine long-term commitment to IPQ SoC networking drivers in the
Linux kernel. We request that you retain the current status for this
driver if acceptable.

Regarding the email ID change, we attempted to rectify the
MAINTAINERS file a few months ago based on an internal recommendation
(please see the thread below); however, we agree that such an update
to the documentation is also required.

https://lore.kernel.org/all/20250903-maintainer_update-v1-1-2183fd2a3c44@oss.qualcomm.com/


* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-24 10:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-kernel, sched-ext, netdev, David S . Miller,
	Andrea Righi, Andrew Morton, Arnd Bergmann, Ben Segall,
	Breno Leitao, Changwoo Min, David Vernet, Dietmar Eggemann,
	Eric Dumazet, Ingo Molnar, Jakub Kicinski, John Ogness,
	Juri Lelli, K Prateek Nayak, Paolo Abeni, Petr Mladek,
	Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
	Vincent Guittot, Vlad Poenaru
In-Reply-To: <20260624093117.GY48970@noisy.programming.kicks-ass.net>

On 2026-06-24 11:31:17 [+0200], Peter Zijlstra wrote:
> On Tue, Jun 23, 2026 at 04:26:49PM +0200, Sebastian Andrzej Siewior wrote:
> 
> > +#ifndef WARN_ON_DEFERRED
> > +#define WARN_ON_DEFERRED(condition) ({					\
> > +	int __ret_warn_on = !!(condition);				\
> > +	if (unlikely(__ret_warn_on)) {					\
> > +		guard(preempt)();					\
> > +		printk_deferred_enter()					\
> > +		__WARN();						\
> > +		printk_deferred_exit()					\
> > +	}								\
> > +	unlikely(__ret_warn_on);					\
> > +})
> > +#endif
> 
> This will generate atrocious shite at the WARN sites.

You mean the missing semicolon and huge size increase?
On x86 with these guard+deferred in the upper variant, before:
    text    data     bss     dec   filename
   93910   37424     832  132166   kernel/sched/core.o
   61802    4945     152   66899   kernel/sched/fair.o
  215108   24453    3768  243329   kernel/sched/build_policy.o
   86128   30092   12704  128924   kernel/sched/build_utility.o
  456948   96914   17456  571318   total
After:
   96140   37408     832  134380   kernel/sched/core.o
   64490    4937     152   69579   kernel/sched/fair.o
  222980   24157    3768  250905   kernel/sched/build_policy.o
   86544   30100   12704  129348   kernel/sched/build_utility.o
  470154   96602   17456  584212   total + 1.3%

total went up by 1.3% or 12.59KiB.
This affects:  alpha, arc, arm, csky, hexagon, m68k, microblaze, mips,
nios2, openrisc, sparc, um, xtensa
and could motivate them to implement __WARN_FLAGS which would lower size
in general and this stunt would have no effect.

Just looked at arm and it has support for invalid opcodes somehow but
not for this.

Sebastian


* [PATCH v3 5/7] net: wwan: t9xx: Add FSM thread
From: Jack Wu via B4 Relay @ 2026-06-24 10:04 UTC (permalink / raw)
  To: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jack Wu, Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Jonathan Corbet, Shuah Khan
  Cc: linux-kernel, netdev, linux-arm-kernel, linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-0-73ff03f60c48@compal.com>

From: Jack Wu <jackbb_wu@compal.com>

The FSM (Finite-state Machine) thread is responsible for
synchronizing the actions of different modules. The
asynchronous events from the device or the OS will trigger
a state transition.

When an event arrives, it is appended to the event queue,
and the FSM thread handles the queued events sequentially.
While processing an event, the FSM thread notifies other
modules before and after the state transition.

Seven FSM states are defined. They can transition from one
state to another, self-transition in some states, and
transition in some sub-states.

Signed-off-by: Jack Wu <jackbb_wu@compal.com>
---
 drivers/net/wwan/t9xx/Makefile                  |   3 +-
 drivers/net/wwan/t9xx/mtk_ctrl_plane.c          |  46 ++
 drivers/net/wwan/t9xx/mtk_ctrl_plane.h          |   2 +
 drivers/net/wwan/t9xx/mtk_dev.h                 |   1 +
 drivers/net/wwan/t9xx/mtk_fsm.c                 | 948 ++++++++++++++++++++++++
 drivers/net/wwan/t9xx/mtk_fsm.h                 | 140 ++++
 drivers/net/wwan/t9xx/mtk_port.c                |  65 ++
 drivers/net/wwan/t9xx/mtk_port.h                |   2 +
 drivers/net/wwan/t9xx/mtk_utility.h             |  33 +
 drivers/net/wwan/t9xx/pcie/mtk_cldma.c          | 222 +++++-
 drivers/net/wwan/t9xx/pcie/mtk_cldma.h          |   3 +
 drivers/net/wwan/t9xx/pcie/mtk_cldma_drv.h      |   3 -
 drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c |   7 +-
 drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.h |   2 -
 drivers/net/wwan/t9xx/pcie/mtk_pci.c            |  16 +-
 drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.c     |  10 +
 drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.h     |   1 -
 17 files changed, 1488 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wwan/t9xx/Makefile b/drivers/net/wwan/t9xx/Makefile
index db3b1aa1928b..75760b2039dc 100644
--- a/drivers/net/wwan/t9xx/Makefile
+++ b/drivers/net/wwan/t9xx/Makefile
@@ -10,4 +10,5 @@ mtk_t9xx-y := \
 	mtk_dev.o \
 	mtk_ctrl_plane.o \
 	mtk_port.o \
-	mtk_port_io.o
+	mtk_port_io.o \
+	mtk_fsm.o
diff --git a/drivers/net/wwan/t9xx/mtk_ctrl_plane.c b/drivers/net/wwan/t9xx/mtk_ctrl_plane.c
index b9a0443ce8ec..dc6a0670fe2b 100644
--- a/drivers/net/wwan/t9xx/mtk_ctrl_plane.c
+++ b/drivers/net/wwan/t9xx/mtk_ctrl_plane.c
@@ -5,10 +5,46 @@
  */
 
 #include <linux/device.h>
+#include <linux/freezer.h>
+#include <linux/kthread.h>
+#include <linux/list.h>
+#include <linux/pm_runtime.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
 
 #include "mtk_ctrl_plane.h"
 #include "mtk_port.h"
 
+#define TAG "CTRL"
+
+static void mtk_ctrl_trans_fsm_state_handler(struct mtk_fsm_param *param,
+					     struct mtk_ctrl_blk *ctrl_blk)
+{
+	struct mtk_md_dev *mdev = ctrl_blk->mdev;
+
+	switch (param->to) {
+	case FSM_STATE_OFF:
+		ctrl_blk->ops->fsm_indication(mdev, param);
+		ctrl_blk->ops->exit(mdev);
+		break;
+	case FSM_STATE_ON:
+		ctrl_blk->ops->init(mdev);
+		fallthrough;
+	default:
+		ctrl_blk->ops->fsm_indication(mdev, param);
+		break;
+	}
+}
+
+static void mtk_ctrl_fsm_state_listener(struct mtk_fsm_param *param, void *data)
+{
+	struct mtk_ctrl_blk *ctrl_blk = data;
+
+	mtk_port_mngr_fsm_state_handler(param, ctrl_blk->port_mngr);
+	mtk_ctrl_trans_fsm_state_handler(param, ctrl_blk);
+	mtk_port_mngr_fsm_state_handler_late(param, ctrl_blk->port_mngr);
+}
+
 /**
  * mtk_ctrl_init() - Initialize the control plane block.
  * @mdev: Pointer to the MTK modem device.
@@ -39,8 +75,17 @@ int mtk_ctrl_init(struct mtk_md_dev *mdev, struct mtk_ctrl_hif_ops *ops, struct
 	if (err)
 		goto err_free_mem;
 
+	err = mtk_fsm_notifier_register(mdev, MTK_USER_CTRL, mtk_ctrl_fsm_state_listener,
+					ctrl_blk, FSM_PRIO_1, false);
+	if (err) {
+		dev_err((mdev)->dev, "Fail to register fsm notification(ret = %d)\n", err);
+		goto err_port_exit;
+	}
+
 	return 0;
 
+err_port_exit:
+	mtk_port_mngr_exit(ctrl_blk);
 err_free_mem:
 	devm_kfree(mdev->dev, ctrl_blk);
 
@@ -58,6 +103,7 @@ void mtk_ctrl_exit(struct mtk_md_dev *mdev)
 {
 	struct mtk_ctrl_blk *ctrl_blk = mdev->ctrl_blk;
 
+	mtk_fsm_notifier_unregister(mdev, MTK_USER_CTRL);
 	mtk_port_mngr_exit(ctrl_blk);
 	devm_kfree(mdev->dev, ctrl_blk);
 	mdev->ctrl_blk = NULL;
diff --git a/drivers/net/wwan/t9xx/mtk_ctrl_plane.h b/drivers/net/wwan/t9xx/mtk_ctrl_plane.h
index d7fcccde8a1b..92817e92a2e4 100644
--- a/drivers/net/wwan/t9xx/mtk_ctrl_plane.h
+++ b/drivers/net/wwan/t9xx/mtk_ctrl_plane.h
@@ -10,6 +10,7 @@
 #include <linux/skbuff.h>
 
 #include "mtk_dev.h"
+#include "mtk_fsm.h"
 
 #define Q_MTU_2K			(0x800)
 #define Q_MTU_3_5K			(0xE00)
@@ -62,6 +63,7 @@ struct mtk_ctrl_hif_ops {
 	int (*init)(struct mtk_md_dev *mdev);
 	int (*exit)(struct mtk_md_dev *mdev);
 	int (*submit_skb)(struct mtk_md_dev *mdev, struct sk_buff *skb, bool force_send);
+	void (*fsm_indication)(struct mtk_md_dev *mdev, struct mtk_fsm_param *param);
 	int (*send_cmd)(struct mtk_md_dev *mdev, int cmd, void *data);
 };
 
diff --git a/drivers/net/wwan/t9xx/mtk_dev.h b/drivers/net/wwan/t9xx/mtk_dev.h
index bb3ea68890ea..2388ada2c6a6 100644
--- a/drivers/net/wwan/t9xx/mtk_dev.h
+++ b/drivers/net/wwan/t9xx/mtk_dev.h
@@ -59,6 +59,7 @@ struct mtk_md_dev {
 	u32 hw_ver;
 	char dev_str[MTK_DEV_STR_LEN];
 	struct mtk_ctrl_blk *ctrl_blk;
+	struct mtk_md_fsm *fsm;
 };
 
 static inline u32 mtk_dev_get_dev_state(struct mtk_md_dev *mdev)
diff --git a/drivers/net/wwan/t9xx/mtk_fsm.c b/drivers/net/wwan/t9xx/mtk_fsm.c
new file mode 100644
index 000000000000..a9943c63986c
--- /dev/null
+++ b/drivers/net/wwan/t9xx/mtk_fsm.c
@@ -0,0 +1,948 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, MediaTek Inc.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/device.h>
+#include <linux/kref.h>
+#include <linux/kthread.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/sched/signal.h>
+#include <linux/skbuff.h>
+#include <linux/wait.h>
+
+#include "mtk_fsm.h"
+#include "mtk_port.h"
+#include "mtk_port_io.h"
+#include "mtk_utility.h"
+
+#define EVT_TF_GATECLOSED (1)
+#define MTK_FSM_INFO_LEN	(64)
+
+#define FSM_HS_START_MASK	(FSM_F_SAP_HS_START | FSM_F_MD_HS_START)
+#define FSM_HS2_DONE_MASK	(FSM_F_SAP_HS2_DONE | FSM_F_MD_HS2_DONE)
+
+#define RTFT_DATA_SIZE		(3 * 1024)
+#define EVT_HANDLER_TIMEOUT	(HZ * 30)
+#define BLOCKING_EVT_TIMEOUT	(2 * EVT_HANDLER_TIMEOUT)
+
+#define REGION_BITMASK		0xF
+#define DEVICE_CFG_SHIFT	24
+#define DEVICE_CFG_REGION_MASK	0x3
+
+enum device_stage {
+	DEV_STAGE_IDLE = 4,
+	DEV_STAGE_MAX
+};
+
+enum device_cfg {
+	DEV_CFG_NORMAL = 0,
+	DEV_CFG_MD_ONLY,
+};
+
+enum runtime_feature_support_type {
+	RTFT_TYPE_NOT_EXIST = 0,
+	RTFT_TYPE_NOT_SUPPORT = 1,
+	RTFT_TYPE_MUST_SUPPORT = 2,
+	RTFT_TYPE_OPTIONAL_SUPPORT = 3,
+	RTFT_TYPE_SUPPORT_BACKWARD_COMPAT = 4,
+};
+
+enum query_runtime_feature_id {
+	QUERY_RTFT_ID_MD_PORT_ENUM = 0,
+	QUERY_RTFT_ID_SAP_PORT_ENUM = 1,
+	QUERY_RTFT_ID_MD_PORT_CFG = 2,
+	QUERY_RTFT_ID_MAX
+};
+
+enum ctrl_msg_id {
+	CTRL_MSG_HS1 = 0,
+	CTRL_MSG_HS2 = 1,
+	CTRL_MSG_HS3 = 2,
+};
+
+struct ctrl_msg_header {
+	__le32 id;
+	__le32 ex_msg;
+	__le32 data_len;
+	u8 reserved[];
+} __packed;
+
+struct runtime_feature_entry {
+	u8 feature_id;
+	struct runtime_feature_info support_info;
+	u8 reserved[2];
+	__le32 data_len;
+	u8 data[];
+};
+
+struct feature_query {
+	__le32 head_pattern;
+	struct runtime_feature_info ft_set[FEATURE_CNT];
+	__le32 tail_pattern;
+};
+
+static int mtk_fsm_send_hs1_msg(struct fsm_hs_info *hs_info)
+{
+	struct ctrl_msg_header *ctrl_msg_h;
+	struct feature_query *ft_query;
+	struct sk_buff *skb;
+	int ret, msg_size;
+
+	msg_size = sizeof(*ctrl_msg_h) + sizeof(*ft_query);
+	skb = __dev_alloc_skb(msg_size, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	skb_put(skb, msg_size);
+	ctrl_msg_h = (struct ctrl_msg_header *)skb->data;
+	ctrl_msg_h->id = cpu_to_le32(CTRL_MSG_HS1);
+	ctrl_msg_h->ex_msg = 0;
+	ctrl_msg_h->data_len = cpu_to_le32(sizeof(*ft_query));
+
+	ft_query = (struct feature_query *)(skb->data + sizeof(*ctrl_msg_h));
+	ft_query->head_pattern = cpu_to_le32(FEATURE_QUERY_PATTERN);
+	memcpy(ft_query->ft_set, hs_info->query_ft_set, sizeof(hs_info->query_ft_set));
+	ft_query->tail_pattern = cpu_to_le32(FEATURE_QUERY_PATTERN);
+
+	/* send handshake1 message to device */
+	ret = mtk_port_internal_write(hs_info->ctrl_port, skb);
+	if (ret <= 0)
+		return ret;
+
+	return 0;
+}
+
+static int mtk_fsm_feature_set_match(enum runtime_feature_support_type *cur_ft_spt,
+				     struct runtime_feature_info rtft_info_st,
+				     struct runtime_feature_info rtft_info_cfg)
+{
+	int ret = 0;
+
+	switch (FIELD_GET(FEATURE_TYPE, rtft_info_st.feature)) {
+	case RTFT_TYPE_NOT_EXIST:
+		fallthrough;
+	case RTFT_TYPE_NOT_SUPPORT:
+		*cur_ft_spt = RTFT_TYPE_NOT_EXIST;
+		break;
+	case RTFT_TYPE_MUST_SUPPORT:
+		if (FIELD_GET(FEATURE_TYPE, rtft_info_cfg.feature) == RTFT_TYPE_NOT_EXIST ||
+		    FIELD_GET(FEATURE_TYPE, rtft_info_cfg.feature) == RTFT_TYPE_NOT_SUPPORT)
+			ret = -EPROTO;
+		else
+			*cur_ft_spt = RTFT_TYPE_MUST_SUPPORT;
+		break;
+	case RTFT_TYPE_OPTIONAL_SUPPORT:
+		if (FIELD_GET(FEATURE_TYPE, rtft_info_cfg.feature) == RTFT_TYPE_NOT_EXIST ||
+		    FIELD_GET(FEATURE_TYPE, rtft_info_cfg.feature) == RTFT_TYPE_NOT_SUPPORT) {
+			*cur_ft_spt = RTFT_TYPE_NOT_SUPPORT;
+		} else {
+			if (FIELD_GET(FEATURE_VER, rtft_info_st.feature) ==
+			    FIELD_GET(FEATURE_VER, rtft_info_cfg.feature))
+				*cur_ft_spt = RTFT_TYPE_MUST_SUPPORT;
+			else
+				*cur_ft_spt = RTFT_TYPE_NOT_SUPPORT;
+		}
+		break;
+	case RTFT_TYPE_SUPPORT_BACKWARD_COMPAT:
+		if (FIELD_GET(FEATURE_VER, rtft_info_st.feature) >=
+		    FIELD_GET(FEATURE_VER, rtft_info_cfg.feature))
+			*cur_ft_spt = RTFT_TYPE_MUST_SUPPORT;
+		else
+			*cur_ft_spt = RTFT_TYPE_NOT_EXIST;
+		break;
+	default:
+		ret = -EPROTO;
+	}
+
+	return ret;
+}
+
+static int (*query_rtft_action[FEATURE_CNT])(struct mtk_md_dev *mdev, void *rt_data) = {
+	[QUERY_RTFT_ID_MD_PORT_ENUM] = mtk_port_status_update,
+	[QUERY_RTFT_ID_SAP_PORT_ENUM] = mtk_port_status_update,
+};
+
+static int mtk_fsm_parse_hs2_msg(struct fsm_hs_info *hs_info)
+{
+	struct mtk_md_fsm *fsm = container_of(hs_info, struct mtk_md_fsm, hs_info[hs_info->id]);
+	char *rt_data = ((struct sk_buff *)hs_info->rt_data)->data;
+	enum runtime_feature_support_type cur_ft_spt;
+	struct runtime_feature_entry *rtft_entry;
+	unsigned int ft_id, offset, data_len;
+	int ret = 0;
+
+	offset = sizeof(struct feature_query);
+	for (ft_id = 0; ft_id < FEATURE_CNT; ft_id++) {
+		if (offset + sizeof(*rtft_entry) > hs_info->rt_data_len)
+			break;
+
+		rtft_entry = (struct runtime_feature_entry *)(rt_data + offset);
+		ret = mtk_fsm_feature_set_match(&cur_ft_spt,
+						rtft_entry->support_info,
+						hs_info->query_ft_set[ft_id]);
+		if (ret < 0)
+			break;
+
+		if (cur_ft_spt == RTFT_TYPE_MUST_SUPPORT)
+			if (query_rtft_action[ft_id])
+				ret = query_rtft_action[ft_id](fsm->mdev, rtft_entry->data);
+		if (ret < 0)
+			break;
+
+		data_len = le32_to_cpu(rtft_entry->data_len);
+		if (data_len > hs_info->rt_data_len - offset - sizeof(*rtft_entry))
+			break;
+
+		offset += sizeof(*rtft_entry) + data_len;
+	}
+
+	if (ft_id != FEATURE_CNT) {
+		dev_err((fsm->mdev)->dev, "Unable to handle mistake hs2 msg, ft_id=%d\n", ft_id);
+		ret = -EPROTO;
+	}
+
+	return ret;
+}
+
+static int mtk_fsm_append_rtft_entries(struct mtk_md_dev *mdev, void *feature_data,
+				       unsigned int *len, struct fsm_hs_info *hs_info)
+{
+	char *rt_data = ((struct sk_buff *)hs_info->rt_data)->data;
+	struct runtime_feature_entry *rtft_entry;
+	int ft_id, ret = 0, rtdata_len = 0;
+	struct feature_query *ft_query;
+
+	ft_query = (struct feature_query *)rt_data;
+	if (le32_to_cpu(ft_query->head_pattern) != FEATURE_QUERY_PATTERN ||
+	    le32_to_cpu(ft_query->tail_pattern) != FEATURE_QUERY_PATTERN) {
+		ret = -EPROTO;
+		goto hs_err;
+	}
+
+	/* parse runtime feature query and fill runtime feature entry */
+	rtft_entry = feature_data;
+	for (ft_id = 0; ft_id < FEATURE_CNT && rtdata_len < RTFT_DATA_SIZE; ft_id++) {
+		rtft_entry->feature_id = ft_id;
+		rtft_entry->data_len = 0;
+
+		switch (FIELD_GET(FEATURE_TYPE, ft_query->ft_set[ft_id].feature)) {
+		case RTFT_TYPE_NOT_EXIST:
+			fallthrough;
+		case RTFT_TYPE_NOT_SUPPORT:
+			fallthrough;
+		case RTFT_TYPE_MUST_SUPPORT:
+			rtft_entry->support_info = ft_query->ft_set[ft_id];
+			break;
+		case RTFT_TYPE_OPTIONAL_SUPPORT:
+			fallthrough;
+		case RTFT_TYPE_SUPPORT_BACKWARD_COMPAT:
+			rtft_entry->support_info.feature = FEATURE_TYPE_NOT;
+			rtft_entry->support_info.feature |= FEATURE_VER_0;
+			break;
+		}
+
+		rtdata_len += sizeof(*rtft_entry) + le32_to_cpu(rtft_entry->data_len);
+		rtft_entry = (struct runtime_feature_entry *)(feature_data + rtdata_len);
+	}
+	*len = rtdata_len;
+	return 0;
+
+hs_err:
+	*len = 0;
+	return ret;
+}
+
+static int mtk_fsm_send_hs3_msg(struct fsm_hs_info *hs_info)
+{
+	struct mtk_md_fsm *fsm = container_of(hs_info, struct mtk_md_fsm, hs_info[hs_info->id]);
+	unsigned int data_len, msg_size = 0;
+	struct ctrl_msg_header *ctrl_msg_h;
+	struct sk_buff *skb;
+	int ret;
+
+	skb = __dev_alloc_skb(RTFT_DATA_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	msg_size += sizeof(*ctrl_msg_h);
+	ctrl_msg_h = (struct ctrl_msg_header *)skb->data;
+	ctrl_msg_h->id = cpu_to_le32(CTRL_MSG_HS3);
+	ctrl_msg_h->ex_msg = 0;
+	ret = mtk_fsm_append_rtft_entries(fsm->mdev,
+					  skb->data + sizeof(*ctrl_msg_h),
+					  &data_len, hs_info);
+	if (ret) {
+		dev_kfree_skb(skb);
+		return ret;
+	}
+
+	ctrl_msg_h->data_len = cpu_to_le32(data_len);
+	msg_size += data_len;
+	skb_put(skb, msg_size);
+	ret = mtk_port_internal_write(hs_info->ctrl_port, skb);
+	if (ret <= 0)
+		return ret;
+
+	return 0;
+}
+
+static int mtk_fsm_sap_ctrl_msg_handler(void *__fsm, struct sk_buff *skb)
+{
+	struct ctrl_msg_header *ctrl_msg_h;
+	struct mtk_md_fsm *fsm = __fsm;
+	struct fsm_hs_info *hs_info;
+	int ret;
+
+	if (skb->len < sizeof(*ctrl_msg_h)) {
+		dev_kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	ctrl_msg_h = (struct ctrl_msg_header *)skb->data;
+	skb_pull(skb, sizeof(*ctrl_msg_h));
+
+	hs_info = &fsm->hs_info[HS_ID_SAP];
+	if (le32_to_cpu(ctrl_msg_h->id) != CTRL_MSG_HS2) {
+		dev_kfree_skb(skb);
+		return -EPROTO;
+	}
+
+	hs_info->rt_data = skb;
+	hs_info->rt_data_len = skb->len;
+	ret = mtk_fsm_evt_submit(fsm->mdev, FSM_EVT_STARTUP,
+				 hs_info->fsm_flag_hs2, hs_info, sizeof(*hs_info), 0);
+	if (ret == FSM_EVT_RET_FAIL)
+		dev_kfree_skb(skb);
+
+	return 0;
+}
+
+static int mtk_fsm_md_ctrl_msg_handler(void *__fsm, struct sk_buff *skb)
+{
+	struct ctrl_msg_header *ctrl_msg_h;
+	struct mtk_md_fsm *fsm = __fsm;
+	struct fsm_hs_info *hs_info;
+	bool consumed_skb = false;
+	int ret;
+
+	if (skb->len < sizeof(*ctrl_msg_h)) {
+		dev_kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	ctrl_msg_h = (struct ctrl_msg_header *)skb->data;
+	hs_info = &fsm->hs_info[HS_ID_MD];
+	switch (le32_to_cpu(ctrl_msg_h->id)) {
+	case CTRL_MSG_HS2:
+		skb_pull(skb, sizeof(*ctrl_msg_h));
+		hs_info->rt_data = skb;
+		hs_info->rt_data_len = skb->len;
+		ret = mtk_fsm_evt_submit(fsm->mdev, FSM_EVT_STARTUP,
+					 hs_info->fsm_flag_hs2, hs_info, sizeof(*hs_info), 0);
+		if (ret != FSM_EVT_RET_FAIL)
+			consumed_skb = true;
+		break;
+	default:
+		dev_err(fsm->mdev->dev, "Invalid ctrl msg id\n");
+	}
+
+	if (!consumed_skb)
+		dev_kfree_skb(skb);
+
+	return 0;
+}
+
+static int (*ctrl_msg_handler[HS_ID_MAX])(void *__fsm, struct sk_buff *skb) = {
+	[HS_ID_MD] = mtk_fsm_md_ctrl_msg_handler,
+	[HS_ID_SAP] = mtk_fsm_sap_ctrl_msg_handler,
+};
+
+static void mtk_fsm_idle_evt_handler(struct mtk_md_dev *mdev,
+				     u32 dev_state, struct mtk_md_fsm *fsm)
+{
+	u32 dev_cfg = dev_state >> DEVICE_CFG_SHIFT & DEVICE_CFG_REGION_MASK;
+	int hs_id;
+
+	if (dev_cfg == DEV_CFG_MD_ONLY)
+		fsm->hs_done_flag = FSM_F_MD_HS_START | FSM_F_MD_HS2_DONE;
+	else
+		fsm->hs_done_flag = FSM_HS_START_MASK | FSM_HS2_DONE_MASK;
+
+	mtk_fsm_evt_submit(mdev, FSM_EVT_STARTUP, FSM_F_DFLT, NULL, 0, 0);
+
+	for (hs_id = 0; hs_id < HS_ID_MAX; hs_id++)
+		mtk_dev_unmask_dev_evt(mdev, fsm->hs_info[hs_id].mhccif_ch);
+}
+
+static int mtk_fsm_early_bootup_handler(u32 status, void *__fsm)
+{
+	struct mtk_md_fsm *fsm = __fsm;
+	struct mtk_md_dev *mdev;
+	u32 dev_state, dev_stage;
+
+	mdev = fsm->mdev;
+	mtk_dev_mask_dev_evt(mdev, status);
+	mtk_dev_clear_dev_evt(mdev, status);
+
+	dev_state = mtk_dev_get_dev_state(mdev);
+	dev_stage = dev_state & REGION_BITMASK;
+	if (dev_stage >= DEV_STAGE_MAX) {
+		dev_err(mdev->dev, "Invalid dev state 0x%x\n", dev_state);
+		return -ENXIO;
+	}
+
+	if (dev_state == fsm->last_dev_state)
+		goto exit;
+	fsm->last_dev_state = dev_state;
+
+	if (dev_stage == DEV_STAGE_IDLE)
+		mtk_fsm_idle_evt_handler(mdev, dev_state, fsm);
+
+exit:
+	return 0;
+}
+
+static int mtk_fsm_ctrl_ch_start(struct mtk_md_fsm *fsm, struct fsm_hs_info *hs_info, int flag)
+{
+	if (!hs_info->ctrl_port) {
+		hs_info->ctrl_port = mtk_port_internal_open(fsm->mdev, hs_info->port_name, flag);
+		if (!hs_info->ctrl_port) {
+			dev_err(fsm->mdev->dev, "Failed to open ctrl port(%s)\n",
+				hs_info->port_name);
+			return -ENODEV;
+		}
+
+		mtk_port_internal_recv_register(hs_info->ctrl_port,
+						ctrl_msg_handler[hs_info->id], fsm);
+	}
+
+	return 0;
+}
+
+static void mtk_fsm_ctrl_ch_stop(struct mtk_md_fsm *fsm)
+{
+	struct fsm_hs_info *hs_info;
+	int hs_id;
+
+	for (hs_id = 0; hs_id < HS_ID_MAX; hs_id++) {
+		hs_info = &fsm->hs_info[hs_id];
+		if (hs_info->ctrl_port) {
+			mtk_port_internal_close(hs_info->ctrl_port);
+			hs_info->ctrl_port = NULL;
+		}
+	}
+}
+
+static void mtk_fsm_switch_state(struct mtk_md_fsm *fsm,
+				 enum mtk_fsm_state to_state, struct mtk_fsm_evt *event)
+{
+	char fsm_info[MTK_FSM_INFO_LEN];
+	struct mtk_fsm_notifier *nt;
+	struct mtk_fsm_param param;
+
+	param.from = fsm->state;
+	param.to = to_state;
+	param.evt_id = event ? event->id : FSM_EVT_MAX;
+	param.fsm_flag = event ? event->fsm_flag : FSM_F_DFLT;
+
+	list_for_each_entry(nt, &fsm->pre_notifiers, entry)
+		nt->cb(&param, nt->data);
+
+	fsm->state = to_state;
+	fsm->fsm_flag |= event ? event->fsm_flag : FSM_F_DFLT;
+
+	snprintf(fsm_info, MTK_FSM_INFO_LEN,
+		 "state=%d, fsm_flag=0x%x", to_state, fsm->fsm_flag);
+	mtk_uevent_notify(fsm->mdev->dev, MTK_UEVENT_FSM, fsm_info);
+
+	list_for_each_entry(nt, &fsm->post_notifiers, entry)
+		nt->cb(&param, nt->data);
+}
+
+static int mtk_fsm_startup_act(struct mtk_md_fsm *fsm, struct mtk_fsm_evt *event)
+{
+	enum mtk_fsm_state to_state = FSM_STATE_BOOTUP;
+	struct fsm_hs_info *hs_info = event->data;
+	struct mtk_md_dev *mdev = fsm->mdev;
+	int ret = 0;
+
+	if (fsm->state != FSM_STATE_ON && fsm->state != FSM_STATE_BOOTUP) {
+		ret = -EPROTO;
+		goto free_rt_data;
+	}
+
+	if (fsm->state != FSM_STATE_BOOTUP) {
+		mtk_fsm_switch_state(fsm, to_state, event);
+		return 0;
+	}
+
+	if (event->fsm_flag & FSM_HS_START_MASK) {
+		mtk_fsm_switch_state(fsm, to_state, event);
+
+		ret = mtk_fsm_ctrl_ch_start(fsm, hs_info, O_NONBLOCK);
+		if (!ret)
+			ret = mtk_fsm_send_hs1_msg(hs_info);
+		if (ret)
+			goto hs_err;
+	} else if (event->fsm_flag & FSM_HS2_DONE_MASK) {
+		ret = mtk_fsm_parse_hs2_msg(hs_info);
+		if (!ret) {
+			mtk_fsm_switch_state(fsm, to_state, event);
+			ret = mtk_fsm_send_hs3_msg(hs_info);
+		}
+		dev_kfree_skb(hs_info->rt_data);
+		hs_info->rt_data = NULL;
+		if (ret)
+			goto hs_err;
+	}
+
+	if (((fsm->fsm_flag | event->fsm_flag) & fsm->hs_done_flag) == fsm->hs_done_flag) {
+		to_state = FSM_STATE_READY;
+		mtk_fsm_switch_state(fsm, to_state, NULL);
+	}
+
+	return 0;
+
+free_rt_data:
+	if (hs_info && hs_info->rt_data) {
+		dev_kfree_skb(hs_info->rt_data);
+		hs_info->rt_data = NULL;
+	}
+hs_err:
+	dev_err((mdev)->dev, "Failed to hs with device %d:0x%x, ret=%d\n",
+		fsm->state, fsm->fsm_flag, ret);
+	return ret;
+}
+
+static void mtk_fsm_evt_release(struct kref *kref)
+{
+	struct mtk_fsm_evt *event = container_of(kref, struct mtk_fsm_evt, kref);
+
+	kfree(event);
+}
+
+static void mtk_fsm_evt_put(struct mtk_fsm_evt *event)
+{
+	kref_put(&event->kref, mtk_fsm_evt_release);
+}
+
+static void mtk_fsm_evt_finish(struct mtk_md_fsm *fsm,
+			       struct mtk_fsm_evt *event, int retval)
+{
+	if (event->mode & EVT_MODE_BLOCKING) {
+		event->status = retval;
+		wake_up(&fsm->evt_waitq);
+	}
+	mtk_fsm_evt_put(event);
+}
+
+static void mtk_fsm_evt_cleanup(struct mtk_md_fsm *fsm, struct list_head *evtq)
+{
+	struct mtk_fsm_evt *event, *tmp;
+
+	list_for_each_entry_safe(event, tmp, evtq, entry) {
+		list_del(&event->entry);
+		mtk_fsm_evt_finish(fsm, event, FSM_EVT_RET_FAIL);
+	}
+}
+
+static int mtk_fsm_enter_off_state(struct mtk_md_fsm *fsm, struct mtk_fsm_evt *event)
+{
+	struct mtk_md_dev *mdev = fsm->mdev;
+	int hs_id;
+
+	if (fsm->state == FSM_STATE_OFF || fsm->state == FSM_STATE_INVALID)
+		return -EPROTO;
+
+	mtk_dev_mask_dev_evt(mdev, DEV_EVT_D2H_BOOT_FLOW_SYNC);
+	for (hs_id = 0; hs_id < HS_ID_MAX; hs_id++)
+		mtk_dev_mask_dev_evt(mdev, fsm->hs_info[hs_id].mhccif_ch);
+
+	mtk_fsm_ctrl_ch_stop(fsm);
+	mtk_fsm_switch_state(fsm, FSM_STATE_OFF, event);
+
+	return 0;
+}
+
+static int mtk_fsm_dev_rm_act(struct mtk_md_fsm *fsm, struct mtk_fsm_evt *event)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&fsm->evtq_lock, flags);
+	set_bit(EVT_TF_GATECLOSED, &fsm->t_flag);
+	mtk_fsm_evt_cleanup(fsm, &fsm->evtq);
+	spin_unlock_irqrestore(&fsm->evtq_lock, flags);
+
+	return mtk_fsm_enter_off_state(fsm, event);
+}
+
+static int mtk_fsm_hs1_handler(u32 status, void *__hs_info)
+{
+	struct fsm_hs_info *hs_info = __hs_info;
+	struct mtk_md_dev *mdev;
+	struct mtk_md_fsm *fsm;
+
+	fsm = container_of(hs_info, struct mtk_md_fsm, hs_info[hs_info->id]);
+	mdev = fsm->mdev;
+	mtk_fsm_evt_submit(mdev, FSM_EVT_STARTUP,
+			   hs_info->fsm_flag_hs1, hs_info, sizeof(*hs_info), 0);
+	mtk_dev_mask_dev_evt(mdev, hs_info->mhccif_ch);
+	mtk_dev_clear_dev_evt(mdev, hs_info->mhccif_ch);
+
+	return 0;
+}
+
+static void mtk_fsm_hs_info_init_by_hsid(struct mtk_md_fsm *fsm, int hs_id)
+{
+	struct fsm_hs_info *hs_info;
+
+	if (hs_id < 0 || hs_id >= HS_ID_MAX) {
+		dev_warn((fsm->mdev)->dev, "hs_id = %d, invalid.\n", hs_id);
+		return;
+	}
+
+	hs_info = &fsm->hs_info[hs_id];
+	hs_info->id = hs_id;
+	hs_info->ctrl_port = NULL;
+	hs_info->rt_data = NULL;
+	switch (hs_id) {
+	case HS_ID_MD:
+		snprintf(hs_info->port_name, PORT_NAME_LEN, "MDCTRL");
+		hs_info->mhccif_ch = DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD;
+		hs_info->fsm_flag_hs1 = FSM_F_MD_HS_START;
+		hs_info->fsm_flag_hs2 = FSM_F_MD_HS2_DONE;
+		hs_info->query_ft_set[QUERY_RTFT_ID_MD_PORT_ENUM].feature =
+			FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_MUST_SUPPORT);
+		hs_info->query_ft_set[QUERY_RTFT_ID_MD_PORT_ENUM].feature |=
+			FIELD_PREP(FEATURE_VER, 0);
+		hs_info->query_ft_set[QUERY_RTFT_ID_MD_PORT_CFG].feature =
+			FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_NOT_SUPPORT);
+		break;
+	case HS_ID_SAP:
+		snprintf(hs_info->port_name, PORT_NAME_LEN, "SAPCTRL");
+		hs_info->mhccif_ch = DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP;
+		hs_info->fsm_flag_hs1 = FSM_F_SAP_HS_START;
+		hs_info->fsm_flag_hs2 = FSM_F_SAP_HS2_DONE;
+		hs_info->query_ft_set[QUERY_RTFT_ID_SAP_PORT_ENUM].feature =
+			FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_MUST_SUPPORT);
+		hs_info->query_ft_set[QUERY_RTFT_ID_SAP_PORT_ENUM].feature |=
+			FIELD_PREP(FEATURE_VER, 0);
+		break;
+	}
+}
+
+static void mtk_fsm_hs_info_init(struct mtk_md_fsm *fsm)
+{
+	struct mtk_md_dev *mdev = fsm->mdev;
+	struct fsm_hs_info *hs_info;
+	int hs_id;
+
+	for (hs_id = 0; hs_id < HS_ID_MAX; hs_id++) {
+		mtk_fsm_hs_info_init_by_hsid(fsm, hs_id);
+		hs_info = &fsm->hs_info[hs_id];
+		mtk_dev_register_dev_evt(mdev, hs_info->mhccif_ch,
+					 mtk_fsm_hs1_handler, hs_info);
+	}
+}
+
+static void mtk_fsm_hs_info_exit(struct mtk_md_fsm *fsm)
+{
+	struct mtk_md_dev *mdev = fsm->mdev;
+	struct fsm_hs_info *hs_info;
+	int hs_id;
+
+	for (hs_id = 0; hs_id < HS_ID_MAX; hs_id++) {
+		hs_info = &fsm->hs_info[hs_id];
+		mtk_dev_unregister_dev_evt(mdev, hs_info->mhccif_ch);
+	}
+}
+
+static int mtk_fsm_dev_add_act(struct mtk_md_fsm *fsm, struct mtk_fsm_evt *event)
+{
+	if (fsm->state != FSM_STATE_OFF && fsm->state != FSM_STATE_INVALID)
+		return -EPROTO;
+
+	mtk_fsm_switch_state(fsm, FSM_STATE_ON, event);
+	mtk_dev_unmask_dev_evt(fsm->mdev, DEV_EVT_D2H_BOOT_FLOW_SYNC);
+
+	return 0;
+}
+
+static int (*evts_act_tbl[FSM_EVT_MAX])(struct mtk_md_fsm *__fsm, struct mtk_fsm_evt *event) = {
+	[FSM_EVT_STARTUP] = mtk_fsm_startup_act,
+	[FSM_EVT_DEV_RM] = mtk_fsm_dev_rm_act,
+	[FSM_EVT_DEV_ADD] = mtk_fsm_dev_add_act,
+};
+
+int mtk_fsm_start(struct mtk_md_dev *mdev)
+{
+	struct mtk_md_fsm *fsm = mdev->fsm;
+
+	if (!fsm)
+		return -EINVAL;
+
+	if (!fsm->fsm_handler)
+		return -EFAULT;
+
+	wake_up_process(fsm->fsm_handler);
+	return 0;
+}
+EXPORT_SYMBOL(mtk_fsm_start);
+
+static void mkt_fsm_notifier_cleanup(struct mtk_md_dev *mdev, struct list_head *ntq)
+{
+	struct mtk_fsm_notifier *nt, *tmp;
+
+	list_for_each_entry_safe(nt, tmp, ntq, entry) {
+		list_del(&nt->entry);
+		dev_warn(mdev->dev, "Notifier(%d) not unregistered, freed by FSM\n", nt->id);
+		devm_kfree(mdev->dev, nt);
+	}
+}
+
+static void mtk_fsm_notifier_insert(struct mtk_fsm_notifier *notifier, struct list_head *head)
+{
+	struct mtk_fsm_notifier *nt;
+
+	list_for_each_entry(nt, head, entry) {
+		if (notifier->prio > nt->prio) {
+			list_add(&notifier->entry, nt->entry.prev);
+			return;
+		}
+	}
+	list_add_tail(&notifier->entry, head);
+}
+
+int mtk_fsm_notifier_register(struct mtk_md_dev *mdev, enum mtk_user_id id,
+			      void (*cb)(struct mtk_fsm_param *, void *data),
+			      void *data, enum mtk_fsm_prio prio, bool is_pre)
+{
+	struct mtk_md_fsm *fsm = mdev->fsm;
+	struct mtk_fsm_notifier *notifier;
+
+	if (!fsm)
+		return -EINVAL;
+
+	if (id >= MTK_USER_MAX || !cb || prio >= FSM_PRIO_MAX)
+		return -EINVAL;
+
+	notifier = devm_kzalloc(mdev->dev, sizeof(*notifier), GFP_KERNEL);
+	if (!notifier)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&notifier->entry);
+	notifier->id = id;
+	notifier->cb = cb;
+	notifier->data = data;
+	notifier->prio = prio;
+
+	if (is_pre)
+		mtk_fsm_notifier_insert(notifier, &fsm->pre_notifiers);
+	else
+		mtk_fsm_notifier_insert(notifier, &fsm->post_notifiers);
+
+	return 0;
+}
+EXPORT_SYMBOL(mtk_fsm_notifier_register);
+
+int mtk_fsm_notifier_unregister(struct mtk_md_dev *mdev, enum mtk_user_id id)
+{
+	struct mtk_md_fsm *fsm = mdev->fsm;
+	struct mtk_fsm_notifier *nt, *tmp;
+
+	if (!fsm)
+		return -EINVAL;
+
+	list_for_each_entry_safe(nt, tmp, &fsm->pre_notifiers, entry) {
+		if (nt->id == id) {
+			list_del(&nt->entry);
+			devm_kfree(mdev->dev, nt);
+			break;
+		}
+	}
+	list_for_each_entry_safe(nt, tmp, &fsm->post_notifiers, entry) {
+		if (nt->id == id) {
+			list_del(&nt->entry);
+			devm_kfree(mdev->dev, nt);
+			break;
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL(mtk_fsm_notifier_unregister);
+
+int mtk_fsm_evt_submit(struct mtk_md_dev *mdev,
+		       enum mtk_fsm_evt_id id, enum mtk_fsm_flag flag,
+		       void *data, unsigned int len, unsigned char mode)
+{
+	struct mtk_md_fsm *fsm = mdev->fsm;
+	struct mtk_fsm_evt *event;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!fsm || id >= FSM_EVT_MAX) {
+		dev_err((mdev)->dev, "Invalid param!\n");
+		return FSM_EVT_RET_FAIL;
+	}
+
+	if (test_bit(EVT_TF_GATECLOSED, &fsm->t_flag)) {
+		dev_err((mdev)->dev, "Failed to submit evt, fsm has been removed!\n");
+		return FSM_EVT_RET_FAIL;
+	}
+
+	event = kzalloc(sizeof(*event),
+			(in_hardirq() || in_softirq() || irqs_disabled()) ?
+			GFP_ATOMIC : GFP_KERNEL);
+	if (!event)
+		return FSM_EVT_RET_FAIL;
+
+	kref_init(&event->kref);
+	event->mdev = mdev;
+	event->id = id;
+	event->fsm_flag = flag;
+	event->status = FSM_EVT_RET_ONGOING;
+	event->data = data;
+	event->len = len;
+	event->mode = mode;
+
+	spin_lock_irqsave(&fsm->evtq_lock, flags);
+	if (test_bit(EVT_TF_GATECLOSED, &fsm->t_flag)) {
+		spin_unlock_irqrestore(&fsm->evtq_lock, flags);
+		mtk_fsm_evt_put(event);
+		dev_err(mdev->dev, "Failed to add event, fsm dev has been removed!\n");
+		return FSM_EVT_RET_FAIL;
+	}
+
+	kref_get(&event->kref);
+	if (mode & EVT_MODE_TOHEAD)
+		list_add(&event->entry, &fsm->evtq);
+	else
+		list_add_tail(&event->entry, &fsm->evtq);
+	spin_unlock_irqrestore(&fsm->evtq_lock, flags);
+
+	wake_up_process(fsm->fsm_handler);
+	if (mode & EVT_MODE_BLOCKING) {
+		ret = wait_event_timeout(fsm->evt_waitq,
+					 (event->status != 0), BLOCKING_EVT_TIMEOUT);
+		if (!ret && event->status != FSM_EVT_RET_DONE) {
+			dev_err((mdev)->dev, "Handling fsm blocking event timeout!\n");
+			ret = -ETIMEDOUT;
+		} else {
+			ret = event->status;
+		}
+	}
+	mtk_fsm_evt_put(event);
+
+	return ret;
+}
+EXPORT_SYMBOL(mtk_fsm_evt_submit);
+
+static int mtk_fsm_evt_handler(void *__fsm)
+{
+	struct mtk_md_fsm *fsm = __fsm;
+	struct mtk_fsm_evt *event;
+	unsigned long flags;
+	int ret;
+
+wake_up:
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	while (!kthread_should_stop() && !list_empty(&fsm->evtq)) {
+		set_current_state(TASK_RUNNING);
+		spin_lock_irqsave(&fsm->evtq_lock, flags);
+		event = list_first_entry(&fsm->evtq, struct mtk_fsm_evt, entry);
+		list_del(&event->entry);
+		spin_unlock_irqrestore(&fsm->evtq_lock, flags);
+
+		if (event->id < FSM_EVT_MAX) {
+			ret = evts_act_tbl[event->id](fsm, event);
+			if (ret) {
+				dev_err((fsm->mdev)->dev,
+					"Failed to handle evt, fsm state = %d, ret = %d\n",
+					fsm->state, ret);
+				mtk_fsm_evt_finish(fsm, event, FSM_EVT_RET_FAIL);
+			} else {
+				mtk_fsm_evt_finish(fsm, event, FSM_EVT_RET_DONE);
+			}
+		} else {
+			mtk_fsm_evt_finish(fsm, event, FSM_EVT_RET_DONE);
+		}
+	}
+
+	if (kthread_should_stop()) {
+		set_current_state(TASK_RUNNING);
+		return 0;
+	}
+
+	schedule();
+	goto wake_up;
+}
+
+int mtk_fsm_init(struct mtk_md_dev *mdev)
+{
+	struct mtk_md_fsm *fsm;
+	int ret;
+
+	fsm = devm_kzalloc(mdev->dev, sizeof(*fsm), GFP_KERNEL);
+	if (!fsm)
+		return -ENOMEM;
+
+	fsm->fsm_handler = kthread_create(mtk_fsm_evt_handler, fsm, "fsm_evt_thread%d_%s",
+					  mdev->hw_ver, mdev->dev_str);
+	if (IS_ERR(fsm->fsm_handler)) {
+		ret = PTR_ERR(fsm->fsm_handler);
+		goto exit;
+	}
+
+	fsm->mdev = mdev;
+	fsm->state = FSM_STATE_INVALID;
+	fsm->fsm_flag = FSM_F_DFLT;
+
+	INIT_LIST_HEAD(&fsm->evtq);
+	spin_lock_init(&fsm->evtq_lock);
+	init_waitqueue_head(&fsm->evt_waitq);
+
+	INIT_LIST_HEAD(&fsm->pre_notifiers);
+	INIT_LIST_HEAD(&fsm->post_notifiers);
+
+	mtk_fsm_hs_info_init(fsm);
+	mtk_dev_register_dev_evt(mdev, DEV_EVT_D2H_BOOT_FLOW_SYNC,
+				 mtk_fsm_early_bootup_handler, fsm);
+	mdev->fsm = fsm;
+	return 0;
+exit:
+	devm_kfree(mdev->dev, fsm);
+	return ret;
+}
+EXPORT_SYMBOL(mtk_fsm_init);
+
+int mtk_fsm_exit(struct mtk_md_dev *mdev)
+{
+	struct mtk_md_fsm *fsm = mdev->fsm;
+	unsigned long flags;
+
+	if (!fsm)
+		return -EINVAL;
+
+	if (fsm->fsm_handler) {
+		kthread_stop(fsm->fsm_handler);
+		fsm->fsm_handler = NULL;
+	}
+
+	spin_lock_irqsave(&fsm->evtq_lock, flags);
+	if (WARN_ON(!list_empty(&fsm->evtq)))
+		mtk_fsm_evt_cleanup(fsm, &fsm->evtq);
+	spin_unlock_irqrestore(&fsm->evtq_lock, flags);
+
+	mkt_fsm_notifier_cleanup(mdev, &fsm->pre_notifiers);
+	mkt_fsm_notifier_cleanup(mdev, &fsm->post_notifiers);
+
+	mtk_dev_unregister_dev_evt(mdev, DEV_EVT_D2H_BOOT_FLOW_SYNC);
+	mtk_fsm_hs_info_exit(fsm);
+
+	devm_kfree(mdev->dev, fsm);
+	return 0;
+}
+EXPORT_SYMBOL(mtk_fsm_exit);
diff --git a/drivers/net/wwan/t9xx/mtk_fsm.h b/drivers/net/wwan/t9xx/mtk_fsm.h
new file mode 100644
index 000000000000..f2fc66bcef61
--- /dev/null
+++ b/drivers/net/wwan/t9xx/mtk_fsm.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2022, MediaTek Inc.
+ */
+
+#ifndef __MTK_FSM_H__
+#define __MTK_FSM_H__
+
+#include "mtk_dev.h"
+
+#define FEATURE_CNT		(64)
+#define FEATURE_QUERY_PATTERN	(0x49434343)
+
+#define FEATURE_TYPE		GENMASK(3, 0)
+#define FEATURE_VER		GENMASK(7, 4)
+
+#define FEATURE_TYPE_NOT	FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_NOT_SUPPORT)
+#define FEATURE_TYPE_MUST	FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_MUST_SUPPORT)
+#define FEATURE_TYPE_OPTIONAL	FIELD_PREP(FEATURE_TYPE, RTFT_TYPE_OPTIONAL_SUPPORT)
+#define FEATURE_VER_0		FIELD_PREP(FEATURE_VER, 0)
+
+#define EVT_MODE_BLOCKING	(0x01)
+#define EVT_MODE_TOHEAD		(0x02)
+
+#define FSM_EVT_RET_FAIL	(-1)
+#define FSM_EVT_RET_ONGOING	(0)
+#define FSM_EVT_RET_DONE	(1)
+
+enum mtk_fsm_flag {
+	FSM_F_DFLT = 0,
+	FSM_F_SAP_HS_START	= BIT(0),
+	FSM_F_SAP_HS2_DONE	= BIT(1),
+	FSM_F_MD_HS_START	= BIT(2),
+	FSM_F_MD_HS2_DONE	= BIT(3),
+};
+
+enum mtk_fsm_state {
+	FSM_STATE_INVALID = 0,
+	FSM_STATE_OFF,
+	FSM_STATE_ON,
+	FSM_STATE_BOOTUP,
+	FSM_STATE_READY,
+};
+
+enum mtk_fsm_evt_id {
+	FSM_EVT_STARTUP = 0,
+	FSM_EVT_DEV_RM,
+	FSM_EVT_DEV_ADD,
+	FSM_EVT_MAX
+};
+
+enum mtk_fsm_prio {
+	FSM_PRIO_0 = 0,
+	FSM_PRIO_1 = 1,
+	FSM_PRIO_MAX
+};
+
+struct mtk_fsm_param {
+	enum mtk_fsm_state from;
+	enum mtk_fsm_state to;
+	enum mtk_fsm_evt_id evt_id;
+	enum mtk_fsm_flag fsm_flag;
+};
+
+#define PORT_NAME_LEN 20
+
+enum handshake_info_id {
+	HS_ID_MD = 0,
+	HS_ID_SAP,
+	HS_ID_MAX
+};
+
+struct runtime_feature_info {
+	u8 feature;
+};
+
+struct fsm_hs_info {
+	unsigned char id;
+	void *ctrl_port;
+	char port_name[PORT_NAME_LEN];
+	unsigned int mhccif_ch;
+	unsigned int fsm_flag_hs1;
+	unsigned int fsm_flag_hs2;
+	/* features that the device should support */
+	struct runtime_feature_info query_ft_set[FEATURE_CNT];
+	/* runtime data from the device, to be parsed by the host */
+	void *rt_data;
+	unsigned int rt_data_len;
+};
+
+struct mtk_md_fsm {
+	struct mtk_md_dev *mdev;
+	struct task_struct *fsm_handler;
+	struct fsm_hs_info hs_info[HS_ID_MAX];
+	unsigned int hs_done_flag;
+	unsigned long t_flag;
+	u32 last_dev_state;
+	enum mtk_fsm_state state;
+	unsigned int fsm_flag;
+	struct list_head evtq;
+	/* protect evtq */
+	spinlock_t evtq_lock;
+	/* waitq for fsm blocking submit */
+	wait_queue_head_t evt_waitq;
+	struct list_head pre_notifiers;
+	struct list_head post_notifiers;
+};
+
+struct mtk_fsm_evt {
+	struct list_head entry;
+	struct kref kref;
+	struct mtk_md_dev *mdev;
+	enum mtk_fsm_evt_id id;
+	unsigned int fsm_flag;
+	int status;
+	unsigned char mode;
+	unsigned int len;
+	void *data;
+};
+
+struct mtk_fsm_notifier {
+	struct list_head entry;
+	enum mtk_user_id id;
+	void (*cb)(struct mtk_fsm_param *param, void *data);
+	void *data;
+	enum mtk_fsm_prio prio;
+};
+
+int mtk_fsm_init(struct mtk_md_dev *mdev);
+int mtk_fsm_exit(struct mtk_md_dev *mdev);
+int mtk_fsm_start(struct mtk_md_dev *mdev);
+int mtk_fsm_notifier_register(struct mtk_md_dev *mdev, enum mtk_user_id id,
+			      void (*cb)(struct mtk_fsm_param *, void *data),
+			      void *data, enum mtk_fsm_prio prio, bool is_pre);
+int mtk_fsm_notifier_unregister(struct mtk_md_dev *mdev, enum mtk_user_id id);
+int mtk_fsm_evt_submit(struct mtk_md_dev *mdev,
+		       enum mtk_fsm_evt_id id, enum mtk_fsm_flag flag,
+		       void *data, unsigned int len, unsigned char mode);
+
+#endif /* __MTK_FSM_H__ */
diff --git a/drivers/net/wwan/t9xx/mtk_port.c b/drivers/net/wwan/t9xx/mtk_port.c
index 034c9ad0f892..fbc2fb7bdc2d 100644
--- a/drivers/net/wwan/t9xx/mtk_port.c
+++ b/drivers/net/wwan/t9xx/mtk_port.c
@@ -819,6 +819,71 @@ int mtk_port_ch_disable(struct mtk_port *port)
 	return ret;
 }
 
+static void mtk_port_disable(struct mtk_port_mngr *port_mngr)
+{
+	struct mtk_port **ports;
+	int tbl_type;
+	int ret, idx;
+
+	ports = kcalloc(port_mngr->port_cnt, sizeof(struct mtk_port *), GFP_KERNEL);
+	if (!ports)
+		return;
+
+	tbl_type = PORT_TBL_SAP;
+	do {
+		ret = radix_tree_gang_lookup(&port_mngr->port_tbl[tbl_type],
+					     (void **)ports, 0, port_mngr->port_cnt);
+		for (idx = 0; idx < ret; idx++)
+			ports_ops[ports[idx]->info.type]->disable(ports[idx]);
+	} while (++tbl_type < PORT_TBL_MAX);
+	kfree(ports);
+}
+
+void mtk_port_mngr_fsm_state_handler(struct mtk_fsm_param *fsm_param, void *arg)
+{
+	struct mtk_port_mngr *port_mngr;
+
+	if (!fsm_param || !arg)
+		return;
+
+	port_mngr = arg;
+
+	switch (fsm_param->to) {
+	case FSM_STATE_OFF:
+		mtk_port_disable(port_mngr);
+		break;
+	default:
+		break;
+	}
+}
+
+void mtk_port_mngr_fsm_state_handler_late(struct mtk_fsm_param *fsm_param, void *arg)
+{
+	struct mtk_port_mngr *port_mngr;
+	struct mtk_port *port;
+
+	if (!fsm_param || !arg)
+		return;
+
+	port_mngr = arg;
+
+	switch (fsm_param->to) {
+	case FSM_STATE_BOOTUP:
+		if (fsm_param->fsm_flag & FSM_F_MD_HS_START) {
+			port = mtk_port_search_by_id(port_mngr, CCCI_CONTROL_RX);
+			if (port)
+				ports_ops[port->info.type]->enable(port);
+		} else if (fsm_param->fsm_flag & FSM_F_SAP_HS_START) {
+			port = mtk_port_search_by_id(port_mngr, CCCI_SAP_CONTROL_RX);
+			if (port)
+				ports_ops[port->info.type]->enable(port);
+		}
+		break;
+	default:
+		break;
+	}
+}
+
 int mtk_port_mngr_init(struct mtk_ctrl_blk *ctrl_blk, struct mtk_port_cfg *port_cfg, int port_cnt)
 {
 	struct mtk_port_mngr *port_mngr;
diff --git a/drivers/net/wwan/t9xx/mtk_port.h b/drivers/net/wwan/t9xx/mtk_port.h
index bd4291408bc2..a201c0007878 100644
--- a/drivers/net/wwan/t9xx/mtk_port.h
+++ b/drivers/net/wwan/t9xx/mtk_port.h
@@ -152,6 +152,8 @@ int mtk_port_send_data(struct mtk_port *port, void *data);
 int mtk_port_status_update(struct mtk_md_dev *mdev, void *data);
 int mtk_port_ch_enable(struct mtk_port *port);
 int mtk_port_ch_disable(struct mtk_port *port);
+void mtk_port_mngr_fsm_state_handler(struct mtk_fsm_param *fsm_param, void *arg);
+void mtk_port_mngr_fsm_state_handler_late(struct mtk_fsm_param *fsm_param, void *arg);
 int mtk_port_mngr_init(struct mtk_ctrl_blk *ctrl_blk, struct mtk_port_cfg *port_cfg, int port_cnt);
 void mtk_port_mngr_exit(struct mtk_ctrl_blk *ctrl_blk);
 void mtk_port_trb_init(struct mtk_port *port, struct trb *trb, enum mtk_trb_cmd_type cmd,
diff --git a/drivers/net/wwan/t9xx/mtk_utility.h b/drivers/net/wwan/t9xx/mtk_utility.h
new file mode 100644
index 000000000000..b72db3842d2d
--- /dev/null
+++ b/drivers/net/wwan/t9xx/mtk_utility.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2022, MediaTek Inc.
+ */
+
+#ifndef __MTK_UTILITY_H__
+#define __MTK_UTILITY_H__
+
+#include <linux/device.h>
+#include "mtk_dev.h"
+
+#define MTK_UEVENT_INFO_LEN 128
+
+/* MTK uevent */
+enum mtk_uevent_id {
+	MTK_UEVENT_UNDEF = 0,
+	MTK_UEVENT_FSM = 1,
+	MTK_UEVENT_MINIDUMP = 2,
+	MTK_UEVENT_LOWPOWER = 3,
+	MTK_UEVENT_MAX
+};
+
+static inline void mtk_uevent_notify(struct device *dev, enum mtk_uevent_id id, const char *info)
+{
+	char buf[MTK_UEVENT_INFO_LEN];
+	char *ext[2] = {NULL, NULL};
+
+	snprintf(buf, MTK_UEVENT_INFO_LEN, "%s:event_id=%d, info=%s",
+		 dev->kobj.name, id, info);
+	ext[0] = buf;
+	kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, ext);
+}
+#endif /* __MTK_UTILITY_H__ */
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma.c b/drivers/net/wwan/t9xx/pcie/mtk_cldma.c
index 7a0815aa2fc8..0ad475d31c34 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_cldma.c
+++ b/drivers/net/wwan/t9xx/pcie/mtk_cldma.c
@@ -34,12 +34,172 @@
 #define CLDMA_RETRY_DELAY_MS	(100)
 #define NO_BUDGET		(0)
 
+static struct cldma_drv_info_desc cldma_drv_info_tbl[] = {
+	{0x01CA, &drv_ops_name(m9xx), &cldma_regs_name(m9xx)},
+	{0, NULL},
+};
+
+static void mtk_cldma_get_drv_info(struct cldma_drv_info *drv_info, u32 hw_ver)
+{
+	struct cldma_drv_info_desc *p_drv_info;
+	u8 i;
+
+	for (i = 0; (p_drv_info = &cldma_drv_info_tbl[i]) && p_drv_info &&
+	     p_drv_info->drv_ops && p_drv_info->hw_regs; i++)
+		if (p_drv_info->hw_ver == hw_ver) {
+			drv_info->drv_ops = p_drv_info->drv_ops;
+			drv_info->hw_regs = p_drv_info->hw_regs;
+		}
+}
+
+static int mtk_cldma_isr(int irq_id, void *param)
+{
+	struct cldma_drv_info *drv_info = param;
+	struct mtk_md_dev *mdev;
+	u32 tx_done, rx_done;
+	u32 tx_sta, rx_sta;
+	struct txq *txq;
+	struct rxq *rxq;
+	int i;
+
+	mdev = drv_info->mdev;
+	drv_info->drv_ops->cldma_get_intr_status(drv_info, &tx_sta, &rx_sta);
+	tx_done = (tx_sta >> QUEUE_XFER_DONE) & 0xFF;
+	rx_done = (rx_sta >> QUEUE_XFER_DONE) & 0xFF;
+
+	if (tx_done) {
+		for (i = 0; i < HW_QUEUE_NUM; i++) {
+			txq = drv_info->txq[i];
+			if (!(tx_done & BIT(i)) || !txq)
+				continue;
+			queue_work(drv_info->wq, &txq->tx_done_work);
+		}
+	}
+	if (rx_done) {
+		for (i = 0; i < HW_QUEUE_NUM; i++) {
+			rxq = drv_info->rxq[i];
+			if (!(rx_done & BIT(i)) || !rxq)
+				continue;
+			queue_work(drv_info->wq, &rxq->rx_done_work);
+		}
+	}
+
+	mtk_pci_clear_irq(mdev, drv_info->pci_ext_irq_id);
+	mtk_pci_unmask_irq(mdev, drv_info->pci_ext_irq_id);
+
+	return IRQ_HANDLED;
+}
+
 static const int mtk_cldma_hw_id_tbl[NR_CLDMA] = {
 	[CLDMA0] = CLDMA0_HW_ID,
 	[CLDMA1] = CLDMA1_HW_ID,
-	[CLDMA4] = CLDMA4_HW_ID,
 };
 
+static int mtk_cldma_dev_init(struct cldma_dev *cd, int hif_id)
+{
+	char gpd_pool_name[DMA_POOL_NAME_LEN];
+	char bd_pool_name[DMA_POOL_NAME_LEN];
+	struct cldma_drv_info *drv_info;
+	struct cldma_hw_regs *hw_regs;
+	struct mtk_md_dev *mdev;
+	unsigned int flag;
+	int hw_id, ret;
+
+	if (!cd || hif_id >= NR_CLDMA)
+		return -EINVAL;
+
+	if (cd->cldma_drv_info[hif_id])
+		return 0;
+
+	hw_id = mtk_cldma_hw_id_tbl[hif_id];
+	mdev = cd->trans->mdev;
+	drv_info = devm_kzalloc(mdev->dev, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info)
+		return -ENOMEM;
+
+	drv_info->cd = cd;
+	drv_info->mdev = mdev;
+	drv_info->hif_id = hif_id;
+	drv_info->hw_id = hw_id;
+	mtk_cldma_get_drv_info(drv_info, mdev->hw_ver);
+
+	if (!drv_info->drv_ops || !drv_info->hw_regs) {
+		dev_err((mdev)->dev, "Failed to find CLDMA Driver for PCI %x\n", mdev->hw_ver);
+		ret = -EIO;
+		goto err_free_drv_info;
+	}
+
+	hw_regs = drv_info->hw_regs;
+	snprintf(gpd_pool_name, DMA_POOL_NAME_LEN, "cldma%d_gpd_pool_%s",
+		 hw_id, mdev->dev_str);
+	snprintf(bd_pool_name, DMA_POOL_NAME_LEN, "cldma%d_bd_pool_%s",
+		 hw_id, mdev->dev_str);
+	drv_info->gpd_dma_pool = dma_pool_create(gpd_pool_name, mdev->dev,
+						 sizeof(union gpd), 4, 0);
+	if (!drv_info->gpd_dma_pool) {
+		dev_err((mdev)->dev, "Failed to alloc gpd dma pool for cldma%d\n", hw_id);
+		ret = -ENOMEM;
+		goto err_free_drv_info;
+	}
+	drv_info->bd_dma_pool = dma_pool_create(bd_pool_name, mdev->dev,
+						sizeof(union bd), 4, 0);
+	if (!drv_info->bd_dma_pool) {
+		dev_err((mdev)->dev, "Failed to alloc bd dma pool for cldma%d\n", hw_id);
+		ret = -ENOMEM;
+		goto err_destroy_gpd_pool;
+	}
+
+	switch (hif_id) {
+	case CLDMA0:
+		drv_info->pci_ext_irq_id = mtk_pci_get_irq_id(mdev, MTK_IRQ_SRC_CLDMA0);
+		drv_info->base_addr = hw_regs->cldma0_base_addr;
+		break;
+	case CLDMA1:
+		drv_info->pci_ext_irq_id = mtk_pci_get_irq_id(mdev, MTK_IRQ_SRC_CLDMA1);
+		drv_info->base_addr = hw_regs->cldma1_base_addr;
+		break;
+	default:
+		ret = -EINVAL;
+		goto err_destroy_dma_pool;
+	}
+
+	flag = WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_HIGHPRI;
+	drv_info->wq = alloc_workqueue("cldma%d_workq_%s", flag, 0, hw_id, mdev->dev_str);
+	if (!drv_info->wq) {
+		dev_err((mdev)->dev, "Failed to alloc work queue for cldma%d\n", hw_id);
+		ret = -ENOMEM;
+		goto err_destroy_dma_pool;
+	}
+
+	drv_info->drv_ops->cldma_drv_init(drv_info);
+
+	/* mask/clear PCI CLDMA L1 interrupt */
+	mtk_pci_mask_irq(mdev, drv_info->pci_ext_irq_id);
+	mtk_pci_clear_irq(mdev, drv_info->pci_ext_irq_id);
+
+	/* register CLDMA interrupt handler */
+	ret = mtk_pci_register_irq(mdev, drv_info->pci_ext_irq_id, mtk_cldma_isr, drv_info);
+	if (ret)
+		goto err_destroy_wq;
+
+	/* unmask PCI CLDMA L1 interrupt */
+	mtk_pci_unmask_irq(mdev, drv_info->pci_ext_irq_id);
+
+	cd->cldma_drv_info[hif_id] = drv_info;
+	return 0;
+
+err_destroy_wq:
+	destroy_workqueue(drv_info->wq);
+err_destroy_dma_pool:
+	dma_pool_destroy(drv_info->bd_dma_pool);
+err_destroy_gpd_pool:
+	dma_pool_destroy(drv_info->gpd_dma_pool);
+err_free_drv_info:
+	devm_kfree(mdev->dev, drv_info);
+
+	return ret;
+}
+
 static inline void mtk_cldma_clr_bd_dsc(struct cldma_drv_info *drv_info,
 					struct bd_dsc *bd_dsc_pool, int nr_bds)
 {
@@ -824,6 +984,7 @@ static void mtk_cldma_rxq_free(struct cldma_drv_info *drv_info, u32 rxqno)
 		if (req->skb) {
 			if (rxq->nr_bds) {
 				skb_shinfo(req->skb)->frag_list = NULL;
+				dev_kfree_skb_any(req->skb);
 			} else {
 				if (req->data_dma_addr)
 					dma_unmap_single(mdev->dev, req->data_dma_addr,
@@ -853,6 +1014,44 @@ static void mtk_cldma_rxq_free(struct cldma_drv_info *drv_info, u32 rxqno)
 	devm_kfree(mdev->dev, rxq);
 }
 
+static int mtk_cldma_dev_exit(struct cldma_dev *cd, int hif_id)
+{
+	struct cldma_drv_info *drv_info;
+	struct mtk_md_dev *mdev;
+	int virq_id;
+	int i;
+
+	if (!cd || hif_id >= NR_CLDMA)
+		return -EINVAL;
+
+	if (!cd->cldma_drv_info[hif_id])
+		return 0;
+
+	/* quiesce the CLDMA interrupt, then free all queues and their descriptors */
+	drv_info = cd->cldma_drv_info[hif_id];
+	mdev = cd->trans->mdev;
+	virq_id = mtk_pci_get_virq_id(mdev, drv_info->pci_ext_irq_id);
+	mtk_pci_mask_irq(mdev, drv_info->pci_ext_irq_id);
+	synchronize_irq(virq_id);
+	for (i = 0; i < HW_QUEUE_NUM; i++) {
+		if (drv_info->txq[i])
+			mtk_cldma_txq_free(drv_info, drv_info->txq[i]->txqno);
+		if (drv_info->rxq[i])
+			mtk_cldma_rxq_free(drv_info, drv_info->rxq[i]->rxqno);
+	}
+
+	flush_workqueue(drv_info->wq);
+	destroy_workqueue(drv_info->wq);
+	dma_pool_destroy(drv_info->bd_dma_pool);
+	dma_pool_destroy(drv_info->gpd_dma_pool);
+	mtk_pci_unregister_irq(mdev, drv_info->pci_ext_irq_id);
+
+	devm_kfree(mdev->dev, drv_info);
+	cd->cldma_drv_info[hif_id] = NULL;
+
+	return 0;
+}
+
 static int mtk_cldma_start_xfer(struct cldma_drv_info *drv_info, u32 qno)
 {
 	struct cldma_drv_ops *drv_ops;
@@ -1163,6 +1362,27 @@ int mtk_cldma_trb_process(void *dev, struct sk_buff *skb)
 	return trb_act_tbl[trb->cmd](cd, skb);
 }
 
+void mtk_cldma_fsm_state_listener(struct mtk_fsm_param *param, struct mtk_ctrl_trans *trans)
+{
+	struct cldma_dev *cd = trans->dev;
+	int i;
+
+	switch (param->to) {
+	case FSM_STATE_BOOTUP:
+		if (param->fsm_flag & FSM_F_SAP_HS_START)
+			mtk_cldma_dev_init(cd, CLDMA0);
+		else if (param->fsm_flag & FSM_F_MD_HS_START)
+			mtk_cldma_dev_init(cd, CLDMA1);
+		break;
+	case FSM_STATE_OFF:
+		for (i = 0; i < NR_CLDMA; i++)
+			mtk_cldma_dev_exit(cd, i);
+		break;
+	default:
+		break;
+	}
+}
+
 int mtk_cldma_check_ch_cfg(void *dev, struct queue_info *que)
 {
 	struct cldma_drv_info *drv_info;
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma.h b/drivers/net/wwan/t9xx/pcie/mtk_cldma.h
index 74ce4f2f0b30..4686f7b178e5 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_cldma.h
+++ b/drivers/net/wwan/t9xx/pcie/mtk_cldma.h
@@ -167,4 +167,7 @@ int mtk_cldma_check_ch_cfg(void *dev, struct queue_info *que);
 #define drv_ops_name(NAME) cldma_drv_ops_##NAME
 #define cldma_regs_name(NAME) mtk_cldma_regs_##NAME
 
+extern struct cldma_drv_ops cldma_drv_ops_m9xx;
+extern struct cldma_hw_regs mtk_cldma_regs_m9xx;
+
 #endif
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv.h b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv.h
index 8763c23abf54..6de87b7ffd45 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv.h
+++ b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv.h
@@ -11,7 +11,6 @@
 #define LINK_ERROR_VAL		(0xFFFFFFFF)
 #define CLDMA0_HW_ID		(0)
 #define CLDMA1_HW_ID		(1)
-#define CLDMA4_HW_ID		(4)
 
 struct cldma_hw_regs {
 	u8 cldma_rx_skb_pool_max_size;
@@ -36,7 +35,6 @@ struct cldma_hw_regs {
 	u16 reg_cldma_l2rimsr0;
 	u16 reg_cldma_l2rimsr1;
 	u16 reg_cldma_int_mask;
-	u16 reg_cldma4_int_mask;
 	u16 reg_cldma_slp_mem_ctl;
 	u16 reg_cldma_busy_mask;
 	u16 reg_cldma_ip_busy_to_pcie_mask;
@@ -58,7 +56,6 @@ struct cldma_hw_regs {
 	u32 rq_err_int_bitmask;
 	u32 cldma0_base_addr;
 	u32 cldma1_base_addr;
-	u32 cldma4_base_addr;
 	u32 rq_active_start_err_int_bitmask;
 	u32 reg_cldma_ul_start_addrl_0;
 	u32 reg_cldma_ul_start_addrh_0;
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c
index d9145d146a5c..a59c35fc1577 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c
+++ b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c
@@ -34,7 +34,6 @@
 struct cldma_hw_regs mtk_cldma_regs_m9xx = {
 	.cldma0_base_addr = CLDMA0_BASE_ADDR,
 	.cldma1_base_addr = CLDMA1_BASE_ADDR,
-	.cldma4_base_addr = CLDMA4_BASE_ADDR,
 	.cldma_rx_skb_pool_max_size = CLDMA_RX_SKB_POOL_MAX_SIZE,
 	.cldma_rx_skb_reload_threshold = CLDMA_RX_SKB_RELOAD_THRESHOLD,
 	.tq_err_int_offset = TQ_ERR_INT_OFFSET,
@@ -93,7 +92,6 @@ struct cldma_hw_regs mtk_cldma_regs_m9xx = {
 	.reg_cldma_l3risar1 = REG_CLDMA_L3RISAR1,
 	.reg_cldma_ip_busy = REG_CLDMA_IP_BUSY,
 	.reg_cldma_int_mask = REG_CLDMA_INT_EAP_USIP_MASK,
-	.reg_cldma4_int_mask = REG_CLDMA_INT_WF_MASK,
 	.reg_cldma_ip_busy_to_pcie_mask = REG_CLDMA_IP_BUSY_TO_PCIE_MASK,
 	.reg_cldma_ip_busy_to_pcie_mask_set = REG_CLDMA_IP_BUSY_TO_PCIE_MASK_SET,
 	.reg_cldma_ip_busy_to_pcie_mask_clr = REG_CLDMA_IP_BUSY_TO_PCIE_MASK_CLR,
@@ -135,10 +133,7 @@ static void mtk_cldma_drv_init_m9xx(struct cldma_drv_info *drv_info)
 			ALLQ << 24);
 
 	/* enable interrupt to PCIe */
-	if (drv_info->hw_id == CLDMA4_HW_ID)
-		mtk_pci_write32(mdev, base + hw_regs->reg_cldma4_int_mask, 0);
-	else
-		mtk_pci_write32(mdev, base + hw_regs->reg_cldma_int_mask, 0);
+	mtk_pci_write32(mdev, base + hw_regs->reg_cldma_int_mask, 0);
 
 	/* disable illegal memory check */
 	mtk_pci_write32(mdev, base + hw_regs->reg_cldma_ul_dummy_0, 1);
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.h b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.h
index 2c63c43ff065..f113c4c1068a 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.h
+++ b/drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.h
@@ -8,7 +8,6 @@
 
 #define CLDMA0_BASE_ADDR				(0x1021C000)
 #define CLDMA1_BASE_ADDR				(0x1021E000)
-#define CLDMA4_BASE_ADDR				(0x10224000)
 
 #define CLDMA_RX_SKB_POOL_MAX_SIZE			(64)
 #define CLDMA_RX_SKB_RELOAD_THRESHOLD			(16)
@@ -80,7 +79,6 @@
 #define REG_CLDMA_L2RIMSR1				(0x0800 + 0x00FC)
 
 #define REG_CLDMA_INT_EAP_USIP_MASK			(0x0800 + 0x011C)
-#define REG_CLDMA_INT_WF_MASK				(0x0800 + 0x0120)
 #define REG_CLDMA_RQ1_GPD_DONE_CNT			(0x0800 + 0x0174)
 #define REG_CLDMA_TQ1_GPD_DONE_CNT			(0x0800 + 0x0184)
 
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_pci.c b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
index d3f862098a1d..4b93da5833db 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_pci.c
+++ b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
@@ -897,22 +897,34 @@ static int mtk_pci_dev_init(struct mtk_md_dev *mdev)
 {
 	int ret;
 
-	ret = mtk_trans_ctrl_init(mdev);
+	ret = mtk_fsm_init(mdev);
 	if (ret) {
-		dev_err(mdev->dev, "Failed to initialize control plane: %d\n", ret);
+		dev_err(mdev->dev, "Failed to initialize FSM: %d\n", ret);
 		return ret;
 	}
 
+	ret = mtk_trans_ctrl_init(mdev);
+	if (ret)
+		goto free_fsm;
+
 	return 0;
+free_fsm:
+	mtk_fsm_exit(mdev);
+	return ret;
 }
 
 static void mtk_pci_dev_exit(struct mtk_md_dev *mdev)
 {
+	mtk_fsm_evt_submit(mdev, FSM_EVT_DEV_RM, 0, NULL, 0,
+			   EVT_MODE_BLOCKING | EVT_MODE_TOHEAD);
 	mtk_trans_ctrl_exit(mdev);
+	mtk_fsm_exit(mdev);
 }
 
 static int mtk_pci_dev_start(struct mtk_md_dev *mdev)
 {
+	mtk_fsm_evt_submit(mdev, FSM_EVT_DEV_ADD, 0, NULL, 0, 0);
+	mtk_fsm_start(mdev);
 	return 0;
 }
 static const struct mtk_dev_ops pci_hw_ops = {
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.c b/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.c
index 6eeed6935550..f905c9055b2b 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.c
+++ b/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.c
@@ -481,6 +481,15 @@ static int mtk_pcie_hif_submit_skb(struct mtk_md_dev *mdev, struct sk_buff *skb,
 	return 0;
 }
 
+static void mtk_pcie_hif_fsm_indication(struct mtk_md_dev *mdev, struct mtk_fsm_param *param)
+{
+	struct mtk_ctrl_blk *ctrl_blk = mdev->ctrl_blk;
+	struct mtk_ctrl_trans *trans;
+
+	trans = ctrl_blk->ctrl_hw_priv;
+	mtk_cldma_fsm_state_listener(param, trans);
+}
+
 static int mtk_pcie_hif_cmd_func(struct mtk_md_dev *mdev, int cmd, void *data)
 {
 	struct mtk_ctrl_blk *ctrl_blk = mdev->ctrl_blk;
@@ -508,6 +517,7 @@ static struct mtk_ctrl_hif_ops pcie_ctrl_ops = {
 	.init = mtk_pcie_hif_init,
 	.exit = mtk_pcie_hif_exit,
 	.submit_skb = mtk_pcie_hif_submit_skb,
+	.fsm_indication = mtk_pcie_hif_fsm_indication,
 	.send_cmd = mtk_pcie_hif_cmd_func,
 };
 
diff --git a/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.h b/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.h
index a3ff56ddf86f..4b9c9db6ad71 100644
--- a/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.h
+++ b/drivers/net/wwan/t9xx/pcie/mtk_trans_ctrl.h
@@ -29,7 +29,6 @@
 enum mtk_hif_id {
 	CLDMA0,
 	CLDMA1,
-	CLDMA4,
 	NR_CLDMA
 };
 

-- 
2.34.1

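For readers following the notifier registration path: mtk_fsm_notifier_insert()
keeps each list sorted by descending priority, and a notifier whose priority
equals an existing one lands after it, so registration order is preserved among
equals. A minimal userspace sketch of that ordering follows. It assumes a
hand-rolled list (struct node, list_init, insert_before and notifier_insert are
illustrative names, not part of the patch or of the kernel list API).

```c
/* Minimal circular doubly linked list, modeled loosely on list_head. */
struct node {
	struct node *prev, *next;
	int prio;	/* higher value is called earlier */
	int id;		/* hypothetical user id, for illustration only */
};

static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

/* Link n immediately before pos. */
static void insert_before(struct node *n, struct node *pos)
{
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

/*
 * Keep the list sorted by descending prio. The strict '>' comparison means
 * an entry with equal prio is placed after the existing ones.
 */
static void notifier_insert(struct node *n, struct node *head)
{
	struct node *it;

	for (it = head->next; it != head; it = it->next) {
		if (n->prio > it->prio) {
			insert_before(n, it);
			return;
		}
	}
	/* Nothing with lower priority found: append at the tail. */
	insert_before(n, head);
}
```

Inserting entries with priorities 0, 1, 0 in that order would yield the
sequence 1, 0, 0, with the two priority-0 entries in registration order.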

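The handshake setup packs each query_ft_set[].feature byte from two fields:
FEATURE_TYPE in bits 3:0 and FEATURE_VER in bits 7:4. A minimal userspace
sketch of that layout, without the kernel's FIELD_PREP(), is shown below. The
helper names and the _MASK/_SHIFT macros are assumptions for illustration, and
the numeric RTFT_TYPE_* values are not shown in this patch, so none are used
here. Unlike FIELD_PREP(), which rejects out-of-range constants at build time,
this sketch silently truncates oversized inputs.

```c
#include <stdint.h>

#define FEATURE_TYPE_MASK	0x0Fu	/* same bits as GENMASK(3, 0) */
#define FEATURE_VER_MASK	0xF0u	/* same bits as GENMASK(7, 4) */
#define FEATURE_VER_SHIFT	4

/* Build a feature byte from a support type and a version number. */
static inline uint8_t feature_pack(uint8_t type, uint8_t ver)
{
	return (uint8_t)((type & FEATURE_TYPE_MASK) |
			 ((ver << FEATURE_VER_SHIFT) & FEATURE_VER_MASK));
}

/* Extract the support type from a feature byte. */
static inline uint8_t feature_type(uint8_t feature)
{
	return feature & FEATURE_TYPE_MASK;
}

/* Extract the version number from a feature byte. */
static inline uint8_t feature_ver(uint8_t feature)
{
	return (uint8_t)((feature & FEATURE_VER_MASK) >> FEATURE_VER_SHIFT);
}
```

With this layout, version 0 leaves the upper nibble clear, which matches the
patch OR-ing FIELD_PREP(FEATURE_VER, 0) into the feature byte.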


* [PATCH v3 7/7] net: wwan: t9xx: Add maintainers entry
From: Jack Wu via B4 Relay @ 2026-06-24 10:04 UTC (permalink / raw)
  To: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jack Wu, Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Jonathan Corbet, Shuah Khan
  Cc: linux-kernel, netdev, linux-arm-kernel, linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-0-73ff03f60c48@compal.com>

From: Jack Wu <jackbb_wu@compal.com>

Add a MAINTAINERS entry for the MediaTek T9XX 5G WWAN modem device
driver.

Signed-off-by: Jack Wu <jackbb_wu@compal.com>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 461a3eed6129..8155d26bff03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16494,6 +16494,15 @@ L:	netdev@vger.kernel.org
 S:	Supported
 F:	drivers/net/wwan/t7xx/
 
+MEDIATEK T9XX 5G WWAN MODEM DRIVER
+M:	Jack Wu <jackbb_wu@compal.com>
+R:	Wen-Zhi Huang <wen-zhi.huang@mediatek.com>
+R:	Shi-Wei Yeh <shi-wei.yeh@mediatek.com>
+R:	Minano Tseng <Minano.tseng@mediatek.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/wwan/t9xx/
+
 MEDIATEK USB3 DRD IP DRIVER
 M:	Chunfeng Yun <chunfeng.yun@mediatek.com>
 L:	linux-usb@vger.kernel.org

-- 
2.34.1


