* [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
@ 2024-12-18 3:24 Steve French
2024-12-19 0:28 ` Fwd: " Steve French
2024-12-19 8:41 ` Kuniyuki Iwashima
0 siblings, 2 replies; 20+ messages in thread
From: Steve French @ 2024-12-18 3:24 UTC (permalink / raw)
To: CIFS; +Cc: Kuniyuki Iwashima, Enzo Matsumiya
[-- Attachment #1: Type: text/plain, Size: 3364 bytes --]
Enzo had an interesting patch, that seems to fix an important problem.
Here was his repro scenario:
tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
//someserver/target1 /mnt/test
tw:~ # ls /mnt/test
abc dir1 dir3 target1_file.txt tsub
tw:~ # iptables -A INPUT -s someserver -j DROP
Trigger reconnect and wait for 3*echo_interval:
tw:~ # cat /mnt/test/target1_file.txt
cat: /mnt/test/target1_file.txt: Host is down
Then umount and rmmod. Note that rmmod might take several iterations
until it properly tears down everything, so make sure you see the "not
loaded" message before proceeding:
tw:~ # umount /mnt/*; rmmod cifs
umount: /mnt/az: not mounted.
umount: /mnt/dfs: not mounted.
umount: /mnt/local: not mounted.
umount: /mnt/scratch: not mounted.
rmmod: ERROR: Module cifs is in use
...
tw:~ # rmmod cifs
rmmod: ERROR: Module cifs is not currently loaded
Then kickoff the TCP internals:
tw:~ # iptables -F
Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
later on.
Any thoughts on his patch? See below (and attached)
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.
Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)
Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets. And net/ implementations
seem to base their user vs kernel space operations on it.
e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for
kernel sockets"))
net/ipv4/tcp.c:
void tcp_close(struct sock *sk, long timeout)
{
lock_sock(sk);
__tcp_close(sk, timeout);
release_sock(sk);
if (!sk->sk_net_refcnt)
inet_csk_clear_xmit_timers_sync(sk);
sock_put(sk);
}
Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().
A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
warning shows up.
Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released. This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
--
Thanks,
Steve
[-- Attachment #2: 0001-smb-client-fix-TCP-timers-deadlock-after-rmmod.patch --]
[-- Type: text/x-patch, Size: 5831 bytes --]
From f6cfa4bc261477f7a91c46f34b8d163f19870249 Mon Sep 17 00:00:00 2001
From: Enzo Matsumiya <ematsumiya@suse.de>
Date: Tue, 10 Dec 2024 18:15:12 -0300
Subject: [PATCH 1/4] smb: client: fix TCP timers deadlock after rmmod
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.
Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)
Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets. And net/ implementations
seem to base their user vs kernel space operations on it.
e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets"))
net/ipv4/tcp.c:
void tcp_close(struct sock *sk, long timeout)
{
lock_sock(sk);
__tcp_close(sk, timeout);
release_sock(sk);
if (!sk->sk_net_refcnt)
inet_csk_clear_xmit_timers_sync(sk);
sock_put(sk);
}
Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().
A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
warning shows up.
Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released. This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
---
fs/smb/client/connect.c | 36 ++++++++++++++++++++++++++----------
1 file changed, 26 insertions(+), 10 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 2372538a1211..ddcc9e514a0e 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -987,9 +987,13 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
msleep(125);
if (cifs_rdma_enabled(server))
smbd_destroy(server);
+
if (server->ssocket) {
sock_release(server->ssocket);
server->ssocket = NULL;
+
+ /* Release netns reference for the socket. */
+ put_net(cifs_net_ns(server));
}
if (!list_empty(&server->pending_mid_q)) {
@@ -1037,6 +1041,7 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
*/
}
+ /* Release netns reference for this server. */
put_net(cifs_net_ns(server));
kfree(server->leaf_fullpath);
kfree(server);
@@ -1713,6 +1718,8 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
tcp_ses->ops = ctx->ops;
tcp_ses->vals = ctx->vals;
+
+ /* Grab netns reference for this server. */
cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns));
tcp_ses->conn_id = atomic_inc_return(&tcpSesNextId);
@@ -1844,6 +1851,7 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
out_err_crypto_release:
cifs_crypto_secmech_release(tcp_ses);
+ /* Release netns reference for this server. */
put_net(cifs_net_ns(tcp_ses));
out_err:
@@ -1852,8 +1860,10 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
cifs_put_tcp_session(tcp_ses->primary_server, false);
kfree(tcp_ses->hostname);
kfree(tcp_ses->leaf_fullpath);
- if (tcp_ses->ssocket)
+ if (tcp_ses->ssocket) {
sock_release(tcp_ses->ssocket);
+ put_net(cifs_net_ns(tcp_ses));
+ }
kfree(tcp_ses);
}
return ERR_PTR(rc);
@@ -3131,20 +3141,20 @@ generic_ip_connect(struct TCP_Server_Info *server)
socket = server->ssocket;
} else {
struct net *net = cifs_net_ns(server);
- struct sock *sk;
- rc = __sock_create(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket, 1);
+ rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
}
- sk = server->ssocket->sk;
- __netns_tracker_free(net, &sk->ns_tracker, false);
- sk->sk_net_refcnt = 1;
- get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
- sock_inuse_add(net, 1);
+ /*
+ * Grab netns reference for the socket.
+ *
+ * It'll be released here, on error, or in clean_demultiplex_info() upon server
+ * teardown.
+ */
+ get_net(net);
/* BB other socket options to set KEEPALIVE, NODELAY? */
cifs_dbg(FYI, "Socket created\n");
@@ -3158,8 +3168,10 @@ generic_ip_connect(struct TCP_Server_Info *server)
}
rc = bind_socket(server);
- if (rc < 0)
+ if (rc < 0) {
+ put_net(cifs_net_ns(server));
return rc;
+ }
/*
* Eventually check for other socket options to change from
@@ -3196,6 +3208,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
if (rc < 0) {
cifs_dbg(FYI, "Error %d connecting to server\n", rc);
trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc);
+ put_net(cifs_net_ns(server));
sock_release(socket);
server->ssocket = NULL;
return rc;
@@ -3204,6 +3217,9 @@ generic_ip_connect(struct TCP_Server_Info *server)
if (sport == htons(RFC1001_PORT))
rc = ip_rfc1001_connect(server);
+ if (rc < 0)
+ put_net(cifs_net_ns(server));
+
return rc;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2024-12-18 3:24 [PATCH][SMB3 client] fix TCP timers deadlock after rmmod Steve French
@ 2024-12-19 0:28 ` Steve French
2024-12-19 0:30 ` Steve French
2025-04-01 13:54 ` Wang Zhaolong
2024-12-19 8:41 ` Kuniyuki Iwashima
1 sibling, 2 replies; 20+ messages in thread
From: Steve French @ 2024-12-19 0:28 UTC (permalink / raw)
To: linux-fsdevel, linux-net, LKML; +Cc: Enzo Matsumiya
[-- Attachment #1: Type: text/plain, Size: 3969 bytes --]
Adding fsdevel and networking in case any thoughts on this fix for
network/namespaces refcount issue (that causes rmmod UAF).
Any opinions on Enzo's proposed Fix?
---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Tue, Dec 17, 2024 at 9:24 PM
Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
To: CIFS <linux-cifs@vger.kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>, Enzo Matsumiya <ematsumiya@suse.de>
Enzo had an interesting patch, that seems to fix an important problem.
Here was his repro scenario:
tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
//someserver/target1 /mnt/test
tw:~ # ls /mnt/test
abc dir1 dir3 target1_file.txt tsub
tw:~ # iptables -A INPUT -s someserver -j DROP
Trigger reconnect and wait for 3*echo_interval:
tw:~ # cat /mnt/test/target1_file.txt
cat: /mnt/test/target1_file.txt: Host is down
Then umount and rmmod. Note that rmmod might take several iterations
until it properly tears down everything, so make sure you see the "not
loaded" message before proceeding:
tw:~ # umount /mnt/*; rmmod cifs
umount: /mnt/az: not mounted.
umount: /mnt/dfs: not mounted.
umount: /mnt/local: not mounted.
umount: /mnt/scratch: not mounted.
rmmod: ERROR: Module cifs is in use
...
tw:~ # rmmod cifs
rmmod: ERROR: Module cifs is not currently loaded
Then kickoff the TCP internals:
tw:~ # iptables -F
Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
later on.
Any thoughts on his patch? See below (and attached)
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.
Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)
Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets. And net/ implementations
seem to base their user vs kernel space operations on it.
e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for
kernel sockets"))
net/ipv4/tcp.c:
void tcp_close(struct sock *sk, long timeout)
{
lock_sock(sk);
__tcp_close(sk, timeout);
release_sock(sk);
if (!sk->sk_net_refcnt)
inet_csk_clear_xmit_timers_sync(sk);
sock_put(sk);
}
Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().
A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
warning shows up.
Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released. This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
--
Thanks,
Steve
--
Thanks,
Steve
[-- Attachment #2: 0001-smb-client-fix-TCP-timers-deadlock-after-rmmod.patch --]
[-- Type: application/x-patch, Size: 5831 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread* Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2024-12-19 0:28 ` Fwd: " Steve French
@ 2024-12-19 0:30 ` Steve French
2025-04-01 13:54 ` Wang Zhaolong
1 sibling, 0 replies; 20+ messages in thread
From: Steve French @ 2024-12-19 0:30 UTC (permalink / raw)
To: Network Development; +Cc: Enzo Matsumiya
[-- Attachment #1: Type: text/plain, Size: 4394 bytes --]
Correcting email for network mailing list
---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Wed, Dec 18, 2024 at 6:28 PM
Subject: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
<linux-net@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: Enzo Matsumiya <ematsumiya@suse.de>
Adding fsdevel and networking in case any thoughts on this fix for
network/namespaces refcount issue (that causes rmmod UAF).
Any opinions on Enzo's proposed Fix?
---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Tue, Dec 17, 2024 at 9:24 PM
Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
To: CIFS <linux-cifs@vger.kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>, Enzo Matsumiya <ematsumiya@suse.de>
Enzo had an interesting patch, that seems to fix an important problem.
Here was his repro scenario:
tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
//someserver/target1 /mnt/test
tw:~ # ls /mnt/test
abc dir1 dir3 target1_file.txt tsub
tw:~ # iptables -A INPUT -s someserver -j DROP
Trigger reconnect and wait for 3*echo_interval:
tw:~ # cat /mnt/test/target1_file.txt
cat: /mnt/test/target1_file.txt: Host is down
Then umount and rmmod. Note that rmmod might take several iterations
until it properly tears down everything, so make sure you see the "not
loaded" message before proceeding:
tw:~ # umount /mnt/*; rmmod cifs
umount: /mnt/az: not mounted.
umount: /mnt/dfs: not mounted.
umount: /mnt/local: not mounted.
umount: /mnt/scratch: not mounted.
rmmod: ERROR: Module cifs is in use
...
tw:~ # rmmod cifs
rmmod: ERROR: Module cifs is not currently loaded
Then kickoff the TCP internals:
tw:~ # iptables -F
Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
later on.
Any thoughts on his patch? See below (and attached)
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
fixed a netns UAF by manually enabled socket refcounting
(sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
The reason the patch worked for that bug was because we now hold
references to the netns (get_net_track() gets a ref internally)
and they're properly released (internally, on __sk_destruct()),
but only because sk->sk_net_refcnt was set.
Problem:
(this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
if init_net or other)
Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
only out of cifs scope, but also technically wrong -- it's set conditionally
based on user (=1) vs kernel (=0) sockets. And net/ implementations
seem to base their user vs kernel space operations on it.
e.g. upon TCP socket close, the TCP timers are not cleared because
sk->sk_net_refcnt=1:
(cf. commit 151c9c724d05 ("tcp: properly terminate timers for
kernel sockets"))
net/ipv4/tcp.c:
void tcp_close(struct sock *sk, long timeout)
{
lock_sock(sk);
__tcp_close(sk, timeout);
release_sock(sk);
if (!sk->sk_net_refcnt)
inet_csk_clear_xmit_timers_sync(sk);
sock_put(sk);
}
Which will throw a lockdep warning and then, as expected, deadlock on
tcp_write_timer().
A way to reproduce this is by running the reproducer from ef7134c7fc48
and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
warning shows up.
Fix:
We shouldn't mess with socket internals ourselves, so do not set
sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
As for non-init_net network namespaces, we deal with it the best way
we can -- hold an extra netns reference for server->ssocket and drop it
when it's released. This ensures that the netns still exists whenever
we need to create/destroy server->ssocket, but is not directly tied to
it.
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
--
Thanks,
Steve
--
Thanks,
Steve
--
Thanks,
Steve
[-- Attachment #2: 0001-smb-client-fix-TCP-timers-deadlock-after-rmmod.patch --]
[-- Type: application/x-patch, Size: 5831 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2024-12-19 0:28 ` Fwd: " Steve French
2024-12-19 0:30 ` Steve French
@ 2025-04-01 13:54 ` Wang Zhaolong
2025-04-01 20:26 ` Kuniyuki Iwashima
2025-04-02 2:01 ` Kuniyuki Iwashima
1 sibling, 2 replies; 20+ messages in thread
From: Wang Zhaolong @ 2025-04-01 13:54 UTC (permalink / raw)
To: Steve French, linux-fsdevel, linux-net, LKML, Enzo Matsumiya,
kuniyu, edumazet
Cc: zhangchangzhong
Hi.
My colleagues and I have been investigating the issue addressed by this patch
and have discovered some significant concerns that require broader discussion.
### Socket Leak Issue
After testing this patch extensively, I've confirmed it introduces a socket leak
when TCP connections don't complete proper termination (e.g., when FIN packets
are dropped). The leak manifests as a continuous increase in TCP slab usage:
I've documented this issue with a reproducer in Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0
The key issue appears to stem from the interaction between the SMB client and TCP
socket lifecycle management:
1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
`tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
`!sk->sk_net_refcnt`
2. This early timer termination prevents proper reference counting resolution
when connections don't complete the 4-way TCP termination handshake
3. The resulting socket references are never fully released, leading to a leak
#### Timeline of Related Changes
1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
- Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
- Kernel sockets don't hold netns references, which can lead to potential UAF issues
2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
- Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
- This prevents UAF when netns is destroyed before socket timers complete
- **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`
3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
- Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
- Also called `maybe_get_net()` to maintain netns references
- This effectively made kernel sockets behave like user sockets for reference counting
4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
- Problem commit: Removed `sk->sk_net_refcnt = 1` setting
- Changed to using explicit `get_net()/put_net()` at CIFS layer
- This change leads to socket leaks because timers are terminated early
### Lockdep Warning Analysis
I've also investigated the lockdep warning mentioned in the patch. My analysis
suggests it may be a false positive rather than an actual deadlock. The crash
actually occurs in the lockdep subsystem itself (null pointer dereference in
`check_wait_context()`), not in the CIFS or networking code directly.
The procedure for the null pointer dereference is as follows:
When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
connection is established.
```
cifs_do_mount
cifs_mount
mount_get_conns
cifs_get_tcp_session
ip_connect
generic_ip_connect
cifs_reclassify_socket4
sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
lockdep_init_map
lockdep_init_map_wait
lockdep_init_map_type
lockdep_init_map_type
register_lock_class
__set_bit(class - lock_classes, lock_classes_in_use);
```
When the module is unloaded, the lock class is cleaned up.
```
free_mod_mem
lockdep_free_key_range
__lockdep_free_key_range
zap_class
__clear_bit(class - lock_classes, lock_classes_in_use);
```
After the module is uninstalled and the network connection is restored, the
timer is woken up.
```
run_timer_softirq
run_timer_base
__run_timers
call_timer_fn
tcp_write_timer
bh_lock_sock
spin_lock(&((__sk)->sk_lock.slock))
_raw_spin_lock
lock_acquire
__lock_acquire
check_wait_context
hlock_class
if (!test_bit(class_idx, lock_classes_in_use)) {
return NULL;
hlock_class(next)->wait_type_inner; // Null pointer dereference
```
The problem lies within lockdep, as Kuniyuki says:
> I tried the repro and confirmed it triggers null deref.
>
> It happens in LOCKDEP internal, so for me it looks like a problem in
> LOCKDEP rather than CIFS or TCP.
>
> I think LOCKDEP should hold a module reference and prevent related
> modules from being unloaded.
Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
do not belong to the same lock instance. A deadlock should not occur.
### Discussion Points
1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
API, or is it a networking layer issue that should handle socket cleanup differently?
2. Approach to Resolution: Would it be better to:
- Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
- Work with networking subsystem maintainers on a more comprehensive solution that
handles socket cleanup properly?
3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue
while introducing an actual vulnerability, what's the appropriate way to address this?
What's the best path forward?
Best regards,
Wang Zhaolong
> Adding fsdevel and networking in case any thoughts on this fix for
> network/namespaces refcount issue (that causes rmmod UAF).
>
> Any opinions on Enzo's proposed Fix?
>
> ---------- Forwarded message ---------
> From: Steve French <smfrench@gmail.com>
> Date: Tue, Dec 17, 2024 at 9:24 PM
> Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
> To: CIFS <linux-cifs@vger.kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>, Enzo Matsumiya <ematsumiya@suse.de>
>
>
> Enzo had an interesting patch, that seems to fix an important problem.
>
> Here was his repro scenario:
>
> tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
> //someserver/target1 /mnt/test
> tw:~ # ls /mnt/test
> abc dir1 dir3 target1_file.txt tsub
> tw:~ # iptables -A INPUT -s someserver -j DROP
>
> Trigger reconnect and wait for 3*echo_interval:
>
> tw:~ # cat /mnt/test/target1_file.txt
> cat: /mnt/test/target1_file.txt: Host is down
>
> Then umount and rmmod. Note that rmmod might take several iterations
> until it properly tears down everything, so make sure you see the "not
> loaded" message before proceeding:
>
> tw:~ # umount /mnt/*; rmmod cifs
> umount: /mnt/az: not mounted.
> umount: /mnt/dfs: not mounted.
> umount: /mnt/local: not mounted.
> umount: /mnt/scratch: not mounted.
> rmmod: ERROR: Module cifs is in use
> ...
> tw:~ # rmmod cifs
> rmmod: ERROR: Module cifs is not currently loaded
>
> Then kickoff the TCP internals:
> tw:~ # iptables -F
>
> Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
> later on.
>
>
> Any thoughts on his patch? See below (and attached)
>
> Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
> fixed a netns UAF by manually enabled socket refcounting
> (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
>
> The reason the patch worked for that bug was because we now hold
> references to the netns (get_net_track() gets a ref internally)
> and they're properly released (internally, on __sk_destruct()),
> but only because sk->sk_net_refcnt was set.
>
> Problem:
> (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
> if init_net or other)
>
> Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
> only out of cifs scope, but also technically wrong -- it's set conditionally
> based on user (=1) vs kernel (=0) sockets. And net/ implementations
> seem to base their user vs kernel space operations on it.
>
> e.g. upon TCP socket close, the TCP timers are not cleared because
> sk->sk_net_refcnt=1:
> (cf. commit 151c9c724d05 ("tcp: properly terminate timers for
> kernel sockets"))
>
> net/ipv4/tcp.c:
> void tcp_close(struct sock *sk, long timeout)
> {
> lock_sock(sk);
> __tcp_close(sk, timeout);
> release_sock(sk);
> if (!sk->sk_net_refcnt)
> inet_csk_clear_xmit_timers_sync(sk);
> sock_put(sk);
> }
>
> Which will throw a lockdep warning and then, as expected, deadlock on
> tcp_write_timer().
>
> A way to reproduce this is by running the reproducer from ef7134c7fc48
> and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
> warning shows up.
>
> Fix:
> We shouldn't mess with socket internals ourselves, so do not set
> sk_net_refcnt manually.
>
> Also change __sock_create() to sock_create_kern() for explicitness.
>
> As for non-init_net network namespaces, we deal with it the best way
> we can -- hold an extra netns reference for server->ssocket and drop it
> when it's released. This ensures that the netns still exists whenever
> we need to create/destroy server->ssocket, but is not directly tied to
> it.
>
> Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
>
>
> --
> Thanks,
>
> Steve
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-01 13:54 ` Wang Zhaolong
@ 2025-04-01 20:26 ` Kuniyuki Iwashima
2025-04-02 0:57 ` Kuniyuki Iwashima
2025-04-02 2:01 ` Kuniyuki Iwashima
1 sibling, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-01 20:26 UTC (permalink / raw)
To: wangzhaolong1
Cc: edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel, netdev,
smfrench, zhangchangzhong
(corrected netdev list)
From: Wang Zhaolong <wangzhaolong1@huawei.com>
Date: Tue, 1 Apr 2025 21:54:47 +0800
> Hi.
>
> My colleagues and I have been investigating the issue addressed by this patch
> and have discovered some significant concerns that require broader discussion.
>
> ### Socket Leak Issue
>
> After testing this patch extensively, I've confirmed it introduces a socket leak
> when TCP connections don't complete proper termination (e.g., when FIN packets
> are dropped). The leak manifests as a continuous increase in TCP slab usage:
>
> I've documented this issue with a reproducer in Bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0
>
> The key issue appears to stem from the interaction between the SMB client and TCP
> socket lifecycle management:
>
> 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
> `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
> `!sk->sk_net_refcnt`
> 2. This early timer termination prevents proper reference counting resolution
> when connections don't complete the 4-way TCP termination handshake
> 3. The resulting socket references are never fully released, leading to a leak
>
> #### Timeline of Related Changes
>
> 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
> - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
> - Kernel sockets don't hold netns references, which can lead to potential UAF issues
>
> 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
> - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
> - This prevents UAF when netns is destroyed before socket timers complete
> - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`
>
> 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
> - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
> - Also called `maybe_get_net()` to maintain netns references
> - This effectively made kernel sockets behave like user sockets for reference counting
>
> 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
> - Problem commit: Removed `sk->sk_net_refcnt = 1` setting
> - Changed to using explicit `get_net()/put_net()` at CIFS layer
> - This change leads to socket leaks because timers are terminated early
>
> ### Lockdep Warning Analysis
>
> I've also investigated the lockdep warning mentioned in the patch. My analysis
> suggests it may be a false positive rather than an actual deadlock.
Note that the 'deadlock' in the description is simply wrong as I mentioned
before. The 'deadlock' means a lock which belongs to an unloaded module,
and not actual deadlock like circular locking etc.
https://lore.kernel.org/netdev/20241219084100.33837-1-kuniyu@amazon.com/
> The crash
> actually occurs in the lockdep subsystem itself (null pointer dereference in
> `check_wait_context()`), not in the CIFS or networking code directly.
>
> The procedure for the null pointer dereference is as follows:
>
> When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
> connection is established.
>
> ```
> cifs_do_mount
> cifs_mount
> mount_get_conns
> cifs_get_tcp_session
> ip_connect
> generic_ip_connect
> cifs_reclassify_socket4
> sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
> lockdep_init_map
> lockdep_init_map_wait
> lockdep_init_map_type
> lockdep_init_map_type
> register_lock_class
> __set_bit(class - lock_classes, lock_classes_in_use);
> ```
>
> When the module is unloaded, the lock class is cleaned up.
>
> ```
> free_mod_mem
> lockdep_free_key_range
> __lockdep_free_key_range
> zap_class
> __clear_bit(class - lock_classes, lock_classes_in_use);
> ```
>
> After the module is uninstalled and the network connection is restored, the
> timer is woken up.
>
> ```
> run_timer_softirq
> run_timer_base
> __run_timers
> call_timer_fn
> tcp_write_timer
> bh_lock_sock
> spin_lock(&((__sk)->sk_lock.slock))
> _raw_spin_lock
> lock_acquire
> __lock_acquire
> check_wait_context
> hlock_class
> if (!test_bit(class_idx, lock_classes_in_use)) {
> return NULL;
> hlock_class(next)->wait_type_inner; // Null pointer dereference
> ```
>
> The problem lies within lockdep, as Kuniyuki says:
>
> > I tried the repro and confirmed it triggers null deref.
> >
> > It happens in LOCKDEP internal, so for me it looks like a problem in
> > LOCKDEP rather than CIFS or TCP.
> >
> > I think LOCKDEP should hold a module reference and prevent related
> > modules from being unloaded.
>
> Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
> do not belong to the same lock instance. A deadlock should not occur.
>
> ### Discussion Points
>
> 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
> API, or is it a networking layer issue that should handle socket cleanup differently?
>
> 2. Approach to Resolution: Would it be better to:
> - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
> - Work with networking subsystem maintainers on a more comprehensive solution that
> handles socket cleanup properly?
Could you test this patch with e9f2517a3e18 reverted ?
---8<---
diff --git a/include/net/sock.h b/include/net/sock.h
index 8daf1b3b12c6..1e15a1209ea6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -547,6 +547,10 @@ struct sock {
struct rcu_head sk_rcu;
netns_tracker ns_tracker;
struct xarray sk_user_frags;
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(MODULE)
+ struct module *sk_owner;
+#endif
};
struct sock_bh_locked {
@@ -1583,6 +1587,15 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
sk_mem_reclaim(sk);
}
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(MODULE)
+static inline void sk_set_owner(struct sock *sk, struct module *owner)
+{
+ sk->sk_owner = module_get(owner);
+}
+#else
+#define sk_set_owner(sk, owner)
+#endif
+
/*
* Macro so as to not evaluate some arguments when
* lockdep is not enabled.
@@ -1592,6 +1605,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
*/
#define sock_lock_init_class_and_name(sk, sname, skey, name, key) \
do { \
+ sk_set_owner(sk, THIS_MODULE); \
sk->sk_lock.owned = 0; \
init_waitqueue_head(&sk->sk_lock.wq); \
spin_lock_init(&(sk)->sk_lock.slock); \
diff --git a/net/core/sock.c b/net/core/sock.c
index 323892066def..2d91a92ed26d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2324,6 +2324,11 @@ static void __sk_destruct(struct rcu_head *head)
__netns_tracker_free(net, &sk->ns_tracker, false);
net_passive_dec(net);
}
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(MODULE)
+ if (sk->sk_owner)
+ module_put(sk->sk_owner);
+#endif
sk_prot_free(sk->sk_prot_creator, sk);
}
---8<---
>
> 3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue
> while introducing an actual vulnerability, what's the appropriate way to address this?
>
> What's the best path forward?
>
> Best regards,
> Wang Zhaolong
>
> > Adding fsdevel and networking in case any thoughts on this fix for
> > network/namespaces refcount issue (that causes rmmod UAF).
> >
> > Any opinions on Enzo's proposed Fix?
> >
> > ---------- Forwarded message ---------
> > From: Steve French <smfrench@gmail.com>
> > Date: Tue, Dec 17, 2024 at 9:24 PM
> > Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
> > To: CIFS <linux-cifs@vger.kernel.org>
> > Cc: Kuniyuki Iwashima <kuniyu@amazon.com>, Enzo Matsumiya <ematsumiya@suse.de>
> >
> >
> > Enzo had an interesting patch, that seems to fix an important problem.
> >
> > Here was his repro scenario:
> >
> > tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
> > //someserver/target1 /mnt/test
> > tw:~ # ls /mnt/test
> > abc dir1 dir3 target1_file.txt tsub
> > tw:~ # iptables -A INPUT -s someserver -j DROP
> >
> > Trigger reconnect and wait for 3*echo_interval:
> >
> > tw:~ # cat /mnt/test/target1_file.txt
> > cat: /mnt/test/target1_file.txt: Host is down
> >
> > Then umount and rmmod. Note that rmmod might take several iterations
> > until it properly tears down everything, so make sure you see the "not
> > loaded" message before proceeding:
> >
> > tw:~ # umount /mnt/*; rmmod cifs
> > umount: /mnt/az: not mounted.
> > umount: /mnt/dfs: not mounted.
> > umount: /mnt/local: not mounted.
> > umount: /mnt/scratch: not mounted.
> > rmmod: ERROR: Module cifs is in use
> > ...
> > tw:~ # rmmod cifs
> > rmmod: ERROR: Module cifs is not currently loaded
> >
> > Then kickoff the TCP internals:
> > tw:~ # iptables -F
> >
> > Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
> > later on.
> >
> >
> > Any thoughts on his patch? See below (and attached)
> >
> > Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
> > namespace.")
> > fixed a netns UAF by manually enabled socket refcounting
> > (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
> >
> > The reason the patch worked for that bug was because we now hold
> > references to the netns (get_net_track() gets a ref internally)
> > and they're properly released (internally, on __sk_destruct()),
> > but only because sk->sk_net_refcnt was set.
> >
> > Problem:
> > (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
> > if init_net or other)
> >
> > Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
> > only out of cifs scope, but also technically wrong -- it's set conditionally
> > based on user (=1) vs kernel (=0) sockets. And net/ implementations
> > seem to base their user vs kernel space operations on it.
> >
> > e.g. upon TCP socket close, the TCP timers are not cleared because
> > sk->sk_net_refcnt=1:
> > (cf. commit 151c9c724d05 ("tcp: properly terminate timers for
> > kernel sockets"))
> >
> > net/ipv4/tcp.c:
> > void tcp_close(struct sock *sk, long timeout)
> > {
> > lock_sock(sk);
> > __tcp_close(sk, timeout);
> > release_sock(sk);
> > if (!sk->sk_net_refcnt)
> > inet_csk_clear_xmit_timers_sync(sk);
> > sock_put(sk);
> > }
> >
> > Which will throw a lockdep warning and then, as expected, deadlock on
> > tcp_write_timer().
> >
> > A way to reproduce this is by running the reproducer from ef7134c7fc48
> > and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
> > warning shows up.
> >
> > Fix:
> > We shouldn't mess with socket internals ourselves, so do not set
> > sk_net_refcnt manually.
> >
> > Also change __sock_create() to sock_create_kern() for explicitness.
> >
> > As for non-init_net network namespaces, we deal with it the best way
> > we can -- hold an extra netns reference for server->ssocket and drop it
> > when it's released. This ensures that the netns still exists whenever
> > we need to create/destroy server->ssocket, but is not directly tied to
> > it.
> >
> > Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
> > namespace.")
> >
> >
> > --
> > Thanks,
> >
> > Steve
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-01 20:26 ` Kuniyuki Iwashima
@ 2025-04-02 0:57 ` Kuniyuki Iwashima
2025-04-02 4:43 ` Wang Zhaolong
0 siblings, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 0:57 UTC (permalink / raw)
To: kuniyu
Cc: edumazet, ematsumiya, linux-fsdevel, linux-kernel, netdev,
smfrench, wangzhaolong1, zhangchangzhong
From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Tue, 1 Apr 2025 13:26:54 -0700
> (corrected netdev list)
>
> From: Wang Zhaolong <wangzhaolong1@huawei.com>
> Date: Tue, 1 Apr 2025 21:54:47 +0800
> > Hi.
> >
> > My colleagues and I have been investigating the issue addressed by this patch
> > and have discovered some significant concerns that require broader discussion.
> >
> > ### Socket Leak Issue
> >
> > After testing this patch extensively, I've confirmed it introduces a socket leak
> > when TCP connections don't complete proper termination (e.g., when FIN packets
> > are dropped). The leak manifests as a continuous increase in TCP slab usage:
> >
> > I've documented this issue with a reproducer in Bugzilla:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0
> >
> > The key issue appears to stem from the interaction between the SMB client and TCP
> > socket lifecycle management:
> >
> > 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
> > `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
> > `!sk->sk_net_refcnt`
> > 2. This early timer termination prevents proper reference counting resolution
> > when connections don't complete the 4-way TCP termination handshake
> > 3. The resulting socket references are never fully released, leading to a leak
> >
> > #### Timeline of Related Changes
> >
> > 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
> > - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
> > - Kernel sockets don't hold netns references, which can lead to potential UAF issues
> >
> > 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
> > - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
> > - This prevents UAF when netns is destroyed before socket timers complete
> > - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`
> >
> > 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
> > - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
> > - Also called `maybe_get_net()` to maintain netns references
> > - This effectively made kernel sockets behave like user sockets for reference counting
> >
> > 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
> > - Problem commit: Removed `sk->sk_net_refcnt = 1` setting
> > - Changed to using explicit `get_net()/put_net()` at CIFS layer
> > - This change leads to socket leaks because timers are terminated early
> >
> > ### Lockdep Warning Analysis
> >
> > I've also investigated the lockdep warning mentioned in the patch. My analysis
> > suggests it may be a false positive rather than an actual deadlock.
>
> Note that the 'deadlock' in the description is simply wrong as I mentioned
> before. The 'deadlock' means a lock which belongs to an unloaded module,
> and not actual deadlock like circular locking etc.
>
> https://lore.kernel.org/netdev/20241219084100.33837-1-kuniyu@amazon.com/
>
>
> > The crash
> > actually occurs in the lockdep subsystem itself (null pointer dereference in
> > `check_wait_context()`), not in the CIFS or networking code directly.
> >
> > The procedure for the null pointer dereference is as follows:
> >
> > When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
> > connection is established.
> >
> > ```
> > cifs_do_mount
> > cifs_mount
> > mount_get_conns
> > cifs_get_tcp_session
> > ip_connect
> > generic_ip_connect
> > cifs_reclassify_socket4
> > sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
> > lockdep_init_map
> > lockdep_init_map_wait
> > lockdep_init_map_type
> > lockdep_init_map_type
> > register_lock_class
> > __set_bit(class - lock_classes, lock_classes_in_use);
> > ```
> >
> > When the module is unloaded, the lock class is cleaned up.
> >
> > ```
> > free_mod_mem
> > lockdep_free_key_range
> > __lockdep_free_key_range
> > zap_class
> > __clear_bit(class - lock_classes, lock_classes_in_use);
> > ```
> >
> > After the module is uninstalled and the network connection is restored, the
> > timer is woken up.
> >
> > ```
> > run_timer_softirq
> > run_timer_base
> > __run_timers
> > call_timer_fn
> > tcp_write_timer
> > bh_lock_sock
> > spin_lock(&((__sk)->sk_lock.slock))
> > _raw_spin_lock
> > lock_acquire
> > __lock_acquire
> > check_wait_context
> > hlock_class
> > if (!test_bit(class_idx, lock_classes_in_use)) {
> > return NULL;
> > hlock_class(next)->wait_type_inner; // Null pointer dereference
> > ```
> >
> > The problem lies within lockdep, as Kuniyuki says:
> >
> > > I tried the repro and confirmed it triggers null deref.
> > >
> > > It happens in LOCKDEP internal, so for me it looks like a problem in
> > > LOCKDEP rather than CIFS or TCP.
> > >
> > > I think LOCKDEP should hold a module reference and prevent related
> > > modules from being unloaded.
> >
> > Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
> > do not belong to the same lock instance. A deadlock should not occur.
> >
> > ### Discussion Points
> >
> > 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
> > API, or is it a networking layer issue that should handle socket cleanup differently?
> >
> > 2. Approach to Resolution: Would it be better to:
> > - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
> > - Work with networking subsystem maintainers on a more comprehensive solution that
> > handles socket cleanup properly?
I verified the patch below fixed the null-ptr-deref in lockdep by
preventing cifs from being unloaded while TCP sockets are alive.
I'll post this officialy, and once this is merged and pulled into
the cifs tree, I'll send a revert of e9f2517a3e18.
---8<---
diff --git a/include/net/sock.h b/include/net/sock.h
index 8daf1b3b12c6..e6515ef9116a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -547,6 +547,10 @@ struct sock {
struct rcu_head sk_rcu;
netns_tracker ns_tracker;
struct xarray sk_user_frags;
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
+ struct module *sk_owner;
+#endif
};
struct sock_bh_locked {
@@ -1583,6 +1587,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
sk_mem_reclaim(sk);
}
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
+static inline void sk_set_owner(struct sock *sk, struct module *owner)
+{
+ __module_get(owner);
+ sk->sk_owner = owner;
+}
+#else
+#define sk_set_owner(sk, owner)
+#endif
+
/*
* Macro so as to not evaluate some arguments when
* lockdep is not enabled.
@@ -1592,6 +1606,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
*/
#define sock_lock_init_class_and_name(sk, sname, skey, name, key) \
do { \
+ sk_set_owner(sk, THIS_MODULE); \
sk->sk_lock.owned = 0; \
init_waitqueue_head(&sk->sk_lock.wq); \
spin_lock_init(&(sk)->sk_lock.slock); \
diff --git a/net/core/sock.c b/net/core/sock.c
index 323892066def..b54f12faad1c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2324,6 +2324,12 @@ static void __sk_destruct(struct rcu_head *head)
__netns_tracker_free(net, &sk->ns_tracker, false);
net_passive_dec(net);
}
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
+ if (sk->sk_owner)
+ module_put(sk->sk_owner);
+#endif
+
sk_prot_free(sk->sk_prot_creator, sk);
}
---8<---
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 0:57 ` Kuniyuki Iwashima
@ 2025-04-02 4:43 ` Wang Zhaolong
0 siblings, 0 replies; 20+ messages in thread
From: Wang Zhaolong @ 2025-04-02 4:43 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: edumazet, ematsumiya, linux-fsdevel, linux-kernel, netdev,
smfrench, zhangchangzhong
Hi.
sorry for the late response.
I tested this patch below and it works fine.
Best Regards,
Wang Zhaolong
>
> I verified the patch below fixed the null-ptr-deref in lockdep by
> preventing cifs from being unloaded while TCP sockets are alive.
>
> I'll post this officialy, and once this is merged and pulled into
> the cifs tree, I'll send a revert of e9f2517a3e18.
>
> ---8<---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8daf1b3b12c6..e6515ef9116a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -547,6 +547,10 @@ struct sock {
> struct rcu_head sk_rcu;
> netns_tracker ns_tracker;
> struct xarray sk_user_frags;
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> + struct module *sk_owner;
> +#endif
> };
>
> struct sock_bh_locked {
> @@ -1583,6 +1587,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
> sk_mem_reclaim(sk);
> }
>
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> +static inline void sk_set_owner(struct sock *sk, struct module *owner)
> +{
> + __module_get(owner);
> + sk->sk_owner = owner;
> +}
> +#else
> +#define sk_set_owner(sk, owner)
> +#endif
> +
> /*
> * Macro so as to not evaluate some arguments when
> * lockdep is not enabled.
> @@ -1592,6 +1606,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
> */
> #define sock_lock_init_class_and_name(sk, sname, skey, name, key) \
> do { \
> + sk_set_owner(sk, THIS_MODULE); \
> sk->sk_lock.owned = 0; \
> init_waitqueue_head(&sk->sk_lock.wq); \
> spin_lock_init(&(sk)->sk_lock.slock); \
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 323892066def..b54f12faad1c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2324,6 +2324,12 @@ static void __sk_destruct(struct rcu_head *head)
> __netns_tracker_free(net, &sk->ns_tracker, false);
> net_passive_dec(net);
> }
> +
> +#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
> + if (sk->sk_owner)
> + module_put(sk->sk_owner);
> +#endif
> +
> sk_prot_free(sk->sk_prot_creator, sk);
> }
>
> ---8<---
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-01 13:54 ` Wang Zhaolong
2025-04-01 20:26 ` Kuniyuki Iwashima
@ 2025-04-02 2:01 ` Kuniyuki Iwashima
2025-04-02 4:49 ` Wang Zhaolong
1 sibling, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 2:01 UTC (permalink / raw)
To: wangzhaolong1
Cc: edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel,
linux-net, smfrench, zhangchangzhong
From: Wang Zhaolong <wangzhaolong1@huawei.com>
Date: Tue, 1 Apr 2025 21:54:47 +0800
> Hi.
>
> My colleagues and I have been investigating the issue addressed by this patch
> and have discovered some significant concerns that require broader discussion.
>
> ### Socket Leak Issue
>
> After testing this patch extensively, I've confirmed it introduces a socket leak
> when TCP connections don't complete proper termination (e.g., when FIN packets
> are dropped). The leak manifests as a continuous increase in TCP slab usage:
>
> I've documented this issue with a reproducer in Bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0
>
> The key issue appears to stem from the interaction between the SMB client and TCP
> socket lifecycle management:
>
> 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
> `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
> `!sk->sk_net_refcnt`
> 2. This early timer termination prevents proper reference counting resolution
> when connections don't complete the 4-way TCP termination handshake
> 3. The resulting socket references are never fully released, leading to a leak
>
> #### Timeline of Related Changes
>
> 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
> - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
> - Kernel sockets don't hold netns references, which can lead to potential UAF issues
>
> 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
> - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
> - This prevents UAF when netns is destroyed before socket timers complete
> - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`
>
> 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
> - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
> - Also called `maybe_get_net()` to maintain netns references
> - This effectively made kernel sockets behave like user sockets for reference counting
>
> 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
> - Problem commit: Removed `sk->sk_net_refcnt = 1` setting
> - Changed to using explicit `get_net()/put_net()` at CIFS layer
> - This change leads to socket leaks because timers are terminated early
>
> ### Lockdep Warning Analysis
>
> I've also investigated the lockdep warning mentioned in the patch. My analysis
> suggests it may be a false positive rather than an actual deadlock. The crash
> actually occurs in the lockdep subsystem itself (null pointer dereference in
> `check_wait_context()`), not in the CIFS or networking code directly.
>
> The procedure for the null pointer dereference is as follows:
>
> When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
> connection is established.
>
> ```
> cifs_do_mount
> cifs_mount
> mount_get_conns
> cifs_get_tcp_session
> ip_connect
> generic_ip_connect
> cifs_reclassify_socket4
> sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
> lockdep_init_map
> lockdep_init_map_wait
> lockdep_init_map_type
> lockdep_init_map_type
> register_lock_class
> __set_bit(class - lock_classes, lock_classes_in_use);
> ```
>
> When the module is unloaded, the lock class is cleaned up.
>
> ```
> free_mod_mem
> lockdep_free_key_range
> __lockdep_free_key_range
> zap_class
> __clear_bit(class - lock_classes, lock_classes_in_use);
> ```
>
> After the module is uninstalled and the network connection is restored, the
> timer is woken up.
>
> ```
> run_timer_softirq
> run_timer_base
> __run_timers
> call_timer_fn
> tcp_write_timer
> bh_lock_sock
> spin_lock(&((__sk)->sk_lock.slock))
> _raw_spin_lock
> lock_acquire
> __lock_acquire
> check_wait_context
> hlock_class
> if (!test_bit(class_idx, lock_classes_in_use)) {
> return NULL;
> hlock_class(next)->wait_type_inner; // Null pointer dereference
> ```
>
> The problem lies within lockdep, as Kuniyuki says:
>
> > I tried the repro and confirmed it triggers null deref.
> >
> > It happens in LOCKDEP internal, so for me it looks like a problem in
> > LOCKDEP rather than CIFS or TCP.
> >
> > I think LOCKDEP should hold a module reference and prevent related
> > modules from being unloaded.
>
> Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
> do not belong to the same lock instance. A deadlock should not occur.
>
> ### Discussion Points
>
> 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
> API, or is it a networking layer issue that should handle socket cleanup differently?
>
> 2. Approach to Resolution: Would it be better to:
> - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
> - Work with networking subsystem maintainers on a more comprehensive solution that
> handles socket cleanup properly?
>
> 3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue
> while introducing an actual vulnerability, what's the appropriate way to address this?
I tested on 6.14 and e9f2517a3e18, but the issue still reproduces,
so e9f2517a3e18 is a bogus fix, and we will need to ask the CNA team
to update the description once the correct fix is merged.
repro:
---8<---
#!/bin/bash
CIFS_SERVER="10.0.0.137"
CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST"
DEV="enp0s3"
CRED="/root/WindowsCredential.sh"
MNT=$(mktemp -d XXXXXX)
echo "/tmp/${MNT}"
mkdir -p /tmp/${MNT}
mount -t cifs ${CIFS_PATH} /tmp/${MNT} -o vers=3.0,sec=ntlmssp,credentials=${CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1
echo "hello world" > /tmp/${MNT}/a.txt
iptables -A INPUT -s ${CIFS_SERVER} -j DROP
cat /tmp/${MNT}/a.txt &
for i in $(seq 10);
do
umount /tmp/${MNT}
rmmod cifs
sleep 1
done
rm -r /tmp/${MNT}
iptables -D INPUT -s ${CIFS_SERVER} -j DROP
---8<---
splat:
---8<---
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
Code: ef 90 e8 c4 66 4c 00 85 c0 74 23 8b 05 ba dd bf 01 85 c0 75 19 90 48 c7 c6 4d 83 a1 82 48 c7 c7 bc 0d a0 82 e8 e2 2f f8 ff 90 <0f> 0b 90 90 90 31 c0 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90
RSP: 0018:ffa0000000468a08 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100081f09ca40
RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f10a998d1b4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? __warn (kernel/panic.c:748)
? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
__lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178)
? lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
? sk_filter_trim_cap (./include/linux/rcupdate.h:337 ./include/linux/rcupdate.h:849 net/core/filter.c:155)
lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
? tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
? sk_filter_trim_cap (./include/linux/rcupdate.h:883 net/core/filter.c:166)
? __inet_lookup_established (./include/net/inet_hashtables.h:358 net/ipv4/inet_hashtables.c:515)
_raw_spin_lock_nested (kernel/locking/spinlock.c:379)
? tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
? raw_local_deliver (net/ipv4/raw.c:206)
? lock_is_held_type (kernel/locking/lockdep.c:5592 kernel/locking/lockdep.c:5923)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234)
ip_sublist_rcv_finish (net/ipv4/ip_input.c:576)
ip_list_rcv_finish (net/ipv4/ip_input.c:628)
ip_list_rcv (net/ipv4/ip_input.c:670)
__netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986)
netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129)
napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496)
e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815)
__napi_poll.constprop.0 (net/core/dev.c:7191)
net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382)
? handle_irq_event (kernel/irq/internals.h:236 kernel/irq/handle.c:213)
? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? find_held_lock (kernel/locking/lockdep.c:5341)
handle_softirqs (kernel/softirq.c:561)
__irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662)
irq_exit_rcu (kernel/softirq.c:680)
common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14))
</IRQ>
<TASK>
asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744)
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202
RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5
RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:315)
common_startup_64 (arch/x86/kernel/head_64.S:421)
</TASK>
irq event stamp: 25636
hardirqs last enabled at (25636): [<ffffffff81297fe1>] __local_bh_enable_ip+0x71/0xd0
hardirqs last disabled at (25635): [<ffffffff8129800a>] __local_bh_enable_ip+0x9a/0xd0
softirqs last enabled at (25606): [<ffffffff81297b2e>] handle_softirqs+0x2ee/0x3b0
softirqs last disabled at (25613): [<ffffffff81297d51>] __irq_exit_rcu+0xa1/0xc0
---[ end trace 0000000000000000 ]---
BUG: kernel NULL pointer dereference, address: 00000000000000c4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G W 6.14.0 #36
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178)
Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44
RSP: 0018:ffa0000000468a10 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88
RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
_raw_spin_lock_nested (kernel/locking/spinlock.c:379)
tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234)
ip_sublist_rcv_finish (net/ipv4/ip_input.c:576)
ip_list_rcv_finish (net/ipv4/ip_input.c:628)
ip_list_rcv (net/ipv4/ip_input.c:670)
__netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986)
netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129)
napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496)
e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815)
__napi_poll.constprop.0 (net/core/dev.c:7191)
net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382)
handle_softirqs (kernel/softirq.c:561)
__irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662)
irq_exit_rcu (kernel/softirq.c:680)
common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14))
</IRQ>
<TASK>
asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744)
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202
RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5
RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:315)
common_startup_64 (arch/x86/kernel/head_64.S:421)
</TASK>
Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CR2: 00000000000000c4
---[ end trace 0000000000000000 ]---
RIP: 0010:__lock_acquire+0x222/0x16f0
Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44
RSP: 0018:ffa0000000468a10 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88
RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
---8<---
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 2:01 ` Kuniyuki Iwashima
@ 2025-04-02 4:49 ` Wang Zhaolong
2025-04-02 7:13 ` Greg Kroah-Hartman
0 siblings, 1 reply; 20+ messages in thread
From: Wang Zhaolong @ 2025-04-02 4:49 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: edumazet, ematsumiya, linux-fsdevel, linux-kernel, linux-net,
smfrench, zhangchangzhong, cve, Greg Kroah-Hartman, sfrench
Yes, it seems the previous description might not have been entirely clear.
I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
does not actually address any real issues. It also fails to resolve the null pointer
dereference problem within lockdep. On top of that, it has caused a series of
subsequent leakage issues.
We will indeed need the CNA team to update the description once the correct fix
is merged.
Best regards,
Wang Zhaolong
> From: Wang Zhaolong <wangzhaolong1@huawei.com>
> Date: Tue, 1 Apr 2025 21:54:47 +0800
>> Hi.
>>
>> My colleagues and I have been investigating the issue addressed by this patch
>> and have discovered some significant concerns that require broader discussion.
>>
>> ### Socket Leak Issue
>>
>> After testing this patch extensively, I've confirmed it introduces a socket leak
>> when TCP connections don't complete proper termination (e.g., when FIN packets
>> are dropped). The leak manifests as a continuous increase in TCP slab usage:
>>
>> I've documented this issue with a reproducer in Bugzilla:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0
>>
>> The key issue appears to stem from the interaction between the SMB client and TCP
>> socket lifecycle management:
>>
>> 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
>> `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
>> `!sk->sk_net_refcnt`
>> 2. This early timer termination prevents proper reference counting resolution
>> when connections don't complete the 4-way TCP termination handshake
>> 3. The resulting socket references are never fully released, leading to a leak
>>
>> #### Timeline of Related Changes
>>
>> 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
>> - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
>> - Kernel sockets don't hold netns references, which can lead to potential UAF issues
>>
>> 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
>> - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
>> - This prevents UAF when netns is destroyed before socket timers complete
>> - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`
>>
>> 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
>> - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
>> - Also called `maybe_get_net()` to maintain netns references
>> - This effectively made kernel sockets behave like user sockets for reference counting
>>
>> 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
>> - Problem commit: Removed `sk->sk_net_refcnt = 1` setting
>> - Changed to using explicit `get_net()/put_net()` at CIFS layer
>> - This change leads to socket leaks because timers are terminated early
>>
>> ### Lockdep Warning Analysis
>>
>> I've also investigated the lockdep warning mentioned in the patch. My analysis
>> suggests it may be a false positive rather than an actual deadlock. The crash
>> actually occurs in the lockdep subsystem itself (null pointer dereference in
>> `check_wait_context()`), not in the CIFS or networking code directly.
>>
>> The procedure for the null pointer dereference is as follows:
>>
>> When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
>> connection is established.
>>
>> ```
>> cifs_do_mount
>> cifs_mount
>> mount_get_conns
>> cifs_get_tcp_session
>> ip_connect
>> generic_ip_connect
>> cifs_reclassify_socket4
>> sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
>> lockdep_init_map
>> lockdep_init_map_wait
>> lockdep_init_map_type
>> lockdep_init_map_type
>> register_lock_class
>> __set_bit(class - lock_classes, lock_classes_in_use);
>> ```
>>
>> When the module is unloaded, the lock class is cleaned up.
>>
>> ```
>> free_mod_mem
>> lockdep_free_key_range
>> __lockdep_free_key_range
>> zap_class
>> __clear_bit(class - lock_classes, lock_classes_in_use);
>> ```
>>
>> After the module is uninstalled and the network connection is restored, the
>> timer is woken up.
>>
>> ```
>> run_timer_softirq
>> run_timer_base
>> __run_timers
>> call_timer_fn
>> tcp_write_timer
>> bh_lock_sock
>> spin_lock(&((__sk)->sk_lock.slock))
>> _raw_spin_lock
>> lock_acquire
>> __lock_acquire
>> check_wait_context
>> hlock_class
>> if (!test_bit(class_idx, lock_classes_in_use)) {
>> return NULL;
>> hlock_class(next)->wait_type_inner; // Null pointer dereference
>> ```
>>
>> The problem lies within lockdep, as Kuniyuki says:
>>
>>> I tried the repro and confirmed it triggers null deref.
>>>
>>> It happens in LOCKDEP internal, so for me it looks like a problem in
>>> LOCKDEP rather than CIFS or TCP.
>>>
>>> I think LOCKDEP should hold a module reference and prevent related
>>> modules from being unloaded.
>>
>> Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
>> do not belong to the same lock instance. A deadlock should not occur.
>>
>> ### Discussion Points
>>
>> 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
>> API, or is it a networking layer issue that should handle socket cleanup differently?
>>
>> 2. Approach to Resolution: Would it be better to:
>> - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
>> - Work with networking subsystem maintainers on a more comprehensive solution that
>> handles socket cleanup properly?
>>
>> 3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue
>> while introducing an actual vulnerability, what's the appropriate way to address this?
>
> I tested on 6.14 and e9f2517a3e18, but the issue still reproduces,
> so e9f2517a3e18 is a bogus fix, and we will need to ask the CNA team
> to update the description once the correct fix is merged.
>
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 4:49 ` Wang Zhaolong
@ 2025-04-02 7:13 ` Greg Kroah-Hartman
2025-04-02 9:15 ` Wang Zhaolong
0 siblings, 1 reply; 20+ messages in thread
From: Greg Kroah-Hartman @ 2025-04-02 7:13 UTC (permalink / raw)
To: Wang Zhaolong
Cc: Kuniyuki Iwashima, edumazet, ematsumiya, linux-fsdevel,
linux-kernel, linux-net, smfrench, zhangchangzhong, cve, sfrench
On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> Yes, it seems the previous description might not have been entirely clear.
> I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> does not actually address any real issues. It also fails to resolve the null pointer
> dereference problem within lockdep. On top of that, it has caused a series of
> subsequent leakage issues.
If this cve does not actually fix anything, then we can easily reject
it, please just let us know if that needs to happen here.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 7:13 ` Greg Kroah-Hartman
@ 2025-04-02 9:15 ` Wang Zhaolong
2025-04-02 15:18 ` Greg Kroah-Hartman
0 siblings, 1 reply; 20+ messages in thread
From: Wang Zhaolong @ 2025-04-02 9:15 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Kuniyuki Iwashima, edumazet, ematsumiya, linux-fsdevel,
linux-kernel, linux-net, smfrench, zhangchangzhong, cve, sfrench
> On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
>> Yes, it seems the previous description might not have been entirely clear.
>> I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
>> does not actually address any real issues. It also fails to resolve the null pointer
>> dereference problem within lockdep. On top of that, it has caused a series of
>> subsequent leakage issues.
>
> If this cve does not actually fix anything, then we can easily reject
> it, please just let us know if that needs to happen here.
>
> thanks,
>
> greg k-h
Hi Greg,
Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
should be rejected. Our analysis shows:
1. It fails to address the actual null pointer dereference in lockdep
2. It introduces multiple serious issues:
1. A socket leak vulnerability as documented in bugzilla #219972
2. Network namespace refcount imbalance issues as described in
bugzilla #219792 (which required the follow-up mainline fix
4e7f1644f2ac "smb: client: Fix netns refcount imbalance
causing leaks and use-after-free")
The next thing we should probably do is:
- Reverting e9f2517a3e18
- Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
problems introduced by the problematic CVE patch
- Addressing the original lockdep issue properly (Kuniyuki is working
on a module ownership tracking patch, though it hasn't been merged yet)
Regardless of the status of Kuniyuki's lockdep fix, the CVE patch itself
is fundamentally flawed and should be rejected as it creates more problems
than it solves.
Thank you for your attention to this matter.
Best regards.
Wang Zhaolong
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 9:15 ` Wang Zhaolong
@ 2025-04-02 15:18 ` Greg Kroah-Hartman
2025-04-02 20:09 ` Kuniyuki Iwashima
0 siblings, 1 reply; 20+ messages in thread
From: Greg Kroah-Hartman @ 2025-04-02 15:18 UTC (permalink / raw)
To: Wang Zhaolong
Cc: Kuniyuki Iwashima, edumazet, ematsumiya, linux-fsdevel,
linux-kernel, linux-net, smfrench, zhangchangzhong, cve, sfrench
On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > Yes, it seems the previous description might not have been entirely clear.
> > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > does not actually address any real issues. It also fails to resolve the null pointer
> > > dereference problem within lockdep. On top of that, it has caused a series of
> > > subsequent leakage issues.
> >
> > If this cve does not actually fix anything, then we can easily reject
> > it, please just let us know if that needs to happen here.
> >
> > thanks,
> >
> > greg k-h
> Hi Greg,
>
> Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> should be rejected. Our analysis shows:
>
> 1. It fails to address the actual null pointer dereference in lockdep
>
> 2. It introduces multiple serious issues:
> 1. A socket leak vulnerability as documented in bugzilla #219972
> 2. Network namespace refcount imbalance issues as described in
> bugzilla #219792 (which required the follow-up mainline fix
> 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> causing leaks and use-after-free")
>
> The next thing we should probably do is:
> - Reverting e9f2517a3e18
> - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> problems introduced by the problematic CVE patch
Great, can you please send patches now for both of these so we can
backport them to the stable kernels properly?
> - Addressing the original lockdep issue properly (Kuniyuki is working
> on a module ownership tracking patch, though it hasn't been merged yet)
>
> Regardless of the status of Kuniyuki's lockdep fix, the CVE patch itself
> is fundamentally flawed and should be rejected as it creates more problems
> than it solves.
Ok, I'll go reject that now, thanks.
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 15:18 ` Greg Kroah-Hartman
@ 2025-04-02 20:09 ` Kuniyuki Iwashima
2025-04-02 20:15 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 20:09 UTC (permalink / raw)
To: gregkh
Cc: cve, edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel,
linux-net, sfrench, smfrench, wangzhaolong1, zhangchangzhong
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 2 Apr 2025 16:18:37 +0100
> On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > Yes, it seems the previous description might not have been entirely clear.
> > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > subsequent leakage issues.
> > >
> > > If this cve does not actually fix anything, then we can easily reject
> > > it, please just let us know if that needs to happen here.
> > >
> > > thanks,
> > >
> > > greg k-h
> > Hi Greg,
> >
> > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > should be rejected. Our analysis shows:
> >
> > 1. It fails to address the actual null pointer dereference in lockdep
> >
> > 2. It introduces multiple serious issues:
> > 1. A socket leak vulnerability as documented in bugzilla #219972
> > 2. Network namespace refcount imbalance issues as described in
> > bugzilla #219792 (which required the follow-up mainline fix
> > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > causing leaks and use-after-free")
> >
> > The next thing we should probably do is:
> > - Reverting e9f2517a3e18
> > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > problems introduced by the problematic CVE patch
>
> Great, can you please send patches now for both of these so we can
> backport them to the stable kernels properly?
Sent to CIFS tree:
https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
Thanks!
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 20:09 ` Kuniyuki Iwashima
@ 2025-04-02 20:15 ` Greg KH
2025-04-02 20:22 ` Kuniyuki Iwashima
0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2025-04-02 20:15 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: cve, edumazet, ematsumiya, linux-fsdevel, linux-kernel, linux-net,
sfrench, smfrench, wangzhaolong1, zhangchangzhong
On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Wed, 2 Apr 2025 16:18:37 +0100
> > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > subsequent leakage issues.
> > > >
> > > > If this cve does not actually fix anything, then we can easily reject
> > > > it, please just let us know if that needs to happen here.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > > Hi Greg,
> > >
> > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > should be rejected. Our analysis shows:
> > >
> > > 1. It fails to address the actual null pointer dereference in lockdep
> > >
> > > 2. It introduces multiple serious issues:
> > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > 2. Network namespace refcount imbalance issues as described in
> > > bugzilla #219792 (which required the follow-up mainline fix
> > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > causing leaks and use-after-free")
> > >
> > > The next thing we should probably do is:
> > > - Reverting e9f2517a3e18
> > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > problems introduced by the problematic CVE patch
> >
> > Great, can you please send patches now for both of these so we can
> > backport them to the stable kernels properly?
>
> Sent to CIFS tree:
> https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
You forgot to add a Cc: stable@ on the patches to ensure that they get
picked up properly for all stable trees :(
Can you redo them?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 20:15 ` Greg KH
@ 2025-04-02 20:22 ` Kuniyuki Iwashima
2025-04-02 20:28 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 20:22 UTC (permalink / raw)
To: gregkh
Cc: cve, edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel,
linux-net, sfrench, smfrench, wangzhaolong1, zhangchangzhong
From: Greg KH <gregkh@linuxfoundation.org>
Date: Wed, 2 Apr 2025 21:15:58 +0100
> On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: Wed, 2 Apr 2025 16:18:37 +0100
> > > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > > subsequent leakage issues.
> > > > >
> > > > > If this cve does not actually fix anything, then we can easily reject
> > > > > it, please just let us know if that needs to happen here.
> > > > >
> > > > > thanks,
> > > > >
> > > > > greg k-h
> > > > Hi Greg,
> > > >
> > > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > > should be rejected. Our analysis shows:
> > > >
> > > > 1. It fails to address the actual null pointer dereference in lockdep
> > > >
> > > > 2. It introduces multiple serious issues:
> > > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > > 2. Network namespace refcount imbalance issues as described in
> > > > bugzilla #219792 (which required the follow-up mainline fix
> > > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > > causing leaks and use-after-free")
> > > >
> > > > The next thing we should probably do is:
> > > > - Reverting e9f2517a3e18
> > > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > > problems introduced by the problematic CVE patch
> > >
> > > Great, can you please send patches now for both of these so we can
> > > backport them to the stable kernels properly?
> >
> > Sent to CIFS tree:
> > https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
>
> You forgot to add a Cc: stable@ on the patches to ensure that they get
> picked up properly for all stable trees :(
Ah sorry, I did the same with netdev. netdev patches usually do
not have the tag but are backported fine, maybe netdev local rule ?
>
> Can you redo them?
Sure, will resend.
Thanks!
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 20:22 ` Kuniyuki Iwashima
@ 2025-04-02 20:28 ` Greg KH
2025-04-02 20:50 ` Kuniyuki Iwashima
0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2025-04-02 20:28 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: cve, edumazet, ematsumiya, linux-fsdevel, linux-kernel, linux-net,
sfrench, smfrench, wangzhaolong1, zhangchangzhong
On Wed, Apr 02, 2025 at 01:22:11PM -0700, Kuniyuki Iwashima wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
> Date: Wed, 2 Apr 2025 21:15:58 +0100
> > On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Date: Wed, 2 Apr 2025 16:18:37 +0100
> > > > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > > > subsequent leakage issues.
> > > > > >
> > > > > > If this cve does not actually fix anything, then we can easily reject
> > > > > > it, please just let us know if that needs to happen here.
> > > > > >
> > > > > > thanks,
> > > > > >
> > > > > > greg k-h
> > > > > Hi Greg,
> > > > >
> > > > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > > > should be rejected. Our analysis shows:
> > > > >
> > > > > 1. It fails to address the actual null pointer dereference in lockdep
> > > > >
> > > > > 2. It introduces multiple serious issues:
> > > > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > > > 2. Network namespace refcount imbalance issues as described in
> > > > > bugzilla #219792 (which required the follow-up mainline fix
> > > > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > > > causing leaks and use-after-free")
> > > > >
> > > > > The next thing we should probably do is:
> > > > > - Reverting e9f2517a3e18
> > > > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > > > problems introduced by the problematic CVE patch
> > > >
> > > > Great, can you please send patches now for both of these so we can
> > > > backport them to the stable kernels properly?
> > >
> > > Sent to CIFS tree:
> > > https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
> >
> > You forgot to add a Cc: stable@ on the patches to ensure that they get
> > picked up properly for all stable trees :(
>
> Ah sorry, I did the same with netdev. netdev patches usually do
> not have the tag but are backported fine, maybe netdev local rule ?
Nope, that's the "old" way of dealing with netdev patches, the
documentation was changed years ago, please always put a cc: stable on
it. Otherwise you are just at the whim of our "hey, I'm board, let's
look for Fixes: only tags!" script to catch them, which will also never
notify you of failures.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 20:28 ` Greg KH
@ 2025-04-02 20:50 ` Kuniyuki Iwashima
2025-04-02 21:32 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 20:50 UTC (permalink / raw)
To: gregkh
Cc: cve, edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel,
linux-net, sfrench, smfrench, wangzhaolong1, zhangchangzhong
From: Greg KH <gregkh@linuxfoundation.org>
Date: Wed, 2 Apr 2025 21:28:51 +0100
> On Wed, Apr 02, 2025 at 01:22:11PM -0700, Kuniyuki Iwashima wrote:
> > From: Greg KH <gregkh@linuxfoundation.org>
> > Date: Wed, 2 Apr 2025 21:15:58 +0100
> > > On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Date: Wed, 2 Apr 2025 16:18:37 +0100
> > > > > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > > > > subsequent leakage issues.
> > > > > > >
> > > > > > > If this cve does not actually fix anything, then we can easily reject
> > > > > > > it, please just let us know if that needs to happen here.
> > > > > > >
> > > > > > > thanks,
> > > > > > >
> > > > > > > greg k-h
> > > > > > Hi Greg,
> > > > > >
> > > > > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > > > > should be rejected. Our analysis shows:
> > > > > >
> > > > > > 1. It fails to address the actual null pointer dereference in lockdep
> > > > > >
> > > > > > 2. It introduces multiple serious issues:
> > > > > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > > > > 2. Network namespace refcount imbalance issues as described in
> > > > > > bugzilla #219792 (which required the follow-up mainline fix
> > > > > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > > > > causing leaks and use-after-free")
> > > > > >
> > > > > > The next thing we should probably do is:
> > > > > > - Reverting e9f2517a3e18
> > > > > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > > > > problems introduced by the problematic CVE patch
> > > > >
> > > > > Great, can you please send patches now for both of these so we can
> > > > > backport them to the stable kernels properly?
> > > >
> > > > Sent to CIFS tree:
> > > > https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
> > >
> > > You forgot to add a Cc: stable@ on the patches to ensure that they get
> > > picked up properly for all stable trees :(
> >
> > Ah sorry, I did the same with netdev. netdev patches usually do
> > not have the tag but are backported fine, maybe netdev local rule ?
>
> Nope, that's the "old" way of dealing with netdev patches, the
> documentation was changed years ago, please always put a cc: stable on
> it. Otherwise you are just at the whim of our "hey, I'm board, let's
> look for Fixes: only tags!" script to catch them, which will also never
> notify you of failures.
Good to know that, thanks!
My concern was that I could spam the list if I respin the patches,
and incomplete patch could be backported.
From stable-kernel-rules.rst, such an accident can be prevented if
someone points out a problem within 48 hours ?
For example, if v1 is posted with Cc:stable, and a week later
v2 is posted, then the not-yet-upstreamed v1 could be backported ?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 20:50 ` Kuniyuki Iwashima
@ 2025-04-02 21:32 ` Greg KH
2025-04-02 21:58 ` Kuniyuki Iwashima
0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2025-04-02 21:32 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: cve, edumazet, ematsumiya, linux-fsdevel, linux-kernel, linux-net,
sfrench, smfrench, wangzhaolong1, zhangchangzhong
On Wed, Apr 02, 2025 at 01:50:05PM -0700, Kuniyuki Iwashima wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
> Date: Wed, 2 Apr 2025 21:28:51 +0100
> > On Wed, Apr 02, 2025 at 01:22:11PM -0700, Kuniyuki Iwashima wrote:
> > > From: Greg KH <gregkh@linuxfoundation.org>
> > > Date: Wed, 2 Apr 2025 21:15:58 +0100
> > > > On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> > > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > Date: Wed, 2 Apr 2025 16:18:37 +0100
> > > > > > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > > > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > > > > > subsequent leakage issues.
> > > > > > > >
> > > > > > > > If this cve does not actually fix anything, then we can easily reject
> > > > > > > > it, please just let us know if that needs to happen here.
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > >
> > > > > > > > greg k-h
> > > > > > > Hi Greg,
> > > > > > >
> > > > > > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > > > > > should be rejected. Our analysis shows:
> > > > > > >
> > > > > > > 1. It fails to address the actual null pointer dereference in lockdep
> > > > > > >
> > > > > > > 2. It introduces multiple serious issues:
> > > > > > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > > > > > 2. Network namespace refcount imbalance issues as described in
> > > > > > > bugzilla #219792 (which required the follow-up mainline fix
> > > > > > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > > > > > causing leaks and use-after-free")
> > > > > > >
> > > > > > > The next thing we should probably do is:
> > > > > > > - Reverting e9f2517a3e18
> > > > > > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > > > > > problems introduced by the problematic CVE patch
> > > > > >
> > > > > > Great, can you please send patches now for both of these so we can
> > > > > > backport them to the stable kernels properly?
> > > > >
> > > > > Sent to CIFS tree:
> > > > > https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
> > > >
> > > > You forgot to add a Cc: stable@ on the patches to ensure that they get
> > > > picked up properly for all stable trees :(
> > >
> > > Ah sorry, I did the same with netdev. netdev patches usually do
> > > not have the tag but are backported fine, maybe netdev local rule ?
> >
> > Nope, that's the "old" way of dealing with netdev patches, the
> > documentation was changed years ago, please always put a cc: stable on
> > it. Otherwise you are just at the whim of our "hey, I'm board, let's
> > look for Fixes: only tags!" script to catch them, which will also never
> > notify you of failures.
>
> Good to know that, thanks!
>
> My concern was that I could spam the list if I respin the patches,
> and incomplete patch could be backported.
>
> >From stable-kernel-rules.rst, such an accident can be prevented if
> someone points out a problem within 48 hours ?
>
> For example, if v1 is posted with Cc:stable, and a week later
> v2 is posted, then the not-yet-upstreamed v1 could be backported ?
>
Anything can be asked to be applied to stable once it is in Linus's
tree, but if you add the cc: stable stuff to the original patch, it will
be done automatically for you. That's all, I don't see anything about
"48 hours" in that document about submitting patches, that's only in the
-rc review cycle portion of the stable releases, not Linus's releases.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2025-04-02 21:32 ` Greg KH
@ 2025-04-02 21:58 ` Kuniyuki Iwashima
0 siblings, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-04-02 21:58 UTC (permalink / raw)
To: gregkh
Cc: cve, edumazet, ematsumiya, kuniyu, linux-fsdevel, linux-kernel,
linux-net, sfrench, smfrench, wangzhaolong1, zhangchangzhong
From: Greg KH <gregkh@linuxfoundation.org>
Date: Wed, 2 Apr 2025 22:32:58 +0100
> On Wed, Apr 02, 2025 at 01:50:05PM -0700, Kuniyuki Iwashima wrote:
> > From: Greg KH <gregkh@linuxfoundation.org>
> > Date: Wed, 2 Apr 2025 21:28:51 +0100
> > > On Wed, Apr 02, 2025 at 01:22:11PM -0700, Kuniyuki Iwashima wrote:
> > > > From: Greg KH <gregkh@linuxfoundation.org>
> > > > Date: Wed, 2 Apr 2025 21:15:58 +0100
> > > > > On Wed, Apr 02, 2025 at 01:09:19PM -0700, Kuniyuki Iwashima wrote:
> > > > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > > Date: Wed, 2 Apr 2025 16:18:37 +0100
> > > > > > > On Wed, Apr 02, 2025 at 05:15:44PM +0800, Wang Zhaolong wrote:
> > > > > > > > > On Wed, Apr 02, 2025 at 12:49:50PM +0800, Wang Zhaolong wrote:
> > > > > > > > > > Yes, it seems the previous description might not have been entirely clear.
> > > > > > > > > > I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
> > > > > > > > > > does not actually address any real issues. It also fails to resolve the null pointer
> > > > > > > > > > dereference problem within lockdep. On top of that, it has caused a series of
> > > > > > > > > > subsequent leakage issues.
> > > > > > > > >
> > > > > > > > > If this cve does not actually fix anything, then we can easily reject
> > > > > > > > > it, please just let us know if that needs to happen here.
> > > > > > > > >
> > > > > > > > > thanks,
> > > > > > > > >
> > > > > > > > > greg k-h
> > > > > > > > Hi Greg,
> > > > > > > >
> > > > > > > > Yes, I can confirm that the patch for CVE-2024-54680 (commit e9f2517a3e18)
> > > > > > > > should be rejected. Our analysis shows:
> > > > > > > >
> > > > > > > > 1. It fails to address the actual null pointer dereference in lockdep
> > > > > > > >
> > > > > > > > 2. It introduces multiple serious issues:
> > > > > > > > 1. A socket leak vulnerability as documented in bugzilla #219972
> > > > > > > > 2. Network namespace refcount imbalance issues as described in
> > > > > > > > bugzilla #219792 (which required the follow-up mainline fix
> > > > > > > > 4e7f1644f2ac "smb: client: Fix netns refcount imbalance
> > > > > > > > causing leaks and use-after-free")
> > > > > > > >
> > > > > > > > The next thing we should probably do is:
> > > > > > > > - Reverting e9f2517a3e18
> > > > > > > > - Reverting the follow-up fix 4e7f1644f2ac, as it's trying to fix
> > > > > > > > problems introduced by the problematic CVE patch
> > > > > > >
> > > > > > > Great, can you please send patches now for both of these so we can
> > > > > > > backport them to the stable kernels properly?
> > > > > >
> > > > > > Sent to CIFS tree:
> > > > > > https://lore.kernel.org/linux-cifs/20250402200319.2834-1-kuniyu@amazon.com/
> > > > >
> > > > > You forgot to add a Cc: stable@ on the patches to ensure that they get
> > > > > picked up properly for all stable trees :(
> > > >
> > > > Ah sorry, I did the same with netdev. netdev patches usually do
> > > > not have the tag but are backported fine, maybe netdev local rule ?
> > >
> > > Nope, that's the "old" way of dealing with netdev patches, the
> > > documentation was changed years ago, please always put a cc: stable on
> > > it. Otherwise you are just at the whim of our "hey, I'm board, let's
> > > look for Fixes: only tags!" script to catch them, which will also never
> > > notify you of failures.
> >
> > Good to know that, thanks!
> >
> > My concern was that I could spam the list if I respin the patches,
> > and incomplete patch could be backported.
> >
> > >From stable-kernel-rules.rst, such an accident can be prevented if
> > someone points out a problem within 48 hours ?
> >
> > For example, if v1 is posted with Cc:stable, and a week later
> > v2 is posted, then the not-yet-upstreamed v1 could be backported ?
> >
>
> Anything can be asked to be applied to stable once it is in Linus's
> tree, but if you add the cc: stable stuff to the original patch, it will
> be done automatically for you.
Now I understood. The process is triggered only after the patch
is merged to Linus' tree. I assumed the workflow is triggered by
the patch email itself.
Thanks for explaining!
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod
2024-12-18 3:24 [PATCH][SMB3 client] fix TCP timers deadlock after rmmod Steve French
2024-12-19 0:28 ` Fwd: " Steve French
@ 2024-12-19 8:41 ` Kuniyuki Iwashima
1 sibling, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-19 8:41 UTC (permalink / raw)
To: smfrench; +Cc: ematsumiya, kuniyu, linux-cifs, netdev
Hi,
sorry for the late response.
From: Steve French <smfrench@gmail.com>
Date: Tue, 17 Dec 2024 21:24:12 -0600
> Enzo had an interesting patch, that seems to fix an important problem.
>
> Here was his repro scenario:
>
> tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10
> //someserver/target1 /mnt/test
> tw:~ # ls /mnt/test
> abc dir1 dir3 target1_file.txt tsub
> tw:~ # iptables -A INPUT -s someserver -j DROP
>
> Trigger reconnect and wait for 3*echo_interval:
>
> tw:~ # cat /mnt/test/target1_file.txt
> cat: /mnt/test/target1_file.txt: Host is down
>
> Then umount and rmmod. Note that rmmod might take several iterations
> until it properly tears down everything, so make sure you see the "not
> loaded" message before proceeding:
>
> tw:~ # umount /mnt/*; rmmod cifs
> umount: /mnt/az: not mounted.
> umount: /mnt/dfs: not mounted.
> umount: /mnt/local: not mounted.
> umount: /mnt/scratch: not mounted.
> rmmod: ERROR: Module cifs is in use
> ...
> tw:~ # rmmod cifs
> rmmod: ERROR: Module cifs is not currently loaded
>
> Then kickoff the TCP internals:
> tw:~ # iptables -F
>
> Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref
> later on.
I tried the repro and confirmed it triggers null deref.
It happens in LOCKDEP internal, so for me it looks like a problem in
LOCKDEP rather than CIFS or TCP.
I think LOCKDEP should hold a module reference and prevent related
modules from being unloaded.
[ 242.144225][ C0] ------------[ cut here ]------------
[ 242.144562][ C0] DEBUG_LOCKS_WARN_ON(1)
[ 242.144575][ C0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:232 hlock_class+0xf9/0x130
[ 242.145435][ C0] Modules linked in: [last unloaded: cifs]
[ 242.145820][ C0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-01014-g835f27aa21c9 #25
[ 242.146426][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 242.147197][ C0] RIP: 0010:hlock_class+0xf9/0x130
[ 242.147529][ C0] Code: b6 14 11 38 d0 7c 04 84 d2 75 47 8b 05 54 a5 6b 04 85 c0 75 19 90 48 c7 c6 e0 e1 46 84 48 c7 c7 c0 dd 46 84 e8 88 6d eb ff 90 <0f> 0b 90 90 90 31 c0 5b c3 cc cc cc cc e8 05 91 4d 00 e9 19 ff ff
[ 242.148779][ C0] RSP: 0018:ff1100006c009138 EFLAGS: 00010082
[ 242.149168][ C0] RAX: 0000000000000000 RBX: 00000000000004e4 RCX: 0000000000000027
[ 242.149673][ C0] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ff1100006c1e8a08
[ 242.150180][ C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[ 242.150688][ C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[ 242.151199][ C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[ 242.151706][ C0] FS: 0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[ 242.152275][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 242.152693][ C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[ 242.153197][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 242.153698][ C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 242.154200][ C0] PKRU: 55555554
[ 242.154426][ C0] Call Trace:
[ 242.154636][ C0] <IRQ>
[ 242.154817][ C0] ? __warn+0xcc/0x2e0
[ 242.155086][ C0] ? hlock_class+0xf9/0x130
[ 242.155378][ C0] ? report_bug+0x28c/0x2d0
[ 242.155678][ C0] ? handle_bug+0x54/0xa0
[ 242.155954][ C0] ? exc_invalid_op+0x18/0x50
[ 242.156250][ C0] ? asm_exc_invalid_op+0x1a/0x20
[ 242.156579][ C0] ? hlock_class+0xf9/0x130
[ 242.156867][ C0] ? hlock_class+0xf8/0x130
[ 242.157154][ C0] __lock_acquire+0x4d8/0x3c30
[ 242.157459][ C0] ? find_held_lock+0x33/0x120
[ 242.157766][ C0] ? __pfx___lock_acquire+0x10/0x10
[ 242.158095][ C0] ? __pfx_lock_release+0x10/0x10
[ 242.158414][ C0] lock_acquire+0x196/0x520
[ 242.158699][ C0] ? tcp_v4_rcv+0x2695/0x3b70
[ 242.158999][ C0] ? __pfx_lock_acquire+0x10/0x10
[ 242.159325][ C0] ? __pfx_tcp_inbound_hash+0x10/0x10
[ 242.159673][ C0] ? __pfx_sk_filter_trim_cap+0x10/0x10
[ 242.160039][ C0] ? kasan_quarantine_put+0x84/0x1d0
[ 242.160382][ C0] _raw_spin_lock_nested+0x2e/0x70
[ 242.160707][ C0] ? tcp_v4_rcv+0x2695/0x3b70
[ 242.161004][ C0] tcp_v4_rcv+0x2695/0x3b70
[ 242.161292][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10
[ 242.161598][ C0] ? ip_local_deliver_finish+0x204/0x490
[ 242.161960][ C0] ? __pfx_raw_local_deliver+0x10/0x10
[ 242.162308][ C0] ? hlock_class+0x4e/0x130
[ 242.162598][ C0] ? lock_release+0x5d5/0xb00
[ 242.162898][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.163223][ C0] ip_protocol_deliver_rcu+0x8e/0x330
[ 242.163568][ C0] ip_local_deliver_finish+0x2bd/0x490
[ 242.163922][ C0] ip_local_deliver+0x198/0x450
[ 242.164236][ C0] ? __pfx_ip_local_deliver+0x10/0x10
[ 242.164579][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.164896][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 242.165273][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.165594][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.165921][ C0] ip_sublist_rcv_finish+0x1e7/0x410
[ 242.166260][ C0] ip_sublist_rcv+0x380/0x760
[ 242.166564][ C0] ? __pfx_ip_sublist_rcv+0x10/0x10
[ 242.166898][ C0] ? hlock_class+0x4e/0x130
[ 242.167187][ C0] ? __lock_acquire+0x15e6/0x3c30
[ 242.167506][ C0] ? ip_fast_csum+0xc/0x20
[ 242.167789][ C0] ? ip_rcv_core+0x5dd/0xd30
[ 242.168084][ C0] ip_list_rcv+0x26e/0x380
[ 242.168367][ C0] ? __pfx_ip_list_rcv+0x10/0x10
[ 242.168684][ C0] __netif_receive_skb_list_core+0x5c2/0x830
[ 242.169063][ C0] ? __pfx___netif_receive_skb_list_core+0x10/0x10
[ 242.169470][ C0] ? lockdep_hardirqs_on_prepare+0x12b/0x3f0
[ 242.169844][ C0] ? kvm_clock_get_cycles+0x18/0x30
[ 242.170174][ C0] ? ktime_get_with_offset+0xe3/0x240
[ 242.170520][ C0] netif_receive_skb_list_internal+0x595/0xc70
[ 242.170905][ C0] ? __pfx_netif_receive_skb_list_internal+0x10/0x10
[ 242.171321][ C0] ? __kasan_mempool_unpoison_object+0x11e/0x1e0
[ 242.171721][ C0] ? napi_gro_receive+0x792/0x9d0
[ 242.172040][ C0] napi_complete_done+0x19d/0x700
[ 242.172359][ C0] ? __pfx_napi_complete_done+0x10/0x10
[ 242.172709][ C0] ? __pfx_e1000_clean_rx_irq+0x10/0x10
[ 242.173061][ C0] e1000_clean+0x85f/0x23e0
[ 242.173347][ C0] ? hlock_class+0x4e/0x130
[ 242.173636][ C0] ? __pfx_e1000_clean+0x10/0x10
[ 242.173953][ C0] __napi_poll.constprop.0+0x9b/0x430
[ 242.174296][ C0] net_rx_action+0x8d4/0xc90
[ 242.174594][ C0] ? __pfx_net_rx_action+0x10/0x10
[ 242.174919][ C0] ? handle_irq_event+0x10d/0x1c0
[ 242.175246][ C0] ? find_held_lock+0x33/0x120
[ 242.175556][ C0] ? hlock_class+0x4e/0x130
[ 242.175855][ C0] ? lock_release+0x5d5/0xb00
[ 242.176157][ C0] ? __pfx_lock_release+0x10/0x10
[ 242.176478][ C0] ? kvm_guest_apic_eoi_write+0x1e/0x40
[ 242.176831][ C0] handle_softirqs+0x1ae/0x770
[ 242.177137][ C0] irq_exit_rcu+0x94/0xc0
[ 242.177413][ C0] common_interrupt+0x7e/0x90
[ 242.177716][ C0] </IRQ>
[ 242.177904][ C0] <TASK>
[ 242.178093][ C0] asm_common_interrupt+0x26/0x40
[ 242.178415][ C0] RIP: 0010:default_idle+0xf/0x20
[ 242.178738][ C0] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 df 32 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[ 242.179958][ C0] RSP: 0018:ffffffff85007e30 EFLAGS: 00000206
[ 242.180341][ C0] RAX: 000000000007b609 RBX: 0000000000000000 RCX: ffffffff840f7d25
[ 242.180849][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8125f83d
[ 242.181357][ C0] RBP: dffffc0000000000 R08: 0000000000000001 R09: ffe21c000d83ecb8
[ 242.181864][ C0] R10: ff1100006c1f65c3 R11: 0000000000000000 R12: ffffffff8595b388
[ 242.182371][ C0] R13: 1ffffffff0a00fcb R14: 0000000000000000 R15: 0000000000000000
[ 242.182877][ C0] ? ct_kernel_exit.constprop.0+0xc5/0xf0
[ 242.183240][ C0] ? do_idle+0x2fd/0x3b0
[ 242.183508][ C0] default_idle_call+0x6d/0xb0
[ 242.183811][ C0] do_idle+0x2fd/0x3b0
[ 242.184072][ C0] ? __pfx_do_idle+0x10/0x10
[ 242.184368][ C0] cpu_startup_entry+0x4f/0x60
[ 242.184673][ C0] rest_init+0x139/0x200
[ 242.184945][ C0] ? acpi_subsystem_init+0x50/0x140
[ 242.185280][ C0] start_kernel+0x374/0x3f0
[ 242.185573][ C0] x86_64_start_reservations+0x18/0x30
[ 242.185925][ C0] x86_64_start_kernel+0xcf/0xe0
[ 242.186236][ C0] common_startup_64+0x12c/0x138
[ 242.186551][ C0] </TASK>
[ 242.186740][ C0] irq event stamp: 505362
[ 242.187010][ C0] hardirqs last enabled at (505362): [<ffffffff81171ea1>] __local_bh_enable_ip+0xa1/0x110
[ 242.187632][ C0] hardirqs last disabled at (505361): [<ffffffff81171eca>] __local_bh_enable_ip+0xca/0x110
[ 242.188252][ C0] softirqs last enabled at (505348): [<ffffffff81171a8c>] handle_softirqs+0x50c/0x770
[ 242.188851][ C0] softirqs last disabled at (505355): [<ffffffff81172174>] irq_exit_rcu+0x94/0xc0
[ 242.189424][ C0] ---[ end trace 0000000000000000 ]---
[ 242.189778][ C0] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000018: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 242.190575][ C0] KASAN: null-ptr-deref in range [0x00000000000000c0-0x00000000000000c7]
[ 242.191097][ C0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.12.0-rc4-01014-g835f27aa21c9 #25
[ 242.191782][ C0] Tainted: [W]=WARN
[ 242.192023][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 242.192788][ C0] RIP: 0010:__lock_acquire+0x4f0/0x3c30
[ 242.193137][ C0] Code: 84 24 40 01 00 00 4c 89 f7 41 89 46 34 e8 e8 27 ff ff 48 ba 00 00 00 00 00 fc ff df 48 8d b8 c4 00 00 00 48 89 f9 48 c1 e9 03 <0f> b6 14 11 48 89 f9 83 e1 07 38 ca 7f 08 84 d2 0f 85 b6 29 00 00
[ 242.194860][ C0] RSP: 0018:ff1100006c009148 EFLAGS: 00010007
[ 242.195251][ C0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
[ 242.195758][ C0] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 00000000000000c4
[ 242.196253][ C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[ 242.196750][ C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[ 242.197245][ C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[ 242.197742][ C0] FS: 0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[ 242.198301][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 242.198715][ C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[ 242.199212][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 242.199703][ C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 242.200200][ C0] PKRU: 55555554
[ 242.200424][ C0] Call Trace:
[ 242.200635][ C0] <IRQ>
[ 242.200816][ C0] ? die_addr+0x3c/0xa0
[ 242.201084][ C0] ? exc_general_protection+0x14d/0x230
[ 242.201433][ C0] ? asm_exc_general_protection+0x26/0x30
[ 242.201793][ C0] ? __lock_acquire+0x4f0/0x3c30
[ 242.202103][ C0] ? __lock_acquire+0x4d8/0x3c30
[ 242.202414][ C0] ? find_held_lock+0x33/0x120
[ 242.202715][ C0] ? __pfx___lock_acquire+0x10/0x10
[ 242.203040][ C0] ? __pfx_lock_release+0x10/0x10
[ 242.203355][ C0] lock_acquire+0x196/0x520
[ 242.203639][ C0] ? tcp_v4_rcv+0x2695/0x3b70
[ 242.203934][ C0] ? __pfx_lock_acquire+0x10/0x10
[ 242.204246][ C0] ? __pfx_tcp_inbound_hash+0x10/0x10
[ 242.204577][ C0] ? __pfx_sk_filter_trim_cap+0x10/0x10
[ 242.204923][ C0] ? kasan_quarantine_put+0x84/0x1d0
[ 242.205257][ C0] _raw_spin_lock_nested+0x2e/0x70
[ 242.205586][ C0] ? tcp_v4_rcv+0x2695/0x3b70
[ 242.205883][ C0] tcp_v4_rcv+0x2695/0x3b70
[ 242.206170][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10
[ 242.206475][ C0] ? ip_local_deliver_finish+0x204/0x490
[ 242.206828][ C0] ? __pfx_raw_local_deliver+0x10/0x10
[ 242.207166][ C0] ? hlock_class+0x4e/0x130
[ 242.207446][ C0] ? lock_release+0x5d5/0xb00
[ 242.207736][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.208047][ C0] ip_protocol_deliver_rcu+0x8e/0x330
[ 242.208380][ C0] ip_local_deliver_finish+0x2bd/0x490
[ 242.208719][ C0] ip_local_deliver+0x198/0x450
[ 242.209021][ C0] ? __pfx_ip_local_deliver+0x10/0x10
[ 242.209351][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.209661][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 242.210030][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.210342][ C0] ? lock_is_held_type+0xa0/0x120
[ 242.210655][ C0] ip_sublist_rcv_finish+0x1e7/0x410
[ 242.210982][ C0] ip_sublist_rcv+0x380/0x760
[ 242.211274][ C0] ? __pfx_ip_sublist_rcv+0x10/0x10
[ 242.211594][ C0] ? hlock_class+0x4e/0x130
[ 242.211873][ C0] ? __lock_acquire+0x15e6/0x3c30
[ 242.212186][ C0] ? ip_fast_csum+0xc/0x20
[ 242.212460][ C0] ? ip_rcv_core+0x5dd/0xd30
[ 242.212746][ C0] ip_list_rcv+0x26e/0x380
[ 242.213021][ C0] ? __pfx_ip_list_rcv+0x10/0x10
[ 242.213328][ C0] __netif_receive_skb_list_core+0x5c2/0x830
[ 242.213699][ C0] ? __pfx___netif_receive_skb_list_core+0x10/0x10
[ 242.214103][ C0] ? lockdep_hardirqs_on_prepare+0x12b/0x3f0
[ 242.214477][ C0] ? kvm_clock_get_cycles+0x18/0x30
[ 242.214807][ C0] ? ktime_get_with_offset+0xe3/0x240
[ 242.215139][ C0] netif_receive_skb_list_internal+0x595/0xc70
[ 242.215520][ C0] ? __pfx_netif_receive_skb_list_internal+0x10/0x10
[ 242.215934][ C0] ? __kasan_mempool_unpoison_object+0x11e/0x1e0
[ 242.216324][ C0] ? napi_gro_receive+0x792/0x9d0
[ 242.216633][ C0] napi_complete_done+0x19d/0x700
[ 242.216941][ C0] ? __pfx_napi_complete_done+0x10/0x10
[ 242.217280][ C0] ? __pfx_e1000_clean_rx_irq+0x10/0x10
[ 242.217618][ C0] e1000_clean+0x85f/0x23e0
[ 242.217897][ C0] ? hlock_class+0x4e/0x130
[ 242.218173][ C0] ? __pfx_e1000_clean+0x10/0x10
[ 242.218477][ C0] __napi_poll.constprop.0+0x9b/0x430
[ 242.218807][ C0] net_rx_action+0x8d4/0xc90
[ 242.219092][ C0] ? __pfx_net_rx_action+0x10/0x10
[ 242.219406][ C0] ? handle_irq_event+0x10d/0x1c0
[ 242.219719][ C0] ? find_held_lock+0x33/0x120
[ 242.220014][ C0] ? hlock_class+0x4e/0x130
[ 242.220293][ C0] ? lock_release+0x5d5/0xb00
[ 242.220581][ C0] ? __pfx_lock_release+0x10/0x10
[ 242.220887][ C0] ? kvm_guest_apic_eoi_write+0x1e/0x40
[ 242.221227][ C0] handle_softirqs+0x1ae/0x770
[ 242.221527][ C0] irq_exit_rcu+0x94/0xc0
[ 242.221793][ C0] common_interrupt+0x7e/0x90
[ 242.222081][ C0] </IRQ>
[ 242.222262][ C0] <TASK>
[ 242.222443][ C0] asm_common_interrupt+0x26/0x40
[ 242.222754][ C0] RIP: 0010:default_idle+0xf/0x20
[ 242.223063][ C0] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 df 32 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[ 242.224245][ C0] RSP: 0018:ffffffff85007e30 EFLAGS: 00000206
[ 242.224619][ C0] RAX: 000000000007b609 RBX: 0000000000000000 RCX: ffffffff840f7d25
[ 242.225113][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8125f83d
[ 242.225596][ C0] RBP: dffffc0000000000 R08: 0000000000000001 R09: ffe21c000d83ecb8
[ 242.226077][ C0] R10: ff1100006c1f65c3 R11: 0000000000000000 R12: ffffffff8595b388
[ 242.226560][ C0] R13: 1ffffffff0a00fcb R14: 0000000000000000 R15: 0000000000000000
[ 242.227044][ C0] ? ct_kernel_exit.constprop.0+0xc5/0xf0
[ 242.227393][ C0] ? do_idle+0x2fd/0x3b0
[ 242.227655][ C0] default_idle_call+0x6d/0xb0
[ 242.227948][ C0] do_idle+0x2fd/0x3b0
[ 242.228201][ C0] ? __pfx_do_idle+0x10/0x10
[ 242.228486][ C0] cpu_startup_entry+0x4f/0x60
[ 242.228781][ C0] rest_init+0x139/0x200
[ 242.229043][ C0] ? acpi_subsystem_init+0x50/0x140
[ 242.229361][ C0] start_kernel+0x374/0x3f0
[ 242.229637][ C0] x86_64_start_reservations+0x18/0x30
[ 242.229969][ C0] x86_64_start_kernel+0xcf/0xe0
[ 242.230268][ C0] common_startup_64+0x12c/0x138
[ 242.230570][ C0] </TASK>
[ 242.230757][ C0] Modules linked in: [last unloaded: cifs]
[ 242.231113][ C0] ---[ end trace 0000000000000000 ]---
[ 242.231443][ C0] RIP: 0010:__lock_acquire+0x4f0/0x3c30
[ 242.231778][ C0] Code: 84 24 40 01 00 00 4c 89 f7 41 89 46 34 e8 e8 27 ff ff 48 ba 00 00 00 00 00 fc ff df 48 8d b8 c4 00 00 00 48 89 f9 48 c1 e9 03 <0f> b6 14 11 48 89 f9 83 e1 07 38 ca 7f 08 84 d2 0f 85 b6 29 00 00
[ 242.232952][ C0] RSP: 0018:ff1100006c009148 EFLAGS: 00010007
[ 242.233322][ C0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018
[ 242.233800][ C0] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 00000000000000c4
[ 242.234277][ C0] RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000d83d141
[ 242.234756][ C0] R10: ff1100006c1e8a0b R11: 0000000000000017 R12: 0000000000000000
[ 242.235239][ C0] R13: ffffffff8502e6c0 R14: ffffffff8502f150 R15: 00000000000424e4
[ 242.235721][ C0] FS: 0000000000000000(0000) GS:ff1100006c000000(0000) knlGS:0000000000000000
[ 242.236257][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 242.236655][ C0] CR2: 00007ffd0d2666b8 CR3: 0000000010002002 CR4: 0000000000771ef0
[ 242.237140][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 242.237621][ C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 242.238101][ C0] PKRU: 55555554
[ 242.238317][ C0] Kernel panic - not syncing: Fatal exception in interrupt
[ 242.239006][ C0] Kernel Offset: disabled
[ 242.239320][ C0] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
>
> Any thoughts on his patch? See below (and attached)
>
> Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
> fixed a netns UAF by manually enabled socket refcounting
> (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)).
>
> The reason the patch worked for that bug was because we now hold
> references to the netns (get_net_track() gets a ref internally)
> and they're properly released (internally, on __sk_destruct()),
> but only because sk->sk_net_refcnt was set.
>
> Problem:
> (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless
> if init_net or other)
>
> Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not
> only out of cifs scope, but also technically wrong -- it's set conditionally
> based on user (=1) vs kernel (=0) sockets. And net/ implementations
> seem to base their user vs kernel space operations on it.
>
> e.g. upon TCP socket close, the TCP timers are not cleared because
> sk->sk_net_refcnt=1:
> (cf. commit 151c9c724d05 ("tcp: properly terminate timers for
> kernel sockets"))
>
> net/ipv4/tcp.c:
> void tcp_close(struct sock *sk, long timeout)
> {
> lock_sock(sk);
> __tcp_close(sk, timeout);
> release_sock(sk);
> if (!sk->sk_net_refcnt)
> inet_csk_clear_xmit_timers_sync(sk);
> sock_put(sk);
> }
>
> Which will throw a lockdep warning and then, as expected, deadlock on
I wondered how the deadlock happens when I read this, but it seems this
'deadlock' means literally 'dead' (freed) lock, not something like ABBA
deadlock.
> tcp_write_timer().
>
> A way to reproduce this is by running the reproducer from ef7134c7fc48
> and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep
> warning shows up.
>
> Fix:
> We shouldn't mess with socket internals ourselves, so do not set
> sk_net_refcnt manually.
>
> Also change __sock_create() to sock_create_kern() for explicitness.
>
> As for non-init_net network namespaces, we deal with it the best way
> we can -- hold an extra netns reference for server->ssocket and drop it
> when it's released. This ensures that the netns still exists whenever
> we need to create/destroy server->ssocket, but is not directly tied to
> it.
>
> Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network
> namespace.")
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-04-02 21:59 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 3:24 [PATCH][SMB3 client] fix TCP timers deadlock after rmmod Steve French
2024-12-19 0:28 ` Fwd: " Steve French
2024-12-19 0:30 ` Steve French
2025-04-01 13:54 ` Wang Zhaolong
2025-04-01 20:26 ` Kuniyuki Iwashima
2025-04-02 0:57 ` Kuniyuki Iwashima
2025-04-02 4:43 ` Wang Zhaolong
2025-04-02 2:01 ` Kuniyuki Iwashima
2025-04-02 4:49 ` Wang Zhaolong
2025-04-02 7:13 ` Greg Kroah-Hartman
2025-04-02 9:15 ` Wang Zhaolong
2025-04-02 15:18 ` Greg Kroah-Hartman
2025-04-02 20:09 ` Kuniyuki Iwashima
2025-04-02 20:15 ` Greg KH
2025-04-02 20:22 ` Kuniyuki Iwashima
2025-04-02 20:28 ` Greg KH
2025-04-02 20:50 ` Kuniyuki Iwashima
2025-04-02 21:32 ` Greg KH
2025-04-02 21:58 ` Kuniyuki Iwashima
2024-12-19 8:41 ` Kuniyuki Iwashima
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.