From: Wang Zhaolong <wangzhaolong1@huawei.com>
To: Kuniyuki Iwashima <kuniyu@amazon.com>, Steve French <sfrench@samba.org>
Cc: Paulo Alcantara <pc@manguebit.com>,
Ronnie Sahlberg <ronniesahlberg@gmail.com>,
Shyam Prasad N <sprasad@microsoft.com>,
Tom Talpey <tom@talpey.com>, Bharath SM <bharathsm@microsoft.com>,
Enzo Matsumiya <ematsumiya@suse.de>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
<linux-cifs@vger.kernel.org>, <samba-technical@lists.samba.org>
Subject: Re: [PATCH 2/2] Revert "smb: client: fix TCP timers deadlock after rmmod"
Date: Thu, 3 Apr 2025 11:12:38 +0800 [thread overview]
Message-ID: <51c59271-c05f-4c36-a01b-4522cf47e2af@huawei.com> (raw)
In-Reply-To: <20250402200319.2834-3-kuniyu@amazon.com>
Thanks Kuniyuki for the thorough explanation and fix. Your analysis of
the TCP socket lifecycle and reference counting is excellent!
This reversion is definitely the right approach.
Acked-by: Wang Zhaolong <wangzhaolong1@huawei.com>
> This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.
>
> Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
> rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is
> mentioned as CVE-2024-54680, but is actually did not fix anything;
> The issue can be reproduced on top of it. [0]
>
> Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
> Fix use-after-free of network namespace.") and introduced a real
> issue by reviving the kernel TCP socket.
>
> When a reconnect happens for a CIFS connection, the socket state
> transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
> in tcp_close() stops all timers for the socket.
>
> If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
> forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.
>
> Usually, FIN can be retransmitted by the peer, but if the peer aborts
> the connection, the issue comes into reality.
>
> I warned about this privately by pointing out the exact report [1],
> but the bogus fix was finally merged.
>
> So, we should not stop the timers to finally kill the connection on
> our side in that case, meaning we must not use a kernel socket for
> TCP whose sk->sk_net_refcnt is 0.
>
> The kernel socket does not have a reference to its netns to make it
> possible to tear down netns without cleaning up every resource in it.
>
> For example, tunnel devices use a UDP socket internally, but we can
> destroy netns without removing such devices and let it complete
> during exit. Otherwise, netns would be leaked when the last application
> died.
>
> However, this is problematic for TCP sockets because TCP has timers to
> close the connection gracefully even after the socket is close()d. The
> lifetime of the socket and its netns is different from the lifetime of
> the underlying connection.
>
> If the socket user does not maintain the netns lifetime, the timer could
> be fired after the socket is close()d and its netns is freed up, resulting
> in use-after-free.
>
> Actually, we have seen so many similar issues and converted such sockets
> to have a reference to netns.
>
> That's why I converted the CIFS client socket to have a reference to
> netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope
> of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right
> fix**.
>
> Regarding the LOCKDEP issue, we can prevent the module unload by
> bumping the module refcount when switching the LOCKDDEP key in
> sock_lock_init_class_and_name(). [2]
>
> For a while, let's revert the bogus fix.
>
> Note that now we can use sk_net_refcnt_upgrade() for the socket
> conversion, but I'll do so later separately to make backport easy.
>
> Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0]
> Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1]
> Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2]
> Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> fs/smb/client/connect.c | 36 ++++++++++--------------------------
> 1 file changed, 10 insertions(+), 26 deletions(-)
>
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index 137a611c5ab0..989d8808260b 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -1073,13 +1073,9 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
> msleep(125);
> if (cifs_rdma_enabled(server))
> smbd_destroy(server);
> -
> if (server->ssocket) {
> sock_release(server->ssocket);
> server->ssocket = NULL;
> -
> - /* Release netns reference for the socket. */
> - put_net(cifs_net_ns(server));
> }
>
> if (!list_empty(&server->pending_mid_q)) {
> @@ -1127,7 +1123,6 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
> */
> }
>
> - /* Release netns reference for this server. */
> put_net(cifs_net_ns(server));
> kfree(server->leaf_fullpath);
> kfree(server->hostname);
> @@ -1773,8 +1768,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
>
> tcp_ses->ops = ctx->ops;
> tcp_ses->vals = ctx->vals;
> -
> - /* Grab netns reference for this server. */
> cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns));
>
> tcp_ses->sign = ctx->sign;
> @@ -1902,7 +1895,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
> out_err_crypto_release:
> cifs_crypto_secmech_release(tcp_ses);
>
> - /* Release netns reference for this server. */
> put_net(cifs_net_ns(tcp_ses));
>
> out_err:
> @@ -1911,10 +1903,8 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
> cifs_put_tcp_session(tcp_ses->primary_server, false);
> kfree(tcp_ses->hostname);
> kfree(tcp_ses->leaf_fullpath);
> - if (tcp_ses->ssocket) {
> + if (tcp_ses->ssocket)
> sock_release(tcp_ses->ssocket);
> - put_net(cifs_net_ns(tcp_ses));
> - }
> kfree(tcp_ses);
> }
> return ERR_PTR(rc);
> @@ -3356,20 +3346,20 @@ generic_ip_connect(struct TCP_Server_Info *server)
> socket = server->ssocket;
> } else {
> struct net *net = cifs_net_ns(server);
> + struct sock *sk;
>
> - rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket);
> + rc = __sock_create(net, sfamily, SOCK_STREAM,
> + IPPROTO_TCP, &server->ssocket, 1);
> if (rc < 0) {
> cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
> return rc;
> }
>
> - /*
> - * Grab netns reference for the socket.
> - *
> - * It'll be released here, on error, or in clean_demultiplex_info() upon server
> - * teardown.
> - */
> - get_net(net);
> + sk = server->ssocket->sk;
> + __netns_tracker_free(net, &sk->ns_tracker, false);
> + sk->sk_net_refcnt = 1;
> + get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> + sock_inuse_add(net, 1);
>
> /* BB other socket options to set KEEPALIVE, NODELAY? */
> cifs_dbg(FYI, "Socket created\n");
> @@ -3383,10 +3373,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
> }
>
> rc = bind_socket(server);
> - if (rc < 0) {
> - put_net(cifs_net_ns(server));
> + if (rc < 0)
> return rc;
> - }
>
> /*
> * Eventually check for other socket options to change from
> @@ -3423,7 +3411,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
> if (rc < 0) {
> cifs_dbg(FYI, "Error %d connecting to server\n", rc);
> trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc);
> - put_net(cifs_net_ns(server));
> sock_release(socket);
> server->ssocket = NULL;
> return rc;
> @@ -3441,9 +3428,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
> (server->rfc1001_sessinit == -1 && sport == htons(RFC1001_PORT)))
> rc = ip_rfc1001_connect(server);
>
> - if (rc < 0)
> - put_net(cifs_net_ns(server));
> -
> return rc;
> }
>
next prev parent reply other threads:[~2025-04-03 3:12 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-02 20:02 [PATCH 0/2] cifs: Revert bogus fix for CVE-2024-54680 and its followup commit Kuniyuki Iwashima
2025-04-02 20:02 ` [PATCH 1/2] Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free" Kuniyuki Iwashima
2025-04-03 3:16 ` Wang Zhaolong
2025-04-03 9:59 ` Wang Zhaolong
2025-04-03 17:26 ` Kuniyuki Iwashima
2025-04-03 17:32 ` Steve French
2025-04-03 17:46 ` Kuniyuki Iwashima
2025-04-02 20:02 ` [PATCH 2/2] Revert "smb: client: fix TCP timers deadlock after rmmod" Kuniyuki Iwashima
2025-04-03 3:12 ` Wang Zhaolong [this message]
2025-04-03 1:14 ` [PATCH 0/2] cifs: Revert bogus fix for CVE-2024-54680 and its followup commit Steve French
2025-04-03 2:18 ` Kuniyuki Iwashima
2025-04-03 3:19 ` Steve French
2025-04-03 10:14 ` Wang Zhaolong
2025-04-11 7:04 ` Kuniyuki Iwashima
2025-04-12 17:28 ` Steve French
2025-04-12 19:10 ` Kuniyuki Iwashima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51c59271-c05f-4c36-a01b-4522cf47e2af@huawei.com \
--to=wangzhaolong1@huawei.com \
--cc=bharathsm@microsoft.com \
--cc=ematsumiya@suse.de \
--cc=kuni1840@gmail.com \
--cc=kuniyu@amazon.com \
--cc=linux-cifs@vger.kernel.org \
--cc=pc@manguebit.com \
--cc=ronniesahlberg@gmail.com \
--cc=samba-technical@lists.samba.org \
--cc=sfrench@samba.org \
--cc=sprasad@microsoft.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox