From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Allison Henderson <allison.henderson@oracle.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>,
Kuniyuki Iwashima <kuni1840@gmail.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <rds-devel@oss.oracle.com>
Subject: [PATCH v5 net 1/2] tcp: Fix NEW_SYN_RECV handling in inet_twsk_purge()
Date: Fri, 8 Mar 2024 12:01:21 -0800
Message-ID: <20240308200122.64357-2-kuniyu@amazon.com>
In-Reply-To: <20240308200122.64357-1-kuniyu@amazon.com>
From: Eric Dumazet <edumazet@google.com>
inet_twsk_purge() uses RCU to find TIME_WAIT and NEW_SYN_RECV
objects to purge.
These objects use SLAB_TYPESAFE_BY_RCU semantics and need special
care. We need to use refcount_inc_not_zero(&sk->sk_refcnt).
Reuse the existing correct logic I wrote for TIME_WAIT,
because both structures have common locations for
sk_state, sk_family, and netns pointer.
If, after the refcount_inc_not_zero(), the object fields no longer match
the keys, use sock_gen_put(sk) to release the refcount.
Then we can call inet_twsk_deschedule_put() for TIME_WAIT,
inet_csk_reqsk_queue_drop_and_put() for NEW_SYN_RECV sockets,
with BH disabled.
Then we need to restart the loop because we had to drop rcu_read_lock().
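For readers less familiar with SLAB_TYPESAFE_BY_RCU lookups, the sequence
described above (filter by state, tentative key check, take a reference
only if the count is non-zero, recheck the keys, release on mismatch) can
be sketched in userspace C. This is only an illustration under stated
assumptions: struct obj, ref_inc_not_zero(), obj_put(), keys_match(),
try_grab(), purge() and demo_purge() are hypothetical stand-ins and not
kernel definitions, the state numbers are arbitrary, and there is no real
RCU or concurrency here, so the "goto restart" step is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
enum { ST_ESTABLISHED = 1, ST_TIME_WAIT = 6, ST_NEW_SYN_RECV = 12 };
#define STF(s) (1u << (s))

struct obj {
	atomic_int refcnt;	/* 0 means the object is already being freed */
	int state;
	int family;
	int netns_users;	/* 0 means the owning namespace is dead */
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

static bool keys_match(const struct obj *o, int family)
{
	return o->family == family && o->netns_users == 0;
}

/* Returns true if the caller now holds a reference on a matching object. */
static bool try_grab(struct obj *o, int family)
{
	/* Skip everything that is neither TIME_WAIT nor NEW_SYN_RECV. */
	if (STF(o->state) & ~(STF(ST_TIME_WAIT) | STF(ST_NEW_SYN_RECV)))
		return false;
	/* Tentative check: the memory may be recycled until we hold a ref. */
	if (!keys_match(o, family))
		return false;
	if (!ref_inc_not_zero(&o->refcnt))
		return false;
	/* Recheck now that the reference pins the object's identity. */
	if (!keys_match(o, family)) {
		obj_put(o);
		return false;
	}
	return true;
}

/* Counts the objects that would be purged for the given family. */
static int purge(struct obj *tab, size_t n, int family)
{
	int purged = 0;

	for (size_t i = 0; i < n; i++) {
		if (!try_grab(&tab[i], family))
			continue;
		obj_put(&tab[i]);			/* drop our reference */
		atomic_store(&tab[i].refcnt, 0);	/* stand-in for the real teardown */
		purged++;
	}
	return purged;
}

int demo_purge(void)
{
	struct obj tab[] = {
		{ 1, ST_TIME_WAIT,     2, 0 },	/* dead netns: purged */
		{ 1, ST_NEW_SYN_RECV,  2, 0 },	/* dead netns: purged */
		{ 1, ST_ESTABLISHED,   2, 0 },	/* full socket: skipped */
		{ 1, ST_TIME_WAIT,     2, 3 },	/* live netns: skipped */
		{ 0, ST_TIME_WAIT,     2, 0 },	/* being freed: skipped */
		{ 1, ST_TIME_WAIT,    10, 0 },	/* other family: skipped */
	};

	return purge(tab, sizeof(tab) / sizeof(tab[0]), 2);
}
```

In this sketch demo_purge() returns 2: only the first two entries pass the
state filter, both key checks and the reference grab. The second key check
is the important part, since a type-safe slab may hand the same memory to a
new object between the first check and the reference being taken.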
Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()")
Link: https://lore.kernel.org/netdev/CANn89iLvFuuihCtt9PME2uS1WJATnf5fKjDToa1WzVnRzHnPfg@mail.gmail.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
[ Kuniyuki: fixed some checkpatch checks & warnings. ]
---
net/ipv4/inet_timewait_sock.c | 41 ++++++++++++++++-------------------
1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 5befa4de5b24..e8de45d34d56 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -263,12 +263,12 @@ void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
}
EXPORT_SYMBOL_GPL(__inet_twsk_schedule);
+/* Remove all non full sockets (TIME_WAIT and NEW_SYN_RECV) for dead netns */
void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
{
- struct inet_timewait_sock *tw;
- struct sock *sk;
struct hlist_nulls_node *node;
unsigned int slot;
+ struct sock *sk;
for (slot = 0; slot <= hashinfo->ehash_mask; slot++) {
struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
@@ -277,38 +277,35 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
rcu_read_lock();
restart:
sk_nulls_for_each_rcu(sk, node, &head->chain) {
- if (sk->sk_state != TCP_TIME_WAIT) {
- /* A kernel listener socket might not hold refcnt for net,
- * so reqsk_timer_handler() could be fired after net is
- * freed. Userspace listener and reqsk never exist here.
- */
- if (unlikely(sk->sk_state == TCP_NEW_SYN_RECV &&
- hashinfo->pernet)) {
- struct request_sock *req = inet_reqsk(sk);
-
- inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
- }
+ int state = inet_sk_state_load(sk);
+ if ((1 << state) & ~(TCPF_TIME_WAIT |
+ TCPF_NEW_SYN_RECV))
continue;
- }
- tw = inet_twsk(sk);
- if ((tw->tw_family != family) ||
- refcount_read(&twsk_net(tw)->ns.count))
+ if (sk->sk_family != family ||
+ refcount_read(&sock_net(sk)->ns.count))
continue;
- if (unlikely(!refcount_inc_not_zero(&tw->tw_refcnt)))
+ if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
continue;
- if (unlikely((tw->tw_family != family) ||
- refcount_read(&twsk_net(tw)->ns.count))) {
- inet_twsk_put(tw);
+ if (unlikely(sk->sk_family != family ||
+ refcount_read(&sock_net(sk)->ns.count))) {
+ sock_gen_put(sk);
goto restart;
}
rcu_read_unlock();
local_bh_disable();
- inet_twsk_deschedule_put(tw);
+ if (state == TCP_TIME_WAIT) {
+ inet_twsk_deschedule_put(inet_twsk(sk));
+ } else {
+ struct request_sock *req = inet_reqsk(sk);
+
+ inet_csk_reqsk_queue_drop_and_put(req->rsk_listener,
+ req);
+ }
local_bh_enable();
goto restart_rcu;
}
--
2.30.2
Thread overview: 8+ messages
2024-03-08 20:01 [PATCH v5 net 0/2] tcp/rds: Fix use-after-free around kernel TCP reqsk Kuniyuki Iwashima
2024-03-08 20:01 ` Kuniyuki Iwashima [this message]
2024-03-08 20:01 ` [PATCH v5 net 2/2] rds: tcp: Fix use-after-free of net in reqsk_timer_handler() Kuniyuki Iwashima
2024-03-12 11:04 ` Paolo Abeni
2024-03-12 12:34 ` Eric Dumazet
2024-03-12 15:10 ` [PATCH v5 net 0/2] tcp/rds: Fix use-after-free around kernel TCP reqsk patchwork-bot+netdevbpf
2024-03-12 15:59 ` Jakub Kicinski
2024-03-13 2:02 ` Jakub Kicinski