* [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
@ 2024-09-05 6:42 Dmitry Antipov
2024-09-08 18:36 ` Cong Wang
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Antipov @ 2024-09-05 6:42 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Jakub Kicinski, Paolo Abeni, netdev, lvc-project, Dmitry Antipov
At https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45, syzbot
has triggered the following race condition:
On CPU0, 'sk_psock_drop()' is running at [1]:
void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{
write_lock_bh(&sk->sk_callback_lock);
sk_psock_restore_proto(sk, psock); [1]
rcu_assign_sk_user_data(sk, NULL);
if (psock->progs.stream_parser)
sk_psock_stop_strp(sk, psock);
else if (psock->progs.stream_verdict || psock->progs.skb_verdict)
sk_psock_stop_verdict(sk, psock);
write_unlock_bh(&sk->sk_callback_lock);
sk_psock_stop(psock);
INIT_RCU_WORK(&psock->rwork, sk_psock_destroy);
queue_rcu_work(system_wq, &psock->rwork);
}
If 'sock_map_destroy()' is scheduled on CPU1 at the same time, psock is
always NULL at [2]. But, since [1] may be is in progress during [3], the
value of 'saved_destroy' at this point is undefined:
void sock_map_destroy(struct sock *sk)
{
void (*saved_destroy)(struct sock *sk);
struct sk_psock *psock;
rcu_read_lock();
psock = sk_psock_get(sk); [2]
if (unlikely(!psock)) {
rcu_read_unlock();
saved_destroy = READ_ONCE(sk->sk_prot)->destroy; [3]
} else {
saved_destroy = psock->saved_destroy;
sock_map_remove_links(sk, psock);
rcu_read_unlock();
sk_psock_stop(psock);
sk_psock_put(sk, psock);
}
if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
return;
if (saved_destroy)
saved_destroy(sk);
}
The following fix is helpful to avoid the race and does not introduce
psock leak (when running the syzbot reproducer from the above), but
I'm not sure whether the latter is always true in some other scenario.
So comments are highly appreciated.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
net/core/sock_map.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d3dbb92153f2..fef4231fc872 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1649,7 +1649,7 @@ void sock_map_destroy(struct sock *sk)
struct sk_psock *psock;
rcu_read_lock();
- psock = sk_psock_get(sk);
+ psock = sk_psock(sk);
if (unlikely(!psock)) {
rcu_read_unlock();
saved_destroy = READ_ONCE(sk->sk_prot)->destroy;
@@ -1658,7 +1658,6 @@ void sock_map_destroy(struct sock *sk)
sock_map_remove_links(sk, psock);
rcu_read_unlock();
sk_psock_stop(psock);
- sk_psock_put(sk, psock);
}
if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
return;
--
2.46.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-05 6:42 [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() Dmitry Antipov
@ 2024-09-08 18:36 ` Cong Wang
2024-09-09 7:04 ` Dmitry Antipov
0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2024-09-08 18:36 UTC (permalink / raw)
To: Dmitry Antipov
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On Thu, Sep 05, 2024 at 09:42:57AM +0300, Dmitry Antipov wrote:
> At https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45, syzbot
> has triggered the following race condition:
Are you sure it is due to sockmap code?
I see rds_tcp_accept_one() in the stack trace. This is why I highly
suspect that it is due to RDS code instead of sockmap code.
I have the following patch ready for testing, in case you are
interested.
Thanks.
--------------->
commit 4068420e2c82137ab95d387346c0776a36c69e5d
Author: Cong Wang <cong.wang@bytedance.com>
Date: Sun Sep 1 17:01:49 2024 -0700
rds: check sock->sk->sk_user_data conflicts
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..54ee7f6b8f34 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -134,11 +134,12 @@ void rds_tcp_restore_callbacks(struct socket *sock,
* it is set. The absence of RDS_CONN_UP bit protects those paths
* from being called while it isn't set.
*/
-void rds_tcp_reset_callbacks(struct socket *sock,
- struct rds_conn_path *cp)
+int rds_tcp_reset_callbacks(struct socket *sock,
+ struct rds_conn_path *cp)
{
struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *osock = tc->t_sock;
+ int ret = 0;
if (!osock)
goto newsock;
@@ -181,21 +182,25 @@ void rds_tcp_reset_callbacks(struct socket *sock,
newsock:
rds_send_path_reset(cp);
lock_sock(sock->sk);
- rds_tcp_set_callbacks(sock, cp);
+ ret = rds_tcp_set_callbacks(sock, cp);
release_sock(sock->sk);
+ return ret;
}
/* Add tc to rds_tcp_tc_list and set tc->t_sock. See comments
* above rds_tcp_reset_callbacks for notes about synchronization
* with data path
*/
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
+int rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
{
struct rds_tcp_connection *tc = cp->cp_transport_data;
- rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc);
write_lock_bh(&sock->sk->sk_callback_lock);
-
+ if (sock->sk->sk_user_data) {
+ write_unlock_bh(&sock->sk->sk_callback_lock);
+ return -EBUSY;
+ }
+ rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc);
/* done under the callback_lock to serialize with write_space */
spin_lock(&rds_tcp_tc_list_lock);
list_add_tail(&tc->t_list_item, &rds_tcp_tc_list);
@@ -222,6 +227,7 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
sock->sk->sk_state_change = rds_tcp_state_change;
write_unlock_bh(&sock->sk->sk_callback_lock);
+ return 0;
}
/* Handle RDS_INFO_TCP_SOCKETS socket option. It only returns IPv4
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 053aa7da87ef..710cc7fa41af 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -50,8 +50,8 @@ struct rds_tcp_statistics {
/* tcp.c */
bool rds_tcp_tune(struct socket *sock);
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
-void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
+int rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
+int rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
void rds_tcp_restore_callbacks(struct socket *sock,
struct rds_tcp_connection *tc);
u32 rds_tcp_write_seq(struct rds_tcp_connection *tc);
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index d89bd8d0c354..695456455aee 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -205,11 +205,15 @@ int rds_tcp_accept_one(struct socket *sock)
goto rst_nsk;
if (rs_tcp->t_sock) {
/* Duelling SYN has been handled in rds_tcp_accept_one() */
- rds_tcp_reset_callbacks(new_sock, cp);
+ ret = rds_tcp_reset_callbacks(new_sock, cp);
+ if (ret)
+ goto rst_nsk;
/* rds_connect_path_complete() marks RDS_CONN_UP */
rds_connect_path_complete(cp, RDS_CONN_RESETTING);
} else {
- rds_tcp_set_callbacks(new_sock, cp);
+ ret = rds_tcp_set_callbacks(new_sock, cp);
+ if (ret)
+ goto rst_nsk;
rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
}
new_sock = NULL;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-08 18:36 ` Cong Wang
@ 2024-09-09 7:04 ` Dmitry Antipov
2024-09-11 4:32 ` Cong Wang
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Antipov @ 2024-09-09 7:04 UTC (permalink / raw)
To: Cong Wang
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On 9/8/24 9:36 PM, Cong Wang wrote:
> Are you sure it is due to sockmap code?
No, and that's why my patch has RFC tag in subject :-).
> I see rds_tcp_accept_one() in the stack trace. This is why I highly
> suspect that it is due to RDS code instead of sockmap code.
>
> I have the following patch ready for testing, in case you are
> interested.
Does it work for you? Running current upstream with this patch applied,
I'm still seeing the same warning at net/core/sock_map.c:1663.
Again, I'm suspecting the race just because 'sk_psock_drop()' issues
'sk_psock_restore_proto()' with 'sk->sk_callback_lock' write locked,
but 'sock_map_destroy()' just uses 'READ_ONCE()' to obtain a callback
which may be changed underneath.
BTW looking here and there again, I suppose that my patch is not correct
too because it moves and/or shrinks the race window but doesn't eliminate
it completely.
Dmitry
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-09 7:04 ` Dmitry Antipov
@ 2024-09-11 4:32 ` Cong Wang
2024-09-11 9:51 ` Dmitry Antipov
0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2024-09-11 4:32 UTC (permalink / raw)
To: Dmitry Antipov
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On Mon, Sep 09, 2024 at 10:04:21AM +0300, Dmitry Antipov wrote:
> On 9/8/24 9:36 PM, Cong Wang wrote:
>
> > Are you sure it is due to sockmap code?
>
> No, and that's why my patch has RFC tag in subject :-).
>
> > I see rds_tcp_accept_one() in the stack trace. This is why I highly
> > suspect that it is due to RDS code instead of sockmap code.
> >
> > I have the following patch ready for testing, in case you are
> > interested.
>
> Does it work for you? Running current upstream with this patch applied,
> I'm still seeing the same warning at net/core/sock_map.c:1663.
I never tested the RDS code (hence why I didn't post it). But for the warning
itself, actually disabling CONFIG_RDS made it disappear on my side, yet
another reason why I suspect it is RDS related.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-11 4:32 ` Cong Wang
@ 2024-09-11 9:51 ` Dmitry Antipov
2024-09-11 16:45 ` Cong Wang
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Antipov @ 2024-09-11 9:51 UTC (permalink / raw)
To: Cong Wang
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On 9/11/24 7:32 AM, Cong Wang wrote:
> I never tested the RDS code (hence why I didn't post it). But for the warning
> itself, actually disabling CONFIG_RDS made it disappear on my side, yet
> another reason why I suspect it is RDS related.
OTOH sockmap code depends from CONFIG_BPF_SYSCALL. So I'm pretty sure that
there are more sockmap users beyond RDS and turning off CONFIG_RDS by itself
is not too useful for further investigations of this case.
Dmitry
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-11 9:51 ` Dmitry Antipov
@ 2024-09-11 16:45 ` Cong Wang
2024-09-12 15:59 ` Dmitry Antipov
0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2024-09-11 16:45 UTC (permalink / raw)
To: Dmitry Antipov
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On Wed, Sep 11, 2024 at 12:51:04PM +0300, Dmitry Antipov wrote:
> On 9/11/24 7:32 AM, Cong Wang wrote:
>
> > I never tested the RDS code (hence why I didn't post it). But for the warning
> > itself, actually disabling CONFIG_RDS made it disappear on my side, yet
> > another reason why I suspect it is RDS related.
>
> OTOH sockmap code depends from CONFIG_BPF_SYSCALL. So I'm pretty sure that
> there are more sockmap users beyond RDS and turning off CONFIG_RDS by itself
> is not too useful for further investigations of this case.
>
I guess you totally misunderstand my point. As a significant sockmap
contributor, I am certainly aware of sockmap users. My point is that I
needed to narrow down the problem to CONFIG_RDS when I was debugging it.
So, please let me know if you can still reproduce this after disabling
CONFIG_RDS, because I could not reproduce it any more. If you can,
please kindly share the stack trace without rds_* functions.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-11 16:45 ` Cong Wang
@ 2024-09-12 15:59 ` Dmitry Antipov
2024-09-14 0:34 ` Cong Wang
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Antipov @ 2024-09-12 15:59 UTC (permalink / raw)
To: Cong Wang
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On 9/11/24 7:45 PM, Cong Wang wrote:
> I guess you totally misunderstand my point. As a significant sockmap
> contributor, I am certainly aware of sockmap users. My point is that I
> needed to narrow down the problem to CONFIG_RDS when I was debugging it.
I've narrowed down the problem to possible race condition between two
functions. "Narrowing down" the problem to a 17.5Kloc-sized subsystem
is not too helpful.
> So, please let me know if you can still reproduce this after disabling
> CONFIG_RDS, because I could not reproduce it any more. If you can,
> please kindly share the stack trace without rds_* functions.
Yes, this issue requires CONFIG_RDS and CONFIG_RDS_TCP to reproduce. But
syzbot reproducer I'm working with doesn't create RDS sockets explicitly
(with 'socket(AF_RDS, ..., ...)' or so). When two options above are enabled,
the default network namespace has special kernel-space socket which is
created in 'rds_tcp_listen_init()' and (if my understanding of the namespaces
is correct) may be inherited with 'unshare(CLONE_NEWNET)'. So just enabling
these two options makes the kernel vulnerable.
So I'm still gently asking you to check whether there is a race condition
I've talked about. Hopefully this shouldn't be too hard for a significant
sockmap contributor.
Dmitry
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-12 15:59 ` Dmitry Antipov
@ 2024-09-14 0:34 ` Cong Wang
2024-09-18 15:42 ` Dmitry Antipov
0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2024-09-14 0:34 UTC (permalink / raw)
To: Dmitry Antipov
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project
On Thu, Sep 12, 2024 at 06:59:39PM +0300, Dmitry Antipov wrote:
> On 9/11/24 7:45 PM, Cong Wang wrote:
>
> > I guess you totally misunderstand my point. As a significant sockmap
> > contributor, I am certainly aware of sockmap users. My point is that I
> > needed to narrow down the problem to CONFIG_RDS when I was debugging it.
>
> I've narrowed down the problem to possible race condition between two
> functions. "Narrowing down" the problem to a 17.5Kloc-sized subsystem
> is not too helpful.
Narrowing down from more 30 millions lines of code to 17.5K is already a huge
win to me, maybe not for you. :)
>
> > So, please let me know if you can still reproduce this after disabling
> > CONFIG_RDS, because I could not reproduce it any more. If you can,
> > please kindly share the stack trace without rds_* functions.
>
> Yes, this issue requires CONFIG_RDS and CONFIG_RDS_TCP to reproduce. But
> syzbot reproducer I'm working with doesn't create RDS sockets explicitly
> (with 'socket(AF_RDS, ..., ...)' or so). When two options above are enabled,
> the default network namespace has special kernel-space socket which is
> created in 'rds_tcp_listen_init()' and (if my understanding of the namespaces
> is correct) may be inherited with 'unshare(CLONE_NEWNET)'. So just enabling
> these two options makes the kernel vulnerable.
Thanks for confirming it.
I did notice the RDS kernel socket, but, without my patch, we can still
use sockops to hook TCP socket under the RDS socket and add it to a
sockmap, hence the conflict of sock->sk->sk_user_data.
My patch basically prevents such TCP socket under RDS socket from being
added to any sockmap.
>
> So I'm still gently asking you to check whether there is a race condition
> I've talked about. Hopefully this shouldn't be too hard for a significant
> sockmap contributor.
If you can kindly explain why this race condition is not related to RDS
despite the fact it only happens with CONFIG_RDS enabled, I'd happy to
review it. Otherwise, I feel like you may head to a wrong direction.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
2024-09-14 0:34 ` Cong Wang
@ 2024-09-18 15:42 ` Dmitry Antipov
0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Antipov @ 2024-09-18 15:42 UTC (permalink / raw)
To: Cong Wang
Cc: John Fastabend, Jakub Sitnicki, Jakub Kicinski, Paolo Abeni,
netdev, lvc-project, Allison Henderson
+CC Allison
On 9/14/24 3:34 AM, Cong Wang wrote:
> On Thu, Sep 12, 2024 at 06:59:39PM +0300, Dmitry Antipov wrote:
>> On 9/11/24 7:45 PM, Cong Wang wrote:
>>
>>> I guess you totally misunderstand my point. As a significant sockmap
>>> contributor, I am certainly aware of sockmap users. My point is that I
>>> needed to narrow down the problem to CONFIG_RDS when I was debugging it.
>>
>> I've narrowed down the problem to possible race condition between two
>> functions. "Narrowing down" the problem to a 17.5Kloc-sized subsystem
>> is not too helpful.
>
> Narrowing down from more 30 millions lines of code to 17.5K is already a huge
> win to me, maybe not for you. :)
>
>>
>>> So, please let me know if you can still reproduce this after disabling
>>> CONFIG_RDS, because I could not reproduce it any more. If you can,
>>> please kindly share the stack trace without rds_* functions.
>>
>> Yes, this issue requires CONFIG_RDS and CONFIG_RDS_TCP to reproduce. But
>> syzbot reproducer I'm working with doesn't create RDS sockets explicitly
>> (with 'socket(AF_RDS, ..., ...)' or so). When two options above are enabled,
>> the default network namespace has special kernel-space socket which is
>> created in 'rds_tcp_listen_init()' and (if my understanding of the namespaces
>> is correct) may be inherited with 'unshare(CLONE_NEWNET)'. So just enabling
>> these two options makes the kernel vulnerable.
>
> Thanks for confirming it.
>
> I did notice the RDS kernel socket, but, without my patch, we can still
> use sockops to hook TCP socket under the RDS socket and add it to a
> sockmap, hence the conflict of sock->sk->sk_user_data.
>
> My patch basically prevents such TCP socket under RDS socket from being
> added to any sockmap.
>
>>
>> So I'm still gently asking you to check whether there is a race condition
>> I've talked about. Hopefully this shouldn't be too hard for a significant
>> sockmap contributor.
>
> If you can kindly explain why this race condition is not related to RDS
> despite the fact it only happens with CONFIG_RDS enabled, I'd happy to
> review it. Otherwise, I feel like you may head to a wrong direction.
>
> Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-09-18 15:43 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-05 6:42 [PATCH RFC net] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() Dmitry Antipov
2024-09-08 18:36 ` Cong Wang
2024-09-09 7:04 ` Dmitry Antipov
2024-09-11 4:32 ` Cong Wang
2024-09-11 9:51 ` Dmitry Antipov
2024-09-11 16:45 ` Cong Wang
2024-09-12 15:59 ` Dmitry Antipov
2024-09-14 0:34 ` Cong Wang
2024-09-18 15:42 ` Dmitry Antipov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox