* [PATCH] netlink: Fix the netlink socket malfunction due to concurrency
@ 2023-08-18 18:37 Qingjie Xing
2023-08-18 18:57 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Qingjie Xing @ 2023-08-18 18:37 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, keescook, johannes, pctammela,
dhowells, fw, kuniyu
Cc: netdev, linux-kernel, xqjcool
The concurrent Invocation of netlink_attachskb() and netlink_recvmsg()
on different CPUs causes malfunction of netlink socket.
The concurrent scenario of netlink_recvmsg() and netlink_attachskb()
as following:
CPU A CPU B
======== ========
netlink_recvmsg() netlink_attachskb()
[1]bit NETLINK_S_CONGESTED is set
netlink_overrun()
netlink_rcv_wake()
[2]sk_receive_queue is empty
clear bit NETLINK_S_CONGESTED
[3]NETLINK_F_RECV_NO_ENOBUFS not set
set bit NETLINK_S_CONGESTED
In this scenario, the socket's receive queue is empty. Additionally,
due to the NETLINK_S_CONGESTED flag being set, all packets sent to
this socket are discarded.
To prevent this situation, we need to introduce a check for whether
the socket receive buffer is full before setting the NETLINK_S_CONGESTED
flag.
Signed-off-by: Qingjie Xing <xqjcool@gmail.com>
---
net/netlink/af_netlink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 383631873748..80bcce9acbfc 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -352,7 +352,8 @@ static void netlink_overrun(struct sock *sk)
struct netlink_sock *nlk = nlk_sk(sk);
if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) {
- if (!test_and_set_bit(NETLINK_S_CONGESTED,
+ if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf
+ && !test_and_set_bit(NETLINK_S_CONGESTED,
&nlk_sk(sk)->state)) {
sk->sk_err = ENOBUFS;
sk_error_report(sk);
--
2.41.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency
2023-08-18 18:37 [PATCH] netlink: Fix the netlink socket malfunction due to concurrency Qingjie Xing
@ 2023-08-18 18:57 ` Eric Dumazet
2023-08-19 18:04 ` Qingjie Xing
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Eric Dumazet @ 2023-08-18 18:57 UTC (permalink / raw)
To: Qingjie Xing
Cc: davem, kuba, pabeni, keescook, johannes, pctammela, dhowells, fw,
kuniyu, netdev, linux-kernel
On Fri, Aug 18, 2023 at 8:42 PM Qingjie Xing <xqjcool@gmail.com> wrote:
>
> The concurrent Invocation of netlink_attachskb() and netlink_recvmsg()
> on different CPUs causes malfunction of netlink socket.
>
> The concurrent scenario of netlink_recvmsg() and netlink_attachskb()
> as following:
>
> CPU A CPU B
> ======== ========
> netlink_recvmsg() netlink_attachskb()
> [1]bit NETLINK_S_CONGESTED is set
> netlink_overrun()
> netlink_rcv_wake()
> [2]sk_receive_queue is empty
> clear bit NETLINK_S_CONGESTED
> [3]NETLINK_F_RECV_NO_ENOBUFS not set
> set bit NETLINK_S_CONGESTED
>
> In this scenario, the socket's receive queue is empty. Additionally,
> due to the NETLINK_S_CONGESTED flag being set, all packets sent to
> this socket are discarded.
>
> To prevent this situation, we need to introduce a check for whether
> the socket receive buffer is full before setting the NETLINK_S_CONGESTED
> flag.
>
> Signed-off-by: Qingjie Xing <xqjcool@gmail.com>
> ---
> net/netlink/af_netlink.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 383631873748..80bcce9acbfc 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -352,7 +352,8 @@ static void netlink_overrun(struct sock *sk)
> struct netlink_sock *nlk = nlk_sk(sk);
>
> if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) {
> - if (!test_and_set_bit(NETLINK_S_CONGESTED,
> + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf
> + && !test_and_set_bit(NETLINK_S_CONGESTED,
> &nlk_sk(sk)->state)) {
> sk->sk_err = ENOBUFS;
> sk_error_report(sk);
This does not look race-free to me.
sk->sk_rcvbuf can change anytime.
I wonder if we should instead remove all this CONGESTED thing and come
back to the fundamentals,
instead of having another bit mirroring some existing state.
Apparently part of it was added in
commit cd1df525da59c64244d27b4548ff5d132489488a
Author: Patrick McHardy <kaber@trash.net>
Date: Wed Apr 17 06:47:05 2013 +0000
netlink: add flow control for memory mapped I/O
Add flow control for memory mapped RX. Since user-space usually doesn't
invoke recvmsg() when using memory mapped I/O, flow control is performed
in netlink_poll(). Dumps are allowed to continue if at least half of the
ring frames are unused.
But I do not think we kept this memory mapped RX.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency
2023-08-18 18:57 ` Eric Dumazet
@ 2023-08-19 18:04 ` Qingjie Xing
2023-08-19 18:24 ` Qingjie Xing
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Qingjie Xing @ 2023-08-19 18:04 UTC (permalink / raw)
To: edumazet
Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu,
linux-kernel, netdev, pabeni, pctammela, xqjcool
This piece of code has been present since the Linux code v2.6.12 was
incorporated into Git management.
I believe this modification could potentially address the concurrent issue
we've been discussing.
In netlink_rcv_wake(), as described in [2], the socket's receive queue is
empty, indicated by sk_rmem_alloc being 0. At this point, concurrent
netlink_attachskb() calls netlink_overrun(). In this critical state, the
sk_rmem_alloc of the socket will not instantly transition from 0 to a
value greater than sk_rcvbuf.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency
2023-08-18 18:57 ` Eric Dumazet
2023-08-19 18:04 ` Qingjie Xing
@ 2023-08-19 18:24 ` Qingjie Xing
2023-08-19 20:17 ` Qingjie Xing
2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing
3 siblings, 0 replies; 6+ messages in thread
From: Qingjie Xing @ 2023-08-19 18:24 UTC (permalink / raw)
To: edumazet
Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu,
linux-kernel, netdev, pabeni, pctammela, xqjcool
This piece of code has been present since the Linux code v2.6.12 was
incorporated into Git management.
I believe this modification could potentially address the concurrent issue
we've been discussing.
In netlink_rcv_wake(), as described in [2], the socket's receive queue is
empty, indicated by sk_rmem_alloc being 0. At this point, concurrent
netlink_attachskb() calls netlink_overrun(). In this critical state, the
sk_rmem_alloc of the socket will not instantly transition from 0 to a
value greater than sk_rcvbuf.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] netlink: Fix the netlink socket malfunction due to concurrency
2023-08-18 18:57 ` Eric Dumazet
2023-08-19 18:04 ` Qingjie Xing
2023-08-19 18:24 ` Qingjie Xing
@ 2023-08-19 20:17 ` Qingjie Xing
2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing
3 siblings, 0 replies; 6+ messages in thread
From: Qingjie Xing @ 2023-08-19 20:17 UTC (permalink / raw)
To: edumazet
Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu,
linux-kernel, netdev, pabeni, pctammela, xqjcool
This piece of code has been present since the Linux code v2.6.12 was
incorporated into Git management.
I believe this modification could potentially address the concurrent issue
we've been discussing.
In netlink_rcv_wake(), as described in [2], the socket's receive queue is
empty, indicated by sk_rmem_alloc being 0. At this point, concurrent
netlink_attachskb() calls netlink_overrun(). In this critical state, the
sk_rmem_alloc of the socket will not instantly transition from 0 to a
value greater than sk_rcvbuf.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Further Elaboration on this patch
2023-08-18 18:57 ` Eric Dumazet
` (2 preceding siblings ...)
2023-08-19 20:17 ` Qingjie Xing
@ 2023-08-23 16:43 ` Qingjie Xing
3 siblings, 0 replies; 6+ messages in thread
From: Qingjie Xing @ 2023-08-23 16:43 UTC (permalink / raw)
To: edumazet
Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu,
linux-kernel, netdev, pabeni, pctammela, xqjcool
First of all, I'd like to apologize for the duplication of emails.
I'm sorry for any inconvenience this may have caused.
It's important to emphasize that this fix primarily addresses an anomaly
in the behavior of `netlink_rcv_wake()` and `netlink_overrun()` when
dealing with the `NETLINK_S_CONGESTED` flag. This issue only arises when
the --sk->sk_receive_queue-- is empty.
This situation occurs when `netlink_recvmsg()` completes the reception of
the last packet in the sk->sk_receive_queue, leaving it empty. Within
netlink_rcv_wake()`, the `NETLINK_S_CONGESTED` flag is cleared. However,
concurrently, `netlink_overrun()` proceeds to set the `NETLINK_S_CONGESTED`
flag again.
In this specific scenario, due to the `NETLINK_S_CONGESTED` flag being set,
packets sent to this socket cannot be added to the sk->sk_receive_queue.
Because the sk->sk_receive_queue is empty, the `EPOLLIN` flag, which is
monitored by the `poll()` call, won't be set. Consequently, further calls
to `netlink_recvmsg()` will not be triggered.
When the sk->sk_receive_queue is not empty, or when concurrent actions
lead to the non-emptiness of sk->sk_receive_queue, these situations will
not pose a problem. This is because as long as there are packets in the
sk->sk_receive_queue, the `netlink_recvmsg()` function will be invoked
continuously until the queue becomes empty. Only when the
sk->sk_receive_queue becomes empty, the `netlink_rcv_wake()` function
will attempt once again to clear the `NETLINK_S_CONGESTED` flag.
Therefore, we only need to consider the scenario described above, where
the sk->sk_receive_queue is empty.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-08-23 16:47 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-18 18:37 [PATCH] netlink: Fix the netlink socket malfunction due to concurrency Qingjie Xing
2023-08-18 18:57 ` Eric Dumazet
2023-08-19 18:04 ` Qingjie Xing
2023-08-19 18:24 ` Qingjie Xing
2023-08-19 20:17 ` Qingjie Xing
2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).