* [PATCH] netlink: Fix the netlink socket malfunction due to concurrency @ 2023-08-18 18:37 Qingjie Xing 2023-08-18 18:57 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Qingjie Xing @ 2023-08-18 18:37 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, keescook, johannes, pctammela, dhowells, fw, kuniyu Cc: netdev, linux-kernel, xqjcool The concurrent Invocation of netlink_attachskb() and netlink_recvmsg() on different CPUs causes malfunction of netlink socket. The concurrent scenario of netlink_recvmsg() and netlink_attachskb() as following: CPU A CPU B ======== ======== netlink_recvmsg() netlink_attachskb() [1]bit NETLINK_S_CONGESTED is set netlink_overrun() netlink_rcv_wake() [2]sk_receive_queue is empty clear bit NETLINK_S_CONGESTED [3]NETLINK_F_RECV_NO_ENOBUFS not set set bit NETLINK_S_CONGESTED In this scenario, the socket's receive queue is empty. Additionally, due to the NETLINK_S_CONGESTED flag being set, all packets sent to this socket are discarded. To prevent this situation, we need to introduce a check for whether the socket receive buffer is full before setting the NETLINK_S_CONGESTED flag. Signed-off-by: Qingjie Xing <xqjcool@gmail.com> --- net/netlink/af_netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 383631873748..80bcce9acbfc 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -352,7 +352,8 @@ static void netlink_overrun(struct sock *sk) struct netlink_sock *nlk = nlk_sk(sk); if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) { - if (!test_and_set_bit(NETLINK_S_CONGESTED, + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf + && !test_and_set_bit(NETLINK_S_CONGESTED, &nlk_sk(sk)->state)) { sk->sk_err = ENOBUFS; sk_error_report(sk); -- 2.41.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency 2023-08-18 18:37 [PATCH] netlink: Fix the netlink socket malfunction due to concurrency Qingjie Xing @ 2023-08-18 18:57 ` Eric Dumazet 2023-08-19 18:04 ` Qingjie Xing ` (3 more replies) 0 siblings, 4 replies; 6+ messages in thread From: Eric Dumazet @ 2023-08-18 18:57 UTC (permalink / raw) To: Qingjie Xing Cc: davem, kuba, pabeni, keescook, johannes, pctammela, dhowells, fw, kuniyu, netdev, linux-kernel On Fri, Aug 18, 2023 at 8:42 PM Qingjie Xing <xqjcool@gmail.com> wrote: > > The concurrent Invocation of netlink_attachskb() and netlink_recvmsg() > on different CPUs causes malfunction of netlink socket. > > The concurrent scenario of netlink_recvmsg() and netlink_attachskb() > as following: > > CPU A CPU B > ======== ======== > netlink_recvmsg() netlink_attachskb() > [1]bit NETLINK_S_CONGESTED is set > netlink_overrun() > netlink_rcv_wake() > [2]sk_receive_queue is empty > clear bit NETLINK_S_CONGESTED > [3]NETLINK_F_RECV_NO_ENOBUFS not set > set bit NETLINK_S_CONGESTED > > In this scenario, the socket's receive queue is empty. Additionally, > due to the NETLINK_S_CONGESTED flag being set, all packets sent to > this socket are discarded. > > To prevent this situation, we need to introduce a check for whether > the socket receive buffer is full before setting the NETLINK_S_CONGESTED > flag. > > Signed-off-by: Qingjie Xing <xqjcool@gmail.com> > --- > net/netlink/af_netlink.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > index 383631873748..80bcce9acbfc 100644 > --- a/net/netlink/af_netlink.c > +++ b/net/netlink/af_netlink.c > @@ -352,7 +352,8 @@ static void netlink_overrun(struct sock *sk) > struct netlink_sock *nlk = nlk_sk(sk); > > if (!(nlk->flags & NETLINK_F_RECV_NO_ENOBUFS)) { > - if (!test_and_set_bit(NETLINK_S_CONGESTED, > + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf > + && !test_and_set_bit(NETLINK_S_CONGESTED, > &nlk_sk(sk)->state)) { > sk->sk_err = ENOBUFS; > sk_error_report(sk); This does not look race-free to me. sk->sk_rcvbuf can change anytime. I wonder if we should instead remove all this CONGESTED thing and come back to the fundamentals, instead of having another bit mirroring some existing state. Apparently part of it was added in commit cd1df525da59c64244d27b4548ff5d132489488a Author: Patrick McHardy <kaber@trash.net> Date: Wed Apr 17 06:47:05 2013 +0000 netlink: add flow control for memory mapped I/O Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. But I do not think we kept this memory mapped RX. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency 2023-08-18 18:57 ` Eric Dumazet @ 2023-08-19 18:04 ` Qingjie Xing 2023-08-19 18:24 ` Qingjie Xing ` (2 subsequent siblings) 3 siblings, 0 replies; 6+ messages in thread From: Qingjie Xing @ 2023-08-19 18:04 UTC (permalink / raw) To: edumazet Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu, linux-kernel, netdev, pabeni, pctammela, xqjcool This piece of code has been present since the Linux code v2.6.12 was incorporated into Git management. I believe this modification could potentially address the concurrent issue we've been discussing. In netlink_rcv_wake(), as described in [2], the socket's receive queue is empty, indicated by sk_rmem_alloc being 0. At this point, concurrent netlink_attachskb() calls netlink_overrun(). In this critical state, the sk_rmem_alloc of the socket will not instantly transition from 0 to a value greater than sk_rcvbuf. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] netlink: Fix the netlink socket malfunction due to concurrency 2023-08-18 18:57 ` Eric Dumazet 2023-08-19 18:04 ` Qingjie Xing @ 2023-08-19 18:24 ` Qingjie Xing 2023-08-19 20:17 ` Qingjie Xing 2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing 3 siblings, 0 replies; 6+ messages in thread From: Qingjie Xing @ 2023-08-19 18:24 UTC (permalink / raw) To: edumazet Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu, linux-kernel, netdev, pabeni, pctammela, xqjcool This piece of code has been present since the Linux code v2.6.12 was incorporated into Git management. I believe this modification could potentially address the concurrent issue we've been discussing. In netlink_rcv_wake(), as described in [2], the socket's receive queue is empty, indicated by sk_rmem_alloc being 0. At this point, concurrent netlink_attachskb() calls netlink_overrun(). In this critical state, the sk_rmem_alloc of the socket will not instantly transition from 0 to a value greater than sk_rcvbuf. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] netlink: Fix the netlink socket malfunction due to concurrency 2023-08-18 18:57 ` Eric Dumazet 2023-08-19 18:04 ` Qingjie Xing 2023-08-19 18:24 ` Qingjie Xing @ 2023-08-19 20:17 ` Qingjie Xing 2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing 3 siblings, 0 replies; 6+ messages in thread From: Qingjie Xing @ 2023-08-19 20:17 UTC (permalink / raw) To: edumazet Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu, linux-kernel, netdev, pabeni, pctammela, xqjcool This piece of code has been present since the Linux code v2.6.12 was incorporated into Git management. I believe this modification could potentially address the concurrent issue we've been discussing. In netlink_rcv_wake(), as described in [2], the socket's receive queue is empty, indicated by sk_rmem_alloc being 0. At this point, concurrent netlink_attachskb() calls netlink_overrun(). In this critical state, the sk_rmem_alloc of the socket will not instantly transition from 0 to a value greater than sk_rcvbuf. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Further Elaboration on this patch 2023-08-18 18:57 ` Eric Dumazet ` (2 preceding siblings ...) 2023-08-19 20:17 ` Qingjie Xing @ 2023-08-23 16:43 ` Qingjie Xing 3 siblings, 0 replies; 6+ messages in thread From: Qingjie Xing @ 2023-08-23 16:43 UTC (permalink / raw) To: edumazet Cc: davem, dhowells, fw, johannes, keescook, kuba, kuniyu, linux-kernel, netdev, pabeni, pctammela, xqjcool First of all, I'd like to apologize for the duplication of emails. I'm sorry for any inconvenience this may have caused. It's important to emphasize that this fix primarily addresses an anomaly in the behavior of `netlink_rcv_wake()` and `netlink_overrun()` when dealing with the `NETLINK_S_CONGESTED` flag. This issue only arises when the --sk->sk_receive_queue-- is empty. This situation occurs when `netlink_recvmsg()` completes the reception of the last packet in the sk->sk_receive_queue, leaving it empty. Within netlink_rcv_wake()`, the `NETLINK_S_CONGESTED` flag is cleared. However, concurrently, `netlink_overrun()` proceeds to set the `NETLINK_S_CONGESTED` flag again. In this specific scenario, due to the `NETLINK_S_CONGESTED` flag being set, packets sent to this socket cannot be added to the sk->sk_receive_queue. Because the sk->sk_receive_queue is empty, the `EPOLLIN` flag, which is monitored by the `poll()` call, won't be set. Consequently, further calls to `netlink_recvmsg()` will not be triggered. When the sk->sk_receive_queue is not empty, or when concurrent actions lead to the non-emptiness of sk->sk_receive_queue, these situations will not pose a problem. This is because as long as there are packets in the sk->sk_receive_queue, the `netlink_recvmsg()` function will be invoked continuously until the queue becomes empty. Only when the sk->sk_receive_queue becomes empty, the `netlink_rcv_wake()` function will attempt once again to clear the `NETLINK_S_CONGESTED` flag. Therefore, we only need to consider the scenario described above, where the sk->sk_receive_queue is empty. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-08-23 16:47 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-18 18:37 [PATCH] netlink: Fix the netlink socket malfunction due to concurrency Qingjie Xing 2023-08-18 18:57 ` Eric Dumazet 2023-08-19 18:04 ` Qingjie Xing 2023-08-19 18:24 ` Qingjie Xing 2023-08-19 20:17 ` Qingjie Xing 2023-08-23 16:43 ` Further Elaboration on this patch Qingjie Xing
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).