* [PATCH] netlink: introduce netlink poll to resolve fast return issue [not found] <CGME20231103072245epcas1p4471a31e9f579e38501c8c856d3ca2a77@epcas1p4.samsung.com> @ 2023-11-03 7:22 ` Jong eon Park 2023-11-06 23:48 ` Jakub Kicinski 0 siblings, 1 reply; 7+ messages in thread From: Jong eon Park @ 2023-11-03 7:22 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: netdev, linux-kernel, Jong eon Park, Dong ha Kang In very rare cases, there was an issue where a user's poll function waiting for a uevent would continuously return very quickly, causing excessive CPU usage due to the following scenario. Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error and netlink_overrun is called. However, if netlink_overrun was called in a context just before a another context returns from the poll and recv is invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written to the netlink socket belatedly and it enters the NETLINK_S_CONGESTED state. If the user does not check for POLLERR, they cannot consume and clean sk_err and repeatedly enter the situation where they call poll again but return immediately. To address this issue, I would like to introduce the following netlink poll. After calling the datagram_poll, netlink poll checks the NETLINK_S_CONGESTED status and rcv queue, and this make the user to be readable once more even if the user has already emptied rcv queue. This allows the user to be able to consume sk->sk_err value through netlink_recvmsg, thus the situation described above can be avoided Co-developed-by: Dong ha Kang <dongha7.kang@samsung.com> Signed-off-by: Jong eon Park <jongeon.park@samsung.com> --- net/netlink/af_netlink.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index eb086b06d60d..f08c10220041 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2002,6 +2002,20 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, return err ? : copied; } +static __poll_t netlink_poll(struct file *file, struct socket *sock, + poll_table *wait) +{ + __poll_t mask = datagram_poll(file, sock, wait); + struct sock *sk = sock->sk; + struct netlink_sock *nlk = nlk_sk(sk); + + if (test_bit(NETLINK_S_CONGESTED, &nlk->state) && + skb_queue_empty_lockless(&sk->sk_receive_queue)) + mask |= EPOLLIN | EPOLLRDNORM; + + return mask; +} + static void netlink_data_ready(struct sock *sk) { BUG(); @@ -2803,7 +2817,7 @@ static const struct proto_ops netlink_ops = { .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = netlink_getname, - .poll = datagram_poll, + .poll = netlink_poll, .ioctl = netlink_ioctl, .listen = sock_no_listen, .shutdown = sock_no_shutdown, -- 2.25.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-03 7:22 ` [PATCH] netlink: introduce netlink poll to resolve fast return issue Jong eon Park @ 2023-11-06 23:48 ` Jakub Kicinski 2023-11-07 2:05 ` Jong eon Park 0 siblings, 1 reply; 7+ messages in thread From: Jakub Kicinski @ 2023-11-06 23:48 UTC (permalink / raw) To: Jong eon Park, Paolo Abeni Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel, Dong ha Kang On Fri, 3 Nov 2023 16:22:09 +0900 Jong eon Park wrote: > In very rare cases, there was an issue where a user's poll function > waiting for a uevent would continuously return very quickly, causing > excessive CPU usage due to the following scenario. > > Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error and > netlink_overrun is called. However, if netlink_overrun was called in a > context just before a another context returns from the poll and recv is > invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written to the > netlink socket belatedly and it enters the NETLINK_S_CONGESTED state. > If the user does not check for POLLERR, they cannot consume and clean > sk_err and repeatedly enter the situation where they call poll again but > return immediately. > > To address this issue, I would like to introduce the following netlink > poll. > > After calling the datagram_poll, netlink poll checks the > NETLINK_S_CONGESTED status and rcv queue, and this make the user to be > readable once more even if the user has already emptied rcv queue. This > allows the user to be able to consume sk->sk_err value through > netlink_recvmsg, thus the situation described above can be avoided The explanation makes sense, but I'm not able to make the jump in understanding how this is a netlink problem. datagram_poll() returns EPOLLERR because sk_err is set, what makes netlink special? The fact that we can have an sk_err with nothing in the recv queue? Paolo understands this better, maybe he can weigh in tomorrow... ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-06 23:48 ` Jakub Kicinski @ 2023-11-07 2:05 ` Jong eon Park 2023-11-07 16:53 ` Jakub Kicinski 0 siblings, 1 reply; 7+ messages in thread From: Jong eon Park @ 2023-11-07 2:05 UTC (permalink / raw) To: 'Jakub Kicinski', 'Paolo Abeni' Cc: 'David S. Miller', 'Eric Dumazet', netdev, linux-kernel, 'Dong ha Kang' > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Tuesday, November 7, 2023 8:48 AM > To: Jong eon Park <jongeon.park@samsung.com>; Paolo Abeni > <pabeni@redhat.com> > Cc: David S. Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; Dong ha Kang <dongha7.kang@samsung.com> > Subject: Re: [PATCH] netlink: introduce netlink poll to resolve fast > return issue > > On Fri, 3 Nov 2023 16:22:09 +0900 Jong eon Park wrote: > > In very rare cases, there was an issue where a user's poll function > > waiting for a uevent would continuously return very quickly, causing > > excessive CPU usage due to the following scenario. > > > > Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error > > and netlink_overrun is called. However, if netlink_overrun was called > > in a context just before a another context returns from the poll and > > recv is invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written > > to the netlink socket belatedly and it enters the NETLINK_S_CONGESTED > state. > > If the user does not check for POLLERR, they cannot consume and clean > > sk_err and repeatedly enter the situation where they call poll again > > but return immediately. > > > > To address this issue, I would like to introduce the following netlink > > poll. > > > > After calling the datagram_poll, netlink poll checks the > > NETLINK_S_CONGESTED status and rcv queue, and this make the user to be > > readable once more even if the user has already emptied rcv queue. > > This allows the user to be able to consume sk->sk_err value through > > netlink_recvmsg, thus the situation described above can be avoided > > The explanation makes sense, but I'm not able to make the jump in > understanding how this is a netlink problem. datagram_poll() returns > EPOLLERR because sk_err is set, what makes netlink special? > The fact that we can have an sk_err with nothing in the recv queue? > > Paolo understands this better, maybe he can weigh in tomorrow... Perhaps my explanation was not comprehensive enough. The issue at hand is that once it occurs, users cannot escape from this "busy running" situation, and the inadequate handling of EPOLLERR by users imposes a heavy burden on the entire system, which seems quite harsh. The reason for a separate netlink poll is related to the netlink state. When it enters the NETLINK_S_CONGESTED state, sk can no longer receive or deliver skb, and the receive_queue must be completely emptied to clear the state. However, it was found that the NETLINK_S_CONGESTED state was still maintained even when the receive_queue was empty, which was incorrect, and that's why I implemented the handling in poll. I don't consider this approach to be the best way, so if you have any recommendations for a better solution, I would appreciate it. Regards. JE Park. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-07 2:05 ` Jong eon Park @ 2023-11-07 16:53 ` Jakub Kicinski 2023-11-10 14:54 ` Jong eon Park 0 siblings, 1 reply; 7+ messages in thread From: Jakub Kicinski @ 2023-11-07 16:53 UTC (permalink / raw) To: Jong eon Park Cc: 'Paolo Abeni', 'David S. Miller', 'Eric Dumazet', netdev, linux-kernel, 'Dong ha Kang' On Tue, 7 Nov 2023 11:05:08 +0900 Jong eon Park wrote: > The issue at hand is that once it occurs, users cannot escape from this > "busy running" situation, and the inadequate handling of EPOLLERR by users > imposes a heavy burden on the entire system, which seems quite harsh. > > The reason for a separate netlink poll is related to the netlink state. > When it enters the NETLINK_S_CONGESTED state, sk can no longer receive or > deliver skb, and the receive_queue must be completely emptied to clear the > state. However, it was found that the NETLINK_S_CONGESTED state was still > maintained even when the receive_queue was empty, which was incorrect, and > that's why I implemented the handling in poll. Why does the wake up happen in the first place? I don't see anything special in the netlink code, so I'm assuming it's because datagram_poll() returns EPOLLERR. The man page says: EPOLLERR Error condition happened on the associated file descriptor. This event is also reported for the write end of a pipe when the read end has been closed. epoll_wait(2) will always report for this event; it is not necessary to set it in events when calling epoll_ctl(). To me that sounds like EPOLLERR is always implicitly enabled, and should be handled by the application. IOW it's an pure application bug. Are you aware of any precedent for sockets adding in EPOLLOUT when EPOLLERR is set? ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-07 16:53 ` Jakub Kicinski @ 2023-11-10 14:54 ` Jong eon Park 2023-11-10 19:00 ` Jakub Kicinski 0 siblings, 1 reply; 7+ messages in thread From: Jong eon Park @ 2023-11-10 14:54 UTC (permalink / raw) To: 'Jakub Kicinski' Cc: 'Paolo Abeni', 'David S. Miller', 'Eric Dumazet', netdev, linux-kernel, 'Dong ha Kang' On Tuesday, Nov 7, 2023 at 08:48 Jakub Kicinski wrote: > Why does the wake up happen in the first place? > I don't see anything special in the netlink code, so I'm assuming it's > because datagram_poll() returns EPOLLERR. > > The man page says: > > EPOLLERR > Error condition happened on the associated file > descriptor. This event is also reported for the write end > of a pipe when the read end has been closed. > > epoll_wait(2) will always report for this event; it is not > necessary to set it in events when calling epoll_ctl(). > > To me that sounds like EPOLLERR is always implicitly enabled, and should > be handled by the application. IOW it's an pure application bug. > > Are you aware of any precedent for sockets adding in EPOLLOUT when > EPOLLERR is set? In my case, the first wake-up was by POLLIN, not POLLERR. Please consider the below scenario. ------------CPU1 (kernel)---------- --------------CPU2 (app)-------------- ... a driver delivers uevent. poll was waiting for schedule. a driver delivers uevent. a driver delivers uevent. ... 1) netlink_broadcast_deliver fails. (sk_rmem_alloc > sk_rcvbuf) getting schedule and poll returns, and the app calls recv. (rcv queue is empied) 2) netlink_overrun is called. (NETLINK_S_CONGESTED flag is set, ENOBUF is written in sk_err and, wake up poll.) finishing its job and call poll again. poll returns POLLERR. (the app doesn't have POLLERR handler,) it calls poll, but getting POLLERR. it calls poll, but getting POLLERR. it calls poll, but getting POLLERR. ... Interestingly, in this issue, even though netlink overrun frequently happened and caused POLLERRs, the user was managing it well through POLLIN and 'recv' function without a specific POLLERR handler. However, in the current situation, rcv queue is already empty and NETLINK_S_CONGESTED flag prevents any more incoming packets. This makes it impossible for the user to call 'recv'. This "congested" situation is a bit ambiguous. The queue is empty, yet 'congested' remains. This means kernel can no longer deliver uevents despite the empty queue, and it lead to the persistent 'congested' status. The reason for the difference in netlink lies in the NETLINK_S_CONGESTED flag. If it were UDP, upon seeing the empty queue, it might have kept pushing the received packets into the queue (making possible to call 'recv'). BRs, JE Park. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-10 14:54 ` Jong eon Park @ 2023-11-10 19:00 ` Jakub Kicinski 2023-11-13 3:50 ` Jong eon Park 0 siblings, 1 reply; 7+ messages in thread From: Jakub Kicinski @ 2023-11-10 19:00 UTC (permalink / raw) To: Jong eon Park Cc: 'Paolo Abeni', 'David S. Miller', 'Eric Dumazet', netdev, linux-kernel, 'Dong ha Kang' On Fri, 10 Nov 2023 23:54:48 +0900 Jong eon Park wrote: > Interestingly, in this issue, even though netlink overrun frequently > happened and caused POLLERRs, the user was managing it well through > POLLIN and 'recv' function without a specific POLLERR handler. > However, in the current situation, rcv queue is already empty and > NETLINK_S_CONGESTED flag prevents any more incoming packets. This makes > it impossible for the user to call 'recv'. > > This "congested" situation is a bit ambiguous. The queue is empty, yet > 'congested' remains. This means kernel can no longer deliver uevents > despite the empty queue, and it lead to the persistent 'congested' status. > > The reason for the difference in netlink lies in the NETLINK_S_CONGESTED > flag. If it were UDP, upon seeing the empty queue, it might have kept > pushing the received packets into the queue (making possible to call > 'recv'). I see, please add a comment saying that NETLINK_S_CONGESTED prevents new skbs from being queued before the new test in netlink_poll(). Please repost next week (i.e. after the merge window) with subject tagged [PATCH net-next v2]. ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue 2023-11-10 19:00 ` Jakub Kicinski @ 2023-11-13 3:50 ` Jong eon Park 0 siblings, 0 replies; 7+ messages in thread From: Jong eon Park @ 2023-11-13 3:50 UTC (permalink / raw) To: 'Jakub Kicinski' Cc: 'Paolo Abeni', 'David S. Miller', 'Eric Dumazet', netdev, linux-kernel, 'Dong ha Kang' On Saturday, November 11, 2023 4:00 AM, Jakub Kicinski wrote: > > I see, please add a comment saying that NETLINK_S_CONGESTED prevents > new skbs from being queued before the new test in netlink_poll(). > > Please repost next week (i.e. after the merge window) with subject > tagged [PATCH net-next v2]. Got it. Thanks Jakub. BRs, JE Park. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-11-13 3:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20231103072245epcas1p4471a31e9f579e38501c8c856d3ca2a77@epcas1p4.samsung.com>
2023-11-03 7:22 ` [PATCH] netlink: introduce netlink poll to resolve fast return issue Jong eon Park
2023-11-06 23:48 ` Jakub Kicinski
2023-11-07 2:05 ` Jong eon Park
2023-11-07 16:53 ` Jakub Kicinski
2023-11-10 14:54 ` Jong eon Park
2023-11-10 19:00 ` Jakub Kicinski
2023-11-13 3:50 ` Jong eon Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).