netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] netlink: introduce netlink poll to resolve fast return issue
       [not found] <CGME20231103072245epcas1p4471a31e9f579e38501c8c856d3ca2a77@epcas1p4.samsung.com>
@ 2023-11-03  7:22 ` Jong eon Park
  2023-11-06 23:48   ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jong eon Park @ 2023-11-03  7:22 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Jong eon Park, Dong ha Kang

In very rare cases, there was an issue where a user's poll function
waiting for a uevent would continuously return very quickly, causing
excessive CPU usage due to the following scenario.

Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error and
netlink_overrun is called. However, if netlink_overrun was called in a
context just before a another context returns from the poll and recv is
invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written to the
netlink socket belatedly and it enters the NETLINK_S_CONGESTED state.
If the user does not check for POLLERR, they cannot consume and clean
sk_err and repeatedly enter the situation where they call poll again but
return immediately.

To address this issue, I would like to introduce the following netlink
poll.

After calling the datagram_poll, netlink poll checks the
NETLINK_S_CONGESTED status and rcv queue, and this make the user to be
readable once more even if the user has already emptied rcv queue. This
allows the user to be able to consume sk->sk_err value through
netlink_recvmsg, thus the situation described above can be avoided

Co-developed-by: Dong ha Kang <dongha7.kang@samsung.com>
Signed-off-by: Jong eon Park <jongeon.park@samsung.com>
---
 net/netlink/af_netlink.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index eb086b06d60d..f08c10220041 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2002,6 +2002,20 @@ static int netlink_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	return err ? : copied;
 }
 
+static __poll_t netlink_poll(struct file *file, struct socket *sock,
+			     poll_table *wait)
+{
+	__poll_t mask = datagram_poll(file, sock, wait);
+	struct sock *sk = sock->sk;
+	struct netlink_sock *nlk = nlk_sk(sk);
+
+	if (test_bit(NETLINK_S_CONGESTED, &nlk->state) &&
+	    skb_queue_empty_lockless(&sk->sk_receive_queue))
+		mask |= EPOLLIN | EPOLLRDNORM;
+
+	return mask;
+}
+
 static void netlink_data_ready(struct sock *sk)
 {
 	BUG();
@@ -2803,7 +2817,7 @@ static const struct proto_ops netlink_ops = {
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	netlink_getname,
-	.poll =		datagram_poll,
+	.poll =		netlink_poll,
 	.ioctl =	netlink_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-03  7:22 ` [PATCH] netlink: introduce netlink poll to resolve fast return issue Jong eon Park
@ 2023-11-06 23:48   ` Jakub Kicinski
  2023-11-07  2:05     ` Jong eon Park
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-11-06 23:48 UTC (permalink / raw)
  To: Jong eon Park, Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, netdev, linux-kernel, Dong ha Kang

On Fri,  3 Nov 2023 16:22:09 +0900 Jong eon Park wrote:
> In very rare cases, there was an issue where a user's poll function
> waiting for a uevent would continuously return very quickly, causing
> excessive CPU usage due to the following scenario.
> 
> Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error and
> netlink_overrun is called. However, if netlink_overrun was called in a
> context just before a another context returns from the poll and recv is
> invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written to the
> netlink socket belatedly and it enters the NETLINK_S_CONGESTED state.
> If the user does not check for POLLERR, they cannot consume and clean
> sk_err and repeatedly enter the situation where they call poll again but
> return immediately.
> 
> To address this issue, I would like to introduce the following netlink
> poll.
> 
> After calling the datagram_poll, netlink poll checks the
> NETLINK_S_CONGESTED status and rcv queue, and this make the user to be
> readable once more even if the user has already emptied rcv queue. This
> allows the user to be able to consume sk->sk_err value through
> netlink_recvmsg, thus the situation described above can be avoided

The explanation makes sense, but I'm not able to make the jump in
understanding how this is a netlink problem. datagram_poll() returns
EPOLLERR because sk_err is set, what makes netlink special?
The fact that we can have an sk_err with nothing in the recv queue?

Paolo understands this better, maybe he can weigh in tomorrow...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-06 23:48   ` Jakub Kicinski
@ 2023-11-07  2:05     ` Jong eon Park
  2023-11-07 16:53       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jong eon Park @ 2023-11-07  2:05 UTC (permalink / raw)
  To: 'Jakub Kicinski', 'Paolo Abeni'
  Cc: 'David S. Miller', 'Eric Dumazet', netdev,
	linux-kernel, 'Dong ha Kang'



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, November 7, 2023 8:48 AM
> To: Jong eon Park <jongeon.park@samsung.com>; Paolo Abeni
> <pabeni@redhat.com>
> Cc: David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Dong ha Kang <dongha7.kang@samsung.com>
> Subject: Re: [PATCH] netlink: introduce netlink poll to resolve fast
> return issue
> 
> On Fri,  3 Nov 2023 16:22:09 +0900 Jong eon Park wrote:
> > In very rare cases, there was an issue where a user's poll function
> > waiting for a uevent would continuously return very quickly, causing
> > excessive CPU usage due to the following scenario.
> >
> > Once sk_rcvbuf becomes full netlink_broadcast_deliver returns an error
> > and netlink_overrun is called. However, if netlink_overrun was called
> > in a context just before a another context returns from the poll and
> > recv is invoked, emptying the rcvbuf, sk->sk_err = ENOBUF is written
> > to the netlink socket belatedly and it enters the NETLINK_S_CONGESTED
> state.
> > If the user does not check for POLLERR, they cannot consume and clean
> > sk_err and repeatedly enter the situation where they call poll again
> > but return immediately.
> >
> > To address this issue, I would like to introduce the following netlink
> > poll.
> >
> > After calling the datagram_poll, netlink poll checks the
> > NETLINK_S_CONGESTED status and rcv queue, and this make the user to be
> > readable once more even if the user has already emptied rcv queue.
> > This allows the user to be able to consume sk->sk_err value through
> > netlink_recvmsg, thus the situation described above can be avoided
> 
> The explanation makes sense, but I'm not able to make the jump in
> understanding how this is a netlink problem. datagram_poll() returns
> EPOLLERR because sk_err is set, what makes netlink special?
> The fact that we can have an sk_err with nothing in the recv queue?
> 
> Paolo understands this better, maybe he can weigh in tomorrow...

Perhaps my explanation was not comprehensive enough.

The issue at hand is that once it occurs, users cannot escape from this 
"busy running" situation, and the inadequate handling of EPOLLERR by users 
imposes a heavy burden on the entire system, which seems quite harsh.

The reason for a separate netlink poll is related to the netlink state. 
When it enters the NETLINK_S_CONGESTED state, sk can no longer receive or 
deliver skb, and the receive_queue must be completely emptied to clear the 
state. However, it was found that the NETLINK_S_CONGESTED state was still 
maintained even when the receive_queue was empty, which was incorrect, and 
that's why I implemented the handling in poll.

I don't consider this approach to be the best way, so if you have any 
recommendations for a better solution, I would appreciate it.

Regards.
JE Park.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-07  2:05     ` Jong eon Park
@ 2023-11-07 16:53       ` Jakub Kicinski
  2023-11-10 14:54         ` Jong eon Park
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-11-07 16:53 UTC (permalink / raw)
  To: Jong eon Park
  Cc: 'Paolo Abeni', 'David S. Miller',
	'Eric Dumazet', netdev, linux-kernel,
	'Dong ha Kang'

On Tue, 7 Nov 2023 11:05:08 +0900 Jong eon Park wrote:
> The issue at hand is that once it occurs, users cannot escape from this 
> "busy running" situation, and the inadequate handling of EPOLLERR by users 
> imposes a heavy burden on the entire system, which seems quite harsh.
> 
> The reason for a separate netlink poll is related to the netlink state. 
> When it enters the NETLINK_S_CONGESTED state, sk can no longer receive or 
> deliver skb, and the receive_queue must be completely emptied to clear the 
> state. However, it was found that the NETLINK_S_CONGESTED state was still 
> maintained even when the receive_queue was empty, which was incorrect, and 
> that's why I implemented the handling in poll.

Why does the wake up happen in the first place? 
I don't see anything special in the netlink code, so I'm assuming 
it's because datagram_poll() returns EPOLLERR.

The man page says:

       EPOLLERR
              Error condition happened on the associated file
              descriptor.  This event is also reported for the write end
              of a pipe when the read end has been closed.

              epoll_wait(2) will always report for this event; it is not
              necessary to set it in events when calling epoll_ctl().

To me that sounds like EPOLLERR is always implicitly enabled, 
and should be handled by the application. IOW it's an pure application
bug.

Are you aware of any precedent for sockets adding in EPOLLOUT 
when EPOLLERR is set?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-07 16:53       ` Jakub Kicinski
@ 2023-11-10 14:54         ` Jong eon Park
  2023-11-10 19:00           ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jong eon Park @ 2023-11-10 14:54 UTC (permalink / raw)
  To: 'Jakub Kicinski'
  Cc: 'Paolo Abeni', 'David S. Miller',
	'Eric Dumazet', netdev, linux-kernel,
	'Dong ha Kang'



On Tuesday, Nov 7, 2023 at 08:48 Jakub Kicinski wrote:
> Why does the wake up happen in the first place?
> I don't see anything special in the netlink code, so I'm assuming it's
> because datagram_poll() returns EPOLLERR.
> 
> The man page says:
> 
>        EPOLLERR
>               Error condition happened on the associated file
>               descriptor.  This event is also reported for the write end
>               of a pipe when the read end has been closed.
> 
>               epoll_wait(2) will always report for this event; it is not
>               necessary to set it in events when calling epoll_ctl().
> 
> To me that sounds like EPOLLERR is always implicitly enabled, and should
> be handled by the application. IOW it's an pure application bug.
> 
> Are you aware of any precedent for sockets adding in EPOLLOUT when
> EPOLLERR is set?

In my case, the first wake-up was by POLLIN, not POLLERR.
Please consider the below scenario.

------------CPU1 (kernel)----------  --------------CPU2 (app)--------------
...
a driver delivers uevent.              poll was waiting for schedule.
a driver delivers uevent.
a driver delivers uevent.
...
1) netlink_broadcast_deliver fails.
(sk_rmem_alloc > sk_rcvbuf)
                                            getting schedule and poll
returns,
                                            and the app calls recv.
                                            (rcv queue is empied)
                                            2)

netlink_overrun is called.
(NETLINK_S_CONGESTED flag is set,
ENOBUF is written in sk_err and,
wake up poll.)
                                            finishing its job and call poll
again.
                                            poll returns POLLERR.

                                            (the app doesn't have POLLERR
handler,)
                                            it calls poll, but getting
POLLERR.
                                            it calls poll, but getting
POLLERR.
                                            it calls poll, but getting
POLLERR.
                                            ...
									 

Interestingly, in this issue, even though netlink overrun frequently 
happened and caused POLLERRs, the user was managing it well through 
POLLIN and 'recv' function without a specific POLLERR handler. 
However, in the current situation, rcv queue is already empty and 
NETLINK_S_CONGESTED flag prevents any more incoming packets. This makes 
it impossible for the user to call 'recv'.

This "congested" situation is a bit ambiguous. The queue is empty, yet 
'congested' remains. This means kernel can no longer deliver uevents 
despite the empty queue, and it lead to the persistent 'congested' status.

The reason for the difference in netlink lies in the NETLINK_S_CONGESTED 
flag. If it were UDP, upon seeing the empty queue, it might have kept 
pushing the received packets into the queue (making possible to call 
'recv').

BRs,
JE Park.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-10 14:54         ` Jong eon Park
@ 2023-11-10 19:00           ` Jakub Kicinski
  2023-11-13  3:50             ` Jong eon Park
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2023-11-10 19:00 UTC (permalink / raw)
  To: Jong eon Park
  Cc: 'Paolo Abeni', 'David S. Miller',
	'Eric Dumazet', netdev, linux-kernel,
	'Dong ha Kang'

On Fri, 10 Nov 2023 23:54:48 +0900 Jong eon Park wrote:
> Interestingly, in this issue, even though netlink overrun frequently 
> happened and caused POLLERRs, the user was managing it well through 
> POLLIN and 'recv' function without a specific POLLERR handler. 
> However, in the current situation, rcv queue is already empty and 
> NETLINK_S_CONGESTED flag prevents any more incoming packets. This makes 
> it impossible for the user to call 'recv'.
> 
> This "congested" situation is a bit ambiguous. The queue is empty, yet 
> 'congested' remains. This means kernel can no longer deliver uevents 
> despite the empty queue, and it lead to the persistent 'congested' status.
> 
> The reason for the difference in netlink lies in the NETLINK_S_CONGESTED 
> flag. If it were UDP, upon seeing the empty queue, it might have kept 
> pushing the received packets into the queue (making possible to call 
> 'recv').

I see, please add a comment saying that NETLINK_S_CONGESTED prevents
new skbs from being queued before the new test in netlink_poll().

Please repost next week (i.e. after the merge window) with subject
tagged [PATCH net-next v2].

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue
  2023-11-10 19:00           ` Jakub Kicinski
@ 2023-11-13  3:50             ` Jong eon Park
  0 siblings, 0 replies; 7+ messages in thread
From: Jong eon Park @ 2023-11-13  3:50 UTC (permalink / raw)
  To: 'Jakub Kicinski'
  Cc: 'Paolo Abeni', 'David S. Miller',
	'Eric Dumazet', netdev, linux-kernel,
	'Dong ha Kang'

On Saturday, November 11, 2023 4:00 AM, Jakub Kicinski wrote:
> 
> I see, please add a comment saying that NETLINK_S_CONGESTED prevents
> new skbs from being queued before the new test in netlink_poll().
> 
> Please repost next week (i.e. after the merge window) with subject
> tagged [PATCH net-next v2].

Got it. Thanks Jakub.

BRs,
JE Park.



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-11-13  3:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20231103072245epcas1p4471a31e9f579e38501c8c856d3ca2a77@epcas1p4.samsung.com>
2023-11-03  7:22 ` [PATCH] netlink: introduce netlink poll to resolve fast return issue Jong eon Park
2023-11-06 23:48   ` Jakub Kicinski
2023-11-07  2:05     ` Jong eon Park
2023-11-07 16:53       ` Jakub Kicinski
2023-11-10 14:54         ` Jong eon Park
2023-11-10 19:00           ` Jakub Kicinski
2023-11-13  3:50             ` Jong eon Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).