netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jong eon Park" <jongeon.park@samsung.com>
To: "'Jakub Kicinski'" <kuba@kernel.org>
Cc: "'Paolo Abeni'" <pabeni@redhat.com>,
	"'David S. Miller'" <davem@davemloft.net>,
	"'Eric Dumazet'" <edumazet@google.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	"'Dong ha Kang'" <dongha7.kang@samsung.com>
Subject: RE: [PATCH] netlink: introduce netlink poll to resolve fast return issue
Date: Fri, 10 Nov 2023 23:54:48 +0900	[thread overview]
Message-ID: <000001da13e5$d9b99e30$8d2cda90$@samsung.com> (raw)
In-Reply-To: <20231107085347.75bc3802@kernel.org>



On Tuesday, Nov 7, 2023 at 08:48 Jakub Kicinski wrote:
> Why does the wake up happen in the first place?
> I don't see anything special in the netlink code, so I'm assuming it's
> because datagram_poll() returns EPOLLERR.
> 
> The man page says:
> 
>        EPOLLERR
>               Error condition happened on the associated file
>               descriptor.  This event is also reported for the write end
>               of a pipe when the read end has been closed.
> 
>               epoll_wait(2) will always report for this event; it is not
>               necessary to set it in events when calling epoll_ctl().
> 
> To me that sounds like EPOLLERR is always implicitly enabled, and should
> be handled by the application. IOW it's an pure application bug.
> 
> Are you aware of any precedent for sockets adding in EPOLLOUT when
> EPOLLERR is set?

In my case, the first wake-up was by POLLIN, not POLLERR.
Please consider the below scenario.

------------CPU1 (kernel)----------  --------------CPU2 (app)--------------
...
a driver delivers uevent.              poll was waiting for schedule.
a driver delivers uevent.
a driver delivers uevent.
...
1) netlink_broadcast_deliver fails.
(sk_rmem_alloc > sk_rcvbuf)
                                            getting schedule and poll
returns,
                                            and the app calls recv.
                                            (rcv queue is empied)
                                            2)

netlink_overrun is called.
(NETLINK_S_CONGESTED flag is set,
ENOBUF is written in sk_err and,
wake up poll.)
                                            finishing its job and call poll
again.
                                            poll returns POLLERR.

                                            (the app doesn't have POLLERR
handler,)
                                            it calls poll, but getting
POLLERR.
                                            it calls poll, but getting
POLLERR.
                                            it calls poll, but getting
POLLERR.
                                            ...
									 

Interestingly, in this issue, even though netlink overrun frequently 
happened and caused POLLERRs, the user was managing it well through 
POLLIN and 'recv' function without a specific POLLERR handler. 
However, in the current situation, rcv queue is already empty and 
NETLINK_S_CONGESTED flag prevents any more incoming packets. This makes 
it impossible for the user to call 'recv'.

This "congested" situation is a bit ambiguous. The queue is empty, yet 
'congested' remains. This means kernel can no longer deliver uevents 
despite the empty queue, and it lead to the persistent 'congested' status.

The reason for the difference in netlink lies in the NETLINK_S_CONGESTED 
flag. If it were UDP, upon seeing the empty queue, it might have kept 
pushing the received packets into the queue (making possible to call 
'recv').

BRs,
JE Park.



  reply	other threads:[~2023-11-10 14:54 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20231103072245epcas1p4471a31e9f579e38501c8c856d3ca2a77@epcas1p4.samsung.com>
2023-11-03  7:22 ` [PATCH] netlink: introduce netlink poll to resolve fast return issue Jong eon Park
2023-11-06 23:48   ` Jakub Kicinski
2023-11-07  2:05     ` Jong eon Park
2023-11-07 16:53       ` Jakub Kicinski
2023-11-10 14:54         ` Jong eon Park [this message]
2023-11-10 19:00           ` Jakub Kicinski
2023-11-13  3:50             ` Jong eon Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000001da13e5$d9b99e30$8d2cda90$@samsung.com' \
    --to=jongeon.park@samsung.com \
    --cc=davem@davemloft.net \
    --cc=dongha7.kang@samsung.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).