netdev.vger.kernel.org archive mirror
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Alexander Aring <aahringo@redhat.com>
Cc: fw@strlen.de, netdev@vger.kernel.org, linux-man@vger.kernel.org,
	teigland@redhat.com
Subject: Re: [PATCH resend] netlink.7: note not reliable if NETLINK_NO_ENOBUFS
Date: Fri, 5 Mar 2021 04:04:37 +0100
Message-ID: <20210305030437.GA4268@salvia>
In-Reply-To: <20210304205728.34477-1-aahringo@redhat.com>

Hi Alexander,

On Thu, Mar 04, 2021 at 03:57:28PM -0500, Alexander Aring wrote:
> This patch adds a note to the netlink manpage that if NETLINK_NO_ENOBUFS
> is set there is no additional handling to make netlink reliable. It just
> disables the error notification.

A bit more background on this toggle.

NETLINK_NO_ENOBUFS also disables netlink broadcast congestion control
which kicks in when the socket buffer gets full. The existing
congestion control algorithm keeps dropping netlink event messages
until the queue is emptied. Note that it might take a while until your
userspace process fully empties the socket queue that is congested
(and during that time _your process is losing every netlink event_).
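To make the behaviour above concrete, here is a toy userspace model of
one receiver's queue. It is only a sketch of what is described in this
mail, not kernel code: the class name, the message-count capacity
(standing in for the socket buffer size in bytes) and the
`enobufs_pending` flag are all made up for illustration.

```python
from collections import deque


class ToyBroadcastSocket:
    """Toy model of one netlink broadcast receiver; NOT kernel code."""

    def __init__(self, capacity, no_enobufs=False):
        self.capacity = capacity       # max queued messages (stand-in for rcvbuf bytes)
        self.no_enobufs = no_enobufs   # models the NETLINK_NO_ENOBUFS toggle
        self.queue = deque()
        self.congested = False
        self.dropped = 0
        self.enobufs_pending = False   # would show up as ENOBUFS on the next recv()

    def deliver(self, msg):
        """Kernel side: try to queue one broadcast message for this receiver."""
        if self.congested:
            # Congestion control: drop everything until the queue is emptied.
            self.dropped += 1
            return False
        if len(self.queue) >= self.capacity:
            # Overflow: the message is lost either way.
            self.dropped += 1
            if not self.no_enobufs:
                self.congested = True
                self.enobufs_pending = True
            return False
        self.queue.append(msg)
        return True

    def recv(self):
        """Userspace side: read one message; an empty queue ends congestion."""
        msg = self.queue.popleft() if self.queue else None
        if not self.queue:
            self.congested = False
        return msg
```

In this model a congested socket keeps dropping even after recv() has
made room, and only starts accepting again once the queue is completely
empty. With no_enobufs=True the overflowing message is still lost, but
delivery resumes as soon as there is room again.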

The usual approach when your process hits ENOBUFS is to resync via
an NLM_F_DUMP unicast request. However, getting back to sync with the
kernel subsystem might be expensive if the number of items that are
exposed via netlink is huge.
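A rough sketch of that resync step, using rtnetlink's RTM_GETLINK dump
only because it is a familiar example (your subsystem will have its own
dump request). The helper names and the two-socket split (one socket
for events, one for the unicast dump) are assumptions made for
illustration, not something this mail prescribes.

```python
import errno
import struct

NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300    # NLM_F_ROOT | NLM_F_MATCH
RTM_GETLINK = 18      # rtnetlink, used here only as an example subsystem


def build_link_dump_request(seq):
    """Pack struct nlmsghdr + an all-zero struct ifinfomsg (AF_UNSPEC)."""
    payload = struct.pack("=BBHiII", 0, 0, 0, 0, 0, 0)
    header = struct.pack("=IHHII", 16 + len(payload), RTM_GETLINK,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def recv_event_or_resync(event_sock, dump_sock, seq):
    """One step of a monitor loop (hypothetical helper).

    Returns the received bytes, or None if ENOBUFS was hit and a dump
    request was sent; the caller then reads dump_sock until NLMSG_DONE
    and rebuilds its state from the replies.
    """
    try:
        return event_sock.recv(65536)
    except OSError as exc:
        if exc.errno != errno.ENOBUFS:
            raise
    dump_sock.send(build_link_dump_request(seq))
    return None
```

Every ENOBUFS triggers a full dump here, which is exactly the cost
mentioned above when the subsystem exposes a large number of items.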

Note that some people select a very large socket buffer for
netlink sockets when they notice ENOBUFS. This might however make
things worse because, as I said, congestion control drops every
netlink message until the queue is emptied. Selecting a large socket
buffer might help to postpone the ENOBUFS error, but once your process
hits ENOBUFS, the netlink congestion control kicks in and you
will lose a lot of event messages (until the queue is empty
again!).
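A back-of-the-envelope version of that trade-off, assuming constant
message rates and counting the queue in messages instead of bytes.
Both functions and all the numbers are made up for illustration.

```python
def time_until_overflow(queue_capacity, consume_rate, produce_rate):
    """Seconds until an initially empty queue fills up (needs produce > consume)."""
    return queue_capacity / (produce_rate - consume_rate)


def congestion_loss(queue_capacity, consume_rate, produce_rate):
    """Messages dropped while userspace drains a full, congested queue.

    Assumes every message is dropped until the queue is empty, as
    described above.
    """
    drain_time = queue_capacity / consume_rate
    return produce_rate * drain_time


# 2000 events/s produced, 500 events/s consumed, two buffer sizes:
for capacity in (3000, 30000):
    print(capacity,
          time_until_overflow(capacity, 500, 2000),
          congestion_loss(capacity, 500, 2000))
```

With these numbers the 10x larger queue postpones ENOBUFS from 2 to 20
seconds, but the loss per congestion episode also grows 10x, from 12000
to 120000 messages.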

So NETLINK_NO_ENOBUFS from userspace makes sense if:

1) You are subscribed to a netlink broadcast group (so it does _not_
   make sense for unicast netlink sockets).
2) The kernel subsystem delivers the netlink messages you are
   subscribed to from atomic context (e.g. the network packet path:
   if the netlink event is triggered by network packets, your process
   might get spammed with a lot of netlink messages in little time,
   depending on your network workload).
3) Your process does not want to resync on lost netlink messages.
   Your process assumes that events might get lost but it does not
   care / it does not want to take any specific action in such case.
4) You want to disable the netlink broadcast congestion control.

To provide an example kernel subsystem, this toggle can be useful with
the connection tracking system, when monitoring for new connection
events in a soft real-time fashion.
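For the conntrack case, the socket setup could look roughly like the
sketch below. The numeric constants are copied from the Linux UAPI
headers because Python's socket module may not export them; the two
function names are hypothetical, and opening the socket is Linux-only
and may need CAP_NET_ADMIN.

```python
import socket

# Values from <linux/netlink.h>, <linux/socket.h> and
# <linux/netfilter/nfnetlink_compat.h>.
SOL_NETLINK = 270
NETLINK_ADD_MEMBERSHIP = 1
NETLINK_NO_ENOBUFS = 5
NETLINK_NETFILTER = 12
NFNLGRP_CONNTRACK_NEW = 1


def monitor_sockopts(groups):
    """Return the (level, optname, value) tuples for a broadcast listener."""
    opts = [(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, group) for group in groups]
    # No ENOBUFS reports and no broadcast congestion control for this socket.
    opts.append((SOL_NETLINK, NETLINK_NO_ENOBUFS, 1))
    return opts


def open_conntrack_monitor():
    """Open a socket subscribed to new-connection events."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_NETFILTER)
    sock.bind((0, 0))   # port id 0: let the kernel pick one; groups are joined below
    for level, optname, value in monitor_sockopts([NFNLGRP_CONNTRACK_NEW]):
        sock.setsockopt(level, optname, value)
    return sock
```

A monitor built this way just keeps calling recv() and accepts that
some events are silently lost under load, which matches conditions 1)
to 4) above.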

> The used word "avoid" receiving ENOBUFS errors can be interpreted
> that netlink tries to do some additional queue handling to avoid
> that such scenario occurs at all, e.g. like zerocopy which tries to
> avoid memory copy. However disable is not the right word here as
> well that in some cases ENOBUFS can be still received. This patch
> makes clear that there will no additional handling to put netlink in
> a more reliable mode.

Right, the NETLINK_NO_ENOBUFS toggle by itself does not make
netlink more reliable for the broadcast scenario, it just changes the
way netlink broadcast deals with congestion: the userspace process gets
no reports on lost messages and netlink congestion control is
disabled.
