From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Nikolaos Gkarlis <nickgarlis@gmail.com>
Cc: netfilter-devel@vger.kernel.org, fw@strlen.de
Subject: Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
Date: Wed, 8 Oct 2025 13:09:04 +0200
Message-ID: <aOZGUPMwr5aHm66x@calendula>
In-Reply-To: <CA+jwDR=hSYD76Z_3tdJTn6ZKkU+U9ZKESh3YUXDNHkvcDbJHsw@mail.gmail.com>
Hi,
On Wed, Oct 08, 2025 at 10:41:05AM +0200, Nikolaos Gkarlis wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Regarding bf2ac490d28c, I don't understand why one needs an ack for
> > _BEGIN message. Maybe, an ack for END message might make sense when
> > BATCH_DONE is reached so you get a confirmation that the batch has
> > been fully processed, however...
>
> _BEGIN might be excessive, but as you said, I do think _END could be
> useful in the way you describe.
>
> My assumption is that the author of 1bf2ac490d28c aims to standardize
> the behavior while also allowing some flexibility in what flags are
> sent. If someone tried to use those flags in a creative way that
> deviates from what nft userspace expects, they might run into
> difficulties handling the responses correctly.
I think the author of 1bf2ac490d28c is using it for a testing tool
that sends very small batches (only a few commands at a time). In that
case, considering the default socket buffer size, the acknowledgement
is going to fit into the userspace netlink socket buffer.
> > I suspect the author of bf2ac490d28c is making wrong assumptions on
> > the number of acknowledgements that are going to be received by
> > userspace.
>
> That could very well be the case. As you said, you’re not always
> guaranteed to receive the same number of ACKs.
>
> I’m aware of the ENOBUFS error. Personally, I see it as a “fatal” or
> “delivery” error, which should tell userspace that no more messages
> are coming.
In netlink, ENOBUFS is not "fatal". It means messages got lost, but
there are still messages in the netlink socket buffer to be processed,
i.e. the netlink messages before the overrun are still in place, but
the messages that could not fit into the socket buffer have been
dropped.
nfnetlink handles a batch in two stages:
1) Process every netlink message in the batch. If a netlink message
   either triggers an error or has NLM_F_ACK set, enqueue an error
   to the list.
2) If the batch was successfully processed, iterate over the list of
   errors and create the netlink acknowledgement messages, which are
   stored in the userspace netlink socket buffer.
Since this is a batch of netlink messages, acknowledgements triggered
either by an explicit NLM_F_ACK or by errors may overrun the netlink
socket buffer.
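The two stages above can be illustrated with a toy userspace model. This is
only a sketch, not the kernel code: struct toy_msg, struct toy_result and
toy_process_batch() are hypothetical names, and the receive buffer is
modelled as a fixed number of acknowledgement slots, which is an assumption
made purely to show how the overrun arises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one netlink message in a batch. */
struct toy_msg {
    bool want_ack;  /* models NLM_F_ACK being set on the message */
    int  error;     /* 0 on success, otherwise an errno-like code */
};

/* Hypothetical summary of what happened to the acknowledgements. */
struct toy_result {
    size_t queued;     /* entries put on the error list in stage 1 */
    size_t delivered;  /* acknowledgements that fit in the buffer */
    size_t dropped;    /* acknowledgements lost to the overrun */
};

/*
 * rcvbuf_slots models how many acknowledgements fit into the userspace
 * socket buffer (assumption: every acknowledgement has the same size).
 */
static struct toy_result toy_process_batch(const struct toy_msg *msgs,
                                           size_t n, size_t rcvbuf_slots)
{
    struct toy_result res = { 0, 0, 0 };
    size_t i;

    assert(msgs != NULL || n == 0);

    /* Stage 1: walk the batch, queue an entry per error or NLM_F_ACK. */
    for (i = 0; i < n; i++) {
        if (msgs[i].error != 0 || msgs[i].want_ack)
            res.queued++;
    }

    /* Stage 2: turn queued entries into acknowledgements until full. */
    for (i = 0; i < res.queued; i++) {
        if (res.delivered < rcvbuf_slots)
            res.delivered++;
        else
            res.dropped++;  /* userspace would see ENOBUFS here */
    }

    return res;
}
```

In this model, the entries queued last are the ones that get dropped, so the
number of acknowledgements userspace reads can be smaller than the number it
asked for.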
> Similar to EPERM which I have a test case for.
EPERM is indeed fatal.
> It might not be the best approach, since one could argue such errors
> might also occur for individual batch commands. Still, now that I
> think about it, not receiving a _BEGIN message could indicate that
> the error is indeed fatal.
I think I see your point.
The acknowledgement for _BEGIN will likely be in the netlink socket
buffer, because it is the first message to be acknowledged, but _END
is the last one to be processed, so it could get lost if many
acknowledgements before have been queued to the userspace netlink
socket buffer (leading to overrun).
It seems with 1bf2ac490d28c, an acknowledgement with _BEGIN can be an
indication of successfully handling a batch in the way you describe.
> Receiving an error about an invalid command isn’t necessarily a
> delivery failure (unlike ENOBUFS), and I’d still expect to get the
> entire message back, including the ACK. Otherwise, how would userspace
> know that it has read all messages and drained the buffer?
For this nfnetlink batching, use select() to poll for pending messages
to process, and make no assumptions on how many messages you receive.
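A minimal sketch of such a receive loop, assuming a datagram-style socket
descriptor and an idle timeout chosen by the caller. poll_responses() and
msg_cb are hypothetical names, not part of libmnl or any other library. The
loop treats ENOBUFS as "some messages were lost, keep reading", in line with
the explanation earlier in this mail, and it stops only when select() reports
nothing pending for the whole timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical callback type: invoked once per datagram that was read. */
typedef void (*msg_cb)(const char *buf, size_t len, void *arg);

/*
 * Wait with select() until fd is readable, read one datagram, repeat.
 * Returns the number of datagrams handled once fd stays idle for
 * timeout_ms, or -1 on a hard error. If overruns is not NULL, it is
 * set to the number of times recv() reported ENOBUFS.
 */
static int poll_responses(int fd, int timeout_ms, msg_cb cb, void *arg,
                          int *overruns)
{
    char buf[8192];
    int handled = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);
    if (overruns)
        *overruns = 0;

    for (;;) {
        fd_set rfds;
        struct timeval tv;
        ssize_t n;
        int ret;

        /* select() may modify both arguments, so rebuild them each time. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        ret = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ret < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ret == 0)
            return handled;  /* nothing pending within the timeout */

        n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n < 0) {
            if (errno == ENOBUFS) {
                /* Messages were lost, but queued ones are still there. */
                if (overruns)
                    (*overruns)++;
                continue;
            }
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        if (n == 0)
            return handled;

        if (cb)
            cb(buf, (size_t)n, arg);
        handled++;
    }
}
```

The return value is whatever number of messages actually arrived; the caller
is not expected to compare it against the number of NLM_F_ACK flags it set.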
> You could argue that userspace should bail on the first error it
> receives, but if I’m not mistaken, the kernel will still send an error
> for any subsequent invalid command, meaning the buffer isn’t being
> drained again.
If you open the socket, send the batch, process the response, then
close it, the lazy approach that consists of bailing out on the first
error is OK. The close call on the socket implicitly cleans up the
ignored pending error messages in the socket buffer.
But if you keep the socket open for several batches, with the approach
you describe, then unprocessed netlink error messages will pile up in
the socket buffer. If you do not do any sort of sequence tracking,
then your application processes old pending errors as new; libmnl
handles this with EILSEQ.
All these subtle netlink details are not easy to follow :).
> > Netlink is a unreliable transport protocol, there are mechanisms to
> > make it "more reliable" but message loss (particularly in the
> > kernel -> userspace direction) is still possible.
>
> Is it unreliable mainly because of those corner cases, or are there
> other factors to consider as well?
As for netlink batching, which is supported in other classic netlink
subsystems, this acknowledgement overrun issue exists there too. I am
referring to the scenario where you add several netlink messages to
the buffer and send() them to the kernel.
As for nfnetlink, it is a bit special in that it has begin and end
messages because this is needed for the transaction semantics (to
implement a dry run to test if an incremental ruleset update is OK).
1bf2ac490d28c added the handling for NLM_F_ACK, which I left
unspecified at the time.
Netlink can also be used for event delivery to userspace. ENOBUFS can
also happen in that case, but that is a different scenario.
TBH, I am trying to remember the details, I don't talk about Netlink
very often.