* [PATCH] netfilter: nfnetlink: always ACK batch end if requested
@ 2025-10-01 21:15 Nikolaos Gkarlis
2025-10-02 9:48 ` Fernando Fernandez Mancera
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-01 21:15 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, Nikolaos Gkarlis
Before ACKs were introduced for batch begin and batch end messages,
userspace expected to receive the same number of ACKs as messages it sent,
unless a fatal error occurred.
To preserve this deterministic behavior, send an ACK for batch end
messages even when an error happens in the middle of the batch,
similar to how ACKs are handled for command messages.
Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com>
---
Hi,
I recently came across the issue introduced by bf2ac490d28c and
while trying to find a way to handle it by adding an ACK on batch
begin, end messages, I spotted what looks like an inconsistency?
I have tested this change with my userspace application and it
seems to resolve the "problem". However, I am not sure if there
is a suitable place to add a regression test, since AFAIK nft
userspace does not currently use this feature. I would be happy
to contribute a test if you could point me to the right place.
I may be missing some context, so feedback on whether this is the
right approach would be very welcome.
net/netfilter/nfnetlink.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 811d02b4c4f7..0342087ead06 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -600,6 +600,11 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= NFNL_BATCH_FAILURE;
goto replay_abort;
}
+
+ if (nlh->nlmsg_flags & NLM_F_ACK) {
+ memset(&extack, 0, sizeof(extack));
+ nfnl_err_add(&err_list, nlh, 0, &extack);
+ }
}
nfnl_err_deliver(&err_list, oskb);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-01 21:15 [PATCH] netfilter: nfnetlink: always ACK batch end if requested Nikolaos Gkarlis
@ 2025-10-02 9:48 ` Fernando Fernandez Mancera
2025-10-02 10:41 ` Nikolaos Gkarlis
2025-10-02 10:10 ` [PATCH] netfilter: nfnetlink: " Florian Westphal
2025-10-07 20:33 ` Pablo Neira Ayuso
2 siblings, 1 reply; 27+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-02 9:48 UTC (permalink / raw)
To: Nikolaos Gkarlis, netfilter-devel; +Cc: pablo, fw
On 10/1/25 11:15 PM, Nikolaos Gkarlis wrote:
> Before ACKs were introduced for batch begin and batch end messages,
> userspace expected to receive the same number of ACKs as messages it sent,
> unless a fatal error occurred.
>
> To preserve this deterministic behavior, send an ACK for batch end
> messages even when an error happens in the middle of the batch,
> similar to how ACKs are handled for command messages.
>
> Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com>
> ---
> Hi,
>
> I recently came across the issue introduced by bf2ac490d28c and
> while trying to find a way to handle it by adding an ACK on batch
> begin, end messages, I spotted what looks like an inconsistency?
>
> I have tested this change with my userspace application and it
> seems to resolve the "problem". However, I am not sure if there
> is a suitable place to add a regression test, since AFAIK nft
> userspace does not currently use this feature. I would be happy
> to contribute a test if you could point me to the right place.
>
> I may be missing some context, so feedback on whether this is the
> right approach would be very welcome.
>
> net/netfilter/nfnetlink.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 811d02b4c4f7..0342087ead06 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -600,6 +600,11 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
> status |= NFNL_BATCH_FAILURE;
> goto replay_abort;
> }
> +
> + if (nlh->nlmsg_flags & NLM_F_ACK) {
> + memset(&extack, 0, sizeof(extack));
> + nfnl_err_add(&err_list, nlh, 0, &extack);
> + }
> }
Right, if the BATCH_END message is reached and has NLM_F_ACK set, nfnetlink
should send the corresponding ACK. With the patch as-is, though, if the
BATCH_END message is missing, this would send an extra, wrong ACK if the
previous message was using NLM_F_ACK.
E.g. for a batch formatted like (BATCH_BEGIN|NFT_MSG_NEWRULE + NLM_F_ACK),
nfnetlink would send two ACKs when it should send only one. Granted, it
won't configure anything, but it would still be misleading.
What about this?
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index b8d0fad1ed10..ecf85346d883 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -601,7 +601,7 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
goto replay_abort;
}
- if (nlh->nlmsg_flags & NLM_F_ACK) {
+ if (nlh->nlmsg_flags & NLM_F_ACK && status & NFNL_BATCH_DONE) {
memset(&extack, 0, sizeof(extack));
nfnl_err_add(&err_list, nlh, 0, &extack);
}
>
> nfnl_err_deliver(&err_list, oskb);
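For concreteness, the truncated batch Fernando describes (BATCH_BEGIN plus a
NEWRULE carrying NLM_F_ACK, with no BATCH_END) can be sketched with plain
netlink structs as below. This is only an illustration of the message layout,
not code from this thread: the NEWRULE deliberately carries no attributes, the
sequence numbers are arbitrary, and the res_id handling mirrors the selftest
posted later in the thread.
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nf_tables.h>

/* Put one nfnetlink message (header + nfgenmsg, no attributes) into buf. */
static size_t put_msg(char *buf, uint16_t type, uint16_t res_id, uint32_t seq)
{
	struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
	struct nfgenmsg *nfh;

	memset(buf, 0, NLMSG_SPACE(sizeof(*nfh)));
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*nfh));
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq = seq;

	nfh = NLMSG_DATA(nlh);
	nfh->nfgen_family = NFPROTO_UNSPEC;
	nfh->version = NFNETLINK_V0;
	nfh->res_id = res_id;

	return NLMSG_ALIGN(nlh->nlmsg_len);
}

/* BATCH_BEGIN + NEWRULE, both with NLM_F_ACK, and no BATCH_END terminator.
 * Returns the number of bytes to hand to a single send() on the socket. */
size_t build_truncated_batch(char *buf, uint32_t seq)
{
	size_t off = 0;

	off += put_msg(buf + off, NFNL_MSG_BATCH_BEGIN,
		       NFNL_SUBSYS_NFTABLES, seq);
	off += put_msg(buf + off,
		       (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWRULE,
		       0, seq + 1);
	/* NFNL_MSG_BATCH_END intentionally omitted */
	return off;
}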
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-01 21:15 [PATCH] netfilter: nfnetlink: always ACK batch end if requested Nikolaos Gkarlis
2025-10-02 9:48 ` Fernando Fernandez Mancera
@ 2025-10-02 10:10 ` Florian Westphal
2025-10-02 10:46 ` Nikolaos Gkarlis
2025-10-07 20:33 ` Pablo Neira Ayuso
2 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-02 10:10 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> I have tested this change with my userspace application and it
> seems to resolve the "problem". However, I am not sure if there
> is a suitable place to add a regression test, since AFAIK nft
> userspace does not currently use this feature. I would be happy
> to contribute a test if you could point me to the right place.
You could add a nfnetlink selftest to:
tools/testing/selftests/net/netfilter
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-02 9:48 ` Fernando Fernandez Mancera
@ 2025-10-02 10:41 ` Nikolaos Gkarlis
2025-10-02 11:03 ` Fernando Fernandez Mancera
0 siblings, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-02 10:41 UTC (permalink / raw)
To: Fernando Fernandez Mancera; +Cc: netfilter-devel, pablo, fw
Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> E.g. for a batch formatted like (BATCH_BEGIN|NFT_MSG_NEWRULE + NLM_F_ACK),
> nfnetlink would send two ACKs when it should send only one. Granted, it
> won't configure anything, but it would still be misleading.
>
> What about this?
>
Thanks! I was a bit unsure whether the status should also be checked.
This seems to work with my test.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-02 10:10 ` [PATCH] netfilter: nfnetlink: " Florian Westphal
@ 2025-10-02 10:46 ` Nikolaos Gkarlis
0 siblings, 0 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-02 10:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, pablo, Fernando Fernandez Mancera
Florian Westphal <fw@strlen.de> wrote:
> You could add a nfnetlink selftest to:
> tools/testing/selftests/net/netfilter
Thanks! I will add the test and include it in the patch. I should
have some time to look at it this evening. Since there seem to be
many corner cases, I suggest we wait for the test before merging,
unless there is a particular rush to get this out.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-02 10:41 ` Nikolaos Gkarlis
@ 2025-10-02 11:03 ` Fernando Fernandez Mancera
2025-10-04 9:26 ` [PATCH v2 0/2] " Nikolaos Gkarlis
2025-10-04 9:38 ` [PATCH v2 0/2] always ACK batch end if requested Nikolaos Gkarlis
0 siblings, 2 replies; 27+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-02 11:03 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fw
On 10/2/25 12:41 PM, Nikolaos Gkarlis wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>
>> E.g. for a batch formatted like (BATCH_BEGIN|NFT_MSG_NEWRULE + NLM_F_ACK),
>> nfnetlink would send two ACKs when it should send only one. Granted, it
>> won't configure anything, but it would still be misleading.
>>
>> What about this?
>>
>
> Thanks! I was a bit unsure whether the status should also be checked.
> This seems to work with my test.
Just a nit, the commit should also have a Fixes tag IMHO.
Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages")
Thanks!
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 0/2] always ACK batch end if requested
2025-10-02 11:03 ` Fernando Fernandez Mancera
@ 2025-10-04 9:26 ` Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 1/2] netfilter: nfnetlink: " Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests Nikolaos Gkarlis
2025-10-04 9:38 ` [PATCH v2 0/2] always ACK batch end if requested Nikolaos Gkarlis
1 sibling, 2 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-04 9:26 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, fmancera, Nikolaos Gkarlis
Hi again,
Apologies for the delay. I have now added an nfnetlink selftest as
suggested by Florian, and applied Fernando's suggestion to also
check the status. I have confirmed that the test fails on
kernels > 6.10 that do not include Fernando's 09efbac953f6 fix.
The test could be made more extensive, but I think this is a good
starting point. Let me know what you think.
Nikolaos Gkarlis (2):
netfilter: nfnetlink: always ACK batch end if requested
selftests: netfilter: add nfnetlink ACK handling tests
net/netfilter/nfnetlink.c | 5 +
.../testing/selftests/net/netfilter/Makefile | 3 +
.../selftests/net/netfilter/nfnetlink.c | 424 ++++++++++++++++++
.../selftests/net/netfilter/nfnetlink.sh | 11 +
4 files changed, 443 insertions(+)
create mode 100644 tools/testing/selftests/net/netfilter/nfnetlink.c
create mode 100755 tools/testing/selftests/net/netfilter/nfnetlink.sh
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 1/2] netfilter: nfnetlink: always ACK batch end if requested
2025-10-04 9:26 ` [PATCH v2 0/2] " Nikolaos Gkarlis
@ 2025-10-04 9:26 ` Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests Nikolaos Gkarlis
1 sibling, 0 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-04 9:26 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, fmancera, Nikolaos Gkarlis
Before ACKs were introduced for batch begin and batch end messages,
userspace expected to receive the same number of ACKs as messages it sent,
unless a fatal error occurred.
To preserve this deterministic behavior, send an ACK for batch end
messages even when an error happens in the middle of the batch,
similar to how ACKs are handled for command messages.
Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages")
Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com>
---
net/netfilter/nfnetlink.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 811d02b4c4f7..33acc1b94a0e 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -600,6 +600,11 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
status |= NFNL_BATCH_FAILURE;
goto replay_abort;
}
+
+ if (nlh->nlmsg_flags & NLM_F_ACK && status & NFNL_BATCH_DONE) {
+ memset(&extack, 0, sizeof(extack));
+ nfnl_err_add(&err_list, nlh, 0, &extack);
+ }
}
nfnl_err_deliver(&err_list, oskb);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-04 9:26 ` [PATCH v2 0/2] " Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 1/2] netfilter: nfnetlink: " Nikolaos Gkarlis
@ 2025-10-04 9:26 ` Nikolaos Gkarlis
2025-10-04 10:46 ` Florian Westphal
1 sibling, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-04 9:26 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, fmancera, Nikolaos Gkarlis
Add nfnetlink selftests to validate the ACKs sent after a batch
message. These tests verify that:
- ACKs are always received in order.
- Module loading does not affect the responses.
- The number of ACKs matches the number of requests, unless a
fatal error occurs.
Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com>
---
.../testing/selftests/net/netfilter/Makefile | 3 +
.../selftests/net/netfilter/nfnetlink.c | 424 ++++++++++++++++++
.../selftests/net/netfilter/nfnetlink.sh | 11 +
3 files changed, 438 insertions(+)
create mode 100644 tools/testing/selftests/net/netfilter/nfnetlink.c
create mode 100755 tools/testing/selftests/net/netfilter/nfnetlink.sh
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index a98ed892f55f..cb38c8a9b2cc 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -37,6 +37,7 @@ TEST_PROGS += nft_zones_many.sh
TEST_PROGS += rpath.sh
TEST_PROGS += vxlan_mtu_frag.sh
TEST_PROGS += xt_string.sh
+TEST_PROGS += nfnetlink.sh
TEST_PROGS_EXTENDED = nft_concat_range_perf.sh
@@ -46,6 +47,7 @@ TEST_GEN_FILES += conntrack_dump_flush
TEST_GEN_FILES += conntrack_reverse_clash
TEST_GEN_FILES += sctp_collision
TEST_GEN_FILES += udpclash
+TEST_GEN_FILES += nfnetlink
include ../../lib.mk
@@ -54,6 +56,7 @@ $(OUTPUT)/nf_queue: LDLIBS += $(MNL_LDLIBS)
$(OUTPUT)/conntrack_dump_flush: CFLAGS += $(MNL_CFLAGS)
$(OUTPUT)/conntrack_dump_flush: LDLIBS += $(MNL_LDLIBS)
+$(OUTPUT)/nfnetlink: LDLIBS += $(MNL_LDLIBS)
$(OUTPUT)/udpclash: LDLIBS += -lpthread
TEST_FILES := lib.sh
diff --git a/tools/testing/selftests/net/netfilter/nfnetlink.c b/tools/testing/selftests/net/netfilter/nfnetlink.c
new file mode 100644
index 000000000000..c1cb7484873a
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nfnetlink.c
@@ -0,0 +1,424 @@
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <time.h>
+#include <arpa/inet.h>
+#include <libmnl/libmnl.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+#include <linux/netfilter/nf_conntrack_common.h>
+#include <linux/netfilter.h>
+#include <sys/types.h>
+
+#include "../../kselftest_harness.h"
+
+#define BATCH_PAGE_SIZE 8192
+
+static bool batch_begin(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = NFNL_MSG_BATCH_BEGIN;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_UNSPEC;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = NFNL_SUBSYS_NFTABLES;
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool batch_end(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = NFNL_MSG_BATCH_END;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_UNSPEC;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = NFNL_SUBSYS_NFTABLES;
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_table(struct mnl_nlmsg_batch *batch, const char *name, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_TABLE_NAME, name);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_chain(struct mnl_nlmsg_batch *batch, const char *table,
+ const char *chain, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWCHAIN;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_CHAIN_TABLE, table);
+ mnl_attr_put_strz(nlh, NFTA_CHAIN_NAME, chain);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_rule_with_ct(struct mnl_nlmsg_batch *batch, const char *table,
+ const char *chain, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+ struct nlattr *nest1, *nest2, *nest3, *nest4;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWRULE;
+ nlh->nlmsg_flags =
+ NLM_F_REQUEST | NLM_F_CREATE | NLM_F_APPEND | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_RULE_TABLE, table);
+ mnl_attr_put_strz(nlh, NFTA_RULE_CHAIN, chain);
+
+ /* Create expressions list */
+ nest1 = mnl_attr_nest_start(nlh, NFTA_RULE_EXPRESSIONS);
+
+ /* CT expression to load ct state */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "ct");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_CT_KEY, htonl(NFT_CT_STATE));
+ mnl_attr_put_u32(nlh, NFTA_CT_DREG, htonl(NFT_REG_1));
+ mnl_attr_nest_end(nlh, nest3);
+
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Bitwise expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "bitwise");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_SREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_DREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_LEN, htonl(4));
+
+ uint32_t mask =
+ NF_CT_STATE_BIT(IP_CT_ESTABLISHED) | NF_CT_STATE_BIT(IP_CT_RELATED);
+ nest4 = mnl_attr_nest_start(nlh, NFTA_BITWISE_MASK);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(mask), &mask);
+ mnl_attr_nest_end(nlh, nest4);
+
+ uint32_t val = 0x00000000;
+ nest4 = mnl_attr_nest_start(nlh, NFTA_BITWISE_XOR);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(val), &val);
+ mnl_attr_nest_end(nlh, nest4);
+
+ mnl_attr_nest_end(nlh, nest3);
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Cmp expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "cmp");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_CMP_SREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_CMP_OP, htonl(NFT_CMP_NEQ));
+
+ uint32_t cmp_data = 0x00000000;
+ nest4 = mnl_attr_nest_start(nlh, NFTA_CMP_DATA);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(cmp_data), &cmp_data);
+ mnl_attr_nest_end(nlh, nest4);
+
+ mnl_attr_nest_end(nlh, nest3);
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Counter expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "counter");
+
+ mnl_attr_nest_end(nlh, nest2);
+
+ mnl_attr_nest_end(nlh, nest1);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool create_invalid_table(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ /* Intentionally omit table name attribute to cause error */
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static void validate_res(struct mnl_socket *nl, uint32_t first_seq,
+ uint32_t expected_acks,
+ struct __test_metadata *_metadata)
+{
+ char buf[BATCH_PAGE_SIZE];
+ int ret, acks_received = 0;
+ uint32_t last_seq = 0;
+ bool out_of_order = false;
+
+ while (1) {
+ ret = mnl_socket_recvfrom(nl, buf, sizeof(buf));
+ if (ret == -1) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ break;
+ }
+
+ TH_LOG("Unexpected error on recv: %s", strerror(errno));
+ ASSERT_TRUE(false);
+ return;
+ }
+
+ struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
+ while (mnl_nlmsg_ok(nlh, ret)) {
+ if (nlh->nlmsg_type == NLMSG_ERROR) {
+ struct nlmsgerr *err =
+ mnl_nlmsg_get_payload(nlh);
+ acks_received++;
+
+ if (err->error != 0) {
+ TH_LOG("[seq=%u] Error: %s",
+ nlh->nlmsg_seq,
+ strerror(-err->error));
+ } else {
+ TH_LOG("[seq=%u] ACK", nlh->nlmsg_seq);
+ }
+
+ if (nlh->nlmsg_seq <= last_seq) {
+ out_of_order = true;
+ TH_LOG
+ ("Out of order ack: seq %u after %u",
+ nlh->nlmsg_seq, last_seq);
+ }
+ last_seq = nlh->nlmsg_seq;
+ }
+ nlh = mnl_nlmsg_next(nlh, &ret);
+ }
+ }
+
+ ASSERT_FALSE(out_of_order);
+
+ ASSERT_EQ(acks_received, expected_acks);
+}
+
+FIXTURE(nfnetlink_batch)
+{
+ struct mnl_socket *nl;
+};
+
+FIXTURE_SETUP(nfnetlink_batch)
+{
+ struct timeval tv = {.tv_sec = 1,.tv_usec = 0 };
+
+ self->nl = mnl_socket_open(NETLINK_NETFILTER);
+ ASSERT_NE(self->nl, NULL);
+
+ ASSERT_EQ(mnl_socket_bind(self->nl, 0, MNL_SOCKET_AUTOPID), 0);
+
+ ASSERT_EQ(setsockopt
+ (mnl_socket_get_fd(self->nl), SOL_SOCKET, SO_RCVTIMEO, &tv,
+ sizeof(tv)), 0);
+}
+
+FIXTURE_TEARDOWN(nfnetlink_batch)
+{
+ if (self->nl) {
+ mnl_socket_close(self->nl);
+ }
+}
+
+TEST_F(nfnetlink_batch, simple_batch)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL);
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ ASSERT_TRUE(batch_begin(batch, seq++));
+ create_table(batch, "test_table1", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 3 acks: batch_begin, create_table, batch_end
+ validate_res(self->nl, seq - 3, 3, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_F(nfnetlink_batch, module_load)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 1000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table2", seq++);
+ create_chain(batch, "test_table2", "test_chain", seq++);
+ create_rule_with_ct(batch, "test_table2", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 5 acks: batch_begin, table, chain, rule, batch_end
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+// Repeat the CT test to verify module loading behavior
+TEST_F(nfnetlink_batch, post_module_load)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 2000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table6", seq++);
+ create_chain(batch, "test_table6", "test_chain", seq++);
+ create_rule_with_ct(batch, "test_table6", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_F(nfnetlink_batch, invalid_batch)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 3000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table3", seq++);
+ create_invalid_table(batch, seq++);
+ create_chain(batch, "test_table3", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 5 acks: batch_begin, table, invalid_table(error), chain, batch_end
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_F(nfnetlink_batch, batch_with_fatal_error)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 4000;
+ uid_t uid;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table4", seq++);
+ create_table(batch, "test_table5", seq++);
+ batch_end(batch, seq++);
+
+ // Drop privileges to trigger EPERM
+ uid = geteuid();
+ if (uid == 0) {
+ ASSERT_EQ(seteuid(65534), 0);
+ }
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+
+ // Restore privileges
+ if (uid == 0) {
+ seteuid(uid);
+ }
+
+ ASSERT_GT(ret, 0);
+
+ // EPERM is fatal and should abort batch, expect only 1 ACK
+ validate_res(self->nl, seq - 4, 1, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/netfilter/nfnetlink.sh b/tools/testing/selftests/net/netfilter/nfnetlink.sh
new file mode 100755
index 000000000000..4456796732a8
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nfnetlink.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+exec unshare -n bash -c '
+ nft flush ruleset
+
+ rmmod nft_ct
+
+ ./nfnetlink
+
+ nft flush ruleset
+'
\ No newline at end of file
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v2 0/2] always ACK batch end if requested
2025-10-02 11:03 ` Fernando Fernandez Mancera
2025-10-04 9:26 ` [PATCH v2 0/2] " Nikolaos Gkarlis
@ 2025-10-04 9:38 ` Nikolaos Gkarlis
1 sibling, 0 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-04 9:38 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, fmancera, Nikolaos Gkarlis
Hi again,
Apologies for the delay. I have now added an nfnetlink selftest as
suggested by Florian, and applied Fernando's suggestion to also
check the status. I have confirmed that the test fails on
kernels > 6.10 that do not include Fernando's 09efbac953f6 fix.
The test could be made more extensive, but I think this is a good
starting point. Let me know what you think.
This is a resend to fix threading. Sorry about that.
Nikolaos Gkarlis (2):
netfilter: nfnetlink: always ACK batch end if requested
selftests: netfilter: add nfnetlink ACK handling tests
net/netfilter/nfnetlink.c | 5 +
.../testing/selftests/net/netfilter/Makefile | 3 +
.../selftests/net/netfilter/nfnetlink.c | 424 ++++++++++++++++++
.../selftests/net/netfilter/nfnetlink.sh | 11 +
4 files changed, 443 insertions(+)
create mode 100644 tools/testing/selftests/net/netfilter/nfnetlink.c
create mode 100755 tools/testing/selftests/net/netfilter/nfnetlink.sh
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-04 9:26 ` [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests Nikolaos Gkarlis
@ 2025-10-04 10:46 ` Florian Westphal
2025-10-04 11:08 ` Nikolaos Gkarlis
0 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-04 10:46 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fmancera
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> Add nfnetlink selftests to validate the ACKs sent after a batch
> message. These tests verify that:
>
> - ACKs are always received in order.
> - Module loading does not affect the responses.
> - The number of ACKs matches the number of requests, unless a
> fatal error occurs.
Thanks for the tests!
This looks good to me, just one minor nit.
Can you drop the shell wrapper and just call unshare(CLONE_NEWNET) from the
fixture setup function?
> +++ b/tools/testing/selftests/net/netfilter/Makefile
> @@ -37,6 +37,7 @@ TEST_PROGS += nft_zones_many.sh
> TEST_PROGS += rpath.sh
> TEST_PROGS += vxlan_mtu_frag.sh
> TEST_PROGS += xt_string.sh
> +TEST_PROGS += nfnetlink.sh
> TEST_PROGS_EXTENDED = nft_concat_range_perf.sh
>
> @@ -46,6 +47,7 @@ TEST_GEN_FILES += conntrack_dump_flush
> TEST_GEN_FILES += conntrack_reverse_clash
> TEST_GEN_FILES += sctp_collision
> TEST_GEN_FILES += udpclash
> +TEST_GEN_FILES += nfnetlink
replacing this with
TEST_GEN_PROGS = nfnetlink
... should make kselftests run this prog just like it does
for TEST_PROGS shell scripts.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-04 10:46 ` Florian Westphal
@ 2025-10-04 11:08 ` Nikolaos Gkarlis
2025-10-04 12:26 ` Florian Westphal
0 siblings, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-04 11:08 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, pablo, fmancera
Florian Westphal <fw@strlen.de> wrote:
> Can you drop the shell wrapper and just call unshare(CLONE_NEWNET) from the
> fixture setup function?
Yes, I can do that. However, I am wondering about cleanup both
before and after the tests, as well as ensuring that the nft_ct
module is not loaded prior to execution. Should these steps be
included in the program, or do we assume that the tests will
always run on a clean system with no modules already loaded?
Thanks for the review!
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-04 11:08 ` Nikolaos Gkarlis
@ 2025-10-04 12:26 ` Florian Westphal
2025-10-05 10:43 ` Nikolaos Gkarlis
0 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-04 12:26 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fmancera
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Can you drop the shell wrapper and just call unshare(CLONE_NEWNET) from the
> > fixture setup function?
>
> Yes, I can do that. However, I am wondering about cleanup both
> before and after the tests, as well as ensuring that the nft_ct
> module is not loaded prior to execution.
Is this to exercise replay path?
Perhaps add a comment to the subtest that depends on this.
I don't see the need for this otherwise.
What happens when nft_ct is builtin? The test should not fail
in that case.
> Should these steps be
> included in the program, or do we assume that the tests will
> always run on a clean system with no modules already loaded?
No, if you need nft_ct to not be loaded then the rmmod has to
be used, but there is no guarantee it will work, e.g. because
nft_ct is builtin or because it's in use.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-04 12:26 ` Florian Westphal
@ 2025-10-05 10:43 ` Nikolaos Gkarlis
2025-10-05 11:42 ` Florian Westphal
0 siblings, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-05 10:43 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, pablo, fmancera
Florian Westphal <fw@strlen.de> wrote:
> Is this to exercise replay path?
> Perhaps add a comment to the subtest that depends on this.
Yes, that is the goal. I can add a comment to the test explaining that.
> What happens when nft_ct is builtin? The test should not fail
> in that case.
No, the test does not fail when nft_ct is builtin or missing. It only
verifies the number of ACKs returned and their order, which is what
bf2ac490d28c broke. I have tested both builtin and non-existent cases.
> No, if you need nft_ct to not be loaded then the rmmod has to
> be used, but there is no guarantee it will work, e.g. because
> nft_ct is builtin or because it's in use.
What do you think about something like this?
+++ b/tools/testing/selftests/net/netfilter/nfnetlink.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+# If nft_ct is a module and is loaded, remove it to test module auto-loading
+lsmod | grep -q "^nft_ct\b" && rmmod nft_ct
+./nfnetlink
I can send a v3 if that looks okay to you.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-05 10:43 ` Nikolaos Gkarlis
@ 2025-10-05 11:42 ` Florian Westphal
2025-10-05 12:54 ` [PATCH v3] " Nikolaos Gkarlis
0 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-05 11:42 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fmancera
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Is this to exercise replay path?
> > Perhaps add a comment to the subtest that depends on this.
>
> Yes, that is the goal. I can add a comment to the test explaining that.
Thanks!
> verifies the number of ACKs returned and their order, which is what
> bf2ac490d28c broke. I have tested both builtin and non-existent cases.
Great.
> What do you think about something like this?
>
> +++ b/tools/testing/selftests/net/netfilter/nfnetlink.sh
> @@ -0,0 +1,5 @@
> +#!/bin/bash
> +
> +# If nft_ct is a module and is loaded, remove it to test module auto-loading
Looks good, can you also mention that if removal doesn't work this is
ok?
> +lsmod | grep -q "^nft_ct\b" && rmmod nft_ct
You could just "rmmod nft_ct 2>/dev/null" with
above comment in place.
> +./nfnetlink
Since you need the shell wrapper for the rmmod, you could
also make this
unshare -n ./nfnetlink and not do it in the .c setup
function. Your call.
> I can send a v3 if that looks okay to you.
Yes please. You can just send a v3 of the test case,
no need to resend the fix.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v3] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-05 11:42 ` Florian Westphal
@ 2025-10-05 12:54 ` Nikolaos Gkarlis
2025-10-08 10:26 ` Florian Westphal
0 siblings, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-05 12:54 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo, fw, fmancera, Nikolaos Gkarlis
Add nfnetlink selftests to validate the ACKs sent after a batch
message. These tests verify that:
- ACKs are always received in order.
- Module loading does not affect the responses.
- The number of ACKs matches the number of requests, unless a
fatal error occurs.
Signed-off-by: Nikolaos Gkarlis <nickgarlis@gmail.com>
---
.../testing/selftests/net/netfilter/Makefile | 3 +
.../selftests/net/netfilter/nfnetlink.c | 437 ++++++++++++++++++
.../selftests/net/netfilter/nfnetlink.sh | 6 +
3 files changed, 446 insertions(+)
create mode 100644 tools/testing/selftests/net/netfilter/nfnetlink.c
create mode 100755 tools/testing/selftests/net/netfilter/nfnetlink.sh
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index a98ed892f55f..cb38c8a9b2cc 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -37,6 +37,7 @@ TEST_PROGS += nft_zones_many.sh
TEST_PROGS += rpath.sh
TEST_PROGS += vxlan_mtu_frag.sh
TEST_PROGS += xt_string.sh
+TEST_PROGS += nfnetlink.sh
TEST_PROGS_EXTENDED = nft_concat_range_perf.sh
@@ -46,6 +47,7 @@ TEST_GEN_FILES += conntrack_dump_flush
TEST_GEN_FILES += conntrack_reverse_clash
TEST_GEN_FILES += sctp_collision
TEST_GEN_FILES += udpclash
+TEST_GEN_FILES += nfnetlink
include ../../lib.mk
@@ -54,6 +56,7 @@ $(OUTPUT)/nf_queue: LDLIBS += $(MNL_LDLIBS)
$(OUTPUT)/conntrack_dump_flush: CFLAGS += $(MNL_CFLAGS)
$(OUTPUT)/conntrack_dump_flush: LDLIBS += $(MNL_LDLIBS)
+$(OUTPUT)/nfnetlink: LDLIBS += $(MNL_LDLIBS)
$(OUTPUT)/udpclash: LDLIBS += -lpthread
TEST_FILES := lib.sh
diff --git a/tools/testing/selftests/net/netfilter/nfnetlink.c b/tools/testing/selftests/net/netfilter/nfnetlink.c
new file mode 100644
index 000000000000..d6a2895be1f0
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nfnetlink.c
@@ -0,0 +1,437 @@
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sched.h>
+#include <time.h>
+#include <arpa/inet.h>
+#include <libmnl/libmnl.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nf_tables.h>
+#include <linux/netfilter/nf_conntrack_common.h>
+#include <linux/netfilter.h>
+#include <sys/types.h>
+
+#include "../../kselftest_harness.h"
+
+#define BATCH_PAGE_SIZE 8192
+
+static bool batch_begin(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = NFNL_MSG_BATCH_BEGIN;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_UNSPEC;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = NFNL_SUBSYS_NFTABLES;
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool batch_end(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = NFNL_MSG_BATCH_END;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_UNSPEC;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = NFNL_SUBSYS_NFTABLES;
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_table(struct mnl_nlmsg_batch *batch, const char *name, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_TABLE_NAME, name);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_chain(struct mnl_nlmsg_batch *batch, const char *table,
+ const char *chain, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWCHAIN;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_CHAIN_TABLE, table);
+ mnl_attr_put_strz(nlh, NFTA_CHAIN_NAME, chain);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool
+create_rule_with_ct(struct mnl_nlmsg_batch *batch, const char *table,
+ const char *chain, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+ struct nlattr *nest1, *nest2, *nest3, *nest4;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWRULE;
+ nlh->nlmsg_flags =
+ NLM_F_REQUEST | NLM_F_CREATE | NLM_F_APPEND | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ mnl_attr_put_strz(nlh, NFTA_RULE_TABLE, table);
+ mnl_attr_put_strz(nlh, NFTA_RULE_CHAIN, chain);
+
+ /* Create expressions list */
+ nest1 = mnl_attr_nest_start(nlh, NFTA_RULE_EXPRESSIONS);
+
+ /* CT expression to load ct state */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "ct");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_CT_KEY, htonl(NFT_CT_STATE));
+ mnl_attr_put_u32(nlh, NFTA_CT_DREG, htonl(NFT_REG_1));
+ mnl_attr_nest_end(nlh, nest3);
+
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Bitwise expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "bitwise");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_SREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_DREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_BITWISE_LEN, htonl(4));
+
+ uint32_t mask =
+ NF_CT_STATE_BIT(IP_CT_ESTABLISHED) | NF_CT_STATE_BIT(IP_CT_RELATED);
+ nest4 = mnl_attr_nest_start(nlh, NFTA_BITWISE_MASK);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(mask), &mask);
+ mnl_attr_nest_end(nlh, nest4);
+
+ uint32_t val = 0x00000000;
+ nest4 = mnl_attr_nest_start(nlh, NFTA_BITWISE_XOR);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(val), &val);
+ mnl_attr_nest_end(nlh, nest4);
+
+ mnl_attr_nest_end(nlh, nest3);
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Cmp expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "cmp");
+
+ nest3 = mnl_attr_nest_start(nlh, NFTA_EXPR_DATA);
+ mnl_attr_put_u32(nlh, NFTA_CMP_SREG, htonl(NFT_REG_1));
+ mnl_attr_put_u32(nlh, NFTA_CMP_OP, htonl(NFT_CMP_NEQ));
+
+ uint32_t cmp_data = 0x00000000;
+ nest4 = mnl_attr_nest_start(nlh, NFTA_CMP_DATA);
+ mnl_attr_put(nlh, NFTA_DATA_VALUE, sizeof(cmp_data), &cmp_data);
+ mnl_attr_nest_end(nlh, nest4);
+
+ mnl_attr_nest_end(nlh, nest3);
+ mnl_attr_nest_end(nlh, nest2);
+
+ /* Counter expression */
+ nest2 = mnl_attr_nest_start(nlh, NFTA_LIST_ELEM);
+ mnl_attr_put_strz(nlh, NFTA_EXPR_NAME, "counter");
+
+ mnl_attr_nest_end(nlh, nest2);
+
+ mnl_attr_nest_end(nlh, nest1);
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static bool create_invalid_table(struct mnl_nlmsg_batch *batch, uint32_t seq)
+{
+ struct nlmsghdr *nlh;
+ struct nfgenmsg *nfh;
+
+ nlh = mnl_nlmsg_put_header(mnl_nlmsg_batch_current(batch));
+ nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE;
+ nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+ nlh->nlmsg_seq = seq;
+
+ nfh = mnl_nlmsg_put_extra_header(nlh, sizeof(struct nfgenmsg));
+ nfh->nfgen_family = NFPROTO_INET;
+ nfh->version = NFNETLINK_V0;
+ nfh->res_id = 0;
+
+ /* Intentionally omit table name attribute to cause error */
+
+ return mnl_nlmsg_batch_next(batch);
+}
+
+static void validate_res(struct mnl_socket *nl, uint32_t first_seq,
+ uint32_t expected_acks,
+ struct __test_metadata *_metadata)
+{
+ char buf[BATCH_PAGE_SIZE];
+ int ret, acks_received = 0;
+ uint32_t last_seq = 0;
+ bool out_of_order = false;
+
+ while (1) {
+ ret = mnl_socket_recvfrom(nl, buf, sizeof(buf));
+ if (ret == -1) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ break;
+ }
+
+ TH_LOG("Unexpected error on recv: %s", strerror(errno));
+ ASSERT_TRUE(false);
+ return;
+ }
+
+ struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
+ while (mnl_nlmsg_ok(nlh, ret)) {
+ if (nlh->nlmsg_type == NLMSG_ERROR) {
+ struct nlmsgerr *err =
+ mnl_nlmsg_get_payload(nlh);
+ acks_received++;
+
+ if (err->error != 0) {
+ TH_LOG("[seq=%u] Error: %s",
+ nlh->nlmsg_seq,
+ strerror(-err->error));
+ } else {
+ TH_LOG("[seq=%u] ACK", nlh->nlmsg_seq);
+ }
+
+ if (nlh->nlmsg_seq <= last_seq) {
+ out_of_order = true;
+ TH_LOG
+ ("Out of order ack: seq %u after %u",
+ nlh->nlmsg_seq, last_seq);
+ }
+ last_seq = nlh->nlmsg_seq;
+ }
+ nlh = mnl_nlmsg_next(nlh, &ret);
+ }
+ }
+
+ ASSERT_FALSE(out_of_order);
+
+ ASSERT_EQ(acks_received, expected_acks);
+}
+
+FIXTURE(nfnetlink_batch)
+{
+ struct mnl_socket *nl;
+};
+
+FIXTURE_SETUP(nfnetlink_batch)
+{
+ struct timeval tv = {.tv_sec = 1,.tv_usec = 0 };
+
+ ASSERT_EQ(unshare(CLONE_NEWNET), 0);
+
+ self->nl = mnl_socket_open(NETLINK_NETFILTER);
+ ASSERT_NE(self->nl, NULL);
+
+ ASSERT_EQ(mnl_socket_bind(self->nl, 0, MNL_SOCKET_AUTOPID), 0);
+
+ ASSERT_EQ(setsockopt
+ (mnl_socket_get_fd(self->nl), SOL_SOCKET, SO_RCVTIMEO, &tv,
+ sizeof(tv)), 0);
+}
+
+FIXTURE_TEARDOWN(nfnetlink_batch)
+{
+ if (self->nl) {
+ mnl_socket_close(self->nl);
+ }
+}
+
+TEST_F(nfnetlink_batch, simple_batch)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL);
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ ASSERT_TRUE(batch_begin(batch, seq++));
+ create_table(batch, "test_table1", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 3 acks: batch_begin, create_table, batch_end
+ validate_res(self->nl, seq - 3, 3, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+/*
+ * This test simulates the replay path by executing a batch that depends
+ * on the nft_ct module. The purpose is to verify that the same number of
+ * ACKs is received regardless of whether the module gets loaded automatically
+ * or not. If the module is built in or not present, the test will pass anyway.
+ */
+TEST_F(nfnetlink_batch, module_load)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 1000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table2", seq++);
+ create_chain(batch, "test_table2", "test_chain", seq++);
+ create_rule_with_ct(batch, "test_table2", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 5 acks: batch_begin, table, chain, rule, batch_end
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+/*
+ * Repeat the previous test, but this time the nft_ct module should have been
+ * loaded by the previous test (if it exists and is not built-in). In any case,
+ * the same number of ACKs as before is expected.
+ */
+TEST_F(nfnetlink_batch, post_module_load)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 2000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table6", seq++);
+ create_chain(batch, "test_table6", "test_chain", seq++);
+ create_rule_with_ct(batch, "test_table6", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_F(nfnetlink_batch, invalid_batch)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 3000;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table3", seq++);
+ create_invalid_table(batch, seq++);
+ create_chain(batch, "test_table3", "test_chain", seq++);
+ batch_end(batch, seq++);
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+ ASSERT_GT(ret, 0);
+
+ // Expect 5 acks: batch_begin, table, invalid_table(error), chain, batch_end
+ validate_res(self->nl, seq - 5, 5, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_F(nfnetlink_batch, batch_with_fatal_error)
+{
+ struct mnl_nlmsg_batch *batch;
+ char buf[BATCH_PAGE_SIZE];
+ uint32_t seq = time(NULL) + 4000;
+ uid_t uid;
+ int ret;
+
+ batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
+ ASSERT_NE(batch, NULL);
+
+ batch_begin(batch, seq++);
+ create_table(batch, "test_table4", seq++);
+ create_table(batch, "test_table5", seq++);
+ batch_end(batch, seq++);
+
+ // Drop privileges to trigger EPERM
+ uid = geteuid();
+ if (uid == 0) {
+ ASSERT_EQ(seteuid(65534), 0);
+ }
+
+ ret = mnl_socket_sendto(self->nl, mnl_nlmsg_batch_head(batch),
+ mnl_nlmsg_batch_size(batch));
+
+ // Restore privileges
+ if (uid == 0) {
+ seteuid(uid);
+ }
+
+ ASSERT_GT(ret, 0);
+
+ // EPERM is fatal and should abort batch, expect only 1 ACK
+ validate_res(self->nl, seq - 4, 1, _metadata);
+
+ mnl_nlmsg_batch_stop(batch);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/net/netfilter/nfnetlink.sh b/tools/testing/selftests/net/netfilter/nfnetlink.sh
new file mode 100755
index 000000000000..3d264a4bde6f
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nfnetlink.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+# If nft_ct is a module and is loaded, remove it to test module auto-loading.
+# If removal fails, continue anyway since it won't affect the test result.
+rmmod nft_ct 2>/dev/null
+./nfnetlink
\ No newline at end of file
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-01 21:15 [PATCH] netfilter: nfnetlink: always ACK batch end if requested Nikolaos Gkarlis
2025-10-02 9:48 ` Fernando Fernandez Mancera
2025-10-02 10:10 ` [PATCH] netfilter: nfnetlink: " Florian Westphal
@ 2025-10-07 20:33 ` Pablo Neira Ayuso
2025-10-08 7:28 ` Florian Westphal
2025-10-08 8:41 ` Nikolaos Gkarlis
2 siblings, 2 replies; 27+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-07 20:33 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, fw
Hi Nikolaos,
On Wed, Oct 01, 2025 at 11:15:03PM +0200, Nikolaos Gkarlis wrote:
> Before ACKs were introduced for batch begin and batch end messages,
> userspace expected to receive the same number of ACKs as messages it sent,
> unless a fatal error occurred.
Regarding bf2ac490d28c, I don't understand why one needs an ack for
_BEGIN message. Maybe, an ack for END message might make sense when
BATCH_DONE is reached so you get a confirmation that the batch has
been fully processed, however...
> To preserve this deterministic behavior, send an ACK for batch end
> messages even when an error happens in the middle of the batch,
> similar to how ACKs are handled for command messages.
I suspect the author of bf2ac490d28c is making wrong assumptions on
the number of acknowledgements that are going to be received by
userspace.
Let's just forget about this bf2ac490d28c for a moment, a quick summary:
#1 If you don't set NLM_F_ACK in your netlink messages in the batch
(this is what netfilter's userspace does): then errors result in
acknowledgement. But ENOBUFS is still possible: this means your batch
has resulted in too many acknowledment messages (errors) filling up
the userspace netlink socket buffer.
#2 If you set NLM_F_ACK in your netlink messages in the batch:
You get one acknowledgement for each message in the batch. With a
sufficiently large batch, this may overrun the userspace socket
buffer (ENOBUFS); the kernel may then have fully processed the
transaction, but some of those acks get lost.
That is, bf2ac490d28c allows you to set NLM_F_ACK for BATCH_END,
but that acknowledgement can get lost in case of a netlink socket
overrun.
ENOBUFS scenarios break assumptions on the number of messages that
you are going to receive from the kernel.
Netlink is an unreliable transport protocol; there are mechanisms to
make it "more reliable", but message loss (particularly in the
kernel -> userspace direction) is still possible.
In this particular case, where several netlink messages are batched in
one single send(), userspace will not process the acknowledgement
messages in the userspace socket buffer until the batch is complete.
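To make the ENOBUFS point concrete, here is a minimal drain-loop sketch. It is
not code from this thread: it uses plain sockets rather than libmnl and assumes
the netlink socket is non-blocking or has a receive timeout (as the selftest
sets with SO_RCVTIMEO). It counts NLMSG_ERROR replies after a batch send and
treats ENOBUFS as "some acknowledgements were dropped", i.e. the per-message
ack count can no longer be trusted.
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Count NLMSG_ERROR messages (acks and errors) queued on an nfnetlink
 * socket. Returns the count, or -1 if the reply stream was truncated by
 * a receive buffer overrun (ENOBUFS), in which case the count can no
 * longer be compared against the number of requests sent. */
static int drain_acks(int nlfd)
{
	char buf[8192];
	int acks = 0;

	for (;;) {
		ssize_t n = recv(nlfd, buf, sizeof(buf), 0);

		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return acks;	/* queue drained */
			if (errno == ENOBUFS) {
				fprintf(stderr, "ack stream truncated\n");
				return -1;
			}
			return -1;	/* other socket error */
		}

		struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		int len = (int)n;

		while (NLMSG_OK(nlh, len)) {
			if (nlh->nlmsg_type == NLMSG_ERROR)
				acks++;	/* nlmsgerr.error == 0 is a plain ack */
			nlh = NLMSG_NEXT(nlh, len);
		}
	}
}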
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-07 20:33 ` Pablo Neira Ayuso
@ 2025-10-08 7:28 ` Florian Westphal
2025-10-08 11:33 ` Pablo Neira Ayuso
2025-10-08 8:41 ` Nikolaos Gkarlis
1 sibling, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-08 7:28 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Nikolaos Gkarlis, netfilter-devel, donald.hunter
[ Cc Donald Hunter ]
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Oct 01, 2025 at 11:15:03PM +0200, Nikolaos Gkarlis wrote:
> > Before ACKs were introduced for batch begin and batch end messages,
> > userspace expected to receive the same number of ACKs as messages it sent,
> > unless a fatal error occurred.
>
> Regarding bf2ac490d28c, I don't understand why one needs an ack for
> _BEGIN message. Maybe, an ack for END message might make sense when
> BATCH_DONE is reached so you get a confirmation that the batch has
> been fully processed, however...
... which (BATCH_DONE reached) would be made moot by this proposed
patch, as we would ACK it even if it's not reached anymore.
> I suspect the author of bf2ac490d28c is making wrong assumptions on
> the number of acknowledgements that are going to be received by
> userspace.
>
> Let's just forget about this bf2ac490d28c for a moment, a quick summary:
>
> #1 If you don't set NLM_F_ACK in your netlink messages in the batch
> (this is what netfilter's userspace does): then errors result in
> acknowledgement. But ENOBUFS is still possible: this means your batch
> has resulted in too many acknowledgement messages (errors) filling up
> the userspace netlink socket buffer.
> #2 If you set NLM_F_ACK in your netlink messages in the batch:
> You get one acknowledgement for each message in the batch. With a
> sufficiently large batch, this may overrun the userspace socket
> buffer (ENOBUFS); the kernel may then have fully processed the
> transaction, but some of those acks get lost.
Right, the 1:1 relationship between messages and ACKs is only there for a
theoretical infinite receive buffer, which makes this feature rather limited
for the batched case.
> In this particular case, where several netlink messages are batched in
> one single send(), userspace will not process the acknowledgement
> messages in the userspace socket buffer until the batch is complete.
OK, from what I gather you'd like for
"netfilter: nfnetlink: always ACK batch end if requested"
to not be applied.
I would still like to apply the nfnetlink selftest however (even
if it has to be trimmed/modified), because it does catch the issue
fixed by Fernando
[ 09efbac953f6 ("netfilter: nfnetlink: reset nlh pointer during batch replay") ]:
ok 1 nfnetlink_batch.simple_batch
# RUN nfnetlink_batch.module_load ...
# nfnetlink.c:239:module_load:[seq=1759907514] ACK
# nfnetlink.c:239:module_load:[seq=1759907512] ACK
# nfnetlink.c:244:module_load:Out of order ack: seq 1759907512 after 1759907514
# nfnetlink.c:239:module_load:[seq=1759907513] ACK
# nfnetlink.c:239:module_load:[seq=1759907514] ACK
# nfnetlink.c:239:module_load:[seq=1759907515] ACK
# nfnetlink.c:254:module_load:Expected 0 (0) == out_of_order (1)
# module_load: Test terminated by assertion
# FAIL nfnetlink_batch.module_load
If the decision is that there should NOT be an ACK for the BATCH_END if
there was an error, then the test only needs minor adjustment:
- // Expect 5 acks: batch_begin, table, invalid_table(error), chain, batch_end
- validate_res(self->nl, seq - 5, 5, _metadata);
+ // Expect 4 acks: batch_begin, table, invalid_table(error), chain
+ validate_res(self->nl, seq - 4, 4, _metadata);
So, what is the 'more useful' behaviour? Choices are all equally bad:
1. If we want to always include it, it might not be there due to
-ENOBUFS, which will always happen if the batch was large (triggers
too many acks).
2. If we only include it on success, it might not be there for the
same reason, so absence doesn't imply failure.
HOWEVER, if the batch was small enough then 2) gives slightly more
usable feedback, in the sense that the ACK confirms the entire batch
was processed.
So I am leaning towards not applying the nfnetlink patch but applying
the (adjusted) test case.
Other takes?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-07 20:33 ` Pablo Neira Ayuso
2025-10-08 7:28 ` Florian Westphal
@ 2025-10-08 8:41 ` Nikolaos Gkarlis
2025-10-08 11:09 ` Pablo Neira Ayuso
1 sibling, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-08 8:41 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Regarding bf2ac490d28c, I don't understand why one needs an ack for
> _BEGIN message. Maybe, an ack for END message might make sense when
> BATCH_DONE is reached so you get a confirmation that the batch has
> been fully processed, however...
_BEGIN might be excessive, but as you said, I do think _END could be
useful in the way you describe.
My assumption is that the author of bf2ac490d28c aims to standardize
the behavior while also allowing some flexibility in what flags are
sent. If someone tried to use those flags in a creative way that
deviates from what nft userspace expects, they might run into
difficulties handling the responses correctly.
> I suspect the author of bf2ac490d28c is making wrong assumptions on
> the number of acknowledgements that are going to be received by
> userspace.
That could very well be the case. As you said, you’re not always
guaranteed to receive the same number of ACKs.
I’m aware of the ENOBUFS error. Personally, I see it as a “fatal” or
“delivery” error, which should tell userspace that no more messages
are coming. Similar to EPERM which I have a test case for. It might
not be the best approach, since one could argue such errors might
also occur for individual batch commands. Still, now that I think
about it, not receiving an ACK for the _BEGIN message could
indicate that the error is indeed fatal.
Receiving an error about an invalid command isn’t necessarily a
delivery failure (unlike ENOBUFS), and I’d still expect to get the
entire message back, including the ACK. Otherwise, how would userspace
know that it has read all messages and drained the buffer?
You could argue that userspace should bail on the first error it
receives, but if I’m not mistaken, the kernel will still send an error
for any subsequent invalid command, meaning the buffer isn’t being
drained again.
> Netlink is an unreliable transport protocol; there are mechanisms to
> make it "more reliable" but message loss (particularly in the
> kernel -> userspace direction) is still possible.
Is it unreliable mainly because of those corner cases, or are there
other factors to consider as well?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-05 12:54 ` [PATCH v3] " Nikolaos Gkarlis
@ 2025-10-08 10:26 ` Florian Westphal
2025-10-08 10:37 ` Nikolaos Gkarlis
0 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2025-10-08 10:26 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fmancera
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> Add nfnetlink selftests to validate the ACKs sent after a batch
> message. These tests verify that:
>
> - ACKs are always received in order.
> - Module loading does not affect the responses.
> - The number of ACKs matches the number of requests, unless a
> fatal error occurs.
I will keep this on hold while the discussion wrt. ACK flag on
batches is ongoing.
Also, checkpatch.pl reports:
WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#59: FILE: tools/testing/selftests/net/netfilter/nfnetlink.c:1:
... and a few other nits that should be resolved too.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-08 10:26 ` Florian Westphal
@ 2025-10-08 10:37 ` Nikolaos Gkarlis
2025-10-08 10:39 ` Florian Westphal
0 siblings, 1 reply; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-08 10:37 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, pablo, fmancera
Florian Westphal <fw@strlen.de> wrote:
> I will keep this on hold while the discussion wrt. ACK flag on
> batches is ongoing.
Thanks, I appreciate you taking the time for this.
> Also, checkpatch.pl reports:
>
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
> #59: FILE: tools/testing/selftests/net/netfilter/nfnetlink.c:1:
>
> ... and a few other nits that should be resolved too.
Apologies, I will submit a new version ASAP.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3] selftests: netfilter: add nfnetlink ACK handling tests
2025-10-08 10:37 ` Nikolaos Gkarlis
@ 2025-10-08 10:39 ` Florian Westphal
0 siblings, 0 replies; 27+ messages in thread
From: Florian Westphal @ 2025-10-08 10:39 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, pablo, fmancera
Nikolaos Gkarlis <nickgarlis@gmail.com> wrote:
> > ... and a few other nits that should be resolved too.
>
> Apologies, I will submit a new version ASAP.
No need, please wait until the discussion has completed
so we know if the selftest can be taken as-is or if it has
to be adjusted (no-batch-end-ack-on-error).
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-08 8:41 ` Nikolaos Gkarlis
@ 2025-10-08 11:09 ` Pablo Neira Ayuso
2025-10-08 14:50 ` Nikolaos Gkarlis
0 siblings, 1 reply; 27+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-08 11:09 UTC (permalink / raw)
To: Nikolaos Gkarlis; +Cc: netfilter-devel, fw
Hi,
On Wed, Oct 08, 2025 at 10:41:05AM +0200, Nikolaos Gkarlis wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Regarding bf2ac490d28c, I don't understand why one needs an ack for
> > _BEGIN message. Maybe, an ack for END message might make sense when
> > BATCH_DONE is reached so you get a confirmation that the batch has
> > been fully processed, however...
>
> _BEGIN might be excessive, but as you said, I do think _END could be
> useful in the way you describe.
>
> My assumption is that the author of bf2ac490d28c aims to standardize
> the behavior while also allowing some flexibility in what flags are
> sent. If someone tried to use those flags in a creative way that
> deviates from what nft userspace expects, they might run into
> difficulties handling the responses correctly.
I think the author of bf2ac490d28c is using it for a testing tool
that sends very small batches (only a few commands at a time). In that
case, considering the default socket buffer size, the acknowledgements
are going to fit into the userspace netlink socket buffer.
> > I suspect the author of bf2ac490d28c is making wrong assumptions on
> > the number of acknowledgements that are going to be received by
> > userspace.
>
> That could very well be the case. As you said, you’re not always
> guaranteed to receive the same number of ACKs.
>
> I’m aware of the ENOBUFS error. Personally, I see it as a “fatal” or
> “delivery” error, which should tell userspace that no more messages
> are coming.
In netlink, ENOBUFS is not "fatal"; it means messages got lost, but
there are still messages in the netlink socket buffer to be processed,
i.e. the netlink messages before the overrun are still in place, but
the messages that could not fit into the socket buffer have been
dropped.
nfnetlink handles a batch in two stages:
1) Process every netlink message in the batch; if a netlink
message triggers an error or has NLM_F_ACK set, then enqueue
an error to the list.
2) If the batch was successfully processed, iterate over the list of
errors and create the netlink acknowledgement messages that are
stored in the userspace netlink socket buffer.
Since this is a batch of netlink messages, acknowledgements, whether
triggered by explicit NLM_F_ACK or by errors, may overrun the netlink
socket buffer.
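Roughly, in sketch form (illustrative pseudocode of the two stages
above only, not the real nfnetlink_rcv_batch(); the iterator and
handler names are made up):
/* Illustrative pseudocode only -- not the actual kernel function. */
static void batch_two_stages(struct sk_buff *skb, struct sk_buff *oskb)
{
	struct netlink_ext_ack extack = {};
	LIST_HEAD(err_list);
	struct nlmsghdr *nlh;
	int err;
	for_each_batch_message(nlh, skb) {	/* made-up iterator */
		err = process_message(nlh);	/* made-up handler */
		/* stage 1: queue an entry on error or explicit NLM_F_ACK */
		if (err || nlh->nlmsg_flags & NLM_F_ACK)
			nfnl_err_add(&err_list, nlh, err, &extack);
	}
	/* stage 2: turn queued entries into acks/errors in the socket buffer */
	nfnl_err_deliver(&err_list, oskb);
}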
> Similar to EPERM which I have a test case for.
EPERM is indeed fatal.
> It might not be the best approach, since one could argue such errors
> might also occur for individual batch commands. Still, now that I
> think about it, not receiving an ACK for the _BEGIN message could
> indicate that the error is indeed fatal.
I think I see your point.
The acknowledgement for _BEGIN will likely be in the netlink socket
buffer, because it is the first message to be acknowledged, but _END
is the last one to be processed, so it could get lost if many
acknowledgements have already been queued to the userspace netlink
socket buffer (leading to an overrun).
It seems that with bf2ac490d28c, an acknowledgement for _BEGIN can be an
indication of successfully handling a batch in the way you describe.
> Receiving an error about an invalid command isn’t necessarily a
> delivery failure (unlike ENOBUFS), and I’d still expect to get the
> entire message back, including the ACK. Otherwise, how would userspace
> know that it has read all messages and drained the buffer?
For this nfnetlink batching, use select() to poll for pending messages
to process, and make no assumptions on how many messages you receive.
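A minimal sketch of that drain pattern (untested; poll() instead of
select(), but same idea; 'fd' is assumed to be an already-bound
NETLINK_NETFILTER socket and the timeout is arbitrary):
#include <poll.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdio.h>

/* Drain whatever acknowledgements/errors are pending, without assuming
 * how many will arrive. */
static void drain_acks(int fd)
{
	char buf[8192];
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	while (poll(&pfd, 1, 200) > 0) {	/* 200ms of silence ends the loop */
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n < 0) {
			perror("recv");	/* e.g. ENOBUFS: some acks were lost */
			break;
		}
		for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		     NLMSG_OK(nlh, n); nlh = NLMSG_NEXT(nlh, n)) {
			if (nlh->nlmsg_type == NLMSG_ERROR) {
				struct nlmsgerr *e = NLMSG_DATA(nlh);
				printf("seq %u -> %d\n", nlh->nlmsg_seq, e->error);
			}
		}
	}
}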
> You could argue that userspace should bail on the first error it
> receives, but if I’m not mistaken, the kernel will still send an error
> for any subsequent invalid command, meaning the buffer isn’t being
> drained again.
If you open, send the batch, process the response, then close, the
lazy approach that consists of bailing out on the first error is OK.
The close call on the socket implicitly cleans up the ignored pending
error messages in the socket buffer.
But if you keep the socket open for several batches, with the approach
you describe, then unprocessed netlink error messages will pile up in
the socket buffer. If you do not do any sort of sequence tracking,
then your application will process old pending errors as new ones;
libmnl handles this with EILSEQ.
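As a generic example of that tracking (a rough libmnl sketch, untested;
'nl' is an open mnl_socket and 'seq' the sequence number you are
waiting on -- this is not the google/nftables code):
#include <sys/types.h>
#include <libmnl/libmnl.h>

/* Let libmnl validate portid and sequence number, so stale errors left
 * over from a previous batch are not mistaken for replies to this one. */
static int wait_for_reply(struct mnl_socket *nl, unsigned int seq)
{
	char buf[MNL_SOCKET_BUFFER_SIZE];
	unsigned int portid = mnl_socket_get_portid(nl);
	ssize_t n;

	while ((n = mnl_socket_recvfrom(nl, buf, sizeof(buf))) > 0) {
		int ret = mnl_cb_run(buf, n, seq, portid, NULL, NULL);
		if (ret <= 0)	/* 0: done; < 0: error, errno flags a bad sequence */
			return ret;
	}
	return -1;
}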
All these netlink subtle details are not easy to follow :).
> Netlink is an unreliable transport protocol; there are mechanisms to
> > make it "more reliable" but message loss (particularly in the
> > kernel -> userspace direction) is still possible.
>
> Is it unreliable mainly because of those corner cases, or are there
> other factors to consider as well?
As for netlink batching, which is supported in other classic netlink
subsystems, this acknowledgement overrun issue exists; I am referring
to the scenario where you add several netlink messages to the buffer
and send() them to the kernel.
As for nfnetlink, it is a bit special in that it has begin and end
messages, because these are needed for the transaction semantics (to
implement a dry run to test whether an incremental ruleset update is OK).
bf2ac490d28c added the handling for NLM_F_ACK, which I had left
unspecified at the time.
Netlink can also be used for event delivery to userspace; ENOBUFS can
also happen in that case, but that is a different scenario.
TBH, I am trying to remember the details; I don't talk about Netlink
very often.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-08 7:28 ` Florian Westphal
@ 2025-10-08 11:33 ` Pablo Neira Ayuso
2025-10-08 13:35 ` Donald Hunter
0 siblings, 1 reply; 27+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-08 11:33 UTC (permalink / raw)
To: Florian Westphal; +Cc: Nikolaos Gkarlis, netfilter-devel, donald.hunter
On Wed, Oct 08, 2025 at 09:28:26AM +0200, Florian Westphal wrote:
> [ Cc Donald Hunter ]
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Oct 01, 2025 at 11:15:03PM +0200, Nikolaos Gkarlis wrote:
> > > Before ACKs were introduced for batch begin and batch end messages,
> > > userspace expected to receive the same number of ACKs as it sent,
> > > unless a fatal error occurred.
> >
> > Regarding bf2ac490d28c, I don't understand why one needs an ack for
> > _BEGIN message. Maybe, an ack for END message might make sense when
> > BATCH_DONE is reached so you get a confirmation that the batch has
> > been fully processed, however...
>
> ... which (BATCH_DONE reached) would be made moot by this proposed
> patch, as we would ACK it even if it's not reached anymore.
Yes, I am inclined not to add more features to bf2ac490d28c (and the
follow-up fix patches that came with it).
> > I suspect the author of bf2ac490d28c is making wrong assumptions on
> > the number of acknowledgements that are going to be received by
> > userspace.
> >
> > Let's just forget about this bf2ac490d28c for a moment, a quick summary:
> >
> > #1 If you don't set NLM_F_ACK in your netlink messages in the batch
> > (this is what netfilter's userspace does): then errors result in
> > acknowledgement. But ENOBUFS is still possible: this means your batch
> > has resulted in too many acknowledgement messages (errors) filling up
> > the userspace netlink socket buffer.
> > #2 If you set NLM_F_ACK in your netlink messages in the batch:
> > You get one acknowledgement for each message in the batch; with a
> > sufficiently large batch, this may overrun the userspace socket
> > buffer (ENOBUFS), so the kernel may have fully processed the
> > transaction but some of those acks get lost.
>
> Right, the 1:1 relationship between messages and ACKs only holds for a
> theoretical infinite receive buffer, which makes this feature rather
> limited for the batched case.
Exactly.
> > In this particular case, where batching several netlink messages in
> > one single send(), userspace will not process the acknowledgement
> > messages in the userspace socket buffer until the batch is complete.
>
> OK, from what I gather you'd like for
> "netfilter: nfnetlink: always ACK batch end if requested"
> to not be applied.
I think this at least needs more discussion; I think we are now
understanding the implications of bf2ac490d28c.
> I would still like to apply the nfnetlink selftest however (even
> if it has to be trimmed/modified), because it does catch the issue
> fixed by Fernando
> [ 09efbac953f6 ("netfilter: nfnetlink: reset nlh pointer during batch replay") ]:
>
> ok 1 nfnetlink_batch.simple_batch
> # RUN nfnetlink_batch.module_load ...
> # nfnetlink.c:239:module_load:[seq=1759907514] ACK
> # nfnetlink.c:239:module_load:[seq=1759907512] ACK
> # nfnetlink.c:244:module_load:Out of order ack: seq 1759907512 after 1759907514
> # nfnetlink.c:239:module_load:[seq=1759907513] ACK
> # nfnetlink.c:239:module_load:[seq=1759907514] ACK
> # nfnetlink.c:239:module_load:[seq=1759907515] ACK
> # nfnetlink.c:254:module_load:Expected 0 (0) == out_of_order (1)
> # module_load: Test terminated by assertion
> # FAIL nfnetlink_batch.module_load
>
> If the decision is that there should NOT be an ACK for the BATCH_END if
> there was an error, then the test only needs minor adjustment:
>
> - // Expect 5 acks: batch_begin, table, invalid_table(error), chain, batch_end
> - validate_res(self->nl, seq - 5, 5, _metadata);
> + // Expect 4 acks: batch_begin, table, invalid_table(error), chain
> + validate_res(self->nl, seq - 4, 4, _metadata);
>
> So, what is the 'more useful' behaviour? Choices are all equally bad:
>
> 1. If we want to always include it, it might not be there due to
> -ENOBUFS, which will always happen if the batch was large (triggers
> too many acks).
Yes.
> 2. If we only include it on success, it might not be there for the
> same reason, so absence doesn't imply failure.
Yes.
> HOWEVER, if the batch is small enough, then 2) gives slightly more
> usable feedback in the sense that the entire batch was processed.
Yes.
I think Nikolaos pointed out that _BEGIN+NLM_F_ACK could actually
provide an indication, with the assumption that the netlink userspace
queue is going to be empty because it will be the first
acknowledgement...
> So I am leaning towards not applying the nfnetlink patch but applying
> the (adjusted) test case.
>
> Other takes?
Yes, I would start with this approach you propose, then keep
discussing if it makes sense to keep extending bf2ac490d28c or leave
it as is.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-08 11:33 ` Pablo Neira Ayuso
@ 2025-10-08 13:35 ` Donald Hunter
2025-10-08 14:50 ` Florian Westphal
0 siblings, 1 reply; 27+ messages in thread
From: Donald Hunter @ 2025-10-08 13:35 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Florian Westphal, Nikolaos Gkarlis, netfilter-devel
On Wed, 8 Oct 2025 at 12:33, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Wed, Oct 08, 2025 at 09:28:26AM +0200, Florian Westphal wrote:
> > [ Cc Donald Hunter ]
Thanks for the cc. I'm having difficulty catching up with the thread
of conversation, so forgive me if I miss something.
> Yes, I am inclined not to add more features to bf2ac490d28c (and the
> follow-up fix patches that came with it).
Is there a problem with bf2ac490d28c or just with things that have
come after it?
> > > I suspect the author of bf2ac490d28c is making wrong assumptions on
> > > the number of acknowledgements that are going to be received by
> > > userspace.
Hopefully not. In the success case, ask for an ack, get an ack.
Without that guarantee to userspace, we'd need to extend the YNL spec
to say which messages don't honour acks.
> > > Let's just forget about this bf2ac490d28c for a moment, a quick summary:
> > >
> > > #1 If you don't set NLM_F_ACK in your netlink messages in the batch
> > > (this is what netfilter's userspace does): then errors result in
> > > acknowledgement. But ENOBUFS is still possible: this means your batch
> > > has resulted in too many acknowledgement messages (errors) filling up
> > > the userspace netlink socket buffer.
> > > #2 If you set NLM_F_ACK in your netlink messages in the batch:
> > > You get one acknowledgement for each message in the batch; with a
> > > sufficiently large batch, this may overrun the userspace socket
> > > buffer (ENOBUFS), so the kernel may have fully processed the
> > > transaction but some of those acks get lost.
Are people reporting ENOBUFS because of ACKs in practice or is this theoretical?
> > Right, the 1:1 relationship between messages and ACKs only holds for a
> > theoretical infinite receive buffer, which makes this feature rather
> > limited for the batched case.
>
> Exactly.
>
> > > In this particular case, where batching several netlink messages in
> > > one single send(), userspace will not process the acknowledgement
> > > messages in the userspace socket buffer until the batch is complete.
> >
> > OK, from what I gather you'd like for
> > "netfilter: nfnetlink: always ACK batch end if requested"
> > to not be applied.
>
> I think this at least needs more discussion; I think we are now
> understanding the implications of bf2ac490d28c.
Is the problem due to the interplay between ACKs and ERRORs?
I don't think anyone has successfully specified the expected behaviour
when errors get reported, namely how many acks versus how many errors
could be expected. I know that the YNL python library reports the
first error received and terminates, regardless of how many ACKs were
received.
My guess is that it should be possible to report as many errors as
were received within the receive message length before exiting but I
have not tried to implement that.
> > So, what is the 'more useful' behaviour? Choices are all equally bad:
> >
> > 1. If we want to always include it, it might not be there due to
> > -ENOBUFS, which will always happen if the batch was large (triggers
> > too many acks).
>
> Yes.
>
> > 2. If we only include it on success, it might not be there for the
> > same reason, so absence doesn't imply failure.
>
> Yes.
>
> > HOWEVER, if the batch is small enough, then 2) gives slightly more
> > usable feedback in the sense that the entire batch was processed.
>
> Yes.
>
> I think Nikolaos pointed out that _BEGIN+NLM_F_ACK could actually
> provide an indication, with the assumption that the netlink userspace
> queue is going to be empty because it will be the first
> acknowledgement...
>
> > So I am leaning towards not applying the nfnetlink patch but applying
> > the (adjusted) test case.
> >
> > Other takes?
I think we should try to specify the behaviour and see if it makes
sense before layering more functionality onto what is there.
I can describe how the YNL python code sees the world, when it asks for ACKs:
1. An ack for BEGIN, cmd, cmd, ... , END in the success scenario.
2. An ack for BEGIN, cmd, cmd, ... up to the first ERROR in a failure scenario.
> Yes, I would start with this approach you propose, then keep
> discussing if it makes sense to keep extending bf2ac490d28c or leave
> it as is.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-08 13:35 ` Donald Hunter
@ 2025-10-08 14:50 ` Florian Westphal
0 siblings, 0 replies; 27+ messages in thread
From: Florian Westphal @ 2025-10-08 14:50 UTC (permalink / raw)
To: Donald Hunter; +Cc: Pablo Neira Ayuso, Nikolaos Gkarlis, netfilter-devel
Donald Hunter <donald.hunter@gmail.com> wrote:
> Thanks for the cc. I'm having difficulty catching up with the thread
> of conversation, so forgive me if I miss something.
>
> > Yes, I am inclined not to add more features to bf2ac490d28c (and the
> > follow-up fix patches that came with it).
>
> Is there a problem with bf2ac490d28c or just with things that have
> come after it?
it broke the golang library for nftables:
https://github.com/google/nftables/issues/329
Fix is:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git/commit/?id=09efbac953f6f076a07735f9ba885148d4796235
The current discussion centers more on what bf2ac490d28c wants,
what it does, what netlink can provide in practice, and what
userspace would expect/need.
> > > > I suspect the author of bf2ac490d28c is making wrong assumptions on
> > > > the number of acknowledgements that are going to be received by
> > > > userspace.
>
> Hopefully not. In the success case, ask for an ack, get an ack.
> Without that guarantee to userspace, we'd need to extend the YNL spec
> to say which messages don't honour acks.
It's not about the types, it's more about the available buffer space for acks.
> > > > #2 If you set NLM_F_ACK in your netlink messages in the batch:
> > > > You get one acknowledgement for each message in the batch, with a
> > > > sufficiently large batch, this may overrun the userspace socket
> > > > buffer (ENOBUFS), then maybe the kernel was successful to fully
> > > > process the transaction but some of those acks get lost.
>
> Are people reporting ENOBUFS because of ACKs in practice or is this theoretical?
It happens in practice, e.g.:
https://lore.kernel.org/netfilter/aKrK6h2zYdqj2unR@lilyboudoir.localdomain/
(every set element triggers another error message).
Note, this is without NLM_F_ACK, just due to error messages
triggered during batch processing.
> I think we should try to specify the behaviour and see if it makes
> sense before layering more functionality onto what is there.
I think we can all agree on that :-)
> I can describe how the YNL python code sees the world, when it asks for ACKs:
>
> 1. An ack for BEGIN, cmd, cmd, ... , END in the success scenario.
> 2. An ack for BEGIN, cmd, cmd, ... up to the first ERROR in a failure scenario.
Thanks for clarifying.
This should work, IFF the batch is small enough that the receive buffer
can hold all the acks.
nftables doesn't bail out on the first error (the transaction will fail
and no permanent changes are made), but it will move on to the next
contained message.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] netfilter: nfnetlink: always ACK batch end if requested
2025-10-08 11:09 ` Pablo Neira Ayuso
@ 2025-10-08 14:50 ` Nikolaos Gkarlis
0 siblings, 0 replies; 27+ messages in thread
From: Nikolaos Gkarlis @ 2025-10-08 14:50 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw
Hi again,
Thanks for the detailed explanation. For the record, I am not strongly
for or against one or the other approach but I do think that my
approach is something to consider if batch ACKs are to be supported in
the long run.
> In netlink, ENOBUFS is not "fatal"; it means messages got lost, but
> there are still messages in the netlink socket buffer to be
> processed, i.e. the netlink messages before the overrun are still in
> place, but the messages that could not fit into the socket buffer
> have been dropped.
Yes, perhaps in this instance a "delivery" error would be a better
description, but I understand it's not an error in applying the batch.
> For this nfnetlink batching, use select() to poll for pending
> messages to process, and make no assumptions on how many messages
> you receive.
That's an option indeed, although the library that I use,
https://github.com/google/nftables, is unfortunately limited to blocking
recvmsg calls, which is why it has to keep track of the acks it expects
and validate them afterwards. I wonder if there are more cases like that?
In fact, that's the reason why I stumbled upon the duplicate ACK bug
and attempted to use ACKs on batch messages to work around the issue.
Additionally, this library will dynamically adjust its buffer based on the
expected response size.
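For what it's worth, one crude way to lower the ENOBUFS risk from
userspace is to grow the socket receive buffer before sending a big
batch; a sketch with plain setsockopt() follows (illustrative only,
not how that library actually does it):
#include <sys/socket.h>

/* Try to enlarge the netlink socket receive buffer so more acks fit
 * before an overrun. SO_RCVBUFFORCE needs CAP_NET_ADMIN; the plain
 * SO_RCVBUF fallback is silently capped by net.core.rmem_max. */
static int grow_rcvbuf(int fd, int bytes)
{
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE, &bytes, sizeof(bytes)) == 0)
		return 0;
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}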
I appreciate that you probably don't want to make architectural
decisions based on unofficial library limitations, but I thought it
might be a good thing to mention.
To make the discussion clearer, I think we are referring to three
different scenarios; I've outlined them below.
Scenario 1: Success
User sends:
Command1
Command2
Command3
Kernel replies:
Command1 - 0
Command2 - 0
Command3 - 0
Result: The batch is processed successfully.
Scenario 2: Failure
User sends:
Command1
Command2 (invalid)
Command3 (invalid)
Kernel replies:
Command1 - 0
Command2 - EINVAL
Command3 - EINVAL
Note: The reply for Command3 is still received even though Command2
failed.
Scenario 3: Fatal / Permission Error
User sends:
Command1
Command2
Command3
Kernel replies:
Command1 - EPERM
Result: Indicates a fatal error. Processing stops immediately.
Now let's add BATCH_BEGIN and BATCH_END into the mix.
Scenario 1: Success with Batch
User sends:
BATCH_BEGIN
Command1
Command2
Command3
BATCH_END
Kernel replies:
BATCH_BEGIN
Command1
Command2
Command3
BATCH_END
Result: Consistent behavior. Full batch completed successfully.
Scenario 2: Failure with Batch
User sends:
BATCH_BEGIN
Command1
Command2 (invalid)
Command3 (invalid)
BATCH_END
Kernel replies:
BATCH_BEGIN
Command1 - 0
Command2 - EINVAL
Command3 - EINVAL
Observation: This seems inconsistent with earlier behavior. I would
expect a BATCH_END reply here as well to indicate that the batch
processing is complete, even if errors occurred.
Scenario 3: Fatal Error with Batch
User sends:
BATCH_BEGIN
Command1
Command2
Command3
BATCH_END
Kernel replies:
Command1 - EPERM
Result: Matches the previous "fatal" case, which I believe is the part
you're pointing out does not make sense and would require further
changes that you don't think are a good idea to make.
I’m not trying to push this change, since I don’t rely on batch ACKs.
Just want to make sure we’re on the same page.
Let me know how you’d like to proceed and whether I should update the
test patch.
Thanks!
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-10-08 14:50 UTC | newest]
Thread overview: 27+ messages
2025-10-01 21:15 [PATCH] netfilter: nfnetlink: always ACK batch end if requested Nikolaos Gkarlis
2025-10-02 9:48 ` Fernando Fernandez Mancera
2025-10-02 10:41 ` Nikolaos Gkarlis
2025-10-02 11:03 ` Fernando Fernandez Mancera
2025-10-04 9:26 ` [PATCH v2 0/2] " Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 1/2] netfilter: nfnetlink: " Nikolaos Gkarlis
2025-10-04 9:26 ` [PATCH v2 2/2] selftests: netfilter: add nfnetlink ACK handling tests Nikolaos Gkarlis
2025-10-04 10:46 ` Florian Westphal
2025-10-04 11:08 ` Nikolaos Gkarlis
2025-10-04 12:26 ` Florian Westphal
2025-10-05 10:43 ` Nikolaos Gkarlis
2025-10-05 11:42 ` Florian Westphal
2025-10-05 12:54 ` [PATCH v3] " Nikolaos Gkarlis
2025-10-08 10:26 ` Florian Westphal
2025-10-08 10:37 ` Nikolaos Gkarlis
2025-10-08 10:39 ` Florian Westphal
2025-10-04 9:38 ` [PATCH v2 0/2] always ACK batch end if requested Nikolaos Gkarlis
2025-10-02 10:10 ` [PATCH] netfilter: nfnetlink: " Florian Westphal
2025-10-02 10:46 ` Nikolaos Gkarlis
2025-10-07 20:33 ` Pablo Neira Ayuso
2025-10-08 7:28 ` Florian Westphal
2025-10-08 11:33 ` Pablo Neira Ayuso
2025-10-08 13:35 ` Donald Hunter
2025-10-08 14:50 ` Florian Westphal
2025-10-08 8:41 ` Nikolaos Gkarlis
2025-10-08 11:09 ` Pablo Neira Ayuso
2025-10-08 14:50 ` Nikolaos Gkarlis