* [PATCH net-next 0/8] netfilter: updates for net-next
@ 2025-09-01 8:08 Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 1/8] netfilter: ebtables: Use vmalloc_array() to improve code Florian Westphal
` (8 more replies)
0 siblings, 9 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net-next*:
1) prefer vmalloc_array in ebtables, from Qianfeng Rong.
2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.
3+4) Get rid of GFP_ATOMIC in transaction object allocations, those
cause silly failures with large sets under memory pressure, from
myself.
5) Introduce new NFTA_DEVICE_PREFIX attribute in nftables netlink api,
re-using old NFTA_DEVICE_NAME led to confusion with different
kernel/userspace versions. This refines the wildcard interface
support added in 6.16 release. From Phil Sutter.
6) Remove test for AVX cpu feature in nftables pipapo set type,
testing for AVX2 feature is sufficient.
7) Unexport a few function in nf_reject infra: no external callers.
8) Extend payload offset to u16, this was restricted to values <=255
so far, from Fernando Fernandez Mancera.
Please, pull these changes from:
The following changes since commit 864ecc4a6dade82d3f70eab43dad0e277aa6fc78:
Merge branch 'net-add-rcu-safety-to-dst-dev' (2025-08-29 19:36:34 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-25-09-01
for you to fetch changes up to 0618948e58e09e1ebf59078bf5b7841bbd1ce1d2:
netfilter: nft_payload: extend offset to 65535 bytes (2025-09-01 09:53:17 +0200)
----------------------------------------------------------------
netfilter pull request nf-next-25-09-01
----------------------------------------------------------------
Christophe Leroy (1):
netfilter: nft_payload: Use csum_replace4() instead of opencoding
Fernando Fernandez Mancera (1):
netfilter: nft_payload: extend offset to 65535 bytes
Florian Westphal (4):
netfilter: nf_tables: allow iter callbacks to sleep
netfilter: nf_tables: all transaction allocations can now sleep
netfilter: nft_set_pipapo: remove redundant test for avx feature bit
netfilter: nf_reject: remove unneeded exports
Phil Sutter (1):
netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
Qianfeng Rong (1):
netfilter: ebtables: Use vmalloc_array() to improve code
include/net/netfilter/ipv4/nf_reject.h | 8 --
include/net/netfilter/ipv6/nf_reject.h | 10 ---
include/net/netfilter/nf_tables.h | 2 +
include/net/netfilter/nf_tables_core.h | 2 +-
include/uapi/linux/netfilter/nf_tables.h | 2 +
net/bridge/netfilter/ebtables.c | 14 ++--
net/ipv4/netfilter/nf_reject_ipv4.c | 27 +++---
net/ipv6/netfilter/nf_reject_ipv6.c | 37 ++++++---
net/netfilter/nf_tables_api.c | 89 +++++++++++---------
net/netfilter/nft_payload.c | 20 +++--
net/netfilter/nft_set_hash.c | 100 ++++++++++++++++++++++-
net/netfilter/nft_set_pipapo.c | 3 +-
net/netfilter/nft_set_pipapo_avx2.c | 2 +-
net/netfilter/nft_set_rbtree.c | 35 ++++++--
14 files changed, 242 insertions(+), 109 deletions(-)
--
2.49.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 1/8] netfilter: ebtables: Use vmalloc_array() to improve code
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 2/8] netfilter: nft_payload: Use csum_replace4() instead of opencoding Florian Westphal
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Qianfeng Rong <rongqianfeng@vivo.com>
Remove array_size() calls and replace vmalloc() with vmalloc_array() to
simplify the code. vmalloc_array() is also optimized better, uses fewer
instructions, and handles overflow more concisely[1].
[1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/bridge/netfilter/ebtables.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 3e67d4aff419..5697e3949a36 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -920,8 +920,8 @@ static int translate_table(struct net *net, const char *name,
* if an error occurs
*/
newinfo->chainstack =
- vmalloc(array_size(nr_cpu_ids,
- sizeof(*(newinfo->chainstack))));
+ vmalloc_array(nr_cpu_ids,
+ sizeof(*(newinfo->chainstack)));
if (!newinfo->chainstack)
return -ENOMEM;
for_each_possible_cpu(i) {
@@ -938,7 +938,7 @@ static int translate_table(struct net *net, const char *name,
}
}
- cl_s = vmalloc(array_size(udc_cnt, sizeof(*cl_s)));
+ cl_s = vmalloc_array(udc_cnt, sizeof(*cl_s));
if (!cl_s)
return -ENOMEM;
i = 0; /* the i'th udc */
@@ -1018,8 +1018,8 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
* the check on the size is done later, when we have the lock
*/
if (repl->num_counters) {
- unsigned long size = repl->num_counters * sizeof(*counterstmp);
- counterstmp = vmalloc(size);
+ counterstmp = vmalloc_array(repl->num_counters,
+ sizeof(*counterstmp));
if (!counterstmp)
return -ENOMEM;
}
@@ -1386,7 +1386,7 @@ static int do_update_counters(struct net *net, const char *name,
if (num_counters == 0)
return -EINVAL;
- tmp = vmalloc(array_size(num_counters, sizeof(*tmp)));
+ tmp = vmalloc_array(num_counters, sizeof(*tmp));
if (!tmp)
return -ENOMEM;
@@ -1526,7 +1526,7 @@ static int copy_counters_to_user(struct ebt_table *t,
if (num_counters != nentries)
return -EINVAL;
- counterstmp = vmalloc(array_size(nentries, sizeof(*counterstmp)));
+ counterstmp = vmalloc_array(nentries, sizeof(*counterstmp));
if (!counterstmp)
return -ENOMEM;
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 2/8] netfilter: nft_payload: Use csum_replace4() instead of opencoding
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 1/8] netfilter: ebtables: Use vmalloc_array() to improve code Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 3/8] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Christophe Leroy <christophe.leroy@csgroup.eu>
Open coded calculation can be avoided and replaced by the
equivalent csum_replace4() in nft_csum_replace().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_payload.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 7dfc5343dae4..059b28ffad0e 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -684,7 +684,7 @@ static const struct nft_expr_ops nft_payload_inner_ops = {
static inline void nft_csum_replace(__sum16 *sum, __wsum fsum, __wsum tsum)
{
- *sum = csum_fold(csum_add(csum_sub(~csum_unfold(*sum), fsum), tsum));
+ csum_replace4(sum, (__force __be32)fsum, (__force __be32)tsum);
if (*sum == 0)
*sum = CSUM_MANGLED_0;
}
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 3/8] netfilter: nf_tables: allow iter callbacks to sleep
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 1/8] netfilter: ebtables: Use vmalloc_array() to improve code Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 2/8] netfilter: nft_payload: Use csum_replace4() instead of opencoding Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 4/8] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Quoting Sven Auhagen:
we do see on occasions that we get the following error message, more so on
x86 systems than on arm64:
Error: Could not process rule: Cannot allocate memory delete table inet filter
It is not a consistent error and does not happen all the time.
We are on Kernel 6.6.80, seems to me like we have something along the lines
of the nf_tables: allow clone callbacks to sleep problem using GFP_ATOMIC.
As hinted at by Sven, this is because of GFP_ATOMIC allocations during
set flush.
When set is flushed, all elements are deactivated. This triggers a set
walk and each element gets added to the transaction list.
The rbtree and rhashtable sets don't allow the iter callback to sleep:
rbtree walk acquires read side of an rwlock with bh disabled, rhashtable
walk happens with rcu read lock held.
Rbtree is simple enough to resolve:
When the walk context is ITER_READ, no change is needed (the iter
callback must not deactivate elements; we're not in a transaction).
When the iter type is ITER_UPDATE, the rwlock isn't needed because the
caller holds the transaction mutex, this prevents any and all changes to
the ruleset, including add/remove of set elements.
Rhashtable is slightly more complex.
When the iter type is ITER_READ, no change is needed, like rbtree.
For ITER_UPDATE, we hold transaction mutex which prevents elements from
getting free'd, even outside of rcu read lock section.
So build a temporary list of all elements while doing the rcu iteration
and then call the iterator in a second pass.
The disadvantage is the need to iterate twice, but this cost comes with
the benefit to allow the iter callback to use GFP_KERNEL allocations in
a followup patch.
The new list based logic makes it necessary to catch recursive calls to
the same set earlier.
Such walk -> iter -> walk recursion for the same set can happen during
ruleset validation in case userspace gave us a bogus (cyclic) ruleset
where verdict map m jumps to chain that sooner or later also calls
"vmap @m".
Before the new ->in_update_walk test, the ruleset is rejected because the
infinite recursion causes ctx->level to exceed the allowed maximum.
But with the new logic added here, elements would get skipped:
nft_rhash_walk_update would see elements that are on the walk_list of
an older stack frame.
As all recursive calls into same map results in -EMLINK, we can avoid this
problem by using the new in_update_walk flag and reject immediately.
Next patch converts the problematic GFP_ATOMIC allocations.
Reported-by: Sven Auhagen <Sven.Auhagen@belden.com>
Closes: https://lore.kernel.org/netfilter-devel/BY1PR18MB5874110CAFF1ED098D0BC4E7E07BA@BY1PR18MB5874.namprd18.prod.outlook.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 2 +
net/netfilter/nft_set_hash.c | 100 +++++++++++++++++++++++++++++-
net/netfilter/nft_set_rbtree.c | 35 ++++++++---
3 files changed, 126 insertions(+), 11 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 891e43a01bdc..e2128663b160 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -556,6 +556,7 @@ struct nft_set_elem_expr {
* @size: maximum set size
* @field_len: length of each field in concatenation, bytes
* @field_count: number of concatenated fields in element
+ * @in_update_walk: true during ->walk() in transaction phase
* @use: number of rules references to this set
* @nelems: number of elements
* @ndeact: number of deactivated elements queued for removal
@@ -590,6 +591,7 @@ struct nft_set {
u32 size;
u8 field_len[NFT_REG32_COUNT];
u8 field_count;
+ bool in_update_walk;
u32 use;
atomic_t nelems;
u32 ndeact;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 266d0c637225..ba01ce75d6de 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -30,6 +30,7 @@ struct nft_rhash {
struct nft_rhash_elem {
struct nft_elem_priv priv;
struct rhash_head node;
+ struct llist_node walk_node;
u32 wq_gc_seq;
struct nft_set_ext ext;
};
@@ -144,6 +145,7 @@ nft_rhash_update(struct nft_set *set, const u32 *key,
goto err1;
he = nft_elem_priv_cast(elem_priv);
+ init_llist_node(&he->walk_node);
prev = rhashtable_lookup_get_insert_key(&priv->ht, &arg, &he->node,
nft_rhash_params);
if (IS_ERR(prev))
@@ -180,6 +182,7 @@ static int nft_rhash_insert(const struct net *net, const struct nft_set *set,
};
struct nft_rhash_elem *prev;
+ init_llist_node(&he->walk_node);
prev = rhashtable_lookup_get_insert_key(&priv->ht, &arg, &he->node,
nft_rhash_params);
if (IS_ERR(prev))
@@ -261,12 +264,12 @@ static bool nft_rhash_delete(const struct nft_set *set,
return true;
}
-static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
- struct nft_set_iter *iter)
+static void nft_rhash_walk_ro(const struct nft_ctx *ctx, struct nft_set *set,
+ struct nft_set_iter *iter)
{
struct nft_rhash *priv = nft_set_priv(set);
- struct nft_rhash_elem *he;
struct rhashtable_iter hti;
+ struct nft_rhash_elem *he;
rhashtable_walk_enter(&priv->ht, &hti);
rhashtable_walk_start(&hti);
@@ -295,6 +298,97 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
rhashtable_walk_exit(&hti);
}
+static void nft_rhash_walk_update(const struct nft_ctx *ctx,
+ struct nft_set *set,
+ struct nft_set_iter *iter)
+{
+ struct nft_rhash *priv = nft_set_priv(set);
+ struct nft_rhash_elem *he, *tmp;
+ struct llist_node *first_node;
+ struct rhashtable_iter hti;
+ LLIST_HEAD(walk_list);
+
+ lockdep_assert_held(&nft_pernet(ctx->net)->commit_mutex);
+
+ if (set->in_update_walk) {
+ /* This can happen with bogus rulesets during ruleset validation
+ * when a verdict map causes a jump back to the same map.
+ *
+ * Without this extra check the walk_next loop below will see
+ * elems on the callers walk_list and skip (not validate) them.
+ */
+ iter->err = -EMLINK;
+ return;
+ }
+
+ /* walk happens under RCU.
+ *
+ * We create a snapshot list so ->iter callback can sleep.
+ * commit_mutex is held, elements can ...
+ * .. be added in parallel from dataplane (dynset)
+ * .. be marked as dead in parallel from dataplane (dynset).
+ * .. be queued for removal in parallel (gc timeout).
+ * .. not be freed: transaction mutex is held.
+ */
+ rhashtable_walk_enter(&priv->ht, &hti);
+ rhashtable_walk_start(&hti);
+
+ while ((he = rhashtable_walk_next(&hti))) {
+ if (IS_ERR(he)) {
+ if (PTR_ERR(he) != -EAGAIN) {
+ iter->err = PTR_ERR(he);
+ break;
+ }
+
+ continue;
+ }
+
+ /* rhashtable resized during walk, skip */
+ if (llist_on_list(&he->walk_node))
+ continue;
+
+ llist_add(&he->walk_node, &walk_list);
+ }
+ rhashtable_walk_stop(&hti);
+ rhashtable_walk_exit(&hti);
+
+ first_node = __llist_del_all(&walk_list);
+ set->in_update_walk = true;
+ llist_for_each_entry_safe(he, tmp, first_node, walk_node) {
+ if (iter->err == 0) {
+ iter->err = iter->fn(ctx, set, iter, &he->priv);
+ if (iter->err == 0)
+ iter->count++;
+ }
+
+ /* all entries must be cleared again, else next ->walk iteration
+ * will skip entries.
+ */
+ init_llist_node(&he->walk_node);
+ }
+ set->in_update_walk = false;
+}
+
+static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
+ struct nft_set_iter *iter)
+{
+ switch (iter->type) {
+ case NFT_ITER_UPDATE:
+ /* only relevant for netlink dumps which use READ type */
+ WARN_ON_ONCE(iter->skip != 0);
+
+ nft_rhash_walk_update(ctx, set, iter);
+ break;
+ case NFT_ITER_READ:
+ nft_rhash_walk_ro(ctx, set, iter);
+ break;
+ default:
+ iter->err = -EINVAL;
+ WARN_ON_ONCE(1);
+ break;
+ }
+}
+
static bool nft_rhash_expr_needs_gc_run(const struct nft_set *set,
struct nft_set_ext *ext)
{
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 938a257c069e..b311b66df3e9 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -584,15 +584,14 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
return NULL;
}
-static void nft_rbtree_walk(const struct nft_ctx *ctx,
- struct nft_set *set,
- struct nft_set_iter *iter)
+static void nft_rbtree_do_walk(const struct nft_ctx *ctx,
+ struct nft_set *set,
+ struct nft_set_iter *iter)
{
struct nft_rbtree *priv = nft_set_priv(set);
struct nft_rbtree_elem *rbe;
struct rb_node *node;
- read_lock_bh(&priv->lock);
for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
rbe = rb_entry(node, struct nft_rbtree_elem, node);
@@ -600,14 +599,34 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
goto cont;
iter->err = iter->fn(ctx, set, iter, &rbe->priv);
- if (iter->err < 0) {
- read_unlock_bh(&priv->lock);
+ if (iter->err < 0)
return;
- }
cont:
iter->count++;
}
- read_unlock_bh(&priv->lock);
+}
+
+static void nft_rbtree_walk(const struct nft_ctx *ctx,
+ struct nft_set *set,
+ struct nft_set_iter *iter)
+{
+ struct nft_rbtree *priv = nft_set_priv(set);
+
+ switch (iter->type) {
+ case NFT_ITER_UPDATE:
+ lockdep_assert_held(&nft_pernet(ctx->net)->commit_mutex);
+ nft_rbtree_do_walk(ctx, set, iter);
+ break;
+ case NFT_ITER_READ:
+ read_lock_bh(&priv->lock);
+ nft_rbtree_do_walk(ctx, set, iter);
+ read_unlock_bh(&priv->lock);
+ break;
+ default:
+ iter->err = -EINVAL;
+ WARN_ON_ONCE(1);
+ break;
+ }
}
static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set,
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 4/8] netfilter: nf_tables: all transaction allocations can now sleep
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (2 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 3/8] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX Florian Westphal
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Now that nft_setelem_flush is not called with rcu read lock held or
disabled softinterrupts anymore this can now use GFP_KERNEL too.
This is the last atomic allocation of transaction elements, so remove
all gfp_t arguments and the wrapper function.
This makes attempts to delete large sets much more reliable, before
this was prone to transient memory allocation failures.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_tables_api.c | 47 ++++++++++++++---------------------
1 file changed, 19 insertions(+), 28 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 58c5425d61c2..54519b3d2868 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -151,12 +151,12 @@ static void nft_ctx_init(struct nft_ctx *ctx,
bitmap_zero(ctx->reg_inited, NFT_REG32_NUM);
}
-static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx,
- int msg_type, u32 size, gfp_t gfp)
+static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
+ int msg_type, u32 size)
{
struct nft_trans *trans;
- trans = kzalloc(size, gfp);
+ trans = kzalloc(size, GFP_KERNEL);
if (trans == NULL)
return NULL;
@@ -172,12 +172,6 @@ static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx,
return trans;
}
-static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
- int msg_type, u32 size)
-{
- return nft_trans_alloc_gfp(ctx, msg_type, size, GFP_KERNEL);
-}
-
static struct nft_trans_binding *nft_trans_get_binding(struct nft_trans *trans)
{
switch (trans->msg_type) {
@@ -442,8 +436,7 @@ static bool nft_trans_collapse_set_elem_allowed(const struct nft_trans_elem *a,
static bool nft_trans_collapse_set_elem(struct nftables_pernet *nft_net,
struct nft_trans_elem *tail,
- struct nft_trans_elem *trans,
- gfp_t gfp)
+ struct nft_trans_elem *trans)
{
unsigned int nelems, old_nelems = tail->nelems;
struct nft_trans_elem *new_trans;
@@ -466,9 +459,11 @@ static bool nft_trans_collapse_set_elem(struct nftables_pernet *nft_net,
/* krealloc might free tail which invalidates list pointers */
list_del_init(&tail->nft_trans.list);
- new_trans = krealloc(tail, struct_size(tail, elems, nelems), gfp);
+ new_trans = krealloc(tail, struct_size(tail, elems, nelems),
+ GFP_KERNEL);
if (!new_trans) {
- list_add_tail(&tail->nft_trans.list, &nft_net->commit_list);
+ list_add_tail(&tail->nft_trans.list,
+ &nft_net->commit_list);
return false;
}
@@ -484,7 +479,7 @@ static bool nft_trans_collapse_set_elem(struct nftables_pernet *nft_net,
}
static bool nft_trans_try_collapse(struct nftables_pernet *nft_net,
- struct nft_trans *trans, gfp_t gfp)
+ struct nft_trans *trans)
{
struct nft_trans *tail;
@@ -501,7 +496,7 @@ static bool nft_trans_try_collapse(struct nftables_pernet *nft_net,
case NFT_MSG_DELSETELEM:
return nft_trans_collapse_set_elem(nft_net,
nft_trans_container_elem(tail),
- nft_trans_container_elem(trans), gfp);
+ nft_trans_container_elem(trans));
}
return false;
@@ -537,17 +532,14 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr
}
}
-static void nft_trans_commit_list_add_elem(struct net *net, struct nft_trans *trans,
- gfp_t gfp)
+static void nft_trans_commit_list_add_elem(struct net *net, struct nft_trans *trans)
{
struct nftables_pernet *nft_net = nft_pernet(net);
WARN_ON_ONCE(trans->msg_type != NFT_MSG_NEWSETELEM &&
trans->msg_type != NFT_MSG_DELSETELEM);
- might_alloc(gfp);
-
- if (nft_trans_try_collapse(nft_net, trans, gfp)) {
+ if (nft_trans_try_collapse(nft_net, trans)) {
kfree(trans);
return;
}
@@ -7549,7 +7541,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
}
ue->priv = elem_priv;
- nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
+ nft_trans_commit_list_add_elem(ctx->net, trans);
goto err_elem_free;
}
}
@@ -7573,7 +7565,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
}
nft_trans_container_elem(trans)->elems[0].priv = elem.priv;
- nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
+ nft_trans_commit_list_add_elem(ctx->net, trans);
return 0;
err_set_full:
@@ -7839,7 +7831,7 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
nft_setelem_data_deactivate(ctx->net, set, elem.priv);
nft_trans_container_elem(trans)->elems[0].priv = elem.priv;
- nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
+ nft_trans_commit_list_add_elem(ctx->net, trans);
return 0;
fail_ops:
@@ -7864,9 +7856,8 @@ static int nft_setelem_flush(const struct nft_ctx *ctx,
if (!nft_set_elem_active(ext, iter->genmask))
return 0;
- trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM,
- struct_size_t(struct nft_trans_elem, elems, 1),
- GFP_ATOMIC);
+ trans = nft_trans_alloc(ctx, NFT_MSG_DELSETELEM,
+ struct_size_t(struct nft_trans_elem, elems, 1));
if (!trans)
return -ENOMEM;
@@ -7877,7 +7868,7 @@ static int nft_setelem_flush(const struct nft_ctx *ctx,
nft_trans_elem_set(trans) = set;
nft_trans_container_elem(trans)->nelems = 1;
nft_trans_container_elem(trans)->elems[0].priv = elem_priv;
- nft_trans_commit_list_add_elem(ctx->net, trans, GFP_ATOMIC);
+ nft_trans_commit_list_add_elem(ctx->net, trans);
return 0;
}
@@ -7894,7 +7885,7 @@ static int __nft_set_catchall_flush(const struct nft_ctx *ctx,
nft_setelem_data_deactivate(ctx->net, set, elem_priv);
nft_trans_container_elem(trans)->elems[0].priv = elem_priv;
- nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
+ nft_trans_commit_list_add_elem(ctx->net, trans);
return 0;
}
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (3 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 4/8] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 20:46 ` Jakub Kicinski
2025-09-01 8:08 ` [PATCH net-next 6/8] netfilter: nft_set_pipapo: remove redundant test for avx feature bit Florian Westphal
` (3 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Phil Sutter <phil@nwl.cc>
This new attribute is supposed to be used instead of NFTA_DEVICE_NAME
for simple wildcard interface specs. It holds a NUL-terminated string
representing an interface name prefix to match on.
While kernel code to distinguish full names from prefixes in
NFTA_DEVICE_NAME is simpler than this solution, reusing the existing
attribute with different semantics leads to confusion between different
versions of kernel and user space though:
* With old kernels, wildcards submitted by user space are accepted yet
silently treated as regular names.
* With old user space, wildcards submitted by kernel may cause crashes
since libnftnl expects NUL-termination when there is none.
Using a distinct attribute type sanitizes these situations as the
receiving part detects and rejects the unexpected attribute nested in
*_HOOK_DEVS attributes.
Fixes: 6d07a289504a ("netfilter: nf_tables: Support wildcard netdev hook specs")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/uapi/linux/netfilter/nf_tables.h | 2 ++
net/netfilter/nf_tables_api.c | 42 +++++++++++++++++-------
2 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 2beb30be2c5f..8e0eb832bc01 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -1784,10 +1784,12 @@ enum nft_synproxy_attributes {
* enum nft_device_attributes - nf_tables device netlink attributes
*
* @NFTA_DEVICE_NAME: name of this device (NLA_STRING)
+ * @NFTA_DEVICE_PREFIX: device name prefix, a simple wildcard (NLA_STRING)
*/
enum nft_devices_attributes {
NFTA_DEVICE_UNSPEC,
NFTA_DEVICE_NAME,
+ NFTA_DEVICE_PREFIX,
__NFTA_DEVICE_MAX
};
#define NFTA_DEVICE_MAX (__NFTA_DEVICE_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 54519b3d2868..68e273d8821a 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1951,6 +1951,18 @@ static int nft_dump_stats(struct sk_buff *skb, struct nft_stats __percpu *stats)
return -ENOSPC;
}
+static bool hook_is_prefix(struct nft_hook *hook)
+{
+ return strlen(hook->ifname) >= hook->ifnamelen;
+}
+
+static int nft_nla_put_hook_dev(struct sk_buff *skb, struct nft_hook *hook)
+{
+ int attr = hook_is_prefix(hook) ? NFTA_DEVICE_PREFIX : NFTA_DEVICE_NAME;
+
+ return nla_put_string(skb, attr, hook->ifname);
+}
+
static int nft_dump_basechain_hook(struct sk_buff *skb,
const struct net *net, int family,
const struct nft_base_chain *basechain,
@@ -1982,16 +1994,15 @@ static int nft_dump_basechain_hook(struct sk_buff *skb,
if (!first)
first = hook;
- if (nla_put(skb, NFTA_DEVICE_NAME,
- hook->ifnamelen, hook->ifname))
+ if (nft_nla_put_hook_dev(skb, hook))
goto nla_put_failure;
n++;
}
nla_nest_end(skb, nest_devs);
if (n == 1 &&
- nla_put(skb, NFTA_HOOK_DEV,
- first->ifnamelen, first->ifname))
+ !hook_is_prefix(first) &&
+ nla_put_string(skb, NFTA_HOOK_DEV, first->ifname))
goto nla_put_failure;
}
nla_nest_end(skb, nest);
@@ -2302,7 +2313,8 @@ void nf_tables_chain_destroy(struct nft_chain *chain)
}
static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
- const struct nlattr *attr)
+ const struct nlattr *attr,
+ bool prefix)
{
struct nf_hook_ops *ops;
struct net_device *dev;
@@ -2319,7 +2331,8 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
if (err < 0)
goto err_hook_free;
- hook->ifnamelen = nla_len(attr);
+ /* include the terminating NUL-char when comparing non-prefixes */
+ hook->ifnamelen = strlen(hook->ifname) + !prefix;
/* nf_tables_netdev_event() is called under rtnl_mutex, this is
* indirectly serializing all the other holders of the commit_mutex with
@@ -2366,14 +2379,22 @@ static int nf_tables_parse_netdev_hooks(struct net *net,
struct nft_hook *hook, *next;
const struct nlattr *tmp;
int rem, n = 0, err;
+ bool prefix;
nla_for_each_nested(tmp, attr, rem) {
- if (nla_type(tmp) != NFTA_DEVICE_NAME) {
+ switch (nla_type(tmp)) {
+ case NFTA_DEVICE_NAME:
+ prefix = false;
+ break;
+ case NFTA_DEVICE_PREFIX:
+ prefix = true;
+ break;
+ default:
err = -EINVAL;
goto err_hook;
}
- hook = nft_netdev_hook_alloc(net, tmp);
+ hook = nft_netdev_hook_alloc(net, tmp, prefix);
if (IS_ERR(hook)) {
NL_SET_BAD_ATTR(extack, tmp);
err = PTR_ERR(hook);
@@ -2419,7 +2440,7 @@ static int nft_chain_parse_netdev(struct net *net, struct nlattr *tb[],
int err;
if (tb[NFTA_HOOK_DEV]) {
- hook = nft_netdev_hook_alloc(net, tb[NFTA_HOOK_DEV]);
+ hook = nft_netdev_hook_alloc(net, tb[NFTA_HOOK_DEV], false);
if (IS_ERR(hook)) {
NL_SET_BAD_ATTR(extack, tb[NFTA_HOOK_DEV]);
return PTR_ERR(hook);
@@ -9449,8 +9470,7 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
list_for_each_entry_rcu(hook, hook_list, list,
lockdep_commit_lock_is_held(net)) {
- if (nla_put(skb, NFTA_DEVICE_NAME,
- hook->ifnamelen, hook->ifname))
+ if (nft_nla_put_hook_dev(skb, hook))
goto nla_put_failure;
}
nla_nest_end(skb, nest_devs);
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 6/8] netfilter: nft_set_pipapo: remove redundant test for avx feature bit
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (4 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 7/8] netfilter: nf_reject: remove unneeded exports Florian Westphal
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Sebastian points out that avx2 depends on avx, see check_cpufeature_deps()
in arch/x86/kernel/cpu/cpuid-deps.c:
avx2 feature bit will be cleared when avx isn't available.
No functional change intended.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_pipapo.c | 3 +--
net/netfilter/nft_set_pipapo_avx2.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index b385cfcf886f..4b64c3bd8e70 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -530,8 +530,7 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
local_bh_disable();
#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
- if (boot_cpu_has(X86_FEATURE_AVX2) && boot_cpu_has(X86_FEATURE_AVX) &&
- irq_fpu_usable()) {
+ if (boot_cpu_has(X86_FEATURE_AVX2) && irq_fpu_usable()) {
e = pipapo_get_avx2(m, data, genmask, tstamp);
local_bh_enable();
return e;
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 29326f3fcaf3..7559306d0aed 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1099,7 +1099,7 @@ bool nft_pipapo_avx2_estimate(const struct nft_set_desc *desc, u32 features,
desc->field_count < NFT_PIPAPO_MIN_FIELDS)
return false;
- if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_AVX))
+ if (!boot_cpu_has(X86_FEATURE_AVX2))
return false;
est->size = pipapo_estimate_size(desc);
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 7/8] netfilter: nf_reject: remove unneeded exports
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (5 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 6/8] netfilter: nft_set_pipapo: remove redundant test for avx feature bit Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 8/8] netfilter: nft_payload: extend offset to 65535 bytes Florian Westphal
2025-09-02 10:53 ` [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
These functions have no external callers and can be static.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/ipv4/nf_reject.h | 8 ------
include/net/netfilter/ipv6/nf_reject.h | 10 -------
net/ipv4/netfilter/nf_reject_ipv4.c | 27 ++++++++++++-------
net/ipv6/netfilter/nf_reject_ipv6.c | 37 +++++++++++++++++---------
4 files changed, 42 insertions(+), 40 deletions(-)
diff --git a/include/net/netfilter/ipv4/nf_reject.h b/include/net/netfilter/ipv4/nf_reject.h
index c653fcb88354..09de2f2686b5 100644
--- a/include/net/netfilter/ipv4/nf_reject.h
+++ b/include/net/netfilter/ipv4/nf_reject.h
@@ -10,14 +10,6 @@
void nf_send_unreach(struct sk_buff *skb_in, int code, int hook);
void nf_send_reset(struct net *net, struct sock *, struct sk_buff *oldskb,
int hook);
-const struct tcphdr *nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
- struct tcphdr *_oth, int hook);
-struct iphdr *nf_reject_iphdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- __u8 protocol, int ttl);
-void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
- const struct tcphdr *oth);
-
struct sk_buff *nf_reject_skb_v4_unreach(struct net *net,
struct sk_buff *oldskb,
const struct net_device *dev,
diff --git a/include/net/netfilter/ipv6/nf_reject.h b/include/net/netfilter/ipv6/nf_reject.h
index d729344ba644..94ec0b9f2838 100644
--- a/include/net/netfilter/ipv6/nf_reject.h
+++ b/include/net/netfilter/ipv6/nf_reject.h
@@ -9,16 +9,6 @@ void nf_send_unreach6(struct net *net, struct sk_buff *skb_in, unsigned char cod
unsigned int hooknum);
void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb,
int hook);
-const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
- struct tcphdr *otcph,
- unsigned int *otcplen, int hook);
-struct ipv6hdr *nf_reject_ip6hdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- __u8 protocol, int hoplimit);
-void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- const struct tcphdr *oth, unsigned int otcplen);
-
struct sk_buff *nf_reject_skb_v6_tcp_reset(struct net *net,
struct sk_buff *oldskb,
const struct net_device *dev,
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index 0d3cb2ba6fc8..05631abe3f0d 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -12,6 +12,15 @@
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_bridge.h>
+static struct iphdr *nf_reject_iphdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ __u8 protocol, int ttl);
+static void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
+ const struct tcphdr *oth);
+static const struct tcphdr *
+nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
+ struct tcphdr *_oth, int hook);
+
static int nf_reject_iphdr_validate(struct sk_buff *skb)
{
struct iphdr *iph;
@@ -136,8 +145,9 @@ struct sk_buff *nf_reject_skb_v4_unreach(struct net *net,
}
EXPORT_SYMBOL_GPL(nf_reject_skb_v4_unreach);
-const struct tcphdr *nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
- struct tcphdr *_oth, int hook)
+static const struct tcphdr *
+nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
+ struct tcphdr *_oth, int hook)
{
const struct tcphdr *oth;
@@ -163,11 +173,10 @@ const struct tcphdr *nf_reject_ip_tcphdr_get(struct sk_buff *oldskb,
return oth;
}
-EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_get);
-struct iphdr *nf_reject_iphdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- __u8 protocol, int ttl)
+static struct iphdr *nf_reject_iphdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ __u8 protocol, int ttl)
{
struct iphdr *niph, *oiph = ip_hdr(oldskb);
@@ -188,10 +197,9 @@ struct iphdr *nf_reject_iphdr_put(struct sk_buff *nskb,
return niph;
}
-EXPORT_SYMBOL_GPL(nf_reject_iphdr_put);
-void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
- const struct tcphdr *oth)
+static void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
+ const struct tcphdr *oth)
{
struct iphdr *niph = ip_hdr(nskb);
struct tcphdr *tcph;
@@ -218,7 +226,6 @@ void nf_reject_ip_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb,
nskb->csum_start = (unsigned char *)tcph - nskb->head;
nskb->csum_offset = offsetof(struct tcphdr, check);
}
-EXPORT_SYMBOL_GPL(nf_reject_ip_tcphdr_put);
static int nf_reject_fill_skb_dst(struct sk_buff *skb_in)
{
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index cb2d38e80de9..6b022449f867 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -12,6 +12,19 @@
#include <linux/netfilter_ipv6.h>
#include <linux/netfilter_bridge.h>
+static struct ipv6hdr *
+nf_reject_ip6hdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ __u8 protocol, int hoplimit);
+static void
+nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ const struct tcphdr *oth, unsigned int otcplen);
+static const struct tcphdr *
+nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
+ struct tcphdr *otcph,
+ unsigned int *otcplen, int hook);
+
static bool nf_reject_v6_csum_ok(struct sk_buff *skb, int hook)
{
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
@@ -146,9 +159,10 @@ struct sk_buff *nf_reject_skb_v6_unreach(struct net *net,
}
EXPORT_SYMBOL_GPL(nf_reject_skb_v6_unreach);
-const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
- struct tcphdr *otcph,
- unsigned int *otcplen, int hook)
+static const struct tcphdr *
+nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
+ struct tcphdr *otcph,
+ unsigned int *otcplen, int hook)
{
const struct ipv6hdr *oip6h = ipv6_hdr(oldskb);
u8 proto;
@@ -192,11 +206,11 @@ const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb,
return otcph;
}
-EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_get);
-struct ipv6hdr *nf_reject_ip6hdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- __u8 protocol, int hoplimit)
+static struct ipv6hdr *
+nf_reject_ip6hdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ __u8 protocol, int hoplimit)
{
struct ipv6hdr *ip6h;
const struct ipv6hdr *oip6h = ipv6_hdr(oldskb);
@@ -216,11 +230,11 @@ struct ipv6hdr *nf_reject_ip6hdr_put(struct sk_buff *nskb,
return ip6h;
}
-EXPORT_SYMBOL_GPL(nf_reject_ip6hdr_put);
-void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
- const struct sk_buff *oldskb,
- const struct tcphdr *oth, unsigned int otcplen)
+static void
+nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
+ const struct sk_buff *oldskb,
+ const struct tcphdr *oth, unsigned int otcplen)
{
struct tcphdr *tcph;
@@ -248,7 +262,6 @@ void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb,
csum_partial(tcph,
sizeof(struct tcphdr), 0));
}
-EXPORT_SYMBOL_GPL(nf_reject_ip6_tcphdr_put);
static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in)
{
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 8/8] netfilter: nft_payload: extend offset to 65535 bytes
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (6 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 7/8] netfilter: nf_reject: remove unneeded exports Florian Westphal
@ 2025-09-01 8:08 ` Florian Westphal
2025-09-02 10:53 ` [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-01 8:08 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Fernando Fernandez Mancera <fmancera@suse.de>
In some situations 255 bytes offset is not enough to match or manipulate
the desired packet field. Increase the offset limit to 65535 or U16_MAX.
In addition, the nla policy maximum value is not set anymore as it is
limited to s16. Instead, the maximum value is checked during the payload
expression initialization function.
Tested with the nft command line tool.
table ip filter {
chain output {
@nh,2040,8 set 0xff
@nh,524280,8 set 0xff
@nh,524280,8 0xff
@nh,2040,8 0xff
}
}
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables_core.h | 2 +-
net/netfilter/nft_payload.c | 18 +++++++++++-------
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 6c2f483d9828..7644cfe9267d 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -73,7 +73,7 @@ struct nft_ct {
struct nft_payload {
enum nft_payload_bases base:8;
- u8 offset;
+ u16 offset;
u8 len;
u8 dreg;
};
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 059b28ffad0e..b0214418f75a 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -40,7 +40,7 @@ static bool nft_payload_rebuild_vlan_hdr(const struct sk_buff *skb, int mac_off,
/* add vlan header into the user buffer for if tag was removed by offloads */
static bool
-nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len)
+nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u16 offset, u8 len)
{
int mac_off = skb_mac_header(skb) - skb->data;
u8 *vlanh, *dst_u8 = (u8 *) d;
@@ -212,7 +212,7 @@ static const struct nla_policy nft_payload_policy[NFTA_PAYLOAD_MAX + 1] = {
[NFTA_PAYLOAD_SREG] = { .type = NLA_U32 },
[NFTA_PAYLOAD_DREG] = { .type = NLA_U32 },
[NFTA_PAYLOAD_BASE] = { .type = NLA_U32 },
- [NFTA_PAYLOAD_OFFSET] = NLA_POLICY_MAX(NLA_BE32, 255),
+ [NFTA_PAYLOAD_OFFSET] = { .type = NLA_BE32 },
[NFTA_PAYLOAD_LEN] = NLA_POLICY_MAX(NLA_BE32, 255),
[NFTA_PAYLOAD_CSUM_TYPE] = { .type = NLA_U32 },
[NFTA_PAYLOAD_CSUM_OFFSET] = NLA_POLICY_MAX(NLA_BE32, 255),
@@ -797,7 +797,7 @@ static int nft_payload_csum_inet(struct sk_buff *skb, const u32 *src,
struct nft_payload_set {
enum nft_payload_bases base:8;
- u8 offset;
+ u16 offset;
u8 len;
u8 sreg;
u8 csum_type;
@@ -812,7 +812,7 @@ struct nft_payload_vlan_hdr {
};
static bool
-nft_payload_set_vlan(const u32 *src, struct sk_buff *skb, u8 offset, u8 len,
+nft_payload_set_vlan(const u32 *src, struct sk_buff *skb, u16 offset, u8 len,
int *vlan_hlen)
{
struct nft_payload_vlan_hdr *vlanh;
@@ -940,14 +940,18 @@ static int nft_payload_set_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
{
+ u32 csum_offset, offset, csum_type = NFT_PAYLOAD_CSUM_NONE;
struct nft_payload_set *priv = nft_expr_priv(expr);
- u32 csum_offset, csum_type = NFT_PAYLOAD_CSUM_NONE;
int err;
priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
- priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U16_MAX, &offset);
+ if (err < 0)
+ return err;
+ priv->offset = offset;
+
if (tb[NFTA_PAYLOAD_CSUM_TYPE])
csum_type = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE]));
if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) {
@@ -1069,7 +1073,7 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
if (tb[NFTA_PAYLOAD_DREG] == NULL)
return ERR_PTR(-EINVAL);
- err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U8_MAX, &offset);
+ err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U16_MAX, &offset);
if (err < 0)
return ERR_PTR(err);
--
2.49.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
2025-09-01 8:08 ` [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX Florian Westphal
@ 2025-09-01 20:46 ` Jakub Kicinski
2025-09-01 21:12 ` Pablo Neira Ayuso
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-09-01 20:46 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
netfilter-devel, pablo
On Mon, 1 Sep 2025 10:08:39 +0200 Florian Westphal wrote:
> This new attribute is supposed to be used instead of NFTA_DEVICE_NAME
> for simple wildcard interface specs. It holds a NUL-terminated string
> representing an interface name prefix to match on.
>
> While kernel code to distinguish full names from prefixes in
> NFTA_DEVICE_NAME is simpler than this solution, reusing the existing
> attribute with different semantics leads to confusion between different
> versions of kernel and user space though:
>
> * With old kernels, wildcards submitted by user space are accepted yet
> silently treated as regular names.
> * With old user space, wildcards submitted by kernel may cause crashes
> since libnftnl expects NUL-termination when there is none.
>
> Using a distinct attribute type sanitizes these situations as the
> receiving part detects and rejects the unexpected attribute nested in
> *_HOOK_DEVS attributes.
>
> Fixes: 6d07a289504a ("netfilter: nf_tables: Support wildcard netdev hook specs")
Why is this not targeting net? The sooner we adjust the uAPI the better.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
2025-09-01 20:46 ` Jakub Kicinski
@ 2025-09-01 21:12 ` Pablo Neira Ayuso
2025-09-02 0:04 ` Florian Westphal
0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2025-09-01 21:12 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
Eric Dumazet, netfilter-devel
On Mon, Sep 01, 2025 at 01:46:02PM -0700, Jakub Kicinski wrote:
> On Mon, 1 Sep 2025 10:08:39 +0200 Florian Westphal wrote:
> > This new attribute is supposed to be used instead of NFTA_DEVICE_NAME
> > for simple wildcard interface specs. It holds a NUL-terminated string
> > representing an interface name prefix to match on.
> >
> > While kernel code to distinguish full names from prefixes in
> > NFTA_DEVICE_NAME is simpler than this solution, reusing the existing
> > attribute with different semantics leads to confusion between different
> > versions of kernel and user space though:
> >
> > * With old kernels, wildcards submitted by user space are accepted yet
> > silently treated as regular names.
> > * With old user space, wildcards submitted by kernel may cause crashes
> > since libnftnl expects NUL-termination when there is none.
> >
> > Using a distinct attribute type sanitizes these situations as the
> > receiving part detects and rejects the unexpected attribute nested in
> > *_HOOK_DEVS attributes.
> >
> > Fixes: 6d07a289504a ("netfilter: nf_tables: Support wildcard netdev hook specs")
>
> Why is this not targeting net? The sooner we adjust the uAPI the better.
I think there were doubts that was possible at this stage.
But I agree, it is a bit late but better fix it there.
Florian?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
2025-09-01 21:12 ` Pablo Neira Ayuso
@ 2025-09-02 0:04 ` Florian Westphal
2025-09-02 13:03 ` Paolo Abeni
0 siblings, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2025-09-02 0:04 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Jakub Kicinski, netdev, Paolo Abeni, David S. Miller,
Eric Dumazet, netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Sep 01, 2025 at 01:46:02PM -0700, Jakub Kicinski wrote:
> > Why is this not targeting net? The sooner we adjust the uAPI the better.
I considered it a new feature rather than a bug fix:
Userspace can't rely on the existing api because kernels before
6.16 don't special-case the names provided, and nftables doesn't
make use of the 6.16 "accident" (the attempt to re-use the existing
device name attribute for this).
The corresponding userspace changes (v4 uses the new attribute)
haven't been merged yet.
But sure, getting rid of the "accident" faster makes sense,
thanks for suggesting this.
> I think there were doubts that was possible at this stage.
> But I agree, it is a bit late but better fix it there.
Alright, I'll send a new nf-next PR with this one dropped in a few hours
and a separate nf.git PR with this patch included.
If you prefer to just apply this series sand this patch that worls for
me too, I'll just follow this changesets' patchwork state.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/8] netfilter: updates for net-next
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
` (7 preceding siblings ...)
2025-09-01 8:08 ` [PATCH net-next 8/8] netfilter: nft_payload: extend offset to 65535 bytes Florian Westphal
@ 2025-09-02 10:53 ` Florian Westphal
8 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-09-02 10:53 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Florian Westphal <fw@strlen.de> wrote:
> The following patchset contains Netfilter fixes for *net-next*:
>
> 1) prefer vmalloc_array in ebtables, from Qianfeng Rong.
> 2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.
> 3+4) Get rid of GFP_ATOMIC in transaction object allocations, those
> cause silly failures with large sets under memory pressure, from
> myself.
> 5) Introduce new NFTA_DEVICE_PREFIX attribute in nftables netlink api,
> re-using old NFTA_DEVICE_NAME led to confusion with different
> kernel/userspace versions. This refines the wildcard interface
> support added in 6.16 release. From Phil Sutter.
As per discussion I'll route patch 5 via net tree instead, so:
pw-bot: changes-requested
A new nf-next -> net-next MR will follow.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
2025-09-02 0:04 ` Florian Westphal
@ 2025-09-02 13:03 ` Paolo Abeni
0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-09-02 13:03 UTC (permalink / raw)
To: Florian Westphal, Pablo Neira Ayuso
Cc: Jakub Kicinski, netdev, David S. Miller, Eric Dumazet,
netfilter-devel
On 9/2/25 2:04 AM, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> On Mon, Sep 01, 2025 at 01:46:02PM -0700, Jakub Kicinski wrote:
>>> Why is this not targeting net? The sooner we adjust the uAPI the better.
>
> I considered it a new feature rather than a bug fix:
>
> Userspace can't rely on the existing api because kernels before
> 6.16 don't special-case the names provided, and nftables doesn't
> make use of the 6.16 "accident" (the attempt to re-use the existing
> device name attribute for this).
>
> The corresponding userspace changes (v4 uses the new attribute)
> haven't been merged yet.
>
> But sure, getting rid of the "accident" faster makes sense,
> thanks for suggesting this.
>
>> I think there were doubts that was possible at this stage.
>> But I agree, it is a bit late but better fix it there.
>
> Alright, I'll send a new nf-next PR with this one dropped in a few hours
> and a separate nf.git PR with this patch included.
I agree that this latter option is preferable.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-09-02 13:03 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-01 8:08 [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 1/8] netfilter: ebtables: Use vmalloc_array() to improve code Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 2/8] netfilter: nft_payload: Use csum_replace4() instead of opencoding Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 3/8] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 4/8] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 5/8] netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX Florian Westphal
2025-09-01 20:46 ` Jakub Kicinski
2025-09-01 21:12 ` Pablo Neira Ayuso
2025-09-02 0:04 ` Florian Westphal
2025-09-02 13:03 ` Paolo Abeni
2025-09-01 8:08 ` [PATCH net-next 6/8] netfilter: nft_set_pipapo: remove redundant test for avx feature bit Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 7/8] netfilter: nf_reject: remove unneeded exports Florian Westphal
2025-09-01 8:08 ` [PATCH net-next 8/8] netfilter: nft_payload: extend offset to 65535 bytes Florian Westphal
2025-09-02 10:53 ` [PATCH net-next 0/8] netfilter: updates for net-next Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).