From: Florian Westphal <fw@strlen.de>
To: <netdev@vger.kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
<netfilter-devel@vger.kernel.org>,
pablo@netfilter.org
Subject: [PATCH v2 net-next 11/11] netfilter: nft_set_rbtree: validate open interval overlap
Date: Fri, 6 Feb 2026 16:30:48 +0100
Message-ID: <20260206153048.17570-12-fw@strlen.de>
In-Reply-To: <20260206153048.17570-1-fw@strlen.de>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Open intervals do not have an end element. In particular, an open
interval at the end of the set is hard to validate because it lacks the
end element that interval validation relies on to perform its checks.

This patch adds a new flags field to struct nft_set_elem. This is not an
issue because this is a temporary object that is allocated on the stack
from the insert/deactivate path. The flags field is used to specify that
this is the last element in this add/delete command.

The last flag is used, in combination with the start element cookie, to
check if there is a partial overlap, e.g.:

Already exists: 255.255.255.0-255.255.255.254
Add interval: 255.255.255.0-255.255.255.255
~~~~~~~~~~~~~
start element overlap
Basically, the idea is: when there is an overlap with an existing start
element, check whether a matching end element already exists in the set.

However, the open interval can come at any position in the add command,
so the corner case can get a bit more complicated:

Already exists: 255.255.255.0-255.255.255.254
Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254
~~~~~~~~~~~~~
start element overlap
To catch this overlap, annotate that the new start element is a possible
overlap, then report the overlap if the next element is another start
element, which confirms that the previous element is an open interval at
the end of the set.

For deletions, do not update the start cookie when deleting an open
interval, otherwise this can trigger a spurious EEXIST when adding new
elements.

Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag, which would
make it easier to detect open interval overlaps.

Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 4 ++
net/netfilter/nf_tables_api.c | 21 +++++++--
net/netfilter/nft_set_rbtree.c | 71 ++++++++++++++++++++++++++-----
3 files changed, 82 insertions(+), 14 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 31906f90706e..426534a711b0 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -277,6 +277,8 @@ struct nft_userdata {
unsigned char data[];
};
+#define NFT_SET_ELEM_INTERNAL_LAST 0x1
+
/* placeholder structure for opaque set element backend representation. */
struct nft_elem_priv { };
@@ -286,6 +288,7 @@ struct nft_elem_priv { };
* @key: element key
* @key_end: closing element key
* @data: element data
+ * @flags: flags
* @priv: element private data and extensions
*/
struct nft_set_elem {
@@ -301,6 +304,7 @@ struct nft_set_elem {
u32 buf[NFT_DATA_VALUE_MAXLEN / sizeof(u32)];
struct nft_data val;
} data;
+ u32 flags;
struct nft_elem_priv *priv;
};
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d7773c9bbcff..1ed034a47bd0 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7270,7 +7270,8 @@ static u32 nft_set_maxsize(const struct nft_set *set)
}
static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
- const struct nlattr *attr, u32 nlmsg_flags)
+ const struct nlattr *attr, u32 nlmsg_flags,
+ bool last)
{
struct nft_expr *expr_array[NFT_SET_EXPR_MAX] = {};
struct nlattr *nla[NFTA_SET_ELEM_MAX + 1];
@@ -7556,6 +7557,11 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
if (flags)
*nft_set_ext_flags(ext) = flags;
+ if (last)
+ elem.flags = NFT_SET_ELEM_INTERNAL_LAST;
+ else
+ elem.flags = 0;
+
if (obj)
*nft_set_ext_obj(ext) = obj;
@@ -7719,7 +7725,8 @@ static int nf_tables_newsetelem(struct sk_buff *skb,
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
- err = nft_add_set_elem(&ctx, set, attr, info->nlh->nlmsg_flags);
+ err = nft_add_set_elem(&ctx, set, attr, info->nlh->nlmsg_flags,
+ nla_is_last(attr, rem));
if (err < 0) {
NL_SET_BAD_ATTR(extack, attr);
return err;
@@ -7843,7 +7850,7 @@ static void nft_trans_elems_destroy_abort(const struct nft_ctx *ctx,
}
static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
- const struct nlattr *attr)
+ const struct nlattr *attr, bool last)
{
struct nlattr *nla[NFTA_SET_ELEM_MAX + 1];
struct nft_set_ext_tmpl tmpl;
@@ -7911,6 +7918,11 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
if (flags)
*nft_set_ext_flags(ext) = flags;
+ if (last)
+ elem.flags = NFT_SET_ELEM_INTERNAL_LAST;
+ else
+ elem.flags = 0;
+
trans = nft_trans_elem_alloc(ctx, NFT_MSG_DELSETELEM, set);
if (trans == NULL)
goto fail_trans;
@@ -8058,7 +8070,8 @@ static int nf_tables_delsetelem(struct sk_buff *skb,
return nft_set_flush(&ctx, set, genmask);
nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
- err = nft_del_setelem(&ctx, set, attr);
+ err = nft_del_setelem(&ctx, set, attr,
+ nla_is_last(attr, rem));
if (err == -ENOENT &&
NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYSETELEM)
continue;
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index a4fb5b517d9d..644d4b916705 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -304,10 +304,19 @@ static void nft_rbtree_set_start_cookie(struct nft_rbtree *priv,
priv->start_rbe_cookie = (unsigned long)rbe;
}
+static void nft_rbtree_set_start_cookie_open(struct nft_rbtree *priv,
+ const struct nft_rbtree_elem *rbe,
+ unsigned long open_interval)
+{
+ priv->start_rbe_cookie = (unsigned long)rbe | open_interval;
+}
+
+#define NFT_RBTREE_OPEN_INTERVAL 1UL
+
static bool nft_rbtree_cmp_start_cookie(struct nft_rbtree *priv,
const struct nft_rbtree_elem *rbe)
{
- return priv->start_rbe_cookie == (unsigned long)rbe;
+ return (priv->start_rbe_cookie & ~NFT_RBTREE_OPEN_INTERVAL) == (unsigned long)rbe;
}
static bool nft_rbtree_insert_same_interval(const struct net *net,
@@ -337,13 +346,14 @@ static bool nft_rbtree_insert_same_interval(const struct net *net,
static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
struct nft_rbtree_elem *new,
- struct nft_elem_priv **elem_priv, u64 tstamp)
+ struct nft_elem_priv **elem_priv, u64 tstamp, bool last)
{
struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL, *rbe_prev;
struct rb_node *node, *next, *parent, **p, *first = NULL;
struct nft_rbtree *priv = nft_set_priv(set);
u8 cur_genmask = nft_genmask_cur(net);
u8 genmask = nft_genmask_next(net);
+ unsigned long open_interval = 0;
int d;
/* Descend the tree to search for an existing element greater than the
@@ -449,10 +459,18 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
}
}
- if (nft_rbtree_interval_null(set, new))
- priv->start_rbe_cookie = 0;
- else if (nft_rbtree_interval_start(new) && priv->start_rbe_cookie)
+ if (nft_rbtree_interval_null(set, new)) {
priv->start_rbe_cookie = 0;
+ } else if (nft_rbtree_interval_start(new) && priv->start_rbe_cookie) {
+ if (nft_set_is_anonymous(set)) {
+ priv->start_rbe_cookie = 0;
+ } else if (priv->start_rbe_cookie & NFT_RBTREE_OPEN_INTERVAL) {
+ /* Previous element is an open interval that partially
+ * overlaps with an existing non-open interval.
+ */
+ return -ENOTEMPTY;
+ }
+ }
/* - new start element matching existing start element: full overlap
* reported as -EEXIST, cleared by caller if NLM_F_EXCL is not given.
@@ -460,7 +478,27 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
if (rbe_ge && !nft_rbtree_cmp(set, new, rbe_ge) &&
nft_rbtree_interval_start(rbe_ge) == nft_rbtree_interval_start(new)) {
*elem_priv = &rbe_ge->priv;
- nft_rbtree_set_start_cookie(priv, rbe_ge);
+
+ /* - Corner case: new start element of open interval (which
+ * comes as last element in the batch) overlaps the start of
+ * an existing interval with an end element: partial overlap.
+ */
+ node = rb_first(&priv->root);
+ rbe = __nft_rbtree_next_active(node, genmask);
+ if (rbe && nft_rbtree_interval_end(rbe)) {
+ rbe = nft_rbtree_next_active(rbe, genmask);
+ if (rbe &&
+ nft_rbtree_interval_start(rbe) &&
+ !nft_rbtree_cmp(set, new, rbe)) {
+ if (last)
+ return -ENOTEMPTY;
+
+ /* Maybe open interval? */
+ open_interval = NFT_RBTREE_OPEN_INTERVAL;
+ }
+ }
+ nft_rbtree_set_start_cookie_open(priv, rbe_ge, open_interval);
+
return -EEXIST;
}
@@ -515,6 +553,12 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
return -ENOTEMPTY;
+ /* - start element overlaps an open interval but end element is new:
+ * partial overlap, reported as -ENOTEMPTY.
+ */
+ if (!rbe_ge && priv->start_rbe_cookie && nft_rbtree_interval_end(new))
+ return -ENOTEMPTY;
+
/* Accepted element: pick insertion point depending on key value */
parent = NULL;
p = &priv->root.rb_node;
@@ -624,6 +668,7 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set,
struct nft_elem_priv **elem_priv)
{
struct nft_rbtree_elem *rbe = nft_elem_priv_cast(elem->priv);
+ bool last = !!(elem->flags & NFT_SET_ELEM_INTERNAL_LAST);
struct nft_rbtree *priv = nft_set_priv(set);
u64 tstamp = nft_net_tstamp(net);
int err;
@@ -640,8 +685,12 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set,
cond_resched();
write_lock_bh(&priv->lock);
- err = __nft_rbtree_insert(net, set, rbe, elem_priv, tstamp);
+ err = __nft_rbtree_insert(net, set, rbe, elem_priv, tstamp, last);
write_unlock_bh(&priv->lock);
+
+ if (nft_rbtree_interval_end(rbe))
+ priv->start_rbe_cookie = 0;
+
} while (err == -EAGAIN);
return err;
@@ -729,6 +778,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
const struct nft_set_elem *elem)
{
struct nft_rbtree_elem *rbe, *this = nft_elem_priv_cast(elem->priv);
+ bool last = !!(elem->flags & NFT_SET_ELEM_INTERNAL_LAST);
struct nft_rbtree *priv = nft_set_priv(set);
const struct rb_node *parent = priv->root.rb_node;
u8 genmask = nft_genmask_next(net);
@@ -769,9 +819,10 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
continue;
}
- if (nft_rbtree_interval_start(rbe))
- nft_rbtree_set_start_cookie(priv, rbe);
- else if (!nft_rbtree_deactivate_same_interval(net, priv, rbe))
+ if (nft_rbtree_interval_start(rbe)) {
+ if (!last)
+ nft_rbtree_set_start_cookie(priv, rbe);
+ } else if (!nft_rbtree_deactivate_same_interval(net, priv, rbe))
return NULL;
nft_rbtree_flush(net, set, &rbe->priv);
--
2.52.0