* [PATCH net 0/9] Netfilter/IPVS fixes for net
@ 2026-06-01 11:59 Pablo Neira Ayuso
2026-06-01 11:59 ` [PATCH net 1/9] netfilter: xt_NFQUEUE: prefer raw_smp_processor_id Pablo Neira Ayuso
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
Hi,
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix splat with PREEMPT_RCU caused by smp_processor_id() in nfqueue,
from Fernando Fernandez Mancera.
2) Fix possible use of pointer to old IPVS scheduler after RCU grace
period when editing service, from Julian Anastasov.
3) Fix possible endless RCU walk over rt->fib6_siblings in nft_fib6
if rt is unlinked mid-iteration; apparently the same issue exists in
the fib6 core. From Jiayuan Chen.
4) Add mutex to guard refcount in synproxy infrastructure, since
concurrent hook {un}registration can happen.
From Fernando Fernandez Mancera.
5) Bail out if IRC conntrack helper fails to parse a command, do not
try parsing using other command handlers, from Florian Westphal.
This fixes a possible out-of-bound read.
6) Fix possible use-after-free in nft_tunnel by releasing the template
dst only after all references have been dropped, from Tristan Madani.
7) Ignore conntrack template in nft_ct, from Jiayuan Chen.
8) Add missing skb_ensure_writable() in ebt_snat, from Yiming Qian.
9) Remove multi-register byteorder support, which allows for a kernel
stack info leak, from Florian Westphal.
Please, pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-06-01
Thanks.
----------------------------------------------------------------
The following changes since commit 78ef59e7a6459b16f8102e0ee1c718443323d1af:
Merge branch 'wireguard-fixes-for-7-1-rc6' (2026-05-29 13:01:31 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-06-01
for you to fetch changes up to bb061d3de41707415269be75ebf700efb03ec212:
netfilter: nft_byteorder: remove multi-register support (2026-06-01 13:43:53 +0200)
----------------------------------------------------------------
netfilter pull request 26-06-01
----------------------------------------------------------------
Fernando Fernandez Mancera (2):
netfilter: xt_NFQUEUE: prefer raw_smp_processor_id
netfilter: synproxy: add mutex to guard hook reference counting
Florian Westphal (2):
netfilter: conntrack_irc: fix possible out-of-bounds read
netfilter: nft_byteorder: remove multi-register support
Jiayuan Chen (2):
netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
netfilter: nft_ct: bail out on template ct in get eval
Julian Anastasov (1):
ipvs: clear the svc scheduler ptr early on edit
Tristan Madani (1):
netfilter: nft_tunnel: fix use-after-free on object destroy
Yiming Qian (1):
netfilter: bridge: make ebt_snat ARP rewrite writable
include/net/ip_vs.h | 3 +--
net/bridge/netfilter/ebt_snat.c | 3 +++
net/ipv6/netfilter/nft_fib_ipv6.c | 3 +++
net/netfilter/ipvs/ip_vs_ctl.c | 13 ++++++----
net/netfilter/ipvs/ip_vs_sched.c | 14 +++++------
net/netfilter/nf_conntrack_irc.c | 4 +--
net/netfilter/nf_synproxy_core.c | 24 +++++++++++++-----
net/netfilter/nft_byteorder.c | 51 +++++++++++++++------------------------
net/netfilter/nft_ct.c | 8 +++---
net/netfilter/nft_ct_fast.c | 2 +-
net/netfilter/nft_tunnel.c | 2 +-
net/netfilter/xt_NFQUEUE.c | 2 +-
12 files changed, 68 insertions(+), 61 deletions(-)
* [PATCH net 1/9] netfilter: xt_NFQUEUE: prefer raw_smp_processor_id
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Fernando Fernandez Mancera <fmancera@suse.de>
With PREEMPT_RCU this triggers a splat because smp_processor_id() is
called from preemptible context: RCU read-side critical sections can be
preempted. If the xt_NFQUEUE target is invoked via the nft_compat_eval()
path, we are inside an RCU critical section.
Just use the raw version instead.
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/xt_NFQUEUE.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/xt_NFQUEUE.c b/net/netfilter/xt_NFQUEUE.c
index 466da23e36ff..b32d153e3a18 100644
--- a/net/netfilter/xt_NFQUEUE.c
+++ b/net/netfilter/xt_NFQUEUE.c
@@ -91,7 +91,7 @@ nfqueue_tg_v3(struct sk_buff *skb, const struct xt_action_param *par)
if (info->queues_total > 1) {
if (info->flags & NFQ_FLAG_CPU_FANOUT) {
- int cpu = smp_processor_id();
+ int cpu = raw_smp_processor_id();
queue = info->queuenum + cpu % info->queues_total;
} else {
--
2.47.3
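
For readers following along, here is a minimal userspace sketch of the
fanout computation touched by patch 1/9. It is not kernel code:
pick_queue() and its parameters are hypothetical names, and the CPU
number is passed in by the caller instead of being read with
raw_smp_processor_id(). The non-fanout (hash) branch is left out.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the NFQ_FLAG_CPU_FANOUT branch.
 * 'cpu' stands in for the value the kernel reads with
 * raw_smp_processor_id().
 */
static unsigned int pick_queue(unsigned int queuenum,
			       unsigned int queues_total,
			       unsigned int cpu)
{
	if (queues_total > 1)
		return queuenum + cpu % queues_total;

	/* Single queue configured: nothing to spread over. */
	return queuenum;
}
```

Whatever CPU number is passed in, the result stays within
[queuenum, queuenum + queues_total), so a CPU id that goes stale after
preemption still selects a valid queue. That is presumably why the raw
variant is sufficient here.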
* [PATCH net 2/9] ipvs: clear the svc scheduler ptr early on edit
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Julian Anastasov <ja@ssi.bg>
ip_vs_edit_service() while unbinding the old scheduler clears
the svc->scheduler ptr after the scheduler module initiates
RCU callbacks. This can cause packets to use the old
scheduler at the time when svc->sched_data is already freed
after RCU grace period.
Fix it by clearing the ptr early in ip_vs_unbind_scheduler(),
before the done_service method schedules any RCU callbacks.
Also, if the new scheduler fails to initialize when replacing
the old scheduler, try to restore the old scheduler while still
returning the error code.
Link: https://sashiko.dev/#/patchset/20260519015506.634185-1-rosenp%40gmail.com
Fixes: 05f00505a89a ("ipvs: fix crash if scheduler is changed")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/ip_vs.h | 3 +--
net/netfilter/ipvs/ip_vs_ctl.c | 13 ++++++++-----
net/netfilter/ipvs/ip_vs_sched.c | 14 +++++++-------
3 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index a02e569813d2..e517eaaa177b 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1824,8 +1824,7 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler *scheduler);
int unregister_ip_vs_scheduler(struct ip_vs_scheduler *scheduler);
int ip_vs_bind_scheduler(struct ip_vs_service *svc,
struct ip_vs_scheduler *scheduler);
-void ip_vs_unbind_scheduler(struct ip_vs_service *svc,
- struct ip_vs_scheduler *sched);
+void ip_vs_unbind_scheduler(struct ip_vs_service *svc);
struct ip_vs_scheduler *ip_vs_scheduler_get(const char *sched_name);
void ip_vs_scheduler_put(struct ip_vs_scheduler *scheduler);
struct ip_vs_conn *
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index bd9cae44d214..16daba8cac83 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1898,7 +1898,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
if (ret_hooks >= 0)
ip_vs_unregister_hooks(ipvs, u->af);
if (svc != NULL) {
- ip_vs_unbind_scheduler(svc, sched);
+ ip_vs_unbind_scheduler(svc);
ip_vs_service_free(svc);
}
ip_vs_scheduler_put(sched);
@@ -1962,9 +1962,8 @@ ip_vs_edit_service(struct ip_vs_service *svc, struct ip_vs_service_user_kern *u)
old_sched = rcu_dereference_protected(svc->scheduler, 1);
if (sched != old_sched) {
if (old_sched) {
- ip_vs_unbind_scheduler(svc, old_sched);
- RCU_INIT_POINTER(svc->scheduler, NULL);
- /* Wait all svc->sched_data users */
+ ip_vs_unbind_scheduler(svc);
+ /* Wait all svc->scheduler/sched_data users */
synchronize_rcu();
}
/* Bind the new scheduler */
@@ -1972,6 +1971,10 @@ ip_vs_edit_service(struct ip_vs_service *svc, struct ip_vs_service_user_kern *u)
ret = ip_vs_bind_scheduler(svc, sched);
if (ret) {
ip_vs_scheduler_put(sched);
+ /* Try to restore the old_sched */
+ if (old_sched &&
+ !ip_vs_bind_scheduler(svc, old_sched))
+ old_sched = NULL;
goto out;
}
}
@@ -2027,7 +2030,7 @@ static void __ip_vs_del_service(struct ip_vs_service *svc, bool cleanup)
/* Unbind scheduler */
old_sched = rcu_dereference_protected(svc->scheduler, 1);
- ip_vs_unbind_scheduler(svc, old_sched);
+ ip_vs_unbind_scheduler(svc);
ip_vs_scheduler_put(old_sched);
/* Unbind persistence engine, keep svc->pe */
diff --git a/net/netfilter/ipvs/ip_vs_sched.c b/net/netfilter/ipvs/ip_vs_sched.c
index c6e421c4e299..24adc38942a0 100644
--- a/net/netfilter/ipvs/ip_vs_sched.c
+++ b/net/netfilter/ipvs/ip_vs_sched.c
@@ -56,19 +56,19 @@ int ip_vs_bind_scheduler(struct ip_vs_service *svc,
/*
* Unbind a service with its scheduler
*/
-void ip_vs_unbind_scheduler(struct ip_vs_service *svc,
- struct ip_vs_scheduler *sched)
+void ip_vs_unbind_scheduler(struct ip_vs_service *svc)
{
- struct ip_vs_scheduler *cur_sched;
+ struct ip_vs_scheduler *sched;
- cur_sched = rcu_dereference_protected(svc->scheduler, 1);
- /* This check proves that old 'sched' was installed */
- if (!cur_sched)
+ sched = rcu_dereference_protected(svc->scheduler, 1);
+ if (!sched)
return;
+ /* Reset the scheduler before initiating any RCU callbacks */
+ rcu_assign_pointer(svc->scheduler, NULL);
+ smp_wmb(); /* paired with smp_rmb() in ip_vs_schedule() */
if (sched->done_service)
sched->done_service(svc);
- /* svc->scheduler can be set to NULL only by caller */
}
--
2.47.3
* [PATCH net 3/9] netfilter: nft_fib_ipv6: bail out of sibling walk if rt got unlinked
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Jiayuan Chen <jiayuan.chen@linux.dev>
This was reported by Sashiko [1].
The RCU walk over rt->fib6_siblings can spin forever if rt is unlinked
mid-iteration: rt->fib6_siblings.next still points into the old ring,
so the loop never meets &rt->fib6_siblings as its terminator.
fib6_purge_rt() always does WRITE_ONCE(rt->fib6_nsiblings, 0) before
list_del_rcu(), so readers can use rt->fib6_nsiblings == 0 as the
detach signal. The same pattern is used in fib6_info_uses_dev() and
rt6_nlmsg_size().
[1]: https://sashiko.dev/#/patchset/20260520023411.391233-1-jiayuan.chen%40linux.dev
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 1c32b24c234b ("netfilter: nft_fib_ipv6: switch to fib6_lookup")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/ipv6/netfilter/nft_fib_ipv6.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv6/netfilter/nft_fib_ipv6.c b/net/ipv6/netfilter/nft_fib_ipv6.c
index c0a0075e2590..2dbe44715df3 100644
--- a/net/ipv6/netfilter/nft_fib_ipv6.c
+++ b/net/ipv6/netfilter/nft_fib_ipv6.c
@@ -191,6 +191,9 @@ static bool nft_fib6_info_nh_uses_dev(struct fib6_info *rt,
if (nft_fib6_info_nh_dev_match(nh_dev, dev))
return true;
+
+ if (!READ_ONCE(rt->fib6_nsiblings))
+ return false;
}
return false;
--
2.47.3
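
The endless walk described in patch 3/9 can be reproduced in a small
single-threaded userspace model. This is only a sketch under stated
assumptions: struct node, unlink_node() and uses_dev() are hypothetical
stand-ins for fib6_info, list_del_rcu() and the nft_fib6 sibling loop,
and plain loads replace READ_ONCE() because there is no concurrency
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a route with a circular sibling list. */
struct node {
	struct node *next, *prev;
	unsigned int nsiblings;
	int dev;
};

/*
 * Modelled on the ordering described in the commit message: clear
 * nsiblings first, then unlink. As with list_del_rcu(), n->next is
 * left pointing into the old ring.
 */
static void unlink_node(struct node *n)
{
	n->nsiblings = 0;
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Walk the siblings of rt; *steps counts loop iterations. */
static bool uses_dev(const struct node *rt, int dev, unsigned int *steps)
{
	const struct node *p;

	if (rt->dev == dev)
		return true;

	for (p = rt->next; p != rt; p = p->next) {
		(*steps)++;
		if (p->dev == dev)
			return true;
		/* Detach signal: rt is no longer part of the ring. */
		if (rt->nsiblings == 0)
			return false;
	}
	return false;
}

/* Build the ring a -> b -> c -> a, unlink a, then walk from a. */
static unsigned int demo_walk_after_unlink(void)
{
	struct node a = { .dev = 1 }, b = { .dev = 2 }, c = { .dev = 3 };
	unsigned int steps = 0;

	a.next = &b; b.next = &c; c.next = &a;
	a.prev = &c; b.prev = &a; c.prev = &b;
	a.nsiblings = b.nsiblings = c.nsiblings = 2;

	unlink_node(&a);

	/* dev 99 matches nobody, so only the nsiblings check ends the walk. */
	if (uses_dev(&a, 99, &steps))
		return 0;
	return steps;
}
```

After unlink_node(&a) the ring is b -> c -> b, so a walk that starts at
a.next and waits to meet &a again would never terminate. With the
nsiblings check the model stops after the first sibling.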
* [PATCH net 4/9] netfilter: synproxy: add mutex to guard hook reference counting
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Fernando Fernandez Mancera <fmancera@suse.de>
As the synproxy infrastructure registers netfilter hooks on demand when a
user adds the first iptables target or nftables expression, concurrent
additions can race with each other.
Introduce a mutex to serialize access to the refcount control blocks from
both frontends. While a per-namespace mutex might be more efficient, it
is not needed for a target/expression like SYNPROXY.
Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_synproxy_core.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 036c8586f49b..ed00114f65f3 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -22,6 +22,8 @@
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/nf_synproxy.h>
+static DEFINE_MUTEX(synproxy_mutex);
+
unsigned int synproxy_net_id;
EXPORT_SYMBOL_GPL(synproxy_net_id);
@@ -769,26 +771,31 @@ static const struct nf_hook_ops ipv4_synproxy_ops[] = {
int nf_synproxy_ipv4_init(struct synproxy_net *snet, struct net *net)
{
- int err;
+ int err = 0;
+ mutex_lock(&synproxy_mutex);
if (snet->hook_ref4 == 0) {
err = nf_register_net_hooks(net, ipv4_synproxy_ops,
ARRAY_SIZE(ipv4_synproxy_ops));
if (err)
- return err;
+ goto out;
}
snet->hook_ref4++;
- return 0;
+out:
+ mutex_unlock(&synproxy_mutex);
+ return err;
}
EXPORT_SYMBOL_GPL(nf_synproxy_ipv4_init);
void nf_synproxy_ipv4_fini(struct synproxy_net *snet, struct net *net)
{
+ mutex_lock(&synproxy_mutex);
snet->hook_ref4--;
if (snet->hook_ref4 == 0)
nf_unregister_net_hooks(net, ipv4_synproxy_ops,
ARRAY_SIZE(ipv4_synproxy_ops));
+ mutex_unlock(&synproxy_mutex);
}
EXPORT_SYMBOL_GPL(nf_synproxy_ipv4_fini);
@@ -1193,27 +1200,32 @@ static const struct nf_hook_ops ipv6_synproxy_ops[] = {
int
nf_synproxy_ipv6_init(struct synproxy_net *snet, struct net *net)
{
- int err;
+ int err = 0;
+ mutex_lock(&synproxy_mutex);
if (snet->hook_ref6 == 0) {
err = nf_register_net_hooks(net, ipv6_synproxy_ops,
ARRAY_SIZE(ipv6_synproxy_ops));
if (err)
- return err;
+ goto out;
}
snet->hook_ref6++;
- return 0;
+out:
+ mutex_unlock(&synproxy_mutex);
+ return err;
}
EXPORT_SYMBOL_GPL(nf_synproxy_ipv6_init);
void
nf_synproxy_ipv6_fini(struct synproxy_net *snet, struct net *net)
{
+ mutex_lock(&synproxy_mutex);
snet->hook_ref6--;
if (snet->hook_ref6 == 0)
nf_unregister_net_hooks(net, ipv6_synproxy_ops,
ARRAY_SIZE(ipv6_synproxy_ops));
+ mutex_unlock(&synproxy_mutex);
}
EXPORT_SYMBOL_GPL(nf_synproxy_ipv6_fini);
#endif /* CONFIG_IPV6 */
--
2.47.3
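
The locking pattern used in patch 4/9 (register on the first reference,
unregister on the last, all under one mutex) can be sketched in
userspace with pthreads. All names below (hook_mutex, hook_ref,
register_hooks(), hook_init(), hook_fini()) are hypothetical; the
kernel code uses DEFINE_MUTEX() and nf_register_net_hooks() instead.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t hook_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int hook_ref;	/* number of users of the hooks */
static int hooks_registered;	/* live registrations, expected 0 or 1 */

/* Hypothetical stand-ins for hook (un)registration. */
static int register_hooks(void)
{
	hooks_registered++;
	return 0;
}

static void unregister_hooks(void)
{
	hooks_registered--;
}

static int hook_init(void)
{
	int err = 0;

	pthread_mutex_lock(&hook_mutex);
	if (hook_ref == 0) {
		/* First user: register, and only count it on success. */
		err = register_hooks();
		if (err)
			goto out;
	}
	hook_ref++;
out:
	pthread_mutex_unlock(&hook_mutex);
	return err;
}

static void hook_fini(void)
{
	pthread_mutex_lock(&hook_mutex);
	hook_ref--;
	if (hook_ref == 0)
		unregister_hooks();	/* last user gone */
	pthread_mutex_unlock(&hook_mutex);
}
```

Without the mutex, two callers could both observe hook_ref == 0 and
register twice, or one caller could unregister while another is still
between the check and the increment.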
* [PATCH net 5/9] netfilter: conntrack_irc: fix possible out-of-bounds read
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
When parsing fails after we've matched the command string we
should bail out instead of trying to match a different command.
This helper should be deprecated: given the prevalence of TLS, I doubt
it has any relevance in 2026.
Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Closes: https://sashiko.dev/#/patchset/20260525182924.28456-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_irc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c
index 522183b9a604..2ebe4cb47cf6 100644
--- a/net/netfilter/nf_conntrack_irc.c
+++ b/net/netfilter/nf_conntrack_irc.c
@@ -203,7 +203,7 @@ static int help(struct sk_buff *skb, unsigned int protoff,
if (parse_dcc(data, data_limit, &dcc_ip,
&dcc_port, &addr_beg_p, &addr_end_p)) {
pr_debug("unable to parse dcc command\n");
- continue;
+ goto out;
}
pr_debug("DCC bound ip/port: %pI4:%u\n",
@@ -217,7 +217,7 @@ static int help(struct sk_buff *skb, unsigned int protoff,
net_warn_ratelimited("Forged DCC command from %pI4: %pI4:%u\n",
&tuple->src.u3.ip,
&dcc_ip, dcc_port);
- continue;
+ goto out;
}
exp = nf_ct_expect_alloc(ct);
--
2.47.3
* [PATCH net 6/9] netfilter: nft_tunnel: fix use-after-free on object destroy
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Tristan Madani <tristan@talencesecurity.com>
nft_tunnel_obj_destroy() calls metadata_dst_free() which directly
kfree()s the metadata_dst, ignoring the dst_entry refcount. Packets
that took a reference via dst_hold() in nft_tunnel_obj_eval() and
are still queued (e.g. in a netem qdisc) are left with a dangling
pointer. When these packets are eventually dequeued, dst_release()
operates on freed memory.
Replace metadata_dst_free() with dst_release() so the metadata_dst
is freed only after all references are dropped. The dst subsystem
already handles metadata_dst cleanup in dst_destroy() when
DST_METADATA is set.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_tunnel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index 0b987bc2132a..68f7cfbbee06 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -676,7 +676,7 @@ static void nft_tunnel_obj_destroy(const struct nft_ctx *ctx,
{
struct nft_tunnel_obj *priv = nft_obj_data(obj);
- metadata_dst_free(priv->md);
+ dst_release(&priv->md->dst);
}
static struct nft_object_type nft_tunnel_obj_type;
--
2.47.3
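
The difference between freeing directly and dropping a reference, which
is the point of patch 6/9, can be shown with a toy refcounted object.
This is a userspace sketch only: struct obj, obj_hold() and
obj_release() are hypothetical analogues of metadata_dst, dst_hold() and
dst_release(), and the counter is a plain int rather than an atomic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcnt;
	int *freed_flag;	/* set to 1 when the object is really freed */
};

static struct obj *obj_alloc(int *freed_flag)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcnt = 1;		/* the owner's reference */
	o->freed_flag = freed_flag;
	return o;
}

static void obj_hold(struct obj *o)
{
	o->refcnt++;
}

/* Free only when the last reference goes away. */
static void obj_release(struct obj *o)
{
	if (--o->refcnt == 0) {
		*o->freed_flag = 1;
		free(o);
	}
}

/* Returns 0 if the object outlives the owner while a packet holds it. */
static int demo_queued_packet(void)
{
	int freed = 0;
	struct obj *md = obj_alloc(&freed);

	if (!md)
		return -1;

	obj_hold(md);		/* a queued packet takes a reference */
	obj_release(md);	/* object destroy drops the owner's reference */
	if (freed)
		return 1;	/* would be the use-after-free case */

	obj_release(md);	/* packet dequeued: last reference dropped */
	return freed ? 0 : 2;
}
```

A destroy path that called free(md) directly would release the memory
while the queued packet still points at it; releasing the owner's
reference defers the free to whoever drops the count to zero.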
* [PATCH net 7/9] netfilter: nft_ct: bail out on template ct in get eval
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Jiayuan Chen <jiayuan.chen@linux.dev>
I noticed this issue while looking at a historic syzbot report [1].
A rule like the one below is enough to trigger the bug:
table ip t {
chain pre {
type filter hook prerouting priority raw;
ct zone set 1
ct original saddr 1.2.3.4 accept
}
}
The first expression attaches a per-cpu template ct via
nft_ct_set_zone_eval() (nf_ct_tmpl_alloc -> kzalloc, tuple is all
zero, nf_ct_l3num(ct) == 0). The next expression then calls
nft_ct_get_eval() on the same skb, treats the template as a real ct
and hits the 16-byte memcpy path. With dreg at NFT_REG32_15 this
overflows past struct nft_regs on the kernel stack; with smaller
dreg values it silently clobbers adjacent registers.
Reject template ct at the eval entry and in nft_ct_get_fast_eval(),
mirroring the check nft_ct_set_eval() already has. Additionally,
bound the address copy in NFT_CT_SRC / NFT_CT_DST by priv->len
instead of by nf_ct_l3num(ct): nf_ct_get_tuple() zeroes the tuple
before pkt_to_tuple() fills in only the protocol-relevant leading
bytes, so the trailing bytes of tuple->{src,dst}.u3.all are
well-defined zero. priv->len is validated at rule load, so the
copy size is now bounded by the destination register rather than
by an untrusted field on the conntrack.
[1]: https://syzkaller.appspot.com/bug?id=389cf09cb72926114fce90dc85a2c3231dcb647c
Fixes: 45d9bcda21f4 ("netfilter: nf_tables: validate len in nft_validate_data_load()")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_ct.c | 8 +++-----
net/netfilter/nft_ct_fast.c | 2 +-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index fa2cc556331c..357513c6dcea 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -78,7 +78,7 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
break;
}
- if (ct == NULL)
+ if (!ct || nf_ct_is_template(ct))
goto err;
switch (priv->key) {
@@ -180,12 +180,10 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
tuple = &ct->tuplehash[priv->dir].tuple;
switch (priv->key) {
case NFT_CT_SRC:
- memcpy(dest, tuple->src.u3.all,
- nf_ct_l3num(ct) == NFPROTO_IPV4 ? 4 : 16);
+ memcpy(dest, tuple->src.u3.all, priv->len);
return;
case NFT_CT_DST:
- memcpy(dest, tuple->dst.u3.all,
- nf_ct_l3num(ct) == NFPROTO_IPV4 ? 4 : 16);
+ memcpy(dest, tuple->dst.u3.all, priv->len);
return;
case NFT_CT_PROTO_SRC:
nft_reg_store16(dest, (__force u16)tuple->src.u.all);
diff --git a/net/netfilter/nft_ct_fast.c b/net/netfilter/nft_ct_fast.c
index e684c8a91848..ecf7b3a404be 100644
--- a/net/netfilter/nft_ct_fast.c
+++ b/net/netfilter/nft_ct_fast.c
@@ -30,7 +30,7 @@ void nft_ct_get_fast_eval(const struct nft_expr *expr,
break;
}
- if (!ct) {
+ if (!ct || nf_ct_is_template(ct)) {
regs->verdict.code = NFT_BREAK;
return;
}
--
2.47.3
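
To make the bounded copy in patch 7/9 concrete, here is a userspace
sketch that sizes the copy by a length validated up front instead of by
a field read from the object being copied. The names (struct regs,
struct tuple_addr, check_store(), eval_copy()) and the exact validation
rule are assumptions for illustration; they are not the nf_tables
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 16	/* 32-bit registers in this model */

struct regs {
	uint32_t data[NREGS];
};

/* Hypothetical analogue of tuple->src.u3.all: room for an IPv6 address. */
struct tuple_addr {
	uint32_t all[4];
};

/* "Rule load" check: len must be 4 or 16 and fit at register dreg. */
static int check_store(unsigned int dreg, unsigned int len)
{
	if (len != 4 && len != 16)
		return -1;
	if (dreg >= NREGS || dreg * 4 + len > NREGS * 4)
		return -1;
	return 0;
}

/* "Eval": the copy size comes from the validated len only. */
static void eval_copy(struct regs *r, unsigned int dreg, unsigned int len,
		      const struct tuple_addr *addr)
{
	memcpy(&r->data[dreg], addr->all, len);
}

/* Copy from an all-zero (template-like) tuple into the last register. */
static int demo_template_copy(void)
{
	struct regs r;
	struct tuple_addr tmpl = { { 0, 0, 0, 0 } };
	const unsigned int dreg = NREGS - 1, len = 4;

	memset(&r, 0xff, sizeof(r));
	if (check_store(dreg, len))
		return -1;

	eval_copy(&r, dreg, len, &tmpl);

	/* Only the target register changed; its neighbour is intact. */
	return (r.data[dreg] == 0 && r.data[dreg - 1] == 0xffffffffu) ? 0 : 1;
}
```

In this model a 16-byte store at the last register is rejected before
any rule runs, so the copy can never be widened by whatever the source
object claims about its address family.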
* [PATCH net 8/9] netfilter: bridge: make ebt_snat ARP rewrite writable
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Yiming Qian <yimingqian591@gmail.com>
The ebtables SNAT target keeps the Ethernet source address rewrite
behind skb_ensure_writable(skb, 0). This is intentional: at the bridge
ebtables hooks the Ethernet header is addressed through
skb_mac_header()/eth_hdr(), while skb->data points at the Ethernet
payload. Asking skb_ensure_writable() for ETH_HLEN bytes would check
the payload, not the Ethernet header, and would reintroduce the small
packet regression fixed by commit 63137bc5882a.
However, the optional ARP sender hardware address rewrite is different.
It writes through skb_store_bits() at an offset relative to skb->data:
skb_store_bits(skb, sizeof(struct arphdr), info->mac, ETH_ALEN)
skb_header_pointer() only safely reads the ARP header; it does not make
the later sender hardware address range writable. If that range is
still held in a nonlinear skb fragment backed by a splice-imported file
page, skb_store_bits() maps the frag page and copies the new MAC address
directly into it.
Ensure the ARP SHA range is writable before reading the ARP header and
before calling skb_store_bits().
Fixes: 63137bc5882a ("netfilter: ebtables: Fixes dropping of small packets in bridge nat")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/bridge/netfilter/ebt_snat.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/bridge/netfilter/ebt_snat.c b/net/bridge/netfilter/ebt_snat.c
index 7dfbcdfc30e5..c9e229af0366 100644
--- a/net/bridge/netfilter/ebt_snat.c
+++ b/net/bridge/netfilter/ebt_snat.c
@@ -31,6 +31,9 @@ ebt_snat_tg(struct sk_buff *skb, const struct xt_action_param *par)
const struct arphdr *ap;
struct arphdr _ah;
+ if (skb_ensure_writable(skb, sizeof(_ah) + ETH_ALEN))
+ return EBT_DROP;
+
ap = skb_header_pointer(skb, 0, sizeof(_ah), &_ah);
if (ap == NULL)
return EBT_DROP;
--
2.47.3
* [PATCH net 9/9] netfilter: nft_byteorder: remove multi-register support
From: Pablo Neira Ayuso @ 2026-06-01 11:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
64bit byteorder conversion is broken when several registers need to be
converted because the source register array advances in steps of 4 bytes
instead of 8:
for (i = ...
src64 = nft_reg_load64(&src[i]);
~~~~~ u32 *src
nft_reg_store64(&dst64[i],
Remove the multi-register support, it has other issues as well:
Pablo points out that commit
caf3ef7468f7 ("netfilter: nf_tables: prevent OOB access in nft_byteorder_eval")
alters semantics: before the loop operated on registers, i.e.
for ( ... )
dst32[i] = htons((u16)src32[i])
.. but after the patch it will operate on bytes, which makes this
useless to convert e.g. concatenations, which store each compound
in its own register.
Multi-convert of u32 has one theoretical application:
ct mark . meta mark . tcp dport @intervalset
Because ct mark and meta mark are host byte order, use with
intervals has to convert the byteorder for ct/meta mark value
to network byte order (bigendian).
nftables emits this:
[ meta load mark => reg 1 ]
[ byteorder reg 1 = hton(reg 1, 4, 4) ]
[ ct load mark => reg 9 ]
[ byteorder reg 9 = hton(reg 9, 4, 4) ]
...
I.e. two separate calls. Theoretically it could be changed to do:
[ meta load mark => reg 1 ]
[ ct load mark => reg 9 ]
[ byteorder reg 1 = htonl(reg 1, 4, 8) ]
...
But then all it would take is to change the set to
meta mark . tcp dport . ct mark
... and we'd be back to two "byteorder" calls. IOW, support to
convert a range of registers is both dysfunctional and dubious.
Simplify this: remove the feature.
Pablo Neira Ayuso points out that nftables before 1.1.0 can generate
incorrect byteorder conversions (see 9fe58952c45a,
"evaluate: skip byteorder conversion for selector smaller than 2 bytes"
in nftables.git). Affected rulesets fail to load with this change and
old userspace due to the 'len != size' check.
Fixes: c301f0981fdd ("netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()")
Cc: <stable+noautosel@kernel.org> # may break rule load with old nftables versions
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/netfilter-devel/20240206104336.ctigqpkunom2ufmn@lion.mk-sys.cz/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_byteorder.c | 51 ++++++++++++++---------------------
1 file changed, 20 insertions(+), 31 deletions(-)
diff --git a/net/netfilter/nft_byteorder.c b/net/netfilter/nft_byteorder.c
index 2316c77f4228..dfd41fc8d9b8 100644
--- a/net/netfilter/nft_byteorder.c
+++ b/net/netfilter/nft_byteorder.c
@@ -19,7 +19,6 @@ struct nft_byteorder {
u8 sreg;
u8 dreg;
enum nft_byteorder_ops op:8;
- u8 len;
u8 size;
};
@@ -28,13 +27,8 @@ void nft_byteorder_eval(const struct nft_expr *expr,
const struct nft_pktinfo *pkt)
{
const struct nft_byteorder *priv = nft_expr_priv(expr);
- u32 *src = &regs->data[priv->sreg];
+ const u32 *src = &regs->data[priv->sreg];
u32 *dst = &regs->data[priv->dreg];
- u16 *s16, *d16;
- unsigned int i;
-
- s16 = (void *)src;
- d16 = (void *)dst;
switch (priv->size) {
case 8: {
@@ -43,18 +37,14 @@ void nft_byteorder_eval(const struct nft_expr *expr,
switch (priv->op) {
case NFT_BYTEORDER_NTOH:
- for (i = 0; i < priv->len / 8; i++) {
- src64 = nft_reg_load64(&src[i]);
- nft_reg_store64(&dst64[i],
- be64_to_cpu((__force __be64)src64));
- }
+ src64 = nft_reg_load64(src);
+
+ nft_reg_store64(dst64, be64_to_cpu((__force __be64)src64));
break;
case NFT_BYTEORDER_HTON:
- for (i = 0; i < priv->len / 8; i++) {
- src64 = (__force __u64)
- cpu_to_be64(nft_reg_load64(&src[i]));
- nft_reg_store64(&dst64[i], src64);
- }
+ src64 = (__force __u64)cpu_to_be64(nft_reg_load64(src));
+
+ nft_reg_store64(dst64, src64);
break;
}
break;
@@ -62,24 +52,20 @@ void nft_byteorder_eval(const struct nft_expr *expr,
case 4:
switch (priv->op) {
case NFT_BYTEORDER_NTOH:
- for (i = 0; i < priv->len / 4; i++)
- dst[i] = ntohl((__force __be32)src[i]);
+ *dst = ntohl((__force __be32)*src);
break;
case NFT_BYTEORDER_HTON:
- for (i = 0; i < priv->len / 4; i++)
- dst[i] = (__force __u32)htonl(src[i]);
+ *dst = (__force __u32)htonl(*src);
break;
}
break;
case 2:
switch (priv->op) {
case NFT_BYTEORDER_NTOH:
- for (i = 0; i < priv->len / 2; i++)
- d16[i] = ntohs((__force __be16)s16[i]);
+ nft_reg_store16(dst, ntohs(nft_reg_load_be16(src)));
break;
case NFT_BYTEORDER_HTON:
- for (i = 0; i < priv->len / 2; i++)
- d16[i] = (__force __u16)htons(s16[i]);
+ nft_reg_store_be16(dst, htons(nft_reg_load16(src)));
break;
}
break;
@@ -137,20 +123,22 @@ static int nft_byteorder_init(const struct nft_ctx *ctx,
if (err < 0)
return err;
- priv->len = len;
+ /* no longer support multi-reg conversions */
+ if (len != size)
+ return -EOPNOTSUPP;
err = nft_parse_register_load(ctx, tb[NFTA_BYTEORDER_SREG], &priv->sreg,
- priv->len);
+ len);
if (err < 0)
return err;
err = nft_parse_register_store(ctx, tb[NFTA_BYTEORDER_DREG],
&priv->dreg, NULL, NFT_DATA_VALUE,
- priv->len);
+ len);
if (err < 0)
return err;
- if (nft_reg_overlap(priv->sreg, priv->dreg, priv->len))
+ if (nft_reg_overlap(priv->sreg, priv->dreg, len))
return -EINVAL;
return 0;
@@ -167,10 +155,11 @@ static int nft_byteorder_dump(struct sk_buff *skb,
goto nla_put_failure;
if (nla_put_be32(skb, NFTA_BYTEORDER_OP, htonl(priv->op)))
goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_BYTEORDER_LEN, htonl(priv->len)))
- goto nla_put_failure;
if (nla_put_be32(skb, NFTA_BYTEORDER_SIZE, htonl(priv->size)))
goto nla_put_failure;
+ /* compatibility for old userspace which permitted size != len */
+ if (nla_put_be32(skb, NFTA_BYTEORDER_LEN, htonl(priv->size)))
+ goto nla_put_failure;
return 0;
nla_put_failure:
--
2.47.3
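
The stride problem described in patch 9/9 (indexing a u32 array with i
while loading 64-bit values) can be reproduced in userspace. This is a
sketch under stated assumptions: load64() and demo_stride() are
hypothetical helpers, with load64() modelled loosely on what
nft_reg_load64() is used for. Note that the patch does not correct the
stride; it removes the multi-register loop altogether.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 64-bit value that spans two consecutive 32-bit registers. */
static uint64_t load64(const uint32_t *reg)
{
	uint64_t v;

	memcpy(&v, reg, sizeof(v));
	return v;
}

/*
 * Store two 64-bit values in four 32-bit registers, then fetch the
 * second one with the buggy index (&regs[i]) and with a stride that
 * matches the value size (&regs[2 * i]). Returns 1 if only the latter
 * yields the second value.
 */
static int demo_stride(void)
{
	const uint64_t v0 = 0x1111111122222222ULL;
	const uint64_t v1 = 0x3333333344444444ULL;
	uint32_t regs[4];
	unsigned int i = 1;
	uint64_t buggy, fixed;

	memcpy(&regs[0], &v0, sizeof(v0));
	memcpy(&regs[2], &v1, sizeof(v1));

	buggy = load64(&regs[i]);	/* advances 4 bytes: straddles v0 and v1 */
	fixed = load64(&regs[2 * i]);	/* advances 8 bytes: reads v1 */

	return fixed == v1 && buggy != v1 && buggy != v0;
}
```

The buggy load mixes one half of each value, on little-endian and
big-endian hosts alike, which is why a multi-register 64-bit conversion
could not have produced correct results.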