public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2023-01-18 12:31 Florian Westphal
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2023-01-18 12:31 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Hello,

The following patch set includes netfilter updates for your *net-next* tree.

1. Replace pr_debug use with nf_log infra for debugging in sctp
   conntrack.
2. Remove pr_debug calls; they are either useless or better
   options are already in place.
3. Avoid repeated loads of ct->status in some spots.
   Some bit-flags cannot change during the lifetime of
   a connection, so there is no need to re-fetch those.
4. Avoid unneeded nesting of rcu_read_lock during tuple lookup.
5. Remove the CLUSTERIP target.  Marked as obsolete for years,
   and we still have WARN splats wrt. races of the out-of-band
   /proc interface installed by this target.
6. Add a static key to nf_tables to skip the retpoline-mitigation
   if/else if cascade when the CPU doesn't need the retpoline thunk.
7. Add nf_tables objref calls to the retpoline mitigation workaround.
8. Split the parts of nft_ct.c that do not need symbols exported by
   the conntrack modules and place them in nf_tables directly.
   This avoids an indirect call for 'ct status' checks.
9. Add 'destroy' commands to nf_tables.  They are identical
   to the existing 'delete' commands, but do not indicate
   an error if the referenced object (set, chain, rule...)
   did not exist, from Fernando.

The following changes since commit c4791b3196bf46367bcf6cc56a09b32e037c4f49:

  Merge branch 'net-mdio-continue-separating-c22-and-c45' (2023-01-17 19:34:10 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git

for you to fetch changes up to f80a612dd77c4585171e44a06b490466bdeec1ae:

  netfilter: nf_tables: add support to destroy operation (2023-01-18 13:09:00 +0100)

----------------------------------------------------------------
Fernando Fernandez Mancera (1):
      netfilter: nf_tables: add support to destroy operation

Florian Westphal (8):
      netfilter: conntrack: sctp: use nf log infrastructure for invalid packets
      netfilter: conntrack: remove pr_debug calls
      netfilter: conntrack: avoid reload of ct->status
      netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
      netfilter: ip_tables: remove clusterip target
      netfilter: nf_tables: add static key to skip retpoline workarounds
      netfilter: nf_tables: avoid retpoline overhead for objref calls
      netfilter: nf_tables: avoid retpoline overhead for some ct expression calls

 include/net/netfilter/nf_tables_core.h   |  16 +
 include/uapi/linux/netfilter/nf_tables.h |  14 +
 net/ipv4/netfilter/Kconfig               |  14 -
 net/ipv4/netfilter/Makefile              |   1 -
 net/ipv4/netfilter/ipt_CLUSTERIP.c       | 929 -------------------------------
 net/netfilter/Makefile                   |   6 +
 net/netfilter/nf_conntrack_core.c        |  46 +-
 net/netfilter/nf_conntrack_proto.c       |  20 +-
 net/netfilter/nf_conntrack_proto_sctp.c  |  46 +-
 net/netfilter/nf_conntrack_proto_tcp.c   |   9 -
 net/netfilter/nf_conntrack_proto_udp.c   |  10 +-
 net/netfilter/nf_tables_api.c            | 111 +++-
 net/netfilter/nf_tables_core.c           |  35 +-
 net/netfilter/nft_ct.c                   |  39 +-
 net/netfilter/nft_ct_fast.c              |  56 ++
 net/netfilter/nft_objref.c               |  12 +-
 16 files changed, 302 insertions(+), 1062 deletions(-)
 delete mode 100644 net/ipv4/netfilter/ipt_CLUSTERIP.c
 create mode 100644 net/netfilter/nft_ct_fast.c
-- 
2.38.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2023-03-08 19:30 Florian Westphal
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2023-03-08 19:30 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel

Hi,

The following set contains updates for the *net-next* tree:

1. nf_tables 'brouting' support, from Sriram Yagnaraman.

2. Update bridge netfilter and ovs conntrack helpers to handle
   IPv6 Jumbo packets properly, i.e. fetch the packet length
   from hop-by-hop extension header, from Xin Long.

   This comes with a BIG TCP test case, added to
   tools/testing/selftests/net/.

3. Fix spelling and indentation in conntrack, from Jeremy Sowden.

Please consider pulling from

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git

----------------------------------------------------------------

The following changes since commit 7d8c48917a9576b5fc8871aa4946149b0e4a4927:

  dt-bindings: net: dsa: mediatek,mt7530: change some descriptions to literal (2023-03-08 13:05:37 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git main

for you to fetch changes up to b0ca200077b3872056e6a8291c9a50f803658c2a:

  netfilter: nat: fix indentation of function arguments (2023-03-08 14:25:44 +0100)

----------------------------------------------------------------

Jeremy Sowden (2):
  netfilter: conntrack: fix typo
  netfilter: nat: fix indentation of function arguments

Sriram Yagnaraman (1):
  netfilter: bridge: introduce broute meta statement

Xin Long (6):
  netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  netfilter: bridge: check len before accessing more nh data
  netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  netfilter: move br_nf_check_hbh_len to utils
  netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  selftests: add a selftest for big tcp

 include/linux/netfilter_ipv6.h           |   2 +
 include/uapi/linux/netfilter/nf_tables.h |   2 +
 net/bridge/br_netfilter_ipv6.c           |  79 ++--------
 net/bridge/netfilter/nft_meta_bridge.c   |  71 ++++++++-
 net/netfilter/nf_conntrack_core.c        |   2 +-
 net/netfilter/nf_conntrack_ovs.c         |  11 +-
 net/netfilter/nf_nat_core.c              |   4 +-
 net/netfilter/utils.c                    |  52 +++++++
 tools/testing/selftests/net/Makefile     |   1 +
 tools/testing/selftests/net/big_tcp.sh   | 180 +++++++++++++++++++++++
 10 files changed, 327 insertions(+), 77 deletions(-)
 create mode 100755 tools/testing/selftests/net/big_tcp.sh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2023-05-18 10:07 Florian Westphal
  0 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel

Hello,

[ sorry if you get this twice, wrong mail aliases in v1 ]

This PR contains updates for your *net-next* tree.

nftables updates:

1. Allow key existence checks with maps.
   At the moment the kernel requires userspace to pass a destination
   register for the associated value; make this optional so userspace
   can query whether a key exists, just like with normal sets.

2. nftables maintains a counter per set that holds the number of
   elements.  This counter gets decremented on element removal,
   but it is only incremented if the set has an upper maximum value.
   Increment unconditionally; this will allow us to update the
   maximum value later on.

3. Add DCCP option matching, from Jeremy Sowden.

4. Use the struct_size() macro, from Christophe JAILLET.

Conntrack:

5. Squash holes in struct nf_conntrack_expect, also Christophe JAILLET.

6. Allow clash resolution for the GRE protocol to avoid a packet drop,
   from Faicker Mo.

Flowtable:

Simplify route logic and split large functions into smaller
chunks, from Pablo Neira Ayuso.

The following changes since commit b50a8b0d57ab1ef11492171e98a030f48682eac3:

  net: openvswitch: Use struct_size() (2023-05-17 21:25:46 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-2023-05-18

for you to fetch changes up to e05b5362166b18a224c30502e81416e4d622d3e4:

  netfilter: flowtable: split IPv6 datapath in helper functions (2023-05-18 08:48:55 +0200)

----------------------------------------------------------------
Christophe JAILLET (2):
      netfilter: Reorder fields in 'struct nf_conntrack_expect'
      netfilter: nft_set_pipapo: Use struct_size()

Faicker Mo (1):
      netfilter: conntrack: allow insertion clash of gre protocol

Florian Westphal (2):
      netfilter: nf_tables: relax set/map validation checks
      netfilter: nf_tables: always increment set element count

Jeremy Sowden (1):
      netfilter: nft_exthdr: add boolean DCCP option matching

Pablo Neira Ayuso (3):
      netfilter: flowtable: simplify route logic
      netfilter: flowtable: split IPv4 datapath in helper functions
      netfilter: flowtable: split IPv6 datapath in helper functions

 include/net/netfilter/nf_conntrack_expect.h |  18 +--
 include/net/netfilter/nf_flow_table.h       |   4 +-
 include/uapi/linux/netfilter/nf_tables.h    |   2 +
 net/netfilter/nf_conntrack_proto_gre.c      |   1 +
 net/netfilter/nf_flow_table_core.c          |  24 +--
 net/netfilter/nf_flow_table_ip.c            | 231 ++++++++++++++++++----------
 net/netfilter/nf_tables_api.c               |  11 +-
 net/netfilter/nft_exthdr.c                  | 106 +++++++++++++
 net/netfilter/nft_flow_offload.c            |  12 +-
 net/netfilter/nft_lookup.c                  |  23 ++-
 net/netfilter/nft_set_pipapo.c              |   6 +-
 11 files changed, 303 insertions(+), 135 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2024-08-22 22:19 Pablo Neira Ayuso
  0 siblings, 0 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2024-08-22 22:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following batch contains Netfilter updates for net-next:

Patch #1 fixes checksum calculation in nfnetlink_queue with SCTP,
	 segmenting GSO packets since skb_zerocopy() does not support
	 GSO_BY_FRAGS, from Antonio Ojea.

Patch #2 extends nfnetlink_queue coverage to handle SCTP packets,
	 from Antonio Ojea.

Patch #3 uses consume_skb() instead of kfree_skb() in nfnetlink,
         from Donald Hunter.

Patch #4 adds a dedicated commit list for sets to speed up
	 intra-transaction lookups, from Florian Westphal.

Patch #5 skips removal of elements in the abort path for the pipapo
         backend; ditching the shadow copy of this data structure
	 is sufficient.

Patch #6 moves nf_ct_netns_get() out of nf_conncount_init() to
	 let users of conncount decide when to enable conntrack;
	 this is needed by openvswitch, from Xin Long.

Patch #7 passes context to all nft_parse_register_load() calls in
	 preparation for the next patch.

Patches #8 and #9 reject loads from uninitialized registers in the
	 control plane, removing register initialization from the
	 datapath. From Florian Westphal.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git nf-next-24-08-23

Thanks.

----------------------------------------------------------------

The following changes since commit 1bf8e07c382bd4f04ede81ecc05267a8ffd60999:

  dt-binding: ptp: fsl,ptp: add pci1957,ee02 compatible string for fsl,enetc-ptp (2024-08-19 09:48:53 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-24-08-23

for you to fetch changes up to c88baabf16d1ef74ab8832de9761226406af5507:

  netfilter: nf_tables: don't initialize registers in nft_do_chain() (2024-08-20 12:37:25 +0200)

----------------------------------------------------------------
netfilter pull request 24-08-23

----------------------------------------------------------------
Antonio Ojea (2):
      netfilter: nfnetlink_queue: unbreak SCTP traffic
      selftests: netfilter: nft_queue.sh: sctp coverage

Donald Hunter (1):
      netfilter: nfnetlink: convert kfree_skb to consume_skb

Florian Westphal (4):
      netfilter: nf_tables: store new sets in dedicated list
      netfilter: nf_tables: pass context structure to nft_parse_register_load
      netfilter: nf_tables: allow loads only when register is initialized
      netfilter: nf_tables: don't initialize registers in nft_do_chain()

Pablo Neira Ayuso (1):
      netfilter: nf_tables: do not remove elements if set backend implements .abort

Xin Long (1):
      netfilter: move nf_ct_netns_get out of nf_conncount_init

 include/net/netfilter/nf_conntrack_count.h         |  6 +-
 include/net/netfilter/nf_tables.h                  |  6 +-
 net/bridge/netfilter/nft_meta_bridge.c             |  2 +-
 net/core/dev.c                                     |  1 +
 net/ipv4/netfilter/nft_dup_ipv4.c                  |  4 +-
 net/ipv6/netfilter/nft_dup_ipv6.c                  |  4 +-
 net/netfilter/nf_conncount.c                       | 15 +---
 net/netfilter/nf_tables_api.c                      | 75 +++++++++++++++----
 net/netfilter/nf_tables_core.c                     |  2 +-
 net/netfilter/nfnetlink.c                          | 14 ++--
 net/netfilter/nfnetlink_queue.c                    | 12 ++-
 net/netfilter/nft_bitwise.c                        |  4 +-
 net/netfilter/nft_byteorder.c                      |  2 +-
 net/netfilter/nft_cmp.c                            |  6 +-
 net/netfilter/nft_ct.c                             |  2 +-
 net/netfilter/nft_dup_netdev.c                     |  2 +-
 net/netfilter/nft_dynset.c                         |  4 +-
 net/netfilter/nft_exthdr.c                         |  2 +-
 net/netfilter/nft_fwd_netdev.c                     |  6 +-
 net/netfilter/nft_hash.c                           |  2 +-
 net/netfilter/nft_lookup.c                         |  2 +-
 net/netfilter/nft_masq.c                           |  4 +-
 net/netfilter/nft_meta.c                           |  2 +-
 net/netfilter/nft_nat.c                            |  8 +-
 net/netfilter/nft_objref.c                         |  2 +-
 net/netfilter/nft_payload.c                        |  2 +-
 net/netfilter/nft_queue.c                          |  2 +-
 net/netfilter/nft_range.c                          |  2 +-
 net/netfilter/nft_redir.c                          |  4 +-
 net/netfilter/nft_tproxy.c                         |  4 +-
 net/netfilter/xt_connlimit.c                       | 15 +++-
 net/openvswitch/conntrack.c                        |  5 +-
 tools/testing/selftests/net/netfilter/config       |  2 +
 tools/testing/selftests/net/netfilter/nft_queue.sh | 85 +++++++++++++++++++++-
 34 files changed, 226 insertions(+), 84 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 0/9] netfilter: updates for net-next
@ 2026-01-28 15:41 Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 1/9] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Hi,

The following patchset contains Netfilter updates for *net-next*:

Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
infrastructure.  Patch 5 extends test coverage for this.
From Lorenzo Bianconi.

Patch 6 removes a duplicated helper from the xt_time extension; an
existing helper can be used instead, from Jinjie Ruan.

Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
verdict processing.  Before this, a list walk was required due to the
in-order design assumption.

Patch 8 fixes an esoteric packet-drop problem with UDPGRO and nfqueue added
in v6.11. Patch 9 adds a test case for this.

Please, pull these changes from:

The following changes since commit 239f09e258b906deced5c2a7c1ac8aed301b558b:

  selftests: ptp: treat unsupported PHC operations as skip (2026-01-27 17:57:28 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git nf-next-26-01-28

for you to fetch changes up to f0ba90068f33a2d18fa4cc848ea7477d489194bf:

  selftests: netfilter: nft_queue.sh: add udp fraglist gro test case (2026-01-28 16:29:55 +0100)

----------------------------------------------------------------
netfilter pull request nf-next-26-01-28

----------------------------------------------------------------
Florian Westphal (2):
  netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation
  selftests: netfilter: nft_queue.sh: add udp fraglist gro test case

Jinjie Ruan (1):
  netfilter: xt_time: use is_leap_year() helper

Lorenzo Bianconi (5):
  netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
  netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
  netfilter: flowtable: Add IP6IP6 rx sw acceleration
  netfilter: flowtable: Add IP6IP6 tx sw acceleration
  selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest

Scott Mitchell (1):
  netfilter: nfnetlink_queue: optimize verdict lookup with hash table

 include/net/netfilter/nf_queue.h              |   4 +
 net/ipv6/ip6_tunnel.c                         |  27 ++
 net/netfilter/nf_flow_table_ip.c              | 243 +++++++++++++---
 net/netfilter/nfnetlink_queue.c               | 263 ++++++++++++------
 net/netfilter/xt_time.c                       |   8 +-
 .../selftests/net/netfilter/nft_flowtable.sh  |  62 ++++-
 .../selftests/net/netfilter/nft_queue.sh      | 142 +++++++++-
 7 files changed, 612 insertions(+), 137 deletions(-)
-- 
2.52.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/9] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 2/9] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Lorenzo Bianconi <lorenzo@kernel.org>

Rely on a nf_flowtable_ctx struct pointer in the nf_flow_ip4_tunnel_proto
and nf_flow_skb_encap_protocol routine signatures. This is a preliminary
patch to introduce IP6IP6 flowtable acceleration, since nf_flowtable_ctx
will be used to store IP6IP6 tunnel info.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_flow_table_ip.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 11da560f38bf..283b3fe61919 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -295,15 +295,16 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
 	return NF_STOLEN;
 }
 
-static bool nf_flow_ip4_tunnel_proto(struct sk_buff *skb, u32 *psize)
+static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
+				     struct sk_buff *skb)
 {
 	struct iphdr *iph;
 	u16 size;
 
-	if (!pskb_may_pull(skb, sizeof(*iph) + *psize))
+	if (!pskb_may_pull(skb, sizeof(*iph) + ctx->offset))
 		return false;
 
-	iph = (struct iphdr *)(skb_network_header(skb) + *psize);
+	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
 	size = iph->ihl << 2;
 
 	if (ip_is_fragment(iph) || unlikely(ip_has_options(size)))
@@ -313,7 +314,7 @@ static bool nf_flow_ip4_tunnel_proto(struct sk_buff *skb, u32 *psize)
 		return false;
 
 	if (iph->protocol == IPPROTO_IPIP)
-		*psize += size;
+		ctx->offset += size;
 
 	return true;
 }
@@ -329,8 +330,8 @@ static void nf_flow_ip4_tunnel_pop(struct sk_buff *skb)
 	skb_reset_network_header(skb);
 }
 
-static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
-				       u32 *offset)
+static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
+				       struct sk_buff *skb, __be16 proto)
 {
 	__be16 inner_proto = skb->protocol;
 	struct vlan_ethhdr *veth;
@@ -343,7 +344,7 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
 
 		veth = (struct vlan_ethhdr *)skb_mac_header(skb);
 		if (veth->h_vlan_encapsulated_proto == proto) {
-			*offset += VLAN_HLEN;
+			ctx->offset += VLAN_HLEN;
 			inner_proto = proto;
 			ret = true;
 		}
@@ -351,14 +352,14 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
 	case htons(ETH_P_PPP_SES):
 		if (nf_flow_pppoe_proto(skb, &inner_proto) &&
 		    inner_proto == proto) {
-			*offset += PPPOE_SES_HLEN;
+			ctx->offset += PPPOE_SES_HLEN;
 			ret = true;
 		}
 		break;
 	}
 
 	if (inner_proto == htons(ETH_P_IP))
-		ret = nf_flow_ip4_tunnel_proto(skb, offset);
+		ret = nf_flow_ip4_tunnel_proto(ctx, skb);
 
 	return ret;
 }
@@ -416,7 +417,7 @@ nf_flow_offload_lookup(struct nf_flowtable_ctx *ctx,
 {
 	struct flow_offload_tuple tuple = {};
 
-	if (!nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &ctx->offset))
+	if (!nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IP)))
 		return NULL;
 
 	if (nf_flow_tuple_ip(ctx, skb, &tuple) < 0)
@@ -897,7 +898,7 @@ nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
 	struct flow_offload_tuple tuple = {};
 
 	if (skb->protocol != htons(ETH_P_IPV6) &&
-	    !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IPV6), &ctx->offset))
+	    !nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
 		return NULL;
 
 	if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/9] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 1/9] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 3/9] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Lorenzo Bianconi <lorenzo@kernel.org>

Add tunnel hdr_size and proto fields to the nf_flowtable_ctx struct in
order to store the IP tunnel header size and protocol used during IPIP
and IP6IP6 tunnel sw offload decapsulation, and to avoid recomputing
them during the tunnel header pop, since this is constant for IPv6.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_flow_table_ip.c | 41 +++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 283b3fe61919..ddfaddfa57be 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -144,6 +144,18 @@ static bool ip_has_options(unsigned int thoff)
 	return thoff != sizeof(struct iphdr);
 }
 
+struct nf_flowtable_ctx {
+	const struct net_device	*in;
+	u32			offset;
+	u32			hdrsize;
+	struct {
+		/* Tunnel IP header size */
+		u32 hdr_size;
+		/* IP tunnel protocol */
+		u8 proto;
+	} tun;
+};
+
 static void nf_flow_tuple_encap(struct sk_buff *skb,
 				struct flow_offload_tuple *tuple)
 {
@@ -186,12 +198,6 @@ static void nf_flow_tuple_encap(struct sk_buff *skb,
 	}
 }
 
-struct nf_flowtable_ctx {
-	const struct net_device	*in;
-	u32			offset;
-	u32			hdrsize;
-};
-
 static int nf_flow_tuple_ip(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
 			    struct flow_offload_tuple *tuple)
 {
@@ -313,20 +319,22 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
 	if (iph->ttl <= 1)
 		return false;
 
-	if (iph->protocol == IPPROTO_IPIP)
+	if (iph->protocol == IPPROTO_IPIP) {
+		ctx->tun.proto = IPPROTO_IPIP;
+		ctx->tun.hdr_size = size;
 		ctx->offset += size;
+	}
 
 	return true;
 }
 
-static void nf_flow_ip4_tunnel_pop(struct sk_buff *skb)
+static void nf_flow_ip4_tunnel_pop(struct nf_flowtable_ctx *ctx,
+				   struct sk_buff *skb)
 {
-	struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
-
-	if (iph->protocol != IPPROTO_IPIP)
+	if (ctx->tun.proto != IPPROTO_IPIP)
 		return;
 
-	skb_pull(skb, iph->ihl << 2);
+	skb_pull(skb, ctx->tun.hdr_size);
 	skb_reset_network_header(skb);
 }
 
@@ -364,7 +372,8 @@ static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
 	return ret;
 }
 
-static void nf_flow_encap_pop(struct sk_buff *skb,
+static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
+			      struct sk_buff *skb,
 			      struct flow_offload_tuple_rhash *tuplehash)
 {
 	struct vlan_hdr *vlan_hdr;
@@ -391,7 +400,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
 	}
 
 	if (skb->protocol == htons(ETH_P_IP))
-		nf_flow_ip4_tunnel_pop(skb);
+		nf_flow_ip4_tunnel_pop(ctx, skb);
 }
 
 struct nf_flow_xmit {
@@ -461,7 +470,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
 
 	flow_offload_refresh(flow_table, flow, false);
 
-	nf_flow_encap_pop(skb, tuplehash);
+	nf_flow_encap_pop(ctx, skb, tuplehash);
 	thoff -= ctx->offset;
 
 	iph = ip_hdr(skb);
@@ -876,7 +885,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 
 	flow_offload_refresh(flow_table, flow, false);
 
-	nf_flow_encap_pop(skb, tuplehash);
+	nf_flow_encap_pop(ctx, skb, tuplehash);
 
 	ip6h = ipv6_hdr(skb);
 	nf_flow_nat_ipv6(flow, skb, dir, ip6h);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/9] netfilter: flowtable: Add IP6IP6 rx sw acceleration
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 1/9] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 2/9] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 4/9] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Lorenzo Bianconi <lorenzo@kernel.org>

Introduce sw acceleration for the rx path of IP6IP6 tunnels, relying on
the netfilter flowtable infrastructure. Subsequent patches will add sw
acceleration for the IP6IP6 tunnel tx path.
IP6IP6 rx sw acceleration can be tested by running the following scenario,
where traffic is forwarded between two NICs (eth0 and eth1) and an
IP6IP6 tunnel is used to access a remote site (using eth1 as the underlay
device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (2001:db8:3::2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:1::2/64 scope global nodad
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:2::1/64 scope global nodad
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc::
    inet6 2002:db8:1::1/64 scope global nodad
       valid_lft forever preferred_lft forever

$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths, I got the
following results:
- TCP stream received from the IP6IP6 tunnel:
  - net-next: (baseline)                  ~ 81Gbps
  - net-next + IP6IP6 flowtable support:  ~112Gbps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv6/ip6_tunnel.c            | 27 +++++++++++
 net/netfilter/nf_flow_table_ip.c | 83 +++++++++++++++++++++++++++-----
 2 files changed, 97 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c1f39735a236..f68f6f110a3e 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1828,6 +1828,32 @@ int ip6_tnl_encap_setup(struct ip6_tnl *t,
 }
 EXPORT_SYMBOL_GPL(ip6_tnl_encap_setup);
 
+static int ip6_tnl_fill_forward_path(struct net_device_path_ctx *ctx,
+				     struct net_device_path *path)
+{
+	struct ip6_tnl *t = netdev_priv(ctx->dev);
+	struct flowi6 fl6 = {
+		.daddr = t->parms.raddr,
+	};
+	struct dst_entry *dst;
+	int err;
+
+	dst = ip6_route_output(dev_net(ctx->dev), NULL, &fl6);
+	if (!dst->error) {
+		path->type = DEV_PATH_TUN;
+		path->tun.src_v6 = t->parms.laddr;
+		path->tun.dst_v6 = t->parms.raddr;
+		path->tun.l3_proto = IPPROTO_IPV6;
+		path->dev = ctx->dev;
+		ctx->dev = dst->dev;
+	}
+
+	err = dst->error;
+	dst_release(dst);
+
+	return err;
+}
+
 static const struct net_device_ops ip6_tnl_netdev_ops = {
 	.ndo_init	= ip6_tnl_dev_init,
 	.ndo_uninit	= ip6_tnl_dev_uninit,
@@ -1836,6 +1862,7 @@ static const struct net_device_ops ip6_tnl_netdev_ops = {
 	.ndo_change_mtu = ip6_tnl_change_mtu,
 	.ndo_get_stats64 = dev_get_tstats64,
 	.ndo_get_iflink = ip6_tnl_get_iflink,
+	.ndo_fill_forward_path = ip6_tnl_fill_forward_path,
 };
 
 #define IPXIPX_FEATURES (NETIF_F_SG |		\
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index ddfaddfa57be..51c64b3d4e50 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -156,12 +156,14 @@ struct nf_flowtable_ctx {
 	} tun;
 };
 
-static void nf_flow_tuple_encap(struct sk_buff *skb,
+static void nf_flow_tuple_encap(struct nf_flowtable_ctx *ctx,
+				struct sk_buff *skb,
 				struct flow_offload_tuple *tuple)
 {
 	__be16 inner_proto = skb->protocol;
 	struct vlan_ethhdr *veth;
 	struct pppoe_hdr *phdr;
+	struct ipv6hdr *ip6h;
 	struct iphdr *iph;
 	u16 offset = 0;
 	int i = 0;
@@ -188,13 +190,25 @@ static void nf_flow_tuple_encap(struct sk_buff *skb,
 		break;
 	}
 
-	if (inner_proto == htons(ETH_P_IP)) {
+	switch (inner_proto) {
+	case htons(ETH_P_IP):
 		iph = (struct iphdr *)(skb_network_header(skb) + offset);
-		if (iph->protocol == IPPROTO_IPIP) {
+		if (ctx->tun.proto == IPPROTO_IPIP) {
 			tuple->tun.dst_v4.s_addr = iph->daddr;
 			tuple->tun.src_v4.s_addr = iph->saddr;
 			tuple->tun.l3_proto = IPPROTO_IPIP;
 		}
+		break;
+	case htons(ETH_P_IPV6):
+		ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
+		if (ctx->tun.proto == IPPROTO_IPV6) {
+			tuple->tun.dst_v6 = ip6h->daddr;
+			tuple->tun.src_v6 = ip6h->saddr;
+			tuple->tun.l3_proto = IPPROTO_IPV6;
+		}
+		break;
+	default:
+		break;
 	}
 }
 
@@ -265,7 +279,7 @@ static int nf_flow_tuple_ip(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
 	tuple->l3proto		= AF_INET;
 	tuple->l4proto		= ipproto;
 	tuple->iifidx		= ctx->in->ifindex;
-	nf_flow_tuple_encap(skb, tuple);
+	nf_flow_tuple_encap(ctx, skb, tuple);
 
 	return 0;
 }
@@ -328,10 +342,45 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
 	return true;
 }
 
-static void nf_flow_ip4_tunnel_pop(struct nf_flowtable_ctx *ctx,
-				   struct sk_buff *skb)
+static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
+				     struct sk_buff *skb)
 {
-	if (ctx->tun.proto != IPPROTO_IPIP)
+#if IS_ENABLED(CONFIG_IPV6)
+	struct ipv6hdr *ip6h, _ip6h;
+	__be16 frag_off;
+	u8 nexthdr;
+	int hdrlen;
+
+	ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
+	if (!ip6h)
+		return false;
+
+	if (ip6h->hop_limit <= 1)
+		return false;
+
+	nexthdr = ip6h->nexthdr;
+	hdrlen = ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr,
+				  &frag_off);
+	if (hdrlen < 0)
+		return false;
+
+	if (nexthdr == IPPROTO_IPV6) {
+		ctx->tun.hdr_size = hdrlen;
+		ctx->tun.proto = IPPROTO_IPV6;
+	}
+	ctx->offset += ctx->tun.hdr_size;
+
+	return true;
+#else
+	return false;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+}
+
+static void nf_flow_ip_tunnel_pop(struct nf_flowtable_ctx *ctx,
+				  struct sk_buff *skb)
+{
+	if (ctx->tun.proto != IPPROTO_IPIP &&
+	    ctx->tun.proto != IPPROTO_IPV6)
 		return;
 
 	skb_pull(skb, ctx->tun.hdr_size);
@@ -366,8 +415,16 @@ static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
 		break;
 	}
 
-	if (inner_proto == htons(ETH_P_IP))
+	switch (inner_proto) {
+	case htons(ETH_P_IP):
 		ret = nf_flow_ip4_tunnel_proto(ctx, skb);
+		break;
+	case htons(ETH_P_IPV6):
+		ret = nf_flow_ip6_tunnel_proto(ctx, skb);
+		break;
+	default:
+		break;
+	}
 
 	return ret;
 }
@@ -399,8 +456,9 @@ static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
 		}
 	}
 
-	if (skb->protocol == htons(ETH_P_IP))
-		nf_flow_ip4_tunnel_pop(ctx, skb);
+	if (skb->protocol == htons(ETH_P_IP) ||
+	    skb->protocol == htons(ETH_P_IPV6))
+		nf_flow_ip_tunnel_pop(ctx, skb);
 }
 
 struct nf_flow_xmit {
@@ -848,7 +906,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
 	tuple->l3proto		= AF_INET6;
 	tuple->l4proto		= nexthdr;
 	tuple->iifidx		= ctx->in->ifindex;
-	nf_flow_tuple_encap(skb, tuple);
+	nf_flow_tuple_encap(ctx, skb, tuple);
 
 	return 0;
 }
@@ -906,8 +964,7 @@ nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
 {
 	struct flow_offload_tuple tuple = {};
 
-	if (skb->protocol != htons(ETH_P_IPV6) &&
-	    !nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
+	if (!nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
 		return NULL;
 
 	if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 4/9] netfilter: flowtable: Add IP6IP6 tx sw acceleration
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (2 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 3/9] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 5/9] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Lorenzo Bianconi <lorenzo@kernel.org>

Introduce SW acceleration for the tx path of IP6IP6 tunnels, relying on
the netfilter flowtable infrastructure.
IP6IP6 tx SW acceleration can be tested by running the following scenario,
where traffic is forwarded between two NICs (eth0 and eth1) and an
IP6IP6 tunnel is used to reach a remote site (using eth1 as the underlay
device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (2001:db8:3::2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:1::2/64 scope global nodad
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:2::1/64 scope global nodad
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc::
    inet6 2002:db8:1::1/64 scope global nodad
       valid_lft forever preferred_lft forever

$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths, I got the following
results:
- TCP stream received from the IP6IP6 tunnel:
  - net-next: (baseline)                  ~93Gbps
  - net-next + IP6IP6 flowtable support:  ~98Gbps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_flow_table_ip.c | 108 ++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 51c64b3d4e50..3fdb10d9bf7f 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -14,6 +14,7 @@
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
 #include <net/neighbour.h>
 #include <net/netfilter/nf_flow_table.h>
 #include <net/netfilter/nf_conntrack_acct.h>
@@ -637,6 +638,97 @@ static int nf_flow_tunnel_v4_push(struct net *net, struct sk_buff *skb,
 	return 0;
 }
 
+struct ipv6_tel_txoption {
+	struct ipv6_txoptions ops;
+	__u8 dst_opt[8];
+};
+
+static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
+				      struct flow_offload_tuple *tuple,
+				      struct in6_addr **ip6_daddr,
+				      int encap_limit)
+{
+	struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
+	u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
+	struct rtable *rt = dst_rtable(tuple->dst_cache);
+	__u8 dsfield = ipv6_get_dsfield(ip6h);
+	struct flowi6 fl6 = {
+		.daddr = tuple->tun.src_v6,
+		.saddr = tuple->tun.dst_v6,
+		.flowi6_proto = proto,
+	};
+	int err, mtu;
+	u32 headroom;
+
+	err = iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6);
+	if (err)
+		return err;
+
+	skb_set_inner_ipproto(skb, proto);
+	headroom = sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) +
+		   rt->dst.header_len;
+	if (encap_limit)
+		headroom += 8;
+	err = skb_cow_head(skb, headroom);
+	if (err)
+		return err;
+
+	skb_scrub_packet(skb, true);
+	mtu = dst_mtu(&rt->dst) - sizeof(*ip6h);
+	if (encap_limit)
+		mtu -= 8;
+	mtu = max(mtu, IPV6_MIN_MTU);
+	skb_dst_update_pmtu_no_confirm(skb, mtu);
+
+	if (encap_limit > 0) {
+		struct ipv6_tel_txoption opt = {
+			.dst_opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT,
+			.dst_opt[3] = 1,
+			.dst_opt[4] = encap_limit,
+			.dst_opt[5] = IPV6_TLV_PADN,
+			.dst_opt[6] = 1,
+		};
+		struct ipv6_opt_hdr *hopt;
+
+		opt.ops.dst1opt = (struct ipv6_opt_hdr *)opt.dst_opt;
+		opt.ops.opt_nflen = 8;
+
+		hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
+		memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
+		hopt->nexthdr = IPPROTO_IPV6;
+		proto = NEXTHDR_DEST;
+	}
+
+	skb_push(skb, sizeof(*ip6h));
+	skb_reset_network_header(skb);
+
+	ip6h = ipv6_hdr(skb);
+	ip6_flow_hdr(ip6h, dsfield,
+		     ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6));
+	ip6h->hop_limit = hop_limit;
+	ip6h->nexthdr = proto;
+	ip6h->daddr = tuple->tun.src_v6;
+	ip6h->saddr = tuple->tun.dst_v6;
+	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(*ip6h));
+	IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);
+
+	*ip6_daddr = &tuple->tun.src_v6;
+
+	return 0;
+}
+
+static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
+				  struct flow_offload_tuple *tuple,
+				  struct in6_addr **ip6_daddr,
+				  int encap_limit)
+{
+	if (tuple->tun_num)
+		return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr,
+						  encap_limit);
+
+	return 0;
+}
+
 static int nf_flow_encap_push(struct sk_buff *skb,
 			      struct flow_offload_tuple *tuple)
 {
@@ -914,7 +1006,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
 static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 					struct nf_flowtable *flow_table,
 					struct flow_offload_tuple_rhash *tuplehash,
-					struct sk_buff *skb)
+					struct sk_buff *skb, int encap_limit)
 {
 	enum flow_offload_tuple_dir dir;
 	struct flow_offload *flow;
@@ -925,6 +1017,12 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 
 	mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
+	if (flow->tuplehash[!dir].tuple.tun_num) {
+		mtu -= sizeof(*ip6h);
+		if (encap_limit > 0)
+			mtu -= 8; /* encap limit option */
+	}
+
 	if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
 		return 0;
 
@@ -977,6 +1075,7 @@ unsigned int
 nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 			  const struct nf_hook_state *state)
 {
+	int encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
 	struct flow_offload_tuple_rhash *tuplehash;
 	struct nf_flowtable *flow_table = priv;
 	struct flow_offload_tuple *other_tuple;
@@ -995,7 +1094,8 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	if (tuplehash == NULL)
 		return NF_ACCEPT;
 
-	ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
+	ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
+					   encap_limit);
 	if (ret < 0)
 		return NF_DROP;
 	else if (ret == 0)
@@ -1014,6 +1114,10 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	other_tuple = &flow->tuplehash[!dir].tuple;
 	ip6_daddr = &other_tuple->src_v6;
 
+	if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
+				   &ip6_daddr, encap_limit) < 0)
+		return NF_DROP;
+
 	if (nf_flow_encap_push(skb, other_tuple) < 0)
 		return NF_DROP;
 
-- 
2.52.0



* [PATCH net-next 5/9] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (3 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 4/9] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 6/9] netfilter: xt_time: use is_leap_year() helper Florian Westphal
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Lorenzo Bianconi <lorenzo@kernel.org>

Similar to IPIP, introduce a specific selftest for IP6IP6 flowtable SW
acceleration in nft_flowtable.sh.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../selftests/net/netfilter/nft_flowtable.sh  | 62 ++++++++++++++++---
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/netfilter/nft_flowtable.sh b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
index a68bc882fa4e..14d7f67715ed 100755
--- a/tools/testing/selftests/net/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
@@ -592,16 +592,28 @@ ip -net "$nsr1" link set tun0 up
 ip -net "$nsr1" addr add 192.168.100.1/24 dev tun0
 ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
 
+ip -net "$nsr1" link add name tun6 type ip6tnl local fee1:2::1 remote fee1:2::2
+ip -net "$nsr1" link set tun6 up
+ip -net "$nsr1" addr add fee1:3::1/64 dev tun6 nodad
+
 ip -net "$nsr2" link add name tun0 type ipip local 192.168.10.2 remote 192.168.10.1
 ip -net "$nsr2" link set tun0 up
 ip -net "$nsr2" addr add 192.168.100.2/24 dev tun0
 ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
 
+ip -net "$nsr2" link add name tun6 type ip6tnl local fee1:2::2 remote fee1:2::1
+ip -net "$nsr2" link set tun6 up
+ip -net "$nsr2" addr add fee1:3::2/64 dev tun6 nodad
+
 ip -net "$nsr1" route change default via 192.168.100.2
 ip -net "$nsr2" route change default via 192.168.100.1
+ip -6 -net "$nsr1" route change default via fee1:3::2
+ip -6 -net "$nsr2" route change default via fee1:3::1
 ip -net "$ns2" route add default via 10.0.2.1
+ip -6 -net "$ns2" route add default via dead:2::1
 
 ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0 accept'
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun6 accept'
 ip netns exec "$nsr1" nft -a insert rule inet filter forward \
 	'meta oif "veth0" tcp sport 12345 ct mark set 1 flow add @f1 counter name routed_repl accept'
 
@@ -611,28 +623,51 @@ if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel"; then
 	ret=1
 fi
 
+if test_tcp_forwarding "$ns1" "$ns2" 1 6 "[dead:2::99]" 12345; then
+	echo "PASS: flow offload for ns1/ns2 IP6IP6 tunnel"
+else
+	echo "FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel" 1>&2
+	ip netns exec "$nsr1" nft list ruleset
+	ret=1
+fi
+
 # Create vlan tagged devices for IPIP traffic.
 ip -net "$nsr1" link add link veth1 name veth1.10 type vlan id 10
 ip -net "$nsr1" link set veth1.10 up
 ip -net "$nsr1" addr add 192.168.20.1/24 dev veth1.10
+ip -net "$nsr1" addr add fee1:4::1/64 dev veth1.10 nodad
 ip netns exec "$nsr1" sysctl net.ipv4.conf.veth1/10.forwarding=1 > /dev/null
 ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif veth1.10 accept'
-ip -net "$nsr1" link add name tun1 type ipip local 192.168.20.1 remote 192.168.20.2
-ip -net "$nsr1" link set tun1 up
-ip -net "$nsr1" addr add 192.168.200.1/24 dev tun1
+
+ip -net "$nsr1" link add name tun0.10 type ipip local 192.168.20.1 remote 192.168.20.2
+ip -net "$nsr1" link set tun0.10 up
+ip -net "$nsr1" addr add 192.168.200.1/24 dev tun0.10
 ip -net "$nsr1" route change default via 192.168.200.2
-ip netns exec "$nsr1" sysctl net.ipv4.conf.tun1.forwarding=1 > /dev/null
-ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun1 accept'
+ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0.10 accept'
+
+ip -net "$nsr1" link add name tun6.10 type ip6tnl local fee1:4::1 remote fee1:4::2
+ip -net "$nsr1" link set tun6.10 up
+ip -net "$nsr1" addr add fee1:5::1/64 dev tun6.10 nodad
+ip -6 -net "$nsr1" route change default via fee1:5::2
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun6.10 accept'
 
 ip -net "$nsr2" link add link veth0 name veth0.10 type vlan id 10
 ip -net "$nsr2" link set veth0.10 up
 ip -net "$nsr2" addr add 192.168.20.2/24 dev veth0.10
+ip -net "$nsr2" addr add fee1:4::2/64 dev veth0.10 nodad
 ip netns exec "$nsr2" sysctl net.ipv4.conf.veth0/10.forwarding=1 > /dev/null
-ip -net "$nsr2" link add name tun1 type ipip local 192.168.20.2 remote 192.168.20.1
-ip -net "$nsr2" link set tun1 up
-ip -net "$nsr2" addr add 192.168.200.2/24 dev tun1
+
+ip -net "$nsr2" link add name tun0.10 type ipip local 192.168.20.2 remote 192.168.20.1
+ip -net "$nsr2" link set tun0.10 up
+ip -net "$nsr2" addr add 192.168.200.2/24 dev tun0.10
 ip -net "$nsr2" route change default via 192.168.200.1
-ip netns exec "$nsr2" sysctl net.ipv4.conf.tun1.forwarding=1 > /dev/null
+ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
+
+ip -net "$nsr2" link add name tun6.10 type ip6tnl local fee1:4::2 remote fee1:4::1
+ip -net "$nsr2" link set tun6.10 up
+ip -net "$nsr2" addr add fee1:5::2/64 dev tun6.10 nodad
+ip -6 -net "$nsr2" route change default via fee1:5::1
 
 if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel over vlan"; then
 	echo "FAIL: flow offload for ns1/ns2 with IPIP tunnel over vlan" 1>&2
@@ -640,10 +675,19 @@ if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel over vlan"; then
 	ret=1
 fi
 
+if test_tcp_forwarding "$ns1" "$ns2" 1 6 "[dead:2::99]" 12345; then
+	echo "PASS: flow offload for ns1/ns2 IP6IP6 tunnel over vlan"
+else
+	echo "FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel over vlan" 1>&2
+	ip netns exec "$nsr1" nft list ruleset
+	ret=1
+fi
+
 # Restore the previous configuration
 ip -net "$nsr1" route change default via 192.168.10.2
 ip -net "$nsr2" route change default via 192.168.10.1
 ip -net "$ns2" route del default via 10.0.2.1
+ip -6 -net "$ns2" route del default via dead:2::1
 }
 
 # Another test:
-- 
2.52.0



* [PATCH net-next 6/9] netfilter: xt_time: use is_leap_year() helper
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (4 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 5/9] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 7/9] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Jinjie Ruan <ruanjinjie@huawei.com>

Use the is_leap_year() helper from rtc.h instead of open-coding the
same check.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_time.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/xt_time.c b/net/netfilter/xt_time.c
index 6aa12d0f54e2..00319d2a54da 100644
--- a/net/netfilter/xt_time.c
+++ b/net/netfilter/xt_time.c
@@ -14,6 +14,7 @@
 
 #include <linux/ktime.h>
 #include <linux/module.h>
+#include <linux/rtc.h>
 #include <linux/skbuff.h>
 #include <linux/types.h>
 #include <linux/netfilter/x_tables.h>
@@ -64,11 +65,6 @@ static const u_int16_t days_since_epoch[] = {
 	3287, 2922, 2557, 2191, 1826, 1461, 1096, 730, 365, 0,
 };
 
-static inline bool is_leap(unsigned int y)
-{
-	return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
-}
-
 /*
  * Each network packet has a (nano)seconds-since-the-epoch (SSTE) timestamp.
  * Since we match against days and daytime, the SSTE value needs to be
@@ -138,7 +134,7 @@ static void localtime_3(struct xtm *r, time64_t time)
 	 * (A different approach to use would be to subtract a monthlength
 	 * from w repeatedly while counting.)
 	 */
-	if (is_leap(year)) {
+	if (is_leap_year(year)) {
 		/* use days_since_leapyear[] in a leap year */
 		for (i = ARRAY_SIZE(days_since_leapyear) - 1;
 		    i > 0 && days_since_leapyear[i] > w; --i)
-- 
2.52.0



* [PATCH net-next 7/9] netfilter: nfnetlink_queue: optimize verdict lookup with hash table
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (5 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 6/9] netfilter: xt_time: use is_leap_year() helper Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 8/9] netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation Florian Westphal
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Scott Mitchell <scott.k.mitch1@gmail.com>

The current implementation uses a linear list to find queued packets by
ID when processing verdicts from userspace. With large queue depths and
out-of-order verdicting, this O(n) lookup becomes a significant
bottleneck, causing userspace verdict processing to dominate CPU time.

Replace the linear search with a hash table for O(1) average-case
packet lookup by ID. A single global rhashtable spanning all network
namespaces attributes the hash bucket memory to the kernel, but its
size is subject to a fixed upper bound.

Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_queue.h |   3 +
 net/netfilter/nfnetlink_queue.c  | 146 ++++++++++++++++++++++++-------
 2 files changed, 119 insertions(+), 30 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 4aeffddb7586..e6803831d6af 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -6,11 +6,13 @@
 #include <linux/ipv6.h>
 #include <linux/jhash.h>
 #include <linux/netfilter.h>
+#include <linux/rhashtable-types.h>
 #include <linux/skbuff.h>
 
 /* Each queued (to userspace) skbuff has one of these. */
 struct nf_queue_entry {
 	struct list_head	list;
+	struct rhash_head	hash_node;
 	struct sk_buff		*skb;
 	unsigned int		id;
 	unsigned int		hook_index;	/* index in hook_entries->hook[] */
@@ -20,6 +22,7 @@ struct nf_queue_entry {
 #endif
 	struct nf_hook_state	state;
 	u16			size; /* sizeof(entry) + saved route keys */
+	u16			queue_num;
 
 	/* extra space to store route keys */
 };
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8fa0807973c9..671b52c652ef 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -30,6 +30,8 @@
 #include <linux/netfilter/nf_conntrack_common.h>
 #include <linux/list.h>
 #include <linux/cgroup-defs.h>
+#include <linux/rhashtable.h>
+#include <linux/jhash.h>
 #include <net/gso.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
@@ -47,6 +49,8 @@
 #endif
 
 #define NFQNL_QMAX_DEFAULT 1024
+#define NFQNL_HASH_MIN     1024
+#define NFQNL_HASH_MAX     1048576
 
 /* We're using struct nlattr which has 16bit nla_len. Note that nla_len
  * includes the header length. Thus, the maximum packet length that we
@@ -56,6 +60,26 @@
  */
 #define NFQNL_MAX_COPY_RANGE (0xffff - NLA_HDRLEN)
 
+/* Composite key for packet lookup: (net, queue_num, packet_id) */
+struct nfqnl_packet_key {
+	possible_net_t net;
+	u32 packet_id;
+	u16 queue_num;
+} __aligned(sizeof(u32));  /* jhash2 requires 32-bit alignment */
+
+/* Global rhashtable - one for entire system, all netns */
+static struct rhashtable nfqnl_packet_map __read_mostly;
+
+/* Helper to initialize composite key */
+static inline void nfqnl_init_key(struct nfqnl_packet_key *key,
+				  struct net *net, u32 packet_id, u16 queue_num)
+{
+	memset(key, 0, sizeof(*key));
+	write_pnet(&key->net, net);
+	key->packet_id = packet_id;
+	key->queue_num = queue_num;
+}
+
 struct nfqnl_instance {
 	struct hlist_node hlist;		/* global list of queues */
 	struct rcu_head rcu;
@@ -100,6 +124,39 @@ static inline u_int8_t instance_hashfn(u_int16_t queue_num)
 	return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
 }
 
+/* Extract composite key from nf_queue_entry for hashing */
+static u32 nfqnl_packet_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+	const struct nf_queue_entry *entry = data;
+	struct nfqnl_packet_key key;
+
+	nfqnl_init_key(&key, entry->state.net, entry->id, entry->queue_num);
+
+	return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), seed);
+}
+
+/* Compare stack-allocated key against entry */
+static int nfqnl_packet_obj_cmpfn(struct rhashtable_compare_arg *arg,
+				  const void *obj)
+{
+	const struct nfqnl_packet_key *key = arg->key;
+	const struct nf_queue_entry *entry = obj;
+
+	return !net_eq(entry->state.net, read_pnet(&key->net)) ||
+	       entry->queue_num != key->queue_num ||
+	       entry->id != key->packet_id;
+}
+
+static const struct rhashtable_params nfqnl_rhashtable_params = {
+	.head_offset = offsetof(struct nf_queue_entry, hash_node),
+	.key_len = sizeof(struct nfqnl_packet_key),
+	.obj_hashfn = nfqnl_packet_obj_hashfn,
+	.obj_cmpfn = nfqnl_packet_obj_cmpfn,
+	.automatic_shrinking = true,
+	.min_size = NFQNL_HASH_MIN,
+	.max_size = NFQNL_HASH_MAX,
+};
+
 static struct nfqnl_instance *
 instance_lookup(struct nfnl_queue_net *q, u_int16_t queue_num)
 {
@@ -188,33 +245,45 @@ instance_destroy(struct nfnl_queue_net *q, struct nfqnl_instance *inst)
 	spin_unlock(&q->instances_lock);
 }
 
-static inline void
+static int
 __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
-       list_add_tail(&entry->list, &queue->queue_list);
-       queue->queue_total++;
+	int err;
+
+	entry->queue_num = queue->queue_num;
+
+	err = rhashtable_insert_fast(&nfqnl_packet_map, &entry->hash_node,
+				     nfqnl_rhashtable_params);
+	if (unlikely(err))
+		return err;
+
+	list_add_tail(&entry->list, &queue->queue_list);
+	queue->queue_total++;
+
+	return 0;
 }
 
 static void
 __dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
+	rhashtable_remove_fast(&nfqnl_packet_map, &entry->hash_node,
+			       nfqnl_rhashtable_params);
 	list_del(&entry->list);
 	queue->queue_total--;
 }
 
 static struct nf_queue_entry *
-find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
+find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id,
+		   struct net *net)
 {
-	struct nf_queue_entry *entry = NULL, *i;
+	struct nfqnl_packet_key key;
+	struct nf_queue_entry *entry;
 
-	spin_lock_bh(&queue->lock);
+	nfqnl_init_key(&key, net, id, queue->queue_num);
 
-	list_for_each_entry(i, &queue->queue_list, list) {
-		if (i->id == id) {
-			entry = i;
-			break;
-		}
-	}
+	spin_lock_bh(&queue->lock);
+	entry = rhashtable_lookup_fast(&nfqnl_packet_map, &key,
+				       nfqnl_rhashtable_params);
 
 	if (entry)
 		__dequeue_entry(queue, entry);
@@ -404,8 +473,7 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_lock_bh(&queue->lock);
 	list_for_each_entry_safe(entry, next, &queue->queue_list, list) {
 		if (!cmpfn || cmpfn(entry, data)) {
-			list_del(&entry->list);
-			queue->queue_total--;
+			__dequeue_entry(queue, entry);
 			nfqnl_reinject(entry, NF_DROP);
 		}
 	}
@@ -885,23 +953,23 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
 	if (nf_ct_drop_unconfirmed(entry))
 		goto err_out_free_nskb;
 
-	if (queue->queue_total >= queue->queue_maxlen) {
-		if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
-			failopen = 1;
-			err = 0;
-		} else {
-			queue->queue_dropped++;
-			net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
-					     queue->queue_total);
-		}
-		goto err_out_free_nskb;
-	}
+	if (queue->queue_total >= queue->queue_maxlen)
+		goto err_out_queue_drop;
+
 	entry->id = ++queue->id_sequence;
 	*packet_id_ptr = htonl(entry->id);
 
+	/* Insert into hash BEFORE unicast. If failure don't send to userspace. */
+	err = __enqueue_entry(queue, entry);
+	if (unlikely(err))
+		goto err_out_queue_drop;
+
 	/* nfnetlink_unicast will either free the nskb or add it to a socket */
 	err = nfnetlink_unicast(nskb, net, queue->peer_portid);
 	if (err < 0) {
+		/* Unicast failed - remove entry we just inserted */
+		__dequeue_entry(queue, entry);
+
 		if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
 			failopen = 1;
 			err = 0;
@@ -911,11 +979,22 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
 		goto err_out_unlock;
 	}
 
-	__enqueue_entry(queue, entry);
-
 	spin_unlock_bh(&queue->lock);
 	return 0;
 
+err_out_queue_drop:
+	if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
+		failopen = 1;
+		err = 0;
+	} else {
+		queue->queue_dropped++;
+
+		if (queue->queue_total >= queue->queue_maxlen)
+			net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
+					     queue->queue_total);
+		else
+			net_warn_ratelimited("nf_queue: hash insert failed: %d\n", err);
+	}
 err_out_free_nskb:
 	kfree_skb(nskb);
 err_out_unlock:
@@ -1427,7 +1506,7 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info,
 
 	verdict = ntohl(vhdr->verdict);
 
-	entry = find_dequeue_entry(queue, ntohl(vhdr->id));
+	entry = find_dequeue_entry(queue, ntohl(vhdr->id), info->net);
 	if (entry == NULL)
 		return -ENOENT;
 
@@ -1774,10 +1853,14 @@ static int __init nfnetlink_queue_init(void)
 {
 	int status;
 
+	status = rhashtable_init(&nfqnl_packet_map, &nfqnl_rhashtable_params);
+	if (status < 0)
+		return status;
+
 	status = register_pernet_subsys(&nfnl_queue_net_ops);
 	if (status < 0) {
 		pr_err("failed to register pernet ops\n");
-		goto out;
+		goto cleanup_rhashtable;
 	}
 
 	netlink_register_notifier(&nfqnl_rtnl_notifier);
@@ -1802,7 +1885,8 @@ static int __init nfnetlink_queue_init(void)
 cleanup_netlink_notifier:
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
 	unregister_pernet_subsys(&nfnl_queue_net_ops);
-out:
+cleanup_rhashtable:
+	rhashtable_destroy(&nfqnl_packet_map);
 	return status;
 }
 
@@ -1814,6 +1898,8 @@ static void __exit nfnetlink_queue_fini(void)
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
 	unregister_pernet_subsys(&nfnl_queue_net_ops);
 
+	rhashtable_destroy(&nfqnl_packet_map);
+
 	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
-- 
2.52.0



* [PATCH net-next 8/9] netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (6 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 7/9] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-28 15:41 ` [PATCH net-next 9/9] selftests: netfilter: nft_queue.sh: add udp fraglist gro test case Florian Westphal
  2026-01-29  5:03 ` [PATCH net-next 0/9] netfilter: updates for net-next Jakub Kicinski
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Ulrich reports a regression with nfqueue:

If an application did not set the 'F_GSO' capability flag and a GSO
packet with an unconfirmed nf_conn entry is received, all packets are
now dropped instead of queued, because the check happens after
skb_gso_segment().  In that case, we did have exclusive ownership
of the skb and its associated conntrack entry.  The elevated use
count is due to the skb_clone() happening via skb_gso_segment().

Move the check so that it is performed against the aggregated packet.

Then, annotate the individual segments except the first one so we
can do a 2nd check at reinject time.

For the normal case, where userspace does in-order reinjects, this avoids
packet drops: the first reinjected segment continues traversal and
confirms the entry; the remaining segments then observe the confirmed
entry.

While at it, simplify nf_ct_drop_unconfirmed(): we only care about
unconfirmed entries with a refcnt > 1; there is no need to special-case
dying entries.

This only happens with UDP.  With TCP, the only unconfirmed packet will
be the TCP SYN, those aren't aggregated by GRO.

Next patch adds a udpgro test case to cover this scenario.

Reported-by: Ulrich Weber <ulrich.weber@gmail.com>
Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_queue.h |   1 +
 net/netfilter/nfnetlink_queue.c  | 119 ++++++++++++++++++-------------
 2 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index e6803831d6af..70dac4ab2f35 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -21,6 +21,7 @@ struct nf_queue_entry {
 	struct net_device	*physout;
 #endif
 	struct nf_hook_state	state;
+	bool			nf_ct_was_unconfirmed;
 	u16			size; /* sizeof(entry) + saved route keys */
 	u16			queue_num;
 
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 671b52c652ef..930b0e534d1e 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -435,6 +435,33 @@ static void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 	nf_queue_entry_free(entry);
 }
 
+/* return true if the entry has an unconfirmed conntrack attached that isn't owned by us
+ * exclusively.
+ */
+static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry, bool *is_unconfirmed)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	struct nf_conn *ct = (void *)skb_nfct(entry->skb);
+
+	if (!ct || nf_ct_is_confirmed(ct))
+		return false;
+
+	*is_unconfirmed = true;
+
+	/* in some cases skb_clone() can occur after initial conntrack
+	 * pickup, but conntrack assumes exclusive skb->_nfct ownership for
+	 * unconfirmed entries.
+	 *
+	 * This happens for br_netfilter and with ip multicast routing.
+	 * This can't be solved with serialization here because one clone
+	 * could have been queued for local delivery or could be transmitted
+	 * in parallel on another CPU.
+	 */
+	return refcount_read(&ct->ct_general.use) > 1;
+#endif
+	return false;
+}
+
 static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 {
 	const struct nf_ct_hook *ct_hook;
@@ -462,6 +489,26 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 			break;
 		}
 	}
+
+	if (verdict != NF_DROP && entry->nf_ct_was_unconfirmed) {
+		bool is_unconfirmed = false;
+
+		/* If first queued segment was already reinjected then
+		 * there is a good chance the ct entry is now confirmed.
+		 *
+		 * Handle the rare cases:
+		 *  - out-of-order verdict
+		 *  - threaded userspace reinjecting in parallel
+		 *  - first segment was dropped
+		 *
+		 * In all of those cases we can't handle this packet
+		 * because we can't be sure that another CPU won't modify
+		 * nf_conn->ext in parallel which isn't allowed.
+		 */
+		if (nf_ct_drop_unconfirmed(entry, &is_unconfirmed))
+			verdict = NF_DROP;
+	}
+
 	nf_reinject(entry, verdict);
 }
 
@@ -891,49 +938,6 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 	return NULL;
 }
 
-static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry)
-{
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-	static const unsigned long flags = IPS_CONFIRMED | IPS_DYING;
-	struct nf_conn *ct = (void *)skb_nfct(entry->skb);
-	unsigned long status;
-	unsigned int use;
-
-	if (!ct)
-		return false;
-
-	status = READ_ONCE(ct->status);
-	if ((status & flags) == IPS_DYING)
-		return true;
-
-	if (status & IPS_CONFIRMED)
-		return false;
-
-	/* in some cases skb_clone() can occur after initial conntrack
-	 * pickup, but conntrack assumes exclusive skb->_nfct ownership for
-	 * unconfirmed entries.
-	 *
-	 * This happens for br_netfilter and with ip multicast routing.
-	 * We can't be solved with serialization here because one clone could
-	 * have been queued for local delivery.
-	 */
-	use = refcount_read(&ct->ct_general.use);
-	if (likely(use == 1))
-		return false;
-
-	/* Can't decrement further? Exclusive ownership. */
-	if (!refcount_dec_not_one(&ct->ct_general.use))
-		return false;
-
-	skb_set_nfct(entry->skb, 0);
-	/* No nf_ct_put(): we already decremented .use and it cannot
-	 * drop down to 0.
-	 */
-	return true;
-#endif
-	return false;
-}
-
 static int
 __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
 			struct nf_queue_entry *entry)
@@ -950,9 +954,6 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
 	}
 	spin_lock_bh(&queue->lock);
 
-	if (nf_ct_drop_unconfirmed(entry))
-		goto err_out_free_nskb;
-
 	if (queue->queue_total >= queue->queue_maxlen)
 		goto err_out_queue_drop;
 
@@ -995,7 +996,6 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
 		else
 			net_warn_ratelimited("nf_queue: hash insert failed: %d\n", err);
 	}
-err_out_free_nskb:
 	kfree_skb(nskb);
 err_out_unlock:
 	spin_unlock_bh(&queue->lock);
@@ -1074,9 +1074,10 @@ __nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue,
 static int
 nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 {
-	unsigned int queued;
-	struct nfqnl_instance *queue;
 	struct sk_buff *skb, *segs, *nskb;
+	bool ct_is_unconfirmed = false;
+	struct nfqnl_instance *queue;
+	unsigned int queued;
 	int err = -ENOBUFS;
 	struct net *net = entry->state.net;
 	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
@@ -1100,6 +1101,15 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 		break;
 	}
 
+	/* Check if someone already holds another reference to
+	 * unconfirmed ct.  If so, we cannot queue the skb:
+	 * concurrent modifications of nf_conn->ext are not
+	 * allowed and we can't know if another CPU isn't
+	 * processing the same nf_conn entry in parallel.
+	 */
+	if (nf_ct_drop_unconfirmed(entry, &ct_is_unconfirmed))
+		return -EINVAL;
+
 	if (!skb_is_gso(skb) || ((queue->flags & NFQA_CFG_F_GSO) && !skb_is_gso_sctp(skb)))
 		return __nfqnl_enqueue_packet(net, queue, entry);
 
@@ -1117,10 +1127,17 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 		if (err == 0)
 			err = __nfqnl_enqueue_packet_gso(net, queue,
 							segs, entry);
-		if (err == 0)
+		if (err == 0) {
 			queued++;
-		else
+			/* skb_gso_segment() caused increment of ct refcount.
+			 * Annotate this for all queued entries except the first one
+			 * queued.  As long as the first one is reinjected first it
+			 * will do the confirmation for us.
+			 */
+			entry->nf_ct_was_unconfirmed = ct_is_unconfirmed;
+		} else {
 			kfree_skb(segs);
+		}
 	}
 
 	if (queued) {
-- 
2.52.0



* [PATCH net-next 9/9] selftests: netfilter: nft_queue.sh: add udp fraglist gro test case
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (7 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 8/9] netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation Florian Westphal
@ 2026-01-28 15:41 ` Florian Westphal
  2026-01-29  5:03 ` [PATCH net-next 0/9] netfilter: updates for net-next Jakub Kicinski
  9 siblings, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2026-01-28 15:41 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Without the preceding patch, this fails with:

FAIL: test_udp_gro_ct: Expected udp conntrack entry
FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 10 and 0

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../selftests/net/netfilter/nft_queue.sh      | 142 +++++++++++++++++-
 1 file changed, 136 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index 6136ceec45e0..139bc1211878 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -510,7 +510,7 @@ EOF
 
 udp_listener_ready()
 {
-	ss -S -N "$1" -uln -o "sport = :12345" | grep -q 12345
+	ss -S -N "$1" -uln -o "sport = :$2" | grep -q "$2"
 }
 
 output_files_written()
@@ -518,7 +518,7 @@ output_files_written()
 	test -s "$1" && test -s "$2"
 }
 
-test_udp_ct_race()
+test_udp_nat_race()
 {
         ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
 flush ruleset
@@ -545,8 +545,8 @@ EOF
 	ip netns exec "$nsrouter" ./nf_queue -q 12 -d 1000 &
 	local nfqpid=$!
 
-	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2"
-	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns3"
+	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2" 12345
+	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns3" 12345
 	busywait "$BUSYWAIT_TIMEOUT" nf_queue_wait "$nsrouter" 12
 
 	# Send two packets, one should end up in ns1, other in ns2.
@@ -557,7 +557,7 @@ EOF
 
 	busywait 10000 output_files_written "$TMPFILE1" "$TMPFILE2"
 
-	kill "$nfqpid"
+	kill "$nfqpid" "$rpid1" "$rpid2"
 
 	if ! ip netns exec "$nsrouter" bash -c 'conntrack -L -p udp --dport 12345 2>/dev/null | wc -l | grep -q "^1"'; then
 		echo "FAIL: Expected One udp conntrack entry"
@@ -585,6 +585,135 @@ EOF
 	echo "PASS: both udp receivers got one packet each"
 }
 
+# Make sure UDPGRO aggregated packets don't lose
+# their skb->nfct entry when nfqueue passes the
+# skb to userspace with software gso segmentation on.
+test_udp_gro_ct()
+{
+	local errprefix="FAIL: test_udp_gro_ct:"
+
+	ip netns exec "$nsrouter" conntrack -F 2>/dev/null
+
+        ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet udpq {
+	# Number of packets/bytes queued to userspace
+	counter toqueue { }
+	# Number of packets/bytes reinjected from userspace with 'ct new' intact
+	counter fromqueue { }
+	# These two counters should be identical and not 0.
+
+	chain prerouting {
+		type filter hook prerouting priority -300; policy accept;
+
+		# userspace sends small packets, if < 1000, UDPGRO did
+		# not kick in, but test needs a 'new' conntrack with udpgro skb.
+		meta iifname veth0 meta l4proto udp meta length > 1000 accept
+
+		# don't pick up non-gso packets and don't queue them to
+		# userspace.
+		notrack
+	}
+
+        chain postrouting {
+		type filter hook postrouting priority 0; policy accept;
+
+		# Only queue unconfirmed fraglist gro skbs to userspace.
+		udp dport 12346 ct status ! confirmed counter name "toqueue" mark set 1 queue num 1
+        }
+
+	chain validate {
+		type filter hook postrouting priority 1; policy accept;
+		# ... and only count those that were reinjected with the
+		# skb->nfct intact.
+		mark 1 counter name "fromqueue"
+	}
+}
+EOF
+	timeout 10 ip netns exec "$ns2" socat UDP-LISTEN:12346,fork,pf=ipv4 OPEN:"$TMPFILE1",trunc &
+	local rpid=$!
+
+	ip netns exec "$nsrouter" ./nf_queue -G -c -q 1 -t 2 > "$TMPFILE2" &
+	local nfqpid=$!
+
+	ip netns exec "$nsrouter" ethtool -K "veth0" rx-udp-gro-forwarding on rx-gro-list on generic-receive-offload on
+
+	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2" 12346
+	busywait "$BUSYWAIT_TIMEOUT" nf_queue_wait "$nsrouter" 1
+
+	local bs=512
+	local count=$(((32 * 1024 * 1024) / bs))
+	dd if=/dev/zero bs="$bs" count="$count" 2>/dev/null | for i in $(seq 1 16); do
+		timeout 5 ip netns exec "$ns1" \
+			socat -u -b 512 STDIN UDP-DATAGRAM:10.0.2.99:12346,reuseport,bind=0.0.0.0:55221 &
+	done
+
+	busywait 10000 test -s "$TMPFILE1"
+
+	kill "$rpid"
+
+	wait
+
+	local p
+	local b
+	local pqueued
+	local bqueued
+
+	c=$(ip netns exec "$nsrouter" nft list counter inet udpq "toqueue" | grep packets)
+	read p pqueued b bqueued <<EOF
+$c
+EOF
+	local preinject
+	local breinject
+	c=$(ip netns exec "$nsrouter" nft list counter inet udpq "fromqueue" | grep packets)
+	read p preinject b breinject <<EOF
+$c
+EOF
+	ip netns exec "$nsrouter" ethtool -K "veth0" rx-udp-gro-forwarding off
+	ip netns exec "$nsrouter" ethtool -K "veth1" rx-udp-gro-forwarding off
+
+	if [ "$pqueued" -eq 0 ];then
+		# happens when gro did not build at least one aggregate
+		echo "SKIP: No packets were queued"
+		return
+	fi
+
+	local saw_ct_entry=0
+	if ip netns exec "$nsrouter" bash -c 'conntrack -L -p udp --dport 12346 2>/dev/null | wc -l | grep -q "^1"'; then
+		saw_ct_entry=1
+	else
+		echo "$errprefix Expected udp conntrack entry"
+		ip netns exec "$nsrouter" conntrack -L
+		ret=1
+	fi
+
+	if [ "$pqueued" -ge "$preinject" ] ;then
+		echo "$errprefix Expected software segmentation to occur, had $pqueued and $preinject"
+		ret=1
+		return
+	fi
+
+	# sw segmentation adds extra udp and ip headers.
+	local breinject_expect=$((preinject * (512 + 20 + 8)))
+
+	if [ "$breinject" -eq "$breinject_expect" ]; then
+		if [ "$saw_ct_entry" -eq 1 ];then
+			echo "PASS: fraglist gro skb passed with conntrack entry"
+		else
+			echo "$errprefix fraglist gro skb passed without conntrack entry"
+			ret=1
+		fi
+	else
+		echo "$errprefix Counter mismatch, conntrack entry dropped by nfqueue? Queued: $pqueued, $bqueued. Post-queue: $preinject, $breinject. Expected $breinject_expect"
+		ret=1
+	fi
+
+	if ! ip netns exec "$nsrouter" nft delete table inet udpq; then
+		echo "$errprefix Could not delete udpq table"
+		ret=1
+	fi
+}
+
 test_queue_removal()
 {
 	read tainted_then < /proc/sys/kernel/tainted
@@ -663,7 +792,8 @@ test_tcp_localhost_connectclose
 test_tcp_localhost_requeue
 test_sctp_forward
 test_sctp_output
-test_udp_ct_race
+test_udp_nat_race
+test_udp_gro_ct
 
 # should be last, adds vrf device in ns1 and changes routes
 test_icmp_vrf
-- 
2.52.0



* Re: [PATCH net-next 0/9] netfilter: updates for net-next
  2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
                   ` (8 preceding siblings ...)
  2026-01-28 15:41 ` [PATCH net-next 9/9] selftests: netfilter: nft_queue.sh: add udp fraglist gro test case Florian Westphal
@ 2026-01-29  5:03 ` Jakub Kicinski
  2026-01-29  8:56   ` Florian Westphal
  9 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2026-01-29  5:03 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	netfilter-devel, pablo

On Wed, 28 Jan 2026 16:41:46 +0100 Florian Westphal wrote:
> Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
> infrastructure.  Patch 5 extends test coverage for this.
> From Lorenzo Bianconi.
> 
> Patch 6 removes a duplicated helper from xt_time extension, we can
> use an existing helper for this, from Jinjie Ruan.
> 
> Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
> verdict processing.  Before this, a list walk was required due to the
> in-order design assumption.
> 
> Patch 8 fixes an esoteric packet-drop problem with UDPGRO and nfqueue added
> in v6.11. Patch 9 adds a test case for this.

Hi!

There's a UAF in the CI:

https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/494261/vm-crash-thr0-0

[  580.340726][T19113] sctp: Hash tables configured (bind 32/56)
[  601.749973][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  601.985349][    C2] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  602.191750][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  602.555469][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  602.895890][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  603.226543][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  603.435907][    C0] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  603.569421][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  603.672454][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  603.821679][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
[  618.553975][T19316] ==================================================================
[  618.554200][T19316] BUG: KASAN: slab-use-after-free in nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
[  618.554424][T19316] Write of size 1 at addr ff1100001cc9ae68 by task socat/19316
[  618.554600][T19316] 
[  618.554662][T19316] CPU: 2 UID: 0 PID: 19316 Comm: socat Not tainted 6.19.0-rc6-virtme #1 PREEMPT(full) 
[  618.554665][T19316] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  618.554667][T19316] Call Trace:
[  618.554669][T19316]  <TASK>
[  618.554670][T19316]  dump_stack_lvl+0x6f/0xa0
[  618.554678][T19316]  print_address_description.constprop.0+0x6e/0x300
[  618.554683][T19316]  print_report+0xfc/0x1fb
[  618.554684][T19316]  ? nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
[  618.554687][T19316]  ? __virt_addr_valid+0x1da/0x430
[  618.554691][T19316]  ? nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
[  618.554693][T19316]  kasan_report+0xe8/0x120
[  618.554697][T19316]  ? nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
[  618.554699][T19316]  nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
[  618.554702][T19316]  ? __nfqnl_enqueue_packet+0x470/0x470 [nfnetlink_queue]
[  618.554703][T19316]  ? nf_queue_entry_release_refs+0x230/0x240
[  618.554707][T19316]  ? __nf_queue+0x11f/0x1700
[  618.554709][T19316]  __nf_queue+0x50c/0x1700
[  618.554710][T19316]  ? nft_do_chain_inet+0xd8/0x3a0 [nf_tables]
[  618.554722][T19316]  ? nf_queue_entry_get_refs+0x390/0x390
[  618.554724][T19316]  nf_queue+0x18/0x50
[  618.554726][T19316]  nf_hook_slow+0x138/0x1d0
[  618.554729][T19316]  __ip_local_out+0x41f/0x8d0
[  618.554731][T19316]  ? ip_output+0x650/0x650
[  618.554732][T19316]  ? lock_acquire.part.0+0xbc/0x260
[  618.554735][T19316]  ? find_held_lock+0x2b/0x80
[  618.554737][T19316]  ? ip_append_data.part.0+0x1a0/0x1a0
[  618.554740][T19316]  ? ip4_dst_hoplimit+0x15b/0x320
[  618.554742][T19316]  __ip_queue_xmit+0x73f/0x1660
[  618.554744][T19316]  sctp_packet_transmit+0x655/0x1070 [sctp]
[  618.554757][T19316]  sctp_outq_flush_transports+0x321/0x6c0 [sctp]
[  618.554768][T19316]  sctp_outq_flush+0x125/0x190 [sctp]
[  618.554775][T19316]  ? lock_acquire.part.0+0xbc/0x260
[  618.554777][T19316]  ? sctp_outq_flush_data+0x1950/0x1950 [sctp]
[  618.554784][T19316]  ? sctp_outq_tail+0x2b8/0xa20 [sctp]
[  618.554791][T19316]  sctp_cmd_interpreter.isra.0+0x40e/0x4f50 [sctp]
[  618.554801][T19316]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[  618.554807][T19316]  ? rcu_lockdep_current_cpu_online+0x39/0x1b0
[  618.554812][T19316]  sctp_side_effects+0xcf/0x230 [sctp]
[  618.554819][T19316]  ? sctp_cmd_interpreter.isra.0+0x4f50/0x4f50 [sctp]
[  618.554825][T19316]  ? __lock_acquire+0x577/0xc10
[  618.554828][T19316]  ? br_deinit+0x5b0/0x5b0 [bridge]
[  618.554836][T19316]  sctp_do_sm+0x1a0/0x4e0 [sctp]
[  618.554844][T19316]  ? sctp_cname+0x1c0/0x1c0 [sctp]
[  618.554851][T19316]  ? __lock_release.isra.0+0x59/0x170
[  618.554853][T19316]  ? sctp_do_8_2_transport_strike.isra.0+0x1160/0x1160 [sctp]
[  618.554860][T19316]  ? __might_fault+0x97/0x140
[  618.554866][T19316]  ? sctp_datamsg_from_user+0x677/0x1140 [sctp]
[  618.554875][T19316]  ? skb_set_owner_w+0x27e/0x610
[  618.554879][T19316]  ? sock_recv_errqueue+0x4a0/0x4a0
[  618.554881][T19316]  sctp_primitive_SEND+0x82/0xe0 [sctp]
[  618.554889][T19316]  sctp_sendmsg_to_asoc+0x9d0/0x1420 [sctp]
[  618.554898][T19316]  ? sctp_close+0x850/0x850 [sctp]
[  618.554904][T19316]  ? mark_held_locks+0x40/0x70
[  618.554907][T19316]  sctp_sendmsg+0x624/0xd70 [sctp]
[  618.554915][T19316]  ? sctp_sendmsg_new_asoc+0x720/0x720 [sctp]
[  618.554921][T19316]  ? current_time+0x83/0x300
[  618.554924][T19316]  ? new_sync_write+0x6f0/0x6f0
[  618.554927][T19316]  ? make_vfsuid+0xe0/0xe0
[  618.554930][T19316]  ? ovl_path_next+0x760/0x760
[  618.554934][T19316]  ? atime_needs_update+0x27f/0x5d0
[  618.554937][T19316]  sock_write_iter+0x281/0x4d0
[  618.554938][T19316]  ? backing_file_read_iter+0x50e/0x730
[  618.554942][T19316]  ? ____sys_recvmsg+0x6b0/0x6b0
[  618.554945][T19316]  ? ovl_mmap+0x270/0x270
[  618.554947][T19316]  ? ____sys_recvmsg+0x6b0/0x6b0
[  618.554948][T19316]  new_sync_write+0x3c5/0x6f0
[  618.554950][T19316]  ? new_sync_read+0x24f/0x6f0
[  618.554952][T19316]  ? new_sync_read+0x6f0/0x6f0
[  618.554954][T19316]  ? generic_atomic_write_valid+0x150/0x150
[  618.554956][T19316]  ? __set_current_blocked+0x110/0x110
[  618.554959][T19316]  ? find_held_lock+0x2b/0x80
[  618.554961][T19316]  ? do_pselect.constprop.0+0x14e/0x1f0
[  618.554964][T19316]  vfs_write+0x65e/0xbb0
[  618.554966][T19316]  ? vfs_read+0x3cc/0x790
[  618.554968][T19316]  ksys_write+0x17e/0x200
[  618.554970][T19316]  ? __ia32_sys_read+0xc0/0xc0
[  618.554972][T19316]  ? rcu_is_watching+0x15/0xd0
[  618.554974][T19316]  do_syscall_64+0xbd/0xfc0
[  618.554979][T19316]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  618.554981][T19316] RIP: 0033:0x7fd5f9750c5e
[  618.554984][T19316] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[  618.554987][T19316] RSP: 002b:00007fffca8a36c0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  618.554990][T19316] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fd5f9750c5e
[  618.554992][T19316] RDX: 0000000000002000 RSI: 000055fcf0fd4000 RDI: 0000000000000007
[  618.554993][T19316] RBP: 00007fffca8a36d0 R08: 0000000000000000 R09: 0000000000000000
[  618.554994][T19316] R10: 0000000000000000 R11: 0000000000000202 R12: 000055fcf0fd4000
[  618.554995][T19316] R13: 0000000000002000 R14: 000055fcf0fd4000 R15: 0000000000000007
[  618.554997][T19316]  </TASK>
[  618.554998][T19316] 
[  618.565908][T19316] Allocated by task 19316:
[  618.566029][T19316]  kasan_save_stack+0x30/0x50
[  618.566144][T19316]  kasan_save_track+0x14/0x30
[  618.566258][T19316]  __kasan_kmalloc+0x7b/0x90
[  618.566369][T19316]  __kmalloc_noprof+0x2cd/0x820
[  618.566479][T19316]  __nf_queue+0x11f/0x1700
[  618.566589][T19316]  nf_queue+0x18/0x50
[  618.566671][T19316]  nf_hook_slow+0x138/0x1d0
[  618.566784][T19316]  __ip_local_out+0x41f/0x8d0
[  618.566892][T19316]  __ip_queue_xmit+0x73f/0x1660
[  618.567003][T19316]  sctp_packet_transmit+0x655/0x1070 [sctp]
[  618.567148][T19316]  sctp_outq_flush_transports+0x321/0x6c0 [sctp]
[  618.567294][T19316]  sctp_outq_flush+0x125/0x190 [sctp]
[  618.567408][T19316]  sctp_cmd_interpreter.isra.0+0x40e/0x4f50 [sctp]
[  618.567552][T19316]  sctp_side_effects+0xcf/0x230 [sctp]
[  618.567670][T19316]  sctp_do_sm+0x1a0/0x4e0 [sctp]
[  618.567785][T19316]  sctp_primitive_SEND+0x82/0xe0 [sctp]
[  618.567899][T19316]  sctp_sendmsg_to_asoc+0x9d0/0x1420 [sctp]
[  618.568040][T19316]  sctp_sendmsg+0x624/0xd70 [sctp]
[  618.568160][T19316]  sock_write_iter+0x281/0x4d0
[  618.568270][T19316]  new_sync_write+0x3c5/0x6f0
[  618.568380][T19316]  vfs_write+0x65e/0xbb0
[  618.568464][T19316]  ksys_write+0x17e/0x200
[  618.568546][T19316]  do_syscall_64+0xbd/0xfc0
[  618.568656][T19316]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  618.568794][T19316] 
[  618.568851][T19316] Freed by task 19314:
[  618.568935][T19316]  kasan_save_stack+0x30/0x50
[  618.569047][T19316]  kasan_save_track+0x14/0x30
[  618.569160][T19316]  kasan_save_free_info+0x3b/0x60
[  618.569273][T19316]  __kasan_slab_free+0x43/0x70
[  618.569390][T19316]  kfree+0x119/0x580
[  618.569472][T19316]  nfqnl_reinject+0x7f/0x3d0 [nfnetlink_queue]
[  618.569610][T19316]  nfqnl_recv_verdict+0x76f/0xfd3 [nfnetlink_queue]
[  618.569747][T19316]  nfnetlink_rcv_msg+0x49b/0xf00
[  618.569859][T19316]  netlink_rcv_skb+0x123/0x380
[  618.569970][T19316]  nfnetlink_rcv+0x166/0x4a0
[  618.570080][T19316]  netlink_unicast+0x4a3/0x770
[  618.570195][T19316]  netlink_sendmsg+0x735/0xc60
[  618.570307][T19316]  __sys_sendto+0x24e/0x360
[  618.570419][T19316]  __x64_sys_sendto+0xe4/0x1f0
[  618.570529][T19316]  do_syscall_64+0xbd/0xfc0
[  618.570640][T19316]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  618.570776][T19316] 
[  618.570833][T19316] The buggy address belongs to the object at ff1100001cc9ae00
[  618.570833][T19316]  which belongs to the cache kmalloc-128 of size 128
[  618.571105][T19316] The buggy address is located 104 bytes inside of
[  618.571105][T19316]  freed 128-byte region [ff1100001cc9ae00, ff1100001cc9ae80)
[  618.571381][T19316] 
[  618.571438][T19316] The buggy address belongs to the physical page:
[  618.571573][T19316] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xff1100001cc9be80 pfn:0x1cc9a
[  618.571803][T19316] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  618.571972][T19316] flags: 0x80000000000240(workingset|head|node=0|zone=1)
[  618.572115][T19316] page_type: f5(slab)
[  618.572206][T19316] raw: 0080000000000240 ff1100000103ce40 ffd4000000048090 ff11000001032a88
[  618.572404][T19316] raw: ff1100001cc9be80 0000000000150011 00000000f5000000 0000000000000000
[  618.572603][T19316] head: 0080000000000240 ff1100000103ce40 ffd4000000048090 ff11000001032a88
[  618.572798][T19316] head: ff1100001cc9be80 0000000000150011 00000000f5000000 0000000000000000
[  618.572994][T19316] head: 0080000000000001 ffd4000000732681 00000000ffffffff 00000000ffffffff
[  618.573191][T19316] head: ff1100001cc9bf10 0000000000000000 00000000ffffffff 0000000000000000
[  618.573390][T19316] page dumped because: kasan: bad access detected
[  618.573527][T19316] 
[  618.573584][T19316] Memory state around the buggy address:
[  618.573693][T19316]  ff1100001cc9ad00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  618.573856][T19316]  ff1100001cc9ad80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  618.574015][T19316] >ff1100001cc9ae00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  618.574179][T19316]                                                           ^
[  618.574343][T19316]  ff1100001cc9ae80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  618.574504][T19316]  ff1100001cc9af00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  618.574672][T19316] ==================================================================
[  618.574903][T19316] Disabling lock debugging due to kernel taint


* Re: [PATCH net-next 0/9] netfilter: updates for net-next
  2026-01-29  5:03 ` [PATCH net-next 0/9] netfilter: updates for net-next Jakub Kicinski
@ 2026-01-29  8:56   ` Florian Westphal
  2026-01-29 10:08     ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2026-01-29  8:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	netfilter-devel, pablo

Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 28 Jan 2026 16:41:46 +0100 Florian Westphal wrote:
> > Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
> > infrastructure.  Patch 5 extends test coverage for this.
> > From Lorenzo Bianconi.
> > 
> > Patch 6 removes a duplicated helper from xt_time extension, we can
> > use an existing helper for this, from Jinjie Ruan.
> > 
> > Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
> > verdict processing.  Before this, a list walk was required due to the
> > in-order design assumption.
> > 
> > Patch 8 fixes an esoteric packet-drop problem with UDPGRO and nfqueue added
> > in v6.11. Patch 9 adds a test case for this.
> 
> Hi!
> 
> There's a UAF in the CI:
> 
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/494261/vm-crash-thr0-0
> 
> [  580.340726][T19113] sctp: Hash tables configured (bind 32/56)
> [  601.749973][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  601.985349][    C2] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  602.191750][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  602.555469][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  602.895890][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  603.226543][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  603.435907][    C0] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  603.569421][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  603.672454][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  603.821679][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> [  618.553975][T19316] ==================================================================
> [  618.554200][T19316] BUG: KASAN: slab-use-after-free in nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
> [  618.554424][T19316] Write of size 1 at addr ff1100001cc9ae68 by task socat/19316
> [  618.554600][T19316] 

Did not occur here during local testing :-(

Should I send a v2 without the last two patches or will you pull and
discard the last two changes?


* Re: [PATCH net-next 0/9] netfilter: updates for net-next
  2026-01-29  8:56   ` Florian Westphal
@ 2026-01-29 10:08     ` Florian Westphal
  2026-01-29 10:40       ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2026-01-29 10:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	netfilter-devel, pablo

Florian Westphal <fw@strlen.de> wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> > [  580.340726][T19113] sctp: Hash tables configured (bind 32/56)
> > [  601.749973][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  601.985349][    C2] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  602.191750][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  602.555469][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  602.895890][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  603.226543][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  603.435907][    C0] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  603.569421][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  603.672454][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  603.821679][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
> > [  618.553975][T19316] ==================================================================
> > [  618.554200][T19316] BUG: KASAN: slab-use-after-free in nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
> > [  618.554424][T19316] Write of size 1 at addr ff1100001cc9ae68 by task socat/19316
> > [  618.554600][T19316] 
> 
> Did not occur here during local testing :-(
> 
> Should I send a v2 without the last two patches or will you pull and
> discard the last two changes?

Alternatively, you can pull this:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-26-01-29

Which is the same series but without the last two patches, i.e. up to
e19079adcd26a25d7d3e586b1837493361fdf8b6:

  netfilter: nfnetlink_queue: optimize verdict lookup with hash table (2026-01-29 09:52:07 +0100)

----------------------------------------------------------------
netfilter pull request nf-next-26-01-29

----------------------------------------------------------------
Jinjie Ruan (1):
      netfilter: xt_time: use is_leap_year() helper

Lorenzo Bianconi (5):
      netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
      netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
      netfilter: flowtable: Add IP6IP6 rx sw acceleration
      netfilter: flowtable: Add IP6IP6 tx sw acceleration
      selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest

Scott Mitchell (1):
      netfilter: nfnetlink_queue: optimize verdict lookup with hash table

6 files changed, 408 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/9] netfilter: updates for net-next
  2026-01-29 10:08     ` Florian Westphal
@ 2026-01-29 10:40       ` Paolo Abeni
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Abeni @ 2026-01-29 10:40 UTC (permalink / raw)
  To: Florian Westphal, Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, netfilter-devel, pablo

On 1/29/26 11:08 AM, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Jakub Kicinski <kuba@kernel.org> wrote:
>>> [  580.340726][T19113] sctp: Hash tables configured (bind 32/56)
>>> [  601.749973][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  601.985349][    C2] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  602.191750][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  602.555469][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  602.895890][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  603.226543][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  603.435907][    C0] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  603.569421][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  603.672454][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  603.821679][    C1] TCP: request_sock_TCP: Possible SYN flooding on port 127.0.0.1:23456. Sending cookies.
>>> [  618.553975][T19316] ==================================================================
>>> [  618.554200][T19316] BUG: KASAN: slab-use-after-free in nfqnl_enqueue_packet+0x8f1/0x9e0 [nfnetlink_queue]
>>> [  618.554424][T19316] Write of size 1 at addr ff1100001cc9ae68 by task socat/19316
>>> [  618.554600][T19316] 
>>
>> Did not occur here during local testing :-(
>>
>> Should I send a v2 without the last two patches or will you pull and
>> discard the last two changes?
> 
> Alternatively you can also pull this:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-26-01-29
> 
> Which is the same series but without the last two patches, i.e. up to
> e19079adcd26a25d7d3e586b1837493361fdf8b6:
> 
>   netfilter: nfnetlink_queue: optimize verdict lookup with hash table (2026-01-29 09:52:07 +0100)

Would you mind sending a formal v2 of the PR, so that the CI catches it
up and it's properly tracked in PW?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 0/9] netfilter: updates for net-next
@ 2026-02-24 20:50 Florian Westphal
  2026-02-26  3:50 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2026-02-24 20:50 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Hi,

The following patchset contains Netfilter updates for *net-next*,
including IPVS updates from and via Julian Anastasov.

First, updates for IPVS. From Julian's cover letter:

* Convert the global __ip_vs_mutex to a per-net service_mutex and
  make the service tables per-net; joint work by Jiejian Wu and
  Dust Li

* Convert some code that walks the service lists to use RCU instead of
  the service_mutex

* We used two tables for services (non-fwmark and fwmark); merge them
  into a single svc_table

* The list for unavailable destinations (dest_trash) holds dsts and
  thus dev references, causing extra work for the ip_vs_dst_event() dev
  notifier handler. Change this by dropping the reference when a dest
  is removed and saved into dest_trash. The dest_trash list will need
  more changes to make it light for lookups. TODO.

* On a new connection we can do multiple service lookups while trying
  different fallback options. Add more counters for service types, so
  that we can avoid unneeded service lookups.

* The no_cport and dropentry counters can be per-net, and we can also
  avoid extra conn lookups

Then, a few cleanups for nf_tables:

* keep BH enabled during nft_set_rbtree inserts; this is possible because
  the tree lock is now only taken from the control plane.
* toss a few EXPORT_SYMBOLs from nf_tables; these were historic
  leftovers from back in the day when e.g. set backends still
  resided in their own modules.
* remove the register tracking infra from nftables.  It was disabled
  years ago in 5.18 and there are no plans to salvage this work; the
  idea was good (remove redundant register stores), but there are just
  too many pitfalls, and better rule structuring (verdict maps)
  largely avoids the scenarios where this would have helped.
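
The register tracking idea being removed can be illustrated with a small
userspace sketch (hypothetical names, not the actual nf_tables code): remember
which selector last filled each register, so that a rule evaluating the same
selector twice in a row could elide the second store; the pitfall is that any
write with an unknown producer must conservatively invalidate the cached state.

```c
/* Hypothetical sketch of redundant-register-store tracking; the real
 * (now removed) nft infrastructure tracked expression metadata, not strings. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFT_REGS 16

struct track_ctx {
	const char *owner[NFT_REGS];	/* selector cached per register, or NULL */
};

/* returns 1 if the store into 'reg' can be elided */
static int reg_store_redundant(struct track_ctx *ctx, int reg,
			       const char *selector)
{
	if (ctx->owner[reg] && strcmp(ctx->owner[reg], selector) == 0)
		return 1;		/* register already holds this value */
	ctx->owner[reg] = selector;	/* record the new producer */
	return 0;
}

/* any write whose producer is unknown must invalidate the register */
static void reg_clobber(struct track_ctx *ctx, int reg)
{
	ctx->owner[reg] = NULL;
}
```

Every expression that can write a register needs a correct "what did I just
produce" hook, and one wrong answer yields miscompiled rules, which is the
"one pitfall too many" the changelog alludes to.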

Florian Westphal (3):
  netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock
  netfilter: nf_tables: drop obsolete EXPORT_SYMBOLs
  netfilter: nf_tables: remove register tracking infrastructure

Jiejian Wu (1):
  ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns

Julian Anastasov (5):
  ipvs: some service readers can use RCU
  ipvs: use single svc table
  ipvs: do not keep dest_dst after dest is removed
  ipvs: use more counters to avoid service lookups
  ipvs: no_cport and dropentry counters can be per-net

 include/net/ip_vs.h                      |  39 ++-
 include/net/netfilter/nf_tables.h        |  32 --
 include/net/netfilter/nft_fib.h          |   2 -
 include/net/netfilter/nft_meta.h         |   3 -
 net/bridge/netfilter/nft_meta_bridge.c   |  20 --
 net/bridge/netfilter/nft_reject_bridge.c |   1 -
 net/ipv4/netfilter/nft_dup_ipv4.c        |   1 -
 net/ipv4/netfilter/nft_fib_ipv4.c        |   2 -
 net/ipv4/netfilter/nft_reject_ipv4.c     |   1 -
 net/ipv6/netfilter/nft_dup_ipv6.c        |   1 -
 net/ipv6/netfilter/nft_fib_ipv6.c        |   2 -
 net/ipv6/netfilter/nft_reject_ipv6.c     |   1 -
 net/netfilter/ipvs/ip_vs_conn.c          |  64 ++--
 net/netfilter/ipvs/ip_vs_core.c          |   2 +-
 net/netfilter/ipvs/ip_vs_ctl.c           | 368 ++++++++---------------
 net/netfilter/ipvs/ip_vs_est.c           |  18 +-
 net/netfilter/ipvs/ip_vs_xmit.c          |  12 +-
 net/netfilter/nf_tables_api.c            |  78 -----
 net/netfilter/nft_bitwise.c              | 104 -------
 net/netfilter/nft_byteorder.c            |  11 -
 net/netfilter/nft_cmp.c                  |   3 -
 net/netfilter/nft_compat.c               |  10 -
 net/netfilter/nft_connlimit.c            |   1 -
 net/netfilter/nft_counter.c              |   1 -
 net/netfilter/nft_ct.c                   |  46 ---
 net/netfilter/nft_dup_netdev.c           |   1 -
 net/netfilter/nft_dynset.c               |   1 -
 net/netfilter/nft_exthdr.c               |  34 ---
 net/netfilter/nft_fib.c                  |  42 ---
 net/netfilter/nft_fib_inet.c             |   1 -
 net/netfilter/nft_fib_netdev.c           |   1 -
 net/netfilter/nft_flow_offload.c         |   1 -
 net/netfilter/nft_fwd_netdev.c           |   2 -
 net/netfilter/nft_hash.c                 |  36 ---
 net/netfilter/nft_immediate.c            |  12 -
 net/netfilter/nft_last.c                 |   1 -
 net/netfilter/nft_limit.c                |   2 -
 net/netfilter/nft_log.c                  |   1 -
 net/netfilter/nft_lookup.c               |  12 -
 net/netfilter/nft_masq.c                 |   3 -
 net/netfilter/nft_meta.c                 |  45 ---
 net/netfilter/nft_nat.c                  |   2 -
 net/netfilter/nft_numgen.c               |  22 --
 net/netfilter/nft_objref.c               |   2 -
 net/netfilter/nft_osf.c                  |  25 --
 net/netfilter/nft_payload.c              |  47 ---
 net/netfilter/nft_queue.c                |   2 -
 net/netfilter/nft_quota.c                |   1 -
 net/netfilter/nft_range.c                |   1 -
 net/netfilter/nft_redir.c                |   3 -
 net/netfilter/nft_reject_inet.c          |   1 -
 net/netfilter/nft_reject_netdev.c        |   1 -
 net/netfilter/nft_rt.c                   |   1 -
 net/netfilter/nft_set_rbtree.c           |  23 +-
 net/netfilter/nft_socket.c               |  26 --
 net/netfilter/nft_synproxy.c             |   1 -
 net/netfilter/nft_tproxy.c               |   1 -
 net/netfilter/nft_tunnel.c               |  26 --
 net/netfilter/nft_xfrm.c                 |  27 --
 59 files changed, 221 insertions(+), 1009 deletions(-)

-- 
2.52.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/9] netfilter: updates for net-next
  2026-02-24 20:50 Florian Westphal
@ 2026-02-26  3:50 ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-26  3:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 24 Feb 2026 21:50:39 +0100 you wrote:
> Hi,
> 
> The following patchset contains Netfilter fixes for *net-next*,
> including IPVS updates from and via Julian Anastasov.
> 
> First updates for IPVS. From Julians cover-letter:
> 
> [...]

Here is the summary with links:
  - [net-next,1/9] ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns
    https://git.kernel.org/netdev/net-next/c/74455a5b4326
  - [net-next,2/9] ipvs: some service readers can use RCU
    https://git.kernel.org/netdev/net-next/c/3de0ec2873ea
  - [net-next,3/9] ipvs: use single svc table
    https://git.kernel.org/netdev/net-next/c/b24ae1a387e4
  - [net-next,4/9] ipvs: do not keep dest_dst after dest is removed
    https://git.kernel.org/netdev/net-next/c/40fb72209fd8
  - [net-next,5/9] ipvs: use more counters to avoid service lookups
    https://git.kernel.org/netdev/net-next/c/c59bd9e62e06
  - [net-next,6/9] ipvs: no_cport and dropentry counters can be per-net
    https://git.kernel.org/netdev/net-next/c/09b71fb45946
  - [net-next,7/9] netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock
    https://git.kernel.org/netdev/net-next/c/3aea466a4399
  - [net-next,8/9] netfilter: nf_tables: drop obsolete EXPORT_SYMBOLs
    https://git.kernel.org/netdev/net-next/c/b6461103e01a
  - [net-next,9/9] netfilter: nf_tables: remove register tracking infrastructure
    https://git.kernel.org/netdev/net-next/c/6b94d081f81d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-02-26  3:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-28 15:41 [PATCH net-next 0/9] netfilter: updates for net-next Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 1/9] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 2/9] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 3/9] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 4/9] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 5/9] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 6/9] netfilter: xt_time: use is_leap_year() helper Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 7/9] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 8/9] netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation Florian Westphal
2026-01-28 15:41 ` [PATCH net-next 9/9] selftests: netfilter: nft_queue.sh: add udp fraglist gro test case Florian Westphal
2026-01-29  5:03 ` [PATCH net-next 0/9] netfilter: updates for net-next Jakub Kicinski
2026-01-29  8:56   ` Florian Westphal
2026-01-29 10:08     ` Florian Westphal
2026-01-29 10:40       ` Paolo Abeni
  -- strict thread matches above, loose matches on Subject: below --
2026-02-24 20:50 Florian Westphal
2026-02-26  3:50 ` patchwork-bot+netdevbpf
2024-08-22 22:19 [PATCH net-next 0/9] Netfilter " Pablo Neira Ayuso
2023-05-18 10:07 Florian Westphal
2023-03-08 19:30 Florian Westphal
2023-01-18 12:31 Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox