[PATCH 1/3 nf v5] netfilter: nf_socket: skip socket lookup for non-first fragments

* [PATCH 1/3 nf v5] netfilter: nf_socket: skip socket lookup for non-first fragments
@ 2026-04-28 10:25 Fernando Fernandez Mancera
  2026-04-28 10:25 ` [PATCH 2/3 nf v5] netfilter: nf_tables: skip L4 header parsing " Fernando Fernandez Mancera
  2026-04-28 10:25 ` [PATCH 3/3 nf v5] netfilter: xtables: fix " Fernando Fernandez Mancera
  0 siblings, 2 replies; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-04-28 10:25 UTC
  To: netfilter-devel; +Cc: coreteam, phil, fw, pablo, Fernando Fernandez Mancera

Both nft_socket and xt_socket rely on L4 headers to perform the socket
lookup in the slow path. For fragmented packets, the IP protocol field
is the same in every fragment, but only the first fragment carries the
actual L4 header.

As the expression/match can be attached to a chain with a priority
lower than -400, it can run before defragmentation and see individual
fragments. For a non-first fragment, the payload would then be parsed
as if it were a TCP/UDP header.

Add the fragment check directly in the lookup functions so the problem
is handled for both nft_socket and xt_socket at once. Future users of
these functions then do not need to care about this either.

Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v3: added this patch to the series; split it out because the fix is
generic for both nft_socket and xt_socket
v4: no changes
v5: no changes
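
For reference, a minimal userspace sketch of the two "is this a
non-first fragment?" tests the patch relies on: the IPv4 check on
iph->frag_off and the IPv6 fragment offset that ipv6_find_hdr() reports
through fragoff (it masks the field with ~0x7, which is equivalent to
IP6_OFFSET). The SKETCH_* constants and function names are hypothetical;
the mask values are copied from the kernel's IP_MF/IP_OFFSET and
IP6_MF/IP6_OFFSET definitions, and the code works on raw network-order
header fields rather than on skbs.

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <stdbool.h>
#include <stdint.h>

/* Copies of the kernel values (include/net/ip.h, include/net/ipv6.h). */
#define SKETCH_IP_MF      0x2000
#define SKETCH_IP_OFFSET  0x1FFF
#define SKETCH_IP6_MF     0x0001
#define SKETCH_IP6_OFFSET 0xFFF8

/* IPv4: frag_off is the 16-bit flags+offset field (network order).
 * A non-zero 13-bit offset means no L4 header is present. MF with
 * offset 0 is the first fragment and still carries the L4 header. */
static bool ipv4_is_nonfirst_fragment(uint16_t frag_off_be)
{
	return (ntohs(frag_off_be) & SKETCH_IP_OFFSET) != 0;
}

/* IPv6: frag_off is the offset+flags field of the Fragment extension
 * header (network order); the offset lives in the upper 13 bits. */
static bool ipv6_is_nonfirst_fragment(uint16_t frag_off_be)
{
	return (ntohs(frag_off_be) & SKETCH_IP6_OFFSET) != 0;
}
```

Flag bits (DF, MF, M) are masked out, so only the offset decides
whether the transport header can be parsed.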
---
 net/ipv4/netfilter/nf_socket_ipv4.c | 3 +++
 net/ipv6/netfilter/nf_socket_ipv6.c | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_socket_ipv4.c b/net/ipv4/netfilter/nf_socket_ipv4.c
index 5080fa5fbf6a..f9c6755f5ec5 100644
--- a/net/ipv4/netfilter/nf_socket_ipv4.c
+++ b/net/ipv4/netfilter/nf_socket_ipv4.c
@@ -94,6 +94,9 @@ struct sock *nf_sk_lookup_slow_v4(struct net *net, const struct sk_buff *skb,
 #endif
 	int doff = 0;
 
+	if (ntohs(iph->frag_off) & IP_OFFSET)
+		return NULL;
+
 	if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
 		struct tcphdr _hdr;
 		struct udphdr *hp;
diff --git a/net/ipv6/netfilter/nf_socket_ipv6.c b/net/ipv6/netfilter/nf_socket_ipv6.c
index ced8bd44828e..893f2aeb4711 100644
--- a/net/ipv6/netfilter/nf_socket_ipv6.c
+++ b/net/ipv6/netfilter/nf_socket_ipv6.c
@@ -100,6 +100,7 @@ struct sock *nf_sk_lookup_slow_v6(struct net *net, const struct sk_buff *skb,
 	const struct in6_addr *daddr = NULL, *saddr = NULL;
 	struct ipv6hdr *iph = ipv6_hdr(skb), ipv6_var;
 	struct sk_buff *data_skb = NULL;
+	unsigned short fragoff = 0;
 	int doff = 0;
 	int thoff = 0, tproto;
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
@@ -107,8 +108,8 @@ struct sock *nf_sk_lookup_slow_v6(struct net *net, const struct sk_buff *skb,
 	struct nf_conn const *ct;
 #endif
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
-	if (tproto < 0) {
+	tproto = ipv6_find_hdr(skb, &thoff, -1, &fragoff, NULL);
+	if (tproto < 0 || fragoff) {
 		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
 		return NULL;
 	}
-- 
2.53.0


* [PATCH 2/3 nf v5] netfilter: nf_tables: skip L4 header parsing for non-first fragments
  2026-04-28 10:25 [PATCH 1/3 nf v5] netfilter: nf_socket: skip socket lookup for non-first fragments Fernando Fernandez Mancera
@ 2026-04-28 10:25 ` Fernando Fernandez Mancera
  2026-04-28 10:25 ` [PATCH 3/3 nf v5] netfilter: xtables: fix " Fernando Fernandez Mancera
  1 sibling, 0 replies; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-04-28 10:25 UTC
  To: netfilter-devel; +Cc: coreteam, phil, fw, pablo, Fernando Fernandez Mancera

The tproxy, osf and exthdr (SCTP) expressions rely on the presence of
transport layer headers to perform socket lookups, fingerprint matching,
or chunk extraction. For fragmented packets, while the IP protocol
remains constant across all fragments, only the first fragment contains
the actual L4 header.

The expressions can be attached to a chain with a priority lower than
-400, which bypasses defragmentation, or used in stateless setups where
no defragmentation happens at all. In both cases, garbage data from a
non-first fragment could be used for the matching.

Add a check for pkt->fragoff so only unfragmented packets or the first
fragment are processed. Apply the same check to transport header loads
in the payload fast path (nft_payload_fast_eval()).

Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: handled fragmented packets for socket expression too,
squashed nftables expression commits into this one.
v3: removed changes to nft_socket and created a generic solution for
xt/nft
v4: no changes
v5: added check on payload fastpath
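
The guard added to nft_tproxy_eval_v4()/nft_tproxy_eval_v6() (and, in
the same spirit, to nft_osf and the SCTP exthdr path) can be sketched in
userspace as below. struct sketch_pktinfo and the SKETCH_* verdicts are
hypothetical stand-ins for struct nft_pktinfo and NFT_BREAK, reduced to
the two fields the check reads.

```c
#include <netinet/in.h> /* IPPROTO_TCP, IPPROTO_UDP */
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct nft_pktinfo. */
struct sketch_pktinfo {
	uint8_t  tprot;   /* transport protocol from the IP header */
	uint16_t fragoff; /* non-zero only for non-first fragments */
};

/* Hypothetical verdicts: keep evaluating, or stop like NFT_BREAK. */
enum sketch_verdict { SKETCH_CONTINUE, SKETCH_BREAK };

/* Bail out unless the packet is TCP/UDP *and* actually carries the
 * L4 header, i.e. it is unfragmented or the first fragment. */
static enum sketch_verdict
sketch_tproxy_guard(const struct sketch_pktinfo *pkt)
{
	if ((pkt->tprot != IPPROTO_TCP &&
	     pkt->tprot != IPPROTO_UDP) || pkt->fragoff)
		return SKETCH_BREAK;
	return SKETCH_CONTINUE;
}
```

The extra parentheses around the protocol comparisons do not change the
result, since && already binds tighter than ||, but they make the intent
explicit.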
---
 net/netfilter/nf_tables_core.c | 2 +-
 net/netfilter/nft_exthdr.c     | 2 +-
 net/netfilter/nft_osf.c        | 2 +-
 net/netfilter/nft_tproxy.c     | 8 ++++----
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 5ddd5b6e135f..8ab186f86dd4 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -153,7 +153,7 @@ static bool nft_payload_fast_eval(const struct nft_expr *expr,
 	if (priv->base == NFT_PAYLOAD_NETWORK_HEADER)
 		ptr = skb_network_header(skb) + pkt->nhoff;
 	else {
-		if (!(pkt->flags & NFT_PKTINFO_L4PROTO))
+		if (!(pkt->flags & NFT_PKTINFO_L4PROTO) || pkt->fragoff)
 			return false;
 		ptr = skb->data + nft_thoff(pkt);
 	}
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index 0407d6f708ae..e6a07c0df207 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -376,7 +376,7 @@ static void nft_exthdr_sctp_eval(const struct nft_expr *expr,
 	const struct sctp_chunkhdr *sch;
 	struct sctp_chunkhdr _sch;
 
-	if (pkt->tprot != IPPROTO_SCTP)
+	if (pkt->tprot != IPPROTO_SCTP || pkt->fragoff)
 		goto err;
 
 	do {
diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c
index c02d5cb52143..45fe56da5044 100644
--- a/net/netfilter/nft_osf.c
+++ b/net/netfilter/nft_osf.c
@@ -33,7 +33,7 @@ static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs,
 		return;
 	}
 
-	if (pkt->tprot != IPPROTO_TCP) {
+	if (pkt->tprot != IPPROTO_TCP || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c
index f2101af8c867..89be443734f6 100644
--- a/net/netfilter/nft_tproxy.c
+++ b/net/netfilter/nft_tproxy.c
@@ -30,8 +30,8 @@ static void nft_tproxy_eval_v4(const struct nft_expr *expr,
 	__be16 tport = 0;
 	struct sock *sk;
 
-	if (pkt->tprot != IPPROTO_TCP &&
-	    pkt->tprot != IPPROTO_UDP) {
+	if ((pkt->tprot != IPPROTO_TCP &&
+	     pkt->tprot != IPPROTO_UDP) || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
@@ -97,8 +97,8 @@ static void nft_tproxy_eval_v6(const struct nft_expr *expr,
 
 	memset(&taddr, 0, sizeof(taddr));
 
-	if (pkt->tprot != IPPROTO_TCP &&
-	    pkt->tprot != IPPROTO_UDP) {
+	if ((pkt->tprot != IPPROTO_TCP &&
+	     pkt->tprot != IPPROTO_UDP) || pkt->fragoff) {
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
-- 
2.53.0


* [PATCH 3/3 nf v5] netfilter: xtables: fix L4 header parsing for non-first fragments
  2026-04-28 10:25 [PATCH 1/3 nf v5] netfilter: nf_socket: skip socket lookup for non-first fragments Fernando Fernandez Mancera
  2026-04-28 10:25 ` [PATCH 2/3 nf v5] netfilter: nf_tables: skip L4 header parsing " Fernando Fernandez Mancera
@ 2026-04-28 10:25 ` Fernando Fernandez Mancera
  2026-04-30  6:08   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-04-28 10:25 UTC
  To: netfilter-devel; +Cc: coreteam, phil, fw, pablo, Fernando Fernandez Mancera

Multiple targets and matches rely on the L4 header to operate. For
fragmented packets, every fragment carries the transport protocol
identifier, but only the first fragment contains the L4 header.

As the 'raw' table can be configured to run at priority -450 (before
defragmentation at -400), the target/match can be reached before
reassembly. In this case, the payload of non-first fragments is
incorrectly parsed as a TCP/UDP header. This is, of course, a
misconfiguration scenario, and in most cases it just leads to
unreliable behavior for fragmented traffic.

Add a fragment check so that the targets/matches only evaluate
unfragmented packets or the first fragment. Non-first fragments are
dropped by TPROXY, do not match ecn, osf and tcpmss, and are hotdropped
by hashlimit when it needs to hash ports.

Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: handled fragmented packets for socket expression too,
squashed nftables expression commits into this one.
v3: removed changes to nft_socket and created a generic solution for
xt/nft
v4: no changes
v5: no changes
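
To make the ordering concrete: hooks registered at the same hook point
run in ascending priority order, so a raw table registered at -450
(NF_IP_PRI_RAW_BEFORE_DEFRAG, used e.g. when iptable_raw's
raw_before_defrag parameter is set) runs before defragmentation at -400
(NF_IP_PRI_CONNTRACK_DEFRAG). The sketch below models only that
ordering; the struct and function names are hypothetical, and the
kernel keeps its hook list ordered at registration time rather than
sorting it per packet.

```c
#include <stdbool.h>
#include <stdlib.h> /* qsort(), size_t */

/* Values copied from enum nf_ip_hook_priorities in
 * include/uapi/linux/netfilter_ipv4.h for this sketch. */
enum {
	SKETCH_PRI_RAW_BEFORE_DEFRAG = -450,
	SKETCH_PRI_CONNTRACK_DEFRAG  = -400,
	SKETCH_PRI_RAW               = -300,
};

/* Hypothetical descriptor for a hook on one hook point. */
struct sketch_hook {
	const char *name;
	int priority; /* lower values run earlier */
};

static int sketch_hook_cmp(const void *a, const void *b)
{
	const struct sketch_hook *x = a, *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Put hooks in the order a packet traverses them. */
static void sketch_sort_hooks(struct sketch_hook *hooks, size_t n)
{
	qsort(hooks, n, sizeof(*hooks), sketch_hook_cmp);
}

/* A hook sees unreassembled fragments if it runs before defrag. */
static bool sketch_sees_fragments(const struct sketch_hook *hook)
{
	return hook->priority < SKETCH_PRI_CONNTRACK_DEFRAG;
}
```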
---
 net/netfilter/xt_TPROXY.c    | 11 +++++++++--
 net/netfilter/xt_ecn.c       |  4 ++++
 net/netfilter/xt_hashlimit.c |  4 +++-
 net/netfilter/xt_osf.c       |  3 +++
 net/netfilter/xt_tcpmss.c    |  4 ++++
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index e4bea1d346cf..5f60e7298a1e 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -86,6 +86,9 @@ tproxy_tg4_v0(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tproxy_target_info *tgi = par->targinfo;
 
+	if (par->fragoff)
+		return NF_DROP;
+
 	return tproxy_tg4(xt_net(par), skb, tgi->laddr, tgi->lport,
 			  tgi->mark_mask, tgi->mark_value);
 }
@@ -95,6 +98,9 @@ tproxy_tg4_v1(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tproxy_target_info_v1 *tgi = par->targinfo;
 
+	if (par->fragoff)
+		return NF_DROP;
+
 	return tproxy_tg4(xt_net(par), skb, tgi->laddr.ip, tgi->lport,
 			  tgi->mark_mask, tgi->mark_value);
 }
@@ -106,6 +112,7 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	const struct xt_tproxy_target_info_v1 *tgi = par->targinfo;
+	unsigned short fragoff = 0;
 	struct udphdr _hdr, *hp;
 	struct sock *sk;
 	const struct in6_addr *laddr;
@@ -113,8 +120,8 @@ tproxy_tg6_v1(struct sk_buff *skb, const struct xt_action_param *par)
 	int thoff = 0;
 	int tproto;
 
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
-	if (tproto < 0)
+	tproto = ipv6_find_hdr(skb, &thoff, -1, &fragoff, NULL);
+	if (tproto < 0 || fragoff)
 		return NF_DROP;
 
 	hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
diff --git a/net/netfilter/xt_ecn.c b/net/netfilter/xt_ecn.c
index b96e8203ac54..a8503f5d26bf 100644
--- a/net/netfilter/xt_ecn.c
+++ b/net/netfilter/xt_ecn.c
@@ -30,6 +30,10 @@ static bool match_tcp(const struct sk_buff *skb, struct xt_action_param *par)
 	struct tcphdr _tcph;
 	const struct tcphdr *th;
 
+	/* this is fine for IPv6 as ecn_mt_check6() enforces -p tcp */
+	if (par->fragoff)
+		return false;
+
 	/* In practice, TCP match does this, so can't fail.  But let's
 	 * be good citizens.
 	 */
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 3bd127bfc114..2704b4b60d1e 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -658,6 +658,8 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
 		if (!(hinfo->cfg.mode &
 		      (XT_HASHLIMIT_HASH_DPT | XT_HASHLIMIT_HASH_SPT)))
 			return 0;
+		if (ntohs(ip_hdr(skb)->frag_off) & IP_OFFSET)
+			return -1;
 		nexthdr = ip_hdr(skb)->protocol;
 		break;
 #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
@@ -681,7 +683,7 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
 			return 0;
 		nexthdr = ipv6_hdr(skb)->nexthdr;
 		protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off);
-		if ((int)protoff < 0)
+		if ((int)protoff < 0 || ntohs(frag_off) & IP6_OFFSET)
 			return -1;
 		break;
 	}
diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index dc9485854002..e8807caede68 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -27,6 +27,9 @@
 static bool
 xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 {
+	if (p->fragoff)
+		return false;
+
 	return nf_osf_match(skb, xt_family(p), xt_hooknum(p), xt_in(p),
 			    xt_out(p), p->matchinfo, xt_net(p), nf_osf_fingers);
 }
diff --git a/net/netfilter/xt_tcpmss.c b/net/netfilter/xt_tcpmss.c
index 0d32d4841cb3..b9da8269161d 100644
--- a/net/netfilter/xt_tcpmss.c
+++ b/net/netfilter/xt_tcpmss.c
@@ -32,6 +32,10 @@ tcpmss_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	u8 _opt[15 * 4 - sizeof(_tcph)];
 	unsigned int i, optlen;
 
+	/* this is fine for IPv6 as xt_tcpmss enforces -p tcp */
+	if (par->fragoff)
+		return false;
+
 	/* If we don't have the whole header, drop packet. */
 	th = skb_header_pointer(skb, par->thoff, sizeof(_tcph), &_tcph);
 	if (th == NULL)
-- 
2.53.0


* Re: [PATCH 3/3 nf v5] netfilter: xtables: fix L4 header parsing for non-first fragments
  2026-04-28 10:25 ` [PATCH 3/3 nf v5] netfilter: xtables: fix " Fernando Fernandez Mancera
@ 2026-04-30  6:08   ` Pablo Neira Ayuso
  2026-04-30 14:53     ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2026-04-30  6:08 UTC
  To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, phil, fw

Hi Fernando,

On Tue, Apr 28, 2026 at 12:25:48PM +0200, Fernando Fernandez Mancera wrote:
> Multiple targets and matches relies on L4 header to operate. For
> fragmented packets, every fragment carries the transport protocol
> identifier, but only the first fragment contains the L4 header.
> 
> As the 'raw' table can be configured to run at priority -450 (before
> defragmentation at -400), the target/match can be reached before
> reassembly. In this case, non-first fragments have their payload
> incorrectly parsed as a TCP/UDP header. This would be of course a
> misconfiguration scenario. In most of the cases this just lead to a
> unreliable behavior for fragmented traffic.
> 
> Add a fragment check to ensure target/match only evaluates unfragmented
> packets or the first fragment in the stream.

One more little issue here: There seems to be an issue in
xt_hashlimit, hashlimit_init_dst() drops packets via hotdrop if it
returns -1.

* Re: [PATCH 3/3 nf v5] netfilter: xtables: fix L4 header parsing for non-first fragments
  2026-04-30  6:08   ` Pablo Neira Ayuso
@ 2026-04-30 14:53     ` Fernando Fernandez Mancera
  2026-04-30 15:06       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2026-04-30 14:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, coreteam, phil, fw

On 4/30/26 8:08 AM, Pablo Neira Ayuso wrote:
> Hi Fernando,
> 
> On Tue, Apr 28, 2026 at 12:25:48PM +0200, Fernando Fernandez Mancera wrote:
>> Multiple targets and matches relies on L4 header to operate. For
>> fragmented packets, every fragment carries the transport protocol
>> identifier, but only the first fragment contains the L4 header.
>>
>> As the 'raw' table can be configured to run at priority -450 (before
>> defragmentation at -400), the target/match can be reached before
>> reassembly. In this case, non-first fragments have their payload
>> incorrectly parsed as a TCP/UDP header. This would be of course a
>> misconfiguration scenario. In most of the cases this just lead to a
>> unreliable behavior for fragmented traffic.
>>
>> Add a fragment check to ensure target/match only evaluates unfragmented
>> packets or the first fragment in the stream.
> 
> One more little issue here: There seems to be an issue in
> xt_hashlimit, hashlimit_init_dst() drops packets via hotdrop if it
> returns -1.

Hi Pablo, I do not follow here. I think a hotdrop is the right thing to do.

xt_hashlimit creates the hash and later checks whether we are over the
limit or not. The verdict is set based on that and the INVERT flag. I
don't think we should match or not match packets that we cannot parse
correctly; we should just drop them.

This is already the current behavior, for example, when protoff < 0
(because no L4 header is found).

What do you think?

Thanks,
Fernando.
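
The hashlimit outcome discussed here can be modeled roughly as follows:
a negative return from hashlimit_init_dst() sets hotdrop and stops
evaluation; otherwise the match result is "within the limit", flipped
by XT_HASHLIMIT_INVERT. This is a simplified, hypothetical model:
sketch_hashlimit_mt() and struct sketch_mt_result are not kernel names,
and the rate accounting is reduced to a single within_limit flag.

```c
#include <stdbool.h>

/* Hypothetical result: the match verdict plus the hotdrop request. */
struct sketch_mt_result {
	bool match;
	bool hotdrop;
};

static struct sketch_mt_result
sketch_hashlimit_mt(int init_dst_ret, bool within_limit, bool invert)
{
	struct sketch_mt_result r = { .match = false, .hotdrop = false };

	if (init_dst_ret < 0) {
		/* Key could not be built (e.g. non-first fragment):
		 * drop the packet, independent of INVERT. */
		r.hotdrop = true;
		return r;
	}
	/* Match when within the limit; INVERT flips the result. */
	r.match = within_limit != invert;
	return r;
}
```

It shows why hotdrop is the conservative choice: returning a plain match
or no-match would make the fate of an unparsable fragment depend on
INVERT and on whatever rules follow, whereas hotdrop drops it either way.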

* Re: [PATCH 3/3 nf v5] netfilter: xtables: fix L4 header parsing for non-first fragments
  2026-04-30 14:53     ` Fernando Fernandez Mancera
@ 2026-04-30 15:06       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 6+ messages in thread
From: Pablo Neira Ayuso @ 2026-04-30 15:06 UTC
  To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, phil, fw

On Thu, Apr 30, 2026 at 04:53:14PM +0200, Fernando Fernandez Mancera wrote:
> On 4/30/26 8:08 AM, Pablo Neira Ayuso wrote:
> > Hi Fernando,
> > 
> > On Tue, Apr 28, 2026 at 12:25:48PM +0200, Fernando Fernandez Mancera wrote:
> > > Multiple targets and matches relies on L4 header to operate. For
> > > fragmented packets, every fragment carries the transport protocol
> > > identifier, but only the first fragment contains the L4 header.
> > > 
> > > As the 'raw' table can be configured to run at priority -450 (before
> > > defragmentation at -400), the target/match can be reached before
> > > reassembly. In this case, non-first fragments have their payload
> > > incorrectly parsed as a TCP/UDP header. This would be of course a
> > > misconfiguration scenario. In most of the cases this just lead to a
> > > unreliable behavior for fragmented traffic.
> > > 
> > > Add a fragment check to ensure target/match only evaluates unfragmented
> > > packets or the first fragment in the stream.
> > 
> > One more little issue here: There seems to be an issue in
> > xt_hashlimit, hashlimit_init_dst() drops packets via hotdrop if it
> > returns -1.
> 
> Hi Pablo, I do not follow here. I think a hotdrop is the right thing to do.
> 
> xt_hashlimit creates the hash and later checks whether we are over the limit
> or not. The verdict is set based on that and the INVERT flag.. I don't think
> we should match or not match packets that we cannot parse correctly, we
> should just drop them.
> 
> This is the current behavior for example when protoff < 0 (because no L4
> header is found).
> 
> What do you think?

Your reasoning makes sense.

Currently, if protoff < 0, the packet is dropped; therefore, fragments
are already being dropped.

Let's stick to this approach, thanks for explaining. I will take this
series as is then.

Thread overview: 6+ messages
2026-04-28 10:25 [PATCH 1/3 nf v5] netfilter: nf_socket: skip socket lookup for non-first fragments Fernando Fernandez Mancera
2026-04-28 10:25 ` [PATCH 2/3 nf v5] netfilter: nf_tables: skip L4 header parsing " Fernando Fernandez Mancera
2026-04-28 10:25 ` [PATCH 3/3 nf v5] netfilter: xtables: fix " Fernando Fernandez Mancera
2026-04-30  6:08   ` Pablo Neira Ayuso
2026-04-30 14:53     ` Fernando Fernandez Mancera
2026-04-30 15:06       ` Pablo Neira Ayuso
