public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q
@ 2026-02-24  6:53 Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra

Conntrack bridge only tracks untagged and 802.1q.

To make the bridge-fastpath experience more similar to the
forward-fastpath experience, introduce patches for double vlan,
pppoe and pppoe-in-q tagged packets to bridge conntrack and to
bridge filter chain.

Changes in v19:

- Add net: pppoe: avoid zero-length arrays in struct pppoe_hdr.
   (It was part of other patch-set of mine, moved to this patch-set)

Changes in v18:

- Rebased
- ‎nf_conntrack_bridge: added #include <linux/ppp_defs.h>
- nf_checksum(_partial)(): changed WARN_ON to WARN_ON_ONCE.
- nft_set_bridge_pktinfo(): changed call to pskb_may_pull() to
   skb_header_pointer().

Changes in v17:

- Add patch for nft_set_pktinfo_ipv4/6_validate() adding nhoff argument.
- Stopped using skb_set_network_header() in nft_set_bridge_pktinfo,
   using the new offset for nft_set_pktinfo_ipv4/6_validate instead.
- When pskb_may_pull() fails in nft_set_bridge_pktinfo() set proto to 0,
   resulting in pktinfo unspecified.

Changes in v16:

- Changed nft_chain_filter patch: Only help populating pktinfo offsets,
   call nft_do_chain() with original network_offset.
- Changed commit messages.
- Removed kernel-doc comments.

Changes in v15:

- Do not munge skb->protocol.
- Introduce nft_set_bridge_pktinfo() helper.
- Introduce nf_ct_bridge_pre_inner() helper.
- nf_ct_bridge_pre(): Don't trim on ph->hdr.length, only compare to what
   ip header claims and return NF_ACCEPT if it does not match.
- nf_ct_bridge_pre(): Renamed u32 data_len to pppoe_len.
- nf_ct_bridge_pre(): Reset network_header only when ret == NF_ACCEPT.
- nf_checksum(_partial)(): Use of skb_network_offset().
- nf_checksum(_partial)(): Use 'if (WARN_ON()) return 0' instead.
- nf_checksum(_partial)(): Added comments

Changes in v14:

- nf_checksum(_patial): Use DEBUG_NET_WARN_ON_ONCE(
   !skb_pointer_if_linear()) instead of pskb_may_pull().
- nft_do_chain_bridge: Added default case ph->proto is neither
   ipv4 nor ipv6.
- nft_do_chain_bridge: only reset network header when ret == NF_ACCEPT.

Changes in v13:

- Do not use pull/push before/after calling nf_conntrack_in() or
   nft_do_chain().
- Add patch to correct calculating checksum when skb->data !=
   skb_network_header(skb).

Changes in v12:

- Only allow tracking this traffic when a conntrack zone is set.
- nf_ct_bridge_pre(): skb pull/push without touching the checksum,
   because the pull is always restored with push.
- nft_do_chain_bridge(): handle the extra header similar to
   nf_ct_bridge_pre(), using pull/push.

Changes in v11:

- nft_do_chain_bridge(): Proper readout of encapsulated proto.
- nft_do_chain_bridge(): Use skb_set_network_header() instead of thoff.
- removed test script, it is now in separate patch.

v10 split from patch-set: bridge-fastpath and related improvements v9

Eric Woudstra (5):
  net: pppoe: avoid zero-length arrays in struct pppoe_hdr
  netfilter: utils: nf_checksum(_partial) correct data!=networkheader
  netfilter: bridge: Add conntrack double vlan and pppoe
  netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument
  netfilter: nft_chain_filter: Add bridge double vlan and pppoe

 drivers/net/ppp/pppoe.c                    |  2 +-
 include/net/netfilter/nf_tables_ipv4.h     | 21 +++--
 include/net/netfilter/nf_tables_ipv6.h     | 21 +++--
 include/uapi/linux/if_pppox.h              |  4 +
 net/bridge/netfilter/nf_conntrack_bridge.c | 93 ++++++++++++++++++----
 net/netfilter/nft_chain_filter.c           | 59 ++++++++++++--
 net/netfilter/utils.c                      | 28 +++++--
 7 files changed, 182 insertions(+), 46 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr
  2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
@ 2026-02-24  6:53 ` Eric Woudstra
  2026-02-24 14:15   ` Florian Westphal
  2026-02-24  6:53 ` [PATCH v19 nf-next 2/5] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra, Kees Cook

Jakub Kicinski suggested following patch:

W=1 C=1 GCC build gives us:

net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
153:29: warning: array of flexible structures

It doesn't like that hdr has a zero-length array which overlaps proto.
The kernel code doesn't currently need those arrays.

PPPoE connection is functional after applying this patch.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 drivers/net/ppp/pppoe.c       | 2 +-
 include/uapi/linux/if_pppox.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 4275b393a454..7900cc3212a5 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -885,7 +885,7 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	skb->protocol = cpu_to_be16(ETH_P_PPP_SES);
 
 	ph = skb_put(skb, total_len + sizeof(struct pppoe_hdr));
-	start = (char *)&ph->tag[0];
+	start = (char *)ph + sizeof(*ph);
 
 	error = memcpy_from_msg(start, m, total_len);
 	if (error < 0) {
diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 9abd80dcc46f..29b804aa7474 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -122,7 +122,9 @@ struct sockaddr_pppol2tpv3in6 {
 struct pppoe_tag {
 	__be16 tag_type;
 	__be16 tag_len;
+#ifndef __KERNEL__
 	char tag_data[];
+#endif
 } __attribute__ ((packed));
 
 /* Tag identifiers */
@@ -150,7 +152,9 @@ struct pppoe_hdr {
 	__u8 code;
 	__be16 sid;
 	__be16 length;
+#ifndef __KERNEL__
 	struct pppoe_tag tag[];
+#endif
 } __packed;
 
 /* Length of entire PPPoE + PPP header */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v19 nf-next 2/5] netfilter: utils: nf_checksum(_partial) correct data!=networkheader
  2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
@ 2026-02-24  6:53 ` Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 3/5] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra

In the conntrack hook it may not always be the case that:
skb_network_header(skb) == skb->data, i.e. skb_network_offset(skb)
is zero.

This is problematic when L4 function nf_conntrack_handle_packet()
is accessing L3 data. This function uses thoff and ip_hdr()
to finds it's data. But it also calculates the checksum.
nf_checksum() and nf_checksum_partial() both use lower skb-checksum
functions that are based on using skb->data.

Adjust for skb_network_offset(skb), so that the checksum is calculated
correctly.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/netfilter/utils.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
index 008419db815a..4f8b5442b650 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -124,16 +124,25 @@ __sum16 nf_checksum(struct sk_buff *skb, unsigned int hook,
 		    unsigned int dataoff, u8 protocol,
 		    unsigned short family)
 {
+	unsigned int nhpull = skb_network_offset(skb);
 	__sum16 csum = 0;
 
+	if (WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0)))
+		return 0;
+
+	/* pull/push because the lower csum functions assume that
+	 * skb_network_offset(skb) is zero.
+	 */
+	__skb_pull(skb, nhpull);
 	switch (family) {
 	case AF_INET:
-		csum = nf_ip_checksum(skb, hook, dataoff, protocol);
+		csum = nf_ip_checksum(skb, hook, dataoff - nhpull, protocol);
 		break;
 	case AF_INET6:
-		csum = nf_ip6_checksum(skb, hook, dataoff, protocol);
+		csum = nf_ip6_checksum(skb, hook, dataoff - nhpull, protocol);
 		break;
 	}
+	__skb_push(skb, nhpull);
 
 	return csum;
 }
@@ -143,18 +152,25 @@ __sum16 nf_checksum_partial(struct sk_buff *skb, unsigned int hook,
 			    unsigned int dataoff, unsigned int len,
 			    u8 protocol, unsigned short family)
 {
+	unsigned int nhpull = skb_network_offset(skb);
 	__sum16 csum = 0;
 
+	if (WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0)))
+		return 0;
+
+	/* See nf_checksum() */
+	__skb_pull(skb, nhpull);
 	switch (family) {
 	case AF_INET:
-		csum = nf_ip_checksum_partial(skb, hook, dataoff, len,
-					      protocol);
+		csum = nf_ip_checksum_partial(skb, hook, dataoff - nhpull,
+					      len, protocol);
 		break;
 	case AF_INET6:
-		csum = nf_ip6_checksum_partial(skb, hook, dataoff, len,
-					       protocol);
+		csum = nf_ip6_checksum_partial(skb, hook, dataoff - nhpull,
+					       len, protocol);
 		break;
 	}
+	__skb_push(skb, nhpull);
 
 	return csum;
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v19 nf-next 3/5] netfilter: bridge: Add conntrack double vlan and pppoe
  2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 2/5] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
@ 2026-02-24  6:53 ` Eric Woudstra
  2026-02-25  1:52   ` [v19,nf-next,3/5] " Jakub Kicinski
  2026-02-24  6:53 ` [PATCH v19 nf-next 4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
  2026-02-24  6:53 ` [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra

In a bridge, until now, it is possible to track connections of plain
ip(v6) and ip(v6) encapsulated in single 802.1q or 802.1ad.

This patch adds the capability to track connections when the connection
is (also) encapsulated in PPPoE. It also adds the capability to track
connections that are encapsulated in an inner 802.1q, combined with an
outer 802.1ad or 802.1q encapsulation.

To prevent mixing connections that are tagged differently in the L2
encapsulations, one should separate them using conntrack zones.
Using a conntrack zone is a hard requirement for the newly added
encapsulations of the tracking capability inside a bridge.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/bridge/netfilter/nf_conntrack_bridge.c | 93 ++++++++++++++++++----
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 58a33d0380b0..49e01083278c 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -16,6 +16,7 @@
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_bridge.h>
 
+#include <linux/ppp_defs.h>
 #include <linux/netfilter_ipv4.h>
 
 #include "../br_private.h"
@@ -236,58 +237,116 @@ static int nf_ct_br_ipv6_check(const struct sk_buff *skb)
 	return 0;
 }
 
+static int nf_ct_bridge_pre_inner(struct sk_buff *skb, __be16 *proto, u32 *len)
+{
+	switch (*proto) {
+	case htons(ETH_P_PPP_SES): {
+		struct ppp_hdr {
+			struct pppoe_hdr hdr;
+			__be16 proto;
+		} *ph;
+
+		if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
+			return -1;
+		ph = (struct ppp_hdr *)(skb->data);
+		switch (ph->proto) {
+		case htons(PPP_IP):
+			*proto = htons(ETH_P_IP);
+			*len = ntohs(ph->hdr.length) - 2;
+			skb_set_network_header(skb, PPPOE_SES_HLEN);
+			return PPPOE_SES_HLEN;
+		case htons(PPP_IPV6):
+			*proto = htons(ETH_P_IPV6);
+			*len = ntohs(ph->hdr.length) - 2;
+			skb_set_network_header(skb, PPPOE_SES_HLEN);
+			return PPPOE_SES_HLEN;
+		}
+		break;
+	}
+	case htons(ETH_P_8021Q): {
+		struct vlan_hdr *vhdr;
+
+		if (!pskb_may_pull(skb, VLAN_HLEN))
+			return -1;
+		vhdr = (struct vlan_hdr *)(skb->data);
+		*proto = vhdr->h_vlan_encapsulated_proto;
+		skb_set_network_header(skb, VLAN_HLEN);
+		return VLAN_HLEN;
+	}
+	}
+	return 0;
+}
+
 static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
 				     const struct nf_hook_state *state)
 {
 	struct nf_hook_state bridge_state = *state;
+	int ret = NF_ACCEPT, offset = 0;
 	enum ip_conntrack_info ctinfo;
+	u32 len, pppoe_len = 0;
 	struct nf_conn *ct;
-	u32 len;
-	int ret;
+	__be16 proto;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if ((ct && !nf_ct_is_template(ct)) ||
 	    ctinfo == IP_CT_UNTRACKED)
 		return NF_ACCEPT;
 
-	switch (skb->protocol) {
-	case htons(ETH_P_IP):
-		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+	proto = skb->protocol;
+
+	if (ct && nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)) !=
+			NF_CT_DEFAULT_ZONE_ID) {
+		offset = nf_ct_bridge_pre_inner(skb, &proto, &pppoe_len);
+		if (offset < 0)
 			return NF_ACCEPT;
+	}
+
+	switch (proto) {
+	case htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, offset + sizeof(struct iphdr)))
+			goto do_not_track;
 
 		len = skb_ip_totlen(skb);
-		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+		if (pppoe_len && pppoe_len != len)
+			goto do_not_track;
+		if (pskb_trim_rcsum(skb, offset + len))
+			goto do_not_track;
 
 		if (nf_ct_br_ip_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV4;
 		ret = nf_ct_br_defrag4(skb, &bridge_state);
 		break;
 	case htons(ETH_P_IPV6):
-		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
-			return NF_ACCEPT;
+		if (!pskb_may_pull(skb, offset + sizeof(struct ipv6hdr)))
+			goto do_not_track;
 
 		len = sizeof(struct ipv6hdr) + skb_ipv6_payload_len(skb);
-		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+		if (pppoe_len && pppoe_len != len)
+			goto do_not_track;
+		if (pskb_trim_rcsum(skb, offset + len))
+			goto do_not_track;
 
 		if (nf_ct_br_ipv6_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV6;
 		ret = nf_ct_br_defrag6(skb, &bridge_state);
 		break;
 	default:
 		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
-		return NF_ACCEPT;
+		goto do_not_track;
 	}
 
-	if (ret != NF_ACCEPT)
-		return ret;
+	if (ret == NF_ACCEPT)
+		ret = nf_conntrack_in(skb, &bridge_state);
+
+do_not_track:
+	if (offset && ret == NF_ACCEPT)
+		skb_reset_network_header(skb);
 
-	return nf_conntrack_in(skb, &bridge_state);
+	return ret;
 }
 
 static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v19 nf-next 4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument
  2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
                   ` (2 preceding siblings ...)
  2026-02-24  6:53 ` [PATCH v19 nf-next 3/5] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2026-02-24  6:53 ` Eric Woudstra
  2026-02-25  1:52   ` [v19,nf-next,4/5] " Jakub Kicinski
  2026-02-24  6:53 ` [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra

Add specifying an offset when calling nft_set_pktinfo_ipv4/6_validate()
for cases where the ip(v6) header is not located at skb_network_header().

When an offset is specified other then zero, do not set pkt->tprot and
the corresponding pkt->flags to not change rule processing. It does make
the offsets in pktinfo available for code that is not checking pkt->flags
to use the offsets, like nft_flow_offload_eval().

Existing behaviour for a rule like "tcp dport 22 accept" is not changed
when, for instance, a PPPoE packet is being matched inside a bridge.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 include/net/netfilter/nf_tables_ipv4.h | 21 +++++++++++++--------
 include/net/netfilter/nf_tables_ipv6.h | 21 +++++++++++++--------
 net/netfilter/nft_chain_filter.c       |  8 ++++----
 3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/include/net/netfilter/nf_tables_ipv4.h b/include/net/netfilter/nf_tables_ipv4.h
index fcf967286e37..bd354937134f 100644
--- a/include/net/netfilter/nf_tables_ipv4.h
+++ b/include/net/netfilter/nf_tables_ipv4.h
@@ -16,12 +16,12 @@ static inline void nft_set_pktinfo_ipv4(struct nft_pktinfo *pkt)
 	pkt->fragoff = ntohs(ip->frag_off) & IP_OFFSET;
 }
 
-static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
+static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt, u32 nhoff)
 {
 	struct iphdr *iph, _iph;
 	u32 len, thoff, skb_len;
 
-	iph = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb),
+	iph = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
 				 sizeof(*iph), &_iph);
 	if (!iph)
 		return -1;
@@ -31,7 +31,7 @@ static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
 
 	len = iph_totlen(pkt->skb, iph);
 	thoff = iph->ihl * 4;
-	skb_len = pkt->skb->len - skb_network_offset(pkt->skb);
+	skb_len = pkt->skb->len - skb_network_offset(pkt->skb) - nhoff;
 
 	if (skb_len < len)
 		return -1;
@@ -40,17 +40,22 @@ static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
 	else if (thoff < sizeof(*iph))
 		return -1;
 
-	pkt->flags = NFT_PKTINFO_L4PROTO;
-	pkt->tprot = iph->protocol;
-	pkt->thoff = skb_network_offset(pkt->skb) + thoff;
+	if (!nhoff) {
+		pkt->flags = NFT_PKTINFO_L4PROTO;
+		pkt->tprot = iph->protocol;
+	} else {
+		pkt->flags = 0;
+		pkt->tprot = 0;
+	}
+	pkt->thoff = skb_network_offset(pkt->skb) + nhoff + thoff;
 	pkt->fragoff = ntohs(iph->frag_off) & IP_OFFSET;
 
 	return 0;
 }
 
-static inline void nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
+static inline void nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt, u32 nhoff)
 {
-	if (__nft_set_pktinfo_ipv4_validate(pkt) < 0)
+	if (__nft_set_pktinfo_ipv4_validate(pkt, nhoff) < 0)
 		nft_set_pktinfo_unspec(pkt);
 }
 
diff --git a/include/net/netfilter/nf_tables_ipv6.h b/include/net/netfilter/nf_tables_ipv6.h
index c53ac00bb974..1e84a891f268 100644
--- a/include/net/netfilter/nf_tables_ipv6.h
+++ b/include/net/netfilter/nf_tables_ipv6.h
@@ -24,17 +24,17 @@ static inline void nft_set_pktinfo_ipv6(struct nft_pktinfo *pkt)
 	pkt->fragoff = frag_off;
 }
 
-static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
+static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt, u32 nhoff)
 {
 #if IS_ENABLED(CONFIG_IPV6)
 	unsigned int flags = IP6_FH_F_AUTH;
 	struct ipv6hdr *ip6h, _ip6h;
-	unsigned int thoff = 0;
+	unsigned int thoff = nhoff;
 	unsigned short frag_off;
 	u32 pkt_len, skb_len;
 	int protohdr;
 
-	ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb),
+	ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
 				  sizeof(*ip6h), &_ip6h);
 	if (!ip6h)
 		return -1;
@@ -43,7 +43,7 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
 		return -1;
 
 	pkt_len = ipv6_payload_len(pkt->skb, ip6h);
-	skb_len = pkt->skb->len - skb_network_offset(pkt->skb);
+	skb_len = pkt->skb->len - skb_network_offset(pkt->skb) - nhoff;
 	if (pkt_len + sizeof(*ip6h) > skb_len)
 		return -1;
 
@@ -51,8 +51,13 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
 	if (protohdr < 0 || thoff > U16_MAX)
 		return -1;
 
-	pkt->flags = NFT_PKTINFO_L4PROTO;
-	pkt->tprot = protohdr;
+	if (!nhoff) {
+		pkt->flags = NFT_PKTINFO_L4PROTO;
+		pkt->tprot = protohdr;
+	} else {
+		pkt->flags = 0;
+		pkt->tprot = 0;
+	}
 	pkt->thoff = thoff;
 	pkt->fragoff = frag_off;
 
@@ -62,9 +67,9 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
 #endif
 }
 
-static inline void nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
+static inline void nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt, u32 nhoff)
 {
-	if (__nft_set_pktinfo_ipv6_validate(pkt) < 0)
+	if (__nft_set_pktinfo_ipv6_validate(pkt, nhoff) < 0)
 		nft_set_pktinfo_unspec(pkt);
 }
 
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index b16185e9a6dd..d4d5eadaba9c 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -238,10 +238,10 @@ nft_do_chain_bridge(void *priv,
 
 	switch (eth_hdr(skb)->h_proto) {
 	case htons(ETH_P_IP):
-		nft_set_pktinfo_ipv4_validate(&pkt);
+		nft_set_pktinfo_ipv4_validate(&pkt, 0);
 		break;
 	case htons(ETH_P_IPV6):
-		nft_set_pktinfo_ipv6_validate(&pkt);
+		nft_set_pktinfo_ipv6_validate(&pkt, 0);
 		break;
 	default:
 		nft_set_pktinfo_unspec(&pkt);
@@ -293,10 +293,10 @@ static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
 
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
-		nft_set_pktinfo_ipv4_validate(&pkt);
+		nft_set_pktinfo_ipv4_validate(&pkt, 0);
 		break;
 	case htons(ETH_P_IPV6):
-		nft_set_pktinfo_ipv6_validate(&pkt);
+		nft_set_pktinfo_ipv6_validate(&pkt, 0);
 		break;
 	default:
 		nft_set_pktinfo_unspec(&pkt);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
                   ` (3 preceding siblings ...)
  2026-02-24  6:53 ` [PATCH v19 nf-next 4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
@ 2026-02-24  6:53 ` Eric Woudstra
  2026-03-10  8:37   ` Eric Woudstra
  2026-03-11 10:07   ` Pablo Neira Ayuso
  4 siblings, 2 replies; 12+ messages in thread
From: Eric Woudstra @ 2026-02-24  6:53 UTC (permalink / raw)
  To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, Simon Horman, Nikolay Aleksandrov, Ido Schimmel
  Cc: netdev, netfilter-devel, bridge, Eric Woudstra

In nft_do_chain_bridge() pktinfo is only fully populated for plain packets
and packets encapsulated in single 802.1q or 802.1ad.

When implementing the software bridge-fastpath and testing all possible
encapulations, there can be more encapsulations:

The packet could (also) be encapsulated in PPPoE, or the packet could be
encapsulated in an inner 802.1q, combined with an outer 802.1ad or 802.1q
encapsulation.

nft_flow_offload_eval() also examines the L4 header, with the L4 protocol
known from the conntrack-tuplehash. To access the header it uses
nft_thoff(), but for these packets it returns zero.

Introduce nft_set_bridge_pktinfo() to help populate pktinfo with the
offsets.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/netfilter/nft_chain_filter.c | 55 +++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index d4d5eadaba9c..66ef30c60e56 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -227,21 +227,68 @@ static inline void nft_chain_filter_inet_fini(void) {}
 #endif /* CONFIG_NF_TABLES_IPV6 */
 
 #if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
+static int nft_set_bridge_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb,
+				  const struct nf_hook_state *state,
+				  __be16 *proto)
+{
+	nft_set_pktinfo(pkt, skb, state);
+
+	switch (*proto) {
+	case htons(ETH_P_PPP_SES): {
+		struct ppp_hdr {
+			struct pppoe_hdr hdr;
+			__be16 proto;
+		} *ph, _ph;
+
+		ph = skb_header_pointer(skb, 0, sizeof(_ph), &_ph);
+		if (!ph) {
+			*proto = 0;
+			return -1;
+		}
+		switch (ph->proto) {
+		case htons(PPP_IP):
+			*proto = htons(ETH_P_IP);
+			return PPPOE_SES_HLEN;
+		case htons(PPP_IPV6):
+			*proto = htons(ETH_P_IPV6);
+			return PPPOE_SES_HLEN;
+		}
+		break;
+	}
+	case htons(ETH_P_8021Q): {
+		struct vlan_hdr *vhdr, _vhdr;
+
+		vhdr = skb_header_pointer(skb, 0, sizeof(_vhdr), &_vhdr);
+		if (!vhdr) {
+			*proto = 0;
+			return -1;
+		}
+		*proto = vhdr->h_vlan_encapsulated_proto;
+		return VLAN_HLEN;
+	}
+	}
+	return 0;
+}
+
 static unsigned int
 nft_do_chain_bridge(void *priv,
 		    struct sk_buff *skb,
 		    const struct nf_hook_state *state)
 {
 	struct nft_pktinfo pkt;
+	__be16 proto;
+	int offset;
 
-	nft_set_pktinfo(&pkt, skb, state);
+	proto = eth_hdr(skb)->h_proto;
+
+	offset = nft_set_bridge_pktinfo(&pkt, skb, state, &proto);
 
-	switch (eth_hdr(skb)->h_proto) {
+	switch (proto) {
 	case htons(ETH_P_IP):
-		nft_set_pktinfo_ipv4_validate(&pkt, 0);
+		nft_set_pktinfo_ipv4_validate(&pkt, offset);
 		break;
 	case htons(ETH_P_IPV6):
-		nft_set_pktinfo_ipv6_validate(&pkt, 0);
+		nft_set_pktinfo_ipv6_validate(&pkt, offset);
 		break;
 	default:
 		nft_set_pktinfo_unspec(&pkt);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr
  2026-02-24  6:53 ` [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
@ 2026-02-24 14:15   ` Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2026-02-24 14:15 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Phil Sutter,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, netdev,
	netfilter-devel, bridge, Kees Cook

Eric Woudstra <ericwouds@gmail.com> wrote:
> Jakub Kicinski suggested following patch:
> 
> W=1 C=1 GCC build gives us:
> 
> net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
> ../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
> ../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
> 153:29: warning: array of flexible structures
> 
> It doesn't like that hdr has a zero-length array which overlaps proto.
> The kernel code doesn't currently need those arrays.
> 
> PPPoE connection is functional after applying this patch.

LGTM, can you send this one directly to netdev@vger.kernel.org?

Its not netfilter related so it feels wrong to apply it, even though I
agree with it :-)

No need to resend the rest of this series.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [v19,nf-next,3/5] netfilter: bridge: Add conntrack double vlan and pppoe
  2026-02-24  6:53 ` [PATCH v19 nf-next 3/5] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2026-02-25  1:52   ` Jakub Kicinski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2026-02-25  1:52 UTC (permalink / raw)
  To: ericwouds
  Cc: Jakub Kicinski, mostrows, netfilter-devel, pabeni, kees, edumazet,
	razor, horms, fw, pablo, andrew+netdev, idosch, netdev, davem,
	bridge, phil

This is an AI-generated review of your patch. The human sending this
email says "The tracking of feedback in the AI bot is often wrong,
but FWIW this is what it has to say about the submission"

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
netfilter: bridge: Add conntrack double vlan and pppoe

This patch adds connection tracking capabilities for PPPoE encapsulated
packets and for inner 802.1q combined with outer 802.1ad or 802.1q
encapsulation in bridge netfilter.

> diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
> index 58a33d0380b0..49e01083278c 100644
> --- a/net/bridge/netfilter/nf_conntrack_bridge.c
> +++ b/net/bridge/netfilter/nf_conntrack_bridge.c
> @@ -16,6 +16,7 @@
>  #include <net/netfilter/nf_conntrack_helper.h>
>  #include <net/netfilter/nf_conntrack_bridge.h>
>
> +#include <linux/ppp_defs.h>
>  #include <linux/netfilter_ipv4.h>
>
>  #include "../br_private.h"
> @@ -236,58 +237,116 @@ static int nf_ct_br_ipv6_check(const struct sk_buff *skb)
>  	return 0;
>  }
>
> +static int nf_ct_bridge_pre_inner(struct sk_buff *skb, __be16 *proto, u32 *len)
> +{
> +	switch (*proto) {
> +	case htons(ETH_P_PPP_SES): {
> +		struct ppp_hdr {
> +			struct pppoe_hdr hdr;
> +			__be16 proto;
> +		} *ph;
> +
> +		if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
> +			return -1;
> +		ph = (struct ppp_hdr *)(skb->data);
> +		switch (ph->proto) {
> +		case htons(PPP_IP):
> +			*proto = htons(ETH_P_IP);
> +			*len = ntohs(ph->hdr.length) - 2;
> +			skb_set_network_header(skb, PPPOE_SES_HLEN);
> +			return PPPOE_SES_HLEN;
> +		case htons(PPP_IPV6):
> +			*proto = htons(ETH_P_IPV6);
> +			*len = ntohs(ph->hdr.length) - 2;
> +			skb_set_network_header(skb, PPPOE_SES_HLEN);
> +			return PPPOE_SES_HLEN;
> +		}
> +		break;
> +	}

Is there validation on the PPPoE length field before using it? Florian
Westphal raised this concern during v14 review:

https://lore.kernel.org/netfilter-devel/aG2VDyHfVsp5L2zR@strlen.de/

The code reads ph->hdr.length and uses it directly to compute *len without
checking if the value is reasonable. While pskb_may_pull and pskb_trim_rcsum
provide bounds checking later in the code path, garbage or malicious
ph->hdr.length values could cause issues before those checks occur.

Should there be explicit validation of the PPPoE length field here to ensure
it's within acceptable bounds?

[ ... ]

> @@ -236,58 +237,116 @@ static int nf_ct_br_ipv6_check(const struct sk_buff *skb)
>
[ ... ]

>  static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
>  				     const struct nf_hook_state *state)
>  {
>  	struct nf_hook_state bridge_state = *state;
> +	int ret = NF_ACCEPT, offset = 0;
>  	enum ip_conntrack_info ctinfo;
> +	u32 len, pppoe_len = 0;
>  	struct nf_conn *ct;
> -	u32 len;
> -	int ret;
> +	__be16 proto;
>
>  	ct = nf_ct_get(skb, &ctinfo);
>  	if ((ct && !nf_ct_is_template(ct)) ||
>  	    ctinfo == IP_CT_UNTRACKED)
>  		return NF_ACCEPT;
>
> -	switch (skb->protocol) {
> -	case htons(ETH_P_IP):
> -		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
> +	proto = skb->protocol;
> +
> +	if (ct && nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)) !=
> +			NF_CT_DEFAULT_ZONE_ID) {
> +		offset = nf_ct_bridge_pre_inner(skb, &proto, &pppoe_len);
> +		if (offset < 0)
>  			return NF_ACCEPT;
> +	}
> +
> +	switch (proto) {
> +	case htons(ETH_P_IP):
> +		if (!pskb_may_pull(skb, offset + sizeof(struct iphdr)))
> +			goto do_not_track;
>
>  		len = skb_ip_totlen(skb);
> -		if (pskb_trim_rcsum(skb, len))
> -			return NF_ACCEPT;
> +		if (pppoe_len && pppoe_len != len)
> +			goto do_not_track;
> +		if (pskb_trim_rcsum(skb, offset + len))
> +			goto do_not_track;
>
>  		if (nf_ct_br_ip_check(skb))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		bridge_state.pf = NFPROTO_IPV4;
>  		ret = nf_ct_br_defrag4(skb, &bridge_state);
>  		break;
>  	case htons(ETH_P_IPV6):
> -		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
> -			return NF_ACCEPT;
> +		if (!pskb_may_pull(skb, offset + sizeof(struct ipv6hdr)))
> +			goto do_not_track;
>
>  		len = sizeof(struct ipv6hdr) + skb_ipv6_payload_len(skb);
> -		if (pskb_trim_rcsum(skb, len))
> -			return NF_ACCEPT;
> +		if (pppoe_len && pppoe_len != len)
> +			goto do_not_track;
> +		if (pskb_trim_rcsum(skb, offset + len))
> +			goto do_not_track;
>
>  		if (nf_ct_br_ipv6_check(skb))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		bridge_state.pf = NFPROTO_IPV6;
>  		ret = nf_ct_br_defrag6(skb, &bridge_state);
>  		break;
>  	default:
>  		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
> -		return NF_ACCEPT;
> +		goto do_not_track;
>  	}
>
> -	if (ret != NF_ACCEPT)
> -		return ret;
> +	if (ret == NF_ACCEPT)
> +		ret = nf_conntrack_in(skb, &bridge_state);
> +
> +do_not_track:
> +	if (offset && ret == NF_ACCEPT)
> +		skb_reset_network_header(skb);
>
> -	return nf_conntrack_in(skb, &bridge_state);
> +	return ret;
>  }

This patch introduces PPPoE and inner VLAN encapsulated packet processing,
but skb->protocol is never modified. When a PPPoE or VLAN-encapsulated
fragmented IP packet is reassembled by ip_defrag via nf_ct_br_defrag4 (or
nf_ct_br_defrag6 for IPv6), the br_skb_cb_restore call sets
BR_INPUT_SKB_CB(skb)->frag_max_size to a non-zero value.

Later in the packet path, nf_ct_bridge_post calls nf_ct_bridge_refrag:

net/bridge/netfilter/nf_conntrack_bridge.c:nf_ct_bridge_refrag() {
    ...
    if (BR_INPUT_SKB_CB(skb)->frag_max_size) {
        switch (skb->protocol) {
        case htons(ETH_P_IP):
            nf_br_ip_fragment(state->net, state->sk, skb, &data, output);
            break;
        case htons(ETH_P_IPV6):
            nf_br_ip6_fragment(state->net, state->sk, skb, &data, output);
            break;
        default:
            WARN_ON_ONCE(1);
            return NF_DROP;
        }
    }
}

Since skb->protocol is still ETH_P_PPP_SES or ETH_P_8021Q (not ETH_P_IP or
ETH_P_IPV6), nf_ct_bridge_refrag hits the default case for reassembled
packets. This triggers WARN_ON_ONCE(1) and returns NF_DROP, dropping the
packet and emitting a kernel warning.

Should skb->protocol be updated to reflect the inner protocol, or should
nf_ct_bridge_refrag be extended to handle these encapsulation types?
-- 
pw-bot: awaiting-upstream

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [v19,nf-next,4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument
  2026-02-24  6:53 ` [PATCH v19 nf-next 4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
@ 2026-02-25  1:52   ` Jakub Kicinski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2026-02-25  1:52 UTC (permalink / raw)
  To: ericwouds
  Cc: Jakub Kicinski, mostrows, netfilter-devel, pabeni, kees, edumazet,
	razor, horms, fw, pablo, andrew+netdev, idosch, netdev, davem,
	bridge, phil

This is an AI-generated review of your patch. The human sending this
email says "The tracking of feedback in the AI bot is often wrong,
but FWIW this is what it has to say about the submission"

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument

This commit adds the ability to specify an offset when calling
nft_set_pktinfo_ipv4/6_validate() for cases where the IP header is not
at skb_network_header().

> diff --git a/include/net/netfilter/nf_tables_ipv4.h b/include/net/netfilter/nf_tables_ipv4.h
> index fcf967286e37..bd354937134f 100644
> --- a/include/net/netfilter/nf_tables_ipv4.h
> +++ b/include/net/netfilter/nf_tables_ipv4.h

[ ... ]

> diff --git a/include/net/netfilter/nf_tables_ipv6.h b/include/net/netfilter/nf_tables_ipv6.h
> index c53ac00bb974..1e84a891f268 100644
> --- a/include/net/netfilter/nf_tables_ipv6.h
> +++ b/include/net/netfilter/nf_tables_ipv6.h
> @@ -24,17 +24,17 @@ static inline void nft_set_pktinfo_ipv6(struct nft_pktinfo *pkt)
>  	pkt->fragoff = frag_off;
>  }
>
> -static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
> +static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt, u32 nhoff)
>  {
>  #if IS_ENABLED(CONFIG_IPV6)
>  	unsigned int flags = IP6_FH_F_AUTH;
>  	struct ipv6hdr *ip6h, _ip6h;
> -	unsigned int thoff = 0;
> +	unsigned int thoff = nhoff;
                             ^^^^^

When nhoff != 0, does ipv6_find_hdr() interpret 'thoff' as a relative or
absolute offset?

Looking at the IPv6 header read above:

	ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
				  sizeof(*ip6h), &_ip6h);

The IPv6 header is at absolute offset 'skb_network_offset(pkt->skb) + nhoff'.

But when ipv6_find_hdr() is called:

	protohdr = ipv6_find_hdr(pkt->skb, &thoff, -1, &frag_off, &flags);

ipv6_find_hdr() interprets a non-zero *offset as an absolute offset from
skb->data (it does: ip6 = skb_header_pointer(skb, *offset, ...) and
start = *offset + sizeof(struct ipv6hdr)).

In bridge contexts where skb_network_offset != 0 (typically 14 for Ethernet),
ipv6_find_hdr() will read from the wrong location when nhoff != 0. Shouldn't
this be 'unsigned int thoff = skb_network_offset(pkt->skb) + nhoff;' to match
how the IPv6 header is read?

The IPv4 version correctly computes absolute offsets:

	pkt->thoff = skb_network_offset(pkt->skb) + nhoff + thoff;

All current callers pass nhoff=0, so this is dormant until a future patch
enables non-zero nhoff (which this commit is designed to support).

>  	unsigned short frag_off;
>  	u32 pkt_len, skb_len;
>  	int protohdr;
>
> -	ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb),
> +	ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
>  				  sizeof(*ip6h), &_ip6h);
>  	if (!ip6h)
>  		return -1;

[ ... ]

Florian Westphal raised a concern in v18 about a build warning in
include/uapi/linux/if_pppox.h:153:29 ('array of flexible structures').
He requested a new version to fix this. The author addressed this in v19 by
adding patch 1/5 'net: pppoe: avoid zero-length arrays in struct pppoe_hdr'
that wraps the flexible arrays in '#ifndef __KERNEL__' guards.

Reference: https://lore.kernel.org/netdev/aZynSuGMtK7JOOCj@strlen.de/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2026-02-24  6:53 ` [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
@ 2026-03-10  8:37   ` Eric Woudstra
  2026-03-10 12:39     ` Florian Westphal
  2026-03-11 10:07   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2026-03-10  8:37 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso
  Cc: Paolo Abeni, Jakub Kicinski, netdev, Nikolay Aleksandrov,
	Simon Horman, Eric Dumazet, Ido Schimmel, netfilter-devel, bridge,
	David S. Miller, Phil Sutter, Michal Ostrowski, Andrew Lunn



On 2/24/26 7:53 AM, Eric Woudstra wrote:
> In nft_do_chain_bridge() pktinfo is only fully populated for plain packets
> and packets encapsulated in single 802.1q or 802.1ad.
> 
> When implementing the software bridge-fastpath and testing all possible
> encapulations, there can be more encapsulations:
> 
> The packet could (also) be encapsulated in PPPoE, or the packet could be
> encapsulated in an inner 802.1q, combined with an outer 802.1ad or 802.1q
> encapsulation.
> 
> nft_flow_offload_eval() also examines the L4 header, with the L4 protocol
> known from the conntrack-tuplehash. To access the header it uses
> nft_thoff(), but for these packets it returns zero.
> 
> Introduce nft_set_bridge_pktinfo() to help populate pktinfo with the
> offsets.
> 
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
>  net/netfilter/nft_chain_filter.c | 55 +++++++++++++++++++++++++++++---
>  1 file changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
> index d4d5eadaba9c..66ef30c60e56 100644
> --- a/net/netfilter/nft_chain_filter.c
> +++ b/net/netfilter/nft_chain_filter.c
> @@ -227,21 +227,68 @@ static inline void nft_chain_filter_inet_fini(void) {}
>  #endif /* CONFIG_NF_TABLES_IPV6 */
>  
>  #if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
> +static int nft_set_bridge_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb,
> +				  const struct nf_hook_state *state,
> +				  __be16 *proto)
> +{
> +	nft_set_pktinfo(pkt, skb, state);
> +
> +	switch (*proto) {
> +	case htons(ETH_P_PPP_SES): {
> +		struct ppp_hdr {
> +			struct pppoe_hdr hdr;
> +			__be16 proto;
> +		} *ph, _ph;
> +
> +		ph = skb_header_pointer(skb, 0, sizeof(_ph), &_ph);
> +		if (!ph) {
> +			*proto = 0;
> +			return -1;
> +		}
> +		switch (ph->proto) {
> +		case htons(PPP_IP):
> +			*proto = htons(ETH_P_IP);
> +			return PPPOE_SES_HLEN;
> +		case htons(PPP_IPV6):
> +			*proto = htons(ETH_P_IPV6);
> +			return PPPOE_SES_HLEN;
> +		}
> +		break;
> +	}
> +	case htons(ETH_P_8021Q): {
> +		struct vlan_hdr *vhdr, _vhdr;
> +
> +		vhdr = skb_header_pointer(skb, 0, sizeof(_vhdr), &_vhdr);
> +		if (!vhdr) {
> +			*proto = 0;
> +			return -1;
> +		}
> +		*proto = vhdr->h_vlan_encapsulated_proto;
> +		return VLAN_HLEN;
> +	}
> +	}
> +	return 0;
> +}
> +
>  static unsigned int
>  nft_do_chain_bridge(void *priv,
>  		    struct sk_buff *skb,
>  		    const struct nf_hook_state *state)
>  {
>  	struct nft_pktinfo pkt;
> +	__be16 proto;
> +	int offset;
>  
> -	nft_set_pktinfo(&pkt, skb, state);
> +	proto = eth_hdr(skb)->h_proto;
> +
> +	offset = nft_set_bridge_pktinfo(&pkt, skb, state, &proto);
>  
> -	switch (eth_hdr(skb)->h_proto) {
> +	switch (proto) {
>  	case htons(ETH_P_IP):
> -		nft_set_pktinfo_ipv4_validate(&pkt, 0);
> +		nft_set_pktinfo_ipv4_validate(&pkt, offset);
>  		break;
>  	case htons(ETH_P_IPV6):
> -		nft_set_pktinfo_ipv6_validate(&pkt, 0);
> +		nft_set_pktinfo_ipv6_validate(&pkt, offset);
>  		break;
>  	default:
>  		nft_set_pktinfo_unspec(&pkt);

Just in case, I'm sending a kind and small reminder, before it is too
late in the cycle.

Regards,

Eric


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2026-03-10  8:37   ` Eric Woudstra
@ 2026-03-10 12:39     ` Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2026-03-10 12:39 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Paolo Abeni, Jakub Kicinski, netdev,
	Nikolay Aleksandrov, Simon Horman, Eric Dumazet, Ido Schimmel,
	netfilter-devel, bridge, David S. Miller, Phil Sutter,
	Michal Ostrowski, Andrew Lunn

Eric Woudstra <ericwouds@gmail.com> wrote:
> Just in case, I'm sending a kind and small reminder, before it is too
> late in the cycle.

I dropped this due to questions/feedback:
https://lore.kernel.org/netfilter-devel/20260225015256.967692-1-kuba@kernel.org/
https://lore.kernel.org/netfilter-devel/20260225015235.967500-1-kuba@kernel.org/

(You can ignore the bit wrt. 'net: pppoe: avoid zero-length arrays in
 struct pppoe_hdr', I am aware you sent a patch to net, so that part is
 covered).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2026-02-24  6:53 ` [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
  2026-03-10  8:37   ` Eric Woudstra
@ 2026-03-11 10:07   ` Pablo Neira Ayuso
  1 sibling, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2026-03-11 10:07 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Florian Westphal, Phil Sutter,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, netdev,
	netfilter-devel, bridge

Hi Eric,

On Tue, Feb 24, 2026 at 07:53:06AM +0100, Eric Woudstra wrote:
> In nft_do_chain_bridge() pktinfo is only fully populated for plain packets
> and packets encapsulated in single 802.1q or 802.1ad.
> 
> When implementing the software bridge-fastpath and testing all possible
> encapulations, there can be more encapsulations:
> 
> The packet could (also) be encapsulated in PPPoE, or the packet could be
> encapsulated in an inner 802.1q, combined with an outer 802.1ad or 802.1q
> encapsulation.
> 
> nft_flow_offload_eval() also examines the L4 header, with the L4 protocol
> known from the conntrack-tuplehash. To access the header it uses
> nft_thoff(), but for these packets it returns zero.
> 
> Introduce nft_set_bridge_pktinfo() to help populate pktinfo with the
> offsets.

I just posted a slightly different approach to deal with this which
also works for the netdev family. My understanding is that your
proposal has a strong dependency on the conntrack infrastructure, and
it would be good if stateless filtering on double-tagged vlan and
pppoe is also possible.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-11 10:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-24  6:53 [PATCH v19 nf-next 0/5] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2026-02-24  6:53 ` [PATCH v19 nf-next 1/5] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
2026-02-24 14:15   ` Florian Westphal
2026-02-24  6:53 ` [PATCH v19 nf-next 2/5] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
2026-02-24  6:53 ` [PATCH v19 nf-next 3/5] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
2026-02-25  1:52   ` [v19,nf-next,3/5] " Jakub Kicinski
2026-02-24  6:53 ` [PATCH v19 nf-next 4/5] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
2026-02-25  1:52   ` [v19,nf-next,4/5] " Jakub Kicinski
2026-02-24  6:53 ` [PATCH v19 nf-next 5/5] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
2026-03-10  8:37   ` Eric Woudstra
2026-03-10 12:39     ` Florian Westphal
2026-03-11 10:07   ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox