netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v14 nf-next 0/3] conntrack: bridge: add double vlan, pppoe and pppoe-in-q
@ 2025-07-08 15:12 Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 1/3] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Eric Woudstra @ 2025-07-08 15:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

Conntrack bridge only tracks untagged and 802.1q.

To make the bridge-fastpath experience more similar to the
forward-fastpath experience, add double vlan, pppoe and pppoe-in-q
tagged packets to bridge conntrack and to bridge filter chain.

Changes in v14:

- nf_checksum(_patial): Use DEBUG_NET_WARN_ON_ONCE(
   !skb_pointer_if_linear()) instead of pskb_may_pull().
- nft_do_chain_bridge: Added default case ph->proto is neither
   ipv4 nor ipv6.
- nft_do_chain_bridge: only reset network header when ret == NF_ACCEPT.

Changes in v13:

- Do not use pull/push before/after calling nf_conntrack_in() or
   nft_do_chain().
- Add patch to correct calculating checksum when skb->data !=
   skb_network_header(skb).

Changes in v12:

- Only allow tracking this traffic when a conntrack zone is set.
- nf_ct_bridge_pre(): skb pull/push without touching the checksum,
   because the pull is always restored with push.
- nft_do_chain_bridge(): handle the extra header similar to
   nf_ct_bridge_pre(), using pull/push.

Changes in v11:

- nft_do_chain_bridge(): Proper readout of encapsulated proto.
- nft_do_chain_bridge(): Use skb_set_network_header() instead of thoff.
- removed test script, it is now in separate patch.

v10 split from patch-set: bridge-fastpath and related improvements v9

Eric Woudstra (3):
  netfilter: utils: nf_checksum(_partial) correct data!=networkheader
  netfilter: bridge: Add conntrack double vlan and pppoe
  netfilter: nft_chain_filter: Add bridge double vlan and pppoe

 net/bridge/netfilter/nf_conntrack_bridge.c | 88 ++++++++++++++++++----
 net/netfilter/nft_chain_filter.c           | 59 ++++++++++++++-
 net/netfilter/utils.c                      | 20 +++--
 3 files changed, 144 insertions(+), 23 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v14 nf-next 1/3] netfilter: utils: nf_checksum(_partial) correct data!=networkheader
  2025-07-08 15:12 [PATCH v14 nf-next 0/3] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
@ 2025-07-08 15:12 ` Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Woudstra @ 2025-07-08 15:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

In the conntrack hook it may not always be the case that:
skb_network_header(skb) == skb->data.

This is problematic when L4 function nf_conntrack_handle_packet()
is accessing L3 data. This function uses thoff and ip_hdr()
to finds it's data. But it also calculates the checksum.
nf_checksum() and nf_checksum_partial() both use lower skb-checksum
functions that are based on using skb->data.

When skb_network_header(skb) != skb->data, adjust accordingly,
so that the checksum is calculated correctly.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/netfilter/utils.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
index 008419db815a..9ba822983bc0 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -124,16 +124,20 @@ __sum16 nf_checksum(struct sk_buff *skb, unsigned int hook,
 		    unsigned int dataoff, u8 protocol,
 		    unsigned short family)
 {
+	unsigned int nhpull = skb_network_header(skb) - skb->data;
 	__sum16 csum = 0;
 
+	DEBUG_NET_WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0));
+	__skb_pull(skb, nhpull);
 	switch (family) {
 	case AF_INET:
-		csum = nf_ip_checksum(skb, hook, dataoff, protocol);
+		csum = nf_ip_checksum(skb, hook, dataoff - nhpull, protocol);
 		break;
 	case AF_INET6:
-		csum = nf_ip6_checksum(skb, hook, dataoff, protocol);
+		csum = nf_ip6_checksum(skb, hook, dataoff - nhpull, protocol);
 		break;
 	}
+	__skb_push(skb, nhpull);
 
 	return csum;
 }
@@ -143,18 +147,22 @@ __sum16 nf_checksum_partial(struct sk_buff *skb, unsigned int hook,
 			    unsigned int dataoff, unsigned int len,
 			    u8 protocol, unsigned short family)
 {
+	unsigned int nhpull = skb_network_header(skb) - skb->data;
 	__sum16 csum = 0;
 
+	DEBUG_NET_WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0));
+	__skb_pull(skb, nhpull);
 	switch (family) {
 	case AF_INET:
-		csum = nf_ip_checksum_partial(skb, hook, dataoff, len,
-					      protocol);
+		csum = nf_ip_checksum_partial(skb, hook, dataoff - nhpull,
+					      len, protocol);
 		break;
 	case AF_INET6:
-		csum = nf_ip6_checksum_partial(skb, hook, dataoff, len,
-					       protocol);
+		csum = nf_ip6_checksum_partial(skb, hook, dataoff - nhpull,
+					       len, protocol);
 		break;
 	}
+	__skb_push(skb, nhpull);
 
 	return csum;
 }
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-07-08 15:12 [PATCH v14 nf-next 0/3] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 1/3] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
@ 2025-07-08 15:12 ` Eric Woudstra
  2025-07-08 22:00   ` Florian Westphal
  2025-07-08 15:12 ` [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
  2 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2025-07-08 15:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets that are passing a bridge, only when a conntrack zone is set.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/bridge/netfilter/nf_conntrack_bridge.c | 88 ++++++++++++++++++----
 1 file changed, 72 insertions(+), 16 deletions(-)

diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 6482de4d8750..ccfe1df5e92e 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -242,53 +242,109 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
 {
 	struct nf_hook_state bridge_state = *state;
 	enum ip_conntrack_info ctinfo;
+	u32 len, data_len = U32_MAX;
+	int ret, offset = 0;
 	struct nf_conn *ct;
-	u32 len;
-	int ret;
+	__be16 outer_proto;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if ((ct && !nf_ct_is_template(ct)) ||
 	    ctinfo == IP_CT_UNTRACKED)
 		return NF_ACCEPT;
 
+	if (ct && nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)) !=
+			NF_CT_DEFAULT_ZONE_ID) {
+		switch (skb->protocol) {
+		case htons(ETH_P_PPP_SES): {
+			struct ppp_hdr {
+				struct pppoe_hdr hdr;
+				__be16 proto;
+			} *ph;
+
+			offset = PPPOE_SES_HLEN;
+			if (!pskb_may_pull(skb, offset))
+				return NF_ACCEPT;
+			outer_proto = skb->protocol;
+			ph = (struct ppp_hdr *)(skb->data);
+			switch (ph->proto) {
+			case htons(PPP_IP):
+				skb->protocol = htons(ETH_P_IP);
+				break;
+			case htons(PPP_IPV6):
+				skb->protocol = htons(ETH_P_IPV6);
+				break;
+			default:
+				nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+				return NF_ACCEPT;
+			}
+			data_len = ntohs(ph->hdr.length) - 2;
+			skb_set_network_header(skb, offset);
+			break;
+		}
+		case htons(ETH_P_8021Q): {
+			struct vlan_hdr *vhdr;
+
+			offset = VLAN_HLEN;
+			if (!pskb_may_pull(skb, offset))
+				return NF_ACCEPT;
+			outer_proto = skb->protocol;
+			vhdr = (struct vlan_hdr *)(skb->data);
+			skb->protocol = vhdr->h_vlan_encapsulated_proto;
+			data_len = U32_MAX;
+			skb_set_network_header(skb, offset);
+			break;
+		}
+		}
+	}
+
+	ret = NF_ACCEPT;
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
-		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
-			return NF_ACCEPT;
+		if (!pskb_may_pull(skb, offset + sizeof(struct iphdr)))
+			goto do_not_track;
 
 		len = skb_ip_totlen(skb);
-		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+		if (data_len < len)
+			len = data_len;
+		if (pskb_trim_rcsum(skb, offset + len))
+			goto do_not_track;
 
 		if (nf_ct_br_ip_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV4;
 		ret = nf_ct_br_defrag4(skb, &bridge_state);
 		break;
 	case htons(ETH_P_IPV6):
-		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
-			return NF_ACCEPT;
+		if (!pskb_may_pull(skb, offset + sizeof(struct ipv6hdr)))
+			goto do_not_track;
 
 		len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
-		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+		if (data_len < len)
+			len = data_len;
+		if (pskb_trim_rcsum(skb, offset + len))
+			goto do_not_track;
 
 		if (nf_ct_br_ipv6_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV6;
 		ret = nf_ct_br_defrag6(skb, &bridge_state);
 		break;
 	default:
 		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
-		return NF_ACCEPT;
+		goto do_not_track;
 	}
 
-	if (ret != NF_ACCEPT)
-		return ret;
+	if (ret == NF_ACCEPT)
+		ret = nf_conntrack_in(skb, &bridge_state);
 
-	return nf_conntrack_in(skb, &bridge_state);
+do_not_track:
+	if (offset) {
+		skb_reset_network_header(skb);
+		skb->protocol = outer_proto;
+	}
+	return ret;
 }
 
 static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-08 15:12 [PATCH v14 nf-next 0/3] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 1/3] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
  2025-07-08 15:12 ` [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2025-07-08 15:12 ` Eric Woudstra
  2025-07-08 22:02   ` Florian Westphal
  2 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2025-07-08 15:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

This adds the capability to evaluate 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets in the bridge filter chain.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/netfilter/nft_chain_filter.c | 59 +++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 846d48ba8965..1d09015a64d6 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -232,11 +232,62 @@ nft_do_chain_bridge(void *priv,
 		    struct sk_buff *skb,
 		    const struct nf_hook_state *state)
 {
+	__be16 outer_proto, proto = 0;
 	struct nft_pktinfo pkt;
+	int ret, offset = 0;
 
 	nft_set_pktinfo(&pkt, skb, state);
 
 	switch (eth_hdr(skb)->h_proto) {
+	case htons(ETH_P_PPP_SES): {
+		struct ppp_hdr {
+			struct pppoe_hdr hdr;
+			__be16 proto;
+		} *ph;
+
+		if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
+			break;
+		ph = (struct ppp_hdr *)(skb->data);
+		switch (ph->proto) {
+		case htons(PPP_IP):
+			offset = PPPOE_SES_HLEN;
+			outer_proto = skb->protocol;
+			proto = htons(ETH_P_IP);
+			skb_set_network_header(skb, offset);
+			skb->protocol = proto;
+			break;
+		case htons(PPP_IPV6):
+			offset = PPPOE_SES_HLEN;
+			outer_proto = skb->protocol;
+			proto = htons(ETH_P_IPV6);
+			skb_set_network_header(skb, offset);
+			skb->protocol = proto;
+			break;
+		default:
+			proto = eth_hdr(skb)->h_proto;
+			break;
+		}
+		break;
+	}
+	case htons(ETH_P_8021Q): {
+		struct vlan_hdr *vhdr;
+
+		if (!pskb_may_pull(skb, VLAN_HLEN))
+			break;
+		vhdr = (struct vlan_hdr *)(skb->data);
+		offset = VLAN_HLEN;
+		outer_proto = skb->protocol;
+		proto = vhdr->h_vlan_encapsulated_proto;
+		skb_set_network_header(skb, offset);
+		skb->protocol = proto;
+		break;
+	}
+	default:
+		proto = eth_hdr(skb)->h_proto;
+		break;
+	}
+
+	switch (proto) {
 	case htons(ETH_P_IP):
 		nft_set_pktinfo_ipv4_validate(&pkt);
 		break;
@@ -248,7 +299,13 @@ nft_do_chain_bridge(void *priv,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	ret = nft_do_chain(&pkt, priv);
+
+	if (offset && ret == NF_ACCEPT) {
+		skb_reset_network_header(skb);
+		skb->protocol = outer_proto;
+	}
+	return ret;
 }
 
 static const struct nft_chain_type nft_chain_filter_bridge = {
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-07-08 15:12 ` [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2025-07-08 22:00   ` Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-07-08 22:00 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
> packets that are passing a bridge, only when a conntrack zone is set.
> 
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
>  net/bridge/netfilter/nf_conntrack_bridge.c | 88 ++++++++++++++++++----
>  1 file changed, 72 insertions(+), 16 deletions(-)
> 
> +			data_len = ntohs(ph->hdr.length) - 2;

Shouldn't there be some validation on data_len here?

> +		if (!pskb_may_pull(skb, offset + sizeof(struct iphdr)))
> +			goto do_not_track;
 
>  		len = skb_ip_totlen(skb);
> -		if (pskb_trim_rcsum(skb, len))
> -			return NF_ACCEPT;
> +		if (data_len < len)
> +			len = data_len;

Hmm.  So if ph->hdr.length is smaller than what ip header claims,
len shrinks.

If its higher, then the mismatch is ignored and we only use
the ip header length (i.e., the smaller value).

> +		if (pskb_trim_rcsum(skb, offset + len))
> +			goto do_not_track;

Is the intent that garbage data_len is caught here and

>  		if (nf_ct_br_ip_check(skb))
> -			return NF_ACCEPT;

here?  If so, maybe a comment could help.

> +		goto do_not_track;
>  	}
>  
> -	if (ret != NF_ACCEPT)
> -		return ret;
> +	if (ret == NF_ACCEPT)
> +		ret = nf_conntrack_in(skb, &bridge_state);
>  
> -	return nf_conntrack_in(skb, &bridge_state);
> +do_not_track:
> +	if (offset) {

if (ret == NF_ACCEPT && offset) { ...

Else skb could have been free'd. There should be test cases for this
functionality included.  If we lack test cases for the existing
functionality, which might be the case, please consider submitting
a reduced test case first so it can be applied regardless of the
remaining functionality.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-08 15:12 ` [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
@ 2025-07-08 22:02   ` Florian Westphal
  2025-07-11 12:55     ` Eric Woudstra
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2025-07-08 22:02 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> +		if (!pskb_may_pull(skb, VLAN_HLEN))
> +			break;
> +		vhdr = (struct vlan_hdr *)(skb->data);
> +		offset = VLAN_HLEN;
> +		outer_proto = skb->protocol;
> +		proto = vhdr->h_vlan_encapsulated_proto;
> +		skb_set_network_header(skb, offset);
> +		skb->protocol = proto;

Why is skb->protocol munged?  Also applies to the previous patch,
I forgot to ask.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-08 22:02   ` Florian Westphal
@ 2025-07-11 12:55     ` Eric Woudstra
  2025-07-11 14:14       ` Florian Westphal
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2025-07-11 12:55 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev


On 7/9/25 12:02 AM, Florian Westphal wrote:
> Eric Woudstra <ericwouds@gmail.com> wrote:
>> +		if (!pskb_may_pull(skb, VLAN_HLEN))
>> +			break;
>> +		vhdr = (struct vlan_hdr *)(skb->data);
>> +		offset = VLAN_HLEN;
>> +		outer_proto = skb->protocol;
>> +		proto = vhdr->h_vlan_encapsulated_proto;
>> +		skb_set_network_header(skb, offset);
>> +		skb->protocol = proto;
> 
> Why is skb->protocol munged?  Also applies to the previous patch,
> I forgot to ask.

In the previous patch in nf_ct_bridge_pre(), indeed, no need to munge
skb->protocol. So I'll change that.

But in nft_do_chain_bridge() it is needed in the case of matching 'ip
saddr', 'ip daddr', 'ip6 saddr' or 'ip6 daddr'. I suspect all ip/ip6
matches are suffering.

So still matching is something like:

tcp dport 8080 counter name "check"

But no match when:

ip saddr 192.168.1.1 tcp dport 8080 counter name "check"

After munging skb->protocol, I do get the match.

I haven't found where yet, but It seems nft is checking skb->protocol,
before it tries to match the ip(6) saddr/daddr.


And to answer a question in the other patch: this issue is found by
using my script bridge_fastpath.sh. It first checks the connection,
conntrack and nft-chain are functional in all testcases. So, it tests
the functionality of the patches in this patch-set. I want to improve
the script on a few more issues and then send a non-rfc.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-11 12:55     ` Eric Woudstra
@ 2025-07-11 14:14       ` Florian Westphal
  2025-07-12 10:08         ` Eric Woudstra
  2025-09-02  8:48         ` Eric Woudstra
  0 siblings, 2 replies; 12+ messages in thread
From: Florian Westphal @ 2025-07-11 14:14 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:

[ skb->protocl munging ]

> But in nft_do_chain_bridge() it is needed in the case of matching 'ip
> saddr', 'ip daddr', 'ip6 saddr' or 'ip6 daddr'. I suspect all ip/ip6
> matches are suffering.

Thats because of implicit dependency insertion on userspace side:
# ip saddr 1.2.3.4 counter ip daddr 3.4.5.6
bridge test-bridge input
  [ meta load protocol => reg 1 ]
  [ cmp eq reg 1 0x00000008 ]
  [ payload load 4b @ network header + 12 => reg 1 ]
  ...

So, if userspace would NOT do that it would 'just work'.

Pablo, whats your take on this?
We currently don't have a 'nhproto' field in nft_pktinfo
and there is no space to add one.

We could say that things work as expected, and that
 ip saddr 1.2.3.4

should not magically match packets in e.g. pppoe encap.
I suspect it will start to work if you force it to match in pppoe, e.g.
ether type 0x8864 ip saddr ...

so nft won't silently add the skb->protocol dependency.

Its not a technical issue but about how matching is supposed to work
in a bridge.

If its supposed to work automatically we need to either:
1. munge skb->protocol in kernel, even tough its wrong (we don't strip
   the l2 headers).
2. record the real l3 protocol somewhere and make it accessible, then
   fix the dependency generation in userspace to use the 'new way' (meta
   l3proto)?
3. change the dependency generation to something else.
   But what? 'ether type ip' won't work either for 8021ad etc.
   'ip version' can't be used for arp.

> I haven't found where yet, but It seems nft is checking skb->protocol,
> before it tries to match the ip(6) saddr/daddr.

Yes, userspace inserts this, see 'nft --debug=netlink add rule bridge ..'

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-11 14:14       ` Florian Westphal
@ 2025-07-12 10:08         ` Eric Woudstra
  2025-07-12 10:50           ` Florian Westphal
  2025-09-02  8:48         ` Eric Woudstra
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2025-07-12 10:08 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev



On 7/11/25 4:14 PM, Florian Westphal wrote:
> Eric Woudstra <ericwouds@gmail.com> wrote:
> 
> [ skb->protocl munging ]
> 
>> But in nft_do_chain_bridge() it is needed in the case of matching 'ip
>> saddr', 'ip daddr', 'ip6 saddr' or 'ip6 daddr'. I suspect all ip/ip6
>> matches are suffering.
> 
> Thats because of implicit dependency insertion on userspace side:
> # ip saddr 1.2.3.4 counter ip daddr 3.4.5.6
> bridge test-bridge input
>   [ meta load protocol => reg 1 ]
>   [ cmp eq reg 1 0x00000008 ]
>   [ payload load 4b @ network header + 12 => reg 1 ]
>   ...
> 
> So, if userspace would NOT do that it would 'just work'.
> 
> Pablo, whats your take on this?
> We currently don't have a 'nhproto' field in nft_pktinfo
> and there is no space to add one.
> 
> We could say that things work as expected, and that
>  ip saddr 1.2.3.4
> 
> should not magically match packets in e.g. pppoe encap.
> I suspect it will start to work if you force it to match in pppoe, e.g.
> ether type 0x8864 ip saddr ...
> 
> so nft won't silently add the skb->protocol dependency.
> 
> Its not a technical issue but about how matching is supposed to work
> in a bridge.
> 
> If its supposed to work automatically we need to either:
> 1. munge skb->protocol in kernel, even tough its wrong (we don't strip
>    the l2 headers).
> 2. record the real l3 protocol somewhere and make it accessible, then
>    fix the dependency generation in userspace to use the 'new way' (meta
>    l3proto)?
> 3. change the dependency generation to something else.
>    But what? 'ether type ip' won't work either for 8021ad etc.
>    'ip version' can't be used for arp.

Is using 'meta nfproto ipv4' instead an option? This looks at
pkt->state->pf, which holds the correct value, not at skb->protcol.

> 
>> I haven't found where yet, but It seems nft is checking skb->protocol,
>> before it tries to match the ip(6) saddr/daddr.
> 
> Yes, userspace inserts this, see 'nft --debug=netlink add rule bridge ..'


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-12 10:08         ` Eric Woudstra
@ 2025-07-12 10:50           ` Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-07-12 10:50 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> Is using 'meta nfproto ipv4' instead an option? This looks at
> pkt->state->pf, which holds the correct value, not at skb->protcol.

pf is NFPROTO_BRIDGE.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-07-11 14:14       ` Florian Westphal
  2025-07-12 10:08         ` Eric Woudstra
@ 2025-09-02  8:48         ` Eric Woudstra
  2025-09-02 13:18           ` Florian Westphal
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Woudstra @ 2025-09-02  8:48 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev



On 7/11/25 4:14 PM, Florian Westphal wrote:
> Eric Woudstra <ericwouds@gmail.com> wrote:
> 
> [ skb->protocl munging ]
> 
>> But in nft_do_chain_bridge() it is needed in the case of matching 'ip
>> saddr', 'ip daddr', 'ip6 saddr' or 'ip6 daddr'. I suspect all ip/ip6
>> matches are suffering.
> 
> Thats because of implicit dependency insertion on userspace side:
> # ip saddr 1.2.3.4 counter ip daddr 3.4.5.6
> bridge test-bridge input
>   [ meta load protocol => reg 1 ]
>   [ cmp eq reg 1 0x00000008 ]
>   [ payload load 4b @ network header + 12 => reg 1 ]
>   ...
> 
> So, if userspace would NOT do that it would 'just work'.
> 
> Pablo, whats your take on this?
> We currently don't have a 'nhproto' field in nft_pktinfo
> and there is no space to add one.
> 
> We could say that things work as expected, and that
>  ip saddr 1.2.3.4
> 
> should not magically match packets in e.g. pppoe encap.
> I suspect it will start to work if you force it to match in pppoe, e.g.
> ether type 0x8864 ip saddr ...
> 
> so nft won't silently add the skb->protocol dependency.
> 
> Its not a technical issue but about how matching is supposed to work
> in a bridge.
> 
> If its supposed to work automatically we need to either:
> 1. munge skb->protocol in kernel, even tough its wrong (we don't strip
>    the l2 headers).
> 2. record the real l3 protocol somewhere and make it accessible, then
>    fix the dependency generation in userspace to use the 'new way' (meta
>    l3proto)?
> 3. change the dependency generation to something else.
>    But what? 'ether type ip' won't work either for 8021ad etc.
>    'ip version' can't be used for arp.
> 

Hi Florian,

Did you get any information on how to handle this issue?

>> I haven't found where yet, but It seems nft is checking skb->protocol,
>> before it tries to match the ip(6) saddr/daddr.
> 
> Yes, userspace inserts this, see 'nft --debug=netlink add rule bridge ..'


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-09-02  8:48         ` Eric Woudstra
@ 2025-09-02 13:18           ` Florian Westphal
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-09-02 13:18 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> > Thats because of implicit dependency insertion on userspace side:
> > # ip saddr 1.2.3.4 counter ip daddr 3.4.5.6
> > bridge test-bridge input
> >   [ meta load protocol => reg 1 ]
> >   [ cmp eq reg 1 0x00000008 ]
> >   [ payload load 4b @ network header + 12 => reg 1 ]
> >   ...
> > 
> > So, if userspace would NOT do that it would 'just work'.
> > 
> > Pablo, whats your take on this?
> > We currently don't have a 'nhproto' field in nft_pktinfo
> > and there is no space to add one.
> > 
> > We could say that things work as expected, and that
> >  ip saddr 1.2.3.4
> > 
> > should not magically match packets in e.g. pppoe encap.

FTR, I think 'ip saddr 1.2.3.4' (standalone with no other info),
should NOT match inside a random l2 tunnel.

> > I suspect it will start to work if you force it to match in pppoe, e.g.
> > ether type 0x8864 ip saddr ...
> > 
> > so nft won't silently add the skb->protocol dependency.
> > 
> > Its not a technical issue but about how matching is supposed to work
> > in a bridge.
> > 
> > If its supposed to work automatically we need to either:
> > 1. munge skb->protocol in kernel, even tough its wrong (we don't strip
> >    the l2 headers).
> > 2. record the real l3 protocol somewhere and make it accessible, then
> >    fix the dependency generation in userspace to use the 'new way' (meta
> >    l3proto)?
> > 3. change the dependency generation to something else.
> >    But what? 'ether type ip' won't work either for 8021ad etc.
> >    'ip version' can't be used for arp.
> > 
> 
> Hi Florian,
> 
> Did you get any information on how to handle this issue?

Did you check if you can get it to match if you add the relevant
l3 dependency in the rule?

I don't think we should (or can) change how the rules get evaluated by
making 'ip saddr' match on other l2 tunnel protocols by default.

It is even incompatible with any exiting rulesets, consider e.g.
"ip daddr 1.2.3.4 drop" on a bridge, now this address becomes
unreachable but it works before your patch (if the address is found in
e.g. pppoe header).

'ip/ip6' should work as expected as long as userspace provides
the correct ether type and dependencies.

I.e., what this patch adds as C code should work if being provided
as part of the rule.

What might make sense is to add the ppp(oe) header to src/proto.c
in nftables so users that want to match the header following ppp
one don't have to use raw payload match syntax.

What might also make sense is to either add a way to force a call
to nft_set_pktinfo_ipv4_validate() from the ruleset, or take your
patch but WITHOUT the skb->protocol munging.

However, due to the number of possible l2 header chain combinations
I'm not sure we should bother with trying to add all of them.

I worry we would end up turning nft_do_chain_bridge() preamble or
nft_set_pktinfo() into some kind of l2 packet dissector.

Maybe one way forward is to introduce

	NFT_META_BRI_INET_VALIDATE

nft add rule ... meta inet validate ...
(just an idea, come up with better names...)

We'd have to add NFT_PKTINFO_L3PROTO flag to
include/net/netfilter/nf_tables.h.
(or, alternatively NFT_PKTINFO_UNSPEC).

Then, set this flag in struct nft_pktinfo, from
nft_set_pktinfo_ipv4|6_validate (or from nft_set_pktinfo_unspec).

NFT_META_BRI_INET_VALIDATE, would call nft_set_pktinfo_ipv4_validate
or nft_set_pktinfo_ipv6_validate depending on iph->version and set
NFT_BREAK verdict if the flag is still absent.

**USERSPACE IS RESPONSIBLE** to prevent arp packets from entering
this expression. If they do, then header validation should fail
but there would be an off-chance that the garbage is also a valid
ipv4 or ipv6 packet.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-09-02 13:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-08 15:12 [PATCH v14 nf-next 0/3] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2025-07-08 15:12 ` [PATCH v14 nf-next 1/3] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
2025-07-08 15:12 ` [PATCH v14 nf-next 2/3] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
2025-07-08 22:00   ` Florian Westphal
2025-07-08 15:12 ` [PATCH v14 nf-next 3/3] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
2025-07-08 22:02   ` Florian Westphal
2025-07-11 12:55     ` Eric Woudstra
2025-07-11 14:14       ` Florian Westphal
2025-07-12 10:08         ` Eric Woudstra
2025-07-12 10:50           ` Florian Westphal
2025-09-02  8:48         ` Eric Woudstra
2025-09-02 13:18           ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).