netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v12 nf-next 0/2] conntrack: bridge: add double vlan, pppoe and pppoe-in-q
@ 2025-06-17  6:58 Eric Woudstra
  2025-06-17  6:58 ` [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
  2025-06-17  6:58 ` [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Woudstra @ 2025-06-17  6:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

Conntrack bridge only tracks untagged and 802.1q.

To make the bridge-fastpath experience more similar to the
forward-fastpath experience, add double vlan, pppoe and pppoe-in-q
tagged packets to bridge conntrack and to bridge filter chain.

Changes in v12:

- Only allow tracking this traffic when a conntrack zone is set.
- nf_ct_bridge_pre(): skb pull/push without touching the checksum,
   because the pull is always restored with push.
- nft_do_chain_bridge(): handle the extra header similar to
   nf_ct_bridge_pre(), using pull/push.

Changes in v11:

- nft_do_chain_bridge(): Proper readout of encapsulated proto.
- nft_do_chain_bridge(): Use skb_set_network_header() instead of thoff.
- removed test script, it is now in separate patch.

v10 split from patch-set: bridge-fastpath and related improvements v9

Eric Woudstra (2):
  netfilter: bridge: Add conntrack double vlan and pppoe
  netfilter: nft_chain_filter: Add bridge double vlan and pppoe

 net/bridge/netfilter/nf_conntrack_bridge.c | 83 ++++++++++++++++++----
 net/netfilter/nft_chain_filter.c           | 55 +++++++++++++-
 2 files changed, 125 insertions(+), 13 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-06-17  6:58 [PATCH v12 nf-next 0/2] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
@ 2025-06-17  6:58 ` Eric Woudstra
  2025-06-22 20:16   ` Florian Westphal
  2025-06-17  6:58 ` [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Woudstra @ 2025-06-17  6:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets that are passing a bridge, only when a conntrack zone is set.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/bridge/netfilter/nf_conntrack_bridge.c | 83 ++++++++++++++++++----
 1 file changed, 71 insertions(+), 12 deletions(-)

diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 6482de4d8750..6a17bd81038e 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -242,53 +242,112 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
 {
 	struct nf_hook_state bridge_state = *state;
 	enum ip_conntrack_info ctinfo;
+	u32 len, data_len = U32_MAX;
+	int ret, offset = 0;
 	struct nf_conn *ct;
-	u32 len;
-	int ret;
+	__be16 outer_proto;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if ((ct && !nf_ct_is_template(ct)) ||
 	    ctinfo == IP_CT_UNTRACKED)
 		return NF_ACCEPT;
 
+	if (ct && nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)) !=
+			NF_CT_DEFAULT_ZONE_ID) {
+		switch (skb->protocol) {
+		case htons(ETH_P_PPP_SES): {
+			struct ppp_hdr {
+				struct pppoe_hdr hdr;
+				__be16 proto;
+			} *ph;
+
+			offset = PPPOE_SES_HLEN;
+			if (!pskb_may_pull(skb, offset))
+				return NF_ACCEPT;
+			outer_proto = skb->protocol;
+			ph = (struct ppp_hdr *)(skb->data);
+			switch (ph->proto) {
+			case htons(PPP_IP):
+				skb->protocol = htons(ETH_P_IP);
+				break;
+			case htons(PPP_IPV6):
+				skb->protocol = htons(ETH_P_IPV6);
+				break;
+			default:
+				nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+				return NF_ACCEPT;
+			}
+			data_len = ntohs(ph->hdr.length) - 2;
+			__skb_pull(skb, offset);
+			skb_reset_network_header(skb);
+			break;
+		}
+		case htons(ETH_P_8021Q): {
+			struct vlan_hdr *vhdr;
+
+			offset = VLAN_HLEN;
+			if (!pskb_may_pull(skb, offset))
+				return NF_ACCEPT;
+			outer_proto = skb->protocol;
+			vhdr = (struct vlan_hdr *)(skb->data);
+			skb->protocol = vhdr->h_vlan_encapsulated_proto;
+			data_len = U32_MAX;
+			__skb_pull(skb, offset);
+			skb_reset_network_header(skb);
+			break;
+		}
+		}
+	}
+
+	ret = NF_ACCEPT;
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		len = skb_ip_totlen(skb);
+		if (data_len < len)
+			len = data_len;
 		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		if (nf_ct_br_ip_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV4;
 		ret = nf_ct_br_defrag4(skb, &bridge_state);
 		break;
 	case htons(ETH_P_IPV6):
 		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
+		if (data_len < len)
+			len = data_len;
 		if (pskb_trim_rcsum(skb, len))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		if (nf_ct_br_ipv6_check(skb))
-			return NF_ACCEPT;
+			goto do_not_track;
 
 		bridge_state.pf = NFPROTO_IPV6;
 		ret = nf_ct_br_defrag6(skb, &bridge_state);
 		break;
 	default:
 		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
-		return NF_ACCEPT;
+		goto do_not_track;
 	}
 
-	if (ret != NF_ACCEPT)
-		return ret;
+	if (ret == NF_ACCEPT)
+		ret = nf_conntrack_in(skb, &bridge_state);
 
-	return nf_conntrack_in(skb, &bridge_state);
+do_not_track:
+	if (offset) {
+		__skb_push(skb, offset);
+		skb_reset_network_header(skb);
+		skb->protocol = outer_proto;
+	}
+	return ret;
 }
 
 static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-06-17  6:58 [PATCH v12 nf-next 0/2] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
  2025-06-17  6:58 ` [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2025-06-17  6:58 ` Eric Woudstra
  2025-06-22 20:40   ` Florian Westphal
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Woudstra @ 2025-06-17  6:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netfilter-devel, bridge, netdev, Eric Woudstra

This adds the capability to evaluate 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets in the bridge filter chain.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
 net/netfilter/nft_chain_filter.c | 55 +++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 19a553550c76..b9ab1916be94 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -232,11 +232,57 @@ nft_do_chain_bridge(void *priv,
 		    struct sk_buff *skb,
 		    const struct nf_hook_state *state)
 {
+	__be16 outer_proto, proto = 0;
 	struct nft_pktinfo pkt;
+	int ret, offset = 0;
 
 	nft_set_pktinfo(&pkt, skb, state);
 
 	switch (eth_hdr(skb)->h_proto) {
+	case htons(ETH_P_PPP_SES): {
+		struct ppp_hdr {
+			struct pppoe_hdr hdr;
+			__be16 proto;
+		} *ph;
+
+		if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
+			break;
+		offset = PPPOE_SES_HLEN;
+		outer_proto = skb->protocol;
+		ph = (struct ppp_hdr *)(skb->data);
+		__skb_pull(skb, offset);
+		skb_reset_network_header(skb);
+		switch (ph->proto) {
+		case htons(PPP_IP):
+			proto = htons(ETH_P_IP);
+			break;
+		case htons(PPP_IPV6):
+			proto = htons(ETH_P_IPV6);
+			break;
+		}
+		skb->protocol = proto;
+		break;
+	}
+	case htons(ETH_P_8021Q): {
+		struct vlan_hdr *vhdr;
+
+		if (!pskb_may_pull(skb, VLAN_HLEN))
+			break;
+		offset = VLAN_HLEN;
+		outer_proto = skb->protocol;
+		vhdr = (struct vlan_hdr *)(skb->data);
+		__skb_pull(skb, offset);
+		skb_reset_network_header(skb);
+		proto = vhdr->h_vlan_encapsulated_proto;
+		skb->protocol = proto;
+		break;
+	}
+	default:
+		proto = eth_hdr(skb)->h_proto;
+		break;
+	}
+
+	switch (proto) {
 	case htons(ETH_P_IP):
 		nft_set_pktinfo_ipv4_validate(&pkt);
 		break;
@@ -248,7 +294,14 @@ nft_do_chain_bridge(void *priv,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	ret = nft_do_chain(&pkt, priv);
+
+	if (offset) {
+		__skb_push(skb, offset);
+		skb_reset_network_header(skb);
+		skb->protocol = outer_proto;
+	}
+	return ret;
 }
 
 static const struct nft_chain_type nft_chain_filter_bridge = {
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-06-17  6:58 ` [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2025-06-22 20:16   ` Florian Westphal
  2025-06-28 13:27     ` Eric Woudstra
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2025-06-22 20:16 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> -	if (ret != NF_ACCEPT)
> -		return ret;
> +	if (ret == NF_ACCEPT)
> +		ret = nf_conntrack_in(skb, &bridge_state);
>  
> -	return nf_conntrack_in(skb, &bridge_state);
> +do_not_track:
> +	if (offset) {
> +		__skb_push(skb, offset);

nf_conntrack_in() can free the skb, or steal it.

But aside from this, I'm not sure this is a good idea to begin with,
it feels like we start to reimplement br_netfilter.c .

Perhaps it would be better to not push/pull but instead rename

unsigned int
nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)

 to

unsigned int 
nf_conntrack_inner(struct sk_buff *skb, const struct nf_hook_state *state,
		   unsigned int nhoff)

and add

unsigned int 
nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
{
	return nf_conntrack_inner(skb, state, skb_network_offset(skb));
}

Or, alternatively, add
struct nf_ct_pktoffs {
	u16 nhoff;
	u16 thoff;
};

then populate that from nf_ct_bridge_pre(), then pass that to
nf_conntrack_inner() (all names are suggestions, if you find something
better thats fine).

Its going to be more complicated than this, but my point is that e.g.
nf_ct_get_tuple() already gets the l4 offset, so why not pass l3
offset too?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-06-17  6:58 ` [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
@ 2025-06-22 20:40   ` Florian Westphal
  2025-06-24 10:09     ` Eric Woudstra
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2025-06-22 20:40 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> -	return nft_do_chain(&pkt, priv);
> +	ret = nft_do_chain(&pkt, priv);
> +
> +	if (offset) {
> +		__skb_push(skb, offset);
> +		skb_reset_network_header(skb);
> +		skb->protocol = outer_proto;
> +	}

I don't think its a good idea to do this.

nft_do_chain() can mangle packet in arbitrary ways,
including making a duplicate, sending icmp/tcp resets in response
to packet. forwarding the packet to another interface, dropping
the packet, etc.

Wouldn't it be enough to set the skb network header if its not
set yet, without pull (and a need to push later)?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
  2025-06-22 20:40   ` Florian Westphal
@ 2025-06-24 10:09     ` Eric Woudstra
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Woudstra @ 2025-06-24 10:09 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev



On 6/22/25 10:40 PM, Florian Westphal wrote:
> Eric Woudstra <ericwouds@gmail.com> wrote:
>> -	return nft_do_chain(&pkt, priv);
>> +	ret = nft_do_chain(&pkt, priv);
>> +
>> +	if (offset) {
>> +		__skb_push(skb, offset);
>> +		skb_reset_network_header(skb);
>> +		skb->protocol = outer_proto;
>> +	}
> 
> I don't think its a good idea to do this.
> 
> nft_do_chain() can mangle packet in arbitrary ways,
> including making a duplicate, sending icmp/tcp resets in response
> to packet. forwarding the packet to another interface, dropping
> the packet, etc.
> 
> Wouldn't it be enough to set the skb network header if its not
> set yet, without pull (and a need to push later)?

If I replace the pull + skb_reset_network_header with
skb_set_network_header and remove the push, this also works.
I'll change it in the next version of this patch.

However, if I do the same in nf_ct_bridge_pre() (the other patch in this
patch-set), then packets get dropped. I'll need to look into that furter.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-06-22 20:16   ` Florian Westphal
@ 2025-06-28 13:27     ` Eric Woudstra
  2025-06-28 14:21       ` Eric Woudstra
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Woudstra @ 2025-06-28 13:27 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev



On 6/22/25 10:16 PM, Florian Westphal wrote:
> Eric Woudstra <ericwouds@gmail.com> wrote:
>> -	if (ret != NF_ACCEPT)
>> -		return ret;
>> +	if (ret == NF_ACCEPT)
>> +		ret = nf_conntrack_in(skb, &bridge_state);
>>  
>> -	return nf_conntrack_in(skb, &bridge_state);
>> +do_not_track:
>> +	if (offset) {
>> +		__skb_push(skb, offset);
> 
> nf_conntrack_in() can free the skb, or steal it.
> 
> But aside from this, I'm not sure this is a good idea to begin with,
> it feels like we start to reimplement br_netfilter.c .
> 
> Perhaps it would be better to not push/pull but instead rename
> 
> unsigned int
> nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
> 
>  to
> 
> unsigned int 
> nf_conntrack_inner(struct sk_buff *skb, const struct nf_hook_state *state,
> 		   unsigned int nhoff)
> 
> and add
> 
> unsigned int 
> nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
> {
> 	return nf_conntrack_inner(skb, state, skb_network_offset(skb));
> }
> 
> Or, alternatively, add
> struct nf_ct_pktoffs {
> 	u16 nhoff;
> 	u16 thoff;
> };
> 
> then populate that from nf_ct_bridge_pre(), then pass that to
> nf_conntrack_inner() (all names are suggestions, if you find something
> better thats fine).
> 
> Its going to be more complicated than this, but my point is that e.g.
> nf_ct_get_tuple() already gets the l4 offset, so why not pass l3
> offset too?

So I've tried nf_conntrack_inner(). The thing is:

>  	switch (skb->protocol) {
>  	case htons(ETH_P_IP):
>  		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		len = skb_ip_totlen(skb);
> +		if (data_len < len)
> +			len = data_len;
>  		if (pskb_trim_rcsum(skb, len))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		if (nf_ct_br_ip_check(skb))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		bridge_state.pf = NFPROTO_IPV4;
>  		ret = nf_ct_br_defrag4(skb, &bridge_state);
>  		break;
>  	case htons(ETH_P_IPV6):
>  		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> +		if (data_len < len)
> +			len = data_len;
>  		if (pskb_trim_rcsum(skb, len))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		if (nf_ct_br_ipv6_check(skb))
> -			return NF_ACCEPT;
> +			goto do_not_track;
>
>  		bridge_state.pf = NFPROTO_IPV6;
>  		ret = nf_ct_br_defrag6(skb, &bridge_state);
>  		break;

This part all use ip_hdr(skb) and ipv6_hdr(skb). I could add offset to
skb->network_header temporarily for this part of the code. Do you think
that is okay?

Adding offset to skb->network_header during the call to
nf_conntrack_in() does not work, but, as you mentioned, adding the
offset through the nf_conntrack_inner() function, that does work. Except
for 1 piece of code, I found so far:

nf_checksum() reports an error when it is called from
nf_conntrack_tcp_packet(). It also uses ip_hdr(skb) and ipv6_hdr(skb).
Strangely, It only gives the error when dealing with a pppoe packet or
pppoe-in-q packet. There is no error when q-in-q (double q) or 802.1ad
are involved.

Do you have any suggestion how you want to handle this failure in
nf_checksum()?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-06-28 13:27     ` Eric Woudstra
@ 2025-06-28 14:21       ` Eric Woudstra
  2025-07-01 11:36         ` Florian Westphal
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Woudstra @ 2025-06-28 14:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev



On 6/28/25 3:27 PM, Eric Woudstra wrote:
> 
> 
> On 6/22/25 10:16 PM, Florian Westphal wrote:
>> Eric Woudstra <ericwouds@gmail.com> wrote:
>>> -	if (ret != NF_ACCEPT)
>>> -		return ret;
>>> +	if (ret == NF_ACCEPT)
>>> +		ret = nf_conntrack_in(skb, &bridge_state);
>>>  
>>> -	return nf_conntrack_in(skb, &bridge_state);
>>> +do_not_track:
>>> +	if (offset) {
>>> +		__skb_push(skb, offset);
>>
>> nf_conntrack_in() can free the skb, or steal it.
>>
>> But aside from this, I'm not sure this is a good idea to begin with,
>> it feels like we start to reimplement br_netfilter.c .
>>
>> Perhaps it would be better to not push/pull but instead rename
>>
>> unsigned int
>> nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
>>
>>  to
>>
>> unsigned int 
>> nf_conntrack_inner(struct sk_buff *skb, const struct nf_hook_state *state,
>> 		   unsigned int nhoff)
>>
>> and add
>>
>> unsigned int 
>> nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
>> {
>> 	return nf_conntrack_inner(skb, state, skb_network_offset(skb));
>> }
>>
>> Or, alternatively, add
>> struct nf_ct_pktoffs {
>> 	u16 nhoff;
>> 	u16 thoff;
>> };
>>
>> then populate that from nf_ct_bridge_pre(), then pass that to
>> nf_conntrack_inner() (all names are suggestions, if you find something
>> better thats fine).
>>
>> Its going to be more complicated than this, but my point is that e.g.
>> nf_ct_get_tuple() already gets the l4 offset, so why not pass l3
>> offset too?
> 
> So I've tried nf_conntrack_inner(). The thing is:
> 
>>  	switch (skb->protocol) {
>>  	case htons(ETH_P_IP):
>>  		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		len = skb_ip_totlen(skb);
>> +		if (data_len < len)
>> +			len = data_len;
>>  		if (pskb_trim_rcsum(skb, len))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		if (nf_ct_br_ip_check(skb))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		bridge_state.pf = NFPROTO_IPV4;
>>  		ret = nf_ct_br_defrag4(skb, &bridge_state);
>>  		break;
>>  	case htons(ETH_P_IPV6):
>>  		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
>> +		if (data_len < len)
>> +			len = data_len;
>>  		if (pskb_trim_rcsum(skb, len))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		if (nf_ct_br_ipv6_check(skb))
>> -			return NF_ACCEPT;
>> +			goto do_not_track;
>>
>>  		bridge_state.pf = NFPROTO_IPV6;
>>  		ret = nf_ct_br_defrag6(skb, &bridge_state);
>>  		break;
> 
> This part all use ip_hdr(skb) and ipv6_hdr(skb). I could add offset to
> skb->network_header temporarily for this part of the code. Do you think
> that is okay?
> 
> Adding offset to skb->network_header during the call to
> nf_conntrack_in() does not work, but, as you mentioned, adding the
> offset through the nf_conntrack_inner() function, that does work. Except
> for 1 piece of code, I found so far:

A small correction, Adding offset to skb->network_header during to call
to nf_conntrack_in() also works. Then skb->network_header can be
restored after this call and nf_conntrack_inner() is not needed.

> 
> nf_checksum() reports an error when it is called from
> nf_conntrack_tcp_packet(). It also uses ip_hdr(skb) and ipv6_hdr(skb).
> Strangely, It only gives the error when dealing with a pppoe packet or
> pppoe-in-q packet. There is no error when q-in-q (double q) or 802.1ad
> are involved.
> 
> Do you have any suggestion how you want to handle this failure in
> nf_checksum()?
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe
  2025-06-28 14:21       ` Eric Woudstra
@ 2025-07-01 11:36         ` Florian Westphal
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2025-07-01 11:36 UTC (permalink / raw)
  To: Eric Woudstra
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Nikolay Aleksandrov,
	Ido Schimmel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, bridge, netdev

Eric Woudstra <ericwouds@gmail.com> wrote:
> > Adding offset to skb->network_header during the call to
> > nf_conntrack_in() does not work, but, as you mentioned, adding the
> > offset through the nf_conntrack_inner() function, that does work. Except
> > for 1 piece of code, I found so far:
> 
> A small correction, Adding offset to skb->network_header during to call
> to nf_conntrack_in() also works. Then skb->network_header can be
> restored after this call and nf_conntrack_inner() is not needed.

Good, thats even better.

> > nf_checksum() reports an error when it is called from
> > nf_conntrack_tcp_packet(). It also uses ip_hdr(skb) and ipv6_hdr(skb).
> > Strangely, It only gives the error when dealing with a pppoe packet or
> > pppoe-in-q packet. There is no error when q-in-q (double q) or 802.1ad
> > are involved.
> > 
> > Do you have any suggestion how you want to handle this failure in
> > nf_checksum()?

I suspect nf_checksum() assumes skb->data points to network header.
Several places in netfilter assume this, which is the reason for all the
skb pull/push kludges in br_netfilter_hooks.c :-/

git grep -- 'skb->data' net/netfilter net/*/netfilter | wc -l
66

(not all of those are going to be an issue, such as ipvs).

Some callers do this:
if (nf_ip_checksum(skb, hooknum, hdrlen, IPPROTO_ICMP))

where hdrlen is the size of the ipv4 header.

That won't do the right thing when skb->data isn't identical to the
start of the ipv4 header.

Others do this:
 if (nf_ip_checksum(skb, nft_hook(pkt), thoff, IPPROTO_TCP)) {

... where thoff is set via nft_set_pktinfo_ipv4(), so it *might*
be correct if nft_do_chain_bridge() is updated to follow l2 encap
trail (switch nft_do_chain_bridge() to use the flow dissector?).

but in some places thoff comes from this:
        thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), &proto, &fo);

... which should have the right offset regardless of skb->data is.

So AFAICS the initial step has to be to go through conntrack (and all
conntrack helpers) and get rid of all 'skb->data is l3 header' assumptions.


Then repeat for nat engine, then for nf_tables, then for helpers such as
the nf checksum functions.

IPVS, ipset and xtables can be left as-is AFAICS as they will only see
packets coming from ip stack.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-01 11:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-17  6:58 [PATCH v12 nf-next 0/2] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2025-06-17  6:58 ` [PATCH v12 nf-next 1/2] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
2025-06-22 20:16   ` Florian Westphal
2025-06-28 13:27     ` Eric Woudstra
2025-06-28 14:21       ` Eric Woudstra
2025-07-01 11:36         ` Florian Westphal
2025-06-17  6:58 ` [PATCH v12 nf-next 2/2] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
2025-06-22 20:40   ` Florian Westphal
2025-06-24 10:09     ` Eric Woudstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).