* [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q
@ 2026-02-22 19:58 Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 1/4] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Eric Woudstra @ 2026-02-22 19:58 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Nikolay Aleksandrov, Ido Schimmel
Cc: netfilter-devel, netdev, bridge, Eric Woudstra
Conntrack bridge only tracks untagged and 802.1q.
To make the bridge-fastpath experience more similar to the
forward-fastpath experience, introduce patches for double vlan,
pppoe and pppoe-in-q tagged packets to bridge conntrack and to
bridge filter chain.
Changes in v18:
- Rebased
- nf_conntrack_bridge: added #include <linux/ppp_defs.h>
- nf_checksum(_partial)(): changed WARN_ON to WARN_ON_ONCE.
- nft_set_bridge_pktinfo(): changed call to pskb_may_pull() to
skb_header_pointer().
Changes in v17:
- Add patch for nft_set_pktinfo_ipv4/6_validate() adding nhoff argument.
- Stopped using skb_set_network_header() in nft_set_bridge_pktinfo,
using the new offset for nft_set_pktinfo_ipv4/6_validate instead.
- When pskb_may_pull() fails in nft_set_bridge_pktinfo() set proto to 0,
resulting in pktinfo unspecified.
Changes in v16:
- Changed nft_chain_filter patch: Only help populating pktinfo offsets,
call nft_do_chain() with original network_offset.
- Changed commit messages.
- Removed kernel-doc comments.
Changes in v15:
- Do not munge skb->protocol.
- Introduce nft_set_bridge_pktinfo() helper.
- Introduce nf_ct_bridge_pre_inner() helper.
- nf_ct_bridge_pre(): Don't trim on ph->hdr.length, only compare to what
ip header claims and return NF_ACCEPT if it does not match.
- nf_ct_bridge_pre(): Renamed u32 data_len to pppoe_len.
- nf_ct_bridge_pre(): Reset network_header only when ret == NF_ACCEPT.
- nf_checksum(_partial)(): Use of skb_network_offset().
- nf_checksum(_partial)(): Use 'if (WARN_ON()) return 0' instead.
- nf_checksum(_partial)(): Added comments
Changes in v14:
- nf_checksum(_patial): Use DEBUG_NET_WARN_ON_ONCE(
!skb_pointer_if_linear()) instead of pskb_may_pull().
- nft_do_chain_bridge: Added default case ph->proto is neither
ipv4 nor ipv6.
- nft_do_chain_bridge: only reset network header when ret == NF_ACCEPT.
Changes in v13:
- Do not use pull/push before/after calling nf_conntrack_in() or
nft_do_chain().
- Add patch to correct calculating checksum when skb->data !=
skb_network_header(skb).
Changes in v12:
- Only allow tracking this traffic when a conntrack zone is set.
- nf_ct_bridge_pre(): skb pull/push without touching the checksum,
because the pull is always restored with push.
- nft_do_chain_bridge(): handle the extra header similar to
nf_ct_bridge_pre(), using pull/push.
Changes in v11:
- nft_do_chain_bridge(): Proper readout of encapsulated proto.
- nft_do_chain_bridge(): Use skb_set_network_header() instead of thoff.
- removed test script, it is now in separate patch.
v10 split from patch-set: bridge-fastpath and related improvements v9
Eric Woudstra (4):
netfilter: utils: nf_checksum(_partial) correct data!=networkheader
netfilter: bridge: Add conntrack double vlan and pppoe
netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument
netfilter: nft_chain_filter: Add bridge double vlan and pppoe
include/net/netfilter/nf_tables_ipv4.h | 21 +++--
include/net/netfilter/nf_tables_ipv6.h | 21 +++--
net/bridge/netfilter/nf_conntrack_bridge.c | 93 ++++++++++++++++++----
net/netfilter/nft_chain_filter.c | 59 ++++++++++++--
net/netfilter/utils.c | 28 +++++--
5 files changed, 177 insertions(+), 45 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v18 nf-next 1/4] netfilter: utils: nf_checksum(_partial) correct data!=networkheader
2026-02-22 19:58 [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
@ 2026-02-22 19:58 ` Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 2/4] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Eric Woudstra @ 2026-02-22 19:58 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Nikolay Aleksandrov, Ido Schimmel
Cc: netfilter-devel, netdev, bridge, Eric Woudstra
In the conntrack hook it may not always be the case that:
skb_network_header(skb) == skb->data, i.e. skb_network_offset(skb)
is zero.
This is problematic when L4 function nf_conntrack_handle_packet()
is accessing L3 data. This function uses thoff and ip_hdr()
to finds it's data. But it also calculates the checksum.
nf_checksum() and nf_checksum_partial() both use lower skb-checksum
functions that are based on using skb->data.
Adjust for skb_network_offset(skb), so that the checksum is calculated
correctly.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/utils.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
index 008419db815a..4f8b5442b650 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -124,16 +124,25 @@ __sum16 nf_checksum(struct sk_buff *skb, unsigned int hook,
unsigned int dataoff, u8 protocol,
unsigned short family)
{
+ unsigned int nhpull = skb_network_offset(skb);
__sum16 csum = 0;
+ if (WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0)))
+ return 0;
+
+ /* pull/push because the lower csum functions assume that
+ * skb_network_offset(skb) is zero.
+ */
+ __skb_pull(skb, nhpull);
switch (family) {
case AF_INET:
- csum = nf_ip_checksum(skb, hook, dataoff, protocol);
+ csum = nf_ip_checksum(skb, hook, dataoff - nhpull, protocol);
break;
case AF_INET6:
- csum = nf_ip6_checksum(skb, hook, dataoff, protocol);
+ csum = nf_ip6_checksum(skb, hook, dataoff - nhpull, protocol);
break;
}
+ __skb_push(skb, nhpull);
return csum;
}
@@ -143,18 +152,25 @@ __sum16 nf_checksum_partial(struct sk_buff *skb, unsigned int hook,
unsigned int dataoff, unsigned int len,
u8 protocol, unsigned short family)
{
+ unsigned int nhpull = skb_network_offset(skb);
__sum16 csum = 0;
+ if (WARN_ON_ONCE(!skb_pointer_if_linear(skb, nhpull, 0)))
+ return 0;
+
+ /* See nf_checksum() */
+ __skb_pull(skb, nhpull);
switch (family) {
case AF_INET:
- csum = nf_ip_checksum_partial(skb, hook, dataoff, len,
- protocol);
+ csum = nf_ip_checksum_partial(skb, hook, dataoff - nhpull,
+ len, protocol);
break;
case AF_INET6:
- csum = nf_ip6_checksum_partial(skb, hook, dataoff, len,
- protocol);
+ csum = nf_ip6_checksum_partial(skb, hook, dataoff - nhpull,
+ len, protocol);
break;
}
+ __skb_push(skb, nhpull);
return csum;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v18 nf-next 2/4] netfilter: bridge: Add conntrack double vlan and pppoe
2026-02-22 19:58 [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 1/4] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
@ 2026-02-22 19:58 ` Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 3/4] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
3 siblings, 0 replies; 7+ messages in thread
From: Eric Woudstra @ 2026-02-22 19:58 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Nikolay Aleksandrov, Ido Schimmel
Cc: netfilter-devel, netdev, bridge, Eric Woudstra
In a bridge, until now, it is possible to track connections of plain
ip(v6) and ip(v6) encapsulated in single 802.1q or 802.1ad.
This patch adds the capability to track connections when the connection
is (also) encapsulated in PPPoE. It also adds the capability to track
connections that are encapsulated in an inner 802.1q, combined with an
outer 802.1ad or 802.1q encapsulation.
To prevent mixing connections that are tagged differently in the L2
encapsulations, one should separate them using conntrack zones.
Using a conntrack zone is a hard requirement for the newly added
encapsulations of the tracking capability inside a bridge.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/bridge/netfilter/nf_conntrack_bridge.c | 93 ++++++++++++++++++----
1 file changed, 76 insertions(+), 17 deletions(-)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 58a33d0380b0..49e01083278c 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -16,6 +16,7 @@
#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_bridge.h>
+#include <linux/ppp_defs.h>
#include <linux/netfilter_ipv4.h>
#include "../br_private.h"
@@ -236,58 +237,116 @@ static int nf_ct_br_ipv6_check(const struct sk_buff *skb)
return 0;
}
+static int nf_ct_bridge_pre_inner(struct sk_buff *skb, __be16 *proto, u32 *len)
+{
+ switch (*proto) {
+ case htons(ETH_P_PPP_SES): {
+ struct ppp_hdr {
+ struct pppoe_hdr hdr;
+ __be16 proto;
+ } *ph;
+
+ if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
+ return -1;
+ ph = (struct ppp_hdr *)(skb->data);
+ switch (ph->proto) {
+ case htons(PPP_IP):
+ *proto = htons(ETH_P_IP);
+ *len = ntohs(ph->hdr.length) - 2;
+ skb_set_network_header(skb, PPPOE_SES_HLEN);
+ return PPPOE_SES_HLEN;
+ case htons(PPP_IPV6):
+ *proto = htons(ETH_P_IPV6);
+ *len = ntohs(ph->hdr.length) - 2;
+ skb_set_network_header(skb, PPPOE_SES_HLEN);
+ return PPPOE_SES_HLEN;
+ }
+ break;
+ }
+ case htons(ETH_P_8021Q): {
+ struct vlan_hdr *vhdr;
+
+ if (!pskb_may_pull(skb, VLAN_HLEN))
+ return -1;
+ vhdr = (struct vlan_hdr *)(skb->data);
+ *proto = vhdr->h_vlan_encapsulated_proto;
+ skb_set_network_header(skb, VLAN_HLEN);
+ return VLAN_HLEN;
+ }
+ }
+ return 0;
+}
+
static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
struct nf_hook_state bridge_state = *state;
+ int ret = NF_ACCEPT, offset = 0;
enum ip_conntrack_info ctinfo;
+ u32 len, pppoe_len = 0;
struct nf_conn *ct;
- u32 len;
- int ret;
+ __be16 proto;
ct = nf_ct_get(skb, &ctinfo);
if ((ct && !nf_ct_is_template(ct)) ||
ctinfo == IP_CT_UNTRACKED)
return NF_ACCEPT;
- switch (skb->protocol) {
- case htons(ETH_P_IP):
- if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+ proto = skb->protocol;
+
+ if (ct && nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)) !=
+ NF_CT_DEFAULT_ZONE_ID) {
+ offset = nf_ct_bridge_pre_inner(skb, &proto, &pppoe_len);
+ if (offset < 0)
return NF_ACCEPT;
+ }
+
+ switch (proto) {
+ case htons(ETH_P_IP):
+ if (!pskb_may_pull(skb, offset + sizeof(struct iphdr)))
+ goto do_not_track;
len = skb_ip_totlen(skb);
- if (pskb_trim_rcsum(skb, len))
- return NF_ACCEPT;
+ if (pppoe_len && pppoe_len != len)
+ goto do_not_track;
+ if (pskb_trim_rcsum(skb, offset + len))
+ goto do_not_track;
if (nf_ct_br_ip_check(skb))
- return NF_ACCEPT;
+ goto do_not_track;
bridge_state.pf = NFPROTO_IPV4;
ret = nf_ct_br_defrag4(skb, &bridge_state);
break;
case htons(ETH_P_IPV6):
- if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
- return NF_ACCEPT;
+ if (!pskb_may_pull(skb, offset + sizeof(struct ipv6hdr)))
+ goto do_not_track;
len = sizeof(struct ipv6hdr) + skb_ipv6_payload_len(skb);
- if (pskb_trim_rcsum(skb, len))
- return NF_ACCEPT;
+ if (pppoe_len && pppoe_len != len)
+ goto do_not_track;
+ if (pskb_trim_rcsum(skb, offset + len))
+ goto do_not_track;
if (nf_ct_br_ipv6_check(skb))
- return NF_ACCEPT;
+ goto do_not_track;
bridge_state.pf = NFPROTO_IPV6;
ret = nf_ct_br_defrag6(skb, &bridge_state);
break;
default:
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
- return NF_ACCEPT;
+ goto do_not_track;
}
- if (ret != NF_ACCEPT)
- return ret;
+ if (ret == NF_ACCEPT)
+ ret = nf_conntrack_in(skb, &bridge_state);
+
+do_not_track:
+ if (offset && ret == NF_ACCEPT)
+ skb_reset_network_header(skb);
- return nf_conntrack_in(skb, &bridge_state);
+ return ret;
}
static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v18 nf-next 3/4] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument
2026-02-22 19:58 [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 1/4] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 2/4] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2026-02-22 19:58 ` Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
3 siblings, 0 replies; 7+ messages in thread
From: Eric Woudstra @ 2026-02-22 19:58 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Nikolay Aleksandrov, Ido Schimmel
Cc: netfilter-devel, netdev, bridge, Eric Woudstra
Add specifying an offset when calling nft_set_pktinfo_ipv4/6_validate()
for cases where the ip(v6) header is not located at skb_network_header().
When an offset is specified other then zero, do not set pkt->tprot and
the corresponding pkt->flags to not change rule processing. It does make
the offsets in pktinfo available for code that is not checking pkt->flags
to use the offsets, like nft_flow_offload_eval().
Existing behaviour for a rule like "tcp dport 22 accept" is not changed
when, for instance, a PPPoE packet is being matched inside a bridge.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
include/net/netfilter/nf_tables_ipv4.h | 21 +++++++++++++--------
include/net/netfilter/nf_tables_ipv6.h | 21 +++++++++++++--------
net/netfilter/nft_chain_filter.c | 8 ++++----
3 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/include/net/netfilter/nf_tables_ipv4.h b/include/net/netfilter/nf_tables_ipv4.h
index fcf967286e37..bd354937134f 100644
--- a/include/net/netfilter/nf_tables_ipv4.h
+++ b/include/net/netfilter/nf_tables_ipv4.h
@@ -16,12 +16,12 @@ static inline void nft_set_pktinfo_ipv4(struct nft_pktinfo *pkt)
pkt->fragoff = ntohs(ip->frag_off) & IP_OFFSET;
}
-static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
+static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt, u32 nhoff)
{
struct iphdr *iph, _iph;
u32 len, thoff, skb_len;
- iph = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb),
+ iph = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
sizeof(*iph), &_iph);
if (!iph)
return -1;
@@ -31,7 +31,7 @@ static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
len = iph_totlen(pkt->skb, iph);
thoff = iph->ihl * 4;
- skb_len = pkt->skb->len - skb_network_offset(pkt->skb);
+ skb_len = pkt->skb->len - skb_network_offset(pkt->skb) - nhoff;
if (skb_len < len)
return -1;
@@ -40,17 +40,22 @@ static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
else if (thoff < sizeof(*iph))
return -1;
- pkt->flags = NFT_PKTINFO_L4PROTO;
- pkt->tprot = iph->protocol;
- pkt->thoff = skb_network_offset(pkt->skb) + thoff;
+ if (!nhoff) {
+ pkt->flags = NFT_PKTINFO_L4PROTO;
+ pkt->tprot = iph->protocol;
+ } else {
+ pkt->flags = 0;
+ pkt->tprot = 0;
+ }
+ pkt->thoff = skb_network_offset(pkt->skb) + nhoff + thoff;
pkt->fragoff = ntohs(iph->frag_off) & IP_OFFSET;
return 0;
}
-static inline void nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt)
+static inline void nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt, u32 nhoff)
{
- if (__nft_set_pktinfo_ipv4_validate(pkt) < 0)
+ if (__nft_set_pktinfo_ipv4_validate(pkt, nhoff) < 0)
nft_set_pktinfo_unspec(pkt);
}
diff --git a/include/net/netfilter/nf_tables_ipv6.h b/include/net/netfilter/nf_tables_ipv6.h
index c53ac00bb974..1e84a891f268 100644
--- a/include/net/netfilter/nf_tables_ipv6.h
+++ b/include/net/netfilter/nf_tables_ipv6.h
@@ -24,17 +24,17 @@ static inline void nft_set_pktinfo_ipv6(struct nft_pktinfo *pkt)
pkt->fragoff = frag_off;
}
-static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
+static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt, u32 nhoff)
{
#if IS_ENABLED(CONFIG_IPV6)
unsigned int flags = IP6_FH_F_AUTH;
struct ipv6hdr *ip6h, _ip6h;
- unsigned int thoff = 0;
+ unsigned int thoff = nhoff;
unsigned short frag_off;
u32 pkt_len, skb_len;
int protohdr;
- ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb),
+ ip6h = skb_header_pointer(pkt->skb, skb_network_offset(pkt->skb) + nhoff,
sizeof(*ip6h), &_ip6h);
if (!ip6h)
return -1;
@@ -43,7 +43,7 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
return -1;
pkt_len = ipv6_payload_len(pkt->skb, ip6h);
- skb_len = pkt->skb->len - skb_network_offset(pkt->skb);
+ skb_len = pkt->skb->len - skb_network_offset(pkt->skb) - nhoff;
if (pkt_len + sizeof(*ip6h) > skb_len)
return -1;
@@ -51,8 +51,13 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
if (protohdr < 0 || thoff > U16_MAX)
return -1;
- pkt->flags = NFT_PKTINFO_L4PROTO;
- pkt->tprot = protohdr;
+ if (!nhoff) {
+ pkt->flags = NFT_PKTINFO_L4PROTO;
+ pkt->tprot = protohdr;
+ } else {
+ pkt->flags = 0;
+ pkt->tprot = 0;
+ }
pkt->thoff = thoff;
pkt->fragoff = frag_off;
@@ -62,9 +67,9 @@ static inline int __nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
#endif
}
-static inline void nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt)
+static inline void nft_set_pktinfo_ipv6_validate(struct nft_pktinfo *pkt, u32 nhoff)
{
- if (__nft_set_pktinfo_ipv6_validate(pkt) < 0)
+ if (__nft_set_pktinfo_ipv6_validate(pkt, nhoff) < 0)
nft_set_pktinfo_unspec(pkt);
}
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index b16185e9a6dd..d4d5eadaba9c 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -238,10 +238,10 @@ nft_do_chain_bridge(void *priv,
switch (eth_hdr(skb)->h_proto) {
case htons(ETH_P_IP):
- nft_set_pktinfo_ipv4_validate(&pkt);
+ nft_set_pktinfo_ipv4_validate(&pkt, 0);
break;
case htons(ETH_P_IPV6):
- nft_set_pktinfo_ipv6_validate(&pkt);
+ nft_set_pktinfo_ipv6_validate(&pkt, 0);
break;
default:
nft_set_pktinfo_unspec(&pkt);
@@ -293,10 +293,10 @@ static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
switch (skb->protocol) {
case htons(ETH_P_IP):
- nft_set_pktinfo_ipv4_validate(&pkt);
+ nft_set_pktinfo_ipv4_validate(&pkt, 0);
break;
case htons(ETH_P_IPV6):
- nft_set_pktinfo_ipv6_validate(&pkt);
+ nft_set_pktinfo_ipv6_validate(&pkt, 0);
break;
default:
nft_set_pktinfo_unspec(&pkt);
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
2026-02-22 19:58 [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
` (2 preceding siblings ...)
2026-02-22 19:58 ` [PATCH v18 nf-next 3/4] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
@ 2026-02-22 19:58 ` Eric Woudstra
2026-02-23 16:01 ` Florian Westphal
2026-02-23 19:15 ` Florian Westphal
3 siblings, 2 replies; 7+ messages in thread
From: Eric Woudstra @ 2026-02-22 19:58 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Nikolay Aleksandrov, Ido Schimmel
Cc: netfilter-devel, netdev, bridge, Eric Woudstra
In nft_do_chain_bridge() pktinfo is only fully populated for plain packets
and packets encapsulated in single 802.1q or 802.1ad.
When implementing the software bridge-fastpath and testing all possible
encapulations, there can be more encapsulations:
The packet could (also) be encapsulated in PPPoE, or the packet could be
encapsulated in an inner 802.1q, combined with an outer 802.1ad or 802.1q
encapsulation.
nft_flow_offload_eval() also examines the L4 header, with the L4 protocol
known from the conntrack-tuplehash. To access the header it uses
nft_thoff(), but for these packets it returns zero.
Introduce nft_set_bridge_pktinfo() to help populate pktinfo with the
offsets.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nft_chain_filter.c | 55 +++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index d4d5eadaba9c..66ef30c60e56 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -227,21 +227,68 @@ static inline void nft_chain_filter_inet_fini(void) {}
#endif /* CONFIG_NF_TABLES_IPV6 */
#if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
+static int nft_set_bridge_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb,
+ const struct nf_hook_state *state,
+ __be16 *proto)
+{
+ nft_set_pktinfo(pkt, skb, state);
+
+ switch (*proto) {
+ case htons(ETH_P_PPP_SES): {
+ struct ppp_hdr {
+ struct pppoe_hdr hdr;
+ __be16 proto;
+ } *ph, _ph;
+
+ ph = skb_header_pointer(skb, 0, sizeof(_ph), &_ph);
+ if (!ph) {
+ *proto = 0;
+ return -1;
+ }
+ switch (ph->proto) {
+ case htons(PPP_IP):
+ *proto = htons(ETH_P_IP);
+ return PPPOE_SES_HLEN;
+ case htons(PPP_IPV6):
+ *proto = htons(ETH_P_IPV6);
+ return PPPOE_SES_HLEN;
+ }
+ break;
+ }
+ case htons(ETH_P_8021Q): {
+ struct vlan_hdr *vhdr, _vhdr;
+
+ vhdr = skb_header_pointer(skb, 0, sizeof(_vhdr), &_vhdr);
+ if (!vhdr) {
+ *proto = 0;
+ return -1;
+ }
+ *proto = vhdr->h_vlan_encapsulated_proto;
+ return VLAN_HLEN;
+ }
+ }
+ return 0;
+}
+
static unsigned int
nft_do_chain_bridge(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
struct nft_pktinfo pkt;
+ __be16 proto;
+ int offset;
- nft_set_pktinfo(&pkt, skb, state);
+ proto = eth_hdr(skb)->h_proto;
+
+ offset = nft_set_bridge_pktinfo(&pkt, skb, state, &proto);
- switch (eth_hdr(skb)->h_proto) {
+ switch (proto) {
case htons(ETH_P_IP):
- nft_set_pktinfo_ipv4_validate(&pkt, 0);
+ nft_set_pktinfo_ipv4_validate(&pkt, offset);
break;
case htons(ETH_P_IPV6):
- nft_set_pktinfo_ipv6_validate(&pkt, 0);
+ nft_set_pktinfo_ipv6_validate(&pkt, offset);
break;
default:
nft_set_pktinfo_unspec(&pkt);
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
2026-02-22 19:58 ` [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
@ 2026-02-23 16:01 ` Florian Westphal
2026-02-23 19:15 ` Florian Westphal
1 sibling, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2026-02-23 16:01 UTC (permalink / raw)
Cc: Pablo Neira Ayuso, Phil Sutter, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Nikolay Aleksandrov,
Ido Schimmel, netfilter-devel, netdev, bridge
Eric Woudstra <ericwouds@gmail.com> wrote:
This an AI-generated review, with no edits.
I added my own 2 cents at the end.
> diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
> index d4d5eadaba9c..66ef30c60e56 100644
> --- a/net/netfilter/nft_chain_filter.c
> +++ b/net/netfilter/nft_chain_filter.c
> @@ -227,21 +227,68 @@ static inline void nft_chain_filter_inet_fini(void) {}
> #endif /* CONFIG_NF_TABLES_IPV6 */
>
> #if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
> +static int nft_set_bridge_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb,
> + const struct nf_hook_state *state,
> + __be16 *proto)
> +{
> + nft_set_pktinfo(pkt, skb, state);
> +
> + switch (*proto) {
> + case htons(ETH_P_PPP_SES): {
[ ... ]
> + }
> + case htons(ETH_P_8021Q): {
> + struct vlan_hdr *vhdr, _vhdr;
> +
> + vhdr = skb_header_pointer(skb, 0, sizeof(_vhdr), &_vhdr);
> + if (!vhdr) {
> + *proto = 0;
> + return -1;
> + }
> + *proto = vhdr->h_vlan_encapsulated_proto;
> + return VLAN_HLEN;
> + }
> + }
> + return 0;
> +}
The commit message states support for "inner 802.1q, combined with an
outer 802.1ad or 802.1q encapsulation", but the code only handles
ETH_P_8021Q (0x8100). Should there be a case for ETH_P_8021AD (0x88A8)?
When a packet has an 802.1ad outer VLAN tag, proto will be ETH_P_8021AD
but won't match any case in the switch statement. The function returns 0
with proto unchanged, then nft_do_chain_bridge() calls
nft_set_pktinfo_unspec(), setting thoff to 0.
Other netfilter code handles both protocols, for example:
net/netfilter/nf_flow_table_ip.c:
switch (tuple->encap[i].proto) {
case htons(ETH_P_8021Q):
case htons(ETH_P_8021AD):
if (skb_vlan_push(...))
For double VLAN scenarios like outer 802.1q + inner 802.1q, the function
strips the outer tag and sets proto to ETH_P_8021Q (from the inner VLAN).
When nft_do_chain_bridge() switches on this proto value, it doesn't match
ETH_P_IP or ETH_P_IPV6, so it falls through to the default case and calls
nft_set_pktinfo_unspec().
This means for a packet with two 802.1q tags followed by IPv4, the offset
returned is 4 (one VLAN header), but nft_set_pktinfo_unspec() is called
and thoff becomes 0 instead of 8.
Should nft_set_bridge_pktinfo() handle nested encapsulations recursively,
or should nft_do_chain_bridge() check if the updated proto is still a
VLAN type and call nft_set_bridge_pktinfo() again?
[ fw: I oppose auto-following nested headers an would prefer to add
any vlan-vlan-vlan-vlan-vlan-pppoe or whatever incrementally after
someone explains why thats sane.
Wrt. the rest, I suspect the LLM gets confused due to vlan offloading,
the code is fine if vlan offloading is on, we have outer vlan tag in
the skb and eth_hdr(skb)->h_proto will be ETH_P_8021Q.
Neverthelss: should this support configurations where vlan tag offload
is off? If it should, then ETH_P_8021AD / ETH_P_8021Q combo has to be
handled. That could be done in a followup change, of course.
So, *I* don't see a need to send another iteration of the patch.
Eric, whats your take? Supersede? Ignore LLM? Follwup patch?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
2026-02-22 19:58 ` [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
2026-02-23 16:01 ` Florian Westphal
@ 2026-02-23 19:15 ` Florian Westphal
1 sibling, 0 replies; 7+ messages in thread
From: Florian Westphal @ 2026-02-23 19:15 UTC (permalink / raw)
To: Eric Woudstra
Cc: Pablo Neira Ayuso, Phil Sutter, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Nikolay Aleksandrov,
Ido Schimmel, netfilter-devel, netdev, bridge
Eric Woudstra <ericwouds@gmail.com> wrote:
> In nft_do_chain_bridge() pktinfo is only fully populated for plain packets
> and packets encapsulated in single 802.1q or 802.1ad.
>
> When implementing the software bridge-fastpath and testing all possible
> encapulations, there can be more encapsulations:
>
> The packet could (also) be encapsulated in PPPoE, or the packet could be
> encapsulated in an inner 802.1q, combined with an outer 802.1ad or 802.1q
> encapsulation.
>
> nft_flow_offload_eval() also examines the L4 header, with the L4 protocol
> known from the conntrack-tuplehash. To access the header it uses
> nft_thoff(), but for these packets it returns zero.
>
> Introduce nft_set_bridge_pktinfo() to help populate pktinfo with the
> offsets.
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> net/netfilter/nft_chain_filter.c | 55 +++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
> index d4d5eadaba9c..66ef30c60e56 100644
> --- a/net/netfilter/nft_chain_filter.c
> +++ b/net/netfilter/nft_chain_filter.c
> @@ -227,21 +227,68 @@ static inline void nft_chain_filter_inet_fini(void) {}
> #endif /* CONFIG_NF_TABLES_IPV6 */
>
> #if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
> +static int nft_set_bridge_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb,
> + const struct nf_hook_state *state,
> + __be16 *proto)
> +{
> + nft_set_pktinfo(pkt, skb, state);
> +
> + switch (*proto) {
> + case htons(ETH_P_PPP_SES): {
> + struct ppp_hdr {
> + struct pppoe_hdr hdr;
Hmm, this seems to trigger warning on NIPAs build_allmodconfig_warn test:
../include/uapi/linux/if_pppox.h:153:29: warning: array of flexible structures
Would you mind a new version? Its the last patch, so I leave it up to
you if you want to resend the whole series or just 4/4.
Sorry, I did not have time to run the entire test pipeline with the
pending patches until today.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-02-23 19:15 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-22 19:58 [PATCH v18 nf-next 0/4] conntrack: bridge: add double vlan, pppoe and pppoe-in-q Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 1/4] netfilter: utils: nf_checksum(_partial) correct data!=networkheader Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 2/4] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 3/4] netfilter: nft_set_pktinfo_ipv4/6_validate: Add nhoff argument Eric Woudstra
2026-02-22 19:58 ` [PATCH v18 nf-next 4/4] netfilter: nft_chain_filter: Add bridge double vlan and pppoe Eric Woudstra
2026-02-23 16:01 ` Florian Westphal
2026-02-23 19:15 ` Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox