* [PATCH v10 nf-next 0/3] Add nf_flow_encap_push() for xmit direct
@ 2025-03-15 19:59 Eric Woudstra
2025-03-15 19:59 ` [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Eric Woudstra @ 2025-03-15 19:59 UTC (permalink / raw)
To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Simon Horman
Cc: netdev, netfilter-devel, linux-hardening, Eric Woudstra
To have the ability to handle xmit direct with outgoing encaps in the
bridge fastpath bypass, we need to be able to handle them without going
through vlan/pppoe devices.
So I've applied, amended and squashed wenxu's patch-set.
This patch also makes it possible to egress from vlan-filtering brlan to
lan0 with vlan tagged packets, if the bridge master port is doing the
vlan tagging, instead of a vlan-device, as seen in the figure below.
Without this patch, this is currently not possible in the
forward-fastpath.
              forward fastpath bypass
 .----------------------------------------.
/                                          \
|                        IP - forwarding    |
|                       /                \  v
|                      /                  wan ...
|                     /
|                     |
|                     |
|    +-------------------------------+
|    |           untagged            |
|    |              to               |
|    |            vlan 1             |
|    |                               |
|    |     brlan (vlan-filtering)    |
|    +---------------+               |
|    |  DSA-SWITCH   |               |
|    |               |    vlan 1     |
|    |               |      to       |
|    |   vlan 1      |   untagged    |
|    +---------------+---------------+
.         /                   \
 ------>lan0                 wlan1
        .
        .
        .
        .
        .
        ^
        vlan 1 tagged packets
Added patch to eliminate array of flexible structures warning.
Added patch to clean up structures.
Split from patch-set: bridge-fastpath and related improvements v9
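For readers less familiar with the tag layout behind "vlan 1 tagged packets": the following is a minimal userspace sketch of how a 4-byte 802.1Q header is laid out when a tag is pushed inline. It is not code from this series. The names demo_vlan_fill(), DEMO_ETH_P_8021Q and DEMO_VLAN_HLEN are hypothetical, and the sketch assumes only the VLAN ID is placed in the TCI (priority and DEI bits left at zero).

```c
#include <stdint.h>

#define DEMO_ETH_P_8021Q 0x8100 /* ethertype announcing an 802.1Q tag */
#define DEMO_VLAN_HLEN   4      /* 2-byte TCI + 2-byte encapsulated ethertype */

/*
 * Write the 4-byte VLAN header that sits between the Ethernet header
 * and the payload. All arguments are in host byte order; the bytes are
 * written in network (big-endian) order.
 *
 * Returns the ethertype the Ethernet header has to carry afterwards.
 */
uint16_t demo_vlan_fill(uint8_t out[DEMO_VLAN_HLEN], uint16_t vlan_id,
			uint16_t inner_ethertype)
{
	/* TCI: only the VLAN ID is set in this sketch */
	out[0] = (uint8_t)(vlan_id >> 8);
	out[1] = (uint8_t)(vlan_id & 0xff);

	/* The previous ethertype moves inside the tag */
	out[2] = (uint8_t)(inner_ethertype >> 8);
	out[3] = (uint8_t)(inner_ethertype & 0xff);

	return DEMO_ETH_P_8021Q;
}
```

Calling demo_vlan_fill(buf, 1, 0x0800) for an IPv4 packet on vlan 1 writes the bytes 00 01 08 00 and returns 0x8100.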
Eric Woudstra (3):
net: pppoe: avoid zero-length arrays in struct pppoe_hdr
netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit
direct
netfilter: flow: remove hw_outdev, out.hw_ifindex and out.hw_ifidx
drivers/net/ppp/pppoe.c | 2 +-
include/net/netfilter/nf_flow_table.h | 2 -
include/uapi/linux/if_pppox.h | 4 ++
net/netfilter/nf_flow_table_core.c | 1 -
net/netfilter/nf_flow_table_ip.c | 96 ++++++++++++++++++++++++++-
net/netfilter/nf_flow_table_offload.c | 2 +-
net/netfilter/nft_flow_offload.c | 10 +--
7 files changed, 102 insertions(+), 15 deletions(-)
--
2.47.1
* [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr
2025-03-15 19:59 [PATCH v10 nf-next 0/3] Add nf_flow_encap_push() for xmit direct Eric Woudstra
@ 2025-03-15 19:59 ` Eric Woudstra
2025-03-23 16:48 ` Simon Horman
2025-03-15 19:59 ` [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct Eric Woudstra
2025-03-15 19:59 ` [PATCH v10 nf-next 3/3] netfilter: flow: remove hw_outdev, out.hw_ifindex and out.hw_ifidx Eric Woudstra
2 siblings, 1 reply; 8+ messages in thread
From: Eric Woudstra @ 2025-03-15 19:59 UTC (permalink / raw)
To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Simon Horman
Cc: netdev, netfilter-devel, linux-hardening, Eric Woudstra,
Nikolay Aleksandrov
Jakub Kicinski suggested the following patch:
W=1 C=1 GCC build gives us:
net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
153:29: warning: array of flexible structures
It doesn't like that hdr has a zero-length array which overlaps proto.
The kernel code doesn't currently need those arrays.
PPPoE connection is functional after applying this patch.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
---
Split from patch-set: bridge-fastpath and related improvements v9
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
drivers/net/ppp/pppoe.c | 2 +-
include/uapi/linux/if_pppox.h | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 68e631718ab0..17946af6a8cf 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -882,7 +882,7 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	skb->protocol = cpu_to_be16(ETH_P_PPP_SES);
 	ph = skb_put(skb, total_len + sizeof(struct pppoe_hdr));
-	start = (char *)&ph->tag[0];
+	start = (char *)ph + sizeof(*ph);
 	error = memcpy_from_msg(start, m, total_len);
 	if (error < 0) {
diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 9abd80dcc46f..29b804aa7474 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -122,7 +122,9 @@ struct sockaddr_pppol2tpv3in6 {
 struct pppoe_tag {
 	__be16 tag_type;
 	__be16 tag_len;
+#ifndef __KERNEL__
 	char tag_data[];
+#endif
 } __attribute__ ((packed));
 /* Tag identifiers */
@@ -150,7 +152,9 @@ struct pppoe_hdr {
 	__u8 code;
 	__be16 sid;
 	__be16 length;
+#ifndef __KERNEL__
 	struct pppoe_tag tag[];
+#endif
 } __packed;
 /* Length of entire PPPoE + PPP header */
--
2.47.1
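The replacement of &ph->tag[0] by pointer arithmetic can be illustrated with a minimal userspace sketch. It is not kernel code: struct demo_pppoe_hdr, demo_payload() and demo_payload_offset() are hypothetical names, and the version and type nibbles are folded into one byte so the sketch does not depend on bitfield ordering. The point it shows is that the payload starts directly after the fixed 6-byte header, so no flexible array member is needed to find it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fixed part of struct pppoe_hdr. */
struct demo_pppoe_hdr {
	uint8_t  ver_type; /* version and type nibbles in one byte */
	uint8_t  code;
	uint16_t sid;      /* big-endian on the wire */
	uint16_t length;   /* big-endian on the wire */
} __attribute__((packed));

/*
 * The payload begins immediately after the fixed header, so it can be
 * located without a trailing flexible array member.
 */
char *demo_payload(struct demo_pppoe_hdr *ph)
{
	return (char *)ph + sizeof(*ph);
}

/* Offset of the payload from the start of the header, in bytes. */
size_t demo_payload_offset(void)
{
	struct demo_pppoe_hdr hdr;

	return (size_t)(demo_payload(&hdr) - (char *)&hdr);
}
```

With the packed layout above, sizeof(struct demo_pppoe_hdr) and demo_payload_offset() are both 6.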
* [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct
2025-03-15 19:59 [PATCH v10 nf-next 0/3] Add nf_flow_encap_push() for xmit direct Eric Woudstra
2025-03-15 19:59 ` [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
@ 2025-03-15 19:59 ` Eric Woudstra
2025-03-18 23:23 ` Pablo Neira Ayuso
2025-03-15 19:59 ` [PATCH v10 nf-next 3/3] netfilter: flow: remove hw_outdev, out.hw_ifindex and out.hw_ifidx Eric Woudstra
2 siblings, 1 reply; 8+ messages in thread
From: Eric Woudstra @ 2025-03-15 19:59 UTC (permalink / raw)
To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Simon Horman
Cc: netdev, netfilter-devel, linux-hardening, Eric Woudstra,
Nikolay Aleksandrov
Loosely based on wenxu's patches:
"nf_flow_table_offload: offload the vlan/PPPoE encap in the flowtable".
Fixed double vlan and pppoe packets, almost entirely rewriting the patch.
After this patch, it is possible to transmit packets in the fastpath with
outgoing encaps, without using vlan- and/or pppoe-devices.
This makes it possible to support a wider variety of network setups.
For example, when bridge tagging is used to egress vlan tagged
packets using the forward fastpath. Another example is passing 802.1q
tagged packets through a bridge using the bridge fastpath.
This also makes the software fastpath process more similar to the
hardware offloaded fastpath process, where encaps are also pushed.
After applying this patch, info->outdev always equals info->hw_outdev,
so the netfilter code can be further cleaned up by removing:
* hw_outdev from struct nft_forward_info
* out.hw_ifindex from struct nf_flow_route
* out.hw_ifidx from struct flow_offload_tuple
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nf_flow_table_ip.c | 96 +++++++++++++++++++++++++++++++-
net/netfilter/nft_flow_offload.c | 6 +-
2 files changed, 96 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 8cd4cf7ae211..d0c3c459c4d2 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -306,6 +306,92 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
 	return false;
 }
+static int nf_flow_vlan_inner_push(struct sk_buff *skb, __be16 proto, u16 id)
+{
+	struct vlan_hdr *vhdr;
+
+	if (skb_cow_head(skb, VLAN_HLEN))
+		return -1;
+
+	__skb_push(skb, VLAN_HLEN);
+	skb_reset_network_header(skb);
+
+	vhdr = (struct vlan_hdr *)(skb->data);
+	vhdr->h_vlan_TCI = htons(id);
+	vhdr->h_vlan_encapsulated_proto = skb->protocol;
+	skb->protocol = proto;
+
+	return 0;
+}
+
+static int nf_flow_ppoe_push(struct sk_buff *skb, u16 id)
+{
+	struct ppp_hdr {
+		struct pppoe_hdr hdr;
+		__be16 proto;
+	} *ph;
+	int data_len = skb->len + 2;
+	__be16 proto;
+
+	if (skb_cow_head(skb, PPPOE_SES_HLEN))
+		return -1;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		proto = htons(PPP_IP);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		proto = htons(PPP_IPV6);
+	else
+		return -1;
+
+	__skb_push(skb, PPPOE_SES_HLEN);
+	skb_reset_network_header(skb);
+
+	ph = (struct ppp_hdr *)(skb->data);
+	ph->hdr.ver = 1;
+	ph->hdr.type = 1;
+	ph->hdr.code = 0;
+	ph->hdr.sid = htons(id);
+	ph->hdr.length = htons(data_len);
+	ph->proto = proto;
+	skb->protocol = htons(ETH_P_PPP_SES);
+
+	return 0;
+}
+
+static int nf_flow_encap_push(struct sk_buff *skb,
+			      struct flow_offload_tuple_rhash *tuplehash,
+			      unsigned short *type)
+{
+	int i = 0, ret = 0;
+
+	if (!tuplehash->tuple.encap_num)
+		return 0;
+
+	if (tuplehash->tuple.encap[i].proto == htons(ETH_P_8021Q) ||
+	    tuplehash->tuple.encap[i].proto == htons(ETH_P_8021AD)) {
+		__vlan_hwaccel_put_tag(skb, tuplehash->tuple.encap[i].proto,
+				       tuplehash->tuple.encap[i].id);
+		i++;
+		if (i >= tuplehash->tuple.encap_num)
+			return 0;
+	}
+
+	switch (tuplehash->tuple.encap[i].proto) {
+	case htons(ETH_P_8021Q):
+		*type = ETH_P_8021Q;
+		ret = nf_flow_vlan_inner_push(skb,
+					      tuplehash->tuple.encap[i].proto,
+					      tuplehash->tuple.encap[i].id);
+		break;
+	case htons(ETH_P_PPP_SES):
+		*type = ETH_P_PPP_SES;
+		ret = nf_flow_ppoe_push(skb,
+					tuplehash->tuple.encap[i].id);
+		break;
+	}
+	return ret;
+}
+
 static void nf_flow_encap_pop(struct sk_buff *skb,
 			      struct flow_offload_tuple_rhash *tuplehash)
 {
@@ -335,6 +421,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
 static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 				       const struct flow_offload_tuple_rhash *tuplehash,
+				       struct flow_offload_tuple_rhash *other_tuplehash,
 				       unsigned short type)
 {
 	struct net_device *outdev;
@@ -343,6 +430,9 @@ static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 	if (!outdev)
 		return NF_DROP;
+	if (nf_flow_encap_push(skb, other_tuplehash, &type) < 0)
+		return NF_DROP;
+
 	skb->dev = outdev;
 	dev_hard_header(skb, skb->dev, type, tuplehash->tuple.out.h_dest,
 			tuplehash->tuple.out.h_source, skb->len);
@@ -462,7 +552,8 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 		ret = NF_STOLEN;
 		break;
 	case FLOW_OFFLOAD_XMIT_DIRECT:
-		ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IP);
+		ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
+					 &flow->tuplehash[!dir], ETH_P_IP);
 		if (ret == NF_DROP)
 			flow_offload_teardown(flow);
 		break;
@@ -757,7 +848,8 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 		ret = NF_STOLEN;
 		break;
 	case FLOW_OFFLOAD_XMIT_DIRECT:
-		ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IPV6);
+		ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
+					 &flow->tuplehash[!dir], ETH_P_IPV6);
 		if (ret == NF_DROP)
 			flow_offload_teardown(flow);
 		break;
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 221d50223018..d320b7f5282e 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -124,13 +124,12 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 				info->indev = NULL;
 				break;
 			}
-			if (!info->outdev)
-				info->outdev = path->dev;
 			info->encap[info->num_encaps].id = path->encap.id;
 			info->encap[info->num_encaps].proto = path->encap.proto;
 			info->num_encaps++;
 			if (path->type == DEV_PATH_PPPOE)
 				memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
+			info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
 			break;
 		case DEV_PATH_BRIDGE:
 			if (is_zero_ether_addr(info->h_source))
@@ -158,8 +157,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 			break;
 		}
 	}
-	if (!info->outdev)
-		info->outdev = info->indev;
+	info->outdev = info->indev;
 	info->hw_outdev = info->indev;
--
2.47.1
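The header that nf_flow_ppoe_push() writes in the patch above can be sketched in userspace as follows. This is an illustration only, under stated assumptions: demo_pppoe_fill() and the DEMO_* constants are hypothetical names, the numeric values are the well-known ethertypes and PPP protocol numbers for IPv4 and IPv6, and the buffer is a plain byte array rather than an skb. As in the patch, the length field covers the payload plus the 2-byte PPP protocol field.

```c
#include <stdint.h>

#define DEMO_ETH_P_IP        0x0800 /* IPv4 ethertype */
#define DEMO_ETH_P_IPV6      0x86DD /* IPv6 ethertype */
#define DEMO_PPP_IP          0x0021 /* PPP protocol number for IPv4 */
#define DEMO_PPP_IPV6        0x0057 /* PPP protocol number for IPv6 */
#define DEMO_PPPOE_SES_HLEN  8      /* 6-byte PPPoE header + 2-byte PPP protocol */

/*
 * Fill the 8 bytes that precede the IP packet in a PPPoE session frame.
 * ethertype, sid and payload_len are in host byte order; payload_len is
 * the length of the IP packet before the header is pushed.
 *
 * Returns 0 on success, -1 if the ethertype has no PPP mapping here.
 */
int demo_pppoe_fill(uint8_t out[DEMO_PPPOE_SES_HLEN], uint16_t ethertype,
		    uint16_t sid, uint16_t payload_len)
{
	uint16_t ppp_proto;
	uint16_t length = (uint16_t)(payload_len + 2); /* + PPP protocol field */

	if (ethertype == DEMO_ETH_P_IP)
		ppp_proto = DEMO_PPP_IP;
	else if (ethertype == DEMO_ETH_P_IPV6)
		ppp_proto = DEMO_PPP_IPV6;
	else
		return -1;

	out[0] = 0x11;                        /* version 1, type 1 */
	out[1] = 0x00;                        /* code 0: session data */
	out[2] = (uint8_t)(sid >> 8);         /* session id, big-endian */
	out[3] = (uint8_t)(sid & 0xff);
	out[4] = (uint8_t)(length >> 8);      /* length, big-endian */
	out[5] = (uint8_t)(length & 0xff);
	out[6] = (uint8_t)(ppp_proto >> 8);   /* PPP protocol, big-endian */
	out[7] = (uint8_t)(ppp_proto & 0xff);

	return 0;
}
```

For a 100-byte IPv4 packet on session 0x1234, demo_pppoe_fill(buf, DEMO_ETH_P_IP, 0x1234, 100) writes 11 00 12 34 00 66 00 21. Any other ethertype is rejected, matching the patch's behaviour of returning an error for protocols other than IPv4 and IPv6.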
* [PATCH v10 nf-next 3/3] netfilter: flow: remove hw_outdev, out.hw_ifindex and out.hw_ifidx
2025-03-15 19:59 [PATCH v10 nf-next 0/3] Add nf_flow_encap_push() for xmit direct Eric Woudstra
2025-03-15 19:59 ` [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
2025-03-15 19:59 ` [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct Eric Woudstra
@ 2025-03-15 19:59 ` Eric Woudstra
2 siblings, 0 replies; 8+ messages in thread
From: Eric Woudstra @ 2025-03-15 19:59 UTC (permalink / raw)
To: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Simon Horman
Cc: netdev, netfilter-devel, linux-hardening, Eric Woudstra,
Nikolay Aleksandrov
Now info->outdev always equals info->hw_outdev, so the netfilter code can be
further cleaned up by removing:
* hw_outdev from struct nft_forward_info
* out.hw_ifindex from struct nf_flow_route
* out.hw_ifidx from struct flow_offload_tuple
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
include/net/netfilter/nf_flow_table.h | 2 --
net/netfilter/nf_flow_table_core.c | 1 -
net/netfilter/nf_flow_table_offload.c | 2 +-
net/netfilter/nft_flow_offload.c | 4 ----
4 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index d711642e78b5..4ab32fb61865 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -145,7 +145,6 @@ struct flow_offload_tuple {
 		};
 		struct {
 			u32 ifidx;
-			u32 hw_ifidx;
 			u8 h_source[ETH_ALEN];
 			u8 h_dest[ETH_ALEN];
 		} out;
@@ -211,7 +210,6 @@ struct nf_flow_route {
 		} in;
 		struct {
 			u32 ifindex;
-			u32 hw_ifindex;
 			u8 h_source[ETH_ALEN];
 			u8 h_dest[ETH_ALEN];
 		} out;
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 9d8361526f82..1e5d3735c028 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -127,7 +127,6 @@ static int flow_offload_fill_route(struct flow_offload *flow,
 		memcpy(flow_tuple->out.h_source, route->tuple[dir].out.h_source,
 		       ETH_ALEN);
 		flow_tuple->out.ifidx = route->tuple[dir].out.ifindex;
-		flow_tuple->out.hw_ifidx = route->tuple[dir].out.hw_ifindex;
 		dst_release(dst);
 		break;
 	case FLOW_OFFLOAD_XMIT_XFRM:
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 0ec4abded10d..f642d0426f1c 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -555,7 +555,7 @@ static void flow_offload_redirect(struct net *net,
 	switch (this_tuple->xmit_type) {
 	case FLOW_OFFLOAD_XMIT_DIRECT:
 		this_tuple = &flow->tuplehash[dir].tuple;
-		ifindex = this_tuple->out.hw_ifidx;
+		ifindex = this_tuple->out.ifidx;
 		break;
 	case FLOW_OFFLOAD_XMIT_NEIGH:
 		other_tuple = &flow->tuplehash[!dir].tuple;
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index d320b7f5282e..acfdf523bd3b 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -80,7 +80,6 @@ static int nft_dev_fill_forward_path(const struct nf_flow_route *route,
 struct nft_forward_info {
 	const struct net_device *indev;
 	const struct net_device *outdev;
-	const struct net_device *hw_outdev;
 	struct id {
 		__u16 id;
 		__be16 proto;
@@ -159,8 +158,6 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
 	}
 	info->outdev = info->indev;
-	info->hw_outdev = info->indev;
-
 	if (nf_flowtable_hw_offload(flowtable) &&
 	    nft_is_valid_ether_device(info->indev))
 		info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
@@ -212,7 +209,6 @@ static void nft_dev_forward_path(struct nf_flow_route *route,
 		memcpy(route->tuple[dir].out.h_source, info.h_source, ETH_ALEN);
 		memcpy(route->tuple[dir].out.h_dest, info.h_dest, ETH_ALEN);
 		route->tuple[dir].out.ifindex = info.outdev->ifindex;
-		route->tuple[dir].out.hw_ifindex = info.hw_outdev->ifindex;
 		route->tuple[dir].xmit_type = info.xmit_type;
 	}
 }
--
2.47.1
* Re: [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct
2025-03-15 19:59 ` [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct Eric Woudstra
@ 2025-03-18 23:23 ` Pablo Neira Ayuso
2025-03-19 19:37 ` Eric Woudstra
0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2025-03-18 23:23 UTC (permalink / raw)
To: Eric Woudstra
Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Simon Horman,
netdev, netfilter-devel, linux-hardening, Nikolay Aleksandrov
On Sat, Mar 15, 2025 at 08:59:09PM +0100, Eric Woudstra wrote:
> Loosely based on wenxu's patches:
>
> "nf_flow_table_offload: offload the vlan/PPPoE encap in the flowtable".
I remember that patch.
> Fixed double vlan and pppoe packets, almost entirely rewriting the patch.
>
> After this patch, it is possible to transmit packets in the fastpath with
> outgoing encaps, without using vlan- and/or pppoe-devices.
>
> This makes it possible to use more different kinds of network setups.
> For example, when bridge tagging is used to egress vlan tagged
> packets using the forward fastpath. Another example is passing 802.1q
> tagged packets through a bridge using the bridge fastpath.
>
> This also makes the software fastpath process more similar to the
> hardware offloaded fastpath process, where encaps are also pushed.
I am not convinced that making the software flowtable more similar to
hardware is the way to go; we already have to deal with issues in the
flow teardown mechanism (races) to make it more suitable for hardware
offload.
I think the benefit for pppoe is that packets do not go to userspace
anymore, but the pppoe driver could probably be modified to push the
header itself?
As for vlan, this is saving an indirection?
Going in this direction means the flowtable datapath will get more
headers to be pushed in this path in the future, eg. mpls.
Is this also possibly breaking existing setups? eg. xfrm + vlan
devices, but maybe I'm wrong.
> After applying this patch, always info->outdev = info->hw_outdev,
> so the netfilter code can be further cleaned up by removing:
> * hw_outdev from struct nft_forward_info
> * out.hw_ifindex from struct nf_flow_route
> * out.hw_ifidx from struct flow_offload_tuple
>
> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> net/netfilter/nf_flow_table_ip.c | 96 +++++++++++++++++++++++++++++++-
> net/netfilter/nft_flow_offload.c | 6 +-
> 2 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 8cd4cf7ae211..d0c3c459c4d2 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -306,6 +306,92 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
> return false;
> }
>
> +static int nf_flow_vlan_inner_push(struct sk_buff *skb, __be16 proto, u16 id)
> +{
> + struct vlan_hdr *vhdr;
> +
> + if (skb_cow_head(skb, VLAN_HLEN))
> + return -1;
> +
> + __skb_push(skb, VLAN_HLEN);
> + skb_reset_network_header(skb);
> +
> + vhdr = (struct vlan_hdr *)(skb->data);
> + vhdr->h_vlan_TCI = htons(id);
> + vhdr->h_vlan_encapsulated_proto = skb->protocol;
> + skb->protocol = proto;
> +
> + return 0;
> +}
> +
> +static int nf_flow_ppoe_push(struct sk_buff *skb, u16 id)
> +{
> + struct ppp_hdr {
> + struct pppoe_hdr hdr;
> + __be16 proto;
> + } *ph;
> + int data_len = skb->len + 2;
> + __be16 proto;
> +
> + if (skb_cow_head(skb, PPPOE_SES_HLEN))
> + return -1;
> +
> + if (skb->protocol == htons(ETH_P_IP))
> + proto = htons(PPP_IP);
> + else if (skb->protocol == htons(ETH_P_IPV6))
> + proto = htons(PPP_IPV6);
> + else
> + return -1;
> +
> + __skb_push(skb, PPPOE_SES_HLEN);
> + skb_reset_network_header(skb);
> +
> + ph = (struct ppp_hdr *)(skb->data);
> + ph->hdr.ver = 1;
> + ph->hdr.type = 1;
> + ph->hdr.code = 0;
> + ph->hdr.sid = htons(id);
> + ph->hdr.length = htons(data_len);
> + ph->proto = proto;
> + skb->protocol = htons(ETH_P_PPP_SES);
> +
> + return 0;
> +}
> +
> +static int nf_flow_encap_push(struct sk_buff *skb,
> + struct flow_offload_tuple_rhash *tuplehash,
> + unsigned short *type)
> +{
> + int i = 0, ret = 0;
> +
> + if (!tuplehash->tuple.encap_num)
> + return 0;
> +
> + if (tuplehash->tuple.encap[i].proto == htons(ETH_P_8021Q) ||
> + tuplehash->tuple.encap[i].proto == htons(ETH_P_8021AD)) {
> + __vlan_hwaccel_put_tag(skb, tuplehash->tuple.encap[i].proto,
> + tuplehash->tuple.encap[i].id);
> + i++;
> + if (i >= tuplehash->tuple.encap_num)
> + return 0;
> + }
> +
> + switch (tuplehash->tuple.encap[i].proto) {
> + case htons(ETH_P_8021Q):
> + *type = ETH_P_8021Q;
> + ret = nf_flow_vlan_inner_push(skb,
> + tuplehash->tuple.encap[i].proto,
> + tuplehash->tuple.encap[i].id);
> + break;
> + case htons(ETH_P_PPP_SES):
> + *type = ETH_P_PPP_SES;
> + ret = nf_flow_ppoe_push(skb,
> + tuplehash->tuple.encap[i].id);
> + break;
> + }
> + return ret;
> +}
> +
> static void nf_flow_encap_pop(struct sk_buff *skb,
> struct flow_offload_tuple_rhash *tuplehash)
> {
> @@ -335,6 +421,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
>
> static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
> const struct flow_offload_tuple_rhash *tuplehash,
> + struct flow_offload_tuple_rhash *other_tuplehash,
> unsigned short type)
> {
> struct net_device *outdev;
> @@ -343,6 +430,9 @@ static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
> if (!outdev)
> return NF_DROP;
>
> + if (nf_flow_encap_push(skb, other_tuplehash, &type) < 0)
> + return NF_DROP;
> +
> skb->dev = outdev;
> dev_hard_header(skb, skb->dev, type, tuplehash->tuple.out.h_dest,
> tuplehash->tuple.out.h_source, skb->len);
> @@ -462,7 +552,8 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
> ret = NF_STOLEN;
> break;
> case FLOW_OFFLOAD_XMIT_DIRECT:
> - ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IP);
> + ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
> + &flow->tuplehash[!dir], ETH_P_IP);
> if (ret == NF_DROP)
> flow_offload_teardown(flow);
> break;
> @@ -757,7 +848,8 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> ret = NF_STOLEN;
> break;
> case FLOW_OFFLOAD_XMIT_DIRECT:
> - ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IPV6);
> + ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
> + &flow->tuplehash[!dir], ETH_P_IPV6);
> if (ret == NF_DROP)
> flow_offload_teardown(flow);
> break;
> diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> index 221d50223018..d320b7f5282e 100644
> --- a/net/netfilter/nft_flow_offload.c
> +++ b/net/netfilter/nft_flow_offload.c
> @@ -124,13 +124,12 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
> info->indev = NULL;
> break;
> }
> - if (!info->outdev)
> - info->outdev = path->dev;
> info->encap[info->num_encaps].id = path->encap.id;
> info->encap[info->num_encaps].proto = path->encap.proto;
> info->num_encaps++;
> if (path->type == DEV_PATH_PPPOE)
> memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
> + info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
> break;
> case DEV_PATH_BRIDGE:
> if (is_zero_ether_addr(info->h_source))
> @@ -158,8 +157,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
> break;
> }
> }
> - if (!info->outdev)
> - info->outdev = info->indev;
> + info->outdev = info->indev;
>
> info->hw_outdev = info->indev;
>
> --
> 2.47.1
>
* Re: [PATCH v10 nf-next 2/3] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct
2025-03-18 23:23 ` Pablo Neira Ayuso
@ 2025-03-19 19:37 ` Eric Woudstra
0 siblings, 0 replies; 8+ messages in thread
From: Eric Woudstra @ 2025-03-19 19:37 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Simon Horman,
netdev, netfilter-devel, linux-hardening, Nikolay Aleksandrov
On 3/19/25 12:23 AM, Pablo Neira Ayuso wrote:
> On Sat, Mar 15, 2025 at 08:59:09PM +0100, Eric Woudstra wrote:
>> Loosely based on wenxu's patches:
>>
>> "nf_flow_table_offload: offload the vlan/PPPoE encap in the flowtable".
>
> I remember that patch.
>
>> Fixed double vlan and pppoe packets, almost entirely rewriting the patch.
>>
>> After this patch, it is possible to transmit packets in the fastpath with
>> outgoing encaps, without using vlan- and/or pppoe-devices.
>>
>> This makes it possible to use more different kinds of network setups.
>> For example, when bridge tagging is used to egress vlan tagged
>> packets using the forward fastpath. Another example is passing 802.1q
>> tagged packets through a bridge using the bridge fastpath.
>>
>> This also makes the software fastpath process more similar to the
>> hardware offloaded fastpath process, where encaps are also pushed.
>
> I am not convinced that making the software flowtable more similar to
> hardware is the way to go; we already have to deal with issues in the
> flow teardown mechanism (races) to make it more suitable for hardware
> offload.
>
> I think the benefit for pppoe is that packets do not go to userspace
> anymore, but the pppoe driver could probably be modified to push the
> header itself?
>
> As for vlan, this is saving an indirection?
>
> Going in this direction means the flowtable datapath will get more
> headers to be pushed in this path in the future, eg. mpls.
>
> Is this also possibly breaking existing setups? eg. xfrm + vlan
> devices, but maybe I'm wrong.
>
If you do not want to touch the software fastpath, it should be possible
to do it without.
For bridged interfaces, only use the hardware fastpath, not installing a
hook for the software fastpath at all.
Another option is installing the hook (matching the hash, updating the
counter and perhaps calling flow_offload_refresh() and so on), but then
letting traffic continue on the normal path. That is, until the hardware
offload takes over.
In both cases, only allow adding the flowtable if the offload flag is set.
What do you think?
But in all cases (including existing cases in existing code), I think we
need the patches from "[PATCH v10 nf-next 0/3] netfilter: fastpath fixes".
Could you look at these?
* Re: [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr
2025-03-15 19:59 ` [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr Eric Woudstra
@ 2025-03-23 16:48 ` Simon Horman
2025-03-25 6:46 ` Eric Woudstra
0 siblings, 1 reply; 8+ messages in thread
From: Simon Horman @ 2025-03-23 16:48 UTC (permalink / raw)
To: Eric Woudstra
Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
netdev, netfilter-devel, linux-hardening, Nikolay Aleksandrov
On Sat, Mar 15, 2025 at 08:59:08PM +0100, Eric Woudstra wrote:
> Jakub Kicinski suggested following patch:
>
> W=1 C=1 GCC build gives us:
>
> net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
> ../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
> ../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
> 153:29: warning: array of flexible structures
>
> It doesn't like that hdr has a zero-length array which overlaps proto.
> The kernel code doesn't currently need those arrays.
>
> PPPoE connection is functional after applying this patch.
>
> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>
> ---
>
> Split from patch-set: bridge-fastpath and related improvements v9
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Hi Eric,
Perhaps this is due to tooling, but your Signed-off-by line should
appear immediately after the Reviewed-by line. No blank line in between.
And, in particular, the Signed-off-by line should appear above the (first)
scissors ("---"), as if git am is used to apply your patch then the
commit message will be truncated at that point. Which results
in a commit with no signed-off-by line.
FWIW, putting the note about splitting the patch-set below the scissors
looks good to me.
...
* Re: [PATCH v10 nf-next 1/3] net: pppoe: avoid zero-length arrays in struct pppoe_hdr
2025-03-23 16:48 ` Simon Horman
@ 2025-03-25 6:46 ` Eric Woudstra
0 siblings, 0 replies; 8+ messages in thread
From: Eric Woudstra @ 2025-03-25 6:46 UTC (permalink / raw)
To: Simon Horman
Cc: Michal Ostrowski, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
netdev, netfilter-devel, linux-hardening, Nikolay Aleksandrov
On 3/23/25 5:48 PM, Simon Horman wrote:
> On Sat, Mar 15, 2025 at 08:59:08PM +0100, Eric Woudstra wrote:
>> Jakub Kicinski suggested following patch:
>>
>> W=1 C=1 GCC build gives us:
>>
>> net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through
>> ../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h,
>> ../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h:
>> 153:29: warning: array of flexible structures
>>
>> It doesn't like that hdr has a zero-length array which overlaps proto.
>> The kernel code doesn't currently need those arrays.
>>
>> PPPoE connection is functional after applying this patch.
>>
>> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
>>
>> ---
>>
>> Split from patch-set: bridge-fastpath and related improvements v9
>>
>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>
> Hi Eric,
>
> Perhaps this is due to tooling, but your Signed-off-by line should
> appear immediately after the Reviewed-by line. No blank line in between.
>
> And, in particular, the Signed-off-by line should appear above the (first)
> scissors ("---"), as if git am is used to apply your patch then the
> commit message will be truncated at that point. Which results
> in a commit with no signed-off-by line.
>
> FWIW, putting the note about splitting the patch-set below the scissors
> looks good to me.
>
> ...
Thanks, by the time I noticed it, it had already been sent. I've changed
my script, so it should not happen anymore.