* [PATCH RFC v1 net-next 01/12] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
@ 2024-10-13 18:54 ` Eric Woudstra
2024-10-13 18:54 ` [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
` (11 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:54 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Loosely based on wenxu's patches:
"nf_flow_table_offload: offload the vlan/PPPoE encap in the flowtable".
Fixed double vlan and pppoe packets, almost entirely rewriting the patch.
After this patch, it is possible to transmit packets in the fastpath with
outgoing encaps, without using vlan- and/or pppoe-devices.
This makes it possible to use more different kinds of network setups.
For example, when bridge tagging is used to egress vlan tagged
packets using the forward fastpath. Another example is passing 802.1q
tagged packets through a bridge using the bridge fastpath.
This also makes the software fastpath process more similar to the
hardware offloaded fastpath process, where encaps are also pushed.
After applying this patch, always info->outdev = info->hw_outdev,
so the netfilter code can be further cleaned up by removing:
* hw_outdev from struct nft_forward_info
* out.hw_ifindex from struct nf_flow_route
* out.hw_ifidx from struct flow_offload_tuple
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nf_flow_table_ip.c | 96 +++++++++++++++++++++++++++++++-
net/netfilter/nft_flow_offload.c | 6 +-
2 files changed, 96 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 98edcaa37b38..9221ddb6f07a 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -302,6 +302,92 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
return false;
}
+static inline int nf_flow_vlan_inner_push(struct sk_buff *skb, __be16 proto, u16 id)
+{
+ struct vlan_hdr *vhdr;
+
+ if (skb_cow_head(skb, VLAN_HLEN))
+ return -1;
+
+ __skb_push(skb, VLAN_HLEN);
+ skb_reset_network_header(skb);
+
+ vhdr = (struct vlan_hdr *)(skb->data);
+ vhdr->h_vlan_TCI = htons(id);
+ vhdr->h_vlan_encapsulated_proto = skb->protocol;
+ skb->protocol = proto;
+
+ return 0;
+}
+
+static inline int nf_flow_ppoe_push(struct sk_buff *skb, u16 id)
+{
+ struct ppp_hdr {
+ struct pppoe_hdr hdr;
+ __be16 proto;
+ } *ph;
+ int data_len = skb->len + 2;
+ __be16 proto;
+
+ if (skb_cow_head(skb, PPPOE_SES_HLEN))
+ return -1;
+
+ if (skb->protocol == htons(ETH_P_IP))
+ proto = htons(PPP_IP);
+ else if (skb->protocol == htons(ETH_P_IPV6))
+ proto = htons(PPP_IPV6);
+ else
+ return -1;
+
+ __skb_push(skb, PPPOE_SES_HLEN);
+ skb_reset_network_header(skb);
+
+ ph = (struct ppp_hdr *)(skb->data);
+ ph->hdr.ver = 1;
+ ph->hdr.type = 1;
+ ph->hdr.code = 0;
+ ph->hdr.sid = htons(id);
+ ph->hdr.length = htons(data_len);
+ ph->proto = proto;
+ skb->protocol = htons(ETH_P_PPP_SES);
+
+ return 0;
+}
+
+static int nf_flow_encap_push(struct sk_buff *skb,
+ struct flow_offload_tuple_rhash *tuplehash,
+ unsigned short *type)
+{
+ int i = 0, ret = 0;
+
+ if (!tuplehash->tuple.encap_num)
+ return 0;
+
+ if (tuplehash->tuple.encap[i].proto == htons(ETH_P_8021Q) ||
+ tuplehash->tuple.encap[i].proto == htons(ETH_P_8021AD)) {
+ __vlan_hwaccel_put_tag(skb, tuplehash->tuple.encap[i].proto,
+ tuplehash->tuple.encap[i].id);
+ i++;
+ if (i >= tuplehash->tuple.encap_num)
+ return 0;
+ }
+
+ switch (tuplehash->tuple.encap[i].proto) {
+ case htons(ETH_P_8021Q):
+ *type = ETH_P_8021Q;
+ ret = nf_flow_vlan_inner_push(skb,
+ tuplehash->tuple.encap[i].proto,
+ tuplehash->tuple.encap[i].id);
+ break;
+ case htons(ETH_P_PPP_SES):
+ *type = ETH_P_PPP_SES;
+ ret = nf_flow_ppoe_push(skb,
+ tuplehash->tuple.encap[i].id);
+ break;
+ }
+ return ret;
+}
+
static void nf_flow_encap_pop(struct sk_buff *skb,
struct flow_offload_tuple_rhash *tuplehash)
{
@@ -331,6 +417,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
const struct flow_offload_tuple_rhash *tuplehash,
+ struct flow_offload_tuple_rhash *other_tuplehash,
unsigned short type)
{
struct net_device *outdev;
@@ -339,6 +426,9 @@ static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
if (!outdev)
return NF_DROP;
+ if (nf_flow_encap_push(skb, other_tuplehash, &type) < 0)
+ return NF_DROP;
+
skb->dev = outdev;
dev_hard_header(skb, skb->dev, type, tuplehash->tuple.out.h_dest,
tuplehash->tuple.out.h_source, skb->len);
@@ -458,7 +548,8 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
ret = NF_STOLEN;
break;
case FLOW_OFFLOAD_XMIT_DIRECT:
- ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IP);
+ ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
+ &flow->tuplehash[!dir], ETH_P_IP);
if (ret == NF_DROP)
flow_offload_teardown(flow);
break;
@@ -753,7 +844,8 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
ret = NF_STOLEN;
break;
case FLOW_OFFLOAD_XMIT_DIRECT:
- ret = nf_flow_queue_xmit(state->net, skb, tuplehash, ETH_P_IPV6);
+ ret = nf_flow_queue_xmit(state->net, skb, tuplehash,
+ &flow->tuplehash[!dir], ETH_P_IPV6);
if (ret == NF_DROP)
flow_offload_teardown(flow);
break;
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index e8f800788c4a..bb15aa55e6fb 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -124,13 +124,12 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
info->indev = NULL;
break;
}
- if (!info->outdev)
- info->outdev = path->dev;
info->encap[info->num_encaps].id = path->encap.id;
info->encap[info->num_encaps].proto = path->encap.proto;
info->num_encaps++;
if (path->type == DEV_PATH_PPPOE)
memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
+ info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
break;
case DEV_PATH_BRIDGE:
if (is_zero_ether_addr(info->h_source))
@@ -158,8 +157,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
break;
}
}
- if (!info->outdev)
- info->outdev = info->indev;
+ info->outdev = info->indev;
info->hw_outdev = info->indev;
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
2024-10-13 18:54 ` [PATCH RFC v1 net-next 01/12] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct Eric Woudstra
@ 2024-10-13 18:54 ` Eric Woudstra
2024-10-18 13:17 ` Vladimir Oltean
2024-10-13 18:54 ` [PATCH RFC v1 net-next 03/12] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
` (10 subsequent siblings)
12 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:54 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets that are passing a bridge.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/bridge/netfilter/nf_conntrack_bridge.c | 86 ++++++++++++++++++----
1 file changed, 73 insertions(+), 13 deletions(-)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 816bb0fde718..fb2f79396aa0 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -241,56 +241,116 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
struct nf_hook_state bridge_state = *state;
+ __be16 outer_proto, inner_proto;
enum ip_conntrack_info ctinfo;
+ int ret, offset = 0;
struct nf_conn *ct;
- u32 len;
- int ret;
+ u32 len, data_len;
ct = nf_ct_get(skb, &ctinfo);
if ((ct && !nf_ct_is_template(ct)) ||
ctinfo == IP_CT_UNTRACKED)
return NF_ACCEPT;
+ switch (skb->protocol) {
+ case htons(ETH_P_PPP_SES):
+ struct ppp_hdr {
+ struct pppoe_hdr hdr;
+ __be16 proto;
+ } *ph = (struct ppp_hdr *)(skb->data);
+
+ data_len = ntohs(ph->hdr.length) - 2;
+ offset = PPPOE_SES_HLEN;
+ outer_proto = skb->protocol;
+ switch (ph->proto) {
+ case htons(PPP_IP):
+ inner_proto = htons(ETH_P_IP);
+ break;
+ case htons(PPP_IPV6):
+ inner_proto = htons(ETH_P_IPV6);
+ break;
+ default:
+ return NF_ACCEPT;
+ }
+ break;
+ case htons(ETH_P_8021Q):
+ struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data);
+
+ data_len = 0xffffffff;
+ offset = VLAN_HLEN;
+ outer_proto = skb->protocol;
+ inner_proto = vhdr->h_vlan_encapsulated_proto;
+ break;
+ default:
+ data_len = 0xffffffff;
+ break;
+ }
+
+ if (offset) {
+ switch (inner_proto) {
+ case htons(ETH_P_IP):
+ case htons(ETH_P_IPV6):
+ if (!pskb_may_pull(skb, offset))
+ return NF_ACCEPT;
+ skb_pull_rcsum(skb, offset);
+ skb_reset_network_header(skb);
+ skb->protocol = inner_proto;
+ break;
+ default:
+ return NF_ACCEPT;
+ }
+ }
+
+ ret = NF_ACCEPT;
switch (skb->protocol) {
case htons(ETH_P_IP):
if (!pskb_may_pull(skb, sizeof(struct iphdr)))
- return NF_ACCEPT;
+ goto do_not_track;
len = skb_ip_totlen(skb);
+ if (data_len < len)
+ len = data_len;
if (pskb_trim_rcsum(skb, len))
- return NF_ACCEPT;
+ goto do_not_track;
if (nf_ct_br_ip_check(skb))
- return NF_ACCEPT;
+ goto do_not_track;
bridge_state.pf = NFPROTO_IPV4;
ret = nf_ct_br_defrag4(skb, &bridge_state);
break;
case htons(ETH_P_IPV6):
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
- return NF_ACCEPT;
+ goto do_not_track;
len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
+ if (data_len < len)
+ len = data_len;
if (pskb_trim_rcsum(skb, len))
- return NF_ACCEPT;
+ goto do_not_track;
if (nf_ct_br_ipv6_check(skb))
- return NF_ACCEPT;
+ goto do_not_track;
bridge_state.pf = NFPROTO_IPV6;
ret = nf_ct_br_defrag6(skb, &bridge_state);
break;
default:
nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
- return NF_ACCEPT;
+ goto do_not_track;
}
- if (ret != NF_ACCEPT)
- return ret;
+ if (ret == NF_ACCEPT)
+ ret = nf_conntrack_in(skb, &bridge_state);
- return nf_conntrack_in(skb, &bridge_state);
+do_not_track:
+ if (offset) {
+ skb_push_rcsum(skb, offset);
+ skb_reset_network_header(skb);
+ skb->protocol = outer_proto;
+ }
+ return ret;
}
-
static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe
2024-10-13 18:54 ` [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2024-10-18 13:17 ` Vladimir Oltean
2024-10-18 18:53 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Vladimir Oltean @ 2024-10-18 13:17 UTC (permalink / raw)
To: Eric Woudstra
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek
On Sun, Oct 13, 2024 at 08:54:58PM +0200, Eric Woudstra wrote:
> This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
> packets that are passing a bridge.
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
Whatever you choose to do forward with these patches, please squash this
build fix here (you can drop my authorship info and commit message):
From e73315196c3143de2af2fe39e3b0e95391849d6c Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Fri, 18 Oct 2024 13:59:27 +0300
Subject: [PATCH] netfilter: bridge: fix build failures in nf_ct_bridge_pre()
clang-16 fails to build, stating:
net/bridge/netfilter/nf_conntrack_bridge.c:257:3: error: expected expression
struct ppp_hdr {
^
net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
data_len = ntohs(ph->hdr.length) - 2;
^
net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
net/bridge/netfilter/nf_conntrack_bridge.c:265:11: error: use of undeclared identifier 'ph'
switch (ph->proto) {
^
net/bridge/netfilter/nf_conntrack_bridge.c:278:3: error: expected expression
struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data);
^
net/bridge/netfilter/nf_conntrack_bridge.c:283:17: error: use of undeclared identifier 'vhdr'
inner_proto = vhdr->h_vlan_encapsulated_proto;
^
One cannot have variable declarations placed this way in a switch/case
statement, a new scope must be opened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
net/bridge/netfilter/nf_conntrack_bridge.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index fb2f79396aa0..31e2bcd71735 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -253,7 +253,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
return NF_ACCEPT;
switch (skb->protocol) {
- case htons(ETH_P_PPP_SES):
+ case htons(ETH_P_PPP_SES): {
struct ppp_hdr {
struct pppoe_hdr hdr;
__be16 proto;
@@ -273,7 +273,8 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
return NF_ACCEPT;
}
break;
- case htons(ETH_P_8021Q):
+ }
+ case htons(ETH_P_8021Q): {
struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data);
data_len = 0xffffffff;
@@ -281,6 +282,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
outer_proto = skb->protocol;
inner_proto = vhdr->h_vlan_encapsulated_proto;
break;
+ }
default:
data_len = 0xffffffff;
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe
2024-10-18 13:17 ` Vladimir Oltean
@ 2024-10-18 18:53 ` Eric Woudstra
0 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-18 18:53 UTC (permalink / raw)
To: Vladimir Oltean
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek
On 10/18/24 3:17 PM, Vladimir Oltean wrote:
> On Sun, Oct 13, 2024 at 08:54:58PM +0200, Eric Woudstra wrote:
>> This adds the capability to conntrack 802.1ad, QinQ, PPPoE and PPPoE-in-Q
>> packets that are passing a bridge.
>>
>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>> ---
>
> Whatever you choose to do forward with these patches, please squash this
> build fix here (you can drop my authorship info and commit message):
Thanks, I had already fixed the errors from patchwork.kernel.org->checks
for the next version of the rfc patch. This is indeed one of them.
> From e73315196c3143de2af2fe39e3b0e95391849d6c Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Date: Fri, 18 Oct 2024 13:59:27 +0300
> Subject: [PATCH] netfilter: bridge: fix build failures in nf_ct_bridge_pre()
>
> clang-16 fails to build, stating:
>
> net/bridge/netfilter/nf_conntrack_bridge.c:257:3: error: expected expression
> struct ppp_hdr {
> ^
> net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
> data_len = ntohs(ph->hdr.length) - 2;
> ^
> net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
> net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
> net/bridge/netfilter/nf_conntrack_bridge.c:262:20: error: use of undeclared identifier 'ph'
> net/bridge/netfilter/nf_conntrack_bridge.c:265:11: error: use of undeclared identifier 'ph'
> switch (ph->proto) {
> ^
>
> net/bridge/netfilter/nf_conntrack_bridge.c:278:3: error: expected expression
> struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data);
> ^
> net/bridge/netfilter/nf_conntrack_bridge.c:283:17: error: use of undeclared identifier 'vhdr'
> inner_proto = vhdr->h_vlan_encapsulated_proto;
> ^
>
> One cannot have variable declarations placed this way in a switch/case
> statement, a new scope must be opened.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> net/bridge/netfilter/nf_conntrack_bridge.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
> index fb2f79396aa0..31e2bcd71735 100644
> --- a/net/bridge/netfilter/nf_conntrack_bridge.c
> +++ b/net/bridge/netfilter/nf_conntrack_bridge.c
> @@ -253,7 +253,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
> return NF_ACCEPT;
>
> switch (skb->protocol) {
> - case htons(ETH_P_PPP_SES):
> + case htons(ETH_P_PPP_SES): {
> struct ppp_hdr {
> struct pppoe_hdr hdr;
> __be16 proto;
> @@ -273,7 +273,8 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
> return NF_ACCEPT;
> }
> break;
> - case htons(ETH_P_8021Q):
> + }
> + case htons(ETH_P_8021Q): {
> struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data);
>
> data_len = 0xffffffff;
> @@ -281,6 +282,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
> outer_proto = skb->protocol;
> inner_proto = vhdr->h_vlan_encapsulated_proto;
> break;
> + }
> default:
> data_len = 0xffffffff;
> break;
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RFC v1 net-next 03/12] netfilter: nft_chain_filter: Add bridge double vlan and pppoe
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
2024-10-13 18:54 ` [PATCH RFC v1 net-next 01/12] netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit direct Eric Woudstra
2024-10-13 18:54 ` [PATCH RFC v1 net-next 02/12] netfilter: bridge: Add conntrack double vlan and pppoe Eric Woudstra
@ 2024-10-13 18:54 ` Eric Woudstra
2024-10-13 18:55 ` [PATCH RFC v1 net-next 04/12] bridge: br_vlan_fill_forward_path_pvid: Add port to port Eric Woudstra
` (9 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:54 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
This adds the capability to evaluate 802.1ad, QinQ, PPPoE and PPPoE-in-Q
packets in the bridge filter chain.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nft_chain_filter.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 7010541fcca6..91aa3fa43d31 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -232,11 +232,27 @@ nft_do_chain_bridge(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
+ struct ethhdr *ethh = eth_hdr(skb);
struct nft_pktinfo pkt;
+ int thoff;
nft_set_pktinfo(&pkt, skb, state);
- switch (eth_hdr(skb)->h_proto) {
+ switch (ethh->h_proto) {
+ case htons(ETH_P_PPP_SES):
+ thoff = PPPOE_SES_HLEN;
+ ethh += thoff;
+ break;
+ case htons(ETH_P_8021Q):
+ thoff = VLAN_HLEN;
+ ethh += thoff;
+ break;
+ default:
+ thoff = 0;
+ break;
+ }
+
+ switch (ethh->h_proto) {
case htons(ETH_P_IP):
nft_set_pktinfo_ipv4_validate(&pkt);
break;
@@ -248,6 +264,8 @@ nft_do_chain_bridge(void *priv,
break;
}
+ pkt.thoff += thoff;
+
return nft_do_chain(&pkt, priv);
}
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 04/12] bridge: br_vlan_fill_forward_path_pvid: Add port to port
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (2 preceding siblings ...)
2024-10-13 18:54 ` [PATCH RFC v1 net-next 03/12] netfilter: nft_chain_filter: Add bridge " Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-14 6:36 ` Nikolay Aleksandrov
2024-10-13 18:55 ` [PATCH RFC v1 net-next 05/12] bridge: br_fill_forward_path add " Eric Woudstra
` (8 subsequent siblings)
12 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Lookup vlan group from bridge port, if it is passed as argument.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/bridge/br_private.h | 1 +
net/bridge/br_vlan.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d4bedc87b1d8..8da7798f9368 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1581,6 +1581,7 @@ bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
const struct net_bridge_vlan *range_end);
void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
+ struct net_bridge_port *p,
struct net_device_path_ctx *ctx,
struct net_device_path *path);
int br_vlan_fill_forward_path_mode(struct net_bridge *br,
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9c2fffb827ab..1830d7d617cd 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1441,6 +1441,7 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid)
EXPORT_SYMBOL_GPL(br_vlan_get_pvid_rcu);
void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
+ struct net_bridge_port *p,
struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
@@ -1453,7 +1454,10 @@ void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
if (!br_opt_get(br, BROPT_VLAN_ENABLED))
return;
- vg = br_vlan_group(br);
+ if (p)
+ vg = nbp_vlan_group(p);
+ else
+ vg = br_vlan_group(br);
if (idx >= 0 &&
ctx->vlan[idx].proto == br->vlan_proto) {
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 04/12] bridge: br_vlan_fill_forward_path_pvid: Add port to port
2024-10-13 18:55 ` [PATCH RFC v1 net-next 04/12] bridge: br_vlan_fill_forward_path_pvid: Add port to port Eric Woudstra
@ 2024-10-14 6:36 ` Nikolay Aleksandrov
0 siblings, 0 replies; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:36 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 13/10/2024 21:55, Eric Woudstra wrote:
> Lookup vlan group from bridge port, if it is passed as argument.
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> net/bridge/br_private.h | 1 +
> net/bridge/br_vlan.c | 6 +++++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index d4bedc87b1d8..8da7798f9368 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -1581,6 +1581,7 @@ bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
> const struct net_bridge_vlan *range_end);
>
> void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
> + struct net_bridge_port *p,
> struct net_device_path_ctx *ctx,
> struct net_device_path *path);
> int br_vlan_fill_forward_path_mode(struct net_bridge *br,
You haven't updated the !CONFIG_BRIDGE_VLAN_FILTERING version of this helper.
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 9c2fffb827ab..1830d7d617cd 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1441,6 +1441,7 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid)
> EXPORT_SYMBOL_GPL(br_vlan_get_pvid_rcu);
>
> void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
> + struct net_bridge_port *p,
> struct net_device_path_ctx *ctx,
> struct net_device_path *path)
> {
> @@ -1453,7 +1454,10 @@ void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
> if (!br_opt_get(br, BROPT_VLAN_ENABLED))
> return;
>
> - vg = br_vlan_group(br);
> + if (p)
> + vg = nbp_vlan_group(p);
> + else
> + vg = br_vlan_group(br);
>
> if (idx >= 0 &&
> ctx->vlan[idx].proto == br->vlan_proto) {
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RFC v1 net-next 05/12] bridge: br_fill_forward_path add port to port
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (3 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 04/12] bridge: br_vlan_fill_forward_path_pvid: Add port to port Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-14 6:30 ` Nikolay Aleksandrov
2024-10-13 18:55 ` [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path() Eric Woudstra
` (7 subsequent siblings)
12 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
If handed a bridge port, use the bridge master to fill the forward path.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/bridge/br_device.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 26b79feb385d..e242e091b4a6 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -384,15 +384,25 @@ static int br_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct net_bridge_fdb_entry *f;
- struct net_bridge_port *dst;
+ struct net_bridge_port *src, *dst;
+ struct net_device *br_dev;
struct net_bridge *br;
- if (netif_is_bridge_port(ctx->dev))
- return -1;
+ if (netif_is_bridge_port(ctx->dev)) {
+ br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
+ if (!br_dev)
+ return -1;
+
+ br = netdev_priv(br_dev);
- br = netdev_priv(ctx->dev);
+ src = br_port_get_rcu(ctx->dev);
- br_vlan_fill_forward_path_pvid(br, ctx, path);
+ br_vlan_fill_forward_path_pvid(br, src, ctx, path);
+ } else {
+ br = netdev_priv(ctx->dev);
+
+ br_vlan_fill_forward_path_pvid(br, NULL, ctx, path);
+ }
f = br_fdb_find_rcu(br, ctx->daddr, path->bridge.vlan_id);
if (!f)
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 05/12] bridge: br_fill_forward_path add port to port
2024-10-13 18:55 ` [PATCH RFC v1 net-next 05/12] bridge: br_fill_forward_path add " Eric Woudstra
@ 2024-10-14 6:30 ` Nikolay Aleksandrov
0 siblings, 0 replies; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:30 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 13/10/2024 21:55, Eric Woudstra wrote:
> If handed a bridge port, use the bridge master to fill the forward path.
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> net/bridge/br_device.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 26b79feb385d..e242e091b4a6 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -384,15 +384,25 @@ static int br_fill_forward_path(struct net_device_path_ctx *ctx,
> struct net_device_path *path)
> {
> struct net_bridge_fdb_entry *f;
> - struct net_bridge_port *dst;
> + struct net_bridge_port *src, *dst;
> + struct net_device *br_dev;
reverse xmas tree order
> struct net_bridge *br;
>
> - if (netif_is_bridge_port(ctx->dev))
> - return -1;
> + if (netif_is_bridge_port(ctx->dev)) {
> + br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
> + if (!br_dev)
> + return -1;
> +
> + br = netdev_priv(br_dev);
>
> - br = netdev_priv(ctx->dev);
> + src = br_port_get_rcu(ctx->dev);
>
> - br_vlan_fill_forward_path_pvid(br, ctx, path);
> + br_vlan_fill_forward_path_pvid(br, src, ctx, path);
> + } else {
> + br = netdev_priv(ctx->dev);
> +
> + br_vlan_fill_forward_path_pvid(br, NULL, ctx, path);
> + }
>
> f = br_fdb_find_rcu(br, ctx->daddr, path->bridge.vlan_id);
> if (!f)
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path()
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (4 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 05/12] bridge: br_fill_forward_path add " Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-14 6:59 ` Nikolay Aleksandrov
2024-10-13 18:55 ` [PATCH RFC v1 net-next 07/12] netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge() Eric Woudstra
` (6 subsequent siblings)
12 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
New function dev_fill_bridge_path(), similar to dev_fill_forward_path().
It handles starting from a bridge port instead of the bridge master.
The structures ctx and nft_forward_info need to be already filled in with
the (vlan) encaps.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
include/linux/netdevice.h | 2 +
net/core/dev.c | 77 ++++++++++++++++++++++++++++++++-------
2 files changed, 66 insertions(+), 13 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e87b5e488325..9d80f650345e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3069,6 +3069,8 @@ void dev_remove_offload(struct packet_offload *po);
int dev_get_iflink(const struct net_device *dev);
int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
+int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
+ struct net_device_path_stack *stack);
int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
struct net_device_path_stack *stack);
struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
diff --git a/net/core/dev.c b/net/core/dev.c
index cd479f5f22f6..49959c4904fc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -713,44 +713,95 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
return &stack->path[k];
}
-int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
- struct net_device_path_stack *stack)
+static int dev_fill_forward_path_common(struct net_device_path_ctx *ctx,
+ struct net_device_path_stack *stack)
{
const struct net_device *last_dev;
- struct net_device_path_ctx ctx = {
- .dev = dev,
- };
struct net_device_path *path;
int ret = 0;
- memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
- stack->num_paths = 0;
- while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
- last_dev = ctx.dev;
+ while (ctx->dev && ctx->dev->netdev_ops->ndo_fill_forward_path) {
+ last_dev = ctx->dev;
path = dev_fwd_path(stack);
if (!path)
return -1;
memset(path, 0, sizeof(struct net_device_path));
- ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
+ ret = ctx->dev->netdev_ops->ndo_fill_forward_path(ctx, path);
if (ret < 0)
return -1;
- if (WARN_ON_ONCE(last_dev == ctx.dev))
+ if (WARN_ON_ONCE(last_dev == ctx->dev))
return -1;
}
- if (!ctx.dev)
+ if (!ctx->dev)
return ret;
path = dev_fwd_path(stack);
if (!path)
return -1;
path->type = DEV_PATH_ETHERNET;
- path->dev = ctx.dev;
+ path->dev = ctx->dev;
+
+ return ret;
+}
+
+int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
+ struct net_device_path_stack *stack)
+{
+ const struct net_device *last_dev, *br_dev;
+ struct net_device_path *path;
+ int ret = 0;
+
+ stack->num_paths = 0;
+
+ if (!ctx->dev || !netif_is_bridge_port(ctx->dev))
+ return -1;
+
+ br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
+ if (!br_dev || !br_dev->netdev_ops->ndo_fill_forward_path)
+ return -1;
+
+ last_dev = ctx->dev;
+ path = dev_fwd_path(stack);
+ if (!path)
+ return -1;
+
+ memset(path, 0, sizeof(struct net_device_path));
+ ret = br_dev->netdev_ops->ndo_fill_forward_path(ctx, path);
+ if (ret < 0)
+ return -1;
+
+ if (!ctx->dev || WARN_ON_ONCE(last_dev == ctx->dev))
+ return -1;
+
+ if (!netif_is_bridge_master(ctx->dev))
+ return dev_fill_forward_path_common(ctx, stack);
+
+ path = dev_fwd_path(stack);
+ if (!path)
+ return -1;
+ path->type = DEV_PATH_ETHERNET;
+ path->dev = ctx->dev;
return ret;
}
+EXPORT_SYMBOL_GPL(dev_fill_bridge_path);
+
+int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
+ struct net_device_path_stack *stack)
+{
+ struct net_device_path_ctx ctx = {
+ .dev = dev,
+ };
+
+ memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
+
+ stack->num_paths = 0;
+
+ return dev_fill_forward_path_common(&ctx, stack);
+}
EXPORT_SYMBOL_GPL(dev_fill_forward_path);
/**
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path()
2024-10-13 18:55 ` [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path() Eric Woudstra
@ 2024-10-14 6:59 ` Nikolay Aleksandrov
2024-10-14 18:34 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:59 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 13/10/2024 21:55, Eric Woudstra wrote:
> New function dev_fill_bridge_path(), similar to dev_fill_forward_path().
> It handles starting from a bridge port instead of the bridge master.
> The structures ctx and nft_forward_info need to be already filled in with
> the (vlan) encaps.
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> include/linux/netdevice.h | 2 +
> net/core/dev.c | 77 ++++++++++++++++++++++++++++++++-------
> 2 files changed, 66 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e87b5e488325..9d80f650345e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3069,6 +3069,8 @@ void dev_remove_offload(struct packet_offload *po);
>
> int dev_get_iflink(const struct net_device *dev);
> int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
> + struct net_device_path_stack *stack);
> int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
> struct net_device_path_stack *stack);
> struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd479f5f22f6..49959c4904fc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -713,44 +713,95 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
> return &stack->path[k];
> }
>
> -int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
> - struct net_device_path_stack *stack)
> +static int dev_fill_forward_path_common(struct net_device_path_ctx *ctx,
> + struct net_device_path_stack *stack)
> {
> const struct net_device *last_dev;
> - struct net_device_path_ctx ctx = {
> - .dev = dev,
> - };
> struct net_device_path *path;
> int ret = 0;
>
> - memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
> - stack->num_paths = 0;
> - while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
> - last_dev = ctx.dev;
> + while (ctx->dev && ctx->dev->netdev_ops->ndo_fill_forward_path) {
> + last_dev = ctx->dev;
> path = dev_fwd_path(stack);
> if (!path)
> return -1;
>
> memset(path, 0, sizeof(struct net_device_path));
> - ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
> + ret = ctx->dev->netdev_ops->ndo_fill_forward_path(ctx, path);
> if (ret < 0)
> return -1;
>
> - if (WARN_ON_ONCE(last_dev == ctx.dev))
> + if (WARN_ON_ONCE(last_dev == ctx->dev))
> return -1;
> }
>
> - if (!ctx.dev)
> + if (!ctx->dev)
> return ret;
>
> path = dev_fwd_path(stack);
> if (!path)
> return -1;
> path->type = DEV_PATH_ETHERNET;
> - path->dev = ctx.dev;
> + path->dev = ctx->dev;
> +
> + return ret;
> +}
> +
> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
> + struct net_device_path_stack *stack)
> +{
> + const struct net_device *last_dev, *br_dev;
> + struct net_device_path *path;
> + int ret = 0;
> +
> + stack->num_paths = 0;
> +
> + if (!ctx->dev || !netif_is_bridge_port(ctx->dev))
> + return -1;
> +
> + br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
> + if (!br_dev || !br_dev->netdev_ops->ndo_fill_forward_path)
> + return -1;
> +
> + last_dev = ctx->dev;
> + path = dev_fwd_path(stack);
> + if (!path)
> + return -1;
> +
> + memset(path, 0, sizeof(struct net_device_path));
> + ret = br_dev->netdev_ops->ndo_fill_forward_path(ctx, path);
> + if (ret < 0)
> + return -1;
> +
> + if (!ctx->dev || WARN_ON_ONCE(last_dev == ctx->dev))
> + return -1;
> +
> + if (!netif_is_bridge_master(ctx->dev))
hmm, do we expect ctx->dev to be a bridge master? Looking at
br_fill_forward_path, it seems to be == fdb->dst->dev which
should be the target port
> + return dev_fill_forward_path_common(ctx, stack);
> +
> + path = dev_fwd_path(stack);
> + if (!path)
> + return -1;
> + path->type = DEV_PATH_ETHERNET;
> + path->dev = ctx->dev;
>
> return ret;
> }
> +EXPORT_SYMBOL_GPL(dev_fill_bridge_path);
> +
> +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
> + struct net_device_path_stack *stack)
> +{
> + struct net_device_path_ctx ctx = {
> + .dev = dev,
> + };
> +
> + memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
> +
> + stack->num_paths = 0;
> +
> + return dev_fill_forward_path_common(&ctx, stack);
> +}
> EXPORT_SYMBOL_GPL(dev_fill_forward_path);
>
> /**
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path()
2024-10-14 6:59 ` Nikolay Aleksandrov
@ 2024-10-14 18:34 ` Eric Woudstra
2024-10-16 7:43 ` Nikolay Aleksandrov
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-14 18:34 UTC (permalink / raw)
To: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 10/14/24 8:59 AM, Nikolay Aleksandrov wrote:
> On 13/10/2024 21:55, Eric Woudstra wrote:
>> New function dev_fill_bridge_path(), similar to dev_fill_forward_path().
>> It handles starting from a bridge port instead of the bridge master.
>> The structures ctx and nft_forward_info need to be already filled in with
>> the (vlan) encaps.
>>
>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>> ---
>> include/linux/netdevice.h | 2 +
>> net/core/dev.c | 77 ++++++++++++++++++++++++++++++++-------
>> 2 files changed, 66 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index e87b5e488325..9d80f650345e 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3069,6 +3069,8 @@ void dev_remove_offload(struct packet_offload *po);
>>
>> int dev_get_iflink(const struct net_device *dev);
>> int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>> + struct net_device_path_stack *stack);
>> int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>> struct net_device_path_stack *stack);
>> struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cd479f5f22f6..49959c4904fc 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -713,44 +713,95 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
>> return &stack->path[k];
>> }
>>
>> -int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>> - struct net_device_path_stack *stack)
>> +static int dev_fill_forward_path_common(struct net_device_path_ctx *ctx,
>> + struct net_device_path_stack *stack)
>> {
>> const struct net_device *last_dev;
>> - struct net_device_path_ctx ctx = {
>> - .dev = dev,
>> - };
>> struct net_device_path *path;
>> int ret = 0;
>>
>> - memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>> - stack->num_paths = 0;
>> - while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
>> - last_dev = ctx.dev;
>> + while (ctx->dev && ctx->dev->netdev_ops->ndo_fill_forward_path) {
>> + last_dev = ctx->dev;
>> path = dev_fwd_path(stack);
>> if (!path)
>> return -1;
>>
>> memset(path, 0, sizeof(struct net_device_path));
>> - ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
>> + ret = ctx->dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>> if (ret < 0)
>> return -1;
>>
>> - if (WARN_ON_ONCE(last_dev == ctx.dev))
>> + if (WARN_ON_ONCE(last_dev == ctx->dev))
>> return -1;
>> }
>>
>> - if (!ctx.dev)
>> + if (!ctx->dev)
>> return ret;
>>
>> path = dev_fwd_path(stack);
>> if (!path)
>> return -1;
>> path->type = DEV_PATH_ETHERNET;
>> - path->dev = ctx.dev;
>> + path->dev = ctx->dev;
>> +
>> + return ret;
>> +}
>> +
>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>> + struct net_device_path_stack *stack)
>> +{
>> + const struct net_device *last_dev, *br_dev;
>> + struct net_device_path *path;
>> + int ret = 0;
>> +
>> + stack->num_paths = 0;
>> +
>> + if (!ctx->dev || !netif_is_bridge_port(ctx->dev))
>> + return -1;
>> +
>> + br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
>> + if (!br_dev || !br_dev->netdev_ops->ndo_fill_forward_path)
>> + return -1;
>> +
>> + last_dev = ctx->dev;
>> + path = dev_fwd_path(stack);
>> + if (!path)
>> + return -1;
>> +
>> + memset(path, 0, sizeof(struct net_device_path));
>> + ret = br_dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>> + if (ret < 0)
>> + return -1;
>> +
>> + if (!ctx->dev || WARN_ON_ONCE(last_dev == ctx->dev))
>> + return -1;
>> +
>> + if (!netif_is_bridge_master(ctx->dev))
>
> hmm, do we expect ctx->dev to be a bridge master? Looking at
> br_fill_forward_path, it seems to be == fdb->dst->dev which
> should be the target port
It would indeed be very unlikely. It was a left-over from code I wrote,
thinking that here I could handle cascaded bridges (via vlan-device). I
dropped that, since conntrack does not follow this flow.
So would it be better to only make sure that ctx->dev is not a bridge
master?
if (netif_is_bridge_master(ctx->dev))
return -1;
return dev_fill_forward_path_common(ctx, stack);
>> + return dev_fill_forward_path_common(ctx, stack);
>> +
>> + path = dev_fwd_path(stack);
>> + if (!path)
>> + return -1;
>> + path->type = DEV_PATH_ETHERNET;
>> + path->dev = ctx->dev;
>>
>> return ret;
>> }
>> +EXPORT_SYMBOL_GPL(dev_fill_bridge_path);
>> +
>> +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>> + struct net_device_path_stack *stack)
>> +{
>> + struct net_device_path_ctx ctx = {
>> + .dev = dev,
>> + };
>> +
>> + memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>> +
>> + stack->num_paths = 0;
>> +
>> + return dev_fill_forward_path_common(&ctx, stack);
>> +}
>> EXPORT_SYMBOL_GPL(dev_fill_forward_path);
>>
>> /**
>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path()
2024-10-14 18:34 ` Eric Woudstra
@ 2024-10-16 7:43 ` Nikolay Aleksandrov
2024-10-16 15:57 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-16 7:43 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 14/10/2024 21:34, Eric Woudstra wrote:
>
>
> On 10/14/24 8:59 AM, Nikolay Aleksandrov wrote:
>> On 13/10/2024 21:55, Eric Woudstra wrote:
>>> New function dev_fill_bridge_path(), similar to dev_fill_forward_path().
>>> It handles starting from a bridge port instead of the bridge master.
>>> The structures ctx and nft_forward_info need to be already filled in with
>>> the (vlan) encaps.
>>>
>>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>>> ---
>>> include/linux/netdevice.h | 2 +
>>> net/core/dev.c | 77 ++++++++++++++++++++++++++++++++-------
>>> 2 files changed, 66 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index e87b5e488325..9d80f650345e 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -3069,6 +3069,8 @@ void dev_remove_offload(struct packet_offload *po);
>>>
>>> int dev_get_iflink(const struct net_device *dev);
>>> int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
>>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>>> + struct net_device_path_stack *stack);
>>> int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>> struct net_device_path_stack *stack);
>>> struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index cd479f5f22f6..49959c4904fc 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -713,44 +713,95 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
>>> return &stack->path[k];
>>> }
>>>
>>> -int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>> - struct net_device_path_stack *stack)
>>> +static int dev_fill_forward_path_common(struct net_device_path_ctx *ctx,
>>> + struct net_device_path_stack *stack)
>>> {
>>> const struct net_device *last_dev;
>>> - struct net_device_path_ctx ctx = {
>>> - .dev = dev,
>>> - };
>>> struct net_device_path *path;
>>> int ret = 0;
>>>
>>> - memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>>> - stack->num_paths = 0;
>>> - while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
>>> - last_dev = ctx.dev;
>>> + while (ctx->dev && ctx->dev->netdev_ops->ndo_fill_forward_path) {
>>> + last_dev = ctx->dev;
>>> path = dev_fwd_path(stack);
>>> if (!path)
>>> return -1;
>>>
>>> memset(path, 0, sizeof(struct net_device_path));
>>> - ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
>>> + ret = ctx->dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>>> if (ret < 0)
>>> return -1;
>>>
>>> - if (WARN_ON_ONCE(last_dev == ctx.dev))
>>> + if (WARN_ON_ONCE(last_dev == ctx->dev))
>>> return -1;
>>> }
>>>
>>> - if (!ctx.dev)
>>> + if (!ctx->dev)
>>> return ret;
>>>
>>> path = dev_fwd_path(stack);
>>> if (!path)
>>> return -1;
>>> path->type = DEV_PATH_ETHERNET;
>>> - path->dev = ctx.dev;
>>> + path->dev = ctx->dev;
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>>> + struct net_device_path_stack *stack)
>>> +{
>>> + const struct net_device *last_dev, *br_dev;
>>> + struct net_device_path *path;
>>> + int ret = 0;
>>> +
>>> + stack->num_paths = 0;
>>> +
>>> + if (!ctx->dev || !netif_is_bridge_port(ctx->dev))
>>> + return -1;
>>> +
>>> + br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
>>> + if (!br_dev || !br_dev->netdev_ops->ndo_fill_forward_path)
>>> + return -1;
>>> +
>>> + last_dev = ctx->dev;
>>> + path = dev_fwd_path(stack);
>>> + if (!path)
>>> + return -1;
>>> +
>>> + memset(path, 0, sizeof(struct net_device_path));
>>> + ret = br_dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>>> + if (ret < 0)
>>> + return -1;
>>> +
>>> + if (!ctx->dev || WARN_ON_ONCE(last_dev == ctx->dev))
>>> + return -1;
* ^^^^^^^^^ here
>>> +
>>> + if (!netif_is_bridge_master(ctx->dev))
>>
>> hmm, do we expect ctx->dev to be a bridge master? Looking at
>> br_fill_forward_path, it seems to be == fdb->dst->dev which
>> should be the target port
>
> It would indeed be very unlikely. It was a left-over from code I wrote,
> thinking that here I could handle cascaded bridges (via vlan-device). I
> dropped that, since conntrack does not follow this flow.
>
> So would it be better to only make sure that ctx->dev is not a bridge
> master?
>
> if (netif_is_bridge_master(ctx->dev))
> return -1;
>
> return dev_fill_forward_path_common(ctx, stack);
>
I think you misunderstood me, I don't think ctx->dev can ever be a bridge
device because ctx->dev gets set to fdb->dst and fdbs that point to the bridge
itself have fdb->dst == NULL but ctx->dev is checked against NULL earlier*
so the bridge dev check doesn't make sense to me.
>>> + return dev_fill_forward_path_common(ctx, stack);
>>> +
>>> + path = dev_fwd_path(stack);
>>> + if (!path)
>>> + return -1;
>>> + path->type = DEV_PATH_ETHERNET;
>>> + path->dev = ctx->dev;
>>>
>>> return ret;
>>> }
>>> +EXPORT_SYMBOL_GPL(dev_fill_bridge_path);
>>> +
>>> +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>> + struct net_device_path_stack *stack)
>>> +{
>>> + struct net_device_path_ctx ctx = {
>>> + .dev = dev,
>>> + };
>>> +
>>> + memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>>> +
>>> + stack->num_paths = 0;
>>> +
>>> + return dev_fill_forward_path_common(&ctx, stack);
>>> +}
>>> EXPORT_SYMBOL_GPL(dev_fill_forward_path);
>>>
>>> /**
>>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path()
2024-10-16 7:43 ` Nikolay Aleksandrov
@ 2024-10-16 15:57 ` Eric Woudstra
0 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-16 15:57 UTC (permalink / raw)
To: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 10/16/24 9:43 AM, Nikolay Aleksandrov wrote:
> On 14/10/2024 21:34, Eric Woudstra wrote:
>>
>>
>> On 10/14/24 8:59 AM, Nikolay Aleksandrov wrote:
>>> On 13/10/2024 21:55, Eric Woudstra wrote:
>>>> New function dev_fill_bridge_path(), similar to dev_fill_forward_path().
>>>> It handles starting from a bridge port instead of the bridge master.
>>>> The structures ctx and nft_forward_info need to be already filled in with
>>>> the (vlan) encaps.
>>>>
>>>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>>>> ---
>>>> include/linux/netdevice.h | 2 +
>>>> net/core/dev.c | 77 ++++++++++++++++++++++++++++++++-------
>>>> 2 files changed, 66 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index e87b5e488325..9d80f650345e 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -3069,6 +3069,8 @@ void dev_remove_offload(struct packet_offload *po);
>>>>
>>>> int dev_get_iflink(const struct net_device *dev);
>>>> int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
>>>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>>>> + struct net_device_path_stack *stack);
>>>> int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>>> struct net_device_path_stack *stack);
>>>> struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index cd479f5f22f6..49959c4904fc 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -713,44 +713,95 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
>>>> return &stack->path[k];
>>>> }
>>>>
>>>> -int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>>> - struct net_device_path_stack *stack)
>>>> +static int dev_fill_forward_path_common(struct net_device_path_ctx *ctx,
>>>> + struct net_device_path_stack *stack)
>>>> {
>>>> const struct net_device *last_dev;
>>>> - struct net_device_path_ctx ctx = {
>>>> - .dev = dev,
>>>> - };
>>>> struct net_device_path *path;
>>>> int ret = 0;
>>>>
>>>> - memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>>>> - stack->num_paths = 0;
>>>> - while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
>>>> - last_dev = ctx.dev;
>>>> + while (ctx->dev && ctx->dev->netdev_ops->ndo_fill_forward_path) {
>>>> + last_dev = ctx->dev;
>>>> path = dev_fwd_path(stack);
>>>> if (!path)
>>>> return -1;
>>>>
>>>> memset(path, 0, sizeof(struct net_device_path));
>>>> - ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
>>>> + ret = ctx->dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>>>> if (ret < 0)
>>>> return -1;
>>>>
>>>> - if (WARN_ON_ONCE(last_dev == ctx.dev))
>>>> + if (WARN_ON_ONCE(last_dev == ctx->dev))
>>>> return -1;
>>>> }
>>>>
>>>> - if (!ctx.dev)
>>>> + if (!ctx->dev)
>>>> return ret;
>>>>
>>>> path = dev_fwd_path(stack);
>>>> if (!path)
>>>> return -1;
>>>> path->type = DEV_PATH_ETHERNET;
>>>> - path->dev = ctx.dev;
>>>> + path->dev = ctx->dev;
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +int dev_fill_bridge_path(struct net_device_path_ctx *ctx,
>>>> + struct net_device_path_stack *stack)
>>>> +{
>>>> + const struct net_device *last_dev, *br_dev;
>>>> + struct net_device_path *path;
>>>> + int ret = 0;
>>>> +
>>>> + stack->num_paths = 0;
>>>> +
>>>> + if (!ctx->dev || !netif_is_bridge_port(ctx->dev))
>>>> + return -1;
>>>> +
>>>> + br_dev = netdev_master_upper_dev_get_rcu((struct net_device *)ctx->dev);
>>>> + if (!br_dev || !br_dev->netdev_ops->ndo_fill_forward_path)
>>>> + return -1;
>>>> +
>>>> + last_dev = ctx->dev;
>>>> + path = dev_fwd_path(stack);
>>>> + if (!path)
>>>> + return -1;
>>>> +
>>>> + memset(path, 0, sizeof(struct net_device_path));
>>>> + ret = br_dev->netdev_ops->ndo_fill_forward_path(ctx, path);
>>>> + if (ret < 0)
>>>> + return -1;
>>>> +
>>>> + if (!ctx->dev || WARN_ON_ONCE(last_dev == ctx->dev))
>>>> + return -1;
>
> * ^^^^^^^^^ here
>
>>>> +
>>>> + if (!netif_is_bridge_master(ctx->dev))
>>>
>>> hmm, do we expect ctx->dev to be a bridge master? Looking at
>>> br_fill_forward_path, it seems to be == fdb->dst->dev which
>>> should be the target port
>>
>> It would indeed be very unlikely. It was a left-over from code I wrote,
>> thinking that here I could handle cascaded bridges (via vlan-device). I
>> dropped that, since conntrack does not follow this flow.
>>
>> So would it be better to only make sure that ctx->dev is not a bridge
>> master?
>>
>> if (netif_is_bridge_master(ctx->dev))
>> return -1;
>>
>> return dev_fill_forward_path_common(ctx, stack);
>>
>
> I think you misunderstood me, I don't think ctx->dev can ever be a bridge
> device because ctx->dev gets set to fdb->dst and fdbs that point to the bridge
> itself have fdb->dst == NULL but ctx->dev is checked against NULL earlier*
> so the bridge dev check doesn't make sense to me.
I see, thanks. I'll drop the check in v2.
>>>> + return dev_fill_forward_path_common(ctx, stack);
>>>> +
>>>> + path = dev_fwd_path(stack);
>>>> + if (!path)
>>>> + return -1;
>>>> + path->type = DEV_PATH_ETHERNET;
>>>> + path->dev = ctx->dev;
>>>>
>>>> return ret;
>>>> }
>>>> +EXPORT_SYMBOL_GPL(dev_fill_bridge_path);
>>>> +
>>>> +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
>>>> + struct net_device_path_stack *stack)
>>>> +{
>>>> + struct net_device_path_ctx ctx = {
>>>> + .dev = dev,
>>>> + };
>>>> +
>>>> + memcpy(ctx.daddr, daddr, sizeof(ctx.daddr));
>>>> +
>>>> + stack->num_paths = 0;
>>>> +
>>>> + return dev_fill_forward_path_common(&ctx, stack);
>>>> +}
>>>> EXPORT_SYMBOL_GPL(dev_fill_forward_path);
>>>>
>>>> /**
>>>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RFC v1 net-next 07/12] netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge()
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (5 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 06/12] net: core: dev: Add dev_fill_bridge_path() Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-13 18:55 ` [PATCH RFC v1 net-next 08/12] netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge Eric Woudstra
` (5 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Add nf_flow_rule_bridge().
It only calls the common rule and adds the redirect.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
include/net/netfilter/nf_flow_table.h | 3 +++
net/netfilter/nf_flow_table_offload.c | 13 +++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index b63d53bb9dd6..568019a3898a 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -341,6 +341,9 @@ void nf_flow_table_offload_flush_cleanup(struct nf_flowtable *flowtable);
int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
struct net_device *dev,
enum flow_block_command cmd);
+int nf_flow_rule_bridge(struct net *net, struct flow_offload *flow,
+ enum flow_offload_tuple_dir dir,
+ struct nf_flow_rule *flow_rule);
int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index e06bc36f49fe..5543ce03a196 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -679,6 +679,19 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow,
return 0;
}
+int nf_flow_rule_bridge(struct net *net, struct flow_offload *flow,
+ enum flow_offload_tuple_dir dir,
+ struct nf_flow_rule *flow_rule)
+{
+ if (nf_flow_rule_route_common(net, flow, dir, flow_rule) < 0)
+ return -1;
+
+ flow_offload_redirect(net, flow, dir, flow_rule);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nf_flow_rule_bridge);
+
int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 08/12] netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (6 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 07/12] netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge() Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-13 18:55 ` [PATCH RFC v1 net-next 09/12] netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate Eric Woudstra
` (4 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
This will allow a flowtable to be added to the nft bridge family.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nf_flow_table_inet.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index b0f199171932..80b238196f29 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -65,6 +65,16 @@ static int nf_flow_rule_route_inet(struct net *net,
return err;
}
+static struct nf_flowtable_type flowtable_bridge = {
+ .family = NFPROTO_BRIDGE,
+ .init = nf_flow_table_init,
+ .setup = nf_flow_table_offload_setup,
+ .action = nf_flow_rule_bridge,
+ .free = nf_flow_table_free,
+ .hook = nf_flow_offload_inet_hook,
+ .owner = THIS_MODULE,
+};
+
static struct nf_flowtable_type flowtable_inet = {
.family = NFPROTO_INET,
.init = nf_flow_table_init,
@@ -97,6 +107,7 @@ static struct nf_flowtable_type flowtable_ipv6 = {
static int __init nf_flow_inet_module_init(void)
{
+ nft_register_flowtable_type(&flowtable_bridge);
nft_register_flowtable_type(&flowtable_ipv4);
nft_register_flowtable_type(&flowtable_ipv6);
nft_register_flowtable_type(&flowtable_inet);
@@ -109,6 +120,7 @@ static void __exit nf_flow_inet_module_exit(void)
nft_unregister_flowtable_type(&flowtable_inet);
nft_unregister_flowtable_type(&flowtable_ipv6);
nft_unregister_flowtable_type(&flowtable_ipv4);
+ nft_unregister_flowtable_type(&flowtable_bridge);
}
module_init(nf_flow_inet_module_init);
@@ -118,5 +130,6 @@ MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
MODULE_ALIAS_NF_FLOWTABLE(AF_INET);
MODULE_ALIAS_NF_FLOWTABLE(AF_INET6);
+MODULE_ALIAS_NF_FLOWTABLE(AF_BRIDGE);
MODULE_ALIAS_NF_FLOWTABLE(1); /* NFPROTO_INET */
MODULE_DESCRIPTION("Netfilter flow table mixed IPv4/IPv6 module");
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 09/12] netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (7 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 08/12] netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-13 18:55 ` [PATCH RFC v1 net-next 10/12] netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to nft_dev_path_info() Eric Woudstra
` (3 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Need to add NFPROTO_BRIDGE to nft_flow_offload_validate() to support
the bridge-fastpath.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nft_flow_offload.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index bb15aa55e6fb..6719a810e9b5 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -421,7 +421,8 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 &&
ctx->family != NFPROTO_IPV6 &&
- ctx->family != NFPROTO_INET)
+ ctx->family != NFPROTO_INET &&
+ ctx->family != NFPROTO_BRIDGE)
return -EOPNOTSUPP;
return nft_chain_validate_hooks(ctx->chain, hook_mask);
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 10/12] netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to nft_dev_path_info()
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (8 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 09/12] netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-13 18:55 ` [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa Eric Woudstra
` (2 subsequent siblings)
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
In case of using mediatek wireless, in nft_dev_fill_forward_path(), the
forward path is filled, ending with mediatek wlan1.
Because DEV_PATH_MTK_WDMA is unknown inside nft_dev_path_info() it returns
with info.indev = NULL. Then nft_dev_forward_path() returns without
setting the direct transmit parameters.
This results in a neighbor transmit, and direct transmit not possible.
But we want to use it for flow between bridged interfaces.
So this patch adds DEV_PATH_MTK_WDMA to nft_dev_path_info() and makes
direct transmission possible.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nft_flow_offload.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 6719a810e9b5..2923286d475e 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -106,6 +106,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
switch (path->type) {
case DEV_PATH_ETHERNET:
case DEV_PATH_DSA:
+ case DEV_PATH_MTK_WDMA:
case DEV_PATH_VLAN:
case DEV_PATH_PPPOE:
info->indev = path->dev;
@@ -114,7 +115,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
if (path->type == DEV_PATH_ETHERNET)
break;
- if (path->type == DEV_PATH_DSA) {
+ if (path->type == DEV_PATH_DSA || path->type == DEV_PATH_MTK_WDMA) {
i = stack->num_paths;
break;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (9 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 10/12] netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to nft_dev_path_info() Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-14 6:18 ` Nikolay Aleksandrov
2024-10-13 18:55 ` [PATCH RFC v1 net-next 12/12] netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval() Eric Woudstra
2024-10-14 6:35 ` [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Nikolay Aleksandrov
12 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
In network setup as below:
fastpath bypass
.----------------------------------------.
/ \
| IP - forwarding |
| / \ v
| / wan ...
| /
| |
| |
| brlan.1
| |
| +-------------------------------+
| | vlan 1 |
| | |
| | brlan (vlan-filtering) |
| | +---------------+
| | | DSA-SWITCH |
| | vlan 1 | |
| | to | |
| | untagged 1 vlan 1 |
| +---------------+---------------+
. / \
----->wlan1 lan0
. .
. ^
^ vlan 1 tagged packets
untagged packets
Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
path is filled also when ending with the mediatek wlan1, info.indev not
NULL now in nft_dev_forward_path(). This results in a direct transmit
instead of a neighbor transmit. This is how it should be, But this fails.
br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
filling in from brlan.1 towards wlan1. But it should be set to
DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
is not correct. The dsa switchdev adds it as a foreign port.
Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
set when there is a dsa-switch inside the bridge.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/bridge/br_private.h | 1 +
net/bridge/br_vlan.c | 18 +++++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8da7798f9368..7d427214cc7c 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -180,6 +180,7 @@ enum {
BR_VLFLAG_MCAST_ENABLED = BIT(2),
BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
+ BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
};
/**
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 1830d7d617cd..b7877724b969 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -3,6 +3,7 @@
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
+#include <net/dsa.h>
#include <net/switchdev.h>
#include "br_private.h"
@@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
__vlan_flags_update(v, flags, true);
}
+static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
+{
+#if IS_ENABLED(CONFIG_NET_DSA)
+ struct net_bridge_port *p;
+
+ list_for_each_entry(p, &br->port_list, list) {
+ if (dsa_user_dev_check(p->dev))
+ return false;
+ }
+#endif
+ return true;
+}
+
static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
struct net_bridge_vlan *v, u16 flags,
struct netlink_ext_ack *extack)
@@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
if (err == -EOPNOTSUPP)
return vlan_vid_add(dev, br->vlan_proto, v->vid);
v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
+ if (br_vlan_tagging_by_switchdev(br))
+ v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
return err;
}
@@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
- else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
+ else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
else
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-13 18:55 ` [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa Eric Woudstra
@ 2024-10-14 6:18 ` Nikolay Aleksandrov
2024-10-14 6:22 ` Nikolay Aleksandrov
0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:18 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 13/10/2024 21:55, Eric Woudstra wrote:
> In network setup as below:
>
> fastpath bypass
> .----------------------------------------.
> / \
> | IP - forwarding |
> | / \ v
> | / wan ...
> | /
> | |
> | |
> | brlan.1
> | |
> | +-------------------------------+
> | | vlan 1 |
> | | |
> | | brlan (vlan-filtering) |
> | | +---------------+
> | | | DSA-SWITCH |
> | | vlan 1 | |
> | | to | |
> | | untagged 1 vlan 1 |
> | +---------------+---------------+
> . / \
> ----->wlan1 lan0
> . .
> . ^
> ^ vlan 1 tagged packets
> untagged packets
>
> Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
> path is filled also when ending with the mediatek wlan1, info.indev not
> NULL now in nft_dev_forward_path(). This results in a direct transmit
> instead of a neighbor transmit. This is how it should be, But this fails.
>
> br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
> filling in from brlan.1 towards wlan1. But it should be set to
> DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
> is not correct. The dsa switchdev adds it as a foreign port.
>
> Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
> set when there is a dsa-switch inside the bridge.
>
> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> ---
> net/bridge/br_private.h | 1 +
> net/bridge/br_vlan.c | 18 +++++++++++++++++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 8da7798f9368..7d427214cc7c 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -180,6 +180,7 @@ enum {
> BR_VLFLAG_MCAST_ENABLED = BIT(2),
> BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
> BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
> + BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
> };
>
> /**
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 1830d7d617cd..b7877724b969 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -3,6 +3,7 @@
> #include <linux/netdevice.h>
> #include <linux/rtnetlink.h>
> #include <linux/slab.h>
> +#include <net/dsa.h>
> #include <net/switchdev.h>
>
> #include "br_private.h"
> @@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
> __vlan_flags_update(v, flags, true);
> }
>
> +static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
no inline in .c files and also constify br
> +{
> +#if IS_ENABLED(CONFIG_NET_DSA)
> + struct net_bridge_port *p;
> +
> + list_for_each_entry(p, &br->port_list, list) {
> + if (dsa_user_dev_check(p->dev))
I don't think this can change at runtime, so please keep a counter in
the bridge and don't walk the port list on every vlan add.
> + return false;
> + }
> +#endif
> + return true;
> +}
> +
> static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
> struct net_bridge_vlan *v, u16 flags,
> struct netlink_ext_ack *extack)
> @@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
> if (err == -EOPNOTSUPP)
> return vlan_vid_add(dev, br->vlan_proto, v->vid);
> v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
> + if (br_vlan_tagging_by_switchdev(br))
> + v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
> return err;
> }
>
> @@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
>
> if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
> - else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
> + else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
> else
> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-14 6:18 ` Nikolay Aleksandrov
@ 2024-10-14 6:22 ` Nikolay Aleksandrov
2024-10-14 14:46 ` Vladimir Oltean
0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:22 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 14/10/2024 09:18, Nikolay Aleksandrov wrote:
> On 13/10/2024 21:55, Eric Woudstra wrote:
>> In network setup as below:
>>
>> fastpath bypass
>> .----------------------------------------.
>> / \
>> | IP - forwarding |
>> | / \ v
>> | / wan ...
>> | /
>> | |
>> | |
>> | brlan.1
>> | |
>> | +-------------------------------+
>> | | vlan 1 |
>> | | |
>> | | brlan (vlan-filtering) |
>> | | +---------------+
>> | | | DSA-SWITCH |
>> | | vlan 1 | |
>> | | to | |
>> | | untagged 1 vlan 1 |
>> | +---------------+---------------+
>> . / \
>> ----->wlan1 lan0
>> . .
>> . ^
>> ^ vlan 1 tagged packets
>> untagged packets
>>
>> Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
>> path is filled also when ending with the mediatek wlan1, info.indev not
>> NULL now in nft_dev_forward_path(). This results in a direct transmit
>> instead of a neighbor transmit. This is how it should be, But this fails.
>>
>> br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
>> filling in from brlan.1 towards wlan1. But it should be set to
>> DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
>> is not correct. The dsa switchdev adds it as a foreign port.
>>
>> Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
>> set when there is a dsa-switch inside the bridge.
>>
>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>> ---
>> net/bridge/br_private.h | 1 +
>> net/bridge/br_vlan.c | 18 +++++++++++++++++-
>> 2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>> index 8da7798f9368..7d427214cc7c 100644
>> --- a/net/bridge/br_private.h
>> +++ b/net/bridge/br_private.h
>> @@ -180,6 +180,7 @@ enum {
>> BR_VLFLAG_MCAST_ENABLED = BIT(2),
>> BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
>> BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
>> + BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
>> };
>>
>> /**
>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>> index 1830d7d617cd..b7877724b969 100644
>> --- a/net/bridge/br_vlan.c
>> +++ b/net/bridge/br_vlan.c
>> @@ -3,6 +3,7 @@
>> #include <linux/netdevice.h>
>> #include <linux/rtnetlink.h>
>> #include <linux/slab.h>
>> +#include <net/dsa.h>
>> #include <net/switchdev.h>
>>
>> #include "br_private.h"
>> @@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
>> __vlan_flags_update(v, flags, true);
>> }
>>
>> +static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
>
> no inline in .c files and also constify br
>
>> +{
>> +#if IS_ENABLED(CONFIG_NET_DSA)
>> + struct net_bridge_port *p;
>> +
>> + list_for_each_entry(p, &br->port_list, list) {
>> + if (dsa_user_dev_check(p->dev))
>
> I don't think this can change at runtime, so please keep a counter in
> the bridge and don't walk the port list on every vlan add.
>
you can use an internal bridge opt (check br_private.h) with a private opt
that's set when such device is added as a port, no need for a full counter
obviously
>> + return false;
>> + }
>> +#endif
>> + return true;
>> +}
>> +
>> static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>> struct net_bridge_vlan *v, u16 flags,
>> struct netlink_ext_ack *extack)
>> @@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>> if (err == -EOPNOTSUPP)
>> return vlan_vid_add(dev, br->vlan_proto, v->vid);
>> v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
>> + if (br_vlan_tagging_by_switchdev(br))
>> + v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
>> return err;
>> }
>>
>> @@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
>>
>> if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
>> - else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
>> + else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
>> else
>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-14 6:22 ` Nikolay Aleksandrov
@ 2024-10-14 14:46 ` Vladimir Oltean
2024-10-15 10:26 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Vladimir Oltean @ 2024-10-14 14:46 UTC (permalink / raw)
To: Eric Woudstra
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek, Andrew Lunn, Florian Fainelli
Keeping the full email body untrimmed for extra context for the newly
added people.
On Mon, Oct 14, 2024 at 09:22:07AM +0300, Nikolay Aleksandrov wrote:
> On 14/10/2024 09:18, Nikolay Aleksandrov wrote:
> > On 13/10/2024 21:55, Eric Woudstra wrote:
> >> In network setup as below:
> >>
> >> fastpath bypass
> >> .----------------------------------------.
> >> / \
> >> | IP - forwarding |
> >> | / \ v
> >> | / wan ...
> >> | /
> >> | |
> >> | |
> >> | brlan.1
> >> | |
> >> | +-------------------------------+
> >> | | vlan 1 |
> >> | | |
> >> | | brlan (vlan-filtering) |
> >> | | +---------------+
> >> | | | DSA-SWITCH |
> >> | | vlan 1 | |
> >> | | to | |
> >> | | untagged 1 vlan 1 |
> >> | +---------------+---------------+
> >> . / \
> >> ----->wlan1 lan0
> >> . .
> >> . ^
> >> ^ vlan 1 tagged packets
> >> untagged packets
> >>
> >> Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
> >> path is filled also when ending with the mediatek wlan1, info.indev not
> >> NULL now in nft_dev_forward_path(). This results in a direct transmit
> >> instead of a neighbor transmit. This is how it should be, But this fails.
> >>
> >> br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
> >> filling in from brlan.1 towards wlan1. But it should be set to
> >> DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
> >> is not correct. The dsa switchdev adds it as a foreign port.
> >>
> >> Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
> >> set when there is a dsa-switch inside the bridge.
> >>
> >> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
> >> ---
> >> net/bridge/br_private.h | 1 +
> >> net/bridge/br_vlan.c | 18 +++++++++++++++++-
> >> 2 files changed, 18 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> >> index 8da7798f9368..7d427214cc7c 100644
> >> --- a/net/bridge/br_private.h
> >> +++ b/net/bridge/br_private.h
> >> @@ -180,6 +180,7 @@ enum {
> >> BR_VLFLAG_MCAST_ENABLED = BIT(2),
> >> BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
> >> BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
> >> + BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
> >> };
> >>
> >> /**
> >> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> >> index 1830d7d617cd..b7877724b969 100644
> >> --- a/net/bridge/br_vlan.c
> >> +++ b/net/bridge/br_vlan.c
> >> @@ -3,6 +3,7 @@
> >> #include <linux/netdevice.h>
> >> #include <linux/rtnetlink.h>
> >> #include <linux/slab.h>
> >> +#include <net/dsa.h>
> >> #include <net/switchdev.h>
> >>
> >> #include "br_private.h"
> >> @@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
> >> __vlan_flags_update(v, flags, true);
> >> }
> >>
> >> +static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
> >
> > no inline in .c files and also constify br
> >
> >> +{
> >> +#if IS_ENABLED(CONFIG_NET_DSA)
> >> + struct net_bridge_port *p;
> >> +
> >> + list_for_each_entry(p, &br->port_list, list) {
> >> + if (dsa_user_dev_check(p->dev))
> >
> > I don't think this can change at runtime, so please keep a counter in
> > the bridge and don't walk the port list on every vlan add.
> >
>
> you can use an internal bridge opt (check br_private.h) with a private opt
> that's set when such device is added as a port, no need for a full counter
> obviously
To continue on Nikolay's line of thought...
Can you abstractly describe which functional behavior do you need the
bridge port to perform, rather than "it needs to be a DSA user port"?
switchdev_bridge_port_offload() has a mechanism to inform the bridge
core of extra abilities (like tx_fwd_offload). Perhaps you could modify
the DSA drivers you need to set a similar bit to inform the bridge of
their presence and ability. That would also work when the bridge port is
a LAG over a DSA user port.
Also, please also CC DSA maintainers when you use DSA API outside
net/dsa/ and drivers/net/dsa/. I am in the process of revamping the
public DSA API and would like to be in touch with changes as they are
made.
> >> + return false;
> >> + }
> >> +#endif
> >> + return true;
> >> +}
> >> +
> >> static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
> >> struct net_bridge_vlan *v, u16 flags,
> >> struct netlink_ext_ack *extack)
> >> @@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
> >> if (err == -EOPNOTSUPP)
> >> return vlan_vid_add(dev, br->vlan_proto, v->vid);
> >> v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
> >> + if (br_vlan_tagging_by_switchdev(br))
> >> + v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
> >> return err;
> >> }
> >>
> >> @@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
> >>
> >> if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
> >> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
> >> - else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
> >> + else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
> >> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
> >> else
> >> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
> >
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-14 14:46 ` Vladimir Oltean
@ 2024-10-15 10:26 ` Eric Woudstra
2024-10-20 9:23 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-15 10:26 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek, Andrew Lunn, Florian Fainelli
On 10/14/24 4:46 PM, Vladimir Oltean wrote:
> Keeping the full email body untrimmed for extra context for the newly
> added people.
>
> On Mon, Oct 14, 2024 at 09:22:07AM +0300, Nikolay Aleksandrov wrote:
>> On 14/10/2024 09:18, Nikolay Aleksandrov wrote:
>>> On 13/10/2024 21:55, Eric Woudstra wrote:
>>>> In network setup as below:
>>>>
>>>> fastpath bypass
>>>> .----------------------------------------.
>>>> / \
>>>> | IP - forwarding |
>>>> | / \ v
>>>> | / wan ...
>>>> | /
>>>> | |
>>>> | |
>>>> | brlan.1
>>>> | |
>>>> | +-------------------------------+
>>>> | | vlan 1 |
>>>> | | |
>>>> | | brlan (vlan-filtering) |
>>>> | | +---------------+
>>>> | | | DSA-SWITCH |
>>>> | | vlan 1 | |
>>>> | | to | |
>>>> | | untagged 1 vlan 1 |
>>>> | +---------------+---------------+
>>>> . / \
>>>> ----->wlan1 lan0
>>>> . .
>>>> . ^
>>>> ^ vlan 1 tagged packets
>>>> untagged packets
>>>>
>>>> Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
>>>> path is filled also when ending with the mediatek wlan1, info.indev not
>>>> NULL now in nft_dev_forward_path(). This results in a direct transmit
>>>> instead of a neighbor transmit. This is how it should be, But this fails.
>>>>
>>>> br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
>>>> filling in from brlan.1 towards wlan1. But it should be set to
>>>> DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
>>>> is not correct. The dsa switchdev adds it as a foreign port.
>>>>
>>>> Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
>>>> set when there is a dsa-switch inside the bridge.
>>>>
>>>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>>>> ---
>>>> net/bridge/br_private.h | 1 +
>>>> net/bridge/br_vlan.c | 18 +++++++++++++++++-
>>>> 2 files changed, 18 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>> index 8da7798f9368..7d427214cc7c 100644
>>>> --- a/net/bridge/br_private.h
>>>> +++ b/net/bridge/br_private.h
>>>> @@ -180,6 +180,7 @@ enum {
>>>> BR_VLFLAG_MCAST_ENABLED = BIT(2),
>>>> BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
>>>> BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
>>>> + BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
>>>> };
>>>>
>>>> /**
>>>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>>>> index 1830d7d617cd..b7877724b969 100644
>>>> --- a/net/bridge/br_vlan.c
>>>> +++ b/net/bridge/br_vlan.c
>>>> @@ -3,6 +3,7 @@
>>>> #include <linux/netdevice.h>
>>>> #include <linux/rtnetlink.h>
>>>> #include <linux/slab.h>
>>>> +#include <net/dsa.h>
>>>> #include <net/switchdev.h>
>>>>
>>>> #include "br_private.h"
>>>> @@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
>>>> __vlan_flags_update(v, flags, true);
>>>> }
>>>>
>>>> +static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
>>>
>>> no inline in .c files and also constify br
>>>
>>>> +{
>>>> +#if IS_ENABLED(CONFIG_NET_DSA)
>>>> + struct net_bridge_port *p;
>>>> +
>>>> + list_for_each_entry(p, &br->port_list, list) {
>>>> + if (dsa_user_dev_check(p->dev))
>>>
>>> I don't think this can change at runtime, so please keep a counter in
>>> the bridge and don't walk the port list on every vlan add.
>>>
>>
>> you can use an internal bridge opt (check br_private.h) with a private opt
>> that's set when such device is added as a port, no need for a full counter
>> obviously
>
> To continue on Nikolay's line of thought...
>
> Can you abstractly describe which functional behavior do you need the
> bridge port to perform, rather than "it needs to be a DSA user port"?
Hello Vladimir,
This has to do with the sequence of events, when dealing with vlan
tagging in a bridge. When looking at the ingress of untaggged packets,
that will be tagged by a certain port of that bridge, it depends when
this tagging happens, in respect to the netfilter hook. This describes
the existing code:
Because we are using dev_fill_forward_path(), we know in all cases that
untagged packets arive and are tagged by the bridge.
In the case (#1) off a switchdev, the tagging is done before the packet
reaches the netfilter hook. Then the software fastpath code needs to
handle the packet, as if it is incoming tagged, remembering that this
tag was tagged at ingress (tuple->in_vlan_ingress). We need to remember
this, because dealing with hardware offloaded fastpath we then need to
skip this vlan tag in the hardware offloaded fastpath in
nf_flow_rule_match() and nf_flow_rule_route_common().
In all other cases (#2), 'normal' ports (ports without switchdev and
dsa-user-ports) the tagging is done after packet reaches the
netfilter-hook. Then the software fastpath code needs to handle the
packet, as incoming untagged.
(Of course the dsa-user-port is more complicated, but it all handled by
the dsa architecture so that it can be used as a 'normal' port.)
In both cases #1 and #2 also the other direction is taken into account.
To decide which case to apply, the code looks at
BR_VLFLAG_ADDED_BY_SWITCHDEV, which was actually used for something
else, but it is easily available in the code at that point and seemed to
work, so far...
But as it turns out, this flag is also set for foreign ports added to
the dsa-switchdev. This means that case #1 is applied to the foreign dsa
port. This breaks the software fastpath and, with packets not reaching
their destination, also the hardware offloaded fastpath does not start.
We need to apply case #2.
So actually we need to know, inside br_vlan_fill_forward_path_mode(),
whether or not net_bridge_port *dst is a foreign port added to the dsa
switchdev. Or perhaps there is another way to find out we have to apply
case #2.
> switchdev_bridge_port_offload() has a mechanism to inform the bridge
> core of extra abilities (like tx_fwd_offload). Perhaps you could modify
> the DSA drivers you need to set a similar bit to inform the bridge of
> their presence and ability. That would also work when the bridge port is
> a LAG over a DSA user port.
>
> Also, please also CC DSA maintainers when you use DSA API outside
> net/dsa/ and drivers/net/dsa/. I am in the process of revamping the
> public DSA API and would like to be in touch with changes as they are
> made.
>
>>>> + return false;
>>>> + }
>>>> +#endif
>>>> + return true;
>>>> +}
>>>> +
>>>> static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>>>> struct net_bridge_vlan *v, u16 flags,
>>>> struct netlink_ext_ack *extack)
>>>> @@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>>>> if (err == -EOPNOTSUPP)
>>>> return vlan_vid_add(dev, br->vlan_proto, v->vid);
>>>> v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
>>>> + if (br_vlan_tagging_by_switchdev(br))
>>>> + v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
>>>> return err;
>>>> }
>>>>
>>>> @@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
>>>>
>>>> if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
>>>> - else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
>>>> + else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
>>>> else
>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
>>>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-15 10:26 ` Eric Woudstra
@ 2024-10-20 9:23 ` Eric Woudstra
2024-10-21 13:47 ` Vladimir Oltean
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-20 9:23 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek, Andrew Lunn, Florian Fainelli
On 10/15/24 12:26 PM, Eric Woudstra wrote:
>
>
> On 10/14/24 4:46 PM, Vladimir Oltean wrote:
>> Keeping the full email body untrimmed for extra context for the newly
>> added people.
>>
>> On Mon, Oct 14, 2024 at 09:22:07AM +0300, Nikolay Aleksandrov wrote:
>>> On 14/10/2024 09:18, Nikolay Aleksandrov wrote:
>>>> On 13/10/2024 21:55, Eric Woudstra wrote:
>>>>> In network setup as below:
>>>>>
>>>>> fastpath bypass
>>>>> .----------------------------------------.
>>>>> / \
>>>>> | IP - forwarding |
>>>>> | / \ v
>>>>> | / wan ...
>>>>> | /
>>>>> | |
>>>>> | |
>>>>> | brlan.1
>>>>> | |
>>>>> | +-------------------------------+
>>>>> | | vlan 1 |
>>>>> | | |
>>>>> | | brlan (vlan-filtering) |
>>>>> | | +---------------+
>>>>> | | | DSA-SWITCH |
>>>>> | | vlan 1 | |
>>>>> | | to | |
>>>>> | | untagged 1 vlan 1 |
>>>>> | +---------------+---------------+
>>>>> . / \
>>>>> ----->wlan1 lan0
>>>>> . .
>>>>> . ^
>>>>> ^ vlan 1 tagged packets
>>>>> untagged packets
>>>>>
>>>>> Now that DEV_PATH_MTK_WDMA is added to nft_dev_path_info() the forward
>>>>> path is filled also when ending with the mediatek wlan1, info.indev not
>>>>> NULL now in nft_dev_forward_path(). This results in a direct transmit
>>>>> instead of a neighbor transmit. This is how it should be, But this fails.
>>>>>
>>>>> br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when
>>>>> filling in from brlan.1 towards wlan1. But it should be set to
>>>>> DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV
>>>>> is not correct. The dsa switchdev adds it as a foreign port.
>>>>>
>>>>> Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG is
>>>>> set when there is a dsa-switch inside the bridge.
>>>>>
>>>>> Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
>>>>> ---
>>>>> net/bridge/br_private.h | 1 +
>>>>> net/bridge/br_vlan.c | 18 +++++++++++++++++-
>>>>> 2 files changed, 18 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>>> index 8da7798f9368..7d427214cc7c 100644
>>>>> --- a/net/bridge/br_private.h
>>>>> +++ b/net/bridge/br_private.h
>>>>> @@ -180,6 +180,7 @@ enum {
>>>>> BR_VLFLAG_MCAST_ENABLED = BIT(2),
>>>>> BR_VLFLAG_GLOBAL_MCAST_ENABLED = BIT(3),
>>>>> BR_VLFLAG_NEIGH_SUPPRESS_ENABLED = BIT(4),
>>>>> + BR_VLFLAG_TAGGING_BY_SWITCHDEV = BIT(5),
>>>>> };
>>>>>
>>>>> /**
>>>>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>>>>> index 1830d7d617cd..b7877724b969 100644
>>>>> --- a/net/bridge/br_vlan.c
>>>>> +++ b/net/bridge/br_vlan.c
>>>>> @@ -3,6 +3,7 @@
>>>>> #include <linux/netdevice.h>
>>>>> #include <linux/rtnetlink.h>
>>>>> #include <linux/slab.h>
>>>>> +#include <net/dsa.h>
>>>>> #include <net/switchdev.h>
>>>>>
>>>>> #include "br_private.h"
>>>>> @@ -100,6 +101,19 @@ static void __vlan_flags_commit(struct net_bridge_vlan *v, u16 flags)
>>>>> __vlan_flags_update(v, flags, true);
>>>>> }
>>>>>
>>>>> +static inline bool br_vlan_tagging_by_switchdev(struct net_bridge *br)
>>>>
>>>> no inline in .c files and also constify br
>>>>
>>>>> +{
>>>>> +#if IS_ENABLED(CONFIG_NET_DSA)
>>>>> + struct net_bridge_port *p;
>>>>> +
>>>>> + list_for_each_entry(p, &br->port_list, list) {
>>>>> + if (dsa_user_dev_check(p->dev))
>>>>
>>>> I don't think this can change at runtime, so please keep a counter in
>>>> the bridge and don't walk the port list on every vlan add.
>>>>
>>>
>>> you can use an internal bridge opt (check br_private.h) with a private opt
>>> that's set when such device is added as a port, no need for a full counter
>>> obviously
>>
>> To continue on Nikolay's line of thought...
>>
>> Can you abstractly describe which functional behavior do you need the
>> bridge port to perform, rather than "it needs to be a DSA user port"?
>
> Hello Vladimir,
>
> This has to do with the sequence of events, when dealing with vlan
> tagging in a bridge. When looking at the ingress of untaggged packets,
> that will be tagged by a certain port of that bridge, it depends when
> this tagging happens, in respect to the netfilter hook. This describes
> the existing code:
>
> Because we are using dev_fill_forward_path(), we know in all cases that
> untagged packets arive and are tagged by the bridge.
>
> In the case (#1) off a switchdev, the tagging is done before the packet
> reaches the netfilter hook. Then the software fastpath code needs to
> handle the packet, as if it is incoming tagged, remembering that this
> tag was tagged at ingress (tuple->in_vlan_ingress). We need to remember
> this, because dealing with hardware offloaded fastpath we then need to
> skip this vlan tag in the hardware offloaded fastpath in
> nf_flow_rule_match() and nf_flow_rule_route_common().
>
> In all other cases (#2), 'normal' ports (ports without switchdev and
> dsa-user-ports) the tagging is done after packet reaches the
> netfilter-hook. Then the software fastpath code needs to handle the
> packet, as incoming untagged.
>
> (Of course the dsa-user-port is more complicated, but it all handled by
> the dsa architecture so that it can be used as a 'normal' port.)
>
> In both cases #1 and #2 also the other direction is taken into account.
>
> To decide which case to apply, the code looks at
> BR_VLFLAG_ADDED_BY_SWITCHDEV, which was actually used for something
> else, but it is easily available in the code at that point and seemed to
> work, so far...
>
> But as it turns out, this flag is also set for foreign ports added to
> the dsa-switchdev. This means that case #1 is applied to the foreign dsa
> port. This breaks the software fastpath and, with packets not reaching
> their destination, also the hardware offloaded fastpath does not start.
> We need to apply case #2.
>
> So actually we need to know, inside br_vlan_fill_forward_path_mode(),
> whether or not net_bridge_port *dst is a foreign port added to the dsa
> switchdev. Or perhaps there is another way to find out we have to apply
> case #2.
>
So after doing some more reading, at creation of the code using
BR_VLFLAG_ADDED_BY_SWITCHDEV would have been without problems.
After the switchdev was altered so that objects from foreign devices can
be added, it is problematic in br_vlan_fill_forward_path_mode(). I have
tested and indeed any foreign device does have this problem.
So we need a way to distinguish in br_vlan_fill_forward_path_mode()
whether or not we are dealing with a (dsa) foreign device on the switchdev.
I have come up with something, but this is most likely to crude to be
accepted, but for the sake of 'rfc' discussing it may lead to a proper
solution. So what does work is the following patch, so that
netif_has_dsa_foreign_vlan() can be used inside
br_vlan_fill_forward_path_mode().
Any suggestions on how this could be implemented properly would be
greatly appreciated.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9d80f650345e75..3fb67312428a1f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1839,6 +1839,7 @@ enum netdev_reg_state {
*
* @vlan_info: VLAN info
* @dsa_ptr: dsa specific data
+ * @dsa_foreign_vlan: Counter, number of dsa foreign vlans added
* @tipc_ptr: TIPC specific data
* @atalk_ptr: AppleTalk link
* @ip_ptr: IPv4 specific data
@@ -2214,6 +2215,7 @@ struct net_device {
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
struct dsa_port *dsa_ptr;
+ unsigned int dsa_foreign_vlan;
#endif
#if IS_ENABLED(CONFIG_TIPC)
struct tipc_bearer __rcu *tipc_ptr;
@@ -5135,6 +5137,15 @@ static inline bool netif_is_lag_port(const struct
net_device *dev)
return netif_is_bond_slave(dev) || netif_is_team_port(dev);
}
+static inline bool netif_has_dsa_foreign_vlan(const struct net_device *dev)
+{
+#if IS_ENABLED(CONFIG_NET_DSA)
+ return !!dev->dsa_foreign_vlan;
+#else
+ return false;
+#endif
+}
+
static inline bool netif_is_rxfh_configured(const struct net_device *dev)
{
return dev->priv_flags & IFF_RXFH_CONFIGURED;
diff --git a/net/dsa/user.c b/net/dsa/user.c
index 74eda9b30608e6..775f6346120ed6 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -737,6 +737,8 @@ static int dsa_user_host_vlan_add(struct net_device
*dev,
return 0;
}
+ obj->orig_dev->dsa_foreign_vlan++;
+
vlan = *SWITCHDEV_OBJ_PORT_VLAN(obj);
/* Even though drivers often handle CPU membership in special ways,
@@ -824,6 +826,8 @@ static int dsa_user_host_vlan_del(struct net_device
*dev,
if (dsa_port_skip_vlan_configuration(dp))
return 0;
+ obj->orig_dev->dsa_foreign_vlan--;
+
vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
return dsa_port_host_vlan_del(dp, vlan);
>> switchdev_bridge_port_offload() has a mechanism to inform the bridge
>> core of extra abilities (like tx_fwd_offload). Perhaps you could modify
>> the DSA drivers you need to set a similar bit to inform the bridge of
>> their presence and ability. That would also work when the bridge port is
>> a LAG over a DSA user port.
>>
>> Also, please also CC DSA maintainers when you use DSA API outside
>> net/dsa/ and drivers/net/dsa/. I am in the process of revamping the
>> public DSA API and would like to be in touch with changes as they are
>> made.
>>
>>>>> + return false;
>>>>> + }
>>>>> +#endif
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>>>>> struct net_bridge_vlan *v, u16 flags,
>>>>> struct netlink_ext_ack *extack)
>>>>> @@ -113,6 +127,8 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>>>>> if (err == -EOPNOTSUPP)
>>>>> return vlan_vid_add(dev, br->vlan_proto, v->vid);
>>>>> v->priv_flags |= BR_VLFLAG_ADDED_BY_SWITCHDEV;
>>>>> + if (br_vlan_tagging_by_switchdev(br))
>>>>> + v->priv_flags |= BR_VLFLAG_TAGGING_BY_SWITCHDEV;
>>>>> return err;
>>>>> }
>>>>>
>>>>> @@ -1491,7 +1507,7 @@ int br_vlan_fill_forward_path_mode(struct net_bridge *br,
>>>>>
>>>>> if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
>>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
>>>>> - else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
>>>>> + else if (v->priv_flags & BR_VLFLAG_TAGGING_BY_SWITCHDEV)
>>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
>>>>> else
>>>>> path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
>>>>
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-20 9:23 ` Eric Woudstra
@ 2024-10-21 13:47 ` Vladimir Oltean
2024-10-22 7:25 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Vladimir Oltean @ 2024-10-21 13:47 UTC (permalink / raw)
To: Eric Woudstra
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek, Andrew Lunn, Florian Fainelli
On Sun, Oct 20, 2024 at 11:23:18AM +0200, Eric Woudstra wrote:
> So after doing some more reading, at creation of the code using
> BR_VLFLAG_ADDED_BY_SWITCHDEV would have been without problems.
>
> After the switchdev was altered so that objects from foreign devices can
> be added, it is problematic in br_vlan_fill_forward_path_mode(). I have
> tested and indeed any foreign device does have this problem.
>
> So we need a way to distinguish in br_vlan_fill_forward_path_mode()
> whether or not we are dealing with a (dsa) foreign device on the switchdev.
>
> I have come up with something, but this is most likely to crude to be
> accepted, but for the sake of 'rfc' discussing it may lead to a proper
> solution. So what does work is the following patch, so that
> netif_has_dsa_foreign_vlan() can be used inside
> br_vlan_fill_forward_path_mode().
>
> Any suggestions on how this could be implemented properly would be
> greatly appreciated.
I don't know nearly enough about the netfilter flowtable to even
understand exactly the problem you're describing and are trying to solve.
I've started to read up on things, but plenty of concepts are new and
I'm mixing this with plenty of other activities. If you could share some
commands to build a test setup so I could form my own independent
opinion of what is going on, it would be great as it would speed up that
process.
With respect to the patch you've posted, it doesn't look exactly great.
One would need to make a thorough analysis of the bridge's use of
BR_VLFLAG_ADDED_BY_SWITCHDEV, of whether it still makes sense in today's
world where br_switchdev_vlan_replay() is a thing (a VLAN that used to
not be "added by switchdev" can become "added by switchdev" after a
replay, but this flag will remain incorrectly unset), of whether VLANs on
foreign DSA interfaces should even have this flag set, and on whether
your flowtable forwarding path patches are conceptually using it correctly.
There's a lot to think about, and if somebody doesn't have the big picture,
I'm worried that a wrong decision will be taken.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
2024-10-21 13:47 ` Vladimir Oltean
@ 2024-10-22 7:25 ` Eric Woudstra
0 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-22 7:25 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, netdev, linux-kernel,
netfilter-devel, coreteam, bridge, linux-arm-kernel,
linux-mediatek, Andrew Lunn, Florian Fainelli
On 10/21/24 3:47 PM, Vladimir Oltean wrote:
> On Sun, Oct 20, 2024 at 11:23:18AM +0200, Eric Woudstra wrote:
>> So after doing some more reading, at creation of the code using
>> BR_VLFLAG_ADDED_BY_SWITCHDEV would have been without problems.
>>
>> After the switchdev was altered so that objects from foreign devices can
>> be added, it is problematic in br_vlan_fill_forward_path_mode(). I have
>> tested and indeed any foreign device does have this problem.
>>
>> So we need a way to distinguish in br_vlan_fill_forward_path_mode()
>> whether or not we are dealing with a (dsa) foreign device on the switchdev.
>>
>> I have come up with something, but this is most likely to crude to be
>> accepted, but for the sake of 'rfc' discussing it may lead to a proper
>> solution. So what does work is the following patch, so that
>> netif_has_dsa_foreign_vlan() can be used inside
>> br_vlan_fill_forward_path_mode().
>>
>> Any suggestions on how this could be implemented properly would be
>> greatly appreciated.
> I don't know nearly enough about the netfilter flowtable to even
> understand exactly the problem you're describing and are trying to solve.
> I've started to read up on things, but plenty of concepts are new and
Another way of shortly describing it, considering the software fastpath,
only there it is a problem:
Same case #1:
When looking at the ingress hook of the fastpath for a bridged
switchdev-port (non-dsa) with PVID set, the existing code is written so
that the flowtuple needs to match the packet INcluding the PVID.
Same case #2:
When considering the ingress hook of the fastpath of a bridged device
that is not part of a switchdev at all and there is no dsa on the bridge
(so it is not a foreign device), the existing code is written so that
the flowtuple needs to match the packet EXcluding the PVID.
When looking at the diagram of this patch, wlan1 would stand for any
foreign device. Because of the use of BR_VLFLAG_ADDED_BY_SWITCHDEV, case
#1 is applied, instead of case #2.
> I'm mixing this with plenty of other activities. If you could share some
> commands to build a test setup so I could form my own independent
> opinion of what is going on, it would be great as it would speed up that
> process.
I've only setup the bridged part with systemd-networkd. When trying to
re-create, it will only happen if the total action of the bridge,
towards the foreign port, is:
ingress + egress = untagging
So in the forward-fastpath, tagging in the forward path is done by a
vlan-device and untagging is done in the forward path of the bridge.
> With respect to the patch you've posted, it doesn't look exactly great.
Agreed. I was experimenting now with having br_switchdev_port_vlan_add()
first to try it only for non-foreign ports. If not successful, then try
it with foreign ports. This way the calling function will know if the
port is a foreign port. Therefore, no need for the switchdev driver to
communicate back to upper layers.
> One would need to make a thorough analysis of the bridge's use of
> BR_VLFLAG_ADDED_BY_SWITCHDEV, of whether it still makes sense in today's
> world where br_switchdev_vlan_replay() is a thing (a VLAN that used to
> not be "added by switchdev" can become "added by switchdev" after a
> replay, but this flag will remain incorrectly unset), of whether VLANs on
> foreign DSA interfaces should even have this flag set, and on whether
> your flowtable forwarding path patches are conceptually using it correctly.
> There's a lot to think about, and if somebody doesn't have the big picture,
> I'm worried that a wrong decision will be taken.
The entire usage BR_VLFLAG_ADDED_BY_SWITCHDEV does need a careful review.
I also realize my patch-set needs to do more with the switchdev and vlan
combination, then it does now. So for now I will leave it as RFC, as it
will not work properly with switchdevs other the dsa.
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RFC v1 net-next 12/12] netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval()
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (10 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 11/12] bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa Eric Woudstra
@ 2024-10-13 18:55 ` Eric Woudstra
2024-10-14 6:35 ` [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Nikolay Aleksandrov
12 siblings, 0 replies; 39+ messages in thread
From: Eric Woudstra @ 2024-10-13 18:55 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Nikolay Aleksandrov, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle, Eric Woudstra
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Edit nft_flow_offload_eval() to make it possible to handle a flowtable of
the nft bridge family.
Use nft_flow_offload_bridge_init() to fill the flow tuples. It uses
nft_dev_fill_bridge_path() in each direction.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
---
net/netfilter/nft_flow_offload.c | 142 +++++++++++++++++++++++++++++--
1 file changed, 137 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 2923286d475e..bd4850691baa 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -184,6 +184,129 @@ static bool nft_flowtable_find_dev(const struct net_device *dev,
return found;
}
+static int nft_dev_fill_bridge_path(struct flow_offload *flow,
+ struct nft_flowtable *ft,
+ const struct nft_pktinfo *pkt,
+ enum ip_conntrack_dir dir,
+ const struct net_device *src_dev,
+ const struct net_device *dst_dev,
+ unsigned char *src_ha,
+ unsigned char *dst_ha)
+{
+ struct flow_offload_tuple_rhash *th = flow->tuplehash;
+ struct net_device_path_stack stack;
+ struct net_device_path_ctx ctx = {};
+ struct nft_forward_info info = {};
+ int i, j = 0;
+
+ for (i = th[dir].tuple.encap_num - 1; i >= 0 ; i--) {
+ if (info.num_encaps >= NF_FLOW_TABLE_ENCAP_MAX)
+ return -1;
+ info.encap[info.num_encaps].id = th[dir].tuple.encap[i].id;
+ info.encap[info.num_encaps].proto = th[dir].tuple.encap[i].proto;
+ info.num_encaps++;
+
+ if (th[dir].tuple.encap[i].proto == htons(ETH_P_PPP_SES))
+ continue;
+
+ if (ctx.num_vlans >= NET_DEVICE_PATH_VLAN_MAX)
+ return -1;
+ ctx.vlan[ctx.num_vlans].id = th[dir].tuple.encap[i].id;
+ ctx.vlan[ctx.num_vlans].proto = th[dir].tuple.encap[i].proto;
+ ctx.num_vlans++;
+ }
+ ctx.dev = src_dev;
+ ether_addr_copy(ctx.daddr, dst_ha);
+
+ if (dev_fill_bridge_path(&ctx, &stack) < 0)
+ return -1;
+
+ nft_dev_path_info(&stack, &info, dst_ha, &ft->data);
+
+ if (!info.indev || info.indev != dst_dev)
+ return -1;
+
+ th[!dir].tuple.iifidx = info.indev->ifindex;
+ for (i = info.num_encaps - 1; i >= 0; i--) {
+ th[!dir].tuple.encap[j].id = info.encap[i].id;
+ th[!dir].tuple.encap[j].proto = info.encap[i].proto;
+ if (info.ingress_vlans & BIT(i))
+ th[!dir].tuple.in_vlan_ingress |= BIT(j);
+ j++;
+ }
+ th[!dir].tuple.encap_num = info.num_encaps;
+
+ th[dir].tuple.mtu = dst_dev->mtu;
+ ether_addr_copy(th[dir].tuple.out.h_source, src_ha);
+ ether_addr_copy(th[dir].tuple.out.h_dest, dst_ha);
+ th[dir].tuple.out.ifidx = info.outdev->ifindex;
+ th[dir].tuple.out.hw_ifidx = info.hw_outdev->ifindex;
+ th[dir].tuple.xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
+
+ return 0;
+}
+
+static int nft_flow_offload_bridge_init(struct flow_offload *flow,
+ const struct nft_pktinfo *pkt,
+ enum ip_conntrack_dir dir,
+ struct nft_flowtable *ft)
+{
+ struct ethhdr *eth = eth_hdr(pkt->skb);
+ struct flow_offload_tuple *tuple;
+ const struct net_device *out_dev;
+ const struct net_device *in_dev;
+ int err, i = 0;
+
+ in_dev = nft_in(pkt);
+ if (!in_dev || !nft_flowtable_find_dev(in_dev, ft))
+ return -1;
+
+ out_dev = nft_out(pkt);
+ if (!out_dev || !nft_flowtable_find_dev(out_dev, ft))
+ return -1;
+
+ tuple = &flow->tuplehash[!dir].tuple;
+
+ if (skb_vlan_tag_present(pkt->skb)) {
+ tuple->encap[i].id = skb_vlan_tag_get(pkt->skb);
+ tuple->encap[i].proto = pkt->skb->vlan_proto;
+ i++;
+ }
+ switch (pkt->skb->protocol) {
+ case htons(ETH_P_8021Q):
+ struct vlan_hdr *vhdr;
+
+ vhdr = (struct vlan_hdr *)skb_network_header(pkt->skb);
+ tuple->encap[i].id = ntohs(vhdr->h_vlan_TCI);
+ tuple->encap[i].proto = pkt->skb->protocol;
+ i++;
+ break;
+ case htons(ETH_P_PPP_SES):
+ struct pppoe_hdr *phdr;
+
+ phdr = (struct pppoe_hdr *)skb_network_header(pkt->skb);
+ tuple->encap[i].id = ntohs(phdr->sid);
+ tuple->encap[i].proto = pkt->skb->protocol;
+ i++;
+ break;
+ }
+ tuple->encap_num = i;
+
+ err = nft_dev_fill_bridge_path(flow, ft, pkt, !dir, out_dev, in_dev,
+ eth->h_dest, eth->h_source);
+ if (err < 0)
+ return err;
+
+ memset(tuple->encap, 0, sizeof(tuple->encap));
+
+ err = nft_dev_fill_bridge_path(flow, ft, pkt, dir, in_dev, out_dev,
+ eth->h_source, eth->h_dest);
+ if (err < 0)
+ return err;
+
+ return 0;
+}
+
static void nft_dev_forward_path(struct nf_flow_route *route,
const struct nf_conn *ct,
enum ip_conntrack_dir dir,
@@ -294,6 +417,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
{
struct nft_flow_offload *priv = nft_expr_priv(expr);
struct nf_flowtable *flowtable = &priv->flowtable->data;
+ bool routing = (flowtable->type->family != NFPROTO_BRIDGE);
struct tcphdr _tcph, *tcph = NULL;
struct nf_flow_route route = {};
enum ip_conntrack_info ctinfo;
@@ -347,14 +471,20 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
goto out;
dir = CTINFO2DIR(ctinfo);
- if (nft_flow_route(pkt, ct, &route, dir, priv->flowtable) < 0)
- goto err_flow_route;
+ if (routing) {
+ if (nft_flow_route(pkt, ct, &route, dir, priv->flowtable) < 0)
+ goto err_flow_route;
+ }
flow = flow_offload_alloc(ct);
if (!flow)
goto err_flow_alloc;
- flow_offload_route_init(flow, &route);
+ if (routing)
+ flow_offload_route_init(flow, &route);
+ else
+ if (nft_flow_offload_bridge_init(flow, pkt, dir, priv->flowtable) < 0)
+ goto err_flow_route;
if (tcph) {
ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
@@ -407,8 +537,10 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
err_flow_add:
flow_offload_free(flow);
err_flow_alloc:
- dst_release(route.tuple[dir].dst);
- dst_release(route.tuple[!dir].dst);
+ if (routing) {
+ dst_release(route.tuple[dir].dst);
+ dst_release(route.tuple[!dir].dst);
+ }
err_flow_route:
clear_bit(IPS_OFFLOAD_BIT, &ct->status);
out:
--
2.45.2
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-13 18:54 [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Eric Woudstra
` (11 preceding siblings ...)
2024-10-13 18:55 ` [PATCH RFC v1 net-next 12/12] netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval() Eric Woudstra
@ 2024-10-14 6:35 ` Nikolay Aleksandrov
2024-10-14 18:29 ` Eric Woudstra
12 siblings, 1 reply; 39+ messages in thread
From: Nikolay Aleksandrov @ 2024-10-14 6:35 UTC (permalink / raw)
To: Eric Woudstra, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 13/10/2024 21:54, Eric Woudstra wrote:
> This patchset makes it possible to set up a (hardware offloaded) fastpath
> for bridged interfaces.
>
The subject and this sentence are misleading, you're talking about netfilter bridge
fastpath offload, please mention it in both places. When you just say bridge fast
path, I think of the software fast path.
> To set up the fastpath with offloading, add this extra flowtable:
>
> table bridge filter {
> flowtable fb {
> hook ingress priority filter
> devices = { lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 }
> flags offload
> }
> chain forward {
> type filter hook forward priority filter; policy accept;
> ct state established flow add @fb
> }
> }
>
> Creating a separate fastpath for bridges.
>
> forward fastpath bypass
> .----------------------------------------.
> / \
> | IP - forwarding |
> | / \ v
> | / wan ...
> | /
> | |
> | |
> | brlan.1
> | |
> | +-------------------------------+
> | | vlan 1 |
> | | |
> | | brlan (vlan-filtering) |
> | +---------------+ |
> | | DSA-SWITCH | |
> | | | vlan 1 |
> | | | to |
> | | vlan 1 | untagged |
> | +---------------+---------------+
> . / \
> ------>lan0 wlan1
> . ^ ^
> . | |
> . \_________________/
> . bridge fastpath bypass
> .
> ^
> vlan 1 tagged packets
>
> To have the ability to handle xmit direct with outgoing encaps in the
> bridge fastpass bypass, we need to be able to handle them without going
> through vlan/pppoe devices. So I've applied, amended and squashed wenxu's
> patchset. This patch also makes it possible to egress from vlan-filtering
> brlan to lan0 with vlan tagged packets, if the bridge master port is doing
> the vlan tagging, instead of the vlan-device. Without this patch, this is
> not possible in the bridge-fastpath and also not in the forward-fastpath,
> as seen in the figure above.
>
> There are also some more fixes for filling in the forward path. These
> fixes also apply to for the forward-fastpath. They include handling
> DEV_PATH_MTK_WDMA in nft_dev_path_info() and avoiding
> DEV_PATH_BR_VLAN_UNTAG_HW for bridges with ports that use dsa.
>
> Conntrack bridge only tracks untagged and 802.1q. To make the bridge
> fastpath experience more similar to the forward fastpath experience,
> I've added double vlan, pppoe and pppoe-in-q tagged packets to bridge
> conntrack and to bridge filter chain.
>
> Eric Woudstra (12):
> netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit
> direct
> netfilter: bridge: Add conntrack double vlan and pppoe
> netfilter: nft_chain_filter: Add bridge double vlan and pppoe
> bridge: br_vlan_fill_forward_path_pvid: Add port to port
> bridge: br_fill_forward_path add port to port
> net: core: dev: Add dev_fill_bridge_path()
> netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge()
> netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge
> netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate
> netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to
> nft_dev_path_info()
> bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
> netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval()
>
> include/linux/netdevice.h | 2 +
> include/net/netfilter/nf_flow_table.h | 3 +
> net/bridge/br_device.c | 20 ++-
> net/bridge/br_private.h | 2 +
> net/bridge/br_vlan.c | 24 +++-
> net/bridge/netfilter/nf_conntrack_bridge.c | 86 ++++++++++--
> net/core/dev.c | 77 +++++++++--
> net/netfilter/nf_flow_table_inet.c | 13 ++
> net/netfilter/nf_flow_table_ip.c | 96 ++++++++++++-
> net/netfilter/nf_flow_table_offload.c | 13 ++
> net/netfilter/nft_chain_filter.c | 20 ++-
> net/netfilter/nft_flow_offload.c | 154 +++++++++++++++++++--
> 12 files changed, 463 insertions(+), 47 deletions(-)
>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-14 6:35 ` [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements Nikolay Aleksandrov
@ 2024-10-14 18:29 ` Eric Woudstra
2024-10-15 12:16 ` Felix Fietkau
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-14 18:29 UTC (permalink / raw)
To: Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 10/14/24 8:35 AM, Nikolay Aleksandrov wrote:
> On 13/10/2024 21:54, Eric Woudstra wrote:
>> This patchset makes it possible to set up a (hardware offloaded) fastpath
>> for bridged interfaces.
>>
>
> The subject and this sentence are misleading, you're talking about netfilter bridge
> fastpath offload, please mention it in both places. When you just say bridge fast
> path, I think of the software fast path.
>
Hello Nikolay,
It would be no problem for me to change the subject and body, if you
think that is better.
The thing is, these patches actually make it possible to set up a fully
functional software fastpath between bridged interfaces. Only after the
software fastpath is set up and functional, it can be offloaded, which
happens to by my personal motivation to write this patch-set.
If the offload flag is set in the flowtable, the software fastpath will
be offloaded. But in this patch-set, there is nothing that changes
anything there, the existing code is used unchanged.
>> To set up the fastpath with offloading, add this extra flowtable:
>>
>> table bridge filter {
>> flowtable fb {
>> hook ingress priority filter
>> devices = { lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 }
>> flags offload
>> }
>> chain forward {
>> type filter hook forward priority filter; policy accept;
>> ct state established flow add @fb
>> }
>> }
>>
>> Creating a separate fastpath for bridges.
>>
>> forward fastpath bypass
>> .----------------------------------------.
>> / \
>> | IP - forwarding |
>> | / \ v
>> | / wan ...
>> | /
>> | |
>> | |
>> | brlan.1
>> | |
>> | +-------------------------------+
>> | | vlan 1 |
>> | | |
>> | | brlan (vlan-filtering) |
>> | +---------------+ |
>> | | DSA-SWITCH | |
>> | | | vlan 1 |
>> | | | to |
>> | | vlan 1 | untagged |
>> | +---------------+---------------+
>> . / \
>> ------>lan0 wlan1
>> . ^ ^
>> . | |
>> . \_________________/
>> . bridge fastpath bypass
>> .
>> ^
>> vlan 1 tagged packets
>>
>> To have the ability to handle xmit direct with outgoing encaps in the
>> bridge fastpass bypass, we need to be able to handle them without going
>> through vlan/pppoe devices. So I've applied, amended and squashed wenxu's
>> patchset. This patch also makes it possible to egress from vlan-filtering
>> brlan to lan0 with vlan tagged packets, if the bridge master port is doing
>> the vlan tagging, instead of the vlan-device. Without this patch, this is
>> not possible in the bridge-fastpath and also not in the forward-fastpath,
>> as seen in the figure above.
>>
>> There are also some more fixes for filling in the forward path. These
>> fixes also apply to for the forward-fastpath. They include handling
>> DEV_PATH_MTK_WDMA in nft_dev_path_info() and avoiding
>> DEV_PATH_BR_VLAN_UNTAG_HW for bridges with ports that use dsa.
>>
>> Conntrack bridge only tracks untagged and 802.1q. To make the bridge
>> fastpath experience more similar to the forward fastpath experience,
>> I've added double vlan, pppoe and pppoe-in-q tagged packets to bridge
>> conntrack and to bridge filter chain.
>>
>> Eric Woudstra (12):
>> netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit
>> direct
>> netfilter: bridge: Add conntrack double vlan and pppoe
>> netfilter: nft_chain_filter: Add bridge double vlan and pppoe
>> bridge: br_vlan_fill_forward_path_pvid: Add port to port
>> bridge: br_fill_forward_path add port to port
>> net: core: dev: Add dev_fill_bridge_path()
>> netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge()
>> netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge
>> netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate
>> netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to
>> nft_dev_path_info()
>> bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa
>> netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval()
>>
>> include/linux/netdevice.h | 2 +
>> include/net/netfilter/nf_flow_table.h | 3 +
>> net/bridge/br_device.c | 20 ++-
>> net/bridge/br_private.h | 2 +
>> net/bridge/br_vlan.c | 24 +++-
>> net/bridge/netfilter/nf_conntrack_bridge.c | 86 ++++++++++--
>> net/core/dev.c | 77 +++++++++--
>> net/netfilter/nf_flow_table_inet.c | 13 ++
>> net/netfilter/nf_flow_table_ip.c | 96 ++++++++++++-
>> net/netfilter/nf_flow_table_offload.c | 13 ++
>> net/netfilter/nft_chain_filter.c | 20 ++-
>> net/netfilter/nft_flow_offload.c | 154 +++++++++++++++++++--
>> 12 files changed, 463 insertions(+), 47 deletions(-)
>>
>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-14 18:29 ` Eric Woudstra
@ 2024-10-15 12:16 ` Felix Fietkau
2024-10-15 13:32 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2024-10-15 12:16 UTC (permalink / raw)
To: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
Hi Eric,
On 14.10.24 20:29, Eric Woudstra wrote:
> It would be no problem for me to change the subject and body, if you
> think that is better.
>
> The thing is, these patches actually make it possible to set up a fully
> functional software fastpath between bridged interfaces. Only after the
> software fastpath is set up and functional, it can be offloaded, which
> happens to by my personal motivation to write this patch-set.
>
> If the offload flag is set in the flowtable, the software fastpath will
> be offloaded. But in this patch-set, there is nothing that changes
> anything there, the existing code is used unchanged.
FWIW, a while back, I also wanted to add a software fast path for the
bridge layer to the kernel, also with the intention of using it for
hardware offload. It wasn't accepted back then, because (if I remember
correctly) people didn't want any extra complexity in the network stack
to make the bridge layer faster.
Because of that, I created this piece of software:
https://github.com/nbd168/bridger
It uses an eBPF TC classifier for discovering flows and handling the
software fast path, and also creates hardware offload rules for flows.
With that, hardware offloading for bridged LAN->WLAN flows is fully
supported on MediaTek hardware with upstream kernels.
- Felix
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-15 12:16 ` Felix Fietkau
@ 2024-10-15 13:32 ` Eric Woudstra
2024-10-15 19:44 ` Felix Fietkau
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-15 13:32 UTC (permalink / raw)
To: Felix Fietkau, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 10/15/24 2:16 PM, Felix Fietkau wrote:
> Hi Eric,
>
> On 14.10.24 20:29, Eric Woudstra wrote:
>> It would be no problem for me to change the subject and body, if you
>> think that is better.
>>
>> The thing is, these patches actually make it possible to set up a fully
>> functional software fastpath between bridged interfaces. Only after the
>> software fastpath is set up and functional, it can be offloaded, which
>> happens to by my personal motivation to write this patch-set.
>>
>> If the offload flag is set in the flowtable, the software fastpath will
>> be offloaded. But in this patch-set, there is nothing that changes
>> anything there, the existing code is used unchanged.
>
> FWIW, a while back, I also wanted to add a software fast path for the
> bridge layer to the kernel, also with the intention of using it for
> hardware offload. It wasn't accepted back then, because (if I remember
> correctly) people didn't want any extra complexity in the network stack
> to make the bridge layer faster.
Hello Felix,
I think this patch-set is a clear showcase it is not very complex at
all. The core of making it possible only consists a few patches. Half of
this patch-set involves improvements that also apply to the
forward-fastpath.
> Because of that, I created this piece of software:
> https://github.com/nbd168/bridger
>
> It uses an eBPF TC classifier for discovering flows and handling the
> software fast path, and also creates hardware offload rules for flows.
> With that, hardware offloading for bridged LAN->WLAN flows is fully
> supported on MediaTek hardware with upstream kernels.
>
> - Felix
Thanks, I've seen that already. Nice piece of software, but I'm not
running openwrt. I would like to see a solution implemented in the
kernel, so any operating system can use it.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-15 13:32 ` Eric Woudstra
@ 2024-10-15 19:44 ` Felix Fietkau
2024-10-16 15:59 ` Eric Woudstra
0 siblings, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2024-10-15 19:44 UTC (permalink / raw)
To: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 15.10.24 15:32, Eric Woudstra wrote:
>
>
> On 10/15/24 2:16 PM, Felix Fietkau wrote:
>> Hi Eric,
>>
>> On 14.10.24 20:29, Eric Woudstra wrote:
>>> It would be no problem for me to change the subject and body, if you
>>> think that is better.
>>>
>>> The thing is, these patches actually make it possible to set up a fully
>>> functional software fastpath between bridged interfaces. Only after the
>>> software fastpath is set up and functional, it can be offloaded, which
>>> happens to by my personal motivation to write this patch-set.
>>>
>>> If the offload flag is set in the flowtable, the software fastpath will
>>> be offloaded. But in this patch-set, there is nothing that changes
>>> anything there, the existing code is used unchanged.
>>
>> FWIW, a while back, I also wanted to add a software fast path for the
>> bridge layer to the kernel, also with the intention of using it for
>> hardware offload. It wasn't accepted back then, because (if I remember
>> correctly) people didn't want any extra complexity in the network stack
>> to make the bridge layer faster.
>
> Hello Felix,
>
> I think this patch-set is a clear showcase it is not very complex at
> all. The core of making it possible only consists a few patches. Half of
> this patch-set involves improvements that also apply to the
> forward-fastpath.
It's definitely an interesting approach. How does it deal with devices
roaming from one bridge port to another? I couldn't find that in the code.
>> Because of that, I created this piece of software:
>> https://github.com/nbd168/bridger
>>
>> It uses an eBPF TC classifier for discovering flows and handling the
>> software fast path, and also creates hardware offload rules for flows.
>> With that, hardware offloading for bridged LAN->WLAN flows is fully
>> supported on MediaTek hardware with upstream kernels.
>>
>> - Felix
>
> Thanks, I've seen that already. Nice piece of software, but I'm not
> running openwrt. I would like to see a solution implemented in the
> kernel, so any operating system can use it.
Makes sense. By the way, bridger can easily be built for non-OpenWrt
systems too. The only library that's actually needed is libubox - that
one is small and can be linked in statically. ubus support is fully
optional and not necessary for standard cases.
- Felix
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-15 19:44 ` Felix Fietkau
@ 2024-10-16 15:59 ` Eric Woudstra
2024-10-17 9:17 ` Felix Fietkau
0 siblings, 1 reply; 39+ messages in thread
From: Eric Woudstra @ 2024-10-16 15:59 UTC (permalink / raw)
To: Felix Fietkau, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 10/15/24 9:44 PM, Felix Fietkau wrote:
> On 15.10.24 15:32, Eric Woudstra wrote:
>>
>>
>> On 10/15/24 2:16 PM, Felix Fietkau wrote:
>>> Hi Eric,
>>>
>>> On 14.10.24 20:29, Eric Woudstra wrote:
>>>> It would be no problem for me to change the subject and body, if you
>>>> think that is better.
>>>>
>>>> The thing is, these patches actually make it possible to set up a fully
>>>> functional software fastpath between bridged interfaces. Only after the
>>>> software fastpath is set up and functional, it can be offloaded, which
>>>> happens to by my personal motivation to write this patch-set.
>>>>
>>>> If the offload flag is set in the flowtable, the software fastpath will
>>>> be offloaded. But in this patch-set, there is nothing that changes
>>>> anything there, the existing code is used unchanged.
>>>
>>> FWIW, a while back, I also wanted to add a software fast path for the
>>> bridge layer to the kernel, also with the intention of using it for
>>> hardware offload. It wasn't accepted back then, because (if I remember
>>> correctly) people didn't want any extra complexity in the network stack
>>> to make the bridge layer faster.
>>
>> Hello Felix,
>>
>> I think this patch-set is a clear showcase it is not very complex at
>> all. The core of making it possible only consists a few patches. Half of
>> this patch-set involves improvements that also apply to the
>> forward-fastpath.
>
> It's definitely an interesting approach. How does it deal with devices
> roaming from one bridge port to another? I couldn't find that in the code.
It is handled in the same manner when dealing with the forward-fastpath,
with the aid of conntrack. If roaming is problematic, then it would be
for both the forward-fastpath and the bridge-fastpath. I have a topic on
the banana-pi forum about this patch-set, so I think long discussions
about additional details we could have there, keeping the mailing list
more clean.
>>> Because of that, I created this piece of software:
>>> https://github.com/nbd168/bridger
>>>
>>> It uses an eBPF TC classifier for discovering flows and handling the
>>> software fast path, and also creates hardware offload rules for flows.
>>> With that, hardware offloading for bridged LAN->WLAN flows is fully
>>> supported on MediaTek hardware with upstream kernels.
>>>
>>> - Felix
>>
>> Thanks, I've seen that already. Nice piece of software, but I'm not
>> running openwrt. I would like to see a solution implemented in the
>> kernel, so any operating system can use it.
>
> Makes sense. By the way, bridger can easily be built for non-OpenWrt
> systems too. The only library that's actually needed is libubox - that
> one is small and can be linked in statically. ubus support is fully
> optional and not necessary for standard cases.
>
> - Felix
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-16 15:59 ` Eric Woudstra
@ 2024-10-17 9:17 ` Felix Fietkau
2024-10-17 12:39 ` Pablo Neira Ayuso
0 siblings, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2024-10-17 9:17 UTC (permalink / raw)
To: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Pablo Neira Ayuso, Jozsef Kadlecsik,
Roopa Prabhu, Matthias Brugger, AngeloGioacchino Del Regno,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Frank Wunderlich, Daniel Golle
Cc: netdev, linux-kernel, netfilter-devel, coreteam, bridge,
linux-arm-kernel, linux-mediatek
On 16.10.24 17:59, Eric Woudstra wrote:
>
>
> On 10/15/24 9:44 PM, Felix Fietkau wrote:
>> On 15.10.24 15:32, Eric Woudstra wrote:
>>>
>>>
>>> On 10/15/24 2:16 PM, Felix Fietkau wrote:
>>>> Hi Eric,
>>>>
>>>> On 14.10.24 20:29, Eric Woudstra wrote:
>>>>> It would be no problem for me to change the subject and body, if you
>>>>> think that is better.
>>>>>
>>>>> The thing is, these patches actually make it possible to set up a fully
>>>>> functional software fastpath between bridged interfaces. Only after the
>>>>> software fastpath is set up and functional, it can be offloaded, which
>>>>> happens to by my personal motivation to write this patch-set.
>>>>>
>>>>> If the offload flag is set in the flowtable, the software fastpath will
>>>>> be offloaded. But in this patch-set, there is nothing that changes
>>>>> anything there, the existing code is used unchanged.
>>>>
>>>> FWIW, a while back, I also wanted to add a software fast path for the
>>>> bridge layer to the kernel, also with the intention of using it for
>>>> hardware offload. It wasn't accepted back then, because (if I remember
>>>> correctly) people didn't want any extra complexity in the network stack
>>>> to make the bridge layer faster.
>>>
>>> Hello Felix,
>>>
>>> I think this patch-set is a clear showcase it is not very complex at
>>> all. The core of making it possible only consists a few patches. Half of
>>> this patch-set involves improvements that also apply to the
>>> forward-fastpath.
>>
>> It's definitely an interesting approach. How does it deal with devices
>> roaming from one bridge port to another? I couldn't find that in the code.
>
> It is handled in the same manner when dealing with the forward-fastpath,
> with the aid of conntrack. If roaming is problematic, then it would be
> for both the forward-fastpath and the bridge-fastpath. I have a topic on
> the banana-pi forum about this patch-set, so I think long discussions
> about additional details we could have there, keeping the mailing list
> more clean.
You forgot to include a link to the forum topic :)
By the way, based on some reports that I received, I do believe that the
existing forwarding fastpath also doesn't handle roaming properly.
I just didn't have the time to properly look into that yet.
- Felix
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-17 9:17 ` Felix Fietkau
@ 2024-10-17 12:39 ` Pablo Neira Ayuso
2024-10-17 17:06 ` Felix Fietkau
0 siblings, 1 reply; 39+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-17 12:39 UTC (permalink / raw)
To: Felix Fietkau
Cc: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle, netdev, linux-kernel, netfilter-devel, coreteam,
bridge, linux-arm-kernel, linux-mediatek
On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote:
[...]
> By the way, based on some reports that I received, I do believe that the
> existing forwarding fastpath also doesn't handle roaming properly.
> I just didn't have the time to properly look into that yet.
I think it should work for the existing forwarding fastpath.
- If computer roams from different port, packets follow classic path,
then new flow entry is created. The flow old entry expires after 30
seconds.
- If route is stale, flow entry is also removed.
Maybe I am missing another possible scenario?
Thanks.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-17 12:39 ` Pablo Neira Ayuso
@ 2024-10-17 17:06 ` Felix Fietkau
2024-10-17 18:09 ` Pablo Neira Ayuso
0 siblings, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2024-10-17 17:06 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle, netdev, linux-kernel, netfilter-devel, coreteam,
bridge, linux-arm-kernel, linux-mediatek
On 17.10.24 14:39, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote:
> [...]
>> By the way, based on some reports that I received, I do believe that the
>> existing forwarding fastpath also doesn't handle roaming properly.
>> I just didn't have the time to properly look into that yet.
>
> I think it should work for the existing forwarding fastpath.
>
> - If computer roams from different port, packets follow classic path,
> then new flow entry is created. The flow old entry expires after 30
> seconds.
> - If route is stale, flow entry is also removed.
>
> Maybe I am missing another possible scenario?
I'm mainly talking about the scenario where a computer moves to a
different switch port on L2 only, so all routes remain the same.
I haven't fully analyzed the issue, but I did find a few potential
issues with what you're describing.
1. Since one direction remains the same when a computer roams, a new
flow entry would probably fail to be added because of an existing entry
in the flow hash table.
2. Even with that out of the way, the MTK hardware offload currently
does not support matching the incoming switch/ethernet port.
So even if we manage to add an updated entry, the old entry could still
be kept alive by the hardware.
The issues I found probably wouldn't cause connection hangs in pure L3
software flow offload, since it will use the bridge device for xmit
instead of its members. But since hardware offload needs to redirect
traffic to individual bridge ports, it could cause connection hangs with
stale flow entries.
There might be other issues as well, but this is what I could come up
with on short notice. I think in order to properly address this, we
should probably monitor for FDB / neigh entry changes somehow and clear
affected flows.
Routes do not become stale in my scenario, so something else is needed
to trigger flow entry removal.
- Felix
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-17 17:06 ` Felix Fietkau
@ 2024-10-17 18:09 ` Pablo Neira Ayuso
2024-10-17 18:39 ` Felix Fietkau
0 siblings, 1 reply; 39+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-17 18:09 UTC (permalink / raw)
To: Felix Fietkau
Cc: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle, netdev, linux-kernel, netfilter-devel, coreteam,
bridge, linux-arm-kernel, linux-mediatek
On Thu, Oct 17, 2024 at 07:06:51PM +0200, Felix Fietkau wrote:
> On 17.10.24 14:39, Pablo Neira Ayuso wrote:
> > On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote:
> > [...]
> > > By the way, based on some reports that I received, I do believe that the
> > > existing forwarding fastpath also doesn't handle roaming properly.
> > > I just didn't have the time to properly look into that yet.
> >
> > I think it should work for the existing forwarding fastpath.
> >
> > - If computer roams from different port, packets follow classic path,
> > then new flow entry is created. The flow old entry expires after 30
> > seconds.
> > - If route is stale, flow entry is also removed.
> >
> > Maybe I am missing another possible scenario?
>
> I'm mainly talking about the scenario where a computer moves to a different
> switch port on L2 only, so all routes remain the same.
>
> I haven't fully analyzed the issue, but I did find a few potential issues
> with what you're describing.
>
> 1. Since one direction remains the same when a computer roams, a new flow
> entry would probably fail to be added because of an existing entry in the
> flow hash table.
I don't think so, hash includes iifidx.
> 2. Even with that out of the way, the MTK hardware offload currently does
> not support matching the incoming switch/ethernet port.
> So even if we manage to add an updated entry, the old entry could still be
> kept alive by the hardware.
OK, that means probably driver needs to address the lack of iifidx in
the matching by dealling with more than one single flow entry to point
to one single hardware entry (refcounting?).
> The issues I found probably wouldn't cause connection hangs in pure L3
> software flow offload, since it will use the bridge device for xmit instead
> of its members. But since hardware offload needs to redirect traffic to
> individual bridge ports, it could cause connection hangs with stale flow
> entries.
I would not expect a hang, packets will just flow over classic path
for a little while for the computer that is roaming until the new flow
entry is added.
> There might be other issues as well, but this is what I could come up with
> on short notice. I think in order to properly address this, we should
> probably monitor for FDB / neigh entry changes somehow and clear affected
> flows.
>
> Routes do not become stale in my scenario, so something else is needed to
> trigger flow entry removal.
Yes. In case letting expire stale flow entries with old iifidx is not enough
some other mechanism could be required.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RFC v1 net-next 00/12] bridge-fastpath and related improvements
2024-10-17 18:09 ` Pablo Neira Ayuso
@ 2024-10-17 18:39 ` Felix Fietkau
0 siblings, 0 replies; 39+ messages in thread
From: Felix Fietkau @ 2024-10-17 18:39 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Eric Woudstra, Nikolay Aleksandrov, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jozsef Kadlecsik, Roopa Prabhu,
Matthias Brugger, AngeloGioacchino Del Regno, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Frank Wunderlich,
Daniel Golle, netdev, linux-kernel, netfilter-devel, coreteam,
bridge, linux-arm-kernel, linux-mediatek
On 17.10.24 20:09, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2024 at 07:06:51PM +0200, Felix Fietkau wrote:
>> On 17.10.24 14:39, Pablo Neira Ayuso wrote:
>> > On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote:
>> > [...]
>> > > By the way, based on some reports that I received, I do believe that the
>> > > existing forwarding fastpath also doesn't handle roaming properly.
>> > > I just didn't have the time to properly look into that yet.
>> >
>> > I think it should work for the existing forwarding fastpath.
>> >
>> > - If computer roams from different port, packets follow classic path,
>> > then new flow entry is created. The flow old entry expires after 30
>> > seconds.
>> > - If route is stale, flow entry is also removed.
>> >
>> > Maybe I am missing another possible scenario?
>>
>> I'm mainly talking about the scenario where a computer moves to a different
>> switch port on L2 only, so all routes remain the same.
>>
>> I haven't fully analyzed the issue, but I did find a few potential issues
>> with what you're describing.
>>
>> 1. Since one direction remains the same when a computer roams, a new flow
>> entry would probably fail to be added because of an existing entry in the
>> flow hash table.
>
> I don't think so, hash includes iifidx.
I'm talking about the side where the input ifindex remains the same, but
the output interface doesn't.
>> 2. Even with that out of the way, the MTK hardware offload currently does
>> not support matching the incoming switch/ethernet port.
>> So even if we manage to add an updated entry, the old entry could still be
>> kept alive by the hardware.
>
> OK, that means probably driver needs to address the lack of iifidx in
> the matching by dealling with more than one single flow entry to point
> to one single hardware entry (refcounting?).
If we have multiple colliding entries, I think a more reasonable
behavior would be allowing the newer flow to override the older one.
>> The issues I found probably wouldn't cause connection hangs in pure L3
>> software flow offload, since it will use the bridge device for xmit instead
>> of its members. But since hardware offload needs to redirect traffic to
>> individual bridge ports, it could cause connection hangs with stale flow
>> entries.
>
> I would not expect a hang, packets will just flow over classic path
> for a little while for the computer that is roaming until the new flow
> entry is added.
If the hardware still handles traffic, but redirects it to the wrong
destination port, the connection will hang.
- Felix
^ permalink raw reply [flat|nested] 39+ messages in thread