Ethernet Bridge development
 help / color / mirror / Atom feed
* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
       [not found] ` <720907692575488526f06edc2cf5c8f783777d4f.1643044381.git.lorenzo@kernel.org>
@ 2022-01-24 18:32   ` Nikolay Aleksandrov
  2022-01-25  5:09     ` Alexei Starovoitov
  2022-01-26 11:27     ` Lorenzo Bianconi
  0 siblings, 2 replies; 10+ messages in thread
From: Nikolay Aleksandrov @ 2022-01-24 18:32 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: bridge@lists.linux-foundation.org, daniel, Roopa Prabhu, dsahern,
	toke, komachi.yoshiki, ast, Ido Schimmel, memxor, brouer, kuba,
	lorenzo.bianconi, andrii.nakryiko, davem

On 24/01/2022 19:20, Lorenzo Bianconi wrote:
> Similar to bpf_xdp_ct_lookup routine, introduce
> br_fdb_find_port_from_ifindex unstable helper in order to accelerate
> linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
> lookup in the associated bridge fdb table and it will return the
> output ifindex if the destination address is associated to a bridge
> port or -ENODEV for BOM traffic or if lookup fails.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  net/bridge/br.c         | 21 +++++++++++++
>  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
>  net/bridge/br_private.h | 12 ++++++++
>  3 files changed, 91 insertions(+), 9 deletions(-)
> 

Hi Lorenzo,
Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
thinking about a similar helper for some time now, few comments below.

Have you thought about the egress path and if by the current bridge state the packet would
be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
the active ports list based on netlink events, but there's a lot of egress bridge logic that
either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
egress stages, but I see how this is a good first step and perhaps we can build upon it.
There are a few possible solutions, but I haven't tried anything yet, most obvious being
yet another helper. :)

> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index 1fac72cc617f..d2d1c2341d9c 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -16,6 +16,8 @@
>  #include <net/llc.h>
>  #include <net/stp.h>
>  #include <net/switchdev.h>
> +#include <linux/btf.h>
> +#include <linux/btf_ids.h>
>  
>  #include "br_private.h"
>  
> @@ -365,6 +367,17 @@ static const struct stp_proto br_stp_proto = {
>  	.rcv	= br_stp_rcv,
>  };
>  
> +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> +BTF_SET_START(br_xdp_fdb_check_kfunc_ids)
> +BTF_ID(func, br_fdb_find_port_from_ifindex)
> +BTF_SET_END(br_xdp_fdb_check_kfunc_ids)
> +
> +static const struct btf_kfunc_id_set br_xdp_fdb_kfunc_set = {
> +	.owner     = THIS_MODULE,
> +	.check_set = &br_xdp_fdb_check_kfunc_ids,
> +};
> +#endif
> +
>  static int __init br_init(void)
>  {
>  	int err;
> @@ -417,6 +430,14 @@ static int __init br_init(void)
>  		"need this.\n");
>  #endif
>  
> +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> +	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &br_xdp_fdb_kfunc_set);
> +	if (err < 0) {
> +		br_netlink_fini();
> +		goto err_out6;

Add err_out7 and handle it there please. Let's keep it consistent.
Also I cannot find register_btf_kfunc_id_set() in net-next or Linus' master, but
should it be paired with an unregister on unload (br_deinit) ?

> +	}
> +#endif
> +
>  	return 0;
>  
>  err_out6:
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index 6ccda68bd473..cd3afa240298 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -235,30 +235,79 @@ static struct net_bridge_fdb_entry *br_fdb_find(struct net_bridge *br,
>  	return fdb;
>  }
>  
> -struct net_device *br_fdb_find_port(const struct net_device *br_dev,
> -				    const unsigned char *addr,
> -				    __u16 vid)
> +static struct net_device *
> +__br_fdb_find_port(const struct net_device *br_dev,
> +		   const unsigned char *addr,
> +		   __u16 vid, bool ts_update)
>  {
>  	struct net_bridge_fdb_entry *f;
> -	struct net_device *dev = NULL;
>  	struct net_bridge *br;
>  
> -	ASSERT_RTNL();
> -
>  	if (!netif_is_bridge_master(br_dev))
>  		return NULL;
>  
>  	br = netdev_priv(br_dev);
> -	rcu_read_lock();
>  	f = br_fdb_find_rcu(br, addr, vid);
> -	if (f && f->dst)
> -		dev = f->dst->dev;
> +
> +	if (f && f->dst) {
> +		f->updated = jiffies;
> +		f->used = f->updated;

This is wrong, f->updated should be set only if anything changed for the fdb.
Also you can optimize f->used a little bit if you check if jiffies != current value
before setting, you can have millions of packets per sec dirtying that cache line.

Aside from the above, it will change expected behaviour for br_fdb_find_port users
(mlxsw, added Ido to CC as well) because it will mark the fdb as active and refresh it
which should be done only for the ebpf helper, or might be exported through another helper
so ebpf users can decide if they want it updated. There are 2 different use cases and it is
not ok for both as we'll start refreshing fdbs that have been inactive for a while
and would've expired otherwise.

> +		return f->dst->dev;

This is wrong as well, f->dst can become NULL (fdb switched to point to the bridge itself).
You should make sure to read f->dst only once and work with the result. I know it's
been like that, but it was ok when accessed with rtnl held.

> +	}
> +	return NULL;
> +}
> +
> +struct net_device *br_fdb_find_port(const struct net_device *br_dev,
> +				    const unsigned char *addr,
> +				    __u16 vid)
> +{
> +	struct net_device *dev;
> +
> +	ASSERT_RTNL();
> +
> +	rcu_read_lock();
> +	dev = __br_fdb_find_port(br_dev, addr, vid, false);
>  	rcu_read_unlock();
>  
>  	return dev;
>  }
>  EXPORT_SYMBOL_GPL(br_fdb_find_port);
>  
> +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> +				  struct bpf_fdb_lookup *opt,
> +				  u32 opt__sz)
> +{
> +	struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
> +	struct net_bridge_port *port;
> +	struct net_device *dev;
> +	int ret = -ENODEV;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
> +	if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
> +		return -ENODEV;
> +
> +	rcu_read_lock();
> +
> +	dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
> +	if (!dev)
> +		goto out;
> +
> +	if (unlikely(!netif_is_bridge_port(dev)))
> +		goto out;

This check shouldn't be needed if the port checks below succeed.

> +
> +	port = br_port_get_check_rcu(dev);
> +	if (unlikely(!port || !port->br))
> +		goto out;
> +
> +	dev = __br_fdb_find_port(port->br->dev, opt->addr, opt->vid, true);
> +	if (dev)
> +		ret = dev->ifindex;
> +out:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
>  struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,
>  					     const unsigned char *addr,
>  					     __u16 vid)
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2661dda1a92b..64d4f1727da2 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -18,6 +18,7 @@
>  #include <linux/if_vlan.h>
>  #include <linux/rhashtable.h>
>  #include <linux/refcount.h>
> +#include <linux/bpf.h>
>  
>  #define BR_HASH_BITS 8
>  #define BR_HASH_SIZE (1 << BR_HASH_BITS)
> @@ -2094,4 +2095,15 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
>  void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
>  		       u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
>  struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
> +
> +#define NF_BPF_FDB_OPTS_SZ	12
> +struct bpf_fdb_lookup {
> +	u8	addr[ETH_ALEN]; /* ETH_ALEN */
> +	u16	vid;
> +	u32	ifindex;
> +};
> +
> +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> +				  struct bpf_fdb_lookup *opt,
> +				  u32 opt__sz);
>  #endif

Thanks,
 Nik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-24 18:32   ` [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper Nikolay Aleksandrov
@ 2022-01-25  5:09     ` Alexei Starovoitov
  2022-01-26 11:09       ` Lorenzo Bianconi
  2022-01-26 11:27     ` Lorenzo Bianconi
  1 sibling, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-01-25  5:09 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: bridge@lists.linux-foundation.org, Daniel Borkmann, Roopa Prabhu,
	Network Development, David Ahern,
	Toke Høiland-Jørgensen, Yoshiki Komachi,
	Alexei Starovoitov, David S. Miller, Ido Schimmel,
	Kumar Kartikeya Dwivedi, Jesper Dangaard Brouer, Jakub Kicinski,
	bpf, Andrii Nakryiko, Lorenzo Bianconi, Lorenzo Bianconi

On Mon, Jan 24, 2022 at 10:32 AM Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
> >
> > +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> > +                               struct bpf_fdb_lookup *opt,
> > +                               u32 opt__sz)
> > +{
> > +     struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
> > +     struct net_bridge_port *port;
> > +     struct net_device *dev;
> > +     int ret = -ENODEV;
> > +
> > +     BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
> > +     if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
> > +             return -ENODEV;
> > +
> > +     rcu_read_lock();
> > +
> > +     dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
> > +     if (!dev)
> > +             goto out;

imo that is way too much wrapping for an unstable helper.
The dev lookup is not cheap.

With all the extra checks the XDP acceleration gets reduced.
I think it would be better to use kprobe/fentry on bridge
functions that operate on fdb and replicate necessary
data into bpf map.
Then xdp prog would do a single cheap lookup from that map
to figure out 'port'.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-25  5:09     ` Alexei Starovoitov
@ 2022-01-26 11:09       ` Lorenzo Bianconi
  2022-01-26 12:02         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2022-01-26 11:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bridge@lists.linux-foundation.org, Daniel Borkmann, Roopa Prabhu,
	Network Development, David Ahern,
	Toke Høiland-Jørgensen, Yoshiki Komachi,
	Alexei Starovoitov, Lorenzo Bianconi, Ido Schimmel,
	Nikolay Aleksandrov, Jesper Dangaard Brouer, Jakub Kicinski, bpf,
	Andrii Nakryiko, David S. Miller, Kumar Kartikeya Dwivedi

[-- Attachment #1: Type: text/plain, Size: 1332 bytes --]

> On Mon, Jan 24, 2022 at 10:32 AM Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
> > >
> > > +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> > > +                               struct bpf_fdb_lookup *opt,
> > > +                               u32 opt__sz)
> > > +{
> > > +     struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
> > > +     struct net_bridge_port *port;
> > > +     struct net_device *dev;
> > > +     int ret = -ENODEV;
> > > +
> > > +     BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
> > > +     if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
> > > +             return -ENODEV;
> > > +
> > > +     rcu_read_lock();
> > > +
> > > +     dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
> > > +     if (!dev)
> > > +             goto out;
> 
> imo that is way too much wrapping for an unstable helper.
> The dev lookup is not cheap.
> 
> With all the extra checks the XDP acceleration gets reduced.
> I think it would be better to use kprobe/fentry on bridge
> functions that operate on fdb and replicate necessary
> data into bpf map.
> Then xdp prog would do a single cheap lookup from that map
> to figure out 'port'.

ack, right. This is a very interesting approach. I will investigate it. Thanks.

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-24 18:32   ` [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper Nikolay Aleksandrov
  2022-01-25  5:09     ` Alexei Starovoitov
@ 2022-01-26 11:27     ` Lorenzo Bianconi
  2022-01-26 12:08       ` Kumar Kartikeya Dwivedi
  2022-01-26 12:39       ` Nikolay Aleksandrov
  1 sibling, 2 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2022-01-26 11:27 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: bridge@lists.linux-foundation.org, daniel, Roopa Prabhu, netdev,
	dsahern, toke, komachi.yoshiki, ast, davem, Ido Schimmel, memxor,
	brouer, kuba, bpf, andrii.nakryiko, lorenzo.bianconi

[-- Attachment #1: Type: text/plain, Size: 8810 bytes --]

> On 24/01/2022 19:20, Lorenzo Bianconi wrote:
> > Similar to bpf_xdp_ct_lookup routine, introduce
> > br_fdb_find_port_from_ifindex unstable helper in order to accelerate
> > linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
> > lookup in the associated bridge fdb table and it will return the
> > output ifindex if the destination address is associated to a bridge
> > port or -ENODEV for BOM traffic or if lookup fails.
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  net/bridge/br.c         | 21 +++++++++++++
> >  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
> >  net/bridge/br_private.h | 12 ++++++++
> >  3 files changed, 91 insertions(+), 9 deletions(-)
> > 
> 
> Hi Lorenzo,

Hi Nikolay,

thx for the review.

> Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
> bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
> thinking about a similar helper for some time now, few comments below.

yes, sorry for that. I figured it out after sending the series out.

> 
> Have you thought about the egress path and if by the current bridge state the packet would
> be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
> the active ports list based on netlink events, but there's a lot of egress bridge logic that
> either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
> egress stages, but I see how this is a good first step and perhaps we can build upon it.
> There are a few possible solutions, but I haven't tried anything yet, most obvious being
> yet another helper. :)

ack, right but I am bit worried about adding too much logic and slow down xdp
performances. I guess we can investigate first the approach proposed by Alexei
and then revaluate. Agree?

> 
> > diff --git a/net/bridge/br.c b/net/bridge/br.c
> > index 1fac72cc617f..d2d1c2341d9c 100644
> > --- a/net/bridge/br.c
> > +++ b/net/bridge/br.c
> > @@ -16,6 +16,8 @@
> >  #include <net/llc.h>
> >  #include <net/stp.h>
> >  #include <net/switchdev.h>
> > +#include <linux/btf.h>
> > +#include <linux/btf_ids.h>
> >  
> >  #include "br_private.h"
> >  
> > @@ -365,6 +367,17 @@ static const struct stp_proto br_stp_proto = {
> >  	.rcv	= br_stp_rcv,
> >  };
> >  
> > +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> > +BTF_SET_START(br_xdp_fdb_check_kfunc_ids)
> > +BTF_ID(func, br_fdb_find_port_from_ifindex)
> > +BTF_SET_END(br_xdp_fdb_check_kfunc_ids)
> > +
> > +static const struct btf_kfunc_id_set br_xdp_fdb_kfunc_set = {
> > +	.owner     = THIS_MODULE,
> > +	.check_set = &br_xdp_fdb_check_kfunc_ids,
> > +};
> > +#endif
> > +
> >  static int __init br_init(void)
> >  {
> >  	int err;
> > @@ -417,6 +430,14 @@ static int __init br_init(void)
> >  		"need this.\n");
> >  #endif
> >  
> > +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> > +	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &br_xdp_fdb_kfunc_set);
> > +	if (err < 0) {
> > +		br_netlink_fini();
> > +		goto err_out6;
> 
> Add err_out7 and handle it there please. Let's keep it consistent.
> Also I cannot find register_btf_kfunc_id_set() in net-next or Linus' master, but
> should it be paired with an unregister on unload (br_deinit) ?

I guess at the time I sent the series it was just in bpf-next but now it should
be in net-next too.
I do not think we need a unregister here.
@Kumar: agree?

> 
> > +	}
> > +#endif
> > +
> >  	return 0;
> >  
> >  err_out6:
> > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> > index 6ccda68bd473..cd3afa240298 100644
> > --- a/net/bridge/br_fdb.c
> > +++ b/net/bridge/br_fdb.c
> > @@ -235,30 +235,79 @@ static struct net_bridge_fdb_entry *br_fdb_find(struct net_bridge *br,
> >  	return fdb;
> >  }
> >  
> > -struct net_device *br_fdb_find_port(const struct net_device *br_dev,
> > -				    const unsigned char *addr,
> > -				    __u16 vid)
> > +static struct net_device *
> > +__br_fdb_find_port(const struct net_device *br_dev,
> > +		   const unsigned char *addr,
> > +		   __u16 vid, bool ts_update)
> >  {
> >  	struct net_bridge_fdb_entry *f;
> > -	struct net_device *dev = NULL;
> >  	struct net_bridge *br;
> >  
> > -	ASSERT_RTNL();
> > -
> >  	if (!netif_is_bridge_master(br_dev))
> >  		return NULL;
> >  
> >  	br = netdev_priv(br_dev);
> > -	rcu_read_lock();
> >  	f = br_fdb_find_rcu(br, addr, vid);
> > -	if (f && f->dst)
> > -		dev = f->dst->dev;
> > +
> > +	if (f && f->dst) {
> > +		f->updated = jiffies;
> > +		f->used = f->updated;
> 
> This is wrong, f->updated should be set only if anything changed for the fdb.
> Also you can optimize f->used a little bit if you check if jiffies != current value
> before setting, you can have millions of packets per sec dirtying that cache line.

ack, right. I will fix it.

> 
> Aside from the above, it will change expected behaviour for br_fdb_find_port users
> (mlxsw, added Ido to CC as well) because it will mark the fdb as active and refresh it
> which should be done only for the ebpf helper, or might be exported through another helper
> so ebpf users can decide if they want it updated. There are 2 different use cases and it is
> not ok for both as we'll start refreshing fdbs that have been inactive for a while
> and would've expired otherwise.

This is a bug actually. I forgot to check ts_update in the if condition,
something like:

if (f && f->dst && ts_update) {
 ...
 }

> 
> > +		return f->dst->dev;
> 
> This is wrong as well, f->dst can become NULL (fdb switched to point to the bridge itself).
> You should make sure to read f->dst only once and work with the result. I know it's
> been like that, but it was ok when accessed with rtnl held.

uhm, right. I will fix it.

> 
> > +	}
> > +	return NULL;
> > +}
> > +
> > +struct net_device *br_fdb_find_port(const struct net_device *br_dev,
> > +				    const unsigned char *addr,
> > +				    __u16 vid)
> > +{
> > +	struct net_device *dev;
> > +
> > +	ASSERT_RTNL();
> > +
> > +	rcu_read_lock();
> > +	dev = __br_fdb_find_port(br_dev, addr, vid, false);
> >  	rcu_read_unlock();
> >  
> >  	return dev;
> >  }
> >  EXPORT_SYMBOL_GPL(br_fdb_find_port);
> >  
> > +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> > +				  struct bpf_fdb_lookup *opt,
> > +				  u32 opt__sz)
> > +{
> > +	struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
> > +	struct net_bridge_port *port;
> > +	struct net_device *dev;
> > +	int ret = -ENODEV;
> > +
> > +	BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
> > +	if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
> > +		return -ENODEV;
> > +
> > +	rcu_read_lock();
> > +
> > +	dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
> > +	if (!dev)
> > +		goto out;
> > +
> > +	if (unlikely(!netif_is_bridge_port(dev)))
> > +		goto out;
> 
> This check shouldn't be needed if the port checks below succeed.

ack, I will fix it.

Regards,
Lorenzo

> 
> > +
> > +	port = br_port_get_check_rcu(dev);
> > +	if (unlikely(!port || !port->br))
> > +		goto out;
> > +
> > +	dev = __br_fdb_find_port(port->br->dev, opt->addr, opt->vid, true);
> > +	if (dev)
> > +		ret = dev->ifindex;
> > +out:
> > +	rcu_read_unlock();
> > +
> > +	return ret;
> > +}
> > +
> >  struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,
> >  					     const unsigned char *addr,
> >  					     __u16 vid)
> > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> > index 2661dda1a92b..64d4f1727da2 100644
> > --- a/net/bridge/br_private.h
> > +++ b/net/bridge/br_private.h
> > @@ -18,6 +18,7 @@
> >  #include <linux/if_vlan.h>
> >  #include <linux/rhashtable.h>
> >  #include <linux/refcount.h>
> > +#include <linux/bpf.h>
> >  
> >  #define BR_HASH_BITS 8
> >  #define BR_HASH_SIZE (1 << BR_HASH_BITS)
> > @@ -2094,4 +2095,15 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
> >  void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
> >  		       u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
> >  struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
> > +
> > +#define NF_BPF_FDB_OPTS_SZ	12
> > +struct bpf_fdb_lookup {
> > +	u8	addr[ETH_ALEN]; /* ETH_ALEN */
> > +	u16	vid;
> > +	u32	ifindex;
> > +};
> > +
> > +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
> > +				  struct bpf_fdb_lookup *opt,
> > +				  u32 opt__sz);
> >  #endif
> 
> Thanks,
>  Nik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 11:09       ` Lorenzo Bianconi
@ 2022-01-26 12:02         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-01-26 12:02 UTC (permalink / raw)
  To: Lorenzo Bianconi, Alexei Starovoitov
  Cc: bridge@lists.linux-foundation.org, Daniel Borkmann,
	Network Development, David Ahern, Roopa Prabhu, Yoshiki Komachi,
	Alexei Starovoitov, Lorenzo Bianconi, Ido Schimmel,
	Nikolay Aleksandrov, Jesper Dangaard Brouer, Jakub Kicinski, bpf,
	Andrii Nakryiko, David S. Miller, Kumar Kartikeya Dwivedi

Lorenzo Bianconi <lorenzo@kernel.org> writes:

>> On Mon, Jan 24, 2022 at 10:32 AM Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
>> > >
>> > > +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
>> > > +                               struct bpf_fdb_lookup *opt,
>> > > +                               u32 opt__sz)
>> > > +{
>> > > +     struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
>> > > +     struct net_bridge_port *port;
>> > > +     struct net_device *dev;
>> > > +     int ret = -ENODEV;
>> > > +
>> > > +     BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
>> > > +     if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
>> > > +             return -ENODEV;
>> > > +
>> > > +     rcu_read_lock();
>> > > +
>> > > +     dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
>> > > +     if (!dev)
>> > > +             goto out;
>> 
>> imo that is way too much wrapping for an unstable helper.
>> The dev lookup is not cheap.
>> 
>> With all the extra checks the XDP acceleration gets reduced.
>> I think it would be better to use kprobe/fentry on bridge
>> functions that operate on fdb and replicate necessary
>> data into bpf map.
>> Then xdp prog would do a single cheap lookup from that map
>> to figure out 'port'.
>
> ack, right. This is a very interesting approach. I will investigate
> it. Thanks.

I think it would be interesting to try both, and compare their
performance. I'm a bit sceptical about Alexei's assertion that
dev_get_by_index_rcu() is that expensive: we do such a lookup in the XDP
redirect code when using the non-map bpf_redirect() helper, and I have
not been able to measure a significant performance difference between
the map and non-map variants (after we added bulking to the latter).

If looking up devices by ifindex does turn out to be too expensive,
maybe what we really need is a way to pass around 'struct net_device'
pointers to BPF helpers, so a given BPF program only has to do the
lookup once if it's calling multiple dev-based helpers? I think this
should be doable with BTF, no?

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 11:27     ` Lorenzo Bianconi
@ 2022-01-26 12:08       ` Kumar Kartikeya Dwivedi
  2022-01-26 12:39       ` Nikolay Aleksandrov
  1 sibling, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-01-26 12:08 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bridge@lists.linux-foundation.org, daniel, Roopa Prabhu, netdev,
	dsahern, toke, komachi.yoshiki, ast, lorenzo.bianconi,
	Ido Schimmel, Nikolay Aleksandrov, brouer, kuba, bpf,
	andrii.nakryiko, davem

On Wed, Jan 26, 2022 at 04:57:42PM IST, Lorenzo Bianconi wrote:
> > On 24/01/2022 19:20, Lorenzo Bianconi wrote:
> > > Similar to bpf_xdp_ct_lookup routine, introduce
> > > br_fdb_find_port_from_ifindex unstable helper in order to accelerate
> > > linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
> > > lookup in the associated bridge fdb table and it will return the
> > > output ifindex if the destination address is associated to a bridge
> > > port or -ENODEV for BOM traffic or if lookup fails.
> > >
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > ---
> > >  net/bridge/br.c         | 21 +++++++++++++
> > >  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
> > >  net/bridge/br_private.h | 12 ++++++++
> > >  3 files changed, 91 insertions(+), 9 deletions(-)
> > >
> >
> > Hi Lorenzo,
>
> Hi Nikolay,
>
> thx for the review.
>
> [...]
>
> I guess at the time I sent the series it was just in bpf-next but now it should
> be in net-next too.
> I do not think we need a unregister here.
> @Kumar: agree?
>

Yes, no need to call unregister (hence there is no unregister).

> > [...]



--
Kartikeya

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 11:27     ` Lorenzo Bianconi
  2022-01-26 12:08       ` Kumar Kartikeya Dwivedi
@ 2022-01-26 12:39       ` Nikolay Aleksandrov
  2022-01-26 12:50         ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2022-01-26 12:39 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bridge@lists.linux-foundation.org, daniel, Roopa Prabhu, netdev,
	dsahern, toke, komachi.yoshiki, ast, davem, Ido Schimmel, memxor,
	brouer, kuba, bpf, andrii.nakryiko, lorenzo.bianconi

On 26/01/2022 13:27, Lorenzo Bianconi wrote:
>> On 24/01/2022 19:20, Lorenzo Bianconi wrote:
>>> Similar to bpf_xdp_ct_lookup routine, introduce
>>> br_fdb_find_port_from_ifindex unstable helper in order to accelerate
>>> linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
>>> lookup in the associated bridge fdb table and it will return the
>>> output ifindex if the destination address is associated to a bridge
>>> port or -ENODEV for BOM traffic or if lookup fails.
>>>
>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>> ---
>>>  net/bridge/br.c         | 21 +++++++++++++
>>>  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
>>>  net/bridge/br_private.h | 12 ++++++++
>>>  3 files changed, 91 insertions(+), 9 deletions(-)
>>>
>>
>> Hi Lorenzo,
> 
> Hi Nikolay,
> 
> thx for the review.
> 
>> Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
>> bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
>> thinking about a similar helper for some time now, few comments below.
> 
> yes, sorry for that. I figured it out after sending the series out.
> 
>>
>> Have you thought about the egress path and if by the current bridge state the packet would
>> be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
>> the active ports list based on netlink events, but there's a lot of egress bridge logic that
>> either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
>> egress stages, but I see how this is a good first step and perhaps we can build upon it.
>> There are a few possible solutions, but I haven't tried anything yet, most obvious being
>> yet another helper. :)
> 
> ack, right but I am bit worried about adding too much logic and slow down xdp
> performances. I guess we can investigate first the approach proposed by Alexei
> and then revaluate. Agree?
> 

Sure, that approach sounds very interesting, but my point was that bypassing the ingress
and egress logic defeats most of the bridge features. You just get an fdb hash table which
you can build today with ebpf without any changes to the kernel. :) You have multiple states,
flags and options for each port and each vlan which can change dynamically based on external
events (e.g. STP, config changes etc) and they can affect forwarding even if the fdbs remain
in the table.
One (untested, potential) way is to speedup full flows that have successfully passed from ingress to
egress for some period of time and flush them based on related events that might have affected
them, but that is very different. Another way would be to replicate some of that logic in ebpf
which would hit performance, and would probably also require more helpers. It would be interesting
to see how this problem would be solved.

>>
>>> diff --git a/net/bridge/br.c b/net/bridge/br.c
>>> index 1fac72cc617f..d2d1c2341d9c 100644
>>> --- a/net/bridge/br.c
>>> +++ b/net/bridge/br.c
>>> @@ -16,6 +16,8 @@
>>>  #include <net/llc.h>
>>>  #include <net/stp.h>
>>>  #include <net/switchdev.h>
>>> +#include <linux/btf.h>
>>> +#include <linux/btf_ids.h>
>>>  
>>>  #include "br_private.h"
>>>  
>>> @@ -365,6 +367,17 @@ static const struct stp_proto br_stp_proto = {
>>>  	.rcv	= br_stp_rcv,
>>>  };
>>>  
>>> +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
>>> +BTF_SET_START(br_xdp_fdb_check_kfunc_ids)
>>> +BTF_ID(func, br_fdb_find_port_from_ifindex)
>>> +BTF_SET_END(br_xdp_fdb_check_kfunc_ids)
>>> +
>>> +static const struct btf_kfunc_id_set br_xdp_fdb_kfunc_set = {
>>> +	.owner     = THIS_MODULE,
>>> +	.check_set = &br_xdp_fdb_check_kfunc_ids,
>>> +};
>>> +#endif
>>> +
>>>  static int __init br_init(void)
>>>  {
>>>  	int err;
>>> @@ -417,6 +430,14 @@ static int __init br_init(void)
>>>  		"need this.\n");
>>>  #endif
>>>  
>>> +#if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF) || IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
>>> +	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &br_xdp_fdb_kfunc_set);
>>> +	if (err < 0) {
>>> +		br_netlink_fini();
>>> +		goto err_out6;
>>
>> Add err_out7 and handle it there please. Let's keep it consistent.
>> Also I cannot find register_btf_kfunc_id_set() in net-next or Linus' master, but
>> should it be paired with an unregister on unload (br_deinit) ?
> 
> I guess at the time I sent the series it was just in bpf-next but now it should
> be in net-next too.
> I do not think we need a unregister here.
> @Kumar: agree?
> 

Oh, my bad. I obviously should've looked at the bpf tree. :)

>>
>>> +	}
>>> +#endif
>>> +
>>>  	return 0;
>>>  
>>>  err_out6:
>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>> index 6ccda68bd473..cd3afa240298 100644
>>> --- a/net/bridge/br_fdb.c
>>> +++ b/net/bridge/br_fdb.c
>>> @@ -235,30 +235,79 @@ static struct net_bridge_fdb_entry *br_fdb_find(struct net_bridge *br,
>>>  	return fdb;
>>>  }
>>>  
>>> -struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>>> -				    const unsigned char *addr,
>>> -				    __u16 vid)
>>> +static struct net_device *
>>> +__br_fdb_find_port(const struct net_device *br_dev,
>>> +		   const unsigned char *addr,
>>> +		   __u16 vid, bool ts_update)
>>>  {
>>>  	struct net_bridge_fdb_entry *f;
>>> -	struct net_device *dev = NULL;
>>>  	struct net_bridge *br;
>>>  
>>> -	ASSERT_RTNL();
>>> -
>>>  	if (!netif_is_bridge_master(br_dev))
>>>  		return NULL;
>>>  
>>>  	br = netdev_priv(br_dev);
>>> -	rcu_read_lock();
>>>  	f = br_fdb_find_rcu(br, addr, vid);
>>> -	if (f && f->dst)
>>> -		dev = f->dst->dev;
>>> +
>>> +	if (f && f->dst) {
>>> +		f->updated = jiffies;
>>> +		f->used = f->updated;
>>
>> This is wrong, f->updated should be set only if anything changed for the fdb.
>> Also you can optimize f->used a little bit if you check if jiffies != current value
>> before setting, you can have millions of packets per sec dirtying that cache line.
> 
> ack, right. I will fix it.
> 
>>
>> Aside from the above, it will change expected behaviour for br_fdb_find_port users
>> (mlxsw, added Ido to CC as well) because it will mark the fdb as active and refresh it
>> which should be done only for the ebpf helper, or might be exported through another helper
>> so ebpf users can decide if they want it updated. There are 2 different use cases and it is
>> not ok for both as we'll start refreshing fdbs that have been inactive for a while
>> and would've expired otherwise.
> 
> This is a bug actually. I forgot to check ts_update in the if condition,
> something like:
> 
> if (f && f->dst && ts_update) {
>  ...
>  }
> 
>>
>>> +		return f->dst->dev;
>>
>> This is wrong as well, f->dst can become NULL (fdb switched to point to the bridge itself).
>> You should make sure to read f->dst only once and work with the result. I know it's
>> been like that, but it was ok when accessed with rtnl held.
> 
> uhm, right. I will fix it.
> 
>>
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>>> +struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>>> +				    const unsigned char *addr,
>>> +				    __u16 vid)
>>> +{
>>> +	struct net_device *dev;
>>> +
>>> +	ASSERT_RTNL();
>>> +
>>> +	rcu_read_lock();
>>> +	dev = __br_fdb_find_port(br_dev, addr, vid, false);
>>>  	rcu_read_unlock();
>>>  
>>>  	return dev;
>>>  }
>>>  EXPORT_SYMBOL_GPL(br_fdb_find_port);
>>>  
>>> +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
>>> +				  struct bpf_fdb_lookup *opt,
>>> +				  u32 opt__sz)
>>> +{
>>> +	struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
>>> +	struct net_bridge_port *port;
>>> +	struct net_device *dev;
>>> +	int ret = -ENODEV;
>>> +
>>> +	BUILD_BUG_ON(sizeof(struct bpf_fdb_lookup) != NF_BPF_FDB_OPTS_SZ);
>>> +	if (!opt || opt__sz != sizeof(struct bpf_fdb_lookup))
>>> +		return -ENODEV;
>>> +
>>> +	rcu_read_lock();
>>> +
>>> +	dev = dev_get_by_index_rcu(dev_net(ctx->rxq->dev), opt->ifindex);
>>> +	if (!dev)
>>> +		goto out;
>>> +
>>> +	if (unlikely(!netif_is_bridge_port(dev)))
>>> +		goto out;
>>
>> This check shouldn't be needed if the port checks below succeed.
> 
> ack, I will fix it.
> 
> Regards,
> Lorenzo
> 
>>
>>> +
>>> +	port = br_port_get_check_rcu(dev);
>>> +	if (unlikely(!port || !port->br))
>>> +		goto out;
>>> +
>>> +	dev = __br_fdb_find_port(port->br->dev, opt->addr, opt->vid, true);
>>> +	if (dev)
>>> +		ret = dev->ifindex;
>>> +out:
>>> +	rcu_read_unlock();
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,
>>>  					     const unsigned char *addr,
>>>  					     __u16 vid)
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index 2661dda1a92b..64d4f1727da2 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -18,6 +18,7 @@
>>>  #include <linux/if_vlan.h>
>>>  #include <linux/rhashtable.h>
>>>  #include <linux/refcount.h>
>>> +#include <linux/bpf.h>
>>>  
>>>  #define BR_HASH_BITS 8
>>>  #define BR_HASH_SIZE (1 << BR_HASH_BITS)
>>> @@ -2094,4 +2095,15 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
>>>  void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
>>>  		       u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
>>>  struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
>>> +
>>> +#define NF_BPF_FDB_OPTS_SZ	12
>>> +struct bpf_fdb_lookup {
>>> +	u8	addr[ETH_ALEN]; /* ETH_ALEN */
>>> +	u16	vid;
>>> +	u32	ifindex;
>>> +};
>>> +
>>> +int br_fdb_find_port_from_ifindex(struct xdp_md *xdp_ctx,
>>> +				  struct bpf_fdb_lookup *opt,
>>> +				  u32 opt__sz);
>>>  #endif
>>
>> Thanks,
>>  Nik


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 12:39       ` Nikolay Aleksandrov
@ 2022-01-26 12:50         ` Toke Høiland-Jørgensen
  2022-01-26 12:57           ` Nikolay Aleksandrov
  0 siblings, 1 reply; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-01-26 12:50 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Lorenzo Bianconi
  Cc: bridge@lists.linux-foundation.org, daniel, netdev, dsahern,
	Roopa Prabhu, komachi.yoshiki, ast, davem, Ido Schimmel, memxor,
	brouer, kuba, bpf, andrii.nakryiko, lorenzo.bianconi

Nikolay Aleksandrov <nikolay@nvidia.com> writes:

> On 26/01/2022 13:27, Lorenzo Bianconi wrote:
>>> On 24/01/2022 19:20, Lorenzo Bianconi wrote:
>>>> Similar to bpf_xdp_ct_lookup routine, introduce
>>>> br_fdb_find_port_from_ifindex unstable helper in order to accelerate
>>>> linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
>>>> lookup in the associated bridge fdb table and it will return the
>>>> output ifindex if the destination address is associated to a bridge
>>>> port or -ENODEV for BOM traffic or if lookup fails.
>>>>
>>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>>> ---
>>>>  net/bridge/br.c         | 21 +++++++++++++
>>>>  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
>>>>  net/bridge/br_private.h | 12 ++++++++
>>>>  3 files changed, 91 insertions(+), 9 deletions(-)
>>>>
>>>
>>> Hi Lorenzo,
>> 
>> Hi Nikolay,
>> 
>> thx for the review.
>> 
>>> Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
>>> bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
>>> thinking about a similar helper for some time now, few comments below.
>> 
>> yes, sorry for that. I figured it out after sending the series out.
>> 
>>>
>>> Have you thought about the egress path and if by the current bridge state the packet would
>>> be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
>>> the active ports list based on netlink events, but there's a lot of egress bridge logic that
>>> either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
>>> egress stages, but I see how this is a good first step and perhaps we can build upon it.
>>> There are a few possible solutions, but I haven't tried anything yet, most obvious being
>>> yet another helper. :)
>> 
>> ack, right but I am bit worried about adding too much logic and slow down xdp
>> performances. I guess we can investigate first the approach proposed by Alexei
>> and then revaluate. Agree?
>> 
>
> Sure, that approach sounds very interesting, but my point was that
> bypassing the ingress and egress logic defeats most of the bridge
> features. You just get an fdb hash table which you can build today
> with ebpf without any changes to the kernel. :) You have multiple
> states, flags and options for each port and each vlan which can change
> dynamically based on external events (e.g. STP, config changes etc)
> and they can affect forwarding even if the fdbs remain in the table.

To me, leveraging all this is precisely the reason to have BPF helpers
instead of just replicating state in BPF maps: it's very easy to do that
and show a nice speedup, and then once you get all the corner cases
covered that the in-kernel code already deals with, you've chipped away
at that speedup and spent a lot of time essentially re-writing the
battle-tested code already in the kernel.

So I think figuring out how to do the state sync is the right thing to
do; a second helper would be fine for this, IMO, but I'm not really
familiar enough with the bridge code to really have a qualified opinion.

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 12:50         ` Toke Høiland-Jørgensen
@ 2022-01-26 12:57           ` Nikolay Aleksandrov
  2022-01-26 15:00             ` Lorenzo Bianconi
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2022-01-26 12:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Lorenzo Bianconi
  Cc: bridge@lists.linux-foundation.org, daniel, netdev, dsahern,
	Roopa Prabhu, komachi.yoshiki, ast, davem, Ido Schimmel, memxor,
	brouer, kuba, bpf, andrii.nakryiko, lorenzo.bianconi

On 26/01/2022 14:50, Toke Høiland-Jørgensen wrote:
> Nikolay Aleksandrov <nikolay@nvidia.com> writes:
> 
>> On 26/01/2022 13:27, Lorenzo Bianconi wrote:
>>>> On 24/01/2022 19:20, Lorenzo Bianconi wrote:
>>>>> Similar to bpf_xdp_ct_lookup routine, introduce
>>>>> br_fdb_find_port_from_ifindex unstable helper in order to accelerate
>>>>> linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
>>>>> lookup in the associated bridge fdb table and it will return the
>>>>> output ifindex if the destination address is associated to a bridge
>>>>> port or -ENODEV for BOM traffic or if lookup fails.
>>>>>
>>>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>>>> ---
>>>>>  net/bridge/br.c         | 21 +++++++++++++
>>>>>  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
>>>>>  net/bridge/br_private.h | 12 ++++++++
>>>>>  3 files changed, 91 insertions(+), 9 deletions(-)
>>>>>
>>>>
>>>> Hi Lorenzo,
>>>
>>> Hi Nikolay,
>>>
>>> thx for the review.
>>>
>>>> Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
>>>> bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
>>>> thinking about a similar helper for some time now, few comments below.
>>>
>>> yes, sorry for that. I figured it out after sending the series out.
>>>
>>>>
>>>> Have you thought about the egress path and if by the current bridge state the packet would
>>>> be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
>>>> the active ports list based on netlink events, but there's a lot of egress bridge logic that
>>>> either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
>>>> egress stages, but I see how this is a good first step and perhaps we can build upon it.
>>>> There are a few possible solutions, but I haven't tried anything yet, most obvious being
>>>> yet another helper. :)
>>>
>>> ack, right but I am bit worried about adding too much logic and slow down xdp
>>> performances. I guess we can investigate first the approach proposed by Alexei
>>> and then revaluate. Agree?
>>>
>>
>> Sure, that approach sounds very interesting, but my point was that
>> bypassing the ingress and egress logic defeats most of the bridge
>> features. You just get an fdb hash table which you can build today
>> with ebpf without any changes to the kernel. :) You have multiple
>> states, flags and options for each port and each vlan which can change
>> dynamically based on external events (e.g. STP, config changes etc)
>> and they can affect forwarding even if the fdbs remain in the table.
> 
> To me, leveraging all this is precisely the reason to have BPF helpers
> instead of just replicating state in BPF maps: it's very easy to do that
> and show a nice speedup, and then once you get all the corner cases
> covered that the in-kernel code already deals with, you've chipped away
> at that speedup and spent a lot of time essentially re-writing the
> battle-tested code already in the kernel.
> 
> So I think figuring out how to do the state sync is the right thing to
> do; a second helper would be fine for this, IMO, but I'm not really
> familiar enough with the bridge code to really have a qualified opinion.
> 
> -Toke
> 

Right, sounds good to me. IMO it should be required in order to get a meaningful bridge
speedup, otherwise the solution is incomplete and you just do simple lookups that ignore
all of the state that could impact the forwarding decision.

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper
  2022-01-26 12:57           ` Nikolay Aleksandrov
@ 2022-01-26 15:00             ` Lorenzo Bianconi
  0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2022-01-26 15:00 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: bridge@lists.linux-foundation.org, daniel, Roopa Prabhu, netdev,
	dsahern, Toke Høiland-Jørgensen, komachi.yoshiki, ast,
	davem, Ido Schimmel, memxor, brouer, kuba, bpf, andrii.nakryiko,
	lorenzo.bianconi

[-- Attachment #1: Type: text/plain, Size: 3963 bytes --]

> On 26/01/2022 14:50, Toke Høiland-Jørgensen wrote:
> > Nikolay Aleksandrov <nikolay@nvidia.com> writes:
> > 
> >> On 26/01/2022 13:27, Lorenzo Bianconi wrote:
> >>>> On 24/01/2022 19:20, Lorenzo Bianconi wrote:
> >>>>> Similar to bpf_xdp_ct_lookup routine, introduce
> >>>>> br_fdb_find_port_from_ifindex unstable helper in order to accelerate
> >>>>> linux bridge with XDP. br_fdb_find_port_from_ifindex will perform a
> >>>>> lookup in the associated bridge fdb table and it will return the
> >>>>> output ifindex if the destination address is associated to a bridge
> >>>>> port or -ENODEV for BOM traffic or if lookup fails.
> >>>>>
> >>>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >>>>> ---
> >>>>>  net/bridge/br.c         | 21 +++++++++++++
> >>>>>  net/bridge/br_fdb.c     | 67 +++++++++++++++++++++++++++++++++++------
> >>>>>  net/bridge/br_private.h | 12 ++++++++
> >>>>>  3 files changed, 91 insertions(+), 9 deletions(-)
> >>>>>
> >>>>
> >>>> Hi Lorenzo,
> >>>
> >>> Hi Nikolay,
> >>>
> >>> thx for the review.
> >>>
> >>>> Please CC bridge maintainers for bridge-related patches, I've added Roopa and the
> >>>> bridge mailing list as well. Aside from that, the change is certainly interesting, I've been
> >>>> thinking about a similar helper for some time now, few comments below.
> >>>
> >>> yes, sorry for that. I figured it out after sending the series out.
> >>>
> >>>>
> >>>> Have you thought about the egress path and if by the current bridge state the packet would
> >>>> be allowed to egress through the found port from the lookup? I'd guess you have to keep updating
> >>>> the active ports list based on netlink events, but there's a lot of egress bridge logic that
> >>>> either have to be duplicated or somehow synced. Check should_deliver() (br_forward.c) and later
> >>>> egress stages, but I see how this is a good first step and perhaps we can build upon it.
> >>>> There are a few possible solutions, but I haven't tried anything yet, most obvious being
> >>>> yet another helper. :)
> >>>
> >>> ack, right but I am bit worried about adding too much logic and slow down xdp
> >>> performances. I guess we can investigate first the approach proposed by Alexei
> >>> and then revaluate. Agree?
> >>>
> >>
> >> Sure, that approach sounds very interesting, but my point was that
> >> bypassing the ingress and egress logic defeats most of the bridge
> >> features. You just get an fdb hash table which you can build today
> >> with ebpf without any changes to the kernel. :) You have multiple
> >> states, flags and options for each port and each vlan which can change
> >> dynamically based on external events (e.g. STP, config changes etc)
> >> and they can affect forwarding even if the fdbs remain in the table.
> > 
> > To me, leveraging all this is precisely the reason to have BPF helpers
> > instead of just replicating state in BPF maps: it's very easy to do that
> > and show a nice speedup, and then once you get all the corner cases
> > covered that the in-kernel code already deals with, you've chipped away
> > at that speedup and spent a lot of time essentially re-writing the
> > battle-tested code already in the kernel.
> > 
> > So I think figuring out how to do the state sync is the right thing to
> > do; a second helper would be fine for this, IMO, but I'm not really
> > familiar enough with the bridge code to really have a qualified opinion.
> > 
> > -Toke
> > 
> 
> Right, sounds good to me. IMO it should be required in order to get a meaningful bridge
> speedup, otherwise the solution is incomplete and you just do simple lookups that ignore
> all of the state that could impact the forwarding decision.

ack, I agree but I need to review it first since I am not so familiar
with that codebase :)
Doing so we can compare this solution with the one proposed by Alexei.

Regards,
Lorenzo

> 
> Cheers,
>  Nik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-01-26 15:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1643044381.git.lorenzo@kernel.org>
     [not found] ` <720907692575488526f06edc2cf5c8f783777d4f.1643044381.git.lorenzo@kernel.org>
2022-01-24 18:32   ` [Bridge] [RFC bpf-next 1/2] net: bridge: add unstable br_fdb_find_port_from_ifindex helper Nikolay Aleksandrov
2022-01-25  5:09     ` Alexei Starovoitov
2022-01-26 11:09       ` Lorenzo Bianconi
2022-01-26 12:02         ` Toke Høiland-Jørgensen
2022-01-26 11:27     ` Lorenzo Bianconi
2022-01-26 12:08       ` Kumar Kartikeya Dwivedi
2022-01-26 12:39       ` Nikolay Aleksandrov
2022-01-26 12:50         ` Toke Høiland-Jørgensen
2022-01-26 12:57           ` Nikolay Aleksandrov
2022-01-26 15:00             ` Lorenzo Bianconi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox