From: Vladimir Oltean <olteanv@gmail.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>,
Roopa Prabhu <roopa@nvidia.com>, Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
netdev@vger.kernel.org
Subject: Re: Timing of host-joined bridge multicast groups with switchdev
Date: Tue, 23 Feb 2021 21:49:08 +0200
Message-ID: <20210223194908.ne4a7abulirqfbs6@skbuf>
In-Reply-To: <YDVXhZdy510mFtG/@shredder.lan>
On Tue, Feb 23, 2021 at 09:29:09PM +0200, Ido Schimmel wrote:
> On Tue, Feb 23, 2021 at 08:02:36PM +0200, Vladimir Oltean wrote:
> > On Tue, Feb 23, 2021 at 07:56:22PM +0200, Ido Schimmel wrote:
> > > For route offload you get a dump of all the existing routes when you
> > > register your notifier. It's a bit different with bridge because you
> > > don't care about existing bridges when you just initialize your driver.
> > >
> > > We had a similar issue with VXLAN because its FDB can be populated and
> > > only then attached to a bridge that you offload. Check
> > > vxlan_fdb_replay(). Probably need to introduce something similar for
> > > FDB/MDB entries.
> >
> > So you would be in favor of a driver-voluntary 'pull' type of approach
> > at bridge join, instead of the bridge 'pushing' the addresses?
> >
> > That's all fine, except when we'll have more than 3 switchdev drivers,
> > how do we expect to manage all this complexity duplicated in many places
> > in the kernel, instead of having it in a central place? Are there corner
> > cases I'm missing which make the 'push' approach impractical?
>
> Not sure. It needs to be scheduled when the driver is ready to handle
> it. In br_add_if() after netdev_master_upper_dev_link() is probably a
> good place.
>
> It also needs to be done once and not every time another port joins the
> bridge. This can be done using the port's parent ID, similar to what we
> are already doing with the offload forward mark in
> nbp_switchdev_mark_set().
>
> But I'm not sure how we replay it only for a single notifier block. I'm
> not familiar with setups where you have more than one listener let alone
> more than one that is interested in notifications from a specific
> bridge, so maybe it is OK to just replay it for all the listeners. But I
> would prefer to avoid it if we can.
At least with a driver-initiated pull, this seems to work:
-----------------------------[ cut here ]-----------------------------
From 13cb5ccbe35f64cfabe7dea3f76c8bc778cff9dc Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Tue, 23 Feb 2021 21:45:08 +0200
Subject: [PATCH] net: bridge: add a function that replays port and host-joined
 mdb entries

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/linux/if_bridge.h |   3 +
 include/net/switchdev.h   |   1 +
 net/bridge/br_mdb.c       | 115 ++++++++++++++++++++++++++++++++++++++
 net/dsa/slave.c           |   1 +
 4 files changed, 120 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..d1190e2984bc 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -95,6 +95,9 @@ static inline bool br_multicast_router(const struct net_device *dev)
 }
 #endif
 
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb);
+
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
 bool br_vlan_enabled(const struct net_device *dev);
 int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b7fc7d0f54e2..8c3218177136 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,6 +68,7 @@ enum switchdev_obj_id {
 };
 
 struct switchdev_obj {
+	struct list_head list;
 	struct net_device *orig_dev;
 	enum switchdev_obj_id id;
 	u32 flags;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 8846c5bcd075..72978c881e11 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -235,6 +235,121 @@ static int __mdb_fill_info(struct sk_buff *skb,
 	return -EMSGSIZE;
 }
 
+static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
+			     struct switchdev_obj_port_mdb *mdb)
+{
+	struct switchdev_notifier_port_obj_info obj_info = {
+		.info = {
+			.dev = dev,
+		},
+		.obj = &mdb->obj,
+	};
+	int err;
+
+	err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
+	return notifier_to_errno(err);
+}
+
+static int br_mdb_queue_one(struct list_head *mdb_list,
+			    enum switchdev_obj_id id,
+			    struct net_bridge_mdb_entry *mp,
+			    struct net_device *orig_dev)
+{
+	struct switchdev_obj_port_mdb *mdb;
+
+	mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
+	if (!mdb)
+		return -ENOMEM;
+
+	mdb->obj.id = id;
+	mdb->obj.orig_dev = orig_dev;
+	mdb->vid = mp->addr.vid;
+
+	if (mp->addr.proto == htons(ETH_P_IP))
+		ip_eth_mc_map(mp->addr.dst.ip4, mdb->addr);
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb->addr);
+#endif
+
+	list_add_tail(&mdb->obj.list, mdb_list);
+
+	return 0;
+}
+
+int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
+		  struct notifier_block *nb)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct switchdev_obj *obj, *tmp;
+	struct list_head mdb_list;
+	struct net_bridge *br;
+	int err = 0;
+
+	ASSERT_RTNL();
+
+	INIT_LIST_HEAD(&mdb_list);
+
+	if (!netif_is_bridge_master(br_dev))
+		return -EINVAL;
+
+	if (!netif_is_bridge_port(dev))
+		return -EINVAL;
+
+	br = netdev_priv(br_dev);
+
+	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		return 0;
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+
+		if (mp->host_joined) {
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_HOST_MDB,
+					       mp, br_dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+
+		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
+		     pp = &p->next) {
+			if (p->key.port->dev != dev)
+				continue;
+
+			err = br_mdb_queue_one(&mdb_list,
+					       SWITCHDEV_OBJ_ID_PORT_MDB,
+					       mp, dev);
+			if (err) {
+				rcu_read_unlock();
+				goto out_free_mdb;
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	list_for_each_entry(obj, &mdb_list, list) {
+		err = br_mdb_replay_one(nb, dev, SWITCHDEV_OBJ_PORT_MDB(obj));
+		if (err)
+			goto out_free_mdb;
+	}
+
+out_free_mdb:
+	list_for_each_entry_safe(obj, tmp, &mdb_list, list) {
+		list_del(&obj->list);
+		kfree(SWITCHDEV_OBJ_PORT_MDB(obj));
+	}
+
+	return err;
+}
+EXPORT_SYMBOL(br_mdb_replay);
+
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 			    struct net_device *dev)
 {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 43e403ac70d5..9052ff5efab7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -2070,6 +2070,7 @@ static int dsa_slave_changeupper(struct net_device *dev,
 			err = dsa_port_bridge_join(dp, info->upper_dev);
 			if (!err)
 				dsa_bridge_mtu_normalization(dp);
+			err = br_mdb_replay(info->upper_dev, dev, &dsa_slave_switchdev_blocking_notifier);
 			err = notifier_from_errno(err);
 		} else {
 			dsa_port_bridge_leave(dp, info->upper_dev);
--
2.25.1
-----------------------------[ cut here ]-----------------------------
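
For reference, the address conversion that br_mdb_queue_one() hands off to
ip_eth_mc_map() and ipv6_eth_mc_map() can be sketched in userspace C. This
is only an illustration, not the kernel code: the helper names
ipv4_mc_to_mac() and ipv6_mc_to_mac() are made up, and the IPv4 helper is
assumed to take the group in host byte order, whereas the kernel helper
takes a network-order address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not a kernel API).
 * IPv4 multicast group -> Ethernet address:
 * fixed prefix 01:00:5e, then the low 23 bits of the group.
 * 'group' is in host byte order (an assumption of this sketch).
 */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	assert(mac);

	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this byte is always 0 */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/*
 * Hypothetical userspace helper (not a kernel API).
 * IPv6 multicast group -> Ethernet address:
 * fixed prefix 33:33, then the last 4 bytes of the 16-byte group.
 */
static void ipv6_mc_to_mac(const uint8_t group[16], uint8_t mac[6])
{
	assert(group && mac);

	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &group[12], 4);
}
```

With these helpers, 224.0.0.251 (0xE00000FB) maps to 01:00:5e:00:00:fb and
ff02::fb maps to 33:33:00:00:00:fb. Because only 23 bits of an IPv4 group
survive the mapping, several IP groups can share one MAC address.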
I am just not sure why I need to emit the notification only once per
ASIC. Currently, SWITCHDEV_OBJ_ID_HOST_MDB is emitted for all bridge
ports, so the callers need to reference-count it anyway. As for the port
mdb entries, I am filtering them so that only the entries belonging to
the port that is joining get replayed. So I think this is okay.
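
To make the reference-counting remark concrete, here is a minimal userspace
sketch of what a driver could do with repeated host MDB notifications. It is
an assumption about one possible design, not code from DSA or any other
driver: struct host_mdb_table, host_mdb_add(), host_mdb_del() and the
hw_writes counter (a stand-in for actually programming the switch) are all
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOST_MDB_MAX 16	/* arbitrary table size for this sketch */

struct host_mdb_ref {
	uint8_t addr[6];
	uint16_t vid;
	unsigned int refcount;	/* 0 means the slot is free */
};

struct host_mdb_table {
	struct host_mdb_ref entries[HOST_MDB_MAX];
	unsigned int hw_writes;	/* counts simulated hardware accesses */
};

static struct host_mdb_ref *host_mdb_find(struct host_mdb_table *t,
					  const uint8_t addr[6], uint16_t vid)
{
	size_t i;

	for (i = 0; i < HOST_MDB_MAX; i++) {
		struct host_mdb_ref *e = &t->entries[i];

		if (e->refcount && e->vid == vid && !memcmp(e->addr, addr, 6))
			return e;
	}
	return NULL;
}

/* Returns 0 on success, -1 if the table is full. */
static int host_mdb_add(struct host_mdb_table *t, const uint8_t addr[6],
			uint16_t vid)
{
	struct host_mdb_ref *e;
	size_t i;

	assert(t && addr);

	e = host_mdb_find(t, addr, vid);
	if (e) {
		e->refcount++;	/* already installed: only take a reference */
		return 0;
	}

	for (i = 0; i < HOST_MDB_MAX; i++) {
		e = &t->entries[i];
		if (e->refcount)
			continue;
		memcpy(e->addr, addr, 6);
		e->vid = vid;
		e->refcount = 1;
		t->hw_writes++;	/* first user: install the entry */
		return 0;
	}
	return -1;
}

/* Returns 0 on success, -1 if the entry was never added. */
static int host_mdb_del(struct host_mdb_table *t, const uint8_t addr[6],
			uint16_t vid)
{
	struct host_mdb_ref *e;

	assert(t && addr);

	e = host_mdb_find(t, addr, vid);
	if (!e)
		return -1;
	if (--e->refcount == 0)
		t->hw_writes++;	/* last user gone: remove the entry */
	return 0;
}
```

With a table like this, receiving the same host-joined group once per bridge
port costs one hardware write for the first add and one for the last delete,
so a replay that repeats the notification for every port joining is harmless.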