* [PATCH net 0/2] net: bridge: switchdev: Skip MDB replays of pending events
@ 2024-01-31 12:35 Tobias Waldekranz
2024-01-31 12:35 ` [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending Tobias Waldekranz
2024-01-31 12:35 ` [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events Tobias Waldekranz
0 siblings, 2 replies; 12+ messages in thread
From: Tobias Waldekranz @ 2024-01-31 12:35 UTC
To: davem, kuba; +Cc: olteanv, roopa, razor, bridge, netdev, jiri, ivecera
Prevent the MDB replay logic from racing with the IGMP/MLD snooping
logic, which can otherwise cause the bridge to generate replays of
events that will also be delivered as regular events. The log message
of 2/2 has all the details.
We choose to preserve events in the deferred queue and elide the
corresponding replay, rather than the opposite. This is important
because purging the deferred event instead would rob other listeners
of that event entirely. I.e., regular events are "broadcast" to all
listeners, while replays are "unicast" only to the port joining or
leaving the bridge.
        br0
       /   \
  sw1p0     sw2p0
(hwdom 1) (hwdom 2)
In a setup like the one above, it is vital that sw1p0 learns about all
group memberships on sw2p0, since it may want to translate such
memberships to host equivalents, in order to let the bridge software
forward them to ports in other hardware domains.
Tobias Waldekranz (2):
net: switchdev: Add helper to check if an object event is pending
net: bridge: switchdev: Skip MDB replays of pending events
include/net/switchdev.h | 3 ++
net/bridge/br_switchdev.c | 44 +++++++++++++++++-----------
net/switchdev/switchdev.c | 61 +++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+), 17 deletions(-)
--
2.34.1
* [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Tobias Waldekranz @ 2024-01-31 12:35 UTC
To: davem, kuba; +Cc: olteanv, roopa, razor, bridge, netdev, jiri, ivecera

When adding/removing a port to/from a bridge, the port must be brought
up to speed with the current state of the bridge. This is done by
replaying all relevant events, directly to the port in question.

In some situations, specifically when replaying the MDB, this process
may race against new events that are generated concurrently.

So the bridge must ensure that the event is not already pending on the
deferred queue. switchdev_port_obj_is_deferred answers this question.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 include/net/switchdev.h   |  3 ++
 net/switchdev/switchdev.c | 61 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index a43062d4c734..538851a93d9e 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -308,6 +308,9 @@ void switchdev_deferred_process(void);
 int switchdev_port_attr_set(struct net_device *dev,
 			    const struct switchdev_attr *attr,
 			    struct netlink_ext_ack *extack);
+bool switchdev_port_obj_is_deferred(struct net_device *dev,
+				    enum switchdev_notifier_type nt,
+				    const struct switchdev_obj *obj);
 int switchdev_port_obj_add(struct net_device *dev,
 			   const struct switchdev_obj *obj,
 			   struct netlink_ext_ack *extack);
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5b045284849e..40bb17c7fdbf 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -19,6 +19,35 @@
 #include <linux/rtnetlink.h>
 #include <net/switchdev.h>
 
+static bool switchdev_obj_eq(const struct switchdev_obj *a,
+			     const struct switchdev_obj *b)
+{
+	const struct switchdev_obj_port_vlan *va, *vb;
+	const struct switchdev_obj_port_mdb *ma, *mb;
+
+	if (a->id != b->id || a->orig_dev != b->orig_dev)
+		return false;
+
+	switch (a->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		va = SWITCHDEV_OBJ_PORT_VLAN(a);
+		vb = SWITCHDEV_OBJ_PORT_VLAN(b);
+		return va->flags == vb->flags &&
+			va->vid == vb->vid &&
+			va->changed == vb->changed;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+	case SWITCHDEV_OBJ_ID_HOST_MDB:
+		ma = SWITCHDEV_OBJ_PORT_MDB(a);
+		mb = SWITCHDEV_OBJ_PORT_MDB(b);
+		return ma->vid == mb->vid &&
+			!memcmp(ma->addr, mb->addr, sizeof(ma->addr));
+	default:
+		break;
+	}
+
+	BUG();
+}
+
 static LIST_HEAD(deferred);
 static DEFINE_SPINLOCK(deferred_lock);
 
@@ -307,6 +336,38 @@ int switchdev_port_obj_del(struct net_device *dev,
 }
 EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
 
+bool switchdev_port_obj_is_deferred(struct net_device *dev,
+				    enum switchdev_notifier_type nt,
+				    const struct switchdev_obj *obj)
+{
+	struct switchdev_deferred_item *dfitem;
+	bool found = false;
+
+	ASSERT_RTNL();
+
+	spin_lock_bh(&deferred_lock);
+
+	list_for_each_entry(dfitem, &deferred, list) {
+		if (dfitem->dev != dev)
+			continue;
+
+		if ((dfitem->func == switchdev_port_obj_add_deferred &&
+		     nt == SWITCHDEV_PORT_OBJ_ADD) ||
+		    (dfitem->func == switchdev_port_obj_del_deferred &&
+		     nt == SWITCHDEV_PORT_OBJ_DEL)) {
+			if (switchdev_obj_eq((const void *)dfitem->data, obj)) {
+				found = true;
+				break;
+			}
+		}
+	}
+
+	spin_unlock_bh(&deferred_lock);
+
+	return found;
+}
+EXPORT_SYMBOL_GPL(switchdev_port_obj_is_deferred);
+
 static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
 static BLOCKING_NOTIFIER_HEAD(switchdev_blocking_notif_chain);
 
--
2.34.1
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Jiri Pirko @ 2024-01-31 12:50 UTC
To: Tobias Waldekranz
Cc: davem, kuba, olteanv, roopa, razor, bridge, netdev, ivecera

Wed, Jan 31, 2024 at 01:35:43PM CET, tobias@waldekranz.com wrote:
>When adding/removing a port to/from a bridge, the port must be brought
>up to speed with the current state of the bridge. This is done by
>replaying all relevant events, directly to the port in question.

Could you please use the imperative mood in your patch descriptions?
That way, it is much easier to understand what is the current state of
things and what you are actually changing.

https://www.kernel.org/doc/html/v6.7/process/submitting-patches.html#describe-your-changes

While at it, could you also fix your cover letter so the reader can
actually tell what's the current state and what the patchset is doing?

pw-bot: cr

>
>In some situations, specifically when replaying the MDB, this process
>may race against new events that are generated concurrently. So the
>bridge must ensure that the event is not already pending on the
>deferred queue. switchdev_port_obj_is_deferred answers this question.
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Tobias Waldekranz @ 2024-01-31 13:31 UTC
To: Jiri Pirko; +Cc: davem, kuba, olteanv, roopa, razor, bridge, netdev, ivecera

On Wed, Jan 31, 2024 at 13:50, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Jan 31, 2024 at 01:35:43PM CET, tobias@waldekranz.com wrote:
>>When adding/removing a port to/from a bridge, the port must be brought
>>up to speed with the current state of the bridge. This is done by
>>replaying all relevant events, directly to the port in question.
>
> Could you please use the imperative mood in your patch descriptions?
> That way, it is much easier to understand what is the current state of
> things and what you are actually changing.
>
> https://www.kernel.org/doc/html/v6.7/process/submitting-patches.html#describe-your-changes
>
> While at it, could you also fix your cover letter so the reader can
> actually tell what's the current state and what the patchset is doing?

Sure thing. Do you feel that this is enough of an issue that it blocks
you from doing a review of v1, or can I wait for more feedback and bake
it in with other changes?
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Vladimir Oltean @ 2024-02-01 0:33 UTC
To: Jiri Pirko
Cc: Tobias Waldekranz, davem, kuba, roopa, razor, bridge, netdev, ivecera

On Wed, Jan 31, 2024 at 01:50:11PM +0100, Jiri Pirko wrote:
> Wed, Jan 31, 2024 at 01:35:43PM CET, tobias@waldekranz.com wrote:
> >When adding/removing a port to/from a bridge, the port must be brought
> >up to speed with the current state of the bridge. This is done by
> >replaying all relevant events, directly to the port in question.
>
> Could you please use the imperative mood in your patch descriptions?
> That way, it is much easier to understand what is the current state of
> things and what you are actually changing.

FWIW, the paragraph you've concentrated upon does describe the current
state of things, not what the patch does; thus it does not need to be in
the imperative mood.
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Jiri Pirko @ 2024-02-01 7:45 UTC
To: Vladimir Oltean
Cc: Tobias Waldekranz, davem, kuba, roopa, razor, bridge, netdev, ivecera

Thu, Feb 01, 2024 at 01:33:05AM CET, olteanv@gmail.com wrote:
>On Wed, Jan 31, 2024 at 01:50:11PM +0100, Jiri Pirko wrote:
>> Wed, Jan 31, 2024 at 01:35:43PM CET, tobias@waldekranz.com wrote:
>> >When adding/removing a port to/from a bridge, the port must be brought
>> >up to speed with the current state of the bridge. This is done by
>> >replaying all relevant events, directly to the port in question.
>>
>> Could you please use the imperative mood in your patch descriptions?
>> That way, it is much easier to understand what is the current state of
>> things and what you are actually changing.
>
>FWIW, the paragraph you've concentrated upon does describe the current
>state of things, not what the patch does; thus it does not need to be in
>the imperative mood.

Well, there is no imperative mood in the next paragraph either :)
Therefore, from the patch description POV, there is no clue what the
patch is doing.
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Vladimir Oltean @ 2024-01-31 13:34 UTC
To: Tobias Waldekranz
Cc: davem, kuba, roopa, razor, bridge, netdev, jiri, ivecera

On Wed, Jan 31, 2024 at 01:35:43PM +0100, Tobias Waldekranz wrote:
> When adding/removing a port to/from a bridge, the port must be brought
> up to speed with the current state of the bridge. This is done by
> replaying all relevant events, directly to the port in question.
>
> In some situations, specifically when replaying the MDB, this process
> may race against new events that are generated concurrently.
>
> So the bridge must ensure that the event is not already pending on the
> deferred queue. switchdev_port_obj_is_deferred answers this question.
>
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

I don't see great value in splitting this patch in (1) unused helpers
(2) actual fix that uses them. Especially since it creates confusion -
it is nowhere made clear in this commit message that it is just
preparatory work.
> ---
>  include/net/switchdev.h   |  3 ++
>  net/switchdev/switchdev.c | 61 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 64 insertions(+)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index a43062d4c734..538851a93d9e 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -308,6 +308,9 @@ void switchdev_deferred_process(void);
>  int switchdev_port_attr_set(struct net_device *dev,
>  			    const struct switchdev_attr *attr,
>  			    struct netlink_ext_ack *extack);
> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
> +				    enum switchdev_notifier_type nt,
> +				    const struct switchdev_obj *obj);

I think this is missing a shim definition for when CONFIG_NET_SWITCHDEV
is disabled.

>  int switchdev_port_obj_add(struct net_device *dev,
>  			   const struct switchdev_obj *obj,
>  			   struct netlink_ext_ack *extack);
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 5b045284849e..40bb17c7fdbf 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -19,6 +19,35 @@
>  #include <linux/rtnetlink.h>
>  #include <net/switchdev.h>
>
> +static bool switchdev_obj_eq(const struct switchdev_obj *a,
> +			     const struct switchdev_obj *b)
> +{
> +	const struct switchdev_obj_port_vlan *va, *vb;
> +	const struct switchdev_obj_port_mdb *ma, *mb;
> +
> +	if (a->id != b->id || a->orig_dev != b->orig_dev)
> +		return false;
> +
> +	switch (a->id) {
> +	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +		va = SWITCHDEV_OBJ_PORT_VLAN(a);
> +		vb = SWITCHDEV_OBJ_PORT_VLAN(b);
> +		return va->flags == vb->flags &&
> +			va->vid == vb->vid &&
> +			va->changed == vb->changed;
> +	case SWITCHDEV_OBJ_ID_PORT_MDB:
> +	case SWITCHDEV_OBJ_ID_HOST_MDB:
> +		ma = SWITCHDEV_OBJ_PORT_MDB(a);
> +		mb = SWITCHDEV_OBJ_PORT_MDB(b);
> +		return ma->vid == mb->vid &&
> +			!memcmp(ma->addr, mb->addr, sizeof(ma->addr));

ether_addr_equal().

> +	default:
> +		break;

Does C allow you to not return anything here?
> +	}
> +
> +	BUG();
> +}
> +
>  static LIST_HEAD(deferred);
>  static DEFINE_SPINLOCK(deferred_lock);
>
> @@ -307,6 +336,38 @@ int switchdev_port_obj_del(struct net_device *dev,
>  }
>  EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
>
> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
> +				    enum switchdev_notifier_type nt,
> +				    const struct switchdev_obj *obj)

A kernel-doc comment would be great. It looks like it's not returning
whether the port object is deferred, but whether the _action_ given by
@nt on the @obj is deferred. This further distinguishes between deferred
additions and deferred removals.

> +{
> +	struct switchdev_deferred_item *dfitem;
> +	bool found = false;
> +
> +	ASSERT_RTNL();

Why does rtnl_lock() have to be held? To fully allow switchdev_deferred_process()
to run to completion, aka its dfitem->func() as well?

> +
> +	spin_lock_bh(&deferred_lock);
> +
> +	list_for_each_entry(dfitem, &deferred, list) {
> +		if (dfitem->dev != dev)
> +			continue;
> +
> +		if ((dfitem->func == switchdev_port_obj_add_deferred &&
> +		     nt == SWITCHDEV_PORT_OBJ_ADD) ||
> +		    (dfitem->func == switchdev_port_obj_del_deferred &&
> +		     nt == SWITCHDEV_PORT_OBJ_DEL)) {
> +			if (switchdev_obj_eq((const void *)dfitem->data, obj)) {
> +				found = true;
> +				break;
> +			}
> +		}
> +	}
> +
> +	spin_unlock_bh(&deferred_lock);
> +
> +	return found;
> +}
> +EXPORT_SYMBOL_GPL(switchdev_port_obj_is_deferred);
> +
>  static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
>  static BLOCKING_NOTIFIER_HEAD(switchdev_blocking_notif_chain);
>
> --
> 2.34.1
>
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Tobias Waldekranz @ 2024-01-31 14:48 UTC
To: Vladimir Oltean; +Cc: davem, kuba, roopa, razor, bridge, netdev, jiri, ivecera

On Wed, Jan 31, 2024 at 15:34, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Wed, Jan 31, 2024 at 01:35:43PM +0100, Tobias Waldekranz wrote:
>> When adding/removing a port to/from a bridge, the port must be brought
>> up to speed with the current state of the bridge. This is done by
>> replaying all relevant events, directly to the port in question.
>>
>> In some situations, specifically when replaying the MDB, this process
>> may race against new events that are generated concurrently.
>>
>> So the bridge must ensure that the event is not already pending on the
>> deferred queue. switchdev_port_obj_is_deferred answers this question.
>>
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>
> I don't see great value in splitting this patch in (1) unused helpers
> (2) actual fix that uses them. Especially since it creates confusion -
> it is nowhere made clear in this commit message that it is just
> preparatory work.

It was one commit until the last minute, I'll squash them back together.
>> ---
>>  include/net/switchdev.h   |  3 ++
>>  net/switchdev/switchdev.c | 61 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 64 insertions(+)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index a43062d4c734..538851a93d9e 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -308,6 +308,9 @@ void switchdev_deferred_process(void);
>>  int switchdev_port_attr_set(struct net_device *dev,
>>  			    const struct switchdev_attr *attr,
>>  			    struct netlink_ext_ack *extack);
>> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
>> +				    enum switchdev_notifier_type nt,
>> +				    const struct switchdev_obj *obj);
>
> I think this is missing a shim definition for when CONFIG_NET_SWITCHDEV
> is disabled.

Even though the only caller is br_switchdev.c, which is guarded behind
CONFIG_NET_SWITCHDEV?

>>  int switchdev_port_obj_add(struct net_device *dev,
>>  			   const struct switchdev_obj *obj,
>>  			   struct netlink_ext_ack *extack);
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index 5b045284849e..40bb17c7fdbf 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -19,6 +19,35 @@
>>  #include <linux/rtnetlink.h>
>>  #include <net/switchdev.h>
>>
>> +static bool switchdev_obj_eq(const struct switchdev_obj *a,
>> +			     const struct switchdev_obj *b)
>> +{
>> +	const struct switchdev_obj_port_vlan *va, *vb;
>> +	const struct switchdev_obj_port_mdb *ma, *mb;
>> +
>> +	if (a->id != b->id || a->orig_dev != b->orig_dev)
>> +		return false;
>> +
>> +	switch (a->id) {
>> +	case SWITCHDEV_OBJ_ID_PORT_VLAN:
>> +		va = SWITCHDEV_OBJ_PORT_VLAN(a);
>> +		vb = SWITCHDEV_OBJ_PORT_VLAN(b);
>> +		return va->flags == vb->flags &&
>> +			va->vid == vb->vid &&
>> +			va->changed == vb->changed;
>> +	case SWITCHDEV_OBJ_ID_PORT_MDB:
>> +	case SWITCHDEV_OBJ_ID_HOST_MDB:
>> +		ma = SWITCHDEV_OBJ_PORT_MDB(a);
>> +		mb = SWITCHDEV_OBJ_PORT_MDB(b);
>> +		return ma->vid == mb->vid &&
>> +			!memcmp(ma->addr, mb->addr, sizeof(ma->addr));
>
> ether_addr_equal().
>
>> +	default:
>> +		break;
>
> Does C allow you to not return anything here?

No warnings or errors are generated by my compiler (GCC 12.2.0).

My guess is that the expansion of BUG() ends with
__builtin_unreachable() or similar.

>> +	}
>> +
>> +	BUG();
>> +}
>> +
>>  static LIST_HEAD(deferred);
>>  static DEFINE_SPINLOCK(deferred_lock);
>>
>> @@ -307,6 +336,38 @@ int switchdev_port_obj_del(struct net_device *dev,
>>  }
>>  EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
>>
>> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
>> +				    enum switchdev_notifier_type nt,
>> +				    const struct switchdev_obj *obj)
>
> A kernel-doc comment would be great. It looks like it's not returning
> whether the port object is deferred, but whether the _action_ given by
> @nt on the @obj is deferred. This further distinguishes between deferred
> additions and deferred removals.
>

Fair, so should the name change as well? I guess you'd want something
like switchdev_port_obj_notification_is_deferred, but that sure is
awfully long.

>> +{
>> +	struct switchdev_deferred_item *dfitem;
>> +	bool found = false;
>> +
>> +	ASSERT_RTNL();
>
> Why does rtnl_lock() have to be held? To fully allow switchdev_deferred_process()
> to run to completion, aka its dfitem->func() as well?

That is in effect what it does, yes. All we really would need is to
ensure that any individual item that has been removed from the list has
also executed its callback. But holding rtnl_lock was the most granular
way I could see that would ensure that.
* Re: [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending
From: Vladimir Oltean @ 2024-02-01 0:29 UTC
To: Tobias Waldekranz
Cc: davem, kuba, roopa, razor, bridge, netdev, jiri, ivecera

On Wed, Jan 31, 2024 at 03:48:05PM +0100, Tobias Waldekranz wrote:
> >> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
> >> +				    enum switchdev_notifier_type nt,
> >> +				    const struct switchdev_obj *obj);
> >
> > I think this is missing a shim definition for when CONFIG_NET_SWITCHDEV
> > is disabled.
>
> Even though the only caller is br_switchdev.c, which is guarded behind
> CONFIG_NET_SWITCHDEV?

My mistake, please disregard.

> >>  int switchdev_port_obj_add(struct net_device *dev,
> >>  			   const struct switchdev_obj *obj,
> >>  			   struct netlink_ext_ack *extack);
> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >> index 5b045284849e..40bb17c7fdbf 100644
> >> --- a/net/switchdev/switchdev.c
> >> +++ b/net/switchdev/switchdev.c
> >> @@ -19,6 +19,35 @@
> >>  #include <linux/rtnetlink.h>
> >>  #include <net/switchdev.h>
> >>
> >> +static bool switchdev_obj_eq(const struct switchdev_obj *a,
> >> +			     const struct switchdev_obj *b)
> >> +{
> >> +	const struct switchdev_obj_port_vlan *va, *vb;
> >> +	const struct switchdev_obj_port_mdb *ma, *mb;
> >> +
> >> +	if (a->id != b->id || a->orig_dev != b->orig_dev)
> >> +		return false;
> >> +
> >> +	switch (a->id) {
> >> +	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> >> +		va = SWITCHDEV_OBJ_PORT_VLAN(a);
> >> +		vb = SWITCHDEV_OBJ_PORT_VLAN(b);
> >> +		return va->flags == vb->flags &&
> >> +			va->vid == vb->vid &&
> >> +			va->changed == vb->changed;
> >> +	case SWITCHDEV_OBJ_ID_PORT_MDB:
> >> +	case SWITCHDEV_OBJ_ID_HOST_MDB:
> >> +		ma = SWITCHDEV_OBJ_PORT_MDB(a);
> >> +		mb = SWITCHDEV_OBJ_PORT_MDB(b);
> >> +		return ma->vid == mb->vid &&
> >> +			!memcmp(ma->addr, mb->addr, sizeof(ma->addr));
> >
> > ether_addr_equal().
> >
> >> +	default:
> >> +		break;
> >
> > Does C allow you to not return anything here?
>
> No warnings or errors are generated by my compiler (GCC 12.2.0).
>
> My guess is that the expansion of BUG() ends with
> __builtin_unreachable() or similar.

Interesting, I didn't know that. Although checkpatch says:
"WARNING: Do not crash the kernel unless it is absolutely unavoidable--use
WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants".
So I'm conflicted about what I just learned and how it can be applied in
a way that checkpatch doesn't dislike.

>
> >> +	}
> >> +
> >> +	BUG();
> >> +}
> >> +
> >>  static LIST_HEAD(deferred);
> >>  static DEFINE_SPINLOCK(deferred_lock);
> >>
> >> @@ -307,6 +336,38 @@ int switchdev_port_obj_del(struct net_device *dev,
> >>  }
> >>  EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
> >>
> >> +bool switchdev_port_obj_is_deferred(struct net_device *dev,
> >> +				    enum switchdev_notifier_type nt,
> >> +				    const struct switchdev_obj *obj)
> >
> > A kernel-doc comment would be great. It looks like it's not returning
> > whether the port object is deferred, but whether the _action_ given by
> > @nt on the @obj is deferred. This further distinguishes between deferred
> > additions and deferred removals.
> >
>
> Fair, so should the name change as well? I guess you'd want something
> like switchdev_port_obj_notification_is_deferred, but that sure is
> awfully long.

switchdev_port_obj_act_is_deferred() for action, maybe?
* [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events
From: Tobias Waldekranz @ 2024-01-31 12:35 UTC
To: davem, kuba; +Cc: olteanv, roopa, razor, bridge, netdev, jiri, ivecera

Generating the list of MDB events to replay races against the IGMP/MLD
snooping logic, which may concurrently enqueue events to the switchdev
deferred queue, leading to duplicate events being sent to drivers.

Avoid this by grabbing the write-side lock of the MDB, and by making
sure that a deferred version of a replay event is not already enqueued
to the switchdev deferred queue before adding it to the replay list.

An easy way to reproduce this issue, on an mv88e6xxx system, was to
create a snooping bridge, and immediately add a port to it:

    root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
    > ip link set dev x3 up master br0
    root@infix-06-0b-00:~$ ip link del dev br0
    root@infix-06-0b-00:~$ mvls atu
    ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
    DEV:0 Marvell 88E6393X
    33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
    root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once in the
original event and once in a replay. Since DSA tracks host addresses
using reference counters, and only a single delete notification is
sent, the count remains at 1 when the bridge is destroyed.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/bridge/br_switchdev.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ee84e783e1df..a3481190d5e6 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -595,6 +595,8 @@ br_switchdev_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
 }
 
 static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
+				      struct net_device *dev,
+				      unsigned long action,
 				      enum switchdev_obj_id id,
 				      const struct net_bridge_mdb_entry *mp,
 				      struct net_device *orig_dev)
@@ -608,8 +610,17 @@ static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
 	mdb->obj.id = id;
 	mdb->obj.orig_dev = orig_dev;
 	br_switchdev_mdb_populate(mdb, mp);
-	list_add_tail(&mdb->obj.list, mdb_list);
 
+	if (switchdev_port_obj_is_deferred(dev, action, &mdb->obj)) {
+		/* This event is already in the deferred queue of
+		 * events, so this replay must be elided, lest the
+		 * driver receives duplicate events for it.
+		 */
+		kfree(mdb);
+		return 0;
+	}
+
+	list_add_tail(&mdb->obj.list, mdb_list);
 	return 0;
 }
 
@@ -677,22 +688,26 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		return 0;
 
-	/* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
-	 * because the write-side protection is br->multicast_lock. But we
-	 * need to emulate the [ blocking ] calling context of a regular
-	 * switchdev event, so since both br->multicast_lock and RCU read side
-	 * critical sections are atomic, we have no choice but to pick the RCU
-	 * read side lock, queue up all our events, leave the critical section
-	 * and notify switchdev from blocking context.
+	if (adding)
+		action = SWITCHDEV_PORT_OBJ_ADD;
+	else
+		action = SWITCHDEV_PORT_OBJ_DEL;
+
+	/* br_switchdev_mdb_queue_one will take care to not queue a
+	 * replay of an event that is already pending in the switchdev
+	 * deferred queue. In order to safely determine that, there
+	 * must be no new deferred MDB notifications enqueued for the
+	 * duration of the MDB scan. Therefore, grab the write-side
+	 * lock to avoid racing with any concurrent IGMP/MLD snooping.
 	 */
-	rcu_read_lock();
+	spin_lock_bh(&br->multicast_lock);
 
 	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
 		struct net_bridge_port_group __rcu * const *pp;
 		const struct net_bridge_port_group *p;
 
 		if (mp->host_joined) {
-			err = br_switchdev_mdb_queue_one(&mdb_list,
+			err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
 							 SWITCHDEV_OBJ_ID_HOST_MDB,
 							 mp, br_dev);
 			if (err) {
@@ -706,7 +721,7 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
 			if (p->key.port->dev != dev)
 				continue;
 
-			err = br_switchdev_mdb_queue_one(&mdb_list,
+			err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
 							 SWITCHDEV_OBJ_ID_PORT_MDB,
 							 mp, dev);
 			if (err) {
@@ -716,12 +731,7 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
 		}
 	}
 
-	rcu_read_unlock();
-
-	if (adding)
-		action = SWITCHDEV_PORT_OBJ_ADD;
-	else
-		action = SWITCHDEV_PORT_OBJ_DEL;
+	spin_unlock_bh(&br->multicast_lock);
 
 	list_for_each_entry(obj, &mdb_list, list) {
 		err = br_switchdev_mdb_replay_one(nb, dev,
-- 
2.34.1
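The duplicate-delivery problem described in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct event`, the fixed-size `deferred` array, `is_deferred()`, `driver_handle()` and `run()` are hypothetical stand-ins for the switchdev deferred queue, switchdev_port_obj_is_deferred() and a DSA-style driver that refcounts host addresses. The model is single-threaded, so it does not show the locking; it only shows why the replay has to be elided when an identical event is already pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model, not kernel code. */
#define MAX_EVENTS 8

struct event {
	char group[32];	/* MAC address of the host MDB entry */
	bool add;	/* true = ADD, false = DEL */
};

/* Stand-in for the switchdev deferred queue. */
static struct event deferred[MAX_EVENTS];
static int n_deferred;

/* Stand-in for a DSA-style per-address reference count. */
static int refcount;

static void driver_handle(const struct event *ev)
{
	refcount += ev->add ? 1 : -1;
	assert(refcount >= 0);	/* a DEL must never outnumber the ADDs */
}

/* Stand-in for switchdev_port_obj_is_deferred(). */
static bool is_deferred(const struct event *ev)
{
	for (int i = 0; i < n_deferred; i++)
		if (deferred[i].add == ev->add &&
		    strcmp(deferred[i].group, ev->group) == 0)
			return true;
	return false;
}

/* A port joins while snooping already has an ADD pending, then the
 * bridge is destroyed. Returns the refcount left behind in "hardware". */
static int run(bool skip_pending)
{
	const struct event join = { "33:33:00:00:00:6a", true };
	const struct event leave = { "33:33:00:00:00:6a", false };

	refcount = 0;
	n_deferred = 0;

	/* IGMP/MLD snooping has queued the ADD, but it has not run yet. */
	deferred[n_deferred++] = join;

	/* Replay: the MDB entry already exists, so an ADD is generated. */
	if (!(skip_pending && is_deferred(&join)))
		driver_handle(&join);

	/* The deferred queue is drained later and delivers its ADD too. */
	for (int i = 0; i < n_deferred; i++)
		driver_handle(&deferred[i]);
	n_deferred = 0;

	/* Only a single DEL is sent when the bridge goes away. */
	driver_handle(&leave);
	return refcount;
}
```

Calling run(false) leaves a count of 1, matching the stale 33:33:... entries in the mvls output, while run(true) returns 0. The model keeps the deferred event and drops the replay rather than the reverse, mirroring the cover letter's reasoning that the deferred event must still reach every other listener.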
* Re: [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events
2024-01-31 12:35 ` [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events Tobias Waldekranz
@ 2024-01-31 13:51 ` Vladimir Oltean
2024-01-31 15:06 ` Vladimir Oltean
1 sibling, 0 replies; 12+ messages in thread
From: Vladimir Oltean @ 2024-01-31 13:51 UTC
To: Tobias Waldekranz
Cc: davem, kuba, roopa, razor, bridge, netdev, jiri, ivecera

On Wed, Jan 31, 2024 at 01:35:44PM +0100, Tobias Waldekranz wrote:
> Generating the list of events MDB to replay races against the IGMP/MLD
> snooping logic, which may concurrently enqueue events to the switchdev
> deferred queue, leading to duplicate events being sent to drivers.
>
> Avoid this by grabbing the write-side lock of the MDB, and make sure
> that a deferred version of a replay event is not already enqueued to
> the switchdev deferred queue before adding it to the replay list.
>
> An easy way to reproduce this issue, on an mv88e6xxx system, was to
> create a snooping bridge, and immediately add a port to it:
>
>   root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
>   > ip link set dev x3 up master br0
>   root@infix-06-0b-00:~$ ip link del dev br0
>   root@infix-06-0b-00:~$ mvls atu
>   ADDRESS             FID  STATE   Q  F  0  1  2  3  4  5  6  7  8  9  a
>   DEV:0 Marvell 88E6393X
>   33:33:00:00:00:6a     1  static  -  -  0  .  .  .  .  .  .  .  .  .  .
>   33:33:ff:87:e4:3f     1  static  -  -  0  .  .  .  .  .  .  .  .  .  .
>   ff:ff:ff:ff:ff:ff     1  static  -  -  0  1  2  3  4  5  6  7  8  9  a
>   root@infix-06-0b-00:~$
>
> The two IPv6 groups remain in the hardware database because the
> port (x3) is notified of the host's membership twice: once in the
> original event and once in a replay. Since DSA tracks host addresses
> using reference counters, and only a single delete notification is
> sent, the count remains at 1 when the bridge is destroyed.
>

It's not really my business as to how the network maintainers handle
this, but if you intend this to go to 'net', you should provide a
Fixes: tag. And to make a compelling case for a submission to 'net',
you should start off by explaining what the user-visible impact of the
bug is.

> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  net/bridge/br_switchdev.c | 44 ++++++++++++++++++++++++---------------
>  1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index ee84e783e1df..a3481190d5e6 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -595,6 +595,8 @@ br_switchdev_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
>  }
>
>  static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
> +				      struct net_device *dev,
> +				      unsigned long action,
>  				      enum switchdev_obj_id id,
>  				      const struct net_bridge_mdb_entry *mp,
>  				      struct net_device *orig_dev)
> @@ -608,8 +610,17 @@ static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
>  	mdb->obj.id = id;
>  	mdb->obj.orig_dev = orig_dev;
>  	br_switchdev_mdb_populate(mdb, mp);
> -	list_add_tail(&mdb->obj.list, mdb_list);
>
> +	if (switchdev_port_obj_is_deferred(dev, action, &mdb->obj)) {
> +		/* This event is already in the deferred queue of
> +		 * events, so this replay must be elided, lest the
> +		 * driver receives duplicate events for it.
> +		 */
> +		kfree(mdb);

Would it make sense to make "mdb" a local on-stack variable, and
kmemdup() it only if it needs to be queued?

> +		return 0;
> +	}
> +
> +	list_add_tail(&mdb->obj.list, mdb_list);
>  	return 0;
>  }
>
> @@ -677,22 +688,26 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
>  	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
>  		return 0;
>
> -	/* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
> -	 * because the write-side protection is br->multicast_lock. But we
> -	 * need to emulate the [ blocking ] calling context of a regular
> -	 * switchdev event, so since both br->multicast_lock and RCU read side
> -	 * critical sections are atomic, we have no choice but to pick the RCU
> -	 * read side lock, queue up all our events, leave the critical section
> -	 * and notify switchdev from blocking context.
> +	if (adding)
> +		action = SWITCHDEV_PORT_OBJ_ADD;
> +	else
> +		action = SWITCHDEV_PORT_OBJ_DEL;
> +
> +	/* br_switchdev_mdb_queue_one will take care to not queue a

() after function names

> +	 * replay of an event that is already pending in the switchdev
> +	 * deferred queue. In order to safely determine that, there
> +	 * must be no new deferred MDB notifications enqueued for the
> +	 * duration of the MDB scan. Therefore, grab the write-side
> +	 * lock to avoid racing with any concurrent IGMP/MLD snooping.
>  	 */
> -	rcu_read_lock();
> +	spin_lock_bh(&br->multicast_lock);
>
>  	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {

hlist_for_each_entry()

>  		struct net_bridge_port_group __rcu * const *pp;
>  		const struct net_bridge_port_group *p;
>
>  		if (mp->host_joined) {
> -			err = br_switchdev_mdb_queue_one(&mdb_list,
> +			err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
>  							 SWITCHDEV_OBJ_ID_HOST_MDB,
>  							 mp, br_dev);
>  			if (err) {
> @@ -706,7 +721,7 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
>  			if (p->key.port->dev != dev)
>  				continue;
>
> -			err = br_switchdev_mdb_queue_one(&mdb_list,
> +			err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
>  							 SWITCHDEV_OBJ_ID_PORT_MDB,
>  							 mp, dev);
>  			if (err) {
> @@ -716,12 +731,7 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
>  		}
>  	}
>
> -	rcu_read_unlock();
> -
> -	if (adding)
> -		action = SWITCHDEV_PORT_OBJ_ADD;
> -	else
> -		action = SWITCHDEV_PORT_OBJ_DEL;
> +	spin_unlock_bh(&br->multicast_lock);
>
>  	list_for_each_entry(obj, &mdb_list, list) {
>  		err = br_switchdev_mdb_replay_one(nb, dev,
> -- 
> 2.34.1
>
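The on-stack suggestion above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel change: `struct mdb_obj`, the `pending` array, `is_pending()`, `xmemdup()` and `queue_one()` are hypothetical. `is_pending()` stands in for switchdev_port_obj_is_deferred(), and `xmemdup()` is a malloc()+memcpy() analog of the kernel's kmemdup(). The point it shows is that the elided path never allocates, so there is nothing to free.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct switchdev_obj_port_mdb. */
struct mdb_obj {
	unsigned char addr[6];
	unsigned short vid;
	struct mdb_obj *next;	/* stand-in for the embedded list_head */
};

/* Hypothetical stand-in for the switchdev deferred queue. */
static struct mdb_obj pending[4];
static int n_pending;

/* Stand-in for switchdev_port_obj_is_deferred(). */
static bool is_pending(const struct mdb_obj *obj)
{
	for (int i = 0; i < n_pending; i++)
		if (pending[i].vid == obj->vid &&
		    memcmp(pending[i].addr, obj->addr, sizeof(obj->addr)) == 0)
			return true;
	return false;
}

/* Userspace analog of the kernel's kmemdup(): allocate, then copy. */
static void *xmemdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Populate the object on the stack and heap-allocate it only when it
 * really has to be queued, so the elided case needs no free(). */
static int queue_one(struct mdb_obj **list, const unsigned char addr[6],
		     unsigned short vid)
{
	struct mdb_obj tmp = { .vid = vid, .next = NULL };
	struct mdb_obj *mdb;

	memcpy(tmp.addr, addr, sizeof(tmp.addr));

	if (is_pending(&tmp))
		return 0;	/* replay elided, nothing was allocated */

	mdb = xmemdup(&tmp, sizeof(tmp));
	if (!mdb)
		return -1;	/* kernel code would return -ENOMEM */

	/* Prepend for brevity; the patch itself uses list_add_tail(). */
	mdb->next = *list;
	*list = mdb;
	return 0;
}
```

The trade-off is one extra copy of a small struct on the queued path in exchange for no allocate/free pair on the elided path.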
* Re: [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events
2024-01-31 12:35 ` [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events Tobias Waldekranz
2024-01-31 13:51 ` Vladimir Oltean
@ 2024-01-31 15:06 ` Vladimir Oltean
1 sibling, 0 replies; 12+ messages in thread
From: Vladimir Oltean @ 2024-01-31 15:06 UTC
To: Tobias Waldekranz
Cc: davem, kuba, roopa, razor, bridge, netdev, jiri, ivecera

On Wed, Jan 31, 2024 at 01:35:44PM +0100, Tobias Waldekranz wrote:
>  	list_for_each_entry(obj, &mdb_list, list) {
>  		err = br_switchdev_mdb_replay_one(nb, dev,
> -- 
> 2.34.1
>

I think there's one more race to deal with. If the switchdev driver has
signaled SWITCHDEV_BRPORT_UNOFFLOADED, there may still be deferred port
object deletions pending. If the switchdev port is under a LAG which is
under the bridge AND is leaving the LAG, those deferred deletions might
run too late, i.e. at a point where the driver no longer processes
them, since the port has left the bridge constellation.

To fix that, we need another switchdev_deferred_process() call after the
br_switchdev_mdb_replay_one() calls, while still under rtnl_lock(). The
existing switchdev_deferred_process() call from del_nbp() will not help,
since the net_bridge_port (the LAG) does _not_ disappear.
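The ordering problem raised in this reply can be sketched with another userspace model. All names here (`struct port_model`, `deferred_process()`, `leave_lag()`) are hypothetical; `deferred_process()` only stands in for switchdev_deferred_process(), and the model assumes, as the review describes, that the driver stops acting on events once the port has left the bridge constellation. Because the patch elides replayed deletions that are already deferred, those deferred deletions are the only ones the driver will ever see, so they have to be drained while the port is still offloading the bridge.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: a port that leaves a LAG which
 * itself stays in the bridge. The driver only acts on object events
 * while the port is still part of the bridge constellation. */
struct port_model {
	int deferred_dels;	/* DELs waiting in the deferred queue */
	bool offloaded;		/* driver still handles this bridge's events */
	int hw_entries;		/* MDB entries still programmed in hardware */
};

/* Stand-in for switchdev_deferred_process(): drain the queue now. */
static void deferred_process(struct port_model *m)
{
	while (m->deferred_dels > 0) {
		m->deferred_dels--;
		if (m->offloaded)
			m->hw_entries--;	/* driver removes the entry */
		/* otherwise the driver ignores the event: entry leaks */
	}
}

/* The replay elided `entries` DELs because they were already deferred.
 * flush_first models the extra deferred_process() call suggested in
 * the review, made after the replay while rtnl_lock() is still held. */
static int leave_lag(int entries, bool flush_first)
{
	struct port_model m = {
		.deferred_dels = entries,
		.offloaded = true,
		.hw_entries = entries,
	};

	if (flush_first)
		deferred_process(&m);

	m.offloaded = false;	/* port has left the LAG, hence the bridge */
	deferred_process(&m);	/* any later drain comes too late */
	return m.hw_entries;
}
```

With flush_first false, leave_lag(2, false) leaves both entries behind; with the extra drain, leave_lag(2, true) returns 0. The later drain models why the existing call in del_nbp() cannot help here: by then the driver has already stopped listening.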
Thread overview: 12+ messages
2024-01-31 12:35 [PATCH net 0/2] net: bridge: switchdev: Skip MDB replays of pending events Tobias Waldekranz
2024-01-31 12:35 ` [PATCH net 1/2] net: switchdev: Add helper to check if an object event is pending Tobias Waldekranz
2024-01-31 12:50 ` Jiri Pirko
2024-01-31 13:31 ` Tobias Waldekranz
2024-02-01 0:33 ` Vladimir Oltean
2024-02-01 7:45 ` Jiri Pirko
2024-01-31 13:34 ` Vladimir Oltean
2024-01-31 14:48 ` Tobias Waldekranz
2024-02-01 0:29 ` Vladimir Oltean
2024-01-31 12:35 ` [PATCH net 2/2] net: bridge: switchdev: Skip MDB replays of pending events Tobias Waldekranz
2024-01-31 13:51 ` Vladimir Oltean
2024-01-31 15:06 ` Vladimir Oltean