netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tobias Waldekranz <tobias@waldekranz.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: davem@davemloft.net, kuba@kernel.org, atenart@kernel.org,
	roopa@nvidia.com, razor@blackwall.org, bridge@lists.linux.dev,
	netdev@vger.kernel.org, jiri@resnulli.us, ivecera@redhat.com
Subject: Re: [PATCH v4 net 1/2] net: bridge: switchdev: Skip MDB replays of deferred events on offload
Date: Wed, 14 Feb 2024 22:28:16 +0100	[thread overview]
Message-ID: <87plwysplb.fsf@waldekranz.com> (raw)
In-Reply-To: <20240214164559.njyaoscx2e22esep@skbuf>

On ons, feb 14, 2024 at 18:45, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Feb 12, 2024 at 08:18:43PM +0100, Tobias Waldekranz wrote:
>> Before this change, generation of the list of MDB events to replay
>> would race against the creation of new group memberships, either from
>> the IGMP/MLD snooping logic or from user configuration.
>> 
>> While new memberships are immediately visible to walkers of
>> br->mdb_list, the notification of their existence to switchdev event
>> subscribers is deferred until a later point in time. So if a replay
>> list was generated during a time that overlapped with such a window,
>> it would also contain a replay of the not-yet-delivered event.
>> 
>> The driver would thus receive two copies of what the bridge internally
>> considered to be one single event. On destruction of the bridge, only
>> a single membership deletion event was therefore sent. As a
>> consequence of this, drivers which reference count memberships (at
>> least DSA), would be left with orphan groups in their hardware
>> database when the bridge was destroyed.
>> 
>> This is only an issue when replaying additions. While deletion events
>> may still be pending on the deferred queue, they will already have
>> been removed from br->mdb_list, so no duplicates can be generated in
>> that scenario.
>> 
>> To a user this meant that old group memberships, from a bridge in
>> which a port was previously attached, could be reanimated (in
>> hardware) when the port joined a new bridge, without the new bridge's
>> knowledge.
>> 
>> For example, on an mv88e6xxx system, create a snooping bridge and
>> immediately add a port to it:
>> 
>>     root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
>>     > ip link set dev x3 up master br0
>> 
>> And then destroy the bridge:
>> 
>>     root@infix-06-0b-00:~$ ip link del dev br0
>>     root@infix-06-0b-00:~$ mvls atu
>>     ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
>>     DEV:0 Marvell 88E6393X
>>     33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
>>     33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
>>     ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
>>     root@infix-06-0b-00:~$
>> 
>> The two IPv6 groups remain in the hardware database because the
>> port (x3) is notified of the host's membership twice: once via the
>> original event and once via a replay. Since only a single delete
>> notification is sent, the count remains at 1 when the bridge is
>> destroyed.
>> 
>> Then add the same port (or another port belonging to the same hardware
>> domain) to a new bridge, this time with snooping disabled:
>> 
>>     root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
>>     > ip link set dev x3 up master br1
>> 
>> All multicast, including the two IPv6 groups from br0, should now be
>> flooded, according to the policy of br1. But instead the old
>> memberships are still active in the hardware database, causing the
>> switch to only forward traffic to those groups towards the CPU (port
>> 0).
>> 
>> Eliminate the race in two steps:
>> 
>> 1. Grab the write-side lock of the MDB while generating the replay
>>    list.
>> 
>> This prevents new memberships from showing up while we are generating
>> the replay list. But it leaves the scenario in which a deferred event
>> was already generated, but not delivered, before we grabbed the
>> lock. Therefore:
>> 
>> 2. Make sure that no deferred version of a replay event is already
>>    enqueued to the switchdev deferred queue, before adding it to the
>>    replay list, when replaying additions.
>> 
>> Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> ---
>
> Excellent from my side, thank you!

Thanks!

> Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
>
>> @@ -307,6 +336,50 @@ int switchdev_port_obj_del(struct net_device *dev,
>>  }
>>  EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
>>  
>> +/**
>> + *	switchdev_port_obj_act_is_deferred - Is object action pending?
>> + *
>> + *	@dev: port device
>> + *	@nt: type of action; add or delete
>> + *	@obj: object to test
>> + *
>> + *	Returns true if a deferred item is exists, which is equivalent
>> + *	to the action @nt of an object @obj.
>
> nitpick: replace "is exists" with something else like "is pending" or
> "exists".
>
> Also "action of an object" or "on an object"?

Yes, these are annoying. I might as well send a v5.

pw-bot: changes-requested

>> + *
>> + *	rtnl_lock must be held.
>> + */

  reply	other threads:[~2024-02-14 21:28 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-12 19:18 [PATCH v4 net 0/2] net: bridge: switchdev: Ensure MDB events are delivered exactly once Tobias Waldekranz
2024-02-12 19:18 ` [PATCH v4 net 1/2] net: bridge: switchdev: Skip MDB replays of deferred events on offload Tobias Waldekranz
2024-02-14 16:45   ` Vladimir Oltean
2024-02-14 21:28     ` Tobias Waldekranz [this message]
2024-02-12 19:18 ` [PATCH v4 net 2/2] net: bridge: switchdev: Ensure deferred event delivery on unoffload Tobias Waldekranz
2024-02-14 16:47   ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87plwysplb.fsf@waldekranz.com \
    --to=tobias@waldekranz.com \
    --cc=atenart@kernel.org \
    --cc=bridge@lists.linux.dev \
    --cc=davem@davemloft.net \
    --cc=ivecera@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=razor@blackwall.org \
    --cc=roopa@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).