From: Tobias Waldekranz <tobias@waldekranz.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: davem@davemloft.net, kuba@kernel.org, atenart@kernel.org,
roopa@nvidia.com, razor@blackwall.org, bridge@lists.linux.dev,
netdev@vger.kernel.org, jiri@resnulli.us, ivecera@redhat.com
Subject: Re: [PATCH v4 net 1/2] net: bridge: switchdev: Skip MDB replays of deferred events on offload
Date: Wed, 14 Feb 2024 22:28:16 +0100 [thread overview]
Message-ID: <87plwysplb.fsf@waldekranz.com> (raw)
In-Reply-To: <20240214164559.njyaoscx2e22esep@skbuf>
On ons, feb 14, 2024 at 18:45, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Feb 12, 2024 at 08:18:43PM +0100, Tobias Waldekranz wrote:
>> Before this change, generation of the list of MDB events to replay
>> would race against the creation of new group memberships, either from
>> the IGMP/MLD snooping logic or from user configuration.
>>
>> While new memberships are immediately visible to walkers of
>> br->mdb_list, the notification of their existence to switchdev event
>> subscribers is deferred until a later point in time. So if a replay
>> list was generated during a time that overlapped with such a window,
>> it would also contain a replay of the not-yet-delivered event.
>>
>> The driver would thus receive two copies of what the bridge internally
>> considered to be one single event. On destruction of the bridge, only
>> a single membership deletion event was therefore sent. As a
>> consequence of this, drivers which reference count memberships (at
>> least DSA), would be left with orphan groups in their hardware
>> database when the bridge was destroyed.
>>
>> This is only an issue when replaying additions. While deletion events
>> may still be pending on the deferred queue, they will already have
>> been removed from br->mdb_list, so no duplicates can be generated in
>> that scenario.
>>
>> To a user this meant that old group memberships, from a bridge in
>> which a port was previously attached, could be reanimated (in
>> hardware) when the port joined a new bridge, without the new bridge's
>> knowledge.
>>
>> For example, on an mv88e6xxx system, create a snooping bridge and
>> immediately add a port to it:
>>
>> root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
>> > ip link set dev x3 up master br0
>>
>> And then destroy the bridge:
>>
>> root@infix-06-0b-00:~$ ip link del dev br0
>> root@infix-06-0b-00:~$ mvls atu
>> ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
>> DEV:0 Marvell 88E6393X
>> 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
>> 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
>> ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
>> root@infix-06-0b-00:~$
>>
>> The two IPv6 groups remain in the hardware database because the
>> port (x3) is notified of the host's membership twice: once via the
>> original event and once via a replay. Since only a single delete
>> notification is sent, the count remains at 1 when the bridge is
>> destroyed.
>>
>> Then add the same port (or another port belonging to the same hardware
>> domain) to a new bridge, this time with snooping disabled:
>>
>> root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
>> > ip link set dev x3 up master br1
>>
>> All multicast, including the two IPv6 groups from br0, should now be
>> flooded, according to the policy of br1. But instead the old
>> memberships are still active in the hardware database, causing the
>> switch to only forward traffic to those groups towards the CPU (port
>> 0).
>>
>> Eliminate the race in two steps:
>>
>> 1. Grab the write-side lock of the MDB while generating the replay
>> list.
>>
>> This prevents new memberships from showing up while we are generating
>> the replay list. But it leaves the scenario in which a deferred event
>> was already generated, but not delivered, before we grabbed the
>> lock. Therefore:
>>
>> 2. Make sure that no deferred version of a replay event is already
>> enqueued to the switchdev deferred queue, before adding it to the
>> replay list, when replaying additions.
>>
>> Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> ---
>
> Excellent from my side, thank you!
Thanks!
> Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
>
>> @@ -307,6 +336,50 @@ int switchdev_port_obj_del(struct net_device *dev,
>> }
>> EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
>>
>> +/**
>> + * switchdev_port_obj_act_is_deferred - Is object action pending?
>> + *
>> + * @dev: port device
>> + * @nt: type of action; add or delete
>> + * @obj: object to test
>> + *
>> + * Returns true if a deferred item is exists, which is equivalent
>> + * to the action @nt of an object @obj.
>
> nitpick: replace "is exists" with something else like "is pending" or
> "exists".
>
> Also "action of an object" or "on an object"?
Yes, these are annoying. I might as well send a v5.
pw-bot: changes-requested
>> + *
>> + * rtnl_lock must be held.
>> + */
next prev parent reply other threads:[~2024-02-14 21:28 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-12 19:18 [PATCH v4 net 0/2] net: bridge: switchdev: Ensure MDB events are delivered exactly once Tobias Waldekranz
2024-02-12 19:18 ` [PATCH v4 net 1/2] net: bridge: switchdev: Skip MDB replays of deferred events on offload Tobias Waldekranz
2024-02-14 16:45 ` Vladimir Oltean
2024-02-14 21:28 ` Tobias Waldekranz [this message]
2024-02-12 19:18 ` [PATCH v4 net 2/2] net: bridge: switchdev: Ensure deferred event delivery on unoffload Tobias Waldekranz
2024-02-14 16:47 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87plwysplb.fsf@waldekranz.com \
--to=tobias@waldekranz.com \
--cc=atenart@kernel.org \
--cc=bridge@lists.linux.dev \
--cc=davem@davemloft.net \
--cc=ivecera@redhat.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=olteanv@gmail.com \
--cc=razor@blackwall.org \
--cc=roopa@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.