From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: [PATCH net-next v2 0/7] net: lan966x: Add lag support
Date: Wed, 29 Jun 2022 12:26:31 +0000 [thread overview]
Message-ID: <20220629122630.qd7nsqmkxoshovhc@skbuf> (raw)
In-Reply-To: <20220627201330.45219-1-horatiu.vultur@microchip.com>
Hi Horatiu,
On Mon, Jun 27, 2022 at 10:13:23PM +0200, Horatiu Vultur wrote:
> Add lag support for lan966x.
> First 4 patches don't do any changes to the current behaviour, they
> just prepare for lag support. While the rest is to add the lag support.
>
> v1->v2:
> - fix the LAG PGIDs when ports go down, in this way is not
> needed anymore the last patch of the series.
>
> Horatiu Vultur (7):
> net: lan966x: Add reqisters used to configure lag interfaces
> net: lan966x: Split lan966x_fdb_event_work
> net: lan966x: Expose lan966x_switchdev_nb and
> lan966x_switchdev_blocking_nb
> net: lan966x: Extend lan966x_foreign_bridging_check
> net: lan966x: Add lag support for lan966x.
> net: lan966x: Extend FDB to support also lag
> net: lan966x: Extend MAC to support also lag interfaces.
>
> .../net/ethernet/microchip/lan966x/Makefile | 2 +-
> .../ethernet/microchip/lan966x/lan966x_fdb.c | 153 ++++++---
> .../ethernet/microchip/lan966x/lan966x_lag.c | 322 ++++++++++++++++++
> .../ethernet/microchip/lan966x/lan966x_mac.c | 66 +++-
> .../ethernet/microchip/lan966x/lan966x_main.h | 41 +++
> .../ethernet/microchip/lan966x/lan966x_regs.h | 45 +++
> .../microchip/lan966x/lan966x_switchdev.c | 115 +++++--
> 7 files changed, 654 insertions(+), 90 deletions(-)
> create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
>
> --
> 2.33.0
>
I've downloaded and applied your patches and I have some general feedback.
Some of it relates to changes which were not made and hence I couldn't
have commented on the patches themselves, so I'm posting it here.
1. switchdev_bridge_port_offload() returns an error code if object
replay failed, or if it couldn't get the port parent id, or if the user
tries to join a lan966x port and a port belonging to another switchdev
driver to the same LAG. It would be good to propagate this error and not
ignore it.
Side note: maybe this could help to eliminate the extra logic you need
to add to lan966x_foreign_bridging_check().
2. lan966x_foreign_dev_check() seems wrong/misunderstood. Currently it
reports that a LAG upper is a foreign interface (unoffloaded). In turn,
this makes switchdev_lower_dev_find() not find any lan966x interface
beneath a LAG, and hence, __switchdev_handle_fdb_event_to_device() would
not recurse to the lan966x "dev" below a LAG when the "orig_dev" of an
FDB event is the bridge itself. Otherwise said, if you have no direct
lan966x port under a bridge, but just bridge -> LAG -> lan966x, you will
miss all local (host-filtered) FDB event notifications that you should
otherwise learn towards the CPU.
3. The implementation of lan966x_lag_mac_add_entry(), with that first
call to lan966x_mac_del_entry(), seems a hack. Why do you need to do
that?
4. The handling of lan966x->mac_lock seems wrong in general, not just
particular to this patch set. In particular, it appears to protect too
little in lan966x_mac_add_entry(), i.e. just the list_add_tail.
This makes it possible for lan966x_mac_lookup and lan966x_mac_learn to
be concurrent with lan966x_mac_del_entry(). In turn, this appears bad
first and foremost for the hardware access interface, since the MAC
table access is indirect, and if you allow multiple threads to
concurrently call lan966x_mac_select(), change the command in
ANA_MACACCESS, and poll for command completion, things will go sideways
very quickly (one command will inadvertently poll for the completion of
another, which inadvertently operates on the row/column selected by yet
a third command, all that due to improper serialization).
5. There is a race between lan966x_fdb_lag_event_work() calling
lan966x_lag_first_port(), and lan966x_lag_port_leave() changing
port->bond = NULL. Specifically, when a lan966x port leaves a LAG, there
might still be deferred FDB events (add or del) which are still pending.
There exists a dead time during which you will ignore these, because you
think that the first lan966x LAG port isn't the first lan966x LAG port,
which will lead to a desynchronization between the bridge FDB and the
hardware FDB.
In DSA we solved this by flushing lan966x->fdb_work inside
lan966x_port_prechangeupper() on leave. This waits for the pending
events to finish, and the bridge will not emit further events.
It's important to do this in prechangeupper() rather than in
changeupper() because switchdev_handle_fdb_event_to_device() needs the
upper/lower relationship to still exist to function properly, and in
changeupper() it has already been destroyed.
Side note: if you flush lan966x->fdb_work, then you have an upper bound
for how long can lan966x_fdb_event_work be deferred. Specifically, you
can remove the dev_hold() and dev_put() calls, since it surely can't be
deferred until after the netdev is unregistered. The bounding event is
much quicker - the lan966x port leaves the LAG.
6. You are missing LAG FDB migration logic in lan966x_lag_port_join().
Specifically, you assume that the lan966x_lag_first_port() will never
change, probably because you just make the switch ports join the LAG in
the order 1, 2, 3. But they can also join in the order 3, 2, 1.
next prev parent reply other threads:[~2022-06-29 12:26 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-27 20:13 [PATCH net-next v2 0/7] net: lan966x: Add lag support Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 1/7] net: lan966x: Add reqisters used to configure lag interfaces Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 2/7] net: lan966x: Split lan966x_fdb_event_work Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 3/7] net: lan966x: Expose lan966x_switchdev_nb and lan966x_switchdev_blocking_nb Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 4/7] net: lan966x: Extend lan966x_foreign_bridging_check Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 5/7] net: lan966x: Add lag support for lan966x Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 6/7] net: lan966x: Extend FDB to support also lag Horatiu Vultur
2022-06-27 20:13 ` [PATCH net-next v2 7/7] net: lan966x: Extend MAC to support also lag interfaces Horatiu Vultur
2022-06-29 12:26 ` Vladimir Oltean [this message]
2022-06-30 19:08 ` [PATCH net-next v2 0/7] net: lan966x: Add lag support Horatiu Vultur
2022-07-01 16:34 ` Vladimir Oltean
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220629122630.qd7nsqmkxoshovhc@skbuf \
--to=vladimir.oltean@nxp.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horatiu.vultur@microchip.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox