From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"UNGLinuxDriver@microchip.com" <UNGLinuxDriver@microchip.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: [PATCH net-next v3 2/7] net: lan966x: Split lan966x_fdb_event_work
Date: Sat, 2 Jul 2022 14:08:34 +0000
Message-ID: <20220702140834.gyqmtmaru6ecdamb@skbuf>
In-Reply-To: <20220701205227.1337160-3-horatiu.vultur@microchip.com>
On Fri, Jul 01, 2022 at 10:52:22PM +0200, Horatiu Vultur wrote:
> Split the function lan966x_fdb_event_work. One case for when the
> orig_dev is a bridge and one case when orig_dev is lan966x port.
> This is preparation for lag support. There is no functional change.
>
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
> -static void lan966x_fdb_event_work(struct work_struct *work)
> +void lan966x_fdb_flush_workqueue(struct lan966x *lan966x)
> +{
> + flush_workqueue(lan966x->fdb_work);
> +}
> +
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> index df2bee678559..d9fc6a9a3da1 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> @@ -320,9 +320,10 @@ static int lan966x_port_prechangeupper(struct net_device *dev,
> {
> struct lan966x_port *port = netdev_priv(dev);
>
> - if (netif_is_bridge_master(info->upper_dev) && !info->linking)
> - switchdev_bridge_port_unoffload(port->dev, port,
> - NULL, NULL);
> + if (netif_is_bridge_master(info->upper_dev) && !info->linking) {
> + switchdev_bridge_port_unoffload(port->dev, port, NULL, NULL);
> + lan966x_fdb_flush_workqueue(port->lan966x);
> + }

Very curious as to why you decided to stuff this change in here.
There was no functional change in v2, now there is. And it's a change
you might need to come back to later (probably sooner than you'd like),
since the flushing of the workqueue is susceptible to causing deadlocks
if done improperly - let's see how you blame a commit that was only
supposed to move code, in that case ;)

The deadlock that I'm talking about comes from the fact that
lan966x_port_prechangeupper() runs with rtnl_lock() held. So the code of
the flushed workqueue item must not hold rtnl_lock(), or any other lock
that is blocked by the rtnl_lock(). Otherwise, the flushing will wait
for a workqueue item to complete, which in turn waits to acquire the
rtnl_lock, which is held by the thread waiting for the workqueue to
complete.
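
To make the flush-under-lock hazard concrete, here is a minimal userspace
sketch, not kernel code. Everything in it is an assumption made for
illustration: fake_rtnl stands in for rtnl_lock(), work_item() for the queued
FDB work, and pthread_join() for flush_workqueue(). The work item uses
trylock so that the demo reports the would-be deadlock instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for rtnl_lock(); not the kernel API. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the queued work item: it needs fake_rtnl to make progress. */
static void *work_item(void *arg)
{
	int *would_block = arg;

	/* trylock rather than lock, so a held lock is reported, not waited on */
	if (pthread_mutex_trylock(&fake_rtnl) == EBUSY) {
		*would_block = 1;
		return NULL;
	}
	*would_block = 0;
	pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

/*
 * Run one work item and wait for it (the "flush"), optionally while
 * holding fake_rtnl. Returns 1 if the work item found the lock taken,
 * i.e. a blocking lock would have left both threads waiting forever.
 */
int flush_work_item(int hold_lock)
{
	pthread_t worker;
	int would_block = -1;
	int err;

	if (hold_lock)
		pthread_mutex_lock(&fake_rtnl);

	err = pthread_create(&worker, NULL, work_item, &would_block);
	assert(err == 0);
	pthread_join(worker, NULL);	/* wait for the item to complete */

	if (hold_lock)
		pthread_mutex_unlock(&fake_rtnl);

	return would_block;
}
```

Calling flush_work_item(1) models the prechangeupper path (flush with the
lock held), and flush_work_item(0) models a flush from a context that does
not hold it. On older toolchains this may need to be built with -pthread.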

Analyzing your code, lan966x_mac_notifiers() takes rtnl_lock().
That is taken from threaded interrupt context - lan966x_mac_irq_process(),
but is a sub-lock of spin_lock(&lan966x->mac_lock).

There are 2 problems with that already. The first: rtnl_lock() is a
mutex => can sleep, but &lan966x->mac_lock is a spin lock => is atomic.
You can't take rtnl_lock() from atomic context. Lockdep and/or
CONFIG_DEBUG_ATOMIC_SLEEP will tell you as much.

The second problem is the lock ordering inversion that this causes.
There exists a threaded IRQ which takes the locks in the order mac_lock
-> rtnl_lock, and there exists this new fdb_flush_workqueue which takes
the locks in the order rtnl_lock -> mac_lock. If they run at the same
time, kaboom. Again, lockdep will tell you as much.
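
The inversion can be shown with a toy order tracker. This is only a sketch
of the general idea of remembering which lock was taken while another was
held; it is not how lockdep is implemented, and the enum names and
record_order() are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two locks discussed above. */
enum lock_id { MAC_LOCK, RTNL_LOCK, NUM_LOCKS };

/* seen[a][b] is true once some path has taken lock b while holding lock a. */
static bool seen[NUM_LOCKS][NUM_LOCKS];

/*
 * Record that 'taken' was acquired while 'held' was held.
 * Returns true if the opposite order was recorded earlier, which is
 * the classic AB-BA pattern that can deadlock two concurrent paths.
 */
bool record_order(enum lock_id held, enum lock_id taken)
{
	assert(held != taken);
	seen[held][taken] = true;
	return seen[taken][held];
}
```

With this, the threaded IRQ path is record_order(MAC_LOCK, RTNL_LOCK), which
reports nothing, and the flush path is record_order(RTNL_LOCK, MAC_LOCK),
which reports the inversion.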
I'm sorry, but you need to solve the existing locking problems with the
code first.
>
> return NOTIFY_DONE;
> }
> --
> 2.33.0
>