From: Jason Gunthorpe <jgg@nvidia.com>
To: Parav Pandit <parav@nvidia.com>
Cc: <dledford@redhat.com>, <jiri@mellanox.com>,
<linux-rdma@vger.kernel.org>, <michaelgur@mellanox.com>,
<netdev@vger.kernel.org>, <saeedm@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>
Subject: Re: [PATCH rdma-rc RESEND v1] RDMA/mlx5: Fix devlink deadlock on net namespace deletion
Date: Mon, 26 Oct 2020 19:25:12 -0300
Message-ID: <20201026222512.GB2066862@nvidia.com>
In-Reply-To: <20201026134359.23150-1-parav@nvidia.com>
On Mon, Oct 26, 2020 at 03:43:59PM +0200, Parav Pandit wrote:
> When an mlx5 core devlink instance is reloaded in a different net
> namespace, its associated IB device is deleted and recreated.
>
> Example sequence is:
> $ ip netns add foo
> $ devlink dev reload pci/0000:00:08.0 netns foo
> $ ip netns del foo
>
> The mlx5 IB device needs to attach the netdevice to itself, and detach
> it, through the netdev notifier chain during the load and unload
> sequences. Below is the call graph of the unload flow.
>
> cleanup_net()
>   down_read(&pernet_ops_rwsem); <- first sem acquired
>   ops_pre_exit_list()
>     pre_exit()
>       devlink_pernet_pre_exit()
>         devlink_reload()
>           mlx5_devlink_reload_down()
>             mlx5_unload_one()
>               [...]
>                 mlx5_ib_remove()
>                   mlx5_ib_unbind_slave_port()
>                     mlx5_remove_netdev_notifier()
>                       unregister_netdevice_notifier()
>                         down_write(&pernet_ops_rwsem); <- recursive lock
>
> Hence, when the net namespace is deleted, the mlx5 reload results in a
> deadlock.
>
> When the deadlock occurs, the devlink mutex is also held. This not only
> deadlocks the mlx5 device under reload, but also blocks every process
> that attempts to access unrelated devlink devices.
>
> Hence, fix this by having the mlx5 ib driver register a per-net netdev
> notifier instead of the global one. The per-net notifier operates on the
> net namespace without holding pernet_ops_rwsem.
>
> Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Changelog:
> v0->v1:
> - updated comment for mlx5_core_net API to be used by multiple mlx5
> drivers
> ---
> drivers/infiniband/hw/mlx5/main.c | 6 ++++--
> .../net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
> include/linux/mlx5/driver.h | 18 ++++++++++++++++++
> 3 files changed, 22 insertions(+), 7 deletions(-)
Applied to for-rc, thanks
Jason
Thread overview: 10+ messages
2020-10-19 5:27 [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace deletion Leon Romanovsky
2020-10-19 13:07 ` Jason Gunthorpe
2020-10-19 13:23 ` Parav Pandit
2020-10-19 19:01 ` Jason Gunthorpe
2020-10-19 19:26 ` Parav Pandit
2020-10-20 11:41 ` Parav Pandit
2020-10-26 13:38 ` Parav Pandit
2020-10-26 13:47 ` Parav Pandit
2020-10-26 13:43 ` [PATCH rdma-rc RESEND v1] " Parav Pandit
2020-10-26 22:25 ` Jason Gunthorpe [this message]