From: Simon Horman <horms@kernel.org>
To: Tariq Toukan <tariqt@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Gal Pressman <gal@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Mark Bloch <mbloch@nvidia.com>
Subject: Re: [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable
Date: Fri, 10 May 2024 16:51:02 +0100
Message-ID: <20240510155102.GE2347895@kernel.org>
In-Reply-To: <20240509112951.590184-4-tariqt@nvidia.com>

On Thu, May 09, 2024 at 02:29:49PM +0300, Tariq Toukan wrote:
> From: Maher Sanalla <msanalla@nvidia.com>
> 
> On lag disable, the bond IB device along with all of its
> representors is destroyed, and then the slaves' representors get reloaded.
> 
> If the slave IB representor load fails, the eswitch error flow
> unloads all representors, including ethernet representors, whose
> netdevs get detached and removed from the lag bond. Such a flow is
> incorrect, as the lag driver is not responsible for loading/unloading
> ethernet representors. Furthermore, the flow described above begins by
> taking the lag lock to prevent bond changes during the disable flow.
> However, when the ethernet representors are detached from lag, the lag
> lock is acquired again, triggering the following deadlock:
> 
> Call trace:
> __switch_to+0xf4/0x148
> __schedule+0x2c8/0x7d0
> schedule+0x50/0xe0
> schedule_preempt_disabled+0x18/0x28
> __mutex_lock.isra.13+0x2b8/0x570
> __mutex_lock_slowpath+0x1c/0x28
> mutex_lock+0x4c/0x68
> mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
> mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
> mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
> mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
> mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
> mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
> mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
> mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
> mlx5_disable_lag+0x130/0x138 [mlx5_core]
> mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
> mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
> devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
> genl_family_rcv_msg_doit.isra.17+0xe8/0x138
> genl_rcv_msg+0xe4/0x220
> netlink_rcv_skb+0x44/0x108
> genl_rcv+0x40/0x58
> netlink_unicast+0x198/0x268
> netlink_sendmsg+0x1d4/0x418
> sock_sendmsg+0x54/0x60
> __sys_sendto+0xf4/0x120
> __arm64_sys_sendto+0x30/0x40
> el0_svc_common+0x8c/0x120
> do_el0_svc+0x30/0xa0
> el0_svc+0x20/0x30
> el0_sync_handler+0x90/0xb8
> el0_sync+0x160/0x180
> 
> Thus, upon lag enable/disable, load and unload only the IB representors
> of the slaves, preventing the deadlock mentioned above.
> 
> While at it, refactor the mlx5_esw_offloads_rep_load() function to have
> a static helper method for its internal logic, in symmetry with the
> representor unload design.
> 
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Co-developed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>
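
For readers following the quoted trace, the self-deadlock and the shape of
the fix can be sketched in userspace. This is only an illustration under
stated assumptions, not the mlx5 code: every function name below is
hypothetical, lag_lock stands in for ldev->lock, and a POSIX error-checking
mutex is used so that the second lock attempt reports EDEADLK instead of
hanging the way a kernel mutex would.

```c
/* Ask for POSIX.1-2008 so PTHREAD_MUTEX_ERRORCHECK is declared even
 * when compiling with a strict -std= mode. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for ldev->lock; all names here are illustrative. */
static pthread_mutex_t lag_lock;

/* Error-checking mutex: relocking from the owning thread returns EDEADLK
 * instead of blocking forever, which keeps the demo observable. */
static void lag_lock_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	rc = pthread_mutex_init(&lag_lock, &attr);
	assert(rc == 0);
	(void)rc;
	pthread_mutexattr_destroy(&attr);
}

/* Models the netdev detach path, which takes the lag lock itself. */
static int lag_remove_netdev(void)
{
	int err = pthread_mutex_lock(&lag_lock);

	if (err)
		return err;	/* EDEADLK when the caller already holds it */
	pthread_mutex_unlock(&lag_lock);
	return 0;
}

/* Error flow that unloads every representor, ethernet included. */
static int unload_all_reps(void)
{
	return lag_remove_netdev();
}

/* Error flow that unloads IB representors only; never touches the lock. */
static int unload_ib_reps(void)
{
	return 0;
}

/* Models lag disable: take the lock, assume the IB representor load
 * failed, then run the chosen error flow while still holding the lock. */
static int disable_lag(bool ib_only)
{
	int err;

	pthread_mutex_lock(&lag_lock);
	err = ib_only ? unload_ib_reps() : unload_all_reps();
	pthread_mutex_unlock(&lag_lock);
	return err;
}
```

After calling lag_lock_init(), disable_lag(false) returns EDEADLK because
the full unload path re-enters the lock its caller already holds, while
disable_lag(true) returns 0 because the IB-only path never reaches the
netdev detach. On some systems the program needs to be built with -pthread.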


Thread overview: 10+ messages
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
2024-05-09 11:29 ` [PATCH net 1/5] net/mlx5e: Fix netif state handling Tariq Toukan
2024-05-10 15:31   ` Simon Horman
2024-05-09 11:29 ` [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port Tariq Toukan
2024-05-10 15:38   ` Simon Horman
2024-05-09 11:29 ` [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable Tariq Toukan
2024-05-10 15:51   ` Simon Horman [this message]
2024-05-09 11:29 ` [PATCH net 4/5] net/mlx5: Add a timeout to acquire the command queue semaphore Tariq Toukan
2024-05-09 11:29 ` [PATCH net 5/5] net/mlx5: Discard command completions in internal error Tariq Toukan
2024-05-11  2:50 ` [PATCH net 0/5] mlx5 misc fixes patchwork-bot+netdevbpf
