All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
To: Tariq Toukan <tariqt@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Gal Pressman <gal@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jianbo Liu <jianbol@nvidia.com>
Subject: Re: [PATCH net 5/6] net/mlx5: Bridge, fix the crash caused by LAG state check
Date: Tue, 11 Mar 2025 09:02:04 +0100	[thread overview]
Message-ID: <Z8/t/LYlMWEwa+RL@mev-dev.igk.intel.com> (raw)
In-Reply-To: <1741644104-97767-6-git-send-email-tariqt@nvidia.com>

On Tue, Mar 11, 2025 at 12:01:43AM +0200, Tariq Toukan wrote:
> From: Jianbo Liu <jianbol@nvidia.com>
> 
> When removing LAG device from bridge, NETDEV_CHANGEUPPER event is
> triggered. Driver finds the lower devices (PFs) to flush all the
> offloaded entries. And mlx5_lag_is_shared_fdb is checked, it returns
> false if one of PF is unloaded. In such case,
> mlx5_esw_bridge_lag_rep_get() and its caller return NULL, instead of
> the alive PF, and the flush is skipped.
> 
> Besides, the bridge fdb entry's lastuse is updated in mlx5 bridge
> event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be
> ignored in this case because the upper interface for bond is deleted,
> and the entry will never be aged because lastuse is never updated.
> 
> To make things worse, as the entry is alive, mlx5 bridge workqueue
> keeps sending that event, which is then handled by kernel bridge
> notifier. It causes the following crash when accessing the passed bond
> netdev which is already destroyed.
> 
> To fix this issue, remove such checks. LAG state is already checked in
> commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding
> bond to bridge"), driver still need to skip offload if LAG becomes
> invalid state after initialization.
> 
>  Oops: stack segment: 0000 [#1] SMP
>  CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G           OE      6.11.0_mlnx #1
>  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core]
>  RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge]
>  Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7
>  RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297
>  RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff
>  RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0
>  RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8
>  R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60
>  R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   ? __die_body+0x1a/0x60
>   ? die+0x38/0x60
>   ? do_trap+0x10b/0x120
>   ? do_error_trap+0x64/0xa0
>   ? exc_stack_segment+0x33/0x50
>   ? asm_exc_stack_segment+0x22/0x30
>   ? br_switchdev_event+0x2c/0x110 [bridge]
>   ? sched_balance_newidle.isra.149+0x248/0x390
>   notifier_call_chain+0x4b/0xa0
>   atomic_notifier_call_chain+0x16/0x20
>   mlx5_esw_bridge_update+0xec/0x170 [mlx5_core]
>   mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core]
>   process_scheduled_works+0x81/0x390
>   worker_thread+0x106/0x250
>   ? bh_worker+0x110/0x110
>   kthread+0xb7/0xe0
>   ? kthread_park+0x80/0x80
>   ret_from_fork+0x2d/0x50
>   ? kthread_park+0x80/0x80
>   ret_from_fork_asm+0x11/0x20
>   </TASK>
> 
> Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
> Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
> Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en/rep/bridge.c  | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> index 5d128c5b4529..0f5d7ea8956f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
> @@ -48,15 +48,10 @@ mlx5_esw_bridge_lag_rep_get(struct net_device *dev, struct mlx5_eswitch *esw)
>  	struct list_head *iter;
>  
>  	netdev_for_each_lower_dev(dev, lower, iter) {
> -		struct mlx5_core_dev *mdev;
> -		struct mlx5e_priv *priv;
> -
>  		if (!mlx5e_eswitch_rep(lower))
>  			continue;
>  
> -		priv = netdev_priv(lower);
> -		mdev = priv->mdev;
> -		if (mlx5_lag_is_shared_fdb(mdev) && mlx5_esw_bridge_dev_same_esw(lower, esw))
> +		if (mlx5_esw_bridge_dev_same_esw(lower, esw))
>  			return lower;
>  	}
>  
> @@ -125,7 +120,7 @@ static bool mlx5_esw_bridge_is_local(struct net_device *dev, struct net_device *
>  	priv = netdev_priv(rep);
>  	mdev = priv->mdev;
>  	if (netif_is_lag_master(dev))
> -		return mlx5_lag_is_shared_fdb(mdev) && mlx5_lag_is_master(mdev);
> +		return mlx5_lag_is_master(mdev);
>  	return true;
>  }
>  
> @@ -455,6 +450,9 @@ static int mlx5_esw_bridge_switchdev_event(struct notifier_block *nb,
>  	if (!rep)
>  		return NOTIFY_DONE;
>  
> +	if (netif_is_lag_master(dev) && !mlx5_lag_is_shared_fdb(esw->dev))
> +		return NOTIFY_DONE;
> +
>  	switch (event) {
>  	case SWITCHDEV_FDB_ADD_TO_BRIDGE:
>  		fdb_info = container_of(info,

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> -- 
> 2.31.1

  reply	other threads:[~2025-03-11  8:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10 22:01 [PATCH net 0/6] mlx5 misc fixes 2025-03-10 Tariq Toukan
2025-03-10 22:01 ` [PATCH net 1/6] net/mlx5: DR, use the right action structs for STEv3 Tariq Toukan
2025-03-10 22:01 ` [PATCH net 2/6] net/mlx5: HWS, Rightsize bwc matcher priority Tariq Toukan
2025-03-10 22:01 ` [PATCH net 3/6] net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs Tariq Toukan
2025-03-11  7:38   ` Michal Swiatkowski
2025-03-10 22:01 ` [PATCH net 4/6] net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch Tariq Toukan
2025-03-11  7:41   ` Michal Swiatkowski
2025-03-10 22:01 ` [PATCH net 5/6] net/mlx5: Bridge, fix the crash caused by LAG state check Tariq Toukan
2025-03-11  8:02   ` Michal Swiatkowski [this message]
2025-03-10 22:01 ` [PATCH net 6/6] net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices Tariq Toukan
2025-03-11  8:08   ` Michal Swiatkowski
2025-03-13 12:20 ` [PATCH net 0/6] mlx5 misc fixes 2025-03-10 patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8/t/LYlMWEwa+RL@mev-dev.igk.intel.com \
    --to=michal.swiatkowski@linux.intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gal@nvidia.com \
    --cc=jianbol@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.