netdev.vger.kernel.org archive mirror
* [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes
@ 2023-08-16  6:52 Leon Romanovsky
  2023-08-16  6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
  2023-08-16  6:52 ` [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events Leon Romanovsky
  0 siblings, 2 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16  6:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
	linux-rdma, Mark Bloch, Mark Zhang, netdev, Paolo Abeni,
	Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

These are two non-urgent fixes to the mlx5 RDMA LAG logic.

Thanks

Mark Bloch (2):
  RDMA/mlx5: Get upper device only if device is lagged
  RDMA/mlx5: Send correct port events

 drivers/infiniband/hw/mlx5/main.c             | 57 ++++++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 29 ++++++++++
 include/linux/mlx5/driver.h                   |  2 +
 3 files changed, 75 insertions(+), 13 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
  2023-08-16  6:52 [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes Leon Romanovsky
@ 2023-08-16  6:52 ` Leon Romanovsky
  2023-08-18 16:33   ` Jason Gunthorpe
  2023-08-16  6:52 ` [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events Leon Romanovsky
  1 sibling, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16  6:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

If the RDMA device isn't in LAG mode there is no need
to try to get the upper device.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f0b394ed7452..215d7b0add8f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
 	case NETDEV_DOWN: {
-		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
 		struct net_device *upper = NULL;
 
-		if (lag_ndev) {
-			upper = netdev_master_upper_dev_get(lag_ndev);
-			dev_put(lag_ndev);
+		if (ibdev->lag_active) {
+			struct net_device *lag_ndev;
+
+			lag_ndev = mlx5_lag_get_roce_netdev(mdev);
+			if (lag_ndev) {
+				upper = netdev_master_upper_dev_get(lag_ndev);
+				dev_put(lag_ndev);
+			} else {
+				goto done;
+			}
 		}
 
 		if (ibdev->is_rep)
@@ -254,9 +260,11 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 	if (!mdev)
 		return NULL;
 
-	ndev = mlx5_lag_get_roce_netdev(mdev);
-	if (ndev)
-		goto out;
+	if (ibdev->lag_active) {
+		ndev = mlx5_lag_get_roce_netdev(mdev);
+		if (ndev)
+			goto out;
+	}
 
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
-- 
2.41.0



* [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events
  2023-08-16  6:52 [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes Leon Romanovsky
  2023-08-16  6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
@ 2023-08-16  6:52 ` Leon Romanovsky
  1 sibling, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16  6:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Bloch, Eric Dumazet, Jakub Kicinski, linux-rdma, Mark Zhang,
	netdev, Paolo Abeni, Saeed Mahameed

From: Mark Bloch <mbloch@nvidia.com>

When operating in switchdev mode and with an active LAG, the function
mlx5_lag_get_roce_netdev() fails to return a valid net device as this
function is designed specifically for RoCE LAGs.

Consequently, this issue resulted in the driver sending incorrect event
reports. To address this, a new API is introduced to properly obtain the
net device. Additionally, some code logic is cleaned up during this
modification.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c             | 39 +++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 29 ++++++++++++++
 include/linux/mlx5/driver.h                   |  2 +
 3 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 215d7b0add8f..8b98200bd94c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -159,6 +159,29 @@ static struct mlx5_roce *mlx5_get_rep_roce(struct mlx5_ib_dev *dev,
 	return NULL;
 }
 
+static bool mlx5_netdev_send_event(struct mlx5_ib_dev *dev,
+				   struct net_device *ndev,
+				   struct net_device *upper,
+				   struct mlx5_roce *roce)
+{
+	if (!dev->ib_active)
+		return false;
+
+	/* Event is about our upper device */
+	if (upper == ndev)
+		return true;
+
+	/* RDMA device not in lag and not in switchdev */
+	if (!dev->is_rep && !upper && ndev == roce->netdev)
+		return true;
+
+	/* RDMA device in switchdev */
+	if (dev->is_rep && ndev == roce->netdev)
+		return true;
+
+	return false;
+}
+
 static int mlx5_netdev_event(struct notifier_block *this,
 			     unsigned long event, void *ptr)
 {
@@ -200,7 +223,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 		if (ibdev->lag_active) {
 			struct net_device *lag_ndev;
 
-			lag_ndev = mlx5_lag_get_roce_netdev(mdev);
+			lag_ndev = mlx5_lag_get_netdev(mdev);
 			if (lag_ndev) {
 				upper = netdev_master_upper_dev_get(lag_ndev);
 				dev_put(lag_ndev);
@@ -209,13 +232,13 @@ static int mlx5_netdev_event(struct notifier_block *this,
 			}
 		}
 
-		if (ibdev->is_rep)
+		if (ibdev->is_rep) {
 			roce = mlx5_get_rep_roce(ibdev, ndev, upper, &port_num);
-		if (!roce)
-			return NOTIFY_DONE;
-		if ((upper == ndev ||
-		     ((!upper || ibdev->is_rep) && ndev == roce->netdev)) &&
-		    ibdev->ib_active) {
+			if (!roce)
+				return NOTIFY_DONE;
+		}
+
+		if (mlx5_netdev_send_event(ibdev, ndev, upper, roce)) {
 			struct ib_event ibev = { };
 			enum ib_port_state port_state;
 
@@ -260,7 +283,7 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 	if (!mdev)
 		return NULL;
 
-	if (ibdev->lag_active) {
+	if (!ibdev->is_rep && ibdev->lag_active) {
 		ndev = mlx5_lag_get_roce_netdev(mdev);
 		if (ndev)
 			goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index f0a074b2fcdf..83298e9addd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1498,6 +1498,35 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 }
 EXPORT_SYMBOL(mlx5_lag_get_roce_netdev);
 
+struct net_device *mlx5_lag_get_netdev(struct mlx5_core_dev *dev)
+{
+	struct net_device *ndev = NULL;
+	struct mlx5_lag *ldev;
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&lag_lock, flags);
+	ldev = mlx5_lag_dev(dev);
+
+	if (!(ldev && __mlx5_lag_is_active(ldev)))
+		goto unlock;
+
+	for (i = 0; i < ldev->ports; i++) {
+		if (ldev->pf[i].dev == dev) {
+			ndev = ldev->pf[i].netdev;
+			break;
+		}
+	}
+
+	if (ndev)
+		dev_hold(ndev);
+
+unlock:
+	spin_unlock_irqrestore(&lag_lock, flags);
+	return ndev;
+}
+EXPORT_SYMBOL(mlx5_lag_get_netdev);
+
 u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
 			   struct net_device *slave)
 {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 25d0528f9219..bc7e3a974f62 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1160,6 +1160,8 @@ bool mlx5_lag_is_master(struct mlx5_core_dev *dev);
 bool mlx5_lag_is_shared_fdb(struct mlx5_core_dev *dev);
 bool mlx5_lag_is_mpesw(struct mlx5_core_dev *dev);
 struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev);
+
+struct net_device *mlx5_lag_get_netdev(struct mlx5_core_dev *dev);
 u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
 			   struct net_device *slave);
 int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
-- 
2.41.0



* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
  2023-08-16  6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
@ 2023-08-18 16:33   ` Jason Gunthorpe
  2023-08-18 16:42     ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 16:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed

On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> From: Mark Bloch <mbloch@nvidia.com>
> 
> If the RDMA device isn't in LAG mode there is no need
> to try to get the upper device.
> 
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index f0b394ed7452..215d7b0add8f 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
>  	case NETDEV_CHANGE:
>  	case NETDEV_UP:
>  	case NETDEV_DOWN: {
> -		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
>  		struct net_device *upper = NULL;
>  
> -		if (lag_ndev) {
> -			upper = netdev_master_upper_dev_get(lag_ndev);
> -			dev_put(lag_ndev);
> +		if (ibdev->lag_active) {

Needs locking to read lag_active

Jason


* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
  2023-08-18 16:33   ` Jason Gunthorpe
@ 2023-08-18 16:42     ` Jason Gunthorpe
  2023-08-20  9:59       ` Leon Romanovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 16:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed

On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > From: Mark Bloch <mbloch@nvidia.com>
> > 
> > If the RDMA device isn't in LAG mode there is no need
> > to try to get the upper device.
> > 
> > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> >  1 file changed, 15 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > index f0b394ed7452..215d7b0add8f 100644
> > --- a/drivers/infiniband/hw/mlx5/main.c
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> >  	case NETDEV_CHANGE:
> >  	case NETDEV_UP:
> >  	case NETDEV_DOWN: {
> > -		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> >  		struct net_device *upper = NULL;
> >  
> > -		if (lag_ndev) {
> > -			upper = netdev_master_upper_dev_get(lag_ndev);
> > -			dev_put(lag_ndev);
> > +		if (ibdev->lag_active) {
> 
> Needs locking to read lag_active

Specifically the use of the bitfield looks messed up.. If lag_active
and some others were set only during probe it could be OK.

But mixing other stuff that is being written concurrently is not OK to
do like this. (eg ib_active via a mlx5 notifier)

Jason


* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
  2023-08-18 16:42     ` Jason Gunthorpe
@ 2023-08-20  9:59       ` Leon Romanovsky
  2023-08-21 13:39         ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-20  9:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed

On Fri, Aug 18, 2023 at 01:42:30PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > > From: Mark Bloch <mbloch@nvidia.com>
> > > 
> > > If the RDMA device isn't in LAG mode there is no need
> > > to try to get the upper device.
> > > 
> > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> > >  1 file changed, 15 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > > index f0b394ed7452..215d7b0add8f 100644
> > > --- a/drivers/infiniband/hw/mlx5/main.c
> > > +++ b/drivers/infiniband/hw/mlx5/main.c
> > > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> > >  	case NETDEV_CHANGE:
> > >  	case NETDEV_UP:
> > >  	case NETDEV_DOWN: {
> > > -		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> > >  		struct net_device *upper = NULL;
> > >  
> > > -		if (lag_ndev) {
> > > -			upper = netdev_master_upper_dev_get(lag_ndev);
> > > -			dev_put(lag_ndev);
> > > +		if (ibdev->lag_active) {
> > 
> > Needs locking to read lag_active
> 
> Specifically the use of the bitfield looks messed up.. If lag_active
> and some others were set only during probe it could be OK.

All fields except ib_active are static and set during probe.

> 
> But mixing other stuff that is being written concurrently is not OK to
> do like this. (eg ib_active via a mlx5 notifier)

What you are looking for is the following change, did I get you right?

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 9d0c56b59ed2..ee73113717b2 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1094,7 +1094,7 @@ struct mlx5_ib_dev {
        /* serialize update of capability mask
         */
        struct mutex                    cap_mask_mutex;
-       u8                              ib_active:1;
+       bool                            ib_active;
        u8                              is_rep:1;
        u8                              lag_active:1;
        u8                              wc_support:1;

> 
> Jason


* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
  2023-08-20  9:59       ` Leon Romanovsky
@ 2023-08-21 13:39         ` Jason Gunthorpe
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-21 13:39 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed

On Sun, Aug 20, 2023 at 12:59:26PM +0300, Leon Romanovsky wrote:
> On Fri, Aug 18, 2023 at 01:42:30PM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > > > From: Mark Bloch <mbloch@nvidia.com>
> > > > 
> > > > If the RDMA device isn't in LAG mode there is no need
> > > > to try to get the upper device.
> > > > 
> > > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > ---
> > > >  drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> > > >  1 file changed, 15 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > > > index f0b394ed7452..215d7b0add8f 100644
> > > > --- a/drivers/infiniband/hw/mlx5/main.c
> > > > +++ b/drivers/infiniband/hw/mlx5/main.c
> > > > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> > > >  	case NETDEV_CHANGE:
> > > >  	case NETDEV_UP:
> > > >  	case NETDEV_DOWN: {
> > > > -		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> > > >  		struct net_device *upper = NULL;
> > > >  
> > > > -		if (lag_ndev) {
> > > > -			upper = netdev_master_upper_dev_get(lag_ndev);
> > > > -			dev_put(lag_ndev);
> > > > +		if (ibdev->lag_active) {
> > > 
> > > Needs locking to read lag_active
> > 
> > Specifically the use of the bitfield looks messed up.. If lag_active
> > and some others were set only during probe it could be OK.
> 
> All fields except ib_active are static and set during probe.
> 
> > 
> > But mixing other stuff that is being written concurrently is not OK to
> > do like this. (eg ib_active via a mlx5 notifier)
> 
> What you are looking for is the following change, did I get you right?
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 9d0c56b59ed2..ee73113717b2 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -1094,7 +1094,7 @@ struct mlx5_ib_dev {
>         /* serialize update of capability mask
>          */
>         struct mutex                    cap_mask_mutex;
> -       u8                              ib_active:1;
> +       bool                            ib_active;
>         u8                              is_rep:1;
>         u8                              lag_active:1;
>         u8                              wc_support:1;

That helps, but it still needs some kind of concurrency management for
ib_active

Jason

