* [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes
@ 2023-08-16 6:52 Leon Romanovsky
2023-08-16 6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
2023-08-16 6:52 ` [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events Leon Romanovsky
0 siblings, 2 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16 6:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Eric Dumazet, Jakub Kicinski, linux-kernel,
linux-rdma, Mark Bloch, Mark Zhang, netdev, Paolo Abeni,
Saeed Mahameed
From: Leon Romanovsky <leonro@nvidia.com>
Hi,
These are two non-urgent fixes to the mlx5 RDMA LAG logic.
Thanks
Mark Bloch (2):
RDMA/mlx5: Get upper device only if device is lagged
RDMA/mlx5: Send correct port events
drivers/infiniband/hw/mlx5/main.c | 57 ++++++++++++++-----
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 29 ++++++++++
include/linux/mlx5/driver.h | 2 +
3 files changed, 75 insertions(+), 13 deletions(-)
--
2.41.0
* [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
2023-08-16 6:52 [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes Leon Romanovsky
@ 2023-08-16 6:52 ` Leon Romanovsky
2023-08-18 16:33 ` Jason Gunthorpe
2023-08-16 6:52 ` [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events Leon Romanovsky
1 sibling, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16 6:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed
From: Mark Bloch <mbloch@nvidia.com>
If the RDMA device isn't in LAG mode there is no need
to try to get the upper device.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f0b394ed7452..215d7b0add8f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
case NETDEV_CHANGE:
case NETDEV_UP:
case NETDEV_DOWN: {
- struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
struct net_device *upper = NULL;
- if (lag_ndev) {
- upper = netdev_master_upper_dev_get(lag_ndev);
- dev_put(lag_ndev);
+ if (ibdev->lag_active) {
+ struct net_device *lag_ndev;
+
+ lag_ndev = mlx5_lag_get_roce_netdev(mdev);
+ if (lag_ndev) {
+ upper = netdev_master_upper_dev_get(lag_ndev);
+ dev_put(lag_ndev);
+ } else {
+ goto done;
+ }
}
if (ibdev->is_rep)
@@ -254,9 +260,11 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
if (!mdev)
return NULL;
- ndev = mlx5_lag_get_roce_netdev(mdev);
- if (ndev)
- goto out;
+ if (ibdev->lag_active) {
+ ndev = mlx5_lag_get_roce_netdev(mdev);
+ if (ndev)
+ goto out;
+ }
/* Ensure ndev does not disappear before we invoke dev_hold()
*/
--
2.41.0
* [PATCH rdma-next 2/2] RDMA/mlx5: Send correct port events
2023-08-16 6:52 [PATCH rdma-next 0/2] mlx5 RDMA LAG fixes Leon Romanovsky
2023-08-16 6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
@ 2023-08-16 6:52 ` Leon Romanovsky
1 sibling, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-16 6:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Bloch, Eric Dumazet, Jakub Kicinski, linux-rdma, Mark Zhang,
netdev, Paolo Abeni, Saeed Mahameed
From: Mark Bloch <mbloch@nvidia.com>
When operating in switchdev mode and with an active LAG, the function
mlx5_lag_get_roce_netdev() fails to return a valid net device as this
function is designed specifically for RoCE LAGs.
Consequently, this issue resulted in the driver sending incorrect event
reports. To address this, a new API is introduced to properly obtain the
net device. Additionally, some code logic is cleaned up during this
modification.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 39 +++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 29 ++++++++++++++
include/linux/mlx5/driver.h | 2 +
3 files changed, 62 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 215d7b0add8f..8b98200bd94c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -159,6 +159,29 @@ static struct mlx5_roce *mlx5_get_rep_roce(struct mlx5_ib_dev *dev,
return NULL;
}
+static bool mlx5_netdev_send_event(struct mlx5_ib_dev *dev,
+				   struct net_device *ndev,
+				   struct net_device *upper,
+				   struct mlx5_roce *roce)
+{
+	if (!dev->ib_active)
+		return false;
+
+	/* Event is about our upper device */
+	if (upper == ndev)
+		return true;
+
+	/* RDMA device not in lag and not in switchdev */
+	if (!dev->is_rep && !upper && ndev == roce->netdev)
+		return true;
+
+	/* RDMA device in switchdev */
+	if (dev->is_rep && ndev == roce->netdev)
+		return true;
+
+	return false;
+}
+
+
static int mlx5_netdev_event(struct notifier_block *this,
unsigned long event, void *ptr)
{
@@ -200,7 +223,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
if (ibdev->lag_active) {
struct net_device *lag_ndev;
- lag_ndev = mlx5_lag_get_roce_netdev(mdev);
+ lag_ndev = mlx5_lag_get_netdev(mdev);
if (lag_ndev) {
upper = netdev_master_upper_dev_get(lag_ndev);
dev_put(lag_ndev);
@@ -209,13 +232,13 @@ static int mlx5_netdev_event(struct notifier_block *this,
}
}
- if (ibdev->is_rep)
+ if (ibdev->is_rep) {
roce = mlx5_get_rep_roce(ibdev, ndev, upper, &port_num);
- if (!roce)
- return NOTIFY_DONE;
- if ((upper == ndev ||
- ((!upper || ibdev->is_rep) && ndev == roce->netdev)) &&
- ibdev->ib_active) {
+ if (!roce)
+ return NOTIFY_DONE;
+ }
+
+ if (mlx5_netdev_send_event(ibdev, ndev, upper, roce)) {
struct ib_event ibev = { };
enum ib_port_state port_state;
@@ -260,7 +283,7 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
if (!mdev)
return NULL;
- if (ibdev->lag_active) {
+ if (!ibdev->is_rep && ibdev->lag_active) {
ndev = mlx5_lag_get_roce_netdev(mdev);
if (ndev)
goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index f0a074b2fcdf..83298e9addd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1498,6 +1498,35 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
}
EXPORT_SYMBOL(mlx5_lag_get_roce_netdev);
+struct net_device *mlx5_lag_get_netdev(struct mlx5_core_dev *dev)
+{
+	struct net_device *ndev = NULL;
+	struct mlx5_lag *ldev;
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&lag_lock, flags);
+	ldev = mlx5_lag_dev(dev);
+
+	if (!(ldev && __mlx5_lag_is_active(ldev)))
+		goto unlock;
+
+	for (i = 0; i < ldev->ports; i++) {
+		if (ldev->pf[i].dev == dev) {
+			ndev = ldev->pf[i].netdev;
+			break;
+		}
+	}
+
+	if (ndev)
+		dev_hold(ndev);
+
+unlock:
+	spin_unlock_irqrestore(&lag_lock, flags);
+	return ndev;
+}
+EXPORT_SYMBOL(mlx5_lag_get_netdev);
+
u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
struct net_device *slave)
{
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 25d0528f9219..bc7e3a974f62 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1160,6 +1160,8 @@ bool mlx5_lag_is_master(struct mlx5_core_dev *dev);
bool mlx5_lag_is_shared_fdb(struct mlx5_core_dev *dev);
bool mlx5_lag_is_mpesw(struct mlx5_core_dev *dev);
struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev);
+
+struct net_device *mlx5_lag_get_netdev(struct mlx5_core_dev *dev);
u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
struct net_device *slave);
int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
--
2.41.0
* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
2023-08-16 6:52 ` [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged Leon Romanovsky
@ 2023-08-18 16:33 ` Jason Gunthorpe
2023-08-18 16:42 ` Jason Gunthorpe
0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 16:33 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed
On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> From: Mark Bloch <mbloch@nvidia.com>
>
> If the RDMA device isn't in LAG mode there is no need
> to try to get the upper device.
>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index f0b394ed7452..215d7b0add8f 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> case NETDEV_CHANGE:
> case NETDEV_UP:
> case NETDEV_DOWN: {
> - struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> struct net_device *upper = NULL;
>
> - if (lag_ndev) {
> - upper = netdev_master_upper_dev_get(lag_ndev);
> - dev_put(lag_ndev);
> + if (ibdev->lag_active) {
Needs locking to read lag_active
Jason
* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
2023-08-18 16:33 ` Jason Gunthorpe
@ 2023-08-18 16:42 ` Jason Gunthorpe
2023-08-20 9:59 ` Leon Romanovsky
0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-18 16:42 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed
On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > From: Mark Bloch <mbloch@nvidia.com>
> >
> > If the RDMA device isn't in LAG mode there is no need
> > to try to get the upper device.
> >
> > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> > 1 file changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > index f0b394ed7452..215d7b0add8f 100644
> > --- a/drivers/infiniband/hw/mlx5/main.c
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> > case NETDEV_CHANGE:
> > case NETDEV_UP:
> > case NETDEV_DOWN: {
> > - struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> > struct net_device *upper = NULL;
> >
> > - if (lag_ndev) {
> > - upper = netdev_master_upper_dev_get(lag_ndev);
> > - dev_put(lag_ndev);
> > + if (ibdev->lag_active) {
>
> Needs locking to read lag_active
Specifically the use of the bitfield looks messed up.. If lag_active
and some others were set only during probe it could be OK.
But mixing other stuff that is being written concurrently is not OK to
do like this. (eg ib_active via a mlx5 notifier)
Jason
* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
2023-08-18 16:42 ` Jason Gunthorpe
@ 2023-08-20 9:59 ` Leon Romanovsky
2023-08-21 13:39 ` Jason Gunthorpe
0 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2023-08-20 9:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed
On Fri, Aug 18, 2023 at 01:42:30PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> > On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > > From: Mark Bloch <mbloch@nvidia.com>
> > >
> > > If the RDMA device isn't in LAG mode there is no need
> > > to try to get the upper device.
> > >
> > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > > drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> > > 1 file changed, 15 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > > index f0b394ed7452..215d7b0add8f 100644
> > > --- a/drivers/infiniband/hw/mlx5/main.c
> > > +++ b/drivers/infiniband/hw/mlx5/main.c
> > > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> > > case NETDEV_CHANGE:
> > > case NETDEV_UP:
> > > case NETDEV_DOWN: {
> > > - struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> > > struct net_device *upper = NULL;
> > >
> > > - if (lag_ndev) {
> > > - upper = netdev_master_upper_dev_get(lag_ndev);
> > > - dev_put(lag_ndev);
> > > + if (ibdev->lag_active) {
> >
> > Needs locking to read lag_active
>
> Specifically the use of the bitfield looks messed up.. If lag_active
> and some others were set only during probe it could be OK.
All fields except ib_active are static and set during probe.
>
> But mixing other stuff that is being written concurrently is not OK to
> do like this. (eg ib_active via a mlx5 notifier)
What you are looking for is the following change, did I get you right?
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 9d0c56b59ed2..ee73113717b2 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1094,7 +1094,7 @@ struct mlx5_ib_dev {
/* serialize update of capability mask
*/
struct mutex cap_mask_mutex;
- u8 ib_active:1;
+ bool ib_active;
u8 is_rep:1;
u8 lag_active:1;
u8 wc_support:1;
>
> Jason
* Re: [PATCH rdma-next 1/2] RDMA/mlx5: Get upper device only if device is lagged
2023-08-20 9:59 ` Leon Romanovsky
@ 2023-08-21 13:39 ` Jason Gunthorpe
0 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2023-08-21 13:39 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Mark Bloch, David S. Miller, Eric Dumazet, Jakub Kicinski,
linux-rdma, Mark Zhang, netdev, Paolo Abeni, Saeed Mahameed
On Sun, Aug 20, 2023 at 12:59:26PM +0300, Leon Romanovsky wrote:
> On Fri, Aug 18, 2023 at 01:42:30PM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 18, 2023 at 01:33:35PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 16, 2023 at 09:52:23AM +0300, Leon Romanovsky wrote:
> > > > From: Mark Bloch <mbloch@nvidia.com>
> > > >
> > > > If the RDMA device isn't in LAG mode there is no need
> > > > to try to get the upper device.
> > > >
> > > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > ---
> > > > drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++++-------
> > > > 1 file changed, 15 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > > > index f0b394ed7452..215d7b0add8f 100644
> > > > --- a/drivers/infiniband/hw/mlx5/main.c
> > > > +++ b/drivers/infiniband/hw/mlx5/main.c
> > > > @@ -195,12 +195,18 @@ static int mlx5_netdev_event(struct notifier_block *this,
> > > > case NETDEV_CHANGE:
> > > > case NETDEV_UP:
> > > > case NETDEV_DOWN: {
> > > > - struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
> > > > struct net_device *upper = NULL;
> > > >
> > > > - if (lag_ndev) {
> > > > - upper = netdev_master_upper_dev_get(lag_ndev);
> > > > - dev_put(lag_ndev);
> > > > + if (ibdev->lag_active) {
> > >
> > > Needs locking to read lag_active
> >
> > Specifically the use of the bitfield looks messed up.. If lag_active
> > and some others were set only during probe it could be OK.
>
> All fields except ib_active are static and set during probe.
>
> >
> > But mixing other stuff that is being written concurrently is not OK to
> > do like this. (eg ib_active via a mlx5 notifier)
>
> What you are looking for is the following change, did I get you right?
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index 9d0c56b59ed2..ee73113717b2 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -1094,7 +1094,7 @@ struct mlx5_ib_dev {
> /* serialize update of capability mask
> */
> struct mutex cap_mask_mutex;
> - u8 ib_active:1;
> + bool ib_active;
> u8 is_rep:1;
> u8 lag_active:1;
> u8 wc_support:1;
That helps, but it still needs some kind of concurrency management for
ib_active
Jason