From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Alexander Lobakin <alexandr.lobakin@intel.com>,
Maher Sanalla <msanalla@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Mark Bloch <mbloch@nvidia.com>
Subject: [net 7/7] net/mlx5: Fix mlx5_get_next_dev() peer device matching
Date: Tue, 31 May 2022 13:54:47 -0700 [thread overview]
Message-ID: <20220531205447.99236-8-saeed@kernel.org> (raw)
In-Reply-To: <20220531205447.99236-1-saeed@kernel.org>
From: Saeed Mahameed <saeedm@nvidia.com>
In some use-cases, mlx5 instances will need to search for their peer
device (the other port on the same HCA). For that, mlx5 device matching
mechanism relied on auxiliary_find_device() to search, and used a bad matching
callback function.
This approach has two issues:
1) next_phys_dev() the matching function, assumed all devices are
of the type mlx5_adev (mlx5 auxiliary device) which is wrong and
could lead to crashes, this worked for a while, since only lately
other drivers started registering auxiliary devices.
2) using the auxiliary class bus (auxiliary_find_device) to search for
mlx5_core_dev devices, who are actually PCIe device instances, is wrong.
This works since mlx5_core always has at least one mlx5_adev instance
hanging around in the aux bus.
As suggested by others we can fix 1. by comparing device names prefixes
if they have the string "mlx5_core" in them, which is not a best practice !
but even with that fixed, still 2. needs fixing, we are trying to
match pcie device peers so we should look in the right bus (pci bus),
hence this fix.
The fix:
1) search the pci bus for mlx5 peer devices, instead of the aux bus
2) to validated devices are the same type "mlx5_core_dev" compare if
they have the same driver, which is bulletproof.
This wouldn't have worked with the aux bus since the various mlx5 aux
device types don't share the same driver, even if they share the same device
wrapper struct (mlx5_adev) "which helped to find the parent device"
Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Reported-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reported-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 34 +++++++++++++------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index 11f7c03ae81b..0eb9d74547f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -571,18 +571,32 @@ static int _next_phys_dev(struct mlx5_core_dev *mdev,
return 1;
}
+static void *pci_get_other_drvdata(struct device *this, struct device *other)
+{
+ if (this->driver != other->driver)
+ return NULL;
+
+ return pci_get_drvdata(to_pci_dev(other));
+}
+
static int next_phys_dev(struct device *dev, const void *data)
{
- struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
- struct mlx5_core_dev *mdev = madev->mdev;
+ struct mlx5_core_dev *mdev, *this = (struct mlx5_core_dev *)data;
+
+ mdev = pci_get_other_drvdata(this->device, dev);
+ if (!mdev)
+ return 0;
return _next_phys_dev(mdev, data);
}
static int next_phys_dev_lag(struct device *dev, const void *data)
{
- struct mlx5_adev *madev = container_of(dev, struct mlx5_adev, adev.dev);
- struct mlx5_core_dev *mdev = madev->mdev;
+ struct mlx5_core_dev *mdev, *this = (struct mlx5_core_dev *)data;
+
+ mdev = pci_get_other_drvdata(this->device, dev);
+ if (!mdev)
+ return 0;
if (!MLX5_CAP_GEN(mdev, vport_group_manager) ||
!MLX5_CAP_GEN(mdev, lag_master) ||
@@ -596,19 +610,17 @@ static int next_phys_dev_lag(struct device *dev, const void *data)
static struct mlx5_core_dev *mlx5_get_next_dev(struct mlx5_core_dev *dev,
int (*match)(struct device *dev, const void *data))
{
- struct auxiliary_device *adev;
- struct mlx5_adev *madev;
+ struct device *next;
if (!mlx5_core_is_pf(dev))
return NULL;
- adev = auxiliary_find_device(NULL, dev, match);
- if (!adev)
+ next = bus_find_device(&pci_bus_type, NULL, dev, match);
+ if (!next)
return NULL;
- madev = container_of(adev, struct mlx5_adev, adev);
- put_device(&adev->dev);
- return madev->mdev;
+ put_device(next);
+ return pci_get_drvdata(to_pci_dev(next));
}
/* Must be called with intf_mutex held */
--
2.36.1
prev parent reply other threads:[~2022-05-31 20:55 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-31 20:54 [pull request][net 0/7] mlx5 fixes 2022-05-31 Saeed Mahameed
2022-05-31 20:54 ` [net 1/7] net/mlx5: Don't use already freed action pointer Saeed Mahameed
2022-06-02 1:20 ` patchwork-bot+netdevbpf
2022-05-31 20:54 ` [net 2/7] net/mlx5e: TC NIC mode, fix tc chains miss table Saeed Mahameed
2022-05-31 20:54 ` [net 3/7] net/mlx5: CT: Fix header-rewrite re-use for tupels Saeed Mahameed
2022-05-31 20:54 ` [net 4/7] net/mlx5e: Disable softirq in mlx5e_activate_rq to avoid race condition Saeed Mahameed
2022-05-31 20:54 ` [net 5/7] net/mlx5: correct ECE offset in query qp output Saeed Mahameed
2022-05-31 20:54 ` [net 6/7] net/mlx5e: Update netdev features after changing XDP state Saeed Mahameed
2022-05-31 20:54 ` Saeed Mahameed [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220531205447.99236-8-saeed@kernel.org \
--to=saeed@kernel.org \
--cc=alexandr.lobakin@intel.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=leonro@nvidia.com \
--cc=mbloch@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).