* [PATCH 1/5] common/mlx5: fix bond check
2026-03-02 11:34 [PATCH 0/5] net/mlx5: add BlueField socket direct support Dariusz Sosnowski
From: Dariusz Sosnowski @ 2026-03-02 11:34 UTC
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Rongwei Liu
Cc: dev, Raslan Darawsheh, stable
The mlx5 PMD supports probing a device whose PF kernel netdevs
are part of a netdev bonding device in the Linux kernel.
In such a scenario, only a single IB device is exposed,
which the mlx5 PMD later uses to configure the device.
This IB device is created on only one of the PFs,
but the PMD allows probing it through any of the PFs.
As part of the logic allowing this, the mlx5 common driver
checked whether the IB device name contained "bond". This is not
always the case, as the name depends on specific udev rules being present.
This patch fixes that by checking, through sysfs, whether
any of the netdevs related to the probed PCI device
is part of a bonding netdev, instead of relying on the device name.
Fixes: f956d3d4c33c ("net/mlx5: fix probing with secondary bonding member")
Cc: rongweil@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
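For illustration, a minimal user-space sketch of the sysfs probe described above, assuming the standard kernel layout in which a netdev enslaved to a bonding master exposes /sys/class/net/<ifname>/master/bonding. The helpers bond_sysfs_path() and netdev_is_bond_member() are hypothetical; they only mirror the path building, the snprintf() truncation check and the access() call inside mlx5_os_is_device_bond(), and leave out the walk over the IB device's device/net directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "/sys/class/net/<ifname>/master/bonding".
 * Returns 0 on success, -1 if ifname is invalid or the path would be
 * truncated (the same snprintf() check the patch uses).
 */
static int
bond_sysfs_path(char *buf, size_t size, const char *ifname)
{
	int ret;

	assert(buf != NULL && ifname != NULL);
	/* Netdev names never contain '/', so reject them outright. */
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;
	ret = snprintf(buf, size, "/sys/class/net/%s/master/bonding", ifname);
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: true if the netdev is enslaved to a bond.
 * "master" is a symlink to the upper device, and only a bonding
 * master has a "bonding" directory (a bridge has "bridge" instead).
 */
static bool
netdev_is_bond_member(const char *ifname)
{
	char path[256];

	if (bond_sysfs_path(path, sizeof(path), ifname) != 0)
		return false;
	return access(path, F_OK) == 0;
}
```

Since access() only tests for existence, a missing path (no master, or a master that is not a bond) simply yields false, which matches how the patch treats any failure as "not a bond".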
drivers/common/mlx5/linux/mlx5_common_os.c | 86 ++++++++++++++++++++--
drivers/common/mlx5/linux/mlx5_common_os.h | 9 +++
2 files changed, 90 insertions(+), 5 deletions(-)
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 926b56e419..fc7e9ecddc 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -560,6 +560,14 @@ mlx5_os_pd_prepare(struct mlx5_common_device *cdev)
#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
}
+static bool
+pci_addr_partial_match(const struct rte_pci_addr *addr1, const struct rte_pci_addr *addr2)
+{
+ return addr1->domain == addr2->domain &&
+ addr1->bus == addr2->bus &&
+ addr1->devid == addr2->devid;
+}
+
static struct ibv_device *
mlx5_os_get_ibv_device(const struct rte_pci_device *pci_dev)
{
@@ -581,17 +589,23 @@ mlx5_os_get_ibv_device(const struct rte_pci_device *pci_dev)
}
ret1 = mlx5_get_device_guid(addr, guid1, sizeof(guid1));
while (n-- > 0) {
+ bool pci_partial_match;
+ bool guid_match;
+ bool bond_match;
+
DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
if (mlx5_get_pci_addr(ibv_list[n]->ibdev_path, &paddr) != 0)
continue;
if (ret1 > 0)
ret2 = mlx5_get_device_guid(&paddr, guid2, sizeof(guid2));
+ guid_match = ret1 > 0 && ret2 > 0 && memcmp(guid1, guid2, sizeof(guid1)) == 0;
+ pci_partial_match = pci_addr_partial_match(addr, &paddr);
/* Bond device can bond secondary PCIe */
- if ((strstr(ibv_list[n]->name, "bond") && !is_vf_dev &&
- ((ret1 > 0 && ret2 > 0 && !memcmp(guid1, guid2, sizeof(guid1))) ||
- (addr->domain == paddr.domain && addr->bus == paddr.bus &&
- addr->devid == paddr.devid))) ||
- !rte_pci_addr_cmp(addr, &paddr)) {
+ bond_match = !is_vf_dev &&
+ mlx5_os_is_device_bond(ibv_list[n]) &&
+ (guid_match || pci_partial_match);
+ /* IB device matches either through bond or directly. */
+ if (bond_match || !rte_pci_addr_cmp(addr, &paddr)) {
ibv_match = ibv_list[n];
break;
}
@@ -1160,3 +1174,65 @@ mlx5_os_interrupt_handler_destroy(struct rte_intr_handle *intr_handle,
mlx5_intr_callback_unregister(intr_handle, cb, cb_arg);
rte_intr_instance_free(intr_handle);
}
+
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_is_device_bond)
+bool
+mlx5_os_is_device_bond(const void *dev)
+{
+ const struct ibv_device *ibdev;
+ char path[PATH_MAX];
+ struct dirent *e;
+ DIR *net_dir;
+ bool result;
+ int ret;
+
+ if (dev == NULL)
+ return false;
+ ibdev = dev;
+
+ DRV_LOG(DEBUG, "Checking if %s ibdev belongs to bond", ibdev->name);
+
+ ret = snprintf(path, sizeof(path), "%s/device/net", ibdev->ibdev_path);
+ if (ret < 0 || ret >= (int)sizeof(path)) {
+ DRV_LOG(DEBUG, "Unable to get netdevs path for IB device %s", ibdev->name);
+ return false;
+ }
+
+ net_dir = opendir(path);
+ if (net_dir == NULL) {
+ DRV_LOG(DEBUG, "Unable to open directory %s (%s)", path, rte_strerror(errno));
+ return false;
+ }
+
+ result = false;
+ while ((e = readdir(net_dir)) != NULL) {
+ if (e->d_name[0] == '.')
+ continue;
+
+ DRV_LOG(DEBUG, "Checking if %s netdev related to %s ibdev belongs to bond",
+ e->d_name, ibdev->name);
+
+ ret = snprintf(path, sizeof(path), "/sys/class/net/%s/master/bonding", e->d_name);
+ if (ret < 0 || ret >= (int)sizeof(path)) {
+ DRV_LOG(DEBUG, "Unable to get bond path for %s netdev", e->d_name);
+ continue;
+ }
+
+ if (access(path, F_OK) == 0) {
+ /* At least one associated netdev is part of a bond. */
+ DRV_LOG(DEBUG, "Bonding path exists for %s netdev", e->d_name);
+ result = true;
+ goto end;
+ }
+
+ DRV_LOG(DEBUG, "Unable to access bond path for %s netdev (%s)",
+ e->d_name, rte_strerror(errno));
+ }
+
+ DRV_LOG(DEBUG, "No bonded netdev related to %s ibdev found",
+ ibdev->name);
+
+end:
+ closedir(net_dir);
+ return result;
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index 2e2c54f1fa..7d4e3c5fe8 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -317,4 +317,13 @@ void
mlx5_os_interrupt_handler_destroy(struct rte_intr_handle *intr_handle,
rte_intr_callback_fn cb, void *cb_arg);
+/**
+ * Return true if given IB device is associated with a networking bond.
+ *
+ * @param[in] dev
+ * Pointer to IB device.
+ */
+__rte_internal
+bool mlx5_os_is_device_bond(const void *dev);
+
#endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
--
2.47.3
* [PATCH 2/5] net/mlx5: fix bond check
From: Dariusz Sosnowski @ 2026-03-02 11:34 UTC
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh, stable
The mlx5 networking PMD supports probing ethdev ports based
on a LAG configured at the Linux kernel level.
In such cases, a single IB device is created in the kernel
and the mlx5 PMD configures the device through this IB device.
To recognize whether the PMD runs over a LAG device,
the mlx5 PMD relied on the IB device name.
This patch fixes the mlx5 networking PMD logic to rely on
mlx5_os_is_device_bond(), introduced in the previous commit,
instead of the IB device name.
Fixes: 2e569a370395 ("net/mlx5: add VF LAG mode bonding device recognition")
Cc: viacheslavo@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 76edd19c70..405aa9799c 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1928,8 +1928,8 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
/**
* Match PCI information for possible slaves of bonding device.
*
- * @param[in] ibdev_name
- * Name of Infiniband device.
+ * @param[in] ibdev
+ * Pointer to IB device.
* @param[in] pci_dev
* Pointer to primary PCI address structure to match.
* @param[in] nl_rdma
@@ -1946,7 +1946,7 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
* positive index of slave PF in bonding.
*/
static int
-mlx5_device_bond_pci_match(const char *ibdev_name,
+mlx5_device_bond_pci_match(const struct ibv_device *ibdev,
const struct rte_pci_addr *pci_dev,
int nl_rdma, uint16_t owner,
struct mlx5_dev_info *dev_info,
@@ -1968,9 +1968,9 @@ mlx5_device_bond_pci_match(const char *ibdev_name,
memset(bond_info, 0, sizeof(*bond_info));
if (nl_rdma < 0)
return -1;
- if (!strstr(ibdev_name, "bond"))
+ if (!mlx5_os_is_device_bond(ibdev))
return -1;
- np = mlx5_nl_portnum(nl_rdma, ibdev_name, dev_info);
+ np = mlx5_nl_portnum(nl_rdma, ibdev->name, dev_info);
if (!np)
return -1;
if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
@@ -1982,7 +1982,7 @@ mlx5_device_bond_pci_match(const char *ibdev_name,
*/
for (i = 1; i <= np; ++i) {
/* Check whether Infiniband port is populated. */
- ifindex = mlx5_nl_ifindex(nl_rdma, ibdev_name, i, dev_info);
+ ifindex = mlx5_nl_ifindex(nl_rdma, ibdev->name, i, dev_info);
if (!ifindex)
continue;
if (!if_indextoname(ifindex, ifname))
@@ -2396,7 +2396,7 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
info = &tmp_info[ret];
}
DRV_LOG(DEBUG, "Checking device \"%s\"", ibv_list[ret]->name);
- bd = mlx5_device_bond_pci_match(ibv_list[ret]->name, &owner_pci,
+ bd = mlx5_device_bond_pci_match(ibv_list[ret], &owner_pci,
nl_rdma, owner_id,
info,
&bond_info);
--
2.47.3
* [PATCH 3/5] net/mlx5: calculate number of uplinks and host PFs
From: Dariusz Sosnowski @ 2026-03-02 11:34 UTC
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh
Count the number of uplinks (IB ports for physical ports)
and host PFs (IB ports for host PF representors visible
on the BlueField DPU) for all probed IB devices.
In follow-up patches, this information will be used to generate
a proper DPDK port name, instead of relying on the specific setup type.
To allow correct counting, the following changes are also made:
- The RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF flag check is moved to
  mlx5_representor_match(), so all uplink IB ports are kept in the list.
- IB ports for VF/SF/host PF representors related to all PFs are kept
  in the list, not only those related to the probed PF.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
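For illustration, a self-contained sketch of the two-pass counting done by calc_nb_uplinks_hpfs(), using hypothetical, trimmed-down types (enum port_type, struct spawn) in place of struct mlx5_dev_spawn_data and the MLX5_PHYS_PORT_NAME_TYPE_* values. Each IB device's totals are copied into every port entry belonging to that device, so later code (such as port naming) can read them from any single entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the MLX5_PHYS_PORT_NAME_TYPE_* values. */
enum port_type { PORT_UPLINK, PORT_HPF, PORT_VF, PORT_SF };

/* Hypothetical, trimmed-down version of struct mlx5_dev_spawn_data. */
struct spawn {
	const char *ibdev;    /* IB device name (phys_dev_name) */
	enum port_type type;
	uint32_t nb_uplinks;
	uint32_t nb_hpfs;
};

static void
count_uplinks_hpfs(const char *const *ibdevs, unsigned int nd,
		   struct spawn *list, unsigned int ns)
{
	for (unsigned int i = 0; i != nd; i++) {
		uint32_t nb_uplinks = 0;
		uint32_t nb_hpfs = 0;
		unsigned int j;

		assert(ibdevs[i] != NULL);
		/* First pass: count uplinks and host PFs of this IB device. */
		for (j = 0; j != ns; j++) {
			if (strcmp(ibdevs[i], list[j].ibdev) != 0)
				continue;
			if (list[j].type == PORT_UPLINK)
				nb_uplinks++;
			else if (list[j].type == PORT_HPF)
				nb_hpfs++;
		}
		/* Like the patch, leave entries untouched if nothing was found. */
		if (nb_uplinks == 0 && nb_hpfs == 0)
			continue;
		/* Second pass: store the totals in every port of this IB device. */
		for (j = 0; j != ns; j++) {
			if (strcmp(ibdevs[i], list[j].ibdev) != 0)
				continue;
			list[j].nb_uplinks = nb_uplinks;
			list[j].nb_hpfs = nb_hpfs;
		}
	}
}
```

With two uplinks and two host PFs on one IB device, even a VF representor entry of that device ends up with nb_uplinks == 2 and nb_hpfs == 2.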
drivers/net/mlx5/linux/mlx5_os.c | 74 +++++++++++++++++++++++++-------
drivers/net/mlx5/mlx5.h | 2 +
2 files changed, 60 insertions(+), 16 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 405aa9799c..f9f3b2c38b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1047,6 +1047,12 @@ mlx5_queue_counter_id_prepare(struct rte_eth_dev *dev)
"available.", dev->data->port_id);
}
+static inline bool
+mlx5_ignore_pf_representor(const struct rte_eth_devargs *eth_da)
+{
+ return (eth_da->flags & RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF) != 0;
+}
+
/**
* Check if representor spawn info match devargs.
*
@@ -1075,6 +1081,10 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
*/
if (mlx5_is_probed_port_on_mpesw_device(spawn) &&
switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
+ if (!switch_info->master && mlx5_ignore_pf_representor(eth_da)) {
+ rte_errno = EBUSY;
+ return false;
+ }
for (p = 0; p < eth_da->nb_ports; ++p)
if (switch_info->port_name == eth_da->ports[p])
return true;
@@ -2297,10 +2307,45 @@ mlx5_device_mpesw_pci_match(struct ibv_device *ibv,
return -1;
}
-static inline bool
-mlx5_ignore_pf_representor(const struct rte_eth_devargs *eth_da)
+static void
+calc_nb_uplinks_hpfs(struct ibv_device **ibv_match,
+ unsigned int nd,
+ struct mlx5_dev_spawn_data *list,
+ unsigned int ns)
{
- return (eth_da->flags & RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF) != 0;
+ for (unsigned int i = 0; i != nd; i++) {
+ uint32_t nb_uplinks = 0;
+ uint32_t nb_hpfs = 0;
+ uint32_t j;
+
+ for (j = 0; j != ns; j++) {
+ if (strcmp(ibv_match[i]->name, list[j].phys_dev_name) != 0)
+ continue;
+
+ if (list[j].info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+ nb_uplinks++;
+ else if (list[j].info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFHPF)
+ nb_hpfs++;
+ }
+
+ if (nb_uplinks > 0 || nb_hpfs > 0) {
+ for (j = 0; j != ns; j++) {
+ if (strcmp(ibv_match[i]->name, list[j].phys_dev_name) != 0)
+ continue;
+
+ list[j].nb_uplinks = nb_uplinks;
+ list[j].nb_hpfs = nb_hpfs;
+ }
+
+ DRV_LOG(DEBUG, "IB device %s has %u uplinks, %u host PFs",
+ ibv_match[i]->name,
+ nb_uplinks,
+ nb_hpfs);
+ } else {
+ DRV_LOG(DEBUG, "IB device %s unable to recognize uplinks/host PFs",
+ ibv_match[i]->name);
+ }
+ }
}
/**
@@ -2611,8 +2656,6 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
if (list[ns].info.port_name == mpesw) {
list[ns].info.master = 1;
list[ns].info.representor = 0;
- } else if (mlx5_ignore_pf_representor(&eth_da)) {
- continue;
} else {
list[ns].info.master = 0;
list[ns].info.representor = 1;
@@ -2629,17 +2672,14 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
case MLX5_PHYS_PORT_NAME_TYPE_PFHPF:
case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
case MLX5_PHYS_PORT_NAME_TYPE_PFSF:
- /* Only spawn representors related to the probed PF. */
- if (list[ns].info.pf_num == owner_id) {
- /*
- * Ports of this type have PF index encoded in name,
- * which translate to the related uplink port index.
- */
- list[ns].mpesw_port = list[ns].info.pf_num;
- /* MPESW owner is also saved but not used now. */
- list[ns].info.mpesw_owner = mpesw;
- ns++;
- }
+ /*
+ * Ports of this type have PF index encoded in name,
+ * which translate to the related uplink port index.
+ */
+ list[ns].mpesw_port = list[ns].info.pf_num;
+ /* MPESW owner is also saved but not used now. */
+ list[ns].info.mpesw_owner = mpesw;
+ ns++;
break;
default:
break;
@@ -2773,6 +2813,8 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
}
}
MLX5_ASSERT(ns);
+ /* Calculate number of uplinks and host PFs for each matched IB device. */
+ calc_nb_uplinks_hpfs(ibv_match, nd, list, ns);
/*
* Sort list to probe devices in natural order for users convenience
* (i.e. master first, then representors from lowest to highest ID).
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b83dda5652..bef0088164 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -215,6 +215,8 @@ struct mlx5_dev_cap {
struct mlx5_dev_spawn_data {
uint32_t ifindex; /**< Network interface index. */
uint32_t max_port; /**< Device maximal port index. */
+ uint32_t nb_uplinks; /**< Number of uplinks associated with IB device. */
+ uint32_t nb_hpfs; /**< Number of host PFs associated with IB device. */
uint32_t phys_port; /**< Device physical port index. */
int pf_bond; /**< bonding device PF index. < 0 - no bonding */
int mpesw_port; /**< MPESW uplink index. Valid if mpesw_owner_port >= 0. */
--
2.47.3
* [PATCH 4/5] net/mlx5: compare representors explicitly
From: Dariusz Sosnowski @ 2026-03-02 11:34 UTC
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh
When probing ethdev ports, the mlx5 PMD attempts to
match IB ports to the values passed in the representor devarg.
Specifically, for each IB port found on a matching IB device
(matched through the PCI/auxiliary bus), the mlx5 PMD:
- finds the corresponding Linux netdev,
- reads and parses "phys_port_name" through netlink/sysfs,
- compares the parsed info to the representor devarg.
Previously, the comparison was done by calculating a representor ID.
This representor ID, however, ignored the controller index,
whether requested through devargs or parsed from phys_port_name.
On setups with multiple PCI controllers
(such as BlueField with Socket Direct),
this could lead to collisions between PF and VF/SF indexes.
This patch reworks the comparison to rely on explicit
values parsed from the representor devarg and "phys_port_name".
The controller index, port index and representor index
from the representor devarg are compared against
the controller index, PF index and VF/SF index found in the IB port data.
If the controller index is omitted from the representor devarg,
the controller index comparison is skipped as well
and the first matching IB port is selected.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 154 ++++++++++++++++++++-----------
1 file changed, 101 insertions(+), 53 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index f9f3b2c38b..d30106c4c5 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1053,6 +1053,87 @@ mlx5_ignore_pf_representor(const struct rte_eth_devargs *eth_da)
return (eth_da->flags & RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF) != 0;
}
+static bool
+is_standard_eswitch(const struct mlx5_dev_spawn_data *spawn)
+{
+ bool is_bond = spawn->pf_bond >= 0;
+
+ return !is_bond && spawn->nb_uplinks <= 1 && spawn->nb_hpfs <= 1;
+}
+
+static bool
+is_hpf(const struct mlx5_dev_spawn_data *spawn)
+{
+ return spawn->info.port_name == -1 &&
+ spawn->info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFHPF;
+}
+
+static bool
+representor_match_uplink(const struct mlx5_dev_spawn_data *spawn,
+ uint16_t port_name,
+ const struct rte_eth_devargs *eth_da,
+ uint16_t eth_da_pf_num)
+{
+ if (spawn->info.name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+ return false;
+ /* One of the uplinks will be a transfer proxy. It must always be probed. */
+ if (spawn->info.master)
+ return true;
+ if (mlx5_ignore_pf_representor(eth_da))
+ return false;
+
+ return port_name == eth_da_pf_num;
+}
+
+static bool
+representor_match_port(const struct mlx5_dev_spawn_data *spawn,
+ const struct rte_eth_devargs *eth_da)
+{
+ for (uint16_t p = 0; p < eth_da->nb_ports; ++p) {
+ uint16_t pf_num = eth_da->ports[p];
+
+ /* PF representor in devargs is interpreted as probing uplink port. */
+ if (eth_da->type == RTE_ETH_REPRESENTOR_PF) {
+ if (representor_match_uplink(spawn, spawn->info.port_name, eth_da, pf_num))
+ return true;
+
+ continue;
+ }
+
+ /* Allow probing related uplink when VF/SF representor is requested. */
+ if ((eth_da->type == RTE_ETH_REPRESENTOR_VF ||
+ eth_da->type == RTE_ETH_REPRESENTOR_SF) &&
+ representor_match_uplink(spawn, spawn->info.pf_num, eth_da, pf_num))
+ return true;
+
+ for (uint16_t f = 0; f < eth_da->nb_representor_ports; ++f) {
+ uint16_t port_num = eth_da->representor_ports[f];
+ bool pf_num_match;
+ bool rep_num_match;
+
+ /*
+ * In standard E-Switch case, allow probing VFs even if wrong PF index
+ * was provided.
+ */
+ if (is_standard_eswitch(spawn))
+ pf_num_match = true;
+ else
+ pf_num_match = spawn->info.pf_num == pf_num;
+
+ /* Host PF is indicated through VF/SF representor index == -1. */
+ if (is_hpf(spawn))
+ rep_num_match = port_num == UINT16_MAX;
+ else
+ rep_num_match = port_num == spawn->info.port_name;
+
+ if (pf_num_match && rep_num_match)
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* Check if representor spawn info match devargs.
*
@@ -1069,54 +1150,29 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
struct rte_eth_devargs *eth_da)
{
struct mlx5_switch_info *switch_info = &spawn->info;
- unsigned int p, f;
- uint16_t id;
- uint16_t repr_id = mlx5_representor_id_encode(switch_info,
- eth_da->type);
+ unsigned int c;
+ bool ignore_ctrl_num = eth_da->nb_mh_controllers == 0 ||
+ switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
- /*
- * Assuming Multiport E-Switch device was detected,
- * if spawned port is an uplink, check if the port
- * was requested through representor devarg.
- */
- if (mlx5_is_probed_port_on_mpesw_device(spawn) &&
- switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
- if (!switch_info->master && mlx5_ignore_pf_representor(eth_da)) {
- rte_errno = EBUSY;
- return false;
- }
- for (p = 0; p < eth_da->nb_ports; ++p)
- if (switch_info->port_name == eth_da->ports[p])
- return true;
- rte_errno = EBUSY;
- return false;
- }
switch (eth_da->type) {
case RTE_ETH_REPRESENTOR_PF:
- /*
- * PF representors provided in devargs translate to uplink ports, but
- * if and only if the device is a part of MPESW device.
- */
- if (!mlx5_is_probed_port_on_mpesw_device(spawn)) {
+ if (switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
break;
case RTE_ETH_REPRESENTOR_SF:
- if (!(spawn->info.port_name == -1 &&
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFHPF) &&
- switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFSF) {
+ if (!is_hpf(spawn) &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFSF &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
break;
case RTE_ETH_REPRESENTOR_VF:
- /* Allows HPF representor index -1 as exception. */
- if (!(spawn->info.port_name == -1 &&
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFHPF) &&
- switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFVF) {
+ if (!is_hpf(spawn) &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFVF &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
@@ -1129,21 +1185,17 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
DRV_LOG(ERR, "unsupported representor type");
return false;
}
- /* Check representor ID: */
- for (p = 0; p < eth_da->nb_ports; ++p) {
- if (!mlx5_is_probed_port_on_mpesw_device(spawn) && spawn->pf_bond < 0) {
- /* For non-LAG mode, allow and ignore pf. */
- switch_info->pf_num = eth_da->ports[p];
- repr_id = mlx5_representor_id_encode(switch_info,
- eth_da->type);
- }
- for (f = 0; f < eth_da->nb_representor_ports; ++f) {
- id = MLX5_REPRESENTOR_ID
- (eth_da->ports[p], eth_da->type,
- eth_da->representor_ports[f]);
- if (repr_id == id)
+ if (!ignore_ctrl_num) {
+ for (c = 0; c < eth_da->nb_mh_controllers; ++c) {
+ uint16_t ctrl_num = eth_da->mh_controllers[c];
+
+ if (spawn->info.ctrl_num == ctrl_num &&
+ representor_match_port(spawn, eth_da))
return true;
}
+ } else {
+ if (representor_match_port(spawn, eth_da))
+ return true;
}
rte_errno = EBUSY;
return false;
@@ -2822,16 +2874,12 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
if (eth_da.type != RTE_ETH_REPRESENTOR_NONE) {
/* Set devargs default values. */
- if (eth_da.nb_mh_controllers == 0) {
- eth_da.nb_mh_controllers = 1;
- eth_da.mh_controllers[0] = 0;
- }
if (eth_da.nb_ports == 0 && ns > 0) {
if (list[0].pf_bond >= 0 && list[0].info.representor)
DRV_LOG(WARNING, "Representor on Bonding device should use pf#vf# syntax: %s",
pci_dev->device.devargs->args);
eth_da.nb_ports = 1;
- eth_da.ports[0] = list[0].info.pf_num;
+ eth_da.ports[0] = list[0].info.port_name;
}
if (eth_da.nb_representor_ports == 0) {
eth_da.nb_representor_ports = 1;
--
2.47.3
* [PATCH 5/5] net/mlx5: build port name dynamically
From: Dariusz Sosnowski @ 2026-03-02 11:34 UTC
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh
When a new mlx5 ethdev port was spawned, its name was generated
based on the specific type of setup the port was probed on.
However, the name formats were built from the same components in each case.
To avoid having to add specific cases for every new setup
in the future, this patch reworks the port name building logic
to be dynamic.
The new logic is based on information independent of the setup type,
such as:
- the number of uplinks (physical ports),
- the number of host PFs (host PF representors on BlueField ARM),
- whether the port is an uplink,
- whether the port is a representor.
For name compatibility, this patch keeps the IB device name
in the ethdev port name when the underlying device is a LAG.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
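For illustration, a standalone sketch of the naming rules listed above, using a hypothetical struct port_desc and hypothetical helpers append() and port_name() instead of build_port_name() and struct mlx5_dev_spawn_data. Suffixes follow the formats in the diff: _<ibdev> for a bond, _p<N> for an uplink when there is more than one uplink, and _representor_c<C>pf<P>{vf,sf}<N> or _representor_{vf,sf}<N> depending on whether the E-Switch is "standard". One design difference: this sketch returns -1 on truncation, whereas the patch returns the would-be length and lets the caller log a warning.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down description of a port being spawned. */
struct port_desc {
	const char *bond_ibdev;  /* IB device name if bonded, NULL otherwise */
	bool uplink;
	bool representor;
	bool sf;                 /* SF (not VF) representor */
	int ctrl;
	int pf;
	unsigned int index;      /* uplink port index or VF/SF index */
	unsigned int nb_uplinks;
	unsigned int nb_hpfs;
};

/* Append formatted text at *pos; false on error or truncation. */
static bool
append(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(*pos < size);
	va_start(ap, fmt);
	ret = vsnprintf(buf + *pos, size - *pos, fmt, ap);
	va_end(ap);
	if (ret < 0 || (size_t)ret >= size - *pos)
		return false;
	*pos += (size_t)ret;
	return true;
}

/* Returns the name length, or -1 on error or truncation. */
static int
port_name(char *buf, size_t size, const char *dev, const struct port_desc *p)
{
	bool standard = p->bond_ibdev == NULL &&
			p->nb_uplinks <= 1 && p->nb_hpfs <= 1;
	const char *kind = p->sf ? "sf" : "vf";
	size_t pos = 0;
	bool ok;

	if (size == 0 || dev == NULL || strlen(dev) == 0)
		return -1;
	if (!append(buf, size, &pos, "%s", dev))
		return -1;
	/* Bond: keep the IB device name for backward compatibility. */
	if (p->bond_ibdev != NULL &&
	    !append(buf, size, &pos, "_%s", p->bond_ibdev))
		return -1;
	if (p->uplink) {
		/* The port index only disambiguates when there are several uplinks. */
		if (p->nb_uplinks > 1 &&
		    !append(buf, size, &pos, "_p%u", p->index))
			return -1;
	} else if (p->representor) {
		ok = standard ?
		     append(buf, size, &pos, "_representor_%s%u", kind, p->index) :
		     append(buf, size, &pos, "_representor_c%dpf%d%s%u",
			    p->ctrl, p->pf, kind, p->index);
		if (!ok)
			return -1;
	}
	return (int)pos;
}
```

With a single uplink and a single host PF, a VF representor keeps the short _representor_vf<N> form, so port names on standard setups stay the same as with the old single-device branch.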
drivers/net/mlx5/linux/mlx5_os.c | 122 +++++++++++++++++++++----------
1 file changed, 84 insertions(+), 38 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index d30106c4c5..324d65cf32 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1068,6 +1068,84 @@ is_hpf(const struct mlx5_dev_spawn_data *spawn)
spawn->info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFHPF;
}
+static int
+build_port_name(struct rte_device *dpdk_dev,
+ struct mlx5_dev_spawn_data *spawn,
+ char *name,
+ size_t name_sz)
+{
+ bool is_bond = spawn->pf_bond >= 0;
+ int written = 0;
+ int ret;
+
+ ret = snprintf(name, name_sz, "%s", dpdk_dev->name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+
+ /*
+ * Whenever a bond device is detected, include the IB device name.
+ * This keeps port naming backward compatible.
+ */
+ if (is_bond) {
+ ret = snprintf(name + written, name_sz - written, "_%s", spawn->phys_dev_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ }
+
+ if (spawn->info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
+ /* Add port to name if and only if there is more than one uplink. */
+ if (spawn->nb_uplinks <= 1)
+ goto end;
+
+ ret = snprintf(name + written, name_sz - written, "_p%u", spawn->info.port_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ } else if (spawn->info.representor) {
+ /*
+ * If port is a representor, then switchdev has been enabled.
+ * In that case add controller, PF and VF/SF indexes to port name
+ * if at least one of these conditions is met:
+ * 1. Device is a bond (VF-LAG).
+ * 2. There are multiple uplinks (MPESW).
+ * 3. There are multiple host PFs (BlueField socket direct).
+ *
+ * If none of these conditions apply, then it is assumed that
+ * this device manages a single non-shared E-Switch with single controller,
+ * where there is only one uplink/PF and one host PF (on BlueField).
+ */
+ if (!is_standard_eswitch(spawn))
+ ret = snprintf(name + written, name_sz - written,
+ "_representor_c%dpf%d%s%u",
+ spawn->info.ctrl_num,
+ spawn->info.pf_num,
+ spawn->info.name_type ==
+ MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
+ spawn->info.port_name);
+ else
+ ret = snprintf(name + written, name_sz - written, "_representor_%s%u",
+ spawn->info.name_type ==
+ MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
+ spawn->info.port_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ }
+
+end:
+ return written;
+}
+
static bool
representor_match_uplink(const struct mlx5_dev_spawn_data *spawn,
uint16_t port_name,
@@ -1247,44 +1325,12 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
!mlx5_representor_match(spawn, eth_da))
return NULL;
/* Build device name. */
- if (spawn->pf_bond >= 0) {
- /* Bonding device. */
- if (!switch_info->representor) {
- err = snprintf(name, sizeof(name), "%s_%s",
- dpdk_dev->name, spawn->phys_dev_name);
- } else {
- err = snprintf(name, sizeof(name), "%s_%s_representor_c%dpf%d%s%u",
- dpdk_dev->name, spawn->phys_dev_name,
- switch_info->ctrl_num,
- switch_info->pf_num,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
- }
- } else if (mlx5_is_probed_port_on_mpesw_device(spawn)) {
- /* MPESW device. */
- if (switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
- err = snprintf(name, sizeof(name), "%s_p%d",
- dpdk_dev->name, spawn->mpesw_port);
- } else {
- err = snprintf(name, sizeof(name), "%s_representor_c%dpf%d%s%u",
- dpdk_dev->name,
- switch_info->ctrl_num,
- switch_info->pf_num,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
- }
- } else {
- /* Single device. */
- if (!switch_info->representor)
- strlcpy(name, dpdk_dev->name, sizeof(name));
- else
- err = snprintf(name, sizeof(name), "%s_representor_%s%u",
- dpdk_dev->name,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
+ err = build_port_name(dpdk_dev, spawn, name, sizeof(name));
+ if (err < 0) {
+ DRV_LOG(ERR, "Failed to build port name for IB device %s/%u",
+ spawn->phys_dev_name, spawn->phys_port);
+ rte_errno = EINVAL;
+ return NULL;
}
if (err >= (int)sizeof(name))
DRV_LOG(WARNING, "device name overflow %s", name);
--
2.47.3
* RE: [PATCH 0/5] net/mlx5: add BlueField socket direct support
From: Bing Zhao @ 2026-03-04 7:26 UTC
To: Dariusz Sosnowski, Slava Ovsiienko, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev@dpdk.org, Raslan Darawsheh
Hi,
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, March 2, 2026 7:35 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao
> <bingz@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH 0/5] net/mlx5: add BlueField socket direct support
>
> The goal of this patchset is to prepare the probing logic in the mlx5
> networking PMD to support BlueField DPUs with Socket Direct.
> In such a use case, the BlueField DPU is connected through PCI to 2
> different CPUs on the host.
> Each host CPU sees 2 PFs.
> Each PF is connected to one of the physical ports.
>
> +--------+               +--------+
> |CPU 0   |               |CPU 1   |
> |        |               |        |
> |  pf0   |               |  pf0   |
> |        |               |        |
> |  pf1   |               |  pf1   |
> |        |               |        |
> +---+----+               +-+------+
>     |                      |
>     |                      |
>     |                      |
>     +----+           +-----+
>          |           |
>          |           |
>          |           |
>      +---+-----------+----+
>      |BF3 DPU             |
>      |                    |
>      |  pf0hpf    pf1hpf  |
>      |                    |
>      |  pf2hpf    pf3hpf  |
>      |                    |
>      |  p0           p1   |
>      +------+------+------+
>      | phy0 |      | phy1 |
>      +------+      +------+
>
>
> On the BlueField DPU ARM, Linux netdevs map to PFs/ports as follows:
>
> - p0 and p1 to physical ports 0 and 1 respectively,
> - pf0hpf and pf2hpf to CPU0 pf0 and CPU1 pf0 respectively,
> - pf1hpf and pf3hpf to CPU0 pf1 and CPU1 pf1 respectively.
>
> There are several possible ways to use such a setup:
>
> - Single E-Switch (embedded switch) per each CPU PF to
> physical port connection.
> - Shared E-Switch for related CPU PFs:
> - For example, both pf0hpf and pf2hpf are in the same E-Switch domain.
> - Multiport E-Switch.
> - All host PFs and physical ports are in the same E-Switch domain.
>
> When a DPDK application runs on BlueField ARM, it should be
> possible for the application to probe all the relevant representors
> (corresponding to the available netdevs).
> Using testpmd syntax, users will be able to do the following:
>
> # Probe both physical ports
> port attach 03:00.0,dv_flow_en=2,representor=pf0-1
>
> # Probe host PF 0 of CPU 0
> # (VF representor index 65535, i.e. -1 as uint16_t, is a special encoding for host PF)
> port attach 03:00.0,dv_flow_en=2,representor=pf0vf65535
> # or with explicit controller index
> port attach 03:00.0,dv_flow_en=2,representor=c1pf0vf65535
>
> # Probe host PF 0 of CPU 1
> port attach 03:00.0,dv_flow_en=2,representor=pf2vf65535
> # or with explicit controller index
> port attach 03:00.0,dv_flow_en=2,representor=c2pf2vf65535
>
> Patches overview:
>
> - Patches 1 and 2 - Fix bond detection logic.
> Previously, mlx5 PMD relied on "bond" appearing in the IB device name,
> which is not always the case. Moved to sysfs checks for bonding devices.
> - Patch 3 - Add calculation of number of physical ports and host PFs.
> This information will be used to determine
> how DPDK port name is generated, instead of relying on
> specific setup type.
> - Patch 4 - Change "representor to IB port" matching logic to directly
> compare ethdev devargs values to IB port info.
> Also add optional matching on controller index.
> - Patch 5 - Make DPDK port name generation dynamic and dependent on
> types/number of ports, instead of specific setup type.
> This allows more generic probing, independent of setup topology.
>
> Dariusz Sosnowski (5):
> common/mlx5: fix bond check
> net/mlx5: fix bond check
> net/mlx5: calculate number of uplinks and host PFs
> net/mlx5: compare representors explicitly
> net/mlx5: build port name dynamically
>
> drivers/common/mlx5/linux/mlx5_common_os.c | 86 ++++-
> drivers/common/mlx5/linux/mlx5_common_os.h | 9 +
> drivers/net/mlx5/linux/mlx5_os.c | 356 ++++++++++++++-------
> drivers/net/mlx5/mlx5.h | 2 +
> 4 files changed, 338 insertions(+), 115 deletions(-)
>
> --
> 2.47.3
Series Acked-by: Bing Zhao <bingz@nvidia.com>
* [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct
2026-03-02 11:34 [PATCH 0/5] net/mlx5: add BlueField socket direct support Dariusz Sosnowski
` (5 preceding siblings ...)
2026-03-04 7:26 ` [PATCH 0/5] net/mlx5: add BlueField socket direct support Bing Zhao
@ 2026-03-04 10:57 ` Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 1/3] common/mlx5: fix bond check Dariusz Sosnowski
` (3 more replies)
6 siblings, 4 replies; 12+ messages in thread
From: Dariusz Sosnowski @ 2026-03-04 10:57 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh
The goal of this patchset is to fix the probing logic in the mlx5 networking PMD
to allow support of BlueField DPUs with Socket Direct.
In such a use case, the BlueField DPU is connected through PCI
to 2 different CPUs on the host.
Each host CPU sees 2 PFs.
Each PF is connected to one of the physical ports.
+--------+               +--------+
|CPU 0   |               |CPU 1   |
|        |               |        |
|    pf0 |               |    pf0 |
|        |               |        |
|    pf1 |               |    pf1 |
|        |               |        |
+---+----+               +-+------+
    |                      |
    |                      |
    |                      |
    +----+           +-----+
         |           |
         |           |
         |           |
     +---+-----------+----+
     |BF3 DPU             |
     |                    |
     |  pf0hpf    pf1hpf  |
     |                    |
     |  pf2hpf    pf3hpf  |
     |                    |
     |   p0          p1   |
     +------+------+------+
     | phy0 |      | phy1 |
     +------+      +------+
On the BlueField DPU ARM, Linux netdevs map to PFs/ports as follows:
- p0 and p1 to physical ports 0 and 1 respectively,
- pf0hpf and pf2hpf to CPU0 pf0 and CPU1 pf0 respectively,
- pf1hpf and pf3hpf to CPU0 pf1 and CPU1 pf1 respectively.
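The netdev mapping above can be captured by a small decoder. The sketch below is hypothetical (`bf_sd_decode_netdev()` and `struct bf_netdev_role` are illustrative names, not part of mlx5 PMD) and assumes the netdevs keep exactly the default `pN` and `pfNhpf` names listed above; setups that rename netdevs would need a different lookup. It derives the host CPU as N / 2 and the host PF as N % 2, which matches all four pfNhpf entries.

```c
#include <stdio.h>

/* Hypothetical description of what a BlueField ARM netdev represents. */
struct bf_netdev_role {
	int is_uplink; /* 1 for pN (physical port), 0 for pfNhpf (host PF) */
	int phys_port; /* physical port index, valid if is_uplink == 1 */
	int host_cpu;  /* host CPU index, valid if is_uplink == 0 */
	int host_pf;   /* PF index as seen by that host CPU, valid if is_uplink == 0 */
};

/*
 * Decode a default Socket Direct netdev name:
 *   pN     -> physical port N (N in 0..1)
 *   pfNhpf -> host CPU N / 2, host PF N % 2 (N in 0..3)
 * Returns 0 on success, -1 if the name matches neither pattern.
 */
static int
bf_sd_decode_netdev(const char *name, struct bf_netdev_role *role)
{
	int n = -1;
	int end = -1;

	/* %n is stored only if the literal "hpf" matched after the number. */
	if (sscanf(name, "pf%dhpf%n", &n, &end) == 1 && end > 0 &&
	    name[end] == '\0' && n >= 0 && n <= 3) {
		role->is_uplink = 0;
		role->host_cpu = n / 2;
		role->host_pf = n % 2;
		return 0;
	}
	end = -1;
	/* Otherwise try the uplink pattern, rejecting trailing characters. */
	if (sscanf(name, "p%d%n", &n, &end) == 1 && end > 0 &&
	    name[end] == '\0' && n >= 0 && n <= 1) {
		role->is_uplink = 1;
		role->phys_port = n;
		return 0;
	}
	return -1;
}
```

For example, `pf2hpf` decodes to host CPU 1, PF 0, and `p1` to physical port 1, as in the list above.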
There are several possible ways to use such a setup:
1. Single E-Switch (embedded switch) per CPU PF to
physical port connection.
2. Shared E-Switch for related CPU PFs:
- For example, both pf0hpf and pf2hpf are in the same E-Switch.
3. Multiport E-Switch.
Existing probing logic in mlx5 PMD did not support case (2).
In this case there is one physical port (uplink in mlx5 naming)
and 2 host PFs.
On such a setup, mlx5 generated port names with the following syntax:
03:00.0_representor_vfX
Port name syntax was selected based on the specific setup type.
Since the setup was recognized as neither bond nor MPESW,
mlx5 selected the default name without a PF index.
Because BlueField with Socket Direct has 2 host PFs,
this probing logic caused DPDK port name collisions
when attempting to probe both host PFs at the same time.
Moreover, probing produced false positives on some systems,
depending on whether specific udev rules, which rename the mlx5
IB device to include "bond" in its name, were present.
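The sysfs-based check that replaces the name heuristic does not depend on udev naming at all. A minimal standalone sketch of the per-netdev test, assuming the usual Linux sysfs layout; `netdev_is_bond_member()` is a hypothetical name, while patch 1's `mlx5_os_is_device_bond()` applies the same `/sys/class/net/<netdev>/master/bonding` check to every netdev listed under the IB device's `device/net` directory.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if the Linux netdev "ifname" is enslaved to a bonding master.
 * A netdev with a master has a "master" symlink in sysfs, and a bonding
 * master exposes a "bonding" directory, so the path below exists only for
 * bond members (a bridge port, for example, has "master" but no "bonding").
 */
static bool
netdev_is_bond_member(const char *ifname)
{
	/* Netdev names are short (IFNAMSIZ is 16), so 256 bytes is ample. */
	char path[256];
	int ret;

	if (ifname == NULL || ifname[0] == '\0')
		return false;
	ret = snprintf(path, sizeof(path), "/sys/class/net/%s/master/bonding", ifname);
	if (ret < 0 || ret >= (int)sizeof(path))
		return false;
	return access(path, F_OK) == 0;
}
```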
This patchset addresses the above:
- Patches 1 and 2 - Fix bond detection logic.
Previously, mlx5 PMD relied on "bond" appearing in the IB device name,
which is not always the case. Moved to sysfs checks.
- Patch 3 - Fix uplink and host PF probing logic.
Previously, mlx5 PMD relied on the specific setup type.
With this patch, probing is more generic and based on
the types and number of available ports on the E-Switch.
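The count-based naming rule from patch 3 can be approximated as follows. This is a simplified sketch, not the PMD's `build_port_name()`: `struct port_desc`, `enum port_kind` and `sketch_port_name()` are illustrative names, and the `_<IB device name>` infix that the PMD keeps for bond devices is omitted. It shows why two host PFs no longer collide: once there is more than one host PF, the controller and PF indexes become part of the representor name.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the mlx5 spawn data used for naming. */
enum port_kind { PORT_UPLINK, PORT_VF_REP, PORT_SF_REP };

struct port_desc {
	enum port_kind kind;
	bool is_bond;            /* device is a bond (VF-LAG) */
	unsigned int nb_uplinks; /* uplinks on the IB device */
	unsigned int nb_hpfs;    /* host PFs on the IB device */
	int ctrl_num;            /* controller index */
	int pf_num;              /* PF index */
	unsigned int port_name;  /* uplink index or VF/SF index */
};

/*
 * Build a DPDK port name from port counts instead of a fixed setup type.
 * Returns the snprintf() result: negative on error, >= len if truncated.
 */
static int
sketch_port_name(const char *pci, const struct port_desc *d, char *buf, size_t len)
{
	/* Single uplink, at most one host PF and no bond: short names suffice. */
	bool standard = !d->is_bond && d->nb_uplinks <= 1 && d->nb_hpfs <= 1;
	const char *rep = d->kind == PORT_SF_REP ? "sf" : "vf";

	if (d->kind == PORT_UPLINK) {
		/* The uplink index is added only if there is more than one uplink. */
		if (d->nb_uplinks <= 1)
			return snprintf(buf, len, "%s", pci);
		return snprintf(buf, len, "%s_p%u", pci, d->port_name);
	}
	if (standard)
		return snprintf(buf, len, "%s_representor_%s%u", pci, rep, d->port_name);
	/* Otherwise controller and PF indexes keep representor names unique. */
	return snprintf(buf, len, "%s_representor_c%dpf%d%s%u", pci,
			d->ctrl_num, d->pf_num, rep, d->port_name);
}
```

For instance, VF 3 behind host PF 0 on controller 1 and behind host PF 2 on controller 2 yields `03:00.0_representor_c1pf0vf3` and `03:00.0_representor_c2pf2vf3`, while a standard E-Switch keeps the short `03:00.0_representor_vf3` form.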
With this patchset, a DPDK application running on BlueField ARM is able to
probe all the relevant representors (corresponding to available netdevs).
Using testpmd syntax, users will be able to do the following:
# Probe both physical ports
port attach 03:00.0,dv_flow_en=2,representor=pf0-1
# Probe host PF 0 of CPU 0
# (VF representor index 65535, i.e. -1 as uint16_t, is a special encoding for host PF)
port attach 03:00.0,dv_flow_en=2,representor=pf0vf65535
# or with explicit controller index
port attach 03:00.0,dv_flow_en=2,representor=c1pf0vf65535
# Probe host PF 0 of CPU 1
port attach 03:00.0,dv_flow_en=2,representor=pf2vf65535
# or with explicit controller index
port attach 03:00.0,dv_flow_en=2,representor=c2pf2vf65535
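The 65535 in these commands is simply -1 stored in a 16-bit unsigned field. A minimal sketch of the comparison, mirroring the `port_num == UINT16_MAX` check for host PFs in patch 3's `representor_match_port()`; `rep_index_matches()` is a hypothetical helper, and the PF and controller index checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of matching a representor index taken from devargs
 * (e.g. 65535 from "pf0vf65535") against a spawned port. Devargs hold these
 * indexes as uint16_t, so the "-1" host PF marker arrives as UINT16_MAX.
 */
static bool
rep_index_matches(uint16_t devargs_idx, bool port_is_hpf, int32_t port_name)
{
	/* Host PF ports are requested only through the special -1 index. */
	if (port_is_hpf)
		return devargs_idx == UINT16_MAX;
	/* VF/SF ports match on their own index. */
	return (int32_t)devargs_idx == port_name;
}
```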
v1: https://patches.dpdk.org/project/dpdk/cover/20260302113443.16648-1-dsosnowski@nvidia.com/
v1->v2:
- Squash patches 3-5 and add Fixes tag,
since these patches fix existing probing logic.
Dariusz Sosnowski (3):
common/mlx5: fix bond check
net/mlx5: fix bond check
net/mlx5: fix probing to allow BlueField Socket Direct
drivers/common/mlx5/linux/mlx5_common_os.c | 86 ++++-
drivers/common/mlx5/linux/mlx5_common_os.h | 9 +
drivers/net/mlx5/linux/mlx5_os.c | 356 ++++++++++++++-------
drivers/net/mlx5/mlx5.h | 2 +
4 files changed, 338 insertions(+), 115 deletions(-)
--
2.47.3
* [PATCH v2 1/3] common/mlx5: fix bond check
2026-03-04 10:57 ` [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
@ 2026-03-04 10:57 ` Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 2/3] net/mlx5: " Dariusz Sosnowski
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Dariusz Sosnowski @ 2026-03-04 10:57 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Rongwei Liu
Cc: dev, Raslan Darawsheh, stable
mlx5 PMD supports probing a device where PF kernel netdevs
are part of a netdev bonding device in the Linux kernel.
In such a scenario, only one IB device is exposed,
which mlx5 PMD later uses to configure the device.
This IB device is created on only one of the PFs.
The PMD allowed probing this device through any of the PFs.
As part of the logic allowing this, the mlx5 common driver
checked whether the IB device name contained "bond", but this is not
always the case and depends on the existence of specific udev rules.
This patch fixes that by resolving, through sysfs,
whether any of the netdevs related to the probed PCI device
is part of a bonding netdev, instead of relying on the device name.
Fixes: f956d3d4c33c ("net/mlx5: fix probing with secondary bonding member")
Cc: rongweil@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
---
drivers/common/mlx5/linux/mlx5_common_os.c | 86 ++++++++++++++++++++--
drivers/common/mlx5/linux/mlx5_common_os.h | 9 +++
2 files changed, 90 insertions(+), 5 deletions(-)
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 926b56e419..fc7e9ecddc 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -560,6 +560,14 @@ mlx5_os_pd_prepare(struct mlx5_common_device *cdev)
#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
}
+static bool
+pci_addr_partial_match(const struct rte_pci_addr *addr1, const struct rte_pci_addr *addr2)
+{
+ return addr1->domain == addr2->domain &&
+ addr1->bus == addr2->bus &&
+ addr1->devid == addr2->devid;
+}
+
static struct ibv_device *
mlx5_os_get_ibv_device(const struct rte_pci_device *pci_dev)
{
@@ -581,17 +589,23 @@ mlx5_os_get_ibv_device(const struct rte_pci_device *pci_dev)
}
ret1 = mlx5_get_device_guid(addr, guid1, sizeof(guid1));
while (n-- > 0) {
+ bool pci_partial_match;
+ bool guid_match;
+ bool bond_match;
+
DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
if (mlx5_get_pci_addr(ibv_list[n]->ibdev_path, &paddr) != 0)
continue;
if (ret1 > 0)
ret2 = mlx5_get_device_guid(&paddr, guid2, sizeof(guid2));
+ guid_match = ret1 > 0 && ret2 > 0 && memcmp(guid1, guid2, sizeof(guid1)) == 0;
+ pci_partial_match = pci_addr_partial_match(addr, &paddr);
/* Bond device can bond secondary PCIe */
- if ((strstr(ibv_list[n]->name, "bond") && !is_vf_dev &&
- ((ret1 > 0 && ret2 > 0 && !memcmp(guid1, guid2, sizeof(guid1))) ||
- (addr->domain == paddr.domain && addr->bus == paddr.bus &&
- addr->devid == paddr.devid))) ||
- !rte_pci_addr_cmp(addr, &paddr)) {
+ bond_match = !is_vf_dev &&
+ mlx5_os_is_device_bond(ibv_list[n]) &&
+ (guid_match || pci_partial_match);
+ /* IB device matches either through bond or directly. */
+ if (bond_match || !rte_pci_addr_cmp(addr, &paddr)) {
ibv_match = ibv_list[n];
break;
}
@@ -1160,3 +1174,65 @@ mlx5_os_interrupt_handler_destroy(struct rte_intr_handle *intr_handle,
mlx5_intr_callback_unregister(intr_handle, cb, cb_arg);
rte_intr_instance_free(intr_handle);
}
+
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_is_device_bond)
+bool
+mlx5_os_is_device_bond(const void *dev)
+{
+ const struct ibv_device *ibdev;
+ char path[PATH_MAX];
+ struct dirent *e;
+ DIR *net_dir;
+ bool result;
+ int ret;
+
+ if (dev == NULL)
+ return false;
+ ibdev = dev;
+
+ DRV_LOG(DEBUG, "Checking if %s ibdev belongs to bond", ibdev->name);
+
+ ret = snprintf(path, sizeof(path), "%s/device/net", ibdev->ibdev_path);
+ if (ret < 0 || ret >= (int)sizeof(path)) {
+ DRV_LOG(DEBUG, "Unable to get netdevs path for IB device %s", ibdev->name);
+ return false;
+ }
+
+ net_dir = opendir(path);
+ if (net_dir == NULL) {
+ DRV_LOG(DEBUG, "Unable to open directory %s (%s)", path, rte_strerror(errno));
+ return false;
+ }
+
+ result = false;
+ while ((e = readdir(net_dir)) != NULL) {
+ if (e->d_name[0] == '.')
+ continue;
+
+ DRV_LOG(DEBUG, "Checking if %s netdev related to %s ibdev belongs to bond",
+ e->d_name, ibdev->name);
+
+ ret = snprintf(path, sizeof(path), "/sys/class/net/%s/master/bonding", e->d_name);
+ if (ret < 0 || ret >= (int)sizeof(path)) {
+ DRV_LOG(DEBUG, "Unable to get bond path for %s netdev", e->d_name);
+ continue;
+ }
+
+ if (access(path, F_OK) == 0) {
+ /* At least one associated netdev is part of a bond. */
+ DRV_LOG(DEBUG, "Bonding path exists for %s netdev", e->d_name);
+ result = true;
+ goto end;
+ }
+
+ DRV_LOG(DEBUG, "Unable to access bond path for %s netdev (%s)",
+ e->d_name, rte_strerror(errno));
+ }
+
+ DRV_LOG(DEBUG, "No bonded netdev related to %s ibdev found",
+ ibdev->name);
+
+end:
+ closedir(net_dir);
+ return result;
+}
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index 2e2c54f1fa..7d4e3c5fe8 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -317,4 +317,13 @@ void
mlx5_os_interrupt_handler_destroy(struct rte_intr_handle *intr_handle,
rte_intr_callback_fn cb, void *cb_arg);
+/**
+ * Return true if given IB device is associated with a networking bond.
+ *
+ * @param[in] dev
+ * Pointer to IB device.
+ */
+__rte_internal
+bool mlx5_os_is_device_bond(const void *dev);
+
#endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
--
2.47.3
* [PATCH v2 2/3] net/mlx5: fix bond check
2026-03-04 10:57 ` [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 1/3] common/mlx5: fix bond check Dariusz Sosnowski
@ 2026-03-04 10:57 ` Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 3/3] net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
2026-03-10 8:16 ` [PATCH v2 0/3] net/mlx5: " Raslan Darawsheh
3 siblings, 0 replies; 12+ messages in thread
From: Dariusz Sosnowski @ 2026-03-04 10:57 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh, stable
mlx5 networking PMD supports probing ethdev ports based
on LAG configured at the Linux kernel level.
In such cases, a single IB device is created in the kernel
and mlx5 PMD configures the device through this IB device.
To recognize whether the PMD runs over a LAG device,
mlx5 PMD relied on the IB device name.
This patch fixes mlx5 networking PMD logic to rely on
mlx5_os_is_device_bond(), introduced in the previous commit,
instead of relying solely on the IB device name.
Fixes: 2e569a370395 ("net/mlx5: add VF LAG mode bonding device recognition")
Cc: viacheslavo@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 76edd19c70..405aa9799c 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1928,8 +1928,8 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
/**
* Match PCI information for possible slaves of bonding device.
*
- * @param[in] ibdev_name
- * Name of Infiniband device.
+ * @param[in] ibdev
+ * Pointer to IB device.
* @param[in] pci_dev
* Pointer to primary PCI address structure to match.
* @param[in] nl_rdma
@@ -1946,7 +1946,7 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b)
* positive index of slave PF in bonding.
*/
static int
-mlx5_device_bond_pci_match(const char *ibdev_name,
+mlx5_device_bond_pci_match(const struct ibv_device *ibdev,
const struct rte_pci_addr *pci_dev,
int nl_rdma, uint16_t owner,
struct mlx5_dev_info *dev_info,
@@ -1968,9 +1968,9 @@ mlx5_device_bond_pci_match(const char *ibdev_name,
memset(bond_info, 0, sizeof(*bond_info));
if (nl_rdma < 0)
return -1;
- if (!strstr(ibdev_name, "bond"))
+ if (!mlx5_os_is_device_bond(ibdev))
return -1;
- np = mlx5_nl_portnum(nl_rdma, ibdev_name, dev_info);
+ np = mlx5_nl_portnum(nl_rdma, ibdev->name, dev_info);
if (!np)
return -1;
if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0)
@@ -1982,7 +1982,7 @@ mlx5_device_bond_pci_match(const char *ibdev_name,
*/
for (i = 1; i <= np; ++i) {
/* Check whether Infiniband port is populated. */
- ifindex = mlx5_nl_ifindex(nl_rdma, ibdev_name, i, dev_info);
+ ifindex = mlx5_nl_ifindex(nl_rdma, ibdev->name, i, dev_info);
if (!ifindex)
continue;
if (!if_indextoname(ifindex, ifname))
@@ -2396,7 +2396,7 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
info = &tmp_info[ret];
}
DRV_LOG(DEBUG, "Checking device \"%s\"", ibv_list[ret]->name);
- bd = mlx5_device_bond_pci_match(ibv_list[ret]->name, &owner_pci,
+ bd = mlx5_device_bond_pci_match(ibv_list[ret], &owner_pci,
nl_rdma, owner_id,
info,
&bond_info);
--
2.47.3
* [PATCH v2 3/3] net/mlx5: fix probing to allow BlueField Socket Direct
2026-03-04 10:57 ` [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 1/3] common/mlx5: fix bond check Dariusz Sosnowski
2026-03-04 10:57 ` [PATCH v2 2/3] net/mlx5: " Dariusz Sosnowski
@ 2026-03-04 10:57 ` Dariusz Sosnowski
2026-03-10 8:16 ` [PATCH v2 0/3] net/mlx5: " Raslan Darawsheh
3 siblings, 0 replies; 12+ messages in thread
From: Dariusz Sosnowski @ 2026-03-04 10:57 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
Cc: dev, Raslan Darawsheh, stable
BlueField DPUs with Socket Direct (SD) can be connected to 2 different
CPUs on the host system.
Each host CPU sees 2 PFs.
Each PF is connected to one of the physical ports.
On the BlueField DPU ARM, Linux netdevs map to PFs/ports as follows:
- p0 and p1 to physical ports 0 and 1 respectively,
- pf0hpf and pf2hpf to CPU0 pf0 and CPU1 pf0 respectively,
- pf1hpf and pf3hpf to CPU0 pf1 and CPU1 pf1 respectively.
There are several possible ways to use such a setup:
1. Single E-Switch (embedded switch) per CPU PF to
physical port connection.
2. Shared E-Switch for related CPU PFs:
- For example, both pf0hpf and pf2hpf are in the same E-Switch.
3. Multiport E-Switch (MPESW).
Existing probing logic in mlx5 PMD did not support case (2).
In this case there is one physical port (uplink in mlx5 naming)
and 2 host PFs.
On such a setup, mlx5 generated port names with the following syntax:
03:00.0_representor_vfX
This happened because the setup was recognized as neither bond nor MPESW.
Since BlueField with Socket Direct has 2 host PFs,
this probing logic caused DPDK port name collisions
when attempting to probe both host PFs at the same time.
This patch addresses that by making the probing and naming logic
more generic. This is achieved by:
- Adding logic to calculate the number of uplinks and
the number of host PFs available on the system.
- Changing port name generation logic to be based on these numbers
instead of the specific setup type.
- Changing representor matching logic during probing
to respect all parameters passed in devargs.
Specifically, controller index, PF index and VF/SF indexes are used.
Fixes: 11c73de9ef63 ("net/mlx5: probe multi-port E-Switch device")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 342 +++++++++++++++++++++----------
drivers/net/mlx5/mlx5.h | 2 +
2 files changed, 241 insertions(+), 103 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 405aa9799c..324d65cf32 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1047,6 +1047,171 @@ mlx5_queue_counter_id_prepare(struct rte_eth_dev *dev)
"available.", dev->data->port_id);
}
+static inline bool
+mlx5_ignore_pf_representor(const struct rte_eth_devargs *eth_da)
+{
+ return (eth_da->flags & RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF) != 0;
+}
+
+static bool
+is_standard_eswitch(const struct mlx5_dev_spawn_data *spawn)
+{
+ bool is_bond = spawn->pf_bond >= 0;
+
+ return !is_bond && spawn->nb_uplinks <= 1 && spawn->nb_hpfs <= 1;
+}
+
+static bool
+is_hpf(const struct mlx5_dev_spawn_data *spawn)
+{
+ return spawn->info.port_name == -1 &&
+ spawn->info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFHPF;
+}
+
+static int
+build_port_name(struct rte_device *dpdk_dev,
+ struct mlx5_dev_spawn_data *spawn,
+ char *name,
+ size_t name_sz)
+{
+ bool is_bond = spawn->pf_bond >= 0;
+ int written = 0;
+ int ret;
+
+ ret = snprintf(name, name_sz, "%s", dpdk_dev->name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+
+ /*
+ * Whenever bond device is detected, include IB device name.
+ * This keeps port naming backward compatible.
+ */
+ if (is_bond) {
+ ret = snprintf(name + written, name_sz - written, "_%s", spawn->phys_dev_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ }
+
+ if (spawn->info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
+ /* Add port to name if and only if there is more than one uplink. */
+ if (spawn->nb_uplinks <= 1)
+ goto end;
+
+ ret = snprintf(name + written, name_sz - written, "_p%u", spawn->info.port_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ } else if (spawn->info.representor) {
+ /*
+ * If port is a representor, then switchdev has been enabled.
+ * In that case add controller, PF and VF/SF indexes to port name
+ * if at least one of these conditions is met:
+ * 1. Device is a bond (VF-LAG).
+ * 2. There are multiple uplinks (MPESW).
+ * 3. There are multiple host PFs (BlueField socket direct).
+ *
+ * If none of these conditions apply, then it is assumed that
+ * this device manages a single non-shared E-Switch with single controller,
+ * where there is only one uplink/PF and one host PF (on BlueField).
+ */
+ if (!is_standard_eswitch(spawn))
+ ret = snprintf(name + written, name_sz - written,
+ "_representor_c%dpf%d%s%u",
+ spawn->info.ctrl_num,
+ spawn->info.pf_num,
+ spawn->info.name_type ==
+ MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
+ spawn->info.port_name);
+ else
+ ret = snprintf(name + written, name_sz - written, "_representor_%s%u",
+ spawn->info.name_type ==
+ MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
+ spawn->info.port_name);
+ if (ret < 0)
+ return ret;
+ written += ret;
+ if (written >= (int)name_sz)
+ return written;
+ }
+
+end:
+ return written;
+}
+
+static bool
+representor_match_uplink(const struct mlx5_dev_spawn_data *spawn,
+ uint16_t port_name,
+ const struct rte_eth_devargs *eth_da,
+ uint16_t eth_da_pf_num)
+{
+ if (spawn->info.name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+ return false;
+ /* One of the uplinks will be a transfer proxy, so it must always be probed. */
+ if (spawn->info.master)
+ return true;
+ if (mlx5_ignore_pf_representor(eth_da))
+ return false;
+
+ return port_name == eth_da_pf_num;
+}
+
+static bool
+representor_match_port(const struct mlx5_dev_spawn_data *spawn,
+ const struct rte_eth_devargs *eth_da)
+{
+ for (uint16_t p = 0; p < eth_da->nb_ports; ++p) {
+ uint16_t pf_num = eth_da->ports[p];
+
+ /* PF representor in devargs is interpreted as probing uplink port. */
+ if (eth_da->type == RTE_ETH_REPRESENTOR_PF) {
+ if (representor_match_uplink(spawn, spawn->info.port_name, eth_da, pf_num))
+ return true;
+
+ continue;
+ }
+
+ /* Allow probing related uplink when VF/SF representor is requested. */
+ if ((eth_da->type == RTE_ETH_REPRESENTOR_VF ||
+ eth_da->type == RTE_ETH_REPRESENTOR_SF) &&
+ representor_match_uplink(spawn, spawn->info.pf_num, eth_da, pf_num))
+ return true;
+
+ for (uint16_t f = 0; f < eth_da->nb_representor_ports; ++f) {
+ uint16_t port_num = eth_da->representor_ports[f];
+ bool pf_num_match;
+ bool rep_num_match;
+
+ /*
+ * In standard E-Switch case, allow probing VFs even if wrong PF index
+ * was provided.
+ */
+ if (is_standard_eswitch(spawn))
+ pf_num_match = true;
+ else
+ pf_num_match = spawn->info.pf_num == pf_num;
+
+ /* Host PF is indicated through VF/SF representor index == -1. */
+ if (is_hpf(spawn))
+ rep_num_match = port_num == UINT16_MAX;
+ else
+ rep_num_match = port_num == spawn->info.port_name;
+
+ if (pf_num_match && rep_num_match)
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* Check if representor spawn info match devargs.
*
@@ -1063,50 +1228,29 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
struct rte_eth_devargs *eth_da)
{
struct mlx5_switch_info *switch_info = &spawn->info;
- unsigned int p, f;
- uint16_t id;
- uint16_t repr_id = mlx5_representor_id_encode(switch_info,
- eth_da->type);
+ unsigned int c;
+ bool ignore_ctrl_num = eth_da->nb_mh_controllers == 0 ||
+ switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
- /*
- * Assuming Multiport E-Switch device was detected,
- * if spawned port is an uplink, check if the port
- * was requested through representor devarg.
- */
- if (mlx5_is_probed_port_on_mpesw_device(spawn) &&
- switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
- for (p = 0; p < eth_da->nb_ports; ++p)
- if (switch_info->port_name == eth_da->ports[p])
- return true;
- rte_errno = EBUSY;
- return false;
- }
switch (eth_da->type) {
case RTE_ETH_REPRESENTOR_PF:
- /*
- * PF representors provided in devargs translate to uplink ports, but
- * if and only if the device is a part of MPESW device.
- */
- if (!mlx5_is_probed_port_on_mpesw_device(spawn)) {
+ if (switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
break;
case RTE_ETH_REPRESENTOR_SF:
- if (!(spawn->info.port_name == -1 &&
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFHPF) &&
- switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFSF) {
+ if (!is_hpf(spawn) &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFSF &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
break;
case RTE_ETH_REPRESENTOR_VF:
- /* Allows HPF representor index -1 as exception. */
- if (!(spawn->info.port_name == -1 &&
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFHPF) &&
- switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFVF) {
+ if (!is_hpf(spawn) &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_PFVF &&
+ switch_info->name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
rte_errno = EBUSY;
return false;
}
@@ -1119,21 +1263,17 @@ mlx5_representor_match(struct mlx5_dev_spawn_data *spawn,
DRV_LOG(ERR, "unsupported representor type");
return false;
}
- /* Check representor ID: */
- for (p = 0; p < eth_da->nb_ports; ++p) {
- if (!mlx5_is_probed_port_on_mpesw_device(spawn) && spawn->pf_bond < 0) {
- /* For non-LAG mode, allow and ignore pf. */
- switch_info->pf_num = eth_da->ports[p];
- repr_id = mlx5_representor_id_encode(switch_info,
- eth_da->type);
- }
- for (f = 0; f < eth_da->nb_representor_ports; ++f) {
- id = MLX5_REPRESENTOR_ID
- (eth_da->ports[p], eth_da->type,
- eth_da->representor_ports[f]);
- if (repr_id == id)
+ if (!ignore_ctrl_num) {
+ for (c = 0; c < eth_da->nb_mh_controllers; ++c) {
+ uint16_t ctrl_num = eth_da->mh_controllers[c];
+
+ if (spawn->info.ctrl_num == ctrl_num &&
+ representor_match_port(spawn, eth_da))
return true;
}
+ } else {
+ if (representor_match_port(spawn, eth_da))
+ return true;
}
rte_errno = EBUSY;
return false;
@@ -1185,44 +1325,12 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
!mlx5_representor_match(spawn, eth_da))
return NULL;
/* Build device name. */
- if (spawn->pf_bond >= 0) {
- /* Bonding device. */
- if (!switch_info->representor) {
- err = snprintf(name, sizeof(name), "%s_%s",
- dpdk_dev->name, spawn->phys_dev_name);
- } else {
- err = snprintf(name, sizeof(name), "%s_%s_representor_c%dpf%d%s%u",
- dpdk_dev->name, spawn->phys_dev_name,
- switch_info->ctrl_num,
- switch_info->pf_num,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
- }
- } else if (mlx5_is_probed_port_on_mpesw_device(spawn)) {
- /* MPESW device. */
- if (switch_info->name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK) {
- err = snprintf(name, sizeof(name), "%s_p%d",
- dpdk_dev->name, spawn->mpesw_port);
- } else {
- err = snprintf(name, sizeof(name), "%s_representor_c%dpf%d%s%u",
- dpdk_dev->name,
- switch_info->ctrl_num,
- switch_info->pf_num,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
- }
- } else {
- /* Single device. */
- if (!switch_info->representor)
- strlcpy(name, dpdk_dev->name, sizeof(name));
- else
- err = snprintf(name, sizeof(name), "%s_representor_%s%u",
- dpdk_dev->name,
- switch_info->name_type ==
- MLX5_PHYS_PORT_NAME_TYPE_PFSF ? "sf" : "vf",
- switch_info->port_name);
+ err = build_port_name(dpdk_dev, spawn, name, sizeof(name));
+ if (err < 0) {
+ DRV_LOG(ERR, "Failed to build port name for IB device %s/%u",
+ spawn->phys_dev_name, spawn->phys_port);
+ rte_errno = EINVAL;
+ return NULL;
}
if (err >= (int)sizeof(name))
DRV_LOG(WARNING, "device name overflow %s", name);
@@ -2297,10 +2405,45 @@ mlx5_device_mpesw_pci_match(struct ibv_device *ibv,
return -1;
}
-static inline bool
-mlx5_ignore_pf_representor(const struct rte_eth_devargs *eth_da)
+static void
+calc_nb_uplinks_hpfs(struct ibv_device **ibv_match,
+ unsigned int nd,
+ struct mlx5_dev_spawn_data *list,
+ unsigned int ns)
{
- return (eth_da->flags & RTE_ETH_DEVARG_REPRESENTOR_IGNORE_PF) != 0;
+ for (unsigned int i = 0; i != nd; i++) {
+ uint32_t nb_uplinks = 0;
+ uint32_t nb_hpfs = 0;
+ uint32_t j;
+
+ for (j = 0; j != ns; j++) {
+ if (strcmp(ibv_match[i]->name, list[j].phys_dev_name) != 0)
+ continue;
+
+ if (list[j].info.name_type == MLX5_PHYS_PORT_NAME_TYPE_UPLINK)
+ nb_uplinks++;
+ else if (list[j].info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFHPF)
+ nb_hpfs++;
+ }
+
+ if (nb_uplinks > 0 || nb_hpfs > 0) {
+ for (j = 0; j != ns; j++) {
+ if (strcmp(ibv_match[i]->name, list[j].phys_dev_name) != 0)
+ continue;
+
+ list[j].nb_uplinks = nb_uplinks;
+ list[j].nb_hpfs = nb_hpfs;
+ }
+
+ DRV_LOG(DEBUG, "IB device %s has %u uplinks, %u host PFs",
+ ibv_match[i]->name,
+ nb_uplinks,
+ nb_hpfs);
+ } else {
+ DRV_LOG(DEBUG, "IB device %s unable to recognize uplinks/host PFs",
+ ibv_match[i]->name);
+ }
+ }
}
/**
@@ -2611,8 +2754,6 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
if (list[ns].info.port_name == mpesw) {
list[ns].info.master = 1;
list[ns].info.representor = 0;
- } else if (mlx5_ignore_pf_representor(&eth_da)) {
- continue;
} else {
list[ns].info.master = 0;
list[ns].info.representor = 1;
@@ -2629,17 +2770,14 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
case MLX5_PHYS_PORT_NAME_TYPE_PFHPF:
case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
case MLX5_PHYS_PORT_NAME_TYPE_PFSF:
- /* Only spawn representors related to the probed PF. */
- if (list[ns].info.pf_num == owner_id) {
- /*
- * Ports of this type have PF index encoded in name,
- * which translate to the related uplink port index.
- */
- list[ns].mpesw_port = list[ns].info.pf_num;
- /* MPESW owner is also saved but not used now. */
- list[ns].info.mpesw_owner = mpesw;
- ns++;
- }
+ /*
+ * Ports of this type have PF index encoded in name,
+ * which translate to the related uplink port index.
+ */
+ list[ns].mpesw_port = list[ns].info.pf_num;
+ /* MPESW owner is also saved but not used now. */
+ list[ns].info.mpesw_owner = mpesw;
+ ns++;
break;
default:
break;
@@ -2773,6 +2911,8 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
}
}
MLX5_ASSERT(ns);
+ /* Calculate number of uplinks and host PFs for each matched IB device. */
+ calc_nb_uplinks_hpfs(ibv_match, nd, list, ns);
/*
* Sort list to probe devices in natural order for users convenience
* (i.e. master first, then representors from lowest to highest ID).
@@ -2780,16 +2920,12 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev,
qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
if (eth_da.type != RTE_ETH_REPRESENTOR_NONE) {
/* Set devargs default values. */
- if (eth_da.nb_mh_controllers == 0) {
- eth_da.nb_mh_controllers = 1;
- eth_da.mh_controllers[0] = 0;
- }
if (eth_da.nb_ports == 0 && ns > 0) {
if (list[0].pf_bond >= 0 && list[0].info.representor)
DRV_LOG(WARNING, "Representor on Bonding device should use pf#vf# syntax: %s",
pci_dev->device.devargs->args);
eth_da.nb_ports = 1;
- eth_da.ports[0] = list[0].info.pf_num;
+ eth_da.ports[0] = list[0].info.port_name;
}
if (eth_da.nb_representor_ports == 0) {
eth_da.nb_representor_ports = 1;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c54266ec26..f69db11735 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -214,6 +214,8 @@ struct mlx5_dev_cap {
struct mlx5_dev_spawn_data {
uint32_t ifindex; /**< Network interface index. */
uint32_t max_port; /**< Device maximal port index. */
+ uint32_t nb_uplinks; /**< Number of uplinks associated with IB device. */
+ uint32_t nb_hpfs; /**< Number of host PFs associated with IB device. */
uint32_t phys_port; /**< Device physical port index. */
int pf_bond; /**< bonding device PF index. < 0 - no bonding */
int mpesw_port; /**< MPESW uplink index. Valid if mpesw_owner_port >= 0. */
--
2.47.3
* Re: [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct
2026-03-04 10:57 ` [PATCH v2 0/3] net/mlx5: net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
` (2 preceding siblings ...)
2026-03-04 10:57 ` [PATCH v2 3/3] net/mlx5: fix probing to allow BlueField Socket Direct Dariusz Sosnowski
@ 2026-03-10 8:16 ` Raslan Darawsheh
3 siblings, 0 replies; 12+ messages in thread
From: Raslan Darawsheh @ 2026-03-10 8:16 UTC (permalink / raw)
To: Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
Suanming Mou, Matan Azrad
Cc: dev
Hi,
On 04/03/2026 12:57 PM, Dariusz Sosnowski wrote:
> The goal of this patchset is to fix the probing logic in the mlx5 networking PMD
> to support BlueField DPUs with Socket Direct.
> In such a use case, the BlueField DPU is connected through PCI
> to 2 different CPUs on the host.
> Each host CPU sees 2 PFs.
> Each PF is connected to one of the physical ports.
>
> +--------+               +--------+
> |CPU 0   |               |CPU 1   |
> |        |               |        |
> |  pf0   |               |  pf0   |
> |        |               |        |
> |  pf1   |               |  pf1   |
> |        |               |        |
> +---+----+               +-+------+
>     |                      |
>     |                      |
>     |                      |
>     +----+           +-----+
>          |           |
>          |           |
>          |           |
>      +---+-----------+----+
>      |BF3 DPU             |
>      |                    |
>      | pf0hpf   pf1hpf    |
>      |                    |
>      | pf2hpf   pf3hpf    |
>      |                    |
>      |  p0           p1   |
>      +------+------+------+
>      | phy0 |      | phy1 |
>      +------+      +------+
>
> On the BlueField DPU's ARM Linux, netdevs map to PFs/ports as follows:
>
> - p0 and p1 to physical ports 0 and 1 respectively,
> - pf0hpf and pf2hpf to CPU0 pf0 and CPU1 pf0 respectively,
> - pf1hpf and pf3hpf to CPU0 pf1 and CPU1 pf1 respectively.
>
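To illustrate the mapping above, here is a minimal sketch that classifies these netdev names and maps a host PF netdev back to its host CPU and PF. It assumes the exact `pN`/`pfNhpf` naming and the 2-CPU, 2-PF-per-CPU topology from the diagram. `parse_bf_netdev()` and `hpf_to_host()` are hypothetical helpers, not mlx5 code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical classification of BlueField ARM netdev names. */
enum bf_netdev_kind { BF_UPLINK, BF_HOST_PF };

struct bf_netdev {
	enum bf_netdev_kind kind;
	unsigned int index; /* N in "pN" or "pfNhpf" */
};

/* Classify "pN" (uplink N) and "pfNhpf" (host PF N); reject anything else. */
static bool
parse_bf_netdev(const char *name, struct bf_netdev *out)
{
	int end = 0;

	/* %n records how many characters matched, so trailing junk is rejected. */
	if (sscanf(name, "pf%uhpf%n", &out->index, &end) == 1 &&
	    end > 0 && name[end] == '\0') {
		out->kind = BF_HOST_PF;
		return true;
	}
	end = 0;
	if (sscanf(name, "p%u%n", &out->index, &end) == 1 &&
	    end > 0 && name[end] == '\0') {
		out->kind = BF_UPLINK;
		return true;
	}
	return false;
}

/* Assumes the diagram's topology: 2 host CPUs with 2 PFs each,
 * numbered so that pfNhpf is PF (N % 2) on CPU (N / 2). */
static void
hpf_to_host(unsigned int hpf, unsigned int *cpu, unsigned int *pf)
{
	*cpu = hpf / 2;
	*pf = hpf % 2;
}
```

For example, `pf2hpf` parses as host PF 2, which `hpf_to_host()` maps to CPU 1, pf0. This agrees with the list above.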
> There are several possible ways to use such a setup:
>
> - A separate E-Switch (embedded switch) for each CPU PF to
>   physical port connection.
> - A shared E-Switch for related CPU PFs:
>   - For example, both pf0hpf and pf2hpf are in the same E-Switch.
> - Multiport E-Switch.
>
> The existing probing logic in the mlx5 PMD did not support the second case
> (shared E-Switch). In this case, there is one physical port (an uplink in
> mlx5 naming) and 2 host PFs.
> On such a setup, mlx5 generated port names with the following syntax:
>
> 03:00.0_representor_vfX
>
> The port name syntax was selected based on the specific setup type.
> Since the setup was recognized as neither bond nor MPESW,
> mlx5 selected the default name, without a PF index.
> Because BlueField with Socket Direct has 2 host PFs,
> this probing logic caused DPDK port name collisions
> when attempting to probe both host PFs at the same time.
>
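Here is a minimal sketch of the collision. The two format strings are hypothetical and modeled only on the `03:00.0_representor_vfX` example. They are not the exact names mlx5 generates. PF indexes 0 and 2 stand for pf0hpf and pf2hpf. The VF index 65535 is borrowed from the testpmd examples further below.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name formats modeled on "03:00.0_representor_vfX";
 * not the exact strings mlx5 generates. */
static void
name_without_pf(char *buf, size_t len, const char *pci, unsigned int vf)
{
	/* No PF index: representors on different PFs can get the same name. */
	snprintf(buf, len, "%s_representor_vf%u", pci, vf);
}

static void
name_with_pf(char *buf, size_t len, const char *pci, unsigned int pf,
	     unsigned int vf)
{
	snprintf(buf, len, "%s_representor_pf%uvf%u", pci, pf, vf);
}

/* True if representors (pf_a, vf) and (pf_b, vf) end up with identical names. */
static bool
names_collide(bool with_pf, const char *pci, unsigned int pf_a,
	      unsigned int pf_b, unsigned int vf)
{
	char a[64], b[64];

	if (with_pf) {
		name_with_pf(a, sizeof(a), pci, pf_a, vf);
		name_with_pf(b, sizeof(b), pci, pf_b, vf);
	} else {
		name_without_pf(a, sizeof(a), pci, vf);
		name_without_pf(b, sizeof(b), pci, vf);
	}
	return strcmp(a, b) == 0;
}
```

Without a PF index, both host PFs map to one name. With it, the names stay distinct.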
> Moreover, there were some false positives during probing, depending on
> whether the system had specific udev rules which rename the mlx5 IB device
> to include "bond" in its name.
>
> This patchset addresses the above:
>
> - Patches 1 and 2 fix the bond detection logic.
>   Previously, the mlx5 PMD relied on "bond" appearing in the IB device name,
>   which is not always the case. Detection now relies on sysfs checks.
> - Patch 3 fixes the uplink and host PF probing logic.
>   Previously, the mlx5 PMD relied on the specific setup type.
>   With this patch, probing is more generic and is based on the
>   types and number of ports available on the E-Switch.
>
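The bond detection fix above moves from an IB device name check to sysfs. Below is a minimal sketch of one such check. It assumes the Linux bonding driver's sysfs layout, in which an enslaved netdev exposes a `bonding_slave` directory. It is not the code from patches 1 and 2, and `netdev_is_bond_member()` is a hypothetical helper. A caller would apply it to each netdev of the probed PCI device. On Linux, those netdevs are listed under `/sys/bus/pci/devices/<address>/net/`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if the netdev whose sysfs directory is netdev_dir
 * (for example "/sys/class/net/p0") is a member of a Linux bonding device.
 * Assumes the bonding driver's sysfs layout: each enslaved netdev gets a
 * "bonding_slave" subdirectory. The IB device name plays no role here.
 */
static bool
netdev_is_bond_member(const char *netdev_dir)
{
	char path[4096];
	struct stat st;
	int ret = snprintf(path, sizeof(path), "%s/bonding_slave", netdev_dir);

	if (ret < 0 || (size_t)ret >= sizeof(path))
		return false; /* path did not fit: treat as not bonded */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

This check does not depend on udev rules renaming the IB device, so it gives the same answer whether or not "bond" appears in the IB device name.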
> With this patchset, a DPDK application running on the BlueField ARM cores
> can probe all relevant representors (corresponding to the available netdevs).
> Using testpmd syntax, users can do the following:
>
> # Probe both physical ports
> port attach 03:00.0,dv_flow_en=2,representor=pf0-1
>
> # Probe host PF 0 from CPU 0
> # (VF representor index -1, i.e. 65535 as a 16-bit value, is a special encoding for host PF)
> port attach 03:00.0,dv_flow_en=2,representor=pf0vf65535
> # or with explicit controller index
> port attach 03:00.0,dv_flow_en=2,representor=c1pf0vf65535
>
> # Probe host PF 0 from CPU 1
> port attach 03:00.0,dv_flow_en=2,representor=pf2vf65535
> # or with explicit controller index
> port attach 03:00.0,dv_flow_en=2,representor=c2pf2vf65535
>
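In these commands, `vf65535` is the host PF encoding: -1 reinterpreted as an unsigned 16-bit VF index. Below is a minimal sketch of reading the single-value forms used above. `struct repr_spec`, `parse_repr_spec()` and `HPF_VF_INDEX` are hypothetical. The sketch is not DPDK's own devargs parser, and it deliberately does not handle ranges such as `pf0-1`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Host PF special VF index: -1 stored in an unsigned 16-bit field. */
#define HPF_VF_INDEX ((uint16_t)-1) /* 65535 */

/* Hypothetical parsed form of "pfXvfY" / "cZpfXvfY"; not DPDK's rte_eth_devargs. */
struct repr_spec {
	bool has_controller;
	unsigned int controller;
	unsigned int pf;
	unsigned int vf;
};

/* Parse a single-value representor spec; ranges like "pf0-1" are rejected. */
static bool
parse_repr_spec(const char *s, struct repr_spec *out)
{
	int end = 0;

	memset(out, 0, sizeof(*out));
	if (s[0] == 'c') {
		if (sscanf(s, "c%upf%uvf%u%n", &out->controller, &out->pf,
			   &out->vf, &end) != 3)
			return false;
		out->has_controller = true;
	} else if (sscanf(s, "pf%uvf%u%n", &out->pf, &out->vf, &end) != 2) {
		return false;
	}
	/* Reject trailing characters and VF indexes wider than 16 bits. */
	return end > 0 && s[end] == '\0' && out->vf <= UINT16_MAX;
}

static bool
repr_is_host_pf(const struct repr_spec *r)
{
	return r->vf == HPF_VF_INDEX;
}
```

For example, `c2pf2vf65535` parses as controller 2, PF 2, and the host PF VF index. This corresponds to host PF 0 of CPU 1 (pf2hpf).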
> v1: https://patches.dpdk.org/project/dpdk/cover/20260302113443.16648-1-dsosnowski@nvidia.com/
>
> v1->v2:
> - Squashed patches 3-5 and added a Fixes tag,
> since these patches fix the existing probing logic.
>
> Dariusz Sosnowski (3):
> common/mlx5: fix bond check
> net/mlx5: fix bond check
> net/mlx5: fix probing to allow BlueField Socket Direct
>
> drivers/common/mlx5/linux/mlx5_common_os.c | 86 ++++-
> drivers/common/mlx5/linux/mlx5_common_os.h | 9 +
> drivers/net/mlx5/linux/mlx5_os.c | 356 ++++++++++++++-------
> drivers/net/mlx5/mlx5.h | 2 +
> 4 files changed, 338 insertions(+), 115 deletions(-)
>
> --
> 2.47.3
>
Series applied to next-net-mlx,
Kindest regards
Raslan Darawsheh