public inbox for netdev@vger.kernel.org
From: Adam Young <admiyo@amperemail.onmicrosoft.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH net-next v4 13/13] net/mlx5: Add a shared devlink instance for PFs on same chip
Date: Tue, 24 Mar 2026 15:57:56 -0400
Message-ID: <fce1bf3e-325c-412b-9b87-0536e273717e@amperemail.onmicrosoft.com>
In-Reply-To: <d4d7aa10-10f1-4321-9476-266ce7fc7c31@amperemail.onmicrosoft.com>


On 3/24/26 13:49, Adam Young wrote:
>
> On 3/24/26 09:02, Jiri Pirko wrote:
>> Mon, Mar 23, 2026 at 04:05:19PM +0100, jiri@resnulli.us wrote:
>>> Sat, Mar 21, 2026 at 12:37:06AM +0100, admiyo@amperemail.onmicrosoft.com wrote:
>>>>
>>>> 0005:01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
>>> Cool. Will try to reproduce this locally. Thanks!
>>>
>> Hi. Testing this with:
>> 08:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
>> works fine. What's your output of "devlink dev info"?
>
> Running on 6.18.6:
>
> # devlink dev info
> pci/0005:01:00.0:
>   driver mlx5_core
>   versions:
>       fixed:
>         fw.psid MT_2420110034
>       running:
>         fw.version 14.32.1010
>         fw 14.32.1010
>       stored:
>         fw.version 14.32.1010
>         fw 14.32.1010
> auxiliary/mlx5_core.eth.0:
>   driver mlx5_core.eth
> pci/0005:01:00.1:
>   driver mlx5_core
>   versions:
>       fixed:
>         fw.psid MT_2420110034
>       running:
>         fw.version 14.32.1010
>         fw 14.32.1010
>       stored:
>         fw.version 14.32.1010
>         fw 14.32.1010
> auxiliary/mlx5_core.eth.1:
>   driver mlx5_core.eth
>
I just confirmed that the error is the failure to look up the SN or V3
keyword. The following ugly hack actually brought up the adapter:

@@ -32,12 +34,14 @@ int mlx5_shd_init(struct mlx5_core_dev *dev)
                 /* Fall-back to SN for older devices. */
                 start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size,
                                                      PCI_VPD_RO_KEYWORD_SERIALNO, &kw_len);
-               if (start < 0)
-                       return -ENOENT;
         }
-       sn = kstrndup(vpd_data + start, kw_len, GFP_KERNEL);
+       if (start < 0)
+               sn = kstrndup("random", 6, GFP_KERNEL);
+       else
+               sn = kstrndup(vpd_data + start, kw_len, GFP_KERNEL);
         if (!sn)
                 return -ENOMEM;
+
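
The -ENOENT comes from the keyword lookup in mlx5_shd_init(). Below is a simplified userspace model of that lookup: a sketch that mimics what pci_vpd_find_ro_info_keyword() does (walk the large-resource tags to the read-only VPD-R section, then scan its 2-byte keyword / 1-byte length / data fields), followed by the patch's V3-then-SN order and space stripping. The helper names vpd_find_ro_keyword() and shd_serial() are hypothetical, the kernel's bounds handling differs in detail, and the test buffers are synthetic, not a dump from the affected adapter. Running `lspci -vv` as root on the affected ports should show which read-only keywords the firmware actually exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VPD_LRDT         0x80 /* bit 7 set: large resource data type */
#define VPD_LRDT_RO_DATA 0x90 /* VPD-R: read-only keyword section */

/* Simplified model of a VPD-R keyword lookup. Returns the offset of the
 * keyword's data in buf and sets *kw_len, or returns -1 if absent. */
static int vpd_find_ro_keyword(const unsigned char *buf, size_t size,
			       const char *kw, unsigned int *kw_len)
{
	size_t i = 0;

	/* Large resource header: tag byte + 16-bit little-endian length. */
	while (i + 3 <= size && (buf[i] & VPD_LRDT)) {
		size_t len = buf[i + 1] | ((size_t)buf[i + 2] << 8);
		unsigned char tag = buf[i];

		i += 3;
		if (i + len > size)
			return -1;
		if (tag == VPD_LRDT_RO_DATA) {
			size_t j = i;

			/* Each field: 2 keyword chars, 1 length byte, data. */
			while (j + 3 <= i + len) {
				unsigned int flen = buf[j + 2];

				if (j + 3 + flen > i + len)
					return -1;
				if (buf[j] == kw[0] && buf[j + 1] == kw[1]) {
					*kw_len = flen;
					return (int)(j + 3);
				}
				j += 3 + flen;
			}
			return -1; /* VPD-R found, keyword not in it */
		}
		i += len; /* skip other large resources, e.g. ID string */
	}
	return -1;
}

/* Same order as mlx5_shd_init(): try "V3", fall back to "SN", then cut
 * the string at the first space. Returns -1 where the patch returns
 * -ENOENT. */
static int shd_serial(const unsigned char *vpd, size_t size,
		      char *out, size_t outlen)
{
	unsigned int kw_len = 0;
	char *sp;
	int start;

	assert(outlen > 0);
	start = vpd_find_ro_keyword(vpd, size, "V3", &kw_len);
	if (start < 0)
		start = vpd_find_ro_keyword(vpd, size, "SN", &kw_len);
	if (start < 0)
		return -1;
	if (kw_len >= outlen)
		kw_len = outlen - 1;
	memcpy(out, vpd + start, kw_len);
	out[kw_len] = '\0';
	sp = strchr(out, ' ');
	if (sp)
		*sp = '\0';
	return 0;
}
```

A VPD-R section that carries neither V3 nor SN falls through both lookups, which is the path that produces error code -2 in the probe log.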

So I think you need something other than the SN to key the shd name off
of (if I understand the code correctly).
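
One possible direction, sketched under stated assumptions rather than proposed as the fix: when neither keyword is present, derive the key from the PCI domain, bus and slot, dropping the function number, so that 0005:01:00.0 and 0005:01:00.1 map to the same key. This assumes the PFs of one chip share domain:bus:slot, which holds for the two ports in this report but is not guaranteed for every topology. Unlike the literal "random" key in the hack above, which would put every VPD-less PF on the host into one shared instance, it would keep adapters on different chips apart. shd_key() is a hypothetical helper; in the kernel its inputs would presumably come from pci_domain_nr(pdev->bus), pdev->bus->number and PCI_SLOT(pdev->devfn).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key selection: prefer the VPD serial; otherwise use
 * "DDDD:BB:SS" (domain:bus:slot, no function number) so all PFs in the
 * same slot share a key. Returns 0 on success, -1 if buf is too small. */
static int shd_key(const char *vpd_serial, unsigned int domain,
		   unsigned int bus, unsigned int slot, char *buf, size_t len)
{
	int n;

	assert(len > 0);
	if (vpd_serial && vpd_serial[0] != '\0')
		n = snprintf(buf, len, "%s", vpd_serial);
	else
		n = snprintf(buf, len, "%04x:%02x:%02x", domain, bus, slot);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```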



>
>
>>
>>
>>>
>>>
>>>> 0005:01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
>>>>
>>>> On 3/20/26 19:16, Adam Young wrote:
>>>>> This breaks on my system:
>>>>>
>>>>> On 7.0.0 it boots fine. With net-next/main currently at this commit
>>>>>
>>>>>
>>>>> commit 8737d7194d6d5947c3d7d8813895b44a25b84477 (net-next/main, net-next/HEAD)
>>>>> Author: Lorenzo Bianconi <lorenzo@kernel.org>
>>>>> Date:   Fri Mar 13 17:28:36 2026 +0100
>>>>>
>>>>> I get:
>>>>>
>>>>> [   21.859081] mlx5_core 0005:01:00.0: probe_one:2017:(pid 10): mlx5_shd_init failed with error code -2
>>>>> [   21.863266] mlx5_core 0005:01:00.0: probe with driver mlx5_core failed with error -2
>>>>> [   21.866360] mlx5_core 0005:01:00.1: probe_one:2017:(pid 10): mlx5_shd_init failed with error code -2
>>>>> [   21.869937] mlx5_core 0005:01:00.1: probe with driver mlx5_core failed with error -2
>>>>>
>>>>>
>>>>> I am happy to help debug:   what do you need from me?
>>>>>
>>>>>
>>>>> On 3/12/26 06:04, Jiri Pirko wrote:
>>>>>> From: Jiri Pirko <jiri@nvidia.com>
>>>>>>
>>>>>> Use the previously introduced shared devlink infrastructure to create
>>>>>> a shared devlink instance for mlx5 PFs that reside on the same physical
>>>>>> chip. The shared instance is identified by the chip's serial number
>>>>>> extracted from PCI VPD (V3 keyword, with fallback to serial number
>>>>>> for older devices).
>>>>>>
>>>>>> Each PF that probes calls mlx5_shd_init() which extracts the chip serial
>>>>>> number and uses devlink_shd_get() to get or create the shared instance.
>>>>>> When a PF is removed, mlx5_shd_uninit() calls devlink_shd_put()
>>>>>> to release the reference. The shared instance is automatically destroyed
>>>>>> when the last PF is removed.
>>>>>>
>>>>>> Make the PF devlink instances nested in this shared devlink instance,
>>>>>> allowing userspace to identify which PFs belong to the same physical
>>>>>> chip.
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> pci/0000:08:00.0: index 0
>>>>>>     nested_devlink:
>>>>>>       auxiliary/mlx5_core.eth.0
>>>>>> devlink_index/1: index 1
>>>>>>     nested_devlink:
>>>>>>       pci/0000:08:00.0
>>>>>>       pci/0000:08:00.1
>>>>>> auxiliary/mlx5_core.eth.0: index 2
>>>>>> pci/0000:08:00.1: index 3
>>>>>>     nested_devlink:
>>>>>>       auxiliary/mlx5_core.eth.1
>>>>>> auxiliary/mlx5_core.eth.1: index 4
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>>>>>> ---
>>>>>> v2->v3:
>>>>>> - removed "const" from "sn"
>>>>>> - passing driver pointer to devlink_shd_get()
>>>>>> ---
>>>>>>    .../net/ethernet/mellanox/mlx5/core/Makefile  |  5 +-
>>>>>>    .../net/ethernet/mellanox/mlx5/core/main.c    | 17 ++++++
>>>>>>    .../ethernet/mellanox/mlx5/core/sh_devlink.c  | 61 +++++++++++++++++++
>>>>>>    .../ethernet/mellanox/mlx5/core/sh_devlink.h  | 12 ++++
>>>>>>    include/linux/mlx5/driver.h                   |  1 +
>>>>>>    5 files changed, 94 insertions(+), 2 deletions(-)
>>>>>>    create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
>>>>>>    create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>>> index 8ffa286a18f5..d39fe9c4a87c 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>>> @@ -16,8 +16,9 @@ mlx5_core-y :=    main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
>>>>>>            transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
>>>>>>            fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
>>>>>>            lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
>>>>>> -        diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \
>>>>>> -        fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o lib/nv_param.o
>>>>>> +        diag/fw_tracer.o diag/crdump.o devlink.o sh_devlink.o diag/rsc_dump.o \
>>>>>> +        diag/reporter_vnic.o fw_reset.o qos.o lib/tout.o lib/aso.o wc.o fs_pool.o \
>>>>>> +        lib/nv_param.o
>>>>>>      #
>>>>>>    # Netdev basic
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>>>> index fdc3ba20912e..1c35c3fc3bb3 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>>>> @@ -74,6 +74,7 @@
>>>>>>    #include "mlx5_irq.h"
>>>>>>    #include "hwmon.h"
>>>>>>    #include "lag/lag.h"
>>>>>> +#include "sh_devlink.h"
>>>>>>      MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
>>>>>>    MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
>>>>>> @@ -1520,10 +1521,16 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
>>>>>>        int err;
>>>>>>          devl_lock(devlink);
>>>>>> +    if (dev->shd) {
>>>>>> +        err = devl_nested_devlink_set(dev->shd, devlink);
>>>>>> +        if (err)
>>>>>> +            goto unlock;
>>>>>> +    }
>>>>>>        devl_register(devlink);
>>>>>>        err = mlx5_init_one_devl_locked(dev);
>>>>>>        if (err)
>>>>>>            devl_unregister(devlink);
>>>>>> +unlock:
>>>>>>        devl_unlock(devlink);
>>>>>>        return err;
>>>>>>    }
>>>>>> @@ -2005,6 +2012,13 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>>>>>>            goto pci_init_err;
>>>>>>        }
>>>>>>    +    err = mlx5_shd_init(dev);
>>>>>> +    if (err) {
>>>>>> +        mlx5_core_err(dev, "mlx5_shd_init failed with error code %d\n",
>>>>>> +                  err);
>>>>>> +        goto shd_init_err;
>>>>>> +    }
>>>>>> +
>>>>>>        err = mlx5_init_one(dev);
>>>>>>        if (err) {
>>>>>>            mlx5_core_err(dev, "mlx5_init_one failed with error code %d\n",
>>>>>> @@ -2018,6 +2032,8 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>>>>>>        return 0;
>>>>>>      err_init_one:
>>>>>> +    mlx5_shd_uninit(dev);
>>>>>> +shd_init_err:
>>>>>>        mlx5_pci_close(dev);
>>>>>>    pci_init_err:
>>>>>>        mlx5_mdev_uninit(dev);
>>>>>> @@ -2039,6 +2055,7 @@ static void remove_one(struct pci_dev *pdev)
>>>>>>        mlx5_drain_health_wq(dev);
>>>>>>        mlx5_sriov_disable(pdev, false);
>>>>>>        mlx5_uninit_one(dev);
>>>>>> +    mlx5_shd_uninit(dev);
>>>>>>        mlx5_pci_close(dev);
>>>>>>        mlx5_mdev_uninit(dev);
>>>>>>        mlx5_adev_idx_free(dev->priv.adev_idx);
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..bc33f95302df
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
>>>>>> @@ -0,0 +1,61 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>>>>>> +/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
>>>>>> +
>>>>>> +#include <linux/mlx5/driver.h>
>>>>>> +#include <net/devlink.h>
>>>>>> +
>>>>>> +#include "sh_devlink.h"
>>>>>> +
>>>>>> +static const struct devlink_ops mlx5_shd_ops = {
>>>>>> +};
>>>>>> +
>>>>>> +int mlx5_shd_init(struct mlx5_core_dev *dev)
>>>>>> +{
>>>>>> +    u8 *vpd_data __free(kfree) = NULL;
>>>>>> +    struct pci_dev *pdev = dev->pdev;
>>>>>> +    unsigned int vpd_size, kw_len;
>>>>>> +    struct devlink *devlink;
>>>>>> +    char *sn, *end;
>>>>>> +    int start;
>>>>>> +    int err;
>>>>>> +
>>>>>> +    if (!mlx5_core_is_pf(dev))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    vpd_data = pci_vpd_alloc(pdev, &vpd_size);
>>>>>> +    if (IS_ERR(vpd_data)) {
>>>>>> +        err = PTR_ERR(vpd_data);
>>>>>> +        return err == -ENODEV ? 0 : err;
>>>>>> +    }
>>>>>> +    start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size, "V3", &kw_len);
>>>>>> +    if (start < 0) {
>>>>>> +        /* Fall-back to SN for older devices. */
>>>>>> +        start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size,
>>>>>> +                                             PCI_VPD_RO_KEYWORD_SERIALNO, &kw_len);
>>>>>> +        if (start < 0)
>>>>>> +            return -ENOENT;
>>>>>> +    }
>>>>>> +    sn = kstrndup(vpd_data + start, kw_len, GFP_KERNEL);
>>>>>> +    if (!sn)
>>>>>> +        return -ENOMEM;
>>>>>> +    /* Firmware may return spaces at the end of the string, strip it. */
>>>>>> +    end = strchrnul(sn, ' ');
>>>>>> +    *end = '\0';
>>>>>> +
>>>>>> +    /* Get or create shared devlink instance */
>>>>>> +    devlink = devlink_shd_get(sn, &mlx5_shd_ops, 0, pdev->dev.driver);
>>>>>> +    kfree(sn);
>>>>>> +    if (!devlink)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    dev->shd = devlink;
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +void mlx5_shd_uninit(struct mlx5_core_dev *dev)
>>>>>> +{
>>>>>> +    if (!dev->shd)
>>>>>> +        return;
>>>>>> +
>>>>>> +    devlink_shd_put(dev->shd);
>>>>>> +}
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..8ab8d6940227
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
>>>>>> @@ -0,0 +1,12 @@
>>>>>> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>>>>>> +/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
>>>>>> +
>>>>>> +#ifndef __MLX5_SH_DEVLINK_H__
>>>>>> +#define __MLX5_SH_DEVLINK_H__
>>>>>> +
>>>>>> +#include <linux/mlx5/driver.h>
>>>>>> +
>>>>>> +int mlx5_shd_init(struct mlx5_core_dev *dev);
>>>>>> +void mlx5_shd_uninit(struct mlx5_core_dev *dev);
>>>>>> +
>>>>>> +#endif /* __MLX5_SH_DEVLINK_H__ */
>>>>>> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
>>>>>> index 04dcd09f7517..1268fcf35ec7 100644
>>>>>> --- a/include/linux/mlx5/driver.h
>>>>>> +++ b/include/linux/mlx5/driver.h
>>>>>> @@ -798,6 +798,7 @@ struct mlx5_core_dev {
>>>>>>        enum mlx5_wc_state wc_state;
>>>>>>        /* sync write combining state */
>>>>>>        struct mutex wc_state_lock;
>>>>>> +    struct devlink *shd;
>>>>>>    };
>>>>>>      struct mlx5_db {
>
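
The get-or-create and put lifecycle described in the quoted commit message (one shared instance per chip key, released when the last PF goes away) can be modeled in a few lines. The sketch below is a hypothetical single-threaded userspace model: shd_get(), shd_put() and struct shd are illustrative names, not the devlink API, and a kernel implementation would also need locking, which this model omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: one refcounted object per key string. */
struct shd {
	char *key;             /* e.g. the chip serial number */
	unsigned int refcount; /* number of PFs holding a reference */
	struct shd *next;
};

static struct shd *shd_list; /* all live objects; no locking here */

/* Return the object for key, creating it on first use. */
static struct shd *shd_get(const char *key)
{
	size_t len = strlen(key) + 1;
	struct shd *s;

	for (s = shd_list; s; s = s->next) {
		if (strcmp(s->key, key) == 0) {
			s->refcount++;
			return s;
		}
	}
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->key = malloc(len);
	if (!s->key) {
		free(s);
		return NULL;
	}
	memcpy(s->key, key, len);
	s->refcount = 1;
	s->next = shd_list;
	shd_list = s;
	return s;
}

/* Drop one reference; unlink and free the object on the last put. */
static void shd_put(struct shd *s)
{
	struct shd **pp;

	assert(s->refcount > 0);
	if (--s->refcount)
		return;
	for (pp = &shd_list; *pp; pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			break;
		}
	}
	free(s->key);
	free(s);
}
```

The same model shows the side effect of the "random" key in the hack earlier in this message: two PFs on different chips that both lack V3 and SN would receive the same object.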

Thread overview: 24+ messages
2026-03-12 10:03 [PATCH net-next v4 00/13] devlink: introduce shared devlink instance for PFs on same chip Jiri Pirko
2026-03-12 10:03 ` [PATCH net-next v4 01/13] devlink: expose devlink instance index over netlink Jiri Pirko
2026-03-12 10:03 ` [PATCH net-next v4 02/13] devlink: add helpers to get bus_name/dev_name Jiri Pirko
2026-03-12 10:03 ` [PATCH net-next v4 03/13] devlink: avoid extra iterations when found devlink is not registered Jiri Pirko
2026-03-12 10:03 ` [PATCH net-next v4 04/13] devlink: allow to use devlink index as a command handle Jiri Pirko
2026-03-12 10:03 ` [PATCH net-next v4 05/13] devlink: support index-based lookup via bus_name/dev_name handle Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 06/13] devlink: support index-based notification filtering Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 07/13] devlink: introduce __devlink_alloc() with dev driver pointer Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 08/13] devlink: add devlink_dev_driver_name() helper and use it in trace events Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 09/13] devlink: add devl_warn() helper and use it in port warnings Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 10/13] devlink: allow devlink instance allocation without a backing device Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 11/13] devlink: introduce shared devlink instance for PFs on same chip Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 12/13] documentation: networking: add shared devlink documentation Jiri Pirko
2026-03-12 10:04 ` [PATCH net-next v4 13/13] net/mlx5: Add a shared devlink instance for PFs on same chip Jiri Pirko
2026-03-20 23:16   ` Adam Young
2026-03-20 23:37     ` Adam Young
2026-03-23 15:05       ` Jiri Pirko
2026-03-24 13:02         ` Jiri Pirko
2026-03-24 17:49           ` Adam Young
2026-03-24 19:57             ` Adam Young [this message]
2026-03-24 15:10   ` [PATCH net-next v4 13/15] " Ben Copeland
2026-03-24 15:21     ` Jiri Pirko
2026-03-24 15:37       ` Ben Copeland
2026-03-14 20:20 ` [PATCH net-next v4 00/13] devlink: introduce " patchwork-bot+netdevbpf
