From: Leon Romanovsky <leon@kernel.org>
To: Omer Shpigelman <oshpigelman@habana.ai>
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, zyehudai@habana.ai
Subject: Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
Date: Thu, 13 Jun 2024 22:18:28 +0300 [thread overview]
Message-ID: <20240613191828.GJ4966@unreal> (raw)
In-Reply-To: <20240613082208.1439968-12-oshpigelman@habana.ai>
On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> The driver itself is agnostic to the ASIC in action, it operates according
> to the capabilities that were passed on device initialization.
> The device is initialized by the hbl_cn driver via auxiliary bus.
> The driver also supports QP resource tracking and port/device HW counters.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
I afraid that you misinterpreted the "Co-developed-by" tag. All these
people are probably touch the code and not actually sit together at
the same room and write the code together. So, please remove the
extensive "Co-developed-by" tags.
It is not full review yet, but simple pass-by-comments.
> ---
> MAINTAINERS | 10 +
> drivers/infiniband/Kconfig | 1 +
> drivers/infiniband/hw/Makefile | 1 +
> drivers/infiniband/hw/hbl/Kconfig | 17 +
> drivers/infiniband/hw/hbl/Makefile | 8 +
> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> include/uapi/rdma/hbl-abi.h | 204 ++
> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> 12 files changed, 3904 insertions(+)
> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> create mode 100644 include/uapi/rdma/hbl-abi.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
<...>
> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> +
> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +
Please don't redefine the existing macros. Just use the existing ones.
<...>
> + if (hbl_ib_match_netdev(ibdev, netdev))
> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> + else
> + return NOTIFY_DONE;
It is not kernel coding style. Please write:
if (!hbl_ib_match_netdev(ibdev, netdev))
return NOTIFY_DONE;
ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> +
<...>
> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> + struct hbl_ib_device *hdev;
> + ktime_t timeout;
> + int rc;
> +
> + rc = hdev_init(aux_dev);
> + if (rc) {
> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> + return -EIO;
> + }
> +
> + hdev = aux_dev->priv;
> +
> + /* don't allow module unloading while it is attached */
> + if (!try_module_get(THIS_MODULE)) {
This part makes wonder, what are you trying to do here? What doesn't work for you
in standard driver core and module load mechanism?
> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> + module_name(THIS_MODULE));
> + rc = -EIO;
> + goto module_get_err;
> + }
> +
> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> + while (1) {
> + aux_ops->hw_access_lock(aux_dev);
> +
> + /* if the device is operational, proceed to actual init while holding the lock in
> + * order to prevent concurrent hard reset
> + */
> + if (aux_ops->device_operational(aux_dev))
> + break;
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> + rc = -EBUSY;
> + goto timeout_err;
> + }
> +
> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> +
> + msleep_interruptible(MSEC_PER_SEC);
> + }
The code above is unexpected.
> +
> + rc = hbl_ib_dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "Failed to init ib device\n");
> + goto dev_init_err;
> + }
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + return 0;
> +
> +dev_init_err:
> + aux_ops->hw_access_unlock(aux_dev);
> +timeout_err:
> + module_put(THIS_MODULE);
> +module_get_err:
> + hdev_fini(aux_dev);
> +
> + return rc;
> +}
<...>
> +static int __init hbl_ib_init(void)
> +{
> + pr_info("loading driver\n");
Please remove all these debug prints and leave only the necessary ones.
> +
> + return auxiliary_driver_register(&hbl_ib_driver);
> +}
> +
> +static void __exit hbl_ib_exit(void)
> +{
> + auxiliary_driver_unregister(&hbl_ib_driver);
> +
> + pr_info("driver removed\n");
> +}
> +
> +module_init(hbl_ib_init);
> +module_exit(hbl_ib_exit)
Thanks
next prev parent reply other threads:[~2024-06-13 19:18 UTC|newest]
Thread overview: 114+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 13:01 ` Przemek Kitszel
2024-06-13 14:16 ` Przemek Kitszel
2024-06-17 8:08 ` Omer Shpigelman
2024-06-17 11:48 ` Leon Romanovsky
2024-06-18 7:28 ` Omer Shpigelman
2024-06-15 0:05 ` Stephen Hemminger
2024-06-17 8:14 ` Omer Shpigelman
2024-06-17 14:05 ` Markus Elfring
2024-06-17 15:02 ` Andrew Lunn
2024-06-18 7:51 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 02/15] net: hbl_cn: memory manager component Omer Shpigelman
2024-06-13 8:21 ` [PATCH 03/15] net: hbl_cn: physical layer support Omer Shpigelman
2024-06-13 8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
2024-06-17 13:18 ` Leon Romanovsky
2024-06-18 5:50 ` Omer Shpigelman
2024-06-18 7:08 ` Leon Romanovsky
2024-06-18 7:58 ` Omer Shpigelman
2024-06-18 9:00 ` Leon Romanovsky
2024-06-24 7:24 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 05/15] net: hbl_cn: memory trace events Omer Shpigelman
2024-06-13 8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
2024-06-19 18:35 ` Sunil Kovvuri Goutham
2024-06-21 10:17 ` Omer Shpigelman
2024-06-21 10:30 ` Sunil Kovvuri Goutham
2024-06-23 7:25 ` Omer Shpigelman
2024-06-21 15:33 ` Andrew Lunn
2024-06-23 6:57 ` Omer Shpigelman
2024-06-23 15:02 ` Andrew Lunn
2024-06-24 7:21 ` Omer Shpigelman
2024-06-24 9:22 ` Leon Romanovsky
2024-12-17 10:00 ` Avri Kehat
2024-12-17 10:52 ` Andrew Lunn
2024-06-13 8:22 ` [PATCH 07/15] net: hbl_cn: gaudi2: ASIC register header files Omer Shpigelman
2024-06-13 8:22 ` [PATCH 08/15] net: hbl_cn: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
2024-06-13 21:49 ` Andrew Lunn
2024-06-18 6:58 ` Omer Shpigelman
2024-06-18 14:19 ` Andrew Lunn
2024-06-19 7:16 ` Omer Shpigelman
2024-06-19 8:01 ` Przemek Kitszel
2024-06-19 12:15 ` Omer Shpigelman
2024-06-19 15:21 ` Jakub Kicinski
2024-06-20 8:43 ` Omer Shpigelman
2024-06-20 13:51 ` Jakub Kicinski
2024-06-20 19:14 ` Andrew Lunn
2024-06-23 14:48 ` Omer Shpigelman
2024-06-19 16:13 ` Andrew Lunn
2024-06-23 6:22 ` Omer Shpigelman
2024-06-23 14:46 ` Andrew Lunn
2024-06-26 10:13 ` Omer Shpigelman
2024-06-26 14:13 ` Andrew Lunn
2024-06-30 7:11 ` Omer Shpigelman
2024-06-14 22:48 ` Joe Damato
2024-06-16 1:04 ` Andrew Lunn
2024-06-18 19:37 ` Omer Shpigelman
2024-06-18 21:19 ` Stephen Hemminger
2024-06-19 12:13 ` Omer Shpigelman
2024-06-15 0:10 ` Stephen Hemminger
2024-06-19 12:07 ` Omer Shpigelman
2024-06-15 0:16 ` Stephen Hemminger
2024-06-18 19:39 ` Omer Shpigelman
2024-06-19 15:40 ` Andrew Lunn
2024-06-20 8:36 ` Omer Shpigelman
2024-06-15 10:55 ` Zhu Yanjun
2024-06-18 11:16 ` Omer Shpigelman
2024-06-15 17:13 ` Zhu Yanjun
2024-06-16 1:08 ` Andrew Lunn
2024-06-13 8:22 ` [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
2024-06-13 19:18 ` Leon Romanovsky [this message]
2024-06-17 17:43 ` Omer Shpigelman
2024-06-17 19:04 ` Leon Romanovsky
2024-06-18 11:08 ` Omer Shpigelman
2024-06-18 12:58 ` Leon Romanovsky
2024-06-19 9:27 ` Omer Shpigelman
2024-06-19 10:52 ` Leon Romanovsky
2024-06-24 8:47 ` Omer Shpigelman
2024-06-24 9:10 ` Leon Romanovsky
2024-06-28 10:24 ` Omer Shpigelman
2024-06-30 13:29 ` Leon Romanovsky
2024-07-01 10:46 ` Omer Shpigelman
2024-07-01 12:46 ` Leon Romanovsky
2024-07-12 13:08 ` Jason Gunthorpe
2024-07-14 10:18 ` Omer Shpigelman
2024-07-16 13:40 ` Jason Gunthorpe
2024-07-17 7:08 ` Omer Shpigelman
2024-07-17 7:36 ` Leon Romanovsky
2024-07-17 10:51 ` Omer Shpigelman
2024-07-17 11:56 ` Jason Gunthorpe
2024-07-17 12:33 ` Leon Romanovsky
2024-07-18 6:54 ` Omer Shpigelman
2024-07-18 8:31 ` Leon Romanovsky
2024-06-18 16:01 ` Przemek Kitszel
2024-06-19 9:34 ` Omer Shpigelman
2024-06-17 14:17 ` Jason Gunthorpe
2024-06-19 9:39 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 12/15] RDMA/hbl: direct verbs support Omer Shpigelman
2024-07-04 10:31 ` Omer Shpigelman
2024-07-07 7:57 ` Leon Romanovsky
2024-07-07 8:42 ` Omer Shpigelman
2024-07-07 9:00 ` Leon Romanovsky
2024-07-12 13:10 ` Jason Gunthorpe
2024-07-14 6:05 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
2024-06-19 18:41 ` Sunil Kovvuri Goutham
2024-06-21 10:21 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files Omer Shpigelman
2024-06-13 8:22 ` [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support Omer Shpigelman
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
2024-06-19 11:40 ` Omer Shpigelman
2024-06-19 16:33 ` Jiri Pirko
2024-06-20 5:37 ` Omer Shpigelman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240613191828.GJ4966@unreal \
--to=leon@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=ogabbay@kernel.org \
--cc=oshpigelman@habana.ai \
--cc=zyehudai@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.