* [PATCH 00/15] Introduce HabanaLabs network drivers
@ 2024-06-13 8:21 Omer Shpigelman
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
This patch set implements the HabanaLabs network drivers for the Gaudi2 ASIC,
which is designed for scaling the training of AI neural networks.
The patch set includes the common code which is shared by all Gaudi ASICs
and the Gaudi2 ASIC specific code. Code for newer ASICs will follow.
All of these network drivers are modeled as auxiliary devices of the
parent driver.
The newly added drivers are Core Network (CN), Ethernet and InfiniBand.
All of these drivers are based on the existing habanalabs driver which
serves as the compute driver for the entire platform.
The habanalabs driver probes the network drivers, which configure the
relevant NIC HW of the device. In addition, it continuously communicates
with the CN driver to provide services which are not NIC specific,
e.g. PCI, MMU and FW communication.
See the drivers scheme at:
Documentation/networking/device_drivers/ethernet/intel/hbl.rst
The CN driver is both a parent and a child driver. It serves as the common
layer for many shared operations that are required by both the EN and IB
drivers.
The Gaudi2 NIC HW is composed of 48 physical lanes of 56Gbps each. Each
pair of lanes represents a 100Gbps logical port.
The NIC HW was designed specifically for scaling AI training.
Hence it basically functions as a regular NIC device, but it is tuned for
its dedicated purpose. As a result, the NIC HW supports Ethernet traffic
and RDMA over a modified RoCEv2 protocol.
For example, with respect to the IB driver, the HW supports a single
context and a single PD. The reason for this is that the operational use
case of AI training for Gaudi2 consists of a single user
application/process.
Another example related to the IB driver is the lack of MRs, since a single
application/process can share the entire MMU with the compute device.
Moreover, the memory allocation of user data buffers which are used for
RDMA communication is done via the habanalabs compute driver uAPI.
With respect to the Ethernet driver, since the Ethernet flow is used
mainly for control, the HW is not performance-tuned, e.g. it assumes
contiguous memory for the Rx buffers. Thus the EN driver needs to copy the
Rx packets from the Rx buffer into the skb memory.
The first 8 patches implement the CN driver.
The next 2 patches implement the EN driver.
The next 2 patches implement the IB driver.
The last 3 patches modify the compute driver to support the CN driver.
The patches are rebased on v6.10-rc3 tag:
https://github.com/torvalds/linux/releases/tag/v6.10-rc3
The patches are also available at:
https://github.com/HabanaAI/drivers.gpu.linux-nic.kernel/tree/hbl_next
The user-mode of the driver is being reviewed at:
https://github.com/linux-rdma/rdma-core/pull/1472
Any feedback, comment or question is welcome.
Thanks,
Omer
Omer Shpigelman (15):
net: hbl_cn: add habanalabs Core Network driver
net: hbl_cn: memory manager component
net: hbl_cn: physical layer support
net: hbl_cn: QP state machine
net: hbl_cn: memory trace events
net: hbl_cn: debugfs support
net: hbl_cn: gaudi2: ASIC register header files
net: hbl_cn: gaudi2: ASIC specific support
net: hbl_en: add habanalabs Ethernet driver
net: hbl_en: gaudi2: ASIC specific support
RDMA/hbl: add habanalabs RDMA driver
RDMA/hbl: direct verbs support
accel/habanalabs: network scaling support
accel/habanalabs/gaudi2: CN registers header files
accel/habanalabs/gaudi2: network scaling support
.../ABI/testing/debugfs-driver-habanalabs_cn | 195 +
.../device_drivers/ethernet/index.rst | 1 +
.../device_drivers/ethernet/intel/hbl.rst | 82 +
MAINTAINERS | 33 +
drivers/accel/habanalabs/Kconfig | 1 +
drivers/accel/habanalabs/Makefile | 3 +
drivers/accel/habanalabs/cn/Makefile | 2 +
drivers/accel/habanalabs/cn/cn.c | 815 +
drivers/accel/habanalabs/cn/cn.h | 133 +
.../habanalabs/common/command_submission.c | 2 +-
drivers/accel/habanalabs/common/device.c | 23 +
drivers/accel/habanalabs/common/firmware_if.c | 20 +
drivers/accel/habanalabs/common/habanalabs.h | 43 +-
.../accel/habanalabs/common/habanalabs_drv.c | 37 +-
.../habanalabs/common/habanalabs_ioctl.c | 2 +
drivers/accel/habanalabs/common/memory.c | 123 +
drivers/accel/habanalabs/gaudi/gaudi.c | 14 +-
drivers/accel/habanalabs/gaudi2/Makefile | 2 +-
drivers/accel/habanalabs/gaudi2/gaudi2.c | 440 +-
drivers/accel/habanalabs/gaudi2/gaudi2P.h | 41 +-
drivers/accel/habanalabs/gaudi2/gaudi2_cn.c | 424 +
drivers/accel/habanalabs/gaudi2/gaudi2_cn.h | 42 +
.../habanalabs/gaudi2/gaudi2_coresight.c | 145 +-
.../accel/habanalabs/gaudi2/gaudi2_security.c | 16 +-
drivers/accel/habanalabs/goya/goya.c | 6 +
.../include/gaudi2/asic_reg/gaudi2_regs.h | 10 +-
.../include/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
.../nic0_qm0_axuser_nonsecured_regs.h | 61 +
.../include/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
.../include/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
.../include/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
.../include/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
.../include/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
.../include/hw_ip/nic/nic_general.h | 15 +
drivers/infiniband/Kconfig | 1 +
drivers/infiniband/hw/Makefile | 1 +
drivers/infiniband/hw/hbl/Kconfig | 18 +
drivers/infiniband/hw/hbl/Makefile | 12 +
drivers/infiniband/hw/hbl/hbl.h | 326 +
drivers/infiniband/hw/hbl/hbl_encap.c | 216 +
drivers/infiniband/hw/hbl/hbl_main.c | 493 +
drivers/infiniband/hw/hbl/hbl_query_port.c | 96 +
drivers/infiniband/hw/hbl/hbl_set_port_ex.c | 96 +
drivers/infiniband/hw/hbl/hbl_usr_fifo.c | 252 +
drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 +
drivers/net/ethernet/intel/Kconfig | 38 +
drivers/net/ethernet/intel/Makefile | 2 +
drivers/net/ethernet/intel/hbl_cn/Makefile | 14 +
.../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5984 ++
.../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1666 +
.../intel/hbl_cn/common/hbl_cn_debugfs.c | 1457 +
.../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 240 +
.../intel/hbl_cn/common/hbl_cn_memory.c | 368 +
.../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 234 +
.../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 491 +
.../net/ethernet/intel/hbl_cn/gaudi2/Makefile | 3 +
.../asic_reg/arc_farm_kdma_ctx_axuser_masks.h | 135 +
.../asic_reg/dcore0_sync_mngr_objs_regs.h | 43543 +++++++++++++++
.../asic_reg/gaudi2_blocks_linux_driver.h | 45068 ++++++++++++++++
.../hbl_cn/gaudi2/asic_reg/gaudi2_regs.h | 77 +
.../asic_reg/nic0_mac_ch0_mac_128_masks.h | 339 +
.../asic_reg/nic0_mac_ch0_mac_128_regs.h | 101 +
.../asic_reg/nic0_mac_ch0_mac_pcs_masks.h | 713 +
.../asic_reg/nic0_mac_ch0_mac_pcs_regs.h | 271 +
.../asic_reg/nic0_mac_ch1_mac_pcs_regs.h | 271 +
.../asic_reg/nic0_mac_ch2_mac_pcs_regs.h | 271 +
.../asic_reg/nic0_mac_ch3_mac_pcs_regs.h | 271 +
.../nic0_mac_glob_stat_control_reg_masks.h | 67 +
.../nic0_mac_glob_stat_control_reg_regs.h | 37 +
.../asic_reg/nic0_mac_glob_stat_rx0_regs.h | 93 +
.../asic_reg/nic0_mac_glob_stat_rx2_regs.h | 93 +
.../asic_reg/nic0_mac_glob_stat_tx0_regs.h | 75 +
.../asic_reg/nic0_mac_glob_stat_tx2_regs.h | 75 +
.../gaudi2/asic_reg/nic0_mac_rs_fec_regs.h | 157 +
.../hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h | 77 +
.../hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
.../nic0_qm0_axuser_nonsecured_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_cong_que_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_db_fifo_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_err_fifo_regs.h | 61 +
.../nic0_qpc0_axuser_ev_que_lbw_intr_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_qpc_req_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h | 61 +
.../asic_reg/nic0_qpc0_axuser_rxwqe_regs.h | 61 +
.../nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h | 61 +
.../nic0_qpc0_dbfifo0_ci_upd_addr_regs.h | 27 +
.../nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h | 27 +
.../hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h | 963 +
.../hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h | 905 +
.../hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
.../gaudi2/asic_reg/nic0_rxb_core_masks.h | 459 +
.../gaudi2/asic_reg/nic0_rxb_core_regs.h | 665 +
.../nic0_rxe0_axuser_axuser_cq0_regs.h | 61 +
.../nic0_rxe0_axuser_axuser_cq1_regs.h | 61 +
.../hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h | 705 +
.../hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
.../asic_reg/nic0_rxe0_wqe_aruser_regs.h | 61 +
.../hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
.../gaudi2/asic_reg/nic0_serdes0_masks.h | 7163 +++
.../gaudi2/asic_reg/nic0_serdes0_regs.h | 1679 +
.../gaudi2/asic_reg/nic0_serdes1_regs.h | 1679 +
.../asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h | 61 +
.../nic0_tmr_axuser_tmr_free_list_regs.h | 61 +
.../asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h | 61 +
.../hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h | 361 +
.../hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h | 183 +
.../hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h | 167 +
.../hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h | 759 +
.../hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
.../hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h | 555 +
.../hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
.../nic0_umr0_0_completion_queue_ci_1_regs.h | 27 +
.../nic0_umr0_0_unsecure_doorbell0_regs.h | 31 +
.../nic0_umr0_0_unsecure_doorbell1_regs.h | 31 +
.../gaudi2/asic_reg/prt0_mac_core_masks.h | 137 +
.../gaudi2/asic_reg/prt0_mac_core_regs.h | 67 +
.../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c | 5689 ++
.../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h | 427 +
.../intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c | 319 +
.../intel/hbl_cn/gaudi2/gaudi2_cn_eq.c | 732 +
.../intel/hbl_cn/gaudi2/gaudi2_cn_phy.c | 2743 +
drivers/net/ethernet/intel/hbl_en/Makefile | 12 +
.../net/ethernet/intel/hbl_en/common/Makefile | 3 +
.../net/ethernet/intel/hbl_en/common/hbl_en.c | 1170 +
.../net/ethernet/intel/hbl_en/common/hbl_en.h | 208 +
.../intel/hbl_en/common/hbl_en_dcbnl.c | 101 +
.../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +
.../intel/hbl_en/common/hbl_en_ethtool.c | 452 +
.../net/ethernet/intel/hbl_en/gaudi2/Makefile | 2 +
.../ethernet/intel/hbl_en/gaudi2/gaudi2_en.c | 728 +
.../ethernet/intel/hbl_en/gaudi2/gaudi2_en.h | 53 +
.../intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c | 32 +
include/linux/habanalabs/cpucp_if.h | 125 +-
include/linux/habanalabs/hl_boot_if.h | 9 +-
include/linux/net/intel/cn.h | 474 +
include/linux/net/intel/cn_aux.h | 298 +
include/linux/net/intel/cni.h | 636 +
include/linux/net/intel/gaudi2.h | 432 +
include/linux/net/intel/gaudi2_aux.h | 94 +
include/trace/events/habanalabs_cn.h | 116 +
include/uapi/drm/habanalabs_accel.h | 10 +-
include/uapi/rdma/hbl-abi.h | 204 +
include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
146 files changed, 148514 insertions(+), 70 deletions(-)
create mode 100644 Documentation/ABI/testing/debugfs-driver-habanalabs_cn
create mode 100644 Documentation/networking/device_drivers/ethernet/intel/hbl.rst
create mode 100644 drivers/accel/habanalabs/cn/Makefile
create mode 100644 drivers/accel/habanalabs/cn/cn.c
create mode 100644 drivers/accel/habanalabs/cn/cn.h
create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.c
create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h
create mode 100644 drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h
create mode 100644 drivers/infiniband/hw/hbl/Kconfig
create mode 100644 drivers/infiniband/hw/hbl/Makefile
create mode 100644 drivers/infiniband/hw/hbl/hbl.h
create mode 100644 drivers/infiniband/hw/hbl/hbl_encap.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_query_port.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_set_port_ex.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_usr_fifo.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/arc_farm_kdma_ctx_axuser_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/dcore0_sync_mngr_objs_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/gaudi2_blocks_linux_driver.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/gaudi2_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_128_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_128_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_pcs_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_pcs_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch1_mac_pcs_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch2_mac_pcs_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch3_mac_pcs_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_control_reg_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_control_reg_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_rx0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_rx2_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_tx0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_tx2_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_rs_fec_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_cong_que_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_db_fifo_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_err_fifo_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_ev_que_lbw_intr_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_qpc_req_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_rxwqe_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_dbfifo0_ci_upd_addr_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxb_core_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxb_core_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_axuser_axuser_cq0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_axuser_axuser_cq1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_wqe_aruser_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes0_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_free_list_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_completion_queue_ci_1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_unsecure_doorbell0_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_unsecure_doorbell1_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/prt0_mac_core_masks.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/prt0_mac_core_regs.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c
create mode 100644 include/linux/net/intel/cn.h
create mode 100644 include/linux/net/intel/cn_aux.h
create mode 100644 include/linux/net/intel/cni.h
create mode 100644 include/linux/net/intel/gaudi2.h
create mode 100644 include/linux/net/intel/gaudi2_aux.h
create mode 100644 include/trace/events/habanalabs_cn.h
create mode 100644 include/uapi/rdma/hbl-abi.h
create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
--
2.34.1
* [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add the hbl_cn driver which will serve both Ethernet and InfiniBand
drivers.
hbl_cn is the layer used by the satellite drivers for many shared
operations that are needed by both the EN and IB subsystems, e.g. QPs, CQs etc.
The CN driver is initialized via auxiliary bus by the habanalabs driver.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../device_drivers/ethernet/index.rst | 1 +
.../device_drivers/ethernet/intel/hbl.rst | 82 +
MAINTAINERS | 11 +
drivers/net/ethernet/intel/Kconfig | 20 +
drivers/net/ethernet/intel/Makefile | 1 +
drivers/net/ethernet/intel/hbl_cn/Makefile | 9 +
.../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5954 +++++++++++++++++
.../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1627 +++++
.../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 220 +
.../intel/hbl_cn/common/hbl_cn_memory.c | 40 +
.../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 33 +
.../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 13 +
include/linux/habanalabs/cpucp_if.h | 125 +-
include/linux/habanalabs/hl_boot_if.h | 9 +-
include/linux/net/intel/cn.h | 474 ++
include/linux/net/intel/cn_aux.h | 298 +
include/linux/net/intel/cni.h | 636 ++
18 files changed, 9545 insertions(+), 11 deletions(-)
create mode 100644 Documentation/networking/device_drivers/ethernet/intel/hbl.rst
create mode 100644 drivers/net/ethernet/intel/hbl_cn/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
create mode 100644 include/linux/net/intel/cn.h
create mode 100644 include/linux/net/intel/cn_aux.h
create mode 100644 include/linux/net/intel/cni.h
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 6932d8c043c2..2fe975aae7b1 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -40,6 +40,7 @@ Contents:
intel/i40e
intel/iavf
intel/ice
+ intel/hbl
marvell/octeontx2
marvell/octeon_ep
marvell/octeon_ep_vf
diff --git a/Documentation/networking/device_drivers/ethernet/intel/hbl.rst b/Documentation/networking/device_drivers/ethernet/intel/hbl.rst
new file mode 100644
index 000000000000..ebf436d0da7c
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/intel/hbl.rst
@@ -0,0 +1,82 @@
+.. SPDX-License-Identifier: GPL-2.0
+======================================================================
+Linux kernel drivers for HabanaLabs (an Intel company) Network
+======================================================================
+
+Copyright 2020-2024 HabanaLabs, Ltd.
+Copyright(c) 2023-2024 Intel Corporation.
+
+Contents
+========
+
+- Overview
+- Identifying Your Adapter
+- Important Notes
+- Support
+- Trademarks
+
+Overview
+========
+
+These drivers enable the infrastructure for network functionality which is part
+of the GAUDI ASIC family of AI Accelerators.
+
+The network interfaces are mainly used for scaling the training of AI neural
+networks through the RoCEv2 protocol.
+
+Driver information can be obtained using lspci.
+
+For questions related to hardware requirements, refer to the documentation
+supplied with your HabanaLabs adapter. All hardware requirements listed apply to
+use with Linux.
+
+The included network drivers are Core Network, Ethernet and InfiniBand, all of
+which are based on the habanalabs driver that serves as the compute driver and
+the general platform.
+This is the drivers scheme:
+
++------------+ | +-----+ | +-------+
+| INFINIBAND | | | NET | | | ACCEL |
++------------+ | +-----+ | +-------+
+ | |
+ +----+ | +----+ +----+ | +------------+
+ | IB | | | EN <---| CN | | | HABANALABS |
+ +-^--+ | +----+ +---^+ | +------------+
+ | | | |
+ +-----------------------+ +--------------+
+
+The parent driver is at the arrow base and the child is at the arrow head.
+The CN driver is both a parent and a child driver.
+
+Identifying Your Adapter
+========================
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+https://www.intel.com/support
+
+Important Notes
+===============
+
+hbl_cn's main goal is to provide core functionality that is shared between the
+Ethernet (hbl_en) and the InfiniBand (hbl) drivers.
+It contains all core logic that is needed to operate the satellite drivers while
+keeping them as minimal as possible. Only pure Ethernet/InfiniBand code should
+reside in the satellite drivers.
+This code structure ensures a single and common infrastructure layer for both
+functionalities which makes it easier to modify and maintain.
+
+Support
+=======
+For general information, go to the Intel support website at:
+https://www.intel.com/support/
+
+If an issue is identified with the released source code on a supported kernel
+with a supported adapter, email the specific information related to the issue
+to intel-wired-lan@lists.osuosl.org.
+
+Trademarks
+==========
+Intel is a trademark or registered trademark of Intel Corporation or its
+subsidiaries in the United States and/or other countries.
+
+* Other names and brands may be claimed as the property of others.
diff --git a/MAINTAINERS b/MAINTAINERS
index 8754ac2c259d..e948e33e990d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9604,6 +9604,17 @@ F: include/linux/habanalabs/
F: include/trace/events/habanalabs.h
F: include/uapi/drm/habanalabs_accel.h
+HABANALABS CORE NETWORK DRIVER
+M: Omer Shpigelman <oshpigelman@habana.ai>
+L: netdev@vger.kernel.org
+L: linux-rdma@vger.kernel.org
+S: Supported
+W: https://www.habana.ai
+F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
+F: drivers/net/ethernet/intel/hbl_cn/
+F: include/linux/habanalabs/
+F: include/linux/net/intel/cn*
+
HACKRF MEDIA DRIVER
L: linux-media@vger.kernel.org
S: Orphan
diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index e0287fbd501d..0d1b8a2bae99 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -397,4 +397,24 @@ config IDPF
To compile this driver as a module, choose M here. The module
will be called idpf.
+config HABANA_CN
+ tristate "HabanaLabs (an Intel Company) Core Network driver"
+ depends on X86_64
+ depends on PCI
+ select AUXILIARY_BUS
+ default n
+ help
+ This driver enables the infrastructure for network functionality that
+ is part of the GAUDI ASIC family of AI Accelerators.
+ For more information on how to identify your adapter, go to the
+ Adapter & Driver ID Guide that can be located at:
+
+ <http://support.intel.com>
+
+ More specific information on configuring the driver is in
+ <file:Documentation/networking/device_drivers/ethernet/intel/hbl.rst>.
+
+ To compile this driver as a module, choose M here. The module
+ will be called habanalabs_cn.
+
endif # NET_VENDOR_INTEL
diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
index 04c844ef4964..10049a28e336 100644
--- a/drivers/net/ethernet/intel/Makefile
+++ b/drivers/net/ethernet/intel/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_IAVF) += iavf/
obj-$(CONFIG_FM10K) += fm10k/
obj-$(CONFIG_ICE) += ice/
obj-$(CONFIG_IDPF) += idpf/
+obj-$(CONFIG_HABANA_CN) += hbl_cn/
diff --git a/drivers/net/ethernet/intel/hbl_cn/Makefile b/drivers/net/ethernet/intel/hbl_cn/Makefile
new file mode 100644
index 000000000000..84ee2a6d7c3b
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for HabanaLabs (an Intel Company) core network driver
+#
+
+obj-$(CONFIG_HABANA_CN) := habanalabs_cn.o
+
+include $(src)/common/Makefile
+habanalabs_cn-y += $(HBL_CN_COMMON_FILES)
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/Makefile b/drivers/net/ethernet/intel/hbl_cn/common/Makefile
new file mode 100644
index 000000000000..c8cf092db786
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+HBL_CN_COMMON_FILES := common/hbl_cn_drv.o common/hbl_cn.o \
+ common/hbl_cn_phy.o common/hbl_cn_qp.o common/hbl_cn_memory.o
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
new file mode 100644
index 000000000000..946b11bfa61b
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
@@ -0,0 +1,5954 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_cn.h"
+
+#include <linux/file.h>
+#include <linux/module.h>
+#include <linux/overflow.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+
+#define NIC_MIN_WQS_PER_PORT 2
+
+#define NIC_SEQ_RESETS_TIMEOUT_MS 15000 /* 15 seconds */
+#define NIC_MAX_SEQ_RESETS 3
+
+#define HBL_CN_IPV4_PROTOCOL_UDP 17
+
+/* SOB mask is not expected to change across ASIC. Hence common defines. */
+#define NIC_SOB_INC_MASK 0x80000000
+#define NIC_SOB_VAL_MASK 0x7fff
+
+#define NIC_DUMP_QP_SZ SZ_4K
+
+#define HBL_AUX2NIC(aux_dev) \
+ ({ \
+ struct hbl_aux_dev *__aux_dev = (aux_dev); \
+ ((__aux_dev)->type == HBL_AUX_DEV_ETH) ? \
+ container_of(__aux_dev, struct hbl_cn_device, en_aux_dev) : \
+ container_of(__aux_dev, struct hbl_cn_device, ib_aux_dev); \
+ })
+
+#define RAND_STAT_CNT(cnt) \
+ do { \
+ u32 __cnt = get_random_u32(); \
+ (cnt) = __cnt; \
+ dev_info(hdev->dev, "port %d, %s: %u\n", port, #cnt, __cnt); \
+ } while (0)
+
+struct hbl_cn_stat hbl_cn_mac_fec_stats[] = {
+ {"correctable_errors", 0x2, 0x3},
+ {"uncorrectable_errors", 0x4, 0x5}
+};
+
+struct hbl_cn_stat hbl_cn_mac_stats_rx[] = {
+ {"Octets", 0x0},
+ {"OctetsReceivedOK", 0x4},
+ {"aAlignmentErrors", 0x8},
+ {"aPAUSEMACCtrlFramesReceived", 0xC},
+ {"aFrameTooLongErrors", 0x10},
+ {"aInRangeLengthErrors", 0x14},
+ {"aFramesReceivedOK", 0x18},
+ {"aFrameCheckSequenceErrors", 0x1C},
+ {"VLANReceivedOK", 0x20},
+ {"ifInErrors", 0x24},
+ {"ifInUcastPkts", 0x28},
+ {"ifInMulticastPkts", 0x2C},
+ {"ifInBroadcastPkts", 0x30},
+ {"DropEvents", 0x34},
+ {"Pkts", 0x38},
+ {"UndersizePkts", 0x3C},
+ {"Pkts64Octets", 0x40},
+ {"Pkts65to127Octets", 0x44},
+ {"Pkts128to255Octets", 0x48},
+ {"Pkts256to511Octets", 0x4C},
+ {"Pkts512to1023Octets", 0x50},
+ {"Pkts1024to1518Octets", 0x54},
+ {"Pkts1519toMaxOctets", 0x58},
+ {"OversizePkts", 0x5C},
+ {"Jabbers", 0x60},
+ {"Fragments", 0x64},
+ {"aCBFCPAUSERx0", 0x68},
+ {"aCBFCPAUSERx1", 0x6C},
+ {"aCBFCPAUSERx2", 0x70},
+ {"aCBFCPAUSERx3", 0x74},
+ {"aCBFCPAUSERx4", 0x78},
+ {"aCBFCPAUSERx5", 0x7C},
+ {"aCBFCPAUSERx6", 0x80},
+ {"aCBFCPAUSERx7", 0x84},
+ {"aMACControlFramesReceived", 0x88}
+};
+
+struct hbl_cn_stat hbl_cn_mac_stats_tx[] = {
+ {"Octets", 0x0},
+ {"OctetsTransmittedOK", 0x4},
+ {"aPAUSEMACCtrlFramesTransmitted", 0x8},
+ {"aFramesTransmittedOK", 0xC},
+ {"VLANTransmittedOK", 0x10},
+ {"ifOutErrors", 0x14},
+ {"ifOutUcastPkts", 0x18},
+ {"ifOutMulticastPkts", 0x1C},
+ {"ifOutBroadcastPkts", 0x20},
+ {"Pkts64Octets", 0x24},
+ {"Pkts65to127Octets", 0x28},
+ {"Pkts128to255Octets", 0x2C},
+ {"Pkts256to511Octets", 0x30},
+ {"Pkts512to1023Octets", 0x34},
+ {"Pkts1024to1518Octets", 0x38},
+ {"Pkts1519toMaxOctets", 0x3C},
+ {"aCBFCPAUSETx0", 0x40},
+ {"aCBFCPAUSETx1", 0x44},
+ {"aCBFCPAUSETx2", 0x48},
+ {"aCBFCPAUSETx3", 0x4C},
+ {"aCBFCPAUSETx4", 0x50},
+ {"aCBFCPAUSETx5", 0x54},
+ {"aCBFCPAUSETx6", 0x58},
+ {"aCBFCPAUSETx7", 0x5C},
+ {"aMACControlFramesTx", 0x60},
+ {"Pkts", 0x64}
+};
+
+static const char pcs_counters_str[][ETH_GSTRING_LEN] = {
+ {"pcs_local_faults"},
+ {"pcs_remote_faults"},
+ {"pcs_remote_fault_reconfig"},
+ {"pcs_link_restores"},
+ {"pcs_link_toggles"},
+};
+
+static size_t pcs_counters_str_len = ARRAY_SIZE(pcs_counters_str);
+size_t hbl_cn_mac_fec_stats_len = ARRAY_SIZE(hbl_cn_mac_fec_stats);
+size_t hbl_cn_mac_stats_rx_len = ARRAY_SIZE(hbl_cn_mac_stats_rx);
+size_t hbl_cn_mac_stats_tx_len = ARRAY_SIZE(hbl_cn_mac_stats_tx);
+
+static void qps_stop(struct hbl_cn_device *hdev);
+static void qp_destroy_work(struct work_struct *work);
+static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
+static void user_cq_destroy(struct kref *kref);
+static void set_app_params_clear(struct hbl_cn_device *hdev);
+static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
+ void *output);
+static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
+ struct hbl_ib_mem_info *info);
+
+static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port);
+static void hbl_cn_late_init(struct hbl_cn_device *hdev);
+static void hbl_cn_late_fini(struct hbl_cn_device *hdev);
+static int hbl_cn_sw_init(struct hbl_cn_device *hdev);
+static void hbl_cn_sw_fini(struct hbl_cn_device *hdev);
+static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full);
+static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags);
+static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port);
+
+static int hbl_cn_request_irqs(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->request_irqs(hdev);
+}
+
+static void hbl_cn_free_irqs(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->free_irqs(hdev);
+}
+
+static void hbl_cn_synchronize_irqs(struct hbl_aux_dev *cn_aux_dev)
+{
+ struct hbl_cn_device *hdev = cn_aux_dev->priv;
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->synchronize_irqs(hdev);
+}
+
+void hbl_cn_get_frac_info(u64 numerator, u64 denominator, u64 *integer, u64 *exp)
+{
+ u64 high_digit_n, high_digit_d, integer_tmp, exp_tmp;
+ u8 num_digits_n, num_digits_d;
+ int i;
+
+ num_digits_d = hbl_cn_get_num_of_digits(denominator);
+ high_digit_d = denominator;
+ for (i = 0; i < num_digits_d - 1; i++)
+ high_digit_d /= 10;
+
+ integer_tmp = 0;
+ exp_tmp = 0;
+
+ if (numerator) {
+ num_digits_n = hbl_cn_get_num_of_digits(numerator);
+ high_digit_n = numerator;
+ for (i = 0; i < num_digits_n - 1; i++)
+ high_digit_n /= 10;
+
+ exp_tmp = num_digits_d - num_digits_n;
+
+ if (high_digit_n < high_digit_d) {
+ high_digit_n *= 10;
+ exp_tmp++;
+ }
+
+ integer_tmp = div_u64(high_digit_n, high_digit_d);
+ }
+
+ *integer = integer_tmp;
+ *exp = exp_tmp;
+}
+
+int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_stat *ignore;
+ int rc;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->spmu_get_stats_info(cn_port, &ignore, num_out_data);
+
+	/* this function can be called from ethtool, the get_statistics ioctl and the FW status thread */
+ mutex_lock(&cn_port->cnt_lock);
+ rc = port_funcs->spmu_sample(cn_port, *num_out_data, out_data);
+ mutex_unlock(&cn_port->cnt_lock);
+
+ return rc;
+}
+
+static u32 hbl_cn_get_port_toggle_cnt(struct hbl_cn_port *cn_port)
+{
+	/* We should not count the first toggle, as it only marks that the port was
+	 * brought up for the first time. In case the port connection was never
+	 * established, the counter should be 0.
+	 */
+ return cn_port->port_toggle_cnt ? cn_port->port_toggle_cnt - 1 : 0;
+}
+
+static int __hbl_cn_get_cnts_num(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+
+ return pcs_counters_str_len +
+ hdev->asic_funcs->port_funcs->get_cnts_num(cn_port);
+}
+
+static void __hbl_cn_get_cnts_names(struct hbl_cn_port *cn_port, u8 *data, bool ext)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ int i, len;
+
+ len = ext ? HBL_IB_CNT_NAME_LEN : ETH_GSTRING_LEN;
+
+ for (i = 0; i < pcs_counters_str_len; i++)
+ memcpy(data + i * len, pcs_counters_str[i], ETH_GSTRING_LEN);
+ data += i * len;
+
+ hdev->asic_funcs->port_funcs->get_cnts_names(cn_port, data, ext);
+}
+
+static void __hbl_cn_get_cnts_values(struct hbl_cn_port *cn_port, u64 *data)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+
+ data[0] = cn_port->pcs_local_fault_cnt;
+ data[1] = cn_port->pcs_remote_fault_cnt;
+ data[2] = cn_port->pcs_remote_fault_reconfig_cnt;
+ data[3] = cn_port->pcs_link_restore_cnt;
+ data[4] = hbl_cn_get_port_toggle_cnt(cn_port);
+
+ data += pcs_counters_str_len;
+
+ hdev->asic_funcs->port_funcs->get_cnts_values(cn_port, data);
+}
+
+static int __hbl_cn_port_hw_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+
+ if (cn_port->disabled) {
+ dev_err(hdev->dev, "Port %u is disabled\n", cn_port->port);
+ return -EPERM;
+ }
+
+ hbl_cn_reset_stats_counters_port(hdev, cn_port->port);
+
+ return hdev->asic_funcs->port_funcs->port_hw_init(cn_port);
+}
+
+static void __hbl_cn_port_hw_fini(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = hdev->asic_funcs;
+
+	/* in hard reset, the QPs were already stopped by hbl_cn_stop, which is called from halt engines */
+ if (hdev->operational)
+ hbl_cn_qps_stop(cn_port);
+
+ asic_funcs->port_funcs->port_hw_fini(cn_port);
+}
+
+bool hbl_cn_comp_device_operational(struct hbl_cn_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+ struct hbl_cn_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->device_operational(aux_dev);
+}
+
+void hbl_cn_spmu_get_stats_info(struct hbl_cn_port *cn_port, struct hbl_cn_stat **stats,
+ u32 *n_stats)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+
+ hdev->asic_funcs->port_funcs->spmu_get_stats_info(cn_port, stats, n_stats);
+}
+
+int hbl_cn_reserve_dva_block(struct hbl_cn_ctx *ctx, u64 size, u64 *dva)
+{
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->vm_reserve_dva_block(aux_dev, ctx->driver_vm_info.vm_handle, size, dva);
+}
+
+void hbl_cn_unreserve_dva_block(struct hbl_cn_ctx *ctx, u64 dva, u64 size)
+{
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->vm_unreserve_dva_block(aux_dev, ctx->driver_vm_info.vm_handle, dva, size);
+}
+
+int hbl_cn_get_hw_block_handle(struct hbl_cn_device *hdev, u64 address, u64 *handle)
+{
+ return hdev->asic_funcs->get_hw_block_handle(hdev, address, handle);
+}
+
+static int hbl_cn_get_hw_block_addr(struct hbl_cn_device *hdev, u64 handle, u64 *addr, u64 *size)
+{
+ return hdev->asic_funcs->get_hw_block_addr(hdev, handle, addr, size);
+}
+
+int hbl_cn_send_cpucp_packet(struct hbl_cn_device *hdev, u32 port, enum cpucp_packet_id pkt_id,
+ int val)
+{
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return hdev->asic_funcs->port_funcs->send_cpucp_packet(cn_port, pkt_id, val);
+}
+
+static bool hbl_cn_device_operational(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+
+ return hdev->operational;
+}
+
+static void hbl_cn_hw_access_lock(struct hbl_aux_dev *aux_dev)
+ __acquires(&hdev->hw_access_lock)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+
+ mutex_lock(&hdev->hw_access_lock);
+}
+
+static void hbl_cn_hw_access_unlock(struct hbl_aux_dev *aux_dev)
+ __releases(&hdev->hw_access_lock)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+
+ mutex_unlock(&hdev->hw_access_lock);
+}
+
+static bool hbl_cn_is_eth_lpbk(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+
+ return hdev->eth_loopback;
+}
+
+static int hbl_cn_port_hw_init(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return __hbl_cn_port_hw_init(cn_port);
+}
+
+static void hbl_cn_port_hw_fini(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ __hbl_cn_port_hw_fini(cn_port);
+}
+
+static int hbl_cn_phy_port_init(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return hdev->asic_funcs->port_funcs->phy_port_init(cn_port);
+}
+
+static void hbl_cn_phy_port_fini(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ hdev->asic_funcs->port_funcs->phy_port_fini(cn_port);
+}
+
+static int hbl_cn_set_pfc(struct hbl_aux_dev *aux_dev, u32 port, bool enable)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ cn_port->pfc_enable = enable;
+
+ return hdev->asic_funcs->port_funcs->set_pfc(cn_port);
+}
+
+static int hbl_cn_get_cnts_num(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return __hbl_cn_get_cnts_num(cn_port);
+}
+
+static void hbl_cn_get_cnts_names(struct hbl_aux_dev *aux_dev, u32 port, u8 *data)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ __hbl_cn_get_cnts_names(cn_port, data, false);
+}
+
+static void hbl_cn_get_cnts_values(struct hbl_aux_dev *aux_dev, u32 port, u64 *data)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ __hbl_cn_get_cnts_values(cn_port, data);
+}
+
+static bool hbl_cn_get_mac_lpbk(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return cn_port->mac_loopback;
+}
+
+static int hbl_cn_set_mac_lpbk(struct hbl_aux_dev *aux_dev, u32 port, bool enable)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ if (atomic_read(&cn_port->num_of_allocated_qps)) {
+ dev_dbg(hdev->dev,
+ "There are active QPs under this port - Can't %s mac loopback\n",
+ enable ? "enable" : "disable");
+ return -EBUSY;
+ }
+
+ cn_port->mac_loopback = enable;
+
+ if (enable)
+ hdev->mac_loopback |= BIT(port);
+ else
+ hdev->mac_loopback &= ~BIT(port);
+
+ return 0;
+}
+
+static int hbl_cn_update_mtu(struct hbl_aux_dev *aux_dev, u32 port, u32 mtu)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ unsigned long qp_id = 0;
+ struct hbl_cn_qp *qp;
+ int rc = 0;
+
+ cn_port = &hdev->cn_ports[port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+ mtu += HBL_EN_MAX_HEADERS_SZ;
+
+ port_funcs->cfg_lock(cn_port);
+ xa_for_each(&cn_port->qp_ids, qp_id, qp) {
+ if (qp->mtu_type == MTU_FROM_NETDEV && qp->mtu != mtu) {
+ rc = port_funcs->update_qp_mtu(cn_port, qp, mtu);
+ if (rc) {
+				dev_err(hdev->dev, "Failed to update MTU, port: %d, qpn: %lu, %d\n",
+ port, qp_id, rc);
+ break;
+ }
+ }
+ }
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int hbl_cn_qpc_write(struct hbl_aux_dev *aux_dev, u32 port, void *qpc,
+ struct qpc_mask *qpc_mask, u32 qpn, bool is_req)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ int rc;
+
+ cn_port = &hdev->cn_ports[port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->cfg_lock(cn_port);
+ rc = port_funcs->qpc_write(cn_port, qpc, qpc_mask, qpn, is_req);
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static void hbl_cn_ctrl_lock(struct hbl_aux_dev *aux_dev, u32 port)
+ __acquires(&cn_port->control_lock)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ mutex_lock(&cn_port->control_lock);
+}
+
+static void hbl_cn_ctrl_unlock(struct hbl_aux_dev *aux_dev, u32 port)
+ __releases(&cn_port->control_lock)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ mutex_unlock(&cn_port->control_lock);
+}
+
+static int hbl_cn_dispatcher_register_qp(struct hbl_aux_dev *aux_dev, u32 port, u32 asid, u32 qp_id)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return hbl_cn_eq_dispatcher_register_qp(cn_port, asid, qp_id);
+}
+
+static int hbl_cn_dispatcher_unregister_qp(struct hbl_aux_dev *aux_dev, u32 port, u32 qp_id)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return hbl_cn_eq_dispatcher_unregister_qp(cn_port, qp_id);
+}
+
+static u32 hbl_cn_get_speed(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ return cn_port->speed;
+}
+
+static void hbl_cn_track_ext_port_reset(struct hbl_aux_dev *aux_dev, u32 port, u32 syndrome)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ hbl_cn_track_port_reset(cn_port, syndrome);
+}
+
+static void hbl_cn_port_toggle_count(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ cn_port->port_toggle_cnt++;
+}
+
+/* Check for initialized hbl IB device. */
+bool hbl_cn_is_ibdev(struct hbl_cn_device *hdev)
+{
+ return !!hdev->ib_aux_dev.priv;
+}
+
+/* Check for opened hbl IB device. */
+static bool hbl_cn_is_ibdev_opened(struct hbl_cn_device *hdev)
+{
+ return hdev->ib_aux_dev.priv && hdev->ib_device_opened;
+}
+
+static int hbl_cn_ib_alloc_ucontext(struct hbl_aux_dev *ib_aux_dev, int user_fd, void **cn_ib_ctx)
+{
+ struct hbl_cn_comp_vm_info *user_vm_info, *driver_vm_info;
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(ib_aux_dev);
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_cn_ctx *ctx;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+ aux_ops = aux_dev->aux_ops;
+
+ if (!hdev->multi_ctx_support && hdev->ctx) {
+ dev_err(hdev->dev, "There is already an active user context\n");
+ return -EBUSY;
+ }
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->hdev = hdev;
+ mutex_init(&ctx->lock);
+
+ user_vm_info = &ctx->user_vm_info;
+ driver_vm_info = &ctx->driver_vm_info;
+
+ rc = aux_ops->register_cn_user_context(aux_dev, user_fd, ctx, &ctx->comp_handle,
+ &user_vm_info->vm_handle);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to register user context with FD %d\n", user_fd);
+ goto release_ctx;
+ }
+
+ if (user_vm_info->vm_handle != ctx->comp_handle) {
+ rc = aux_ops->get_vm_info(aux_dev, user_vm_info->vm_handle, &user_vm_info->vm_info);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get user VM info for handle 0x%llx\n",
+ user_vm_info->vm_handle);
+ goto deregister_ctx;
+ }
+
+ if (user_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB)
+ ctx->user_asid = user_vm_info->vm_info.net_tlb.pasid;
+ else
+ ctx->user_asid = user_vm_info->vm_info.ext_mmu.work_id;
+ } else {
+ /* No data transfer in this mode */
+ ctx->user_asid = -1;
+ }
+
+ rc = aux_ops->vm_create(aux_dev, ctx->comp_handle, 0, &driver_vm_info->vm_handle);
+ if (rc) {
+		dev_err(hdev->dev, "Failed to create driver VM for compute handle 0x%llx\n",
+ ctx->comp_handle);
+ goto deregister_ctx;
+ }
+
+ rc = aux_ops->get_vm_info(aux_dev, driver_vm_info->vm_handle, &driver_vm_info->vm_info);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get driver VM info for handle 0x%llx\n",
+ driver_vm_info->vm_handle);
+ goto destroy_driver_vm;
+ }
+
+ if (driver_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB)
+ ctx->asid = driver_vm_info->vm_info.net_tlb.pasid;
+ else
+ ctx->asid = driver_vm_info->vm_info.ext_mmu.work_id;
+
+ /* must be called before calling create_mem_ctx */
+ rc = asic_funcs->ctx_init(ctx);
+ if (rc) {
+ dev_err(hdev->dev, "failed to init user context with ASID %d\n", ctx->asid);
+ goto destroy_driver_vm;
+ }
+
+ if (ctx->user_asid != -1 && user_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB) {
+ rc = asic_funcs->create_mem_ctx(ctx, user_vm_info->vm_info.net_tlb.pasid,
+ user_vm_info->vm_info.net_tlb.page_tbl_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to create HW memory context for user VM, FD %d\n", user_fd);
+ goto ctx_cleanup;
+ }
+ }
+
+ if (driver_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB) {
+ rc = asic_funcs->create_mem_ctx(ctx, driver_vm_info->vm_info.net_tlb.pasid,
+ driver_vm_info->vm_info.net_tlb.page_tbl_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to create HW memory context for driver VM, FD %d\n",
+ user_fd);
+ goto user_vm_ctx_cleanup;
+ }
+ }
+
+ *cn_ib_ctx = ctx;
+ hdev->ib_device_opened = true;
+ hdev->ctx = ctx;
+
+ return 0;
+
+user_vm_ctx_cleanup:
+ if (ctx->user_asid != -1 && user_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB)
+ asic_funcs->destroy_mem_ctx(ctx, user_vm_info->vm_info.net_tlb.pasid,
+ user_vm_info->vm_info.net_tlb.page_tbl_addr);
+ctx_cleanup:
+ asic_funcs->ctx_fini(ctx);
+destroy_driver_vm:
+ aux_ops->vm_destroy(aux_dev, driver_vm_info->vm_handle);
+deregister_ctx:
+ aux_ops->deregister_cn_user_context(aux_dev, user_vm_info->vm_handle);
+release_ctx:
+ mutex_destroy(&ctx->lock);
+ kfree(ctx);
+
+ return rc;
+}
+
+static void hbl_cn_ib_dealloc_ucontext(struct hbl_aux_dev *ib_aux_dev, void *cn_ib_ctx)
+{
+ struct hbl_cn_comp_vm_info *user_vm_info, *driver_vm_info;
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(ib_aux_dev);
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_ctx *ctx = cn_ib_ctx;
+ struct hbl_cn_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+ asic_funcs = hdev->asic_funcs;
+ user_vm_info = &ctx->user_vm_info;
+ driver_vm_info = &ctx->driver_vm_info;
+
+ dev_dbg(hdev->dev, "IB context dealloc\n");
+
+ if (driver_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB)
+ asic_funcs->destroy_mem_ctx(ctx, driver_vm_info->vm_info.net_tlb.pasid,
+ driver_vm_info->vm_info.net_tlb.page_tbl_addr);
+
+ if (ctx->user_asid != -1 && user_vm_info->vm_info.mmu_mode == HBL_CN_MMU_MODE_NETWORK_TLB)
+ asic_funcs->destroy_mem_ctx(ctx, user_vm_info->vm_info.net_tlb.pasid,
+ user_vm_info->vm_info.net_tlb.page_tbl_addr);
+
+ hbl_cn_ctx_resources_destroy(hdev, ctx);
+ hdev->asic_funcs->ctx_fini(ctx);
+
+ aux_ops->vm_destroy(aux_dev, driver_vm_info->vm_handle);
+ aux_ops->deregister_cn_user_context(aux_dev, user_vm_info->vm_handle);
+
+ hdev->ctx = NULL;
+ mutex_destroy(&ctx->lock);
+ kfree(ctx);
+
+ hdev->ib_device_opened = false;
+}
+
+static void hbl_cn_ib_query_port(struct hbl_aux_dev *aux_dev, u32 port,
+ struct hbl_ib_port_attr *port_attr)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+
+ asic_funcs = hdev->asic_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ port_attr->open = hbl_cn_is_port_open(cn_port);
+ port_attr->link_up = cn_port->pcs_link;
+ port_attr->speed = cn_port->speed;
+ port_attr->max_msg_sz = asic_funcs->get_max_msg_sz(hdev);
+ port_attr->num_lanes = hdev->lanes_per_port;
+ port_attr->max_mtu = SZ_8K;
+}
+
+static inline void parse_fw_ver(struct hbl_cn_device *hdev, char *str, u32 *maj, u16 *min, u16 *sub)
+{
+ char *ver = strstr(str, "fw-");
+ int ret;
+
+ if (!ver)
+ goto failure;
+
+	ret = sscanf(ver, "fw-%u.%hu.%hu", maj, min, sub);
+ if (ret < 3) {
+failure:
+ dev_dbg(hdev->dev, "Failed to read version string\n");
+ *maj = *min = *sub = 0;
+ }
+}
+
+static void hbl_cn_ib_query_device(struct hbl_aux_dev *aux_dev, struct hbl_ib_device_attr *dev_attr)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_properties *cn_props;
+ struct hbl_ib_aux_data *aux_data;
+ u16 minor, sub_ver;
+ u32 major;
+
+ aux_data = aux_dev->aux_data;
+ cn_props = &hdev->cn_props;
+
+ if (hdev->cpucp_fw) {
+ parse_fw_ver(hdev, hdev->fw_ver, &major, &minor, &sub_ver);
+ dev_attr->fw_ver = ((u64)major << 32) | ((u64)minor << 16) | sub_ver;
+ }
+
+ dev_attr->max_mr_size = aux_data->dram_size;
+
+ dev_attr->page_size_cap = PAGE_SIZE;
+
+ dev_attr->vendor_id = hdev->pdev->vendor;
+ dev_attr->vendor_part_id = hdev->pdev->device;
+ dev_attr->hw_ver = hdev->pdev->subsystem_device;
+
+ dev_attr->max_qp = cn_props->max_qps_num;
+
+ dev_attr->max_qp_wr = aux_data->max_num_of_wqes;
+ dev_attr->max_cqe = cn_props->user_cq_max_entries;
+
+ dev_attr->cqe_size = cn_props->cqe_size;
+ dev_attr->min_cq_entries = cn_props->user_cq_min_entries;
+}
+
+static void hbl_cn_ib_set_ip_addr_encap(struct hbl_aux_dev *aux_dev, u32 ip_addr, u32 port)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 encap_id;
+
+ asic_funcs = hdev->asic_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ asic_funcs->port_funcs->set_ip_addr_encap(cn_port, &encap_id, ip_addr);
+}
+
+static char *hbl_cn_ib_qp_syndrome_to_str(struct hbl_aux_dev *aux_dev, u32 syndrome)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->qp_syndrome_to_str(syndrome);
+}
+
+static int hbl_cn_ib_verify_qp_id(struct hbl_aux_dev *aux_dev, u32 qp_id, u32 port)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_device *hdev;
+ struct hbl_cn_qp *qp;
+ int rc = 0;
+
+ hdev = HBL_AUX2NIC(aux_dev);
+ port_funcs = hdev->asic_funcs->port_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs->cfg_lock(cn_port);
+ qp = xa_load(&cn_port->qp_ids, qp_id);
+
+ if (IS_ERR_OR_NULL(qp)) {
+		dev_dbg(hdev->dev, "Failed to find matching QP for handle %u, port %u\n", qp_id,
+			port);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ /* sanity test the port IDs */
+ if (qp->port != port) {
+ dev_dbg(hdev->dev, "QP port %d does not match requested port %d\n", qp->port, port);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int hbl_cn_ib_dump_qp(struct hbl_aux_dev *aux_dev, struct hbl_ib_dump_qp_attr *attr,
+ char *buf, size_t size)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_qp_info qp_info = {};
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ qp_info.port = attr->port;
+ qp_info.qpn = attr->qpn;
+ qp_info.req = attr->req;
+ qp_info.full_print = attr->full;
+ qp_info.force_read = attr->force;
+
+ rc = asic_funcs->qp_read(hdev, &qp_info, buf, size);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to read QP %u, port %u\n", attr->qpn, attr->port);
+ return rc;
+ }
+
+ return 0;
+}
+
+static int hbl_cn_en_aux_data_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_en_aux_data *en_aux_data;
+ struct hbl_cn_properties *cn_props;
+ struct hbl_en_aux_ops *en_aux_ops;
+ struct hbl_aux_dev *en_aux_dev;
+ char **mac_addr;
+ int i;
+
+ en_aux_dev = &hdev->en_aux_dev;
+ en_aux_dev->type = HBL_AUX_DEV_ETH;
+ en_aux_data = en_aux_dev->aux_data;
+ en_aux_ops = en_aux_dev->aux_ops;
+ cn_props = &hdev->cn_props;
+
+ en_aux_data->pdev = hdev->pdev;
+ en_aux_data->dev = hdev->dev;
+ en_aux_data->ports_mask = hdev->ext_ports_mask;
+ en_aux_data->auto_neg_mask = hdev->auto_neg_mask;
+ en_aux_data->id = hdev->id;
+ en_aux_data->fw_ver = hdev->fw_ver;
+ en_aux_data->qsfp_eeprom = hdev->cpucp_info->qsfp_eeprom;
+ en_aux_data->pending_reset_long_timeout = hdev->pending_reset_long_timeout;
+ en_aux_data->max_frm_len = cn_props->max_frm_len;
+ en_aux_data->raw_elem_size = cn_props->raw_elem_size;
+ en_aux_data->max_raw_mtu = cn_props->max_raw_mtu;
+ en_aux_data->min_raw_mtu = cn_props->min_raw_mtu;
+ en_aux_data->max_num_of_ports = hdev->cn_props.max_num_of_ports;
+ en_aux_data->has_eq = hdev->has_eq;
+ en_aux_data->asic_type = hdev->asic_type;
+
+ mac_addr = kcalloc(hdev->cn_props.max_num_of_ports, sizeof(*mac_addr), GFP_KERNEL);
+ if (!mac_addr)
+ return -ENOMEM;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(en_aux_data->ports_mask & BIT(i)))
+ continue;
+
+ mac_addr[i] = hdev->cpucp_info->mac_addrs[i].mac_addr;
+ }
+
+ en_aux_data->mac_addr = mac_addr;
+
+ /* set en -> cn ops */
+ /* device functions */
+ en_aux_ops->device_operational = hbl_cn_device_operational;
+ en_aux_ops->hw_access_lock = hbl_cn_hw_access_lock;
+ en_aux_ops->hw_access_unlock = hbl_cn_hw_access_unlock;
+ en_aux_ops->is_eth_lpbk = hbl_cn_is_eth_lpbk;
+ /* port functions */
+ en_aux_ops->port_hw_init = hbl_cn_port_hw_init;
+ en_aux_ops->port_hw_fini = hbl_cn_port_hw_fini;
+ en_aux_ops->phy_init = hbl_cn_phy_port_init;
+ en_aux_ops->phy_fini = hbl_cn_phy_port_fini;
+ en_aux_ops->set_pfc = hbl_cn_set_pfc;
+ en_aux_ops->get_cnts_num = hbl_cn_get_cnts_num;
+ en_aux_ops->get_cnts_names = hbl_cn_get_cnts_names;
+ en_aux_ops->get_cnts_values = hbl_cn_get_cnts_values;
+ en_aux_ops->get_mac_lpbk = hbl_cn_get_mac_lpbk;
+ en_aux_ops->set_mac_lpbk = hbl_cn_set_mac_lpbk;
+ en_aux_ops->update_mtu = hbl_cn_update_mtu;
+ en_aux_ops->qpc_write = hbl_cn_qpc_write;
+ en_aux_ops->ctrl_lock = hbl_cn_ctrl_lock;
+ en_aux_ops->ctrl_unlock = hbl_cn_ctrl_unlock;
+ en_aux_ops->eq_dispatcher_register_qp = hbl_cn_dispatcher_register_qp;
+ en_aux_ops->eq_dispatcher_unregister_qp = hbl_cn_dispatcher_unregister_qp;
+ en_aux_ops->get_speed = hbl_cn_get_speed;
+ en_aux_ops->track_ext_port_reset = hbl_cn_track_ext_port_reset;
+ en_aux_ops->port_toggle_count = hbl_cn_port_toggle_count;
+
+ asic_funcs->set_en_data(hdev);
+
+ return 0;
+}
+
+static void hbl_cn_en_aux_data_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->en_aux_dev;
+ struct hbl_en_aux_data *aux_data;
+
+ aux_data = aux_dev->aux_data;
+
+ kfree(aux_data->mac_addr);
+ aux_data->mac_addr = NULL;
+}
+
+static int hbl_cn_ib_aux_data_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_ib_port_cnts_data *cnts_data;
+ struct hbl_ib_aux_data *ib_aux_data;
+ struct hbl_ib_aux_ops *ib_aux_ops;
+ struct hbl_aux_dev *ib_aux_dev;
+ struct hbl_cn_port *cn_port;
+ int rc, i;
+
+ ib_aux_dev = &hdev->ib_aux_dev;
+ ib_aux_dev->type = HBL_AUX_DEV_IB;
+ ib_aux_data = ib_aux_dev->aux_data;
+ ib_aux_ops = ib_aux_dev->aux_ops;
+
+ ib_aux_data->pdev = hdev->pdev;
+ ib_aux_data->dev = hdev->dev;
+ ib_aux_data->ports_mask = hdev->ports_mask;
+ ib_aux_data->ext_ports_mask = hdev->ext_ports_mask;
+ ib_aux_data->max_num_of_wqes = hdev->cn_props.max_hw_user_wqs_num;
+ ib_aux_data->max_num_of_ports = hdev->cn_props.max_num_of_ports;
+ ib_aux_data->pending_reset_long_timeout = hdev->pending_reset_long_timeout;
+ ib_aux_data->id = hdev->id;
+ ib_aux_data->dram_size = hdev->dram_size;
+ ib_aux_data->mixed_qp_wq_types = hdev->mixed_qp_wq_types;
+ ib_aux_data->umr_support = hdev->umr_support;
+ ib_aux_data->cc_support = hdev->cc_support;
+
+ ib_aux_data->cnts_data = kcalloc(hdev->cn_props.max_num_of_ports,
+ sizeof(*ib_aux_data->cnts_data), GFP_KERNEL);
+ if (!ib_aux_data->cnts_data)
+ return -ENOMEM;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(ib_aux_data->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+ cnts_data = &ib_aux_data->cnts_data[i];
+
+ cnts_data->num = __hbl_cn_get_cnts_num(cn_port);
+
+ cnts_data->names = kcalloc(cnts_data->num, HBL_IB_CNT_NAME_LEN, GFP_KERNEL);
+ if (!cnts_data->names) {
+ rc = -ENOMEM;
+ goto free_cnts_data;
+ }
+
+ __hbl_cn_get_cnts_names(cn_port, cnts_data->names, true);
+ }
+
+ /* set ib -> cn ops */
+ /* the following functions are used even if the IB verbs API is disabled */
+ ib_aux_ops->device_operational = hbl_cn_device_operational;
+ ib_aux_ops->hw_access_lock = hbl_cn_hw_access_lock;
+ ib_aux_ops->hw_access_unlock = hbl_cn_hw_access_unlock;
+ ib_aux_ops->alloc_ucontext = hbl_cn_ib_alloc_ucontext;
+ ib_aux_ops->dealloc_ucontext = hbl_cn_ib_dealloc_ucontext;
+ ib_aux_ops->query_port = hbl_cn_ib_query_port;
+ ib_aux_ops->query_device = hbl_cn_ib_query_device;
+ ib_aux_ops->set_ip_addr_encap = hbl_cn_ib_set_ip_addr_encap;
+ ib_aux_ops->qp_syndrome_to_str = hbl_cn_ib_qp_syndrome_to_str;
+ ib_aux_ops->verify_qp_id = hbl_cn_ib_verify_qp_id;
+ ib_aux_ops->get_cnts_values = hbl_cn_get_cnts_values;
+ ib_aux_ops->dump_qp = hbl_cn_ib_dump_qp;
+
+ /* these functions are used only if the IB verbs API is enabled */
+ ib_aux_ops->cmd_ctrl = hbl_cn_ib_cmd_ctrl;
+ ib_aux_ops->query_mem_handle = hbl_cn_ib_query_mem_handle;
+
+ return 0;
+
+free_cnts_data:
+ for (--i; i >= 0; i--) {
+ if (!(ib_aux_data->ports_mask & BIT(i)))
+ continue;
+
+ kfree(ib_aux_data->cnts_data[i].names);
+ }
+ kfree(ib_aux_data->cnts_data);
+
+ return rc;
+}
+
+static void hbl_cn_ib_aux_data_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_ib_aux_data *aux_data;
+ struct hbl_aux_dev *aux_dev;
+ int i;
+
+ aux_dev = &hdev->ib_aux_dev;
+ aux_data = aux_dev->aux_data;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(aux_data->ports_mask & BIT(i)))
+ continue;
+
+ kfree(aux_data->cnts_data[i].names);
+ }
+ kfree(aux_data->cnts_data);
+}
+
+static void eth_adev_release(struct device *dev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(dev, struct hbl_aux_dev, adev.dev);
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+
+ hdev->is_eth_aux_dev_initialized = false;
+}
+
+static int hbl_cn_en_aux_drv_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->en_aux_dev;
+ struct auxiliary_device *adev;
+ int rc;
+
+ rc = hbl_cn_en_aux_data_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "eth aux data init failed\n");
+ return rc;
+ }
+
+ adev = &aux_dev->adev;
+ adev->id = hdev->id;
+ adev->name = "en";
+ adev->dev.parent = hdev->dev;
+ adev->dev.release = eth_adev_release;
+
+ rc = auxiliary_device_init(adev);
+ if (rc) {
+ dev_err(hdev->dev, "eth auxiliary_device_init failed\n");
+ goto aux_data_free;
+ }
+
+ rc = auxiliary_device_add(adev);
+ if (rc) {
+ dev_err(hdev->dev, "eth auxiliary_device_add failed\n");
+ goto uninit_adev;
+ }
+
+ hdev->is_eth_aux_dev_initialized = true;
+
+ return 0;
+
+uninit_adev:
+ auxiliary_device_uninit(adev);
+aux_data_free:
+ hbl_cn_en_aux_data_fini(hdev);
+
+ return rc;
+}
+
+static void hbl_cn_en_aux_drv_fini(struct hbl_cn_device *hdev)
+{
+ struct auxiliary_device *adev;
+
+ if (!hdev->is_eth_aux_dev_initialized)
+ return;
+
+ adev = &hdev->en_aux_dev.adev;
+
+ auxiliary_device_delete(adev);
+ auxiliary_device_uninit(adev);
+
+ hbl_cn_en_aux_data_fini(hdev);
+}
+
+static void ib_adev_release(struct device *dev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(dev, struct hbl_aux_dev, adev.dev);
+ struct hbl_cn_device *hdev;
+
+ hdev = container_of(aux_dev, struct hbl_cn_device, ib_aux_dev);
+
+ hdev->is_ib_aux_dev_initialized = false;
+}
+
+static int hbl_cn_ib_aux_drv_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->ib_aux_dev;
+ struct auxiliary_device *adev;
+ int rc;
+
+ if (!hdev->ib_support)
+ return 0;
+
+ rc = hbl_cn_ib_aux_data_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "IB aux data init failed\n");
+ return rc;
+ }
+
+ adev = &aux_dev->adev;
+ adev->id = hdev->id;
+ adev->name = "ib";
+ adev->dev.parent = hdev->dev;
+ adev->dev.release = ib_adev_release;
+
+ rc = auxiliary_device_init(adev);
+ if (rc) {
+ dev_err(hdev->dev, "ib auxiliary_device_init failed\n");
+ goto aux_data_free;
+ }
+
+ rc = auxiliary_device_add(adev);
+ if (rc) {
+ dev_err(hdev->dev, "ib auxiliary_device_add failed\n");
+ goto uninit_adev;
+ }
+
+ hdev->is_ib_aux_dev_initialized = true;
+
+ return 0;
+
+uninit_adev:
+ auxiliary_device_uninit(adev);
+aux_data_free:
+ hbl_cn_ib_aux_data_fini(hdev);
+
+ return rc;
+}
+
+static void hbl_cn_ib_aux_drv_fini(struct hbl_cn_device *hdev)
+{
+ struct auxiliary_device *adev;
+
+ if (!hdev->ib_support || !hdev->is_ib_aux_dev_initialized)
+ return;
+
+ adev = &hdev->ib_aux_dev.adev;
+
+ auxiliary_device_delete(adev);
+ auxiliary_device_uninit(adev);
+
+ hbl_cn_ib_aux_data_fini(hdev);
+}
+
+void hbl_cn_internal_port_fini_locked(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (!cn_port->port_open)
+ return;
+
+ cn_port->port_open = false;
+
+ /* verify that the port is marked as closed before continuing */
+ mb();
+
+ asic_funcs->port_funcs->phy_port_fini(cn_port);
+
+ __hbl_cn_port_hw_fini(cn_port);
+}
+
+static void hbl_cn_internal_ports_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)) || (hdev->ext_ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ mutex_lock(&cn_port->control_lock);
+
+ hbl_cn_internal_port_fini_locked(cn_port);
+
+ mutex_unlock(&cn_port->control_lock);
+ }
+}
+
+void hbl_cn_ports_cancel_status_work(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ cancel_delayed_work_sync(&cn_port->fw_status_work);
+ }
+}
+
+int hbl_cn_internal_port_init_locked(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 port = cn_port->port;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ rc = __hbl_cn_port_hw_init(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to configure the HW, port: %d, %d\n", port, rc);
+ return rc;
+ }
+
+ rc = asic_funcs->port_funcs->phy_port_init(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to configure the PHY, port: %d, %d\n", port, rc);
+ goto phy_fail;
+ }
+
+ cn_port->port_open = true;
+
+ return 0;
+
+phy_fail:
+ __hbl_cn_port_hw_fini(cn_port);
+
+ return rc;
+}
+
+static int hbl_cn_internal_ports_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ u32 port;
+ int rc, i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)) || (hdev->ext_ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+ port = cn_port->port;
+
+ mutex_lock(&cn_port->control_lock);
+
+ rc = hbl_cn_internal_port_init_locked(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init internal port %d, %d\n", port, rc);
+ mutex_unlock(&cn_port->control_lock);
+ goto port_init_fail;
+ }
+
+ mutex_unlock(&cn_port->control_lock);
+ }
+
+ return 0;
+
+port_init_fail:
+ hbl_cn_internal_ports_fini(hdev);
+
+ return rc;
+}
+
+static int hbl_cn_kernel_ctx_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->kernel_ctx_init(hdev, hdev->kernel_asid);
+}
+
+static void hbl_cn_kernel_ctx_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->kernel_ctx_fini(hdev, hdev->kernel_asid);
+}
+
+static void hbl_cn_mac_loopback_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port = cn_port->port;
+ bool enable;
+
+ aux_dev = &hdev->en_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ enable = !!(hdev->mac_loopback & BIT(port));
+ cn_port->mac_loopback = enable;
+
+ if (cn_port->eth_enable && aux_ops->set_dev_lpbk)
+ aux_ops->set_dev_lpbk(aux_dev, port, enable);
+}
+
+static int hbl_cn_core_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_macro *cn_macro;
+ struct hbl_cn_port *cn_port;
+ int rc, i, port_cnt = 0;
+ u32 port;
+
+ /* RX packet drop config is not preserved across hard reset. */
+ hdev->rx_drop_percent = 0;
+
+ if (hdev->load_phy_fw) {
+ if (hdev->cn_props.is_phy_fw_binary) {
+ rc = hbl_cn_phy_has_binary_fw(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "PHY F/W file was not found\n");
+ return rc;
+ }
+ }
+
+ rc = asic_funcs->phy_fw_load_all(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "PHY F/W load failed\n");
+ return rc;
+ }
+ }
+
+ if (hdev->phy_config_fw)
+ dev_dbg(hdev->dev, "F/W CRC: 0x%x\n", asic_funcs->phy_get_crc(hdev));
+
+ for (i = 0; i < hdev->cn_props.num_of_macros; i++) {
+ hdev->cn_macros[i].phy_macro_needs_reset = true;
+ hdev->cn_macros[i].rec_link_sts = 0;
+ }
+
+ memset(hdev->phy_ber_info, 0,
+ hdev->cn_props.max_num_of_lanes * sizeof(struct hbl_cn_ber_info));
+
+ rc = asic_funcs->core_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "core init failed\n");
+ return rc;
+ }
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++, port_cnt++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+ cn_macro = cn_port->cn_macro;
+ port = cn_port->port;
+
+ /* In case this port got disabled, enable it back here */
+ cn_port->disabled = false;
+ /* Port toggle count should be reinitialized for each port upon hard reset only */
+ cn_port->port_toggle_cnt = 0;
+ cn_port->port_toggle_cnt_prev = 0;
+
+ /* Reset the macro PHY once on boot.
+ * This function resets all four lanes in the PHY macro, therefore only one of the
+ * two ports of the macro should call it.
+ */
+ if (hdev->phy_config_fw && cn_macro->phy_macro_needs_reset) {
+ rc = asic_funcs->phy_reset_macro(cn_macro);
+ if (rc) {
+ dev_err(hdev->dev, "PHY reset macro failed for port %d\n", port);
+ goto err;
+ }
+
+ cn_macro->phy_macro_needs_reset = false;
+ }
+
+ hbl_cn_spmu_init(cn_port, false);
+
+ cn_port->auto_neg_enable = !!(hdev->auto_neg_mask & BIT(port));
+
+ if (!hdev->in_reset)
+ cn_port->eth_enable = !!(BIT(port) & hdev->ext_ports_mask);
+
+ /* This function must be called after setting cn_port->eth_enable */
+ hbl_cn_mac_loopback_init(cn_port);
+ }
+
+ return 0;
+
+err:
+ asic_funcs->core_fini(hdev);
+
+ return rc;
+}
+
+static void hbl_cn_core_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->core_fini(hdev);
+}
+
+static void wq_arrays_pool_destroy(struct hbl_cn_device *hdev)
+{
+ if (!hdev->wq_arrays_pool_enable)
+ return;
+
+ gen_pool_destroy(hdev->wq_arrays_pool);
+}
+
+static int wq_arrays_pool_alloc(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_props;
+ int rc;
+
+ if (!hdev->wq_arrays_pool_enable)
+ return 0;
+
+ cn_props = &hdev->cn_props;
+
+ hdev->wq_arrays_pool = gen_pool_create(ilog2(hdev->cache_line_size), -1);
+ if (!hdev->wq_arrays_pool) {
+ dev_err(hdev->dev, "Failed to create a pool to manage WQ arrays on HBM\n");
+ rc = -ENOMEM;
+ goto gen_pool_create_fail;
+ }
+
+ gen_pool_set_algo(hdev->wq_arrays_pool, gen_pool_best_fit, NULL);
+
+ rc = gen_pool_add(hdev->wq_arrays_pool, cn_props->wq_base_addr, cn_props->wq_base_size,
+ -1);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to add memory to the WQ arrays pool\n");
+ goto gen_pool_add_fail;
+ }
+
+ return 0;
+
+gen_pool_add_fail:
+ gen_pool_destroy(hdev->wq_arrays_pool);
+gen_pool_create_fail:
+ return rc;
+}
+
+int __hbl_cn_ports_reopen(struct hbl_cn_device *hdev)
+{
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *en_aux_dev;
+ int rc;
+
+ en_aux_dev = &hdev->en_aux_dev;
+ aux_ops = en_aux_dev->aux_ops;
+
+ rc = hbl_cn_kernel_ctx_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init kernel context\n");
+ return rc;
+ }
+
+ rc = hbl_cn_core_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init core\n");
+ goto core_init_fail;
+ }
+
+ rc = hbl_cn_internal_ports_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init internal ports\n");
+ goto internal_ports_fail;
+ }
+
+ hdev->in_reset = false;
+ hdev->fw_reset = false;
+ hdev->operational = true;
+
+ if (aux_ops->ports_reopen) {
+ rc = aux_ops->ports_reopen(en_aux_dev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to reopen en ports\n");
+ goto en_ports_reopen_fail;
+ }
+ }
+
+ return 0;
+
+en_ports_reopen_fail:
+ hdev->operational = false;
+ hbl_cn_internal_ports_fini(hdev);
+internal_ports_fail:
+ hbl_cn_core_fini(hdev);
+core_init_fail:
+ hbl_cn_kernel_ctx_fini(hdev);
+
+ return rc;
+}
+
+void __hbl_cn_stop(struct hbl_cn_device *hdev)
+{
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *en_aux_dev;
+
+ en_aux_dev = &hdev->en_aux_dev;
+ aux_ops = en_aux_dev->aux_ops;
+
+ /* Cancelling all outstanding work items for all ports should be done first when stopping */
+ hdev->asic_funcs->ports_cancel_status_work(hdev);
+
+ qps_stop(hdev);
+
+ if (aux_ops->ports_stop)
+ aux_ops->ports_stop(en_aux_dev);
+
+ hbl_cn_internal_ports_fini(hdev);
+ hbl_cn_core_fini(hdev);
+ hbl_cn_kernel_ctx_fini(hdev);
+}
+
+void __hbl_cn_hard_reset_prepare(struct hbl_cn_device *hdev, bool fw_reset, bool in_teardown)
+{
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *en_aux_dev;
+
+ en_aux_dev = &hdev->en_aux_dev;
+ aux_ops = en_aux_dev->aux_ops;
+
+ hdev->in_reset = true;
+ hdev->fw_reset = fw_reset;
+ hdev->in_teardown = in_teardown;
+ hdev->operational = false;
+
+ /* Take and immediately release the lock to ensure that any thread which is in
+ * the middle of a HW access completes it before the reset flow proceeds.
+ */
+ mutex_lock(&hdev->hw_access_lock);
+ mutex_unlock(&hdev->hw_access_lock);
+
+ if (aux_ops->ports_stop_prepare)
+ aux_ops->ports_stop_prepare(en_aux_dev);
+}
+
+void hbl_cn_hard_reset_prepare(struct hbl_aux_dev *cn_aux_dev, bool fw_reset, bool in_teardown)
+{
+ struct hbl_cn_device *hdev = cn_aux_dev->priv;
+
+ __hbl_cn_hard_reset_prepare(hdev, fw_reset, in_teardown);
+}
+
+int hbl_cn_send_port_cpucp_status(struct hbl_aux_dev *aux_dev, u32 port, u8 cmd, u8 period)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ if (cmd > HBL_CN_STATUS_PERIODIC_STOP) {
+ dev_err(hdev->dev, "Received invalid CN status cmd (%d) from F/W, port %d\n", cmd,
+ port);
+ return -EINVAL;
+ }
+
+ hdev->status_cmd = cmd;
+ hdev->status_period = (cmd == HBL_CN_STATUS_PERIODIC_START) ? period : 0;
+
+ if (cmd == HBL_CN_STATUS_PERIODIC_STOP)
+ cancel_delayed_work_sync(&cn_port->fw_status_work);
+ else
+ queue_delayed_work(cn_port->wq, &cn_port->fw_status_work, 0);
+
+ return 0;
+}
+
+static void hbl_cn_get_cpucp_info(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->get_cpucp_info(aux_dev, hdev->cpucp_info);
+}
+
+static int hbl_cn_ports_reopen(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ int rc;
+
+ /* update CPUCP info after device reset */
+ hbl_cn_get_cpucp_info(hdev);
+
+ rc = hbl_cn_request_irqs(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQs\n");
+ return rc;
+ }
+
+ rc = __hbl_cn_ports_reopen(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to reopen ports\n");
+ goto free_irqs;
+ }
+
+ return 0;
+
+free_irqs:
+ hbl_cn_free_irqs(hdev);
+
+ return rc;
+}
+
+static void hbl_cn_stop(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+
+ __hbl_cn_stop(hdev);
+
+ hbl_cn_synchronize_irqs(aux_dev);
+ hbl_cn_free_irqs(hdev);
+}
+
+static int hbl_cn_set_static_properties(struct hbl_cn_device *hdev)
+{
+ return hdev->asic_funcs->set_static_properties(hdev);
+}
+
+static int hbl_cn_set_dram_properties(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->set_dram_properties(hdev);
+}
+
+static int hbl_cn_set_asic_funcs(struct hbl_cn_device *hdev)
+{
+ switch (hdev->asic_type) {
+ case HBL_ASIC_GAUDI2:
+ default:
+ dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int hbl_cn_dev_init(struct hbl_cn_device *hdev)
+{
+ int rc;
+
+ if (!hdev->ports_mask) {
+ dev_err(hdev->dev, "All ports are disabled\n");
+ return -EINVAL;
+ }
+
+ /* must be called first to init the ASIC funcs */
+ rc = hbl_cn_set_asic_funcs(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to set ASIC functions\n");
+ return rc;
+ }
+
+ /* get CPUCP info before initializing the device */
+ hbl_cn_get_cpucp_info(hdev);
+
+ /* init static cn properties */
+ rc = hbl_cn_set_static_properties(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to set static properties\n");
+ return rc;
+ }
+
+ /* init DRAM cn properties */
+ rc = hbl_cn_set_dram_properties(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to set DRAM properties\n");
+ return rc;
+ }
+
+ rc = hbl_cn_sw_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "SW init failed\n");
+ return rc;
+ }
+
+ rc = hbl_cn_request_irqs(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQs\n");
+ goto request_irqs_fail;
+ }
+
+ rc = hbl_cn_kernel_ctx_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init kernel context\n");
+ goto kernel_ctx_init_fail;
+ }
+
+ rc = hbl_cn_core_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init core\n");
+ goto core_init_fail;
+ }
+
+ rc = hbl_cn_internal_ports_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init internal ports\n");
+ goto internal_ports_init_fail;
+ }
+
+ rc = wq_arrays_pool_alloc(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init WQ arrays pool\n");
+ goto wq_arrays_pool_alloc_fail;
+ }
+
+ hbl_cn_mem_init(hdev);
+
+ hdev->operational = true;
+
+ rc = hbl_cn_en_aux_drv_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init Ethernet driver\n");
+ goto en_aux_drv_fail;
+ }
+
+ rc = hbl_cn_ib_aux_drv_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init IB driver\n");
+ goto ib_aux_drv_fail;
+ }
+
+ hbl_cn_late_init(hdev);
+
+ hdev->is_initialized = true;
+
+ return 0;
+
+ib_aux_drv_fail:
+ hbl_cn_en_aux_drv_fini(hdev);
+en_aux_drv_fail:
+ hdev->operational = false;
+ hbl_cn_mem_fini(hdev);
+ wq_arrays_pool_destroy(hdev);
+wq_arrays_pool_alloc_fail:
+ hbl_cn_internal_ports_fini(hdev);
+internal_ports_init_fail:
+ hbl_cn_core_fini(hdev);
+core_init_fail:
+ hbl_cn_kernel_ctx_fini(hdev);
+kernel_ctx_init_fail:
+ hbl_cn_free_irqs(hdev);
+request_irqs_fail:
+ hbl_cn_sw_fini(hdev);
+
+ return rc;
+}
+
+void hbl_cn_dev_fini(struct hbl_cn_device *hdev)
+{
+ if (!hdev->is_initialized)
+ return;
+
+ hdev->is_initialized = false;
+
+ if (hdev->hw_stop_during_teardown) {
+ hbl_cn_hard_reset_prepare(hdev->cn_aux_dev, false, true);
+ hbl_cn_stop(hdev->cn_aux_dev);
+ }
+
+ hbl_cn_late_fini(hdev);
+
+ hbl_cn_ib_aux_drv_fini(hdev);
+ /* must be called after MSI was disabled */
+ hbl_cn_en_aux_drv_fini(hdev);
+ hbl_cn_mem_fini(hdev);
+ wq_arrays_pool_destroy(hdev);
+ hbl_cn_sw_fini(hdev);
+}
+
+static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags)
+{
+ bool check_open = flags & NIC_PORT_CHECK_OPEN;
+ bool check_enable = (flags & NIC_PORT_CHECK_ENABLE) || check_open;
+ bool print_on_err = flags & NIC_PORT_PRINT_ON_ERR;
+ struct hbl_cn_port *cn_port;
+
+ if (port >= hdev->cn_props.max_num_of_ports) {
+ if (print_on_err)
+ dev_dbg(hdev->dev, "Invalid port %d\n", port);
+ return -EINVAL;
+ }
+
+ if (check_enable && !(hdev->ports_mask & BIT(port))) {
+ if (print_on_err)
+ dev_dbg(hdev->dev, "Port %d is disabled\n", port);
+ return -ENODEV;
+ }
+
+ cn_port = &hdev->cn_ports[port];
+
+ if (check_open && !hbl_cn_is_port_open(cn_port)) {
+ if (print_on_err)
+ dev_dbg(hdev->dev, "Port %d is closed\n", port);
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static void hbl_cn_get_qp_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->port_funcs->get_qp_id_range(cn_port, min_id, max_id);
+
+ /* Take the minimum between the max id supported by the port and the max id supported by
+ * the WQs number the user asked to allocate.
+ */
+ *max_id = min(cn_port->qp_idx_offset + cn_port->num_of_wqs - 1, *max_id);
+}
+
+static void hbl_cn_qp_do_release(struct hbl_cn_qp *qp)
+{
+ struct hbl_cn_qpc_drain_attr drain_attr = { .wait_for_idle = false, };
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+
+ if (IS_ERR_OR_NULL(qp))
+ return;
+
+ cn_port = qp->cn_port;
+ port_funcs = cn_port->hdev->asic_funcs->port_funcs;
+
+ cancel_delayed_work(&qp->adaptive_tmr_reset);
+
+ port_funcs->qp_pre_destroy(qp);
+
+ /* QP was found before, hence use xa_store to replace the pointer but don't release the
+ * index. xa_store should not fail in such a scenario.
+ */
+ xa_store(&qp->cn_port->qp_ids, qp->qp_id, NULL, GFP_KERNEL);
+
+ /* Drain the requester QP now in order to make sure that accesses to the WQ will not
+ * be performed from this point on.
+ * Waiting for the WQ to drain is performed in the reset work.
+ */
+ hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_SQD, &drain_attr);
+
+ queue_work(cn_port->qp_wq, &qp->async_work);
+}
+
+static void qp_adaptive_tmr_reset(struct work_struct *work)
+{
+ struct hbl_cn_qp *qp = container_of(work, struct hbl_cn_qp, adaptive_tmr_reset.work);
+ struct hbl_cn_port *cn_port = qp->cn_port;
+ struct hbl_cn_device *hdev;
+
+ hdev = cn_port->hdev;
+
+ hdev->asic_funcs->port_funcs->adaptive_tmr_reset(qp);
+}
+
+static int alloc_qp(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx,
+ struct hbl_cni_alloc_conn_in *in, struct hbl_cni_alloc_conn_out *out)
+{
+ struct hbl_cn_wq_array_properties *swq_arr_props, *rwq_arr_props;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ struct xa_limit id_limit;
+ u32 min_id, max_id, port;
+ struct hbl_cn_qp *qp;
+ int id, rc;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+ if (!qp)
+ return -ENOMEM;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ cn_port = &hdev->cn_ports[port];
+ qp->cn_port = cn_port;
+ qp->port = port;
+ qp->ctx = ctx;
+ qp->curr_state = CN_QP_STATE_RESET;
+ INIT_WORK(&qp->async_work, qp_destroy_work);
+ INIT_DELAYED_WORK(&qp->adaptive_tmr_reset, qp_adaptive_tmr_reset);
+
+ hbl_cn_get_qp_id_range(cn_port, &min_id, &max_id);
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to allocate QP, set_app_params wasn't called yet, port %d\n", port);
+ rc = -EPERM;
+ goto error_exit;
+ }
+
+ swq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_SEND];
+ rwq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_RECV];
+
+ if (!swq_arr_props->enabled || !rwq_arr_props->enabled) {
+ dev_dbg(hdev->dev, "Failed to allocate QP as WQs are not configured, port %d\n",
+ port);
+ rc = -EPERM;
+ goto error_exit;
+ }
+
+ if (swq_arr_props->under_unset || rwq_arr_props->under_unset) {
+ dev_dbg(hdev->dev, "Failed to allocate QP as WQs are under unset, port %d\n", port);
+ rc = -EPERM;
+ goto error_exit;
+ }
+
+ id_limit = XA_LIMIT(min_id, max_id);
+ rc = xa_alloc(&cn_port->qp_ids, &id, qp, id_limit, GFP_KERNEL);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to allocate QP ID, port %d\n", port);
+ goto error_exit;
+ }
+
+ qp->qp_id = id;
+
+ rc = port_funcs->register_qp(cn_port, id, ctx->asid);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to register QP %d, port %d\n", id, port);
+ goto qp_register_error;
+ }
+
+ atomic_inc(&cn_port->num_of_allocated_qps);
+
+ port_funcs->cfg_unlock(cn_port);
+
+ out->conn_id = id;
+
+ return 0;
+
+qp_register_error:
+ xa_erase(&qp->cn_port->qp_ids, qp->qp_id);
+error_exit:
+ port_funcs->cfg_unlock(cn_port);
+ kfree(qp);
+ return rc;
+}
+
+u32 hbl_cn_get_wq_array_type(bool is_send)
+{
+ return is_send ? HBL_CNI_USER_WQ_SEND : HBL_CNI_USER_WQ_RECV;
+}
+
+static int alloc_and_map_wq(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, u32 n_wq,
+ bool is_swq)
+{
+ u32 wq_arr_type, wqe_size, qp_idx_offset, wq_idx;
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_mem_data mem_data = {};
+ struct hbl_cn_properties *cn_props;
+ struct hbl_cn_device *hdev;
+ struct hbl_cn_mem_buf *buf;
+ u64 wq_arr_size, wq_size;
+ int rc;
+
+ hdev = cn_port->hdev;
+ cn_props = &hdev->cn_props;
+ qp_idx_offset = cn_port->qp_idx_offset;
+ wq_idx = qp->qp_id - qp_idx_offset;
+
+ wq_arr_type = hbl_cn_get_wq_array_type(is_swq);
+ wq_arr_props = &cn_port->wq_arr_props[wq_arr_type];
+ wqe_size = is_swq ? cn_port->swqe_size : cn_props->rwqe_size;
+
+ if (wq_arr_props->dva_base) {
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_VIRTUAL;
+ mem_data.size = PAGE_ALIGN(n_wq * wqe_size);
+
+ /* Get offset into device VA block pre-allocated for SWQ.
+ *
+ * Note: HW indexes into SWQ array using qp_id.
+ * In general, it's HW requirement to leave holes in a WQ array if corresponding QP
+ * indexes are allocated on another WQ array.
+ */
+ mem_data.device_va = wq_arr_props->dva_base + wq_arr_props->offset +
+ wq_arr_props->wq_size * wq_idx;
+
+ /* Check for out of range. */
+ if (mem_data.device_va + mem_data.size >
+ wq_arr_props->dva_base + wq_arr_props->dva_size) {
+ dev_dbg(hdev->dev,
+ "Out of range device VA. device_va 0x%llx, size 0x%llx\n",
+ mem_data.device_va, mem_data.size);
+ return -EINVAL;
+ }
+ } else {
+ /* DMA coherent allocate case. Memory for WQ array is already allocated in
+ * user_wq_arr_set(). Here we use the allocated base addresses and QP id to
+ * calculate the CPU & bus addresses of the WQ for current QP and return that
+ * handle to the user. User may mmap() this handle returned by set_req_qp_ctx()
+ * to write WQEs.
+ */
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_MAP_ONLY;
+
+ buf = hbl_cn_mem_buf_get(hdev, wq_arr_props->handle);
+ if (!buf) {
+ dev_err(hdev->dev, "Failed to retrieve WQ arr handle for port %d\n",
+ cn_port->port);
+ return -EINVAL;
+ }
+
+ /* Actual size to allocate. Page aligned since we mmap to user. */
+ mem_data.size = PAGE_ALIGN(n_wq * wqe_size);
+ wq_size = wq_arr_props->wq_size;
+ wq_arr_size = buf->mappable_size;
+
+ /* Get offset into kernel buffer block pre-allocated for SWQ. */
+ mem_data.in.host_map_data.kernel_address = buf->kernel_address +
+ wq_arr_props->offset + wq_size * wq_idx;
+
+ mem_data.in.host_map_data.bus_address = buf->bus_address + wq_arr_props->offset +
+ wq_size * wq_idx;
+
+ /* Check for out of range. */
+ if ((u64)mem_data.in.host_map_data.kernel_address + mem_data.size >
+ (u64)buf->kernel_address + wq_arr_size) {
+ dev_dbg(hdev->dev,
+ "Out of range kernel addr. kernel addr 0x%p, size 0x%llx\n",
+ mem_data.in.host_map_data.kernel_address, mem_data.size);
+ return -EINVAL;
+ }
+ }
+
+ /* Allocate host vmalloc memory and map its physical pages to PMMU. */
+ rc = hbl_cn_mem_alloc(qp->ctx, &mem_data);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to allocate %s. Port %d, QP %d\n",
+ is_swq ? "SWQ" : "RWQ", cn_port->port, qp->qp_id);
+ return rc;
+ }
+
+ /* Retrieve mmap handle. */
+ if (is_swq) {
+ qp->swq_handle = mem_data.handle;
+ qp->swq_size = mem_data.size;
+ } else {
+ qp->rwq_handle = mem_data.handle;
+ qp->rwq_size = mem_data.size;
+ }
+
+ return 0;
+}
+
+static int set_req_qp_ctx(struct hbl_cn_device *hdev, struct hbl_cni_req_conn_ctx_in *in,
+ struct hbl_cni_req_conn_ctx_out *out)
+{
+ struct hbl_cn_wq_array_properties *swq_arr_props, *rwq_arr_props;
+ struct hbl_cn_encap_xarray_pdata *encap_data;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 wq_size, port, max_wq_size;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_qp *qp;
+ int rc;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ asic_funcs = hdev->asic_funcs;
+ port_funcs = asic_funcs->port_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ if (in->timer_granularity > NIC_TMR_TIMEOUT_MAX_GRAN) {
+ dev_err(hdev->dev,
+ "timer granularity %d is not supported\n", in->timer_granularity);
+ return -EINVAL;
+ }
+
+ if (!in->timer_granularity && !hbl_cn_is_ibdev_opened(hdev))
+ in->timer_granularity = NIC_TMR_TIMEOUT_DEFAULT_GRAN;
+
+ port_funcs->cfg_lock(cn_port);
+ qp = xa_load(&cn_port->qp_ids, in->conn_id);
+
+ if (IS_ERR_OR_NULL(qp)) {
+ dev_dbg(hdev->dev, "Failed to find matching QP for handle %d, port %d\n",
+ in->conn_id, port);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ /* sanity test the port IDs */
+ if (qp->port != port) {
+ dev_dbg(hdev->dev, "QP port %d does not match requested port %d\n", qp->port, port);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ if (in->encap_en) {
+ encap_data = xa_load(&cn_port->encap_ids, in->encap_id);
+ if (!encap_data) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Encapsulation ID %d not found, ignoring\n",
+ in->encap_id);
+ in->encap_en = 0;
+ in->encap_id = 0;
+ }
+ }
+
+ if (qp->is_req) {
+ dev_dbg(hdev->dev, "Port %d, QP %d - Requester QP is already set\n", port,
+ qp->qp_id);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ wq_size = in->wq_size;
+
+ /* verify that size does not exceed wq_array size */
+ max_wq_size = cn_port->num_of_wq_entries;
+
+ if (wq_size > max_wq_size) {
+ dev_dbg(hdev->dev,
+ "Port %d, Requester QP %d - requested size (%d) > max size (%d)\n", port,
+ qp->qp_id, wq_size, max_wq_size);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ swq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_SEND];
+ rwq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_RECV];
+
+ if (!swq_arr_props->on_device_mem) {
+ rc = alloc_and_map_wq(cn_port, qp, wq_size, true);
+ if (rc)
+ goto cfg_unlock;
+
+ out->swq_mem_handle = qp->swq_handle;
+ out->swq_mem_size = qp->swq_size;
+ }
+
+ if (!rwq_arr_props->on_device_mem) {
+ rc = alloc_and_map_wq(cn_port, qp, wq_size, false);
+ if (rc)
+ goto err_free_swq;
+
+ out->rwq_mem_handle = qp->rwq_handle;
+ out->rwq_mem_size = qp->rwq_size;
+ }
+
+ qp->remote_key = in->remote_key;
+
+ rc = hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_RTS, in);
+ if (rc)
+ goto err_free_rwq;
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return 0;
+
+err_free_rwq:
+ if (qp->rwq_handle) {
+ hbl_cn_mem_destroy(hdev, qp->rwq_handle);
+ qp->rwq_handle = 0;
+ out->rwq_mem_handle = qp->rwq_handle;
+ if (!rwq_arr_props->dva_base) {
+ int ret;
+
+ ret = hbl_cn_mem_buf_put_handle(hdev, rwq_arr_props->handle);
+ if (ret == 1)
+ rwq_arr_props->handle = 0;
+ }
+ }
+err_free_swq:
+ if (qp->swq_handle) {
+ hbl_cn_mem_destroy(hdev, qp->swq_handle);
+ qp->swq_handle = 0;
+ out->swq_mem_handle = qp->swq_handle;
+ if (!swq_arr_props->dva_base) {
+ int ret;
+
+ ret = hbl_cn_mem_buf_put_handle(hdev, swq_arr_props->handle);
+ if (ret == 1)
+ swq_arr_props->handle = 0;
+ }
+ }
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int set_res_qp_ctx(struct hbl_cn_device *hdev, struct hbl_cni_res_conn_ctx_in *in)
+{
+ struct hbl_cn_encap_xarray_pdata *encap_data;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_qp *qp;
+ u32 port;
+ int rc;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ asic_funcs = hdev->asic_funcs;
+ port_funcs = asic_funcs->port_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs->cfg_lock(cn_port);
+ qp = xa_load(&cn_port->qp_ids, in->conn_id);
+
+ if (IS_ERR_OR_NULL(qp)) {
+ dev_dbg(hdev->dev, "Failed to find matching QP for handle %d, port %d\n",
+ in->conn_id, port);
+ rc = -EINVAL;
+ goto unlock_cfg;
+ }
+
+ if (in->encap_en) {
+ encap_data = xa_load(&cn_port->encap_ids, in->encap_id);
+ if (!encap_data) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Encapsulation ID %d not found, ignoring\n",
+ in->encap_id);
+ in->encap_en = 0;
+ in->encap_id = 0;
+ }
+ }
+
+ if (qp->is_res) {
+ dev_dbg(hdev->dev, "Port %d, QP %d - Responder QP is already set\n", port,
+ qp->qp_id);
+ rc = -EINVAL;
+ goto unlock_cfg;
+ }
+
+ /* sanity test the port IDs */
+ if (qp->port != port) {
+ dev_dbg(hdev->dev, "QP port %d does not match requested port %d\n", qp->port, port);
+ rc = -EINVAL;
+ goto unlock_cfg;
+ }
+
+ qp->local_key = in->local_key;
+
+ if (qp->curr_state == CN_QP_STATE_RESET) {
+ rc = hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_INIT, NULL);
+ if (rc)
+ goto unlock_cfg;
+ }
+
+ /* all is well, we are ready to receive */
+ rc = hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_RTR, in);
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+
+unlock_cfg:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+/* must be called under the port cfg lock */
+u32 hbl_cn_get_max_qp_id(struct hbl_cn_port *cn_port)
+{
+ int max_qp_id = cn_port->qp_idx_offset;
+ unsigned long qp_id = 0;
+ struct hbl_cn_qp *qp;
+
+ xa_for_each(&cn_port->qp_ids, qp_id, qp)
+ if (qp->qp_id > max_qp_id)
+ max_qp_id = qp->qp_id;
+
+ return max_qp_id;
+}
+
+static void qp_destroy_work(struct work_struct *work)
+{
+ struct hbl_cn_qp *qp = container_of(work, struct hbl_cn_qp, async_work);
+ struct hbl_cn_wq_array_properties *swq_arr_props, *rwq_arr_props;
+ struct hbl_cn_port *cn_port = qp->cn_port;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_qpc_drain_attr drain_attr;
+ struct hbl_cn_qpc_reset_attr rst_attr;
+ struct hbl_cn_ctx *ctx = qp->ctx;
+ struct hbl_cn_device *hdev;
+ int rc;
+
+ hdev = cn_port->hdev;
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ if (!hdev->operational) {
+ drain_attr.wait_for_idle = false;
+ rst_attr.reset_mode = hdev->qp_reset_mode;
+ } else {
+ drain_attr.wait_for_idle = true;
+ rst_attr.reset_mode = CN_QP_RESET_MODE_GRACEFUL;
+ }
+
+ /* Complete the wait for SQ to drain. To allow parallel QPs destruction, don't take the cfg
+ * lock here. This is safe because SQD->SQD QP transition is a simple wait to drain the QP
+ * without any access to the HW.
+ */
+ if (qp->curr_state == CN_QP_STATE_SQD)
+ hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_SQD, &drain_attr);
+
+ port_funcs->cfg_lock(cn_port);
+
+ hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_RESET, &rst_attr);
+
+ port_funcs->unregister_qp(cn_port, qp->qp_id);
+
+ swq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_SEND];
+ rwq_arr_props = &cn_port->wq_arr_props[HBL_CNI_USER_WQ_RECV];
+
+ if (qp->swq_handle) {
+ hbl_cn_mem_destroy(hdev, qp->swq_handle);
+ qp->swq_handle = 0;
+ if (!swq_arr_props->dva_base) {
+ rc = hbl_cn_mem_buf_put_handle(hdev, swq_arr_props->handle);
+ if (rc == 1)
+ swq_arr_props->handle = 0;
+ }
+ }
+
+ if (qp->rwq_handle) {
+ hbl_cn_mem_destroy(hdev, qp->rwq_handle);
+ qp->rwq_handle = 0;
+ if (!rwq_arr_props->dva_base) {
+ rc = hbl_cn_mem_buf_put_handle(hdev, rwq_arr_props->handle);
+ if (rc == 1)
+ rwq_arr_props->handle = 0;
+ }
+ }
+
+ xa_erase(&cn_port->qp_ids, qp->qp_id);
+
+ if (atomic_dec_and_test(&cn_port->num_of_allocated_qps)) {
+ if (swq_arr_props->under_unset)
+ __user_wq_arr_unset(ctx, cn_port, HBL_CNI_USER_WQ_SEND);
+
+ if (rwq_arr_props->under_unset)
+ __user_wq_arr_unset(ctx, cn_port, HBL_CNI_USER_WQ_RECV);
+ }
+
+ if (qp->req_user_cq)
+ hbl_cn_user_cq_put(qp->req_user_cq);
+
+ if (qp->res_user_cq)
+ hbl_cn_user_cq_put(qp->res_user_cq);
+
+ port_funcs->qp_post_destroy(qp);
+
+ /* hbl_cn_mem_destroy() must be called inside the lock, though not to protect
+  * the call itself: the handles (swq_handle and rwq_handle) are derived from
+  * the QP ID, so the lock prevents concurrent memory access from a new handle
+  * created for the same QP ID before the memory is freed.
+  */
+ port_funcs->cfg_unlock(cn_port);
+
+ kfree(qp);
+}
+
+static void qps_drain_async_work(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i, num_gen_qps;
+
+ /* wait for the workers to complete */
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ drain_workqueue(cn_port->qp_wq);
+
+ num_gen_qps = atomic_read(&cn_port->num_of_allocated_qps);
+ if (num_gen_qps)
+ dev_warn(hdev->dev, "Port %d still has %d QPs alive\n", i, num_gen_qps);
+ }
+}
+
+static inline int __must_check PTR_ERR_OR_EINVAL(__force const void *ptr)
+{
+ if (IS_ERR(ptr))
+ return PTR_ERR(ptr);
+
+ return -EINVAL;
+}
+
+static int destroy_qp(struct hbl_cn_device *hdev, struct hbl_cni_destroy_conn_in *in)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_qp *qp;
+ u32 port, flags;
+ int rc;
+
+ port = in->port;
+
+ if (port >= hdev->cn_props.max_num_of_ports) {
+ dev_dbg(hdev->dev, "Invalid port %d\n", port);
+ return -EINVAL;
+ }
+
+ cn_port = &hdev->cn_ports[port];
+
+ /* When destroying QPs of external ports, the port may already be closed by a
+  * user issuing an "ip link set down" command, so for such ports we only check
+  * that the port is enabled.
+  */
+ flags = cn_port->eth_enable ? NIC_PORT_CHECK_ENABLE : NIC_PORT_CHECK_OPEN;
+ flags |= NIC_PORT_PRINT_ON_ERR;
+ rc = hbl_cn_cmd_port_check(hdev, port, flags);
+ if (rc)
+ return rc;
+
+ asic_funcs = hdev->asic_funcs;
+ port_funcs = asic_funcs->port_funcs;
+
+ /* prevent reentrancy by locking the whole process of destroy_qp */
+ port_funcs->cfg_lock(cn_port);
+ qp = xa_load(&cn_port->qp_ids, in->conn_id);
+
+ if (IS_ERR_OR_NULL(qp)) {
+ rc = PTR_ERR_OR_EINVAL(qp);
+ goto out;
+ }
+
+ hbl_cn_qp_do_release(qp);
+ rc = 0;
+
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = cn_port->hdev->asic_funcs->port_funcs;
+ struct hbl_cn_qpc_drain_attr drain = { .wait_for_idle = false, };
+ unsigned long qp_id = 0;
+ struct hbl_cn_qp *qp;
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_for_each(&cn_port->qp_ids, qp_id, qp) {
+ if (IS_ERR_OR_NULL(qp))
+ continue;
+
+ hbl_cn_qp_modify(cn_port, qp, CN_QP_STATE_SQD, (void *)&drain);
+ }
+
+ port_funcs->cfg_unlock(cn_port);
+}
+
+static void qps_stop(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ /* stop the QPs */
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ hbl_cn_qps_stop(cn_port);
+ }
+}
+
+static int user_wq_arr_set(struct hbl_cn_device *hdev, struct hbl_cni_user_wq_arr_set_in *in,
+ struct hbl_cni_user_wq_arr_set_out *out, struct hbl_cn_ctx *ctx)
+{
+ u32 port, type, num_of_wqs, num_of_wq_entries, min_wqs_per_port, mem_id;
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_properties *cn_props;
+ struct hbl_cn_port *cn_port;
+ char *type_str;
+ int rc;
+
+ if (in->swq_granularity > HBL_CNI_SWQE_GRAN_64B) {
+ dev_dbg(hdev->dev, "Invalid send WQE granularity %d\n", in->swq_granularity);
+ return -EINVAL;
+ }
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+ cn_props = &hdev->cn_props;
+
+ type = in->type;
+
+ if (type > cn_props->max_wq_arr_type) {
+ dev_dbg(hdev->dev, "invalid type %d, can't set user WQ\n", type);
+ return -EINVAL;
+ }
+
+ mem_id = in->mem_id;
+
+ if (mem_id != HBL_CNI_MEM_HOST && mem_id != HBL_CNI_MEM_DEVICE) {
+ dev_dbg(hdev->dev, "invalid memory type %d for user WQ\n", mem_id);
+ return -EINVAL;
+ }
+
+ port = in->port;
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ wq_arr_props = &cn_port->wq_arr_props[type];
+ type_str = wq_arr_props->type_str;
+
+ /* For generic WQs, a minimum of two WQs is required: one for raw Ethernet and one for RDMA */
+ min_wqs_per_port = NIC_MIN_WQS_PER_PORT;
+ if (in->num_of_wqs < min_wqs_per_port) {
+ dev_dbg(hdev->dev, "number of %s WQs must be minimum %d, port %d\n", type_str,
+ min_wqs_per_port, port);
+ return -EINVAL;
+ }
+
+ /* H/W limitation */
+ if (in->num_of_wqs > cn_props->max_hw_qps_num) {
+ dev_dbg(hdev->dev, "number of %s WQs (0x%x) can't be bigger than 0x%x, port %d\n",
+ type_str, in->num_of_wqs, cn_props->max_hw_qps_num, port);
+ return -EINVAL;
+ }
+
+ if (!is_power_of_2(in->num_of_wq_entries)) {
+ dev_dbg(hdev->dev,
+ "number of %s WQ entries (0x%x) must be a power of 2, port %d\n", type_str,
+ in->num_of_wq_entries, port);
+ return -EINVAL;
+ }
+
+ /* H/W limitation */
+ if (in->num_of_wq_entries < cn_props->min_hw_user_wqs_num) {
+ dev_dbg(hdev->dev,
+ "number of %s WQ entries (0x%x) must be at least %d, port %d\n", type_str,
+ in->num_of_wq_entries, cn_props->min_hw_user_wqs_num, port);
+ return -EINVAL;
+ }
+
+ /* H/W limitation */
+ if (in->num_of_wq_entries > cn_props->max_hw_user_wqs_num) {
+ dev_dbg(hdev->dev,
+ "number of %s WQ entries (0x%x) can't be bigger than 0x%x, port %d\n",
+ type_str, in->num_of_wq_entries, cn_props->max_hw_user_wqs_num, port);
+ return -EINVAL;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to set %s WQ array, set_app_params wasn't called yet, port %d\n",
+ type_str, port);
+ rc = -EPERM;
+ goto out;
+ }
+
+ /* Check the under_unset condition first: a previous (asynchronous) WQ unset
+  * operation may still be in progress, in which case we want to return -EAGAIN
+  * to the caller rather than -EINVAL
+  */
+ if (wq_arr_props->enabled && wq_arr_props->under_unset) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Retry to set %s WQ array as it is under unset, port %d\n",
+ type_str, port);
+ rc = -EAGAIN;
+ goto out;
+ }
+
+ if (wq_arr_props->enabled) {
+ dev_dbg(hdev->dev, "%s WQ array is already enabled, port %d\n", type_str, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (wq_arr_props->under_unset) {
+ dev_dbg(hdev->dev,
+ "Failed to set %s WQ array as it is not enabled and under unset, port %d\n",
+ type_str, port);
+ rc = -EPERM;
+ goto out;
+ }
+
+ num_of_wq_entries = cn_port->num_of_wq_entries;
+ num_of_wqs = cn_port->num_of_wqs;
+
+ if (num_of_wq_entries && num_of_wq_entries != in->num_of_wq_entries) {
+ dev_dbg(hdev->dev, "%s WQ number of entries (0x%x) should be 0x%x, port %d\n",
+ type_str, in->num_of_wq_entries, num_of_wq_entries, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (num_of_wqs && num_of_wqs != in->num_of_wqs) {
+ dev_dbg(hdev->dev, "%s WQs number (0x%x) should be 0x%x, port %d\n",
+ type_str, in->num_of_wqs, num_of_wqs, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ rc = hdev->asic_funcs->user_wq_arr_set(hdev, in, out, ctx);
+ if (rc) {
+ dev_err(hdev->dev, "%s WQ array set failed, port %d, err %d\n", type_str, port, rc);
+ goto out;
+ }
+
+ cn_port->num_of_wq_entries = in->num_of_wq_entries;
+ cn_port->num_of_wqs = in->num_of_wqs;
+
+ wq_arr_props->enabled = true;
+
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_device *hdev;
+ char *type_str;
+ u32 port;
+ int rc;
+
+ hdev = ctx->hdev;
+ wq_arr_props = &cn_port->wq_arr_props[type];
+ type_str = wq_arr_props->type_str;
+ port = cn_port->port;
+
+ rc = hdev->asic_funcs->port_funcs->user_wq_arr_unset(ctx, cn_port, type);
+ if (rc)
+ dev_err(hdev->dev, "%s WQ array unset failed, port %d, err %d\n", type_str, port,
+ rc);
+
+ wq_arr_props->enabled = false;
+ wq_arr_props->under_unset = false;
+
+ if (!cn_port->wq_arr_props[HBL_CNI_USER_WQ_SEND].enabled &&
+ !cn_port->wq_arr_props[HBL_CNI_USER_WQ_RECV].enabled) {
+ cn_port->num_of_wq_entries = 0;
+ cn_port->num_of_wqs = 0;
+ }
+
+ return rc;
+}
+
+static int user_wq_arr_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_wq_arr_unset_in *in,
+ struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_properties *cn_props;
+ struct hbl_cn_port *cn_port;
+ u32 port, type;
+ char *type_str;
+ int rc;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+ cn_props = &hdev->cn_props;
+
+ type = in->type;
+
+ if (type > cn_props->max_wq_arr_type) {
+ dev_dbg(hdev->dev, "invalid type %d, can't unset user WQ\n", type);
+ return -EINVAL;
+ }
+
+ port = in->port;
+
+ /* No need to check if the port is open because internal ports are always open and external
+ * ports might be closed by a user command e.g. "ip link set down" after a WQ was
+ * configured, but we still want to unset it.
+ */
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_ENABLE | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ wq_arr_props = &cn_port->wq_arr_props[type];
+ type_str = wq_arr_props->type_str;
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!wq_arr_props->enabled) {
+ dev_dbg(hdev->dev, "%s WQ array is disabled, port %d\n", type_str, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (wq_arr_props->under_unset) {
+ dev_dbg(hdev->dev, "%s WQ array is already under unset, port %d\n", type_str, port);
+ rc = -EPERM;
+ goto out;
+ }
+
+ /* Allocated QPs might still use the WQ, hence unset the WQ once they are destroyed */
+ if (atomic_read(&cn_port->num_of_allocated_qps)) {
+ wq_arr_props->under_unset = true;
+ rc = 0;
+ goto out;
+ }
+
+ rc = __user_wq_arr_unset(ctx, cn_port, type);
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int alloc_user_cq_id(struct hbl_cn_device *hdev, struct hbl_cni_alloc_user_cq_id_in *in,
+ struct hbl_cni_alloc_user_cq_id_out *out, struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_properties *cn_props = &hdev->cn_props;
+ u32 min_id, max_id, port, flags;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_cn_port *cn_port;
+ struct xa_limit id_limit;
+ int id, rc;
+
+ port = in->port;
+ flags = NIC_PORT_PRINT_ON_ERR;
+
+ if (!cn_props->force_cq)
+ flags |= NIC_PORT_CHECK_OPEN;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, flags);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ user_cq = kzalloc(sizeof(*user_cq), GFP_KERNEL);
+ if (!user_cq)
+ return -ENOMEM;
+
+ user_cq->state = USER_CQ_STATE_ALLOC;
+ user_cq->ctx = ctx;
+ user_cq->cn_port = cn_port;
+ kref_init(&user_cq->refcount);
+
+ port_funcs->get_cq_id_range(cn_port, &min_id, &max_id);
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to allocate a CQ ID, set_app_params wasn't called yet, port %d\n",
+ port);
+ rc = -EPERM;
+ goto cfg_unlock;
+ }
+
+ id_limit = XA_LIMIT(min_id, max_id);
+ rc = xa_alloc(&cn_port->cq_ids, &id, user_cq, id_limit, GFP_KERNEL);
+ if (rc) {
+ dev_err(hdev->dev, "No available user CQ, port %d\n", port);
+ goto cfg_unlock;
+ }
+
+ user_cq->id = id;
+
+ mutex_init(&user_cq->overrun_lock);
+
+ port_funcs->cfg_unlock(cn_port);
+
+ dev_dbg(hdev->dev, "Allocated CQ id %d in port %d\n", id, port);
+
+ out->id = id;
+
+ return 0;
+
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+ kfree(user_cq);
+
+ return rc;
+}
+
+static bool validate_cq_id_range(struct hbl_cn_port *cn_port, u32 cq_id)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ u32 min_id, max_id;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->get_cq_id_range(cn_port, &min_id, &max_id);
+
+ return (cq_id >= min_id) && (cq_id <= max_id);
+}
+
+static int __user_cq_set(struct hbl_cn_device *hdev, struct hbl_cni_user_cq_set_in_params *in,
+ struct hbl_cni_user_cq_set_out_params *out)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_properties *cn_props = &hdev->cn_props;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_cn_port *cn_port;
+ u32 port, flags, id;
+ int rc;
+
+ id = in->id;
+ port = in->port;
+
+ flags = NIC_PORT_PRINT_ON_ERR;
+
+ if (!cn_props->force_cq)
+ flags |= NIC_PORT_CHECK_OPEN;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, flags);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ if (!validate_cq_id_range(cn_port, id)) {
+ dev_dbg(hdev->dev, "user CQ %d is invalid, port %d\n", id, port);
+ return -EINVAL;
+ }
+
+ if (in->num_of_cqes < cn_props->user_cq_min_entries) {
+ dev_dbg(hdev->dev,
+ "user CQ %d buffer length must be at least 0x%x entries, port %d\n",
+ id, cn_props->user_cq_min_entries, port);
+ return -EINVAL;
+ }
+
+ if (!is_power_of_2(in->num_of_cqes)) {
+ dev_dbg(hdev->dev, "user CQ %d buffer length must be a power of 2, port %d\n",
+ id, port);
+ return -EINVAL;
+ }
+
+ if (in->num_of_cqes > cn_props->user_cq_max_entries) {
+ dev_dbg(hdev->dev,
+ "user CQ %d buffer length must not be more than 0x%x entries, port %d\n",
+ id, cn_props->user_cq_max_entries, port);
+ return -EINVAL;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+
+ /* Validate if user CQ is allocated. */
+ user_cq = xa_load(&cn_port->cq_ids, id);
+ if (!user_cq) {
+ dev_dbg(hdev->dev, "user CQ %d wasn't allocated, port %d\n", id, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ /* Validate that user CQ is in ALLOC state. */
+ if (user_cq->state != USER_CQ_STATE_ALLOC) {
+ dev_dbg(hdev->dev, "user CQ %d set failed, current state %d, port %d\n",
+ id, user_cq->state, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ rc = port_funcs->user_cq_set(user_cq, in, out);
+ if (rc) {
+ dev_dbg(hdev->dev, "user CQ %d set failed, port %d\n", id, port);
+ goto out;
+ }
+
+ user_cq->state = USER_CQ_STATE_SET;
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int user_cq_id_set(struct hbl_cn_device *hdev, struct hbl_cni_user_cq_id_set_in *in,
+ struct hbl_cni_user_cq_id_set_out *out)
+{
+ struct hbl_cni_user_cq_set_out_params out2 = {};
+ struct hbl_cni_user_cq_set_in_params in2 = {};
+ int rc;
+
+ in2.port = in->port;
+ in2.num_of_cqes = in->num_of_cqes;
+ in2.id = in->id;
+
+ rc = __user_cq_set(hdev, &in2, &out2);
+ if (rc)
+ return rc;
+
+ out->mem_handle = out2.mem_handle;
+ out->pi_handle = out2.pi_handle;
+ out->regs_handle = out2.regs_handle;
+ out->regs_offset = out2.regs_offset;
+
+ return 0;
+}
+
+static void user_cq_destroy(struct kref *kref)
+{
+ struct hbl_cn_user_cq *user_cq = container_of(kref, struct hbl_cn_user_cq, refcount);
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* Destroy the remaining resources allocated in the SET state. The callback
+  * below must be called only if the CQ moved to unset from the set state,
+  * since this resource is created only during set. If the CQ moved directly
+  * from alloc to unset, there is no resource to clear.
+  */
+ if (user_cq->state == USER_CQ_STATE_SET_TO_UNSET)
+ port_funcs->user_cq_destroy(user_cq);
+
+ mutex_destroy(&user_cq->overrun_lock);
+ xa_erase(&cn_port->cq_ids, user_cq->id);
+ kfree(user_cq);
+}
+
+struct hbl_cn_user_cq *hbl_cn_user_cq_get(struct hbl_cn_port *cn_port, u8 cq_id)
+{
+ struct hbl_cn_user_cq *user_cq;
+
+ user_cq = xa_load(&cn_port->cq_ids, cq_id);
+ if (!user_cq || user_cq->state != USER_CQ_STATE_SET)
+ return NULL;
+
+ kref_get(&user_cq->refcount);
+
+ return user_cq;
+}
+
+int hbl_cn_user_cq_put(struct hbl_cn_user_cq *user_cq)
+{
+ return kref_put(&user_cq->refcount, user_cq_destroy);
+}
+
+static int user_cq_unset_locked(struct hbl_cn_user_cq *user_cq, bool warn_if_alive)
+{
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port, id = user_cq->id;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ int rc = 0, ret;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* Call unset only if the CQ has already been SET */
+ if (user_cq->state == USER_CQ_STATE_SET) {
+ rc = port_funcs->user_cq_unset(user_cq);
+ if (rc)
+ dev_dbg(hdev->dev, "user CQ %d unset failed, port %d\n", id, port);
+
+ user_cq->state = USER_CQ_STATE_SET_TO_UNSET;
+ } else {
+ user_cq->state = USER_CQ_STATE_ALLOC_TO_UNSET;
+ }
+
+ /* we'd like to destroy even if the unset callback returned error */
+ ret = hbl_cn_user_cq_put(user_cq);
+
+ if (warn_if_alive && ret != 1)
+ dev_warn(hdev->dev, "user CQ %d was not destroyed, port %d\n", id, port);
+
+ return rc;
+}
+
+static int __user_cq_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_cq_unset_in_params *in)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_properties *cn_props = &hdev->cn_props;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_cn_port *cn_port;
+ u32 port, flags, id;
+ int rc;
+
+ port = in->port;
+ id = in->id;
+
+ if (port >= cn_props->max_num_of_ports) {
+ dev_dbg(hdev->dev, "Invalid port %d\n", port);
+ return -EINVAL;
+ }
+
+ cn_port = &hdev->cn_ports[port];
+
+ flags = NIC_PORT_PRINT_ON_ERR;
+
+ /* Unless the force_cq flag is enabled, when unsetting a user CQ of an
+  * external port, the port may already be closed by the user, so we only
+  * check if the port is enabled.
+  */
+ if (!cn_props->force_cq)
+ flags |= cn_port->eth_enable ? NIC_PORT_CHECK_ENABLE : NIC_PORT_CHECK_OPEN;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, flags);
+ if (rc)
+ return rc;
+
+ if (!validate_cq_id_range(cn_port, id)) {
+ dev_dbg(hdev->dev, "user CQ %d is invalid, port %d\n", id, port);
+ return -EINVAL;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+
+ /* Validate if user CQ is allocated. */
+ user_cq = xa_load(&cn_port->cq_ids, id);
+ if (!user_cq) {
+ dev_dbg(hdev->dev, "user CQ %d wasn't allocated, port %d\n", id, port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ rc = user_cq_unset_locked(user_cq, false);
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int user_cq_id_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_cq_id_unset_in *in)
+{
+ struct hbl_cni_user_cq_unset_in_params in2 = {};
+
+ in2.port = in->port;
+ in2.id = in->id;
+
+ return __user_cq_unset(hdev, &in2);
+}
+
+static int user_set_app_params(struct hbl_cn_device *hdev,
+ struct hbl_cni_set_user_app_params_in *in, struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ bool modify_wqe_checkers;
+ u32 port;
+ int rc;
+
+ port_funcs = asic_funcs->port_funcs;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ /* We must take rtnl_lock prior to cfg_lock, as we may land in a flow that
+  * extracts the IP port. That can deadlock if a net subsystem operation which
+  * requires cfg_lock is executed at the same time: such an operation first
+  * obtains rtnl_lock and then tries to take cfg_lock, hence the deadlock.
+  */
+ rtnl_lock();
+ port_funcs->cfg_lock(cn_port);
+
+ rc = asic_funcs->user_set_app_params(hdev, in, &modify_wqe_checkers, ctx);
+ if (rc)
+ goto out;
+
+ if (modify_wqe_checkers) {
+ rc = hdev->asic_funcs->port_funcs->disable_wqe_index_checker(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to disable WQE index checker, port %d rc %d\n",
+ port, rc);
+ goto out;
+ }
+ }
+
+ cn_port->set_app_params = true;
+
+out:
+ port_funcs->cfg_unlock(cn_port);
+ rtnl_unlock();
+
+ return rc;
+}
+
+static int user_get_app_params(struct hbl_cn_device *hdev,
+ struct hbl_cni_get_user_app_params_in *in,
+ struct hbl_cni_get_user_app_params_out *out)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 port;
+ int rc;
+
+ port_funcs = asic_funcs->port_funcs;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs->cfg_lock(cn_port);
+ asic_funcs->user_get_app_params(hdev, in, out);
+ port_funcs->cfg_unlock(cn_port);
+
+ return 0;
+}
+
+static int eq_poll(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx,
+ struct hbl_cni_eq_poll_in *in, struct hbl_cni_eq_poll_out *out)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 port;
+ int rc;
+
+ port = in->port;
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+ rc = asic_funcs->port_funcs->eq_poll(cn_port, ctx->asid, out);
+ switch (rc) {
+ case 0:
+ out->status = HBL_CNI_EQ_POLL_STATUS_SUCCESS;
+ break;
+ case -EOPNOTSUPP:
+ out->status = HBL_CNI_EQ_POLL_STATUS_ERR_UNSUPPORTED_OP;
+ break;
+ case -EINVAL:
+ out->status = HBL_CNI_EQ_POLL_STATUS_ERR_NO_SUCH_PORT;
+ break;
+ case -ENXIO:
+ out->status = HBL_CNI_EQ_POLL_STATUS_ERR_PORT_DISABLED;
+ break;
+ case -ENODATA:
+ out->status = HBL_CNI_EQ_POLL_STATUS_EQ_EMPTY;
+ break;
+ case -ESRCH:
+ out->status = HBL_CNI_EQ_POLL_STATUS_ERR_NO_SUCH_EQ;
+ break;
+ default:
+ out->status = HBL_CNI_EQ_POLL_STATUS_ERR_UNDEF;
+ break;
+ }
+
+ return 0;
+}
+
+static void get_user_db_fifo_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id,
+ u32 id_hint)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+
+ port_funcs = cn_port->hdev->asic_funcs->port_funcs;
+
+ /* id_hint comes from the user. The driver enforces allocation of the
+  * requested db fifo HW resource, i.e. it fails if the requested resource is
+  * not available, because the user stack has hard-coded user fifo resource
+  * IDs.
+  */
+ if (id_hint) {
+ *min_id = id_hint;
+ *max_id = id_hint;
+ } else {
+ port_funcs->get_db_fifo_id_range(cn_port, min_id, max_id);
+ }
+}
+
+static int alloc_user_db_fifo(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx,
+ struct hbl_cni_alloc_user_db_fifo_in *in,
+ struct hbl_cni_alloc_user_db_fifo_out *out)
+{
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ struct xa_limit id_limit;
+ u32 min_id, max_id;
+ int rc, id;
+ u32 port;
+
+ port = in->port;
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ get_user_db_fifo_id_range(cn_port, &min_id, &max_id, in->id_hint);
+
+ /* XArray private data. */
+ xa_pdata = kzalloc(sizeof(*xa_pdata), GFP_KERNEL);
+ if (!xa_pdata)
+ return -ENOMEM;
+
+ xa_pdata->asid = ctx->asid;
+ xa_pdata->state = DB_FIFO_STATE_ALLOC;
+ xa_pdata->port = port;
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to allocate DB FIFO, set_app_params wasn't called yet, port %d\n",
+ port);
+ rc = -EPERM;
+ goto cfg_unlock;
+ }
+
+ id_limit = XA_LIMIT(min_id, max_id);
+ rc = xa_alloc(&cn_port->db_fifo_ids, &id, xa_pdata, id_limit, GFP_KERNEL);
+ if (rc) {
+ dev_dbg_ratelimited(hdev->dev, "DB FIFO ID allocation failed, port %d\n", port);
+ goto cfg_unlock;
+ }
+
+ xa_pdata->id = id;
+
+ port_funcs->cfg_unlock(cn_port);
+
+ out->id = id;
+
+ return 0;
+
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+ kfree(xa_pdata);
+ return rc;
+}
+
+static int validate_db_fifo_id_range(struct hbl_cn_port *cn_port, u32 db_fifo_id)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_device *hdev;
+ u32 min_id, max_id;
+
+ hdev = cn_port->hdev;
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->port_funcs->get_db_fifo_id_range(cn_port, &min_id, &max_id);
+
+ if (db_fifo_id < min_id || db_fifo_id > max_id) {
+ dev_dbg_ratelimited(hdev->dev, "Invalid db fifo ID, %d, port: %d\n", db_fifo_id,
+ cn_port->port);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int validate_db_fifo_mode(struct hbl_cn_port *cn_port, u8 fifo_mode)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_device *hdev;
+ u32 modes_mask;
+
+ hdev = cn_port->hdev;
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->port_funcs->get_db_fifo_modes_mask(cn_port, &modes_mask);
+
+ if (!(BIT(fifo_mode) & modes_mask)) {
+ dev_dbg_ratelimited(hdev->dev, "Invalid db fifo mode, %d, port: %d\n", fifo_mode,
+ cn_port->port);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int validate_db_fifo_ioctl(struct hbl_cn_port *cn_port, u32 db_fifo_id)
+{
+ return validate_db_fifo_id_range(cn_port, db_fifo_id);
+}
+
+static int user_db_fifo_unset_and_free(struct hbl_cn_port *cn_port,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ int rc = 0;
+
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->port_funcs->db_fifo_unset(cn_port, xa_pdata->id, xa_pdata);
+
+ /* Destroy the CI buffer if we allocated one.
+  * Note: not all db fifo modes need a CI memory buffer; some track CI via
+  * sync objects.
+  * If destroying the CI memory fails, we must not exit this function without
+  * freeing the db_fifo_pool allocation. Otherwise a kernel assertion would
+  * trigger on rmmod, as destroying the db_fifo_pool would fail while
+  * allocations are still left in the pool. So the db_fifo_pool allocation is
+  * freed regardless of whether the CI memory was destroyed.
+  */
+ if (xa_pdata->ci_mmap_handle)
+ rc = hbl_cn_mem_destroy(hdev, xa_pdata->ci_mmap_handle);
+
+ asic_funcs->port_funcs->db_fifo_free(cn_port, xa_pdata->db_pool_addr, xa_pdata->fifo_size);
+
+ return rc;
+}
+
+static int user_db_fifo_set(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx,
+ struct hbl_cni_user_db_fifo_set_in *in,
+ struct hbl_cni_user_db_fifo_set_out *out)
+{
+ u64 umr_block_addr, umr_mmap_handle, ci_mmap_handle = 0, ci_device_handle;
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_mem_data mem_data = {};
+ struct hbl_cn_port *cn_port;
+ u32 umr_db_offset, port, id;
+ int rc;
+
+ port = in->port;
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+ id = in->id;
+
+ rc = validate_db_fifo_ioctl(cn_port, id);
+ if (rc)
+ return rc;
+
+ /* Get the allocated ID's private data. Keeping metadata associated with the
+  * ID also helps validate that the user does not trick the kernel into
+  * configuring the db fifo HW for an unallocated ID.
+  */
+ port_funcs->cfg_lock(cn_port);
+ xa_pdata = xa_load(&cn_port->db_fifo_ids, id);
+ if (!xa_pdata) {
+ dev_dbg_ratelimited(hdev->dev, "DB FIFO ID %d is not allocated, port: %d\n", id,
+ port);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ rc = validate_db_fifo_mode(cn_port, in->mode);
+ if (rc)
+ goto cfg_unlock;
+
+ xa_pdata->fifo_mode = in->mode;
+
+ /* The user may call db_fifo_set multiple times after db_fifo_alloc. So,
+  * before doing any further register changes, make sure to unset the previous
+  * settings for this ID.
+  */
+ if (xa_pdata->state == DB_FIFO_STATE_SET) {
+ rc = user_db_fifo_unset_and_free(cn_port, xa_pdata);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to unset DB FIFO %d before set, port %d\n", id,
+ port);
+ goto cfg_unlock;
+ }
+ }
+
+ rc = port_funcs->db_fifo_allocate(cn_port, xa_pdata);
+ if (rc) {
+ dev_dbg(hdev->dev, "DB FIFO %d allocation failed, port %d, mode %d\n", id, port,
+ in->mode);
+ goto cfg_unlock;
+ }
+
+ /* Get the user mapped register(UMR) block address and
+ * db fifo offset associated with the ID.
+ */
+ port_funcs->get_db_fifo_umr(cn_port, id, &umr_block_addr, &umr_db_offset);
+
+ /* Get mmap handle for UMR block. */
+ rc = hbl_cn_get_hw_block_handle(hdev, umr_block_addr, &umr_mmap_handle);
+ if (rc) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Failed to get UMR mmap handle of DB FIFO %d, port %d\n", id,
+ port);
+ goto free_db_fifo;
+ }
+
+ /* Allocate a consumer-index (CI) buffer in host kernel memory.
+  * HW updates the CI when it pops a db fifo. The user mmaps the CI buffer and
+  * may poll it to read the current CI.
+  * Allocate a full page, else we risk inadvertently exposing kernel data to
+  * userspace.
+  */
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_DMA_COHERENT;
+ mem_data.size = PAGE_SIZE;
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_dbg_ratelimited(hdev->dev,
+ "DB FIFO id %d, CI buffer allocation failed, port %d\n",
+ id, port);
+ goto free_db_fifo;
+ }
+
+ ci_mmap_handle = mem_data.handle;
+ ci_device_handle = mem_data.addr;
+
+ rc = port_funcs->db_fifo_set(cn_port, ctx, id, ci_device_handle, xa_pdata);
+ if (rc) {
+ dev_dbg_ratelimited(hdev->dev, "DB FIFO id %d, HW config failed, port %d\n", id,
+ port);
+ goto free_ci;
+ }
+
+ /* Cache IDR metadata and init IOCTL out. */
+ out->ci_handle = ci_mmap_handle;
+ out->regs_handle = umr_mmap_handle;
+ out->regs_offset = umr_db_offset;
+
+ xa_pdata->ci_mmap_handle = out->ci_handle;
+ xa_pdata->umr_mmap_handle = out->regs_handle;
+ xa_pdata->umr_db_offset = out->regs_offset;
+ xa_pdata->state = DB_FIFO_STATE_SET;
+
+ out->fifo_size = xa_pdata->fifo_size;
+ out->fifo_bp_thresh = xa_pdata->fifo_size / 2;
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return 0;
+
+free_ci:
+ if (ci_mmap_handle)
+ hbl_cn_mem_destroy(hdev, ci_mmap_handle);
+free_db_fifo:
+ port_funcs->db_fifo_free(cn_port, xa_pdata->db_pool_addr, xa_pdata->fifo_size);
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int __user_db_fifo_unset(struct hbl_cn_port *cn_port,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata)
+{
+ u32 id = xa_pdata->id;
+ int rc = 0;
+
+ /* The user may call unset, or the context may be destroyed, while a db fifo
+  * is still in the allocated state. Without tracking, the next call to
+  * alloc_user_db_fifo would skip that particular ID, blocking it indefinitely
+  * until a full reset is done. To avoid this, we maintain the state of each
+  * ID, and perform the unset only if a set had previously been done for it.
+  */
+ if (xa_pdata->state == DB_FIFO_STATE_SET)
+ rc = user_db_fifo_unset_and_free(cn_port, xa_pdata);
+
+ kfree(xa_pdata);
+ xa_erase(&cn_port->db_fifo_ids, id);
+
+ return rc;
+}
+
+static int user_db_fifo_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_db_fifo_unset_in *in)
+{
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ int rc;
+ u32 id;
+
+ rc = hbl_cn_cmd_port_check(hdev, in->port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[in->port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+ id = in->id;
+
+ rc = validate_db_fifo_ioctl(cn_port, id);
+ if (rc)
+ return rc;
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_pdata = xa_load(&cn_port->db_fifo_ids, id);
+ if (!xa_pdata) {
+ dev_dbg_ratelimited(hdev->dev, "DB fifo ID %d is not allocated, port: %d\n", id,
+ in->port);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ rc = __user_db_fifo_unset(cn_port, xa_pdata);
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int user_encap_alloc(struct hbl_cn_device *hdev, struct hbl_cni_user_encap_alloc_in *in,
+ struct hbl_cni_user_encap_alloc_out *out)
+{
+ struct hbl_cn_encap_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ struct xa_limit id_limit;
+ u32 min_id, max_id, id, port;
+ int rc;
+
+ port = in->port;
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->get_encap_id_range(cn_port, &min_id, &max_id);
+
+ /* xarray private data. */
+ xa_pdata = kzalloc(sizeof(*xa_pdata), GFP_KERNEL);
+ if (!xa_pdata)
+ return -ENOMEM;
+
+ xa_pdata->port = port;
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to allocate encapsulation ID, set_app_params wasn't called yet, port %d\n",
+ port);
+ rc = -EPERM;
+ goto cfg_unlock;
+ }
+
+ id_limit = XA_LIMIT(min_id, max_id);
+ rc = xa_alloc(&cn_port->encap_ids, &id, xa_pdata, id_limit, GFP_KERNEL);
+ if (rc) {
+ dev_dbg_ratelimited(hdev->dev, "Encapsulation ID allocation failed, port %d\n",
+ port);
+ goto cfg_unlock;
+ }
+
+ xa_pdata->id = id;
+ port_funcs->cfg_unlock(cn_port);
+
+ out->id = id;
+
+ return 0;
+
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+ kfree(xa_pdata);
+
+ return rc;
+}
+
+static int validate_encap_id_range(struct hbl_cn_port *cn_port, u32 encap_id)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_device *hdev;
+ u32 min_id, max_id;
+
+ hdev = cn_port->hdev;
+ asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->port_funcs->get_encap_id_range(cn_port, &min_id, &max_id);
+
+ if (encap_id < min_id || encap_id > max_id) {
+ dev_dbg_ratelimited(hdev->dev, "Invalid encapsulation ID, %d\n", encap_id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int validate_encap_ioctl(struct hbl_cn_port *cn_port, u32 encap_id)
+{
+ return validate_encap_id_range(cn_port, encap_id);
+}
+
+static bool is_encap_supported(struct hbl_cn_device *hdev, struct hbl_cni_user_encap_set_in *in)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->is_encap_supported(hdev, in);
+}
+
+static int user_encap_set(struct hbl_cn_device *hdev, struct hbl_cni_user_encap_set_in *in)
+{
+ struct hbl_cn_encap_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 id, encap_type_data = 0;
+ void *encap_header = NULL;
+ int rc;
+
+ /* Check if the user request for encap set is supported */
+ if (!is_encap_supported(hdev, in))
+ return -EOPNOTSUPP;
+
+ rc = hbl_cn_cmd_port_check(hdev, in->port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[in->port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+ id = in->id;
+
+ rc = validate_encap_ioctl(cn_port, id);
+ if (rc)
+ return rc;
+
+ switch (in->encap_type) {
+ case HBL_CNI_ENCAP_OVER_IPV4:
+ encap_type_data = in->ip_proto;
+ break;
+ case HBL_CNI_ENCAP_OVER_UDP:
+ encap_type_data = in->udp_dst_port;
+ break;
+ case HBL_CNI_ENCAP_NONE:
+ /* No encapsulation/tunneling mode. Just set
+ * source IPv4 address and UDP protocol.
+ */
+ encap_type_data = HBL_CN_IPV4_PROTOCOL_UDP;
+ break;
+ default:
+ dev_dbg_ratelimited(hdev->dev, "Invalid encapsulation type, %d\n", in->encap_type);
+ return -EINVAL;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_pdata = xa_load(&cn_port->encap_ids, id);
+ if (!xa_pdata) {
+ dev_dbg_ratelimited(hdev->dev, "Encapsulation ID %d is not allocated\n", id);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ /* There could be a use case wherein the user allocates an encap ID and then calls
+ * encap_set with IPv4 encap. Then, without doing an unset, the user may call encap_set
+ * again with UDP encap or with no encap. In that case we should clear the existing
+ * settings and free any allocated buffer, so call the unset API first.
+ */
+ port_funcs->encap_unset(cn_port, id, xa_pdata);
+
+ if (xa_pdata->encap_type != HBL_CNI_ENCAP_NONE) {
+ kfree(xa_pdata->encap_header);
+ xa_pdata->encap_header = NULL;
+ }
+
+ if (in->encap_type != HBL_CNI_ENCAP_NONE) {
+ if (in->tnl_hdr_size > NIC_MAX_TNL_HDR_SIZE) {
+ dev_dbg_ratelimited(hdev->dev, "Invalid tunnel header size, %d\n",
+ in->tnl_hdr_size);
+ rc = -EINVAL;
+ goto cfg_unlock;
+ }
+
+ /* Align encapsulation header to 32-bit register fields. */
+ encap_header = kzalloc(ALIGN(in->tnl_hdr_size, 4), GFP_KERNEL);
+ if (!encap_header) {
+ rc = -ENOMEM;
+ goto cfg_unlock;
+ }
+
+ if (copy_from_user(encap_header, u64_to_user_ptr(in->tnl_hdr_ptr),
+ in->tnl_hdr_size)) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Failed to copy encapsulation header from user\n");
+ rc = -EFAULT;
+ goto free_header;
+ }
+
+ xa_pdata->encap_header = encap_header;
+ xa_pdata->encap_header_size = in->tnl_hdr_size;
+ }
+
+ xa_pdata->encap_type = in->encap_type;
+ xa_pdata->encap_type_data = encap_type_data;
+ xa_pdata->src_ip = in->ipv4_addr;
+
+ rc = port_funcs->encap_set(cn_port, id, xa_pdata);
+ if (rc)
+ goto free_header;
+
+ /* Mark as set only once the HW configuration has succeeded */
+ xa_pdata->is_set = true;
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return 0;
+
+free_header:
+ if (in->encap_type != HBL_CNI_ENCAP_NONE) {
+ kfree(encap_header);
+ xa_pdata->encap_header = NULL;
+ }
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int user_encap_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_encap_unset_in *in)
+{
+ struct hbl_cn_encap_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ int rc;
+ u32 id;
+
+ rc = hbl_cn_cmd_port_check(hdev, in->port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[in->port];
+ port_funcs = hdev->asic_funcs->port_funcs;
+ id = in->id;
+
+ rc = validate_encap_ioctl(cn_port, id);
+ if (rc)
+ return rc;
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_pdata = xa_load(&cn_port->encap_ids, id);
+ if (!xa_pdata) {
+ dev_dbg_ratelimited(hdev->dev, "Encapsulation ID %d is not allocated\n", id);
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (xa_pdata->is_set) {
+ port_funcs->encap_unset(cn_port, id, xa_pdata);
+
+ if (xa_pdata->encap_type != HBL_CNI_ENCAP_NONE)
+ kfree(xa_pdata->encap_header);
+ }
+
+ xa_erase(&cn_port->encap_ids, id);
+ kfree(xa_pdata);
+
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int user_ccq_set(struct hbl_cn_device *hdev, struct hbl_cni_user_ccq_set_in *in,
+ struct hbl_cni_user_ccq_set_out *out, struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ u64 ccq_mmap_handle, ccq_device_addr, pi_mmap_handle, pi_device_addr;
+ struct hbl_cn_mem_data mem_data = {};
+ struct hbl_cn_port *cn_port;
+ u32 port, ccqn;
+ int rc;
+
+ rc = hbl_cn_cmd_port_check(hdev, in->port, NIC_PORT_CHECK_OPEN | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ port = in->port;
+
+ if (!hdev->mmu_bypass) {
+ dev_dbg(hdev->dev, "Allocation of non-physical DMA memory is not supported, port %d\n",
+ port);
+ return -EOPNOTSUPP;
+ }
+
+ if (!is_power_of_2(in->num_of_entries)) {
+ dev_dbg(hdev->dev, "User CCQ buffer length must be a power of 2, port %d\n",
+ port);
+ return -EINVAL;
+ }
+
+ if (in->num_of_entries > USER_CCQ_MAX_ENTRIES ||
+ in->num_of_entries < USER_CCQ_MIN_ENTRIES) {
+ dev_dbg(hdev->dev, "CCQ buffer length invalid 0x%x, port %d\n", in->num_of_entries,
+ port);
+ return -EINVAL;
+ }
+
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->set_app_params) {
+ dev_dbg(hdev->dev,
+ "Failed to set CCQ handler, set_app_params wasn't called yet, port %d\n",
+ port);
+ rc = -EPERM;
+ goto cfg_unlock;
+ }
+
+ if (cn_port->ccq_enable) {
+ dev_dbg(hdev->dev, "Failed setting CCQ handler - it is already set, port %d\n",
+ port);
+ rc = -EBUSY;
+ goto cfg_unlock;
+ }
+
+ /* Allocate the queue memory buffer in host kernel memory */
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_DMA_COHERENT;
+ mem_data.size = in->num_of_entries * CC_CQE_SIZE;
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_err(hdev->dev, "CCQ memory buffer allocation failed, port %d\n", port);
+ goto cfg_unlock;
+ }
+
+ ccq_mmap_handle = mem_data.handle;
+ ccq_device_addr = mem_data.addr;
+
+ /* Allocate a producer-index (PI) buffer in host kernel memory */
+ memset(&mem_data, 0, sizeof(mem_data));
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_DMA_COHERENT;
+ mem_data.size = PAGE_SIZE;
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_err(hdev->dev, "CCQ PI buffer allocation failed, port %d\n", port);
+ goto free_ccq;
+ }
+
+ pi_mmap_handle = mem_data.handle;
+ pi_device_addr = mem_data.addr;
+
+ port_funcs->user_ccq_set(cn_port, ccq_device_addr, pi_device_addr, in->num_of_entries,
+ &ccqn);
+
+ rc = hbl_cn_eq_dispatcher_register_ccq(cn_port, ctx->asid, ccqn);
+ if (rc) {
+ dev_err(hdev->dev, "failed to register CCQ EQ handler, port %u, asid %u\n", port,
+ ctx->asid);
+ goto free_pi;
+ }
+
+ cn_port->ccq_handle = ccq_mmap_handle;
+ cn_port->ccq_pi_handle = pi_mmap_handle;
+
+ out->mem_handle = cn_port->ccq_handle;
+ out->pi_handle = cn_port->ccq_pi_handle;
+ out->id = ccqn;
+
+ cn_port->ccq_enable = true;
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return 0;
+
+free_pi:
+ hbl_cn_mem_destroy(hdev, pi_mmap_handle);
+free_ccq:
+ hbl_cn_mem_destroy(hdev, ccq_mmap_handle);
+cfg_unlock:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int __user_ccq_unset(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx, u32 port)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ bool has_errors = false;
+ u32 ccqn;
+ int rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+ port_funcs->user_ccq_unset(cn_port, &ccqn);
+
+ rc = hbl_cn_mem_destroy(hdev, cn_port->ccq_pi_handle);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to free CCQ PI memory, port %d\n", port);
+ has_errors = true;
+ }
+
+ rc = hbl_cn_mem_destroy(hdev, cn_port->ccq_handle);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to free CCQ memory, port %d\n", port);
+ has_errors = true;
+ }
+
+ rc = hbl_cn_eq_dispatcher_unregister_ccq(cn_port, ctx->asid, ccqn);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to unregister CCQ EQ handler, port %u, asid %u\n", port,
+ ctx->asid);
+ has_errors = true;
+ }
+
+ if (has_errors)
+ return -EIO;
+
+ cn_port->ccq_enable = false;
+
+ return 0;
+}
+
+static int user_ccq_unset(struct hbl_cn_device *hdev, struct hbl_cni_user_ccq_unset_in *in,
+ struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 port;
+ int rc;
+
+ port = in->port;
+
+ rc = hbl_cn_cmd_port_check(hdev, port, NIC_PORT_CHECK_ENABLE | NIC_PORT_PRINT_ON_ERR);
+ if (rc)
+ return rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ port_funcs->cfg_lock(cn_port);
+
+ if (!cn_port->ccq_enable) {
+ dev_dbg(hdev->dev, "Failed unsetting CCQ handler - it is already unset, port %u\n",
+ port);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ rc = __user_ccq_unset(hdev, ctx, in->port);
+out:
+ port_funcs->cfg_unlock(cn_port);
+
+ return rc;
+}
+
+static int dump_qp(struct hbl_cn_device *hdev, struct hbl_cni_dump_qp_in *in)
+{
+ struct hbl_cn_qp_info qp_info = {};
+ u32 buf_size;
+ char *buf;
+ int rc;
+
+ buf_size = in->user_buf_size;
+
+ if (!buf_size || buf_size > NIC_DUMP_QP_SZ) {
+ dev_err(hdev->dev, "Invalid buffer size %u\n", buf_size);
+ return -EINVAL;
+ }
+
+ buf = kzalloc(buf_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ qp_info.port = in->port;
+ qp_info.qpn = in->qpn;
+ qp_info.req = in->req;
+ qp_info.full_print = true;
+ qp_info.force_read = true;
+
+ rc = hdev->asic_funcs->qp_read(hdev, &qp_info, buf, buf_size);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to read QP %u, port %u\n", in->qpn, in->port);
+ goto out;
+ }
+
+ if (copy_to_user(u64_to_user_ptr(in->user_buf), buf, buf_size)) {
+ dev_err(hdev->dev, "copy to user failed in debug ioctl\n");
+ rc = -EFAULT;
+ }
+
+out:
+ kfree(buf);
+ return rc;
+}
+
+static int __hbl_cn_control(struct hbl_cn_device *hdev, u32 op, void *input, void *output,
+ struct hbl_cn_ctx *ctx)
+{
+ int rc;
+
+ if (!(hdev->ctrl_op_mask & BIT(op))) {
+ dev_dbg(hdev->dev, "CN control request %d is not supported on this device\n", op);
+ return -EOPNOTSUPP;
+ }
+
+ switch (op) {
+ case HBL_CNI_OP_ALLOC_CONN:
+ rc = alloc_qp(hdev, ctx, input, output);
+ break;
+ case HBL_CNI_OP_SET_REQ_CONN_CTX:
+ rc = set_req_qp_ctx(hdev, input, output);
+ break;
+ case HBL_CNI_OP_SET_RES_CONN_CTX:
+ rc = set_res_qp_ctx(hdev, input);
+ break;
+ case HBL_CNI_OP_DESTROY_CONN:
+ rc = destroy_qp(hdev, input);
+ break;
+ case HBL_CNI_OP_USER_WQ_SET:
+ rc = user_wq_arr_set(hdev, input, output, ctx);
+ break;
+ case HBL_CNI_OP_USER_WQ_UNSET:
+ rc = user_wq_arr_unset(hdev, input, ctx);
+ break;
+ case HBL_CNI_OP_ALLOC_USER_CQ_ID:
+ rc = alloc_user_cq_id(hdev, input, output, ctx);
+ break;
+ case HBL_CNI_OP_SET_USER_APP_PARAMS:
+ rc = user_set_app_params(hdev, input, ctx);
+ break;
+ case HBL_CNI_OP_GET_USER_APP_PARAMS:
+ rc = user_get_app_params(hdev, input, output);
+ break;
+ case HBL_CNI_OP_EQ_POLL:
+ rc = eq_poll(hdev, ctx, input, output);
+ break;
+ case HBL_CNI_OP_ALLOC_USER_DB_FIFO:
+ rc = alloc_user_db_fifo(hdev, ctx, input, output);
+ break;
+ case HBL_CNI_OP_USER_DB_FIFO_SET:
+ rc = user_db_fifo_set(hdev, ctx, input, output);
+ break;
+ case HBL_CNI_OP_USER_DB_FIFO_UNSET:
+ rc = user_db_fifo_unset(hdev, input);
+ break;
+ case HBL_CNI_OP_USER_ENCAP_ALLOC:
+ rc = user_encap_alloc(hdev, input, output);
+ break;
+ case HBL_CNI_OP_USER_ENCAP_SET:
+ rc = user_encap_set(hdev, input);
+ break;
+ case HBL_CNI_OP_USER_ENCAP_UNSET:
+ rc = user_encap_unset(hdev, input);
+ break;
+ case HBL_CNI_OP_USER_CCQ_SET:
+ rc = user_ccq_set(hdev, input, output, ctx);
+ break;
+ case HBL_CNI_OP_USER_CCQ_UNSET:
+ rc = user_ccq_unset(hdev, input, ctx);
+ break;
+ case HBL_CNI_OP_USER_CQ_ID_SET:
+ rc = user_cq_id_set(hdev, input, output);
+ break;
+ case HBL_CNI_OP_USER_CQ_ID_UNSET:
+ rc = user_cq_id_unset(hdev, input);
+ break;
+ case HBL_CNI_OP_DUMP_QP:
+ rc = dump_qp(hdev, input);
+ break;
+ default:
+ /* We shouldn't get here as the opcode mask is checked above */
+ dev_dbg(hdev->dev, "Invalid CN control request %d\n", op);
+ return -EINVAL;
+ }
+
+ return rc;
+}
+
+static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
+ void *output)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(aux_dev);
+ struct hbl_cn_ctx *ctx = cn_ib_ctx;
+ int rc;
+
+ mutex_lock(&ctx->lock);
+
+ do {
+ rc = __hbl_cn_control(hdev, op, input, output, ctx);
+ } while (rc == -EAGAIN);
+
+ mutex_unlock(&ctx->lock);
+
+ return rc;
+}
+
+static enum hbl_ib_mem_type mem_id_to_mem_type(enum hbl_cn_drv_mem_id id)
+{
+ switch (id) {
+ case HBL_CN_DRV_MEM_HOST_DMA_COHERENT:
+ return HBL_IB_MEM_HOST_DMA_COHERENT;
+ case HBL_CN_DRV_MEM_HOST_VIRTUAL:
+ return HBL_IB_MEM_HOST_VIRTUAL;
+ case HBL_CN_DRV_MEM_DEVICE:
+ return HBL_IB_MEM_DEVICE;
+ case HBL_CN_DRV_MEM_HOST_MAP_ONLY:
+ return HBL_IB_MEM_HOST_MAP_ONLY;
+ case HBL_CN_DRV_MEM_INVALID:
+ default:
+ return HBL_IB_MEM_INVALID;
+ }
+}
+
+static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
+ struct hbl_ib_mem_info *info)
+{
+ struct hbl_cn_device *hdev = HBL_AUX2NIC(ib_aux_dev);
+ struct hbl_cn_mem_buf *buf;
+ u64 mem_type;
+
+ mem_type = (mem_handle >> PAGE_SHIFT) & HBL_CN_MMAP_TYPE_MASK;
+
+ memset(info, 0, sizeof(*info));
+ info->mem_handle = mem_handle;
+
+ switch (mem_type) {
+ case HBL_CN_MMAP_TYPE_BLOCK:
+ info->mtype = HBL_IB_MEM_HW_BLOCK;
+ if (!hbl_cn_get_hw_block_addr(hdev, mem_handle, &info->bus_addr, &info->size))
+ return 0;
+
+ dev_err(hdev->dev, "NIC: No hw block address for handle %#llx\n", mem_handle);
+ break;
+ case HBL_CN_MMAP_TYPE_CN_MEM:
+ buf = hbl_cn_mem_buf_get(hdev, mem_handle);
+ if (!buf) {
+ dev_err(hdev->dev, "NIC: No buffer for handle %#llx\n", mem_handle);
+ break;
+ }
+
+ info->cpu_addr = buf->kernel_address;
+ info->bus_addr = buf->bus_address;
+ info->size = buf->mappable_size;
+ info->vmalloc = false;
+ info->mtype = mem_id_to_mem_type(buf->mem_id);
+
+ hbl_cn_mem_buf_put(buf);
+ return 0;
+ default:
+ dev_err(hdev->dev, "NIC: Invalid handle %#llx\n", mem_handle);
+ break;
+ }
+ return -EINVAL;
+}
+
+static void qps_destroy(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_port *cn_port;
+ unsigned long qp_id = 0;
+ struct hbl_cn_qp *qp;
+ int i;
+
+ /* destroy the QPs */
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ /* protect against destroy_qp occurring in parallel */
+ port_funcs->cfg_lock(cn_port);
+
+ xa_for_each(&cn_port->qp_ids, qp_id, qp) {
+ if (IS_ERR_OR_NULL(qp))
+ continue;
+
+ hbl_cn_qp_do_release(qp);
+ }
+
+ port_funcs->cfg_unlock(cn_port);
+ }
+
+ /* wait for the workers to complete */
+ qps_drain_async_work(hdev);
+
+ /* Verify the lists are empty */
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_for_each(&cn_port->qp_ids, qp_id, qp)
+ dev_err_ratelimited(hdev->dev, "Port %d QP %lu is still alive\n",
+ cn_port->port, qp_id);
+
+ port_funcs->cfg_unlock(cn_port);
+ }
+}
+
+static void user_cqs_destroy(struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_properties *cn_props;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_cn_port *cn_port;
+ unsigned long id;
+ int i;
+
+ cn_props = &hdev->cn_props;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!cn_props->force_cq && !(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ xa_for_each(&cn_port->cq_ids, id, user_cq) {
+ if (user_cq->state == USER_CQ_STATE_ALLOC)
+ hbl_cn_user_cq_put(user_cq);
+ else if (user_cq->state == USER_CQ_STATE_SET)
+ user_cq_unset_locked(user_cq, true);
+ }
+ }
+}
+
+static void wq_arrs_destroy(struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_port *cn_port;
+ u32 type;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ wq_arr_props = cn_port->wq_arr_props;
+
+ for (type = 0; type < HBL_CNI_USER_WQ_TYPE_MAX; type++) {
+ if (wq_arr_props[type].enabled)
+ __user_wq_arr_unset(ctx, cn_port, type);
+ }
+ }
+}
+
+static void ccqs_destroy(struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_port *cn_port;
+ int port;
+
+ for (port = 0; port < hdev->cn_props.max_num_of_ports; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ cn_port = &hdev->cn_ports[port];
+ if (cn_port->ccq_enable)
+ __user_ccq_unset(hdev, ctx, port);
+ }
+}
+
+static void user_db_fifos_destroy(struct hbl_cn_ctx *ctx)
+{
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_device *hdev;
+ unsigned long id;
+ int i;
+
+ hdev = ctx->hdev;
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_for_each(&cn_port->db_fifo_ids, id, xa_pdata)
+ if (xa_pdata->asid == ctx->asid)
+ __user_db_fifo_unset(cn_port, xa_pdata);
+
+ port_funcs->cfg_unlock(cn_port);
+ }
+}
+
+static void encap_ids_destroy(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs = hdev->asic_funcs->port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_encap_xarray_pdata *xa_pdata;
+ struct hbl_cn_port *cn_port;
+ unsigned long encap_id;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ port_funcs->cfg_lock(cn_port);
+
+ xa_for_each(&cn_port->encap_ids, encap_id, xa_pdata) {
+ asic_funcs->port_funcs->encap_unset(cn_port, encap_id, xa_pdata);
+
+ if (xa_pdata->encap_type != HBL_CNI_ENCAP_NONE)
+ kfree(xa_pdata->encap_header);
+
+ kfree(xa_pdata);
+ xa_erase(&cn_port->encap_ids, encap_id);
+ }
+
+ port_funcs->cfg_unlock(cn_port);
+ }
+}
+
+static void set_app_params_clear(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+ u32 max_num_of_ports, port;
+
+ asic_funcs = hdev->asic_funcs;
+ max_num_of_ports = hdev->cn_props.max_num_of_ports;
+
+ asic_funcs->app_params_clear(hdev);
+
+ for (port = 0; port < max_num_of_ports; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ cn_port = &hdev->cn_ports[port];
+ cn_port->set_app_params = false;
+ }
+}
+
+void hbl_cn_ctx_resources_destroy(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx)
+{
+ qps_destroy(hdev);
+ user_cqs_destroy(ctx);
+ wq_arrs_destroy(ctx);
+ ccqs_destroy(ctx);
+ user_db_fifos_destroy(ctx);
+ encap_ids_destroy(hdev);
+ set_app_params_clear(hdev);
+}
+
+int hbl_cn_alloc_ring(struct hbl_cn_device *hdev, struct hbl_cn_ring *ring, int elem_size,
+ int count)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ int rc;
+
+ ring->count = count;
+ ring->elem_size = elem_size;
+ ring->asid = hdev->kernel_asid;
+
+ RING_BUF_SIZE(ring) = elem_size * count;
+ RING_BUF_ADDRESS(ring) = hbl_cn_dma_alloc_coherent(hdev, RING_BUF_SIZE(ring),
+ &RING_BUF_DMA_ADDRESS(ring), GFP_KERNEL);
+ if (!RING_BUF_ADDRESS(ring))
+ return -ENOMEM;
+
+ /* ring's idx_ptr shall point to the PI/CI address */
+ RING_PI_SIZE(ring) = sizeof(u64);
+ RING_PI_ADDRESS(ring) = hbl_cn_dma_pool_zalloc(hdev, RING_PI_SIZE(ring), GFP_KERNEL,
+ &RING_PI_DMA_ADDRESS(ring));
+ if (!RING_PI_ADDRESS(ring)) {
+ rc = -ENOMEM;
+ goto pi_alloc_fail;
+ }
+
+ return 0;
+
+pi_alloc_fail:
+ asic_funcs->dma_free_coherent(hdev, RING_BUF_SIZE(ring), RING_BUF_ADDRESS(ring),
+ RING_BUF_DMA_ADDRESS(ring));
+
+ return rc;
+}
+
+void hbl_cn_free_ring(struct hbl_cn_device *hdev, struct hbl_cn_ring *ring)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->dma_pool_free(hdev, RING_PI_ADDRESS(ring), RING_PI_DMA_ADDRESS(ring));
+
+ asic_funcs->dma_free_coherent(hdev, RING_BUF_SIZE(ring), RING_BUF_ADDRESS(ring),
+ RING_BUF_DMA_ADDRESS(ring));
+}
+
+static void hbl_cn_randomize_status_cnts(struct hbl_cn_port *cn_port,
+ struct hbl_cn_cpucp_status *status)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ RAND_STAT_CNT(status->high_ber_reinit);
+ RAND_STAT_CNT(status->correctable_err_cnt);
+ RAND_STAT_CNT(status->uncorrectable_err_cnt);
+ RAND_STAT_CNT(status->bad_format_cnt);
+ RAND_STAT_CNT(status->responder_out_of_sequence_psn_cnt);
+}
+
+static void hbl_cn_get_status(struct hbl_cn_port *cn_port, struct hbl_cn_cpucp_status *status)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ /* Port toggle counter should always be filled regardless of the logical state of the
+ * port.
+ */
+ status->port_toggle_cnt = hbl_cn_get_port_toggle_cnt(cn_port);
+
+ status->port = port;
+ status->up = hbl_cn_is_port_open(cn_port);
+
+ if (!status->up)
+ return;
+
+ status->pcs_link = cn_port->pcs_link;
+ status->phy_ready = cn_port->phy_fw_tuned;
+ status->auto_neg = cn_port->auto_neg_enable;
+
+ if (hdev->rand_status) {
+ hbl_cn_randomize_status_cnts(cn_port, status);
+ return;
+ }
+
+ status->high_ber_reinit = cn_port->pcs_remote_fault_reconfig_cnt;
+
+ /* Each ASIC will fill the rest of the statistics */
+ hdev->asic_funcs->port_funcs->get_status(cn_port, status);
+}
+
+static void hbl_cn_convert_cpucp_status(struct cpucp_nic_status *to,
+ struct hbl_cn_cpucp_status *from)
+{
+ to->port = cpu_to_le32(from->port);
+ to->bad_format_cnt = cpu_to_le32(from->bad_format_cnt);
+ to->responder_out_of_sequence_psn_cnt =
+ cpu_to_le32(from->responder_out_of_sequence_psn_cnt);
+ to->high_ber_reinit = cpu_to_le32(from->high_ber_reinit);
+ to->correctable_err_cnt = cpu_to_le32(from->correctable_err_cnt);
+ to->uncorrectable_err_cnt = cpu_to_le32(from->uncorrectable_err_cnt);
+ to->retraining_cnt = cpu_to_le32(from->retraining_cnt);
+ to->up = from->up;
+ to->pcs_link = from->pcs_link;
+ to->phy_ready = from->phy_ready;
+ to->auto_neg = from->auto_neg;
+ to->timeout_retransmission_cnt = cpu_to_le32(from->timeout_retransmission_cnt);
+ to->high_ber_cnt = cpu_to_le32(from->high_ber_cnt);
+ to->pre_fec_ser.integer = cpu_to_le16(from->pre_fec_ser.integer);
+ to->pre_fec_ser.exp = cpu_to_le16(from->pre_fec_ser.exp);
+ to->post_fec_ser.integer = cpu_to_le16(from->post_fec_ser.integer);
+ to->post_fec_ser.exp = cpu_to_le16(from->post_fec_ser.exp);
+ to->bandwidth.integer = cpu_to_le16(from->bandwidth.integer);
+ to->bandwidth.frac = cpu_to_le16(from->bandwidth.frac);
+ to->lat.integer = cpu_to_le16(from->lat.integer);
+ to->lat.frac = cpu_to_le16(from->lat.frac);
+ to->port_toggle_cnt = cpu_to_le32(from->port_toggle_cnt);
+}
+
+static int hbl_cn_send_cpucp_status(struct hbl_cn_device *hdev, u32 port,
+ struct hbl_cn_cpucp_status *cn_status)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct cpucp_nic_status_packet *pkt;
+ struct cpucp_nic_status status = {};
+ struct hbl_cn_properties *cn_props;
+ size_t total_pkt_size, data_size;
+ struct hbl_cn_port *cn_port;
+ u64 result;
+ int rc;
+
+ cn_props = &hdev->cn_props;
+ data_size = cn_props->status_packet_size;
+ asic_funcs = hdev->asic_funcs;
+ port_funcs = asic_funcs->port_funcs;
+ cn_port = &hdev->cn_ports[port];
+
+ total_pkt_size = sizeof(struct cpucp_nic_status_packet) + data_size;
+
+ /* data should be aligned to 8 bytes in order for the CPU-CP to copy it */
+ total_pkt_size = ALIGN(total_pkt_size, 8);
+
+ /* total_pkt_size is cast to u16 later on */
+ if (total_pkt_size > USHRT_MAX) {
+ dev_err(hdev->dev, "NIC status data is too big\n");
+ rc = -EINVAL;
+ goto out;
+ }
+
+ pkt = kzalloc(total_pkt_size, GFP_KERNEL);
+ if (!pkt) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ hbl_cn_convert_cpucp_status(&status, cn_status);
+
+ pkt->length = cpu_to_le32(data_size / sizeof(u32));
+ memcpy(&pkt->data, &status, data_size);
+
+ pkt->cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_STATUS << CPUCP_PKT_CTL_OPCODE_SHIFT);
+
+ rc = asic_funcs->send_cpu_message(hdev, (u32 *)pkt, total_pkt_size, 0, &result);
+ if (rc)
+ dev_err(hdev->dev, "failed to send NIC status, port %d\n", port);
+
+ kfree(pkt);
+out:
+ port_funcs->post_send_status(cn_port);
+
+ return rc;
+}
+
+void hbl_cn_fw_status_work(struct work_struct *work)
+{
+ struct hbl_cn_port *cn_port = container_of(work, struct hbl_cn_port, fw_status_work.work);
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_cpucp_status status = {};
+ u32 port = cn_port->port;
+ int rc;
+
+ hbl_cn_get_status(cn_port, &status);
+
+ rc = hbl_cn_send_cpucp_status(hdev, port, &status);
+ if (rc)
+ return;
+
+ if (hdev->status_cmd == HBL_CN_STATUS_PERIODIC_START)
+ queue_delayed_work(cn_port->wq, &cn_port->fw_status_work,
+ msecs_to_jiffies(hdev->status_period * 1000));
+}
+
+static void cn_port_sw_fini(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = cn_port->hdev->asic_funcs;
+
+ if (!cn_port->sw_initialized)
+ return;
+
+ cn_port->sw_initialized = false;
+
+ asic_funcs->port_funcs->port_sw_fini(cn_port);
+
+ xa_destroy(&cn_port->cq_ids);
+ xa_destroy(&cn_port->encap_ids);
+ xa_destroy(&cn_port->db_fifo_ids);
+ xa_destroy(&cn_port->qp_ids);
+
+ mutex_destroy(&cn_port->cnt_lock);
+ mutex_destroy(&cn_port->control_lock);
+
+ kfree(cn_port->reset_tracker);
+
+ destroy_workqueue(cn_port->qp_wq);
+ destroy_workqueue(cn_port->wq);
+}
+
+static void cn_wq_arr_props_init(struct hbl_cn_wq_array_properties *wq_arr_props)
+{
+ wq_arr_props[HBL_CNI_USER_WQ_SEND].type_str = "send";
+ wq_arr_props[HBL_CNI_USER_WQ_SEND].is_send = true;
+
+ wq_arr_props[HBL_CNI_USER_WQ_RECV].type_str = "recv";
+ wq_arr_props[HBL_CNI_USER_WQ_RECV].is_send = false;
+}
+
+static int cn_port_sw_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_reset_tracker *reset_tracker;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 port, max_qp_error_syndromes;
+ char wq_name[32] = {0};
+ int rc;
+
+ port = cn_port->port;
+ asic_funcs = hdev->asic_funcs;
+ port_funcs = asic_funcs->port_funcs;
+ wq_arr_props = cn_port->wq_arr_props;
+ reset_tracker = NULL;
+
+ snprintf(wq_name, sizeof(wq_name), "hbl%u-cn%d-wq", hdev->id, port);
+ cn_port->wq = alloc_workqueue(wq_name, 0, 0);
+ if (!cn_port->wq) {
+ dev_err(hdev->dev, "Failed to create WQ, port: %d\n", port);
+ return -ENOMEM;
+ }
+
+ snprintf(wq_name, sizeof(wq_name), "hbl%u-cn%d-qp-wq", hdev->id, port);
+ cn_port->qp_wq = alloc_workqueue(wq_name, WQ_UNBOUND, 0);
+ if (!cn_port->qp_wq) {
+ dev_err(hdev->dev, "Failed to create QP WQ, port: %d\n", port);
+ rc = -ENOMEM;
+ goto qp_wq_err;
+ }
+
+ max_qp_error_syndromes = hdev->cn_props.max_qp_error_syndromes;
+ if (max_qp_error_syndromes) {
+ reset_tracker = kcalloc(max_qp_error_syndromes, sizeof(*reset_tracker), GFP_KERNEL);
+ if (!reset_tracker) {
+ rc = -ENOMEM;
+ goto reset_tracker_err;
+ }
+
+ cn_port->reset_tracker = reset_tracker;
+ }
+
+ mutex_init(&cn_port->control_lock);
+ mutex_init(&cn_port->cnt_lock);
+
+ xa_init_flags(&cn_port->qp_ids, XA_FLAGS_ALLOC);
+ xa_init_flags(&cn_port->db_fifo_ids, XA_FLAGS_ALLOC);
+ xa_init_flags(&cn_port->encap_ids, XA_FLAGS_ALLOC);
+ xa_init_flags(&cn_port->cq_ids, XA_FLAGS_ALLOC);
+
+ INIT_DELAYED_WORK(&cn_port->link_status_work, port_funcs->phy_link_status_work);
+
+ cn_port->speed = asic_funcs->get_default_port_speed(hdev);
+ cn_port->pfc_enable = true;
+ cn_port->pflags = PFLAGS_PCS_LINK_CHECK | PFLAGS_PHY_AUTO_NEG_LPBK;
+
+ cn_wq_arr_props_init(wq_arr_props);
+
+ rc = port_funcs->port_sw_init(cn_port);
+ if (rc)
+ goto sw_init_err;
+
+ cn_port->sw_initialized = true;
+
+ return 0;
+
+sw_init_err:
+ xa_destroy(&cn_port->cq_ids);
+ xa_destroy(&cn_port->encap_ids);
+ xa_destroy(&cn_port->db_fifo_ids);
+ xa_destroy(&cn_port->qp_ids);
+
+ mutex_destroy(&cn_port->cnt_lock);
+ mutex_destroy(&cn_port->control_lock);
+
+ kfree(reset_tracker);
+reset_tracker_err:
+ destroy_workqueue(cn_port->qp_wq);
+qp_wq_err:
+ destroy_workqueue(cn_port->wq);
+
+ return rc;
+}
+
+static int cn_macro_sw_init(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = cn_macro->hdev->asic_funcs;
+
+ return asic_funcs->macro_sw_init(cn_macro);
+}
+
+static void cn_macro_sw_fini(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_asic_funcs *asic_funcs;
+
+ asic_funcs = cn_macro->hdev->asic_funcs;
+
+ asic_funcs->macro_sw_fini(cn_macro);
+}
+
+static void hbl_cn_sw_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++)
+ cn_port_sw_fini(&hdev->cn_ports[i]);
+
+ for (i = 0; i < hdev->cn_props.num_of_macros; i++)
+ cn_macro_sw_fini(&hdev->cn_macros[i]);
+
+ asic_funcs->sw_fini(hdev);
+
+ kfree(hdev->ib_aux_dev.aux_data);
+ kfree(hdev->ib_aux_dev.aux_ops);
+ kfree(hdev->en_aux_dev.aux_data);
+ kfree(hdev->en_aux_dev.aux_ops);
+ kfree(hdev->mac_lane_remap);
+ kfree(hdev->phy_ber_info);
+ kfree(hdev->phy_tx_taps);
+ kfree(hdev->cn_macros);
+ kfree(hdev->cn_ports);
+}
+
+static int hbl_cn_sw_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ struct hbl_cn_macro *cn_macro, *cn_macros;
+ int rc, i, macro_cnt = 0, port_cnt = 0;
+ struct hbl_cn_port *cn_port, *cn_ports;
+ struct hbl_en_aux_data *en_aux_data;
+ struct hbl_ib_aux_data *ib_aux_data;
+ struct hbl_en_aux_ops *en_aux_ops;
+ struct hbl_ib_aux_ops *ib_aux_ops;
+ struct hbl_cn_ber_info *ber_info;
+ struct hbl_cn_tx_taps *tx_taps;
+ u32 *mac_lane_remap;
+
+ asic_funcs->pre_sw_init(hdev);
+
+ /* Allocate per port common structure array */
+ cn_ports = kcalloc(hdev->cn_props.max_num_of_ports, sizeof(*cn_ports), GFP_KERNEL);
+ if (!cn_ports)
+ return -ENOMEM;
+
+ /* Allocate per macro common structure array */
+ cn_macros = kcalloc(hdev->cn_props.num_of_macros, sizeof(*cn_macros), GFP_KERNEL);
+ if (!cn_macros) {
+ rc = -ENOMEM;
+ goto macro_alloc_fail;
+ }
+
+ tx_taps = kcalloc(hdev->cn_props.max_num_of_lanes, sizeof(*tx_taps), GFP_KERNEL);
+ if (!tx_taps) {
+ rc = -ENOMEM;
+ goto taps_alloc_fail;
+ }
+
+ ber_info = kcalloc(hdev->cn_props.max_num_of_lanes, sizeof(*ber_info), GFP_KERNEL);
+ if (!ber_info) {
+ rc = -ENOMEM;
+ goto ber_info_alloc_fail;
+ }
+
+ mac_lane_remap = kcalloc(hdev->cn_props.num_of_macros, sizeof(*mac_lane_remap),
+ GFP_KERNEL);
+ if (!mac_lane_remap) {
+ rc = -ENOMEM;
+ goto mac_remap_alloc_fail;
+ }
+
+ en_aux_data = kzalloc(sizeof(*en_aux_data), GFP_KERNEL);
+ if (!en_aux_data) {
+ rc = -ENOMEM;
+ goto en_aux_data_alloc_fail;
+ }
+
+ en_aux_ops = kzalloc(sizeof(*en_aux_ops), GFP_KERNEL);
+ if (!en_aux_ops) {
+ rc = -ENOMEM;
+ goto en_aux_ops_alloc_fail;
+ }
+
+ ib_aux_data = kzalloc(sizeof(*ib_aux_data), GFP_KERNEL);
+ if (!ib_aux_data) {
+ rc = -ENOMEM;
+ goto ib_aux_data_alloc_fail;
+ }
+
+ ib_aux_ops = kzalloc(sizeof(*ib_aux_ops), GFP_KERNEL);
+ if (!ib_aux_ops) {
+ rc = -ENOMEM;
+ goto ib_aux_ops_alloc_fail;
+ }
+
+ hdev->en_aux_dev.aux_data = en_aux_data;
+ hdev->en_aux_dev.aux_ops = en_aux_ops;
+ hdev->ib_aux_dev.aux_data = ib_aux_data;
+ hdev->ib_aux_dev.aux_ops = ib_aux_ops;
+
+ hdev->phy_tx_taps = tx_taps;
+ hdev->phy_ber_info = ber_info;
+ hdev->mac_lane_remap = mac_lane_remap;
+ hdev->phy_config_fw = !hdev->pldm && !hdev->skip_phy_init;
+ hdev->mmu_bypass = true;
+ hdev->phy_calc_ber_wait_sec = 30;
+
+ rc = asic_funcs->sw_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "ASIC SW init failed\n");
+ goto sw_init_fail;
+ }
+
+ hdev->cn_ports = cn_ports;
+ hdev->cn_macros = cn_macros;
+ for (i = 0; i < hdev->cn_props.num_of_macros; i++, macro_cnt++) {
+ cn_macro = &hdev->cn_macros[i];
+
+ cn_macro->hdev = hdev;
+ cn_macro->idx = i;
+
+ rc = cn_macro_sw_init(cn_macro);
+ if (rc) {
+ dev_err(hdev->dev, "Macro %d SW init failed\n", i);
+ goto macro_init_fail;
+ }
+ }
+
+ /* At this stage, we don't know how many ports we have, so we must
+ * allocate for the maximum number of ports (and also free all of them
+ * in sw_fini)
+ */
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++, port_cnt++) {
+ cn_port = &hdev->cn_ports[i];
+ cn_port->hdev = hdev;
+ cn_port->port = i;
+ atomic_set(&cn_port->num_of_allocated_qps, 0);
+ rc = cn_port_sw_init(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "Port %d SW init failed\n", i);
+ goto port_init_fail;
+ }
+ }
+
+ return 0;
+
+port_init_fail:
+ for (i = 0; i < port_cnt; i++)
+ cn_port_sw_fini(&hdev->cn_ports[i]);
+
+macro_init_fail:
+ for (i = 0; i < macro_cnt; i++)
+ cn_macro_sw_fini(&hdev->cn_macros[i]);
+
+ asic_funcs->sw_fini(hdev);
+sw_init_fail:
+ kfree(ib_aux_ops);
+ib_aux_ops_alloc_fail:
+ kfree(ib_aux_data);
+ib_aux_data_alloc_fail:
+ kfree(en_aux_ops);
+en_aux_ops_alloc_fail:
+ kfree(en_aux_data);
+en_aux_data_alloc_fail:
+ kfree(mac_lane_remap);
+mac_remap_alloc_fail:
+ kfree(ber_info);
+ber_info_alloc_fail:
+ kfree(tx_taps);
+taps_alloc_fail:
+ kfree(cn_macros);
+macro_alloc_fail:
+ kfree(cn_ports);
+
+ return rc;
+}
+
+static void hbl_cn_late_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ /* compute2cn */
+ aux_ops->ports_reopen = hbl_cn_ports_reopen;
+ aux_ops->ports_stop_prepare = hbl_cn_hard_reset_prepare;
+ aux_ops->ports_stop = hbl_cn_stop;
+ aux_ops->synchronize_irqs = hbl_cn_synchronize_irqs;
+
+ hdev->asic_funcs->late_init(hdev);
+}
+
+static void hbl_cn_late_fini(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ /* compute2cn */
+ aux_ops->ports_reopen = NULL;
+ aux_ops->ports_stop_prepare = NULL;
+ aux_ops->ports_stop = NULL;
+ aux_ops->synchronize_irqs = NULL;
+
+ hdev->asic_funcs->late_fini(hdev);
+}
+
+bool hbl_cn_is_port_open(struct hbl_cn_port *cn_port)
+{
+ struct hbl_aux_dev *aux_dev = &cn_port->hdev->en_aux_dev;
+ struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
+ u32 port = cn_port->port;
+
+ if (cn_port->eth_enable && aux_ops->is_port_open)
+ return aux_ops->is_port_open(aux_dev, port);
+
+ return cn_port->port_open;
+}
+
+u32 hbl_cn_get_pflags(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port = cn_port->port;
+
+ aux_dev = &hdev->en_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (cn_port->eth_enable && aux_ops->get_pflags)
+ return aux_ops->get_pflags(aux_dev, port);
+
+ return cn_port->pflags;
+}
+
+u8 hbl_cn_get_num_of_digits(u64 num)
+{
+ u8 n_digits = 0;
+
+ while (num) {
+ n_digits++;
+ num /= 10;
+ }
+
+ return n_digits;
+}
+
+static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full)
+{
+ u32 spmu_events[NIC_SPMU_STATS_LEN_MAX], num_event_types, port = cn_port->port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_stat *event_types;
+ int rc, i;
+
+ if (!hdev->supports_coresight)
+ return;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->spmu_get_stats_info(cn_port, &event_types, &num_event_types);
+ num_event_types = min_t(u32, num_event_types, NIC_SPMU_STATS_LEN_MAX);
+
+ for (i = 0; i < num_event_types; i++)
+ spmu_events[i] = event_types[i].lo_offset;
+
+ if (full) {
+ rc = port_funcs->spmu_config(cn_port, num_event_types, spmu_events, false);
+ if (rc)
+ dev_err(hdev->dev, "Failed to disable spmu for port %d\n", port);
+ }
+
+ rc = port_funcs->spmu_config(cn_port, num_event_types, spmu_events, true);
+ if (rc)
+ dev_err(hdev->dev, "Failed to enable spmu for port %d\n", port);
+}
+
+static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_cn_port *cn_port;
+ struct hbl_aux_dev *aux_dev;
+
+ cn_port = &hdev->cn_ports[port];
+ aux_dev = &hdev->en_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* Ethernet */
+ if (cn_port->eth_enable && aux_ops->reset_stats)
+ aux_ops->reset_stats(aux_dev, port);
+
+ /* MAC */
+ port_funcs->reset_mac_stats(cn_port);
+
+ /* SPMU */
+ hbl_cn_spmu_init(cn_port, true);
+
+ /* XPCS91 */
+ cn_port->correctable_errors_cnt = 0;
+ cn_port->uncorrectable_errors_cnt = 0;
+
+ /* PCS */
+ cn_port->pcs_local_fault_cnt = 0;
+ cn_port->pcs_remote_fault_cnt = 0;
+ cn_port->pcs_remote_fault_reconfig_cnt = 0;
+ cn_port->pcs_link_restore_cnt = 0;
+
+ /* Congestion Queue */
+ cn_port->cong_q_err_cnt = 0;
+}
+
+void hbl_cn_reset_stats_counters(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ u32 port;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ if (!hbl_cn_is_port_open(cn_port))
+ continue;
+
+ port = cn_port->port;
+
+ hbl_cn_reset_stats_counters_port(hdev, port);
+ }
+}
+
+void hbl_cn_reset_ports_toggle_counters(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ cn_port->port_toggle_cnt = 0;
+ cn_port->port_toggle_cnt_prev = 0;
+ }
+}
+
+/* The following implements the events dispatcher.
+ * Each application registering with the device is assigned a unique ASID
+ * by the driver and is associated with a SW event queue (SW-EQ) by the
+ * dispatcher (the Eth driver is handled by the kernel and is associated
+ * with ASID 0).
+ * During the lifetime of the app/ASID, each resource allocated to it that
+ * can generate events (such as a QP or a CQ) is associated by the
+ * dispatcher with the appropriate ASID.
+ * During the course of work of the NIC, the HW EQ is accessed
+ * (by polling or interrupt), and for each event found in it:
+ * - The ID of the resource which generated the event (CQ# or QP#) is
+ *   retrieved from it.
+ * - The ASID is retrieved from the ASID-resource association lists.
+ * - The event is inserted into the ASID-specific SW-EQ to be retrieved
+ *   later on by the app. An exception is the Eth driver which, as of
+ *   today, is tightly coupled with the EQ, so the dispatcher calls the
+ *   Eth event handling routine (if registered) immediately after
+ *   dispatching the events to the SW-EQs.
+ * Note: Link events are always handled by the Eth driver (ASID 0).
+ */
+
+struct hbl_cn_ev_dq *hbl_cn_cqn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 cqn,
+ struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ struct hbl_cn_ev_dq *dq;
+
+ if (cqn >= cn_prop->max_cqs)
+ return NULL;
+
+ dq = ev_dqs->cq_dq[cqn];
+ if (!dq || !dq->associated)
+ return NULL;
+
+ return dq;
+}
+
+struct hbl_cn_ev_dq *hbl_cn_ccqn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 ccqn,
+ struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ struct hbl_cn_ev_dq *dq;
+
+ if (ccqn >= cn_prop->max_ccqs)
+ return NULL;
+
+ dq = ev_dqs->ccq_dq[ccqn];
+ if (!dq || !dq->associated)
+ return NULL;
+
+ return dq;
+}
+
+struct hbl_cn_dq_qp_info *hbl_cn_get_qp_info(struct hbl_cn_ev_dqs *ev_dqs, u32 qpn)
+{
+ struct hbl_cn_dq_qp_info *qp_info = NULL;
+
+ hash_for_each_possible(ev_dqs->qps, qp_info, node, qpn)
+ if (qpn == qp_info->qpn)
+ return qp_info;
+
+ return NULL;
+}
+
+struct hbl_cn_ev_dq *hbl_cn_qpn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 qpn)
+{
+ struct hbl_cn_dq_qp_info *qp_info = hbl_cn_get_qp_info(ev_dqs, qpn);
+
+ if (qp_info)
+ return qp_info->dq;
+
+ return NULL;
+}
+
+struct hbl_cn_ev_dq *hbl_cn_dbn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 dbn,
+ struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ struct hbl_cn_ev_dq *dq;
+
+ if (dbn >= cn_prop->max_db_fifos)
+ return NULL;
+
+ dq = ev_dqs->db_dq[dbn];
+ if (!dq || !dq->associated)
+ return NULL;
+
+ return dq;
+}
+
+struct hbl_cn_ev_dq *hbl_cn_asid_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 asid)
+{
+ struct hbl_cn_ev_dq *dq;
+ int i;
+
+ for (i = 0; i < NIC_NUM_CONCUR_ASIDS; i++) {
+ dq = &ev_dqs->edq[i];
+ if (dq->associated && dq->asid == asid)
+ return dq;
+ }
+
+ return NULL;
+}
+
+static void hbl_cn_dq_reset(struct hbl_cn_ev_dq *dq)
+{
+ struct hbl_cn_eq_raw_buf *buf = &dq->buf;
+
+ dq->overflow = 0;
+ buf->tail = 0;
+ buf->head = buf->tail;
+ buf->events_count = 0;
+ memset(buf->events, 0, sizeof(buf->events));
+}
+
+bool hbl_cn_eq_dispatcher_is_empty(struct hbl_cn_ev_dq *dq)
+{
+ return (dq->buf.events_count == 0);
+}
+
+bool hbl_cn_eq_dispatcher_is_full(struct hbl_cn_ev_dq *dq)
+{
+ return (dq->buf.events_count == (NIC_EQ_INFO_BUF_SIZE - 1));
+}
+
+void hbl_cn_eq_dispatcher_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ int i;
+
+ hash_init(ev_dqs->qps);
+ mutex_init(&ev_dqs->lock);
+
+ hbl_cn_dq_reset(&ev_dqs->default_edq);
+
+ for (i = 0; i < NIC_NUM_CONCUR_ASIDS; i++)
+ hbl_cn_dq_reset(&ev_dqs->edq[i]);
+
+ for (i = 0; i < NIC_DRV_MAX_CQS_NUM; i++)
+ ev_dqs->cq_dq[i] = NULL;
+
+ for (i = 0; i < NIC_DRV_NUM_DB_FIFOS; i++)
+ ev_dqs->db_dq[i] = NULL;
+}
+
+void hbl_cn_eq_dispatcher_fini(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_dq_qp_info *qp_info;
+ u32 port = cn_port->port;
+ struct hlist_node *tmp;
+ int i;
+
+ if (!hash_empty(ev_dqs->qps))
+ dev_err(hdev->dev, "port %d dispatcher is closed while there are QPs in use\n",
+ port);
+
+ hash_for_each_safe(ev_dqs->qps, i, tmp, qp_info, node) {
+ dev_err_ratelimited(hdev->dev, "port %d QP %d was not destroyed\n", port,
+ qp_info->qpn);
+ hash_del(&qp_info->node);
+ kfree(qp_info);
+ }
+
+ mutex_destroy(&ev_dqs->lock);
+}
+
+void hbl_cn_eq_dispatcher_reset(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ int i;
+
+ mutex_lock(&ev_dqs->lock);
+
+ hbl_cn_dq_reset(&ev_dqs->default_edq);
+
+ for (i = 0; i < NIC_NUM_CONCUR_ASIDS; i++)
+ hbl_cn_dq_reset(&ev_dqs->edq[i]);
+
+ mutex_unlock(&ev_dqs->lock);
+}
+
+int hbl_cn_eq_dispatcher_associate_dq(struct hbl_cn_port *cn_port, u32 asid)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+ int i, rc = -ENOSPC;
+
+ mutex_lock(&ev_dqs->lock);
+
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (dq) {
+ rc = 0;
+ goto exit;
+ }
+
+ for (i = 0; i < NIC_NUM_CONCUR_ASIDS; i++) {
+ dq = &ev_dqs->edq[i];
+ if (!dq->associated) {
+ dq->associated = true;
+ dq->asid = asid;
+ rc = 0;
+ break;
+ }
+ }
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_dissociate_dq(struct hbl_cn_port *cn_port, u32 asid)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+
+ mutex_lock(&ev_dqs->lock);
+
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (!dq)
+ goto exit;
+
+ hbl_cn_dq_reset(dq);
+ dq->associated = false;
+ dq->asid = U32_MAX;
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return 0;
+}
+
+int hbl_cn_eq_dispatcher_register_qp(struct hbl_cn_port *cn_port, u32 asid, u32 qp_id)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_dq_qp_info *qp_info;
+ struct hbl_cn_ev_dq *dq;
+ int rc = 0;
+
+ mutex_lock(&ev_dqs->lock);
+
+ /* Check if this QP is already registered and, if so, with the same ASID */
+ dq = hbl_cn_qpn_to_dq(ev_dqs, qp_id);
+ if (dq) {
+ if (dq->asid != asid)
+ rc = -EINVAL;
+
+ goto exit;
+ }
+
+ /* find the dq associated with the given asid */
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (!dq) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ /* register the QP */
+ qp_info = kmalloc(sizeof(*qp_info), GFP_KERNEL);
+ if (!qp_info) {
+ rc = -ENOMEM;
+ goto exit;
+ }
+
+ qp_info->dq = dq;
+ qp_info->qpn = qp_id;
+ hash_add(ev_dqs->qps, &qp_info->node, qp_id);
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_unregister_qp(struct hbl_cn_port *cn_port, u32 qp_id)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_dq_qp_info *qp_info;
+
+ mutex_lock(&ev_dqs->lock);
+
+ qp_info = hbl_cn_get_qp_info(ev_dqs, qp_id);
+ if (qp_info) {
+ hash_del(&qp_info->node);
+ kfree(qp_info);
+ }
+
+ mutex_unlock(&ev_dqs->lock);
+
+ return 0;
+}
+
+int hbl_cn_eq_dispatcher_register_cq(struct hbl_cn_port *cn_port, u32 asid, u32 cqn)
+{
+ struct hbl_cn_properties *cn_prop = &cn_port->hdev->cn_props;
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+ int rc = 0;
+
+ if (cqn >= cn_prop->max_cqs)
+ return -EINVAL;
+
+ mutex_lock(&ev_dqs->lock);
+
+ /* Check if this CQ is already registered and, if so, with the same ASID */
+ dq = ev_dqs->cq_dq[cqn];
+ if (dq) {
+ if (dq->asid != asid)
+ rc = -EINVAL;
+
+ goto exit;
+ }
+
+ /* find the dq associated with the given asid */
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (!dq) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ ev_dqs->cq_dq[cqn] = dq;
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_unregister_cq(struct hbl_cn_port *cn_port, u32 cqn)
+{
+ struct hbl_cn_properties *cn_prop = &cn_port->hdev->cn_props;
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+
+ if (cqn >= cn_prop->max_cqs)
+ return -EINVAL;
+
+ mutex_lock(&ev_dqs->lock);
+
+ ev_dqs->cq_dq[cqn] = NULL;
+
+ mutex_unlock(&ev_dqs->lock);
+
+ return 0;
+}
+
+int hbl_cn_eq_dispatcher_register_ccq(struct hbl_cn_port *cn_port, u32 asid, u32 ccqn)
+{
+ struct hbl_cn_properties *cn_prop = &cn_port->hdev->cn_props;
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+ int rc = 0;
+
+ if (ccqn >= cn_prop->max_ccqs)
+ return -EINVAL;
+
+ mutex_lock(&ev_dqs->lock);
+
+ /* Check if this CCQ is already registered */
+ dq = ev_dqs->ccq_dq[ccqn];
+ if (dq) {
+ rc = -EINVAL;
+ goto exit;
+ }
+
+ /* find the dq associated with the given asid */
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (!dq) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ ev_dqs->ccq_dq[ccqn] = dq;
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_unregister_ccq(struct hbl_cn_port *cn_port, u32 asid, u32 ccqn)
+{
+ struct hbl_cn_properties *cn_prop = &cn_port->hdev->cn_props;
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+
+ if (ccqn >= cn_prop->max_ccqs)
+ return -EINVAL;
+
+ mutex_lock(&ev_dqs->lock);
+
+ if (!hbl_cn_asid_to_dq(ev_dqs, asid)) {
+ mutex_unlock(&ev_dqs->lock);
+ return -ENODATA;
+ }
+
+ ev_dqs->ccq_dq[ccqn] = NULL;
+
+ mutex_unlock(&ev_dqs->lock);
+
+ return 0;
+}
+
+int hbl_cn_eq_dispatcher_register_db(struct hbl_cn_port *cn_port, u32 asid, u32 dbn)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+ u32 min, max;
+ int rc = 0;
+
+ cn_port->hdev->asic_funcs->port_funcs->get_db_fifo_hw_id_range(cn_port, &min, &max);
+ if (dbn < min || dbn > max) {
+ dev_err(cn_port->hdev->dev,
+ "Failed to register dbn %u to the dispatcher (valid range %u-%u)\n", dbn,
+ min, max);
+ return -EINVAL;
+ }
+
+ mutex_lock(&ev_dqs->lock);
+
+ /* Check if this doorbell is already registered and, if so, with the same
+ * ASID
+ */
+ dq = ev_dqs->db_dq[dbn];
+ if (dq) {
+ if (dq->asid != asid)
+ rc = -EINVAL;
+
+ goto exit;
+ }
+
+ /* find the dq associated with the given asid and transport */
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+ if (!dq) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ ev_dqs->db_dq[dbn] = dq;
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_unregister_db(struct hbl_cn_port *cn_port, u32 dbn)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ u32 min, max;
+
+ cn_port->hdev->asic_funcs->port_funcs->get_db_fifo_hw_id_range(cn_port, &min, &max);
+ if (dbn < min || dbn > max) {
+ dev_err(cn_port->hdev->dev,
+ "Failed to unregister dbn %u from the dispatcher (valid range %u-%u)\n",
+ dbn, min, max);
+ return -EINVAL;
+ }
+
+ mutex_lock(&ev_dqs->lock);
+
+ ev_dqs->db_dq[dbn] = NULL;
+
+ mutex_unlock(&ev_dqs->lock);
+
+ return 0;
+}
+
+static int __hbl_cn_eq_dispatcher_enqueue(struct hbl_cn_port *cn_port, struct hbl_cn_ev_dq *dq,
+ const struct hbl_cn_eqe *eqe)
+{
+ struct hbl_aux_dev *aux_dev = &cn_port->hdev->ib_aux_dev;
+ struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
+
+ if (hbl_cn_eq_dispatcher_is_full(dq)) {
+ dq->overflow++;
+ return -ENOSPC;
+ }
+
+ memcpy(&dq->buf.events[dq->buf.head], eqe, min(sizeof(*eqe), sizeof(dq->buf.events[0])));
+ dq->buf.head = (dq->buf.head + 1) & (NIC_EQ_INFO_BUF_SIZE - 1);
+ dq->buf.events_count++;
+
+ /* If an IB device exists, schedule its work to poll the EQ */
+ if (aux_ops->eqe_work_schd)
+ aux_ops->eqe_work_schd(aux_dev, cn_port->port);
+
+ return 0;
+}
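[Editor's note: the enqueue above relies on NIC_EQ_INFO_BUF_SIZE being a power of two, so the head/tail indices can wrap with a bitmask instead of a modulo, and "full" deliberately leaves one slot spare. A standalone sketch of that ring-buffer arithmetic, with illustrative names and sizes:]

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE 256 /* must be a power of two, like NIC_EQ_INFO_BUF_SIZE */

struct ring {
	uint32_t head, tail, count, overflow;
	uint32_t events[BUF_SIZE];
};

/* "Full" at size - 1 entries, matching the driver's dispatcher_is_full() check */
static int ring_push(struct ring *r, uint32_t ev)
{
	if (r->count == BUF_SIZE - 1) {
		r->overflow++; /* drop the event but remember the overflow */
		return -1;
	}
	r->events[r->head] = ev;
	r->head = (r->head + 1) & (BUF_SIZE - 1); /* mask instead of modulo */
	r->count++;
	return 0;
}

static int ring_pop(struct ring *r, uint32_t *ev)
{
	if (!r->count)
		return -1;
	*ev = r->events[r->tail];
	r->tail = (r->tail + 1) & (BUF_SIZE - 1);
	r->count--;
	return 0;
}
```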
+
+/* Broadcast event to all user ASIDs */
+int hbl_cn_eq_dispatcher_enqueue_bcast(struct hbl_cn_port *cn_port, const struct hbl_cn_eqe *eqe)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_ev_dq *dq;
+ int i, rc = 0;
+
+ if (!hbl_cn_is_port_open(cn_port))
+ return 0;
+
+ mutex_lock(&ev_dqs->lock);
+
+ for (i = 0; i < NIC_NUM_CONCUR_ASIDS; i++) {
+ if (i == hdev->kernel_asid)
+ continue;
+
+ dq = hbl_cn_asid_to_dq(ev_dqs, i);
+ if (!dq)
+ continue;
+
+ rc = __hbl_cn_eq_dispatcher_enqueue(cn_port, dq, eqe);
+ if (rc) {
+ dev_dbg_ratelimited(cn_port->hdev->dev,
+ "Port %d, failed to enqueue dispatcher for ASID %d. %d\n",
+ cn_port->port, i, rc);
+ break;
+ }
+ }
+
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_enqueue(struct hbl_cn_port *cn_port, const struct hbl_cn_eqe *eqe)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_ev_dq *dq;
+ int rc;
+
+ if (!hbl_cn_is_port_open(cn_port))
+ return 0;
+
+ port_funcs = cn_port->hdev->asic_funcs->port_funcs;
+
+ mutex_lock(&ev_dqs->lock);
+
+ dq = port_funcs->eq_dispatcher_select_dq(cn_port, eqe);
+ if (!dq) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ rc = __hbl_cn_eq_dispatcher_enqueue(cn_port, dq, eqe);
+ if (rc)
+ dev_dbg_ratelimited(cn_port->hdev->dev,
+ "Port %d, failed to enqueue dispatcher. %d\n", cn_port->port,
+ rc);
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+int hbl_cn_eq_dispatcher_dequeue(struct hbl_cn_port *cn_port, u32 asid, struct hbl_cn_eqe *eqe,
+ bool is_default)
+{
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq;
+ int rc;
+
+ mutex_lock(&ev_dqs->lock);
+
+ if (is_default)
+ dq = &ev_dqs->default_edq;
+ else
+ dq = hbl_cn_asid_to_dq(ev_dqs, asid);
+
+ if (!dq) {
+ rc = -ESRCH;
+ goto exit;
+ }
+
+ if (hbl_cn_eq_dispatcher_is_empty(dq)) {
+ rc = -ENODATA;
+ goto exit;
+ }
+
+ /* We do a copy here instead of returning a pointer since a reset or
+ * destroy operation may occur after we return from the routine
+ */
+ memcpy(eqe, &dq->buf.events[dq->buf.tail], min(sizeof(*eqe), sizeof(dq->buf.events[0])));
+
+ dq->buf.tail = (dq->buf.tail + 1) & (NIC_EQ_INFO_BUF_SIZE - 1);
+ dq->buf.events_count--;
+ rc = 0;
+
+exit:
+ mutex_unlock(&ev_dqs->lock);
+
+ return rc;
+}
+
+u32 hbl_cn_dram_readl(struct hbl_cn_device *hdev, u64 addr)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->dram_readl(aux_dev, addr);
+}
+
+void hbl_cn_dram_writel(struct hbl_cn_device *hdev, u32 val, u64 addr)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->dram_writel(aux_dev, val, addr);
+}
+
+u32 hbl_cn_rreg(struct hbl_cn_device *hdev, u32 reg)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->rreg(aux_dev, reg);
+}
+
+void hbl_cn_wreg(struct hbl_cn_device *hdev, u32 reg, u32 val)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->wreg(aux_dev, reg, val);
+}
+
+int hbl_cn_reserve_wq_dva(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u64 wq_arr_size,
+ u32 type, u64 *dva)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ int rc;
+
+ /* The Device VA block for WQ array is just reserved here. It will be backed by host
+ * physical pages once the MMU mapping is done via hbl_map_vmalloc_range inside the
+ * alloc_and_map_wq. Using host page alignment ensures we start with offset 0, both
+ * on host and device side.
+ */
+ rc = hbl_cn_reserve_dva_block(ctx, wq_arr_size, dva);
+ if (rc)
+ return rc;
+
+ wq_arr_props = &cn_port->wq_arr_props[type];
+
+ wq_arr_props->dva_base = *dva;
+ wq_arr_props->dva_size = wq_arr_size;
+
+ return 0;
+}
+
+void hbl_cn_unreserve_wq_dva(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props = &cn_port->wq_arr_props[type];
+
+ hbl_cn_unreserve_dva_block(ctx, wq_arr_props->dva_base, wq_arr_props->dva_size);
+ wq_arr_props->dva_base = 0;
+}
+
+void hbl_cn_track_port_reset(struct hbl_cn_port *cn_port, u32 syndrome)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_reset_tracker *reset_tracker;
+ unsigned long timestamp_jiffies = jiffies;
+ u32 max_qp_error_syndromes;
+
+ max_qp_error_syndromes = hdev->cn_props.max_qp_error_syndromes;
+ if (syndrome >= max_qp_error_syndromes) {
+ dev_dbg(hdev->dev, "Invalid syndrome %u\n", syndrome);
+ return;
+ }
+
+ reset_tracker = &cn_port->reset_tracker[syndrome];
+
+ /* In case the timeout passed, reset the tracker parameters and return */
+ if (time_after_eq(timestamp_jiffies, reset_tracker->timeout_jiffies)) {
+ reset_tracker->num_seq_resets = 1;
+ reset_tracker->timeout_jiffies = timestamp_jiffies +
+ msecs_to_jiffies(NIC_SEQ_RESETS_TIMEOUT_MS);
+ return;
+ }
+
+ reset_tracker->num_seq_resets++;
+
+ /* In case the max sequential resets was reached before we passed the timeout,
+ * disable that port.
+ */
+ if (reset_tracker->num_seq_resets == NIC_MAX_SEQ_RESETS) {
+ dev_err(hdev->dev,
+ "Disabling port %u due to %d sequential resets, syndrome %u\n",
+ cn_port->port, NIC_MAX_SEQ_RESETS, syndrome);
+ cn_port->disabled = true;
+ }
+}
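[Editor's note: the reset tracker above implements a simple sliding window: a reset after the window expires restarts the count, while resets inside the window accumulate until the limit disables the port. A userspace model of that logic, with illustrative constants in place of NIC_MAX_SEQ_RESETS and NIC_SEQ_RESETS_TIMEOUT_MS:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SEQ_RESETS 5 /* stand-in for NIC_MAX_SEQ_RESETS */
#define WINDOW_MS 1000   /* stand-in for NIC_SEQ_RESETS_TIMEOUT_MS */

struct tracker {
	uint32_t num_seq_resets;
	uint64_t timeout_ms;
};

/* Returns true when the port should be disabled. Each reset either
 * restarts the window or counts toward the sequential-reset limit.
 */
static bool track_reset(struct tracker *t, uint64_t now_ms)
{
	if (now_ms >= t->timeout_ms) {
		/* Window expired: restart the count and open a new window */
		t->num_seq_resets = 1;
		t->timeout_ms = now_ms + WINDOW_MS;
		return false;
	}

	return ++t->num_seq_resets == MAX_SEQ_RESETS;
}
```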
+
+void hbl_cn_eq_handler(struct hbl_cn_port *cn_port)
+{
+ struct hbl_en_aux_ops *en_aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_cn_device *hdev;
+ struct hbl_cn_eqe eqe;
+ u32 port;
+
+ if (!cn_port->eq_handler_enable)
+ return;
+
+ hdev = cn_port->hdev;
+ aux_dev = &hdev->en_aux_dev;
+ en_aux_ops = aux_dev->aux_ops;
+ port = cn_port->port;
+
+ mutex_lock(&cn_port->control_lock);
+
+ if (!hbl_cn_is_port_open(cn_port)) {
+ dev_dbg(hdev->dev, "Ignoring events while port %d is closed\n", port);
+ goto out;
+ }
+
+ if (en_aux_ops->handle_eqe)
+ while (!hbl_cn_eq_dispatcher_dequeue(cn_port, hdev->kernel_asid, &eqe, false))
+ en_aux_ops->handle_eqe(aux_dev, port, &eqe);
+
+out:
+ mutex_unlock(&cn_port->control_lock);
+}
+
+void hbl_cn_get_self_hw_block_handle(struct hbl_cn_device *hdev, u64 address, u64 *handle)
+{
+ *handle = lower_32_bits(address) | (HBL_CN_MMAP_TYPE_BLOCK);
+ *handle <<= PAGE_SHIFT;
+}
+
+u32 hbl_cn_hw_block_handle_to_addr32(struct hbl_cn_device *hdev, u64 handle)
+{
+ return lower_32_bits(handle >> PAGE_SHIFT);
+}
+
+void *__hbl_cn_dma_alloc_coherent(struct hbl_cn_device *hdev, size_t size, dma_addr_t *dma_handle,
+ gfp_t flag, const char *caller)
+{
+ const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->dma_alloc_coherent(hdev, size, dma_handle, flag);
+}
+
+void __hbl_cn_dma_free_coherent(struct hbl_cn_device *hdev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr, const char *caller)
+{
+ const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->dma_free_coherent(hdev, size, cpu_addr, dma_addr);
+}
+
+void *__hbl_cn_dma_pool_zalloc(struct hbl_cn_device *hdev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle, const char *caller)
+{
+ const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ return asic_funcs->dma_pool_zalloc(hdev, size, mem_flags, dma_handle);
+}
+
+void __hbl_cn_dma_pool_free(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t dma_addr,
+ const char *caller)
+{
+ const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ asic_funcs->dma_pool_free(hdev, vaddr, dma_addr);
+}
+
+int hbl_cn_get_reg_pcie_addr(struct hbl_cn_device *hdev, u8 bar_id, u32 reg, u64 *pci_addr)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u64 offset;
+ int rc;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = aux_ops->get_reg_pcie_addr(aux_dev, reg, &offset);
+ if (rc)
+ return rc;
+
+ *pci_addr = pci_resource_start(hdev->pdev, bar_id) + offset;
+
+ return 0;
+}
+
+int hbl_cn_get_src_ip(struct hbl_cn_port *cn_port, u32 *src_ip)
+{
+ struct hbl_aux_dev *aux_dev = &cn_port->hdev->en_aux_dev;
+ struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
+ u32 port = cn_port->port;
+
+ if (cn_port->eth_enable && aux_ops->get_src_ip)
+ return aux_ops->get_src_ip(aux_dev, port, src_ip);
+
+ *src_ip = 0;
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
new file mode 100644
index 000000000000..27139e93d990
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
@@ -0,0 +1,1627 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HABANALABS_CN_H_
+#define HABANALABS_CN_H_
+
+#include <linux/habanalabs/cpucp_if.h>
+#include <linux/net/intel/cni.h>
+#include <linux/net/intel/cn.h>
+
+#include <linux/bitfield.h>
+#include <linux/ctype.h>
+#include <linux/kfifo.h>
+#include <linux/netdevice.h>
+#include <linux/types.h>
+#include <linux/genalloc.h>
+
+/* Use the upper bits of the mmap offset to store habana CN driver specific
+ * information:
+ * bits[63:60] - Encode mmap type
+ * bits[45:0] - mmap offset value
+ *
+ * NOTE: struct vm_area_struct.vm_pgoff uses an offset in pages. Hence, these
+ * defines are w.r.t. PAGE_SIZE.
+ */
+#define HBL_CN_MMAP_TYPE_SHIFT (60 - PAGE_SHIFT)
+#define HBL_CN_MMAP_TYPE_MASK (0xfull << HBL_CN_MMAP_TYPE_SHIFT)
+#define HBL_CN_MMAP_TYPE_CN_MEM (0x2ull << HBL_CN_MMAP_TYPE_SHIFT)
+#define HBL_CN_MMAP_TYPE_BLOCK (0x1ull << HBL_CN_MMAP_TYPE_SHIFT)
+
+#define HBL_CN_MMAP_OFFSET_VALUE_MASK (0x0FFFFFFFFFFFull >> PAGE_SHIFT)
+#define HBL_CN_MMAP_OFFSET_VALUE_GET(off) ((off) & HBL_CN_MMAP_OFFSET_VALUE_MASK)
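[Editor's note: since vm_pgoff is a page index, the type tag lands at bits [63:60] of the byte offset only after the whole value is multiplied back by PAGE_SIZE, hence the `60 - PAGE_SHIFT` shift. A standalone round-trip of the encode/decode, assuming 4 KiB pages:]

```c
#include <assert.h>
#include <stdint.h>

#define MY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Same construction as the HBL_CN_MMAP_* defines, in userspace */
#define MMAP_TYPE_SHIFT (60 - MY_PAGE_SHIFT)
#define MMAP_TYPE_MASK (0xfull << MMAP_TYPE_SHIFT)
#define MMAP_TYPE_BLOCK (0x1ull << MMAP_TYPE_SHIFT)
#define MMAP_OFFSET_VALUE_MASK (0x0FFFFFFFFFFFull >> MY_PAGE_SHIFT)

/* Pack a type tag and a page offset into one vm_pgoff-style value */
static uint64_t encode(uint64_t type, uint64_t pgoff)
{
	return type | (pgoff & MMAP_OFFSET_VALUE_MASK);
}

static uint64_t decode_type(uint64_t off)
{
	return off & MMAP_TYPE_MASK;
}

static uint64_t decode_value(uint64_t off)
{
	return off & MMAP_OFFSET_VALUE_MASK;
}
```

The type and value masks never overlap: with 4 KiB pages the tag occupies bits [51:48] of the page index while the offset value stays in the low bits.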
+
+#define RREG32(reg) hbl_cn_rreg(hdev, (reg))
+#define WREG32(reg, v) hbl_cn_wreg(hdev, (reg), (v))
+#define WREG32_P(reg, val, mask) \
+ do { \
+ u32 __reg = (reg); \
+ u32 tmp_ = RREG32(__reg); \
+ u32 __mask = (mask); \
+ tmp_ &= __mask; \
+ tmp_ |= ((val) & ~__mask); \
+ WREG32(__reg, tmp_); \
+ } while (0)
+
+#define RMWREG32_SHIFTED(reg, val, mask) WREG32_P(reg, val, ~(mask))
+
+#define RMWREG32(reg, val, mask) \
+ do { \
+ u32 _mask = (mask); \
+ RMWREG32_SHIFTED(reg, (val) << __ffs(_mask), _mask); \
+ } while (0)
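[Editor's note: RMWREG32 places a field value into a register by shifting it to the field's position (`__ffs(mask)` gives the index of the mask's lowest set bit) and then merging with the untouched bits. A userspace model of the same read-modify-write, using `__builtin_ctz` as a stand-in for the kernel's `__ffs()`:]

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __ffs(): index of lowest set bit.
 * Undefined for mask == 0, same as __ffs.
 */
static unsigned int lowest_bit(uint32_t mask)
{
	return (unsigned int)__builtin_ctz(mask);
}

/* Model of RMWREG32: put 'val' into the field selected by 'mask',
 * preserving all bits outside the field.
 */
static uint32_t rmw(uint32_t reg, uint32_t val, uint32_t mask)
{
	uint32_t shifted = val << lowest_bit(mask);

	return (reg & ~mask) | (shifted & mask);
}
```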
+
+#define __snprintf(buf, bsize, fmt, ...) \
+ do { \
+ size_t _bsize = (bsize); \
+ char *_buf = (buf); \
+ size_t _blen; \
+ \
+ _blen = strlen(_buf); \
+ \
+ if (_blen < _bsize) \
+ snprintf(_buf + _blen, _bsize - _blen, fmt, ##__VA_ARGS__); \
+ } while (0)
+
+#define NIC_MAX_TNL_HDR_SIZE 32 /* Bytes */
+
+#define NIC_EQ_INFO_BUF_SIZE 256
+#define NIC_NUM_CONCUR_ASIDS 4
+#define RDMA_OFFSET 1
+
+#define NIC_QPC_INV_USEC 1000000 /* 1s */
+#define NIC_SIM_QPC_INV_USEC (NIC_QPC_INV_USEC * 5)
+#define NIC_PLDM_QPC_INV_USEC (NIC_QPC_INV_USEC * 10)
+
+#define NIC_SPMU_STATS_LEN_MAX 6
+
+#define NIC_TMR_TIMEOUT_DEFAULT_GRAN 13
+#define NIC_TMR_TIMEOUT_MAX_GRAN 31
+#define NIC_ADAPTIVE_TIMEOUT_RANGE 6
+#define NIC_GRAN_TO_USEC(gran) (1UL << ((gran) + 2))
+#define NIC_TMR_RESET_FACTOR 3
+
+#define PARSE_FIELD(data, shift, size) (((data) >> (shift)) & (BIT(size) - 1))
+#define MERGE_FIELDS(data_hi, data_lo, shift) ((data_hi) << (shift) | (data_lo))
+
+#define NIC_MACRO_CFG_SIZE hdev->cn_props.macro_cfg_size
+#define NIC_MACRO_CFG_BASE(port) (NIC_MACRO_CFG_SIZE * ((port) >> 1))
+
+#define NIC_MACRO_RREG32(reg) RREG32(NIC_MACRO_CFG_BASE(port) + (reg))
+#define NIC_MACRO_WREG32(reg, val) WREG32(NIC_MACRO_CFG_BASE(port) + (reg), (val))
+#define NIC_MACRO_RMWREG32(reg, val, mask) RMWREG32(NIC_MACRO_CFG_BASE(port) + (reg), val, mask)
+
+#define NIC_PORT_CHECK_ENABLE BIT(0)
+#define NIC_PORT_CHECK_OPEN BIT(1)
+#define NIC_PORT_PRINT_ON_ERR BIT(2)
+#define NIC_PORT_CHECK_ALL GENMASK(2, 0)
+
+#define QPC_REQ_BURST_SIZE 16
+#define QPC_REQ_SCHED_Q 3
+#define QPC_RES_SCHED_Q 2
+#define QPC_RAW_SCHED_Q 1
+
+#define REGMASK(V, F) (((V) << F##_SHIFT) & F##_MASK)
+#define REGMASK2(V, F) (((V) << F##_S) & F##_M)
+
+#define NIC_DR_10 1031250
+#define NIC_DR_25 2578125
+#define NIC_DR_26 2656250
+#define NIC_DR_50 5312500
+#define NIC_DR_100 10625000
+
+#define NIC_MAC_LANE_0 0U
+#define NIC_MAC_LANE_1 1U
+#define NIC_MAC_LANE_2 2U
+#define NIC_MAC_LANE_3 3U
+#define NIC_MAC_LANES 4U
+
+#define CC_CQE_SIZE 16
+#define USER_CCQ_MAX_ENTRIES (BIT(21) / CC_CQE_SIZE) /* dma_alloc can only allocate 2M */
+#define USER_CCQ_MIN_ENTRIES 16
+
+#define DMA_COHERENT_MAX_SIZE SZ_4M
+
+#define PHY_TX_TAPS_NUM 5
+
+#define ACCUMULATE_FEC_STATS_DURATION_MS 100 /* ms */
+#define ACCUMULATE_FEC_STATS_DURATION_MS_MAX 10000 /* ms */
+
+#define HBL_CN_BLOCK_SIZE 0x1000
+
+/* DMA alloc/free wrappers */
+#define hbl_cn_dma_alloc_coherent(hdev, size, dma_handle, flags) \
+ __hbl_cn_dma_alloc_coherent(hdev, size, dma_handle, flags, __func__)
+#define hbl_cn_dma_free_coherent(hdev, size, cpu_addr, dma_addr) \
+ __hbl_cn_dma_free_coherent(hdev, size, cpu_addr, dma_addr, __func__)
+
+#define hbl_cn_dma_pool_zalloc(hdev, size, mem_flags, dma_handle) \
+ __hbl_cn_dma_pool_zalloc(hdev, size, mem_flags, dma_handle, __func__)
+#define hbl_cn_dma_pool_free(hdev, vaddr, dma_addr) \
+ __hbl_cn_dma_pool_free(hdev, vaddr, dma_addr, __func__)
+
+extern struct hbl_cn_stat hbl_cn_mac_fec_stats[];
+extern struct hbl_cn_stat hbl_cn_mac_stats_rx[];
+extern struct hbl_cn_stat hbl_cn_mac_stats_tx[];
+extern size_t hbl_cn_mac_fec_stats_len;
+extern size_t hbl_cn_mac_stats_rx_len;
+extern size_t hbl_cn_mac_stats_tx_len;
+
+/**
+ * enum mtu_type - Describes QP's MTU value source.
+ * @MTU_INVALID: MTU is not configured yet.
+ * @MTU_FROM_USER: MTU provided by the user's set-requester-context call.
+ * @MTU_FROM_NETDEV: MTU taken from the netdev.
+ * @MTU_DEFAULT: Use default MTU value.
+ */
+enum mtu_type {
+ MTU_INVALID,
+ MTU_FROM_USER,
+ MTU_FROM_NETDEV,
+ MTU_DEFAULT
+};
+
+/**
+ * enum hbl_cn_qp_state - The states a QP can be in.
+ * Follows the spirit of section "10.3.1 QUEUE PAIR AND EE CONTEXT STATES" in the
+ * InfiniBand(TM) Architecture Release.
+ * @CN_QP_STATE_RESET: QP is in reset state.
+ * @CN_QP_STATE_INIT: Initialized state where all QP resources are allocated. QP can post Recv WRs.
+ * @CN_QP_STATE_RTR: Ready to receive state. QP can post & process Recv WRs & send ACKs.
+ * @CN_QP_STATE_RTS: Ready to send state. QP can post and process recv & send WRs.
+ * @CN_QP_STATE_SQD: SQ is in drained state (a sub-state of the QP draining process).
+ * @CN_QP_STATE_QPD: QP is in drained state, both RQ and SQ are drained.
+ * @CN_QP_STATE_SQERR: Send queue error state. QP can post & process Receive WRs; Send WRs are
+ * completed in error.
+ * @CN_QP_STATE_ERR: Error state.
+ * @CN_QP_NUM_STATE: Number of states the QP can be in.
+ */
+enum hbl_cn_qp_state {
+ CN_QP_STATE_RESET = 0,
+ CN_QP_STATE_INIT,
+ CN_QP_STATE_RTR,
+ CN_QP_STATE_RTS,
+ CN_QP_STATE_SQD,
+ CN_QP_STATE_QPD,
+ CN_QP_STATE_SQERR,
+ CN_QP_STATE_ERR,
+ CN_QP_NUM_STATE, /* must be last */
+};
+
+/**
+ * enum hbl_cn_qp_state_op - The valid state transition operations on a QP.
+ * @CN_QP_OP_INVAL: Invalid op, indicates an invalid state transition path, not to be used.
+ * Must be 0.
+ * @CN_QP_OP_RST_2INIT: Move the QP from the Reset state to the Init state.
+ * @CN_QP_OP_INIT_2RTR: Move the QP from the Init state to the Ready-to-receive state.
+ * @CN_QP_OP_RTR_2RTR: Reconfigure the responder.
+ * @CN_QP_OP_RTR_2QPD: Drain the responder.
+ * @CN_QP_OP_RTR_2RTS: Move the QP from the RTR state to the Ready-to-send state.
+ * @CN_QP_OP_RTR_2SQD: Move the QP from the RTR state to the drained state.
+ * @CN_QP_OP_RTS_2RTS: Reconfigure the requester.
+ * @CN_QP_OP_RTS_2SQERR: Move the QP from RTS to the SQErr state due to HW errors.
+ * @CN_QP_OP_RTS_2SQD: Drain the SQ and move to the SQ-Drained state.
+ * @CN_QP_OP_RTS_2QPD: Drain the QP (requester and responder).
+ * @CN_QP_OP_SQD_2SQD: Re-drain the SQ.
+ * @CN_QP_OP_SQD_2QPD: Drain the QP (responder too).
+ * @CN_QP_OP_SQD_2RTS: Enable the requester after draining is done.
+ * @CN_QP_OP_SQD_2SQ_ERR: Move the QP to SQ_Err due to a HW error.
+ * @CN_QP_OP_QPD_2RTR: Restart the responder.
+ * @CN_QP_OP_QPD_2QPD: Drain the QP (again).
+ * @CN_QP_OP_SQ_ERR_2SQD: Recover from SQ error and return to work.
+ * @CN_QP_OP_2ERR: Place the QP in error state (due to HW errors). An error can be forced from
+ * any state, except Reset.
+ * @CN_QP_OP_2RESET: Move to reset state. It is possible to transition from any state to the
+ * Reset state.
+ * @CN_QP_OP_NOP: Do nothing.
+ */
+enum hbl_cn_qp_state_op {
+ CN_QP_OP_INVAL = 0,
+ CN_QP_OP_RST_2INIT,
+ CN_QP_OP_INIT_2RTR,
+ CN_QP_OP_RTR_2RTR,
+ CN_QP_OP_RTR_2QPD,
+ CN_QP_OP_RTR_2RTS,
+ CN_QP_OP_RTR_2SQD,
+ CN_QP_OP_RTS_2RTS,
+ CN_QP_OP_RTS_2SQERR,
+ CN_QP_OP_RTS_2SQD,
+ CN_QP_OP_RTS_2QPD,
+ CN_QP_OP_SQD_2SQD,
+ CN_QP_OP_SQD_2QPD,
+ CN_QP_OP_SQD_2RTS,
+ CN_QP_OP_SQD_2SQ_ERR,
+ CN_QP_OP_QPD_2RTR,
+ CN_QP_OP_QPD_2QPD,
+ CN_QP_OP_SQ_ERR_2SQD,
+ CN_QP_OP_2ERR,
+ CN_QP_OP_2RESET,
+ CN_QP_OP_NOP,
+};
+
+/**
+ * enum db_fifo_state - Describes a db fifo's current state. Values start at 1 so that a
+ * zero-initialized (reset) state cannot be mistaken for the Allocated state.
+ * @DB_FIFO_STATE_ALLOC: db fifo id has been allocated.
+ * @DB_FIFO_STATE_SET: db fifo set is done for the corresponding id.
+ */
+enum db_fifo_state {
+ DB_FIFO_STATE_ALLOC = 1,
+ DB_FIFO_STATE_SET,
+};
+
+/**
+ * enum hbl_cn_qp_reset_mode - QP reset method.
+ * @CN_QP_RESET_MODE_GRACEFUL: Graceful reset, reset the QP components in an orderly manner and wait
+ * for each component to settle before moving to the next step.
+ * @CN_QP_RESET_MODE_FAST: Fast reset, reset the QP components in an orderly manner, without waiting
+ * for the components to settle.
+ * @CN_QP_RESET_MODE_HARD: Clear the QP contexts immediately.
+ */
+enum hbl_cn_qp_reset_mode {
+ CN_QP_RESET_MODE_GRACEFUL = 0,
+ CN_QP_RESET_MODE_FAST = 1,
+ CN_QP_RESET_MODE_HARD = 2,
+};
+
+/**
+ * enum hbl_cn_user_cq_state - User CQ states.
+ * @USER_CQ_STATE_ALLOC: ID allocated.
+ * @USER_CQ_STATE_SET: HW configured. Resources allocated.
+ * @USER_CQ_STATE_ALLOC_TO_UNSET: CQ moved to unset state from alloc state directly.
+ * @USER_CQ_STATE_SET_TO_UNSET: CQ moved to unset from set state. HW config cleared.
+ * Resources ready to be reclaimed.
+ */
+enum hbl_cn_user_cq_state {
+ USER_CQ_STATE_ALLOC = 1,
+ USER_CQ_STATE_SET,
+ USER_CQ_STATE_ALLOC_TO_UNSET,
+ USER_CQ_STATE_SET_TO_UNSET,
+};
+
+/**
+ * enum hbl_cn_drv_mem_id - Memory allocation methods.
+ * @HBL_CN_DRV_MEM_INVALID: N/A option.
+ * @HBL_CN_DRV_MEM_HOST_DMA_COHERENT: Host DMA coherent memory.
+ * @HBL_CN_DRV_MEM_HOST_VIRTUAL: Host virtual memory.
+ * @HBL_CN_DRV_MEM_DEVICE: Device HBM memory.
+ * @HBL_CN_DRV_MEM_HOST_MAP_ONLY: Host mapping only.
+ */
+enum hbl_cn_drv_mem_id {
+ HBL_CN_DRV_MEM_INVALID,
+ HBL_CN_DRV_MEM_HOST_DMA_COHERENT,
+ HBL_CN_DRV_MEM_HOST_VIRTUAL,
+ HBL_CN_DRV_MEM_DEVICE,
+ HBL_CN_DRV_MEM_HOST_MAP_ONLY,
+};
+
+struct hbl_cn_port;
+struct hbl_cn_device;
+struct hbl_cn_macro;
+
+/**
+ * struct hbl_cn_dq_qp_info - structure to hold qp info for dispatch queue.
+ * @node: reference to a QP within the list
+ * @dq: dq per asid event dispatcher queue
+ * @qpn: QP id
+ */
+struct hbl_cn_dq_qp_info {
+ struct hlist_node node;
+ struct hbl_cn_ev_dq *dq;
+ u32 qpn;
+};
+
+/**
+ * struct hbl_cn_eq_raw_buf - a buffer holding unparsed raw EQ events
+ * @events: an array which stores the events
+ * @head: the queue head
+ * @tail: the queue tail
+ * @events_count: number of available events in the queue
+ */
+struct hbl_cn_eq_raw_buf {
+ struct hbl_cn_eqe events[NIC_EQ_INFO_BUF_SIZE];
+ u32 head;
+ u32 tail;
+ u32 events_count;
+};
+
+/**
+ * struct hbl_cn_ev_dq - per-asid/app event dispatch queue
+ * @buf: the dispatched events queue.
+ * @asid: the asid registered on this dq
+ * @overflow: buffer overflow counter
+ * @associated: whether this dq is associated with a user/asid.
+ */
+struct hbl_cn_ev_dq {
+ struct hbl_cn_eq_raw_buf buf;
+ u32 asid;
+ u32 overflow;
+ u8 associated;
+};
+
+/**
+ * struct hbl_cn_ev_dqs - software-managed event dispatch queues, used for dispatching events to
+ * their owning applications.
+ * @qps: a hash table to convert QP-numbers to their owner (ASID).
+ * @cq_dq: array to associate/convert cq-numbers to the dispatch queues.
+ * @ccq_dq: array to associate/convert ccq-numbers to the dispatch queues.
+ * @db_dq: array to associate/convert doorbell-numbers to the dispatch queues.
+ * @edq: the events dispatch queues (as many queues as the number of possible same-time ASIDs).
+ * @default_edq: default events dispatch queue for unknown resources and events.
+ * @lock: protects from simultaneous operations.
+ */
+struct hbl_cn_ev_dqs {
+ DECLARE_HASHTABLE(qps, 11);
+ struct hbl_cn_ev_dq *cq_dq[NIC_DRV_MAX_CQS_NUM];
+ struct hbl_cn_ev_dq *ccq_dq[NIC_DRV_MAX_CCQS_NUM];
+ struct hbl_cn_ev_dq *db_dq[NIC_DRV_NUM_DB_FIFOS];
+ struct hbl_cn_ev_dq edq[NIC_NUM_CONCUR_ASIDS];
+ struct hbl_cn_ev_dq default_edq;
+ /* protects from simultaneous operations */
+ struct mutex lock;
+};
+
+/**
+ * struct hbl_cn_qp_info - holds information of a QP to read via debugfs.
+ * @port: the port the QP belongs to.
+ * @qpn: QP number.
+ * @req: true for requester QP, otherwise responder.
+ * @full_print: print full QP information.
+ * @force_read: force reading a QP in invalid/error state.
+ */
+struct hbl_cn_qp_info {
+ u32 port;
+ u32 qpn;
+ u8 req;
+ u8 full_print;
+ u8 force_read;
+};
+
+/**
+ * struct hbl_cn_wqe_info - holds information of a WQE to read via debugfs.
+ * @port: the port the WQE belongs to.
+ * @qpn: QP number.
+ * @wqe_idx: WQE index.
+ * @tx: true for tx WQE, otherwise rx WQE.
+ */
+struct hbl_cn_wqe_info {
+ u32 port;
+ u32 qpn;
+ u32 wqe_idx;
+ u8 tx;
+};
+
+/**
+ * struct hbl_cn_db_fifo_xarray_pdata - Holds private data of userspace doorbell xarray
+ * @asid: Associated user context.
+ * @port: Associated port index.
+ * @id: fifo index.
+ * @ci_mmap_handle: Consumer index mmap handle.
+ * @umr_mmap_handle: UMR block mmap handle.
+ * @umr_db_offset: db fifo offset in UMR block.
+ * @state: db fifo's state.
+ * @db_pool_addr: offset of the allocated address in gen pool.
+ * @fifo_offset: actual fifo offset allocated for that id.
+ * @fifo_size: size of the fifo allocated.
+ * @fifo_mode: mode of the fifo as received in the IOCTL.
+ */
+struct hbl_cn_db_fifo_xarray_pdata {
+ u32 asid;
+ u32 port;
+ u32 id;
+ u64 ci_mmap_handle;
+ u64 umr_mmap_handle;
+ u32 umr_db_offset;
+ enum db_fifo_state state;
+ u32 db_pool_addr;
+ u32 fifo_offset;
+ u32 fifo_size;
+ u8 fifo_mode;
+};
+
+/**
+ * struct hbl_cn_encap_xarray_pdata - Holds private data of userspace encapsulation xarray
+ * @port: Associated port index
+ * @id: Encapsulation ID
+ * @src_ip: Source port IPv4 address.
+ * @encap_type: L3/L4 encapsulation
+ * @encap_type_data: IPv4 protocol or UDP port
+ * @encap_header: Encapsulation header
+ * @encap_header_size: Encapsulation header size
+ * @is_set: True if encap was set, false otherwise
+ */
+struct hbl_cn_encap_xarray_pdata {
+ u32 port;
+ u32 id;
+ u32 src_ip;
+ enum hbl_nic_encap_type encap_type;
+ u32 encap_type_data;
+ void *encap_header;
+ u32 encap_header_size;
+ u8 is_set;
+};
+
+/**
+ * struct hbl_cn_user_cq - user CQ data.
+ * @cn_port: associated port.
+ * @refcount: number of QPs that use this CQ.
+ * @ctx: Associated user context.
+ * @state: User CQ state.
+ * @overrun_lock: protects the setting/unsetting of CQ overrun.
+ * @mem_handle: mmap handle of buffer memory.
+ * @pi_handle: mmap handle of PI memory.
+ * @id: CQ ID.
+ * @qp_set_overrun_cnt: number of QPs which expect CQ overrun to be enabled.
+ * @is_mmu_bp: is MMU bypass enabled for the memory buffer.
+ */
+struct hbl_cn_user_cq {
+ struct hbl_cn_port *cn_port;
+ struct kref refcount;
+ struct hbl_cn_ctx *ctx;
+ enum hbl_cn_user_cq_state state;
+ /* protects the setting/unsetting of CQ overrun */
+ struct mutex overrun_lock;
+ u64 mem_handle;
+ u64 pi_handle;
+ u32 id;
+ u32 qp_set_overrun_cnt;
+ u8 is_mmu_bp;
+};
+
+/**
+ * struct hbl_cn_wq_array_properties - WQ array properties.
+ * @type_str: string of this WQ array type.
+ * @handle: handle for this WQ array.
+ * @dva_base: reserved device VA for this WQ array.
+ * @dva_size: size in bytes of device VA block of this WQ array.
+ * @wq_size: size in bytes of each WQ in this WQ array.
+ * @idx: index of this WQ array.
+ * @offset: offset inside the memory chunk.
+ * @under_unset: true if this WQ array is waiting for unset (will be done when all QPs are
+ * destroyed), false otherwise.
+ * @on_device_mem: true if this WQ array resides on HBM, false if on host.
+ * @is_send: true if this WQ array should contain send WQEs, false if recv WQEs.
+ * @wq_mmu_bypass: true if WQs have MMU-BP access, false otherwise.
+ * @enabled: true if this WQ array is enabled, false otherwise.
+ */
+struct hbl_cn_wq_array_properties {
+ char *type_str;
+ u64 handle;
+ u64 dva_base;
+ u64 dva_size;
+ u64 wq_size;
+ u32 idx;
+ u32 offset;
+ u8 under_unset;
+ u8 on_device_mem;
+ u8 is_send;
+ u8 wq_mmu_bypass;
+ u8 enabled;
+};
+
+/**
+ * struct hbl_cn_reset_tracker - port reset tracking information.
+ * @timeout_jiffies: end of the measurement window.
+ * @num_seq_resets: how many sequential resets were triggered inside the measurement window.
+ */
+struct hbl_cn_reset_tracker {
+ unsigned long timeout_jiffies;
+ u8 num_seq_resets;
+};
+
+/**
+ * struct hbl_cn_mem_buf - describes a memory allocation.
+ * @ctx: pointer to the context this memory belongs to.
+ * @bus_address: Holds the memory's DMA address.
+ * @kernel_address: Holds the memory's kernel virtual address.
+ * @refcount: reference counter for buffer users.
+ * @mmap: atomic boolean indicating whether or not the buffer is mapped right now.
+ * @real_mapped_size: the actual mapped size of the buffer; may shrink at runtime if part of it
+ * is released.
+ * @mappable_size: the original mappable size of the buffer, does not change after the allocation.
+ * @device_addr: Holds the HBM address.
+ * @device_va: Device virtual address. Valid only for MMU mapped allocations.
+ * @handle: The buffer id that is stored in the mem xarray.
+ * @mem_id: specify host/device memory allocation.
+ * @is_destroyed: Indicates whether or not the memory was destroyed.
+ */
+struct hbl_cn_mem_buf {
+ struct hbl_cn_ctx *ctx;
+ dma_addr_t bus_address;
+ void *kernel_address;
+ struct kref refcount;
+ atomic_t mmap;
+ u64 real_mapped_size;
+ u64 mappable_size;
+ u64 device_addr;
+ u64 device_va;
+ u64 handle;
+ u32 mem_id;
+ atomic_t is_destroyed;
+};
+
+/**
+ * struct hbl_cn_qp - Describes a Queue Pair.
+ * @cn_port: Pointer to the port this QP belongs to.
+ * @async_work: async work performed on QP, when destroying the QP.
+ * @adaptive_tmr_reset: reset work performed on QP, for adaptive timer.
+ * @req_user_cq: CQ ID used by the requester context.
+ * @res_user_cq: CQ ID used by the responder context.
+ * @ctx: Associated user context.
+ * @curr_state: The current state of the QP.
+ * @mtu_type: Source of MTU value from user, from netdev or default.
+ * @swq_handle: Send WQ mmap handle.
+ * @rwq_handle: Receive WQ mmap handle.
+ * @port: The port number this QP belongs to.
+ * @qp_id: The QP number within its port.
+ * @local_key: Key for local access.
+ * @remote_key: Key for remote access.
+ * @mtu: Current MTU value.
+ * @swq_size: Send WQ mmap size.
+ * @rwq_size: Receive WQ mmap size.
+ * @is_req: true if a requester context was set for the QP.
+ * @is_res: true if a responder context was set for the QP.
+ * @force_cq_overrun: force CQ overrun, if needed, during destruction phase.
+ * @timeout_granularity: User supplied granularity value.
+ * @timeout_curr: Currently used granularity value.
+ */
+struct hbl_cn_qp {
+ struct hbl_cn_port *cn_port;
+ struct work_struct async_work;
+ struct delayed_work adaptive_tmr_reset;
+ struct hbl_cn_user_cq *req_user_cq;
+ struct hbl_cn_user_cq *res_user_cq;
+ struct hbl_cn_ctx *ctx;
+ enum hbl_cn_qp_state curr_state;
+ enum mtu_type mtu_type;
+ u64 swq_handle;
+ u64 rwq_handle;
+ u32 port;
+ u32 qp_id;
+ u32 local_key;
+ u32 remote_key;
+ u32 mtu;
+ u32 swq_size;
+ u32 rwq_size;
+ u8 is_req;
+ u8 is_res;
+ u8 force_cq_overrun;
+ u8 timeout_granularity;
+ u8 timeout_curr;
+};
+
+/**
+ * enum qp_conn_state - State of retransmission flow.
+ * @QP_CONN_STATE_OPEN: connection is open.
+ * @QP_CONN_STATE_CLOSED: connection is closed.
+ * @QP_CONN_STATE_RESYNC: Connection is re-synchronizing.
+ * @QP_CONN_STATE_ERROR: Connection is in error state.
+ */
+enum qp_conn_state {
+ QP_CONN_STATE_OPEN = 0,
+ QP_CONN_STATE_CLOSED = 1,
+ QP_CONN_STATE_RESYNC = 2,
+ QP_CONN_STATE_ERROR = 3,
+};
+
+/**
+ * struct hbl_cn_qpc_attr - QPC attributes as read from the HW.
+ * @valid: QPC is valid.
+ * @in_work: qp was scheduled to work.
+ * @error: QPC is in error state (relevant in Req QPC only).
+ * @conn_state: state of the retransmission flow (relevant in Res QPC only).
+ */
+struct hbl_cn_qpc_attr {
+ u8 valid;
+ u8 in_work;
+ u8 error;
+ enum qp_conn_state conn_state;
+};
+
+/**
+ * struct hbl_cn_qpc_reset_attr - attributes used when setting QP state to reset.
+ * @reset_mode: the type/mode of reset to be used.
+ */
+struct hbl_cn_qpc_reset_attr {
+ enum hbl_cn_qp_reset_mode reset_mode;
+};
+
+/**
+ * struct hbl_cn_qpc_drain_attr - QPC attributes used for draining operation.
+ * @wait_for_idle: wait for QPC to become idle.
+ */
+struct hbl_cn_qpc_drain_attr {
+ bool wait_for_idle;
+};
+
+/**
+ * struct hbl_cn_mem_data - memory allocation metadata.
+ * @in: mem_id specific allocation parameters.
+ * @in.device_mem_data: mem_id HBL_CN_DRV_MEM_DEVICE specific allocation parameters.
+ * @in.device_mem_data.port: Associated port index.
+ * @in.device_mem_data.type: enum hbl_nic_mem_type to be allocated.
+ * @in.host_map_data: mem_id HBL_CN_DRV_MEM_HOST_MAP_ONLY specific mapping parameters.
+ * @in.host_map_data.bus_address: Memory DMA address.
+ * @in.host_map_data.kernel_address: Memory kernel virtual address.
+ * @mem_id: Allocation type, enum hbl_cn_mem_id.
+ * @size: Allocation size.
+ * @device_va: Device virtual address. Valid only for MMU mapped allocation.
+ * @handle: Returned mmap handle.
+ * @addr: Returned allocation address.
+ */
+struct hbl_cn_mem_data {
+ union {
+ /* HBL_CN_DRV_MEM_DEVICE */
+ struct {
+ u32 port;
+ enum hbl_nic_mem_type type;
+ } device_mem_data;
+
+ /* HBL_CN_DRV_MEM_HOST_MAP_ONLY */
+ struct {
+ dma_addr_t bus_address;
+ void *kernel_address;
+ } host_map_data;
+ } in;
+
+ /* Common in params */
+ enum hbl_cn_drv_mem_id mem_id;
+ u64 size;
+ u64 device_va;
+
+ /* Common out params */
+ u64 handle;
+ u64 addr;
+};
+
+/**
+ * struct hbl_cni_user_cq_set_in_params - user CQ configuration in params.
+ * @port: port index.
+ * @num_of_cqes: Number of CQ entries in the buffer.
+ * @id: CQ ID.
+ */
+struct hbl_cni_user_cq_set_in_params {
+ u32 port;
+ u32 num_of_cqes;
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_cq_set_out_params - user CQ configuration out params.
+ * @mem_handle: Handle of CQ memory buffer.
+ * @pi_handle: Handle of CQ producer-index memory buffer.
+ * @regs_handle: Handle of CQ Registers base-address.
+ * @regs_offset: CQ Registers sub-offset.
+ */
+struct hbl_cni_user_cq_set_out_params {
+ u64 mem_handle;
+ u64 pi_handle;
+ u64 regs_handle;
+ u32 regs_offset;
+};
+
+/**
+ * struct hbl_cni_user_cq_unset_in_params - user CQ unconfiguration in params.
+ * @port: port index.
+ * @id: CQ ID.
+ */
+struct hbl_cni_user_cq_unset_in_params {
+ u32 port;
+ u32 id;
+};
+
+/**
+ * struct hbl_cn_cpucp_status - describes the status of a port.
+ * @port: port index.
+ * @bad_format_cnt: e.g. CRC.
+ * @responder_out_of_sequence_psn_cnt: e.g NAK.
+ * @high_ber_reinit: link reinit counter due to high BER.
+ * @correctable_err_cnt: e.g. bit-flip.
+ * @uncorrectable_err_cnt: e.g. MAC errors.
+ * @retraining_cnt: re-training counter.
+ * @up: is port up.
+ * @pcs_link: has PCS link.
+ * @phy_ready: is PHY ready.
+ * @auto_neg: is Autoneg enabled.
+ * @timeout_retransmission_cnt: timeout retransmission events.
+ * @high_ber_cnt: high ber events.
+ * @pre_fec_ser: pre FEC SER value.
+ * @post_fec_ser: post FEC SER value.
+ * @bandwidth: measured bandwidth.
+ * @lat: measured latency.
+ * @port_toggle_cnt: counts how many times the link toggled since last port PHY init.
+ */
+struct hbl_cn_cpucp_status {
+ u32 port;
+ u32 bad_format_cnt;
+ u32 responder_out_of_sequence_psn_cnt;
+ u32 high_ber_reinit;
+ u32 correctable_err_cnt;
+ u32 uncorrectable_err_cnt;
+ u32 retraining_cnt;
+ u8 up;
+ u8 pcs_link;
+ u8 phy_ready;
+ u8 auto_neg;
+ u32 timeout_retransmission_cnt;
+ u32 high_ber_cnt;
+ struct hbl_cn_cpucp_ser_val pre_fec_ser;
+ struct hbl_cn_cpucp_ser_val post_fec_ser;
+ struct hbl_cn_cpucp_frac_val bandwidth;
+ struct hbl_cn_cpucp_frac_val lat;
+ u32 port_toggle_cnt;
+};
+
+/**
+ * struct hbl_cn_asic_port_funcs - ASIC specific functions that can be called from common code
+ * for a specific port.
+ * @port_hw_init: initialize the port HW.
+ * @port_hw_fini: cleanup the port HW.
+ * @phy_port_init: port PHY init.
+ * @phy_port_start_stop: port PHY start/stop.
+ * @phy_port_power_up: port PHY power-up.
+ * @phy_port_reconfig: port PHY reconfigure.
+ * @phy_port_fini: port PHY cleanup.
+ * @phy_link_status_work: link status handler.
+ * @update_qp_mtu: update the MTU inside a requester QP context.
+ * @user_wq_arr_unset: unset user WQ array (check whether user_wq_lock should be taken).
+ * @get_cq_id_range: get user CQ ID range.
+ * @user_cq_set: set user CQ.
+ * @user_cq_unset: unset user CQ.
+ * @user_cq_destroy: destroy user CQ.
+ * @get_cnts_num: get the number of available counters.
+ * @get_cnts_names: get the names of the available counters.
+ * @get_cnts_values: get the values of the available counters.
+ * @port_sw_init: initialize per port software components.
+ * @port_sw_fini: finalize per port software components.
+ * @register_qp: register a new qp-id with the NIC.
+ * @unregister_qp: unregister a qp.
+ * @get_qp_id_range: Get unsecure QP ID range.
+ * @eq_poll: poll the EQ for asid/app-specific events.
+ * @eq_dispatcher_select_dq: select the events dispatch queue for a given EQ entry.
+ * @get_db_fifo_id_range: Get the fifo ID range that is available to the user.
+ * @get_db_fifo_hw_id_range: Get the actual HW fifo ID range.
+ * @db_fifo_set: Config unsecure userspace doorbell fifo.
+ * @db_fifo_unset: Destroy unsecure userspace doorbell fifo.
+ * @get_db_fifo_umr: Get UMR block address and db fifo offset.
+ * @get_db_fifo_modes_mask: Get the supported db fifo modes
+ * @db_fifo_allocate: Allocate fifo for the specific mode
+ * @db_fifo_free: Free the fifo for the specific id
+ * @set_pfc: enable/disable PFC.
+ * @get_encap_id_range: Get user encapsulation ID range
+ * @encap_set: Start encapsulation
+ * @encap_unset: Stop encapsulation
+ * @set_ip_addr_encap: Setup IP address encapsulation.
+ * @qpc_write: write a QP context to the HW.
+ * @qpc_invalidate: invalidate a QP context.
+ * @qpc_query: read a QP context.
+ * @qpc_clear: clear a QP context.
+ * @user_ccq_set: set user congestion completion queue.
+ * @user_ccq_unset: unset user congestion completion queue.
+ * @reset_mac_stats: reset MAC statistics.
+ * @collect_fec_stats: collect FEC statistics.
+ * @disable_wqe_index_checker: Disable WQE index checker for both Rx and Tx.
+ * @get_status: get status information for F/W.
+ * @cfg_lock: acquire the port configuration lock.
+ * @cfg_unlock: release the port configuration lock.
+ * @cfg_is_locked: check if the port configuration lock is locked.
+ * @qp_pre_destroy: prepare for a QP destroy. Called under the cfg lock.
+ * @qp_post_destroy: cleanup after a QP destroy. Called under the cfg lock.
+ * @set_port_status: config port status before notifying user.
+ * @send_cpucp_packet: Send cpucp packet to FW.
+ * @adaptive_tmr_reset: Reset timer granularity for adaptive timeout feature.
+ * @spmu_get_stats_info: get SPMU statistics information.
+ * @spmu_config: config the SPMU.
+ * @spmu_sample: read SPMU counters.
+ * @post_send_status: handler for post sending status packet to FW.
+ */
+struct hbl_cn_asic_port_funcs {
+ int (*port_hw_init)(struct hbl_cn_port *cn_port);
+ void (*port_hw_fini)(struct hbl_cn_port *cn_port);
+ int (*phy_port_init)(struct hbl_cn_port *cn_port);
+ void (*phy_port_start_stop)(struct hbl_cn_port *cn_port, bool is_start);
+ int (*phy_port_power_up)(struct hbl_cn_port *cn_port);
+ void (*phy_port_reconfig)(struct hbl_cn_port *cn_port);
+ void (*phy_port_fini)(struct hbl_cn_port *cn_port);
+ void (*phy_link_status_work)(struct work_struct *work);
+ int (*update_qp_mtu)(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, u32 mtu);
+ int (*user_wq_arr_unset)(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
+ void (*get_cq_id_range)(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id);
+ int (*user_cq_set)(struct hbl_cn_user_cq *user_cq, struct hbl_cni_user_cq_set_in_params *in,
+ struct hbl_cni_user_cq_set_out_params *out);
+ int (*user_cq_unset)(struct hbl_cn_user_cq *user_cq);
+ void (*user_cq_destroy)(struct hbl_cn_user_cq *user_cq);
+ int (*get_cnts_num)(struct hbl_cn_port *cn_port);
+ void (*get_cnts_names)(struct hbl_cn_port *cn_port, u8 *data, bool ext);
+ void (*get_cnts_values)(struct hbl_cn_port *cn_port, u64 *data);
+ int (*port_sw_init)(struct hbl_cn_port *cn_port);
+ void (*port_sw_fini)(struct hbl_cn_port *cn_port);
+ int (*register_qp)(struct hbl_cn_port *cn_port, u32 qp_id, u32 asid);
+ void (*unregister_qp)(struct hbl_cn_port *cn_port, u32 qp_id);
+ void (*get_qp_id_range)(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id);
+ int (*eq_poll)(struct hbl_cn_port *cn_port, u32 asid,
+ struct hbl_cni_eq_poll_out *event);
+ struct hbl_cn_ev_dq * (*eq_dispatcher_select_dq)(struct hbl_cn_port *cn_port,
+ const struct hbl_cn_eqe *eqe);
+ void (*get_db_fifo_id_range)(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id);
+ void (*get_db_fifo_hw_id_range)(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id);
+ int (*db_fifo_set)(struct hbl_cn_port *cn_port, struct hbl_cn_ctx *ctx, u32 id,
+ u64 ci_device_handle, struct hbl_cn_db_fifo_xarray_pdata *xa_pdata);
+ void (*db_fifo_unset)(struct hbl_cn_port *cn_port, u32 id,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata);
+ void (*get_db_fifo_umr)(struct hbl_cn_port *cn_port, u32 id,
+ u64 *umr_block_addr, u32 *umr_db_offset);
+ void (*get_db_fifo_modes_mask)(struct hbl_cn_port *cn_port, u32 *mode_mask);
+ int (*db_fifo_allocate)(struct hbl_cn_port *cn_port,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata);
+ void (*db_fifo_free)(struct hbl_cn_port *cn_port, u32 db_pool_addr, u32 fifo_size);
+ int (*set_pfc)(struct hbl_cn_port *cn_port);
+ void (*get_encap_id_range)(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id);
+ int (*encap_set)(struct hbl_cn_port *cn_port, u32 encap_id,
+ struct hbl_cn_encap_xarray_pdata *xa_pdata);
+ void (*encap_unset)(struct hbl_cn_port *cn_port, u32 encap_id,
+ struct hbl_cn_encap_xarray_pdata *xa_pdata);
+ void (*set_ip_addr_encap)(struct hbl_cn_port *cn_port, u32 *encap_id, u32 src_ip);
+ int (*qpc_write)(struct hbl_cn_port *cn_port, void *qpc, struct qpc_mask *qpc_mask,
+ u32 qpn, bool is_req);
+ int (*qpc_invalidate)(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req);
+ int (*qpc_query)(struct hbl_cn_port *cn_port, u32 qpn, bool is_req,
+ struct hbl_cn_qpc_attr *attr);
+ int (*qpc_clear)(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req);
+ void (*user_ccq_set)(struct hbl_cn_port *cn_port, u64 ccq_device_addr, u64 pi_device_addr,
+ u32 num_of_entries, u32 *ccqn);
+ void (*user_ccq_unset)(struct hbl_cn_port *cn_port, u32 *ccqn);
+ void (*reset_mac_stats)(struct hbl_cn_port *cn_port);
+ void (*collect_fec_stats)(struct hbl_cn_port *cn_port, char *buf, size_t size);
+ int (*disable_wqe_index_checker)(struct hbl_cn_port *cn_port);
+ void (*get_status)(struct hbl_cn_port *cn_port, struct hbl_cn_cpucp_status *status);
+ void (*cfg_lock)(struct hbl_cn_port *cn_port);
+ void (*cfg_unlock)(struct hbl_cn_port *cn_port);
+ bool (*cfg_is_locked)(struct hbl_cn_port *cn_port);
+ void (*qp_pre_destroy)(struct hbl_cn_qp *qp);
+ void (*qp_post_destroy)(struct hbl_cn_qp *qp);
+ void (*set_port_status)(struct hbl_cn_port *cn_port, bool up);
+ int (*send_cpucp_packet)(struct hbl_cn_port *cn_port, enum cpucp_packet_id packet_id,
+ int val);
+ void (*adaptive_tmr_reset)(struct hbl_cn_qp *qp);
+ void (*spmu_get_stats_info)(struct hbl_cn_port *cn_port, struct hbl_cn_stat **stats,
+ u32 *n_stats);
+ int (*spmu_config)(struct hbl_cn_port *cn_port, u32 num_event_types, u32 event_types[],
+ bool enable);
+ int (*spmu_sample)(struct hbl_cn_port *cn_port, u32 num_out_data, u64 out_data[]);
+ void (*post_send_status)(struct hbl_cn_port *cn_port);
+};
+
+/**
+ * struct hbl_cn_asic_funcs - ASIC specific functions that can be called from common code.
+ * @core_init: core infrastructure init.
+ * @core_fini: core infrastructure cleanup.
+ * @get_default_port_speed: get the default port BW in MB/s.
+ * @phy_reset_macro: macro PHY reset.
+ * @phy_get_crc: get PHY CRC.
+ * @set_req_qp_ctx: set up a requester QP context.
+ * @set_res_qp_ctx: set up a responder QP context.
+ * @user_wq_arr_set: set user WQ array (check whether user_wq_lock should be taken).
+ * @user_set_app_params: update user params to be later retrieved by user_get_app_params.
+ * @user_get_app_params: retrieve user params, previously configured or saved by
+ * user_set_app_params.
+ * @get_phy_fw_name: returns the PHY FW file name.
+ * @pre_sw_init: initialize device SW fixed properties.
+ * @sw_init: initialize device SW.
+ * @sw_fini: cleanup device SW.
+ * @macro_sw_init: initialize per macro software components.
+ * @macro_sw_fini: finalize per macro software components.
+ * @kernel_ctx_init: initialize kernel context.
+ * @kernel_ctx_fini: de-initialize kernel context.
+ * @ctx_init: initialize user context.
+ * @ctx_fini: de-initialize user context.
+ * @qp_read: read a QP content.
+ * @wqe_read: read a WQE content.
+ * @phy_fw_load_all: load PHY fw on all the ports.
+ * @set_en_data: ASIC data to be used by the Ethernet driver.
+ * @request_irqs: Add handlers to interrupt lines.
+ * @free_irqs: Free interrupts allocated with request_irqs.
+ * @synchronize_irqs: Wait for pending IRQ handlers (on other CPUs).
+ * @phy_dump_serdes_params: dump the serdes parameters.
+ * @get_max_msg_sz: get maximum message size.
+ * @qp_syndrome_to_str: convert a QP error syndrome to an error string.
+ * @app_params_clear: clear app params.
+ * @inject_rx_err: Force RX packet drops.
+ * @is_encap_supported: check whether encapsulation is supported with the given parameters.
+ * @set_wqe_index_checker: set wqe index checker (enable/disable).
+ * @get_wqe_index_checker: get wqe index checker (enabled/disabled).
+ * @set_static_properties: Sets static CN properties.
+ * @set_dram_properties: Sets DRAM CN properties.
+ * @late_init: set post initialization properties, e.g., compute2cn ops.
+ * @late_fini: clear post initialization properties, e.g., compute2cn ops.
+ * @get_hw_block_handle: Map block and return its handle.
+ * @get_hw_block_addr: Get address and size for a memory handle.
+ * @create_mem_ctx: create a HW memory context.
+ * @destroy_mem_ctx: destroy a HW memory context.
+ * @phy_speed_rate_write: set PHY speed rate ID.
+ * @phy_speed_rate_read: get PHY speed rate ID.
+ * @phy_training_type_write: set PHY training type ID.
+ * @phy_training_type_read: get PHY training type ID.
+ * @dma_alloc_coherent: Allocate coherent DMA memory.
+ * @dma_free_coherent: Free coherent DMA memory.
+ * @dma_pool_zalloc: Allocate a memory block from a DMA memory pool.
+ * @dma_pool_free: Free a DMA memory pool block.
+ * @set_maintenance_work_interval: Set maintenance work interval.
+ * @get_maintenance_work_interval: Get maintenance work interval.
+ * @send_cpu_message: Send message to F/W. If the message times out, the driver will eventually
+ * reset the device. The timeout is passed as an argument; if it is 0, the
+ * default timeout for the specific ASIC is used.
+ * @ports_cancel_status_work: Cancel status work of all ports.
+ * @port_funcs: Functions called from common code for a specific port.
+ */
+struct hbl_cn_asic_funcs {
+ int (*core_init)(struct hbl_cn_device *hdev);
+ void (*core_fini)(struct hbl_cn_device *hdev);
+ u32 (*get_default_port_speed)(struct hbl_cn_device *hdev);
+ int (*phy_reset_macro)(struct hbl_cn_macro *cn_macro);
+ u16 (*phy_get_crc)(struct hbl_cn_device *hdev);
+ int (*set_req_qp_ctx)(struct hbl_cn_device *hdev, struct hbl_cni_req_conn_ctx_in *in,
+ struct hbl_cn_qp *qp);
+ int (*set_res_qp_ctx)(struct hbl_cn_device *hdev, struct hbl_cni_res_conn_ctx_in *in,
+ struct hbl_cn_qp *qp);
+ int (*user_wq_arr_set)(struct hbl_cn_device *hdev, struct hbl_cni_user_wq_arr_set_in *in,
+ struct hbl_cni_user_wq_arr_set_out *out, struct hbl_cn_ctx *ctx);
+ int (*user_set_app_params)(struct hbl_cn_device *hdev,
+ struct hbl_cni_set_user_app_params_in *in,
+ bool *modify_wqe_checkers, struct hbl_cn_ctx *ctx);
+ void (*user_get_app_params)(struct hbl_cn_device *hdev,
+ struct hbl_cni_get_user_app_params_in *in,
+ struct hbl_cni_get_user_app_params_out *out);
+	const char *(*get_phy_fw_name)(void);
+ void (*pre_sw_init)(struct hbl_cn_device *hdev);
+ int (*sw_init)(struct hbl_cn_device *hdev);
+ void (*sw_fini)(struct hbl_cn_device *hdev);
+ int (*macro_sw_init)(struct hbl_cn_macro *cn_macro);
+ void (*macro_sw_fini)(struct hbl_cn_macro *cn_macro);
+ int (*kernel_ctx_init)(struct hbl_cn_device *hdev, u32 asid);
+ void (*kernel_ctx_fini)(struct hbl_cn_device *hdev, u32 asid);
+ int (*ctx_init)(struct hbl_cn_ctx *ctx);
+ void (*ctx_fini)(struct hbl_cn_ctx *ctx);
+ int (*qp_read)(struct hbl_cn_device *hdev, struct hbl_cn_qp_info *qp_info, char *buf,
+ size_t bsize);
+ int (*wqe_read)(struct hbl_cn_device *hdev, char *buf, size_t bsize);
+ int (*phy_fw_load_all)(struct hbl_cn_device *hdev);
+ void (*set_en_data)(struct hbl_cn_device *hdev);
+ int (*request_irqs)(struct hbl_cn_device *hdev);
+ void (*free_irqs)(struct hbl_cn_device *hdev);
+ void (*synchronize_irqs)(struct hbl_cn_device *hdev);
+ void (*phy_dump_serdes_params)(struct hbl_cn_device *hdev, char *buf, size_t size);
+ u32 (*get_max_msg_sz)(struct hbl_cn_device *hdev);
+ char *(*qp_syndrome_to_str)(u32 syndrome);
+ void (*app_params_clear)(struct hbl_cn_device *hdev);
+ int (*inject_rx_err)(struct hbl_cn_device *hdev, u8 drop_percent);
+ bool (*is_encap_supported)(struct hbl_cn_device *hdev,
+ struct hbl_cni_user_encap_set_in *in);
+ int (*set_wqe_index_checker)(struct hbl_cn_device *hdev, u32 enable);
+ int (*get_wqe_index_checker)(struct hbl_cn_device *hdev);
+ int (*set_static_properties)(struct hbl_cn_device *hdev);
+ int (*set_dram_properties)(struct hbl_cn_device *hdev);
+ void (*late_init)(struct hbl_cn_device *hdev);
+ void (*late_fini)(struct hbl_cn_device *hdev);
+ int (*get_hw_block_handle)(struct hbl_cn_device *hdev, u64 address, u64 *handle);
+ int (*get_hw_block_addr)(struct hbl_cn_device *hdev, u64 handle, u64 *addr, u64 *size);
+ int (*create_mem_ctx)(struct hbl_cn_ctx *ctx, u32 pasid, u64 page_tbl_addr);
+ void (*destroy_mem_ctx)(struct hbl_cn_ctx *ctx, u32 pasid, u64 page_tbl_addr);
+ int (*phy_speed_rate_write)(struct hbl_cn_device *hdev, u32 speed_rate_id);
+ u32 (*phy_speed_rate_read)(struct hbl_cn_device *hdev);
+ int (*phy_training_type_write)(struct hbl_cn_device *hdev, u32 training_type_id);
+ u32 (*phy_training_type_read)(struct hbl_cn_device *hdev);
+ void *(*dma_alloc_coherent)(struct hbl_cn_device *hdev, size_t size, dma_addr_t *dma_handle,
+ gfp_t flag);
+ void (*dma_free_coherent)(struct hbl_cn_device *hdev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr);
+ void *(*dma_pool_zalloc)(struct hbl_cn_device *hdev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle);
+ void (*dma_pool_free)(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t dma_addr);
+ void (*set_maintenance_work_interval)(struct hbl_cn_device *hdev, u32 value);
+ u32 (*get_maintenance_work_interval)(struct hbl_cn_device *hdev);
+ int (*send_cpu_message)(struct hbl_cn_device *hdev, u32 *msg, u16 len, u32 timeout,
+ u64 *result);
+ void (*ports_cancel_status_work)(struct hbl_cn_device *hdev);
+ struct hbl_cn_asic_port_funcs *port_funcs;
+};
+
+/**
+ * struct hbl_cn_tx_taps - holds the Tx taps values for a specific lane (tx_pre2, tx_pre1, tx_main,
+ * tx_post1 and tx_post2).
+ * @pam4_taps: taps for PAM4 mode.
+ * @nrz_taps: taps for NRZ mode.
+ */
+struct hbl_cn_tx_taps {
+ s32 pam4_taps[PHY_TX_TAPS_NUM];
+ s32 nrz_taps[PHY_TX_TAPS_NUM];
+};
+
+/**
+ * struct hbl_cn_ber_info - holds the last calculated BER info for a specific lane.
+ * The BER (bit error rate) value is represented as "integer.frac * 10^-exp".
+ * @integer: the integer part of the BER value.
+ * @frac: the fraction part of the BER value.
+ * @exp: the exponent part of the BER value.
+ * @valid: is info valid.
+ */
+struct hbl_cn_ber_info {
+ u64 integer;
+ u64 frac;
+ u8 exp;
+ u8 valid;
+};
+
+/**
+ * struct hbl_cn_macro - manage specific macro that holds multiple engines.
+ * @hdev: habanalabs device structure.
+ * @rec_link_sts: link status bits as received from the MAC_REC_STS0 register.
+ * @phy_macro_needs_reset: true if the PHY macro needs to be reset.
+ * @idx: index of the macro.
+ */
+struct hbl_cn_macro {
+ struct hbl_cn_device *hdev;
+ u32 rec_link_sts;
+ u8 phy_macro_needs_reset;
+ u8 idx;
+};
+
+/**
+ * struct hbl_cn_port - manage specific port common structure.
+ * @hdev: habanalabs device structure.
+ * @cn_specific: pointer to an ASIC specific port structure.
+ * @cn_macro: pointer to the manage structure of the containing macro.
+ * @wq: general purpose WQ for low/medium priority jobs like link status detection or status fetch.
+ * @qp_wq: QP work queue for handling the reset or destruction of QPs.
+ * @wq_arr_props: array per type of WQ array properties.
+ * @ev_dqs: per ASID/App events dispatch queues managed by the driver.
+ * @reset_tracker: port reset tracker.
+ * @num_of_allocated_qps: the current number of allocated QPs for this port.
+ * @link_status_work: work for checking port link status.
+ * @fw_status_work: work for sending port status to the FW.
+ * @control_lock: protects from a race between port open/close and other flows that might run in
+ *                parallel (such as event handling).
+ * @cnt_lock: protects the counters from concurrent reading. Needed for SPMU and XPCS91 counters.
+ * @qp_ids: xarray to hold all QP IDs.
+ * @db_fifo_ids: Allocated doorbell fifo IDs.
+ * @cq_ids: xarray to hold all CQ IDs.
+ * @encap_ids: Allocated encapsulation IDs.
+ * @fw_tuning_limit_ts: time stamp of FW tuning time limit.
+ * @ccq_handle: memory handle of the CCQ buffer.
+ * @ccq_pi_handle: memory handle of the CCQ producer-index buffer.
+ * @correctable_errors_cnt: count the correctable FEC blocks.
+ * @uncorrectable_errors_cnt: count the uncorrectable FEC blocks.
+ * @port: port index.
+ * @speed: the bandwidth of the port in Mb/s.
+ * @pflags: private flags bit mask.
+ * @retry_cnt: counts the number of retries during link establishment.
+ * @pcs_local_fault_cnt: counter of PCS link local errors since last F/W configuration. These errors
+ * can appear even when link is up.
+ * @pcs_remote_fault_cnt: counter of PCS link remote errors since last F/W configuration. These
+ * errors can appear even when link is up.
+ * @pcs_remote_fault_seq_cnt: counter for number of PCS remote faults in a row, or in other words
+ * the length of their sequence.
+ * @pcs_remote_fault_reconfig_cnt: counter of PHY reconfigurations due to remote fault errors since
+ * last port open.
+ * @pcs_link_restore_cnt: counter of PCS link momentary loss (glitch) since last F/W configuration.
+ * @data_rate: data rate according to speed and number of lanes.
+ * @num_of_wq_entries: number of entries configured in the port WQ.
+ * @num_of_wqs: number of WQs configured for this port.
+ * @qp_idx_offset: offset to the base QP index of this port for generic QPs.
+ * @swqe_size: send WQE size.
+ * @port_toggle_cnt: counts number of times port link status was toggled since PHY init.
+ * @port_toggle_cnt_prev: holds the value of port_toggle_cnt in the last pcs link check.
+ * @cong_q_err_cnt: error count of congestion queue error.
+ * @port_open: true if the port H/W is initialized, false otherwise.
+ * @mac_loopback: true if port in MAC loopback mode, false otherwise.
+ * @pfc_enable: true if this port supports Priority Flow Control, false otherwise.
+ * @sw_initialized: true if the basic SW initialization was completed successfully for this port,
+ * false otherwise.
+ * @phy_fw_tuned: true if F/W is tuned, false otherwise.
+ * @phy_func_mode_en: true if PHY is set to functional mode, false otherwise.
+ * @pcs_link: true if the port has PCS link, false otherwise.
+ * @eq_pcs_link: true if the port got PCS link in the EQ, false otherwise.
+ * @link_eqe: cache link status EQE. Dispatched to user for internal ports only.
+ * @auto_neg_enable: true if this port supports Autonegotiation, false otherwise.
+ * @auto_neg_resolved: true if Autonegotiation was completed for this port, false otherwise.
+ * @auto_neg_skipped: true if Autonegotiation was skipped for this port, false otherwise.
+ * @eth_enable: is Ethernet traffic enabled in addition to RDMA.
+ * @ccq_enable: true if the CCQ was initialized successfully for this port, false otherwise.
+ * @set_app_params: set_app_params operation was executed by the user. This is a mandatory step in
+ * order to initialize the uAPI.
+ * @disabled: true if this port is disabled, i.e. need to block its initialization, false otherwise.
+ * @bp_enable: true if WQ back-pressure was enabled, false otherwise.
+ * @eq_handler_enable: true if event queue events are handled, false otherwise.
+ */
+struct hbl_cn_port {
+ struct hbl_cn_device *hdev;
+ void *cn_specific;
+ struct hbl_cn_macro *cn_macro;
+ struct workqueue_struct *wq;
+ struct workqueue_struct *qp_wq;
+ struct hbl_cn_wq_array_properties wq_arr_props[HBL_CNI_USER_WQ_TYPE_MAX];
+ struct hbl_cn_ev_dqs ev_dqs;
+ struct hbl_cn_reset_tracker *reset_tracker;
+ atomic_t num_of_allocated_qps;
+ struct delayed_work link_status_work;
+ struct delayed_work fw_status_work;
+ /* protects from a race between port open/close and event handling */
+ struct mutex control_lock;
+ /* protects the counters from concurrent reading */
+ struct mutex cnt_lock;
+ struct xarray qp_ids;
+ struct xarray db_fifo_ids;
+ struct xarray cq_ids;
+ struct xarray encap_ids;
+ struct hbl_cn_eqe link_eqe;
+ ktime_t fw_tuning_limit_ts;
+ u64 ccq_handle;
+ u64 ccq_pi_handle;
+ u64 correctable_errors_cnt;
+ u64 uncorrectable_errors_cnt;
+ u32 port;
+ u32 speed;
+ u32 pflags;
+ u32 retry_cnt;
+ u32 pcs_local_fault_cnt;
+ u32 pcs_remote_fault_seq_cnt;
+ u32 pcs_remote_fault_reconfig_cnt;
+ u32 pcs_remote_fault_cnt;
+ u32 pcs_link_restore_cnt;
+ u32 data_rate;
+ u32 num_of_wq_entries;
+ u32 num_of_wqs;
+ u32 qp_idx_offset;
+ u32 swqe_size;
+ u32 port_toggle_cnt;
+ u32 port_toggle_cnt_prev;
+ u32 cong_q_err_cnt;
+ u8 port_open;
+ u8 mac_loopback;
+ u8 pfc_enable;
+ u8 sw_initialized;
+ u8 phy_fw_tuned;
+ u8 phy_func_mode_en;
+ u8 pcs_link;
+ u8 eq_pcs_link;
+ u8 auto_neg_enable;
+ u8 auto_neg_resolved;
+ u8 auto_neg_skipped;
+ u8 eth_enable;
+ u8 ccq_enable;
+ u8 set_app_params;
+ u8 disabled;
+ u8 bp_enable;
+ u8 eq_handler_enable;
+};
+
+/**
+ * struct hbl_cn_comp_vm_info - Compute virtual memory info.
+ * @vm_info: VM info.
+ * @vm_handle: VM handle.
+ */
+struct hbl_cn_comp_vm_info {
+ struct hbl_cn_vm_info vm_info;
+ u64 vm_handle;
+};
+
+/**
+ * struct hbl_cn_ctx - user context common structure.
+ * @hdev: device structure.
+ * @user_vm_info: info of user compute VM.
+ * @driver_vm_info: info of driver compute VM.
+ * @lock: protects context from specific concurrent operations.
+ * @comp_handle: compute handle.
+ * @asid: ASID for accessing driver memory.
+ * @user_asid: ASID for accessing user memory.
+ */
+struct hbl_cn_ctx {
+ struct hbl_cn_device *hdev;
+ struct hbl_cn_comp_vm_info user_vm_info;
+ struct hbl_cn_comp_vm_info driver_vm_info;
+ /* protects context from specific concurrent operations */
+ struct mutex lock;
+ u64 comp_handle;
+ u32 asid;
+ u32 user_asid;
+};
+
+/**
+ * struct hbl_cn_properties - ASIC specific properties.
+ * @phy_base_addr: base address of the PHY.
+ * @nic_drv_addr: the base address of the memory in the device.
+ * @nic_drv_size: the size of the memory in the device.
+ * @nic_drv_base_addr: the aligned base address of the memory in the device.
+ * @nic_drv_end_addr: the aligned end address of the memory in the device.
+ * @txs_base_addr: base address of the ports timer cfg.
+ * @txs_base_size: size of the ports timer cfg.
+ * @wq_base_addr: base address of send and receive work queues.
+ * @wq_base_size: size of send and receive work queues.
+ * @tmr_base_addr: base address of the macros timer cfg.
+ * @tmr_base_size: size of the macros timer cfg.
+ * @req_qpc_base_addr: the base address of a requester (sender) QP context buffer.
+ * @req_qpc_base_size: the size of a requester (sender) QP context buffer.
+ * @res_qpc_base_addr: the base address of a responder (receiver) QP context buffer.
+ * @res_qpc_base_size: the size of a responder (receiver) QP context buffer.
+ * @max_hw_qps_num: maximum number of QPs supported by HW.
+ * @max_qps_num: maximum number of QPs to allocate.
+ * @max_hw_user_wqs_num: maximum number of WQ entries supported by HW.
+ * @min_hw_user_wqs_num: minimum number of WQ entries supported by HW.
+ * @macro_cfg_size: the size of the macro configuration space.
+ * @rwqe_size: receive WQE size.
+ * @user_cq_min_entries: minimum number of supported user CQ entries.
+ * @user_cq_max_entries: max number of supported user CQ entries.
+ * @max_frm_len: maximum allowed frame length.
+ * @raw_elem_size: size of element in raw buffers.
+ * @status_packet_size: size of the status packet we are going to send to F/W.
+ * @cqe_size: Size of the Completion queue Entry.
+ * @max_qp_error_syndromes: maximum number of QP error syndromes.
+ * @max_raw_mtu: maximum MTU size for raw packets.
+ * @min_raw_mtu: minimum MTU size for raw packets.
+ * @clk: clock frequency in MHz.
+ * @force_cq: all CQs should be enabled regardless of the ports link mask.
+ * @max_num_of_lanes: maximum number of lanes supported by ASIC.
+ * @max_num_of_ports: maximum number of ports supported by ASIC.
+ * @num_of_macros: number of macros supported by ASIC.
+ * @max_cqs: maximum number of completion queues.
+ * @max_ccqs: maximum number of congestion control completion queues.
+ * @max_db_fifos: maximum number of DB fifos.
+ * @max_wq_arr_type: maximum WQ array type number.
+ * @is_phy_fw_binary: True if phy FW is in binary format, false otherwise.
+ */
+struct hbl_cn_properties {
+ u64 phy_base_addr;
+ u64 nic_drv_addr;
+ u64 nic_drv_size;
+ u64 nic_drv_base_addr;
+ u64 nic_drv_end_addr;
+ u64 txs_base_addr;
+ u64 txs_base_size;
+ u64 wq_base_addr;
+ u64 wq_base_size;
+ u64 tmr_base_addr;
+ u64 tmr_base_size;
+ u64 req_qpc_base_addr;
+ u64 req_qpc_base_size;
+ u64 res_qpc_base_addr;
+ u64 res_qpc_base_size;
+ u32 max_hw_qps_num;
+ u32 max_qps_num;
+ u32 max_hw_user_wqs_num;
+ u32 min_hw_user_wqs_num;
+ u32 macro_cfg_size;
+ u32 rwqe_size;
+ u32 user_cq_min_entries;
+ u32 user_cq_max_entries;
+ u32 max_frm_len;
+ u32 raw_elem_size;
+ u32 status_packet_size;
+ u32 cqe_size;
+ u32 max_qp_error_syndromes;
+ u16 max_raw_mtu;
+ u16 min_raw_mtu;
+ u16 clk;
+ u8 force_cq;
+ u8 max_num_of_lanes;
+ u8 max_num_of_ports;
+ u8 num_of_macros;
+ u8 max_cqs;
+ u8 max_ccqs;
+ u8 max_db_fifos;
+ u8 max_wq_arr_type;
+ u8 is_phy_fw_binary;
+};
+
+/**
+ * struct hbl_cn_device - habanalabs CN device structure.
+ * @pdev: pointer to PCI device.
+ * @dev: related kernel basic device structure.
+ * @cpucp_info: FW info.
+ * @asic_funcs: ASIC specific functions that can be called from common code.
+ * @phy_tx_taps: array that holds all PAM4 Tx taps of all lanes.
+ * @phy_ber_info: array that holds last calculated BER info of all lanes.
+ * @cn_ports: pointer to an array that holds all ports manage common structures.
+ * @cn_macros: pointer to an array that holds all macros manage structures.
+ * @wq_arrays_pool: memory pool for WQ arrays on HBM.
+ * @cn_props: fixed NIC properties.
+ * @asic_specific: ASIC specific information to use only from ASIC files.
+ * @cn_aux_dev: pointer to CN auxiliary device.
+ * @en_aux_dev: Ethernet auxiliary device.
+ * @ib_aux_dev: InfiniBand auxiliary device.
+ * @qp_info: details of a QP to read via debugfs.
+ * @wqe_info: details of a WQE to read via debugfs.
+ * @ctx: user context.
+ * @hw_access_lock: protects from HW access during reset flows.
+ * @asic_type: ASIC specific type.
+ * @status_cmd: status packet command from FW.
+ * @qp_reset_mode: Graceful/fast reset.
+ * @fw_ver: FW version.
+ * @mem_ids: an xarray holding all active memory handles.
+ * @ctrl_op_mask: mask of supported control operations.
+ * @ports_mask: mask of available ports.
+ * @ext_ports_mask: mask of external ports (subset of ports_mask).
+ * @phys_auto_neg_mask: mask of ports with Autonegotiation availability.
+ * @auto_neg_mask: mask of ports with Autonegotiation enabled.
+ * @mac_loopback: enable MAC loopback on specific ports.
+ * @dram_size: available DRAM size.
+ * @mmap_type_flag: flag to indicate NIC MMAP type.
+ * @pol_tx_mask: bitmap of tx polarity for all lanes.
+ * @pol_rx_mask: bitmap of rx polarity for all lanes.
+ * @device_timeout: device access timeout in usec.
+ * @mac_lane_remap: MAC to PHY lane mapping.
+ * @pending_reset_long_timeout: Long timeout for pending hard reset to finish in seconds.
+ * @kernel_asid: kernel ASID.
+ * @qp_drain_time: drain waiting time in seconds after QP invalidation.
+ * @card_location: the OAM number in the HLS (relevant for PMC card type).
+ * @phy_port_to_dump: the port whose serdes params will be dumped.
+ * @fw_major_version: major version of current loaded preboot.
+ * @fw_minor_version: minor version of current loaded preboot.
+ * @qpc_cache_inv_timeout: timeout for QPC cache invalidation.
+ * @fw_app_cpu_boot_dev_sts0: bitmap representation of application security status reported by FW,
+ * bit description can be found in CPU_BOOT_DEV_STS0.
+ * @fw_app_cpu_boot_dev_sts1: bitmap representation of application security status reported by FW,
+ * bit description can be found in CPU_BOOT_DEV_STS1.
+ * @accumulate_fec_duration: Time (ms) to accumulate FEC errors for.
+ * @id: device ID.
+ * @phy_calc_ber_wait_sec: time in seconds to wait before BER calculation.
+ * @cache_line_size: device cache line size.
+ * @operational: is device operational.
+ * @in_reset: is device under reset.
+ * @fw_reset: is device under reset which was initiated by FW.
+ * @in_teardown: is device under teardown.
+ * @is_initialized: is device initialized.
+ * @pldm: is running on Palladium setup.
+ * @skip_phy_init: avoid writing/reading PHY registers.
+ * @load_phy_fw: true if the PHY F/W should be loaded, false otherwise.
+ * @cpucp_fw: is CPUCP FW enabled.
+ * @supports_coresight: is CoreSight supported.
+ * @use_fw_serdes_info: true if FW serdes values should be used, false if hard coded values should
+ * be used.
+ * @phy_config_fw: true if the PHY F/W should be configured, false otherwise. The PHY F/W should be
+ *                 configured on ASIC only, in contrast to Palladium.
+ * @mmu_bypass: use MMU bypass for allocated data structures (false is used only for debug mode).
+ * @wq_arrays_pool_enable: Use device memory pool for WQ arrays.
+ * @poll_enable: enable polling mode rather than interrupt mode.
+ * @has_eq: is event queue supported.
+ * @skip_mac_reset: skip MAC reset.
+ * @skip_mac_cnts: Used to skip MAC counters if not supported.
+ * @skip_odd_ports_cfg_lock: do not lock the odd ports when acquiring the cfg lock for all ports.
+ * @ib_support: InfiniBand support.
+ * @mmu_enable: is MMU enabled.
+ * @eth_loopback: enable hack in hbl_en_handle_tx to test eth traffic.
+ * @lanes_per_port: number of physical lanes per port.
+ * @is_eth_aux_dev_initialized: true if the eth auxiliary device is initialized.
+ * @is_ib_aux_dev_initialized: true if the IB auxiliary device is initialized.
+ * @rx_drop_percent: RX packet drop percentage set via debugfs.
+ * @rand_status: randomize the FW status counters (used for testing).
+ * @status_period: periodic time in secs at which FW expects status packet.
+ * @phy_regs_print: print all PHY registers reads/writes.
+ * @phy_calc_ber: show PHY BER statistics during power-up.
+ * @is_decap_disabled: true if need to skip decapsulation, false otherwise.
+ * @phy_set_nrz: Set the PHY to NRZ mode (25Gbps speed).
+ * @skip_phy_default_tx_taps_cfg: Used to skip re-configuration of the default tx_taps.
+ * @cpucp_checkers_shift: CPUCP checkers flags shift.
+ * @mixed_qp_wq_types: Using mixed QP WQ types is supported.
+ * @hw_stop_during_teardown: Stopping the HW should take place during device teardown.
+ * @qp_wait_for_idle: Wait for QP to be idle.
+ * @hw_invalid_while_teardown: HW is unavailable during device teardown.
+ * @umr_support: device supports UMR.
+ * @ib_device_opened: true if the IB device has been opened.
+ * @multi_ctx_support: device supports multiple contexts.
+ * @skip_phy_pol_cfg: Used to prevent overwriting polarity cfg after setting them via debugfs.
+ * @cc_support: device supports congestion control.
+ * @phy_force_first_tx_taps_cfg: start with first Tx taps config in PHY power-up.
+ */
+struct hbl_cn_device {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct hbl_cn_cpucp_info *cpucp_info;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_tx_taps *phy_tx_taps;
+ struct hbl_cn_ber_info *phy_ber_info;
+ struct hbl_cn_port *cn_ports;
+ struct hbl_cn_macro *cn_macros;
+ struct gen_pool *wq_arrays_pool;
+ struct hbl_cn_properties cn_props;
+ void *asic_specific;
+ char *fw_ver;
+ struct hbl_aux_dev *cn_aux_dev;
+ struct hbl_aux_dev en_aux_dev;
+ struct hbl_aux_dev ib_aux_dev;
+ struct hbl_cn_qp_info qp_info;
+ struct hbl_cn_wqe_info wqe_info;
+ struct hbl_cn_ctx *ctx;
+ /* protects from HW access during reset flows */
+ struct mutex hw_access_lock;
+ enum hbl_cn_asic_type asic_type;
+ enum hbl_cn_status_cmd status_cmd;
+ enum hbl_cn_qp_reset_mode qp_reset_mode;
+ struct xarray mem_ids;
+ u64 ctrl_op_mask;
+ u64 ports_mask;
+ u64 ext_ports_mask;
+ u64 phys_auto_neg_mask;
+ u64 auto_neg_mask;
+ u64 mac_loopback;
+ u64 dram_size;
+ u64 mmap_type_flag;
+ u64 pol_tx_mask;
+ u64 pol_rx_mask;
+ u32 device_timeout;
+ u32 *mac_lane_remap;
+ u32 pending_reset_long_timeout;
+ u32 kernel_asid;
+ u32 qp_drain_time;
+ u32 card_location;
+ u32 phy_port_to_dump;
+ u32 fw_major_version;
+ u32 fw_minor_version;
+ u32 qpc_cache_inv_timeout;
+ u32 fw_app_cpu_boot_dev_sts0;
+ u32 fw_app_cpu_boot_dev_sts1;
+ u32 accumulate_fec_duration;
+ u16 id;
+ u16 phy_calc_ber_wait_sec;
+ u16 cache_line_size;
+ u8 operational;
+ u8 in_reset;
+ u8 fw_reset;
+ u8 in_teardown;
+ u8 is_initialized;
+ u8 pldm;
+ u8 skip_phy_init;
+ u8 load_phy_fw;
+ u8 cpucp_fw;
+ u8 supports_coresight;
+ u8 use_fw_serdes_info;
+ u8 phy_config_fw;
+ u8 mmu_bypass;
+ u8 wq_arrays_pool_enable;
+ u8 poll_enable;
+ u8 has_eq;
+ u8 skip_mac_reset;
+ u8 skip_mac_cnts;
+ u8 skip_odd_ports_cfg_lock;
+ u8 ib_support;
+ u8 mmu_enable;
+ u8 eth_loopback;
+ u8 lanes_per_port;
+ u8 is_eth_aux_dev_initialized;
+ u8 is_ib_aux_dev_initialized;
+ u8 rx_drop_percent;
+ u8 rand_status;
+ u8 status_period;
+ u8 phy_regs_print;
+ u8 phy_calc_ber;
+ u8 is_decap_disabled;
+ u8 phy_set_nrz;
+ u8 skip_phy_default_tx_taps_cfg;
+ u8 cpucp_checkers_shift;
+ u8 mixed_qp_wq_types;
+ u8 hw_stop_during_teardown;
+ u8 qp_wait_for_idle;
+ u8 hw_invalid_while_teardown;
+ u8 umr_support;
+ u8 ib_device_opened;
+ u8 multi_ctx_support;
+ u8 skip_phy_pol_cfg;
+ u8 cc_support;
+ u8 phy_force_first_tx_taps_cfg;
+};
+
+static inline void hbl_cn_strtolower(char *str)
+{
+ while (*str) {
+ *str = tolower(*str);
+ str++;
+ }
+}
+
+int hbl_cn_dev_init(struct hbl_cn_device *hdev);
+void hbl_cn_dev_fini(struct hbl_cn_device *hdev);
+bool hbl_cn_comp_device_operational(struct hbl_cn_device *hdev);
+void hbl_cn_spmu_get_stats_info(struct hbl_cn_port *cn_port, struct hbl_cn_stat **stats,
+ u32 *n_stats);
+int hbl_cn_reserve_dva_block(struct hbl_cn_ctx *ctx, u64 size, u64 *dva);
+void hbl_cn_unreserve_dva_block(struct hbl_cn_ctx *ctx, u64 dva, u64 size);
+int hbl_cn_get_hw_block_handle(struct hbl_cn_device *hdev, u64 address, u64 *handle);
+int hbl_cn_send_cpucp_packet(struct hbl_cn_device *hdev, u32 port, enum cpucp_packet_id pkt_id,
+ int val);
+int hbl_cn_internal_port_init_locked(struct hbl_cn_port *cn_port);
+void hbl_cn_internal_port_fini_locked(struct hbl_cn_port *cn_port);
+int hbl_cn_phy_init(struct hbl_cn_port *cn_port);
+void hbl_cn_phy_fini(struct hbl_cn_port *cn_port);
+void hbl_cn_phy_port_reconfig(struct hbl_cn_port *cn_port);
+int hbl_cn_phy_has_binary_fw(struct hbl_cn_device *hdev);
+void hbl_cn_phy_set_fw_polarity(struct hbl_cn_device *hdev);
+void hbl_cn_phy_set_port_status(struct hbl_cn_port *cn_port, bool up);
+int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data);
+int hbl_cn_qp_modify(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ enum hbl_cn_qp_state new_state, void *params);
+u32 hbl_cn_get_max_qp_id(struct hbl_cn_port *cn_port);
+bool hbl_cn_is_port_open(struct hbl_cn_port *cn_port);
+u32 hbl_cn_get_pflags(struct hbl_cn_port *cn_port);
+u8 hbl_cn_get_num_of_digits(u64 num);
+void hbl_cn_reset_stats_counters(struct hbl_cn_device *hdev);
+void hbl_cn_reset_ports_toggle_counters(struct hbl_cn_device *hdev);
+void hbl_cn_get_self_hw_block_handle(struct hbl_cn_device *hdev, u64 address, u64 *handle);
+u32 hbl_cn_hw_block_handle_to_addr32(struct hbl_cn_device *hdev, u64 handle);
+
+struct hbl_cn_ev_dq *hbl_cn_asid_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 asid);
+struct hbl_cn_ev_dq *hbl_cn_dbn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 dbn,
+ struct hbl_cn_device *hdev);
+struct hbl_cn_ev_dq *hbl_cn_qpn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 qpn);
+struct hbl_cn_dq_qp_info *hbl_cn_get_qp_info(struct hbl_cn_ev_dqs *ev_dqs, u32 qpn);
+struct hbl_cn_ev_dq *hbl_cn_cqn_to_dq(struct hbl_cn_ev_dqs *ev_dqs,
+ u32 cqn, struct hbl_cn_device *hdev);
+struct hbl_cn_ev_dq *hbl_cn_ccqn_to_dq(struct hbl_cn_ev_dqs *ev_dqs, u32 ccqn,
+ struct hbl_cn_device *hdev);
+
+int hbl_cn_reserve_wq_dva(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u64 wq_arr_size,
+ u32 type, u64 *dva);
+void hbl_cn_unreserve_wq_dva(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
+u32 hbl_cn_get_wq_array_type(bool is_send);
+
+void hbl_cn_track_port_reset(struct hbl_cn_port *cn_port, u32 syndrome);
+int hbl_cn_get_reg_pcie_addr(struct hbl_cn_device *hdev, u8 bar_id, u32 reg, u64 *pci_addr);
+void hbl_cn_ports_cancel_status_work(struct hbl_cn_device *hdev);
+
+/* Memory related functions */
+int hbl_cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data);
+int hbl_cn_mem_destroy(struct hbl_cn_device *hdev, u64 handle);
+struct hbl_cn_mem_buf *hbl_cn_mem_buf_get(struct hbl_cn_device *hdev, u64 handle);
+int hbl_cn_mem_buf_put(struct hbl_cn_mem_buf *buf);
+int hbl_cn_mem_buf_put_handle(struct hbl_cn_device *hdev, u64 handle);
+void hbl_cn_mem_init(struct hbl_cn_device *hdev);
+void hbl_cn_mem_fini(struct hbl_cn_device *hdev);
+
+u32 hbl_cn_dram_readl(struct hbl_cn_device *hdev, u64 addr);
+void hbl_cn_dram_writel(struct hbl_cn_device *hdev, u32 val, u64 addr);
+u32 hbl_cn_rreg(struct hbl_cn_device *hdev, u32 reg);
+void hbl_cn_wreg(struct hbl_cn_device *hdev, u32 reg, u32 val);
+void hbl_cn_get_frac_info(u64 numerator, u64 denominator, u64 *integer, u64 *exp);
+
+bool hbl_cn_eq_dispatcher_is_empty(struct hbl_cn_ev_dq *dq);
+bool hbl_cn_eq_dispatcher_is_full(struct hbl_cn_ev_dq *dq);
+void hbl_cn_eq_dispatcher_init(struct hbl_cn_port *cn_port);
+void hbl_cn_eq_dispatcher_fini(struct hbl_cn_port *cn_port);
+void hbl_cn_eq_dispatcher_reset(struct hbl_cn_port *cn_port);
+int hbl_cn_eq_dispatcher_associate_dq(struct hbl_cn_port *cn_port, u32 asid);
+int hbl_cn_eq_dispatcher_dissociate_dq(struct hbl_cn_port *cn_port, u32 asid);
+int hbl_cn_eq_dispatcher_register_qp(struct hbl_cn_port *cn_port, u32 asid, u32 qp_id);
+int hbl_cn_eq_dispatcher_unregister_qp(struct hbl_cn_port *cn_port, u32 qp_id);
+int hbl_cn_eq_dispatcher_register_cq(struct hbl_cn_port *cn_port, u32 asid, u32 cqn);
+int hbl_cn_eq_dispatcher_unregister_cq(struct hbl_cn_port *cn_port, u32 cqn);
+int hbl_cn_eq_dispatcher_register_db(struct hbl_cn_port *cn_port, u32 asid, u32 dbn);
+int hbl_cn_eq_dispatcher_unregister_db(struct hbl_cn_port *cn_port, u32 dbn);
+int hbl_cn_eq_dispatcher_dequeue(struct hbl_cn_port *cn_port, u32 asid,
+ struct hbl_cn_eqe *eqe, bool is_default);
+int hbl_cn_eq_dispatcher_register_ccq(struct hbl_cn_port *cn_port, u32 asid, u32 ccqn);
+int hbl_cn_eq_dispatcher_unregister_ccq(struct hbl_cn_port *cn_port, u32 asid, u32 ccqn);
+int hbl_cn_eq_dispatcher_enqueue(struct hbl_cn_port *cn_port, const struct hbl_cn_eqe *eqe);
+int hbl_cn_eq_dispatcher_enqueue_bcast(struct hbl_cn_port *cn_port, const struct hbl_cn_eqe *eqe);
+void hbl_cn_eq_handler(struct hbl_cn_port *cn_port);
+int hbl_cn_alloc_ring(struct hbl_cn_device *hdev, struct hbl_cn_ring *ring, int elem_size,
+ int count);
+void hbl_cn_free_ring(struct hbl_cn_device *hdev, struct hbl_cn_ring *ring);
+
+struct hbl_cn_user_cq *hbl_cn_user_cq_get(struct hbl_cn_port *cn_port, u8 cq_id);
+int hbl_cn_user_cq_put(struct hbl_cn_user_cq *user_cq);
+bool hbl_cn_is_ibdev(struct hbl_cn_device *hdev);
+void hbl_cn_hard_reset_prepare(struct hbl_aux_dev *cn_aux_dev, bool fw_reset, bool in_teardown);
+int hbl_cn_send_port_cpucp_status(struct hbl_aux_dev *aux_dev, u32 port, u8 cmd, u8 period);
+void hbl_cn_fw_status_work(struct work_struct *work);
+int hbl_cn_get_src_ip(struct hbl_cn_port *cn_port, u32 *src_ip);
+void hbl_cn_ctx_resources_destroy(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx);
+
+int __hbl_cn_ports_reopen(struct hbl_cn_device *hdev);
+void __hbl_cn_hard_reset_prepare(struct hbl_cn_device *hdev, bool fw_reset, bool in_teardown);
+void __hbl_cn_stop(struct hbl_cn_device *hdev);
+
+/* DMA memory allocations */
+void *__hbl_cn_dma_alloc_coherent(struct hbl_cn_device *hdev, size_t size, dma_addr_t *dma_handle,
+ gfp_t flag, const char *caller);
+void __hbl_cn_dma_free_coherent(struct hbl_cn_device *hdev, size_t size, void *cpu_addr,
+ dma_addr_t dma_addr, const char *caller);
+void *__hbl_cn_dma_pool_zalloc(struct hbl_cn_device *hdev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle, const char *caller);
+void __hbl_cn_dma_pool_free(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t dma_addr,
+ const char *caller);
+
+#endif /* HABANALABS_CN_H_ */
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
new file mode 100644
index 000000000000..47eedd27f36e
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
@@ -0,0 +1,220 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#define pr_fmt(fmt) "habanalabs_cn: " fmt
+
+#include "hbl_cn.h"
+
+#include <linux/module.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/sched/clock.h>
+
+#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
+
+#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Core Network driver"
+
+MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
+MODULE_DESCRIPTION(HBL_DRIVER_DESC);
+MODULE_LICENSE("GPL");
+
+/* QP drain time in seconds */
+#define HBL_CN_QP_DRAIN_TIME 5
+
+static bool poll_enable;
+static uint qp_drain_time = HBL_CN_QP_DRAIN_TIME;
+
+module_param(poll_enable, bool, 0444);
+MODULE_PARM_DESC(poll_enable,
+ "Enable driver in polling mode rather than IRQ (0 = no, 1 = yes, default: no)");
+
+module_param(qp_drain_time, uint, 0444);
+MODULE_PARM_DESC(qp_drain_time, "QP drain time in seconds after QP invalidation (default: 5)");
+
+static int hdev_init(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_aux_data *aux_data = aux_dev->aux_data;
+ struct hbl_cn_device *hdev;
+ int rc;
+
+ hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+ if (!hdev)
+ return -ENOMEM;
+
+ hdev->cpucp_info = kzalloc(sizeof(*hdev->cpucp_info), GFP_KERNEL);
+ if (!hdev->cpucp_info) {
+ rc = -ENOMEM;
+ goto free_hdev;
+ }
+
+ aux_dev->priv = hdev;
+ hdev->cn_aux_dev = aux_dev;
+ hdev->pdev = aux_data->pdev;
+ hdev->dev = aux_data->dev;
+ hdev->asic_type = aux_data->asic_type;
+ hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
+ hdev->pldm = aux_data->pldm;
+ hdev->skip_phy_init = aux_data->skip_phy_init;
+ hdev->cpucp_fw = aux_data->cpucp_fw;
+ hdev->load_phy_fw = aux_data->load_phy_fw;
+ hdev->supports_coresight = aux_data->supports_coresight;
+ hdev->use_fw_serdes_info = aux_data->use_fw_serdes_info;
+ hdev->fw_ver = aux_data->fw_ver;
+ hdev->id = aux_data->id;
+ hdev->dram_size = aux_data->dram_size;
+ hdev->ports_mask = aux_data->ports_mask;
+ hdev->ext_ports_mask = aux_data->ext_ports_mask;
+ hdev->phys_auto_neg_mask = aux_data->auto_neg_mask;
+ hdev->cache_line_size = aux_data->cache_line_size;
+ hdev->kernel_asid = aux_data->kernel_asid;
+ hdev->qp_drain_time = qp_drain_time;
+ hdev->card_location = aux_data->card_location;
+ hdev->mmu_enable = aux_data->mmu_enable;
+ hdev->lanes_per_port = aux_data->lanes_per_port;
+ hdev->device_timeout = aux_data->device_timeout;
+ hdev->fw_major_version = aux_data->fw_major_version;
+ hdev->fw_minor_version = aux_data->fw_minor_version;
+ hdev->fw_app_cpu_boot_dev_sts0 = aux_data->fw_app_cpu_boot_dev_sts0;
+ hdev->fw_app_cpu_boot_dev_sts1 = aux_data->fw_app_cpu_boot_dev_sts1;
+ hdev->cpucp_checkers_shift = aux_data->cpucp_checkers_shift;
+ hdev->accumulate_fec_duration = ACCUMULATE_FEC_STATS_DURATION_MS;
+ hdev->poll_enable = poll_enable;
+
+ mutex_init(&hdev->hw_access_lock);
+
+ return 0;
+
+free_hdev:
+ kfree(hdev);
+ return rc;
+}
+
+static void hdev_fini(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+
+ mutex_destroy(&hdev->hw_access_lock);
+
+ kfree(hdev->cpucp_info);
+ kfree(hdev);
+ aux_dev->priv = NULL;
+}
+
+static const struct auxiliary_device_id hbl_cn_id_table[] = {
+ { .name = "habanalabs.cn", },
+ {},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, hbl_cn_id_table);
+
+static int hbl_cn_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_cn_aux_ops *aux_ops = aux_dev->aux_ops;
+ struct hbl_cn_device *hdev;
+ ktime_t timeout;
+ int rc;
+
+ rc = hdev_init(aux_dev);
+ if (rc) {
+ dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
+ return rc;
+ }
+
+ hdev = aux_dev->priv;
+
+ /* don't allow module unloading while it is attached */
+ if (!try_module_get(THIS_MODULE)) {
+ dev_err(hdev->dev, "Failed to increment %s module refcount\n",
+ module_name(THIS_MODULE));
+ rc = -EIO;
+ goto module_get_err;
+ }
+
+ timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
+ while (1) {
+ aux_ops->hw_access_lock(aux_dev);
+
+ /* if the device is operational, proceed to actual init while holding the lock in
+ * order to prevent concurrent hard reset
+ */
+ if (aux_ops->device_operational(aux_dev))
+ break;
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
+ rc = -EBUSY;
+ goto timeout_err;
+ }
+
+ dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing CN\n");
+
+ msleep_interruptible(MSEC_PER_SEC);
+ }
+
+ rc = hbl_cn_dev_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init CN device\n");
+ goto dev_init_err;
+ }
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ return 0;
+
+dev_init_err:
+ aux_ops->hw_access_unlock(aux_dev);
+timeout_err:
+ module_put(THIS_MODULE);
+module_get_err:
+ hdev_fini(aux_dev);
+
+ return rc;
+}
+
+/* This function can be called only from the compute driver when deleting the aux bus, because we
+ * incremented the module refcount on probing. Hence no need to protect here from hard reset.
+ */
+static void hbl_cn_remove(struct auxiliary_device *adev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_cn_device *hdev = aux_dev->priv;
+
+ if (!hdev)
+ return;
+
+ hbl_cn_dev_fini(hdev);
+
+ /* allow module unloading as now it is detached */
+ module_put(THIS_MODULE);
+
+ hdev_fini(aux_dev);
+}
+
+static struct auxiliary_driver hbl_cn_driver = {
+ .name = "cn",
+ .probe = hbl_cn_probe,
+ .remove = hbl_cn_remove,
+ .id_table = hbl_cn_id_table,
+};
+
+static int __init hbl_cn_init(void)
+{
+ pr_info("loading driver\n");
+
+ return auxiliary_driver_register(&hbl_cn_driver);
+}
+
+static void __exit hbl_cn_exit(void)
+{
+ auxiliary_driver_unregister(&hbl_cn_driver);
+
+ pr_info("driver removed\n");
+}
+
+module_init(hbl_cn_init);
+module_exit(hbl_cn_exit);
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
new file mode 100644
index 000000000000..93c97fad6a20
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_cn.h"
+
+int hbl_cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
+{
+ return 0;
+}
+
+int hbl_cn_mem_destroy(struct hbl_cn_device *hdev, u64 handle)
+{
+ return 0;
+}
+
+struct hbl_cn_mem_buf *hbl_cn_mem_buf_get(struct hbl_cn_device *hdev, u64 handle)
+{
+ return NULL;
+}
+
+int hbl_cn_mem_buf_put(struct hbl_cn_mem_buf *buf)
+{
+ return 0;
+}
+
+int hbl_cn_mem_buf_put_handle(struct hbl_cn_device *hdev, u64 handle)
+{
+ return 0;
+}
+
+void hbl_cn_mem_init(struct hbl_cn_device *hdev)
+{
+}
+
+void hbl_cn_mem_fini(struct hbl_cn_device *hdev)
+{
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
new file mode 100644
index 000000000000..0d07cd78221d
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_cn.h"
+
+void hbl_cn_phy_set_port_status(struct hbl_cn_port *cn_port, bool up)
+{
+}
+
+int hbl_cn_phy_init(struct hbl_cn_port *cn_port)
+{
+ return 0;
+}
+
+void hbl_cn_phy_fini(struct hbl_cn_port *cn_port)
+{
+}
+
+void hbl_cn_phy_port_reconfig(struct hbl_cn_port *cn_port)
+{
+}
+
+int hbl_cn_phy_has_binary_fw(struct hbl_cn_device *hdev)
+{
+ return 0;
+}
+
+void hbl_cn_phy_set_fw_polarity(struct hbl_cn_device *hdev)
+{
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
new file mode 100644
index 000000000000..9ddc23bf8194
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_cn.h"
+
+int hbl_cn_qp_modify(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ enum hbl_cn_qp_state new_state, void *params)
+{
+ return 0;
+}
diff --git a/include/linux/habanalabs/cpucp_if.h b/include/linux/habanalabs/cpucp_if.h
index f316c8d0f3fc..7fe3f3e68b04 100644
--- a/include/linux/habanalabs/cpucp_if.h
+++ b/include/linux/habanalabs/cpucp_if.h
@@ -366,6 +366,27 @@ struct hl_eq_addr_dec_intr_data {
__u8 pad[7];
};
+#define MAX_PORTS_PER_NIC 4
+
+/* NIC interrupt type */
+enum hl_nic_interrupt_type {
+ NIC_INTR_NONE = 0,
+ NIC_INTR_TMR = 1,
+ NIC_INTR_RXB_CORE_SPI,
+ NIC_INTR_RXB_CORE_SEI,
+ NIC_INTR_QPC_RESP_ERR,
+ NIC_INTR_RXE_SPI,
+ NIC_INTR_RXE_SEI,
+ NIC_INTR_TXS,
+ NIC_INTR_TXE,
+};
+
+struct hl_eq_nic_intr_cause {
+ __le32 intr_type; /* enum hl_nic_interrupt_type */
+ __le32 pad;
+ struct hl_eq_intr_cause intr_cause[MAX_PORTS_PER_NIC];
+};
+
struct hl_eq_entry {
struct hl_eq_header hdr;
union {
@@ -382,6 +403,7 @@ struct hl_eq_entry {
struct hl_eq_hbm_sei_data sei_data; /* Gaudi2 HBM */
struct hl_eq_engine_arc_intr_data arc_data;
struct hl_eq_addr_dec_intr_data addr_dec;
+ struct hl_eq_nic_intr_cause nic_intr_cause;
__le64 data[7];
};
};
@@ -665,6 +687,9 @@ enum pq_init_status {
* by the host to prevent replay attacks. public key and certificate also
* provided as part of the FW response.
*
+ * CPUCP_PACKET_NIC_SET_CHECKERS -
+ * Packet to set a specific NIC checker bit.
+ *
* CPUCP_PACKET_MONITOR_DUMP_GET -
* Get monitors registers dump from the CpuCP kernel.
* The CPU will put the registers dump in the a buffer allocated by the driver
@@ -673,6 +698,14 @@ enum pq_init_status {
* data corruption in case of mismatched driver/FW versions.
* Obsolete.
*
+ * CPUCP_PACKET_NIC_WQE_ASID_SET -
+ * Packet to set the NIC WQE ASID, as the required registers are privileged and must be
+ * configured by FW.
+ *
+ * CPUCP_PACKET_NIC_ECC_INTRS_UNMASK -
+ * Packet to unmask NIC memory registers which are masked at the preboot stage. Per the
+ * architecture team's recommendation, NIC memory ECC errors should be unmasked only after
+ * the NIC driver is up and running.
+ *
* CPUCP_PACKET_GENERIC_PASSTHROUGH -
* Generic opcode for all firmware info that is only passed to host
* through the LKD, without getting parsed there.
@@ -681,12 +714,28 @@ enum pq_init_status {
* LKD sends FW indication whether device is free or in use, this indication is reported
* also to the BMC.
*
+ * CPUCP_PACKET_NIC_MAC_TX_RESET -
+ * Packet to reset the NIC MAC Tx.
+ *
+ * CPUCP_PACKET_NIC_WQE_ASID_UNSET -
+ * Packet to unset the NIC WQE ASID, as the required registers are privileged and must be
+ * configured by FW.
+ *
* CPUCP_PACKET_SOFT_RESET -
* Packet to perform soft-reset.
*
* CPUCP_PACKET_INTS_REGISTER -
* Packet to inform FW that queues have been established and LKD is ready to receive
* EQ events.
+ *
+ * CPUCP_PACKET_NIC_INIT_TXS_MEM -
+ * Init TXS related memory in HBM.
+ *
+ * CPUCP_PACKET_NIC_INIT_TMR_MEM -
+ * Init HW timer related memory in HBM.
+ *
+ * CPUCP_PACKET_NIC_CLR_MEM -
+ * Clear NIC related memory in HBM.
*/
enum cpucp_packet_id {
@@ -740,21 +789,24 @@ enum cpucp_packet_id {
CPUCP_PACKET_RESERVED2, /* not used */
CPUCP_PACKET_SEC_ATTEST_GET, /* internal */
CPUCP_PACKET_INFO_SIGNED_GET, /* internal */
- CPUCP_PACKET_RESERVED4, /* not used */
+ CPUCP_PACKET_NIC_SET_CHECKERS, /* internal */
CPUCP_PACKET_MONITOR_DUMP_GET, /* debugfs */
- CPUCP_PACKET_RESERVED5, /* not used */
- CPUCP_PACKET_RESERVED6, /* not used */
- CPUCP_PACKET_RESERVED7, /* not used */
+ CPUCP_PACKET_RESERVED3, /* not used */
+ CPUCP_PACKET_NIC_WQE_ASID_SET, /* internal */
+ CPUCP_PACKET_NIC_ECC_INTRS_UNMASK, /* internal */
CPUCP_PACKET_GENERIC_PASSTHROUGH, /* IOCTL */
- CPUCP_PACKET_RESERVED8, /* not used */
+ CPUCP_PACKET_RESERVED4, /* not used */
CPUCP_PACKET_ACTIVE_STATUS_SET, /* internal */
- CPUCP_PACKET_RESERVED9, /* not used */
- CPUCP_PACKET_RESERVED10, /* not used */
- CPUCP_PACKET_RESERVED11, /* not used */
- CPUCP_PACKET_RESERVED12, /* internal */
- CPUCP_PACKET_RESERVED13, /* internal */
+ CPUCP_PACKET_NIC_MAC_TX_RESET, /* internal */
+ CPUCP_PACKET_RESERVED5, /* not used */
+ CPUCP_PACKET_NIC_WQE_ASID_UNSET, /* internal */
+ CPUCP_PACKET_RESERVED6, /* internal */
+ CPUCP_PACKET_RESERVED7, /* internal */
CPUCP_PACKET_SOFT_RESET, /* internal */
CPUCP_PACKET_INTS_REGISTER, /* internal */
+ CPUCP_PACKET_NIC_INIT_TXS_MEM, /* internal */
+ CPUCP_PACKET_NIC_INIT_TMR_MEM, /* internal */
+ CPUCP_PACKET_NIC_CLR_MEM, /* internal */
CPUCP_PACKET_ID_MAX /* must be last */
};
@@ -862,6 +914,9 @@ struct cpucp_packet {
/* For NIC requests */
__le32 port_index;
+ /* For NIC requests */
+ __le32 macro_index;
+
/* For Generic packet sub index */
__le32 pkt_subidx;
};
@@ -1058,6 +1113,21 @@ enum pvt_index {
PVT_NE
};
+#define NIC_CHECKERS_TYPE_SHIFT 0
+#define NIC_CHECKERS_TYPE_MASK 0xFFFF
+#define NIC_CHECKERS_CHECK_SHIFT 16
+#define NIC_CHECKERS_CHECK_MASK 0x1
+#define NIC_CHECKERS_DROP_SHIFT 17
+#define NIC_CHECKERS_DROP_MASK 0x1
+
+enum nic_checkers_types {
+ RX_PKT_BAD_FORMAT = 0,
+ RX_INV_OPCODE,
+ RX_INV_SYNDROME,
+ RX_WQE_IDX_MISMATCH,
+ TX_WQE_IDX_MISMATCH = 0x80
+};
+
/* Event Queue Packets */
struct eq_generic_event {
@@ -1273,6 +1343,7 @@ struct ser_val {
* @post_fec_ser: post FEC SER value.
* @throughput: measured throughput.
* @latency: measured latency.
+ * @port_toggle_cnt: counts how many times the link toggled since last port PHY init.
*/
struct cpucp_nic_status {
__le32 port;
@@ -1292,6 +1363,8 @@ struct cpucp_nic_status {
struct ser_val post_fec_ser;
struct frac_val bandwidth;
struct frac_val lat;
+ __le32 port_toggle_cnt;
+ __u8 reserved[4];
};
enum cpucp_hbm_row_replace_cause {
@@ -1420,4 +1493,36 @@ enum hl_passthrough_type {
HL_PASSTHROUGH_VERSIONS,
};
+/* structure cpucp_cn_init_hw_mem_packet - used for initializing the associated CN (Core Network)
+ * hw(TIMER, TX-SCHEDQ) memory in HBM using the provided parameters.
+ * @cpucp_pkt: basic cpucp packet, the rest of the parameters extend the packet.
+ * @mem_base_addr: base address of the associated memory
+ * @num_entries: number of entries.
+ * @entry_size: size of entry.
+ * @granularity: base value for first element.
+ * @pad: padding
+ */
+struct cpucp_cn_init_hw_mem_packet {
+ struct cpucp_packet cpucp_pkt;
+ __le64 mem_base_addr;
+ __le16 num_entries;
+ __le16 entry_size;
+ __le16 granularity;
+ __u8 pad[2];
+};
+
+/* structure cpucp_cn_clear_mem_packet - used for clearing the associated CN (Core Network)
+ * memory in HBM using the provided parameters.
+ * @cpucp_pkt: basic cpucp packet, the rest of the parameters extend the packet.
+ * @mem_base_addr: base address of the associated memory
+ * @size: size in bytes of the associated memory.
+ * @pad: padding
+ */
+struct cpucp_cn_clear_mem_packet {
+ struct cpucp_packet cpucp_pkt;
+ __le64 mem_base_addr;
+ __le32 size;
+ __u8 pad[4];
+};
+
#endif /* CPUCP_IF_H */
diff --git a/include/linux/habanalabs/hl_boot_if.h b/include/linux/habanalabs/hl_boot_if.h
index 93366d5621fd..a6d9e510a974 100644
--- a/include/linux/habanalabs/hl_boot_if.h
+++ b/include/linux/habanalabs/hl_boot_if.h
@@ -194,6 +194,7 @@ enum cpu_boot_dev_sts {
CPU_BOOT_DEV_STS_FW_NIC_STAT_EXT_EN = 24,
CPU_BOOT_DEV_STS_IS_IDLE_CHECK_EN = 25,
CPU_BOOT_DEV_STS_MAP_HWMON_EN = 26,
+ CPU_BOOT_DEV_STS_NIC_MEM_CLEAR_EN = 27,
CPU_BOOT_DEV_STS_ENABLED = 31,
CPU_BOOT_DEV_STS_SCND_EN = 63,
CPU_BOOT_DEV_STS_LAST = 64 /* we have 2 registers of 32 bits */
@@ -331,6 +332,11 @@ enum cpu_boot_dev_sts {
* HWMON enum mapping to cpucp enums.
* Initialized in: linux
*
+ * CPU_BOOT_DEV_STS0_NIC_MEM_CLEAR_EN
+ * If set, the FW supports NIC HBM memory clear and
+ * TMR/TXS HBM memory init.
+ * Initialized in: zephyr-mgmt
+ *
* CPU_BOOT_DEV_STS0_ENABLED Device status register enabled.
* This is a main indication that the
* running FW populates the device status
@@ -367,6 +373,7 @@ enum cpu_boot_dev_sts {
#define CPU_BOOT_DEV_STS0_FW_NIC_STAT_EXT_EN (1 << CPU_BOOT_DEV_STS_FW_NIC_STAT_EXT_EN)
#define CPU_BOOT_DEV_STS0_IS_IDLE_CHECK_EN (1 << CPU_BOOT_DEV_STS_IS_IDLE_CHECK_EN)
#define CPU_BOOT_DEV_STS0_MAP_HWMON_EN (1 << CPU_BOOT_DEV_STS_MAP_HWMON_EN)
+#define CPU_BOOT_DEV_STS0_NIC_MEM_CLEAR_EN (1 << CPU_BOOT_DEV_STS_NIC_MEM_CLEAR_EN)
#define CPU_BOOT_DEV_STS0_ENABLED (1 << CPU_BOOT_DEV_STS_ENABLED)
#define CPU_BOOT_DEV_STS1_ENABLED (1 << CPU_BOOT_DEV_STS_ENABLED)
@@ -403,7 +410,7 @@ enum kmd_msg {
KMD_MSG_GOTO_WFE,
KMD_MSG_FIT_RDY,
KMD_MSG_SKIP_BMC,
- RESERVED,
+ KMD_MSG_RESERVED,
KMD_MSG_RST_DEV,
KMD_MSG_LAST
};
diff --git a/include/linux/net/intel/cn.h b/include/linux/net/intel/cn.h
new file mode 100644
index 000000000000..d040b0bbd94c
--- /dev/null
+++ b/include/linux/net/intel/cn.h
@@ -0,0 +1,474 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_CN_H_
+#define HBL_CN_H_
+
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/net/intel/cn_aux.h>
+
+#define HBL_EN_PFC_PRIO_NUM 4
+#define CQ_ARM_TIMEOUT_USEC 10
+
+struct qpc_mask;
+
+/**
+ * enum hbl_cn_pflags - mutable capabilities of the port.
+ * @PFLAGS_PCS_LINK_CHECK: check for PCS link periodically.
+ * @PFLAGS_PHY_AUTO_NEG_LPBK: allow autonegotiation in loopback.
+ */
+enum hbl_cn_pflags {
+ PFLAGS_PCS_LINK_CHECK = BIT(0),
+ PFLAGS_PHY_AUTO_NEG_LPBK = BIT(1),
+};
+
+enum hbl_ts_type {
+ TS_RC = 0,
+ TS_RAW = 1
+};
+
+enum hbl_trust_level {
+ UNSECURED = 0,
+ SECURED = 1,
+ PRIVILEGE = 2
+};
+
+/**
+ * enum qpc_req_wq_type - QP REQ WQ type.
+ * @QPC_REQ_WQ_TYPE_WRITE: WRITE, "native" SEND, RECV-RDV or READ-RDV operations are allowed.
+ * @QPC_REQ_WQ_TYPE_RDV_READ: No operation is allowed on this endpoint QP.
+ * @QPC_REQ_WQ_TYPE_RDV_WRITE: SEND-RDV operation is allowed on this QP.
+ */
+enum qpc_req_wq_type {
+ QPC_REQ_WQ_TYPE_WRITE = 1,
+ QPC_REQ_WQ_TYPE_RDV_READ = 2,
+ QPC_REQ_WQ_TYPE_RDV_WRITE = 3
+};
+
+/**
+ * enum hbl_ib_mem_type - Memory allocation types.
+ * @HBL_IB_MEM_INVALID: N/A option.
+ * @HBL_IB_MEM_HOST_DMA_COHERENT: Host DMA coherent memory.
+ * @HBL_IB_MEM_HOST_VIRTUAL: Host virtual memory.
+ * @HBL_IB_MEM_DEVICE: Device HBM memory.
+ * @HBL_IB_MEM_HOST_MAP_ONLY: Host mapping only.
+ * @HBL_IB_MEM_HW_BLOCK: HW registers.
+ */
+enum hbl_ib_mem_type {
+ HBL_IB_MEM_INVALID,
+ HBL_IB_MEM_HOST_DMA_COHERENT,
+ HBL_IB_MEM_HOST_VIRTUAL,
+ HBL_IB_MEM_DEVICE,
+ HBL_IB_MEM_HOST_MAP_ONLY,
+ HBL_IB_MEM_HW_BLOCK,
+};
+
+/**
+ * struct hbl_cn_eqe - describes an event-queue entry
+ * @data: the data each event-queue entry contains
+ */
+struct hbl_cn_eqe {
+ u32 data[4];
+};
+
+/**
+ * struct hbl_cn_mem_resource - memory resource used by a memory ring.
+ * @addr: virtual address of the memory.
+ * @dma_addr: physical address of the memory.
+ * @size: memory size.
+ */
+struct hbl_cn_mem_resource {
+ void *addr;
+ dma_addr_t dma_addr;
+ u32 size;
+};
+
+/**
+ * struct hbl_cn_ring - represents a memory ring.
+ * @buf: the ring buffer memory resource.
+ * @pi: the memory-resident producer index of the ring, updated by HW
+ * @pi_shadow: producer shadow index - used by SW
+ * @ci_shadow: consumer shadow index - used by SW
+ * @rep_idx: used to count up to a threshold value, e.g. for a HW update.
+ * @asid: the asid of the ring
+ * @count: the number of elements the ring can hold
+ * @elem_size: the ring's element size.
+ */
+struct hbl_cn_ring {
+ struct hbl_cn_mem_resource buf;
+ struct hbl_cn_mem_resource pi;
+ u32 pi_shadow;
+ u32 ci_shadow;
+ u32 rep_idx;
+ u32 asid;
+ u32 count;
+ u32 elem_size;
+};
+
+/* ring support */
+#define RING_BUF_DMA_ADDRESS(ring) ((ring)->buf.dma_addr)
+#define RING_BUF_ADDRESS(ring) ((ring)->buf.addr)
+#define RING_BUF_SIZE(ring) ((ring)->buf.size)
+#define RING_PI_DMA_ADDRESS(ring) ((ring)->pi.dma_addr)
+#define RING_PI_ADDRESS(ring) ((ring)->pi.addr)
+#define RING_PI_SIZE(ring) ((ring)->pi.size)
+#define RING_CI_ADDRESS(ring) RING_BUF_ADDRESS(ring)
+
+/* Ethernet */
+
+/**
+ * struct hbl_en_aux_data - habanalabs data for the Ethernet driver.
+ * @pdev: pointer to PCI device.
+ * @dev: related kernel basic device structure.
+ * @asic_specific: ASIC specific data.
+ * @fw_ver: FW version.
+ * @qsfp_eeprom: QSFP EEPROM info.
+ * @mac_addr: array of all MAC addresses.
+ * @asic_type: ASIC specific type.
+ * @ports_mask: mask of available ports.
+ * @auto_neg_mask: mask of ports with autonegotiation enabled.
+ * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
+ * @max_frm_len: maximum allowed frame length.
+ * @raw_elem_size: size of element in raw buffers.
+ * @max_raw_mtu: maximum MTU size for raw packets.
+ * @min_raw_mtu: minimum MTU size for raw packets.
+ * @id: device ID.
+ * @max_num_of_ports: max number of available ports.
+ * @has_eq: true if event queue is supported.
+ */
+struct hbl_en_aux_data {
+ struct pci_dev *pdev;
+ struct device *dev;
+ void *asic_specific;
+ char *fw_ver;
+ char *qsfp_eeprom;
+ char **mac_addr;
+ enum hbl_cn_asic_type asic_type;
+ u64 ports_mask;
+ u64 auto_neg_mask;
+ u32 pending_reset_long_timeout;
+ u32 max_frm_len;
+ u32 raw_elem_size;
+ u16 max_raw_mtu;
+ u16 min_raw_mtu;
+ u16 id;
+ u8 max_num_of_ports;
+ u8 has_eq;
+};
+
+/**
+ * struct hbl_en_aux_ops - pointer functions for cn <-> en drivers communication.
+ * @device_operational: is device operational.
+ * @hw_access_lock: prevent HW access.
+ * @hw_access_unlock: allow HW access.
+ * @is_eth_lpbk: is Ethernet loopback enabled.
+ * @port_hw_init: port HW init.
+ * @port_hw_fini: port HW cleanup.
+ * @phy_init: port PHY init.
+ * @phy_fini: port PHY cleanup.
+ * @set_pfc: enable/disable PFC.
+ * @get_cnts_num: get the number of available counters.
+ * @get_cnts_names: get the names of the available counters.
+ * @get_cnts_values: get the values of the available counters.
+ * @eq_dispatcher_register_qp: register QP to its event dispatch queue.
+ * @eq_dispatcher_unregister_qp: un-register QP from its event dispatch queue.
+ * @get_speed: get the port speed in Mb/s.
+ * @track_ext_port_reset: track the reset of the given port according to the given syndrome.
+ * @port_toggle_count: count port toggles upon actions that teardown or create a port.
+ * @ports_reopen: reopen the ports after hard reset.
+ * @ports_stop_prepare: prepare the ports for a stop.
+ * @ports_stop: stop traffic.
+ * @set_port_status: set the link port status.
+ * @get_mac_lpbk: get MAC loopback status.
+ * @set_mac_lpbk: set MAC loopback status.
+ * @update_mtu: update all QPs to use the new MTU value.
+ * @qpc_write: write a QP context to the HW.
+ * @ctrl_lock: control mutex lock.
+ * @ctrl_unlock: control mutex unlock.
+ * @is_port_open: is port open.
+ * @get_src_ip: get the source IP of the given port.
+ * @reset_stats: reset port statistics (called from debugfs only).
+ * @get_mtu: get the port MTU value.
+ * @get_pflags: get the port private flags.
+ * @set_dev_lpbk: set loopback status on the net-device.
+ * @handle_eqe: handle event queue entry from H/W.
+ * @asic_ops: pointer for ASIC specific ops struct.
+ */
+struct hbl_en_aux_ops {
+ /* en2cn */
+ bool (*device_operational)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_lock)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_unlock)(struct hbl_aux_dev *aux_dev);
+ bool (*is_eth_lpbk)(struct hbl_aux_dev *aux_dev);
+ int (*port_hw_init)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*port_hw_fini)(struct hbl_aux_dev *aux_dev, u32 port);
+ int (*phy_init)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*phy_fini)(struct hbl_aux_dev *aux_dev, u32 port);
+ int (*set_pfc)(struct hbl_aux_dev *aux_dev, u32 port, bool enable);
+ int (*get_cnts_num)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*get_cnts_names)(struct hbl_aux_dev *aux_dev, u32 port, u8 *data);
+ void (*get_cnts_values)(struct hbl_aux_dev *aux_dev, u32 port, u64 *data);
+ bool (*get_mac_lpbk)(struct hbl_aux_dev *aux_dev, u32 port);
+ int (*set_mac_lpbk)(struct hbl_aux_dev *aux_dev, u32 port, bool enable);
+ int (*update_mtu)(struct hbl_aux_dev *aux_dev, u32 port, u32 mtu);
+ int (*qpc_write)(struct hbl_aux_dev *aux_dev, u32 port, void *qpc,
+ struct qpc_mask *qpc_mask, u32 qpn, bool is_req);
+ void (*ctrl_lock)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*ctrl_unlock)(struct hbl_aux_dev *aux_dev, u32 port);
+ int (*eq_dispatcher_register_qp)(struct hbl_aux_dev *aux_dev, u32 port, u32 asid,
+ u32 qp_id);
+ int (*eq_dispatcher_unregister_qp)(struct hbl_aux_dev *aux_dev, u32 port, u32 qp_id);
+ u32 (*get_speed)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*track_ext_port_reset)(struct hbl_aux_dev *aux_dev, u32 port, u32 syndrome);
+ void (*port_toggle_count)(struct hbl_aux_dev *aux_dev, u32 port);
+
+ /* cn2en */
+ int (*ports_reopen)(struct hbl_aux_dev *aux_dev);
+ void (*ports_stop_prepare)(struct hbl_aux_dev *aux_dev);
+ void (*ports_stop)(struct hbl_aux_dev *aux_dev);
+ void (*set_port_status)(struct hbl_aux_dev *aux_dev, u32 port_idx, bool up);
+ bool (*is_port_open)(struct hbl_aux_dev *aux_dev, u32 port_idx);
+ int (*get_src_ip)(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip);
+ void (*reset_stats)(struct hbl_aux_dev *aux_dev, u32 port_idx);
+ u32 (*get_mtu)(struct hbl_aux_dev *aux_dev, u32 port_idx);
+ u32 (*get_pflags)(struct hbl_aux_dev *aux_dev, u32 port_idx);
+ void (*set_dev_lpbk)(struct hbl_aux_dev *aux_dev, u32 port_idx, bool enable);
+ void (*handle_eqe)(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_eqe *eqe);
+ void *asic_ops;
+};
+
+/* InfiniBand */
+
+#define HBL_IB_CNT_NAME_LEN (ETH_GSTRING_LEN * 2)
+
+/**
+ * struct hbl_ib_device_attr - IB device attributes.
+ * @fw_ver: firmware version.
+ * @max_mr_size: max size of a memory region.
+ * @page_size_cap: largest page size in MMU.
+ * @vendor_id: device vendor ID.
+ * @vendor_part_id: device vendor part ID.
+ * @hw_ver: device chip version.
+ * @cqe_size: Size of Completion Queue Entry.
+ * @min_cq_entries: Minimum completion queue entries needed.
+ * @max_qp: max QPs supported.
+ * @max_qp_wr: max work requests per QP supported.
+ * @max_cqe: max completion-queue entries supported.
+ */
+struct hbl_ib_device_attr {
+ u64 fw_ver;
+ u64 max_mr_size;
+ u64 page_size_cap;
+ u32 vendor_id;
+ u32 vendor_part_id;
+ u32 hw_ver;
+ u32 cqe_size;
+ u32 min_cq_entries;
+ s32 max_qp;
+ s32 max_qp_wr;
+ s32 max_cqe;
+};
+
+/**
+ * struct hbl_ib_port_attr - IB port attributes.
+ * @speed: speed in Mb/s.
+ * @max_msg_sz: max message size.
+ * @max_mtu: max MTU size.
+ * @open: is open and fully initialized.
+ * @link_up: has PCS link.
+ * @num_lanes: number of lanes per port.
+ */
+struct hbl_ib_port_attr {
+ u32 speed;
+ u32 max_msg_sz;
+ u32 max_mtu;
+ u8 open;
+ u8 link_up;
+ u8 num_lanes;
+};
+
+/**
+ * struct hbl_ib_port_cnts_data - IB port counters data.
+ * @names: Names of the counters.
+ * @num: Number of counters.
+ */
+struct hbl_ib_port_cnts_data {
+ u8 *names;
+ u32 num;
+};
+
+/**
+ * struct hbl_ib_dump_qp_attr - IB QP dump attributes.
+ * @port: Port ID the QP belongs to.
+ * @qpn: QP number.
+ * @req: Requester QP, otherwise responder.
+ * @full: Include full QP information.
+ * @force: Force reading a QP in invalid/error state.
+ */
+struct hbl_ib_dump_qp_attr {
+ u32 port;
+ u32 qpn;
+ u8 req;
+ u8 full;
+ u8 force;
+};
+
+/**
+ * struct hbl_ib_mem_info - Information for a memory region pertaining to a memory handle.
+ * @cpu_addr: The kernel virtual address.
+ * @bus_addr: The bus address.
+ * @mtype: The memory type.
+ * @mem_handle: The memory handle.
+ * @size: The size of the memory region.
+ * @vmalloc: The memory is virtually contiguous only.
+ */
+struct hbl_ib_mem_info {
+ void *cpu_addr;
+ dma_addr_t bus_addr;
+ enum hbl_ib_mem_type mtype;
+ u64 mem_handle;
+ u64 size;
+ u8 vmalloc;
+};
+
+/**
+ * struct hbl_ib_aux_data - habanalabs data for the IB driver.
+ * @pdev: pointer to PCI device.
+ * @dev: related kernel basic device structure.
+ * @cnts_data: Ports counters data.
+ * @ports_mask: mask of available ports.
+ * @ext_ports_mask: mask of external ports (subset of ports_mask).
+ * @dram_size: available DRAM size.
+ * @max_num_of_wqes: maximum number of WQ entries.
+ * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
+ * @id: device ID.
+ * @max_num_of_ports: maximum number of ports supported by ASIC.
+ * @mixed_qp_wq_types: Using mixed QP WQ types is supported.
+ * @umr_support: device supports UMR.
+ * @cc_support: device supports congestion control.
+ */
+struct hbl_ib_aux_data {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct hbl_ib_port_cnts_data *cnts_data;
+ u64 ports_mask;
+ u64 ext_ports_mask;
+ u64 dram_size;
+ u32 max_num_of_wqes;
+ u32 pending_reset_long_timeout;
+ u16 id;
+ u8 max_num_of_ports;
+ u8 mixed_qp_wq_types;
+ u8 umr_support;
+ u8 cc_support;
+};
+
+/**
+ * struct hbl_ib_aux_ops - pointer functions for cn <-> ib drivers communication.
+ * @device_operational: is device operational.
+ * @hw_access_lock: prevent HW access.
+ * @hw_access_unlock: allow HW access.
+ * @alloc_ucontext: allocate user context.
+ * @dealloc_ucontext: deallocate user context.
+ * @query_port: get port attributes.
+ * @cmd_ctrl: operate the device with proprietary opcodes.
+ * @query_device: get device attributes.
+ * @set_ip_addr_encap: setup IP address encapsulation.
+ * @qp_syndrome_to_str: translates a QP syndrome to a string.
+ * @verify_qp_id: verify that the specified QP ID is valid.
+ * @get_cnts_values: get the values of the available counters.
+ * @dump_qp: dump QP context to the given buffer.
+ * @query_mem_handle: query information for a memory handle.
+ * @eqe_work_schd: schedule a user EQ poll work on the hbl side.
+ * @dispatch_fatal_event: raise a fatal event to user space.
+ */
+struct hbl_ib_aux_ops {
+ /* ib2cn */
+ bool (*device_operational)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_lock)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_unlock)(struct hbl_aux_dev *aux_dev);
+ int (*alloc_ucontext)(struct hbl_aux_dev *aux_dev, int user_fd, void **cn_ib_ctx);
+ void (*dealloc_ucontext)(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx);
+ void (*query_port)(struct hbl_aux_dev *aux_dev, u32 port,
+ struct hbl_ib_port_attr *port_attr);
+ int (*cmd_ctrl)(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
+ void *output);
+ void (*query_device)(struct hbl_aux_dev *aux_dev, struct hbl_ib_device_attr *device_attr);
+ void (*set_ip_addr_encap)(struct hbl_aux_dev *aux_dev, u32 ip_addr, u32 port);
+ char *(*qp_syndrome_to_str)(struct hbl_aux_dev *aux_dev, u32 syndrome);
+ int (*verify_qp_id)(struct hbl_aux_dev *aux_dev, u32 qp_id, u32 port);
+ void (*get_cnts_values)(struct hbl_aux_dev *aux_dev, u32 port, u64 *data);
+ int (*dump_qp)(struct hbl_aux_dev *aux_dev, struct hbl_ib_dump_qp_attr *attr, char *buf,
+ size_t size);
+ int (*query_mem_handle)(struct hbl_aux_dev *aux_dev, u64 mem_handle,
+ struct hbl_ib_mem_info *info);
+
+ /* cn2ib */
+ void (*eqe_work_schd)(struct hbl_aux_dev *aux_dev, u32 port);
+ void (*dispatch_fatal_event)(struct hbl_aux_dev *aux_dev, u32 asid);
+};
+
+/* CN */
+
+/* interrupt type */
+enum hbl_cn_cpucp_interrupt_type {
+ HBL_CN_CPUCP_INTR_NONE = 0,
+ HBL_CN_CPUCP_INTR_TMR = 1,
+ HBL_CN_CPUCP_INTR_RXB_CORE_SPI,
+ HBL_CN_CPUCP_INTR_RXB_CORE_SEI,
+ HBL_CN_CPUCP_INTR_QPC_RESP_ERR,
+ HBL_CN_CPUCP_INTR_RXE_SPI,
+ HBL_CN_CPUCP_INTR_RXE_SEI,
+ HBL_CN_CPUCP_INTR_TXS,
+ HBL_CN_CPUCP_INTR_TXE,
+};
+
+/*
+ * struct hbl_cn_eq_port_intr_cause - port interrupt cause data.
+ * @intr_cause_data: interrupt cause data.
+ */
+struct hbl_cn_eq_port_intr_cause {
+ u64 intr_cause_data;
+};
+
+/*
+ * struct hbl_cn_eq_intr_cause - interrupt cause data.
+ * @intr_type: interrupt type.
+ * @intr_cause: array of ports interrupt cause data.
+ */
+struct hbl_cn_eq_intr_cause {
+ u32 intr_type; /* enum hbl_cn_cpucp_interrupt_type */
+ u32 pad;
+ struct hbl_cn_eq_port_intr_cause intr_cause[MAX_PORTS_PER_NIC];
+};
+
+/*
+ * struct hbl_cn_cpucp_frac_val - fractional value represented by "integer.frac".
+ * @integer: the integer part of the fractional value.
+ * @frac: the fractional part of the value.
+ */
+struct hbl_cn_cpucp_frac_val {
+ union {
+ struct {
+ u16 integer;
+ u16 frac;
+ };
+ u32 val;
+ };
+};
+
+/*
+ * struct hbl_cn_cpucp_ser_val - Symbol Error Rate value represented by "integer * 10 ^ -exp".
+ * @integer: the integer part of the SER value.
+ * @exp: the exponent part of the SER value.
+ */
+struct hbl_cn_cpucp_ser_val {
+ u16 integer;
+ u16 exp;
+};
+
+#endif /* HBL_CN_H_ */
diff --git a/include/linux/net/intel/cn_aux.h b/include/linux/net/intel/cn_aux.h
new file mode 100644
index 000000000000..8a7f54e78afb
--- /dev/null
+++ b/include/linux/net/intel/cn_aux.h
@@ -0,0 +1,298 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_CN_AUX_H_
+#define HBL_CN_AUX_H_
+
+#include <linux/irqreturn.h>
+#include <linux/habanalabs/cpucp_if.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/if_vlan.h>
+#include <uapi/linux/ethtool.h>
+
+#define HBL_EN_MAX_HEADERS_SZ (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+
+/* driver specific values; each should always be >= the corresponding ASIC-specific HW resource */
+#define NIC_DRV_MAX_CQS_NUM 32
+#define NIC_DRV_MAX_CCQS_NUM 4
+#define NIC_DRV_NUM_DB_FIFOS 32
+
+/**
+ * enum hbl_cn_asic_type - supported ASIC types.
+ * @ASIC_GAUDI2: Gaudi2 device.
+ */
+enum hbl_cn_asic_type {
+ HBL_ASIC_GAUDI2,
+};
+
+/**
+ * enum hbl_cn_status_cmd - status cmd type.
+ * @HBL_CN_STATUS_ONE_SHOT: one shot command.
+ * @HBL_CN_STATUS_PERIODIC_START: start periodic status update.
+ * @HBL_CN_STATUS_PERIODIC_STOP: stop periodic status update.
+ */
+enum hbl_cn_status_cmd {
+ HBL_CN_STATUS_ONE_SHOT,
+ HBL_CN_STATUS_PERIODIC_START,
+ HBL_CN_STATUS_PERIODIC_STOP,
+};
+
+/**
+ * enum hbl_aux_dev_type - auxiliary device type.
+ * @HBL_AUX_DEV_CN: Core Network.
+ * @HBL_AUX_DEV_ETH: Ethernet.
+ * @HBL_AUX_DEV_IB: InfiniBand.
+ */
+enum hbl_aux_dev_type {
+ HBL_AUX_DEV_CN,
+ HBL_AUX_DEV_ETH,
+ HBL_AUX_DEV_IB,
+};
+
+/**
+ * struct hbl_aux_dev - habanalabs auxiliary device structure.
+ * @adev: auxiliary device.
+ * @aux_ops: function pointers for inter-driver communication.
+ * @aux_data: essential data for operating the auxiliary device.
+ * @priv: auxiliary device private data.
+ * @type: type of the auxiliary device.
+ */
+struct hbl_aux_dev {
+ struct auxiliary_device adev;
+ void *aux_ops;
+ void *aux_data;
+ void *priv;
+ enum hbl_aux_dev_type type;
+};
+
+/**
+ * struct hbl_cn_stat - Holds ASIC specific statistics string and default register offset.
+ * @str: String name of ethtool stat.
+ * @lo_offset: Register offset of the stat.
+ * @hi_offset: High register offset. May be unused for some stats.
+ */
+struct hbl_cn_stat {
+ char str[ETH_GSTRING_LEN];
+ int lo_offset;
+ int hi_offset;
+};
+
+/*
+ * struct hbl_cn_cpucp_mac_addr - port MAC address received from FW.
+ * @mac_addr: port MAC address.
+ */
+struct hbl_cn_cpucp_mac_addr {
+ u8 mac_addr[ETH_ALEN];
+};
+
+/*
+ * struct hbl_cn_cpucp_info - info received from FW.
+ * @mac_addrs: array of MAC address for all physical ports.
+ * @link_mask: mask of available ports.
+ * @pol_tx_mask: array of Tx polarity value for all ports.
+ * @pol_rx_mask: array of Rx polarity value for all ports.
+ * @link_ext_mask: mask of external ports.
+ * @qsfp_eeprom: QSFP EEPROM info.
+ * @auto_neg_mask: mask of ports which support autonegotiation.
+ * @serdes_type: type of serdes.
+ * @tx_swap_map: lane swapping map.
+ */
+struct hbl_cn_cpucp_info {
+ struct hbl_cn_cpucp_mac_addr mac_addrs[CPUCP_MAX_NICS];
+ u64 link_mask[CPUCP_NIC_MASK_ARR_LEN];
+ u64 pol_tx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+ u64 pol_rx_mask[CPUCP_NIC_POLARITY_ARR_LEN];
+ u64 link_ext_mask[CPUCP_NIC_MASK_ARR_LEN];
+ u8 qsfp_eeprom[CPUCP_NIC_QSFP_EEPROM_MAX_LEN];
+ u64 auto_neg_mask[CPUCP_NIC_MASK_ARR_LEN];
+ enum cpucp_serdes_type serdes_type;
+ u16 tx_swap_map[CPUCP_MAX_NICS];
+};
+
+/**
+ * struct hbl_cn_aux_data - habanalabs data for the cn driver.
+ * @pdev: pointer to PCI device.
+ * @dev: related kernel basic device structure.
+ * @asic_specific: ASIC specific data.
+ * @fw_ver: FW version.
+ * @asic_type: ASIC specific type.
+ * @ports_mask: mask of available ports.
+ * @ext_ports_mask: mask of external ports (subset of ports_mask).
+ * @auto_neg_mask: mask of ports with Autonegotiation enabled.
+ * @dram_size: available DRAM size.
+ * @nic_drv_addr: base address for NIC driver on DRAM.
+ * @nic_drv_size: driver size reserved for NIC driver on DRAM.
+ * @macro_cfg_size: the size of the macro configuration space.
+ * @max_num_of_ports: max number of available ports.
+ * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
+ * @kernel_asid: kernel ASID.
+ * @card_location: the OAM number in the HLS (relevant for PMC card type).
+ * @device_timeout: device access timeout in usec.
+ * @fw_major_version: major version of current loaded preboot.
+ * @fw_minor_version: minor version of current loaded preboot.
+ * @fw_app_cpu_boot_dev_sts0: bitmap representation of application security
+ * status reported by FW, bit description can be
+ * found in CPU_BOOT_DEV_STS0
+ * @fw_app_cpu_boot_dev_sts1: bitmap representation of application security
+ * status reported by FW, bit description can be
+ * found in CPU_BOOT_DEV_STS1
+ * @id: device ID.
+ * @cache_line_size: device cache line size.
+ * @clk: clock frequency in MHz.
+ * @pldm: true if running on a Palladium simulation setup.
+ * @skip_phy_init: avoid writing/reading PHY registers.
+ * @load_phy_fw: load PHY F/W.
+ * @cpucp_fw: is CPUCP FW enabled.
+ * @supports_coresight: is CoreSight supported.
+ * @use_fw_serdes_info: true if FW serdes values should be used, false if hard coded values should
+ * be used.
+ * @mmu_enable: is MMU enabled.
+ * @lanes_per_port: number of physical lanes per port.
+ * @cpucp_checkers_shift: CPUCP checkers flags shift.
+ */
+struct hbl_cn_aux_data {
+ struct pci_dev *pdev;
+ struct device *dev;
+ void *asic_specific;
+ char *fw_ver;
+ enum hbl_cn_asic_type asic_type;
+ u64 ports_mask;
+ u64 ext_ports_mask;
+ u64 auto_neg_mask;
+ u64 dram_size;
+ u64 nic_drv_addr;
+ u64 nic_drv_size;
+ u32 macro_cfg_size;
+ u32 pending_reset_long_timeout;
+ u32 kernel_asid;
+ u32 card_location;
+ u32 device_timeout;
+ u32 fw_major_version;
+ u32 fw_minor_version;
+ u32 fw_app_cpu_boot_dev_sts0;
+ u32 fw_app_cpu_boot_dev_sts1;
+ u16 id;
+ u16 cache_line_size;
+ u16 clk;
+ u8 pldm;
+ u8 skip_phy_init;
+ u8 load_phy_fw;
+ u8 cpucp_fw;
+ u8 supports_coresight;
+ u8 use_fw_serdes_info;
+ u8 mmu_enable;
+ u8 lanes_per_port;
+ u8 cpucp_checkers_shift;
+};
+
+/**
+ * enum hbl_cn_mmu_mode - MMU modes the CN can work with.
+ * @HBL_CN_MMU_MODE_EXTERNAL: using external MMU HW IP.
+ * @HBL_CN_MMU_MODE_NETWORK_TLB: Using internal network TLB (but external page-table).
+ */
+enum hbl_cn_mmu_mode {
+ HBL_CN_MMU_MODE_EXTERNAL,
+ HBL_CN_MMU_MODE_NETWORK_TLB,
+};
+
+/**
+ * struct hbl_cn_vm_info - VM related info for the cn driver.
+ * @mmu_mode: the type (or mode) of MMU currently configured.
+ * @ext_mmu.work_id: the unique work-ID assigned to this VM when in external MMU mode.
+ * @net_tlb.pasid: the PCI process space address ID assigned to the device.
+ * @net_tlb.page_tbl_addr: the address of the MMU page table of this VM.
+ */
+struct hbl_cn_vm_info {
+ enum hbl_cn_mmu_mode mmu_mode;
+ union {
+ struct {
+ u32 work_id;
+ } ext_mmu;
+
+ struct {
+ u32 pasid;
+ u64 page_tbl_addr;
+ } net_tlb;
+ };
+};
+
+typedef bool (*hbl_cn_poll_cond_func)(u32 val, void *arg);
+
+enum hbl_cn_mem_type {
+ HBL_CN_MEM_TYPE_HOST,
+ HBL_CN_MEM_TYPE_DEVICE,
+};
+
+/**
+ * struct hbl_cn_aux_ops - function pointers for CN <-> compute driver communication.
+ * @device_operational: is device operational.
+ * @hw_access_lock: prevent HW access.
+ * @hw_access_unlock: allow HW access.
+ * @device_reset: Perform device reset.
+ * @vm_dev_mmu_map: map a CPU/kernel address or device memory range to a device address range in
+ *                  order to provide device-memory access.
+ * @vm_dev_mmu_unmap: unmap a previously mapped address range.
+ * @vm_reserve_dva_block: Reserve a device virtual block of a given size.
+ * @vm_unreserve_dva_block: Release a given device virtual block.
+ * @dram_readl: Read long from DRAM.
+ * @dram_writel: Write long to DRAM.
+ * @rreg: Read register.
+ * @wreg: Write register.
+ * @get_reg_pcie_addr: Retrieve the PCI address of a register.
+ * @poll_reg: Poll a register until a given condition is fulfilled or a timeout expires.
+ * @get_cpucp_info: fetch updated CPUCP info.
+ * @register_cn_user_context: register a user context represented by user provided FD. If the
+ * returned comp_handle and vm_handle are equal then this context doesn't
+ * support data transfer.
+ * @deregister_cn_user_context: de-register the user context represented by the vm_handle returned
+ * from calling register_cn_user_context.
+ * @vm_create: create a VM in registered context.
+ * @vm_destroy: destroy a VM in registered context.
+ * @get_vm_info: get information on a VM.
+ * @ports_reopen: reopen the ports after hard reset.
+ * @ports_stop_prepare: prepare the ports for a stop.
+ * @ports_stop: stop traffic.
+ * @synchronize_irqs: Synchronize IRQs.
+ * @asic_ops: pointer for ASIC specific ops struct.
+ */
+struct hbl_cn_aux_ops {
+ /* cn2compute */
+ bool (*device_operational)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_lock)(struct hbl_aux_dev *aux_dev);
+ void (*hw_access_unlock)(struct hbl_aux_dev *aux_dev);
+ void (*device_reset)(struct hbl_aux_dev *aux_dev);
+ int (*vm_dev_mmu_map)(struct hbl_aux_dev *aux_dev, u64 vm_handle,
+ enum hbl_cn_mem_type mem_type, u64 addr, u64 dva, size_t size);
+ void (*vm_dev_mmu_unmap)(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 dva, size_t size);
+ int (*vm_reserve_dva_block)(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 size, u64 *dva);
+ void (*vm_unreserve_dva_block)(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 dva,
+ u64 size);
+ u32 (*dram_readl)(struct hbl_aux_dev *aux_dev, u64 addr);
+ void (*dram_writel)(struct hbl_aux_dev *aux_dev, u32 val, u64 addr);
+ u32 (*rreg)(struct hbl_aux_dev *aux_dev, u32 reg);
+ void (*wreg)(struct hbl_aux_dev *aux_dev, u32 reg, u32 val);
+ int (*get_reg_pcie_addr)(struct hbl_aux_dev *aux_dev, u32 reg, u64 *pci_addr);
+ int (*poll_reg)(struct hbl_aux_dev *aux_dev, u32 reg, u64 timeout_us,
+ hbl_cn_poll_cond_func func, void *arg);
+ void (*get_cpucp_info)(struct hbl_aux_dev *aux_dev,
+ struct hbl_cn_cpucp_info *hbl_cn_cpucp_info);
+ int (*register_cn_user_context)(struct hbl_aux_dev *aux_dev, int user_fd,
+ const void *cn_ctx, u64 *comp_handle, u64 *vm_handle);
+ void (*deregister_cn_user_context)(struct hbl_aux_dev *aux_dev, u64 vm_handle);
+ int (*vm_create)(struct hbl_aux_dev *aux_dev, u64 comp_handle, u32 flags, u64 *vm_handle);
+ void (*vm_destroy)(struct hbl_aux_dev *aux_dev, u64 vm_handle);
+ int (*get_vm_info)(struct hbl_aux_dev *aux_dev, u64 vm_handle,
+ struct hbl_cn_vm_info *vm_info);
+
+ /* compute2cn */
+ int (*ports_reopen)(struct hbl_aux_dev *aux_dev);
+ void (*ports_stop_prepare)(struct hbl_aux_dev *aux_dev, bool fw_reset, bool in_teardown);
+ void (*ports_stop)(struct hbl_aux_dev *aux_dev);
+ void (*synchronize_irqs)(struct hbl_aux_dev *aux_dev);
+ void *asic_ops;
+};
+
+#endif /* HBL_CN_AUX_H_ */
diff --git a/include/linux/net/intel/cni.h b/include/linux/net/intel/cni.h
new file mode 100644
index 000000000000..206b93ffd0e8
--- /dev/null
+++ b/include/linux/net/intel/cni.h
@@ -0,0 +1,636 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_CNI_H_
+#define HBL_CNI_H_
+
+#include <linux/if_ether.h>
+#include <linux/types.h>
+
+#define HBL_CNI_STAT_STR_LEN 32
+
+/* Requester */
+#define HBL_CNI_CQE_TYPE_REQ 0
+/* Responder */
+#define HBL_CNI_CQE_TYPE_RES 1
+
+/* Number of backpressure offsets */
+#define HBL_CNI_USER_BP_OFFS_MAX 16
+
+/* Number of FnA addresses for SRAM/DCCM completion */
+#define HBL_CNI_FNA_CMPL_ADDR_NUM 2
+
+/**
+ * struct hbl_cni_alloc_conn_in - NIC opcode HBL_CNI_OP_ALLOC_CONN in param.
+ * @port: NIC port ID.
+ * @hint: optional connection ID hint, serving as a user recommendation to the driver.
+ */
+struct hbl_cni_alloc_conn_in {
+ u32 port;
+ u32 hint;
+};
+
+/**
+ * struct hbl_cni_alloc_conn_out - NIC opcode HBL_CNI_OP_ALLOC_CONN out param.
+ * @conn_id: Connection ID.
+ */
+struct hbl_cni_alloc_conn_out {
+ u32 conn_id;
+};
+
+/**
+ * struct hbl_cni_req_conn_ctx_in - NIC opcode HBL_CNI_OP_SET_REQ_CONN_CTX in param.
+ * @dst_ip_addr: Destination IP address in native endianness.
+ * @dst_conn_id: Destination connection ID.
+ * @port: NIC port ID.
+ * @conn_id: Connection ID.
+ * @dst_mac_addr: Destination MAC address.
+ * @priority: Connection priority [0..3].
+ * @timer_granularity: Timer granularity [0..127].
+ * @swq_granularity: SWQ granularity [0 for 32B or 1 for 64B].
+ * @wq_type: Work queue type [1..3].
+ * @cq_number: Completion queue number.
+ * @wq_remote_log_size: Remote work queue log size (2^value entries in the QPC); used for
+ *                      rendezvous.
+ * @congestion_en: Enable/disable Congestion-Control.
+ * @congestion_wnd: Congestion-Window size.
+ * @mtu: Max Transmit Unit.
+ * @encap_en: used as boolean; indicates if this QP has encapsulation support.
+ * @encap_id: Encapsulation-id; valid only if 'encap_en' is set.
+ * @wq_size: Max number of elements in the work queue.
+ * @loopback: used as boolean; indicates if this QP used for loopback mode.
+ * @compression_en: Enable compression.
+ * @remote_key: Remote-key to be used to generate on outgoing packets.
+ */
+struct hbl_cni_req_conn_ctx_in {
+ u32 reserved0;
+ u32 dst_ip_addr;
+ u32 dst_conn_id;
+ u32 deprecated0;
+ u32 reserved1;
+ u32 port;
+ u32 conn_id;
+ u8 dst_mac_addr[ETH_ALEN];
+ u8 deprecated1;
+ u8 priority;
+ u8 deprecated2;
+ u8 timer_granularity;
+ u8 swq_granularity;
+ u8 wq_type;
+ u8 deprecated3;
+ u8 cq_number;
+ u8 wq_remote_log_size;
+ u8 congestion_en;
+ u32 congestion_wnd;
+ u16 mtu;
+ u8 encap_en;
+ u8 encap_id;
+ u32 wq_size;
+ u8 loopback;
+ u8 reserved2;
+ u8 reserved3;
+ u8 compression_en;
+ u32 remote_key;
+};
+
+/**
+ * struct hbl_cni_req_conn_ctx_out - NIC opcode HBL_CNI_OP_SET_REQ_CONN_CTX out param.
+ * @swq_mem_handle: Handle for send WQ memory.
+ * @rwq_mem_handle: Handle for receive WQ memory.
+ * @swq_mem_size: Size of the send WQ memory.
+ * @rwq_mem_size: Size of the receive WQ memory.
+ */
+struct hbl_cni_req_conn_ctx_out {
+ u64 swq_mem_handle;
+ u64 rwq_mem_handle;
+ u32 swq_mem_size;
+ u32 rwq_mem_size;
+};
+
+/**
+ * struct hbl_cni_res_conn_ctx_in - NIC opcode HBL_CNI_OP_SET_RES_CONN_CTX in param.
+ * @dst_ip_addr: Destination IP address in native endianness.
+ * @dst_conn_id: Destination connection ID.
+ * @port: NIC port ID.
+ * @conn_id: Connection ID.
+ * @dst_mac_addr: Destination MAC address.
+ * @priority: Connection priority [0..3].
+ * @wq_peer_granularity: Work queue granularity.
+ * @cq_number: Completion queue number.
+ * @conn_peer: Connection peer.
+ * @rdv: used as boolean; indicates if this QP is RDV (WRITE or READ).
+ * @loopback: used as boolean; indicates if this QP used for loopback mode.
+ * @encap_en: used as boolean; indicates if this QP has encapsulation support.
+ * @encap_id: Encapsulation-id; valid only if 'encap_en' is set.
+ * @wq_peer_size: size of the peer Work queue.
+ * @local_key: Local-key to be used to validate against incoming packets.
+ */
+struct hbl_cni_res_conn_ctx_in {
+ u32 reserved;
+ u32 dst_ip_addr;
+ u32 dst_conn_id;
+ u32 port;
+ u32 conn_id;
+ u8 dst_mac_addr[ETH_ALEN];
+ u8 priority;
+ u8 deprecated1;
+ u8 deprecated2;
+ u8 wq_peer_granularity;
+ u8 cq_number;
+ u8 deprecated3;
+ u32 conn_peer;
+ u8 rdv;
+ u8 loopback;
+ u8 encap_en;
+ u8 encap_id;
+ u32 wq_peer_size;
+ u32 local_key;
+};
+
+/**
+ * struct hbl_cni_destroy_conn_in - NIC opcode HBL_CNI_OP_DESTROY_CONN in param.
+ * @port: NIC port ID.
+ * @conn_id: Connection ID.
+ */
+struct hbl_cni_destroy_conn_in {
+ u32 port;
+ u32 conn_id;
+};
+
+/**
+ * enum hbl_nic_mem_type - NIC WQ memory allocation type.
+ * @HBL_CNI_USER_WQ_SEND: Allocate memory for the user send WQ array.
+ * @HBL_CNI_USER_WQ_RECV: Allocate memory for the user receive WQ array.
+ * @HBL_CNI_USER_WQ_TYPE_MAX: number of values in enum.
+ */
+enum hbl_nic_mem_type {
+ HBL_CNI_USER_WQ_SEND,
+ HBL_CNI_USER_WQ_RECV,
+ HBL_CNI_USER_WQ_TYPE_MAX
+};
+
+/**
+ * enum hbl_nic_mem_id - memory allocation methods.
+ * @HBL_CNI_MEM_HOST: memory allocated on the host memory.
+ * @HBL_CNI_MEM_DEVICE: memory allocated on the device memory.
+ */
+enum hbl_nic_mem_id {
+ HBL_CNI_MEM_HOST = 1,
+ HBL_CNI_MEM_DEVICE
+};
+
+/**
+ * enum hbl_nic_swq_granularity - send WQE granularity.
+ * @HBL_CNI_SWQE_GRAN_32B: 32 byte WQE for linear write.
+ * @HBL_CNI_SWQE_GRAN_64B: 64 byte WQE for multi-stride write.
+ */
+enum hbl_nic_swq_granularity {
+ HBL_CNI_SWQE_GRAN_32B,
+ HBL_CNI_SWQE_GRAN_64B
+};
+
+/**
+ * struct hbl_cni_user_wq_arr_set_in - NIC opcode HBL_CNI_OP_USER_WQ_SET in param.
+ * @port: NIC port ID.
+ * @num_of_wqs: Number of user WQs.
+ * @num_of_wq_entries: Number of entries per user WQ.
+ * @type: Type of user WQ array.
+ * @mem_id: Specify host/device memory allocation.
+ * @swq_granularity: Specify the granularity of send WQ, 0: 32 bytes, 1: 64 bytes.
+ */
+struct hbl_cni_user_wq_arr_set_in {
+ u64 reserved;
+ u32 port;
+ u32 num_of_wqs;
+ u32 num_of_wq_entries;
+ u32 type;
+ u32 mem_id;
+ u8 swq_granularity;
+};
+
+/**
+ * struct hbl_cni_user_wq_arr_set_out - NIC opcode HBL_CNI_OP_USER_WQ_SET out param.
+ * @mem_handle: Handle of WQ array memory buffer.
+ */
+struct hbl_cni_user_wq_arr_set_out {
+ u64 mem_handle;
+};
+
+/**
+ * struct hbl_cni_user_wq_arr_unset_in - NIC opcode HBL_CNI_OP_USER_WQ_UNSET in param.
+ * @port: NIC port ID.
+ * @type: Type of user WQ array.
+ */
+struct hbl_cni_user_wq_arr_unset_in {
+ u32 port;
+ u32 type;
+};
+
+/**
+ * struct hbl_cni_alloc_user_cq_id_in - NIC opcode HBL_CNI_OP_ALLOC_USER_CQ_ID in param.
+ * @port: NIC port ID.
+ */
+struct hbl_cni_alloc_user_cq_id_in {
+ u32 port;
+};
+
+/**
+ * struct hbl_cni_alloc_user_cq_id_out - NIC opcode HBL_CNI_OP_ALLOC_USER_CQ_ID out param.
+ * @id: CQ ID.
+ */
+struct hbl_cni_alloc_user_cq_id_out {
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_cq_id_set_in - NIC opcode HBL_CNI_OP_USER_CQ_SET in param.
+ * @port: NIC port ID.
+ * @num_of_cqes: Number of CQ entries in the buffer.
+ * @id: CQ ID.
+ */
+struct hbl_cni_user_cq_id_set_in {
+ u32 port;
+ u32 num_of_cqes;
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_cq_id_set_out - NIC opcode HBL_CNI_OP_USER_CQ_ID_SET out param.
+ * @mem_handle: Handle of CQ memory buffer.
+ * @pi_handle: Handle of CQ producer-index memory buffer.
+ * @regs_handle: Handle of CQ Registers base-address.
+ * @regs_offset: CQ Registers sub-offset.
+ */
+struct hbl_cni_user_cq_id_set_out {
+ u64 mem_handle;
+ u64 pi_handle;
+ u64 regs_handle;
+ u32 regs_offset;
+};
+
+/**
+ * struct hbl_cni_user_cq_id_unset_in - NIC opcode HBL_CNI_OP_USER_CQ_ID_UNSET in param.
+ * @port: NIC port ID.
+ * @id: NIC CQ ID.
+ */
+struct hbl_cni_user_cq_id_unset_in {
+ u32 port;
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_dump_qp_in - NIC opcode HBL_CNI_OP_DUMP_QP in param.
+ * @user_buf: Pre-allocated user buffer address to hold the dump output.
+ * @user_buf_size: Size of the user buffer.
+ * @port: NIC port ID.
+ * @qpn: NIC QP ID.
+ * @req: is requester (otherwise responder).
+ */
+struct hbl_cni_dump_qp_in {
+ u64 user_buf;
+ u32 user_buf_size;
+ u32 port;
+ u32 qpn;
+ u8 req;
+};
+
+/* User App Params */
+
+/**
+ * struct hbl_cni_set_user_app_params_in - NIC opcode HBL_CNI_OP_SET_USER_APP_PARAMS in param.
+ *                                         Allows the user application to set general parameters
+ *                                         of the RDMA NIC operation. These parameters stay in
+ *                                         effect until the application releases the device.
+ * @port: NIC port ID.
+ * @bp_offs: Offsets in NIC memory to signal a back pressure. Note that the advanced flag must be
+ * enabled in case it's being set.
+ * @advanced: A boolean that indicates whether this WQ should support advanced operations, such as
+ * RDV, QMan, WTD, etc.
+ * @adaptive_timeout_en: Enable adaptive timeout feature for this port.
+ */
+struct hbl_cni_set_user_app_params_in {
+ u32 port;
+ u32 bp_offs[HBL_CNI_USER_BP_OFFS_MAX];
+ u8 advanced;
+ u8 adaptive_timeout_en;
+};
+
+/**
+ * struct hbl_cni_get_user_app_params_in - NIC opcode HBL_CNI_OP_GET_USER_APP_PARAMS in param.
+ * @port: NIC port ID.
+ */
+struct hbl_cni_get_user_app_params_in {
+ u32 port;
+};
+
+/**
+ * struct hbl_cni_get_user_app_params_out - NIC opcode HBL_CNI_OP_GET_USER_APP_PARAMS out param.
+ * @max_num_of_qps: Number of QPs that are supported by the driver. The user must allocate enough
+ *                  room for its work queues according to this number.
+ * @num_allocated_qps: Number of QPs that were already allocated (in use).
+ * @max_allocated_qp_idx: The highest index of the allocated QPs (i.e. this is where the
+ * driver may allocate its next QP).
+ * @max_cq_size: Maximum size of a CQ buffer.
+ * @advanced: true if advanced features are supported.
+ * @max_num_of_cqs: Maximum number of CQs.
+ * @max_num_of_db_fifos: Maximum number of DB-FIFOs.
+ * @max_num_of_encaps: Maximum number of encapsulations.
+ * @speed: port speed in Mbps.
+ * @nic_macro_idx: macro index of this specific port.
+ * @nic_phys_port_idx: physical port index (AKA lane) of this specific port.
+ */
+struct hbl_cni_get_user_app_params_out {
+ u32 max_num_of_qps;
+ u32 num_allocated_qps;
+ u32 max_allocated_qp_idx;
+ u32 max_cq_size;
+ u8 advanced;
+ u8 max_num_of_cqs;
+ u8 max_num_of_db_fifos;
+ u8 max_num_of_encaps;
+ u32 speed;
+ u8 nic_macro_idx;
+ u8 nic_phys_port_idx;
+};
+
+/**
+ * struct hbl_cni_alloc_user_db_fifo_in - NIC opcode HBL_CNI_OP_ALLOC_USER_DB_FIFO in param
+ * @port: NIC port ID
+ * @id_hint: Hint to allocate a specific HW resource
+ */
+struct hbl_cni_alloc_user_db_fifo_in {
+ u32 port;
+ u32 id_hint;
+};
+
+/**
+ * struct hbl_cni_alloc_user_db_fifo_out - NIC opcode HBL_CNI_OP_ALLOC_USER_DB_FIFO out param
+ * @id: DB-FIFO ID
+ */
+struct hbl_cni_alloc_user_db_fifo_out {
+ u32 id;
+};
+
+/**
+ * enum hbl_nic_db_fifo_type - NIC users FIFO modes of operation.
+ * @HBL_CNI_DB_FIFO_TYPE_DB: mode for direct user door-bell submit.
+ * @HBL_CNI_DB_FIFO_TYPE_CC: mode for congestion control.
+ */
+enum hbl_nic_db_fifo_type {
+ HBL_CNI_DB_FIFO_TYPE_DB = 0,
+ HBL_CNI_DB_FIFO_TYPE_CC,
+};
+
+/**
+ * struct hbl_cni_user_db_fifo_set_in - NIC opcode HBL_CNI_OP_USER_DB_FIFO_SET in param.
+ * @port: NIC port ID
+ * @id: NIC DB-FIFO ID
+ * @mode: represents desired mode of operation for provided FIFO, according to hbl_nic_db_fifo_type
+ */
+struct hbl_cni_user_db_fifo_set_in {
+ u32 port;
+ u32 id;
+ u8 mode;
+};
+
+/**
+ * struct hbl_cni_user_db_fifo_set_out - NIC opcode HBL_CNI_OP_USER_DB_FIFO_SET out param.
+ * @ci_handle: Handle of DB-FIFO consumer-index memory buffer.
+ * @regs_handle: Handle of DB-FIFO Registers base-address.
+ * @regs_offset: Offset to the DB-FIFO Registers.
+ * @fifo_size: fifo size that was allocated.
+ * @fifo_bp_thresh: fifo threshold that was set by the driver.
+ */
+struct hbl_cni_user_db_fifo_set_out {
+ u64 ci_handle;
+ u64 regs_handle;
+ u32 regs_offset;
+ u32 fifo_size;
+ u32 fifo_bp_thresh;
+};
+
+/**
+ * struct hbl_cni_user_db_fifo_unset_in - NIC opcode HBL_CNI_OP_USER_DB_FIFO_UNSET in param.
+ * @port: NIC port ID.
+ * @id: NIC DB-FIFO ID.
+ */
+struct hbl_cni_user_db_fifo_unset_in {
+ u32 port;
+ u32 id;
+};
+
+/* The operation completed successfully and an event was read */
+#define HBL_CNI_EQ_POLL_STATUS_SUCCESS 0
+/* The operation completed successfully, no event was found */
+#define HBL_CNI_EQ_POLL_STATUS_EQ_EMPTY 1
+/* The operation failed since it is not supported by the device/driver */
+#define HBL_CNI_EQ_POLL_STATUS_ERR_UNSUPPORTED_OP 2
+/* The operation failed, port was not found */
+#define HBL_CNI_EQ_POLL_STATUS_ERR_NO_SUCH_PORT 3
+/* The operation failed, port is disabled */
+#define HBL_CNI_EQ_POLL_STATUS_ERR_PORT_DISABLED 4
+/* The operation failed, an event-queue associated with the app was not found */
+#define HBL_CNI_EQ_POLL_STATUS_ERR_NO_SUCH_EQ 5
+/* The operation failed with an undefined error */
+#define HBL_CNI_EQ_POLL_STATUS_ERR_UNDEF 6
+
+/* completion-queue events */
+#define HBL_CNI_EQ_EVENT_TYPE_CQ_ERR 0
+/* Queue-pair events */
+#define HBL_CNI_EQ_EVENT_TYPE_QP_ERR 1
+/* Doorbell events */
+#define HBL_CNI_EQ_EVENT_TYPE_DB_FIFO_ERR 2
+/* congestion completion-queue events */
+#define HBL_CNI_EQ_EVENT_TYPE_CCQ 3
+/* Direct WQE security error. */
+#define HBL_CNI_EQ_EVENT_TYPE_WTD_SECURITY_ERR 4
+/* Numerical error */
+#define HBL_CNI_EQ_EVENT_TYPE_NUMERICAL_ERR 5
+/* Link status. */
+#define HBL_CNI_EQ_EVENT_TYPE_LINK_STATUS 6
+/* Queue-pair counters aligned */
+#define HBL_CNI_EQ_EVENT_TYPE_QP_ALIGN_COUNTERS 7
+
+/**
+ * struct hbl_cni_eq_poll_in - NIC opcode HBL_CNI_OP_EQ_POLL in param.
+ * @port: NIC port ID.
+ */
+struct hbl_cni_eq_poll_in {
+ u32 port;
+};
+
+/**
+ * struct hbl_cni_eq_poll_out - NIC opcode HBL_CNI_OP_EQ_POLL out param.
+ * @status: HBL_CNI_EQ_POLL_STATUS_*.
+ * @idx: Connection/CQ/DB-fifo index, depends on event type.
+ * @ev_data: Event-specific data.
+ * @ev_type: Event type.
+ * @rest_occurred: true if the error occurred due to a reset.
+ * @is_req: for QP events, marks whether the corresponding QP is the requester.
+ */
+struct hbl_cni_eq_poll_out {
+ u32 status;
+ u32 idx;
+ u32 ev_data;
+ u8 ev_type;
+ u8 rest_occurred;
+ u8 is_req;
+};
+
+/**
+ * enum hbl_nic_encap_type - Supported encapsulation types.
+ * @HBL_CNI_ENCAP_NONE: No tunneling.
+ * @HBL_CNI_ENCAP_OVER_IPV4: Tunnel RDMA packets through the L3 layer.
+ * @HBL_CNI_ENCAP_OVER_UDP: Tunnel RDMA packets through the L4 layer.
+ */
+enum hbl_nic_encap_type {
+ HBL_CNI_ENCAP_NONE,
+ HBL_CNI_ENCAP_OVER_IPV4,
+ HBL_CNI_ENCAP_OVER_UDP,
+};
+
+/**
+ * struct hbl_cni_user_encap_alloc_in - NIC opcode HBL_CNI_OP_USER_ENCAP_ALLOC in param.
+ * @port: NIC port ID.
+ */
+struct hbl_cni_user_encap_alloc_in {
+ u32 port;
+};
+
+/**
+ * struct hbl_cni_user_encap_alloc_out - NIC opcode HBL_CNI_OP_USER_ENCAP_ALLOC out param.
+ * @id: Encapsulation ID.
+ */
+struct hbl_cni_user_encap_alloc_out {
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_encap_set_in - NIC opcode HBL_CNI_OP_USER_ENCAP_SET in param.
+ * @tnl_hdr_ptr: Pointer to the tunnel encapsulation header, i.e. the specific tunnel header data
+ *               to be used by the HW for encapsulation.
+ * @tnl_hdr_size: Tunnel encapsulation header size.
+ * @port: NIC port ID.
+ * @id: Encapsulation ID.
+ * @ipv4_addr: Source IP address, set regardless of encapsulation type.
+ * @udp_dst_port: The UDP destination-port. Valid for L4 tunnel.
+ * @ip_proto: IP protocol to use. Valid for L3 tunnel.
+ * @encap_type: Encapsulation type. May be either no-encapsulation or encapsulation over L3 or L4.
+ */
+struct hbl_cni_user_encap_set_in {
+ u64 tnl_hdr_ptr;
+ u32 tnl_hdr_size;
+ u32 port;
+ u32 id;
+ u32 ipv4_addr;
+ union {
+ u16 udp_dst_port;
+ u16 ip_proto;
+ };
+ u8 encap_type;
+};
+
+/**
+ * struct hbl_cni_user_encap_unset_in - NIC opcode HBL_CNI_OP_USER_ENCAP_UNSET in param.
+ * @port: NIC port ID.
+ * @id: Encapsulation ID.
+ */
+struct hbl_cni_user_encap_unset_in {
+ u32 port;
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_ccq_set_in - NIC opcode HBL_CNI_OP_USER_CCQ_SET in param.
+ * @port: NIC port ID.
+ * @num_of_entries: Number of CCQ entries in the buffer.
+ */
+struct hbl_cni_user_ccq_set_in {
+ u32 port;
+ u32 num_of_entries;
+};
+
+/**
+ * struct hbl_cni_user_ccq_set_out - NIC opcode HBL_CNI_OP_USER_CCQ_SET out param.
+ * @mem_handle: Handle of CCQ memory buffer.
+ * @pi_handle: Handle of CCQ producer-index memory buffer.
+ * @id: CQ ID.
+ */
+struct hbl_cni_user_ccq_set_out {
+ u64 mem_handle;
+ u64 pi_handle;
+ u32 id;
+};
+
+/**
+ * struct hbl_cni_user_ccq_unset_in - NIC opcode HBL_CNI_OP_USER_CCQ_UNSET in param.
+ * @port: NIC port ID.
+ */
+struct hbl_cni_user_ccq_unset_in {
+ u32 port;
+};
+
+/* Opcode to allocate connection ID */
+#define HBL_CNI_OP_ALLOC_CONN 0
+/* Opcode to set up a requester connection context */
+#define HBL_CNI_OP_SET_REQ_CONN_CTX 1
+/* Opcode to set up a responder connection context */
+#define HBL_CNI_OP_SET_RES_CONN_CTX 2
+/* Opcode to destroy a connection */
+#define HBL_CNI_OP_DESTROY_CONN 3
+/* Opcode reserved (deprecated) */
+#define HBL_CNI_OP_RESERVED0 4
+/* Opcode reserved (deprecated) */
+#define HBL_CNI_OP_RESERVED1 5
+/* Opcode reserved (deprecated) */
+#define HBL_CNI_OP_RESERVED2 6
+/* Opcode reserved (deprecated) */
+#define HBL_CNI_OP_RESERVED3 7
+/* Opcode reserved (deprecated) */
+#define HBL_CNI_OP_RESERVED4 8
+/* Opcode to set a user WQ array */
+#define HBL_CNI_OP_USER_WQ_SET 9
+/* Opcode to unset a user WQ array */
+#define HBL_CNI_OP_USER_WQ_UNSET 10
+/* Opcode reserved */
+#define HBL_CNI_OP_RESERVED5 11
+/* Opcode reserved */
+#define HBL_CNI_OP_RESERVED6 12
+/* Opcode reserved */
+#define HBL_CNI_OP_RESERVED7 13
+/* Opcode to allocate a CQ */
+#define HBL_CNI_OP_ALLOC_USER_CQ_ID 14
+/* Opcode to set specific user-application parameters */
+#define HBL_CNI_OP_SET_USER_APP_PARAMS 15
+/* Opcode to get specific user-application parameters */
+#define HBL_CNI_OP_GET_USER_APP_PARAMS 16
+/* Opcode to allocate a DB-FIFO */
+#define HBL_CNI_OP_ALLOC_USER_DB_FIFO 17
+/* Opcode to create a DB-FIFO */
+#define HBL_CNI_OP_USER_DB_FIFO_SET 18
+/* Opcode to destroy a DB-FIFO */
+#define HBL_CNI_OP_USER_DB_FIFO_UNSET 19
+/* Opcode to poll on EQ */
+#define HBL_CNI_OP_EQ_POLL 20
+/* Opcode to allocate encapsulation ID */
+#define HBL_CNI_OP_USER_ENCAP_ALLOC 21
+/* Opcode to create an encapsulation */
+#define HBL_CNI_OP_USER_ENCAP_SET 22
+/* Opcode to destroy an encapsulation */
+#define HBL_CNI_OP_USER_ENCAP_UNSET 23
+/* Opcode to create a CCQ */
+#define HBL_CNI_OP_USER_CCQ_SET 24
+/* Opcode to destroy a CCQ */
+#define HBL_CNI_OP_USER_CCQ_UNSET 25
+/* Opcode to set user CQ by ID */
+#define HBL_CNI_OP_USER_CQ_ID_SET 26
+/* Opcode to unset user CQ by ID */
+#define HBL_CNI_OP_USER_CQ_ID_UNSET 27
+/* Opcode reserved */
+#define HBL_CNI_OP_RESERVED8 28
+/* Opcode to dump the context of a QP */
+#define HBL_CNI_OP_DUMP_QP 29
+
+#endif /* HBL_CNI_H_ */
--
2.34.1
* [PATCH 02/15] net: hbl_cn: memory manager component
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
@ 2024-06-13 8:21 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 03/15] net: hbl_cn: physical layer support Omer Shpigelman
` (13 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add a common memory manager which handles allocation and mapping.
It manages physical/virtual memory on host/device.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../intel/hbl_cn/common/hbl_cn_memory.c | 325 +++++++++++++++++-
1 file changed, 322 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
index 93c97fad6a20..878ecba66aa3 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
@@ -4,37 +4,356 @@
* All Rights Reserved.
*/
+#include <linux/vmalloc.h>
#include "hbl_cn.h"
-int hbl_cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
+static int hbl_cn_map_vmalloc_range(struct hbl_cn_ctx *ctx, u64 vmalloc_va, u64 device_va,
+ u64 size)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = ctx->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->vm_dev_mmu_map(aux_dev, ctx->driver_vm_info.vm_handle, HBL_CN_MEM_TYPE_HOST,
+ vmalloc_va, device_va, size);
+}
+
+static void hbl_cn_unmap_vmalloc_range(struct hbl_cn_ctx *ctx, u64 device_va, u64 size)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = ctx->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->vm_dev_mmu_unmap(aux_dev, ctx->driver_vm_info.vm_handle, device_va, size);
+}
+
+static int alloc_mem(struct hbl_cn_mem_buf *buf, gfp_t gfp, struct hbl_cn_mem_data *mem_data)
+{
+ u64 device_addr, size = mem_data->size;
+ struct hbl_cn_ctx *ctx = buf->ctx;
+ u32 mem_id = mem_data->mem_id;
+ struct hbl_cn_device *hdev;
+ void *p = NULL;
+
+ hdev = ctx->hdev;
+
+ switch (mem_id) {
+ case HBL_CN_DRV_MEM_HOST_DMA_COHERENT:
+ if (get_order(size) > MAX_PAGE_ORDER) {
+ dev_err(hdev->dev, "memory size 0x%llx must be less than 0x%lx\n", size,
+ 1UL << (PAGE_SHIFT + MAX_PAGE_ORDER));
+ return -ENOMEM;
+ }
+
+ p = hbl_cn_dma_alloc_coherent(hdev, size, &buf->bus_address, GFP_USER | __GFP_ZERO);
+ if (!p) {
+ dev_err(hdev->dev,
+ "failed to allocate 0x%llx of dma memory for the NIC\n", size);
+ return -ENOMEM;
+ }
+
+ break;
+ case HBL_CN_DRV_MEM_HOST_VIRTUAL:
+ p = vmalloc_user(size);
+ if (!p) {
+ dev_err(hdev->dev, "failed to allocate vmalloc memory, size 0x%llx\n",
+ size);
+ return -ENOMEM;
+ }
+
+ break;
+ case HBL_CN_DRV_MEM_HOST_MAP_ONLY:
+ p = mem_data->in.host_map_data.kernel_address;
+ buf->bus_address = mem_data->in.host_map_data.bus_address;
+ break;
+ case HBL_CN_DRV_MEM_DEVICE:
+ if (!hdev->wq_arrays_pool_enable) {
+ dev_err(hdev->dev, "No WQ arrays pool support for device memory\n");
+ return -EOPNOTSUPP;
+ }
+
+ device_addr = (u64)gen_pool_alloc(hdev->wq_arrays_pool, size);
+ if (!device_addr) {
+ dev_err(hdev->dev, "Failed to allocate device memory, size 0x%llx\n", size);
+ return -ENOMEM;
+ }
+
+ buf->device_addr = device_addr;
+ break;
+ default:
+ dev_err(hdev->dev, "Invalid mem_id %d\n", mem_id);
+ return -EINVAL;
+ }
+
+ buf->kernel_address = p;
+ buf->mappable_size = size;
+
+ return 0;
+}
+
+static int map_mem(struct hbl_cn_mem_buf *buf, struct hbl_cn_mem_data *mem_data)
+{
+ struct hbl_cn_ctx *ctx = buf->ctx;
+ struct hbl_cn_device *hdev;
+ int rc;
+
+ hdev = ctx->hdev;
+
+ if (mem_data->mem_id == HBL_CN_DRV_MEM_HOST_DMA_COHERENT) {
+ dev_err(hdev->dev, "Mapping DMA coherent host memory is not yet supported\n");
+ return -EPERM;
+ }
+
+ rc = hbl_cn_map_vmalloc_range(ctx, (u64)buf->kernel_address, mem_data->device_va,
+ buf->mappable_size);
+ if (rc)
+ return rc;
+
+ buf->device_va = mem_data->device_va;
+
+ return 0;
+}
+
+static void mem_do_release(struct hbl_cn_device *hdev, struct hbl_cn_mem_buf *buf)
+{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+
+ if (buf->mem_id == HBL_CN_DRV_MEM_HOST_DMA_COHERENT)
+ asic_funcs->dma_free_coherent(hdev, buf->mappable_size, buf->kernel_address,
+ buf->bus_address);
+ else if (buf->mem_id == HBL_CN_DRV_MEM_HOST_VIRTUAL)
+ vfree(buf->kernel_address);
+ else if (buf->mem_id == HBL_CN_DRV_MEM_DEVICE)
+ gen_pool_free(hdev->wq_arrays_pool, buf->device_addr, buf->mappable_size);
+}
+
+static int __cn_mem_buf_alloc(struct hbl_cn_mem_buf *buf, gfp_t gfp,
+ struct hbl_cn_mem_data *mem_data)
+{
+ struct hbl_cn_ctx *ctx = buf->ctx;
+ struct hbl_cn_device *hdev;
+ int rc;
+
+ hdev = ctx->hdev;
+
+ if (mem_data->mem_id != HBL_CN_DRV_MEM_DEVICE)
+ mem_data->size = PAGE_ALIGN(mem_data->size);
+
+ rc = alloc_mem(buf, gfp, mem_data);
+ if (rc)
+ return rc;
+
+ if (mem_data->device_va) {
+ mem_data->device_va = PAGE_ALIGN(mem_data->device_va);
+ rc = map_mem(buf, mem_data);
+ if (rc)
+ goto release_mem;
+ }
+
+ return 0;
+
+release_mem:
+ mem_do_release(hdev, buf);
+ return rc;
+}
+
+static struct hbl_cn_mem_buf *cn_mem_buf_alloc(struct hbl_cn_ctx *ctx, gfp_t gfp,
+ struct hbl_cn_mem_data *mem_data)
+{
+ struct xa_limit id_limit = XA_LIMIT(1, INT_MAX);
+ struct hbl_cn_device *hdev = ctx->hdev;
+ struct hbl_cn_mem_buf *buf;
+ int rc;
+ u32 id;
+
+ buf = kzalloc(sizeof(*buf), gfp);
+ if (!buf)
+ return NULL;
+
+ rc = xa_alloc(&hdev->mem_ids, &id, buf, id_limit, GFP_ATOMIC);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate xarray for a new buffer, rc=%d\n", rc);
+ goto free_buf;
+ }
+
+ buf->ctx = ctx;
+ buf->mem_id = mem_data->mem_id;
+
+ buf->handle = (((u64)id | hdev->mmap_type_flag) << PAGE_SHIFT);
+ kref_init(&buf->refcount);
+
+ rc = __cn_mem_buf_alloc(buf, gfp, mem_data);
+ if (rc)
+ goto remove_xa;
+
+ return buf;
+
+remove_xa:
+ xa_erase(&hdev->mem_ids, lower_32_bits(buf->handle >> PAGE_SHIFT));
+free_buf:
+ kfree(buf);
+ return NULL;
+}
+
+static int cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
{
+ struct hbl_cn_mem_buf *buf;
+
+ buf = cn_mem_buf_alloc(ctx, GFP_KERNEL, mem_data);
+ if (!buf)
+ return -ENOMEM;
+
+ mem_data->handle = buf->handle;
+
+ if (mem_data->mem_id == HBL_CN_DRV_MEM_HOST_DMA_COHERENT)
+ mem_data->addr = (u64)buf->bus_address;
+ else if (mem_data->mem_id == HBL_CN_DRV_MEM_HOST_VIRTUAL)
+ mem_data->addr = (u64)buf->kernel_address;
+ else if (mem_data->mem_id == HBL_CN_DRV_MEM_DEVICE)
+ mem_data->addr = (u64)buf->device_addr;
+
return 0;
}
+int hbl_cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
+{
+ struct hbl_cn_device *hdev = ctx->hdev;
+ int rc;
+
+ switch (mem_data->mem_id) {
+ case HBL_CN_DRV_MEM_HOST_DMA_COHERENT:
+ case HBL_CN_DRV_MEM_HOST_VIRTUAL:
+ case HBL_CN_DRV_MEM_HOST_MAP_ONLY:
+ case HBL_CN_DRV_MEM_DEVICE:
+ rc = cn_mem_alloc(ctx, mem_data);
+ break;
+ default:
+ dev_dbg(hdev->dev, "Invalid mem_id %d\n", mem_data->mem_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ return rc;
+}
+
+static void cn_mem_buf_destroy(struct hbl_cn_mem_buf *buf)
+{
+ if (buf->device_va)
+ hbl_cn_unmap_vmalloc_range(buf->ctx, buf->device_va, buf->mappable_size);
+
+ mem_do_release(buf->ctx->hdev, buf);
+
+ kfree(buf);
+}
+
int hbl_cn_mem_destroy(struct hbl_cn_device *hdev, u64 handle)
{
+ struct hbl_cn_mem_buf *buf;
+ int rc;
+
+ buf = hbl_cn_mem_buf_get(hdev, handle);
+ if (!buf) {
+ dev_dbg(hdev->dev, "Memory destroy failed, no match for handle 0x%llx\n", handle);
+ return -EINVAL;
+ }
+
+ rc = atomic_cmpxchg(&buf->is_destroyed, 0, 1);
+ hbl_cn_mem_buf_put(buf);
+ if (rc) {
+ dev_dbg(hdev->dev, "Memory destroy failed, handle 0x%llx was already destroyed\n",
+ handle);
+ return -EINVAL;
+ }
+
+ rc = hbl_cn_mem_buf_put_handle(hdev, handle);
+ if (rc < 0)
+ return rc;
+
+ if (rc == 0)
+ dev_dbg(hdev->dev, "Handle 0x%llx is destroyed while still in use\n", handle);
+
return 0;
}
+static void cn_mem_buf_release(struct kref *kref)
+{
+ struct hbl_cn_mem_buf *buf = container_of(kref, struct hbl_cn_mem_buf, refcount);
+ struct hbl_cn_device *hdev = buf->ctx->hdev;
+
+ xa_erase(&hdev->mem_ids, lower_32_bits(buf->handle >> PAGE_SHIFT));
+
+ cn_mem_buf_destroy(buf);
+}
+
struct hbl_cn_mem_buf *hbl_cn_mem_buf_get(struct hbl_cn_device *hdev, u64 handle)
{
- return NULL;
+ struct hbl_cn_mem_buf *buf;
+
+ xa_lock(&hdev->mem_ids);
+ buf = xa_load(&hdev->mem_ids, lower_32_bits(handle >> PAGE_SHIFT));
+ if (!buf) {
+ xa_unlock(&hdev->mem_ids);
+ dev_dbg(hdev->dev, "Buff get failed, no match to handle %#llx\n", handle);
+ return NULL;
+ }
+
+ kref_get(&buf->refcount);
+ xa_unlock(&hdev->mem_ids);
+
+ return buf;
}
int hbl_cn_mem_buf_put(struct hbl_cn_mem_buf *buf)
{
- return 0;
+ return kref_put(&buf->refcount, cn_mem_buf_release);
+}
+
+static void cn_mem_buf_remove_xa_locked(struct kref *kref)
+{
+ struct hbl_cn_mem_buf *buf = container_of(kref, struct hbl_cn_mem_buf, refcount);
+
+ __xa_erase(&buf->ctx->hdev->mem_ids, lower_32_bits(buf->handle >> PAGE_SHIFT));
}
int hbl_cn_mem_buf_put_handle(struct hbl_cn_device *hdev, u64 handle)
{
+ struct hbl_cn_mem_buf *buf;
+
+ xa_lock(&hdev->mem_ids);
+ buf = xa_load(&hdev->mem_ids, lower_32_bits(handle >> PAGE_SHIFT));
+ if (!buf) {
+ xa_unlock(&hdev->mem_ids);
+ dev_dbg(hdev->dev, "Buff put failed, no match to handle %#llx\n", handle);
+ return -EINVAL;
+ }
+
+ if (kref_put(&buf->refcount, cn_mem_buf_remove_xa_locked)) {
+ xa_unlock(&hdev->mem_ids);
+ cn_mem_buf_destroy(buf);
+ return 1;
+ }
+
+ xa_unlock(&hdev->mem_ids);
return 0;
}
void hbl_cn_mem_init(struct hbl_cn_device *hdev)
{
+ xa_init_flags(&hdev->mem_ids, XA_FLAGS_ALLOC);
}
void hbl_cn_mem_fini(struct hbl_cn_device *hdev)
{
+ struct xarray *mem_ids;
+
+ mem_ids = &hdev->mem_ids;
+
+ if (!xa_empty(mem_ids))
+ dev_crit(hdev->dev, "memory manager is destroyed while not empty!\n");
+
+ xa_destroy(mem_ids);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 107+ messages in thread
* [PATCH 03/15] net: hbl_cn: physical layer support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 8:21 ` [PATCH 02/15] net: hbl_cn: memory manager component Omer Shpigelman
@ 2024-06-13 8:21 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
` (12 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add common physical layer (PHY) initialization and link notifications.
A notification is sent on every link up/down change.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 201 ++++++++++++++++++
1 file changed, 201 insertions(+)
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
index 0d07cd78221d..6753d54ae2b0 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
@@ -4,30 +4,231 @@
* All Rights Reserved.
*/
+#include <linux/firmware.h>
#include "hbl_cn.h"
+static void port_reset_state(struct hbl_cn_port *cn_port)
+{
+ cn_port->pcs_link = false;
+ cn_port->eq_pcs_link = false;
+ cn_port->auto_neg_resolved = false;
+ cn_port->auto_neg_skipped = false;
+ cn_port->phy_fw_tuned = false;
+ cn_port->retry_cnt = 0;
+ cn_port->pcs_remote_fault_seq_cnt = 0;
+ cn_port->pcs_link_restore_cnt = 0;
+ cn_port->correctable_errors_cnt = 0;
+ cn_port->uncorrectable_errors_cnt = 0;
+}
+
+static u32 get_data_rate(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port, speed, data_rate;
+
+ port = cn_port->port;
+ speed = cn_port->speed;
+
+ switch (speed) {
+ case SPEED_10000:
+ data_rate = NIC_DR_10;
+ break;
+ case SPEED_25000:
+ data_rate = NIC_DR_25;
+ break;
+ case SPEED_50000:
+ data_rate = NIC_DR_50;
+ break;
+ case SPEED_100000:
+ data_rate = NIC_DR_50;
+ break;
+ case SPEED_200000:
+ data_rate = NIC_DR_100;
+ break;
+ case SPEED_400000:
+ data_rate = NIC_DR_100;
+ break;
+ default:
+ data_rate = NIC_DR_50;
+ dev_err(hdev->dev, "unknown port %d speed, continuing with 50 Gbps\n", port);
+ break;
+ }
+
+ dev_dbg(hdev->dev, "port %d, speed %d data rate %d\n", port, speed, data_rate);
+
+ return data_rate;
+}
+
void hbl_cn_phy_set_port_status(struct hbl_cn_port *cn_port, bool up)
{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port = cn_port->port;
+ bool is_ibdev;
+ int rc;
+
+ aux_dev = &hdev->en_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ port_funcs = hdev->asic_funcs->port_funcs;
+ is_ibdev = hbl_cn_is_ibdev(hdev);
+
+ port_funcs->set_port_status(cn_port, up);
+
+ if (cn_port->eth_enable) {
+ if (aux_ops->set_port_status)
+ aux_ops->set_port_status(aux_dev, port, up);
+ } else {
+ if (hdev->ctx)
+ dev_info(hdev->dev, "Card %u Port %u: link %s\n",
+ hdev->card_location, port, up ? "up" : "down");
+ else
+ dev_dbg(hdev->dev, "Card %u Port %u: link %s\n",
+ hdev->card_location, port, up ? "up" : "down");
+ }
+
+ /* IB flow: the user polls for IB events.
+ * - internal ports: enqueue the link event in the EQ dispatcher. An IB event will be
+ * dispatched in response.
+ * - external ports: do not enqueue. The hbl IB driver dispatches IB events from its
+ * netdev notifier chain handler.
+ * Non-IB flow: the user polls for EQ events.
+ * - internal ports: enqueue the link event in the EQ dispatcher.
+ * - external ports: enqueue the link event in the EQ dispatcher.
+ */
+ if (!is_ibdev || !cn_port->eth_enable) {
+ if (hdev->has_eq) {
+ rc = hbl_cn_eq_dispatcher_enqueue_bcast(cn_port, &cn_port->link_eqe);
+ if (rc)
+ dev_dbg_ratelimited(hdev->dev,
+ "Port %d, failed to dispatch link event %s, %d\n",
+ port, up ? "up" : "down", rc);
+ }
+ }
+
+ cn_port->port_toggle_cnt++;
+
+ /* The FEC counters are only relevant while the link is up, hence reset them here */
+ if (up) {
+ cn_port->correctable_errors_cnt = 0;
+ cn_port->uncorrectable_errors_cnt = 0;
+ }
+
+ if (hdev->pldm) {
+ dev_dbg(hdev->dev, "%s: port %u\n", __func__, port);
+ msleep(1000);
+ }
}
int hbl_cn_phy_init(struct hbl_cn_port *cn_port)
{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ int rc;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* If mac_loopback is enabled on this port, move the port status to UP state */
+ if (cn_port->mac_loopback) {
+ cn_port->pcs_link = true;
+ hbl_cn_phy_set_port_status(cn_port, true);
+ return 0;
+ }
+
+ if (!hdev->phy_config_fw) {
+ /* If EQ is supported, it will take care of setting the port status */
+ if (!hdev->has_eq) {
+ cn_port->pcs_link = true;
+ hbl_cn_phy_set_port_status(cn_port, true);
+ }
+
+ return 0;
+ }
+
+ cn_port->data_rate = get_data_rate(cn_port);
+
+ rc = port_funcs->phy_port_power_up(cn_port);
+ if (rc) {
+ dev_err(hdev->dev, "ASIC specific phy port power-up failed, %d\n", rc);
+ return rc;
+ }
+
+ port_funcs->phy_port_start_stop(cn_port, true);
+
+ queue_delayed_work(cn_port->wq, &cn_port->link_status_work, msecs_to_jiffies(1));
+
return 0;
}
+/* This function does not change the port link status in order to avoid unnecessary netdev actions
+ * and prints. Hence, if needed, the caller should do so.
+ */
void hbl_cn_phy_fini(struct hbl_cn_port *cn_port)
{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+
+ /* This is done before the check below because mac loopback can be set for a specific
+ * port, so this function might be called while cn_port->mac_loopback is true (during
+ * the port reset after setting mac loopback) even though the link status work was
+ * scheduled earlier (when the port was opened without mac loopback).
+ */
+ cancel_delayed_work_sync(&cn_port->link_status_work);
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ if (!hdev->phy_config_fw || cn_port->mac_loopback) {
+ cn_port->pcs_link = false;
+ cn_port->eq_pcs_link = false;
+ return;
+ }
+
+ port_reset_state(cn_port);
+ port_funcs->phy_port_start_stop(cn_port, false);
}
void hbl_cn_phy_port_reconfig(struct hbl_cn_port *cn_port)
{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->phy_port_reconfig(cn_port);
+
+ port_reset_state(cn_port);
}
int hbl_cn_phy_has_binary_fw(struct hbl_cn_device *hdev)
{
+ struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ const struct firmware *fw;
+ const char *fw_name;
+ int rc;
+
+ fw_name = asic_funcs->get_phy_fw_name();
+
+ rc = request_firmware(&fw, fw_name, hdev->dev);
+ if (rc) {
+ dev_err(hdev->dev, "Firmware file %s is not found!\n", fw_name);
+ return rc;
+ }
+
+ release_firmware(fw);
+
return 0;
}
void hbl_cn_phy_set_fw_polarity(struct hbl_cn_device *hdev)
{
+ struct hbl_cn_cpucp_info *cpucp_info;
+
+ if (hdev->skip_phy_pol_cfg)
+ return;
+
+ cpucp_info = hdev->cpucp_info;
+
+ hdev->pol_tx_mask = cpucp_info->pol_tx_mask[0];
+ hdev->pol_rx_mask = cpucp_info->pol_rx_mask[0];
}
--
2.34.1
* [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (2 preceding siblings ...)
2024-06-13 8:21 ` [PATCH 03/15] net: hbl_cn: physical layer support Omer Shpigelman
@ 2024-06-13 8:21 ` Omer Shpigelman
2024-06-17 13:18 ` Leon Romanovsky
2024-06-13 8:21 ` [PATCH 05/15] net: hbl_cn: memory trace events Omer Shpigelman
` (11 subsequent siblings)
15 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add a common QP state machine which handles moving a QP from one state to
another, including performing the necessary checks, draining in-flight
transactions, invalidating caches and reporting errors.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
1 file changed, 479 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
index 9ddc23bf8194..26ebdf448193 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
@@ -6,8 +6,486 @@
#include "hbl_cn.h"
+#define OP_RETRY_COUNT 4
+#define OPC_SETTLE_RETRY_COUNT 20
+
+/* The following table represents the (valid) operations that can be performed on
+ * a QP in order to move it from one state to another
+ * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
+ * operation.
+ */
+static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
+ [CN_QP_STATE_RESET] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
+ [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
+ [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
+ },
+ [CN_QP_STATE_INIT] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
+ [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
+ [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
+ [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
+ },
+ [CN_QP_STATE_RTR] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
+ [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
+ [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
+ [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
+ },
+ [CN_QP_STATE_RTS] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
+ [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
+ [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
+ [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
+ },
+ [CN_QP_STATE_SQD] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
+ [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
+ [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
+ [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
+ },
+ [CN_QP_STATE_QPD] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
+ [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
+ [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
+ },
+ [CN_QP_STATE_SQERR] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
+ [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
+ },
+ [CN_QP_STATE_ERR] = {
+ [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
+ [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
+ }
+};
+
+static const char *cn_qp_state_2name(enum hbl_cn_qp_state state)
+{
+ static const char * const arr[CN_QP_NUM_STATE] = {
+ "Reset",
+ "Init",
+ "RTR",
+ "RTS",
+ "SQD",
+ "QPD",
+ "SQERR",
+ "ERR",
+ };
+
+ return arr[state];
+}
+
+static inline int wait_for_qpc_idle(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_qpc_attr qpc_attr;
+ int i, rc;
+
+ if (!hdev->qp_wait_for_idle)
+ return 0;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ for (i = 0; i < OPC_SETTLE_RETRY_COUNT; i++) {
+ rc = port_funcs->qpc_query(cn_port, qp->qp_id, is_req, &qpc_attr);
+
+ if (rc && (rc != -EBUSY && rc != -ETIMEDOUT))
+ return rc;
+
+ if (!(rc || qpc_attr.in_work))
+ return 0;
+
+ /* Release lock while we wait before retry.
+ * Note, we can assert that we are already locked.
+ */
+ port_funcs->cfg_unlock(cn_port);
+
+ msleep(20);
+
+ port_funcs->cfg_lock(cn_port);
+ }
+
+ rc = port_funcs->qpc_query(cn_port, qp->qp_id, is_req, &qpc_attr);
+
+ if (rc && (rc != -EBUSY && rc != -ETIMEDOUT))
+ return rc;
+
+ if (rc || qpc_attr.in_work)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int cn_qp_op_reset(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ int rc, rc1;
+
+ asic_funcs = hdev->asic_funcs;
+
+ /* clear the QPCs */
+ rc = asic_funcs->port_funcs->qpc_clear(cn_port, qp, false);
+ if (rc && hbl_cn_comp_device_operational(hdev))
+ /* Device might not respond during reset if the reset was due to error */
+ dev_err(hdev->dev, "Port %d QP %d: Failed to clear responder QPC\n",
+ qp->port, qp->qp_id);
+ else
+ qp->is_res = false;
+
+ rc1 = asic_funcs->port_funcs->qpc_clear(cn_port, qp, true);
+ if (rc1) {
+ rc = rc1;
+ if (hbl_cn_comp_device_operational(hdev))
+ /* Device might not respond during reset if the reset was due to error */
+ dev_err(hdev->dev, "Port %d QP %d: Failed to clear requestor QPC\n",
+ qp->port, qp->qp_id);
+ } else {
+ qp->is_req = false;
+ }
+
+ /* wait for REQ idle, RES idle is already done in cn_qp_op_2qpd */
+ rc = wait_for_qpc_idle(cn_port, qp, true);
+ if (rc) {
+ dev_err(hdev->dev, "Port %d QP %d, Requestor QPC is not idle (rc %d)\n",
+ cn_port->port, qp->qp_id, rc);
+ return rc;
+ }
+
+ qp->curr_state = CN_QP_STATE_RESET;
+
+ return rc;
+}
+
+static int cn_qp_op_reset_2init(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp)
+{
+ if (ZERO_OR_NULL_PTR(qp))
+ return -EINVAL;
+
+ qp->curr_state = CN_QP_STATE_INIT;
+
+ return 0;
+}
+
+static int cn_qp_op_2rts(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ struct hbl_cni_req_conn_ctx_in *in)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ rc = asic_funcs->set_req_qp_ctx(hdev, in, qp);
+ if (rc)
+ return rc;
+
+ qp->curr_state = CN_QP_STATE_RTS;
+ qp->is_req = true;
+
+ return 0;
+}
+
+static int cn_qp_op_2rtr(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ struct hbl_cni_res_conn_ctx_in *in)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ rc = asic_funcs->set_res_qp_ctx(hdev, in, qp);
+ if (rc)
+ return rc;
+
+ qp->curr_state = CN_QP_STATE_RTR;
+ qp->is_res = true;
+
+ return 0;
+}
+
+static inline int cn_qp_invalidate_qpc(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ bool is_req)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ int i, rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ for (i = 0; i < OP_RETRY_COUNT; i++) {
+ rc = asic_funcs->port_funcs->qpc_invalidate(cn_port, qp, is_req);
+ if (!rc)
+ break;
+
+ usleep_range(100, 200);
+ }
+
+ return rc;
+}
+
+static int cn_qp_invalidate(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req,
+ bool wait_for_idle)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ int rc;
+
+ rc = cn_qp_invalidate_qpc(cn_port, qp, is_req);
+ if (rc) {
+ if (hbl_cn_comp_device_operational(hdev))
+ dev_err(hdev->dev, "Port %d QP %d, failed to invalidate %s QPC (rc %d)\n",
+ cn_port->port, qp->qp_id, is_req ? "Requester" : "Responder", rc);
+ return rc;
+ }
+
+ if (!wait_for_idle || is_req)
+ return 0;
+
+ /* check for QP idle in case of responder only */
+ rc = wait_for_qpc_idle(cn_port, qp, false);
+ if (rc) {
+ dev_err(hdev->dev, "Port %d QP %d, Responder QPC is not idle (rc %d)\n",
+ cn_port->port, qp->qp_id, rc);
+ return rc;
+ }
+
+ return rc;
+}
+
+/* Drain the Requester */
+static int cn_qp_op_rts_2sqd(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, void *attr)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_qpc_drain_attr *drain = attr;
+ int rc = 0;
+
+ switch (qp->curr_state) {
+ case CN_QP_STATE_RTS:
+ cn_qp_invalidate(cn_port, qp, true, drain->wait_for_idle);
+ if (drain->wait_for_idle)
+ ssleep(hdev->qp_drain_time);
+
+ break;
+ default:
+ rc = -EOPNOTSUPP;
+ break;
+ }
+
+ if (!rc)
+ qp->curr_state = CN_QP_STATE_SQD;
+
+ return rc;
+}
+
+/* Re-drain the Requester. This function is called without holding the cfg lock, so it must not
+ * access the HW or do anything other than sleep.
+ */
+static int cn_qp_op_sqd_2sqd(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, void *attr)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_qpc_drain_attr *drain = attr;
+
+ /* no need to invalidate the QP as it was already invalidated; just extend the wait time */
+ if (drain->wait_for_idle)
+ ssleep(hdev->qp_drain_time);
+
+ return 0;
+}
+
+/* Drain the QP (Requester and Responder) */
+static int cn_qp_op_2qpd(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, void *attr)
+{
+ struct hbl_cn_qpc_drain_attr *drain = attr;
+ int rc = 0;
+
+ switch (qp->curr_state) {
+ case CN_QP_STATE_RTR:
+ /* In RTR only the Resp is working */
+ cn_qp_invalidate(cn_port, qp, false, drain->wait_for_idle);
+ break;
+ case CN_QP_STATE_RTS:
+ /* In RTS both the Resp and Req are working */
+ cn_qp_op_rts_2sqd(cn_port, qp, attr);
+ cn_qp_invalidate(cn_port, qp, false, drain->wait_for_idle);
+ break;
+ case CN_QP_STATE_SQD:
+ /* In SQD only the Resp is working */
+ cn_qp_invalidate(cn_port, qp, false, drain->wait_for_idle);
+ break;
+ case CN_QP_STATE_QPD:
+ break;
+ default:
+ rc = -EOPNOTSUPP;
+ break;
+ }
+
+ if (!rc)
+ qp->curr_state = CN_QP_STATE_QPD;
+
+ return rc;
+}
+
+static int cn_qp_op_2reset(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
+ const struct hbl_cn_qpc_reset_attr *attr)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_qpc_drain_attr drain;
+
+ asic_funcs = hdev->asic_funcs;
+
+ /* brute-force reset when reset mode is Hard */
+ if (attr->reset_mode == CN_QP_RESET_MODE_HARD && qp->curr_state != CN_QP_STATE_RESET) {
+ /* invalidate */
+ asic_funcs->port_funcs->qpc_invalidate(cn_port, qp, true);
+ asic_funcs->port_funcs->qpc_invalidate(cn_port, qp, false);
+
+ /* wait for the HW to digest the invalidation */
+ usleep_range(100, 150);
+
+ cn_qp_op_reset(cn_port, qp);
+ return 0;
+ }
+
+ if (attr->reset_mode == CN_QP_RESET_MODE_GRACEFUL)
+ drain.wait_for_idle = true;
+ else
+ drain.wait_for_idle = false;
+
+ switch (qp->curr_state) {
+ case CN_QP_STATE_RESET:
+ break;
+ case CN_QP_STATE_INIT:
+ cn_qp_op_reset(cn_port, qp);
+ break;
+ case CN_QP_STATE_RTR:
+ case CN_QP_STATE_RTS:
+ case CN_QP_STATE_SQD:
+ cn_qp_op_2qpd(cn_port, qp, &drain);
+ cn_qp_op_reset(cn_port, qp);
+ break;
+ case CN_QP_STATE_QPD:
+ cn_qp_op_reset(cn_port, qp);
+ break;
+ case CN_QP_STATE_SQERR:
+ case CN_QP_STATE_ERR:
+ cn_qp_op_reset(cn_port, qp);
+ break;
+ default:
+ dev_err(hdev->dev, "Port %d QP %d: Unknown state %d, moving to RESET state\n",
+ qp->port, qp->qp_id, qp->curr_state);
+ cn_qp_op_reset(cn_port, qp);
+ break;
+ }
+
+ return 0;
+}
+
+/* QP state handling routines */
int hbl_cn_qp_modify(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
enum hbl_cn_qp_state new_state, void *params)
{
- return 0;
+ enum hbl_cn_qp_state prev_state = qp->curr_state;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ enum hbl_cn_qp_state_op op;
+ int rc;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* only SQD->SQD transition can be executed without holding the configuration lock */
+ if (prev_state != CN_QP_STATE_SQD || new_state != CN_QP_STATE_SQD) {
+ if (!port_funcs->cfg_is_locked(cn_port)) {
+ dev_err(hdev->dev,
+ "Configuration lock must be held while moving Port %u QP %u from state %s to %s\n",
+ qp->port, qp->qp_id, cn_qp_state_2name(prev_state),
+ cn_qp_state_2name(new_state));
+ return -EACCES;
+ }
+ }
+
+ if (qp->curr_state >= CN_QP_NUM_STATE || new_state >= CN_QP_NUM_STATE ||
+ qp_valid_state_op[qp->curr_state][new_state] == CN_QP_OP_INVAL) {
+ dev_err(hdev->dev,
+ "Invalid QP state transition, Port %u QP %u from state %s to %s\n",
+ qp->port, qp->qp_id, cn_qp_state_2name(prev_state),
+ cn_qp_state_2name(new_state));
+ return -EINVAL;
+ }
+
+ /* get the operation needed for this state transition */
+ op = qp_valid_state_op[qp->curr_state][new_state];
+
+ switch (op) {
+ case CN_QP_OP_2RESET:
+ rc = cn_qp_op_2reset(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RST_2INIT:
+ rc = cn_qp_op_reset_2init(cn_port, qp);
+ break;
+ case CN_QP_OP_INIT_2RTR:
+ rc = cn_qp_op_2rtr(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTR_2RTR:
+ rc = cn_qp_op_2rtr(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTR_2QPD:
+ rc = cn_qp_op_2qpd(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTR_2RTS:
+ rc = cn_qp_op_2rts(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTS_2RTS:
+ rc = cn_qp_op_2rts(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTS_2SQD:
+ rc = cn_qp_op_rts_2sqd(cn_port, qp, params);
+ break;
+ case CN_QP_OP_SQD_2SQD:
+ rc = cn_qp_op_sqd_2sqd(cn_port, qp, params);
+ break;
+ case CN_QP_OP_RTS_2QPD:
+ rc = cn_qp_op_2qpd(cn_port, qp, params);
+ break;
+ case CN_QP_OP_SQD_2QPD:
+ rc = cn_qp_op_2qpd(cn_port, qp, params);
+ break;
+ case CN_QP_OP_INVAL:
+ rc = -EINVAL;
+ break;
+ case CN_QP_OP_NOP:
+ rc = 0;
+ break;
+ default:
+ rc = -EOPNOTSUPP;
+ break;
+ }
+
+ if (rc)
+ dev_err(hdev->dev,
+ "Errors detected while moving Port %u QP %u from state %s to %s, (rc %d)\n",
+ qp->port, qp->qp_id, cn_qp_state_2name(prev_state),
+ cn_qp_state_2name(new_state), rc);
+
+ return rc;
}
--
2.34.1
* [PATCH 05/15] net: hbl_cn: memory trace events
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (3 preceding siblings ...)
2024-06-13 8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
@ 2024-06-13 8:21 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
` (10 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add trace events for hbl_cn to track memory allocations.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 28 ++++-
.../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 3 +
.../intel/hbl_cn/common/hbl_cn_memory.c | 9 ++
include/trace/events/habanalabs_cn.h | 116 ++++++++++++++++++
4 files changed, 154 insertions(+), 2 deletions(-)
create mode 100644 include/trace/events/habanalabs_cn.h
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
index 946b11bfa61b..4e910b2cb8ac 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
@@ -12,6 +12,8 @@
#include <linux/pci.h>
#include <linux/slab.h>
+#include <trace/events/habanalabs_cn.h>
+
#define NIC_MIN_WQS_PER_PORT 2
#define NIC_SEQ_RESETS_TIMEOUT_MS 15000 /* 15 seconds */
@@ -5892,8 +5894,15 @@ void *__hbl_cn_dma_alloc_coherent(struct hbl_cn_device *hdev, size_t size, dma_a
gfp_t flag, const char *caller)
{
const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ void *ptr;
+
+ ptr = asic_funcs->dma_alloc_coherent(hdev, size, dma_handle, flag);
- return asic_funcs->dma_alloc_coherent(hdev, size, dma_handle, flag);
+ if (trace_habanalabs_cn_dma_alloc_coherent_enabled())
+ trace_habanalabs_cn_dma_alloc_coherent(hdev->dev, (u64)(uintptr_t)ptr, *dma_handle,
+ size, caller);
+
+ return ptr;
}
void __hbl_cn_dma_free_coherent(struct hbl_cn_device *hdev, size_t size, void *cpu_addr,
@@ -5902,14 +5911,25 @@ void __hbl_cn_dma_free_coherent(struct hbl_cn_device *hdev, size_t size, void *c
const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
asic_funcs->dma_free_coherent(hdev, size, cpu_addr, dma_addr);
+
+ if (trace_habanalabs_cn_dma_free_coherent_enabled())
+ trace_habanalabs_cn_dma_free_coherent(hdev->dev, (u64)(uintptr_t)cpu_addr, dma_addr,
+ size, caller);
}
void *__hbl_cn_dma_pool_zalloc(struct hbl_cn_device *hdev, size_t size, gfp_t mem_flags,
dma_addr_t *dma_handle, const char *caller)
{
const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
+ void *ptr;
- return asic_funcs->dma_pool_zalloc(hdev, size, mem_flags, dma_handle);
+ ptr = asic_funcs->dma_pool_zalloc(hdev, size, mem_flags, dma_handle);
+
+ if (trace_habanalabs_cn_dma_pool_zalloc_enabled())
+ trace_habanalabs_cn_dma_pool_zalloc(hdev->dev, (u64)(uintptr_t)ptr, *dma_handle,
+ size, caller);
+
+ return ptr;
}
void __hbl_cn_dma_pool_free(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t dma_addr,
@@ -5918,6 +5938,10 @@ void __hbl_cn_dma_pool_free(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t
const struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
asic_funcs->dma_pool_free(hdev, vaddr, dma_addr);
+
+ if (trace_habanalabs_cn_dma_pool_free_enabled())
+ trace_habanalabs_cn_dma_pool_free(hdev->dev, (u64)(uintptr_t)vaddr, dma_addr, 0,
+ caller);
}
int hbl_cn_get_reg_pcie_addr(struct hbl_cn_device *hdev, u8 bar_id, u32 reg, u64 *pci_addr)
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
index 47eedd27f36e..5ea690509592 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
@@ -12,6 +12,9 @@
#include <linux/auxiliary_bus.h>
#include <linux/sched/clock.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/habanalabs_cn.h>
+
#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Core Network driver"
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
index 878ecba66aa3..305b5b85acbe 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
@@ -6,6 +6,7 @@
#include <linux/vmalloc.h>
#include "hbl_cn.h"
+#include <trace/events/habanalabs_cn.h>
static int hbl_cn_map_vmalloc_range(struct hbl_cn_ctx *ctx, u64 vmalloc_va, u64 device_va,
u64 size)
@@ -201,12 +202,16 @@ static struct hbl_cn_mem_buf *cn_mem_buf_alloc(struct hbl_cn_ctx *ctx, gfp_t gfp
static int cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
{
+ struct hbl_cn_device *hdev = ctx->hdev;
struct hbl_cn_mem_buf *buf;
buf = cn_mem_buf_alloc(ctx, GFP_KERNEL, mem_data);
if (!buf)
return -ENOMEM;
+ trace_habanalabs_cn_mem_alloc(hdev->dev, buf->mem_id, buf->handle, (u64)buf->kernel_address,
+ buf->bus_address, buf->device_va, buf->mappable_size);
+
mem_data->handle = buf->handle;
if (mem_data->mem_id == HBL_CN_DRV_MEM_HOST_DMA_COHERENT)
@@ -242,6 +247,10 @@ int hbl_cn_mem_alloc(struct hbl_cn_ctx *ctx, struct hbl_cn_mem_data *mem_data)
static void cn_mem_buf_destroy(struct hbl_cn_mem_buf *buf)
{
+ trace_habanalabs_cn_mem_destroy(buf->ctx->hdev->dev, buf->mem_id, buf->handle,
+ (u64)buf->kernel_address, buf->bus_address, buf->device_va,
+ buf->mappable_size);
+
if (buf->device_va)
hbl_cn_unmap_vmalloc_range(buf->ctx, buf->device_va, buf->mappable_size);
diff --git a/include/trace/events/habanalabs_cn.h b/include/trace/events/habanalabs_cn.h
new file mode 100644
index 000000000000..aca962cf3130
--- /dev/null
+++ b/include/trace/events/habanalabs_cn.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2023 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ *
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM habanalabs_cn
+
+#if !defined(_TRACE_HABANALABS_CN_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HABANALABS_CN_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(habanalabs_cn_mem_template,
+ TP_PROTO(struct device *dev, u32 mem_id, u64 handle, u64 kernel_addr, u64 bus_addr,
+ u64 device_va, size_t size),
+
+ TP_ARGS(dev, mem_id, handle, kernel_addr, bus_addr, device_va, size),
+
+ TP_STRUCT__entry(
+ __string(dname, dev_name(dev))
+ __field(u32, mem_id)
+ __field(u64, handle)
+ __field(u64, kernel_addr)
+ __field(u64, bus_addr)
+ __field(u64, device_va)
+ __field(u32, size)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dname);
+ __entry->mem_id = mem_id;
+ __entry->handle = handle;
+ __entry->kernel_addr = kernel_addr;
+ __entry->bus_addr = bus_addr;
+ __entry->device_va = device_va;
+ __entry->size = size;
+ ),
+
+ TP_printk("%s: mem_id: %#x, handle: %#llx, kernel_addr: %#llx, bus_addr: %#llx, device_va: %#llx, size: %#x",
+ __get_str(dname),
+ __entry->mem_id,
+ __entry->handle,
+ __entry->kernel_addr,
+ __entry->bus_addr,
+ __entry->device_va,
+ __entry->size)
+);
+
+DEFINE_EVENT(habanalabs_cn_mem_template, habanalabs_cn_mem_alloc,
+ TP_PROTO(struct device *dev, u32 mem_id, u64 handle, u64 kernel_addr, u64 bus_addr,
+ u64 device_va, size_t size),
+ TP_ARGS(dev, mem_id, handle, kernel_addr, bus_addr, device_va, size));
+
+DEFINE_EVENT(habanalabs_cn_mem_template, habanalabs_cn_mem_destroy,
+ TP_PROTO(struct device *dev, u32 mem_id, u64 handle, u64 kernel_addr, u64 bus_addr,
+ u64 device_va, size_t size),
+ TP_ARGS(dev, mem_id, handle, kernel_addr, bus_addr, device_va, size));
+
+DECLARE_EVENT_CLASS(habanalabs_cn_dma_alloc_template,
+ TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size, const char *caller),
+
+ TP_ARGS(dev, cpu_addr, dma_addr, size, caller),
+
+ TP_STRUCT__entry(
+ __string(dname, dev_name(dev))
+ __field(u64, cpu_addr)
+ __field(u64, dma_addr)
+ __field(u32, size)
+ __field(const char *, caller)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dname);
+ __entry->cpu_addr = cpu_addr;
+ __entry->dma_addr = dma_addr;
+ __entry->size = size;
+ __entry->caller = caller;
+ ),
+
+ TP_printk("%s: cpu_addr: %#llx, dma_addr: %#llx, size: %#x, caller: %s",
+ __get_str(dname),
+ __entry->cpu_addr,
+ __entry->dma_addr,
+ __entry->size,
+ __entry->caller
+ )
+);
+
+DEFINE_EVENT(habanalabs_cn_dma_alloc_template, habanalabs_cn_dma_alloc_coherent,
+ TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size,
+ const char *caller),
+ TP_ARGS(dev, cpu_addr, dma_addr, size, caller));
+
+DEFINE_EVENT(habanalabs_cn_dma_alloc_template, habanalabs_cn_dma_free_coherent,
+ TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size,
+ const char *caller),
+ TP_ARGS(dev, cpu_addr, dma_addr, size, caller));
+
+DEFINE_EVENT(habanalabs_cn_dma_alloc_template, habanalabs_cn_dma_pool_zalloc,
+ TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size,
+ const char *caller),
+ TP_ARGS(dev, cpu_addr, dma_addr, size, caller));
+
+DEFINE_EVENT(habanalabs_cn_dma_alloc_template, habanalabs_cn_dma_pool_free,
+ TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size,
+ const char *caller),
+ TP_ARGS(dev, cpu_addr, dma_addr, size, caller));
+
+#endif /* if !defined(_TRACE_HABANALABS_CN_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.34.1
* [PATCH 06/15] net: hbl_cn: debugfs support
From: Omer Shpigelman @ 2024-06-13 8:21 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add debugfs files for advanced settings, debug options and testing.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../ABI/testing/debugfs-driver-habanalabs_cn | 195 +++
MAINTAINERS | 1 +
drivers/net/ethernet/intel/hbl_cn/Makefile | 2 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 4 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 37 +
.../intel/hbl_cn/common/hbl_cn_debugfs.c | 1457 +++++++++++++++++
.../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 19 +-
7 files changed, 1714 insertions(+), 1 deletion(-)
create mode 100644 Documentation/ABI/testing/debugfs-driver-habanalabs_cn
create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c
diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs_cn b/Documentation/ABI/testing/debugfs-driver-habanalabs_cn
new file mode 100644
index 000000000000..5de689b0986d
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs_cn
@@ -0,0 +1,195 @@
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_accumulate_fec_duration
+Date: May 2024
+KernelVersion: N/A
+Contact: kvabhilash@habana.ai
+Description: Time (in ms) over which FEC errors are accumulated. This is
+ used as the divisor when calculating the pre- and post-FEC
+ (Forward Error Correction) SER (Symbol Error Rate).
+ Valid values are from 1 to ACCUMULATE_FEC_STATS_DURATION_MS_MAX.
+ Default value is ACCUMULATE_FEC_STATS_DURATION_MS.
+ Usage: echo <time_ms> > nic_accumulate_fec_duration
+ cat nic_accumulate_fec_duration
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_disable_decap
+Date: May 2024
+KernelVersion: 6.9
+Contact: gkgurumurthy@habana.ai
+Description: Allows the root user to enable/disable the decapsulation logic
+ on the RX path.
+ '1' disables the logic, '0' enables the logic.
+ The typical use case is to disable decap on the RX side while
+ encap is enabled on the TX side, and then run RDMA traffic.
+ The packets are then forwarded to the Ethernet ring and
+ ultimately delivered to the Linux kernel, where the user can
+ analyze them with tcpdump and validate the encapsulation header.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_inject_rx_err
+Date: May 2024
+KernelVersion: 6.9
+Contact: akishore@habana.ai
+Description: Allows the root user to force RX packet drop.
+ Usage: echo <drop_percent> > nic_inject_rx_err
+ cat nic_inject_rx_err
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_lane_remap
+Date: May 2024
+KernelVersion: 6.9
+Contact: akishore@habana.ai
+Description: Allows the root user to change the MAC-to-PHY lane mapping.
+ The user should provide a space-separated bitmap for all lanes
+ on all NIC macros.
+ Usage: echo macro0 macro1 macro2 ... macroX > nic_mac_lane_remap
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_loopback
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Allows the root user to enable/disable MAC loopback for each
+ NIC port. The ports will function as if a physical loopback
+ transceiver were connected. A bitmask should be provided where
+ each bit represents a port.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mmu_bypass
+Date: May 2024
+KernelVersion: 6.9
+Contact: zyehudai@habana.ai
+Description: Sets the NIC to use MMU bypass for its allocated data
+ structures. A value of "0" disables it; any other value
+ enables it.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_nrz_tx_taps
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to set and show the NRZ Tx taps for the
+ port lanes. The lane indices are 0-47.
+ Acceptable input string form:
+ <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2>.
+ cat nic_nrz_tx_taps will dump all taps.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_override_port_status
+Date: May 2024
+KernelVersion: 6.9
+Contact: aagranovich@habana.ai
+Description: Allows the root user to force-set a port's link status.
+ Usage: echo <port> <status> > nic_override_port_status
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_pam4_tx_taps
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Allows the root user to set and show the PAM4 Tx taps for the
+ port lanes. The lane indices are 0-47.
+ Acceptable input string form:
+ <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2>.
+ cat nic_pam4_tx_taps will dump all taps.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_calc_ber
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to enable calculation of PHY BER during the
+ PHY power_up flow.
+ A value of "0" disables it; any other value enables it.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_calc_ber_wait_sec
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to set the waiting time in seconds before
+ BER calculation.
+ Usage: echo <time_in_seconds> > nic_phy_calc_ber_wait_sec
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_dump_serdes_params
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to dump serdes parameters for a given port.
+ Write the port number and then read the file to dump the
+ parameters.
+ Usage: echo <port> > nic_phy_dump_serdes_params
+ cat nic_phy_dump_serdes_params
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_force_first_tx_taps_cfg
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to force every PHY power-up to apply the
+ first Tx taps configuration instead of the current one.
+ Usage: echo 1 > nic_phy_force_first_tx_taps_cfg
+ cat nic_phy_force_first_tx_taps_cfg
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_regs_print
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to dump all PHY register reads/writes
+ during the PHY power_up flow.
+ A value of "0" disables it; any other value enables it.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_phy_set_nrz
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to set/unset NRZ mode (25Gbps speed).
+ Usage: echo 0/1 > nic_phy_set_nrz
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_polarity
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Allows the root user to set the polarity for the port lanes.
+ The lane indices are 0-47.
+ Acceptable input string form: <lane> <pol_tx> <pol_rx>.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_print_fec_stats
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to dump the FEC stats for all NIC ports to
+ dmesg.
+ Usage: cat nic_print_fec_stats
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_qp
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Reads a QP (Queue Pair) HW structure content. Write the QP
+ number, port, type and other flags to the file, then read the
+ file to dump the QP content.
+ Input form: <port> <qpn> <is_req> <is_full_print> <force_read>.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_rand_status
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Enables randomization of the values returned in the NIC status
+ packet. A value of "0" disables it; any other value enables it.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_reset_cnt
+Date: May 2024
+KernelVersion: 6.9
+Contact: oshpigelman@habana.ai
+Description: Resets NIC counters. Any decimal value is a valid input.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_show_internal_ports_status
+Date: May 2024
+KernelVersion: 6.9
+Contact: sozeri@habana.ai
+Description: Allows the root user to read the link status of all internal
+ NIC ports.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_wqe
+Date: May 2024
+KernelVersion: 6.9
+Contact: zyehudai@habana.ai
+Description: Reads a WQE (Work Queue Entry) HW structure content. Write the
+ port, QP number, WQE index and its type to the file, then read
+ the file to dump the WQE content.
+ Input form: <port> <qpn> <wqe_idx> <is_tx>.
+
+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_wqe_index_checker
+Date: May 2024
+KernelVersion: 6.9
+Contact: dmeriin@habana.ai
+Description: Allows the root user to enable/disable the WQE index checker.
+ A value of "0" disables it; any other value enables it.
+ Usage: echo <enable> > nic_wqe_index_checker
diff --git a/MAINTAINERS b/MAINTAINERS
index e948e33e990d..906224204aba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9610,6 +9610,7 @@ L: netdev@vger.kernel.org
L: linux-rdma@vger.kernel.org
S: Supported
W: https://www.habana.ai
+F: Documentation/ABI/testing/debugfs-driver-habanalabs_cn
F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
F: drivers/net/ethernet/intel/hbl_cn/
F: include/linux/habanalabs/
diff --git a/drivers/net/ethernet/intel/hbl_cn/Makefile b/drivers/net/ethernet/intel/hbl_cn/Makefile
index 84ee2a6d7c3b..c2c4142f18a0 100644
--- a/drivers/net/ethernet/intel/hbl_cn/Makefile
+++ b/drivers/net/ethernet/intel/hbl_cn/Makefile
@@ -7,3 +7,5 @@ obj-$(CONFIG_HABANA_CN) := habanalabs_cn.o
include $(src)/common/Makefile
habanalabs_cn-y += $(HBL_CN_COMMON_FILES)
+
+habanalabs_cn-$(CONFIG_DEBUG_FS) += common/hbl_cn_debugfs.o
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
index 4e910b2cb8ac..fc7bd6474404 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
@@ -1842,6 +1842,8 @@ int hbl_cn_dev_init(struct hbl_cn_device *hdev)
hbl_cn_late_init(hdev);
+ hbl_cn_debugfs_dev_init(hdev);
+
hdev->is_initialized = true;
return 0;
@@ -1878,6 +1880,8 @@ void hbl_cn_dev_fini(struct hbl_cn_device *hdev)
hbl_cn_stop(hdev->cn_aux_dev);
}
+ hbl_cn_debugfs_dev_fini(hdev);
+
hbl_cn_late_fini(hdev);
hbl_cn_ib_aux_drv_fini(hdev);
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
index 27139e93d990..0b96cd3db719 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
@@ -142,6 +142,34 @@
#define hbl_cn_dma_pool_free(hdev, vaddr, dma_addr) \
__hbl_cn_dma_pool_free(hdev, vaddr, dma_addr, __func__)
+/* CN debugfs files enum */
+enum hbl_cn_debugfs_files_idx {
+ NIC_MAC_LOOPBACK = 0,
+ NIC_PAM4_TX_TAPS,
+ NIC_NRZ_TX_TAPS,
+ NIC_POLARITY,
+ NIC_QP,
+ NIC_WQE,
+ NIC_RESET_CNT,
+ NIC_MAC_LANE_REMAP,
+ NIC_RAND_STATUS,
+ NIC_MMU_BYPASS,
+ NIC_ETH_LOOPBACK,
+ NIC_PHY_REGS_PRINT,
+ NIC_SHOW_INTERNAL_PORTS_STATUS,
+ NIC_PRINT_FEC_STATS,
+ NIC_DISABLE_DECAP,
+ NIC_PHY_SET_NRZ,
+ NIC_PHY_DUMP_SERDES_PARAMS,
+ NIC_INJECT_RX_ERR,
+ NIC_PHY_CALC_BER,
+ NIC_PHY_CALC_BER_WAIT_SEC,
+ NIC_OVERRIDE_PORT_STATUS,
+ NIC_WQE_INDEX_CHECKER,
+ NIC_ACCUMULATE_FEC_DURATION,
+ NIC_PHY_FORCE_FIRST_TX_TAPS_CFG,
+};
+
extern struct hbl_cn_stat hbl_cn_mac_fec_stats[];
extern struct hbl_cn_stat hbl_cn_mac_stats_rx[];
extern struct hbl_cn_stat hbl_cn_mac_stats_tx[];
@@ -1312,6 +1340,7 @@ struct hbl_cn_properties {
* struct hbl_cn_device - habanalabs CN device structure.
* @pdev: pointer to PCI device.
* @dev: related kernel basic device structure.
+ * @cn_dentry: CN debugfs root dentry.
* @cpucp_info: FW info.
* @asic_funcs: ASIC specific functions that can be called from common code.
* @phy_tx_taps: array that holds all PAM4 Tx taps of all lanes.
@@ -1341,6 +1370,7 @@ struct hbl_cn_properties {
* @mac_loopback: enable MAC loopback on specific ports.
* @dram_size: available DRAM size.
* @mmap_type_flag: flag to indicate NIC MMAP type.
+ * @debugfs_supp_mask: mask of supported debugfs files.
* @pol_tx_mask: bitmap of tx polarity for all lanes.
* @pol_rx_mask: bitmap of rx polarity for all lanes.
* @device_timeout: device access timeout in usec.
@@ -1411,6 +1441,7 @@ struct hbl_cn_properties {
struct hbl_cn_device {
struct pci_dev *pdev;
struct device *dev;
+ struct dentry *cn_dentry;
struct hbl_cn_cpucp_info *cpucp_info;
struct hbl_cn_asic_funcs *asic_funcs;
struct hbl_cn_tx_taps *phy_tx_taps;
@@ -1441,6 +1472,7 @@ struct hbl_cn_device {
u64 mac_loopback;
u64 dram_size;
u64 mmap_type_flag;
+ u64 debugfs_supp_mask;
u64 pol_tx_mask;
u64 pol_rx_mask;
u32 device_timeout;
@@ -1534,6 +1566,8 @@ void hbl_cn_phy_set_port_status(struct hbl_cn_port *cn_port, bool up);
int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data);
int hbl_cn_qp_modify(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp,
enum hbl_cn_qp_state new_state, void *params);
+void hbl_cn_debugfs_dev_init(struct hbl_cn_device *hdev);
+void hbl_cn_debugfs_dev_fini(struct hbl_cn_device *hdev);
u32 hbl_cn_get_max_qp_id(struct hbl_cn_port *cn_port);
bool hbl_cn_is_port_open(struct hbl_cn_port *cn_port);
u32 hbl_cn_get_pflags(struct hbl_cn_port *cn_port);
@@ -1614,6 +1648,9 @@ int __hbl_cn_ports_reopen(struct hbl_cn_device *hdev);
void __hbl_cn_hard_reset_prepare(struct hbl_cn_device *hdev, bool fw_reset, bool in_teardown);
void __hbl_cn_stop(struct hbl_cn_device *hdev);
+void __init hbl_cn_debugfs_init(void);
+void hbl_cn_debugfs_fini(void);
+
/* DMA memory allocations */
void *__hbl_cn_dma_alloc_coherent(struct hbl_cn_device *hdev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, const char *caller);
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c
new file mode 100644
index 000000000000..b13da28fdcb7
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c
@@ -0,0 +1,1457 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_cn.h"
+
+#ifdef CONFIG_DEBUG_FS
+
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/nospec.h>
+
+#define POLARITY_KBUF_SIZE 8
+#define TX_TAPS_KBUF_SIZE 25
+#define KBUF_IN_SIZE 18
+#define KBUF_OUT_SIZE BIT(12)
+#define KBUF_OUT_BIG_SIZE BIT(14)
+#define MAC_LANE_REMAP_READ_SIZE 10
+#define MAX_INT_PORT_STS_KBUF_SIZE 20
+#define HBL_CN_DEBUGFS_CREATE_FILE(op, perm, dir, dev, fops) \
+ do { \
+ enum hbl_cn_debugfs_files_idx __op = op; \
+ if (hdev->debugfs_supp_mask & BIT(__op)) \
+ debugfs_create_file(hbl_cn_debugfs_names[__op], perm, dir, dev, fops); \
+ } while (0)
+
+#define HBL_CN_DEBUGFS_CREATE_U8(op, perm, dir, fops) \
+ do { \
+ enum hbl_cn_debugfs_files_idx __op = op; \
+ if (hdev->debugfs_supp_mask & BIT(__op)) \
+ debugfs_create_u8(hbl_cn_debugfs_names[__op], perm, dir, fops); \
+ } while (0)
+
+#define HBL_CN_DEBUGFS_CREATE_U16(op, perm, dir, fops) \
+ do { \
+ enum hbl_cn_debugfs_files_idx __op = op; \
+ if (hdev->debugfs_supp_mask & BIT(__op)) \
+ debugfs_create_u16(hbl_cn_debugfs_names[__op], perm, dir, fops); \
+ } while (0)
+
+static char hbl_cn_debugfs_names[][NAME_MAX] = {
+ [NIC_MAC_LOOPBACK] = "nic_mac_loopback",
+ [NIC_PAM4_TX_TAPS] = "nic_pam4_tx_taps",
+ [NIC_NRZ_TX_TAPS] = "nic_nrz_tx_taps",
+ [NIC_POLARITY] = "nic_polarity",
+ [NIC_QP] = "nic_qp",
+ [NIC_WQE] = "nic_wqe",
+ [NIC_RESET_CNT] = "nic_reset_cnt",
+ [NIC_MAC_LANE_REMAP] = "nic_mac_lane_remap",
+ [NIC_RAND_STATUS] = "nic_rand_status",
+ [NIC_MMU_BYPASS] = "nic_mmu_bypass",
+ [NIC_ETH_LOOPBACK] = "nic_eth_loopback",
+ [NIC_PHY_REGS_PRINT] = "nic_phy_regs_print",
+ [NIC_SHOW_INTERNAL_PORTS_STATUS] = "nic_show_internal_ports_status",
+ [NIC_PRINT_FEC_STATS] = "nic_print_fec_stats",
+ [NIC_DISABLE_DECAP] = "nic_disable_decap",
+ [NIC_PHY_SET_NRZ] = "nic_phy_set_nrz",
+ [NIC_PHY_DUMP_SERDES_PARAMS] = "nic_phy_dump_serdes_params",
+ [NIC_INJECT_RX_ERR] = "nic_inject_rx_err",
+ [NIC_PHY_CALC_BER] = "nic_phy_calc_ber",
+ [NIC_PHY_CALC_BER_WAIT_SEC] = "nic_phy_calc_ber_wait_sec",
+ [NIC_OVERRIDE_PORT_STATUS] = "nic_override_port_status",
+ [NIC_WQE_INDEX_CHECKER] = "nic_wqe_index_checker",
+ [NIC_ACCUMULATE_FEC_DURATION] = "nic_accumulate_fec_duration",
+ [NIC_PHY_FORCE_FIRST_TX_TAPS_CFG] = "nic_phy_force_first_tx_taps_cfg",
+};
+
+static struct dentry *hbl_cn_debug_root;
+
+static int hbl_device_hard_reset_sync(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ ktime_t timeout;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->device_reset(aux_dev);
+
+ timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * 1000ull);
+ while (!hbl_cn_comp_device_operational(hdev) && !READ_ONCE(hdev->in_teardown)) {
+ ssleep(1);
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ dev_crit(hdev->dev, "Timed out waiting for hard reset to finish\n");
+ return -ETIMEDOUT;
+ }
+ }
+
+ return 0;
+}
+
+static ssize_t debugfs_print_value_to_buffer(void *buf, size_t count, loff_t *ppos, char *fmt,
+ u32 val)
+{
+ size_t fmt_len = strlen(fmt);
+ char tmp_fmt[32] = {0};
+ char tmp[32] = {0};
+
+ if (*ppos)
+ return 0;
+
+ if (fmt_len > sizeof(tmp_fmt) - 2)
+ return -ENOMEM;
+
+ strscpy(tmp_fmt, fmt, sizeof(tmp_fmt));
+
+ tmp_fmt[fmt_len] = '\n';
+ tmp_fmt[fmt_len + 1] = '\0';
+
+ snprintf(tmp, sizeof(tmp), tmp_fmt, val);
+
+ return simple_read_from_buffer(buf, count, ppos, tmp, strlen(tmp) + 1);
+}
+
+static ssize_t debugfs_pam4_tx_taps_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ s32 tx_pre2, tx_pre1, tx_main, tx_post1, tx_post2, *taps;
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[TX_TAPS_KBUF_SIZE] = {0};
+ u32 lane, max_num_of_lanes;
+ char *c1, *c2;
+ ssize_t rc;
+
+ max_num_of_lanes = hdev->cn_props.max_num_of_lanes;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &lane);
+ if (rc)
+ goto err;
+
+ if (lane >= max_num_of_lanes) {
+ dev_err(hdev->dev, "lane max value is %d\n", max_num_of_lanes - 1);
+ return -EINVAL;
+ }
+
+ /* Turn off speculation due to Spectre vulnerability */
+ lane = array_index_nospec(lane, max_num_of_lanes);
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_pre2);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_pre1);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_main);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_post1);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ rc = kstrtos32(c1, 10, &tx_post2);
+ if (rc)
+ goto err;
+
+ taps = hdev->phy_tx_taps[lane].pam4_taps;
+ taps[0] = tx_pre2;
+ taps[1] = tx_pre1;
+ taps[2] = tx_main;
+ taps[3] = tx_post1;
+ taps[4] = tx_post2;
+
+ return count;
+err:
+ dev_err(hdev->dev,
+ "usage: echo <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2> > nic_pam4_tx_taps\n");
+
+ return -EINVAL;
+}
+
+static ssize_t debugfs_pam4_tx_taps_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ u32 lane, max_num_of_lanes;
+ ssize_t rc, len;
+ char *kbuf;
+ s32 *taps;
+
+ max_num_of_lanes = hdev->cn_props.max_num_of_lanes;
+
+ if (*ppos)
+ return 0;
+
+ kbuf = kzalloc(KBUF_OUT_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ sprintf(kbuf + strlen(kbuf), "PAM4 tx taps:\n");
+
+ for (lane = 0; lane < max_num_of_lanes; lane++) {
+ taps = hdev->phy_tx_taps[lane].pam4_taps;
+ len = strlen(kbuf);
+ if ((KBUF_OUT_SIZE - len) <= 1) {
+ rc = -EFBIG;
+ goto out;
+ }
+ snprintf(kbuf + len, KBUF_OUT_SIZE - len, "lane %u: %d %d %d %d %d\n", lane,
+ taps[0], taps[1], taps[2], taps[3], taps[4]);
+ }
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+out:
+ kfree(kbuf);
+
+ return rc;
+}
+
+static const struct file_operations debugfs_pam4_tx_taps_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_pam4_tx_taps_write,
+ .read = debugfs_pam4_tx_taps_read,
+};
+
+static ssize_t debugfs_nrz_tx_taps_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ s32 tx_pre2, tx_pre1, tx_main, tx_post1, tx_post2, *taps;
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[TX_TAPS_KBUF_SIZE] = {0};
+ u32 lane, max_num_of_lanes;
+ char *c1, *c2;
+ ssize_t rc;
+
+ max_num_of_lanes = hdev->cn_props.max_num_of_lanes;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &lane);
+ if (rc)
+ goto err;
+
+ if (lane >= max_num_of_lanes) {
+ dev_err(hdev->dev, "lane max value is %d\n", max_num_of_lanes - 1);
+ return -EINVAL;
+ }
+
+ /* Turn off speculation due to Spectre vulnerability */
+ lane = array_index_nospec(lane, max_num_of_lanes);
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_pre2);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_pre1);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_main);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtos32(c1, 10, &tx_post1);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ rc = kstrtos32(c1, 10, &tx_post2);
+ if (rc)
+ goto err;
+
+ taps = hdev->phy_tx_taps[lane].nrz_taps;
+ taps[0] = tx_pre2;
+ taps[1] = tx_pre1;
+ taps[2] = tx_main;
+ taps[3] = tx_post1;
+ taps[4] = tx_post2;
+
+ return count;
+err:
+ dev_err(hdev->dev,
+ "usage: echo <lane> <tx_pre2> <tx_pre1> <tx_main> <tx_post1> <tx_post2> > nic_nrz_tx_taps\n");
+
+ return -EINVAL;
+}
+
+static ssize_t debugfs_nrz_tx_taps_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ u32 lane, max_num_of_lanes;
+ ssize_t rc, len;
+ char *kbuf;
+ s32 *taps;
+
+ max_num_of_lanes = hdev->cn_props.max_num_of_lanes;
+
+ if (*ppos)
+ return 0;
+
+ kbuf = kzalloc(KBUF_OUT_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ sprintf(kbuf + strlen(kbuf), "NRZ tx taps:\n");
+
+ for (lane = 0; lane < max_num_of_lanes; lane++) {
+ taps = hdev->phy_tx_taps[lane].nrz_taps;
+ len = strlen(kbuf);
+ if ((KBUF_OUT_SIZE - len) <= 1) {
+ rc = -EFBIG;
+ goto out;
+ }
+ snprintf(kbuf + len, KBUF_OUT_SIZE - len, "lane %u: %d %d %d %d %d\n", lane,
+ taps[0], taps[1], taps[2], taps[3], taps[4]);
+ }
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+out:
+ kfree(kbuf);
+
+ return rc;
+}
+
+static const struct file_operations debugfs_nrz_tx_taps_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_nrz_tx_taps_write,
+ .read = debugfs_nrz_tx_taps_read,
+};
+
+static ssize_t debugfs_polarity_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[POLARITY_KBUF_SIZE] = {0};
+ u32 lane, max_num_of_lanes;
+ u8 pol_tx, pol_rx;
+ char *c1, *c2;
+ ssize_t rc;
+ u64 val;
+
+ max_num_of_lanes = hdev->cn_props.max_num_of_lanes;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &lane);
+ if (rc)
+ goto err;
+
+ if (lane >= max_num_of_lanes) {
+ dev_err(hdev->dev, "lane max value is %d\n", max_num_of_lanes - 1);
+ return -EINVAL;
+ }
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou8(c1, 10, &pol_tx);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ rc = kstrtou8(c1, 10, &pol_rx);
+ if (rc)
+ goto err;
+
+ if ((pol_tx & ~1) || (pol_rx & ~1)) {
+ dev_err(hdev->dev, "pol_tx and pol_rx should be 0 or 1\n");
+ goto err;
+ }
+
+ val = hdev->pol_tx_mask;
+ val &= ~BIT_ULL(lane);
+ val |= ((u64)pol_tx) << lane;
+ hdev->pol_tx_mask = val;
+
+ val = hdev->pol_rx_mask;
+ val &= ~BIT_ULL(lane);
+ val |= ((u64)pol_rx) << lane;
+ hdev->pol_rx_mask = val;
+
+ /* This flag is set to prevent overwriting the new values after reset */
+ hdev->skip_phy_pol_cfg = true;
+
+ return count;
+err:
+ dev_err(hdev->dev, "usage: echo <lane> <pol_tx> <pol_rx> > nic_polarity\n");
+
+ return -EINVAL;
+}
+
+static const struct file_operations debugfs_polarity_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_polarity_write,
+};
+
+static ssize_t debugfs_qp_read(struct file *f, char __user *buf, size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ char *kbuf;
+ ssize_t rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (*ppos)
+ return 0;
+
+ kbuf = kzalloc(KBUF_OUT_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ rc = asic_funcs->qp_read(hdev, &hdev->qp_info, kbuf, KBUF_OUT_SIZE);
+ if (rc)
+ goto out;
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+out:
+ kfree(kbuf);
+
+ return rc;
+}
+
+static ssize_t debugfs_qp_write(struct file *f, const char __user *buf, size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_qp_info *qp_info = &hdev->qp_info;
+ u32 port, qpn, max_num_of_ports;
+ u8 req, full_print, force_read;
+ char kbuf[KBUF_IN_SIZE] = {0};
+ char *c1, *c2;
+ ssize_t rc;
+
+ max_num_of_ports = hdev->cn_props.max_num_of_ports;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &port);
+ if (rc)
+ goto err;
+
+ if (port >= max_num_of_ports) {
+ dev_err(hdev->dev, "port max value is %d\n", max_num_of_ports - 1);
+ return -EINVAL;
+ }
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &qpn);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou8(c1, 10, &req);
+ if (rc)
+ goto err;
+
+ if (req & ~1) {
+ dev_err(hdev->dev, "req should be 0 or 1\n");
+ goto err;
+ }
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou8(c1, 10, &full_print);
+ if (rc)
+ goto err;
+
+ if (full_print & ~1) {
+ dev_err(hdev->dev, "full_print should be 0 or 1\n");
+ goto err;
+ }
+
+ c1 = c2 + 1;
+
+ /* may not be the last element due to the optional params */
+ c2 = strchr(c1, ' ');
+ if (c2)
+ *c2 = '\0';
+
+ rc = kstrtou8(c1, 10, &force_read);
+ if (rc)
+ goto err;
+
+ if (force_read & ~1) {
+ dev_err(hdev->dev, "force_read should be 0 or 1\n");
+ goto err;
+ }
+
+ qp_info->port = port;
+ qp_info->qpn = qpn;
+ qp_info->req = req;
+ qp_info->full_print = full_print;
+ qp_info->force_read = force_read;
+
+ return count;
+err:
+ dev_err(hdev->dev,
+ "usage: echo <port> <qpn> <is_req> <is_full_print> <force_read> [<exts_print>] > nic_qp\n");
+
+ return -EINVAL;
+}
+
+static const struct file_operations debugfs_qp_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_qp_read,
+ .write = debugfs_qp_write
+};
+
+static ssize_t debugfs_wqe_read(struct file *f, char __user *buf, size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ char *kbuf;
+ ssize_t rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (*ppos)
+ return 0;
+
+ kbuf = kzalloc(KBUF_OUT_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ rc = asic_funcs->wqe_read(hdev, kbuf, KBUF_OUT_SIZE);
+ if (rc)
+ goto out;
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+out:
+ kfree(kbuf);
+
+ return rc;
+}
+
+static ssize_t debugfs_wqe_write(struct file *f, const char __user *buf, size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_wqe_info *wqe_info = &hdev->wqe_info;
+ u32 port, qpn, wqe_idx, max_num_of_ports;
+ char kbuf[KBUF_IN_SIZE] = {0};
+ char *c1, *c2;
+ ssize_t rc;
+ u8 tx;
+
+ max_num_of_ports = hdev->cn_props.max_num_of_ports;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &port);
+ if (rc)
+ goto err;
+
+ if (port >= max_num_of_ports) {
+ dev_err(hdev->dev, "port max value is %d\n", max_num_of_ports - 1);
+ return -EINVAL;
+ }
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &qpn);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &wqe_idx);
+ if (rc)
+ goto err;
+
+ c1 = c2 + 1;
+
+ rc = kstrtou8(c1, 10, &tx);
+ if (rc)
+ goto err;
+
+ if (tx & ~1) {
+ dev_err(hdev->dev, "tx should be 0 or 1\n");
+ goto err;
+ }
+
+ wqe_info->port = port;
+ wqe_info->qpn = qpn;
+ wqe_info->wqe_idx = wqe_idx;
+ wqe_info->tx = tx;
+
+ return count;
+err:
+ dev_err(hdev->dev, "usage: echo <port> <qpn> <wqe_idx> <is_tx> > nic_wqe\n");
+
+ return -EINVAL;
+}
+
+static const struct file_operations debugfs_wqe_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_wqe_read,
+ .write = debugfs_wqe_write
+};
+
+static ssize_t debugfs_reset_cnt_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ ssize_t rc;
+ u32 val;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ hbl_cn_reset_ports_toggle_counters(hdev);
+ hbl_cn_reset_stats_counters(hdev);
+
+ return count;
+}
+
+static const struct file_operations debugfs_reset_cnt_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_reset_cnt_write
+};
+
+static int parse_user_mac_lane_remap_data(u32 *dest_arr, int *dest_arr_cnt, char *buf, int count)
+{
+ int i = 0, j = 0, rc;
+ int offset;
+ u32 val;
+
+ while (i < count) {
+ offset = strcspn(&buf[i], " ");
+ buf[i + offset] = '\0';
+
+ rc = kstrtou32(&buf[i], 16, &val);
+ if (rc)
+ return rc;
+
+ dest_arr[j++] = val;
+ i += (offset + 1);
+ }
+
+ *dest_arr_cnt = j;
+
+ return 0;
+}
+
+static ssize_t debugfs_mac_lane_remap_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_properties *cn_props;
+ u32 *mac_lane_remap_buf;
+ int rc, n_parsed = 0;
+ char *kbuf;
+
+ cn_props = &hdev->cn_props;
+
+ kbuf = kcalloc(count + 1, sizeof(*buf), GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ mac_lane_remap_buf = kcalloc(cn_props->num_of_macros, sizeof(*mac_lane_remap_buf),
+ GFP_KERNEL);
+ if (!mac_lane_remap_buf) {
+ rc = -ENOMEM;
+ goto err_free_kbuf;
+ }
+
+ rc = copy_from_user(kbuf, buf, count);
+ if (rc)
+ goto err_free_mac_lane_remap_buf;
+
+ /* Add trailing space to simplify parsing user data. */
+ kbuf[count] = ' ';
+
+ rc = parse_user_mac_lane_remap_data(mac_lane_remap_buf, &n_parsed, kbuf, count + 1);
+ if (rc || n_parsed != cn_props->num_of_macros) {
+ rc = -EINVAL;
+ goto err_parse;
+ }
+
+ memcpy(hdev->mac_lane_remap, mac_lane_remap_buf,
+ sizeof(*mac_lane_remap_buf) * cn_props->num_of_macros);
+
+ rc = hbl_device_hard_reset_sync(hdev);
+ if (rc)
+ goto err_free_mac_lane_remap_buf;
+
+ kfree(mac_lane_remap_buf);
+ kfree(kbuf);
+
+ return count;
+err_parse:
+ dev_err_ratelimited(hdev->dev,
+ "usage: echo macro0 macro1 macro2 ... macroX > mac_lane_remap\n");
+err_free_mac_lane_remap_buf:
+ kfree(mac_lane_remap_buf);
+err_free_kbuf:
+ kfree(kbuf);
+ return -EINVAL;
+}
+
+static ssize_t debugfs_mac_lane_remap_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[MAC_LANE_REMAP_READ_SIZE];
+ struct hbl_cn_properties *cn_props;
+ int i, j;
+
+ cn_props = &hdev->cn_props;
+
+ if (*ppos)
+ return 0;
+
+ for (i = 0, j = 0; i < cn_props->num_of_macros; i++, j += MAC_LANE_REMAP_READ_SIZE) {
+ memset(kbuf, 0, MAC_LANE_REMAP_READ_SIZE);
+ sprintf(kbuf, "0x%x ", hdev->mac_lane_remap[i]);
+
+ if (j + MAC_LANE_REMAP_READ_SIZE > count)
+ break;
+
+ if (copy_to_user(&buf[j], kbuf, MAC_LANE_REMAP_READ_SIZE)) {
+ dev_err(hdev->dev, "error in copying lane info to user\n");
+ return -EFAULT;
+ }
+
+ *ppos += MAC_LANE_REMAP_READ_SIZE;
+ }
+
+ return j;
+}
+
+static const struct file_operations debugfs_mac_lane_remap_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_mac_lane_remap_write,
+ .read = debugfs_mac_lane_remap_read,
+};
+
+static ssize_t debugfs_eth_loopback_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ u32 val = 0;
+ int rc;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ hdev->eth_loopback = !!val;
+
+ dev_info(hdev->dev, "%s eth_loopback\n", hdev->eth_loopback ? "enable" : "disable");
+
+ return count;
+}
+
+static ssize_t debugfs_eth_loopback_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "%u", hdev->eth_loopback);
+}
+
+static const struct file_operations debugfs_eth_loopback_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_eth_loopback_write,
+ .read = debugfs_eth_loopback_read,
+};
+
+static ssize_t debugfs_phy_regs_print_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ u32 val = 0;
+ int rc;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ hdev->phy_regs_print = !!val;
+
+ dev_info(hdev->dev, "%s printing PHY registers\n",
+ hdev->phy_regs_print ? "enable" : "disable");
+
+ return count;
+}
+
+static ssize_t debugfs_phy_regs_print_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "%u", hdev->phy_regs_print);
+}
+
+static const struct file_operations debugfs_phy_regs_print_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_phy_regs_print_write,
+ .read = debugfs_phy_regs_print_read,
+};
+
+static ssize_t debugfs_show_internal_ports_status_read(struct file *f, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[MAX_INT_PORT_STS_KBUF_SIZE];
+ struct hbl_cn_port *cn_port;
+ int i, cnt, total_cnt;
+
+ if (*ppos)
+ return 0;
+
+ total_cnt = 0;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)) || (hdev->ext_ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ memset(kbuf, 0, MAX_INT_PORT_STS_KBUF_SIZE);
+ cnt = sprintf(kbuf, "Port %-2u: %s\n",
+ cn_port->port, cn_port->pcs_link ? "UP" : "DOWN");
+
+ if (total_cnt + cnt > count)
+ break;
+
+ if (copy_to_user(&buf[total_cnt], kbuf, cnt)) {
+ dev_err(hdev->dev, "error in copying info to user\n");
+ return -EFAULT;
+ }
+
+ total_cnt += cnt;
+ *ppos += cnt;
+ }
+
+ if (!total_cnt) {
+ char *msg = "No internal ports found\n";
+
+ return simple_read_from_buffer(buf, count, ppos, msg, strlen(msg));
+ }
+
+ return total_cnt;
+}
+
+static const struct file_operations debugfs_show_internal_ports_status_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_show_internal_ports_status_read,
+};
+
+static ssize_t debugfs_print_fec_stats_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ struct hbl_cn_port *cn_port;
+ char *kbuf;
+ int i, rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (*ppos)
+ return 0;
+
+ kbuf = kzalloc(KBUF_OUT_BIG_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ sprintf(kbuf + strlen(kbuf), "Card %u FEC stats:\n", hdev->card_location);
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ asic_funcs->port_funcs->collect_fec_stats(cn_port, kbuf, KBUF_OUT_BIG_SIZE);
+ }
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+ kfree(kbuf);
+
+ return rc;
+}
+
+static const struct file_operations debugfs_print_fec_stats_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_print_fec_stats_read,
+};
+
+static ssize_t debugfs_phy_set_nrz_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ bool val;
+ int rc;
+
+ rc = kstrtobool_from_user(buf, count, &val);
+ if (rc)
+ return rc;
+
+ if (val == hdev->phy_set_nrz)
+ return count;
+
+ hdev->phy_set_nrz = val;
+ hdev->skip_phy_default_tx_taps_cfg = 0;
+
+ dev_info(hdev->dev, "%s NRZ mode\n", hdev->phy_set_nrz ? "Enable" : "Disable");
+
+ rc = hbl_device_hard_reset_sync(hdev);
+ if (rc)
+ return -EINVAL;
+
+ return count;
+}
+
+static const struct file_operations debugfs_phy_set_nrz_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_phy_set_nrz_write,
+};
+
+static ssize_t debugfs_phy_dump_serdes_params_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ char *kbuf;
+ ssize_t rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (*ppos)
+ return 0;
+
+ /* For ASICs that don't support this feature, return an error */
+ if (!asic_funcs->phy_dump_serdes_params)
+ return -EINVAL;
+
+ kbuf = kzalloc(KBUF_OUT_BIG_SIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ asic_funcs->phy_dump_serdes_params(hdev, kbuf, KBUF_OUT_BIG_SIZE);
+
+ rc = simple_read_from_buffer(buf, count, ppos, kbuf, strlen(kbuf) + 1);
+
+ kfree(kbuf);
+
+ return rc;
+}
+
+static ssize_t debugfs_phy_dump_serdes_params_write(struct file *f, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 port;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ /* For ASICs that don't support this feature, return an error */
+ if (!asic_funcs->phy_dump_serdes_params)
+ return -EINVAL;
+
+ rc = kstrtou32_from_user(buf, count, 10, &port);
+ if (rc)
+ return rc;
+
+ if (port >= hdev->cn_props.max_num_of_ports) {
+ dev_err(hdev->dev, "Invalid port number %u\n", port);
+ return -EINVAL;
+ }
+
+ hdev->phy_port_to_dump = port;
+
+ return count;
+}
+
+static const struct file_operations debugfs_phy_dump_serdes_params_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_phy_dump_serdes_params_read,
+ .write = debugfs_phy_dump_serdes_params_write,
+};
+
+static ssize_t debugfs_inject_rx_err_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "%u", hdev->rx_drop_percent);
+}
+
+static ssize_t debugfs_inject_rx_err_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 val;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (*ppos)
+ return 0;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ if (val > 100) {
+ dev_dbg_ratelimited(hdev->dev, "Invalid drop percentage %d\n", val);
+ return -EINVAL;
+ }
+
+ asic_funcs->inject_rx_err(hdev, val);
+
+ return count;
+}
+
+static const struct file_operations debugfs_inject_rx_err_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_inject_rx_err_read,
+ .write = debugfs_inject_rx_err_write,
+};
+
+static ssize_t debugfs_override_port_status_write(struct file *f, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ char kbuf[KBUF_IN_SIZE] = {0};
+ struct hbl_cn_port *cn_port;
+ u32 port, max_num_of_ports;
+ char *c1, *c2;
+ ssize_t rc;
+ u8 up;
+
+ max_num_of_ports = hdev->cn_props.max_num_of_ports;
+
+ if (count > sizeof(kbuf) - 1)
+ goto err;
+ if (copy_from_user(kbuf, buf, count))
+ goto err;
+ kbuf[count] = '\0';
+
+ c1 = kbuf;
+ c2 = strchr(c1, ' ');
+ if (!c2)
+ goto err;
+ *c2 = '\0';
+
+ rc = kstrtou32(c1, 10, &port);
+ if (rc)
+ goto err;
+
+ if (port >= max_num_of_ports) {
+ dev_err(hdev->dev, "port max value is %d\n", max_num_of_ports - 1);
+ return -EINVAL;
+ }
+
+ /* Turn off speculation due to Spectre vulnerability */
+ port = array_index_nospec(port, max_num_of_ports);
+
+ c1 = c2 + 1;
+
+ rc = kstrtou8(c1, 10, &up);
+ if (rc)
+ goto err;
+
+ if (hdev->ports_mask & BIT(port)) {
+ cn_port = &hdev->cn_ports[port];
+
+ cn_port->pcs_link = !!up;
+ hbl_cn_phy_set_port_status(cn_port, !!up);
+ }
+
+ return count;
+err:
+ dev_err(hdev->dev, "usage: echo <port> <status> > nic_override_port_status\n");
+
+ return -EINVAL;
+}
+
+static const struct file_operations debugfs_override_port_status_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_override_port_status_write,
+};
+
+static ssize_t debugfs_write_wqe_index_checker(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 val;
+ int rc;
+
+ asic_funcs = hdev->asic_funcs;
+
+ /* For ASICs that don't support this feature, return an error */
+ if (!asic_funcs->set_wqe_index_checker)
+ return -EINVAL;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ rc = asic_funcs->set_wqe_index_checker(hdev, !!val);
+ if (rc)
+ return rc;
+
+ return count;
+}
+
+static ssize_t debugfs_read_wqe_index_checker(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ struct hbl_cn_asic_funcs *asic_funcs;
+ u32 index_checker;
+
+ asic_funcs = hdev->asic_funcs;
+
+ if (!asic_funcs->get_wqe_index_checker)
+ return -EINVAL;
+
+ index_checker = asic_funcs->get_wqe_index_checker(hdev);
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "%u", index_checker);
+}
+
+static const struct file_operations debugfs_wqe_index_checker_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_write_wqe_index_checker,
+ .read = debugfs_read_wqe_index_checker,
+};
+
+static ssize_t debugfs_accumulate_fec_duration_write(struct file *f, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ u32 val = 0;
+ int rc;
+
+ rc = kstrtou32_from_user(buf, count, 10, &val);
+ if (rc)
+ return rc;
+
+ if (!val || val > ACCUMULATE_FEC_STATS_DURATION_MS_MAX)
+ return -EINVAL;
+
+ hdev->accumulate_fec_duration = val;
+
+ return count;
+}
+
+static ssize_t debugfs_accumulate_fec_duration_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "%u",
+ hdev->accumulate_fec_duration);
+}
+
+static const struct file_operations debugfs_accumulate_fec_duration_fops = {
+ .owner = THIS_MODULE,
+ .write = debugfs_accumulate_fec_duration_write,
+ .read = debugfs_accumulate_fec_duration_read,
+};
+
+static ssize_t debugfs_mac_loopback_read(struct file *f, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+
+ return debugfs_print_value_to_buffer(buf, count, ppos, "0x%llx", hdev->mac_loopback);
+}
+
+static ssize_t debugfs_mac_loopback_write(struct file *f, const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct hbl_cn_device *hdev = file_inode(f)->i_private;
+ ssize_t ret;
+ u64 val;
+ int rc;
+
+ ret = kstrtoull_from_user(buf, count, 16, &val);
+ if (ret)
+ return ret;
+
+ if (val == hdev->mac_loopback)
+ return count;
+
+ hdev->mac_loopback = val;
+ rc = hbl_device_hard_reset_sync(hdev);
+ if (rc)
+ return rc;
+
+ return count;
+}
+
+static const struct file_operations debugfs_mac_loopback_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_mac_loopback_read,
+ .write = debugfs_mac_loopback_write,
+};
+
+static void __hbl_cn_debugfs_dev_init(struct hbl_cn_device *hdev, struct dentry *root_dir)
+{
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_MAC_LOOPBACK, 0644, root_dir, hdev,
+ &debugfs_mac_loopback_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_PAM4_TX_TAPS, 0644, root_dir, hdev,
+ &debugfs_pam4_tx_taps_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_NRZ_TX_TAPS, 0644, root_dir, hdev,
+ &debugfs_nrz_tx_taps_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_POLARITY, 0200, root_dir, hdev, &debugfs_polarity_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_QP, 0644, root_dir, hdev, &debugfs_qp_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_WQE, 0644, root_dir, hdev, &debugfs_wqe_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_RESET_CNT, 0200, root_dir, hdev,
+ &debugfs_reset_cnt_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_MAC_LANE_REMAP, 0644, root_dir, hdev,
+ &debugfs_mac_lane_remap_fops);
+
+ HBL_CN_DEBUGFS_CREATE_U8(NIC_RAND_STATUS, 0644, root_dir, &hdev->rand_status);
+
+ HBL_CN_DEBUGFS_CREATE_U8(NIC_MMU_BYPASS, 0644, root_dir, &hdev->mmu_bypass);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_ETH_LOOPBACK, 0644, root_dir, hdev,
+ &debugfs_eth_loopback_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_PHY_REGS_PRINT, 0644, root_dir, hdev,
+ &debugfs_phy_regs_print_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_SHOW_INTERNAL_PORTS_STATUS, 0444, root_dir, hdev,
+ &debugfs_show_internal_ports_status_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_PRINT_FEC_STATS, 0444, root_dir, hdev,
+ &debugfs_print_fec_stats_fops);
+
+ HBL_CN_DEBUGFS_CREATE_U8(NIC_DISABLE_DECAP, 0644, root_dir, &hdev->is_decap_disabled);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_PHY_SET_NRZ, 0200, root_dir, hdev,
+ &debugfs_phy_set_nrz_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_PHY_DUMP_SERDES_PARAMS, 0644, root_dir, hdev,
+ &debugfs_phy_dump_serdes_params_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_INJECT_RX_ERR, 0644, root_dir, hdev,
+ &debugfs_inject_rx_err_fops);
+
+ HBL_CN_DEBUGFS_CREATE_U8(NIC_PHY_CALC_BER, 0644, root_dir, &hdev->phy_calc_ber);
+
+ HBL_CN_DEBUGFS_CREATE_U16(NIC_PHY_CALC_BER_WAIT_SEC, 0644, root_dir,
+ &hdev->phy_calc_ber_wait_sec);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_OVERRIDE_PORT_STATUS, 0200, root_dir, hdev,
+ &debugfs_override_port_status_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_WQE_INDEX_CHECKER, 0644, root_dir, hdev,
+ &debugfs_wqe_index_checker_fops);
+
+ HBL_CN_DEBUGFS_CREATE_FILE(NIC_ACCUMULATE_FEC_DURATION, 0644, root_dir, hdev,
+ &debugfs_accumulate_fec_duration_fops);
+
+ HBL_CN_DEBUGFS_CREATE_U8(NIC_PHY_FORCE_FIRST_TX_TAPS_CFG, 0644, root_dir,
+ &hdev->phy_force_first_tx_taps_cfg);
+}
+
+void hbl_cn_debugfs_dev_init(struct hbl_cn_device *hdev)
+{
+ char name[64] = {0};
+
+ snprintf(name, sizeof(name), "hbl_cn%d", hdev->id);
+ hdev->cn_dentry = debugfs_create_dir(name, hbl_cn_debug_root);
+ __hbl_cn_debugfs_dev_init(hdev, hdev->cn_dentry);
+}
+
+void hbl_cn_debugfs_dev_fini(struct hbl_cn_device *hdev)
+{
+ debugfs_remove_recursive(hdev->cn_dentry);
+}
+
+void __init hbl_cn_debugfs_init(void)
+{
+ hbl_cn_debug_root = debugfs_create_dir(module_name(THIS_MODULE), NULL);
+}
+
+void hbl_cn_debugfs_fini(void)
+{
+ debugfs_remove_recursive(hbl_cn_debug_root);
+}
+
+#else
+
+void hbl_cn_debugfs_dev_init(struct hbl_cn_device *hdev)
+{
+}
+
+void hbl_cn_debugfs_dev_fini(struct hbl_cn_device *hdev)
+{
+}
+
+void __init hbl_cn_debugfs_init(void)
+{
+}
+
+void hbl_cn_debugfs_fini(void)
+{
+}
+
+#endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
index 5ea690509592..1a26d9f3b9a4 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
@@ -207,15 +207,32 @@ static struct auxiliary_driver hbl_cn_driver = {
static int __init hbl_cn_init(void)
{
+ int rc;
+
pr_info("loading driver\n");
- return auxiliary_driver_register(&hbl_cn_driver);
+ hbl_cn_debugfs_init();
+
+ rc = auxiliary_driver_register(&hbl_cn_driver);
+ if (rc) {
+ pr_err("Failed to register auxiliary driver\n");
+ goto remove_debugfs;
+ }
+
+ return 0;
+
+remove_debugfs:
+ hbl_cn_debugfs_fini();
+
+ return rc;
}
static void __exit hbl_cn_exit(void)
{
auxiliary_driver_unregister(&hbl_cn_driver);
+ hbl_cn_debugfs_fini();
+
pr_info("driver removed\n");
}
--
2.34.1
^ permalink raw reply related [flat|nested] 107+ messages in thread
* [PATCH 08/15] net: hbl_cn: gaudi2: ASIC specific support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (5 preceding siblings ...)
2024-06-13 8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
` (8 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add Gaudi2 ASIC support, which includes the HW-specific configurations
and operations.
It is initialized on top of the common device layer and has dedicated
data structures which are passed via the auxiliary bus.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
MAINTAINERS | 1 +
drivers/net/ethernet/intel/hbl_cn/Makefile | 3 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 2 +
.../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 2 +
.../net/ethernet/intel/hbl_cn/gaudi2/Makefile | 3 +
.../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c | 5689 +++++++++++++++++
.../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h | 427 ++
.../intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c | 319 +
.../intel/hbl_cn/gaudi2/gaudi2_cn_eq.c | 732 +++
.../intel/hbl_cn/gaudi2/gaudi2_cn_phy.c | 2743 ++++++++
include/linux/net/intel/gaudi2.h | 432 ++
include/linux/net/intel/gaudi2_aux.h | 94 +
12 files changed, 10447 insertions(+)
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c
create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c
create mode 100644 include/linux/net/intel/gaudi2.h
create mode 100644 include/linux/net/intel/gaudi2_aux.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 906224204aba..096439a62129 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9615,6 +9615,7 @@ F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
F: drivers/net/ethernet/intel/hbl_cn/
F: include/linux/habanalabs/
F: include/linux/net/intel/cn*
+F: include/linux/net/intel/gaudi2*
HACKRF MEDIA DRIVER
L: linux-media@vger.kernel.org
diff --git a/drivers/net/ethernet/intel/hbl_cn/Makefile b/drivers/net/ethernet/intel/hbl_cn/Makefile
index c2c4142f18a0..836b7f7824b4 100644
--- a/drivers/net/ethernet/intel/hbl_cn/Makefile
+++ b/drivers/net/ethernet/intel/hbl_cn/Makefile
@@ -8,4 +8,7 @@ obj-$(CONFIG_HABANA_CN) := habanalabs_cn.o
include $(src)/common/Makefile
habanalabs_cn-y += $(HBL_CN_COMMON_FILES)
+include $(src)/gaudi2/Makefile
+habanalabs_cn-y += $(HBL_CN_GAUDI2_FILES)
+
habanalabs_cn-$(CONFIG_DEBUG_FS) += common/hbl_cn_debugfs.o
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
index fc7bd6474404..b493b681a3b2 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
@@ -1747,6 +1747,8 @@ static int hbl_cn_set_asic_funcs(struct hbl_cn_device *hdev)
{
switch (hdev->asic_type) {
case HBL_ASIC_GAUDI2:
+ gaudi2_cn_set_asic_funcs(hdev);
+ break;
default:
dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
index 0b96cd3db719..0627a2aa6f0c 100644
--- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
+++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
@@ -1644,6 +1644,8 @@ void hbl_cn_fw_status_work(struct work_struct *work);
int hbl_cn_get_src_ip(struct hbl_cn_port *cn_port, u32 *src_ip);
void hbl_cn_ctx_resources_destroy(struct hbl_cn_device *hdev, struct hbl_cn_ctx *ctx);
+void gaudi2_cn_set_asic_funcs(struct hbl_cn_device *hdev);
+
int __hbl_cn_ports_reopen(struct hbl_cn_device *hdev);
void __hbl_cn_hard_reset_prepare(struct hbl_cn_device *hdev, bool fw_reset, bool in_teardown);
void __hbl_cn_stop(struct hbl_cn_device *hdev);
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile b/drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile
new file mode 100644
index 000000000000..889776350452
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+HBL_CN_GAUDI2_FILES := gaudi2/gaudi2_cn.o gaudi2/gaudi2_cn_phy.o \
+ gaudi2/gaudi2_cn_eq.o gaudi2/gaudi2_cn_debugfs.o
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c
new file mode 100644
index 000000000000..a1b01c729c31
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c
@@ -0,0 +1,5689 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2_cn.h"
+
+#include <linux/circ_buf.h>
+#include <linux/units.h>
+
+#define CFG_BAR_ID 0
+
+#define GAUDI2_NIC_WTD_BP_UPPER_TH_DIFF 24
+#define GAUDI2_NIC_WTD_BP_LOWER_TH_DIFF 26
+#define GAUDI2_NIC_MIN_WQ_SIZE_BP_ENABLED 32
+#define GAUDI2_NIC_MTU_DEFAULT SZ_8K /* 8KB */
+#define QPC_SANITY_CHECK_INTERVAL_MS 1 /* 1 msec */
+#define NIC_TMR_TIMEOUT_PLDM_US 1000 /* 1 msec */
+#define NIC_TMR_TIMEOUT_PLDM_GRAN 7 /* 512 us */
+
+#define PERF_BW_WINDOW_MSEC 100
+#define PERF_BW_WINDOW_USEC (PERF_BW_WINDOW_MSEC * USEC_PER_MSEC)
+/* Convert bytes per window into gigabytes per second */
+#define PERF_BW_WINDOW_DIV ((GIGA * PERF_BW_WINDOW_MSEC) / MSEC_PER_SEC)
+
+#define IPV4_PROTOCOL_UDP 17
+#define DUMMY_UDP_PORT 8224
+#define GAUDI2_USER_ENCAP_ID 0
+
+#define GAUDI2_PFC_PRIO_DRIVER 0
+#define GAUDI2_PFC_PRIO_USER_BASE 1
+
+#define GAUDI2_NIC_MAX_CONG_WND BIT(23)
+#define GAUDI2_NIC_MAX_SPEED SPEED_100000
+
+#define RETRY_COUNT_QPC_SANITY 10
+#define GAUDI2_NIC_MAX_TIMEOUT_RETRIES 0xFE
+#define GAUDI2_NIC_MAX_SEQ_ERR_RETRIES 0xFE
+
+/* Actual mask used by HW is smaller than the one declared in
+ * NIC0_QPC0_WQ_BP_2ARC_ADDR_VAL_MASK and NIC0_QPC0_WQ_BP_2QMAN_ADDR_VAL_MASK
+ */
+#define WQ_BP_ADDR_VAL_MASK 0x7FFFFFF
+
+/* User encapsulation 32 bit register offset. */
+#define encap_offset(id) ((id) * 4)
+
+/* We have a fixed mapping between SW and HW IDs. */
+#define db_fifo_hw_id(id) ((id) - 1)
+
+/* User doorbell fifo 32 bit register offset. */
+#define db_fifo_offset(id) (db_fifo_hw_id(id) * 4)
+
+/* User doorbell fifo 64 bit register offset. */
+#define db_fifo_offset64(id) (db_fifo_hw_id(id) * 8)
+
+static int gaudi2_cn_ctx_init(struct hbl_cn_ctx *ctx);
+static void gaudi2_cn_ctx_fini(struct hbl_cn_ctx *ctx);
+static void gaudi2_qp_sanity_fini(struct gaudi2_cn_port *gaudi2_port);
+static int gaudi2_qp_sanity_init(struct gaudi2_cn_port *gaudi2_port);
+static void gaudi2_get_default_encap_id(struct hbl_cn_port *cn_port, u32 *id);
+static int gaudi2_encap_set(struct hbl_cn_port *cn_port, u32 encap_id,
+ struct hbl_cn_encap_xarray_pdata *xa_pdata);
+static void gaudi2_user_cq_set_overrun(struct hbl_cn_user_cq *user_cq, bool set_overrun);
+static int gaudi2_cn_poll_reg(struct hbl_cn_device *hdev, u32 reg, u64 timeout_us,
+ hbl_cn_poll_cond_func func, void *arg);
+
+enum gaudi2_cn_user_bp_offs {
+ HBL_CNI_USER_BP_OFFS_FW,
+ HBL_CNI_USER_BP_OFFS_QMAN
+};
+
+struct gaudi2_cn_stat {
+ char str[ETH_GSTRING_LEN];
+};
+
+static struct gaudi2_cn_stat gaudi2_cn_err_stats[] = {
+ {"Congestion Q err"},
+ {"Eth DB fifo overrun"}
+};
+
+/* Gaudi2 FEC (Fwd Error Correction) Stats */
+static struct gaudi2_cn_stat gaudi2_cn_mac_fec_stats[] = {
+ {"cw_corrected_accum"},
+ {"cw_uncorrect_accum"},
+ {"cw_corrected"},
+ {"cw_uncorrect"},
+ {"symbol_err_corrected_lane_0"},
+ {"symbol_err_corrected_lane_1"},
+ {"symbol_err_corrected_lane_2"},
+ {"symbol_err_corrected_lane_3"},
+ {"pre_FEC_SER_int"},
+ {"pre_FEC_SER_exp (negative)"},
+ {"post_FEC_SER_int"},
+ {"post_FEC_SER_exp (negative)"},
+};
+
+/* Gaudi2 performance Stats */
+static struct gaudi2_cn_stat gaudi2_cn_perf_stats[] = {
+ {"bandwidth_gbps_int"},
+ {"bandwidth_gbps_frac"},
+ {"last_data_latency_usec_int"},
+ {"last_data_latency_usec_frac"},
+};
+
+static size_t gaudi2_cn_err_stats_len = ARRAY_SIZE(gaudi2_cn_err_stats);
+static size_t gaudi2_cn_mac_fec_stats_len = ARRAY_SIZE(gaudi2_cn_mac_fec_stats);
+static size_t gaudi2_cn_perf_stats_len = ARRAY_SIZE(gaudi2_cn_perf_stats);
+
+#define GAUDI2_SYNDROME_TYPE(syndrome) (((syndrome) >> 6) & 0x3)
+#define GAUDI2_MAX_SYNDROME_STRING_LEN 256
+#define GAUDI2_MAX_SYNDROME_TYPE 3
+
+#define GAUDI2_NUM_OF_NIC_RXB_CORE_SEI_CAUSE 2
+#define GAUDI2_NUM_OF_NIC_RXB_CORE_SPI_CAUSE 6
+#define GAUDI2_NUM_OF_NIC_RXE_SEI_CAUSE 4
+#define GAUDI2_NUM_OF_NIC_RXE_SPI_CAUSE 24
+#define GAUDI2_NUM_OF_NIC_QPC_RESP_ERR_CAUSE 7
+
+static const char * const
+gaudi2_cn_rxb_core_sei_interrupts_cause[GAUDI2_NUM_OF_NIC_RXB_CORE_SEI_CAUSE] = {
+ "HBW RRESP error",
+ "LBW RRESP error"
+};
+
+static const char * const
+gaudi2_cn_rxb_core_spi_interrupts_cause[GAUDI2_NUM_OF_NIC_RXB_CORE_SPI_CAUSE] = {
+ "Packet dropped due to no available buffers",
+ "Control pointers count illegal port 0",
+ "Control pointers count illegal port 1",
+ "Control pointers count illegal port 2",
+ "Control pointers count illegal port 3",
+ "Scatter pointers count illegal"
+};
+
+static const char * const
+gaudi2_cn_qpc_resp_err_interrupts_cause[GAUDI2_NUM_OF_NIC_QPC_RESP_ERR_CAUSE] = {
+ "ARC SEI error",
+ "QPC LBW AXI write slv decode err",
+ "QPC LBW AXI write slv err",
+ "QPC HBW AXI write slv decode err",
+ "QPC HBW AXI write slv err",
+ "QPC HBW AXI read slv decode err",
+ "QPC HBW AXI read slv err"
+};
+
+static const char * const gaudi2_cn_rxe_sei_interrupts_cause[GAUDI2_NUM_OF_NIC_RXE_SEI_CAUSE] = {
+ "HBW RRESP error WQE",
+ "HBW RRESP error FNA",
+ "LBW BRESP error",
+ "HBW BRESP error"
+};
+
+static const char * const gaudi2_cn_rxe_spi_interrupts_cause[GAUDI2_NUM_OF_NIC_RXE_SPI_CAUSE] = {
+ "QP invalid",
+ "TS mismatch",
+ "Request CS invalid",
+ "Response CS invalid",
+ "Request PSN invalid",
+ "Request PSN unsent",
+ "Response RKEY invalid",
+ "Response RESYNC invalid",
+ "Packet bad format",
+ "Invalid opcode",
+ "Invalid syndrome",
+ "Invalid min packet size RC",
+ "Invalid max packet size RC",
+ "Invalid min packet size raw",
+ "Invalid max packet size raw",
+ "Tunnel invalid",
+ "WQE index mismatch",
+ "WQ WR opcode invalid",
+ "WQ RDV opcode invalid",
+ "WQ RD opcode invalid",
+ "WQE WR zero",
+ "WQE multi zero",
+ "WQE WE send big",
+ "WQE multi big"
+};
+
+static char qp_syndromes[NIC_MAX_QP_ERR_SYNDROMES][GAUDI2_MAX_SYNDROME_STRING_LEN] = {
+ /* Rx packet errors */
+ [0x1] = "[RX] pkt err, pkt bad format",
+ [0x2] = "[RX] pkt err, pkt tunnel invalid",
+ [0x3] = "[RX] pkt err, BTH opcode invalid",
+ [0x4] = "[RX] pkt err, syndrome invalid",
+ [0x5] = "[RX] pkt err, Reliable QP max size invalid",
+ [0x6] = "[RX] pkt err, Reliable QP min size invalid",
+ [0x7] = "[RX] pkt err, Raw min size invalid",
+ [0x8] = "[RX] pkt err, Raw max size invalid",
+ [0x9] = "[RX] pkt err, QP invalid",
+ [0xa] = "[RX] pkt err, Transport Service mismatch",
+ [0xb] = "[RX] pkt err, QPC Requester QP state invalid",
+ [0xc] = "[RX] pkt err, QPC Responder QP state invalid",
+ [0xd] = "[RX] pkt err, QPC Responder resync invalid",
+ [0xe] = "[RX] pkt err, QPC Requester PSN invalid",
+ [0xf] = "[RX] pkt err, QPC Requester PSN unset",
+ [0x10] = "[RX] pkt err, QPC Responder RKEY invalid",
+ [0x11] = "[RX] pkt err, WQE index mismatch",
+ [0x12] = "[RX] pkt err, WQE write opcode invalid",
+ [0x13] = "[RX] pkt err, WQE Rendezvous opcode invalid",
+ [0x14] = "[RX] pkt err, WQE Read opcode invalid",
+ [0x15] = "[RX] pkt err, WQE Write Zero",
+ [0x16] = "[RX] pkt err, WQE multi zero",
+ [0x17] = "[RX] pkt err, WQE Write send big",
+ [0x18] = "[RX] pkt err, WQE multi big",
+
+ /* QPC errors */
+ [0x40] = "[qpc] [TMR] max-retry-cnt exceeded",
+ [0x41] = "[qpc] [req DB] QP not valid",
+ [0x42] = "[qpc] [req DB] security check",
+ [0x43] = "[qpc] [req DB] PI > last-index",
+ [0x44] = "[qpc] [req DB] wq-type is READ",
+ [0x45] = "[qpc] [req TX] QP not valid",
+ [0x46] = "[qpc] [req TX] Rendezvous WQE but wq-type is not WRITE",
+ [0x47] = "[qpc] [req RX] QP not valid",
+ [0x48] = "[qpc] [req RX] max-retry-cnt exceeded",
+ [0x49] = "[qpc] [req RDV] QP not valid",
+ [0x4a] = "[qpc] [req RDV] wrong wq-type",
+ [0x4b] = "[qpc] [req RDV] PI > last-index",
+ [0x4c] = "[qpc] [res TX] QP not valid",
+ [0x4d] = "[qpc] [res RX] QP not valid",
+
+ /* tx packet error */
+ [0x80] = "[TX] pkt error, QPC.wq_type is write does not support WQE.opcode",
+ [0x81] = "[TX] pkt error, QPC.wq_type is rendezvous does not support WQE.opcode",
+ [0x82] = "[TX] pkt error, QPC.wq_type is read does not support WQE.opcode",
+ [0x83] = "[TX] pkt error, QPC.gaudi1 is set does not support WQE.opcode",
+ [0x84] = "[TX] pkt error, WQE.opcode is write but WQE.size is 0",
+ [0x85] =
+ "[TX] pkt error, WQE.opcode is multi-stride|local-stride|multi-dual but WQE.size is 0",
+ [0x86] = "[TX] pkt error, WQE.opcode is send but WQE.size is 0",
+ [0x87] = "[TX] pkt error, WQE.opcode is rendezvous-write|rendezvous-read but WQE.size is 0",
+ [0x88] = "[TX] pkt error, WQE.opcode is write but size > configured max-write-send-size",
+ [0x89] =
+ "[TX] pkt error, WQE.opcode is multi-stride|local-stride|multi-dual but size > configured max-stride-size",
+ [0x8a] =
+ "[TX] pkt error, WQE.opcode is rendezvous-write|rendezvous-read but QPC.remote_wq_log_size <= configured min-remote-log-size",
+ [0x8b] =
+ "[TX] pkt error, WQE.opcode is rendezvous-write but WQE.size != configured rdv-wqe-size (per granularity)",
+ [0x8c] =
+ "[TX] pkt error, WQE.opcode is rendezvous-read but WQE.size != configured rdv-wqe-size (per granularity)",
+ [0x8d] =
+ "[TX] pkt error, WQE.inline is set but WQE.size != configured inline-wqe-size (per granularity)",
+ [0x8e] = "[TX] pkt error, QPC.gaudi1 is set but WQE.inline is set",
+ [0x8f] =
+ "[TX] pkt error, WQE.opcode is multi-stride|local-stride|multi-dual but QPC.swq_granularity is 0",
+ [0x90] = "[TX] pkt error, WQE.opcode != NOP but WQE.reserved0 != 0",
+ [0x91] = "[TX] pkt error, WQE.opcode != NOP but WQE.wqe_index != execution-index [7.0]",
+ [0x92] =
+ "[TX] pkt error, WQE.opcode is multi-stride|local-stride|multi-dual but WQE.size < stride-size",
+ [0x93] =
+ "[TX] pkt error, WQE.reduction_opcode is upscale but WQE.remote_address LSB is not 0",
+ [0x94] = "[TX] pkt error, WQE.reduction_opcode is upscale but does not support WQE.opcode",
+ [0x95] = "[TX] pkt error, RAW packet but WQE.size not supported",
+ [0xB0] = "WQE.opcode is QoS but WQE.inline is set",
+ [0xB1] = "WQE.opcode above 15",
+ [0xB2] = "RAW above MIN",
+ [0xB3] = "RAW below MAX",
+ [0xB4] = "WQE.reduction is disable but reduction-opcode is not 0",
+ [0xB5] = "WQE.opcode is READ-RDV but WQE.inline is set",
+ [0xB6] = "WQE fetch WR size not 4",
+ [0xB7] = "WQE fetch WR addr not mod4",
+ [0xB8] = "RDV last-index",
+ [0xB9] = "Gaudi1 multi-dual",
+ [0xBA] = "WQE bad opcode",
+ [0xBB] = "WQE bad size",
+ [0xBC] = "WQE SE not RAW",
+ [0xBD] = "Gaudi1 tunnel",
+ [0xBE] = "Tunnel 0-size",
+ [0xBF] = "Tunnel max size",
+};
+
+char *gaudi2_cn_qp_err_syndrome_to_str(u32 syndrome)
+{
+ int syndrome_type;
+ char *str;
+
+ /* The syndrome is composed of 8 bits:
+ * [7:6] - type
+ * [5:0] - syndrome
+ * Types:
+ * 0 - rx packet error
+ * 1 - qp error
+ * 2 - tx packet error
+ */
+
+ if (syndrome >= NIC_MAX_QP_ERR_SYNDROMES)
+ return "syndrome unknown";
+
+ syndrome_type = GAUDI2_SYNDROME_TYPE(syndrome);
+
+ str = qp_syndromes[syndrome];
+ if (strlen(str))
+ return str;
+
+ switch (syndrome_type) {
+ case 0:
+ str = "RX packet syndrome unknown";
+ break;
+ case 1:
+ str = "QPC syndrome unknown";
+ break;
+ case 2:
+ str = "TX packet syndrome unknown";
+ break;
+ default:
+ str = "syndrome unknown";
+ break;
+ }
+
+ return str;
+}
+
+static void db_fifo_toggle_err_evt(struct hbl_cn_port *cn_port, bool enable)
+{
+ u32 mask = NIC0_QPC0_REQ_STATIC_CONFIG_QM_PUSH_TO_ERR_FIFO_NON_V_MASK |
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_PUSH_ERR_PI_EX_LAST_MASK;
+ u32 val = enable ? (mask >> __ffs(mask)) : 0;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ NIC_RMWREG32_SHIFTED(NIC0_QPC0_REQ_STATIC_CONFIG, val, mask);
+}
+
+static void __gaudi2_cn_get_db_fifo_umr(struct hbl_cn_port *cn_port, u32 block_id, u32 offset_id,
+ u64 *umr_block_addr, u32 *umr_db_offset)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port, odd_db_offset;
+ struct gaudi2_cn_device *gaudi2;
+
+ gaudi2 = hdev->asic_specific;
+ odd_db_offset = NIC0_UMR0_0_UNSECURE_DOORBELL1_UNSECURE_DB_FIRST32 -
+ NIC0_UMR0_0_UNSECURE_DOORBELL0_UNSECURE_DB_FIRST32;
+
+ /* UMR base address we map to userspace */
+ *umr_block_addr = gaudi2->cfg_base +
+ NIC_CFG_BASE(port, NIC0_UMR0_0_UNSECURE_DOORBELL0_BASE) +
+ NIC0_UMR0_0_UNSECURE_DOORBELL0_BASE + (block_id * NIC_UMR_OFFSET);
+
+ /* Each UMR block hosts 2 doorbell fifos. Get byte offset. */
+ *umr_db_offset = (offset_id & 1) ? odd_db_offset : 0;
+}
+
+static void gaudi2_cn_get_db_fifo_umr(struct hbl_cn_port *cn_port, u32 id, u64 *umr_block_addr,
+ u32 *umr_db_offset)
+{
+ __gaudi2_cn_get_db_fifo_umr(cn_port, db_fifo_hw_id(id) / 2, db_fifo_hw_id(id),
+ umr_block_addr, umr_db_offset);
+}
+
+static void db_fifo_push_dummy(struct hbl_cn_port *cn_port, u32 id, int n_dummy, bool is_eth)
+{
+ u32 port, umr_db_offset, offset_0_31, offset_32_64, db_dummy[2];
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_device *gaudi2;
+ u64 umr_block_addr;
+ int i;
+
+ gaudi2 = hdev->asic_specific;
+ port = cn_port->port;
+
+ /* 8-byte dummy doorbell packet with an unused QP ID. */
+ db_dummy[0] = 0;
+ db_dummy[1] = NIC_MAX_QP_NUM;
+
+ /* Get DB fifo offset in register configuration space. */
+ gaudi2_cn_get_db_fifo_umr(cn_port, id, &umr_block_addr, &umr_db_offset);
+ offset_0_31 = umr_block_addr - gaudi2->cfg_base + umr_db_offset;
+ offset_32_64 = offset_0_31 + (NIC0_UMR0_0_UNSECURE_DOORBELL0_UNSECURE_DB_SECOND32 -
+ NIC0_UMR0_0_UNSECURE_DOORBELL0_UNSECURE_DB_FIRST32);
+
+ /* Split user doorbell fifo packets to fit 32 bit registers. */
+ for (i = 0; i < n_dummy; i++)
+ if (is_eth) {
+ NIC_WREG32(NIC0_QPC0_SECURED_DB_FIRST32, db_dummy[0]);
+ NIC_WREG32(NIC0_QPC0_SECURED_DB_SECOND32, db_dummy[1]);
+ } else {
+ WREG32(offset_0_31, db_dummy[0]);
+ WREG32(offset_32_64, db_dummy[1]);
+ }
+
+ if (gaudi2->flush_db_fifo) {
+ if (is_eth)
+ NIC_RREG32(NIC0_QPC0_SECURED_DB_FIRST32);
+ else
+ RREG32(offset_0_31);
+ }
+}
+
+static bool db_fifo_reset_cond_func1(u32 val, void *arg)
+{
+ return val;
+}
+
+static bool db_fifo_reset_cond_func2(u32 val, void *arg)
+{
+ return val == (NIC_FIFO_DB_SIZE - 1);
+}
+
+/* Doorbell fifo H/W bug: there is no provision for S/W to reset the H/W CI.
+ * Hence, we implement a workaround: push dummy doorbells to the db fifo until
+ * the CI wraps around.
+ */
+static void __db_fifo_reset(struct hbl_cn_port *cn_port, u32 *ci_cpu_addr, u32 id, bool is_eth)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ u32 ci = *ci_cpu_addr, port;
+ struct hbl_aux_dev *aux_dev;
+ int rc;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ /* No need to push dummy doorbells, as the hardware will reset itself.
+ * However, reset the memory where the last CI is stored.
+ */
+ if (!hdev->operational) {
+ *ci_cpu_addr = 0;
+ return;
+ }
+
+ port = cn_port->port;
+
+ /* Stop HW from raising the error events below:
+ * 1. Requester DB, invalid QP
+ * 2. Requester DB, PI > last WQ index
+ *
+ * The dummy doorbell piggybacks on the fact that HW updates CI
+ * even for invalid doorbells. However, EQ error events are
+ * generated and the fifo is pushed into an error state.
+ */
+ db_fifo_toggle_err_evt(cn_port, false);
+
+ /* 1. Another user doorbell fifo HW bug: the CI updated by HW is one
+ * less than the number of doorbells pushed.
+ * 2. We cannot tell whether the user has pushed any doorbells, i.e.
+ * whether the CI buffer holds the default 0 or a zero written by HW.
+ *
+ * To handle the above scenario we push 2 dummy doorbells if CI is zero.
+ * This forces HW to update the CI buffer, ensuring we are not
+ * dealing with default zeroed memory.
+ * Note: the driver ensures the CI buffer is zeroed out before passing it on to HW.
+ */
+ if (!ci) {
+ db_fifo_push_dummy(cn_port, id, 2, is_eth);
+
+ rc = gaudi2_aux_ops->poll_mem(aux_dev, ci_cpu_addr, &ci, db_fifo_reset_cond_func1);
+ if (rc && !ci)
+ dev_err(hdev->dev, "Doorbell fifo reset timed out\n");
+ }
+
+ /* Push dummy doorbells such that HW CI points to fifo base. */
+ db_fifo_push_dummy(cn_port, id, NIC_FIFO_DB_SIZE - ci - 1, is_eth);
+
+ /* Wait for HW to absorb dummy doorbells and update CI. */
+ rc = gaudi2_aux_ops->poll_mem(aux_dev, ci_cpu_addr, &ci, db_fifo_reset_cond_func2);
+ if (rc && (ci != (NIC_FIFO_DB_SIZE - 1)))
+ dev_err(hdev->dev, "Doorbell fifo reset timed out, ci: %d\n", ci);
+
+ db_fifo_toggle_err_evt(cn_port, true);
+
+ /* Zero out HW CI buffer address register for added safety. */
+ if (is_eth) {
+ NIC_WREG32(NIC0_QPC0_DBFIFOSECUR_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_31_7, 0);
+ NIC_WREG32(NIC0_QPC0_DBFIFOSECUR_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_63_32, 0);
+ } else {
+ NIC_WREG32(NIC0_QPC0_DBFIFO0_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_31_7 +
+ db_fifo_offset64(id), 0);
+ NIC_WREG32(NIC0_QPC0_DBFIFO0_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_63_32 +
+ db_fifo_offset64(id), 0);
+ }
+}
+
+static void gaudi2_cn_db_fifo_reset(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+ struct hbl_cn_port *cn_port = &hdev->cn_ports[port];
+ struct gaudi2_cn_port *gaudi2_port;
+ u32 *ci_cpu_addr;
+
+ gaudi2_port = cn_port->cn_specific;
+ ci_cpu_addr = (u32 *)RING_BUF_ADDRESS(&gaudi2_port->fifo_ring);
+
+ gaudi2_port->db_fifo_pi = 0;
+
+ __db_fifo_reset(cn_port, ci_cpu_addr, 0, true);
+}
+
+static int gaudi2_cn_config_wqe_asid(struct hbl_cn_port *cn_port, u32 asid, bool set_asid)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int rc;
+
+ rc = hbl_cn_send_cpucp_packet(hdev, port, set_asid ? CPUCP_PACKET_NIC_WQE_ASID_SET :
+ CPUCP_PACKET_NIC_WQE_ASID_UNSET, asid);
+ if (rc)
+ dev_err(hdev->dev, "Failed to %s WQE ASID, port %d, rc %d\n",
+ set_asid ? "set" : "unset", port, rc);
+
+ return rc;
+}
+
+static int gaudi2_cn_disable_wqe_index_checker(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int rc;
+
+ rc = hbl_cn_send_cpucp_packet(hdev, port, CPUCP_PACKET_NIC_SET_CHECKERS,
+ RX_WQE_IDX_MISMATCH);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to disable Rx WQE idx mismatch checker, port %d, rc %d\n", port,
+ rc);
+ return rc;
+ }
+
+ rc = hbl_cn_send_cpucp_packet(hdev, port, CPUCP_PACKET_NIC_SET_CHECKERS,
+ TX_WQE_IDX_MISMATCH);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to disable Tx WQE idx mismatch checker, port %d, rc %d\n", port,
+ rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void *gaudi2_cn_dma_alloc_coherent(struct hbl_cn_device *hdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag)
+{
+ const struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+
+ return gaudi2->cn_aux_ops->dma_alloc_coherent(aux_dev, size, dma_handle, flag);
+}
+
+static void gaudi2_cn_dma_free_coherent(struct hbl_cn_device *hdev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+{
+ const struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+
+ gaudi2->cn_aux_ops->dma_free_coherent(aux_dev, size, cpu_addr, dma_handle);
+}
+
+static void *gaudi2_cn_dma_pool_zalloc(struct hbl_cn_device *hdev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle)
+{
+ const struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+
+ return gaudi2->cn_aux_ops->dma_pool_zalloc(aux_dev, size, mem_flags, dma_handle);
+}
+
+static void gaudi2_cn_dma_pool_free(struct hbl_cn_device *hdev, void *vaddr, dma_addr_t dma_addr)
+{
+ const struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct hbl_aux_dev *aux_dev = hdev->cn_aux_dev;
+
+ gaudi2->cn_aux_ops->dma_pool_free(aux_dev, vaddr, dma_addr);
+}
+
+static int gaudi2_cn_send_cpu_message(struct hbl_cn_device *hdev, u32 *msg, u16 len, u32 timeout,
+ u64 *result)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ return gaudi2_aux_ops->send_cpu_message(aux_dev, msg, len, timeout, result);
+}
+
+static int gaudi2_cn_poll_reg(struct hbl_cn_device *hdev, u32 reg, u64 timeout_us,
+ hbl_cn_poll_cond_func func, void *arg)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ return gaudi2_aux_ops->poll_reg(aux_dev, reg, timeout_us, func, arg);
+}
+
+static int gaudi2_cn_alloc_cq_rings(struct gaudi2_cn_port *gaudi2_port)
+{
+ u32 elem_size, queue_size, total_queues_size, count;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ struct hbl_cn_ring *ring;
+ dma_addr_t dma_addr;
+ void *cpu_addr;
+ int rc, i;
+
+ elem_size = sizeof(struct gaudi2_cqe);
+ count = NIC_CQ_MAX_ENTRIES;
+ queue_size = elem_size * count;
+ total_queues_size = queue_size * GAUDI2_NIC_MAX_CQS_NUM;
+
+ /* The HW expects all CQs to be located in physically contiguous memory, one
+ * after the other. Hence we allocate all of them in one chunk.
+ */
+ cpu_addr = hbl_cn_dma_alloc_coherent(hdev, total_queues_size, &dma_addr, GFP_KERNEL);
+ if (!cpu_addr)
+ return -ENOMEM;
+
+ for (i = 0; i < NIC_CQS_NUM; i++) {
+ ring = &gaudi2_port->cq_rings[i];
+ RING_BUF_ADDRESS(ring) = cpu_addr + i * queue_size;
+ RING_BUF_DMA_ADDRESS(ring) = dma_addr + i * queue_size;
+ /* prevent freeing memory fragments by individual Qs */
+ RING_BUF_SIZE(ring) = i ? 0 : total_queues_size;
+ ring->count = count;
+ ring->elem_size = elem_size;
+ ring->asid = hdev->kernel_asid;
+ }
+
+ for (i = 0; i < NIC_CQS_NUM; i++) {
+ ring = &gaudi2_port->cq_rings[i];
+ RING_PI_SIZE(ring) = sizeof(u64);
+ RING_PI_ADDRESS(ring) = hbl_cn_dma_pool_zalloc(hdev, RING_PI_SIZE(ring),
+ GFP_KERNEL | __GFP_ZERO,
+ &RING_PI_DMA_ADDRESS(ring));
+ if (!RING_PI_ADDRESS(ring)) {
+ rc = -ENOMEM;
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ /* free the PI buffers of the rings allocated so far */
+ for (--i; i >= 0; i--) {
+ ring = &gaudi2_port->cq_rings[i];
+ hbl_cn_dma_pool_free(hdev, RING_PI_ADDRESS(ring), RING_PI_DMA_ADDRESS(ring));
+ }
+
+ /* free rings memory */
+ hbl_cn_dma_free_coherent(hdev, total_queues_size, cpu_addr, dma_addr);
+
+ return rc;
+}
+
+static void gaudi2_cn_free_cq_rings(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ struct hbl_cn_ring *ring;
+ int i;
+
+ for (i = 0; i < NIC_CQS_NUM; i++) {
+ ring = &gaudi2_port->cq_rings[i];
+ hbl_cn_dma_pool_free(hdev, RING_PI_ADDRESS(ring), RING_PI_DMA_ADDRESS(ring));
+ }
+
+ /* the entire CQs memory is allocated as one chunk and stored at index 0 */
+ ring = &gaudi2_port->cq_rings[0];
+ hbl_cn_dma_free_coherent(hdev, RING_BUF_SIZE(ring), RING_BUF_ADDRESS(ring),
+ RING_BUF_DMA_ADDRESS(ring));
+}
+
+static int gaudi2_cn_alloc_rings_resources(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ int rc;
+
+ rc = hbl_cn_alloc_ring(hdev, &gaudi2_port->fifo_ring,
+ ALIGN(sizeof(u32), DEVICE_CACHE_LINE_SIZE), 1);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate fifo ring\n");
+ return rc;
+ }
+
+ rc = hbl_cn_alloc_ring(hdev, &gaudi2_port->rx_ring, NIC_RAW_ELEM_SIZE,
+ NIC_RX_RING_PKT_NUM);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate RX ring\n");
+ goto err_rx_ring;
+ }
+
+ rc = hbl_cn_alloc_ring(hdev, &gaudi2_port->wq_ring, sizeof(struct gaudi2_sq_wqe),
+ QP_WQE_NUM_REC);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate WQ ring\n");
+ goto err_wq_ring;
+ }
+
+ rc = hbl_cn_alloc_ring(hdev, &gaudi2_port->eq_ring, sizeof(struct hbl_cn_eqe),
+ NIC_EQ_RING_NUM_REC);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate EQ ring\n");
+ goto err_eq_ring;
+ }
+
+ rc = gaudi2_cn_alloc_cq_rings(gaudi2_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to allocate CQ rings\n");
+ goto err_cq_rings;
+ }
+
+ return 0;
+
+err_cq_rings:
+ hbl_cn_free_ring(hdev, &gaudi2_port->eq_ring);
+err_eq_ring:
+ hbl_cn_free_ring(hdev, &gaudi2_port->wq_ring);
+err_wq_ring:
+ hbl_cn_free_ring(hdev, &gaudi2_port->rx_ring);
+err_rx_ring:
+ hbl_cn_free_ring(hdev, &gaudi2_port->fifo_ring);
+
+ return rc;
+}
+
+static void gaudi2_cn_free_rings_resources(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+
+ gaudi2_cn_free_cq_rings(gaudi2_port);
+ hbl_cn_free_ring(hdev, &gaudi2_port->eq_ring);
+ hbl_cn_free_ring(hdev, &gaudi2_port->wq_ring);
+ hbl_cn_free_ring(hdev, &gaudi2_port->rx_ring);
+ hbl_cn_free_ring(hdev, &gaudi2_port->fifo_ring);
+}
+
+static void gaudi2_cn_reset_rings(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_ring *cq_ring;
+
+ /* Reset CQ ring HW PI and shadow PI/CI */
+ cq_ring = &gaudi2_port->cq_rings[NIC_CQ_RDMA_IDX];
+ *((u32 *)RING_PI_ADDRESS(cq_ring)) = 0;
+ cq_ring->pi_shadow = 0;
+ cq_ring->ci_shadow = 0;
+}
+
+static void gaudi2_cn_port_sw_fini(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+
+ mutex_destroy(&gaudi2_port->qp_destroy_lock);
+ mutex_destroy(&gaudi2_port->cfg_lock);
+
+ hbl_cn_eq_dispatcher_fini(cn_port);
+ gaudi2_cn_free_rings_resources(gaudi2_port);
+}
+
+static void link_eqe_init(struct hbl_cn_port *cn_port)
+{
+ /* Init only the header. The data field (i.e. link status) will be updated
+ * when the event is ready to be sent to the user.
+ */
+ cn_port->link_eqe.data[0] = EQE_HEADER(true, EQE_LINK_STATUS);
+}
+
+static int gaudi2_cn_port_sw_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct gaudi2_cn_device *gaudi2;
+ u32 port = cn_port->port;
+ int rc;
+
+ gaudi2 = hdev->asic_specific;
+ gaudi2_port = &gaudi2->cn_ports[port];
+ gaudi2_port->hdev = hdev;
+ gaudi2_port->cn_port = cn_port;
+ cn_port->cn_specific = gaudi2_port;
+
+ cn_port->cn_macro = &hdev->cn_macros[port >> 1];
+
+ INIT_DELAYED_WORK(&cn_port->fw_status_work, hbl_cn_fw_status_work);
+
+ rc = gaudi2_cn_alloc_rings_resources(gaudi2_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to alloc rings, port: %d, %d\n", port, rc);
+ return rc;
+ }
+
+ hbl_cn_eq_dispatcher_init(gaudi2_port->cn_port);
+
+ mutex_init(&gaudi2_port->cfg_lock);
+ mutex_init(&gaudi2_port->qp_destroy_lock);
+
+ /* Userspace might not be notified immediately of a link event from HW,
+ * e.g. if the serdes is not yet configured or the link is not stable, SW might
+ * defer sending the link event to userspace.
+ * Hence we cache the HW link EQE and update it with the real link status just
+ * before sending it to userspace.
+ */
+ link_eqe_init(cn_port);
+
+ return 0;
+}
+
+static int gaudi2_cn_macro_sw_init(struct hbl_cn_macro *cn_macro)
+{
+ return 0;
+}
+
+static void gaudi2_cn_macro_sw_fini(struct hbl_cn_macro *cn_macro)
+{
+}
+
+static int gaudi2_cn_set_pfc(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port, val = 0;
+ int i, start_lane;
+
+ val |= NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_TX_ENA_MASK |
+ NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_RX_ENA_MASK |
+ NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_PROMIS_EN_MASK |
+ NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_TX_PAD_EN_MASK;
+
+ if (cn_port->pfc_enable) {
+ val |= NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_PFC_MODE_MASK;
+ } else {
+ val |= NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_PAUSE_IGNORE_MASK |
+ NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG_CNTL_FRAME_ENA_MASK;
+ }
+
+ /* Write the value for each lane under this port */
+ start_lane = (port & 1) ? (NIC_MAC_NUM_OF_LANES / 2) : NIC_MAC_LANES_START;
+
+ for (i = start_lane; i < start_lane + (NIC_MAC_NUM_OF_LANES / 2); i++)
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG + MAC_CH_OFFSET(i), val);
+
+ return 0;
+}
+
+static int gaudi2_cn_config_port_hw_txs(struct gaudi2_cn_port *gaudi2_port)
+{
+ u32 txs_schedq, txs_fence_idx, txs_pi, txs_ci, txs_tail, txs_head, txs_timeout_31_0,
+ timeout_47_32, prio, txs_port, rl_en_log_time;
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct cpucp_cn_init_hw_mem_packet pkt;
+ struct hbl_cn_properties *cn_prop;
+ struct hbl_cn_device *hdev;
+ u32 port = cn_port->port;
+ bool use_cpucp;
+ u64 txs_addr;
+ int i, rc;
+
+ hdev = gaudi2_port->hdev;
+ cn_prop = &hdev->cn_props;
+
+ txs_addr = cn_prop->txs_base_addr + port * cn_prop->txs_base_size;
+
+ use_cpucp = !!(hdev->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_NIC_MEM_CLEAR_EN);
+ if (use_cpucp) {
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_INIT_TXS_MEM <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.cpucp_pkt.port_index = cpu_to_le32(port);
+ pkt.mem_base_addr = cpu_to_le64(txs_addr + TXS_FREE_OFFS);
+ pkt.num_entries = cpu_to_le16(TXS_FREE_NUM_ENTRIES);
+ pkt.entry_size = cpu_to_le16(TXS_ENT_SIZE);
+ pkt.granularity = cpu_to_le16(TXS_GRANULARITY);
+
+ rc = gaudi2_cn_send_cpu_message(hdev, (u32 *)&pkt, sizeof(pkt), 0, NULL);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to handle CPU-CP pkt %u, error %d\n",
+ CPUCP_PACKET_NIC_INIT_TXS_MEM, rc);
+ return rc;
+ }
+ } else {
+ /* TX sched-Qs list */
+ for (i = 0; i < TXS_FREE_NUM_ENTRIES; i++)
+ hbl_cn_dram_writel(hdev, TXS_GRANULARITY + i,
+ txs_addr + TXS_FREE_OFFS + i * TXS_ENT_SIZE);
+
+ /* Perform read to flush the writes */
+ hbl_cn_dram_readl(hdev, txs_addr);
+ }
+
+ WARN_ON_CACHE_UNALIGNED(txs_addr + TXS_FIFO_OFFS);
+
+ /* set TX sched queues address */
+ NIC_WREG32(NIC0_TXS0_BASE_ADDRESS_63_32, upper_32_bits(txs_addr + TXS_FIFO_OFFS));
+ NIC_WREG32(NIC0_TXS0_BASE_ADDRESS_31_7, lower_32_bits(txs_addr + TXS_FIFO_OFFS) >> 7);
+
+ /* Set access to bypass the MMU (old style configuration) */
+ NIC_WREG32(NIC0_TXS0_AXI_USER_LO, 0x400);
+
+ NIC_WREG32(NIC0_TXS0_FREE_LIST_PUSH_MASK_EN, 1);
+
+ txs_fence_idx = 0;
+ txs_pi = 0;
+ txs_ci = 0;
+ txs_tail = 0;
+ txs_head = 0;
+ txs_timeout_31_0 = 0;
+ timeout_47_32 = 0;
+ prio = 0;
+ txs_port = 0;
+ rl_en_log_time = 0;
+
+ /* Gaudi2 TXS implements 256 schedule-Qs.
+ * These queues are hard-divided into 4x64 priority groups of Qs.
+ * (The first and last group-relative Q numbers of each group (0-63) can be
+ * configured via NIC0_TXS0_FIRST_SCHEDQ_ID and NIC0_TXS0_LAST_SCHEDQ_ID; we
+ * use their default values of 0 and 63 respectively).
+ * From the above pools we need to allocate and configure:
+ * 256 Qs (0-255) that are evenly divided between the 4 possible ports, so each
+ * port is assigned 64 Qs.
+ * The 64 Qs are divided between the 4 possible priorities, generating 16
+ * priority-granularity groups of which:
+ * - The last group is dedicated to Ethernet (RAW_SCHED_Q).
+ * - The last-1 group is dedicated to the RDMA responder (RES_SCHED_Q).
+ * - The last-2 group is dedicated to the RDMA requester (REQ_SCHED_Q).
+ * - The remaining Qs will be used by the BBR when supported.
+ */
+ for (i = 0; i < TXS_SCHEDQ; i++) {
+ /* main sched Qs */
+ txs_port = i / TXS_PORT_NUM_SCHEDQS;
+
+ prio = i % HBL_EN_PFC_PRIO_NUM;
+ txs_schedq = (timeout_47_32 & 0xFFFF) | ((prio & 0x3) << 16) |
+ ((txs_port & 1) << 18) | ((rl_en_log_time & 0x3F) << 19);
+ txs_tail = i;
+ txs_head = i;
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_31_0, txs_fence_idx);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_63_32, txs_pi);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_95_64, txs_ci);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_127_96, txs_tail);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_159_128, txs_head);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_191_160, txs_timeout_31_0);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_DESC_217_192, txs_schedq);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_FIFO, i);
+ NIC_WREG32(NIC0_TXS0_SCHEDQ_UPDATE_EN, 1);
+ }
+
+ NIC_WREG32(NIC0_TXS0_TICK_WRAP, 100);
+
+ NIC_WREG32(NIC0_TXS0_SCAN_TIME_COMPARE_0, 4);
+ NIC_WREG32(NIC0_TXS0_SCAN_TIME_COMPARE_1, 0);
+ NIC_WREG32(NIC0_TXS0_TMR_SCAN_EN, 1);
+
+ NIC_WREG32(NIC0_TXS0_BASE_ADDRESS_FREE_LIST_63_32,
+ upper_32_bits(txs_addr + TXS_FREE_OFFS));
+
+ NIC_WREG32(NIC0_TXS0_BASE_ADDRESS_FREE_LIST_31_0,
+ lower_32_bits(txs_addr + TXS_FREE_OFFS));
+
+ NIC_WREG32(NIC0_TXS0_LIST_MASK,
+ ~(0xFFFFFFFF << (ilog2(TXS_FREE_NUM_ENTRIES) - 5)));
+ NIC_WREG32(NIC0_TXS0_PRODUCER_UPDATE, TXS_FREE_NUM_ENTRIES);
+ NIC_WREG32(NIC0_TXS0_PRODUCER_UPDATE_EN, 1);
+ NIC_WREG32(NIC0_TXS0_PRODUCER_UPDATE_EN, 0);
+ NIC_WREG32(NIC0_TXS0_LIST_MEM_READ_MASK, 0);
+ NIC_WREG32(NIC0_TXS0_PUSH_LOCK_EN, 1);
+
+ /* disable burst size optimization */
+ NIC_WREG32(NIC0_TXS0_IGNORE_BURST_EN, 0);
+
+ return 0;
+}
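The queue-to-port/priority mapping and the LIST_MASK computation above can be modeled in plain C. This is a minimal userspace sketch, assuming the layout in the comment (256 queues, 64 per port, 4 priorities); the constant values mirror the driver's names but are assumptions here, not copied from the headers.

```c
/* Assumed constants mirroring the layout described in the comment:
 * 256 scheduler queues, 64 per port, 4 PFC priorities.
 */
#define TXS_SCHEDQ            256
#define TXS_PORT_NUM_SCHEDQS  64
#define HBL_EN_PFC_PRIO_NUM   4

/* Integer log2 for power-of-two values */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Port and priority derived from a flat sched-queue index,
 * as in the configuration loop.
 */
static unsigned int schedq_port(unsigned int i)
{
	return i / TXS_PORT_NUM_SCHEDQS;
}

static unsigned int schedq_prio(unsigned int i)
{
	return i % HBL_EN_PFC_PRIO_NUM;
}

/* LIST_MASK for a free list with 'entries' entries: a mask of
 * (log2(entries) - 5) low bits, matching
 * ~(0xFFFFFFFF << (ilog2(TXS_FREE_NUM_ENTRIES) - 5)).
 */
static unsigned int txs_list_mask(unsigned int entries)
{
	return ~(0xFFFFFFFFu << (ilog2_u32(entries) - 5));
}
```

For example, queue 255 belongs to port 3 at priority 3, and a 2048-entry free list yields a LIST_MASK of 0x3F.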
+
+static void gaudi2_cn_config_port_hw_txe(struct gaudi2_cn_port *gaudi2_port, u64 mac_addr)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ struct hbl_cn_properties *cn_prop;
+ u32 port;
+
+ cn_prop = &hdev->cn_props;
+ port = gaudi2_port->cn_port->port;
+
+ /* set the base address of the raw wq */
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_63_32_0,
+ upper_32_bits(RING_BUF_DMA_ADDRESS(&gaudi2_port->wq_ring)));
+
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_31_0_0,
+ lower_32_bits(RING_BUF_DMA_ADDRESS(&gaudi2_port->wq_ring)));
+
+ NIC_WREG32(NIC0_TXE0_LOG_MAX_WQ_SIZE_0, WQ_BUFFER_LOG_SIZE - 2);
+
+ /* map: prio#0-dscp#0, prio#1-dscp#0, prio#2-dscp#16, prio#3-dscp#24 */
+ NIC_WREG32(NIC0_TXE0_PRIO_TO_DSCP_0, 0x18100000);
+
+ NIC_WREG32(NIC0_TXE0_PORT0_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+ NIC_WREG32(NIC0_TXE0_PORT0_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+ NIC_WREG32(NIC0_TXE0_PORT1_MAC_CFG_47_32, (mac_addr >> 32) & 0xFFFF);
+ NIC_WREG32(NIC0_TXE0_PORT1_MAC_CFG_31_0, mac_addr & 0xFFFFFFFF);
+
+ /* set MMU bypass for kernel WQ */
+ NIC_WREG32(NIC0_TXE0_WQE_USER_CFG, 0x1);
+
+ NIC_WREG32(NIC0_TXE0_WQE_PREFETCH_CFG, 0x3);
+
+ NIC_WREG32(NIC0_TXE0_BTH_MKEY, 0xffff);
+
+ /* 100ms BW window size */
+ NIC_WREG32(NIC0_TXE0_STATS_CFG0, cn_prop->clk * PERF_BW_WINDOW_USEC);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG1, 1, NIC0_TXE0_STATS_CFG1_LATENCY_ENABLE_MASK);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG1, 0, NIC0_TXE0_STATS_CFG1_WIN_TYPE_MASK);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG1, 0, NIC0_TXE0_STATS_CFG1_WIN_SAMP_LATENCY_MASK);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG1, 3, NIC0_TXE0_STATS_CFG1_TOT_TYPE_MASK);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG1, 1, NIC0_TXE0_STATS_CFG1_ENABLE_MASK);
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG2, 0, NIC0_TXE0_STATS_CFG2_LATENCY_WRAP_EN_MASK);
+ /* 2us latency window size */
+ NIC_RMWREG32(NIC0_TXE0_STATS_CFG2, 2 * cn_prop->clk,
+ NIC0_TXE0_STATS_CFG2_LATENCY_MAX_VAL_MASK);
+}
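The station MAC is programmed as two register halves: bits 47:32 into *_MAC_CFG_47_32 and bits 31:0 into *_MAC_CFG_31_0. A minimal sketch of that split (the function names are illustrative, not driver API):

```c
/* Upper 16 bits of a 48-bit MAC, as written to *_MAC_CFG_47_32 */
static unsigned int mac_cfg_47_32(unsigned long long mac)
{
	return (unsigned int)((mac >> 32) & 0xFFFF);
}

/* Lower 32 bits of a 48-bit MAC, as written to *_MAC_CFG_31_0 */
static unsigned int mac_cfg_31_0(unsigned long long mac)
{
	return (unsigned int)(mac & 0xFFFFFFFF);
}
```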
+
+static void gaudi2_cn_config_port_hw_qpc(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ u64 req_qpc_base_addr, res_qpc_base_addr;
+ struct hbl_cn_properties *cn_prop;
+ struct hbl_cn_device *hdev;
+ u32 port = cn_port->port;
+
+ hdev = gaudi2_port->hdev;
+ cn_prop = &hdev->cn_props;
+
+ req_qpc_base_addr = cn_prop->req_qpc_base_addr + port * cn_prop->req_qpc_base_size;
+ res_qpc_base_addr = cn_prop->res_qpc_base_addr + port * cn_prop->res_qpc_base_size;
+
+ WARN_ON_CACHE_UNALIGNED(req_qpc_base_addr);
+ WARN_ON_CACHE_UNALIGNED(res_qpc_base_addr);
+
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_63_32, upper_32_bits(req_qpc_base_addr));
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_31_7, lower_32_bits(req_qpc_base_addr) >> 7);
+
+ NIC_WREG32(NIC0_QPC0_RES_BASE_ADDRESS_63_32, upper_32_bits(res_qpc_base_addr));
+ NIC_WREG32(NIC0_QPC0_RES_BASE_ADDRESS_31_7, lower_32_bits(res_qpc_base_addr) >> 7);
+
+ NIC_WREG32(NIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 1);
+ NIC_WREG32(NIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 1);
+ NIC_WREG32(NIC0_QPC0_RES_QPC_CACHE_INVALIDATE, 0);
+ NIC_WREG32(NIC0_QPC0_REQ_QPC_CACHE_INVALIDATE, 0);
+
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_CAUSE, 0);
+
+ /* Configure MMU-BP override for DB-FIFOs */
+ NIC_WREG32(NIC0_QPC0_AXUSER_DB_FIFO_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_DB_FIFO_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ WARN_ON_CACHE_UNALIGNED(RING_BUF_DMA_ADDRESS(&gaudi2_port->fifo_ring));
+
+ /* Configure doorbell */
+ NIC_WREG32(NIC0_QPC0_DBFIFOSECUR_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_63_32,
+ upper_32_bits(RING_BUF_DMA_ADDRESS(&gaudi2_port->fifo_ring)));
+ NIC_WREG32(NIC0_QPC0_DBFIFOSECUR_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_31_7,
+ lower_32_bits(RING_BUF_DMA_ADDRESS(&gaudi2_port->fifo_ring)) >> 7);
+
+ gaudi2_cn_eq_dispatcher_register_db(gaudi2_port, hdev->kernel_asid,
+ GAUDI2_DB_FIFO_SECURE_HW_ID);
+
+ /* Configure MMU-BP override for error-FIFO */
+ NIC_WREG32(NIC0_QPC0_AXUSER_ERR_FIFO_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_ERR_FIFO_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ NIC_WREG32(NIC0_QPC0_RETRY_COUNT_MAX,
+ (GAUDI2_NIC_MAX_TIMEOUT_RETRIES << NIC0_QPC0_RETRY_COUNT_MAX_TIMEOUT_SHIFT) |
+ (GAUDI2_NIC_MAX_SEQ_ERR_RETRIES <<
+ NIC0_QPC0_RETRY_COUNT_MAX_SEQUENCE_ERROR_SHIFT));
+
+ /* Configure MMU-BP override for QPCs */
+ NIC_WREG32(NIC0_QPC0_AXUSER_QPC_REQ_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_QPC_REQ_HB_RD_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_QPC_RESP_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_QPC_RESP_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ /* Configure MMU-BP override for Congestion-Queue */
+ NIC_WREG32(NIC0_QPC0_AXUSER_CONG_QUE_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_CONG_QUE_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 1,
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_MOVEQP2ERR_SECUR_ERR_MASK);
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 0,
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_PUSH_TO_ERROR_ASID_MASK);
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 1,
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_UPD_IGNORE_ASID_ERR_MASK);
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 0,
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_MOVEQP2ERR_ASID_ERR_MASK);
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 1,
+ NIC0_QPC0_REQ_STATIC_CONFIG_QM_PUSH_TO_ERROR_SECURITY_MASK);
+
+ /* Disable the WTD back-pressure mechanism to ARC and QMAN - it will be
+ * enabled later on by the user.
+ */
+ NIC_RMWREG32_SHIFTED(NIC0_QPC0_WTD_CONFIG, 0, NIC0_QPC0_WTD_CONFIG_WQ_BP_2ARC_EN_MASK |
+ NIC0_QPC0_WTD_CONFIG_WQ_BP_2QMAN_EN_MASK);
+}
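Two register encodings above are worth spelling out: the QPC base registers take only bits 31:7 of the low address word (hence the cache-alignment WARNs, since the base must be 128-byte aligned), and RETRY_COUNT_MAX packs the two retry limits by shift. A sketch under stated assumptions; the TIMEOUT_SHIFT value here is hypothetical, standing in for the NIC0_QPC0_RETRY_COUNT_MAX_* shift macros.

```c
/* Bits 31:7 of the low address word, as written to *_BASE_ADDRESS_31_7.
 * The base is assumed 128-byte aligned.
 */
static unsigned int qpc_base_31_7(unsigned long long addr)
{
	return (unsigned int)(addr & 0xFFFFFFFF) >> 7;
}

/* Hypothetical shift for illustration only (the driver uses the
 * NIC0_QPC0_RETRY_COUNT_MAX_* macros; 8 is an assumption).
 */
#define TIMEOUT_SHIFT 8

/* Pack the timeout and sequence-error retry limits into one word */
static unsigned int retry_count_max(unsigned int timeout_retries,
				    unsigned int seq_err_retries)
{
	return (timeout_retries << TIMEOUT_SHIFT) | seq_err_retries;
}
```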
+
+static void gaudi2_cn_config_port_hw_rxe(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_ring *cq_ring = &gaudi2_port->cq_rings[NIC_CQ_RAW_IDX];
+ struct hbl_cn_properties *cn_prop = &gaudi2_port->hdev->cn_props;
+ struct hbl_cn_ring *rx_ring = &gaudi2_port->rx_ring;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = gaudi2_port->cn_port->port;
+ u32 rx_mem_addr_lo, rx_mem_addr_hi;
+ int i;
+
+ rx_mem_addr_lo = lower_32_bits(RING_BUF_DMA_ADDRESS(rx_ring));
+ rx_mem_addr_hi = upper_32_bits(RING_BUF_DMA_ADDRESS(rx_ring));
+
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P0_0, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P0_1, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P1_0, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P1_1, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P2_0, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P2_1, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P3_0, RAW_QPN);
+ NIC_WREG32(NIC0_RXE0_RAW_QPN_P3_1, RAW_QPN);
+
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P0_0, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P0_1, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P0_0, rx_mem_addr_hi);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P0_1, rx_mem_addr_hi);
+
+ /* RAW_MISC layout: bits 3:0 hold log2 of the raw entry size,
+ * bits 19:15 hold log2 of the buffer size
+ */
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P0_0,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P0_1,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P1_0, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P1_1, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P1_0, rx_mem_addr_hi);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P1_1, rx_mem_addr_hi);
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P1_0,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P1_1,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P2_0, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P2_1, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P2_0, rx_mem_addr_hi);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P2_1, rx_mem_addr_hi);
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P2_0,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P2_1,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P3_0, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_LO_P3_1, rx_mem_addr_lo);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P3_0, rx_mem_addr_hi);
+ NIC_WREG32(NIC0_RXE0_RAW_BASE_HI_P3_1, rx_mem_addr_hi);
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P3_0,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_RAW_MISC_P3_1,
+ (ilog2(rx_ring->elem_size) & NIC0_RXE0_RAW_MISC_P2_LOG_RAW_ENTRY_SIZE_P2_MASK) |
+ ((ilog2(rx_ring->count) & 0x1F) << 15));
+
+ NIC_WREG32(NIC0_RXE0_CQ_BASE_ADDR_63_32, upper_32_bits(RING_BUF_DMA_ADDRESS(cq_ring)));
+ NIC_WREG32(NIC0_RXE0_CQ_BASE_ADDR_31_7,
+ lower_32_bits(RING_BUF_DMA_ADDRESS(cq_ring)) & 0xFFFFFF80);
+
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_HI_0,
+ upper_32_bits(RING_PI_DMA_ADDRESS(cq_ring)));
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_LO_0,
+ lower_32_bits(RING_PI_DMA_ADDRESS(cq_ring)) & 0xFFFFFF80);
+
+ /* Set the max CQ size */
+ NIC_WREG32(NIC0_RXE0_CQ_LOG_MAX_SIZE, ilog2(NIC_CQ_MAX_ENTRIES));
+
+ /* Set the actual single CQ size: log2(number of entries in the CQ) */
+ NIC_WREG32(NIC0_RXE0_CQ_LOG_SIZE_0, ilog2(cq_ring->count));
+
+ /* Initialize MMU-BP for all CQs */
+ for (i = 0; i < cn_prop->max_cqs; i++)
+ NIC_WREG32(NIC0_RXE0_AXUSER_AXUSER_CQ0_HB_WR_OVRD_LO +
+ (i * NIC_RXE_AXUSER_AXUSER_CQ_OFFSET), 0xFFFFFBFF);
+
+ NIC_WREG32(NIC0_RXE0_CQ_WRITE_INDEX_0, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_PRODUCER_INDEX_0, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_CONSUMER_INDEX_0, 0);
+
+ /* enable, pi-update and completion-events */
+ NIC_WREG32(NIC0_RXE0_CQ_CFG_0, 1 << NIC0_RXE0_CQ_CFG_WRITE_PI_EN_SHIFT |
+ 1 << NIC0_RXE0_CQ_CFG_ENABLE_SHIFT);
+
+ /* disable all RDMA CQs */
+ for (i = 1; i < cn_prop->max_cqs; i++)
+ NIC_WREG32(NIC0_RXE0_CQ_CFG_0 + i * 4, 0);
+
+ /* set MMU bypass for kernel WQ */
+ NIC_WREG32(NIC0_RXE0_ARUSER_MMU_BP, 0x1);
+
+ /* set SPMU RXE counters of Group 1 */
+ NIC_WREG32(NIC0_RXE0_DBG_SPMU_SELECT, 0x1);
+}
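The repeated RAW_MISC writes all use the same encoding: log2 of the Rx entry size in the low bits and log2 of the ring entry count at bit 15. A small model of that value, assuming both inputs are powers of two:

```c
/* Integer log2 for power-of-two values */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* RAW_MISC encoding: log2(entry size) in bits 3:0,
 * log2(ring count) in bits 19:15.
 */
static unsigned int raw_misc_val(unsigned int elem_size, unsigned int count)
{
	return (ilog2_u32(elem_size) & 0xF) | ((ilog2_u32(count) & 0x1F) << 15);
}
```

For a 512-byte entry and a 1024-entry ring this yields 0x50009.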
+
+static void gaudi2_cn_config_hw_mac_filter(struct gaudi2_cn_port *gaudi2_port, u64 mac_addr)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+
+ if (cn_port->eth_enable) {
+ if (port & 1) {
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_2,
+ mac_addr & 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_2,
+ (mac_addr >> 32) & 0xFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_MASK_2, 0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_MASK_2, 0);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_31_0_2,
+ mac_addr & 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_47_32_2,
+ (mac_addr >> 32) & 0xFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_31_0_MASK_2, 0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_47_32_MASK_2, 0);
+ } else {
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_0,
+ mac_addr & 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_0,
+ (mac_addr >> 32) & 0xFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_MASK_0, 0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_MASK_0, 0);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_31_0_0,
+ mac_addr & 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_47_32_0,
+ (mac_addr >> 32) & 0xFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_31_0_MASK_0, 0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RAW0_MAC_47_32_MASK_0, 0);
+ }
+ } else {
+ if (port & 1) {
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_MASK_2, 0xFFFFFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_MASK_2, 0xFFFF);
+ } else {
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_31_0_MASK_0, 0xFFFFFFFF);
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TS_RC_MAC_47_32_MASK_0, 0xFFFF);
+ }
+ }
+}
+
+static int gaudi2_cn_hw_mac_ch_reset(struct gaudi2_cn_port *gaudi2_port, int lane)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = gaudi2_port->cn_port->port;
+ ktime_t timeout;
+ u32 read_reg;
+
+ if (hdev->skip_mac_reset)
+ return 0;
+
+ timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * 1000ull);
+
+ do {
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_CONTROL1 + MAC_CH_OFFSET(lane),
+ BIT(NIC0_MAC_CH0_MAC_PCS_CONTROL1_FLD_RESET_SHIFT));
+ usleep_range(50, 200);
+
+ read_reg = NIC_MACRO_RREG32(NIC0_MAC_CH0_MAC_PCS_CONTROL1 + MAC_CH_OFFSET(lane));
+ } while ((read_reg & NIC0_MAC_CH0_MAC_PCS_CONTROL1_FLD_RESET_MASK) &&
+ ktime_compare(ktime_get(), timeout) < 0);
+
+ if (read_reg & NIC0_MAC_CH0_MAC_PCS_CONTROL1_FLD_RESET_MASK) {
+ dev_err(hdev->dev, "Timeout while resetting MAC channel %d\n", lane);
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+static void gaudi2_cn_hw_mac_port_config_lane_speed(struct gaudi2_cn_port *gaudi2_port, int i)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+
+ switch (cn_port->speed) {
+ case SPEED_25000:
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL + MAC_CH_OFFSET(i),
+ REGMASK(0x4FFF,
+ NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL_MARKER_COUNTER));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_PCS_MODE + MAC_CH_OFFSET(i),
+ REGMASK(1, NIC0_MAC_CH0_MAC_PCS_VENDOR_PCS_MODE_ENA_CLAUSE49) |
+ REGMASK(1, NIC0_MAC_CH0_MAC_PCS_VENDOR_PCS_MODE_HI_BER25));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC1, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M0) |
+ REGMASK(0x68, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x21, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xF0, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M0) |
+ REGMASK(0xC4, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xE6, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC5, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M0) |
+ REGMASK(0x65, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x9B, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xA2, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M0) |
+ REGMASK(0x79, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x3D, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1_M2));
+ break;
+ case SPEED_50000:
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL + MAC_CH_OFFSET(i),
+ REGMASK(0x4FFF,
+ NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL_MARKER_COUNTER));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_PCS_MODE + MAC_CH_OFFSET(i), 0x0);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x90, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M0) |
+ REGMASK(0x76, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x47, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xF0, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M0) |
+ REGMASK(0xC4, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xE6, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC5, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M0) |
+ REGMASK(0x65, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x9B, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xA2, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M0) |
+ REGMASK(0x79, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x3D, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1_M2));
+ break;
+ case SPEED_100000:
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL + MAC_CH_OFFSET(i),
+ REGMASK(0x3FFF,
+ NIC0_MAC_CH0_MAC_PCS_VENDOR_VL_INTVL_MARKER_COUNTER));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_PCS_MODE + MAC_CH_OFFSET(i), 0x0);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC1, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M0) |
+ REGMASK(0x68, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x21, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL0_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x9D, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M0) |
+ REGMASK(0x71, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x8E, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL1_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x59, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M0) |
+ REGMASK(0x4B, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xE8, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL2_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x4D, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M0) |
+ REGMASK(0x95, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_0_M1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x7B, NIC0_MAC_CH0_MAC_PCS_VENDOR_VL3_1_M2));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL4_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x7F5, NIC0_MAC_CH0_MAC_PCS_VL4_0_VL4_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL4_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x9, NIC0_MAC_CH0_MAC_PCS_VL4_1_VL4_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL5_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x14DD, NIC0_MAC_CH0_MAC_PCS_VL5_0_VL5_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL5_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xC2, NIC0_MAC_CH0_MAC_PCS_VL5_1_VL5_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL6_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x4A9A, NIC0_MAC_CH0_MAC_PCS_VL6_0_VL6_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL6_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x26, NIC0_MAC_CH0_MAC_PCS_VL6_1_VL6_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL7_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x457B, NIC0_MAC_CH0_MAC_PCS_VL7_0_VL7_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL7_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x66, NIC0_MAC_CH0_MAC_PCS_VL7_1_VL7_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL8_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x24A0, NIC0_MAC_CH0_MAC_PCS_VL8_0_VL8_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL8_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x76, NIC0_MAC_CH0_MAC_PCS_VL8_1_VL8_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL9_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC968, NIC0_MAC_CH0_MAC_PCS_VL9_0_VL9_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL9_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xFB, NIC0_MAC_CH0_MAC_PCS_VL9_1_VL9_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL10_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x6CFD, NIC0_MAC_CH0_MAC_PCS_VL10_0_VL10_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL10_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x99, NIC0_MAC_CH0_MAC_PCS_VL10_1_VL10_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL11_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x91B9, NIC0_MAC_CH0_MAC_PCS_VL11_0_VL11_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL11_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x55, NIC0_MAC_CH0_MAC_PCS_VL11_1_VL11_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL12_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xB95C, NIC0_MAC_CH0_MAC_PCS_VL12_0_VL12_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL12_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xB2, NIC0_MAC_CH0_MAC_PCS_VL12_1_VL12_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL13_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xF81A, NIC0_MAC_CH0_MAC_PCS_VL13_0_VL13_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL13_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xBD, NIC0_MAC_CH0_MAC_PCS_VL13_1_VL13_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL14_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xC783, NIC0_MAC_CH0_MAC_PCS_VL14_0_VL14_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL14_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xCA, NIC0_MAC_CH0_MAC_PCS_VL14_1_VL14_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL15_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x3635, NIC0_MAC_CH0_MAC_PCS_VL15_0_VL15_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL15_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xCD, NIC0_MAC_CH0_MAC_PCS_VL15_1_VL15_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL16_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x31C4, NIC0_MAC_CH0_MAC_PCS_VL16_0_VL16_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL16_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x4C, NIC0_MAC_CH0_MAC_PCS_VL16_1_VL16_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL17_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xD6AD, NIC0_MAC_CH0_MAC_PCS_VL17_0_VL17_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL17_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xB7, NIC0_MAC_CH0_MAC_PCS_VL17_1_VL17_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL18_0 + MAC_CH_OFFSET(i),
+ REGMASK(0x665F, NIC0_MAC_CH0_MAC_PCS_VL18_0_VL18_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL18_1 + MAC_CH_OFFSET(i),
+ REGMASK(0x2A, NIC0_MAC_CH0_MAC_PCS_VL18_1_VL18_1));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL19_0 + MAC_CH_OFFSET(i),
+ REGMASK(0xF0C0, NIC0_MAC_CH0_MAC_PCS_VL19_0_VL19_0));
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_PCS_VL19_1 + MAC_CH_OFFSET(i),
+ REGMASK(0xE5, NIC0_MAC_CH0_MAC_PCS_VL19_1_VL19_1));
+ break;
+ default:
+ dev_err(hdev->dev, "unknown port %d speed %dMb/s, cannot set MAC XPCS\n", port,
+ cn_port->speed);
+ return;
+ }
+}
+
+static void gaudi2_cn_hw_mac_port_config(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+ int i, start_lane;
+
+ start_lane = (port & 1) ? (NIC_MAC_NUM_OF_LANES / 2) : NIC_MAC_LANES_START;
+
+ for (i = start_lane; i < start_lane + (NIC_MAC_NUM_OF_LANES / 2); i++) {
+ gaudi2_cn_hw_mac_ch_reset(gaudi2_port, i);
+
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_FRM_LENGTH + MAC_CH_OFFSET(i),
+ NIC_MAC_MAX_FRM_LEN);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_COMMAND_CONFIG + MAC_CH_OFFSET(i), 0x2913);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_TX_FIFO_SECTIONS + MAC_CH_OFFSET(i), 0x4);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_RX_FIFO_SECTIONS + MAC_CH_OFFSET(i), 0x4);
+
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL01_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL01_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL23_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL23_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL45_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL45_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL67_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL67_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL89_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL89_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1011_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1011_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1213_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1213_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1415_PAUSE_QUANTA + MAC_CH_OFFSET(i),
+ 0xFFFFFFFF);
+ NIC_MACRO_WREG32(NIC0_MAC_CH0_MAC_128_CL1415_QUANTA_THRESH + MAC_CH_OFFSET(i),
+ 0x7FFF7FFF);
+
+ gaudi2_cn_hw_mac_port_config_lane_speed(gaudi2_port, i);
+ }
+}
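The lane selection above gives each port half of the macro's MAC lanes: even ports start at lane 0, odd ports at the midpoint. A minimal sketch, assuming 4 lanes per macro (2 per port) and a start offset of 0; the constant value is an assumption for illustration:

```c
/* Assumption: 4 MAC lanes per macro, 2 per port */
#define NIC_MAC_NUM_OF_LANES 4

/* First lane owned by a port: even ports take the lower half,
 * odd ports the upper half.
 */
static int port_first_lane(unsigned int port)
{
	return (port & 1) ? NIC_MAC_NUM_OF_LANES / 2 : 0;
}
```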
+
+static void gaudi2_cn_enable_port_interrupts(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ /* enable RXE block interrupts. RXE SPI interrupts should stay masked out
+ * as they generate a lot of events which are not fatal errors
+ */
+ NIC_WREG32(NIC0_RXE0_SEI_INTR_MASK, 0x0);
+
+ /* enable TXS block interrupts */
+ NIC_WREG32(NIC0_TXS0_INTERRUPT_MASK, 0x0);
+
+ /* enable TXE block interrupts */
+ NIC_WREG32(NIC0_TXE0_INTERRUPT_MASK, 0x0);
+
+ /* enable QPC response error interrupts */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_RESP_ERR_MASK, 0x0);
+}
+
+static void gaudi2_cn_disable_port_interrupts(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ /* disable RXE block interrupts */
+ NIC_WREG32(NIC0_RXE0_SEI_INTR_MASK, 0xF);
+
+ /* disable TXS block interrupts */
+ NIC_WREG32(NIC0_TXS0_INTERRUPT_MASK, 0xF);
+
+ /* disable TXE block interrupts */
+ NIC_WREG32(NIC0_TXE0_INTERRUPT_MASK, 0x7F);
+
+ /* disable QPC response error interrupts */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_RESP_ERR_MASK, 0x7F);
+}
+
+static int gaudi2_cn_hw_port_config(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+ u64 mac_addr = 0;
+ int i, rc;
+
+ for (i = 0; i < ETH_ALEN; i++) {
+ mac_addr <<= 8;
+ mac_addr |= hdev->cpucp_info->mac_addrs[port].mac_addr[i];
+ }
+
+ /* TXS Configuration */
+ rc = gaudi2_cn_config_port_hw_txs(gaudi2_port);
+ if (rc)
+ return rc;
+
+ /* TXE Configuration */
+ gaudi2_cn_config_port_hw_txe(gaudi2_port, mac_addr);
+
+ /* QPC Configuration */
+ gaudi2_cn_config_port_hw_qpc(gaudi2_port);
+
+ /* RXE Configuration */
+ gaudi2_cn_config_port_hw_rxe(gaudi2_port);
+
+ /* MAC filtering */
+ gaudi2_cn_config_hw_mac_filter(gaudi2_port, mac_addr);
+
+ /* Lanes Configuration */
+ gaudi2_cn_hw_mac_port_config(gaudi2_port);
+
+ /* PFC Configuration */
+ gaudi2_cn_set_pfc(cn_port);
+
+ /* Enable port GIC interrupts - required only if running on PLDM */
+ if (hdev->pldm)
+ gaudi2_cn_enable_port_interrupts(cn_port);
+
+ return rc;
+}
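The FW delivers the port's MAC address as a 6-byte array, which the loop above folds into a u64 MSB-first. The same accumulation as a standalone helper:

```c
/* Fold a 6-byte MAC address into a u64, most significant byte first,
 * as the configuration loop does.
 */
static unsigned long long mac_to_u64(const unsigned char mac[6])
{
	unsigned long long v = 0;
	int i;

	for (i = 0; i < 6; i++)
		v = (v << 8) | mac[i];
	return v;
}
```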
+
+void gaudi2_cn_hw_mac_loopback_cfg(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_device *hdev;
+ u32 port, val;
+
+ cn_port = gaudi2_port->cn_port;
+ hdev = cn_port->hdev;
+ port = cn_port->port;
+ val = !!cn_port->mac_loopback;
+
+ if (port & 1) {
+ /* odd ports use lanes 2,3 */
+ NIC_MACRO_RMWREG32(NIC0_MAC_CH2_MAC_PCS_CONTROL1, val,
+ NIC0_MAC_CH0_MAC_PCS_CONTROL1_LOOPBACK_MASK);
+ NIC_MACRO_RMWREG32(NIC0_MAC_CH3_MAC_PCS_CONTROL1, val,
+ NIC0_MAC_CH0_MAC_PCS_CONTROL1_LOOPBACK_MASK);
+ } else {
+ /* even ports use lanes 0,1 */
+ NIC_MACRO_RMWREG32(NIC0_MAC_CH0_MAC_PCS_CONTROL1, val,
+ NIC0_MAC_CH0_MAC_PCS_CONTROL1_LOOPBACK_MASK);
+ NIC_MACRO_RMWREG32(NIC0_MAC_CH1_MAC_PCS_CONTROL1, val,
+ NIC0_MAC_CH0_MAC_PCS_CONTROL1_LOOPBACK_MASK);
+ }
+
+ /* flush cfg */
+ NIC_MACRO_RREG32(NIC0_MAC_CH0_MAC_PCS_CONTROL1);
+}
+
+bool gaudi2_cn_is_cq_in_overrun(struct hbl_cn_port *cn_port, u8 cq_id)
+{
+ struct hbl_cn_user_cq *user_cq;
+ bool is_cq_in_overrun = false;
+
+ user_cq = hbl_cn_user_cq_get(cn_port, cq_id);
+ if (user_cq) {
+ is_cq_in_overrun = user_cq->qp_set_overrun_cnt > 0;
+ hbl_cn_user_cq_put(user_cq);
+ }
+
+ return is_cq_in_overrun;
+}
+
+static void gaudi2_cn_qp_pre_destroy(struct hbl_cn_qp *qp)
+{
+ struct hbl_cn_port *cn_port = qp->cn_port;
+ struct gaudi2_cn_port *gaudi2_port;
+
+ gaudi2_port = cn_port->cn_specific;
+
+ mutex_lock(&gaudi2_port->qp_destroy_lock);
+
+ /* only the first QP should enable MAC loopback */
+ if (++gaudi2_port->qp_destroy_cnt == 1 && !cn_port->mac_loopback && !cn_port->pcs_link) {
+ cn_port->mac_loopback = true;
+ gaudi2_cn_hw_mac_loopback_cfg(gaudi2_port);
+ gaudi2_port->qp_destroy_mac_lpbk = true;
+ }
+
+ mutex_unlock(&gaudi2_port->qp_destroy_lock);
+}
+
+static void gaudi2_cn_qp_post_destroy(struct hbl_cn_qp *qp)
+{
+ struct hbl_cn_port *cn_port = qp->cn_port;
+ struct gaudi2_cn_port *gaudi2_port;
+
+ gaudi2_port = cn_port->cn_specific;
+
+ mutex_lock(&gaudi2_port->qp_destroy_lock);
+
+ /* only the last QP should disable MAC loopback */
+ if (!--gaudi2_port->qp_destroy_cnt && gaudi2_port->qp_destroy_mac_lpbk) {
+ cn_port->mac_loopback = false;
+ gaudi2_cn_hw_mac_loopback_cfg(gaudi2_port);
+ gaudi2_port->qp_destroy_mac_lpbk = false;
+ }
+
+ mutex_unlock(&gaudi2_port->qp_destroy_lock);
+}
+
+static int gaudi2_cn_hw_config(struct gaudi2_cn_port *gaudi2_port)
+{
+ int rc;
+
+ rc = gaudi2_cn_hw_port_config(gaudi2_port);
+ if (rc)
+ return rc;
+
+ gaudi2_cn_hw_mac_loopback_cfg(gaudi2_port);
+
+ return rc;
+}
+
+static int gaudi2_cn_port_hw_init(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+ int rc;
+
+ gaudi2_cn_reset_rings(gaudi2_port);
+
+ /* register the Eth CQ with the event dispatcher */
+ rc = hbl_cn_eq_dispatcher_register_cq(cn_port, gaudi2_port->cq_rings[NIC_CQ_RAW_IDX].asid,
+ NIC_CQ_RAW_IDX);
+ if (rc) {
+ dev_err(hdev->dev, "failed to register port %d CQ %d with cn_eq_sw\n", port,
+ NIC_CQ_RAW_IDX);
+ goto cq_register_fail;
+ }
+
+ rc = gaudi2_cn_hw_config(gaudi2_port);
+ if (rc)
+ goto cq_init_fail;
+
+ cn_port->eq_handler_enable = true;
+
+ rc = gaudi2_qp_sanity_init(gaudi2_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init QP sanity, port: %d, %d\n", port, rc);
+ goto cq_init_fail;
+ }
+
+ return 0;
+
+cq_init_fail:
+ cn_port->eq_handler_enable = false;
+ hbl_cn_eq_dispatcher_unregister_cq(cn_port, NIC_CQ_RAW_IDX);
+cq_register_fail:
+
+ return rc;
+}
+
+static void gaudi2_cn_port_hw_fini(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+
+ /* Disable port GIC interrupts - required only if running on PLDM */
+ if (hdev->pldm)
+ gaudi2_cn_disable_port_interrupts(cn_port);
+
+ gaudi2_qp_sanity_fini(gaudi2_port);
+
+ cn_port->eq_handler_enable = false;
+
+ hbl_cn_eq_dispatcher_unregister_cq(cn_port, NIC_CQ_RAW_IDX);
+ hbl_cn_eq_dispatcher_unregister_db(cn_port, GAUDI2_DB_FIFO_SECURE_HW_ID);
+
+ hbl_cn_eq_dispatcher_reset(cn_port);
+}
+
+static bool qpc_op_cond_func(u32 val, void *arg)
+{
+ return !val;
+}
+
+/* must be called under mutex_lock(&cn_port->qpc_lock) */
+static int gaudi2_cn_qpc_op(struct hbl_cn_port *cn_port, u64 ctrl, bool wait_for_completion)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int rc = 0;
+
+ NIC_WREG32(NIC0_QPC0_GW_CTRL, ctrl);
+
+ NIC_WREG32(NIC0_QPC0_GW_BUSY, 1);
+
+ /* do not poll on registers when reset was initiated by FW */
+ if (wait_for_completion && !hdev->fw_reset) {
+ u32 addr = NIC0_QPC0_GW_BUSY + NIC_CFG_BASE(port, NIC0_QPC0_GW_BUSY);
+
+ rc = gaudi2_cn_poll_reg(hdev, addr, hdev->qpc_cache_inv_timeout, qpc_op_cond_func,
+ NULL);
+ }
+
+ return rc;
+}
+
+static int gaudi2_cn_qpc_write_masked(struct hbl_cn_port *cn_port, const void *qpc_data,
+ const struct qpc_mask *qpc_mask, u32 qpn, bool is_req,
+ bool force_doorbell)
+{
+ struct hbl_cn_device *hdev;
+ u32 port, data_size, ctrl;
+ const u32 *mask, *data;
+ int i, rc;
+
+ hdev = cn_port->hdev;
+ port = cn_port->port;
+ data_size = is_req ? sizeof(struct gaudi2_qpc_requester) :
+ sizeof(struct gaudi2_qpc_responder);
+ mask = (const u32 *)qpc_mask;
+ data = qpc_data;
+
+ /* Don't write to the GW if it's busy with a previous operation */
+ if (NIC_RREG32(NIC0_QPC0_GW_BUSY)) {
+ if (hbl_cn_comp_device_operational(hdev))
+ dev_err(hdev->dev, "Cannot write to port %d QP %d %s QPC, GW is busy\n",
+ port, qpn, is_req ? "requester" : "responder");
+
+ return (hdev->in_teardown && hdev->hw_invalid_while_teardown) ? 0 : -EBUSY;
+ }
+
+ /* Copy the mask and data to the gateway regs.
+ * Only the data bits with their corresponding mask-bits set will be written
+ * to the HW.
+ */
+ for (i = 0; i < (sizeof(struct qpc_mask) / sizeof(u32)); i++)
+ NIC_WREG32(NIC0_QPC0_GW_MASK_0 + i * sizeof(u32), mask[i]);
+
+ for (i = 0; i < (data_size / sizeof(u32)); i++)
+ NIC_WREG32(NIC0_QPC0_GW_DATA_0 + i * sizeof(u32), data[i]);
+
+ ctrl = (is_req << NIC0_QPC0_GW_CTRL_REQUESTER_SHIFT) | qpn |
+ (!!force_doorbell << NIC0_QPC0_GW_CTRL_DOORBELL_FORCE_SHIFT);
+
+ rc = gaudi2_cn_qpc_op(cn_port, ctrl, true);
+ if (rc && hbl_cn_comp_device_operational(hdev))
+ /* Device might not respond during reset if the reset was due to error */
+ dev_err(hdev->dev, "%s QPC GW write timeout, port: %d, qpn: %u\n",
+ is_req ? "requester" : "responder", port, qpn);
+
+ return rc;
+}
+
+bool gaudi2_handle_qp_error_retry(struct hbl_cn_port *cn_port, u32 qpn)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct qpc_mask mask = {};
+ int port = cn_port->port;
+ u8 max_retry_timeout;
+ struct hbl_cn_qp *qp;
+ int rc, retry = 5;
+ u8 timeout_max;
+ u64 wq_delay;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+ port_funcs->cfg_lock(cn_port);
+ qp = xa_load(&cn_port->qp_ids, qpn);
+
+ if (!qp) {
+ port_funcs->cfg_unlock(cn_port);
+ dev_err(hdev->dev, "adaptive retry, port %d, QP: %d is null\n",
+ port, qpn);
+
+ return false;
+ }
+
+ cancel_delayed_work(&qp->adaptive_tmr_reset);
+
+ timeout_max = qp->timeout_granularity + NIC_ADAPTIVE_TIMEOUT_RANGE / 2;
+ if (qp->timeout_curr < timeout_max) {
+ qp->timeout_curr++;
+ /* clear QP error */
+ REQ_QPC_SET_ERR(mask, 1);
+ REQ_QPC_SET_ERR(req_qpc, 0);
+ REQ_QPC_SET_TIMEOUT_RETRY_COUNT(mask, 0xff);
+ REQ_QPC_SET_TIMEOUT_RETRY_COUNT(req_qpc, 0);
+ REQ_QPC_SET_TM_GRANULARITY(mask, 0x7f);
+ REQ_QPC_SET_TM_GRANULARITY(req_qpc, qp->timeout_curr);
+
+ do {
+ rc = gaudi2_cn_qpc_write_masked(cn_port, &req_qpc, &mask, qp->qp_id,
+ true, true);
+ if (rc)
+ dev_err(hdev->dev, "failed to write QPC port %d, %d, err %d\n",
+ port, qpn, rc);
+
+ rc = gaudi2_cn_qpc_read(cn_port, &req_qpc, qp->qp_id, true);
+ if (rc)
+ dev_err(hdev->dev, "failed to read QPC port %d, %d, err %d\n",
+ port, qpn, rc);
+ if (!REQ_QPC_GET_ERROR(req_qpc))
+ break;
+ retry--;
+ } while (retry);
+
+ if (!retry) {
+ port_funcs->cfg_unlock(cn_port);
+ dev_err(hdev->dev, "failed to clear QPC error port %d, %d\n", port, qpn);
+
+ return false;
+ }
+
+ dev_dbg_ratelimited(hdev->dev, "dropping Port-%d QP error on qp %d\n",
+ port, qp->qp_id);
+
+ max_retry_timeout = GAUDI2_NIC_MAX_TIMEOUT_RETRIES / NIC_ADAPTIVE_TIMEOUT_RANGE;
+ wq_delay = NIC_GRAN_TO_USEC(qp->timeout_curr) * max_retry_timeout *
+ NIC_TMR_RESET_FACTOR;
+ queue_delayed_work(cn_port->qp_wq, &qp->adaptive_tmr_reset,
+ msecs_to_jiffies(wq_delay / 1000));
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return true;
+ }
+
+ qp->timeout_curr = qp->timeout_granularity - (NIC_ADAPTIVE_TIMEOUT_RANGE >> 1);
+
+ port_funcs->cfg_unlock(cn_port);
+
+ return false;
+}
+
+static int gaudi2_cn_qpc_write(struct hbl_cn_port *cn_port, void *qpc, struct qpc_mask *qpc_mask,
+ u32 qpn, bool is_req)
+{
+ u32 data_size = is_req ? sizeof(struct gaudi2_qpc_requester) :
+ sizeof(struct gaudi2_qpc_responder);
+ struct qpc_mask mask = {};
+
+ if (!qpc_mask) {
+ /* A NULL mask indicates a full QPC write */
+ memset(&mask, 0xFF, data_size);
+ qpc_mask = &mask;
+ }
+
+ return gaudi2_cn_qpc_write_masked(cn_port, qpc, qpc_mask, qpn, is_req, false);
+}
+
+static int gaudi2_cn_qpc_invalidate(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req)
+{
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct gaudi2_qpc_responder res_qpc = {};
+ struct qpc_mask mask = {};
+ void *qpc;
+ int rc;
+
+ if (is_req) {
+ /* Use congestion window mode with RTT state disabled and
+ * window size 0 to force REQ Tx to stop, while Rx remains
+ * active.
+ */
+ REQ_QPC_SET_CONGESTION_MODE(mask, 3);
+ REQ_QPC_SET_RTT_STATE(mask, 3);
+ REQ_QPC_SET_CONGESTION_WIN(mask, GENMASK(23, 0));
+
+ REQ_QPC_SET_CONGESTION_MODE(req_qpc, 2);
+ REQ_QPC_SET_RTT_STATE(req_qpc, 0);
+ REQ_QPC_SET_CONGESTION_WIN(req_qpc, 0);
+ qpc = &req_qpc;
+ } else {
+ RES_QPC_SET_VALID(mask, 1);
+ RES_QPC_SET_VALID(res_qpc, 0);
+ qpc = &res_qpc;
+ }
+
+ rc = gaudi2_cn_qpc_write_masked(cn_port, qpc, &mask, qp->qp_id, is_req, false);
+
+ if (is_req) {
+ /* Allow CQ overrun to make sure QP drain is successful. In case PFC is sent due to
+ * CQ overflow, no Tx can be sent until the CQ releases the back-pressure. If in the
+ * meantime a QP needs to be invalidated, no Tx will be sent in the QP drain stage.
+ * This will cause Tx slices to be stuck after the QP drain stage has finished.
+ * Later, when the CQ is destroyed, it will release the back-pressure, causing
+ * the stuck Tx slices to be sent (as there's no more back-pressure). Since no QP
+ * is allocated anymore, AXI errors and QP invalid errors will be received.
+ * As a workaround to the issue above, allow overrun on the associated CQ of the
+ * invalidated QP. This will release the back-pressure before the drain stage, and
+ * will allow all needed Tx packets to be drained successfully. Once the drain stage
+ * is done, and the QP is cleared, disable CQ overrun.
+ */
+ if (!qp->force_cq_overrun && qp->req_user_cq) {
+ qp->force_cq_overrun = true;
+ gaudi2_user_cq_set_overrun(qp->req_user_cq, true);
+ }
+
+ /* H/W bug H6-3379: the TXE WQ cache is disabled, thus no need to invalidate it */
+ }
+
+ return rc;
+}
+
+static int gaudi2_cn_qpc_clear(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, bool is_req)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct gaudi2_qpc_responder res_qpc = {};
+ u32 port = cn_port->port;
+ struct qpc_mask mask;
+ void *qpc;
+ int rc;
+
+ qpc = is_req ? (void *)&req_qpc : (void *)&res_qpc;
+
+ if (qp->force_cq_overrun && is_req) {
+ qp->force_cq_overrun = false;
+ if (qp->req_user_cq)
+ gaudi2_user_cq_set_overrun(qp->req_user_cq, false);
+ }
+
+ memset(&mask, 0xFF, sizeof(mask));
+
+ rc = gaudi2_cn_qpc_write_masked(cn_port, qpc, &mask, qp->qp_id, is_req, false);
+ if (rc)
+ return rc;
+
+ if (is_req) {
+ /* Invalidate RXE WQE cache */
+ NIC_RMWREG32(NIC0_RXE0_CACHE_CFG, 1, NIC0_RXE0_CACHE_CFG_INVALIDATION_MASK);
+ NIC_RREG32(NIC0_RXE0_CACHE_CFG);
+
+ NIC_RMWREG32(NIC0_RXE0_CACHE_CFG, 0, NIC0_RXE0_CACHE_CFG_INVALIDATION_MASK);
+ NIC_RREG32(NIC0_RXE0_CACHE_CFG);
+ }
+
+ return 0;
+}
+
+int gaudi2_cn_qpc_read(struct hbl_cn_port *cn_port, void *qpc, u32 qpn, bool is_req)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ bool force_doorbell = false;
+ u32 *data, port, size;
+ int i, rc;
+ u64 ctrl;
+
+ port = cn_port->port;
+ data = qpc;
+ size = is_req ? sizeof(struct gaudi2_qpc_requester) : sizeof(struct gaudi2_qpc_responder);
+
+ /* Don't write to the GW if it's busy with a previous operation */
+ if (NIC_RREG32(NIC0_QPC0_GW_BUSY)) {
+ if (hbl_cn_comp_device_operational(hdev))
+ dev_err(hdev->dev, "Cannot read from port %d QP %d %s QPC, GW is busy\n",
+ port, qpn, is_req ? "requester" : "responder");
+
+ return (hdev->in_teardown && hdev->hw_invalid_while_teardown) ? 0 : -EBUSY;
+ }
+
+ /* Clear the mask gateway regs which will cause the operation to be a read */
+ for (i = 0; i < QPC_GW_MASK_REG_NUM; i++)
+ NIC_WREG32(NIC0_QPC0_GW_MASK_0 + i * sizeof(u32), 0);
+
+ ctrl = (is_req << NIC0_QPC0_GW_CTRL_REQUESTER_SHIFT) | qpn |
+ (!!force_doorbell << NIC0_QPC0_GW_CTRL_DOORBELL_FORCE_SHIFT);
+ rc = gaudi2_cn_qpc_op(cn_port, ctrl, true);
+ if (rc)
+ return rc;
+
+ for (i = 0; i < size / sizeof(u32); i++)
+ data[i] = NIC_RREG32(NIC0_QPC0_GW_DATA_0 + i * sizeof(u32));
+
+ return 0;
+}
+
+static int gaudi2_cn_qpc_query(struct hbl_cn_port *cn_port, u32 qpn, bool is_req,
+ struct hbl_cn_qpc_attr *attr)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_qpc_requester req_qpc;
+ struct gaudi2_qpc_responder res_qpc;
+ u32 port = cn_port->port;
+ int rc;
+
+ if (is_req) {
+ rc = gaudi2_cn_qpc_read(cn_port, (void *)&req_qpc, qpn, is_req);
+ if (rc)
+ goto out_err;
+
+ attr->valid = REQ_QPC_GET_VALID(req_qpc);
+ attr->in_work = REQ_QPC_GET_IN_WORK(req_qpc);
+ attr->error = REQ_QPC_GET_ERROR(req_qpc);
+ } else {
+ rc = gaudi2_cn_qpc_read(cn_port, (void *)&res_qpc, qpn, is_req);
+ if (rc)
+ goto out_err;
+
+ attr->valid = RES_QPC_GET_VALID(res_qpc);
+ attr->in_work = RES_QPC_GET_IN_WORK(res_qpc);
+ attr->conn_state = RES_QPC_GET_CONN_STATE(res_qpc);
+ }
+
+ return 0;
+
+out_err:
+ dev_err(hdev->dev, "%s QPC GW read timeout, port: %d, qpn: %u\n",
+ is_req ? "requester" : "responder", port, qpn);
+ return rc;
+}
+
+int gaudi2_cn_wqe_read(struct hbl_cn_port *cn_port, void *wqe, u32 qpn, u32 wqe_idx, bool is_tx)
+{
+ u64 wq_base_addr_upper, wq_base_addr_lower, wq_size_cline_log, ctrl,
+ req_qpc_base_addr_upper, req_qpc_base_addr_lower, wqe_offset;
+ u32 *data, port, wqe_cline_idx, num_of_wqe_in_cline;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ int i, rc;
+
+ port = cn_port->port;
+ data = wqe;
+
+ if (is_tx) {
+ wq_base_addr_upper = NIC_RREG32(NIC0_TXE0_SQ_BASE_ADDRESS_63_32_1);
+ wq_base_addr_lower = NIC_RREG32(NIC0_TXE0_SQ_BASE_ADDRESS_31_0_1);
+ wq_size_cline_log = NIC_RREG32(NIC0_TXE0_LOG_MAX_WQ_SIZE_1);
+ num_of_wqe_in_cline = TX_WQE_NUM_IN_CLINE;
+ } else {
+ wq_base_addr_upper = NIC_RREG32(NIC0_RXE0_WIN1_WQ_BASE_HI);
+ wq_base_addr_lower = NIC_RREG32(NIC0_RXE0_WIN1_WQ_BASE_LO);
+ wq_size_cline_log = NIC_RREG32(NIC0_RXE0_WIN1_WQ_MISC);
+ num_of_wqe_in_cline = RX_WQE_NUM_IN_CLINE;
+ }
+ wqe_cline_idx = wqe_idx / num_of_wqe_in_cline;
+
+ req_qpc_base_addr_upper = NIC_RREG32(NIC0_QPC0_REQ_BASE_ADDRESS_63_32);
+ req_qpc_base_addr_lower = NIC_RREG32(NIC0_QPC0_REQ_BASE_ADDRESS_31_7);
+
+ /* Don't write to the GW if it's busy with a previous operation */
+ if (NIC_RREG32(NIC0_QPC0_GW_BUSY)) {
+ if (hbl_cn_comp_device_operational(hdev))
+ dev_err(hdev->dev,
+ "Cannot read wqe from port %d QP %d, GW is busy\n", port, qpn);
+
+ return -EBUSY;
+ }
+
+ WARN_ON_CACHE_UNALIGNED(wq_base_addr_lower);
+
+ /* Temporarily override the QPC base address in order to read the WQ */
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_63_32, wq_base_addr_upper);
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_31_7, wq_base_addr_lower >> 7);
+
+ /* Clear the mask gateway regs which will cause the operation to be a read */
+ for (i = 0; i < QPC_GW_MASK_REG_NUM; i++)
+ NIC_WREG32(NIC0_QPC0_GW_MASK_0 + i * sizeof(u32), 0);
+
+ /* Calculate the WQE offset in cache line units */
+ wqe_offset = (1ULL << wq_size_cline_log) * qpn + wqe_cline_idx;
+ ctrl = NIC0_QPC0_GW_CTRL_REQUESTER_MASK + wqe_offset;
+ rc = gaudi2_cn_qpc_op(cn_port, ctrl, true);
+ if (rc)
+ goto exit;
+
+ /* The H/W reads a full cache line at a time */
+ for (i = 0; i < 32; i++)
+ data[i] = NIC_RREG32(NIC0_QPC0_GW_DATA_0 + i * sizeof(u32));
+
+exit:
+ /* Restore the configuration */
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_63_32, req_qpc_base_addr_upper);
+ NIC_WREG32(NIC0_QPC0_REQ_BASE_ADDRESS_31_7, req_qpc_base_addr_lower);
+
+ return rc;
+}
+
+static bool is_valid_mtu(u16 mtu)
+{
+ return (mtu == SZ_1K) || (mtu == SZ_2K) || (mtu == SZ_4K) || (mtu == SZ_8K);
+}
+
+static int normalize_priority(struct hbl_cn_device *hdev, u32 priority, enum hbl_ts_type type,
+ bool is_req, u32 *norm_priority)
+{
+ /* Ethernet and Responder get the highest priority */
+ if (!is_req || type == TS_RAW) {
+ *norm_priority = GAUDI2_PFC_PRIO_DRIVER;
+ return 0;
+ }
+
+ /* Req priority can vary from 1 to 3 */
+ if (priority < GAUDI2_PFC_PRIO_USER_BASE || priority >= HBL_EN_PFC_PRIO_NUM)
+ return -EINVAL;
+
+ *norm_priority = priority;
+ return 0;
+}
+
+static u32 gaudi2_cn_txs_get_schedq_num(u32 priority, bool is_req)
+{
+ u32 prio_q_group;
+
+ /* prio-group numbering starts from 1 - normalize it to zero */
+ prio_q_group = (is_req ? TXS_PORT_REQ_SCHED_Q : TXS_PORT_RES_SCHED_Q);
+
+ return prio_q_group * HBL_EN_PFC_PRIO_NUM + priority;
+}
+
+static void gaudi2_default_encap_set(struct hbl_cn_port *cn_port, u32 *encap_id, u32 src_ip)
+{
+ struct hbl_cn_encap_xarray_pdata encap_data;
+ u8 dummy_hdr[NIC_MAX_TNL_HDR_SIZE] = {};
+
+ gaudi2_get_default_encap_id(cn_port, encap_id);
+
+ if (!src_ip && hbl_cn_get_src_ip(cn_port, &src_ip)) {
+ dev_dbg(cn_port->hdev->dev, "failed to get interface IP, using 0\n");
+ src_ip = 0;
+ }
+
+ memset(dummy_hdr, 0xa5, sizeof(dummy_hdr));
+
+ encap_data.port = cn_port->port;
+ encap_data.id = *encap_id;
+ encap_data.src_ip = src_ip;
+ encap_data.encap_type = HBL_CNI_ENCAP_OVER_UDP;
+ encap_data.encap_type_data = DUMMY_UDP_PORT;
+ encap_data.encap_header = dummy_hdr;
+ encap_data.encap_header_size = sizeof(dummy_hdr);
+
+ gaudi2_encap_set(cn_port, *encap_id, &encap_data);
+}
+
+static int gaudi2_cn_validate_timeout(u8 gran)
+{
+ u64 tmr_timeout_us;
+ int ret = 0;
+
+ tmr_timeout_us = NIC_GRAN_TO_USEC(gran);
+
+ /* This check guarantees that we can't overflow the PSN window within a timer period,
+ * meaning that it needs to ensure that the time it takes to transmit the total amount
+ * of bits of all packets is greater than the timeout value. Note that we take the
+ * "worst case" values, i.e. min MTU and max speed.
+ *
+ * MAX_CONG_WND * MIN_MTU [bits]
+ * ----------------------------- > TIMEOUT [s]
+ * SPEED [bits/s]
+ *
+ * ==>
+ *
+ * GAUDI2_NIC_MAX_CONG_WND * (NIC_RAW_MIN_MTU [bytes] * BITS_PER_BYTE)
+ * ------------------------------------------------------------------- >
+ * GAUDI2_NIC_MAX_SPEED [Mbits/sec] * 1M
+ *
+ * NIC_TMR_TIMEOUT_US [usec]
+ * -------------------------
+ * 1M
+ * ==>
+ *
+ * GAUDI2_NIC_MAX_CONG_WND * (NIC_RAW_MIN_MTU * BITS_PER_BYTE) >
+ * NIC_TMR_TIMEOUT_US * GAUDI2_NIC_MAX_SPEED
+ */
+ if ((((u64)GAUDI2_NIC_MAX_CONG_WND) * ((u64)(NIC_RAW_MIN_MTU * BITS_PER_BYTE))) <=
+ (tmr_timeout_us * ((u64)GAUDI2_NIC_MAX_SPEED)))
+ ret = -EINVAL;
+
+ return ret;
+}
+
+static int gaudi2_set_req_qp_ctx(struct hbl_cn_device *hdev, struct hbl_cni_req_conn_ctx_in *in,
+ struct hbl_cn_qp *qp)
+{
+ u8 mac[ETH_ALEN], cqn, encap_en, timer_granularity;
+ struct gaudi2_qpc_requester req_qpc;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_en_aux_ops *aux_ops;
+ u32 port, priority, encap_id;
+ struct hbl_cn_port *cn_port;
+ struct hbl_aux_dev *aux_dev;
+ int rc;
+
+ port = in->port;
+ cn_port = &hdev->cn_ports[port];
+ gaudi2_port = cn_port->cn_specific;
+ aux_dev = &hdev->en_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ /* In case the user didn't set encap, leave it unset for internal ports. */
+ encap_id = 0;
+ encap_en = 0;
+
+ /* Enforce the sender (remote) WQ to be at least 4 times bigger than the receiver WQ
+ * to avoid a H/W bug - RDV roll-back may get the QP stuck.
+ */
+ if (in->wq_type == QPC_REQ_WQ_TYPE_WRITE && in->wq_remote_log_size &&
+ (in->wq_size * 4 > BIT(in->wq_remote_log_size))) {
+ dev_dbg(hdev->dev, "Invalid RDV WQ size. local %d, remote %lu, port %d\n",
+ in->wq_size, BIT(in->wq_remote_log_size), port);
+ return -EINVAL;
+ }
+
+ if (in->mtu && !is_valid_mtu(in->mtu)) {
+ dev_dbg(hdev->dev, "MTU of %u is not supported, port %d\n", in->mtu, port);
+ return -EINVAL;
+ }
+
+ if (normalize_priority(hdev, in->priority, TS_RC, true, &priority)) {
+ dev_dbg(hdev->dev, "Unsupported priority value %u, port %d\n", in->priority, port);
+ return -EINVAL;
+ }
+
+ /* H6-3399: The below configuration isn't valid due to a H/W bug, i.e. using encap_id
+ * for src IP settings without encapsulation isn't allowed.
+ */
+ if (!in->encap_en && in->encap_id) {
+ dev_dbg(hdev->dev,
+ "Encapsulation ID %d can't be set when encapsulation is disabled, port %d\n",
+ in->encap_id, port);
+ return -EINVAL;
+ }
+
+ /* Due to H/W bug H6-3280, it was decided to allow congestion control for external ports
+ * only - the user shouldn't enable it for internal ports.
+ */
+ if (!cn_port->eth_enable && in->congestion_en) {
+ dev_err(hdev->dev,
+ "congestion control should be disabled for internal ports, port %d mode %u\n",
+ port, in->congestion_en);
+ return -EINVAL;
+ }
+
+ if (cn_port->bp_enable && in->wq_size < GAUDI2_NIC_MIN_WQ_SIZE_BP_ENABLED) {
+ dev_err(hdev->dev,
+ "WQ size (%d) can't be smaller than %d when back pressure is enabled, port %d\n",
+ in->wq_size, GAUDI2_NIC_MIN_WQ_SIZE_BP_ENABLED, port);
+ return -EINVAL;
+ }
+
+ timer_granularity = hdev->pldm ? NIC_TMR_TIMEOUT_PLDM_GRAN : in->timer_granularity;
+
+ /* gran 0 is a special case for the highest timeout supported by the H/W */
+ if (timer_granularity && gaudi2_cn_validate_timeout(timer_granularity)) {
+ dev_err(hdev->dev,
+ "timer granularity %d is not supported\n", timer_granularity);
+ return -EINVAL;
+ }
+
+ if (gaudi2_port->adaptive_timeout_en) {
+ qp->timeout_granularity = timer_granularity;
+ qp->timeout_curr = timer_granularity - (NIC_ADAPTIVE_TIMEOUT_RANGE >> 1);
+ timer_granularity = qp->timeout_curr;
+ }
+
+ if (in->cq_number) {
+ /* User CQ. */
+ cqn = in->cq_number;
+
+ user_cq = hbl_cn_user_cq_get(cn_port, cqn);
+ if (!user_cq) {
+ dev_dbg(hdev->dev, "CQ %d is invalid, port %d\n", cqn, port);
+ return -EINVAL;
+ }
+
+ qp->req_user_cq = user_cq;
+ } else {
+ /* No CQ. */
+ cqn = NIC_CQ_RDMA_IDX;
+ }
+
+ if (cn_port->eth_enable)
+ memcpy(mac, in->dst_mac_addr, ETH_ALEN);
+ else
+ /* in this case the MAC is irrelevant so use broadcast */
+ eth_broadcast_addr(mac);
+
+ memset(&req_qpc, 0, sizeof(req_qpc));
+
+ REQ_QPC_SET_DST_QP(req_qpc, in->dst_conn_id);
+ REQ_QPC_SET_PORT(req_qpc, 0); /* Always select lane 0 */
+ REQ_QPC_SET_PRIORITY(req_qpc, 3);
+ REQ_QPC_SET_RKEY(req_qpc, qp->remote_key);
+ REQ_QPC_SET_DST_IP(req_qpc, in->dst_ip_addr);
+ REQ_QPC_SET_DST_MAC_LSB(req_qpc, *(u32 *)mac);
+ REQ_QPC_SET_DST_MAC_MSB(req_qpc, *(u16 *)(mac + 4));
+
+ REQ_QPC_SET_SCHD_Q_NUM(req_qpc, gaudi2_cn_txs_get_schedq_num(priority, true));
+ REQ_QPC_SET_TM_GRANULARITY(req_qpc, timer_granularity);
+
+ REQ_QPC_SET_TRANSPORT_SERVICE(req_qpc, TS_RC);
+ REQ_QPC_SET_BURST_SIZE(req_qpc, QPC_REQ_BURST_SIZE);
+ REQ_QPC_SET_LAST_IDX(req_qpc, in->wq_size - 1);
+
+ REQ_QPC_SET_WQ_BASE_ADDR(req_qpc, 1);
+
+ /* In case the user didn't specify MTU, set the one from netdev.
+ * If there is no netdev, use the default value.
+ */
+ if (in->mtu) {
+ qp->mtu = in->mtu;
+ qp->mtu_type = MTU_FROM_USER;
+ } else if (cn_port->eth_enable) {
+ if (aux_ops->get_mtu)
+ qp->mtu = aux_ops->get_mtu(aux_dev, port) + HBL_EN_MAX_HEADERS_SZ;
+ else
+ qp->mtu = GAUDI2_NIC_MTU_DEFAULT;
+
+ qp->mtu_type = MTU_FROM_NETDEV;
+ } else {
+ qp->mtu = GAUDI2_NIC_MTU_DEFAULT;
+ qp->mtu_type = MTU_DEFAULT;
+ }
+
+ REQ_QPC_SET_MTU(req_qpc, ilog2(roundup_pow_of_two(qp->mtu)) - 10);
+
+ /* GAUDI1 mode is not used and hence set to 0 */
+ REQ_QPC_SET_MOD_GAUDI1(req_qpc, 0);
+ REQ_QPC_SET_SWQ_GRANULARITY(req_qpc, in->swq_granularity);
+ REQ_QPC_SET_CQ_NUM(req_qpc, cqn);
+
+ /* Protect the H/W from a zero value */
+ REQ_QPC_SET_REMOTE_WQ_LOG_SZ(req_qpc, in->wq_remote_log_size ? in->wq_remote_log_size : 2);
+
+ /* config MMU-BP */
+ REQ_QPC_SET_DATA_MMU_BYPASS(req_qpc, 0);
+
+ /* ASID is also used as protection-domain, so always configure it */
+ REQ_QPC_SET_ASID(req_qpc, qp->ctx->user_asid);
+
+ REQ_QPC_SET_ACKREQ_FREQ(req_qpc, 8);
+ REQ_QPC_SET_WQ_TYPE(req_qpc, in->wq_type);
+
+ /* user QP - unsecured trust level */
+ REQ_QPC_SET_TRUST_LEVEL(req_qpc, UNSECURED);
+
+ /* Congestion control configurations are done for external ports only - for internal ports
+ * congestion control will be disabled.
+ */
+ if (cn_port->eth_enable) {
+ u32 congestion_wnd;
+
+ if (in->congestion_wnd > GAUDI2_NIC_MAX_CONG_WND) {
+ dev_dbg(hdev->dev,
+ "Congestion window size(%u) can't be > max allowed size(%lu), port %d\n",
+ in->congestion_wnd, GAUDI2_NIC_MAX_CONG_WND, port);
+ return -EINVAL;
+ }
+
+ /* congestion_mode:
+ * 0: no congestion
+ * 1: congestion control (BBR/SWIFT)
+ * 2: congestion window
+ *
+ * REQ_QPC_SET_CONGESTION_MODE sets these modes.
+ * REQ_QPC_SET_RTT_STATE enables the CC-CQ mechanism (relevant for BBR/SWIFT only).
+ * When the user does not set congestion_en, we set congestion mode 2
+ * so we still have congestion control via CONGESTION_WIN.
+ */
+ REQ_QPC_SET_CONGESTION_MODE(req_qpc, (in->congestion_en) ? 1 : 2);
+
+ REQ_QPC_SET_RTT_STATE(req_qpc, in->congestion_en);
+
+ congestion_wnd = in->congestion_wnd ? in->congestion_wnd : GAUDI2_NIC_MAX_CONG_WND;
+ REQ_QPC_SET_CONGESTION_WIN(req_qpc, congestion_wnd);
+ }
+
+ if (in->encap_en) {
+ encap_id = in->encap_id;
+ encap_en = in->encap_en;
+ } else if (cn_port->eth_enable) {
+ gaudi2_get_default_encap_id(cn_port, &encap_id);
+ encap_en = 1;
+ }
+
+ REQ_QPC_SET_ENCAP_ENABLE(req_qpc, encap_en);
+ REQ_QPC_SET_ENCAP_TYPE(req_qpc, encap_id);
+
+ REQ_QPC_SET_VALID(req_qpc, 1);
+
+ rc = gaudi2_cn_qpc_write(cn_port, &req_qpc, NULL, in->conn_id, true);
+ if (rc)
+ goto qpc_write_fail;
+
+ if (gaudi2_port->advanced && cn_port->bp_enable &&
+ in->wq_size < gaudi2_port->min_qp_size) {
+ gaudi2_port->min_qp_size = in->wq_size;
+
+ /* The back-pressure threshold values describe the occupancy of the QP,
+ * thus should be configured to be the size of the smallest QP minus some
+ * defined numbers (currently 24/26 for the upper/lower thresholds
+ * respectively).
+ */
+ NIC_WREG32(NIC0_QPC0_WQ_UPPER_THRESHOLD,
+ gaudi2_port->min_qp_size - GAUDI2_NIC_WTD_BP_UPPER_TH_DIFF);
+ NIC_WREG32(NIC0_QPC0_WQ_LOWER_THRESHOLD,
+ gaudi2_port->min_qp_size - GAUDI2_NIC_WTD_BP_LOWER_TH_DIFF);
+ }
+
+ return 0;
+
+qpc_write_fail:
+ if (qp->req_user_cq) {
+ hbl_cn_user_cq_put(qp->req_user_cq);
+ qp->req_user_cq = NULL;
+ }
+
+ return rc;
+}
+
+static int gaudi2_set_res_qp_ctx(struct hbl_cn_device *hdev, struct hbl_cni_res_conn_ctx_in *in,
+ struct hbl_cn_qp *qp)
+{
+ u8 mac[ETH_ALEN], cqn, encap_en, wq_mmu_bypass;
+ u32 port = in->port, priority, encap_id;
+ struct gaudi2_qpc_responder res_qpc;
+ struct hbl_cn_user_cq *user_cq;
+ struct hbl_cn_port *cn_port;
+ int rc;
+
+ cn_port = &hdev->cn_ports[port];
+
+ /* In case the user didn't set encap, leave it unset for internal ports. */
+ encap_id = 0;
+ encap_en = 0;
+
+ /* H6-3399: The below configuration isn't valid due to a H/W bug, i.e. using encap_id
+ * for src IP settings without encapsulation isn't allowed.
+ */
+ if (!in->encap_en && in->encap_id) {
+ dev_dbg(hdev->dev,
+ "Encapsulation ID %d can't be set when encapsulation is disabled, port %d\n",
+ in->encap_id, port);
+ return -EINVAL;
+ }
+
+ if (cn_port->eth_enable)
+ memcpy(mac, in->dst_mac_addr, ETH_ALEN);
+ else
+ /* in this case the MAC is irrelevant so use broadcast */
+ eth_broadcast_addr(mac);
+
+ if (normalize_priority(hdev, in->priority, TS_RC, false, &priority)) {
+ dev_dbg(hdev->dev, "Unsupported priority value %u, port %d\n", in->priority, port);
+ return -EINVAL;
+ }
+
+ if (in->cq_number) {
+ /* User CQ. */
+ cqn = in->cq_number;
+
+ user_cq = hbl_cn_user_cq_get(cn_port, cqn);
+ if (!user_cq) {
+ dev_dbg(hdev->dev, "CQ %d is invalid, port %d\n", cqn, port);
+ return -EINVAL;
+ }
+
+ qp->res_user_cq = user_cq;
+ } else {
+ /* No CQ. */
+ cqn = NIC_CQ_RDMA_IDX;
+ }
+
+ memset(&res_qpc, 0, sizeof(res_qpc));
+
+ RES_QPC_SET_DST_QP(res_qpc, in->dst_conn_id);
+ RES_QPC_SET_PORT(res_qpc, 0); /* Always select lane 0 */
+ RES_QPC_SET_PRIORITY(res_qpc, 2);
+ RES_QPC_SET_LKEY(res_qpc, qp->local_key);
+ RES_QPC_SET_DST_IP(res_qpc, in->dst_ip_addr);
+ RES_QPC_SET_DST_MAC_LSB(res_qpc, *(u32 *)mac);
+ RES_QPC_SET_DST_MAC_MSB(res_qpc, *(u16 *)(mac + 4));
+
+ RES_QPC_SET_TRANSPORT_SERVICE(res_qpc, TS_RC);
+
+ /* Config MMU-BP.
+ * In RDV QPs, the responder side is not used for 'real' user data but
+ * rather to pass WQEs as data, therefore the QPC MMU-BP attribute shall
+ * be taken according to the configuration of the WQ array.
+ */
+ wq_mmu_bypass = cn_port->wq_arr_props[HBL_CNI_USER_WQ_SEND].wq_mmu_bypass;
+
+ if (!(in->rdv && wq_mmu_bypass))
+ RES_QPC_SET_DATA_MMU_BYPASS(res_qpc, 0);
+ else
+ RES_QPC_SET_DATA_MMU_BYPASS(res_qpc, 1);
+
+ /* ASID is also used as protection-domain, so always configure it */
+ RES_QPC_SET_ASID(res_qpc, qp->ctx->user_asid);
+
+ RES_QPC_SET_PEER_QP(res_qpc, in->conn_peer);
+ RES_QPC_SET_SCHD_Q_NUM(res_qpc, gaudi2_cn_txs_get_schedq_num(priority, false));
+
+ /* for rdv QPs RXE responder takes its security-level from QPC */
+ if (in->rdv)
+ RES_QPC_SET_TRUST_LEVEL(res_qpc, SECURED);
+ else
+ RES_QPC_SET_TRUST_LEVEL(res_qpc, UNSECURED);
+
+ /* GAUDI1 mode is not used and hence set to 0 */
+ RES_QPC_SET_MOD_GAUDI1(res_qpc, 0);
+
+ RES_QPC_SET_CQ_NUM(res_qpc, cqn);
+ RES_QPC_SET_PEER_WQ_GRAN(res_qpc, in->wq_peer_granularity);
+
+ if (in->encap_en) {
+ encap_id = in->encap_id;
+ encap_en = in->encap_en;
+ } else if (cn_port->eth_enable) {
+ gaudi2_get_default_encap_id(cn_port, &encap_id);
+ encap_en = 1;
+ }
+
+ RES_QPC_SET_ENCAP_ENABLE(res_qpc, encap_en);
+ RES_QPC_SET_ENCAP_TYPE(res_qpc, encap_id);
+
+ RES_QPC_SET_VALID(res_qpc, 1);
+
+ rc = gaudi2_cn_qpc_write(cn_port, &res_qpc, NULL, in->conn_id, false);
+ if (rc)
+ goto qpc_write_fail;
+
+ return 0;
+
+qpc_write_fail:
+ if (qp->res_user_cq) {
+ hbl_cn_user_cq_put(qp->res_user_cq);
+ qp->res_user_cq = NULL;
+ }
+
+ return rc;
+}
+
+static int gaudi2_cn_update_qp_mtu(struct hbl_cn_port *cn_port, struct hbl_cn_qp *qp, u32 mtu)
+{
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct qpc_mask mask = {};
+
+ /* MTU field is 2 bits wide */
+ REQ_QPC_SET_MTU(mask, 0x3);
+ REQ_QPC_SET_MTU(req_qpc, ilog2(roundup_pow_of_two(mtu)) - 10);
+
+ return gaudi2_cn_qpc_write_masked(cn_port, &req_qpc, &mask, qp->qp_id, true, false);
+}
+
+static int gaudi2_user_wq_arr_set(struct hbl_cn_device *hdev, struct hbl_cni_user_wq_arr_set_in *in,
+ struct hbl_cni_user_wq_arr_set_out *out, struct hbl_cn_ctx *ctx)
+{
+ u64 wq_base_addr, wq_size_cline_log, wq_size, wq_arr_size, num_of_wqs, num_of_wq_entries;
+ u32 wqe_size, rw_asid, type, port, wqe_asid, alignment_size;
+ struct hbl_cn_wq_array_properties *wq_arr_props;
+ struct hbl_cn_mem_data mem_data = {};
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_port *cn_port;
+ bool phys_addr = true;
+ int rc;
+
+ type = in->type;
+ port = in->port;
+ cn_port = &hdev->cn_ports[port];
+ gaudi2_port = cn_port->cn_specific;
+
+ wq_arr_props = &cn_port->wq_arr_props[type];
+ num_of_wqs = in->num_of_wqs;
+
+ if (wq_arr_props->is_send) {
+ wqe_size = (in->swq_granularity == HBL_CNI_SWQE_GRAN_64B) ?
+ NIC_SEND_WQE_SIZE_MULTI_STRIDE : NIC_SEND_WQE_SIZE;
+ cn_port->swqe_size = wqe_size;
+ } else {
+ wqe_size = NIC_RECV_WQE_SIZE;
+ }
+
+ if (in->mem_id == HBL_CNI_MEM_HOST) {
+ alignment_size = PAGE_SIZE / min(NIC_SEND_WQE_SIZE, NIC_RECV_WQE_SIZE);
+ num_of_wq_entries = ALIGN(in->num_of_wq_entries, alignment_size);
+ wq_size = num_of_wq_entries * wqe_size;
+ wqe_asid = ctx->asid;
+ wq_arr_props->wq_size = wq_size;
+ wq_size_cline_log = ilog2(wq_size / DEVICE_CACHE_LINE_SIZE);
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_DMA_COHERENT;
+ } else {
+ num_of_wq_entries = in->num_of_wq_entries;
+ wq_size = ALIGN(num_of_wq_entries * wqe_size, DEVICE_CACHE_LINE_SIZE);
+ mem_data.mem_id = HBL_CN_DRV_MEM_DEVICE;
+ mem_data.in.device_mem_data.port = port;
+ mem_data.in.device_mem_data.type = type;
+ wqe_asid = hdev->kernel_asid;
+ /* device wants the size in units of cache-line */
+ wq_size_cline_log = ilog2((num_of_wq_entries * wqe_size) / DEVICE_CACHE_LINE_SIZE);
+ }
+
+ wq_arr_size = num_of_wqs * wq_size;
+
+ /* We use the MMU whenever the WQ allocation exceeds the 4MB DMA-coherent memory
+ * constraint. In that case we don't allocate memory here; we reserve the VA in the
+ * PMMU, allocate the actual memory inside set_req_qp_ctx and map it to this virtual
+ * address space.
+ */
+ if (wq_arr_size > DMA_COHERENT_MAX_SIZE && in->mem_id == HBL_CNI_MEM_HOST) {
+ if (!hdev->mmu_enable) {
+ dev_dbg(hdev->dev,
+ "MMU not enabled. For allocations greater than %llx, MMU needs to be enabled, wq_arr_size : 0x%llx, port: %d\n",
+ (u64)DMA_COHERENT_MAX_SIZE, wq_arr_size, port);
+ return -EINVAL;
+ }
+ phys_addr = false;
+
+ rc = hbl_cn_reserve_wq_dva(ctx, cn_port, wq_arr_size, type, &wq_base_addr);
+ if (rc)
+ return rc;
+ } else {
+ mem_data.size = wq_arr_size;
+
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to allocate WQ: %d\n", rc);
+ return rc;
+ }
+
+ wq_base_addr = mem_data.addr;
+ wq_arr_props->handle = mem_data.handle;
+ }
+
+ wq_arr_props->on_device_mem = in->mem_id == HBL_CNI_MEM_DEVICE;
+
+ dev_dbg(hdev->dev,
+ "port %d: WQ-> type:%u addr=0x%llx log_size:%llu wqe_asid:%u mmu_bp:%u\n", port,
+ type, wq_base_addr, wq_size_cline_log, wqe_asid, phys_addr ? 1 : 0);
+
+ if (wq_arr_props->is_send) {
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_63_32_1, upper_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_31_0_1, lower_32_bits(wq_base_addr));
+
+ NIC_WREG32(NIC0_TXE0_LOG_MAX_WQ_SIZE_1, wq_size_cline_log);
+
+ /* Configure the WQ MMU.
+ * The user app currently has index 1.
+ */
+ if (phys_addr)
+ NIC_WREG32(NIC0_TXE0_WQE_USER_CFG,
+ NIC_RREG32(NIC0_TXE0_WQE_USER_CFG) | (1 << 1));
+ else
+ NIC_WREG32(NIC0_TXE0_WQE_USER_CFG,
+ NIC_RREG32(NIC0_TXE0_WQE_USER_CFG) & ~(1 << 1));
+
+ /* Set secured ASID config. The security is enabled for WQs on HBM such that they
+ * can be accessed only by the process whose ASID is wqe_asid. Here we program
+ * ASID '0' so that only the CN HW can access the WQs on HBM.
+ * There is a provision to unset the previous ASID settings, so we set the
+ * ASID only in case of WQ on HBM and unset it in the wq_arr_unset.
+ */
+ if (wqe_asid != ctx->asid) {
+ rc = gaudi2_cn_config_wqe_asid(cn_port, wqe_asid, true);
+ if (rc)
+ goto set_asid_fail;
+ }
+
+ rw_asid = (ctx->asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
+ (ctx->asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
+
+ NIC_WREG32(NIC0_QM0_AXUSER_NONSECURED_HB_ASID, rw_asid);
+ NIC_WREG32(NIC0_QM0_AXUSER_NONSECURED_HB_MMU_BP, phys_addr ? 0x1 : 0);
+
+ if (gaudi2_port->advanced) {
+ /* enable override of asid */
+ NIC_WREG32(NIC0_QPC0_AXUSER_TXWQE_LBW_QMAN_BP_HB_WR_OVRD_LO, 0xfffff800);
+ NIC_WREG32(NIC0_QPC0_AXUSER_TXWQE_LBW_QMAN_BP_HB_RD_OVRD_LO, 0xfffff800);
+ NIC_WREG32(NIC0_QPC0_AXUSER_TXWQE_LBW_QMAN_BP_HB_ASID, wqe_asid);
+ /* Configure MMU-BP override for TX-WQs */
+ NIC_WREG32(NIC0_QPC0_AXUSER_TXWQE_LBW_QMAN_BP_HB_MMU_BP,
+ phys_addr ? 0x1 : 0);
+
+ /* configure the QPC with the Tx WQ parameters */
+ NIC_WREG32(NIC0_QPC0_TX_WQ_BASE_ADDR_63_32_1,
+ upper_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_QPC0_TX_WQ_BASE_ADDR_31_0_1, lower_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_QPC0_LOG_MAX_TX_WQ_SIZE_1, wq_size_cline_log);
+ NIC_WREG32(NIC0_QPC0_MMU_BYPASS_TX_WQ_1, phys_addr ? 0x1 : 0);
+
+ /* rendezvous configuration for send work queue */
+ NIC_WREG32(NIC0_RXE0_RDV_SEND_WQ_BASE_ADDR_HI,
+ upper_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_RXE0_RDV_SEND_WQ_BASE_ADDR_LO,
+ lower_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_RXE0_RDV_LOG_MAX_WQ_SIZE, wq_size_cline_log);
+
+ if (num_of_wq_entries < gaudi2_port->min_qp_size) {
+ gaudi2_port->min_qp_size = (u32)num_of_wq_entries;
+
+ /* The back-pressure threshold values describe the occupancy of
+ * the QP, thus should be configured to be the size of the smallest
+ * QP minus some defined numbers (currently 4/8 for the
+ * upper/lower thresholds respectively).
+ */
+ NIC_WREG32(NIC0_QPC0_WQ_UPPER_THRESHOLD,
+ gaudi2_port->min_qp_size -
+ GAUDI2_NIC_WTD_BP_UPPER_TH_DIFF);
+ NIC_WREG32(NIC0_QPC0_WQ_LOWER_THRESHOLD,
+ gaudi2_port->min_qp_size -
+ GAUDI2_NIC_WTD_BP_LOWER_TH_DIFF);
+ }
+ }
+ } else {
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_BASE_HI, upper_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_BASE_LO, lower_32_bits(wq_base_addr));
+
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_MISC, wq_size_cline_log);
+
+ /* configure WQ MMU for RXE */
+ if (phys_addr)
+ NIC_WREG32(NIC0_RXE0_ARUSER_MMU_BP,
+ NIC_RREG32(NIC0_RXE0_ARUSER_MMU_BP) | (1 << 1));
+ else
+ NIC_WREG32(NIC0_RXE0_ARUSER_MMU_BP,
+ NIC_RREG32(NIC0_RXE0_ARUSER_MMU_BP) & ~(1 << 1));
+
+ if (wqe_asid != ctx->asid) {
+ /* enable override of asid bit before changing asid */
+ NIC_WREG32(NIC0_RXE0_WQE_ARUSER_HB_RD_OVRD_LO, 0xFFFFFC00);
+ /* change asid to secured asid */
+ NIC_WREG32(NIC0_RXE0_WQE_ARUSER_HB_ASID, wqe_asid);
+ } else {
+ NIC_WREG32(NIC0_RXE0_WQE_ARUSER_HB_RD_OVRD_LO, 0xFFFFFFFF);
+ }
+
+ if (gaudi2_port->advanced) {
+ /* enable override of asid */
+ NIC_WREG32(NIC0_QPC0_AXUSER_RXWQE_HB_WR_OVRD_LO, 0xFFFFF800);
+ NIC_WREG32(NIC0_QPC0_AXUSER_RXWQE_HB_RD_OVRD_LO, 0xFFFFF800);
+ NIC_WREG32(NIC0_QPC0_AXUSER_RXWQE_HB_ASID, wqe_asid);
+ /* Configure MMU-BP override for RX-WQs */
+ NIC_WREG32(NIC0_QPC0_AXUSER_RXWQE_HB_MMU_BP, phys_addr ? 0x1 : 0);
+
+ /* configure the QPC with the Rx WQ parameters */
+ NIC_WREG32(NIC0_QPC0_RX_WQ_BASE_ADDR_63_32_1,
+ upper_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_QPC0_RX_WQ_BASE_ADDR_31_0_1,
+ lower_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_QPC0_LOG_MAX_RX_WQ_SIZE_1, wq_size_cline_log);
+ NIC_WREG32(NIC0_QPC0_MMU_BYPASS_RX_WQ_1, phys_addr ? 0x1 : 0);
+
+ /* rendezvous configuration for receive work queue */
+ NIC_WREG32(NIC0_RXE0_WIN0_WQ_BASE_HI, (upper_32_bits(wq_base_addr)));
+ NIC_WREG32(NIC0_RXE0_WIN0_WQ_BASE_LO, lower_32_bits(wq_base_addr));
+ NIC_WREG32(NIC0_RXE0_WIN0_WQ_MISC, wq_size_cline_log);
+ }
+ }
+
+ /* Use a separate flag for WQ MMU bypass, as hdev->mmu_bypass is used by
+ * other CN data structures.
+ */
+ wq_arr_props->wq_mmu_bypass = phys_addr;
+
+ return 0;
+
+set_asid_fail:
+ if (phys_addr)
+ hbl_cn_mem_destroy(hdev, mem_data.handle);
+ else
+ hbl_cn_unreserve_dva_block(ctx, wq_arr_props->dva_base, wq_arr_props->dva_size);
+
+ return rc;
+}
+
+static int gaudi2_user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type)
+{
+ struct hbl_cn_wq_array_properties *wq_arr_props = &cn_port->wq_arr_props[type];
+ struct hbl_cn_device *hdev = ctx->hdev;
+ u32 port = cn_port->port;
+ int rc = 0;
+
+ if (wq_arr_props->is_send) {
+ NIC_WREG32(NIC0_QPC0_TX_WQ_BASE_ADDR_63_32_1, 0);
+ NIC_WREG32(NIC0_QPC0_TX_WQ_BASE_ADDR_31_0_1, 0);
+ NIC_WREG32(NIC0_QPC0_LOG_MAX_TX_WQ_SIZE_1, 0);
+
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_63_32_1, 0);
+ NIC_WREG32(NIC0_TXE0_SQ_BASE_ADDRESS_31_0_1, 0);
+ NIC_WREG32(NIC0_TXE0_LOG_MAX_WQ_SIZE_1, 0);
+
+ if (wq_arr_props->on_device_mem)
+ gaudi2_cn_config_wqe_asid(cn_port, 0, false);
+ } else {
+ NIC_WREG32(NIC0_QPC0_RX_WQ_BASE_ADDR_63_32_1, 0);
+ NIC_WREG32(NIC0_QPC0_RX_WQ_BASE_ADDR_31_0_1, 0);
+ NIC_WREG32(NIC0_QPC0_LOG_MAX_RX_WQ_SIZE_1, 0);
+
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_BASE_LO, 0);
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_BASE_HI, 0);
+ NIC_WREG32(NIC0_RXE0_WIN1_WQ_MISC, 0);
+ }
+
+ if (wq_arr_props->dva_base) {
+ hbl_cn_unreserve_wq_dva(ctx, cn_port, type);
+ } else {
+ rc = hbl_cn_mem_destroy(hdev, wq_arr_props->handle);
+ if (!rc)
+ wq_arr_props->handle = 0;
+ }
+
+ return rc;
+}
+
+static void gaudi2_get_cq_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ *min_id = NIC_MIN_CQ_ID;
+ *max_id = NIC_MAX_CQ_ID;
+}
+
+static int gaudi2_user_cq_set(struct hbl_cn_user_cq *user_cq,
+ struct hbl_cni_user_cq_set_in_params *in,
+ struct hbl_cni_user_cq_set_out_params *out)
+{
+ u64 mem_handle, pi_handle, regs_handle, pi_device_addr, umr_block_addr;
+ u32 port, id = user_cq->id, offset = id * 4, regs_offset;
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_ctx *ctx = user_cq->ctx;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct gaudi2_cn_device *gaudi2;
+ struct hbl_cn_mem_data mem_data;
+ int rc;
+
+ gaudi2 = hdev->asic_specific;
+ port = cn_port->port;
+
+ if (!hdev->mmu_bypass) {
+ dev_dbg(hdev->dev,
"Allocation of non-physical CQ %d dma-mem is not supported, port %d\n", id,
+ port);
+ return -EOPNOTSUPP;
+ }
+
+ gaudi2_port = &gaudi2->cn_ports[port];
+
+ memset(&mem_data, 0, sizeof(mem_data));
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_MAP_ONLY;
+ mem_data.in.host_map_data.bus_address = RING_BUF_DMA_ADDRESS(&gaudi2_port->cq_rings[0]) +
+ (id * NIC_TOTAL_CQ_MEM_SIZE);
+ mem_data.in.host_map_data.kernel_address = RING_BUF_ADDRESS(&gaudi2_port->cq_rings[0]) +
+ (id * NIC_TOTAL_CQ_MEM_SIZE);
+ mem_data.size = in->num_of_cqes * CQE_SIZE;
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_dbg(hdev->dev, "user CQ %d buffer allocation failed, rc %d, port %d\n", id, rc,
+ port);
+ return rc;
+ }
+
+ mem_handle = mem_data.handle;
+
+ /* Allocate a producer-index (PI) buffer in host kernel memory.
+ * HW updates the PI when it pushes an entry to a CQ.
+ * Userspace mmaps the PI buffer and may poll it to read the current PI.
+ *
+ * Allocate a full page, else we risk inadvertently exposing kernel data to userspace.
+ */
+ memset(&mem_data, 0, sizeof(mem_data));
+ mem_data.mem_id = HBL_CN_DRV_MEM_HOST_DMA_COHERENT;
+ mem_data.size = PAGE_SIZE;
+ rc = hbl_cn_mem_alloc(ctx, &mem_data);
+ if (rc) {
+ dev_dbg(hdev->dev, "user CQ %d PI buffer allocation failed, rc %d, port %d\n", id,
+ rc, port);
+ goto pi_alloc_fail;
+ }
+
+ pi_handle = mem_data.handle;
+ pi_device_addr = mem_data.addr;
+
+ NIC_WREG32(NIC0_RXE0_CQ_LOG_SIZE_0 + offset, ilog2(in->num_of_cqes));
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_HI_0 + offset, upper_32_bits(pi_device_addr));
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_LO_0 + offset, lower_32_bits(pi_device_addr));
+
+ /* reset the PI+CI for this CQ */
+ NIC_WREG32(NIC0_RXE0_CQ_WRITE_INDEX_0 + offset, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_PRODUCER_INDEX_0 + offset, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_CONSUMER_INDEX_0 + offset, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_CFG_0 + offset, NIC0_RXE0_CQ_CFG_WRITE_PI_EN_MASK |
+ NIC0_RXE0_CQ_CFG_ENABLE_MASK);
+
+ /* CQs 0 and 1 are secured and hence reserved, so skip them for the block address calculation */
+ __gaudi2_cn_get_db_fifo_umr(cn_port, (id >> 1) - 1, id, &umr_block_addr, &regs_offset);
+
+ /* add CQ offset */
+ regs_offset += NIC0_UMR0_0_COMPLETION_QUEUE_CI_1_CQ_NUMBER -
+ NIC0_UMR0_0_UNSECURE_DOORBELL1_UNSECURE_DB_FIRST32;
+
+ /* get mmap handle for UMR block */
+ rc = hbl_cn_get_hw_block_handle(hdev, umr_block_addr, &regs_handle);
+ if (rc) {
+ dev_dbg(hdev->dev, "Failed to get user CQ %d UMR block, rc %d, port %d\n", id, rc,
+ port);
+ goto umr_get_fail;
+ }
+
+ rc = hbl_cn_eq_dispatcher_register_cq(cn_port, ctx->asid, id);
+ if (rc) {
+ dev_err(hdev->dev, "failed to register CQ %d, rc %d, port %d\n", id, rc, port);
+ goto eq_register_fail;
+ }
+
+ out->mem_handle = mem_handle;
+ out->regs_handle = regs_handle;
+ out->regs_offset = regs_offset;
+ out->pi_handle = pi_handle;
+
+ user_cq->mem_handle = mem_handle;
+ user_cq->pi_handle = pi_handle;
+
+ return 0;
+
+eq_register_fail:
+umr_get_fail:
+ hbl_cn_mem_destroy(hdev, pi_handle);
+pi_alloc_fail:
+ hbl_cn_mem_destroy(hdev, mem_handle);
+
+ return rc;
+}
+
+static int gaudi2_user_cq_unset(struct hbl_cn_user_cq *user_cq)
+{
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ u32 port, id = user_cq->id, offset = id * 4;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+
+ port = cn_port->port;
+
+ hbl_cn_eq_dispatcher_unregister_cq(cn_port, id);
+
+ NIC_RMWREG32(NIC0_RXE0_CQ_CFG_0 + offset, 0, NIC0_RXE0_CQ_CFG_WRITE_PI_EN_MASK);
+ /* flush the new cfg */
+ NIC_RREG32(NIC0_RXE0_CQ_CFG_0 + offset);
+
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_HI_0 + offset, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_PI_ADDR_LO_0 + offset, 0);
+
+ /* Only unmap, as the HW might still access this memory */
+ hbl_cn_mem_destroy(hdev, user_cq->mem_handle);
+ /* Unmap and free, as we disabled the PI flag and the HW won't access this memory */
+ hbl_cn_mem_destroy(hdev, user_cq->pi_handle);
+
+ return 0;
+}
+
+static void gaudi2_user_cq_set_overrun(struct hbl_cn_user_cq *user_cq, bool set_overrun)
+{
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port, offset = user_cq->id * 4;
+ bool update_cq_cfg = false;
+
+ port = cn_port->port;
+
+ mutex_lock(&user_cq->overrun_lock);
+
+ /* only the first QP should enable CQ overrun, and the last QP should disable overrun */
+ if (set_overrun && user_cq->qp_set_overrun_cnt == 0) {
+ user_cq->qp_set_overrun_cnt++;
+ update_cq_cfg = true;
+ } else if (!set_overrun && user_cq->qp_set_overrun_cnt == 1) {
+ user_cq->qp_set_overrun_cnt--;
+ update_cq_cfg = true;
+ }
+
+ if (update_cq_cfg)
+ NIC_RMWREG32(NIC0_RXE0_CQ_CFG_0 + offset, set_overrun,
+ NIC0_RXE0_CQ_CFG_OVERRUN_EN_MASK);
+
+ mutex_unlock(&user_cq->overrun_lock);
+}
+
+static void gaudi2_user_cq_destroy(struct hbl_cn_user_cq *user_cq)
+{
+ struct hbl_cn_port *cn_port = user_cq->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port, offset = user_cq->id * 4;
+
+ port = cn_port->port;
+
+ NIC_WREG32(NIC0_RXE0_CQ_CFG_0 + offset, 0);
+ NIC_WREG32(NIC0_RXE0_CQ_LOG_SIZE_0 + offset, 0);
+}
+
+static void gaudi2_set_advanced_op_mask(struct hbl_cn_device *hdev, bool advanced)
+{
+ u64 advanced_op_mask = BIT(HBL_CNI_OP_USER_CCQ_SET) | BIT(HBL_CNI_OP_USER_CCQ_UNSET);
+
+ if (advanced)
+ hdev->ctrl_op_mask |= advanced_op_mask;
+ else
+ hdev->ctrl_op_mask &= ~advanced_op_mask;
+}
+
+static int gaudi2_user_set_app_params(struct hbl_cn_device *hdev,
+ struct hbl_cni_set_user_app_params_in *in,
+ bool *modify_wqe_checkers, struct hbl_cn_ctx *ctx)
+{
+ u32 port = in->port, bp_offs_fw, bp_offs_qman, encap_id, wtd_config;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+
+ if (cn_port->set_app_params) {
+ dev_dbg(hdev->dev, "App params were already set, port %d\n", port);
+ return -EPERM;
+ }
+
+ gaudi2_port = cn_port->cn_specific;
+ gaudi2_port->advanced = in->advanced;
+ gaudi2_port->adaptive_timeout_en = in->adaptive_timeout_en;
+
+ /* Enable/disable advanced operations */
+ gaudi2_set_advanced_op_mask(hdev, (bool)gaudi2_port->advanced);
+
+ bp_offs_fw = in->bp_offs[HBL_CNI_USER_BP_OFFS_FW];
+ bp_offs_qman = in->bp_offs[HBL_CNI_USER_BP_OFFS_QMAN];
+
+ /* Validate the parameters before performing any register changes */
+ if ((bp_offs_fw) || (bp_offs_qman)) {
+ if (!gaudi2_port->advanced) {
+ dev_dbg(hdev->dev,
"Port %u: advanced flag is disabled - can't set back-pressure\n",
+ port);
+ return -EINVAL;
+ }
+
+ if ((bp_offs_fw) && (bp_offs_fw & ~WQ_BP_ADDR_VAL_MASK)) {
+ dev_dbg(hdev->dev, "Port %u: invalid ARC BP offset 0x%x\n", port,
+ bp_offs_fw);
+ return -EINVAL;
+ }
+
+ if ((bp_offs_qman) && (bp_offs_qman & ~WQ_BP_ADDR_VAL_MASK)) {
+ dev_dbg(hdev->dev, "Port %u: invalid QMAN BP offset 0x%x\n", port,
+ bp_offs_qman);
+ return -EINVAL;
+ }
+
+ gaudi2_port->min_qp_size = U32_MAX;
+
+ /* Enable BP for DB */
+ wtd_config = NIC_RREG32(NIC0_QPC0_WTD_CONFIG) |
+ NIC0_QPC0_WTD_CONFIG_WQ_BP_DB_ACCOUNTED_MASK;
+
+ if (bp_offs_fw) {
+ /* Enable WTD BP to ARC */
+ wtd_config |= NIC0_QPC0_WTD_CONFIG_WQ_BP_2ARC_EN_MASK;
+
+ /* Set the offset in the ARC memory to signal the BP */
+ NIC_WREG32(NIC0_QPC0_WQ_BP_2ARC_ADDR, bp_offs_fw);
+ }
+
+ if (bp_offs_qman) {
+ /* Enable WTD BP to QMAN */
+ wtd_config |= NIC0_QPC0_WTD_CONFIG_WQ_BP_2QMAN_EN_MASK;
+
+ /* Set the offset in the QMAN memory to signal the BP */
+ NIC_WREG32(NIC0_QPC0_WQ_BP_2QMAN_ADDR, bp_offs_qman);
+ }
+
+ NIC_WREG32(NIC0_QPC0_WTD_CONFIG, wtd_config);
+
+ *modify_wqe_checkers = true;
+ cn_port->bp_enable = true;
+ } else {
+ cn_port->bp_enable = false;
+ *modify_wqe_checkers = false;
+ }
+
+ if (gaudi2_port->adaptive_timeout_en) {
+ u8 max_retry_timeout = GAUDI2_NIC_MAX_TIMEOUT_RETRIES / NIC_ADAPTIVE_TIMEOUT_RANGE;
+
+ NIC_WREG32(NIC0_QPC0_RETRY_COUNT_MAX,
+ (max_retry_timeout << NIC0_QPC0_RETRY_COUNT_MAX_TIMEOUT_SHIFT) |
+ (max_retry_timeout << NIC0_QPC0_RETRY_COUNT_MAX_SEQUENCE_ERROR_SHIFT));
+ }
+
+ /* configure port's encapsulation for source ip-address automatically */
+ if (cn_port->eth_enable)
+ gaudi2_default_encap_set(cn_port, &encap_id, 0);
+
+ return 0;
+}
+
+static void gaudi2_user_get_app_params(struct hbl_cn_device *hdev,
+ struct hbl_cni_get_user_app_params_in *in,
+ struct hbl_cni_get_user_app_params_out *out)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct hbl_cn_port *cn_port;
+
+ cn_port = gaudi2->cn_ports[in->port].cn_port;
+ out->max_num_of_qps = NIC_MAX_QP_NUM;
+ /* always include the Ethernet QP */
+ out->num_allocated_qps = 1 + atomic_read(&cn_port->num_of_allocated_qps);
+ out->max_allocated_qp_idx = hbl_cn_get_max_qp_id(cn_port);
+ out->max_cq_size = CQE_SIZE * NIC_CQ_MAX_ENTRIES;
+
+ /* Two CQs are reserved - one for the Ethernet and one for the driver CQ. We could use the
+ * driver CQ as a user CQ, but each UMR contains 2 CQs, and since CQ idx 0 is reserved for
+ * Ethernet, CQ idx 1 is unavailable as well.
+ */
+ out->max_num_of_cqs = GAUDI2_NIC_MAX_CQS_NUM - NIC_CQS_NUM;
+ out->max_num_of_db_fifos = GAUDI2_MAX_DB_FIFO_ID - GAUDI2_MIN_DB_FIFO_ID + 1;
+ out->max_num_of_encaps = GAUDI2_MAX_ENCAP_ID - GAUDI2_MIN_ENCAP_ID + 1;
+}
+
+static void gaudi2_cn_stop_traffic_macro(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ u32 port = cn_macro->idx << 1; /* the index of the first port in the macro */
+ int i;
+
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_DROP_THRESHOLD_0 + i * 4, 0);
+
+ usleep_range(1000, 2000);
+
+ NIC_MACRO_RMWREG32(NIC0_TMR_TMR_CACHES_CFG, 1,
+ NIC0_TMR_TMR_CACHES_CFG_LIST_CACHE_STOP_MASK);
+ NIC_MACRO_RMWREG32(NIC0_TMR_TMR_CACHES_CFG, 1,
+ NIC0_TMR_TMR_CACHES_CFG_FREE_LIST_CACHE_STOP_MASK);
+ NIC_MACRO_RMWREG32(NIC0_TMR_TMR_CACHES_CFG, 1,
+ NIC0_TMR_TMR_CACHES_CFG_STATE_CACHE_STOP_MASK);
+
+ usleep_range(1000, 2000);
+
+ /* Flush all the writes */
+ NIC_MACRO_RREG32(NIC0_TMR_TMR_CACHES_CFG);
+}
+
+static void gaudi2_cn_stop_traffic_port(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ NIC_RMWREG32(NIC0_QPC0_REQ_STATIC_CONFIG, 1,
+ NIC0_QPC0_REQ_STATIC_CONFIG_CACHE_STOP_MASK);
+ NIC_RMWREG32(NIC0_QPC0_RES_STATIC_CONFIG, 1,
+ NIC0_QPC0_RES_STATIC_CONFIG_CACHE_STOP_MASK);
+ NIC_RMWREG32(NIC0_TXS0_CACHE_CFG, 1,
+ NIC0_TXS0_CACHE_CFG_LIST_CACHE_STOP_MASK);
+ NIC_RMWREG32(NIC0_TXS0_CACHE_CFG, 1,
+ NIC0_TXS0_CACHE_CFG_FREE_LIST_CACHE_STOP_MASK);
+
+ /* Flush all the writes */
+ NIC_MACRO_RREG32(NIC0_TXS0_CACHE_CFG);
+}
+
+static bool gaudi2_cn_is_macro_enabled(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ u32 port1, port2;
+
+ port1 = cn_macro->idx << 1; /* the index of the first port in the macro */
+ port2 = port1 + 1;
+
+ return (hdev->ports_mask & BIT(port1)) || (hdev->ports_mask & BIT(port2));
+}
+
+/* FW must be aligned with any changes done to this function */
+static void gaudi2_cn_stop_traffic(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_macro *cn_macro;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.num_of_macros; i++) {
+ cn_macro = &hdev->cn_macros[i];
+
+ if (!gaudi2_cn_is_macro_enabled(cn_macro))
+ continue;
+
+ gaudi2_cn_stop_traffic_macro(cn_macro);
+ }
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_cn_stop_traffic_port(&hdev->cn_ports[i]);
+ }
+}
+
+static void gaudi2_cn_set_speed(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+ cn_port->speed = hdev->phy_set_nrz ? SPEED_25000 : SPEED_100000;
+ }
+}
+
+static void gaudi2_cn_config_hw_mac(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ struct gaudi2_cn_device *gaudi2;
+ u32 port, speed;
+
+ gaudi2 = hdev->asic_specific;
+ /* the index of the first port in the macro */
+ port = cn_macro->idx << 1;
+ speed = hdev->cn_ports[port].speed;
+
+ switch (speed) {
+ case SPEED_25000:
+ /* AC_SD_CFG: sd_n2 = 0, sd_8x = 0 */
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_SD_CFG, 0xF0FF00);
+
+ /* KP_MODE = 0, FEC91_EN = 1, FEC91_1LANE = 1 */
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_FEC91_CFG, 0x60F);
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_PCS_CFG, 0x0);
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_FC_FEC_CFG, 0x0);
+ break;
+ case SPEED_50000:
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_SD_CFG, 0xF0FFF0);
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_FEC91_CFG, 0xFF);
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_PCS_CFG, 0x0);
+
+ if (gaudi2->mac_rs_fec_ctrl_support) {
+ NIC_MACRO_WREG32(NIC0_MAC_RS_FEC_RSFEC_CONTROL, 0x400);
+ NIC_MACRO_WREG32(NIC0_MAC_RS_FEC_RSFEC1_CONTROL, 0x400);
+ NIC_MACRO_WREG32(NIC0_MAC_RS_FEC_RSFEC2_CONTROL, 0x400);
+ NIC_MACRO_WREG32(NIC0_MAC_RS_FEC_RSFEC3_CONTROL, 0x400);
+ }
+ break;
+ case SPEED_100000:
+ NIC_MACRO_WREG32(PRT0_MAC_CORE_MAC_FEC91_CFG, 0xFF);
+ break;
+ default:
+ dev_err(hdev->dev, "unknown speed %dMb/s, cannot set MAC\n", speed);
+ }
+}
+
+static void gaudi2_cn_config_hw_rxb(struct hbl_cn_macro *cn_macro)
+{
+ u32 dynamic_credits, static_credits, drop_th, small_pkt_drop_th, xoff_th, xon_th, val;
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ u32 port = cn_macro->idx << 1; /* the index of the first port in the macro */
+ int i;
+
+ /* Set iCRC calculation & verification with reversed bytes */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_ICRC_CFG, 0x2);
+
+ /* Assuming 1 effective priority per port divided between 2 physical ports. */
+ static_credits = RXB_NUM_STATIC_CREDITS;
+ dynamic_credits = 0;
+ drop_th = static_credits - RXB_DROP_TH_DEPTH;
+ xoff_th = static_credits - RXB_XOFF_TH_DEPTH;
+ xon_th = xoff_th - RXB_XON_TH_DEPTH;
+ small_pkt_drop_th = static_credits - RXB_DROP_SMALL_TH_DEPTH;
+
+ /* Dynamic credits (global) */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_MAX_DYNAMIC, dynamic_credits);
+
+ val = static_credits | (static_credits << 13);
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_MAX_STATIC_CREDITS_0 + i * 4, val);
+
+ /* Drop threshold (per port/prio) */
+ val = drop_th | (drop_th << 13);
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_DROP_THRESHOLD_0 + i * 4, val);
+
+ /* Drop threshold for small packets (per port/prio) */
+ val = small_pkt_drop_th | (small_pkt_drop_th << 13);
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_DROP_SMALL_THRESHOLD_0 + i * 4, val);
+
+ /* XOFF threshold (per port/prio) */
+ val = xoff_th | (xoff_th << 13);
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_XOFF_THRESHOLD_0 + i * 4, val);
+
+ /* XON threshold (per port/prio) */
+ val = xon_th | (xon_th << 13);
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_XON_THRESHOLD_0 + i * 4, val);
+
+ /* All DSCP values should be mapped to PRIO 0 */
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_DSCP2PRIO_0 + i * 4, 0);
+
+ /* Set priority 0 as the default priority for all ports and set the RXB to take the
+ * priority according to the incoming port.
+ */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_PORT_DEFAULT_PRIO, 0x0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_PORT_TRUST_LEVEL, 0);
+
+ /* spread PAUSE on all prios (i.e. global pause) */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_MAC_PFC_MODE, 0x2);
+}
+
+static void gaudi2_cn_config_hw_txb(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ u32 port = cn_macro->idx << 1; /* the index of the first port in the macro */
+ u32 speed;
+
+ speed = hdev->cn_ports[port].speed;
+
+ /* Set iCRC calculation & generation with reversed bytes */
+ NIC_MACRO_WREG32(NIC0_TXB_ICRC_CFG, 0x2);
+
+ NIC_MACRO_WREG32(NIC0_TXB_GLOBAL_PAUSE, 0x0);
+
+ switch (speed) {
+ case SPEED_25000:
+ fallthrough;
+ case SPEED_50000:
+ NIC_MACRO_WREG32(NIC0_TXB_TDM_PORT_ARB_MASK, 0x7BDE);
+ break;
+ case SPEED_100000:
+ NIC_MACRO_WREG32(NIC0_TXB_TDM_PORT_ARB_MASK, 0xBBEE);
+ break;
+ default:
+ dev_err(hdev->dev, "unknown port %d speed %dMb/s, cannot set TDM mask\n", port,
+ speed);
+ }
+}
+
+static int gaudi2_cn_config_hw_tmr(struct hbl_cn_macro *cn_macro)
+{
+ struct cpucp_cn_init_hw_mem_packet pkt;
+ struct hbl_cn_properties *cn_prop;
+ u64 tmr_addr, nic_tmr_timeout_us;
+ struct hbl_cn_device *hdev;
+ bool use_cpucp;
+ int i, rc;
+ u32 port;
+
+ port = cn_macro->idx << 1; /* the index of the first port in the macro */
+ hdev = cn_macro->hdev;
+ cn_prop = &hdev->cn_props;
+
+ tmr_addr = cn_prop->tmr_base_addr + cn_macro->idx * cn_prop->tmr_base_size;
+
+ use_cpucp = !!(hdev->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_NIC_MEM_CLEAR_EN);
+ if (use_cpucp) {
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_INIT_TMR_MEM <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.cpucp_pkt.macro_index = cpu_to_le32(cn_macro->idx);
+ pkt.mem_base_addr = cpu_to_le64(tmr_addr + TMR_FREE_OFFS);
+ pkt.num_entries = cpu_to_le16(TMR_FREE_NUM_ENTRIES);
+ pkt.entry_size = cpu_to_le16(TMR_ENT_SIZE);
+ pkt.granularity = cpu_to_le16(TMR_GRANULARITY);
+
+ rc = gaudi2_cn_send_cpu_message(hdev, (u32 *)&pkt, sizeof(pkt), 0, NULL);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to handle CPU-CP pkt %u, error %d\n",
+ CPUCP_PACKET_NIC_INIT_TMR_MEM, rc);
+ return rc;
+ }
+ } else {
+ /* Timer free list */
+ for (i = 0; i < TMR_FREE_NUM_ENTRIES; i++) {
+ hbl_cn_dram_writel(hdev, TMR_GRANULARITY + i,
+ tmr_addr + TMR_FREE_OFFS + i * TMR_ENT_SIZE);
+
+ if ((i % NIC_MAX_COMBINED_WRITES) == 0)
+ hbl_cn_dram_readl(hdev,
+ tmr_addr + TMR_FREE_OFFS + i * TMR_ENT_SIZE);
+ }
+
+ /* Perform read to flush the writes */
+ hbl_cn_dram_readl(hdev, tmr_addr);
+ }
+
+ WARN_ON_CACHE_UNALIGNED(tmr_addr + TMR_FIFO_OFFS);
+ WARN_ON_CACHE_UNALIGNED(tmr_addr + TMR_FSM0_OFFS);
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_BASE_ADDRESS_63_32,
+ upper_32_bits(tmr_addr + TMR_FIFO_OFFS));
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_BASE_ADDRESS_31_7,
+ lower_32_bits(tmr_addr + TMR_FIFO_OFFS) >> 7);
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_63_32,
+ upper_32_bits(tmr_addr + TMR_FREE_OFFS));
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_BASE_ADDRESS_FREE_LIST_31_0,
+ lower_32_bits(tmr_addr + TMR_FREE_OFFS));
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_CACHE_BASE_ADDR_63_32,
+ upper_32_bits(tmr_addr + TMR_FSM0_OFFS));
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_CACHE_BASE_ADDR_31_7,
+ lower_32_bits(tmr_addr + TMR_FSM0_OFFS) >> 7);
+
+ /* configure MMU-BP for TIMERS */
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FREE_LIST_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FREE_LIST_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FIFO_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FIFO_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FSM_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_MACRO_WREG32(NIC0_TMR_AXUSER_TMR_FSM_HB_RD_OVRD_LO, 0xFFFFFBFF);
+ /* Perform read to flush the writes */
+ NIC_MACRO_RREG32(NIC0_TMR_AXUSER_TMR_FSM_HB_RD_OVRD_LO);
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_31_0, 0);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_63_32, 0);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_95_64, 0);
+
+ for (i = 0; i < TMR_GRANULARITY; i++) {
+ /* Set the number of ticks for the timeout. */
+ nic_tmr_timeout_us = ((i == 0) || (i >= 32)) ?
+ GENMASK_ULL(46, 0) :
+ (u64)(hdev->pldm ? NIC_TMR_TIMEOUT_PLDM_US :
+ (1ULL << (i + 2)));
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_191_160,
+ lower_32_bits(nic_tmr_timeout_us));
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_216_192,
+ upper_32_bits(nic_tmr_timeout_us));
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_127_96, i);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_DESC_159_128, i);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_FIFO, i);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCHEDQ_UPDATE_EN, 1);
+ }
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_SCAN_TIMER_COMP_31_0, 10);
+
+ /* Set the number of clock cycles for a single tick in order to have 1 usec per tick,
+ * i.e.: 1/frequency_in_MHz * num_of_clk_cycles = 1 usec
+ */
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_TICK_WRAP, cn_prop->clk);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_LIST_MASK,
+ ~(0xFFFFFFFF << (ilog2(TMR_FREE_NUM_ENTRIES) - 5)));
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_PRODUCER_UPDATE, TMR_FREE_NUM_ENTRIES);
+ /* Latch the TMR value */
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_PRODUCER_UPDATE_EN, 1);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_PRODUCER_UPDATE_EN, 0);
+
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_LIST_MEM_READ_MASK, 0);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_PUSH_LOCK_EN, 1);
+ NIC_MACRO_WREG32(NIC0_TMR_TMR_TIMER_EN, 1);
+ NIC_MACRO_WREG32(NIC0_TMR_FREE_LIST_PUSH_MASK_EN, 0);
+
+ /* Perform read from the device to flush all configurations */
+ NIC_MACRO_RREG32(NIC0_TMR_TMR_TIMER_EN);
+
+ return 0;
+}
+
+static void gaudi2_cn_enable_gic_macro_interrupts(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev;
+ u32 port;
+
+ port = cn_macro->idx << 1; /* the index of the first port in the macro */
+ hdev = cn_macro->hdev;
+
+ /* enable TMR block interrupts */
+ NIC_MACRO_WREG32(NIC0_TMR_INTERRUPT_MASK, 0x0);
+
+ /* enable RXB_CORE block interrupts */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_SEI_INTR_MASK, 0x0);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_SPI_INTR_MASK, 0x0);
+}
+
+static void gaudi2_cn_disable_gic_macro_interrupts(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_macro *cn_macro;
+ u32 port;
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_MACROS; i++) {
+ cn_macro = &hdev->cn_macros[i];
+ port = cn_macro->idx << 1; /* the index of the first port in the macro */
+
+ /* Don't configure a macro if both of its ports are disabled */
+ if (!gaudi2_cn_is_macro_enabled(cn_macro))
+ continue;
+
+ /* disable TMR block interrupts */
+ NIC_MACRO_WREG32(NIC0_TMR_INTERRUPT_MASK, 0xF);
+
+ /* disable RXB_CORE block interrupts */
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_SEI_INTR_MASK, 0x3);
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_SPI_INTR_MASK, 0x3F);
+ }
+}
+
+static int gaudi2_cn_hw_macro_config(struct hbl_cn_macro *cn_macro)
+{
+ int rc;
+
+ /* the following registers are shared between each pair of ports */
+
+ /* MAC Configuration */
+ gaudi2_cn_config_hw_mac(cn_macro);
+
+ /* RXB Configuration */
+ gaudi2_cn_config_hw_rxb(cn_macro);
+
+ /* TXB Configuration */
+ gaudi2_cn_config_hw_txb(cn_macro);
+
+ /* TMR Configuration */
+ rc = gaudi2_cn_config_hw_tmr(cn_macro);
+ if (rc)
+ return rc;
+
+ /* Enable GIC macro interrupts - required only if running on PLDM */
+ if (cn_macro->hdev->pldm)
+ gaudi2_cn_enable_gic_macro_interrupts(cn_macro);
+
+ return rc;
+}
+
+static int gaudi2_cn_macros_hw_config(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_macro *cn_macro;
+ int i, rc = 0;
+
+ for (i = 0; i < NIC_NUMBER_OF_MACROS; i++) {
+ cn_macro = &hdev->cn_macros[i];
+
+ if (!gaudi2_cn_is_macro_enabled(cn_macro))
+ continue;
+
+ rc = gaudi2_cn_hw_macro_config(cn_macro);
+ if (rc)
+ return rc;
+ }
+
+ return rc;
+}
+
+static int gaudi2_cn_core_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ u64 nic_dram_alloc_size;
+ int rc;
+
+ nic_dram_alloc_size = cn_prop->nic_drv_end_addr - cn_prop->nic_drv_base_addr;
+ if (nic_dram_alloc_size > cn_prop->nic_drv_size) {
+ dev_err(hdev->dev, "DRAM allocation for CN (%lluMB) shouldn't exceed %lluMB\n",
+ div_u64(nic_dram_alloc_size, SZ_1M),
+ div_u64(cn_prop->nic_drv_size, SZ_1M));
+ return -ENOMEM;
+ }
+
+ rc = gaudi2_cn_phy_init(hdev);
+ if (rc)
+ return rc;
+
+ /* This function must be called before configuring the macros */
+ gaudi2_cn_set_speed(hdev);
+
+ rc = gaudi2_cn_macros_hw_config(hdev);
+ if (rc)
+ return rc;
+
+ return gaudi2_cn_eq_init(hdev);
+}
+
+static void gaudi2_cn_core_fini(struct hbl_cn_device *hdev)
+{
+ gaudi2_cn_eq_fini(hdev);
+
+ /* Disable GIC macro interrupts - required only if running on PLDM */
+ if (hdev->pldm)
+ gaudi2_cn_disable_gic_macro_interrupts(hdev);
+
+ gaudi2_cn_stop_traffic(hdev);
+}
+
+static int gaudi2_cn_ctx_dispatcher_init(struct hbl_cn_device *hdev, u32 asid)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_port *cn_port;
+ int i, j, rc = 0;
+ u32 port;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->cn_ports[i];
+ cn_port = gaudi2_port->cn_port;
+ port = cn_port->port;
+
+ rc = hbl_cn_eq_dispatcher_associate_dq(gaudi2_port->cn_port, asid);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to associate ASID %d with port %d event dispatcher (err %d)\n",
+ asid, port, rc);
+ goto associate_error;
+ }
+ }
+
+ return 0;
+
+associate_error:
+ /* dissociate the associated dqs */
+ for (j = 0; j < i; j++) {
+ gaudi2_port = &gaudi2->cn_ports[j];
+ hbl_cn_eq_dispatcher_dissociate_dq(gaudi2_port->cn_port, asid);
+ }
+
+ return rc;
+}
+
+static void gaudi2_cn_ctx_dispatcher_fini(struct hbl_cn_device *hdev, u32 asid)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++)
+ if (hdev->ports_mask & BIT(i)) {
+ gaudi2_port = &gaudi2->cn_ports[i];
+ hbl_cn_eq_dispatcher_dissociate_dq(gaudi2_port->cn_port, asid);
+ }
+}
+
+static int gaudi2_cn_kernel_ctx_init(struct hbl_cn_device *hdev, u32 asid)
+{
+ return gaudi2_cn_ctx_dispatcher_init(hdev, asid);
+}
+
+static void gaudi2_cn_kernel_ctx_fini(struct hbl_cn_device *hdev, u32 asid)
+{
+ gaudi2_cn_ctx_dispatcher_fini(hdev, asid);
+}
+
+static int gaudi2_cn_ctx_init(struct hbl_cn_ctx *ctx)
+{
+ return gaudi2_cn_ctx_dispatcher_init(ctx->hdev, ctx->asid);
+}
+
+static void gaudi2_cn_ctx_fini(struct hbl_cn_ctx *ctx)
+{
+ gaudi2_cn_ctx_dispatcher_fini(ctx->hdev, ctx->asid);
+}
+
+static void gaudi2_cn_configure_cq(struct hbl_aux_dev *aux_dev, u32 port, u16 coalesce_usec,
+ bool enable)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ u32 arm_timeout;
+
+ /* Calculate the timeout in ticks.
+ * A result of 0 is interpreted as ASAP, but since zero is an invalid value,
+ * we use 1 instead.
+ */
+ arm_timeout = coalesce_usec ? cn_prop->clk * coalesce_usec : 1;
+
+ /* disable the current timer before configuring the new time */
+ NIC_RMWREG32(NIC0_RXE0_CQ_ARM_TIMEOUT_EN, 0, BIT(NIC_CQ_RAW_IDX));
+ NIC_RREG32(NIC0_RXE0_CQ_ARM_TIMEOUT_EN);
+
+ /* if enable - configure the new timer and enable it */
+ if (enable) {
+ NIC_WREG32(NIC0_RXE0_CQ_ARM_TIMEOUT, arm_timeout);
+ NIC_RMWREG32(NIC0_RXE0_CQ_ARM_TIMEOUT_EN, 1, BIT(NIC_CQ_RAW_IDX));
+ }
+}
+
+static void gaudi2_cn_arm_cq(struct hbl_aux_dev *aux_dev, u32 port, u32 index)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+
+ NIC_WREG32(NIC0_QPC0_ARM_CQ_NUM, NIC_CQ_RAW_IDX);
+ NIC_WREG32(NIC0_QPC0_ARM_CQ_INDEX, index);
+}
+
+static void gaudi2_cn_write_rx_ci(struct hbl_aux_dev *aux_dev, u32 port, u32 ci)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+
+ NIC_WREG32(NIC0_QPC0_SECURED_CQ_NUMBER, NIC_CQ_RAW_IDX);
+ NIC_WREG32(NIC0_QPC0_SECURED_CQ_CONSUMER_INDEX, ci);
+}
+
+static void gaudi2_cn_get_pfc_cnts(struct hbl_aux_dev *aux_dev, u32 port, int pfc_prio,
+ u64 *indications, u64 *requests)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+ u64 reg_addr, lo_part, hi_part;
+
+ reg_addr = (port & 1) ? NIC0_MAC_GLOB_STAT_RX2_ACBFCPAUSEFRAMESRECEIVED0_2 :
+ NIC0_MAC_GLOB_STAT_RX0_ACBFCPAUSEFRAMESRECEIVED0;
+
+ reg_addr += (4 * pfc_prio);
+
+ lo_part = NIC_MACRO_RREG32(reg_addr);
+ hi_part = NIC_MACRO_RREG32(NIC0_MAC_GLOB_STAT_CONTROL_REG_DATA_HI);
+ *indications = lo_part | (hi_part << 32);
+
+ reg_addr = (port & 1) ? NIC0_MAC_GLOB_STAT_TX2_ACBFCPAUSEFRAMESTRANSMITTED0_2 :
+ NIC0_MAC_GLOB_STAT_TX0_ACBFCPAUSEFRAMESTRANSMITTED0;
+
+ reg_addr += (4 * pfc_prio);
+
+ lo_part = NIC_MACRO_RREG32(reg_addr);
+ hi_part = NIC_MACRO_RREG32(NIC0_MAC_GLOB_STAT_CONTROL_REG_DATA_HI);
+ *requests = lo_part | (hi_part << 32);
+}
+
+static int gaudi2_cn_ring_tx_doorbell(struct hbl_aux_dev *aux_dev, u32 port, u32 pi,
+ bool *full_after_tx)
+{
+ struct hbl_cn_device *hdev = container_of(aux_dev, struct hbl_cn_device, en_aux_dev);
+ u32 db_fifo_ci = 0, db_fifo_pi = 0, space_left_in_db_fifo = 0;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_port *cn_port;
+
+ cn_port = &hdev->cn_ports[port];
+ gaudi2_port = cn_port->cn_specific;
+
+ db_fifo_ci = *((u32 *)RING_CI_ADDRESS(&gaudi2_port->fifo_ring));
+ db_fifo_pi = gaudi2_port->db_fifo_pi;
+
+ space_left_in_db_fifo = CIRC_SPACE(db_fifo_pi, db_fifo_ci, NIC_FIFO_DB_SIZE);
+
+ if (!space_left_in_db_fifo) {
+ dev_dbg_ratelimited(hdev->dev, "port %d DB fifo full. PI %d, CI %d\n", port,
+ db_fifo_pi, db_fifo_ci);
+ return -EBUSY;
+ }
+
+ NIC_WREG32(NIC0_QPC0_SECURED_DB_FIRST32, pi);
+ NIC_WREG32(NIC0_QPC0_SECURED_DB_SECOND32, RAW_QPN);
+
+ /* Increment the local PI and wrap around at NIC_FIFO_DB_SIZE */
+ gaudi2_port->db_fifo_pi = (gaudi2_port->db_fifo_pi + 1) & (NIC_FIFO_DB_SIZE - 1);
+
+ *full_after_tx = !(CIRC_SPACE(gaudi2_port->db_fifo_pi, db_fifo_ci, NIC_FIFO_DB_SIZE));
+
+ return 0;
+}
+
+static void gaudi2_cn_compute_reset_prepare(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ struct gaudi2_cn_device *gaudi2;
+
+ gaudi2 = hdev->asic_specific;
+ gaudi2->in_compute_reset = true;
+
+ gaudi2_cn_eq_enter_temporal_polling_mode(hdev);
+ gaudi2_cn_phy_flush_link_status_work(hdev);
+}
+
+static void gaudi2_cn_compute_reset_late_init(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ struct gaudi2_cn_device *gaudi2;
+
+ gaudi2_cn_eq_exit_temporal_polling_mode(hdev);
+
+ gaudi2 = hdev->asic_specific;
+ gaudi2->in_compute_reset = false;
+}
+
+static void gaudi2_handle_cn_port_reset_locked(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_en_aux_ops *gaudi2_en_aux_ops;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_device *gaudi2;
+
+ gaudi2 = hdev->asic_specific;
+ gaudi2_en_aux_ops = &gaudi2->en_aux_ops;
+
+ if (hdev->ext_ports_mask & BIT(cn_port->port)) {
+ dev_err_ratelimited(hdev->dev, "port %d is going to reset\n", cn_port->port);
+ if (gaudi2_en_aux_ops->port_reset_locked)
+ gaudi2_en_aux_ops->port_reset_locked(&hdev->en_aux_dev, cn_port->port);
+ } else {
+ hbl_cn_internal_port_fini_locked(cn_port);
+ hbl_cn_internal_port_init_locked(cn_port);
+ }
+}
+
+static void gaudi2_cn_print_event(struct hbl_cn_device *hdev, u16 event_type, bool ratelimited,
+ const char *fmt, ...)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct va_format vaf;
+ va_list args;
+ char *name;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ name = gaudi2_aux_ops->get_event_name(aux_dev, event_type);
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+
+ if (ratelimited)
+ dev_err_ratelimited(hdev->dev, "%s: %pV\n", name, &vaf);
+ else
+ dev_err(hdev->dev, "%s: %pV\n", name, &vaf);
+
+ va_end(args);
+}
+
+static int gaudi2_handle_error(struct hbl_cn_device *hdev, u16 event_type, u8 macro_index,
+ struct hbl_cn_eq_intr_cause *intr_cause)
+{
+ u32 intr_cause_data, port, first_port, last_port, num_of_ports_in_macro, intr_type,
+ error_count = 0;
+ int idx, i;
+
+ num_of_ports_in_macro = NIC_NUMBER_OF_ENGINES / NIC_NUMBER_OF_MACROS;
+ first_port = macro_index * num_of_ports_in_macro;
+ last_port = (macro_index + 1) * num_of_ports_in_macro - 1;
+ intr_type = intr_cause->intr_type;
+
+ if (!intr_type || intr_type > HBL_CN_CPUCP_INTR_TXE) {
+ gaudi2_cn_print_event(hdev, event_type, true, "macro %u: invalid interrupt type %u",
+ macro_index, intr_type);
+ return 1;
+ }
+
+ intr_cause_data = (u32)intr_cause->intr_cause[0].intr_cause_data;
+
+ switch (intr_type) {
+ case HBL_CN_CPUCP_INTR_TMR:
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "TMR error on macro %d cause 0x%x", macro_index,
+ intr_cause_data);
+ return 1;
+ case HBL_CN_CPUCP_INTR_RXB_CORE_SPI:
+ for (i = 0; i < GAUDI2_NUM_OF_NIC_RXB_CORE_SPI_CAUSE; i++) {
+ if (!(intr_cause_data & BIT(i)))
+ continue;
+
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "RXB CORE SPI error on macro %d cause: %s. cause bit %d",
+ macro_index,
+ gaudi2_cn_rxb_core_spi_interrupts_cause[i], i);
+ error_count++;
+ }
+
+ return error_count;
+ case HBL_CN_CPUCP_INTR_RXB_CORE_SEI:
+ for (i = 0; i < GAUDI2_NUM_OF_NIC_RXB_CORE_SEI_CAUSE; i++) {
+ if (!(intr_cause_data & BIT(i)))
+ continue;
+
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "RXB CORE SEI error on macro %d cause: %s. cause bit %d",
+ macro_index,
+ gaudi2_cn_rxb_core_sei_interrupts_cause[i], i);
+ error_count++;
+ }
+
+ return error_count;
+ }
+
+ for (port = first_port, idx = 0; port <= last_port; port++, idx++) {
+ /* check that port is indeed enabled in the macro */
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ intr_cause_data = (u32)intr_cause->intr_cause[idx].intr_cause_data;
+ if (!intr_cause_data)
+ continue;
+
+ switch (intr_type) {
+ case HBL_CN_CPUCP_INTR_QPC_RESP_ERR:
+ for (i = 0; i < GAUDI2_NUM_OF_NIC_QPC_RESP_ERR_CAUSE; i++) {
+ if (!(intr_cause_data & BIT(i)))
+ continue;
+
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "QPC response error on port %d cause: %s. cause bit %d",
+ port,
+ gaudi2_cn_qpc_resp_err_interrupts_cause[i],
+ i);
+ error_count++;
+ }
+
+ break;
+ case HBL_CN_CPUCP_INTR_RXE_SPI:
+ for (i = 0; i < GAUDI2_NUM_OF_NIC_RXE_SPI_CAUSE; i++) {
+ if (!(intr_cause_data & BIT(i)))
+ continue;
+
+ dev_dbg_ratelimited(hdev->dev,
+ "RXE SPI error on port %d cause: %s. cause bit %d\n",
+ port, gaudi2_cn_rxe_spi_interrupts_cause[i],
+ i);
+ error_count++;
+ }
+
+ break;
+ case HBL_CN_CPUCP_INTR_RXE_SEI:
+ for (i = 0; i < GAUDI2_NUM_OF_NIC_RXE_SEI_CAUSE; i++) {
+ if (!(intr_cause_data & BIT(i)))
+ continue;
+
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "RXE SEI error on port %d cause: %s. cause bit %d",
+ port,
+ gaudi2_cn_rxe_sei_interrupts_cause[i], i);
+ error_count++;
+ }
+
+ break;
+ case HBL_CN_CPUCP_INTR_TXS:
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "TXS error on port %d cause 0x%x", port,
+ intr_cause_data);
+ error_count++;
+ break;
+ case HBL_CN_CPUCP_INTR_TXE:
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "TXE error on port %d cause 0x%x", port,
+ intr_cause_data);
+ error_count++;
+ break;
+ default:
+ gaudi2_cn_print_event(hdev, event_type, true,
+ "Invalid interrupt type port %d", port);
+ }
+ }
+
+ return error_count;
+}
+
+static void gaudi2_cn_convert_intr_cause(struct hbl_cn_eq_intr_cause *to,
+ struct hl_eq_nic_intr_cause *from)
+{
+ int i;
+
+ to->intr_type = le32_to_cpu(from->intr_type);
+
+ for (i = 0; i < MAX_PORTS_PER_NIC; i++)
+ to->intr_cause[i].intr_cause_data =
+ le64_to_cpu(from->intr_cause[i].intr_cause_data);
+}
+
+static int gaudi2_cn_sw_err_event(struct hbl_aux_dev *aux_dev, u16 event_type, u8 macro_index,
+ struct hl_eq_nic_intr_cause *intr_cause_cpucp)
+{
+ u32 qpc_intr_cause, port, first_port, last_port, num_of_ports_in_macro, error_count = 0;
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ struct hbl_cn_eq_intr_cause intr_cause;
+ struct hbl_cn_port *cn_port;
+
+ gaudi2_cn_convert_intr_cause(&intr_cause, intr_cause_cpucp);
+
+ num_of_ports_in_macro = NIC_NUMBER_OF_ENGINES / NIC_NUMBER_OF_MACROS;
+ first_port = macro_index * num_of_ports_in_macro;
+ last_port = (macro_index + 1) * num_of_ports_in_macro - 1;
+
+ for (port = first_port; port <= last_port; port++) {
+ /* check that port is indeed enabled in the macro */
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ qpc_intr_cause = NIC_RREG32(NIC0_QPC0_INTERRUPT_CAUSE);
+
+ /* EQE interrupts are mapped to MSI, except for the interrupt on the error
+ * event queue which is handled here. In that case a port reset is required.
+ */
+ if (!(qpc_intr_cause & 0x400))
+ continue;
+
+ gaudi2_cn_print_event(hdev, event_type, true, "QPC EQ error on port %d", port);
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_CLR, 0x400);
+ error_count++;
+
+ cn_port = &hdev->cn_ports[port];
+ mutex_lock(&cn_port->control_lock);
+ hbl_cn_track_port_reset(cn_port, NIC_EQ_ERR_SYNDROME);
+ gaudi2_handle_cn_port_reset_locked(cn_port);
+ mutex_unlock(&cn_port->control_lock);
+ }
+
+ error_count += gaudi2_handle_error(hdev, event_type, macro_index, &intr_cause);
+
+ return error_count;
+}
+
+static int gaudi2_cn_axi_error_response_event(struct hbl_aux_dev *aux_dev, u16 event_type,
+ u8 macro_index,
+ struct hl_eq_nic_intr_cause *intr_cause_cpucp)
+{
+ struct hbl_cn_device *hdev = aux_dev->priv;
+ struct hbl_cn_eq_intr_cause intr_cause;
+
+ gaudi2_cn_convert_intr_cause(&intr_cause, intr_cause_cpucp);
+
+ return gaudi2_handle_error(hdev, event_type, macro_index, &intr_cause);
+}
+
+static void gaudi2_cn_pre_sw_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+
+ cn_prop->phy_base_addr = NIC0_PHY_BASE;
+ cn_prop->max_hw_qps_num = NIC_HW_MAX_QP_NUM;
+ cn_prop->max_qps_num = NIC_MAX_QP_NUM;
+ cn_prop->max_hw_user_wqs_num = USER_WQES_MAX_NUM;
+ cn_prop->min_hw_user_wqs_num = USER_WQES_MIN_NUM;
+ cn_prop->rwqe_size = NIC_RECV_WQE_SIZE;
+ cn_prop->force_cq = false;
+ cn_prop->max_num_of_lanes = NIC_MAX_NUM_OF_LANES;
+ cn_prop->num_of_macros = NIC_NUMBER_OF_MACROS;
+ cn_prop->max_cqs = GAUDI2_NIC_MAX_CQS_NUM;
+ cn_prop->max_ccqs = NIC_MAX_CCQS_NUM;
+ cn_prop->max_db_fifos = GAUDI2_NIC_NUM_DB_FIFOS;
+ cn_prop->user_cq_min_entries = NIC_CQ_USER_MIN_ENTRIES;
+ cn_prop->user_cq_max_entries = NIC_CQ_USER_MAX_ENTRIES;
+ cn_prop->cqe_size = CQE_SIZE;
+ cn_prop->max_frm_len = NIC_MAX_FRM_LEN;
+ cn_prop->raw_elem_size = NIC_RAW_ELEM_SIZE;
+ cn_prop->max_raw_mtu = NIC_RAW_MAX_MTU;
+ cn_prop->min_raw_mtu = NIC_RAW_MIN_MTU;
+ cn_prop->max_wq_arr_type = HBL_CNI_USER_WQ_RECV;
+ cn_prop->is_phy_fw_binary = true;
+}
+
+static int gaudi2_cn_sw_init(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_ring **rx_rings, **cq_rings, **wq_rings;
+ struct gaudi2_cn_aux_data *gaudi2_cn_aux_data;
+ struct gaudi2_en_aux_data *gaudi2_en_aux_data;
+ struct gaudi2_en_aux_ops *gaudi2_en_aux_ops;
+ struct hbl_cn_aux_data *cn_aux_data;
+ struct hbl_en_aux_data *en_aux_data;
+ struct hbl_cn_aux_ops *cn_aux_ops;
+ struct hbl_en_aux_ops *en_aux_ops;
+ struct gaudi2_cn_device *gaudi2;
+ struct hbl_aux_dev *cn_aux_dev;
+ struct hbl_aux_dev *en_aux_dev;
+ int rc;
+
+ BUILD_BUG_ON_NOT_POWER_OF_2(NIC_RAW_ELEM_SIZE);
+ BUILD_BUG_ON_NOT_POWER_OF_2(NIC_RX_RING_PKT_NUM);
+ BUILD_BUG_ON_NOT_POWER_OF_2(NIC_CQ_MAX_ENTRIES);
+ BUILD_BUG_ON_NOT_POWER_OF_2(NIC_EQ_RING_NUM_REC);
+
+ gaudi2 = kzalloc(sizeof(*gaudi2), GFP_KERNEL);
+ if (!gaudi2)
+ return -ENOMEM;
+
+ rx_rings = kcalloc(NIC_NUMBER_OF_PORTS, sizeof(*rx_rings), GFP_KERNEL);
+ if (!rx_rings) {
+ rc = -ENOMEM;
+ goto rx_rings_fail;
+ }
+
+ cq_rings = kcalloc(NIC_NUMBER_OF_PORTS, sizeof(*cq_rings), GFP_KERNEL);
+ if (!cq_rings) {
+ rc = -ENOMEM;
+ goto cq_rings_fail;
+ }
+
+ wq_rings = kcalloc(NIC_NUMBER_OF_PORTS, sizeof(*wq_rings), GFP_KERNEL);
+ if (!wq_rings) {
+ rc = -ENOMEM;
+ goto qp_rings_fail;
+ }
+
+ hdev->asic_specific = gaudi2;
+
+ cn_aux_dev = hdev->cn_aux_dev;
+ cn_aux_data = cn_aux_dev->aux_data;
+ cn_aux_ops = cn_aux_dev->aux_ops;
+ gaudi2_cn_aux_data = cn_aux_data->asic_specific;
+ gaudi2->cn_aux_ops = cn_aux_ops->asic_ops;
+
+ gaudi2->temporal_polling = !hdev->poll_enable;
+ gaudi2->fw_security_enabled = gaudi2_cn_aux_data->fw_security_enabled;
+ gaudi2->msix_enabled = gaudi2_cn_aux_data->msix_enabled;
+ gaudi2->cfg_base = gaudi2_cn_aux_data->cfg_base;
+ gaudi2->irq_num_port_base = gaudi2_cn_aux_data->irq_num_port_base;
+ gaudi2->sob_id_base = gaudi2_cn_aux_data->sob_id_base;
+ gaudi2->sob_inc_cfg_val = gaudi2_cn_aux_data->sob_inc_cfg_val;
+ gaudi2->setup_type = gaudi2_cn_aux_data->setup_type;
+
+ gaudi2_en_aux_data = &gaudi2->en_aux_data;
+ gaudi2_en_aux_ops = &gaudi2->en_aux_ops;
+
+ en_aux_dev = &hdev->en_aux_dev;
+ en_aux_data = en_aux_dev->aux_data;
+ en_aux_ops = en_aux_dev->aux_ops;
+ en_aux_data->asic_specific = gaudi2_en_aux_data;
+ en_aux_ops->asic_ops = gaudi2_en_aux_ops;
+
+ gaudi2_en_aux_data->rx_rings = rx_rings;
+ gaudi2_en_aux_data->cq_rings = cq_rings;
+ gaudi2_en_aux_data->wq_rings = wq_rings;
+ gaudi2_en_aux_data->kernel_asid = hdev->kernel_asid;
+ gaudi2_en_aux_data->raw_qpn = RAW_QPN;
+ gaudi2_en_aux_data->tx_ring_len = NIC_TX_BUF_SIZE;
+ gaudi2_en_aux_data->schedq_num = TXS_PORT_RAW_SCHED_Q * HBL_EN_PFC_PRIO_NUM +
+ GAUDI2_PFC_PRIO_DRIVER;
+
+ /* As a W/A for H/W bug H6-3399, we enlarge our Tx packets by padding them with a bigger
+ * value than the default. This should keep the MAC on the other side busier on each packet
+ * it processes, hence decrease the rate at which it pushes packets towards the Rx.
+ */
+ gaudi2_en_aux_data->pad_size = NIC_SKB_PAD_SIZE;
+
+ /* en2cn */
+ gaudi2_en_aux_ops->configure_cq = gaudi2_cn_configure_cq;
+ gaudi2_en_aux_ops->arm_cq = gaudi2_cn_arm_cq;
+ gaudi2_en_aux_ops->write_rx_ci = gaudi2_cn_write_rx_ci;
+ gaudi2_en_aux_ops->get_pfc_cnts = gaudi2_cn_get_pfc_cnts;
+ gaudi2_en_aux_ops->ring_tx_doorbell = gaudi2_cn_ring_tx_doorbell;
+ gaudi2_en_aux_ops->qp_err_syndrome_to_str = gaudi2_cn_qp_err_syndrome_to_str;
+ gaudi2_en_aux_ops->db_fifo_reset = gaudi2_cn_db_fifo_reset;
+
+ hdev->ctrl_op_mask = BIT(HBL_CNI_OP_ALLOC_CONN) |
+ BIT(HBL_CNI_OP_SET_REQ_CONN_CTX) |
+ BIT(HBL_CNI_OP_SET_RES_CONN_CTX) |
+ BIT(HBL_CNI_OP_DESTROY_CONN) |
+ BIT(HBL_CNI_OP_USER_WQ_SET) |
+ BIT(HBL_CNI_OP_USER_WQ_UNSET) |
+ BIT(HBL_CNI_OP_SET_USER_APP_PARAMS) |
+ BIT(HBL_CNI_OP_GET_USER_APP_PARAMS) |
+ BIT(HBL_CNI_OP_ALLOC_USER_DB_FIFO) |
+ BIT(HBL_CNI_OP_USER_DB_FIFO_SET) |
+ BIT(HBL_CNI_OP_USER_DB_FIFO_UNSET) |
+ BIT(HBL_CNI_OP_EQ_POLL) |
+ BIT(HBL_CNI_OP_USER_ENCAP_ALLOC) |
+ BIT(HBL_CNI_OP_USER_ENCAP_SET) |
+ BIT(HBL_CNI_OP_USER_ENCAP_UNSET) |
+ BIT(HBL_CNI_OP_ALLOC_USER_CQ_ID) |
+ BIT(HBL_CNI_OP_USER_CQ_ID_SET) |
+ BIT(HBL_CNI_OP_USER_CQ_ID_UNSET) |
+ BIT(HBL_CNI_OP_DUMP_QP);
+
+ hdev->debugfs_supp_mask = BIT(NIC_MAC_LOOPBACK) |
+ BIT(NIC_PAM4_TX_TAPS) |
+ BIT(NIC_NRZ_TX_TAPS) |
+ BIT(NIC_POLARITY) |
+ BIT(NIC_QP) |
+ BIT(NIC_WQE) |
+ BIT(NIC_RESET_CNT) |
+ BIT(NIC_MAC_LANE_REMAP) |
+ BIT(NIC_RAND_STATUS) |
+ BIT(NIC_MMU_BYPASS) |
+ BIT(NIC_ETH_LOOPBACK) |
+ BIT(NIC_PHY_REGS_PRINT) |
+ BIT(NIC_SHOW_INTERNAL_PORTS_STATUS) |
+ BIT(NIC_PRINT_FEC_STATS) |
+ BIT(NIC_DISABLE_DECAP) |
+ BIT(NIC_PHY_SET_NRZ) |
+ BIT(NIC_PHY_DUMP_SERDES_PARAMS) |
+ BIT(NIC_PHY_CALC_BER) |
+ BIT(NIC_PHY_CALC_BER_WAIT_SEC) |
+ BIT(NIC_OVERRIDE_PORT_STATUS) |
+ BIT(NIC_ACCUMULATE_FEC_DURATION) |
+ BIT(NIC_PHY_FORCE_FIRST_TX_TAPS_CFG);
+
+ hdev->ib_support = true;
+ hdev->qpc_cache_inv_timeout = hdev->pldm ? NIC_PLDM_QPC_INV_USEC :
+ NIC_QPC_INV_USEC;
+ hdev->qp_wait_for_idle = true;
+ hdev->qp_reset_mode = CN_QP_RESET_MODE_HARD;
+ hdev->hw_invalid_while_teardown = false;
+ hdev->has_eq = true;
+ hdev->umr_support = true;
+ hdev->cc_support = true;
+
+ gaudi2->mac_rs_fec_ctrl_support = true;
+ gaudi2->flush_db_fifo = false;
+
+ hdev->cn_props.max_qp_error_syndromes = NIC_MAX_QP_ERR_SYNDROMES;
+ hdev->cn_props.status_packet_size = sizeof(struct cpucp_nic_status);
+
+ hdev->wq_arrays_pool_enable = true;
+ hdev->mmap_type_flag = HBL_CN_MMAP_TYPE_CN_MEM;
+
+ return 0;
+
+qp_rings_fail:
+ kfree(cq_rings);
+cq_rings_fail:
+ kfree(rx_rings);
+rx_rings_fail:
+ kfree(gaudi2);
+
+ return rc;
+}
+
+static void gaudi2_cn_sw_fini(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_en_aux_data *en_aux_data;
+
+ en_aux_data = &gaudi2->en_aux_data;
+
+ kfree(en_aux_data->wq_rings);
+ kfree(en_aux_data->cq_rings);
+ kfree(en_aux_data->rx_rings);
+ kfree(gaudi2);
+}
+
+static void gaudi2_cn_set_en_data(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_en_aux_data *gaudi2_aux_data;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_en_aux_data *aux_data;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_cn_port *cn_port;
+ struct hbl_aux_dev *aux_dev;
+ int i;
+
+ aux_dev = &hdev->en_aux_dev;
+ aux_data = aux_dev->aux_data;
+ gaudi2_aux_data = &gaudi2->en_aux_data;
+ aux_data->asic_specific = gaudi2_aux_data;
+ aux_ops = aux_dev->aux_ops;
+ aux_ops->asic_ops = &gaudi2->en_aux_ops;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+ gaudi2_port = cn_port->cn_specific;
+
+ if (cn_port->eth_enable) {
+ gaudi2_aux_data->rx_rings[i] = &gaudi2_port->rx_ring;
+ gaudi2_aux_data->cq_rings[i] = &gaudi2_port->cq_rings[NIC_CQ_RAW_IDX];
+ gaudi2_aux_data->wq_rings[i] = &gaudi2_port->wq_ring;
+ }
+ }
+}
+
+static int gaudi2_register_qp(struct hbl_cn_port *cn_port, u32 qp_id, u32 asid)
+{
+ return hbl_cn_eq_dispatcher_register_qp(cn_port, asid, qp_id);
+}
+
+static void gaudi2_unregister_qp(struct hbl_cn_port *cn_port, u32 qp_id)
+{
+ hbl_cn_eq_dispatcher_unregister_qp(cn_port, qp_id);
+}
+
+static void gaudi2_get_qp_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ *min_id = NIC_MIN_CONN_ID;
+ *max_id = NIC_MAX_CONN_ID;
+}
+
+static u8 gaudi2_qp_event_is_req_event(struct hbl_cn_eqe *eqe)
+{
+ char synd_str_to_lower[GAUDI2_MAX_SYNDROME_STRING_LEN] = {};
+ u32 synd = EQE_QP_EVENT_ERR_SYND(eqe);
+ char *synd_str;
+
+ synd_str = gaudi2_cn_qp_err_syndrome_to_str(synd);
+
+ if (strlen(synd_str)) {
+ strscpy(synd_str_to_lower, synd_str, sizeof(synd_str_to_lower));
+ hbl_cn_strtolower(synd_str_to_lower);
+
+ if (strnstr(synd_str_to_lower, "req", strlen(synd_str_to_lower)))
+ return 1;
+ }
+
+ return 0;
+}
+
+static int gaudi2_eq_poll(struct hbl_cn_port *cn_port, u32 asid, struct hbl_cni_eq_poll_out *event)
+{
+ u32 ev_type, ev_valid, port = cn_port->port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_eqe eqe;
+ int rc;
+
+ rc = hbl_cn_eq_dispatcher_dequeue(cn_port, asid, &eqe, false);
+ if (rc)
+ return rc;
+
+ ev_valid = EQE_IS_VALID(&eqe);
+ if (!ev_valid) {
+ dev_dbg_ratelimited(hdev->dev,
+ "got an invalid EQE entry while expecting a valid one\n");
+ return -ENODATA;
+ }
+
+ ev_type = EQE_TYPE(&eqe);
+ switch (ev_type) {
+ case EQE_COMP_ERR:
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_CQ_ERR;
+ event->idx = EQE_CQ_EVENT_CQ_NUM(&eqe);
+ break;
+ case EQE_QP_ERR:
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_QP_ERR;
+ event->idx = EQE_RAW_TX_EVENT_QPN(&eqe);
+ event->rest_occurred = EQE_QP_EVENT_RESET(&eqe);
+ event->is_req = gaudi2_qp_event_is_req_event(&eqe);
+ break;
+ case EQE_DB_FIFO_OVERRUN:
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_DB_FIFO_ERR;
+ event->idx = EQE_DB_EVENT_DB_NUM(&eqe);
+ break;
+ case EQE_CONG:
+ /* completion ready in cc comp queue */
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_CCQ;
+ event->idx = EQE_CQ_EVENT_CQ_NUM(&eqe);
+ break;
+ case EQE_LINK_STATUS:
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_LINK_STATUS;
+ break;
+ case EQE_QP_ALIGN_COUNTERS:
+ event->ev_type = HBL_CNI_EQ_EVENT_TYPE_QP_ALIGN_COUNTERS;
+ event->idx = EQE_SW_EVENT_QPN(&eqe);
+ break;
+ default:
+ /* if the event should not be reported to the user then return
+ * as if no event was found
+ */
+ dev_dbg_ratelimited(hdev->dev, "dropping Port-%d event %d report to user\n", port,
+ ev_type);
+ return -ENODATA;
+ }
+
+ /* fill the event-specific data */
+ event->ev_data = eqe.data[2];
+
+ return 0;
+}
+
+static void gaudi2_get_db_fifo_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ *min_id = GAUDI2_MIN_DB_FIFO_ID;
+ *max_id = GAUDI2_MAX_DB_FIFO_ID;
+}
+
+static void gaudi2_get_db_fifo_hw_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ *min_id = db_fifo_hw_id(GAUDI2_MIN_DB_FIFO_ID);
+ *max_id = GAUDI2_DB_FIFO_SECURE_HW_ID;
+}
+
+static void gaudi2_get_db_fifo_modes_mask(struct hbl_cn_port *cn_port, u32 *mode_mask)
+{
+ *mode_mask = BIT(HBL_CNI_DB_FIFO_TYPE_DB) | BIT(HBL_CNI_DB_FIFO_TYPE_CC);
+}
+
+static int gaudi2_db_fifo_allocate(struct hbl_cn_port *cn_port,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 fifo_size;
+
+ switch (xa_pdata->fifo_mode) {
+ case HBL_CNI_DB_FIFO_TYPE_DB:
+ case HBL_CNI_DB_FIFO_TYPE_CC:
+ fifo_size = DB_FIFO_SIZE;
+ break;
+ default:
+ dev_dbg(hdev->dev, "Port %d, invalid DB fifo mode: %d. Allocation failed\n",
+ cn_port->port, xa_pdata->fifo_mode);
+ return -EINVAL;
+ }
+
+ xa_pdata->fifo_size = fifo_size;
+
+ return 0;
+}
+
+static void gaudi2_db_fifo_free(struct hbl_cn_port *cn_port, u32 db_pool_offset, u32 fifo_size)
+{
+}
+
+static int gaudi2_db_fifo_set(struct hbl_cn_port *cn_port, struct hbl_cn_ctx *ctx, u32 id,
+ u64 ci_device_handle, struct hbl_cn_db_fifo_xarray_pdata *xa_pdata)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ u32 mmu_bypass;
+ u8 is_cc;
+ u32 val;
+ int rc;
+
+ rc = gaudi2_cn_eq_dispatcher_register_db(gaudi2_port, ctx->asid, db_fifo_hw_id(id));
+ if (rc)
+ return rc;
+
+ WARN_ON_CACHE_UNALIGNED(ci_device_handle);
+
+ /* Configure the HW to use a memory buffer for updating the
+ * consumer index (CI) when it pops the fifo.
+ */
+ NIC_WREG32(NIC0_QPC0_DBFIFO0_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_31_7 + db_fifo_offset64(id),
+ lower_32_bits(ci_device_handle) >> 7);
+ NIC_WREG32(NIC0_QPC0_DBFIFO0_CI_UPD_ADDR_DBFIFO_CI_UPD_ADDR_63_32 + db_fifo_offset64(id),
+ upper_32_bits(ci_device_handle));
+
+ is_cc = (xa_pdata->fifo_mode == HBL_CNI_DB_FIFO_TYPE_CC);
+ /* We use generic H/W FIFOs, configured either as a userspace doorbell FIFO or as a
+ * congestion control FIFO.
+ */
+ NIC_WREG32(NIC0_QPC0_DB_FIFO_CFG_0 + db_fifo_offset(id), is_cc);
+
+ mmu_bypass = !!(hdev->mmu_bypass);
+ val = ctx->asid;
+ val |= mmu_bypass << NIC0_QPC0_DB_FIFO_USER_OVRD_MMU_BYPASS_SHIFT;
+ NIC_WREG32(NIC0_QPC0_DB_FIFO_USER_OVRD_0 + db_fifo_offset(id), val);
+
+ return 0;
+}
+
+static void db_fifo_reset(struct hbl_cn_port *cn_port, u32 id, u64 mmap_handle)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_mem_buf *buf;
+ u32 *ci_cpu_addr;
+
+ buf = hbl_cn_mem_buf_get(hdev, mmap_handle);
+ if (!buf) {
+ dev_err(hdev->dev, "Failed to retrieve port %d db fifo CI memory\n",
+ cn_port->port);
+ return;
+ }
+
+ /* Read latest HW updated CI */
+ ci_cpu_addr = (u32 *)buf->kernel_address;
+
+ __db_fifo_reset(cn_port, ci_cpu_addr, id, false);
+
+ hbl_cn_mem_buf_put(buf);
+}
+
+static void gaudi2_db_fifo_unset(struct hbl_cn_port *cn_port, u32 id,
+ struct hbl_cn_db_fifo_xarray_pdata *xa_pdata)
+{
+ db_fifo_reset(cn_port, id, xa_pdata->ci_mmap_handle);
+
+ hbl_cn_eq_dispatcher_unregister_db(cn_port, db_fifo_hw_id(id));
+}
+
+static void gaudi2_get_encap_id_range(struct hbl_cn_port *cn_port, u32 *min_id, u32 *max_id)
+{
+ if (cn_port->port & 0x1) {
+ *min_id = GAUDI2_USER_ENCAP_ID + 2;
+ *max_id = GAUDI2_USER_ENCAP_ID + 2;
+ } else {
+ *min_id = GAUDI2_USER_ENCAP_ID;
+ *max_id = GAUDI2_USER_ENCAP_ID;
+ }
+}
+
+static void gaudi2_get_default_encap_id(struct hbl_cn_port *cn_port, u32 *id)
+{
+ u32 min, max;
+
+ gaudi2_get_encap_id_range(cn_port, &min, &max);
+ *id = max + 1;
+}
+
+static int gaudi2_encap_set(struct hbl_cn_port *cn_port, u32 encap_id,
+ struct hbl_cn_encap_xarray_pdata *xa_pdata)
+{
+ u32 encap_hdr_offset = NIC0_TXE0_ENCAP_DATA_63_32_0 - NIC0_TXE0_ENCAP_DATA_31_0_0;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 *encap_header = xa_pdata->encap_header;
+ u32 encap_cfg = 0, decap_cfg = 0;
+ u32 port = cn_port->port;
+ u32 hdr_size;
+ int i;
+
+ NIC_WREG32(NIC0_TXE0_SOURCE_IP_PORT0_0 + encap_offset(encap_id), xa_pdata->src_ip);
+
+ encap_cfg |= xa_pdata->encap_type_data & NIC0_TXE0_ENCAP_CFG_IPV4_PROTOCOL_UDP_DEST_MASK;
+
+ if (xa_pdata->encap_type == HBL_CNI_ENCAP_NONE) {
+ NIC_WREG32(NIC0_TXE0_ENCAP_CFG_0 + encap_offset(encap_id), encap_cfg);
+ return 0;
+ }
+
+ if (!IS_ALIGNED(xa_pdata->encap_header_size, sizeof(u32))) {
+ dev_err(hdev->dev, "Encap header size (%d) must be a multiple of %zu\n",
+ xa_pdata->encap_header_size, sizeof(u32));
+ return -EINVAL;
+ }
+
+ hdr_size = xa_pdata->encap_header_size / sizeof(u32);
+ encap_cfg |= (hdr_size << NIC0_TXE0_ENCAP_CFG_ENCAP_SIZE_SHIFT) &
+ NIC0_TXE0_ENCAP_CFG_ENCAP_SIZE_MASK;
+
+ if (xa_pdata->encap_type == HBL_CNI_ENCAP_OVER_UDP) {
+ encap_cfg |= BIT(NIC0_TXE0_ENCAP_CFG_HDR_FORMAT_SHIFT);
+ if (!hdev->is_decap_disabled) {
+ decap_cfg |= NIC0_RXB_CORE_TNL_DECAP_UDP_VALID_MASK;
+ decap_cfg |= (xa_pdata->encap_type_data <<
+ NIC0_RXB_CORE_TNL_DECAP_UDP_UDP_DEST_PORT_SHIFT) &
+ NIC0_RXB_CORE_TNL_DECAP_UDP_UDP_DEST_PORT_MASK;
+ decap_cfg |= (hdr_size << NIC0_RXB_CORE_TNL_DECAP_UDP_TNL_SIZE_SHIFT) &
+ NIC0_RXB_CORE_TNL_DECAP_UDP_TNL_SIZE_MASK;
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TNL_DECAP_UDP_0 + encap_offset(encap_id),
+ decap_cfg);
+ }
+ } else if (xa_pdata->encap_type == HBL_CNI_ENCAP_OVER_IPV4) {
+ if (!hdev->is_decap_disabled) {
+ decap_cfg |= NIC0_RXB_CORE_TNL_DECAP_IPV4_VALID_MASK;
+ decap_cfg |= (xa_pdata->encap_type_data <<
+ NIC0_RXB_CORE_TNL_DECAP_IPV4_IPV4_PROTOCOL_SHIFT) &
+ NIC0_RXB_CORE_TNL_DECAP_IPV4_IPV4_PROTOCOL_MASK;
+ decap_cfg |= (hdr_size << NIC0_RXB_CORE_TNL_DECAP_IPV4_TNL_SIZE_SHIFT) &
+ NIC0_RXB_CORE_TNL_DECAP_IPV4_TNL_SIZE_MASK;
+
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TNL_DECAP_IPV4_0 + encap_offset(encap_id),
+ decap_cfg);
+ }
+ }
+
+ NIC_WREG32(NIC0_TXE0_ENCAP_CFG_0 + encap_offset(encap_id), encap_cfg);
+
+ /* Encapsulation header is already aligned to 32 bits. Hence, it's
+ * safe to access it in chunks of 4 bytes.
+ */
+ for (i = 0; i * sizeof(u32) < xa_pdata->encap_header_size; i++)
+ NIC_WREG32(NIC0_TXE0_ENCAP_DATA_31_0_0 + encap_hdr_offset * i +
+ encap_offset(encap_id), encap_header[i]);
+
+ return 0;
+}
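The encapsulation-header programming above first rejects sizes that are not a multiple of 4 bytes, then writes the header into the HW in 32-bit chunks. A small userspace sketch of that size check and chunk count (the `IS_ALIGNED()` macro is reimplemented here from the kernel for illustration; the helper name and the -1 error value are ours, standing in for the driver's -EINVAL path):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's IS_ALIGNED() for a power-of-two alignment. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Number of 32-bit register writes needed to program an encap
 * header of `size` bytes; returns -1 if the size is not a
 * multiple of 4, matching the driver's rejection of unaligned
 * header sizes.
 */
static int encap_hdr_num_words(uint32_t size)
{
	if (!IS_ALIGNED(size, (uint32_t)sizeof(uint32_t)))
		return -1;

	return (int)(size / sizeof(uint32_t));
}
```

The alignment check is what makes the later `u32`-granular loop over `encap_header[]` safe.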
+
+static void gaudi2_encap_unset(struct hbl_cn_port *cn_port, u32 encap_id,
+ struct hbl_cn_encap_xarray_pdata *xa_pdata)
+{
+ u32 encap_hdr_offset = NIC0_TXE0_ENCAP_DATA_63_32_0 - NIC0_TXE0_ENCAP_DATA_31_0_0;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int i;
+
+ NIC_WREG32(NIC0_TXE0_SOURCE_IP_PORT0_0 + encap_offset(encap_id), 0);
+ NIC_WREG32(NIC0_TXE0_ENCAP_CFG_0 + encap_offset(encap_id), 0);
+
+ if (xa_pdata->encap_type == HBL_CNI_ENCAP_OVER_UDP)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TNL_DECAP_UDP_0 + encap_offset(encap_id), 0);
+ else if (xa_pdata->encap_type == HBL_CNI_ENCAP_OVER_IPV4)
+ NIC_MACRO_WREG32(NIC0_RXB_CORE_TNL_DECAP_IPV4_0 + encap_offset(encap_id), 0);
+
+ for (i = 0; i * sizeof(u32) < xa_pdata->encap_header_size; i++)
+ NIC_WREG32(NIC0_TXE0_ENCAP_DATA_31_0_0 + encap_hdr_offset * i +
+ encap_offset(encap_id), 0);
+}
+
+static u32 gaudi2_cn_get_default_port_speed(struct hbl_cn_device *hdev)
+{
+ return SPEED_100000;
+}
+
+static void gaudi2_cn_get_cnts_names(struct hbl_cn_port *cn_port, u8 *data, bool ext)
+{
+ char str[HBL_IB_CNT_NAME_LEN], *rx_fmt, *tx_fmt;
+ struct hbl_cn_stat *spmu_stats;
+ u32 n_spmu_stats;
+ int i, len;
+
+ if (ext) {
+ len = HBL_IB_CNT_NAME_LEN;
+ rx_fmt = "rx_%s";
+ tx_fmt = "tx_%s";
+ } else {
+ len = ETH_GSTRING_LEN;
+ rx_fmt = "%s";
+ tx_fmt = "%s";
+ }
+
+ hbl_cn_spmu_get_stats_info(cn_port, &spmu_stats, &n_spmu_stats);
+
+ for (i = 0; i < n_spmu_stats; i++)
+ memcpy(data + i * len, spmu_stats[i].str, ETH_GSTRING_LEN);
+ data += i * len;
+
+ if (!cn_port->hdev->skip_mac_cnts) {
+ for (i = 0; i < hbl_cn_mac_stats_rx_len; i++) {
+ memset(str, 0, len);
+ snprintf(str, len, rx_fmt, hbl_cn_mac_stats_rx[i].str);
+ memcpy(data + i * len, str, len);
+ }
+ data += i * len;
+
+ for (i = 0; i < gaudi2_cn_mac_fec_stats_len; i++)
+ memcpy(data + i * len, gaudi2_cn_mac_fec_stats[i].str, ETH_GSTRING_LEN);
+ data += i * len;
+
+ for (i = 0; i < hbl_cn_mac_stats_tx_len; i++) {
+ memset(str, 0, len);
+ snprintf(str, len, tx_fmt, hbl_cn_mac_stats_tx[i].str);
+ memcpy(data + i * len, str, len);
+ }
+ data += i * len;
+ }
+
+ for (i = 0; i < gaudi2_cn_err_stats_len; i++)
+ memcpy(data + i * len, gaudi2_cn_err_stats[i].str, ETH_GSTRING_LEN);
+ data += i * len;
+
+ for (i = 0; i < gaudi2_cn_perf_stats_len; i++)
+ memcpy(data + i * len, gaudi2_cn_perf_stats[i].str, ETH_GSTRING_LEN);
+}
+
+static int gaudi2_cn_get_cnts_num(struct hbl_cn_port *cn_port)
+{
+ int n_spmu_stats, mac_counters;
+ struct hbl_cn_stat *ignore;
+
+ hbl_cn_spmu_get_stats_info(cn_port, &ignore, &n_spmu_stats);
+
+ mac_counters = !cn_port->hdev->skip_mac_cnts ? hbl_cn_mac_stats_rx_len +
+ hbl_cn_mac_stats_tx_len + gaudi2_cn_mac_fec_stats_len : 0;
+
+ return n_spmu_stats + mac_counters + gaudi2_cn_err_stats_len + gaudi2_cn_perf_stats_len;
+}
+
+static int gaudi2_cn_get_mac_tx_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u64 start_reg, lo_part, hi_part;
+ u32 port = cn_port->port;
+ int i;
+
+ start_reg = (port & 1) ? NIC0_MAC_GLOB_STAT_TX2_ETHERSTATSOCTETS_6 :
+ NIC0_MAC_GLOB_STAT_TX0_ETHERSTATSOCTETS_4;
+
+ for (i = 0; i < hbl_cn_mac_stats_tx_len; i++) {
+ lo_part = NIC_MACRO_RREG32(start_reg + hbl_cn_mac_stats_tx[i].lo_offset);
+ /* The upper part must be read after the lower part, since the upper part
+ * register is latched only when the lower part is read.
+ */
+ hi_part = NIC_MACRO_RREG32(NIC0_MAC_GLOB_STAT_CONTROL_REG_DATA_HI);
+
+ data[i] = lo_part | (hi_part << 32);
+ }
+
+ return i;
+}
+
+static int gaudi2_cn_get_mac_rx_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u64 start_reg, lo_part, hi_part;
+ u32 port = cn_port->port;
+ int i;
+
+ start_reg = (port & 1) ? NIC0_MAC_GLOB_STAT_RX2_ETHERSTATSOCTETS_2 :
+ NIC0_MAC_GLOB_STAT_RX0_ETHERSTATSOCTETS;
+
+ for (i = 0; i < hbl_cn_mac_stats_rx_len; i++) {
+ lo_part = NIC_MACRO_RREG32(start_reg + hbl_cn_mac_stats_rx[i].lo_offset);
+ /* The upper part must be read after the lower part, since the upper part
+ * register is latched only when the lower part is read.
+ */
+ hi_part = NIC_MACRO_RREG32(NIC0_MAC_GLOB_STAT_CONTROL_REG_DATA_HI);
+
+ data[i] = lo_part | (hi_part << 32);
+ }
+
+ return i;
+}
+
+static int __gaudi2_cn_get_mac_fec_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ u64 start_jiffies, diff_ms, numerator, denominator, integer, exp;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int i;
+
+ if (!data)
+ return 0;
+
+ /* Read the relevant registers in order to clear them */
+ if (port & 1) {
+ cn_port->correctable_errors_cnt +=
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_CCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_CCW_HI) << 16);
+ cn_port->uncorrectable_errors_cnt +=
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_NCCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_NCCW_HI) << 16);
+
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR4_LO + 4 * i);
+ } else {
+ cn_port->correctable_errors_cnt +=
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_CCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_CCW_HI) << 16);
+ cn_port->uncorrectable_errors_cnt +=
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_NCCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_NCCW_HI) << 16);
+
+ for (i = 0; i < 8; i++)
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR0_LO + 4 * i);
+ }
+
+ start_jiffies = jiffies;
+
+ /* sleep some time to accumulate stats */
+ msleep(hdev->accumulate_fec_duration);
+
+ diff_ms = jiffies_to_msecs(jiffies - start_jiffies);
+
+ if (port & 1) {
+ data[FEC_CW_CORRECTED] = NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_CCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_CCW_HI) << 16);
+ data[FEC_CW_UNCORRECTED] = NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_NCCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC2_NCCW_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_0] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR4_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR4_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_1] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR5_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR5_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_2] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR6_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR6_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_3] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR7_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR7_HI) <<
+ 16);
+ } else {
+ data[FEC_CW_CORRECTED] = NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_CCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_CCW_HI) << 16);
+ data[FEC_CW_UNCORRECTED] = NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_NCCW_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_NCCW_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_0] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR0_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR0_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_1] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR1_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR1_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_2] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR2_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR2_HI) <<
+ 16);
+ data[FEC_SYMBOL_ERR_CORRECTED_LANE_3] =
+ NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR3_LO) |
+ (NIC_MACRO_RREG32(NIC0_MAC_RS_FEC_RSFEC_SYMBLERR3_HI) <<
+ 16);
+ }
+
+ cn_port->correctable_errors_cnt += data[FEC_CW_CORRECTED];
+ cn_port->uncorrectable_errors_cnt += data[FEC_CW_UNCORRECTED];
+
+ data[FEC_CW_CORRECTED_ACCUM] = cn_port->correctable_errors_cnt;
+ data[FEC_CW_UNCORRECTED_ACCUM] = cn_port->uncorrectable_errors_cnt;
+
+ /* The denominator is the total number of symbols in the measured time T ms
+ * (100G bits/sec = 10G sym/sec = 10G * T/1000 sym = 1G * T / 100 sym)
+ */
+ denominator = div_u64((u64)BIT(30) * diff_ms, 100);
+
+ /* Pre FEC: the numerator is the sum of uncorrected symbols (~= uncorrected_cw * 16) and
+ * corrected symbols.
+ */
+ numerator = data[FEC_CW_UNCORRECTED] << 4;
+ for (i = 0; i < 4; i++)
+ numerator += data[FEC_SYMBOL_ERR_CORRECTED_LANE_0 + i];
+
+ hbl_cn_get_frac_info(numerator, denominator, &integer, &exp);
+
+ data[FEC_PRE_FEC_SER_INT] = integer;
+ data[FEC_PRE_FEC_SER_EXP] = exp;
+
+ /* Post FEC: the numerator is the uncorrected symbols (~= uncorrected_cw * 16) */
+ numerator = data[FEC_CW_UNCORRECTED] << 4;
+
+ hbl_cn_get_frac_info(numerator, denominator, &integer, &exp);
+
+ data[FEC_POST_FEC_SER_INT] = integer;
+ data[FEC_POST_FEC_SER_EXP] = exp;
+
+ return (int)gaudi2_cn_mac_fec_stats_len;
+}
+
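The pre- and post-FEC SER values above are reported as an integer mantissa plus a decimal exponent via hbl_cn_get_frac_info(), which is defined elsewhere in the series. A minimal user-space sketch of one plausible contract for that helper (assumption: the pair represents numerator/denominator as integer * 10^-exp, with at least one significant digit kept in the mantissa):

```c
#include <stdint.h>

/* Hypothetical equivalent of hbl_cn_get_frac_info(): express num/den
 * (num < den, den != 0) as integer * 10^(-exp), scaling the numerator
 * up until the quotient has at least one significant digit.
 */
static void frac_info(uint64_t num, uint64_t den, uint32_t *integer, uint32_t *exp)
{
	*exp = 0;

	if (!num) {
		*integer = 0;
		return;
	}

	/* scale the numerator up until the quotient is non-zero */
	while (num < den) {
		num *= 10;
		(*exp)++;
	}

	*integer = (uint32_t)(num / den);
}
```

For example, 3 errored symbols over 1000 total symbols yields integer = 3, exp = 3, i.e. an SER of 3e-3.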
+static int gaudi2_cn_get_mac_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ int cnt = 0;
+
+ if (cn_port->hdev->skip_mac_cnts)
+ return 0;
+
+ cnt += gaudi2_cn_get_mac_rx_stats(cn_port, &data[cnt]);
+ cnt += __gaudi2_cn_get_mac_fec_stats(cn_port, &data[cnt]);
+ cnt += gaudi2_cn_get_mac_tx_stats(cn_port, &data[cnt]);
+
+ return cnt;
+}
+
+static int gaudi2_cn_get_err_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ struct gaudi2_en_aux_ops *gaudi2_en_aux_ops;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_device *gaudi2;
+ struct hbl_aux_dev *aux_dev;
+ int i = 0;
+
+ gaudi2 = hdev->asic_specific;
+ aux_dev = &hdev->en_aux_dev;
+ gaudi2_en_aux_ops = &gaudi2->en_aux_ops;
+
+ data[i++] = cn_port->cong_q_err_cnt;
+
+ if (cn_port->eth_enable && gaudi2_en_aux_ops->get_overrun_cnt)
+ data[i++] = gaudi2_en_aux_ops->get_overrun_cnt(aux_dev, cn_port->port);
+ else
+ data[i++] = 0;
+
+ return i;
+}
+
+static int gaudi2_cn_get_perf_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ u64 lat_dividend, lat_divisor, lat_int, lat_frac;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct hbl_cn_properties *cn_prop;
+ u64 bw_dividend, bw_int, bw_frac;
+ u32 port = cn_port->port;
+
+ cn_prop = &hdev->cn_props;
+
+ /* Bandwidth calculation */
+ bw_dividend = (((u64)NIC_RREG32(NIC0_TXE0_STATS_MEAS_WIN_BYTES_MSB)) << 32) |
+ NIC_RREG32(NIC0_TXE0_STATS_MEAS_WIN_BYTES_LSB);
+
+ /* bytes to bits */
+ bw_dividend *= BITS_PER_BYTE;
+
+ bw_int = div_u64(bw_dividend, PERF_BW_WINDOW_DIV);
+ bw_frac = ((bw_dividend - PERF_BW_WINDOW_DIV * bw_int) * 10) / PERF_BW_WINDOW_DIV;
+
+	/* In case there is no traffic (BW=0), the latency would show the last measured value
+	 * (from when there was traffic). Therefore, we need to clear it.
+	 */
+ if (bw_int == 0 && bw_frac == 0) {
+ lat_int = 0;
+ lat_frac = 0;
+ } else {
+ /* Latency calculation */
+ lat_dividend = (((u64)NIC_RREG32(NIC0_TXE0_STATS_TOT_BYTES_MSB)) << 32) |
+ NIC_RREG32(NIC0_TXE0_STATS_TOT_BYTES_LSB);
+ lat_divisor = cn_prop->clk;
+
+ lat_int = div_u64(lat_dividend, lat_divisor);
+ lat_frac = ((lat_dividend - lat_divisor * lat_int) * 10) / lat_divisor;
+ }
+
+ data[PERF_BANDWIDTH_INT] = bw_int;
+ data[PERF_BANDWIDTH_FRAC] = bw_frac;
+ data[PERF_LATENCY_INT] = lat_int;
+ data[PERF_LATENCY_FRAC] = lat_frac;
+
+ return (int)gaudi2_cn_perf_stats_len;
+}
+
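Both bandwidth and latency above are split into an integer part plus a single truncated decimal digit. The pattern in isolation, with the kernel's div_u64() replaced by plain 64-bit division for a user-space sketch:

```c
#include <stdint.h>

/* Split dividend/divisor into an integer part and one truncated decimal
 * digit, mirroring the bw_int/bw_frac and lat_int/lat_frac computations.
 */
static void fixed_point_div(uint64_t dividend, uint64_t divisor,
			    uint64_t *integer, uint64_t *frac)
{
	*integer = dividend / divisor;
	/* one decimal digit of the remainder, truncated */
	*frac = ((dividend - divisor * *integer) * 10) / divisor;
}
```

E.g. 75/10 is reported as integer 7 with fractional digit 5, i.e. "7.5".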
+static void gaudi2_cn_get_cnts_values(struct hbl_cn_port *cn_port, u64 *data)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 cnt = 0;
+ int rc;
+
+ rc = hbl_cn_read_spmu_counters(cn_port, &data[cnt], &cnt);
+ if (rc)
+ dev_err(hdev->dev, "Failed to get SPMU counters, port %d\n", cn_port->port);
+
+ cnt += gaudi2_cn_get_mac_stats(cn_port, &data[cnt]);
+ cnt += gaudi2_cn_get_err_stats(cn_port, &data[cnt]);
+ cnt += gaudi2_cn_get_perf_stats(cn_port, &data[cnt]);
+}
+
+static void gaudi2_cn_reset_mac_stats(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ NIC_MACRO_WREG32(NIC0_MAC_GLOB_STAT_CONTROL_REG_STATN_CONFIG,
+ BIT(NIC0_MAC_GLOB_STAT_CONTROL_REG_STATN_CONFIG_F_RESET_SHIFT));
+}
+
+void gaudi2_cn_get_mac_fec_stats(struct hbl_cn_port *cn_port, u64 *data)
+{
+ __gaudi2_cn_get_mac_fec_stats(cn_port, data);
+}
+
+/* HW bug: a QP can get stuck in limited state when a timeout races with a received ACK.
+ * Description: when a timeout occurs, the QPC resets NTS to ONA. If an ACK arrives at the RX,
+ * the RX reads the QPC, and the timeout occurs after the read response to the RX but before
+ * the RX update indication, then NTS will roll back to ONA.
+ * After the RX handles the ACK, it will do an update, which may advance ONA ahead of NTS.
+ * In this case the QP will stay in limited state forever: NTS - ONA > congestion window.
+ */
+static void __qpc_sanity_check(struct gaudi2_cn_port *gaudi2_port, u32 qpn)
+{
+ u32 ona_psn, nts_psn, in_work, bcs_psn, bcc_psn, ona_rem_pi, consumer_idx, execution_idx,
+ is_valid, port, wq_type;
+ int rc, retry_cnt_in_work = 0, retry_cnt_qpc_timeout = 0;
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct qpc_mask qpc_mask = {};
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_device *hdev;
+
+ cn_port = gaudi2_port->cn_port;
+ hdev = cn_port->hdev;
+ port = cn_port->port;
+
+retry:
+ rc = gaudi2_cn_qpc_read(cn_port, (void *)&req_qpc, qpn, true);
+ if (rc) {
+ dev_err_ratelimited(hdev->dev, "Requester port %d QPC %d read failed\n", port, qpn);
+ return;
+ }
+
+ is_valid = REQ_QPC_GET_VALID(req_qpc);
+ if (!is_valid)
+ return;
+
+	/* When the timeout retry counter is non-zero, an ACK could potentially arrive and
+	 * increase ONA after the QPC was read.
+	 */
+ if (REQ_QPC_GET_TIMEOUT_RETRY_COUNT(req_qpc)) {
+ if (retry_cnt_qpc_timeout < RETRY_COUNT_QPC_SANITY) {
+ dev_dbg(hdev->dev, "QPC timeout retry count > 0, trying again #%d\n",
+ retry_cnt_qpc_timeout);
+ usleep_range(1000, 1500);
+ retry_cnt_qpc_timeout++;
+ goto retry;
+ } else {
+ dev_dbg(hdev->dev,
+			"Can't apply fix. QPC timeout retry count > 0, after %d QPC reads\n",
+ retry_cnt_qpc_timeout);
+ return;
+ }
+ }
+
+ in_work = REQ_QPC_GET_IN_WORK(req_qpc);
+
+ ona_psn = REQ_QPC_GET_ONA_PSN(req_qpc);
+ nts_psn = REQ_QPC_GET_NTS_PSN(req_qpc);
+
+ bcs_psn = REQ_QPC_GET_BCS_PSN(req_qpc);
+ bcc_psn = REQ_QPC_GET_BCC_PSN(req_qpc);
+
+ consumer_idx = REQ_QPC_GET_CONSUMER_IDX(req_qpc);
+ execution_idx = REQ_QPC_GET_EXECUTION_IDX(req_qpc);
+
+ ona_rem_pi = REQ_QPC_GET_OLDEST_UNACKED_REMOTE_PRODUCER_IDX(req_qpc);
+
+ wq_type = REQ_QPC_GET_WQ_TYPE(req_qpc);
+
+	/* We hit the HW bug. The unacknowledged PSN can never be greater than
+	 * the next PSN to be sent out.
+	 */
+ if (NIC_IS_PSN_CYCLIC_BIG(ona_psn, nts_psn)) {
+ struct hbl_cn_eqe eqe;
+
+ dev_dbg(hdev->dev,
+ "ona_psn(%d) nts_psn(%d), bcc_psn(%d) bcs_psn(%d), consumer_idx(%d) execution_idx(%d). Retry_cnt %d\n",
+ ona_psn, nts_psn, bcc_psn, bcs_psn, consumer_idx, execution_idx,
+ retry_cnt_in_work);
+
+ /* Wait till HW stops working on QPC. */
+ if (in_work && retry_cnt_in_work < RETRY_COUNT_QPC_SANITY) {
+ usleep_range(1000, 1500);
+ retry_cnt_in_work++;
+ goto retry;
+ }
+
+ dev_dbg(hdev->dev, "Port %d QP %d in limited state. Applying fix.\n", port, qpn);
+
+ /* Force update QPC fields. */
+
+ REQ_QPC_SET_NTS_PSN(qpc_mask, 0xffffff);
+ REQ_QPC_SET_BCS_PSN(qpc_mask, 0xffffff);
+ REQ_QPC_SET_EXECUTION_IDX(qpc_mask, 0x3fffff);
+ if (wq_type == QPC_REQ_WQ_TYPE_WRITE)
+ REQ_QPC_SET_REMOTE_PRODUCER_IDX(qpc_mask, 0x3fffff);
+
+ REQ_QPC_SET_NTS_PSN(req_qpc, ona_psn);
+ REQ_QPC_SET_BCS_PSN(req_qpc, bcc_psn);
+ REQ_QPC_SET_EXECUTION_IDX(req_qpc, consumer_idx);
+ if (wq_type == QPC_REQ_WQ_TYPE_WRITE)
+ REQ_QPC_SET_REMOTE_PRODUCER_IDX(req_qpc, ona_rem_pi);
+
+ rc = gaudi2_cn_qpc_write_masked(cn_port, (void *)&req_qpc, &qpc_mask, qpn, true,
+ true);
+ if (rc)
+ dev_err(hdev->dev, "Requester port %d QPC %d write failed\n", port, qpn);
+
+ eqe.data[0] = EQE_HEADER(true, EQE_QP_ALIGN_COUNTERS);
+ eqe.data[1] = qpn;
+
+ rc = hbl_cn_eq_dispatcher_enqueue(cn_port, &eqe);
+ if (rc)
+ dev_err(hdev->dev, "port %d QPC %d failed dispatching EQ event %d\n", port,
+ qpn, EQE_QP_ALIGN_COUNTERS);
+ }
+}
+
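The bug detection above keys off NIC_IS_PSN_CYCLIC_BIG(ona_psn, nts_psn), whose definition is elsewhere in the series. A plausible sketch of a cyclic "greater than" over the 24-bit PSN space (the width is inferred from the 0xffffff masks used when writing the QPC) follows the usual half-range convention:

```c
#include <stdint.h>

#define PSN_BITS	24
#define PSN_MASK	((1u << PSN_BITS) - 1)

/* Hypothetical equivalent of NIC_IS_PSN_CYCLIC_BIG(a, b): a is cyclically
 * greater than b when the forward distance from b to a, modulo 2^24, is
 * non-zero and less than half the sequence space.
 */
static int psn_cyclic_big(uint32_t a, uint32_t b)
{
	uint32_t dist = (a - b) & PSN_MASK;

	return dist != 0 && dist < (1u << (PSN_BITS - 1));
}
```

Note the wraparound case: a PSN of 2 is still "greater" than 0xfffffe, since the forward distance is only 4.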
+/* We sanitize one QP at a time since it's a high-latency operation,
+ * too heavy to do in one shot. We mitigate the cost by interleaving
+ * with thread scheduling.
+ */
+static void gaudi2_qp_sanity_work(struct work_struct *work)
+{
+ struct gaudi2_cn_port *gaudi2_port = container_of(work, struct gaudi2_cn_port,
+ qp_sanity_work.work);
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ unsigned long qp_id = 0;
+ u32 timeout_cnt, port;
+ struct hbl_cn_qp *qp;
+
+ port = cn_port->port;
+ timeout_cnt = NIC_RREG32(NIC0_QPC0_NUM_TIMEOUTS);
+
+ if (gaudi2_port->qp_timeout_cnt == timeout_cnt)
+ goto done;
+
+ gaudi2_port->qp_timeout_cnt = timeout_cnt;
+
+ mutex_lock(&gaudi2_port->cfg_lock);
+ xa_for_each(&cn_port->qp_ids, qp_id, qp)
+ if (qp && qp->is_req)
+ __qpc_sanity_check(gaudi2_port, qp_id);
+ mutex_unlock(&gaudi2_port->cfg_lock);
+
+done:
+ queue_delayed_work(gaudi2_port->qp_sanity_wq, &gaudi2_port->qp_sanity_work,
+ msecs_to_jiffies(QPC_SANITY_CHECK_INTERVAL_MS));
+}
+
+static int gaudi2_qp_sanity_init(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+ char wq_name[30] = {0};
+
+ /* The qp sanity work is relevant only for external ports */
+ if (!cn_port->eth_enable)
+ return 0;
+
+	snprintf(wq_name, sizeof(wq_name), "hbl%u-cn%d-qp-sanity", hdev->id, port);
+
+ gaudi2_port->qp_sanity_wq = alloc_ordered_workqueue(wq_name, 0);
+ if (!gaudi2_port->qp_sanity_wq) {
+ dev_err(hdev->dev, "Failed to create QP sanity WQ, port: %d\n", port);
+ return -ENOMEM;
+ }
+
+ INIT_DELAYED_WORK(&gaudi2_port->qp_sanity_work, gaudi2_qp_sanity_work);
+ queue_delayed_work(gaudi2_port->qp_sanity_wq, &gaudi2_port->qp_sanity_work,
+ msecs_to_jiffies(QPC_SANITY_CHECK_INTERVAL_MS));
+
+ return 0;
+}
+
+static void gaudi2_qp_sanity_fini(struct gaudi2_cn_port *gaudi2_port)
+{
+ if (!gaudi2_port->qp_sanity_wq)
+ return;
+
+ cancel_delayed_work_sync(&gaudi2_port->qp_sanity_work);
+ destroy_workqueue(gaudi2_port->qp_sanity_wq);
+}
+
+static void gaudi2_cn_user_ccq_set(struct hbl_cn_port *cn_port, u64 ccq_device_addr,
+ u64 pi_device_addr, u32 num_of_entries, u32 *ccqn)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ WARN_ON_CACHE_UNALIGNED(ccq_device_addr);
+ WARN_ON_CACHE_UNALIGNED(pi_device_addr);
+
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_BASE_ADDR_63_32, upper_32_bits(ccq_device_addr));
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_BASE_ADDR_31_7, ((ccq_device_addr >> 7) & 0x1FFFFFF));
+
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_PI_ADDR_63_32, upper_32_bits(pi_device_addr));
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_PI_ADDR_31_7, ((pi_device_addr >> 7) & 0x1FFFFFF));
+
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_WRITE_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_PRODUCER_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_CONSUMER_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_CONSUMER_INDEX_CB, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_LOG_SIZE, ilog2(num_of_entries));
+
+	/* Set enable + update-pi.
+	 * Set overrun-en to allow overrun of the CI, since a HW bug in Gaudi2
+	 * prevents updating the CI.
+	 */
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_CFG, NIC0_QPC0_CONG_QUE_CFG_ENABLE_MASK |
+ NIC0_QPC0_CONG_QUE_CFG_OVERRUN_EN_MASK |
+ NIC0_QPC0_CONG_QUE_CFG_WRITE_PI_EN_MASK);
+
+ /* gaudi2 has only 1 CCQ. Therefore, set 0 as ccqn. */
+ *ccqn = 0;
+}
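The CCQ base and PI addresses above are programmed as an upper-32-bits register plus a bits-31:7 register, which works because both addresses are 128-byte (cache-line) aligned, as the WARN_ON_CACHE_UNALIGNED() checks enforce. The split and its inverse, as a user-space sketch:

```c
#include <stdint.h>

/* Split a 128-byte-aligned device address the way the CONG_QUE registers
 * consume it: the upper 32 bits, and bits 31:7.
 */
static uint32_t addr_hi32(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

static uint32_t addr_bits_31_7(uint64_t addr)
{
	return (uint32_t)((addr >> 7) & 0x1FFFFFF);
}

/* Inverse: reassemble the original address from the two register values */
static uint64_t addr_join(uint32_t hi, uint32_t mid)
{
	return ((uint64_t)hi << 32) | ((uint64_t)mid << 7);
}
```

The round trip is lossless exactly because bits 6:0 of an aligned address are zero; an unaligned address would silently lose its low bits, hence the alignment warnings.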
+
+static void gaudi2_cn_user_ccq_unset(struct hbl_cn_port *cn_port, u32 *ccqn)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_CFG, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_PI_ADDR_63_32, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_PI_ADDR_31_7, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_BASE_ADDR_63_32, 0);
+ NIC_WREG32(NIC0_QPC0_CONG_QUE_BASE_ADDR_31_7, 0);
+
+ /* gaudi2 has only 1 CCQ. Therefore, set 0 as ccqn. */
+ *ccqn = 0;
+}
+
+static void gaudi2_cn_get_spmu_data(struct hbl_cn_port *cn_port, struct hbl_cn_cpucp_status *status)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u64 spmu_data[NIC_SPMU_STATS_LEN_MAX];
+ u32 port = cn_port->port, ignore;
+ int rc;
+
+ memset(spmu_data, 0, sizeof(spmu_data));
+
+ rc = hbl_cn_read_spmu_counters(cn_port, spmu_data, &ignore);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get SPMU counters, port %d, %d\n", port, rc);
+ return;
+ }
+
+ status->bad_format_cnt = 0;
+ status->responder_out_of_sequence_psn_cnt = spmu_data[3];
+}
+
+static void gaudi2_cn_get_fec_status(struct hbl_cn_port *cn_port,
+ struct hbl_cn_cpucp_status *status)
+{
+ u64 fec_data[FEC_STAT_LAST];
+
+ memset(fec_data, 0, sizeof(fec_data));
+
+ gaudi2_cn_get_mac_fec_stats(cn_port, fec_data);
+
+ status->correctable_err_cnt = fec_data[FEC_CW_CORRECTED_ACCUM];
+ status->uncorrectable_err_cnt = fec_data[FEC_CW_UNCORRECTED_ACCUM];
+ status->pre_fec_ser.integer = fec_data[FEC_PRE_FEC_SER_INT];
+ status->pre_fec_ser.exp = fec_data[FEC_PRE_FEC_SER_EXP];
+ status->post_fec_ser.integer = fec_data[FEC_POST_FEC_SER_INT];
+ status->post_fec_ser.exp = fec_data[FEC_POST_FEC_SER_EXP];
+}
+
+static void gaudi2_cn_get_perf_status(struct hbl_cn_port *cn_port,
+ struct hbl_cn_cpucp_status *status)
+{
+ u64 perf_data[PERF_STAT_LAST];
+
+ memset(perf_data, 0, sizeof(perf_data));
+
+ gaudi2_cn_get_perf_stats(cn_port, perf_data);
+
+ status->bandwidth.integer = perf_data[PERF_BANDWIDTH_INT];
+ status->bandwidth.frac = perf_data[PERF_BANDWIDTH_FRAC];
+ status->lat.integer = perf_data[PERF_LATENCY_INT];
+ status->lat.frac = perf_data[PERF_LATENCY_FRAC];
+}
+
+static u32 gaudi2_cn_get_timeout_retransmission_cnt(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ return NIC_RREG32(NIC0_QPC0_NUM_TIMEOUTS);
+}
+
+static u32 gaudi2_cn_get_high_ber_cnt(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ if (port & 1)
+ return NIC_RREG32(NIC0_MAC_CH2_MAC_PCS_BER_HIGH_ORDER_CNT);
+ else
+ return NIC_RREG32(NIC0_MAC_CH0_MAC_PCS_BER_HIGH_ORDER_CNT);
+}
+
+static void gaudi2_cn_get_status(struct hbl_cn_port *cn_port, struct hbl_cn_cpucp_status *status)
+{
+ u32 timeout_retransmission_cnt, high_ber_cnt;
+
+ gaudi2_cn_get_spmu_data(cn_port, status);
+ gaudi2_cn_get_fec_status(cn_port, status);
+ gaudi2_cn_get_perf_status(cn_port, status);
+
+ timeout_retransmission_cnt = gaudi2_cn_get_timeout_retransmission_cnt(cn_port);
+ high_ber_cnt = gaudi2_cn_get_high_ber_cnt(cn_port);
+
+ status->timeout_retransmission_cnt = timeout_retransmission_cnt;
+ status->high_ber_cnt = high_ber_cnt;
+}
+
+static void gaudi2_cn_cfg_lock(struct hbl_cn_port *cn_port)
+ __acquires(&gaudi2_port->cfg_lock)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+
+ mutex_lock(&gaudi2_port->cfg_lock);
+}
+
+static void gaudi2_cn_cfg_unlock(struct hbl_cn_port *cn_port)
+ __releases(&gaudi2_port->cfg_lock)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+
+ mutex_unlock(&gaudi2_port->cfg_lock);
+}
+
+static bool gaudi2_cn_cfg_is_locked(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+
+ return mutex_is_locked(&gaudi2_port->cfg_lock);
+}
+
+static u32 gaudi2_cn_get_max_msg_sz(struct hbl_cn_device *hdev)
+{
+ return SZ_1G;
+}
+
+static void gaudi2_cn_app_params_clear(struct hbl_cn_device *hdev)
+{
+}
+
+static void gaudi2_cn_set_port_status(struct hbl_cn_port *cn_port, bool up)
+{
+ cn_port->link_eqe.data[2] = !!up;
+}
+
+static void gaudi2_cn_adaptive_tmr_reset(struct hbl_cn_qp *qp)
+{
+ struct hbl_cn_port *cn_port = qp->cn_port;
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct gaudi2_qpc_requester req_qpc;
+ struct hbl_cn_device *hdev;
+ u64 retry_count;
+ u8 user_gran;
+	int rc;
+
+ hdev = cn_port->hdev;
+ user_gran = qp->timeout_granularity - NIC_ADAPTIVE_TIMEOUT_RANGE / 2;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ port_funcs->cfg_lock(cn_port);
+ rc = gaudi2_cn_qpc_read(cn_port, &req_qpc, qp->qp_id, true);
+
+ if (rc)
+ goto out;
+
+ retry_count = REQ_QPC_GET_TIMEOUT_RETRY_COUNT(req_qpc);
+
+ if (!retry_count) {
+ if (qp->timeout_curr != user_gran)
+ qp->timeout_curr = user_gran;
+ } else if (qp->timeout_curr == user_gran) {
+		dev_err(hdev->dev, "Retry count is %llu, but current gran is already reset\n",
+ retry_count);
+ } else if (!REQ_QPC_GET_ERROR(req_qpc)) {
+ queue_delayed_work(cn_port->qp_wq, &qp->adaptive_tmr_reset,
+ msecs_to_jiffies(5));
+ }
+
+out:
+ port_funcs->cfg_unlock(cn_port);
+}
+
+static int gaudi2_cn_send_cpucp_packet(struct hbl_cn_port *cn_port, enum cpucp_packet_id packet_id,
+ int val)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct cpucp_packet pkt;
+ u32 port;
+ int rc;
+
+ port = cn_port->port;
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.ctl = cpu_to_le32(packet_id << CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.value = cpu_to_le64(val);
+ pkt.macro_index = cpu_to_le32(port);
+
+ rc = gaudi2_cn_send_cpu_message(hdev, (u32 *)&pkt, sizeof(pkt), 0, NULL);
+ if (rc)
+ dev_err(hdev->dev,
+ "Failed to send cpucp packet, port %d packet id %d, val %d, error %d\n",
+ port, packet_id, val, rc);
+
+ return rc;
+}
+
+static void gaudi2_cn_spmu_get_stats_info(struct hbl_cn_port *cn_port, struct hbl_cn_stat **stats,
+ u32 *n_stats)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = cn_port->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ gaudi2_aux_ops->spmu_get_stats_info(aux_dev, cn_port->port, stats, n_stats);
+}
+
+static int gaudi2_cn_spmu_config(struct hbl_cn_port *cn_port, u32 num_event_types,
+ u32 event_types[], bool enable)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = cn_port->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ return gaudi2_aux_ops->spmu_config(aux_dev, cn_port->port, num_event_types, event_types,
+ enable);
+}
+
+static int gaudi2_cn_spmu_sample(struct hbl_cn_port *cn_port, u32 num_out_data, u64 out_data[])
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = cn_port->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ return gaudi2_aux_ops->spmu_sample(aux_dev, cn_port->port, num_out_data, out_data);
+}
+
+static void gaudi2_cn_post_send_status(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = cn_port->hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ gaudi2_aux_ops->post_send_status(aux_dev, cn_port->port);
+}
+
+static int gaudi2_cn_inject_rx_err(struct hbl_cn_device *hdev, u8 drop_percent)
+{
+	/* No-op */
+ return 0;
+}
+
+static bool gaudi2_cn_is_encap_supported(struct hbl_cn_device *hdev,
+ struct hbl_cni_user_encap_set_in *in)
+{
+ if (in->encap_type != HBL_CNI_ENCAP_OVER_UDP) {
+ dev_dbg(hdev->dev, "Encap type %u is not supported\n", in->encap_type);
+ return false;
+ }
+
+ if (in->tnl_hdr_size != NIC_MAX_TNL_HDR_SIZE) {
+ dev_dbg(hdev->dev, "Encap hdr-size must be %d\n", NIC_MAX_TNL_HDR_SIZE);
+ return false;
+ }
+
+ return true;
+}
+
+static int gaudi2_cn_set_static_properties(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ struct hbl_cn_aux_data *cn_aux_data;
+ struct hbl_aux_dev *cn_aux_dev;
+
+ cn_aux_dev = hdev->cn_aux_dev;
+ cn_aux_data = cn_aux_dev->aux_data;
+
+ cn_prop->max_num_of_ports = NIC_NUMBER_OF_PORTS;
+ cn_prop->macro_cfg_size = cn_aux_data->macro_cfg_size;
+ cn_prop->txs_base_size = TXS_TOTAL_PORT_SIZE;
+ cn_prop->tmr_base_size = TMR_TOTAL_MACRO_SIZE;
+ cn_prop->req_qpc_base_size = REQ_QPC_TOTAL_PORT_SIZE;
+ cn_prop->res_qpc_base_size = RES_QPC_TOTAL_PORT_SIZE;
+ cn_prop->clk = cn_aux_data->clk;
+
+ return 0;
+}
+
+static int gaudi2_cn_set_dram_properties(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_prop = &hdev->cn_props;
+ struct hbl_cn_aux_data *cn_aux_data;
+ struct hbl_aux_dev *cn_aux_dev;
+ u64 nic_drv_addr, nic_drv_size;
+
+ cn_aux_dev = hdev->cn_aux_dev;
+ cn_aux_data = cn_aux_dev->aux_data;
+ nic_drv_addr = cn_aux_data->nic_drv_addr;
+ nic_drv_size = cn_aux_data->nic_drv_size;
+
+ cn_prop->nic_drv_addr = nic_drv_addr;
+ cn_prop->nic_drv_base_addr = NIC_DRV_BASE_ADDR(nic_drv_addr);
+ cn_prop->nic_drv_end_addr = NIC_DRV_END_ADDR(nic_drv_addr, nic_drv_size);
+ cn_prop->wq_base_addr = WQ_BASE_ADDR(nic_drv_addr);
+ cn_prop->txs_base_addr = TXS_BASE_ADDR(nic_drv_addr);
+ cn_prop->tmr_base_addr = TMR_BASE_ADDR(nic_drv_addr);
+ cn_prop->req_qpc_base_addr = REQ_QPC_BASE_ADDR(nic_drv_addr);
+ cn_prop->res_qpc_base_addr = RES_QPC_BASE_ADDR(nic_drv_addr);
+ cn_prop->nic_drv_size = nic_drv_size;
+ cn_prop->wq_base_size = WQ_BASE_SIZE(nic_drv_addr, nic_drv_size);
+
+ return 0;
+}
+
+static void gaudi2_cn_late_init(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_cn_aux_ops;
+ struct hbl_cn_aux_ops *cn_aux_ops;
+
+ cn_aux_ops = hdev->cn_aux_dev->aux_ops;
+ gaudi2_cn_aux_ops = cn_aux_ops->asic_ops;
+
+ /* compute2cn */
+ gaudi2_cn_aux_ops->reset_prepare = gaudi2_cn_compute_reset_prepare;
+ gaudi2_cn_aux_ops->reset_late_init = gaudi2_cn_compute_reset_late_init;
+ gaudi2_cn_aux_ops->sw_err_event_handler = gaudi2_cn_sw_err_event;
+ gaudi2_cn_aux_ops->axi_error_response_event_handler = gaudi2_cn_axi_error_response_event;
+ gaudi2_cn_aux_ops->ports_stop_prepare = hbl_cn_hard_reset_prepare;
+ gaudi2_cn_aux_ops->send_port_cpucp_status = hbl_cn_send_port_cpucp_status;
+}
+
+static void gaudi2_cn_late_fini(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_aux_ops *gaudi2_cn_aux_ops;
+ struct hbl_cn_aux_ops *cn_aux_ops;
+
+ cn_aux_ops = hdev->cn_aux_dev->aux_ops;
+ gaudi2_cn_aux_ops = cn_aux_ops->asic_ops;
+
+ /* compute2cn */
+ gaudi2_cn_aux_ops->reset_prepare = NULL;
+ gaudi2_cn_aux_ops->reset_late_init = NULL;
+ gaudi2_cn_aux_ops->sw_err_event_handler = NULL;
+ gaudi2_cn_aux_ops->axi_error_response_event_handler = NULL;
+ gaudi2_cn_aux_ops->ports_stop_prepare = NULL;
+ gaudi2_cn_aux_ops->send_port_cpucp_status = NULL;
+}
+
+static int gaudi2_cn_get_hw_block_handle(struct hbl_cn_device *hdev, u64 address, u64 *handle)
+{
+ hbl_cn_get_self_hw_block_handle(hdev, address, handle);
+
+ return 0;
+}
+
+static int gaudi2_cn_get_hw_block_addr(struct hbl_cn_device *hdev, u64 handle, u64 *addr, u64 *size)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ u32 reg;
+ int rc;
+
+ *size = HBL_CN_BLOCK_SIZE;
+ reg = hbl_cn_hw_block_handle_to_addr32(hdev, handle) - lower_32_bits(gaudi2->cfg_base);
+
+ rc = hbl_cn_get_reg_pcie_addr(hdev, CFG_BAR_ID, reg, addr);
+ if (rc)
+ dev_err(hdev->dev, "Failed to get hw block address for register 0x%x", reg);
+
+ return rc;
+}
+
+static struct hbl_cn_asic_port_funcs gaudi2_cn_port_funcs = {
+ .port_hw_init = gaudi2_cn_port_hw_init,
+ .port_hw_fini = gaudi2_cn_port_hw_fini,
+ .phy_port_init = gaudi2_cn_phy_port_init,
+ .phy_port_start_stop = gaudi2_cn_phy_port_start_stop,
+ .phy_port_power_up = gaudi2_cn_phy_port_power_up,
+ .phy_port_reconfig = gaudi2_cn_phy_port_reconfig,
+ .phy_port_fini = gaudi2_cn_phy_port_fini,
+ .phy_link_status_work = gaudi2_cn_phy_link_status_work,
+ .update_qp_mtu = gaudi2_cn_update_qp_mtu,
+ .user_wq_arr_unset = gaudi2_user_wq_arr_unset,
+ .get_cq_id_range = gaudi2_get_cq_id_range,
+ .user_cq_set = gaudi2_user_cq_set,
+ .user_cq_unset = gaudi2_user_cq_unset,
+ .user_cq_destroy = gaudi2_user_cq_destroy,
+ .get_cnts_num = gaudi2_cn_get_cnts_num,
+ .get_cnts_names = gaudi2_cn_get_cnts_names,
+ .get_cnts_values = gaudi2_cn_get_cnts_values,
+ .port_sw_init = gaudi2_cn_port_sw_init,
+ .port_sw_fini = gaudi2_cn_port_sw_fini,
+ .register_qp = gaudi2_register_qp,
+ .unregister_qp = gaudi2_unregister_qp,
+ .get_qp_id_range = gaudi2_get_qp_id_range,
+ .eq_poll = gaudi2_eq_poll,
+ .eq_dispatcher_select_dq = gaudi2_cn_eq_dispatcher_select_dq,
+ .get_db_fifo_id_range = gaudi2_get_db_fifo_id_range,
+ .get_db_fifo_hw_id_range = gaudi2_get_db_fifo_hw_id_range,
+ .db_fifo_set = gaudi2_db_fifo_set,
+ .db_fifo_unset = gaudi2_db_fifo_unset,
+ .get_db_fifo_umr = gaudi2_cn_get_db_fifo_umr,
+ .get_db_fifo_modes_mask = gaudi2_get_db_fifo_modes_mask,
+ .db_fifo_allocate = gaudi2_db_fifo_allocate,
+ .db_fifo_free = gaudi2_db_fifo_free,
+ .set_pfc = gaudi2_cn_set_pfc,
+ .get_encap_id_range = gaudi2_get_encap_id_range,
+ .encap_set = gaudi2_encap_set,
+ .encap_unset = gaudi2_encap_unset,
+ .set_ip_addr_encap = gaudi2_default_encap_set,
+ .qpc_write = gaudi2_cn_qpc_write,
+ .qpc_invalidate = gaudi2_cn_qpc_invalidate,
+ .qpc_query = gaudi2_cn_qpc_query,
+ .qpc_clear = gaudi2_cn_qpc_clear,
+ .user_ccq_set = gaudi2_cn_user_ccq_set,
+ .user_ccq_unset = gaudi2_cn_user_ccq_unset,
+ .reset_mac_stats = gaudi2_cn_reset_mac_stats,
+ .collect_fec_stats = gaudi2_cn_debugfs_collect_fec_stats,
+ .disable_wqe_index_checker = gaudi2_cn_disable_wqe_index_checker,
+ .get_status = gaudi2_cn_get_status,
+ .cfg_lock = gaudi2_cn_cfg_lock,
+ .cfg_unlock = gaudi2_cn_cfg_unlock,
+ .cfg_is_locked = gaudi2_cn_cfg_is_locked,
+ .qp_pre_destroy = gaudi2_cn_qp_pre_destroy,
+ .qp_post_destroy = gaudi2_cn_qp_post_destroy,
+ .set_port_status = gaudi2_cn_set_port_status,
+ .send_cpucp_packet = gaudi2_cn_send_cpucp_packet,
+ .adaptive_tmr_reset = gaudi2_cn_adaptive_tmr_reset,
+ .spmu_get_stats_info = gaudi2_cn_spmu_get_stats_info,
+ .spmu_config = gaudi2_cn_spmu_config,
+ .spmu_sample = gaudi2_cn_spmu_sample,
+ .post_send_status = gaudi2_cn_post_send_status,
+};
+
+static struct hbl_cn_asic_funcs gaudi2_cn_funcs = {
+ .core_init = gaudi2_cn_core_init,
+ .core_fini = gaudi2_cn_core_fini,
+ .set_req_qp_ctx = gaudi2_set_req_qp_ctx,
+ .set_res_qp_ctx = gaudi2_set_res_qp_ctx,
+ .user_wq_arr_set = gaudi2_user_wq_arr_set,
+ .user_set_app_params = gaudi2_user_set_app_params,
+ .user_get_app_params = gaudi2_user_get_app_params,
+ .phy_reset_macro = gaudi2_cn_phy_reset_macro,
+ .phy_get_crc = gaudi2_cn_phy_get_crc,
+ .get_phy_fw_name = gaudi2_cn_phy_get_fw_name,
+ .phy_fw_load_all = gaudi2_cn_phy_fw_load_all,
+ .get_default_port_speed = gaudi2_cn_get_default_port_speed,
+ .pre_sw_init = gaudi2_cn_pre_sw_init,
+ .sw_init = gaudi2_cn_sw_init,
+ .sw_fini = gaudi2_cn_sw_fini,
+ .macro_sw_init = gaudi2_cn_macro_sw_init,
+ .macro_sw_fini = gaudi2_cn_macro_sw_fini,
+ .kernel_ctx_init = gaudi2_cn_kernel_ctx_init,
+ .kernel_ctx_fini = gaudi2_cn_kernel_ctx_fini,
+ .ctx_init = gaudi2_cn_ctx_init,
+ .ctx_fini = gaudi2_cn_ctx_fini,
+ .qp_read = gaudi2_cn_debugfs_qp_read,
+ .wqe_read = gaudi2_cn_debugfs_wqe_read,
+ .set_en_data = gaudi2_cn_set_en_data,
+ .request_irqs = gaudi2_cn_eq_request_irqs,
+ .synchronize_irqs = gaudi2_cn_eq_sync_irqs,
+ .free_irqs = gaudi2_cn_eq_free_irqs,
+ .phy_dump_serdes_params = gaudi2_cn_phy_dump_serdes_params,
+ .get_max_msg_sz = gaudi2_cn_get_max_msg_sz,
+ .qp_syndrome_to_str = gaudi2_cn_qp_err_syndrome_to_str,
+ .app_params_clear = gaudi2_cn_app_params_clear,
+ .inject_rx_err = gaudi2_cn_inject_rx_err,
+ .is_encap_supported = gaudi2_cn_is_encap_supported,
+ .set_static_properties = gaudi2_cn_set_static_properties,
+ .set_dram_properties = gaudi2_cn_set_dram_properties,
+ .late_init = gaudi2_cn_late_init,
+ .late_fini = gaudi2_cn_late_fini,
+ .get_hw_block_handle = gaudi2_cn_get_hw_block_handle,
+ .get_hw_block_addr = gaudi2_cn_get_hw_block_addr,
+ .dma_alloc_coherent = gaudi2_cn_dma_alloc_coherent,
+ .dma_free_coherent = gaudi2_cn_dma_free_coherent,
+ .dma_pool_zalloc = gaudi2_cn_dma_pool_zalloc,
+ .dma_pool_free = gaudi2_cn_dma_pool_free,
+ .send_cpu_message = gaudi2_cn_send_cpu_message,
+ .ports_cancel_status_work = hbl_cn_ports_cancel_status_work,
+ .port_funcs = &gaudi2_cn_port_funcs,
+};
+
+void gaudi2_cn_set_asic_funcs(struct hbl_cn_device *hdev)
+{
+ hdev->asic_funcs = &gaudi2_cn_funcs;
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h
new file mode 100644
index 000000000000..58a0d4e86d47
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h
@@ -0,0 +1,427 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef GAUDI2_CN_H_
+#define GAUDI2_CN_H_
+
+#include <linux/net/intel/gaudi2.h>
+
+#include "../common/hbl_cn.h"
+#include "asic_reg/gaudi2_regs.h"
+
+#define NIC_NUMBER_OF_MACROS 12
+#define NIC_NUMBER_OF_ENGINES (NIC_NUMBER_OF_MACROS * 2)
+#define NIC_MAX_NUMBER_OF_PORTS (NIC_NUMBER_OF_ENGINES * 2)
+#define NIC_MAX_FIFO_RINGS 32
+#define NIC_MAC_NUM_OF_LANES 4
+#define NIC_MAC_LANES_START 0
+#define NIC_NUMBER_OF_EQS 1
+#define DEVICE_CACHE_LINE_SIZE 128
+#define NIC_SEND_WQE_SIZE 32
+#define NIC_SEND_WQE_SIZE_MULTI_STRIDE 64
+#define NIC_RECV_WQE_SIZE 16
+#define DB_FIFO_ELEMENT_SIZE 8
+/* 4 entries of 32 bits each, i.e. 16 bytes */
+#define NIC_RAW_EQE_SIZE 16
+#define NIC_MAX_CCQS_NUM 1
+#define NIC_HW_MAX_QP_NUM BIT(24) /* 16M (per port) */
+
+#define NIC_NUMBER_OF_PORTS NIC_NUMBER_OF_ENGINES
+#define NIC_MAX_NUM_OF_LANES (NIC_NUMBER_OF_MACROS * NIC_MAC_LANES)
+#define NIC_CQS_NUM 2 /* For Raw and RDMA */
+#define NIC_EQ_ERR_SYNDROME 0
+#define NIC_QP_ERR_RETRY_SYNDROME 0x40
+#define NIC_MAX_QP_ERR_SYNDROMES 0x100
+#define GAUDI2_NIC_MAX_CQS_NUM 16
+
+/* make sure the generic max CCQs number is always at least as large as the HW specific one */
+static_assert(NIC_MAX_CCQS_NUM <= NIC_DRV_MAX_CCQS_NUM);
+
+#define GAUDI2_NIC_NUM_DB_FIFOS 32
+
+/* Writing to the device memory-mapped DRAM using the writel or writeb commands (for example) is
+ * subject to the write-combining rules, meaning that writes are temporarily stored in a buffer
+ * and released together later in burst mode towards the device.
+ * Due to the high latencies in the PLDM, such writes take a lot of time, which may lead to
+ * system hangs. The burst issue gets more severe if ports are opened in parallel, as each port
+ * accesses this memory. Therefore we limit the amount of pending writes by inserting a read
+ * every several writes, which causes the pending writes to be flushed to the device.
+ */
+#define NIC_MAX_COMBINED_WRITES 0x2000
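A user-space sketch of the flushing pattern the comment above describes, with the MMIO accessors stubbed out (the real code presumably uses writel()/readl() against the mapped region; the stub names and the fill loop are illustrative only):

```c
#include <stdint.h>

#define NIC_MAX_COMBINED_WRITES	0x2000

static unsigned long flush_reads; /* how many times we forced a drain */

/* stubbed MMIO accessors standing in for writel()/readl() */
static void mmio_write(uint64_t off, uint32_t val)
{
	(void)off;
	(void)val;
}

static uint32_t mmio_read(uint64_t off)
{
	(void)off;
	flush_reads++;
	return 0;
}

/* Fill a device region dword by dword, inserting a read every
 * NIC_MAX_COMBINED_WRITES writes so pending write-combined stores are
 * flushed to the device instead of piling up in the buffer.
 */
static void fill_region(uint64_t base, uint32_t dwords, uint32_t val)
{
	uint32_t i;

	for (i = 0; i < dwords; i++) {
		mmio_write(base + (uint64_t)i * 4, val);
		if ((i + 1) % NIC_MAX_COMBINED_WRITES == 0)
			(void)mmio_read(base);
	}
}
```

Filling 0x4000 dwords with the 0x2000 threshold triggers exactly two flushing reads; the read forces the CPU to drain the write-combining buffer before it can complete.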
+
+#define NIC_MAX_RC_MTU SZ_8K
+
+#define UDP_HDR_SIZE 8
+
+/* This is the max frame length the H/W supports (Tx/Rx) */
+#define NIC_MAX_RDMA_HDRS 128
+#define NIC_MAX_TNL_HDR_SIZE 32 /* Bytes */
+#define NIC_MAX_TNL_HDRS (NIC_MAX_TNL_HDR_SIZE + UDP_HDR_SIZE)
+#define NIC_MAX_FRM_LEN (NIC_MAX_RC_MTU + NIC_MAX_RDMA_HDRS)
+#define NIC_MAC_MAX_FRM_LEN (NIC_MAX_FRM_LEN + HBL_EN_MAX_HEADERS_SZ + NIC_MAX_TNL_HDRS)
+#define NIC_RAW_MIN_MTU (SZ_1K - HBL_EN_MAX_HEADERS_SZ)
+#define NIC_RAW_MAX_MTU (NIC_MAX_RC_MTU - HBL_EN_MAX_HEADERS_SZ)
+
+/* This is the size of an element in the RAW buffer - note that it is different from
+ * NIC_MAX_FRM_LEN, because it has to be a power of 2.
+ */
+#define NIC_RAW_ELEM_SIZE (2 * NIC_MAX_RC_MTU)
+
+#define NIC_RX_RING_PKT_NUM BIT(8)
+
+#define NIC_MIN_CONN_ID 1
+#define NIC_MAX_CONN_ID (BIT(13) - 1) /* 8K QPs */
+
+#define NIC_MAX_QP_NUM (NIC_MAX_CONN_ID + 1)
+
+/* Number of available QPs must not exceed NIC_HW_MAX_QP_NUM */
+static_assert(NIC_MAX_QP_NUM <= NIC_HW_MAX_QP_NUM);
+
+/* Allocate an extra QP to be used as dummy QP. */
+#define REQ_QPC_TOTAL_PORT_SIZE ((NIC_MAX_QP_NUM + 1) * sizeof(struct gaudi2_qpc_requester))
+#define RES_QPC_TOTAL_PORT_SIZE ALIGN((NIC_MAX_QP_NUM + 1) * \
+ sizeof(struct gaudi2_qpc_responder), \
+ DEVICE_CACHE_LINE_SIZE)
+
+#define TMR_ENT_SIZE 4
+#define TMR_GRANULARITY 256
+#define TMR_FSM_SIZE ALIGN(NIC_MAX_QP_NUM, DEVICE_CACHE_LINE_SIZE)
+/* each timer serves two NICs, hence multiply by 2 */
+#define TMR_FIFO_SIZE ALIGN((NIC_MAX_QP_NUM * 2 * TMR_ENT_SIZE) + \
+ DEVICE_CACHE_LINE_SIZE * TMR_GRANULARITY, \
+ DEVICE_CACHE_LINE_SIZE)
+#define TMR_FREE_NUM_ENTRIES (TMR_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TMR_FREE_SIZE ALIGN(TMR_FREE_NUM_ENTRIES * TMR_ENT_SIZE, \
+ DEVICE_CACHE_LINE_SIZE)
+#define TMR_TOTAL_MACRO_SIZE (TMR_FSM_SIZE * 2 + TMR_FREE_SIZE + TMR_FIFO_SIZE)
+
+#define TMR_FSM0_OFFS 0
+#define TMR_FREE_OFFS (TMR_FSM0_OFFS + 2 * TMR_FSM_SIZE)
+#define TMR_FIFO_OFFS (TMR_FREE_OFFS + TMR_FREE_SIZE)
+
+#define TXS_ENT_SIZE 4
+#define TXS_GRANULARITY 256
+#define TXS_FIFO_SIZE ALIGN((NIC_MAX_QP_NUM * 2 * TXS_ENT_SIZE) + \
+ DEVICE_CACHE_LINE_SIZE * TXS_GRANULARITY, \
+ DEVICE_CACHE_LINE_SIZE)
+#define TXS_FREE_NUM_ENTRIES (TXS_FIFO_SIZE / DEVICE_CACHE_LINE_SIZE)
+#define TXS_FREE_SIZE ALIGN(TXS_FREE_NUM_ENTRIES * TXS_ENT_SIZE, \
+ DEVICE_CACHE_LINE_SIZE)
+#define TXS_TOTAL_PORT_SIZE (TXS_FREE_SIZE + TXS_FIFO_SIZE)
+
+#define TXS_FREE_OFFS 0
+#define TXS_FIFO_OFFS (TXS_FREE_OFFS + TXS_FREE_SIZE)
+
+#define TXS_NUM_PORTS NIC_MAC_LANES
+#define TXS_SCHEDQ TXS_GRANULARITY
+#define TXS_NUM_SCHEDQS TXS_SCHEDQ
+
+#define TXS_PORT_NUM_SCHEDQS (TXS_NUM_SCHEDQS / TXS_NUM_PORTS)
+#define TXS_PORT_NUM_SCHED_GRANS (TXS_PORT_NUM_SCHEDQS / HBL_EN_PFC_PRIO_NUM)
+#define TXS_PORT_RAW_SCHED_Q (TXS_PORT_NUM_SCHED_GRANS - QPC_RAW_SCHED_Q)
+#define TXS_PORT_RES_SCHED_Q (TXS_PORT_NUM_SCHED_GRANS - QPC_RES_SCHED_Q)
+#define TXS_PORT_REQ_SCHED_Q (TXS_PORT_NUM_SCHED_GRANS - QPC_REQ_SCHED_Q)
+
+#define RXB_NUM_BUFFS 2880
+#define RXB_BUFF_SIZE 128 /* size in bytes */
+#define RXB_NUM_MTU_BUFFS ((NIC_MAX_FRM_LEN / RXB_BUFF_SIZE) + 1)
+#define RXB_DROP_SMALL_TH_DEPTH 3
+#define RXB_DROP_TH_DEPTH (1 * RXB_NUM_MTU_BUFFS)
+#define RXB_XOFF_TH_DEPTH (11 * RXB_NUM_MTU_BUFFS)
+#define RXB_XON_TH_DEPTH (1 * RXB_NUM_MTU_BUFFS)
+#define RXB_NUM_STATIC_CREDITS (RXB_NUM_BUFFS / 2)
+
+#define SECTION_ALIGN_SIZE 0x100000ull
+#define NIC_DRV_BASE_ADDR(nic_drv_addr) ALIGN(nic_drv_addr, SECTION_ALIGN_SIZE)
+
+#define NIC_DRV_END_ADDR(nic_drv_addr, nic_drv_size) \
+ ALIGN(((nic_drv_addr) + (nic_drv_size)), \
+ SECTION_ALIGN_SIZE)
+
+#define REQ_QPC_BASE_ADDR NIC_DRV_BASE_ADDR
+
+#define RES_QPC_BASE_ADDR(nic_drv_addr) (REQ_QPC_BASE_ADDR(nic_drv_addr) + \
+ ALIGN(NIC_NUMBER_OF_ENGINES * REQ_QPC_TOTAL_PORT_SIZE, \
+ SECTION_ALIGN_SIZE))
+
+#define TMR_BASE_ADDR(nic_drv_addr) (RES_QPC_BASE_ADDR(nic_drv_addr) + \
+ ALIGN(NIC_NUMBER_OF_ENGINES * RES_QPC_TOTAL_PORT_SIZE, \
+ SECTION_ALIGN_SIZE))
+
+#define TXS_BASE_ADDR(nic_drv_addr) (TMR_BASE_ADDR(nic_drv_addr) + \
+ ALIGN(NIC_NUMBER_OF_MACROS * TMR_TOTAL_MACRO_SIZE, \
+ SECTION_ALIGN_SIZE))
+
+#define WQ_BASE_ADDR(nic_drv_addr) (TXS_BASE_ADDR(nic_drv_addr) + \
+ ALIGN(NIC_NUMBER_OF_ENGINES * TXS_TOTAL_PORT_SIZE, \
+ SECTION_ALIGN_SIZE))
+
+/* Unlike the other port-related sizes, this size is shared between all the engines */
+#define WQ_BASE_SIZE(nic_drv_addr, nic_drv_size) \
+ ({ \
+ u64 __nic_drv_addr = (nic_drv_addr); \
+ NIC_DRV_END_ADDR(__nic_drv_addr, (nic_drv_size)) - WQ_BASE_ADDR(__nic_drv_addr); \
+ })
+
+#define WQ_BUFFER_LOG_SIZE 8
+#define WQ_BUFFER_SIZE (1 << (WQ_BUFFER_LOG_SIZE))
+#define CQE_SIZE sizeof(struct gaudi2_cqe)
+#define NIC_CQ_RAW_IDX 0
+#define NIC_CQ_RDMA_IDX 1
+#define QP_WQE_NUM_REC 128
+#define TX_WQE_NUM_IN_CLINE (DEVICE_CACHE_LINE_SIZE / NIC_SEND_WQE_SIZE_MULTI_STRIDE)
+#define RX_WQE_NUM_IN_CLINE (DEVICE_CACHE_LINE_SIZE / NIC_RECV_WQE_SIZE)
+#define RAW_QPN 0
+
+#define NIC_FIFO_DB_SIZE 64
+#define NIC_TX_BUF_SIZE QP_WQE_NUM_REC
+#define NIC_CQ_MAX_ENTRIES BIT(13)
+#define NIC_EQ_RING_NUM_REC BIT(18)
+
+/* if not equal, the size of the WQ must be considered when checking data bounds in en_tx_done */
+static_assert(NIC_TX_BUF_SIZE == QP_WQE_NUM_REC);
+
+#define NIC_TOTAL_CQ_MEM_SIZE (NIC_CQ_MAX_ENTRIES * CQE_SIZE)
+
+#define NIC_CQ_USER_MIN_ENTRIES 4
+#define NIC_CQ_USER_MAX_ENTRIES NIC_CQ_MAX_ENTRIES
+
+#define NIC_MIN_CQ_ID NIC_CQS_NUM
+#define NIC_MAX_CQ_ID (GAUDI2_NIC_MAX_CQS_NUM - 1)
+
+static_assert(NIC_CQ_RDMA_IDX < GAUDI2_NIC_MAX_CQS_NUM);
+static_assert(NIC_CQ_RAW_IDX < GAUDI2_NIC_MAX_CQS_NUM);
+
+#define USER_WQES_MIN_NUM 16
+#define USER_WQES_MAX_NUM BIT(15) /* 32K */
+
+#define NIC_RXE_AXUSER_AXUSER_CQ_OFFSET (NIC0_RXE0_AXUSER_AXUSER_CQ1_HB_ASID - \
+ NIC0_RXE0_AXUSER_AXUSER_CQ0_HB_ASID)
+
+/* Unsecured userspace doorbell FIFO IDs as reported to the user; the HW IDs are 0-29 */
+#define GAUDI2_MIN_DB_FIFO_ID 1
+#define GAUDI2_MAX_DB_FIFO_ID 30
+
+#define GAUDI2_DB_FIFO_SECURE_HW_ID 30
+#define GAUDI2_DB_FIFO_PRIVILEGE_HW_ID 31
+
+/* The size of the DB FIFO in bytes is constant */
+#define DB_FIFO_ENTRY_SIZE 8
+#define DB_FIFO_NUM_OF_ENTRIES 64
+#define DB_FIFO_SIZE (DB_FIFO_NUM_OF_ENTRIES * DB_FIFO_ENTRY_SIZE)
+
+/* User encapsulation IDs. There are 8 encap and 4 decap resources available per macro,
+ * so for now allow a max of 2 encaps per port.
+ */
+#define GAUDI2_MIN_ENCAP_ID 0
+#define GAUDI2_MAX_ENCAP_ID 1
+
+#define QPC_GW_MASK_REG_NUM (((NIC0_QPC0_GW_MASK_31 - NIC0_QPC0_GW_MASK_0) >> 2) + 1)
+
+#define NIC_CFG_LO_SIZE (NIC0_QPC1_REQ_STATIC_CONFIG - NIC0_QPC0_REQ_STATIC_CONFIG)
+
+#define NIC_CFG_HI_SIZE (NIC0_RXE1_CONTROL - NIC0_RXE0_CONTROL)
+
+#define NIC_CFG_BASE(port, reg) \
+ ({ \
+ u32 __port = (port); \
+ u32 __reg = (reg); \
+ (u64)(NIC_MACRO_CFG_BASE(__port) + ((__reg < NIC0_RXE0_CONTROL) ? \
+ (NIC_CFG_LO_SIZE * (u64)(__port & 1)) : (NIC_CFG_HI_SIZE * (u64)(__port & 1)))); \
+ })
+
+#define NIC_RREG32(reg) \
+ ({ \
+ u32 _reg = (reg); \
+ RREG32(NIC_CFG_BASE(port, _reg) + _reg); \
+ })
+
+#define NIC_WREG32(reg, val) \
+ ({ \
+ u32 _reg = (reg); \
+ WREG32(NIC_CFG_BASE(port, _reg) + _reg, (val)); \
+ })
+
+#define NIC_RMWREG32(reg, val, mask) \
+ ({ \
+ u32 _reg = (reg); \
+ RMWREG32(NIC_CFG_BASE(port, _reg) + _reg, (val), (mask)); \
+ })
+
+#define NIC_RMWREG32_SHIFTED(reg, val, mask) \
+ ({ \
+ u32 _reg = (reg); \
+ RMWREG32_SHIFTED(NIC_CFG_BASE(port, _reg) + _reg, (val), (mask)); \
+ })
+
+#define MAC_CH_OFFSET(lane) ((NIC0_MAC_CH1_MAC_PCS_BASE - NIC0_MAC_CH0_MAC_PCS_BASE) * (lane))
+
+#define WARN_ON_CACHE_UNALIGNED(addr) WARN_ON_ONCE(!IS_ALIGNED(addr, DEVICE_CACHE_LINE_SIZE))
+
+enum gaudi2_cn_mac_fec_stats_type {
+ FEC_CW_CORRECTED_ACCUM,
+ FEC_CW_UNCORRECTED_ACCUM,
+ FEC_CW_CORRECTED,
+ FEC_CW_UNCORRECTED,
+ FEC_SYMBOL_ERR_CORRECTED_LANE_0,
+ FEC_SYMBOL_ERR_CORRECTED_LANE_1,
+ FEC_SYMBOL_ERR_CORRECTED_LANE_2,
+ FEC_SYMBOL_ERR_CORRECTED_LANE_3,
+ FEC_PRE_FEC_SER_INT,
+ FEC_PRE_FEC_SER_EXP,
+ FEC_POST_FEC_SER_INT,
+ FEC_POST_FEC_SER_EXP,
+ FEC_STAT_LAST
+};
+
+enum gaudi2_cn_perf_stats_type {
+ PERF_BANDWIDTH_INT,
+ PERF_BANDWIDTH_FRAC,
+ PERF_LATENCY_INT,
+ PERF_LATENCY_FRAC,
+ PERF_STAT_LAST
+};
+
+enum gaudi2_cn_pcs_link_state {
+ PCS_LINK_STATE_SETTLING,
+ PCS_LINK_STATE_STRESS,
+ PCS_LINK_STATE_STEADY
+};
+
+struct gaudi2_cn_port;
+
+/**
+ * struct gaudi2_cn_port - manage specific port.
+ * @hdev: habanalabs device structure.
+ * @cn_port: pointer to a common device structure.
+ * @fifo_ring: ring for the doorbell H/W interface.
+ * @wq_ring: raw work queue ring.
+ * @rx_ring: raw skb ring.
+ * @cq_rings: rings array for the completion queues of raw/rdma packets.
+ * @eq_ring: ring for the event queue.
+ * @eq_work: EQ work for processing events (e.g Tx completion).
+ * @qp_sanity_work: QPC sanity check worker.
+ * @qp_sanity_wq: QPC sanity worker thread.
+ * @cfg_lock: Serializes the port configuration.
+ * @qp_destroy_lock: protects the MAC loopback switching for QP destroy flow.
+ * @pcs_link_stady_state_ts: timestamp of moving to the PCS link steady state.
+ * @pcs_link_state: the current pcs link state.
+ * @qp_destroy_cnt: number of QPs currently under destruction.
+ * @min_qp_size: the size of the smallest QP.
+ * @db_fifo_pi: DB fifo ring producer index.
+ * @qp_timeout_cnt: count of QP timeouts that occurred on this port.
+ * @pcs_link_samples_per_sec: the number of times we check the pcs link in a second.
+ * @advanced: true if advanced features are supported.
+ * @adaptive_timeout_en: enable adaptive timeout feature.
+ * @qp_destroy_mac_lpbk: port is in MAC loopback due to QP destroy flow.
+ * @initial_tx_taps_cfg: first tx taps config since the last PHY power-up.
+ * @tx_taps_cfg: current tx taps config.
+ * @tx_taps_modified: flag to indicate if tx_taps were modified due to remote faults.
+ */
+struct gaudi2_cn_port {
+ struct hbl_cn_device *hdev;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_ring fifo_ring;
+ struct hbl_cn_ring wq_ring;
+ struct hbl_cn_ring rx_ring;
+ struct hbl_cn_ring cq_rings[NIC_CQS_NUM];
+ struct hbl_cn_ring eq_ring;
+ struct delayed_work eq_work;
+ struct delayed_work qp_sanity_work;
+ struct workqueue_struct *qp_sanity_wq;
+ /* Serializes the port configuration */
+ struct mutex cfg_lock;
+ /* protects the MAC loopback switching for QP destroy flow */
+ struct mutex qp_destroy_lock;
+ ktime_t pcs_link_stady_state_ts;
+ enum gaudi2_cn_pcs_link_state pcs_link_state;
+ u32 qp_destroy_cnt;
+ u32 min_qp_size;
+ u32 db_fifo_pi;
+ u32 qp_timeout_cnt;
+ u32 pcs_link_samples_per_sec;
+ u8 advanced;
+ u8 adaptive_timeout_en;
+ u8 qp_destroy_mac_lpbk;
+ u8 initial_tx_taps_cfg;
+ u8 tx_taps_cfg;
+ u8 tx_taps_modified;
+};
+
+/**
+ * struct gaudi2_cn_device - ASIC specific manage structure.
+ * @cn_ports: array that holds all ports manage structures.
+ * @en_aux_data: data to be used by the Ethernet driver.
+ * @en_aux_ops: functions for Ethernet <-> CN drivers communication.
+ * @cn_aux_ops: functions for CN <-> compute drivers communication.
+ * @setup_type: type of setup connectivity.
+ * @cfg_base: configuration space base address.
+ * @irq_num_port_base: base IRQ number for port EQ.
+ * @sob_id_base: first reserved SOB ID.
+ * @sob_inc_cfg_val: configuration value for incrementing SOB by one.
+ * @fw_security_enabled: FW security enabled.
+ * @msix_enabled: MSI-X enabled.
+ * @temporal_polling: EQ polling is temporary and used only in specific cases.
+ * @flush_db_fifo: force flush DB FIFO after a write.
+ * @in_compute_reset: device is under compute reset.
+ * @mac_rs_fec_ctrl_support: Is MAC_RS_FEC_CONTROL block supported.
+ */
+struct gaudi2_cn_device {
+ struct gaudi2_cn_port cn_ports[NIC_NUMBER_OF_PORTS];
+ struct gaudi2_en_aux_data en_aux_data;
+ struct gaudi2_en_aux_ops en_aux_ops;
+ struct gaudi2_cn_aux_ops *cn_aux_ops;
+ enum gaudi2_setup_type setup_type;
+ u64 cfg_base;
+ u32 irq_num_port_base;
+ u32 sob_id_base;
+ u32 sob_inc_cfg_val;
+ u8 fw_security_enabled;
+ u8 msix_enabled;
+ u8 temporal_polling;
+ u8 flush_db_fifo;
+ u8 in_compute_reset;
+ u8 mac_rs_fec_ctrl_support;
+};
+
+int gaudi2_cn_eq_init(struct hbl_cn_device *hdev);
+void gaudi2_cn_eq_fini(struct hbl_cn_device *hdev);
+int gaudi2_cn_debugfs_qp_read(struct hbl_cn_device *hdev, struct hbl_cn_qp_info *qp_info, char *buf,
+ size_t bsize);
+int gaudi2_cn_debugfs_wqe_read(struct hbl_cn_device *hdev, char *buf, size_t bsize);
+void gaudi2_cn_debugfs_collect_fec_stats(struct hbl_cn_port *cn_port, char *buf, size_t size);
+int gaudi2_cn_eq_dispatcher_register_db(struct gaudi2_cn_port *gaudi2_cn, u32 asid, u32 dbn);
+int gaudi2_cn_eq_request_irqs(struct hbl_cn_device *hdev);
+void gaudi2_cn_eq_sync_irqs(struct hbl_cn_device *hdev);
+void gaudi2_cn_eq_free_irqs(struct hbl_cn_device *hdev);
+struct hbl_cn_ev_dq *gaudi2_cn_eq_dispatcher_select_dq(struct hbl_cn_port *cn_port,
+ const struct hbl_cn_eqe *eqe);
+char *gaudi2_cn_qp_err_syndrome_to_str(u32 syndrome);
+int gaudi2_cn_qpc_read(struct hbl_cn_port *cn_port, void *qpc, u32 qpn, bool is_req);
+int gaudi2_cn_wqe_read(struct hbl_cn_port *cn_port, void *wqe, u32 qpn, u32 wqe_idx, bool is_tx);
+void gaudi2_cn_hw_mac_loopback_cfg(struct gaudi2_cn_port *gaudi2_cn);
+int gaudi2_cn_set_info(struct hbl_cn_device *hdev, bool get_from_fw);
+int gaudi2_cn_phy_reset_macro(struct hbl_cn_macro *cn_macro);
+int gaudi2_cn_phy_init(struct hbl_cn_device *hdev);
+void gaudi2_cn_eq_enter_temporal_polling_mode(struct hbl_cn_device *hdev);
+void gaudi2_cn_eq_exit_temporal_polling_mode(struct hbl_cn_device *hdev);
+void gaudi2_cn_phy_flush_link_status_work(struct hbl_cn_device *hdev);
+int gaudi2_cn_phy_port_init(struct hbl_cn_port *cn_port);
+void gaudi2_cn_phy_port_start_stop(struct hbl_cn_port *cn_port, bool is_start);
+const char *gaudi2_cn_phy_get_fw_name(void);
+int gaudi2_cn_phy_fw_load_all(struct hbl_cn_device *hdev);
+u16 gaudi2_cn_phy_get_crc(struct hbl_cn_device *hdev);
+int gaudi2_cn_phy_port_power_up(struct hbl_cn_port *cn_port);
+void gaudi2_cn_phy_port_reconfig(struct hbl_cn_port *cn_port);
+void gaudi2_cn_phy_port_fini(struct hbl_cn_port *cn_port);
+void gaudi2_cn_phy_link_status_work(struct work_struct *work);
+void gaudi2_cn_phy_dump_serdes_params(struct hbl_cn_device *hdev, char *buf, size_t size);
+void gaudi2_cn_get_mac_fec_stats(struct hbl_cn_port *cn_port, u64 *data);
+bool gaudi2_cn_is_cq_in_overrun(struct hbl_cn_port *cn_port, u8 cq_id);
+bool gaudi2_handle_qp_error_retry(struct hbl_cn_port *cn_port, u32 qpn);
+
+#endif /* GAUDI2_CN_H_ */
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c
new file mode 100644
index 000000000000..3acd8b8b0f4a
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c
@@ -0,0 +1,319 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2_cn.h"
+
+/* prints only if the local 'full_print' flag of the calling function is set */
+#define _fsnprintf(buf, size, fmt, ...) \
+ do { \
+ if (full_print) \
+ __snprintf(buf, size, fmt, ##__VA_ARGS__); \
+ \
+ } while (0)
+
+static int gaudi2_cn_debugfs_qpc_req_parse(struct hbl_cn_device *hdev,
+ struct hbl_cn_qp_info *qp_info,
+ struct gaudi2_qpc_requester *req, char *buf,
+ size_t bsize)
+{
+ bool full_print, force_read;
+
+ force_read = qp_info->force_read;
+ full_print = qp_info->full_print;
+
+ __snprintf(buf, bsize, "Valid: %lld\n", REQ_QPC_GET_VALID(*req));
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ if (!force_read && !REQ_QPC_GET_VALID(*req))
+ return 0;
+
+ __snprintf(buf, bsize, "Error: %lld\n", REQ_QPC_GET_ERROR(*req));
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ if (!force_read && REQ_QPC_GET_ERROR(*req))
+ return 0;
+
+ _fsnprintf(buf, bsize, "in_work: 0x%llx\n", REQ_QPC_GET_IN_WORK(*req));
+ _fsnprintf(buf, bsize, "trusted: 0x%llx\n", REQ_QPC_GET_TRUST_LEVEL(*req));
+ _fsnprintf(buf, bsize, "WQ addr: 0x%llx\n", REQ_QPC_GET_WQ_BASE_ADDR(*req));
+ _fsnprintf(buf, bsize, "MTU: 0x%llx\n", REQ_QPC_GET_MTU(*req));
+ _fsnprintf(buf, bsize, "cong mode: 0x%llx\n", REQ_QPC_GET_CONGESTION_MODE(*req));
+ _fsnprintf(buf, bsize, "priority: 0x%llx\n", REQ_QPC_GET_PRIORITY(*req));
+ _fsnprintf(buf, bsize, "transport service: 0x%llx\n", REQ_QPC_GET_TRANSPORT_SERVICE(*req));
+ _fsnprintf(buf, bsize, "SWQ gran: 0x%llx\n", REQ_QPC_GET_SWQ_GRANULARITY(*req));
+ __snprintf(buf, bsize, "WQ type: 0x%llx\n", REQ_QPC_GET_WQ_TYPE(*req));
+ _fsnprintf(buf, bsize, "port/lane: 0x%llx\n", REQ_QPC_GET_PORT(*req));
+ _fsnprintf(buf, bsize, "Gaudi1 mode: 0x%llx\n", REQ_QPC_GET_MOD_GAUDI1(*req));
+ __snprintf(buf, bsize, "data MMU BP: 0x%llx\n", REQ_QPC_GET_DATA_MMU_BYPASS(*req));
+ _fsnprintf(buf, bsize, "PSN delivered: 0x%llx\n", REQ_QPC_GET_PSN_DELIVERED(*req));
+ _fsnprintf(buf, bsize, "pacing time: 0x%llx\n", REQ_QPC_GET_PACING_TIME(*req));
+ _fsnprintf(buf, bsize, "Ackreq freq: 0x%llx\n", REQ_QPC_GET_ACKREQ_FREQ(*req));
+ _fsnprintf(buf, bsize, "PSN since ackreq: 0x%llx\n", REQ_QPC_GET_PSN_SINCE_ACKREQ(*req));
+ __snprintf(buf, bsize, "oldest unacked remote PI: 0x%llx\n",
+ REQ_QPC_GET_OLDEST_UNACKED_REMOTE_PRODUCER_IDX(*req));
+ __snprintf(buf, bsize, "remote CI: 0x%llx\n", REQ_QPC_GET_REMOTE_CONSUMER_IDX(*req));
+ __snprintf(buf, bsize, "remote PI: 0x%llx\n", REQ_QPC_GET_REMOTE_PRODUCER_IDX(*req));
+ __snprintf(buf, bsize, "local PI: 0x%llx\n", REQ_QPC_GET_LOCAL_PRODUCER_IDX(*req));
+ __snprintf(buf, bsize, "local CI: 0x%llx\n", REQ_QPC_GET_CONSUMER_IDX(*req));
+ __snprintf(buf, bsize, "local EI: 0x%llx\n", REQ_QPC_GET_EXECUTION_IDX(*req));
+ __snprintf(buf, bsize, "last index: 0x%llx\n", REQ_QPC_GET_LAST_IDX(*req));
+ __snprintf(buf, bsize, "ASID: 0x%llx\n", REQ_QPC_GET_ASID(*req));
+ _fsnprintf(buf, bsize, "burst size: 0x%llx\n", REQ_QPC_GET_BURST_SIZE(*req));
+ _fsnprintf(buf, bsize, "CC RTT PSN: 0x%llx\n", REQ_QPC_GET_RTT_MARKED_PSN(*req));
+ _fsnprintf(buf, bsize, "CC RTT timestamp: 0x%llx\n", REQ_QPC_GET_RTT_TIMESTAMP(*req));
+ _fsnprintf(buf, bsize, "congestion window: 0x%llx\n", REQ_QPC_GET_CONGESTION_WIN(*req));
+ _fsnprintf(buf, bsize, "encap en: 0x%llx\n", REQ_QPC_GET_ENCAP_ENABLE(*req));
+ _fsnprintf(buf, bsize, "RTT state: 0x%llx\n", REQ_QPC_GET_RTT_STATE(*req));
+ _fsnprintf(buf, bsize, "CQ num: 0x%llx\n", REQ_QPC_GET_CQ_NUM(*req));
+ _fsnprintf(buf, bsize, "congestion window NMA: 0x%llx\n",
+ REQ_QPC_GET_CONGESTION_NON_MARKED_ACK(*req));
+ _fsnprintf(buf, bsize, "encap type: 0x%llx\n", REQ_QPC_GET_ENCAP_TYPE(*req));
+ __snprintf(buf, bsize, "remote WQ log bsize: 0x%llx\n", REQ_QPC_GET_REMOTE_WQ_LOG_SZ(*req));
+ _fsnprintf(buf, bsize, "congestion window MA: 0x%llx\n",
+ REQ_QPC_GET_CONGESTION_MARKED_ACK(*req));
+ _fsnprintf(buf, bsize, "WQ back-press: 0x%llx\n", REQ_QPC_GET_WQ_BACK_PRESSURE(*req));
+ _fsnprintf(buf, bsize, "timeout gran: 0x%llx\n", REQ_QPC_GET_TM_GRANULARITY(*req));
+ __snprintf(buf, bsize, "BCC PSN: 0x%llx\n", REQ_QPC_GET_BCC_PSN(*req));
+ __snprintf(buf, bsize, "ONA PSN: 0x%llx\n", REQ_QPC_GET_ONA_PSN(*req));
+ _fsnprintf(buf, bsize, "sched Q: 0x%llx\n", REQ_QPC_GET_SCHD_Q_NUM(*req));
+ __snprintf(buf, bsize, "BCS PSN: 0x%llx\n", REQ_QPC_GET_BCS_PSN(*req));
+ __snprintf(buf, bsize, "NTS PSN: 0x%llx\n", REQ_QPC_GET_NTS_PSN(*req));
+ _fsnprintf(buf, bsize, "timeout retry cnt: 0x%llx\n",
+ REQ_QPC_GET_TIMEOUT_RETRY_COUNT(*req));
+ _fsnprintf(buf, bsize, "seq ERR retry cnt: 0x%llx\n",
+ REQ_QPC_GET_SEQUENCE_ERROR_RETRY_COUNT(*req));
+ __snprintf(buf, bsize, "dst MAC: %04llx%08llx\n", REQ_QPC_GET_DST_MAC_MSB(*req),
+ REQ_QPC_GET_DST_MAC_LSB(*req));
+ _fsnprintf(buf, bsize, "dst ipv4: 0x%llx\n", REQ_QPC_GET_DST_IP(*req));
+ _fsnprintf(buf, bsize, "remote key: 0x%llx\n", REQ_QPC_GET_RKEY(*req));
+ _fsnprintf(buf, bsize, "multi-stride state: 0x%016llx%08llx\n",
+ REQ_QPC_GET_MULTI_STRIDE_STATE_MSB(*req),
+ REQ_QPC_GET_MULTI_STRIDE_STATE_LSB(*req));
+ _fsnprintf(buf, bsize, "dest QP: 0x%llx\n", REQ_QPC_GET_DST_QP(*req));
+
+ /* make sure the caller is aware that the buffer it is using is not long enough */
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ return 0;
+}
+
+static int gaudi2_cn_debugfs_qpc_res_parse(struct hbl_cn_device *hdev,
+ struct hbl_cn_qp_info *qp_info,
+ struct gaudi2_qpc_responder *res, char *buf,
+ size_t bsize)
+{
+ bool full_print, force_read;
+
+ force_read = qp_info->force_read;
+ full_print = qp_info->full_print;
+
+ __snprintf(buf, bsize, "Valid: %lld\n", RES_QPC_GET_VALID(*res));
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ if (!force_read && !RES_QPC_GET_VALID(*res))
+ return 0;
+
+ _fsnprintf(buf, bsize, "in work: 0x%llx\n", RES_QPC_GET_IN_WORK(*res));
+ _fsnprintf(buf, bsize, "peer WQ gran: 0x%llx\n", RES_QPC_GET_PEER_WQ_GRAN(*res));
+ _fsnprintf(buf, bsize, "CQ num: 0x%llx\n", RES_QPC_GET_CQ_NUM(*res));
+ __snprintf(buf, bsize, "cyc_idx: 0x%llx\n", RES_QPC_GET_CYCLIC_IDX(*res));
+ _fsnprintf(buf, bsize, "encap EN: 0x%llx\n", RES_QPC_GET_ENCAP_ENABLE(*res));
+ _fsnprintf(buf, bsize, "encap type: 0x%llx\n", RES_QPC_GET_ENCAP_TYPE(*res));
+ __snprintf(buf, bsize, "data MMU BP: 0x%llx\n", RES_QPC_GET_DATA_MMU_BYPASS(*res));
+ _fsnprintf(buf, bsize, "Gaudi1 mode: 0x%llx\n", RES_QPC_GET_MOD_GAUDI1(*res));
+ _fsnprintf(buf, bsize, "trust level: 0x%llx\n", RES_QPC_GET_TRUST_LEVEL(*res));
+ __snprintf(buf, bsize, "expected PSN: 0x%llx\n", RES_QPC_GET_EXPECTED_PSN(*res));
+ _fsnprintf(buf, bsize, "sched Q: 0x%llx\n", RES_QPC_GET_SCHD_Q_NUM(*res));
+ _fsnprintf(buf, bsize, "peer QP: 0x%llx\n", RES_QPC_GET_PEER_QP(*res));
+ __snprintf(buf, bsize, "ASID: 0x%llx\n", RES_QPC_GET_ASID(*res));
+ _fsnprintf(buf, bsize, "transport service: 0x%llx\n", RES_QPC_GET_TRANSPORT_SERVICE(*res));
+ _fsnprintf(buf, bsize, "ECN count: 0x%llx\n", RES_QPC_GET_ECN_COUNT(*res));
+ __snprintf(buf, bsize, "dst MAC: %04llx%08llx\n", RES_QPC_GET_DST_MAC_MSB(*res),
+ RES_QPC_GET_DST_MAC_LSB(*res));
+ __snprintf(buf, bsize, "dst ipv4: 0x%llx\n", RES_QPC_GET_DST_IP(*res));
+ _fsnprintf(buf, bsize, "local key: 0x%llx\n", RES_QPC_GET_LKEY(*res));
+ _fsnprintf(buf, bsize, "NACK syndrome: 0x%llx\n", RES_QPC_GET_NACK_SYNDROME(*res));
+ __snprintf(buf, bsize, "conn state: 0x%llx\n", RES_QPC_GET_CONN_STATE(*res));
+ _fsnprintf(buf, bsize, "priority: 0x%llx\n", RES_QPC_GET_PRIORITY(*res));
+ _fsnprintf(buf, bsize, "port/lane: 0x%llx\n", RES_QPC_GET_PORT(*res));
+ _fsnprintf(buf, bsize, "dest QP: 0x%llx\n", RES_QPC_GET_DESTINATION_QP(*res));
+
+ /* make sure the caller is aware that the buffer it is using is not long enough */
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ return 0;
+}
+
+int gaudi2_cn_debugfs_qp_read(struct hbl_cn_device *hdev, struct hbl_cn_qp_info *qp_info, char *buf,
+ size_t bsize)
+{
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct gaudi2_qpc_requester qpc_req = {};
+ struct gaudi2_qpc_responder qpc_res = {};
+ struct hbl_cn_port *cn_port;
+ u32 port, qpn;
+ void *qpc;
+ bool req;
+ int rc;
+
+ req = qp_info->req;
+ port = qp_info->port;
+ qpn = qp_info->qpn;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+ cn_port = &hdev->cn_ports[port];
+ qpc = req ? (void *)&qpc_req : (void *)&qpc_res;
+
+ if (!hbl_cn_is_port_open(cn_port)) {
+ dev_err(hdev->dev,
+ "Cannot read port %d QP %d, port is not initialized\n", port, qpn);
+ return -EPERM;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+ rc = gaudi2_cn_qpc_read(cn_port, qpc, qpn, req);
+ port_funcs->cfg_unlock(cn_port);
+ if (rc)
+ return rc;
+
+ __snprintf(buf, bsize, "port %d, qpn %d, req %d:\n", port, qpn, req);
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ if (req)
+ rc = gaudi2_cn_debugfs_qpc_req_parse(hdev, qp_info, &qpc_req, buf, bsize);
+ else
+ rc = gaudi2_cn_debugfs_qpc_res_parse(hdev, qp_info, &qpc_res, buf, bsize);
+
+ return rc;
+}
+
+static int gaudi2_cn_debugfs_wqe_parse(struct hbl_cn_device *hdev, struct hbl_cn_wqe_info *wqe_info,
+ void *wqe, char *buf, size_t bsize)
+{
+ struct gaudi2_sq_wqe *sq_wqe;
+ struct gaudi2_rq_wqe *rq_wqe;
+ u8 i;
+
+ if (wqe_info->tx) {
+ i = wqe_info->wqe_idx % TX_WQE_NUM_IN_CLINE;
+ sq_wqe = &(((struct gaudi2_sq_wqe *)wqe)[i]);
+ __snprintf(buf, bsize, "opcode: 0x%llx\n", TX_WQE_GET_OPCODE(*sq_wqe));
+ __snprintf(buf, bsize, "trace event data: 0x%llx\n",
+ TX_WQE_GET_TRACE_EVENT_DATA(*sq_wqe));
+ __snprintf(buf, bsize, "trace event: 0x%llx\n", TX_WQE_GET_TRACE_EVENT(*sq_wqe));
+ __snprintf(buf, bsize, "WQE index: 0x%llx\n", TX_WQE_GET_WQE_INDEX(*sq_wqe));
+ __snprintf(buf, bsize, "reduction opcode: 0x%llx\n",
+ TX_WQE_GET_REDUCTION_OPCODE(*sq_wqe));
+ __snprintf(buf, bsize, "SE: 0x%llx\n", TX_WQE_GET_SE(*sq_wqe));
+ __snprintf(buf, bsize, "inline: 0x%llx\n", TX_WQE_GET_INLINE(*sq_wqe));
+ __snprintf(buf, bsize, "ackreq: 0x%llx\n", TX_WQE_GET_ACKREQ(*sq_wqe));
+ __snprintf(buf, bsize, "size: 0x%llx\n", TX_WQE_GET_SIZE(*sq_wqe));
+ __snprintf(buf, bsize, "local address LSB: 0x%llx\n",
+ TX_WQE_GET_LOCAL_ADDR_LSB(*sq_wqe));
+ __snprintf(buf, bsize, "local address MSB: 0x%llx\n",
+ TX_WQE_GET_LOCAL_ADDR_MSB(*sq_wqe));
+ __snprintf(buf, bsize, "remote address LSB: 0x%llx\n",
+ TX_WQE_GET_REMOTE_ADDR_LSB(*sq_wqe));
+ __snprintf(buf, bsize, "remote address MSB: 0x%llx\n",
+ TX_WQE_GET_REMOTE_ADDR_MSB(*sq_wqe));
+ __snprintf(buf, bsize, "tag: 0x%llx\n", TX_WQE_GET_TAG(*sq_wqe));
+ __snprintf(buf, bsize, "remote SOB: 0x%llx\n", TX_WQE_GET_REMOTE_SOB(*sq_wqe));
+ __snprintf(buf, bsize, "remote SOB data: 0x%llx\n",
+ TX_WQE_GET_REMOTE_SOB_DATA(*sq_wqe));
+ __snprintf(buf, bsize, "SOB command: 0x%llx\n", TX_WQE_GET_SOB_CMD(*sq_wqe));
+ __snprintf(buf, bsize, "completion type: 0x%llx\n",
+ TX_WQE_GET_COMPLETION_TYPE(*sq_wqe));
+ } else {
+ i = wqe_info->wqe_idx % RX_WQE_NUM_IN_CLINE;
+ rq_wqe = &(((struct gaudi2_rq_wqe *)wqe)[i]);
+ __snprintf(buf, bsize, "opcode: 0x%llx\n", RX_WQE_GET_OPCODE(*rq_wqe));
+ __snprintf(buf, bsize, "WQE index: 0x%llx\n", RX_WQE_GET_WQE_INDEX(*rq_wqe));
+ __snprintf(buf, bsize, "SOB command: 0x%llx\n", RX_WQE_GET_SOB_CMD(*rq_wqe));
+ __snprintf(buf, bsize, "local SOB: 0x%llx\n", RX_WQE_GET_LOCAL_SOB(*rq_wqe));
+ __snprintf(buf, bsize, "local SOB data: 0x%llx\n",
+ RX_WQE_GET_LOCAL_SOB_DATA(*rq_wqe));
+ __snprintf(buf, bsize, "completion type: 0x%llx\n",
+ RX_WQE_GET_COMPLETION_TYPE(*rq_wqe));
+ __snprintf(buf, bsize, "size: 0x%llx\n", RX_WQE_GET_SIZE(*rq_wqe));
+ __snprintf(buf, bsize, "tag: 0x%llx\n", RX_WQE_GET_TAG(*rq_wqe));
+ }
+
+ /* Make sure the caller is aware that the buffer used isn't big enough */
+ if (strlen(buf) >= bsize)
+ return -EFBIG;
+
+ return 0;
+}
+
+int gaudi2_cn_debugfs_wqe_read(struct hbl_cn_device *hdev, char *buf, size_t bsize)
+{
+ struct gaudi2_sq_wqe sq_wqe[TX_WQE_NUM_IN_CLINE] = {};
+ struct gaudi2_rq_wqe rq_wqe[RX_WQE_NUM_IN_CLINE] = {};
+ struct hbl_cn_asic_port_funcs *port_funcs;
+ struct hbl_cn_wqe_info *wqe_info;
+ struct hbl_cn_port *cn_port;
+ u32 port, qpn, wqe_idx;
+ void *wqe;
+ bool tx;
+ int rc;
+
+ port_funcs = hdev->asic_funcs->port_funcs;
+
+ /* Get the details of the WQE to read as written by the user via debugfs */
+ wqe_info = &hdev->wqe_info;
+ tx = wqe_info->tx;
+ port = wqe_info->port;
+ qpn = wqe_info->qpn;
+ wqe_idx = wqe_info->wqe_idx;
+
+ cn_port = &hdev->cn_ports[port];
+ wqe = tx ? (void *)&sq_wqe : (void *)&rq_wqe;
+
+ if (!hbl_cn_is_port_open(cn_port)) {
+ dev_err(hdev->dev,
+ "Cannot read port %d QP %d, port is not initialized\n", port, qpn);
+ return -EPERM;
+ }
+
+ port_funcs->cfg_lock(cn_port);
+ rc = gaudi2_cn_wqe_read(cn_port, wqe, qpn, wqe_idx, tx);
+ port_funcs->cfg_unlock(cn_port);
+ if (rc)
+ goto exit;
+
+ __snprintf(buf, bsize, "port %d, qpn %d, wqe_idx %d, tx %d:\n", port, qpn, wqe_idx, tx);
+
+ rc = gaudi2_cn_debugfs_wqe_parse(hdev, wqe_info, wqe, buf, bsize);
+
+exit:
+ return rc;
+}
+
+void gaudi2_cn_debugfs_collect_fec_stats(struct hbl_cn_port *cn_port, char *buf, size_t size)
+{
+ u32 port = cn_port->port;
+ u64 data[FEC_STAT_LAST];
+ ssize_t len;
+
+ gaudi2_cn_get_mac_fec_stats(cn_port, data);
+
+ len = strlen(buf);
+ if ((size - len) <= 1)
+ return;
+
+ if (cn_port->pcs_link)
+ snprintf(buf + len, size - len,
+ "Port %u: pre_fec_SER: %llue-%llu post_fec_SER: %llue-%llu\n", port,
+ data[FEC_PRE_FEC_SER_INT], data[FEC_PRE_FEC_SER_EXP],
+ data[FEC_POST_FEC_SER_INT], data[FEC_POST_FEC_SER_EXP]);
+ else
+ snprintf(buf + len, size - len, "Port %u: Link is down\n", port);
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c
new file mode 100644
index 000000000000..68f6367f3798
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c
@@ -0,0 +1,732 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include <linux/pci.h>
+
+#include "gaudi2_cn.h"
+
+#define GAUDI2_NIC_MAX_STRING_LEN 64
+
+static const char
+gaudi2_cn_eq_irq_name[NIC_NUMBER_OF_ENGINES][GAUDI2_NIC_MAX_STRING_LEN] = {
+ "gaudi2 cn0 qpc0 EQ",
+ "gaudi2 cn0 qpc1 EQ",
+ "gaudi2 cn1 qpc0 EQ",
+ "gaudi2 cn1 qpc1 EQ",
+ "gaudi2 cn2 qpc0 EQ",
+ "gaudi2 cn2 qpc1 EQ",
+ "gaudi2 cn3 qpc0 EQ",
+ "gaudi2 cn3 qpc1 EQ",
+ "gaudi2 cn4 qpc0 EQ",
+ "gaudi2 cn4 qpc1 EQ",
+ "gaudi2 cn5 qpc0 EQ",
+ "gaudi2 cn5 qpc1 EQ",
+ "gaudi2 cn6 qpc0 EQ",
+ "gaudi2 cn6 qpc1 EQ",
+ "gaudi2 cn7 qpc0 EQ",
+ "gaudi2 cn7 qpc1 EQ",
+ "gaudi2 cn8 qpc0 EQ",
+ "gaudi2 cn8 qpc1 EQ",
+ "gaudi2 cn9 qpc0 EQ",
+ "gaudi2 cn9 qpc1 EQ",
+ "gaudi2 cn10 qpc0 EQ",
+ "gaudi2 cn10 qpc1 EQ",
+ "gaudi2 cn11 qpc0 EQ",
+ "gaudi2 cn11 qpc1 EQ",
+};
+
+/* Event queues for all the ports are initialized ahead of the port-specific initialization,
+ * regardless of whether the ports are enabled or not. We do the same for their IRQs.
+ */
+static irqreturn_t gaudi2_cn_eq_threaded_isr(int irq, void *arg);
+
+int gaudi2_cn_eq_request_irqs(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int i, rc, irq;
+
+ if (!gaudi2->msix_enabled)
+ return 0;
+
+	/* IRQs are needed if polling is only temporal; skip allocation when polling mode is constant */
+ if (hdev->poll_enable && !gaudi2->temporal_polling)
+ return 0;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ gaudi2_port = &gaudi2->cn_ports[i];
+ irq = pci_irq_vector(hdev->pdev, gaudi2->irq_num_port_base + i);
+ rc = request_threaded_irq(irq, NULL, gaudi2_cn_eq_threaded_isr, IRQF_ONESHOT,
+ gaudi2_cn_eq_irq_name[i], gaudi2_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d for port %d\n", irq, i);
+ goto irq_fail;
+ }
+ }
+
+ return 0;
+
+irq_fail:
+ for (i--; i >= 0; i--) {
+ gaudi2_port = &gaudi2->cn_ports[i];
+ irq = pci_irq_vector(hdev->pdev, gaudi2->irq_num_port_base + i);
+ free_irq(irq, gaudi2_port);
+ }
+ return rc;
+}
+
+void gaudi2_cn_eq_sync_irqs(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ int i, irq;
+
+ if (!gaudi2->msix_enabled)
+ return;
+
+ /* IRQs are allocated if polling is temporal so return only if polling mode is constant */
+ if (hdev->poll_enable && !gaudi2->temporal_polling)
+ return;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ irq = pci_irq_vector(hdev->pdev, gaudi2->irq_num_port_base + i);
+ synchronize_irq(irq);
+ }
+}
+
+void gaudi2_cn_eq_free_irqs(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int i, irq;
+
+ if (!gaudi2->msix_enabled)
+ return;
+
+ /* IRQs are allocated if polling is temporal so return only if polling mode is constant */
+ if (hdev->poll_enable && !gaudi2->temporal_polling)
+ return;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ gaudi2_port = &gaudi2->cn_ports[i];
+ irq = pci_irq_vector(hdev->pdev, gaudi2->irq_num_port_base + i);
+ free_irq(irq, gaudi2_port);
+ }
+}
+
+/* HW per port link/lane status mask */
+static u32 gaudi2_cn_get_link_status_mask(struct gaudi2_cn_port *gaudi2_port)
+{
+ switch (gaudi2_port->cn_port->speed) {
+ case SPEED_50000:
+ /* In 50GbE mode, HW supports up to 2 SERDES
+ * links per port.
+ * Note: SW uses fixed link 0 per port for
+ * transmission. Link 1 is unused.
+ */
+ return 0x3;
+ case SPEED_25000:
+ fallthrough;
+ case SPEED_100000:
+ /* In 100GbE mode, HW supports only one
+ * SERDES link per port.
+ */
+ return 0x1;
+ default:
+ dev_err(gaudi2_port->hdev->dev, "Unsupported speed %d\n",
+ gaudi2_port->cn_port->speed);
+ }
+
+ return 0;
+}
+
+static void gaudi2_cn_link_event_handler(struct gaudi2_cn_port *gaudi2_port)
+{
+ u32 curr_link_sts, link_sts_change, link_status_mask, port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_cn_port *gaudi2_port_curr;
+ u8 l, cn_m, port_offset, port_shift;
+ struct hbl_cn_port *cn_port_curr;
+ struct gaudi2_cn_device *gaudi2;
+ struct hbl_cn_macro *cn_macro;
+ bool link_up, prev_link_up;
+
+ gaudi2 = hdev->asic_specific;
+ port = gaudi2_port->cn_port->port;
+ cn_macro = gaudi2_port->cn_port->cn_macro;
+
+ curr_link_sts = (NIC_MACRO_RREG32(PRT0_MAC_CORE_MAC_REC_STS0) &
+ PRT0_MAC_CORE_MAC_REC_STS0_REC_LINK_STS_MASK) >>
+ PRT0_MAC_CORE_MAC_REC_STS0_REC_LINK_STS_SHIFT;
+
+ /* get the change on the serdes by XOR with previous val */
+ link_sts_change = curr_link_sts ^ cn_macro->rec_link_sts;
+
+ /* store current value as previous (for next round) */
+ cn_macro->rec_link_sts = curr_link_sts;
+
+ /* calc the macro whose link change we need to handle */
+ cn_m = port >> 1;
+
+ /* Iterate all SERDES links and check which one was changed */
+ for (l = 0; l < NIC_MAC_LANES; l++) {
+ if (!(link_sts_change & BIT(l)))
+ continue;
+
+ /* calc port offset from current link
+ * (2 ports per macro and 2 links per port)
+ */
+ port_offset = l >> 1;
+ port_shift = port_offset ? 2 : 0;
+
+ /* get the port struct to handle its link according to the
+ * current SERDES link index
+ */
+ gaudi2_port_curr = &gaudi2->cn_ports[cn_m * 2 + port_offset];
+ cn_port_curr = gaudi2_port_curr->cn_port;
+
+ mutex_lock(&cn_port_curr->control_lock);
+
+ /* Skip in case the port is closed because the port_close method took care of
+ * disabling the carrier and stopping the queue.
+ */
+ if (!hbl_cn_is_port_open(cn_port_curr)) {
+ mutex_unlock(&cn_port_curr->control_lock);
+ continue;
+ }
+
+ link_status_mask = gaudi2_cn_get_link_status_mask(gaudi2_port_curr);
+ link_up = (curr_link_sts >> port_shift) & link_status_mask;
+ prev_link_up = cn_port_curr->pcs_link;
+
+ if (prev_link_up != link_up && !link_up) {
+ mutex_lock(&gaudi2_port_curr->qp_destroy_lock);
+
+ if (gaudi2_port_curr->qp_destroy_cnt && !cn_port_curr->mac_loopback) {
+ cn_port_curr->mac_loopback = true;
+ gaudi2_cn_hw_mac_loopback_cfg(gaudi2_port_curr);
+ gaudi2_port_curr->qp_destroy_mac_lpbk = true;
+ }
+
+ mutex_unlock(&gaudi2_port_curr->qp_destroy_lock);
+ }
+
+ /* Record the current link status that we got.
+ * In case it is UP and the PHY is not ready, we don't want to actually set it and
+ * reflect it to the user - this will be done later once the PHY is ready.
+ */
+ cn_port_curr->eq_pcs_link = link_up;
+
+ /* In case of link DOWN, set the actual link and reflect it to the user */
+ if (!link_up)
+ cn_port_curr->pcs_link = false;
+
+ /* Set the actual link status that is reflected to the user and print it in case
+ * either we don't have PHY or we have PHY and it's ready.
+ */
+ if (!hdev->phy_config_fw || cn_port_curr->phy_fw_tuned) {
+ cn_port_curr->pcs_link = link_up;
+ hbl_cn_phy_set_port_status(cn_port_curr, link_up);
+ }
+
+ mutex_unlock(&cn_port_curr->control_lock);
+ }
+}
+
+static void gaudi2_cn_eq_dispatcher_default_handler(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 event_type, port, synd;
+ struct hbl_cn_eqe eqe;
+
+ port = cn_port->port;
+
+ mutex_lock(&cn_port->control_lock);
+
+ while (!hbl_cn_eq_dispatcher_dequeue(cn_port, hdev->kernel_asid, &eqe, true)) {
+ if (!EQE_IS_VALID(&eqe)) {
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d got invalid EQE on default queue!\n", port);
+ continue;
+ }
+
+ event_type = EQE_TYPE(&eqe);
+
+ switch (event_type) {
+ case EQE_COMP:
+ dev_warn_ratelimited(hdev->dev, "Port-%d comp event for invalid CQ:%d\n",
+ port, EQE_CQ_EVENT_CQ_NUM(&eqe));
+ break;
+ case EQE_RAW_TX_COMP:
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d raw-tx-comp event for invalid QP:%d\n",
+ port, EQE_RAW_TX_EVENT_QPN(&eqe));
+ break;
+ case EQE_QP_ERR:
+ synd = EQE_QP_EVENT_ERR_SYND(&eqe);
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d qp-err event: %d, %s, for invalid QP:%d\n",
+ port, synd, gaudi2_cn_qp_err_syndrome_to_str(synd),
+ EQE_QP_EVENT_QPN(&eqe));
+ break;
+ case EQE_COMP_ERR:
+ dev_warn_ratelimited(hdev->dev, "Port-%d cq-err event for invalid CQ:%d\n",
+ port, EQE_CQ_EVENT_CQ_NUM(&eqe));
+ break;
+ case EQE_DB_FIFO_OVERRUN:
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d db-fifo overrun event for invalid DB:%d\n",
+ port, EQE_DB_EVENT_DB_NUM(&eqe));
+ break;
+ case EQE_CONG:
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d congestion event for invalid CCQ:%d\n",
+ port, EQE_CQ_EVENT_CCQ_NUM(&eqe));
+ break;
+ case EQE_CONG_ERR:
+ /* congestion error due to a known CC CQ HW bug */
+ cn_port->cong_q_err_cnt++;
+ dev_dbg_ratelimited(hdev->dev, "Port-%d congestion error event\n", port);
+ break;
+ case EQE_QP_ALIGN_COUNTERS:
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d QP align counters event, for invalid QP:%d\n",
+ port, EQE_SW_EVENT_QPN(&eqe));
+ break;
+ default:
+ dev_warn_ratelimited(hdev->dev, "Port-%d unsupported event type: %d\n",
+ port, event_type);
+ }
+ }
+
+ mutex_unlock(&cn_port->control_lock);
+}
+
+static void cn_eq_handler(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 event_type, port, synd, qpn;
+ bool qp_retry_handled = false;
+ struct hbl_cn_ring *eq_ring;
+ struct hbl_cn_eqe *eqe_p;
+ int rc;
+
+ eq_ring = &gaudi2_port->eq_ring;
+ port = cn_port->port;
+
+ /* Read the producer index from HW once. New events received
+ * after this single read will be handled in the next callback.
+ */
+ eq_ring->pi_shadow = *((u32 *)RING_PI_ADDRESS(eq_ring));
+
+ while (eq_ring->ci_shadow != eq_ring->pi_shadow) {
+ eqe_p = (struct hbl_cn_eqe *)RING_BUF_ADDRESS(eq_ring) +
+ (eq_ring->ci_shadow & (eq_ring->count - 1));
+ if (!EQE_IS_VALID(eqe_p)) {
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d got invalid EQE on EQ (eq.data[0] 0x%x, ci 0x%x, pi 0x%x)\n",
+ port, eqe_p->data[0], eq_ring->ci_shadow,
+ eq_ring->pi_shadow);
+ } else {
+ event_type = EQE_TYPE(eqe_p);
+
+ if (event_type == EQE_QP_ERR) {
+ synd = EQE_QP_EVENT_ERR_SYND(eqe_p);
+ if (gaudi2_port->adaptive_timeout_en && synd ==
+ NIC_QP_ERR_RETRY_SYNDROME) {
+ qpn = EQE_RAW_TX_EVENT_QPN(eqe_p);
+ if (qpn != RAW_QPN)
+ qp_retry_handled =
+ gaudi2_handle_qp_error_retry(cn_port, qpn);
+ }
+ }
+
+ /* In case this is a link event, we handle it now and the dispatcher won't be
+ * involved.
+ */
+ if (event_type == EQE_LINK_STATUS) {
+ gaudi2_cn_link_event_handler(gaudi2_port);
+ /* ignore CQ errors when CQ is in overrun, as CQ overflow errors are
+ * expected.
+ */
+ } else if (!qp_retry_handled && ((event_type != EQE_COMP_ERR) ||
+ !gaudi2_cn_is_cq_in_overrun(cn_port, EQE_CQ_EVENT_CQ_NUM(eqe_p)))) {
+ rc = hbl_cn_eq_dispatcher_enqueue(cn_port, eqe_p);
+ if (rc)
+ dev_warn_ratelimited(hdev->dev,
+ "failed to dispatch event %d, err %d\n",
+ event_type, rc);
+ }
+
+ /* Mark the EQ entry as invalid */
+ EQE_SET_INVALID(eqe_p);
+ }
+
+ eq_ring->rep_idx++;
+ eq_ring->ci_shadow = (eq_ring->ci_shadow + 1) & EQ_IDX_MASK;
+
+ /* Update the HW consumer index, every quarter ring, with an
+ * absolute value (ci_shadow is a wrap-around value).
+ * Use the read producer index value for that.
+ */
+ if (eq_ring->rep_idx > (eq_ring->count / 4) - 1) {
+ eq_ring->rep_idx = 0;
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_CONSUMER_INDEX, eq_ring->ci_shadow);
+ }
+ }
+
+ if (!qp_retry_handled) {
+ hbl_cn_eq_handler(cn_port);
+
+ /* Handle unknown resources and events */
+ gaudi2_cn_eq_dispatcher_default_handler(gaudi2_port);
+ }
+}
+
+static inline void gaudi2_cn_eq_clr_interrupts(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = gaudi2_port->cn_port->port;
+
+ /* Release the HW to allow more EQ interrupts.
+ * No need for interrupt masking - as long as the SW hasn't set the clear reg,
+ * new interrupts won't be raised.
+ */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_CLR, 0x200);
+
+ /* flush write so the interrupt will be cleared as soon as possible */
+ NIC_RREG32(NIC0_QPC0_INTERRUPT_CLR);
+}
+
+static void gaudi2_cn_eq_work(struct work_struct *work)
+{
+ struct gaudi2_cn_port *gaudi2_port = container_of(work, struct gaudi2_cn_port,
+ eq_work.work);
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+
+ cn_eq_handler(gaudi2_port);
+
+ if (hdev->poll_enable)
+ schedule_delayed_work(&gaudi2_port->eq_work, msecs_to_jiffies(1));
+ else
+ gaudi2_cn_eq_clr_interrupts(gaudi2_port);
+}
+
+/* Use this routine when working with real HW */
+static irqreturn_t gaudi2_cn_eq_threaded_isr(int irq, void *arg)
+{
+ struct gaudi2_cn_port *gaudi2_port = arg;
+
+ gaudi2_cn_eq_clr_interrupts(gaudi2_port);
+ cn_eq_handler(gaudi2_port);
+
+ return IRQ_HANDLED;
+}
+
+static void gaudi2_cn_eq_hw_config(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_ring *ring = &gaudi2_port->eq_ring;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = gaudi2_port->cn_port->port;
+
+ WARN_ON_CACHE_UNALIGNED(RING_PI_DMA_ADDRESS(ring));
+ WARN_ON_CACHE_UNALIGNED(RING_BUF_DMA_ADDRESS(ring));
+
+ /* set base address for event queue */
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_PI_ADDR_63_32, upper_32_bits(RING_PI_DMA_ADDRESS(ring)));
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_PI_ADDR_31_7,
+ lower_32_bits(RING_PI_DMA_ADDRESS(ring)) >> 7);
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_BASE_ADDR_63_32,
+ upper_32_bits(RING_BUF_DMA_ADDRESS(ring)));
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_BASE_ADDR_31_7,
+ lower_32_bits(RING_BUF_DMA_ADDRESS(ring)) >> 7);
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_LOG_SIZE, ilog2(ring->count));
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_WRITE_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_PRODUCER_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_CONSUMER_INDEX, 0);
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_CONSUMER_INDEX_CB, 0);
+
+ NIC_WREG32(NIC0_QPC0_EVENT_QUE_CFG, NIC0_QPC0_EVENT_QUE_CFG_INTERRUPT_PER_EQE_MASK |
+ NIC0_QPC0_EVENT_QUE_CFG_WRITE_PI_EN_MASK | NIC0_QPC0_EVENT_QUE_CFG_ENABLE_MASK);
+
+ NIC_WREG32(NIC0_QPC0_AXUSER_EV_QUE_LBW_INTR_HB_WR_OVRD_LO, 0xFFFFFBFF);
+ NIC_WREG32(NIC0_QPC0_AXUSER_EV_QUE_LBW_INTR_HB_RD_OVRD_LO, 0xFFFFFBFF);
+
+ /* reset SW indices */
+ *((u32 *)RING_PI_ADDRESS(ring)) = 0;
+ ring->pi_shadow = 0;
+ ring->ci_shadow = 0;
+ ring->rep_idx = 0;
+}
+
+static void gaudi2_cn_eq_interrupts_enable_conditionally(struct gaudi2_cn_port *gaudi2_port,
+ bool poll_enable)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port, sob_id;
+ struct gaudi2_cn_device *gaudi2;
+
+ gaudi2 = hdev->asic_specific;
+
+ if (poll_enable) {
+ /* Masking all QPC Interrupts except EQ wire int */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_MASK, 0x3FF);
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_EN,
+ NIC0_QPC0_INTERRUPT_EN_INTERRUPT10_WIRE_EN_MASK);
+ } else {
+ sob_id = gaudi2->sob_id_base + port;
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_BASE_9,
+ DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_DATA_9, gaudi2->sob_inc_cfg_val);
+
+ /* Masking all QPC Interrupts except EQ int and error event queue int */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_MASK, 0x1FF);
+
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_EN,
+ NIC0_QPC0_INTERRUPT_EN_INTERRUPT9_MSI_EN_MASK |
+ NIC0_QPC0_INTERRUPT_EN_INTERRUPT10_WIRE_EN_MASK);
+ }
+
+ /* flush */
+ NIC_RREG32(NIC0_QPC0_INTERRUPT_EN);
+}
+
+static void gaudi2_cn_eq_interrupts_disable(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+ u32 port = cn_port->port;
+
+ /* disabling and masking all QPC Interrupts */
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_EN, 0);
+ NIC_WREG32(NIC0_QPC0_INTERRUPT_MASK, 0x7FF);
+
+ /* flush */
+ NIC_RREG32(NIC0_QPC0_INTERRUPT_EN);
+}
+
+void gaudi2_cn_eq_enter_temporal_polling_mode(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int i;
+
+ if (hdev->poll_enable)
+ return;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->cn_ports[i];
+ gaudi2_cn_eq_interrupts_enable_conditionally(gaudi2_port, true);
+ }
+
+ hdev->poll_enable = true;
+
+ /* wait for ISRs to complete before scheduling the polling work */
+ gaudi2_cn_eq_sync_irqs(hdev);
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->cn_ports[i];
+ schedule_delayed_work(&gaudi2_port->eq_work, msecs_to_jiffies(1));
+ }
+}
+
+void gaudi2_cn_eq_exit_temporal_polling_mode(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int i;
+
+ if (!hdev->poll_enable)
+ return;
+
+ if (!gaudi2->temporal_polling)
+ return;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->cn_ports[i];
+ cancel_delayed_work_sync(&gaudi2_port->eq_work);
+ }
+
+ hdev->poll_enable = false;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->cn_ports[i];
+ gaudi2_cn_eq_interrupts_enable_conditionally(gaudi2_port, false);
+ /* Schedule the work as interrupts may be pending but not acked, thus preventing
+ * new interrupts from triggering.
+ * Double scheduling of the work (from the ISR and from here) is avoided
+ * by the WQ scheduler itself.
+ */
+ schedule_delayed_work(&gaudi2_port->eq_work, 0);
+ }
+}
+
+static int gaudi2_cn_eq_port_init(struct gaudi2_cn_port *gaudi2_port)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+
+ gaudi2_cn_eq_hw_config(gaudi2_port);
+
+ /* We disable the EQ handler here in order to prevent a crash if a race occurs
+ * between the work-queue calling the handler routine and the Eth driver
+ * unregistering it.
+ */
+ gaudi2_port->cn_port->eq_handler_enable = false;
+
+ INIT_DELAYED_WORK(&gaudi2_port->eq_work, gaudi2_cn_eq_work);
+
+ gaudi2_cn_eq_interrupts_enable_conditionally(gaudi2_port, hdev->poll_enable);
+
+ if (hdev->poll_enable)
+ schedule_delayed_work(&gaudi2_port->eq_work, msecs_to_jiffies(1));
+
+ return 0;
+}
+
+static void gaudi2_cn_eq_port_fini(struct gaudi2_cn_port *gaudi2_port)
+{
+ gaudi2_cn_eq_interrupts_disable(gaudi2_port);
+ cancel_delayed_work_sync(&gaudi2_port->eq_work);
+ gaudi2_port->cn_port->eq_handler_enable = false;
+}
+
+int gaudi2_cn_eq_init(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_port *gaudi2_port;
+ int rc, i, port_cnt = 0;
+ u32 port;
+
+ /* Need to reset the value of 'poll_enable' for the case where we entered temporal
+ * polling mode but didn't exit it (e.g. during a failing soft-reset).
+ * The original value is actually the inverse of 'temporal_polling', which is set once
+ * in sw_init and is constant.
+ */
+ hdev->poll_enable = !gaudi2->temporal_polling;
+
+ /* Due to an H/W bug in Gaudi2, link events for both the even and odd ports arrive
+ * only on the odd port in the macro. Therefore, we need to initialize the EQs of all
+ * ports regardless of their enablement.
+ */
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++, port_cnt++) {
+ gaudi2_port = &gaudi2->cn_ports[i];
+ port = gaudi2_port->cn_port->port;
+
+ rc = gaudi2_cn_eq_port_init(gaudi2_port);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init the hardware EQ, port: %d, %d\n", port,
+ rc);
+ goto err;
+ }
+ }
+
+ return 0;
+
+err:
+ for (i = 0; i < port_cnt; i++)
+ gaudi2_cn_eq_port_fini(&gaudi2->cn_ports[i]);
+
+ return rc;
+}
+
+void gaudi2_cn_eq_fini(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++)
+ gaudi2_cn_eq_port_fini(&gaudi2->cn_ports[i]);
+}
+
+/* event dispatcher
+ *
+ * In Gaudi2 each port has a single EQ to which the HW writes all the related
+ * events. Since multiple applications can use the port at the same time, we
+ * need a way to dispatch the app-related events to the correct application;
+ * these events will be read later on via the IB API.
+ */
+
+struct hbl_cn_ev_dq *gaudi2_cn_eq_dispatcher_select_dq(struct hbl_cn_port *cn_port,
+ const struct hbl_cn_eqe *eqe)
+{
+ struct gaudi2_cn_port *gaudi2_port = (struct gaudi2_cn_port *)cn_port->cn_specific;
+ struct hbl_cn_ev_dqs *ev_dqs = &cn_port->ev_dqs;
+ struct hbl_cn_ev_dq *dq = NULL;
+ u32 event_type = EQE_TYPE(eqe);
+ u32 cqn, qpn, dbn, ccqn;
+
+ switch (event_type) {
+ case EQE_COMP:
+ fallthrough;
+ case EQE_COMP_ERR:
+ cqn = EQE_CQ_EVENT_CQ_NUM(eqe);
+ dq = hbl_cn_cqn_to_dq(ev_dqs, cqn, gaudi2_port->hdev);
+ break;
+ case EQE_QP_ERR:
+ qpn = EQE_QP_EVENT_QPN(eqe);
+ dq = hbl_cn_qpn_to_dq(ev_dqs, qpn);
+ break;
+ case EQE_RAW_TX_COMP:
+ qpn = EQE_RAW_TX_EVENT_QPN(eqe);
+ dq = hbl_cn_qpn_to_dq(ev_dqs, qpn);
+ break;
+ case EQE_DB_FIFO_OVERRUN:
+ dbn = EQE_DB_EVENT_DB_NUM(eqe);
+ dq = hbl_cn_dbn_to_dq(ev_dqs, dbn, gaudi2_port->hdev);
+ break;
+ case EQE_CONG:
+ ccqn = EQE_CQ_EVENT_CCQ_NUM(eqe);
+ dq = hbl_cn_ccqn_to_dq(ev_dqs, ccqn, gaudi2_port->hdev);
+ break;
+ case EQE_QP_ALIGN_COUNTERS:
+ qpn = EQE_SW_EVENT_QPN(eqe);
+ dq = hbl_cn_qpn_to_dq(ev_dqs, qpn);
+ break;
+ case EQE_CONG_ERR:
+ fallthrough;
+ case EQE_RESERVED:
+ fallthrough;
+ default:
+ dq = &ev_dqs->default_edq;
+ }
+
+ /* Unknown resources and events should be handled by default events
+ * dispatch queue.
+ */
+ return IS_ERR_OR_NULL(dq) ? &ev_dqs->default_edq : dq;
+}
+
+int gaudi2_cn_eq_dispatcher_register_db(struct gaudi2_cn_port *gaudi2_port, u32 asid, u32 dbn)
+{
+ struct hbl_cn_device *hdev = gaudi2_port->hdev;
+
+ if (dbn == GAUDI2_DB_FIFO_PRIVILEGE_HW_ID)
+ return -EINVAL;
+
+ if (asid != hdev->kernel_asid && dbn == GAUDI2_DB_FIFO_SECURE_HW_ID)
+ return -EINVAL;
+
+ return hbl_cn_eq_dispatcher_register_db(gaudi2_port->cn_port, asid, dbn);
+}
diff --git a/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c
new file mode 100644
index 000000000000..2837fc11c43a
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c
@@ -0,0 +1,2743 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/firmware.h>
+
+#include "gaudi2_cn.h"
+
+#define NIC_PHY_CFG_SIZE (NIC0_SERDES1_LANE0_REGISTER_0P00 - NIC0_SERDES0_LANE0_REGISTER_0P00)
+
+#define NIC_PHY_CFG_BASE(port) \
+ ({ \
+ u32 __port = (port); \
+ ((u64)(NIC_MACRO_CFG_BASE(__port) + \
+ NIC_PHY_CFG_SIZE * (u64)((__port) & 1))); \
+ })
+
+#define LANE_LO_OFF (NIC0_SERDES0_LANE1_REGISTER_0P00 - NIC0_SERDES0_LANE0_REGISTER_0P00)
+
+#define LANE_HI_OFF (NIC0_SERDES0_LANE1_REGISTER_AI00 - NIC0_SERDES0_LANE0_REGISTER_AI00)
+
+#define LANE_OFF(reg, lane) \
+ ({ \
+ u32 __lane = lane; \
+ ((reg) < NIC0_SERDES0_LANE0_REGISTER_AI00) ? \
+ ((__lane) * LANE_LO_OFF) : ((__lane) * LANE_HI_OFF); \
+ })
+
+#define PHY_PRINT(port, lane, op, val, reg) \
+ ({ \
+ if (hdev->phy_regs_print) { \
+ u32 __port = (port); \
+ dev_info(hdev->dev, "[%s],Nic,%u,Port,%u,Lane,%d,%s,0x%08x,0x%08llx\n", \
+ __func__, __port >> 1, __port & 0x1, (lane), (op), (val), (reg)); \
+ usleep_range(1000, 2000); \
+ } \
+ })
+
+#define NIC_PHY_RREG32(reg) \
+ ({ \
+ u32 _port = port; \
+ u64 _reg = NIC_PHY_CFG_BASE(_port) + (reg); \
+ u32 _val = RREG32(_reg); \
+ PHY_PRINT(_port, -1, "read", _val, _reg); \
+ _val; \
+ })
+
+#define NIC_PHY_WREG32(reg, val) \
+ do { \
+ u32 _port = port; \
+ u64 _reg = NIC_PHY_CFG_BASE(_port) + (reg); \
+ u32 _val = (val); \
+ WREG32(_reg, _val); \
+ PHY_PRINT(_port, -1, "write", _val, _reg); \
+ } while (0)
+
+#define NIC_PHY_RMWREG32(reg, val, mask) \
+ do { \
+ u32 _port = port; \
+ u64 _reg = NIC_PHY_CFG_BASE(_port) + (reg); \
+ u32 _val = (val); \
+ u32 _mask = (mask); \
+ u32 _tmp = RREG32(_reg); \
+ PHY_PRINT(_port, -1, "read(rmw)", _tmp, _reg); \
+ _tmp &= ~_mask; \
+ _tmp |= (_val << __ffs(_mask)); \
+ WREG32(_reg, _tmp); \
+ PHY_PRINT(_port, -1, "write(rmw)", _tmp, _reg); \
+ } while (0)
+
+#define NIC_PHY_RREG32_LANE(reg) \
+ ({ \
+ u32 _port = port; \
+ u32 _lane = lane; \
+ u64 _reg = (reg); \
+ u64 __reg = NIC_PHY_CFG_BASE(_port) + _reg + LANE_OFF(_reg, _lane); \
+ u32 _val = RREG32(__reg); \
+ PHY_PRINT(_port, _lane, "read", _val, __reg); \
+ _val; \
+ })
+
+#define NIC_PHY_WREG32_LANE(reg, val) \
+ do { \
+ u32 _port = port; \
+ u32 _lane = lane; \
+ u64 _reg = (reg); \
+ u64 __reg = NIC_PHY_CFG_BASE(_port) + _reg + LANE_OFF(_reg, _lane); \
+ u32 _val = (val); \
+ WREG32(__reg, _val); \
+ PHY_PRINT(_port, _lane, "write", _val, __reg); \
+ } while (0)
+
+#define NIC_PHY_RMWREG32_LANE(reg, val, mask) \
+ do { \
+ u32 _port = port; \
+ u32 _lane = lane; \
+ u64 _reg = (reg); \
+ u64 __reg = NIC_PHY_CFG_BASE(_port) + _reg + LANE_OFF(_reg, _lane); \
+ u32 _val = (val); \
+ u32 _mask = (mask); \
+ u32 _tmp = RREG32(__reg); \
+ PHY_PRINT(_port, _lane, "read(rmw)", _tmp, __reg); \
+ _tmp &= ~_mask; \
+ _tmp |= (_val << __ffs(_mask)); \
+ WREG32(__reg, _tmp); \
+ PHY_PRINT(_port, _lane, "write(rmw)", _tmp, __reg); \
+ } while (0)
+
+#define NIC_PHY_READ_COUNTS_PER_MS 100000
+#define NIC_PHY_FW_TIME_CONSTANT_RATIO 64
+#define NIC_PHY_FW_TUNING_INTERVAL_MS 100
+#define NIC_PHY_FW_TUNING_TIMEOUT_MS (30 * MSEC_PER_SEC) /* 30 seconds */
+#define NIC_PHY_PAM4_BER_FACTOR 53125000
+#define NIC_PHY_NRZ_BER_FACTOR 25781250
+
+#define NIC_PHY_TX_POL_MASK_HL225 0xF00000000430
+#define NIC_PHY_RX_POL_MASK_HL225 0x0FFFFFFFFBCF
+#define NIC_PHY_TX_POL_MASK_HLS2 0x0
+#define NIC_PHY_RX_POL_MASK_HLS2 0x0
+
+#define NIC_PHY_PCS_LINK_DOWN_TH_S 5
+#define NIC_PHY_MAC_REMOTE_FAULT_TH_S 10
+
+#define NIC_PHY_PCS_SETTLING_WAIT_MS (5 * MSEC_PER_SEC)
+#define NIC_PHY_PCS_STRESS_INT_MS 10
+#define NIC_PHY_PCS_STEADY_STATE_INT_MS (1 * MSEC_PER_SEC)
+
+#define NIC_PHY_PCS_TESTING_WINDOW_S 20
+#define NIC_PHY_PCS_TESTING_WINDOW_MS (NIC_PHY_PCS_TESTING_WINDOW_S * MSEC_PER_SEC)
+#define NIC_PHY_PCS_STRESS_WINDOW_MS \
+ (NIC_PHY_PCS_TESTING_WINDOW_MS - NIC_PHY_PCS_SETTLING_WAIT_MS)
+
+#define NIC_PHY_PCS_MAX_LINK_TOGGLES 5
+
+enum tx_taps_sets {
+ NIC_PHY_TX_TAPS_SET_1 = 0,
+ NIC_PHY_TX_TAPS_SET_2,
+ NIC_PHY_TX_TAPS_SET_3,
+ NIC_PHY_TX_TAPS_SET_4,
+
+ NIC_PHY_TX_TAPS_NUM_SETS
+};
+
+#define NIC_PHY_DEFAULT_TX_TAPS_DEFAULT NIC_PHY_TX_TAPS_SET_1
+
+struct hbl_cn_tx_taps tx_taps_set_array[NIC_PHY_TX_TAPS_NUM_SETS] = {
+ {.pam4_taps = {2, -10, 23, 0, 0}, .nrz_taps = {0, -10, 26, 0, 0}},
+ {.pam4_taps = {0, -6, 22, 0, 0}, .nrz_taps = {0, -10, 26, 0, 0}},
+ {.pam4_taps = {3, -12, 21, 0, 0}, .nrz_taps = {0, -10, 26, 0, 0}},
+ {.pam4_taps = {1, -7, 18, 0, 0}, .nrz_taps = {0, -10, 26, 0, 0}},
+};
+
+static enum tx_taps_sets tx_taps_cfg_array[][2] = {
+ {NIC_PHY_TX_TAPS_SET_1, NIC_PHY_TX_TAPS_SET_1},
+ {NIC_PHY_TX_TAPS_SET_1, NIC_PHY_TX_TAPS_SET_2},
+ {NIC_PHY_TX_TAPS_SET_2, NIC_PHY_TX_TAPS_SET_1},
+ {NIC_PHY_TX_TAPS_SET_2, NIC_PHY_TX_TAPS_SET_2},
+ {NIC_PHY_TX_TAPS_SET_1, NIC_PHY_TX_TAPS_SET_3},
+ {NIC_PHY_TX_TAPS_SET_3, NIC_PHY_TX_TAPS_SET_1},
+ {NIC_PHY_TX_TAPS_SET_3, NIC_PHY_TX_TAPS_SET_3},
+ {NIC_PHY_TX_TAPS_SET_1, NIC_PHY_TX_TAPS_SET_4},
+ {NIC_PHY_TX_TAPS_SET_4, NIC_PHY_TX_TAPS_SET_1},
+ {NIC_PHY_TX_TAPS_SET_4, NIC_PHY_TX_TAPS_SET_4}
+};
+
+static size_t tx_taps_num_cfgs = ARRAY_SIZE(tx_taps_cfg_array);
+
+#define NIC_MAC_LANE_MAP(lane_0, lane_1, lane_2, lane_3) \
+ (((lane_0) & \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES0_TX0_SWAP_ID_MASK) | \
+ (((lane_1) & \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES0_TX0_SWAP_ID_MASK) << \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES0_TX1_SWAP_ID_SHIFT) | \
+ (((lane_2) & \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES0_TX0_SWAP_ID_MASK) << \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES1_TX0_SWAP_ID_SHIFT) | \
+ (((lane_3) & \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES0_TX0_SWAP_ID_MASK) << \
+ NIC0_PHY_PHY_ASYNC_LANE_SWAP_SERDES1_TX1_SWAP_ID_SHIFT))
+
+/* Lane map for HL-225 */
+static u32 default_cn_mac_lane_remap[] = {
+ /* MACRO 0 */
+ NIC_MAC_LANE_MAP(NIC_MAC_LANE_3, NIC_MAC_LANE_1, NIC_MAC_LANE_0, NIC_MAC_LANE_2),
+ /* MACRO 1-10. Use default HW power on reset mapping. */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* MACRO 11 */
+ NIC_MAC_LANE_MAP(NIC_MAC_LANE_1, NIC_MAC_LANE_0, NIC_MAC_LANE_3, NIC_MAC_LANE_2),
+};
+
+/* Firmware lane mappings per macro are encoded as nibbles,
+ * e.g. 0x3210 maps to lanes 3/2/1/0.
+ */
+#define FW_PARSE_LANE_MAP(macro, lane) \
+ ({ \
+ u32 _lane = (lane); \
+ ((macro) & (0xf << (_lane * 4))) >> (_lane * 4); \
+ })
+
+enum lane_state {
+ READY,
+ NOT_READY,
+ FAILURE
+};
+
+#define GAUDI2_PHY_FW_FILE "habanalabs/gaudi2/gaudi2_cn_fw.bin"
+
+MODULE_FIRMWARE(GAUDI2_PHY_FW_FILE);
+
+const char *gaudi2_cn_phy_get_fw_name(void)
+{
+ return GAUDI2_PHY_FW_FILE;
+}
+
+static int get_tx_lane_in_macro(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 lane_in_macro, lane_swap_val;
+ int tx_lane;
+
+ lane_in_macro = (port & 0x1) * 2 + lane;
+ lane_swap_val = hdev->mac_lane_remap[port >> 1];
+
+ if (!lane_swap_val)
+ return lane_in_macro;
+
+ for (tx_lane = 0; tx_lane < NIC_MAC_NUM_OF_LANES; tx_lane++) {
+ if (((lane_swap_val >> (tx_lane * 2)) & 0x3) == lane_in_macro)
+ break;
+ }
+
+ return tx_lane;
+}
+
+static void get_tx_port_and_lane(struct hbl_cn_device *hdev, u32 port, int lane, u32 *tx_port,
+ int *tx_lane)
+{
+ struct hbl_cn_port *cn_port = &hdev->cn_ports[port];
+ u32 tx_lane_in_macro, abs_tx_lane_idx;
+
+ if (!cn_port->auto_neg_enable) {
+ *tx_port = port;
+ *tx_lane = lane;
+ return;
+ }
+
+ tx_lane_in_macro = get_tx_lane_in_macro(hdev, port, lane);
+ abs_tx_lane_idx = (port >> 1) * NIC_MAC_NUM_OF_LANES + tx_lane_in_macro;
+
+ *tx_port = abs_tx_lane_idx >> 1;
+ *tx_lane = abs_tx_lane_idx & 0x1;
+}
+
+static bool is_lane_swapping(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 tx_port;
+ int tx_lane;
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ return (tx_port != port) || (tx_lane != lane);
+}
+
+static void set_fw_lane_mapping(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_cpucp_info *cpucp_info = hdev->cpucp_info;
+ u16 cpu_macro_tx_swap_map;
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_MACROS; i++) {
+ cpu_macro_tx_swap_map = cpucp_info->tx_swap_map[i];
+ hdev->mac_lane_remap[i] = NIC_MAC_LANE_MAP(FW_PARSE_LANE_MAP(cpu_macro_tx_swap_map,
+ 0), /* lane 0 */
+ FW_PARSE_LANE_MAP(cpu_macro_tx_swap_map,
+ 1), /* lane 1 */
+ FW_PARSE_LANE_MAP(cpu_macro_tx_swap_map,
+ 2), /* lane 2 */
+ FW_PARSE_LANE_MAP(cpu_macro_tx_swap_map,
+ 3)); /* lane 3 */
+ }
+}
+
+static void mac_lane_remap(struct hbl_cn_device *hdev, u32 port)
+{
+ if (hdev->mac_lane_remap[port >> 1])
+ NIC_MACRO_WREG32(NIC0_PHY_PHY_ASYNC_LANE_SWAP, hdev->mac_lane_remap[port >> 1]);
+}
+
+static void soft_reset(struct hbl_cn_device *hdev, u32 port)
+{
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0x888,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0x0,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+}
+
+static void logic_reset(struct hbl_cn_device *hdev, u32 port)
+{
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0x777,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0x0,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+}
+
+static void cpu_reset(struct hbl_cn_device *hdev, u32 port)
+{
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0xAAA,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980D, 0x0,
+ NIC0_SERDES0_REGISTER_980D_DOMAIN_RESET_MASK);
+}
+
+static int fw_cmd(struct hbl_cn_device *hdev, u32 port, u32 cmd, u32 *detail, u32 expected_res,
+ u32 *res_ptr)
+{
+ u32 res, val, checks = 0;
+
+ if (detail)
+ NIC_PHY_WREG32(NIC0_SERDES0_REGISTER_9816, *detail);
+
+ NIC_PHY_WREG32(NIC0_SERDES0_REGISTER_9815, cmd);
+
+ do {
+ usleep_range(1000, 2000);
+ res = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9815);
+ if (checks++ > NIC_PHY_READ_COUNTS_PER_MS) {
+ dev_dbg(hdev->dev, "timeout for PHY cmd 0x%x port %u\n", cmd, port);
+ return -ETIMEDOUT;
+ }
+ } while (res == cmd);
+
+ val = (res >> 8) & 0xF;
+
+ if (val != expected_res) {
+ dev_dbg(hdev->dev, "cmd 0x%x returned error 0x%x port %u\n", cmd, val, port);
+ return -EFAULT;
+ }
+
+ *res_ptr = res;
+
+ return 0;
+}
+
+static void clock_init(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 first_val, second_val;
+
+ if (port & 0x1) { /* raven 1 */
+ if (lane == 0) {
+ first_val = 0xA9E0;
+ second_val = 0x9B9E;
+ } else { /* lane 1 */
+ first_val = 0xA9E0;
+ second_val = 0x9B9E;
+ }
+ } else { /* raven 0 */
+ if (lane == 0) {
+ first_val = 0x59E0;
+ second_val = 0x9B5E;
+ } else { /* lane 1 */
+ first_val = 0xA9E0;
+ second_val = 0x9B9E;
+ }
+ }
+
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PCC, first_val);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0NF3, second_val);
+}
+
+static u32 int_to_twos(s32 val, u8 bitwidth)
+{
+ return val < 0 ? (1 << bitwidth) + val : val;
+}
+
+static int twos_to_int(unsigned int val, u8 bitwidth)
+{
+ u32 mask = 1 << (bitwidth - 1);
+
+ return -(val & mask) + (val & ~mask);
+}
+
+static void set_tx_taps(struct hbl_cn_device *hdev, u32 port, int lane, s32 tx_pre2, s32 tx_pre1,
+ s32 tx_main, s32 tx_post1, s32 tx_post2, bool pam4)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA5, int_to_twos(tx_pre2, 8),
+ NIC0_SERDES0_LANE0_REGISTER_0PA5_TX_PRE_2_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA7, int_to_twos(tx_pre1, 8),
+ NIC0_SERDES0_LANE0_REGISTER_0PA7_TX_PRE_1_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA9, int_to_twos(tx_main, 8),
+ NIC0_SERDES0_LANE0_REGISTER_0PA9_TX_MAIN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAB, int_to_twos(tx_post1, 8),
+ NIC0_SERDES0_LANE0_REGISTER_0PAB_TX_POST_1_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAD, int_to_twos(tx_post2, 8),
+ NIC0_SERDES0_LANE0_REGISTER_0PAD_TX_POST_2_MASK);
+
+ dev_dbg(hdev->dev, "Card %u Port %u lane %d: set %s tx taps [%d,%d,%d,%d,%d]\n",
+ hdev->card_location, port, lane, pam4 ? "PAM4" : "NRZ", tx_pre2, tx_pre1, tx_main,
+ tx_post1, tx_post2);
+}
+
+static void set_tx_taps_cfg(struct hbl_cn_device *hdev, u32 port, int lane, u8 cfg, bool pam4,
+ bool reset_taps)
+{
+ enum tx_taps_sets set_id;
+ u32 abs_lane_idx;
+ s32 *taps;
+
+ set_id = tx_taps_cfg_array[cfg][lane];
+ abs_lane_idx = (port << 1) + lane;
+
+ if (pam4) {
+ taps = hdev->phy_tx_taps[abs_lane_idx].pam4_taps;
+ memcpy(taps, tx_taps_set_array[set_id].pam4_taps,
+ sizeof(hdev->phy_tx_taps[abs_lane_idx].pam4_taps));
+ } else {
+ taps = hdev->phy_tx_taps[abs_lane_idx].nrz_taps;
+ memcpy(taps, tx_taps_set_array[set_id].nrz_taps,
+ sizeof(hdev->phy_tx_taps[abs_lane_idx].nrz_taps));
+ }
+
+ if (reset_taps) {
+ /* Here we first reset the Tx taps (setting all to zero) in order to force link
+ * down on the remote port, so it will have a "fresh start" when setting the next
+ * Tx taps set.
+ */
+ set_tx_taps(hdev, port, lane, 0, 0, 0, 0, 0, pam4);
+ msleep(100);
+ }
+
+ set_tx_taps(hdev, port, lane, taps[0], taps[1], taps[2], taps[3], taps[4], pam4);
+}
+
+static u8 get_curr_tx_taps_cfg(struct hbl_cn_device *hdev, u32 port)
+{
+ struct gaudi2_cn_port *gaudi2_port = hdev->cn_ports[port].cn_specific;
+
+ return gaudi2_port->tx_taps_cfg;
+}
+
+static void init_pam4_tx(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA1, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA2, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA3, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA4, 0x0);
+ /* data quiet */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x6320);
+ /* auto symmetric, scale */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, 0xFAC9);
+ /* data, prbs */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x4000);
+ /* cursor -2 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA5, 0x100);
+ /* cursor -1 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA7, 0xF900);
+ /* cursor main */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA9, 0x1700);
+ /* cursor +1 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAB, 0x0);
+ /* cursor +2 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAD, 0x0);
+}
+
+static void init_pam4_rx(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ /* ac-couple always */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PF8, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PF8_AC_COUPLE_EN_MASK);
+}
+
+static void set_lane_mode_tx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4)
+{
+ if (pam4) {
+ /* Disable NRZ mode */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_MODE_MASK);
+ /* Disable NRZ PRBS Generator */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_PRBS_GEN_EN_MASK);
+ /* Enable PAM4 PRBS Generator */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_CLK_EN_MASK);
+ } else {
+ /* Disable PAM4 PRBS Generator */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_CLK_EN_MASK);
+ /* Enable NRZ mode */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_MODE_MASK);
+ /* Enable NRZ PRBS Generator */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_PRBS_GEN_EN_MASK);
+ }
+}
+
+static void set_lane_mode_rx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4)
+{
+ if (pam4)
+ /* Enable PAM4 mode */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P41, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0P41_PAM4_EN_MASK);
+ else
+ /* Disable PAM4 mode */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P41, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0P41_PAM4_EN_MASK);
+}
+
+static void prbs_mode_select_tx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4,
+ char *mode)
+{
+ u32 val;
+
+ if (!mode || strncmp(mode, "PRBS", strlen("PRBS")))
+ return;
+
+ if (pam4) {
+ if (!strncmp(mode, "PRBS9", strlen("PRBS9")))
+ val = 0;
+ else if (!strncmp(mode, "PRBS13", strlen("PRBS13")))
+ val = 1;
+ else if (!strncmp(mode, "PRBS15", strlen("PRBS15")))
+ val = 2;
+ else /* PRBS31 */
+ val = 3;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_MODE_MASK);
+ } else {
+ if (!strncmp(mode, "PRBS9", strlen("PRBS9")))
+ val = 0;
+ else if (!strncmp(mode, "PRBS15", strlen("PRBS15")))
+ val = 1;
+ else if (!strncmp(mode, "PRBS23", strlen("PRBS23")))
+ val = 2;
+ else /* PRBS31 */
+ val = 3;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_MODE_MASK);
+ }
+
+ val = pam4 ? 0 : 1;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_PRBS_CLK_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_PRBS_GEN_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_NRZ_MODE_MASK);
+
+ if (pam4)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PB0_TX_HALF_RATE_EN_MASK);
+
+ val = pam4 ? 1 : 0;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_TEST_DATA_SRC_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_CLK_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PAM4_TEST_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, val,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_GEN_EN_MASK);
+}
+
+static void prbs_mode_select_rx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4,
+ char *mode)
+{
+ u32 val;
+
+ if (!mode || strncmp(mode, "PRBS", strlen("PRBS")))
+ return;
+
+ if (pam4) {
+ if (!strncmp(mode, "PRBS9", strlen("PRBS9")))
+ val = 0;
+ else if (!strncmp(mode, "PRBS13", strlen("PRBS13")))
+ val = 1;
+ else if (!strncmp(mode, "PRBS15", strlen("PRBS15")))
+ val = 2;
+ else /* PRBS31 */
+ val = 3;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, val,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_PRBS_MODE_SEL_MASK);
+ } else {
+ if (!strncmp(mode, "PRBS9", strlen("PRBS9")))
+ val = 0;
+ else if (!strncmp(mode, "PRBS15", strlen("PRBS15")))
+ val = 1;
+ else if (!strncmp(mode, "PRBS23", strlen("PRBS23")))
+ val = 2;
+ else /* PRBS31 */
+ val = 3;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61, val,
+ NIC0_SERDES0_LANE0_REGISTER_0N61_NRZ_PRBS_MODE_SEL_MASK);
+ }
+
+ if (pam4) {
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_PU_PRBS_CHKR_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_PU_PRBS_SYNC_CHKR_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_PRBS_AUTO_SYNC_EN_MASK);
+ } else {
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CHKR_EN_MASK);
+ }
+}
+
+static void set_default_polarity_values(struct hbl_cn_device *hdev)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ enum gaudi2_setup_type setup_type;
+ u64 pol_tx, pol_rx;
+
+ setup_type = gaudi2->setup_type;
+
+ if (hdev->skip_phy_pol_cfg)
+ return;
+
+ switch (setup_type) {
+ case GAUDI2_SETUP_TYPE_HLS2:
+ pol_tx = NIC_PHY_TX_POL_MASK_HL225 ^ NIC_PHY_TX_POL_MASK_HLS2;
+ pol_rx = NIC_PHY_RX_POL_MASK_HL225 ^ NIC_PHY_RX_POL_MASK_HLS2;
+ break;
+ default:
+ dev_err(hdev->dev, "Wrong setup type %d\n", setup_type);
+ return;
+ }
+
+ hdev->pol_tx_mask = pol_tx;
+ hdev->pol_rx_mask = pol_rx;
+}
+
+static void set_default_mac_lane_remap(struct hbl_cn_device *hdev)
+{
+ memcpy(hdev->mac_lane_remap, default_cn_mac_lane_remap,
+ sizeof(default_cn_mac_lane_remap));
+}
+
+static s32 get_pam4_tap_pre2(struct hbl_cn_device *hdev, u32 card_location, u32 abs_lane_idx)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ enum gaudi2_setup_type setup_type;
+
+ setup_type = gaudi2->setup_type;
+
+ switch (setup_type) {
+ case GAUDI2_SETUP_TYPE_HLS2:
+ return tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].pam4_taps[0];
+ default:
+ dev_err(hdev->dev, "Wrong setup type %d\n", setup_type);
+ }
+
+ return 2;
+}
+
+static s32 get_pam4_tap_pre1(struct hbl_cn_device *hdev, u32 card_location, u32 abs_lane_idx)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ enum gaudi2_setup_type setup_type;
+
+ setup_type = gaudi2->setup_type;
+
+ switch (setup_type) {
+ case GAUDI2_SETUP_TYPE_HLS2:
+ return tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].pam4_taps[1];
+ default:
+ dev_err(hdev->dev, "Wrong setup type %d\n", setup_type);
+ }
+
+ return -12;
+}
+
+static s32 get_pam4_tap_main(struct hbl_cn_device *hdev, u32 card_location, u32 abs_lane_idx)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ enum gaudi2_setup_type setup_type;
+
+ setup_type = gaudi2->setup_type;
+
+ switch (setup_type) {
+ case GAUDI2_SETUP_TYPE_HLS2:
+ return tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].pam4_taps[2];
+ default:
+ dev_err(hdev->dev, "Wrong setup type %d\n", setup_type);
+ }
+
+ return 22;
+}
+
+static s32 get_pam4_tap_post1(struct hbl_cn_device *hdev, u32 card_location, u32 abs_lane_idx)
+{
+ return tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].pam4_taps[3];
+}
+
+static s32 get_pam4_tap_post2(struct hbl_cn_device *hdev, u32 card_location, u32 abs_lane_idx)
+{
+ return tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].pam4_taps[4];
+}
+
+static void set_default_tx_taps_values(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_properties *cn_props = &hdev->cn_props;
+ u32 card_location;
+ int abs_lane_idx;
+ s32 *taps;
+
+ card_location = hdev->card_location;
+
+ for (abs_lane_idx = 0; abs_lane_idx < cn_props->max_num_of_lanes; abs_lane_idx++) {
+ /* PAM4 */
+ taps = hdev->phy_tx_taps[abs_lane_idx].pam4_taps;
+ taps[0] = get_pam4_tap_pre2(hdev, card_location, abs_lane_idx);
+ taps[1] = get_pam4_tap_pre1(hdev, card_location, abs_lane_idx);
+ taps[2] = get_pam4_tap_main(hdev, card_location, abs_lane_idx);
+ taps[3] = get_pam4_tap_post1(hdev, card_location, abs_lane_idx);
+ taps[4] = get_pam4_tap_post2(hdev, card_location, abs_lane_idx);
+
+ /* NRZ */
+ taps = hdev->phy_tx_taps[abs_lane_idx].nrz_taps;
+ taps[0] = tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].nrz_taps[0];
+ taps[1] = tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].nrz_taps[1];
+ taps[2] = tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].nrz_taps[2];
+ taps[3] = tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].nrz_taps[3];
+ taps[4] = tx_taps_set_array[NIC_PHY_DEFAULT_TX_TAPS_DEFAULT].nrz_taps[4];
+ }
+}
+
+static void set_pol_tx(struct hbl_cn_device *hdev, u32 port, int lane, u32 tx_pol)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, tx_pol,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_ANA_OUT_FLIP_MASK);
+}
+
+static void set_pol_rx(struct hbl_cn_device *hdev, u32 port, int lane, u32 rx_pol)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, rx_pol,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_DATA_FLIP_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61, rx_pol,
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CHECK_FLIP_MASK);
+}
+
+static void set_gc_tx(struct hbl_cn_device *hdev, u32 port, int lane, u32 tx_gc)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, tx_gc,
+ NIC0_SERDES0_LANE0_REGISTER_0PAF_TX_GRAYCODE_EN_MASK);
+}
+
+static void set_gc_rx(struct hbl_cn_device *hdev, u32 port, int lane, u32 rx_gc)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P42, rx_gc,
+ NIC0_SERDES0_LANE0_REGISTER_0P42_RX_GRAYCODE_EN_MASK);
+}
+
+static void set_pc_tx(struct hbl_cn_device *hdev, u32 port, int lane, u32 tx_pc)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, tx_pc,
+ NIC0_SERDES0_LANE0_REGISTER_0PAF_TX_PRECODE_EN_MASK);
+}
+
+static void set_pc_rx(struct hbl_cn_device *hdev, u32 port, int lane, u32 rx_pc)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P42, rx_pc,
+ NIC0_SERDES0_LANE0_REGISTER_0P42_RX_PRECODE_EN_MASK);
+}
+
+static void set_msblsb_tx(struct hbl_cn_device *hdev, u32 port, int lane, u32 tx_msblsb)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, tx_msblsb,
+ NIC0_SERDES0_LANE0_REGISTER_0PAF_TX_SWAP_MSB_LSB_MASK);
+}
+
+static void set_msblsb_rx(struct hbl_cn_device *hdev, u32 port, int lane, u32 rx_msblsb)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, rx_msblsb,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_SWAP_MSB_LSB_MASK);
+}
+
+static void init_lane_for_fw_tx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4,
+ bool do_lt)
+{
+ u32 abs_lane_idx, tx_pol, tx_gc, tx_msblsb;
+
+ abs_lane_idx = (port << 1) + lane;
+ tx_pol = (hdev->pol_tx_mask >> abs_lane_idx) & 1;
+
+ tx_gc = (pam4 && !do_lt) ? 1 : 0;
+ tx_msblsb = do_lt ? 1 : 0;
+
+ set_lane_mode_tx(hdev, port, lane, pam4);
+ set_gc_tx(hdev, port, lane, tx_gc);
+ set_pc_tx(hdev, port, lane, 0);
+ set_msblsb_tx(hdev, port, lane, tx_msblsb);
+ set_pol_tx(hdev, port, lane, tx_pol);
+}
+
+static void init_lane_for_fw_rx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4,
+ bool do_lt)
+{
+ u32 abs_lane_idx, rx_pol, rx_gc, rx_msblsb;
+
+ abs_lane_idx = (port << 1) + lane;
+ rx_pol = (hdev->pol_rx_mask >> abs_lane_idx) & 1;
+
+ rx_gc = (pam4 && !do_lt) ? 1 : 0;
+ rx_msblsb = do_lt ? 1 : 0;
+
+ set_lane_mode_rx(hdev, port, lane, pam4);
+ set_gc_rx(hdev, port, lane, rx_gc);
+ set_pc_rx(hdev, port, lane, 0);
+ set_msblsb_rx(hdev, port, lane, rx_msblsb);
+ set_pol_rx(hdev, port, lane, rx_pol);
+}
+
+static void set_functional_mode_lane(struct hbl_cn_device *hdev, u32 port, int lane, bool do_lt)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_CLK_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PAM4_TEST_EN_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PRBS_GEN_EN_MASK);
+
+ if (do_lt)
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AN10, 0x5);
+ else
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AN10, 0);
+}
+
+static void set_functional_mode(struct hbl_cn_device *hdev, u32 port)
+{
+ struct hbl_cn_port *cn_port = &hdev->cn_ports[port];
+ int lane, tx_lane;
+ u32 tx_port;
+ bool do_lt;
+
+ do_lt = cn_port->auto_neg_enable;
+
+ for (lane = 0; lane < 2; lane++) {
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+ set_functional_mode_lane(hdev, tx_port, tx_lane, do_lt);
+ }
+
+ cn_port->phy_func_mode_en = true;
+}
+
+static u32 get_fw_reg(struct hbl_cn_device *hdev, u32 port, u32 fw_addr)
+{
+ u32 ignore;
+
+ fw_cmd(hdev, port, 0xE010, &fw_addr, 0xE, &ignore);
+
+ return NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9812);
+}
+
+static int set_fw_reg(struct hbl_cn_device *hdev, u32 port, u32 fw_addr, u32 val)
+{
+ u32 ignore;
+
+ NIC_PHY_WREG32(NIC0_SERDES0_REGISTER_9812, val);
+
+ return fw_cmd(hdev, port, 0xE020, &fw_addr, 0xE, &ignore);
+}
+
+static void enable_lane_swapping(struct hbl_cn_device *hdev, u32 port, int lane, bool do_an,
+ bool do_lt)
+{
+ if (do_an || do_lt)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AJ40, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_AJ40_ANLT_LANE_SWAPPING_EN_MASK);
+
+ if (do_an)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AJ40, 0x0, 0x40);
+}
+
+static void disable_lane_swapping(struct hbl_cn_device *hdev, u32 port, int lane, bool do_an,
+ bool do_lt)
+{
+ if (do_an)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AJ40, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_AJ40_ANLT_LANE_SWAPPING_EN_MASK);
+}
+
+static void lane_swapping_config(struct hbl_cn_device *hdev, u32 port, int lane, bool do_an,
+ bool do_lt)
+{
+ u32 tx_port, lt_option;
+ int tx_lane;
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ lt_option = get_fw_reg(hdev, port, 366);
+
+ if (is_lane_swapping(hdev, port, lane)) {
+ enable_lane_swapping(hdev, tx_port, tx_lane, do_an, do_lt);
+ enable_lane_swapping(hdev, port, lane, do_an, do_lt);
+
+ lt_option |= (1 << (3 + 8 * (1 - lane)));
+ } else {
+ disable_lane_swapping(hdev, tx_port, tx_lane, do_an, do_lt);
+ disable_lane_swapping(hdev, port, lane, do_an, do_lt);
+
+ lt_option &= ~(1 << (3 + 8 * (1 - lane)));
+ }
+
+ set_fw_reg(hdev, port, 366, lt_option);
+}
+
+static int fw_start(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4, bool do_lt)
+{
+ u32 cmd, speed, ignore;
+
+ cmd = pam4 ? (0x80D0 | lane) : (0x80C0 | lane);
+ speed = pam4 ? 0x9 : 0x3;
+
+ if (do_lt)
+ speed |= 0x100;
+
+ return fw_cmd(hdev, port, cmd, &speed, 0x8, &ignore);
+}
+
+static int fw_start_tx(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4, bool do_lt)
+{
+ u32 speed, cmd, ignore;
+ int rc;
+
+ speed = pam4 ? 0x9 : 0x3;
+
+ if (pam4)
+ cmd = do_lt ? (0x7030 | lane) : (0x7010 | lane);
+ else
+ cmd = do_lt ? (0x7020 | lane) : (0x7000 | lane);
+
+ rc = fw_cmd(hdev, port, cmd, &speed, 0x7, &ignore);
+ if (rc)
+ return rc;
+
+ if (do_lt)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_RSVD_0PA0_04_MASK);
+
+ return 0;
+}
+
+static int fw_config_vcocap(struct hbl_cn_device *hdev, u32 port, int lane, u32 mode,
+ u32 counter_value)
+{
+ u32 ignore;
+
+ return fw_cmd(hdev, port, 0x6000 | (mode << 4) | lane, &counter_value, 14, &ignore);
+}
+
+static int set_pll_tx(struct hbl_cn_device *hdev, u32 port, int lane, u32 data_rate)
+{
+ u32 card_location, msbc, lsbc;
+ int rc;
+
+ card_location = hdev->card_location;
+
+ if (lane == 0)
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x0,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_32T_CLK_SEL_MASK);
+ else
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x0,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_32T_CLK_SEL_MASK);
+
+ switch (data_rate) {
+ case NIC_DR_50:
+ /* toggle FRACN LSB for better phase noise */
+ NIC_PHY_RMWREG32_LANE(0x54587D4, 0x0, 0x1);
+ msbc = 0x5;
+ lsbc = 0x4FFA;
+ break;
+ case NIC_DR_26:
+ /* toggle FRACN LSB for better phase noise */
+ NIC_PHY_RMWREG32_LANE(0x54587D4, 0x0, 0x1);
+ NIC_PHY_RMWREG32_LANE(0x5458320, 0x0, 0x1);
+ msbc = 0x5;
+ lsbc = 0x4FFA;
+ break;
+ case NIC_DR_25:
+ msbc = 0x5;
+ lsbc = 0x27FA;
+ break;
+ case NIC_DR_10:
+ msbc = 0x2;
+ lsbc = 0xFFD;
+ break;
+ default:
+ dev_err(hdev->dev, "Card %u Port %u lane %d: unsupported data rate\n",
+ card_location, port, lane);
+ return -EFAULT;
+ }
+
+ rc = fw_config_vcocap(hdev, port, lane, 1, msbc);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 2, lsbc);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 3, 0x40);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 4, 0x0);
+ if (rc)
+ return rc;
+
+ usleep_range(500, 1000);
+
+ if (lane == 0) {
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x1,
+ NIC0_SERDES0_REGISTER_9825_PLL_LOCK_SRC_SEL_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x0,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_EN_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x1,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_EN_MASK);
+ } else {
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x2,
+ NIC0_SERDES0_REGISTER_9825_PLL_LOCK_SRC_SEL_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x0,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_EN_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x1,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_EN_MASK);
+ }
+
+ usleep_range(500, 1000);
+
+ return 0;
+}
+
+static int set_pll_rx(struct hbl_cn_device *hdev, u32 port, int lane, u32 data_rate)
+{
+ u32 card_location, msbc, lsbc, third_val;
+ int rc;
+
+ card_location = hdev->card_location;
+
+ if (lane == 0)
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x1,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_32T_CLK_SEL_MASK);
+ else
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x1,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_32T_CLK_SEL_MASK);
+
+ switch (data_rate) {
+ case NIC_DR_50:
+ case NIC_DR_26:
+ msbc = 0x5;
+ lsbc = 0x4FFA;
+ third_val = 0x30;
+ break;
+ case NIC_DR_25:
+ msbc = 0x5;
+ lsbc = 0x27FA;
+ third_val = 0x30;
+ break;
+ case NIC_DR_10:
+ msbc = 0x2;
+ lsbc = 0xFFD;
+ third_val = 0x40;
+ break;
+ default:
+ dev_err(hdev->dev, "Card %u Port %u lane %d: unsupported data rate\n",
+ card_location, port, lane);
+ return -EFAULT;
+ }
+
+ rc = fw_config_vcocap(hdev, port, lane, 1, msbc);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 2, lsbc);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 3, third_val);
+ if (rc)
+ return rc;
+
+ rc = fw_config_vcocap(hdev, port, lane, 4, 0x1);
+ if (rc)
+ return rc;
+
+ usleep_range(500, 1000);
+
+ if (lane == 0) {
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x1,
+ NIC0_SERDES0_REGISTER_9825_PLL_LOCK_SRC_SEL_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x0,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_EN_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x1,
+ NIC0_SERDES0_REGISTER_9825_PLL_0_LOCK_EN_MASK);
+ } else {
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_9825, 0x2,
+ NIC0_SERDES0_REGISTER_9825_PLL_LOCK_SRC_SEL_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x0,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_EN_MASK);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_982E, 0x1,
+ NIC0_SERDES0_REGISTER_982E_PLL_1_LOCK_EN_MASK);
+ }
+
+ usleep_range(500, 1000);
+
+ return 0;
+}
+
+static int set_pll(struct hbl_cn_device *hdev, u32 port, int lane, u32 data_rate)
+{
+ u32 card_location, tx_port;
+ int tx_lane, rc;
+
+ card_location = hdev->card_location;
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ rc = set_pll_tx(hdev, tx_port, tx_lane, data_rate);
+ if (rc) {
+ dev_err(hdev->dev, "Card %u Port %u lane %d: set Tx PLL failed, rc %d\n",
+ card_location, tx_port, tx_lane, rc);
+ return rc;
+ }
+
+ rc = set_pll_rx(hdev, port, lane, data_rate);
+ if (rc) {
+ dev_err(hdev->dev, "Card %u Port %u lane %d: set Rx PLL failed, rc %d\n",
+ card_location, port, lane, rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void set_tx_taps_scale(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, 0x4, 0x3E);
+}
+
+static int fw_config_speed_pam4(struct hbl_cn_device *hdev, u32 port, int lane, bool do_lt)
+{
+ u32 tx_port, card_location, val;
+ u8 curr_tx_taps_cfg;
+ int tx_lane, rc;
+
+ card_location = hdev->card_location;
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ init_pam4_tx(hdev, tx_port, tx_lane);
+ init_pam4_rx(hdev, port, lane);
+
+ /* Disable AN/LT lane swapping */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AJ40, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_AJ40_ANLT_LANE_SWAPPING_EN_MASK);
+
+ lane_swapping_config(hdev, port, lane, false, do_lt);
+
+ init_lane_for_fw_tx(hdev, tx_port, tx_lane, true, do_lt);
+ init_lane_for_fw_rx(hdev, port, lane, true, do_lt);
+
+ prbs_mode_select_tx(hdev, tx_port, tx_lane, true, "PRBS31");
+ prbs_mode_select_rx(hdev, port, lane, true, "PRBS31");
+
+ rc = fw_start(hdev, port, lane, true, do_lt);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W config speed PAM4 failed (LT %s), rc %d\n",
+ card_location, port, lane, do_lt ? "enabled" : "disabled", rc);
+ return rc;
+ }
+
+ if (is_lane_swapping(hdev, port, lane)) {
+ rc = fw_start_tx(hdev, tx_port, tx_lane, true, do_lt);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W config speed PAM4 failed (LT %s), rc %d\n",
+ card_location, tx_port, tx_lane, do_lt ? "enabled" : "disabled",
+ rc);
+ return rc;
+ }
+
+ if (do_lt)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_RSVD_0PA0_04_MASK);
+ }
+
+ if (do_lt) {
+ if (!hdev->phy_calc_ber) {
+ /* tell the F/W to do LT with PCS data instead of PRBS */
+ val = get_fw_reg(hdev, port, 366);
+ val &= 0xFEFE;
+ set_fw_reg(hdev, port, 366, val);
+ }
+
+ set_tx_taps_scale(hdev, tx_port, tx_lane);
+ set_gc_tx(hdev, tx_port, tx_lane, 0);
+ set_pc_tx(hdev, tx_port, tx_lane, 0);
+ set_gc_rx(hdev, port, lane, 0);
+ set_pc_rx(hdev, port, lane, 0);
+ } else {
+ curr_tx_taps_cfg = get_curr_tx_taps_cfg(hdev, port);
+ set_tx_taps_cfg(hdev, tx_port, tx_lane, curr_tx_taps_cfg, true, false);
+ }
+
+ return 0;
+}
+
+static void init_nrz_tx(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA1, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA2, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA3, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA4, 0x0);
+ /* data quiet */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x6320);
+ /* auto symmetric, scale */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAF, 0xF8C9);
+ /* data, prbs */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PB0, 0x4820);
+ /* cursor -2 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA5, 0x0);
+ /* cursor -1 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA7, 0xFC00);
+ /* cursor -main */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA9, 0x1800);
+ /* cursor +1 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAB, 0x0);
+ /* cursor +2 */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAD, 0x0);
+}
+
+static void init_nrz_rx(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PF8, 0xEC06);
+}
+
+static int fw_config_speed_nrz(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 tx_port, card_location;
+ u8 curr_tx_taps_cfg;
+ int tx_lane, rc;
+
+ card_location = hdev->card_location;
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ lane_swapping_config(hdev, port, lane, false, false);
+
+ init_nrz_tx(hdev, tx_port, tx_lane);
+ init_nrz_rx(hdev, port, lane);
+
+ init_lane_for_fw_tx(hdev, tx_port, tx_lane, false, false);
+ init_lane_for_fw_rx(hdev, port, lane, false, false);
+
+ prbs_mode_select_tx(hdev, tx_port, tx_lane, false, "PRBS31");
+ prbs_mode_select_rx(hdev, port, lane, false, "PRBS31");
+
+ rc = fw_start(hdev, port, lane, false, false);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W config speed NRZ failed, rc %d\n",
+ card_location, port, lane, rc);
+ return rc;
+ }
+
+ if (is_lane_swapping(hdev, port, lane)) {
+ rc = fw_start_tx(hdev, tx_port, tx_lane, false, false);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W config speed NRZ failed, rc %d\n",
+ card_location, tx_port, tx_lane, rc);
+ return rc;
+ }
+ }
+
+ curr_tx_taps_cfg = get_curr_tx_taps_cfg(hdev, port);
+ set_tx_taps_cfg(hdev, tx_port, tx_lane, curr_tx_taps_cfg, false, false);
+
+ return 0;
+}
+
+static void reset_mac_tx(struct hbl_cn_device *hdev, u32 port)
+{
+ struct gaudi2_cn_device *gaudi2 = hdev->asic_specific;
+ u32 tx_ch_mask;
+
+ /* For F/W version 37.2.0 and above, the reset will be done by the F/W */
+ if ((hdev->fw_major_version == 37 && hdev->fw_minor_version > 1) ||
+ hdev->fw_major_version > 37) {
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct cpucp_packet pkt;
+ int rc;
+
+ aux_dev = hdev->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = aux_ops->asic_ops;
+
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_MAC_TX_RESET << CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.port_index = cpu_to_le32(port);
+
+ rc = gaudi2_aux_ops->send_cpu_message(aux_dev, (u32 *)&pkt, sizeof(pkt), 0, NULL);
+ if (rc)
+ dev_warn(hdev->dev, "Card %u Port %u: Failed to reset MAC Tx, rc %d\n",
+ hdev->card_location, port, rc);
+
+ return;
+ }
+
+ if (gaudi2->fw_security_enabled) {
+ dev_warn(hdev->dev, "Card %u Port %u: Failed to reset MAC Tx, security is enabled\n",
+ hdev->card_location, port);
+ return;
+ }
+
+ tx_ch_mask = 1 << PRT0_MAC_CORE_MAC_RST_CFG_SD_TX_SW_RST_N_SHIFT;
+ tx_ch_mask <<= (port & 0x1) ? 2 : 0;
+
+ NIC_MACRO_RMWREG32(PRT0_MAC_CORE_MAC_RST_CFG, 0, tx_ch_mask);
+ msleep(100);
+ NIC_MACRO_RMWREG32(PRT0_MAC_CORE_MAC_RST_CFG, 1, tx_ch_mask);
+}
+
+static int fw_config(struct hbl_cn_device *hdev, u32 port, u32 data_rate, bool do_lt)
+{
+ u32 card_location;
+ int lane, rc;
+ bool pam4;
+
+ card_location = hdev->card_location;
+ pam4 = (data_rate == NIC_DR_50);
+
+ /* clear go bit */
+ if (pam4) {
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x1, 0x800);
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x1, 0x100);
+ }
+
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x0, 0x8000);
+
+ for (lane = 0; lane < 2; lane++) {
+ if (pam4) {
+ rc = fw_config_speed_pam4(hdev, port, lane, do_lt);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W PAM4 config failed, rc %d\n",
+ card_location, port, lane, rc);
+ return rc;
+ }
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PEA, 0x60,
+ NIC0_SERDES0_LANE0_REGISTER_0PEA_VDACCLKPHASE0_MASK);
+ } else {
+ rc = fw_config_speed_nrz(hdev, port, lane);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: F/W NRZ config failed, rc %d\n",
+ card_location, port, lane, rc);
+ return rc;
+ }
+ }
+ }
+
+ for (lane = 0; lane < 2; lane++) {
+ rc = set_pll(hdev, port, lane, data_rate);
+ if (rc)
+ return rc;
+ }
+
+ msleep(100);
+
+ reset_mac_tx(hdev, port);
+
+ if (!hdev->phy_calc_ber)
+ set_functional_mode(hdev, port);
+
+ /* set go bit */
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x1, 0x8000);
+
+ return 0;
+}
+
+static void phy_port_reset(struct hbl_cn_device *hdev, u32 port)
+{
+ int lane;
+
+ soft_reset(hdev, port);
+ usleep_range(500, 1000);
+
+ for (lane = 0; lane < 2; lane++)
+ clock_init(hdev, port, lane);
+
+ cpu_reset(hdev, port);
+ logic_reset(hdev, port);
+
+ usleep_range(500, 1000);
+}
+
+static void prbs_reset(struct hbl_cn_port *cn_port, int lane, bool pam4)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 1,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_PRBS_AUTO_SYNC_EN_MASK);
+
+ if (pam4) {
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 1,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_PRBS_SYNC_CNTR_RESET_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_PRBS_SYNC_CNTR_RESET_MASK);
+ } else {
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61, 1,
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CNTR_RESET_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CNTR_RESET_MASK);
+ }
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43, 0,
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_PRBS_AUTO_SYNC_EN_MASK);
+}
+
+static u64 _get_prbs_cnt(struct hbl_cn_port *cn_port, int lane, bool pam4)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ u64 cnt;
+
+ if (pam4)
+ cnt = (((u64)NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P50)) << 16) +
+ NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P51);
+ else
+ cnt = (((u64)NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N66)) << 16) +
+ NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N67);
+
+ return cnt;
+}
+
+static enum lane_state get_prbs_cnt(struct hbl_cn_port *cn_port, int lane, bool pam4,
+ u64 prbs_prev_cnt, u64 *prbs_new_cnt)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port, phy_ready;
+ u64 cnt;
+
+ if (pam4) {
+ phy_ready = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P6A) &
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_RX_READ_PHY_READY_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_RX_READ_PHY_READY_SHIFT;
+ } else {
+ phy_ready = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N2E) &
+ NIC0_SERDES0_LANE0_REGISTER_0N2E_NRZ_READ_PHY_READY_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0N2E_NRZ_READ_PHY_READY_SHIFT;
+ }
+
+ if (!phy_ready)
+ return NOT_READY;
+
+ cnt = _get_prbs_cnt(cn_port, lane, pam4);
+
+ /* check PRBS counter wrapped around */
+ if (cnt < prbs_prev_cnt) {
+ if ((prbs_prev_cnt - cnt) < 0x10000)
+ return FAILURE;
+
+ cnt = _get_prbs_cnt(cn_port, lane, pam4);
+ }
+
+ *prbs_new_cnt = cnt;
+
+ return READY;
+}
+
+static void _calc_ber_lane(struct hbl_cn_port *cn_port, int lane, u64 total_cnt, u64 error_cnt,
+ struct hbl_cn_ber_info *ber_info)
+{
+ u64 total_high_digits, error_high_digits, integer, frac;
+ u8 total_num_digits, error_num_digits, exp;
+ int i;
+
+ total_num_digits = hbl_cn_get_num_of_digits(total_cnt);
+ error_num_digits = hbl_cn_get_num_of_digits(error_cnt);
+
+ if (total_num_digits > 2) {
+ total_high_digits = total_cnt;
+
+ for (i = 0; i < total_num_digits - 2; i++)
+ total_high_digits = total_high_digits / 10;
+ } else {
+ total_high_digits = total_cnt;
+ }
+
+ if (!total_high_digits)
+ return;
+
+ if (error_num_digits > 2) {
+ error_high_digits = error_cnt;
+
+ for (i = 0; i < error_num_digits - 2; i++)
+ error_high_digits = error_high_digits / 10;
+ } else {
+ error_high_digits = error_cnt;
+ }
+
+ exp = total_num_digits - error_num_digits;
+
+ if (error_high_digits < total_high_digits) {
+ error_high_digits *= 10;
+ exp++;
+ }
+
+ integer = div_u64(error_high_digits, total_high_digits);
+ frac = div_u64(((error_high_digits - (integer * total_high_digits)) * 10),
+ total_high_digits);
+
+ ber_info->integer = integer;
+ ber_info->frac = frac;
+ ber_info->exp = exp;
+ ber_info->valid = true;
+}
+
+static void calc_ber_lane(struct hbl_cn_port *cn_port, int lane, bool pam4)
+{
+ u64 prbs_err_cnt_pre, prbs_prev_cnt, prbs_err_cnt_post, prbs_err_cnt,
+ prbs_reset_time_jiffies, prbs_accum_time_jiffies, prbs_accum_time_ms,
+ factor, error_cnt, total_cnt;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 card_location, port, abs_lane_idx;
+ struct hbl_cn_ber_info *ber_info;
+ enum lane_state state;
+
+ card_location = hdev->card_location;
+ port = cn_port->port;
+ abs_lane_idx = (port << 1) + lane;
+
+ ber_info = &hdev->phy_ber_info[abs_lane_idx];
+ memset(ber_info, 0, sizeof(*ber_info));
+
+ prbs_reset(cn_port, lane, pam4);
+ prbs_reset_time_jiffies = jiffies;
+ prbs_err_cnt_pre = _get_prbs_cnt(cn_port, lane, pam4);
+ prbs_err_cnt_post = 0;
+
+ prbs_prev_cnt = prbs_err_cnt_pre;
+
+ while (true) {
+ msleep(500);
+
+ state = get_prbs_cnt(cn_port, lane, pam4, prbs_prev_cnt, &prbs_err_cnt_post);
+ prbs_accum_time_jiffies = jiffies - prbs_reset_time_jiffies;
+ prbs_accum_time_ms = jiffies_to_msecs(prbs_accum_time_jiffies);
+ prbs_err_cnt = prbs_err_cnt_post - prbs_err_cnt_pre;
+
+ if (state != READY) {
+ dev_dbg(hdev->dev, "Card %u Port %u lane %d: No BER (state = %s)\n",
+ card_location, port, lane,
+ (state == NOT_READY) ? "NOT_READY" : "FAILURE");
+ return;
+ }
+
+ if (prbs_accum_time_ms >= 5000 || prbs_err_cnt >= 10000000)
+ break;
+
+ prbs_prev_cnt = prbs_err_cnt_post;
+ }
+
+ factor = pam4 ? NIC_PHY_PAM4_BER_FACTOR : NIC_PHY_NRZ_BER_FACTOR;
+
+ error_cnt = prbs_err_cnt;
+ total_cnt = prbs_accum_time_ms * factor;
+
+ _calc_ber_lane(cn_port, lane, total_cnt, error_cnt, ber_info);
+
+ dev_dbg(hdev->dev,
+ "Card %u Port %u lane %d: total_cnt %llu error_cnt %llu (%llu ms) - BER %llu.%llue-%u\n",
+ card_location, port, lane, total_cnt, error_cnt, prbs_accum_time_ms,
+ ber_info->integer, ber_info->frac, ber_info->exp);
+}
+
+static void calc_ber(struct hbl_cn_port *cn_port)
+{
+ int lane;
+
+ for (lane = 0; lane < 2; lane++)
+ calc_ber_lane(cn_port, lane, cn_port->data_rate == NIC_DR_50);
+}
+
+static void get_tx_port_lane(u32 port, int lane, u32 *_port, int *_lane)
+{
+ if (port != 0 && port != 1) {
+ *_port = port;
+ *_lane = lane;
+ return;
+ }
+
+ if (port == 0 && lane == 0) {
+ *_port = 1;
+ *_lane = 1;
+ } else if (port == 0 && lane == 1) {
+ *_port = 0;
+ *_lane = 1;
+ } else if (port == 1 && lane == 0) {
+ *_port = 0;
+ *_lane = 0;
+ } else if (port == 1 && lane == 1) {
+ *_port = 1;
+ *_lane = 0;
+ }
+}
+
+static void modify_tx_taps(struct hbl_cn_port *cn_port)
+{
+ struct gaudi2_cn_port *gaudi2_port = cn_port->cn_specific;
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u8 curr_cfg, next_cfg;
+ u32 port, _port;
+ int lane, _lane;
+ bool pam4;
+
+ port = cn_port->port;
+ curr_cfg = get_curr_tx_taps_cfg(hdev, port);
+ next_cfg = (curr_cfg + 1) % tx_taps_num_cfgs;
+ pam4 = cn_port->data_rate == NIC_DR_50;
+ _port = 0;
+ _lane = 0;
+
+ gaudi2_port->tx_taps_cfg = next_cfg;
+
+ /* If the next cfg equals the initial cfg, we have cycled through all the taps cfgs.
+ * In that case, a PHY reconfigure should be triggered.
+ */
+ if (next_cfg == gaudi2_port->initial_tx_taps_cfg) {
+ dev_dbg(hdev->dev,
+ "Card %u Port %u: all tx taps cfgs were failed - reconfiguring PHY\n",
+ hdev->card_location, port);
+
+ return hbl_cn_phy_port_reconfig(cn_port);
+ }
+
+ dev_dbg(hdev->dev, "Card %u Port %u: modify %s tx taps (%u,%u)->(%u,%u)\n",
+ hdev->card_location, port, pam4 ? "PAM4" : "NRZ",
+ tx_taps_cfg_array[curr_cfg][0] + 1, tx_taps_cfg_array[curr_cfg][1] + 1,
+ tx_taps_cfg_array[next_cfg][0] + 1, tx_taps_cfg_array[next_cfg][1] + 1);
+
+ for (lane = 0; lane < 2; lane++) {
+ get_tx_port_lane(port, lane, &_port, &_lane);
+ set_tx_taps_cfg(hdev, _port, _lane, next_cfg, pam4, true);
+ }
+
+ gaudi2_port->tx_taps_modified = true;
+}
+
+static void print_final_tx_taps(struct hbl_cn_port *cn_port)
+{
+ char tx_taps_str0[25] = {0}, tx_taps_str1[25] = {0};
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port, abs_lane0_idx, abs_lane1_idx;
+ s32 *taps;
+ bool pam4;
+
+ pam4 = cn_port->data_rate == NIC_DR_50;
+ port = cn_port->port;
+ abs_lane0_idx = port << 1;
+ abs_lane1_idx = abs_lane0_idx + 1;
+
+ taps = pam4 ? hdev->phy_tx_taps[abs_lane0_idx].pam4_taps :
+ hdev->phy_tx_taps[abs_lane0_idx].nrz_taps;
+ sprintf(tx_taps_str0, "%d,%d,%d,%d,%d", taps[0], taps[1], taps[2], taps[3], taps[4]);
+
+ taps = pam4 ? hdev->phy_tx_taps[abs_lane1_idx].pam4_taps :
+ hdev->phy_tx_taps[abs_lane1_idx].nrz_taps;
+ sprintf(tx_taps_str1, "%d,%d,%d,%d,%d", taps[0], taps[1], taps[2], taps[3], taps[4]);
+
+ dev_dbg(hdev->dev, "Card %u Port %u: Final Tx taps - lane0: [%s], lane1: [%s]\n",
+ hdev->card_location, port, tx_taps_str0, tx_taps_str1);
+}
+
+static void change_pcs_link_state(struct gaudi2_cn_port *gaudi2_port,
+ enum gaudi2_cn_pcs_link_state pcs_link_state)
+{
+ struct hbl_cn_port *cn_port = gaudi2_port->cn_port;
+
+ gaudi2_port->pcs_link_state = pcs_link_state;
+
+ /* The retry count is incremented at a different frequency in each state.
+ * Therefore, in order for its value to be meaningful in each state, it needs to be reset
+ * when moving to a new state.
+ */
+ cn_port->retry_cnt = 0;
+}
+
+static void check_pcs_link(struct hbl_cn_port *cn_port)
+{
+ u32 card_location, port, mac_gnrl_sts, pcs_link_samples_per_sec, link_down_cnt_th,
+ remote_fault_cnt_th, link_toggles;
+ enum gaudi2_cn_pcs_link_state pcs_link_state;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_device *hdev;
+
+ hdev = cn_port->hdev;
+ gaudi2_port = cn_port->cn_specific;
+ card_location = hdev->card_location;
+ port = cn_port->port;
+ pcs_link_state = gaudi2_port->pcs_link_state;
+
+ if (pcs_link_state == PCS_LINK_STATE_SETTLING) {
+ if (cn_port->eth_enable) {
+ change_pcs_link_state(gaudi2_port, PCS_LINK_STATE_STEADY);
+ } else {
+ change_pcs_link_state(gaudi2_port, PCS_LINK_STATE_STRESS);
+ gaudi2_port->pcs_link_stady_state_ts =
+ ktime_add_ms(ktime_get(), NIC_PHY_PCS_STRESS_WINDOW_MS);
+ }
+
+ return;
+ }
+
+ mac_gnrl_sts = (port & 0x1) ? NIC_MACRO_RREG32(PRT0_MAC_CORE_MAC_GNRL_STS_2) :
+ NIC_MACRO_RREG32(PRT0_MAC_CORE_MAC_GNRL_STS_0);
+
+ if (FIELD_GET(PRT0_MAC_CORE_MAC_GNRL_STS_LOC_FAULT_MASK, mac_gnrl_sts))
+ cn_port->pcs_local_fault_cnt++;
+
+ if (FIELD_GET(PRT0_MAC_CORE_MAC_GNRL_STS_REM_FAULT_MASK, mac_gnrl_sts)) {
+ cn_port->pcs_remote_fault_cnt++;
+ cn_port->pcs_remote_fault_seq_cnt++;
+ } else {
+ cn_port->pcs_remote_fault_seq_cnt = 0;
+ }
+
+ pcs_link_samples_per_sec = gaudi2_port->pcs_link_samples_per_sec;
+ remote_fault_cnt_th = NIC_PHY_MAC_REMOTE_FAULT_TH_S * pcs_link_samples_per_sec;
+
+ if (pcs_link_state == PCS_LINK_STATE_STRESS) {
+ if (ktime_after(ktime_get(), gaudi2_port->pcs_link_stady_state_ts)) {
+ change_pcs_link_state(gaudi2_port, PCS_LINK_STATE_STEADY);
+ goto check_link;
+ }
+
+ if (cn_port->pcs_remote_fault_seq_cnt) {
+ dev_dbg(hdev->dev, "Card %u Port %u: got MAC remote fault during stress window\n",
+ card_location, port);
+
+ modify_tx_taps(cn_port);
+ change_pcs_link_state(gaudi2_port, PCS_LINK_STATE_SETTLING);
+ cn_port->pcs_remote_fault_seq_cnt = 0;
+ }
+ } else { /* PCS_LINK_STATE_STEADY */
+ if (gaudi2_port->tx_taps_modified) {
+ print_final_tx_taps(cn_port);
+ gaudi2_port->tx_taps_modified = false;
+ }
+
+ if (cn_port->pcs_remote_fault_seq_cnt == remote_fault_cnt_th) {
+ dev_dbg(hdev->dev,
+ "Card %u Port %u: %u sequential seconds of MAC remote faults\n",
+ card_location, port, NIC_PHY_MAC_REMOTE_FAULT_TH_S);
+
+ /* Modify tx taps - external ports are excluded */
+ if (!cn_port->eth_enable) {
+ modify_tx_taps(cn_port);
+ change_pcs_link_state(gaudi2_port, PCS_LINK_STATE_SETTLING);
+ }
+
+ cn_port->pcs_remote_fault_seq_cnt = 0;
+ }
+ }
+
+check_link:
+ link_toggles = cn_port->port_toggle_cnt - cn_port->port_toggle_cnt_prev;
+ cn_port->port_toggle_cnt_prev = cn_port->port_toggle_cnt;
+
+ /* Reset the retry_cnt only if the link is UP and, when in steady state, only if the
+ * link toggling threshold is not exceeded.
+ */
+ if (cn_port->pcs_link && !(pcs_link_state == PCS_LINK_STATE_STEADY &&
+ link_toggles > NIC_PHY_PCS_MAX_LINK_TOGGLES)) {
+ cn_port->retry_cnt = 0;
+ return;
+ }
+
+ cn_port->retry_cnt++;
+ link_down_cnt_th = NIC_PHY_PCS_LINK_DOWN_TH_S * pcs_link_samples_per_sec;
+
+ if (cn_port->retry_cnt == link_down_cnt_th) {
+ dev_dbg(hdev->dev,
+ "Card %u Port %u: %u sequential seconds of PCS link down - reconfiguring PHY\n",
+ card_location, port, NIC_PHY_PCS_LINK_DOWN_TH_S);
+
+ hbl_cn_phy_port_reconfig(cn_port);
+ }
+}
+
+static u32 rv_debug(struct hbl_cn_device *hdev, u32 port, int lane, u32 mode, u32 index)
+{
+ u32 cmd, res;
+
+ cmd = 0xB000 + ((mode & 0xF) << 4) + lane;
+
+ fw_cmd(hdev, port, cmd, &index, 0xB, &res);
+
+ return NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9816);
+}
+
+static int fw_tuning(struct hbl_cn_device *hdev, u32 port, int lane, bool pam4)
+{
+ u32 state, mode;
+
+ mode = pam4 ? 2 : 1;
+ state = rv_debug(hdev, port, lane, mode, 0);
+
+ if (pam4) {
+ if (((u16)state) != 0x8F00 && ((u16)state) != 0x8F80)
+ return -EAGAIN;
+ } else {
+ if (((u16)state) != 0x9A00)
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+static void do_fw_tuning(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 card_location, port;
+ int lane, rc;
+ bool pam4;
+
+ card_location = hdev->card_location;
+ port = cn_port->port;
+ pam4 = (cn_port->data_rate == NIC_DR_50);
+
+ for (lane = 0; lane < 2; lane++) {
+ rc = fw_tuning(hdev, port, lane, pam4);
+ if (rc) {
+ if (ktime_after(ktime_get(), cn_port->fw_tuning_limit_ts)) {
+ dev_dbg(hdev->dev,
+ "Card %u Port %u lane %d: F/W tuning limit - reconfiguring PHY\n",
+ card_location, port, lane);
+
+ hbl_cn_phy_port_reconfig(cn_port);
+ return;
+ }
+
+ break;
+ }
+ }
+
+ if (!rc) {
+ /* The control lock needs to be taken here in order to protect against a parallel
+ * status set from the link event handler.
+ * This lock also protects the port close flow that destroys this thread synchronously,
+ * so a potential deadlock could happen here.
+ * In order to avoid this deadlock, we need to check if this lock was already taken.
+ * If it was taken and the port is marked as closed (i.e. we are now during the port
+ * close flow), we can return immediately.
+ * Otherwise, we keep trying to take this lock before entering the critical
+ * section.
+ */
+ while (!mutex_trylock(&cn_port->control_lock))
+ if (!hbl_cn_is_port_open(cn_port))
+ return;
+
+ cn_port->phy_fw_tuned = true;
+
+ /* If we got link up event, set it now when PHY is ready */
+ if (cn_port->eq_pcs_link) {
+ cn_port->pcs_link = true;
+ hbl_cn_phy_set_port_status(cn_port, true);
+ }
+
+ mutex_unlock(&cn_port->control_lock);
+
+ cn_port->retry_cnt = 0;
+ }
+}
+
+static int fw_tuning_an(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 state = rv_debug(hdev, port, lane, 1, 0);
+
+ if (((u16)state) != 0xA01F && ((u16)state) != 0xA020 && ((u16)state) != 0xAF00) {
+ u32 error_status = rv_debug(hdev, port, lane, 0, 3);
+
+ dev_dbg_ratelimited(hdev->dev,
+ "Card %u Port %u lane %d: auto neg fw is not ready, state 0x%x error 0x%x\n",
+ hdev->card_location, port, lane, state, error_status);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+static void tx_quiet(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA1, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA2, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA3, 0x0);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA4, 0x0);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_TEST_DATA_SRC_MASK);
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PAM4_TEST_EN_MASK);
+}
+
+static int do_anlt(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port, tx_port;
+ int tx_lane, rc;
+
+ /* fw_tuning_an needs to be done only on lane 0 */
+ rc = fw_tuning_an(hdev, port, 0);
+ if (rc)
+ return rc;
+
+ get_tx_port_and_lane(hdev, port, 0, &tx_port, &tx_lane);
+ tx_quiet(hdev, tx_port, tx_lane);
+
+ rc = fw_config(hdev, port, NIC_DR_50, true);
+ if (rc) {
+ dev_dbg(hdev->dev,
+ "Card %u Port %u: PHY link training failed, rc %d - reconfiguring PHY\n",
+ hdev->card_location, port, rc);
+
+ hbl_cn_phy_port_reconfig(cn_port);
+
+ return rc;
+ }
+
+ cn_port->auto_neg_resolved = true;
+
+ return 0;
+}
+
+static void do_fw_tuning_auto_neg(struct hbl_cn_port *cn_port)
+{
+ u32 fw_tuning_timeout_ms;
+
+ if (cn_port->auto_neg_enable) {
+ if (do_anlt(cn_port))
+ return;
+ } else {
+ cn_port->auto_neg_skipped = true;
+ }
+
+ if (cn_port->eth_enable)
+ fw_tuning_timeout_ms = NIC_PHY_FW_TUNING_TIMEOUT_MS;
+ else
+ fw_tuning_timeout_ms = tx_taps_num_cfgs * NIC_PHY_PCS_SETTLING_WAIT_MS;
+
+ cn_port->fw_tuning_limit_ts = ktime_add_ms(ktime_get(), fw_tuning_timeout_ms);
+ do_fw_tuning(cn_port);
+}
+
+static u32 get_timeout_ms(struct hbl_cn_port *cn_port)
+{
+ u32 card_location, port, timeout_ms;
+ struct gaudi2_cn_port *gaudi2_port;
+ struct hbl_cn_device *hdev;
+
+ hdev = cn_port->hdev;
+ gaudi2_port = cn_port->cn_specific;
+ card_location = hdev->card_location;
+ port = cn_port->port;
+ timeout_ms = MSEC_PER_SEC;
+
+ if (!cn_port->phy_fw_tuned) {
+ timeout_ms = NIC_PHY_FW_TUNING_INTERVAL_MS;
+ } else if (!cn_port->phy_func_mode_en) {
+ u16 timeout_sec = hdev->phy_calc_ber_wait_sec;
+
+ dev_info(hdev->dev, "Card %u Port %u: Waiting %u seconds before calculating BER\n",
+ card_location, port, timeout_sec);
+ timeout_ms = timeout_sec * MSEC_PER_SEC;
+ } else {
+ enum gaudi2_cn_pcs_link_state pcs_link_state = gaudi2_port->pcs_link_state;
+
+ switch (pcs_link_state) {
+ case PCS_LINK_STATE_SETTLING:
+ timeout_ms = NIC_PHY_PCS_SETTLING_WAIT_MS;
+ dev_dbg(hdev->dev, "Card %u Port %u: waiting %lu seconds for settling\n",
+ card_location, port, timeout_ms / MSEC_PER_SEC);
+ break;
+ case PCS_LINK_STATE_STRESS:
+ timeout_ms = NIC_PHY_PCS_STRESS_INT_MS;
+ gaudi2_port->pcs_link_samples_per_sec = MSEC_PER_SEC / timeout_ms;
+ break;
+ case PCS_LINK_STATE_STEADY:
+ timeout_ms = NIC_PHY_PCS_STEADY_STATE_INT_MS;
+ gaudi2_port->pcs_link_samples_per_sec = MSEC_PER_SEC / timeout_ms;
+ break;
+ default:
+ dev_err(hdev->dev, "Card %u Port %u: invalid pcs_link_state %u\n",
+ card_location, port, pcs_link_state);
+ }
+ }
+
+ return timeout_ms;
+}
+
+void gaudi2_cn_phy_link_status_work(struct work_struct *work)
+{
+ u32 card_location, port, timeout_ms;
+ struct gaudi2_cn_device *gaudi2;
+ struct hbl_cn_port *cn_port;
+ struct hbl_cn_device *hdev;
+
+ cn_port = container_of(work, struct hbl_cn_port, link_status_work.work);
+ hdev = cn_port->hdev;
+ gaudi2 = hdev->asic_specific;
+ card_location = hdev->card_location;
+ port = cn_port->port;
+
+ /* Reschedule this work if the device is under compute reset */
+ if (gaudi2->in_compute_reset) {
+ timeout_ms = MSEC_PER_SEC;
+ goto reschedule;
+ }
+
+ if (cn_port->phy_fw_tuned) {
+ if (!cn_port->phy_func_mode_en) {
+ calc_ber(cn_port);
+ dev_info(hdev->dev, "Card %u Port %u: BER calculation is done\n",
+ card_location, port);
+ return;
+ }
+
+ check_pcs_link(cn_port);
+ } else {
+ if (cn_port->auto_neg_resolved || cn_port->auto_neg_skipped)
+ do_fw_tuning(cn_port);
+ else
+ do_fw_tuning_auto_neg(cn_port);
+ }
+
+ timeout_ms = get_timeout_ms(cn_port);
+
+reschedule:
+ queue_delayed_work(cn_port->wq, &cn_port->link_status_work, msecs_to_jiffies(timeout_ms));
+}
+
+static void set_tx(struct hbl_cn_device *hdev, u32 port, int lane, bool enable)
+{
+ u32 val = enable ? 0x1 : 0x0;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0NF8, val,
+ NIC0_SERDES0_LANE0_REGISTER_0NF8_PU_VDRV_MASK);
+}
+
+void gaudi2_cn_phy_port_start_stop(struct hbl_cn_port *cn_port, bool is_start)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port, tx_port;
+ int lane, tx_lane;
+
+ for (lane = 0; lane < 2; lane++) {
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ if (is_start) {
+ /* Enable TX driver in SerDes */
+ set_tx(hdev, tx_port, tx_lane, true);
+ /* F/W Rx tuning enablement is done during the power up flow */
+ } else {
+ /* Disable TX driver in SerDes */
+ set_tx(hdev, tx_port, tx_lane, false);
+ /* Silence F/W Rx tuning */
+ NIC_PHY_WREG32(NIC0_SERDES0_REGISTER_9815, 0x9000 | lane);
+ }
+ }
+}
+
+static int fw_start_an(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 detail = 0, ignore;
+
+ return fw_cmd(hdev, port, 0x80A0 | lane, &detail, 0x8, &ignore);
+}
+
+static int fw_start_an_tx(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ u32 detail = 0, ignore;
+ int rc;
+
+ rc = fw_cmd(hdev, port, 0x7040 | lane, &detail, 0x7, &ignore);
+ if (rc)
+ return rc;
+
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0, 0x0,
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_PAM4_TEST_EN_MASK);
+
+ return 0;
+}
+
+static int fw_config_auto_neg(struct hbl_cn_device *hdev, u32 port, int lane)
+{
+ struct hbl_cn_port *cn_port = &hdev->cn_ports[port];
+ u64 basepage = 0x800000001ull;
+ u32 tx_port, pflags;
+ u32 card_location;
+ int tx_lane, rc;
+
+ card_location = hdev->card_location;
+ pflags = hbl_cn_get_pflags(cn_port);
+
+ get_tx_port_and_lane(hdev, port, lane, &tx_port, &tx_lane);
+
+ /* clear go bit */
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x0, 0x8000);
+
+ init_nrz_tx(hdev, tx_port, tx_lane);
+ init_nrz_rx(hdev, port, lane);
+
+ init_lane_for_fw_tx(hdev, tx_port, tx_lane, false, true);
+ init_lane_for_fw_rx(hdev, port, lane, false, true);
+
+ prbs_mode_select_tx(hdev, tx_port, tx_lane, false, "PRBS31");
+ prbs_mode_select_rx(hdev, port, lane, false, "PRBS31");
+
+ lane_swapping_config(hdev, port, lane, true, true);
+
+ /* set FW to start AN */
+
+ rc = fw_start_an(hdev, port, lane);
+ if (rc) {
+ dev_err(hdev->dev, "Card %u Port %u lane %d: start auto neg failed, rc %d\n",
+ card_location, port, lane, rc);
+ return rc;
+ }
+
+ if (is_lane_swapping(hdev, port, lane)) {
+ rc = fw_start_an_tx(hdev, tx_port, tx_lane);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Card %u Port %u lane %d: start auto neg failed, rc %d\n",
+ card_location, tx_port, tx_lane, rc);
+ return rc;
+ }
+ }
+
+ /* AN reset */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AK00, 0xE000);
+
+ /* AN mode */
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AI10, basepage & 0xFFFF);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AI11, (basepage >> 16) & 0xFFFF);
+ NIC_PHY_WREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AI12, (basepage >> 32) & 0xFFFF);
+
+ /* IEEE */
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AK00, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_AK00_ARG_ANEG_IEEE_MODE_S_MASK);
+
+ if (pflags & PFLAGS_PHY_AUTO_NEG_LPBK)
+ NIC_PHY_RMWREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_AK00, 0x1,
+ NIC0_SERDES0_LANE0_REGISTER_AK00_ARG_DIS_NONCE_MATCH_S_MASK);
+
+ rc = set_pll(hdev, port, lane, NIC_DR_25);
+ if (rc)
+ return rc;
+
+ /* set go bit */
+ NIC_PHY_RMWREG32(NIC0_SERDES0_REGISTER_980F, 0x1, 0x8000);
+
+ return 0;
+}
+
+static int port_9_reinit(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port_9 = &hdev->cn_ports[9];
+
+ if (!hbl_cn_is_port_open(cn_port_9))
+ return 0;
+
+ dev_dbg(hdev->dev,
+ "Card %u Port 9: Performing port 9 PHY reinit following port 8 PHY init\n",
+ hdev->card_location);
+
+ hbl_cn_phy_fini(cn_port_9);
+
+ return hbl_cn_phy_init(cn_port_9);
+}
+
+int gaudi2_cn_phy_port_power_up(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ struct gaudi2_cn_port *gaudi2_port;
+ u32 data_rate = cn_port->data_rate;
+ u32 card_location, port;
+ int rc;
+
+ gaudi2_port = cn_port->cn_specific;
+ card_location = hdev->card_location;
+ port = cn_port->port;
+
+ phy_port_reset(hdev, port);
+
+ if (hdev->phy_force_first_tx_taps_cfg)
+ gaudi2_port->tx_taps_cfg = 0;
+
+ cn_port->phy_func_mode_en = false;
+ gaudi2_port->pcs_link_state = PCS_LINK_STATE_SETTLING;
+ gaudi2_port->initial_tx_taps_cfg = gaudi2_port->tx_taps_cfg;
+
+ if (cn_port->auto_neg_enable) {
+ /* AN config should be done only on lane 0 */
+ rc = fw_config_auto_neg(hdev, port, 0);
+ if (rc) {
+ dev_err(hdev->dev, "Card %u Port %u: F/W config auto_neg failed, rc %d\n",
+ card_location, port, rc);
+ return rc;
+ }
+ } else {
+ rc = fw_config(hdev, port, data_rate, false);
+ if (rc) {
+ dev_err(hdev->dev, "Card %u Port %u: F/W config failed, rc %d\n",
+ card_location, port, rc);
+ return rc;
+ }
+ }
+
+ /* Port 8 is an external port which will usually be brought UP after all the internal ports
+ * are UP. Due to a macro clock nest dependency, when PHY reset is called for port 8,
+ * port 9 (which is internal) is toggled and might lose stabilization.
+ * A W/A to overcome this issue is to reinit port 9 right afterwards.
+ */
+ if (port == 8)
+ port_9_reinit(hdev);
+
+ return 0;
+}
+
+void gaudi2_cn_phy_port_reconfig(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 card_location, port;
+ int rc;
+
+ if (!hdev->phy_config_fw)
+ return;
+
+ card_location = hdev->card_location;
+ port = cn_port->port;
+
+ rc = gaudi2_cn_phy_port_power_up(cn_port);
+ if (rc)
+ dev_err(hdev->dev, "Card %u Port %u: PHY reconfig failed\n", card_location, port);
+}
+
+int gaudi2_cn_phy_port_init(struct hbl_cn_port *cn_port)
+{
+ struct hbl_cn_device *hdev = cn_port->hdev;
+ u32 port = cn_port->port;
+ int rc;
+
+ mac_lane_remap(hdev, port);
+
+ rc = hbl_cn_phy_init(cn_port);
+ if (rc)
+ dev_err(hdev->dev, "Port %u: failed to init PHY, rc %d\n", port, rc);
+
+ return rc;
+}
+
+void gaudi2_cn_phy_port_fini(struct hbl_cn_port *cn_port)
+{
+ hbl_cn_phy_fini(cn_port);
+}
+
+int gaudi2_cn_phy_reset_macro(struct hbl_cn_macro *cn_macro)
+{
+ struct hbl_cn_device *hdev = cn_macro->hdev;
+ u32 port;
+
+ /* Reset the two ports under the given cn_macro */
+ port = cn_macro->idx << 1;
+
+ /* Enable PHY refclk */
+ NIC_MACRO_WREG32(NIC0_PHY_PHY_IDDQ_0, 0);
+ NIC_MACRO_WREG32(NIC0_PHY_PHY_IDDQ_1, 0);
+
+ phy_port_reset(hdev, port);
+ phy_port_reset(hdev, port + 1);
+
+ return 0;
+}
+
+void gaudi2_cn_phy_flush_link_status_work(struct hbl_cn_device *hdev)
+{
+ struct hbl_cn_port *cn_port;
+ int i;
+
+ for (i = 0; i < hdev->cn_props.max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ cn_port = &hdev->cn_ports[i];
+
+ flush_delayed_work(&cn_port->link_status_work);
+ }
+}
+
+static int find_first_enabled_port(struct hbl_cn_device *hdev, u32 *port)
+{
+ int i;
+
+ for (i = 0; i < NIC_NUMBER_OF_PORTS; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ *port = i;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static void fw_write_all(struct hbl_cn_device *hdev, u32 addr, u32 data)
+{
+ int port;
+
+ for (port = 0; port < NIC_NUMBER_OF_PORTS; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ NIC_PHY_WREG32(addr, data);
+ }
+}
+
+static void fw_write_all_lanes(struct hbl_cn_device *hdev, u32 addr, u32 data)
+{
+ int port, lane;
+
+ for (port = 0; port < NIC_NUMBER_OF_PORTS; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ for (lane = 0; lane < 2; lane++)
+ NIC_PHY_WREG32_LANE(addr, data);
+ }
+}
+
+static void fw_unload_all(struct hbl_cn_device *hdev)
+{
+ u32 port;
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9814, 0xFFF0);
+
+ for (port = 0; port < NIC_NUMBER_OF_PORTS; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ cpu_reset(hdev, port);
+ }
+
+ msleep(100);
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9814, 0x0);
+
+ /* PAM4 */
+ fw_write_all_lanes(hdev, NIC0_SERDES0_LANE0_REGISTER_0P11, 0);
+ usleep_range(1000, 2000);
+ fw_write_all_lanes(hdev, NIC0_SERDES0_LANE0_REGISTER_0P11, 0x2000);
+
+ /* NRZ */
+ fw_write_all_lanes(hdev, NIC0_SERDES0_LANE0_REGISTER_0N0B, 0);
+ fw_write_all_lanes(hdev, NIC0_SERDES0_LANE0_REGISTER_0N0C, 0);
+ usleep_range(1000, 2000);
+ fw_write_all_lanes(hdev, NIC0_SERDES0_LANE0_REGISTER_0N0C, 0x8000);
+}
+
+static u32 fw_crc(struct hbl_cn_device *hdev, u32 port)
+{
+ u32 checksum_code, ignore;
+
+ fw_cmd(hdev, port, 0xF001, NULL, 0xF, &ignore);
+ checksum_code = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9816);
+
+ return checksum_code;
+}
+
+static u32 fw_hash(struct hbl_cn_device *hdev, u32 port)
+{
+ u32 low_word, hash_code, res;
+
+ fw_cmd(hdev, port, 0xF000, NULL, 0xF, &res);
+ low_word = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9816);
+ hash_code = ((res & 0xFF) << 16) | low_word;
+
+ return hash_code;
+}
+
+static int mcu_cal_enable_all(struct hbl_cn_device *hdev)
+{
+ u32 port;
+ int rc;
+
+ for (port = 0; port < NIC_NUMBER_OF_PORTS; port++) {
+ if (!(hdev->ports_mask & BIT(port)))
+ continue;
+
+ rc = set_fw_reg(hdev, port, 357, NIC_PHY_FW_TIME_CONSTANT_RATIO);
+ if (rc) {
+ dev_dbg(hdev->dev, "Port %u: MCU calibration failed\n", port);
+ return rc;
+ }
+ }
+
+ return 0;
+}
+
+int gaudi2_cn_phy_fw_load_all(struct hbl_cn_device *hdev)
+{
+ u32 entry_point, length, ram_addr, sections, status, checks, checksum;
+ int rc, i, j, data_ptr = 0;
+ const struct firmware *fw;
+ const void *fw_data;
+ const char *fw_name;
+ u16 mdio_data;
+ u32 port; /* For regs read */
+
+ rc = find_first_enabled_port(hdev, &port);
+ if (rc)
+ return rc;
+
+ fw_name = gaudi2_cn_phy_get_fw_name();
+
+ fw_unload_all(hdev);
+
+ rc = request_firmware(&fw, fw_name, hdev->dev);
+ if (rc) {
+ dev_err(hdev->dev, "Firmware file %s is not found\n", fw_name);
+ return rc;
+ }
+
+ fw_data = (const void *)fw->data;
+ fw_data += 0x1000;
+
+ /* skip hash, crc and date */
+ entry_point = get_unaligned_be32(fw_data + 8);
+ length = get_unaligned_be32(fw_data + 12);
+ ram_addr = get_unaligned_be32(fw_data + 16);
+
+ dev_dbg(hdev->dev, "entry_point: 0x%x\n", entry_point);
+ dev_dbg(hdev->dev, "length: 0x%x\n", length);
+
+ fw_data += 20;
+
+ sections = DIV_ROUND_UP(length, 24);
+
+ dev_dbg(hdev->dev, "sections: %d\n", sections);
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9814, 0xFFF0); /* FW2 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_980D, 0x0AAA); /* FW1 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_980D, 0x0); /* FW1 */
+
+ checks = 0;
+
+ do {
+ usleep_range(10000, 20000);
+ status = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9814); /* FW2 */
+ dev_dbg(hdev->dev, "port %d, status: 0x%x\n", port, status);
+ if (checks++ > NIC_PHY_READ_COUNTS_PER_MS) {
+ dev_err(hdev->dev, "failed to load F/W, fw2 timeout 0x%x\n", status);
+ rc = -ETIMEDOUT;
+ goto release_fw;
+ }
+ } while (status);
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9814, 0x0);
+
+ for (i = 0; i <= sections; i++) {
+ checksum = 0x800C;
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0C, ram_addr >> 16); /* FW0 + 12 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0D, ram_addr & 0xFFFF); /* FW0 + 13 */
+ checksum += (ram_addr >> 16) + (ram_addr & 0xFFFF);
+
+ for (j = 0; j < 12; j++) {
+ if (data_ptr >= length)
+ mdio_data = 0;
+ else
+ mdio_data = get_unaligned_be16(fw_data + data_ptr);
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F00 + 4 * j, mdio_data);
+
+ checksum += mdio_data;
+ data_ptr += 2;
+ ram_addr += 2;
+ }
+
+ /* FW0 + 14 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0E, (~checksum + 1) & 0xFFFF);
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0F, 0x800C); /* FW0 + 15 */
+
+ checks = 0;
+
+ do {
+ usleep_range(1000, 2000);
+ status = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9F0F); /* FW0 + 15 */
+ if (checks++ > NIC_PHY_READ_COUNTS_PER_MS) {
+ dev_err(hdev->dev, "failed to load F/W, fw0 timeout 0x%x\n",
+ status);
+ rc = -ETIMEDOUT;
+ goto release_fw;
+ }
+ } while (status == 0x800C);
+ }
+
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0C, entry_point >> 16); /* FW0 + 12 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0D, entry_point & 0xFFFF); /* FW0 + 13 */
+ checksum = (entry_point >> 16) + (entry_point & 0xFFFF) + 0x4000;
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0E, (~checksum + 1) & 0xFFFF); /* FW0 + 14 */
+ fw_write_all(hdev, NIC0_SERDES0_REGISTER_9F0F, 0x4000); /* FW0 + 15 */
+
+ msleep(500);
+
+ dev_dbg(hdev->dev, "F/W CRC = 0x%x\n", fw_crc(hdev, port));
+ dev_dbg(hdev->dev, "F/W hash = 0x%x\n", fw_hash(hdev, port));
+
+ rc = mcu_cal_enable_all(hdev);
+
+release_fw:
+ release_firmware(fw);
+ return rc;
+}
+
+u16 gaudi2_cn_phy_get_crc(struct hbl_cn_device *hdev)
+{
+ u32 port;
+ int rc;
+
+ rc = find_first_enabled_port(hdev, &port);
+ if (rc)
+ return rc;
+
+ return fw_crc(hdev, port);
+}
+
+static bool is_old_phy_fw_loaded(struct hbl_cn_device *hdev)
+{
+ return gaudi2_cn_phy_get_crc(hdev) == 0x1723;
+}
+
+static bool is_phy_fw_with_anlt_support(struct hbl_cn_device *hdev)
+{
+ return gaudi2_cn_phy_get_crc(hdev) == 0x185E;
+}
+
+int gaudi2_cn_phy_init(struct hbl_cn_device *hdev)
+{
+ if (!hdev->phy_config_fw)
+ return 0;
+
+ /* Fail the initialization in case of an old PHY F/W, as the current PHY init flow won't
+ * work with it.
+ */
+ if (is_old_phy_fw_loaded(hdev)) {
+ dev_err(hdev->dev, "PHY F/W is very old - failing the initialization\n");
+ return -EINVAL;
+ }
+
+ /* In case LKD overrides the existing PHY F/W with an unofficial one and this F/W has ANLT
+ * support, ANLT will be enabled according to the mask.
+ * Otherwise, ANLT will be disabled on all ports.
+ */
+ if (hdev->load_phy_fw && is_phy_fw_with_anlt_support(hdev))
+ hdev->auto_neg_mask = hdev->phys_auto_neg_mask;
+ else
+ hdev->auto_neg_mask = 0;
+
+ /* If we got serdes info from FW, use it; otherwise set default values */
+ if (hdev->use_fw_serdes_info) {
+ set_fw_lane_mapping(hdev);
+ hbl_cn_phy_set_fw_polarity(hdev);
+ } else {
+ set_default_mac_lane_remap(hdev);
+ set_default_polarity_values(hdev);
+ }
+
+ /* Set the tx taps to their default values only once */
+ if (!hdev->skip_phy_default_tx_taps_cfg) {
+ set_default_tx_taps_values(hdev);
+ hdev->skip_phy_default_tx_taps_cfg = true;
+ }
+
+ return 0;
+}
+
+static int fw_read_s16(struct hbl_cn_device *hdev, u32 port, u32 offset)
+{
+ u32 t = NIC_PHY_RREG32(NIC0_SERDES0_REGISTER_9F00 + 4 * offset);
+
+ return (t & 0x8000) ? t - 0x10000 : t;
+}
+
+static void get_channel_estimation_params(struct hbl_cn_device *hdev, u32 port, int lane, u32 *of,
+ u32 *hf)
+{
+ struct hbl_cn_port *cn_port = &hdev->cn_ports[port];
+
+ if (cn_port->auto_neg_enable) {
+ *of = rv_debug(hdev, port, lane, 5, 22);
+ *hf = rv_debug(hdev, port, lane, 5, 23);
+ } else {
+ *of = rv_debug(hdev, port, lane, 2, 4);
+ *hf = rv_debug(hdev, port, lane, 2, 5);
+ }
+}
+
+static void get_tx_taps(struct hbl_cn_device *hdev, u32 port, int lane, int *tx_taps)
+{
+ u32 tx_pre2, tx_pre1, tx_main, tx_post1, tx_post2;
+
+ tx_pre2 = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA5) &
+ NIC0_SERDES0_LANE0_REGISTER_0PA5_TX_PRE_2_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PA5_TX_PRE_2_SHIFT;
+ tx_pre1 = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA7) &
+ NIC0_SERDES0_LANE0_REGISTER_0PA7_TX_PRE_1_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PA7_TX_PRE_1_SHIFT;
+ tx_main = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA9) &
+ NIC0_SERDES0_LANE0_REGISTER_0PA9_TX_MAIN_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PA9_TX_MAIN_SHIFT;
+ tx_post1 = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAB) &
+ NIC0_SERDES0_LANE0_REGISTER_0PAB_TX_POST_1_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PAB_TX_POST_1_SHIFT;
+ tx_post2 = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PAD) &
+ NIC0_SERDES0_LANE0_REGISTER_0PAD_TX_POST_2_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PAD_TX_POST_2_SHIFT;
+
+ tx_taps[0] = twos_to_int(tx_pre2, 8);
+ tx_taps[1] = twos_to_int(tx_pre1, 8);
+ tx_taps[2] = twos_to_int(tx_main, 8);
+ tx_taps[3] = twos_to_int(tx_post1, 8);
+ tx_taps[4] = twos_to_int(tx_post2, 8);
+}
+
+static void copy_info(char *buf, char *name, int *data, u8 count, ssize_t size)
+{
+ int i;
+
+ __snprintf(buf, size, "%s:", name);
+
+ for (i = 0; i < count; i++)
+ __snprintf(buf, size, " %d", data[i]);
+
+ __snprintf(buf, size, "\n");
+}
+
+static void dump_ber_info(struct hbl_cn_device *hdev, u32 port, int lane, char *buf, ssize_t size)
+{
+ struct hbl_cn_ber_info *ber_info;
+ u32 abs_lane_idx;
+
+ abs_lane_idx = (port << 1) + lane;
+ ber_info = &hdev->phy_ber_info[abs_lane_idx];
+
+ if (ber_info->valid)
+ __snprintf(buf, size, "BER: %llu.%llue-%u\n",
+ ber_info->integer, ber_info->frac, ber_info->exp);
+ else
+ __snprintf(buf, size, "No BER information\n");
+}
+
+void gaudi2_cn_phy_dump_serdes_params(struct hbl_cn_device *hdev, char *buf, size_t size)
+{
+ u32 port, card_location, sd, phy_ready, ch_est_of, ch_est_hf, ppm_twos, adapt_state;
+ int lane, i, ppm, eye[3], isi[18], tx_taps[5];
+ u8 tx_pol, rx_pol;
+ bool pam4;
+
+ port = hdev->phy_port_to_dump;
+ card_location = hdev->card_location;
+ pam4 = hdev->cn_ports[port].data_rate == NIC_DR_50;
+
+ __snprintf(buf, size, "\nmode: %s\n\n", pam4 ? "PAM4" : "NRZ");
+
+ for (lane = 0; lane < 2; lane++) {
+ sd = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P6A) &
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_READ_SIG_DET_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_READ_SIG_DET_SHIFT;
+
+ phy_ready = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P6A) &
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_RX_READ_PHY_READY_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0P6A_RX_READ_PHY_READY_SHIFT;
+ ppm_twos = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P73) &
+ NIC0_SERDES0_LANE0_REGISTER_0P73_READ_FREQ_ACC_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0P73_READ_FREQ_ACC_SHIFT;
+ ppm = twos_to_int(ppm_twos, 11);
+ adapt_state = rv_debug(hdev, port, lane, 2, 0);
+
+ tx_pol = (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0PA0) &
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_ANA_OUT_FLIP_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0PA0_TX_ANA_OUT_FLIP_SHIFT;
+
+ rx_pol = pam4 ? (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0P43) &
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_DATA_FLIP_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0P43_RX_DATA_FLIP_SHIFT :
+ (NIC_PHY_RREG32_LANE(NIC0_SERDES0_LANE0_REGISTER_0N61) &
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CHECK_FLIP_MASK) >>
+ NIC0_SERDES0_LANE0_REGISTER_0N61_PRBS_CHECK_FLIP_SHIFT;
+
+ get_channel_estimation_params(hdev, port, lane, &ch_est_of, &ch_est_hf);
+
+ rv_debug(hdev, port, lane, 0xA, 5);
+ for (i = 0; i < 3; i++)
+ eye[i] = fw_read_s16(hdev, port, i);
+
+ rv_debug(hdev, port, lane, 0xA, 0);
+ for (i = 0; i < 16; i++)
+ isi[i] = fw_read_s16(hdev, port, i);
+
+ rv_debug(hdev, port, lane, 0xA, 8);
+ for (i = 0; i < 2; i++)
+ isi[16 + i] = fw_read_s16(hdev, port, i);
+
+ get_tx_taps(hdev, port, lane, tx_taps);
+
+ __snprintf(buf, size, "Card %u Port %u lane %d:\n", card_location, port, lane);
+ __snprintf(buf, size,
+ "sd: %u\nphy_ready: %u\nppm: %d\nch_est_of: %u\nch_est_hf: %u\n"
+ "adaptation state: 0x%x\ntx_pol: %u\nrx_pol: %u\n", sd, phy_ready, ppm,
+ ch_est_of, ch_est_hf, adapt_state, tx_pol, rx_pol);
+ copy_info(buf, "eyes", eye, 3, size);
+ copy_info(buf, "isi", isi, 18, size);
+ copy_info(buf, "tx_taps", tx_taps, 5, size);
+
+ dump_ber_info(hdev, port, lane, buf, size);
+
+ __snprintf(buf, size, "\n");
+ }
+}
diff --git a/include/linux/net/intel/gaudi2.h b/include/linux/net/intel/gaudi2.h
new file mode 100644
index 000000000000..49dd1b6b7c86
--- /dev/null
+++ b/include/linux/net/intel/gaudi2.h
@@ -0,0 +1,432 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_GAUDI2_H_
+#define HBL_GAUDI2_H_
+
+#include <linux/net/intel/cn.h>
+#include <linux/net/intel/gaudi2_aux.h>
+
+#define NIC_PSN_NBITS 24
+#define NIC_PSN_MSB_MASK (BIT(NIC_PSN_NBITS - 1))
+#define NIC_PSN_LOWER_MASK ((NIC_PSN_MSB_MASK) - 1)
+
+#define NIC_IS_PSN_CYCLIC_BIG(psn_a, psn_b) \
+ ({ \
+ u32 _psn_a = (psn_a); \
+ u32 _psn_b = (psn_b); \
+ ((_psn_a & NIC_PSN_MSB_MASK) == (_psn_b & NIC_PSN_MSB_MASK) ? \
+ (_psn_a & NIC_PSN_LOWER_MASK) > (_psn_b & NIC_PSN_LOWER_MASK) : \
+ (_psn_a & NIC_PSN_LOWER_MASK) < (_psn_b & NIC_PSN_LOWER_MASK)); \
+ })
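A userspace sketch of the same comparison shows how the MSB split makes it robust to 24-bit PSN wraparound (the constants and logic mirror the macros above; the function form is only for illustration):

```c
#include <stdint.h>
#include <assert.h>

#define NIC_PSN_NBITS 24
#define NIC_PSN_MSB_MASK (1u << (NIC_PSN_NBITS - 1))
#define NIC_PSN_LOWER_MASK (NIC_PSN_MSB_MASK - 1)

/* Same logic as NIC_IS_PSN_CYCLIC_BIG: when both PSNs are in the same half
 * of the 24-bit space, compare the lower bits directly; when they are in
 * different halves, the one in the lower half is assumed to have wrapped
 * and is therefore considered bigger.
 */
static int psn_cyclic_big(uint32_t a, uint32_t b)
{
	if ((a & NIC_PSN_MSB_MASK) == (b & NIC_PSN_MSB_MASK))
		return (a & NIC_PSN_LOWER_MASK) > (b & NIC_PSN_LOWER_MASK);
	return (a & NIC_PSN_LOWER_MASK) < (b & NIC_PSN_LOWER_MASK);
}
```

So a PSN of 1 that has just wrapped past 0xFFFFFF still compares as bigger than 0xFFFFFE.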
+
+enum gaudi2_wqe_opcode {
+ WQE_NOP = 0,
+ WQE_SEND = 1,
+ WQE_LINEAR = 2,
+ WQE_STRIDE = 3,
+ WQE_MULTI_STRIDE = 4,
+ WQE_RENDEZVOUS_WR = 5,
+ WQE_RENDEZVOUS_RD = 6,
+ WQE_QOS_UPDATE = 7
+};
+
+#define NIC_SKB_PAD_SIZE 187
+
+/**
+ * enum gaudi2_eqe_type - Event queue element types for the NIC.
+ * @EQE_COMP: Completion queue event. May occur upon Ethernet or RDMA Rx completion.
+ * @EQE_COMP_ERR: Completion queue error event. May occur upon CQ overrun or other errors. Overrun
+ * may occur if the S/W doesn't consume the CQ entries fast enough, leaving no
+ * room for new H/W entries.
+ * @EQE_QP_ERR: QP moved to error state event. May occur due to various QP errors, e.g. invalid
+ * QP, invalid QP state, etc.
+ * @EQE_LINK_STATUS: PCS link status changed event. May occur upon link up/down events.
+ * @EQE_RAW_TX_COMP: Ethernet Tx completion event. Occurs once the H/W completes sending an
+ * Ethernet packet.
+ * @EQE_DB_FIFO_OVERRUN: DB FIFO overrun event. May occur if the S/W overwrites an unconsumed
+ * FIFO entry.
+ * @EQE_CONG: Congestion control completion queue event. May occur upon any packet completion if
+ * CC is enabled.
+ * @EQE_CONG_ERR: Congestion control completion queue error event. May occur upon CCQ overrun or
+ * other errors. Overrun may occur if the S/W doesn't consume the CCQ entries
+ * fast enough, leaving no room for new H/W entries.
+ * @EQE_RESERVED: Reserved event value.
+ *
+ * ******************* SW events *******************
+ * @EQE_QP_ALIGN_COUNTERS: QPC sanity check failed and the QPC counters were reset to the last
+ * valid values.
+ */
+enum gaudi2_eqe_type {
+ EQE_COMP = 0x0,
+ EQE_COMP_ERR = 0x1,
+ EQE_QP_ERR = 0x2,
+ EQE_LINK_STATUS = 0x3,
+ EQE_RAW_TX_COMP = 0x4,
+ EQE_DB_FIFO_OVERRUN = 0x5,
+ EQE_CONG = 0x6,
+ EQE_CONG_ERR = 0x7,
+ EQE_RESERVED = 0x8,
+
+ /* events triggered by SW */
+ EQE_QP_ALIGN_COUNTERS = 0xa,
+};
+
+/* The BIT() macro overflows on a full 64-bit mask, so use the safer BITMLL() instead */
+#define BITMLL(nr) (U64_MAX >> (64 - (nr)))
+
+/* Use multiple underscores to avoid shadowing collisions. If NIC_SET() reused len and _len like
+ * NIC_SET_BITS() does, its local would shadow the caller's and len would evaluate to 0 here.
+ */
+#define NIC_SET(desc, idx, shift, val, __len) \
+ ({ \
+ u64 *_data = &(desc).data[(idx)]; \
+ u32 _shift = (shift); \
+ u32 ___len = (__len); \
+ *_data &= ~((u64)(BITMLL(___len)) << _shift); \
+ *_data |= (u64)((val) & BITMLL(___len)) << _shift; \
+ })
+
+#define NIC_SET_BITS(desc, lsb, val, len) \
+ do { \
+ u32 _lsb = (lsb); \
+ u32 _len = (len); \
+ BUILD_BUG_ON((_lsb / 64) != ((_lsb + _len - 1) / 64)); \
+ NIC_SET((desc), _lsb / 64, _lsb % 64, (val), _len); \
+ } while (0)
+
+#define NIC_GET(desc, idx, shift, len) \
+ ((((desc).data[idx]) >> (shift)) & BITMLL(len))
+
+#define NIC_GET_BITS(desc, lsb, len) \
+ ({ \
+ u32 _lsb = (lsb); \
+ u32 _len = (len); \
+ BUILD_BUG_ON((_lsb / 64) != ((_lsb + _len - 1) / 64)); \
+ NIC_GET(desc, _lsb / 64, _lsb % 64, _len); \
+ })
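The accessors above address a field by its absolute LSB position inside the context (up to 1024 bits for the requester QPC), with BUILD_BUG_ON() rejecting any field that would cross a 64-bit word boundary. A simplified userspace model of the set/get pair (without the compile-time check; the function names are for illustration only):

```c
#include <stdint.h>
#include <assert.h>

#define BITMLL(nr) (UINT64_MAX >> (64 - (nr)))	/* nr must be 1..64 */

struct qpc { uint64_t data[16]; };	/* 1024-bit context, as in gaudi2_qpc_requester */

/* Simplified models of NIC_SET_BITS()/NIC_GET_BITS(): 'lsb' is the field's
 * absolute bit offset in the context; the field must fit in one 64-bit word.
 */
static void qpc_set_bits(struct qpc *q, unsigned int lsb, uint64_t val, unsigned int len)
{
	uint64_t *w = &q->data[lsb / 64];
	unsigned int shift = lsb % 64;

	*w &= ~(BITMLL(len) << shift);		/* clear the field */
	*w |= (val & BITMLL(len)) << shift;	/* write the new value */
}

static uint64_t qpc_get_bits(const struct qpc *q, unsigned int lsb, unsigned int len)
{
	return (q->data[lsb / 64] >> (lsb % 64)) & BITMLL(len);
}
```

For instance, the timeout retry count at bit 248 lands in word 3 at shift 56, which is exactly how REQ_QPC_SET_TIMEOUT_RETRY_COUNT() resolves below.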
+
+struct gaudi2_qpc_requester {
+ u64 data[16];
+};
+
+struct qpc_mask {
+ u64 data[sizeof(struct gaudi2_qpc_requester) >> 3];
+};
+
+#define REQ_QPC_SET_DST_QP(req, val) NIC_SET_BITS(req, 0, val, 24)
+#define REQ_QPC_SET_RKEY(req, val) NIC_SET_BITS(req, 128, val, 32)
+#define REQ_QPC_SET_DST_IP(req, val) NIC_SET_BITS(req, 160, val, 32)
+#define REQ_QPC_SET_DST_MAC_LSB(req, val) NIC_SET_BITS(req, 192, val, 32)
+#define REQ_QPC_SET_DST_MAC_MSB(req, val) NIC_SET_BITS(req, 224, val, 16)
+#define REQ_QPC_SET_TIMEOUT_RETRY_COUNT(req, val) \
+ NIC_SET_BITS(req, 248, val, 8)
+#define REQ_QPC_SET_NTS_PSN(req, val) NIC_SET_BITS(req, 256, val, 24)
+#define REQ_QPC_SET_BCS_PSN(req, val) NIC_SET_BITS(req, 288, val, 24)
+#define REQ_QPC_SET_SCHD_Q_NUM(req, val) NIC_SET_BITS(req, 312, val, 8)
+#define REQ_QPC_SET_ONA_PSN(req, val) NIC_SET_BITS(req, 320, val, 24)
+
+#define REQ_QPC_SET_TM_GRANULARITY(req, val) NIC_SET_BITS(req, 376, val, 7)
+#define REQ_QPC_SET_WQ_BACK_PRESSURE(req, val) NIC_SET_BITS(req, 383, val, 1)
+#define REQ_QPC_SET_REMOTE_WQ_LOG_SZ(req, val) NIC_SET_BITS(req, 408, val, 5)
+#define REQ_QPC_SET_ENCAP_TYPE(req, val) NIC_SET_BITS(req, 413, val, 3)
+#define REQ_QPC_SET_CQ_NUM(req, val) NIC_SET_BITS(req, 440, val, 5)
+#define REQ_QPC_SET_RTT_STATE(req, val) NIC_SET_BITS(req, 445, val, 2)
+#define REQ_QPC_SET_ENCAP_ENABLE(req, val) NIC_SET_BITS(req, 447, val, 1)
+#define REQ_QPC_SET_CONGESTION_WIN(req, val) NIC_SET_BITS(req, 448, val, 24)
+#define REQ_QPC_SET_BURST_SIZE(req, val) NIC_SET_BITS(req, 544, val, 22)
+#define REQ_QPC_SET_ASID(req, val) NIC_SET_BITS(req, 566, val, 8)
+
+#define REQ_QPC_SET_LAST_IDX(req, val) NIC_SET_BITS(req, 576, val, 22)
+#define REQ_QPC_SET_EXECUTION_IDX(req, val) NIC_SET_BITS(req, 608, val, 22)
+#define REQ_QPC_SET_CONSUMER_IDX(req, val) NIC_SET_BITS(req, 640, val, 22)
+#define REQ_QPC_SET_LOCAL_PRODUCER_IDX(req, val) NIC_SET_BITS(req, 672, val, 22)
+#define REQ_QPC_SET_REMOTE_PRODUCER_IDX(req, val) NIC_SET_BITS(req, 704, val, 22)
+#define REQ_QPC_SET_REMOTE_CONSUMER_IDX(req, val) NIC_SET_BITS(req, 736, val, 22)
+#define REQ_QPC_SET_OLDEST_UNACKED_REMOTE_PRODUCER_IDX(req, val) NIC_SET_BITS(req, 768, \
+ val, 22)
+#define REQ_QPC_SET_PSN_SINCE_ACKREQ(req, val) NIC_SET_BITS(req, 800, val, 8)
+#define REQ_QPC_SET_ACKREQ_FREQ(req, val) NIC_SET_BITS(req, 808, val, 8)
+#define REQ_QPC_SET_PACING_TIME(req, val) NIC_SET_BITS(req, 832, val, 16)
+
+#define REQ_QPC_SET_DATA_MMU_BYPASS(req, val) NIC_SET_BITS(req, 1003, val, 1)
+#define REQ_QPC_SET_MOD_GAUDI1(req, val) NIC_SET_BITS(req, 1004, val, 1)
+#define REQ_QPC_SET_PORT(req, val) NIC_SET_BITS(req, 1005, val, 2)
+#define REQ_QPC_SET_WQ_TYPE(req, val) NIC_SET_BITS(req, 1007, val, 2)
+
+#define REQ_QPC_SET_SWQ_GRANULARITY(req, val) NIC_SET_BITS(req, 1009, val, 1)
+#define REQ_QPC_SET_TRANSPORT_SERVICE(req, val) NIC_SET_BITS(req, 1010, val, 1)
+#define REQ_QPC_SET_PRIORITY(req, val) NIC_SET_BITS(req, 1011, val, 2)
+#define REQ_QPC_SET_CONGESTION_MODE(req, val) NIC_SET_BITS(req, 1013, val, 2)
+#define REQ_QPC_SET_MTU(req, val) NIC_SET_BITS(req, 1015, val, 2)
+
+#define REQ_QPC_SET_WQ_BASE_ADDR(req, val) NIC_SET_BITS(req, 1017, val, 2)
+#define REQ_QPC_SET_TRUST_LEVEL(req, val) NIC_SET_BITS(req, 1019, val, 2)
+#define REQ_QPC_SET_ERR(req, val) NIC_SET_BITS(req, 1022, val, 1)
+#define REQ_QPC_SET_VALID(req, val) NIC_SET_BITS(req, 1023, val, 1)
+
+/* REQ QPC Get */
+#define REQ_QPC_GET_DST_QP(req) NIC_GET_BITS(req, 0, 24)
+#define REQ_QPC_GET_MULTI_STRIDE_STATE_LSB(req) NIC_GET_BITS(req, 32, 32)
+#define REQ_QPC_GET_MULTI_STRIDE_STATE_MSB(req) NIC_GET_BITS(req, 64, 64)
+#define REQ_QPC_GET_RKEY(req) NIC_GET_BITS(req, 128, 32)
+#define REQ_QPC_GET_DST_IP(req) NIC_GET_BITS(req, 160, 32)
+#define REQ_QPC_GET_DST_MAC_LSB(req) NIC_GET_BITS(req, 192, 32)
+#define REQ_QPC_GET_DST_MAC_MSB(req) NIC_GET_BITS(req, 224, 16)
+#define REQ_QPC_GET_SEQUENCE_ERROR_RETRY_COUNT(req) NIC_GET_BITS(req, 240, 8)
+#define REQ_QPC_GET_TIMEOUT_RETRY_COUNT(req) NIC_GET_BITS(req, 248, 8)
+#define REQ_QPC_GET_NTS_PSN(req) NIC_GET_BITS(req, 256, 24)
+#define REQ_QPC_GET_BCS_PSN(req) NIC_GET_BITS(req, 288, 24)
+#define REQ_QPC_GET_SCHD_Q_NUM(req) NIC_GET_BITS(req, 312, 8)
+#define REQ_QPC_GET_ONA_PSN(req) NIC_GET_BITS(req, 320, 24)
+#define REQ_QPC_GET_BCC_PSN(req) NIC_GET_BITS(req, 352, 24)
+#define REQ_QPC_GET_TM_GRANULARITY(req) NIC_GET_BITS(req, 376, 7)
+#define REQ_QPC_GET_WQ_BACK_PRESSURE(req) NIC_GET_BITS(req, 383, 1)
+#define REQ_QPC_GET_CONGESTION_MARKED_ACK(req) NIC_GET_BITS(req, 384, 24)
+#define REQ_QPC_GET_REMOTE_WQ_LOG_SZ(req) NIC_GET_BITS(req, 408, 5)
+#define REQ_QPC_GET_ENCAP_TYPE(req) NIC_GET_BITS(req, 413, 3)
+#define REQ_QPC_GET_CONGESTION_NON_MARKED_ACK(req) NIC_GET_BITS(req, 416, 24)
+#define REQ_QPC_GET_CQ_NUM(req) NIC_GET_BITS(req, 440, 5)
+#define REQ_QPC_GET_RTT_STATE(req) NIC_GET_BITS(req, 445, 2)
+#define REQ_QPC_GET_ENCAP_ENABLE(req) NIC_GET_BITS(req, 447, 1)
+#define REQ_QPC_GET_CONGESTION_WIN(req) NIC_GET_BITS(req, 448, 24)
+#define REQ_QPC_GET_RTT_TIMESTAMP(req) NIC_GET_BITS(req, 480, 25)
+#define REQ_QPC_GET_RTT_MARKED_PSN(req) NIC_GET_BITS(req, 512, 24)
+#define REQ_QPC_GET_BURST_SIZE(req) NIC_GET_BITS(req, 544, 22)
+#define REQ_QPC_GET_ASID(req) NIC_GET_BITS(req, 566, 10)
+#define REQ_QPC_GET_LAST_IDX(req) NIC_GET_BITS(req, 576, 22)
+#define REQ_QPC_GET_EXECUTION_IDX(req) NIC_GET_BITS(req, 608, 22)
+#define REQ_QPC_GET_CONSUMER_IDX(req) NIC_GET_BITS(req, 640, 22)
+#define REQ_QPC_GET_LOCAL_PRODUCER_IDX(req) NIC_GET_BITS(req, 672, 22)
+#define REQ_QPC_GET_REMOTE_PRODUCER_IDX(req) NIC_GET_BITS(req, 704, 22)
+#define REQ_QPC_GET_REMOTE_CONSUMER_IDX(req) NIC_GET_BITS(req, 736, 22)
+#define REQ_QPC_GET_OLDEST_UNACKED_REMOTE_PRODUCER_IDX(req) NIC_GET_BITS(req, 768, 22)
+#define REQ_QPC_GET_PSN_SINCE_ACKREQ(req) NIC_GET_BITS(req, 800, 8)
+#define REQ_QPC_GET_ACKREQ_FREQ(req) NIC_GET_BITS(req, 808, 8)
+#define REQ_QPC_GET_PACING_TIME(req) NIC_GET_BITS(req, 832, 16)
+#define REQ_QPC_GET_PSN_DELIVERED(req) NIC_GET_BITS(req, 864, 24)
+#define REQ_QPC_GET_DATA_MMU_BYPASS(req) NIC_GET_BITS(req, 1003, 1)
+#define REQ_QPC_GET_MOD_GAUDI1(req) NIC_GET_BITS(req, 1004, 1)
+#define REQ_QPC_GET_PORT(req) NIC_GET_BITS(req, 1005, 2)
+#define REQ_QPC_GET_WQ_TYPE(req) NIC_GET_BITS(req, 1007, 2)
+#define REQ_QPC_GET_SWQ_GRANULARITY(req) NIC_GET_BITS(req, 1009, 1)
+#define REQ_QPC_GET_TRANSPORT_SERVICE(req) NIC_GET_BITS(req, 1010, 1)
+#define REQ_QPC_GET_PRIORITY(req) NIC_GET_BITS(req, 1011, 2)
+#define REQ_QPC_GET_CONGESTION_MODE(req) NIC_GET_BITS(req, 1013, 2)
+#define REQ_QPC_GET_MTU(req) NIC_GET_BITS(req, 1015, 2)
+#define REQ_QPC_GET_WQ_BASE_ADDR(req) NIC_GET_BITS(req, 1017, 2)
+#define REQ_QPC_GET_TRUST_LEVEL(req) NIC_GET_BITS(req, 1019, 2)
+#define REQ_QPC_GET_IN_WORK(req) NIC_GET_BITS(req, 1021, 1)
+#define REQ_QPC_GET_ERROR(req) NIC_GET_BITS(req, 1022, 1)
+#define REQ_QPC_GET_VALID(req) NIC_GET_BITS(req, 1023, 1)
+
+/* Resp QPC */
+struct gaudi2_qpc_responder {
+ u64 data[4];
+};
+
+#define RES_QPC_SET_DST_QP(res, val) NIC_SET_BITS(res, 0, val, 24)
+#define RES_QPC_SET_PORT(res, val) NIC_SET_BITS(res, 24, val, 2)
+#define RES_QPC_SET_PRIORITY(res, val) NIC_SET_BITS(res, 26, val, 2)
+#define RES_QPC_SET_LKEY(res, val) NIC_SET_BITS(res, 32, val, 32)
+
+#define RES_QPC_SET_DST_IP(res, val) NIC_SET_BITS(res, 64, val, 32)
+#define RES_QPC_SET_DST_MAC_LSB(res, val) NIC_SET_BITS(res, 96, val, 32)
+#define RES_QPC_SET_DST_MAC_MSB(res, val) NIC_SET_BITS(res, 128, val, 16)
+#define RES_QPC_SET_TRANSPORT_SERVICE(res, val) NIC_SET_BITS(res, 149, val, 1)
+
+#define RES_QPC_SET_ASID(res, val) NIC_SET_BITS(res, 150, val, 10)
+#define RES_QPC_SET_PEER_QP(res, val) NIC_SET_BITS(res, 160, val, 24)
+#define RES_QPC_SET_SCHD_Q_NUM(res, val) NIC_SET_BITS(res, 184, val, 8)
+
+#define RES_QPC_SET_TRUST_LEVEL(res, val) NIC_SET_BITS(res, 216, val, 2)
+#define RES_QPC_SET_MOD_GAUDI1(res, val) NIC_SET_BITS(res, 218, val, 1)
+#define RES_QPC_SET_DATA_MMU_BYPASS(res, val) NIC_SET_BITS(res, 219, val, 1)
+#define RES_QPC_SET_ENCAP_TYPE(res, val) NIC_SET_BITS(res, 220, val, 3)
+#define RES_QPC_SET_ENCAP_ENABLE(res, val) NIC_SET_BITS(res, 223, val, 1)
+#define RES_QPC_SET_CQ_NUM(res, val) NIC_SET_BITS(res, 248, val, 5)
+
+#define RES_QPC_SET_PEER_WQ_GRAN(res, val) NIC_SET_BITS(res, 253, val, 1)
+#define RES_QPC_SET_VALID(res, val) NIC_SET_BITS(res, 255, val, 1)
+
+/* Resp QPC Get */
+#define RES_QPC_GET_DESTINATION_QP(res) NIC_GET_BITS(res, 0, 24)
+#define RES_QPC_GET_PORT(res) NIC_GET_BITS(res, 24, 2)
+#define RES_QPC_GET_PRIORITY(res) NIC_GET_BITS(res, 26, 2)
+#define RES_QPC_GET_CONN_STATE(res) NIC_GET_BITS(res, 28, 2)
+#define RES_QPC_GET_NACK_SYNDROME(res) NIC_GET_BITS(res, 30, 2)
+#define RES_QPC_GET_LKEY(res) NIC_GET_BITS(res, 32, 32)
+#define RES_QPC_GET_DST_IP(res) NIC_GET_BITS(res, 64, 32)
+#define RES_QPC_GET_DST_MAC_LSB(res) NIC_GET_BITS(res, 96, 32)
+#define RES_QPC_GET_DST_MAC_MSB(res) NIC_GET_BITS(res, 128, 16)
+#define RES_QPC_GET_ECN_COUNT(res) NIC_GET_BITS(res, 144, 5)
+#define RES_QPC_GET_TRANSPORT_SERVICE(res) NIC_GET_BITS(res, 149, 1)
+#define RES_QPC_GET_ASID(res) NIC_GET_BITS(res, 150, 10)
+#define RES_QPC_GET_PEER_QP(res) NIC_GET_BITS(res, 160, 24)
+#define RES_QPC_GET_SCHD_Q_NUM(res) NIC_GET_BITS(res, 184, 8)
+#define RES_QPC_GET_EXPECTED_PSN(res) NIC_GET_BITS(res, 192, 24)
+#define RES_QPC_GET_TRUST_LEVEL(res) NIC_GET_BITS(res, 216, 2)
+#define RES_QPC_GET_MOD_GAUDI1(res) NIC_GET_BITS(res, 218, 1)
+#define RES_QPC_GET_DATA_MMU_BYPASS(res) NIC_GET_BITS(res, 219, 1)
+#define RES_QPC_GET_ENCAP_TYPE(res) NIC_GET_BITS(res, 220, 3)
+#define RES_QPC_GET_ENCAP_ENABLE(res) NIC_GET_BITS(res, 223, 1)
+#define RES_QPC_GET_CYCLIC_IDX(res) NIC_GET_BITS(res, 224, 24)
+#define RES_QPC_GET_CQ_NUM(res) NIC_GET_BITS(res, 248, 5)
+#define RES_QPC_GET_PEER_WQ_GRAN(res) NIC_GET_BITS(res, 253, 1)
+#define RES_QPC_GET_IN_WORK(res) NIC_GET_BITS(res, 254, 1)
+#define RES_QPC_GET_VALID(res) NIC_GET_BITS(res, 255, 1)
+
+struct gaudi2_sq_wqe {
+ u64 data[4];
+};
+
+/* TX WQE Get */
+#define TX_WQE_GET_OPCODE(wqe) NIC_GET_BITS(wqe, 0, 5)
+#define TX_WQE_GET_TRACE_EVENT_DATA(wqe) NIC_GET_BITS(wqe, 5, 1)
+#define TX_WQE_GET_TRACE_EVENT(wqe) NIC_GET_BITS(wqe, 6, 1)
+#define TX_WQE_GET_WQE_INDEX(wqe) NIC_GET_BITS(wqe, 8, 8)
+#define TX_WQE_GET_REDUCTION_OPCODE(wqe) NIC_GET_BITS(wqe, 16, 13)
+#define TX_WQE_GET_SE(wqe) NIC_GET_BITS(wqe, 29, 1)
+#define TX_WQE_GET_INLINE(wqe) NIC_GET_BITS(wqe, 30, 1)
+#define TX_WQE_GET_ACKREQ(wqe) NIC_GET_BITS(wqe, 31, 1)
+#define TX_WQE_GET_SIZE(wqe) NIC_GET_BITS(wqe, 32, 32)
+#define TX_WQE_GET_LOCAL_ADDR_LSB(wqe) NIC_GET_BITS(wqe, 64, 32)
+#define TX_WQE_GET_LOCAL_ADDR_MSB(wqe) NIC_GET_BITS(wqe, 96, 32)
+#define TX_WQE_GET_REMOTE_ADDR_LSB(wqe) NIC_GET_BITS(wqe, 128, 32)
+#define TX_WQE_GET_REMOTE_ADDR_MSB(wqe) NIC_GET_BITS(wqe, 160, 32)
+#define TX_WQE_GET_TAG(wqe) NIC_GET_BITS(wqe, 192, 32)
+#define TX_WQE_GET_REMOTE_SOB(wqe) NIC_GET_BITS(wqe, 224, 27)
+#define TX_WQE_GET_REMOTE_SOB_DATA(wqe) NIC_GET_BITS(wqe, 251, 2)
+#define TX_WQE_GET_SOB_CMD(wqe) NIC_GET_BITS(wqe, 253, 1)
+#define TX_WQE_GET_COMPLETION_TYPE(wqe) NIC_GET_BITS(wqe, 254, 2)
+
+/* TX WQE Set */
+#define CFG_SQ_WQE_RESET(swq) memset((swq)->data, 0, sizeof(u64) * 4)
+
+#define CFG_SQ_WQE_OPCODE(swq, val) \
+ ((swq)->data[0] |= (val))
+#define CFG_SQ_WQE_INDEX(swq, val) \
+ ((swq)->data[0] |= (val) << 8)
+#define CFG_SQ_WQE_SOL_EVENT(swq, val) \
+ ((swq)->data[0] |= (val) << 29)
+#define CFG_SQ_WQE_INLINE(swq, val) \
+ ((swq)->data[0] |= (val) << 30)
+#define CFG_SQ_WQE_SIZE(swq, val) \
+ ((swq)->data[0] |= (val) << 32)
+#define CFG_SQ_WQE_LOCAL_ADDRESS(swq, val) \
+ ((swq)->data[1] = (val))
+struct gaudi2_rq_wqe {
+ u64 data[2];
+};
+
+/* RX WQE Get */
+#define RX_WQE_GET_OPCODE(wqe) NIC_GET_BITS(wqe, 0, 5)
+#define RX_WQE_GET_WQE_INDEX(wqe) NIC_GET_BITS(wqe, 8, 8)
+#define RX_WQE_GET_SOB_CMD(wqe) NIC_GET_BITS(wqe, 31, 1)
+#define RX_WQE_GET_LOCAL_SOB(wqe) NIC_GET_BITS(wqe, 32, 27)
+#define RX_WQE_GET_LOCAL_SOB_DATA(wqe) NIC_GET_BITS(wqe, 59, 3)
+#define RX_WQE_GET_COMPLETION_TYPE(wqe) NIC_GET_BITS(wqe, 62, 2)
+#define RX_WQE_GET_SIZE(wqe) NIC_GET_BITS(wqe, 64, 32)
+#define RX_WQE_GET_TAG(wqe) NIC_GET_BITS(wqe, 96, 32)
+
+struct gaudi2_cqe {
+ u32 data[4];
+};
+
+#define CQE_IS_VALID(cqe) (((cqe)->data[0] >> 31) & 1)
+#define CQE_IS_REQ(cqe) (((cqe)->data[0] >> 24) & 1)
+#define CQE_QPN(cqe) ((cqe)->data[0] & 0xFFFFFF)
+#define CQE_SET_INVALID(cqe) ((cqe)->data[0] &= ~(1ull << 31))
+#define CQE_WQE_IDX(cqe) ((cqe)->data[1])
+#define CQE_TAG(cqe) ((cqe)->data[2])
+#define CQE_RAW_PKT_SIZE(cqe) ((cqe)->data[3])
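A standalone sketch of how a completion entry would be consumed with these accessors (struct and bit layout copied from this header; the consume helper itself is illustrative):

```c
#include <stdint.h>
#include <assert.h>

struct gaudi2_cqe { uint32_t data[4]; };

/* Same layout as above: dword 0 holds the valid bit (31), the requester
 * flag (24) and the QPN (bits 23:0).
 */
#define CQE_IS_VALID(cqe) (((cqe)->data[0] >> 31) & 1)
#define CQE_QPN(cqe) ((cqe)->data[0] & 0xFFFFFF)
#define CQE_SET_INVALID(cqe) ((cqe)->data[0] &= ~(1ull << 31))

/* Consume one CQE: return the QPN if the entry was valid, -1 otherwise */
static long consume_cqe(struct gaudi2_cqe *cqe)
{
	if (!CQE_IS_VALID(cqe))
		return -1;
	CQE_SET_INVALID(cqe);	/* mark as consumed so it is not re-read */
	return CQE_QPN(cqe);
}
```

Clearing the valid bit after the read is what lets the polling loop distinguish fresh H/W entries from ones it has already processed.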
+
+#define EQE_HEADER(valid, type) ((!!(valid) << 31) | (type))
+#define EQE_TYPE(eqe) ((eqe)->data[0] & 0xf)
+#define EQE_IS_VALID(eqe) (((eqe)->data[0] >> 31) & 0x1)
+#define EQE_SET_INVALID(eqe) ((eqe)->data[0] &= ~(1ull << 31))
+#define EQE_CQ_EVENT_CQ_NUM(eqe) ((eqe)->data[1] & 0xffff)
+#define EQE_CQ_EVENT_PI(eqe) ((eqe)->data[2])
+#define EQE_CQ_EVENT_CCQ_NUM(eqe) ((eqe)->data[1] & 0xffff)
+
+#define EQE_QP_EVENT_QPN(eqe) ((eqe)->data[1] & 0xffffff)
+#define EQE_QP_EVENT_RESET(eqe) (((eqe)->data[1] >> 31) & 0x1)
+#define EQE_QP_EVENT_ERR_SYND(eqe) ((eqe)->data[2])
+
+#define EQE_RAW_TX_EVENT_QPN(eqe) ((eqe)->data[1] & 0xffffff)
+#define EQE_RAW_TX_EVENT_IDX(eqe) ((eqe)->data[2] & 0xffffffff)
+
+#define EQE_LINK_STATUS_TIME_STAMP(eqe) ((eqe)->data[1])
+#define EQE_LINK_STATUS(eqe) ((eqe)->data[2] & 0xf)
+
+#define EQE_DB_EVENT_DB_NUM(eqe) ((eqe)->data[1] & 0xffff)
+
+#define EQE_SW_EVENT_QPN(eqe) ((eqe)->data[1] & 0xffffff)
+
+#define EQ_IDX_MASK GENMASK(23, 0)
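The event queue entries follow the same valid-bit pattern; a minimal sketch of decoding one with EQE_HEADER()/EQE_TYPE() (accessors copied from above, with an explicit u32 cast added for a clean userspace build; the decode helper is illustrative):

```c
#include <stdint.h>
#include <assert.h>

struct gaudi2_eqe { uint32_t data[4]; };

#define EQE_HEADER(valid, type) (((uint32_t)!!(valid) << 31) | (type))
#define EQE_TYPE(eqe) ((eqe)->data[0] & 0xf)
#define EQE_IS_VALID(eqe) (((eqe)->data[0] >> 31) & 0x1)

enum { EQE_COMP = 0x0, EQE_LINK_STATUS = 0x3 };

/* Decode a single event: return its type, or -1 if the entry is empty */
static int decode_eqe(const struct gaudi2_eqe *eqe)
{
	if (!EQE_IS_VALID(eqe))
		return -1;
	return EQE_TYPE(eqe);	/* caller dispatches on the gaudi2_eqe_type value */
}
```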
+
+/**
+ * struct gaudi2_en_tx_buf - describes a Tx buffer
+ * @skb: the transmitted skb
+ * @dma_addr: the skb's mapped dma address
+ * @len: buffer size
+ */
+struct gaudi2_en_tx_buf {
+ struct sk_buff *skb;
+ dma_addr_t dma_addr;
+ int len;
+};
+
+/**
+ * struct gaudi2_en_aux_data - Gaudi2 Ethernet driver data.
+ * @rx_rings: array of the Rx rings of all ports.
+ * @cq_rings: array of the CQ rings of all ports.
+ * @wq_rings: array of the WQ rings of all ports.
+ * @kernel_asid: kernel ASID.
+ * @raw_qpn: raw data (Ethernet) QP number.
+ * @tx_ring_len: number of elements in the Tx ring.
+ * @schedq_num: sched-Q number used for the Eth driver of the port.
+ * @pad_size: the pad size in bytes for the skb to transmit.
+ */
+struct gaudi2_en_aux_data {
+ struct hbl_cn_ring **rx_rings;
+ struct hbl_cn_ring **cq_rings;
+ struct hbl_cn_ring **wq_rings;
+ u32 kernel_asid;
+ u32 raw_qpn;
+ u32 tx_ring_len;
+ u32 schedq_num;
+ u16 pad_size;
+};
+
+/**
+ * struct gaudi2_en_aux_ops - ASIC specific functions for cn <-> en drivers communication.
+ * @configure_cq: configure a CQ.
+ * @arm_cq: arm a CQ to issue an interrupt after reaching a certain index or a timeout.
+ * @write_rx_ci: write the Rx CI to the HW.
+ * @get_pfc_cnts: retrieve PFC counters.
+ * @ring_tx_doorbell: ring the Tx doorbell so the HW will send the packet.
+ * @qp_err_syndrome_to_str: convert an error syndrome ID to a string.
+ * @db_fifo_reset: reset the Ethernet doorbell FIFO.
+ * @port_reset_locked: reset the port; the caller must hold the lock.
+ * @get_overrun_cnt: get the DB FIFO overrun counter.
+ */
+struct gaudi2_en_aux_ops {
+ /* en2cn */
+ void (*configure_cq)(struct hbl_aux_dev *aux_dev, u32 port, u16 coalesce_usec, bool enable);
+ void (*arm_cq)(struct hbl_aux_dev *aux_dev, u32 port, u32 index);
+ void (*write_rx_ci)(struct hbl_aux_dev *aux_dev, u32 port, u32 ci);
+ void (*get_pfc_cnts)(struct hbl_aux_dev *aux_dev, u32 port, int pfc_prio,
+ u64 *indications, u64 *requests);
+ int (*ring_tx_doorbell)(struct hbl_aux_dev *aux_dev, u32 port, u32 pi, bool *full_after_tx);
+ char* (*qp_err_syndrome_to_str)(u32 syndrome);
+ void (*db_fifo_reset)(struct hbl_aux_dev *aux_dev, u32 port);
+
+ /* cn2en */
+ int (*port_reset_locked)(struct hbl_aux_dev *aux_dev, u32 port);
+ u32 (*get_overrun_cnt)(struct hbl_aux_dev *aux_dev, u32 port_idx);
+};
+
+#endif /* HBL_GAUDI2_H_ */
diff --git a/include/linux/net/intel/gaudi2_aux.h b/include/linux/net/intel/gaudi2_aux.h
new file mode 100644
index 000000000000..6b2025ff5713
--- /dev/null
+++ b/include/linux/net/intel/gaudi2_aux.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_GAUDI2_AUX_H_
+#define HBL_GAUDI2_AUX_H_
+
+#include <linux/types.h>
+#include <linux/net/intel/cn_aux.h>
+
+enum gaudi2_setup_type {
+ GAUDI2_SETUP_TYPE_HLS2,
+};
+
+/**
+ * struct gaudi2_cn_aux_data - Gaudi2 CN driver data.
+ * @setup_type: type of setup connectivity.
+ * @cfg_base: configuration space base address.
+ * @irq_num_port_base: base IRQ number for port EQ.
+ * @sob_id_base: first reserved SOB ID.
+ * @sob_inc_cfg_val: configuration value for incrementing SOB by one.
+ * @fw_security_enabled: FW security enabled.
+ * @msix_enabled: MSI-X enabled.
+ */
+struct gaudi2_cn_aux_data {
+ enum gaudi2_setup_type setup_type;
+ u64 cfg_base;
+ u32 irq_num_port_base;
+ u32 sob_id_base;
+ u32 sob_inc_cfg_val;
+ u8 fw_security_enabled;
+ u8 msix_enabled;
+};
+
+/**
+ * struct gaudi2_cn_aux_ops - ASIC specific functions for cn <-> compute drivers communication.
+ * @get_event_name: Translate event type to name.
+ * @poll_mem: Poll on a memory address until a given condition is fulfilled or a timeout expires.
+ * @dma_alloc_coherent: Allocate coherent DMA memory.
+ * @dma_free_coherent: Free coherent DMA memory.
+ * @dma_pool_zalloc: Allocate small-size DMA memory from the pool.
+ * @dma_pool_free: Free small-size DMA memory back to the pool.
+ * @spmu_get_stats_info: get SPMU statistics information.
+ * @spmu_config: configure the SPMU.
+ * @spmu_sample: read SPMU counters.
+ * @poll_reg: Poll on a register until a given condition is fulfilled or a timeout expires.
+ * @send_cpu_message: send a message to the F/W. If the message times out, the driver will
+ * eventually reset the device. The timeout is passed as an argument; if it is
+ * 0, the default timeout for the specific ASIC is used.
+ * @post_send_status: handler invoked after sending a status packet to the FW.
+ * @reset_prepare: Prepare to reset.
+ * @reset_late_init: Notify that compute device finished reset.
+ * @sw_err_event_handler: Handle SW error event.
+ * @axi_error_response_event_handler: Handle AXI error.
+ * @ports_stop_prepare: prepare the ports for a stop.
+ * @send_port_cpucp_status: Send port status to FW.
+ */
+struct gaudi2_cn_aux_ops {
+ /* cn2compute */
+ char *(*get_event_name)(struct hbl_aux_dev *aux_dev, u16 event_type);
+ int (*poll_mem)(struct hbl_aux_dev *aux_dev, u32 *addr, u32 *val,
+ hbl_cn_poll_cond_func func);
+ void *(*dma_alloc_coherent)(struct hbl_aux_dev *aux_dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag);
+ void (*dma_free_coherent)(struct hbl_aux_dev *aux_dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle);
+ void *(*dma_pool_zalloc)(struct hbl_aux_dev *aux_dev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle);
+ void (*dma_pool_free)(struct hbl_aux_dev *aux_dev, void *vaddr, dma_addr_t dma_addr);
+ void (*spmu_get_stats_info)(struct hbl_aux_dev *aux_dev, u32 port,
+ struct hbl_cn_stat **stats, u32 *n_stats);
+ int (*spmu_config)(struct hbl_aux_dev *aux_dev, u32 port, u32 num_event_types,
+ u32 event_types[], bool enable);
+ int (*spmu_sample)(struct hbl_aux_dev *aux_dev, u32 port, u32 num_out_data, u64 out_data[]);
+ int (*poll_reg)(struct hbl_aux_dev *aux_dev, u32 reg, u64 timeout_us,
+ hbl_cn_poll_cond_func func, void *arg);
+ int (*send_cpu_message)(struct hbl_aux_dev *aux_dev, u32 *msg, u16 len, u32 timeout,
+ u64 *result);
+ void (*post_send_status)(struct hbl_aux_dev *aux_dev, u32 port);
+ /* compute2cn */
+ void (*reset_prepare)(struct hbl_aux_dev *aux_dev);
+ void (*reset_late_init)(struct hbl_aux_dev *aux_dev);
+ int (*sw_err_event_handler)(struct hbl_aux_dev *aux_dev, u16 event_type, u8 macro_index,
+ struct hl_eq_nic_intr_cause *intr_cause_cpucp);
+ int (*axi_error_response_event_handler)(struct hbl_aux_dev *aux_dev, u16 event_type,
+ u8 macro_index,
+ struct hl_eq_nic_intr_cause *intr_cause_cpucp);
+ void (*ports_stop_prepare)(struct hbl_aux_dev *aux_dev, bool fw_reset, bool in_teardown);
+ int (*send_port_cpucp_status)(struct hbl_aux_dev *aux_dev, u32 port, u8 cmd, u8 period);
+};
+
+#endif /* HBL_GAUDI2_AUX_H_ */
--
2.34.1
* [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (6 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 08/15] net: hbl_cn: gaudi2: ASIC specific support Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 21:49 ` Andrew Lunn
` (5 more replies)
2024-06-13 8:22 ` [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support Omer Shpigelman
` (7 subsequent siblings)
15 siblings, 6 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
This Ethernet driver is initialized via the auxiliary bus by the hbl_cn
driver.
It mainly serves control operations that are needed for AI scaling.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
MAINTAINERS | 9 +
drivers/net/ethernet/intel/Kconfig | 18 +
drivers/net/ethernet/intel/Makefile | 1 +
drivers/net/ethernet/intel/hbl_en/Makefile | 9 +
.../net/ethernet/intel/hbl_en/common/Makefile | 3 +
.../net/ethernet/intel/hbl_en/common/hbl_en.c | 1168 +++++++++++++++++
.../net/ethernet/intel/hbl_en/common/hbl_en.h | 206 +++
.../intel/hbl_en/common/hbl_en_dcbnl.c | 101 ++
.../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +++
.../intel/hbl_en/common/hbl_en_ethtool.c | 452 +++++++
10 files changed, 2178 insertions(+)
create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 096439a62129..7301f38e9cfb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9617,6 +9617,15 @@ F: include/linux/habanalabs/
F: include/linux/net/intel/cn*
F: include/linux/net/intel/gaudi2*
+HABANALABS ETHERNET DRIVER
+M: Omer Shpigelman <oshpigelman@habana.ai>
+L: netdev@vger.kernel.org
+S: Supported
+W: https://www.habana.ai
+F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
+F: drivers/net/ethernet/intel/hbl_en/
+F: include/linux/net/intel/cn*
+
HACKRF MEDIA DRIVER
L: linux-media@vger.kernel.org
S: Orphan
diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 0d1b8a2bae99..5d07349348a0 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -417,4 +417,22 @@ config HABANA_CN
To compile this driver as a module, choose M here. The module
will be called habanalabs_cn.
+config HABANA_EN
+ tristate "HabanaLabs (an Intel Company) Ethernet driver"
+ depends on NETDEVICES && ETHERNET && INET
+ select HABANA_CN
+ help
+ This driver enables Ethernet functionality for the network interfaces
+ that are part of the GAUDI ASIC family of AI Accelerators.
+ For more information on how to identify your adapter, go to the
+ Adapter & Driver ID Guide that can be located at:
+
+ <http://support.intel.com>
+
+ More specific information on configuring the driver is in
+ <file:Documentation/networking/device_drivers/ethernet/intel/hbl.rst>.
+
+ To compile this driver as a module, choose M here. The module
+ will be called habanalabs_en.
+
endif # NET_VENDOR_INTEL
diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
index 10049a28e336..ec62a0227897 100644
--- a/drivers/net/ethernet/intel/Makefile
+++ b/drivers/net/ethernet/intel/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_FM10K) += fm10k/
obj-$(CONFIG_ICE) += ice/
obj-$(CONFIG_IDPF) += idpf/
obj-$(CONFIG_HABANA_CN) += hbl_cn/
+obj-$(CONFIG_HABANA_EN) += hbl_en/
diff --git a/drivers/net/ethernet/intel/hbl_en/Makefile b/drivers/net/ethernet/intel/hbl_en/Makefile
new file mode 100644
index 000000000000..695497ab93b6
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for HabanaLabs (an Intel Company) Ethernet network driver
+#
+
+obj-$(CONFIG_HABANA_EN) := habanalabs_en.o
+
+include $(src)/common/Makefile
+habanalabs_en-y += $(HBL_EN_COMMON_FILES)
diff --git a/drivers/net/ethernet/intel/hbl_en/common/Makefile b/drivers/net/ethernet/intel/hbl_en/common/Makefile
new file mode 100644
index 000000000000..a3ccb5dbf4a6
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+HBL_EN_COMMON_FILES := common/hbl_en_drv.o common/hbl_en.o \
+ common/hbl_en_ethtool.o common/hbl_en_dcbnl.o
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
new file mode 100644
index 000000000000..066be5ac2d84
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
@@ -0,0 +1,1168 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_en.h"
+#include <linux/inetdevice.h>
+
+#define TX_TIMEOUT (5 * HZ)
+#define PORT_RESET_TIMEOUT_MSEC (60 * 1000ull) /* 60s */
+
+/**
+ * struct hbl_en_tx_pkt_work - used to schedule a work of a Tx packet.
+ * @tx_work: workqueue object to run when packet needs to be sent.
+ * @port: pointer to current port structure.
+ * @skb: copy of the packet to send.
+ */
+struct hbl_en_tx_pkt_work {
+ struct work_struct tx_work;
+ struct hbl_en_port *port;
+ struct sk_buff *skb;
+};
+
+static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
+static int hbl_en_port_open(struct hbl_en_port *port);
+
+static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_en_device *hdev = aux_dev->priv;
+ struct hbl_en_port *port;
+ int rc = 0, i;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ /* The port might have been shut down via 'ip link set down', in which
+ * case there is no need to reopen it.
+ * Since we mark the ports as in reset even if they are disabled, we
+ * clear the flag here anyway.
+ * See hbl_en_ports_stop_prepare() for more info.
+ */
+ if (!netif_running(port->ndev)) {
+ atomic_set(&port->in_reset, 0);
+ continue;
+ }
+
+ rc = hbl_en_port_open(port);
+
+ atomic_set(&port->in_reset, 0);
+
+ if (rc)
+ break;
+ }
+
+ hdev->in_reset = false;
+
+ return rc;
+}
+
+static void hbl_en_port_fini(struct hbl_en_port *port)
+{
+ if (port->rx_wq)
+ destroy_workqueue(port->rx_wq);
+}
+
+static int hbl_en_port_init(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ u32 port_idx = port->idx;
+ char wq_name[32];
+ int rc;
+
+ if (hdev->poll_enable) {
+ snprintf(wq_name, sizeof(wq_name), "hbl%u-port%d-rx-wq", hdev->core_dev_id,
+ port_idx);
+ port->rx_wq = alloc_ordered_workqueue(wq_name, 0);
+ if (!port->rx_wq) {
+ dev_err(hdev->dev, "Failed to allocate Rx WQ\n");
+ rc = -ENOMEM;
+ goto fail;
+ }
+ }
+
+ hbl_en_ethtool_init_coalesce(port);
+
+ return 0;
+
+fail:
+ hbl_en_port_fini(port);
+
+ return rc;
+}
+
+static void _hbl_en_set_port_status(struct hbl_en_port *port, bool up)
+{
+ struct net_device *ndev = port->ndev;
+ u32 port_idx = port->idx;
+
+ if (up) {
+ netif_carrier_on(ndev);
+ netif_wake_queue(ndev);
+ } else {
+ netif_carrier_off(ndev);
+ netif_stop_queue(ndev);
+ }
+
+ /* Unless link events arrive via the EQ, there is no need to print
+ * link-down events during a port reset
+ */
+ if (port->hdev->has_eq || up || !atomic_read(&port->in_reset))
+ netdev_info(port->ndev, "link %s, port %d\n", up ? "up" : "down", port_idx);
+}
+
+static void hbl_en_set_port_status(struct hbl_aux_dev *aux_dev, u32 port_idx, bool up)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+
+ _hbl_en_set_port_status(port, up);
+}
+
+static bool hbl_en_is_port_open(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+
+ return port->is_initialized;
+}
+
+/* get the src IP as it is done in devinet_ioctl() */
+static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+ struct net_device *ndev = port->ndev;
+ struct in_device *in_dev;
+ struct in_ifaddr *ifa;
+ int rc = 0;
+
+ /* for the case where no src IP is configured */
+ *src_ip = 0;
+
+ /* rtnl lock should be acquired in relevant flows before taking configuration lock */
+ if (!rtnl_is_locked()) {
+ netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
+ rc = -EFAULT;
+ goto out;
+ }
+
+ in_dev = __in_dev_get_rtnl(ndev);
+ if (!in_dev) {
+ netdev_err(port->ndev, "Failed to get IPv4 struct\n");
+ rc = -EFAULT;
+ goto out;
+ }
+
+ ifa = rtnl_dereference(in_dev->ifa_list);
+
+ while (ifa) {
+ if (!strcmp(ndev->name, ifa->ifa_label)) {
+ /* Convert from big-endian to native byte order. Later on it
+ * will be written to HW as little-endian in QPC_SET
+ */
+ *src_ip = be32_to_cpu(ifa->ifa_local);
+ break;
+ }
+ ifa = rtnl_dereference(ifa->ifa_next);
+ }
+out:
+ return rc;
+}
+
+static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+
+ port->net_stats.rx_packets = 0;
+ port->net_stats.tx_packets = 0;
+ port->net_stats.rx_bytes = 0;
+ port->net_stats.tx_bytes = 0;
+ port->net_stats.tx_errors = 0;
+ atomic64_set(&port->net_stats.rx_dropped, 0);
+ atomic64_set(&port->net_stats.tx_dropped, 0);
+}
+
+static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+ struct net_device *ndev = port->ndev;
+ u32 mtu;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(ndev, "port is in reset, can't get MTU\n");
+ return 0;
+ }
+
+ mtu = ndev->mtu;
+
+ atomic_set(&port->in_reset, 0);
+
+ return mtu;
+}
+
+static u32 hbl_en_get_pflags(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+
+ return port->pflags;
+}
+
+static void hbl_en_set_dev_lpbk(struct hbl_aux_dev *aux_dev, u32 port_idx, bool enable)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+ struct net_device *ndev = port->ndev;
+
+ if (enable)
+ ndev->features |= NETIF_F_LOOPBACK;
+ else
+ ndev->features &= ~NETIF_F_LOOPBACK;
+}
+
+/* This function should be called after ctrl_lock was taken */
+static int hbl_en_port_open_locked(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct net_device *ndev = port->ndev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ int rc;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (port->is_initialized)
+ return 0;
+
+ if (!hdev->poll_enable)
+ netif_napi_add(ndev, &port->napi, hbl_en_napi_poll);
+
+ rc = aux_ops->port_hw_init(aux_dev, port_idx);
+ if (rc) {
+ netdev_err(ndev, "Failed to configure the HW, rc %d\n", rc);
+ goto hw_init_fail;
+ }
+
+ if (!hdev->poll_enable)
+ napi_enable(&port->napi);
+
+ rc = hdev->asic_funcs.eth_port_open(port);
+ if (rc) {
+ netdev_err(ndev, "Failed to init H/W, rc %d\n", rc);
+ goto port_open_fail;
+ }
+
+ rc = aux_ops->update_mtu(aux_dev, port_idx, ndev->mtu);
+ if (rc) {
+ netdev_err(ndev, "MTU update failed, rc %d\n", rc);
+ goto update_mtu_fail;
+ }
+
+ rc = aux_ops->phy_init(aux_dev, port_idx);
+ if (rc) {
+ netdev_err(ndev, "PHY init failed, rc %d\n", rc);
+ goto phy_init_fail;
+ }
+
+ netif_start_queue(ndev);
+
+ port->is_initialized = true;
+
+ return 0;
+
+phy_init_fail:
+ /* no need to revert the MTU change, it will be updated on next port open */
+update_mtu_fail:
+ hdev->asic_funcs.eth_port_close(port);
+port_open_fail:
+ if (!hdev->poll_enable)
+ napi_disable(&port->napi);
+
+ aux_ops->port_hw_fini(aux_dev, port_idx);
+hw_init_fail:
+ if (!hdev->poll_enable)
+ netif_napi_del(&port->napi);
+
+ return rc;
+}
+
+static int hbl_en_port_open(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ int rc;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->ctrl_lock(aux_dev, port_idx);
+ rc = hbl_en_port_open_locked(port);
+ aux_ops->ctrl_unlock(aux_dev, port_idx);
+
+ return rc;
+}
+
+static int hbl_en_open(struct net_device *netdev)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ int rc;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(netdev, "port is in reset, can't open it\n");
+ return -EBUSY;
+ }
+
+ rc = hbl_en_port_open(port);
+
+ atomic_set(&port->in_reset, 0);
+
+ return rc;
+}
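The `atomic_cmpxchg(&port->in_reset, 0, 1)` guard used by hbl_en_open() (and the other ndo callbacks) can be sketched in user space with C11 atomics. This is an illustrative sketch, not the driver's code: a compare-and-exchange both tests and claims the flag in one atomic step, so two concurrent flows (e.g. a reset and an `ip link set up`) cannot both enter the reset-sensitive section.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for port->in_reset. */
static atomic_int in_reset;

/* Returns 1 if the caller claimed the flag, 0 if another flow holds it.
 * Mirrors the "if (atomic_cmpxchg(&port->in_reset, 0, 1)) return -EBUSY"
 * pattern: only the flow that flips 0 -> 1 may proceed.
 */
static int try_claim_reset(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&in_reset, &expected, 1);
}

/* Mirrors atomic_set(&port->in_reset, 0) on the exit paths. */
static void release_reset(void)
{
	atomic_store(&in_reset, 0);
}
```

The second concurrent caller is rejected rather than blocked, which is why the driver returns -EBUSY from ndo_open instead of sleeping.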
+
+/* This function should be called after ctrl_lock was taken */
+static void hbl_en_port_close_locked(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (!port->is_initialized)
+ return;
+
+ port->is_initialized = false;
+
+ /* verify that the port is marked as closed before continuing */
+ mb();
+
+ /* Print if not in hard reset flow e.g. from ip cmd */
+ if (!hdev->in_reset && netif_carrier_ok(port->ndev))
+ netdev_info(port->ndev, "port was closed\n");
+
+ /* disable the PHY here so no link changes will occur from this point forward */
+ aux_ops->phy_fini(aux_dev, port_idx);
+
+ /* disable Tx SW flow */
+ netif_carrier_off(port->ndev);
+ netif_tx_disable(port->ndev);
+
+ /* stop Tx/Rx HW */
+ aux_ops->port_hw_fini(aux_dev, port_idx);
+
+ /* disable Tx/Rx QPs */
+ hdev->asic_funcs.eth_port_close(port);
+
+ /* stop Rx SW flow */
+ if (hdev->poll_enable) {
+ hbl_en_rx_poll_stop(port);
+ } else {
+ napi_disable(&port->napi);
+ netif_napi_del(&port->napi);
+ }
+
+ /* Explicitly count the port close operations as we don't get a link event for this.
+ * Upon port open we receive a link event, hence no additional action required.
+ */
+ aux_ops->port_toggle_count(aux_dev, port_idx);
+}
+
+static void hbl_en_port_close(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->ctrl_lock(aux_dev, port_idx);
+ hbl_en_port_close_locked(port);
+ aux_ops->ctrl_unlock(aux_dev, port_idx);
+}
+
+/* This function should be called after ctrl_lock was taken */
+static int __hbl_en_port_reset_locked(struct hbl_en_port *port)
+{
+ hbl_en_port_close_locked(port);
+
+ return hbl_en_port_open_locked(port);
+}
+
+/* This function should be called after ctrl_lock was taken */
+int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+
+ return __hbl_en_port_reset_locked(port);
+}
+
+int hbl_en_port_reset(struct hbl_en_port *port)
+{
+ hbl_en_port_close(port);
+
+ /* Sleep to let obsolete events be dropped before re-opening the port */
+ msleep(20);
+
+ return hbl_en_port_open(port);
+}
+
+static int hbl_en_close(struct net_device *netdev)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ struct hbl_en_device *hdev = port->hdev;
+ ktime_t timeout;
+
+ /* The return value of this function is not checked by the stack, so we can't just return
+ * EBUSY if the port is under reset. We need to wait until the reset is finished and then
+ * close the port. Otherwise the netdev will mark the port as closed although port_close()
+ * wasn't called. Only if we waited long enough and the reset hasn't finished do we return
+ * an error without actually closing the port, as it is a fatal flow anyway.
+ */
+ timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
+ while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ /* If this is called from unregister_netdev() then the port was already closed and
+ * hence we can safely return.
+ * We could have just checked the port_open boolean, but that might hide some future
+ * bugs. Hence it is better to use a dedicated flag for that.
+ */
+ if (READ_ONCE(hdev->in_teardown))
+ return 0;
+
+ usleep_range(50, 200);
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ netdev_crit(netdev,
+ "Timeout while waiting for port to finish reset, can't close it\n");
+ return -EBUSY;
+ }
+ }
+
+ hbl_en_port_close(port);
+
+ atomic_set(&port->in_reset, 0);
+
+ return 0;
+}
+
+/**
+ * hbl_en_ports_stop_prepare() - stop the Rx and Tx and synchronize with other reset flows.
+ * @aux_dev: habanalabs auxiliary device structure.
+ *
+ * This function makes sure that during the reset no packets will be processed and that
+ * ndo_open/ndo_close do not open/close the ports.
+ * A hard reset might occur right after the driver was loaded, which means before the ports
+ * initialization was finished. Therefore, even if the ports are not yet open, we mark it as in
+ * reset in order to avoid races. We clear the in reset flag later on when reopening the ports.
+ */
+static void hbl_en_ports_stop_prepare(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_en_device *hdev = aux_dev->priv;
+ struct hbl_en_port *port;
+ ktime_t timeout;
+ int i;
+
+ /* Check if the ports were initialized. If not, we shouldn't mark them as in reset
+ * because they will fail to get opened.
+ */
+ if (!hdev->is_initialized || hdev->in_reset)
+ return;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ /* This function is competing with reset from ethtool/ip, so try to take the
+ * in_reset atomic and if we are already in the middle of a reset, wait until the
+ * reset function is finished.
+ * The reset function is designed to always finish (it could take up to a few
+ * seconds in the worst case).
+ * We also mark closed ports as in reset so they can't be opened while
+ * the device is under reset.
+ */
+
+ timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
+ while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ usleep_range(50, 200);
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ netdev_crit(port->ndev,
+ "Timeout while waiting for port %d to finish reset\n",
+ port->idx);
+ break;
+ }
+ }
+ }
+
+ hdev->in_reset = true;
+}
+
+static void hbl_en_ports_stop(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_en_device *hdev = aux_dev->priv;
+ struct hbl_en_port *port;
+ int i;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ if (netif_running(port->ndev))
+ hbl_en_port_close(port);
+ }
+}
+
+static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ int rc = 0;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(netdev, "port is in reset, can't change MTU\n");
+ return -EBUSY;
+ }
+
+ if (netif_running(port->ndev)) {
+ hbl_en_port_close(port);
+
+ /* Sleep to let obsolete events be dropped before re-opening the port */
+ msleep(20);
+
+ netdev->mtu = new_mtu;
+
+ rc = hbl_en_port_open(port);
+ if (rc)
+ netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
+ } else {
+ netdev->mtu = new_mtu;
+ }
+
+ atomic_set(&port->in_reset, 0);
+
+ return rc;
+}
+
+/* Swap source and destination MAC addresses */
+static inline void swap_l2(char *buf)
+{
+ u16 *eth_hdr, tmp;
+
+ eth_hdr = (u16 *)buf;
+ tmp = eth_hdr[0];
+ eth_hdr[0] = eth_hdr[3];
+ eth_hdr[3] = tmp;
+ tmp = eth_hdr[1];
+ eth_hdr[1] = eth_hdr[4];
+ eth_hdr[4] = tmp;
+ tmp = eth_hdr[2];
+ eth_hdr[2] = eth_hdr[5];
+ eth_hdr[5] = tmp;
+}
+
+/* Swap source and destination IP addresses */
+static inline void swap_l3(char *buf)
+{
+ u32 tmp;
+
+ /* skip the Ethernet header and the IP header till source IP address */
+ buf += ETH_HLEN + 12;
+ tmp = ((u32 *)buf)[0];
+ ((u32 *)buf)[0] = ((u32 *)buf)[1];
+ ((u32 *)buf)[1] = tmp;
+}
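The loopback address swap above can be sketched in portable user-space C. This is an illustrative rewrite, not the driver's code (the driver swaps 16-bit words in place): bytes 0-5 and 6-11 of the frame hold the destination and source MACs, and for IPv4 the source/destination IPs sit 12 and 16 bytes past the Ethernet header. `ETH_HLEN` is assumed to be the standard 14-byte Ethernet header length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14	/* standard Ethernet header length */

/* Swap destination (bytes 0-5) and source (bytes 6-11) MAC addresses. */
static void swap_l2_sketch(uint8_t *buf)
{
	uint8_t tmp[6];

	memcpy(tmp, buf, 6);
	memcpy(buf, buf + 6, 6);
	memcpy(buf + 6, tmp, 6);
}

/* Swap source and destination IPv4 addresses, located 12 bytes into the
 * IP header (i.e. ETH_HLEN + 12 into the frame).
 */
static void swap_l3_sketch(uint8_t *buf)
{
	uint32_t tmp;

	buf += ETH_HLEN + 12;	/* advance to the IPv4 source address */
	memcpy(&tmp, buf, 4);
	memmove(buf, buf + 4, 4);
	memcpy(buf + 4, &tmp, 4);
}
```

With both swaps applied, a frame looped back by the MAC appears as a valid reply to its own sender, which is what lets a ping answer itself in loopback mode.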
+
+static void do_tx_swap(struct hbl_en_port *port, struct sk_buff *skb)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ u16 *tmp_buff = (u16 *)skb->data;
+ u32 port_idx = port->idx;
+
+ /* First, let's print the SKB we got */
+ dev_dbg_ratelimited(hdev->dev,
+ "Send [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
+ port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
+ swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
+ swab16(tmp_buff[6]), skb->len);
+
+ /* Before submitting it to HW, swap the eth/ip addresses in case this is an IPv4
+ * packet. That way, we can send ICMP (ping) to ourselves in LB cases.
+ */
+ swap_l2(skb->data);
+ if (swab16(tmp_buff[6]) == ETH_P_IP)
+ swap_l3(skb->data);
+}
+
+static bool is_pkt_swap_enabled(struct hbl_en_device *hdev)
+{
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->is_eth_lpbk(aux_dev);
+}
+
+static bool is_tx_disabled(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ return aux_ops->get_mac_lpbk(aux_dev, port_idx) && !is_pkt_swap_enabled(hdev);
+}
+
+static netdev_tx_t hbl_en_handle_tx(struct hbl_en_port *port, struct sk_buff *skb)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ netdev_tx_t ret;
+
+ if (skb->len <= 0 || is_tx_disabled(port))
+ goto free_skb;
+
+ if (skb->len > hdev->max_frm_len) {
+ netdev_err(port->ndev, "Tx pkt size %uB exceeds maximum of %uB\n", skb->len,
+ hdev->max_frm_len);
+ goto free_skb;
+ }
+
+ if (is_pkt_swap_enabled(hdev))
+ do_tx_swap(port, skb);
+
+ /* Pad the Ethernet packets to the minimum frame size as the NIC HW doesn't do it.
+ * skb_put_padto() frees the packet on failure, so just increment the dropped counter
+ * and return success to avoid a retry.
+ */
+ if (skb_put_padto(skb, hdev->pad_size)) {
+ dev_err_ratelimited(hdev->dev, "Padding failed, the skb is dropped\n");
+ atomic64_inc(&port->net_stats.tx_dropped);
+ return NETDEV_TX_OK;
+ }
+
+ ret = hdev->asic_funcs.write_pkt_to_hw(port, skb);
+ if (ret == NETDEV_TX_OK) {
+ port->net_stats.tx_packets++;
+ port->net_stats.tx_bytes += skb->len;
+ }
+
+ return ret;
+
+free_skb:
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+}
+
+static netdev_tx_t hbl_en_start_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+
+ return hbl_en_handle_tx(port, skb);
+}
+
+static int hbl_en_set_port_mac_loopback(struct hbl_en_port *port, bool enable)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct net_device *ndev = port->ndev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ int rc;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = aux_ops->set_mac_lpbk(aux_dev, port_idx, enable);
+ if (rc)
+ return rc;
+
+ netdev_info(ndev, "port %u: mac loopback is %s\n", port_idx,
+ enable ? "enabled" : "disabled");
+
+ if (netif_running(ndev)) {
+ rc = hbl_en_port_reset(port);
+ if (rc) {
+ netdev_err(ndev, "Failed to reset port %u, rc %d\n", port_idx, rc);
+ return rc;
+ }
+ }
+
+ return 0;
+}
+
+static int hbl_en_set_features(struct net_device *netdev, netdev_features_t features)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ netdev_features_t changed;
+ int rc = 0;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(netdev, "port %d is in reset, can't update settings", port->idx);
+ return -EBUSY;
+ }
+
+ changed = netdev->features ^ features;
+
+ if (changed & NETIF_F_LOOPBACK)
+ rc = hbl_en_set_port_mac_loopback(port, !!(features & NETIF_F_LOOPBACK));
+
+ atomic_set(&port->in_reset, 0);
+
+ return rc;
+}
+
+static void hbl_en_handle_tx_timeout(struct net_device *netdev, unsigned int txqueue)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+
+ port->net_stats.tx_errors++;
+ atomic64_inc(&port->net_stats.tx_dropped);
+}
+
+static void hbl_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(dev);
+
+ stats->rx_bytes = port->net_stats.rx_bytes;
+ stats->tx_bytes = port->net_stats.tx_bytes;
+ stats->rx_packets = port->net_stats.rx_packets;
+ stats->tx_packets = port->net_stats.tx_packets;
+ stats->tx_errors = port->net_stats.tx_errors;
+ stats->tx_dropped = (u64)atomic64_read(&port->net_stats.tx_dropped);
+ stats->rx_dropped = (u64)atomic64_read(&port->net_stats.rx_dropped);
+}
+
+static const struct net_device_ops hbl_en_netdev_ops = {
+ .ndo_open = hbl_en_open,
+ .ndo_stop = hbl_en_close,
+ .ndo_start_xmit = hbl_en_start_xmit,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_change_mtu = hbl_en_change_mtu,
+ .ndo_set_features = hbl_en_set_features,
+ .ndo_get_stats64 = hbl_en_get_stats64,
+ .ndo_tx_timeout = hbl_en_handle_tx_timeout,
+};
+
+static void hbl_en_set_ops(struct net_device *ndev)
+{
+ ndev->netdev_ops = &hbl_en_netdev_ops;
+ ndev->ethtool_ops = hbl_en_ethtool_get_ops(ndev);
+#ifdef CONFIG_DCB
+ ndev->dcbnl_ops = &hbl_en_dcbnl_ops;
+#endif
+}
+
+static int hbl_en_port_register(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ struct hbl_en_port **ptr;
+ struct net_device *ndev;
+ int rc;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ ndev = alloc_etherdev(sizeof(struct hbl_en_port *));
+ if (!ndev) {
+ dev_err(hdev->dev, "netdevice %d alloc failed\n", port_idx);
+ return -ENOMEM;
+ }
+
+ port->ndev = ndev;
+ SET_NETDEV_DEV(ndev, &hdev->pdev->dev);
+ ptr = netdev_priv(ndev);
+ *ptr = port;
+
+ /* necessary for creating multiple interfaces */
+ ndev->dev_port = port_idx;
+
+ hbl_en_set_ops(ndev);
+
+ ndev->watchdog_timeo = TX_TIMEOUT;
+ ndev->min_mtu = hdev->min_raw_mtu;
+ ndev->max_mtu = hdev->max_raw_mtu;
+
+ /* Add loopback capability to the device. */
+ ndev->hw_features |= NETIF_F_LOOPBACK;
+
+ /* If this port was set to loopback, set it also to the ndev features */
+ if (aux_ops->get_mac_lpbk(aux_dev, port_idx))
+ ndev->features |= NETIF_F_LOOPBACK;
+
+ eth_hw_addr_set(ndev, port->mac_addr);
+
+ /* This is a hybrid scheme: we enable the Rx completion EQE event and then
+ * start the poll from there.
+ * Inside the polling thread, we read packets from hardware and then reschedule the poll
+ * only if there are more packets to be processed. Else we re-enable the CQ Arm interrupt
+ * and exit the poll.
+ */
+ if (hdev->poll_enable)
+ hbl_en_rx_poll_trigger_init(port);
+
+ netif_carrier_off(ndev);
+
+ rc = register_netdev(ndev);
+ if (rc) {
+ dev_err(hdev->dev, "Could not register netdevice %d\n", port_idx);
+ goto err;
+ }
+
+ return 0;
+
+err:
+ if (ndev) {
+ free_netdev(ndev);
+ port->ndev = NULL;
+ }
+
+ return rc;
+}
+
+static void dump_swap_pkt(struct hbl_en_port *port, struct sk_buff *skb)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ u16 *tmp_buff = (u16 *)skb->data;
+ u32 port_idx = port->idx;
+
+ /* The SKB is ready now (before stripping-out the L2), print its content */
+ dev_dbg_ratelimited(hdev->dev,
+ "Recv [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
+ port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
+ swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
+ swab16(tmp_buff[6]), skb->len);
+}
+
+int hbl_en_handle_rx(struct hbl_en_port *port, int budget)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ enum hbl_en_eth_pkt_status pkt_status;
+ struct net_device *ndev = port->ndev;
+ int rc, pkt_count = 0;
+ struct sk_buff *skb;
+ void *pkt_addr;
+ u32 pkt_size;
+
+ if (!netif_carrier_ok(ndev))
+ return 0;
+
+ while (pkt_count < budget) {
+ pkt_status = hdev->asic_funcs.read_pkt_from_hw(port, &pkt_addr, &pkt_size);
+
+ if (pkt_status == ETH_PKT_NONE)
+ break;
+
+ pkt_count++;
+
+ if (pkt_status == ETH_PKT_DROP) {
+ atomic64_inc(&port->net_stats.rx_dropped);
+ continue;
+ }
+
+ if (hdev->poll_enable)
+ skb = __netdev_alloc_skb_ip_align(ndev, pkt_size, GFP_KERNEL);
+ else
+ skb = napi_alloc_skb(&port->napi, pkt_size);
+
+ if (!skb) {
+ atomic64_inc(&port->net_stats.rx_dropped);
+ break;
+ }
+
+ skb_copy_to_linear_data(skb, pkt_addr, pkt_size);
+ skb_put(skb, pkt_size);
+
+ if (is_pkt_swap_enabled(hdev))
+ dump_swap_pkt(port, skb);
+
+ skb->protocol = eth_type_trans(skb, ndev);
+
+ /* Zero the packet buffer memory to avoid leaking stale data in case a wrong
+ * size is used when the next packet populates the same memory
+ */
+ memset(pkt_addr, 0, pkt_size);
+
+ /* polling is done in thread context and hence BH should be disabled */
+ if (hdev->poll_enable)
+ local_bh_disable();
+
+ rc = netif_receive_skb(skb);
+
+ if (hdev->poll_enable)
+ local_bh_enable();
+
+ if (rc == NET_RX_SUCCESS) {
+ port->net_stats.rx_packets++;
+ port->net_stats.rx_bytes += pkt_size;
+ } else {
+ atomic64_inc(&port->net_stats.rx_dropped);
+ }
+ }
+
+ return pkt_count;
+}
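The budget contract implemented by hbl_en_handle_rx() above (and relied on by both hbl_en_napi_poll() and the polling work) can be reduced to a small sketch. This is illustrative, with `ring_avail` standing in for packets pending in the hardware ring: consuming fewer than `budget` packets signals "ring drained, re-enable interrupts", while consuming exactly `budget` means "call me again".

```c
#include <assert.h>

/* Stand-in for the number of packets pending in the hardware Rx ring. */
static int ring_avail;

/* Consume at most 'budget' packets; return the number consumed. */
static int handle_rx_sketch(int budget)
{
	int count = 0;

	while (count < budget && ring_avail > 0) {
		ring_avail--;	/* "process" one packet */
		count++;
	}
	return count;
}
```

This is why hbl_en_rx_poll_work() reschedules itself only when the full `NAPI_POLL_WEIGHT` was consumed, and why hbl_en_napi_poll() calls napi_complete_done() only when it returns less than `budget`.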
+
+static bool __hbl_en_rx_poll_schedule(struct hbl_en_port *port, unsigned long delay)
+{
+ return queue_delayed_work(port->rx_wq, &port->rx_poll_work, delay);
+}
+
+static void hbl_en_rx_poll_work(struct work_struct *work)
+{
+ struct hbl_en_port *port = container_of(work, struct hbl_en_port, rx_poll_work.work);
+ struct hbl_en_device *hdev = port->hdev;
+ int pkt_count;
+
+ pkt_count = hbl_en_handle_rx(port, NAPI_POLL_WEIGHT);
+
+ /* Reschedule the poll if we have consumed the entire budget, which means we still
+ * have packets to process. Else re-enable the Rx IRQs and exit the work.
+ */
+ if (pkt_count < NAPI_POLL_WEIGHT)
+ hdev->asic_funcs.reenable_rx_irq(port);
+ else
+ __hbl_en_rx_poll_schedule(port, 0);
+}
+
+/* Rx poll init and trigger routines are used in event-driven setups where
+ * Rx polling is initialized once during init or open and started/triggered by the event handler.
+ */
+void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port)
+{
+ INIT_DELAYED_WORK(&port->rx_poll_work, hbl_en_rx_poll_work);
+}
+
+bool hbl_en_rx_poll_start(struct hbl_en_port *port)
+{
+ return __hbl_en_rx_poll_schedule(port, msecs_to_jiffies(1));
+}
+
+void hbl_en_rx_poll_stop(struct hbl_en_port *port)
+{
+ cancel_delayed_work_sync(&port->rx_poll_work);
+}
+
+static int hbl_en_napi_poll(struct napi_struct *napi, int budget)
+{
+ struct hbl_en_port *port = container_of(napi, struct hbl_en_port, napi);
+ struct hbl_en_device *hdev = port->hdev;
+ int pkt_count;
+
+ /* exit if we are called by netpoll as we free the Tx ring via EQ (if enabled) */
+ if (!budget)
+ return 0;
+
+ pkt_count = hbl_en_handle_rx(port, budget);
+
+ /* If budget not fully consumed, exit the polling mode */
+ if (pkt_count < budget) {
+ napi_complete_done(napi, pkt_count);
+ hdev->asic_funcs.reenable_rx_irq(port);
+ }
+
+ return pkt_count;
+}
+
+static void hbl_en_port_unregister(struct hbl_en_port *port)
+{
+ struct net_device *ndev = port->ndev;
+
+ unregister_netdev(ndev);
+ free_netdev(ndev);
+ port->ndev = NULL;
+}
+
+static int hbl_en_set_asic_funcs(struct hbl_en_device *hdev)
+{
+ switch (hdev->asic_type) {
+ case HBL_ASIC_GAUDI2:
+ default:
+ dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void hbl_en_handle_eqe(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_eqe *eqe)
+{
+ struct hbl_en_device *hdev = aux_dev->priv;
+
+ hdev->asic_funcs.handle_eqe(aux_dev, port, eqe);
+}
+
+static void hbl_en_set_aux_ops(struct hbl_en_device *hdev, bool enable)
+{
+ struct hbl_en_aux_ops *aux_ops = hdev->aux_dev->aux_ops;
+
+ if (enable) {
+ aux_ops->ports_reopen = hbl_en_ports_reopen;
+ aux_ops->ports_stop_prepare = hbl_en_ports_stop_prepare;
+ aux_ops->ports_stop = hbl_en_ports_stop;
+ aux_ops->set_port_status = hbl_en_set_port_status;
+ aux_ops->is_port_open = hbl_en_is_port_open;
+ aux_ops->get_src_ip = hbl_en_get_src_ip;
+ aux_ops->reset_stats = hbl_en_reset_stats;
+ aux_ops->get_mtu = hbl_en_get_mtu;
+ aux_ops->get_pflags = hbl_en_get_pflags;
+ aux_ops->set_dev_lpbk = hbl_en_set_dev_lpbk;
+ aux_ops->handle_eqe = hbl_en_handle_eqe;
+ } else {
+ aux_ops->ports_reopen = NULL;
+ aux_ops->ports_stop_prepare = NULL;
+ aux_ops->ports_stop = NULL;
+ aux_ops->set_port_status = NULL;
+ aux_ops->is_port_open = NULL;
+ aux_ops->get_src_ip = NULL;
+ aux_ops->reset_stats = NULL;
+ aux_ops->get_mtu = NULL;
+ aux_ops->get_pflags = NULL;
+ aux_ops->set_dev_lpbk = NULL;
+ aux_ops->handle_eqe = NULL;
+ }
+}
+
+int hbl_en_dev_init(struct hbl_en_device *hdev)
+{
+ struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
+ struct hbl_en_port *port;
+ int rc, i, port_cnt = 0;
+
+ /* must be called before the call to dev_init() */
+ rc = hbl_en_set_asic_funcs(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "failed to set aux ops\n");
+ return rc;
+ }
+
+ rc = asic_funcs->dev_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "device init failed\n");
+ return rc;
+ }
+
+ /* init the function pointers here before calling hbl_en_port_register which sets up
+ * net_device_ops, and its ops might start getting called.
+ * If any failure is encountered, these will be made NULL and the core driver won't call
+ * them.
+ */
+ hbl_en_set_aux_ops(hdev, true);
+
+ /* Port register depends on the above initialization so it must be called here and not
+ * before that.
+ */
+ for (i = 0; i < hdev->max_num_of_ports; i++, port_cnt++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ rc = hbl_en_port_init(port);
+ if (rc) {
+ dev_err(hdev->dev, "port init failed\n");
+ goto unregister_ports;
+ }
+
+ rc = hbl_en_port_register(port);
+ if (rc) {
+ dev_err(hdev->dev, "port register failed\n");
+
+ hbl_en_port_fini(port);
+ goto unregister_ports;
+ }
+ }
+
+ hdev->is_initialized = true;
+
+ return 0;
+
+unregister_ports:
+ for (i = 0; i < port_cnt; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ hbl_en_port_unregister(port);
+ hbl_en_port_fini(port);
+ }
+
+ hbl_en_set_aux_ops(hdev, false);
+
+ asic_funcs->dev_fini(hdev);
+
+ return rc;
+}
+
+void hbl_en_dev_fini(struct hbl_en_device *hdev)
+{
+ struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
+ struct hbl_en_port *port;
+ int i;
+
+ hdev->in_teardown = true;
+
+ if (!hdev->is_initialized)
+ return;
+
+ hdev->is_initialized = false;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+
+ /* It could be that this cleanup flow is called after a failed init flow.
+ * Hence we need to check that we indeed have a netdev to unregister.
+ */
+ if (!port->ndev)
+ continue;
+
+ hbl_en_port_unregister(port);
+ hbl_en_port_fini(port);
+ }
+
+ hbl_en_set_aux_ops(hdev, false);
+
+ asic_funcs->dev_fini(hdev);
+}
+
+dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len)
+{
+ dma_addr_t dma_addr;
+
+ if (hdev->dma_map_support)
+ dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, DMA_TO_DEVICE);
+ else
+ dma_addr = virt_to_phys(addr);
+
+ return dma_addr;
+}
+
+void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len)
+{
+ if (hdev->dma_map_support)
+ dma_unmap_single(&hdev->pdev->dev, dma_addr, len, DMA_TO_DEVICE);
+}
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
new file mode 100644
index 000000000000..15504c1f3cfb
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HABANALABS_EN_H_
+#define HABANALABS_EN_H_
+
+#include <linux/net/intel/cn.h>
+
+#include <linux/netdevice.h>
+#include <linux/pci.h>
+
+#define HBL_EN_NAME "habanalabs_en"
+
+#define HBL_EN_PORT(aux_dev, idx) (&(((struct hbl_en_device *)(aux_dev)->priv)->ports[(idx)]))
+
+#define hbl_netdev_priv(ndev) \
+({ \
+ typecheck(struct net_device *, ndev); \
+ *(struct hbl_en_port **)netdev_priv(ndev); \
+})
+
+/**
+ * enum hbl_en_eth_pkt_status - status of Rx Ethernet packet.
+ * @ETH_PKT_OK: packet was received successfully.
+ * @ETH_PKT_DROP: packet should be dropped.
+ * @ETH_PKT_NONE: no available packet.
+ */
+enum hbl_en_eth_pkt_status {
+ ETH_PKT_OK,
+ ETH_PKT_DROP,
+ ETH_PKT_NONE
+};
+
+/**
+ * struct hbl_en_net_stats - stats of Ethernet interface.
+ * @rx_packets: number of packets received.
+ * @tx_packets: number of packets sent.
+ * @rx_bytes: total bytes of data received.
+ * @tx_bytes: total bytes of data sent.
+ * @tx_errors: number of errors in the Tx.
+ * @rx_dropped: number of packets dropped by the Rx.
+ * @tx_dropped: number of packets dropped by the Tx.
+ */
+struct hbl_en_net_stats {
+ u64 rx_packets;
+ u64 tx_packets;
+ u64 rx_bytes;
+ u64 tx_bytes;
+ u64 tx_errors;
+ atomic64_t rx_dropped;
+ atomic64_t tx_dropped;
+};
+
+/**
+ * struct hbl_en_port - manage port common structure.
+ * @hdev: habanalabs Ethernet device structure.
+ * @ndev: network device.
+ * @rx_wq: WQ for Rx poll when we cannot schedule NAPI poll.
+ * @mac_addr: HW MAC addresses.
+ * @asic_specific: ASIC specific port structure.
+ * @napi: New API structure.
+ * @rx_poll_work: Rx work for polling mode.
+ * @net_stats: statistics of the ethernet interface.
+ * @in_reset: true if the NIC was marked as in reset, false otherwise. Used to avoid an additional
+ * stopping of the NIC if a hard reset was re-initiated.
+ * @pflags: ethtool private flags bit mask.
+ * @idx: index of this specific port.
+ * @rx_max_coalesced_frames: Maximum number of packets to receive before an RX interrupt.
+ * @tx_max_coalesced_frames: Maximum number of packets to be sent before a TX interrupt.
+ * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after a packet arrives.
+ * @is_initialized: true if the port H/W is initialized, false otherwise.
+ * @pfc_enable: true if this port supports Priority Flow Control, false otherwise.
+ * @auto_neg_enable: is autoneg enabled.
+ * @auto_neg_resolved: was autoneg phase finished successfully.
+ */
+struct hbl_en_port {
+ struct hbl_en_device *hdev;
+ struct net_device *ndev;
+ struct workqueue_struct *rx_wq;
+ char *mac_addr;
+ void *asic_specific;
+ struct napi_struct napi;
+ struct delayed_work rx_poll_work;
+ struct hbl_en_net_stats net_stats;
+ atomic_t in_reset;
+ u32 pflags;
+ u32 idx;
+ u32 rx_max_coalesced_frames;
+ u32 tx_max_coalesced_frames;
+ u16 rx_coalesce_usecs;
+ u8 is_initialized;
+ u8 pfc_enable;
+ u8 auto_neg_enable;
+ u8 auto_neg_resolved;
+};
+
+/**
+ * struct hbl_en_asic_funcs - ASIC specific Ethernet functions.
+ * @dev_init: device init.
+ * @dev_fini: device cleanup.
+ * @reenable_rx_irq: re-enable Rx interrupts.
+ * @eth_port_open: initialize and open the Ethernet port.
+ * @eth_port_close: close the Ethernet port.
+ * @write_pkt_to_hw: write skb to HW.
+ * @read_pkt_from_hw: read pkt from HW.
+ * @get_pfc_cnts: get PFC counters.
+ * @set_coalesce: set Tx/Rx coalesce config in HW.
+ * @get_rx_ring_size: get the max number of elements the Rx ring can contain.
+ * @handle_eqe: Handle a received event.
+ */
+struct hbl_en_asic_funcs {
+ int (*dev_init)(struct hbl_en_device *hdev);
+ void (*dev_fini)(struct hbl_en_device *hdev);
+ void (*reenable_rx_irq)(struct hbl_en_port *port);
+ int (*eth_port_open)(struct hbl_en_port *port);
+ void (*eth_port_close)(struct hbl_en_port *port);
+ netdev_tx_t (*write_pkt_to_hw)(struct hbl_en_port *port, struct sk_buff *skb);
+ int (*read_pkt_from_hw)(struct hbl_en_port *port, void **pkt_addr, u32 *pkt_size);
+ void (*get_pfc_cnts)(struct hbl_en_port *port, void *ptr);
+ int (*set_coalesce)(struct hbl_en_port *port);
+ int (*get_rx_ring_size)(struct hbl_en_port *port);
+ void (*handle_eqe)(struct hbl_aux_dev *aux_dev, u32 port_idx, struct hbl_cn_eqe *eqe);
+};
+
+/**
+ * struct hbl_en_device - habanalabs Ethernet device structure.
+ * @pdev: pointer to PCI device.
+ * @dev: related kernel basic device structure.
+ * @ports: array of per-port management common structures.
+ * @aux_dev: pointer to auxiliary device.
+ * @asic_specific: ASIC specific device structure.
+ * @fw_ver: FW version.
+ * @qsfp_eeprom: QSFP EEPROM info.
+ * @mac_addr: array of all MAC addresses.
+ * @asic_funcs: ASIC specific Ethernet functions.
+ * @asic_type: ASIC specific type.
+ * @ports_mask: mask of available ports.
+ * @auto_neg_mask: mask of ports with autonegotiation enabled.
+ * @port_reset_timeout: max time in seconds for a port reset flow to finish.
+ * @pending_reset_long_timeout: long timeout, in seconds, for a pending hard reset to finish.
+ * @max_frm_len: maximum allowed frame length.
+ * @raw_elem_size: size of element in raw buffers.
+ * @max_raw_mtu: maximum MTU size for raw packets.
+ * @min_raw_mtu: minimum MTU size for raw packets.
+ * @pad_size: the pad size in bytes for the skb to transmit.
+ * @core_dev_id: core device ID.
+ * @max_num_of_ports: max number of available ports.
+ * @in_reset: is the entire NIC currently under reset.
+ * @poll_enable: Enable Rx polling rather than IRQ + NAPI.
+ * @in_teardown: true if the NIC is in teardown (during device remove).
+ * @is_initialized: was the device initialized successfully.
+ * @has_eq: true if event queue is supported.
+ * @dma_map_support: HW supports DMA mapping.
+ */
+struct hbl_en_device {
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct hbl_en_port *ports;
+ struct hbl_aux_dev *aux_dev;
+ void *asic_specific;
+ char *fw_ver;
+ char *qsfp_eeprom;
+ char *mac_addr;
+ struct hbl_en_asic_funcs asic_funcs;
+ enum hbl_cn_asic_type asic_type;
+ u64 ports_mask;
+ u64 auto_neg_mask;
+ u32 port_reset_timeout;
+ u32 pending_reset_long_timeout;
+ u32 max_frm_len;
+ u32 raw_elem_size;
+ u16 max_raw_mtu;
+ u16 min_raw_mtu;
+ u16 pad_size;
+ u16 core_dev_id;
+ u8 max_num_of_ports;
+ u8 in_reset;
+ u8 poll_enable;
+ u8 in_teardown;
+ u8 is_initialized;
+ u8 has_eq;
+ u8 dma_map_support;
+};
+
+int hbl_en_dev_init(struct hbl_en_device *hdev);
+void hbl_en_dev_fini(struct hbl_en_device *hdev);
+
+const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev);
+void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port);
+
+extern const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops;
+
+bool hbl_en_rx_poll_start(struct hbl_en_port *port);
+void hbl_en_rx_poll_stop(struct hbl_en_port *port);
+void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port);
+int hbl_en_port_reset(struct hbl_en_port *port);
+int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx);
+int hbl_en_handle_rx(struct hbl_en_port *port, int budget);
+dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len);
+void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len);
+
+#endif /* HABANALABS_EN_H_ */
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
new file mode 100644
index 000000000000..5d718579a2b6
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_en.h"
+
+#define PFC_PRIO_MASK_ALL GENMASK(HBL_EN_PFC_PRIO_NUM - 1, 0)
+#define PFC_PRIO_MASK_NONE 0
+
+#ifdef CONFIG_DCB
+static int hbl_en_dcbnl_ieee_getpfc(struct net_device *netdev, struct ieee_pfc *pfc)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ struct hbl_en_device *hdev;
+ u32 port_idx;
+
+ hdev = port->hdev;
+ port_idx = port->idx;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't get PFC\n", port_idx);
+ return -EBUSY;
+ }
+
+ pfc->pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
+ pfc->pfc_cap = HBL_EN_PFC_PRIO_NUM;
+
+ hdev->asic_funcs.get_pfc_cnts(port, pfc);
+
+ atomic_set(&port->in_reset, 0);
+
+ return 0;
+}
+
+static int hbl_en_dcbnl_ieee_setpfc(struct net_device *netdev, struct ieee_pfc *pfc)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(netdev);
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_en_device *hdev;
+ u8 curr_pfc_en;
+ u32 port_idx;
+ int rc = 0;
+
+ hdev = port->hdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ port_idx = port->idx;
+
+ if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
+ dev_dbg_ratelimited(hdev->dev, "PFC supports %d priorities only, port %d\n",
+ HBL_EN_PFC_PRIO_NUM, port_idx);
+ return -EINVAL;
+ }
+
+ if (pfc->pfc_en != PFC_PRIO_MASK_NONE && pfc->pfc_en != PFC_PRIO_MASK_ALL) {
+ dev_dbg_ratelimited(hdev->dev,
+ "PFC should be enabled/disabled on all priorities, port %d\n",
+ port_idx);
+ return -EINVAL;
+ }
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't set PFC\n", port_idx);
+ return -EBUSY;
+ }
+
+ curr_pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
+
+ if (pfc->pfc_en == curr_pfc_en)
+ goto out;
+
+ port->pfc_enable = !port->pfc_enable;
+
+ rc = aux_ops->set_pfc(aux_dev, port_idx, port->pfc_enable);
+
+out:
+ atomic_set(&port->in_reset, 0);
+
+ return rc;
+}
+
+static u8 hbl_en_dcbnl_getdcbx(struct net_device *netdev)
+{
+ return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+}
+
+static u8 hbl_en_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
+{
+ return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
+}
+
+const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops = {
+ .ieee_getpfc = hbl_en_dcbnl_ieee_getpfc,
+ .ieee_setpfc = hbl_en_dcbnl_ieee_setpfc,
+ .getdcbx = hbl_en_dcbnl_getdcbx,
+ .setdcbx = hbl_en_dcbnl_setdcbx
+};
+#endif
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
new file mode 100644
index 000000000000..23a87d36ded5
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#define pr_fmt(fmt) "habanalabs_en: " fmt
+
+#include "hbl_en.h"
+
+#include <linux/module.h>
+#include <linux/auxiliary_bus.h>
+
+#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
+
+#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Ethernet driver"
+
+MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
+MODULE_DESCRIPTION(HBL_DRIVER_DESC);
+MODULE_LICENSE("GPL");
+
+static bool poll_enable;
+
+module_param(poll_enable, bool, 0444);
+MODULE_PARM_DESC(poll_enable,
+ "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
+
+static int hdev_init(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_en_aux_data *aux_data = aux_dev->aux_data;
+ struct hbl_en_port *ports, *port;
+ struct hbl_en_device *hdev;
+ int rc, i;
+
+ hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+ if (!hdev)
+ return -ENOMEM;
+
+ ports = kcalloc(aux_data->max_num_of_ports, sizeof(*ports), GFP_KERNEL);
+ if (!ports) {
+ rc = -ENOMEM;
+ goto ports_alloc_fail;
+ }
+
+ aux_dev->priv = hdev;
+ hdev->aux_dev = aux_dev;
+ hdev->ports = ports;
+ hdev->pdev = aux_data->pdev;
+ hdev->dev = aux_data->dev;
+ hdev->ports_mask = aux_data->ports_mask;
+ hdev->auto_neg_mask = aux_data->auto_neg_mask;
+ hdev->max_num_of_ports = aux_data->max_num_of_ports;
+ hdev->core_dev_id = aux_data->id;
+ hdev->fw_ver = aux_data->fw_ver;
+ hdev->qsfp_eeprom = aux_data->qsfp_eeprom;
+ hdev->asic_type = aux_data->asic_type;
+ hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
+ hdev->max_frm_len = aux_data->max_frm_len;
+ hdev->raw_elem_size = aux_data->raw_elem_size;
+ hdev->max_raw_mtu = aux_data->max_raw_mtu;
+ hdev->min_raw_mtu = aux_data->min_raw_mtu;
+ hdev->pad_size = ETH_ZLEN;
+ hdev->has_eq = aux_data->has_eq;
+ hdev->dma_map_support = true;
+ hdev->poll_enable = poll_enable;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port = &hdev->ports[i];
+ port->hdev = hdev;
+ port->idx = i;
+ port->pfc_enable = true;
+ port->pflags = PFLAGS_PCS_LINK_CHECK | PFLAGS_PHY_AUTO_NEG_LPBK;
+ port->mac_addr = aux_data->mac_addr[i];
+ port->auto_neg_enable = !!(aux_data->auto_neg_mask & BIT(i));
+ }
+
+ return 0;
+
+ports_alloc_fail:
+ kfree(hdev);
+
+ return rc;
+}
+
+static void hdev_fini(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_en_device *hdev = aux_dev->priv;
+
+ kfree(hdev->ports);
+ kfree(hdev);
+ aux_dev->priv = NULL;
+}
+
+static const struct auxiliary_device_id hbl_en_id_table[] = {
+ { .name = "habanalabs_cn.en", },
+ {},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, hbl_en_id_table);
+
+static int hbl_en_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
+ struct hbl_en_device *hdev;
+ ktime_t timeout;
+ int rc;
+
+ rc = hdev_init(aux_dev);
+ if (rc) {
+ dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
+ return -EIO;
+ }
+
+ hdev = aux_dev->priv;
+
+ /* don't allow module unloading while it is attached */
+ if (!try_module_get(THIS_MODULE)) {
+ dev_err(hdev->dev, "Failed to increment %s module refcount\n", HBL_EN_NAME);
+ rc = -EIO;
+ goto module_get_err;
+ }
+
+ timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
+ while (1) {
+ aux_ops->hw_access_lock(aux_dev);
+
+ /* if the device is operational, proceed to actual init while holding the lock in
+ * order to prevent concurrent hard reset
+ */
+ if (aux_ops->device_operational(aux_dev))
+ break;
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
+ rc = -EBUSY;
+ goto timeout_err;
+ }
+
+ dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing en\n");
+
+ msleep_interruptible(MSEC_PER_SEC);
+ }
+
+ rc = hbl_en_dev_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init en device\n");
+ goto dev_init_err;
+ }
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ return 0;
+
+dev_init_err:
+ aux_ops->hw_access_unlock(aux_dev);
+timeout_err:
+ module_put(THIS_MODULE);
+module_get_err:
+ hdev_fini(aux_dev);
+
+ return rc;
+}
+
+/* This function can be called only from the CN driver when deleting the aux bus, because we
+ * incremented the module refcount on probing. Hence no need to protect here from hard reset.
+ */
+static void hbl_en_remove(struct auxiliary_device *adev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_en_device *hdev = aux_dev->priv;
+
+ if (!hdev)
+ return;
+
+ hbl_en_dev_fini(hdev);
+
+ /* allow module unloading as now it is detached */
+ module_put(THIS_MODULE);
+
+ hdev_fini(aux_dev);
+}
+
+static struct auxiliary_driver hbl_en_driver = {
+ .name = "eth",
+ .probe = hbl_en_probe,
+ .remove = hbl_en_remove,
+ .id_table = hbl_en_id_table,
+};
+
+static int __init hbl_en_init(void)
+{
+ pr_info("loading driver\n");
+
+ return auxiliary_driver_register(&hbl_en_driver);
+}
+
+static void __exit hbl_en_exit(void)
+{
+ auxiliary_driver_unregister(&hbl_en_driver);
+
+ pr_info("driver removed\n");
+}
+
+module_init(hbl_en_init);
+module_exit(hbl_en_exit);
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
new file mode 100644
index 000000000000..1d14d283409b
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl_en.h"
+#include <linux/ethtool.h>
+
+#define RX_COALESCED_FRAMES_MIN 1
+#define TX_COALESCED_FRAMES_MIN 1
+#define TX_COALESCED_FRAMES_MAX 10
+
+static const char pflags_str[][ETH_GSTRING_LEN] = {
+ "pcs-link-check",
+ "phy-auto-neg-lpbk",
+};
+
+#define NIC_STAT(m) {#m, offsetof(struct hbl_en_port, net_stats.m)}
+
+static struct hbl_cn_stat netdev_eth_stats[] = {
+ NIC_STAT(rx_packets),
+ NIC_STAT(tx_packets),
+ NIC_STAT(rx_bytes),
+ NIC_STAT(tx_bytes),
+ NIC_STAT(tx_errors),
+ NIC_STAT(rx_dropped),
+ NIC_STAT(tx_dropped)
+};
+
+static size_t pflags_str_len = ARRAY_SIZE(pflags_str);
+static size_t netdev_eth_stats_len = ARRAY_SIZE(netdev_eth_stats);
+
+static void hbl_en_ethtool_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *drvinfo)
+{
+ struct hbl_en_device *hdev;
+ struct hbl_en_port *port;
+
+ port = hbl_netdev_priv(ndev);
+ hdev = port->hdev;
+
+ strscpy(drvinfo->driver, HBL_EN_NAME, sizeof(drvinfo->driver));
+ strscpy(drvinfo->fw_version, hdev->fw_ver, sizeof(drvinfo->fw_version));
+ strscpy(drvinfo->bus_info, pci_name(hdev->pdev), sizeof(drvinfo->bus_info));
+}
+
+static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
+{
+ modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+ modinfo->type = ETH_MODULE_SFF_8636;
+
+ return 0;
+}
+
+static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
+ u8 *data)
+{
+ struct hbl_en_device *hdev;
+ struct hbl_en_port *port;
+ u32 first, last, len;
+ u8 *qsfp_eeprom;
+
+ port = hbl_netdev_priv(ndev);
+ hdev = port->hdev;
+ qsfp_eeprom = hdev->qsfp_eeprom;
+
+ if (ee->len == 0)
+ return -EINVAL;
+
+ first = ee->offset;
+ last = ee->offset + ee->len;
+
+ if (first < ETH_MODULE_SFF_8636_LEN) {
+ len = min_t(unsigned int, last, ETH_MODULE_SFF_8636_LEN);
+ len -= first;
+
+ memcpy(data, qsfp_eeprom + first, len);
+ }
+
+ return 0;
+}
+
+static u32 hbl_en_ethtool_get_priv_flags(struct net_device *ndev)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+
+ return port->pflags;
+}
+
+static int hbl_en_ethtool_set_priv_flags(struct net_device *ndev, u32 priv_flags)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+
+ port->pflags = priv_flags;
+
+ return 0;
+}
+
+static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
+ struct ethtool_link_ksettings *cmd)
+{
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_en_device *hdev;
+ struct hbl_en_port *port;
+ u32 port_idx, speed;
+
+ port = hbl_netdev_priv(ndev);
+ hdev = port->hdev;
+ port_idx = port->idx;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ speed = aux_ops->get_speed(aux_dev, port_idx);
+
+ cmd->base.speed = speed;
+ cmd->base.duplex = DUPLEX_FULL;
+
+ ethtool_link_ksettings_zero_link_mode(cmd, supported);
+ ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+
+ switch (speed) {
+ case SPEED_100000:
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
+
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
+
+ cmd->base.port = PORT_FIBRE;
+
+ ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
+
+ ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
+ break;
+ case SPEED_50000:
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
+
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
+ break;
+ case SPEED_25000:
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
+
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
+ break;
+ case SPEED_200000:
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
+
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
+ break;
+ case SPEED_400000:
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
+
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
+ break;
+ default:
+ netdev_err(port->ndev, "unknown speed %d\n", speed);
+ return -EFAULT;
+ }
+
+ ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
+
+ if (port->auto_neg_enable) {
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
+ cmd->base.autoneg = AUTONEG_ENABLE;
+ if (port->auto_neg_resolved)
+ ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
+ } else {
+ cmd->base.autoneg = AUTONEG_DISABLE;
+ }
+
+ ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
+
+ if (port->pfc_enable)
+ ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
+
+ return 0;
+}
+
+/* only autoneg is mutable */
+static bool check_immutable_ksettings(const struct ethtool_link_ksettings *old_cmd,
+ const struct ethtool_link_ksettings *new_cmd)
+{
+ return (old_cmd->base.speed == new_cmd->base.speed) &&
+ (old_cmd->base.duplex == new_cmd->base.duplex) &&
+ (old_cmd->base.port == new_cmd->base.port) &&
+ (old_cmd->base.phy_address == new_cmd->base.phy_address) &&
+ (old_cmd->base.eth_tp_mdix_ctrl == new_cmd->base.eth_tp_mdix_ctrl) &&
+ bitmap_equal(old_cmd->link_modes.advertising, new_cmd->link_modes.advertising,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static int
+hbl_en_ethtool_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd)
+{
+ struct ethtool_link_ksettings curr_cmd;
+ struct hbl_en_device *hdev;
+ struct hbl_en_port *port;
+ bool auto_neg;
+ u32 port_idx;
+ int rc;
+
+ port = hbl_netdev_priv(ndev);
+ hdev = port->hdev;
+ port_idx = port->idx;
+
+ memset(&curr_cmd, 0, sizeof(struct ethtool_link_ksettings));
+
+ rc = hbl_en_ethtool_get_link_ksettings(ndev, &curr_cmd);
+ if (rc)
+ return rc;
+
+ if (!check_immutable_ksettings(&curr_cmd, cmd))
+ return -EOPNOTSUPP;
+
+ auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
+
+ if (port->auto_neg_enable == auto_neg)
+ return 0;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(port->ndev, "port is in reset, can't update settings\n");
+ return -EBUSY;
+ }
+
+ if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
+ netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
+ rc = -EFAULT;
+ goto out;
+ }
+
+ port->auto_neg_enable = auto_neg;
+
+ if (netif_running(port->ndev)) {
+ rc = hbl_en_port_reset(port);
+ if (rc)
+ netdev_err(port->ndev, "Failed to reset port for settings update, rc %d\n",
+ rc);
+ }
+
+out:
+ atomic_set(&port->in_reset, 0);
+
+ return rc;
+}
+
+static int hbl_en_ethtool_get_sset_count(struct net_device *ndev, int sset)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ switch (sset) {
+ case ETH_SS_STATS:
+ return netdev_eth_stats_len + aux_ops->get_cnts_num(aux_dev, port_idx);
+ case ETH_SS_PRIV_FLAGS:
+ return pflags_str_len;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static void hbl_en_ethtool_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ int i;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ switch (stringset) {
+ case ETH_SS_STATS:
+ for (i = 0; i < netdev_eth_stats_len; i++)
+ ethtool_puts(&data, netdev_eth_stats[i].str);
+
+ aux_ops->get_cnts_names(aux_dev, port_idx, data);
+ break;
+ case ETH_SS_PRIV_FLAGS:
+ for (i = 0; i < pflags_str_len; i++)
+ ethtool_puts(&data, pflags_str[i]);
+ break;
+ }
+}
+
+static void hbl_en_ethtool_get_ethtool_stats(struct net_device *ndev,
+ __always_unused struct ethtool_stats *stats, u64 *data)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_en_device *hdev;
+ u32 port_idx;
+ char *p;
+ int i;
+
+ hdev = port->hdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ port_idx = port->idx;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ dev_info_ratelimited(hdev->dev, "port %d is in reset, can't get ethtool stats\n",
+ port_idx);
+ return;
+ }
+
+ /* Even though the Ethernet Rx/Tx flow might update the stats in parallel, there is no
+ * absolute need for synchronization. This is because missing a few counts of these stats
+ * is much better than adding a lock to synchronize and thereby increasing the overhead of
+ * the Rx/Tx flows. In the worst case, the reader gets stale stats and will receive the
+ * updated stats on the next read.
+ */
+ for (i = 0; i < netdev_eth_stats_len; i++) {
+ p = (char *)port + netdev_eth_stats[i].lo_offset;
+ data[i] = *(u64 *)p;
+ }
+
+ data += i;
+
+ aux_ops->get_cnts_values(aux_dev, port_idx, data);
+
+ atomic_set(&port->in_reset, 0);
+}
+
+static int hbl_en_ethtool_get_coalesce(struct net_device *ndev,
+ struct ethtool_coalesce *coal,
+ struct kernel_ethtool_coalesce *kernel_coal,
+ struct netlink_ext_ack *extack)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ aux_ops->ctrl_lock(aux_dev, port_idx);
+
+ coal->tx_max_coalesced_frames = port->tx_max_coalesced_frames;
+ coal->rx_coalesce_usecs = port->rx_coalesce_usecs;
+ coal->rx_max_coalesced_frames = port->rx_max_coalesced_frames;
+
+ aux_ops->ctrl_unlock(aux_dev, port_idx);
+
+ return 0;
+}
+
+static int hbl_en_ethtool_set_coalesce(struct net_device *ndev,
+ struct ethtool_coalesce *coal,
+ struct kernel_ethtool_coalesce *kernel_coal,
+ struct netlink_ext_ack *extack)
+{
+ struct hbl_en_port *port = hbl_netdev_priv(ndev);
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ int rc, rx_ring_size;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
+ netdev_err(port->ndev, "port is in reset, can't update settings\n");
+ return -EBUSY;
+ }
+
+ if (coal->tx_max_coalesced_frames < TX_COALESCED_FRAMES_MIN ||
+ coal->tx_max_coalesced_frames > TX_COALESCED_FRAMES_MAX) {
+ netdev_err(ndev, "tx max_coalesced_frames should be between %d and %d\n",
+ TX_COALESCED_FRAMES_MIN, TX_COALESCED_FRAMES_MAX);
+ rc = -EINVAL;
+ goto atomic_out;
+ }
+
+ rx_ring_size = hdev->asic_funcs.get_rx_ring_size(port);
+ if (coal->rx_max_coalesced_frames < RX_COALESCED_FRAMES_MIN ||
+ coal->rx_max_coalesced_frames >= rx_ring_size) {
+ netdev_err(ndev, "rx max_coalesced_frames should be between %d and %d\n",
+ RX_COALESCED_FRAMES_MIN, rx_ring_size);
+ rc = -EINVAL;
+ goto atomic_out;
+ }
+
+ aux_ops->ctrl_lock(aux_dev, port_idx);
+
+ port->tx_max_coalesced_frames = coal->tx_max_coalesced_frames;
+ port->rx_coalesce_usecs = coal->rx_coalesce_usecs;
+ port->rx_max_coalesced_frames = coal->rx_max_coalesced_frames;
+
+ rc = hdev->asic_funcs.set_coalesce(port);
+
+ aux_ops->ctrl_unlock(aux_dev, port_idx);
+
+atomic_out:
+ atomic_set(&port->in_reset, 0);
+ return rc;
+}
+
+void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port)
+{
+ port->rx_coalesce_usecs = CQ_ARM_TIMEOUT_USEC;
+ port->rx_max_coalesced_frames = 1;
+ port->tx_max_coalesced_frames = 1;
+}
+
+static const struct ethtool_ops hbl_en_ethtool_ops_coalesce = {
+ .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES |
+ ETHTOOL_COALESCE_TX_MAX_FRAMES,
+ .get_drvinfo = hbl_en_ethtool_get_drvinfo,
+ .get_link = ethtool_op_get_link,
+ .get_module_info = hbl_en_ethtool_get_module_info,
+ .get_module_eeprom = hbl_en_ethtool_get_module_eeprom,
+ .get_priv_flags = hbl_en_ethtool_get_priv_flags,
+ .set_priv_flags = hbl_en_ethtool_set_priv_flags,
+ .get_link_ksettings = hbl_en_ethtool_get_link_ksettings,
+ .set_link_ksettings = hbl_en_ethtool_set_link_ksettings,
+ .get_sset_count = hbl_en_ethtool_get_sset_count,
+ .get_strings = hbl_en_ethtool_get_strings,
+ .get_ethtool_stats = hbl_en_ethtool_get_ethtool_stats,
+ .get_coalesce = hbl_en_ethtool_get_coalesce,
+ .set_coalesce = hbl_en_ethtool_set_coalesce,
+};
+
+const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev)
+{
+ return &hbl_en_ethtool_ops_coalesce;
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 107+ messages in thread
* [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (7 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
` (6 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add Gaudi2 ASIC-specific support for Ethernet, which includes HW-specific
configurations and operations.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
MAINTAINERS | 1 +
drivers/net/ethernet/intel/hbl_en/Makefile | 3 +
.../net/ethernet/intel/hbl_en/common/hbl_en.c | 2 +
.../net/ethernet/intel/hbl_en/common/hbl_en.h | 2 +
.../net/ethernet/intel/hbl_en/gaudi2/Makefile | 2 +
.../ethernet/intel/hbl_en/gaudi2/gaudi2_en.c | 728 ++++++++++++++++++
.../ethernet/intel/hbl_en/gaudi2/gaudi2_en.h | 53 ++
.../intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c | 32 +
8 files changed, 823 insertions(+)
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h
create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 7301f38e9cfb..01b82e9b672c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9625,6 +9625,7 @@ W: https://www.habana.ai
F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
F: drivers/net/ethernet/intel/hbl_en/
F: include/linux/net/intel/cn*
+F: include/linux/net/intel/gaudi2*
HACKRF MEDIA DRIVER
L: linux-media@vger.kernel.org
diff --git a/drivers/net/ethernet/intel/hbl_en/Makefile b/drivers/net/ethernet/intel/hbl_en/Makefile
index 695497ab93b6..adc81ddf7d10 100644
--- a/drivers/net/ethernet/intel/hbl_en/Makefile
+++ b/drivers/net/ethernet/intel/hbl_en/Makefile
@@ -7,3 +7,6 @@ obj-$(CONFIG_HABANA_EN) := habanalabs_en.o
include $(src)/common/Makefile
habanalabs_en-y += $(HBL_EN_COMMON_FILES)
+
+include $(src)/gaudi2/Makefile
+habanalabs_en-y += $(HBL_EN_GAUDI2_FILES)
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
index 066be5ac2d84..7f071aea1b8e 100644
--- a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
@@ -997,6 +997,8 @@ static int hbl_en_set_asic_funcs(struct hbl_en_device *hdev)
{
switch (hdev->asic_type) {
case HBL_ASIC_GAUDI2:
+ gaudi2_en_set_asic_funcs(hdev);
+ break;
default:
dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
index 15504c1f3cfb..20259d610081 100644
--- a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
+++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
@@ -203,4 +203,6 @@ int hbl_en_handle_rx(struct hbl_en_port *port, int budget);
dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len);
void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len);
+void gaudi2_en_set_asic_funcs(struct hbl_en_device *hdev);
+
#endif /* HABANALABS_EN_H_ */
diff --git a/drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile b/drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile
new file mode 100644
index 000000000000..e95e714bcecf
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+HBL_EN_GAUDI2_FILES := gaudi2/gaudi2_en.o gaudi2/gaudi2_en_dcbnl.o
diff --git a/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c
new file mode 100644
index 000000000000..5be6d1d6aa3d
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c
@@ -0,0 +1,728 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2_en.h"
+
+#include <linux/circ_buf.h>
+
+#define RING_SIZE_MASK(ring) ((ring)->count - 1)
+
+static void req_qpc_init(struct gaudi2_qpc_requester *req_qpc, unsigned int mtu, int last_idx,
+ u32 schedq_num, bool enable)
+{
+ REQ_QPC_SET_TRANSPORT_SERVICE(*req_qpc, TS_RAW);
+ REQ_QPC_SET_LAST_IDX(*req_qpc, last_idx);
+ REQ_QPC_SET_WQ_TYPE(*req_qpc, 1);
+ REQ_QPC_SET_WQ_BASE_ADDR(*req_qpc, 0);
+ REQ_QPC_SET_MTU(*req_qpc, mtu);
+ REQ_QPC_SET_REMOTE_WQ_LOG_SZ(*req_qpc, 1);
+ REQ_QPC_SET_VALID(*req_qpc, (u64)enable);
+ REQ_QPC_SET_TRUST_LEVEL(*req_qpc, SECURED);
+ REQ_QPC_SET_PORT(*req_qpc, 0);
+ REQ_QPC_SET_DATA_MMU_BYPASS(*req_qpc, 1);
+ REQ_QPC_SET_BURST_SIZE(*req_qpc, 1);
+ REQ_QPC_SET_SCHD_Q_NUM(*req_qpc, schedq_num);
+ /* Due to a HW bug, backpressure is indicated on the ETH QP after some time. In order to
+ * avoid the BP message being sent, set the QP as backpressured to begin with. This will
+ * have no further impact, as the BP mechanism is associated with RDMA only.
+ */
+ REQ_QPC_SET_WQ_BACK_PRESSURE(*req_qpc, 1);
+}
+
+static void res_qpc_init(struct gaudi2_qpc_responder *res_qpc, u32 raw_qpn, u32 schedq_num,
+ bool enable)
+{
+ RES_QPC_SET_TRANSPORT_SERVICE(*res_qpc, TS_RAW);
+ RES_QPC_SET_VALID(*res_qpc, (u64)enable);
+ RES_QPC_SET_TRUST_LEVEL(*res_qpc, SECURED);
+ RES_QPC_SET_PORT(*res_qpc, 0);
+ RES_QPC_SET_CQ_NUM(*res_qpc, raw_qpn);
+ RES_QPC_SET_DATA_MMU_BYPASS(*res_qpc, 1);
+ RES_QPC_SET_SCHD_Q_NUM(*res_qpc, schedq_num);
+}
+
+static int gaudi2_en_read_pkt_from_hw(struct hbl_en_port *port, void **pkt_addr, u32 *pkt_size)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_cn_ring *rx_ring, *cq_ring;
+ enum hbl_en_eth_pkt_status pkt_status;
+ struct gaudi2_en_aux_ops *aux_ops;
+ struct gaudi2_en_device *gaudi2;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+ struct gaudi2_cqe *cqe_p;
+ u32 pi, size, wqe_idx;
+
+ gaudi2 = hdev->asic_specific;
+ aux_ops = gaudi2->aux_ops;
+ aux_dev = hdev->aux_dev;
+
+ rx_ring = gaudi2_port->rx_ring;
+ cq_ring = gaudi2_port->cq_ring;
+
+ /* check if packet is available by reading the PI */
+ if (cq_ring->ci_shadow == cq_ring->pi_shadow) {
+ pi = *((u32 *)RING_PI_ADDRESS(cq_ring));
+ if (pi == cq_ring->pi_shadow)
+ return ETH_PKT_NONE;
+
+ cq_ring->pi_shadow = pi;
+ }
+
+ cqe_p = (struct gaudi2_cqe *)RING_BUF_ADDRESS(cq_ring) +
+ (cq_ring->ci_shadow & RING_SIZE_MASK(cq_ring));
+
+ if (!CQE_IS_VALID(cqe_p)) {
+ dev_warn_ratelimited(hdev->dev, "Port-%d got invalid CQE on CQ\n", port_idx);
+ return ETH_PKT_DROP;
+ }
+
+ pkt_status = ETH_PKT_OK;
+
+ /* wqe index will point to the buffer consumed by HW */
+ wqe_idx = CQE_WQE_IDX(cqe_p) & RING_SIZE_MASK(rx_ring);
+ size = CQE_RAW_PKT_SIZE(cqe_p);
+
+	/* Since the CQE is valid, SW must consume it, even if the packet will eventually be dropped. */
+ if (size > hdev->max_frm_len || size == 0) {
+ dev_warn_ratelimited(hdev->dev, "Port-%d got invalid packet size %u\n",
+ port_idx, size);
+ pkt_status = ETH_PKT_DROP;
+ }
+
+ *pkt_addr = RING_BUF_ADDRESS(rx_ring) + wqe_idx * hdev->raw_elem_size;
+ *pkt_size = size;
+
+ cq_ring->ci_shadow++;
+
+	/* Mark the CQ entry as invalid */
+ CQE_SET_INVALID(cqe_p);
+
+	/* inform the HW of our current CI */
+ aux_ops->write_rx_ci(aux_dev, port_idx, cq_ring->ci_shadow);
+
+ return pkt_status;
+}
+
+static int gaudi2_en_get_rx_ring_size(struct hbl_en_port *port)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_cn_ring *rx_ring;
+
+ rx_ring = gaudi2_port->rx_ring;
+
+ return RING_SIZE_MASK(rx_ring);
+}
+
+static void gaudi2_en_configure_cq(struct hbl_en_port *port, bool enable)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct gaudi2_en_aux_ops *aux_ops;
+ struct gaudi2_en_device *gaudi2;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ gaudi2 = hdev->asic_specific;
+ aux_ops = gaudi2->aux_ops;
+ aux_dev = hdev->aux_dev;
+
+	/* if rx_coalesce_usecs is 0, the timer should be disabled */
+ aux_ops->configure_cq(aux_dev, port_idx, port->rx_coalesce_usecs,
+ port->rx_coalesce_usecs ? enable : false);
+}
+
+static void gaudi2_en_arm_cq(struct hbl_en_port *port)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_en_device *hdev = port->hdev;
+ struct gaudi2_en_aux_ops *aux_ops;
+ struct gaudi2_en_device *gaudi2;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ gaudi2 = hdev->asic_specific;
+ aux_ops = gaudi2->aux_ops;
+ aux_dev = hdev->aux_dev;
+
+	/* The trigger happens only when PI > IDX, therefore subtract 1 from the arming index */
+ aux_ops->arm_cq(aux_dev, port_idx,
+ gaudi2_port->cq_ring->ci_shadow + port->rx_max_coalesced_frames - 1);
+}
+
+static int gaudi2_en_set_coalesce(struct hbl_en_port *port)
+{
+ struct hbl_en_device *hdev = port->hdev;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port_idx = port->idx;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (!aux_ops->is_port_open(aux_dev, port_idx))
+ return 0;
+
+ gaudi2_en_configure_cq(port, port->is_initialized);
+ gaudi2_en_arm_cq(port);
+
+ return 0;
+}
+
+static int gaudi2_en_config_qp(struct hbl_en_port *port, bool enable)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_en_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_qpc_requester req_qpc = {};
+ struct gaudi2_qpc_responder res_qpc = {};
+ struct net_device *ndev = port->ndev;
+ struct gaudi2_en_aux_data *aux_data;
+ u32 port_idx, raw_qpn, schedq_num;
+ struct gaudi2_en_device *gaudi2;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct qpc_mask mask = {};
+ int last_idx, rc;
+ unsigned int mtu;
+
+ gaudi2 = hdev->asic_specific;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ aux_data = gaudi2->aux_data;
+ port_idx = gaudi2_port->idx;
+ raw_qpn = aux_data->raw_qpn;
+ schedq_num = aux_data->schedq_num;
+ mtu = ndev->mtu + HBL_EN_MAX_HEADERS_SZ;
+
+	/* Configure the log2 of the MTU value relative to 1KB, which is the minimum valid MTU.
+	 * If the MTU value is not a power of 2, round up to the next power of 2.
+ */
+ mtu = __fls(mtu) - 10 + !is_power_of_2(mtu);
+
+ last_idx = gaudi2_port->wq_ring->count - 1;
+
+ if (!enable) {
+ rc = aux_ops->eq_dispatcher_unregister_qp(aux_dev, port_idx, raw_qpn);
+ if (rc) {
+ netdev_err(ndev, "Failed to unregister QP, %d\n", rc);
+ return rc;
+ }
+
+ REQ_QPC_SET_VALID(mask, 1);
+ rc = aux_ops->qpc_write(aux_dev, port_idx, &req_qpc, &mask, raw_qpn, true);
+ if (rc) {
+ netdev_err(ndev, "Failed to configure requester QP, %d\n", rc);
+ return rc;
+ }
+
+ memset(&mask, 0, sizeof(mask));
+ RES_QPC_SET_VALID(mask, 1);
+ rc = aux_ops->qpc_write(aux_dev, port_idx, &res_qpc, &mask, raw_qpn, false);
+ if (rc)
+ netdev_err(ndev, "Failed to configure responder QP, %d\n", rc);
+
+ return rc;
+ }
+
+ memset(&res_qpc, 0, sizeof(res_qpc));
+ res_qpc_init(&res_qpc, raw_qpn, schedq_num, enable);
+ rc = aux_ops->qpc_write(aux_dev, port_idx, &res_qpc, NULL, raw_qpn, false);
+ if (rc) {
+ netdev_err(ndev, "Failed to configure responder QP, %d\n", rc);
+ goto qp_register_fail;
+ }
+
+ memset(&req_qpc, 0, sizeof(req_qpc));
+ req_qpc_init(&req_qpc, mtu, last_idx, schedq_num, enable);
+ rc = aux_ops->qpc_write(aux_dev, port_idx, &req_qpc, NULL, raw_qpn, true);
+ if (rc) {
+ netdev_err(ndev, "Failed to configure requester QP, %d\n", rc);
+ goto qp_register_fail;
+ }
+
+ rc = aux_ops->eq_dispatcher_register_qp(aux_dev, port_idx, aux_data->kernel_asid, raw_qpn);
+ if (rc) {
+ netdev_err(ndev, "Failed to register QP, %d\n", rc);
+ goto qp_register_fail;
+ }
+
+ return 0;
+
+qp_register_fail:
+ memset(&res_qpc, 0, sizeof(res_qpc));
+ RES_QPC_SET_VALID(res_qpc, 0);
+ aux_ops->qpc_write(aux_dev, port_idx, &res_qpc, NULL, raw_qpn, false);
+ memset(&req_qpc, 0, sizeof(req_qpc));
+ REQ_QPC_SET_VALID(req_qpc, 0);
+ aux_ops->qpc_write(aux_dev, port_idx, &req_qpc, NULL, raw_qpn, true);
+
+ return rc;
+}
+
+static void gaudi2_en_tx_done(struct gaudi2_en_port *gaudi2_port, struct hbl_cn_eqe *eqe_p)
+{
+ u32 port_idx, raw_qpn, handled_ci, pi, previous_pi;
+ struct hbl_en_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_en_aux_data *asic_aux_data;
+ struct hbl_en_aux_data *aux_data;
+ struct gaudi2_en_tx_buf *tx_buf;
+ struct netdev_queue *netdev_txq;
+ struct hbl_aux_dev *aux_dev;
+ struct net_device *ndev;
+
+ port_idx = gaudi2_port->idx;
+ ndev = hdev->ports[port_idx].ndev;
+ aux_dev = hdev->aux_dev;
+ aux_data = aux_dev->aux_data;
+ asic_aux_data = aux_data->asic_specific;
+ raw_qpn = asic_aux_data->raw_qpn;
+
+ if (EQE_RAW_TX_EVENT_QPN(eqe_p) != raw_qpn) {
+ netdev_warn(ndev, "tx-done: port %d got wrong QP (%d vs %d); ignoring", port_idx,
+ EQE_RAW_TX_EVENT_QPN(eqe_p), raw_qpn);
+ return;
+ }
+
+ if (EQE_RAW_TX_EVENT_IDX(eqe_p) >= asic_aux_data->tx_ring_len) {
+ netdev_err(ndev, "tx-done: port %d got invalid WQE index (%d max %d); ignoring",
+ port_idx, EQE_RAW_TX_EVENT_IDX(eqe_p), asic_aux_data->tx_ring_len - 1);
+ return;
+ }
+
+ netdev_txq = netdev_get_tx_queue(ndev, 0);
+
+	/* Here we need to acquire the Tx lock (which is also acquired by the Tx handler) in order
+	 * to prevent races when accessing the Tx buffer and stopping/waking the netdev queue.
+	 */
+ __netif_tx_lock_bh(netdev_txq);
+
+	/* Check if the index we got is within the current data bounds (indicated by the CI and
+	 * PI). The out-of-bounds region is [PI, CI-1], circularly.
+	 */
+ pi = gaudi2_port->tx_buf_info_pi;
+ previous_pi = CIRC_CNT(pi, 1, asic_aux_data->tx_ring_len);
+
+ if (CIRC_CNT(previous_pi, EQE_RAW_TX_EVENT_IDX(eqe_p), asic_aux_data->tx_ring_len) >=
+ CIRC_CNT(pi, gaudi2_port->tx_buf_info_ci, asic_aux_data->tx_ring_len)) {
+ dev_warn_ratelimited(hdev->dev,
+ "tx-done: port %d got stale WQE index (expecting values between 0x%x to 0x%x, got 0x%x); ignoring",
+ port_idx, gaudi2_port->tx_buf_info_ci, pi,
+ EQE_RAW_TX_EVENT_IDX(eqe_p));
+ goto out;
+ }
+
+ /* Handle all entries up to the entry reported in the event */
+ do {
+ tx_buf = gaudi2_port->tx_buf_info + gaudi2_port->tx_buf_info_ci;
+ if (!tx_buf->skb) {
+ dev_warn_ratelimited(hdev->dev,
+ "Port-%d attempted to free a NULL element in TX ring (ci 0x%x, pi 0x%x, idx 0x%x)\n",
+ port_idx, gaudi2_port->tx_buf_info_ci, pi,
+ EQE_RAW_TX_EVENT_IDX(eqe_p));
+ goto out;
+ }
+ hbl_en_dma_unmap(hdev, tx_buf->dma_addr, tx_buf->len);
+ dev_consume_skb_any(tx_buf->skb);
+
+ tx_buf->skb = NULL;
+ handled_ci = gaudi2_port->tx_buf_info_ci;
+ gaudi2_port->tx_buf_info_ci =
+ (gaudi2_port->tx_buf_info_ci + 1) & (asic_aux_data->tx_ring_len - 1);
+ } while (EQE_RAW_TX_EVENT_IDX(eqe_p) != handled_ci);
+
+	/* No need to check for FIFO space because if the queue was stopped then the FIFO has room
+	 * by now, as it is cleaned within a device cycle. In addition, wake the queue only if the
+	 * link is UP.
+ */
+ if (netif_queue_stopped(ndev) && netif_carrier_ok(ndev))
+ netif_wake_queue(ndev);
+
+out:
+ __netif_tx_unlock_bh(netdev_txq);
+}
+
+static u32 gaudi2_en_get_overrun_cnt(struct hbl_aux_dev *aux_dev, u32 port_idx)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+ struct gaudi2_en_port *gaudi2_port;
+
+ gaudi2_port = port->asic_specific;
+
+ return gaudi2_port->fifo_overrun_err_cnt;
+}
+
+static void gaudi2_en_handle_eqe(struct hbl_aux_dev *aux_dev, u32 port_idx, struct hbl_cn_eqe *eqe)
+{
+ struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
+ u32 event_type = EQE_TYPE(eqe), qp, synd;
+ struct hbl_en_device *hdev = port->hdev;
+ struct gaudi2_en_aux_ops *asic_aux_ops;
+ struct gaudi2_en_port *gaudi2_port;
+ struct hbl_en_aux_ops *aux_ops;
+
+ gaudi2_port = port->asic_specific;
+ aux_ops = hdev->aux_dev->aux_ops;
+ asic_aux_ops = aux_ops->asic_ops;
+
+ if (!EQE_IS_VALID(eqe)) {
+ dev_warn_ratelimited(hdev->dev, "Port-%d got invalid EQE on EQ!\n", port_idx);
+ return;
+ }
+
+ switch (event_type) {
+ case EQE_COMP_ERR:
+ dev_warn_ratelimited(hdev->dev, "Port-%d cq-err event CQ:%d PI:0x%x\n",
+ port_idx, EQE_CQ_EVENT_CQ_NUM(eqe), EQE_CQ_EVENT_PI(eqe));
+
+ atomic64_inc(&port->net_stats.rx_dropped);
+ /* CQ is configured to generate BP on such cases hence we just need to handle
+ * the packets in the Rx buffer
+ */
+ fallthrough;
+ case EQE_COMP:
+ if (!hdev->poll_enable) {
+ /* napi_schedule() eventually calls __raise_softirq_irqoff() which sets the
+ * net Rx softirq to run. Since we are in thread context here, the pending
+ * softirq flag won't be checked and the Rx softirq won't be invoked. Hence
+ * we need to use the bh_disable/enable pair to invoke it.
+ */
+ local_bh_disable();
+ napi_schedule(&port->napi);
+ local_bh_enable();
+ } else {
+ hbl_en_rx_poll_start(port);
+ }
+ break;
+ case EQE_RAW_TX_COMP:
+ gaudi2_en_tx_done(gaudi2_port, eqe);
+ break;
+ case EQE_QP_ERR:
+ synd = EQE_QP_EVENT_ERR_SYND(eqe);
+ qp = EQE_QP_EVENT_QPN(eqe);
+ dev_err_ratelimited(hdev->dev, "Port-%d qp-err event QP:%d err:%d %s\n", port_idx,
+ qp, synd, asic_aux_ops->qp_err_syndrome_to_str(synd));
+
+ /* In case of QP error we need to reset the port. We are calling the "locked"
+ * version of that function since the port->control_lock has been already
+ * taken at the beginning of the EQ handler.
+ */
+ dev_err_ratelimited(hdev->dev, "Going to reset port %d\n", port_idx);
+ aux_ops->track_ext_port_reset(aux_dev, port_idx, synd);
+ hbl_en_port_reset_locked(aux_dev, port_idx);
+ break;
+ case EQE_DB_FIFO_OVERRUN:
+ dev_warn_ratelimited(hdev->dev, "Port-%d db-fifo overrun event\n", port_idx);
+ gaudi2_port->fifo_overrun_err_cnt++;
+ atomic64_inc(&port->net_stats.tx_dropped);
+ break;
+ default:
+ dev_warn_ratelimited(hdev->dev, "Port-%d unsupported event type: %d", port_idx,
+ event_type);
+ break;
+ }
+}
+
+static netdev_tx_t gaudi2_en_write_pkt_to_hw(struct hbl_en_port *port, struct sk_buff *skb)
+{
+ u32 port_idx, tx_buf_info_pi, pi, space_left_in_qp, wq_ring_pi;
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_en_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_en_aux_data *asic_aux_data;
+ struct net_device *ndev = port->ndev;
+ struct gaudi2_en_aux_ops *aux_ops;
+ struct hbl_en_aux_data *aux_data;
+ struct gaudi2_en_device *gaudi2;
+ struct gaudi2_sq_wqe *wqe_p;
+ struct hbl_cn_ring *wq_ring;
+ struct hbl_aux_dev *aux_dev;
+ bool db_fifo_full_after_tx;
+ dma_addr_t dma_addr;
+ int rc;
+
+ tx_buf_info_pi = gaudi2_port->tx_buf_info_pi;
+ port_idx = port->idx;
+ gaudi2 = hdev->asic_specific;
+ aux_ops = gaudi2->aux_ops;
+ aux_dev = hdev->aux_dev;
+ aux_data = aux_dev->aux_data;
+ asic_aux_data = aux_data->asic_specific;
+ wq_ring = gaudi2_port->wq_ring;
+
+ dma_addr = hbl_en_dma_map(hdev, skb->data, skb->len);
+ if (unlikely(dma_mapping_error(hdev->dev, dma_addr))) {
+ dev_err_ratelimited(hdev->dev, "port %d failed to map DMA address\n", port_idx);
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ gaudi2_port->tx_buf_info[tx_buf_info_pi].dma_addr = dma_addr;
+ gaudi2_port->tx_buf_info[tx_buf_info_pi].skb = skb;
+ gaudi2_port->tx_buf_info[tx_buf_info_pi].len = skb->len;
+ gaudi2_port->tx_buf_info_pi = (tx_buf_info_pi + 1) & (asic_aux_data->tx_ring_len - 1);
+
+ /* point on the next WQE */
+ pi = wq_ring->pi_shadow;
+ wq_ring_pi = (wq_ring->pi_shadow + 1) & (wq_ring->count - 1);
+
+ wqe_p = (struct gaudi2_sq_wqe *)RING_BUF_ADDRESS(wq_ring) + pi;
+ memset(wqe_p, 0, sizeof(*wqe_p));
+
+	/* for Ethernet only, turn on the solicited event bit */
+ CFG_SQ_WQE_RESET(wqe_p);
+ CFG_SQ_WQE_OPCODE(wqe_p, WQE_LINEAR);
+ CFG_SQ_WQE_INDEX(wqe_p, pi & 0xff);
+ CFG_SQ_WQE_INLINE(wqe_p, 0);
+ CFG_SQ_WQE_LOCAL_ADDRESS(wqe_p, dma_addr);
+ CFG_SQ_WQE_SIZE(wqe_p, (u64)skb->len);
+ CFG_SQ_WQE_SOL_EVENT(wqe_p, (pi % port->tx_max_coalesced_frames) ? 0 : 1);
+
+ /* make sure data is filled before ringing the db */
+ mb();
+
+ /* Ring doorbell */
+ rc = aux_ops->ring_tx_doorbell(aux_dev, port_idx, wq_ring_pi, &db_fifo_full_after_tx);
+ if (rc) {
+ /* Fifo is full, revert indices, unmap the skb, stop queue and return the error. */
+ gaudi2_port->tx_buf_info_pi = tx_buf_info_pi;
+ hbl_en_dma_unmap(hdev, dma_addr, skb->len);
+ gaudi2_port->tx_buf_info[tx_buf_info_pi].skb = NULL;
+
+ netdev_dbg(ndev, "port: %d stop queue due to full fifo - packet not sent\n",
+ port_idx);
+ netif_stop_queue(skb->dev);
+
+ return NETDEV_TX_BUSY;
+ }
+
+ wq_ring->pi_shadow = wq_ring_pi;
+
+ /* Check if we have enough space on the QP-WQ for the next xmit. */
+ space_left_in_qp = CIRC_SPACE(gaudi2_port->tx_buf_info_pi, gaudi2_port->tx_buf_info_ci,
+ asic_aux_data->tx_ring_len);
+ if (!space_left_in_qp || db_fifo_full_after_tx) {
+ netdev_dbg(ndev, "port: %d stop queue due to full %s\n", port_idx,
+ db_fifo_full_after_tx ? "fifo" : "WQ");
+ netif_stop_queue(skb->dev);
+ }
+
+ return NETDEV_TX_OK;
+}
+
+static int gaudi2_en_port_open(struct hbl_en_port *port)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ struct hbl_cn_ring *wq_ring, *cq_ring;
+ int rc;
+
+ /* Reset Tx ring shadow PI/CI */
+ wq_ring = gaudi2_port->wq_ring;
+ wq_ring->pi_shadow = 0;
+ wq_ring->ci_shadow = 0; /* Unused */
+
+ /* Reset SW Tx buffer PI/CI */
+ gaudi2_port->tx_buf_info_pi = 0;
+ gaudi2_port->tx_buf_info_ci = 0;
+
+ /* Reset FIFO overrun error counter */
+ gaudi2_port->fifo_overrun_err_cnt = 0;
+
+ /* Reset CQ ring HW PI and shadow PI/CI */
+ cq_ring = gaudi2_port->cq_ring;
+ *((u32 *)RING_PI_ADDRESS(cq_ring)) = 0;
+ cq_ring->pi_shadow = 0;
+ cq_ring->ci_shadow = 0;
+
+ rc = gaudi2_en_config_qp(port, true);
+ if (rc) {
+ netdev_warn(port->ndev, "Failed to configure QPs, %d\n", rc);
+ return rc;
+ }
+
+	/* The CQ-ARM is needed for both polling and NAPI flows. This is because, even in
+	 * polling mode, Rx polling starts only upon the CQ-ARM event triggering the EQ
+	 * for Rx completion.
+ */
+ gaudi2_en_configure_cq(port, true);
+ gaudi2_en_arm_cq(port);
+
+ return 0;
+}
+
+static void gaudi2_en_db_fifo_reset(struct gaudi2_en_port *gaudi2_port)
+{
+ struct hbl_en_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_en_aux_ops *asic_aux_ops;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ asic_aux_ops = aux_ops->asic_ops;
+
+ asic_aux_ops->db_fifo_reset(aux_dev, gaudi2_port->idx);
+}
+
+static void gaudi2_en_flush_tx_buffer(struct gaudi2_en_port *gaudi2_port)
+{
+ struct hbl_en_device *hdev = gaudi2_port->hdev;
+ struct gaudi2_en_aux_data *asic_aux_data;
+ struct hbl_en_aux_data *aux_data;
+ struct gaudi2_en_tx_buf *tx_buf;
+ struct hbl_aux_dev *aux_dev;
+ u32 ci, pi;
+
+ aux_dev = hdev->aux_dev;
+ aux_data = aux_dev->aux_data;
+ asic_aux_data = aux_data->asic_specific;
+ ci = gaudi2_port->tx_buf_info_ci;
+ pi = gaudi2_port->tx_buf_info_pi;
+
+ while (ci != pi) {
+ tx_buf = &gaudi2_port->tx_buf_info[ci];
+ hbl_en_dma_unmap(hdev, tx_buf->dma_addr, tx_buf->len);
+ dev_kfree_skb_any(tx_buf->skb);
+
+ ci = (ci + 1) & (asic_aux_data->tx_ring_len - 1);
+ }
+
+ gaudi2_port->tx_buf_info_ci = ci;
+}
+
+static void gaudi2_en_port_close(struct hbl_en_port *port)
+{
+ struct gaudi2_en_port *gaudi2_port = port->asic_specific;
+ int rc;
+
+ gaudi2_en_configure_cq(port, false);
+
+ /* disable ETH Rx/Tx in H/W */
+ rc = gaudi2_en_config_qp(port, false);
+ if (rc)
+ netdev_warn(port->ndev, "Failed to destroy QPs, %d\n", rc);
+
+ gaudi2_en_db_fifo_reset(gaudi2_port);
+
+ /* Discard skbs safely from tx_buf as we won't get the tx_done call from the EQ now that the
+ * port is closed.
+ */
+ gaudi2_en_flush_tx_buffer(gaudi2_port);
+}
+
+static int gaudi2_en_dev_init(struct hbl_en_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct gaudi2_en_port *gaudi2_port, *ports;
+ struct gaudi2_en_aux_data *asic_aux_data;
+ struct gaudi2_en_aux_ops *asic_aux_ops;
+ struct hbl_en_aux_data *aux_data;
+ struct gaudi2_en_device *gaudi2;
+ struct hbl_en_aux_ops *aux_ops;
+ int i, rc = 0, ports_cnt = 0;
+ struct hbl_en_port *port;
+ u32 tx_ring_size;
+
+ aux_data = aux_dev->aux_data;
+ asic_aux_data = aux_data->asic_specific;
+ aux_ops = aux_dev->aux_ops;
+ asic_aux_ops = aux_ops->asic_ops;
+
+ gaudi2 = kzalloc(sizeof(*gaudi2), GFP_KERNEL);
+ if (!gaudi2)
+ return -ENOMEM;
+
+ ports = kcalloc(hdev->max_num_of_ports, sizeof(*ports), GFP_KERNEL);
+ if (!ports) {
+ rc = -ENOMEM;
+ goto ports_alloc_fail;
+ }
+
+ tx_ring_size = asic_aux_data->tx_ring_len * sizeof(struct gaudi2_en_tx_buf);
+
+ for (i = 0; i < hdev->max_num_of_ports; i++, ports_cnt++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &ports[i];
+ gaudi2_port->tx_buf_info = kzalloc(tx_ring_size, GFP_KERNEL);
+ if (!gaudi2_port->tx_buf_info) {
+ rc = -ENOMEM;
+ goto ports_init_fail;
+ }
+
+ gaudi2_port->idx = i;
+ gaudi2_port->hdev = hdev;
+ gaudi2_port->rx_ring = asic_aux_data->rx_rings[i];
+ gaudi2_port->cq_ring = asic_aux_data->cq_rings[i];
+ gaudi2_port->wq_ring = asic_aux_data->wq_rings[i];
+ port = &hdev->ports[i];
+ port->asic_specific = gaudi2_port;
+ }
+
+ asic_aux_ops->port_reset_locked = hbl_en_port_reset_locked;
+ asic_aux_ops->get_overrun_cnt = gaudi2_en_get_overrun_cnt;
+
+ gaudi2->ports = ports;
+ gaudi2->aux_data = asic_aux_data;
+ gaudi2->aux_ops = asic_aux_ops;
+
+ hdev->asic_specific = gaudi2;
+
+ hdev->pad_size = gaudi2->aux_data->pad_size;
+
+ return 0;
+
+ports_init_fail:
+ for (i = 0; i < ports_cnt; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &ports[i];
+ kfree(gaudi2_port->tx_buf_info);
+ }
+
+ kfree(ports);
+ports_alloc_fail:
+ kfree(gaudi2);
+
+ return rc;
+}
+
+static void gaudi2_en_dev_fini(struct hbl_en_device *hdev)
+{
+ struct gaudi2_en_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_en_port *gaudi2_port;
+ int i;
+
+ if (!gaudi2)
+ return;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_port = &gaudi2->ports[i];
+ kfree(gaudi2_port->tx_buf_info);
+ }
+
+ kfree(gaudi2->ports);
+ kfree(gaudi2);
+}
+
+void gaudi2_en_set_asic_funcs(struct hbl_en_device *hdev)
+{
+ struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
+
+ asic_funcs->dev_init = gaudi2_en_dev_init;
+ asic_funcs->dev_fini = gaudi2_en_dev_fini;
+ asic_funcs->eth_port_open = gaudi2_en_port_open;
+ asic_funcs->eth_port_close = gaudi2_en_port_close;
+ asic_funcs->reenable_rx_irq = gaudi2_en_arm_cq;
+ asic_funcs->write_pkt_to_hw = gaudi2_en_write_pkt_to_hw;
+ asic_funcs->read_pkt_from_hw = gaudi2_en_read_pkt_from_hw;
+ asic_funcs->get_pfc_cnts = gaudi2_en_dcbnl_get_pfc_cnts;
+ asic_funcs->set_coalesce = gaudi2_en_set_coalesce;
+ asic_funcs->get_rx_ring_size = gaudi2_en_get_rx_ring_size;
+ asic_funcs->handle_eqe = gaudi2_en_handle_eqe;
+}
diff --git a/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h
new file mode 100644
index 000000000000..ec5084462899
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef GAUDI2_EN_H_
+#define GAUDI2_EN_H_
+
+#include <linux/net/intel/gaudi2.h>
+
+#include "../common/hbl_en.h"
+
+/**
+ * struct gaudi2_en_device - Gaudi2 device structure.
+ * @ports: array of Gaudi2 ports structures.
+ * @aux_data: relevant data from the core device.
+ * @aux_ops: pointer functions for core <-> en drivers communication.
+ */
+struct gaudi2_en_device {
+ struct gaudi2_en_port *ports;
+ struct gaudi2_en_aux_data *aux_data;
+ struct gaudi2_en_aux_ops *aux_ops;
+};
+
+/**
+ * struct gaudi2_en_port - Gaudi2 port structure.
+ * @hdev: habanalabs device structure.
+ * @rx_ring: raw skb ring.
+ * @cq_ring: packets completion ring.
+ * @wq_ring: work queue ring.
+ * @tx_buf_info: Tx packets ring.
+ * @idx: port index.
+ * @tx_buf_info_pi: Tx producer index.
+ * @tx_buf_info_ci: Tx consumer index.
+ * @fifo_overrun_err_cnt: FIFO overrun error count.
+ */
+struct gaudi2_en_port {
+ struct hbl_en_device *hdev;
+ struct hbl_cn_ring *rx_ring;
+ struct hbl_cn_ring *cq_ring;
+ struct hbl_cn_ring *wq_ring;
+ struct gaudi2_en_tx_buf *tx_buf_info;
+ u32 idx;
+ u32 tx_buf_info_pi;
+ u32 tx_buf_info_ci;
+ u32 fifo_overrun_err_cnt;
+};
+
+void gaudi2_en_dcbnl_get_pfc_cnts(struct hbl_en_port *port, void *ptr);
+
+#endif /* GAUDI2_EN_H_ */
diff --git a/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c
new file mode 100644
index 000000000000..f565d7648823
--- /dev/null
+++ b/drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2_en.h"
+
+void gaudi2_en_dcbnl_get_pfc_cnts(struct hbl_en_port *port, void *ptr)
+{
+#ifdef CONFIG_DCB
+ struct hbl_en_device *hdev = port->hdev;
+ struct gaudi2_en_aux_ops *asic_aux_ops;
+ struct hbl_en_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct ieee_pfc *pfc = ptr;
+ u64 indications, requests;
+ u32 port_idx = port->idx;
+ int pfc_prio;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ asic_aux_ops = aux_ops->asic_ops;
+
+ for (pfc_prio = 0; pfc_prio < HBL_EN_PFC_PRIO_NUM; pfc_prio++) {
+ asic_aux_ops->get_pfc_cnts(aux_dev, port_idx, pfc_prio, &indications, &requests);
+
+ pfc->indications[pfc_prio] = indications;
+ pfc->requests[pfc_prio] = requests;
+ }
+#endif
+}
--
2.34.1
* [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (8 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 19:18 ` Leon Romanovsky
2024-06-17 14:17 ` Jason Gunthorpe
2024-06-13 8:22 ` [PATCH 12/15] RDMA/hbl: direct verbs support Omer Shpigelman
` (5 subsequent siblings)
15 siblings, 2 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add an RDMA driver for the Gaudi ASIC family, used for AI scaling.
The driver itself is agnostic to the underlying ASIC; it operates
according to the capabilities that were passed at device initialization.
The device is initialized by the hbl_cn driver via auxiliary bus.
The driver also supports QP resource tracking and port/device HW counters.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
MAINTAINERS | 10 +
drivers/infiniband/Kconfig | 1 +
drivers/infiniband/hw/Makefile | 1 +
drivers/infiniband/hw/hbl/Kconfig | 17 +
drivers/infiniband/hw/hbl/Makefile | 8 +
drivers/infiniband/hw/hbl/hbl.h | 326 +++
drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
include/uapi/rdma/hbl-abi.h | 204 ++
include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
12 files changed, 3904 insertions(+)
create mode 100644 drivers/infiniband/hw/hbl/Kconfig
create mode 100644 drivers/infiniband/hw/hbl/Makefile
create mode 100644 drivers/infiniband/hw/hbl/hbl.h
create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
create mode 100644 include/uapi/rdma/hbl-abi.h
create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 01b82e9b672c..d754eb13cb58 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9627,6 +9627,16 @@ F: drivers/net/ethernet/intel/hbl_en/
F: include/linux/net/intel/cn*
F: include/linux/net/intel/gaudi2*
+HABANALABS IB DRIVER
+M: Omer Shpigelman <oshpigelman@habana.ai>
+L: linux-rdma@vger.kernel.org
+S: Supported
+W: https://www.habana.ai
+Q: https://patchwork.kernel.org/project/linux-rdma/list/
+F: drivers/infiniband/hw/hbl/
+F: include/linux/net/intel/cn*
+F: include/uapi/rdma/hbl-abi.h
+
HACKRF MEDIA DRIVER
L: linux-media@vger.kernel.org
S: Orphan
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index a5827d11e934..8f913558b816 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -83,6 +83,7 @@ source "drivers/infiniband/hw/bnxt_re/Kconfig"
source "drivers/infiniband/hw/cxgb4/Kconfig"
source "drivers/infiniband/hw/efa/Kconfig"
source "drivers/infiniband/hw/erdma/Kconfig"
+source "drivers/infiniband/hw/hbl/Kconfig"
source "drivers/infiniband/hw/hfi1/Kconfig"
source "drivers/infiniband/hw/hns/Kconfig"
source "drivers/infiniband/hw/irdma/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index 1211f4317a9f..e8fc49b1bb03 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_INFINIBAND_MTHCA) += mthca/
obj-$(CONFIG_INFINIBAND_QIB) += qib/
obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
obj-$(CONFIG_INFINIBAND_EFA) += efa/
+obj-$(CONFIG_INFINIBAND_HBL) += hbl/
obj-$(CONFIG_INFINIBAND_IRDMA) += irdma/
obj-$(CONFIG_MANA_INFINIBAND) += mana/
obj-$(CONFIG_MLX4_INFINIBAND) += mlx4/
diff --git a/drivers/infiniband/hw/hbl/Kconfig b/drivers/infiniband/hw/hbl/Kconfig
new file mode 100644
index 000000000000..90c865a82540
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# HabanaLabs (an Intel Company) InfiniBand driver configuration
+#
+
+config INFINIBAND_HBL
+ tristate "HabanaLabs (an Intel Company) InfiniBand driver"
+ depends on NETDEVICES && ETHERNET && PCI && INET
+ select HABANA_CN
+ help
+ This driver enables InfiniBand functionality for the network
+ interfaces that are part of the GAUDI ASIC family of AI Accelerators.
+	  The network interfaces are mainly used for scaling out the training of
+	  AI neural networks through the RoCEv2 protocol.
+
+ To compile this driver as a module, choose M here. The module
+ will be called habanalabs_ib.
diff --git a/drivers/infiniband/hw/hbl/Makefile b/drivers/infiniband/hw/hbl/Makefile
new file mode 100644
index 000000000000..659d4bbfec0f
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for HabanaLabs (an Intel Company) InfiniBand driver
+#
+
+obj-$(CONFIG_INFINIBAND_HBL) := habanalabs_ib.o
+
+habanalabs_ib-y += hbl_main.o hbl_verbs.o
diff --git a/drivers/infiniband/hw/hbl/hbl.h b/drivers/infiniband/hw/hbl/hbl.h
new file mode 100644
index 000000000000..4fbe2368fd11
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl.h
@@ -0,0 +1,326 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef _HBL_H_
+#define _HBL_H_
+
+#include <linux/net/intel/cn.h>
+
+#include <uapi/rdma/hbl-abi.h>
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+#include <uapi/rdma/hbl_user_ioctl_verbs.h>
+#include <rdma/ib_verbs.h>
+#include <linux/pci.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/xarray.h>
+
+#define HBL_IB_MAX_PORT_GIDS 13
+
+/* For internal ports, only one GID is required and that is based on the MAC address. */
+#define HBL_IB_MAX_PORT_GIDS_INTERNAL 1
+
+/* define maximum supported send and receive SGEs */
+#define HBL_IB_MAX_SEND_SGE 2
+#define HBL_IB_MAX_RECV_SGE 2
+
+#define HBL_IB_EQ_PORT_FIELD_MASK 0xFFFF
+#define HBL_IB_EQ_PORT_FIELD_SIZE 16
+
+/**
+ * struct hbl_ib_user_mmap_entry - Mmap information.
+ * @rdma_entry: IB core rdma mmap entry.
+ * @info: Information for performing mmap.
+ */
+struct hbl_ib_user_mmap_entry {
+ struct rdma_user_mmap_entry rdma_entry;
+ struct hbl_ib_mem_info info;
+};
+
+/**
+ * struct hbl_ib_pd - Habanalabs IB PD.
+ * @ibpd: IB core PD.
+ * @pdn: PD ID.
+ */
+struct hbl_ib_pd {
+ struct ib_pd ibpd;
+ u32 pdn;
+};
+
+/**
+ * struct hbl_ib_ucontext - Habanalabs IB user context.
+ * @ibucontext: IB core user context.
+ * @qp_xarray: XArray of QP handles.
+ * @cn_ctx: CN context private data.
+ * @pd_allocated: Whether a PD has been allocated.
+ * @ports_mask: Mask of ports associated with this context.
+ */
+struct hbl_ib_ucontext {
+ struct ib_ucontext ibucontext;
+ struct xarray qp_xarray;
+ void *cn_ctx;
+ atomic_t pd_allocated;
+ u64 ports_mask;
+};
+
+/**
+ * struct hbl_ib_cq - Habanalabs IB CQ.
+ * @ibcq: IB core CQ.
+ * @hctx: HBL IB context.
+ * @mem_handle_entry: Mmap entry for the mem handle.
+ * @pi_handle_entry: Mmap entry for the pi handle.
+ * @regs_handle_entry: Mmap entry for the regs handle.
+ * @port_cq: contains the hbl_ib_cq structure per port.
+ * @cq_type: Type of CQ resource.
+ * @cq_num: CQ number that was allocated.
+ * @hbl_port_num: hbl port number that matches with the core code's port number.
+ * @is_native: Whether the CQ was created via a native verbs call or a DV call.
+ */
+struct hbl_ib_cq {
+ struct ib_cq ibcq;
+ struct hbl_ib_ucontext *hctx;
+ struct rdma_user_mmap_entry *mem_handle_entry;
+ struct rdma_user_mmap_entry *pi_handle_entry;
+ struct rdma_user_mmap_entry *regs_handle_entry;
+ struct hbl_ib_cq *port_cq;
+ enum hbl_ibv_cq_type cq_type;
+ u32 cq_num;
+ u8 hbl_port_num;
+ u8 is_native;
+};
+
+/**
+ * struct hbl_ib_qp - Habanalabs IB QP.
+ * @ibqp: IB core QP.
+ * @hctx: hbl IB context.
+ * @swq_mem_handle_entry: Mmap entry for the swq mem handle.
+ * @rwq_mem_handle_entry: Mmap entry for the rwq mem handle.
+ * @qp_state: Current QP state.
+ * @wq_type: WQ type.
+ * @qp_id: hbl core QP ID.
+ * @dest_qp_num: destination qp number.
+ * @max_send_wr: maximum send work requests supported.
+ * @max_recv_wr: maximum receive work requests supported.
+ * @mtu: QP mtu.
+ * @dst_ip_addr: destination IPv4 address.
+ * @dst_mac_addr: destination MAC address.
+ * @wq_granularity: send WQE granularity.
+ */
+struct hbl_ib_qp {
+ struct ib_qp ibqp;
+ struct hbl_ib_ucontext *hctx;
+ struct rdma_user_mmap_entry *swq_mem_handle_entry;
+ struct rdma_user_mmap_entry *rwq_mem_handle_entry;
+ enum ib_qp_state qp_state;
+ enum qpc_req_wq_type wq_type;
+ u32 qp_id;
+ u32 dest_qp_num;
+ u32 max_send_wr;
+ u32 max_recv_wr;
+ u32 mtu;
+ u32 dst_ip_addr;
+ u8 dst_mac_addr[ETH_ALEN];
+ u8 wq_granularity;
+};
+
+/**
+ * struct gid_entry - IB GID structure.
+ * @gid: IB global identifier.
+ * @gid_type: IB GID type.
+ */
+struct gid_entry {
+ union ib_gid gid;
+ enum ib_gid_type gid_type;
+};
+
+/**
+ * struct hbl_ib_port_init_params - Habanalabs IB port input parameters.
+ * @wq_arr_attr: Array of WQ-array attributes for each WQ-array type.
+ * @qp_wq_bp_offs: Offsets in NIC memory to signal a back pressure.
+ * @hbl_port_num: hbl port number that matches with the core code's port number.
+ * @advanced: WQ should support advanced operations such as RDV, QMan, WTD, etc.
+ * @adaptive_timeout_en: Enable adaptive_timeout feature on the port.
+ */
+struct hbl_ib_port_init_params {
+ struct hbl_wq_array_attr wq_arr_attr[HBL_IB_WQ_ARRAY_TYPE_MAX];
+ u32 qp_wq_bp_offs[HBL_IB_MAX_BP_OFFS];
+ u32 hbl_port_num;
+ u8 advanced;
+ u8 adaptive_timeout_en;
+};
+
+/**
+ * struct hbl_ib_port - Habanalabs IB port.
+ * @hdev: Habanalabs IB device.
+ * @hctx: hbl IB context.
+ * @gids: Array of GIDs (group IDs).
+ * @hbl_ibcq_tbl: CQ IDs table.
+ * @eq_comp: Completion object for event queue.
+ * @eq_thread: Event queue thread.
+ * @eq_lock: Event queue handling synchronization object.
+ * @port: Port ID.
+ * @mtu: Port MTU.
+ * @swqs_enabled: Per WQ-array-type flags indicating whether the send WQs are enabled.
+ * @rwqs_enabled: Per WQ-array-type flags indicating whether the receive WQs are enabled.
+ * @open: Port initialized.
+ */
+struct hbl_ib_port {
+ struct hbl_ib_device *hdev;
+ struct hbl_ib_ucontext *hctx;
+ struct gid_entry gids[HBL_IB_MAX_PORT_GIDS];
+ struct xarray hbl_ibcq_tbl;
+ struct completion eq_comp;
+ struct task_struct *eq_thread;
+ atomic_t eq_lock;
+ u32 port;
+ u32 mtu;
+ u8 swqs_enabled[HBL_IB_WQ_ARRAY_TYPE_MAX];
+ u8 rwqs_enabled[HBL_IB_WQ_ARRAY_TYPE_MAX];
+ u8 open;
+};
+
+/**
+ * struct hbl_ib_device_stats - IB device counters structure.
+ * @fatal_event: Fatal events counter.
+ */
+struct hbl_ib_device_stats {
+ atomic_t fatal_event;
+};
+
+/**
+ * struct hbl_ib_port_stats - IB port counters info.
+ * @stat_desc: Core rdma stats structure of the counters.
+ * @names: Names of the counters.
+ * @num: Number of counters.
+ */
+struct hbl_ib_port_stats {
+ struct rdma_stat_desc *stat_desc;
+ u8 **names;
+ u32 num;
+};
+
+/**
+ * struct hbl_ib_device - habanalabs IB device structure.
+ * @ibdev: IB device.
+ * @dev_stats: Device counters.
+ * @netdev_notifier: netdev events notifier.
+ * @port_stats: Array of port counters.
+ * @pdev: Pointer to PCI device.
+ * @dev: Related kernel basic device structure.
+ * @aux_dev: Pointer to auxiliary device.
+ * @ib_port: IB port structure.
+ * @hbl_to_ib_port_map: Mapping array from hbl port number to IB port number.
+ * @dev_lock: Device lock for configuration serialization.
+ * @ctx_open: User context allocated.
+ * @ports_mask: Mask of available ports.
+ * @ext_ports_mask: Mask of external ports (subset of ports_mask).
+ * @pending_reset_long_timeout: Long timeout for pending hard reset to finish in seconds.
+ * @id: Core device ID.
+ * @max_num_of_ports: Maximum number of ports supported by ASIC.
+ * @mixed_qp_wq_types: Using mixed QP WQ types is supported.
+ * @umr_support: device supports UMR.
+ * @cc_support: device supports congestion control.
+ */
+struct hbl_ib_device {
+ struct ib_device ibdev;
+ struct hbl_ib_device_stats dev_stats;
+ struct notifier_block netdev_notifier;
+ struct hbl_ib_port_stats *port_stats;
+ struct pci_dev *pdev;
+ struct device *dev;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_port *ib_port;
+ u32 *hbl_to_ib_port_map;
+ atomic_t ctx_open;
+ u64 ports_mask;
+ u64 ext_ports_mask;
+ u32 pending_reset_long_timeout;
+ u16 id;
+ u8 max_num_of_ports;
+ u8 mixed_qp_wq_types;
+ u8 umr_support;
+ u8 cc_support;
+};
+
+extern const struct ib_device_ops hbl_ib_dev_ops;
+extern const struct uapi_definition hbl_usr_fifo_defs[];
+extern const struct uapi_definition hbl_set_port_ex_defs[];
+extern const struct uapi_definition hbl_query_port_defs[];
+extern const struct uapi_definition hbl_encap_defs[];
+
+static inline struct hbl_ib_device *to_hbl_ib_dev(struct ib_device *ibdev)
+{
+ return container_of(ibdev, struct hbl_ib_device, ibdev);
+}
+
+static inline struct hbl_ib_ucontext *to_hbl_ib_ucontext(struct ib_ucontext *ibucontext)
+{
+ return container_of(ibucontext, struct hbl_ib_ucontext, ibucontext);
+}
+
+static inline u32 hbl_to_ib_port_num(struct hbl_ib_device *hdev, u32 hbl_port_num)
+{
+ return hdev->hbl_to_ib_port_map[hbl_port_num];
+}
+
+static inline int ib_to_hbl_port_num(struct hbl_ib_device *hdev, u32 ib_port_num, u32 *hbl_port_num)
+{
+ u32 hbl_port;
+
+ if (!ib_port_num)
+ return -EINVAL;
+
+ for (hbl_port = 0; hbl_port < hdev->max_num_of_ports; hbl_port++)
+ if (hbl_to_ib_port_num(hdev, hbl_port) == ib_port_num) {
+ *hbl_port_num = hbl_port;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static inline struct hbl_ib_user_mmap_entry *
+to_hbl_ib_user_mmap_entry(struct rdma_user_mmap_entry *rdma_entry)
+{
+ return container_of(rdma_entry, struct hbl_ib_user_mmap_entry, rdma_entry);
+}
+
+struct rdma_user_mmap_entry *
+hbl_ib_user_mmap_entry_insert(struct ib_ucontext *ucontext, u64 address, size_t length,
+ u64 *offset);
+
+#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
+#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
+
+#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
+ ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
+ ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
+ ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
+ ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
+ ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
+ ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
+ ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
+ ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
+
+int hbl_ib_port_init(struct hbl_ib_ucontext *hctx, struct hbl_ib_port_init_params *init_params);
+void hbl_ib_eqe_handler(struct hbl_ib_port *ib_port);
+void hbl_ib_eqe_null_work(struct hbl_aux_dev *aux_dev, u32 port);
+void hbl_ib_eqe_work_schd(struct hbl_aux_dev *aux_dev, u32 port);
+
+#endif /* _HBL_H_ */
diff --git a/drivers/infiniband/hw/hbl/hbl_main.c b/drivers/infiniband/hw/hbl/hbl_main.c
new file mode 100644
index 000000000000..98d3ed46bfe2
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_main.c
@@ -0,0 +1,478 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#define pr_fmt(fmt) "habanalabs_ib: " fmt
+
+#include "hbl.h"
+
+#include <linux/module.h>
+#include <linux/auxiliary_bus.h>
+
+#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
+
+#define HBL_DRIVER_DESC "HabanaLabs AI accelerators InfiniBand driver"
+
+MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
+MODULE_DESCRIPTION(HBL_DRIVER_DESC);
+MODULE_LICENSE("GPL");
+
+#define MTU_DEFAULT SZ_4K
+
+static void hbl_ib_port_event(struct ib_device *ibdev, u32 port_num, enum ib_event_type reason)
+{
+ struct ib_event event;
+
+ event.device = ibdev;
+ event.element.port_num = port_num;
+ event.event = reason;
+
+ ib_dispatch_event(&event);
+}
+
+static void hbl_ib_port_mtu_update(struct ib_device *ibdev, u32 hbl_port, u32 mtu)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+
+ hdev->ib_port[hbl_port].mtu = mtu;
+}
+
+static bool hbl_ib_match_netdev(struct ib_device *ibdev, struct net_device *netdev)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+ struct hbl_ib_aux_data *aux_data;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = hdev->aux_dev;
+ aux_data = aux_dev->aux_data;
+
+ /* IB and EN share the same PCI device, hence we can match the netdev to bind to the
+ * ibdev via the parent device pointer and the port index.
+ */
+ if (&hdev->pdev->dev == netdev->dev.parent)
+ return true;
+
+ return false;
+}
+
+static int hbl_ib_netdev_event(struct notifier_block *notifier, unsigned long event, void *ptr)
+{
+ struct hbl_ib_device *hdev = container_of(notifier, struct hbl_ib_device, netdev_notifier);
+ struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
+ struct ib_device *ibdev = &hdev->ibdev;
+ u32 ib_port;
+
+ if (hbl_ib_match_netdev(ibdev, netdev))
+ ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
+ else
+ return NOTIFY_DONE;
+
+ switch (event) {
+ case NETDEV_UP:
+ hbl_ib_port_event(ibdev, ib_port, IB_EVENT_PORT_ACTIVE);
+ break;
+ case NETDEV_DOWN:
+ hbl_ib_port_event(ibdev, ib_port, IB_EVENT_PORT_ERR);
+ break;
+ case NETDEV_REGISTER:
+ ib_device_set_netdev(ibdev, netdev, ib_port);
+ hbl_ib_port_mtu_update(ibdev, netdev->dev_port, netdev->mtu);
+ break;
+ case NETDEV_UNREGISTER:
+ hbl_ib_port_mtu_update(ibdev, netdev->dev_port, MTU_DEFAULT);
+ ib_device_set_netdev(ibdev, NULL, ib_port);
+ break;
+ case NETDEV_CHANGEMTU:
+ hbl_ib_port_mtu_update(ibdev, netdev->dev_port, netdev->mtu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static void hbl_ib_dispatch_fatal_event(struct hbl_aux_dev *aux_dev, u32 asid)
+{
+ struct hbl_ib_device *hdev = aux_dev->priv;
+ struct ib_event ibev = {};
+
+ atomic_inc(&hdev->dev_stats.fatal_event);
+
+ hbl_ibdev_err(&hdev->ibdev, "raising fatal event for context with ASID %d\n", asid);
+
+ ibev.device = &hdev->ibdev;
+ ibev.event = IB_EVENT_DEVICE_FATAL;
+ ibev.element.port_num = HBL_IB_EQ_PORT_FIELD_MASK | (asid << HBL_IB_EQ_PORT_FIELD_SIZE);
+ ib_dispatch_event(&ibev);
+}
+
+static void hbl_ib_set_aux_ops(struct hbl_ib_device *hdev, bool enable)
+{
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+
+ /* map cn2ib functions */
+ if (enable)
+ aux_ops->eqe_work_schd = hbl_ib_eqe_null_work;
+ else
+ aux_ops->eqe_work_schd = NULL;
+
+ aux_ops->dispatch_fatal_event = hbl_ib_dispatch_fatal_event;
+}
+
+static int hbl_ib_dev_init(struct hbl_ib_device *hdev)
+{
+ char name[IB_DEVICE_NAME_MAX] = {0};
+ struct ib_device *ibdev;
+ u32 max_num_of_ports;
+ int rc, i, port_cnt;
+
+ ibdev = &hdev->ibdev;
+
+ ibdev->node_type = RDMA_NODE_UNSPECIFIED;
+ ibdev->dev.parent = &hdev->pdev->dev;
+
+ max_num_of_ports = hdev->max_num_of_ports;
+
+ /* Allocate the hbl<->ib port mapping array.
+ * No need to initialize it with a special value, as 0 is an invalid
+ * IB port number.
+ */
+ hdev->hbl_to_ib_port_map = kcalloc(max_num_of_ports, sizeof(u32), GFP_KERNEL);
+ if (!hdev->hbl_to_ib_port_map)
+ return -ENOMEM;
+
+ port_cnt = 0;
+ for (i = 0; i < max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ port_cnt++;
+ hdev->hbl_to_ib_port_map[i] = port_cnt;
+ }
+
+ ibdev->phys_port_cnt = port_cnt;
+
+ /* The number of completion vectors (i.e. MSI-X vectors) available for this RDMA device.
+ * For now, set it to 1.
+ */
+ ibdev->num_comp_vectors = 1;
+
+ ib_set_device_ops(ibdev, &hbl_ib_dev_ops);
+
+ /* The CN driver might start calling the aux functions after registering the device so set
+ * the callbacks here.
+ */
+ hbl_ib_set_aux_ops(hdev, true);
+
+ snprintf(name, sizeof(name), "hbl_%d", hdev->id);
+
+ rc = ib_register_device(ibdev, name, &hdev->pdev->dev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to register IB device, err %d\n", rc);
+ goto ibdev_register_fail;
+ }
+
+ hdev->netdev_notifier.notifier_call = hbl_ib_netdev_event;
+
+ rc = register_netdevice_notifier(&hdev->netdev_notifier);
+ if (rc) {
+ hbl_ibdev_err(ibdev, "Failed to register netdev notifier, err %d\n", rc);
+ goto notifier_register_fail;
+ }
+
+ hbl_ibdev_info(ibdev, "IB device registered\n");
+
+ return 0;
+
+notifier_register_fail:
+ ib_unregister_device(ibdev);
+ibdev_register_fail:
+ hbl_ib_set_aux_ops(hdev, false);
+ kfree(hdev->hbl_to_ib_port_map);
+ return rc;
+}
+
+static void hbl_ib_dev_fini(struct hbl_ib_device *hdev)
+{
+ struct ib_device *ibdev = &hdev->ibdev;
+
+ hbl_ibdev_info(ibdev, "Unregister IB device\n");
+ unregister_netdevice_notifier(&hdev->netdev_notifier);
+ ib_unregister_device(ibdev);
+ hbl_ib_set_aux_ops(hdev, false);
+ kfree(hdev->hbl_to_ib_port_map);
+}
+
+/* Initialize an array of strings to hold the counter names.
+ * We get the names as one long buffer of fixed-size strings and convert it to the array of
+ * strings that the IB counters API expects.
+ */
+static int hbl_ib_cnts_init(struct hbl_ib_device *hdev, int port)
+{
+ struct hbl_ib_port_cnts_data *cnts_data;
+ struct hbl_ib_port_stats *port_stats;
+ struct rdma_stat_desc *stat_desc;
+ struct hbl_ib_aux_data *aux_data;
+ u8 *ptr, *data, **data2;
+ int cnt_num, i;
+
+ aux_data = hdev->aux_dev->aux_data;
+ port_stats = &hdev->port_stats[port];
+ cnts_data = &aux_data->cnts_data[port];
+ cnt_num = cnts_data->num;
+
+ /* array for strings and pointers for them */
+ data = kcalloc(cnt_num, (sizeof(u8 *) + HBL_IB_CNT_NAME_LEN), GFP_KERNEL);
+ if (!data)
+ goto exit_err;
+
+ stat_desc = kcalloc(cnt_num, sizeof(*stat_desc), GFP_KERNEL);
+ if (!stat_desc)
+ goto free_data;
+
+ /* copy the strings after the pointers to them */
+ ptr = data + cnt_num * sizeof(u8 *);
+ memcpy(ptr, cnts_data->names, cnt_num * HBL_IB_CNT_NAME_LEN);
+
+ data2 = (u8 **)data;
+
+ /* set the pointers to the strings */
+ for (i = 0; i < cnt_num; i++)
+ data2[i] = ptr + i * HBL_IB_CNT_NAME_LEN;
+
+ port_stats->num = cnt_num;
+ port_stats->names = data2;
+
+ for (i = 0; i < cnt_num; i++)
+ stat_desc[i].name = data2[i];
+
+ port_stats->stat_desc = stat_desc;
+
+ return 0;
+
+free_data:
+ kfree(data);
+exit_err:
+ return -ENOMEM;
+}
+
+static void hbl_ib_cnts_fini(struct hbl_ib_device *hdev, int port)
+{
+ kfree(hdev->port_stats[port].stat_desc);
+ kfree(hdev->port_stats[port].names);
+}
+
+static int hdev_init(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_ib_aux_data *aux_data = aux_dev->aux_data;
+ struct hbl_ib_device *hdev;
+ int rc, i;
+
+ hdev = ib_alloc_device(hbl_ib_device, ibdev);
+ if (!hdev)
+ return -ENOMEM;
+
+ aux_dev->priv = hdev;
+ hdev->aux_dev = aux_dev;
+ hdev->pdev = aux_data->pdev;
+ hdev->dev = aux_data->dev;
+ hdev->ports_mask = aux_data->ports_mask;
+ hdev->ext_ports_mask = aux_data->ext_ports_mask;
+ hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
+ hdev->id = aux_data->id;
+ hdev->max_num_of_ports = aux_data->max_num_of_ports;
+ hdev->mixed_qp_wq_types = aux_data->mixed_qp_wq_types;
+ hdev->umr_support = aux_data->umr_support;
+ hdev->cc_support = aux_data->cc_support;
+
+ /* Allocate port structs */
+ hdev->ib_port = kcalloc(hdev->max_num_of_ports, sizeof(*hdev->ib_port), GFP_KERNEL);
+ if (!hdev->ib_port) {
+ rc = -ENOMEM;
+ goto free_device;
+ }
+
+ /* Set default MTU value that can be overridden later by netdev */
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ hdev->ib_port[i].mtu = MTU_DEFAULT;
+ }
+
+ hdev->port_stats = kcalloc(hdev->max_num_of_ports, sizeof(*hdev->port_stats), GFP_KERNEL);
+ if (!hdev->port_stats) {
+ rc = -ENOMEM;
+ goto free_ports;
+ }
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ rc = hbl_ib_cnts_init(hdev, i);
+ if (rc)
+ goto free_cnts;
+ }
+
+ return 0;
+
+free_cnts:
+ for (--i; i >= 0; i--) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ hbl_ib_cnts_fini(hdev, i);
+ }
+ kfree(hdev->port_stats);
+free_ports:
+ kfree(hdev->ib_port);
+free_device:
+ aux_dev->priv = NULL;
+ ib_dealloc_device(&hdev->ibdev);
+
+ return rc;
+}
+
+static void hdev_fini(struct hbl_aux_dev *aux_dev)
+{
+ struct hbl_ib_device *hdev = aux_dev->priv;
+ int i;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hdev->ports_mask & BIT(i)))
+ continue;
+
+ hbl_ib_cnts_fini(hdev, i);
+ }
+ kfree(hdev->port_stats);
+
+ kfree(hdev->ib_port);
+
+ aux_dev->priv = NULL;
+ ib_dealloc_device(&hdev->ibdev);
+}
+
+static const struct auxiliary_device_id hbl_ib_id_table[] = {
+ { .name = "habanalabs_cn.ib", },
+ {},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, hbl_ib_id_table);
+
+static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
+ struct hbl_ib_device *hdev;
+ ktime_t timeout;
+ int rc;
+
+ rc = hdev_init(aux_dev);
+ if (rc) {
+ dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
+ return rc;
+ }
+
+ hdev = aux_dev->priv;
+
+ /* don't allow module unloading while it is attached */
+ if (!try_module_get(THIS_MODULE)) {
+ dev_err(hdev->dev, "Failed to increment %s module refcount\n",
+ module_name(THIS_MODULE));
+ rc = -EIO;
+ goto module_get_err;
+ }
+
+ timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
+ while (1) {
+ aux_ops->hw_access_lock(aux_dev);
+
+ /* if the device is operational, proceed to actual init while holding the lock in
+ * order to prevent concurrent hard reset
+ */
+ if (aux_ops->device_operational(aux_dev))
+ break;
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
+ rc = -EBUSY;
+ goto timeout_err;
+ }
+
+ dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
+
+ msleep_interruptible(MSEC_PER_SEC);
+ }
+
+ rc = hbl_ib_dev_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init ib device\n");
+ goto dev_init_err;
+ }
+
+ aux_ops->hw_access_unlock(aux_dev);
+
+ return 0;
+
+dev_init_err:
+ aux_ops->hw_access_unlock(aux_dev);
+timeout_err:
+ module_put(THIS_MODULE);
+module_get_err:
+ hdev_fini(aux_dev);
+
+ return rc;
+}
+
+/* This function can be called only from the CN driver when deleting the aux bus, because we
+ * incremented the module refcount on probing. Hence no need to protect here from hard reset.
+ */
+static void hbl_ib_remove(struct auxiliary_device *adev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
+ struct hbl_ib_device *hdev = aux_dev->priv;
+
+ if (!hdev)
+ return;
+
+ hbl_ib_dev_fini(hdev);
+
+ /* allow module unloading as now it is detached */
+ module_put(THIS_MODULE);
+
+ hdev_fini(aux_dev);
+}
+
+static struct auxiliary_driver hbl_ib_driver = {
+ .name = "ib",
+ .probe = hbl_ib_probe,
+ .remove = hbl_ib_remove,
+ .id_table = hbl_ib_id_table,
+};
+
+static int __init hbl_ib_init(void)
+{
+ pr_info("loading driver\n");
+
+ return auxiliary_driver_register(&hbl_ib_driver);
+}
+
+static void __exit hbl_ib_exit(void)
+{
+ auxiliary_driver_unregister(&hbl_ib_driver);
+
+ pr_info("driver removed\n");
+}
+
+module_init(hbl_ib_init);
+module_exit(hbl_ib_exit);
diff --git a/drivers/infiniband/hw/hbl/hbl_verbs.c b/drivers/infiniband/hw/hbl/hbl_verbs.c
new file mode 100644
index 000000000000..624bcd7290f0
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_verbs.c
@@ -0,0 +1,2686 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include <rdma/ib_addr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/net/intel/cni.h>
+#include <linux/bitfield.h>
+#include <linux/ctype.h>
+#include <linux/vmalloc.h>
+
+#include "hbl.h"
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+#include <uapi/rdma/hbl_user_ioctl_verbs.h>
+
+#define HBL_IB_MAX_QP BIT(10)
+#define HBL_IB_MAX_CQE BIT(13)
+#define HBL_IB_MAX_MSG_SIZE SZ_1G
+#define HBL_IB_DEFAULT_MAX_NUM_OF_QPS 128
+#define HBL_IB_DEFAULT_MAX_NUM_WQES_IN_WQ 256
+#define HBL_IB_DEFAULT_WQ_MEM_ID HBL_CNI_MEM_HOST
+#define HBL_IB_DUMP_QP_SZ SZ_1K
+
+static int verify_qp_xarray(struct hbl_ib_qp *hblqp);
+static void qp_user_mmap_entries_remove(struct hbl_ib_qp *qp);
+
+enum hbl_ib_device_stats_type {
+ FATAL_EVENT,
+};
+
+static const struct rdma_stat_desc hbl_ib_device_stats[] = {
+ { .name = "fatal_event",},
+};
+
+static inline struct hbl_ib_pd *to_hbl_ib_pd(struct ib_pd *ibpd)
+{
+ return container_of(ibpd, struct hbl_ib_pd, ibpd);
+}
+
+static inline struct hbl_ib_qp *to_hbl_ib_qp(struct ib_qp *ibqp)
+{
+ return container_of(ibqp, struct hbl_ib_qp, ibqp);
+}
+
+static inline struct hbl_ib_cq *to_hbl_ib_cq(struct ib_cq *ibcq)
+{
+ return container_of(ibcq, struct hbl_ib_cq, ibcq);
+}
+
+static inline u64 to_hbl_port_mask(struct hbl_ib_device *hdev, u64 ib_port_mask)
+{
+ u32 hbl_port_num, ib_port_num;
+ u64 hbl_port_mask = 0x0;
+
+ for (hbl_port_num = 0; hbl_port_num < hdev->max_num_of_ports; hbl_port_num++) {
+ ib_port_num = hbl_to_ib_port_num(hdev, hbl_port_num);
+ if (!ib_port_num)
+ continue;
+
+ if (ib_port_mask & BIT_ULL(ib_port_num))
+ hbl_port_mask |= BIT_ULL(hbl_port_num);
+ }
+
+ return hbl_port_mask;
+}
+
+static inline u64 to_ib_port_mask(struct hbl_ib_device *hdev, u64 hbl_port_mask)
+{
+ u32 hbl_port_num, ib_port_num;
+ u64 ib_port_mask = 0x0;
+
+ for (hbl_port_num = 0; hbl_port_num < hdev->max_num_of_ports; hbl_port_num++) {
+ if (!(hbl_port_mask & BIT(hbl_port_num)))
+ continue;
+
+ ib_port_num = hbl_to_ib_port_num(hdev, hbl_port_num);
+
+ /* The IB ports are 1 based, hence getting zero value means that we have bug in the
+ * hbl<->ib port mapping.
+ */
+ WARN_ON(!ib_port_num);
+
+ ib_port_mask |= BIT_ULL(ib_port_num);
+ }
+
+ return ib_port_mask;
+}
+
+struct rdma_user_mmap_entry *
+hbl_ib_user_mmap_entry_insert(struct ib_ucontext *ucontext, u64 handle, size_t length, u64 *offset)
+{
+ struct hbl_ib_user_mmap_entry *entry;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ int rc;
+
+ ibdev = ucontext->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return ERR_PTR(-ENOMEM);
+
+ rc = aux_ops->query_mem_handle(aux_dev, handle, &entry->info);
+ if (rc)
+ goto err_free_entry;
+
+ rc = rdma_user_mmap_entry_insert_range(ucontext, &entry->rdma_entry, length, 1, U32_MAX);
+ if (rc)
+ goto err_free_entry;
+
+ *offset = rdma_user_mmap_get_offset(&entry->rdma_entry);
+
+ return &entry->rdma_entry;
+
+err_free_entry:
+ kfree(entry);
+ return ERR_PTR(rc);
+}
+
+static int to_hbl_wq_arr_types(struct ib_device *ibdev, enum hbl_ib_wq_array_type ib_wq_arr_type,
+ enum hbl_nic_mem_type *swq_type, enum hbl_nic_mem_type *rwq_type)
+{
+ switch (ib_wq_arr_type) {
+ case HBL_IB_WQ_ARRAY_TYPE_GENERIC:
+ *swq_type = HBL_CNI_USER_WQ_SEND;
+ *rwq_type = HBL_CNI_USER_WQ_RECV;
+ break;
+ default:
+ hbl_ibdev_err(ibdev, "Invalid WQ array type %d\n", ib_wq_arr_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int hbl_ib_wqs_init(struct hbl_ib_port *ib_port, struct hbl_ib_ucontext *hctx,
+ struct hbl_ib_port_init_params *init_params,
+ enum hbl_ib_wq_array_type ib_wq_arr_type)
+{
+ struct hbl_cni_user_wq_arr_unset_in wq_arr_unset_in = {};
+ struct hbl_cni_user_wq_arr_set_out wq_arr_set_out = {};
+ struct hbl_cni_user_wq_arr_set_in wq_arr_set_in = {};
+ enum hbl_nic_mem_type swq_type, rwq_type;
+ struct hbl_wq_array_attr *wq_arr_attr;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ u32 port;
+ int rc;
+
+ hdev = ib_port->hdev;
+ port = ib_port->port;
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = to_hbl_wq_arr_types(ibdev, ib_wq_arr_type, &swq_type, &rwq_type);
+ if (rc)
+ return rc;
+
+ wq_arr_attr = &init_params->wq_arr_attr[ib_wq_arr_type];
+
+ if (!wq_arr_attr->max_num_of_wqs || !wq_arr_attr->max_num_of_wqes_in_wq)
+ return 0;
+
+ wq_arr_set_in.port = port;
+ wq_arr_set_in.num_of_wqs = wq_arr_attr->max_num_of_wqs;
+ wq_arr_set_in.num_of_wq_entries = wq_arr_attr->max_num_of_wqes_in_wq;
+ wq_arr_set_in.mem_id = wq_arr_attr->mem_id;
+ wq_arr_set_in.swq_granularity = wq_arr_attr->swq_granularity;
+
+ wq_arr_set_in.type = swq_type;
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_WQ_SET, &wq_arr_set_in,
+ &wq_arr_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set send WQ, port %d\n", port);
+ return rc;
+ }
+
+ ib_port->swqs_enabled[ib_wq_arr_type] = true;
+
+ wq_arr_set_in.type = rwq_type;
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_WQ_SET, &wq_arr_set_in,
+ &wq_arr_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set recv WQ, port %d\n", port);
+ goto clear_send_wq;
+ }
+
+ ib_port->rwqs_enabled[ib_wq_arr_type] = true;
+
+ return 0;
+
+clear_send_wq:
+ wq_arr_unset_in.port = port;
+ wq_arr_unset_in.type = swq_type;
+ aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_WQ_UNSET, &wq_arr_unset_in, NULL);
+ ib_port->swqs_enabled[ib_wq_arr_type] = false;
+
+ return rc;
+}
+
+static void hbl_ib_wqs_fini(struct hbl_ib_port *ib_port, struct hbl_ib_ucontext *hctx,
+ enum hbl_ib_wq_array_type ib_wq_arr_type)
+{
+ struct hbl_cni_user_wq_arr_unset_in wq_arr_unset_in = {};
+ enum hbl_nic_mem_type swq_type, rwq_type;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ u32 port;
+ int rc;
+
+ hdev = ib_port->hdev;
+ port = ib_port->port;
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = to_hbl_wq_arr_types(ibdev, ib_wq_arr_type, &swq_type, &rwq_type);
+ if (rc)
+ return;
+
+ wq_arr_unset_in.port = port;
+
+ if (ib_port->rwqs_enabled[ib_wq_arr_type]) {
+ wq_arr_unset_in.type = rwq_type;
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_WQ_UNSET,
+ &wq_arr_unset_in, NULL);
+ if (rc)
+ hbl_ibdev_dbg(ibdev, "failed to unset recv WQ, port %d\n", port);
+
+ ib_port->rwqs_enabled[ib_wq_arr_type] = false;
+ }
+
+ if (ib_port->swqs_enabled[ib_wq_arr_type]) {
+ wq_arr_unset_in.type = swq_type;
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_WQ_UNSET,
+ &wq_arr_unset_in, NULL);
+ if (rc)
+ hbl_ibdev_dbg(ibdev, "failed to unset send WQ, port %d\n", port);
+
+ ib_port->swqs_enabled[ib_wq_arr_type] = false;
+ }
+}
+
+static void hbl_ib_port_clear(struct hbl_ib_ucontext *hctx, int port)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ struct hbl_ib_port *ib_port = &hdev->ib_port[port];
+
+ /* Clean IB port struct from previous CTX allocations */
+ memset(ib_port, 0, sizeof(*ib_port));
+}
+
+static int hbl_ib_eq_func(void *param)
+{
+ unsigned long timeout = msecs_to_jiffies(60 * MSEC_PER_SEC);
+ struct hbl_ib_port *ib_port = param;
+ int rc;
+
+ while (!kthread_should_stop()) {
+ /* Use timeout to avoid warnings for sleeping too long */
+ rc = wait_for_completion_interruptible_timeout(&ib_port->eq_comp, timeout);
+
+ /* Call the event handler only upon completion; no need to do so when timed
+ * out or interrupted by a signal.
+ */
+ if (rc > 0)
+ hbl_ib_eqe_handler(ib_port);
+ }
+
+ return 0;
+}
+
+static int hbl_ib_eq_init(struct hbl_ib_port *ib_port)
+{
+ struct ib_device *ibdev = &ib_port->hdev->ibdev;
+ char eq_th_name[32] = {0};
+ u32 port = ib_port->port;
+ int rc;
+
+ init_completion(&ib_port->eq_comp);
+ atomic_set(&ib_port->eq_lock, 0);
+
+ snprintf(eq_th_name, sizeof(eq_th_name), "hbl_eq%d", port);
+ ib_port->eq_thread = kthread_run(hbl_ib_eq_func, ib_port, eq_th_name);
+ if (IS_ERR(ib_port->eq_thread)) {
+ rc = PTR_ERR(ib_port->eq_thread);
+ hbl_ibdev_dbg(ibdev, "failed to create an EQ thread, port %d, err %d\n", port, rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void hbl_ib_eqe_fini(struct hbl_ib_port *ib_port)
+{
+ while (atomic_cmpxchg(&ib_port->eq_lock, 0, 1))
+ usleep_range(50, 200);
+
+ complete_all(&ib_port->eq_comp);
+ kthread_stop(ib_port->eq_thread);
+}
+
+int hbl_ib_port_init(struct hbl_ib_ucontext *hctx, struct hbl_ib_port_init_params *init_params)
+{
+ struct hbl_cni_set_user_app_params_in set_app_params_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_port *ib_port;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ u32 port;
+ int rc;
+
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ port = init_params->hbl_port_num;
+
+ ib_port = &hdev->ib_port[port];
+
+ ib_port->hdev = hdev;
+ ib_port->port = port;
+
+ ib_port->hctx = hctx;
+
+ set_app_params_in.port = port;
+ set_app_params_in.advanced = init_params->advanced;
+ set_app_params_in.adaptive_timeout_en = init_params->adaptive_timeout_en;
+ memcpy(set_app_params_in.bp_offs, init_params->qp_wq_bp_offs,
+ sizeof(set_app_params_in.bp_offs));
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_SET_USER_APP_PARAMS,
+ &set_app_params_in, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set app params, port %d\n", port);
+ return rc;
+ }
+
+ /* Create a pointer table to hold pointers to the allocated ib_cq structs.
+ * We need them to dispatch IB events.
+ */
+ xa_init(&ib_port->hbl_ibcq_tbl);
+
+ rc = hbl_ib_wqs_init(ib_port, hctx, init_params, HBL_IB_WQ_ARRAY_TYPE_GENERIC);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to init WQs, port %d\n", port);
+ goto destroy_xa;
+ }
+
+ rc = hbl_ib_eq_init(ib_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to init EQ completion object, port %d\n", port);
+ goto clean_wqs;
+ }
+
+ ib_port->open = true;
+
+ return 0;
+
+clean_wqs:
+ hbl_ib_wqs_fini(ib_port, hctx, HBL_IB_WQ_ARRAY_TYPE_GENERIC);
+destroy_xa:
+ xa_destroy(&ib_port->hbl_ibcq_tbl);
+ return rc;
+}
+
+static void hbl_ib_port_fini(struct hbl_ib_ucontext *hctx, u32 port)
+{
+ struct hbl_ib_port *ib_port;
+ struct hbl_ib_device *hdev;
+
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ib_port = &hdev->ib_port[port];
+
+ if (!ib_port->open)
+ return;
+
+ hbl_ib_eqe_fini(ib_port);
+
+ hbl_ib_wqs_fini(ib_port, hctx, HBL_IB_WQ_ARRAY_TYPE_GENERIC);
+
+ xa_destroy(&ib_port->hbl_ibcq_tbl);
+}
+
+static int hbl_ib_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibucontext->device);
+ struct hbl_ib_ucontext *hctx = to_hbl_ib_ucontext(ibucontext);
+ struct hbl_ib_port_init_params port_init_params = {};
+ struct hbl_ibv_alloc_ucontext_resp resp = {};
+ struct ib_device *ibdev = ibucontext->device;
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ibv_alloc_ucontext_req req = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ u64 user_ports_mask;
+ int rc, i;
+
+ aux_ops = aux_dev->aux_ops;
+
+ rc = ib_copy_from_udata(&req, udata, min(sizeof(req), udata->inlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to copy in udata for alloc_ucontext\n");
+ return rc;
+ }
+
+ user_ports_mask = req.ports_mask;
+
+ /* If the user didn't provide a mask, use the core mask, which is 0-based.
+ * Otherwise, the user provides a 1-based mask, so we need to convert it to a core mask.
+ */
+ if (!user_ports_mask)
+ user_ports_mask = hdev->ports_mask;
+ else
+ user_ports_mask = to_hbl_port_mask(hdev, user_ports_mask);
+
+ if (user_ports_mask & ~hdev->ports_mask) {
+ hbl_ibdev_dbg(ibdev, "user ports mask (0x%llx) contains a disabled port\n",
+ user_ports_mask);
+ return -EINVAL;
+ }
+
+ if (atomic_cmpxchg(&hdev->ctx_open, 0, 1)) {
+ hbl_ibdev_dbg(ibdev, "ucontext is already allocated\n");
+ return -EBUSY;
+ }
+
+ rc = aux_ops->alloc_ucontext(aux_dev, req.core_fd, &hctx->cn_ctx);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "alloc context failed\n");
+ goto exit;
+ }
+
+ /* Clear all the ports */
+ for (i = 0; i < hdev->max_num_of_ports; i++)
+ hbl_ib_port_clear(hctx, i);
+
+ xa_init_flags(&hctx->qp_xarray, XA_FLAGS_ALLOC);
+
+ /* If alloc context was called from a non-DV flow, the ports need to be initialized here */
+ if (!req.use_dvs) {
+ struct hbl_wq_array_attr *gen_wq_arr_attr =
+ &port_init_params.wq_arr_attr[HBL_IB_WQ_ARRAY_TYPE_GENERIC];
+
+ gen_wq_arr_attr->max_num_of_wqs = HBL_IB_DEFAULT_MAX_NUM_OF_QPS;
+ gen_wq_arr_attr->max_num_of_wqes_in_wq = HBL_IB_DEFAULT_MAX_NUM_WQES_IN_WQ;
+ gen_wq_arr_attr->mem_id = HBL_IB_DEFAULT_WQ_MEM_ID;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(user_ports_mask & BIT(i)))
+ continue;
+
+ port_init_params.hbl_port_num = i;
+ rc = hbl_ib_port_init(hctx, &port_init_params);
+ if (rc)
+ goto uninit_ports;
+ }
+ }
+
+ hctx->ports_mask = user_ports_mask;
+
+ /* Return the IB ports mask, which is 1-based */
+ resp.ports_mask = to_ib_port_mask(hdev, user_ports_mask);
+
+ if (hdev->umr_support)
+ resp.cap_mask |= HBL_UCONTEXT_CAP_MMAP_UMR;
+ if (hdev->cc_support)
+ resp.cap_mask |= HBL_UCONTEXT_CAP_CC;
+
+ rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to copy out udata for alloc ucontext\n");
+ goto uninit_ports;
+ }
+
+ /* User context is allocated - set the handler for future EQE (should be last) */
+ aux_ops->eqe_work_schd = hbl_ib_eqe_work_schd;
+
+ hbl_ibdev_dbg(ibdev, "IB context was allocated\n");
+
+ return 0;
+
+uninit_ports:
+ for (--i; i >= 0; i--) {
+ if (!(user_ports_mask & BIT(i)))
+ continue;
+
+ hbl_ib_port_fini(hctx, i);
+ }
+ xa_destroy(&hctx->qp_xarray);
+ aux_ops->dealloc_ucontext(aux_dev, hctx->cn_ctx);
+exit:
+ atomic_set(&hdev->ctx_open, 0);
+
+ return rc;
+}
+
+static void hbl_ib_dealloc_ucontext(struct ib_ucontext *ibucontext)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibucontext->device);
+ struct hbl_ib_ucontext *hctx = to_hbl_ib_ucontext(ibucontext);
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_aux_ops *aux_ops;
+ int i;
+
+ aux_ops = aux_dev->aux_ops;
+
+ /* User context is deallocated - prevent future EQEs from invoking the handler */
+ aux_ops->eqe_work_schd = hbl_ib_eqe_null_work;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(hctx->ports_mask & BIT(i)))
+ continue;
+
+ hbl_ib_port_fini(hctx, i);
+ }
+
+ /* Core uverbs enforces that all ucontext sub-resources (e.g. QPs) are already released by
+ * the time we reach here. Hence, no need to check for active xarray IDs.
+ */
+ xa_destroy(&hctx->qp_xarray);
+
+ aux_ops->dealloc_ucontext(aux_dev, hctx->cn_ctx);
+
+ atomic_set(&hdev->ctx_open, 0);
+
+ hbl_ibdev_dbg(&hdev->ibdev, "IB context was deallocated\n");
+}
+
+static void hbl_ib_get_dev_fw_str(struct ib_device *device, char *str)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(device);
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_device_attr dev_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+ aux_ops->query_device(aux_dev, &dev_attr);
+
+ snprintf(str, IB_FW_VERSION_NAME_MAX, "%u.%u.%u", (u32)(dev_attr.fw_ver >> 32),
+ (u16)FIELD_GET(GENMASK(31, 16), dev_attr.fw_ver), (u16)(dev_attr.fw_ver & 0xffff));
+}
+
+static enum rdma_link_layer hbl_ib_port_link_layer(struct ib_device *ibdev, u32 port_num)
+{
+ return IB_LINK_LAYER_ETHERNET;
+}
+
+static int hbl_ib_get_port_immutable(struct ib_device *ibdev, u32 port_num,
+ struct ib_port_immutable *immutable)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+ struct ib_port_attr attr;
+ u32 hport = 0;
+ int rc;
+
+ rc = ib_to_hbl_port_num(hdev, port_num, &hport);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", port_num);
+ return rc;
+ }
+
+ rc = ib_query_port(ibdev, port_num, &attr);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Couldn't query port %d, rc %d\n", port_num, rc);
+ return rc;
+ }
+
+ immutable->pkey_tbl_len = attr.pkey_tbl_len;
+ immutable->gid_tbl_len = attr.gid_tbl_len;
+
+ if (hdev->ext_ports_mask & BIT(hport))
+ /* RoCEv1 is used for MAC-based address resolution on L2 networks,
+ * while RoCEv2 is used for IP-based address resolution on L3 networks.
+ */
+ immutable->core_cap_flags = RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP |
+ RDMA_CORE_CAP_PROT_ROCE;
+ else
+ /* Since the internal ports are not advertised to netdev, we need to advertise them
+ * as plain IB to the IB core.
+ */
+ immutable->core_cap_flags = RDMA_CORE_CAP_PROT_IB;
+
+ return 0;
+}
+
+static int hbl_ib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+ struct ib_udata *udata)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_device_attr dev_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+
+ if (udata && udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen)) {
+ hbl_ibdev_dbg(ibdev, "Incompatible ABI params, udata not cleared\n");
+ return -EINVAL;
+ }
+
+ memset(props, 0, sizeof(*props));
+
+ aux_ops->query_device(aux_dev, &dev_attr);
+
+ props->fw_ver = dev_attr.fw_ver;
+ props->max_mr = 1;
+ props->max_mr_size = dev_attr.max_mr_size;
+ props->page_size_cap = dev_attr.page_size_cap;
+
+ props->vendor_id = dev_attr.vendor_id;
+ props->vendor_part_id = dev_attr.vendor_part_id;
+ props->hw_ver = dev_attr.hw_ver;
+
+ props->max_qp = dev_attr.max_qp;
+ props->max_qp_wr = dev_attr.max_qp_wr;
+
+ props->device_cap_flags = IB_DEVICE_RAW_MULTI |
+ IB_DEVICE_CHANGE_PHY_PORT |
+ IB_DEVICE_CURR_QP_STATE_MOD |
+ IB_DEVICE_SHUTDOWN_PORT |
+ IB_DEVICE_PORT_ACTIVE_EVENT |
+ IB_DEVICE_RC_RNR_NAK_GEN |
+ IB_DEVICE_N_NOTIFY_CQ;
+
+ /* RR is unsupported, but we need at least 2 max SGEs to pass the pyverbs test */
+ props->max_send_sge = HBL_IB_MAX_SEND_SGE;
+ props->max_recv_sge = HBL_IB_MAX_RECV_SGE;
+
+ /* RD is unsupported */
+ props->max_sge_rd = 0;
+ props->max_cq = 1;
+ props->max_cqe = dev_attr.max_cqe;
+
+ props->max_pd = 1;
+ props->atomic_cap = IB_ATOMIC_NONE;
+ props->max_raw_ipv6_qp = 1;
+ props->max_raw_ethy_qp = 1;
+ props->max_pkeys = 1;
+
+ if (udata && udata->outlen)
+ hbl_ibdev_dbg(ibdev, "no response udata is returned for query_device\n");
+
+ return 0;
+}
+
+static u32 conv_lane_to_ib_width(u32 num_lanes)
+{
+ switch (num_lanes) {
+ case 1:
+ return IB_WIDTH_1X;
+ case 2:
+ return IB_WIDTH_2X;
+ case 4:
+ return IB_WIDTH_4X;
+ default:
+ return IB_WIDTH_4X;
+ }
+}
+
+static u32 conv_speed_to_ib_speed(u32 port_speed, u8 lanes)
+{
+ u32 speed_per_lane = port_speed / lanes;
+
+ switch (speed_per_lane) {
+ case SPEED_25000:
+ return IB_SPEED_EDR;
+ case SPEED_50000:
+ return IB_SPEED_HDR;
+ case SPEED_100000:
+ return IB_SPEED_NDR;
+ default:
+ return IB_SPEED_HDR;
+ }
+}
+
+static int hbl_ib_query_port(struct ib_device *ibdev, u32 port, struct ib_port_attr *props)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_port_attr port_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ u32 hport;
+ int rc;
+
+ rc = ib_to_hbl_port_num(hdev, port, &hport);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", port);
+ return rc;
+ }
+
+ aux_ops = aux_dev->aux_ops;
+ aux_ops->query_port(aux_dev, hport, &port_attr);
+
+ props->state = port_attr.open ? IB_PORT_ACTIVE : IB_PORT_DOWN;
+ props->max_mtu = ib_mtu_int_to_enum(port_attr.max_mtu);
+
+ /* External ports: use the value initialized in hbl_ib_port.
+ * Internal ports: hard code 4KB for now.
+ */
+ props->active_mtu = ib_mtu_int_to_enum(hdev->ib_port[hport].mtu);
+ if (hdev->ext_ports_mask & BIT(hport))
+ props->gid_tbl_len = HBL_IB_MAX_PORT_GIDS;
+ else
+ props->gid_tbl_len = HBL_IB_MAX_PORT_GIDS_INTERNAL;
+
+ props->max_msg_sz = port_attr.max_msg_sz;
+ props->pkey_tbl_len = 1;
+
+ props->active_speed = conv_speed_to_ib_speed(port_attr.speed, port_attr.num_lanes);
+ props->active_width = conv_lane_to_ib_width(port_attr.num_lanes);
+
+ props->phys_state = port_attr.link_up ? IB_PORT_PHYS_STATE_LINK_UP :
+ IB_PORT_PHYS_STATE_DISABLED;
+
+ return 0;
+}
+
+static int hbl_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
+{
+ struct hbl_ib_ucontext *hctx = rdma_udata_to_drv_context(udata, struct hbl_ib_ucontext,
+ ibucontext);
+ struct hbl_ib_pd *pd = to_hbl_ib_pd(ibpd);
+ struct hbl_ibv_alloc_pd_resp resp = {};
+ struct ib_device *ibdev = ibpd->device;
+ int rc;
+
+ if (udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen)) {
+ hbl_ibdev_dbg(ibdev, "Incompatible ABI params, udata not cleared\n");
+ return -EINVAL;
+ }
+
+ /* currently only a single PD is supported */
+ if (atomic_cmpxchg(&hctx->pd_allocated, 0, 1)) {
+ hbl_ibdev_dbg(ibdev, "no available PD\n");
+ return -ESRCH;
+ }
+
+ pd->pdn = 1;
+ resp.pdn = pd->pdn;
+
+ if (udata->outlen) {
+ rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy udata for alloc_pd\n");
+ goto err;
+ }
+ }
+
+ hbl_ibdev_dbg(ibdev, "allocated PD %d\n", pd->pdn);
+
+ return 0;
+
+err:
+ atomic_set(&hctx->pd_allocated, 0);
+
+ return rc;
+}
+
+static void cq_user_mmap_entries_remove(struct hbl_ib_cq *cq)
+{
+ if (cq->regs_handle_entry)
+ rdma_user_mmap_entry_remove(cq->regs_handle_entry);
+
+ rdma_user_mmap_entry_remove(cq->pi_handle_entry);
+ rdma_user_mmap_entry_remove(cq->mem_handle_entry);
+}
+
+static int cq_user_mmap_entries_setup(struct hbl_ib_device *dev, struct hbl_ib_cq *cq,
+ struct hbl_ib_ucontext *hctx, u32 cq_size, u64 *mem_handle,
+ u64 *pi_handle, u64 *regs_handle)
+{
+ int rc;
+
+ cq->mem_handle_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext, *mem_handle,
+ cq_size, mem_handle);
+ if (IS_ERR(cq->mem_handle_entry))
+ return PTR_ERR(cq->mem_handle_entry);
+
+ cq->pi_handle_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext, *pi_handle,
+ PAGE_SIZE, pi_handle);
+ if (IS_ERR(cq->pi_handle_entry)) {
+ rc = PTR_ERR(cq->pi_handle_entry);
+ goto err_free_mem;
+ }
+
+ if (regs_handle) {
+ cq->regs_handle_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext,
+ *regs_handle, PAGE_SIZE,
+ regs_handle);
+ if (IS_ERR(cq->regs_handle_entry)) {
+ rc = PTR_ERR(cq->regs_handle_entry);
+ goto err_free_pi;
+ }
+ }
+
+ return 0;
+
+err_free_pi:
+ rdma_user_mmap_entry_remove(cq->pi_handle_entry);
+
+err_free_mem:
+ rdma_user_mmap_entry_remove(cq->mem_handle_entry);
+
+ return rc;
+}
+
+/* Get the max number of supported ports from ports_mask.
+ * The position of the MSB determines the number of valid ports.
+ */
+static int get_max_ports_from_port_mask(int ports_mask)
+{
+ if (!ports_mask)
+ return -1;
+
+ return fls(ports_mask);
+}
+
+static int __create_per_port_cq(struct hbl_ib_cq *hblcq, struct hbl_ib_device *hdev,
+ const struct ib_cq_init_attr *attr, struct ib_udata *udata)
+{
+ u64 mmap_set_mask = 0, cq_set_mask = 0, ports_mask, ib_ports_mask;
+ struct hbl_cni_alloc_user_cq_id_out alloc_cq_out = {};
+ struct hbl_cni_alloc_user_cq_id_in alloc_cq_in = {};
+ struct hbl_cni_user_cq_id_unset_in cq_unset_in = {};
+ struct hbl_cni_user_cq_id_set_out cq_set_out = {};
+ struct hbl_ibv_port_create_cq_resp *port_cq_resp;
+ struct hbl_cni_user_cq_id_set_in cq_set_in = {};
+ struct hbl_ibv_create_cq_resp *resp = NULL;
+ struct hbl_ib_device_attr dev_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_ib_cq *port_hblcq;
+ struct hbl_ib_port *ib_port;
+ struct hbl_aux_dev *aux_dev;
+ u32 cq_ib_port, cq_num, i;
+ struct ib_device *ibdev;
+ size_t resp_size;
+ int cqes, rc = 0;
+ int max_ports;
+
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ cqes = attr->cqe;
+ hctx = hblcq->hctx;
+ ports_mask = hctx->ports_mask;
+
+ ib_ports_mask = to_ib_port_mask(hdev, ports_mask);
+ max_ports = get_max_ports_from_port_mask(ib_ports_mask);
+ if (max_ports < 0) {
+ hbl_ibdev_dbg(ibdev, "port mask is empty: %llx\n", ib_ports_mask);
+ return -EINVAL;
+ }
+
+ resp_size = (sizeof(struct hbl_ibv_port_create_cq_resp) * max_ports) +
+ sizeof(struct hbl_ibv_create_cq_resp);
+ resp = kzalloc(resp_size, GFP_KERNEL);
+ if (!resp)
+ return -ENOMEM;
+
+ hblcq->port_cq = kzalloc(sizeof(struct hbl_ib_cq) * max_ports, GFP_KERNEL);
+ if (!hblcq->port_cq) {
+ kfree(resp);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < max_ports; i++) {
+ if (!(ports_mask & BIT(i)))
+ continue;
+
+ cq_ib_port = hbl_to_ib_port_num(hdev, i);
+ ib_port = &hdev->ib_port[i];
+
+ /* Step 1: Alloc cq */
+ alloc_cq_in.port = i;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_ALLOC_USER_CQ_ID,
+ &alloc_cq_in, &alloc_cq_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Allocation of cq_id failed, port: %u\n", i);
+ goto err_cq;
+ }
+
+ cq_num = alloc_cq_out.id;
+
+ port_hblcq = &hblcq->port_cq[cq_ib_port];
+ port_hblcq->hbl_port_num = i;
+ port_hblcq->cq_num = cq_num;
+ port_hblcq->cq_type = HBL_CQ_TYPE_QP;
+
+ cq_set_mask |= BIT(cq_ib_port);
+ aux_ops->query_device(aux_dev, &dev_attr);
+
+ if (cqes < dev_attr.min_cq_entries) {
+ hbl_ibdev_dbg(ibdev,
+ "Requested cqe %d is less than the minimum required %d - raising it to the minimum\n",
+ cqes, dev_attr.min_cq_entries);
+ cqes = dev_attr.min_cq_entries;
+ }
+
+ /* Step 2: USER_CQ Set */
+ cq_set_in.port = i;
+ cq_set_in.id = cq_num;
+ cq_set_in.num_of_cqes = cqes;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CQ_ID_SET, &cq_set_in,
+ &cq_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "CQ_ID Set failed, port: %u\n", i);
+ goto err_cq;
+ }
+
+ port_cq_resp = &resp->port_cq_resp[cq_ib_port];
+ port_cq_resp->cq_num = cq_num;
+ port_cq_resp->mem_handle = cq_set_out.mem_handle;
+ port_cq_resp->pi_handle = cq_set_out.pi_handle;
+ port_cq_resp->regs_handle = cq_set_out.regs_handle;
+ port_cq_resp->regs_offset = cq_set_out.regs_offset;
+ port_cq_resp->cq_size = PAGE_ALIGN(dev_attr.cqe_size * cqes);
+
+ rc = cq_user_mmap_entries_setup(hdev, &hblcq->port_cq[cq_ib_port], hctx,
+ port_cq_resp->cq_size, &port_cq_resp->mem_handle,
+ &port_cq_resp->pi_handle,
+ &port_cq_resp->regs_handle);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "unable to set up cq mmap entries\n");
+ goto err_cq;
+ }
+
+ mmap_set_mask |= BIT(cq_ib_port);
+ xa_store(&ib_port->hbl_ibcq_tbl, cq_num, &hblcq->ibcq, GFP_KERNEL);
+ }
+
+ if (udata->outlen) {
+ rc = ib_copy_to_udata(udata, resp, min(resp_size, udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy udata to userspace\n");
+ goto err_cq;
+ }
+ }
+
+ kfree(resp);
+ return 0;
+
+err_cq:
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (mmap_set_mask & BIT(i))
+ cq_user_mmap_entries_remove(&hblcq->port_cq[i]);
+
+ if (cq_set_mask & BIT(i)) {
+ cq_unset_in.port = hblcq->port_cq[i].hbl_port_num;
+ cq_unset_in.id = hblcq->port_cq[i].cq_num;
+ if (aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CQ_ID_UNSET,
+ &cq_unset_in, NULL)) {
+ hbl_ibdev_dbg(ibdev, "Failed to destroy cq, port: %d, cq_num: %d\n",
+ cq_unset_in.port, hblcq->port_cq[i].cq_num);
+ }
+ }
+ }
+
+ kfree(hblcq->port_cq);
+ kfree(resp);
+
+ return rc;
+}
+
+static int __create_cq(struct hbl_ib_cq *hblcq, struct hbl_ib_device *hdev,
+ const struct ib_cq_init_attr *attr, struct ib_udata *udata, u32 hbl_port_num)
+{
+ struct hbl_cni_alloc_user_cq_id_out alloc_cq_out = {};
+ struct hbl_cni_alloc_user_cq_id_in alloc_cq_in = {};
+ struct hbl_cni_user_cq_id_unset_in cq_unset_in = {};
+ struct hbl_cni_user_cq_id_set_out cq_set_out = {};
+ struct hbl_cni_user_cq_id_set_in cq_set_in = {};
+ struct hbl_ibv_create_cq_resp resp = {};
+ struct hbl_ib_device_attr dev_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_ib_port *ib_port;
+ struct hbl_aux_dev *aux_dev;
+ struct ib_device *ibdev;
+ int cqes, rc;
+ u32 cq_num;
+
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ cqes = attr->cqe;
+ ib_port = &hdev->ib_port[hbl_port_num];
+ hctx = hblcq->hctx;
+
+ /* Step 1: Alloc cq */
+ alloc_cq_in.port = hbl_port_num;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_ALLOC_USER_CQ_ID, &alloc_cq_in,
+ &alloc_cq_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Allocation of cq_id failed, port: %d\n", hbl_port_num);
+ return rc;
+ }
+
+ cq_num = alloc_cq_out.id;
+
+ aux_ops->query_device(aux_dev, &dev_attr);
+
+ /* If the number of cqes requested by the IB user is less than the minimum required by the
+ * HW, ceil it to min required cq entries. This is needed to pass the test_cq pyverbs test.
+ */
+ if (cqes < dev_attr.min_cq_entries) {
+ hbl_ibdev_dbg(ibdev,
+ "Requested cqe %d is less than the minimum required %d - raising it to the minimum\n",
+ cqes, dev_attr.min_cq_entries);
+ cqes = dev_attr.min_cq_entries;
+ }
+
+ /* Step 2: USER_CQ Set */
+ cq_set_in.port = hbl_port_num;
+ cq_set_in.id = cq_num;
+ cq_set_in.num_of_cqes = cqes;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CQ_ID_SET, &cq_set_in,
+ &cq_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "CQ_ID Set failed, port: %d\n", hbl_port_num);
+ goto unset_cq;
+ }
+
+ resp.cq_num = cq_num;
+ resp.mem_handle = cq_set_out.mem_handle;
+ resp.pi_handle = cq_set_out.pi_handle;
+ resp.regs_handle = cq_set_out.regs_handle;
+ resp.regs_offset = cq_set_out.regs_offset;
+ resp.cq_size = PAGE_ALIGN(dev_attr.cqe_size * cqes);
+
+ rc = cq_user_mmap_entries_setup(hdev, hblcq, hctx, resp.cq_size, &resp.mem_handle,
+ &resp.pi_handle, &resp.regs_handle);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "unable to set up cq mmap entries\n");
+ goto unset_cq;
+ }
+
+ if (udata->outlen) {
+ rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy udata to userspace\n");
+ goto unset_mmap_entries;
+ }
+ }
+
+ /* Store the number of allocated CQEs, along with the data needed for
+ * destroying the CQ.
+ */
+ hblcq->ibcq.cqe = cqes;
+ hblcq->hbl_port_num = hbl_port_num;
+ hblcq->cq_num = cq_num;
+ hblcq->cq_type = HBL_CQ_TYPE_QP;
+
+ xa_store(&ib_port->hbl_ibcq_tbl, cq_num, &hblcq->ibcq, GFP_KERNEL);
+
+ return 0;
+
+unset_mmap_entries:
+ cq_user_mmap_entries_remove(hblcq);
+
+unset_cq:
+ cq_unset_in.port = hbl_port_num;
+ cq_unset_in.id = cq_num;
+
+ if (aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CQ_ID_UNSET, &cq_unset_in,
+ NULL))
+ hbl_ibdev_dbg(ibdev, "Failed to destroy cq, port: %d, cq_num: %d\n",
+ hbl_port_num, cq_num);
+
+ return rc;
+}
+
+static int __create_cc_cq(struct hbl_ib_cq *hblcq, struct hbl_ib_device *hdev,
+ const struct ib_cq_init_attr *attr, struct ib_udata *udata,
+ u32 hbl_port_num)
+{
+ struct hbl_cni_user_ccq_unset_in cc_cq_unset_in = {};
+ struct hbl_cni_user_ccq_set_out cc_cq_set_out = {};
+ struct hbl_cni_user_ccq_set_in cc_cq_set_in = {};
+ struct hbl_ibv_create_cq_resp resp = {};
+ struct hbl_ib_device_attr dev_attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct ib_device *ibdev;
+ int rc;
+
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ hctx = hblcq->hctx;
+
+ cc_cq_set_in.port = hbl_port_num;
+ cc_cq_set_in.num_of_entries = attr->cqe;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CCQ_SET, &cc_cq_set_in,
+ &cc_cq_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set CC CQ, port %d\n", hbl_port_num);
+ return rc;
+ }
+
+ aux_ops->query_device(aux_dev, &dev_attr);
+
+ resp.cq_num = cc_cq_set_out.id;
+ resp.mem_handle = cc_cq_set_out.mem_handle;
+ resp.pi_handle = cc_cq_set_out.pi_handle;
+ resp.cq_size = PAGE_ALIGN(dev_attr.cqe_size * attr->cqe);
+
+ rc = cq_user_mmap_entries_setup(hdev, hblcq, hctx, resp.cq_size, &resp.mem_handle,
+ &resp.pi_handle, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "unable to set up cq mmap entries\n");
+ goto err_cc_cq_unset;
+ }
+
+ if (udata->outlen) {
+ rc = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy udata to userspace\n");
+ goto err_mmap_entries_unset;
+ }
+ }
+
+ /* Store the number of allocated CQEs, along with the data needed for
+ * destroying the CQ.
+ */
+ hblcq->ibcq.cqe = attr->cqe;
+ hblcq->hbl_port_num = hbl_port_num;
+ hblcq->cq_num = cc_cq_set_out.id;
+ hblcq->cq_type = HBL_CQ_TYPE_CC;
+
+ return 0;
+
+err_mmap_entries_unset:
+ cq_user_mmap_entries_remove(hblcq);
+
+err_cc_cq_unset:
+ cc_cq_unset_in.port = hbl_port_num;
+
+ if (aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CCQ_UNSET, &cc_cq_unset_in,
+ NULL))
+ hbl_ibdev_dbg(ibdev, "failed to unset CC CQ, port %d\n", hbl_port_num);
+
+ return rc;
+}
+
+static int create_cq(struct hbl_ib_cq *hblcq, struct hbl_ib_device *hdev,
+ const struct ib_cq_init_attr *attr, struct ib_udata *udata)
+{
+ struct hbl_ibv_create_cq_req cmd = {};
+ struct hbl_ib_ucontext *hctx;
+ struct ib_device *ibdev;
+ u32 hbl_port_num = 0;
+ int rc;
+
+ hctx = rdma_udata_to_drv_context(udata, struct hbl_ib_ucontext, ibucontext);
+ ibdev = &hdev->ibdev;
+
+ if (attr->flags) {
+ hbl_ibdev_dbg(ibdev, "attr->flags: %d but should be 0\n", attr->flags);
+ return -EOPNOTSUPP;
+ }
+
+ rc = ib_copy_from_udata(&cmd, udata, min(sizeof(cmd), udata->inlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy udata from user space\n");
+ return rc;
+ }
+
+ /* For a native CQ, the port number is not relevant */
+ if (!(cmd.flags & CQ_FLAG_NATIVE)) {
+ rc = ib_to_hbl_port_num(hdev, cmd.port_num, &hbl_port_num);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", cmd.port_num);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(hbl_port_num))) {
+ hbl_ibdev_dbg(ibdev,
+ "port %d is not part of the context's ports mask 0x%llx\n",
+ hbl_port_num, hctx->ports_mask);
+ return -EINVAL;
+ }
+ }
+
+ hblcq->hctx = hctx;
+
+ switch (cmd.cq_type) {
+ case HBL_CQ_TYPE_QP:
+ if (cmd.flags & CQ_FLAG_NATIVE) {
+ hblcq->is_native = true;
+ rc = __create_per_port_cq(hblcq, hdev, attr, udata);
+ } else {
+ rc = __create_cq(hblcq, hdev, attr, udata, hbl_port_num);
+ }
+ break;
+ case HBL_CQ_TYPE_CC:
+ rc = __create_cc_cq(hblcq, hdev, attr, udata, hbl_port_num);
+ break;
+ default:
+ hbl_ibdev_dbg(ibdev, "Invalid CQ resource requested %u\n", cmd.cq_type);
+ rc = -EINVAL;
+ }
+
+ return rc;
+}
+
+static int hbl_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+ struct ib_udata *udata)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibcq->device);
+ struct hbl_ib_cq *hblcq = to_hbl_ib_cq(ibcq);
+ int rc;
+
+ rc = create_cq(hblcq, hdev, attr, udata);
+ if (rc) {
+ hbl_ibdev_dbg(&hdev->ibdev, "Failed to create a CQ\n");
+ return rc;
+ }
+
+ return 0;
+}
+
+static int create_qp(struct hbl_ib_qp *hblqp, struct ib_qp_init_attr *qp_init_attr,
+ struct ib_udata *udata)
+{
+ struct hbl_ib_ucontext *hctx = rdma_udata_to_drv_context(udata, struct hbl_ib_ucontext,
+ ibucontext);
+ struct ib_device *ibdev = hblqp->ibqp.device;
+ u32 qp_num;
+ int rc;
+
+ /* Allocate an IB QP handle. Note,
+ * - It doesn't map to HW QPC index.
+ * - No HW or hbl_cn QP resources are allocated yet.
+ */
+ rc = xa_alloc(&hctx->qp_xarray, &qp_num, hblqp, xa_limit_32b, GFP_KERNEL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to allocate IB QP handle\n");
+ return rc;
+ }
+
+ hblqp->ibqp.qp_num = qp_num;
+ hblqp->qp_state = IB_QPS_RESET;
+ hblqp->hctx = hctx;
+
+ /* Cache the required QP params */
+ hblqp->max_send_wr = qp_init_attr->cap.max_send_wr;
+ hblqp->max_recv_wr = qp_init_attr->cap.max_recv_wr;
+
+ return 0;
+}
+
+static int hbl_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *qp_init_attr,
+ struct ib_udata *udata)
+{
+ struct hbl_ib_qp *hblqp = to_hbl_ib_qp(ibqp);
+
+ return create_qp(hblqp, qp_init_attr, udata);
+}
+
+static int __hbl_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
+{
+ struct hbl_ib_ucontext *hctx = rdma_udata_to_drv_context(udata, struct hbl_ib_ucontext,
+ ibucontext);
+ struct hbl_ib_pd *pd = to_hbl_ib_pd(ibpd);
+ struct ib_device *ibdev = ibpd->device;
+
+ hbl_ibdev_dbg(ibdev, "deallocated PD %d\n", pd->pdn);
+
+ atomic_set(&hctx->pd_allocated, 0);
+
+ return 0;
+}
+
+static int hbl_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
+{
+ return __hbl_ib_dealloc_pd(ibpd, udata);
+}
+
+static int hbl_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
+{
+ return -EOPNOTSUPP;
+}
+
+static void __destroy_per_port_cq(struct hbl_ib_cq *hblcq, struct hbl_ib_device *hdev)
+{
+ struct hbl_cni_user_cq_id_unset_in cq_unset_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ u32 cq_ib_port, hbl_port_num;
+ struct hbl_ib_cq *port_hblcq;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_port *ib_port;
+ struct ib_device *ibdev;
+ u64 ports_mask, i;
+ int rc;
+
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ hctx = hblcq->hctx;
+ ports_mask = hctx->ports_mask;
+
+ for (i = 0; i < hdev->max_num_of_ports; i++) {
+ if (!(ports_mask & BIT(i)))
+ continue;
+
+ cq_ib_port = hbl_to_ib_port_num(hdev, i);
+ port_hblcq = &hblcq->port_cq[cq_ib_port];
+
+ cq_user_mmap_entries_remove(port_hblcq);
+
+ hbl_port_num = port_hblcq->hbl_port_num;
+ ib_port = &hdev->ib_port[hbl_port_num];
+ xa_erase(&ib_port->hbl_ibcq_tbl, port_hblcq->cq_num);
+ cq_unset_in.port = port_hblcq->hbl_port_num;
+ cq_unset_in.id = port_hblcq->cq_num;
+
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_CQ_ID_UNSET,
+ &cq_unset_in, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to destroy cq, port: %d, cq_num: %d\n",
+ port_hblcq->hbl_port_num, port_hblcq->cq_num);
+ }
+ }
+ }
+
+ kfree(hblcq->port_cq);
+}
+
+static int destroy_cq(struct ib_cq *ibcq)
+{
+ struct hbl_cni_user_ccq_unset_in cc_cq_unset_in = {};
+ struct hbl_cni_user_cq_id_unset_in cq_unset_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_port *ib_port;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct hbl_ib_cq *hblcq;
+ int rc;
+
+ hblcq = to_hbl_ib_cq(ibcq);
+ ibdev = ibcq->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ ib_port = &hdev->ib_port[hblcq->hbl_port_num];
+ hctx = hblcq->hctx;
+
+ if (hblcq->is_native) {
+ __destroy_per_port_cq(hblcq, hdev);
+ } else {
+ cq_user_mmap_entries_remove(hblcq);
+
+ if (hblcq->cq_type == HBL_CQ_TYPE_QP) {
+ xa_erase(&ib_port->hbl_ibcq_tbl, hblcq->cq_num);
+ cq_unset_in.port = hblcq->hbl_port_num;
+ cq_unset_in.id = hblcq->cq_num;
+
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx,
+ HBL_CNI_OP_USER_CQ_ID_UNSET, &cq_unset_in,
+ NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev,
+ "Failed to destroy cq, port: %d, cq_num: %d\n",
+ hblcq->hbl_port_num, hblcq->cq_num);
+ return rc;
+ }
+ }
+ } else {
+ cc_cq_unset_in.port = hblcq->hbl_port_num;
+
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx,
+ HBL_CNI_OP_USER_CCQ_UNSET, &cc_cq_unset_in,
+ NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to unset CC CQ, port %d\n",
+ hblcq->hbl_port_num);
+ return rc;
+ }
+ }
+ }
+ }
+
+ return 0;
+}
+
+static int hbl_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
+{
+ return destroy_cq(ibcq);
+}
+
+static int __destroy_qp(struct hbl_ib_qp *hblqp)
+{
+ struct hbl_cni_destroy_conn_in destroy_conn_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+ u32 hbl_port;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ hctx = hblqp->hctx;
+
+ qp_user_mmap_entries_remove(hblqp);
+
+ rc = ib_to_hbl_port_num(hdev, ibqp->port, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u, IB QP %u\n", ibqp->port, ibqp->qp_num);
+ return rc;
+ }
+
+ destroy_conn_in.port = hbl_port;
+ destroy_conn_in.conn_id = hblqp->qp_id;
+
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_DESTROY_CONN,
+ &destroy_conn_in, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to destroy QP id %d, port %d\n", hblqp->qp_id,
+ hbl_port);
+ return rc;
+ }
+ }
+ return 0;
+}
+
+static int destroy_qp(struct hbl_ib_qp *hblqp)
+{
+ struct hbl_ib_ucontext *hctx;
+ struct ib_qp *ibqp;
+ int rc;
+
+ rc = verify_qp_xarray(hblqp);
+ if (rc)
+ return rc;
+
+ ibqp = &hblqp->ibqp;
+ hctx = hblqp->hctx;
+
+ if (hblqp->qp_state >= IB_QPS_INIT) {
+ rc = __destroy_qp(hblqp);
+ if (rc)
+ return rc;
+ }
+
+ xa_erase(&hctx->qp_xarray, ibqp->qp_num);
+
+ return 0;
+}
+
+static int hbl_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
+{
+	return destroy_qp(to_hbl_ib_qp(ibqp));
+}
+
+static struct rdma_hw_stats *__hbl_ib_alloc_hw_stats(struct ib_device *ibdev, u32 port_num)
+{
+ struct rdma_stat_desc *hbl_ib_port_stats;
+ struct hbl_ib_port_stats *port_stats;
+ struct hbl_ib_device *hdev;
+ u32 port;
+ int rc;
+
+ if (!port_num)
+ return rdma_alloc_hw_stats_struct(hbl_ib_device_stats,
+ ARRAY_SIZE(hbl_ib_device_stats),
+ RDMA_HW_STATS_DEFAULT_LIFESPAN);
+
+ hdev = to_hbl_ib_dev(ibdev);
+
+ rc = ib_to_hbl_port_num(hdev, port_num, &port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", port_num);
+ return NULL;
+ }
+
+ port_stats = &hdev->port_stats[port];
+
+ hbl_ib_port_stats = port_stats->stat_desc;
+
+ return rdma_alloc_hw_stats_struct(hbl_ib_port_stats, port_stats->num,
+ RDMA_HW_STATS_DEFAULT_LIFESPAN);
+}
+
+static struct rdma_hw_stats *hbl_ib_alloc_hw_port_stats(struct ib_device *ibdev, u32 port_num)
+{
+ return __hbl_ib_alloc_hw_stats(ibdev, port_num);
+}
+
+static struct rdma_hw_stats *hbl_ib_alloc_hw_device_stats(struct ib_device *ibdev)
+{
+ return __hbl_ib_alloc_hw_stats(ibdev, 0);
+}
+
+static int hbl_ib_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, u32 port_num,
+ int index)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(ibdev);
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ u32 port;
+ int rc;
+
+ if (!port_num) {
+ stats->value[FATAL_EVENT] = atomic_read(&hdev->dev_stats.fatal_event);
+
+ return ARRAY_SIZE(hbl_ib_device_stats);
+ }
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = ib_to_hbl_port_num(hdev, port_num, &port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", port_num);
+ return rc;
+ }
+
+ if (!aux_ops->device_operational(aux_dev)) {
+ hbl_ibdev_dbg(ibdev, "device not operational, can't get stats\n");
+ return -EINVAL;
+ }
+
+ aux_ops->get_cnts_values(aux_dev, port, stats->value);
+
+ return stats->num_counters;
+}
+
+static int hbl_ib_mmap(struct ib_ucontext *ibucontext, struct vm_area_struct *vma)
+{
+ struct rdma_user_mmap_entry *rdma_entry;
+ struct hbl_ib_user_mmap_entry *entry;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ int rc;
+
+ ibdev = ibucontext->device;
+ hdev = to_hbl_ib_dev(ibdev);
+
+ rdma_entry = rdma_user_mmap_entry_get(ibucontext, vma);
+ if (!rdma_entry) {
+ hbl_ibdev_dbg(&hdev->ibdev, "pgoff[%#lx] does not have valid entry\n",
+ vma->vm_pgoff);
+ return -EINVAL;
+ }
+
+ entry = to_hbl_ib_user_mmap_entry(rdma_entry);
+
+ switch (entry->info.mtype) {
+ case HBL_IB_MEM_HW_BLOCK:
+ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY |
+ VM_NORESERVE);
+
+ rc = rdma_user_mmap_io(ibucontext, vma, entry->info.bus_addr >> PAGE_SHIFT,
+ entry->rdma_entry.npages * PAGE_SIZE,
+ vma->vm_page_prot, rdma_entry);
+ break;
+ case HBL_IB_MEM_HOST_DMA_COHERENT:
+ case HBL_IB_MEM_HOST_MAP_ONLY:
+ if (entry->info.vmalloc) {
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
+
+ rc = remap_vmalloc_range(vma, entry->info.cpu_addr, 0);
+ } else {
+ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
+ VM_DONTCOPY | VM_NORESERVE);
+
+ rc = remap_pfn_range(vma, vma->vm_start,
+ virt_to_phys(entry->info.cpu_addr) >> PAGE_SHIFT,
+ vma->vm_end - vma->vm_start, vma->vm_page_prot);
+ }
+ break;
+ case HBL_IB_MEM_HOST_VIRTUAL:
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
+
+ rc = remap_vmalloc_range(vma, entry->info.cpu_addr, 0);
+ break;
+ default:
+ hbl_ibdev_dbg(&hdev->ibdev,
+ "pgoff[%#lx] does not have valid entry memory type %d\n",
+ vma->vm_pgoff, entry->info.mtype);
+ rc = -EINVAL;
+ }
+
+ rdma_user_mmap_entry_put(rdma_entry);
+
+ return rc;
+}
+
+static void hbl_ib_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
+{
+ struct hbl_ib_user_mmap_entry *entry = to_hbl_ib_user_mmap_entry(rdma_entry);
+
+ kfree(entry);
+}
+
+static int verify_qp_xarray(struct hbl_ib_qp *hblqp)
+{
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+
+ hblqp = xa_load(&hblqp->hctx->qp_xarray, ibqp->qp_num);
+ if (!hblqp) {
+ hbl_ibdev_dbg(ibdev, "Invalid IB QP %d modified\n", ibqp->qp_num);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int verify_modify_qp(struct hbl_ib_qp *hblqp, struct ib_qp_attr *qp_attr, int qp_attr_mask)
+{
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+
+ rc = verify_qp_xarray(hblqp);
+ if (rc)
+ return rc;
+
+ /* Verify state change and corresponding QP attribute mask. */
+ if (!ib_modify_qp_is_ok(hblqp->qp_state, qp_attr->qp_state, IB_QPT_RC, qp_attr_mask)) {
+ hbl_ibdev_dbg(ibdev, "Invalid IB QP %d params\n", ibqp->qp_num);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int get_qp_wq_type(struct hbl_ib_device *hdev, enum qpc_req_wq_type *to, u8 from)
+{
+ if (from & HBL_WQ_READ_RDV_ENDP) {
+ if (hweight_long(from) != 1)
+ return -EINVAL;
+
+ *to = QPC_REQ_WQ_TYPE_RDV_READ;
+
+ return 0;
+ }
+
+ if (from & HBL_WQ_SEND_RDV) {
+ if (hweight_long(from) != 1)
+ return -EINVAL;
+
+ *to = QPC_REQ_WQ_TYPE_RDV_WRITE;
+
+ return 0;
+ }
+
+ if (from & (HBL_WQ_WRITE | HBL_WQ_RECV_RDV | HBL_WQ_READ_RDV)) {
+ if (hdev->mixed_qp_wq_types) {
+ if ((from & HBL_WQ_RECV_RDV) && (from & HBL_WQ_READ_RDV))
+ return -EINVAL;
+ } else {
+ if (hweight_long(from) != 1)
+ return -EINVAL;
+ }
+
+ *to = QPC_REQ_WQ_TYPE_WRITE;
+
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int alloc_qp(struct hbl_ib_qp *hblqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+ struct hbl_ibv_modify_qp_req *modify_qp_req,
+ struct hbl_ibv_modify_qp_resp *modify_qp_resp)
+{
+ struct hbl_cni_alloc_conn_out alloc_conn_out = {};
+ struct hbl_cni_alloc_conn_in alloc_conn_in = {};
+ enum qpc_req_wq_type hbl_wq_type;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+ u8 ib_wq_type;
+ u32 hbl_port;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ hctx = hblqp->hctx;
+
+ rc = ib_to_hbl_port_num(hdev, qp_attr->port_num, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", qp_attr->port_num);
+ return rc;
+ }
+
+ ib_wq_type = modify_qp_req->wq_type;
+ rc = get_qp_wq_type(hdev, &hbl_wq_type, ib_wq_type);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid WQ type mask %d, port %u\n", ib_wq_type, hbl_port);
+ return rc;
+ }
+
+ alloc_conn_in.port = hbl_port;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_ALLOC_CONN, &alloc_conn_in,
+ &alloc_conn_out);
+	if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to allocate QP, port %d\n", hbl_port);
+ return rc;
+ }
+
+ hblqp->qp_id = alloc_conn_out.conn_id;
+ hblqp->qp_state = IB_QPS_INIT;
+ hblqp->wq_type = hbl_wq_type;
+ hblqp->wq_granularity = modify_qp_req->wq_granularity;
+
+ modify_qp_resp->qp_num = hblqp->qp_id;
+
+ return 0;
+}
+
+static u8 get_req_cq_number(struct hbl_ib_qp *hblqp)
+{
+ struct hbl_ib_cq *hblcq;
+ struct ib_cq *ibcq;
+ struct ib_qp *ibqp;
+
+ ibqp = &hblqp->ibqp;
+ ibcq = ibqp->send_cq;
+ hblcq = to_hbl_ib_cq(ibcq);
+
+ return hblcq->is_native ? hblcq->port_cq[ibqp->port].cq_num : hblcq->cq_num;
+}
+
+static u8 get_res_cq_number(struct hbl_ib_qp *hblqp)
+{
+ struct hbl_ib_cq *hblcq;
+ struct ib_qp *ibqp;
+ struct ib_cq *ibcq;
+
+ ibqp = &hblqp->ibqp;
+ ibcq = ibqp->recv_cq;
+ hblcq = to_hbl_ib_cq(ibcq);
+
+ return hblcq->is_native ? hblcq->port_cq[ibqp->port].cq_num : hblcq->cq_num;
+}
+
+static void copy_mac_reverse(u8 *dst, u8 *src)
+{
+ int i;
+
+ for (i = 0; i < ETH_ALEN; i++)
+ dst[i] = src[(ETH_ALEN - 1) - i];
+}
+
+static inline bool is_l2_gid(struct in6_addr *addr)
+{
+ return (addr->s6_addr32[0] == htonl(0xfe800000)) && (addr->s6_addr32[1] == 0);
+}
+
+static int set_res_qp_ctx(struct hbl_ib_qp *hblqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+ struct hbl_ibv_modify_qp_req *modify_qp_req)
+{
+ struct hbl_cni_res_conn_ctx_in res_conn_ctx_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ union ib_gid *dgid;
+ struct ib_qp *ibqp;
+ u32 hbl_port;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ dgid = &qp_attr->ah_attr.grh.dgid;
+ hctx = hblqp->hctx;
+
+ rc = ib_to_hbl_port_num(hdev, ibqp->port, &hbl_port);
+ if (rc) {
+		hbl_ibdev_dbg(ibdev, "invalid IB port %u, IB QP %u\n", ibqp->port, ibqp->qp_num);
+ return rc;
+ }
+
+	/* For internal ports the dst_mac_addr is not used when configuring the QPC - the
+	 * broadcast MAC is used as the destination MAC instead. Refer to the ASIC specific
+	 * set_res_qp_ctx.
+	 */
+ if (hdev->ext_ports_mask & BIT(hbl_port)) {
+ copy_mac_reverse(hblqp->dst_mac_addr, qp_attr->ah_attr.roce.dmac);
+ memcpy(res_conn_ctx_in.dst_mac_addr, hblqp->dst_mac_addr, ETH_ALEN);
+ }
+
+ if ((hdev->ext_ports_mask & BIT(hbl_port)) && !is_l2_gid((struct in6_addr *)dgid->raw)) {
+ hblqp->dst_ip_addr = htonl(((struct in6_addr *)dgid->raw)->s6_addr32[3]);
+ res_conn_ctx_in.dst_ip_addr = hblqp->dst_ip_addr;
+ }
+
+ res_conn_ctx_in.dst_conn_id = qp_attr->dest_qp_num;
+ res_conn_ctx_in.port = hbl_port;
+ res_conn_ctx_in.conn_id = hblqp->qp_id;
+ res_conn_ctx_in.cq_number = get_res_cq_number(hblqp);
+ res_conn_ctx_in.local_key = modify_qp_req->local_key;
+ res_conn_ctx_in.priority = modify_qp_req->priority;
+ res_conn_ctx_in.loopback = modify_qp_req->loopback;
+ res_conn_ctx_in.wq_peer_size = hblqp->max_send_wr;
+ res_conn_ctx_in.rdv = hblqp->wq_type == QPC_REQ_WQ_TYPE_RDV_READ ||
+ hblqp->wq_type == QPC_REQ_WQ_TYPE_RDV_WRITE;
+ res_conn_ctx_in.conn_peer = hblqp->qp_id;
+ res_conn_ctx_in.wq_peer_granularity = hblqp->wq_granularity;
+ res_conn_ctx_in.encap_en = modify_qp_req->encap_en;
+ res_conn_ctx_in.encap_id = modify_qp_req->encap_num;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_SET_RES_CONN_CTX, &res_conn_ctx_in,
+ NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to config RTR, QP %d, port %d\n", hblqp->qp_id,
+ hbl_port);
+ return rc;
+ }
+
+ hblqp->qp_state = IB_QPS_RTR;
+ hblqp->dest_qp_num = qp_attr->dest_qp_num;
+
+ if (qp_attr->path_mtu == HBL_IB_MTU_8192)
+ hblqp->mtu = 8192;
+ else
+ hblqp->mtu = ib_mtu_enum_to_int(qp_attr->path_mtu);
+
+ return 0;
+}
+
+static void qp_user_mmap_entries_remove(struct hbl_ib_qp *qp)
+{
+ if (qp->rwq_mem_handle_entry)
+ rdma_user_mmap_entry_remove(qp->rwq_mem_handle_entry);
+ if (qp->swq_mem_handle_entry)
+ rdma_user_mmap_entry_remove(qp->swq_mem_handle_entry);
+}
+
+static int qp_user_mmap_entries_setup(struct hbl_ib_device *dev, struct hbl_ib_qp *qp,
+ struct hbl_ib_ucontext *hctx,
+ struct hbl_ibv_modify_qp_resp *resp)
+{
+ int rc;
+
+ if (resp->swq_mem_handle) {
+ qp->swq_mem_handle_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext,
+ resp->swq_mem_handle,
+ resp->swq_mem_size,
+ &resp->swq_mem_handle);
+ if (IS_ERR(qp->swq_mem_handle_entry)) {
+ rc = PTR_ERR(qp->swq_mem_handle_entry);
+ goto reset_swq_entry;
+ }
+ }
+
+ if (resp->rwq_mem_handle) {
+ qp->rwq_mem_handle_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext,
+ resp->rwq_mem_handle,
+ resp->rwq_mem_size,
+ &resp->rwq_mem_handle);
+ if (IS_ERR(qp->rwq_mem_handle_entry)) {
+ rc = PTR_ERR(qp->rwq_mem_handle_entry);
+ goto reset_rwq_entry;
+ }
+ }
+
+ return 0;
+
+reset_rwq_entry:
+ qp->rwq_mem_handle_entry = NULL;
+ if (qp->swq_mem_handle_entry)
+ rdma_user_mmap_entry_remove(qp->swq_mem_handle_entry);
+
+reset_swq_entry:
+ qp->swq_mem_handle_entry = NULL;
+
+ return rc;
+}
+
+static int set_req_qp_ctx(struct hbl_ib_qp *hblqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+ struct hbl_ibv_modify_qp_req *modify_qp_req,
+ struct hbl_ibv_modify_qp_resp *modify_qp_resp)
+{
+ struct hbl_cni_req_conn_ctx_out req_conn_ctx_out = {};
+ struct hbl_cni_req_conn_ctx_in req_conn_ctx_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+ u32 hbl_port;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ hctx = hblqp->hctx;
+
+ rc = ib_to_hbl_port_num(hdev, ibqp->port, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u, IB QP %u\n", ibqp->port, ibqp->qp_num);
+ return rc;
+ }
+
+ req_conn_ctx_in.port = hbl_port;
+ req_conn_ctx_in.conn_id = hblqp->qp_id;
+ req_conn_ctx_in.dst_conn_id = hblqp->dest_qp_num;
+ req_conn_ctx_in.wq_type = hblqp->wq_type;
+ req_conn_ctx_in.wq_size = hblqp->max_send_wr;
+ req_conn_ctx_in.cq_number = get_req_cq_number(hblqp);
+ req_conn_ctx_in.remote_key = modify_qp_req->remote_key;
+ req_conn_ctx_in.priority = modify_qp_req->priority;
+ req_conn_ctx_in.timer_granularity = qp_attr->timeout;
+
+ if (modify_qp_req->dest_wq_size) {
+ if (!is_power_of_2(modify_qp_req->dest_wq_size)) {
+			hbl_ibdev_dbg(ibdev, "dest_wq_size %d is not a power of 2, QP %d, port %d\n",
+ modify_qp_req->dest_wq_size, hblqp->qp_id, hbl_port);
+ return -EINVAL;
+ }
+ req_conn_ctx_in.wq_remote_log_size = ilog2(modify_qp_req->dest_wq_size);
+ }
+
+ req_conn_ctx_in.congestion_en = modify_qp_req->congestion_en;
+ req_conn_ctx_in.congestion_wnd = modify_qp_req->congestion_wnd;
+ req_conn_ctx_in.loopback = modify_qp_req->loopback;
+ req_conn_ctx_in.compression_en = modify_qp_req->compression_en;
+ req_conn_ctx_in.encap_en = modify_qp_req->encap_en;
+ req_conn_ctx_in.encap_id = modify_qp_req->encap_num;
+ req_conn_ctx_in.swq_granularity = hblqp->wq_granularity;
+ req_conn_ctx_in.mtu = hblqp->mtu;
+
+	/* For internal ports the dst_mac_addr is not used when configuring the QPC - the
+	 * broadcast MAC is used as the destination MAC instead. Refer to the ASIC specific
+	 * set_req_qp_ctx.
+	 */
+ if (hdev->ext_ports_mask & BIT(hbl_port)) {
+ memcpy(req_conn_ctx_in.dst_mac_addr, hblqp->dst_mac_addr, ETH_ALEN);
+ req_conn_ctx_in.dst_ip_addr = hblqp->dst_ip_addr;
+ }
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_SET_REQ_CONN_CTX, &req_conn_ctx_in,
+ &req_conn_ctx_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to config RTS, QP %d, port %d\n",
+ hblqp->qp_id, hbl_port);
+ return rc;
+ }
+
+ modify_qp_resp->swq_mem_handle = req_conn_ctx_out.swq_mem_handle;
+ modify_qp_resp->swq_mem_size = req_conn_ctx_out.swq_mem_size;
+ modify_qp_resp->rwq_mem_handle = req_conn_ctx_out.rwq_mem_handle;
+ modify_qp_resp->rwq_mem_size = req_conn_ctx_out.rwq_mem_size;
+
+ WARN_ON_ONCE(!PAGE_ALIGNED(modify_qp_resp->swq_mem_size));
+ WARN_ON_ONCE(!PAGE_ALIGNED(modify_qp_resp->rwq_mem_size));
+
+ rc = qp_user_mmap_entries_setup(hdev, hblqp, hctx, modify_qp_resp);
+ if (rc) {
+		hbl_ibdev_dbg(ibdev, "Failed to create mmap entries for QP %d, port %d\n",
+ hblqp->qp_id, hbl_port);
+ return rc;
+ }
+
+ hblqp->qp_state = IB_QPS_RTS;
+
+ return 0;
+}
+
+static int reset_qp(struct hbl_ib_qp *hblqp)
+{
+ struct ib_device *ibdev;
+ struct ib_qp *ibqp;
+ int rc;
+
+ ibqp = &hblqp->ibqp;
+ ibdev = ibqp->device;
+
+ rc = __destroy_qp(hblqp);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to reset QP %d, port %d\n", hblqp->qp_id, ibqp->port);
+ return rc;
+ }
+
+ hblqp->qp_state = IB_QPS_RESET;
+
+ return 0;
+}
+
+static int hbl_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+ struct ib_udata *udata)
+{
+ struct hbl_ibv_modify_qp_resp modify_qp_resp = {};
+ struct hbl_ibv_modify_qp_req modify_qp_req = {};
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct hbl_ib_qp *hblqp;
+ u32 ib_port, hbl_port;
+ int rc;
+
+ ibdev = ibqp->device;
+ hdev = to_hbl_ib_dev(ibdev);
+ hblqp = to_hbl_ib_qp(ibqp);
+ hctx = hblqp->hctx;
+
+ rc = verify_modify_qp(hblqp, qp_attr, qp_attr_mask);
+ if (rc)
+ return rc;
+
+ ib_port = (qp_attr_mask & IB_QP_PORT) ? qp_attr->port_num : hblqp->ibqp.port;
+
+ rc = ib_to_hbl_port_num(hdev, ib_port, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", ib_port);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(hbl_port))) {
+ hbl_ibdev_dbg(ibdev, "port %d is not part of the context's ports mask 0x%llx\n",
+ hbl_port, hctx->ports_mask);
+ return -EINVAL;
+ }
+
+ rc = ib_copy_from_udata(&modify_qp_req, udata, min(sizeof(modify_qp_req), udata->inlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy from modify QP udata\n");
+ return rc;
+ }
+
+ if ((qp_attr_mask & IB_QP_STATE) && qp_attr->qp_state == IB_QPS_RESET) {
+		/* QP state transition IB_QPS_RESET ==> IB_QPS_RESET is a no-op */
+ if (hblqp->qp_state != IB_QPS_RESET) {
+ rc = reset_qp(hblqp);
+ if (rc)
+ return rc;
+ }
+ }
+
+ if ((qp_attr_mask & IB_QP_STATE) && qp_attr->qp_state == IB_QPS_INIT) {
+ /* QP state transition IB_QPS_INIT ==> IB_QPS_INIT. Destroy old QP. */
+ if (hblqp->qp_state == IB_QPS_INIT) {
+ rc = reset_qp(hblqp);
+ if (rc)
+ return rc;
+ }
+
+ rc = alloc_qp(hblqp, qp_attr, qp_attr_mask, &modify_qp_req, &modify_qp_resp);
+ if (rc)
+ return rc;
+ }
+
+ if ((qp_attr_mask & IB_QP_STATE) && qp_attr->qp_state == IB_QPS_RTR) {
+ rc = set_res_qp_ctx(hblqp, qp_attr, qp_attr_mask, &modify_qp_req);
+ if (rc)
+ goto err_reset_qp;
+ }
+
+ if ((qp_attr_mask & IB_QP_STATE) && qp_attr->qp_state == IB_QPS_RTS) {
+ rc = set_req_qp_ctx(hblqp, qp_attr, qp_attr_mask, &modify_qp_req, &modify_qp_resp);
+ if (rc)
+ goto err_reset_qp;
+ }
+
+ rc = ib_copy_to_udata(udata, &modify_qp_resp, min(sizeof(modify_qp_resp), udata->outlen));
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "Failed to copy to QP modify udata\n");
+ goto err_reset_qp;
+ }
+
+ return 0;
+
+err_reset_qp:
+ reset_qp(hblqp);
+
+ return rc;
+}
+
+static int hbl_ib_query_gid(struct ib_device *ibdev, u32 port, int index, union ib_gid *gid)
+{
+	/* The IB core queries the GID also for non-RoCE ports, i.e. internal ports */
+ memset(gid->raw, 0xFF, sizeof(gid->raw));
+
+ return 0;
+}
+
+static int hbl_ib_query_pkey(struct ib_device *ibdev, u32 port, u16 index, u16 *pkey)
+{
+ if (index > 0)
+ return -EINVAL;
+
+ *pkey = 0xffff;
+
+ return 0;
+}
+
+static int hbl_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+ struct ib_qp_init_attr *qp_init_attr)
+{
+ struct hbl_ib_qp *hblqp;
+
+ hblqp = to_hbl_ib_qp(ibqp);
+
+ memset(qp_attr, 0, sizeof(*qp_attr));
+ memset(qp_init_attr, 0, sizeof(*qp_init_attr));
+
+ qp_attr->qp_state = hblqp->qp_state;
+ qp_attr->dest_qp_num = hblqp->dest_qp_num;
+ qp_attr->port_num = ibqp->port;
+
+ qp_init_attr->cap.max_send_wr = hblqp->max_send_wr;
+ qp_init_attr->cap.max_recv_wr = hblqp->max_recv_wr;
+
+	/* These two params must be populated in order to pass the pyverbs query_qp test */
+ qp_init_attr->cap.max_send_sge = HBL_IB_MAX_SEND_SGE;
+ qp_init_attr->cap.max_recv_sge = HBL_IB_MAX_RECV_SGE;
+
+ /* Both xrcd and qp_access_flags are not used by our flows, so we may override them in order
+ * to pass extra data for EQ events.
+ */
+ qp_attr->qp_access_flags = (int)(uintptr_t)ibqp->xrcd;
+
+ return 0;
+}
+
+static struct ib_mr *hbl_ib_reg_mr(struct ib_pd *ibpd, u64 start, u64 length, u64 virt_addr,
+ int access_flags, struct ib_udata *udata)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static struct ib_mr *hbl_ib_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start, u64 length,
+ u64 virt_addr, int fd, int access_flags,
+ struct ib_udata *udata)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+/* The GID table is created and maintained by the kernel RDMA cache module, based on the
+ * gid_tbl_len provided. The add_gid callback is invoked whenever a new GID entry is added to the
+ * GID table. The GID details are received via the attr parameter and stored for later reference.
+ */
+static int hbl_ib_add_gid(const struct ib_gid_attr *attr, void **context)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(attr->device);
+ struct hbl_aux_dev *aux_dev = hdev->aux_dev;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_port *ib_port;
+ union hbl_ib_sockaddr {
+ struct sockaddr_in saddr_in;
+ struct sockaddr_in6 saddr_in6;
+ } sa;
+ u32 port, ip_addr;
+ int rc;
+
+ rc = ib_to_hbl_port_num(hdev, attr->port_num, &port);
+ if (rc) {
+ hbl_ibdev_dbg(&hdev->ibdev, "invalid IB port %u\n", attr->port_num);
+ return rc;
+ }
+
+ ib_port = &hdev->ib_port[port];
+ aux_ops = aux_dev->aux_ops;
+
+ memcpy(ib_port->gids[attr->index].gid.raw, attr->gid.raw, sizeof(attr->gid));
+ ib_port->gids[attr->index].gid_type = attr->gid_type;
+
+ if (ipv6_addr_v4mapped((struct in6_addr *)&attr->gid)) {
+ rdma_gid2ip((struct sockaddr *)&sa, &ib_port->gids[attr->index].gid);
+ ip_addr = be32_to_cpu(sa.saddr_in.sin_addr.s_addr);
+ aux_ops->set_ip_addr_encap(aux_dev, ip_addr, port);
+ }
+
+ return 0;
+}
+
+static int hbl_ib_del_gid(const struct ib_gid_attr *attr, void **context)
+{
+ struct hbl_ib_device *hdev = to_hbl_ib_dev(attr->device);
+ struct hbl_ib_port *ib_port;
+
+ if (attr->port_num > hdev->max_num_of_ports) {
+		hbl_ibdev_dbg(&hdev->ibdev, "%s, port num: %u out of bounds\n", __func__,
+ attr->port_num);
+ return -EINVAL;
+ }
+
+ ib_port = &hdev->ib_port[attr->port_num - 1];
+
+ /* validate the params */
+ if (attr->index >= HBL_IB_MAX_PORT_GIDS) {
+		hbl_ibdev_dbg(&hdev->ibdev, "%s, GID index: %u out of bounds\n", __func__,
+ attr->index);
+ return -EINVAL;
+ }
+
+ memset(ib_port->gids[attr->index].gid.raw, 0, sizeof(attr->gid));
+ ib_port->gids[attr->index].gid_type = 0;
+
+ return 0;
+}
+
+static int hbl_ib_fill_data(struct sk_buff *msg, struct ib_qp *ibqp, bool req, bool print_qp_id)
+{
+ char *data_buf, *str_buf, *ptr, *ptr2, *ptr3, *full_name, *name, *val, *prefix;
+ struct hbl_ib_dump_qp_attr attr = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct hbl_ib_qp *hblqp;
+ int rc = 0, len, i;
+ u32 hbl_port;
+
+ ibdev = ibqp->device;
+ hblqp = to_hbl_ib_qp(ibqp);
+ hdev = to_hbl_ib_dev(ibdev);
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = ib_to_hbl_port_num(hdev, ibqp->port, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", ibqp->port);
+ return rc;
+ }
+
+ data_buf = kzalloc(HBL_IB_DUMP_QP_SZ, GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ str_buf = kcalloc(2, NAME_MAX, GFP_KERNEL);
+ if (!str_buf) {
+ rc = -ENOMEM;
+ goto free_data_buf;
+ }
+
+ prefix = req ? "req_" : "res_";
+
+ full_name = str_buf;
+ memcpy(full_name, prefix, strlen(prefix));
+ name = full_name + strlen(prefix);
+
+ val = full_name + NAME_MAX;
+
+ attr.port = hbl_port;
+ attr.qpn = hblqp->qp_id;
+ attr.req = req;
+ attr.full = false;
+ attr.force = true;
+
+ rc = aux_ops->dump_qp(aux_dev, &attr, data_buf, HBL_IB_DUMP_QP_SZ);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to dump QP %d, port %d\n", attr.qpn, attr.port);
+ rc = -ENODATA;
+ goto free_str_buf;
+ }
+
+ if (print_qp_id) {
+ snprintf(val, NAME_MAX, "%u", hblqp->qp_id);
+ rc = rdma_nl_put_driver_string(msg, "qp_id", val);
+ if (rc)
+ goto free_str_buf;
+ }
+
+	/* skip first line */
+	ptr = strchr(data_buf, '\n');
+	if (!ptr)
+		goto free_str_buf;
+	ptr++;
+
+ while (1) {
+ ptr2 = strchr(ptr, ':');
+ if (!ptr2)
+ break;
+
+ /* Skip section headlines and any empty lines - they don't have (:) separator */
+ do {
+ ptr3 = strchr(ptr, '\n');
+ if (!ptr3 || ptr3 >= ptr2)
+ break;
+
+ ptr = ptr3 + 1;
+ } while (1);
+
+ /* extract attribute name */
+ len = ptr2 - ptr;
+ memcpy(name, ptr, len);
+ name[len] = '\0';
+
+ /* to lowercase and no spaces */
+		for (i = 0; i < len; i++) {
+			if (isspace(name[i]))
+				name[i] = '_';
+			else
+				name[i] = tolower(name[i]);
+		}
+
+ /* skip ':' and the following space */
+ ptr = ptr2 + 2;
+
+		ptr2 = strchr(ptr, '\n');
+		if (!ptr2)
+			break;
+
+ /* extract attribute value */
+ len = ptr2 - ptr;
+ memcpy(val, ptr, len);
+ val[len] = '\0';
+
+ if (rdma_nl_put_driver_string(msg, full_name, val)) {
+ rc = -EMSGSIZE;
+ goto free_str_buf;
+ }
+
+ /* move to next line */
+ ptr = ptr2 + 1;
+ }
+
+free_str_buf:
+ kfree(str_buf);
+free_data_buf:
+ kfree(data_buf);
+
+ return rc;
+}
+
+static int hbl_ib_fill_res_qp_entry(struct sk_buff *msg, struct ib_qp *ibqp)
+{
+ struct ib_device *ibdev = ibqp->device;
+ struct nlattr *table_attr;
+ int rc;
+
+ table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_DRIVER);
+ if (!table_attr)
+ return -EMSGSIZE;
+
+ rc = hbl_ib_fill_data(msg, ibqp, true, true);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed get REQ QP %d data, port %d\n", ibqp->qp_num,
+ ibqp->port);
+ rc = -ENODATA;
+ goto free_table;
+ }
+
+ rc = hbl_ib_fill_data(msg, ibqp, false, false);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed get RES QP %d data, port %d\n", ibqp->qp_num,
+ ibqp->port);
+ rc = -ENODATA;
+ goto free_table;
+ }
+
+ nla_nest_end(msg, table_attr);
+
+ return 0;
+
+free_table:
+ nla_nest_cancel(msg, table_attr);
+ return rc;
+}
+
+void hbl_ib_eqe_null_work(struct hbl_aux_dev *aux_dev, u32 port)
+{
+}
+
+void hbl_ib_eqe_work_schd(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hbl_ib_device *hdev = aux_dev->priv;
+ struct hbl_ib_port *ib_port;
+
+ ib_port = &hdev->ib_port[port];
+
+ if (!ib_port->open)
+ return;
+
+	/* Use this atomic to prevent a race - a thread handling a received EQE in hbl_cn enters
+	 * here to wake up the EQ thread, while another thread is executing hbl_ib_port_fini which
+	 * releases it. In such a case the first thread might access a released resource.
+	 */
+ if (atomic_cmpxchg(&ib_port->eq_lock, 0, 1))
+ return;
+
+ complete(&ib_port->eq_comp);
+
+ atomic_set(&ib_port->eq_lock, 0);
+}
+
+static bool hbl_ib_dispatch_event_qp(struct hbl_ib_device *hdev, struct hbl_ib_ucontext *hctx,
+ u32 port, u32 qpn, enum ib_event_type event_type,
+ u64 extra_data)
+{
+ struct ib_event ibev = {};
+ struct hbl_ib_qp *hblqp;
+ bool found_qp = false;
+ unsigned long id = 0;
+ struct ib_qp *ibqp;
+ u32 qp_port;
+ int rc;
+
+ xa_lock(&hctx->qp_xarray);
+ xa_for_each(&hctx->qp_xarray, id, hblqp) {
+ ibqp = &hblqp->ibqp;
+
+ rc = ib_to_hbl_port_num(hdev, ibqp->port, &qp_port);
+ if (rc) {
+ hbl_ibdev_dbg(&hdev->ibdev, "invalid IB port %u, IB QP %u\n", ibqp->port,
+ ibqp->qp_num);
+ continue;
+ }
+
+		/* We need to iterate over all QPs allocated under this context, as we need to map
+		 * the HBL QP ID back to the corresponding IB QP struct.
+		 */
+ if (hblqp->qp_id == qpn && qp_port == port) {
+			/* xrcd is not used by our flows, so we may override it in order to pass
+			 * extra data for EQ events.
+			 * This is not part of the event, but rather part of the QP structure,
+			 * meaning that a subsequent QP event will overwrite the value stored in
+			 * xrcd. Since this is an error case, the QP should not receive any more
+			 * events.
+			 */
+ ibqp->xrcd = (void *)extra_data;
+ ibev.event = event_type;
+ ibev.element.qp = ibqp;
+
+ ibqp->event_handler(&ibev, ibqp->qp_context);
+
+			/* Mark the QP as being in the error state */
+ hblqp->qp_state = IB_QPS_ERR;
+
+ found_qp = true;
+ break;
+ }
+ }
+
+ xa_unlock(&hctx->qp_xarray);
+
+ return found_qp;
+}
+
+void hbl_ib_eqe_handler(struct hbl_ib_port *ib_port)
+{
+ struct hbl_ib_ucontext *hctx = ib_port->hctx;
+ struct hbl_cni_eq_poll_out eq_poll_out = {};
+ struct hbl_ib_device *hdev = ib_port->hdev;
+ struct hbl_cni_eq_poll_in eq_poll_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ enum ib_event_type event_type;
+ struct hbl_aux_dev *aux_dev;
+ struct ib_event ibev = {};
+ u32 port = ib_port->port;
+ bool found_qp = false;
+ struct ib_cq *ibcq;
+ int rc;
+
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+ eq_poll_in.port = port;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_EQ_POLL, &eq_poll_in,
+ &eq_poll_out);
+ if (rc) {
+ hbl_ibdev_err(&hdev->ibdev, "port %d - EQ poll failed %d\n", port, rc);
+ return;
+ }
+
+ switch (eq_poll_out.status) {
+ case HBL_CNI_EQ_POLL_STATUS_SUCCESS:
+
+ ibev.device = &hdev->ibdev;
+
+ switch (eq_poll_out.ev_type) {
+ case HBL_CNI_EQ_EVENT_TYPE_CQ_ERR:
+ hbl_ibdev_dbg(&hdev->ibdev, "port %d cq %d - received CQ ERR event\n", port,
+ eq_poll_out.idx);
+
+ xa_lock(&ib_port->hbl_ibcq_tbl);
+ ibcq = xa_load(&ib_port->hbl_ibcq_tbl, eq_poll_out.idx);
+
+ if (ibcq) {
+ ibev.element.cq = ibcq;
+ ibev.event = IB_EVENT_CQ_ERR;
+
+ ibcq->event_handler(&ibev, ibcq->cq_context);
+ } else {
+ hbl_ibdev_err(&hdev->ibdev,
+ "port %d cq %d - received CQ ERR event, but CQ is not allocated\n",
+ port, eq_poll_out.idx);
+ }
+
+ xa_unlock(&ib_port->hbl_ibcq_tbl);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_QP_ERR:
+			/* In IB verbs we can't pass the syndrome value to user space via the IB
+			 * events mechanism, hence we print it instead.
+			 */
+ hbl_ibdev_err(&hdev->ibdev, "port %d qp %d - received QP ERR syndrome: %s\n",
+ port, eq_poll_out.idx,
+ aux_ops->qp_syndrome_to_str(aux_dev, eq_poll_out.ev_data));
+
+ event_type = eq_poll_out.is_req ? IB_EVENT_QP_REQ_ERR : IB_EVENT_QP_FATAL;
+
+ found_qp = hbl_ib_dispatch_event_qp(hdev, hctx, port, eq_poll_out.idx,
+ event_type, eq_poll_out.ev_data);
+
+ if (!found_qp)
+ hbl_ibdev_err(&hdev->ibdev,
+ "port %d qp %d - received QP ERR event, but QP is not allocated\n",
+ port, eq_poll_out.idx);
+
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_DB_FIFO_ERR:
+ hbl_ibdev_err(&hdev->ibdev, "port %d user fifo %d error\n", port,
+ eq_poll_out.idx);
+
+ ibev.event = IB_EVENT_DEVICE_FATAL;
+ ibev.element.port_num = (port & HBL_IB_EQ_PORT_FIELD_MASK) |
+ (eq_poll_out.idx << HBL_IB_EQ_PORT_FIELD_SIZE);
+ ib_dispatch_event(&ibev);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_CCQ:
+ hbl_ibdev_dbg(&hdev->ibdev, "Port %u: got completion on congestion CQ %u\n",
+ port, eq_poll_out.idx);
+
+ ibev.event = IB_EVENT_SM_CHANGE;
+ ibev.element.port_num = (port & HBL_IB_EQ_PORT_FIELD_MASK) |
+ (eq_poll_out.idx << HBL_IB_EQ_PORT_FIELD_SIZE);
+ ib_dispatch_event(&ibev);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_WTD_SECURITY_ERR:
+ hbl_ibdev_dbg(&hdev->ibdev, "Port %u: got WTD security error on QP %u\n",
+ port, eq_poll_out.idx);
+
+ found_qp = hbl_ib_dispatch_event_qp(hdev, hctx, port, eq_poll_out.idx,
+ IB_EVENT_PATH_MIG, 0);
+
+ if (!found_qp)
+ hbl_ibdev_err(&hdev->ibdev,
+ "port %d qp %d - received WTD security event, but QP is not allocated\n",
+ port, eq_poll_out.idx);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_NUMERICAL_ERR:
+ hbl_ibdev_dbg(&hdev->ibdev, "Port %u: got numerical error on QP %u\n",
+ port, eq_poll_out.idx);
+
+ found_qp = hbl_ib_dispatch_event_qp(hdev, hctx, port, eq_poll_out.idx,
+ IB_EVENT_PATH_MIG_ERR, 0);
+
+ if (!found_qp)
+ hbl_ibdev_err(&hdev->ibdev,
+ "port %d qp %d - received numerical error event, but QP is not allocated\n",
+ port, eq_poll_out.idx);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_LINK_STATUS:
+ hbl_ibdev_dbg(&hdev->ibdev, "port %d link %s\n", port,
+ eq_poll_out.ev_data ? "up" : "down");
+
+ ibev.event = eq_poll_out.ev_data ?
+ IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
+ ibev.element.port_num = port;
+ ib_dispatch_event(&ibev);
+ break;
+ case HBL_CNI_EQ_EVENT_TYPE_QP_ALIGN_COUNTERS:
+ hbl_ibdev_dbg(&hdev->ibdev,
+ "port %d qp %d - Align QP counters on QP timeout\n", port,
+ eq_poll_out.idx);
+
+ found_qp = hbl_ib_dispatch_event_qp(hdev, hctx, port, eq_poll_out.idx,
+ IB_EVENT_QP_LAST_WQE_REACHED, 0);
+
+ if (!found_qp)
+ hbl_ibdev_err(&hdev->ibdev,
+ "port %d qp %d - received Align QP counters event, but QP is not allocated\n",
+ port, eq_poll_out.idx);
+ break;
+ default:
+ hbl_ibdev_dbg(&hdev->ibdev, "port %d EQ poll success, event %d\n", port,
+ eq_poll_out.ev_type);
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+}
+
+const struct ib_device_ops hbl_ib_dev_ops = {
+ .owner = THIS_MODULE,
+ .driver_id = RDMA_DRIVER_HBL,
+ .uverbs_abi_ver = HBL_IB_UVERBS_ABI_VERSION,
+
+ .add_gid = hbl_ib_add_gid,
+ .del_gid = hbl_ib_del_gid,
+ .alloc_hw_port_stats = hbl_ib_alloc_hw_port_stats,
+ .alloc_hw_device_stats = hbl_ib_alloc_hw_device_stats,
+ .alloc_pd = hbl_ib_alloc_pd,
+ .alloc_ucontext = hbl_ib_alloc_ucontext,
+ .create_cq = hbl_ib_create_cq,
+ .create_qp = hbl_ib_create_qp,
+ .dealloc_pd = hbl_ib_dealloc_pd,
+ .dealloc_ucontext = hbl_ib_dealloc_ucontext,
+ .dereg_mr = hbl_ib_dereg_mr,
+ .destroy_cq = hbl_ib_destroy_cq,
+ .destroy_qp = hbl_ib_destroy_qp,
+ .fill_res_qp_entry = hbl_ib_fill_res_qp_entry,
+ .get_hw_stats = hbl_ib_get_hw_stats,
+ .get_dev_fw_str = hbl_ib_get_dev_fw_str,
+ .get_link_layer = hbl_ib_port_link_layer,
+ .get_port_immutable = hbl_ib_get_port_immutable,
+ .mmap = hbl_ib_mmap,
+ .mmap_free = hbl_ib_mmap_free,
+ .modify_qp = hbl_ib_modify_qp,
+ .query_device = hbl_ib_query_device,
+ .query_gid = hbl_ib_query_gid,
+ .query_pkey = hbl_ib_query_pkey,
+ .query_port = hbl_ib_query_port,
+ .query_qp = hbl_ib_query_qp,
+ .reg_user_mr = hbl_ib_reg_mr,
+ .reg_user_mr_dmabuf = hbl_ib_reg_user_mr_dmabuf,
+
+ INIT_RDMA_OBJ_SIZE(ib_cq, hbl_ib_cq, ibcq),
+ INIT_RDMA_OBJ_SIZE(ib_pd, hbl_ib_pd, ibpd),
+ INIT_RDMA_OBJ_SIZE(ib_qp, hbl_ib_qp, ibqp),
+ INIT_RDMA_OBJ_SIZE(ib_ucontext, hbl_ib_ucontext, ibucontext),
+};
diff --git a/include/uapi/rdma/hbl-abi.h b/include/uapi/rdma/hbl-abi.h
new file mode 100644
index 000000000000..c545e07c9734
--- /dev/null
+++ b/include/uapi/rdma/hbl-abi.h
@@ -0,0 +1,204 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_IB_ABI_USER_H
+#define HBL_IB_ABI_USER_H
+
+#include <linux/types.h>
+
+/* Increment this value if any changes that break userspace ABI compatibility are made. */
+#define HBL_IB_UVERBS_ABI_VERSION 1
+
+#define HBL_IB_MTU_8192 6
+
+/**
+ * struct hbl_ibv_alloc_ucontext_req - Request udata for alloc ucontext.
+ * @ports_mask: Mask of ports associated with this context. 0 is for all available ports.
+ * @core_fd: Core device file descriptor.
+ * @use_dvs: Indicates whether the userspace direct verbs (DV) interface will be used.
+ */
+struct hbl_ibv_alloc_ucontext_req {
+ __aligned_u64 ports_mask;
+ __s32 core_fd;
+ __u8 use_dvs;
+ __u8 reserved[3];
+};
+
+/**
+ * enum hbl_ibv_ucontext_cap - Device capabilities.
+ * @HBL_UCONTEXT_CAP_MMAP_UMR: User memory region.
+ * @HBL_UCONTEXT_CAP_CC: Congestion control.
+ */
+enum hbl_ibv_ucontext_cap {
+ HBL_UCONTEXT_CAP_MMAP_UMR = 1 << 0,
+ HBL_UCONTEXT_CAP_CC = 1 << 1,
+};
+
+/**
+ * struct hbl_ibv_alloc_ucontext_resp - Response udata for alloc ucontext.
+ * @ports_mask: Mask of ports associated with this context.
+ * @cap_mask: Capabilities mask.
+ */
+struct hbl_ibv_alloc_ucontext_resp {
+ __aligned_u64 ports_mask;
+ __aligned_u64 cap_mask;
+};
+
+/**
+ * struct hbl_ibv_alloc_pd_resp - Response udata for alloc PD.
+ * @pdn: PD number.
+ */
+struct hbl_ibv_alloc_pd_resp {
+ __u32 pdn;
+ __u32 reserved;
+};
+
+/**
+ * enum hbl_ibv_qp_wq_types - QP WQ types.
+ * @HBL_WQ_WRITE: WRITE or "native" SEND operations are allowed on this QP.
+ * NOTE: the latter is not supported!
+ * @HBL_WQ_RECV_RDV: RECEIVE-RDV or WRITE operations are allowed on this QP.
+ * NOTE: posting both operation types at the same time is not supported!
+ * @HBL_WQ_READ_RDV: READ-RDV or WRITE operations are allowed on this QP.
+ * NOTE: posting both operation types at the same time is not supported!
+ * @HBL_WQ_SEND_RDV: SEND-RDV operation is allowed on this QP.
+ * @HBL_WQ_READ_RDV_ENDP: No operation is allowed on this endpoint QP!
+ */
+enum hbl_ibv_qp_wq_types {
+ HBL_WQ_WRITE = 0x1,
+ HBL_WQ_RECV_RDV = 0x2,
+ HBL_WQ_READ_RDV = 0x4,
+ HBL_WQ_SEND_RDV = 0x8,
+ HBL_WQ_READ_RDV_ENDP = 0x10,
+};
+
+/**
+ * struct hbl_ibv_modify_qp_req - Request udata for modify QP.
+ * @local_key: Unique key for local memory access.
+ * @remote_key: Unique key for remote memory access.
+ * @congestion_wnd: Congestion-Window size.
+ * @dest_wq_size: Number of WQEs on the destination.
+ * @priority: Requester/responder QP priority.
+ * @wq_type: WQ type, e.g. write, rdv, etc.
+ * @loopback: QP loopback enable/disable.
+ * @congestion_en: Congestion-control enable/disable.
+ * @compression_en: Compression enable/disable.
+ * @encap_en: Encapsulation enable flag.
+ * @encap_num: Encapsulation number.
+ * @wq_granularity: WQ granularity [0 for 32B or 1 for 64B].
+ */
+struct hbl_ibv_modify_qp_req {
+ __u32 local_key;
+ __u32 remote_key;
+ __u32 congestion_wnd;
+ __u32 reserved0;
+ __u32 dest_wq_size;
+ __u8 priority;
+ __u8 wq_type;
+ __u8 loopback;
+ __u8 congestion_en;
+ __u8 reserved1;
+ __u8 reserved2;
+ __u8 compression_en;
+ __u8 reserved3;
+ __u8 encap_en;
+ __u8 encap_num;
+ __u8 reserved4;
+ __u8 wq_granularity;
+ __u8 reserved5;
+ __u8 reserved6[5];
+};
+
+/**
+ * struct hbl_ibv_modify_qp_resp - Response udata for modify QP.
+ * @swq_mem_handle: Send WQ mmap handle.
+ * @rwq_mem_handle: Receive WQ mmap handle.
+ * @swq_mem_size: Send WQ mmap size.
+ * @rwq_mem_size: Receive WQ mmap size.
+ * @qp_num: HBL QP num.
+ */
+struct hbl_ibv_modify_qp_resp {
+ __aligned_u64 swq_mem_handle;
+ __aligned_u64 rwq_mem_handle;
+ __u32 swq_mem_size;
+ __u32 rwq_mem_size;
+ __u32 qp_num;
+ __u32 reserved;
+};
+
+/**
+ * enum hbl_ibv_cq_type - CQ types, used during allocation of CQs.
+ * @HBL_CQ_TYPE_QP: Standard CQ used for completion of an operation for a QP.
+ * @HBL_CQ_TYPE_CC: Congestion control CQ.
+ */
+enum hbl_ibv_cq_type {
+ HBL_CQ_TYPE_QP,
+ HBL_CQ_TYPE_CC,
+};
+
+/**
+ * enum hbl_ibv_cq_req_flags - CQ request flags, used to distinguish between CQ types based on attributes.
+ * @CQ_FLAG_NATIVE: When bit 0 is set, the CQ is created via the native create-CQ path.
+ */
+enum hbl_ibv_cq_req_flags {
+ CQ_FLAG_NATIVE = 1 << 0,
+};
+
+/**
+ * struct hbl_ibv_create_cq_req - Request udata for create CQ.
+ * @port_num: IB Port number.
+ * @cq_type: Type of CQ resource, as per enum hbl_ibv_cq_type.
+ * @flags: CQ request flags, as per enum hbl_ibv_cq_req_flags.
+ */
+struct hbl_ibv_create_cq_req {
+ __u32 port_num;
+ __u8 cq_type;
+ __u8 flags;
+ __u8 reserved[2];
+};
+
+/**
+ * struct hbl_ibv_port_create_cq_resp - Response udata for create CQ.
+ * @mem_handle: Handle for the CQ buffer.
+ * @pi_handle: Handle for the PI (producer index) memory.
+ * @regs_handle: Handle for the CQ UMR register.
+ * @regs_offset: Register offset of CQ UMR register.
+ * @cq_num: CQ number that is allocated.
+ * @cq_size: Size of the CQ.
+ */
+struct hbl_ibv_port_create_cq_resp {
+ __aligned_u64 mem_handle;
+ __aligned_u64 pi_handle;
+ __aligned_u64 regs_handle;
+ __u32 regs_offset;
+ __u32 cq_num;
+ __u32 cq_size;
+ __u32 reserved;
+};
+
+/**
+ * struct hbl_ibv_create_cq_resp - Response udata for create CQ.
+ * @mem_handle: Handle for the CQ buffer.
+ * @pi_handle: Handle for the PI (producer index) memory.
+ * @regs_handle: Handle for the CQ UMR register.
+ * @regs_offset: Register offset of CQ UMR register.
+ * @cq_num: CQ number that is allocated.
+ * @cq_size: Size of the CQ.
+ * @port_cq_resp: Per-port response data for create CQ.
+ */
+struct hbl_ibv_create_cq_resp {
+ __aligned_u64 mem_handle;
+ __aligned_u64 pi_handle;
+ __aligned_u64 regs_handle;
+ __u32 regs_offset;
+ __u32 cq_num;
+ __u32 cq_size;
+ __u32 reserved;
+ struct hbl_ibv_port_create_cq_resp port_cq_resp[];
+};
+
+#endif /* HBL_IB_ABI_USER_H */
diff --git a/include/uapi/rdma/hbl_user_ioctl_cmds.h b/include/uapi/rdma/hbl_user_ioctl_cmds.h
new file mode 100644
index 000000000000..7ac2bf116385
--- /dev/null
+++ b/include/uapi/rdma/hbl_user_ioctl_cmds.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_IB_USER_IOCTL_CMDS_H
+#define HBL_IB_USER_IOCTL_CMDS_H
+
+#include <linux/types.h>
+#include <rdma/ib_user_ioctl_cmds.h>
+
+enum hbl_ib_objects {
+ HBL_IB_OBJECT_USR_FIFO = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_OBJECT_SET_PORT_EX,
+ HBL_IB_OBJECT_QUERY_PORT,
+ HBL_IB_OBJECT_RESERVED,
+ HBL_IB_OBJECT_ENCAP,
+};
+
+enum hbl_ib_usr_fifo_obj_methods {
+ HBL_IB_METHOD_USR_FIFO_OBJ_CREATE = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_METHOD_USR_FIFO_OBJ_DESTROY,
+};
+
+enum hbl_ib_usr_fifo_create_attrs {
+ HBL_IB_ATTR_USR_FIFO_CREATE_IN = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_ATTR_USR_FIFO_CREATE_OUT,
+ HBL_IB_ATTR_USR_FIFO_CREATE_HANDLE,
+};
+
+enum hbl_ib_usr_fifo_destroy_attrs {
+ HBL_IB_ATTR_USR_FIFO_DESTROY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum hbl_ib_device_methods {
+ HBL_IB_METHOD_SET_PORT_EX = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_METHOD_QUERY_PORT,
+};
+
+enum hbl_ib_set_port_ex_attrs {
+ HBL_IB_ATTR_SET_PORT_EX_IN = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+enum hbl_ib_query_port_attrs {
+ HBL_IB_ATTR_QUERY_PORT_IN = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_ATTR_QUERY_PORT_OUT,
+};
+
+enum hbl_ib_encap_methods {
+ HBL_IB_METHOD_ENCAP_CREATE = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_METHOD_ENCAP_DESTROY,
+};
+
+enum hbl_ib_encap_create_attrs {
+ HBL_IB_ATTR_ENCAP_CREATE_IN = (1U << UVERBS_ID_NS_SHIFT),
+ HBL_IB_ATTR_ENCAP_CREATE_OUT,
+ HBL_IB_ATTR_ENCAP_CREATE_HANDLE,
+};
+
+enum hbl_ib_encap_destroy_attrs {
+ HBL_IB_ATTR_ENCAP_DESTROY_HANDLE = (1U << UVERBS_ID_NS_SHIFT),
+};
+
+#endif /* HBL_IB_USER_IOCTL_CMDS_H */
diff --git a/include/uapi/rdma/hbl_user_ioctl_verbs.h b/include/uapi/rdma/hbl_user_ioctl_verbs.h
new file mode 100644
index 000000000000..ce896f5c58ba
--- /dev/null
+++ b/include/uapi/rdma/hbl_user_ioctl_verbs.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef HBL_IB_USER_IOCTL_VERBS_H
+#define HBL_IB_USER_IOCTL_VERBS_H
+
+#include <linux/types.h>
+
+#define HBL_IB_MAX_BP_OFFS 16
+
+enum hbl_ib_wq_array_type {
+ HBL_IB_WQ_ARRAY_TYPE_GENERIC,
+ HBL_IB_WQ_ARRAY_TYPE_RESERVED1,
+ HBL_IB_WQ_ARRAY_TYPE_RESERVED2,
+ HBL_IB_WQ_ARRAY_TYPE_RESERVED3,
+ HBL_IB_WQ_ARRAY_TYPE_RESERVED4,
+ HBL_IB_WQ_ARRAY_TYPE_MAX,
+};
+
+struct hbl_wq_array_attr {
+ __u32 max_num_of_wqs;
+ __u32 max_num_of_wqes_in_wq;
+ __u8 mem_id;
+ __u8 swq_granularity;
+ __u8 reserved0[6];
+ __aligned_u64 reserved1[2];
+};
+
+struct hbl_uapi_usr_fifo_create_in {
+ __u32 port_num;
+ __u32 reserved0;
+ __u32 reserved1;
+ __u32 usr_fifo_num_hint;
+ __u8 mode;
+ __u8 reserved2;
+ __u8 reserved3[6];
+};
+
+struct hbl_uapi_usr_fifo_create_out {
+ __aligned_u64 ci_handle;
+ __aligned_u64 regs_handle;
+ __u32 usr_fifo_num;
+ __u32 regs_offset;
+ __u32 size;
+ __u32 bp_thresh;
+};
+
+struct hbl_uapi_set_port_ex_in {
+ struct hbl_wq_array_attr wq_arr_attr[HBL_IB_WQ_ARRAY_TYPE_MAX];
+ /* Pointer to u32 array */
+ __aligned_u64 qp_wq_bp_offs;
+ __u32 qp_wq_bp_offs_cnt;
+ __u32 port_num;
+ __aligned_u64 reserved0;
+ __u32 reserved1;
+ __u8 reserved2;
+ __u8 advanced;
+ __u8 adaptive_timeout_en;
+ __u8 reserved3;
+};
+
+struct hbl_uapi_query_port_in {
+ __u32 port_num;
+ __u32 reserved;
+};
+
+struct hbl_uapi_query_port_out {
+ __u32 max_num_of_qps;
+ __u32 num_allocated_qps;
+ __u32 max_allocated_qp_num;
+ __u32 max_cq_size;
+ __u32 reserved0;
+ __u32 reserved1;
+ __u32 reserved2;
+ __u32 reserved3;
+ __u32 reserved4;
+ __u8 advanced;
+ __u8 max_num_of_cqs;
+ __u8 max_num_of_usr_fifos;
+ __u8 max_num_of_encaps;
+ __u8 nic_macro_idx;
+ __u8 nic_phys_port_idx;
+ __u8 reserved[6];
+};
+
+struct hbl_uapi_encap_create_in {
+ __aligned_u64 tnl_hdr_ptr;
+ __u32 tnl_hdr_size;
+ __u32 port_num;
+ __u32 ipv4_addr;
+ __u16 udp_dst_port;
+ __u16 ip_proto;
+ __u8 encap_type;
+ __u8 reserved[7];
+};
+
+struct hbl_uapi_encap_create_out {
+ __u32 encap_num;
+ __u32 reserved;
+};
+
+#endif /* HBL_IB_USER_IOCTL_VERBS_H */
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index fe15bc7e9f70..016ac5c8fdea 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -255,6 +255,7 @@ enum rdma_driver_id {
RDMA_DRIVER_SIW,
RDMA_DRIVER_ERDMA,
RDMA_DRIVER_MANA,
+ RDMA_DRIVER_HBL,
};
enum ib_uverbs_gid_type {
--
2.34.1
* [PATCH 12/15] RDMA/hbl: direct verbs support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (9 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
` (4 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add direct verbs (DV) uAPI.
The added operations are:
query_port: query vendor-specific port attributes.
set_port_ex: set extended port settings.
usr_fifo: create a user FIFO object for triggering HW doorbells.
encap: configure port encapsulation (UDP/IPv4).
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
drivers/infiniband/hw/hbl/Kconfig | 1 +
drivers/infiniband/hw/hbl/Makefile | 4 +
drivers/infiniband/hw/hbl/hbl_encap.c | 216 +++++++++++++++++
drivers/infiniband/hw/hbl/hbl_main.c | 15 ++
drivers/infiniband/hw/hbl/hbl_query_port.c | 96 ++++++++
drivers/infiniband/hw/hbl/hbl_set_port_ex.c | 96 ++++++++
drivers/infiniband/hw/hbl/hbl_usr_fifo.c | 252 ++++++++++++++++++++
7 files changed, 680 insertions(+)
create mode 100644 drivers/infiniband/hw/hbl/hbl_encap.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_query_port.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_set_port_ex.c
create mode 100644 drivers/infiniband/hw/hbl/hbl_usr_fifo.c
diff --git a/drivers/infiniband/hw/hbl/Kconfig b/drivers/infiniband/hw/hbl/Kconfig
index 90c865a82540..db09ecb3968a 100644
--- a/drivers/infiniband/hw/hbl/Kconfig
+++ b/drivers/infiniband/hw/hbl/Kconfig
@@ -6,6 +6,7 @@
config INFINIBAND_HBL
tristate "HabanaLabs (an Intel Company) InfiniBand driver"
depends on NETDEVICES && ETHERNET && PCI && INET
+ depends on INFINIBAND_USER_ACCESS
select HABANA_CN
help
This driver enables InfiniBand functionality for the network
diff --git a/drivers/infiniband/hw/hbl/Makefile b/drivers/infiniband/hw/hbl/Makefile
index 659d4bbfec0f..86f53ca6b9d5 100644
--- a/drivers/infiniband/hw/hbl/Makefile
+++ b/drivers/infiniband/hw/hbl/Makefile
@@ -6,3 +6,7 @@
obj-$(CONFIG_INFINIBAND_HBL) := habanalabs_ib.o
habanalabs_ib-y += hbl_main.o hbl_verbs.o
+
+habanalabs_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += hbl_encap.o hbl_query_port.o \
+ hbl_set_port_ex.o \
+ hbl_usr_fifo.o
diff --git a/drivers/infiniband/hw/hbl/hbl_encap.c b/drivers/infiniband/hw/hbl/hbl_encap.c
new file mode 100644
index 000000000000..bcc198059664
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_encap.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl.h"
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_addr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/net/intel/cni.h>
+
+#include <rdma/uverbs_ioctl.h>
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+
+#define UVERBS_MODULE_NAME hbl
+#include <rdma/uverbs_named_ioctl.h>
+
+struct hbl_encap {
+ u32 port_num;
+ u32 encap_num;
+};
+
+static int UVERBS_HANDLER(HBL_IB_METHOD_ENCAP_CREATE)(struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_cni_user_encap_alloc_out alloc_encap_out = {};
+ struct hbl_cni_user_encap_alloc_in alloc_encap_in = {};
+ struct hbl_cni_user_encap_unset_in unset_encap_in = {};
+ struct hbl_cni_user_encap_set_in set_encap_in = {};
+ struct hbl_uapi_encap_create_out out = {};
+ struct hbl_uapi_encap_create_in in = {};
+ u32 port, tnl_hdr_size, encap_num;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_encap *encap_pdata;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct ib_uobject *uobj;
+ u64 tnl_hdr_ptr;
+ u8 encap_type;
+ int rc;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ uobj = uverbs_attr_get_uobject(attrs, HBL_IB_ATTR_ENCAP_CREATE_HANDLE);
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = uverbs_copy_from(&in, attrs, HBL_IB_ATTR_ENCAP_CREATE_IN);
+ if (rc)
+ return rc;
+
+ rc = ib_to_hbl_port_num(hdev, in.port_num, &port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", in.port_num);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(port))) {
+ hbl_ibdev_dbg(ibdev, "port %d is not part of the context's ports mask 0x%llx\n",
+ port, hctx->ports_mask);
+ return -EINVAL;
+ }
+
+ encap_pdata = kzalloc(sizeof(*encap_pdata), GFP_KERNEL);
+ if (!encap_pdata)
+ return -ENOMEM;
+
+ encap_type = in.encap_type;
+
+ alloc_encap_in.port = port;
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_ENCAP_ALLOC, &alloc_encap_in,
+ &alloc_encap_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to alloc encap for port %d\n", port);
+ goto err_free_pdata;
+ }
+
+ encap_num = alloc_encap_out.id;
+
+ if (encap_type != HBL_CNI_ENCAP_NONE) {
+ tnl_hdr_ptr = in.tnl_hdr_ptr;
+ tnl_hdr_size = in.tnl_hdr_size;
+ } else {
+ tnl_hdr_ptr = 0;
+ tnl_hdr_size = 0;
+ }
+
+ encap_pdata->port_num = port;
+ encap_pdata->encap_num = encap_num;
+
+ set_encap_in.tnl_hdr_ptr = tnl_hdr_ptr;
+ set_encap_in.tnl_hdr_size = tnl_hdr_size;
+ set_encap_in.port = port;
+ set_encap_in.id = encap_num;
+ set_encap_in.encap_type = encap_type;
+
+ switch (encap_type) {
+ case HBL_CNI_ENCAP_NONE:
+ set_encap_in.ipv4_addr = in.ipv4_addr;
+ break;
+ case HBL_CNI_ENCAP_OVER_UDP:
+ set_encap_in.udp_dst_port = in.udp_dst_port;
+ break;
+ case HBL_CNI_ENCAP_OVER_IPV4:
+ set_encap_in.ip_proto = in.ip_proto;
+ break;
+ default:
+ break;
+ }
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_ENCAP_SET, &set_encap_in,
+ NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set encap for port %d\n", port);
+ goto err_encap_unset;
+ }
+
+ out.encap_num = encap_num;
+ uobj->object = encap_pdata;
+
+ rc = uverbs_copy_to_struct_or_zero(attrs, HBL_IB_ATTR_ENCAP_CREATE_OUT, &out, sizeof(out));
+ if (rc)
+ goto err_encap_unset;
+
+ return 0;
+
+err_encap_unset:
+ unset_encap_in.port = port;
+ unset_encap_in.id = encap_num;
+
+ if (aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_ENCAP_UNSET, &unset_encap_in,
+ NULL))
+ hbl_ibdev_dbg(ibdev, "failed to unset encap for port %d, encap_num %d\n", port,
+ encap_num);
+
+err_free_pdata:
+ kfree(encap_pdata);
+
+ return rc;
+}
+
+static int hbl_free_encap(struct ib_uobject *uobject, enum rdma_remove_reason why,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_cni_user_encap_unset_in unset_encap_in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_encap *encap_pdata;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ int rc;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ encap_pdata = uobject->object;
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ unset_encap_in.port = encap_pdata->port_num;
+ unset_encap_in.id = encap_pdata->encap_num;
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_ENCAP_UNSET,
+ &unset_encap_in, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to unset encap for port %d, id %d\n",
+ unset_encap_in.port, unset_encap_in.id);
+ return rc;
+ }
+ }
+
+ kfree(encap_pdata);
+
+ return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ HBL_IB_METHOD_ENCAP_CREATE,
+ UVERBS_ATTR_IDR(HBL_IB_ATTR_ENCAP_CREATE_HANDLE,
+ HBL_IB_OBJECT_ENCAP,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(HBL_IB_ATTR_ENCAP_CREATE_IN,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_encap_create_in, reserved),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(HBL_IB_ATTR_ENCAP_CREATE_OUT,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_encap_create_out, reserved),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+ HBL_IB_METHOD_ENCAP_DESTROY,
+ UVERBS_ATTR_IDR(HBL_IB_ATTR_ENCAP_DESTROY_HANDLE,
+ HBL_IB_OBJECT_ENCAP,
+ UVERBS_ACCESS_DESTROY,
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(HBL_IB_OBJECT_ENCAP,
+ UVERBS_TYPE_ALLOC_IDR(hbl_free_encap),
+ &UVERBS_METHOD(HBL_IB_METHOD_ENCAP_CREATE),
+ &UVERBS_METHOD(HBL_IB_METHOD_ENCAP_DESTROY));
+
+const struct uapi_definition hbl_encap_defs[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(HBL_IB_OBJECT_ENCAP),
+ {},
+};
diff --git a/drivers/infiniband/hw/hbl/hbl_main.c b/drivers/infiniband/hw/hbl/hbl_main.c
index 98d3ed46bfe2..6bee35695a06 100644
--- a/drivers/infiniband/hw/hbl/hbl_main.c
+++ b/drivers/infiniband/hw/hbl/hbl_main.c
@@ -22,6 +22,16 @@ MODULE_LICENSE("GPL");
#define MTU_DEFAULT SZ_4K
+static const struct uapi_definition hbl_defs[] = {
+#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
+ UAPI_DEF_CHAIN(hbl_usr_fifo_defs),
+ UAPI_DEF_CHAIN(hbl_set_port_ex_defs),
+ UAPI_DEF_CHAIN(hbl_query_port_defs),
+ UAPI_DEF_CHAIN(hbl_encap_defs),
+#endif
+ {}
+};
+
static void hbl_ib_port_event(struct ib_device *ibdev, u32 port_num, enum ib_event_type reason)
{
struct ib_event event;
@@ -166,6 +176,11 @@ static int hbl_ib_dev_init(struct hbl_ib_device *hdev)
ib_set_device_ops(ibdev, &hbl_ib_dev_ops);
+ if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS))
+ ibdev->driver_def = hbl_defs;
+ else
+ dev_info(hdev->dev, "IB user access is disabled\n");
+
/* The CN driver might start calling the aux functions after registering the device so set
* the callbacks here.
*/
diff --git a/drivers/infiniband/hw/hbl/hbl_query_port.c b/drivers/infiniband/hw/hbl/hbl_query_port.c
new file mode 100644
index 000000000000..1882507b0b58
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_query_port.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl.h"
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_addr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/net/intel/cni.h>
+
+#include <rdma/uverbs_ioctl.h>
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+#include <uapi/rdma/hbl_user_ioctl_verbs.h>
+
+#define UVERBS_MODULE_NAME hbl
+#include <rdma/uverbs_named_ioctl.h>
+
+static int UVERBS_HANDLER(HBL_IB_METHOD_QUERY_PORT)(struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_cni_get_user_app_params_out app_params_out = {};
+ struct hbl_cni_get_user_app_params_in app_params_in = {};
+ struct hbl_uapi_query_port_out out = {};
+ struct hbl_uapi_query_port_in in = {};
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ int rc;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = uverbs_copy_from(&in, attrs, HBL_IB_ATTR_QUERY_PORT_IN);
+ if (rc)
+ return rc;
+
+ rc = ib_to_hbl_port_num(hdev, in.port_num, &app_params_in.port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", in.port_num);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(app_params_in.port))) {
+ hbl_ibdev_dbg(ibdev, "port %d is not part of the context's ports mask 0x%llx\n",
+ app_params_in.port, hctx->ports_mask);
+ return -EINVAL;
+ }
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_GET_USER_APP_PARAMS,
+ &app_params_in, &app_params_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to get user params for port %d\n", app_params_in.port);
+ return rc;
+ }
+
+ out.max_num_of_qps = app_params_out.max_num_of_qps;
+ out.num_allocated_qps = app_params_out.num_allocated_qps;
+ out.max_allocated_qp_num = app_params_out.max_allocated_qp_idx;
+ out.max_cq_size = app_params_out.max_cq_size;
+ out.advanced = app_params_out.advanced;
+ out.max_num_of_cqs = app_params_out.max_num_of_cqs;
+ out.max_num_of_usr_fifos = app_params_out.max_num_of_db_fifos;
+ out.max_num_of_encaps = app_params_out.max_num_of_encaps;
+ out.nic_macro_idx = app_params_out.nic_macro_idx;
+ out.nic_phys_port_idx = app_params_out.nic_phys_port_idx;
+
+ rc = uverbs_copy_to_struct_or_zero(attrs, HBL_IB_ATTR_QUERY_PORT_OUT, &out, sizeof(out));
+
+ return rc;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ HBL_IB_METHOD_QUERY_PORT,
+ UVERBS_ATTR_PTR_IN(HBL_IB_ATTR_QUERY_PORT_IN,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_query_port_in, reserved),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(HBL_IB_ATTR_QUERY_PORT_OUT,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_query_port_out, reserved),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_GLOBAL_METHODS(HBL_IB_OBJECT_QUERY_PORT, &UVERBS_METHOD(HBL_IB_METHOD_QUERY_PORT));
+
+const struct uapi_definition hbl_query_port_defs[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(HBL_IB_OBJECT_QUERY_PORT),
+ {},
+};
diff --git a/drivers/infiniband/hw/hbl/hbl_set_port_ex.c b/drivers/infiniband/hw/hbl/hbl_set_port_ex.c
new file mode 100644
index 000000000000..15b509506cca
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_set_port_ex.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl.h"
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_addr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/net/intel/cni.h>
+
+#include <rdma/uverbs_ioctl.h>
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+#include <uapi/rdma/hbl_user_ioctl_verbs.h>
+
+#define UVERBS_MODULE_NAME hbl
+#include <rdma/uverbs_named_ioctl.h>
+
+static int UVERBS_HANDLER(HBL_IB_METHOD_SET_PORT_EX)(struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_ib_port_init_params port_init_params = {};
+ struct hbl_uapi_set_port_ex_in in = {};
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ u32 hbl_port;
+ int rc, i;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+
+ rc = uverbs_copy_from(&in, attrs, HBL_IB_ATTR_SET_PORT_EX_IN);
+ if (rc)
+ return rc;
+
+ rc = ib_to_hbl_port_num(hdev, in.port_num, &hbl_port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", in.port_num);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(hbl_port))) {
+ hbl_ibdev_dbg(ibdev, "port %d is not part of the context's ports mask 0x%llx\n",
+ hbl_port, hctx->ports_mask);
+ return -EINVAL;
+ }
+
+ if (!in.qp_wq_bp_offs && in.qp_wq_bp_offs_cnt > 0)
+ return -EINVAL;
+
+ port_init_params.hbl_port_num = hbl_port;
+
+ for (i = 0; i < HBL_IB_WQ_ARRAY_TYPE_MAX; i++) {
+ port_init_params.wq_arr_attr[i].max_num_of_wqs =
+ in.wq_arr_attr[i].max_num_of_wqs;
+ port_init_params.wq_arr_attr[i].max_num_of_wqes_in_wq =
+ in.wq_arr_attr[i].max_num_of_wqes_in_wq;
+ port_init_params.wq_arr_attr[i].mem_id = in.wq_arr_attr[i].mem_id;
+ port_init_params.wq_arr_attr[i].swq_granularity =
+ in.wq_arr_attr[i].swq_granularity;
+ }
+
+ if (copy_from_user(port_init_params.qp_wq_bp_offs, u64_to_user_ptr(in.qp_wq_bp_offs),
+ sizeof(port_init_params.qp_wq_bp_offs[0]) *
+ min((u32)HBL_IB_MAX_BP_OFFS, in.qp_wq_bp_offs_cnt)))
+ return -EFAULT;
+
+ port_init_params.advanced = in.advanced;
+ port_init_params.adaptive_timeout_en = in.adaptive_timeout_en;
+
+ rc = hbl_ib_port_init(hctx, &port_init_params);
+ if (rc)
+ hbl_ibdev_dbg(ibdev, "failed(%d) to set port %u extended params\n", rc, hbl_port);
+
+ return rc;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ HBL_IB_METHOD_SET_PORT_EX,
+ UVERBS_ATTR_PTR_IN(HBL_IB_ATTR_SET_PORT_EX_IN,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_set_port_ex_in, reserved3),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_GLOBAL_METHODS(HBL_IB_OBJECT_SET_PORT_EX,
+ &UVERBS_METHOD(HBL_IB_METHOD_SET_PORT_EX));
+
+const struct uapi_definition hbl_set_port_ex_defs[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(HBL_IB_OBJECT_SET_PORT_EX),
+ {},
+};
diff --git a/drivers/infiniband/hw/hbl/hbl_usr_fifo.c b/drivers/infiniband/hw/hbl/hbl_usr_fifo.c
new file mode 100644
index 000000000000..37f387a99f40
--- /dev/null
+++ b/drivers/infiniband/hw/hbl/hbl_usr_fifo.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "hbl.h"
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_addr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <linux/net/intel/cni.h>
+
+#include <rdma/uverbs_ioctl.h>
+#include <uapi/rdma/hbl_user_ioctl_cmds.h>
+#include <uapi/rdma/hbl_user_ioctl_verbs.h>
+
+#define UVERBS_MODULE_NAME hbl
+#include <rdma/uverbs_named_ioctl.h>
+
+/**
+ * struct hbl_usr_fifo - This structure will be stored inside the uobject.
+ * @ci_entry: The rdma_user_mmap_entry for the mapped ci.
+ * @regs_entry: The rdma_user_mmap_entry for the mapped registers.
+ * @port: The port of this fifo.
+ * @id: The id of this fifo.
+ */
+struct hbl_usr_fifo {
+ struct rdma_user_mmap_entry *ci_entry;
+ struct rdma_user_mmap_entry *regs_entry;
+ u32 port;
+ u32 id;
+};
+
+static void user_fifo_mmap_entry_remove(struct hbl_usr_fifo *usr_fifo)
+{
+ rdma_user_mmap_entry_remove(usr_fifo->regs_entry);
+ if (usr_fifo->ci_entry)
+ rdma_user_mmap_entry_remove(usr_fifo->ci_entry);
+}
+
+static int user_fifo_mmap_entry_setup(struct hbl_ib_device *dev, struct hbl_ib_ucontext *hctx,
+ struct hbl_usr_fifo *usr_fifo,
+ struct hbl_uapi_usr_fifo_create_out *out)
+{
+ if (out->ci_handle) {
+ usr_fifo->ci_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext,
+ out->ci_handle,
+ PAGE_SIZE, &out->ci_handle);
+ if (IS_ERR(usr_fifo->ci_entry))
+ return PTR_ERR(usr_fifo->ci_entry);
+ }
+
+ usr_fifo->regs_entry = hbl_ib_user_mmap_entry_insert(&hctx->ibucontext, out->regs_handle,
+ PAGE_SIZE, &out->regs_handle);
+ if (IS_ERR(usr_fifo->regs_entry))
+ goto err_free_ci;
+
+ return 0;
+
+err_free_ci:
+ if (usr_fifo->ci_entry)
+ rdma_user_mmap_entry_remove(usr_fifo->ci_entry);
+
+ return PTR_ERR(usr_fifo->regs_entry);
+}
+
+static int UVERBS_HANDLER(HBL_IB_METHOD_USR_FIFO_OBJ_CREATE)(struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_cni_alloc_user_db_fifo_out alloc_db_fifo_out = {};
+ struct hbl_cni_alloc_user_db_fifo_in alloc_db_fifo_in = {};
+ struct hbl_cni_user_db_fifo_unset_in db_fifo_unset_in = {};
+ struct hbl_cni_user_db_fifo_set_out db_fifo_set_out = {};
+ struct hbl_cni_user_db_fifo_set_in db_fifo_set_in = {};
+ struct hbl_uapi_usr_fifo_create_out out = {};
+ struct hbl_uapi_usr_fifo_create_in in = {};
+ struct hbl_usr_fifo *usr_fifo_pdata;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ struct ib_uobject *uobj;
+ u32 port, id;
+ u8 mode;
+ int rc;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ uobj = uverbs_attr_get_uobject(attrs, HBL_IB_ATTR_USR_FIFO_CREATE_HANDLE);
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ rc = uverbs_copy_from(&in, attrs, HBL_IB_ATTR_USR_FIFO_CREATE_IN);
+ if (rc)
+ return rc;
+
+ rc = ib_to_hbl_port_num(hdev, in.port_num, &port);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "invalid IB port %u\n", in.port_num);
+ return rc;
+ }
+
+ if (!(hctx->ports_mask & BIT(port))) {
+ hbl_ibdev_dbg(ibdev, "port %d is not part of the context's ports mask 0x%llx\n",
+ port, hctx->ports_mask);
+ return -EINVAL;
+ }
+
+ usr_fifo_pdata = kzalloc(sizeof(*usr_fifo_pdata), GFP_KERNEL);
+ if (!usr_fifo_pdata)
+ return -ENOMEM;
+
+ mode = in.mode;
+
+ alloc_db_fifo_in.port = port;
+ alloc_db_fifo_in.id_hint = in.usr_fifo_num_hint;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_ALLOC_USER_DB_FIFO,
+ &alloc_db_fifo_in, &alloc_db_fifo_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to alloc db fifo, port %d\n", port);
+ goto err_free_pdata;
+ }
+
+ id = alloc_db_fifo_out.id;
+
+ usr_fifo_pdata->port = port;
+ usr_fifo_pdata->id = id;
+
+ db_fifo_set_in.port = port;
+ db_fifo_set_in.id = id;
+ db_fifo_set_in.mode = mode;
+
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_DB_FIFO_SET, &db_fifo_set_in,
+ &db_fifo_set_out);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to set db fifo %d, port %d\n", id, port);
+ goto err_usr_fifo_unset;
+ }
+
+ out.usr_fifo_num = id;
+ out.ci_handle = db_fifo_set_out.ci_handle;
+ out.regs_handle = db_fifo_set_out.regs_handle;
+ out.regs_offset = db_fifo_set_out.regs_offset;
+ out.size = db_fifo_set_out.fifo_size;
+ out.bp_thresh = db_fifo_set_out.fifo_bp_thresh;
+
+ rc = user_fifo_mmap_entry_setup(hdev, hctx, usr_fifo_pdata, &out);
+ if (rc)
+ goto err_usr_fifo_unset;
+
+ uobj->object = usr_fifo_pdata;
+
+ rc = uverbs_copy_to_struct_or_zero(attrs, HBL_IB_ATTR_USR_FIFO_CREATE_OUT, &out,
+ sizeof(out));
+ if (rc)
+ goto err_remove_mmap_entries;
+
+ return 0;
+
+err_remove_mmap_entries:
+ user_fifo_mmap_entry_remove(usr_fifo_pdata);
+
+err_usr_fifo_unset:
+ db_fifo_unset_in.port = port;
+ db_fifo_unset_in.id = id;
+
+ if (aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_DB_FIFO_UNSET,
+ &db_fifo_unset_in, NULL))
+ hbl_ibdev_dbg(ibdev, "failed to unset db fifo %d, port %d\n", id, port);
+
+err_free_pdata:
+ kfree(usr_fifo_pdata);
+ return rc;
+}
+
+static int hbl_free_usr_fifo(struct ib_uobject *uobject, enum rdma_remove_reason why,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct hbl_cni_user_db_fifo_unset_in db_fifo_unset_in = {};
+ struct hbl_usr_fifo *usr_fifo_pdata;
+ struct hbl_ib_aux_ops *aux_ops;
+ struct hbl_ib_ucontext *hctx;
+ struct hbl_aux_dev *aux_dev;
+ struct hbl_ib_device *hdev;
+ struct ib_device *ibdev;
+ int rc;
+
+ hctx = to_hbl_ib_ucontext(ib_uverbs_get_ucontext(attrs));
+ if (IS_ERR(hctx))
+ return PTR_ERR(hctx);
+
+ usr_fifo_pdata = uobject->object;
+ hdev = to_hbl_ib_dev(hctx->ibucontext.device);
+ ibdev = &hdev->ibdev;
+ aux_dev = hdev->aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ db_fifo_unset_in.port = usr_fifo_pdata->port;
+ db_fifo_unset_in.id = usr_fifo_pdata->id;
+
+ user_fifo_mmap_entry_remove(usr_fifo_pdata);
+
+ if (aux_ops->device_operational(aux_dev)) {
+ rc = aux_ops->cmd_ctrl(aux_dev, hctx->cn_ctx, HBL_CNI_OP_USER_DB_FIFO_UNSET,
+ &db_fifo_unset_in, NULL);
+ if (rc) {
+ hbl_ibdev_dbg(ibdev, "failed to unset db fifo %d, port %d\n",
+ usr_fifo_pdata->id, usr_fifo_pdata->port);
+ return rc;
+ }
+ }
+
+ kfree(usr_fifo_pdata);
+
+ return 0;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ HBL_IB_METHOD_USR_FIFO_OBJ_CREATE,
+ UVERBS_ATTR_IDR(HBL_IB_ATTR_USR_FIFO_CREATE_HANDLE,
+ HBL_IB_OBJECT_USR_FIFO,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(HBL_IB_ATTR_USR_FIFO_CREATE_IN,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_usr_fifo_create_in, reserved3),
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(HBL_IB_ATTR_USR_FIFO_CREATE_OUT,
+ UVERBS_ATTR_STRUCT(struct hbl_uapi_usr_fifo_create_out, bp_thresh),
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD_DESTROY(
+ HBL_IB_METHOD_USR_FIFO_OBJ_DESTROY,
+ UVERBS_ATTR_IDR(HBL_IB_ATTR_USR_FIFO_DESTROY_HANDLE,
+ HBL_IB_OBJECT_USR_FIFO,
+ UVERBS_ACCESS_DESTROY,
+ UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_OBJECT(HBL_IB_OBJECT_USR_FIFO,
+ UVERBS_TYPE_ALLOC_IDR(hbl_free_usr_fifo),
+ &UVERBS_METHOD(HBL_IB_METHOD_USR_FIFO_OBJ_CREATE),
+ &UVERBS_METHOD(HBL_IB_METHOD_USR_FIFO_OBJ_DESTROY));
+
+const struct uapi_definition hbl_usr_fifo_defs[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(HBL_IB_OBJECT_USR_FIFO),
+ {},
+};
--
2.34.1
^ permalink raw reply related [flat|nested] 107+ messages in thread
* [PATCH 13/15] accel/habanalabs: network scaling support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (10 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 12/15] RDMA/hbl: direct verbs support Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-19 18:41 ` Sunil Kovvuri Goutham
2024-06-13 8:22 ` [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files Omer Shpigelman
` (3 subsequent siblings)
15 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add common support for AI scaling over the network.
Initialize the hbl_cn driver via the auxiliary bus and serve as its adapter
for accessing the device.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
drivers/accel/habanalabs/Kconfig | 1 +
drivers/accel/habanalabs/Makefile | 3 +
drivers/accel/habanalabs/cn/Makefile | 2 +
drivers/accel/habanalabs/cn/cn.c | 815 ++++++++++++++++++
drivers/accel/habanalabs/cn/cn.h | 133 +++
.../habanalabs/common/command_submission.c | 2 +-
drivers/accel/habanalabs/common/device.c | 23 +
drivers/accel/habanalabs/common/firmware_if.c | 20 +
drivers/accel/habanalabs/common/habanalabs.h | 43 +-
.../accel/habanalabs/common/habanalabs_drv.c | 37 +-
.../habanalabs/common/habanalabs_ioctl.c | 2 +
drivers/accel/habanalabs/common/memory.c | 123 +++
drivers/accel/habanalabs/gaudi/gaudi.c | 14 +-
drivers/accel/habanalabs/gaudi2/gaudi2.c | 14 +-
.../accel/habanalabs/gaudi2/gaudi2_security.c | 16 +-
drivers/accel/habanalabs/goya/goya.c | 6 +
.../include/hw_ip/nic/nic_general.h | 15 +
include/uapi/drm/habanalabs_accel.h | 10 +-
18 files changed, 1251 insertions(+), 28 deletions(-)
create mode 100644 drivers/accel/habanalabs/cn/Makefile
create mode 100644 drivers/accel/habanalabs/cn/cn.c
create mode 100644 drivers/accel/habanalabs/cn/cn.h
create mode 100644 drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h
diff --git a/drivers/accel/habanalabs/Kconfig b/drivers/accel/habanalabs/Kconfig
index be85336107f9..dcc8d294b761 100644
--- a/drivers/accel/habanalabs/Kconfig
+++ b/drivers/accel/habanalabs/Kconfig
@@ -13,6 +13,7 @@ config DRM_ACCEL_HABANALABS
select DMA_SHARED_BUFFER
select CRC32
select FW_LOADER
+ select AUXILIARY_BUS
help
Enables PCIe card driver for Habana's AI Processors (AIP) that are
designed to accelerate Deep Learning inference and training workloads.
diff --git a/drivers/accel/habanalabs/Makefile b/drivers/accel/habanalabs/Makefile
index 98510cdd5066..37e216689d1a 100644
--- a/drivers/accel/habanalabs/Makefile
+++ b/drivers/accel/habanalabs/Makefile
@@ -8,6 +8,9 @@ obj-$(CONFIG_DRM_ACCEL_HABANALABS) := habanalabs.o
include $(src)/common/Makefile
habanalabs-y += $(HL_COMMON_FILES)
+include $(src)/cn/Makefile
+habanalabs-y += $(HL_CN_FILES)
+
include $(src)/gaudi2/Makefile
habanalabs-y += $(HL_GAUDI2_FILES)
diff --git a/drivers/accel/habanalabs/cn/Makefile b/drivers/accel/habanalabs/cn/Makefile
new file mode 100644
index 000000000000..b90d5a93a632
--- /dev/null
+++ b/drivers/accel/habanalabs/cn/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+HL_CN_FILES := cn/cn.o
diff --git a/drivers/accel/habanalabs/cn/cn.c b/drivers/accel/habanalabs/cn/cn.c
new file mode 100644
index 000000000000..29e4c2292391
--- /dev/null
+++ b/drivers/accel/habanalabs/cn/cn.c
@@ -0,0 +1,815 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "cn.h"
+
+#include "../common/habanalabs.h"
+#include <linux/file.h>
+
+static int hl_cn_send_empty_status(struct hl_device *hdev, int port)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct cpucp_nic_status status = {};
+ struct hl_cn_properties *cn_props;
+ struct cpucp_nic_status_packet *pkt;
+ size_t total_pkt_size, data_size;
+ u64 result;
+ int rc;
+
+ cn_props = &hdev->asic_prop.cn_props;
+ data_size = cn_props->status_packet_size;
+
+ total_pkt_size = sizeof(struct cpucp_nic_status_packet) + data_size;
+
+ /* data should be aligned to 8 bytes for CPU-CP to be able to copy it */
+ total_pkt_size = (total_pkt_size + 0x7) & ~0x7;
+
+ /* total_pkt_size is cast to u16 later on */
+ if (total_pkt_size > USHRT_MAX) {
+ dev_err(hdev->dev, "NIC status data is too big\n");
+ rc = -EINVAL;
+ goto out;
+ }
+
+ pkt = kzalloc(total_pkt_size, GFP_KERNEL);
+ if (!pkt) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ status.port = cpu_to_le32(port);
+ status.up = false;
+
+ pkt->length = cpu_to_le32(data_size / sizeof(u32));
+ memcpy(&pkt->data, &status, data_size);
+
+ pkt->cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_STATUS << CPUCP_PKT_CTL_OPCODE_SHIFT);
+
+ rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) pkt, total_pkt_size, 0, &result);
+
+ if (rc)
+ dev_err(hdev->dev, "failed to send NIC status, port %d\n", port);
+
+ kfree(pkt);
+out:
+ cn_funcs->port_funcs->post_send_status(hdev, port);
+
+ return rc;
+}
+
+static bool hl_cn_device_operational(struct hbl_aux_dev *aux_dev)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hl_device_operational(hdev, NULL);
+}
+
+static void hl_cn_hw_access_lock(struct hbl_aux_dev *aux_dev)
+ __acquires(&hdev->cn.hw_access_lock)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ mutex_lock(&hdev->cn.hw_access_lock);
+}
+
+static void hl_cn_hw_access_unlock(struct hbl_aux_dev *aux_dev)
+ __releases(&hdev->cn.hw_access_lock)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ mutex_unlock(&hdev->cn.hw_access_lock);
+}
+
+static void hl_cn_device_reset(struct hbl_aux_dev *aux_dev)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ hl_device_reset(hdev, HL_DRV_RESET_HARD);
+}
+
+void *hl_cn_dma_alloc_coherent(struct hbl_aux_dev *aux_dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hl_asic_dma_alloc_coherent(hdev, size, dma_handle, flag);
+}
+
+void hl_cn_dma_free_coherent(struct hbl_aux_dev *aux_dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ hl_asic_dma_free_coherent(hdev, size, cpu_addr, dma_handle);
+}
+
+void *hl_cn_dma_pool_zalloc(struct hbl_aux_dev *aux_dev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hl_asic_dma_pool_zalloc(hdev, size, mem_flags, dma_handle);
+}
+
+void hl_cn_dma_pool_free(struct hbl_aux_dev *aux_dev, void *vaddr, dma_addr_t dma_addr)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ hl_asic_dma_pool_free(hdev, vaddr, dma_addr);
+}
+
+static int hl_cn_vm_dev_mmu_map(struct hbl_aux_dev *aux_dev, u64 vm_handle,
+ enum hbl_cn_mem_type mem_type, u64 addr, u64 dva, size_t size)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+
+ return hl_map_vmalloc_range(cn->ctx, addr, dva, size);
+}
+
+static void hl_cn_vm_dev_mmu_unmap(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 dva, size_t size)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ int rc;
+
+ rc = hl_unmap_vmalloc_range(cn->ctx, dva);
+ if (rc)
+ dev_crit(hdev->dev, "Failed to unmap dva 0x%llx with size 0x%lx, err %d\n", dva,
+ size, rc);
+}
+
+static int hl_cn_vm_reserve_dva_block(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 size,
+ u64 *dva)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ u64 addr;
+
+ addr = hl_reserve_va_block(hdev, cn->ctx, HL_VA_RANGE_TYPE_HOST, size, PAGE_SIZE);
+ if (!addr)
+ return -ENOMEM;
+
+ *dva = addr;
+
+ return 0;
+}
+
+static void hl_cn_vm_unreserve_dva_block(struct hbl_aux_dev *aux_dev, u64 vm_handle, u64 dva,
+ u64 size)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ hl_unreserve_va_block(hdev, cn->ctx, dva, size);
+}
+
+void hl_cn_spmu_get_stats_info(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_stat **stats,
+ u32 *n_stats)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hl_cn_port_funcs *port_funcs = hdev->asic_funcs->cn_funcs->port_funcs;
+
+ port_funcs->spmu_get_stats_info(hdev, port, stats, n_stats);
+}
+
+int hl_cn_spmu_config(struct hbl_aux_dev *aux_dev, u32 port, u32 num_event_types, u32 event_types[],
+ bool enable)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hl_cn_port_funcs *port_funcs = hdev->asic_funcs->cn_funcs->port_funcs;
+
+ return port_funcs->spmu_config(hdev, port, num_event_types, event_types, enable);
+}
+
+int hl_cn_spmu_sample(struct hbl_aux_dev *aux_dev, u32 port, u32 num_out_data, u64 out_data[])
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hl_cn_port_funcs *port_funcs = hdev->asic_funcs->cn_funcs->port_funcs;
+
+ return port_funcs->spmu_sample(hdev, port, num_out_data, out_data);
+}
+
+int hl_cn_poll_reg(struct hbl_aux_dev *aux_dev, u32 reg, u64 timeout_us, hbl_cn_poll_cond_func func,
+ void *arg)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ u32 val;
+
+ return hl_poll_timeout(hdev, reg, val, func(val, arg), 1000, timeout_us);
+}
+
+int hl_cn_send_cpu_message(struct hbl_aux_dev *aux_dev, u32 *msg, u16 len, u32 timeout, u64 *result)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hdev->asic_funcs->send_cpu_message(hdev, msg, len, timeout, result);
+}
+
+void hl_cn_post_send_status(struct hbl_aux_dev *aux_dev, u32 port)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hl_cn_port_funcs *port_funcs = hdev->asic_funcs->cn_funcs->port_funcs;
+
+ port_funcs->post_send_status(hdev, port);
+}
+
+static u32 hl_cn_dram_readl(struct hbl_aux_dev *aux_dev, u64 addr)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ u64 val = 0;
+ int rc;
+
+ rc = hdev->asic_funcs->access_dev_mem(hdev, PCI_REGION_DRAM, addr, &val, DEBUGFS_READ32);
+ if (rc)
+ dev_crit(hdev->dev, "Failed to readl from dev_mem addr 0x%llx\n", addr);
+
+ return val;
+}
+
+static void hl_cn_dram_writel(struct hbl_aux_dev *aux_dev, u32 val, u64 addr)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ u64 data = val;
+ int rc;
+
+ rc = hdev->asic_funcs->access_dev_mem(hdev, PCI_REGION_DRAM, addr, &data, DEBUGFS_WRITE32);
+ if (rc)
+ dev_crit(hdev->dev, "Failed to writel to dev_mem addr 0x%llx\n", addr);
+}
+
+static u32 hl_cn_rreg(struct hbl_aux_dev *aux_dev, u32 reg)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hdev->asic_funcs->rreg(hdev, reg);
+}
+
+static void hl_cn_wreg(struct hbl_aux_dev *aux_dev, u32 reg, u32 val)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hdev->asic_funcs->wreg(hdev, reg, val);
+}
+
+static int hl_cn_get_reg_pcie_addr(struct hbl_aux_dev *aux_dev, u32 reg, u64 *pci_addr)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+
+ return hdev->asic_funcs->get_reg_pcie_addr(hdev, reg, pci_addr);
+}
+
+static int hl_cn_register_cn_user_context(struct hbl_aux_dev *aux_dev, int user_fd,
+ const void *cn_ctx, u64 *comp_handle, u64 *vm_handle)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct drm_file *file_priv;
+ struct hl_fpriv *hpriv;
+ struct file *file;
+ int rc = 0;
+
+ if (atomic_cmpxchg(&cn->ctx_registered, 0, 1)) {
+ dev_dbg(hdev->dev, "user context is already registered\n");
+ return -EBUSY;
+ }
+
+ /* The CN driver can independently manage its resources and context.
+ * However, for HL devices, the corresponding HW resources can also be managed by the
+ * compute side. To avoid contention between them (e.g. on abrupt application close),
+ * enforce orderly FD closure. This ensures that CN teardown runs first, followed by
+ * compute fini.
+ */
+ file = fget(user_fd);
+ if (!file || !hl_check_fd(file)) {
+ rc = -EBADF;
+ goto file_err;
+ }
+
+ mutex_lock(&hdev->fpriv_list_lock);
+
+ if (list_empty(&hdev->fpriv_list)) {
+ dev_dbg(hdev->dev, "no open user context\n");
+ rc = -ESRCH;
+ goto open_ctx_err;
+ }
+
+ /* The list should contain a single element as currently only a single user context is
+ * allowed. Therefore get the first entry.
+ */
+ hpriv = list_first_entry(&hdev->fpriv_list, struct hl_fpriv, dev_node);
+
+ file_priv = file->private_data;
+ if (hpriv != file_priv->driver_priv) {
+ dev_dbg(hdev->dev, "user FD mismatch\n");
+ rc = -EINVAL;
+ goto fd_mismatch_err;
+ }
+
+ mutex_unlock(&hdev->fpriv_list_lock);
+
+ /* these must have different values to allow data transfer */
+ *comp_handle = 0;
+ *vm_handle = 1;
+
+ return 0;
+
+fd_mismatch_err:
+open_ctx_err:
+ mutex_unlock(&hdev->fpriv_list_lock);
+ fput(file);
+file_err:
+ atomic_set(&cn->ctx_registered, 0);
+
+ return rc;
+}
+
+static void hl_cn_deregister_cn_user_context(struct hbl_aux_dev *aux_dev, u64 vm_handle)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hl_fpriv *hpriv;
+ struct file *file;
+
+ mutex_lock(&hdev->fpriv_list_lock);
+ hpriv = list_first_entry(&hdev->fpriv_list, struct hl_fpriv, dev_node);
+ mutex_unlock(&hdev->fpriv_list_lock);
+
+ file = hpriv->file_priv->filp;
+
+ /* We can assert here that all CN resources which might have dependency on compute side are
+ * already released. Hence, release reference to compute file.
+ */
+ fput(file);
+
+ atomic_set(&cn->ctx_registered, 0);
+}
+
+static int hl_cn_vm_create(struct hbl_aux_dev *aux_dev, u64 comp_handle, u32 flags, u64 *vm_handle)
+{
+ *vm_handle = 0;
+
+ return 0;
+}
+
+static void hl_cn_vm_destroy(struct hbl_aux_dev *aux_dev, u64 vm_handle)
+{
+}
+
+static int hl_cn_get_vm_info(struct hbl_aux_dev *aux_dev, u64 vm_handle,
+ struct hbl_cn_vm_info *vm_info)
+{
+ vm_info->mmu_mode = HBL_CN_MMU_MODE_EXTERNAL;
+ vm_info->ext_mmu.work_id = 1;
+
+ return 0;
+}
+
+static void hl_cn_get_cpucp_info(struct hbl_aux_dev *aux_dev,
+ struct hbl_cn_cpucp_info *hl_cn_cpucp_info)
+{
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+ struct hl_device *hdev = container_of(cn, struct hl_device, cn);
+ struct hbl_cn_cpucp_info *cn_cpucp_info;
+
+ cn_cpucp_info = &hdev->asic_prop.cn_props.cpucp_info;
+
+ memcpy(hl_cn_cpucp_info, cn_cpucp_info, sizeof(*cn_cpucp_info));
+}
+
+static void hl_cn_cpucp_info_le_to_cpu(struct cpucp_nic_info *cpucp_nic_info,
+ struct hbl_cn_cpucp_info *hbl_cn_cpucp_info)
+{
+ int i;
+
+ for (i = 0 ; i < CPUCP_MAX_NICS ; i++) {
+ memcpy(&hbl_cn_cpucp_info->mac_addrs[i], &cpucp_nic_info->mac_addrs[i],
+ sizeof(cpucp_nic_info->mac_addrs[i]));
+ hbl_cn_cpucp_info->tx_swap_map[i] = le16_to_cpu(cpucp_nic_info->tx_swap_map[i]);
+ }
+
+ for (i = 0 ; i < CPUCP_NIC_MASK_ARR_LEN ; i++) {
+ hbl_cn_cpucp_info->link_mask[i] = le64_to_cpu(cpucp_nic_info->link_mask[i]);
+ hbl_cn_cpucp_info->link_ext_mask[i] = le64_to_cpu(cpucp_nic_info->link_ext_mask[i]);
+ hbl_cn_cpucp_info->auto_neg_mask[i] = le64_to_cpu(cpucp_nic_info->auto_neg_mask[i]);
+ }
+
+ for (i = 0 ; i < CPUCP_NIC_POLARITY_ARR_LEN ; i++) {
+ hbl_cn_cpucp_info->pol_tx_mask[i] = le64_to_cpu(cpucp_nic_info->pol_tx_mask[i]);
+ hbl_cn_cpucp_info->pol_rx_mask[i] = le64_to_cpu(cpucp_nic_info->pol_rx_mask[i]);
+ }
+
+ hbl_cn_cpucp_info->serdes_type = (enum cpucp_serdes_type)
+ le16_to_cpu(cpucp_nic_info->serdes_type);
+
+ memcpy(hbl_cn_cpucp_info->qsfp_eeprom, cpucp_nic_info->qsfp_eeprom,
+ sizeof(cpucp_nic_info->qsfp_eeprom));
+}
+
+static int hl_cn_get_asic_type(struct hl_device *hdev, enum hbl_cn_asic_type *asic_type)
+{
+ switch (hdev->asic_type) {
+ case ASIC_GAUDI2:
+ case ASIC_GAUDI2B:
+ case ASIC_GAUDI2C:
+ *asic_type = HBL_ASIC_GAUDI2;
+ break;
+ default:
+ dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int hl_cn_aux_data_init(struct hl_device *hdev)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct asic_fixed_properties *asic_props = &hdev->asic_prop;
+ struct hbl_cn_aux_data *aux_data;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+ u64 dram_kmd_size;
+ int rc;
+
+ aux_data = kzalloc(sizeof(*aux_data), GFP_KERNEL);
+ if (!aux_data)
+ return -ENOMEM;
+
+ aux_ops = kzalloc(sizeof(*aux_ops), GFP_KERNEL);
+ if (!aux_ops) {
+ rc = -ENOMEM;
+ goto free_aux_data;
+ }
+
+ aux_dev = &cn->cn_aux_dev;
+ aux_dev->aux_data = aux_data;
+ aux_dev->aux_ops = aux_ops;
+ aux_dev->type = HBL_AUX_DEV_CN;
+
+ aux_data->pdev = hdev->pdev;
+ aux_data->dev = hdev->dev;
+ aux_data->ports_mask = cn->ports_mask;
+ aux_data->ext_ports_mask = cn->eth_ports_mask;
+ aux_data->auto_neg_mask = cn->auto_neg_mask;
+ aux_data->fw_ver = asic_props->cpucp_info.cpucp_version;
+ aux_data->nic_drv_addr = asic_props->nic_drv_addr;
+ aux_data->nic_drv_size = asic_props->nic_drv_size;
+ aux_data->macro_cfg_size = asic_props->macro_cfg_size;
+ aux_data->pending_reset_long_timeout = hdev->pldm ? HL_PLDM_HARD_RESET_MAX_TIMEOUT :
+ HL_HARD_RESET_MAX_TIMEOUT;
+ aux_data->id = hdev->cdev_idx;
+ aux_data->pldm = hdev->pldm;
+ aux_data->skip_phy_init = hdev->cn.skip_phy_init;
+ aux_data->load_phy_fw = hdev->cn.load_fw;
+ aux_data->cpucp_fw = !!(hdev->fw_components & FW_TYPE_BOOT_CPU);
+ aux_data->supports_coresight = hdev->supports_coresight;
+ aux_data->use_fw_serdes_info = cn->use_fw_serdes_info;
+ aux_data->cache_line_size = asic_props->cache_line_size;
+ aux_data->clk = asic_props->clk;
+ aux_data->kernel_asid = HL_KERNEL_ASID_ID;
+ aux_data->card_location = cn->card_location;
+ aux_data->mmu_enable = true;
+ aux_data->lanes_per_port = cn->lanes_per_port;
+ aux_data->device_timeout = HL_DEVICE_TIMEOUT_USEC;
+ aux_data->fw_major_version = hdev->fw_inner_major_ver;
+ aux_data->fw_minor_version = hdev->fw_inner_minor_ver;
+ aux_data->fw_app_cpu_boot_dev_sts0 = asic_props->fw_app_cpu_boot_dev_sts0;
+ aux_data->fw_app_cpu_boot_dev_sts1 = asic_props->fw_app_cpu_boot_dev_sts1;
+ aux_data->cpucp_checkers_shift = NIC_CHECKERS_CHECK_SHIFT;
+
+ rc = hl_cn_get_asic_type(hdev, &aux_data->asic_type);
+ if (rc) {
+ dev_err(hdev->dev, "failed to set eth aux data asic type\n");
+ goto free_aux_ops;
+ }
+
+ dram_kmd_size = asic_props->dram_user_base_address - asic_props->dram_base_address;
+ aux_data->dram_size = (asic_props->dram_size < dram_kmd_size) ? 0 : dram_kmd_size;
+
+ /* set cn -> accel ops */
+ aux_ops->device_operational = hl_cn_device_operational;
+ aux_ops->hw_access_lock = hl_cn_hw_access_lock;
+ aux_ops->hw_access_unlock = hl_cn_hw_access_unlock;
+ aux_ops->device_reset = hl_cn_device_reset;
+ aux_ops->vm_dev_mmu_map = hl_cn_vm_dev_mmu_map;
+ aux_ops->vm_dev_mmu_unmap = hl_cn_vm_dev_mmu_unmap;
+ aux_ops->vm_reserve_dva_block = hl_cn_vm_reserve_dva_block;
+ aux_ops->vm_unreserve_dva_block = hl_cn_vm_unreserve_dva_block;
+ aux_ops->dram_readl = hl_cn_dram_readl;
+ aux_ops->dram_writel = hl_cn_dram_writel;
+ aux_ops->rreg = hl_cn_rreg;
+ aux_ops->wreg = hl_cn_wreg;
+ aux_ops->get_reg_pcie_addr = hl_cn_get_reg_pcie_addr;
+ aux_ops->register_cn_user_context = hl_cn_register_cn_user_context;
+ aux_ops->deregister_cn_user_context = hl_cn_deregister_cn_user_context;
+ aux_ops->vm_create = hl_cn_vm_create;
+ aux_ops->vm_destroy = hl_cn_vm_destroy;
+ aux_ops->get_vm_info = hl_cn_get_vm_info;
+ aux_ops->poll_reg = hl_cn_poll_reg;
+ aux_ops->get_cpucp_info = hl_cn_get_cpucp_info;
+
+ cn_funcs->set_cn_data(hdev);
+
+ return 0;
+
+free_aux_ops:
+ kfree(aux_ops);
+free_aux_data:
+ kfree(aux_data);
+
+ return rc;
+}
+
+static void hl_cn_aux_data_fini(struct hl_device *hdev)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->cn.cn_aux_dev;
+
+ kfree(aux_dev->aux_ops);
+ kfree(aux_dev->aux_data);
+}
+
+static void cn_adev_release(struct device *dev)
+{
+ struct hbl_aux_dev *aux_dev = container_of(dev, struct hbl_aux_dev, adev.dev);
+ struct hl_cn *cn = container_of(aux_dev, struct hl_cn, cn_aux_dev);
+
+ cn->is_cn_aux_dev_initialized = false;
+}
+
+static int hl_cn_aux_drv_init(struct hl_device *hdev)
+{
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev = &cn->cn_aux_dev;
+ struct auxiliary_device *adev;
+ int rc;
+
+ rc = hl_cn_aux_data_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "CN aux data init failed\n");
+ return rc;
+ }
+
+ adev = &aux_dev->adev;
+ adev->id = hdev->id;
+ adev->name = "cn";
+ adev->dev.parent = hdev->dev;
+ adev->dev.release = cn_adev_release;
+
+ rc = auxiliary_device_init(adev);
+ if (rc) {
+ dev_err(hdev->dev, "CN auxiliary_device_init failed\n");
+ goto aux_data_free;
+ }
+
+ rc = auxiliary_device_add(adev);
+ if (rc) {
+ dev_err(hdev->dev, "CN auxiliary_device_add failed\n");
+ goto uninit_adev;
+ }
+
+ cn->is_cn_aux_dev_initialized = true;
+
+ return 0;
+
+uninit_adev:
+ auxiliary_device_uninit(adev);
+aux_data_free:
+ hl_cn_aux_data_fini(hdev);
+
+ return rc;
+}
+
+static void hl_cn_aux_drv_fini(struct hl_device *hdev)
+{
+ struct hl_cn *cn = &hdev->cn;
+ struct auxiliary_device *adev;
+
+ if (!cn->is_cn_aux_dev_initialized)
+ return;
+
+ adev = &cn->cn_aux_dev.adev;
+
+ auxiliary_device_delete(adev);
+ auxiliary_device_uninit(adev);
+
+ hl_cn_aux_data_fini(hdev);
+}
+
+int hl_cn_reopen(struct hl_device *hdev)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct hbl_aux_dev *aux_dev = &hdev->cn.cn_aux_dev;
+ struct hbl_cn_aux_ops *aux_ops = aux_dev->aux_ops;
+ int rc;
+
+ /* check if the NIC is enabled */
+ if (!hdev->cn.ports_mask)
+ return 0;
+
+ if (aux_ops->ports_reopen) {
+ rc = aux_ops->ports_reopen(aux_dev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to reopen the eth ports, %d\n", rc);
+ return rc;
+ }
+ }
+
+ cn_funcs->set_hw_cap(hdev, true);
+
+ return 0;
+}
+
+int hl_cn_init(struct hl_device *hdev)
+{
+ struct hl_cn_properties *cn_props = &hdev->asic_prop.cn_props;
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct hl_cn *cn = &hdev->cn;
+ int rc;
+
+ /*
+ * In init flow we initialize the NIC ports from scratch. In hard reset
+ * flow, we get here after the NIC ports were halted, hence we only need to reopen them.
+ */
+ if (hdev->reset_info.in_reset)
+ return hl_cn_reopen(hdev);
+
+ cn->ports_mask &= GENMASK(cn_props->max_num_of_ports - 1, 0);
+ cn->ports_ext_mask &= cn->ports_mask;
+ cn->auto_neg_mask &= cn->ports_mask;
+
+ /* check if the NIC is enabled */
+ if (!hdev->cn.ports_mask)
+ return 0;
+
+ rc = cn_funcs->pre_core_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to pre init the NIC, %d\n", rc);
+ return rc;
+ }
+
+ /* check if all ports are disabled by the FW */
+ if (!hdev->cn.ports_mask) {
+ dev_dbg(hdev->dev, "all NIC ports are disabled by the FW\n");
+ return 0;
+ }
+
+ cn->eth_ports_mask = hdev->cn.eth_on_internal ? hdev->cn.ports_mask :
+ hdev->cn.ports_ext_mask;
+
+ /* verify the kernel module name as the auxiliary drivers will bind according to it */
+ WARN_ONCE(strcmp(HL_NAME, KBUILD_MODNAME),
+ "habanalabs name not in sync with kernel module name");
+
+ rc = hl_cn_aux_drv_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init CN driver, %d\n", rc);
+ return rc;
+ }
+
+ cn_funcs->set_hw_cap(hdev, true);
+
+ cn->is_initialized = true;
+
+ return 0;
+}
+
+void hl_cn_fini(struct hl_device *hdev)
+{
+ struct hl_cn *cn = &hdev->cn;
+
+ /* The NIC capability bit of each ASIC cannot be used as a prerequisite for this function,
+ * as we may arrive here after a failed hard reset without calling hl_cn_reopen().
+ * But we can check whether the NIC is disabled entirely.
+ */
+ if (!hdev->cn.ports_mask)
+ return;
+
+ if (!cn->is_initialized)
+ return;
+
+ hl_cn_aux_drv_fini(hdev);
+
+ cn->is_initialized = false;
+}
+
+void hl_cn_stop(struct hl_device *hdev)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ aux_ops = aux_dev->aux_ops;
+
+ if (!cn_funcs->get_hw_cap(hdev))
+ return;
+
+ if (aux_ops->ports_stop)
+ aux_ops->ports_stop(aux_dev);
+
+ /* Set NIC as not initialized. */
+ cn_funcs->set_hw_cap(hdev, false);
+}
+
+void hl_cn_hard_reset_prepare(struct hl_device *hdev, bool fw_reset)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+
+ if (!cn_funcs->get_hw_cap(hdev))
+ return;
+
+ cn_funcs->port_funcs->ports_stop_prepare(hdev, fw_reset, hdev->device_fini_pending);
+}
+
+int hl_cn_send_status(struct hl_device *hdev, int port, u8 cmd, u8 period)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+
+ if (!cn_funcs->get_hw_cap(hdev)) {
+ if (cmd != HBL_CN_STATUS_PERIODIC_STOP)
+ return hl_cn_send_empty_status(hdev, port);
+ return 0;
+ }
+
+ return cn_funcs->port_funcs->send_port_cpucp_status(hdev, port, cmd, period);
+}
+
+void hl_cn_synchronize_irqs(struct hl_device *hdev)
+{
+ struct hl_cn_funcs *cn_funcs = hdev->asic_funcs->cn_funcs;
+ struct hbl_aux_dev *aux_dev = &hdev->cn.cn_aux_dev;
+ struct hbl_cn_aux_ops *aux_ops;
+
+ aux_ops = aux_dev->aux_ops;
+
+ if (!cn_funcs->get_hw_cap(hdev))
+ return;
+
+ if (aux_ops->synchronize_irqs)
+ aux_ops->synchronize_irqs(aux_dev);
+}
+
+int hl_cn_cpucp_info_get(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct cpucp_nic_info *cpucp_nic_info;
+ dma_addr_t cpucp_nic_info_dma_addr;
+ int rc;
+
+ cpucp_nic_info = hl_cpu_accessible_dma_pool_alloc(hdev,
+ sizeof(struct cpucp_nic_info),
+ &cpucp_nic_info_dma_addr);
+ if (!cpucp_nic_info) {
+ dev_err(hdev->dev,
+ "Failed to allocate DMA memory for CPU-CP NIC info packet\n");
+ return -ENOMEM;
+ }
+
+ memset(cpucp_nic_info, 0, sizeof(struct cpucp_nic_info));
+
+ /* Unfortunately, 0 is a valid type in this field from the f/w perspective,
+ * so to support older f/w versions that don't return this field, set the
+ * max value here so that when converting serdes type to server type,
+ * we will put the UNKNOWN value into the server type.
+ */
+ cpucp_nic_info->serdes_type = cpu_to_le16(U16_MAX);
+
+ rc = hl_fw_cpucp_nic_info_get(hdev, cpucp_nic_info_dma_addr);
+ if (rc)
+ goto out;
+
+ hl_cn_cpucp_info_le_to_cpu(cpucp_nic_info, &prop->cn_props.cpucp_info);
+
+out:
+ hl_cpu_accessible_dma_pool_free(hdev, sizeof(struct cpucp_nic_info), cpucp_nic_info);
+
+ return rc;
+}
diff --git a/drivers/accel/habanalabs/cn/cn.h b/drivers/accel/habanalabs/cn/cn.h
new file mode 100644
index 000000000000..8cab5423aaca
--- /dev/null
+++ b/drivers/accel/habanalabs/cn/cn.h
@@ -0,0 +1,133 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef CN_H_
+#define CN_H_
+
+#include <uapi/drm/habanalabs_accel.h>
+#include <linux/net/intel/cn_aux.h>
+
+#include <linux/kfifo.h>
+#include <linux/hashtable.h>
+#include <linux/ctype.h>
+
+#include <linux/habanalabs/cpucp_if.h>
+
+struct hl_device;
+struct hl_ctx;
+
+#define NIC_MACRO_CFG_SIZE hdev->asic_prop.macro_cfg_size
+#define NIC_MACRO_CFG_BASE(port) (NIC_MACRO_CFG_SIZE * ((port) >> 1))
+#define NIC_MACRO_WREG32(reg, val) WREG32(NIC_MACRO_CFG_BASE(port) + (reg), (val))
+
+/**
+ * struct hl_cn - habanalabs CN common structure.
+ * @cn_aux_dev: pointer to CN auxiliary device structure.
+ * @ctx: compute user context.
+ * @hw_access_lock: protects the HW access from CN flows.
+ * @ports_mask: contains mask of the CN ports that are enabled, as received from the f/w. This
+ * field can contain different values based on the server type.
+ * @ports_ext_mask: contains mask of the CN ports that are external (used for scale-out), as
+ * received from the f/w. This field can contain different values based on the
+ * server type.
+ * @auto_neg_mask: mask of ports with Autonegotiation enabled.
+ * @eth_ports_mask: Ethernet ports enable mask.
+ * @ctx_registered: is user context registered.
+ * @card_location: the OAM number in the HLS (relevant for PMC card type).
+ * @lanes_per_port: number of physical lanes per port.
+ * @use_fw_serdes_info: true if NIC should use serdes values from F/W, false if CN should use
+ * hard-coded values.
+ * @is_cn_aux_dev_initialized: true if the CN auxiliary device is initialized.
+ * @is_initialized: is device initialized.
+ * @load_fw: load PHY FW from ASIC path.
+ * @skip_phy_init: skip PHY init phase.
+ * @eth_on_internal: set internal ports as Ethernet ports.
+ */
+struct hl_cn {
+ struct hbl_aux_dev cn_aux_dev;
+ struct hl_ctx *ctx;
+ struct mutex hw_access_lock;
+ u64 ports_mask;
+ u64 ports_ext_mask;
+ u64 auto_neg_mask;
+ u64 eth_ports_mask;
+ atomic_t ctx_registered;
+ u32 card_location;
+ u8 lanes_per_port;
+ u8 use_fw_serdes_info;
+ u8 is_cn_aux_dev_initialized;
+ u8 is_initialized;
+ u8 load_fw;
+ u8 skip_phy_init;
+ u8 eth_on_internal;
+};
+
+/**
+ * struct hl_cn_port_funcs - ASIC specific CN functions that are called from common code for a
+ * specific port.
+ * @spmu_get_stats_info: get SPMU statistics information.
+ * @spmu_config: config the SPMU.
+ * @spmu_sample: read the SPMU counters.
+ * @post_send_status: ASIC-specific handler invoked after sending a status packet to FW.
+ * @ports_stop_prepare: prepare the ports for a stop.
+ * @send_port_cpucp_status: Send port status to FW.
+ */
+struct hl_cn_port_funcs {
+ void (*spmu_get_stats_info)(struct hl_device *hdev, u32 port, struct hbl_cn_stat **stats,
+ u32 *n_stats);
+ int (*spmu_config)(struct hl_device *hdev, u32 port, u32 num_event_types, u32 event_types[],
+ bool enable);
+ int (*spmu_sample)(struct hl_device *hdev, u32 port, u32 num_out_data, u64 out_data[]);
+ void (*post_send_status)(struct hl_device *hdev, u32 port);
+ void (*ports_stop_prepare)(struct hl_device *hdev, bool fw_reset, bool in_teardown);
+ int (*send_port_cpucp_status)(struct hl_device *hdev, u32 port, u8 cmd, u8 period);
+};
+
+/**
+ * struct hl_cn_funcs - ASIC specific CN functions that are called from common code.
+ * @pre_core_init: NIC initializations to be done only once on device probe.
+ * @get_hw_cap: check whether the HW capability bitmap is set for NIC.
+ * @set_hw_cap: set HW capability (on/off).
+ * @set_cn_data: ASIC data to be used by the CN driver.
+ * @port_funcs: functions called from common code for a specific NIC port.
+ */
+struct hl_cn_funcs {
+ int (*pre_core_init)(struct hl_device *hdev);
+ bool (*get_hw_cap)(struct hl_device *hdev);
+ void (*set_hw_cap)(struct hl_device *hdev, bool enable);
+ void (*set_cn_data)(struct hl_device *hdev);
+ struct hl_cn_port_funcs *port_funcs;
+};
+
+int hl_cn_init(struct hl_device *hdev);
+void hl_cn_fini(struct hl_device *hdev);
+void hl_cn_stop(struct hl_device *hdev);
+int hl_cn_reopen(struct hl_device *hdev);
+int hl_cn_send_status(struct hl_device *hdev, int port, u8 cmd, u8 period);
+void hl_cn_hard_reset_prepare(struct hl_device *hdev, bool fw_reset);
+void hl_cn_synchronize_irqs(struct hl_device *hdev);
+int hl_cn_cpucp_info_get(struct hl_device *hdev);
+void *hl_cn_dma_alloc_coherent(struct hbl_aux_dev *aux_dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t flag);
+void hl_cn_dma_free_coherent(struct hbl_aux_dev *aux_dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle);
+void *hl_cn_dma_pool_zalloc(struct hbl_aux_dev *aux_dev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle);
+void hl_cn_dma_pool_free(struct hbl_aux_dev *aux_dev, void *vaddr, dma_addr_t dma_addr);
+void hl_cn_spmu_get_stats_info(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_stat **stats,
+ u32 *n_stats);
+int hl_cn_spmu_config(struct hbl_aux_dev *aux_dev, u32 port, u32 num_event_types, u32 event_types[],
+ bool enable);
+int hl_cn_spmu_sample(struct hbl_aux_dev *aux_dev, u32 port, u32 num_out_data, u64 out_data[]);
+int hl_cn_poll_reg(struct hbl_aux_dev *aux_dev, u32 reg, u64 timeout_us, hbl_cn_poll_cond_func func,
+ void *arg);
+int hl_cn_send_cpu_message(struct hbl_aux_dev *aux_dev, u32 *msg, u16 len, u32 timeout,
+ u64 *result);
+void hl_cn_post_send_status(struct hbl_aux_dev *aux_dev, u32 port);
+
+#endif /* CN_H_ */
diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
index 39e23d625a3c..bba6765100b2 100644
--- a/drivers/accel/habanalabs/common/command_submission.c
+++ b/drivers/accel/habanalabs/common/command_submission.c
@@ -2262,7 +2262,7 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
goto free_cs_chunk_array;
}
- if (!hdev->nic_ports_mask) {
+ if (!hdev->cn.ports_mask) {
atomic64_inc(&ctx->cs_counters.validation_drop_cnt);
atomic64_inc(&cntr->validation_drop_cnt);
dev_err(hdev->dev,
diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 8f92445c5a90..bda3524fb1b4 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -963,6 +963,7 @@ static int device_early_init(struct hl_device *hdev)
INIT_LIST_HEAD(&hdev->fpriv_ctrl_list);
mutex_init(&hdev->fpriv_list_lock);
mutex_init(&hdev->fpriv_ctrl_list_lock);
+ mutex_init(&hdev->cn.hw_access_lock);
mutex_init(&hdev->clk_throttling.lock);
return 0;
@@ -1007,6 +1008,7 @@ static void device_early_fini(struct hl_device *hdev)
mutex_destroy(&hdev->debug_lock);
mutex_destroy(&hdev->send_cpu_message_lock);
+ mutex_destroy(&hdev->cn.hw_access_lock);
mutex_destroy(&hdev->fpriv_list_lock);
mutex_destroy(&hdev->fpriv_ctrl_list_lock);
@@ -1251,6 +1253,10 @@ static void take_release_locks(struct hl_device *hdev)
mutex_unlock(&hdev->fpriv_list_lock);
mutex_lock(&hdev->fpriv_ctrl_list_lock);
mutex_unlock(&hdev->fpriv_ctrl_list_lock);
+
+ /* Flush CN flows */
+ mutex_lock(&hdev->cn.hw_access_lock);
+ mutex_unlock(&hdev->cn.hw_access_lock);
}
static void hl_abort_waiting_for_completions(struct hl_device *hdev)
@@ -1868,6 +1874,12 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
goto out_err;
}
+ rc = hdev->asic_funcs->cn_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init CN driver\n");
+ goto out_err;
+ }
+
if (!hdev->asic_prop.fw_security_enabled)
hl_fw_set_max_power(hdev);
} else {
@@ -2330,6 +2342,14 @@ int hl_device_init(struct hl_device *hdev)
goto out_disabled;
}
+ /* must be called after sysfs init for the auxiliary bus */
+ rc = hdev->asic_funcs->cn_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init CN driver\n");
+ rc = 0;
+ goto out_disabled;
+ }
+
/* Need to call this again because the max power might change,
* depending on card type for certain ASICs
*/
@@ -2517,6 +2537,9 @@ void hl_device_fini(struct hl_device *hdev)
hl_cb_pool_fini(hdev);
+ /* CN uses the kernel context for MMU mappings, hence it must be cleaned up before the context */
+ hdev->asic_funcs->cn_fini(hdev);
+
/* Reset the H/W. It will be in idle state after this returns */
rc = hdev->asic_funcs->hw_fini(hdev, true, false);
if (rc)
diff --git a/drivers/accel/habanalabs/common/firmware_if.c b/drivers/accel/habanalabs/common/firmware_if.c
index 4bd02778a970..52bbfa093ae3 100644
--- a/drivers/accel/habanalabs/common/firmware_if.c
+++ b/drivers/accel/habanalabs/common/firmware_if.c
@@ -1040,6 +1040,26 @@ int hl_fw_get_monitor_dump(struct hl_device *hdev, void *data)
return rc;
}
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev, dma_addr_t cpucp_nic_info_dma_addr)
+{
+ struct cpucp_packet pkt = {};
+ u64 result;
+ int rc;
+
+ pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_INFO_GET <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.addr = cpu_to_le64(cpucp_nic_info_dma_addr);
+ pkt.data_max_size = cpu_to_le32(sizeof(struct cpucp_nic_info));
+
+ rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
+ HL_CPUCP_INFO_TIMEOUT_USEC, &result);
+ if (rc)
+ dev_err(hdev->dev,
+ "Failed to handle CPU-CP NIC info pkt, error %d\n", rc);
+
+ return rc;
+}
+
int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
struct hl_info_pci_counters *counters)
{
diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index 48f0f3eea1ef..907e54ce0066 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -32,6 +32,7 @@
#include <drm/drm_device.h>
#include <drm/drm_file.h>
+#include "../cn/cn.h"
#include "security.h"
#define HL_NAME "habanalabs"
@@ -542,11 +543,25 @@ struct hl_hints_range {
u64 end_addr;
};
+/**
+ * struct hl_cn_properties - ASIC specific properties.
+ * @cpucp_info: various information received from CPU-CP regarding the NIC
+ * H/W, e.g. MAC addresses.
+ * @status_packet_size: size of the status packet sent to F/W.
+ * @max_num_of_ports: maximum number of ports supported by ASIC.
+ */
+struct hl_cn_properties {
+ struct hbl_cn_cpucp_info cpucp_info;
+ u32 status_packet_size;
+ u8 max_num_of_ports;
+};
+
/**
* struct asic_fixed_properties - ASIC specific immutable properties.
* @hw_queues_props: H/W queues properties.
* @special_blocks: points to an array containing special blocks info.
* @skip_special_blocks_cfg: special blocks skip configs.
+ * @cn_props: CN driver properties.
* @cpucp_info: received various information from CPU-CP regarding the H/W, e.g.
* available sensors.
* @uboot_ver: F/W U-boot version.
@@ -590,6 +605,9 @@ struct hl_hints_range {
* @max_freq_value: current max clk frequency.
* @engine_core_interrupt_reg_addr: interrupt register address for engine core to use
* in order to raise events toward FW.
+ * @nic_drv_addr: base address for NIC driver on DRAM.
+ * @nic_drv_size: size reserved for the NIC driver on DRAM.
+ * @macro_cfg_size: the size of the macro configuration space.
* @clk_pll_index: clock PLL index that specify which PLL determines the clock
* we display to the user
* @mmu_pgt_size: MMU page tables total size.
@@ -666,6 +684,7 @@ struct hl_hints_range {
* @cache_line_size: device cache line size.
* @server_type: Server type that the ASIC is currently installed in.
* The value is according to enum hl_server_type in uapi file.
+ * @clk: clock frequency in MHz.
* @completion_queues_count: number of completion queues.
* @completion_mode: 0 - job based completion, 1 - cs based completion
* @mme_master_slave_mode: 0 - Each MME works independently, 1 - MME works
@@ -705,6 +724,7 @@ struct asic_fixed_properties {
struct hw_queue_properties *hw_queues_props;
struct hl_special_block_info *special_blocks;
struct hl_skip_blocks_cfg skip_special_blocks_cfg;
+ struct hl_cn_properties cn_props;
struct cpucp_info cpucp_info;
char uboot_ver[VERSION_MAX_LEN];
char preboot_ver[VERSION_MAX_LEN];
@@ -742,6 +762,9 @@ struct asic_fixed_properties {
u64 host_end_address;
u64 max_freq_value;
u64 engine_core_interrupt_reg_addr;
+ u64 nic_drv_addr;
+ u64 nic_drv_size;
+ u32 macro_cfg_size;
u32 clk_pll_index;
u32 mmu_pgt_size;
u32 mmu_pte_size;
@@ -796,6 +819,7 @@ struct asic_fixed_properties {
u16 eq_interrupt_id;
u16 cache_line_size;
u16 server_type;
+ u16 clk;
u8 completion_queues_count;
u8 completion_mode;
u8 mme_master_slave_mode;
@@ -1562,10 +1586,14 @@ struct engines_data {
* then the timeout is the default timeout for the specific
* ASIC
* @get_hw_state: retrieve the H/W state
+ * @cn_init: init the CN H/W and I/F. This should be called at the final stage of the init flow,
+ * as nothing that might fail should be initialized after the CN init.
+ * @cn_fini: perform CN cleanup.
* @pci_bars_map: Map PCI BARs.
* @init_iatu: Initialize the iATU unit inside the PCI controller.
* @rreg: Read a register. Needed for simulator support.
* @wreg: Write a register. Needed for simulator support.
+ * @get_reg_pcie_addr: Retrieve the PCIe address of a register.
* @halt_coresight: stop the ETF and ETR traces.
* @ctx_init: context dependent initialization.
* @ctx_fini: context dependent cleanup.
@@ -1617,6 +1645,7 @@ struct engines_data {
* @send_device_activity: indication to FW about device availability
* @set_dram_properties: set DRAM related properties.
* @set_binning_masks: set binning/enable masks for all relevant components.
+ * @cn_funcs: ASIC specific CN functions.
*/
struct hl_asic_funcs {
int (*early_init)(struct hl_device *hdev);
@@ -1692,10 +1721,13 @@ struct hl_asic_funcs {
int (*get_monitor_dump)(struct hl_device *hdev, void *data);
int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
u16 len, u32 timeout, u64 *result);
+ int (*cn_init)(struct hl_device *hdev);
+ void (*cn_fini)(struct hl_device *hdev);
int (*pci_bars_map)(struct hl_device *hdev);
int (*init_iatu)(struct hl_device *hdev);
u32 (*rreg)(struct hl_device *hdev, u32 reg);
void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
+ int (*get_reg_pcie_addr)(struct hl_device *hdev, u32 reg, u64 *pci_addr);
void (*halt_coresight)(struct hl_device *hdev, struct hl_ctx *ctx);
int (*ctx_init)(struct hl_ctx *ctx);
void (*ctx_fini)(struct hl_ctx *ctx);
@@ -1751,6 +1783,7 @@ struct hl_asic_funcs {
int (*send_device_activity)(struct hl_device *hdev, bool open);
int (*set_dram_properties)(struct hl_device *hdev);
int (*set_binning_masks)(struct hl_device *hdev);
+ struct hl_cn_funcs *cn_funcs;
};
@@ -3234,6 +3267,7 @@ struct hl_reset_info {
* @asic_prop: ASIC specific immutable properties.
* @asic_funcs: ASIC specific functions.
* @asic_specific: ASIC specific information to use only from ASIC files.
+ * @cn: CN common structure.
* @vm: virtual memory manager for MMU.
* @hwmon_dev: H/W monitor device.
* @hl_chip_info: ASIC's sensors information.
@@ -3353,7 +3387,6 @@ struct hl_reset_info {
* @supports_ctx_switch: true if a ctx switch is required upon first submission.
* @support_preboot_binning: true if we support read binning info from preboot.
* @eq_heartbeat_received: indication that eq heartbeat event has received from FW.
- * @nic_ports_mask: Controls which NIC ports are enabled. Used only for testing.
* @fw_components: Controls which f/w components to load to the device. There are multiple f/w
* stages and sometimes we want to stop at a certain stage. Used only for testing.
* @mmu_disable: Disable the device MMU(s). Used only for testing.
@@ -3413,6 +3446,7 @@ struct hl_device {
struct asic_fixed_properties asic_prop;
const struct hl_asic_funcs *asic_funcs;
void *asic_specific;
+ struct hl_cn cn;
struct hl_vm vm;
struct device *hwmon_dev;
struct hwmon_chip_info *hl_chip_info;
@@ -3518,7 +3552,6 @@ struct hl_device {
u8 eq_heartbeat_received;
/* Parameters for bring-up to be upstreamed */
- u64 nic_ports_mask;
u64 fw_components;
u8 mmu_disable;
u8 cpu_queues_enable;
@@ -3843,6 +3876,8 @@ void hl_userptr_delete_list(struct hl_device *hdev,
bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr, u32 size,
struct list_head *userptr_list,
struct hl_userptr **userptr);
+int hl_map_vmalloc_range(struct hl_ctx *ctx, u64 vmalloc_va, u64 device_va, u64 size);
+int hl_unmap_vmalloc_range(struct hl_ctx *ctx, u64 device_va);
int hl_mmu_init(struct hl_device *hdev);
void hl_mmu_fini(struct hl_device *hdev);
@@ -3943,6 +3978,7 @@ int hl_fw_cpucp_handshake(struct hl_device *hdev,
u32 boot_err1_reg);
int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size);
int hl_fw_get_monitor_dump(struct hl_device *hdev, void *data);
+int hl_fw_cpucp_nic_info_get(struct hl_device *hdev, dma_addr_t cpucp_nic_info_dma_addr);
int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
struct hl_info_pci_counters *counters);
int hl_fw_cpucp_total_energy_get(struct hl_device *hdev,
@@ -4059,6 +4095,9 @@ void hl_capture_engine_err(struct hl_device *hdev, u16 engine_id, u16 error_coun
void hl_enable_err_info_capture(struct hl_error_info *captured_err_info);
void hl_init_cpu_for_irq(struct hl_device *hdev);
void hl_set_irq_affinity(struct hl_device *hdev, int irq);
+int hl_get_hw_block_handle(struct hl_device *hdev, u64 address,
+ u64 *handle, u32 *size);
+bool hl_check_fd(struct file *file);
#ifdef CONFIG_DEBUG_FS
diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c b/drivers/accel/habanalabs/common/habanalabs_drv.c
index e542fd40e16c..cee81375bfdc 100644
--- a/drivers/accel/habanalabs/common/habanalabs_drv.c
+++ b/drivers/accel/habanalabs/common/habanalabs_drv.c
@@ -110,6 +110,11 @@ static const struct drm_driver hl_driver = {
.num_ioctls = ARRAY_SIZE(hl_drm_ioctls)
};
+bool hl_check_fd(struct file *filp)
+{
+ return (filp->f_op == &hl_fops);
+}
+
/*
* get_asic_type - translate device id to asic type
*
@@ -328,9 +333,27 @@ int hl_device_open_ctrl(struct inode *inode, struct file *filp)
return rc;
}
+static u32 get_dev_nic_ports_mask(struct hl_device *hdev)
+{
+ enum hl_asic_type asic_type = hdev->asic_type;
+ u32 mask;
+
+ switch (asic_type) {
+ case ASIC_GAUDI2:
+ case ASIC_GAUDI2B:
+ case ASIC_GAUDI2C:
+ /* 24 ports are supported */
+ mask = 0xFFFFFF;
+ break;
+ default:
+ mask = 0;
+ }
+
+ return mask;
+}
+
static void set_driver_behavior_per_device(struct hl_device *hdev)
{
- hdev->nic_ports_mask = 0;
hdev->fw_components = FW_TYPE_ALL_TYPES;
hdev->cpu_queues_enable = 1;
hdev->pldm = 0;
@@ -338,6 +361,7 @@ static void set_driver_behavior_per_device(struct hl_device *hdev)
hdev->bmc_enable = 1;
hdev->reset_on_preboot_fail = 1;
hdev->heartbeat = 1;
+ hdev->card_type = cpucp_card_type_pmc;
}
static void copy_kernel_module_params_to_device(struct hl_device *hdev)
@@ -377,6 +401,8 @@ static void fixup_device_params_per_asic(struct hl_device *hdev, int timeout)
static int fixup_device_params(struct hl_device *hdev)
{
+ struct hl_cn *cn = &hdev->cn;
+ u32 dev_nic_ports_mask;
int tmp_timeout;
tmp_timeout = timeout_locked;
@@ -405,6 +431,15 @@ static int fixup_device_params(struct hl_device *hdev)
/* If CPU queues not enabled, no way to do heartbeat */
if (!hdev->cpu_queues_enable)
hdev->heartbeat = 0;
+
+ /* Adjust NIC ports parameters according to the device in-hand */
+ dev_nic_ports_mask = get_dev_nic_ports_mask(hdev);
+
+ cn->ports_mask = dev_nic_ports_mask;
+ cn->ports_ext_mask = dev_nic_ports_mask;
+ cn->auto_neg_mask = dev_nic_ports_mask;
+ cn->skip_phy_init = hdev->pldm;
+
fixup_device_params_per_asic(hdev, tmp_timeout);
return 0;
diff --git a/drivers/accel/habanalabs/common/habanalabs_ioctl.c b/drivers/accel/habanalabs/common/habanalabs_ioctl.c
index 1dd6e23172ca..d8809c74ac0f 100644
--- a/drivers/accel/habanalabs/common/habanalabs_ioctl.c
+++ b/drivers/accel/habanalabs/common/habanalabs_ioctl.c
@@ -108,6 +108,8 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
hw_ip.edma_enabled_mask = prop->edma_enabled_mask;
hw_ip.server_type = prop->server_type;
+ hw_ip.nic_ports_mask = hdev->cn.ports_mask;
+ hw_ip.nic_ports_external_mask = hdev->cn.ports_ext_mask;
hw_ip.security_enabled = prop->fw_security_enabled;
hw_ip.revision_id = hdev->pdev->revision;
hw_ip.rotator_enabled_mask = prop->rotator_enabled_mask;
diff --git a/drivers/accel/habanalabs/common/memory.c b/drivers/accel/habanalabs/common/memory.c
index 3348ad12c237..dd52b0ee0eef 100644
--- a/drivers/accel/habanalabs/common/memory.c
+++ b/drivers/accel/habanalabs/common/memory.c
@@ -1424,6 +1424,12 @@ static int map_block(struct hl_device *hdev, u64 address, u64 *handle, u32 *size
return 0;
}
+int hl_get_hw_block_handle(struct hl_device *hdev, u64 address,
+ u64 *handle, u32 *size)
+{
+ return map_block(hdev, address, handle, size);
+}
+
static void hw_block_vm_close(struct vm_area_struct *vma)
{
struct hl_vm_hw_block_list_node *lnode =
@@ -2940,3 +2946,120 @@ void hl_hw_block_mem_fini(struct hl_ctx *ctx)
mutex_destroy(&ctx->hw_block_list_lock);
}
+
+/**
+ * hl_map_vmalloc_range - Map vmalloc allocation to PMMU.
+ * @ctx: Associated context.
+ * @vmalloc_va: Start vmalloc virtual address.
+ * @device_va: Start device virtual address.
+ * @size: Size of allocation to map.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int hl_map_vmalloc_range(struct hl_ctx *ctx, u64 vmalloc_va, u64 device_va, u64 size)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_userptr *userptr = NULL;
+ struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
+ struct hl_vm_hash_node *hnode;
+ int rc;
+
+ /*
+ * Iterate through vmalloc pages and map them for DMA via sg-list.
+ * No need to pin the pages since we are mapping kernel memory which
+ * is never swapped out.
+ */
+ rc = dma_map_host_va(hdev, vmalloc_va, size, &userptr);
+ if (rc) {
+ dev_err(hdev->dev, "DMA mapping failed, vaddr 0x%llx\n", vmalloc_va);
+ return rc;
+ }
+
+ /*
+ * Create pack of host pages which we will later map to the pmmu.
+ * Do not allow huge page optimization. We have pre-allocated the
+ * device VA with a preset notion of alignment which is the same as
+ * the host page alignment.
+ */
+ rc = init_phys_pg_pack_from_userptr(ctx, userptr, &phys_pg_pack, true);
+ if (rc) {
+ dev_err(hdev->dev, "Unable to init page pack, vaddr 0x%llx\n", vmalloc_va);
+ goto err_dma_unmap;
+ }
+
+ /* Validate kernel host VA and device VA are aligned to pmmu page size. */
+ if (device_va & (phys_pg_pack->page_size - 1) ||
+ vmalloc_va & (phys_pg_pack->page_size - 1)) {
+ dev_err(hdev->dev,
+ "Unaligned mapping, host VA 0x%llx, device VA 0x%llx, page size 0x%x",
+ vmalloc_va, device_va, phys_pg_pack->page_size);
+ rc = -EINVAL;
+ goto err_free_page_pack;
+ }
+
+ mutex_lock(&hdev->mmu_lock);
+
+ /* Map page pack to pmmu */
+ rc = map_phys_pg_pack(ctx, device_va, phys_pg_pack);
+ if (rc) {
+ mutex_unlock(&hdev->mmu_lock);
+ dev_err(hdev->dev, "Mapping page pack failed, vaddr 0x%llx, device VA 0x%llx\n",
+ vmalloc_va, device_va);
+ goto err_free_page_pack;
+ }
+
+ rc = hl_mmu_invalidate_cache_range(hdev,
+ false, userptr->vm_type | MMU_OP_SKIP_LOW_CACHE_INV,
+ ctx->asid, device_va, phys_pg_pack->total_size);
+
+ mutex_unlock(&hdev->mmu_lock);
+
+ if (rc)
+ goto err_free_unmap_page_pack;
+
+ /*
+ * Keep track of mapping. Add mapped chunk to global hash list.
+ * Context release uses this list for force release if this mapping
+ * is not released gracefully.
+ */
+ hnode = kzalloc(sizeof(*hnode), GFP_KERNEL);
+ if (!hnode) {
+ rc = -ENOMEM;
+ goto err_free_unmap_page_pack;
+ }
+
+ hnode->ptr = (void *) userptr;
+ hnode->vaddr = device_va;
+
+ mutex_lock(&ctx->mem_hash_lock);
+ hash_add(ctx->mem_hash, &hnode->node, device_va);
+ mutex_unlock(&ctx->mem_hash_lock);
+
+ free_phys_pg_pack(hdev, phys_pg_pack);
+
+ return 0;
+
+err_free_unmap_page_pack:
+ unmap_phys_pg_pack(ctx, device_va, phys_pg_pack);
+err_free_page_pack:
+ free_phys_pg_pack(hdev, phys_pg_pack);
+err_dma_unmap:
+ dma_unmap_host_va(hdev, userptr);
+ return rc;
+}
+
+/**
+ * hl_unmap_vmalloc_range - Unmap vmalloc allocation from PMMU.
+ * @ctx: Associated context.
+ * @device_va: Start device virtual address.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int hl_unmap_vmalloc_range(struct hl_ctx *ctx, u64 device_va)
+{
+ struct hl_mem_in args = {
+ .unmap.device_virt_addr = device_va,
+ };
+
+ return unmap_device_va(ctx, &args, false);
+}
diff --git a/drivers/accel/habanalabs/gaudi/gaudi.c b/drivers/accel/habanalabs/gaudi/gaudi.c
index f2b04ffb0ecb..e46bbe73fb82 100644
--- a/drivers/accel/habanalabs/gaudi/gaudi.c
+++ b/drivers/accel/habanalabs/gaudi/gaudi.c
@@ -1618,10 +1618,10 @@ static int gaudi_late_init(struct hl_device *hdev)
}
if ((hdev->card_type == cpucp_card_type_pci) &&
- (hdev->nic_ports_mask & 0x3)) {
+ (hdev->cn.ports_mask & 0x3)) {
dev_info(hdev->dev,
"PCI card detected, only 8 ports are enabled\n");
- hdev->nic_ports_mask &= ~0x3;
+ hdev->cn.ports_mask &= ~0x3;
/* Stop and disable unused NIC QMANs */
WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
@@ -3243,7 +3243,7 @@ static void gaudi_init_nic_qmans(struct hl_device *hdev)
mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
int i, nic_id, internal_q_index;
- if (!hdev->nic_ports_mask)
+ if (!hdev->cn.ports_mask)
return;
if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
@@ -3252,7 +3252,7 @@ static void gaudi_init_nic_qmans(struct hl_device *hdev)
dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
- if (!(hdev->nic_ports_mask & (1 << nic_id))) {
+ if (!(hdev->cn.ports_mask & (1 << nic_id))) {
nic_offset += nic_delta_between_qmans;
if (nic_id & 1) {
nic_offset -= (nic_delta_between_qmans * 2);
@@ -9117,6 +9117,11 @@ static int gaudi_send_device_activity(struct hl_device *hdev, bool open)
return 0;
}
+static int gaudi_get_reg_pcie_addr(struct hl_device *hdev, u32 reg, u64 *pci_addr)
+{
+ return -EOPNOTSUPP;
+}
+
static const struct hl_asic_funcs gaudi_funcs = {
.early_init = gaudi_early_init,
.early_fini = gaudi_early_fini,
@@ -9172,6 +9177,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
.init_iatu = gaudi_init_iatu,
.rreg = hl_rreg,
.wreg = hl_wreg,
+ .get_reg_pcie_addr = gaudi_get_reg_pcie_addr,
.halt_coresight = gaudi_halt_coresight,
.ctx_init = gaudi_ctx_init,
.ctx_fini = gaudi_ctx_fini,
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index fa1c4feb9f89..a22d2a93394e 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -3222,7 +3222,7 @@ static void gaudi2_init_arcs(struct hl_device *hdev)
continue;
if (gaudi2_is_arc_nic_owned(arc_id) &&
- !(hdev->nic_ports_mask & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0)))
+ !(hdev->cn.ports_mask & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0)))
continue;
if (gaudi2_is_arc_tpc_owned(arc_id) && !(gaudi2->tpc_hw_cap_initialized &
@@ -3996,7 +3996,7 @@ static void gaudi2_stop_nic_qmans(struct hl_device *hdev)
queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
- if (!(hdev->nic_ports_mask & BIT(i)))
+ if (!(hdev->cn.ports_mask & BIT(i)))
continue;
reg_base = gaudi2_qm_blocks_bases[queue_id];
@@ -4192,7 +4192,7 @@ static void gaudi2_disable_nic_qmans(struct hl_device *hdev)
queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
- if (!(hdev->nic_ports_mask & BIT(i)))
+ if (!(hdev->cn.ports_mask & BIT(i)))
continue;
reg_base = gaudi2_qm_blocks_bases[queue_id];
@@ -4661,7 +4661,7 @@ static void gaudi2_nic_qmans_manual_flush(struct hl_device *hdev)
queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
- if (!(hdev->nic_ports_mask & BIT(i)))
+ if (!(hdev->cn.ports_mask & BIT(i)))
continue;
gaudi2_qman_manual_flush_common(hdev, queue_id);
@@ -7279,7 +7279,7 @@ static bool gaudi2_get_nic_idle_status(struct hl_device *hdev, u64 *mask_arr, u8
u64 offset = 0;
/* NIC, twelve macros in Full chip */
- if (e && hdev->nic_ports_mask)
+ if (e && hdev->cn.ports_mask)
hl_engine_data_sprintf(e,
"\nNIC is_idle QM_GLBL_STS0 QM_CGM_STS\n"
"--- ------- ------------ ----------\n");
@@ -7290,7 +7290,7 @@ static bool gaudi2_get_nic_idle_status(struct hl_device *hdev, u64 *mask_arr, u8
else
offset += NIC_QM_OFFSET;
- if (!(hdev->nic_ports_mask & BIT(i)))
+ if (!(hdev->cn.ports_mask & BIT(i)))
continue;
engine_idx = GAUDI2_ENGINE_ID_NIC0_0 + i;
@@ -8333,7 +8333,7 @@ static void gaudi2_check_if_razwi_happened(struct hl_device *hdev)
/* check all NICs */
for (mod_idx = 0 ; mod_idx < NIC_NUMBER_OF_PORTS ; mod_idx++)
- if (hdev->nic_ports_mask & BIT(mod_idx))
+ if (hdev->cn.ports_mask & BIT(mod_idx))
gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_NIC, mod_idx >> 1, 0,
NULL);
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2_security.c b/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
index 34bf80c5a44b..bb837be47908 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
@@ -3493,7 +3493,7 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
rc |= hl_init_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET,
HL_PB_SINGLE_INSTANCE, HL_PB_NA,
gaudi2_pb_nic0, ARRAY_SIZE(gaudi2_pb_nic0),
- NULL, HL_PB_NA, hdev->nic_ports_mask);
+ NULL, HL_PB_NA, hdev->cn.ports_mask);
/* NIC QM and QPC */
rc |= hl_init_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET,
@@ -3501,7 +3501,7 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
gaudi2_pb_nic0_qm_qpc, ARRAY_SIZE(gaudi2_pb_nic0_qm_qpc),
gaudi2_pb_nic0_qm_qpc_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_nic0_qm_qpc_unsecured_regs),
- hdev->nic_ports_mask);
+ hdev->cn.ports_mask);
/* NIC QM ARC */
rc |= hl_init_pb_ranges_with_mask(hdev, NIC_NUMBER_OF_MACROS,
@@ -3510,7 +3510,7 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
ARRAY_SIZE(gaudi2_pb_nic0_qm_arc_aux0),
gaudi2_pb_nic0_qm_arc_aux0_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_nic0_qm_arc_aux0_unsecured_regs),
- hdev->nic_ports_mask);
+ hdev->cn.ports_mask);
/* NIC UMR */
rc |= hl_init_pb_ranges_with_mask(hdev, NIC_NUMBER_OF_MACROS,
@@ -3519,7 +3519,7 @@ static int gaudi2_init_protection_bits(struct hl_device *hdev)
ARRAY_SIZE(gaudi2_pb_nic0_umr),
gaudi2_pb_nic0_umr_unsecured_regs,
ARRAY_SIZE(gaudi2_pb_nic0_umr_unsecured_regs),
- hdev->nic_ports_mask);
+ hdev->cn.ports_mask);
/* Rotators */
instance_offset = mmROT1_BASE - mmROT0_BASE;
@@ -3799,22 +3799,22 @@ void gaudi2_ack_protection_bits_errors(struct hl_device *hdev)
/* NIC */
hl_ack_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET, HL_PB_SINGLE_INSTANCE, HL_PB_NA,
- gaudi2_pb_nic0, ARRAY_SIZE(gaudi2_pb_nic0), hdev->nic_ports_mask);
+ gaudi2_pb_nic0, ARRAY_SIZE(gaudi2_pb_nic0), hdev->cn.ports_mask);
/* NIC QM and QPC */
hl_ack_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET, NIC_NUMBER_OF_QM_PER_MACRO,
NIC_QM_OFFSET, gaudi2_pb_nic0_qm_qpc, ARRAY_SIZE(gaudi2_pb_nic0_qm_qpc),
- hdev->nic_ports_mask);
+ hdev->cn.ports_mask);
/* NIC QM ARC */
hl_ack_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET, NIC_NUMBER_OF_QM_PER_MACRO,
NIC_QM_OFFSET, gaudi2_pb_nic0_qm_arc_aux0,
- ARRAY_SIZE(gaudi2_pb_nic0_qm_arc_aux0), hdev->nic_ports_mask);
+ ARRAY_SIZE(gaudi2_pb_nic0_qm_arc_aux0), hdev->cn.ports_mask);
/* NIC UMR */
hl_ack_pb_with_mask(hdev, NIC_NUMBER_OF_MACROS, NIC_OFFSET, NIC_NUMBER_OF_QM_PER_MACRO,
NIC_QM_OFFSET, gaudi2_pb_nic0_umr, ARRAY_SIZE(gaudi2_pb_nic0_umr),
- hdev->nic_ports_mask);
+ hdev->cn.ports_mask);
/* Rotators */
instance_offset = mmROT1_BASE - mmROT0_BASE;
diff --git a/drivers/accel/habanalabs/goya/goya.c b/drivers/accel/habanalabs/goya/goya.c
index 5a359c3bdc78..e0a803bde463 100644
--- a/drivers/accel/habanalabs/goya/goya.c
+++ b/drivers/accel/habanalabs/goya/goya.c
@@ -5438,6 +5438,11 @@ static int goya_send_device_activity(struct hl_device *hdev, bool open)
return 0;
}
+static int goya_get_reg_pcie_addr(struct hl_device *hdev, u32 reg, u64 *pci_addr)
+{
+ return -EOPNOTSUPP;
+}
+
static const struct hl_asic_funcs goya_funcs = {
.early_init = goya_early_init,
.early_fini = goya_early_fini,
@@ -5493,6 +5498,7 @@ static const struct hl_asic_funcs goya_funcs = {
.init_iatu = goya_init_iatu,
.rreg = hl_rreg,
.wreg = hl_wreg,
+ .get_reg_pcie_addr = goya_get_reg_pcie_addr,
.halt_coresight = goya_halt_coresight,
.ctx_init = goya_ctx_init,
.ctx_fini = goya_ctx_fini,
diff --git a/drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h b/drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h
new file mode 100644
index 000000000000..5d300118d069
--- /dev/null
+++ b/drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef INCLUDE_NIC_GENERAL_H_
+#define INCLUDE_NIC_GENERAL_H_
+
+#define HABANALABS_MAC_OUI_1 0xB0FD0B
+#define HABANALABS_MAC_OUI_2 0x68932E
+
+#endif /* INCLUDE_NIC_GENERAL_H_ */
diff --git a/include/uapi/drm/habanalabs_accel.h b/include/uapi/drm/habanalabs_accel.h
index a512dc4cffd0..c2c987ba3998 100644
--- a/include/uapi/drm/habanalabs_accel.h
+++ b/include/uapi/drm/habanalabs_accel.h
@@ -935,14 +935,14 @@ struct hl_info_hw_ip_info {
__u8 reserved2;
__u64 reserved3;
__u64 device_mem_alloc_default_page_size;
- __u64 reserved4;
- __u64 reserved5;
- __u32 reserved6;
- __u8 reserved7;
+ __u64 nic_ports_mask;
+ __u64 nic_ports_external_mask;
+ __u32 reserved4;
+ __u8 reserved5;
__u8 revision_id;
__u16 tpc_interrupt_id;
__u32 rotator_enabled_mask;
- __u32 reserved9;
+ __u32 reserved6;
__u64 engine_core_interrupt_reg_addr;
__u64 reserved_dram_size;
};
--
2.34.1
* [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (11 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support Omer Shpigelman
` (2 subsequent siblings)
15 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add the relevant CN register header files. These files are generated
automatically by a tool maintained by the VLSI engineers.
They are required for the upcoming Gaudi2 CN driver support.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
.../include/gaudi2/asic_reg/gaudi2_regs.h | 10 +-
.../include/gaudi2/asic_reg/nic0_phy_regs.h | 59 ++
.../nic0_qm0_axuser_nonsecured_regs.h | 61 ++
.../include/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 ++++++++++++++++++
.../include/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 ++++++++++++++
.../include/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 ++++++++++++++
.../include/gaudi2/asic_reg/nic0_txe0_regs.h | 529 ++++++++++
.../include/gaudi2/asic_reg/nic0_txs0_regs.h | 289 ++++++
8 files changed, 3302 insertions(+), 1 deletion(-)
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h
create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/gaudi2_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/gaudi2_regs.h
index d21fcd3880b4..4aa0117dc62a 100644
--- a/drivers/accel/habanalabs/include/gaudi2/asic_reg/gaudi2_regs.h
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/gaudi2_regs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0
*
- * Copyright 2020-2023 HabanaLabs, Ltd.
+ * Copyright 2020-2024 HabanaLabs, Ltd.
* All Rights Reserved.
*
*/
@@ -563,6 +563,13 @@
#include "nic0_qm0_cgm_regs.h"
#include "nic0_umr0_0_completion_queue_ci_1_regs.h"
#include "nic0_umr0_0_unsecure_doorbell0_regs.h"
+#include "nic0_qm0_axuser_nonsecured_regs.h"
+#include "nic0_txe0_regs.h"
+#include "nic0_rxe0_regs.h"
+#include "nic0_rxe1_regs.h"
+#include "nic0_txs0_regs.h"
+#include "nic0_phy_regs.h"
+#include "nic0_qpc1_regs.h"
#define NIC_OFFSET (mmNIC1_MSTR_IF_RR_SHRD_HBW_BASE - mmNIC0_MSTR_IF_RR_SHRD_HBW_BASE)
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h
new file mode 100644
index 000000000000..f7d21bf181fd
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_PHY_REGS_H_
+#define ASIC_REG_NIC0_PHY_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_PHY
+ * (Prototype: PRT_PHY)
+ *****************************************
+ */
+
+#define mmNIC0_PHY_PHY_RX_STS_0 0x5460000
+
+#define mmNIC0_PHY_PHY_RX_STS_1 0x5460004
+
+#define mmNIC0_PHY_PHY_RX_STS_2 0x5460008
+
+#define mmNIC0_PHY_PHY_RX_STS_3 0x546000C
+
+#define mmNIC0_PHY_PHY_RX_CFG_0 0x5460010
+
+#define mmNIC0_PHY_PHY_RX_CFG_1 0x5460014
+
+#define mmNIC0_PHY_PHY_RX_CFG_2 0x5460018
+
+#define mmNIC0_PHY_PHY_RX_CFG_3 0x546001C
+
+#define mmNIC0_PHY_PHY_TX_STS_0 0x5460020
+
+#define mmNIC0_PHY_PHY_TX_STS_1 0x5460024
+
+#define mmNIC0_PHY_PHY_TX_STS_2 0x5460028
+
+#define mmNIC0_PHY_PHY_TX_STS_3 0x546002C
+
+#define mmNIC0_PHY_PHY_RST_CFG 0x5460030
+
+#define mmNIC0_PHY_PHY_CFG_ADDR 0x5460040
+
+#define mmNIC0_PHY_PHY_LINK_STS_INTR 0x5460050
+
+#define mmNIC0_PHY_PHY_IDDQ_0 0x5460060
+
+#define mmNIC0_PHY_PHY_IDDQ_1 0x5460064
+
+#define mmNIC0_PHY_PHY_ASYNC_LANE_SWAP 0x5460068
+
+#endif /* ASIC_REG_NIC0_PHY_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
new file mode 100644
index 000000000000..adc0eec56f0c
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_QM0_AXUSER_NONSECURED_REGS_H_
+#define ASIC_REG_NIC0_QM0_AXUSER_NONSECURED_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_QM0_AXUSER_NONSECURED
+ * (Prototype: AXUSER)
+ *****************************************
+ */
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_ASID 0x541AB80
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_MMU_BP 0x541AB84
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_STRONG_ORDER 0x541AB88
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_NO_SNOOP 0x541AB8C
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_WR_REDUCTION 0x541AB90
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_RD_ATOMIC 0x541AB94
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_QOS 0x541AB98
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_RSVD 0x541AB9C
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_EMEM_CPAGE 0x541ABA0
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_CORE 0x541ABA4
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_E2E_COORD 0x541ABA8
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_WR_OVRD_LO 0x541ABB0
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_WR_OVRD_HI 0x541ABB4
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_RD_OVRD_LO 0x541ABB8
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_HB_RD_OVRD_HI 0x541ABBC
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_LB_COORD 0x541ABC0
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_LB_LOCK 0x541ABC4
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_LB_RSVD 0x541ABC8
+
+#define mmNIC0_QM0_AXUSER_NONSECURED_LB_OVRD 0x541ABCC
+
+#endif /* ASIC_REG_NIC0_QM0_AXUSER_NONSECURED_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h
new file mode 100644
index 000000000000..9d1443fffb2d
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h
@@ -0,0 +1,905 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_QPC1_REGS_H_
+#define ASIC_REG_NIC0_QPC1_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_QPC1
+ * (Prototype: NIC_QPC)
+ *****************************************
+ */
+
+#define mmNIC0_QPC1_REQ_QPC_CACHE_INVALIDATE 0x543F000
+
+#define mmNIC0_QPC1_REQ_QPC_CACHE_INV_STATUS 0x543F004
+
+#define mmNIC0_QPC1_REQ_STATIC_CONFIG 0x543F008
+
+#define mmNIC0_QPC1_REQ_BASE_ADDRESS_63_32 0x543F00C
+
+#define mmNIC0_QPC1_REQ_BASE_ADDRESS_31_7 0x543F010
+
+#define mmNIC0_QPC1_REQ_CLEAN_LINK_LIST 0x543F014
+
+#define mmNIC0_QPC1_REQ_ERR_FIFO_PUSH_63_32 0x543F018
+
+#define mmNIC0_QPC1_REQ_ERR_FIFO_PUSH_31_0 0x543F01C
+
+#define mmNIC0_QPC1_REQ_ERR_QP_STATE_63_32 0x543F020
+
+#define mmNIC0_QPC1_REQ_ERR_QP_STATE_31_0 0x543F024
+
+#define mmNIC0_QPC1_RETRY_COUNT_MAX 0x543F028
+
+#define mmNIC0_QPC1_AXI_PROT 0x543F030
+
+#define mmNIC0_QPC1_RES_QPC_CACHE_INVALIDATE 0x543F034
+
+#define mmNIC0_QPC1_RES_QPC_CACHE_INV_STATUS 0x543F038
+
+#define mmNIC0_QPC1_RES_STATIC_CONFIG 0x543F03C
+
+#define mmNIC0_QPC1_RES_BASE_ADDRESS_63_32 0x543F040
+
+#define mmNIC0_QPC1_RES_BASE_ADDRESS_31_7 0x543F044
+
+#define mmNIC0_QPC1_RES_CLEAN_LINK_LIST 0x543F048
+
+#define mmNIC0_QPC1_ERR_FIFO_WRITE_INDEX 0x543F050
+
+#define mmNIC0_QPC1_ERR_FIFO_PRODUCER_INDEX 0x543F054
+
+#define mmNIC0_QPC1_ERR_FIFO_CONSUMER_INDEX 0x543F058
+
+#define mmNIC0_QPC1_ERR_FIFO_MASK 0x543F05C
+
+#define mmNIC0_QPC1_ERR_FIFO_CREDIT 0x543F060
+
+#define mmNIC0_QPC1_ERR_FIFO_CFG 0x543F064
+
+#define mmNIC0_QPC1_ERR_FIFO_INTR_MASK 0x543F068
+
+#define mmNIC0_QPC1_ERR_FIFO_BASE_ADDR_63_32 0x543F06C
+
+#define mmNIC0_QPC1_ERR_FIFO_BASE_ADDR_31_7 0x543F070
+
+#define mmNIC0_QPC1_GW_BUSY 0x543F080
+
+#define mmNIC0_QPC1_GW_CTRL 0x543F084
+
+#define mmNIC0_QPC1_GW_DATA_0 0x543F08C
+
+#define mmNIC0_QPC1_GW_DATA_1 0x543F090
+
+#define mmNIC0_QPC1_GW_DATA_2 0x543F094
+
+#define mmNIC0_QPC1_GW_DATA_3 0x543F098
+
+#define mmNIC0_QPC1_GW_DATA_4 0x543F09C
+
+#define mmNIC0_QPC1_GW_DATA_5 0x543F0A0
+
+#define mmNIC0_QPC1_GW_DATA_6 0x543F0A4
+
+#define mmNIC0_QPC1_GW_DATA_7 0x543F0A8
+
+#define mmNIC0_QPC1_GW_DATA_8 0x543F0AC
+
+#define mmNIC0_QPC1_GW_DATA_9 0x543F0B0
+
+#define mmNIC0_QPC1_GW_DATA_10 0x543F0B4
+
+#define mmNIC0_QPC1_GW_DATA_11 0x543F0B8
+
+#define mmNIC0_QPC1_GW_DATA_12 0x543F0BC
+
+#define mmNIC0_QPC1_GW_DATA_13 0x543F0C0
+
+#define mmNIC0_QPC1_GW_DATA_14 0x543F0C4
+
+#define mmNIC0_QPC1_GW_DATA_15 0x543F0C8
+
+#define mmNIC0_QPC1_GW_DATA_16 0x543F0CC
+
+#define mmNIC0_QPC1_GW_DATA_17 0x543F0D0
+
+#define mmNIC0_QPC1_GW_DATA_18 0x543F0D4
+
+#define mmNIC0_QPC1_GW_DATA_19 0x543F0D8
+
+#define mmNIC0_QPC1_GW_DATA_20 0x543F0DC
+
+#define mmNIC0_QPC1_GW_DATA_21 0x543F0E0
+
+#define mmNIC0_QPC1_GW_DATA_22 0x543F0E4
+
+#define mmNIC0_QPC1_GW_DATA_23 0x543F0E8
+
+#define mmNIC0_QPC1_GW_DATA_24 0x543F0EC
+
+#define mmNIC0_QPC1_GW_DATA_25 0x543F0F0
+
+#define mmNIC0_QPC1_GW_DATA_26 0x543F0F4
+
+#define mmNIC0_QPC1_GW_DATA_27 0x543F0F8
+
+#define mmNIC0_QPC1_GW_DATA_28 0x543F0FC
+
+#define mmNIC0_QPC1_GW_DATA_29 0x543F100
+
+#define mmNIC0_QPC1_GW_DATA_30 0x543F104
+
+#define mmNIC0_QPC1_GW_DATA_31 0x543F108
+
+#define mmNIC0_QPC1_GW_MASK_0 0x543F124
+
+#define mmNIC0_QPC1_GW_MASK_1 0x543F128
+
+#define mmNIC0_QPC1_GW_MASK_2 0x543F12C
+
+#define mmNIC0_QPC1_GW_MASK_3 0x543F130
+
+#define mmNIC0_QPC1_GW_MASK_4 0x543F134
+
+#define mmNIC0_QPC1_GW_MASK_5 0x543F138
+
+#define mmNIC0_QPC1_GW_MASK_6 0x543F13C
+
+#define mmNIC0_QPC1_GW_MASK_7 0x543F140
+
+#define mmNIC0_QPC1_GW_MASK_8 0x543F144
+
+#define mmNIC0_QPC1_GW_MASK_9 0x543F148
+
+#define mmNIC0_QPC1_GW_MASK_10 0x543F14C
+
+#define mmNIC0_QPC1_GW_MASK_11 0x543F150
+
+#define mmNIC0_QPC1_GW_MASK_12 0x543F154
+
+#define mmNIC0_QPC1_GW_MASK_13 0x543F158
+
+#define mmNIC0_QPC1_GW_MASK_14 0x543F15C
+
+#define mmNIC0_QPC1_GW_MASK_15 0x543F160
+
+#define mmNIC0_QPC1_GW_MASK_16 0x543F164
+
+#define mmNIC0_QPC1_GW_MASK_17 0x543F168
+
+#define mmNIC0_QPC1_GW_MASK_18 0x543F16C
+
+#define mmNIC0_QPC1_GW_MASK_19 0x543F170
+
+#define mmNIC0_QPC1_GW_MASK_20 0x543F174
+
+#define mmNIC0_QPC1_GW_MASK_21 0x543F178
+
+#define mmNIC0_QPC1_GW_MASK_22 0x543F17C
+
+#define mmNIC0_QPC1_GW_MASK_23 0x543F180
+
+#define mmNIC0_QPC1_GW_MASK_24 0x543F184
+
+#define mmNIC0_QPC1_GW_MASK_25 0x543F188
+
+#define mmNIC0_QPC1_GW_MASK_26 0x543F18C
+
+#define mmNIC0_QPC1_GW_MASK_27 0x543F190
+
+#define mmNIC0_QPC1_GW_MASK_28 0x543F194
+
+#define mmNIC0_QPC1_GW_MASK_29 0x543F198
+
+#define mmNIC0_QPC1_GW_MASK_30 0x543F19C
+
+#define mmNIC0_QPC1_GW_MASK_31 0x543F1A0
+
+#define mmNIC0_QPC1_CC_TIMEOUT 0x543F1B0
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_EN 0x543F1FC
+
+#define mmNIC0_QPC1_CC_TICK_WRAP 0x543F200
+
+#define mmNIC0_QPC1_CC_ROLLBACK 0x543F204
+
+#define mmNIC0_QPC1_CC_MAX_WINDOW_SIZE 0x543F208
+
+#define mmNIC0_QPC1_CC_MIN_WINDOW_SIZE 0x543F20C
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_0 0x543F210
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_1 0x543F214
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_2 0x543F218
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_3 0x543F21C
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_4 0x543F220
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_5 0x543F224
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_6 0x543F228
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_7 0x543F22C
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_8 0x543F230
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_9 0x543F234
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_10 0x543F238
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_11 0x543F23C
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_12 0x543F240
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_13 0x543F244
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_14 0x543F248
+
+#define mmNIC0_QPC1_CC_ALPHA_LINEAR_15 0x543F24C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_0 0x543F250
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_1 0x543F254
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_2 0x543F258
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_3 0x543F25C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_4 0x543F260
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_5 0x543F264
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_6 0x543F268
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_7 0x543F26C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_8 0x543F270
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_9 0x543F274
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_10 0x543F278
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_11 0x543F27C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_12 0x543F280
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_13 0x543F284
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_14 0x543F288
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_15 0x543F28C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_0 0x543F290
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_1 0x543F294
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_2 0x543F298
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_3 0x543F29C
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_4 0x543F2A0
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_5 0x543F2A4
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_6 0x543F2A8
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_7 0x543F2AC
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_8 0x543F2B0
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_9 0x543F2B4
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_10 0x543F2B8
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_11 0x543F2BC
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_12 0x543F2C0
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_13 0x543F2C4
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_14 0x543F2C8
+
+#define mmNIC0_QPC1_CC_ALPHA_LOG_THRESHOLD_15 0x543F2CC
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_0 0x543F2D0
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_1 0x543F2D4
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_2 0x543F2D8
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_3 0x543F2DC
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_4 0x543F2E0
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_5 0x543F2E4
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_6 0x543F2E8
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_7 0x543F2EC
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_8 0x543F2F0
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_9 0x543F2F4
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_10 0x543F2F8
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_11 0x543F2FC
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_12 0x543F300
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_13 0x543F304
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_14 0x543F308
+
+#define mmNIC0_QPC1_CC_WINDOW_INC_15 0x543F30C
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_0 0x543F310
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_1 0x543F314
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_2 0x543F318
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_3 0x543F31C
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_4 0x543F320
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_5 0x543F324
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_6 0x543F328
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_7 0x543F32C
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_8 0x543F330
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_9 0x543F334
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_10 0x543F338
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_11 0x543F33C
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_12 0x543F340
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_13 0x543F344
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_14 0x543F348
+
+#define mmNIC0_QPC1_CC_WINDOW_IN_THRESHOLD_15 0x543F34C
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_0 0x543F360
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_1 0x543F364
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_2 0x543F368
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_3 0x543F36C
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_4 0x543F370
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_5 0x543F374
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_6 0x543F378
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_7 0x543F37C
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_8 0x543F380
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_9 0x543F384
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_10 0x543F388
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_11 0x543F38C
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_12 0x543F390
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_13 0x543F394
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_14 0x543F398
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_15 0x543F39C
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_16 0x543F3A0
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_17 0x543F3A4
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_18 0x543F3A8
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_19 0x543F3AC
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_20 0x543F3B0
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_21 0x543F3B4
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_22 0x543F3B8
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_23 0x543F3BC
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_24 0x543F3C0
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_25 0x543F3C4
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_26 0x543F3C8
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_27 0x543F3CC
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_28 0x543F3D0
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_29 0x543F3D4
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_30 0x543F3D8
+
+#define mmNIC0_QPC1_DB_FIFO_USER_OVRD_31 0x543F3DC
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_0 0x543F3E0
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_1 0x543F3E4
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_2 0x543F3E8
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_3 0x543F3EC
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_4 0x543F3F0
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_5 0x543F3F4
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_6 0x543F3F8
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_7 0x543F3FC
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_8 0x543F400
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_9 0x543F404
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_10 0x543F408
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_11 0x543F40C
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_12 0x543F410
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_13 0x543F414
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_14 0x543F418
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_15 0x543F41C
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_16 0x543F420
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_17 0x543F424
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_18 0x543F428
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_19 0x543F42C
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_20 0x543F430
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_21 0x543F434
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_22 0x543F438
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_23 0x543F43C
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_24 0x543F440
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_25 0x543F444
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_26 0x543F448
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_27 0x543F44C
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_28 0x543F450
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_29 0x543F454
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_30 0x543F458
+
+#define mmNIC0_QPC1_DB_FIFO_CFG_31 0x543F45C
+
+#define mmNIC0_QPC1_SECURED_DB_FIRST32 0x543F460
+
+#define mmNIC0_QPC1_SECURED_DB_SECOND32 0x543F464
+
+#define mmNIC0_QPC1_SECURED_DB_THIRD32 0x543F468
+
+#define mmNIC0_QPC1_SECURED_DB_FOURTH32 0x543F46C
+
+#define mmNIC0_QPC1_PRIVILEGE_DB_FIRST32 0x543F470
+
+#define mmNIC0_QPC1_PRIVILEGE_DB_SECOND32 0x543F474
+
+#define mmNIC0_QPC1_PRIVILEGE_DB_THIRD32 0x543F478
+
+#define mmNIC0_QPC1_PRIVILEGE_DB_FOURTH32 0x543F47C
+
+#define mmNIC0_QPC1_DBG_INDICATION 0x543F480
+
+#define mmNIC0_QPC1_WTD_WC_FSM 0x543F484
+
+#define mmNIC0_QPC1_WTD_SLICE_FSM 0x543F488
+
+#define mmNIC0_QPC1_REQ_TX_EMPTY_CNT 0x543F48C
+
+#define mmNIC0_QPC1_RES_TX_EMPTY_CNT 0x543F490
+
+#define mmNIC0_QPC1_NUM_ROLLBACKS 0x543F494
+
+#define mmNIC0_QPC1_LAST_QP_ROLLED_BACK 0x543F498
+
+#define mmNIC0_QPC1_NUM_TIMEOUTS 0x543F49C
+
+#define mmNIC0_QPC1_LAST_QP_TIMED_OUT 0x543F4A0
+
+#define mmNIC0_QPC1_WTD_SLICE_FSM_HI 0x543F4A4
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_0 0x543F4B0
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_1 0x543F4B4
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_2 0x543F4B8
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_3 0x543F4BC
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_4 0x543F4C0
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_5 0x543F4C4
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_6 0x543F4C8
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_7 0x543F4CC
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_8 0x543F4D0
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_9 0x543F4D4
+
+#define mmNIC0_QPC1_INTERRUPT_BASE_10 0x543F4D8
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_0 0x543F4DC
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_1 0x543F4E0
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_2 0x543F4E4
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_3 0x543F4E8
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_4 0x543F4EC
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_5 0x543F4F0
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_6 0x543F4F4
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_7 0x543F4F8
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_8 0x543F4FC
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_9 0x543F500
+
+#define mmNIC0_QPC1_INTERRUPT_DATA_10 0x543F504
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_0 0x543F600
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_1 0x543F604
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_2 0x543F608
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_3 0x543F60C
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_4 0x543F610
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_5 0x543F614
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_6 0x543F618
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_7 0x543F61C
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_8 0x543F620
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_9 0x543F624
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_10 0x543F628
+
+#define mmNIC0_QPC1_DBG_COUNT_SELECT_11 0x543F62C
+
+#define mmNIC0_QPC1_DOORBELL_SECURITY 0x543F648
+
+#define mmNIC0_QPC1_DBG_CFG 0x543F64C
+
+#define mmNIC0_QPC1_RES_RING0_PI 0x543F650
+
+#define mmNIC0_QPC1_RES_RING0_CI 0x543F654
+
+#define mmNIC0_QPC1_RES_RING0_CFG 0x543F658
+
+#define mmNIC0_QPC1_RES_RING1_PI 0x543F65C
+
+#define mmNIC0_QPC1_RES_RING1_CI 0x543F660
+
+#define mmNIC0_QPC1_RES_RING1_CFG 0x543F664
+
+#define mmNIC0_QPC1_RES_RING2_PI 0x543F668
+
+#define mmNIC0_QPC1_RES_RING2_CI 0x543F66C
+
+#define mmNIC0_QPC1_RES_RING2_CFG 0x543F670
+
+#define mmNIC0_QPC1_RES_RING3_PI 0x543F674
+
+#define mmNIC0_QPC1_RES_RING3_CI 0x543F678
+
+#define mmNIC0_QPC1_RES_RING3_CFG 0x543F67C
+
+#define mmNIC0_QPC1_REQ_RING0_CI 0x543F680
+
+#define mmNIC0_QPC1_REQ_RING1_CI 0x543F684
+
+#define mmNIC0_QPC1_REQ_RING2_CI 0x543F688
+
+#define mmNIC0_QPC1_REQ_RING3_CI 0x543F68C
+
+#define mmNIC0_QPC1_INTERRUPT_CAUSE 0x543F690
+
+#define mmNIC0_QPC1_INTERRUPT_MASK 0x543F694
+
+#define mmNIC0_QPC1_INTERRUPT_CLR 0x543F698
+
+#define mmNIC0_QPC1_INTERRUPT_EN 0x543F69C
+
+#define mmNIC0_QPC1_INTERRUPT_CFG 0x543F6F0
+
+#define mmNIC0_QPC1_INTERRUPT_RESP_ERR_CAUSE 0x543F6F4
+
+#define mmNIC0_QPC1_INTERRUPT_RESP_ERR_MASK 0x543F6F8
+
+#define mmNIC0_QPC1_INTERRUPR_RESP_ERR_CLR 0x543F700
+
+#define mmNIC0_QPC1_TMR_GW_VALID 0x543F704
+
+#define mmNIC0_QPC1_TMR_GW_DATA0 0x543F708
+
+#define mmNIC0_QPC1_TMR_GW_DATA1 0x543F70C
+
+#define mmNIC0_QPC1_RNR_RETRY_COUNT_EN 0x543F710
+
+#define mmNIC0_QPC1_EVENT_QUE_BASE_ADDR_63_32 0x543F830
+
+#define mmNIC0_QPC1_EVENT_QUE_BASE_ADDR_31_7 0x543F834
+
+#define mmNIC0_QPC1_EVENT_QUE_LOG_SIZE 0x543F838
+
+#define mmNIC0_QPC1_EVENT_QUE_WRITE_INDEX 0x543F83C
+
+#define mmNIC0_QPC1_EVENT_QUE_PRODUCER_INDEX 0x543F840
+
+#define mmNIC0_QPC1_EVENT_QUE_PI_ADDR_63_32 0x543F844
+
+#define mmNIC0_QPC1_EVENT_QUE_PI_ADDR_31_7 0x543F848
+
+#define mmNIC0_QPC1_EVENT_QUE_CONSUMER_INDEX_CB 0x543F84C
+
+#define mmNIC0_QPC1_EVENT_QUE_CFG 0x543F850
+
+#define mmNIC0_QPC1_LBW_PROT 0x543F858
+
+#define mmNIC0_QPC1_MEM_WRITE_INIT 0x543F85C
+
+#define mmNIC0_QPC1_QMAN_DOORBELL 0x543F8E8
+
+#define mmNIC0_QPC1_QMAN_DOORBELL_QPN 0x543F8EC
+
+#define mmNIC0_QPC1_SECURED_CQ_NUMBER 0x543F8F0
+
+#define mmNIC0_QPC1_SECURED_CQ_CONSUMER_INDEX 0x543F8F4
+
+#define mmNIC0_QPC1_PRIVILEGE_CQ_NUMBER 0x543F8F8
+
+#define mmNIC0_QPC1_PRIVILEGE_CQ_CONSUMER_INDEX 0x543F8FC
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_63_32_0 0x543F900
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_63_32_1 0x543F904
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_63_32_2 0x543F908
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_63_32_3 0x543F90C
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_31_0_0 0x543F910
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_31_0_1 0x543F914
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_31_0_2 0x543F918
+
+#define mmNIC0_QPC1_TX_WQ_BASE_ADDR_31_0_3 0x543F91C
+
+#define mmNIC0_QPC1_LOG_MAX_TX_WQ_SIZE_0 0x543F920
+
+#define mmNIC0_QPC1_LOG_MAX_TX_WQ_SIZE_1 0x543F924
+
+#define mmNIC0_QPC1_LOG_MAX_TX_WQ_SIZE_2 0x543F928
+
+#define mmNIC0_QPC1_LOG_MAX_TX_WQ_SIZE_3 0x543F92C
+
+#define mmNIC0_QPC1_MMU_BYPASS_TX_WQ_0 0x543F930
+
+#define mmNIC0_QPC1_MMU_BYPASS_TX_WQ_1 0x543F934
+
+#define mmNIC0_QPC1_MMU_BYPASS_TX_WQ_2 0x543F938
+
+#define mmNIC0_QPC1_MMU_BYPASS_TX_WQ_3 0x543F93C
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_63_32_0 0x543F940
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_63_32_1 0x543F944
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_63_32_2 0x543F948
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_63_32_3 0x543F94C
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_31_0_0 0x543F950
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_31_0_1 0x543F954
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_31_0_2 0x543F958
+
+#define mmNIC0_QPC1_RX_WQ_BASE_ADDR_31_0_3 0x543F95C
+
+#define mmNIC0_QPC1_LOG_MAX_RX_WQ_SIZE_0 0x543F960
+
+#define mmNIC0_QPC1_LOG_MAX_RX_WQ_SIZE_1 0x543F964
+
+#define mmNIC0_QPC1_LOG_MAX_RX_WQ_SIZE_2 0x543F968
+
+#define mmNIC0_QPC1_LOG_MAX_RX_WQ_SIZE_3 0x543F96C
+
+#define mmNIC0_QPC1_MMU_BYPASS_RX_WQ_0 0x543F970
+
+#define mmNIC0_QPC1_MMU_BYPASS_RX_WQ_1 0x543F974
+
+#define mmNIC0_QPC1_MMU_BYPASS_RX_WQ_2 0x543F978
+
+#define mmNIC0_QPC1_MMU_BYPASS_RX_WQ_3 0x543F97C
+
+#define mmNIC0_QPC1_WQE_MEM_WRITE_AXI_PROT 0x543F980
+
+#define mmNIC0_QPC1_WQ_UPPER_THRESHOLD 0x543F984
+
+#define mmNIC0_QPC1_WQ_LOWER_THRESHOLD 0x543F988
+
+#define mmNIC0_QPC1_WQ_BP_2ARC_ADDR 0x543F98C
+
+#define mmNIC0_QPC1_WQ_BP_2QMAN_ADDR 0x543F990
+
+#define mmNIC0_QPC1_WTD_CONFIG 0x543F994
+
+#define mmNIC0_QPC1_REQTX_ERR_FIFO_PUSH_63_32 0x543F998
+
+#define mmNIC0_QPC1_REQTX_ERR_FIFO_PUSH_31_0 0x543F99C
+
+#define mmNIC0_QPC1_REQTX_ERR_QP_STATE_63_32 0x543F9A0
+
+#define mmNIC0_QPC1_REQTX_ERR_QP_STATE_31_0 0x543F9A4
+
+#define mmNIC0_QPC1_EVENT_QUE_CONSUMER_INDEX 0x543F9A8
+
+#define mmNIC0_QPC1_ARM_CQ_NUM 0x543F9AC
+
+#define mmNIC0_QPC1_ARM_CQ_INDEX 0x543F9B0
+
+#define mmNIC0_QPC1_QPC_CLOCK_GATE 0x543F9B4
+
+#define mmNIC0_QPC1_QPC_CLOCK_GATE_DIS 0x543F9B8
+
+#define mmNIC0_QPC1_CONG_QUE_BASE_ADDR_63_32 0x543F9BC
+
+#define mmNIC0_QPC1_CONG_QUE_BASE_ADDR_31_7 0x543F9C0
+
+#define mmNIC0_QPC1_CONG_QUE_LOG_SIZE 0x543F9C4
+
+#define mmNIC0_QPC1_CONG_QUE_WRITE_INDEX 0x543F9C8
+
+#define mmNIC0_QPC1_CONG_QUE_PRODUCER_INDEX 0x543F9CC
+
+#define mmNIC0_QPC1_CONG_QUE_PI_ADDR_63_32 0x543F9D0
+
+#define mmNIC0_QPC1_CONG_QUE_PI_ADDR_31_7 0x543F9D4
+
+#define mmNIC0_QPC1_CONG_QUE_CONSUMER_INDEX_CB 0x543F9D8
+
+#define mmNIC0_QPC1_CONG_QUE_CFG 0x543F9DC
+
+#define mmNIC0_QPC1_CONG_QUE_CONSUMER_INDEX 0x543F9E0
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_0 0x543FA00
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_1 0x543FA04
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_2 0x543FA08
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_3 0x543FA0C
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_4 0x543FA10
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_5 0x543FA14
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_6 0x543FA18
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_7 0x543FA1C
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_8 0x543FA20
+
+#define mmNIC0_QPC1_LINEAR_WQE_STATIC_9 0x543FA24
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_0 0x543FA40
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_1 0x543FA44
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_2 0x543FA48
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_3 0x543FA4C
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_4 0x543FA50
+
+#define mmNIC0_QPC1_LINEAR_WQE_DYNAMIC_5 0x543FA54
+
+#define mmNIC0_QPC1_LINEAR_WQE_QPN 0x543FA58
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_0 0x543FA80
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_1 0x543FA84
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_2 0x543FA88
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_3 0x543FA8C
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_4 0x543FA90
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_5 0x543FA94
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_6 0x543FA98
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_7 0x543FA9C
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_8 0x543FAA0
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_9 0x543FAA4
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_10 0x543FAA8
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_11 0x543FAAC
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_12 0x543FAB0
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_13 0x543FAB4
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_14 0x543FAB8
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_15 0x543FABC
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_16 0x543FAC0
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_STATIC_17 0x543FAC4
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_0 0x543FAE0
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_1 0x543FAE4
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_2 0x543FAE8
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_3 0x543FAEC
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_4 0x543FAF0
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_DYNAMIC_5 0x543FAF4
+
+#define mmNIC0_QPC1_MULTI_STRIDE_WQE_QPN 0x543FAF8
+
+#endif /* ASIC_REG_NIC0_QPC1_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h
new file mode 100644
index 000000000000..054414222ae1
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h
@@ -0,0 +1,725 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_RXE0_REGS_H_
+#define ASIC_REG_NIC0_RXE0_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_RXE0
+ * (Prototype: NIC_RXE)
+ *****************************************
+ */
+
+#define mmNIC0_RXE0_CONTROL 0x544A000
+
+#define mmNIC0_RXE0_SCATTER_CFG 0x544A004
+
+#define mmNIC0_RXE0_SCATTER_CQ_ADDR 0x544A008
+
+#define mmNIC0_RXE0_RAW_QPN_P0_0 0x544A010
+
+#define mmNIC0_RXE0_RAW_QPN_P0_1 0x544A014
+
+#define mmNIC0_RXE0_RAW_QPN_P1_0 0x544A018
+
+#define mmNIC0_RXE0_RAW_QPN_P1_1 0x544A01C
+
+#define mmNIC0_RXE0_RAW_QPN_P2_0 0x544A020
+
+#define mmNIC0_RXE0_RAW_QPN_P2_1 0x544A024
+
+#define mmNIC0_RXE0_RAW_QPN_P3_0 0x544A028
+
+#define mmNIC0_RXE0_RAW_QPN_P3_1 0x544A02C
+
+#define mmNIC0_RXE0_RXE_CHECKS 0x544A030
+
+#define mmNIC0_RXE0_PKT_DROP 0x544A034
+
+#define mmNIC0_RXE0_PKT_SIZE_CHECK_RC 0x544A038
+
+#define mmNIC0_RXE0_PKT_SIZE_CHECK_RAW 0x544A03C
+
+#define mmNIC0_RXE0_ARUSER_MMU_BP 0x544A064
+
+#define mmNIC0_RXE0_AWUSER_LBW 0x544A068
+
+#define mmNIC0_RXE0_ARPROT_HBW 0x544A070
+
+#define mmNIC0_RXE0_AWPROT_LBW 0x544A074
+
+#define mmNIC0_RXE0_WIN0_WQ_BASE_LO 0x544A080
+
+#define mmNIC0_RXE0_WIN0_WQ_BASE_HI 0x544A084
+
+#define mmNIC0_RXE0_WIN0_WQ_MISC 0x544A088
+
+#define mmNIC0_RXE0_WIN1_WQ_BASE_LO 0x544A090
+
+#define mmNIC0_RXE0_WIN1_WQ_BASE_HI 0x544A094
+
+#define mmNIC0_RXE0_WIN1_WQ_MISC 0x544A098
+
+#define mmNIC0_RXE0_WIN2_WQ_BASE_LO 0x544A0A0
+
+#define mmNIC0_RXE0_WIN2_WQ_BASE_HI 0x544A0A4
+
+#define mmNIC0_RXE0_WIN2_WQ_MISC 0x544A0A8
+
+#define mmNIC0_RXE0_WIN3_WQ_BASE_LO 0x544A0B0
+
+#define mmNIC0_RXE0_WIN3_WQ_BASE_HI 0x544A0B4
+
+#define mmNIC0_RXE0_WIN3_WQ_MISC 0x544A0B8
+
+#define mmNIC0_RXE0_CG 0x544A0D0
+
+#define mmNIC0_RXE0_CG_TIMER 0x544A0D4
+
+#define mmNIC0_RXE0_WQE_WQ_WR_OP_DISABLE 0x544A0D8
+
+#define mmNIC0_RXE0_WQE_WQ_RDV_OP_DISABLE 0x544A0DC
+
+#define mmNIC0_RXE0_WQE_WQ_RD_OP_DISABLE 0x544A0E0
+
+#define mmNIC0_RXE0_WQE_MAX_WRITE_SEND_SIZE 0x544A0E4
+
+#define mmNIC0_RXE0_WQE_MAX_MULTI_STRIDE_SIZE 0x544A0E8
+
+#define mmNIC0_RXE0_CACHE_CFG 0x544A0F0
+
+#define mmNIC0_RXE0_CACHE_INFO 0x544A0F4
+
+#define mmNIC0_RXE0_CACHE_ADDR_LO 0x544A0F8
+
+#define mmNIC0_RXE0_CACHE_ADDR_HI 0x544A0FC
+
+#define mmNIC0_RXE0_CQ_BASE_ADDR_31_7 0x544A100
+
+#define mmNIC0_RXE0_CQ_BASE_ADDR_63_32 0x544A104
+
+#define mmNIC0_RXE0_CQ_LOG_MAX_SIZE 0x544A108
+
+#define mmNIC0_RXE0_CQ_ARM_TIMEOUT_EN 0x544A110
+
+#define mmNIC0_RXE0_CQ_ARM_TIMEOUT 0x544A114
+
+#define mmNIC0_RXE0_CQ_CFG_0 0x544A180
+
+#define mmNIC0_RXE0_CQ_CFG_1 0x544A184
+
+#define mmNIC0_RXE0_CQ_CFG_2 0x544A188
+
+#define mmNIC0_RXE0_CQ_CFG_3 0x544A18C
+
+#define mmNIC0_RXE0_CQ_CFG_4 0x544A190
+
+#define mmNIC0_RXE0_CQ_CFG_5 0x544A194
+
+#define mmNIC0_RXE0_CQ_CFG_6 0x544A198
+
+#define mmNIC0_RXE0_CQ_CFG_7 0x544A19C
+
+#define mmNIC0_RXE0_CQ_CFG_8 0x544A1A0
+
+#define mmNIC0_RXE0_CQ_CFG_9 0x544A1A4
+
+#define mmNIC0_RXE0_CQ_CFG_10 0x544A1A8
+
+#define mmNIC0_RXE0_CQ_CFG_11 0x544A1AC
+
+#define mmNIC0_RXE0_CQ_CFG_12 0x544A1B0
+
+#define mmNIC0_RXE0_CQ_CFG_13 0x544A1B4
+
+#define mmNIC0_RXE0_CQ_CFG_14 0x544A1B8
+
+#define mmNIC0_RXE0_CQ_CFG_15 0x544A1BC
+
+#define mmNIC0_RXE0_CQ_CFG_16 0x544A1C0
+
+#define mmNIC0_RXE0_CQ_CFG_17 0x544A1C4
+
+#define mmNIC0_RXE0_CQ_CFG_18 0x544A1C8
+
+#define mmNIC0_RXE0_CQ_CFG_19 0x544A1CC
+
+#define mmNIC0_RXE0_CQ_CFG_20 0x544A1D0
+
+#define mmNIC0_RXE0_CQ_CFG_21 0x544A1D4
+
+#define mmNIC0_RXE0_CQ_CFG_22 0x544A1D8
+
+#define mmNIC0_RXE0_CQ_CFG_23 0x544A1DC
+
+#define mmNIC0_RXE0_CQ_CFG_24 0x544A1E0
+
+#define mmNIC0_RXE0_CQ_CFG_25 0x544A1E4
+
+#define mmNIC0_RXE0_CQ_CFG_26 0x544A1E8
+
+#define mmNIC0_RXE0_CQ_CFG_27 0x544A1EC
+
+#define mmNIC0_RXE0_CQ_CFG_28 0x544A1F0
+
+#define mmNIC0_RXE0_CQ_CFG_29 0x544A1F4
+
+#define mmNIC0_RXE0_CQ_CFG_30 0x544A1F8
+
+#define mmNIC0_RXE0_CQ_CFG_31 0x544A1FC
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_0 0x544A200
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_1 0x544A204
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_2 0x544A208
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_3 0x544A20C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_4 0x544A210
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_5 0x544A214
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_6 0x544A218
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_7 0x544A21C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_8 0x544A220
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_9 0x544A224
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_10 0x544A228
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_11 0x544A22C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_12 0x544A230
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_13 0x544A234
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_14 0x544A238
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_15 0x544A23C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_16 0x544A240
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_17 0x544A244
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_18 0x544A248
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_19 0x544A24C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_20 0x544A250
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_21 0x544A254
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_22 0x544A258
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_23 0x544A25C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_24 0x544A260
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_25 0x544A264
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_26 0x544A268
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_27 0x544A26C
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_28 0x544A270
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_29 0x544A274
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_30 0x544A278
+
+#define mmNIC0_RXE0_CQ_WRITE_INDEX_31 0x544A27C
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_0 0x544A280
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_1 0x544A284
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_2 0x544A288
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_3 0x544A28C
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_4 0x544A290
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_5 0x544A294
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_6 0x544A298
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_7 0x544A29C
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_8 0x544A2A0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_9 0x544A2A4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_10 0x544A2A8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_11 0x544A2AC
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_12 0x544A2B0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_13 0x544A2B4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_14 0x544A2B8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_15 0x544A2BC
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_16 0x544A2C0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_17 0x544A2C4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_18 0x544A2C8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_19 0x544A2CC
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_20 0x544A2D0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_21 0x544A2D4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_22 0x544A2D8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_23 0x544A2DC
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_24 0x544A2E0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_25 0x544A2E4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_26 0x544A2E8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_27 0x544A2EC
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_28 0x544A2F0
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_29 0x544A2F4
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_30 0x544A2F8
+
+#define mmNIC0_RXE0_CQ_PRODUCER_INDEX_31 0x544A2FC
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_0 0x544A300
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_1 0x544A304
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_2 0x544A308
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_3 0x544A30C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_4 0x544A310
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_5 0x544A314
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_6 0x544A318
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_7 0x544A31C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_8 0x544A320
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_9 0x544A324
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_10 0x544A328
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_11 0x544A32C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_12 0x544A330
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_13 0x544A334
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_14 0x544A338
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_15 0x544A33C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_16 0x544A340
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_17 0x544A344
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_18 0x544A348
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_19 0x544A34C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_20 0x544A350
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_21 0x544A354
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_22 0x544A358
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_23 0x544A35C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_24 0x544A360
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_25 0x544A364
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_26 0x544A368
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_27 0x544A36C
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_28 0x544A370
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_29 0x544A374
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_30 0x544A378
+
+#define mmNIC0_RXE0_CQ_CONSUMER_INDEX_31 0x544A37C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_0 0x544A380
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_1 0x544A384
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_2 0x544A388
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_3 0x544A38C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_4 0x544A390
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_5 0x544A394
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_6 0x544A398
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_7 0x544A39C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_8 0x544A3A0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_9 0x544A3A4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_10 0x544A3A8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_11 0x544A3AC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_12 0x544A3B0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_13 0x544A3B4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_14 0x544A3B8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_15 0x544A3BC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_16 0x544A3C0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_17 0x544A3C4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_18 0x544A3C8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_19 0x544A3CC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_20 0x544A3D0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_21 0x544A3D4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_22 0x544A3D8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_23 0x544A3DC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_24 0x544A3E0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_25 0x544A3E4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_26 0x544A3E8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_27 0x544A3EC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_28 0x544A3F0
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_29 0x544A3F4
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_30 0x544A3F8
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_LO_31 0x544A3FC
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_0 0x544A400
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_1 0x544A404
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_2 0x544A408
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_3 0x544A40C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_4 0x544A410
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_5 0x544A414
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_6 0x544A418
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_7 0x544A41C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_8 0x544A420
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_9 0x544A424
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_10 0x544A428
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_11 0x544A42C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_12 0x544A430
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_13 0x544A434
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_14 0x544A438
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_15 0x544A43C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_16 0x544A440
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_17 0x544A444
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_18 0x544A448
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_19 0x544A44C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_20 0x544A450
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_21 0x544A454
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_22 0x544A458
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_23 0x544A45C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_24 0x544A460
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_25 0x544A464
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_26 0x544A468
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_27 0x544A46C
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_28 0x544A470
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_29 0x544A474
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_30 0x544A478
+
+#define mmNIC0_RXE0_CQ_PI_ADDR_HI_31 0x544A47C
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_0 0x544A480
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_1 0x544A484
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_2 0x544A488
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_3 0x544A48C
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_4 0x544A490
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_5 0x544A494
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_6 0x544A498
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_7 0x544A49C
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_8 0x544A4A0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_9 0x544A4A4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_10 0x544A4A8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_11 0x544A4AC
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_12 0x544A4B0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_13 0x544A4B4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_14 0x544A4B8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_15 0x544A4BC
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_16 0x544A4C0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_17 0x544A4C4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_18 0x544A4C8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_19 0x544A4CC
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_20 0x544A4D0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_21 0x544A4D4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_22 0x544A4D8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_23 0x544A4DC
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_24 0x544A4E0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_25 0x544A4E4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_26 0x544A4E8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_27 0x544A4EC
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_28 0x544A4F0
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_29 0x544A4F4
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_30 0x544A4F8
+
+#define mmNIC0_RXE0_CQ_AXI_PROT_31 0x544A4FC
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_0 0x544A500
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_1 0x544A504
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_2 0x544A508
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_3 0x544A50C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_4 0x544A510
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_5 0x544A514
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_6 0x544A518
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_7 0x544A51C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_8 0x544A520
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_9 0x544A524
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_10 0x544A528
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_11 0x544A52C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_12 0x544A530
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_13 0x544A534
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_14 0x544A538
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_15 0x544A53C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_16 0x544A540
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_17 0x544A544
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_18 0x544A548
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_19 0x544A54C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_20 0x544A550
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_21 0x544A554
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_22 0x544A558
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_23 0x544A55C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_24 0x544A560
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_25 0x544A564
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_26 0x544A568
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_27 0x544A56C
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_28 0x544A570
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_29 0x544A574
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_30 0x544A578
+
+#define mmNIC0_RXE0_CQ_LOG_SIZE_31 0x544A57C
+
+#define mmNIC0_RXE0_RDV_SEND_WQ_BASE_ADDR_LO 0x544A600
+
+#define mmNIC0_RXE0_RDV_SEND_WQ_BASE_ADDR_HI 0x544A604
+
+#define mmNIC0_RXE0_RDV_LOG_MAX_WQ_SIZE 0x544A608
+
+#define mmNIC0_RXE0_LBW_BASE_LO 0x544A700
+
+#define mmNIC0_RXE0_LBW_BASE_HI 0x544A704
+
+#define mmNIC0_RXE0_LBW_LOG_SIZE 0x544A708
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P0_0 0x544A710
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P0_1 0x544A714
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P0_0 0x544A720
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P0_1 0x544A724
+
+#define mmNIC0_RXE0_RAW_MISC_P0_0 0x544A730
+
+#define mmNIC0_RXE0_RAW_MISC_P0_1 0x544A734
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P1_0 0x544A750
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P1_1 0x544A754
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P1_0 0x544A760
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P1_1 0x544A764
+
+#define mmNIC0_RXE0_RAW_MISC_P1_0 0x544A770
+
+#define mmNIC0_RXE0_RAW_MISC_P1_1 0x544A774
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P2_0 0x544A790
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P2_1 0x544A794
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P2_0 0x544A7A0
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P2_1 0x544A7A4
+
+#define mmNIC0_RXE0_RAW_MISC_P2_0 0x544A7B0
+
+#define mmNIC0_RXE0_RAW_MISC_P2_1 0x544A7B4
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P3_0 0x544A7D0
+
+#define mmNIC0_RXE0_RAW_BASE_LO_P3_1 0x544A7D4
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P3_0 0x544A7E0
+
+#define mmNIC0_RXE0_RAW_BASE_HI_P3_1 0x544A7E4
+
+#define mmNIC0_RXE0_RAW_MISC_P3_0 0x544A7F0
+
+#define mmNIC0_RXE0_RAW_MISC_P3_1 0x544A7F4
+
+#define mmNIC0_RXE0_SEI_INTR_CAUSE 0x544A800
+
+#define mmNIC0_RXE0_SEI_INTR_MASK 0x544A804
+
+#define mmNIC0_RXE0_SEI_INTR_CLEAR 0x544A808
+
+#define mmNIC0_RXE0_SPI_INTR_CAUSE 0x544A810
+
+#define mmNIC0_RXE0_SPI_INTR_MASK 0x544A814
+
+#define mmNIC0_RXE0_SPI_INTR_CLEAR 0x544A818
+
+#define mmNIC0_RXE0_DBG_SPMU_SELECT 0x544AA00
+
+#define mmNIC0_RXE0_DBG_INV_OP_0 0x544AA04
+
+#define mmNIC0_RXE0_DBG_INV_OP_1 0x544AA08
+
+#define mmNIC0_RXE0_DBG_AXI_ERR 0x544AA0C
+
+#define mmNIC0_RXE0_DBG_AXI_CQE_ERR 0x544AA10
+
+#define mmNIC0_RXE0_DBG_AXI_LBW_ERR 0x544AA14
+
+#define mmNIC0_RXE0_DBG_EN 0x544AA18
+
+#define mmNIC0_RXE0_DBG_CQ_ARM_ON 0x544AA1C
+
+#define mmNIC0_RXE0_DBG_CQ_ARM_SEL 0x544AA20
+
+#define mmNIC0_RXE0_DBG_CQ_ARM_IDX 0x544AA24
+
+#define mmNIC0_RXE0_DBG_SLICE_MAIN 0x544AA28
+
+#define mmNIC0_RXE0_DBG_SLICE_SCT 0x544AA2C
+
+#endif /* ASIC_REG_NIC0_RXE0_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h
new file mode 100644
index 000000000000..9cc3fbd5ec51
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h
@@ -0,0 +1,725 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_RXE1_REGS_H_
+#define ASIC_REG_NIC0_RXE1_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_RXE1
+ * (Prototype: NIC_RXE)
+ *****************************************
+ */
+
+#define mmNIC0_RXE1_CONTROL 0x544B000
+
+#define mmNIC0_RXE1_SCATTER_CFG 0x544B004
+
+#define mmNIC0_RXE1_SCATTER_CQ_ADDR 0x544B008
+
+#define mmNIC0_RXE1_RAW_QPN_P0_0 0x544B010
+
+#define mmNIC0_RXE1_RAW_QPN_P0_1 0x544B014
+
+#define mmNIC0_RXE1_RAW_QPN_P1_0 0x544B018
+
+#define mmNIC0_RXE1_RAW_QPN_P1_1 0x544B01C
+
+#define mmNIC0_RXE1_RAW_QPN_P2_0 0x544B020
+
+#define mmNIC0_RXE1_RAW_QPN_P2_1 0x544B024
+
+#define mmNIC0_RXE1_RAW_QPN_P3_0 0x544B028
+
+#define mmNIC0_RXE1_RAW_QPN_P3_1 0x544B02C
+
+#define mmNIC0_RXE1_RXE_CHECKS 0x544B030
+
+#define mmNIC0_RXE1_PKT_DROP 0x544B034
+
+#define mmNIC0_RXE1_PKT_SIZE_CHECK_RC 0x544B038
+
+#define mmNIC0_RXE1_PKT_SIZE_CHECK_RAW 0x544B03C
+
+#define mmNIC0_RXE1_ARUSER_MMU_BP 0x544B064
+
+#define mmNIC0_RXE1_AWUSER_LBW 0x544B068
+
+#define mmNIC0_RXE1_ARPROT_HBW 0x544B070
+
+#define mmNIC0_RXE1_AWPROT_LBW 0x544B074
+
+#define mmNIC0_RXE1_WIN0_WQ_BASE_LO 0x544B080
+
+#define mmNIC0_RXE1_WIN0_WQ_BASE_HI 0x544B084
+
+#define mmNIC0_RXE1_WIN0_WQ_MISC 0x544B088
+
+#define mmNIC0_RXE1_WIN1_WQ_BASE_LO 0x544B090
+
+#define mmNIC0_RXE1_WIN1_WQ_BASE_HI 0x544B094
+
+#define mmNIC0_RXE1_WIN1_WQ_MISC 0x544B098
+
+#define mmNIC0_RXE1_WIN2_WQ_BASE_LO 0x544B0A0
+
+#define mmNIC0_RXE1_WIN2_WQ_BASE_HI 0x544B0A4
+
+#define mmNIC0_RXE1_WIN2_WQ_MISC 0x544B0A8
+
+#define mmNIC0_RXE1_WIN3_WQ_BASE_LO 0x544B0B0
+
+#define mmNIC0_RXE1_WIN3_WQ_BASE_HI 0x544B0B4
+
+#define mmNIC0_RXE1_WIN3_WQ_MISC 0x544B0B8
+
+#define mmNIC0_RXE1_CG 0x544B0D0
+
+#define mmNIC0_RXE1_CG_TIMER 0x544B0D4
+
+#define mmNIC0_RXE1_WQE_WQ_WR_OP_DISABLE 0x544B0D8
+
+#define mmNIC0_RXE1_WQE_WQ_RDV_OP_DISABLE 0x544B0DC
+
+#define mmNIC0_RXE1_WQE_WQ_RD_OP_DISABLE 0x544B0E0
+
+#define mmNIC0_RXE1_WQE_MAX_WRITE_SEND_SIZE 0x544B0E4
+
+#define mmNIC0_RXE1_WQE_MAX_MULTI_STRIDE_SIZE 0x544B0E8
+
+#define mmNIC0_RXE1_CACHE_CFG 0x544B0F0
+
+#define mmNIC0_RXE1_CACHE_INFO 0x544B0F4
+
+#define mmNIC0_RXE1_CACHE_ADDR_LO 0x544B0F8
+
+#define mmNIC0_RXE1_CACHE_ADDR_HI 0x544B0FC
+
+#define mmNIC0_RXE1_CQ_BASE_ADDR_31_7 0x544B100
+
+#define mmNIC0_RXE1_CQ_BASE_ADDR_63_32 0x544B104
+
+#define mmNIC0_RXE1_CQ_LOG_MAX_SIZE 0x544B108
+
+#define mmNIC0_RXE1_CQ_ARM_TIMEOUT_EN 0x544B110
+
+#define mmNIC0_RXE1_CQ_ARM_TIMEOUT 0x544B114
+
+#define mmNIC0_RXE1_CQ_CFG_0 0x544B180
+
+#define mmNIC0_RXE1_CQ_CFG_1 0x544B184
+
+#define mmNIC0_RXE1_CQ_CFG_2 0x544B188
+
+#define mmNIC0_RXE1_CQ_CFG_3 0x544B18C
+
+#define mmNIC0_RXE1_CQ_CFG_4 0x544B190
+
+#define mmNIC0_RXE1_CQ_CFG_5 0x544B194
+
+#define mmNIC0_RXE1_CQ_CFG_6 0x544B198
+
+#define mmNIC0_RXE1_CQ_CFG_7 0x544B19C
+
+#define mmNIC0_RXE1_CQ_CFG_8 0x544B1A0
+
+#define mmNIC0_RXE1_CQ_CFG_9 0x544B1A4
+
+#define mmNIC0_RXE1_CQ_CFG_10 0x544B1A8
+
+#define mmNIC0_RXE1_CQ_CFG_11 0x544B1AC
+
+#define mmNIC0_RXE1_CQ_CFG_12 0x544B1B0
+
+#define mmNIC0_RXE1_CQ_CFG_13 0x544B1B4
+
+#define mmNIC0_RXE1_CQ_CFG_14 0x544B1B8
+
+#define mmNIC0_RXE1_CQ_CFG_15 0x544B1BC
+
+#define mmNIC0_RXE1_CQ_CFG_16 0x544B1C0
+
+#define mmNIC0_RXE1_CQ_CFG_17 0x544B1C4
+
+#define mmNIC0_RXE1_CQ_CFG_18 0x544B1C8
+
+#define mmNIC0_RXE1_CQ_CFG_19 0x544B1CC
+
+#define mmNIC0_RXE1_CQ_CFG_20 0x544B1D0
+
+#define mmNIC0_RXE1_CQ_CFG_21 0x544B1D4
+
+#define mmNIC0_RXE1_CQ_CFG_22 0x544B1D8
+
+#define mmNIC0_RXE1_CQ_CFG_23 0x544B1DC
+
+#define mmNIC0_RXE1_CQ_CFG_24 0x544B1E0
+
+#define mmNIC0_RXE1_CQ_CFG_25 0x544B1E4
+
+#define mmNIC0_RXE1_CQ_CFG_26 0x544B1E8
+
+#define mmNIC0_RXE1_CQ_CFG_27 0x544B1EC
+
+#define mmNIC0_RXE1_CQ_CFG_28 0x544B1F0
+
+#define mmNIC0_RXE1_CQ_CFG_29 0x544B1F4
+
+#define mmNIC0_RXE1_CQ_CFG_30 0x544B1F8
+
+#define mmNIC0_RXE1_CQ_CFG_31 0x544B1FC
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_0 0x544B200
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_1 0x544B204
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_2 0x544B208
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_3 0x544B20C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_4 0x544B210
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_5 0x544B214
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_6 0x544B218
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_7 0x544B21C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_8 0x544B220
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_9 0x544B224
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_10 0x544B228
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_11 0x544B22C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_12 0x544B230
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_13 0x544B234
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_14 0x544B238
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_15 0x544B23C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_16 0x544B240
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_17 0x544B244
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_18 0x544B248
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_19 0x544B24C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_20 0x544B250
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_21 0x544B254
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_22 0x544B258
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_23 0x544B25C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_24 0x544B260
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_25 0x544B264
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_26 0x544B268
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_27 0x544B26C
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_28 0x544B270
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_29 0x544B274
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_30 0x544B278
+
+#define mmNIC0_RXE1_CQ_WRITE_INDEX_31 0x544B27C
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_0 0x544B280
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_1 0x544B284
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_2 0x544B288
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_3 0x544B28C
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_4 0x544B290
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_5 0x544B294
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_6 0x544B298
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_7 0x544B29C
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_8 0x544B2A0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_9 0x544B2A4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_10 0x544B2A8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_11 0x544B2AC
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_12 0x544B2B0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_13 0x544B2B4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_14 0x544B2B8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_15 0x544B2BC
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_16 0x544B2C0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_17 0x544B2C4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_18 0x544B2C8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_19 0x544B2CC
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_20 0x544B2D0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_21 0x544B2D4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_22 0x544B2D8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_23 0x544B2DC
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_24 0x544B2E0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_25 0x544B2E4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_26 0x544B2E8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_27 0x544B2EC
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_28 0x544B2F0
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_29 0x544B2F4
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_30 0x544B2F8
+
+#define mmNIC0_RXE1_CQ_PRODUCER_INDEX_31 0x544B2FC
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_0 0x544B300
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_1 0x544B304
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_2 0x544B308
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_3 0x544B30C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_4 0x544B310
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_5 0x544B314
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_6 0x544B318
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_7 0x544B31C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_8 0x544B320
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_9 0x544B324
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_10 0x544B328
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_11 0x544B32C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_12 0x544B330
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_13 0x544B334
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_14 0x544B338
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_15 0x544B33C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_16 0x544B340
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_17 0x544B344
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_18 0x544B348
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_19 0x544B34C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_20 0x544B350
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_21 0x544B354
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_22 0x544B358
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_23 0x544B35C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_24 0x544B360
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_25 0x544B364
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_26 0x544B368
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_27 0x544B36C
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_28 0x544B370
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_29 0x544B374
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_30 0x544B378
+
+#define mmNIC0_RXE1_CQ_CONSUMER_INDEX_31 0x544B37C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_0 0x544B380
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_1 0x544B384
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_2 0x544B388
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_3 0x544B38C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_4 0x544B390
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_5 0x544B394
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_6 0x544B398
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_7 0x544B39C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_8 0x544B3A0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_9 0x544B3A4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_10 0x544B3A8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_11 0x544B3AC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_12 0x544B3B0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_13 0x544B3B4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_14 0x544B3B8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_15 0x544B3BC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_16 0x544B3C0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_17 0x544B3C4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_18 0x544B3C8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_19 0x544B3CC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_20 0x544B3D0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_21 0x544B3D4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_22 0x544B3D8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_23 0x544B3DC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_24 0x544B3E0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_25 0x544B3E4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_26 0x544B3E8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_27 0x544B3EC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_28 0x544B3F0
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_29 0x544B3F4
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_30 0x544B3F8
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_LO_31 0x544B3FC
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_0 0x544B400
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_1 0x544B404
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_2 0x544B408
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_3 0x544B40C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_4 0x544B410
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_5 0x544B414
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_6 0x544B418
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_7 0x544B41C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_8 0x544B420
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_9 0x544B424
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_10 0x544B428
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_11 0x544B42C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_12 0x544B430
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_13 0x544B434
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_14 0x544B438
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_15 0x544B43C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_16 0x544B440
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_17 0x544B444
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_18 0x544B448
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_19 0x544B44C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_20 0x544B450
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_21 0x544B454
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_22 0x544B458
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_23 0x544B45C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_24 0x544B460
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_25 0x544B464
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_26 0x544B468
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_27 0x544B46C
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_28 0x544B470
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_29 0x544B474
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_30 0x544B478
+
+#define mmNIC0_RXE1_CQ_PI_ADDR_HI_31 0x544B47C
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_0 0x544B480
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_1 0x544B484
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_2 0x544B488
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_3 0x544B48C
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_4 0x544B490
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_5 0x544B494
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_6 0x544B498
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_7 0x544B49C
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_8 0x544B4A0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_9 0x544B4A4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_10 0x544B4A8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_11 0x544B4AC
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_12 0x544B4B0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_13 0x544B4B4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_14 0x544B4B8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_15 0x544B4BC
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_16 0x544B4C0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_17 0x544B4C4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_18 0x544B4C8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_19 0x544B4CC
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_20 0x544B4D0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_21 0x544B4D4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_22 0x544B4D8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_23 0x544B4DC
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_24 0x544B4E0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_25 0x544B4E4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_26 0x544B4E8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_27 0x544B4EC
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_28 0x544B4F0
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_29 0x544B4F4
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_30 0x544B4F8
+
+#define mmNIC0_RXE1_CQ_AXI_PROT_31 0x544B4FC
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_0 0x544B500
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_1 0x544B504
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_2 0x544B508
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_3 0x544B50C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_4 0x544B510
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_5 0x544B514
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_6 0x544B518
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_7 0x544B51C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_8 0x544B520
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_9 0x544B524
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_10 0x544B528
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_11 0x544B52C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_12 0x544B530
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_13 0x544B534
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_14 0x544B538
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_15 0x544B53C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_16 0x544B540
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_17 0x544B544
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_18 0x544B548
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_19 0x544B54C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_20 0x544B550
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_21 0x544B554
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_22 0x544B558
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_23 0x544B55C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_24 0x544B560
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_25 0x544B564
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_26 0x544B568
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_27 0x544B56C
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_28 0x544B570
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_29 0x544B574
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_30 0x544B578
+
+#define mmNIC0_RXE1_CQ_LOG_SIZE_31 0x544B57C
+
+#define mmNIC0_RXE1_RDV_SEND_WQ_BASE_ADDR_LO 0x544B600
+
+#define mmNIC0_RXE1_RDV_SEND_WQ_BASE_ADDR_HI 0x544B604
+
+#define mmNIC0_RXE1_RDV_LOG_MAX_WQ_SIZE 0x544B608
+
+#define mmNIC0_RXE1_LBW_BASE_LO 0x544B700
+
+#define mmNIC0_RXE1_LBW_BASE_HI 0x544B704
+
+#define mmNIC0_RXE1_LBW_LOG_SIZE 0x544B708
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P0_0 0x544B710
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P0_1 0x544B714
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P0_0 0x544B720
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P0_1 0x544B724
+
+#define mmNIC0_RXE1_RAW_MISC_P0_0 0x544B730
+
+#define mmNIC0_RXE1_RAW_MISC_P0_1 0x544B734
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P1_0 0x544B750
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P1_1 0x544B754
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P1_0 0x544B760
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P1_1 0x544B764
+
+#define mmNIC0_RXE1_RAW_MISC_P1_0 0x544B770
+
+#define mmNIC0_RXE1_RAW_MISC_P1_1 0x544B774
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P2_0 0x544B790
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P2_1 0x544B794
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P2_0 0x544B7A0
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P2_1 0x544B7A4
+
+#define mmNIC0_RXE1_RAW_MISC_P2_0 0x544B7B0
+
+#define mmNIC0_RXE1_RAW_MISC_P2_1 0x544B7B4
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P3_0 0x544B7D0
+
+#define mmNIC0_RXE1_RAW_BASE_LO_P3_1 0x544B7D4
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P3_0 0x544B7E0
+
+#define mmNIC0_RXE1_RAW_BASE_HI_P3_1 0x544B7E4
+
+#define mmNIC0_RXE1_RAW_MISC_P3_0 0x544B7F0
+
+#define mmNIC0_RXE1_RAW_MISC_P3_1 0x544B7F4
+
+#define mmNIC0_RXE1_SEI_INTR_CAUSE 0x544B800
+
+#define mmNIC0_RXE1_SEI_INTR_MASK 0x544B804
+
+#define mmNIC0_RXE1_SEI_INTR_CLEAR 0x544B808
+
+#define mmNIC0_RXE1_SPI_INTR_CAUSE 0x544B810
+
+#define mmNIC0_RXE1_SPI_INTR_MASK 0x544B814
+
+#define mmNIC0_RXE1_SPI_INTR_CLEAR 0x544B818
+
+#define mmNIC0_RXE1_DBG_SPMU_SELECT 0x544BA00
+
+#define mmNIC0_RXE1_DBG_INV_OP_0 0x544BA04
+
+#define mmNIC0_RXE1_DBG_INV_OP_1 0x544BA08
+
+#define mmNIC0_RXE1_DBG_AXI_ERR 0x544BA0C
+
+#define mmNIC0_RXE1_DBG_AXI_CQE_ERR 0x544BA10
+
+#define mmNIC0_RXE1_DBG_AXI_LBW_ERR 0x544BA14
+
+#define mmNIC0_RXE1_DBG_EN 0x544BA18
+
+#define mmNIC0_RXE1_DBG_CQ_ARM_ON 0x544BA1C
+
+#define mmNIC0_RXE1_DBG_CQ_ARM_SEL 0x544BA20
+
+#define mmNIC0_RXE1_DBG_CQ_ARM_IDX 0x544BA24
+
+#define mmNIC0_RXE1_DBG_SLICE_MAIN 0x544BA28
+
+#define mmNIC0_RXE1_DBG_SLICE_SCT 0x544BA2C
+
+#endif /* ASIC_REG_NIC0_RXE1_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h
new file mode 100644
index 000000000000..0537d108ead9
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h
@@ -0,0 +1,529 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_TXE0_REGS_H_
+#define ASIC_REG_NIC0_TXE0_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_TXE0
+ * (Prototype: NIC_TXE)
+ *****************************************
+ */
+
+#define mmNIC0_TXE0_WQE_FETCH_REQ_MASK_31_0 0x5452000
+
+#define mmNIC0_TXE0_WQE_FETCH_REQ_MASK_47_32 0x5452004
+
+#define mmNIC0_TXE0_LOCAL_WQ_BUFFER_SIZE 0x5452008
+
+#define mmNIC0_TXE0_LOCAL_WQ_LINE_SIZE 0x545200C
+
+#define mmNIC0_TXE0_LOG_MAX_WQ_SIZE_0 0x5452010
+
+#define mmNIC0_TXE0_LOG_MAX_WQ_SIZE_1 0x5452014
+
+#define mmNIC0_TXE0_LOG_MAX_WQ_SIZE_2 0x5452018
+
+#define mmNIC0_TXE0_LOG_MAX_WQ_SIZE_3 0x545201C
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_63_32_0 0x5452020
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_63_32_1 0x5452024
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_63_32_2 0x5452028
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_63_32_3 0x545202C
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_0 0x5452030
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_1 0x5452034
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_2 0x5452038
+
+#define mmNIC0_TXE0_SQ_BASE_ADDRESS_31_0_3 0x545203C
+
+#define mmNIC0_TXE0_WQE_USER_CFG 0x5452040
+
+#define mmNIC0_TXE0_ALLOC_CREDIT 0x5452044
+
+#define mmNIC0_TXE0_ALLOC_CREDIT_FORCE_FULL 0x5452048
+
+#define mmNIC0_TXE0_READ_CREDIT 0x545204C
+
+#define mmNIC0_TXE0_READ_CREDIT_FORCE_FULL 0x5452050
+
+#define mmNIC0_TXE0_BURST_ENABLE 0x5452054
+
+#define mmNIC0_TXE0_WR_INIT_BUSY 0x5452058
+
+#define mmNIC0_TXE0_READ_RES_WT_INIT_BUSY 0x545205C
+
+#define mmNIC0_TXE0_BTH_TVER 0x5452060
+
+#define mmNIC0_TXE0_IPV4_IDENTIFICATION 0x5452064
+
+#define mmNIC0_TXE0_IPV4_FLAGS 0x5452068
+
+#define mmNIC0_TXE0_PAD 0x545206C
+
+#define mmNIC0_TXE0_ADD_PAD_TO_IPV4_LEN 0x5452070
+
+#define mmNIC0_TXE0_ADD_PAD_TO_UDP_LEN 0x5452074
+
+#define mmNIC0_TXE0_ICRC_EN 0x5452078
+
+#define mmNIC0_TXE0_UDP_MASK_S_PORT 0x545207C
+
+#define mmNIC0_TXE0_UDP_CHECKSUM 0x5452080
+
+#define mmNIC0_TXE0_UDP_DEST_PORT 0x5452084
+
+#define mmNIC0_TXE0_PORT0_MAC_CFG_47_32 0x5452088
+
+#define mmNIC0_TXE0_PORT0_MAC_CFG_31_0 0x545208C
+
+#define mmNIC0_TXE0_PORT1_MAC_CFG_47_32 0x5452090
+
+#define mmNIC0_TXE0_PORT1_MAC_CFG_31_0 0x5452094
+
+#define mmNIC0_TXE0_PRIO_TO_DSCP_0 0x545209C
+
+#define mmNIC0_TXE0_PRIO_TO_DSCP_1 0x54520A0
+
+#define mmNIC0_TXE0_PRIO_TO_PCP 0x54520B0
+
+#define mmNIC0_TXE0_MAC_ETHER_TYPE 0x54520B4
+
+#define mmNIC0_TXE0_MAC_ETHER_TYPE_VLAN 0x54520B8
+
+#define mmNIC0_TXE0_ECN_0 0x54520BC
+
+#define mmNIC0_TXE0_ECN_1 0x54520C0
+
+#define mmNIC0_TXE0_IPV4_TIME_TO_LIVE_0 0x54520C4
+
+#define mmNIC0_TXE0_IPV4_TIME_TO_LIVE_1 0x54520C8
+
+#define mmNIC0_TXE0_PRIO_PORT_CREDIT_FORCE 0x54520CC
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_0 0x54520D0
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_1 0x54520D4
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_2 0x54520D8
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_3 0x54520DC
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_4 0x54520E0
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_5 0x54520E4
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_6 0x54520E8
+
+#define mmNIC0_TXE0_PRIO_PORT_CRDIT_7 0x54520EC
+
+#define mmNIC0_TXE0_WQE_FETCH_TOKEN_EN 0x54520F0
+
+#define mmNIC0_TXE0_NACK_SYNDROME 0x54520F4
+
+#define mmNIC0_TXE0_WQE_FETCH_AXI_PROT 0x54520FC
+
+#define mmNIC0_TXE0_DATA_FETCH_AXI_PROT 0x5452104
+
+#define mmNIC0_TXE0_FETCH_OUT_OF_TOKEN 0x5452108
+
+#define mmNIC0_TXE0_ECN_COUNT_EN 0x545210C
+
+#define mmNIC0_TXE0_INERRUPT_CAUSE 0x5452110
+
+#define mmNIC0_TXE0_INTERRUPT_MASK 0x5452114
+
+#define mmNIC0_TXE0_INTERRUPT_CLR 0x5452118
+
+#define mmNIC0_TXE0_VLAN_TAG_QPN_OFFSET 0x545211C
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_0 0x5452120
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_1 0x5452124
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_2 0x5452128
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_3 0x545212C
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_4 0x5452130
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_5 0x5452134
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_6 0x5452138
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_7 0x545213C
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_8 0x5452140
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_9 0x5452144
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_10 0x5452148
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_11 0x545214C
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_12 0x5452150
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_13 0x5452154
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_14 0x5452158
+
+#define mmNIC0_TXE0_VALN_TAG_CFG_15 0x545215C
+
+#define mmNIC0_TXE0_DBG_TRIG 0x5452160
+
+#define mmNIC0_TXE0_WQE_PREFETCH_CFG 0x5452164
+
+#define mmNIC0_TXE0_WQE_PREFETCH_INVALIDATE 0x5452168
+
+#define mmNIC0_TXE0_SWAP_MEMORY_ENDIANNESS 0x545216C
+
+#define mmNIC0_TXE0_WQE_FETCH_SLICE_47_32 0x5452170
+
+#define mmNIC0_TXE0_WQE_FETCH_SLICE_31_0 0x5452174
+
+#define mmNIC0_TXE0_WQE_EXE_SLICE_47_32 0x5452178
+
+#define mmNIC0_TXE0_WQE_EXE_SLICE_31_0 0x545217C
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT0 0x5452180
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT1 0x5452184
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT2 0x5452188
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT3 0x545218C
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT4 0x5452190
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT5 0x5452194
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT6 0x5452198
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT7 0x545219C
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT8 0x54521A0
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT9 0x54521A4
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT10 0x54521A8
+
+#define mmNIC0_TXE0_DBG_COUNT_SELECT11 0x54521AC
+
+#define mmNIC0_TXE0_BTH_MKEY 0x54521B0
+
+#define mmNIC0_TXE0_WQE_BUFF_FLUSH_SLICE_47_3 0x54521B4
+
+#define mmNIC0_TXE0_WQE_BUFF_FLUSH_SLICE_31_0 0x54521B8
+
+#define mmNIC0_TXE0_INTERRUPT_INDEX_MASK_RING_0 0x54521BC
+
+#define mmNIC0_TXE0_INTERRUPT_INDEX_MASK_RING_1 0x54521C0
+
+#define mmNIC0_TXE0_INTERRUPT_INDEX_MASK_RING_2 0x54521C4
+
+#define mmNIC0_TXE0_INTERRUPT_INDEX_MASK_RING_3 0x54521C8
+
+#define mmNIC0_TXE0_INTERRUPT_INDEX_MASK_RING_4 0x54521CC
+
+#define mmNIC0_TXE0_QPN_RING_0 0x54521D0
+
+#define mmNIC0_TXE0_QPN_RING_1 0x54521D4
+
+#define mmNIC0_TXE0_QPN_RING_2 0x54521D8
+
+#define mmNIC0_TXE0_QPN_RING_3 0x54521DC
+
+#define mmNIC0_TXE0_INTERRUPT_EACH_PACKET 0x54521F0
+
+#define mmNIC0_TXE0_EXECUTIN_INDEX_RING_0 0x54521F4
+
+#define mmNIC0_TXE0_EXECUTIN_INDEX_RING_1 0x54521F8
+
+#define mmNIC0_TXE0_EXECUTIN_INDEX_RING_2 0x54521FC
+
+#define mmNIC0_TXE0_EXECUTIN_INDEX_RING_3 0x5452200
+
+#define mmNIC0_TXE0_WQE_FETCH_AXI_USER_LO 0x5452208
+
+#define mmNIC0_TXE0_WQE_FETCH_AXI_USER_HI 0x545220C
+
+#define mmNIC0_TXE0_DATA_FETCH_AXI_USER_LO 0x5452210
+
+#define mmNIC0_TXE0_DATA_FETCH_AXI_USER_HI 0x5452214
+
+#define mmNIC0_TXE0_CHICKEN_BITS 0x5452218
+
+#define mmNIC0_TXE0_CHICKEN_BITS2 0x545221C
+
+#define mmNIC0_TXE0_WQE_CHECK_EN 0x5452220
+
+#define mmNIC0_TXE0_WQE_CHECK_EN2 0x5452224
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG1 0x5452228
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG2 0x545222C
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG3 0x5452230
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST1 0x5452234
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST2 0x5452238
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST3 0x545223C
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST4 0x5452240
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST5 0x5452244
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST6 0x5452248
+
+#define mmNIC0_TXE0_WQE_CHECK_CONST7 0x545224C
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_0 0x5452250
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_1 0x5452254
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_2 0x5452258
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_3 0x545225C
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_4 0x5452260
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_5 0x5452264
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_6 0x5452268
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT0_7 0x545226C
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_0 0x5452270
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_1 0x5452274
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_2 0x5452278
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_3 0x545227C
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_4 0x5452280
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_5 0x5452284
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_6 0x5452288
+
+#define mmNIC0_TXE0_SOURCE_IP_PORT1_7 0x545228C
+
+#define mmNIC0_TXE0_BTH_RSVD 0x5452290
+
+#define mmNIC0_TXE0_MULTI_PKT_WQE 0x5452294
+
+#define mmNIC0_TXE0_TXWQC 0x54522A0
+
+#define mmNIC0_TXE0_TXWQC_STATUS 0x54522A4
+
+#define mmNIC0_TXE0_TXWQC_INVALIDATE 0x54522A8
+
+#define mmNIC0_TXE0_STATS_CFG0 0x54522B0
+
+#define mmNIC0_TXE0_STATS_CFG1 0x54522B4
+
+#define mmNIC0_TXE0_STATS_CFG2 0x54522B8
+
+#define mmNIC0_TXE0_STATS_TOT_BYTES_LSB 0x54522C0
+
+#define mmNIC0_TXE0_STATS_TOT_BYTES_MSB 0x54522C4
+
+#define mmNIC0_TXE0_STATS_TOT_PKTS_LSB 0x54522C8
+
+#define mmNIC0_TXE0_STATS_TOT_PKTS_MSB 0x54522CC
+
+#define mmNIC0_TXE0_STATS_MEAS_WIN_BYTES_LSB 0x54522D0
+
+#define mmNIC0_TXE0_STATS_MEAS_WIN_BYTES_MSB 0x54522D4
+
+#define mmNIC0_TXE0_STATS_MEAS_WIN_PKTS 0x54522D8
+
+#define mmNIC0_TXE0_STATS_MEAS_LATENCY 0x54522DC
+
+#define mmNIC0_TXE0_HW_EVENT_CFG 0x54522E0
+
+#define mmNIC0_TXE0_ENCAP_CFG_0 0x5452300
+
+#define mmNIC0_TXE0_ENCAP_CFG_1 0x5452304
+
+#define mmNIC0_TXE0_ENCAP_CFG_2 0x5452308
+
+#define mmNIC0_TXE0_ENCAP_CFG_3 0x545230C
+
+#define mmNIC0_TXE0_ENCAP_CFG_4 0x5452310
+
+#define mmNIC0_TXE0_ENCAP_CFG_5 0x5452314
+
+#define mmNIC0_TXE0_ENCAP_CFG_6 0x5452318
+
+#define mmNIC0_TXE0_ENCAP_CFG_7 0x545231C
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_0 0x5452320
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_1 0x5452324
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_2 0x5452328
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_3 0x545232C
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_4 0x5452330
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_5 0x5452334
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_6 0x5452338
+
+#define mmNIC0_TXE0_ENCAP_DATA_31_0_7 0x545233C
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_0 0x5452340
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_1 0x5452344
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_2 0x5452348
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_3 0x545234C
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_4 0x5452350
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_5 0x5452354
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_6 0x5452358
+
+#define mmNIC0_TXE0_ENCAP_DATA_63_32_7 0x545235C
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_0 0x5452360
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_1 0x5452364
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_2 0x5452368
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_3 0x545236C
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_4 0x5452370
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_5 0x5452374
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_6 0x5452378
+
+#define mmNIC0_TXE0_ENCAP_DATA_95_64_7 0x545237C
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_0 0x5452380
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_1 0x5452384
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_2 0x5452388
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_3 0x545238C
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_4 0x5452390
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_5 0x5452394
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_6 0x5452398
+
+#define mmNIC0_TXE0_ENCAP_DATA_127_96_7 0x545239C
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_0 0x54523A0
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_1 0x54523A4
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_2 0x54523A8
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_3 0x54523AC
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_4 0x54523B0
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_5 0x54523B4
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_6 0x54523B8
+
+#define mmNIC0_TXE0_ENCAP_DATA_159_128_7 0x54523BC
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_0 0x54523C0
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_1 0x54523C4
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_2 0x54523C8
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_3 0x54523CC
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_4 0x54523D0
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_5 0x54523D4
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_6 0x54523D8
+
+#define mmNIC0_TXE0_ENCAP_DATA_191_160_7 0x54523DC
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_0 0x54523E0
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_1 0x54523E4
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_2 0x54523E8
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_3 0x54523EC
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_4 0x54523F0
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_5 0x54523F4
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_6 0x54523F8
+
+#define mmNIC0_TXE0_ENCAP_DATA_223_192_7 0x54523FC
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_0 0x5452400
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_1 0x5452404
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_2 0x5452408
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_3 0x545240C
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_4 0x5452410
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_5 0x5452414
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_6 0x5452418
+
+#define mmNIC0_TXE0_ENCAP_DATA_255_224_7 0x545241C
+
+#define mmNIC0_TXE0_ENCAP_CFG2 0x5452420
+
+#define mmNIC0_TXE0_MTD_DUAL_STRIDE3 0x5452430
+
+#define mmNIC0_TXE0_MTD_DUAL_STRIDE4 0x5452434
+
+#define mmNIC0_TXE0_MTD_DUAL_NUM_OF_STRIDES 0x5452438
+
+#define mmNIC0_TXE0_CLK_GATE_CFG 0x5452440
+
+#define mmNIC0_TXE0_WQE_CHECK_NOTIFY_EN 0x5452450
+
+#define mmNIC0_TXE0_WQE_CHECK_NOTIFY_EN2 0x5452454
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG4 0x5452458
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG5 0x545245C
+
+#define mmNIC0_TXE0_WQE_CHECK_CFG6 0x5452460
+
+#define mmNIC0_TXE0_DATA_READ_RL_CFG 0x5452470
+
+#endif /* ASIC_REG_NIC0_TXE0_REGS_H_ */
diff --git a/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h
new file mode 100644
index 000000000000..94db189452b4
--- /dev/null
+++ b/drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h
@@ -0,0 +1,289 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2016-2020 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+/************************************
+ ** This is an auto-generated file **
+ ** DO NOT EDIT BELOW **
+ ************************************/
+
+#ifndef ASIC_REG_NIC0_TXS0_REGS_H_
+#define ASIC_REG_NIC0_TXS0_REGS_H_
+
+/*
+ *****************************************
+ * NIC0_TXS0
+ * (Prototype: NIC_TXS)
+ *****************************************
+ */
+
+#define mmNIC0_TXS0_TMR_SCAN_EN 0x5450000
+
+#define mmNIC0_TXS0_TICK_WRAP 0x5450004
+
+#define mmNIC0_TXS0_SCAN_TIME_COMPARE_0 0x5450008
+
+#define mmNIC0_TXS0_SCAN_TIME_COMPARE_1 0x545000C
+
+#define mmNIC0_TXS0_SLICE_CREDIT 0x5450010
+
+#define mmNIC0_TXS0_SLICE_FORCE_FULL 0x5450014
+
+#define mmNIC0_TXS0_FIRST_SCHEDQ_ID 0x5450018
+
+#define mmNIC0_TXS0_LAST_SCHEDQ_ID 0x545001C
+
+#define mmNIC0_TXS0_PUSH_MASK 0x5450020
+
+#define mmNIC0_TXS0_POP_MASK 0x5450024
+
+#define mmNIC0_TXS0_PUSH_RELEASE_INVALIDATE 0x5450028
+
+#define mmNIC0_TXS0_POP_RELEASE_INVALIDATE 0x545002C
+
+#define mmNIC0_TXS0_LIST_MEM_READ_MASK 0x5450030
+
+#define mmNIC0_TXS0_FIFO_MEM_READ_MASK 0x5450034
+
+#define mmNIC0_TXS0_LIST_MEM_WRITE_MASK 0x5450038
+
+#define mmNIC0_TXS0_FIFO_MEM_WRITE_MASK 0x545003C
+
+#define mmNIC0_TXS0_BASE_ADDRESS_63_32 0x5450040
+
+#define mmNIC0_TXS0_BASE_ADDRESS_31_7 0x5450044
+
+#define mmNIC0_TXS0_AXI_PROT 0x545004C
+
+#define mmNIC0_TXS0_RATE_LIMIT 0x5450050
+
+#define mmNIC0_TXS0_CACHE_CFG 0x5450054
+
+#define mmNIC0_TXS0_SCHEDQ_MEM_INIT 0x5450058
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_EN 0x545005C
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_FIFO 0x5450060
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_31_0 0x5450064
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_63_32 0x5450068
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_95_64 0x545006C
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_127_96 0x5450070
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_159_128 0x5450074
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_191_160 0x5450078
+
+#define mmNIC0_TXS0_SCHEDQ_UPDATE_DESC_217_192 0x545007C
+
+#define mmNIC0_TXS0_FORCE_HIT_EN 0x5450080
+
+#define mmNIC0_TXS0_INVALIDATE_LIST 0x5450084
+
+#define mmNIC0_TXS0_INVALIDATE_LIST_STATUS 0x5450088
+
+#define mmNIC0_TXS0_INVALIDATE_FREE_LIST 0x545008C
+
+#define mmNIC0_TXS0_INVALIDATE_FREE_LIST_STAT 0x5450090
+
+#define mmNIC0_TXS0_PUSH_PREFETCH_EN 0x5450094
+
+#define mmNIC0_TXS0_PUSH_RELEASE_EN 0x5450098
+
+#define mmNIC0_TXS0_PUSH_LOCK_EN 0x545009C
+
+#define mmNIC0_TXS0_PUSH_PREFETCH_NEXT_EN 0x54500A0
+
+#define mmNIC0_TXS0_POP_PREFETCH_EN 0x54500A4
+
+#define mmNIC0_TXS0_POP_RELEASE_EN 0x54500A8
+
+#define mmNIC0_TXS0_POP_LOCK_EN 0x54500AC
+
+#define mmNIC0_TXS0_POP_PREFETCH_NEXT_EN 0x54500B0
+
+#define mmNIC0_TXS0_LIST_MASK 0x54500B4
+
+#define mmNIC0_TXS0_RELEASE_INCALIDATE 0x54500B8
+
+#define mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_63_32 0x54500BC
+
+#define mmNIC0_TXS0_BASE_ADDRESS_FREE_LIST_31_0 0x54500C0
+
+#define mmNIC0_TXS0_FREE_LIST_EN 0x54500C4
+
+#define mmNIC0_TXS0_PUSH_FORCE_HIT_EN 0x54500C8
+
+#define mmNIC0_TXS0_PRODUCER_UPDATE_EN 0x54500CC
+
+#define mmNIC0_TXS0_PRODUCER_UPDATE 0x54500D0
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_FORCE 0x54500D4
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_0 0x54500D8
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_1 0x54500DC
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_2 0x54500E0
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_3 0x54500E4
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_4 0x54500E8
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_5 0x54500EC
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_6 0x54500F0
+
+#define mmNIC0_TXS0_PRIOQ_CREDIT_7 0x54500F4
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT0 0x54500F8
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT1 0x54500FC
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT2 0x5450100
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT3 0x5450104
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT4 0x5450108
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT5 0x545010C
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT6 0x5450110
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT7 0x5450114
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT8 0x5450118
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT9 0x545011C
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT10 0x5450120
+
+#define mmNIC0_TXS0_DBG_COUNT_SELECT11 0x5450124
+
+#define mmNIC0_TXS0_IGNORE_BURST_EN 0x5450140
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_0 0x5450144
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_1 0x5450148
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_2 0x545014C
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_3 0x5450150
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_4 0x5450154
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_5 0x5450158
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_6 0x545015C
+
+#define mmNIC0_TXS0_IGNORE_BURST_THRESHOLD_7 0x5450160
+
+#define mmNIC0_TXS0_RANDOM_PSUH_CFG 0x5450164
+
+#define mmNIC0_TXS0_DBG_HW_EVENT_TRIGER 0x5450168
+
+#define mmNIC0_TXS0_INTERRUPT_CAUSE 0x545016C
+
+#define mmNIC0_TXS0_INTERRUPT_MASK 0x5450170
+
+#define mmNIC0_TXS0_INTERRUPT_CLR 0x5450174
+
+#define mmNIC0_TXS0_LOAD_SLICE_HIT_EN 0x5450178
+
+#define mmNIC0_TXS0_SLICE_ACTIVE_47_32 0x545017C
+
+#define mmNIC0_TXS0_SLICE_ACTIVE_31_0 0x5450180
+
+#define mmNIC0_TXS0_AXI_CACHE 0x5450184
+
+#define mmNIC0_TXS0_SLICE_GW_ADDR 0x5450188
+
+#define mmNIC0_TXS0_SLICE_GW_DATA 0x545018C
+
+#define mmNIC0_TXS0_SCANNER_CREDIT_EN 0x5450190
+
+#define mmNIC0_TXS0_FREE_LIST_PUSH_MASK_EN 0x5450194
+
+#define mmNIC0_TXS0_FREE_AEMPTY_THRESHOLD 0x5450198
+
+#define mmNIC0_TXS0_AXI_USER_LO 0x5450200
+
+#define mmNIC0_TXS0_AXI_USER_HI 0x5450204
+
+#define mmNIC0_TXS0_NCH_SYNCED 0x5450210
+
+#define mmNIC0_TXS0_NCH_ASYNCED 0x5450214
+
+#define mmNIC0_TXS0_NCH_ASYNCED_RES 0x5450218
+
+#define mmNIC0_TXS0_STATS_CFG 0x5450220
+
+#define mmNIC0_TXS0_STATS_TOT_PUSH_REQ 0x5450230
+
+#define mmNIC0_TXS0_STATS_TOT_PUSH_RES 0x5450234
+
+#define mmNIC0_TXS0_STATS_TOT_SCHED_QP_REQ 0x5450238
+
+#define mmNIC0_TXS0_STATS_TOT_SCHED_QP_RES 0x545023C
+
+#define mmNIC0_TXS0_STATS_TOT_RETURN_SLICE 0x5450240
+
+#define mmNIC0_TXS0_STATS_WIN_PUSH_REQ 0x5450250
+
+#define mmNIC0_TXS0_STATS_WIN_PUSH_RES 0x5450254
+
+#define mmNIC0_TXS0_STATS_WIN_SCHED_QP_REQ 0x5450258
+
+#define mmNIC0_TXS0_STATS_WIN_SCHED_QP_RES 0x545025C
+
+#define mmNIC0_TXS0_STATS_WIN_RETURN_SLICE 0x5450260
+
+#define mmNIC0_TXS0_ASYNC_NICL_APB_ADDR_MASK 0x5450270
+
+#define mmNIC0_TXS0_ASYNC_NICL_APB_SPLIT_ADDR0 0x5450274
+
+#define mmNIC0_TXS0_ASYNC_NICL_APB_SPLIT_ADDR1 0x5450278
+
+#define mmNIC0_TXS0_ASYNC_NICL_APB_SPLIT_ADDR2 0x545027C
+
+#define mmNIC0_TXS0_ASYNC_NICL_APB_SPLIT_ADDR3 0x5450280
+
+#define mmNIC0_TXS0_ASYNC_NICD_APB_ADDR_MASK 0x5450290
+
+#define mmNIC0_TXS0_ASYNC_NICD_APB_SPLIT_ADDR0 0x5450294
+
+#define mmNIC0_TXS0_ASYNC_NICD_APB_SPLIT_ADDR1 0x5450298
+
+#define mmNIC0_TXS0_ASYNC_NICD_APB_SPLIT_ADDR2 0x545029C
+
+#define mmNIC0_TXS0_ASYNC_NICD_APB_SPLIT_ADDR3 0x54502A0
+
+#define mmNIC0_TXS0_TX_APB_ADDR_MASK 0x54502B0
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR0 0x54502B4
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR1 0x54502B8
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR2 0x54502BC
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR3 0x54502C0
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR4 0x54502C4
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR5 0x54502C8
+
+#define mmNIC0_TXS0_TX_APB_SPLIT_ADDR6 0x54502CC
+
+#define mmNIC0_TXS0_HW_EVENT_CFG 0x54502E0
+
+#define mmNIC0_TXS0_CHICKEN_BITS 0x54502E8
+
+#define mmNIC0_TXS0_CLK_GATE_CFG 0x54502F0
+
+#endif /* ASIC_REG_NIC0_TXS0_REGS_H_ */
--
2.34.1
* [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13 8:22 ` [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files Omer Shpigelman
@ 2024-06-13 8:22 ` Omer Shpigelman
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
2024-06-19 16:33 ` Jiri Pirko
From: Omer Shpigelman @ 2024-06-13 8:22 UTC (permalink / raw)
To: linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, oshpigelman, zyehudai
Add GAUDI2 ASIC-specific support for AI scaling over the network.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Co-developed-by: David Meriin <dmeriin@habana.ai>
Signed-off-by: David Meriin <dmeriin@habana.ai>
Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
---
drivers/accel/habanalabs/gaudi2/Makefile | 2 +-
drivers/accel/habanalabs/gaudi2/gaudi2.c | 426 +++++++++++++++++-
drivers/accel/habanalabs/gaudi2/gaudi2P.h | 41 +-
drivers/accel/habanalabs/gaudi2/gaudi2_cn.c | 424 +++++++++++++++++
drivers/accel/habanalabs/gaudi2/gaudi2_cn.h | 42 ++
.../habanalabs/gaudi2/gaudi2_coresight.c | 145 +++++-
6 files changed, 1050 insertions(+), 30 deletions(-)
create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.c
create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.h
diff --git a/drivers/accel/habanalabs/gaudi2/Makefile b/drivers/accel/habanalabs/gaudi2/Makefile
index 1e047883ba74..289e69a355b1 100644
--- a/drivers/accel/habanalabs/gaudi2/Makefile
+++ b/drivers/accel/habanalabs/gaudi2/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
HL_GAUDI2_FILES := gaudi2/gaudi2.o gaudi2/gaudi2_security.o \
- gaudi2/gaudi2_coresight.o
+ gaudi2/gaudi2_coresight.o gaudi2/gaudi2_cn.o
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index a22d2a93394e..6417c69d47af 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -1,12 +1,14 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2020-2022 HabanaLabs, Ltd.
+ * Copyright 2020-2024 HabanaLabs, Ltd.
* All Rights Reserved.
*/
#include "gaudi2P.h"
+#include "gaudi2_cn.h"
#include "gaudi2_masks.h"
+#include "gaudi2_coresight_regs.h"
#include "../include/gaudi2/gaudi2_special_blocks.h"
#include "../include/hw_ip/mmu/mmu_general.h"
#include "../include/hw_ip/mmu/mmu_v2_0.h"
@@ -2216,6 +2218,9 @@ static bool gaudi2_get_edma_idle_status(struct hl_device *hdev, u64 *mask_arr, u
struct engines_data *e);
static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr);
static u64 gaudi2_mmu_descramble_addr(struct hl_device *hdev, u64 scrambled_addr);
+static int gaudi2_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len, u32 timeout,
+ u64 *result);
+static void gaudi2_init_cn(struct hl_device *hdev);
static void gaudi2_init_scrambler_hbm(struct hl_device *hdev)
{
@@ -2382,15 +2387,17 @@ static int gaudi2_set_dram_properties(struct hl_device *hdev)
hbm_drv_base_offset = roundup(CPU_FW_IMAGE_SIZE, prop->num_functional_hbms * SZ_8M);
/*
- * The NIC driver section size and the HMMU page tables section in the HBM needs
- * to be the remaining size in the first dram page after taking into
- * account the F/W image size
+ * The NIC driver section size and the HMMU page tables section in the HBM needs to be the
+ * remaining size in the first dram page after taking into account the F/W image size.
*/
+ prop->nic_drv_size = (prop->dram_page_size - hbm_drv_base_offset) -
+ (HMMU_PAGE_TABLES_SIZE + EDMA_PQS_SIZE + EDMA_SCRATCHPAD_SIZE);
+ prop->nic_drv_addr = DRAM_PHYS_BASE + hbm_drv_base_offset;
+
+ prop->clk = GAUDI2_NIC_CLK_FREQ / USEC_PER_SEC;
/* Reserve region in HBM for HMMU page tables */
- prop->mmu_pgt_addr = DRAM_PHYS_BASE + hbm_drv_base_offset +
- ((prop->dram_page_size - hbm_drv_base_offset) -
- (HMMU_PAGE_TABLES_SIZE + EDMA_PQS_SIZE + EDMA_SCRATCHPAD_SIZE));
+ prop->mmu_pgt_addr = prop->nic_drv_addr + prop->nic_drv_size;
/* Set EDMA PQs HBM addresses */
edma_pq_base_addr = prop->mmu_pgt_addr + HMMU_PAGE_TABLES_SIZE;
@@ -2409,6 +2416,7 @@ static int gaudi2_set_dram_properties(struct hl_device *hdev)
static int gaudi2_set_fixed_properties(struct hl_device *hdev)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hl_cn_properties *cn_prop = &prop->cn_props;
struct hw_queue_properties *q_props;
u32 num_sync_stream_queues = 0;
int i, rc;
@@ -2601,6 +2609,10 @@ static int gaudi2_set_fixed_properties(struct hl_device *hdev)
prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0;
+ prop->macro_cfg_size = NIC_OFFSET;
+ cn_prop->status_packet_size = sizeof(struct cpucp_nic_status);
+ cn_prop->max_num_of_ports = NIC_NUMBER_OF_PORTS;
+
return 0;
free_qprops:
@@ -2996,8 +3008,16 @@ static int gaudi2_cpucp_info_get(struct hl_device *hdev)
u64 dram_size;
int rc;
- if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)) {
+ /* Skip for hard or device release reset flow. No need to repopulate. */
+ if (!hdev->reset_info.in_reset) {
+ rc = gaudi2_cn_set_info(hdev, false);
+ if (rc)
+ return rc;
+ }
+
return 0;
+ }
/* No point of asking this information again when not doing hard reset, as the device
* CPU hasn't been reset
@@ -3058,23 +3078,47 @@ static int gaudi2_cpucp_info_get(struct hl_device *hdev)
prop->max_power_default = (u64) max_power;
- return 0;
+ /* Repopulate post hard reset since device CPU has been reset */
+ return gaudi2_cn_set_info(hdev, true);
}
-static int gaudi2_fetch_psoc_frequency(struct hl_device *hdev)
+static int gaudi2_fetch_frequency(struct hl_device *hdev, u32 pll_index, u16 *pll_freq_arr)
{
struct gaudi2_device *gaudi2 = hdev->asic_specific;
- u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS];
- int rc;
if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
return 0;
- rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI2_CPU_PLL, pll_freq_arr);
+ return hl_fw_cpucp_pll_info_get(hdev, pll_index, pll_freq_arr);
+}
+
+static int gaudi2_fetch_psoc_frequency(struct hl_device *hdev)
+{
+ u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS] = {0};
+ int rc;
+
+ rc = gaudi2_fetch_frequency(hdev, HL_GAUDI2_CPU_PLL, pll_freq_arr);
+ if (rc)
+ return rc;
+
+ if (pll_freq_arr[3] != 0)
+ hdev->asic_prop.psoc_timestamp_frequency = pll_freq_arr[3];
+
+ return 0;
+}
+
+static int gaudi2_fetch_nic_frequency(struct hl_device *hdev)
+{
+ u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS] = {0};
+ int rc;
+
+ rc = gaudi2_fetch_frequency(hdev, HL_GAUDI2_NIC_PLL, pll_freq_arr);
if (rc)
return rc;
- hdev->asic_prop.psoc_timestamp_frequency = pll_freq_arr[3];
+ /* DIV1 - nic_clk */
+ if (pll_freq_arr[1] != 0)
+ hdev->asic_prop.clk = pll_freq_arr[1];
return 0;
}
@@ -3237,6 +3281,79 @@ static void gaudi2_init_arcs(struct hl_device *hdev)
CFG_BASE + le32_to_cpu(dyn_regs->eng_arc_irq_ctrl);
}
+static int gaudi2_cn_clear_mem(struct hl_device *hdev)
+{
+ u32 nic_drv_size, gran_per_packet, num_iter;
+ struct asic_fixed_properties *asic_prop;
+ struct cpucp_cn_clear_mem_packet pkt;
+ bool use_cpucp;
+ u64 i, val = 0;
+ int rc = 0;
+
+ if (!hdev->cn.ports_mask)
+ return rc;
+
+ asic_prop = &hdev->asic_prop;
+
+ use_cpucp = !!(hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_NIC_MEM_CLEAR_EN);
+ if (use_cpucp) {
+ /* NIC driver size is expected to be at most 4GB in Gaudi2 */
+ if (asic_prop->nic_drv_size > BIT_ULL(32)) {
+ dev_err(hdev->dev,
+ "Failed to clear NIC memory, NIC size 0x%llx is bigger than 4GB\n",
+ asic_prop->nic_drv_size);
+ return -EINVAL;
+ }
+
+ /* subtract 1 to support size of 4GB as well */
+ nic_drv_size = (asic_prop->nic_drv_size - 1) & 0xFFFFFFFF;
+ /* max 250 MB per packet, in order to avoid CPUCP packet timeout */
+ gran_per_packet = 250 * 1024 * 1024;
+ num_iter = (nic_drv_size + gran_per_packet) / gran_per_packet;
+
+ for (i = 0; i < num_iter; i++) {
+ memset(&pkt, 0, sizeof(pkt));
+ pkt.cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_CLR_MEM <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.mem_base_addr = cpu_to_le64(asic_prop->nic_drv_addr +
+ i * gran_per_packet);
+
+ if (i < num_iter - 1)
+ pkt.size = cpu_to_le32(gran_per_packet);
+ else
+ /* add 1 as it was subtracted in original size calculation */
+ pkt.size = cpu_to_le32(nic_drv_size - gran_per_packet * i + 1);
+
+ rc = gaudi2_send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt), 0, NULL);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to handle CPU-CP pkt %u, error %d\n",
+ CPUCP_PACKET_NIC_CLR_MEM, rc);
+ return rc;
+ }
+ }
+ } else {
+ for (i = 0; i < hdev->asic_prop.nic_drv_size; i += sizeof(val)) {
+ rc = hdev->asic_funcs->access_dev_mem(hdev, PCI_REGION_DRAM,
+ hdev->asic_prop.nic_drv_addr + i,
+ &val, DEBUGFS_WRITE64);
+
+ if (rc) {
+ dev_err(hdev->dev, "Failed to set NIC memory. addr: 0x%llx\n",
+ hdev->asic_prop.nic_drv_addr + i);
+ return rc;
+ }
+
+ /* sleep every 32MB to avoid high CPU utilization */
+ if (i && !(i % SZ_32M))
+ usleep_range(50, 100);
+ }
+ }
+
+ return rc;
+}
+
static int gaudi2_scrub_arc_dccm(struct hl_device *hdev, u32 cpu_id)
{
u32 reg_base, reg_val;
@@ -3303,6 +3420,15 @@ static int gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
return 0;
}
+static int gaudi2_nic_unmask_ecc_interrupts(struct hl_device *hdev)
+{
+ struct cpucp_packet pkt = {};
+
+ pkt.ctl = cpu_to_le32(CPUCP_PACKET_NIC_ECC_INTRS_UNMASK << CPUCP_PKT_CTL_OPCODE_SHIFT);
+
+ return gaudi2_send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt), 0, NULL);
+}
+
static int gaudi2_late_init(struct hl_device *hdev)
{
struct gaudi2_device *gaudi2 = hdev->asic_specific;
@@ -3329,6 +3455,24 @@ static int gaudi2_late_init(struct hl_device *hdev)
goto disable_pci_access;
}
+ rc = gaudi2_fetch_nic_frequency(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to fetch NIC frequency\n");
+ goto disable_pci_access;
+ }
+
+ rc = gaudi2_cn_clear_mem(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to clear CN memory\n");
+ goto disable_pci_access;
+ }
+
+ rc = gaudi2_nic_unmask_ecc_interrupts(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to unmask NIC ECC interrupts\n");
+ goto disable_pci_access;
+ }
+
gaudi2_init_arcs(hdev);
rc = gaudi2_scrub_arcs_dccm(hdev);
@@ -4836,6 +4980,13 @@ static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw
else
wait_timeout_ms = GAUDI2_RESET_WAIT_MSEC;
+ /*
+ * Mark the NIC as in reset to avoid any new NIC accesses to the H/W. This must be done
+ * before we stop the CPU as the NIC might use it, e.g. to get/set EEPROM data.
+ */
+ if (hard_reset)
+ hl_cn_hard_reset_prepare(hdev, fw_reset);
+
if (fw_reset)
goto skip_engines;
@@ -4872,11 +5023,14 @@ static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw
skip_engines:
if (hard_reset) {
+ hl_cn_stop(hdev);
gaudi2_disable_msix(hdev);
return;
}
+ gaudi2_cn_compute_reset_prepare(hdev);
gaudi2_sync_irqs(hdev);
+ hl_cn_synchronize_irqs(hdev);
}
static void gaudi2_init_firmware_preload_params(struct hl_device *hdev)
@@ -5377,7 +5531,7 @@ static void gaudi2_arm_monitors_for_virt_msix_db(struct hl_device *hdev, u32 sob
static void gaudi2_prepare_sm_for_virt_msix_db(struct hl_device *hdev)
{
- u32 decoder_id, sob_id, first_mon_id, interrupt_id;
+ u32 decoder_id, port, sob_id, first_mon_id, interrupt_id;
struct asic_fixed_properties *prop = &hdev->asic_prop;
/* Decoder normal/abnormal interrupts */
@@ -5395,6 +5549,14 @@ static void gaudi2_prepare_sm_for_virt_msix_db(struct hl_device *hdev)
interrupt_id += 1;
gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
}
+
+ /* NIC EQ interrupts */
+ for (port = 0 ; port < NIC_NUMBER_OF_PORTS ; ++port) {
+ sob_id = GAUDI2_RESERVED_SOB_NIC_PORT_FIRST + port;
+ first_mon_id = GAUDI2_RESERVED_MON_NIC_PORT_FIRST + 3 * port;
+ interrupt_id = GAUDI2_IRQ_NUM_NIC_PORT_FIRST + port;
+ gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
+ }
}
static void gaudi2_init_sm(struct hl_device *hdev)
@@ -6161,6 +6323,8 @@ static int gaudi2_hw_init(struct hl_device *hdev)
return rc;
}
+ gaudi2_cn_quiescence(hdev);
+
gaudi2_init_scrambler_hbm(hdev);
gaudi2_init_kdma(hdev);
@@ -6187,6 +6351,7 @@ static int gaudi2_hw_init(struct hl_device *hdev)
gaudi2_init_mme(hdev);
gaudi2_init_rotator(hdev);
gaudi2_init_dec(hdev);
+ gaudi2_init_cn(hdev);
gaudi2_enable_timestamp(hdev);
rc = gaudi2_coresight_init(hdev);
@@ -6469,6 +6634,12 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset
gaudi2_poll_btm_indication(hdev, poll_timeout_us);
}
+ /* On PLDM the NIC PHY link is always up, and because the NIC interrupts are enabled by
+ * default - need to disable the interrupts ASAP.
+ */
+ if (hard_reset && hdev->pldm)
+ gaudi2_cn_disable_interrupts(hdev);
+
if (!gaudi2)
return 0;
@@ -6488,7 +6659,7 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset
HW_CAP_PMMU | HW_CAP_CPU | HW_CAP_CPU_Q |
HW_CAP_SRAM_SCRAMBLER | HW_CAP_DMMU_MASK |
HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_KDMA |
- HW_CAP_MME_MASK | HW_CAP_ROT_MASK);
+ HW_CAP_MME_MASK | HW_CAP_ROT_MASK | HW_CAP_NIC_DRV);
memset(gaudi2->events_stat, 0, sizeof(gaudi2->events_stat));
} else {
@@ -7173,6 +7344,8 @@ static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
gaudi2_init_security(hdev);
+ gaudi2_cn_compute_reset_late_init(hdev);
+
/* Unmask all IRQs since some could have been received during the soft reset */
irq_arr_size = gaudi2->num_of_valid_hw_events * sizeof(gaudi2->hw_events[0]);
return hl_fw_unmask_irq_arr(hdev, gaudi2->hw_events, irq_arr_size);
@@ -7557,6 +7730,30 @@ static void gaudi2_hw_queues_unlock(struct hl_device *hdev)
spin_unlock(&gaudi2->hw_queues_lock);
}
+static void gaudi2_init_cn(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 i, reg_base, queue_id;
+
+ if (!hdev->cn.ports_mask)
+ return;
+
+ if (gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK)
+ return;
+
+ queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
+
+ for (i = 0; i < NIC_NUMBER_OF_ENGINES;
+ i++, queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!(hdev->cn.ports_mask & BIT(i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[queue_id];
+ gaudi2_init_qman(hdev, reg_base, queue_id);
+ gaudi2->nic_hw_cap_initialized |= BIT_ULL(i);
+ }
+}
+
static u32 gaudi2_get_pci_id(struct hl_device *hdev)
{
return hdev->pdev->device;
@@ -7779,7 +7976,7 @@ static int gaudi2_mmu_shared_prepare(struct hl_device *hdev, u32 asid)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
u32 rw_asid, offset;
- int rc, i;
+ int rc, i, q;
rw_asid = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_MASK, asid) |
FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_MASK, asid);
@@ -7819,6 +8016,18 @@ static int gaudi2_mmu_shared_prepare(struct hl_device *hdev, u32 asid)
if (rc)
return rc;
+ /* NIC */
+ for (i = 0 ; i < NIC_NUMBER_OF_MACROS ; i++)
+ for (q = 0 ; q < NIC_NUMBER_OF_QM_PER_MACRO ; q++) {
+ if (!(hdev->cn.ports_mask & BIT(i * NIC_NUMBER_OF_QM_PER_MACRO + q)))
+ continue;
+
+ WREG32(mmNIC0_QM0_AXUSER_NONSECURED_HB_ASID +
+ i * NIC_OFFSET + q * NIC_QM_OFFSET, rw_asid);
+ WREG32(mmNIC0_QM0_AXUSER_NONSECURED_HB_MMU_BP +
+ i * NIC_OFFSET + q * NIC_QM_OFFSET, 0);
+ }
+
return 0;
}
@@ -7936,6 +8145,47 @@ static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type,
return !!ecc_data->is_critical;
}
+static void gaudi2_handle_bmon_spmu_event(struct hl_device *hdev, u8 macro_index)
+{
+ /* At this point no profiler is configuring the BMON and SPMU blocks, and
+ * therefore we can't deduce which port and entity triggered the interrupt.
+ * So for now we only clear all interrupt registers to prevent an interrupt flood.
+ */
+ u32 macro_offset = GAUDI2_SPMU_NIC1_DBG_0 - GAUDI2_SPMU_NIC0_DBG_0;
+
+ WREG32(debug_spmu_regs[GAUDI2_SPMU_NIC0_DBG_0 + macro_index * macro_offset]
+ + mmSPMU_PMINTENCLR_EL1_OFFSET, 0x1);
+ WREG32(debug_spmu_regs[GAUDI2_SPMU_NIC0_DBG_1 + macro_index * macro_offset]
+ + mmSPMU_PMINTENCLR_EL1_OFFSET, 0x1);
+
+ macro_offset = GAUDI2_BMON_NIC1_DBG_0_0 - GAUDI2_BMON_NIC0_DBG_0_0;
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_0_0 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_1_0 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_2_0 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_0_1 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_1_1 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+ WREG32(debug_bmon_regs[GAUDI2_BMON_NIC0_DBG_2_1 + macro_index * macro_offset]
+ + mmBMON_INT_CLR_OFFSET, 0x1);
+}
+
+static int gaudi2_handle_nic_sw_error_event(struct hl_device *hdev, u16 event_type, u8 macro_index,
+ struct hl_eq_nic_intr_cause *nic_intr_cause)
+{
+ u32 error_count;
+
+ error_count = gaudi2_cn_handle_sw_error_event(hdev, event_type, macro_index,
+ nic_intr_cause);
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
static void handle_lower_qman_data_on_err(struct hl_device *hdev, u64 qman_base, u32 engine_id)
{
struct undefined_opcode_info *undef_opcode = &hdev->captured_err_info.undef_opcode;
@@ -8347,6 +8597,22 @@ static void gaudi2_check_if_razwi_happened(struct hl_device *hdev)
gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, mod_idx, 0, NULL);
}
+static int gaudi2_handle_nic_axi_error_response_event(struct hl_device *hdev, u16 event_type,
+ u8 macro_index, struct hl_eq_nic_intr_cause *nic_intr_cause,
+ u64 *event_mask)
+{
+ u32 error_count = 0;
+
+ error_count = gaudi2_cn_handle_axi_error_response_event(hdev, event_type, macro_index,
+ nic_intr_cause);
+
+ /* check if RAZWI happened */
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_NIC, macro_index, 0, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
static int gaudi2_psoc_razwi_get_engines(struct gaudi2_razwi_info *razwi_info, u32 array_size,
u32 axuser_xy, u32 *base, u16 *eng_id,
char *eng_name)
@@ -8560,6 +8826,11 @@ static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type,
qman_base = mmROT0_QM_BASE + index * ROT_OFFSET;
module = RAZWI_ROT;
break;
+ case GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE:
+ index = event_type - GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE;
+ qman_base = mmNIC0_QM0_BASE + index * NIC_OFFSET;
+ module = RAZWI_NIC;
+ break;
default:
return 0;
}
@@ -8684,6 +8955,13 @@ static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *e
qid_base = GAUDI2_QUEUE_ID_ROT_1_0;
qman_base = mmROT1_QM_BASE;
break;
+ case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1:
+ index = event_type - GAUDI2_EVENT_NIC0_QM0;
+ qid_base = GAUDI2_QUEUE_ID_NIC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmNIC0_QM0_BASE +
+ (index / NIC_NUMBER_OF_QM_PER_MACRO) * NIC_OFFSET +
+ (index % NIC_NUMBER_OF_QM_PER_MACRO) * NIC_QM_OFFSET;
+ break;
default:
return 0;
}
@@ -10176,6 +10454,25 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
break;
+ case GAUDI2_EVENT_NIC0_BMON_SPMU:
+ case GAUDI2_EVENT_NIC1_BMON_SPMU:
+ case GAUDI2_EVENT_NIC2_BMON_SPMU:
+ case GAUDI2_EVENT_NIC3_BMON_SPMU:
+ case GAUDI2_EVENT_NIC4_BMON_SPMU:
+ case GAUDI2_EVENT_NIC5_BMON_SPMU:
+ case GAUDI2_EVENT_NIC6_BMON_SPMU:
+ case GAUDI2_EVENT_NIC7_BMON_SPMU:
+ case GAUDI2_EVENT_NIC8_BMON_SPMU:
+ case GAUDI2_EVENT_NIC9_BMON_SPMU:
+ case GAUDI2_EVENT_NIC10_BMON_SPMU:
+ case GAUDI2_EVENT_NIC11_BMON_SPMU:
+ index = (event_type - GAUDI2_EVENT_NIC0_BMON_SPMU) /
+ (GAUDI2_EVENT_NIC1_BMON_SPMU - GAUDI2_EVENT_NIC0_BMON_SPMU);
+ gaudi2_handle_bmon_spmu_event(hdev, index);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
@@ -10213,6 +10510,33 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
break;
+ case GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE:
+ index = (event_type - GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE);
+ error_count = gaudi2_handle_nic_axi_error_response_event(hdev, event_type, index,
+ &eq_entry->nic_intr_cause, &event_mask);
+ error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_NIC0_SW_ERROR:
+ case GAUDI2_EVENT_NIC1_SW_ERROR:
+ case GAUDI2_EVENT_NIC2_SW_ERROR:
+ case GAUDI2_EVENT_NIC3_SW_ERROR:
+ case GAUDI2_EVENT_NIC4_SW_ERROR:
+ case GAUDI2_EVENT_NIC5_SW_ERROR:
+ case GAUDI2_EVENT_NIC6_SW_ERROR:
+ case GAUDI2_EVENT_NIC7_SW_ERROR:
+ case GAUDI2_EVENT_NIC8_SW_ERROR:
+ case GAUDI2_EVENT_NIC9_SW_ERROR:
+ case GAUDI2_EVENT_NIC10_SW_ERROR:
+ case GAUDI2_EVENT_NIC11_SW_ERROR:
+ index = (event_type - GAUDI2_EVENT_NIC0_SW_ERROR) /
+ (GAUDI2_EVENT_NIC1_SW_ERROR - GAUDI2_EVENT_NIC0_SW_ERROR);
+ error_count = gaudi2_handle_nic_sw_error_event(hdev, event_type, index,
+ &eq_entry->nic_intr_cause);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
dev_info(hdev->dev, "CPLD shutdown cause, reset reason: 0x%llx\n",
le64_to_cpu(eq_entry->data[0]));
@@ -10233,6 +10557,11 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
break;
+ case GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 ... GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1:
+ hl_cn_send_status(hdev, event_type - GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0, 0, 0);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ break;
+
case GAUDI2_EVENT_ARC_DCCM_FULL:
error_count = hl_arc_event_handle(hdev, event_type, &eq_entry->arc_data);
event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
@@ -11510,6 +11839,54 @@ static u32 *gaudi2_get_stream_master_qid_arr(void)
return NULL;
}
+static void gaudi2_update_nic_qmans_state(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 nic_mask, queue_id;
+ int nic_id;
+
+ queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
+
+ for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES;
+ nic_id++, queue_id += NUM_OF_PQ_PER_QMAN) {
+
+ nic_mask = BIT(HW_CAP_NIC_SHIFT + nic_id);
+
+ /* We need to stop and disable the QMAN in case we already
+ * initialized it AND the firmware reported that the matching
+ * NIC is disabled
+ */
+ if ((gaudi2->nic_hw_cap_initialized & nic_mask) &&
+ (!(hdev->cn.ports_mask & BIT(nic_id)))) {
+
+ gaudi2_stop_qman_common(hdev, gaudi2_qm_blocks_bases[queue_id]);
+ gaudi2_disable_qman_common(hdev, gaudi2_qm_blocks_bases[queue_id]);
+
+ /* Remove that capability bit so no one would be able
+ * to submit to that NIC's QMAN
+ */
+ gaudi2->nic_hw_cap_initialized &= ~nic_mask;
+ }
+ }
+}
+
+static int gaudi2_cn_init(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = hl_cn_init(hdev);
+ if (rc)
+ return rc;
+
+ /* After ports initialization (for the first time), we need to check
+ * whether the F/W reported any disabled ports. For those ports, we
+ * need to disable their QMANs and update the HW_CAP bits.
+ */
+ gaudi2_update_nic_qmans_state(hdev);
+
+ return 0;
+}
+
static void gaudi2_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
struct attribute_group *dev_vrm_attr_grp)
{
@@ -11590,6 +11967,17 @@ static void gaudi2_write_pte(struct hl_device *hdev, u64 addr, u64 val)
writeq(val, hdev->pcie_bar[DRAM_BAR_ID] + (addr - gaudi2->dram_bar_cur_addr));
}
+static int gaudi2_get_reg_pcie_addr(struct hl_device *hdev, u32 reg, u64 *pci_addr)
+{
+ u64 offset = CFG_BASE - STM_FLASH_BASE_ADDR + reg;
+
+ if (pci_resource_len(hdev->pdev, SRAM_CFG_BAR_ID) < offset)
+ return -EINVAL;
+
+ *pci_addr = offset;
+ return 0;
+}
+
static const struct hl_asic_funcs gaudi2_funcs = {
.early_init = gaudi2_early_init,
.early_fini = gaudi2_early_fini,
@@ -11641,10 +12029,13 @@ static const struct hl_asic_funcs gaudi2_funcs = {
.get_eeprom_data = gaudi2_get_eeprom_data,
.get_monitor_dump = gaudi2_get_monitor_dump,
.send_cpu_message = gaudi2_send_cpu_message,
+ .cn_init = gaudi2_cn_init,
+ .cn_fini = hl_cn_fini,
.pci_bars_map = gaudi2_pci_bars_map,
.init_iatu = gaudi2_init_iatu,
.rreg = hl_rreg,
.wreg = hl_wreg,
+ .get_reg_pcie_addr = gaudi2_get_reg_pcie_addr,
.halt_coresight = gaudi2_halt_coresight,
.ctx_init = gaudi2_ctx_init,
.ctx_fini = gaudi2_ctx_fini,
@@ -11679,6 +12070,7 @@ static const struct hl_asic_funcs gaudi2_funcs = {
.get_sob_addr = &gaudi2_get_sob_addr,
.set_pci_memory_regions = gaudi2_set_pci_memory_regions,
.get_stream_master_qid_arr = gaudi2_get_stream_master_qid_arr,
+ .cn_funcs = &gaudi2_cn_funcs,
.check_if_razwi_happened = gaudi2_check_if_razwi_happened,
.mmu_get_real_page_size = gaudi2_mmu_get_real_page_size,
.access_dev_mem = hl_access_dev_mem,
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2P.h b/drivers/accel/habanalabs/gaudi2/gaudi2P.h
index eee41387b269..429e89703e36 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2P.h
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2P.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0
*
- * Copyright 2020-2022 HabanaLabs, Ltd.
+ * Copyright 2020-2024 HabanaLabs, Ltd.
* All Rights Reserved.
*
*/
@@ -8,12 +8,15 @@
#ifndef GAUDI2P_H_
#define GAUDI2P_H_
+#include <linux/net/intel/cn_aux.h>
+#include <linux/net/intel/gaudi2_aux.h>
#include <uapi/drm/habanalabs_accel.h>
#include "../common/habanalabs.h"
#include <linux/habanalabs/hl_boot_if.h>
#include "../include/gaudi2/gaudi2.h"
#include "../include/gaudi2/gaudi2_packets.h"
#include "../include/gaudi2/gaudi2_fw_if.h"
+#include "../include/gaudi2/gaudi2_coresight.h"
#include "../include/gaudi2/gaudi2_async_events.h"
#define GAUDI2_LINUX_FW_FILE "habanalabs/gaudi2/gaudi2-fit.itb"
@@ -86,7 +89,7 @@
#define GAUDI2_BOOT_FIT_REQ_TIMEOUT_USEC 10000000 /* 10s */
-#define GAUDI2_NIC_CLK_FREQ 450000000ull /* 450 MHz */
+#define GAUDI2_NIC_CLK_FREQ 488000000ull /* 488 MHz */
#define DC_POWER_DEFAULT 60000 /* 60W */
@@ -198,6 +201,7 @@
HW_CAP_HBM_SCRAMBLER_SW_RESET)
#define HW_CAP_HBM_SCRAMBLER_SHIFT 41
#define HW_CAP_RESERVED BIT(43)
+#define HW_CAP_NIC_DRV BIT(44)
#define HW_CAP_MMU_MASK (HW_CAP_PMMU | HW_CAP_DMMU_MASK)
/* Range Registers */
@@ -239,6 +243,8 @@
#define GAUDI2_NUM_TESTED_QS (GAUDI2_QUEUE_ID_CPU_PQ - GAUDI2_QUEUE_ID_PDMA_0_0)
+extern u64 debug_bmon_regs[GAUDI2_BMON_LAST + 1];
+extern u64 debug_spmu_regs[GAUDI2_SPMU_LAST + 1];
enum gaudi2_reserved_sob_id {
GAUDI2_RESERVED_SOB_CS_COMPLETION_FIRST,
@@ -251,6 +257,9 @@ enum gaudi2_reserved_sob_id {
GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST,
GAUDI2_RESERVED_SOB_DEC_ABNRM_LAST =
GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + NUMBER_OF_DEC - 1,
+ GAUDI2_RESERVED_SOB_NIC_PORT_FIRST,
+ GAUDI2_RESERVED_SOB_NIC_PORT_LAST =
+ GAUDI2_RESERVED_SOB_NIC_PORT_FIRST + NIC_NUMBER_OF_PORTS - 1,
GAUDI2_RESERVED_SOB_NUMBER
};
@@ -265,6 +274,9 @@ enum gaudi2_reserved_mon_id {
GAUDI2_RESERVED_MON_DEC_ABNRM_FIRST,
GAUDI2_RESERVED_MON_DEC_ABNRM_LAST =
GAUDI2_RESERVED_MON_DEC_ABNRM_FIRST + 3 * NUMBER_OF_DEC - 1,
+ GAUDI2_RESERVED_MON_NIC_PORT_FIRST,
+ GAUDI2_RESERVED_MON_NIC_PORT_LAST =
+ GAUDI2_RESERVED_MON_NIC_PORT_FIRST + 3 * NIC_NUMBER_OF_PORTS - 1,
GAUDI2_RESERVED_MON_NUMBER
};
@@ -513,13 +525,8 @@ struct gaudi2_queues_test_info {
* @events_stat: array that holds histogram of all received events.
* @events_stat_aggregate: same as events_stat but doesn't get cleared on reset.
* @num_of_valid_hw_events: used to hold the number of valid H/W events.
- * @nic_ports: array that holds all NIC ports manage structures.
- * @nic_macros: array that holds all NIC macro manage structures.
- * @core_info: core info to be used by the Ethernet driver.
- * @aux_ops: functions for core <-> aux drivers communication.
- * @flush_db_fifo: flag to force flush DB FIFO after a write.
- * @hbm_cfg: HBM subsystem settings
- * @hw_queues_lock_mutex: used by simulator instead of hw_queues_lock.
+ * @cn_aux_ops: functions for core <-> accel drivers communication.
+ * @cn_aux_data: data to be used by the core driver.
* @queues_test_info: information used by the driver when testing the HW queues.
*/
struct gaudi2_device {
@@ -549,6 +556,10 @@ struct gaudi2_device {
u32 events_stat_aggregate[GAUDI2_EVENT_SIZE];
u32 num_of_valid_hw_events;
+ /* NIC fields */
+ struct gaudi2_cn_aux_ops cn_aux_ops;
+ struct gaudi2_cn_aux_data cn_aux_data;
+
/* Queue testing */
struct gaudi2_queues_test_info queues_test_info[GAUDI2_NUM_TESTED_QS];
};
@@ -588,6 +599,7 @@ enum gaudi2_block_types {
GAUDI2_BLOCK_TYPE_MAX
};
+extern struct hl_cn_funcs gaudi2_cn_funcs;
extern const u32 gaudi2_dma_core_blocks_bases[DMA_CORE_ID_SIZE];
extern const u32 gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_SIZE];
extern const u32 gaudi2_mme_acc_blocks_bases[MME_ID_SIZE];
@@ -609,4 +621,15 @@ int gaudi2_init_security(struct hl_device *hdev);
void gaudi2_ack_protection_bits_errors(struct hl_device *hdev);
int gaudi2_send_device_activity(struct hl_device *hdev, bool open);
+/* Functions exported for NIC */
+void gaudi2_cn_spmu_get_stats_info(struct hl_device *hdev, u32 port, struct hbl_cn_stat **stats,
+ u32 *n_stats);
+int gaudi2_cn_spmu_config(struct hl_device *hdev, u32 port, u32 num_event_types, u32 event_types[],
+ bool enable);
+int gaudi2_cn_spmu_sample(struct hl_device *hdev, u32 port, u32 num_out_data, u64 out_data[]);
+void gaudi2_cn_disable_interrupts(struct hl_device *hdev);
+void gaudi2_cn_quiescence(struct hl_device *hdev);
+void gaudi2_cn_compute_reset_prepare(struct hl_device *hdev);
+void gaudi2_cn_compute_reset_late_init(struct hl_device *hdev);
+
#endif /* GAUDI2P_H_ */
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2_cn.c b/drivers/accel/habanalabs/gaudi2/gaudi2_cn.c
new file mode 100644
index 000000000000..9beb11b579e4
--- /dev/null
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2_cn.c
@@ -0,0 +1,424 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020-2022 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2_cn.h"
+#include "../include/gaudi2/asic_reg/gaudi2_regs.h"
+#include "../include/gaudi2/gaudi2_async_ids_map_extended.h"
+#include "../include/hw_ip/nic/nic_general.h"
+
+static bool gaudi2_cn_get_hw_cap(struct hl_device *hdev);
+
+int gaudi2_cn_handle_sw_error_event(struct hl_device *hdev, u16 event_type, u8 macro_index,
+ struct hl_eq_nic_intr_cause *nic_intr_cause)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->cn.cn_aux_dev;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *aux_ops = &gaudi2->cn_aux_ops;
+ u32 error_count = 0;
+
+ if (aux_ops->sw_err_event_handler)
+ error_count = aux_ops->sw_err_event_handler(aux_dev, event_type, macro_index,
+ nic_intr_cause);
+
+ return error_count;
+}
+
+int gaudi2_cn_handle_axi_error_response_event(struct hl_device *hdev, u16 event_type,
+ u8 macro_index,
+ struct hl_eq_nic_intr_cause *nic_intr_cause)
+{
+ struct hbl_aux_dev *aux_dev = &hdev->cn.cn_aux_dev;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *aux_ops = &gaudi2->cn_aux_ops;
+ u32 error_count = 0;
+
+ if (aux_ops->axi_error_response_event_handler)
+ error_count = aux_ops->axi_error_response_event_handler(aux_dev, event_type,
+ macro_index,
+ nic_intr_cause);
+
+ return error_count;
+}
+
+/**
+ * gaudi2_cn_disable_interrupts() - Disable interrupts of all ports.
+ * @hdev: habanalabs device structure.
+ *
+ * Gaudi2 CN interrupts are enabled by default, so they need to be disabled ASAP
+ * before ports init and after hard reset.
+ */
+void gaudi2_cn_disable_interrupts(struct hl_device *hdev)
+{
+ u32 port;
+
+ if (!hdev->cn.ports_mask)
+ return;
+
+ /* we only need the port number for NIC_WREG32 */
+ for (port = 0 ; port < NIC_NUMBER_OF_PORTS ; port++) {
+ NIC_WREG32(mmNIC0_QPC0_EVENT_QUE_CFG, 0);
+ NIC_WREG32(mmNIC0_QPC0_INTERRUPT_EN, 0);
+ NIC_WREG32(mmNIC0_QPC0_INTERRUPT_MASK, 0xFFFFFFFF);
+
+ /* These registers need to be configured only in case of PLDM */
+ if (hdev->pldm) {
+ NIC_WREG32(mmNIC0_QPC0_INTERRUPT_RESP_ERR_MASK, 0xFFFFFFFF);
+ NIC_WREG32(mmNIC0_TXE0_INTERRUPT_MASK, 0xFFFFFFFF);
+ NIC_WREG32(mmNIC0_RXE0_SPI_INTR_MASK, 0xFFFFFFFF);
+ NIC_WREG32(mmNIC0_RXE0_SEI_INTR_MASK, 0xFFFFFFFF);
+ NIC_WREG32(mmNIC0_TXS0_INTERRUPT_MASK, 0xFFFFFFFF);
+ }
+
+ /* WA for H/W bug H6-3339 - mask the link UP interrupt */
+ NIC_MACRO_WREG32(mmNIC0_PHY_PHY_LINK_STS_INTR, 0x1);
+ }
+
+ /* flush */
+ port = 0;
+ NIC_RREG32(mmNIC0_QPC0_EVENT_QUE_CFG);
+}
+
+/**
+ * gaudi2_cn_quiescence() - make sure the NIC neither generates events nor
+ * receives traffic.
+ * @hdev: habanalabs device structure.
+ *
+ * Gaudi2 default values at power-up and after hard reset are interrupts enabled
+ * and Rx enabled, so both must be disabled until driver configuration is
+ * complete.
+ */
+void gaudi2_cn_quiescence(struct hl_device *hdev)
+{
+ /*
+ * Do not quiesce the ports during the device release reset
+ * (aka soft reset) flow.
+ */
+ if (gaudi2_cn_get_hw_cap(hdev))
+ return;
+
+ dev_dbg(hdev->dev, "Quiescence the NICs\n");
+
+ gaudi2_cn_disable_interrupts(hdev);
+}
+
+static bool gaudi2_cn_get_hw_cap(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ return (gaudi2->hw_cap_initialized & HW_CAP_NIC_DRV);
+}
+
+static void gaudi2_cn_set_hw_cap(struct hl_device *hdev, bool enable)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (enable)
+ gaudi2->hw_cap_initialized |= HW_CAP_NIC_DRV;
+ else
+ gaudi2->hw_cap_initialized &= ~HW_CAP_NIC_DRV;
+}
+
+/**
+ * gaudi2_cn_override_ports_ext_mask() - Set the external ports mask.
+ * @hdev: habanalabs device structure.
+ * @ports_ext_mask: Out parameter, the resulting external ports mask.
+ */
+static void gaudi2_cn_override_ports_ext_mask(struct hl_device *hdev, uint64_t *ports_ext_mask)
+{
+ /* For asic type GAUDI2B, the external ports mask shouldn't be changed */
+ if (hdev->asic_type == ASIC_GAUDI2B) {
+ *ports_ext_mask = hdev->cn.ports_ext_mask;
+ return;
+ }
+
+ /* If we are running on a PCI card, all the ports should be set as external */
+ if (hdev->card_type == cpucp_card_type_pci) {
+ *ports_ext_mask = hdev->cn.ports_mask;
+ return;
+ }
+
+ /* Otherwise (e.g. an HLS2 setup), keep the external ports mask unchanged */
+ *ports_ext_mask = hdev->cn.ports_ext_mask;
+}
+
+static bool gaudi2_cn_check_oui_prefix_validity(u8 *mac_addr)
+{
+ u8 mac[ETH_ALEN];
+ int i;
+
+ for (i = 0 ; i < 3 ; i++)
+ mac[i] = HABANALABS_MAC_OUI_1 >> (8 * (2 - i));
+
+ /* use memcmp as the OUI bytes may contain zeros */
+ if (!memcmp(mac, mac_addr, 3))
+ return true;
+
+ for (i = 0 ; i < 3 ; i++)
+ mac[i] = HABANALABS_MAC_OUI_2 >> (8 * (2 - i));
+
+ if (!memcmp(mac, mac_addr, 3))
+ return true;
+
+ return false;
+}
+
+int gaudi2_cn_set_info(struct hl_device *hdev, bool get_from_fw)
+{
+ struct hbl_cn_cpucp_info *cn_cpucp_info = &hdev->asic_prop.cn_props.cpucp_info;
+ struct cpucp_info *cpucp_info = &hdev->asic_prop.cpucp_info;
+ struct hbl_cn_cpucp_mac_addr *mac_arr = cn_cpucp_info->mac_addrs;
+ struct hl_cn *cn = &hdev->cn;
+ u32 serdes_type = MAX_NUM_SERDES_TYPE;
+ u8 mac[ETH_ALEN], *mac_addr;
+ int rc, i;
+
+ /* copy the MAC OUI in reverse */
+ for (i = 0 ; i < 3 ; i++)
+ mac[i] = HABANALABS_MAC_OUI_1 >> (8 * (2 - i));
+
+ if (get_from_fw) {
+ rc = hl_cn_cpucp_info_get(hdev);
+ if (rc)
+ return rc;
+
+ cn->ports_mask &= cn_cpucp_info->link_mask[0];
+ cn->ports_ext_mask &= cn_cpucp_info->link_ext_mask[0];
+ cn->auto_neg_mask &= cn_cpucp_info->auto_neg_mask[0];
+
+ serdes_type = cn_cpucp_info->serdes_type;
+
+ /* check for invalid MAC addresses from F/W (bad OUI) */
+ for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+ if (!(cn->ports_mask & BIT(i)))
+ continue;
+
+ mac_addr = mac_arr[i].mac_addr;
+ if (!gaudi2_cn_check_oui_prefix_validity(mac_addr))
+ dev_warn(hdev->dev, "unrecognized MAC OUI %pM, port %d\n",
+ mac_addr, i);
+ }
+
+ cn->card_location = le32_to_cpu(cpucp_info->card_location);
+ cn->use_fw_serdes_info = true;
+ } else {
+ /* No F/W, hence need to set the MACs manually (randomize) */
+ get_random_bytes(&mac[3], 2);
+
+ for (i = 0 ; i < NIC_NUMBER_OF_PORTS ; i++) {
+ if (!(cn->ports_mask & BIT(i)))
+ continue;
+
+ mac[ETH_ALEN - 1] = i;
+ memcpy(mac_arr[i].mac_addr, mac, ETH_ALEN);
+ }
+
+ dev_warn(hdev->dev, "can't read card location as FW security is enabled\n");
+ }
+
+ switch (serdes_type) {
+ case HLS2_SERDES_TYPE:
+ hdev->asic_prop.server_type = HL_SERVER_GAUDI2_HLS2;
+ break;
+ case HLS2_TYPE_1_SERDES_TYPE:
+ hdev->asic_prop.server_type = HL_SERVER_GAUDI2_TYPE1;
+ break;
+ default:
+ hdev->asic_prop.server_type = HL_SERVER_TYPE_UNKNOWN;
+
+ if (get_from_fw) {
+ dev_err(hdev->dev, "bad SerDes type %d\n", serdes_type);
+ return -EFAULT;
+ }
+ break;
+ }
+
+ /* If we are running on a non-HLS2 setup or a PCI card, all the ports should be set as
+ * external (the only exception is when the ASIC type is GAUDI2B).
+ */
+ if (hdev->card_type == cpucp_card_type_pci) {
+ if (hdev->asic_type != ASIC_GAUDI2B)
+ cn->ports_ext_mask = cn->ports_mask;
+
+ cn->auto_neg_mask &= ~cn->ports_ext_mask;
+ }
+
+ gaudi2_cn_override_ports_ext_mask(hdev, &cn->ports_ext_mask);
+
+ if (hdev->card_type == cpucp_card_type_pci)
+ cn->auto_neg_mask &= ~cn->ports_ext_mask;
+
+ /* Disable ANLT on NIC 0 ports (due to lane swapping) */
+ cn->auto_neg_mask &= ~0x3;
+
+ cn->lanes_per_port = 2;
+ cn->load_fw = false;
+ cn->eth_on_internal = false;
+
+ return 0;
+}
+
+static int gaudi2_cn_pre_core_init(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static char *gaudi2_cn_get_event_name(struct hbl_aux_dev *aux_dev, u16 event_type)
+{
+ return gaudi2_irq_map_table[event_type].valid ? gaudi2_irq_map_table[event_type].name :
+ "N/A Event";
+}
+
+static int gaudi2_cn_poll_mem(struct hbl_aux_dev *aux_dev, u32 *addr, u32 *val,
+ hbl_cn_poll_cond_func func)
+{
+ struct hl_device *hdev = container_of(aux_dev, struct hl_device, cn.cn_aux_dev);
+
+ return hl_poll_timeout_memory(hdev, addr, *val, func(*val, NULL), 10,
+ HL_DEVICE_TIMEOUT_USEC, true);
+}
+
+static void *gaudi2_cn_dma_alloc_coherent(struct hbl_aux_dev *aux_dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag)
+{
+ return hl_cn_dma_alloc_coherent(aux_dev, size, dma_handle, flag);
+}
+
+static void gaudi2_cn_dma_free_coherent(struct hbl_aux_dev *aux_dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle)
+{
+ hl_cn_dma_free_coherent(aux_dev, size, cpu_addr, dma_handle);
+}
+
+static void *gaudi2_cn_dma_pool_zalloc(struct hbl_aux_dev *aux_dev, size_t size, gfp_t mem_flags,
+ dma_addr_t *dma_handle)
+{
+ return hl_cn_dma_pool_zalloc(aux_dev, size, mem_flags, dma_handle);
+}
+
+static void gaudi2_cn_dma_pool_free(struct hbl_aux_dev *aux_dev, void *vaddr, dma_addr_t dma_addr)
+{
+ hl_cn_dma_pool_free(aux_dev, vaddr, dma_addr);
+}
+
+static void gaudi2_cn_set_cn_data(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_data *gaudi2_aux_data;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hbl_cn_aux_data *aux_data;
+ struct hbl_cn_aux_ops *aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ aux_data = aux_dev->aux_data;
+ gaudi2_aux_data = &gaudi2->cn_aux_data;
+ aux_data->asic_specific = gaudi2_aux_data;
+ aux_ops = aux_dev->aux_ops;
+ gaudi2_aux_ops = &gaudi2->cn_aux_ops;
+ aux_ops->asic_ops = gaudi2_aux_ops;
+
+ gaudi2_aux_data->cfg_base = CFG_BASE;
+ gaudi2_aux_data->fw_security_enabled = hdev->asic_prop.fw_security_enabled;
+ gaudi2_aux_data->msix_enabled = !!(gaudi2->hw_cap_initialized & HW_CAP_MSIX);
+ gaudi2_aux_data->irq_num_port_base = GAUDI2_IRQ_NUM_NIC_PORT_FIRST;
+ gaudi2_aux_data->sob_id_base = GAUDI2_RESERVED_SOB_NIC_PORT_FIRST;
+ gaudi2_aux_data->sob_inc_cfg_val = GAUDI2_SOB_INCREMENT_BY_ONE;
+ gaudi2_aux_data->setup_type = GAUDI2_SETUP_TYPE_HLS2;
+
+ /* cn2accel */
+ gaudi2_aux_ops->get_event_name = gaudi2_cn_get_event_name;
+ gaudi2_aux_ops->poll_mem = gaudi2_cn_poll_mem;
+ gaudi2_aux_ops->dma_alloc_coherent = gaudi2_cn_dma_alloc_coherent;
+ gaudi2_aux_ops->dma_free_coherent = gaudi2_cn_dma_free_coherent;
+ gaudi2_aux_ops->dma_pool_zalloc = gaudi2_cn_dma_pool_zalloc;
+ gaudi2_aux_ops->dma_pool_free = gaudi2_cn_dma_pool_free;
+ gaudi2_aux_ops->spmu_get_stats_info = hl_cn_spmu_get_stats_info;
+ gaudi2_aux_ops->spmu_config = hl_cn_spmu_config;
+ gaudi2_aux_ops->spmu_sample = hl_cn_spmu_sample;
+ gaudi2_aux_ops->poll_reg = hl_cn_poll_reg;
+ gaudi2_aux_ops->send_cpu_message = hl_cn_send_cpu_message;
+ gaudi2_aux_ops->post_send_status = hl_cn_post_send_status;
+}
+
+void gaudi2_cn_compute_reset_prepare(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ gaudi2_aux_ops = &gaudi2->cn_aux_ops;
+
+ if (gaudi2_aux_ops->reset_prepare)
+ gaudi2_aux_ops->reset_prepare(aux_dev);
+}
+
+void gaudi2_cn_compute_reset_late_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ gaudi2_aux_ops = &gaudi2->cn_aux_ops;
+
+ if (gaudi2_aux_ops->reset_late_init)
+ gaudi2_aux_ops->reset_late_init(aux_dev);
+}
+
+static void gaudi2_cn_post_send_status(struct hl_device *hdev, u32 port)
+{
+ hl_fw_unmask_irq(hdev, GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 + port);
+}
+
+static void gaudi2_cn_ports_stop_prepare(struct hl_device *hdev, bool fw_reset, bool in_teardown)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ gaudi2_aux_ops = &gaudi2->cn_aux_ops;
+
+ if (gaudi2_aux_ops->ports_stop_prepare)
+ gaudi2_aux_ops->ports_stop_prepare(aux_dev, fw_reset, in_teardown);
+}
+
+static int gaudi2_cn_send_port_cpucp_status(struct hl_device *hdev, u32 port, u8 cmd, u8 period)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cn_aux_ops *gaudi2_aux_ops;
+ struct hl_cn *cn = &hdev->cn;
+ struct hbl_aux_dev *aux_dev;
+
+ aux_dev = &cn->cn_aux_dev;
+ gaudi2_aux_ops = &gaudi2->cn_aux_ops;
+
+ if (gaudi2_aux_ops->send_port_cpucp_status)
+ return gaudi2_aux_ops->send_port_cpucp_status(aux_dev, port, cmd, period);
+
+ return -ENODEV;
+}
+
+static struct hl_cn_port_funcs gaudi2_cn_port_funcs = {
+ .spmu_get_stats_info = gaudi2_cn_spmu_get_stats_info,
+ .spmu_config = gaudi2_cn_spmu_config,
+ .spmu_sample = gaudi2_cn_spmu_sample,
+ .post_send_status = gaudi2_cn_post_send_status,
+ .ports_stop_prepare = gaudi2_cn_ports_stop_prepare,
+ .send_port_cpucp_status = gaudi2_cn_send_port_cpucp_status,
+};
+
+struct hl_cn_funcs gaudi2_cn_funcs = {
+ .get_hw_cap = gaudi2_cn_get_hw_cap,
+ .set_hw_cap = gaudi2_cn_set_hw_cap,
+ .pre_core_init = gaudi2_cn_pre_core_init,
+ .set_cn_data = gaudi2_cn_set_cn_data,
+ .port_funcs = &gaudi2_cn_port_funcs,
+};
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2_cn.h b/drivers/accel/habanalabs/gaudi2/gaudi2_cn.h
new file mode 100644
index 000000000000..dbc8ede23779
--- /dev/null
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2_cn.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2020-2024 HabanaLabs, Ltd.
+ * Copyright (C) 2023-2024, Intel Corporation.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef GAUDI2_CN_H_
+#define GAUDI2_CN_H_
+
+#include "gaudi2P.h"
+#include "../include/gaudi2/asic_reg/gaudi2_regs.h"
+
+#define NIC_MAX_RC_MTU SZ_8K
+#define NIC_MAX_RDMA_HDRS 128
+/* This is the max frame length the H/W supports (Tx/Rx) */
+#define NIC_MAX_FRM_LEN (NIC_MAX_RC_MTU + NIC_MAX_RDMA_HDRS)
+
+#define NIC_CFG_LO_SIZE (mmNIC0_QPC1_REQ_STATIC_CONFIG - \
+ mmNIC0_QPC0_REQ_STATIC_CONFIG)
+
+#define NIC_CFG_HI_SIZE (mmNIC0_RXE1_CONTROL - mmNIC0_RXE0_CONTROL)
+
+#define NIC_CFG_BASE(port, reg) \
+ ((u64) (NIC_MACRO_CFG_BASE(port) + \
+ ((reg < mmNIC0_RXE0_CONTROL) ? \
+ (NIC_CFG_LO_SIZE * (u64) ((port) & 1)) : \
+ (NIC_CFG_HI_SIZE * (u64) ((port) & 1)))))
+
+#define NIC_RREG32(reg) RREG32(NIC_CFG_BASE(port, (reg)) + (reg))
+#define NIC_WREG32(reg, val) WREG32(NIC_CFG_BASE(port, (reg)) + (reg), (val))
+#define NIC_RMWREG32(reg, val, mask) \
+ RMWREG32(NIC_CFG_BASE(port, reg) + (reg), (val), (mask))
+
+int gaudi2_cn_set_info(struct hl_device *hdev, bool get_from_fw);
+int gaudi2_cn_handle_sw_error_event(struct hl_device *hdev, u16 event_type, u8 macro_index,
+ struct hl_eq_nic_intr_cause *nic_intr_cause);
+int gaudi2_cn_handle_axi_error_response_event(struct hl_device *hdev, u16 event_type,
+ u8 macro_index, struct hl_eq_nic_intr_cause *nic_intr_cause);
+
+#endif /* GAUDI2_CN_H_ */
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2_coresight.c b/drivers/accel/habanalabs/gaudi2/gaudi2_coresight.c
index 2423620ff358..f7b3692f6bff 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2_coresight.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2_coresight.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright 2019-2022 HabanaLabs, Ltd.
+ * Copyright 2019-2024 HabanaLabs, Ltd.
* All Rights Reserved.
*/
#include "gaudi2_coresight_regs.h"
@@ -9,6 +9,8 @@
#define GAUDI2_PLDM_CORESIGHT_TIMEOUT_USEC (CORESIGHT_TIMEOUT_USEC * 2000)
#define SPMU_MAX_COUNTERS 6
+/* SPMU should also include overflow_idx and cycle_cnt_idx */
+#define SPMU_DATA_LEN (SPMU_MAX_COUNTERS + 2)
#define COMPONENT_ID_INVALID ((u32)(-1))
#define MAX_BMONS_PER_UNIT 8
@@ -59,6 +61,23 @@ struct component_config_offsets {
u32 bmon_ids[MAX_BMONS_PER_UNIT];
};
+static struct hbl_cn_stat gaudi2_nic0_spmu_stats[] = {
+ {"spmu_req_out_of_range_psn", 17},
+ {"spmu_req_unset_psn", 18},
+ {"spmu_res_duplicate_psn", 20},
+ {"spmu_res_out_of_sequence_psn", 21}
+};
+
+static struct hbl_cn_stat gaudi2_nic1_spmu_stats[] = {
+ {"spmu_req_out_of_range_psn", 5},
+ {"spmu_req_unset_psn", 6},
+ {"spmu_res_duplicate_psn", 8},
+ {"spmu_res_out_of_sequence_psn", 9}
+};
+
+static size_t gaudi2_nic0_spmu_stats_len = ARRAY_SIZE(gaudi2_nic0_spmu_stats);
+static size_t gaudi2_nic1_spmu_stats_len = ARRAY_SIZE(gaudi2_nic1_spmu_stats);
+
static u64 debug_stm_regs[GAUDI2_STM_LAST + 1] = {
[GAUDI2_STM_DCORE0_TPC0_EML] = mmDCORE0_TPC0_EML_STM_BASE,
[GAUDI2_STM_DCORE0_TPC1_EML] = mmDCORE0_TPC1_EML_STM_BASE,
@@ -489,7 +508,7 @@ static u64 debug_funnel_regs[GAUDI2_FUNNEL_LAST + 1] = {
[GAUDI2_FUNNEL_NIC11_DBG_NCH] = mmNIC11_DBG_FUNNEL_NCH_BASE
};
-static u64 debug_bmon_regs[GAUDI2_BMON_LAST + 1] = {
+u64 debug_bmon_regs[GAUDI2_BMON_LAST + 1] = {
[GAUDI2_BMON_DCORE0_TPC0_EML_0] = mmDCORE0_TPC0_EML_BUSMON_0_BASE,
[GAUDI2_BMON_DCORE0_TPC0_EML_1] = mmDCORE0_TPC0_EML_BUSMON_1_BASE,
[GAUDI2_BMON_DCORE0_TPC0_EML_2] = mmDCORE0_TPC0_EML_BUSMON_2_BASE,
@@ -877,7 +896,7 @@ static u64 debug_bmon_regs[GAUDI2_BMON_LAST + 1] = {
[GAUDI2_BMON_NIC11_DBG_2_1] = mmNIC11_DBG_BMON2_1_BASE
};
-static u64 debug_spmu_regs[GAUDI2_SPMU_LAST + 1] = {
+u64 debug_spmu_regs[GAUDI2_SPMU_LAST + 1] = {
[GAUDI2_SPMU_DCORE0_TPC0_EML] = mmDCORE0_TPC0_EML_SPMU_BASE,
[GAUDI2_SPMU_DCORE0_TPC1_EML] = mmDCORE0_TPC1_EML_SPMU_BASE,
[GAUDI2_SPMU_DCORE0_TPC2_EML] = mmDCORE0_TPC2_EML_SPMU_BASE,
@@ -2432,6 +2451,11 @@ static int gaudi2_config_bmon(struct hl_device *hdev, struct hl_debug_params *pa
return 0;
}
+static bool gaudi2_reg_is_nic_spmu(enum gaudi2_debug_spmu_regs_index reg_idx)
+{
+ return reg_idx >= GAUDI2_SPMU_NIC0_DBG_0 && reg_idx <= GAUDI2_SPMU_NIC11_DBG_1;
+}
+
static int gaudi2_config_spmu(struct hl_device *hdev, struct hl_debug_params *params)
{
struct hl_debug_params_spmu *input = params->input;
@@ -2542,6 +2566,121 @@ static int gaudi2_config_spmu(struct hl_device *hdev, struct hl_debug_params *pa
return 0;
}
+static int gaudi2_sample_spmu(struct hl_device *hdev, struct hl_debug_params *params)
+{
+ u32 output_arr_len;
+ u32 events_num;
+ u64 base_reg;
+ u64 *output;
+ int i;
+
+ if (params->reg_idx >= ARRAY_SIZE(debug_spmu_regs)) {
+ dev_err(hdev->dev, "Invalid register index in SPMU\n");
+ return -EINVAL;
+ }
+
+ base_reg = debug_spmu_regs[params->reg_idx];
+
+ /* if the base register is 0x0, ignore this configuration */
+ if (!base_reg)
+ return 0;
+
+ output = params->output;
+ output_arr_len = params->output_size / sizeof(u64);
+ events_num = output_arr_len;
+
+ if (output_arr_len < 1) {
+ dev_err(hdev->dev, "not enough values for SPMU sample\n");
+ return -EINVAL;
+ }
+
+ if (events_num > SPMU_MAX_COUNTERS) {
+ dev_err(hdev->dev, "too many events values for SPMU sample\n");
+ return -EINVAL;
+ }
+
+ /* capture */
+ WREG32(base_reg + mmSPMU_PMSCR_OFFSET, 1);
+
+ /* read the shadow registers */
+ for (i = 0 ; i < events_num ; i++)
+ output[i] = RREG32(base_reg + mmSPMU_PMEVCNTSR0_OFFSET + i * 4);
+
+ return 0;
+}
+
+void gaudi2_cn_spmu_get_stats_info(struct hl_device *hdev, u32 port, struct hbl_cn_stat **stats,
+ u32 *n_stats)
+{
+ if (!hdev->supports_coresight) {
+ *n_stats = 0;
+ return;
+ }
+
+ if (port & 1) {
+ *n_stats = gaudi2_nic1_spmu_stats_len;
+ *stats = gaudi2_nic1_spmu_stats;
+ } else {
+ *n_stats = gaudi2_nic0_spmu_stats_len;
+ *stats = gaudi2_nic0_spmu_stats;
+ }
+}
+
+int gaudi2_cn_spmu_config(struct hl_device *hdev, u32 port, u32 num_event_types, u32 event_types[],
+ bool enable)
+{
+ struct hl_debug_params_spmu spmu;
+ struct hl_debug_params params;
+ u64 event_counters[SPMU_DATA_LEN];
+ int i;
+
+ if (!hdev->supports_coresight)
+ return 0;
+
+ /* validate nic port */
+ if (!gaudi2_reg_is_nic_spmu(GAUDI2_SPMU_NIC0_DBG_0 + port)) {
+ dev_err(hdev->dev, "Invalid nic port %u\n", port);
+ return -EINVAL;
+ }
+
+ memset(&params, 0, sizeof(struct hl_debug_params));
+ params.op = HL_DEBUG_OP_SPMU;
+ params.input = &spmu;
+ params.enable = enable;
+ params.output_size = sizeof(event_counters);
+ params.output = event_counters;
+ params.reg_idx = GAUDI2_SPMU_NIC0_DBG_0 + port;
+
+ memset(&spmu, 0, sizeof(struct hl_debug_params_spmu));
+ spmu.event_types_num = num_event_types;
+
+ for (i = 0 ; i < spmu.event_types_num ; i++)
+ spmu.event_types[i] = event_types[i];
+
+ return gaudi2_config_spmu(hdev, &params);
+}
+
+int gaudi2_cn_spmu_sample(struct hl_device *hdev, u32 port, u32 num_out_data, u64 out_data[])
+{
+ struct hl_debug_params params;
+
+ if (!hdev->supports_coresight)
+ return 0;
+
+ /* validate nic port */
+ if (!gaudi2_reg_is_nic_spmu(GAUDI2_SPMU_NIC0_DBG_0 + port)) {
+ dev_err(hdev->dev, "Invalid nic port %u\n", port);
+ return -EINVAL;
+ }
+
+ memset(&params, 0, sizeof(struct hl_debug_params));
+ params.output = out_data;
+ params.output_size = num_out_data * sizeof(u64);
+ params.reg_idx = GAUDI2_SPMU_NIC0_DBG_0 + port;
+
+ return gaudi2_sample_spmu(hdev, &params);
+}
+
int gaudi2_debug_coresight(struct hl_device *hdev, struct hl_ctx *ctx, void *data)
{
struct hl_debug_params *params = data;
--
2.34.1
^ permalink raw reply related [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
@ 2024-06-13 13:01 ` Przemek Kitszel
2024-06-13 14:16 ` Przemek Kitszel
2024-06-17 8:08 ` Omer Shpigelman
2024-06-15 0:05 ` Stephen Hemminger
2024-06-17 14:05 ` Markus Elfring
2 siblings, 2 replies; 107+ messages in thread
From: Przemek Kitszel @ 2024-06-13 13:01 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, zyehudai
On 6/13/24 10:21, Omer Shpigelman wrote:
> Add the hbl_cn driver which will serve both Ethernet and InfiniBand
> drivers.
> hbl_cn is the layer which is used by the satellite drivers for many shared
> operations that are needed by both EN and IB subsystems like QPs, CQs etc.
> The CN driver is initialized via auxiliary bus by the habanalabs driver.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> ---
> .../device_drivers/ethernet/index.rst | 1 +
> .../device_drivers/ethernet/intel/hbl.rst | 82 +
> MAINTAINERS | 11 +
> drivers/net/ethernet/intel/Kconfig | 20 +
> drivers/net/ethernet/intel/Makefile | 1 +
> drivers/net/ethernet/intel/hbl_cn/Makefile | 9 +
> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5954 +++++++++++++++++
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1627 +++++
> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 220 +
> .../intel/hbl_cn/common/hbl_cn_memory.c | 40 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 33 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 13 +
> include/linux/habanalabs/cpucp_if.h | 125 +-
> include/linux/habanalabs/hl_boot_if.h | 9 +-
> include/linux/net/intel/cn.h | 474 ++
> include/linux/net/intel/cn_aux.h | 298 +
> include/linux/net/intel/cni.h | 636 ++
> 18 files changed, 9545 insertions(+), 11 deletions(-)
this is a very big patch, it asks for a split; what's worse, it's
proportional to the size of this series:
146 files changed, 148514 insertions(+), 70 deletions(-)
which is just too big
[...]
> +Support
> +=======
> +For general information, go to the Intel support website at:
> +https://www.intel.com/support/
> +
> +If an issue is identified with the released source code on a supported kernel
> +with a supported adapter, email the specific information related to the issue
> +to intel-wired-lan@lists.osuosl.org.
I welcome you to post the next version of the driver to the IWL mailing
list, and before that, to go through our Intel path for the ethernet
subsystem (rdma and a few smaller ones also go through it)
(that process starts internally, I will PM you the details)
[...]
> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
> @@ -0,0 +1,5954 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_cn.h"
> +
> +#include <linux/file.h>
> +#include <linux/module.h>
> +#include <linux/overflow.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +
> +#define NIC_MIN_WQS_PER_PORT 2
> +
> +#define NIC_SEQ_RESETS_TIMEOUT_MS 15000 /* 15 seconds */
> +#define NIC_MAX_SEQ_RESETS 3
> +
> +#define HBL_CN_IPV4_PROTOCOL_UDP 17
> +
> +/* SOB mask is not expected to change across ASIC. Hence common defines. */
> +#define NIC_SOB_INC_MASK 0x80000000
> +#define NIC_SOB_VAL_MASK 0x7fff
> +
> +#define NIC_DUMP_QP_SZ SZ_4K
> +
> +#define HBL_AUX2NIC(aux_dev) \
> + ({ \
> + struct hbl_aux_dev *__aux_dev = (aux_dev); \
> + ((__aux_dev)->type == HBL_AUX_DEV_ETH) ? \
> + container_of(__aux_dev, struct hbl_cn_device, en_aux_dev) : \
> + container_of(__aux_dev, struct hbl_cn_device, ib_aux_dev); \
> + })
this should be a function
> +
> +#define RAND_STAT_CNT(cnt) \
> + do { \
> + u32 __cnt = get_random_u32(); \
> + (cnt) = __cnt; \
> + dev_info(hdev->dev, "port %d, %s: %u\n", port, #cnt, __cnt); \
no way for such message, ditto for the function
> + } while (0)
> +
> +struct hbl_cn_stat hbl_cn_mac_fec_stats[] = {
> + {"correctable_errors", 0x2, 0x3},
> + {"uncorrectable_errors", 0x4, 0x5}
> +};
> +
> +struct hbl_cn_stat hbl_cn_mac_stats_rx[] = {
> + {"Octets", 0x0},
> + {"OctetsReceivedOK", 0x4},
> + {"aAlignmentErrors", 0x8},
> + {"aPAUSEMACCtrlFramesReceived", 0xC},
> + {"aFrameTooLongErrors", 0x10},
> + {"aInRangeLengthErrors", 0x14},
> + {"aFramesReceivedOK", 0x18},
> + {"aFrameCheckSequenceErrors", 0x1C},
> + {"VLANReceivedOK", 0x20},
> + {"ifInErrors", 0x24},
> + {"ifInUcastPkts", 0x28},
> + {"ifInMulticastPkts", 0x2C},
> + {"ifInBroadcastPkts", 0x30},
> + {"DropEvents", 0x34},
> + {"Pkts", 0x38},
> + {"UndersizePkts", 0x3C},
> + {"Pkts64Octets", 0x40},
> + {"Pkts65to127Octets", 0x44},
> + {"Pkts128to255Octets", 0x48},
> + {"Pkts256to511Octets", 0x4C},
> + {"Pkts512to1023Octets", 0x50},
> + {"Pkts1024to1518Octets", 0x54},
> + {"Pkts1519toMaxOctets", 0x58},
> + {"OversizePkts", 0x5C},
> + {"Jabbers", 0x60},
> + {"Fragments", 0x64},
> + {"aCBFCPAUSERx0", 0x68},
> + {"aCBFCPAUSERx1", 0x6C},
> + {"aCBFCPAUSERx2", 0x70},
> + {"aCBFCPAUSERx3", 0x74},
> + {"aCBFCPAUSERx4", 0x78},
> + {"aCBFCPAUSERx5", 0x7C},
> + {"aCBFCPAUSERx6", 0x80},
> + {"aCBFCPAUSERx7", 0x84},
> + {"aMACControlFramesReceived", 0x88}
> +};
> +
> +struct hbl_cn_stat hbl_cn_mac_stats_tx[] = {
> + {"Octets", 0x0},
> + {"OctetsTransmittedOK", 0x4},
> + {"aPAUSEMACCtrlFramesTransmitted", 0x8},
> + {"aFramesTransmittedOK", 0xC},
> + {"VLANTransmittedOK", 0x10},
> + {"ifOutErrors", 0x14},
> + {"ifOutUcastPkts", 0x18},
> + {"ifOutMulticastPkts", 0x1C},
> + {"ifOutBroadcastPkts", 0x20},
> + {"Pkts64Octets", 0x24},
> + {"Pkts65to127Octets", 0x28},
> + {"Pkts128to255Octets", 0x2C},
> + {"Pkts256to511Octets", 0x30},
> + {"Pkts512to1023Octets", 0x34},
> + {"Pkts1024to1518Octets", 0x38},
> + {"Pkts1519toMaxOctets", 0x3C},
> + {"aCBFCPAUSETx0", 0x40},
> + {"aCBFCPAUSETx1", 0x44},
> + {"aCBFCPAUSETx2", 0x48},
> + {"aCBFCPAUSETx3", 0x4C},
> + {"aCBFCPAUSETx4", 0x50},
> + {"aCBFCPAUSETx5", 0x54},
> + {"aCBFCPAUSETx6", 0x58},
> + {"aCBFCPAUSETx7", 0x5C},
> + {"aMACControlFramesTx", 0x60},
> + {"Pkts", 0x64}
> +};
> +
> +static const char pcs_counters_str[][ETH_GSTRING_LEN] = {
> + {"pcs_local_faults"},
> + {"pcs_remote_faults"},
> + {"pcs_remote_fault_reconfig"},
> + {"pcs_link_restores"},
> + {"pcs_link_toggles"},
> +};
> +
> +static size_t pcs_counters_str_len = ARRAY_SIZE(pcs_counters_str);
> +size_t hbl_cn_mac_fec_stats_len = ARRAY_SIZE(hbl_cn_mac_fec_stats);
> +size_t hbl_cn_mac_stats_rx_len = ARRAY_SIZE(hbl_cn_mac_stats_rx);
> +size_t hbl_cn_mac_stats_tx_len = ARRAY_SIZE(hbl_cn_mac_stats_tx);
why those are not const?
> +
> +static void qps_stop(struct hbl_cn_device *hdev);
> +static void qp_destroy_work(struct work_struct *work);
> +static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
> +static void user_cq_destroy(struct kref *kref);
> +static void set_app_params_clear(struct hbl_cn_device *hdev);
> +static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
> + void *output);
> +static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
> + struct hbl_ib_mem_info *info);
> +
> +static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port);
> +static void hbl_cn_late_init(struct hbl_cn_device *hdev);
> +static void hbl_cn_late_fini(struct hbl_cn_device *hdev);
> +static int hbl_cn_sw_init(struct hbl_cn_device *hdev);
> +static void hbl_cn_sw_fini(struct hbl_cn_device *hdev);
> +static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full);
> +static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags);
> +static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port);
> +
> +static int hbl_cn_request_irqs(struct hbl_cn_device *hdev)
> +{
> + struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
> +
> + return asic_funcs->request_irqs(hdev);
> +}
> +
> +static void hbl_cn_free_irqs(struct hbl_cn_device *hdev)
> +{
> + struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
> +
> + asic_funcs->free_irqs(hdev);
> +}
> +
> +static void hbl_cn_synchronize_irqs(struct hbl_aux_dev *cn_aux_dev)
> +{
> + struct hbl_cn_device *hdev = cn_aux_dev->priv;
> + struct hbl_cn_asic_funcs *asic_funcs;
> +
> + asic_funcs = hdev->asic_funcs;
> +
> + asic_funcs->synchronize_irqs(hdev);
> +}
> +
> +void hbl_cn_get_frac_info(u64 numerator, u64 denominator, u64 *integer, u64 *exp)
> +{
> + u64 high_digit_n, high_digit_d, integer_tmp, exp_tmp;
> + u8 num_digits_n, num_digits_d;
> + int i;
> +
> + num_digits_d = hbl_cn_get_num_of_digits(denominator);
> + high_digit_d = denominator;
> + for (i = 0; i < num_digits_d - 1; i++)
> + high_digit_d /= 10;
> +
> + integer_tmp = 0;
> + exp_tmp = 0;
> +
> + if (numerator) {
> + num_digits_n = hbl_cn_get_num_of_digits(numerator);
> + high_digit_n = numerator;
> + for (i = 0; i < num_digits_n - 1; i++)
> + high_digit_n /= 10;
> +
> + exp_tmp = num_digits_d - num_digits_n;
> +
> + if (high_digit_n < high_digit_d) {
> + high_digit_n *= 10;
> + exp_tmp++;
> + }
> +
> + integer_tmp = div_u64(high_digit_n, high_digit_d);
> + }
> +
> + *integer = integer_tmp;
> + *exp = exp_tmp;
> +}
this function sounds suspicious for a network driver, what do you need
it for?
> +
> +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
> +{
> + struct hbl_cn_device *hdev = cn_port->hdev;
> + struct hbl_cn_asic_port_funcs *port_funcs;
> + struct hbl_cn_stat *ignore;
> + int rc;
> +
> + port_funcs = hdev->asic_funcs->port_funcs;
> +
> + port_funcs->spmu_get_stats_info(cn_port, &ignore, num_out_data);
hard to ignore that you deref uninitialized pointer...
please consider going one step back and start with our internal mailing
lists, thank you
Przemek
[...]
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-13 13:01 ` Przemek Kitszel
@ 2024-06-13 14:16 ` Przemek Kitszel
2024-06-17 8:08 ` Omer Shpigelman
1 sibling, 0 replies; 107+ messages in thread
From: Przemek Kitszel @ 2024-06-13 14:16 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, zyehudai
On 6/13/24 15:01, Przemek Kitszel wrote:
> On 6/13/24 10:21, Omer Shpigelman wrote:
[...]
>> +
>> +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64
>> out_data[], u32 *num_out_data)
>> +{
>> + struct hbl_cn_device *hdev = cn_port->hdev;
>> + struct hbl_cn_asic_port_funcs *port_funcs;
>> + struct hbl_cn_stat *ignore;
>> + int rc;
>> +
>> + port_funcs = hdev->asic_funcs->port_funcs;
>> +
>> + port_funcs->spmu_get_stats_info(cn_port, &ignore, num_out_data);
>
> hard to ignore that you deref uninitialized pointer...
oh, sorry, I was in a hurry, please disregard this particular comment
>
> please consider going one step back and start with our internal mailing
> lists, thank you
> Przemek
but this option very much still holds
>
> [...]
>
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
@ 2024-06-13 19:18 ` Leon Romanovsky
2024-06-17 17:43 ` Omer Shpigelman
2024-06-17 14:17 ` Jason Gunthorpe
1 sibling, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-13 19:18 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> The driver itself is agnostic to the ASIC in action, it operates according
> to the capabilities that were passed on device initialization.
> The device is initialized by the hbl_cn driver via auxiliary bus.
> The driver also supports QP resource tracking and port/device HW counters.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
people probably touched the code but did not actually sit together in
the same room and write it together. So, please remove the
extensive "Co-developed-by" tags.
This is not a full review yet, just some passing comments.
> ---
> MAINTAINERS | 10 +
> drivers/infiniband/Kconfig | 1 +
> drivers/infiniband/hw/Makefile | 1 +
> drivers/infiniband/hw/hbl/Kconfig | 17 +
> drivers/infiniband/hw/hbl/Makefile | 8 +
> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> include/uapi/rdma/hbl-abi.h | 204 ++
> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> 12 files changed, 3904 insertions(+)
> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> create mode 100644 include/uapi/rdma/hbl-abi.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
<...>
> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> +
> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> +
Please don't redefine the existing macros. Just use the existing ones.
<...>
> + if (hbl_ib_match_netdev(ibdev, netdev))
> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> + else
> + return NOTIFY_DONE;
It is not kernel coding style. Please write:
if (!hbl_ib_match_netdev(ibdev, netdev))
return NOTIFY_DONE;
ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> +
<...>
> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> + struct hbl_ib_device *hdev;
> + ktime_t timeout;
> + int rc;
> +
> + rc = hdev_init(aux_dev);
> + if (rc) {
> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> + return -EIO;
> + }
> +
> + hdev = aux_dev->priv;
> +
> + /* don't allow module unloading while it is attached */
> + if (!try_module_get(THIS_MODULE)) {
This part makes me wonder: what are you trying to do here? What doesn't work for you
in the standard driver core and module load mechanism?
> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> + module_name(THIS_MODULE));
> + rc = -EIO;
> + goto module_get_err;
> + }
> +
> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> + while (1) {
> + aux_ops->hw_access_lock(aux_dev);
> +
> + /* if the device is operational, proceed to actual init while holding the lock in
> + * order to prevent concurrent hard reset
> + */
> + if (aux_ops->device_operational(aux_dev))
> + break;
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> + rc = -EBUSY;
> + goto timeout_err;
> + }
> +
> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> +
> + msleep_interruptible(MSEC_PER_SEC);
> + }
The code above is unexpected.
> +
> + rc = hbl_ib_dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "Failed to init ib device\n");
> + goto dev_init_err;
> + }
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + return 0;
> +
> +dev_init_err:
> + aux_ops->hw_access_unlock(aux_dev);
> +timeout_err:
> + module_put(THIS_MODULE);
> +module_get_err:
> + hdev_fini(aux_dev);
> +
> + return rc;
> +}
<...>
> +static int __init hbl_ib_init(void)
> +{
> + pr_info("loading driver\n");
Please remove all these debug prints and leave only the necessary ones.
> +
> + return auxiliary_driver_register(&hbl_ib_driver);
> +}
> +
> +static void __exit hbl_ib_exit(void)
> +{
> + auxiliary_driver_unregister(&hbl_ib_driver);
> +
> + pr_info("driver removed\n");
> +}
> +
> +module_init(hbl_ib_init);
> +module_exit(hbl_ib_exit)
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
@ 2024-06-13 21:49 ` Andrew Lunn
2024-06-18 6:58 ` Omer Shpigelman
2024-06-14 22:48 ` Joe Damato
` (4 subsequent siblings)
5 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-13 21:49 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
> +static int hbl_en_port_open(struct hbl_en_port *port);
When you do the Intel internal review, I expect this to crop up. No
forward declarations please. Put the code in the right order so they
are not needed.
> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + struct in_device *in_dev;
> + struct in_ifaddr *ifa;
> + int rc = 0;
> +
> + /* for the case where no src IP is configured */
> + *src_ip = 0;
> +
> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
> + if (!rtnl_is_locked()) {
> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
> + rc = -EFAULT;
> + goto out;
> + }
You will find all other drivers just do:
ASSERT_RTNL().
If your locking is broken, you are probably dead anyway, so you might
as well keep going and try to explode in the most interesting way
possible.
> +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + port->net_stats.rx_packets = 0;
> + port->net_stats.tx_packets = 0;
> + port->net_stats.rx_bytes = 0;
> + port->net_stats.tx_bytes = 0;
> + port->net_stats.tx_errors = 0;
> + atomic64_set(&port->net_stats.rx_dropped, 0);
> + atomic64_set(&port->net_stats.tx_dropped, 0);
Why atomic64_set? Atomics are expensive, so you should not be using
them. netdev has other cheaper methods, which other Intel developers
should be happy to tell you all about.
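One common cheaper pattern is per-CPU counters protected by `u64_stats_sync`: the datapath increments plain `u64` fields on the local CPU with no atomic operations, and the stats reader sums across CPUs under the seqcount. A sketch, with illustrative field and function names that are not from the patch:

```c
#include <linux/u64_stats_sync.h>

/* Hypothetical per-CPU stats block; allocated with alloc_percpu(). */
struct hbl_en_pcpu_stats {
	u64 rx_packets;
	u64 rx_bytes;
	struct u64_stats_sync syncp;
};

/* Datapath update: runs on the local CPU, no atomics needed. */
static void hbl_en_count_rx(struct hbl_en_pcpu_stats __percpu *stats,
			    unsigned int len)
{
	struct hbl_en_pcpu_stats *s = this_cpu_ptr(stats);

	u64_stats_update_begin(&s->syncp);
	s->rx_packets++;
	s->rx_bytes += len;
	u64_stats_update_end(&s->syncp);
}
```

The reader side (e.g. `ndo_get_stats64`) loops over `for_each_possible_cpu()` and uses `u64_stats_fetch_begin()`/`u64_stats_fetch_retry()` to get a consistent snapshot of each CPU's counters.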
> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + u32 mtu;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> + return 0;
> + }
> +
> + mtu = ndev->mtu;
I think you need a better error message. All this does is access
ndev->mtu. What does it matter if the port is in reset? You don't
access it.
> +static int hbl_en_close(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev = port->hdev;
> + ktime_t timeout;
> +
> + /* Looks like the return value of this function is not checked, so we can't just return
> + * EBUSY if the port is under reset. We need to wait until the reset is finished and then
> + * close the port. Otherwise the netdev will set the port as closed although port_close()
> + * wasn't called. Only if we waited long enough and the reset hasn't finished, we can return
> + * an error without actually closing the port as it is a fatal flow anyway.
> + */
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + /* If this is called from unregister_netdev() then the port was already closed and
> + * hence we can safely return.
> + * We could have just check the port_open boolean, but that might hide some future
> + * bugs. Hence it is better to use a dedicated flag for that.
> + */
> + if (READ_ONCE(hdev->in_teardown))
> + return 0;
> +
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(netdev,
> + "Timeout while waiting for port to finish reset, can't close it\n"
> + );
> + return -EBUSY;
> + }
This has the usual bug. Please look at include/linux/iopoll.h.
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(port->ndev,
> + "Timeout while waiting for port %d to finish reset\n",
> + port->idx);
> + break;
> + }
> + }
and again. Don't roll your own timeout loops like this, use the core
version.
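A sketch of the same wait using the core helper from include/linux/iopoll.h. The "usual bug" is that after an oversleep the hand-rolled loop can time out even though the condition just became true; `read_poll_timeout()` re-checks the condition one final time after the deadline. `val` here receives the return of `atomic_cmpxchg()`, so `!val` means the loop successfully claimed the in_reset flag:

```c
int val, rc;

rc = read_poll_timeout(atomic_cmpxchg, val, !val, 200,
		       PORT_RESET_TIMEOUT_MSEC * USEC_PER_MSEC, false,
		       &port->in_reset, 0, 1);
if (rc)
	netdev_crit(port->ndev,
		    "Timeout while waiting for port %d to finish reset\n",
		    port->idx);
```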
> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> + return -EBUSY;
> + }
> +
> + if (netif_running(port->ndev)) {
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
> + msleep(20);
> +
> + netdev->mtu = new_mtu;
> +
> + rc = hbl_en_port_open(port);
> + if (rc)
> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
Does that mean the port is FUBAR?
Most operations like this are expected to roll back to the previous
working configuration on failure. So if changing the MTU requires new
buffers in your ring, you should first allocate the new buffers, then
free the old buffers, so that if allocation fails, you still have
buffers, and the device can continue operating.
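The rollback-friendly shape, as an illustrative sketch only (the helpers named here are hypothetical, not the driver's real buffer API):

```c
static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
{
	void *new_bufs;

	/* Allocate resources sized for the new MTU first; the old
	 * configuration keeps running until this succeeds.
	 */
	new_bufs = hbl_en_alloc_rx_bufs(netdev, new_mtu); /* hypothetical */
	if (!new_bufs)
		return -ENOMEM;	/* old MTU and old buffers untouched */

	/* Only now commit: swap in the new buffers, free the old ones. */
	hbl_en_swap_rx_bufs(netdev, new_bufs);		  /* hypothetical */
	netdev->mtu = new_mtu;

	return 0;
}
```

On any failure the function returns with the previous working configuration still in place, instead of leaving the port closed.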
> +module_param(poll_enable, bool, 0444);
> +MODULE_PARM_DESC(poll_enable,
> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
Module parameters are not liked. This probably needs to go away.
> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
> +{
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->type = ETH_MODULE_SFF_8636;
Is this an SFF, not an SFP? How else can you know what module it is
without doing an I2C transfer to ask the module what it is?
> +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
> + u8 *data)
> +{
This is the old API. Please update to the new API so there is access
to all the pages of the SFF/SFP.
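The newer interface is the `get_module_eeprom_by_page` ethtool op, which receives the page/bank/I2C address explicitly instead of a flat offset. A sketch, where `hbl_en_read_module()` is a hypothetical stand-in for the driver's actual module read path:

```c
static int
hbl_en_get_module_eeprom_by_page(struct net_device *ndev,
				 const struct ethtool_module_eeprom *page,
				 struct netlink_ext_ack *extack)
{
	int rc;

	rc = hbl_en_read_module(ndev, page->i2c_address, page->page,
				page->bank, page->offset, page->length,
				page->data);	/* hypothetical helper */
	if (rc) {
		NL_SET_ERR_MSG_MOD(extack, "Module EEPROM read failed");
		return rc;
	}

	return page->length;	/* bytes read on success */
}

static const struct ethtool_ops hbl_en_ethtool_ops = {
	.get_module_eeprom_by_page = hbl_en_get_module_eeprom_by_page,
	/* ... */
};
```

This lets ethtool address any page of the module EEPROM and report errors through extack, which the legacy `get_module_eeprom` op cannot do.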
> +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
> + struct ethtool_link_ksettings *cmd)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 port_idx, speed;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + speed = aux_ops->get_speed(aux_dev, port_idx);
> +
> + cmd->base.speed = speed;
> + cmd->base.duplex = DUPLEX_FULL;
> +
> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> + switch (speed) {
> + case SPEED_100000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> + cmd->base.port = PORT_FIBRE;
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
> + break;
> + case SPEED_50000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
> + break;
> + case SPEED_25000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
> + break;
> + case SPEED_200000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
> + break;
> + case SPEED_400000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
> + break;
> + default:
> + netdev_err(port->ndev, "unknown speed %d\n", speed);
> + return -EFAULT;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
> +
> + if (port->auto_neg_enable) {
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
> + cmd->base.autoneg = AUTONEG_ENABLE;
> + if (port->auto_neg_resolved)
> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
That looks odd. Care to explain?
> + } else {
> + cmd->base.autoneg = AUTONEG_DISABLE;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
> +
> + if (port->pfc_enable)
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
And I suspect that is wrong. Everybody gets pause wrong. Please
double check my previous posts about pause.
> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
> + rc = -EFAULT;
> + goto out;
Don't say you support autoneg in supported if that is the case.
And EFAULT is about memory problems. EINVAL, maybe EPERM? or
EOPNOTSUPP.
Andrew
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
2024-06-13 21:49 ` Andrew Lunn
@ 2024-06-14 22:48 ` Joe Damato
2024-06-16 1:04 ` Andrew Lunn
2024-06-18 19:37 ` Omer Shpigelman
2024-06-15 0:10 ` Stephen Hemminger
` (3 subsequent siblings)
5 siblings, 2 replies; 107+ messages in thread
From: Joe Damato @ 2024-06-14 22:48 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
On Thu, Jun 13, 2024 at 11:22:02AM +0300, Omer Shpigelman wrote:
> This ethernet driver is initialized via auxiliary bus by the hbl_cn
> driver.
> It serves mainly for control operations that are needed for AI scaling.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> ---
> MAINTAINERS | 9 +
> drivers/net/ethernet/intel/Kconfig | 18 +
> drivers/net/ethernet/intel/Makefile | 1 +
> drivers/net/ethernet/intel/hbl_en/Makefile | 9 +
> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1168 +++++++++++++++++
> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 206 +++
> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 ++
> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +++
> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +++++++
> 10 files changed, 2178 insertions(+)
> create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 096439a62129..7301f38e9cfb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9617,6 +9617,15 @@ F: include/linux/habanalabs/
> F: include/linux/net/intel/cn*
> F: include/linux/net/intel/gaudi2*
>
> +HABANALABS ETHERNET DRIVER
> +M: Omer Shpigelman <oshpigelman@habana.ai>
> +L: netdev@vger.kernel.org
> +S: Supported
> +W: https://www.habana.ai
> +F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
> +F: drivers/net/ethernet/intel/hbl_en/
> +F: include/linux/net/intel/cn*
> +
> HACKRF MEDIA DRIVER
> L: linux-media@vger.kernel.org
> S: Orphan
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 0d1b8a2bae99..5d07349348a0 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -417,4 +417,22 @@ config HABANA_CN
> To compile this driver as a module, choose M here. The module
> will be called habanalabs_cn.
>
> +config HABANA_EN
> + tristate "HabanaLabs (an Intel Company) Ethernet driver"
> + depends on NETDEVICES && ETHERNET && INET
> + select HABANA_CN
> + help
> + This driver enables Ethernet functionality for the network interfaces
> + that are part of the GAUDI ASIC family of AI Accelerators.
> + For more information on how to identify your adapter, go to the
> + Adapter & Driver ID Guide that can be located at:
> +
> + <http://support.intel.com>
> +
> + More specific information on configuring the driver is in
> + <file:Documentation/networking/device_drivers/ethernet/intel/hbl.rst>.
> +
> + To compile this driver as a module, choose M here. The module
> + will be called habanalabs_en.
> +
> endif # NET_VENDOR_INTEL
> diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
> index 10049a28e336..ec62a0227897 100644
> --- a/drivers/net/ethernet/intel/Makefile
> +++ b/drivers/net/ethernet/intel/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_FM10K) += fm10k/
> obj-$(CONFIG_ICE) += ice/
> obj-$(CONFIG_IDPF) += idpf/
> obj-$(CONFIG_HABANA_CN) += hbl_cn/
> +obj-$(CONFIG_HABANA_EN) += hbl_en/
> diff --git a/drivers/net/ethernet/intel/hbl_en/Makefile b/drivers/net/ethernet/intel/hbl_en/Makefile
> new file mode 100644
> index 000000000000..695497ab93b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/Makefile
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for HabanaLabs (an Intel Company) Ethernet network driver
> +#
> +
> +obj-$(CONFIG_HABANA_EN) := habanalabs_en.o
> +
> +include $(src)/common/Makefile
> +habanalabs_en-y += $(HBL_EN_COMMON_FILES)
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/Makefile b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> new file mode 100644
> index 000000000000..a3ccb5dbf4a6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +HBL_EN_COMMON_FILES := common/hbl_en_drv.o common/hbl_en.o \
> + common/hbl_en_ethtool.o common/hbl_en_dcbnl.o
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> new file mode 100644
> index 000000000000..066be5ac2d84
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> @@ -0,0 +1,1168 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/inetdevice.h>
> +
> +#define TX_TIMEOUT (5 * HZ)
> +#define PORT_RESET_TIMEOUT_MSEC (60 * 1000ull) /* 60s */
> +
> +/**
> + * struct hbl_en_tx_pkt_work - used to schedule a work of a Tx packet.
> + * @tx_work: workqueue object to run when packet needs to be sent.
> + * @port: pointer to current port structure.
> + * @skb: copy of the packet to send.
> + */
> +struct hbl_en_tx_pkt_work {
> + struct work_struct tx_work;
> + struct hbl_en_port *port;
> + struct sk_buff *skb;
> +};
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
> +static int hbl_en_port_open(struct hbl_en_port *port);
> +
> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int rc = 0, i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be that the port was shutdown by 'ip link set down' and there is no need
> + * in reopening it.
> + * Since we mark the ports as in reset even if they are disabled, we clear the flag
> + * here anyway.
> + * See hbl_en_ports_stop_prepare() for more info.
> + */
> + if (!netif_running(port->ndev)) {
> + atomic_set(&port->in_reset, 0);
> + continue;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + if (rc)
> + break;
> + }
> +
> + hdev->in_reset = false;
> +
> + return rc;
> +}
> +
> +static void hbl_en_port_fini(struct hbl_en_port *port)
> +{
> + if (port->rx_wq)
> + destroy_workqueue(port->rx_wq);
> +}
> +
> +static int hbl_en_port_init(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u32 port_idx = port->idx;
> + char wq_name[32];
> + int rc;
> +
> + if (hdev->poll_enable) {
> + memset(wq_name, 0, sizeof(wq_name));
> + snprintf(wq_name, sizeof(wq_name) - 1, "hbl%u-port%d-rx-wq", hdev->core_dev_id,
> + port_idx);
> + port->rx_wq = alloc_ordered_workqueue(wq_name, 0);
> + if (!port->rx_wq) {
> + dev_err(hdev->dev, "Failed to allocate Rx WQ\n");
> + rc = -ENOMEM;
> + goto fail;
> + }
> + }
> +
> + hbl_en_ethtool_init_coalesce(port);
> +
> + return 0;
> +
> +fail:
> + hbl_en_port_fini(port);
> +
> + return rc;
> +}
> +
> +static void _hbl_en_set_port_status(struct hbl_en_port *port, bool up)
> +{
> + struct net_device *ndev = port->ndev;
> + u32 port_idx = port->idx;
> +
> + if (up) {
> + netif_carrier_on(ndev);
> + netif_wake_queue(ndev);
> + } else {
> + netif_carrier_off(ndev);
> + netif_stop_queue(ndev);
> + }
> +
> + /* Unless link events are getting through the EQ, no need to print about link down events
> + * during port reset
> + */
> + if (port->hdev->has_eq || up || !atomic_read(&port->in_reset))
> + netdev_info(port->ndev, "link %s, port %d\n", up ? "up" : "down", port_idx);
> +}
> +
> +static void hbl_en_set_port_status(struct hbl_aux_dev *aux_dev, u32 port_idx, bool up)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + _hbl_en_set_port_status(port, up);
> +}
> +
> +static bool hbl_en_is_port_open(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->is_initialized;
> +}
> +
> +/* get the src IP as it is done in devinet_ioctl() */
> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + struct in_device *in_dev;
> + struct in_ifaddr *ifa;
> + int rc = 0;
> +
> + /* for the case where no src IP is configured */
> + *src_ip = 0;
> +
> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
> + if (!rtnl_is_locked()) {
> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + in_dev = __in_dev_get_rtnl(ndev);
> + if (!in_dev) {
> + netdev_err(port->ndev, "Failed to get IPv4 struct\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + ifa = rtnl_dereference(in_dev->ifa_list);
> +
> + while (ifa) {
> + if (!strcmp(ndev->name, ifa->ifa_label)) {
> + /* convert the BE to native and later on it will be
> + * written to the HW as LE in QPC_SET
> + */
> + *src_ip = be32_to_cpu(ifa->ifa_local);
> + break;
> + }
> + ifa = rtnl_dereference(ifa->ifa_next);
> + }
> +out:
> + return rc;
> +}
> +
> +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + port->net_stats.rx_packets = 0;
> + port->net_stats.tx_packets = 0;
> + port->net_stats.rx_bytes = 0;
> + port->net_stats.tx_bytes = 0;
> + port->net_stats.tx_errors = 0;
> + atomic64_set(&port->net_stats.rx_dropped, 0);
> + atomic64_set(&port->net_stats.tx_dropped, 0);
> +}
> +
> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + u32 mtu;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> + return 0;
> + }
> +
> + mtu = ndev->mtu;
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return mtu;
> +}
> +
> +static u32 hbl_en_get_pflags(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->pflags;
> +}
> +
> +static void hbl_en_set_dev_lpbk(struct hbl_aux_dev *aux_dev, u32 port_idx, bool enable)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> +
> + if (enable)
> + ndev->features |= NETIF_F_LOOPBACK;
> + else
> + ndev->features &= ~NETIF_F_LOOPBACK;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static int hbl_en_port_open_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (port->is_initialized)
> + return 0;
> +
> + if (!hdev->poll_enable)
> + netif_napi_add(ndev, &port->napi, hbl_en_napi_poll);
> +
> + rc = aux_ops->port_hw_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "Failed to configure the HW, rc %d\n", rc);
> + goto hw_init_fail;
> + }
> +
> + if (!hdev->poll_enable)
> + napi_enable(&port->napi);
> +
> + rc = hdev->asic_funcs.eth_port_open(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to init H/W, rc %d\n", rc);
> + goto port_open_fail;
> + }
> +
> + rc = aux_ops->update_mtu(aux_dev, port_idx, ndev->mtu);
> + if (rc) {
> + netdev_err(ndev, "MTU update failed, rc %d\n", rc);
> + goto update_mtu_fail;
> + }
> +
> + rc = aux_ops->phy_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "PHY init failed, rc %d\n", rc);
> + goto phy_init_fail;
> + }
> +
> + netif_start_queue(ndev);
> +
> + port->is_initialized = true;
> +
> + return 0;
> +
> +phy_init_fail:
> + /* no need to revert the MTU change, it will be updated on next port open */
> +update_mtu_fail:
> + hdev->asic_funcs.eth_port_close(port);
> +port_open_fail:
> + if (!hdev->poll_enable)
> + napi_disable(&port->napi);
> +
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +hw_init_fail:
> + if (!hdev->poll_enable)
> + netif_napi_del(&port->napi);
> +
> + return rc;
> +}
> +
> +static int hbl_en_port_open(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + rc = hbl_en_port_open_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return rc;
> +}
> +
> +static int hbl_en_open(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't open it\n");
> + return -EBUSY;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static void hbl_en_port_close_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (!port->is_initialized)
> + return;
> +
> + port->is_initialized = false;
> +
> + /* verify that the port is marked as closed before continuing */
> + mb();
> +
> + /* Print if not in hard reset flow e.g. from ip cmd */
> + if (!hdev->in_reset && netif_carrier_ok(port->ndev))
> + netdev_info(port->ndev, "port was closed\n");
> +
> + /* disable the PHY here so no link changes will occur from this point forward */
> + aux_ops->phy_fini(aux_dev, port_idx);
> +
> + /* disable Tx SW flow */
> + netif_carrier_off(port->ndev);
> + netif_tx_disable(port->ndev);
> +
> + /* stop Tx/Rx HW */
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +
> + /* disable Tx/Rx QPs */
> + hdev->asic_funcs.eth_port_close(port);
> +
> + /* stop Rx SW flow */
> + if (hdev->poll_enable) {
> + hbl_en_rx_poll_stop(port);
> + } else {
> + napi_disable(&port->napi);
> + netif_napi_del(&port->napi);
> + }
> +
> + /* Explicitly count the port close operations as we don't get a link event for this.
> + * Upon port open we receive a link event, hence no additional action required.
> + */
> + aux_ops->port_toggle_count(aux_dev, port_idx);
> +}
> +
> +static void hbl_en_port_close(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + hbl_en_port_close_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static int __hbl_en_port_reset_locked(struct hbl_en_port *port)
> +{
> + hbl_en_port_close_locked(port);
> +
> + return hbl_en_port_open_locked(port);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return __hbl_en_port_reset_locked(port);
> +}
> +
> +int hbl_en_port_reset(struct hbl_en_port *port)
> +{
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
> + msleep(20);
> +
> + return hbl_en_port_open(port);
> +}
> +
> +static int hbl_en_close(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev = port->hdev;
> + ktime_t timeout;
> +
> + /* Looks like the return value of this function is not checked, so we can't just return
> + * EBUSY if the port is under reset. We need to wait until the reset is finished and then
> + * close the port. Otherwise the netdev will set the port as closed although port_close()
> + * wasn't called. Only if we waited long enough and the reset hasn't finished, we can return
> + * an error without actually closing the port as it is a fatal flow anyway.
> + */
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + /* If this is called from unregister_netdev() then the port was already closed and
> + * hence we can safely return.
> + * We could have just check the port_open boolean, but that might hide some future
> + * bugs. Hence it is better to use a dedicated flag for that.
> + */
> + if (READ_ONCE(hdev->in_teardown))
> + return 0;
> +
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(netdev,
> + "Timeout while waiting for port to finish reset, can't close it\n"
> + );
> + return -EBUSY;
> + }
> + }
> +
> + hbl_en_port_close(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +/**
> + * hbl_en_ports_stop_prepare() - stop the Rx and Tx and synchronize with other reset flows.
> + * @aux_dev: habanalabs auxiliary device structure.
> + *
> + * This function makes sure that during the reset no packets will be processed and that
> + * ndo_open/ndo_close do not open/close the ports.
> + * A hard reset might occur right after the driver was loaded, which means before the ports
> + * initialization was finished. Therefore, even if the ports are not yet open, we mark it as in
> + * reset in order to avoid races. We clear the in reset flag later on when reopening the ports.
> + */
> +static void hbl_en_ports_stop_prepare(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + ktime_t timeout;
> + int i;
> +
> + /* Check if the ports where initialized. If not, we shouldn't mark them as in reset because
> + * they will fail to get opened.
> + */
> + if (!hdev->is_initialized || hdev->in_reset)
> + return;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* This function is competing with reset from ethtool/ip, so try to take the
> + * in_reset atomic and if we are already in a middle of reset, wait until reset
> + * function is finished.
> + * Reset function is designed to always finish (could take up to a few seconds in
> + * worst case).
> + * We mark also closed ports as in reset so they won't be able to get opened while
> + * the device in under reset.
> + */
> +
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(port->ndev,
> + "Timeout while waiting for port %d to finish reset\n",
> + port->idx);
> + break;
> + }
> + }
> + }
> +
> + hdev->in_reset = true;
> +}
> +
> +static void hbl_en_ports_stop(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + if (netif_running(port->ndev))
> + hbl_en_port_close(port);
> + }
> +}
> +
> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> + return -EBUSY;
> + }
> +
> + if (netif_running(port->ndev)) {
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events be dropped before re-opening the port */
> + msleep(20);
> +
> + netdev->mtu = new_mtu;
> +
> + rc = hbl_en_port_open(port);
> + if (rc)
> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> + } else {
> + netdev->mtu = new_mtu;
> + }
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* Swap source and destination MAC addresses */
> +static inline void swap_l2(char *buf)
> +{
> + u16 *eth_hdr, tmp;
> +
> + eth_hdr = (u16 *)buf;
> + tmp = eth_hdr[0];
> + eth_hdr[0] = eth_hdr[3];
> + eth_hdr[3] = tmp;
> + tmp = eth_hdr[1];
> + eth_hdr[1] = eth_hdr[4];
> + eth_hdr[4] = tmp;
> + tmp = eth_hdr[2];
> + eth_hdr[2] = eth_hdr[5];
> + eth_hdr[5] = tmp;
> +}
> +
> +/* Swap source and destination IP addresses */
> +static inline void swap_l3(char *buf)
> +{
> + u32 tmp;
> +
> + /* skip the Ethernet header and the IP header till source IP address */
> + buf += ETH_HLEN + 12;
> + tmp = ((u32 *)buf)[0];
> + ((u32 *)buf)[0] = ((u32 *)buf)[1];
> + ((u32 *)buf)[1] = tmp;
> +}
> +
> +static void do_tx_swap(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* First, let's print the SKB we got */
> + dev_dbg_ratelimited(hdev->dev,
> + "Send [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +
> + /* Before submitting it to HW, swap the Ethernet/IP addresses in case this is an IPv4
> + * packet. That way, we can send ICMP (ping) to ourselves in loopback cases.
> + */
> + swap_l2(skb->data);
> + if (swab16(tmp_buff[6]) == ETH_P_IP)
> + swap_l3(skb->data);
> +}
> +
> +static bool is_pkt_swap_enabled(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->is_eth_lpbk(aux_dev);
> +}
> +
> +static bool is_tx_disabled(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->get_mac_lpbk(aux_dev, port_idx) && !is_pkt_swap_enabled(hdev);
> +}
> +
> +static netdev_tx_t hbl_en_handle_tx(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + netdev_tx_t ret;
> +
> + if (skb->len <= 0 || is_tx_disabled(port))
> + goto free_skb;
> +
> + if (skb->len > hdev->max_frm_len) {
> + netdev_err(port->ndev, "Tx pkt size %uB exceeds maximum of %uB\n", skb->len,
> + hdev->max_frm_len);
> + goto free_skb;
> + }
> +
> + if (is_pkt_swap_enabled(hdev))
> + do_tx_swap(port, skb);
> +
> + /* Pad the Ethernet packets to the minimum frame size as the NIC HW doesn't do it.
> + * skb_put_padto() frees the packet on failure, so just increment the dropped counter and
> + * return success to avoid a retry.
> + */
> + if (skb_put_padto(skb, hdev->pad_size)) {
> + dev_err_ratelimited(hdev->dev, "Padding failed, the skb is dropped\n");
> + atomic64_inc(&port->net_stats.tx_dropped);
> + return NETDEV_TX_OK;
> + }
> +
> + ret = hdev->asic_funcs.write_pkt_to_hw(port, skb);
> + if (ret == NETDEV_TX_OK) {
> + port->net_stats.tx_packets++;
> + port->net_stats.tx_bytes += skb->len;
> + }
> +
> + return ret;
> +
> +free_skb:
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> +}
> +
> +static netdev_tx_t hbl_en_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> +
> + return hbl_en_handle_tx(port, skb);
> +}
> +
> +static int hbl_en_set_port_mac_loopback(struct hbl_en_port *port, bool enable)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + rc = aux_ops->set_mac_lpbk(aux_dev, port_idx, enable);
> + if (rc)
> + return rc;
> +
> + netdev_info(ndev, "port %u: mac loopback is %s\n", port_idx,
> + enable ? "enabled" : "disabled");
> +
> + if (netif_running(ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to reset port %u, rc %d\n", port_idx, rc);
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int hbl_en_set_features(struct net_device *netdev, netdev_features_t features)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + netdev_features_t changed;
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port %d is in reset, can't update settings\n", port->idx);
> + return -EBUSY;
> + }
> +
> + changed = netdev->features ^ features;
> +
> + if (changed & NETIF_F_LOOPBACK)
> + rc = hbl_en_set_port_mac_loopback(port, !!(features & NETIF_F_LOOPBACK));
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static void hbl_en_handle_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> +
> + port->net_stats.tx_errors++;
> + atomic64_inc(&port->net_stats.tx_dropped);
> +}
> +
> +static void hbl_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(dev);
> +
> + stats->rx_bytes = port->net_stats.rx_bytes;
> + stats->tx_bytes = port->net_stats.tx_bytes;
> + stats->rx_packets = port->net_stats.rx_packets;
> + stats->tx_packets = port->net_stats.tx_packets;
> + stats->tx_errors = port->net_stats.tx_errors;
> + stats->tx_dropped = (u64)atomic64_read(&port->net_stats.tx_dropped);
> + stats->rx_dropped = (u64)atomic64_read(&port->net_stats.rx_dropped);
> +}
> +
> +static const struct net_device_ops hbl_en_netdev_ops = {
> + .ndo_open = hbl_en_open,
> + .ndo_stop = hbl_en_close,
> + .ndo_start_xmit = hbl_en_start_xmit,
> + .ndo_validate_addr = eth_validate_addr,
> + .ndo_change_mtu = hbl_en_change_mtu,
> + .ndo_set_features = hbl_en_set_features,
> + .ndo_get_stats64 = hbl_en_get_stats64,
> + .ndo_tx_timeout = hbl_en_handle_tx_timeout,
> +};
> +
> +static void hbl_en_set_ops(struct net_device *ndev)
> +{
> + ndev->netdev_ops = &hbl_en_netdev_ops;
> + ndev->ethtool_ops = hbl_en_ethtool_get_ops(ndev);
> +#ifdef CONFIG_DCB
> + ndev->dcbnl_ops = &hbl_en_dcbnl_ops;
> +#endif
> +}
> +
> +static int hbl_en_port_register(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + struct hbl_en_port **ptr;
> + struct net_device *ndev;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + ndev = alloc_etherdev(sizeof(struct hbl_en_port *));
> + if (!ndev) {
> + dev_err(hdev->dev, "netdevice %d alloc failed\n", port_idx);
> + return -ENOMEM;
> + }
> +
> + port->ndev = ndev;
> + SET_NETDEV_DEV(ndev, &hdev->pdev->dev);
> + ptr = netdev_priv(ndev);
> + *ptr = port;
> +
> + /* necessary for creating multiple interfaces */
> + ndev->dev_port = port_idx;
> +
> + hbl_en_set_ops(ndev);
> +
> + ndev->watchdog_timeo = TX_TIMEOUT;
> + ndev->min_mtu = hdev->min_raw_mtu;
> + ndev->max_mtu = hdev->max_raw_mtu;
> +
> + /* Add loopback capability to the device. */
> + ndev->hw_features |= NETIF_F_LOOPBACK;
> +
> + /* If this port was set to loopback, set it also to the ndev features */
> + if (aux_ops->get_mac_lpbk(aux_dev, port_idx))
> + ndev->features |= NETIF_F_LOOPBACK;
> +
> + eth_hw_addr_set(ndev, port->mac_addr);
> +
> + /* This is more of an intelligent poll wherein we enable the Rx completion EQE event and
> + * then start the poll from there.
> + * Inside the polling thread, we read packets from hardware and then reschedule the poll
> + * only if there are more packets to be processed. Otherwise we re-enable the CQ Arm
> + * interrupt and exit the poll.
> + */
> + if (hdev->poll_enable)
> + hbl_en_rx_poll_trigger_init(port);
> +
> + netif_carrier_off(ndev);
> +
> + rc = register_netdev(ndev);
> + if (rc) {
> + dev_err(hdev->dev, "Could not register netdevice %d\n", port_idx);
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + if (ndev) {
> + free_netdev(ndev);
> + port->ndev = NULL;
> + }
> +
> + return rc;
> +}
> +
> +static void dump_swap_pkt(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* The SKB is ready now (before stripping-out the L2), print its content */
> + dev_dbg_ratelimited(hdev->dev,
> + "Recv [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +}
> +
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + enum hbl_en_eth_pkt_status pkt_status;
> + struct net_device *ndev = port->ndev;
> + int rc, pkt_count = 0;
> + struct sk_buff *skb;
> + void *pkt_addr;
> + u32 pkt_size;
> +
> + if (!netif_carrier_ok(ndev))
> + return 0;
> +
> + while (pkt_count < budget) {
> + pkt_status = hdev->asic_funcs.read_pkt_from_hw(port, &pkt_addr, &pkt_size);
> +
> + if (pkt_status == ETH_PKT_NONE)
> + break;
> +
> + pkt_count++;
> +
> + if (pkt_status == ETH_PKT_DROP) {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + continue;
> + }
> +
> + if (hdev->poll_enable)
> + skb = __netdev_alloc_skb_ip_align(ndev, pkt_size, GFP_KERNEL);
> + else
> + skb = napi_alloc_skb(&port->napi, pkt_size);
> +
> + if (!skb) {
> + atomic64_inc(&port->net_stats.rx_dropped);
It seems like buffer exhaustion (!skb) should be counted in rx_missed_errors rather than rx_dropped?
The documentation in include/uapi/linux/if_link.h:
* @rx_dropped: Number of packets received but not processed,
* e.g. due to lack of resources or unsupported protocol.
* For hardware interfaces this counter may include packets discarded
* due to L2 address filtering but should not include packets dropped
* by the device due to buffer exhaustion which are counted separately in
* @rx_missed_errors (since procfs folds those two counters together).
But, I don't know much about your hardware so I could be wrong.
> + break;
> + }
> +
> + skb_copy_to_linear_data(skb, pkt_addr, pkt_size);
> + skb_put(skb, pkt_size);
> +
> + if (is_pkt_swap_enabled(hdev))
> + dump_swap_pkt(port, skb);
> +
> + skb->protocol = eth_type_trans(skb, ndev);
> +
> + /* Zero the packet buffer memory to avoid a leak in case a wrong
> + * size is used when the next packet populates the same memory
> + */
> + memset(pkt_addr, 0, pkt_size);
> +
> + /* polling is done in thread context and hence BH should be disabled */
> + if (hdev->poll_enable)
> + local_bh_disable();
> +
> + rc = netif_receive_skb(skb);
Is there any reason in particular to call netif_receive_skb() instead of
napi_gro_receive()?
> +
> + if (hdev->poll_enable)
> + local_bh_enable();
> +
> + if (rc == NET_RX_SUCCESS) {
> + port->net_stats.rx_packets++;
> + port->net_stats.rx_bytes += pkt_size;
> + } else {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + }
> + }
> +
> + return pkt_count;
> +}
> +
> +static bool __hbl_en_rx_poll_schedule(struct hbl_en_port *port, unsigned long delay)
> +{
> + return queue_delayed_work(port->rx_wq, &port->rx_poll_work, delay);
> +}
> +
> +static void hbl_en_rx_poll_work(struct work_struct *work)
> +{
> + struct hbl_en_port *port = container_of(work, struct hbl_en_port, rx_poll_work.work);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + pkt_count = hbl_en_handle_rx(port, NAPI_POLL_WEIGHT);
> +
> + /* Reschedule the poll if we have consumed budget which means we still have packets to
> + * process. Else re-enable the Rx IRQs and exit the work.
> + */
> + if (pkt_count < NAPI_POLL_WEIGHT)
> + hdev->asic_funcs.reenable_rx_irq(port);
> + else
> + __hbl_en_rx_poll_schedule(port, 0);
> +}
> +
> +/* Rx poll init and trigger routines are used in event-driven setups where
> + * Rx polling is initialized once during init or open and started/triggered by the event handler.
> + */
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port)
> +{
> + INIT_DELAYED_WORK(&port->rx_poll_work, hbl_en_rx_poll_work);
> +}
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port)
> +{
> + return __hbl_en_rx_poll_schedule(port, msecs_to_jiffies(1));
> +}
> +
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port)
> +{
> + cancel_delayed_work_sync(&port->rx_poll_work);
> +}
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget)
> +{
> + struct hbl_en_port *port = container_of(napi, struct hbl_en_port, napi);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + /* exit if we are called by netpoll as we free the Tx ring via EQ (if enabled) */
> + if (!budget)
> + return 0;
> +
> + pkt_count = hbl_en_handle_rx(port, budget);
> +
> + /* If budget not fully consumed, exit the polling mode */
> + if (pkt_count < budget) {
> + napi_complete_done(napi, pkt_count);
I believe this code might be incorrect and that it should be:
	if (napi_complete_done(napi, pkt_count))
		hdev->asic_funcs.reenable_rx_irq(port);
> + hdev->asic_funcs.reenable_rx_irq(port);
> + }
> +
> + return pkt_count;
> +}
> +
> +static void hbl_en_port_unregister(struct hbl_en_port *port)
> +{
> + struct net_device *ndev = port->ndev;
> +
> + unregister_netdev(ndev);
> + free_netdev(ndev);
> + port->ndev = NULL;
> +}
> +
> +static int hbl_en_set_asic_funcs(struct hbl_en_device *hdev)
> +{
> + switch (hdev->asic_type) {
> + case HBL_ASIC_GAUDI2:
> + default:
> + dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void hbl_en_handle_eqe(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_eqe *eqe)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + hdev->asic_funcs.handle_eqe(aux_dev, port, eqe);
> +}
> +
> +static void hbl_en_set_aux_ops(struct hbl_en_device *hdev, bool enable)
> +{
> + struct hbl_en_aux_ops *aux_ops = hdev->aux_dev->aux_ops;
> +
> + if (enable) {
> + aux_ops->ports_reopen = hbl_en_ports_reopen;
> + aux_ops->ports_stop_prepare = hbl_en_ports_stop_prepare;
> + aux_ops->ports_stop = hbl_en_ports_stop;
> + aux_ops->set_port_status = hbl_en_set_port_status;
> + aux_ops->is_port_open = hbl_en_is_port_open;
> + aux_ops->get_src_ip = hbl_en_get_src_ip;
> + aux_ops->reset_stats = hbl_en_reset_stats;
> + aux_ops->get_mtu = hbl_en_get_mtu;
> + aux_ops->get_pflags = hbl_en_get_pflags;
> + aux_ops->set_dev_lpbk = hbl_en_set_dev_lpbk;
> + aux_ops->handle_eqe = hbl_en_handle_eqe;
> + } else {
> + aux_ops->ports_reopen = NULL;
> + aux_ops->ports_stop_prepare = NULL;
> + aux_ops->ports_stop = NULL;
> + aux_ops->set_port_status = NULL;
> + aux_ops->is_port_open = NULL;
> + aux_ops->get_src_ip = NULL;
> + aux_ops->reset_stats = NULL;
> + aux_ops->get_mtu = NULL;
> + aux_ops->get_pflags = NULL;
> + aux_ops->set_dev_lpbk = NULL;
> + aux_ops->handle_eqe = NULL;
> + }
> +}
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int rc, i, port_cnt = 0;
> +
> + /* must be called before the call to dev_init() */
> + rc = hbl_en_set_asic_funcs(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "failed to set aux ops\n");
> + return rc;
> + }
> +
> + rc = asic_funcs->dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "device init failed\n");
> + return rc;
> + }
> +
> + /* init the function pointers here before calling hbl_en_port_register which sets up
> + * net_device_ops, and its ops might start getting called.
> + * If any failure is encountered, these will be made NULL and the core driver won't call
> + * them.
> + */
> + hbl_en_set_aux_ops(hdev, true);
> +
> + /* Port register depends on the above initialization so it must be called here and not
> + * before that.
> + */
> + for (i = 0; i < hdev->max_num_of_ports; i++, port_cnt++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + rc = hbl_en_port_init(port);
> + if (rc) {
> + dev_err(hdev->dev, "port init failed\n");
> + goto unregister_ports;
> + }
> +
> + rc = hbl_en_port_register(port);
> + if (rc) {
> + dev_err(hdev->dev, "port register failed\n");
> +
> + hbl_en_port_fini(port);
> + goto unregister_ports;
> + }
> + }
> +
> + hdev->is_initialized = true;
> +
> + return 0;
> +
> +unregister_ports:
> + for (i = 0; i < port_cnt; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +
> + return rc;
> +}
> +
> +void hbl_en_dev_fini(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int i;
> +
> + hdev->in_teardown = true;
> +
> + if (!hdev->is_initialized)
> + return;
> +
> + hdev->is_initialized = false;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be this cleanup flow is called after a failed init flow.
> + * Hence we need to check that we indeed have a netdev to unregister.
> + */
> + if (!port->ndev)
> + continue;
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +}
> +
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len)
> +{
> + dma_addr_t dma_addr;
> +
> + if (hdev->dma_map_support)
> + dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, DMA_TO_DEVICE);
> + else
> + dma_addr = virt_to_phys(addr);
> +
> + return dma_addr;
> +}
> +
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len)
> +{
> + if (hdev->dma_map_support)
> + dma_unmap_single(&hdev->pdev->dev, dma_addr, len, DMA_TO_DEVICE);
> +}
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> new file mode 100644
> index 000000000000..15504c1f3cfb
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#ifndef HABANALABS_EN_H_
> +#define HABANALABS_EN_H_
> +
> +#include <linux/net/intel/cn.h>
> +
> +#include <linux/netdevice.h>
> +#include <linux/pci.h>
> +
> +#define HBL_EN_NAME "habanalabs_en"
> +
> +#define HBL_EN_PORT(aux_dev, idx) (&(((struct hbl_en_device *)(aux_dev)->priv)->ports[(idx)]))
> +
> +#define hbl_netdev_priv(ndev) \
> +({ \
> + typecheck(struct net_device *, ndev); \
> + *(struct hbl_en_port **)netdev_priv(ndev); \
> +})
> +
> +/**
> + * enum hbl_en_eth_pkt_status - status of Rx Ethernet packet.
> + * @ETH_PKT_OK: packet was received successfully.
> + * @ETH_PKT_DROP: packet should be dropped.
> + * @ETH_PKT_NONE: no available packet.
> + */
> +enum hbl_en_eth_pkt_status {
> + ETH_PKT_OK,
> + ETH_PKT_DROP,
> + ETH_PKT_NONE
> +};
> +
> +/**
> + * struct hbl_en_net_stats - stats of Ethernet interface.
> + * @rx_packets: number of packets received.
> + * @tx_packets: number of packets sent.
> + * @rx_bytes: total bytes of data received.
> + * @tx_bytes: total bytes of data sent.
> + * @tx_errors: number of errors in the TX.
> + * @rx_dropped: number of packets dropped by the RX.
> + * @tx_dropped: number of packets dropped by the TX.
> + */
> +struct hbl_en_net_stats {
> + u64 rx_packets;
> + u64 tx_packets;
> + u64 rx_bytes;
> + u64 tx_bytes;
> + u64 tx_errors;
> + atomic64_t rx_dropped;
> + atomic64_t tx_dropped;
> +};
> +
> +/**
> + * struct hbl_en_port - manage port common structure.
> + * @hdev: habanalabs Ethernet device structure.
> + * @ndev: network device.
> + * @rx_wq: WQ for Rx poll when we cannot schedule NAPI poll.
> + * @mac_addr: HW MAC addresses.
> + * @asic_specific: ASIC specific port structure.
> + * @napi: New API structure.
> + * @rx_poll_work: Rx work for polling mode.
> + * @net_stats: statistics of the ethernet interface.
> + * @in_reset: true if the NIC was marked as in reset, false otherwise. Used to avoid an additional
> + * stopping of the NIC if a hard reset was re-initiated.
> + * @pflags: ethtool private flags bit mask.
> + * @idx: index of this specific port.
> + * @rx_max_coalesced_frames: Maximum number of packets to receive before an RX interrupt.
> + * @tx_max_coalesced_frames: Maximum number of packets to be sent before a TX interrupt.
> + * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after a packet arrives.
> + * @is_initialized: true if the port H/W is initialized, false otherwise.
> + * @pfc_enable: true if this port supports Priority Flow Control, false otherwise.
> + * @auto_neg_enable: is autoneg enabled.
> + * @auto_neg_resolved: was autoneg phase finished successfully.
> + */
> +struct hbl_en_port {
> + struct hbl_en_device *hdev;
> + struct net_device *ndev;
> + struct workqueue_struct *rx_wq;
> + char *mac_addr;
> + void *asic_specific;
> + struct napi_struct napi;
> + struct delayed_work rx_poll_work;
> + struct hbl_en_net_stats net_stats;
> + atomic_t in_reset;
> + u32 pflags;
> + u32 idx;
> + u32 rx_max_coalesced_frames;
> + u32 tx_max_coalesced_frames;
> + u16 rx_coalesce_usecs;
> + u8 is_initialized;
> + u8 pfc_enable;
> + u8 auto_neg_enable;
> + u8 auto_neg_resolved;
> +};
> +
> +/**
> + * struct hbl_en_asic_funcs - ASIC specific Ethernet functions.
> + * @dev_init: device init.
> + * @dev_fini: device cleanup.
> + * @reenable_rx_irq: re-enable Rx interrupts.
> + * @eth_port_open: initialize and open the Ethernet port.
> + * @eth_port_close: close the Ethernet port.
> + * @write_pkt_to_hw: write skb to HW.
> + * @read_pkt_from_hw: read pkt from HW.
> + * @get_pfc_cnts: get PFC counters.
> + * @set_coalesce: set Tx/Rx coalesce config in HW.
> + * @get_rx_ring_size: get the max number of elements the Rx ring can contain.
> + * @handle_eqe: Handle a received event.
> + */
> +struct hbl_en_asic_funcs {
> + int (*dev_init)(struct hbl_en_device *hdev);
> + void (*dev_fini)(struct hbl_en_device *hdev);
> + void (*reenable_rx_irq)(struct hbl_en_port *port);
> + int (*eth_port_open)(struct hbl_en_port *port);
> + void (*eth_port_close)(struct hbl_en_port *port);
> + netdev_tx_t (*write_pkt_to_hw)(struct hbl_en_port *port, struct sk_buff *skb);
> + int (*read_pkt_from_hw)(struct hbl_en_port *port, void **pkt_addr, u32 *pkt_size);
> + void (*get_pfc_cnts)(struct hbl_en_port *port, void *ptr);
> + int (*set_coalesce)(struct hbl_en_port *port);
> + int (*get_rx_ring_size)(struct hbl_en_port *port);
> + void (*handle_eqe)(struct hbl_aux_dev *aux_dev, u32 port_idx, struct hbl_cn_eqe *eqe);
> +};
> +
> +/**
> + * struct hbl_en_device - habanalabs Ethernet device structure.
> + * @pdev: pointer to PCI device.
> + * @dev: related kernel basic device structure.
> + * @ports: array of all ports manage common structures.
> + * @aux_dev: pointer to auxiliary device.
> + * @asic_specific: ASIC specific device structure.
> + * @fw_ver: FW version.
> + * @qsfp_eeprom: QSFP EEPROM info.
> + * @mac_addr: array of all MAC addresses.
> + * @asic_funcs: ASIC specific Ethernet functions.
> + * @asic_type: ASIC specific type.
> + * @ports_mask: mask of available ports.
> + * @auto_neg_mask: mask of ports with autonegotiation enabled.
> + * @port_reset_timeout: max time in seconds for a port reset flow to finish.
> + * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
> + * @max_frm_len: maximum allowed frame length.
> + * @raw_elem_size: size of element in raw buffers.
> + * @max_raw_mtu: maximum MTU size for raw packets.
> + * @min_raw_mtu: minimum MTU size for raw packets.
> + * @pad_size: the pad size in bytes for the skb to transmit.
> + * @core_dev_id: core device ID.
> + * @max_num_of_ports: max number of available ports.
> + * @in_reset: is the entire NIC currently under reset.
> + * @poll_enable: Enable Rx polling rather than IRQ + NAPI.
> + * @in_teardown: true if the NIC is in teardown (during device remove).
> + * @is_initialized: was the device initialized successfully.
> + * @has_eq: true if event queue is supported.
> + * @dma_map_support: HW supports DMA mapping.
> + */
> +struct hbl_en_device {
> + struct pci_dev *pdev;
> + struct device *dev;
> + struct hbl_en_port *ports;
> + struct hbl_aux_dev *aux_dev;
> + void *asic_specific;
> + char *fw_ver;
> + char *qsfp_eeprom;
> + char *mac_addr;
> + struct hbl_en_asic_funcs asic_funcs;
> + enum hbl_cn_asic_type asic_type;
> + u64 ports_mask;
> + u64 auto_neg_mask;
> + u32 port_reset_timeout;
> + u32 pending_reset_long_timeout;
> + u32 max_frm_len;
> + u32 raw_elem_size;
> + u16 max_raw_mtu;
> + u16 min_raw_mtu;
> + u16 pad_size;
> + u16 core_dev_id;
> + u8 max_num_of_ports;
> + u8 in_reset;
> + u8 poll_enable;
> + u8 in_teardown;
> + u8 is_initialized;
> + u8 has_eq;
> + u8 dma_map_support;
> +};
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev);
> +void hbl_en_dev_fini(struct hbl_en_device *hdev);
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev);
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port);
> +
> +extern const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops;
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port);
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port);
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port);
> +int hbl_en_port_reset(struct hbl_en_port *port);
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx);
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget);
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len);
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len);
> +
> +#endif /* HABANALABS_EN_H_ */
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> new file mode 100644
> index 000000000000..5d718579a2b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +
> +#define PFC_PRIO_MASK_ALL GENMASK(HBL_EN_PFC_PRIO_NUM - 1, 0)
> +#define PFC_PRIO_MASK_NONE 0
> +
> +#ifdef CONFIG_DCB
> +static int hbl_en_dcbnl_ieee_getpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> +
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't get PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + pfc->pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> + pfc->pfc_cap = HBL_EN_PFC_PRIO_NUM;
> +
> + hdev->asic_funcs.get_pfc_cnts(port, pfc);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +static int hbl_en_dcbnl_ieee_setpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u8 curr_pfc_en;
> + u32 port_idx;
> + int rc = 0;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev, "PFC supports %d priorities only, port %d\n",
> + HBL_EN_PFC_PRIO_NUM, port_idx);
> + return -EINVAL;
> + }
> +
> + if (pfc->pfc_en != PFC_PRIO_MASK_NONE && pfc->pfc_en != PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev,
> + "PFC should be enabled/disabled on all priorities, port %d\n",
> + port_idx);
> + return -EINVAL;
> + }
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't set PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + curr_pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> +
> + if (pfc->pfc_en == curr_pfc_en)
> + goto out;
> +
> + port->pfc_enable = !port->pfc_enable;
> +
> + rc = aux_ops->set_pfc(aux_dev, port_idx, port->pfc_enable);
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static u8 hbl_en_dcbnl_getdcbx(struct net_device *netdev)
> +{
> + return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
> +}
> +
> +static u8 hbl_en_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
> +{
> + return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
> +}
> +
> +const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops = {
> + .ieee_getpfc = hbl_en_dcbnl_ieee_getpfc,
> + .ieee_setpfc = hbl_en_dcbnl_ieee_setpfc,
> + .getdcbx = hbl_en_dcbnl_getdcbx,
> + .setdcbx = hbl_en_dcbnl_setdcbx
> +};
> +#endif
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> new file mode 100644
> index 000000000000..23a87d36ded5
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#define pr_fmt(fmt) "habanalabs_en: " fmt
> +
> +#include "hbl_en.h"
> +
> +#include <linux/module.h>
> +#include <linux/auxiliary_bus.h>
> +
> +#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
> +
> +#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Ethernet driver"
> +
> +MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(HBL_DRIVER_DESC);
> +MODULE_LICENSE("GPL");
> +
> +static bool poll_enable;
> +
> +module_param(poll_enable, bool, 0444);
> +MODULE_PARM_DESC(poll_enable,
> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
> +
> +static int hdev_init(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_aux_data *aux_data = aux_dev->aux_data;
> + struct hbl_en_port *ports, *port;
> + struct hbl_en_device *hdev;
> + int rc, i;
> +
> + hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
> + if (!hdev)
> + return -ENOMEM;
> +
> + ports = kcalloc(aux_data->max_num_of_ports, sizeof(*ports), GFP_KERNEL);
> + if (!ports) {
> + rc = -ENOMEM;
> + goto ports_alloc_fail;
> + }
> +
> + aux_dev->priv = hdev;
> + hdev->aux_dev = aux_dev;
> + hdev->ports = ports;
> + hdev->pdev = aux_data->pdev;
> + hdev->dev = aux_data->dev;
> + hdev->ports_mask = aux_data->ports_mask;
> + hdev->auto_neg_mask = aux_data->auto_neg_mask;
> + hdev->max_num_of_ports = aux_data->max_num_of_ports;
> + hdev->core_dev_id = aux_data->id;
> + hdev->fw_ver = aux_data->fw_ver;
> + hdev->qsfp_eeprom = aux_data->qsfp_eeprom;
> + hdev->asic_type = aux_data->asic_type;
> + hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
> + hdev->max_frm_len = aux_data->max_frm_len;
> + hdev->raw_elem_size = aux_data->raw_elem_size;
> + hdev->max_raw_mtu = aux_data->max_raw_mtu;
> + hdev->min_raw_mtu = aux_data->min_raw_mtu;
> + hdev->pad_size = ETH_ZLEN;
> + hdev->has_eq = aux_data->has_eq;
> + hdev->dma_map_support = true;
> + hdev->poll_enable = poll_enable;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> + port->hdev = hdev;
> + port->idx = i;
> + port->pfc_enable = true;
> + port->pflags = PFLAGS_PCS_LINK_CHECK | PFLAGS_PHY_AUTO_NEG_LPBK;
> + port->mac_addr = aux_data->mac_addr[i];
> + port->auto_neg_enable = !!(aux_data->auto_neg_mask & BIT(i));
> + }
> +
> + return 0;
> +
> +ports_alloc_fail:
> + kfree(hdev);
> +
> + return rc;
> +}
> +
> +static void hdev_fini(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + kfree(hdev->ports);
> + kfree(hdev);
> + aux_dev->priv = NULL;
> +}
> +
> +static const struct auxiliary_device_id hbl_en_id_table[] = {
> + { .name = "habanalabs_cn.en", },
> + {},
> +};
> +
> +MODULE_DEVICE_TABLE(auxiliary, hbl_en_id_table);
> +
> +static int hbl_en_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
> + struct hbl_en_device *hdev;
> + ktime_t timeout;
> + int rc;
> +
> + rc = hdev_init(aux_dev);
> + if (rc) {
> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> + return -EIO;
> + }
> +
> + hdev = aux_dev->priv;
> +
> + /* don't allow module unloading while it is attached */
> + if (!try_module_get(THIS_MODULE)) {
> + dev_err(hdev->dev, "Failed to increment %s module refcount\n", HBL_EN_NAME);
> + rc = -EIO;
> + goto module_get_err;
> + }
> +
> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> + while (1) {
> + aux_ops->hw_access_lock(aux_dev);
> +
> + /* if the device is operational, proceed to actual init while holding the lock in
> + * order to prevent concurrent hard reset
> + */
> + if (aux_ops->device_operational(aux_dev))
> + break;
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> + rc = -EBUSY;
> + goto timeout_err;
> + }
> +
> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing en\n");
> +
> + msleep_interruptible(MSEC_PER_SEC);
> + }
> +
> + rc = hbl_en_dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "Failed to init en device\n");
> + goto dev_init_err;
> + }
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + return 0;
> +
> +dev_init_err:
> + aux_ops->hw_access_unlock(aux_dev);
> +timeout_err:
> + module_put(THIS_MODULE);
> +module_get_err:
> + hdev_fini(aux_dev);
> +
> + return rc;
> +}
> +
> +/* This function can be called only from the CN driver when deleting the aux bus,
> + * since we incremented the module refcount at probe time. Hence there is no need
> + * to protect against hard reset here.
> + */
> +static void hbl_en_remove(struct auxiliary_device *adev)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + if (!hdev)
> + return;
> +
> + hbl_en_dev_fini(hdev);
> +
> + /* allow module unloading as now it is detached */
> + module_put(THIS_MODULE);
> +
> + hdev_fini(aux_dev);
> +}
> +
> +static struct auxiliary_driver hbl_en_driver = {
> + .name = "eth",
> + .probe = hbl_en_probe,
> + .remove = hbl_en_remove,
> + .id_table = hbl_en_id_table,
> +};
> +
> +static int __init hbl_en_init(void)
> +{
> + pr_info("loading driver\n");
> +
> + return auxiliary_driver_register(&hbl_en_driver);
> +}
> +
> +static void __exit hbl_en_exit(void)
> +{
> + auxiliary_driver_unregister(&hbl_en_driver);
> +
> + pr_info("driver removed\n");
> +}
> +
> +module_init(hbl_en_init);
> +module_exit(hbl_en_exit);
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> new file mode 100644
> index 000000000000..1d14d283409b
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> @@ -0,0 +1,452 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/ethtool.h>
> +
> +#define RX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MAX 10
> +
> +static const char pflags_str[][ETH_GSTRING_LEN] = {
> + "pcs-link-check",
> + "phy-auto-neg-lpbk",
> +};
> +
> +#define NIC_STAT(m) {#m, offsetof(struct hbl_en_port, net_stats.m)}
> +
> +static struct hbl_cn_stat netdev_eth_stats[] = {
> + NIC_STAT(rx_packets),
> + NIC_STAT(tx_packets),
> + NIC_STAT(rx_bytes),
> + NIC_STAT(tx_bytes),
> + NIC_STAT(tx_errors),
> + NIC_STAT(rx_dropped),
> + NIC_STAT(tx_dropped)
> +};
> +
> +static size_t pflags_str_len = ARRAY_SIZE(pflags_str);
> +static size_t netdev_eth_stats_len = ARRAY_SIZE(netdev_eth_stats);
> +
> +static void hbl_en_ethtool_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *drvinfo)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> +
> + strscpy(drvinfo->driver, HBL_EN_NAME, sizeof(drvinfo->driver));
> + strscpy(drvinfo->fw_version, hdev->fw_ver, sizeof(drvinfo->fw_version));
> + strscpy(drvinfo->bus_info, pci_name(hdev->pdev), sizeof(drvinfo->bus_info));
> +}
> +
> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
> +{
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->type = ETH_MODULE_SFF_8636;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
> + u8 *data)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 first, last, len;
> + u8 *qsfp_eeprom;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + qsfp_eeprom = hdev->qsfp_eeprom;
> +
> + if (ee->len == 0)
> + return -EINVAL;
> +
> + first = ee->offset;
> + last = ee->offset + ee->len;
> +
> + if (first < ETH_MODULE_SFF_8636_LEN) {
> + len = min_t(unsigned int, last, ETH_MODULE_SFF_8079_LEN);
> + len -= first;
> +
> + memcpy(data, qsfp_eeprom + first, len);
> + }
> +
> + return 0;
> +}
> +
> +static u32 hbl_en_ethtool_get_priv_flags(struct net_device *ndev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + return port->pflags;
> +}
> +
> +static int hbl_en_ethtool_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + port->pflags = priv_flags;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
> + struct ethtool_link_ksettings *cmd)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 port_idx, speed;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + speed = aux_ops->get_speed(aux_dev, port_idx);
> +
> + cmd->base.speed = speed;
> + cmd->base.duplex = DUPLEX_FULL;
> +
> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> + switch (speed) {
> + case SPEED_100000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> + cmd->base.port = PORT_FIBRE;
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
> + break;
> + case SPEED_50000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
> + break;
> + case SPEED_25000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
> + break;
> + case SPEED_200000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
> + break;
> + case SPEED_400000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
> + break;
> + default:
> + netdev_err(port->ndev, "unknown speed %d\n", speed);
> + return -EFAULT;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
> +
> + if (port->auto_neg_enable) {
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
> + cmd->base.autoneg = AUTONEG_ENABLE;
> + if (port->auto_neg_resolved)
> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> + } else {
> + cmd->base.autoneg = AUTONEG_DISABLE;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
> +
> + if (port->pfc_enable)
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
> +
> + return 0;
> +}
> +
> +/* only autoneg is mutable */
> +static bool check_immutable_ksettings(const struct ethtool_link_ksettings *old_cmd,
> + const struct ethtool_link_ksettings *new_cmd)
> +{
> + return (old_cmd->base.speed == new_cmd->base.speed) &&
> + (old_cmd->base.duplex == new_cmd->base.duplex) &&
> + (old_cmd->base.port == new_cmd->base.port) &&
> + (old_cmd->base.phy_address == new_cmd->base.phy_address) &&
> + (old_cmd->base.eth_tp_mdix_ctrl == new_cmd->base.eth_tp_mdix_ctrl) &&
> + bitmap_equal(old_cmd->link_modes.advertising, new_cmd->link_modes.advertising,
> + __ETHTOOL_LINK_MODE_MASK_NBITS);
> +}
> +
> +static int
> +hbl_en_ethtool_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd)
> +{
> + struct ethtool_link_ksettings curr_cmd;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + bool auto_neg;
> + u32 port_idx;
> + int rc;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + memset(&curr_cmd, 0, sizeof(struct ethtool_link_ksettings));
> +
> + rc = hbl_en_ethtool_get_link_ksettings(ndev, &curr_cmd);
> + if (rc)
> + return rc;
> +
> + if (!check_immutable_ksettings(&curr_cmd, cmd))
> + return -EOPNOTSUPP;
> +
> + auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
> +
> + if (port->auto_neg_enable == auto_neg)
> + return 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + port->auto_neg_enable = auto_neg;
> +
> + if (netif_running(port->ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc)
> + netdev_err(port->ndev, "Failed to reset port for settings update, rc %d\n",
> + rc);
> + }
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static int hbl_en_ethtool_get_sset_count(struct net_device *ndev, int sset)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (sset) {
> + case ETH_SS_STATS:
> + return netdev_eth_stats_len + aux_ops->get_cnts_num(aux_dev, port_idx);
> + case ETH_SS_PRIV_FLAGS:
> + return pflags_str_len;
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int i;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (stringset) {
> + case ETH_SS_STATS:
> + for (i = 0; i < netdev_eth_stats_len; i++)
> + ethtool_puts(&data, netdev_eth_stats[i].str);
> +
> + aux_ops->get_cnts_names(aux_dev, port_idx, data);
> + break;
> + case ETH_SS_PRIV_FLAGS:
> + for (i = 0; i < pflags_str_len; i++)
> + ethtool_puts(&data, pflags_str[i]);
> + break;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_ethtool_stats(struct net_device *ndev,
> + __always_unused struct ethtool_stats *stats, u64 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> + char *p;
> + int i;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_info_ratelimited(hdev->dev, "port %d is in reset, can't get ethtool stats",
> + port_idx);
> + return;
> + }
> +
> + /* The Ethernet Rx/Tx flow might update these stats in parallel, but there is
> + * no strict need for synchronization: missing a few counts is preferable to
> + * adding a lock that would increase the overhead of the Rx/Tx fast paths. In
> + * the worst case the reader gets slightly stale stats and will see the updated
> + * values on the next read.
> + */
> + for (i = 0; i < netdev_eth_stats_len; i++) {
> + p = (char *)port + netdev_eth_stats[i].lo_offset;
> + data[i] = *(u32 *)p;
> + }
> +
> + data += i;
> +
> + aux_ops->get_cnts_values(aux_dev, port_idx, data);
> +
> + atomic_set(&port->in_reset, 0);
> +}
> +
> +static int hbl_en_ethtool_get_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + coal->tx_max_coalesced_frames = port->tx_max_coalesced_frames;
> + coal->rx_coalesce_usecs = port->rx_coalesce_usecs;
> + coal->rx_max_coalesced_frames = port->rx_max_coalesced_frames;
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_set_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc, rx_ring_size;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (coal->tx_max_coalesced_frames < TX_COALESCED_FRAMES_MIN ||
> + coal->tx_max_coalesced_frames > TX_COALESCED_FRAMES_MAX) {
> + netdev_err(ndev, "tx max_coalesced_frames should be between %d and %d\n",
> + TX_COALESCED_FRAMES_MIN, TX_COALESCED_FRAMES_MAX);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + rx_ring_size = hdev->asic_funcs.get_rx_ring_size(port);
> + if (coal->rx_max_coalesced_frames < RX_COALESCED_FRAMES_MIN ||
> + coal->rx_max_coalesced_frames >= rx_ring_size) {
> + netdev_err(ndev, "rx max_coalesced_frames should be between %d and %d\n",
> + RX_COALESCED_FRAMES_MIN, rx_ring_size);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + port->tx_max_coalesced_frames = coal->tx_max_coalesced_frames;
> + port->rx_coalesce_usecs = coal->rx_coalesce_usecs;
> + port->rx_max_coalesced_frames = coal->rx_max_coalesced_frames;
> +
> + rc = hdev->asic_funcs.set_coalesce(port);
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> +atomic_out:
> + atomic_set(&port->in_reset, 0);
> + return rc;
> +}
> +
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port)
> +{
> + port->rx_coalesce_usecs = CQ_ARM_TIMEOUT_USEC;
> + port->rx_max_coalesced_frames = 1;
> + port->tx_max_coalesced_frames = 1;
> +}
> +
> +static const struct ethtool_ops hbl_en_ethtool_ops_coalesce = {
> + .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES |
> + ETHTOOL_COALESCE_TX_MAX_FRAMES,
> + .get_drvinfo = hbl_en_ethtool_get_drvinfo,
> + .get_link = ethtool_op_get_link,
> + .get_module_info = hbl_en_ethtool_get_module_info,
> + .get_module_eeprom = hbl_en_ethtool_get_module_eeprom,
> + .get_priv_flags = hbl_en_ethtool_get_priv_flags,
> + .set_priv_flags = hbl_en_ethtool_set_priv_flags,
> + .get_link_ksettings = hbl_en_ethtool_get_link_ksettings,
> + .set_link_ksettings = hbl_en_ethtool_set_link_ksettings,
> + .get_sset_count = hbl_en_ethtool_get_sset_count,
> + .get_strings = hbl_en_ethtool_get_strings,
> + .get_ethtool_stats = hbl_en_ethtool_get_ethtool_stats,
> + .get_coalesce = hbl_en_ethtool_get_coalesce,
> + .set_coalesce = hbl_en_ethtool_set_coalesce,
> +};
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev)
> +{
> + return &hbl_en_ethtool_ops_coalesce;
> +}
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 13:01 ` Przemek Kitszel
@ 2024-06-15 0:05 ` Stephen Hemminger
2024-06-17 8:14 ` Omer Shpigelman
2024-06-17 14:05 ` Markus Elfring
2 siblings, 1 reply; 107+ messages in thread
From: Stephen Hemminger @ 2024-06-15 0:05 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
> +#define HBL_AUX2NIC(aux_dev) \
> + ({ \
> + struct hbl_aux_dev *__aux_dev = (aux_dev); \
> + ((__aux_dev)->type == HBL_AUX_DEV_ETH) ? \
> + container_of(__aux_dev, struct hbl_cn_device, en_aux_dev) : \
> + container_of(__aux_dev, struct hbl_cn_device, ib_aux_dev); \
> + })
> +
> +#define RAND_STAT_CNT(cnt) \
> + do { \
> + u32 __cnt = get_random_u32(); \
> + (cnt) = __cnt; \
> + dev_info(hdev->dev, "port %d, %s: %u\n", port, #cnt, __cnt); \
> + } while (0)
> +
> +struct hbl_cn_stat hbl_cn_mac_fec_stats[] = {
> + {"correctable_errors", 0x2, 0x3},
> + {"uncorrectable_errors", 0x4, 0x5}
> +};
> +
These tables should be marked const?
> +struct hbl_cn_stat hbl_cn_mac_stats_rx[] = {
> + {"Octets", 0x0},
> + {"OctetsReceivedOK", 0x4},
> + {"aAlignmentErrors", 0x8},
> + {"aPAUSEMACCtrlFramesReceived", 0xC},
> + {"aFrameTooLongErrors", 0x10},
> + {"aInRangeLengthErrors", 0x14},
> + {"aFramesReceivedOK", 0x18},
> + {"aFrameCheckSequenceErrors", 0x1C},
> + {"VLANReceivedOK", 0x20},
> + {"ifInErrors", 0x24},
> + {"ifInUcastPkts", 0x28},
> + {"ifInMulticastPkts", 0x2C},
> + {"ifInBroadcastPkts", 0x30},
> + {"DropEvents", 0x34},
> + {"Pkts", 0x38},
> + {"UndersizePkts", 0x3C},
> + {"Pkts64Octets", 0x40},
> + {"Pkts65to127Octets", 0x44},
> + {"Pkts128to255Octets", 0x48},
> + {"Pkts256to511Octets", 0x4C},
> + {"Pkts512to1023Octets", 0x50},
> + {"Pkts1024to1518Octets", 0x54},
> + {"Pkts1519toMaxOctets", 0x58},
> + {"OversizePkts", 0x5C},
> + {"Jabbers", 0x60},
> + {"Fragments", 0x64},
> + {"aCBFCPAUSERx0", 0x68},
> + {"aCBFCPAUSERx1", 0x6C},
> + {"aCBFCPAUSERx2", 0x70},
> + {"aCBFCPAUSERx3", 0x74},
> + {"aCBFCPAUSERx4", 0x78},
> + {"aCBFCPAUSERx5", 0x7C},
> + {"aCBFCPAUSERx6", 0x80},
> + {"aCBFCPAUSERx7", 0x84},
> + {"aMACControlFramesReceived", 0x88}
> +};
> +
> +struct hbl_cn_stat hbl_cn_mac_stats_tx[] = {
> + {"Octets", 0x0},
> + {"OctetsTransmittedOK", 0x4},
> + {"aPAUSEMACCtrlFramesTransmitted", 0x8},
> + {"aFramesTransmittedOK", 0xC},
> + {"VLANTransmittedOK", 0x10},
> + {"ifOutErrors", 0x14},
> + {"ifOutUcastPkts", 0x18},
> + {"ifOutMulticastPkts", 0x1C},
> + {"ifOutBroadcastPkts", 0x20},
> + {"Pkts64Octets", 0x24},
> + {"Pkts65to127Octets", 0x28},
> + {"Pkts128to255Octets", 0x2C},
> + {"Pkts256to511Octets", 0x30},
> + {"Pkts512to1023Octets", 0x34},
> + {"Pkts1024to1518Octets", 0x38},
> + {"Pkts1519toMaxOctets", 0x3C},
> + {"aCBFCPAUSETx0", 0x40},
> + {"aCBFCPAUSETx1", 0x44},
> + {"aCBFCPAUSETx2", 0x48},
> + {"aCBFCPAUSETx3", 0x4C},
> + {"aCBFCPAUSETx4", 0x50},
> + {"aCBFCPAUSETx5", 0x54},
> + {"aCBFCPAUSETx6", 0x58},
> + {"aCBFCPAUSETx7", 0x5C},
> + {"aMACControlFramesTx", 0x60},
> + {"Pkts", 0x64}
> +};
> +
> +static const char pcs_counters_str[][ETH_GSTRING_LEN] = {
> + {"pcs_local_faults"},
> + {"pcs_remote_faults"},
> + {"pcs_remote_fault_reconfig"},
> + {"pcs_link_restores"},
> + {"pcs_link_toggles"},
> +};
> +
> +static size_t pcs_counters_str_len = ARRAY_SIZE(pcs_counters_str);
> +size_t hbl_cn_mac_fec_stats_len = ARRAY_SIZE(hbl_cn_mac_fec_stats);
> +size_t hbl_cn_mac_stats_rx_len = ARRAY_SIZE(hbl_cn_mac_stats_rx);
> +size_t hbl_cn_mac_stats_tx_len = ARRAY_SIZE(hbl_cn_mac_stats_tx);
> +
> +static void qps_stop(struct hbl_cn_device *hdev);
> +static void qp_destroy_work(struct work_struct *work);
> +static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
> +static void user_cq_destroy(struct kref *kref);
> +static void set_app_params_clear(struct hbl_cn_device *hdev);
> +static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
> + void *output);
> +static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
> + struct hbl_ib_mem_info *info);
> +
> +static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port);
> +static void hbl_cn_late_init(struct hbl_cn_device *hdev);
> +static void hbl_cn_late_fini(struct hbl_cn_device *hdev);
> +static int hbl_cn_sw_init(struct hbl_cn_device *hdev);
> +static void hbl_cn_sw_fini(struct hbl_cn_device *hdev);
> +static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full);
> +static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags);
> +static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port);
Can you reorder code so forward declarations are not required?
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
2024-06-13 21:49 ` Andrew Lunn
2024-06-14 22:48 ` Joe Damato
@ 2024-06-15 0:10 ` Stephen Hemminger
2024-06-19 12:07 ` Omer Shpigelman
2024-06-15 0:16 ` Stephen Hemminger
` (2 subsequent siblings)
5 siblings, 1 reply; 107+ messages in thread
From: Stephen Hemminger @ 2024-06-15 0:10 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
On Thu, 13 Jun 2024 11:22:02 +0300
Omer Shpigelman <oshpigelman@habana.ai> wrote:
> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int rc = 0, i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be that the port was shut down via 'ip link set down', in which
> + * case there is no need to reopen it.
> + * Since we mark the ports as in reset even if they are disabled, clear the
> + * flag here anyway.
> + * See hbl_en_ports_stop_prepare() for more info.
> + */
> + if (!netif_running(port->ndev)) {
> + atomic_set(&port->in_reset, 0);
> + continue;
> + }
> +
Rather than duplicating network device state in your own flags, it would be better to use
existing infrastructure. Read Documentation/networking/operstates.rst
Then you could also get rid of the kludge timer stuff in hbl_en_close().
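For reference, the pattern the core already provides: the driver reports carrier changes through the standard helpers and the core derives the operational state and emits the netlink events, with no private flag needed. A hypothetical sketch (not the driver's actual code; `hbl_en_port`/`ndev` taken from the patch above):

```c
/* Hypothetical sketch: report link state through the core helpers so that
 * userspace observes IF_OPER_UP/IF_OPER_DOWN via the usual netlink events,
 * instead of the driver tracking state in its own atomic flags.
 */
static void hbl_en_report_link(struct hbl_en_port *port, bool up)
{
	if (up)
		netif_carrier_on(port->ndev);   /* operstate -> UP */
	else
		netif_carrier_off(port->ndev);  /* operstate -> DOWN */
}
```

`netif_running()`/`netif_carrier_ok()` can then be queried wherever the reset and reopen paths currently consult the private state.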
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
` (2 preceding siblings ...)
2024-06-15 0:10 ` Stephen Hemminger
@ 2024-06-15 0:16 ` Stephen Hemminger
2024-06-18 19:39 ` Omer Shpigelman
2024-06-15 10:55 ` Zhu Yanjun
2024-06-15 17:13 ` Zhu Yanjun
5 siblings, 1 reply; 107+ messages in thread
From: Stephen Hemminger @ 2024-06-15 0:16 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
> +
> +/* get the src IP as it is done in devinet_ioctl() */
> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + struct in_device *in_dev;
> + struct in_ifaddr *ifa;
> + int rc = 0;
> +
> + /* for the case where no src IP is configured */
> + *src_ip = 0;
> +
> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
> + if (!rtnl_is_locked()) {
> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + in_dev = __in_dev_get_rtnl(ndev);
> + if (!in_dev) {
> + netdev_err(port->ndev, "Failed to get IPv4 struct\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + ifa = rtnl_dereference(in_dev->ifa_list);
> +
> + while (ifa) {
> + if (!strcmp(ndev->name, ifa->ifa_label)) {
> + /* convert from BE to native byte order; later on it will be
> + * written to the HW as LE in QPC_SET
> + */
> + *src_ip = be32_to_cpu(ifa->ifa_local);
> + break;
> + }
> + ifa = rtnl_dereference(ifa->ifa_next);
> + }
> +out:
> + return rc;
> +}
Does this device require IPv4? What about users and infrastructures that use IPv6 only?
IPv4 is legacy at this point.
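For comparison, an IPv6 source lookup would walk the device's `inet6_dev` address list instead of `ifa_list`. A hedged sketch only (field names from include/net/if_inet6.h; the addr_list walk normally also takes `in6_dev->lock`, elided here):

```c
/* Hypothetical sketch: fetch the first non-tentative IPv6 address of the
 * netdev. Caller must hold rtnl, mirroring the IPv4 helper above.
 */
static int hbl_en_get_src_ip6(struct net_device *ndev, struct in6_addr *src)
{
	struct inet6_dev *in6_dev = __in6_dev_get(ndev);
	struct inet6_ifaddr *ifp;

	if (!in6_dev)
		return -ENODEV;

	list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
		if (!(ifp->flags & IFA_F_TENTATIVE)) {
			*src = ifp->addr;
			return 0;
		}
	}

	return -EADDRNOTAVAIL;
}
```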
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
` (3 preceding siblings ...)
2024-06-15 0:16 ` Stephen Hemminger
@ 2024-06-15 10:55 ` Zhu Yanjun
2024-06-18 11:16 ` Omer Shpigelman
2024-06-15 17:13 ` Zhu Yanjun
5 siblings, 1 reply; 107+ messages in thread
From: Zhu Yanjun @ 2024-06-15 10:55 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, zyehudai
On 2024/6/13 16:22, Omer Shpigelman wrote:
> This ethernet driver is initialized via auxiliary bus by the hbl_cn
> driver.
> It serves mainly for control operations that are needed for AI scaling.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> ---
> MAINTAINERS | 9 +
> drivers/net/ethernet/intel/Kconfig | 18 +
> drivers/net/ethernet/intel/Makefile | 1 +
> drivers/net/ethernet/intel/hbl_en/Makefile | 9 +
> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1168 +++++++++++++++++
> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 206 +++
> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 ++
> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +++
> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +++++++
> 10 files changed, 2178 insertions(+)
> create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 096439a62129..7301f38e9cfb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9617,6 +9617,15 @@ F: include/linux/habanalabs/
> F: include/linux/net/intel/cn*
> F: include/linux/net/intel/gaudi2*
>
> +HABANALABS ETHERNET DRIVER
> +M: Omer Shpigelman <oshpigelman@habana.ai>
> +L: netdev@vger.kernel.org
> +S: Supported
> +W: https://www.habana.ai
> +F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
> +F: drivers/net/ethernet/intel/hbl_en/
> +F: include/linux/net/intel/cn*
> +
> HACKRF MEDIA DRIVER
> L: linux-media@vger.kernel.org
> S: Orphan
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 0d1b8a2bae99..5d07349348a0 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -417,4 +417,22 @@ config HABANA_CN
> To compile this driver as a module, choose M here. The module
> will be called habanalabs_cn.
>
> +config HABANA_EN
> + tristate "HabanaLabs (an Intel Company) Ethernet driver"
> + depends on NETDEVICES && ETHERNET && INET
> + select HABANA_CN
> + help
> + This driver enables Ethernet functionality for the network interfaces
> + that are part of the GAUDI ASIC family of AI Accelerators.
> + For more information on how to identify your adapter, go to the
> + Adapter & Driver ID Guide that can be located at:
> +
> + <http://support.intel.com>
> +
> + More specific information on configuring the driver is in
> + <file:Documentation/networking/device_drivers/ethernet/intel/hbl.rst>.
> +
> + To compile this driver as a module, choose M here. The module
> + will be called habanalabs_en.
> +
> endif # NET_VENDOR_INTEL
> diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
> index 10049a28e336..ec62a0227897 100644
> --- a/drivers/net/ethernet/intel/Makefile
> +++ b/drivers/net/ethernet/intel/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_FM10K) += fm10k/
> obj-$(CONFIG_ICE) += ice/
> obj-$(CONFIG_IDPF) += idpf/
> obj-$(CONFIG_HABANA_CN) += hbl_cn/
> +obj-$(CONFIG_HABANA_EN) += hbl_en/
> diff --git a/drivers/net/ethernet/intel/hbl_en/Makefile b/drivers/net/ethernet/intel/hbl_en/Makefile
> new file mode 100644
> index 000000000000..695497ab93b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/Makefile
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for HabanaLabs (an Intel Company) Ethernet network driver
> +#
> +
> +obj-$(CONFIG_HABANA_EN) := habanalabs_en.o
> +
> +include $(src)/common/Makefile
> +habanalabs_en-y += $(HBL_EN_COMMON_FILES)
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/Makefile b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> new file mode 100644
> index 000000000000..a3ccb5dbf4a6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +HBL_EN_COMMON_FILES := common/hbl_en_drv.o common/hbl_en.o \
> + common/hbl_en_ethtool.o common/hbl_en_dcbnl.o
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> new file mode 100644
> index 000000000000..066be5ac2d84
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> @@ -0,0 +1,1168 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/inetdevice.h>
> +
> +#define TX_TIMEOUT (5 * HZ)
> +#define PORT_RESET_TIMEOUT_MSEC (60 * 1000ull) /* 60s */
> +
> +/**
> + * struct hbl_en_tx_pkt_work - used to schedule a work of a Tx packet.
> + * @tx_work: workqueue object to run when packet needs to be sent.
> + * @port: pointer to current port structure.
> + * @skb: copy of the packet to send.
> + */
> +struct hbl_en_tx_pkt_work {
> + struct work_struct tx_work;
> + struct hbl_en_port *port;
> + struct sk_buff *skb;
> +};
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
> +static int hbl_en_port_open(struct hbl_en_port *port);
> +
> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int rc = 0, i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be that the port was shut down by 'ip link set down' and there is
> + * no need to reopen it.
> + * Since we mark the ports as in reset even if they are disabled, we clear the flag
> + * here anyway.
> + * See hbl_en_ports_stop_prepare() for more info.
> + */
> + if (!netif_running(port->ndev)) {
> + atomic_set(&port->in_reset, 0);
> + continue;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + if (rc)
> + break;
> + }
> +
> + hdev->in_reset = false;
> +
> + return rc;
> +}
> +
> +static void hbl_en_port_fini(struct hbl_en_port *port)
> +{
> + if (port->rx_wq)
> + destroy_workqueue(port->rx_wq);
> +}
> +
> +static int hbl_en_port_init(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u32 port_idx = port->idx;
> + char wq_name[32];
> + int rc;
> +
> + if (hdev->poll_enable) {
> + memset(wq_name, 0, sizeof(wq_name));
> + snprintf(wq_name, sizeof(wq_name) - 1, "hbl%u-port%d-rx-wq", hdev->core_dev_id,
> + port_idx);
> + port->rx_wq = alloc_ordered_workqueue(wq_name, 0);
> + if (!port->rx_wq) {
> + dev_err(hdev->dev, "Failed to allocate Rx WQ\n");
> + rc = -ENOMEM;
> + goto fail;
> + }
> + }
> +
> + hbl_en_ethtool_init_coalesce(port);
> +
> + return 0;
> +
> +fail:
> + hbl_en_port_fini(port);
> +
> + return rc;
> +}
> +
> +static void _hbl_en_set_port_status(struct hbl_en_port *port, bool up)
> +{
> + struct net_device *ndev = port->ndev;
> + u32 port_idx = port->idx;
> +
> + if (up) {
> + netif_carrier_on(ndev);
> + netif_wake_queue(ndev);
> + } else {
> + netif_carrier_off(ndev);
> + netif_stop_queue(ndev);
> + }
> +
> + /* Unless link events are getting through the EQ, there is no need to print link
> + * down events during port reset.
> + */
> + if (port->hdev->has_eq || up || !atomic_read(&port->in_reset))
> + netdev_info(port->ndev, "link %s, port %d\n", up ? "up" : "down", port_idx);
> +}
> +
> +static void hbl_en_set_port_status(struct hbl_aux_dev *aux_dev, u32 port_idx, bool up)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + _hbl_en_set_port_status(port, up);
> +}
> +
> +static bool hbl_en_is_port_open(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->is_initialized;
> +}
> +
> +/* get the src IP as it is done in devinet_ioctl() */
> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + struct in_device *in_dev;
> + struct in_ifaddr *ifa;
> + int rc = 0;
> +
> + /* for the case where no src IP is configured */
> + *src_ip = 0;
> +
> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
> + if (!rtnl_is_locked()) {
> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + in_dev = __in_dev_get_rtnl(ndev);
> + if (!in_dev) {
> + netdev_err(port->ndev, "Failed to get IPv4 struct\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + ifa = rtnl_dereference(in_dev->ifa_list);
> +
> + while (ifa) {
> + if (!strcmp(ndev->name, ifa->ifa_label)) {
> + /* convert the BE to native and later on it will be
> + * written to the HW as LE in QPC_SET
> + */
> + *src_ip = be32_to_cpu(ifa->ifa_local);
> + break;
> + }
> + ifa = rtnl_dereference(ifa->ifa_next);
> + }
> +out:
> + return rc;
> +}
> +
> +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + port->net_stats.rx_packets = 0;
> + port->net_stats.tx_packets = 0;
> + port->net_stats.rx_bytes = 0;
> + port->net_stats.tx_bytes = 0;
> + port->net_stats.tx_errors = 0;
> + atomic64_set(&port->net_stats.rx_dropped, 0);
> + atomic64_set(&port->net_stats.tx_dropped, 0);
> +}
> +
> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + u32 mtu;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> + return 0;
> + }
> +
> + mtu = ndev->mtu;
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return mtu;
> +}
> +
> +static u32 hbl_en_get_pflags(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->pflags;
> +}
> +
> +static void hbl_en_set_dev_lpbk(struct hbl_aux_dev *aux_dev, u32 port_idx, bool enable)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> +
> + if (enable)
> + ndev->features |= NETIF_F_LOOPBACK;
> + else
> + ndev->features &= ~NETIF_F_LOOPBACK;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/dev-tools/sparse.rst?h=v6.10-rc3#n64
"
__must_hold - The specified lock is held on function entry and exit.
"
Should this function be annotated with "__must_hold" to document that
the specified lock is held on function entry and exit?
Zhu Yanjun
> +static int hbl_en_port_open_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (port->is_initialized)
> + return 0;
> +
> + if (!hdev->poll_enable)
> + netif_napi_add(ndev, &port->napi, hbl_en_napi_poll);
> +
> + rc = aux_ops->port_hw_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "Failed to configure the HW, rc %d\n", rc);
> + goto hw_init_fail;
> + }
> +
> + if (!hdev->poll_enable)
> + napi_enable(&port->napi);
> +
> + rc = hdev->asic_funcs.eth_port_open(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to init H/W, rc %d\n", rc);
> + goto port_open_fail;
> + }
> +
> + rc = aux_ops->update_mtu(aux_dev, port_idx, ndev->mtu);
> + if (rc) {
> + netdev_err(ndev, "MTU update failed, rc %d\n", rc);
> + goto update_mtu_fail;
> + }
> +
> + rc = aux_ops->phy_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "PHY init failed, rc %d\n", rc);
> + goto phy_init_fail;
> + }
> +
> + netif_start_queue(ndev);
> +
> + port->is_initialized = true;
> +
> + return 0;
> +
> +phy_init_fail:
> + /* no need to revert the MTU change, it will be updated on next port open */
> +update_mtu_fail:
> + hdev->asic_funcs.eth_port_close(port);
> +port_open_fail:
> + if (!hdev->poll_enable)
> + napi_disable(&port->napi);
> +
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +hw_init_fail:
> + if (!hdev->poll_enable)
> + netif_napi_del(&port->napi);
> +
> + return rc;
> +}
> +
> +static int hbl_en_port_open(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + rc = hbl_en_port_open_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return rc;
> +}
> +
> +static int hbl_en_open(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't open it\n");
> + return -EBUSY;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static void hbl_en_port_close_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (!port->is_initialized)
> + return;
> +
> + port->is_initialized = false;
> +
> + /* verify that the port is marked as closed before continuing */
> + mb();
> +
> + /* Print if not in hard reset flow e.g. from ip cmd */
> + if (!hdev->in_reset && netif_carrier_ok(port->ndev))
> + netdev_info(port->ndev, "port was closed\n");
> +
> + /* disable the PHY here so no link changes will occur from this point forward */
> + aux_ops->phy_fini(aux_dev, port_idx);
> +
> + /* disable Tx SW flow */
> + netif_carrier_off(port->ndev);
> + netif_tx_disable(port->ndev);
> +
> + /* stop Tx/Rx HW */
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +
> + /* disable Tx/Rx QPs */
> + hdev->asic_funcs.eth_port_close(port);
> +
> + /* stop Rx SW flow */
> + if (hdev->poll_enable) {
> + hbl_en_rx_poll_stop(port);
> + } else {
> + napi_disable(&port->napi);
> + netif_napi_del(&port->napi);
> + }
> +
> + /* Explicitly count the port close operations as we don't get a link event for this.
> + * Upon port open we receive a link event, hence no additional action required.
> + */
> + aux_ops->port_toggle_count(aux_dev, port_idx);
> +}
> +
> +static void hbl_en_port_close(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + hbl_en_port_close_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static int __hbl_en_port_reset_locked(struct hbl_en_port *port)
> +{
> + hbl_en_port_close_locked(port);
> +
> + return hbl_en_port_open_locked(port);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return __hbl_en_port_reset_locked(port);
> +}
> +
> +int hbl_en_port_reset(struct hbl_en_port *port)
> +{
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events be dropped before re-opening the port */
> + msleep(20);
> +
> + return hbl_en_port_open(port);
> +}
> +
> +static int hbl_en_close(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev = port->hdev;
> + ktime_t timeout;
> +
> + /* The return value of this function is not checked by the caller, so we can't just return
> + * -EBUSY if the port is under reset. We need to wait until the reset is finished and then
> + * close the port. Otherwise the netdev will mark the port as closed although port_close()
> + * wasn't called. Only if we waited long enough and the reset still hasn't finished may we
> + * return an error without actually closing the port, as it is a fatal flow anyway.
> + */
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + /* If this is called from unregister_netdev() then the port was already closed and
> + * hence we can safely return.
> + * We could have just checked the port_open boolean, but that might hide some future
> + * bugs. Hence it is better to use a dedicated flag for that.
> + */
> + if (READ_ONCE(hdev->in_teardown))
> + return 0;
> +
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(netdev,
> + "Timeout while waiting for port to finish reset, can't close it\n"
> + );
> + return -EBUSY;
> + }
> + }
> +
> + hbl_en_port_close(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +/**
> + * hbl_en_ports_stop_prepare() - stop the Rx and Tx and synchronize with other reset flows.
> + * @aux_dev: habanalabs auxiliary device structure.
> + *
> + * This function makes sure that during the reset no packets will be processed and that
> + * ndo_open/ndo_close do not open/close the ports.
> + * A hard reset might occur right after the driver was loaded, which means before the ports
> + * initialization was finished. Therefore, even if the ports are not yet open, we mark it as in
> + * reset in order to avoid races. We clear the in reset flag later on when reopening the ports.
> + */
> +static void hbl_en_ports_stop_prepare(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + ktime_t timeout;
> + int i;
> +
> + /* Check if the ports were initialized. If not, we shouldn't mark them as in reset because
> + * they will fail to get opened.
> + */
> + if (!hdev->is_initialized || hdev->in_reset)
> + return;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* This function is competing with reset from ethtool/ip, so try to take the
> + * in_reset atomic and if we are already in a middle of reset, wait until reset
> + * function is finished.
> + * Reset function is designed to always finish (could take up to a few seconds in
> + * worst case).
> + * We also mark closed ports as in reset so they won't be able to get opened while
> + * the device is under reset.
> + */
> +
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(port->ndev,
> + "Timeout while waiting for port %d to finish reset\n",
> + port->idx);
> + break;
> + }
> + }
> + }
> +
> + hdev->in_reset = true;
> +}
> +
> +static void hbl_en_ports_stop(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + if (netif_running(port->ndev))
> + hbl_en_port_close(port);
> + }
> +}
> +
> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> + return -EBUSY;
> + }
> +
> + if (netif_running(port->ndev)) {
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events be dropped before re-opening the port */
> + msleep(20);
> +
> + netdev->mtu = new_mtu;
> +
> + rc = hbl_en_port_open(port);
> + if (rc)
> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> + } else {
> + netdev->mtu = new_mtu;
> + }
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* Swap source and destination MAC addresses */
> +static inline void swap_l2(char *buf)
> +{
> + u16 *eth_hdr, tmp;
> +
> + eth_hdr = (u16 *)buf;
> + tmp = eth_hdr[0];
> + eth_hdr[0] = eth_hdr[3];
> + eth_hdr[3] = tmp;
> + tmp = eth_hdr[1];
> + eth_hdr[1] = eth_hdr[4];
> + eth_hdr[4] = tmp;
> + tmp = eth_hdr[2];
> + eth_hdr[2] = eth_hdr[5];
> + eth_hdr[5] = tmp;
> +}
> +
> +/* Swap source and destination IP addresses
> + */
> +static inline void swap_l3(char *buf)
> +{
> + u32 tmp;
> +
> + /* skip the Ethernet header and the IP header till source IP address */
> + buf += ETH_HLEN + 12;
> + tmp = ((u32 *)buf)[0];
> + ((u32 *)buf)[0] = ((u32 *)buf)[1];
> + ((u32 *)buf)[1] = tmp;
> +}
> +
> +static void do_tx_swap(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* First, let's print the SKB we got */
> + dev_dbg_ratelimited(hdev->dev,
> + "Send [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +
> + /* Before submitting it to the HW, in case this is an IPv4 pkt, swap the eth/IP addresses.
> + * That way, we may send ICMP (ping) to ourselves in LB cases.
> + */
> + swap_l2(skb->data);
> + if (swab16(tmp_buff[6]) == ETH_P_IP)
> + swap_l3(skb->data);
> +}
> +
> +static bool is_pkt_swap_enabled(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->is_eth_lpbk(aux_dev);
> +}
> +
> +static bool is_tx_disabled(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->get_mac_lpbk(aux_dev, port_idx) && !is_pkt_swap_enabled(hdev);
> +}
> +
> +static netdev_tx_t hbl_en_handle_tx(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + netdev_tx_t ret;
> +
> + if (skb->len <= 0 || is_tx_disabled(port))
> + goto free_skb;
> +
> + if (skb->len > hdev->max_frm_len) {
> + netdev_err(port->ndev, "Tx pkt size %uB exceeds maximum of %uB\n", skb->len,
> + hdev->max_frm_len);
> + goto free_skb;
> + }
> +
> + if (is_pkt_swap_enabled(hdev))
> + do_tx_swap(port, skb);
> +
> + /* Pad the Ethernet packets to the minimum frame size as the NIC HW doesn't do it.
> + * skb_put_padto() frees the packet on failure, so just increment the dropped counter and
> + * return as success to avoid a retry.
> + */
> + if (skb_put_padto(skb, hdev->pad_size)) {
> + dev_err_ratelimited(hdev->dev, "Padding failed, the skb is dropped\n");
> + atomic64_inc(&port->net_stats.tx_dropped);
> + return NETDEV_TX_OK;
> + }
> +
> + ret = hdev->asic_funcs.write_pkt_to_hw(port, skb);
> + if (ret == NETDEV_TX_OK) {
> + port->net_stats.tx_packets++;
> + port->net_stats.tx_bytes += skb->len;
> + }
> +
> + return ret;
> +
> +free_skb:
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> +}
> +
> +static netdev_tx_t hbl_en_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev;
> +
> + hdev = port->hdev;
> +
> + return hbl_en_handle_tx(port, skb);
> +}
> +
> +static int hbl_en_set_port_mac_loopback(struct hbl_en_port *port, bool enable)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + rc = aux_ops->set_mac_lpbk(aux_dev, port_idx, enable);
> + if (rc)
> + return rc;
> +
> + netdev_info(ndev, "port %u: mac loopback is %s\n", port_idx,
> + enable ? "enabled" : "disabled");
> +
> + if (netif_running(ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to reset port %u, rc %d\n", port_idx, rc);
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int hbl_en_set_features(struct net_device *netdev, netdev_features_t features)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + netdev_features_t changed;
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port %d is in reset, can't update settings", port->idx);
> + return -EBUSY;
> + }
> +
> + changed = netdev->features ^ features;
> +
> + if (changed & NETIF_F_LOOPBACK)
> + rc = hbl_en_set_port_mac_loopback(port, !!(features & NETIF_F_LOOPBACK));
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static void hbl_en_handle_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> +
> + port->net_stats.tx_errors++;
> + atomic64_inc(&port->net_stats.tx_dropped);
> +}
> +
> +static void hbl_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(dev);
> +
> + stats->rx_bytes = port->net_stats.rx_bytes;
> + stats->tx_bytes = port->net_stats.tx_bytes;
> + stats->rx_packets = port->net_stats.rx_packets;
> + stats->tx_packets = port->net_stats.tx_packets;
> + stats->tx_errors = port->net_stats.tx_errors;
> + stats->tx_dropped = (u64)atomic64_read(&port->net_stats.tx_dropped);
> + stats->rx_dropped = (u64)atomic64_read(&port->net_stats.rx_dropped);
> +}
> +
> +static const struct net_device_ops hbl_en_netdev_ops = {
> + .ndo_open = hbl_en_open,
> + .ndo_stop = hbl_en_close,
> + .ndo_start_xmit = hbl_en_start_xmit,
> + .ndo_validate_addr = eth_validate_addr,
> + .ndo_change_mtu = hbl_en_change_mtu,
> + .ndo_set_features = hbl_en_set_features,
> + .ndo_get_stats64 = hbl_en_get_stats64,
> + .ndo_tx_timeout = hbl_en_handle_tx_timeout,
> +};
> +
> +static void hbl_en_set_ops(struct net_device *ndev)
> +{
> + ndev->netdev_ops = &hbl_en_netdev_ops;
> + ndev->ethtool_ops = hbl_en_ethtool_get_ops(ndev);
> +#ifdef CONFIG_DCB
> + ndev->dcbnl_ops = &hbl_en_dcbnl_ops;
> +#endif
> +}
> +
> +static int hbl_en_port_register(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + struct hbl_en_port **ptr;
> + struct net_device *ndev;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + ndev = alloc_etherdev(sizeof(struct hbl_en_port *));
> + if (!ndev) {
> + dev_err(hdev->dev, "netdevice %d alloc failed\n", port_idx);
> + return -ENOMEM;
> + }
> +
> + port->ndev = ndev;
> + SET_NETDEV_DEV(ndev, &hdev->pdev->dev);
> + ptr = netdev_priv(ndev);
> + *ptr = port;
> +
> + /* necessary for creating multiple interfaces */
> + ndev->dev_port = port_idx;
> +
> + hbl_en_set_ops(ndev);
> +
> + ndev->watchdog_timeo = TX_TIMEOUT;
> + ndev->min_mtu = hdev->min_raw_mtu;
> + ndev->max_mtu = hdev->max_raw_mtu;
> +
> + /* Add loopback capability to the device. */
> + ndev->hw_features |= NETIF_F_LOOPBACK;
> +
> + /* If this port was set to loopback, set it also to the ndev features */
> + if (aux_ops->get_mac_lpbk(aux_dev, port_idx))
> + ndev->features |= NETIF_F_LOOPBACK;
> +
> + eth_hw_addr_set(ndev, port->mac_addr);
> +
> + /* It's more of an intelligent poll wherein we enable the Rx completion EQE event and then
> + * start the poll from there.
> + * Inside the polling thread, we read packets from hardware and then reschedule the poll
> + * only if there are more packets to be processed. Else we re-enable the CQ Arm interrupt
> + * and exit the poll.
> + */
> + if (hdev->poll_enable)
> + hbl_en_rx_poll_trigger_init(port);
> +
> + netif_carrier_off(ndev);
> +
> + rc = register_netdev(ndev);
> + if (rc) {
> + dev_err(hdev->dev, "Could not register netdevice %d\n", port_idx);
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + if (ndev) {
> + free_netdev(ndev);
> + port->ndev = NULL;
> + }
> +
> + return rc;
> +}
> +
> +static void dump_swap_pkt(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* The SKB is ready now (before stripping-out the L2), print its content */
> + dev_dbg_ratelimited(hdev->dev,
> + "Recv [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +}
> +
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + enum hbl_en_eth_pkt_status pkt_status;
> + struct net_device *ndev = port->ndev;
> + int rc, pkt_count = 0;
> + struct sk_buff *skb;
> + void *pkt_addr;
> + u32 pkt_size;
> +
> + if (!netif_carrier_ok(ndev))
> + return 0;
> +
> + while (pkt_count < budget) {
> + pkt_status = hdev->asic_funcs.read_pkt_from_hw(port, &pkt_addr, &pkt_size);
> +
> + if (pkt_status == ETH_PKT_NONE)
> + break;
> +
> + pkt_count++;
> +
> + if (pkt_status == ETH_PKT_DROP) {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + continue;
> + }
> +
> + if (hdev->poll_enable)
> + skb = __netdev_alloc_skb_ip_align(ndev, pkt_size, GFP_KERNEL);
> + else
> + skb = napi_alloc_skb(&port->napi, pkt_size);
> +
> + if (!skb) {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + break;
> + }
> +
> + skb_copy_to_linear_data(skb, pkt_addr, pkt_size);
> + skb_put(skb, pkt_size);
> +
> + if (is_pkt_swap_enabled(hdev))
> + dump_swap_pkt(port, skb);
> +
> + skb->protocol = eth_type_trans(skb, ndev);
> +
> + /* Zero the packet buffer memory to avoid a leak in case a wrong
> + * size is used when the next packet populates the same memory
> + */
> + memset(pkt_addr, 0, pkt_size);
> +
> + /* polling is done in thread context and hence BH should be disabled */
> + if (hdev->poll_enable)
> + local_bh_disable();
> +
> + rc = netif_receive_skb(skb);
> +
> + if (hdev->poll_enable)
> + local_bh_enable();
> +
> + if (rc == NET_RX_SUCCESS) {
> + port->net_stats.rx_packets++;
> + port->net_stats.rx_bytes += pkt_size;
> + } else {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + }
> + }
> +
> + return pkt_count;
> +}
> +
> +static bool __hbl_en_rx_poll_schedule(struct hbl_en_port *port, unsigned long delay)
> +{
> + return queue_delayed_work(port->rx_wq, &port->rx_poll_work, delay);
> +}
> +
> +static void hbl_en_rx_poll_work(struct work_struct *work)
> +{
> + struct hbl_en_port *port = container_of(work, struct hbl_en_port, rx_poll_work.work);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + pkt_count = hbl_en_handle_rx(port, NAPI_POLL_WEIGHT);
> +
> + /* Reschedule the poll if we have consumed the budget, which means we still have packets to
> + * process. Else re-enable the Rx IRQs and exit the work.
> + */
> + if (pkt_count < NAPI_POLL_WEIGHT)
> + hdev->asic_funcs.reenable_rx_irq(port);
> + else
> + __hbl_en_rx_poll_schedule(port, 0);
> +}
> +
> +/* Rx poll init and trigger routines are used in event-driven setups where
> + * Rx polling is initialized once during init or open and started/triggered by the event handler.
> + */
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port)
> +{
> + INIT_DELAYED_WORK(&port->rx_poll_work, hbl_en_rx_poll_work);
> +}
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port)
> +{
> + return __hbl_en_rx_poll_schedule(port, msecs_to_jiffies(1));
> +}
> +
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port)
> +{
> + cancel_delayed_work_sync(&port->rx_poll_work);
> +}
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget)
> +{
> + struct hbl_en_port *port = container_of(napi, struct hbl_en_port, napi);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + /* exit if we are called by netpoll as we free the Tx ring via EQ (if enabled) */
> + if (!budget)
> + return 0;
> +
> + pkt_count = hbl_en_handle_rx(port, budget);
> +
> + /* If budget not fully consumed, exit the polling mode */
> + if (pkt_count < budget) {
> + napi_complete_done(napi, pkt_count);
> + hdev->asic_funcs.reenable_rx_irq(port);
> + }
> +
> + return pkt_count;
> +}
> +
> +static void hbl_en_port_unregister(struct hbl_en_port *port)
> +{
> + struct net_device *ndev = port->ndev;
> +
> + unregister_netdev(ndev);
> + free_netdev(ndev);
> + port->ndev = NULL;
> +}
> +
> +static int hbl_en_set_asic_funcs(struct hbl_en_device *hdev)
> +{
> + switch (hdev->asic_type) {
> + case HBL_ASIC_GAUDI2:
> + default:
> + dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void hbl_en_handle_eqe(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_eqe *eqe)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + hdev->asic_funcs.handle_eqe(aux_dev, port, eqe);
> +}
> +
> +static void hbl_en_set_aux_ops(struct hbl_en_device *hdev, bool enable)
> +{
> + struct hbl_en_aux_ops *aux_ops = hdev->aux_dev->aux_ops;
> +
> + if (enable) {
> + aux_ops->ports_reopen = hbl_en_ports_reopen;
> + aux_ops->ports_stop_prepare = hbl_en_ports_stop_prepare;
> + aux_ops->ports_stop = hbl_en_ports_stop;
> + aux_ops->set_port_status = hbl_en_set_port_status;
> + aux_ops->is_port_open = hbl_en_is_port_open;
> + aux_ops->get_src_ip = hbl_en_get_src_ip;
> + aux_ops->reset_stats = hbl_en_reset_stats;
> + aux_ops->get_mtu = hbl_en_get_mtu;
> + aux_ops->get_pflags = hbl_en_get_pflags;
> + aux_ops->set_dev_lpbk = hbl_en_set_dev_lpbk;
> + aux_ops->handle_eqe = hbl_en_handle_eqe;
> + } else {
> + aux_ops->ports_reopen = NULL;
> + aux_ops->ports_stop_prepare = NULL;
> + aux_ops->ports_stop = NULL;
> + aux_ops->set_port_status = NULL;
> + aux_ops->is_port_open = NULL;
> + aux_ops->get_src_ip = NULL;
> + aux_ops->reset_stats = NULL;
> + aux_ops->get_mtu = NULL;
> + aux_ops->get_pflags = NULL;
> + aux_ops->set_dev_lpbk = NULL;
> + aux_ops->handle_eqe = NULL;
> + }
> +}
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int rc, i, port_cnt = 0;
> +
> + /* must be called before the call to dev_init() */
> + rc = hbl_en_set_asic_funcs(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "failed to set aux ops\n");
> + return rc;
> + }
> +
> + rc = asic_funcs->dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "device init failed\n");
> + return rc;
> + }
> +
> + /* init the function pointers here before calling hbl_en_port_register which sets up
> + * net_device_ops, and its ops might start getting called.
> + * If any failure is encountered, these will be made NULL and the core driver won't call
> + * them.
> + */
> + hbl_en_set_aux_ops(hdev, true);
> +
> + /* Port register depends on the above initialization so it must be called here and not
> + * before that.
> + */
> + for (i = 0; i < hdev->max_num_of_ports; i++, port_cnt++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + rc = hbl_en_port_init(port);
> + if (rc) {
> + dev_err(hdev->dev, "port init failed\n");
> + goto unregister_ports;
> + }
> +
> + rc = hbl_en_port_register(port);
> + if (rc) {
> + dev_err(hdev->dev, "port register failed\n");
> +
> + hbl_en_port_fini(port);
> + goto unregister_ports;
> + }
> + }
> +
> + hdev->is_initialized = true;
> +
> + return 0;
> +
> +unregister_ports:
> + for (i = 0; i < port_cnt; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +
> + return rc;
> +}
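
The masked-port iteration with partial unwind used above is worth spelling out; here is a minimal userspace sketch of the same flow (all names are toy stand-ins, not the driver's API). On failure, only the ports that were actually brought up are torn down, and masked-out bits are skipped in both directions:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 8

/* Toy stand-ins for the driver's per-port init/fini (hypothetical). */
static bool initialized[MAX_PORTS];

static int port_init(int i, int fail_at)
{
	if (i == fail_at)
		return -1;	/* simulate an hbl_en_port_init() failure */
	initialized[i] = true;
	return 0;
}

static void port_fini(int i)
{
	initialized[i] = false;
}

/* Mirrors the loop in hbl_en_dev_init(): skip ports outside the mask and,
 * on failure, unwind only the ports that were actually brought up.
 */
static int dev_init(unsigned long ports_mask, int fail_at)
{
	int i, port_cnt = 0;

	for (i = 0; i < MAX_PORTS; i++, port_cnt++) {
		if (!(ports_mask & (1UL << i)))
			continue;
		if (port_init(i, fail_at))
			goto unwind;
	}
	return 0;

unwind:
	for (i = 0; i < port_cnt; i++) {
		if (!(ports_mask & (1UL << i)))
			continue;
		port_fini(i);
	}
	return -1;
}
```

For example, with mask 0b1011 and a failure at port 3, ports 0 and 1 are unwound and port 3 (which never initialized) is left alone.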
> +
> +void hbl_en_dev_fini(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int i;
> +
> + hdev->in_teardown = true;
> +
> + if (!hdev->is_initialized)
> + return;
> +
> + hdev->is_initialized = false;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be this cleanup flow is called after a failed init flow.
> + * Hence we need to check that we indeed have a netdev to unregister.
> + */
> + if (!port->ndev)
> + continue;
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +}
> +
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len)
> +{
> + dma_addr_t dma_addr;
> +
> + if (hdev->dma_map_support)
> + dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, DMA_TO_DEVICE);
> + else
> + dma_addr = virt_to_phys(addr);
> +
> + return dma_addr;
> +}
> +
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len)
> +{
> + if (hdev->dma_map_support)
> + dma_unmap_single(&hdev->pdev->dev, dma_addr, len, DMA_TO_DEVICE);
> +}
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> new file mode 100644
> index 000000000000..15504c1f3cfb
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#ifndef HABANALABS_EN_H_
> +#define HABANALABS_EN_H_
> +
> +#include <linux/net/intel/cn.h>
> +
> +#include <linux/netdevice.h>
> +#include <linux/pci.h>
> +
> +#define HBL_EN_NAME "habanalabs_en"
> +
> +#define HBL_EN_PORT(aux_dev, idx) (&(((struct hbl_en_device *)(aux_dev)->priv)->ports[(idx)]))
> +
> +#define hbl_netdev_priv(ndev) \
> +({ \
> + typecheck(struct net_device *, ndev); \
> + *(struct hbl_en_port **)netdev_priv(ndev); \
> +})
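
The double-pointer cast in hbl_netdev_priv() exists because the ports live in an external array (hdev->ports[]), so the netdev private area stores only a pointer to the port rather than embedding the structure. A userspace sketch of the same layout (hypothetical types, not the kernel's net_device):

```c
#include <assert.h>
#include <stdalign.h>

struct port {
	int idx;
};

struct netdev {
	/* stand-in for the area returned by netdev_priv() */
	alignas(struct port *) char priv[sizeof(struct port *)];
};

static void *netdev_priv(struct netdev *ndev)
{
	return ndev->priv;
}

/* registration side: store the pointer, not the structure itself */
static void bind_port(struct netdev *ndev, struct port *p)
{
	*(struct port **)netdev_priv(ndev) = p;
}

/* equivalent of the hbl_netdev_priv() macro: read the pointer back */
static struct port *priv_port(struct netdev *ndev)
{
	return *(struct port **)netdev_priv(ndev);
}
```

The benefit is that multiple netdevs can point into one contiguous, device-owned array of port structures with independent lifetimes.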
> +
> +/**
> + * enum hbl_en_eth_pkt_status - status of Rx Ethernet packet.
> + * @ETH_PKT_OK: packet was received successfully.
> + * @ETH_PKT_DROP: packet should be dropped.
> + * @ETH_PKT_NONE: no available packet.
> + */
> +enum hbl_en_eth_pkt_status {
> + ETH_PKT_OK,
> + ETH_PKT_DROP,
> + ETH_PKT_NONE
> +};
> +
> +/**
> + * struct hbl_en_net_stats - stats of Ethernet interface.
> + * @rx_packets: number of packets received.
> + * @tx_packets: number of packets sent.
> + * @rx_bytes: total bytes of data received.
> + * @tx_bytes: total bytes of data sent.
> + * @tx_errors: number of errors in the TX.
> + * @rx_dropped: number of packets dropped by the RX.
> + * @tx_dropped: number of packets dropped by the TX.
> + */
> +struct hbl_en_net_stats {
> + u64 rx_packets;
> + u64 tx_packets;
> + u64 rx_bytes;
> + u64 tx_bytes;
> + u64 tx_errors;
> + atomic64_t rx_dropped;
> + atomic64_t tx_dropped;
> +};
> +
> +/**
> + * struct hbl_en_port - manage port common structure.
> + * @hdev: habanalabs Ethernet device structure.
> + * @ndev: network device.
> + * @rx_wq: WQ for Rx poll when we cannot schedule NAPI poll.
> + * @mac_addr: HW MAC addresses.
> + * @asic_specific: ASIC specific port structure.
> + * @napi: New API structure.
> + * @rx_poll_work: Rx work for polling mode.
> + * @net_stats: statistics of the ethernet interface.
> + * @in_reset: true if the NIC was marked as in reset, false otherwise. Used to avoid an additional
> + * stopping of the NIC if a hard reset was re-initiated.
> + * @pflags: ethtool private flags bit mask.
> + * @idx: index of this specific port.
> + * @rx_max_coalesced_frames: Maximum number of packets to receive before an RX interrupt.
> + * @tx_max_coalesced_frames: Maximum number of packets to be sent before a TX interrupt.
> + * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after a packet arrives.
> + * @is_initialized: true if the port H/W is initialized, false otherwise.
> + * @pfc_enable: true if this port supports Priority Flow Control, false otherwise.
> + * @auto_neg_enable: is autoneg enabled.
> + * @auto_neg_resolved: was autoneg phase finished successfully.
> + */
> +struct hbl_en_port {
> + struct hbl_en_device *hdev;
> + struct net_device *ndev;
> + struct workqueue_struct *rx_wq;
> + char *mac_addr;
> + void *asic_specific;
> + struct napi_struct napi;
> + struct delayed_work rx_poll_work;
> + struct hbl_en_net_stats net_stats;
> + atomic_t in_reset;
> + u32 pflags;
> + u32 idx;
> + u32 rx_max_coalesced_frames;
> + u32 tx_max_coalesced_frames;
> + u16 rx_coalesce_usecs;
> + u8 is_initialized;
> + u8 pfc_enable;
> + u8 auto_neg_enable;
> + u8 auto_neg_resolved;
> +};
> +
> +/**
> + * struct hbl_en_asic_funcs - ASIC specific Ethernet functions.
> + * @dev_init: device init.
> + * @dev_fini: device cleanup.
> + * @reenable_rx_irq: re-enable Rx interrupts.
> + * @eth_port_open: initialize and open the Ethernet port.
> + * @eth_port_close: close the Ethernet port.
> + * @write_pkt_to_hw: write skb to HW.
> + * @read_pkt_from_hw: read pkt from HW.
> + * @get_pfc_cnts: get PFC counters.
> + * @set_coalesce: set Tx/Rx coalesce config in HW.
> + * @get_rx_ring_size: get max number of elements the Rx ring can contain.
> + * @handle_eqe: Handle a received event.
> + */
> +struct hbl_en_asic_funcs {
> + int (*dev_init)(struct hbl_en_device *hdev);
> + void (*dev_fini)(struct hbl_en_device *hdev);
> + void (*reenable_rx_irq)(struct hbl_en_port *port);
> + int (*eth_port_open)(struct hbl_en_port *port);
> + void (*eth_port_close)(struct hbl_en_port *port);
> + netdev_tx_t (*write_pkt_to_hw)(struct hbl_en_port *port, struct sk_buff *skb);
> + int (*read_pkt_from_hw)(struct hbl_en_port *port, void **pkt_addr, u32 *pkt_size);
> + void (*get_pfc_cnts)(struct hbl_en_port *port, void *ptr);
> + int (*set_coalesce)(struct hbl_en_port *port);
> + int (*get_rx_ring_size)(struct hbl_en_port *port);
> + void (*handle_eqe)(struct hbl_aux_dev *aux_dev, u32 port_idx, struct hbl_cn_eqe *eqe);
> +};
> +
> +/**
> + * struct hbl_en_device - habanalabs Ethernet device structure.
> + * @pdev: pointer to PCI device.
> + * @dev: related kernel basic device structure.
> + * @ports: array of common per-port management structures.
> + * @aux_dev: pointer to auxiliary device.
> + * @asic_specific: ASIC specific device structure.
> + * @fw_ver: FW version.
> + * @qsfp_eeprom: QSFP EEPROM info.
> + * @mac_addr: array of all MAC addresses.
> + * @asic_funcs: ASIC specific Ethernet functions.
> + * @asic_type: ASIC specific type.
> + * @ports_mask: mask of available ports.
> + * @auto_neg_mask: mask of ports with autonegotiation enabled.
> + * @port_reset_timeout: max time in seconds for a port reset flow to finish.
> + * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
> + * @max_frm_len: maximum allowed frame length.
> + * @raw_elem_size: size of element in raw buffers.
> + * @max_raw_mtu: maximum MTU size for raw packets.
> + * @min_raw_mtu: minimum MTU size for raw packets.
> + * @pad_size: the pad size in bytes for the skb to transmit.
> + * @core_dev_id: core device ID.
> + * @max_num_of_ports: max number of available ports.
> + * @in_reset: is the entire NIC currently under reset.
> + * @poll_enable: Enable Rx polling rather than IRQ + NAPI.
> + * @in_teardown: true if the NIC is in teardown (during device remove).
> + * @is_initialized: was the device initialized successfully.
> + * @has_eq: true if event queue is supported.
> + * @dma_map_support: HW supports DMA mapping.
> + */
> +struct hbl_en_device {
> + struct pci_dev *pdev;
> + struct device *dev;
> + struct hbl_en_port *ports;
> + struct hbl_aux_dev *aux_dev;
> + void *asic_specific;
> + char *fw_ver;
> + char *qsfp_eeprom;
> + char *mac_addr;
> + struct hbl_en_asic_funcs asic_funcs;
> + enum hbl_cn_asic_type asic_type;
> + u64 ports_mask;
> + u64 auto_neg_mask;
> + u32 port_reset_timeout;
> + u32 pending_reset_long_timeout;
> + u32 max_frm_len;
> + u32 raw_elem_size;
> + u16 max_raw_mtu;
> + u16 min_raw_mtu;
> + u16 pad_size;
> + u16 core_dev_id;
> + u8 max_num_of_ports;
> + u8 in_reset;
> + u8 poll_enable;
> + u8 in_teardown;
> + u8 is_initialized;
> + u8 has_eq;
> + u8 dma_map_support;
> +};
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev);
> +void hbl_en_dev_fini(struct hbl_en_device *hdev);
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev);
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port);
> +
> +extern const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops;
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port);
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port);
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port);
> +int hbl_en_port_reset(struct hbl_en_port *port);
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx);
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget);
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len);
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len);
> +
> +#endif /* HABANALABS_EN_H_ */
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> new file mode 100644
> index 000000000000..5d718579a2b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +
> +#define PFC_PRIO_MASK_ALL GENMASK(HBL_EN_PFC_PRIO_NUM - 1, 0)
> +#define PFC_PRIO_MASK_NONE 0
> +
> +#ifdef CONFIG_DCB
> +static int hbl_en_dcbnl_ieee_getpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> +
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't get PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + pfc->pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> + pfc->pfc_cap = HBL_EN_PFC_PRIO_NUM;
> +
> + hdev->asic_funcs.get_pfc_cnts(port, pfc);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +static int hbl_en_dcbnl_ieee_setpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u8 curr_pfc_en;
> + u32 port_idx;
> + int rc = 0;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev, "PFC supports %d priorities only, port %d\n",
> + HBL_EN_PFC_PRIO_NUM, port_idx);
> + return -EINVAL;
> + }
> +
> + if (pfc->pfc_en != PFC_PRIO_MASK_NONE && pfc->pfc_en != PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev,
> + "PFC should be enabled/disabled on all priorities, port %d\n",
> + port_idx);
> + return -EINVAL;
> + }
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't set PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + curr_pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> +
> + if (pfc->pfc_en == curr_pfc_en)
> + goto out;
> +
> + port->pfc_enable = !port->pfc_enable;
> +
> + rc = aux_ops->set_pfc(aux_dev, port_idx, port->pfc_enable);
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
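
The two validation checks in hbl_en_dcbnl_ieee_setpfc() implement an all-or-nothing PFC policy: reject unknown priority bits first, then reject partial masks. A compact userspace sketch of the same checks (the priority count here is an assumed value standing in for HBL_EN_PFC_PRIO_NUM):

```c
#include <assert.h>

#define PFC_PRIO_NUM	4	/* assumed stand-in for HBL_EN_PFC_PRIO_NUM */
#define PRIO_MASK_ALL	((1U << PFC_PRIO_NUM) - 1)	/* like GENMASK(n - 1, 0) */
#define PRIO_MASK_NONE	0U

/* Mirrors the checks in hbl_en_dcbnl_ieee_setpfc(): reject out-of-range
 * priority bits, then reject partial masks (all-or-nothing policy).
 */
static int validate_pfc_en(unsigned int pfc_en)
{
	if (pfc_en & ~PRIO_MASK_ALL)
		return -1;	/* priority out of range */
	if (pfc_en != PRIO_MASK_NONE && pfc_en != PRIO_MASK_ALL)
		return -1;	/* partial enable not supported */
	return 0;
}
```

Only the empty mask and the full mask pass; anything in between, or any bit above the supported priorities, is rejected before the port state is touched.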
> +
> +static u8 hbl_en_dcbnl_getdcbx(struct net_device *netdev)
> +{
> + return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
> +}
> +
> +static u8 hbl_en_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
> +{
> + return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
> +}
> +
> +const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops = {
> + .ieee_getpfc = hbl_en_dcbnl_ieee_getpfc,
> + .ieee_setpfc = hbl_en_dcbnl_ieee_setpfc,
> + .getdcbx = hbl_en_dcbnl_getdcbx,
> + .setdcbx = hbl_en_dcbnl_setdcbx
> +};
> +#endif
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> new file mode 100644
> index 000000000000..23a87d36ded5
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#define pr_fmt(fmt) "habanalabs_en: " fmt
> +
> +#include "hbl_en.h"
> +
> +#include <linux/module.h>
> +#include <linux/auxiliary_bus.h>
> +
> +#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
> +
> +#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Ethernet driver"
> +
> +MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(HBL_DRIVER_DESC);
> +MODULE_LICENSE("GPL");
> +
> +static bool poll_enable;
> +
> +module_param(poll_enable, bool, 0444);
> +MODULE_PARM_DESC(poll_enable,
> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
> +
> +static int hdev_init(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_aux_data *aux_data = aux_dev->aux_data;
> + struct hbl_en_port *ports, *port;
> + struct hbl_en_device *hdev;
> + int rc, i;
> +
> + hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
> + if (!hdev)
> + return -ENOMEM;
> +
> + ports = kcalloc(aux_data->max_num_of_ports, sizeof(*ports), GFP_KERNEL);
> + if (!ports) {
> + rc = -ENOMEM;
> + goto ports_alloc_fail;
> + }
> +
> + aux_dev->priv = hdev;
> + hdev->aux_dev = aux_dev;
> + hdev->ports = ports;
> + hdev->pdev = aux_data->pdev;
> + hdev->dev = aux_data->dev;
> + hdev->ports_mask = aux_data->ports_mask;
> + hdev->auto_neg_mask = aux_data->auto_neg_mask;
> + hdev->max_num_of_ports = aux_data->max_num_of_ports;
> + hdev->core_dev_id = aux_data->id;
> + hdev->fw_ver = aux_data->fw_ver;
> + hdev->qsfp_eeprom = aux_data->qsfp_eeprom;
> + hdev->asic_type = aux_data->asic_type;
> + hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
> + hdev->max_frm_len = aux_data->max_frm_len;
> + hdev->raw_elem_size = aux_data->raw_elem_size;
> + hdev->max_raw_mtu = aux_data->max_raw_mtu;
> + hdev->min_raw_mtu = aux_data->min_raw_mtu;
> + hdev->pad_size = ETH_ZLEN;
> + hdev->has_eq = aux_data->has_eq;
> + hdev->dma_map_support = true;
> + hdev->poll_enable = poll_enable;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> + port->hdev = hdev;
> + port->idx = i;
> + port->pfc_enable = true;
> + port->pflags = PFLAGS_PCS_LINK_CHECK | PFLAGS_PHY_AUTO_NEG_LPBK;
> + port->mac_addr = aux_data->mac_addr[i];
> + port->auto_neg_enable = !!(aux_data->auto_neg_mask & BIT(i));
> + }
> +
> + return 0;
> +
> +ports_alloc_fail:
> + kfree(hdev);
> +
> + return rc;
> +}
> +
> +static void hdev_fini(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + kfree(hdev->ports);
> + kfree(hdev);
> + aux_dev->priv = NULL;
> +}
> +
> +static const struct auxiliary_device_id hbl_en_id_table[] = {
> + { .name = "habanalabs_cn.en", },
> + {},
> +};
> +
> +MODULE_DEVICE_TABLE(auxiliary, hbl_en_id_table);
> +
> +static int hbl_en_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
> + struct hbl_en_device *hdev;
> + ktime_t timeout;
> + int rc;
> +
> + rc = hdev_init(aux_dev);
> + if (rc) {
> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> + return -EIO;
> + }
> +
> + hdev = aux_dev->priv;
> +
> + /* don't allow module unloading while it is attached */
> + if (!try_module_get(THIS_MODULE)) {
> + dev_err(hdev->dev, "Failed to increment %s module refcount\n", HBL_EN_NAME);
> + rc = -EIO;
> + goto module_get_err;
> + }
> +
> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> + while (1) {
> + aux_ops->hw_access_lock(aux_dev);
> +
> + /* if the device is operational, proceed to actual init while holding the lock in
> + * order to prevent concurrent hard reset
> + */
> + if (aux_ops->device_operational(aux_dev))
> + break;
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> + rc = -EBUSY;
> + goto timeout_err;
> + }
> +
> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing en\n");
> +
> + msleep_interruptible(MSEC_PER_SEC);
> + }
> +
> + rc = hbl_en_dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "Failed to init en device\n");
> + goto dev_init_err;
> + }
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + return 0;
> +
> +dev_init_err:
> + aux_ops->hw_access_unlock(aux_dev);
> +timeout_err:
> + module_put(THIS_MODULE);
> +module_get_err:
> + hdev_fini(aux_dev);
> +
> + return rc;
> +}
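
The probe's wait loop is a deadline-bounded poll: check a readiness predicate, sleep, and give up once the deadline passes. A userspace analogue under stated assumptions (clock_gettime/nanosleep stand in for the kernel's ktime_*() and msleep_interruptible(); the lock acquisition around the check is omitted):

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Poll a readiness predicate under a deadline, sleeping between attempts;
 * analogous to the device_operational() wait loop in hbl_en_probe().
 */
static int wait_operational(int (*operational)(void *), void *ctx,
			    long timeout_ms, long poll_ms)
{
	long deadline = now_ms() + timeout_ms;
	struct timespec ts = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };

	for (;;) {
		if (operational(ctx))
			return 0;
		if (now_ms() > deadline)
			return -1;	/* analogous to the -EBUSY timeout path */
		nanosleep(&ts, NULL);
	}
}

/* sample predicate: reports ready once the countdown reaches zero */
static int countdown;

static int countdown_ready(void *unused)
{
	(void)unused;
	return --countdown <= 0;
}
```

Note the driver checks the predicate while holding the hardware access lock and keeps holding it into the init path, so that a concurrent hard reset cannot start between the check and the actual device init.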
> +
> +/* This function can be called only from the CN driver when deleting the aux bus, because we
> + * incremented the module refcount on probing. Hence no need to protect here from hard reset.
> + */
> +static void hbl_en_remove(struct auxiliary_device *adev)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + if (!hdev)
> + return;
> +
> + hbl_en_dev_fini(hdev);
> +
> + /* allow module unloading as now it is detached */
> + module_put(THIS_MODULE);
> +
> + hdev_fini(aux_dev);
> +}
> +
> +static struct auxiliary_driver hbl_en_driver = {
> + .name = "eth",
> + .probe = hbl_en_probe,
> + .remove = hbl_en_remove,
> + .id_table = hbl_en_id_table,
> +};
> +
> +static int __init hbl_en_init(void)
> +{
> + pr_info("loading driver\n");
> +
> + return auxiliary_driver_register(&hbl_en_driver);
> +}
> +
> +static void __exit hbl_en_exit(void)
> +{
> + auxiliary_driver_unregister(&hbl_en_driver);
> +
> + pr_info("driver removed\n");
> +}
> +
> +module_init(hbl_en_init);
> +module_exit(hbl_en_exit);
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> new file mode 100644
> index 000000000000..1d14d283409b
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> @@ -0,0 +1,452 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/ethtool.h>
> +
> +#define RX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MAX 10
> +
> +static const char pflags_str[][ETH_GSTRING_LEN] = {
> + "pcs-link-check",
> + "phy-auto-neg-lpbk",
> +};
> +
> +#define NIC_STAT(m) {#m, offsetof(struct hbl_en_port, net_stats.m)}
> +
> +static struct hbl_cn_stat netdev_eth_stats[] = {
> + NIC_STAT(rx_packets),
> + NIC_STAT(tx_packets),
> + NIC_STAT(rx_bytes),
> + NIC_STAT(tx_bytes),
> + NIC_STAT(tx_errors),
> + NIC_STAT(rx_dropped),
> + NIC_STAT(tx_dropped)
> +};
> +
> +static size_t pflags_str_len = ARRAY_SIZE(pflags_str);
> +static size_t netdev_eth_stats_len = ARRAY_SIZE(netdev_eth_stats);
> +
> +static void hbl_en_ethtool_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *drvinfo)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> +
> + strscpy(drvinfo->driver, HBL_EN_NAME, sizeof(drvinfo->driver));
> + strscpy(drvinfo->fw_version, hdev->fw_ver, sizeof(drvinfo->fw_version));
> + strscpy(drvinfo->bus_info, pci_name(hdev->pdev), sizeof(drvinfo->bus_info));
> +}
> +
> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
> +{
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->type = ETH_MODULE_SFF_8636;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
> + u8 *data)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 first, last, len;
> + u8 *qsfp_eeprom;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + qsfp_eeprom = hdev->qsfp_eeprom;
> +
> + if (ee->len == 0)
> + return -EINVAL;
> +
> + first = ee->offset;
> + last = ee->offset + ee->len;
> +
> + if (first < ETH_MODULE_SFF_8636_LEN) {
> + len = min_t(unsigned int, last, ETH_MODULE_SFF_8636_LEN);
> + len -= first;
> +
> + memcpy(data, qsfp_eeprom + first, len);
> + }
> +
> + return 0;
> +}
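
The EEPROM read above clamps the requested [offset, offset + len) window to the cached image and copies only the in-range part. A userspace sketch of that clamping (the length constant is a stand-in; ETH_MODULE_SFF_8636_LEN is 256 in the kernel headers):

```c
#include <assert.h>
#include <string.h>

#define MODULE_EEPROM_LEN 256	/* stand-in for ETH_MODULE_SFF_8636_LEN */

/* Clamp [offset, offset + len) to the cached EEPROM image and copy only
 * the in-range bytes, as hbl_en_ethtool_get_module_eeprom() does.
 * Returns the number of bytes copied.
 */
static unsigned int eeprom_read(const unsigned char *eeprom, unsigned int offset,
				unsigned int len, unsigned char *out)
{
	unsigned int first = offset, last = offset + len, n;

	if (first >= MODULE_EEPROM_LEN)
		return 0;

	n = (last < MODULE_EEPROM_LEN ? last : MODULE_EEPROM_LEN) - first;
	memcpy(out, eeprom + first, n);
	return n;
}
```

A request of 10 bytes at offset 250 thus yields only the 6 bytes that exist, and a fully out-of-range offset copies nothing.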
> +
> +static u32 hbl_en_ethtool_get_priv_flags(struct net_device *ndev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + return port->pflags;
> +}
> +
> +static int hbl_en_ethtool_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + port->pflags = priv_flags;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
> + struct ethtool_link_ksettings *cmd)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 port_idx, speed;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + speed = aux_ops->get_speed(aux_dev, port_idx);
> +
> + cmd->base.speed = speed;
> + cmd->base.duplex = DUPLEX_FULL;
> +
> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> + switch (speed) {
> + case SPEED_100000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> + cmd->base.port = PORT_FIBRE;
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
> + break;
> + case SPEED_50000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
> + break;
> + case SPEED_25000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
> + break;
> + case SPEED_200000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
> + break;
> + case SPEED_400000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
> + break;
> + default:
> + netdev_err(port->ndev, "unknown speed %d\n", speed);
> + return -EINVAL;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
> +
> + if (port->auto_neg_enable) {
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
> + cmd->base.autoneg = AUTONEG_ENABLE;
> + if (port->auto_neg_resolved)
> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> + } else {
> + cmd->base.autoneg = AUTONEG_DISABLE;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
> +
> + if (port->pfc_enable)
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
> +
> + return 0;
> +}
> +
> +/* only autoneg is mutable */
> +static bool check_immutable_ksettings(const struct ethtool_link_ksettings *old_cmd,
> + const struct ethtool_link_ksettings *new_cmd)
> +{
> + return (old_cmd->base.speed == new_cmd->base.speed) &&
> + (old_cmd->base.duplex == new_cmd->base.duplex) &&
> + (old_cmd->base.port == new_cmd->base.port) &&
> + (old_cmd->base.phy_address == new_cmd->base.phy_address) &&
> + (old_cmd->base.eth_tp_mdix_ctrl == new_cmd->base.eth_tp_mdix_ctrl) &&
> + bitmap_equal(old_cmd->link_modes.advertising, new_cmd->link_modes.advertising,
> + __ETHTOOL_LINK_MODE_MASK_NBITS);
> +}
> +
> +static int
> +hbl_en_ethtool_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd)
> +{
> + struct ethtool_link_ksettings curr_cmd;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + bool auto_neg;
> + u32 port_idx;
> + int rc;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + memset(&curr_cmd, 0, sizeof(curr_cmd));
> +
> + rc = hbl_en_ethtool_get_link_ksettings(ndev, &curr_cmd);
> + if (rc)
> + return rc;
> +
> + if (!check_immutable_ksettings(&curr_cmd, cmd))
> + return -EOPNOTSUPP;
> +
> + auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
> +
> + if (port->auto_neg_enable == auto_neg)
> + return 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
> + rc = -EOPNOTSUPP;
> + goto out;
> + }
> +
> + port->auto_neg_enable = auto_neg;
> +
> + if (netif_running(port->ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc)
> + netdev_err(port->ndev, "Failed to reset port for settings update, rc %d\n",
> + rc);
> + }
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static int hbl_en_ethtool_get_sset_count(struct net_device *ndev, int sset)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (sset) {
> + case ETH_SS_STATS:
> + return netdev_eth_stats_len + aux_ops->get_cnts_num(aux_dev, port_idx);
> + case ETH_SS_PRIV_FLAGS:
> + return pflags_str_len;
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int i;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (stringset) {
> + case ETH_SS_STATS:
> + for (i = 0; i < netdev_eth_stats_len; i++)
> + ethtool_puts(&data, netdev_eth_stats[i].str);
> +
> + aux_ops->get_cnts_names(aux_dev, port_idx, data);
> + break;
> + case ETH_SS_PRIV_FLAGS:
> + for (i = 0; i < pflags_str_len; i++)
> + ethtool_puts(&data, pflags_str[i]);
> + break;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_ethtool_stats(struct net_device *ndev,
> + __always_unused struct ethtool_stats *stats, u64 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> + char *p;
> + int i;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_info_ratelimited(hdev->dev, "port %d is in reset, can't get ethtool stats\n",
> + port_idx);
> + return;
> + }
> +
> + /* Even though the Ethernet Rx/Tx flow might update the stats in parallel, there is no
> + * absolute need for synchronization. Missing a few counts in these stats is much better
> + * than adding a lock that increases the overhead of the Rx/Tx flows. In the worst case
> + * the reader gets stale stats and will receive the updated values on the next read.
> + */
> + for (i = 0; i < netdev_eth_stats_len; i++) {
> + p = (char *)port + netdev_eth_stats[i].lo_offset;
> + data[i] = *(u64 *)p;
> + }
> +
> + data += i;
> +
> + aux_ops->get_cnts_values(aux_dev, port_idx, data);
> +
> + atomic_set(&port->in_reset, 0);
> +}
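
The stats gathering above is table-driven: the NIC_STAT() macro records each field's byte offset with offsetof(), and the ethtool callback walks the table instead of hand-copying fields. A userspace sketch of the same technique (hypothetical structures; note the 64-bit load width, since these counters are u64 and a 32-bit read would truncate):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct net_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct port {
	int idx;
	struct net_stats net_stats;
};

/* Mirror of the NIC_STAT() table: a name plus a byte offset into the
 * per-port structure.
 */
struct stat_desc {
	const char *str;
	size_t offset;
};

#define STAT(m) { #m, offsetof(struct port, net_stats.m) }

static const struct stat_desc stats[] = {
	STAT(rx_packets),
	STAT(tx_packets),
};

static void gather(const struct port *p, uint64_t *data)
{
	/* read each counter at full 64-bit width via its recorded offset */
	for (size_t i = 0; i < sizeof(stats) / sizeof(stats[0]); i++)
		data[i] = *(const uint64_t *)((const char *)p + stats[i].offset);
}
```

Adding a counter then means adding one table entry; the strings callback and the values callback stay in sync automatically.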
> +
> +static int hbl_en_ethtool_get_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + coal->tx_max_coalesced_frames = port->tx_max_coalesced_frames;
> + coal->rx_coalesce_usecs = port->rx_coalesce_usecs;
> + coal->rx_max_coalesced_frames = port->rx_max_coalesced_frames;
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_set_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc, rx_ring_size;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (coal->tx_max_coalesced_frames < TX_COALESCED_FRAMES_MIN ||
> + coal->tx_max_coalesced_frames > TX_COALESCED_FRAMES_MAX) {
> + netdev_err(ndev, "tx max_coalesced_frames should be between %d and %d\n",
> + TX_COALESCED_FRAMES_MIN, TX_COALESCED_FRAMES_MAX);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + rx_ring_size = hdev->asic_funcs.get_rx_ring_size(port);
> + if (coal->rx_max_coalesced_frames < RX_COALESCED_FRAMES_MIN ||
> + coal->rx_max_coalesced_frames >= rx_ring_size) {
> + netdev_err(ndev, "rx max_coalesced_frames should be between %d and %d\n",
> + RX_COALESCED_FRAMES_MIN, rx_ring_size);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + port->tx_max_coalesced_frames = coal->tx_max_coalesced_frames;
> + port->rx_coalesce_usecs = coal->rx_coalesce_usecs;
> + port->rx_max_coalesced_frames = coal->rx_max_coalesced_frames;
> +
> + rc = hdev->asic_funcs.set_coalesce(port);
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> +atomic_out:
> + atomic_set(&port->in_reset, 0);
> + return rc;
> +}
> +
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port)
> +{
> + port->rx_coalesce_usecs = CQ_ARM_TIMEOUT_USEC;
> + port->rx_max_coalesced_frames = 1;
> + port->tx_max_coalesced_frames = 1;
> +}
> +
> +static const struct ethtool_ops hbl_en_ethtool_ops_coalesce = {
> + .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES |
> + ETHTOOL_COALESCE_TX_MAX_FRAMES,
> + .get_drvinfo = hbl_en_ethtool_get_drvinfo,
> + .get_link = ethtool_op_get_link,
> + .get_module_info = hbl_en_ethtool_get_module_info,
> + .get_module_eeprom = hbl_en_ethtool_get_module_eeprom,
> + .get_priv_flags = hbl_en_ethtool_get_priv_flags,
> + .set_priv_flags = hbl_en_ethtool_set_priv_flags,
> + .get_link_ksettings = hbl_en_ethtool_get_link_ksettings,
> + .set_link_ksettings = hbl_en_ethtool_set_link_ksettings,
> + .get_sset_count = hbl_en_ethtool_get_sset_count,
> + .get_strings = hbl_en_ethtool_get_strings,
> + .get_ethtool_stats = hbl_en_ethtool_get_ethtool_stats,
> + .get_coalesce = hbl_en_ethtool_get_coalesce,
> + .set_coalesce = hbl_en_ethtool_set_coalesce,
> +};
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev)
> +{
> + return &hbl_en_ethtool_ops_coalesce;
> +}
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
` (4 preceding siblings ...)
2024-06-15 10:55 ` Zhu Yanjun
@ 2024-06-15 17:13 ` Zhu Yanjun
2024-06-16 1:08 ` Andrew Lunn
5 siblings, 1 reply; 107+ messages in thread
From: Zhu Yanjun @ 2024-06-15 17:13 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel, linux-rdma, netdev, dri-devel
Cc: ogabbay, zyehudai
On 2024/6/13 16:22, Omer Shpigelman wrote:
> This ethernet driver is initialized via auxiliary bus by the hbl_cn
> driver.
> It serves mainly for control operations that are needed for AI scaling.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> ---
> MAINTAINERS | 9 +
> drivers/net/ethernet/intel/Kconfig | 18 +
> drivers/net/ethernet/intel/Makefile | 1 +
> drivers/net/ethernet/intel/hbl_en/Makefile | 9 +
> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1168 +++++++++++++++++
> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 206 +++
> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 ++
> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +++
> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +++++++
> 10 files changed, 2178 insertions(+)
> create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 096439a62129..7301f38e9cfb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9617,6 +9617,15 @@ F: include/linux/habanalabs/
> F: include/linux/net/intel/cn*
> F: include/linux/net/intel/gaudi2*
>
> +HABANALABS ETHERNET DRIVER
> +M: Omer Shpigelman <oshpigelman@habana.ai>
> +L: netdev@vger.kernel.org
> +S: Supported
> +W: https://www.habana.ai
> +F: Documentation/networking/device_drivers/ethernet/intel/hbl.rst
> +F: drivers/net/ethernet/intel/hbl_en/
> +F: include/linux/net/intel/cn*
> +
> HACKRF MEDIA DRIVER
> L: linux-media@vger.kernel.org
> S: Orphan
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 0d1b8a2bae99..5d07349348a0 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -417,4 +417,22 @@ config HABANA_CN
> To compile this driver as a module, choose M here. The module
> will be called habanalabs_cn.
>
> +config HABANA_EN
> + tristate "HabanaLabs (an Intel Company) Ethernet driver"
> + depends on NETDEVICES && ETHERNET && INET
> + select HABANA_CN
> + help
> + This driver enables Ethernet functionality for the network interfaces
> + that are part of the GAUDI ASIC family of AI Accelerators.
> + For more information on how to identify your adapter, go to the
> + Adapter & Driver ID Guide that can be located at:
> +
> + <http://support.intel.com>
> +
> + More specific information on configuring the driver is in
> + <file:Documentation/networking/device_drivers/ethernet/intel/hbl.rst>.
> +
> + To compile this driver as a module, choose M here. The module
> + will be called habanalabs_en.
> +
> endif # NET_VENDOR_INTEL
> diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
> index 10049a28e336..ec62a0227897 100644
> --- a/drivers/net/ethernet/intel/Makefile
> +++ b/drivers/net/ethernet/intel/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_FM10K) += fm10k/
> obj-$(CONFIG_ICE) += ice/
> obj-$(CONFIG_IDPF) += idpf/
> obj-$(CONFIG_HABANA_CN) += hbl_cn/
> +obj-$(CONFIG_HABANA_EN) += hbl_en/
> diff --git a/drivers/net/ethernet/intel/hbl_en/Makefile b/drivers/net/ethernet/intel/hbl_en/Makefile
> new file mode 100644
> index 000000000000..695497ab93b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/Makefile
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Makefile for HabanaLabs (an Intel Company) Ethernet network driver
> +#
> +
> +obj-$(CONFIG_HABANA_EN) := habanalabs_en.o
> +
> +include $(src)/common/Makefile
> +habanalabs_en-y += $(HBL_EN_COMMON_FILES)
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/Makefile b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> new file mode 100644
> index 000000000000..a3ccb5dbf4a6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +HBL_EN_COMMON_FILES := common/hbl_en_drv.o common/hbl_en.o \
> + common/hbl_en_ethtool.o common/hbl_en_dcbnl.o
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> new file mode 100644
> index 000000000000..066be5ac2d84
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> @@ -0,0 +1,1168 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/inetdevice.h>
> +
> +#define TX_TIMEOUT (5 * HZ)
> +#define PORT_RESET_TIMEOUT_MSEC (60 * 1000ull) /* 60s */
> +
> +/**
> + * struct hbl_en_tx_pkt_work - used to schedule a work of a Tx packet.
> + * @tx_work: workqueue object to run when packet needs to be sent.
> + * @port: pointer to current port structure.
> + * @skb: copy of the packet to send.
> + */
> +struct hbl_en_tx_pkt_work {
> + struct work_struct tx_work;
> + struct hbl_en_port *port;
> + struct sk_buff *skb;
> +};
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
> +static int hbl_en_port_open(struct hbl_en_port *port);
> +
> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int rc = 0, i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be that the port was shut down by 'ip link set down' and there is no need
> + * to reopen it.
> + * Since we mark the ports as in reset even if they are disabled, we clear the flag
> + * here anyway.
> + * See hbl_en_ports_stop_prepare() for more info.
> + */
> + if (!netif_running(port->ndev)) {
> + atomic_set(&port->in_reset, 0);
> + continue;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + if (rc)
> + break;
> + }
> +
> + hdev->in_reset = false;
> +
> + return rc;
> +}
> +
> +static void hbl_en_port_fini(struct hbl_en_port *port)
> +{
> + if (port->rx_wq)
> + destroy_workqueue(port->rx_wq);
> +}
> +
> +static int hbl_en_port_init(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u32 port_idx = port->idx;
> + char wq_name[32];
> + int rc;
> +
> + if (hdev->poll_enable) {
> + memset(wq_name, 0, sizeof(wq_name));
> + snprintf(wq_name, sizeof(wq_name) - 1, "hbl%u-port%d-rx-wq", hdev->core_dev_id,
> + port_idx);
> + port->rx_wq = alloc_ordered_workqueue(wq_name, 0);
> + if (!port->rx_wq) {
> + dev_err(hdev->dev, "Failed to allocate Rx WQ\n");
> + rc = -ENOMEM;
> + goto fail;
> + }
> + }
> +
> + hbl_en_ethtool_init_coalesce(port);
> +
> + return 0;
> +
> +fail:
> + hbl_en_port_fini(port);
> +
> + return rc;
> +}
> +
> +static void _hbl_en_set_port_status(struct hbl_en_port *port, bool up)
> +{
> + struct net_device *ndev = port->ndev;
> + u32 port_idx = port->idx;
> +
> + if (up) {
> + netif_carrier_on(ndev);
> + netif_wake_queue(ndev);
> + } else {
> + netif_carrier_off(ndev);
> + netif_stop_queue(ndev);
> + }
> +
> + /* Unless link events are getting through the EQ, no need to print about link down events
> + * during port reset
> + */
> + if (port->hdev->has_eq || up || !atomic_read(&port->in_reset))
> + netdev_info(port->ndev, "link %s, port %d\n", up ? "up" : "down", port_idx);
> +}
> +
> +static void hbl_en_set_port_status(struct hbl_aux_dev *aux_dev, u32 port_idx, bool up)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + _hbl_en_set_port_status(port, up);
> +}
> +
> +static bool hbl_en_is_port_open(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->is_initialized;
> +}
> +
> +/* get the src IP as it is done in devinet_ioctl() */
> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + struct in_device *in_dev;
> + struct in_ifaddr *ifa;
> + int rc = 0;
> +
> + /* for the case where no src IP is configured */
> + *src_ip = 0;
> +
> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
> + if (!rtnl_is_locked()) {
> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + in_dev = __in_dev_get_rtnl(ndev);
> + if (!in_dev) {
> + netdev_err(port->ndev, "Failed to get IPv4 struct\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + ifa = rtnl_dereference(in_dev->ifa_list);
> +
> + while (ifa) {
> + if (!strcmp(ndev->name, ifa->ifa_label)) {
> + /* convert the BE to native and later on it will be
> + * written to the HW as LE in QPC_SET
> + */
> + *src_ip = be32_to_cpu(ifa->ifa_local);
> + break;
> + }
> + ifa = rtnl_dereference(ifa->ifa_next);
> + }
> +out:
> + return rc;
> +}
> +
> +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + port->net_stats.rx_packets = 0;
> + port->net_stats.tx_packets = 0;
> + port->net_stats.rx_bytes = 0;
> + port->net_stats.tx_bytes = 0;
> + port->net_stats.tx_errors = 0;
> + atomic64_set(&port->net_stats.rx_dropped, 0);
> + atomic64_set(&port->net_stats.tx_dropped, 0);
Would a per-cpu variable be better here?
Zhu Yanjun
> +}
> +
> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> + u32 mtu;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> + return 0;
> + }
> +
> + mtu = ndev->mtu;
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return mtu;
> +}
> +
> +static u32 hbl_en_get_pflags(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return port->pflags;
> +}
> +
> +static void hbl_en_set_dev_lpbk(struct hbl_aux_dev *aux_dev, u32 port_idx, bool enable)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> + struct net_device *ndev = port->ndev;
> +
> + if (enable)
> + ndev->features |= NETIF_F_LOOPBACK;
> + else
> + ndev->features &= ~NETIF_F_LOOPBACK;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static int hbl_en_port_open_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (port->is_initialized)
> + return 0;
> +
> + if (!hdev->poll_enable)
> + netif_napi_add(ndev, &port->napi, hbl_en_napi_poll);
> +
> + rc = aux_ops->port_hw_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "Failed to configure the HW, rc %d\n", rc);
> + goto hw_init_fail;
> + }
> +
> + if (!hdev->poll_enable)
> + napi_enable(&port->napi);
> +
> + rc = hdev->asic_funcs.eth_port_open(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to init H/W, rc %d\n", rc);
> + goto port_open_fail;
> + }
> +
> + rc = aux_ops->update_mtu(aux_dev, port_idx, ndev->mtu);
> + if (rc) {
> + netdev_err(ndev, "MTU update failed, rc %d\n", rc);
> + goto update_mtu_fail;
> + }
> +
> + rc = aux_ops->phy_init(aux_dev, port_idx);
> + if (rc) {
> + netdev_err(ndev, "PHY init failed, rc %d\n", rc);
> + goto phy_init_fail;
> + }
> +
> + netif_start_queue(ndev);
> +
> + port->is_initialized = true;
> +
> + return 0;
> +
> +phy_init_fail:
> + /* no need to revert the MTU change, it will be updated on next port open */
> +update_mtu_fail:
> + hdev->asic_funcs.eth_port_close(port);
> +port_open_fail:
> + if (!hdev->poll_enable)
> + napi_disable(&port->napi);
> +
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +hw_init_fail:
> + if (!hdev->poll_enable)
> + netif_napi_del(&port->napi);
> +
> + return rc;
> +}
> +
> +static int hbl_en_port_open(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + rc = hbl_en_port_open_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return rc;
> +}
> +
> +static int hbl_en_open(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't open it\n");
> + return -EBUSY;
> + }
> +
> + rc = hbl_en_port_open(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static void hbl_en_port_close_locked(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (!port->is_initialized)
> + return;
> +
> + port->is_initialized = false;
> +
> + /* verify that the port is marked as closed before continuing */
> + mb();
> +
> + /* Print if not in hard reset flow e.g. from ip cmd */
> + if (!hdev->in_reset && netif_carrier_ok(port->ndev))
> + netdev_info(port->ndev, "port was closed\n");
> +
> + /* disable the PHY here so no link changes will occur from this point forward */
> + aux_ops->phy_fini(aux_dev, port_idx);
> +
> + /* disable Tx SW flow */
> + netif_carrier_off(port->ndev);
> + netif_tx_disable(port->ndev);
> +
> + /* stop Tx/Rx HW */
> + aux_ops->port_hw_fini(aux_dev, port_idx);
> +
> + /* disable Tx/Rx QPs */
> + hdev->asic_funcs.eth_port_close(port);
> +
> + /* stop Rx SW flow */
> + if (hdev->poll_enable) {
> + hbl_en_rx_poll_stop(port);
> + } else {
> + napi_disable(&port->napi);
> + netif_napi_del(&port->napi);
> + }
> +
> + /* Explicitly count the port close operations as we don't get a link event for this.
> + * Upon port open we receive a link event, hence no additional action required.
> + */
> + aux_ops->port_toggle_count(aux_dev, port_idx);
> +}
> +
> +static void hbl_en_port_close(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> + hbl_en_port_close_locked(port);
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +static int __hbl_en_port_reset_locked(struct hbl_en_port *port)
> +{
> + hbl_en_port_close_locked(port);
> +
> + return hbl_en_port_open_locked(port);
> +}
> +
> +/* This function should be called after ctrl_lock was taken */
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx)
> +{
> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> +
> + return __hbl_en_port_reset_locked(port);
> +}
> +
> +int hbl_en_port_reset(struct hbl_en_port *port)
> +{
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events be dropped before re-opening the port */
> + msleep(20);
> +
> + return hbl_en_port_open(port);
> +}
> +
> +static int hbl_en_close(struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev = port->hdev;
> + ktime_t timeout;
> +
> + /* Looks like the return value of this function is not checked, so we can't just return
> + * EBUSY if the port is under reset. We need to wait until the reset is finished and then
> + * close the port. Otherwise the netdev will mark the port as closed although port_close()
> + * wasn't called. Only if we waited long enough and the reset still hasn't finished may we
> + * return an error without actually closing the port, as it is a fatal flow anyway.
> + */
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + /* If this is called from unregister_netdev() then the port was already closed and
> + * hence we can safely return.
> + * We could have just checked the port_open boolean, but that might hide some future
> + * bugs. Hence it is better to use a dedicated flag for that.
> + */
> + if (READ_ONCE(hdev->in_teardown))
> + return 0;
> +
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(netdev,
> + "Timeout while waiting for port to finish reset, can't close it\n"
> + );
> + return -EBUSY;
> + }
> + }
> +
> + hbl_en_port_close(port);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +/**
> + * hbl_en_ports_stop_prepare() - stop the Rx and Tx and synchronize with other reset flows.
> + * @aux_dev: habanalabs auxiliary device structure.
> + *
> + * This function makes sure that during the reset no packets will be processed and that
> + * ndo_open/ndo_close do not open/close the ports.
> + * A hard reset might occur right after the driver was loaded, which means before the ports
> + * initialization was finished. Therefore, even if the ports are not yet open, we mark it as in
> + * reset in order to avoid races. We clear the in reset flag later on when reopening the ports.
> + */
> +static void hbl_en_ports_stop_prepare(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + ktime_t timeout;
> + int i;
> +
> + /* Check if the ports were initialized. If not, we shouldn't mark them as in reset because
> + * they will fail to get opened.
> + */
> + if (!hdev->is_initialized || hdev->in_reset)
> + return;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* This function is competing with reset from ethtool/ip, so try to take the
> + * in_reset atomic and if we are already in a middle of reset, wait until reset
> + * function is finished.
> + * Reset function is designed to always finish (could take up to a few seconds in
> + * worst case).
> + * We also mark closed ports as in reset so they can't be opened while
> + * the device is under reset.
> + */
> +
> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + usleep_range(50, 200);
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + netdev_crit(port->ndev,
> + "Timeout while waiting for port %d to finish reset\n",
> + port->idx);
> + break;
> + }
> + }
> + }
> +
> + hdev->in_reset = true;
> +}
> +
> +static void hbl_en_ports_stop(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> + struct hbl_en_port *port;
> + int i;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + if (netif_running(port->ndev))
> + hbl_en_port_close(port);
> + }
> +}
> +
> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> + return -EBUSY;
> + }
> +
> + if (netif_running(port->ndev)) {
> + hbl_en_port_close(port);
> +
> + /* Sleep in order to let obsolete events be dropped before re-opening the port */
> + msleep(20);
> +
> + netdev->mtu = new_mtu;
> +
> + rc = hbl_en_port_open(port);
> + if (rc)
> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> + } else {
> + netdev->mtu = new_mtu;
> + }
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +/* Swap source and destination MAC addresses */
> +static inline void swap_l2(char *buf)
> +{
> + u16 *eth_hdr, tmp;
> +
> + eth_hdr = (u16 *)buf;
> + tmp = eth_hdr[0];
> + eth_hdr[0] = eth_hdr[3];
> + eth_hdr[3] = tmp;
> + tmp = eth_hdr[1];
> + eth_hdr[1] = eth_hdr[4];
> + eth_hdr[4] = tmp;
> + tmp = eth_hdr[2];
> + eth_hdr[2] = eth_hdr[5];
> + eth_hdr[5] = tmp;
> +}
> +
> +/* Swap source and destination IP addresses */
> +static inline void swap_l3(char *buf)
> +{
> + u32 tmp;
> +
> + /* skip the Ethernet header and the IP header till source IP address */
> + buf += ETH_HLEN + 12;
> + tmp = ((u32 *)buf)[0];
> + ((u32 *)buf)[0] = ((u32 *)buf)[1];
> + ((u32 *)buf)[1] = tmp;
> +}
> +
> +static void do_tx_swap(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* First, let's print the SKB we got */
> + dev_dbg_ratelimited(hdev->dev,
> + "Send [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +
> + /* Before submitting it to the HW, in case this is an IPv4 pkt, swap the eth/IP
> + * addresses. That way, we may send ICMP (ping) to ourselves in LB cases.
> + */
> + swap_l2(skb->data);
> + if (swab16(tmp_buff[6]) == ETH_P_IP)
> + swap_l3(skb->data);
> +}
> +
> +static bool is_pkt_swap_enabled(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->is_eth_lpbk(aux_dev);
> +}
> +
> +static bool is_tx_disabled(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + return aux_ops->get_mac_lpbk(aux_dev, port_idx) && !is_pkt_swap_enabled(hdev);
> +}
> +
> +static netdev_tx_t hbl_en_handle_tx(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + netdev_tx_t ret;
> +
> + if (skb->len <= 0 || is_tx_disabled(port))
> + goto free_skb;
> +
> + if (skb->len > hdev->max_frm_len) {
> + netdev_err(port->ndev, "Tx pkt size %uB exceeds maximum of %uB\n", skb->len,
> + hdev->max_frm_len);
> + goto free_skb;
> + }
> +
> + if (is_pkt_swap_enabled(hdev))
> + do_tx_swap(port, skb);
> +
> + /* Pad the ethernet packets to the minimum frame size as the NIC HW doesn't do it.
> + * skb_put_padto() frees the packet on failure, so just increment the dropped counter
> + * and return success to avoid a retry.
> + */
> + if (skb_put_padto(skb, hdev->pad_size)) {
> + dev_err_ratelimited(hdev->dev, "Padding failed, the skb is dropped\n");
> + atomic64_inc(&port->net_stats.tx_dropped);
> + return NETDEV_TX_OK;
> + }
> +
> + ret = hdev->asic_funcs.write_pkt_to_hw(port, skb);
> + if (ret == NETDEV_TX_OK) {
> + port->net_stats.tx_packets++;
> + port->net_stats.tx_bytes += skb->len;
> + }
> +
> + return ret;
> +
> +free_skb:
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> +}
> +
> +static netdev_tx_t hbl_en_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev;
> +
> + hdev = port->hdev;
> +
> + return hbl_en_handle_tx(port, skb);
> +}
> +
> +static int hbl_en_set_port_mac_loopback(struct hbl_en_port *port, bool enable)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct net_device *ndev = port->ndev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + rc = aux_ops->set_mac_lpbk(aux_dev, port_idx, enable);
> + if (rc)
> + return rc;
> +
> + netdev_info(ndev, "port %u: mac loopback is %s\n", port_idx,
> + enable ? "enabled" : "disabled");
> +
> + if (netif_running(ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc) {
> + netdev_err(ndev, "Failed to reset port %u, rc %d\n", port_idx, rc);
> + return rc;
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int hbl_en_set_features(struct net_device *netdev, netdev_features_t features)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + netdev_features_t changed;
> + int rc = 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(netdev, "port %d is in reset, can't update settings", port->idx);
> + return -EBUSY;
> + }
> +
> + changed = netdev->features ^ features;
> +
> + if (changed & NETIF_F_LOOPBACK)
> + rc = hbl_en_set_port_mac_loopback(port, !!(features & NETIF_F_LOOPBACK));
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static void hbl_en_handle_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> +
> + port->net_stats.tx_errors++;
> + atomic64_inc(&port->net_stats.tx_dropped);
> +}
> +
> +static void hbl_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(dev);
> +
> + stats->rx_bytes = port->net_stats.rx_bytes;
> + stats->tx_bytes = port->net_stats.tx_bytes;
> + stats->rx_packets = port->net_stats.rx_packets;
> + stats->tx_packets = port->net_stats.tx_packets;
> + stats->tx_errors = port->net_stats.tx_errors;
> + stats->tx_dropped = (u64)atomic64_read(&port->net_stats.tx_dropped);
> + stats->rx_dropped = (u64)atomic64_read(&port->net_stats.rx_dropped);
> +}
> +
> +static const struct net_device_ops hbl_en_netdev_ops = {
> + .ndo_open = hbl_en_open,
> + .ndo_stop = hbl_en_close,
> + .ndo_start_xmit = hbl_en_start_xmit,
> + .ndo_validate_addr = eth_validate_addr,
> + .ndo_change_mtu = hbl_en_change_mtu,
> + .ndo_set_features = hbl_en_set_features,
> + .ndo_get_stats64 = hbl_en_get_stats64,
> + .ndo_tx_timeout = hbl_en_handle_tx_timeout,
> +};
> +
> +static void hbl_en_set_ops(struct net_device *ndev)
> +{
> + ndev->netdev_ops = &hbl_en_netdev_ops;
> + ndev->ethtool_ops = hbl_en_ethtool_get_ops(ndev);
> +#ifdef CONFIG_DCB
> + ndev->dcbnl_ops = &hbl_en_dcbnl_ops;
> +#endif
> +}
> +
> +static int hbl_en_port_register(struct hbl_en_port *port)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + struct hbl_en_port **ptr;
> + struct net_device *ndev;
> + int rc;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + ndev = alloc_etherdev(sizeof(struct hbl_en_port *));
> + if (!ndev) {
> + dev_err(hdev->dev, "netdevice %d alloc failed\n", port_idx);
> + return -ENOMEM;
> + }
> +
> + port->ndev = ndev;
> + SET_NETDEV_DEV(ndev, &hdev->pdev->dev);
> + ptr = netdev_priv(ndev);
> + *ptr = port;
> +
> + /* necessary for creating multiple interfaces */
> + ndev->dev_port = port_idx;
> +
> + hbl_en_set_ops(ndev);
> +
> + ndev->watchdog_timeo = TX_TIMEOUT;
> + ndev->min_mtu = hdev->min_raw_mtu;
> + ndev->max_mtu = hdev->max_raw_mtu;
> +
> + /* Add loopback capability to the device. */
> + ndev->hw_features |= NETIF_F_LOOPBACK;
> +
> + /* If this port was set to loopback, set it also to the ndev features */
> + if (aux_ops->get_mac_lpbk(aux_dev, port_idx))
> + ndev->features |= NETIF_F_LOOPBACK;
> +
> + eth_hw_addr_set(ndev, port->mac_addr);
> +
> + /* It's more of an intelligent poll wherein we enable the Rx completion EQE event and
> + * then start the poll from there.
> + * Inside the polling thread, we read packets from hardware and then reschedule the poll
> + * only if there are more packets to be processed. Otherwise we re-enable the CQ Arm
> + * interrupt and exit the poll.
> + */
> + if (hdev->poll_enable)
> + hbl_en_rx_poll_trigger_init(port);
> +
> + netif_carrier_off(ndev);
> +
> + rc = register_netdev(ndev);
> + if (rc) {
> + dev_err(hdev->dev, "Could not register netdevice %d\n", port_idx);
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + free_netdev(ndev);
> + port->ndev = NULL;
> +
> + return rc;
> +}
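A note on the private-data layout above: `alloc_etherdev(sizeof(struct hbl_en_port *))` reserves a priv area just big enough to hold a pointer, and `hbl_en_port_register()` stores the externally allocated port there, so the `hbl_netdev_priv()` macro later recovers it with a double dereference. A minimal userspace sketch of that round trip (all `fake_*` names are hypothetical stand-ins for the kernel's `struct net_device`/`netdev_priv()`):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net_device / struct hbl_en_port. */
struct fake_port { int idx; };

struct fake_ndev {
	unsigned char priv[sizeof(struct fake_port *)]; /* netdev_priv() area */
};

/* Mimics alloc_etherdev(sizeof(struct hbl_en_port *)): the priv area holds
 * only a pointer; the port itself is allocated elsewhere. */
static struct fake_ndev *fake_alloc_etherdev(void)
{
	return calloc(1, sizeof(struct fake_ndev));
}

static void *fake_netdev_priv(struct fake_ndev *ndev)
{
	return ndev->priv;
}

/* Equivalent of the hbl_netdev_priv() macro: load the stored pointer. */
static struct fake_port *fake_hbl_netdev_priv(struct fake_ndev *ndev)
{
	return *(struct fake_port **)fake_netdev_priv(ndev);
}

static struct fake_port *roundtrip(struct fake_port *port)
{
	struct fake_ndev *ndev = fake_alloc_etherdev();
	struct fake_port **ptr = fake_netdev_priv(ndev);
	struct fake_port *out;

	*ptr = port;	/* same as "*ptr = port;" in hbl_en_port_register() */
	out = fake_hbl_netdev_priv(ndev);
	free(ndev);
	return out;
}
```

The indirection keeps `struct hbl_en_port` embedded in the device-wide `ports` array while still letting every netdev callback reach its port in O(1).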
> +
> +static void dump_swap_pkt(struct hbl_en_port *port, struct sk_buff *skb)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + u16 *tmp_buff = (u16 *)skb->data;
> + u32 port_idx = port->idx;
> +
> + /* The SKB is ready now (before the L2 header is stripped), print its content */
> + dev_dbg_ratelimited(hdev->dev,
> + "Recv [P%d]: dst-mac:%04x%04x%04x, src-mac:%04x%04x%04x, eth-type:%04x, len:%u\n",
> + port_idx, swab16(tmp_buff[0]), swab16(tmp_buff[1]), swab16(tmp_buff[2]),
> + swab16(tmp_buff[3]), swab16(tmp_buff[4]), swab16(tmp_buff[5]),
> + swab16(tmp_buff[6]), skb->len);
> +}
> +
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget)
> +{
> + struct hbl_en_device *hdev = port->hdev;
> + enum hbl_en_eth_pkt_status pkt_status;
> + struct net_device *ndev = port->ndev;
> + int rc, pkt_count = 0;
> + struct sk_buff *skb;
> + void *pkt_addr;
> + u32 pkt_size;
> +
> + if (!netif_carrier_ok(ndev))
> + return 0;
> +
> + while (pkt_count < budget) {
> + pkt_status = hdev->asic_funcs.read_pkt_from_hw(port, &pkt_addr, &pkt_size);
> +
> + if (pkt_status == ETH_PKT_NONE)
> + break;
> +
> + pkt_count++;
> +
> + if (pkt_status == ETH_PKT_DROP) {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + continue;
> + }
> +
> + if (hdev->poll_enable)
> + skb = __netdev_alloc_skb_ip_align(ndev, pkt_size, GFP_KERNEL);
> + else
> + skb = napi_alloc_skb(&port->napi, pkt_size);
> +
> + if (!skb) {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + break;
> + }
> +
> + skb_copy_to_linear_data(skb, pkt_addr, pkt_size);
> + skb_put(skb, pkt_size);
> +
> + if (is_pkt_swap_enabled(hdev))
> + dump_swap_pkt(port, skb);
> +
> + skb->protocol = eth_type_trans(skb, ndev);
> +
> + /* Zero the packet buffer memory to avoid a leak in case a wrong size is
> + * used when the next packet populates the same memory
> + */
> + memset(pkt_addr, 0, pkt_size);
> +
> + /* polling is done in thread context and hence BH should be disabled */
> + if (hdev->poll_enable)
> + local_bh_disable();
> +
> + rc = netif_receive_skb(skb);
> +
> + if (hdev->poll_enable)
> + local_bh_enable();
> +
> + if (rc == NET_RX_SUCCESS) {
> + port->net_stats.rx_packets++;
> + port->net_stats.rx_bytes += pkt_size;
> + } else {
> + atomic64_inc(&port->net_stats.rx_dropped);
> + }
> + }
> +
> + return pkt_count;
> +}
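The budget accounting in `hbl_en_handle_rx()` above is worth calling out: every packet read from HW consumes budget whether it is delivered or dropped, and "no packet available" terminates the loop early so the caller can leave polling mode. A userspace sketch of just that accounting, under the assumption that the HW read is modeled by a canned status sequence:

```c
#include <assert.h>

enum pkt_status { PKT_OK, PKT_DROP, PKT_NONE };

/* Hypothetical packet source: replays statuses from an array, then PKT_NONE. */
struct pkt_src {
	const enum pkt_status *seq;
	int len;
	int pos;
};

static enum pkt_status read_pkt(struct pkt_src *src)
{
	if (src->pos >= src->len)
		return PKT_NONE;
	return src->seq[src->pos++];
}

/* Mirrors hbl_en_handle_rx(): dropped packets still consume budget, and
 * PKT_NONE breaks out early. Returns the number of packets consumed. */
static int handle_rx(struct pkt_src *src, int budget, int *rx_ok, int *rx_dropped)
{
	int pkt_count = 0;

	while (pkt_count < budget) {
		enum pkt_status st = read_pkt(src);

		if (st == PKT_NONE)
			break;

		pkt_count++;

		if (st == PKT_DROP) {
			(*rx_dropped)++;
			continue;
		}
		(*rx_ok)++;
	}

	return pkt_count;
}
```

Returning the consumed count (not the delivered count) is what lets both the NAPI poll and the workqueue poll compare against the budget to decide whether to keep polling.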
> +
> +static bool __hbl_en_rx_poll_schedule(struct hbl_en_port *port, unsigned long delay)
> +{
> + return queue_delayed_work(port->rx_wq, &port->rx_poll_work, delay);
> +}
> +
> +static void hbl_en_rx_poll_work(struct work_struct *work)
> +{
> + struct hbl_en_port *port = container_of(work, struct hbl_en_port, rx_poll_work.work);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + pkt_count = hbl_en_handle_rx(port, NAPI_POLL_WEIGHT);
> +
> + /* Reschedule the poll if we have consumed the entire budget, which means there may
> + * still be packets to process. Otherwise re-enable the Rx IRQs and exit the work.
> + */
> + if (pkt_count < NAPI_POLL_WEIGHT)
> + hdev->asic_funcs.reenable_rx_irq(port);
> + else
> + __hbl_en_rx_poll_schedule(port, 0);
> +}
> +
> +/* Rx poll init and trigger routines are used in event-driven setups where
> + * Rx polling is initialized once during init or open and started/triggered by the event handler.
> + */
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port)
> +{
> + INIT_DELAYED_WORK(&port->rx_poll_work, hbl_en_rx_poll_work);
> +}
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port)
> +{
> + return __hbl_en_rx_poll_schedule(port, msecs_to_jiffies(1));
> +}
> +
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port)
> +{
> + cancel_delayed_work_sync(&port->rx_poll_work);
> +}
> +
> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget)
> +{
> + struct hbl_en_port *port = container_of(napi, struct hbl_en_port, napi);
> + struct hbl_en_device *hdev = port->hdev;
> + int pkt_count;
> +
> + /* exit if we are called by netpoll as we free the Tx ring via EQ (if enabled) */
> + if (!budget)
> + return 0;
> +
> + pkt_count = hbl_en_handle_rx(port, budget);
> +
> + /* If budget not fully consumed, exit the polling mode */
> + if (pkt_count < budget) {
> + napi_complete_done(napi, pkt_count);
> + hdev->asic_funcs.reenable_rx_irq(port);
> + }
> +
> + return pkt_count;
> +}
> +
> +static void hbl_en_port_unregister(struct hbl_en_port *port)
> +{
> + struct net_device *ndev = port->ndev;
> +
> + unregister_netdev(ndev);
> + free_netdev(ndev);
> + port->ndev = NULL;
> +}
> +
> +static int hbl_en_set_asic_funcs(struct hbl_en_device *hdev)
> +{
> + switch (hdev->asic_type) {
> + case HBL_ASIC_GAUDI2:
> + default:
> + dev_err(hdev->dev, "Unrecognized ASIC type %d\n", hdev->asic_type);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void hbl_en_handle_eqe(struct hbl_aux_dev *aux_dev, u32 port, struct hbl_cn_eqe *eqe)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + hdev->asic_funcs.handle_eqe(aux_dev, port, eqe);
> +}
> +
> +static void hbl_en_set_aux_ops(struct hbl_en_device *hdev, bool enable)
> +{
> + struct hbl_en_aux_ops *aux_ops = hdev->aux_dev->aux_ops;
> +
> + if (enable) {
> + aux_ops->ports_reopen = hbl_en_ports_reopen;
> + aux_ops->ports_stop_prepare = hbl_en_ports_stop_prepare;
> + aux_ops->ports_stop = hbl_en_ports_stop;
> + aux_ops->set_port_status = hbl_en_set_port_status;
> + aux_ops->is_port_open = hbl_en_is_port_open;
> + aux_ops->get_src_ip = hbl_en_get_src_ip;
> + aux_ops->reset_stats = hbl_en_reset_stats;
> + aux_ops->get_mtu = hbl_en_get_mtu;
> + aux_ops->get_pflags = hbl_en_get_pflags;
> + aux_ops->set_dev_lpbk = hbl_en_set_dev_lpbk;
> + aux_ops->handle_eqe = hbl_en_handle_eqe;
> + } else {
> + aux_ops->ports_reopen = NULL;
> + aux_ops->ports_stop_prepare = NULL;
> + aux_ops->ports_stop = NULL;
> + aux_ops->set_port_status = NULL;
> + aux_ops->is_port_open = NULL;
> + aux_ops->get_src_ip = NULL;
> + aux_ops->reset_stats = NULL;
> + aux_ops->get_mtu = NULL;
> + aux_ops->get_pflags = NULL;
> + aux_ops->set_dev_lpbk = NULL;
> + aux_ops->handle_eqe = NULL;
> + }
> +}
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int rc, i, port_cnt = 0;
> +
> + /* must be called before the call to dev_init() */
> + rc = hbl_en_set_asic_funcs(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "failed to set asic functions\n");
> + return rc;
> + }
> +
> + rc = asic_funcs->dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "device init failed\n");
> + return rc;
> + }
> +
> + /* Initialize the aux function pointers before calling hbl_en_port_register(), which
> + * sets up net_device_ops whose callbacks might start getting invoked.
> + * If any failure is encountered, these will be set back to NULL so the core driver
> + * won't call them.
> + */
> + hbl_en_set_aux_ops(hdev, true);
> +
> + /* Port register depends on the above initialization so it must be called here and not
> + * before that.
> + */
> + for (i = 0; i < hdev->max_num_of_ports; i++, port_cnt++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + rc = hbl_en_port_init(port);
> + if (rc) {
> + dev_err(hdev->dev, "port init failed\n");
> + goto unregister_ports;
> + }
> +
> + rc = hbl_en_port_register(port);
> + if (rc) {
> + dev_err(hdev->dev, "port register failed\n");
> +
> + hbl_en_port_fini(port);
> + goto unregister_ports;
> + }
> + }
> +
> + hdev->is_initialized = true;
> +
> + return 0;
> +
> +unregister_ports:
> + for (i = 0; i < port_cnt; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +
> + return rc;
> +}
> +
> +void hbl_en_dev_fini(struct hbl_en_device *hdev)
> +{
> + struct hbl_en_asic_funcs *asic_funcs = &hdev->asic_funcs;
> + struct hbl_en_port *port;
> + int i;
> +
> + hdev->in_teardown = true;
> +
> + if (!hdev->is_initialized)
> + return;
> +
> + hdev->is_initialized = false;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> +
> + /* It could be this cleanup flow is called after a failed init flow.
> + * Hence we need to check that we indeed have a netdev to unregister.
> + */
> + if (!port->ndev)
> + continue;
> +
> + hbl_en_port_unregister(port);
> + hbl_en_port_fini(port);
> + }
> +
> + hbl_en_set_aux_ops(hdev, false);
> +
> + asic_funcs->dev_fini(hdev);
> +}
> +
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len)
> +{
> + dma_addr_t dma_addr;
> +
> + if (hdev->dma_map_support)
> + dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, DMA_TO_DEVICE);
> + else
> + dma_addr = virt_to_phys(addr);
> +
> + return dma_addr;
> +}
> +
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len)
> +{
> + if (hdev->dma_map_support)
> + dma_unmap_single(&hdev->pdev->dev, dma_addr, len, DMA_TO_DEVICE);
> +}
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> new file mode 100644
> index 000000000000..15504c1f3cfb
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#ifndef HABANALABS_EN_H_
> +#define HABANALABS_EN_H_
> +
> +#include <linux/net/intel/cn.h>
> +
> +#include <linux/netdevice.h>
> +#include <linux/pci.h>
> +
> +#define HBL_EN_NAME "habanalabs_en"
> +
> +#define HBL_EN_PORT(aux_dev, idx) (&(((struct hbl_en_device *)(aux_dev)->priv)->ports[(idx)]))
> +
> +#define hbl_netdev_priv(ndev) \
> +({ \
> + typecheck(struct net_device *, ndev); \
> + *(struct hbl_en_port **)netdev_priv(ndev); \
> +})
> +
> +/**
> + * enum hbl_en_eth_pkt_status - status of Rx Ethernet packet.
> + * @ETH_PKT_OK: packet was received successfully.
> + * @ETH_PKT_DROP: packet should be dropped.
> + * @ETH_PKT_NONE: no available packet.
> + */
> +enum hbl_en_eth_pkt_status {
> + ETH_PKT_OK,
> + ETH_PKT_DROP,
> + ETH_PKT_NONE
> +};
> +
> +/**
> + * struct hbl_en_net_stats - stats of the Ethernet interface.
> + * @rx_packets: number of packets received.
> + * @tx_packets: number of packets sent.
> + * @rx_bytes: total bytes of data received.
> + * @tx_bytes: total bytes of data sent.
> + * @tx_errors: number of Tx errors.
> + * @rx_dropped: number of packets dropped by the Rx.
> + * @tx_dropped: number of packets dropped by the Tx.
> + */
> +struct hbl_en_net_stats {
> + u64 rx_packets;
> + u64 tx_packets;
> + u64 rx_bytes;
> + u64 tx_bytes;
> + u64 tx_errors;
> + atomic64_t rx_dropped;
> + atomic64_t tx_dropped;
> +};
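The mixed field types in this struct are deliberate: `rx_dropped`/`tx_dropped` are `atomic64_t` because drops are counted from several contexts (the Rx handler as well as error paths), while the remaining counters are only ever updated from the serialized datapath and can stay plain `u64`. A userspace analogue using C11 atomics (the struct and function names here are illustrative, not from the driver):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace analogue of struct hbl_en_net_stats: counters touched only by
 * the serialized datapath stay plain integers; counters that may be bumped
 * concurrently from several contexts are atomic. */
struct net_stats {
	uint64_t rx_packets;			/* datapath only */
	uint64_t rx_bytes;			/* datapath only */
	atomic_uint_fast64_t rx_dropped;	/* datapath + error paths */
};

static void count_rx(struct net_stats *s, uint32_t bytes)
{
	s->rx_packets++;
	s->rx_bytes += bytes;
}

/* Analogue of atomic64_inc(&port->net_stats.rx_dropped). */
static void count_drop(struct net_stats *s)
{
	atomic_fetch_add(&s->rx_dropped, 1);
}
```

Keeping the hot-path counters non-atomic avoids paying for atomic read-modify-write on every delivered packet.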
> +
> +/**
> + * struct hbl_en_port - manage port common structure.
> + * @hdev: habanalabs Ethernet device structure.
> + * @ndev: network device.
> + * @rx_wq: WQ for Rx poll when we cannot schedule NAPI poll.
> + * @mac_addr: HW MAC addresses.
> + * @asic_specific: ASIC specific port structure.
> + * @napi: New API structure.
> + * @rx_poll_work: Rx work for polling mode.
> + * @net_stats: statistics of the ethernet interface.
> + * @in_reset: true if the NIC was marked as in reset, false otherwise. Used to avoid an additional
> + * stopping of the NIC if a hard reset was re-initiated.
> + * @pflags: ethtool private flags bit mask.
> + * @idx: index of this specific port.
> + * @rx_max_coalesced_frames: Maximum number of packets to receive before an RX interrupt.
> + * @tx_max_coalesced_frames: Maximum number of packets to be sent before a TX interrupt.
> + * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after a packet arrives.
> + * @is_initialized: true if the port H/W is initialized, false otherwise.
> + * @pfc_enable: true if this port supports Priority Flow Control, false otherwise.
> + * @auto_neg_enable: is autoneg enabled.
> + * @auto_neg_resolved: was autoneg phase finished successfully.
> + */
> +struct hbl_en_port {
> + struct hbl_en_device *hdev;
> + struct net_device *ndev;
> + struct workqueue_struct *rx_wq;
> + char *mac_addr;
> + void *asic_specific;
> + struct napi_struct napi;
> + struct delayed_work rx_poll_work;
> + struct hbl_en_net_stats net_stats;
> + atomic_t in_reset;
> + u32 pflags;
> + u32 idx;
> + u32 rx_max_coalesced_frames;
> + u32 tx_max_coalesced_frames;
> + u16 rx_coalesce_usecs;
> + u8 is_initialized;
> + u8 pfc_enable;
> + u8 auto_neg_enable;
> + u8 auto_neg_resolved;
> +};
> +
> +/**
> + * struct hbl_en_asic_funcs - ASIC specific Ethernet functions.
> + * @dev_init: device init.
> + * @dev_fini: device cleanup.
> + * @reenable_rx_irq: re-enable Rx interrupts.
> + * @eth_port_open: initialize and open the Ethernet port.
> + * @eth_port_close: close the Ethernet port.
> + * @write_pkt_to_hw: write skb to HW.
> + * @read_pkt_from_hw: read pkt from HW.
> + * @get_pfc_cnts: get PFC counters.
> + * @set_coalesce: set Tx/Rx coalesce config in HW.
> + * @get_rx_ring_size: get the max number of elements the Rx ring can contain.
> + * @handle_eqe: Handle a received event.
> + */
> +struct hbl_en_asic_funcs {
> + int (*dev_init)(struct hbl_en_device *hdev);
> + void (*dev_fini)(struct hbl_en_device *hdev);
> + void (*reenable_rx_irq)(struct hbl_en_port *port);
> + int (*eth_port_open)(struct hbl_en_port *port);
> + void (*eth_port_close)(struct hbl_en_port *port);
> + netdev_tx_t (*write_pkt_to_hw)(struct hbl_en_port *port, struct sk_buff *skb);
> + int (*read_pkt_from_hw)(struct hbl_en_port *port, void **pkt_addr, u32 *pkt_size);
> + void (*get_pfc_cnts)(struct hbl_en_port *port, void *ptr);
> + int (*set_coalesce)(struct hbl_en_port *port);
> + int (*get_rx_ring_size)(struct hbl_en_port *port);
> + void (*handle_eqe)(struct hbl_aux_dev *aux_dev, u32 port_idx, struct hbl_cn_eqe *eqe);
> +};
> +
> +/**
> + * struct hbl_en_device - habanalabs Ethernet device structure.
> + * @pdev: pointer to PCI device.
> + * @dev: related kernel basic device structure.
> + * @ports: array of all ports manage common structures.
> + * @aux_dev: pointer to auxiliary device.
> + * @asic_specific: ASIC specific device structure.
> + * @fw_ver: FW version.
> + * @qsfp_eeprom: QSFP EEPROM info.
> + * @mac_addr: array of all MAC addresses.
> + * @asic_funcs: ASIC specific Ethernet functions.
> + * @asic_type: ASIC specific type.
> + * @ports_mask: mask of available ports.
> + * @auto_neg_mask: mask of ports with autonegotiation enabled.
> + * @port_reset_timeout: max time in seconds for a port reset flow to finish.
> + * @pending_reset_long_timeout: long timeout for pending hard reset to finish in seconds.
> + * @max_frm_len: maximum allowed frame length.
> + * @raw_elem_size: size of element in raw buffers.
> + * @max_raw_mtu: maximum MTU size for raw packets.
> + * @min_raw_mtu: minimum MTU size for raw packets.
> + * @pad_size: the pad size in bytes for the skb to transmit.
> + * @core_dev_id: core device ID.
> + * @max_num_of_ports: max number of available ports.
> + * @in_reset: is the entire NIC currently under reset.
> + * @poll_enable: Enable Rx polling rather than IRQ + NAPI.
> + * @in_teardown: true if the NIC is in teardown (during device remove).
> + * @is_initialized: was the device initialized successfully.
> + * @has_eq: true if event queue is supported.
> + * @dma_map_support: HW supports DMA mapping.
> + */
> +struct hbl_en_device {
> + struct pci_dev *pdev;
> + struct device *dev;
> + struct hbl_en_port *ports;
> + struct hbl_aux_dev *aux_dev;
> + void *asic_specific;
> + char *fw_ver;
> + char *qsfp_eeprom;
> + char *mac_addr;
> + struct hbl_en_asic_funcs asic_funcs;
> + enum hbl_cn_asic_type asic_type;
> + u64 ports_mask;
> + u64 auto_neg_mask;
> + u32 port_reset_timeout;
> + u32 pending_reset_long_timeout;
> + u32 max_frm_len;
> + u32 raw_elem_size;
> + u16 max_raw_mtu;
> + u16 min_raw_mtu;
> + u16 pad_size;
> + u16 core_dev_id;
> + u8 max_num_of_ports;
> + u8 in_reset;
> + u8 poll_enable;
> + u8 in_teardown;
> + u8 is_initialized;
> + u8 has_eq;
> + u8 dma_map_support;
> +};
> +
> +int hbl_en_dev_init(struct hbl_en_device *hdev);
> +void hbl_en_dev_fini(struct hbl_en_device *hdev);
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev);
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port);
> +
> +extern const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops;
> +
> +bool hbl_en_rx_poll_start(struct hbl_en_port *port);
> +void hbl_en_rx_poll_stop(struct hbl_en_port *port);
> +void hbl_en_rx_poll_trigger_init(struct hbl_en_port *port);
> +int hbl_en_port_reset(struct hbl_en_port *port);
> +int hbl_en_port_reset_locked(struct hbl_aux_dev *aux_dev, u32 port_idx);
> +int hbl_en_handle_rx(struct hbl_en_port *port, int budget);
> +dma_addr_t hbl_en_dma_map(struct hbl_en_device *hdev, void *addr, int len);
> +void hbl_en_dma_unmap(struct hbl_en_device *hdev, dma_addr_t dma_addr, int len);
> +
> +#endif /* HABANALABS_EN_H_ */
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> new file mode 100644
> index 000000000000..5d718579a2b6
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +
> +#define PFC_PRIO_MASK_ALL GENMASK(HBL_EN_PFC_PRIO_NUM - 1, 0)
> +#define PFC_PRIO_MASK_NONE 0
> +
> +#ifdef CONFIG_DCB
> +static int hbl_en_dcbnl_ieee_getpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> +
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't get PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + pfc->pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> + pfc->pfc_cap = HBL_EN_PFC_PRIO_NUM;
> +
> + hdev->asic_funcs.get_pfc_cnts(port, pfc);
> +
> + atomic_set(&port->in_reset, 0);
> +
> + return 0;
> +}
> +
> +static int hbl_en_dcbnl_ieee_setpfc(struct net_device *netdev, struct ieee_pfc *pfc)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u8 curr_pfc_en;
> + u32 port_idx;
> + int rc = 0;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (pfc->pfc_en & ~PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev, "PFC supports %d priorities only, port %d\n",
> + HBL_EN_PFC_PRIO_NUM, port_idx);
> + return -EINVAL;
> + }
> +
> + if (pfc->pfc_en != PFC_PRIO_MASK_NONE && pfc->pfc_en != PFC_PRIO_MASK_ALL) {
> + dev_dbg_ratelimited(hdev->dev,
> + "PFC should be enabled/disabled on all priorities, port %d\n",
> + port_idx);
> + return -EINVAL;
> + }
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_dbg_ratelimited(hdev->dev, "port %d is in reset, can't set PFC\n", port_idx);
> + return -EBUSY;
> + }
> +
> + curr_pfc_en = port->pfc_enable ? PFC_PRIO_MASK_ALL : PFC_PRIO_MASK_NONE;
> +
> + if (pfc->pfc_en == curr_pfc_en)
> + goto out;
> +
> + port->pfc_enable = !port->pfc_enable;
> +
> + rc = aux_ops->set_pfc(aux_dev, port_idx, port->pfc_enable);
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
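The two early checks in `hbl_en_dcbnl_ieee_setpfc()` encode a HW restriction: the requested mask may not contain bits above the supported priority count, and PFC can only be toggled for all priorities at once, never per-priority. A compact sketch of that validation (assuming 4 priorities for `HBL_EN_PFC_PRIO_NUM`, which is a stand-in value here):

```c
#include <assert.h>
#include <stdint.h>

#define PFC_PRIO_NUM	4	/* assumed stand-in for HBL_EN_PFC_PRIO_NUM */
#define PRIO_MASK_ALL	((1u << PFC_PRIO_NUM) - 1)	/* GENMASK(n - 1, 0) */
#define PRIO_MASK_NONE	0u

/* Mirrors the checks in hbl_en_dcbnl_ieee_setpfc(): reject masks with
 * unsupported priority bits, and reject partial masks since the HW can only
 * enable or disable PFC on all priorities at once. */
static int validate_pfc_mask(uint32_t pfc_en)
{
	if (pfc_en & ~PRIO_MASK_ALL)
		return -1;	/* unsupported priorities (-EINVAL in the driver) */
	if (pfc_en != PRIO_MASK_NONE && pfc_en != PRIO_MASK_ALL)
		return -1;	/* must be all-or-nothing */
	return 0;
}
```

Only once the mask passes both checks does the driver compare it against the current state and, if different, flip `pfc_enable` through the aux op.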
> +
> +static u8 hbl_en_dcbnl_getdcbx(struct net_device *netdev)
> +{
> + return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
> +}
> +
> +static u8 hbl_en_dcbnl_setdcbx(struct net_device *netdev, u8 mode)
> +{
> + return !(mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE));
> +}
> +
> +const struct dcbnl_rtnl_ops hbl_en_dcbnl_ops = {
> + .ieee_getpfc = hbl_en_dcbnl_ieee_getpfc,
> + .ieee_setpfc = hbl_en_dcbnl_ieee_setpfc,
> + .getdcbx = hbl_en_dcbnl_getdcbx,
> + .setdcbx = hbl_en_dcbnl_setdcbx
> +};
> +#endif
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> new file mode 100644
> index 000000000000..23a87d36ded5
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#define pr_fmt(fmt) "habanalabs_en: " fmt
> +
> +#include "hbl_en.h"
> +
> +#include <linux/module.h>
> +#include <linux/auxiliary_bus.h>
> +
> +#define HBL_DRIVER_AUTHOR "HabanaLabs Kernel Driver Team"
> +
> +#define HBL_DRIVER_DESC "HabanaLabs AI accelerators Ethernet driver"
> +
> +MODULE_AUTHOR(HBL_DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(HBL_DRIVER_DESC);
> +MODULE_LICENSE("GPL");
> +
> +static bool poll_enable;
> +
> +module_param(poll_enable, bool, 0444);
> +MODULE_PARM_DESC(poll_enable,
> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
> +
> +static int hdev_init(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_aux_data *aux_data = aux_dev->aux_data;
> + struct hbl_en_port *ports, *port;
> + struct hbl_en_device *hdev;
> + int rc, i;
> +
> + hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
> + if (!hdev)
> + return -ENOMEM;
> +
> + ports = kcalloc(aux_data->max_num_of_ports, sizeof(*ports), GFP_KERNEL);
> + if (!ports) {
> + rc = -ENOMEM;
> + goto ports_alloc_fail;
> + }
> +
> + aux_dev->priv = hdev;
> + hdev->aux_dev = aux_dev;
> + hdev->ports = ports;
> + hdev->pdev = aux_data->pdev;
> + hdev->dev = aux_data->dev;
> + hdev->ports_mask = aux_data->ports_mask;
> + hdev->auto_neg_mask = aux_data->auto_neg_mask;
> + hdev->max_num_of_ports = aux_data->max_num_of_ports;
> + hdev->core_dev_id = aux_data->id;
> + hdev->fw_ver = aux_data->fw_ver;
> + hdev->qsfp_eeprom = aux_data->qsfp_eeprom;
> + hdev->asic_type = aux_data->asic_type;
> + hdev->pending_reset_long_timeout = aux_data->pending_reset_long_timeout;
> + hdev->max_frm_len = aux_data->max_frm_len;
> + hdev->raw_elem_size = aux_data->raw_elem_size;
> + hdev->max_raw_mtu = aux_data->max_raw_mtu;
> + hdev->min_raw_mtu = aux_data->min_raw_mtu;
> + hdev->pad_size = ETH_ZLEN;
> + hdev->has_eq = aux_data->has_eq;
> + hdev->dma_map_support = true;
> + hdev->poll_enable = poll_enable;
> +
> + for (i = 0; i < hdev->max_num_of_ports; i++) {
> + if (!(hdev->ports_mask & BIT(i)))
> + continue;
> +
> + port = &hdev->ports[i];
> + port->hdev = hdev;
> + port->idx = i;
> + port->pfc_enable = true;
> + port->pflags = PFLAGS_PCS_LINK_CHECK | PFLAGS_PHY_AUTO_NEG_LPBK;
> + port->mac_addr = aux_data->mac_addr[i];
> + port->auto_neg_enable = !!(aux_data->auto_neg_mask & BIT(i));
> + }
> +
> + return 0;
> +
> +ports_alloc_fail:
> + kfree(hdev);
> +
> + return rc;
> +}
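Both `hdev_init()` here and `hbl_en_dev_init()` earlier iterate the full port array and skip indices whose bit is clear in `ports_mask`, so an enabled port's `idx` is always its absolute position, not its position among enabled ports. A small sketch of that iteration pattern (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1ull << (n))

/* Mirrors the port loops in hdev_init()/hbl_en_dev_init(): ports whose bit
 * is clear in ports_mask are skipped entirely, so the index recorded for an
 * enabled port is its absolute position in the full array. */
static int init_enabled_ports(uint64_t ports_mask, int max_ports, int *idx_out)
{
	int i, cnt = 0;

	for (i = 0; i < max_ports; i++) {
		if (!(ports_mask & BIT(i)))
			continue;
		idx_out[cnt++] = i;	/* port->idx = i in the driver */
	}

	return cnt;
}
```

Keeping absolute indices is what allows cleanup paths such as the `unregister_ports` label to re-walk `[0, port_cnt)` with the same mask test and land on exactly the ports that were initialized.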
> +
> +static void hdev_fini(struct hbl_aux_dev *aux_dev)
> +{
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + kfree(hdev->ports);
> + kfree(hdev);
> + aux_dev->priv = NULL;
> +}
> +
> +static const struct auxiliary_device_id hbl_en_id_table[] = {
> + { .name = "habanalabs_cn.en", },
> + {},
> +};
> +
> +MODULE_DEVICE_TABLE(auxiliary, hbl_en_id_table);
> +
> +static int hbl_en_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_aux_ops *aux_ops = aux_dev->aux_ops;
> + struct hbl_en_device *hdev;
> + ktime_t timeout;
> + int rc;
> +
> + rc = hdev_init(aux_dev);
> + if (rc) {
> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> + return rc;
> + }
> +
> + hdev = aux_dev->priv;
> +
> + /* don't allow module unloading while it is attached */
> + if (!try_module_get(THIS_MODULE)) {
> + dev_err(hdev->dev, "Failed to increment %s module refcount\n", HBL_EN_NAME);
> + rc = -EIO;
> + goto module_get_err;
> + }
> +
> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> + while (1) {
> + aux_ops->hw_access_lock(aux_dev);
> +
> + /* if the device is operational, proceed to actual init while holding the lock in
> + * order to prevent concurrent hard reset
> + */
> + if (aux_ops->device_operational(aux_dev))
> + break;
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + if (ktime_compare(ktime_get(), timeout) > 0) {
> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> + rc = -EBUSY;
> + goto timeout_err;
> + }
> +
> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing en\n");
> +
> + msleep_interruptible(MSEC_PER_SEC);
> + }
> +
> + rc = hbl_en_dev_init(hdev);
> + if (rc) {
> + dev_err(hdev->dev, "Failed to init en device\n");
> + goto dev_init_err;
> + }
> +
> + aux_ops->hw_access_unlock(aux_dev);
> +
> + return 0;
> +
> +dev_init_err:
> + aux_ops->hw_access_unlock(aux_dev);
> +timeout_err:
> + module_put(THIS_MODULE);
> +module_get_err:
> + hdev_fini(aux_dev);
> +
> + return rc;
> +}
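The probe's wait loop is a classic bounded poll: keep testing whether the core device is operational, sleep between attempts, and give up with `-EBUSY` once the deadline passes (the subtlety being that on success the loop exits with the HW access lock still held). A deterministic userspace sketch of the pattern, with time and the readiness predicate abstracted out (all names here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the wait loop in hbl_en_probe(): keep checking a readiness
 * predicate, give up once the deadline is reached. "now" advancing by
 * "tick" stands in for msleep_interruptible(). */
static int wait_operational(bool (*operational)(void *ctx), void *ctx,
			    long now, long timeout, long tick)
{
	while (1) {
		if (operational(ctx))
			return 0;	/* proceed with init; lock held in the driver */
		if (now > timeout)
			return -1;	/* -EBUSY in the driver */
		now += tick;
	}
}

/* Test helper: becomes ready after a fixed number of polls. */
struct ready_after { int calls_left; };

static bool ready_cb(void *ctx)
{
	struct ready_after *r = ctx;
	return r->calls_left-- <= 0;
}
```

Checking the predicate before the deadline guarantees at least one attempt even with a zero timeout, matching the structure of the driver's `while (1)` loop.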
> +
> +/* This function can be called only from the CN driver when deleting the aux bus, because we
> + * incremented the module refcount on probing. Hence no need to protect here from hard reset.
> + */
> +static void hbl_en_remove(struct auxiliary_device *adev)
> +{
> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> + struct hbl_en_device *hdev = aux_dev->priv;
> +
> + if (!hdev)
> + return;
> +
> + hbl_en_dev_fini(hdev);
> +
> + /* allow module unloading as now it is detached */
> + module_put(THIS_MODULE);
> +
> + hdev_fini(aux_dev);
> +}
> +
> +static struct auxiliary_driver hbl_en_driver = {
> + .name = "eth",
> + .probe = hbl_en_probe,
> + .remove = hbl_en_remove,
> + .id_table = hbl_en_id_table,
> +};
> +
> +static int __init hbl_en_init(void)
> +{
> + pr_info("loading driver\n");
> +
> + return auxiliary_driver_register(&hbl_en_driver);
> +}
> +
> +static void __exit hbl_en_exit(void)
> +{
> + auxiliary_driver_unregister(&hbl_en_driver);
> +
> + pr_info("driver removed\n");
> +}
> +
> +module_init(hbl_en_init);
> +module_exit(hbl_en_exit);
> diff --git a/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> new file mode 100644
> index 000000000000..1d14d283409b
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> @@ -0,0 +1,452 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2020-2024 HabanaLabs, Ltd.
> + * Copyright (C) 2023-2024, Intel Corporation.
> + * All Rights Reserved.
> + */
> +
> +#include "hbl_en.h"
> +#include <linux/ethtool.h>
> +
> +#define RX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MIN 1
> +#define TX_COALESCED_FRAMES_MAX 10
> +
> +static const char pflags_str[][ETH_GSTRING_LEN] = {
> + "pcs-link-check",
> + "phy-auto-neg-lpbk",
> +};
> +
> +#define NIC_STAT(m) {#m, offsetof(struct hbl_en_port, net_stats.m)}
> +
> +static struct hbl_cn_stat netdev_eth_stats[] = {
> + NIC_STAT(rx_packets),
> + NIC_STAT(tx_packets),
> + NIC_STAT(rx_bytes),
> + NIC_STAT(tx_bytes),
> + NIC_STAT(tx_errors),
> + NIC_STAT(rx_dropped),
> + NIC_STAT(tx_dropped)
> +};
> +
> +static size_t pflags_str_len = ARRAY_SIZE(pflags_str);
> +static size_t netdev_eth_stats_len = ARRAY_SIZE(netdev_eth_stats);
> +
> +static void hbl_en_ethtool_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *drvinfo)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> +
> + strscpy(drvinfo->driver, HBL_EN_NAME, sizeof(drvinfo->driver));
> + strscpy(drvinfo->fw_version, hdev->fw_ver, sizeof(drvinfo->fw_version));
> + strscpy(drvinfo->bus_info, pci_name(hdev->pdev), sizeof(drvinfo->bus_info));
> +}
> +
> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
> +{
> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> + modinfo->type = ETH_MODULE_SFF_8636;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
> + u8 *data)
> +{
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 first, last, len;
> + u8 *qsfp_eeprom;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + qsfp_eeprom = hdev->qsfp_eeprom;
> +
> + if (ee->len == 0)
> + return -EINVAL;
> +
> + first = ee->offset;
> + last = ee->offset + ee->len;
> +
> + if (first < ETH_MODULE_SFF_8636_LEN) {
> + len = min_t(unsigned int, last, ETH_MODULE_SFF_8636_LEN);
> + len -= first;
> +
> + memcpy(data, qsfp_eeprom + first, len);
> + }
> +
> + return 0;
> +}
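The EEPROM read above is a clamp-and-copy over a cached image: the requested `[offset, offset + len)` window is truncated at the end of the module EEPROM, and reads starting past the end copy nothing. A userspace sketch of that window arithmetic (`EEPROM_LEN` and the function name are assumed stand-ins, not driver symbols):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_LEN 256	/* assumed stand-in for ETH_MODULE_SFF_8636_LEN */

/* Clamp-and-copy pattern from hbl_en_ethtool_get_module_eeprom(): copy the
 * requested [offset, offset + len) window out of a cached EEPROM image,
 * truncating the window at the end of the image. Returns bytes copied. */
static uint32_t eeprom_read(const uint8_t *eeprom, uint32_t offset, uint32_t len,
			    uint8_t *out)
{
	uint32_t first = offset, last = offset + len, n;

	if (first >= EEPROM_LEN)
		return 0;

	if (last > EEPROM_LEN)
		last = EEPROM_LEN;	/* min_t(unsigned int, last, EEPROM_LEN) */
	n = last - first;

	memcpy(out, eeprom + first, n);
	return n;
}
```

Clamping against the same length reported via `get_module_info` keeps the two ethtool callbacks consistent: userspace can never read past the image it was told exists.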
> +
> +static u32 hbl_en_ethtool_get_priv_flags(struct net_device *ndev)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + return port->pflags;
> +}
> +
> +static int hbl_en_ethtool_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> +
> + port->pflags = priv_flags;
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
> + struct ethtool_link_ksettings *cmd)
> +{
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + u32 port_idx, speed;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + speed = aux_ops->get_speed(aux_dev, port_idx);
> +
> + cmd->base.speed = speed;
> + cmd->base.duplex = DUPLEX_FULL;
> +
> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> + switch (speed) {
> + case SPEED_100000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> + cmd->base.port = PORT_FIBRE;
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
> + break;
> + case SPEED_50000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
> + break;
> + case SPEED_25000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
> + break;
> + case SPEED_200000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
> + break;
> + case SPEED_400000:
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
> +
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
> + break;
> + default:
> + netdev_err(port->ndev, "unknown speed %d\n", speed);
> + return -EFAULT;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
> +
> + if (port->auto_neg_enable) {
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
> + cmd->base.autoneg = AUTONEG_ENABLE;
> + if (port->auto_neg_resolved)
> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> + } else {
> + cmd->base.autoneg = AUTONEG_DISABLE;
> + }
> +
> + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
> +
> + if (port->pfc_enable)
> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
> +
> + return 0;
> +}
> +
> +/* only autoneg is mutable */
> +static bool check_immutable_ksettings(const struct ethtool_link_ksettings *old_cmd,
> + const struct ethtool_link_ksettings *new_cmd)
> +{
> + return (old_cmd->base.speed == new_cmd->base.speed) &&
> + (old_cmd->base.duplex == new_cmd->base.duplex) &&
> + (old_cmd->base.port == new_cmd->base.port) &&
> + (old_cmd->base.phy_address == new_cmd->base.phy_address) &&
> + (old_cmd->base.eth_tp_mdix_ctrl == new_cmd->base.eth_tp_mdix_ctrl) &&
> + bitmap_equal(old_cmd->link_modes.advertising, new_cmd->link_modes.advertising,
> + __ETHTOOL_LINK_MODE_MASK_NBITS);
> +}
> +
> +static int
> +hbl_en_ethtool_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd)
> +{
> + struct ethtool_link_ksettings curr_cmd;
> + struct hbl_en_device *hdev;
> + struct hbl_en_port *port;
> + bool auto_neg;
> + u32 port_idx;
> + int rc;
> +
> + port = hbl_netdev_priv(ndev);
> + hdev = port->hdev;
> + port_idx = port->idx;
> +
> + memset(&curr_cmd, 0, sizeof(struct ethtool_link_ksettings));
> +
> + rc = hbl_en_ethtool_get_link_ksettings(ndev, &curr_cmd);
> + if (rc)
> + return rc;
> +
> + if (!check_immutable_ksettings(&curr_cmd, cmd))
> + return -EOPNOTSUPP;
> +
> + auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
> +
> + if (port->auto_neg_enable == auto_neg)
> + return 0;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
> + rc = -EFAULT;
> + goto out;
> + }
> +
> + port->auto_neg_enable = auto_neg;
> +
> + if (netif_running(port->ndev)) {
> + rc = hbl_en_port_reset(port);
> + if (rc)
> + netdev_err(port->ndev, "Failed to reset port for settings update, rc %d\n",
> + rc);
> + }
> +
> +out:
> + atomic_set(&port->in_reset, 0);
> +
> + return rc;
> +}
> +
> +static int hbl_en_ethtool_get_sset_count(struct net_device *ndev, int sset)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (sset) {
> + case ETH_SS_STATS:
> + return netdev_eth_stats_len + aux_ops->get_cnts_num(aux_dev, port_idx);
> + case ETH_SS_PRIV_FLAGS:
> + return pflags_str_len;
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int i;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + switch (stringset) {
> + case ETH_SS_STATS:
> + for (i = 0; i < netdev_eth_stats_len; i++)
> + ethtool_puts(&data, netdev_eth_stats[i].str);
> +
> + aux_ops->get_cnts_names(aux_dev, port_idx, data);
> + break;
> + case ETH_SS_PRIV_FLAGS:
> + for (i = 0; i < pflags_str_len; i++)
> + ethtool_puts(&data, pflags_str[i]);
> + break;
> + }
> +}
> +
> +static void hbl_en_ethtool_get_ethtool_stats(struct net_device *ndev,
> + __always_unused struct ethtool_stats *stats, u64 *data)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + struct hbl_en_device *hdev;
> + u32 port_idx;
> + char *p;
> + int i;
> +
> + hdev = port->hdev;
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> + port_idx = port->idx;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + dev_info_ratelimited(hdev->dev, "port %d is in reset, can't get ethtool stats",
> + port_idx);
> + return;
> + }
> +
> +	/* Even though the Ethernet Rx/Tx flow might update the stats in parallel, there is no
> +	 * absolute need for synchronization. Missing a few counts of these stats is much better
> +	 * than adding a lock that would increase the overhead of the Rx/Tx flows. In the worst
> +	 * case, the reader will get stale stats and will receive the updated stats on the next
> +	 * read.
> +	 */
> + for (i = 0; i < netdev_eth_stats_len; i++) {
> + p = (char *)port + netdev_eth_stats[i].lo_offset;
> + data[i] = *(u32 *)p;
> + }
> +
> + data += i;
> +
> + aux_ops->get_cnts_values(aux_dev, port_idx, data);
> +
> + atomic_set(&port->in_reset, 0);
> +}
> +
> +static int hbl_en_ethtool_get_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + coal->tx_max_coalesced_frames = port->tx_max_coalesced_frames;
> + coal->rx_coalesce_usecs = port->rx_coalesce_usecs;
> + coal->rx_max_coalesced_frames = port->rx_max_coalesced_frames;
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> + return 0;
> +}
> +
> +static int hbl_en_ethtool_set_coalesce(struct net_device *ndev,
> + struct ethtool_coalesce *coal,
> + struct kernel_ethtool_coalesce *kernel_coal,
> + struct netlink_ext_ack *extack)
> +{
> + struct hbl_en_port *port = hbl_netdev_priv(ndev);
> + struct hbl_en_device *hdev = port->hdev;
> + struct hbl_en_aux_ops *aux_ops;
> + struct hbl_aux_dev *aux_dev;
> + u32 port_idx = port->idx;
> + int rc, rx_ring_size;
> +
> + aux_dev = hdev->aux_dev;
> + aux_ops = aux_dev->aux_ops;
> +
> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> + netdev_err(port->ndev, "port is in reset, can't update settings\n");
> + return -EBUSY;
> + }
> +
> + if (coal->tx_max_coalesced_frames < TX_COALESCED_FRAMES_MIN ||
> + coal->tx_max_coalesced_frames > TX_COALESCED_FRAMES_MAX) {
> + netdev_err(ndev, "tx max_coalesced_frames should be between %d and %d\n",
> + TX_COALESCED_FRAMES_MIN, TX_COALESCED_FRAMES_MAX);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + rx_ring_size = hdev->asic_funcs.get_rx_ring_size(port);
> + if (coal->rx_max_coalesced_frames < RX_COALESCED_FRAMES_MIN ||
> + coal->rx_max_coalesced_frames >= rx_ring_size) {
> + netdev_err(ndev, "rx max_coalesced_frames should be between %d and %d\n",
> + RX_COALESCED_FRAMES_MIN, rx_ring_size);
> + rc = -EINVAL;
> + goto atomic_out;
> + }
> +
> + aux_ops->ctrl_lock(aux_dev, port_idx);
> +
> + port->tx_max_coalesced_frames = coal->tx_max_coalesced_frames;
> + port->rx_coalesce_usecs = coal->rx_coalesce_usecs;
> + port->rx_max_coalesced_frames = coal->rx_max_coalesced_frames;
> +
> + rc = hdev->asic_funcs.set_coalesce(port);
> +
> + aux_ops->ctrl_unlock(aux_dev, port_idx);
> +
> +atomic_out:
> + atomic_set(&port->in_reset, 0);
> + return rc;
> +}
> +
> +void hbl_en_ethtool_init_coalesce(struct hbl_en_port *port)
> +{
> + port->rx_coalesce_usecs = CQ_ARM_TIMEOUT_USEC;
> + port->rx_max_coalesced_frames = 1;
> + port->tx_max_coalesced_frames = 1;
> +}
> +
> +static const struct ethtool_ops hbl_en_ethtool_ops_coalesce = {
> + .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES |
> + ETHTOOL_COALESCE_TX_MAX_FRAMES,
> + .get_drvinfo = hbl_en_ethtool_get_drvinfo,
> + .get_link = ethtool_op_get_link,
> + .get_module_info = hbl_en_ethtool_get_module_info,
> + .get_module_eeprom = hbl_en_ethtool_get_module_eeprom,
> + .get_priv_flags = hbl_en_ethtool_get_priv_flags,
> + .set_priv_flags = hbl_en_ethtool_set_priv_flags,
> + .get_link_ksettings = hbl_en_ethtool_get_link_ksettings,
> + .set_link_ksettings = hbl_en_ethtool_set_link_ksettings,
> + .get_sset_count = hbl_en_ethtool_get_sset_count,
> + .get_strings = hbl_en_ethtool_get_strings,
> + .get_ethtool_stats = hbl_en_ethtool_get_ethtool_stats,
> + .get_coalesce = hbl_en_ethtool_get_coalesce,
> + .set_coalesce = hbl_en_ethtool_set_coalesce,
> +};
> +
> +const struct ethtool_ops *hbl_en_ethtool_get_ops(struct net_device *ndev)
> +{
> + return &hbl_en_ethtool_ops_coalesce;
> +}
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-14 22:48 ` Joe Damato
@ 2024-06-16 1:04 ` Andrew Lunn
2024-06-18 19:37 ` Omer Shpigelman
1 sibling, 0 replies; 107+ messages in thread
From: Andrew Lunn @ 2024-06-16 1:04 UTC (permalink / raw)
To: Joe Damato, Omer Shpigelman, linux-kernel, linux-rdma, netdev,
dri-devel, ogabbay, zyehudai
On Fri, Jun 14, 2024 at 03:48:43PM -0700, Joe Damato wrote:
> On Thu, Jun 13, 2024 at 11:22:02AM +0300, Omer Shpigelman wrote:
> > This ethernet driver is initialized via auxiliary bus by the hbl_cn
> > driver.
> > It serves mainly for control operations that are needed for AI scaling.
> >
> > Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> > Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> > Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> > Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> > Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> > Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> > Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> > Co-developed-by: David Meriin <dmeriin@habana.ai>
> > Signed-off-by: David Meriin <dmeriin@habana.ai>
> > Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> > Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> > Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
Hi Joe
Please trim emails to include just the relevant context when
replying. It is hard to see your comments, and so it is likely some
will be missed, and you will need to make the same comment on v2.
Andrew
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-15 17:13 ` Zhu Yanjun
@ 2024-06-16 1:08 ` Andrew Lunn
0 siblings, 0 replies; 107+ messages in thread
From: Andrew Lunn @ 2024-06-16 1:08 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Omer Shpigelman, linux-kernel, linux-rdma, netdev, dri-devel,
ogabbay, zyehudai
> > +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
> > +{
> > + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> > +
> > + port->net_stats.rx_packets = 0;
> > + port->net_stats.tx_packets = 0;
> > + port->net_stats.rx_bytes = 0;
> > + port->net_stats.tx_bytes = 0;
> > + port->net_stats.tx_errors = 0;
> > + atomic64_set(&port->net_stats.rx_dropped, 0);
> > + atomic64_set(&port->net_stats.tx_dropped, 0);
>
> per-cpu variable is better?
Please trim replies to just the needed context. Is this the only
comment in this 2300-line email? Do I need to keep searching for more
comments?
Andrew
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-13 13:01 ` Przemek Kitszel
2024-06-13 14:16 ` Przemek Kitszel
@ 2024-06-17 8:08 ` Omer Shpigelman
2024-06-17 11:48 ` Leon Romanovsky
1 sibling, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-17 8:08 UTC (permalink / raw)
To: Przemek Kitszel, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
On 6/13/24 16:01, Przemek Kitszel wrote:
>
> On 6/13/24 10:21, Omer Shpigelman wrote:
>> Add the hbl_cn driver which will serve both Ethernet and InfiniBand
>> drivers.
>> hbl_cn is the layer which is used by the satellite drivers for many shared
>> operations that are needed by both EN and IB subsystems like QPs, CQs etc.
>> The CN driver is initialized via auxiliary bus by the habanalabs driver.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>> ---
>> .../device_drivers/ethernet/index.rst | 1 +
>> .../device_drivers/ethernet/intel/hbl.rst | 82 +
>> MAINTAINERS | 11 +
>> drivers/net/ethernet/intel/Kconfig | 20 +
>> drivers/net/ethernet/intel/Makefile | 1 +
>> drivers/net/ethernet/intel/hbl_cn/Makefile | 9 +
>> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5954 +++++++++++++++++
>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1627 +++++
>> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 220 +
>> .../intel/hbl_cn/common/hbl_cn_memory.c | 40 +
>> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 33 +
>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 13 +
>> include/linux/habanalabs/cpucp_if.h | 125 +-
>> include/linux/habanalabs/hl_boot_if.h | 9 +-
>> include/linux/net/intel/cn.h | 474 ++
>> include/linux/net/intel/cn_aux.h | 298 +
>> include/linux/net/intel/cni.h | 636 ++
>> 18 files changed, 9545 insertions(+), 11 deletions(-)
>
> this is a very big patch, it asks for a split; what's worse, it's
> proportional to the size of this series:
> 146 files changed, 148514 insertions(+), 70 deletions(-)
> which is just too big
>
> [...]
>
Yeah, well I'm limited to 15 patches per patch set according to the kernel
documentation, so I had to have this big patch.
Our changes are contained in 4 different drivers, and all of them should be
merged together so the HW will be operational.
Hence I had to squeeze some code into a big patch.
>> +Support
>> +=======
>> +For general information, go to the Intel support website at:
>> +https://www.intel.com/support/
>> +
>> +If an issue is identified with the released source code on a supported kernel
>> +with a supported adapter, email the specific information related to the issue
>> +to intel-wired-lan@lists.osuosl.org.
>
> I'm welcoming you to post next version of the driver to the IWL mailing
> list, and before that, to go through our Intel path for ethernet
> subsystem (rdma and a few smaller ones also go through that)
> (that starts internally, I will PM you the details)
>
> [...]
>
Ok, I'll go through the Intel path first.
>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
>> @@ -0,0 +1,5954 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright 2020-2024 HabanaLabs, Ltd.
>> + * Copyright (C) 2023-2024, Intel Corporation.
>> + * All Rights Reserved.
>> + */
>> +
>> +#include "hbl_cn.h"
>> +
>> +#include <linux/file.h>
>> +#include <linux/module.h>
>> +#include <linux/overflow.h>
>> +#include <linux/pci.h>
>> +#include <linux/slab.h>
>> +
>> +#define NIC_MIN_WQS_PER_PORT 2
>> +
>> +#define NIC_SEQ_RESETS_TIMEOUT_MS 15000 /* 15 seconds */
>> +#define NIC_MAX_SEQ_RESETS 3
>> +
>> +#define HBL_CN_IPV4_PROTOCOL_UDP 17
>> +
>> +/* SOB mask is not expected to change across ASIC. Hence common defines. */
>> +#define NIC_SOB_INC_MASK 0x80000000
>> +#define NIC_SOB_VAL_MASK 0x7fff
>> +
>> +#define NIC_DUMP_QP_SZ SZ_4K
>> +
>> +#define HBL_AUX2NIC(aux_dev) \
>> + ({ \
>> + struct hbl_aux_dev *__aux_dev = (aux_dev); \
>> + ((__aux_dev)->type == HBL_AUX_DEV_ETH) ? \
>> + container_of(__aux_dev, struct hbl_cn_device, en_aux_dev) : \
>> + container_of(__aux_dev, struct hbl_cn_device, ib_aux_dev); \
>> + })
>
> this should be a function
>
I'll switch it to a function.
>> +
>> +#define RAND_STAT_CNT(cnt) \
>> + do { \
>> + u32 __cnt = get_random_u32(); \
>> + (cnt) = __cnt; \
>> + dev_info(hdev->dev, "port %d, %s: %u\n", port, #cnt, __cnt); \
>
> no way for such message, ditto for the function
>
The thing is that I'd like to print the counter name and its value.
For that I need to stringify the counter name.
IMO it is nicer to have the current code rather than something like:
RAND_STAT_CNT(status->high_ber_reinit,
__stringify(status->high_ber_reinit));
RAND_STAT_CNT(status->correctable_err_cnt,
__stringify(status->correctable_err_cnt));
or:
RAND_STAT_CNT(status->high_ber_reinit, "high_ber_reinit");
RAND_STAT_CNT(status->correctable_err_cnt, "correctable_err_cnt");
But I'll change it if it's not common to print from macros.
>> + } while (0)
>> +
>> +struct hbl_cn_stat hbl_cn_mac_fec_stats[] = {
>> + {"correctable_errors", 0x2, 0x3},
>> + {"uncorrectable_errors", 0x4, 0x5}
>> +};
>> +
>> +struct hbl_cn_stat hbl_cn_mac_stats_rx[] = {
>> + {"Octets", 0x0},
>> + {"OctetsReceivedOK", 0x4},
>> + {"aAlignmentErrors", 0x8},
>> + {"aPAUSEMACCtrlFramesReceived", 0xC},
>> + {"aFrameTooLongErrors", 0x10},
>> + {"aInRangeLengthErrors", 0x14},
>> + {"aFramesReceivedOK", 0x18},
>> + {"aFrameCheckSequenceErrors", 0x1C},
>> + {"VLANReceivedOK", 0x20},
>> + {"ifInErrors", 0x24},
>> + {"ifInUcastPkts", 0x28},
>> + {"ifInMulticastPkts", 0x2C},
>> + {"ifInBroadcastPkts", 0x30},
>> + {"DropEvents", 0x34},
>> + {"Pkts", 0x38},
>> + {"UndersizePkts", 0x3C},
>> + {"Pkts64Octets", 0x40},
>> + {"Pkts65to127Octets", 0x44},
>> + {"Pkts128to255Octets", 0x48},
>> + {"Pkts256to511Octets", 0x4C},
>> + {"Pkts512to1023Octets", 0x50},
>> + {"Pkts1024to1518Octets", 0x54},
>> + {"Pkts1519toMaxOctets", 0x58},
>> + {"OversizePkts", 0x5C},
>> + {"Jabbers", 0x60},
>> + {"Fragments", 0x64},
>> + {"aCBFCPAUSERx0", 0x68},
>> + {"aCBFCPAUSERx1", 0x6C},
>> + {"aCBFCPAUSERx2", 0x70},
>> + {"aCBFCPAUSERx3", 0x74},
>> + {"aCBFCPAUSERx4", 0x78},
>> + {"aCBFCPAUSERx5", 0x7C},
>> + {"aCBFCPAUSERx6", 0x80},
>> + {"aCBFCPAUSERx7", 0x84},
>> + {"aMACControlFramesReceived", 0x88}
>> +};
>> +
>> +struct hbl_cn_stat hbl_cn_mac_stats_tx[] = {
>> + {"Octets", 0x0},
>> + {"OctetsTransmittedOK", 0x4},
>> + {"aPAUSEMACCtrlFramesTransmitted", 0x8},
>> + {"aFramesTransmittedOK", 0xC},
>> + {"VLANTransmittedOK", 0x10},
>> + {"ifOutErrors", 0x14},
>> + {"ifOutUcastPkts", 0x18},
>> + {"ifOutMulticastPkts", 0x1C},
>> + {"ifOutBroadcastPkts", 0x20},
>> + {"Pkts64Octets", 0x24},
>> + {"Pkts65to127Octets", 0x28},
>> + {"Pkts128to255Octets", 0x2C},
>> + {"Pkts256to511Octets", 0x30},
>> + {"Pkts512to1023Octets", 0x34},
>> + {"Pkts1024to1518Octets", 0x38},
>> + {"Pkts1519toMaxOctets", 0x3C},
>> + {"aCBFCPAUSETx0", 0x40},
>> + {"aCBFCPAUSETx1", 0x44},
>> + {"aCBFCPAUSETx2", 0x48},
>> + {"aCBFCPAUSETx3", 0x4C},
>> + {"aCBFCPAUSETx4", 0x50},
>> + {"aCBFCPAUSETx5", 0x54},
>> + {"aCBFCPAUSETx6", 0x58},
>> + {"aCBFCPAUSETx7", 0x5C},
>> + {"aMACControlFramesTx", 0x60},
>> + {"Pkts", 0x64}
>> +};
>> +
>> +static const char pcs_counters_str[][ETH_GSTRING_LEN] = {
>> + {"pcs_local_faults"},
>> + {"pcs_remote_faults"},
>> + {"pcs_remote_fault_reconfig"},
>> + {"pcs_link_restores"},
>> + {"pcs_link_toggles"},
>> +};
>> +
>> +static size_t pcs_counters_str_len = ARRAY_SIZE(pcs_counters_str);
>> +size_t hbl_cn_mac_fec_stats_len = ARRAY_SIZE(hbl_cn_mac_fec_stats);
>> +size_t hbl_cn_mac_stats_rx_len = ARRAY_SIZE(hbl_cn_mac_stats_rx);
>> +size_t hbl_cn_mac_stats_tx_len = ARRAY_SIZE(hbl_cn_mac_stats_tx);
>
> why those are not const?
>
I'll add const.
>> +
>> +static void qps_stop(struct hbl_cn_device *hdev);
>> +static void qp_destroy_work(struct work_struct *work);
>> +static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
>> +static void user_cq_destroy(struct kref *kref);
>> +static void set_app_params_clear(struct hbl_cn_device *hdev);
>> +static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
>> + void *output);
>> +static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
>> + struct hbl_ib_mem_info *info);
>> +
>> +static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port);
>> +static void hbl_cn_late_init(struct hbl_cn_device *hdev);
>> +static void hbl_cn_late_fini(struct hbl_cn_device *hdev);
>> +static int hbl_cn_sw_init(struct hbl_cn_device *hdev);
>> +static void hbl_cn_sw_fini(struct hbl_cn_device *hdev);
>> +static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full);
>> +static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags);
>> +static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port);
>> +
>> +static int hbl_cn_request_irqs(struct hbl_cn_device *hdev)
>> +{
>> + struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
>> +
>> + return asic_funcs->request_irqs(hdev);
>> +}
>> +
>> +static void hbl_cn_free_irqs(struct hbl_cn_device *hdev)
>> +{
>> + struct hbl_cn_asic_funcs *asic_funcs = hdev->asic_funcs;
>> +
>> + asic_funcs->free_irqs(hdev);
>> +}
>> +
>> +static void hbl_cn_synchronize_irqs(struct hbl_aux_dev *cn_aux_dev)
>> +{
>> + struct hbl_cn_device *hdev = cn_aux_dev->priv;
>> + struct hbl_cn_asic_funcs *asic_funcs;
>> +
>> + asic_funcs = hdev->asic_funcs;
>> +
>> + asic_funcs->synchronize_irqs(hdev);
>> +}
>> +
>> +void hbl_cn_get_frac_info(u64 numerator, u64 denominator, u64 *integer, u64 *exp)
>> +{
>> + u64 high_digit_n, high_digit_d, integer_tmp, exp_tmp;
>> + u8 num_digits_n, num_digits_d;
>> + int i;
>> +
>> + num_digits_d = hbl_cn_get_num_of_digits(denominator);
>> + high_digit_d = denominator;
>> + for (i = 0; i < num_digits_d - 1; i++)
>> + high_digit_d /= 10;
>> +
>> + integer_tmp = 0;
>> + exp_tmp = 0;
>> +
>> + if (numerator) {
>> + num_digits_n = hbl_cn_get_num_of_digits(numerator);
>> + high_digit_n = numerator;
>> + for (i = 0; i < num_digits_n - 1; i++)
>> + high_digit_n /= 10;
>> +
>> + exp_tmp = num_digits_d - num_digits_n;
>> +
>> + if (high_digit_n < high_digit_d) {
>> + high_digit_n *= 10;
>> + exp_tmp++;
>> + }
>> +
>> + integer_tmp = div_u64(high_digit_n, high_digit_d);
>> + }
>> +
>> + *integer = integer_tmp;
>> + *exp = exp_tmp;
>> +}
>
> this function sounds suspicious for a network driver, what do you need
> it for?
>
Some of our counters are exposed by the HW in a numerator/denominator
representation, and we'd like to expose them to the user in an exponent
representation.
This function converts a counter value from one representation to the
other.
>> +
>> +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
>> +{
>> + struct hbl_cn_device *hdev = cn_port->hdev;
>> + struct hbl_cn_asic_port_funcs *port_funcs;
>> + struct hbl_cn_stat *ignore;
>> + int rc;
>> +
>> + port_funcs = hdev->asic_funcs->port_funcs;
>> +
>> + port_funcs->spmu_get_stats_info(cn_port, &ignore, num_out_data);
>
> hard to ignore that you deref uninitialized pointer...
>
> please consider going one step back and start with our internal mailing
> lists, thank you
> Przemek
>
> [...]
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-15 0:05 ` Stephen Hemminger
@ 2024-06-17 8:14 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-17 8:14 UTC (permalink / raw)
To: Stephen Hemminger
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/15/24 03:05, Stephen Hemminger wrote:
>
>> +#define HBL_AUX2NIC(aux_dev) \
>> + ({ \
>> + struct hbl_aux_dev *__aux_dev = (aux_dev); \
>> + ((__aux_dev)->type == HBL_AUX_DEV_ETH) ? \
>> + container_of(__aux_dev, struct hbl_cn_device, en_aux_dev) : \
>> + container_of(__aux_dev, struct hbl_cn_device, ib_aux_dev); \
>> + })
>> +
>> +#define RAND_STAT_CNT(cnt) \
>> + do { \
>> + u32 __cnt = get_random_u32(); \
>> + (cnt) = __cnt; \
>> + dev_info(hdev->dev, "port %d, %s: %u\n", port, #cnt, __cnt); \
>> + } while (0)
>> +
>> +struct hbl_cn_stat hbl_cn_mac_fec_stats[] = {
>> + {"correctable_errors", 0x2, 0x3},
>> + {"uncorrectable_errors", 0x4, 0x5}
>> +};
>> +
>
> These tables should be marked const?
>
I'll add const to them.
>> +struct hbl_cn_stat hbl_cn_mac_stats_rx[] = {
>> + {"Octets", 0x0},
>> + {"OctetsReceivedOK", 0x4},
>> + {"aAlignmentErrors", 0x8},
>> + {"aPAUSEMACCtrlFramesReceived", 0xC},
>> + {"aFrameTooLongErrors", 0x10},
>> + {"aInRangeLengthErrors", 0x14},
>> + {"aFramesReceivedOK", 0x18},
>> + {"aFrameCheckSequenceErrors", 0x1C},
>> + {"VLANReceivedOK", 0x20},
>> + {"ifInErrors", 0x24},
>> + {"ifInUcastPkts", 0x28},
>> + {"ifInMulticastPkts", 0x2C},
>> + {"ifInBroadcastPkts", 0x30},
>> + {"DropEvents", 0x34},
>> + {"Pkts", 0x38},
>> + {"UndersizePkts", 0x3C},
>> + {"Pkts64Octets", 0x40},
>> + {"Pkts65to127Octets", 0x44},
>> + {"Pkts128to255Octets", 0x48},
>> + {"Pkts256to511Octets", 0x4C},
>> + {"Pkts512to1023Octets", 0x50},
>> + {"Pkts1024to1518Octets", 0x54},
>> + {"Pkts1519toMaxOctets", 0x58},
>> + {"OversizePkts", 0x5C},
>> + {"Jabbers", 0x60},
>> + {"Fragments", 0x64},
>> + {"aCBFCPAUSERx0", 0x68},
>> + {"aCBFCPAUSERx1", 0x6C},
>> + {"aCBFCPAUSERx2", 0x70},
>> + {"aCBFCPAUSERx3", 0x74},
>> + {"aCBFCPAUSERx4", 0x78},
>> + {"aCBFCPAUSERx5", 0x7C},
>> + {"aCBFCPAUSERx6", 0x80},
>> + {"aCBFCPAUSERx7", 0x84},
>> + {"aMACControlFramesReceived", 0x88}
>> +};
>> +
>> +struct hbl_cn_stat hbl_cn_mac_stats_tx[] = {
>> + {"Octets", 0x0},
>> + {"OctetsTransmittedOK", 0x4},
>> + {"aPAUSEMACCtrlFramesTransmitted", 0x8},
>> + {"aFramesTransmittedOK", 0xC},
>> + {"VLANTransmittedOK", 0x10},
>> + {"ifOutErrors", 0x14},
>> + {"ifOutUcastPkts", 0x18},
>> + {"ifOutMulticastPkts", 0x1C},
>> + {"ifOutBroadcastPkts", 0x20},
>> + {"Pkts64Octets", 0x24},
>> + {"Pkts65to127Octets", 0x28},
>> + {"Pkts128to255Octets", 0x2C},
>> + {"Pkts256to511Octets", 0x30},
>> + {"Pkts512to1023Octets", 0x34},
>> + {"Pkts1024to1518Octets", 0x38},
>> + {"Pkts1519toMaxOctets", 0x3C},
>> + {"aCBFCPAUSETx0", 0x40},
>> + {"aCBFCPAUSETx1", 0x44},
>> + {"aCBFCPAUSETx2", 0x48},
>> + {"aCBFCPAUSETx3", 0x4C},
>> + {"aCBFCPAUSETx4", 0x50},
>> + {"aCBFCPAUSETx5", 0x54},
>> + {"aCBFCPAUSETx6", 0x58},
>> + {"aCBFCPAUSETx7", 0x5C},
>> + {"aMACControlFramesTx", 0x60},
>> + {"Pkts", 0x64}
>> +};
>> +
>> +static const char pcs_counters_str[][ETH_GSTRING_LEN] = {
>> + {"pcs_local_faults"},
>> + {"pcs_remote_faults"},
>> + {"pcs_remote_fault_reconfig"},
>> + {"pcs_link_restores"},
>> + {"pcs_link_toggles"},
>> +};
>> +
>> +static size_t pcs_counters_str_len = ARRAY_SIZE(pcs_counters_str);
>> +size_t hbl_cn_mac_fec_stats_len = ARRAY_SIZE(hbl_cn_mac_fec_stats);
>> +size_t hbl_cn_mac_stats_rx_len = ARRAY_SIZE(hbl_cn_mac_stats_rx);
>> +size_t hbl_cn_mac_stats_tx_len = ARRAY_SIZE(hbl_cn_mac_stats_tx);
>> +
>> +static void qps_stop(struct hbl_cn_device *hdev);
>> +static void qp_destroy_work(struct work_struct *work);
>> +static int __user_wq_arr_unset(struct hbl_cn_ctx *ctx, struct hbl_cn_port *cn_port, u32 type);
>> +static void user_cq_destroy(struct kref *kref);
>> +static void set_app_params_clear(struct hbl_cn_device *hdev);
>> +static int hbl_cn_ib_cmd_ctrl(struct hbl_aux_dev *aux_dev, void *cn_ib_ctx, u32 op, void *input,
>> + void *output);
>> +static int hbl_cn_ib_query_mem_handle(struct hbl_aux_dev *ib_aux_dev, u64 mem_handle,
>> + struct hbl_ib_mem_info *info);
>> +
>> +static void hbl_cn_reset_stats_counters_port(struct hbl_cn_device *hdev, u32 port);
>> +static void hbl_cn_late_init(struct hbl_cn_device *hdev);
>> +static void hbl_cn_late_fini(struct hbl_cn_device *hdev);
>> +static int hbl_cn_sw_init(struct hbl_cn_device *hdev);
>> +static void hbl_cn_sw_fini(struct hbl_cn_device *hdev);
>> +static void hbl_cn_spmu_init(struct hbl_cn_port *cn_port, bool full);
>> +static int hbl_cn_cmd_port_check(struct hbl_cn_device *hdev, u32 port, u32 flags);
>> +static void hbl_cn_qps_stop(struct hbl_cn_port *cn_port);
>
> Can you reorder code so forward declarations are not required?
I'll try to reorder to get rid of the forward declarations.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-17 8:08 ` Omer Shpigelman
@ 2024-06-17 11:48 ` Leon Romanovsky
2024-06-18 7:28 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-17 11:48 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Przemek Kitszel, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Mon, Jun 17, 2024 at 08:08:26AM +0000, Omer Shpigelman wrote:
> On 6/13/24 16:01, Przemek Kitszel wrote:
> >
> > On 6/13/24 10:21, Omer Shpigelman wrote:
> >> Add the hbl_cn driver which will serve both Ethernet and InfiniBand
> >> drivers.
> >> hbl_cn is the layer which is used by the satellite drivers for many shared
> >> operations that are needed by both EN and IB subsystems like QPs, CQs etc.
> >> The CN driver is initialized via auxiliary bus by the habanalabs driver.
> >>
> >> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >> ---
> >> .../device_drivers/ethernet/index.rst | 1 +
> >> .../device_drivers/ethernet/intel/hbl.rst | 82 +
> >> MAINTAINERS | 11 +
> >> drivers/net/ethernet/intel/Kconfig | 20 +
> >> drivers/net/ethernet/intel/Makefile | 1 +
> >> drivers/net/ethernet/intel/hbl_cn/Makefile | 9 +
> >> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
> >> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5954 +++++++++++++++++
> >> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1627 +++++
> >> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 220 +
> >> .../intel/hbl_cn/common/hbl_cn_memory.c | 40 +
> >> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 33 +
> >> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 13 +
> >> include/linux/habanalabs/cpucp_if.h | 125 +-
> >> include/linux/habanalabs/hl_boot_if.h | 9 +-
> >> include/linux/net/intel/cn.h | 474 ++
> >> include/linux/net/intel/cn_aux.h | 298 +
> >> include/linux/net/intel/cni.h | 636 ++
> >> 18 files changed, 9545 insertions(+), 11 deletions(-)
> >
> > this is a very big patch, it asks for a split; what's worse, it's
> > proportional to the size of this series:
> > 146 files changed, 148514 insertions(+), 70 deletions(-)
> > which is just too big
> >
> > [...]
> >
>
> Yeah, well, I'm limited to 15 patches per patch set according to the
> kernel doc, so I had to have this big patch.
> Our changes are contained in 4 different drivers, and all of the changes
> must be merged together for the HW to be operational.
> Hence I had to squeeze some code into a big patch.
Submit your code in multiple steps. One driver at a time.
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 00/15] Introduce HabanaLabs network drivers
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (13 preceding siblings ...)
2024-06-13 8:22 ` [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support Omer Shpigelman
@ 2024-06-17 12:34 ` Alexander Lobakin
2024-06-19 11:40 ` Omer Shpigelman
2024-06-19 16:33 ` Jiri Pirko
15 siblings, 1 reply; 107+ messages in thread
From: Alexander Lobakin @ 2024-06-17 12:34 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
From: Omer Shpigelman <oshpigelman@habana.ai>
Date: Thu, 13 Jun 2024 11:21:53 +0300
> This patch set implements the HabanaLabs network drivers for Gaudi2 ASIC
> which is designed for scaling of AI neural networks training.
> The patch set includes the common code which is shared by all Gaudi ASICs
> and the Gaudi2 ASIC specific code. Newer ASICs code will be followed.
> All of these network drivers are modeled as an auxiliary devices to the
> parent driver.
>
> The newly added drivers are Core Network (CN), Ethernet and InfiniBand.
> All of these drivers are based on the existing habanalabs driver which
> serves as the compute driver and the entire platform.
> The habanalabs driver probes the network drivers which configure the
> relevant NIC HW of the device. In addition, it continuously communicates
> with the CN driver for providing some services which are not NIC specific
> e.g. PCI, MMU, FW communication etc.
>
> See the drivers scheme at:
> Documentation/networking/device_drivers/ethernet/intel/hbl.rst
>
> The CN driver is both a parent and a son driver. It serves as the common
> layer of many shared operations that are required by both EN and IB
> drivers.
>
> The Gaudi2 NIC HW is composed of 48 physical lanes, 56Gbps each. Each pair
> of lanes represent a 100Gbps logical port.
>
> The NIC HW was designed specifically for scaling AI training.
> Hence it basically functions as a regular NIC device but it is tuned for
> its dedicated purpose. As a result, the NIC HW supports Ethernet traffic
> and RDMA over modified ROCEv2 protocol.
> For example, with respect to the IB driver, the HW supports a single
> context and a single PD. The reason for this is that the operational use
> case of AI training for Gaudi2 consists of a single user
> application/process.
> Another example related to the IB driver is the lack of MR since a single
> application/process can share the entire MMU with the compute device.
> Moreover, the memory allocation of user data buffers which are used for
> RDMA communication is done via the habanalabs compute driver uAPI.
> With respect to the Ethernet driver, since the Ethernet flow is used
> mainly for control, the HW is not performance tuned e.g. it assumes a
> contiguous memory for the Rx buffers. Thus the EN driver needs to copy the
> Rx packets from the Rx buffer into the skb memory.
>
> The first 8 patches implement the CN driver.
> The next 2 patches implement the EN driver.
> The next 2 patches implement the IB driver.
> The last 3 patches modify the compute driver to support the CN driver.
>
> The patches are rebased on v6.10-rc3 tag:
> https://github.com/torvalds/linux/releases/tag/v6.10-rc3
>
> The patches are also available at:
> https://github.com/HabanaAI/drivers.gpu.linux-nic.kernel/tree/hbl_next
>
> The user-mode of the driver is being reviewed at:
> https://github.com/linux-rdma/rdma-core/pull/1472
>
> Any feedback, comment or question is welcome.
>
> Thanks,
> Omer
>
> Omer Shpigelman (15):
> net: hbl_cn: add habanalabs Core Network driver
> net: hbl_cn: memory manager component
> net: hbl_cn: physical layer support
> net: hbl_cn: QP state machine
> net: hbl_cn: memory trace events
> net: hbl_cn: debugfs support
> net: hbl_cn: gaudi2: ASIC register header files
> net: hbl_cn: gaudi2: ASIC specific support
> net: hbl_en: add habanalabs Ethernet driver
> net: hbl_en: gaudi2: ASIC specific support
> RDMA/hbl: add habanalabs RDMA driver
> RDMA/hbl: direct verbs support
> accel/habanalabs: network scaling support
> accel/habanalabs/gaudi2: CN registers header files
> accel/habanalabs/gaudi2: network scaling support
>
> .../ABI/testing/debugfs-driver-habanalabs_cn | 195 +
> .../device_drivers/ethernet/index.rst | 1 +
> .../device_drivers/ethernet/intel/hbl.rst | 82 +
> MAINTAINERS | 33 +
> drivers/accel/habanalabs/Kconfig | 1 +
> drivers/accel/habanalabs/Makefile | 3 +
> drivers/accel/habanalabs/cn/Makefile | 2 +
> drivers/accel/habanalabs/cn/cn.c | 815 +
> drivers/accel/habanalabs/cn/cn.h | 133 +
> .../habanalabs/common/command_submission.c | 2 +-
> drivers/accel/habanalabs/common/device.c | 23 +
> drivers/accel/habanalabs/common/firmware_if.c | 20 +
> drivers/accel/habanalabs/common/habanalabs.h | 43 +-
> .../accel/habanalabs/common/habanalabs_drv.c | 37 +-
> .../habanalabs/common/habanalabs_ioctl.c | 2 +
> drivers/accel/habanalabs/common/memory.c | 123 +
> drivers/accel/habanalabs/gaudi/gaudi.c | 14 +-
> drivers/accel/habanalabs/gaudi2/Makefile | 2 +-
> drivers/accel/habanalabs/gaudi2/gaudi2.c | 440 +-
> drivers/accel/habanalabs/gaudi2/gaudi2P.h | 41 +-
> drivers/accel/habanalabs/gaudi2/gaudi2_cn.c | 424 +
> drivers/accel/habanalabs/gaudi2/gaudi2_cn.h | 42 +
> .../habanalabs/gaudi2/gaudi2_coresight.c | 145 +-
> .../accel/habanalabs/gaudi2/gaudi2_security.c | 16 +-
> drivers/accel/habanalabs/goya/goya.c | 6 +
> .../include/gaudi2/asic_reg/gaudi2_regs.h | 10 +-
> .../include/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
> .../include/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
> .../include/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
> .../include/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
> .../include/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
> .../include/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
> .../include/hw_ip/nic/nic_general.h | 15 +
> drivers/infiniband/Kconfig | 1 +
> drivers/infiniband/hw/Makefile | 1 +
> drivers/infiniband/hw/hbl/Kconfig | 18 +
> drivers/infiniband/hw/hbl/Makefile | 12 +
> drivers/infiniband/hw/hbl/hbl.h | 326 +
> drivers/infiniband/hw/hbl/hbl_encap.c | 216 +
> drivers/infiniband/hw/hbl/hbl_main.c | 493 +
> drivers/infiniband/hw/hbl/hbl_query_port.c | 96 +
> drivers/infiniband/hw/hbl/hbl_set_port_ex.c | 96 +
> drivers/infiniband/hw/hbl/hbl_usr_fifo.c | 252 +
> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 +
> drivers/net/ethernet/intel/Kconfig | 38 +
> drivers/net/ethernet/intel/Makefile | 2 +
> drivers/net/ethernet/intel/hbl_cn/Makefile | 14 +
> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5984 ++
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1666 +
> .../intel/hbl_cn/common/hbl_cn_debugfs.c | 1457 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 240 +
> .../intel/hbl_cn/common/hbl_cn_memory.c | 368 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 234 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 491 +
> .../net/ethernet/intel/hbl_cn/gaudi2/Makefile | 3 +
> .../asic_reg/arc_farm_kdma_ctx_axuser_masks.h | 135 +
> .../asic_reg/dcore0_sync_mngr_objs_regs.h | 43543 +++++++++++++++
> .../asic_reg/gaudi2_blocks_linux_driver.h | 45068 ++++++++++++++++
I don't think adding generated register defs etc. is a good idea.
You just bloat the kernel code while most of the values are not used.
When I work with HW and need to use some register defs, I add them
manually, one by one, only when needed. I know it takes more time than
just adding a whole generated reg file, but we don't need tens of
thousands of unused LoC in the kernel.
Please add only the actually used definitions. This applies to every
file in the series.
> .../hbl_cn/gaudi2/asic_reg/gaudi2_regs.h | 77 +
> .../asic_reg/nic0_mac_ch0_mac_128_masks.h | 339 +
> .../asic_reg/nic0_mac_ch0_mac_128_regs.h | 101 +
> .../asic_reg/nic0_mac_ch0_mac_pcs_masks.h | 713 +
> .../asic_reg/nic0_mac_ch0_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch1_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch2_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch3_mac_pcs_regs.h | 271 +
> .../nic0_mac_glob_stat_control_reg_masks.h | 67 +
> .../nic0_mac_glob_stat_control_reg_regs.h | 37 +
> .../asic_reg/nic0_mac_glob_stat_rx0_regs.h | 93 +
> .../asic_reg/nic0_mac_glob_stat_rx2_regs.h | 93 +
> .../asic_reg/nic0_mac_glob_stat_tx0_regs.h | 75 +
> .../asic_reg/nic0_mac_glob_stat_tx2_regs.h | 75 +
> .../gaudi2/asic_reg/nic0_mac_rs_fec_regs.h | 157 +
> .../hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h | 77 +
> .../hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_cong_que_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_db_fifo_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_err_fifo_regs.h | 61 +
> .../nic0_qpc0_axuser_ev_que_lbw_intr_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_qpc_req_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_rxwqe_regs.h | 61 +
> .../nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h | 61 +
> .../nic0_qpc0_dbfifo0_ci_upd_addr_regs.h | 27 +
> .../nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h | 27 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h | 963 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h | 905 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
> .../gaudi2/asic_reg/nic0_rxb_core_masks.h | 459 +
> .../gaudi2/asic_reg/nic0_rxb_core_regs.h | 665 +
> .../nic0_rxe0_axuser_axuser_cq0_regs.h | 61 +
> .../nic0_rxe0_axuser_axuser_cq1_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h | 705 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
> .../asic_reg/nic0_rxe0_wqe_aruser_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
> .../gaudi2/asic_reg/nic0_serdes0_masks.h | 7163 +++
> .../gaudi2/asic_reg/nic0_serdes0_regs.h | 1679 +
> .../gaudi2/asic_reg/nic0_serdes1_regs.h | 1679 +
> .../asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h | 61 +
> .../nic0_tmr_axuser_tmr_free_list_regs.h | 61 +
> .../asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h | 361 +
> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h | 183 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h | 167 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h | 759 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h | 555 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
> .../nic0_umr0_0_completion_queue_ci_1_regs.h | 27 +
> .../nic0_umr0_0_unsecure_doorbell0_regs.h | 31 +
> .../nic0_umr0_0_unsecure_doorbell1_regs.h | 31 +
> .../gaudi2/asic_reg/prt0_mac_core_masks.h | 137 +
> .../gaudi2/asic_reg/prt0_mac_core_regs.h | 67 +
> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c | 5689 ++
> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h | 427 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c | 319 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_eq.c | 732 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_phy.c | 2743 +
> drivers/net/ethernet/intel/hbl_en/Makefile | 12 +
> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1170 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 208 +
> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 +
> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +
> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +
> .../net/ethernet/intel/hbl_en/gaudi2/Makefile | 2 +
> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.c | 728 +
> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.h | 53 +
> .../intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c | 32 +
> include/linux/habanalabs/cpucp_if.h | 125 +-
> include/linux/habanalabs/hl_boot_if.h | 9 +-
> include/linux/net/intel/cn.h | 474 +
> include/linux/net/intel/cn_aux.h | 298 +
> include/linux/net/intel/cni.h | 636 +
> include/linux/net/intel/gaudi2.h | 432 +
> include/linux/net/intel/gaudi2_aux.h | 94 +
> include/trace/events/habanalabs_cn.h | 116 +
> include/uapi/drm/habanalabs_accel.h | 10 +-
> include/uapi/rdma/hbl-abi.h | 204 +
> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> 146 files changed, 148514 insertions(+), 70 deletions(-)
So most of these new lines are generated register definitions. The
series can be several times smaller if you follow my advice.
Thanks,
Olek
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-13 8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
@ 2024-06-17 13:18 ` Leon Romanovsky
2024-06-18 5:50 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-17 13:18 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
> Add a common QP state machine which handles the moving for a QP from one
> state to another including performing necessary checks, draining
> in-flight transactions, invalidating caches and error reporting.
>
> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> Co-developed-by: David Meriin <dmeriin@habana.ai>
> Signed-off-by: David Meriin <dmeriin@habana.ai>
> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> ---
> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
> 1 file changed, 479 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> index 9ddc23bf8194..26ebdf448193 100644
> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> @@ -6,8 +6,486 @@
<...>
> +/* The following table represents the (valid) operations that can be performed on
> + * a QP in order to move it from one state to another
> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
> + * operation.
> + */
> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
> + [CN_QP_STATE_RESET] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> + },
> + [CN_QP_STATE_INIT] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> + },
> + [CN_QP_STATE_RTR] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
> + },
> + [CN_QP_STATE_RTS] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
> + },
> + [CN_QP_STATE_SQD] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
> + },
> + [CN_QP_STATE_QPD] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
> + },
> + [CN_QP_STATE_SQERR] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
> + },
> + [CN_QP_STATE_ERR] = {
> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> + }
> +};
I don't understand why the IBTA QP state machine is declared in the ETH
driver and not in the IB driver.
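Wherever it ends up living, the dispatch pattern such a table enables is simple: a zero-initialized 2D array indexed by (current state, requested state), where unlisted transitions fall through to an "invalid" op. A reduced userspace sketch — the state and op names below are illustrative, not the driver's actual enums:

```c
#include <assert.h>

/* Hypothetical, reduced version of the transition table quoted above:
 * unlisted entries are zero-initialized, so OP_INVAL (0) marks a
 * forbidden transition.
 */
enum qp_state { ST_RESET, ST_INIT, ST_RTR, ST_RTS, NUM_STATE };
enum qp_op { OP_INVAL = 0, OP_2RESET, OP_RST_2INIT, OP_INIT_2RTR, OP_RTR_2RTS };

static const enum qp_op valid_op[NUM_STATE][NUM_STATE] = {
	[ST_RESET] = { [ST_RESET] = OP_2RESET, [ST_INIT] = OP_RST_2INIT },
	[ST_INIT]  = { [ST_RESET] = OP_2RESET, [ST_RTR]  = OP_INIT_2RTR },
	[ST_RTR]   = { [ST_RESET] = OP_2RESET, [ST_RTS]  = OP_RTR_2RTS },
	[ST_RTS]   = { [ST_RESET] = OP_2RESET },
};

/* Look up the operation needed to move a QP from cur to next. */
static enum qp_op qp_transition_op(enum qp_state cur, enum qp_state next)
{
	return valid_op[cur][next];
}
```

The table encodes policy only; the caller checks for `OP_INVAL` and rejects the transition before touching HW.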
> +
<...>
> + /* Release lock while we wait before retry.
> + * Note, we can assert that we are already locked.
> + */
> + port_funcs->cfg_unlock(cn_port);
> +
> + msleep(20);
> +
> + port_funcs->cfg_lock(cn_port);
Locking/unlocking through an ops pointer doesn't look like a good idea.
Thanks
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 13:01 ` Przemek Kitszel
2024-06-15 0:05 ` Stephen Hemminger
@ 2024-06-17 14:05 ` Markus Elfring
2024-06-17 15:02 ` Andrew Lunn
2 siblings, 1 reply; 107+ messages in thread
From: Markus Elfring @ 2024-06-17 14:05 UTC (permalink / raw)
To: Abhilash K V, Andrey Agranovich, Bharat Jauhari, David Meriin,
Omer Shpigelman, Sagiv Ozeri, Zvika Yehudai, netdev, linux-rdma,
dri-devel
Cc: LKML
…
> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
> @@ -0,0 +1,5954 @@
…
> +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
> +{
…
> + mutex_lock(&cn_port->cnt_lock);
> + rc = port_funcs->spmu_sample(cn_port, *num_out_data, out_data);
> + mutex_unlock(&cn_port->cnt_lock);
> +
> + return rc;
> +}
…
Would you be interested in applying a statement like “guard(mutex)(&cn_port->cnt_lock);”?
https://elixir.bootlin.com/linux/v6.10-rc4/source/include/linux/mutex.h#L196
Regards,
Markus
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
2024-06-13 19:18 ` Leon Romanovsky
@ 2024-06-17 14:17 ` Jason Gunthorpe
2024-06-19 9:39 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Jason Gunthorpe @ 2024-06-17 14:17 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> The driver itself is agnostic to the ASIC in action, it operates according
> to the capabilities that were passed on device initialization.
> The device is initialized by the hbl_cn driver via auxiliary bus.
> The driver also supports QP resource tracking and port/device HW counters.
I'm glad to finally see this; I've been talking to the Habana folks for a
long time now to get this worked out!
This will need to be split up more, as others have said. I'd post the
RDMA series assuming that the basic Ethernet driver is merged. You don't
need to combine basic Ethernet with RDMA in the same series.
Jason
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-17 14:05 ` Markus Elfring
@ 2024-06-17 15:02 ` Andrew Lunn
2024-06-18 7:51 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-17 15:02 UTC (permalink / raw)
To: Markus Elfring
Cc: Abhilash K V, Andrey Agranovich, Bharat Jauhari, David Meriin,
Omer Shpigelman, Sagiv Ozeri, Zvika Yehudai, netdev, linux-rdma,
dri-devel, LKML
On Mon, Jun 17, 2024 at 04:05:57PM +0200, Markus Elfring wrote:
> …
> > +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
> > @@ -0,0 +1,5954 @@
> …
> > +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
> > +{
> …
> > + mutex_lock(&cn_port->cnt_lock);
> > + rc = port_funcs->spmu_sample(cn_port, *num_out_data, out_data);
> > + mutex_unlock(&cn_port->cnt_lock);
> > +
> > + return rc;
> > +}
> …
>
> Would you be interested in applying a statement like “guard(mutex)(&cn_port->cnt_lock);”?
> https://elixir.bootlin.com/linux/v6.10-rc4/source/include/linux/mutex.h#L196
Hi Markus
We decided for netdev that guard() was too magical, at least for the
moment. Let's wait a few years to see how it pans out. scoped_guard()
is, however, OK.
Andrew
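For readers unfamiliar with scoped_guard(): the kernel macro is built on the compiler's cleanup attribute, so the unlock can never be forgotten on an early return, while the lock's scope stays visually explicit. A hedged userspace analogue of the same pattern (GCC/Clang only; the names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter_reads;

/* Cleanup callback: receives a pointer to the guarded mutex pointer. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Run the attached block exactly once with the mutex held; the cleanup
 * attribute unlocks it on any exit from the block's scope.
 */
#define scoped_mutex_guard(m)						\
	for (pthread_mutex_t *_g __attribute__((cleanup(unlock_cleanup))) = \
		     (pthread_mutex_lock(m), (m)), *_once = (m);	\
	     _once; _once = NULL)

static int read_counters(void)
{
	scoped_mutex_guard(&cnt_lock) {
		/* critical section: the lock is held only inside this block */
		counter_reads++;
	}
	return counter_reads;
}
```

The visible `{ }` block is exactly why scoped_guard() is considered acceptable while the plain guard() was deemed too magical: the reader sees where the lock is dropped.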
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-13 19:18 ` Leon Romanovsky
@ 2024-06-17 17:43 ` Omer Shpigelman
2024-06-17 19:04 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-17 17:43 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/13/24 22:18, Leon Romanovsky wrote:
>
> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>> The driver itself is agnostic to the ASIC in action, it operates according
>> to the capabilities that were passed on device initialization.
>> The device is initialized by the hbl_cn driver via auxiliary bus.
>> The driver also supports QP resource tracking and port/device HW counters.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>
> I'm afraid you misinterpreted the "Co-developed-by" tag. All these
> people probably touched the code rather than actually sitting together
> in the same room and writing it. So, please remove the extensive
> "Co-developed-by" tags.
>
> This is not a full review yet, just some passing-by comments.
>
Actually, except for two, all of the mentioned people sat in the same room
and developed the code together.
The remaining two are located at a different site (but also together).
Isn't that what the "Co-developed-by" tag is for?
I wanted to give them credit for writing the code, but I can remove the
tags if that's not common.
>> ---
>> MAINTAINERS | 10 +
>> drivers/infiniband/Kconfig | 1 +
>> drivers/infiniband/hw/Makefile | 1 +
>> drivers/infiniband/hw/hbl/Kconfig | 17 +
>> drivers/infiniband/hw/hbl/Makefile | 8 +
>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
>> include/uapi/rdma/hbl-abi.h | 204 ++
>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
>> 12 files changed, 3904 insertions(+)
>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
>> create mode 100644 include/uapi/rdma/hbl-abi.h
>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
>
> <...>
>
>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
>> +
>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +
>
> Please don't redefine the existing macros. Just use the existing ones.
>
>
> <...>
>
That's a leftover from some debug code. I'll remove it.
>> + if (hbl_ib_match_netdev(ibdev, netdev))
>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>> + else
>> + return NOTIFY_DONE;
>
> It is not kernel coding style. Please write:
> if (!hbl_ib_match_netdev(ibdev, netdev))
> return NOTIFY_DONE;
>
> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>
I'll fix the code, thanks.
>> +
>
> <...>
>
>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>> +{
>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
>> + struct hbl_ib_device *hdev;
>> + ktime_t timeout;
>> + int rc;
>> +
>> + rc = hdev_init(aux_dev);
>> + if (rc) {
>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
>> + return -EIO;
>> + }
>> +
>> + hdev = aux_dev->priv;
>> +
>> + /* don't allow module unloading while it is attached */
>> + if (!try_module_get(THIS_MODULE)) {
>
> This part makes me wonder: what are you trying to do here? What doesn't
> work for you in the standard driver core and module load mechanism?
>
Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
inter-driver communication. That incremented the refcount of the used
module, so it couldn't be removed while in use.
Using the auxiliary bus doesn't increment the used module's refcount,
hence the module can be removed while it is in use, which is something
we don't want to allow.
We could solve it with some global locking or an in_use atomic, but the
simplest and cleanest way is just to increment the used module's refcount
on auxiliary device probe and decrement it on auxiliary device removal.
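The scheme can be modeled outside the kernel as a refcount that probe takes and remove drops, with unload succeeding only when no auxiliary device holds a reference. A simplified userspace sketch (the real code uses try_module_get()/module_put(); the `_sim` names below are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Count starts at 1 ("module loaded"); each probed auxiliary device
 * takes an extra reference, and unload requires the count to be 1.
 */
static atomic_int module_refcnt = 1;

/* Increment only if the module hasn't already gone away (count > 0). */
static bool try_module_get_sim(void)
{
	int cur = atomic_load(&module_refcnt);

	while (cur > 0)
		if (atomic_compare_exchange_weak(&module_refcnt, &cur, cur + 1))
			return true;

	return false; /* module is already unloading, probe must fail */
}

static void module_put_sim(void)
{
	atomic_fetch_sub(&module_refcnt, 1);
}

/* Unload succeeds only when just the "loaded" reference remains. */
static bool can_unload(void)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&module_refcnt, &expected, 0);
}
```

This is exactly why taking the reference in probe is clean: the blocking of rmmod falls out of the existing refcount machinery instead of needing extra locking.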
>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
>> + module_name(THIS_MODULE));
>> + rc = -EIO;
>> + goto module_get_err;
>> + }
>> +
>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
>> + while (1) {
>> + aux_ops->hw_access_lock(aux_dev);
>> +
>> + /* if the device is operational, proceed to actual init while holding the lock in
>> + * order to prevent concurrent hard reset
>> + */
>> + if (aux_ops->device_operational(aux_dev))
>> + break;
>> +
>> + aux_ops->hw_access_unlock(aux_dev);
>> +
>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
>> + rc = -EBUSY;
>> + goto timeout_err;
>> + }
>> +
>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
>> +
>> + msleep_interruptible(MSEC_PER_SEC);
>> + }
>
> The code above is unexpected.
>
We have no control over when the user insmods the IB driver. As a result,
it is possible that the IB auxiliary device will be probed while the
compute device is under reset (due to some HW error). During device reset
we must not access the HW, so we wait for the compute device to finish
the reset flow before continuing with the IB device probe, and we block
any compute device reset flow in the meantime.
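The flow described above boils down to a bounded poll loop: retry while the device is resetting, give up with -EBUSY at the deadline. Stripped of the locking, a minimal userspace model (device_operational() here is a stub standing in for the aux_ops callback, and the poll budget stands in for the ktime deadline):

```c
#include <stdbool.h>

/* Stub for aux_ops->device_operational(): becomes true after a few
 * polls, as if a hard reset just finished.
 */
static int polls_until_ready = 3;

static bool device_operational(void)
{
	return --polls_until_ready <= 0;
}

/* Poll until the device is operational or the budget runs out.
 * Returns 0 on success, -1 (-EBUSY in the driver) on timeout.
 * The real code takes hw_access_lock before each check, keeps it held
 * on success to fence off a concurrent hard reset, and drops it before
 * sleeping and retrying.
 */
static int wait_for_device(int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (device_operational())
			return 0;
		/* in the driver: unlock, msleep_interruptible(), retry */
	}
	return -1;
}
```

The important property is that the loop exits with the lock held on the success path, so probe proceeds knowing no hard reset can start underneath it.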
>> +
>> + rc = hbl_ib_dev_init(hdev);
>> + if (rc) {
>> + dev_err(hdev->dev, "Failed to init ib device\n");
>> + goto dev_init_err;
>> + }
>> +
>> + aux_ops->hw_access_unlock(aux_dev);
>> +
>> + return 0;
>> +
>> +dev_init_err:
>> + aux_ops->hw_access_unlock(aux_dev);
>> +timeout_err:
>> + module_put(THIS_MODULE);
>> +module_get_err:
>> + hdev_fini(aux_dev);
>> +
>> + return rc;
>> +}
>
> <...>
>
>> +static int __init hbl_ib_init(void)
>> +{
>> + pr_info("loading driver\n");
>
> Please remove all these debug prints and leave only the necessary ones.
>
Sure, will remove.
>> +
>> + return auxiliary_driver_register(&hbl_ib_driver);
>> +}
>> +
>> +static void __exit hbl_ib_exit(void)
>> +{
>> + auxiliary_driver_unregister(&hbl_ib_driver);
>> +
>> + pr_info("driver removed\n");
>> +}
>> +
>> +module_init(hbl_ib_init);
>> +module_exit(hbl_ib_exit)
>
> Thanks
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-17 17:43 ` Omer Shpigelman
@ 2024-06-17 19:04 ` Leon Romanovsky
2024-06-18 11:08 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-17 19:04 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> On 6/13/24 22:18, Leon Romanovsky wrote:
> >
> > On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> >> The driver itself is agnostic to the ASIC in action, it operates according
> >> to the capabilities that were passed on device initialization.
> >> The device is initialized by the hbl_cn driver via auxiliary bus.
> >> The driver also supports QP resource tracking and port/device HW counters.
> >>
> >> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >
> > I'm afraid you misinterpreted the "Co-developed-by" tag. All these
> > people probably touched the code rather than actually sitting together
> > in the same room writing it. So, please remove the extensive
> > "Co-developed-by" tags.
> >
> > It is not full review yet, but simple pass-by-comments.
> >
>
> Actually, except for two, all of the mentioned people sat in the same room
> and developed the code together.
> The remaining two are located at a different site (but also together).
> Isn't that what the "Co-developed-by" tag is for?
> I wanted to give them credit for writing the code, but I can remove the
> tags if that's not common practice.
Signed-off-by will be enough to give them credit.
>
> >> ---
> >> MAINTAINERS | 10 +
> >> drivers/infiniband/Kconfig | 1 +
> >> drivers/infiniband/hw/Makefile | 1 +
> >> drivers/infiniband/hw/hbl/Kconfig | 17 +
> >> drivers/infiniband/hw/hbl/Makefile | 8 +
> >> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> >> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> >> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> >> include/uapi/rdma/hbl-abi.h | 204 ++
> >> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> >> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> >> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> >> 12 files changed, 3904 insertions(+)
> >> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> >> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> >> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> >> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> >> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> >> create mode 100644 include/uapi/rdma/hbl-abi.h
> >> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> >> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
> >
> > <...>
> >
> >> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> >> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> >> +
> >> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> >> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >> +
> >
> > Please don't redefine the existing macros. Just use the existing ones.
> >
> >
> > <...>
> >
>
> That's a leftover from some debug code. I'll remove.
>
> >> + if (hbl_ib_match_netdev(ibdev, netdev))
> >> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >> + else
> >> + return NOTIFY_DONE;
> >
> > It is not kernel coding style. Please write:
> > if (!hbl_ib_match_netdev(ibdev, netdev))
> > return NOTIFY_DONE;
> >
> > ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >
>
> I'll fix the code, thanks.
>
> >> +
> >
> > <...>
> >
> >> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> >> +{
> >> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> >> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> >> + struct hbl_ib_device *hdev;
> >> + ktime_t timeout;
> >> + int rc;
> >> +
> >> + rc = hdev_init(aux_dev);
> >> + if (rc) {
> >> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> >> + return -EIO;
> >> + }
> >> +
> >> + hdev = aux_dev->priv;
> >> +
> >> + /* don't allow module unloading while it is attached */
> >> + if (!try_module_get(THIS_MODULE)) {
> >
> > This part makes wonder, what are you trying to do here? What doesn't work for you
> > in standard driver core and module load mechanism?
> >
>
> Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
> inter-driver communication. That incremented the refcount of the used
> module so it couldn't be removed while in use.
> Auxiliary bus usage doesn't increment the used module's refcount, hence
> the used module can be removed while it is in use, and that's something
> we don't want to allow.
> We could solve it with some global locking or an in_use atomic, but the
> simplest and cleanest way is just to increment the used module's refcount
> on auxiliary device probe and decrement it on auxiliary device removal.
No, you were supposed to continue using EXPORT_SYMBOLs and not invent
an auxiliary ops structure (this is why you lost module reference
counting).
>
> >> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> >> + module_name(THIS_MODULE));
> >> + rc = -EIO;
> >> + goto module_get_err;
> >> + }
> >> +
> >> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> >> + while (1) {
> >> + aux_ops->hw_access_lock(aux_dev);
> >> +
> >> + /* if the device is operational, proceed to actual init while holding the lock in
> >> + * order to prevent concurrent hard reset
> >> + */
> >> + if (aux_ops->device_operational(aux_dev))
> >> + break;
> >> +
> >> + aux_ops->hw_access_unlock(aux_dev);
> >> +
> >> + if (ktime_compare(ktime_get(), timeout) > 0) {
> >> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> >> + rc = -EBUSY;
> >> + goto timeout_err;
> >> + }
> >> +
> >> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> >> +
> >> + msleep_interruptible(MSEC_PER_SEC);
> >> + }
> >
> > The code above is unexpected.
> >
>
> We have no control over when the user insmods the IB driver.
It is not true, this is controlled through module dependencies
mechanism.
> As a result it is possible that the IB auxiliary device will be probed
> while the compute device is under reset (due to some HW error).
No, it is not possible if you structure your driver right.
Thanks
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-17 13:18 ` Leon Romanovsky
@ 2024-06-18 5:50 ` Omer Shpigelman
2024-06-18 7:08 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 5:50 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/17/24 16:18, Leon Romanovsky wrote:
>
> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
>> Add a common QP state machine which handles the moving for a QP from one
>> state to another including performing necessary checks, draining
>> in-flight transactions, invalidating caches and error reporting.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>> ---
>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
>> 1 file changed, 479 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> index 9ddc23bf8194..26ebdf448193 100644
>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>> @@ -6,8 +6,486 @@
>
> <...>
>
>> +/* The following table represents the (valid) operations that can be performed on
>> + * a QP in order to move it from one state to another
>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
>> + * operation.
>> + */
>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
>> + [CN_QP_STATE_RESET] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>> + },
>> + [CN_QP_STATE_INIT] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>> + },
>> + [CN_QP_STATE_RTR] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
>> + },
>> + [CN_QP_STATE_RTS] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
>> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
>> + },
>> + [CN_QP_STATE_SQD] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
>> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
>> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
>> + },
>> + [CN_QP_STATE_QPD] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
>> + },
>> + [CN_QP_STATE_SQERR] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
>> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
>> + },
>> + [CN_QP_STATE_ERR] = {
>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>> + }
>> +};
>
> I don't understand why IBTA QP state machine is declared in ETH driver
> and not in IB driver.
>
Implementing the actual transitions between the states requires full
knowledge of the HW, e.g. when to flush, cache invalidation, timeouts.
Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
generations are planned to be added and the IB driver should not be aware
of these additional HWs.
Hence we implemented the QP state machine in the CN driver, which is aware
of the actual HW.
>> +
>
> <...>
>
>> + /* Release lock while we wait before retry.
>> + * Note, we can assert that we are already locked.
>> + */
>> + port_funcs->cfg_unlock(cn_port);
>> +
>> + msleep(20);
>> +
>> + port_funcs->cfg_lock(cn_port);
>
> lock/unlock through ops pointer doesn't look like a good idea.
>
More ASIC generations will be added once we merge the current Gaudi2 code.
On other ASICs the locking granularity is different because some of the HW
resources are shared between different logical ports.
Hence it was logical for us to implement it with a function pointer so
each ASIC-specific code can implement the locking properly.
> Thanks
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-13 21:49 ` Andrew Lunn
@ 2024-06-18 6:58 ` Omer Shpigelman
2024-06-18 14:19 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 6:58 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/14/24 00:49, Andrew Lunn wrote:
>
>> +static int hbl_en_napi_poll(struct napi_struct *napi, int budget);
>> +static int hbl_en_port_open(struct hbl_en_port *port);
>
> When you do the Intel internal review, i expect this is crop up. No
> forward declarations please. Put the code in the right order so they
> are not needed.
>
I'll try to get rid of these forward declarations by re-ordering the functions.
>> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
>> +{
>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>> + struct net_device *ndev = port->ndev;
>> + struct in_device *in_dev;
>> + struct in_ifaddr *ifa;
>> + int rc = 0;
>> +
>> + /* for the case where no src IP is configured */
>> + *src_ip = 0;
>> +
>> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
>> + if (!rtnl_is_locked()) {
>> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
>> + rc = -EFAULT;
>> + goto out;
>> + }
>
> You will find all other drivers just do:
>
> ASSERT_RTNL().
>
> If your locking is broken, you are probably dead anyway, so you might
> as well keep going and try to explode in the most interesting way
> possible.
>
Thanks, I'll switch to ASSERT_RTNL().
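For reference, the suggested form collapses the manual check into the core helper, which emits a warning splat when RTNL isn't held but lets the call proceed. A sketch of how the function above might start (names taken from the quoted patch, not a final implementation):

```c
/* Sketch: replace the manual rtnl_is_locked() check with ASSERT_RTNL(),
 * which warns loudly (with a backtrace) if the caller doesn't hold the
 * RTNL lock instead of failing the operation.
 */
static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
{
	struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);

	ASSERT_RTNL();

	/* for the case where no src IP is configured */
	*src_ip = 0;

	/* ... walk port->ndev's in_device as in the original ... */
	return 0;
}
```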
>> +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx)
>> +{
>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>> +
>> + port->net_stats.rx_packets = 0;
>> + port->net_stats.tx_packets = 0;
>> + port->net_stats.rx_bytes = 0;
>> + port->net_stats.tx_bytes = 0;
>> + port->net_stats.tx_errors = 0;
>> + atomic64_set(&port->net_stats.rx_dropped, 0);
>> + atomic64_set(&port->net_stats.tx_dropped, 0);
>
> Why atomic64_set? Atomics are expensive, so you should not be using
> them. netdev has other cheaper methods, which other Intel developers
> should be happy to tell you all about.
>
We used atomic64_set as these counters are also updated from a non-netdev
flow in case of HW errors.
I can switch to using u64_stats_sync if that's the intention.
I'm about to start a review process with Intel developers regardless of
this issue and I'll bring this up too.
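The u64_stats_sync alternative mentioned above (see include/linux/u64_stats_sync.h) could look roughly like this; struct and field names here are illustrative, not the actual driver code:

```c
/* Sketch: per-port counters protected by u64_stats_sync instead of
 * atomic64_t. On 64-bit builds the sync object compiles away; on 32-bit
 * it is a seqcount, so the update fast path stays cheap.
 */
struct hbl_en_port_stats {
	u64 rx_packets;
	u64 rx_dropped;
	struct u64_stats_sync syncp;
};

static void port_stats_rx(struct hbl_en_port_stats *s, bool dropped)
{
	u64_stats_update_begin(&s->syncp);
	if (dropped)
		s->rx_dropped++;
	else
		s->rx_packets++;
	u64_stats_update_end(&s->syncp);
}

static u64 port_stats_read_rx(struct hbl_en_port_stats *s)
{
	unsigned int start;
	u64 pkts;

	do {
		start = u64_stats_fetch_begin(&s->syncp);
		pkts = s->rx_packets;
	} while (u64_stats_fetch_retry(&s->syncp, start));

	return pkts;
}
```

Note that u64_stats_update_begin()/end() does not serialize concurrent writers; updates from the HW-error path would still need to be per-CPU or under a lock.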
>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
>> +{
>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>> + struct net_device *ndev = port->ndev;
>> + u32 mtu;
>> +
>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>> + netdev_err(ndev, "port is in reset, can't get MTU\n");
>> + return 0;
>> + }
>> +
>> + mtu = ndev->mtu;
>
> I think you need a better error message. All this does is access
> ndev->mtu. What does it matter if the port is in reset? You don't
> access it.
>
This function is called from the CN driver to get the current MTU in order
to configure it in the HW, for example when configuring an IB QP. The MTU
value might be changed by the user while we execute this function. Such an
MTU change requires a port reset.
Hence, if the port is under reset we cannot be sure what the MTU value is.
Since the user should not change the MTU while QPs are being configured
(but we cannot block this flow either), we report an error because the MTU
value cannot be retrieved.
The other option is to read the MTU value without checking for an
in-progress reset, but in that case the MTU value might be incorrect.
>> +static int hbl_en_close(struct net_device *netdev)
>> +{
>> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
>> + struct hbl_en_device *hdev = port->hdev;
>> + ktime_t timeout;
>> +
>> + /* Looks like the return value of this function is not checked, so we can't just return
>> + * EBUSY if the port is under reset. We need to wait until the reset is finished and then
>> + * close the port. Otherwise the netdev will set the port as closed although port_close()
>> + * wasn't called. Only if we waited long enough and the reset hasn't finished, we can return
>> + * an error without actually closing the port as it is a fatal flow anyway.
>> + */
>> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
>> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>> + /* If this is called from unregister_netdev() then the port was already closed and
>> + * hence we can safely return.
>> + * We could have just check the port_open boolean, but that might hide some future
>> + * bugs. Hence it is better to use a dedicated flag for that.
>> + */
>> + if (READ_ONCE(hdev->in_teardown))
>> + return 0;
>> +
>> + usleep_range(50, 200);
>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>> + netdev_crit(netdev,
>> + "Timeout while waiting for port to finish reset, can't close it\n"
>> + );
>> + return -EBUSY;
>> + }
>
> This has the usual bug. Please look at include/linux/iopoll.h.
>
I'll take a look, thanks.
>> + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC);
>> + while (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>> + usleep_range(50, 200);
>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>> + netdev_crit(port->ndev,
>> + "Timeout while waiting for port %d to finish reset\n",
>> + port->idx);
>> + break;
>> + }
>> + }
>
> and again. Don't roll your own timeout loops like this, use the core
> version.
>
I will look for some core alternative.
>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
>> +{
>> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
>> + int rc = 0;
>> +
>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>> + netdev_err(netdev, "port is in reset, can't change MTU\n");
>> + return -EBUSY;
>> + }
>> +
>> + if (netif_running(port->ndev)) {
>> + hbl_en_port_close(port);
>> +
>> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
>> + msleep(20);
>> +
>> + netdev->mtu = new_mtu;
>> +
>> + rc = hbl_en_port_open(port);
>> + if (rc)
>> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
>
> Does that mean the port is FUBAR?
>
> Most operations like this are expected to roll back to the previous
> working configuration on failure. So if changing the MTU requires new
> buffers in your ring, you should first allocate the new buffers, then
> free the old buffers, so that if allocation fails, you still have
> buffers, and the device can continue operating.
>
A failure in opening a port is a fatal error. It shouldn't happen. This is
not something we wish to recover from.
This kind of error indicates a severe system failure that will usually
require a driver removal and reload anyway.
>> +module_param(poll_enable, bool, 0444);
>> +MODULE_PARM_DESC(poll_enable,
>> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
>
> Module parameters are not liked. This probably needs to go away.
>
I see that various vendors under net/ethernet/* use module parameters.
Can't we add another one?
>> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
>> +{
>> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
>> + modinfo->type = ETH_MODULE_SFF_8636;
>
> Is this an SFF, not an SFP? How else can you know what module it is
> without doing an I2C transfer to ask the module what it is?
>
The current type is SFF and it is unlikely to be changed.
>> +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee,
>> + u8 *data)
>> +{
>
> This is the old API. Please update to the new API so there is access
> to all the pages of the SFF/SFP.
>
Are you referring to get_module_eeprom_by_page()? If so, it is not
supported by our FW; we read the entire data on device load.
However, I can hide that behind the new API and return only the
requested page if that's the intention.
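Serving the FW-cached dump through the newer per-page callback could look roughly like this; the flat cache layout and the hbl_en_cached_eeprom() accessor are assumptions, not the actual driver code:

```c
/* Sketch: expose FW-cached module EEPROM via the modern per-page ethtool
 * callback instead of the legacy get_module_eeprom(). The callback
 * returns the number of bytes copied, or a negative errno.
 */
static int hbl_en_get_module_eeprom_by_page(struct net_device *ndev,
					    const struct ethtool_module_eeprom *page,
					    struct netlink_ext_ack *extack)
{
	struct hbl_en_port *port = hbl_netdev_priv(ndev);
	const u8 *cache;

	if (page->i2c_address != 0x50) {
		NL_SET_ERR_MSG_MOD(extack, "only address 0x50 is cached");
		return -EOPNOTSUPP;
	}

	/* hbl_en_cached_eeprom() is a hypothetical accessor into the
	 * buffer read from the FW at device load.
	 */
	cache = hbl_en_cached_eeprom(port, page->page, page->bank);
	if (!cache)
		return -EOPNOTSUPP;

	memcpy(page->data, cache + page->offset, page->length);

	return page->length;
}
```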
>> +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev,
>> + struct ethtool_link_ksettings *cmd)
>> +{
>> + struct hbl_en_aux_ops *aux_ops;
>> + struct hbl_aux_dev *aux_dev;
>> + struct hbl_en_device *hdev;
>> + struct hbl_en_port *port;
>> + u32 port_idx, speed;
>> +
>> + port = hbl_netdev_priv(ndev);
>> + hdev = port->hdev;
>> + port_idx = port->idx;
>> + aux_dev = hdev->aux_dev;
>> + aux_ops = aux_dev->aux_ops;
>> + speed = aux_ops->get_speed(aux_dev, port_idx);
>> +
>> + cmd->base.speed = speed;
>> + cmd->base.duplex = DUPLEX_FULL;
>> +
>> + ethtool_link_ksettings_zero_link_mode(cmd, supported);
>> + ethtool_link_ksettings_zero_link_mode(cmd, advertising);
>> +
>> + switch (speed) {
>> + case SPEED_100000:
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full);
>> +
>> + cmd->base.port = PORT_FIBRE;
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane);
>> + break;
>> + case SPEED_50000:
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full);
>> + break;
>> + case SPEED_25000:
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full);
>> + break;
>> + case SPEED_200000:
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full);
>> + break;
>> + case SPEED_400000:
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full);
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full);
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full);
>> + break;
>> + default:
>> + netdev_err(port->ndev, "unknown speed %d\n", speed);
>> + return -EFAULT;
>> + }
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg);
>> +
>> + if (port->auto_neg_enable) {
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg);
>> + cmd->base.autoneg = AUTONEG_ENABLE;
>> + if (port->auto_neg_resolved)
>> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
>
> That looks odd. Care to explain?
>
The HW of all of our ports supports autoneg.
But in addition, the ports are divided into two groups:
internal: ports which are connected to other Gaudi2 ports in the same server.
external: ports which are connected to an external switch.
Only internal ports use autoneg.
The port mask which marks each port as internal/external is fetched from
the FW on device load.
>> + } else {
>> + cmd->base.autoneg = AUTONEG_DISABLE;
>> + }
>> +
>> + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause);
>> +
>> + if (port->pfc_enable)
>> + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause);
>
> And is suspect that is wrong. Everybody gets pause wrong. Please
> double check my previous posts about pause.
>
Our HW supports Pause frames.
But PFC can be disabled via lldptool, for example, so we won't advertise
it.
I'll try to find more info about it, but can you please share what's wrong
with the current code?
BTW I will change it to Asym_Pause as we support Tx pause frames as well.
>> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
>> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
>> + rc = -EFAULT;
>> + goto out;
>
> Don't say you support autoneg in supported if that is the case.
>
> And EFAULT is about memory problems. EINVAL, maybe EPERM? or
> EOPNOTSUPP.
>
> Andrew
Yeah, it should be switched to EPERM/EOPNOTSUPP.
Regarding autoneg support - the HW supports autoneg but it might be
disabled by the FW. Hence we might not be able to switch it on.
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-18 5:50 ` Omer Shpigelman
@ 2024-06-18 7:08 ` Leon Romanovsky
2024-06-18 7:58 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-18 7:08 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
> On 6/17/24 16:18, Leon Romanovsky wrote:
> >
> > On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
> >> Add a common QP state machine which handles the moving for a QP from one
> >> state to another including performing necessary checks, draining
> >> in-flight transactions, invalidating caches and error reporting.
> >>
> >> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >> ---
> >> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
> >> 1 file changed, 479 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> index 9ddc23bf8194..26ebdf448193 100644
> >> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> @@ -6,8 +6,486 @@
> >
> > <...>
> >
> >> +/* The following table represents the (valid) operations that can be performed on
> >> + * a QP in order to move it from one state to another
> >> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
> >> + * operation.
> >> + */
> >> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
> >> + [CN_QP_STATE_RESET] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >> + },
> >> + [CN_QP_STATE_INIT] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >> + },
> >> + [CN_QP_STATE_RTR] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
> >> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
> >> + },
> >> + [CN_QP_STATE_RTS] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
> >> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
> >> + },
> >> + [CN_QP_STATE_SQD] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
> >> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
> >> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
> >> + },
> >> + [CN_QP_STATE_QPD] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
> >> + },
> >> + [CN_QP_STATE_SQERR] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
> >> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
> >> + },
> >> + [CN_QP_STATE_ERR] = {
> >> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >> + }
> >> +};
> >
> > I don't understand why IBTA QP state machine is declared in ETH driver
> > and not in IB driver.
> >
>
> Implementing the actual transitions between the states requires full
> knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
> generations are planned to be added and the IB driver should not be aware
> of these additional HWs.
> Hence we implemented the QP state machine in the CN driver, which is aware
> of the actual HW.
Somehow ALL other IB drivers are able to implement this logic in the IB,
while supporting multiple ASICs. I don't see a reason why you can't do
the same.
>
> >> +
> >
> > <...>
> >
> >> + /* Release lock while we wait before retry.
> >> + * Note, we can assert that we are already locked.
> >> + */
> >> + port_funcs->cfg_unlock(cn_port);
> >> +
> >> + msleep(20);
> >> +
> >> + port_funcs->cfg_lock(cn_port);
> >
> > lock/unlock through ops pointer doesn't look like a good idea.
> >
>
> More ASIC generations will be added once we merge the current Gaudi2 code.
> On other ASICs the locking granularity is different because some of the HW
> resources are shared between different logical ports.
> Hence it was logical for us to implement it with a function pointer so
> each ASIC-specific code can implement the locking properly.
We are reviewing this code which is for the current ASIC, not for the
unknown future ASICs. Please don't over engineer the first submission.
You will always be able to improve/change the code once you decide to
upstream your future ASICs.
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-17 11:48 ` Leon Romanovsky
@ 2024-06-18 7:28 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 7:28 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Przemek Kitszel, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/17/24 14:48, Leon Romanovsky wrote:
> [Some people who received this message don't often get email from leon@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> On Mon, Jun 17, 2024 at 08:08:26AM +0000, Omer Shpigelman wrote:
>> On 6/13/24 16:01, Przemek Kitszel wrote:
>>>
>>> On 6/13/24 10:21, Omer Shpigelman wrote:
>>>> Add the hbl_cn driver which will serve both Ethernet and InfiniBand
>>>> drivers.
>>>> hbl_cn is the layer which is used by the satellite drivers for many shared
>>>> operations that are needed by both EN and IB subsystems like QPs, CQs etc.
>>>> The CN driver is initialized via auxiliary bus by the habanalabs driver.
>>>>
>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>> ---
>>>> .../device_drivers/ethernet/index.rst | 1 +
>>>> .../device_drivers/ethernet/intel/hbl.rst | 82 +
>>>> MAINTAINERS | 11 +
>>>> drivers/net/ethernet/intel/Kconfig | 20 +
>>>> drivers/net/ethernet/intel/Makefile | 1 +
>>>> drivers/net/ethernet/intel/hbl_cn/Makefile | 9 +
>>>> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
>>>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5954 +++++++++++++++++
>>>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1627 +++++
>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 220 +
>>>> .../intel/hbl_cn/common/hbl_cn_memory.c | 40 +
>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 33 +
>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 13 +
>>>> include/linux/habanalabs/cpucp_if.h | 125 +-
>>>> include/linux/habanalabs/hl_boot_if.h | 9 +-
>>>> include/linux/net/intel/cn.h | 474 ++
>>>> include/linux/net/intel/cn_aux.h | 298 +
>>>> include/linux/net/intel/cni.h | 636 ++
>>>> 18 files changed, 9545 insertions(+), 11 deletions(-)
>>>
>>> this is a very big patch, it asks for a split; what's worse, it's
>>> proportional to the size of this series:
>>> 146 files changed, 148514 insertions(+), 70 deletions(-)
>>> which is just too big
>>>
>>> [...]
>>>
>>
>> Yeah, well I'm limited to 15 patches per patch set according to the kernel
>> doc so I had to have this big patch.
>> Our changes are contained in 4 different drivers and all of the changes
>> should be merged together so the HW will be operational.
>> Hence I had to squeeze some code to a big patch.
>
> Submit your code in multiple steps. One driver at a time.
>
> Thanks
I can push one driver at a time, but then the big context is missing. Every
single driver is useless without the others (or at least a subset of
them).
But I'll do that anyway so it will be possible to review.
Thanks.
* Re: [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver
2024-06-17 15:02 ` Andrew Lunn
@ 2024-06-18 7:51 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 7:51 UTC (permalink / raw)
To: Andrew Lunn, Markus Elfring
Cc: Abhilash K V, Andrey Agranovich, Bharat Jauhari, David Meriin,
Sagiv Ozeri, Zvika Yehudai, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, dri-devel@lists.freedesktop.org, LKML
On 6/17/24 18:02, Andrew Lunn wrote:
>
> On Mon, Jun 17, 2024 at 04:05:57PM +0200, Markus Elfring wrote:
>> …
>>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
>>> @@ -0,0 +1,5954 @@
>> …
>>> +int hbl_cn_read_spmu_counters(struct hbl_cn_port *cn_port, u64 out_data[], u32 *num_out_data)
>>> +{
>> …
>>> + mutex_lock(&cn_port->cnt_lock);
>>> + rc = port_funcs->spmu_sample(cn_port, *num_out_data, out_data);
>>> + mutex_unlock(&cn_port->cnt_lock);
>>> +
>>> + return rc;
>>> +}
>> …
>>
>> Would you become interested to apply a statement like “guard(mutex)(&cn_port->cnt_lock);”?
>> https://elixir.bootlin.com/linux/v6.10-rc4/source/include/linux/mutex.h#L196
>
> Hi Markus
>
> We decided for netdev that guard() was too magical, at least for the
> moment. Let's wait a few years to see how it pans out. scoped_guard()
> is however O.K.
>
> Andrew
Thanks for the reference.
I don't see any other Ethernet driver that uses these, so let me first
handle the necessary changes; later on I'll check optional
enhancements.
But yeah, we are always open to improving the code.
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-18 7:08 ` Leon Romanovsky
@ 2024-06-18 7:58 ` Omer Shpigelman
2024-06-18 9:00 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 7:58 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 10:08, Leon Romanovsky wrote:
> On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
>> On 6/17/24 16:18, Leon Romanovsky wrote:
>>>
>>> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
>>>> Add a common QP state machine which handles the moving for a QP from one
>>>> state to another including performing necessary checks, draining
>>>> in-flight transactions, invalidating caches and error reporting.
>>>>
>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>> ---
>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
>>>> 1 file changed, 479 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> index 9ddc23bf8194..26ebdf448193 100644
>>>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> @@ -6,8 +6,486 @@
>>>
>>> <...>
>>>
>>>> +/* The following table represents the (valid) operations that can be performed on
>>>> + * a QP in order to move it from one state to another
>>>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
>>>> + * operation.
>>>> + */
>>>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
>>>> + [CN_QP_STATE_RESET] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_INIT] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_RTR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
>>>> + },
>>>> + [CN_QP_STATE_RTS] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
>>>> + },
>>>> + [CN_QP_STATE_SQD] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
>>>> + },
>>>> + [CN_QP_STATE_QPD] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
>>>> + },
>>>> + [CN_QP_STATE_SQERR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_ERR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + }
>>>> +};
>>>
>>> I don't understand why IBTA QP state machine is declared in ETH driver
>>> and not in IB driver.
>>>
>>
>> Implementing the actual transitions between the states requires full
>> knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
>> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
>> generations are planned to be added and the IB driver should not be aware
>> of these additional HWs.
>> Hence we implemented the QP state machine in the CN driver which is aware
>> of the actual HW.
>
> Somehow ALL other IB drivers are able to implement this logic in the IB,
> while supporting multiple ASICs. I don't see a reason why you can't do
> the same.
>
If we are referring to this actual table, then I can move it to the IB
driver and the CN driver will fetch the needed opcode via a function
pointer.
Is that ok?
>>
>>>> +
>>>
>>> <...>
>>>
>>>> + /* Release lock while we wait before retry.
>>>> + * Note, we can assert that we are already locked.
>>>> + */
>>>> + port_funcs->cfg_unlock(cn_port);
>>>> +
>>>> + msleep(20);
>>>> +
>>>> + port_funcs->cfg_lock(cn_port);
>>>
>>> lock/unlock through ops pointer doesn't look like a good idea.
>>>
>>
>> More ASIC generations will be added once we merge the current Gaudi2 code.
>> On other ASICs the locking granularity is different because some of the HW
>> resources are shared between different logical ports.
>> Hence it was logical for us to implement it with a function pointer so
>> each ASIC-specific code can implement the locking properly.
>
> We are reviewing this code which is for the current ASIC, not for the
> unknown future ASICs. Please don't over engineer the first submission.
> You will always be able to improve/change the code once you decide to
> upstream your future ASICs.
>
I see. I'll refactor the code then.
> Thanks
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-18 7:58 ` Omer Shpigelman
@ 2024-06-18 9:00 ` Leon Romanovsky
2024-06-24 7:24 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-18 9:00 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Tue, Jun 18, 2024 at 07:58:55AM +0000, Omer Shpigelman wrote:
> On 6/18/24 10:08, Leon Romanovsky wrote:
> > On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
> >> On 6/17/24 16:18, Leon Romanovsky wrote:
> >>>
> >>> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
> >>>> Add a common QP state machine which handles the moving for a QP from one
> >>>> state to another including performing necessary checks, draining
> >>>> in-flight transactions, invalidating caches and error reporting.
> >>>>
> >>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>> ---
> >>>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
> >>>> 1 file changed, 479 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >>>> index 9ddc23bf8194..26ebdf448193 100644
> >>>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >>>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >>>> @@ -6,8 +6,486 @@
> >>>
> >>> <...>
> >>>
> >>>> +/* The following table represents the (valid) operations that can be performed on
> >>>> + * a QP in order to move it from one state to another
> >>>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
> >>>> + * operation.
> >>>> + */
> >>>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
> >>>> + [CN_QP_STATE_RESET] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >>>> + },
> >>>> + [CN_QP_STATE_INIT] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >>>> + },
> >>>> + [CN_QP_STATE_RTR] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
> >>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
> >>>> + },
> >>>> + [CN_QP_STATE_RTS] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
> >>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
> >>>> + },
> >>>> + [CN_QP_STATE_SQD] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
> >>>> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
> >>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
> >>>> + },
> >>>> + [CN_QP_STATE_QPD] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
> >>>> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
> >>>> + },
> >>>> + [CN_QP_STATE_SQERR] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
> >>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
> >>>> + },
> >>>> + [CN_QP_STATE_ERR] = {
> >>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
> >>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
> >>>> + }
> >>>> +};
> >>>
> >>> I don't understand why IBTA QP state machine is declared in ETH driver
> >>> and not in IB driver.
> >>>
> >>
> >> Implementing the actual transitions between the states requires full
> >> knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
> >> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
> >> generations are planned to be added and the IB driver should not be aware
> >> of these additional HWs.
> >> Hence we implemented the QP state machine in the CN driver which is aware
> >> of the actual HW.
> >
> > Somehow ALL other IB drivers are able to implement this logic in the IB,
> > while supporting multiple ASICs. I don't see a reason why you can't do
> > the same.
> >
>
> If we are referring to this actual table, then I can move it to the IB
> driver and the CN driver will fetch the needed opcode via a function
> pointer.
> Is that ok?
This table caught my attention, but the right separation shouldn't be limited
to only this table. The outcome of this conversation should be:
"IB specific logic should be in IB driver, and CN driver should be able to
handle only low-level operations".
Thanks
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-17 19:04 ` Leon Romanovsky
@ 2024-06-18 11:08 ` Omer Shpigelman
2024-06-18 12:58 ` Leon Romanovsky
2024-06-18 16:01 ` Przemek Kitszel
0 siblings, 2 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 11:08 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/17/24 22:04, Leon Romanovsky wrote:
>
> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>
>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>> to the capabilities that were passed on device initialization.
>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>
>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>
>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
>>> people probably touched the code and did not actually sit together in
>>> the same room and write the code together. So, please remove the
>>> extensive "Co-developed-by" tags.
>>>
>>> It is not a full review yet, just simple pass-by comments.
>>>
>>
>> Actually, except for two, all of the mentioned persons sat in the same room
>> and developed the code together.
>> The remaining two are located at a different site (but also together).
>> Isn't that what the "Co-developed-by" tag is for?
>> I wanted to give them credit for writing the code, but I can remove it if
>> that's not common.
>
> Signed-off-by will be enough to give them credit.
>
Ok, good enough.
>>
>>>> ---
>>>> MAINTAINERS | 10 +
>>>> drivers/infiniband/Kconfig | 1 +
>>>> drivers/infiniband/hw/Makefile | 1 +
>>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
>>>> drivers/infiniband/hw/hbl/Makefile | 8 +
>>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
>>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
>>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
>>>> include/uapi/rdma/hbl-abi.h | 204 ++
>>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
>>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
>>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
>>>> 12 files changed, 3904 insertions(+)
>>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
>>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
>>>> create mode 100644 include/uapi/rdma/hbl-abi.h
>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
>>>
>>> <...>
>>>
>>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
>>>> +
>>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
>>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>> +
>>>
>>> Please don't redefine the existing macros. Just use the existing ones.
>>>
>>>
>>> <...>
>>>
>>
>> That's a leftover from some debug code. I'll remove.
>>
>>>> + if (hbl_ib_match_netdev(ibdev, netdev))
>>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>> + else
>>>> + return NOTIFY_DONE;
>>>
>>> It is not kernel coding style. Please write:
>>> if (!hbl_ib_match_netdev(ibdev, netdev))
>>> return NOTIFY_DONE;
>>>
>>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>
>>
>> I'll fix the code, thanks.
>>
>>>> +
>>>
>>> <...>
>>>
>>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>>>> +{
>>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
>>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
>>>> + struct hbl_ib_device *hdev;
>>>> + ktime_t timeout;
>>>> + int rc;
>>>> +
>>>> + rc = hdev_init(aux_dev);
>>>> + if (rc) {
>>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
>>>> + return -EIO;
>>>> + }
>>>> +
>>>> + hdev = aux_dev->priv;
>>>> +
>>>> + /* don't allow module unloading while it is attached */
>>>> + if (!try_module_get(THIS_MODULE)) {
>>>
>>> This part makes wonder, what are you trying to do here? What doesn't work for you
>>> in standard driver core and module load mechanism?
>>>
>>
>> Before auxiliary bus was introduced, we used EXPORT_SYMBOLs for inter
>> driver communication. That incremented the refcount of the used module so
>> it couldn't be removed while it is in use.
>> Auxiliary bus usage doesn't increment the used module refcount and hence
>> the used module can be removed while it is in use and that's something
>> we don't want to allow.
>> We could solve it by some global locking or in_use atomic but the most
>> simple and clean way is just to increment the used module refcount on
>> auxiliary device probe and decrement it on auxiliary device removal.
>
> No, you were supposed to continue to use EXPORT_SYMBOLs and not
> invent an auxiliary ops structure (this is why you lost module
> reference counting).
>
Sorry, but according to the auxiliary bus doc, a domain-specific ops
structure can be used.
We followed the usage example described at drivers/base/auxiliary.c.
What am I missing?
Moreover, we'd like to support the mode where the IB or the ETH driver is
not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
exclusively for inter driver communication.
>>
>>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
>>>> + module_name(THIS_MODULE));
>>>> + rc = -EIO;
>>>> + goto module_get_err;
>>>> + }
>>>> +
>>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
>>>> + while (1) {
>>>> + aux_ops->hw_access_lock(aux_dev);
>>>> +
>>>> + /* if the device is operational, proceed to actual init while holding the lock in
>>>> + * order to prevent concurrent hard reset
>>>> + */
>>>> + if (aux_ops->device_operational(aux_dev))
>>>> + break;
>>>> +
>>>> + aux_ops->hw_access_unlock(aux_dev);
>>>> +
>>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
>>>> + rc = -EBUSY;
>>>> + goto timeout_err;
>>>> + }
>>>> +
>>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
>>>> +
>>>> + msleep_interruptible(MSEC_PER_SEC);
>>>> + }
>>>
>>> The code above is unexpected.
>>>
>>
>> We have no control over when the user insmods the IB driver.
>
> It is not true, this is controlled through module dependencies
> mechanism.
>
Yeah, that would hold if we used EXPORT_SYMBOLs for inter-driver
communication, but we don't.
>> As a result it is possible that the IB auxiliary device will be probed
>> while the compute device is under reset (due to some HW error).
>
> No, it is not possible. If you structure your driver right.
>
Again, that holds only if we were using EXPORT_SYMBOLs, which we aren't.
Please let me know if we misunderstood something, because AFAIU we followed
the auxiliary bus doc usage example.
> Thanks
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-15 10:55 ` Zhu Yanjun
@ 2024-06-18 11:16 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 11:16 UTC (permalink / raw)
To: Zhu Yanjun, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
On 6/15/24 13:55, Zhu Yanjun wrote:
> [You don't often get email from yanjun.zhu@linux.dev. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> 在 2024/6/13 16:22, Omer Shpigelman 写道:
>> +
>> +/* This function should be called after ctrl_lock was taken */
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/dev-tools/sparse.rst?h=v6.10-rc3#n64
>
> "
> __must_hold - The specified lock is held on function entry and exit.
> "
> Add "__must_hold" to confirm "The specified lock is held on function
> entry and exit." ?
>
> Zhu Yanjun
Thanks, I'll add it.
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-18 11:08 ` Omer Shpigelman
@ 2024-06-18 12:58 ` Leon Romanovsky
2024-06-19 9:27 ` Omer Shpigelman
2024-06-18 16:01 ` Przemek Kitszel
1 sibling, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-18 12:58 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
> On 6/17/24 22:04, Leon Romanovsky wrote:
> >
> > On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> >> On 6/13/24 22:18, Leon Romanovsky wrote:
> >>>
> >>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> >>>> The driver itself is agnostic to the ASIC in action, it operates according
> >>>> to the capabilities that were passed on device initialization.
> >>>> The device is initialized by the hbl_cn driver via auxiliary bus.
> >>>> The driver also supports QP resource tracking and port/device HW counters.
> >>>>
> >>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>
> >>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
> >>> people probably touched the code and did not actually sit together in
> >>> the same room and write the code together. So, please remove the
> >>> extensive "Co-developed-by" tags.
> >>>
> >>> It is not a full review yet, just simple pass-by comments.
> >>>
> >>
> >> Actually, except for two, all of the mentioned persons sat in the same
> >> room and developed the code together.
> >> The remaining two are located at a different site (but also together).
> >> Isn't that what the "Co-developed-by" tag is for?
> >> I wanted to give them credit for writing the code, but I can remove it
> >> if it's not common.
> >
> > Signed-off-by will be enough to give them credit.
> >
>
> Ok, good enough.
>
> >>
> >>>> ---
> >>>> MAINTAINERS | 10 +
> >>>> drivers/infiniband/Kconfig | 1 +
> >>>> drivers/infiniband/hw/Makefile | 1 +
> >>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
> >>>> drivers/infiniband/hw/hbl/Makefile | 8 +
> >>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> >>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> >>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> >>>> include/uapi/rdma/hbl-abi.h | 204 ++
> >>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> >>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> >>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> >>>> 12 files changed, 3904 insertions(+)
> >>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> >>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> >>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> >>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> >>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> >>>> create mode 100644 include/uapi/rdma/hbl-abi.h
> >>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> >>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
> >>>
> >>> <...>
> >>>
> >>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> >>>> +
> >>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> >>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>> +
> >>>
> >>> Please don't redefine the existing macros. Just use the existing ones.
> >>>
> >>>
> >>> <...>
> >>>
> >>
> >> That's a leftover from some debug code. I'll remove.
> >>
> >>>> + if (hbl_ib_match_netdev(ibdev, netdev))
> >>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>> + else
> >>>> + return NOTIFY_DONE;
> >>>
> >>> It is not kernel coding style. Please write:
> >>> if (!hbl_ib_match_netdev(ibdev, netdev))
> >>> return NOTIFY_DONE;
> >>>
> >>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>
> >>
> >> I'll fix the code, thanks.
> >>
> >>>> +
> >>>
> >>> <...>
> >>>
> >>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> >>>> +{
> >>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> >>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> >>>> + struct hbl_ib_device *hdev;
> >>>> + ktime_t timeout;
> >>>> + int rc;
> >>>> +
> >>>> + rc = hdev_init(aux_dev);
> >>>> + if (rc) {
> >>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> >>>> + return -EIO;
> >>>> + }
> >>>> +
> >>>> + hdev = aux_dev->priv;
> >>>> +
> >>>> + /* don't allow module unloading while it is attached */
> >>>> + if (!try_module_get(THIS_MODULE)) {
> >>>
> >>> This part makes wonder, what are you trying to do here? What doesn't work for you
> >>> in standard driver core and module load mechanism?
> >>>
> >>
> >> Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
> >> inter-driver communication. That incremented the refcount of the used
> >> module so it couldn't be removed while it was in use.
> >> Auxiliary bus usage doesn't increment the used module refcount and hence
> >> the used module can be removed while it is in use and that's something
> >> we don't want to allow.
> >> We could solve it by some global locking or in_use atomic but the most
> >> simple and clean way is just to increment the used module refcount on
> >> auxiliary device probe and decrement it on auxiliary device removal.
> >
> > No, you were supposed to continue to use EXPORT_SYMBOLs and not
> > invent an auxiliary ops structure (this is why you lost module
> > reference counting).
> >
>
> Sorry, but according to the auxiliary bus doc, a domain-specific ops
> structure can be used.
> We followed the usage example described in drivers/base/auxiliary.c.
> What am I missing?
Being the one who implemented the auxiliary bus in the kernel and converted
a number of drivers to use it, I strongly recommend NOT following the
example provided there.
So you are missing "best practice", and "best practice" is to use
EXPORT_SYMBOLs and rely on module reference counting.
> Moreover, we'd like to support the mode where the IB or the ETH driver is
> not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
> exclusively for inter-driver communication.
It is not true and not how the kernel works. You can perfectly well load the
core driver without IB and ETH; to some extent this is how the mlx5 driver works.
>
> >>
> >>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> >>>> + module_name(THIS_MODULE));
> >>>> + rc = -EIO;
> >>>> + goto module_get_err;
> >>>> + }
> >>>> +
> >>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> >>>> + while (1) {
> >>>> + aux_ops->hw_access_lock(aux_dev);
> >>>> +
> >>>> + /* if the device is operational, proceed to actual init while holding the lock in
> >>>> + * order to prevent concurrent hard reset
> >>>> + */
> >>>> + if (aux_ops->device_operational(aux_dev))
> >>>> + break;
> >>>> +
> >>>> + aux_ops->hw_access_unlock(aux_dev);
> >>>> +
> >>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
> >>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> >>>> + rc = -EBUSY;
> >>>> + goto timeout_err;
> >>>> + }
> >>>> +
> >>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> >>>> +
> >>>> + msleep_interruptible(MSEC_PER_SEC);
> >>>> + }
> >>>
> >>> The code above is unexpected.
> >>>
> >>
> >> We have no control over when the user loads the IB driver.
> >
> > It is not true, this is controlled through module dependencies
> > mechanism.
> >
>
> Yeah, that would hold if we used EXPORT_SYMBOLs for inter-driver
> communication, but we don't.
So please use it and don't add complexity where it is not needed.
>
> >> As a result it is possible that the IB auxiliary device will be probed
> >> while the compute device is under reset (due to some HW error).
> >
> > No, it is not possible. If you structure your driver right.
> >
>
> Again, it would not be possible if we were using EXPORT_SYMBOLs.
> Please let me know if we misunderstood something because AFAIU we followed
> the auxiliary bus doc usage example.
It is better to follow actual drivers that use auxiliary bus and see how
they implemented it and not rely on examples in the documentation.
Thanks
>
> > Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-18 6:58 ` Omer Shpigelman
@ 2024-06-18 14:19 ` Andrew Lunn
2024-06-19 7:16 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-18 14:19 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
> >> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> >> +{
> >> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> >> + struct net_device *ndev = port->ndev;
> >> + u32 mtu;
> >> +
> >> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> >> + return 0;
> >> + }
> >> +
> >> + mtu = ndev->mtu;
> >
> > I think you need a better error message. All this does is access
> > ndev->mtu. What does it matter if the port is in reset? You don't
> > access it.
> >
>
> This function is called from the CN driver to get the current MTU in order
> to configure it in the HW, for example when configuring an IB QP. The MTU
> value might be changed by the user while we execute this function.
Change of MTU will happen while holding RTNL. Why not simply hold RTNL
while programming the hardware? That is the normal pattern for MAC
drivers.
> >> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> >> +{
> >> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> >> + int rc = 0;
> >> +
> >> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> >> + return -EBUSY;
> >> + }
> >> +
> >> + if (netif_running(port->ndev)) {
> >> + hbl_en_port_close(port);
> >> +
> >> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
> >> + msleep(20);
> >> +
> >> + netdev->mtu = new_mtu;
> >> +
> >> + rc = hbl_en_port_open(port);
> >> + if (rc)
> >> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> >
> > Does that mean the port is FUBAR?
> >
> > Most operations like this are expected to roll back to the previous
> > working configuration on failure. So if changing the MTU requires new
> > buffers in your ring, you should first allocate the new buffers, then
> > free the old buffers, so that if allocation fails, you still have
> > buffers, and the device can continue operating.
> >
>
> A failure in opening a port is a fatal error. It shouldn't happen. This is
> not something we wish to recover from.
What could cause open to fail? Is memory allocated?
> This kind of an error indicates a severe system error that will usually
> require a driver removal and reload anyway.
>
> >> +module_param(poll_enable, bool, 0444);
> >> +MODULE_PARM_DESC(poll_enable,
> >> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
> >
> > Module parameters are not liked. This probably needs to go away.
> >
>
> I see that various vendors under net/ethernet/* use module parameters.
> Can't we add another one?
Look at the history of those module parameters. Do you see many added
in the last year? 5 years?
> >> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
> >> +{
> >> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> >> + modinfo->type = ETH_MODULE_SFF_8636;
> >
> > Is this an SFF, not an SFP? How else can you know what module it is
> > without doing an I2C transfer to ask the module what it is?
> >
>
> The current type is SFF and it is unlikely to be changed.
Well, SFF are soldered to the board, so yes, it is unlikely to
change...
Please add a comment that this is an SFF, not an SFP, so it is soldered
to the board and known to be an 8636-compatible device.
> Are you referring to get_module_eeprom_by_page()? if so, then it is not
> supported by our FW, we read the entire data on device load.
> However, I can hide that behind the new API and return only the
> requested page if that's the intention.
Well, if your firmware is so limited, then you might as well stick to
the old API, and let the core do the conversion to the legacy
code. But I'm surprised you don't allow access to the temperature
sensors, received signal strength, voltages etc, which could be
exported via HWMON.
> >> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> >
> > That looks odd. Care to explain?
> >
>
> The HW of all of our ports supports autoneg.
> But in addition, the ports are divided into two groups:
> internal: ports which are connected to other Gaudi2 ports in the same server.
> external: ports which are connected to an external switch.
> Only internal ports use autoneg.
> The ports mask which sets each port as internal/external is fetched from
> the FW on device load.
That is not what I meant. lp_advertising should indicate the link
modes the peer is advertising. If this was a copper link, it typically
would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
since the peer must be advertising in order for lp_advertising to have a
value!
> Our HW supports Pause frames.
> But, PFC can be disabled via lldptool for example, so we won't advertise
> it.
Please also implement the standard netdev way of configuring pause.
When you do that, you should start to understand how pause can be
negotiated, or forced. That is what most get wrong.
> I'll try to find more info about it, but can you please share what's wrong
> with the current code?
> BTW I will change it to Asym_Pause as we support Tx pause frames as well.
>
> >> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
> >> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
> >> + rc = -EFAULT;
> >> + goto out;
> >
> > Don't say you support autoneg in supported if that is the case.
> >
> > And EFAULT is about memory problems. EINVAL, maybe EPERM? or
> > EOPNOTSUPP.
> >
> > Andrew
>
> Yeah, should be switched to EPERM/EOPNOTSUPP.
> Regarding the support of autoneg - the HW supports autoneg but it might be
> disabled by the FW. Hence we might not be able to switch it on.
No problem, ask the firmware what it is doing, and return the reality
in ksettings. Only say you support autoneg if your firmware allows you
to perform autoneg.
Andrew
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-18 11:08 ` Omer Shpigelman
2024-06-18 12:58 ` Leon Romanovsky
@ 2024-06-18 16:01 ` Przemek Kitszel
2024-06-19 9:34 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Przemek Kitszel @ 2024-06-18 16:01 UTC (permalink / raw)
To: Omer Shpigelman, Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 13:08, Omer Shpigelman wrote:
> On 6/17/24 22:04, Leon Romanovsky wrote:
>>
>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>
>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>>> to the capabilities that were passed on device initialization.
>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>
>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>
>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
>>>> people probably touched the code and did not actually sit together in
>>>> the same room and write the code together. So, please remove the
>>>> extensive "Co-developed-by" tags.
>>>>
>>>> It is not a full review yet, just simple pass-by comments.
>>>>
>>>
>>> Actually, except for two, all of the mentioned persons sat in the same
>>> room and developed the code together.
>>> The remaining two are located at a different site (but also together).
>>> Isn't that what the "Co-developed-by" tag is for?
>>> I wanted to give them credit for writing the code, but I can remove it
>>> if it's not common.
>>
>> Signed-off-by will be enough to give them credit.
>>
>
> Ok, good enough.
>
I would say that a lone sign-off gives little credit compared to the
co-developed-by tag. OTOH the list here is unusually long. What makes it
even more tricky to evaluate is the fact that there is a lot of code ;)
So, I would suggest re-evaluating this in your next (trimmed down)
revisions.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-14 22:48 ` Joe Damato
2024-06-16 1:04 ` Andrew Lunn
@ 2024-06-18 19:37 ` Omer Shpigelman
2024-06-18 21:19 ` Stephen Hemminger
1 sibling, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 19:37 UTC (permalink / raw)
To: Joe Damato, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/15/24 01:48, Joe Damato wrote:
>
> On Thu, Jun 13, 2024 at 11:22:02AM +0300, Omer Shpigelman wrote:
>> This ethernet driver is initialized via auxiliary bus by the hbl_cn
>> driver.
>> It serves mainly for control operations that are needed for AI scaling.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
<...>
>> + if (hdev->poll_enable)
>> + skb = __netdev_alloc_skb_ip_align(ndev, pkt_size, GFP_KERNEL);
>> + else
>> + skb = napi_alloc_skb(&port->napi, pkt_size);
>> +
>> + if (!skb) {
>> + atomic64_inc(&port->net_stats.rx_dropped);
>
> It seems like buffer exhaustion (!skb) would be rx_missed_errors?
>
> The documentation in include/uapi/linux/if_link.h:
>
> * @rx_dropped: Number of packets received but not processed,
> * e.g. due to lack of resources or unsupported protocol.
> * For hardware interfaces this counter may include packets discarded
> * due to L2 address filtering but should not include packets dropped
> * by the device due to buffer exhaustion which are counted separately in
> * @rx_missed_errors (since procfs folds those two counters together).
>
> But, I don't know much about your hardware so I could be wrong.
>
Per my understanding rx_dropped should be used here. According to the doc
you posted, rx_dropped should be used in case of dropped packets due to
lack of resources, while rx_missed_errors should be used for packets that
were dropped by the device due to buffer exhaustion, not by the driver.
Please correct me if I misunderstood something.
>> + break;
>> + }
>> +
>> + skb_copy_to_linear_data(skb, pkt_addr, pkt_size);
>> + skb_put(skb, pkt_size);
>> +
>> + if (is_pkt_swap_enabled(hdev))
>> + dump_swap_pkt(port, skb);
>> +
>> + skb->protocol = eth_type_trans(skb, ndev);
>> +
>> + /* Zero the packet buffer memory to avoid leak in case of wrong
>> + * size is used when next packet populates the same memory
>> + */
>> + memset(pkt_addr, 0, pkt_size);
>> +
>> + /* polling is done in thread context and hence BH should be disabled */
>> + if (hdev->poll_enable)
>> + local_bh_disable();
>> +
>> + rc = netif_receive_skb(skb);
>
> Is there any reason in particular to call netif_receive_skb instead of
> napi_gro_receive ?
>
As you can see, we also support a polling mode which is a non-NAPI flow.
We could use napi_gro_receive() for the NAPI flow and netif_receive_skb()
for polling mode, but we don't support RX checksum offload anyway.
>> +
>> + if (hdev->poll_enable)
>> + local_bh_enable();
<...>
>> + pkt_count = hbl_en_handle_rx(port, budget);
>> +
>> + /* If budget not fully consumed, exit the polling mode */
>> + if (pkt_count < budget) {
>> + napi_complete_done(napi, pkt_count);
>
> I believe this code might be incorrect and that it should be:
>
> if (napi_complete_done(napi, pkt_done))
> hdev->asic_funcs.reenable_rx_irq(port);
>
Thanks, I'll add the condition.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-15 0:16 ` Stephen Hemminger
@ 2024-06-18 19:39 ` Omer Shpigelman
2024-06-19 15:40 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-18 19:39 UTC (permalink / raw)
To: Stephen Hemminger
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/15/24 03:16, Stephen Hemminger wrote:
>
>> +
>> +/* get the src IP as it is done in devinet_ioctl() */
>> +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip)
>> +{
>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>> + struct net_device *ndev = port->ndev;
>> + struct in_device *in_dev;
>> + struct in_ifaddr *ifa;
>> + int rc = 0;
>> +
>> + /* for the case where no src IP is configured */
>> + *src_ip = 0;
>> +
>> + /* rtnl lock should be acquired in relevant flows before taking configuration lock */
>> + if (!rtnl_is_locked()) {
>> + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n");
>> + rc = -EFAULT;
>> + goto out;
>> + }
>> +
>> + in_dev = __in_dev_get_rtnl(ndev);
>> + if (!in_dev) {
>> + netdev_err(port->ndev, "Failed to get IPv4 struct\n");
>> + rc = -EFAULT;
>> + goto out;
>> + }
>> +
>> + ifa = rtnl_dereference(in_dev->ifa_list);
>> +
>> + while (ifa) {
>> + if (!strcmp(ndev->name, ifa->ifa_label)) {
>> + /* convert the BE to native and later on it will be
>> + * written to the HW as LE in QPC_SET
>> + */
>> + *src_ip = be32_to_cpu(ifa->ifa_local);
>> + break;
>> + }
>> + ifa = rtnl_dereference(ifa->ifa_next);
>> + }
>> +out:
>> + return rc;
>> +}
>
> Does this device require IPv4? What about users and infrastructures that use IPv6 only?
> IPv4 is legacy at this point.
Gaudi2 supports IPv4 only.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-18 19:37 ` Omer Shpigelman
@ 2024-06-18 21:19 ` Stephen Hemminger
2024-06-19 12:13 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Stephen Hemminger @ 2024-06-18 21:19 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Joe Damato, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Tue, 18 Jun 2024 19:37:36 +0000
Omer Shpigelman <oshpigelman@habana.ai> wrote:
> >
> > Is there any reason in particular to call netif_receive_skb instead of
> > napi_gro_receive ?
> >
>
> As you can see, we also support polling mode which is a non-NAPI flow.
> We could use napi_gro_receive() for NAPI flow and netif_receive_skb() for
> polling mode but we don't support RX checksum offload anyway.
Why non-NAPI? I thought current netdev policy was all drivers should
use NAPI.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-18 14:19 ` Andrew Lunn
@ 2024-06-19 7:16 ` Omer Shpigelman
2024-06-19 8:01 ` Przemek Kitszel
` (2 more replies)
0 siblings, 3 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 7:16 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 17:19, Andrew Lunn wrote:
>>>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
>>>> +{
>>>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>>>> + struct net_device *ndev = port->ndev;
>>>> + u32 mtu;
>>>> +
>>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>> + netdev_err(ndev, "port is in reset, can't get MTU\n");
>>>> + return 0;
>>>> + }
>>>> +
>>>> + mtu = ndev->mtu;
>>>
>>> I think you need a better error message. All this does is access
>>> ndev->mtu. What does it matter if the port is in reset? You don't
>>> access it.
>>>
>>
>> This function is called from the CN driver to get the current MTU in order
>> to configure it in the HW, for example when configuring an IB QP. The MTU
>> value might be changed by the user while we execute this function.
>
> Change of MTU will happen while holding RTNL. Why not simply hold RTNL
> while programming the hardware? That is the normal pattern for MAC
> drivers.
>
I can hold the RTNL lock while configuring the HW but it seems like a big
overhead. Configuring the HW might take some time due to QP draining or
cache invalidation.
To me it seems unnecessary but if that's the common way then I'll change
it.
>>>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
>>>> +{
>>>> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
>>>> + int rc = 0;
>>>> +
>>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>> + netdev_err(netdev, "port is in reset, can't change MTU\n");
>>>> + return -EBUSY;
>>>> + }
>>>> +
>>>> + if (netif_running(port->ndev)) {
>>>> + hbl_en_port_close(port);
>>>> +
>>>> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
>>>> + msleep(20);
>>>> +
>>>> + netdev->mtu = new_mtu;
>>>> +
>>>> + rc = hbl_en_port_open(port);
>>>> + if (rc)
>>>> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
>>>
>>> Does that mean the port is FUBAR?
>>>
>>> Most operations like this are expected to roll back to the previous
>>> working configuration on failure. So if changing the MTU requires new
>>> buffers in your ring, you should first allocate the new buffers, then
>>> free the old buffers, so that if allocation fails, you still have
>>> buffers, and the device can continue operating.
>>>
>>
>> A failure in opening a port is a fatal error. It shouldn't happen. This is
>> not something we wish to recover from.
>
> What could cause open to fail? Is memory allocated?
>
Memory is allocated but it is freed in case of a failure.
Port opening can fail for other reasons as well, like a HW timeout
while configuring the ETH QP.
>> This kind of an error indicates a severe system error that will usually
>> require a driver removal and reload anyway.
>>
>>>> +module_param(poll_enable, bool, 0444);
>>>> +MODULE_PARM_DESC(poll_enable,
>>>> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
>>>
>>> Module parameters are not liked. This probably needs to go away.
>>>
>>
>> I see that various vendors under net/ethernet/* use module parameters.
>> Can't we add another one?
>
> Look at the history of those module parameters. Do you see many added
> in the last year? 5 years?
>
I didn't check that prior to my submission. Regarding this "no new module
parameters allowed" rule, is that documented anywhere? If not, is that the
common practice? Not to try to do something that was not done recently?
How is "recently" defined?
I just want to clarify this because it's hard to handle these submissions
when we write some code based on existing examples but then are
rejected because "we don't do that here anymore".
I want to avoid future cases of this mismatch.
>>>> +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo)
>>>> +{
>>>> + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
>>>> + modinfo->type = ETH_MODULE_SFF_8636;
>>>
>>> Is this an SFF, not an SFP? How else can you know what module it is
>>> without doing an I2C transfer to ask the module what it is?
>>>
>>
>> The current type is SFF and it is unlikely to be changed.
>
> Well, SFF are soldered to the board, so yes, it is unlikely to
> change...
>
> Please add a comment that this is an SFF, not an SFP, so it is soldered
> to the board and known to be an 8636-compatible device.
>
I'll add.
>> Are you referring to get_module_eeprom_by_page()? if so, then it is not
>> supported by our FW, we read the entire data on device load.
>> However, I can hide that behind the new API and return only the
>> requested page if that's the intention.
>
> Well, if your firmware is so limited, then you might as well stick to
> the old API, and let the core do the conversion to the legacy
> code. But i'm surprised you don't allow access to the temperature
> sensors, received signal strength, voltages etc, which could be
> exported via HWMON.
>
I'll stick to the old API.
Regarding the sensors, our compute driver (under accel/habanalabs) exports
them via HWMON.
>>>> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
>>>
>>> That looks odd. Care to explain?
>>>
>>
>> The HW of all of our ports supports autoneg.
>> But in addition, the ports are divided into two groups:
>> internal: ports which are connected to other Gaudi2 ports in the same server.
>> external: ports which are connected to an external switch.
>> Only internal ports use autoneg.
>> The ports mask which sets each port as internal/external is fetched from
>> the FW on device load.
>
> That is not what I meant. lp_advertising should indicate the link
> modes the peer is advertising. If this was a copper link, it typically
> would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
> 100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
> since the peer must be advertising in order for lp_advertising to have a
> value!
>
Sorry, but I don't get this. Is the problem the setting of the Autoneg bit
in lp_advertising? Is that redundant? I see that other vendors set it too
in case autoneg was completed.
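To check my understanding, here is a hedged userspace sketch of what I think is being asked for (the bit values and helper are made up; the kernel uses proper ethtool link-mode bitmaps): lp_advertising stays empty unless autoneg actually completed, so the Autoneg bit there only restates what a non-empty bitmap already implies.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative link-mode bits; the kernel instead uses
 * ethtool_link_ksettings_add_link_mode() on link-mode bitmaps. */
#define LM_Autoneg        (1u << 0)
#define LM_100000baseKR2  (1u << 1)

/* lp_advertising reports what the link partner advertised. If autoneg
 * never completed, the peer advertised nothing, so the bitmap stays
 * empty; if it did complete, the Autoneg bit merely restates what a
 * non-empty bitmap already implies. */
static uint32_t fill_lp_advertising(int an_complete, uint32_t peer_modes)
{
	if (!an_complete)
		return 0;
	return peer_modes | LM_Autoneg;
}
```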
>> Our HW supports Pause frames.
>> But PFC can be disabled via lldptool, for example, so we won't advertise
>> it.
>
> Please also implement the standard netdev way of configuring pause.
> When you do that, you should start to understand how pause can be
> negotiated, or forced. That is what most get wrong.
>
Let me revisit this.
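As a starting point for that revisit, here is a simplified model of forced vs. negotiated pause (this deliberately ignores the real 802.3 Sym/Asym pause resolution table and models negotiation as a plain AND of both sides; struct and helper names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified mirror of struct ethtool_pauseparam. */
struct pauseparam_sketch {
	bool autoneg;
	bool rx_pause;
	bool tx_pause;
};

/* With pause autoneg enabled, the effective rx/tx pause comes from the
 * negotiation result (modeled here as an AND of both sides, which
 * ignores the real Sym/Asym resolution rules); with autoneg off, the
 * locally forced settings apply as-is. */
static struct pauseparam_sketch resolve_pause(bool an,
					      bool local_rx, bool local_tx,
					      bool peer_rx, bool peer_tx)
{
	struct pauseparam_sketch p = { .autoneg = an };

	if (an) {
		p.rx_pause = local_rx && peer_rx;
		p.tx_pause = local_tx && peer_tx;
	} else {
		p.rx_pause = local_rx;
		p.tx_pause = local_tx;
	}
	return p;
}
```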
>> I'll try to find more info about it, but can you please share what's wrong
>> with the current code?
>> BTW I will change it to Asym_Pause as we support Tx pause frames as well.
>>
>>>> + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) {
>>>> + netdev_err(port->ndev, "port autoneg is disabled by BMC\n");
>>>> + rc = -EFAULT;
>>>> + goto out;
>>>
>>> Don't say you support autoneg in supported if that is the case.
>>>
>>> And EFAULT is about memory problems. EINVAL, maybe EPERM? or
>>> EOPNOTSUPP.
>>>
>>> Andrew
>>
>> Yeah, should be switched to EPERM/EOPNOTSUPP.
>> Regarding the support of autoneg - the HW supports autoneg but it might be
>> disabled by the FW. Hence we might not be able to switch it on.
>
> No problem, ask the firmware what it is doing, and return the reality
> in ksetting. Only say you support autoneg if your firmware allows you
> to perform autoneg.
>
> Andrew
>
Ok, I'll set the Autoneg support bit properly.
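Concretely, the fixed check could look like this sketch (the mask semantics and helper name are assumptions based on the snippet above): an unsupported request gets -EOPNOTSUPP (or -EPERM), while -EFAULT stays reserved for bad memory accesses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Sketch of the corrected check: enabling autoneg on a port that the
 * FW/BMC mask marks as autoneg-disabled is an unsupported request, so
 * -EOPNOTSUPP (or -EPERM) fits; -EFAULT is for bad memory accesses. */
static int set_autoneg_sketch(uint32_t auto_neg_mask, unsigned int port_idx,
			      int auto_neg)
{
	if (auto_neg && !(auto_neg_mask & BIT(port_idx)))
		return -EOPNOTSUPP;
	return 0;
}
```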
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 7:16 ` Omer Shpigelman
@ 2024-06-19 8:01 ` Przemek Kitszel
2024-06-19 12:15 ` Omer Shpigelman
2024-06-19 15:21 ` Jakub Kicinski
2024-06-19 16:13 ` Andrew Lunn
2 siblings, 1 reply; 107+ messages in thread
From: Przemek Kitszel @ 2024-06-19 8:01 UTC (permalink / raw)
To: Omer Shpigelman, Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 09:16, Omer Shpigelman wrote:
> On 6/18/24 17:19, Andrew Lunn wrote:
[...]
>>>>> +module_param(poll_enable, bool, 0444);
>>>>> +MODULE_PARM_DESC(poll_enable,
>>>>> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
>>>>
>>>> Module parameters are not liked. This probably needs to go away.
>>>>
>>>
>>> I see that various vendors under net/ethernet/* use module parameters.
>>> Can't we add another one?
>>
>> Look at the history of those module parameters. Do you see many added
>> in the last year? 5 years?
>>
>
> I didn't check that prior to my submission. Regarding this "no new module
> parameters allowed" rule, is it documented anywhere? If not, is it the
> common practice not to try to do something that was not done recently?
> And how is "recently" defined?
> I just want to clarify this because it's hard to handle these submissions
> when we write some code based on existing examples but are then
> rejected because "we don't do that here anymore".
> I want to avoid future cases of this mismatch.
>
The best way is to read the netdev ML; that way you will learn which
interfaces are frowned upon and which are outright banned. Sometimes you
can judge for yourself by knowing which interfaces are most actively
developed recently.
In this module params example - they were introduced to allow init-phase
configuration of the device that could not be postponed, which in the
general case sounds like a workaround; the hardest cases involve huge
swaths of (physically contiguous) memory to be allocated, but for that
there are now device tree binding solutions; more typical cases for
networking are resolved via devlink reload.
devlink params are also what should be used as the default for new
parameters, ideally when the given parameter is not a driver-specific
quirk.
poll_enable sounds like something that should be a common param,
but you have to better describe what you mean there
(see napi_poll(); "Enable Rx polling" would mean using that as the
default - do you mean busy polling or what?)
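The runtime-configurable behavior that devlink params give can be modeled with this toy sketch (the real interface is struct devlink_param registered via devlink_params_register(); everything below is illustrative): the value can be changed through get/set callbacks after probe, which a module parameter cannot do cleanly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a devlink runtime parameter. The point: the value is
 * read and written through callbacks at runtime, after the driver has
 * probed, unlike a module parameter fixed at load time. */
struct param_sketch {
	bool value;
};

static int poll_enable_get(struct param_sketch *p, bool *val)
{
	*val = p->value;
	return 0;
}

static int poll_enable_set(struct param_sketch *p, bool val)
{
	/* A real driver would switch between IRQ+NAPI and polling here. */
	p->value = val;
	return 0;
}
```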
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-18 12:58 ` Leon Romanovsky
@ 2024-06-19 9:27 ` Omer Shpigelman
2024-06-19 10:52 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 9:27 UTC (permalink / raw)
To: Leon Romanovsky, gregkh@linuxfoundation.org
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 15:58, Leon Romanovsky wrote:
> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
>> On 6/17/24 22:04, Leon Romanovsky wrote:
>>> [Some people who received this message don't often get email from leon@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>>>
>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>>
>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>>>> to the capabilities that were passed on device initialization.
>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>>
>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>
>>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
>>>>> people probably touched the code rather than actually sitting together
>>>>> in the same room and writing the code together. So, please remove the
>>>>> extensive "Co-developed-by" tags.
>>>>>
>>>>> It is not a full review yet, just simple passing-by comments.
>>>>>
>>>>
>>>> Actually, except for two, all of the mentioned persons sat in the same
>>>> room and developed the code together.
>>>> The remaining two are located at a different site (but also together).
>>>> Isn't that what the "Co-developed-by" tag is for?
>>>> I wanted to give them credit for writing the code, but I can remove it
>>>> if it's not common.
>>>
>>> Signed-off-by will be enough to give them credit.
>>>
>>
>> Ok, good enough.
>>
>>>>
>>>>>> ---
>>>>>> MAINTAINERS | 10 +
>>>>>> drivers/infiniband/Kconfig | 1 +
>>>>>> drivers/infiniband/hw/Makefile | 1 +
>>>>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
>>>>>> drivers/infiniband/hw/hbl/Makefile | 8 +
>>>>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
>>>>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
>>>>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
>>>>>> include/uapi/rdma/hbl-abi.h | 204 ++
>>>>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
>>>>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
>>>>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
>>>>>> 12 files changed, 3904 insertions(+)
>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
>>>>>> create mode 100644 include/uapi/rdma/hbl-abi.h
>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
>>>>>
>>>>> <...>
>>>>>
>>>>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
>>>>>> +
>>>>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
>>>>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>> +
>>>>>
>>>>> Please don't redefine the existing macros. Just use the existing ones.
>>>>>
>>>>>
>>>>> <...>
>>>>>
>>>>
>>>> That's a leftover from some debug code. I'll remove.
>>>>
>>>>>> + if (hbl_ib_match_netdev(ibdev, netdev))
>>>>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>>>> + else
>>>>>> + return NOTIFY_DONE;
>>>>>
>>>>> It is not kernel coding style. Please write:
>>>>> if (!hbl_ib_match_netdev(ibdev, netdev))
>>>>> return NOTIFY_DONE;
>>>>>
>>>>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>>>
>>>>
>>>> I'll fix the code, thanks.
>>>>
>>>>>> +
>>>>>
>>>>> <...>
>>>>>
>>>>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>>>>>> +{
>>>>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
>>>>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
>>>>>> + struct hbl_ib_device *hdev;
>>>>>> + ktime_t timeout;
>>>>>> + int rc;
>>>>>> +
>>>>>> + rc = hdev_init(aux_dev);
>>>>>> + if (rc) {
>>>>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
>>>>>> + return -EIO;
>>>>>> + }
>>>>>> +
>>>>>> + hdev = aux_dev->priv;
>>>>>> +
>>>>>> + /* don't allow module unloading while it is attached */
>>>>>> + if (!try_module_get(THIS_MODULE)) {
>>>>>
>>>>> This part makes me wonder, what are you trying to do here? What doesn't
>>>>> work for you in the standard driver core and module load mechanism?
>>>>>
>>>>
>>>> Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
>>>> inter-driver communication. That incremented the refcount of the used
>>>> module so it couldn't be removed while it was in use.
>>>> Auxiliary bus usage doesn't increment the used module's refcount, hence
>>>> the used module can be removed while it is in use, and that's something
>>>> we don't want to allow.
>>>> We could solve it with some global locking or an in_use atomic, but the
>>>> simplest and cleanest way is just to increment the used module's
>>>> refcount on auxiliary device probe and decrement it on auxiliary device
>>>> removal.
>>>
>>> No, you were supposed to continue using EXPORT_SYMBOLs and not invent
>>> an auxiliary ops structure (this is why you lost module reference
>>> counting).
>>>
>>
>> Sorry, but according to the auxiliary bus doc, a domain-specific ops
>> structure can be used.
>> We followed the usage example described at drivers/base/auxiliary.c.
>> What am I missing?
>
> Being the one who implemented auxiliary bus in the kernel and converted
> number of drivers to use it, I strongly recommend do NOT follow the example
> provided there.
>
> So you are missing "best practice", and "best practice" is to use
> EXPORT_SYMBOLs and rely on module reference counting.
>
It is not just the usage example but also the general feature doc before
it:
"The generic behavior can be extended and specialized as needed by
encapsulating an auxiliary_device within other domain-specific structures
and the use of .ops callbacks."
It is also mentioned there that the ops structure is used for specific
auxiliary device operations, while EXPORT_SYMBOLs should be used for common
infrastructure the parent driver exposes:
"Note that ops are intended as a way to augment instance behavior within a
class of auxiliary devices, it is not the mechanism for exporting common
infrastructure from the parent."
All of our ops callbacks are meant to provide functionality related to the
auxiliary device, they are not just general/common infrastructure.
Why do we have this doc if we should ignore it? Why wasn't the doc
modified according to the "best practice" you described? The doc is
misleading.
Adding gregkh here as he requested the auxiliary bus feature IIRC.
Greg - isn't the doc legit? Should EXPORT_SYMBOLs necessarily be used
together with the auxiliary bus rather than an ops structure?
As we saw it, the auxiliary bus gives us the flexibility to choose which
modules will be loaded, while EXPORT_SYMBOLs enforce dependencies
which might not be needed in some cases.
>> Moreover, we'd like to support the mode where the IB or the ETH driver is
>> not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
>> exclusively for inter driver communication.
>
> It is not true and not how the kernel works. You can perfectly load core
> driver without IB and ETH, at some extent this is how mlx5 driver works.
>
The mlx5 IB driver doesn't export any symbol that is used by the core
driver; that's why the core driver can be loaded without the IB driver
(although you'd get a circular dependency if you did export).
If relying on exported symbols only, then our IB and ETH drivers would need
to export symbols too, because the core driver accesses them post probing.
Hence we wouldn't be able to load the core driver without both of them (or
load anything at all, due to the circular dependency).
Unless we use dynamic symbol lookup, and I don't think that's your
intention.
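To make the refcounting point concrete, here is a toy model of what EXPORT_SYMBOL dependencies give automatically (and what try_module_get() reimplements by hand); all names are illustrative, and the real try_module_get() can fail while a module is unloading:

```c
#include <assert.h>

/* Toy model of module refcounting. When module B links against a
 * symbol that module A exports via EXPORT_SYMBOL, the module loader
 * bumps A's refcount on B's load, so A cannot be unloaded while B is
 * loaded. Using only auxiliary-bus ops pointers skips that, which is
 * why a probe path would call try_module_get() manually. */
struct module_sketch {
	int refcount;
};

static int try_module_get_sketch(struct module_sketch *m)
{
	/* The real try_module_get() fails if the module is unloading. */
	m->refcount++;
	return 1;
}

static void module_put_sketch(struct module_sketch *m)
{
	m->refcount--;
}

static int can_unload(const struct module_sketch *m)
{
	return m->refcount == 0;
}
```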
>>
>>>>
>>>>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
>>>>>> + module_name(THIS_MODULE));
>>>>>> + rc = -EIO;
>>>>>> + goto module_get_err;
>>>>>> + }
>>>>>> +
>>>>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
>>>>>> + while (1) {
>>>>>> + aux_ops->hw_access_lock(aux_dev);
>>>>>> +
>>>>>> + /* if the device is operational, proceed to actual init while holding the lock in
>>>>>> + * order to prevent concurrent hard reset
>>>>>> + */
>>>>>> + if (aux_ops->device_operational(aux_dev))
>>>>>> + break;
>>>>>> +
>>>>>> + aux_ops->hw_access_unlock(aux_dev);
>>>>>> +
>>>>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>>>>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
>>>>>> + rc = -EBUSY;
>>>>>> + goto timeout_err;
>>>>>> + }
>>>>>> +
>>>>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
>>>>>> +
>>>>>> + msleep_interruptible(MSEC_PER_SEC);
>>>>>> + }
>>>>>
>>>>> The code above is unexpected.
>>>>>
>>>>
>>>> We have no control over when the user insmods the IB driver.
>>>
>>> It is not true, this is controlled through the module dependency
>>> mechanism.
>>>
>>
>> Yeah, if we used EXPORT_SYMBOLs for inter-driver communication, but
>> we don't.
>
> So please use it and don't add complexity where it is not needed.
>
>>
>>>> As a result it is possible that the IB auxiliary device will be probed
>>>> while the compute device is under reset (due to some HW error).
>>>
>>> No, it is not possible. If you structure your driver right.
>>>
>>
>> Again, it would not be possible if we used EXPORT_SYMBOLs.
>> Please let me know if we misunderstood something, because AFAIU we
>> followed the auxiliary bus doc usage example.
>
> It is better to follow actual drivers that use auxiliary bus and see how
> they implemented it and not rely on examples in the documentation.
>
But isn't that what the doc is for? To explain the guidelines? And it's not
that there is a big red note there saying "this example should not be taken
as is, please look at your subsystem guidelines".
> Thanks
>
>>
>>> Thanks
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-18 16:01 ` Przemek Kitszel
@ 2024-06-19 9:34 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 9:34 UTC (permalink / raw)
To: Przemek Kitszel, Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 19:01, Przemek Kitszel wrote:
> On 6/18/24 13:08, Omer Shpigelman wrote:
>> On 6/17/24 22:04, Leon Romanovsky wrote:
>>>
>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>>
>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>>>> to the capabilities that were passed on device initialization.
>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>>
>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>
>>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
>>>>> people probably touched the code rather than actually sitting together
>>>>> in the same room and writing the code together. So, please remove the
>>>>> extensive "Co-developed-by" tags.
>>>>>
>>>>> It is not a full review yet, just simple passing-by comments.
>>>>>
>>>>
>>>> Actually, except for two, all of the mentioned persons sat in the same
>>>> room and developed the code together.
>>>> The remaining two are located at a different site (but also together).
>>>> Isn't that what the "Co-developed-by" tag is for?
>>>> I wanted to give them credit for writing the code, but I can remove it
>>>> if it's not common.
>>>
>>> Signed-off-by will be enough to give them credit.
>>>
>>
>> Ok, good enough.
>>
>
> I would say that a lone sign-off gives little credit compared to the
> co-developed-by tag. OTOH the list here is unusually long. What makes it
> even more tricky to evaluate is the fact that there is a lot of code ;)
>
> So, I would suggest to re-evaluate this on your next (trimmed down)
> revisions.
You are right about the smaller credit Signed-off-by gives compared to
Co-developed-by, but yeah, the list is unusually long.
I'll try to split it into smaller patches and give credit only to the
specific persons involved.
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-17 14:17 ` Jason Gunthorpe
@ 2024-06-19 9:39 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 9:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/17/24 17:17, Jason Gunthorpe wrote:
> [You don't often get email from jgg@ziepe.ca. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>> The driver itself is agnostic to the ASIC in action, it operates according
>> to the capabilities that were passed on device initialization.
>> The device is initialized by the hbl_cn driver via auxiliary bus.
>> The driver also supports QP resource tracking and port/device HW counters.
>
> I'm glad to finally see this, I've been talking to habana folks a long
> time now to get this worked out!
>
> This will need to be split up more, like others have said. I'd post
> the RDMA series assuming that the basic ethernet driver is merged. You
> don't need to combine basic ethernet with rdma in the same series.
>
> Jason
Yeah, I'll push one driver at a time in multiple patch sets with smaller
patches.
It's just that all 4 drivers operate together; they are not operational
separately. That's why I thought it would give more context to see the
entire picture when reviewing.
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-19 9:27 ` Omer Shpigelman
@ 2024-06-19 10:52 ` Leon Romanovsky
2024-06-24 8:47 ` Omer Shpigelman
2024-06-28 10:24 ` Omer Shpigelman
0 siblings, 2 replies; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-19 10:52 UTC (permalink / raw)
To: Omer Shpigelman
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
> On 6/18/24 15:58, Leon Romanovsky wrote:
> > On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
> >> On 6/17/24 22:04, Leon Romanovsky wrote:
> >>>
> >>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> >>>> On 6/13/24 22:18, Leon Romanovsky wrote:
> >>>>>
> >>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> >>>>>> The driver itself is agnostic to the ASIC in action, it operates according
> >>>>>> to the capabilities that were passed on device initialization.
> >>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
> >>>>>> The driver also supports QP resource tracking and port/device HW counters.
> >>>>>>
> >>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>
> >>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
> >>>>> people probably touched the code rather than actually sitting together
> >>>>> in the same room and writing the code together. So, please remove the
> >>>>> extensive "Co-developed-by" tags.
> >>>>>
> >>>>> It is not a full review yet, just simple passing-by comments.
> >>>>>
> >>>>
> >>>> Actually, except for two, all of the mentioned persons sat in the same
> >>>> room and developed the code together.
> >>>> The remaining two are located at a different site (but also together).
> >>>> Isn't that what the "Co-developed-by" tag is for?
> >>>> I wanted to give them credit for writing the code, but I can remove it
> >>>> if it's not common.
> >>>
> >>> Signed-off-by will be enough to give them credit.
> >>>
> >>
> >> Ok, good enough.
> >>
> >>>>
> >>>>>> ---
> >>>>>> MAINTAINERS | 10 +
> >>>>>> drivers/infiniband/Kconfig | 1 +
> >>>>>> drivers/infiniband/hw/Makefile | 1 +
> >>>>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
> >>>>>> drivers/infiniband/hw/hbl/Makefile | 8 +
> >>>>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> >>>>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> >>>>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> >>>>>> include/uapi/rdma/hbl-abi.h | 204 ++
> >>>>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> >>>>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> >>>>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> >>>>>> 12 files changed, 3904 insertions(+)
> >>>>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> >>>>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> >>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> >>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> >>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> >>>>>> create mode 100644 include/uapi/rdma/hbl-abi.h
> >>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> >>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
> >>>>>
> >>>>> <...>
> >>>>>
> >>>>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> >>>>>> +
> >>>>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> >>>>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>> +
> >>>>>
> >>>>> Please don't redefine the existing macros. Just use the existing ones.
> >>>>>
> >>>>>
> >>>>> <...>
> >>>>>
> >>>>
> >>>> That's a leftover from some debug code. I'll remove.
> >>>>
> >>>>>> + if (hbl_ib_match_netdev(ibdev, netdev))
> >>>>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>>>> + else
> >>>>>> + return NOTIFY_DONE;
> >>>>>
> >>>>> It is not kernel coding style. Please write:
> >>>>> if (!hbl_ib_match_netdev(ibdev, netdev))
> >>>>> return NOTIFY_DONE;
> >>>>>
> >>>>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>>>
> >>>>
> >>>> I'll fix the code, thanks.
> >>>>
> >>>>>> +
> >>>>>
> >>>>> <...>
> >>>>>
> >>>>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> >>>>>> +{
> >>>>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> >>>>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> >>>>>> + struct hbl_ib_device *hdev;
> >>>>>> + ktime_t timeout;
> >>>>>> + int rc;
> >>>>>> +
> >>>>>> + rc = hdev_init(aux_dev);
> >>>>>> + if (rc) {
> >>>>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> >>>>>> + return -EIO;
> >>>>>> + }
> >>>>>> +
> >>>>>> + hdev = aux_dev->priv;
> >>>>>> +
> >>>>>> + /* don't allow module unloading while it is attached */
> >>>>>> + if (!try_module_get(THIS_MODULE)) {
> >>>>>
> >>>>> This part makes me wonder, what are you trying to do here? What doesn't
> >>>>> work for you in the standard driver core and module load mechanism?
> >>>>>
> >>>>
> >>>> Before auxiliary bus was introduced, we used EXPORT_SYMBOLs for inter
> >>>> driver communication. That incremented the refcount of the used module so
> >>>> it couldn't be removed while it is in use.
> >>>> Auxiliary bus usage doesn't increment the used module refcount and hence
> >>>> the used module can be removed while it is in use and that's something
> >>>> we don't want to allow.
> >>>> We could solve it by some global locking or in_use atomic but the most
> >>>> simple and clean way is just to increment the used module refcount on
> >>>> auxiliary device probe and decrement it on auxiliary device removal.
> >>>
> >>> No, you were supposed to continue using EXPORT_SYMBOLs and not invent
> >>> an auxiliary ops structure (this is why you lost module reference
> >>> counting).
> >>>
> >>
> >> Sorry, but according to the auxiliary bus doc, a domain-specific ops
> >> structure can be used.
> >> We followed the usage example described at drivers/base/auxiliary.c.
> >> What am I missing?
> >
> > Being the one who implemented auxiliary bus in the kernel and converted
> > number of drivers to use it, I strongly recommend do NOT follow the example
> > provided there.
> >
> > So you are missing "best practice", and "best practice" is to use
> > EXPORT_SYMBOLs and rely on module reference counting.
> >
>
> It is not just the usage example but also the general feature doc before
> it:
> "The generic behavior can be extended and specialized as needed by
> encapsulating an auxiliary_device within other domain-specific structures
> and the use of .ops callbacks."
> It is also mentioned there that the ops structure is used for specific
> auxiliary device operations, while EXPORT_SYMBOLs should be used for common
> infrastructure the parent driver exposes:
> "Note that ops are intended as a way to augment instance behavior within a
> class of auxiliary devices, it is not the mechanism for exporting common
> infrastructure from the parent."
> All of our ops callbacks are meant to provide functionality related to the
> auxiliary device, they are not just general/common infrastructure.
Of course they are common, otherwise why did you put them in common code?
For example, you have callbacks to lock and unlock internal HW access,
how is it not common?
>
> Why do we have this doc if we should ignore it? Why wasn't the doc
> modified according to the "best practice" you described? The doc is
> misleading.
Because this is how upstream kernel development works. We are trying to
come to an agreement and get the best solution for the problem. Sometimes,
the outcome of the discussion is not "the best solution", but "good
enough". This doc can serve as an example. Everyone involved in the
development of auxbus, and in its later usage, was focused on implementation;
the documentation was good enough as it didn't limit anyone who actually
used it.
>
> Adding gregkh here as he requested the auxiliary bus feature IIRC.
> Greg - isn't the doc legit? should EXPORT_SYMBOLs necessarily be used
> together with auxiliary bus rather than ops structure?
This is not what you are doing here. You completely ditched EXPORT_SYMBOLs
and reinvented module reference counting, which overcomplicated the code
just to avoid using a standard kernel mechanism.
> As we saw it, auxiliary bus gives us the flexibility to choose which
> modules will be loaded while EXPORT_SYMBOLs enforce dependencies
> which might not be needed in some cases.
>
> >> Moreover, we'd like to support the mode where the IB or the ETH driver is
> >> not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
> >> exclusively for inter driver communication.
> >
> > It is not true and not how the kernel works. You can perfectly load the core
> > driver without IB and ETH; to some extent this is how the mlx5 driver works.
> >
>
> mlx5 IB driver doesn't export any symbol that is used by the core driver,
> that's why the core driver can be loaded without the IB driver (although
> you'd get a circular dependency if you did export).
Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
I'm reluctant to accept code that exports symbols from IB drivers to
other subsystems. We have drivers/infiniband/core/ for that.
> If relying on exported symbols only, then our IB and ETH drivers will need
> to export symbols too because the core driver accesses them post probing.
So you should fix your core driver. This is exactly what auxbus model
proposes.
> Hence we won't be able to load the core driver without both of them (or
> loading anything due to circular dependency).
> Unless we'll use dynamic symbol lookup and I don't think that's your
> intention.
No it is not.
>
> >>
> >>>>
> >>>>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> >>>>>> + module_name(THIS_MODULE));
> >>>>>> + rc = -EIO;
> >>>>>> + goto module_get_err;
> >>>>>> + }
> >>>>>> +
> >>>>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> >>>>>> + while (1) {
> >>>>>> + aux_ops->hw_access_lock(aux_dev);
> >>>>>> +
> >>>>>> + /* if the device is operational, proceed to actual init while holding the lock in
> >>>>>> + * order to prevent concurrent hard reset
> >>>>>> + */
> >>>>>> + if (aux_ops->device_operational(aux_dev))
> >>>>>> + break;
> >>>>>> +
> >>>>>> + aux_ops->hw_access_unlock(aux_dev);
> >>>>>> +
> >>>>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
> >>>>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> >>>>>> + rc = -EBUSY;
> >>>>>> + goto timeout_err;
> >>>>>> + }
> >>>>>> +
> >>>>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> >>>>>> +
> >>>>>> + msleep_interruptible(MSEC_PER_SEC);
> >>>>>> + }
> >>>>>
> >>>>> The code above is unexpected.
> >>>>>
> >>>>
> >>>> We have no control over when the user insmods the IB driver.
> >>>
> >>> It is not true, this is controlled through module dependencies
> >>> mechanism.
> >>>
> >>
> >> Yeah, if we would use EXPORT_SYMBOLs for inter driver communication but
> >> we don't.
> >
> > So please use it and don't add complexity where it is not needed.
> >
> >>
> >>>> As a result it is possible that the IB auxiliary device will be probed
> >>>> while the compute device is under reset (due to some HW error).
> >>>
> >>> No, it is not possible. If you structure your driver right.
> >>>
> >>
> >> Again, it is not possible if we would use EXPORT_SYMBOLs.
> >> Please let me know if we misunderstood something because AFAIU we followed
> >> the auxiliary bus doc usage example.
> >
> > It is better to follow actual drivers that use auxiliary bus and see how
> > they implemented it and not rely on examples in the documentation.
> >
>
> But isn't that what the doc is for? To explain the guidelines? And it's not
> that there is a big red note there saying "this example should not be taken as
> is, please look at your subsystem guidelines".
At the beginning that doc was located in the Documentation/ folder and no one
really cared about it. After moving from Documentation/ to drivers/base/auxiliary.c,
it became more visible, but still no one relied on it. You are the first one
who read it.
There are no subsystem rules here. Everyone relied on EXPORT_SYMBOLs and didn't
use an ops structure. The kernel is an evolving project; there is no need to find
a rule for everything.
Thanks
>
> > Thanks
> >
> >>
> >>> Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 00/15] Introduce HabanaLabs network drivers
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
@ 2024-06-19 11:40 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 11:40 UTC (permalink / raw)
To: Alexander Lobakin
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/17/24 15:34, Alexander Lobakin wrote:
>
> From: Omer Shpigelman <oshpigelman@habana.ai>
> Date: Thu, 13 Jun 2024 11:21:53 +0300
>
>> This patch set implements the HabanaLabs network drivers for Gaudi2 ASIC
>> which is designed for scaling of AI neural networks training.
>> The patch set includes the common code which is shared by all Gaudi ASICs
>> and the Gaudi2 ASIC specific code. Code for newer ASICs will follow.
>> All of these network drivers are modeled as auxiliary devices to the
>> parent driver.
>>
>> The newly added drivers are Core Network (CN), Ethernet and InfiniBand.
>> All of these drivers are based on the existing habanalabs driver which
>> serves as the compute driver and the entire platform.
>> The habanalabs driver probes the network drivers which configure the
>> relevant NIC HW of the device. In addition, it continuously communicates
>> with the CN driver for providing some services which are not NIC specific
>> e.g. PCI, MMU, FW communication etc.
>>
>> See the drivers scheme at:
>> Documentation/networking/device_drivers/ethernet/intel/hbl.rst
>>
>> The CN driver is both a parent and a child driver. It serves as the common
>> layer of many shared operations that are required by both EN and IB
>> drivers.
>>
>> The Gaudi2 NIC HW is composed of 48 physical lanes, 56Gbps each. Each pair
>> of lanes represents a 100Gbps logical port.
>>
>> The NIC HW was designed specifically for scaling AI training.
>> Hence it basically functions as a regular NIC device but it is tuned for
>> its dedicated purpose. As a result, the NIC HW supports Ethernet traffic
>> and RDMA over modified ROCEv2 protocol.
>> For example, with respect to the IB driver, the HW supports a single
>> context and a single PD. The reason for this is that the operational use
>> case of AI training for Gaudi2 consists of a single user
>> application/process.
>> Another example related to the IB driver is the lack of MR since a single
>> application/process can share the entire MMU with the compute device.
>> Moreover, the memory allocation of user data buffers which are used for
>> RDMA communication is done via the habanalabs compute driver uAPI.
>> With respect to the Ethernet driver, since the Ethernet flow is used
>> mainly for control, the HW is not performance tuned e.g. it assumes a
>> contiguous memory for the Rx buffers. Thus the EN driver needs to copy the
>> Rx packets from the Rx buffer into the skb memory.
>>
>> The first 8 patches implement the CN driver.
>> The next 2 patches implement the EN driver.
>> The next 2 patches implement the IB driver.
>> The last 3 patches modify the compute driver to support the CN driver.
>>
>> The patches are rebased on v6.10-rc3 tag:
>> https://github.com/torvalds/linux/releases/tag/v6.10-rc3
>>
>> The patches are also available at:
>> https://github.com/HabanaAI/drivers.gpu.linux-nic.kernel/tree/hbl_next
>>
>> The user-mode of the driver is being reviewed at:
>> https://github.com/linux-rdma/rdma-core/pull/1472
>>
>> Any feedback, comment or question is welcome.
>>
>> Thanks,
>> Omer
>>
>> Omer Shpigelman (15):
>> net: hbl_cn: add habanalabs Core Network driver
>> net: hbl_cn: memory manager component
>> net: hbl_cn: physical layer support
>> net: hbl_cn: QP state machine
>> net: hbl_cn: memory trace events
>> net: hbl_cn: debugfs support
>> net: hbl_cn: gaudi2: ASIC register header files
>> net: hbl_cn: gaudi2: ASIC specific support
>> net: hbl_en: add habanalabs Ethernet driver
>> net: hbl_en: gaudi2: ASIC specific support
>> RDMA/hbl: add habanalabs RDMA driver
>> RDMA/hbl: direct verbs support
>> accel/habanalabs: network scaling support
>> accel/habanalabs/gaudi2: CN registers header files
>> accel/habanalabs/gaudi2: network scaling support
>>
>> .../ABI/testing/debugfs-driver-habanalabs_cn | 195 +
>> .../device_drivers/ethernet/index.rst | 1 +
>> .../device_drivers/ethernet/intel/hbl.rst | 82 +
>> MAINTAINERS | 33 +
>> drivers/accel/habanalabs/Kconfig | 1 +
>> drivers/accel/habanalabs/Makefile | 3 +
>> drivers/accel/habanalabs/cn/Makefile | 2 +
>> drivers/accel/habanalabs/cn/cn.c | 815 +
>> drivers/accel/habanalabs/cn/cn.h | 133 +
>> .../habanalabs/common/command_submission.c | 2 +-
>> drivers/accel/habanalabs/common/device.c | 23 +
>> drivers/accel/habanalabs/common/firmware_if.c | 20 +
>> drivers/accel/habanalabs/common/habanalabs.h | 43 +-
>> .../accel/habanalabs/common/habanalabs_drv.c | 37 +-
>> .../habanalabs/common/habanalabs_ioctl.c | 2 +
>> drivers/accel/habanalabs/common/memory.c | 123 +
>> drivers/accel/habanalabs/gaudi/gaudi.c | 14 +-
>> drivers/accel/habanalabs/gaudi2/Makefile | 2 +-
>> drivers/accel/habanalabs/gaudi2/gaudi2.c | 440 +-
>> drivers/accel/habanalabs/gaudi2/gaudi2P.h | 41 +-
>> drivers/accel/habanalabs/gaudi2/gaudi2_cn.c | 424 +
>> drivers/accel/habanalabs/gaudi2/gaudi2_cn.h | 42 +
>> .../habanalabs/gaudi2/gaudi2_coresight.c | 145 +-
>> .../accel/habanalabs/gaudi2/gaudi2_security.c | 16 +-
>> drivers/accel/habanalabs/goya/goya.c | 6 +
>> .../include/gaudi2/asic_reg/gaudi2_regs.h | 10 +-
>> .../include/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
>> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
>> .../include/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
>> .../include/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
>> .../include/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
>> .../include/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
>> .../include/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
>> .../include/hw_ip/nic/nic_general.h | 15 +
>> drivers/infiniband/Kconfig | 1 +
>> drivers/infiniband/hw/Makefile | 1 +
>> drivers/infiniband/hw/hbl/Kconfig | 18 +
>> drivers/infiniband/hw/hbl/Makefile | 12 +
>> drivers/infiniband/hw/hbl/hbl.h | 326 +
>> drivers/infiniband/hw/hbl/hbl_encap.c | 216 +
>> drivers/infiniband/hw/hbl/hbl_main.c | 493 +
>> drivers/infiniband/hw/hbl/hbl_query_port.c | 96 +
>> drivers/infiniband/hw/hbl/hbl_set_port_ex.c | 96 +
>> drivers/infiniband/hw/hbl/hbl_usr_fifo.c | 252 +
>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 +
>> drivers/net/ethernet/intel/Kconfig | 38 +
>> drivers/net/ethernet/intel/Makefile | 2 +
>> drivers/net/ethernet/intel/hbl_cn/Makefile | 14 +
>> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5984 ++
>> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1666 +
>> .../intel/hbl_cn/common/hbl_cn_debugfs.c | 1457 +
>> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 240 +
>> .../intel/hbl_cn/common/hbl_cn_memory.c | 368 +
>> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 234 +
>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 491 +
>> .../net/ethernet/intel/hbl_cn/gaudi2/Makefile | 3 +
>> .../asic_reg/arc_farm_kdma_ctx_axuser_masks.h | 135 +
>> .../asic_reg/dcore0_sync_mngr_objs_regs.h | 43543 +++++++++++++++
>> .../asic_reg/gaudi2_blocks_linux_driver.h | 45068 ++++++++++++++++
>
> I don't think adding generated register defs etc. is a good idea.
> You just bloat the kernel code while most of the values are not used.
>
> When I work with HW and need to use some register defs, I add them
> manually one-by-one only when needed. I know it takes more time than to
> just add a whole generated reg file, but we don't need tens of thousands of
> unused LoC in the kernel.
> Please add only the actually used definitions. This applies to every
> file from the series.
>
Ok, I'll check which register definitions are actually needed.
>> .../hbl_cn/gaudi2/asic_reg/gaudi2_regs.h | 77 +
>> .../asic_reg/nic0_mac_ch0_mac_128_masks.h | 339 +
>> .../asic_reg/nic0_mac_ch0_mac_128_regs.h | 101 +
>> .../asic_reg/nic0_mac_ch0_mac_pcs_masks.h | 713 +
>> .../asic_reg/nic0_mac_ch0_mac_pcs_regs.h | 271 +
>> .../asic_reg/nic0_mac_ch1_mac_pcs_regs.h | 271 +
>> .../asic_reg/nic0_mac_ch2_mac_pcs_regs.h | 271 +
>> .../asic_reg/nic0_mac_ch3_mac_pcs_regs.h | 271 +
>> .../nic0_mac_glob_stat_control_reg_masks.h | 67 +
>> .../nic0_mac_glob_stat_control_reg_regs.h | 37 +
>> .../asic_reg/nic0_mac_glob_stat_rx0_regs.h | 93 +
>> .../asic_reg/nic0_mac_glob_stat_rx2_regs.h | 93 +
>> .../asic_reg/nic0_mac_glob_stat_tx0_regs.h | 75 +
>> .../asic_reg/nic0_mac_glob_stat_tx2_regs.h | 75 +
>> .../gaudi2/asic_reg/nic0_mac_rs_fec_regs.h | 157 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h | 77 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
>> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_cong_que_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_db_fifo_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_err_fifo_regs.h | 61 +
>> .../nic0_qpc0_axuser_ev_que_lbw_intr_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_qpc_req_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h | 61 +
>> .../asic_reg/nic0_qpc0_axuser_rxwqe_regs.h | 61 +
>> .../nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h | 61 +
>> .../nic0_qpc0_dbfifo0_ci_upd_addr_regs.h | 27 +
>> .../nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h | 27 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h | 963 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h | 905 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
>> .../gaudi2/asic_reg/nic0_rxb_core_masks.h | 459 +
>> .../gaudi2/asic_reg/nic0_rxb_core_regs.h | 665 +
>> .../nic0_rxe0_axuser_axuser_cq0_regs.h | 61 +
>> .../nic0_rxe0_axuser_axuser_cq1_regs.h | 61 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h | 705 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
>> .../asic_reg/nic0_rxe0_wqe_aruser_regs.h | 61 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
>> .../gaudi2/asic_reg/nic0_serdes0_masks.h | 7163 +++
>> .../gaudi2/asic_reg/nic0_serdes0_regs.h | 1679 +
>> .../gaudi2/asic_reg/nic0_serdes1_regs.h | 1679 +
>> .../asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h | 61 +
>> .../nic0_tmr_axuser_tmr_free_list_regs.h | 61 +
>> .../asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h | 61 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h | 361 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h | 183 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h | 167 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h | 759 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h | 555 +
>> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
>> .../nic0_umr0_0_completion_queue_ci_1_regs.h | 27 +
>> .../nic0_umr0_0_unsecure_doorbell0_regs.h | 31 +
>> .../nic0_umr0_0_unsecure_doorbell1_regs.h | 31 +
>> .../gaudi2/asic_reg/prt0_mac_core_masks.h | 137 +
>> .../gaudi2/asic_reg/prt0_mac_core_regs.h | 67 +
>> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c | 5689 ++
>> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h | 427 +
>> .../intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c | 319 +
>> .../intel/hbl_cn/gaudi2/gaudi2_cn_eq.c | 732 +
>> .../intel/hbl_cn/gaudi2/gaudi2_cn_phy.c | 2743 +
>> drivers/net/ethernet/intel/hbl_en/Makefile | 12 +
>> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
>> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1170 +
>> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 208 +
>> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 +
>> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +
>> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +
>> .../net/ethernet/intel/hbl_en/gaudi2/Makefile | 2 +
>> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.c | 728 +
>> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.h | 53 +
>> .../intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c | 32 +
>> include/linux/habanalabs/cpucp_if.h | 125 +-
>> include/linux/habanalabs/hl_boot_if.h | 9 +-
>> include/linux/net/intel/cn.h | 474 +
>> include/linux/net/intel/cn_aux.h | 298 +
>> include/linux/net/intel/cni.h | 636 +
>> include/linux/net/intel/gaudi2.h | 432 +
>> include/linux/net/intel/gaudi2_aux.h | 94 +
>> include/trace/events/habanalabs_cn.h | 116 +
>> include/uapi/drm/habanalabs_accel.h | 10 +-
>> include/uapi/rdma/hbl-abi.h | 204 +
>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
>> 146 files changed, 148514 insertions(+), 70 deletions(-)
>
> So most of these new lines are generated register definitions. The
> series can be several times smaller if you follow my advice.
>
> Thanks,
> Olek
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-15 0:10 ` Stephen Hemminger
@ 2024-06-19 12:07 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 12:07 UTC (permalink / raw)
To: Stephen Hemminger
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/15/24 03:10, Stephen Hemminger wrote:
>
> On Thu, 13 Jun 2024 11:22:02 +0300
> Omer Shpigelman <oshpigelman@habana.ai> wrote:
>
>> +static int hbl_en_ports_reopen(struct hbl_aux_dev *aux_dev)
>> +{
>> + struct hbl_en_device *hdev = aux_dev->priv;
>> + struct hbl_en_port *port;
>> + int rc = 0, i;
>> +
>> + for (i = 0; i < hdev->max_num_of_ports; i++) {
>> + if (!(hdev->ports_mask & BIT(i)))
>> + continue;
>> +
>> + port = &hdev->ports[i];
>> +
>> + /* It could be that the port was shutdown by 'ip link set down' and there is no need
>> + * in reopening it.
>> + * Since we mark the ports as in reset even if they are disabled, we clear the flag
>> + * here anyway.
>> + * See hbl_en_ports_stop_prepare() for more info.
>> + */
>> + if (!netif_running(port->ndev)) {
>> + atomic_set(&port->in_reset, 0);
>> + continue;
>> + }
>> +
>
> Rather than duplicating network device state in your own flags, it would be better to use
> existing infrastructure. Read Documentation/networking/operstates.rst
>
> Then you could also get rid of the kludge timer stuff in hbl_en_close().
>
I think that additional explanation is needed here.
In addition to netdev flows, we also support an internal reset flow
(that's what the in_reset boolean indicates).
Our NIC driver is an extension of the compute driver (they share the same
HW) and a reset flow might be needed due to some compute operation which
is entirely unrelated to the NIC driver. But we must not access the HW
while this reset flow is executed.
Note that this internal reset flow originates from the compute driver and
hence we might have parallel netdev operations that should be blocked
meanwhile.
The internal reset flow has 2 phases - teardown and re-init. This reopen
function is called during the re-init phase to restore the NIC ports, but
only if they were actually opened prior to the reset flow.
Regarding hbl_en_close() - during the port close we need to write to the
HW so due to the explanation above, also there we should wait for an
existing internal reset flow to finish first.
Let me know if that's clear enough and addresses your concerns.
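The gating described above boils down to a compare-and-swap on the in_reset flag; a hypothetical userspace model (invented names, C11 atomics standing in for the kernel's atomic_cmpxchg()) might look like:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical port: netdev ops and the internal reset flow race on in_reset. */
struct port {
	atomic_int in_reset;
	int mtu;
};

/* A netdev-style op: claim the flag or bail out because a reset is running. */
static bool port_change_mtu(struct port *p, int new_mtu)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&p->in_reset, &expected, 1))
		return false;		/* reset in progress: no HW access */
	p->mtu = new_mtu;		/* stands in for the real HW programming */
	atomic_store(&p->in_reset, 0);
	return true;
}

/* Teardown phase: mark the port, waiting out any in-flight op. */
static void port_reset_teardown(struct port *p)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&p->in_reset, &expected, 1))
		expected = 0;		/* cmpxchg overwrote it; retry */
}

/* Re-init phase: restore the port and clear the flag. */
static void port_reset_reinit(struct port *p)
{
	atomic_store(&p->in_reset, 0);
}
```

Any netdev operation entered between teardown and re-init fails fast instead of touching the HW, which is the behavior the driver wants during a compute-initiated reset.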
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-18 21:19 ` Stephen Hemminger
@ 2024-06-19 12:13 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 12:13 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Joe Damato, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/19/24 00:19, Stephen Hemminger wrote:
> On Tue, 18 Jun 2024 19:37:36 +0000
> Omer Shpigelman <oshpigelman@habana.ai> wrote:
>
>>>
>>> Is there any reason in particular to call netif_receive_skb instead of
>>> napi_gro_receive ?
>>>
>>
>> As you can see, we also support polling mode which is a non-NAPI flow.
>> We could use napi_gro_receive() for NAPI flow and netif_receive_skb() for
>> polling mode but we don't support RX checksum offload anyway.
>
> Why non-NAPI? I thought current netdev policy was all drivers should
> use NAPI.
If that's the current policy then I can remove this non-NAPI mode.
I see on another thread that module parameters are not allowed, so
apparently I'll need to remove this polling mode anyway as it is set by a
module parameter.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 8:01 ` Przemek Kitszel
@ 2024-06-19 12:15 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-19 12:15 UTC (permalink / raw)
To: Przemek Kitszel, Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 11:01, Przemek Kitszel wrote:
> On 6/19/24 09:16, Omer Shpigelman wrote:
>> On 6/18/24 17:19, Andrew Lunn wrote:
>
> [...]
>
>>>>>> +module_param(poll_enable, bool, 0444);
>>>>>> +MODULE_PARM_DESC(poll_enable,
>>>>>> + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)");
>>>>>
>>>>> Module parameters are not liked. This probably needs to go away.
>>>>>
>>>>
>>>> I see that various vendors under net/ethernet/* use module parameters.
>>>> Can't we add another one?
>>>
>>> Look at the history of those module parameters. Do you see many added
>>> in the last year? 5 years?
>>>
>>
>> I didn't check that prior to my submission. Regarding this "no new module
>> parameters allowed" rule, is it documented anywhere? If not, is it the
>> common practice not to try to do something that was not done recently?
>> How is "recently" defined?
>> I just want to clarify this because it's hard to handle these submissions
>> when we write some code based on existing examples and then get rejected
>> because "we don't do that here anymore".
>> I want to avoid future cases of this mismatch.
>>
>
> the best way is to read the netdev ML; that way you will learn which interfaces
> are frowned upon and which are outright banned, and sometimes you can
> judge for yourself by knowing which interfaces are most developed recently
>
> in this module params example - they were introduced to allow init-phase
> configuration of the device that could not be postponed, which in the
> general case sounds like a workaround; the hardest cases involve huge swaths
> of (physically contiguous) memory to be allocated, but for that there are
> now device tree binding solutions; more typical cases for networking are
> resolved via devlink reload
>
> devlink params are also the thing that should be used as a default for
> new parameters, ideally when the given parameter is not a driver-specific quirk
>
> poll_enable sounds like something that should be a common param,
> but you have to better describe what you mean there
> (see napi_poll(), "Enable Rx polling" would mean to use that as default,
> do you mean busy polling or what?)
Yes, busy polling.
But never mind, I was informed that NAPI must be used, so apparently I'll
need to remove this polling mode and its module parameter anyway.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 7:16 ` Omer Shpigelman
2024-06-19 8:01 ` Przemek Kitszel
@ 2024-06-19 15:21 ` Jakub Kicinski
2024-06-20 8:43 ` Omer Shpigelman
2024-06-19 16:13 ` Andrew Lunn
2 siblings, 1 reply; 107+ messages in thread
From: Jakub Kicinski @ 2024-06-19 15:21 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Andrew Lunn, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Wed, 19 Jun 2024 07:16:20 +0000 Omer Shpigelman wrote:
> >> Are you referring to get_module_eeprom_by_page()? if so, then it is not
> >> supported by our FW, we read the entire data on device load.
> >> However, I can hide that behind the new API and return only the
> >> requested page if that's the intention.
> >
> > Well, if your firmware is so limited, then you might as well stick to
> > the old API, and let the core do the conversion to the legacy
> > code. But I'm surprised you don't allow access to the temperature
> > sensors, received signal strength, voltages etc, which could be
> > exported via HWMON.
>
> I'll stick to the old API.
> Regarding the sensors, our compute driver (under accel/habanalabs) exports
> them via HWMON.
You support 400G, so you really need to give the user the ability
to access higher pages.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-18 19:39 ` Omer Shpigelman
@ 2024-06-19 15:40 ` Andrew Lunn
2024-06-20 8:36 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-19 15:40 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Stephen Hemminger, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
> > Does this device require IPv4? What about users and infrastructures that use IPv6 only?
> > IPv4 is legacy at this point.
>
> Gaudi2 supports IPv4 only.
Really? I guess really old stuff like SLIP from 1988 does not support
IPv6, but I don't remember seeing anything from this century which
does not support passing IPv6 frames over a netdev.
Andrew
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 7:16 ` Omer Shpigelman
2024-06-19 8:01 ` Przemek Kitszel
2024-06-19 15:21 ` Jakub Kicinski
@ 2024-06-19 16:13 ` Andrew Lunn
2024-06-23 6:22 ` Omer Shpigelman
2 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-19 16:13 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Wed, Jun 19, 2024 at 07:16:20AM +0000, Omer Shpigelman wrote:
> On 6/18/24 17:19, Andrew Lunn wrote:
> >>>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> >>>> +{
> >>>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> >>>> + struct net_device *ndev = port->ndev;
> >>>> + u32 mtu;
> >>>> +
> >>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >>>> + netdev_err(ndev, "port is in reset, can't get MTU\n");
> >>>> + return 0;
> >>>> + }
> >>>> +
> >>>> + mtu = ndev->mtu;
> >>>
> >>> I think you need a better error message. All this does is access
> >>> ndev->mtu. What does it matter if the port is in reset? You don't
> >>> access it.
> >>>
> >>
> >> This function is called from the CN driver to get the current MTU in order
> >> to configure it in the HW, for example when configuring an IB QP. The MTU
> >> value might be changed by the user while we execute this function.
> >
> > Change of MTU will happen while holding RTNL. Why not simply hold RTNL
> > while programming the hardware? That is the normal pattern for MAC
> > drivers.
> >
>
> I can hold the RTNL lock while configuring the HW but it seems like a big
> overhead. Configuring the HW might take some time due to QP draining or
> cache invalidation.
How often does the MTU change? Once, maybe twice on boot, and never
again? MTU change is not a hot path. For slow-path code, KISS is much
better, as it is more likely to be correct.
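As a rough illustration of the "just hold the big lock" approach, here is a hypothetical userspace sketch with a pthread mutex standing in for rtnl_lock() (all names invented):

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for rtnl_lock(): one coarse lock serializing all slow-path config. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;

struct hw {
	int mtu;
};

/* Slow path: hold the big lock for the whole (possibly lengthy) operation.
 * MTU changes are rare, so the simplicity is worth the wait.
 */
static void hw_set_mtu(struct hw *hw, int mtu)
{
	pthread_mutex_lock(&cfg_lock);
	hw->mtu = mtu;	/* stands in for QP draining, cache invalidation etc. */
	pthread_mutex_unlock(&cfg_lock);
}

/* A consumer that needs a stable value takes the same lock, so it can never
 * observe a half-programmed configuration.
 */
static int hw_get_mtu(struct hw *hw)
{
	int mtu;

	pthread_mutex_lock(&cfg_lock);
	mtu = hw->mtu;
	pthread_mutex_unlock(&cfg_lock);
	return mtu;
}
```

The design choice is exactly the KISS trade-off above: one lock is trivially correct on a path that runs a handful of times per boot, whereas a private flag duplicates state the locking already protects.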
> To me it seems unnecessary but if that's the common way then I'll change
> it.
>
> >>>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> >>>> +{
> >>>> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
> >>>> + int rc = 0;
> >>>> +
> >>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >>>> + netdev_err(netdev, "port is in reset, can't change MTU\n");
> >>>> + return -EBUSY;
> >>>> + }
> >>>> +
> >>>> + if (netif_running(port->ndev)) {
> >>>> + hbl_en_port_close(port);
> >>>> +
> >>>> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
> >>>> + msleep(20);
> >>>> +
> >>>> + netdev->mtu = new_mtu;
> >>>> +
> >>>> + rc = hbl_en_port_open(port);
> >>>> + if (rc)
> >>>> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> >>>
> >>> Does that mean the port is FUBAR?
> >>>
> >>> Most operations like this are expected to roll back to the previous
> >>> working configuration on failure. So if changing the MTU requires new
> >>> buffers in your ring, you should first allocate the new buffers, then
> >>> free the old buffers, so that if allocation fails, you still have
> >>> buffers, and the device can continue operating.
> >>>
> >>
> >> A failure in opening a port is a fatal error. It shouldn't happen. This is
> >> not something we wish to recover from.
> >
> > What could cause open to fail? Is memory allocated?
> >
>
> Memory is allocated but it is freed in case of a failure.
> Port opening can fail for other reasons as well, like some HW timeout
> while configuring the ETH QP.
If the hardware times out because the hardware is dead, there is nothing
you can do about it. It's dead.
But what about when the system is under memory pressure? You say it
allocates memory. What happens if those allocations fail? Does
changing the MTU take me from a working system to a dead system? It is
good practice not to kill a working system under situations like
memory pressure. You try to first allocate the memory you need to
handle the new MTU, and only if successful do you free the existing memory
you no longer need. That means if you cannot allocate the needed
memory, you still have the old memory, you can keep the old MTU and
return -ENOMEM, and the system keeps running.
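The allocate-first discipline above can be sketched in plain C (hypothetical names; malloc stands in for the ring-buffer allocation, and the allocator is injectable so the failure path can be exercised):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct ring {
	void *buf;
	size_t buf_sz;
	int mtu;
};

/* Injectable allocator so an allocation failure can be simulated. */
static void *(*ring_alloc)(size_t) = malloc;

/* Simulated failing allocator for the -ENOMEM path. */
static void *alloc_fail(size_t sz)
{
	(void)sz;
	return NULL;
}

/* Change MTU without risking the working configuration: allocate the new
 * buffers first, and only on success free the old ones and commit the MTU.
 */
static int ring_change_mtu(struct ring *r, int new_mtu)
{
	size_t new_sz = (size_t)new_mtu * 4;	/* made-up sizing rule */
	void *new_buf = ring_alloc(new_sz);

	if (!new_buf)
		return -1;	/* -ENOMEM: old buffers and old MTU stay intact */

	free(r->buf);		/* commit: the old buffers are no longer needed */
	r->buf = new_buf;
	r->buf_sz = new_sz;
	r->mtu = new_mtu;
	return 0;
}
```

On the failure path the function returns before touching the ring, so the device keeps running with its old buffers and old MTU, which is exactly the roll-back behavior being asked for.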
> I didn't check that prior to my submission. Regarding this "no new module
> parameters allowed" rule, is that documented anywhere?
Lots of emails that fly past on the mailing list. Maybe once every
couple of months, a vendor tries to mainline a new driver without
having read the mailing list for a few months to know how mainline
actually works. I _guess_ Davem has been pushing back on module
parameters for 10 years? Maybe more.
> If not, is that the
> common practice? Not to try to do something that was not done recently?
> How is "recently" defined?
> I just want to clarify this because it's hard to handle these submissions
> when we write some code based on existing examples but then we are
> rejected because "we don't do that here anymore".
> I want to avoid future cases of this mismatch.
My suggestion would be to spend 30 minutes every day reading patches
and review comments on the mailing list. Avoid making the same mistakes
others make, especially newbies to mainline, and see what others are
doing in the same niche as this device. 30 minutes might seem like a
lot, but how much time did you waste implementing polling mode, which
you are now going to throw away?
> >>>> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> >>>
> >>> That looks odd. Care to explain?
> >>>
> >>
> >> The HW of all of our ports supports autoneg.
> >> But in addition, the ports are divided into two groups:
> >> internal: ports which are connected to other Gaudi2 ports in the same server.
> >> external: ports which are connected to an external switch.
> >> Only internal ports use autoneg.
> >> The ports mask which sets each port as internal/external is fetched from
> >> the FW on device load.
> >
> > That is not what I meant. lp_advertising should indicate the link
> > modes the peer is advertising. If this was a copper link, it typically
> > would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
> > 100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
> > since the peer must be advertising in order that lp_advertising has a
> > value!
> >
>
> Sorry, but I don't get this. Is the problem the setting of the Autoneg bit
> in lp_advertising? Is that redundant? I see that other vendors set it too
> in case autoneg was completed.
$ ethtool eth0
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
This is `supported`. The hardware can do these link modes.
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
It also supports symmetric pause, and can do autoneg.
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
This is `advertising`, and is what this device is advertising to the
link partner. By default you copy supported into advertising, but the
user can use ethtool -s advertise N, where N is a list of link modes,
to change what is advertised to the link partner.
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
This is `lp_advertising`, what the link partner is advertising to this
device. Once you have this, you mask lp_advertising with advertising,
and generally pick the link mode with the highest bandwidth:
Speed: 1000Mb/s
Duplex: Full
So autoneg resolved to 1000baseT/Full
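The resolution step described above can be sketched in C. The bit values below are purely illustrative, not the kernel's ETHTOOL_LINK_MODE_* bit numbering; the only point is the intersect-then-pick-highest logic.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative link-mode bits, ordered by increasing bandwidth. */
enum {
	LM_10_HALF   = 1 << 0,
	LM_10_FULL   = 1 << 1,
	LM_100_HALF  = 1 << 2,
	LM_100_FULL  = 1 << 3,
	LM_1000_FULL = 1 << 4,
};

/*
 * Autoneg resolution: intersect what this device advertises with what
 * the link partner advertises, then take the highest common mode
 * (the highest set bit wins because the enum is ordered by bandwidth).
 * Returns the winning bit index, or -1 if there is no common mode.
 */
static int resolve_link_mode(uint32_t advertising, uint32_t lp_advertising)
{
	uint32_t common = advertising & lp_advertising;
	int best = -1, bit;

	for (bit = 0; bit < 32; bit++)
		if (common & (1u << bit))
			best = bit;

	return best;
}
```

With both sides advertising everything up to 1000baseT/Full, the intersection resolves to 1000baseT/Full, matching the ethtool output shown above.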
Andrew
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 00/15] Introduce HabanaLabs network drivers
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
` (14 preceding siblings ...)
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
@ 2024-06-19 16:33 ` Jiri Pirko
2024-06-20 5:37 ` Omer Shpigelman
15 siblings, 1 reply; 107+ messages in thread
From: Jiri Pirko @ 2024-06-19 16:33 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel, linux-rdma, netdev, dri-devel, ogabbay, zyehudai
Thu, Jun 13, 2024 at 10:21:53AM CEST, oshpigelman@habana.ai wrote:
>This patch set implements the HabanaLabs network drivers for the Gaudi2 ASIC,
>which is designed for scaling AI neural network training.
>The patch set includes the common code which is shared by all Gaudi ASICs
>and the Gaudi2 ASIC-specific code. Code for newer ASICs will follow.
>All of these network drivers are modeled as auxiliary devices of the
>parent driver.
>
>The newly added drivers are Core Network (CN), Ethernet and InfiniBand.
>All of these drivers are based on the existing habanalabs driver which
>serves as the compute driver and the entire platform.
>The habanalabs driver probes the network drivers which configure the
>relevant NIC HW of the device. In addition, it continuously communicates
>with the CN driver to provide some services which are not NIC-specific,
>e.g. PCI, MMU, FW communication, etc.
>
>See the drivers scheme at:
>Documentation/networking/device_drivers/ethernet/intel/hbl.rst
>
>The CN driver is both a parent and a child driver. It serves as the common
>layer of many shared operations that are required by both EN and IB
>drivers.
>
>The Gaudi2 NIC HW is composed of 48 physical lanes, 56Gbps each. Each pair
>of lanes represents a 100Gbps logical port.
What do you mean by "logical port"? Is it a separate netdevice? So you
have 24 netdevices visible on the system? What do the physical port/ports
look like? How do you model that in devlink? Do you support port
splitting?
>
>The NIC HW was designed specifically for scaling AI training.
>Hence it basically functions as a regular NIC device but it is tuned for
>its dedicated purpose. As a result, the NIC HW supports Ethernet traffic
>and RDMA over a modified RoCEv2 protocol.
>For example, with respect to the IB driver, the HW supports a single
>context and a single PD. The reason for this is that the operational use
>case of AI training for Gaudi2 consists of a single user
>application/process.
>Another example related to the IB driver is the lack of MR since a single
>application/process can share the entire MMU with the compute device.
>Moreover, the memory allocation of user data buffers which are used for
>RDMA communication is done via the habanalabs compute driver uAPI.
>With respect to the Ethernet driver, since the Ethernet flow is used
>mainly for control, the HW is not performance-tuned, e.g. it assumes
>contiguous memory for the Rx buffers. Thus the EN driver needs to copy the
>Rx packets from the Rx buffer into the skb memory.
>
>The first 8 patches implement the CN driver.
>The next 2 patches implement the EN driver.
>The next 2 patches implement the IB driver.
>The last 3 patches modify the compute driver to support the CN driver.
>
>The patches are rebased on v6.10-rc3 tag:
>https://github.com/torvalds/linux/releases/tag/v6.10-rc3
>
>The patches are also available at:
>https://github.com/HabanaAI/drivers.gpu.linux-nic.kernel/tree/hbl_next
>
>The user-mode of the driver is being reviewed at:
>https://github.com/linux-rdma/rdma-core/pull/1472
>
>Any feedback, comment or question is welcome.
>
>Thanks,
>Omer
>
>Omer Shpigelman (15):
> net: hbl_cn: add habanalabs Core Network driver
> net: hbl_cn: memory manager component
> net: hbl_cn: physical layer support
> net: hbl_cn: QP state machine
> net: hbl_cn: memory trace events
> net: hbl_cn: debugfs support
> net: hbl_cn: gaudi2: ASIC register header files
> net: hbl_cn: gaudi2: ASIC specific support
> net: hbl_en: add habanalabs Ethernet driver
> net: hbl_en: gaudi2: ASIC specific support
> RDMA/hbl: add habanalabs RDMA driver
> RDMA/hbl: direct verbs support
> accel/habanalabs: network scaling support
> accel/habanalabs/gaudi2: CN registers header files
> accel/habanalabs/gaudi2: network scaling support
>
> .../ABI/testing/debugfs-driver-habanalabs_cn | 195 +
> .../device_drivers/ethernet/index.rst | 1 +
> .../device_drivers/ethernet/intel/hbl.rst | 82 +
> MAINTAINERS | 33 +
> drivers/accel/habanalabs/Kconfig | 1 +
> drivers/accel/habanalabs/Makefile | 3 +
> drivers/accel/habanalabs/cn/Makefile | 2 +
> drivers/accel/habanalabs/cn/cn.c | 815 +
> drivers/accel/habanalabs/cn/cn.h | 133 +
> .../habanalabs/common/command_submission.c | 2 +-
> drivers/accel/habanalabs/common/device.c | 23 +
> drivers/accel/habanalabs/common/firmware_if.c | 20 +
> drivers/accel/habanalabs/common/habanalabs.h | 43 +-
> .../accel/habanalabs/common/habanalabs_drv.c | 37 +-
> .../habanalabs/common/habanalabs_ioctl.c | 2 +
> drivers/accel/habanalabs/common/memory.c | 123 +
> drivers/accel/habanalabs/gaudi/gaudi.c | 14 +-
> drivers/accel/habanalabs/gaudi2/Makefile | 2 +-
> drivers/accel/habanalabs/gaudi2/gaudi2.c | 440 +-
> drivers/accel/habanalabs/gaudi2/gaudi2P.h | 41 +-
> drivers/accel/habanalabs/gaudi2/gaudi2_cn.c | 424 +
> drivers/accel/habanalabs/gaudi2/gaudi2_cn.h | 42 +
> .../habanalabs/gaudi2/gaudi2_coresight.c | 145 +-
> .../accel/habanalabs/gaudi2/gaudi2_security.c | 16 +-
> drivers/accel/habanalabs/goya/goya.c | 6 +
> .../include/gaudi2/asic_reg/gaudi2_regs.h | 10 +-
> .../include/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
> .../include/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
> .../include/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
> .../include/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
> .../include/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
> .../include/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
> .../include/hw_ip/nic/nic_general.h | 15 +
> drivers/infiniband/Kconfig | 1 +
> drivers/infiniband/hw/Makefile | 1 +
> drivers/infiniband/hw/hbl/Kconfig | 18 +
> drivers/infiniband/hw/hbl/Makefile | 12 +
> drivers/infiniband/hw/hbl/hbl.h | 326 +
> drivers/infiniband/hw/hbl/hbl_encap.c | 216 +
> drivers/infiniband/hw/hbl/hbl_main.c | 493 +
> drivers/infiniband/hw/hbl/hbl_query_port.c | 96 +
> drivers/infiniband/hw/hbl/hbl_set_port_ex.c | 96 +
> drivers/infiniband/hw/hbl/hbl_usr_fifo.c | 252 +
> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 +
> drivers/net/ethernet/intel/Kconfig | 38 +
> drivers/net/ethernet/intel/Makefile | 2 +
> drivers/net/ethernet/intel/hbl_cn/Makefile | 14 +
> .../net/ethernet/intel/hbl_cn/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.c | 5984 ++
> .../net/ethernet/intel/hbl_cn/common/hbl_cn.h | 1666 +
> .../intel/hbl_cn/common/hbl_cn_debugfs.c | 1457 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_drv.c | 240 +
> .../intel/hbl_cn/common/hbl_cn_memory.c | 368 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_phy.c | 234 +
> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 491 +
> .../net/ethernet/intel/hbl_cn/gaudi2/Makefile | 3 +
> .../asic_reg/arc_farm_kdma_ctx_axuser_masks.h | 135 +
> .../asic_reg/dcore0_sync_mngr_objs_regs.h | 43543 +++++++++++++++
> .../asic_reg/gaudi2_blocks_linux_driver.h | 45068 ++++++++++++++++
> .../hbl_cn/gaudi2/asic_reg/gaudi2_regs.h | 77 +
> .../asic_reg/nic0_mac_ch0_mac_128_masks.h | 339 +
> .../asic_reg/nic0_mac_ch0_mac_128_regs.h | 101 +
> .../asic_reg/nic0_mac_ch0_mac_pcs_masks.h | 713 +
> .../asic_reg/nic0_mac_ch0_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch1_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch2_mac_pcs_regs.h | 271 +
> .../asic_reg/nic0_mac_ch3_mac_pcs_regs.h | 271 +
> .../nic0_mac_glob_stat_control_reg_masks.h | 67 +
> .../nic0_mac_glob_stat_control_reg_regs.h | 37 +
> .../asic_reg/nic0_mac_glob_stat_rx0_regs.h | 93 +
> .../asic_reg/nic0_mac_glob_stat_rx2_regs.h | 93 +
> .../asic_reg/nic0_mac_glob_stat_tx0_regs.h | 75 +
> .../asic_reg/nic0_mac_glob_stat_tx2_regs.h | 75 +
> .../gaudi2/asic_reg/nic0_mac_rs_fec_regs.h | 157 +
> .../hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h | 77 +
> .../hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h | 59 +
> .../nic0_qm0_axuser_nonsecured_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_cong_que_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_db_fifo_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_err_fifo_regs.h | 61 +
> .../nic0_qpc0_axuser_ev_que_lbw_intr_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_qpc_req_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h | 61 +
> .../asic_reg/nic0_qpc0_axuser_rxwqe_regs.h | 61 +
> .../nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h | 61 +
> .../nic0_qpc0_dbfifo0_ci_upd_addr_regs.h | 27 +
> .../nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h | 27 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h | 963 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h | 905 +
> .../hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h | 905 +
> .../gaudi2/asic_reg/nic0_rxb_core_masks.h | 459 +
> .../gaudi2/asic_reg/nic0_rxb_core_regs.h | 665 +
> .../nic0_rxe0_axuser_axuser_cq0_regs.h | 61 +
> .../nic0_rxe0_axuser_axuser_cq1_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h | 705 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h | 725 +
> .../asic_reg/nic0_rxe0_wqe_aruser_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h | 725 +
> .../gaudi2/asic_reg/nic0_serdes0_masks.h | 7163 +++
> .../gaudi2/asic_reg/nic0_serdes0_regs.h | 1679 +
> .../gaudi2/asic_reg/nic0_serdes1_regs.h | 1679 +
> .../asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h | 61 +
> .../nic0_tmr_axuser_tmr_free_list_regs.h | 61 +
> .../asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h | 61 +
> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h | 361 +
> .../hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h | 183 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h | 167 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h | 759 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h | 529 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h | 555 +
> .../hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h | 289 +
> .../nic0_umr0_0_completion_queue_ci_1_regs.h | 27 +
> .../nic0_umr0_0_unsecure_doorbell0_regs.h | 31 +
> .../nic0_umr0_0_unsecure_doorbell1_regs.h | 31 +
> .../gaudi2/asic_reg/prt0_mac_core_masks.h | 137 +
> .../gaudi2/asic_reg/prt0_mac_core_regs.h | 67 +
> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c | 5689 ++
> .../ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h | 427 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c | 319 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_eq.c | 732 +
> .../intel/hbl_cn/gaudi2/gaudi2_cn_phy.c | 2743 +
> drivers/net/ethernet/intel/hbl_en/Makefile | 12 +
> .../net/ethernet/intel/hbl_en/common/Makefile | 3 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.c | 1170 +
> .../net/ethernet/intel/hbl_en/common/hbl_en.h | 208 +
> .../intel/hbl_en/common/hbl_en_dcbnl.c | 101 +
> .../ethernet/intel/hbl_en/common/hbl_en_drv.c | 211 +
> .../intel/hbl_en/common/hbl_en_ethtool.c | 452 +
> .../net/ethernet/intel/hbl_en/gaudi2/Makefile | 2 +
> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.c | 728 +
> .../ethernet/intel/hbl_en/gaudi2/gaudi2_en.h | 53 +
> .../intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c | 32 +
> include/linux/habanalabs/cpucp_if.h | 125 +-
> include/linux/habanalabs/hl_boot_if.h | 9 +-
> include/linux/net/intel/cn.h | 474 +
> include/linux/net/intel/cn_aux.h | 298 +
> include/linux/net/intel/cni.h | 636 +
> include/linux/net/intel/gaudi2.h | 432 +
> include/linux/net/intel/gaudi2_aux.h | 94 +
> include/trace/events/habanalabs_cn.h | 116 +
> include/uapi/drm/habanalabs_accel.h | 10 +-
> include/uapi/rdma/hbl-abi.h | 204 +
> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> 146 files changed, 148514 insertions(+), 70 deletions(-)
> create mode 100644 Documentation/ABI/testing/debugfs-driver-habanalabs_cn
> create mode 100644 Documentation/networking/device_drivers/ethernet/intel/hbl.rst
> create mode 100644 drivers/accel/habanalabs/cn/Makefile
> create mode 100644 drivers/accel/habanalabs/cn/cn.c
> create mode 100644 drivers/accel/habanalabs/cn/cn.h
> create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.c
> create mode 100644 drivers/accel/habanalabs/gaudi2/gaudi2_cn.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_phy_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_qpc1_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe0_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_rxe1_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txe0_regs.h
> create mode 100644 drivers/accel/habanalabs/include/gaudi2/asic_reg/nic0_txs0_regs.h
> create mode 100644 drivers/accel/habanalabs/include/hw_ip/nic/nic_general.h
> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> create mode 100644 drivers/infiniband/hw/hbl/hbl_encap.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_query_port.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_set_port_ex.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_usr_fifo.c
> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_debugfs.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_drv.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_memory.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_phy.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/arc_farm_kdma_ctx_axuser_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/dcore0_sync_mngr_objs_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/gaudi2_blocks_linux_driver.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/gaudi2_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_128_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_128_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_pcs_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch0_mac_pcs_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch1_mac_pcs_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch2_mac_pcs_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_ch3_mac_pcs_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_control_reg_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_control_reg_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_rx0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_rx2_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_tx0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_glob_stat_tx2_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_mac_rs_fec_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_phy_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_phy_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qm0_axuser_nonsecured_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_cong_que_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_db_fifo_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_err_fifo_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_ev_que_lbw_intr_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_qpc_req_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_qpc_resp_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_rxwqe_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_axuser_txwqe_lbw_qman_bp_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_dbfifo0_ci_upd_addr_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_dbfifosecur_ci_upd_addr_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_qpc1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxb_core_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxb_core_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_axuser_axuser_cq0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_axuser_axuser_cq1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe0_wqe_aruser_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_rxe1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes0_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_serdes1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_fifo_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_free_list_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_axuser_tmr_fsm_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_tmr_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txb_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txe0_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txe0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txs0_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_txs0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_completion_queue_ci_1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_unsecure_doorbell0_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/nic0_umr0_0_unsecure_doorbell1_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/prt0_mac_core_masks.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/asic_reg/prt0_mac_core_regs.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn.h
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_debugfs.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_eq.c
> create mode 100644 drivers/net/ethernet/intel/hbl_cn/gaudi2/gaudi2_cn_phy.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en.h
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_dcbnl.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_drv.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/common/hbl_en_ethtool.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/Makefile
> create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.c
> create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en.h
> create mode 100644 drivers/net/ethernet/intel/hbl_en/gaudi2/gaudi2_en_dcbnl.c
> create mode 100644 include/linux/net/intel/cn.h
> create mode 100644 include/linux/net/intel/cn_aux.h
> create mode 100644 include/linux/net/intel/cni.h
> create mode 100644 include/linux/net/intel/gaudi2.h
> create mode 100644 include/linux/net/intel/gaudi2_aux.h
> create mode 100644 include/trace/events/habanalabs_cn.h
> create mode 100644 include/uapi/rdma/hbl-abi.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
>
>--
>2.34.1
>
>
^ permalink raw reply [flat|nested] 107+ messages in thread
* RE: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-13 8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
@ 2024-06-19 18:35 ` Sunil Kovvuri Goutham
2024-06-21 10:17 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Sunil Kovvuri Goutham @ 2024-06-19 18:35 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, zyehudai@habana.ai
>+
>+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_disable_decap
>+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_inject_rx_err
>+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_lane_remap
Don't think debugfs is the correct interface for all this configuration.
Debugfs should ideally be used for dumping runtime device state info for debug purposes.
>+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_loopback
Why not use ethtool ?
>+
>+What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mmu_bypass
How does this work ?
Thanks,
Sunil.
^ permalink raw reply [flat|nested] 107+ messages in thread
* RE: [PATCH 13/15] accel/habanalabs: network scaling support
2024-06-13 8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
@ 2024-06-19 18:41 ` Sunil Kovvuri Goutham
2024-06-21 10:21 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Sunil Kovvuri Goutham @ 2024-06-19 18:41 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, zyehudai@habana.ai
>
>Add common support for AI scaling over the network. Initialize the hbl_cn driver via
>auxiliary bus and serve as its adapter for accessing the device.
A 1200-line patch deserves a bit more info in the commit msg.
Can you please elaborate what network scaling support is being added in this patch.
Thanks,
Sunil.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 00/15] Introduce HabanaLabs network drivers
2024-06-19 16:33 ` Jiri Pirko
@ 2024-06-20 5:37 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-20 5:37 UTC (permalink / raw)
To: Jiri Pirko
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 19:33, Jiri Pirko wrote:
>
> Thu, Jun 13, 2024 at 10:21:53AM CEST, oshpigelman@habana.ai wrote:
>> This patch set implements the HabanaLabs network drivers for the Gaudi2 ASIC,
>> which is designed for scaling AI neural network training.
>> The patch set includes the common code which is shared by all Gaudi ASICs
>> and the Gaudi2 ASIC-specific code. Code for newer ASICs will follow.
>> All of these network drivers are modeled as auxiliary devices of the
>> parent driver.
>>
>> The newly added drivers are Core Network (CN), Ethernet and InfiniBand.
>> All of these drivers are based on the existing habanalabs driver which
>> serves as the compute driver and the entire platform.
>> The habanalabs driver probes the network drivers which configure the
>> relevant NIC HW of the device. In addition, it continuously communicates
>> with the CN driver to provide some services which are not NIC-specific,
>> e.g. PCI, MMU, FW communication, etc.
>>
>> See the drivers scheme at:
>> Documentation/networking/device_drivers/ethernet/intel/hbl.rst
>>
>> The CN driver is both a parent and a child driver. It serves as the common
>> layer of many shared operations that are required by both EN and IB
>> drivers.
>>
>> The Gaudi2 NIC HW is composed of 48 physical lanes, 56Gbps each. Each pair
>> of lanes represents a 100Gbps logical port.
>
> What do you mean by "logical port"? Is it a separate netdevice? So you
> have 24 netdevices visible on the system? What do the physical port/ports
> look like? How do you model that in devlink? Do you support port
> splitting?
>
I first described our HW. It is composed of 48 physical lanes. But each
netdevice (meaning a "logical port") is mapped to a pair of these, so we
end up with 24 netdevices visible on the system.
Technically we could work in a mode where we have 48 netdevices visible on
the system and each netdevice is mapped to a single physical lane, but we
have no use case for that.
We are not integrated with devlink; we didn't find a need for it in our
use cases.
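The lane-to-netdevice mapping described here is simple enough to pin down in a toy sketch. The lane and port counts are from the cover letter; the helper names are made up for illustration and do not exist in the driver.

```c
#include <assert.h>

/* 48 physical lanes; each logical port (netdevice) is a pair of
 * adjacent lanes, giving 24 netdevices visible on the system. */
#define NUM_LANES      48
#define LANES_PER_PORT 2
#define NUM_PORTS      (NUM_LANES / LANES_PER_PORT)

/* Which logical port owns a given physical lane. */
static int lane_to_port(int lane)
{
	return lane / LANES_PER_PORT;
}

/* First physical lane belonging to a given logical port. */
static int port_first_lane(int port)
{
	return port * LANES_PER_PORT;
}
```

A hypothetical 48-netdevice mode (one lane per port) would just set LANES_PER_PORT to 1; as noted above, there is no use case for it.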
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 15:40 ` Andrew Lunn
@ 2024-06-20 8:36 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-20 8:36 UTC (permalink / raw)
To: Andrew Lunn
Cc: Stephen Hemminger, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/19/24 18:40, Andrew Lunn wrote:
>>> Does this device require IPv4? What about users and infrastructures that use IPv6 only?
>>> IPv4 is legacy at this point.
>>
>> Gaudi2 supports IPv4 only.
>
> Really? I guess really old stuff, SLIP from 1988 does not support
> IPv6, but I don't remember seeing anything from this century which
> does not support passing IPv6 frames over a netdev.
>
> Andrew
We support IPv6 for ETH, not for RDMA. For RDMA, IPv4 is good enough for
our use case so IPv6 was not required. Stephen's comment was about the
code where the CN driver fetches the port IP for configuring it to the
RDMA QPs. It is an RDMA specific path.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 15:21 ` Jakub Kicinski
@ 2024-06-20 8:43 ` Omer Shpigelman
2024-06-20 13:51 ` Jakub Kicinski
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-20 8:43 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/19/24 18:21, Jakub Kicinski wrote:
>
> On Wed, 19 Jun 2024 07:16:20 +0000 Omer Shpigelman wrote:
>>>> Are you referring to get_module_eeprom_by_page()? If so, then it is not
>>>> supported by our FW, we read the entire data on device load.
>>>> However, I can hide that behind the new API and return only the
>>>> requested page if that's the intention.
>>>
>>> Well, if your firmware is so limited, then you might as well stick to
>>> the old API, and let the core do the conversion to the legacy
>>> code. But i'm surprised you don't allow access to the temperature
>>> sensors, received signal strength, voltages etc, which could be
>>> exported via HWMON.
>>
>> I'll stick to the old API.
>> Regarding the sensors, our compute driver (under accel/habanalabs) exports
>> them via HWMON.
>
> You support 400G, you really need to give the user the ability
> to access higher pages.
Actually the 200G and 400G modes in the ethtool code should be removed
from this patch set. They are not relevant for Gaudi2. I'll fix it in the
next version.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-20 8:43 ` Omer Shpigelman
@ 2024-06-20 13:51 ` Jakub Kicinski
2024-06-20 19:14 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Jakub Kicinski @ 2024-06-20 13:51 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Andrew Lunn, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Thu, 20 Jun 2024 08:43:34 +0000 Omer Shpigelman wrote:
> > You support 400G, you really need to give the user the ability
> > to access higher pages.
>
> Actually the 200G and 400G modes in the ethtool code should be removed
> from this patch set. They are not relevant for Gaudi2. I'll fix it in the
> next version.
How do your customers / users check SFP diagnostics?
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-20 13:51 ` Jakub Kicinski
@ 2024-06-20 19:14 ` Andrew Lunn
2024-06-23 14:48 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-20 19:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Omer Shpigelman, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Thu, Jun 20, 2024 at 06:51:35AM -0700, Jakub Kicinski wrote:
> On Thu, 20 Jun 2024 08:43:34 +0000 Omer Shpigelman wrote:
> > > You support 400G, you really need to give the user the ability
> > > to access higher pages.
> >
> > Actually the 200G and 400G modes in the ethtool code should be removed
> > from this patch set. They are not relevant for Gaudi2. I'll fix it in the
> > next version.
>
> How do your customers / users check SFP diagnostics?
And perform firmware upgrade of the SFPs?
https://lore.kernel.org/netdev/20240619121727.3643161-7-danieller@nvidia.com/T/
Andrew
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-19 18:35 ` Sunil Kovvuri Goutham
@ 2024-06-21 10:17 ` Omer Shpigelman
2024-06-21 10:30 ` Sunil Kovvuri Goutham
2024-06-21 15:33 ` Andrew Lunn
0 siblings, 2 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-21 10:17 UTC (permalink / raw)
To: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 21:35, Sunil Kovvuri Goutham wrote:
>> +
>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_disable_decap
>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_inject_rx_err
>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_lane_remap
>
> Don't think debugfs is the correct interface for all this configuration.
> Debugfs should ideally be used for dumping runtime device state info for debug purposes.
>
I see other vendors have debugfs entries for debug configurations or
settings, not just for dumping debug info.
>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_loopback
>
> Why not use ethtool ?
>
We have an ethtool option for that, but we also have internal NIC ports
that are not exposed as netdevices, and for them the ethtool path is
irrelevant. Hence we need this debugfs option as well.
>> +
>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mmu_bypass
>
> How does this work ?
>
When this option is enabled, the RDMA data buffers that the user allocated
in host memory are accessed directly, i.e. without the MMU.
But now that you brought this up, I see that it is not fully supported
anymore, so I'll remove it in the next version.
> Thanks,
> Sunil.
* Re: [PATCH 13/15] accel/habanalabs: network scaling support
2024-06-19 18:41 ` Sunil Kovvuri Goutham
@ 2024-06-21 10:21 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-21 10:21 UTC (permalink / raw)
To: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 21:41, Sunil Kovvuri Goutham wrote:
>>
>> Add common support for AI scaling over the network. Initialize the hbl_cn driver via
>> auxiliary bus and serve as its adapter for accessing the device.
>
> A 1200 line patch deserves a bit more info in the commit msg.
> Can you please elaborate what network scaling support is being added in this patch.
>
> Thanks,
> Sunil.
>
Ok, I'll add more details regarding what exactly is added in the next
patch set version.
* RE: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-21 10:17 ` Omer Shpigelman
@ 2024-06-21 10:30 ` Sunil Kovvuri Goutham
2024-06-23 7:25 ` Omer Shpigelman
2024-06-21 15:33 ` Andrew Lunn
1 sibling, 1 reply; 107+ messages in thread
From: Sunil Kovvuri Goutham @ 2024-06-21 10:30 UTC (permalink / raw)
To: Omer Shpigelman, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
>>> +
>>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_disable_decap
>>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_inject_rx_err
>>> +What:
>/sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_lane_remap
>>
>> Don't think debugfs is the correct interface for all this configuration.
>> Debugfs should ideally be used for dumping runtime device state info for debug
>purposes.
>>
>
>I see other vendors have debugfs entries for debug configurations or settings, not
>just for dumping debug info.
>
But disable_decap / mac_lane_remap seem configuration related, changing the way packets are processed, not debug.
Configurations are supported via devlink.
Thanks,
Sunil.
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-21 10:17 ` Omer Shpigelman
2024-06-21 10:30 ` Sunil Kovvuri Goutham
@ 2024-06-21 15:33 ` Andrew Lunn
2024-06-23 6:57 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-21 15:33 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
> I see other vendors have debugfs entries for debug configurations or
> settings, not just for dumping debug info.
Did you see any added in the last few years? This is also something
DaveM pushed back on. We want uniform APIs so that all devices look
alike. Please consider what you are exporting here, how it should
cleanly fit into ethtool, devlink, etc, and expand these APIs to cover
your needs.
>
> >> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_loopback
> >
> > Why not use ethtool ?
> >
>
> We have an ethtool option for that, but we also have internal NIC ports
> that are not exposed as netdevices, and for them the ethtool path is
> irrelevant. Hence we need this debugfs option as well.
If there is no netdev, what is the point of putting it into loopback?
How do you send packets which are to be looped back? How do you
receive them to see if they were actually looped back?
Andrew
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-19 16:13 ` Andrew Lunn
@ 2024-06-23 6:22 ` Omer Shpigelman
2024-06-23 14:46 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-23 6:22 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/19/24 19:13, Andrew Lunn wrote:
> On Wed, Jun 19, 2024 at 07:16:20AM +0000, Omer Shpigelman wrote:
>> On 6/18/24 17:19, Andrew Lunn wrote:
>>>>>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
>>>>>> +{
>>>>>> + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
>>>>>> + struct net_device *ndev = port->ndev;
>>>>>> + u32 mtu;
>>>>>> +
>>>>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>>>> + netdev_err(ndev, "port is in reset, can't get MTU\n");
>>>>>> + return 0;
>>>>>> + }
>>>>>> +
>>>>>> + mtu = ndev->mtu;
>>>>>
>>>>> I think you need a better error message. All this does is access
>>>>> ndev->mtu. What does it matter if the port is in reset? You don't
>>>>> access it.
>>>>>
>>>>
>>>> This function is called from the CN driver to get the current MTU in order
>>>> to configure it in the HW, for example when configuring an IB QP. The MTU
>>>> value might be changed by user while we execute this function.
>>>
>>> Change of MTU will happen while holding RTNL. Why not simply hold RTNL
>>> while programming the hardware? That is the normal pattern for MAC
>>> drivers.
>>>
>>
>> I can hold the RTNL lock while configuring the HW but it seems like a big
>> overhead. Configuring the HW might take some time due to QP draining or
>> cache invalidation.
>
> How often does the MTU change? Once, maybe twice on boot, and never
> again? MTU change is not hot path. For slow path code, KISS is much
> better, so it is likely to be correct.
>
Yeah, it's not a hot path so I guess we can just return the MTU value
regardless of a parallel reset flow.
>> To me it seems unnecessary but if that's the common way then I'll change
>> it.
>>
>>>>>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
>>>>>> +{
>>>>>> + struct hbl_en_port *port = hbl_netdev_priv(netdev);
>>>>>> + int rc = 0;
>>>>>> +
>>>>>> + if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
>>>>>> + netdev_err(netdev, "port is in reset, can't change MTU\n");
>>>>>> + return -EBUSY;
>>>>>> + }
>>>>>> +
>>>>>> + if (netif_running(port->ndev)) {
>>>>>> + hbl_en_port_close(port);
>>>>>> +
>>>>>> + /* Sleep in order to let obsolete events to be dropped before re-opening the port */
>>>>>> + msleep(20);
>>>>>> +
>>>>>> + netdev->mtu = new_mtu;
>>>>>> +
>>>>>> + rc = hbl_en_port_open(port);
>>>>>> + if (rc)
>>>>>> + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
>>>>>
>>>>> Does that mean the port is FUBAR?
>>>>>
>>>>> Most operations like this are expected to roll back to the previous
>>>>> working configuration on failure. So if changing the MTU requires new
>>>>> buffers in your ring, you should first allocate the new buffers, then
>>>>> free the old buffers, so that if allocation fails, you still have
>>>>> buffers, and the device can continue operating.
>>>>>
>>>>
>>>> A failure in opening a port is a fatal error. It shouldn't happen. This is
>>>> not something we wish to recover from.
>>>
>>> What could cause open to fail? Is memory allocated?
>>>
>>
>> Memory is allocated but it is freed in case of a failure.
>> Port opening can fail due to other reasons as well like some HW timeout
>> while configuring the ETH QP.
>
> If the hardware timeout because the hardware is dead, there is nothing
> you can do about it. Its dead.
>
In our case the HW might time out without being dead. Our ETH and RDMA QPs
are configured through the same path in the HW, so a timeout on an ETH QP
configuration can occur due to many parallel RDMA QP configurations, and a
simple retry of the ETH QP configuration will solve it.
> But what about when the system is under memory pressure? You say it
> allocates memory. What happens if those allocations fail. Does
> changing the MTU take me from a working system to a dead system? It is
> good practice to not kill a working system under situations like
> memory pressure. You try to first allocate the memory you need to
> handle the new MTU, and only if successful do you free existing memory
> you no longer need. That means if you cannot allocate the needed
> memory, you still have the old memory, you can keep the old MTU and
> return -ENOMEM, and the system keeps running.
>
That's a good optimization for these kinds of on-the-fly configurations,
but as you wrote before, changing an MTU value is not a hot path, so out of
cost-benefit considerations we didn't find it mandatory to optimize this
flow.
But let me check this option for the next patch set version.
>> I didn't check that prior to my submit. Regarding this "no new module
>> parameters allowed" rule, is that documented anywhere?
>
> Lots of emails that fly passed on the mailing list. Maybe once every
> couple of months when a vendor tries to mainline a new driver without
> reading the mailing list for a few months to know how mainline
> actually works. I _guess_ Davem has been pushing back on module
> parameters for 10 years? Maybe more.
>
>
Ok, I'll just drop it in the next patch set version.
>> if not, is that the
>> common practice? not to try to do something that was not done recently?
>> how "recently" is defined?
>> I just want to clarify this because it's hard to handle these submissions
>> when we write some code based on existing examples but then we are
>> rejected because "we don't do that here anymore".
>> I want to avoid future cases of this mismatch.
>
> My suggestion would be to spend 30 minutes every day reading patches
> and review comment on the mailing list. Avoid making the same mistakes
> others make, especially newbies to mainline, and see what others are
> doing in the same niche as this device. 30 minutes might seem like a
> lot, but how much time did you waste implementing polling mode, which
> you are now going to throw away?
>
I get your point, but still, it would be good if this were documented
somewhere IMHO.
>>>>>> + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
>>>>>
>>>>> That looks odd. Care to explain?
>>>>>
>>>>
>>>> The HW of all of our ports supports autoneg.
>>>> But in addition, the ports are divided into two groups:
>>>> internal: ports which are connected to other Gaudi2 ports in the same server.
>>>> external: ports which are connected to an external switch.
>>>> Only internal ports use autoneg.
>>>> The ports mask which sets each port as internal/external is fetched from
>>>> the FW on device load.
>>>
>>> That is not what i meant. lc_advertising should indicate the link
>>> modes the peer is advertising. If this was a copper link, it typically
>>> would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
>>> 100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
>>> since the peer must be advertising in order that lp_advertising has a
>>> value!
>>>
>>
>> Sorry, but I don't get this. The problem is the setting of the Autoneg bit
>> in lp_advertising? Is that redundant? I see that other vendors set it too
>> in case that Autoneg was completed.
>
>
> $ ethtool eth0
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
>
> This is `supported`. The hardware can do these link modes.
>
> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes
>
> It also support symmetric pause, and can do autoneg.
>
> Supported FEC modes: Not reported
> Advertised link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Advertised pause frame use: Symmetric Receive-only
> Advertised auto-negotiation: Yes
> Advertised FEC modes: Not reported
>
> This is `advertising`, and is what this device is advertising to the
> link partner. By default you copy supported into advertising, but the
> user can use ethtool -s advertise N, where N is a list of link modes,
> to change what is advertised to the link partner.
>
> Link partner advertised link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Link partner advertised pause frame use: Symmetric
> Link partner advertised auto-negotiation: Yes
> Link partner advertised FEC modes: Not reported
>
> This is `lp_advertising`, what the link partner is advertising to this
> device. Once you have this, you mask lp_advertising with advertising,
> and generally pick the link mode with the highest bandwidth:
>
> Speed: 1000Mb/s
> Duplex: Full
>
> So autoneg resolved to 1000baseT/Full
>
> Andrew
I'm familiar with this logic but I don't understand your point. The point
you are making is that setting this Autoneg bit in lp_advertising is
pointless? I see other vendors setting it too in cases where autoneg was
completed.
Is that redundant in their case too? Because it looks to me that in this
case we followed the same logic and conventions other vendors followed.
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-21 15:33 ` Andrew Lunn
@ 2024-06-23 6:57 ` Omer Shpigelman
2024-06-23 15:02 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-23 6:57 UTC (permalink / raw)
To: Andrew Lunn
Cc: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/21/24 18:33, Andrew Lunn wrote:
>> I see other vendors have debugfs entries for debug configurations or
>> settings, not just for dumping debug info.
>
> Did you see any added in the last few years? This is also something
> DaveM pushed back on. We want uniform APIs so that all devices look
> alike. Please consider what you are exporting here, how it should
> cleanly fit into ethtool, devlink, etc, and expand these APIs to cover
> your needs.
>
If it's problematic then I'll try to stick to the ones which expose debug
info and maybe some other necessary debug options e.g. loopback. I'll try
to minimize by removing anything that is not mandatory.
>>
>>>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_loopback
>>>
>>> Why not use ethtool ?
>>>
>>
>> We have an ethtool option for that, but we also have internal NIC ports
>> that are not exposed as netdevices, and for them the ethtool path is
>> irrelevant. Hence we need this debugfs option as well.
>
> If there is no netdev, what is the point of putting it into loopback?
> How do you send packets which are to be looped back? How do you
> receive them to see if they were actually looped back?
>
> Andrew
To run an RDMA test in loopback. That's how we can pinpoint problems like
packet drops or performance degradation. For example, if packet drops were
seen on the port, then it is crucial to know whether these drops are
reproducible in loopback mode. If they are, then the root cause is
probably some internal HW failure or misconfiguration. If not, then the
packet drops might be related to the link quality.
We send these packets by setting a loopback bypass in the MAC layer. The
packets themselves and the NIC HW logic are agnostic to the loopback
setting (except for the packet's MAC address).
We receive and validate them in the same way we receive and validate
regular packets - the loopback bypass is in the MAC layer, which is a
lower layer than our NIC HW logic.
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-21 10:30 ` Sunil Kovvuri Goutham
@ 2024-06-23 7:25 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-23 7:25 UTC (permalink / raw)
To: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org
Cc: ogabbay@kernel.org, Zvika Yehudai
On 6/21/24 13:30, Sunil Kovvuri Goutham wrote:
>>>> +
>>>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_disable_decap
>>>> +What: /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_inject_rx_err
>>>> +What:
>> /sys/kernel/debug/habanalabs_cn/hbl_cn<n>/nic_mac_lane_remap
>>>
>>> Don't think debugfs is the correct interface for all this configuration.
>>> Debugfs should ideally be used for dumping runtime device state info for debug
>> purposes.
>>>
>>
>> I see other vendors have debugfs entries for debug configurations or settings, not
>> just for dumping debug info.
>>
>
> But disable_decap / mac_lane_remap seem configuration related, changing the way packets are processed, not debug.
> Configurations are supported via devlink.
>
As I wrote to Andrew, I'll stick to the debugfs entries that are really
necessary for us.
BTW the entries you mentioned are not regular configurations but advanced
settings to augment debuggability. But yeah, if we can set these via
devlink then it is better to use it.
Let me revisit this for the next patch set version.
> Thanks,
> Sunil.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-23 6:22 ` Omer Shpigelman
@ 2024-06-23 14:46 ` Andrew Lunn
2024-06-26 10:13 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-23 14:46 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
> > But what about when the system is under memory pressure? You say it
> > allocates memory. What happens if those allocations fail. Does
> > changing the MTU take me from a working system to a dead system? It is
> > good practice to not kill a working system under situations like
> > memory pressure. You try to first allocate the memory you need to
> > handle the new MTU, and only if successful do you free existing memory
> > you no longer need. That means if you cannot allocate the needed
> > memory, you still have the old memory, you can keep the old MTU and
> > return -ENOMEM, and the system keeps running.
> >
>
> That's a good optimization for these kinds of on-the-fly configurations,
> but as you wrote before, changing an MTU value is not a hot path, so out
> of cost-benefit considerations we didn't find it mandatory to optimize
> this flow.
I would not call this an optimization. And it is not just about
changing the MTU. ethtool set_ringparam() is also likely to run into
this problem, as is any other configuration which requires reallocating
the rings.
This is something else which comes up every few months on the list,
and driver writers who monitor the list will write their drivers that
way, not 'optimise' it later.
> I get your point, but still, it would be good if this were documented
> somewhere IMHO.
Kernel documentation is poor, agreed. But kernel policy is also
somewhat fluid, best practices change, and any developers can
influence that policy, different subsystems can and do have
contradictory policy, etc. The mailing list is the best place to learn
and to take part in this community. You need to be on the list for
other reasons as well.
> I'm familiar with this logic but I don't understand your point. The point
> you are making is that setting this Autoneg bit in lp_advertising is
> pointless? I see other vendors setting it too in cases where autoneg was
> completed.
> Is that redundant in their case too? Because it looks to me that in this
> case we followed the same logic and conventions other vendors followed.
Please show us the output from ethtool. Does it look like the example
i showed? I must admit, i'm more from the embedded world and don't
have access to high speed interfaces. But the basic concept of
auto-neg should not change that much.
Andrew
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-20 19:14 ` Andrew Lunn
@ 2024-06-23 14:48 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-23 14:48 UTC (permalink / raw)
To: Andrew Lunn, Jakub Kicinski
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/20/24 22:14, Andrew Lunn wrote:
> On Thu, Jun 20, 2024 at 06:51:35AM -0700, Jakub Kicinski wrote:
>> On Thu, 20 Jun 2024 08:43:34 +0000 Omer Shpigelman wrote:
>>>> You support 400G, you really need to give the user the ability
>>>> to access higher pages.
>>>
>>> Actually the 200G and 400G modes in the ethtool code should be removed
>>> from this patch set. They are not relevant for Gaudi2. I'll fix it in the
>>> next version.
>>
>> How do your customers / users check SFP diagnostics?
>
> And perform firmware upgrade of the SFPs?
>
> https://lore.kernel.org/netdev/20240619121727.3643161-7-danieller@nvidia.com/T/
>
> Andrew
>
Via OAM I2C Master.
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-23 6:57 ` Omer Shpigelman
@ 2024-06-23 15:02 ` Andrew Lunn
2024-06-24 7:21 ` Omer Shpigelman
` (2 more replies)
0 siblings, 3 replies; 107+ messages in thread
From: Andrew Lunn @ 2024-06-23 15:02 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
> > If there is no netdev, what is the point of putting it into loopback?
> > How do you send packets which are to be looped back? How do you
> > receive them to see if they were actually looped back?
> >
> > Andrew
>
> To run an RDMA test in loopback.
What is special about your RDMA? Why do you need something which other
vendors don't? Please solve this problem for all RDMA devices, not
yours.
This is all part of the same thing with respect to module
parameters. Vendors would add module parameters for something. Other
vendors would have the same concept, but give it a different name,
different values. It was all poorly documented. You had to read the
kernel sources to figure out what kernel module parameters do. Same
goes for debugfs, driver values in /proc, /sysfs or /debugfs. So for
years we have been pushing back on things like this.
If you have something which is unique to your hardware, no other
vendor is ever going to have the same, then you can make an argument
for something driver specific in /debugfs. But RDMA loopback tests are
clearly not specific to your driver. Extend the KAPI and tools to
cover this, document the KAPI, write the man page, and let other
vendors implement the little bit they need in their driver, so users
have a uniform way of doing things over a range of devices.
You will get a lot of pushback on everything in /debugfs, so please
review them all with this in mind.
Andrew
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-23 15:02 ` Andrew Lunn
@ 2024-06-24 7:21 ` Omer Shpigelman
2024-06-24 9:22 ` Leon Romanovsky
2024-12-17 10:00 ` Avri Kehat
2 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-24 7:21 UTC (permalink / raw)
To: Andrew Lunn
Cc: Sunil Kovvuri Goutham, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/23/24 18:02, Andrew Lunn wrote:
>>> If there is no netdev, what is the point of putting it into loopback?
>>> How do you send packets which are to be looped back? How do you
>>> receive them to see if they were actually looped back?
>>>
>>> Andrew
>>
>> To run an RDMA test in loopback.
>
> What is special about your RDMA? Why do you need something which other
> vendors don't? Please solve this problem for all RDMA devices, not
> yours.
>
> This is all part of the same thing with respect to module
> parameters. Vendors would add module parameters for something. Other
> vendors would have the same concept, but give it a different name,
> different values. It was all poorly documented. You had to read the
> kernel sources to figure out what kernel module parameters do. Same
> goes for debugfs, driver values in /proc, /sysfs or /debugfs. So for
> years we have been pushing back on things like this.
>
> If you have something which is unique to your hardware, no other
> vendor is ever going to have the same, then you can make an argument
> for something driver specific in /debugfs. But RDMA loopback tests are
> clearly not specific to your driver. Extend the KAPI and tools to
> cover this, document the KAPI, write the man page, and let other
> vendors implement the little bit they need in their driver, so users
> have a uniform way of doing things over a range of devices.
>
> You will get a lot of pushback on everything in /debugfs, so please
> review them all with this in mind.
>
> Andrew
I see your point and I'll keep that in mind. For these kinds of
configurations we can use devlink instead of debugfs.
* Re: [PATCH 04/15] net: hbl_cn: QP state machine
2024-06-18 9:00 ` Leon Romanovsky
@ 2024-06-24 7:24 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-24 7:24 UTC (permalink / raw)
To: Leon Romanovsky
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/18/24 12:00, Leon Romanovsky wrote:
> On Tue, Jun 18, 2024 at 07:58:55AM +0000, Omer Shpigelman wrote:
>> On 6/18/24 10:08, Leon Romanovsky wrote:
>>> On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
>>>> On 6/17/24 16:18, Leon Romanovsky wrote:
>>>>> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
>>>>>> Add a common QP state machine which handles moving a QP from one
>>>>>> state to another, including performing necessary checks, draining
>>>>>> in-flight transactions, invalidating caches and error reporting.
>>>>>>
>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>> ---
>>>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
>>>>>> 1 file changed, 479 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>>>> index 9ddc23bf8194..26ebdf448193 100644
>>>>>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>>>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>>>> @@ -6,8 +6,486 @@
>>>>>
>>>>> <...>
>>>>>
>>>>>> +/* The following table represents the (valid) operations that can be performed on
>>>>>> + * a QP in order to move it from one state to another
>>>>>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
>>>>>> + * operation.
>>>>>> + */
>>>>>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
>>>>>> + [CN_QP_STATE_RESET] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>>>> + },
>>>>>> + [CN_QP_STATE_INIT] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>>>> + },
>>>>>> + [CN_QP_STATE_RTR] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
>>>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
>>>>>> + },
>>>>>> + [CN_QP_STATE_RTS] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
>>>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
>>>>>> + },
>>>>>> + [CN_QP_STATE_SQD] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
>>>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
>>>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
>>>>>> + },
>>>>>> + [CN_QP_STATE_QPD] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
>>>>>> + },
>>>>>> + [CN_QP_STATE_SQERR] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
>>>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
>>>>>> + },
>>>>>> + [CN_QP_STATE_ERR] = {
>>>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>>>> + }
>>>>>> +};
>>>>>
>>>>> I don't understand why the IBTA QP state machine is declared in the
>>>>> ETH driver and not in the IB driver.
>>>>>
>>>>
>>>> Implementing the actual transitions between the states requires full
>>>> knowledge of the HW, e.g. when to flush, invalidate caches, or apply timeouts.
>>>> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
>>>> generations are planned to be added and the IB driver should not be aware
>>>> of these additional HWs.
>>>> Hence we implemented the QP state machine in the CN driver, which is aware
>>>> of the actual HW.
>>>
>>> Somehow ALL other IB drivers are able to implement this logic in the IB,
>>> while supporting multiple ASICs. I don't see a reason why you can't do
>>> the same.
>>>
>>
>> If we are referring to this actual table, then I can move it to the IB
>> driver and the CN driver will fetch the needed opcode via a function
>> pointer.
>> Is that ok?
>
> This table caught my attention, but the right separation shouldn't be limited
> to only this table. The outcome of this conversation should be:
> "IB-specific logic should be in the IB driver, and the CN driver should be
> able to handle only low-level operations".
>
> Thanks
Ok, I'll check how we can move the IB specific logic to the IB driver.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-19 10:52 ` Leon Romanovsky
@ 2024-06-24 8:47 ` Omer Shpigelman
2024-06-24 9:10 ` Leon Romanovsky
2024-06-28 10:24 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-24 8:47 UTC (permalink / raw)
To: Leon Romanovsky
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/19/24 13:52, Leon Romanovsky wrote:
> On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
>> On 6/18/24 15:58, Leon Romanovsky wrote:
>>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
>>>> On 6/17/24 22:04, Leon Romanovsky wrote:
>>>>>
>>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>>>>
>>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>>>>> Add an RDMA driver for the Gaudi ASIC family for AI scaling.
>>>>>>>> The driver itself is agnostic to the ASIC in action; it operates according
>>>>>>>> to the capabilities that were passed on device initialization.
>>>>>>>> The device is initialized by the hbl_cn driver via the auxiliary bus.
>>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>>>>
>>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>
>>>>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
>>>>>>> people probably touched the code but did not actually sit together in
>>>>>>> the same room and write the code together. So, please remove the
>>>>>>> extensive "Co-developed-by" tags.
>>>>>>>
>>>>>>> It is not a full review yet, just simple pass-by comments.
>>>>>>>
>>>>>>
>>>>>> Actually, except for two, all of the mentioned persons sat in the same
>>>>>> room and developed the code together.
>>>>>> The remaining two are located at a different site (but also together).
>>>>>> Isn't that what the "Co-developed-by" tag is for?
>>>>>> I wanted to give them credit for writing the code, but I can remove the
>>>>>> tags if that's not common.
>>>>>
>>>>> Signed-off-by will be enough to give them credit.
>>>>>
>>>>
>>>> Ok, good enough.
>>>>
>>>>>>
>>>>>>>> ---
>>>>>>>> MAINTAINERS | 10 +
>>>>>>>> drivers/infiniband/Kconfig | 1 +
>>>>>>>> drivers/infiniband/hw/Makefile | 1 +
>>>>>>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
>>>>>>>> drivers/infiniband/hw/hbl/Makefile | 8 +
>>>>>>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
>>>>>>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
>>>>>>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
>>>>>>>> include/uapi/rdma/hbl-abi.h | 204 ++
>>>>>>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
>>>>>>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
>>>>>>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
>>>>>>>> 12 files changed, 3904 insertions(+)
>>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
>>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
>>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
>>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
>>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
>>>>>>>> create mode 100644 include/uapi/rdma/hbl-abi.h
>>>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
>>>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
>>>>>>>
>>>>>>> <...>
>>>>>>>
>>>>>>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
>>>>>>>> +
>>>>>>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
>>>>>>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>>>>>>>> +
>>>>>>>
>>>>>>> Please don't redefine the existing macros. Just use the existing ones.
>>>>>>>
>>>>>>>
>>>>>>> <...>
>>>>>>>
>>>>>>
>>>>>> That's a leftover from some debug code. I'll remove.
>>>>>>
>>>>>>>> + if (hbl_ib_match_netdev(ibdev, netdev))
>>>>>>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>>>>>> + else
>>>>>>>> + return NOTIFY_DONE;
>>>>>>>
>>>>>>> It is not kernel coding style. Please write:
>>>>>>> if (!hbl_ib_match_netdev(ibdev, netdev))
>>>>>>> return NOTIFY_DONE;
>>>>>>>
>>>>>>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>>>>>>>
>>>>>>
>>>>>> I'll fix the code, thanks.
>>>>>>
>>>>>>>> +
>>>>>>>
>>>>>>> <...>
>>>>>>>
>>>>>>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>>>>>>>> +{
>>>>>>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
>>>>>>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
>>>>>>>> + struct hbl_ib_device *hdev;
>>>>>>>> + ktime_t timeout;
>>>>>>>> + int rc;
>>>>>>>> +
>>>>>>>> + rc = hdev_init(aux_dev);
>>>>>>>> + if (rc) {
>>>>>>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
>>>>>>>> + return -EIO;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + hdev = aux_dev->priv;
>>>>>>>> +
>>>>>>>> + /* don't allow module unloading while it is attached */
>>>>>>>> + if (!try_module_get(THIS_MODULE)) {
>>>>>>>
>>>>>>> This part makes me wonder: what are you trying to do here? What doesn't work for you
>>>>>>> in the standard driver core and module load mechanism?
>>>>>>>
>>>>>>
>>>>>> Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
>>>>>> inter-driver communication. That incremented the refcount of the used
>>>>>> module, so it couldn't be removed while in use.
>>>>>> Auxiliary bus usage doesn't increment the used module's refcount, hence
>>>>>> the used module can be removed while it is in use, and that's something
>>>>>> we don't want to allow.
>>>>>> We could solve it with some global locking or an in_use atomic, but the
>>>>>> simplest and cleanest way is just to increment the used module's refcount
>>>>>> on auxiliary device probe and decrement it on auxiliary device removal.
>>>>>
>>>>> No, you were supposed to continue to use EXPORT_SYMBOLs and not
>>>>> invent an auxiliary ops structure (this is why you lost module
>>>>> reference counting).
>>>>>
>>>>
>>>> Sorry, but according to the auxiliary bus doc, a domain-specific ops
>>>> structure can be used.
>>>> We followed the usage example described at drivers/base/auxiliary.c.
>>>> What am I missing?
>>>
>>> Being the one who implemented the auxiliary bus in the kernel and converted
>>> a number of drivers to use it, I strongly recommend that you do NOT follow
>>> the example provided there.
>>>
>>> So you are missing "best practice", and "best practice" is to use
>>> EXPORT_SYMBOLs and rely on module reference counting.
>>>
>>
>> It is not just the usage example but also the general feature doc before
>> it:
>> "The generic behavior can be extended and specialized as needed by
>> encapsulating an auxiliary_device within other domain-specific structures
>> and the use of .ops callbacks."
>> It is also mentioned there that the ops structure is used for specific
>> auxiliary device operations, while EXPORT_SYMBOLs should be used for common
>> infrastructure the parent driver exposes:
>> "Note that ops are intended as a way to augment instance behavior within a
>> class of auxiliary devices, it is not the mechanism for exporting common
>> infrastructure from the parent."
>> All of our ops callbacks are meant to provide functionality related to the
>> auxiliary device; they are not just general/common infrastructure.
>
> Of course they are common, otherwise why did you put them in common code?
> For example, you have callbacks to lock and unlock internal HW access,
> how is it not common?
>
As I saw it, the "common" functions are general capabilities the parent
driver exposes, not necessarily related to the auxiliary device.
But let me revisit this and try to restructure the code so the parent
driver will use EXPORT_SYMBOLs.
>>
>> Why do we have this doc if we should ignore it? Why wasn't the doc
>> modified according to the "best practice" you described? The doc is
>> misleading.
>
> Because this is how upstream kernel development works. We are trying to
> come to an agreement and get the best solution for the problem. Sometimes,
> the outcome of the discussion is not "the best solution", but "good
> enough". This doc can serve as an example. Everyone involved in the
> development of auxbus and its later usage was focused on implementation;
> the documentation was good enough as it didn't limit anyone who actually
> used it.
>
I get your point, but I still think that the doc is misleading if it shows
a usage example that practically no one should follow.
Better to remove this usage example completely, IMHO.
>>
>> Adding gregkh here as he requested the auxiliary bus feature IIRC.
>> Greg - isn't the doc legit? Should EXPORT_SYMBOLs necessarily be used
>> together with the auxiliary bus rather than an ops structure?
>
> This is not what you are doing here. You completely ditched EXPORT_SYMBOLs
> and reinvented module reference counting, which overcomplicated the code
> just to avoid using a standard kernel mechanism.
>
>> As we saw it, the auxiliary bus gives us the flexibility to choose which
>> modules will be loaded, while EXPORT_SYMBOLs enforce dependencies
>> which might not be needed in some cases.
>>
>>>> Moreover, we'd like to support the mode where the IB or the ETH driver is
>>>> not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
>>>> exclusively for inter-driver communication.
>>>
>>> It is not true and not how the kernel works. You can perfectly well load the
>>> core driver without IB and ETH; to some extent this is how the mlx5 driver works.
>>>
>>
>> The mlx5 IB driver doesn't export any symbol that is used by the core driver;
>> that's why the core driver can be loaded without the IB driver (although
>> you'd get a circular dependency if you did export).
>
> Yes, IB and ETH drivers are "users" of the core driver. As the RDMA maintainer,
> I'm reluctant to accept code that exports symbols from IB drivers to
> other subsystems. We have drivers/infiniband/core/ for that.
>
So we'll need to restructure the code to follow this limitation. We'll
take care of it for the next patch set version.
BTW, if you won't allow such driver-specific EXPORT_SYMBOLs, I think it is
good to have it documented similarly to the other "don't do" guidelines in
the infiniband doc.
That's because in the net/ethernet subsystem, for example, it is very common
to add such driver-specific EXPORT_SYMBOLs.
>> If relying on exported symbols only, then our IB and ETH drivers will need
>> to export symbols too because the core driver accesses them post probing.
>
> So you should fix your core driver. This is exactly what auxbus model
> proposes.
>
>> Hence we won't be able to load the core driver without both of them (or
>> loading anything due to circular dependency).
>> Unless we'll use dynamic symbol lookup and I don't think that's your
>> intention.
>
> No it is not.
>
>>
>>>>
>>>>>>
>>>>>>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
>>>>>>>> + module_name(THIS_MODULE));
>>>>>>>> + rc = -EIO;
>>>>>>>> + goto module_get_err;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
>>>>>>>> + while (1) {
>>>>>>>> + aux_ops->hw_access_lock(aux_dev);
>>>>>>>> +
>>>>>>>> + /* if the device is operational, proceed to actual init while holding the lock in
>>>>>>>> + * order to prevent concurrent hard reset
>>>>>>>> + */
>>>>>>>> + if (aux_ops->device_operational(aux_dev))
>>>>>>>> + break;
>>>>>>>> +
>>>>>>>> + aux_ops->hw_access_unlock(aux_dev);
>>>>>>>> +
>>>>>>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
>>>>>>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
>>>>>>>> + rc = -EBUSY;
>>>>>>>> + goto timeout_err;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
>>>>>>>> +
>>>>>>>> + msleep_interruptible(MSEC_PER_SEC);
>>>>>>>> + }
>>>>>>>
>>>>>>> The code above is unexpected.
>>>>>>>
>>>>>>
>>>>>> We have no control over when the user insmods the IB driver.
>>>>>
>>>>> It is not true; this is controlled through the module dependencies
>>>>> mechanism.
>>>>>
>>>>
>>>> Yeah, if we used EXPORT_SYMBOLs for inter-driver communication, but
>>>> we don't.
>>>
>>> So please use it and don't add complexity where it is not needed.
>>>
>>>>
>>>>>> As a result it is possible that the IB auxiliary device will be probed
>>>>>> while the compute device is under reset (due to some HW error).
>>>>>
>>>>> No, it is not possible. If you structure your driver right.
>>>>>
>>>>
>>>> Again, it would not be possible if we used EXPORT_SYMBOLs.
>>>> Please let me know if we misunderstood something, because AFAIU we followed
>>>> the auxiliary bus doc usage example.
>>>
>>> It is better to follow actual drivers that use auxiliary bus and see how
>>> they implemented it and not rely on examples in the documentation.
>>>
>>
>> But isn't that what the doc is for? To explain the guidelines? And it's not
>> that there is a big red note there saying "this example should not be taken
>> as is, please look at your subsystem guidelines".
>
> At the beginning that doc was located in the Documentation/ folder and no one
> really cared about it. After moving from Documentation/ to drivers/base/auxiliary.c,
> it became more visible, but still no one relied on it. You are the first one
> who read it.
>
> There are no subsystem rules here. Everyone relied on EXPORT_SYMBOLs and didn't
> use an ops structure. The kernel is an evolving project; there is no need to
> find a rule for everything.
>
> Thanks
>
>>
>>> Thanks
>>>
>>>>
>>>>> Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-24 8:47 ` Omer Shpigelman
@ 2024-06-24 9:10 ` Leon Romanovsky
0 siblings, 0 replies; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-24 9:10 UTC (permalink / raw)
To: Omer Shpigelman
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Mon, Jun 24, 2024 at 08:47:41AM +0000, Omer Shpigelman wrote:
> On 6/19/24 13:52, Leon Romanovsky wrote:
> > On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
> >> On 6/18/24 15:58, Leon Romanovsky wrote:
> >>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
> >>>> On 6/17/24 22:04, Leon Romanovsky wrote:
> >>>>>
> >>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> >>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
> >>>>>>>
> >>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >>>>>>>> Add an RDMA driver for the Gaudi ASIC family for AI scaling.
> >>>>>>>> The driver itself is agnostic to the ASIC in action; it operates according
> >>>>>>>> to the capabilities that were passed on device initialization.
> >>>>>>>> The device is initialized by the hbl_cn driver via the auxiliary bus.
> >>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>
> >>>>>>> I'm afraid that you misinterpreted the "Co-developed-by" tag. All these
> >>>>>>> people probably touched the code but did not actually sit together in
> >>>>>>> the same room and write the code together. So, please remove the
> >>>>>>> extensive "Co-developed-by" tags.
> >>>>>>>
> >>>>>>> It is not a full review yet, just simple pass-by comments.
> >>>>>>>
> >>>>>>
> >>>>>> Actually, except for two, all of the mentioned persons sat in the same
> >>>>>> room and developed the code together.
> >>>>>> The remaining two are located at a different site (but also together).
> >>>>>> Isn't that what the "Co-developed-by" tag is for?
> >>>>>> I wanted to give them credit for writing the code, but I can remove the
> >>>>>> tags if that's not common.
> >>>>>
> >>>>> Signed-off-by will be enough to give them credit.
> >>>>>
> >>>>
> >>>> Ok, good enough.
> >>>>
> >>>>>>
> >>>>>>>> ---
> >>>>>>>> MAINTAINERS | 10 +
> >>>>>>>> drivers/infiniband/Kconfig | 1 +
> >>>>>>>> drivers/infiniband/hw/Makefile | 1 +
> >>>>>>>> drivers/infiniband/hw/hbl/Kconfig | 17 +
> >>>>>>>> drivers/infiniband/hw/hbl/Makefile | 8 +
> >>>>>>>> drivers/infiniband/hw/hbl/hbl.h | 326 +++
> >>>>>>>> drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++
> >>>>>>>> drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++
> >>>>>>>> include/uapi/rdma/hbl-abi.h | 204 ++
> >>>>>>>> include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 +
> >>>>>>>> include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 +
> >>>>>>>> include/uapi/rdma/ib_user_ioctl_verbs.h | 1 +
> >>>>>>>> 12 files changed, 3904 insertions(+)
> >>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Kconfig
> >>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/Makefile
> >>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl.h
> >>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
> >>>>>>>> create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
> >>>>>>>> create mode 100644 include/uapi/rdma/hbl-abi.h
> >>>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
> >>>>>>>> create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
> >>>>>>>
> >>>>>>> <...>
> >>>>>>>
> >>>>>>>> +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__)
> >>>>>>>> +
> >>>>>>>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \
> >>>>>>>> + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
> >>>>>>>> +
> >>>>>>>
> >>>>>>> Please don't redefine the existing macros. Just use the existing ones.
> >>>>>>>
> >>>>>>>
> >>>>>>> <...>
> >>>>>>>
> >>>>>>
> >>>>>> That's a leftover from some debug code. I'll remove.
> >>>>>>
> >>>>>>>> + if (hbl_ib_match_netdev(ibdev, netdev))
> >>>>>>>> + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>>>>>> + else
> >>>>>>>> + return NOTIFY_DONE;
> >>>>>>>
> >>>>>>> It is not kernel coding style. Please write:
> >>>>>>> if (!hbl_ib_match_netdev(ibdev, netdev))
> >>>>>>> return NOTIFY_DONE;
> >>>>>>>
> >>>>>>> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> >>>>>>>
> >>>>>>
> >>>>>> I'll fix the code, thanks.
> >>>>>>
> >>>>>>>> +
> >>>>>>>
> >>>>>>> <...>
> >>>>>>>
> >>>>>>>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> >>>>>>>> +{
> >>>>>>>> + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
> >>>>>>>> + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
> >>>>>>>> + struct hbl_ib_device *hdev;
> >>>>>>>> + ktime_t timeout;
> >>>>>>>> + int rc;
> >>>>>>>> +
> >>>>>>>> + rc = hdev_init(aux_dev);
> >>>>>>>> + if (rc) {
> >>>>>>>> + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
> >>>>>>>> + return -EIO;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + hdev = aux_dev->priv;
> >>>>>>>> +
> >>>>>>>> + /* don't allow module unloading while it is attached */
> >>>>>>>> + if (!try_module_get(THIS_MODULE)) {
> >>>>>>>
> >>>>>>> This part makes me wonder: what are you trying to do here? What doesn't work for you
> >>>>>>> in the standard driver core and module load mechanism?
> >>>>>>>
> >>>>>>
> >>>>>> Before the auxiliary bus was introduced, we used EXPORT_SYMBOLs for
> >>>>>> inter-driver communication. That incremented the refcount of the used
> >>>>>> module, so it couldn't be removed while in use.
> >>>>>> Auxiliary bus usage doesn't increment the used module's refcount, hence
> >>>>>> the used module can be removed while it is in use, and that's something
> >>>>>> we don't want to allow.
> >>>>>> We could solve it with some global locking or an in_use atomic, but the
> >>>>>> simplest and cleanest way is just to increment the used module's refcount
> >>>>>> on auxiliary device probe and decrement it on auxiliary device removal.
> >>>>>
> >>>>> No, you were supposed to continue to use EXPORT_SYMBOLs and not
> >>>>> invent an auxiliary ops structure (this is why you lost module
> >>>>> reference counting).
> >>>>>
> >>>>
> >>>> Sorry, but according to the auxiliary bus doc, a domain-specific ops
> >>>> structure can be used.
> >>>> We followed the usage example described at drivers/base/auxiliary.c.
> >>>> What am I missing?
> >>>
> >>> Being the one who implemented the auxiliary bus in the kernel and converted
> >>> a number of drivers to use it, I strongly recommend that you do NOT follow
> >>> the example provided there.
> >>>
> >>> So you are missing "best practice", and "best practice" is to use
> >>> EXPORT_SYMBOLs and rely on module reference counting.
> >>>
> >>
> >> It is not just the usage example but also the general feature doc before
> >> it:
> >> "The generic behavior can be extended and specialized as needed by
> >> encapsulating an auxiliary_device within other domain-specific structures
> >> and the use of .ops callbacks."
> >> It is also mentioned there that the ops structure is used for specific
> >> auxiliary device operations, while EXPORT_SYMBOLs should be used for common
> >> infrastructure the parent driver exposes:
> >> "Note that ops are intended as a way to augment instance behavior within a
> >> class of auxiliary devices, it is not the mechanism for exporting common
> >> infrastructure from the parent."
> >> All of our ops callbacks are meant to provide functionality related to the
> >> auxiliary device; they are not just general/common infrastructure.
> >
> > Of course they are common, otherwise why did you put them in common code?
> > For example, you have callbacks to lock and unlock internal HW access,
> > how is it not common?
> >
>
> As I saw it, the "common" functions are general capabilities the parent
> driver exposes, not necessarily related to the auxiliary device.
> But let me revisit this and try to restructure the code so the parent
> driver will use EXPORT_SYMBOLs.
>
> >>
> >> Why do we have this doc if we should ignore it? Why wasn't the doc
> >> modified according to the "best practice" you described? The doc is
> >> misleading.
> >
> > Because this is how upstream kernel development works. We are trying to
> > come to an agreement and get the best solution for the problem. Sometimes,
> > the outcome of the discussion is not "the best solution", but "good
> > enough". This doc can serve as an example. Everyone involved in the
> > development of auxbus and its later usage was focused on implementation;
> > the documentation was good enough as it didn't limit anyone who actually
> > used it.
> >
>
> I get your point, but I still think that the doc is misleading if it shows
> a usage example that practically no one should follow.
> Better to remove this usage example completely, IMHO.
We (developers) didn't want that example in the first place. I'm not going
to argue again in order to attempt to remove it.
>
> >>
> >> Adding gregkh here as he requested the auxiliary bus feature IIRC.
> >> Greg - isn't the doc legit? should EXPORT_SYMBOLs necessarily be used
> >> together with auxiliary bus rather than ops structure?
> >
> > This is not what you are doing here. You completely ditched EXPORT_SYMBOLs
> > and reinvented module reference counting which overcomplicated the code
> > just to avoid using standard kernel mechanism.
> >
> >> As we saw it, auxiliary bus gives us the flexibility to choose which
> >> modules will be loaded while EXPORT_SYMBOLs enforce the dependencies
> >> which might not be needed in some cases.
> >>
> >>>> Moreover, we'd like to support the mode where the IB or the ETH driver is
> >>>> not loaded at all. But this cannot be achieved if we use EXPORT_SYMBOLs
> >>>> exclusively for inter driver communication.
> >>>
> >>> It is not true and not how the kernel works. You can perfectly load core
> >>> driver without IB and ETH, at some extent this is how mlx5 driver works.
> >>>
> >>
> >> mlx5 IB driver doesn't export any symbol that is used by the core driver,
> >> that's why the core driver can be loaded without the IB driver (although
> >> you'll get circular dependency if you would export).
> >
> > Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
> > I'm reluctant to accept code that exports symbols from IB drivers to
> > other subsystems. We have drivers/infiniband/core/ for that.
> >
>
> So we'll need to restructure the code to follow this limitation. We'll
> take care of it for the next patch set version.
> BTW if you won't allow such driver specific EXPORT_SYMBOLs, I think it is
> good to have it documented similarly to other "don't do" guidelines in the
> infiniband doc.
> That's because in the net/ethernet subsystem for example it is very common
> to add such driver specific EXPORT_SYMBOLs.
Yes, this is a technical limitation: it exists because the PCI core (driver
common code) is located in drivers/net, not because of a policy to accept
EXPORT_SYMBOLs in netdev.
If you put your driver common code somewhere else, you won't need any
EXPORT_SYMBOLs in drivers/net.
>
> >> If relying on exported symbols only, then our IB and ETH drivers will need
> >> to export symbols too because the core driver accesses them post probing.
> >
> > So you should fix your core driver. This is exactly what auxbus model
> > proposes.
> >
> >> Hence we won't be able to load the core driver without both of them (or
> >> loading anything due to circular dependency).
> >> Unless we'll use dynamic symbol lookup and I don't think that's your
> >> intention.
> >
> > No it is not.
> >
> >>
> >>>>
> >>>>>>
> >>>>>>>> + dev_err(hdev->dev, "Failed to increment %s module refcount\n",
> >>>>>>>> + module_name(THIS_MODULE));
> >>>>>>>> + rc = -EIO;
> >>>>>>>> + goto module_get_err;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
> >>>>>>>> + while (1) {
> >>>>>>>> + aux_ops->hw_access_lock(aux_dev);
> >>>>>>>> +
> >>>>>>>> + /* if the device is operational, proceed to actual init while holding the lock in
> >>>>>>>> + * order to prevent concurrent hard reset
> >>>>>>>> + */
> >>>>>>>> + if (aux_ops->device_operational(aux_dev))
> >>>>>>>> + break;
> >>>>>>>> +
> >>>>>>>> + aux_ops->hw_access_unlock(aux_dev);
> >>>>>>>> +
> >>>>>>>> + if (ktime_compare(ktime_get(), timeout) > 0) {
> >>>>>>>> + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
> >>>>>>>> + rc = -EBUSY;
> >>>>>>>> + goto timeout_err;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
> >>>>>>>> +
> >>>>>>>> + msleep_interruptible(MSEC_PER_SEC);
> >>>>>>>> + }
> >>>>>>>
> >>>>>>> The code above is unexpected.
> >>>>>>>
> >>>>>>
> >>>>>> We have no control on when the user insmod the IB driver.
> >>>>>
> >>>>> It is not true, this is controlled through module dependencies
> >>>>> mechanism.
> >>>>>
> >>>>
> >>>> Yeah, if we would use EXPORT_SYMBOLs for inter driver communication but
> >>>> we don't.
> >>>
> >>> So please use it and don't add complexity where it is not needed.
> >>>
> >>>>
> >>>>>> As a result it is possible that the IB auxiliary device will be probed
> >>>>>> while the compute device is under reset (due to some HW error).
> >>>>>
> >>>>> No, it is not possible. If you structure your driver right.
> >>>>>
> >>>>
> >>>> Again, it is not possible if we would use EXPORT_SYMBOLs.
> >>>> Please let me know if we misunderstood something because AFAIU we followed
> >>>> the auxiliary bus doc usage example.
> >>>
> >>> It is better to follow actual drivers that use auxiliary bus and see how
> >>> they implemented it and not rely on examples in the documentation.
> >>>
> >>
> >> But isn't that what the doc is for? To explain the guidelines? And it's not
> >> that there is a big red note there of "this example should not be taken as
> >> is, please look at your subsystem guidelines".
> >
> > At the beginning that doc was located in Documentation/ folder and no one
> > really cared about it. After moving from Documentation/ to drivers/base/auxiliary.c,
> > it became more visible, but still no one relied on it. You are the first
> > one who read it.
> >
> > There is no subsystem rules here. Everyone relied on EXPORT_SYMBOLs and didn't
> > use ops structure. Kernel is evolving project, there is no need to find a rule
> > for everything.
> >
> > Thanks
> >
> >>
> >>> Thanks
> >>>
> >>>>
> >>>>> Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-23 15:02 ` Andrew Lunn
2024-06-24 7:21 ` Omer Shpigelman
@ 2024-06-24 9:22 ` Leon Romanovsky
2024-12-17 10:00 ` Avri Kehat
2 siblings, 0 replies; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-24 9:22 UTC (permalink / raw)
To: Andrew Lunn
Cc: Omer Shpigelman, Sunil Kovvuri Goutham,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Sun, Jun 23, 2024 at 05:02:44PM +0200, Andrew Lunn wrote:
> > > If there is no netdev, what is the point of putting it into loopback?
> > > How do you send packets which are to be looped back? How do you
> > > receive them to see if they were actually looped back?
> > >
> > > Andrew
> >
> > To run RDMA test in loopback.
>
> What is special about your RDMA? Why do you need something which other
> vendors don't? Please solve this problem for all RDMA devices, not
> yours.
I'm not aware of anything special here which requires special treatment.
All RDMA devices support loopback natively and can "put" traffic from
their TX directly to their RX. This is how we can run RDMA tests, which
are part of rdma-core https://github.com/linux-rdma/rdma-core/tree/master/tests.
Thanks
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-23 14:46 ` Andrew Lunn
@ 2024-06-26 10:13 ` Omer Shpigelman
2024-06-26 14:13 ` Andrew Lunn
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-26 10:13 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/23/24 17:46, Andrew Lunn wrote:
>>> But what about when the system is under memory pressure? You say it
>>> allocates memory. What happens if those allocations fail. Does
>>> changing the MTU take me from a working system to a dead system? It is
>>> good practice to not kill a working system under situations like
>>> memory pressure. You try to first allocate the memory you need to
>>> handle the new MTU, and only if successful do you free existing memory
>>> you no longer need. That means if you cannot allocate the needed
>>> memory, you still have the old memory, you can keep the old MTU and
>>> return -ENOMEM, and the system keeps running.
>>>
>>
>> That's a good optimization for these kind of on-the-fly configurations but
>> as you wrote before, changing an MTU value is not a hot path so out of
>> cost-benefit considerations we didn't find it mandatory to optimize this
>> flow.
>
> I would not call this an optimization. And it is not just about
> changing the MTU. ethtool set_ringparam() is also likely to run into
> this problem, and any other configuration which requires reallocating
> the rings.
>
> This is something else which comes up every few months on the list,
> and driver writers who monitor the list will write their drivers that
> way from the start, not 'optimise' it later.
>
Actually I was wrong, we don't allocate memory in this port reset flow, we
only reset the rings. But I get your point, it makes sense.
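The allocate-before-free pattern described above can be sketched in plain C. This is a userspace model under assumed names (`ring` and `ring_resize` are made up for illustration), not the actual driver code:

```c
#include <errno.h>
#include <stdlib.h>

struct ring {
	void *buf;
	size_t len;
};

/* Resize a ring without risking the old buffer: allocate the new
 * buffer first and free the old one only on success. On allocation
 * failure the ring keeps its old buffer, so the device keeps working
 * with the old MTU and we can simply return -ENOMEM to the caller.
 */
static int ring_resize(struct ring *r, size_t new_len)
{
	void *new_buf = malloc(new_len);

	if (!new_buf)
		return -ENOMEM;	/* old ring untouched, link stays up */

	free(r->buf);
	r->buf = new_buf;
	r->len = new_len;
	return 0;
}
```

The same shape applies to ethtool set_ringparam(): build the new rings first, swap the pointers, and only then free the old ones.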
>> I get your point but still it would be good if it were documented
>> somewhere IMHO.
>
> Kernel documentation is poor, agreed. But kernel policy is also
> somewhat fluid, best practices change, and any developers can
> influence that policy, different subsystems can and do have
> contradictory policy, etc. The mailing list is the best place to learn
> and to take part in this community. You need to be on the list for
> other reasons as well.
>
Ok, got it.
>> I'm familiar with this logic but I don't understand your point. The point
>> you are making is that setting this Autoneg bit in lp_advertising is
>> pointless? I see other vendors setting it too in case that autoneg was
>> completed.
>> Is that redundant also in their case? because it looks to me that in this
>> case we followed the same logic and conventions other vendors followed.
>
> Please show us the output from ethtool. Does it look like the example
> i showed? I must admit, i'm more from the embedded world and don't
> have access to high speed interfaces. But the basic concept of
> auto-neg should not change that much.
>
> Andrew
Here is the output:
$ ethtool eth0
Settings for eth0:
Supported ports: [ FIBRE Backplane ]
Supported link modes: 100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: on
There are a few points to mention:
1. We don't allow to modify the advertised link modes so by definition the
advertised ones are a copy of the supported ones.
2. Reading the peer advertised link modes is not supported so we don't
report them (similarly to some other vendors).
3. Our speed is fixed and also cannot be changed so we don't mask
lp_advertising with advertising to pick the highest speed. We aim for a
specific speed and hence it's binary - either we'll have a link at that
specific speed or we won't have a link at all.
4. If we support autoneg and it was completed, we can conclude that also
our peer supports autoneg and hence we report that.
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-26 10:13 ` Omer Shpigelman
@ 2024-06-26 14:13 ` Andrew Lunn
2024-06-30 7:11 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Andrew Lunn @ 2024-06-26 14:13 UTC (permalink / raw)
To: Omer Shpigelman
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
> Here is the output:
> $ ethtool eth0
> Settings for eth0:
> Supported ports: [ FIBRE Backplane ]
> Supported link modes: 100000baseKR4/Full
> 100000baseSR4/Full
> 100000baseCR4/Full
> 100000baseLR4_ER4/Full
> Supported pause frame use: Symmetric
> Supports auto-negotiation: Yes
> Supported FEC modes: Not reported
> Advertised link modes: 100000baseKR4/Full
> 100000baseSR4/Full
> 100000baseCR4/Full
> 100000baseLR4_ER4/Full
> Advertised pause frame use: Symmetric
> Advertised auto-negotiation: Yes
> Advertised FEC modes: Not reported
> Link partner advertised link modes: Not reported
> Link partner advertised pause frame use: No
> Link partner advertised auto-negotiation: Yes
> Link partner advertised FEC modes: Not reported
> Speed: 100000Mb/s
> Duplex: Full
> Auto-negotiation: on
>
> There are a few points to mention:
> 1. We don't allow to modify the advertised link modes so by definition the
> advertised ones are a copy of the supported ones.
So there is no way to ask it to use 100000baseCR4/Full, for
example? You would normally change the advertised modes to just that
one link mode, and then it has no choice. It either uses
100000baseCR4/Full, or it does not establish a link.
Also, my experience with slower modules is that one supporting
2500BaseX can also support 1000BaseX. However, there is no auto-neg
defined for speeds, just pause. So if the link peer only supports
1000BaseX, you don't get link. What you typically see is:
$ ethtool eth0
Settings for eth0:
Supported ports: [ FIBRE Backplane ]
Supported link modes: 1000baseX
2500baseX
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 2500baseX
Advertised pause frame use: Symmetric
and then you use ethtool to change advertising to 1000baseX and then
you get link. Can these modules support slower speeds?
> 2. Reading the peer advertised link modes is not supported so we don't
> report them (similarly to some other vendors).
Not supported by your firmware? Or not supported by the modules?
Andrew
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-19 10:52 ` Leon Romanovsky
2024-06-24 8:47 ` Omer Shpigelman
@ 2024-06-28 10:24 ` Omer Shpigelman
2024-06-30 13:29 ` Leon Romanovsky
2024-07-12 13:08 ` Jason Gunthorpe
1 sibling, 2 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-28 10:24 UTC (permalink / raw)
To: Leon Romanovsky
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/19/24 13:52, Leon Romanovsky wrote:
> On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
>> On 6/18/24 15:58, Leon Romanovsky wrote:
>>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
>>>> On 6/17/24 22:04, Leon Romanovsky wrote:
>>>>> [Some people who received this message don't often get email from leon@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>>>>>
>>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>>>>
>>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>>>>>> to the capabilities that were passed on device initialization.
>>>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>>>>
>>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>
<...>
>> mlx5 IB driver doesn't export any symbol that is used by the core driver,
>> that's why the core driver can be loaded without the IB driver (although
>> you'll get circular dependency if you would export).
>
> Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
> I'm reluctant to accept code that exports symbols from IB drivers to
> other subsystems. We have drivers/infiniband/core/ for that.
>
We need the core driver to access the IB driver (and to the ETH driver as
well). As you wrote, we can't use exported symbols from our IB driver nor
rely on function pointers, but what about providing the core driver an ops
structure? meaning exporting a register function from the core driver that
should be called by the IB driver during auxiliary device probe.
Something like:
int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
struct hbl_ib_ops *ops)
{
...
}
EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
That way, only the parent driver exports symbols to the child driver, so
the IB driver is a "user" of the core driver and we rely on the internal
module reference counter. But we also get the ability to access the IB
driver from the core driver (to report a HW error, for example).
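A minimal userspace model of this registration pattern (all names below sketch the suggestion above and are hypothetical, not the actual driver API): the core exports a single registration symbol, stores the ops at probe time, and can later call back into the IB driver.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel structure. */
struct auxiliary_device { const char *name; };

/* Ops the IB driver hands to the core at probe time, so the core can
 * call back into the IB driver (e.g. to report a HW error) without the
 * IB driver exporting any symbols itself.
 */
struct hbl_ib_ops {
	void (*hw_err_event)(struct auxiliary_device *adev, int err);
};

static struct {
	struct auxiliary_device *adev;
	const struct hbl_ib_ops *ops;
} core_state;

/* Exported by the core driver; called by the IB driver during its
 * auxiliary device probe. The symbol dependency stays unidirectional:
 * only the child (IB) depends on the parent's (core) symbols.
 */
int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
			       const struct hbl_ib_ops *ops)
{
	if (!adev || !ops || !ops->hw_err_event)
		return -EINVAL;
	core_state.adev = adev;
	core_state.ops = ops;
	return 0;
}

/* Core-side event path: forward a HW error to the IB driver if it
 * has registered, otherwise report that no consumer is present.
 */
int hbl_cn_report_hw_err(int err)
{
	if (!core_state.ops)
		return -ENODEV;
	core_state.ops->hw_err_event(core_state.adev, err);
	return 0;
}

/* Demo consumer, standing in for the IB driver side. */
static int demo_seen_err;

static void demo_hw_err_event(struct auxiliary_device *adev, int err)
{
	(void)adev;
	demo_seen_err = err;
}

static const struct hbl_ib_ops demo_ops = {
	.hw_err_event = demo_hw_err_event,
};
```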
>> If relying on exported symbols only, then our IB and ETH drivers will need
>> to export symbols too because the core driver accesses them post probing.
>
> So you should fix your core driver. This is exactly what auxbus model
> proposes.
>
>> Hence we won't be able to load the core driver without both of them (or
>> loading anything due to circular dependency).
>> Unless we'll use dynamic symbol lookup and I don't think that's your
>> intention.
>
> No it is not.
>
<...>
* Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver
2024-06-26 14:13 ` Andrew Lunn
@ 2024-06-30 7:11 ` Omer Shpigelman
0 siblings, 0 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-06-30 7:11 UTC (permalink / raw)
To: Andrew Lunn
Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 6/26/24 17:13, Andrew Lunn wrote:
>> Here is the output:
>> $ ethtool eth0
>> Settings for eth0:
>> Supported ports: [ FIBRE Backplane ]
>> Supported link modes: 100000baseKR4/Full
>> 100000baseSR4/Full
>> 100000baseCR4/Full
>> 100000baseLR4_ER4/Full
>> Supported pause frame use: Symmetric
>> Supports auto-negotiation: Yes
>> Supported FEC modes: Not reported
>> Advertised link modes: 100000baseKR4/Full
>> 100000baseSR4/Full
>> 100000baseCR4/Full
>> 100000baseLR4_ER4/Full
>> Advertised pause frame use: Symmetric
>> Advertised auto-negotiation: Yes
>> Advertised FEC modes: Not reported
>> Link partner advertised link modes: Not reported
>> Link partner advertised pause frame use: No
>> Link partner advertised auto-negotiation: Yes
>> Link partner advertised FEC modes: Not reported
>> Speed: 100000Mb/s
>> Duplex: Full
>> Auto-negotiation: on
>>
>> There are a few points to mention:
>> 1. We don't allow to modify the advertised link modes so by definition the
>> advertised ones are a copy of the supported ones.
>
> So there is no way to ask it use to use 100000baseCR4/Full, for
> example? You would normally change the advertised modes to just that
> one link mode, and then it has no choice. It either uses
> 100000baseCR4/Full, or it does not establish a link.
>
No, our FW doesn't support it as we have no use case for that.
> Also, my experience with slower modules is that one supporting
> 2500BaseX can also support 1000BaseX. However, there is no auto-neg
> defined for speeds, just pause. So if the link peer only supports
> 1000BaseX, you don't get link. What you typically see is:
>
> $ ethtool eth0
> Settings for eth0:
> Supported ports: [ FIBRE Backplane ]
> Supported link modes: 1000baseX
> 2500baseX
> Supported pause frame use: Symmetric
> Supports auto-negotiation: Yes
> Supported FEC modes: Not reported
> Advertised link modes: 2500baseX
> Advertised pause frame use: Symmetric
>
> and then you use ethtool to change advertising to 1000baseX and then
> you get link. Can these modules support slower speeds?
>
No, we support a single speed.
>> 2. Reading the peer advertised link modes is not supported so we don't
>> report them (similarly to some other vendors).
>
> Not supported by your firmware? Or not supported by the modules?
>
Let me explain it better - Gaudi2 is not a general purpose Ethernet NIC.
Its goal is to support any Ethernet traffic that is needed for enabling
the scaling of AI neural networks training as part of HLS2 server:
https://www.intel.com/content/www/us/en/content-details/784778/hls-gaudi-2-deep-learning-server-datasheet.html
Hence, in contrast to a general purpose Ethernet NIC, it is well known who
our peer is and what its capabilities are - it is a Gaudi2 NIC or a
switch.
Technically we can read the advertised link partner modes but we had no
demand for that because the driver and the user are well aware of who is
on the other side.
Reading it from the FW will be the same as having it hard coded because
the value is already known (otherwise we won't have a link). I can add it
to lp_advertising if necessary although per my check most vendors don't
report it either.
> Andrew
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-28 10:24 ` Omer Shpigelman
@ 2024-06-30 13:29 ` Leon Romanovsky
2024-07-01 10:46 ` Omer Shpigelman
2024-07-12 13:08 ` Jason Gunthorpe
1 sibling, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-06-30 13:29 UTC (permalink / raw)
To: Omer Shpigelman
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> On 6/19/24 13:52, Leon Romanovsky wrote:
> > On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
> >> On 6/18/24 15:58, Leon Romanovsky wrote:
> >>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
> >>>> On 6/17/24 22:04, Leon Romanovsky wrote:
> >>>>>
> >>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> >>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
> >>>>>>>
> >>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >>>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> >>>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
> >>>>>>>> to the capabilities that were passed on device initialization.
> >>>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
> >>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>
>
> <...>
>
> >> mlx5 IB driver doesn't export any symbol that is used by the core driver,
> >> that's why the core driver can be loaded without the IB driver (although
> >> you'll get circular dependency if you would export).
> >
> > Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
> > I'm reluctant to accept code that exports symbols from IB drivers to
> > other subsystems. We have drivers/infiniband/core/ for that.
> >
>
> We need the core driver to access the IB driver (and to the ETH driver as
> well). As you wrote, we can't use exported symbols from our IB driver nor
> rely on function pointers, but what about providing the core driver an ops
> structure? meaning exporting a register function from the core driver that
> should be called by the IB driver during auxiliary device probe.
> Something like:
>
> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> struct hbl_ib_ops *ops)
> {
> ...
> }
> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>
> That way, only the parent driver exports symbols to the child driver, so
> the IB driver is a "user" of the core driver and we rely on the internal
> module reference counter. But we also get the ability to access the IB
> driver from the core driver (to report a HW error, for example).
Before you are talking about solutions, please explain in technical
terms why you absolutely need to access IB from core driver and any
other possible way is not possible.
Thanks
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-30 13:29 ` Leon Romanovsky
@ 2024-07-01 10:46 ` Omer Shpigelman
2024-07-01 12:46 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-07-01 10:46 UTC (permalink / raw)
To: Leon Romanovsky
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On 6/30/24 16:29, Leon Romanovsky wrote:
> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>> On 6/19/24 13:52, Leon Romanovsky wrote:
>>> On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
>>>> On 6/18/24 15:58, Leon Romanovsky wrote:
>>>>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
>>>>>> On 6/17/24 22:04, Leon Romanovsky wrote:
>>>>>>>
>>>>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
>>>>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>>>>>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>>>>>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
>>>>>>>>>> to the capabilities that were passed on device initialization.
>>>>>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
>>>>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
>>>>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
>>>>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
>>>>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
>>>>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
>>>>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
>>>>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
>>>>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
>>>>>>>>>
>>
>> <...>
>>
>>>> mlx5 IB driver doesn't export any symbol that is used by the core driver,
>>>> that's why the core driver can be loaded without the IB driver (although
>>>> you'll get circular dependency if you would export).
>>>
>>> Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
>>> I'm reluctant to accept code that exports symbols from IB drivers to
>>> other subsystems. We have drivers/infiniband/core/ for that.
>>>
>>
>> We need the core driver to access the IB driver (and to the ETH driver as
>> well). As you wrote, we can't use exported symbols from our IB driver nor
>> rely on function pointers, but what about providing the core driver an ops
>> structure? meaning exporting a register function from the core driver that
>> should be called by the IB driver during auxiliary device probe.
>> Something like:
>>
>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>> struct hbl_ib_ops *ops)
>> {
>> ...
>> }
>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>>
>> That way, only the parent driver exports symbols to the child driver, so
>> the IB driver is a "user" of the core driver and we rely on the internal
>> module reference counter. But we also get the ability to access the IB
>> driver from the core driver (to report a HW error, for example).
>
> Before you are talking about solutions, please explain in technical
> terms why you absolutely need to access IB from core driver and any
> other possible way is not possible.
>
> Thanks
First of all, as a general assumption, everything we do today can also be
done with unidirectional inter-driver communication only. If the parent
driver cannot access the child driver directly, then we can have a blocking
command queue on the parent side: the parent driver will push commands to
it, and the child driver will fetch them, execute each command and unblock
the parent.
That will work but it adds complexity which I'm not sure is needed.
The second point is not necessarily about the direction of the
communication but more about generally using function pointers rather than
exported symbols - we have two flavors of functions for inter-driver
communication: common functions and ASIC specific functions. The ASIC
specific functions are exposed and initialized per ASIC. If we convert
them to EXPORT_SYMBOLs then we expose ASIC specific functions regardless
of the ASIC in action.
Again, that will work but seems unnecessary. We can check the ASIC type
that was passed in each exported function and fail if a wrong ASIC type
was used, but it seems to me like an incorrect approach to use exported
symbols for ASIC specific communication. EXPORT_SYMBOLs were meant to be
used for driver level communication, not for utilizing device specific
capabilities. For that, an ops struct seems more appropriate.
That's why I'm suggesting combining both exported symbols and function
pointers.
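A blocking command queue of the kind described could look roughly like this userspace pthread model (hypothetical names; in the kernel it would more likely be built on a workqueue or kthread plus completions):

```c
#include <pthread.h>

/* Single-slot blocking command queue: the parent pushes a command and
 * blocks until the child has fetched and executed it, keeping all
 * symbol dependencies unidirectional (child -> parent).
 */
struct cmd_queue {
	pthread_mutex_t lock;
	pthread_cond_t posted;	/* signalled when a command is pushed */
	pthread_cond_t done;	/* signalled when the child finished */
	int cmd;
	int has_cmd;
	int result;
};

#define CMD_QUEUE_INIT { \
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
	PTHREAD_COND_INITIALIZER, 0, 0, 0 }

/* Parent side: push a command and wait for the child to complete it. */
int cmdq_exec(struct cmd_queue *q, int cmd)
{
	int res;

	pthread_mutex_lock(&q->lock);
	q->cmd = cmd;
	q->has_cmd = 1;
	pthread_cond_signal(&q->posted);
	while (q->has_cmd)
		pthread_cond_wait(&q->done, &q->lock);
	res = q->result;
	pthread_mutex_unlock(&q->lock);
	return res;
}

/* Child side: fetch one command, run the handler, unblock the parent. */
void cmdq_serve_one(struct cmd_queue *q, int (*handler)(int cmd))
{
	pthread_mutex_lock(&q->lock);
	while (!q->has_cmd)
		pthread_cond_wait(&q->posted, &q->lock);
	q->result = handler(q->cmd);
	q->has_cmd = 0;
	pthread_cond_signal(&q->done);
	pthread_mutex_unlock(&q->lock);
}

/* Demo handler and thread wrapper, standing in for the child driver. */
static int demo_double(int cmd)
{
	return cmd * 2;
}

static void *demo_child(void *arg)
{
	cmdq_serve_one((struct cmd_queue *)arg, demo_double);
	return NULL;
}
```

The predicate loops around pthread_cond_wait() make the ordering safe regardless of which side reaches the queue first, which is the extra complexity the text above refers to.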
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-01 10:46 ` Omer Shpigelman
@ 2024-07-01 12:46 ` Leon Romanovsky
0 siblings, 0 replies; 107+ messages in thread
From: Leon Romanovsky @ 2024-07-01 12:46 UTC (permalink / raw)
To: Omer Shpigelman
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
dri-devel@lists.freedesktop.org, ogabbay@kernel.org,
Zvika Yehudai
On Mon, Jul 01, 2024 at 10:46:48AM +0000, Omer Shpigelman wrote:
> On 6/30/24 16:29, Leon Romanovsky wrote:
> > On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> >> On 6/19/24 13:52, Leon Romanovsky wrote:
> >>> On Wed, Jun 19, 2024 at 09:27:54AM +0000, Omer Shpigelman wrote:
> >>>> On 6/18/24 15:58, Leon Romanovsky wrote:
> >>>>> On Tue, Jun 18, 2024 at 11:08:34AM +0000, Omer Shpigelman wrote:
> >>>>>> On 6/17/24 22:04, Leon Romanovsky wrote:
> >>>>>>> On Mon, Jun 17, 2024 at 05:43:49PM +0000, Omer Shpigelman wrote:
> >>>>>>>> On 6/13/24 22:18, Leon Romanovsky wrote:
> >>>>>>>>> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
> >>>>>>>>>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
> >>>>>>>>>> The driver itself is agnostic to the ASIC in action, it operates according
> >>>>>>>>>> to the capabilities that were passed on device initialization.
> >>>>>>>>>> The device is initialized by the hbl_cn driver via auxiliary bus.
> >>>>>>>>>> The driver also supports QP resource tracking and port/device HW counters.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >>>>>>>>>> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>>>> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >>>>>>>>>> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>>>> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >>>>>>>>>> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>>>> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >>>>>>>>>> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>>>> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >>>>>>>>>> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>>>> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >>>>>>>>>> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>>>> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >>>>>>>>>
> >>
> >> <...>
> >>
> >>>> mlx5 IB driver doesn't export any symbol that is used by the core driver,
> >>>> that's why the core driver can be loaded without the IB driver (althought
> >>>> you'll get circular dependency if you would export).
> >>>
> >>> Yes, IB and ETH drivers are "users" of core driver. As RDMA maintainer,
> >>> I'm reluctant to accept code that exports symbols from IB drivers to
> >>> other subsystems. We have drivers/infiniband/core/ for that.
> >>>
> >>
> >> We need the core driver to access the IB driver (and to the ETH driver as
> >> well). As you wrote, we can't use exported symbols from our IB driver nor
> >> rely on function pointers, but what about providing the core driver an ops
> >> structure? meaning exporting a register function from the core driver that
> >> should be called by the IB driver during auxiliary device probe.
> >> Something like:
> >>
> >> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> >> struct hbl_ib_ops *ops)
> >> {
> >> ...
> >> }
> >> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
> >>
> >> That's how only the parent driver exports symbols to the son driver so the
> >> IB driver is a "user" of the core driver and so we count on the internal
> >> module reference counter. But we also get the ability to access the IB
> >> driver from the core driver (to report a HW error for example).
> >
> > Before you are talking about solutions, please explain in technical
> > terms why you absolutely need to access IB from core driver and any
> > other possible way is not possible.
> >
> > Thanks
>
> First of all, as a general assumption, everything we do today can also be
> done with unidirectional drivers communication only. If the parent driver
> cannot access the son driver directly, then we can have a blocking command
> queue on the parent side that the parent driver will push to it and the
> son driver will fetch from it, execute the command and unblock the parent.
> That will work but it adds complexity which I'm not sure that is needed.
> The second point is not necessarily about the direction of the
> communication but more about generally using function pointers rather than
> exported symbols - we have 2 flavors of functions for inter driver
> communications: common functions and ASIC specific functions. The ASIC
> specific functions are exposed and initialized per ASIC. If we convert
> them to EXPORT_SYMBOLs then we expose ASIC specific functions regardless
> of the ASIC in action.
> Again, that will work but seems unnecessary. We can check the ASIC type
> that was passed in each exported function and fail if a wrong ASIC type
> was used, but it seems to me like an incorrect approach to use exported
> symbols for ASIC specific communication. EXPORT_SYMBOLs were meant to be
> used for driver level communication, not for utilizing device specific
> capabilities. For that, an ops struct seems more appropriate.
> That's why I'm suggesting to combine both exported symbols and function
> pointers.
Thanks for the explanation. I understand your concerns, but I don't see
any technical justification for the need to access the IB driver from the
core.
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-06-28 10:24 ` Omer Shpigelman
2024-06-30 13:29 ` Leon Romanovsky
@ 2024-07-12 13:08 ` Jason Gunthorpe
2024-07-14 10:18 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Jason Gunthorpe @ 2024-07-12 13:08 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Leon Romanovsky, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> We need the core driver to access the IB driver (and to the ETH driver as
> well). As you wrote, we can't use exported symbols from our IB driver nor
> rely on function pointers, but what about providing the core driver an ops
> structure? meaning exporting a register function from the core driver that
> should be called by the IB driver during auxiliary device probe.
> Something like:
>
> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> struct hbl_ib_ops *ops)
> {
> ...
> }
> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
Definitely do not do some kind of double-register like this.
The auxiliary_device scheme can already be extended to provide ops for
each sub device.
Like
struct habana_driver {
struct auxiliary_driver base;
const struct habana_ops *ops;
};
If the ops are justified or not is a different question.
> module reference counter. But we also get the ability to access the IB
> driver from the core driver (to report a HW error for example).
Reporting a HW error seems reasonable to me.
Other drivers have used notifier chains for this kind of thing, though.
Jason
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-12 13:08 ` Jason Gunthorpe
@ 2024-07-14 10:18 ` Omer Shpigelman
2024-07-16 13:40 ` Jason Gunthorpe
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-07-14 10:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 7/12/24 16:08, Jason Gunthorpe wrote:
> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>
>> We need the core driver to access the IB driver (and to the ETH driver as
>> well). As you wrote, we can't use exported symbols from our IB driver nor
>> rely on function pointers, but what about providing the core driver an ops
>> structure? meaning exporting a register function from the core driver that
>> should be called by the IB driver during auxiliary device probe.
>> Something like:
>>
>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>> struct hbl_ib_ops *ops)
>> {
>> ...
>> }
>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>
> Definately do not do some kind of double-register like this.
>
> The auxiliary_device scheme can already be extended to provide ops for
> each sub device.
>
> Like
>
> struct habana_driver {
> struct auxiliary_driver base;
> const struct habana_ops *ops;
> };
>
> If the ops are justified or not is a different question.
>
Well, I suggested this double-register option because I got a comment that
the design pattern of an embedded ops structure shouldn't be used.
So I'm confused now...
>> module reference counter. But we also get the ability to access the IB
>> driver from the core driver (to report a HW error for example).
>
> Report a HW error seems reasonable to me
>
> Other driver have used notifier chains for this kind of stuff though.
>
> Jason
I'll look into the option of using notifier chains in this case, although
as I see it, notifier chains are more suitable for broadcast updates
where the updater is not necessarily aware of the identity or the number
of the subscribers. It looks like overkill for our error reporting case,
which is simpler.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-14 10:18 ` Omer Shpigelman
@ 2024-07-16 13:40 ` Jason Gunthorpe
2024-07-17 7:08 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Jason Gunthorpe @ 2024-07-16 13:40 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Leon Romanovsky, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
> On 7/12/24 16:08, Jason Gunthorpe wrote:
> > On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> >
> >> We need the core driver to access the IB driver (and to the ETH driver as
> >> well). As you wrote, we can't use exported symbols from our IB driver nor
> >> rely on function pointers, but what about providing the core driver an ops
> >> structure? meaning exporting a register function from the core driver that
> >> should be called by the IB driver during auxiliary device probe.
> >> Something like:
> >>
> >> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> >> struct hbl_ib_ops *ops)
> >> {
> >> ...
> >> }
> >> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
> >
> > Definately do not do some kind of double-register like this.
> >
> > The auxiliary_device scheme can already be extended to provide ops for
> > each sub device.
> >
> > Like
> >
> > struct habana_driver {
> > struct auxiliary_driver base;
> > const struct habana_ops *ops;
> > };
> >
> > If the ops are justified or not is a different question.
> >
>
> Well, I suggested this double-register option because I got a comment that
> the design pattern of embedded ops structure shouldn't be used.
> So I'm confused now...
Yeah, don't stick ops in random places, but the device_driver is the
right place.
> I'll look into the option of using notifier chains in this case, although
> as I saw it, the notifier chains are more suitable for broadcast updates
> where the updater is not necessarily aware of the identity nor the number
> of the subscribers.
Yes, that is right.
Jason
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-16 13:40 ` Jason Gunthorpe
@ 2024-07-17 7:08 ` Omer Shpigelman
2024-07-17 7:36 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-07-17 7:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 7/16/24 16:40, Jason Gunthorpe wrote:
> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
>> On 7/12/24 16:08, Jason Gunthorpe wrote:
>>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>>>
>>>> We need the core driver to access the IB driver (and to the ETH driver as
>>>> well). As you wrote, we can't use exported symbols from our IB driver nor
>>>> rely on function pointers, but what about providing the core driver an ops
>>>> structure? meaning exporting a register function from the core driver that
>>>> should be called by the IB driver during auxiliary device probe.
>>>> Something like:
>>>>
>>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>>>> struct hbl_ib_ops *ops)
>>>> {
>>>> ...
>>>> }
>>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>>>
>>> Definately do not do some kind of double-register like this.
>>>
>>> The auxiliary_device scheme can already be extended to provide ops for
>>> each sub device.
>>>
>>> Like
>>>
>>> struct habana_driver {
>>> struct auxiliary_driver base;
>>> const struct habana_ops *ops;
>>> };
>>>
>>> If the ops are justified or not is a different question.
>>>
>>
>> Well, I suggested this double-register option because I got a comment that
>> the design pattern of embedded ops structure shouldn't be used.
>> So I'm confused now...
>
> Yeah, don't stick ops in random places, but the device_driver is the
> right place.
>
Sorry, let me explain again. My original code has an ops structure
exactly like you are suggesting now (see struct hbl_aux_dev in the first
patch of the series). But I was instructed not to use this ops structure
and to rely on exported symbols for inter-driver communication.
I'll be happy to use this ops structure like in your example rather than
converting my code to use exported symbols.
Leon - am I missing anything? what's the verdict here?
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-17 7:08 ` Omer Shpigelman
@ 2024-07-17 7:36 ` Leon Romanovsky
2024-07-17 10:51 ` Omer Shpigelman
0 siblings, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-07-17 7:36 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Jason Gunthorpe, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
> On 7/16/24 16:40, Jason Gunthorpe wrote:
> > On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
> >> On 7/12/24 16:08, Jason Gunthorpe wrote:
> >>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> >>>
> >>>> We need the core driver to access the IB driver (and to the ETH driver as
> >>>> well). As you wrote, we can't use exported symbols from our IB driver nor
> >>>> rely on function pointers, but what about providing the core driver an ops
> >>>> structure? meaning exporting a register function from the core driver that
> >>>> should be called by the IB driver during auxiliary device probe.
> >>>> Something like:
> >>>>
> >>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> >>>> struct hbl_ib_ops *ops)
> >>>> {
> >>>> ...
> >>>> }
> >>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
> >>>
> >>> Definately do not do some kind of double-register like this.
> >>>
> >>> The auxiliary_device scheme can already be extended to provide ops for
> >>> each sub device.
> >>>
> >>> Like
> >>>
> >>> struct habana_driver {
> >>> struct auxiliary_driver base;
> >>> const struct habana_ops *ops;
> >>> };
> >>>
> >>> If the ops are justified or not is a different question.
> >>>
> >>
> >> Well, I suggested this double-register option because I got a comment that
> >> the design pattern of embedded ops structure shouldn't be used.
> >> So I'm confused now...
> >
> > Yeah, don't stick ops in random places, but the device_driver is the
> > right place.
> >
>
> Sorry, let me explain again. My original code has an ops structure
> exactly like you are suggesting now (see struct hbl_aux_dev in the first
> patch of the series). But I was instructed not to use this ops structure
> and to rely on exported symbols for inter-driver communication.
> I'll be happy to use this ops structure like in your example rather than
> converting my code to use exported symbols.
> Leon - am I missing anything? what's the verdict here?
You are missing the main sentence from Jason's response: "don't stick ops in random places".
It is fine to have ops in the device driver, so the core driver can call them. However, in your
original code, you added ops everywhere. That caused the need to implement module reference
counting and crazy stuff like calls to lock and unlock functions from the aux driver to the core.
The verdict is still the same. The core driver should provide EXPORT_SYMBOLs, so the aux driver can call
them directly and benefit from proper module loading and unloading.
The aux driver can have ops in the device driver, so the core driver can call them to perform something
specific for that aux driver.
Calls between aux drivers should be done via the core driver.
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-17 7:36 ` Leon Romanovsky
@ 2024-07-17 10:51 ` Omer Shpigelman
2024-07-17 11:56 ` Jason Gunthorpe
2024-07-17 12:33 ` Leon Romanovsky
0 siblings, 2 replies; 107+ messages in thread
From: Omer Shpigelman @ 2024-07-17 10:51 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 7/17/24 10:36, Leon Romanovsky wrote:
> On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
>> On 7/16/24 16:40, Jason Gunthorpe wrote:
>>> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
>>>> On 7/12/24 16:08, Jason Gunthorpe wrote:
>>>>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>>>>>
>>>>>> We need the core driver to access the IB driver (and to the ETH driver as
>>>>>> well). As you wrote, we can't use exported symbols from our IB driver nor
>>>>>> rely on function pointers, but what about providing the core driver an ops
>>>>>> structure? meaning exporting a register function from the core driver that
>>>>>> should be called by the IB driver during auxiliary device probe.
>>>>>> Something like:
>>>>>>
>>>>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>>>>>> struct hbl_ib_ops *ops)
>>>>>> {
>>>>>> ...
>>>>>> }
>>>>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>>>>>
>>>>> Definately do not do some kind of double-register like this.
>>>>>
>>>>> The auxiliary_device scheme can already be extended to provide ops for
>>>>> each sub device.
>>>>>
>>>>> Like
>>>>>
>>>>> struct habana_driver {
>>>>> struct auxiliary_driver base;
>>>>> const struct habana_ops *ops;
>>>>> };
>>>>>
>>>>> If the ops are justified or not is a different question.
>>>>>
>>>>
>>>> Well, I suggested this double-register option because I got a comment that
>>>> the design pattern of embedded ops structure shouldn't be used.
>>>> So I'm confused now...
>>>
>>> Yeah, don't stick ops in random places, but the device_driver is the
>>> right place.
>>>
>>
>> Sorry, let me explain again. My original code has an ops structure
>> exactly like you are suggesting now (see struct hbl_aux_dev in the first
>> patch of the series). But I was instructed not to use this ops structure
>> and to rely on exported symbols for inter-driver communication.
>> I'll be happy to use this ops structure like in your example rather than
>> converting my code to use exported symbols.
>> Leon - am I missing anything? what's the verdict here?
>
> You are missing the main sentence from Jason's response: "don't stick ops in random places".
>
> It is fine to have ops in device driver, so the core driver can call them. However, in your
> original code, you added ops everywhere. It caused to the need to implement module reference
> counting and crazy stuff like calls to lock and unlock functions from the aux driver to the core.
>
> Verdict is still the same. Core driver should provide EXPORT_SYMBOLs, so the aux driver can call
> them directly and enjoy from proper module loading and unloading.
>
> The aux driver can have ops in the device driver, so the core driver can call them to perform something
> specific for that aux driver.
>
> Calls between aux drivers should be done via the core driver.
>
> Thanks
The only place we have an ops structure is in the device driver,
similarly to Jason's example. In our code it is struct hbl_aux_dev. What
other random places did you see?
We have several auxiliary devices so we have several instances of this
structure but the definition is in a single place.
The module reference counting is unrelated to the ops structure - we used
it to prevent the son driver's removal while the parent driver might still
access it. Even with exported symbols we would use it. Anyway, in v2 we'd
like to allow removing the son driver before the parent, so this module
reference counting will be removed.
The lock/unlock functions are also unrelated to the ops structure, we would
add these even with exported symbols. The reason is that our NIC drivers
are the sons/grandsons of a compute device which can enter a reset flow as
part of a TDR mechanism. During this flow we must not access the HW so we
need to block a parallel son device probing.
In addition, we don't have any direct communication between the aux
drivers, everything is done via the parent driver.
Given all of the above, what is the problem with our current code? We did
exactly what Jason wrote in his example - having an ops structure in the
device driver to allow inter-driver communication.
The only issue I see here is the question of whether this ops structure is
for unidirectional communication (meaning parent to son only) or for
bidirectional communication between the drivers (meaning also son to
parent). That's the only point that was not mentioned by Jason, while you
are clear about the answer.
AFAIU, EXPORT_SYMBOLs should be used to expose driver-level operations, not
operations which are device specific (and that's our case). Hence we used
this ops structure also for son-to-parent communication, although we can
switch to exported symbols if we have to.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-17 10:51 ` Omer Shpigelman
@ 2024-07-17 11:56 ` Jason Gunthorpe
2024-07-17 12:33 ` Leon Romanovsky
1 sibling, 0 replies; 107+ messages in thread
From: Jason Gunthorpe @ 2024-07-17 11:56 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Leon Romanovsky, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Wed, Jul 17, 2024 at 10:51:03AM +0000, Omer Shpigelman wrote:
> The only place we have an ops structure is in the device driver,
> similarly to Jason's example. In our code it is struct
> hbl_aux_dev. What
No, hbl_aux_dev is a 'struct auxiliary_device', not a 'struct
device_driver'; they are different. I did literally mean struct
device_driver.
Jason
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-17 10:51 ` Omer Shpigelman
2024-07-17 11:56 ` Jason Gunthorpe
@ 2024-07-17 12:33 ` Leon Romanovsky
2024-07-18 6:54 ` Omer Shpigelman
1 sibling, 1 reply; 107+ messages in thread
From: Leon Romanovsky @ 2024-07-17 12:33 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Jason Gunthorpe, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Wed, Jul 17, 2024 at 10:51:03AM +0000, Omer Shpigelman wrote:
> On 7/17/24 10:36, Leon Romanovsky wrote:
> > On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
> >> On 7/16/24 16:40, Jason Gunthorpe wrote:
> >>> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
> >>>> On 7/12/24 16:08, Jason Gunthorpe wrote:
> >>>>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
> >>>>>
> >>>>>> We need the core driver to access the IB driver (and to the ETH driver as
> >>>>>> well). As you wrote, we can't use exported symbols from our IB driver nor
> >>>>>> rely on function pointers, but what about providing the core driver an ops
> >>>>>> structure? meaning exporting a register function from the core driver that
> >>>>>> should be called by the IB driver during auxiliary device probe.
> >>>>>> Something like:
> >>>>>>
> >>>>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
> >>>>>> struct hbl_ib_ops *ops)
> >>>>>> {
> >>>>>> ...
> >>>>>> }
> >>>>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
> >>>>>
> >>>>> Definately do not do some kind of double-register like this.
> >>>>>
> >>>>> The auxiliary_device scheme can already be extended to provide ops for
> >>>>> each sub device.
> >>>>>
> >>>>> Like
> >>>>>
> >>>>> struct habana_driver {
> >>>>> struct auxiliary_driver base;
> >>>>> const struct habana_ops *ops;
> >>>>> };
> >>>>>
> >>>>> If the ops are justified or not is a different question.
> >>>>>
> >>>>
> >>>> Well, I suggested this double-register option because I got a comment that
> >>>> the design pattern of embedded ops structure shouldn't be used.
> >>>> So I'm confused now...
> >>>
> >>> Yeah, don't stick ops in random places, but the device_driver is the
> >>> right place.
> >>>
> >>
> >> Sorry, let me explain again. My original code has an ops structure
> >> exactly like you are suggesting now (see struct hbl_aux_dev in the first
> >> patch of the series). But I was instructed not to use this ops structure
> >> and to rely on exported symbols for inter-driver communication.
> >> I'll be happy to use this ops structure like in your example rather than
> >> converting my code to use exported symbols.
> >> Leon - am I missing anything? what's the verdict here?
> >
> > You are missing the main sentence from Jason's response: "don't stick ops in random places".
> >
> > It is fine to have ops in device driver, so the core driver can call them. However, in your
> > original code, you added ops everywhere. It caused to the need to implement module reference
> > counting and crazy stuff like calls to lock and unlock functions from the aux driver to the core.
> >
> > Verdict is still the same. Core driver should provide EXPORT_SYMBOLs, so the aux driver can call
> > them directly and enjoy from proper module loading and unloading.
> >
> > The aux driver can have ops in the device driver, so the core driver can call them to perform something
> > specific for that aux driver.
> >
> > Calls between aux drivers should be done via the core driver.
> >
> > Thanks
>
> The only place we have an ops structure is in the device driver,
> similarly to Jason's example. In our code it is struct hbl_aux_dev. What
> other random places did you see?
This is exactly a random place.
I suggest you take the time to learn how existing drivers in netdev and
RDMA use the auxbus infrastructure, and follow the same pattern. There are
many examples already in the kernel.
And no, if you do everything right, you won't need custom module
reference counting and other hacks. There is nothing special in your
device/driver which requires special treatment.
Thanks
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-17 12:33 ` Leon Romanovsky
@ 2024-07-18 6:54 ` Omer Shpigelman
2024-07-18 8:31 ` Leon Romanovsky
0 siblings, 1 reply; 107+ messages in thread
From: Omer Shpigelman @ 2024-07-18 6:54 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On 7/17/24 15:33, Leon Romanovsky wrote:
> On Wed, Jul 17, 2024 at 10:51:03AM +0000, Omer Shpigelman wrote:
>> On 7/17/24 10:36, Leon Romanovsky wrote:
>>> On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
>>>> On 7/16/24 16:40, Jason Gunthorpe wrote:
>>>>> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
>>>>>> On 7/12/24 16:08, Jason Gunthorpe wrote:
>>>>>>> On Fri, Jun 28, 2024 at 10:24:32AM +0000, Omer Shpigelman wrote:
>>>>>>>
>>>>>>>> We need the core driver to access the IB driver (and to the ETH driver as
>>>>>>>> well). As you wrote, we can't use exported symbols from our IB driver nor
>>>>>>>> rely on function pointers, but what about providing the core driver an ops
>>>>>>>> structure? meaning exporting a register function from the core driver that
>>>>>>>> should be called by the IB driver during auxiliary device probe.
>>>>>>>> Something like:
>>>>>>>>
>>>>>>>> int hbl_cn_register_ib_aux_dev(struct auxiliary_device *adev,
>>>>>>>> struct hbl_ib_ops *ops)
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(hbl_cn_register_ib_aux_dev);
>>>>>>>
>>>>>>> Definately do not do some kind of double-register like this.
>>>>>>>
>>>>>>> The auxiliary_device scheme can already be extended to provide ops for
>>>>>>> each sub device.
>>>>>>>
>>>>>>> Like
>>>>>>>
>>>>>>> struct habana_driver {
>>>>>>> struct auxiliary_driver base;
>>>>>>> const struct habana_ops *ops;
>>>>>>> };
>>>>>>>
>>>>>>> If the ops are justified or not is a different question.
>>>>>>>
>>>>>>
>>>>>> Well, I suggested this double-register option because I got a comment that
>>>>>> the design pattern of embedded ops structure shouldn't be used.
>>>>>> So I'm confused now...
>>>>>
>>>>> Yeah, don't stick ops in random places, but the device_driver is the
>>>>> right place.
>>>>>
>>>>
>>>> Sorry, let me explain again. My original code has an ops structure
>>>> exactly like you are suggesting now (see struct hbl_aux_dev in the first
>>>> patch of the series). But I was instructed not to use this ops structure
>>>> and to rely on exported symbols for inter-driver communication.
>>>> I'll be happy to use this ops structure like in your example rather than
>>>> converting my code to use exported symbols.
>>>> Leon - am I missing anything? what's the verdict here?
>>>
>>> You are missing the main sentence from Jason's response: "don't stick ops in random places".
>>>
>>> It is fine to have ops in the device driver, so the core driver can call them. However, in your
>>> original code, you added ops everywhere. That caused the need to implement module reference
>>> counting and crazy stuff like calls to lock and unlock functions from the aux driver to the core.
>>>
>>> The verdict is still the same. The core driver should provide EXPORT_SYMBOLs, so the aux driver can call
>>> them directly and enjoy proper module loading and unloading.
>>>
>>> The aux driver can have ops in the device driver, so the core driver can call them to perform something
>>> specific for that aux driver.
>>>
>>> Calls between aux drivers should be done via the core driver.
>>>
>>> Thanks
>>
>> The only place we have an ops structure is in the device driver,
>> similarly to Jason's example. In our code it is struct hbl_aux_dev. What
>> other random places did you see?
>
> This is exactly a random place.
>
> I suggest you take the time to learn how existing drivers in netdev and
> RDMA use the auxbus infrastructure and follow the same pattern. There are
> many examples already in the kernel.
>
> And no, if you do everything right, you won't need custom module
> reference counting and other hacks. There is nothing special in your
> device/driver which requires special treatment.
>
> Thanks
How come it is a random place?
Look at irdma/i40e - they have an ops struct (struct i40e_ops) embedded
in their shared aux struct (struct i40e_auxiliary_device) which is the
wrapper of the base aux struct (struct auxiliary_device).
This is very similar to what we have - a pointer to an ops struct
(void *aux_ops) embedded in our shared aux struct (struct hbl_aux_dev)
which is the wrapper of the base struct (struct auxiliary_device).
The only difference is that they put their ops struct inside some info
struct (struct i40e_info) and we have a separate pointer for that info
(void *aux_data).
In addition, they use the ops struct for accessing the net driver from the
RDMA driver, meaning son-to-parent communication instead of having an
exported symbol e.g. i40e_client_request_reset().
They have only a single pair of EXPORT_SYMBOL functions, used for
(un)registering the son - i40e_client_device_register() and
i40e_client_device_unregister().
So what is the problem with how we implemented it?
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver
2024-07-18 6:54 ` Omer Shpigelman
@ 2024-07-18 8:31 ` Leon Romanovsky
0 siblings, 0 replies; 107+ messages in thread
From: Leon Romanovsky @ 2024-07-18 8:31 UTC (permalink / raw)
To: Omer Shpigelman
Cc: Jason Gunthorpe, gregkh@linuxfoundation.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
ogabbay@kernel.org, Zvika Yehudai
On Thu, Jul 18, 2024 at 06:54:17AM +0000, Omer Shpigelman wrote:
> On 7/17/24 15:33, Leon Romanovsky wrote:
> > On Wed, Jul 17, 2024 at 10:51:03AM +0000, Omer Shpigelman wrote:
> >> On 7/17/24 10:36, Leon Romanovsky wrote:
> >>> On Wed, Jul 17, 2024 at 07:08:59AM +0000, Omer Shpigelman wrote:
> >>>> On 7/16/24 16:40, Jason Gunthorpe wrote:
> >>>>> On Sun, Jul 14, 2024 at 10:18:12AM +0000, Omer Shpigelman wrote:
> >>>>>> On 7/12/24 16:08, Jason Gunthorpe wrote:
<...>
> So what is the problem with how we implemented it?
Please do your homework and send a new version of the patch series.
Thanks
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-06-23 15:02 ` Andrew Lunn
2024-06-24 7:21 ` Omer Shpigelman
2024-06-24 9:22 ` Leon Romanovsky
@ 2024-12-17 10:00 ` Avri Kehat
2024-12-17 10:52 ` Andrew Lunn
2 siblings, 1 reply; 107+ messages in thread
From: Avri Kehat @ 2024-12-17 10:00 UTC (permalink / raw)
To: andrew
Cc: dri-devel, linux-kernel, linux-rdma, netdev, ogabbay, oshpigelman,
sgoutham, zyehudai
Revisiting the comments regarding our use of debugfs as an interface for device configurations -
A big part of the non-statistics debugfs parameters are HW-related debug-only capabilities, not configurations required by the user.
Should these sorts of parameters be part of devlink as well?
Is there another location where debug-related configurations for development can reside?
Avri
* Re: [PATCH 06/15] net: hbl_cn: debugfs support
2024-12-17 10:00 ` Avri Kehat
@ 2024-12-17 10:52 ` Andrew Lunn
0 siblings, 0 replies; 107+ messages in thread
From: Andrew Lunn @ 2024-12-17 10:52 UTC (permalink / raw)
To: Avri Kehat
Cc: dri-devel, linux-kernel, linux-rdma, netdev, ogabbay, oshpigelman,
sgoutham, zyehudai
On Tue, Dec 17, 2024 at 12:00:39PM +0200, Avri Kehat wrote:
> Revisiting the comments regarding our use of debugfs as an interface for device configurations -
> A big part of the non-statistics debugfs parameters are HW related debug-only capabilities, and not configurations required by the user.
> Should these sort of parameters be part of devlink as well?
> Is there another location where debug related configurations for development can reside in?
There are a few options:

If the user does not require them, don't even implement them. If the
user is not using them, who is?

Implement them in an out-of-tree patch which your development team
can use, since these are not user configuration options.

devlink is a possibility, but developers complain it is slow to get
options merged, since we want to understand what the configuration
option does, why anyone would want to use it, whether any other vendor
has the same need, whether it should be made generic, etc.

You could join those asking for fwctl, which is a contentious subject.

Andrew
end of thread, other threads:[~2024-12-17 10:52 UTC | newest]
Thread overview: 107+ messages
2024-06-13 8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13 8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 13:01 ` Przemek Kitszel
2024-06-13 14:16 ` Przemek Kitszel
2024-06-17 8:08 ` Omer Shpigelman
2024-06-17 11:48 ` Leon Romanovsky
2024-06-18 7:28 ` Omer Shpigelman
2024-06-15 0:05 ` Stephen Hemminger
2024-06-17 8:14 ` Omer Shpigelman
2024-06-17 14:05 ` Markus Elfring
2024-06-17 15:02 ` Andrew Lunn
2024-06-18 7:51 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 02/15] net: hbl_cn: memory manager component Omer Shpigelman
2024-06-13 8:21 ` [PATCH 03/15] net: hbl_cn: physical layer support Omer Shpigelman
2024-06-13 8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
2024-06-17 13:18 ` Leon Romanovsky
2024-06-18 5:50 ` Omer Shpigelman
2024-06-18 7:08 ` Leon Romanovsky
2024-06-18 7:58 ` Omer Shpigelman
2024-06-18 9:00 ` Leon Romanovsky
2024-06-24 7:24 ` Omer Shpigelman
2024-06-13 8:21 ` [PATCH 05/15] net: hbl_cn: memory trace events Omer Shpigelman
2024-06-13 8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
2024-06-19 18:35 ` Sunil Kovvuri Goutham
2024-06-21 10:17 ` Omer Shpigelman
2024-06-21 10:30 ` Sunil Kovvuri Goutham
2024-06-23 7:25 ` Omer Shpigelman
2024-06-21 15:33 ` Andrew Lunn
2024-06-23 6:57 ` Omer Shpigelman
2024-06-23 15:02 ` Andrew Lunn
2024-06-24 7:21 ` Omer Shpigelman
2024-06-24 9:22 ` Leon Romanovsky
2024-12-17 10:00 ` Avri Kehat
2024-12-17 10:52 ` Andrew Lunn
2024-06-13 8:22 ` [PATCH 08/15] net: hbl_cn: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13 8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
2024-06-13 21:49 ` Andrew Lunn
2024-06-18 6:58 ` Omer Shpigelman
2024-06-18 14:19 ` Andrew Lunn
2024-06-19 7:16 ` Omer Shpigelman
2024-06-19 8:01 ` Przemek Kitszel
2024-06-19 12:15 ` Omer Shpigelman
2024-06-19 15:21 ` Jakub Kicinski
2024-06-20 8:43 ` Omer Shpigelman
2024-06-20 13:51 ` Jakub Kicinski
2024-06-20 19:14 ` Andrew Lunn
2024-06-23 14:48 ` Omer Shpigelman
2024-06-19 16:13 ` Andrew Lunn
2024-06-23 6:22 ` Omer Shpigelman
2024-06-23 14:46 ` Andrew Lunn
2024-06-26 10:13 ` Omer Shpigelman
2024-06-26 14:13 ` Andrew Lunn
2024-06-30 7:11 ` Omer Shpigelman
2024-06-14 22:48 ` Joe Damato
2024-06-16 1:04 ` Andrew Lunn
2024-06-18 19:37 ` Omer Shpigelman
2024-06-18 21:19 ` Stephen Hemminger
2024-06-19 12:13 ` Omer Shpigelman
2024-06-15 0:10 ` Stephen Hemminger
2024-06-19 12:07 ` Omer Shpigelman
2024-06-15 0:16 ` Stephen Hemminger
2024-06-18 19:39 ` Omer Shpigelman
2024-06-19 15:40 ` Andrew Lunn
2024-06-20 8:36 ` Omer Shpigelman
2024-06-15 10:55 ` Zhu Yanjun
2024-06-18 11:16 ` Omer Shpigelman
2024-06-15 17:13 ` Zhu Yanjun
2024-06-16 1:08 ` Andrew Lunn
2024-06-13 8:22 ` [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13 8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
2024-06-13 19:18 ` Leon Romanovsky
2024-06-17 17:43 ` Omer Shpigelman
2024-06-17 19:04 ` Leon Romanovsky
2024-06-18 11:08 ` Omer Shpigelman
2024-06-18 12:58 ` Leon Romanovsky
2024-06-19 9:27 ` Omer Shpigelman
2024-06-19 10:52 ` Leon Romanovsky
2024-06-24 8:47 ` Omer Shpigelman
2024-06-24 9:10 ` Leon Romanovsky
2024-06-28 10:24 ` Omer Shpigelman
2024-06-30 13:29 ` Leon Romanovsky
2024-07-01 10:46 ` Omer Shpigelman
2024-07-01 12:46 ` Leon Romanovsky
2024-07-12 13:08 ` Jason Gunthorpe
2024-07-14 10:18 ` Omer Shpigelman
2024-07-16 13:40 ` Jason Gunthorpe
2024-07-17 7:08 ` Omer Shpigelman
2024-07-17 7:36 ` Leon Romanovsky
2024-07-17 10:51 ` Omer Shpigelman
2024-07-17 11:56 ` Jason Gunthorpe
2024-07-17 12:33 ` Leon Romanovsky
2024-07-18 6:54 ` Omer Shpigelman
2024-07-18 8:31 ` Leon Romanovsky
2024-06-18 16:01 ` Przemek Kitszel
2024-06-19 9:34 ` Omer Shpigelman
2024-06-17 14:17 ` Jason Gunthorpe
2024-06-19 9:39 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 12/15] RDMA/hbl: direct verbs support Omer Shpigelman
2024-06-13 8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
2024-06-19 18:41 ` Sunil Kovvuri Goutham
2024-06-21 10:21 ` Omer Shpigelman
2024-06-13 8:22 ` [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files Omer Shpigelman
2024-06-13 8:22 ` [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support Omer Shpigelman
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
2024-06-19 11:40 ` Omer Shpigelman
2024-06-19 16:33 ` Jiri Pirko
2024-06-20 5:37 ` Omer Shpigelman