* [pull request][net-next 00/14] mlx5 updates 2023-08-14
@ 2023-08-14 21:41 Saeed Mahameed
2023-08-14 21:41 ` [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst Saeed Mahameed
` (13 more replies)
0 siblings, 14 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan
From: Saeed Mahameed <saeedm@nvidia.com>
This series provides updates to mlx5 driver.
For more information please see tag log below.
Please pull and let me know if there is any problem.
Thanks,
Saeed.
The following changes since commit f3cc00303cdbc82f719e0cf0dc6f057e8741b5ee:
Merge branch 'devlink-introduce-selective-dumps' (2023-08-14 11:47:27 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2023-08-14
for you to fetch changes up to bd3a2f77809b84df5dc15edd72cf9955568e698e:
net/mlx5: Don't query MAX caps twice (2023-08-14 14:40:22 -0700)
----------------------------------------------------------------
mlx5-updates-2023-08-14
1) Handle PTP out of order CQEs issue
2) Check FW status before determining reset successful
3) Expose maximum supported SFs via devlink resource
4) MISC cleanups
----------------------------------------------------------------
Jianbo Liu (1):
net/mlx5: E-switch, Add checking for flow rule destinations
Jiri Pirko (5):
net/mlx5: Use auxiliary_device_uninit() instead of device_put()
net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init()
net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN()
net/mlx5: Remove redundant check of mlx5_vhca_event_supported()
net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler()
Moshe Shemesh (1):
net/mlx5: Check with FW that sync reset completed successfully
Rahul Rameshbabu (3):
net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst
net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs
net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ
Shay Drory (4):
net/mlx5: Expose max possible SFs via devlink resource
net/mlx5: Remove unused CAPs
net/mlx5: Remove unused MAX HCA capabilities
net/mlx5: Don't query MAX caps twice
.../ethernet/mellanox/mlx5/counters.rst | 6 +
.../ethernet/mellanox/mlx5/devlink.rst | 313 ---------------------
.../ethernet/mellanox/mlx5/index.rst | 1 -
Documentation/networking/devlink/mlx5.rst | 182 ++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 8 +
.../net/ethernet/mellanox/mlx5/core/en/health.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c | 237 ++++++++++++----
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h | 59 +++-
.../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 65 +++++
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 28 +-
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 31 ++
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 59 ++--
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 39 ++-
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h | 2 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 16 +-
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 3 +
.../net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 12 +-
.../net/ethernet/mellanox/mlx5/core/sf/hw_table.c | 49 +++-
include/linux/mlx5/device.h | 69 +----
include/linux/mlx5/driver.h | 1 -
include/linux/mlx5/mlx5_ifc.h | 46 +--
25 files changed, 672 insertions(+), 569 deletions(-)
delete mode 100644 Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
^ permalink raw reply [flat|nested] 16+ messages in thread
* [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-16 2:40 ` patchwork-bot+netdevbpf
2023-08-14 21:41 ` [net-next 02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs Saeed Mahameed
` (12 subsequent siblings)
13 siblings, 1 reply; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Rahul Rameshbabu,
Jiri Pirko, Gal Pressman
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead,
only use the generic devlink documentation directory to document mlx5
devlink parameters. Avoid providing general devlink tool usage information
in mlx5-specific documentation.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../ethernet/mellanox/mlx5/devlink.rst | 313 ------------------
.../ethernet/mellanox/mlx5/index.rst | 1 -
Documentation/networking/devlink/mlx5.rst | 179 ++++++++++
3 files changed, 179 insertions(+), 314 deletions(-)
delete mode 100644 Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
deleted file mode 100644
index a4edf908b707..000000000000
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
+++ /dev/null
@@ -1,313 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
-.. include:: <isonum.txt>
-
-=======
-Devlink
-=======
-
-:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-
-Contents
-========
-
-- `Info`_
-- `Parameters`_
-- `Health reporters`_
-
-Info
-====
-
-The devlink info reports the running and stored firmware versions on device.
-It also prints the device PSID which represents the HCA board type ID.
-
-User command example::
-
- $ devlink dev info pci/0000:00:06.0
- pci/0000:00:06.0:
- driver mlx5_core
- versions:
- fixed:
- fw.psid MT_0000000009
- running:
- fw.version 16.26.0100
- stored:
- fw.version 16.26.0100
-
-Parameters
-==========
-
-flow_steering_mode: Device flow steering mode
----------------------------------------------
-The flow steering mode parameter controls the flow steering mode of the driver.
-Two modes are supported:
-
-1. 'dmfs' - Device managed flow steering.
-2. 'smfs' - Software/Driver managed flow steering.
-
-In DMFS mode, the HW steering entities are created and managed through the
-Firmware.
-In SMFS mode, the HW steering entities are created and managed though by
-the driver directly into hardware without firmware intervention.
-
-SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode.
-
-User command examples:
-
-- Set SMFS flow steering mode::
-
- $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
-
-- Read device flow steering mode::
-
- $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
- pci/0000:06:00.0:
- name flow_steering_mode type driver-specific
- values:
- cmode runtime value smfs
-
-enable_roce: RoCE enablement state
-----------------------------------
-If the device supports RoCE disablement, RoCE enablement state controls device
-support for RoCE capability. Otherwise, the control occurs in the driver stack.
-When RoCE is disabled at the driver level, only raw ethernet QPs are supported.
-
-To change RoCE enablement state, a user must change the driverinit cmode value
-and run devlink reload.
-
-User command examples:
-
-- Disable RoCE::
-
- $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
- $ devlink dev reload pci/0000:06:00.0
-
-- Read RoCE enablement state::
-
- $ devlink dev param show pci/0000:06:00.0 name enable_roce
- pci/0000:06:00.0:
- name enable_roce type generic
- values:
- cmode driverinit value true
-
-esw_port_metadata: Eswitch port metadata state
-----------------------------------------------
-When applicable, disabling eswitch metadata can increase packet rate
-up to 20% depending on the use case and packet sizes.
-
-Eswitch port metadata state controls whether to internally tag packets with
-metadata. Metadata tagging must be enabled for multi-port RoCE, failover
-between representors and stacked devices.
-By default metadata is enabled on the supported devices in E-switch.
-Metadata is applicable only for E-switch in switchdev mode and
-users may disable it when NONE of the below use cases will be in use:
-
-1. HCA is in Dual/multi-port RoCE mode.
-2. VF/SF representor bonding (Usually used for Live migration)
-3. Stacked devices
-
-When metadata is disabled, the above use cases will fail to initialize if
-users try to enable them.
-
-- Show eswitch port metadata::
-
- $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
- pci/0000:06:00.0:
- name esw_port_metadata type driver-specific
- values:
- cmode runtime value true
-
-- Disable eswitch port metadata::
-
- $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime
-
-- Change eswitch mode to switchdev mode where after choosing the metadata value::
-
- $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
-
-hairpin_num_queues: Number of hairpin queues
---------------------------------------------
-We refer to a TC NIC rule that involves forwarding as "hairpin".
-
-Hairpin queues are mlx5 hardware specific implementation for hardware
-forwarding of such packets.
-
-- Show the number of hairpin queues::
-
- $ devlink dev param show pci/0000:06:00.0 name hairpin_num_queues
- pci/0000:06:00.0:
- name hairpin_num_queues type driver-specific
- values:
- cmode driverinit value 2
-
-- Change the number of hairpin queues::
-
- $ devlink dev param set pci/0000:06:00.0 name hairpin_num_queues value 4 cmode driverinit
-
-hairpin_queue_size: Size of the hairpin queues
-----------------------------------------------
-Control the size of the hairpin queues.
-
-- Show the size of the hairpin queues::
-
- $ devlink dev param show pci/0000:06:00.0 name hairpin_queue_size
- pci/0000:06:00.0:
- name hairpin_queue_size type driver-specific
- values:
- cmode driverinit value 1024
-
-- Change the size (in packets) of the hairpin queues::
-
- $ devlink dev param set pci/0000:06:00.0 name hairpin_queue_size value 512 cmode driverinit
-
-Health reporters
-================
-
-tx reporter
------------
-The tx reporter is responsible for reporting and recovering of the following two error scenarios:
-
-- tx timeout
- Report on kernel tx timeout detection.
- Recover by searching lost interrupts.
-- tx error completion
- Report on error tx completion.
- Recover by flushing the tx queue and reset it.
-
-tx reporter also support on demand diagnose callback, on which it provides
-real time information of its send queues status.
-
-User commands examples:
-
-- Diagnose send queues status::
-
- $ devlink health diagnose pci/0000:82:00.0 reporter tx
-
-.. note::
- This command has valid output only when interface is up, otherwise the command has empty output.
-
-- Show number of tx errors indicated, number of recover flows ended successfully,
- is autorecover enabled and graceful period from last recover::
-
- $ devlink health show pci/0000:82:00.0 reporter tx
-
-rx reporter
------------
-The rx reporter is responsible for reporting and recovering of the following two error scenarios:
-
-- rx queues' initialization (population) timeout
- Population of rx queues' descriptors on ring initialization is done
- in napi context via triggering an irq. In case of a failure to get
- the minimum amount of descriptors, a timeout would occur, and
- descriptors could be recovered by polling the EQ (Event Queue).
-- rx completions with errors (reported by HW on interrupt context)
- Report on rx completion error.
- Recover (if needed) by flushing the related queue and reset it.
-
-rx reporter also supports on demand diagnose callback, on which it
-provides real time information of its receive queues' status.
-
-- Diagnose rx queues' status and corresponding completion queue::
-
- $ devlink health diagnose pci/0000:82:00.0 reporter rx
-
-NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output.
-
-- Show number of rx errors indicated, number of recover flows ended successfully,
- is autorecover enabled, and graceful period from last recover::
-
- $ devlink health show pci/0000:82:00.0 reporter rx
-
-fw reporter
------------
-The fw reporter implements `diagnose` and `dump` callbacks.
-It follows symptoms of fw error such as fw syndrome by triggering
-fw core dump and storing it into the dump buffer.
-The fw reporter diagnose command can be triggered any time by the user to check
-current fw status.
-
-User commands examples:
-
-- Check fw heath status::
-
- $ devlink health diagnose pci/0000:82:00.0 reporter fw
-
-- Read FW core dump if already stored or trigger new one::
-
- $ devlink health dump show pci/0000:82:00.0 reporter fw
-
-.. note::
- This command can run only on the PF which has fw tracer ownership,
- running it on other PF or any VF will return "Operation not permitted".
-
-fw fatal reporter
------------------
-The fw fatal reporter implements `dump` and `recover` callbacks.
-It follows fatal errors indications by CR-space dump and recover flow.
-The CR-space dump uses vsc interface which is valid even if the FW command
-interface is not functional, which is the case in most FW fatal errors.
-The recover function runs recover flow which reloads the driver and triggers fw
-reset if needed.
-On firmware error, the health buffer is dumped into the dmesg. The log
-level is derived from the error's severity (given in health buffer).
-
-User commands examples:
-
-- Run fw recover flow manually::
-
- $ devlink health recover pci/0000:82:00.0 reporter fw_fatal
-
-- Read FW CR-space dump if already stored or trigger new one::
-
- $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
-
-.. note::
- This command can run only on PF.
-
-vnic reporter
--------------
-The vnic reporter implements only the `diagnose` callback.
-It is responsible for querying the vnic diagnostic counters from fw and displaying
-them in realtime.
-
-Description of the vnic counters:
-
-- total_q_under_processor_handle
- number of queues in an error state due to
- an async error or errored command.
-- send_queue_priority_update_flow
- number of QP/SQ priority/SL update events.
-- cq_overrun
- number of times CQ entered an error state due to an overflow.
-- async_eq_overrun
- number of times an EQ mapped to async events was overrun.
- comp_eq_overrun number of times an EQ mapped to completion events was
- overrun.
-- quota_exceeded_command
- number of commands issued and failed due to quota exceeded.
-- invalid_command
- number of commands issued and failed dues to any reason other than quota
- exceeded.
-- nic_receive_steering_discard
- number of packets that completed RX flow
- steering but were discarded due to a mismatch in flow table.
-- generated_pkt_steering_fail
- number of packets generated by the VNIC experiencing unexpected steering
- failure (at any point in steering flow).
-- handled_pkt_steering_fail
- number of packets handled by the VNIC experiencing unexpected steering
- failure (at any point in steering flow owned by the VNIC, including the FDB
- for the eswitch owner).
-
-User commands examples:
-
-- Diagnose PF/VF vnic counters::
-
- $ devlink health diagnose pci/0000:82:00.1 reporter vnic
-
-- Diagnose representor vnic counters (performed by supplying devlink port of the
- representor, which can be obtained via devlink port command)::
-
- $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic
-
-.. note::
- This command can run over all interfaces such as PF/VF and representor ports.
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst
index 3fdcd6b61ccf..581a91caa579 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst
@@ -13,7 +13,6 @@ Contents:
:maxdepth: 2
kconfig
- devlink
switchdev
tracepoints
counters
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 202798d6501e..196a4bb28df1 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -18,6 +18,11 @@ Parameters
* - ``enable_roce``
- driverinit
- Type: Boolean
+
+ If the device supports RoCE disablement, RoCE enablement state controls
+ device support for RoCE capability. Otherwise, the control occurs in the
+ driver stack. When RoCE is disabled at the driver level, only raw
+ ethernet QPs are supported.
* - ``io_eq_size``
- driverinit
- The range is between 64 and 4096.
@@ -48,6 +53,9 @@ parameters.
* ``smfs`` Software managed flow steering. In SMFS mode, the HW
steering entities are created and manage through the driver without
firmware intervention.
+
+ SMFS mode is faster and provides better rule insertion rate compared to
+ default DMFS mode.
* - ``fdb_large_groups``
- u32
- driverinit
@@ -71,7 +79,24 @@ parameters.
deprecated.
Default: disabled
+ * - ``esw_port_metadata``
+ - Boolean
+ - runtime
+ - When applicable, disabling eswitch metadata can increase packet rate up
+ to 20% depending on the use case and packet sizes.
+
+ Eswitch port metadata state controls whether to internally tag packets
+ with metadata. Metadata tagging must be enabled for multi-port RoCE,
+ failover between representors and stacked devices. By default metadata is
+ enabled on the supported devices in E-switch. Metadata is applicable only
+ for E-switch in switchdev mode and users may disable it when NONE of the
+ below use cases will be in use:
+ 1. HCA is in Dual/multi-port RoCE mode.
+ 2. VF/SF representor bonding (Usually used for Live migration)
+ 3. Stacked devices
+ When metadata is disabled, the above use cases will fail to initialize if
+ users try to enable them.
* - ``hairpin_num_queues``
- u32
- driverinit
@@ -104,3 +129,157 @@ The ``mlx5`` driver reports the following versions
* - ``fw.version``
- stored, running
- Three digit major.minor.subminor firmware version number.
+
+Health reporters
+================
+
+tx reporter
+-----------
+The tx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- tx timeout
+ Report on kernel tx timeout detection.
+ Recover by searching lost interrupts.
+- tx error completion
+ Report on error tx completion.
+ Recover by flushing the tx queue and reset it.
+
+tx reporter also support on demand diagnose callback, on which it provides
+real time information of its send queues status.
+
+User commands examples:
+
+- Diagnose send queues status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter tx
+
+.. note::
+ This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of tx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter tx
+
+rx reporter
+-----------
+The rx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- rx queues' initialization (population) timeout
+ Population of rx queues' descriptors on ring initialization is done
+ in napi context via triggering an irq. In case of a failure to get
+ the minimum amount of descriptors, a timeout would occur, and
+ descriptors could be recovered by polling the EQ (Event Queue).
+- rx completions with errors (reported by HW on interrupt context)
+ Report on rx completion error.
+ Recover (if needed) by flushing the related queue and reset it.
+
+rx reporter also supports on demand diagnose callback, on which it
+provides real time information of its receive queues' status.
+
+- Diagnose rx queues' status and corresponding completion queue::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter rx
+
+.. note::
+ This command has valid output only when interface is up. Otherwise, the command has empty output.
+
+- Show number of rx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled, and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter rx
+
+fw reporter
+-----------
+The fw reporter implements `diagnose` and `dump` callbacks.
+It follows symptoms of fw error such as fw syndrome by triggering
+fw core dump and storing it into the dump buffer.
+The fw reporter diagnose command can be triggered any time by the user to check
+current fw status.
+
+User commands examples:
+
+- Check fw heath status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter fw
+
+- Read FW core dump if already stored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.0 reporter fw
+
+.. note::
+ This command can run only on the PF which has fw tracer ownership,
+ running it on other PF or any VF will return "Operation not permitted".
+
+fw fatal reporter
+-----------------
+The fw fatal reporter implements `dump` and `recover` callbacks.
+It follows fatal errors indications by CR-space dump and recover flow.
+The CR-space dump uses vsc interface which is valid even if the FW command
+interface is not functional, which is the case in most FW fatal errors.
+The recover function runs recover flow which reloads the driver and triggers fw
+reset if needed.
+On firmware error, the health buffer is dumped into the dmesg. The log
+level is derived from the error's severity (given in health buffer).
+
+User commands examples:
+
+- Run fw recover flow manually::
+
+ $ devlink health recover pci/0000:82:00.0 reporter fw_fatal
+
+- Read FW CR-space dump if already stored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
+
+.. note::
+ This command can run only on PF.
+
+vnic reporter
+-------------
+The vnic reporter implements only the `diagnose` callback.
+It is responsible for querying the vnic diagnostic counters from fw and displaying
+them in realtime.
+
+Description of the vnic counters:
+
+- total_q_under_processor_handle
+ number of queues in an error state due to
+ an async error or errored command.
+- send_queue_priority_update_flow
+ number of QP/SQ priority/SL update events.
+- cq_overrun
+ number of times CQ entered an error state due to an overflow.
+- async_eq_overrun
+ number of times an EQ mapped to async events was overrun.
+ comp_eq_overrun number of times an EQ mapped to completion events was
+ overrun.
+- quota_exceeded_command
+ number of commands issued and failed due to quota exceeded.
+- invalid_command
+ number of commands issued and failed dues to any reason other than quota
+ exceeded.
+- nic_receive_steering_discard
+ number of packets that completed RX flow
+ steering but were discarded due to a mismatch in flow table.
+- generated_pkt_steering_fail
+ number of packets generated by the VNIC experiencing unexpected steering
+ failure (at any point in steering flow).
+- handled_pkt_steering_fail
+ number of packets handled by the VNIC experiencing unexpected steering
+ failure (at any point in steering flow owned by the VNIC, including the FDB
+ for the eswitch owner).
+
+User commands examples:
+
+- Diagnose PF/VF vnic counters::
+
+ $ devlink health diagnose pci/0000:82:00.1 reporter vnic
+
+- Diagnose representor vnic counters (performed by supplying devlink port of the
+ representor, which can be obtained via devlink port command)::
+
+ $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic
+
+.. note::
+ This command can run over all interfaces such as PF/VF and representor ports.
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
2023-08-14 21:41 ` [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ Saeed Mahameed
` (11 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Rahul Rameshbabu
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Use a map structure for associating CQEs containing port timestamping
information with the appropriate skb. Track order of WQEs submitted using a
FIFO. Check if the corresponding port timestamping CQEs from the lookup
values in the FIFO are considered dropped due to time elapsed. Return the
lookup value to a freelist after consuming the skb. Reuse the freed lookup
in future WQE submission iterations.
The map structure uses an integer identifier for the key and returns an skb
corresponding to that identifier. Embed the integer identifier in the WQE
submitted to the WQ for the transmit path when the SQ is a PTP (port
timestamping) SQ. The embedded identifier can then be queried using a field
in the CQE of the corresponding port timestamping CQ. In the port
timestamping napi_poll context, the identifier is queried from the CQE
polled from CQ and used to lookup the corresponding skb from the WQE submit
path. The skb reference is removed from map and then embedded with the port
HW timestamp information from the CQE and eventually consumed.
The metadata freelist FIFO is an array containing integer identifiers that
can be pushed and popped in the FIFO. The purpose of this structure is
bookkeeping what identifier values can safely be used in a subsequent WQE
submission and should not contain identifiers that have still not been
reaped by processing a corresponding CQE completion on the port
timestamping CQ.
The ts_cqe_pending_list structure is a combination of an array and linked
list. The array is pre-populated with the nodes that will be added and
removed from the head of the linked list. Each node contains the unique
identifier value associated with the values submitted in the WQEs and
retrieved in the port timestamping CQEs. When a WQE is submitted, the node
in the array corresponding to the identifier popped from the metadata
freelist is added to the end of the CQE pending list and is marked as
"in-use". The node is removed from the linked list under two conditions.
The first condition is that the corresponding port timestamping CQE is
polled in the PTP napi_poll context. The second condition is that more than
a second has elapsed since the DMA timestamp value corresponding to the WQE
submission. When the first condition occurs, the "in-use" bit in the linked
list node is cleared, and the resources corresponding to the WQE submission
are then released. The second condition, however, indicates that the port
timestamping CQE will likely never be delivered. It's not impossible for
the device to post a CQE after an infinite amount of time though highly
improbable. In order to be resilient to this improbable case, resources
related to the corresponding WQE submission are still kept, the identifier
value is not returned to the freelist, and the "in-use" bit is cleared on
the node to indicate that it's no longer part of the linked list of "likely
to be delivered" port timestamping CQE identifiers. A count for the number
of port timestamping CQEs considered highly likely to never be delivered by
the device is maintained. This count gets decremented in the unlikely event
a port timestamping CQE considered unlikely to ever be delivered is polled
in the PTP napi_poll context.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../ethernet/mellanox/mlx5/counters.rst | 6 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 215 +++++++++++++-----
.../net/ethernet/mellanox/mlx5/core/en/ptp.h | 57 ++++-
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 3 +-
.../ethernet/mellanox/mlx5/core/en_stats.c | 4 +-
.../ethernet/mellanox/mlx5/core/en_stats.h | 4 +-
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 28 ++-
7 files changed, 236 insertions(+), 81 deletions(-)
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index a395df9c2751..008e560e12b5 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -683,6 +683,12 @@ the software port.
time protocol.
- Error
+ * - `ptp_cq[i]_late_cqe`
+ - Number of times a CQE has been delivered on the PTP timestamping CQ when
+ the CQE was not expected since a certain amount of time had elapsed where
+ the device typically ensures not posting the CQE.
+ - Error
+
.. [#ring_global] The corresponding ring and global counters do not share the
same name (i.e. do not follow the common naming scheme).
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index b0b429a0321e..8680d21f3e7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -5,6 +5,8 @@
#include "en/txrx.h"
#include "en/params.h"
#include "en/fs_tt_redirect.h"
+#include <linux/list.h>
+#include <linux/spinlock.h>
struct mlx5e_ptp_fs {
struct mlx5_flow_handle *l2_rule;
@@ -19,6 +21,48 @@ struct mlx5e_ptp_params {
struct mlx5e_rq_param rq_param;
};
+struct mlx5e_ptp_port_ts_cqe_tracker {
+ u8 metadata_id;
+ bool inuse : 1;
+ struct list_head entry;
+};
+
+struct mlx5e_ptp_port_ts_cqe_list {
+ struct mlx5e_ptp_port_ts_cqe_tracker *nodes;
+ struct list_head tracker_list_head;
+ /* Sync list operations in xmit and napi_poll contexts */
+ spinlock_t tracker_list_lock;
+};
+
+static inline void
+mlx5e_ptp_port_ts_cqe_list_add(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metadata)
+{
+ struct mlx5e_ptp_port_ts_cqe_tracker *tracker = &list->nodes[metadata];
+
+ WARN_ON_ONCE(tracker->inuse);
+ tracker->inuse = true;
+ spin_lock(&list->tracker_list_lock);
+ list_add_tail(&tracker->entry, &list->tracker_list_head);
+ spin_unlock(&list->tracker_list_lock);
+}
+
+static void
+mlx5e_ptp_port_ts_cqe_list_remove(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metadata)
+{
+ struct mlx5e_ptp_port_ts_cqe_tracker *tracker = &list->nodes[metadata];
+
+ WARN_ON_ONCE(!tracker->inuse);
+ tracker->inuse = false;
+ spin_lock(&list->tracker_list_lock);
+ list_del(&tracker->entry);
+ spin_unlock(&list->tracker_list_lock);
+}
+
+void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata)
+{
+ mlx5e_ptp_port_ts_cqe_list_add(ptpsq->ts_cqe_pending_list, metadata);
+}
+
struct mlx5e_skb_cb_hwtstamp {
ktime_t cqe_hwtstamp;
ktime_t port_hwtstamp;
@@ -79,75 +123,88 @@ void mlx5e_skb_cb_hwtstamp_handler(struct sk_buff *skb, int hwtstamp_type,
memset(skb->cb, 0, sizeof(struct mlx5e_skb_cb_hwtstamp));
}
-#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask)
-
-static bool mlx5e_ptp_ts_cqe_drop(struct mlx5e_ptpsq *ptpsq, u16 skb_ci, u16 skb_id)
+static struct sk_buff *
+mlx5e_ptp_metadata_map_lookup(struct mlx5e_ptp_metadata_map *map, u16 metadata)
{
- return (ptpsq->ts_cqe_ctr_mask && (skb_ci != skb_id));
+ return map->data[metadata];
}
-static bool mlx5e_ptp_ts_cqe_ooo(struct mlx5e_ptpsq *ptpsq, u16 skb_id)
+static struct sk_buff *
+mlx5e_ptp_metadata_map_remove(struct mlx5e_ptp_metadata_map *map, u16 metadata)
{
- u16 skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
- u16 skb_pi = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_pc);
+ struct sk_buff *skb;
- if (PTP_WQE_CTR2IDX(skb_id - skb_ci) >= PTP_WQE_CTR2IDX(skb_pi - skb_ci))
- return true;
+ skb = map->data[metadata];
+ map->data[metadata] = NULL;
- return false;
+ return skb;
}
-static void mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq *ptpsq, u16 skb_ci,
- u16 skb_id, int budget)
+static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
+ ktime_t port_tstamp)
{
- struct skb_shared_hwtstamps hwts = {};
- struct sk_buff *skb;
+ struct mlx5e_ptp_port_ts_cqe_list *cqe_list = ptpsq->ts_cqe_pending_list;
+ ktime_t timeout = ns_to_ktime(MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT);
+ struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map;
+ struct mlx5e_ptp_port_ts_cqe_tracker *pos, *n;
+
+ spin_lock(&cqe_list->tracker_list_lock);
+ list_for_each_entry_safe(pos, n, &cqe_list->tracker_list_head, entry) {
+ struct sk_buff *skb =
+ mlx5e_ptp_metadata_map_lookup(metadata_map, pos->metadata_id);
+ ktime_t dma_tstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp;
- ptpsq->cq_stats->resync_event++;
+ if (!dma_tstamp ||
+ ktime_after(ktime_add(dma_tstamp, timeout), port_tstamp))
+ break;
- while (skb_ci != skb_id) {
- skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
- hwts.hwtstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp;
- skb_tstamp_tx(skb, &hwts);
- ptpsq->cq_stats->resync_cqe++;
- napi_consume_skb(skb, budget);
- skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
+ metadata_map->undelivered_counter++;
+ WARN_ON_ONCE(!pos->inuse);
+ pos->inuse = false;
+ list_del(&pos->entry);
}
+ spin_unlock(&cqe_list->tracker_list_lock);
}
+#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask)
+
static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
struct mlx5_cqe64 *cqe,
int budget)
{
- u16 skb_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter));
- u16 skb_ci = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
+ struct mlx5e_ptp_port_ts_cqe_list *pending_cqe_list = ptpsq->ts_cqe_pending_list;
+ u8 metadata_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter));
+ bool is_err_cqe = !!MLX5E_RX_ERR_CQE(cqe);
struct mlx5e_txqsq *sq = &ptpsq->txqsq;
struct sk_buff *skb;
ktime_t hwtstamp;
- if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
- skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
- ptpsq->cq_stats->err_cqe++;
- goto out;
+ if (likely(pending_cqe_list->nodes[metadata_id].inuse)) {
+ mlx5e_ptp_port_ts_cqe_list_remove(pending_cqe_list, metadata_id);
+ } else {
+ /* Reclaim space in the unlikely event CQE was delivered after
+ * marking it late.
+ */
+ ptpsq->metadata_map.undelivered_counter--;
+ ptpsq->cq_stats->late_cqe++;
}
- if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_ci, skb_id)) {
- if (mlx5e_ptp_ts_cqe_ooo(ptpsq, skb_id)) {
- /* already handled by a previous resync */
- ptpsq->cq_stats->ooo_cqe_drop++;
- return;
- }
- mlx5e_ptp_skb_fifo_ts_cqe_resync(ptpsq, skb_ci, skb_id, budget);
+ skb = mlx5e_ptp_metadata_map_remove(&ptpsq->metadata_map, metadata_id);
+
+ if (unlikely(is_err_cqe)) {
+ ptpsq->cq_stats->err_cqe++;
+ goto out;
}
- skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, get_cqe_ts(cqe));
mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_PORT_HWTSTAMP,
hwtstamp, ptpsq->cq_stats);
ptpsq->cq_stats->cqe++;
+ mlx5e_ptpsq_mark_ts_cqes_undelivered(ptpsq, hwtstamp);
out:
napi_consume_skb(skb, budget);
+ mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist, metadata_id);
}
static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget)
@@ -291,36 +348,78 @@ static void mlx5e_ptp_destroy_sq(struct mlx5_core_dev *mdev, u32 sqn)
static int mlx5e_ptp_alloc_traffic_db(struct mlx5e_ptpsq *ptpsq, int numa)
{
- int wq_sz = mlx5_wq_cyc_get_size(&ptpsq->txqsq.wq);
- struct mlx5_core_dev *mdev = ptpsq->txqsq.mdev;
+ struct mlx5e_ptp_metadata_fifo *metadata_freelist = &ptpsq->metadata_freelist;
+ struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map;
+ struct mlx5e_ptp_port_ts_cqe_list *cqe_list;
+ int db_sz;
+ int md;
- ptpsq->skb_fifo.fifo = kvzalloc_node(array_size(wq_sz, sizeof(*ptpsq->skb_fifo.fifo)),
- GFP_KERNEL, numa);
- if (!ptpsq->skb_fifo.fifo)
+ cqe_list = kvzalloc_node(sizeof(*ptpsq->ts_cqe_pending_list), GFP_KERNEL, numa);
+ if (!cqe_list)
return -ENOMEM;
+ ptpsq->ts_cqe_pending_list = cqe_list;
+
+ db_sz = min_t(u32, mlx5_wq_cyc_get_size(&ptpsq->txqsq.wq),
+ 1 << MLX5_CAP_GEN_2(ptpsq->txqsq.mdev,
+ ts_cqe_metadata_size2wqe_counter));
+ ptpsq->ts_cqe_ctr_mask = db_sz - 1;
+
+ cqe_list->nodes = kvzalloc_node(array_size(db_sz, sizeof(*cqe_list->nodes)),
+ GFP_KERNEL, numa);
+ if (!cqe_list->nodes)
+ goto free_cqe_list;
+ INIT_LIST_HEAD(&cqe_list->tracker_list_head);
+ spin_lock_init(&cqe_list->tracker_list_lock);
+
+ metadata_freelist->data =
+ kvzalloc_node(array_size(db_sz, sizeof(*metadata_freelist->data)),
+ GFP_KERNEL, numa);
+ if (!metadata_freelist->data)
+ goto free_cqe_list_nodes;
+ metadata_freelist->mask = ptpsq->ts_cqe_ctr_mask;
+
+ for (md = 0; md < db_sz; ++md) {
+ cqe_list->nodes[md].metadata_id = md;
+ metadata_freelist->data[md] = md;
+ }
+ metadata_freelist->pc = db_sz;
+
+ metadata_map->data =
+ kvzalloc_node(array_size(db_sz, sizeof(*metadata_map->data)),
+ GFP_KERNEL, numa);
+ if (!metadata_map->data)
+ goto free_metadata_freelist;
+ metadata_map->capacity = db_sz;
- ptpsq->skb_fifo.pc = &ptpsq->skb_fifo_pc;
- ptpsq->skb_fifo.cc = &ptpsq->skb_fifo_cc;
- ptpsq->skb_fifo.mask = wq_sz - 1;
- if (MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter))
- ptpsq->ts_cqe_ctr_mask =
- (1 << MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter)) - 1;
return 0;
+
+free_metadata_freelist:
+ kvfree(metadata_freelist->data);
+free_cqe_list_nodes:
+ kvfree(cqe_list->nodes);
+free_cqe_list:
+ kvfree(cqe_list);
+ return -ENOMEM;
}
-static void mlx5e_ptp_drain_skb_fifo(struct mlx5e_skb_fifo *skb_fifo)
+static void mlx5e_ptp_drain_metadata_map(struct mlx5e_ptp_metadata_map *map)
{
- while (*skb_fifo->pc != *skb_fifo->cc) {
- struct sk_buff *skb = mlx5e_skb_fifo_pop(skb_fifo);
+ int idx;
+
+ for (idx = 0; idx < map->capacity; ++idx) {
+ struct sk_buff *skb = map->data[idx];
dev_kfree_skb_any(skb);
}
}
-static void mlx5e_ptp_free_traffic_db(struct mlx5e_skb_fifo *skb_fifo)
+static void mlx5e_ptp_free_traffic_db(struct mlx5e_ptpsq *ptpsq)
{
- mlx5e_ptp_drain_skb_fifo(skb_fifo);
- kvfree(skb_fifo->fifo);
+ mlx5e_ptp_drain_metadata_map(&ptpsq->metadata_map);
+ kvfree(ptpsq->metadata_map.data);
+ kvfree(ptpsq->metadata_freelist.data);
+ kvfree(ptpsq->ts_cqe_pending_list->nodes);
+ kvfree(ptpsq->ts_cqe_pending_list);
}
static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
@@ -348,8 +447,7 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
if (err)
goto err_free_txqsq;
- err = mlx5e_ptp_alloc_traffic_db(ptpsq,
- dev_to_node(mlx5_core_dma_dev(c->mdev)));
+ err = mlx5e_ptp_alloc_traffic_db(ptpsq, dev_to_node(mlx5_core_dma_dev(c->mdev)));
if (err)
goto err_free_txqsq;
@@ -366,7 +464,7 @@ static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq)
struct mlx5e_txqsq *sq = &ptpsq->txqsq;
struct mlx5_core_dev *mdev = sq->mdev;
- mlx5e_ptp_free_traffic_db(&ptpsq->skb_fifo);
+ mlx5e_ptp_free_traffic_db(ptpsq);
cancel_work_sync(&sq->recover_work);
mlx5e_ptp_destroy_sq(mdev, sq->sqn);
mlx5e_free_txqsq_descs(sq);
@@ -534,7 +632,10 @@ static void mlx5e_ptp_build_params(struct mlx5e_ptp *c,
/* SQ */
if (test_bit(MLX5E_PTP_STATE_TX, c->state)) {
- params->log_sq_size = orig->log_sq_size;
+ params->log_sq_size =
+ min(MLX5_CAP_GEN_2(c->mdev, ts_cqe_metadata_size2wqe_counter),
+ MLX5E_PTP_MAX_LOG_SQ_SIZE);
+ params->log_sq_size = min(params->log_sq_size, orig->log_sq_size);
mlx5e_ptp_build_sq_param(c->mdev, params, &cparams->txq_sq_param);
}
/* RQ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
index cc7efde88ac3..7c5597d4589d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
@@ -7,18 +7,36 @@
#include "en.h"
#include "en_stats.h"
#include "en/txrx.h"
+#include <linux/ktime.h>
#include <linux/ptp_classify.h>
+#include <linux/time64.h>
#define MLX5E_PTP_CHANNEL_IX 0
+#define MLX5E_PTP_MAX_LOG_SQ_SIZE (8U)
+#define MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT (1 * NSEC_PER_SEC)
+
+struct mlx5e_ptp_metadata_fifo {
+ u8 cc;
+ u8 pc;
+ u8 mask;
+ u8 *data;
+};
+
+struct mlx5e_ptp_metadata_map {
+ u16 undelivered_counter;
+ u16 capacity;
+ struct sk_buff **data;
+};
struct mlx5e_ptpsq {
struct mlx5e_txqsq txqsq;
struct mlx5e_cq ts_cq;
- u16 skb_fifo_cc;
- u16 skb_fifo_pc;
- struct mlx5e_skb_fifo skb_fifo;
struct mlx5e_ptp_cq_stats *cq_stats;
u16 ts_cqe_ctr_mask;
+
+ struct mlx5e_ptp_port_ts_cqe_list *ts_cqe_pending_list;
+ struct mlx5e_ptp_metadata_fifo metadata_freelist;
+ struct mlx5e_ptp_metadata_map metadata_map;
};
enum {
@@ -69,12 +87,35 @@ static inline bool mlx5e_use_ptpsq(struct sk_buff *skb)
fk.ports.dst == htons(PTP_EV_PORT));
}
-static inline bool mlx5e_ptpsq_fifo_has_room(struct mlx5e_txqsq *sq)
+static inline void mlx5e_ptp_metadata_fifo_push(struct mlx5e_ptp_metadata_fifo *fifo, u8 metadata)
{
- if (!sq->ptpsq)
- return true;
+ fifo->data[fifo->mask & fifo->pc++] = metadata;
+}
+
+static inline u8
+mlx5e_ptp_metadata_fifo_pop(struct mlx5e_ptp_metadata_fifo *fifo)
+{
+ return fifo->data[fifo->mask & fifo->cc++];
+}
- return mlx5e_skb_fifo_has_room(&sq->ptpsq->skb_fifo);
+static inline void
+mlx5e_ptp_metadata_map_put(struct mlx5e_ptp_metadata_map *map,
+ struct sk_buff *skb, u8 metadata)
+{
+ WARN_ON_ONCE(map->data[metadata]);
+ map->data[metadata] = skb;
+}
+
+static inline bool mlx5e_ptpsq_metadata_freelist_empty(struct mlx5e_ptpsq *ptpsq)
+{
+ struct mlx5e_ptp_metadata_fifo *freelist;
+
+ if (likely(!ptpsq))
+ return false;
+
+ freelist = &ptpsq->metadata_freelist;
+
+ return freelist->pc == freelist->cc;
}
int mlx5e_ptp_open(struct mlx5e_priv *priv, struct mlx5e_params *params,
@@ -89,6 +130,8 @@ void mlx5e_ptp_free_rx_fs(struct mlx5e_flow_steering *fs,
const struct mlx5e_profile *profile);
int mlx5e_ptp_rx_manage_fs(struct mlx5e_priv *priv, bool set);
+void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata);
+
enum {
MLX5E_SKB_CB_CQE_HWTSTAMP = BIT(0),
MLX5E_SKB_CB_PORT_HWTSTAMP = BIT(1),
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 04195a673a6b..dff02434ff45 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2061,7 +2061,8 @@ static int set_pflag_tx_port_ts(struct net_device *netdev, bool enable)
struct mlx5e_params new_params;
int err;
- if (!MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn))
+ if (!MLX5_CAP_GEN(mdev, ts_cqe_to_dest_cqn) ||
+ !MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter))
return -EOPNOTSUPP;
/* Don't allow changing the PTP state if HTB offload is active, because
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 07b84d668fcc..52141729444e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2142,9 +2142,7 @@ static const struct counter_desc ptp_cq_stats_desc[] = {
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, err_cqe) },
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) },
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) },
- { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, resync_cqe) },
- { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, resync_event) },
- { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, ooo_cqe_drop) },
+ { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, late_cqe) },
};
static const struct counter_desc ptp_rq_stats_desc[] = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 1ff8a06027dc..409e9a47e433 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -449,9 +449,7 @@ struct mlx5e_ptp_cq_stats {
u64 err_cqe;
u64 abort;
u64 abort_abs_diff_ns;
- u64 resync_cqe;
- u64 resync_event;
- u64 ooo_cqe_drop;
+ u64 late_cqe;
};
struct mlx5e_rep_stats {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index c7eb6b238c2b..d41435c22ce5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -372,7 +372,7 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
const struct mlx5e_tx_attr *attr,
const struct mlx5e_tx_wqe_attr *wqe_attr, u8 num_dma,
struct mlx5e_tx_wqe_info *wi, struct mlx5_wqe_ctrl_seg *cseg,
- bool xmit_more)
+ struct mlx5_wqe_eth_seg *eseg, bool xmit_more)
{
struct mlx5_wq_cyc *wq = &sq->wq;
bool send_doorbell;
@@ -394,11 +394,16 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
mlx5e_tx_check_stop(sq);
- if (unlikely(sq->ptpsq)) {
+ if (unlikely(sq->ptpsq &&
+ (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))) {
+ u8 metadata_index = be32_to_cpu(eseg->flow_table_metadata);
+
mlx5e_skb_cb_hwtstamp_init(skb);
- mlx5e_skb_fifo_push(&sq->ptpsq->skb_fifo, skb);
+ mlx5e_ptpsq_track_metadata(sq->ptpsq, metadata_index);
+ mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb,
+ metadata_index);
if (!netif_tx_queue_stopped(sq->txq) &&
- !mlx5e_skb_fifo_has_room(&sq->ptpsq->skb_fifo)) {
+ mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq)) {
netif_tx_stop_queue(sq->txq);
sq->stats->stopped++;
}
@@ -483,13 +488,16 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
if (unlikely(num_dma < 0))
goto err_drop;
- mlx5e_txwqe_complete(sq, skb, attr, wqe_attr, num_dma, wi, cseg, xmit_more);
+ mlx5e_txwqe_complete(sq, skb, attr, wqe_attr, num_dma, wi, cseg, eseg, xmit_more);
return;
err_drop:
stats->dropped++;
dev_kfree_skb_any(skb);
+ if (unlikely(sq->ptpsq && (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)))
+ mlx5e_ptp_metadata_fifo_push(&sq->ptpsq->metadata_freelist,
+ be32_to_cpu(eseg->flow_table_metadata));
mlx5e_tx_flush(sq);
}
@@ -645,9 +653,9 @@ void mlx5e_tx_mpwqe_ensure_complete(struct mlx5e_txqsq *sq)
static void mlx5e_cqe_ts_id_eseg(struct mlx5e_ptpsq *ptpsq, struct sk_buff *skb,
struct mlx5_wqe_eth_seg *eseg)
{
- if (ptpsq->ts_cqe_ctr_mask && unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
- eseg->flow_table_metadata = cpu_to_be32(ptpsq->skb_fifo_pc &
- ptpsq->ts_cqe_ctr_mask);
+ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+ eseg->flow_table_metadata =
+ cpu_to_be32(mlx5e_ptp_metadata_fifo_pop(&ptpsq->metadata_freelist));
}
static void mlx5e_txwqe_build_eseg(struct mlx5e_priv *priv, struct mlx5e_txqsq *sq,
@@ -766,7 +774,7 @@ void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq)
{
if (netif_tx_queue_stopped(sq->txq) &&
mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) &&
- mlx5e_ptpsq_fifo_has_room(sq) &&
+ !mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq) &&
!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) {
netif_tx_wake_queue(sq->txq);
sq->stats->wake++;
@@ -1031,7 +1039,7 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
if (unlikely(num_dma < 0))
goto err_drop;
- mlx5e_txwqe_complete(sq, skb, &attr, &wqe_attr, num_dma, wi, cseg, xmit_more);
+ mlx5e_txwqe_complete(sq, skb, &attr, &wqe_attr, num_dma, wi, cseg, eseg, xmit_more);
return;
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
2023-08-14 21:41 ` [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst Saeed Mahameed
2023-08-14 21:41 ` [net-next 02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 04/14] net/mlx5: Expose max possible SFs via devlink resource Saeed Mahameed
` (10 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Rahul Rameshbabu
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
A new check for the tx devlink health reporter is introduced for
determining when the PTP port timestamping SQ is considered unhealthy. If
there are enough CQEs considered never to be delivered, the space that can
be utilized on the SQ decreases significantly, impacting performance and
usability of the SQ. The health reporter is triggered when the number of
likely never delivered port timestamping CQEs that utilize the space of the
PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink
health reporter recover method is also provided for this specific TX error
context that restarts the PTP SQ.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Documentation/networking/devlink/mlx5.rst | 5 +-
.../ethernet/mellanox/mlx5/core/en/health.h | 1 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 22 +++++++
.../net/ethernet/mellanox/mlx5/core/en/ptp.h | 2 +
.../mellanox/mlx5/core/en/reporter_tx.c | 65 +++++++++++++++++++
5 files changed, 94 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 196a4bb28df1..702f204a3dbd 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -135,7 +135,7 @@ Health reporters
tx reporter
-----------
-The tx reporter is responsible for reporting and recovering of the following two error scenarios:
+The tx reporter is responsible for reporting and recovering of the following three error scenarios:
- tx timeout
Report on kernel tx timeout detection.
@@ -143,6 +143,9 @@ The tx reporter is responsible for reporting and recovering of the following two
- tx error completion
Report on error tx completion.
Recover by flushing the tx queue and reset it.
+- tx PTP port timestamping CQ unhealthy
+ Report too many CQEs never delivered on port ts CQ.
+ Recover by flushing and re-creating all PTP channels.
tx reporter also support on demand diagnose callback, on which it provides
real time information of its send queues status.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
index 0107e4e73bb0..415840c3ef84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/health.h
@@ -18,6 +18,7 @@ void mlx5e_reporter_tx_create(struct mlx5e_priv *priv);
void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv);
void mlx5e_reporter_tx_err_cqe(struct mlx5e_txqsq *sq);
int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq);
+void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq);
int mlx5e_health_cq_diag_fmsg(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg);
int mlx5e_health_cq_common_diag_fmsg(struct mlx5e_cq *cq, struct devlink_fmsg *fmsg);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 8680d21f3e7b..bb11e644d24f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -2,6 +2,7 @@
// Copyright (c) 2020 Mellanox Technologies
#include "en/ptp.h"
+#include "en/health.h"
#include "en/txrx.h"
#include "en/params.h"
#include "en/fs_tt_redirect.h"
@@ -140,6 +141,12 @@ mlx5e_ptp_metadata_map_remove(struct mlx5e_ptp_metadata_map *map, u16 metadata)
return skb;
}
+static bool mlx5e_ptp_metadata_map_unhealthy(struct mlx5e_ptp_metadata_map *map)
+{
+ /* Considered beginning unhealthy state if size * 15 / 2^4 cannot be reclaimed. */
+ return map->undelivered_counter > (map->capacity >> 4) * 15;
+}
+
static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
ktime_t port_tstamp)
{
@@ -205,6 +212,9 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
out:
napi_consume_skb(skb, budget);
mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist, metadata_id);
+ if (unlikely(mlx5e_ptp_metadata_map_unhealthy(&ptpsq->metadata_map)) &&
+ !test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+ queue_work(ptpsq->txqsq.priv->wq, &ptpsq->report_unhealthy_work);
}
static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget)
@@ -422,6 +432,14 @@ static void mlx5e_ptp_free_traffic_db(struct mlx5e_ptpsq *ptpsq)
kvfree(ptpsq->ts_cqe_pending_list);
}
+static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
+{
+ struct mlx5e_ptpsq *ptpsq =
+ container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
+
+ mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
+}
+
static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
int txq_ix, struct mlx5e_ptp_params *cparams,
int tc, struct mlx5e_ptpsq *ptpsq)
@@ -451,6 +469,8 @@ static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
if (err)
goto err_free_txqsq;
+ INIT_WORK(&ptpsq->report_unhealthy_work, mlx5e_ptpsq_unhealthy_work);
+
return 0;
err_free_txqsq:
@@ -464,6 +484,8 @@ static void mlx5e_ptp_close_txqsq(struct mlx5e_ptpsq *ptpsq)
struct mlx5e_txqsq *sq = &ptpsq->txqsq;
struct mlx5_core_dev *mdev = sq->mdev;
+ if (current_work() != &ptpsq->report_unhealthy_work)
+ cancel_work_sync(&ptpsq->report_unhealthy_work);
mlx5e_ptp_free_traffic_db(ptpsq);
cancel_work_sync(&sq->recover_work);
mlx5e_ptp_destroy_sq(mdev, sq->sqn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
index 7c5597d4589d..7b700d0f956a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h
@@ -10,6 +10,7 @@
#include <linux/ktime.h>
#include <linux/ptp_classify.h>
#include <linux/time64.h>
+#include <linux/workqueue.h>
#define MLX5E_PTP_CHANNEL_IX 0
#define MLX5E_PTP_MAX_LOG_SQ_SIZE (8U)
@@ -34,6 +35,7 @@ struct mlx5e_ptpsq {
struct mlx5e_ptp_cq_stats *cq_stats;
u16 ts_cqe_ctr_mask;
+ struct work_struct report_unhealthy_work;
struct mlx5e_ptp_port_ts_cqe_list *ts_cqe_pending_list;
struct mlx5e_ptp_metadata_fifo metadata_freelist;
struct mlx5e_ptp_metadata_map metadata_map;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index b35ff289af49..ff8242f67c54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -164,6 +164,43 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
return err;
}
+static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
+{
+ struct mlx5e_ptpsq *ptpsq = ctx;
+ struct mlx5e_channels *chs;
+ struct net_device *netdev;
+ struct mlx5e_priv *priv;
+ int carrier_ok;
+ int err;
+
+ if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &ptpsq->txqsq.state))
+ return 0;
+
+ priv = ptpsq->txqsq.priv;
+
+ mutex_lock(&priv->state_lock);
+ chs = &priv->channels;
+ netdev = priv->netdev;
+
+ carrier_ok = netif_carrier_ok(netdev);
+ netif_carrier_off(netdev);
+
+ mlx5e_deactivate_priv_channels(priv);
+
+ mlx5e_ptp_close(chs->ptp);
+ err = mlx5e_ptp_open(priv, &chs->params, chs->c[0]->lag_port, &chs->ptp);
+
+ mlx5e_activate_priv_channels(priv);
+
+ /* return carrier back if needed */
+ if (carrier_ok)
+ netif_carrier_on(netdev);
+
+ mutex_unlock(&priv->state_lock);
+
+ return err;
+}
+
/* state lock cannot be grabbed within this function.
* It can cause a dead lock or a read-after-free.
*/
@@ -516,6 +553,15 @@ static int mlx5e_tx_reporter_timeout_dump(struct mlx5e_priv *priv, struct devlin
return mlx5e_tx_reporter_dump_sq(priv, fmsg, to_ctx->sq);
}
+static int mlx5e_tx_reporter_ptpsq_unhealthy_dump(struct mlx5e_priv *priv,
+ struct devlink_fmsg *fmsg,
+ void *ctx)
+{
+ struct mlx5e_ptpsq *ptpsq = ctx;
+
+ return mlx5e_tx_reporter_dump_sq(priv, fmsg, &ptpsq->txqsq);
+}
+
static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
struct devlink_fmsg *fmsg)
{
@@ -621,6 +667,25 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
return to_ctx.status;
}
+void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
+{
+ struct mlx5e_ptp_metadata_map *map = &ptpsq->metadata_map;
+ char err_str[MLX5E_REPORTER_PER_Q_MAX_LEN];
+ struct mlx5e_txqsq *txqsq = &ptpsq->txqsq;
+ struct mlx5e_cq *ts_cq = &ptpsq->ts_cq;
+ struct mlx5e_priv *priv = txqsq->priv;
+ struct mlx5e_err_ctx err_ctx = {};
+
+ err_ctx.ctx = ptpsq;
+ err_ctx.recover = mlx5e_tx_reporter_ptpsq_unhealthy_recover;
+ err_ctx.dump = mlx5e_tx_reporter_ptpsq_unhealthy_dump;
+ snprintf(err_str, sizeof(err_str),
+ "Unhealthy TX port TS queue: %d, SQ: 0x%x, CQ: 0x%x, Undelivered CQEs: %u Map Capacity: %u",
+ txqsq->ch_ix, txqsq->sqn, ts_cq->mcq.cqn, map->undelivered_counter, map->capacity);
+
+ mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
+}
+
static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
.name = "tx",
.recover = mlx5e_tx_reporter_recover,
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 04/14] net/mlx5: Expose max possible SFs via devlink resource
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (2 preceding siblings ...)
2023-08-14 21:41 ` [net-next 03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 05/14] net/mlx5: Check with FW that sync reset completed successfully Saeed Mahameed
` (9 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh,
Jiri Pirko
From: Shay Drory <shayd@nvidia.com>
Introduce devlink resource for exposing max possible SFs on mlx5
devices.
For example:
$ devlink resource show pci/0000:00:0b.0
pci/0000:00:0b.0:
name max_local_SFs size 5 unit entry dpipe_tables none
name max_external_SFs size 0 unit entry dpipe_tables none
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/devlink.h | 8 ++++
.../ethernet/mellanox/mlx5/core/sf/hw_table.c | 44 +++++++++++++++++--
2 files changed, 48 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index defba5bd91d9..961f75da6227 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -6,6 +6,14 @@
#include <net/devlink.h>
+enum mlx5_devlink_resource_id {
+ MLX5_DL_RES_MAX_LOCAL_SFS = 1,
+ MLX5_DL_RES_MAX_EXTERNAL_SFS,
+
+ __MLX5_ID_RES_MAX,
+ MLX5_ID_RES_MAX = __MLX5_ID_RES_MAX - 1,
+};
+
enum mlx5_devlink_param_id {
MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
index 17aa348989cb..c4daeaaafead 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -9,6 +9,7 @@
#include "mlx5_core.h"
#include "eswitch.h"
#include "diag/sf_tracepoint.h"
+#include "devlink.h"
struct mlx5_sf_hw {
u32 usr_sfnum;
@@ -243,6 +244,32 @@ static void mlx5_sf_hw_table_hwc_cleanup(struct mlx5_sf_hwc_table *hwc)
kfree(hwc->sfs);
}
+static void mlx5_sf_hw_table_res_unregister(struct mlx5_core_dev *dev)
+{
+ devl_resources_unregister(priv_to_devlink(dev));
+}
+
+static int mlx5_sf_hw_table_res_register(struct mlx5_core_dev *dev, u16 max_fn,
+ u16 max_ext_fn)
+{
+ struct devlink_resource_size_params size_params;
+ struct devlink *devlink = priv_to_devlink(dev);
+ int err;
+
+ devlink_resource_size_params_init(&size_params, max_fn, max_fn, 1,
+ DEVLINK_RESOURCE_UNIT_ENTRY);
+ err = devl_resource_register(devlink, "max_local_SFs", max_fn, MLX5_DL_RES_MAX_LOCAL_SFS,
+ DEVLINK_RESOURCE_ID_PARENT_TOP, &size_params);
+ if (err)
+ return err;
+
+ devlink_resource_size_params_init(&size_params, max_ext_fn, max_ext_fn, 1,
+ DEVLINK_RESOURCE_UNIT_ENTRY);
+ return devl_resource_register(devlink, "max_external_SFs", max_ext_fn,
+ MLX5_DL_RES_MAX_EXTERNAL_SFS, DEVLINK_RESOURCE_ID_PARENT_TOP,
+ &size_params);
+}
+
int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
{
struct mlx5_sf_hw_table *table;
@@ -262,12 +289,17 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
if (err)
return err;
+ if (mlx5_sf_hw_table_res_register(dev, max_fn, max_ext_fn))
+ mlx5_core_dbg(dev, "failed to register max SFs resources");
+
if (!max_fn && !max_ext_fn)
return 0;
table = kzalloc(sizeof(*table), GFP_KERNEL);
- if (!table)
- return -ENOMEM;
+ if (!table) {
+ err = -ENOMEM;
+ goto alloc_err;
+ }
mutex_init(&table->table_lock);
table->dev = dev;
@@ -291,6 +323,8 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
table_err:
mutex_destroy(&table->table_lock);
kfree(table);
+alloc_err:
+ mlx5_sf_hw_table_res_unregister(dev);
return err;
}
@@ -299,12 +333,14 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
if (!table)
- return;
+ goto res_unregister;
- mutex_destroy(&table->table_lock);
mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]);
mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]);
+ mutex_destroy(&table->table_lock);
kfree(table);
+res_unregister:
+ mlx5_sf_hw_table_res_unregister(dev);
}
static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data)
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 05/14] net/mlx5: Check with FW that sync reset completed successfully
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (3 preceding siblings ...)
2023-08-14 21:41 ` [net-next 04/14] net/mlx5: Expose max possible SFs via devlink resource Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 06/14] net/mlx5: E-switch, Add checking for flow rule destinations Saeed Mahameed
` (8 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Moshe Shemesh, Shay Drory
From: Moshe Shemesh <moshe@nvidia.com>
Even if the PF driver had no error on his part of the sync reset flow,
the firmware can see wider picture as it syncs all the PFs in the flow.
So add at end of sync reset flow check with firmware by reading MFRL
register and initialization segment that the flow had no issue from
firmware point of view too.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/devlink.c | 3 ++
.../ethernet/mellanox/mlx5/core/fw_reset.c | 39 ++++++++++++++++---
.../ethernet/mellanox/mlx5/core/fw_reset.h | 2 +
include/linux/mlx5/mlx5_ifc.h | 3 +-
4 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 3d82ec890666..af8460bb257b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -212,6 +212,9 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
/* On fw_activate action, also driver is reloaded and reinit performed */
*actions_performed |= BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
ret = mlx5_load_one_devl_locked(dev, true);
+ if (ret)
+ return ret;
+ ret = mlx5_fw_reset_verify_fw_complete(dev, extack);
break;
default:
/* Unsupported action should not get to this function */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 4804990b7f22..e87766f91150 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -127,17 +127,23 @@ static int mlx5_fw_reset_get_reset_state_err(struct mlx5_core_dev *dev,
if (mlx5_reg_mfrl_query(dev, NULL, NULL, &reset_state))
goto out;
+ if (!reset_state)
+ return 0;
+
switch (reset_state) {
case MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION:
case MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS:
- NL_SET_ERR_MSG_MOD(extack, "Sync reset was already triggered");
+ NL_SET_ERR_MSG_MOD(extack, "Sync reset still in progress");
return -EBUSY;
- case MLX5_MFRL_REG_RESET_STATE_TIMEOUT:
- NL_SET_ERR_MSG_MOD(extack, "Sync reset got timeout");
+ case MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT:
+ NL_SET_ERR_MSG_MOD(extack, "Sync reset negotiation timeout");
return -ETIMEDOUT;
case MLX5_MFRL_REG_RESET_STATE_NACK:
NL_SET_ERR_MSG_MOD(extack, "One of the hosts disabled reset");
return -EPERM;
+ case MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT:
+ NL_SET_ERR_MSG_MOD(extack, "Sync reset unload timeout");
+ return -ETIMEDOUT;
}
out:
@@ -151,7 +157,7 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel,
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
u32 out[MLX5_ST_SZ_DW(mfrl_reg)] = {};
u32 in[MLX5_ST_SZ_DW(mfrl_reg)] = {};
- int err;
+ int err, rst_res;
set_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
@@ -164,13 +170,34 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel,
return 0;
clear_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
- if (err == -EREMOTEIO && MLX5_CAP_MCAM_FEATURE(dev, reset_state))
- return mlx5_fw_reset_get_reset_state_err(dev, extack);
+ if (err == -EREMOTEIO && MLX5_CAP_MCAM_FEATURE(dev, reset_state)) {
+ rst_res = mlx5_fw_reset_get_reset_state_err(dev, extack);
+ return rst_res ? rst_res : err;
+ }
NL_SET_ERR_MSG_MOD(extack, "Sync reset command failed");
return mlx5_cmd_check(dev, err, in, out);
}
+int mlx5_fw_reset_verify_fw_complete(struct mlx5_core_dev *dev,
+ struct netlink_ext_ack *extack)
+{
+ u8 rst_state;
+ int err;
+
+ err = mlx5_fw_reset_get_reset_state_err(dev, extack);
+ if (err)
+ return err;
+
+ rst_state = mlx5_get_fw_rst_state(dev);
+ if (!rst_state)
+ return 0;
+
+ mlx5_core_err(dev, "Sync reset did not complete, state=%d\n", rst_state);
+ NL_SET_ERR_MSG_MOD(extack, "Sync reset did not complete successfully");
+ return rst_state;
+}
+
int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev)
{
return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL0, 0, 0, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
index c57465595f7c..ea527d06a85f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
@@ -12,6 +12,8 @@ int mlx5_fw_reset_set_reset_sync(struct mlx5_core_dev *dev, u8 reset_type_sel,
int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev);
int mlx5_fw_reset_wait_reset_done(struct mlx5_core_dev *dev);
+int mlx5_fw_reset_verify_fw_complete(struct mlx5_core_dev *dev,
+ struct netlink_ext_ack *extack);
void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev);
void mlx5_fw_reset_events_stop(struct mlx5_core_dev *dev);
void mlx5_drain_fw_reset(struct mlx5_core_dev *dev);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 87fd6f9ed82c..9aed7e9b9f29 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -10858,8 +10858,9 @@ enum {
MLX5_MFRL_REG_RESET_STATE_IDLE = 0,
MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION = 1,
MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS = 2,
- MLX5_MFRL_REG_RESET_STATE_TIMEOUT = 3,
+ MLX5_MFRL_REG_RESET_STATE_NEG_TIMEOUT = 3,
MLX5_MFRL_REG_RESET_STATE_NACK = 4,
+ MLX5_MFRL_REG_RESET_STATE_UNLOAD_TIMEOUT = 5,
};
enum {
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 06/14] net/mlx5: E-switch, Add checking for flow rule destinations
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (4 preceding siblings ...)
2023-08-14 21:41 ` [net-next 05/14] net/mlx5: Check with FW that sync reset completed successfully Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put() Saeed Mahameed
` (7 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jianbo Liu
From: Jianbo Liu <jianbol@nvidia.com>
Firmware doesn't allow flow rules in FDB to do header rewrite and send
packets to both internal and uplink vports. The following syndrome
will be generated when trying to offload such kind of rules:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 23569): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8c8f08), err(-22)
To avoid this syndrome, add a checking before creating FTE. If a rule
with header rewrite action forwards packets to both VF and PF, an
error is returned directly.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../mellanox/mlx5/core/eswitch_offloads.c | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index e391535e1ab1..46b8c60ac39a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -535,6 +535,28 @@ esw_src_port_rewrite_supported(struct mlx5_eswitch *esw)
MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, ignore_flow_level);
}
+static bool
+esw_dests_to_vf_pf_vports(struct mlx5_flow_destination *dests, int max_dest)
+{
+ bool vf_dest = false, pf_dest = false;
+ int i;
+
+ for (i = 0; i < max_dest; i++) {
+ if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT)
+ continue;
+
+ if (dests[i].vport.num == MLX5_VPORT_UPLINK)
+ pf_dest = true;
+ else
+ vf_dest = true;
+
+ if (vf_dest && pf_dest)
+ return true;
+ }
+
+ return false;
+}
+
static int
esw_setup_dests(struct mlx5_flow_destination *dest,
struct mlx5_flow_act *flow_act,
@@ -671,6 +693,15 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
rule = ERR_PTR(err);
goto err_create_goto_table;
}
+
+ /* Header rewrite with combined wire+loopback in FDB is not allowed */
+ if ((flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) &&
+ esw_dests_to_vf_pf_vports(dest, i)) {
+ esw_warn(esw->dev,
+ "FDB: Header rewrite with forwarding to both PF and VF is not allowed\n");
+ rule = ERR_PTR(-EINVAL);
+ goto err_esw_get;
+ }
}
if (esw_attr->decap_pkt_reformat)
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put()
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (5 preceding siblings ...)
2023-08-14 21:41 ` [net-next 06/14] net/mlx5: E-switch, Add checking for flow rule destinations Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() Saeed Mahameed
` (6 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko
From: Jiri Pirko <jiri@nvidia.com>
Instead of using device_put(), use auxiliary_device_uninit() for
auxiliary device uninit which internally just calls device_put().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
index 8e2abbab05f0..b2c849b8c0c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -129,7 +129,7 @@ static void mlx5_sf_dev_add(struct mlx5_core_dev *dev, u16 sf_index, u16 fn_id,
err = auxiliary_device_add(&sf_dev->adev);
if (err) {
- put_device(&sf_dev->adev.dev);
+ auxiliary_device_uninit(&sf_dev->adev);
goto add_err;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init()
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (6 preceding siblings ...)
2023-08-14 21:41 ` [net-next 07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put() Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() Saeed Mahameed
` (5 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko, Shay Drory
From: Jiri Pirko <jiri@nvidia.com>
Since mlx5_sf_supported() check is done as a first thing in
mlx5_sf_max_functions(), remove the redundant check.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
index c4daeaaafead..1f613320fe07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -275,15 +275,14 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
struct mlx5_sf_hw_table *table;
u16 max_ext_fn = 0;
u16 ext_base_id = 0;
- u16 max_fn = 0;
u16 base_id;
+ u16 max_fn;
int err;
if (!mlx5_vhca_event_supported(dev))
return 0;
- if (mlx5_sf_supported(dev))
- max_fn = mlx5_sf_max_functions(dev);
+ max_fn = mlx5_sf_max_functions(dev);
err = mlx5_esw_sf_max_hpf_functions(dev, &max_ext_fn, &ext_base_id);
if (err)
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN()
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (7 preceding siblings ...)
2023-08-14 21:41 ` [net-next 08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported() Saeed Mahameed
` (4 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko, Shay Drory
From: Jiri Pirko <jiri@nvidia.com>
There is a helper called mlx5_sf_start_function_id() that
wraps up a query to get base SF function id. Use that instead of
calling MLX5_CAP_GEN() directly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
index b2c849b8c0c9..39132a6cc68b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -167,7 +167,7 @@ mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_
if (!max_functions)
return 0;
- base_id = MLX5_CAP_GEN(table->dev, sf_base_id);
+ base_id = mlx5_sf_start_function_id(table->dev);
if (event->function_id < base_id || event->function_id >= (base_id + max_functions))
return 0;
@@ -209,7 +209,7 @@ static int mlx5_sf_dev_vhca_arm_all(struct mlx5_sf_dev_table *table)
int i;
max_functions = mlx5_sf_max_functions(dev);
- function_id = MLX5_CAP_GEN(dev, sf_base_id);
+ function_id = mlx5_sf_start_function_id(dev);
/* Arm the vhca context as the vhca event notifier */
for (i = 0; i < max_functions; i++) {
err = mlx5_vhca_event_arm(dev, function_id);
@@ -234,7 +234,7 @@ static void mlx5_sf_dev_add_active_work(struct work_struct *work)
int i;
max_functions = mlx5_sf_max_functions(dev);
- function_id = MLX5_CAP_GEN(dev, sf_base_id);
+ function_id = mlx5_sf_start_function_id(dev);
for (i = 0; i < max_functions; i++, function_id++) {
if (table->stop_active_wq)
return;
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported()
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (8 preceding siblings ...)
2023-08-14 21:41 ` [net-next 09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() Saeed Mahameed
` (3 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko, Shay Drory
From: Jiri Pirko <jiri@nvidia.com>
Since mlx5_vhca_event_supported() is called in mlx5_sf_dev_supported(),
remove the redundant call.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
index 39132a6cc68b..e617a270d74a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -299,7 +299,7 @@ void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev)
unsigned int max_sfs;
int err;
- if (!mlx5_sf_dev_supported(dev) || !mlx5_vhca_event_supported(dev))
+ if (!mlx5_sf_dev_supported(dev))
return;
table = kzalloc(sizeof(*table), GFP_KERNEL);
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler()
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (9 preceding siblings ...)
2023-08-14 21:41 ` [net-next 10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported() Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 12/14] net/mlx5: Remove unused CAPs Saeed Mahameed
` (2 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko, Shay Drory
From: Jiri Pirko <jiri@nvidia.com>
sw_function_id contains sfnum, so fix the error message to name the
value properly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
index e617a270d74a..05e148db9889 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -185,7 +185,7 @@ mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_
mlx5_sf_dev_del(table->dev, sf_dev, sf_index);
else
mlx5_core_err(table->dev,
- "SF DEV: teardown state for invalid dev index=%d fn_id=0x%x\n",
+ "SF DEV: teardown state for invalid dev index=%d sfnum=0x%x\n",
sf_index, event->sw_function_id);
break;
case MLX5_VHCA_STATE_ACTIVE:
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 12/14] net/mlx5: Remove unused CAPs
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (10 preceding siblings ...)
2023-08-14 21:41 ` [net-next 11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 13/14] net/mlx5: Remove unused MAX HCA capabilities Saeed Mahameed
2023-08-14 21:41 ` [net-next 14/14] net/mlx5: Don't query MAX caps twice Saeed Mahameed
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Maher Sanalla
From: Shay Drory <shayd@nvidia.com>
mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.
Thus, drop all usages and definitions of the mentioned caps above.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 13 ------
.../net/ethernet/mellanox/mlx5/core/main.c | 2 -
include/linux/mlx5/device.h | 17 +-------
include/linux/mlx5/mlx5_ifc.h | 43 -------------------
4 files changed, 1 insertion(+), 74 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index fb2035a5ec99..c982ce06d7a3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -206,12 +206,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
- if (MLX5_CAP_GEN(dev, vector_calc)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_VECTOR_CALC);
- if (err)
- return err;
- }
-
if (MLX5_CAP_GEN(dev, qos)) {
err = mlx5_core_get_caps(dev, MLX5_CAP_QOS);
if (err)
@@ -226,7 +220,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, mcam_reg)) {
mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_FIRST_128);
- mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9080_0x90FF);
mlx5_get_mcam_access_reg_group(dev, MLX5_MCAM_REGS_0x9100_0x917F);
}
@@ -270,12 +263,6 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
return err;
}
- if (MLX5_CAP_GEN(dev, shampo)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_SHAMPO);
- if (err)
- return err;
- }
-
if (MLX5_CAP_GEN_64(dev, general_obj_types) &
MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD) {
err = mlx5_core_get_caps(dev, MLX5_CAP_MACSEC);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index f4fe06a5042e..cb77b5774aad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1714,7 +1714,6 @@ static const int types[] = {
MLX5_CAP_FLOW_TABLE,
MLX5_CAP_ESWITCH_FLOW_TABLE,
MLX5_CAP_ESWITCH,
- MLX5_CAP_VECTOR_CALC,
MLX5_CAP_QOS,
MLX5_CAP_DEBUG,
MLX5_CAP_DEV_MEM,
@@ -1723,7 +1722,6 @@ static const int types[] = {
MLX5_CAP_VDPA_EMULATION,
MLX5_CAP_IPSEC,
MLX5_CAP_PORT_SELECTION,
- MLX5_CAP_DEV_SHAMPO,
MLX5_CAP_MACSEC,
MLX5_CAP_ADV_VIRTUALIZATION,
MLX5_CAP_CRYPTO,
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 80cc12a9a531..5041965250f6 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1208,9 +1208,7 @@ enum mlx5_cap_type {
MLX5_CAP_FLOW_TABLE,
MLX5_CAP_ESWITCH_FLOW_TABLE,
MLX5_CAP_ESWITCH,
- MLX5_CAP_RESERVED,
- MLX5_CAP_VECTOR_CALC,
- MLX5_CAP_QOS,
+ MLX5_CAP_QOS = 0xc,
MLX5_CAP_DEBUG,
MLX5_CAP_RESERVED_14,
MLX5_CAP_DEV_MEM,
@@ -1220,7 +1218,6 @@ enum mlx5_cap_type {
MLX5_CAP_DEV_EVENT = 0x14,
MLX5_CAP_IPSEC,
MLX5_CAP_CRYPTO = 0x1a,
- MLX5_CAP_DEV_SHAMPO = 0x1d,
MLX5_CAP_MACSEC = 0x1f,
MLX5_CAP_GENERAL_2 = 0x20,
MLX5_CAP_PORT_SELECTION = 0x25,
@@ -1239,7 +1236,6 @@ enum mlx5_pcam_feature_groups {
enum mlx5_mcam_reg_groups {
MLX5_MCAM_REGS_FIRST_128 = 0x0,
- MLX5_MCAM_REGS_0x9080_0x90FF = 0x1,
MLX5_MCAM_REGS_0x9100_0x917F = 0x2,
MLX5_MCAM_REGS_NUM = 0x3,
};
@@ -1416,10 +1412,6 @@ enum mlx5_qcam_feature_groups {
#define MLX5_CAP_ODP_MAX(mdev, cap)\
MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->max, cap)
-#define MLX5_CAP_VECTOR_CALC(mdev, cap) \
- MLX5_GET(vector_calc_cap, \
- mdev->caps.hca[MLX5_CAP_VECTOR_CALC]->cur, cap)
-
#define MLX5_CAP_QOS(mdev, cap)\
MLX5_GET(qos_cap, mdev->caps.hca[MLX5_CAP_QOS]->cur, cap)
@@ -1436,10 +1428,6 @@ enum mlx5_qcam_feature_groups {
MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_FIRST_128], \
mng_access_reg_cap_mask.access_regs.reg)
-#define MLX5_CAP_MCAM_REG1(mdev, reg) \
- MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9080_0x90FF], \
- mng_access_reg_cap_mask.access_regs1.reg)
-
#define MLX5_CAP_MCAM_REG2(mdev, reg) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam[MLX5_MCAM_REGS_0x9100_0x917F], \
mng_access_reg_cap_mask.access_regs2.reg)
@@ -1485,9 +1473,6 @@ enum mlx5_qcam_feature_groups {
#define MLX5_CAP_CRYPTO(mdev, cap)\
MLX5_GET(crypto_cap, (mdev)->caps.hca[MLX5_CAP_CRYPTO]->cur, cap)
-#define MLX5_CAP_DEV_SHAMPO(mdev, cap)\
- MLX5_GET(shampo_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_SHAMPO], cap)
-
#define MLX5_CAP_MACSEC(mdev, cap)\
MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 9aed7e9b9f29..08dcb1f43be7 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1314,33 +1314,6 @@ struct mlx5_ifc_odp_cap_bits {
u8 reserved_at_120[0x6E0];
};
-struct mlx5_ifc_calc_op {
- u8 reserved_at_0[0x10];
- u8 reserved_at_10[0x9];
- u8 op_swap_endianness[0x1];
- u8 op_min[0x1];
- u8 op_xor[0x1];
- u8 op_or[0x1];
- u8 op_and[0x1];
- u8 op_max[0x1];
- u8 op_add[0x1];
-};
-
-struct mlx5_ifc_vector_calc_cap_bits {
- u8 calc_matrix[0x1];
- u8 reserved_at_1[0x1f];
- u8 reserved_at_20[0x8];
- u8 max_vec_count[0x8];
- u8 reserved_at_30[0xd];
- u8 max_chunk_size[0x3];
- struct mlx5_ifc_calc_op calc0;
- struct mlx5_ifc_calc_op calc1;
- struct mlx5_ifc_calc_op calc2;
- struct mlx5_ifc_calc_op calc3;
-
- u8 reserved_at_c0[0x720];
-};
-
struct mlx5_ifc_tls_cap_bits {
u8 tls_1_2_aes_gcm_128[0x1];
u8 tls_1_3_aes_gcm_128[0x1];
@@ -3435,20 +3408,6 @@ struct mlx5_ifc_roce_addr_layout_bits {
u8 reserved_at_e0[0x20];
};
-struct mlx5_ifc_shampo_cap_bits {
- u8 reserved_at_0[0x3];
- u8 shampo_log_max_reservation_size[0x5];
- u8 reserved_at_8[0x3];
- u8 shampo_log_min_reservation_size[0x5];
- u8 shampo_min_mss_size[0x10];
-
- u8 reserved_at_20[0x3];
- u8 shampo_max_log_headers_entry_size[0x5];
- u8 reserved_at_28[0x18];
-
- u8 reserved_at_40[0x7c0];
-};
-
struct mlx5_ifc_crypto_cap_bits {
u8 reserved_at_0[0x3];
u8 synchronize_dek[0x1];
@@ -3484,14 +3443,12 @@ union mlx5_ifc_hca_cap_union_bits {
struct mlx5_ifc_flow_table_eswitch_cap_bits flow_table_eswitch_cap;
struct mlx5_ifc_e_switch_cap_bits e_switch_cap;
struct mlx5_ifc_port_selection_cap_bits port_selection_cap;
- struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap;
struct mlx5_ifc_qos_cap_bits qos_cap;
struct mlx5_ifc_debug_cap_bits debug_cap;
struct mlx5_ifc_fpga_cap_bits fpga_cap;
struct mlx5_ifc_tls_cap_bits tls_cap;
struct mlx5_ifc_device_mem_cap_bits device_mem_cap;
struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap;
- struct mlx5_ifc_shampo_cap_bits shampo_cap;
struct mlx5_ifc_macsec_cap_bits macsec_cap;
struct mlx5_ifc_crypto_cap_bits crypto_cap;
u8 reserved_at_0[0x8000];
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 13/14] net/mlx5: Remove unused MAX HCA capabilities
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (11 preceding siblings ...)
2023-08-14 21:41 ` [net-next 12/14] net/mlx5: Remove unused CAPs Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
2023-08-14 21:41 ` [net-next 14/14] net/mlx5: Don't query MAX caps twice Saeed Mahameed
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Maher Sanalla
From: Shay Drory <shayd@nvidia.com>
Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is never used.
Hence, remove all driver queries of the MAX mode of the said caps as
well as their helper MACROs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 34 ++++++------
.../net/ethernet/mellanox/mlx5/core/main.c | 14 ++---
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 3 ++
include/linux/mlx5/device.h | 52 -------------------
include/linux/mlx5/driver.h | 1 -
5 files changed, 30 insertions(+), 74 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index c982ce06d7a3..0e394408af75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -160,13 +160,15 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
}
if (MLX5_CAP_GEN(dev, eth_net_offloads)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ETHERNET_OFFLOADS,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_IPOIB_ENHANCED_OFFLOADS);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_IPOIB_ENHANCED_OFFLOADS,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
@@ -191,29 +193,30 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, nic_flow_table) ||
MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_FLOW_TABLE, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_ESWITCH_MANAGER(dev)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH_FLOW_TABLE);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ESWITCH_FLOW_TABLE,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
- err = mlx5_core_get_caps(dev, MLX5_CAP_ESWITCH);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ESWITCH, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, qos)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_QOS);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_QOS, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, debug))
- mlx5_core_get_caps(dev, MLX5_CAP_DEBUG);
+ mlx5_core_get_caps_mode(dev, MLX5_CAP_DEBUG, HCA_CAP_OPMOD_GET_CUR);
if (MLX5_CAP_GEN(dev, pcam_reg))
mlx5_get_pcam_reg(dev);
@@ -227,51 +230,52 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
mlx5_get_qcam_reg(dev);
if (MLX5_CAP_GEN(dev, device_memory)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_MEM);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_DEV_MEM, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, event_cap)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_DEV_EVENT);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_DEV_EVENT, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, tls_tx) || MLX5_CAP_GEN(dev, tls_rx)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_TLS);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_TLS, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN_64(dev, general_obj_types) &
MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_VDPA_EMULATION, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, ipsec_offload)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_IPSEC);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_IPSEC, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, crypto)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_CRYPTO);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_CRYPTO, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN_64(dev, general_obj_types) &
MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_MACSEC);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_MACSEC, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, adv_virtualization)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ADV_VIRTUALIZATION);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ADV_VIRTUALIZATION,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index cb77b5774aad..15561965d2af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -361,9 +361,8 @@ void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *dev)
}
EXPORT_SYMBOL(mlx5_core_uplink_netdev_event_replay);
-static int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev,
- enum mlx5_cap_type cap_type,
- enum mlx5_cap_mode cap_mode)
+int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type,
+ enum mlx5_cap_mode cap_mode)
{
u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)];
int out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
@@ -1620,21 +1619,24 @@ static int mlx5_query_hca_caps_light(struct mlx5_core_dev *dev)
return err;
if (MLX5_CAP_GEN(dev, eth_net_offloads)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ETHERNET_OFFLOADS,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, nic_flow_table) ||
MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_FLOW_TABLE,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN_64(dev, general_obj_types) &
MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_VDPA_EMULATION,
+ HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 8e2028d20a9e..124352459c23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -174,6 +174,9 @@ static inline int mlx5_flexible_inlen(struct mlx5_core_dev *dev, size_t fixed,
#define MLX5_FLEXIBLE_INLEN(dev, fixed, item_size, num_items) \
mlx5_flexible_inlen(dev, fixed, item_size, num_items, __func__, __LINE__)
+int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type);
+int mlx5_core_get_caps_mode(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type,
+ enum mlx5_cap_mode cap_mode);
int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
int mlx5_query_board_id(struct mlx5_core_dev *dev);
int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 5041965250f6..93399802ba77 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1275,10 +1275,6 @@ enum mlx5_qcam_feature_groups {
MLX5_GET(per_protocol_networking_offload_caps,\
mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->cur, cap)
-#define MLX5_CAP_ETH_MAX(mdev, cap) \
- MLX5_GET(per_protocol_networking_offload_caps,\
- mdev->caps.hca[MLX5_CAP_ETHERNET_OFFLOADS]->max, cap)
-
#define MLX5_CAP_IPOIB_ENHANCED(mdev, cap) \
MLX5_GET(per_protocol_networking_offload_caps,\
mdev->caps.hca[MLX5_CAP_IPOIB_ENHANCED_OFFLOADS]->cur, cap)
@@ -1301,77 +1297,40 @@ enum mlx5_qcam_feature_groups {
#define MLX5_CAP64_FLOWTABLE(mdev, cap) \
MLX5_GET64(flow_table_nic_cap, (mdev)->caps.hca[MLX5_CAP_FLOW_TABLE]->cur, cap)
-#define MLX5_CAP_FLOWTABLE_MAX(mdev, cap) \
- MLX5_GET(flow_table_nic_cap, mdev->caps.hca[MLX5_CAP_FLOW_TABLE]->max, cap)
-
#define MLX5_CAP_FLOWTABLE_NIC_RX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive.cap)
-#define MLX5_CAP_FLOWTABLE_NIC_RX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive.cap)
-
#define MLX5_CAP_FLOWTABLE_NIC_TX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit.cap)
-#define MLX5_CAP_FLOWTABLE_NIC_TX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit.cap)
-
#define MLX5_CAP_FLOWTABLE_SNIFFER_RX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_sniffer.cap)
-#define MLX5_CAP_FLOWTABLE_SNIFFER_RX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_sniffer.cap)
-
#define MLX5_CAP_FLOWTABLE_SNIFFER_TX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_sniffer.cap)
-#define MLX5_CAP_FLOWTABLE_SNIFFER_TX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_sniffer.cap)
-
#define MLX5_CAP_FLOWTABLE_RDMA_RX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_receive_rdma.cap)
-#define MLX5_CAP_FLOWTABLE_RDMA_RX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_receive_rdma.cap)
-
#define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \
MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap)
-#define MLX5_CAP_FLOWTABLE_RDMA_TX_MAX(mdev, cap) \
- MLX5_CAP_FLOWTABLE_MAX(mdev, flow_table_properties_nic_transmit_rdma.cap)
-
#define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \
MLX5_GET(flow_table_eswitch_cap, \
mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap)
-#define MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, cap) \
- MLX5_GET(flow_table_eswitch_cap, \
- mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->max, cap)
-
#define MLX5_CAP_ESW_FLOWTABLE_FDB(mdev, cap) \
MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_nic_esw_fdb.cap)
-#define MLX5_CAP_ESW_FLOWTABLE_FDB_MAX(mdev, cap) \
- MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_nic_esw_fdb.cap)
-
#define MLX5_CAP_ESW_EGRESS_ACL(mdev, cap) \
MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_egress.cap)
-#define MLX5_CAP_ESW_EGRESS_ACL_MAX(mdev, cap) \
- MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_egress.cap)
-
#define MLX5_CAP_ESW_INGRESS_ACL(mdev, cap) \
MLX5_CAP_ESW_FLOWTABLE(mdev, flow_table_properties_esw_acl_ingress.cap)
-#define MLX5_CAP_ESW_INGRESS_ACL_MAX(mdev, cap) \
- MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, flow_table_properties_esw_acl_ingress.cap)
-
#define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2(mdev, cap) \
MLX5_CAP_ESW_FLOWTABLE(mdev, ft_field_support_2_esw_fdb.cap)
-#define MLX5_CAP_ESW_FT_FIELD_SUPPORT_2_MAX(mdev, cap) \
- MLX5_CAP_ESW_FLOWTABLE_MAX(mdev, ft_field_support_2_esw_fdb.cap)
-
#define MLX5_CAP_ESW(mdev, cap) \
MLX5_GET(e_switch_cap, \
mdev->caps.hca[MLX5_CAP_ESWITCH]->cur, cap)
@@ -1380,10 +1339,6 @@ enum mlx5_qcam_feature_groups {
MLX5_GET64(flow_table_eswitch_cap, \
(mdev)->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap)
-#define MLX5_CAP_ESW_MAX(mdev, cap) \
- MLX5_GET(e_switch_cap, \
- mdev->caps.hca[MLX5_CAP_ESWITCH]->max, cap)
-
#define MLX5_CAP_PORT_SELECTION(mdev, cap) \
MLX5_GET(port_selection_cap, \
mdev->caps.hca[MLX5_CAP_PORT_SELECTION]->cur, cap)
@@ -1396,16 +1351,9 @@ enum mlx5_qcam_feature_groups {
MLX5_GET(adv_virtualization_cap, \
mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap)
-#define MLX5_CAP_ADV_VIRTUALIZATION_MAX(mdev, cap) \
- MLX5_GET(adv_virtualization_cap, \
- mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->max, cap)
-
#define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \
MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap)
-#define MLX5_CAP_FLOWTABLE_PORT_SELECTION_MAX(mdev, cap) \
- MLX5_CAP_PORT_SELECTION_MAX(mdev, flow_table_properties_port_selection.cap)
-
#define MLX5_CAP_ODP(mdev, cap)\
MLX5_GET(odp_cap, mdev->caps.hca[MLX5_CAP_ODP]->cur, cap)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e1c7e502a4fc..c9d82e74daaa 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1022,7 +1022,6 @@ bool mlx5_cmd_is_down(struct mlx5_core_dev *dev);
void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev);
void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev);
-int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type);
void mlx5_health_cleanup(struct mlx5_core_dev *dev);
int mlx5_health_init(struct mlx5_core_dev *dev);
void mlx5_start_health_poll(struct mlx5_core_dev *dev);
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [net-next 14/14] net/mlx5: Don't query MAX caps twice
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
` (12 preceding siblings ...)
2023-08-14 21:41 ` [net-next 13/14] net/mlx5: Remove unused MAX HCA capabilities Saeed Mahameed
@ 2023-08-14 21:41 ` Saeed Mahameed
13 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2023-08-14 21:41 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Maher Sanalla
From: Shay Drory <shayd@nvidia.com>
Whenever mlx5 driver is probed or reloaded, it queries some capabilities
in MAX mode via set_hca_cap() API. Afterwards, the driver queries all
capabilities in MAX mode via mlx5_query_hca_caps() API.
Since MAX caps are read only caps, querying them twice is redundant.
Hence, delete the second query.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 0e394408af75..58f4c0d0fafa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -143,18 +143,18 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
{
int err;
- err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_GENERAL, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
if (MLX5_CAP_GEN(dev, port_selection_cap)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_PORT_SELECTION);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_PORT_SELECTION, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, hca_cap_2)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL_2);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_GENERAL_2, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
@@ -174,19 +174,19 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
}
if (MLX5_CAP_GEN(dev, pg)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ODP);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ODP, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, atomic)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ATOMIC);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ATOMIC, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
if (MLX5_CAP_GEN(dev, roce)) {
- err = mlx5_core_get_caps(dev, MLX5_CAP_ROCE);
+ err = mlx5_core_get_caps_mode(dev, MLX5_CAP_ROCE, HCA_CAP_OPMOD_GET_CUR);
if (err)
return err;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst
2023-08-14 21:41 ` [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst Saeed Mahameed
@ 2023-08-16 2:40 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-16 2:40 UTC (permalink / raw)
To: Saeed Mahameed
Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt,
rrameshbabu, jiri, gal
Hello:
This series was applied to netdev/net-next.git (main)
by Saeed Mahameed <saeedm@nvidia.com>:
On Mon, 14 Aug 2023 14:41:31 -0700 you wrote:
> From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
>
> De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead,
> only use the generic devlink documentation directory to document mlx5
> devlink parameters. Avoid providing general devlink tool usage information
> in mlx5-specific documentation.
>
> [...]
Here is the summary with links:
- [net-next,01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst
https://git.kernel.org/netdev/net-next/c/b608dd670bb6
- [net-next,02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs
https://git.kernel.org/netdev/net-next/c/3178308ad4ca
- [net-next,03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ
https://git.kernel.org/netdev/net-next/c/53b836a44db4
- [net-next,04/14] net/mlx5: Expose max possible SFs via devlink resource
https://git.kernel.org/netdev/net-next/c/6486c0f44ed8
- [net-next,05/14] net/mlx5: Check with FW that sync reset completed successfully
https://git.kernel.org/netdev/net-next/c/a9f168e4c6e1
- [net-next,06/14] net/mlx5: E-switch, Add checking for flow rule destinations
https://git.kernel.org/netdev/net-next/c/e0e22d59b47a
- [net-next,07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put()
https://git.kernel.org/netdev/net-next/c/2ad0160c02be
- [net-next,08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init()
https://git.kernel.org/netdev/net-next/c/ae80d7a06fdb
- [net-next,09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN()
https://git.kernel.org/netdev/net-next/c/88074d81e5fe
- [net-next,10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported()
https://git.kernel.org/netdev/net-next/c/b63f8bde2fba
- [net-next,11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler()
https://git.kernel.org/netdev/net-next/c/36e5a0efc810
- [net-next,12/14] net/mlx5: Remove unused CAPs
https://git.kernel.org/netdev/net-next/c/0b4eb603d635
- [net-next,13/14] net/mlx5: Remove unused MAX HCA capabilities
https://git.kernel.org/netdev/net-next/c/a41cb59117fa
- [net-next,14/14] net/mlx5: Don't query MAX caps twice
https://git.kernel.org/netdev/net-next/c/bd3a2f77809b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-08-16 2:40 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-14 21:41 [pull request][net-next 00/14] mlx5 updates 2023-08-14 Saeed Mahameed
2023-08-14 21:41 ` [net-next 01/14] net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst Saeed Mahameed
2023-08-16 2:40 ` patchwork-bot+netdevbpf
2023-08-14 21:41 ` [net-next 02/14] net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs Saeed Mahameed
2023-08-14 21:41 ` [net-next 03/14] net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ Saeed Mahameed
2023-08-14 21:41 ` [net-next 04/14] net/mlx5: Expose max possible SFs via devlink resource Saeed Mahameed
2023-08-14 21:41 ` [net-next 05/14] net/mlx5: Check with FW that sync reset completed successfully Saeed Mahameed
2023-08-14 21:41 ` [net-next 06/14] net/mlx5: E-switch, Add checking for flow rule destinations Saeed Mahameed
2023-08-14 21:41 ` [net-next 07/14] net/mlx5: Use auxiliary_device_uninit() instead of device_put() Saeed Mahameed
2023-08-14 21:41 ` [net-next 08/14] net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() Saeed Mahameed
2023-08-14 21:41 ` [net-next 09/14] net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() Saeed Mahameed
2023-08-14 21:41 ` [net-next 10/14] net/mlx5: Remove redundant check of mlx5_vhca_event_supported() Saeed Mahameed
2023-08-14 21:41 ` [net-next 11/14] net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() Saeed Mahameed
2023-08-14 21:41 ` [net-next 12/14] net/mlx5: Remove unused CAPs Saeed Mahameed
2023-08-14 21:41 ` [net-next 13/14] net/mlx5: Remove unused MAX HCA capabilities Saeed Mahameed
2023-08-14 21:41 ` [net-next 14/14] net/mlx5: Don't query MAX caps twice Saeed Mahameed
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).