From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
netdev@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>,
Maher Sanalla <msanalla@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>
Subject: [net-next 07/15] net/mlx5: Add vnic devlink health reporter to PFs/VFs
Date: Thu, 20 Apr 2023 18:38:42 -0700 [thread overview]
Message-ID: <20230421013850.349646-8-saeed@kernel.org> (raw)
In-Reply-To: <20230421013850.349646-1-saeed@kernel.org>
From: Maher Sanalla <msanalla@nvidia.com>
Create a vnic devlink health reporter for PFs/VFs interfaces.
The reporter's diagnose callback displays the values of vNIC/vport
transport debug counters of PFs/VFs, as follows:
$ devlink health diagnose pci/0000:08:00.0 reporter vnic
vNIC env counters:
total_error_queues: 0 send_queue_priority_update_flow: 0
comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
invalid_command: 0 quota_exceeded_command: 0
nic_receive_steering_discard: 0
Moreover, add documentation on the reporter functionality and the
counters description.
While at it, expose the vNIC counters diagnose function to be used by
the downstream patch, which will reveal the counters for representor
interfaces.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../ethernet/mellanox/mlx5/devlink.rst | 30 +++++
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
.../mellanox/mlx5/core/diag/reporter_vnic.c | 125 ++++++++++++++++++
.../mellanox/mlx5/core/diag/reporter_vnic.h | 16 +++
.../net/ethernet/mellanox/mlx5/core/health.c | 4 +
include/linux/mlx5/driver.h | 1 +
6 files changed, 177 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.h
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
index 0995e4e5acd7..ceab18e46456 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
@@ -257,3 +257,33 @@ User commands examples:
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
NOTE: This command can run only on PF.
+
+vnic reporter
+-------------
+The vnic reporter implements only the `diagnose` callback.
+It is responsible for querying the vnic diagnostic counters from fw and displaying
+them in realtime.
+
+Description of the vnic counters:
+total_q_under_processor_handle: number of queues in an error state due to
+an async error or errored command.
+send_queue_priority_update_flow: number of QP/SQ priority/SL update
+events.
+cq_overrun: number of times CQ entered an error state due to an
+overflow.
+async_eq_overrun: number of times an EQ mapped to async events was
+overrun.
+comp_eq_overrun: number of times an EQ mapped to completion events was
+overrun.
+quota_exceeded_command: number of commands issued and failed due to quota
+exceeded.
+invalid_command: number of commands issued and failed dues to any reason
+other than quota exceeded.
+nic_receive_steering_discard: number of packets that completed RX flow
+steering but were discarded due to a mismatch in flow table.
+
+User commands examples:
+- Diagnose PF/VF vnic counters
+ $ devlink health diagnose pci/0000:82:00.1 reporter vnic
+
+NOTE: This command can run only on PF/VF ports.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 68f6a4544f7e..ddf1e352f51d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -16,7 +16,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
transobj.o vport.o sriov.o fs_cmd.o fs_core.o pci_irq.o \
fs_counters.o fs_ft_pool.o rl.o lag/debugfs.o lag/lag.o dev.o events.o wq.o lib/gid.o \
lib/devcom.o lib/pci_vsc.o lib/dm.o lib/fs_ttc.o diag/fs_tracepoint.o \
- diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o \
+ diag/fw_tracer.o diag/crdump.o devlink.o diag/rsc_dump.o diag/reporter_vnic.o \
fw_reset.o qos.o lib/tout.o lib/aso.o
#
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
new file mode 100644
index 000000000000..9114661cd967
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. */
+
+#include "reporter_vnic.h"
+#include "devlink.h"
+
+#define VNIC_ENV_GET64(vnic_env_stats, c) \
+ MLX5_GET64(query_vnic_env_out, (vnic_env_stats)->query_vnic_env_out, \
+ vport_env.c)
+
+struct mlx5_vnic_diag_stats {
+ __be64 query_vnic_env_out[MLX5_ST_SZ_QW(query_vnic_env_out)];
+};
+
+int mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev,
+ struct devlink_fmsg *fmsg,
+ u16 vport_num, bool other_vport)
+{
+ u32 in[MLX5_ST_SZ_DW(query_vnic_env_in)] = {};
+ struct mlx5_vnic_diag_stats vnic;
+ int err;
+
+ MLX5_SET(query_vnic_env_in, in, opcode, MLX5_CMD_OP_QUERY_VNIC_ENV);
+ MLX5_SET(query_vnic_env_in, in, vport_number, vport_num);
+ MLX5_SET(query_vnic_env_in, in, other_vport, !!other_vport);
+
+ err = mlx5_cmd_exec_inout(dev, query_vnic_env, in, &vnic.query_vnic_env_out);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, "vNIC env counters");
+ if (err)
+ return err;
+
+ err = devlink_fmsg_obj_nest_start(fmsg);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "total_error_queues",
+ VNIC_ENV_GET64(&vnic, total_error_queues));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "send_queue_priority_update_flow",
+ VNIC_ENV_GET64(&vnic, send_queue_priority_update_flow));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "comp_eq_overrun",
+ VNIC_ENV_GET64(&vnic, comp_eq_overrun));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "async_eq_overrun",
+ VNIC_ENV_GET64(&vnic, async_eq_overrun));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "cq_overrun",
+ VNIC_ENV_GET64(&vnic, cq_overrun));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "invalid_command",
+ VNIC_ENV_GET64(&vnic, invalid_command));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "quota_exceeded_command",
+ VNIC_ENV_GET64(&vnic, quota_exceeded_command));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_pair_put(fmsg, "nic_receive_steering_discard",
+ VNIC_ENV_GET64(&vnic, nic_receive_steering_discard));
+ if (err)
+ return err;
+
+ err = devlink_fmsg_obj_nest_end(fmsg);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int mlx5_reporter_vnic_diagnose(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *dev = devlink_health_reporter_priv(reporter);
+
+ return mlx5_reporter_vnic_diagnose_counters(dev, fmsg, 0, false);
+}
+
+static const struct devlink_health_reporter_ops mlx5_reporter_vnic_ops = {
+ .name = "vnic",
+ .diagnose = mlx5_reporter_vnic_diagnose,
+};
+
+void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev)
+{
+ struct mlx5_core_health *health = &dev->priv.health;
+ struct devlink *devlink = priv_to_devlink(dev);
+
+ health->vnic_reporter =
+ devlink_health_reporter_create(devlink,
+ &mlx5_reporter_vnic_ops,
+ 0, dev);
+ if (IS_ERR(health->vnic_reporter))
+ mlx5_core_warn(dev,
+ "Failed to create vnic reporter, err = %ld\n",
+ PTR_ERR(health->vnic_reporter));
+}
+
+void mlx5_reporter_vnic_destroy(struct mlx5_core_dev *dev)
+{
+ struct mlx5_core_health *health = &dev->priv.health;
+
+ if (!IS_ERR_OR_NULL(health->vnic_reporter))
+ devlink_health_reporter_destroy(health->vnic_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.h
new file mode 100644
index 000000000000..eba87a39e9b1
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.
+ */
+#ifndef __MLX5_REPORTER_VNIC_H
+#define __MLX5_REPORTER_VNIC_H
+
+#include "mlx5_core.h"
+
+void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev);
+void mlx5_reporter_vnic_destroy(struct mlx5_core_dev *dev);
+
+int mlx5_reporter_vnic_diagnose_counters(struct mlx5_core_dev *dev,
+ struct devlink_fmsg *fmsg,
+ u16 vport_num, bool other_vport);
+
+#endif /* __MLX5_REPORTER_VNIC_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 016c5f99c470..871c32dda66e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -42,6 +42,7 @@
#include "lib/pci_vsc.h"
#include "lib/tout.h"
#include "diag/fw_tracer.h"
+#include "diag/reporter_vnic.h"
enum {
MAX_MISSES = 3,
@@ -898,6 +899,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev)
cancel_delayed_work_sync(&health->update_fw_log_ts_work);
destroy_workqueue(health->wq);
+ mlx5_reporter_vnic_destroy(dev);
mlx5_fw_reporters_destroy(dev);
}
@@ -907,6 +909,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
char *name;
mlx5_fw_reporters_create(dev);
+ mlx5_reporter_vnic_create(dev);
health = &dev->priv.health;
name = kmalloc(64, GFP_KERNEL);
@@ -926,6 +929,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
return 0;
out_err:
+ mlx5_reporter_vnic_destroy(dev);
mlx5_fw_reporters_destroy(dev);
return -ENOMEM;
}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 135a3c8d8237..5d25c4c73046 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -439,6 +439,7 @@ struct mlx5_core_health {
struct work_struct report_work;
struct devlink_health_reporter *fw_reporter;
struct devlink_health_reporter *fw_fatal_reporter;
+ struct devlink_health_reporter *vnic_reporter;
struct delayed_work update_fw_log_ts_work;
};
--
2.39.2
next prev parent reply other threads:[~2023-04-21 1:39 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-21 1:38 [pull request][net-next 00/15] mlx5 updates 2023-04-20 Saeed Mahameed
2023-04-21 1:38 ` [net-next 01/15] net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump Saeed Mahameed
2023-04-22 4:10 ` patchwork-bot+netdevbpf
2023-04-21 1:38 ` [net-next 02/15] net/mlx5: DR, Calculate sync threshold of each pool according to its type Saeed Mahameed
2023-04-21 1:38 ` [net-next 03/15] net/mlx5: DR, Add more info in domain dbg dump Saeed Mahameed
2023-04-21 1:38 ` [net-next 04/15] net/mlx5: DR, Add memory statistics for domain object Saeed Mahameed
2023-04-21 1:38 ` [net-next 05/15] Revert "net/mlx5: Expose steering dropped packets counter" Saeed Mahameed
2023-04-21 1:38 ` [net-next 06/15] Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" Saeed Mahameed
2023-04-21 1:38 ` Saeed Mahameed [this message]
2023-04-21 1:38 ` [net-next 08/15] net/mlx5e: Add vnic devlink health reporter to representors Saeed Mahameed
2023-04-21 1:38 ` [net-next 09/15] net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ Saeed Mahameed
2023-04-21 1:38 ` [net-next 10/15] net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case Saeed Mahameed
2023-04-21 1:38 ` [net-next 11/15] net/mlx5e: RX, Hook NAPIs to page pools Saeed Mahameed
2023-04-21 2:13 ` Jakub Kicinski
2023-04-24 11:59 ` Dragos Tatulea
2023-04-21 1:38 ` [net-next 12/15] net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn() Saeed Mahameed
2023-04-21 1:38 ` [net-next 13/15] net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc() Saeed Mahameed
2023-04-21 1:38 ` [net-next 14/15] net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() Saeed Mahameed
2023-04-21 1:38 ` [net-next 15/15] net/mlx5: Update op_mode to op_mod for port selection Saeed Mahameed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230421013850.349646-8-saeed@kernel.org \
--to=saeed@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=moshe@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).