From: Jiri Pirko <jiri@resnulli.us>
To: Eran Ben Elisha <eranbe@mellanox.com>
Cc: netdev@vger.kernel.org, Jiri Pirko <jiri@mellanox.com>,
"David S. Miller" <davem@davemloft.net>,
Ariel Almog <ariela@mellanox.com>, Aya Levin <ayal@mellanox.com>,
Moshe Shemesh <moshe@mellanox.com>
Subject: Re: [PATCH net-next v2 09/11] net/mlx5e: Add TX reporter support
Date: Sun, 20 Jan 2019 12:06:18 +0100 [thread overview]
Message-ID: <20190120110618.GB2730@nanopsycho> (raw)
In-Reply-To: <1547762360-7075-10-git-send-email-eranbe@mellanox.com>
Thu, Jan 17, 2019 at 10:59:18PM CET, eranbe@mellanox.com wrote:
>Add mlx5e tx reporter to devlink health reporters. This reporter will be
>responsible for diagnosing, reporting and recovering of TX errors.
>This patch declares the TX reporter operations and allocate it using the
>devlink health API. Currently, this reporter supports reporting and
>recovering from send error CQE only. In addition, it adds diagnose
>information for the open SQs.
>
>For a local SQ recover (due to driver error report), in case of SQ recover
>failure, the recover operation will be considered as a failure.
>For a full TX recover, an attempt to close and open the channels will be
>done. If this one passed successfully, it will be considered as a
>successful recover.
>
>The SQ recover from error CQE flow is not a new feature in the driver,
>this patch re-organize the functions and adapt them for the devlink
>health API. For this purpose, move code from en_main.c to a new file
>named reporter_tx.c.
>
>Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
>---
> .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 18 +-
> .../ethernet/mellanox/mlx5/core/en/reporter.h | 14 +
> .../mellanox/mlx5/core/en/reporter_tx.c | 321 ++++++++++++++++++
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 135 +-------
> .../net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +-
> 6 files changed, 367 insertions(+), 125 deletions(-)
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>index 9de9abacf7f6..6bb2a860b15b 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>@@ -22,7 +22,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
> #
> mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
> en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
>- en_selftest.o en/port.o en/monitor_stats.o
>+ en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
>
> #
> # Netdev extra
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>index 8fa8fdd30b85..27e276c9bf84 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>@@ -388,10 +388,7 @@ struct mlx5e_txqsq {
> struct mlx5e_channel *channel;
> int txq_ix;
> u32 rate_limit;
>- struct mlx5e_txqsq_recover {
>- struct work_struct recover_work;
>- u64 last_recover;
>- } recover;
>+ struct work_struct recover_work;
> } ____cacheline_aligned_in_smp;
>
> struct mlx5e_dma_info {
>@@ -682,6 +679,13 @@ struct mlx5e_rss_params {
> u8 hfunc;
> };
>
>+struct mlx5e_modify_sq_param {
>+ int curr_state;
>+ int next_state;
>+ int rl_update;
>+ int rl_index;
>+};
>+
> struct mlx5e_priv {
> /* priv data path fields - start */
> struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
>@@ -737,6 +741,7 @@ struct mlx5e_priv {
> #ifdef CONFIG_MLX5_EN_TLS
> struct mlx5e_tls *tls;
> #endif
>+ struct devlink_health_reporter *tx_reporter;
> };
>
> struct mlx5e_profile {
>@@ -866,6 +871,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
> void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
> struct mlx5e_params *params);
>
>+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
>+ struct mlx5e_modify_sq_param *p);
>+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
>+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
>+
> static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
> {
> return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
>new file mode 100644
>index 000000000000..74083e120ab3
>--- /dev/null
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
>@@ -0,0 +1,14 @@
>+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>+/* Copyright (c) 2018 Mellanox Technologies. */
>+
>+#ifndef __MLX5E_EN_REPORTER_H
>+#define __MLX5E_EN_REPORTER_H
>+
>+#include <linux/mlx5/driver.h>
>+#include "en.h"
>+
>+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
>+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
>+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
>+
>+#endif
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>new file mode 100644
>index 000000000000..9800df4909c2
>--- /dev/null
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>@@ -0,0 +1,321 @@
>+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>+/* Copyright (c) 2018 Mellanox Technologies. */
>+
>+#include <net/devlink.h>
>+#include "reporter.h"
>+#include "lib/eq.h"
>+
>+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
Some made-up number. I don't like how this whole thing is do. I will
comment on it in the devlink part.
[...]
>+static int
>+mlx5e_tx_reporter_build_diagnose_output(struct devlink_health_buffer *buffer,
>+ u32 sqn, u8 state, u8 stopped)
>+{
>+ int err, i;
>+ int nest = 0;
>+ char name[20];
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ sprintf(name, "SQ 0x%x", sqn);
No. The whole idea of having the message build up with nested attributes
(json-like) was to avoid things like this. No sprintfs please. If you
want to do sprintf, most of the time you are doing something wrong.
>+ err = devlink_health_buffer_put_object_name(buffer, name);
>+ if (err)
>+ goto buffer_error;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_put_object_name(buffer, "HW state");
>+ if (err)
>+ goto buffer_error;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
How could you put an object name and don't start nesting? It looks
implicit to me.
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_put_value_u8(buffer, state);
>+ if (err)
>+ goto buffer_error;
>+
>+ devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE */
>+ nest--;
>+
>+ devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR */
>+ nest--;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
>+
>+ err = devlink_health_buffer_put_object_name(buffer, "stopped");
>+ if (err)
>+ goto buffer_error;
>+
>+ err = devlink_health_buffer_nest_start(buffer,
>+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE);
>+ if (err)
>+ goto buffer_error;
>+ nest++;
What's up with this nest++ nest--?
When you open a nest, you should have an error path inten to close the
nest and jump there in case of an error. This is hard to read and
understand.
>+
>+ err = devlink_health_buffer_put_value_u8(buffer, stopped);
>+ if (err)
>+ goto buffer_error;
>+
>+ for (i = 0; i < nest; i++)
>+ devlink_health_buffer_nest_end(buffer);
>+
>+ return 0;
>+
>+buffer_error:
>+ for (i = 0; i < nest; i++)
>+ devlink_health_buffer_nest_cancel(buffer);
>+ return err;
>+}
>+
>+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
>+ struct devlink_health_buffer **buffers_array,
>+ unsigned int buffer_size,
>+ unsigned int num_buffers)
>+{
>+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
>+ unsigned int buff = 0;
>+ int i = 0, err = 0;
>+
>+ if (buffer_size < MLX5E_TX_REPORTER_PER_SQ_MAX_LEN)
>+ return -ENOMEM;
>+
>+ mutex_lock(&priv->state_lock);
>+
>+ if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
>+ mutex_unlock(&priv->state_lock);
>+ return 0;
>+ }
>+
>+ while (i < priv->channels.num * priv->channels.params.num_tc) {
>+ struct mlx5e_txqsq *sq = priv->txq2sq[i];
>+ u8 state;
>+
>+ err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
>+ if (err)
>+ break;
>+
>+ err = mlx5e_tx_reporter_build_diagnose_output(buffers_array[buff],
>+ sq->sqn, state,
>+ netif_xmit_stopped(sq->txq));
>+ if (err) {
>+ if (++buff == num_buffers)
>+ break;
>+ } else {
>+ i++;
>+ }
>+ }
>+
>+ mutex_unlock(&priv->state_lock);
>+ return err;
>+}
>+
>+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
>+ .name = "TX",
>+ .recover = mlx5e_tx_reporter_recover,
>+ .diagnose_size = MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC *
>+ MLX5E_TX_REPORTER_PER_SQ_MAX_LEN,
>+ .diagnose = mlx5e_tx_reporter_diagnose,
>+ .dump_size = 0,
>+ .dump = NULL,
Don't initialize 0 and NULL in static const.
[...]
next prev parent reply other threads:[~2019-01-20 11:15 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-17 21:59 [PATCH net-next v2 00/11] Devlink health reporting and recovery system Eran Ben Elisha
2019-01-17 21:59 ` [PATCH net-next v2 01/11] devlink: Add health buffer support Eran Ben Elisha
2019-01-20 10:03 ` Jiri Pirko
2019-01-20 11:06 ` Eran Ben Elisha
2019-01-20 11:08 ` Jiri Pirko
2019-01-20 18:45 ` David Miller
2019-01-21 11:07 ` Eran Ben Elisha
2019-01-21 12:08 ` Jiri Pirko
2019-01-20 11:20 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 02/11] devlink: Add health reporter create/destroy functionality Eran Ben Elisha
2019-01-20 11:49 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 03/11] devlink: Add health report functionality Eran Ben Elisha
2019-01-20 11:27 ` Jiri Pirko
2019-01-21 11:12 ` Eran Ben Elisha
2019-01-17 21:59 ` [PATCH net-next v2 04/11] devlink: Add health get command Eran Ben Elisha
2019-01-20 11:31 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 05/11] devlink: Add health set command Eran Ben Elisha
2019-01-20 11:32 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 06/11] devlink: Add health recover command Eran Ben Elisha
2019-01-20 11:33 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 07/11] devlink: Add health diagnose command Eran Ben Elisha
2019-01-20 11:38 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 08/11] devlink: Add health dump {get,clear} commands Eran Ben Elisha
2019-01-17 21:59 ` [PATCH net-next v2 09/11] net/mlx5e: Add TX reporter support Eran Ben Elisha
2019-01-20 11:06 ` Jiri Pirko [this message]
2019-01-21 11:32 ` Eran Ben Elisha
2019-01-21 12:11 ` Jiri Pirko
2019-01-21 13:06 ` Eran Ben Elisha
2019-01-21 13:45 ` Jiri Pirko
2019-01-17 21:59 ` [PATCH net-next v2 10/11] net/mlx5e: Add TX timeout support for mlx5e TX reporter Eran Ben Elisha
2019-01-17 21:59 ` [PATCH net-next v2 11/11] devlink: Add Documentation/networking/devlink-health.txt Eran Ben Elisha
2019-01-18 22:59 ` [PATCH net-next v2 00/11] Devlink health reporting and recovery system David Miller
2019-01-20 9:27 ` [PATCH iproute2-next v2] devlink: Add health command support Aya Levin
2019-01-23 3:37 ` David Ahern
2019-01-23 11:27 ` Aya Levin
2019-01-24 0:27 ` David Ahern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190120110618.GB2730@nanopsycho \
--to=jiri@resnulli.us \
--cc=ariela@mellanox.com \
--cc=ayal@mellanox.com \
--cc=davem@davemloft.net \
--cc=eranbe@mellanox.com \
--cc=jiri@mellanox.com \
--cc=moshe@mellanox.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox