From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.5 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30EE0C61CE4 for ; Sun, 20 Jan 2019 11:15:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C679F20880 for ; Sun, 20 Jan 2019 11:15:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=resnulli-us.20150623.gappssmtp.com header.i=@resnulli-us.20150623.gappssmtp.com header.b="D98gWSF+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730476AbfATLPE (ORCPT ); Sun, 20 Jan 2019 06:15:04 -0500 Received: from mail-wr1-f66.google.com ([209.85.221.66]:43668 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728894AbfATLPE (ORCPT ); Sun, 20 Jan 2019 06:15:04 -0500 Received: by mail-wr1-f66.google.com with SMTP id r10so20023450wrs.10 for ; Sun, 20 Jan 2019 03:15:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=resnulli-us.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=W1ktB33XjGY87o/Hi9ukqfchotjdvn8O/JbIUoVanFI=; b=D98gWSF+RZVyEYjTHfPXRmoVe2TEbHVuKsqC2/saL/+49CbjvkkSYu0kk4UVyjtL6+ M6b4/cetIKjH7L6FFzKQB+yucDgLKpTVywVticm1tx0kYXkXHmEY7NIVlgP+hv2xXfkW ShdCv+9Dd48TJrQokUtg/vtKjPzI7Qlpe2HzhidqTc4nXtp3VTXN8QFNduU2jLVciSrk JfJv1y8HCujzbPr0cvuRYywbiyB1pVshYBud1Pw13co/18WOB0G9/920a9zlQiDrR4FV f0bmU6plDt4Qg/knXt9FDAKjbESZIkPdiJX+WoRQ2bPGuoAIeIh1DkMkHqh6KSVWk4m2 PGyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=W1ktB33XjGY87o/Hi9ukqfchotjdvn8O/JbIUoVanFI=; b=Xyy8NjYGU+vhBjxehFm9poam/4UMrbxvjt2FXbGyhKcQ0A5GcBehHvDDT93lPEVIdN an7TZEzojj+wqlhqseBXGLqmpz6dgDB+jOh/uZvzALHHtHjX88EyOgmkrgoIYNglHG+G 6j+zrCskkUlRTutDPpEcdIr9cHptAq81znXn+QtnWaBnz7J9gEQpPGRXMRTcozsSfv1X GtytICvubrWXekJcQgG3e7I4d72XqEOjDBCuPKkuqjAuNKw4c1vUY8RPPzlKc+lw43qz 3uIK3z0zkiQDxPKcWpZAyKVYZqEFkCg3KY5xV4uALb/2+CpNXvwlIMysc4P6RQjcZTfc xL5g== X-Gm-Message-State: AJcUukfh0hywsWmkCdZdHeR2Ip7BUgNamSIzMl6VUtGpElJyVZ2GVtk1 LkTJQWo5q1CDiEsNPRmdZ+HVNeZrT/4= X-Google-Smtp-Source: ALg8bN6jruYI9WUng5q7qjjsDegevT8oxQKj7aq3WkOBe8/BCrP0yfUMDzZBeX5McZ70zXmIiMKk+g== X-Received: by 2002:a5d:63c3:: with SMTP id c3mr23494710wrw.215.1547982901500; Sun, 20 Jan 2019 03:15:01 -0800 (PST) Received: from localhost ([193.47.165.251]) by smtp.gmail.com with ESMTPSA id s8sm102613408wrn.44.2019.01.20.03.15.00 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 20 Jan 2019 03:15:01 -0800 (PST) Date: Sun, 20 Jan 2019 12:06:18 +0100 From: Jiri Pirko To: Eran Ben Elisha Cc: netdev@vger.kernel.org, Jiri Pirko , "David S. Miller" , Ariel Almog , Aya Levin , Moshe Shemesh Subject: Re: [PATCH net-next v2 09/11] net/mlx5e: Add TX reporter support Message-ID: <20190120110618.GB2730@nanopsycho> References: <1547762360-7075-1-git-send-email-eranbe@mellanox.com> <1547762360-7075-10-git-send-email-eranbe@mellanox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1547762360-7075-10-git-send-email-eranbe@mellanox.com> User-Agent: Mutt/1.9.5 (2018-04-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Thu, Jan 17, 2019 at 10:59:18PM CET, eranbe@mellanox.com wrote: >Add mlx5e tx reporter to devlink health reporters. This reporter will be >responsible for diagnosing, reporting and recovering of TX errors. >This patch declares the TX reporter operations and allocate it using the >devlink health API. Currently, this reporter supports reporting and >recovering from send error CQE only. In addition, it adds diagnose >information for the open SQs. > >For a local SQ recover (due to driver error report), in case of SQ recover >failure, the recover operation will be considered as a failure. >For a full TX recover, an attempt to close and open the channels will be >done. If this one passed successfully, it will be considered as a >successful recover. > >The SQ recover from error CQE flow is not a new feature in the driver, >this patch re-organize the functions and adapt them for the devlink >health API. For this purpose, move code from en_main.c to a new file >named reporter_tx.c. > >Signed-off-by: Eran Ben Elisha >Reviewed-by: Saeed Mahameed >--- > .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +- > drivers/net/ethernet/mellanox/mlx5/core/en.h | 18 +- > .../ethernet/mellanox/mlx5/core/en/reporter.h | 14 + > .../mellanox/mlx5/core/en/reporter_tx.c | 321 ++++++++++++++++++ > .../net/ethernet/mellanox/mlx5/core/en_main.c | 135 +------- > .../net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +- > 6 files changed, 367 insertions(+), 125 deletions(-) > create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h > create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c > >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile >index 9de9abacf7f6..6bb2a860b15b 100644 >--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile >+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile >@@ -22,7 +22,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ > # > mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \ > en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \ >- en_selftest.o en/port.o en/monitor_stats.o >+ en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o > > # > # Netdev extra >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h >index 8fa8fdd30b85..27e276c9bf84 100644 >--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h >+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h >@@ -388,10 +388,7 @@ struct mlx5e_txqsq { > struct mlx5e_channel *channel; > int txq_ix; > u32 rate_limit; >- struct mlx5e_txqsq_recover { >- struct work_struct recover_work; >- u64 last_recover; >- } recover; >+ struct work_struct recover_work; > } ____cacheline_aligned_in_smp; > > struct mlx5e_dma_info { >@@ -682,6 +679,13 @@ struct mlx5e_rss_params { > u8 hfunc; > }; > >+struct mlx5e_modify_sq_param { >+ int curr_state; >+ int next_state; >+ int rl_update; >+ int rl_index; >+}; >+ > struct mlx5e_priv { > /* priv data path fields - start */ > struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; >@@ -737,6 +741,7 @@ struct mlx5e_priv { > #ifdef CONFIG_MLX5_EN_TLS > struct mlx5e_tls *tls; > #endif >+ struct devlink_health_reporter *tx_reporter; > }; > > struct mlx5e_profile { >@@ -866,6 +871,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params); > void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev, > struct mlx5e_params *params); > >+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn, >+ struct mlx5e_modify_sq_param *p); >+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq); >+void mlx5e_tx_disable_queue(struct netdev_queue *txq); >+ > static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev) > { > return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) && >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h >new file mode 100644 >index 000000000000..74083e120ab3 >--- /dev/null >+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h >@@ -0,0 +1,14 @@ >+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ >+/* Copyright (c) 2018 Mellanox Technologies. */ >+ >+#ifndef __MLX5E_EN_REPORTER_H >+#define __MLX5E_EN_REPORTER_H >+ >+#include >+#include "en.h" >+ >+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv); >+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv); >+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq); >+ >+#endif >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c >new file mode 100644 >index 000000000000..9800df4909c2 >--- /dev/null >+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c >@@ -0,0 +1,321 @@ >+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ >+/* Copyright (c) 2018 Mellanox Technologies. */ >+ >+#include >+#include "reporter.h" >+#include "lib/eq.h" >+ >+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256 Some made-up number. I don't like how this whole thing is do. I will comment on it in the devlink part. [...] >+static int >+mlx5e_tx_reporter_build_diagnose_output(struct devlink_health_buffer *buffer, >+ u32 sqn, u8 state, u8 stopped) >+{ >+ int err, i; >+ int nest = 0; >+ char name[20]; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ sprintf(name, "SQ 0x%x", sqn); No. The whole idea of having the message build up with nested attributes (json-like) was to avoid things like this. No sprintfs please. If you want to do sprintf, most of the time you are doing something wrong. >+ err = devlink_health_buffer_put_object_name(buffer, name); >+ if (err) >+ goto buffer_error; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_put_object_name(buffer, "HW state"); >+ if (err) >+ goto buffer_error; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE); How could you put an object name and don't start nesting? It looks implicit to me. >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_put_value_u8(buffer, state); >+ if (err) >+ goto buffer_error; >+ >+ devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE */ >+ nest--; >+ >+ devlink_health_buffer_nest_end(buffer); /* DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR */ >+ nest--; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_PAIR); >+ if (err) >+ goto buffer_error; >+ nest++; >+ >+ err = devlink_health_buffer_put_object_name(buffer, "stopped"); >+ if (err) >+ goto buffer_error; >+ >+ err = devlink_health_buffer_nest_start(buffer, >+ DEVLINK_ATTR_HEALTH_BUFFER_OBJECT_VALUE); >+ if (err) >+ goto buffer_error; >+ nest++; What's up with this nest++ nest--? When you open a nest, you should have an error path inten to close the nest and jump there in case of an error. This is hard to read and understand. >+ >+ err = devlink_health_buffer_put_value_u8(buffer, stopped); >+ if (err) >+ goto buffer_error; >+ >+ for (i = 0; i < nest; i++) >+ devlink_health_buffer_nest_end(buffer); >+ >+ return 0; >+ >+buffer_error: >+ for (i = 0; i < nest; i++) >+ devlink_health_buffer_nest_cancel(buffer); >+ return err; >+} >+ >+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter, >+ struct devlink_health_buffer **buffers_array, >+ unsigned int buffer_size, >+ unsigned int num_buffers) >+{ >+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter); >+ unsigned int buff = 0; >+ int i = 0, err = 0; >+ >+ if (buffer_size < MLX5E_TX_REPORTER_PER_SQ_MAX_LEN) >+ return -ENOMEM; >+ >+ mutex_lock(&priv->state_lock); >+ >+ if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) { >+ mutex_unlock(&priv->state_lock); >+ return 0; >+ } >+ >+ while (i < priv->channels.num * priv->channels.params.num_tc) { >+ struct mlx5e_txqsq *sq = priv->txq2sq[i]; >+ u8 state; >+ >+ err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state); >+ if (err) >+ break; >+ >+ err = mlx5e_tx_reporter_build_diagnose_output(buffers_array[buff], >+ sq->sqn, state, >+ netif_xmit_stopped(sq->txq)); >+ if (err) { >+ if (++buff == num_buffers) >+ break; >+ } else { >+ i++; >+ } >+ } >+ >+ mutex_unlock(&priv->state_lock); >+ return err; >+} >+ >+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = { >+ .name = "TX", >+ .recover = mlx5e_tx_reporter_recover, >+ .diagnose_size = MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC * >+ MLX5E_TX_REPORTER_PER_SQ_MAX_LEN, >+ .diagnose = mlx5e_tx_reporter_diagnose, >+ .dump_size = 0, >+ .dump = NULL, Don't initialize 0 and NULL in static const. [...]