netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>,
	Moshe Shemesh <moshe@mellanox.com>,
	Eran Ben Elisha <eranbe@mellanox.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jiri Pirko <jiri@mellanox.com>
Subject: Re: [net-next 09/15] net/mlx5: Create FW devlink health reporter
Date: Tue, 7 May 2019 07:59:44 +0200	[thread overview]
Message-ID: <20190507055944.GA14667@nanopsycho.orion> (raw)
In-Reply-To: <20190506214640.jgeikwozvk55c6iy@ast-mbp>

Mon, May 06, 2019 at 11:46:43PM CEST, alexei.starovoitov@gmail.com wrote:
>On Mon, May 06, 2019 at 07:52:18PM +0000, Saeed Mahameed wrote:
>> On Mon, 2019-05-06 at 13:38 +0200, Jiri Pirko wrote:
>> > Mon, May 06, 2019 at 12:45:44PM CEST, moshe@mellanox.com wrote:
>> > > 
>> > > On 5/5/2019 6:42 PM, Jiri Pirko wrote:
>> > > > Sun, May 05, 2019 at 02:33:23AM CEST, saeedm@mellanox.com wrote:
>> > > > > From: Moshe Shemesh <moshe@mellanox.com>
>> > > > > 
>> > > > > Create mlx5_devlink_health_reporter for FW reporter. The FW
>> > > > > reporter
>> > > > > implements devlink_health_reporter diagnose callback.
>> > > > > 
>> > > > > The fw reporter diagnose command can be triggered any time by
>> > > > > the user
>> > > > > to check current fw status.
>> > > > > In healthy status, it will return clear syndrome. Otherwise it
>> > > > > will dump
>> > > > > the health info buffer.
>> > > > > 
>> > > > > Command example and output on healthy status:
>> > > > > $ devlink health diagnose pci/0000:82:00.0 reporter fw
>> > > > > Syndrome: 0
>> > > > > 
>> > > > > Command example and output on non healthy status:
>> > > > > $ devlink health diagnose pci/0000:82:00.0 reporter fw
>> > > > > diagnose data:
>> > > > > assert_var[0] 0xfc3fc043
>> > > > > assert_var[1] 0x0001b41c
>> > > > > assert_var[2] 0x00000000
>> > > > > assert_var[3] 0x00000000
>> > > > > assert_var[4] 0x00000000
>> > > > > assert_exit_ptr 0x008033b4
>> > > > > assert_callra 0x0080365c
>> > > > > fw_ver 16.24.1000
>> > > > > hw_id 0x0000020d
>> > > > > irisc_index 0
>> > > > > synd 0x8: unrecoverable hardware error
>> > > > > ext_synd 0x003d
>> > > > > raw fw_ver 0x101803e8
>> > > > > 
>> > > > > Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
>> > > > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> > > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> > > > 
>> > > > 	
>> > > > [...]	
>> > > > 	
>> > > > 	
>> > > > > +static int
>> > > > > +mlx5_fw_reporter_diagnose(struct devlink_health_reporter
>> > > > > *reporter,
>> > > > > +			  struct devlink_fmsg *fmsg)
>> > > > > +{
>> > > > > +	struct mlx5_core_dev *dev =
>> > > > > devlink_health_reporter_priv(reporter);
>> > > > > +	struct mlx5_core_health *health = &dev->priv.health;
>> > > > > +	u8 synd;
>> > > > > +	int err;
>> > > > > +
>> > > > > +	mutex_lock(&health->info_buf_lock);
>> > > > > +	mlx5_get_health_info(dev, &synd);
>> > > > > +
>> > > > > +	if (!synd) {
>> > > > > +		mutex_unlock(&health->info_buf_lock);
>> > > > > +		return devlink_fmsg_u8_pair_put(fmsg,
>> > > > > "Syndrome", synd);
>> > > > > +	}
>> > > > > +
>> > > > > +	err = devlink_fmsg_string_pair_put(fmsg, "diagnose
>> > > > > data",
>> > > > > +					   health->info_buf);
>> > > > 
>> > > > No! This is wrong! You are sneaking in text blob. Please put the
>> > > > info in
>> > > > structured form using proper fmsg helpers.
>> > > > 
>> > > This is the fw output format, it is already in use, I don't want
>> > > to 
>> > > change it, just have it here in the diagnose output too.
>> > 
>> > Already in use where? in dmesg? Sorry, but that is not an argument.
>> > Please format the message properly.
>> > 
>> 
>> What is wrong here ? 
>> 
>> Unlike binary dump data, I thought diagnose data is allowed to be
>> developer friendly free text format, if not then let's enforce it in
>> devlink API. Jiri,  you can't audit each and every use of
>> devlink_fmsg_string_pair_put, it must be done by design.
>
>I agree with the purpose that Jiri is trying to get out of this infra,
>but I disagree that strings should be disallowed.

Not strings in general, I have nothing against them. If they are plain.
But when you push structured data into strings, for example using
sprintf for ints etc, that is a problem I'm pointing at.
Instead of that, the structured data should be formatted using
proper fmsg helpers. We already have them in kernel, it is just about
using them and not taking shortcuts.


>
>The example messages from commit log are not useful at all:
>$ devlink health diagnose pci/0000:82:00.0 reporter fw
>diagnose data:
>assert_var[0] 0xfc3fc043
>assert_var[1] 0x0001b41c
>assert_var[2] 0x00000000
>assert_var[3] 0x00000000
>assert_var[4] 0x00000000
>assert_exit_ptr 0x008033b4
>assert_callra 0x0080365c
>fw_ver 16.24.1000
>hw_id 0x0000020d
>irisc_index 0
>synd 0x8: unrecoverable hardware error
>ext_synd 0x003d
>raw fw_ver 0x101803e8
>
>Firmware dumps some magic numbers. How this is any better than
>what we have today?
>Every bit has to have meaningful message.
>The point of devlink diag is not to replicate dmesg, but to give
>users a clean way to understand and debug the hw.
>
>It seems mlx is adding a new interface and immediately misusing it.
>This not an example to give to other vendors.

I agree.

  reply	other threads:[~2019-05-07  5:59 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-05  0:32 [pull request][net-next 00/15] Mellanox, mlx5 Firmware devlink health and sw reset Saeed Mahameed
2019-05-05  0:32 ` [net-next 01/15] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
2019-05-05  0:32 ` [net-next 02/15] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
2019-05-05  0:33 ` [net-next 03/15] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
2019-05-05 15:36   ` Jiri Pirko
2019-05-05  0:33 ` [net-next 04/15] net/mlx5: Add support for devlink region_snapshot parameter Saeed Mahameed
2019-05-05  0:33 ` [net-next 05/15] net/mlx5: Handle SW reset of FW in error flow Saeed Mahameed
2019-05-05  0:33 ` [net-next 06/15] net/mlx5: Control CR-space access by different PFs Saeed Mahameed
2019-05-05  0:33 ` [net-next 07/15] net/mlx5: Issue SW reset on FW assert Saeed Mahameed
2019-05-05 15:38   ` Jiri Pirko
2019-05-06 10:44     ` Moshe Shemesh
2019-05-05  0:33 ` [net-next 08/15] net/mlx5: Refactor print health info Saeed Mahameed
2019-05-05 15:42   ` Jiri Pirko
2019-05-05  0:33 ` [net-next 09/15] net/mlx5: Create FW devlink health reporter Saeed Mahameed
2019-05-05 15:42   ` Jiri Pirko
2019-05-06 10:45     ` Moshe Shemesh
2019-05-06 11:38       ` Jiri Pirko
2019-05-06 19:52         ` Saeed Mahameed
2019-05-06 21:46           ` Alexei Starovoitov
2019-05-07  5:59             ` Jiri Pirko [this message]
2019-05-07  6:01           ` Jiri Pirko
2019-05-07  0:11   ` Jakub Kicinski
2019-05-05  0:33 ` [net-next 10/15] net/mlx5: Add core dump register access functions Saeed Mahameed
2019-05-05  0:33 ` [net-next 11/15] net/mlx5: Add support for FW reporter dump Saeed Mahameed
2019-05-05 15:49   ` Jiri Pirko
2019-05-06 10:51     ` Moshe Shemesh
2019-05-06 11:37       ` Jiri Pirko
2019-05-05  0:33 ` [net-next 12/15] net/mlx5: Report devlink health on FW issues Saeed Mahameed
2019-05-05  0:33 ` [net-next 13/15] net/mlx5: Add fw fatal devlink health reporter Saeed Mahameed
2019-05-05  0:33 ` [net-next 14/15] net/mlx5: Add support for FW fatal reporter dump Saeed Mahameed
2019-05-05 15:52   ` Jiri Pirko
2019-05-06 10:54     ` Moshe Shemesh
2019-05-06 11:42       ` Jiri Pirko
2019-05-05  0:33 ` [net-next 15/15] net/mlx5: Report devlink health on FW fatal issues Saeed Mahameed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190507055944.GA14667@nanopsycho.orion \
    --to=jiri@resnulli.us \
    --cc=alexei.starovoitov@gmail.com \
    --cc=davem@davemloft.net \
    --cc=eranbe@mellanox.com \
    --cc=jiri@mellanox.com \
    --cc=moshe@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).