From: Andrew Lunn <andrew@lunn.ch>
To: Eran Ben Elisha <eranbe@mellanox.com>
Cc: netdev@vger.kernel.org, Jiri Pirko <jiri@mellanox.com>,
	Andy Gospodarek <andrew.gospodarek@broadcom.com>,
	Michael Chan <michael.chan@broadcom.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>,
	Simon Horman <simon.horman@netronome.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Tal Alon <talal@mellanox.com>, Ariel Almog <ariela@mellanox.com>
Subject: Re: [RFC PATCH iproute2-next] man: Add devlink health man page
Date: Thu, 13 Sep 2018 17:12:52 +0200
Message-ID: <20180913151252.GC23892@lunn.ch>
In-Reply-To: <66584ca2-8efa-9a6d-c1f3-1cf81cb04259@mellanox.com>

> >>>>        devlink health sensor set pci/0000:01:00.0 name TX_COMP_ERROR action reset off action dump on
> >>>>            Sets TX_COMP_ERROR sensor parameters for a specific device.

> >>This is what I had in mind:
> >>1. command interface error
> >>2. command interface timeout
> >>3. stuck TX queue (like tx_timeout)
> >>4. stuck TX completion queue (driver did not process packets in a reasonable
> >>time period)
> >>5. stuck RX queue
> >>6. RX completion error
> >>7. TX completion error
> >>8. HW / FW catastrophic error report
> >>9. completion queue overrun

> Such issues do exist in production environments, and need to be handled even
> if the root cause is a bug which will be fixed in the latest release. My feature
> should help developers / administrators to control and recover their live
> systems, by auto correction and logging support.
> Goal is:
> - Provide alert debug information
> - Self healing
> - If problem needs vendor support, provide a way to gather all needed
> debugging information.

So maybe you have the wrong name for this. "Health" is nice in
marketing terms, but what we are actually talking about is bug recovery.

devlink bug sensor set pci/0000:01:00.0 name command_interface_error action reset off action dump on
devlink bug sensor set pci/0000:01:00.0 name command_interface_timeout action reset off action dump on
devlink bug sensor set pci/0000:01:00.0 name transmit_completion_error action reset off action dump on
devlink bug sensor set pci/0000:01:00.0 name completion_queue_overrun action reset off action dump on

seems a lot more understandable than:

devlink health sensor set pci/0000:01:00.0 name TX_COMP_ERROR action reset off action dump on
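
As a rough sketch of how the descriptive names would read in a script:
the loop below only prints the four commands listed above, it does not
run them. It assumes the "bug sensor" syntax suggested in this mail,
which is a naming proposal and not an existing devlink command; the
function name gen_sensor_cmds is made up for illustration.

```shell
#!/bin/sh
# Print (do not execute) the proposed commands. The "devlink bug sensor set"
# syntax is only a suggestion from this thread, not a real devlink command.
DEV="pci/0000:01:00.0"

# Hypothetical helper: emit one command line per descriptive sensor name.
gen_sensor_cmds() {
    for name in command_interface_error command_interface_timeout \
                transmit_completion_error completion_queue_overrun; do
        echo "devlink bug sensor set $DEV name $name action reset off action dump on"
    done
}

gen_sensor_cmds
```

With names like these, the list of sensors in a script documents itself,
which is the point of spelling them out.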

	Andrew
