* [RFC PATCH iproute2-next V2] System specification exception API
@ 2018-09-26 11:52 Eran Ben Elisha
  2018-09-26 11:52 ` [RFC PATCH iproute2-next V2] man: Add devlink exception man page Eran Ben Elisha
  2018-09-27 12:47 ` [RFC PATCH iproute2-next V2] System specification exception API Jiri Pirko
  0 siblings, 2 replies; 8+ messages in thread
From: Eran Ben Elisha @ 2018-09-26 11:52 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding
  Cc: Ariel Almog, Tal Alon, Eran Ben Elisha

The exception spec is targeted at Real Time Alerting, in order to know when
something bad has happened to a PCI device:
- Provide alert debug information
- Self healing
- If the problem needs vendor support, provide a way to gather all needed
  debugging information.

The exception mechanism contains condition checkers which sense malfunctions.
Upon a condition hit, actions such as logging and correction can be taken.

The condition checkers are divided into the following groups:
- Hardware - a checker which is triggered by the device due to malfunction.
- Software - a checker which is triggered by the software due to malfunction.
Both groups of condition checkers can be triggered due to an error event or
due to a periodic check.

Actions are the way to handle those events. An action can be in one of the
following groups:
- Dump - SW trace, SW dump, HW trace, HW dump
- Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, etc.)
Actions can be performed by SW or HW.

The user is allowed to enable or disable condition checkers and their action
mapping.

This RFC man page patch describes the suggested API of devlink-exception in
order to control conditions and actions.

V2:
* Renaming terms:
    health -> exception
    sensor -> condition
* Remove reinit command and merge with action command.
* Cosmetics in grammar.
Eran Ben Elisha (1):
  man: Add devlink exception man page

 man/man8/devlink-exception.8 | 158 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 man/man8/devlink-exception.8

--
1.8.3.1

^ permalink raw reply [flat|nested] 8+ messages in thread
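The condition/action model described in the cover letter above can be sketched as a small in-memory table. This is only an illustration under stated assumptions: the names `Condition`, `ExceptionConfig`, `add_condition`, `set_action` and `trigger` are hypothetical and are not part of devlink or of any driver, and tagging TX_COMP_ERROR as a software condition is an assumption made for the example. Only the condition name TX_COMP_ERROR and the dump/reset actions are taken from the RFC.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """A condition checker plus the per-action enable flags mapped to it."""
    name: str
    source: str  # "hw" or "sw": which side detects the malfunction (assumed tag)
    actions: dict = field(default_factory=dict)  # action name -> active flag


class ExceptionConfig:
    """Hypothetical per-device table of conditions and their action mapping."""

    def __init__(self):
        self.conditions = {}

    def add_condition(self, name, source):
        self.conditions[name] = Condition(name, source)

    def set_action(self, condition, action, active):
        # Mirrors the intent of "condition set DEV name NAME action NAME ...".
        self.conditions[condition].actions[action] = active

    def trigger(self, condition):
        # Return the actions that should run when the condition is hit.
        cond = self.conditions[condition]
        return [name for name, active in cond.actions.items() if active]


cfg = ExceptionConfig()
cfg.add_condition("TX_COMP_ERROR", source="sw")
cfg.set_action("TX_COMP_ERROR", "reset", active=False)
cfg.set_action("TX_COMP_ERROR", "dump", active=True)
print(cfg.trigger("TX_COMP_ERROR"))  # ['dump']
```

Keeping the enable flag per (condition, action) pair, rather than per action globally, matches the cover letter's statement that the user controls the condition checkers and their action mapping.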
* [RFC PATCH iproute2-next V2] man: Add devlink exception man page
  2018-09-26 11:52 [RFC PATCH iproute2-next V2] System specification exception API Eran Ben Elisha
@ 2018-09-26 11:52 ` Eran Ben Elisha
  2018-09-27 14:32   ` Jiri Pirko
  2018-09-27 12:47 ` [RFC PATCH iproute2-next V2] System specification exception API Jiri Pirko
  1 sibling, 1 reply; 8+ messages in thread
From: Eran Ben Elisha @ 2018-09-26 11:52 UTC (permalink / raw)
  To: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding
  Cc: Ariel Almog, Tal Alon, Eran Ben Elisha

Add devlink-exception man page. Devlink-exception tool will control device
exception attributes, conditions, actions and logging.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>

-------------------------------------------------------
Copy paste man output to here for easier review process of the RFC.

DEVLINK-EXCEPTION(8)                 Linux                DEVLINK-EXCEPTION(8)

NAME
       devlink-exception - devlink exception configuration

SYNOPSIS
       devlink [ OPTIONS ] exception { COMMAND | help }

       OPTIONS := { -V[ersion] | -n[no-nice-names] }

       devlink exception show [ DEV ] [ condition NAME ] [ action NAME ]

       devlink exception condition set DEV name NAME [ action NAME { active | inactive } ]

       devlink exception action set DEV name NAME period PERIOD count COUNT fail { ignore | down }

       devlink exception help

DESCRIPTION
       devlink-exception tool allows user to configure the way driver treats
       unexpected status. The tool allows configuration of the conditions that
       can trigger exception activity. Set for each condition the follow up
       operations, such as, reset and dump of info. In addition, set the
       exception activity termination action.

   devlink exception show - Display devlink exception conditions and actions attributes
       DEV    Specifies the devlink device to show.

       condition NAME
              Specifies the devlink condition to show.

       action NAME
              Specifies the devlink action to show.

   devlink exception condition set - sets devlink exception condition attributes
       DEV    Specifies the devlink device to set.

       name NAME
              Name of the condition to set.

       action NAME { active | inactive }
              Specify which actions to activate and which to deactivate once a
              condition was triggered. Actions can be dump, reset, etc.

   devlink exception action set - sets devlink action attributes.
       Once this command is launched, period and count measurement will be reset.

       DEV    Specifies the devlink device to set.

       name NAME
              Specifies the devlink action to set.

       period PERIOD
              The period on which we limit the amount of performed actions,
              measured in seconds.

       count COUNT
              The maximum number of actions performed in a limited time frame.

       fail { ignore | down }
              Specify the behavior once count limit was reached.

              ignore - Skip triggering this action.

              down - Driver will remain in nonoperational state.

EXAMPLES
       devlink exception show
           Shows the exception state of all devlink devices on the system.

       devlink exception show pci/0000:01:00.0
           Shows the exception state of specified devlink device.

       devlink exception condition set pci/0000:01:00.0 name TX_COMP_ERROR action reset inactive action dump active
           Sets TX_COMP_ERROR condition parameters for a specific device.

       devlink exception action set pci/0000:01:00.0 name reset period 3600 count 5 fail ignore
           Sets exception attributes for reset action. Period timer and counter are being reset.

SEE ALSO
       devlink(8), devlink-port(8), devlink-sb(8), devlink-monitor(8), devlink-dev(8),

AUTHOR
       Eran ben Elisha <eranbe@mellanox.com>

iproute2                           15 Aug 2018            DEVLINK-EXCEPTION(8)

---
 man/man8/devlink-exception.8 | 158 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 man/man8/devlink-exception.8

diff --git a/man/man8/devlink-exception.8 b/man/man8/devlink-exception.8
new file mode 100644
index 000000000000..03f24b32cc98
--- /dev/null
+++ b/man/man8/devlink-exception.8
@@ -0,0 +1,158 @@
+.TH DEVLINK\-EXCEPTION 8 "15 Aug 2018" "iproute2" "Linux"
+.SH NAME
+devlink-exception \- devlink exception configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.BR exception
+.RI " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
+
+.ti -8
+.B devlink exception show
+.RI "[ " DEV " ]"
+.RI "[ "
+.B condition
+.IR NAME
+.RI "]"
+.RI "[ "
+.B action
+.IR NAME
+.RI "]"
+
+.ti -8
+.B devlink exception condition set
+.IR DEV
+.B name
+.IR NAME
+.RI "[ "
+.BR action
+.IR NAME
+.R "{" active "|" inactive "}" ]
+
+.ti -8
+.B devlink exception action set
+.IR DEV
+.B name
+.IR NAME
+.BR period
+.IR PERIOD
+.BR count
+.IR COUNT
+.BR fail " { "
+.IR ignore
+.BR "| "
+.IR down
+.R "} "
+
+.ti -8
+.B devlink exception help
+
+.SH "DESCRIPTION"
+.B devlink-exception
+tool allows user to configure the way driver treats unexpected status. The tool allows configuration of the conditions that can trigger exception activity. Set for each condition the follow up operations, such as, reset and dump of info. In addition, set the exception activity termination action.
+
+.SS devlink exception show - Display devlink exception conditions and actions attributes
+.TP
+.BI "DEV"
+Specifies the devlink device to show.
+
+.PP
+.TP
+.BI condition " NAME"
+Specifies the devlink condition to show.
+
+.TP
+.BI action " NAME"
+Specifies the devlink action to show.
+
+.SS devlink exception condition set - sets devlink exception condition attributes
+
+.TP
+.B "DEV"
+Specifies the devlink device to set.
+
+.TP
+.BI name " NAME"
+Name of the condition to set.
+
+.TP
+.BR action
+.IR NAME
+.R "{" active "|" inactive "} "
+.in +4
+Specify which actions to activate and which to deactivate once a condition was triggered. Actions can be dump, reset, etc.
+
+.SS devlink exception action set - sets devlink action attributes.
+Once this command is launched, period and count measurement will be reset.
+
+.TP
+.B "DEV"
+Specifies the devlink device to set.
+
+.TP
+.BI name " NAME"
+Specifies the devlink action to set.
+
+.TP
+.BI period " PERIOD"
+The period on which we limit the amount of performed actions, measured in seconds.
+
+.TP
+.BI count " COUNT"
+The maximum number of actions performed in a limited time frame.
+
+.TP
+.BR fail
+.R "{" ignore "|" down "}"
+.in +4
+Specify the behavior once count limit was reached.
+
+.I ignore
+- Skip triggering this action.
+
+.I down
+- Driver will remain in nonoperational state.
+
+.SH "EXAMPLES"
+.PP
+devlink exception show
+.RS 4
+Shows the exception state of all devlink devices on the system.
+.RE
+.PP
+devlink exception show pci/0000:01:00.0
+.RS 4
+Shows the exception state of specified devlink device.
+.RE
+.PP
+devlink exception condition set pci/0000:01:00.0 name TX_COMP_ERROR action reset inactive action dump active
+.RS 4
+Sets TX_COMP_ERROR condition parameters for a specific device.
+.RE
+.PP
+devlink exception action set pci/0000:01:00.0 name reset period 3600 count 5 fail ignore
+.RS 4
+Sets exception attributes for reset action. Period timer and counter are being reset.
+.RE
+
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-port (8),
+.BR devlink-sb (8),
+.BR devlink-monitor (8),
+.BR devlink-dev (8),
+.br
+
+.SH AUTHOR
+Eran ben Elisha <eranbe@mellanox.com>
--
1.8.3.1

^ permalink raw reply related [flat|nested] 8+ messages in thread
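The period/count/fail attributes of "devlink exception action set" documented in the man page above can be read as a rate limit on how often an action may run. The following is a minimal sketch under assumptions, not the proposed implementation: the class name `ActionLimiter` and its methods are hypothetical, the use of a fixed window that starts at the first action is an assumption (the RFC does not say whether the window is fixed or sliding), and clearing the "down" state on reconfiguration is likewise assumed.

```python
class ActionLimiter:
    """Hypothetical model of the period/count/fail attributes of one action."""

    def __init__(self, period, count, fail):
        self.configure(period, count, fail)

    def configure(self, period, count, fail):
        if fail not in ("ignore", "down"):
            raise ValueError("fail must be 'ignore' or 'down'")
        self.period = period  # window length in seconds
        self.count = count    # max actions performed per window
        self.fail = fail      # behavior once the count limit is reached
        # Per the man page, setting the attributes resets the period timer
        # and the counter. Clearing 'down' here is an assumption.
        self.window_start = None
        self.performed = 0
        self.down = False

    def request(self, now):
        """Return 'run', 'ignore' or 'down' for an action requested at `now`."""
        if self.down:
            return "down"
        # Assumed fixed window: start a new one when the old one has expired.
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start = now
            self.performed = 0
        if self.performed < self.count:
            self.performed += 1
            return "run"
        if self.fail == "down":
            self.down = True  # driver stays nonoperational
            return "down"
        return "ignore"       # skip triggering this action


# Same values as the last man page example: period 3600 count 5 fail ignore.
limiter = ActionLimiter(period=3600, count=5, fail="ignore")
results = [limiter.request(now=t) for t in range(0, 70, 10)]
print(results)
```

With these values the first five requests run and the remaining two inside the same hour are skipped; a request arriving once the 3600 second window has elapsed starts a new window.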
* Re: [RFC PATCH iproute2-next V2] man: Add devlink exception man page
  2018-09-26 11:52 ` [RFC PATCH iproute2-next V2] man: Add devlink exception man page Eran Ben Elisha
@ 2018-09-27 14:32   ` Jiri Pirko
  2018-09-27 16:26     ` David Ahern
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Pirko @ 2018-09-27 14:32 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

Wed, Sep 26, 2018 at 01:52:59PM CEST, eranbe@mellanox.com wrote:
>Add devlink-exception man page. Devlink-exception tool will control device
>exception attributes, conditions, actions and logging.
>
>Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>

[...]

>EXAMPLES

[...]

>       devlink exception action set pci/0000:01:00.0 name reset period 3600 count 5 fail ignore
>           Sets exception attributes for reset action. Period timer and counter are being reset.

Looks good to me. But still, I need the code so I can play with it, to
see the outputs etc.

Thanks!
* Re: [RFC PATCH iproute2-next V2] man: Add devlink exception man page
  2018-09-27 14:32   ` Jiri Pirko
@ 2018-09-27 16:26     ` David Ahern
  0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2018-09-27 16:26 UTC (permalink / raw)
  To: Jiri Pirko, Eran Ben Elisha
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

On 9/27/18 8:32 AM, Jiri Pirko wrote:
> But still, I need the code so I can play with it, to
> see the outputs etc.

+1

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH iproute2-next V2] System specification exception API
  2018-09-26 11:52 [RFC PATCH iproute2-next V2] System specification exception API Eran Ben Elisha
  2018-09-26 11:52 ` [RFC PATCH iproute2-next V2] man: Add devlink exception man page Eran Ben Elisha
@ 2018-09-27 12:47 ` Jiri Pirko
  2018-09-27 14:02   ` Eran Ben Elisha
  1 sibling, 1 reply; 8+ messages in thread
From: Jiri Pirko @ 2018-09-27 12:47 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

Wed, Sep 26, 2018 at 01:52:58PM CEST, eranbe@mellanox.com wrote:

[...]

>The condition checkers are divided into the following groups
>- Hardware - a checker which is triggered by the device due to
>  malfunction.
>- Software - a checker which is triggered by the software due to
>  malfunction.

What do you mean by a "software malfunction", a "FW malfunction"?
Also, I don't see these 2 groups in the man page.

[...]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH iproute2-next V2] System specification exception API
  2018-09-27 12:47 ` [RFC PATCH iproute2-next V2] System specification exception API Jiri Pirko
@ 2018-09-27 14:02   ` Eran Ben Elisha
  2018-09-27 14:34     ` Jiri Pirko
  0 siblings, 1 reply; 8+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 14:02 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

On 9/27/2018 3:47 PM, Jiri Pirko wrote:
> Wed, Sep 26, 2018 at 01:52:58PM CEST, eranbe@mellanox.com wrote:

[...]

>> The condition checkers are divided into the following groups
>> - Hardware - a checker which is triggered by the device due to
>>   malfunction.
>> - Software - a checker which is triggered by the software due to
>>   malfunction.
>
> What do you mean by a "software malfunction", a "FW malfunction"?
> Also, I don't see these 2 groups in the man page.

Software malfunction can be a transmit error (caused by a bad send request).
FW/HW malfunction can be any catastrophic error report (the ones that should
be exposed to the driver).
The comment here was to highlight that we can support different kinds of
condition groups.
If, for a specific condition, we need to highlight whether it is SW or HW, we
can concatenate that to its name.

Eran

[...]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH iproute2-next V2] System specification exception API
  2018-09-27 14:02   ` Eran Ben Elisha
@ 2018-09-27 14:34     ` Jiri Pirko
  2018-09-27 15:04       ` Eran Ben Elisha
  0 siblings, 1 reply; 8+ messages in thread
From: Jiri Pirko @ 2018-09-27 14:34 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

Thu, Sep 27, 2018 at 04:02:48PM CEST, eranbe@mellanox.com wrote:
>On 9/27/2018 3:47 PM, Jiri Pirko wrote:

[...]

>> What do you mean by a "software malfunction", a "FW malfunction"?
>> Also, I don't see these 2 groups in the man page.
>
>Software malfunction can be a transmit error (caused by a bad send request).

Sorry, but I still don't understand what "software malfunction" you are
talking about. Could you be more specific please?

[...]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH iproute2-next V2] System specification exception API
  2018-09-27 14:34     ` Jiri Pirko
@ 2018-09-27 15:04       ` Eran Ben Elisha
  0 siblings, 0 replies; 8+ messages in thread
From: Eran Ben Elisha @ 2018-09-27 15:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Jakub Kicinski, Jiri Pirko, Stephen Hemminger, Andrew Lunn, Tobin C. Harding, Ariel Almog, Tal Alon

On 9/27/2018 5:34 PM, Jiri Pirko wrote:
> Thu, Sep 27, 2018 at 04:02:48PM CEST, eranbe@mellanox.com wrote:

[...]

>> Software malfunction can be a transmit error (caused by a bad send request).
>
> Sorry, but I still don't understand what "software malfunction" you are
> talking about. Could you be more specific please?

* The driver builds a bad send work request (bug in driver, bug in packet
  generator, etc). When it posts the request, it gets back an error
  completion from the HW. This error might put the HW queue into an error
  state, in which it cannot be used again until it is "recovered".
  Condition: Error completion
  Action: Queue recover
  The entire scenario is due to SW malfunction.

* The driver tries to configure a HW QoS register, but the command is failed
  by the FW.
  Condition: Command execution error
  Action: Dump of command + dump of SW internal related DB + dump of FW
  related DB

* Another existing example is the ndo_tx_timeout routine. (This is done in
  the networking stack layer, and can be configured today from sysfs.) If a
  vendor driver has another specific checking routine like this one (which
  needs to be configured from userspace), then it can be handled via
  devlink-exception and be tagged as a software condition.

[...]

^ permalink raw reply [flat|nested] 8+ messages in thread
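The first scenario in the reply above (bad send work request, error completion, queue in error state, queue recover) can be sketched as a tiny state machine. This is a toy model under assumptions: `SendQueue`, `post`, `recover` and `handle_completion` are hypothetical names chosen for illustration and do not correspond to any driver or devlink API.

```python
class SendQueue:
    """Toy model of a HW send queue that enters an error state on a bad request."""

    def __init__(self):
        self.state = "ready"
        self.recoveries = 0

    def post(self, work_request_ok):
        """Post a work request and return the completion status reported back."""
        if self.state == "error":
            raise RuntimeError("queue is in error state and must be recovered first")
        if not work_request_ok:
            # A bad work request yields an error completion and leaves the
            # queue unusable until it is recovered.
            self.state = "error"
            return "error_completion"
        return "ok"

    def recover(self):
        self.state = "ready"
        self.recoveries += 1


def handle_completion(queue, status):
    # Condition: error completion. Action: queue recover.
    if status == "error_completion":
        queue.recover()
        return True
    return False


q = SendQueue()
status = q.post(work_request_ok=False)
handled = handle_completion(q, status)
print(status, handled, q.state)  # error_completion True ready
```

The condition check lives in `handle_completion` rather than inside the queue so that, as in the RFC, the mapping from a condition to its action stays separate from the code that detects the condition.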