* [PATCH net-next 0/3] Devlink health updates
@ 2019-03-03 8:57 Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03 8:57 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha
This patchset includes a fix [patch 01] to the devlink health state update, in
case recover was aborted.
In addition, it includes a small enhancement to the infrastructure in order to
allow direct state update in run-time, and use it from mlx5e tx reporter.
Eran Ben Elisha (3):
devlink: Update reporter state to error even if recover aborted
devlink: Add support for direct reporter health state update
net/mlx5e: Update tx reporter status in case channels were
successfully opened
.../net/ethernet/mellanox/mlx5/core/en_main.c | 4 +++
include/net/devlink.h | 14 +++++++++
include/trace/events/devlink.h | 31 +++++++++++++++++++
net/core/devlink.c | 27 ++++++++++++----
4 files changed, 70 insertions(+), 6 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted
2019-03-03 8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
@ 2019-03-03 8:57 ` Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03 8:57 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha
If devlink_health_report() aborted the recover flow due to grace period checker,
it left the reporter status as DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, which is
a bug. Fix that by always setting the reporter state to
DEVLINK_HEALTH_REPORTER_STATE_ERROR prior to running the checker mentioned above.
In addition, save the previous health_state in a temporary variable, then use
it in the abort check comparison instead of using reporter->health_state which
might be already changed.
Fixes: c8e1da0bf923 ("devlink: Add health report functionality")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
net/core/devlink.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6515fbec0dcd..376e01a70c6d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4569,16 +4569,19 @@ static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
int devlink_health_report(struct devlink_health_reporter *reporter,
const char *msg, void *priv_ctx)
{
+ enum devlink_health_reporter_state prev_health_state;
struct devlink *devlink = reporter->devlink;
/* write a log message of the current error */
WARN_ON(!msg);
trace_devlink_health_report(devlink, reporter->ops->name, msg);
reporter->error_count++;
+ prev_health_state = reporter->health_state;
+ reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
/* abort if the previous error wasn't recovered */
if (reporter->auto_recover &&
- (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+ (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
jiffies - reporter->last_recovery_ts <
msecs_to_jiffies(reporter->graceful_period))) {
trace_devlink_health_recover_aborted(devlink,
--
2.17.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next 2/3] devlink: Add support for direct reporter health state update
2019-03-03 8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
@ 2019-03-03 8:57 ` Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller
3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03 8:57 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha
It is possible that a reporter state will be updated due to a recover flow
which is not triggered by a devlink health related operation, but as a side
effect of some other operation in the system.
Expose devlink health API for a direct update of a reporter status.
Move devlink_health_reporter_state enum definition to devlink.h so it could
be used from drivers as a parameter of devlink_health_reporter_state_update.
In addition, add trace_devlink_health_reporter_state_update to provide user
notification for reporter state change.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 14 ++++++++++++++
include/trace/events/devlink.h | 31 +++++++++++++++++++++++++++++++
net/core/devlink.c | 22 +++++++++++++++++-----
3 files changed, 62 insertions(+), 5 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7f5a0bdca228..63de99e09f04 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -447,6 +447,11 @@ typedef void devlink_snapshot_data_dest_t(const void *data);
struct devlink_fmsg;
struct devlink_health_reporter;
+enum devlink_health_reporter_state {
+ DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+ DEVLINK_HEALTH_REPORTER_STATE_ERROR,
+};
+
/**
* struct devlink_health_reporter_ops - Reporter operations
* @name: reporter name
@@ -715,6 +720,9 @@ void *
devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
int devlink_health_report(struct devlink_health_reporter *reporter,
const char *msg, void *priv_ctx);
+void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+ enum devlink_health_reporter_state state);
void devlink_compat_running_version(struct net_device *dev,
char *buf, size_t len);
@@ -1204,6 +1212,12 @@ devlink_health_report(struct devlink_health_reporter *reporter,
return 0;
}
+static inline void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+ enum devlink_health_reporter_state state)
+{
+}
+
static inline void
devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
{
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 191ddf67d769..6f60a78d9a7e 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -140,6 +140,37 @@ TRACE_EVENT(devlink_health_recover_aborted,
__entry->time_since_last_recover)
);
+/*
+ * Tracepoint for devlink health reporter state update:
+ */
+TRACE_EVENT(devlink_health_reporter_state_update,
+ TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+ bool new_state),
+
+ TP_ARGS(devlink, reporter_name, new_state),
+
+ TP_STRUCT__entry(
+ __string(bus_name, devlink->dev->bus->name)
+ __string(dev_name, dev_name(devlink->dev))
+ __string(driver_name, devlink->dev->driver->name)
+ __string(reporter_name, reporter_name)
+ __field(u8, new_state)
+ ),
+
+ TP_fast_assign(
+ __assign_str(bus_name, devlink->dev->bus->name);
+ __assign_str(dev_name, dev_name(devlink->dev));
+ __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(reporter_name, reporter_name);
+ __entry->new_state = new_state;
+ ),
+
+ TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: new_state=%d",
+ __get_str(bus_name), __get_str(dev_name),
+ __get_str(driver_name), __get_str(reporter_name),
+ __entry->new_state)
+);
+
#endif /* _TRACE_DEVLINK_H */
/* This part must be outside protection */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 376e01a70c6d..78e22cea4cc7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4409,11 +4409,6 @@ struct devlink_health_reporter {
u64 last_recovery_ts;
};
-enum devlink_health_reporter_state {
- DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
- DEVLINK_HEALTH_REPORTER_STATE_ERROR,
-};
-
void *
devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
{
@@ -4498,6 +4493,23 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
}
EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+ enum devlink_health_reporter_state state)
+{
+ if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY &&
+ state != DEVLINK_HEALTH_REPORTER_STATE_ERROR))
+ return;
+
+ if (reporter->health_state == state)
+ return;
+
+ reporter->health_state = state;
+ trace_devlink_health_reporter_state_update(reporter->devlink,
+ reporter->ops->name, state);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update);
+
static int
devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
void *priv_ctx)
--
2.17.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened
2019-03-03 8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
@ 2019-03-03 8:57 ` Eran Ben Elisha
2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller
3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03 8:57 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha
Once channels were successfully opened, update tx reporter health state to
healthy. This is needed for the following scenario:
- SQ has an un-recovered error reported to the devlink health, resulting tx
reporter state to be error.
- Current channels (including this SQ) are closed
- New channels are opened
After that flow, the original error was "solved", and tx reporter state
should be healthy. However, as it was resolved as a side effect, and not
via tx reporter recover method, driver needs to inform devlink health
about it.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e5f74eb986b3..b5fdbd3190d9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2302,6 +2302,10 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
goto err_close_channels;
}
+ if (!IS_ERR_OR_NULL(priv->tx_reporter))
+ devlink_health_reporter_state_update(priv->tx_reporter,
+ DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+
kvfree(cparam);
return 0;
--
2.17.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next 0/3] Devlink health updates
2019-03-03 8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
` (2 preceding siblings ...)
2019-03-03 8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
@ 2019-03-04 19:00 ` David Miller
3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2019-03-04 19:00 UTC (permalink / raw)
To: eranbe; +Cc: netdev, saeedm, jiri
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Sun, 3 Mar 2019 10:57:28 +0200
> This patchset includes a fix [patch 01] to the devlink health state update, in
> case recover was aborted.
>
> In addition, it includes a small enhancement to the infrastructure in order to
> allow direct state update in run-time, and use it from mlx5e tx reporter.
Series applied, thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-03-04 19:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-03 8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
2019-03-03 8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).