netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] Devlink health updates
@ 2019-03-03  8:57 Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03  8:57 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha

This patchset includes a fix [patch 01] to the devlink health state update, in
case recover was aborted.

In addition, it includes a small enhancement to the infrastructure in order to
allow direct state update in run-time, and use it from mlx5e tx reporter.

Eran Ben Elisha (3):
  devlink: Update reporter state to error even if recover aborted
  devlink: Add support for direct reporter health state update
  net/mlx5e: Update tx reporter status in case channels were
    successfully opened

 .../net/ethernet/mellanox/mlx5/core/en_main.c |  4 +++
 include/net/devlink.h                         | 14 +++++++++
 include/trace/events/devlink.h                | 31 +++++++++++++++++++
 net/core/devlink.c                            | 27 ++++++++++++----
 4 files changed, 70 insertions(+), 6 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted
  2019-03-03  8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
@ 2019-03-03  8:57 ` Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03  8:57 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha

If devlink_health_report() aborted the recover flow due to grace period checker,
it left the reporter status as DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, which is
a bug. Fix that by always setting the reporter state to
DEVLINK_HEALTH_REPORTER_STATE_ERROR prior to running the checker mentioned above.

In addition, save the previous health_state in a temporary variable, then use
it in the abort check comparison instead of using reporter->health_state which
might be already changed.

Fixes: c8e1da0bf923 ("devlink: Add health report functionality")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/devlink.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6515fbec0dcd..376e01a70c6d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4569,16 +4569,19 @@ static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
 int devlink_health_report(struct devlink_health_reporter *reporter,
 			  const char *msg, void *priv_ctx)
 {
+	enum devlink_health_reporter_state prev_health_state;
 	struct devlink *devlink = reporter->devlink;
 
 	/* write a log message of the current error */
 	WARN_ON(!msg);
 	trace_devlink_health_report(devlink, reporter->ops->name, msg);
 	reporter->error_count++;
+	prev_health_state = reporter->health_state;
+	reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
 
 	/* abort if the previous error wasn't recovered */
 	if (reporter->auto_recover &&
-	    (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+	    (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
 	     jiffies - reporter->last_recovery_ts <
 	     msecs_to_jiffies(reporter->graceful_period))) {
 		trace_devlink_health_recover_aborted(devlink,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 2/3] devlink: Add support for direct reporter health state update
  2019-03-03  8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
@ 2019-03-03  8:57 ` Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
  2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03  8:57 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha

It is possible that a reporter state will be updated due to a recover flow
which is not triggered by a devlink health related operation, but as a side
effect of some other operation in the system.

Expose devlink health API for a direct update of a reporter status.

Move devlink_health_reporter_state enum definition to devlink.h so it could
be used from drivers as a parameter of devlink_health_reporter_state_update.

In addition, add trace_devlink_health_reporter_state_update to provide user
notification for reporter state change.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h          | 14 ++++++++++++++
 include/trace/events/devlink.h | 31 +++++++++++++++++++++++++++++++
 net/core/devlink.c             | 22 +++++++++++++++++-----
 3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7f5a0bdca228..63de99e09f04 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -447,6 +447,11 @@ typedef void devlink_snapshot_data_dest_t(const void *data);
 struct devlink_fmsg;
 struct devlink_health_reporter;
 
+enum devlink_health_reporter_state {
+	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
+};
+
 /**
  * struct devlink_health_reporter_ops - Reporter operations
  * @name: reporter name
@@ -715,6 +720,9 @@ void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
 int devlink_health_report(struct devlink_health_reporter *reporter,
 			  const char *msg, void *priv_ctx);
+void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+				     enum devlink_health_reporter_state state);
 
 void devlink_compat_running_version(struct net_device *dev,
 				    char *buf, size_t len);
@@ -1204,6 +1212,12 @@ devlink_health_report(struct devlink_health_reporter *reporter,
 	return 0;
 }
 
+static inline void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+				     enum devlink_health_reporter_state state)
+{
+}
+
 static inline void
 devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
 {
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 191ddf67d769..6f60a78d9a7e 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -140,6 +140,37 @@ TRACE_EVENT(devlink_health_recover_aborted,
 		  __entry->time_since_last_recover)
 );
 
+/*
+ * Tracepoint for devlink health reporter state update:
+ */
+TRACE_EVENT(devlink_health_reporter_state_update,
+	TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+		 bool new_state),
+
+	TP_ARGS(devlink, reporter_name, new_state),
+
+	TP_STRUCT__entry(
+		__string(bus_name, devlink->dev->bus->name)
+		__string(dev_name, dev_name(devlink->dev))
+		__string(driver_name, devlink->dev->driver->name)
+		__string(reporter_name, reporter_name)
+		__field(u8, new_state)
+	),
+
+	TP_fast_assign(
+		__assign_str(bus_name, devlink->dev->bus->name);
+		__assign_str(dev_name, dev_name(devlink->dev));
+		__assign_str(driver_name, devlink->dev->driver->name);
+		__assign_str(reporter_name, reporter_name);
+		__entry->new_state = new_state;
+	),
+
+	TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: new_state=%d",
+		  __get_str(bus_name), __get_str(dev_name),
+		  __get_str(driver_name), __get_str(reporter_name),
+		  __entry->new_state)
+);
+
 #endif /* _TRACE_DEVLINK_H */
 
 /* This part must be outside protection */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 376e01a70c6d..78e22cea4cc7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4409,11 +4409,6 @@ struct devlink_health_reporter {
 	u64 last_recovery_ts;
 };
 
-enum devlink_health_reporter_state {
-	DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
-	DEVLINK_HEALTH_REPORTER_STATE_ERROR,
-};
-
 void *
 devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
 {
@@ -4498,6 +4493,23 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
 }
 EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
 
+void
+devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
+				     enum devlink_health_reporter_state state)
+{
+	if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY &&
+		    state != DEVLINK_HEALTH_REPORTER_STATE_ERROR))
+		return;
+
+	if (reporter->health_state == state)
+		return;
+
+	reporter->health_state = state;
+	trace_devlink_health_reporter_state_update(reporter->devlink,
+						   reporter->ops->name, state);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update);
+
 static int
 devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
 				void *priv_ctx)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened
  2019-03-03  8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
  2019-03-03  8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
@ 2019-03-03  8:57 ` Eran Ben Elisha
  2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Eran Ben Elisha @ 2019-03-03  8:57 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Saeed Mahameed, Jiri Pirko, Eran Ben Elisha

Once channels were successfully opened, update tx reporter health state to
healthy. This is needed for the following scenario:
- SQ has an un-recovered error reported to the devlink health, resulting tx
  reporter state to be error.
- Current channels (including this SQ) are closed
- New channels are opened
After that flow, the original error was "solved", and tx reporter state
should be healthy. However, as it was resolved as a side effect, and not
via tx reporter recover method, driver needs to inform devlink health
about it.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e5f74eb986b3..b5fdbd3190d9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2302,6 +2302,10 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
 			goto err_close_channels;
 	}
 
+	if (!IS_ERR_OR_NULL(priv->tx_reporter))
+		devlink_health_reporter_state_update(priv->tx_reporter,
+						     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+
 	kvfree(cparam);
 	return 0;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 0/3] Devlink health updates
  2019-03-03  8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
                   ` (2 preceding siblings ...)
  2019-03-03  8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
@ 2019-03-04 19:00 ` David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2019-03-04 19:00 UTC (permalink / raw)
  To: eranbe; +Cc: netdev, saeedm, jiri

From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Sun,  3 Mar 2019 10:57:28 +0200

> This patchset includes a fix [patch 01] to the devlink health state update, in
> case recover was aborted.
> 
> In addition, it includes a small enhancement to the infrastructure in order to
> allow direct state update in run-time, and use it from mlx5e tx reporter.

Series applied, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-03-04 19:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-03  8:57 [PATCH net-next 0/3] Devlink health updates Eran Ben Elisha
2019-03-03  8:57 ` [PATCH net-next 1/3] devlink: Update reporter state to error even if recover aborted Eran Ben Elisha
2019-03-03  8:57 ` [PATCH net-next 2/3] devlink: Add support for direct reporter health state update Eran Ben Elisha
2019-03-03  8:57 ` [PATCH net-next 3/3] net/mlx5e: Update tx reporter status in case channels were successfully opened Eran Ben Elisha
2019-03-04 19:00 ` [PATCH net-next 0/3] Devlink health updates David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).