* [PATCH net v2] devlink: Delay health recover notification until devlink registered
@ 2023-08-09 20:35 Shay Drory
2023-08-10 17:33 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Shay Drory @ 2023-08-09 20:35 UTC (permalink / raw)
To: netdev, pabeni, davem, kuba, edumazet; +Cc: Shay Drory, Jiri Pirko
From one side, devl_register() is done last in device initialization
phase, in order to expose devlink to the user only when device is
ready. From second side, it is valid to create health reporters
during device initialization, in order to recover and/or notify the
user.
As a result, a health recover can be invoked before devl_register().
However, invoking health recover before devl_register() triggers a
WARN_ON.
Hence, apply delay notification mechanism to health reporters.
Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
v1->v2:
- re-phrased the commit message.
---
net/devlink/devl_internal.h | 21 +++++++++++++++++++++
net/devlink/health.c | 29 +++++++++--------------------
net/devlink/leftover.c | 5 +++++
3 files changed, 35 insertions(+), 20 deletions(-)
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 62921b2eb0d3..9269dbe1b047 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -53,6 +53,25 @@ struct devlink {
char priv[] __aligned(NETDEV_ALIGN);
};
+struct devlink_health_reporter {
+ struct list_head list;
+ void *priv;
+ const struct devlink_health_reporter_ops *ops;
+ struct devlink *devlink;
+ struct devlink_port *devlink_port;
+ struct devlink_fmsg *dump_fmsg;
+ struct mutex dump_lock; /* lock parallel read/write from dump buffers */
+ u64 graceful_period;
+ bool auto_recover;
+ bool auto_dump;
+ u8 health_state;
+ u64 dump_ts;
+ u64 dump_real_ts;
+ u64 error_count;
+ u64 recovery_count;
+ u64 last_recovery_ts;
+};
+
extern struct xarray devlinks;
extern struct genl_family devlink_nl_family;
@@ -168,6 +187,8 @@ extern const struct devlink_cmd devl_cmd_selftests_get;
/* Notify */
void devlink_notify(struct devlink *devlink, enum devlink_command cmd);
+void devlink_recover_notify_check(struct devlink_health_reporter *reporter,
+ enum devlink_command cmd);
/* Ports */
int devlink_port_netdevice_event(struct notifier_block *nb,
diff --git a/net/devlink/health.c b/net/devlink/health.c
index 194340a8bb86..b1ceea733926 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -51,25 +51,6 @@ static void devlink_fmsg_free(struct devlink_fmsg *fmsg)
kfree(fmsg);
}
-struct devlink_health_reporter {
- struct list_head list;
- void *priv;
- const struct devlink_health_reporter_ops *ops;
- struct devlink *devlink;
- struct devlink_port *devlink_port;
- struct devlink_fmsg *dump_fmsg;
- struct mutex dump_lock; /* lock parallel read/write from dump buffers */
- u64 graceful_period;
- bool auto_recover;
- bool auto_dump;
- u8 health_state;
- u64 dump_ts;
- u64 dump_real_ts;
- u64 error_count;
- u64 recovery_count;
- u64 last_recovery_ts;
-};
-
void *
devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
{
@@ -480,7 +461,8 @@ static void devlink_recover_notify(struct devlink_health_reporter *reporter,
int err;
WARN_ON(cmd != DEVLINK_CMD_HEALTH_REPORTER_RECOVER);
- ASSERT_DEVLINK_REGISTERED(devlink);
+ if (!devl_is_registered(devlink))
+ return;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!msg)
@@ -496,6 +478,13 @@ static void devlink_recover_notify(struct devlink_health_reporter *reporter,
0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
}
+void devlink_recover_notify_check(struct devlink_health_reporter *reporter,
+ enum devlink_command cmd)
+{
+ if (reporter->error_count)
+ devlink_recover_notify(reporter, cmd);
+}
+
void
devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter)
{
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 1f00f874471f..6d07fd55c75b 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -6659,6 +6659,7 @@ void devlink_notify_register(struct devlink *devlink)
{
struct devlink_trap_policer_item *policer_item;
struct devlink_trap_group_item *group_item;
+ struct devlink_health_reporter *reporter;
struct devlink_param_item *param_item;
struct devlink_trap_item *trap_item;
struct devlink_port *devlink_port;
@@ -6695,6 +6696,10 @@ void devlink_notify_register(struct devlink *devlink)
xa_for_each(&devlink->params, param_id, param_item)
devlink_param_notify(devlink, 0, param_item,
DEVLINK_CMD_PARAM_NEW);
+
+ list_for_each_entry(reporter, &devlink->reporter_list, list)
+ devlink_recover_notify_check(reporter,
+ DEVLINK_CMD_HEALTH_REPORTER_RECOVER);
}
void devlink_notify_unregister(struct devlink *devlink)
--
2.38.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] devlink: Delay health recover notification until devlink registered
2023-08-09 20:35 [PATCH net v2] devlink: Delay health recover notification until devlink registered Shay Drory
@ 2023-08-10 17:33 ` Jakub Kicinski
2023-08-11 8:05 ` Jiri Pirko
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2023-08-10 17:33 UTC (permalink / raw)
To: Shay Drory; +Cc: netdev, pabeni, davem, edumazet, Jiri Pirko
On Wed, 9 Aug 2023 23:35:21 +0300 Shay Drory wrote:
> From one side, devl_register() is done last in device initialization
> phase, in order to expose devlink to the user only when device is
> ready. From second side, it is valid to create health reporters
> during device initialization, in order to recover and/or notify the
> user.
> As a result, a health recover can be invoked before devl_register().
> However, invoking health recover before devl_register() triggers a
> WARN_ON.
My comment on v1 wasn't clear enough, I guess.
What I was trying to get across is that because drivers can take
devl_lock(), devl_register() does not have to be last.
AFAIU your driver does:
devlink_port_health_reporter_create()
...
devlink_register()
why not change it to do:
devl_lock()
devl_register()
devl_port_health_reporter_create()
...
devl_unlock() # until unlock user space can't access the instance
?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] devlink: Delay health recover notification until devlink registered
2023-08-10 17:33 ` Jakub Kicinski
@ 2023-08-11 8:05 ` Jiri Pirko
2023-08-11 21:42 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Jiri Pirko @ 2023-08-11 8:05 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Shay Drory, netdev, pabeni, davem, edumazet, Jiri Pirko
Thu, Aug 10, 2023 at 07:33:00PM CEST, kuba@kernel.org wrote:
>On Wed, 9 Aug 2023 23:35:21 +0300 Shay Drory wrote:
>> From one side, devl_register() is done last in device initialization
>> phase, in order to expose devlink to the user only when device is
>> ready. From second side, it is valid to create health reporters
>> during device initialization, in order to recover and/or notify the
>> user.
>> As a result, a health recover can be invoked before devl_register().
>> However, invoking health recover before devl_register() triggers a
>> WARN_ON.
>
>My comment on v1 wasn't clear enough, I guess.
>
>What I was trying to get across is that because drivers can take
>devl_lock(), devl_register() does not have to be last.
>
>AFAIU your driver does:
>
> devlink_port_health_reporter_create()
> ...
> devlink_register()
>
>why not change it to do:
>
> devl_lock()
> devl_register()
> devl_port_health_reporter_create()
> ...
> devl_unlock() # until unlock user space can't access the instance
This patch is not about user accessing it, this is about
notification that would be tried to send before the instance is
registered. So the lock scheme you suggest is not necessary. What helps
is to move devl_port_health_reporter_create() call after register.
We have the same issue with mlxsw where this notification may be called
before register from mlxsw_core_health_event_work()
Limiting the creation of health reporter only when instance is
registered does not seem like a good solution to me.
As with any other object, we postpone the notifications only after
register is done, that sounds fine, doesn't it?
>
>?
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] devlink: Delay health recover notification until devlink registered
2023-08-11 8:05 ` Jiri Pirko
@ 2023-08-11 21:42 ` Jakub Kicinski
2023-08-11 22:14 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2023-08-11 21:42 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Shay Drory, netdev, pabeni, davem, edumazet, Jiri Pirko
On Fri, 11 Aug 2023 10:05:03 +0200 Jiri Pirko wrote:
> This patch is not about user accessing it, this is about
> notification that would be tried to send before the instance is
> registered. So the lock scheme you suggest is not necessary. What helps
> is to move devl_port_health_reporter_create() call after register.
We moved to a model where the instance is registered early,
why is the driver not registering the instance early?
My first choice would be to fix the driver rather than adjust
the behavior of the core, and I'm hearing no solid reason why
that's not possible.
> We have the same issue with mlxsw where this notification may be called
> before register from mlxsw_core_health_event_work()
mlxsw? I somehow assumed this patch was for mlx5..
Quality commit message :|
> Limiting the creation of health reporter only when instance is
> registered does not seem like a good solution to me.
>
> As with any other object, we postpone the notifications only after
> register is done, that sounds fine, doesn't it?
No, it doesn't. Registering health reporters on a semi-initialized
device is a major foot gun, we should not allow this unless really
necessary.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] devlink: Delay health recover notification until devlink registered
2023-08-11 21:42 ` Jakub Kicinski
@ 2023-08-11 22:14 ` Jakub Kicinski
2023-08-13 10:12 ` Shay Drory
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2023-08-11 22:14 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Shay Drory, netdev, pabeni, davem, edumazet, Jiri Pirko
On Fri, 11 Aug 2023 14:42:31 -0700 Jakub Kicinski wrote:
> > Limiting the creation of health reporter only when instance is
> > registered does not seem like a good solution to me.
> >
> > As with any other object, we postpone the notifications only after
> > register is done, that sounds fine, doesn't it?
>
> No, it doesn't. Registering health reporters on a semi-initialized
> device is a major foot gun, we should not allow this unless really
> necessary.
FWIW I mean that the recovery may race with the init and try to access
things which aren't set up yet. Could lead to deadlocks or crashes at
worst and sprinkling of "if (!initialized) return;" at best.
All the same problems we had before reload was put under the instance
lock basically.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] devlink: Delay health recover notification until devlink registered
2023-08-11 22:14 ` Jakub Kicinski
@ 2023-08-13 10:12 ` Shay Drory
0 siblings, 0 replies; 6+ messages in thread
From: Shay Drory @ 2023-08-13 10:12 UTC (permalink / raw)
To: Jakub Kicinski, Jiri Pirko; +Cc: netdev, pabeni, davem, edumazet, Jiri Pirko
On 12/08/2023 1:14, Jakub Kicinski wrote:
> External email: Use caution opening links or attachments
>
>
> On Fri, 11 Aug 2023 14:42:31 -0700 Jakub Kicinski wrote:
>>> Limiting the creation of health reporter only when instance is
>>> registered does not seem like a good solution to me.
>>>
>>> As with any other object, we postpone the notifications only after
>>> register is done, that sounds fine, doesn't it?
>> No, it doesn't. Registering health reporters on a semi-initialized
>> device is a major foot gun, we should not allow this unless really
>> necessary.
> FWIW I mean that the recovery may race with the init and try to access
> things which aren't set up yet. Could lead to deadlocks or crashes at
> worst and sprinkling of "if (!initialized) return;" at best.
> All the same problems we had before reload was put under the instance
> lock basically.
Ack, will work on it and send a new patch.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-08-13 10:12 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-09 20:35 [PATCH net v2] devlink: Delay health recover notification until devlink registered Shay Drory
2023-08-10 17:33 ` Jakub Kicinski
2023-08-11 8:05 ` Jiri Pirko
2023-08-11 21:42 ` Jakub Kicinski
2023-08-11 22:14 ` Jakub Kicinski
2023-08-13 10:12 ` Shay Drory
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).