From: Khasnis Soumya <soumya.khasnis@sony.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: rafael@kernel.org, linux-kernel@vger.kernel.org,
daniel.lezcano@linaro.org, festevam@denx.de, lee@kernel.org,
benjamin.bara@skidata.com, dmitry.osipenko@collabora.com,
ldmldm05@gmail.com, soumya.khasnis@sony.com,
srinavasa.nagaraju@sony.com, Madhusudan.Bobbili@sony.com,
shingo.takeuchi@sony.com, keita.aihara@sony.com,
masaya.takahashi@sony.com
Subject: Re: [PATCH v2] reboot: Add timeout for device shutdown
Date: Thu, 6 Jun 2024 08:48:17 +0000 [thread overview]
Message-ID: <20240606084817.GA118817@sony.com> (raw)
In-Reply-To: <2024052927-traffic-lazy-e3ad@gregkh>
On Wed, May 29, 2024 at 02:51:48PM +0200, Greg KH wrote:
> On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver.
>
> Why not fix those drivers? A release callback should not stall, and if
> it does, that's a bug that should be fixed there.
Yes, the drivers must be fixed. But currently there is no good mechanism
which detects such stalls and there by resulting in a system hang. This
serves as a fail-safe mechanism to prevent such stalls at reboot/shutdown.
>
> Or use a watchdog and just reboot if that triggers at shutdown time.
The hard or software watchdog enabled by config_lockup_detector couldn’t
detect the cases when stalled on IO wait (wait_for_completion/io)
>
> > In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback which blocks the shutdown or
> > reboot process.
>
> Dump it where?
Dump to kernel log and there by the console. Will update the commit message
>
>
> >
> > Signed-off-by: Soumya Khasnis <soumya.khasnis@sony.com>
> > Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@sony.com>
> > ---
> > drivers/base/Kconfig | 15 +++++++++++++++
> > kernel/reboot.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..d06e379b6281 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> > work on.
> >
> > endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > + bool "device shutdown timeout"
> > + default n
>
> That is the default, so no need for this.
Was mindful of the “slow” devices in the system with long IO wait times.
Will have to consider increasing DEVICE_SHUTDOWN_TIMEOUT_SEC and make this
a default y
>
>
> > + help
> > + Enable timeout for device shutdown. Helps in case device shutdown
> > + is hung during shoutdonw and reboot.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > + int "device shutdown timeout in seconds"
> > + default 5
> > + depends on DEVICE_SHUTDOWN_TIMEOUT
> > + help
> > + sets time for device shutdown timeout in seconds
>
> You need much more help text for all of these.
Will update the commit with more help text
>
> And why are these in the drivers/base/Kconfig file? It has nothing to
> do with "devices", or the driver core, it's all core kernel reboot
> logic.
This is handling only device shutdown, not complete reboot part,
so we want to keep it here.
>
>
> > diff --git a/kernel/reboot.c b/kernel/reboot.c
> > index 22c16e2564cc..8460bd24563b 100644
> > --- a/kernel/reboot.c
> > +++ b/kernel/reboot.c
> > @@ -18,7 +18,7 @@
> > #include <linux/syscalls.h>
> > #include <linux/syscore_ops.h>
> > #include <linux/uaccess.h>
> > -
> > +#include <linux/sched/debug.h>
>
> Why remove the blank line?
Will fix it.
>
> > /*
> > * this indicates whether you can reboot with ctrl-alt-del: the default is yes
> > */
> > @@ -48,6 +48,14 @@ int reboot_cpu;
> > enum reboot_type reboot_type = BOOT_ACPI;
> > int reboot_force;
> >
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > + struct timer_list timer;
> > + struct task_struct *task;
> > +} devs_shutdown;
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
>
> #ifdefs should not be in .c files, please put this in a .h file where it
> belongs. Same for the other #ifdefs.
Will fix it
>
>
>
> > +
> > struct sys_off_handler {
> > struct notifier_block nb;
> > int (*sys_off_cb)(struct sys_off_data *data);
> > @@ -88,12 +96,46 @@ void emergency_restart(void)
> > }
> > EXPORT_SYMBOL_GPL(emergency_restart);
> >
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > + pr_emerg("**** device shutdown timeout ****\n");
>
> What does this have to do with "devices"? This is a whole-system issue,
> or really a "broken driver" issue.
Shutdown/reboot procedure involves quite a few routines. We are specifically
dealing with a broken driver or device malfunction.
>
> > + show_stack(devs_shutdown.task, NULL, KERN_EMERG);
>
> How do you know this is the 'device shutdown' stack? What is a "device
> shutdown"?
The timeout handler is invoked in the context of reboot/shutdown process.
We are interested in the device shutdown callback which was stuck and
preventing the reboot/shutdown. By device shutdown, we refer to device
class/bus/driver shutdown calls.
>
> > + if (system_state == SYSTEM_RESTART)
> > + emergency_restart();
> > + else
> > + machine_power_off();
> > +}
> > +
> > +static void device_shutdown_timer_set(void)
> > +{
> > + devs_shutdown.task = current;
>
> It's just the normal shutdown stack/process, why call it a device?
We are specifically dealing with a broken driver or device malfunction,
So to indicate same, we want to name it as device_shutdown stack.
>
> > + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > + add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > + del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> > void kernel_restart_prepare(char *cmd)
> > {
> > blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> > system_state = SYSTEM_RESTART;
> > usermodehelper_disable();
> > + device_shutdown_timer_set();
> > device_shutdown();
> > + device_shutdown_timer_clr();
>
> Why isn't all of this done in device_shutdown() if you think it is a
> device issue? Why put it in reboot.c?
Will consider moving to device_shutdown()..
>
> thanks,
>
> greg k-h
Thanks and Regards,
Soumya.
next prev parent reply other threads:[~2024-06-06 8:58 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-29 11:00 [PATCH v2] reboot: Add timeout for device shutdown Soumya Khasnis
2024-05-29 12:47 ` Greg KH
2024-05-29 12:51 ` Greg KH
2024-06-06 8:48 ` Khasnis Soumya [this message]
2024-05-30 3:02 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240606084817.GA118817@sony.com \
--to=soumya.khasnis@sony.com \
--cc=Madhusudan.Bobbili@sony.com \
--cc=benjamin.bara@skidata.com \
--cc=daniel.lezcano@linaro.org \
--cc=dmitry.osipenko@collabora.com \
--cc=festevam@denx.de \
--cc=gregkh@linuxfoundation.org \
--cc=keita.aihara@sony.com \
--cc=ldmldm05@gmail.com \
--cc=lee@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masaya.takahashi@sony.com \
--cc=rafael@kernel.org \
--cc=shingo.takeuchi@sony.com \
--cc=srinavasa.nagaraju@sony.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox