All of lore.kernel.org
 help / color / mirror / Atom feed
From: Khasnis Soumya <soumya.khasnis@sony.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: rafael@kernel.org, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org, festevam@denx.de, lee@kernel.org,
	benjamin.bara@skidata.com, dmitry.osipenko@collabora.com,
	ldmldm05@gmail.com, soumya.khasnis@sony.com,
	srinavasa.nagaraju@sony.com, Madhusudan.Bobbili@sony.com,
	shingo.takeuchi@sony.com, keita.aihara@sony.com,
	masaya.takahashi@sony.com
Subject: Re: [PATCH v2] reboot: Add timeout for device shutdown
Date: Thu, 6 Jun 2024 08:48:17 +0000	[thread overview]
Message-ID: <20240606084817.GA118817@sony.com> (raw)
In-Reply-To: <2024052927-traffic-lazy-e3ad@gregkh>

On Wed, May 29, 2024 at 02:51:48PM +0200, Greg KH wrote:
> On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver.
> 
> Why not fix those drivers?  A release callback should not stall, and if
> it does, that's a bug that should be fixed there.
Yes, the drivers must be fixed. But currently there is no good mechanism
which detects such stalls and there by resulting in a system hang. This
serves as a fail-safe mechanism to prevent such stalls at reboot/shutdown.

> 
> Or use a watchdog and just reboot if that triggers at shutdown time.
The hard or software watchdog enabled by config_lockup_detector couldn’t
detect the cases when stalled on IO wait (wait_for_completion/io)

> 
> > In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback which blocks the shutdown or
> > reboot process.
> 
> Dump it where?
Dump to kernel log and there by the console. Will update the commit message

> 
> 
> > 
> > Signed-off-by: Soumya Khasnis <soumya.khasnis@sony.com>
> > Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@sony.com>
> > ---
> >  drivers/base/Kconfig | 15 +++++++++++++++
> >  kernel/reboot.c      | 46 +++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 60 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..d06e379b6281 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> >  	  work on.
> >  
> >  endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > +	bool "device shutdown timeout"
> > +	default n
> 
> That is the default, so no need for this.
Was mindful of the “slow” devices in the system with long IO wait times.
Will have to consider increasing DEVICE_SHUTDOWN_TIMEOUT_SEC and make this
a default y

> 
> 
> > +	help
> > +	   Enable timeout for device shutdown. Helps in case device shutdown
> > +	   is hung during shoutdonw and reboot.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +	int "device shutdown timeout in seconds"
> > +	default 5
> > +	depends on DEVICE_SHUTDOWN_TIMEOUT
> > +	help
> > +	  sets time for device shutdown timeout in seconds
> 
> You need much more help text for all of these.
Will update the commit with more help text

> 
> And why are these in the drivers/base/Kconfig file?  It has nothing to
> do with "devices", or the driver core, it's all core kernel reboot
> logic.
This is handling only device shutdown, not complete reboot part,
so we want to keep it here.

> 
> 
> > diff --git a/kernel/reboot.c b/kernel/reboot.c
> > index 22c16e2564cc..8460bd24563b 100644
> > --- a/kernel/reboot.c
> > +++ b/kernel/reboot.c
> > @@ -18,7 +18,7 @@
> >  #include <linux/syscalls.h>
> >  #include <linux/syscore_ops.h>
> >  #include <linux/uaccess.h>
> > -
> > +#include <linux/sched/debug.h>
> 
> Why remove the blank line?
Will fix it.

> 
> >  /*
> >   * this indicates whether you can reboot with ctrl-alt-del: the default is yes
> >   */
> > @@ -48,6 +48,14 @@ int reboot_cpu;
> >  enum reboot_type reboot_type = BOOT_ACPI;
> >  int reboot_force;
> >  
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > +	struct timer_list timer;
> > +	struct task_struct *task;
> > +} devs_shutdown;
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
> 
> #ifdefs should not be in .c files, please put this in a .h file where it
> belongs.  Same for the other #ifdefs.
Will fix it

> 
> 
> 
> > +
> >  struct sys_off_handler {
> >  	struct notifier_block nb;
> >  	int (*sys_off_cb)(struct sys_off_data *data);
> > @@ -88,12 +96,46 @@ void emergency_restart(void)
> >  }
> >  EXPORT_SYMBOL_GPL(emergency_restart);
> >  
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > +	pr_emerg("**** device shutdown timeout ****\n");
> 
> What does this have to do with "devices"?  This is a whole-system issue,
> or really a "broken driver" issue.
Shutdown/reboot procedure involves quite a few routines. We are specifically
dealing with a broken driver or device malfunction.

> 
> > +	show_stack(devs_shutdown.task, NULL, KERN_EMERG);
> 
> How do you know this is the 'device shutdown' stack?  What is a "device
> shutdown"?
The timeout handler is invoked in the context of reboot/shutdown process.
We are interested in the device shutdown callback which was stuck and
preventing the reboot/shutdown. By device shutdown, we refer to device
class/bus/driver shutdown calls.

> 
> > +	if (system_state == SYSTEM_RESTART)
> > +		emergency_restart();
> > +	else
> > +		machine_power_off();
> > +}
> > +
> > +static void device_shutdown_timer_set(void)
> > +{
> > +	devs_shutdown.task = current;
> 
> It's just the normal shutdown stack/process, why call it a device?
We are specifically dealing with a broken driver or device malfunction,
So to indicate same, we want to name it as device_shutdown stack.

> 
> > +	timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > +	devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > +	add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > +	del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> >  void kernel_restart_prepare(char *cmd)
> >  {
> >  	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> >  	system_state = SYSTEM_RESTART;
> >  	usermodehelper_disable();
> > +	device_shutdown_timer_set();
> >  	device_shutdown();
> > +	device_shutdown_timer_clr();
> 
> Why isn't all of this done in device_shutdown() if you think it is a
> device issue?  Why put it in reboot.c?
Will consider moving to device_shutdown()..

> 
> thanks,
> 
> greg k-h

Thanks and Regards,
Soumya.

  reply	other threads:[~2024-06-06  8:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-29 11:00 [PATCH v2] reboot: Add timeout for device shutdown Soumya Khasnis
2024-05-29 12:47 ` Greg KH
2024-05-29 12:51 ` Greg KH
2024-06-06  8:48   ` Khasnis Soumya [this message]
2024-05-30  3:02 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240606084817.GA118817@sony.com \
    --to=soumya.khasnis@sony.com \
    --cc=Madhusudan.Bobbili@sony.com \
    --cc=benjamin.bara@skidata.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dmitry.osipenko@collabora.com \
    --cc=festevam@denx.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=keita.aihara@sony.com \
    --cc=ldmldm05@gmail.com \
    --cc=lee@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masaya.takahashi@sony.com \
    --cc=rafael@kernel.org \
    --cc=shingo.takeuchi@sony.com \
    --cc=srinavasa.nagaraju@sony.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.