public inbox for linux-kernel@vger.kernel.org
From: Khasnis Soumya <soumya.khasnis@sony.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: gregkh@linuxfoundation.org, rafael@kernel.org,
	linux-kernel@vger.kernel.org, festevam@denx.de, lee@kernel.org,
	benjamin.bara@skidata.com, dmitry.osipenko@collabora.com,
	ldmldm05@gmail.com, soumya.khasnis@sony.com,
	srinavasa.nagaraju@sony.com, Madhusudan.Bobbili@sony.com,
	shingo.takeuchi@sony.com, keita.aihara@sony.com,
	masaya.takahashi@sony.com
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown
Date: Fri, 7 Jun 2024 11:37:50 +0000
Message-ID: <20240607113750.GA30558@sony.com>
In-Reply-To: <97c8ab14-f56f-4a25-b036-51679251adf3@linaro.org>

On Thu, Jun 06, 2024 at 05:23:19PM +0200, Daniel Lezcano wrote:
> On 06/06/2024 10:50, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver. In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback to kernel logs, which blocks
> > the shutdown or reboot process.
> 
> Is that not somehow already achieved by the watchdog mechanism ?
The hard and soft lockup watchdogs enabled by CONFIG_LOCKUP_DETECTOR cannot
detect cases where shutdown is stalled in an I/O wait (wait_for_completion/io).
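
As a userspace analogy only (not kernel code and not part of the patch), the
sketch below shows why a blocked wait is invisible to a detector that looks
for a CPU stuck running: the waiter sleeps and consumes almost no CPU time
while wall-clock time passes. The names blocked_wait() and struct wait_cost
are hypothetical, and poll() on an idle pipe merely stands in for a wait on
I/O that never completes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Current reading of the given clock, in seconds. */
static double now(clockid_t clk)
{
	struct timespec ts;

	clock_gettime(clk, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical result type for this illustration. */
struct wait_cost {
	double wall;	/* elapsed wall-clock seconds */
	double cpu;	/* CPU seconds consumed by this process */
};

/*
 * Block for up to wait_ms on a pipe that never receives data and
 * report how much wall-clock and CPU time the wait cost.
 */
static struct wait_cost blocked_wait(int wait_ms)
{
	struct wait_cost c = { 0.0, 0.0 };
	struct pollfd pfd;
	double wall0, cpu0;
	int fds[2];
	int rc;

	rc = pipe(fds);
	assert(rc == 0);

	pfd.fd = fds[0];
	pfd.events = POLLIN;

	wall0 = now(CLOCK_MONOTONIC);
	cpu0 = now(CLOCK_PROCESS_CPUTIME_ID);
	poll(&pfd, 1, wait_ms);		/* sleeps: nobody writes to fds[1] */
	c.wall = now(CLOCK_MONOTONIC) - wall0;
	c.cpu = now(CLOCK_PROCESS_CPUTIME_ID) - cpu0;

	close(fds[0]);
	close(fds[1]);
	return c;
}
```

Calling blocked_wait(200) should report a wall time of roughly 0.2 seconds
and a CPU time close to zero, which is the pattern a stalled shutdown
callback would show.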

> 
> > Signed-off-by: Soumya Khasnis <soumya.khasnis@sony.com>
> > Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@sony.com>
> > ---
> > Changes in v3:
> >    -fix review comments
> >    -updated commit message
> > 
> >   drivers/base/Kconfig | 18 ++++++++++++++++++
> >   drivers/base/base.h  |  8 ++++++++
> >   drivers/base/core.c  | 40 ++++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 66 insertions(+)
> > 
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..342d3f87a404 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> >   	  work on.
> >   
> >   endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > +	bool "device shutdown timeout"
> > +	default y
> > +	help
> > +	   Enable timeout for device shutdown. In case of device shutdown is
> > +	   broken or device is not responding, system shutdown or restart may hang.
> > +	   This timeout handles such situation and triggers emergency_restart or
> > +	   machine_power_off. Also dumps call trace of shutdown process.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +	int "device shutdown timeout in seconds"
> > +	range 10 60
> > +	default 10
> 
> How do you know the shutdown time is between this range?
> 
> What about large systems ?
Agreed, it is difficult to set a single timeout for all devices.
I based this range on consumer devices, where the response time cannot be longer.
But as you mentioned, we cannot enable this configuration by default ("true/y")
with a fixed range. I will change the patch to set the default back to
"false/n" as before, and will also remove the range.

> 
> > +	depends on DEVICE_SHUTDOWN_TIMEOUT
> > +	help
> > +	  sets time for device shutdown timeout in seconds
> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > index 0738ccad08b2..97eea57a8868 100644
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
> >   
> >   void software_node_notify(struct device *dev);
> >   void software_node_notify_remove(struct device *dev);
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > +	struct timer_list timer;
> > +	struct task_struct *task;
> > +};
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index b93f3c5716ae..dab455054a80 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -35,6 +35,12 @@
> >   #include "base.h"
> >   #include "physical_location.h"
> >   #include "power/power.h"
> > +#include <linux/sched/debug.h>
> > +#include <linux/reboot.h>
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout devs_shutdown;
> > +#endif
> >   
> >   /* Device links support. */
> >   static LIST_HEAD(deferred_sync);
> > @@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
> >   }
> >   EXPORT_SYMBOL_GPL(device_change_owner);
> >   
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > +	pr_emerg("**** device shutdown timeout ****\n");
> > +	show_stack(devs_shutdown.task, NULL, KERN_EMERG);
> > +	if (system_state == SYSTEM_RESTART)
> > +		emergency_restart();
> > +	else
> > +		machine_power_off();
> > +}
> 
> So if one device is misbehaving, all the others shutdown callbacks are 
> skipped with emergency halt/reboot ? That is prone to break the system, no?
Skipping the other callbacks may not break the system; in any case, an emergency
shutdown or reboot is better than leaving the system in a hung state. That is the
main purpose of this patch.
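
A minimal userspace sketch of the same idea, under stated assumptions: one
deadline covers the whole sequence of callbacks, and when it expires the
remaining callbacks are skipped. alarm()/SIGALRM stand in for the kernel
timer, and run_with_deadline(), shutdown_fn, good_shutdown() and
stuck_shutdown() are hypothetical names. Unlike the patch, which calls
emergency_restart() or machine_power_off() from the timer handler, this
sketch simply returns to its caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a driver's ->shutdown() callback. */
typedef void (*shutdown_fn)(void);

static sigjmp_buf bail_out;

/* SIGALRM handler: plays the role of the timeout handler. */
static void on_timeout(int sig)
{
	(void)sig;
	siglongjmp(bail_out, 1);
}

/*
 * Run every callback under one shared deadline. Returns the number of
 * callbacks that completed; a value smaller than n means the deadline
 * expired and the remaining callbacks were skipped.
 */
static size_t run_with_deadline(const shutdown_fn *cbs, size_t n,
				unsigned int timeout_sec)
{
	volatile size_t done = 0;
	struct sigaction sa, old;

	assert(timeout_sec > 0);

	sa.sa_handler = on_timeout;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGALRM, &sa, &old);

	if (sigsetjmp(bail_out, 1) == 0) {
		alarm(timeout_sec);	/* arm the deadline */
		for (; done < n; done++)
			cbs[done]();
	}
	alarm(0);			/* disarm the deadline */
	sigaction(SIGALRM, &old, NULL);
	return done;
}

/* Two example callbacks: one returns promptly, one never returns. */
static void good_shutdown(void)
{
}

static void stuck_shutdown(void)
{
	for (;;)
		pause();
}
```

With the callbacks { good_shutdown, stuck_shutdown, good_shutdown } and a
one-second deadline, only the first callback completes; the third is never
reached, which is the trade-off discussed above.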
> 
> > +static void device_shutdown_timer_set(void)
> > +{
> > +	devs_shutdown.task = current;
> > +	timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > +	devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > +	add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > +	del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> >   /**
> >    * device_shutdown - call ->shutdown() on each device to shutdown.
> >    */
> > @@ -4810,6 +4848,7 @@ void device_shutdown(void)
> >   	device_block_probing();
> >   
> >   	cpufreq_suspend();
> > +	device_shutdown_timer_set();
> >   
> >   	spin_lock(&devices_kset->list_lock);
> >   	/*
> > @@ -4869,6 +4908,7 @@ void device_shutdown(void)
> >   		spin_lock(&devices_kset->list_lock);
> >   	}
> >   	spin_unlock(&devices_kset->list_lock);
> > +	device_shutdown_timer_clr();
> >   }
> >   
> >   /*
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
> 

Thread overview: 6+ messages
2024-06-06  8:50 [PATCH v3] driver core: Add timeout for device shutdown Soumya Khasnis
2024-06-06 15:23 ` Daniel Lezcano
2024-06-07 11:37   ` Khasnis Soumya [this message]
2024-06-06 20:52 ` Greg KH
2024-06-07 11:50   ` Khasnis Soumya
2024-06-07  0:03 ` kernel test robot
