* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff [not found] <1450676778-7840-1-git-send-email-j-keerthy@ti.com> @ 2015-12-28 17:41 ` Nishanth Menon 2015-12-29 9:16 ` Keerthy 0 siblings, 1 reply; 6+ messages in thread From: Nishanth Menon @ 2015-12-28 17:41 UTC (permalink / raw) To: Keerthy, rui.zhang, edubezval; +Cc: linux-omap, linux-pm@vger.kernel.org On 12/20/2015 11:46 PM, Keerthy wrote: +linux-pm. > In few rare conditions like during boot up the orderly_poweroff > function might not be able to complete fully leaving the device > running at dangerously high temperatures. Hence adding a backup > workqueue to act after a known period of time and poweroff the device. > > Suggested-by: Nishanth Menon <nm@ti.com> > Reported-by: Nishanth Menon <nm@ti.com> The specific case I hit was a critical temperature event as soon as the hwmon device was probed (the driver tmp102 was a kernel module). > Signed-off-by: Keerthy <j-keerthy@ti.com> > --- > drivers/thermal/Kconfig | 9 +++++++++ > drivers/thermal/thermal_core.c | 26 ++++++++++++++++++++++++++ > 2 files changed, 35 insertions(+) > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig > index 8cc4ac6..25584ee 100644 > --- a/drivers/thermal/Kconfig > +++ b/drivers/thermal/Kconfig > @@ -92,6 +92,15 @@ config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR > > endchoice > > +config THERMAL_BKUP_SHUTDOWN_DELAY_MS > + int "Thermal shutdown backup delay in milli-seconds" > + depends on THERMAL > + default "5000" > + ---help--- > + The number of milliseconds to delay after orderly_poweroff call. Probably needs a rephrase. > + > + Default: 5000 (5 seconds) > + > config THERMAL_GOV_FAIR_SHARE > bool "Fair-share thermal governor" > help > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index d9e525c..b793857 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -61,6 +61,12 @@ static DEFINE_MUTEX(thermal_governor_lock); > > static struct thermal_governor *def_governor; > > +#ifdef CONFIG_THERMAL_BKUP_SHUTDOWN_DELAY_MS > +#define BKUP_SHUTDOWN_DELAY CONFIG_THERMAL_BKUP_SHUTDOWN_DELAY_MS > +#else > +#define BKUP_SHUTDOWN_DELAY 5000 > +#endif > + I am not sure if this #ifdeffery is even needed. Eduardo, Rui: If this is not the suggested technique, maybe you guys could suggest how we could handle a case where userspace might be hungup due to some reason and a case where a critical temperature event in the middle of device probe was triggered? Obviously, we'd like to take into consideration userspace latencies as well- but that is very specific to fs being run.. not really a simple problem, IMHO.. > static struct thermal_governor *__find_governor(const char *name) > { > struct thermal_governor *pos; > @@ -423,6 +429,18 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, > def_governor->throttle(tz, trip); > } > > +static void bkup_shutdown_func(struct work_struct *unused); > +static DECLARE_DELAYED_WORK(bkup_shutdown_work, bkup_shutdown_func); > + > +static void bkup_shutdown_func(struct work_struct *work) > +{ > + pr_warn("Orderly_poweroff has failed! Attempting kernel_power_off\n"); > + kernel_power_off(); > + > + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); > + emergency_restart(); I think documentation is necessary that we are hoping for bootloader to be able to detect and halt as needed -> risk here obviously is an infinite reboot loop :( . > +} > + > static void handle_critical_trips(struct thermal_zone_device *tz, > int trip, enum thermal_trip_type trip_type) > { > @@ -443,6 +461,14 @@ static void handle_critical_trips(struct thermal_zone_device *tz, > dev_emerg(&tz->device, > "critical temperature reached(%d C),shutting down\n", > tz->temperature / 1000); > + /** needs to be kernel doc style? > + * This is a backup option to shutdown the > + * system in case orderly_poweroff > + * fails Maybe adjust to 80char? > + */ > + schedule_delayed_work(&bkup_shutdown_work, > + msecs_to_jiffies(BKUP_SHUTDOWN_DELAY)); > + > orderly_poweroff(true); > } > } > -- Regards, Nishanth Menon ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff 2015-12-28 17:41 ` [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff Nishanth Menon @ 2015-12-29 9:16 ` Keerthy 2015-12-31 17:29 ` Eduardo Valentin 0 siblings, 1 reply; 6+ messages in thread From: Keerthy @ 2015-12-29 9:16 UTC (permalink / raw) To: Nishanth Menon, Keerthy, rui.zhang, edubezval Cc: linux-omap, linux-pm@vger.kernel.org Hi Nishanth, On Monday 28 December 2015 11:11 PM, Nishanth Menon wrote: > On 12/20/2015 11:46 PM, Keerthy wrote: > > +linux-pm. Thanks for looping! > >> In few rare conditions like during boot up the orderly_poweroff >> function might not be able to complete fully leaving the device >> running at dangerously high temperatures. Hence adding a backup >> workqueue to act after a known period of time and poweroff the device. >> > > >> Suggested-by: Nishanth Menon <nm@ti.com> >> Reported-by: Nishanth Menon <nm@ti.com> > > The specific case I hit was a critical temperature event as soon as > the hwmon device was probed (the driver tmp102 was a kernel module). > >> Signed-off-by: Keerthy <j-keerthy@ti.com> >> --- >> drivers/thermal/Kconfig | 9 +++++++++ >> drivers/thermal/thermal_core.c | 26 ++++++++++++++++++++++++++ >> 2 files changed, 35 insertions(+) >> >> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig >> index 8cc4ac6..25584ee 100644 >> --- a/drivers/thermal/Kconfig >> +++ b/drivers/thermal/Kconfig >> @@ -92,6 +92,15 @@ config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR >> >> endchoice >> >> +config THERMAL_BKUP_SHUTDOWN_DELAY_MS >> + int "Thermal shutdown backup delay in milli-seconds" >> + depends on THERMAL >> + default "5000" >> + ---help--- >> + The number of milliseconds to delay after orderly_poweroff call. > > Probably needs a rephrase. Delay in milliseconds before the backup thermal shutdown is triggered. > >> + >> + Default: 5000 (5 seconds) >> + >> config THERMAL_GOV_FAIR_SHARE >> bool "Fair-share thermal governor" >> help >> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c >> index d9e525c..b793857 100644 >> --- a/drivers/thermal/thermal_core.c >> +++ b/drivers/thermal/thermal_core.c >> @@ -61,6 +61,12 @@ static DEFINE_MUTEX(thermal_governor_lock); >> >> static struct thermal_governor *def_governor; >> >> +#ifdef CONFIG_THERMAL_BKUP_SHUTDOWN_DELAY_MS >> +#define BKUP_SHUTDOWN_DELAY CONFIG_THERMAL_BKUP_SHUTDOWN_DELAY_MS >> +#else >> +#define BKUP_SHUTDOWN_DELAY 5000 >> +#endif >> + > > I am not sure if this #ifdeffery is even needed. > > > Eduardo, Rui: If this is not the suggested technique, maybe you guys > could suggest how we could handle a case where userspace might be > hungup due to some reason and a case where a critical temperature > event in the middle of device probe was triggered? > > Obviously, we'd like to take into consideration userspace latencies as > well- but that is very specific to fs being run.. not really a simple > problem, IMHO.. > >> static struct thermal_governor *__find_governor(const char *name) >> { >> struct thermal_governor *pos; >> @@ -423,6 +429,18 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, >> def_governor->throttle(tz, trip); >> } >> >> +static void bkup_shutdown_func(struct work_struct *unused); >> +static DECLARE_DELAYED_WORK(bkup_shutdown_work, bkup_shutdown_func); >> + >> +static void bkup_shutdown_func(struct work_struct *work) >> +{ >> + pr_warn("Orderly_poweroff has failed! Attempting kernel_power_off\n"); >> + kernel_power_off(); >> + >> + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); >> + emergency_restart(); > > I think documentation is necessary that we are hoping for bootloader > to be able to detect and halt as needed -> risk here obviously is an > infinite reboot loop :( . Agreed. > > >> +} >> + >> static void handle_critical_trips(struct thermal_zone_device *tz, >> int trip, enum thermal_trip_type trip_type) >> { >> @@ -443,6 +461,14 @@ static void handle_critical_trips(struct thermal_zone_device *tz, >> dev_emerg(&tz->device, >> "critical temperature reached(%d C),shutting down\n", >> tz->temperature / 1000); >> + /** > > needs to be kernel doc style? > >> + * This is a backup option to shutdown the >> + * system in case orderly_poweroff >> + * fails > Maybe adjust to 80char? Okay. > >> + */ >> + schedule_delayed_work(&bkup_shutdown_work, >> + msecs_to_jiffies(BKUP_SHUTDOWN_DELAY)); >> + >> orderly_poweroff(true); >> } >> } >> > > I will wait for Eduardo/Rui's inputs before posting the next version. Best Regards, Keerthy ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff 2015-12-29 9:16 ` Keerthy @ 2015-12-31 17:29 ` Eduardo Valentin 2015-12-31 17:47 ` Nishanth Menon 0 siblings, 1 reply; 6+ messages in thread From: Eduardo Valentin @ 2015-12-31 17:29 UTC (permalink / raw) To: Keerthy Cc: Nishanth Menon, Keerthy, rui.zhang, linux-omap, linux-pm@vger.kernel.org can we have a shorter title? On Tue, Dec 29, 2015 at 02:46:49PM +0530, Keerthy wrote: > Hi Nishanth, > <cut> > > > >I am not sure if this #ifdeffery is even needed. > > > > > >Eduardo, Rui: If this is not the suggested technique, maybe you guys > >could suggest how we could handle a case where userspace might be > >hungup due to some reason and a case where a critical temperature > >event in the middle of device probe was triggered? Orderly power off is supposed to take care of this. Looking at the code, it will force a shutdown in case execution of userland command fails: static int __orderly_poweroff(bool force) { int ret; ret = run_cmd(poweroff_cmd); if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? */ emergency_sync(); kernel_power_off(); } > > > >Obviously, we'd like to take into consideration userspace latencies as > >well- but that is very specific to fs being run.. not really a simple > >problem, IMHO.. > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff 2015-12-31 17:29 ` Eduardo Valentin @ 2015-12-31 17:47 ` Nishanth Menon 2015-12-31 18:20 ` Eduardo Valentin 0 siblings, 1 reply; 6+ messages in thread From: Nishanth Menon @ 2015-12-31 17:47 UTC (permalink / raw) To: Eduardo Valentin, Keerthy Cc: Keerthy, rui.zhang, linux-omap, linux-pm@vger.kernel.org On 12/31/2015 11:29 AM, Eduardo Valentin wrote: > can we have a shorter title? > > On Tue, Dec 29, 2015 at 02:46:49PM +0530, Keerthy wrote: >> Hi Nishanth, >> > > <cut> >>> >>> I am not sure if this #ifdeffery is even needed. >>> >>> >>> Eduardo, Rui: If this is not the suggested technique, maybe you guys >>> could suggest how we could handle a case where userspace might be >>> hungup due to some reason and a case where a critical temperature >>> event in the middle of device probe was triggered? > > Orderly power off is supposed to take care of this. Looking at the code, > it will force a shutdown in case execution of userland command fails: > > static int __orderly_poweroff(bool force) > { > int ret; > > ret = run_cmd(poweroff_cmd); > > if (ret && force) { > pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > > /* > * I guess this should try to kick off some daemon to sync and > * poweroff asap. Or not even bother syncing if we're doing an > * emergency shutdown? > */ > emergency_sync(); > kernel_power_off(); > } Yes, it will *IF* userspace fails. the condition that I had tracked was before identifying the following fix[1] - Example fail is here[2] In this case, tmp102 is setup for X15 as [3] - and built as a module. as the kernel startsup filesystem and starts a modprobe of all modules via udev rules, the probe of tmp102 detects (falsely) a critical temperature condition. Shutdown attempt in the middle of driver probe is always a tricky business. As we look at the log in [2], Line 472 > thermal thermal_zone3: critical temperature reached(108 C),shutting down We have userspace trigger for shutdown taking place. Line 495: INIT: Sending processes the TERM signal userspace starts shutting down services. (but note that probe for other devices were either in progress or queued up to complete).. at line 647 - we are in a weird place -> sysrq shows that system is idled and userspace is shutdown and system is still active. In this case, we entered the case thanks to a driver bug, but if this situation was a real world temperature scenario, then we'd probably in an overtemp scenario, then device damage could take place OR something much worse. The only alternative is to run a parallel thread in case userspace fails to complete the job in some given period of time - due to what ever be the condition triggering the problem. I hope this explains the problem. [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=00917b5c55aeb01322d5ab51af8c025b82959224 [2] http://pastebin.ubuntu.com/14326688/ [3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am57xx-beagle-x15.dts#n738 -- Regards, Nishanth Menon ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff 2015-12-31 17:47 ` Nishanth Menon @ 2015-12-31 18:20 ` Eduardo Valentin 2015-12-31 18:29 ` Nishanth Menon 0 siblings, 1 reply; 6+ messages in thread From: Eduardo Valentin @ 2015-12-31 18:20 UTC (permalink / raw) To: Nishanth Menon Cc: Keerthy, Keerthy, rui.zhang, linux-omap, linux-pm@vger.kernel.org On Thu, Dec 31, 2015 at 11:47:57AM -0600, Nishanth Menon wrote: > On 12/31/2015 11:29 AM, Eduardo Valentin wrote: > > can we have a shorter title? > > > > > > Orderly power off is supposed to take care of this. Looking at the code, > > it will force a shutdown in case execution of userland command fails: > > > > static int __orderly_poweroff(bool force) > > { > > int ret; > > > > ret = run_cmd(poweroff_cmd); > > > > if (ret && force) { > > pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > > > > /* > > * I guess this should try to kick off some daemon to sync and > > * poweroff asap. Or not even bother syncing if we're doing an > > * emergency shutdown? > > */ > > emergency_sync(); > > kernel_power_off(); > > } > > Yes, it will *IF* userspace fails. the condition that I had tracked > was before identifying the following fix[1] - Example fail is here[2] > OK. But still, why other users of orderly_poweroff do not deserve to be fixed, then? > > I hope this explains the problem. > > [1] > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=00917b5c55aeb01322d5ab51af8c025b82959224 > [2] http://pastebin.ubuntu.com/14326688/ > > [3] > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am57xx-beagle-x15.dts#n738 > > -- > Regards, > Nishanth Menon ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff 2015-12-31 18:20 ` Eduardo Valentin @ 2015-12-31 18:29 ` Nishanth Menon 0 siblings, 0 replies; 6+ messages in thread From: Nishanth Menon @ 2015-12-31 18:29 UTC (permalink / raw) To: Eduardo Valentin Cc: Keerthy, Keerthy, rui.zhang, linux-omap, linux-pm@vger.kernel.org On 12/31/2015 12:20 PM, Eduardo Valentin wrote: > On Thu, Dec 31, 2015 at 11:47:57AM -0600, Nishanth Menon wrote: >> On 12/31/2015 11:29 AM, Eduardo Valentin wrote: >>> can we have a shorter title? >>> >>> >>> Orderly power off is supposed to take care of this. Looking at the code, >>> it will force a shutdown in case execution of userland command fails: >>> >>> static int __orderly_poweroff(bool force) >>> { >>> int ret; >>> >>> ret = run_cmd(poweroff_cmd); >>> >>> if (ret && force) { >>> pr_warn("Failed to start orderly shutdown: forcing the issue\n"); >>> >>> /* >>> * I guess this should try to kick off some daemon to sync and >>> * poweroff asap. Or not even bother syncing if we're doing an >>> * emergency shutdown? >>> */ >>> emergency_sync(); >>> kernel_power_off(); >>> } >> >> Yes, it will *IF* userspace fails. the condition that I had tracked >> was before identifying the following fix[1] - Example fail is here[2] >> > > OK. But still, why other users of orderly_poweroff do not > deserve to be fixed, then? > I'd agree as well.. I guess the comment from Robin Holt <holt@sgi.com> anticipated something like this will eventually occur. "* I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? " Keerthy - would you spin this as a generic fix? >> >> I hope this explains the problem. >> >> [1] >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=00917b5c55aeb01322d5ab51af8c025b82959224 >> [2] http://pastebin.ubuntu.com/14326688/ >> >> [3] >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am57xx-beagle-x15.dts#n738 >> >> -- >> Regards, >> Nishanth Menon -- Regards, Nishanth Menon ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-12-31 18:30 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1450676778-7840-1-git-send-email-j-keerthy@ti.com> 2015-12-28 17:41 ` [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff Nishanth Menon 2015-12-29 9:16 ` Keerthy 2015-12-31 17:29 ` Eduardo Valentin 2015-12-31 17:47 ` Nishanth Menon 2015-12-31 18:20 ` Eduardo Valentin 2015-12-31 18:29 ` Nishanth Menon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).