From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nishanth Menon
Subject: Re: [RFC PATCH] thermal: Schedule a backup thermal shutdown workqueue after a known period of time to tackle failed poweroff
Date: Thu, 31 Dec 2015 11:47:57 -0600
Message-ID: <56856A4D.7090804@ti.com>
References: <1450676778-7840-1-git-send-email-j-keerthy@ti.com> <5681742C.8050805@ti.com> <56824F81.7000202@ti.com> <20151231172906.GB11863@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20151231172906.GB11863@localhost.localdomain>
Sender: linux-pm-owner@vger.kernel.org
To: Eduardo Valentin , Keerthy
Cc: Keerthy , rui.zhang@intel.com, linux-omap@vger.kernel.org, "linux-pm@vger.kernel.org"
List-Id: linux-omap@vger.kernel.org

On 12/31/2015 11:29 AM, Eduardo Valentin wrote:
> can we have a shorter title?
>
> On Tue, Dec 29, 2015 at 02:46:49PM +0530, Keerthy wrote:
>> Hi Nishanth,
>>
>
>
>>>
>>> I am not sure if this #ifdeffery is even needed.
>>>
>>>
>>> Eduardo, Rui: If this is not the suggested technique, maybe you guys
>>> could suggest how we could handle a case where userspace might be
>>> hungup due to some reason and a case where a critical temperature
>>> event in the middle of device probe was triggered?
>
> Orderly power off is supposed to take care of this. Looking at the code,
> it will force a shutdown in case execution of userland command fails:
>
> static int __orderly_poweroff(bool force)
> {
> 	int ret;
>
> 	ret = run_cmd(poweroff_cmd);
>
> 	if (ret && force) {
> 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
>
> 		/*
> 		 * I guess this should try to kick off some daemon to sync and
> 		 * poweroff asap. Or not even bother syncing if we're doing an
> 		 * emergency shutdown?
> 		 */
> 		emergency_sync();
> 		kernel_power_off();
> 	}

Yes, it will *IF* userspace fails.
The condition that I had tracked was before identifying the following
fix [1]. An example failure is here [2].

In this case, tmp102 is set up for X15 as in [3] and built as a module.
As the kernel starts up the filesystem and starts a modprobe of all
modules via udev rules, the probe of tmp102 (falsely) detects a critical
temperature condition. A shutdown attempt in the middle of driver probe
is always a tricky business.

As we look at the log in [2]:

Line 472:
> thermal thermal_zone3: critical temperature reached(108 C),shutting down
We have the userspace trigger for shutdown taking place.

Line 495:
INIT: Sending processes the TERM signal
Userspace starts shutting down services (but note that probes for other
devices were either in progress or queued up to complete).

At line 647 we are in a weird place: sysrq shows that the system is
idle and userspace is shut down, yet the system is still active.

In this case, we got here thanks to a driver bug, but if this were a
real-world overtemperature scenario, device damage could take place OR
something much worse.

The only alternative is to run a parallel thread in case userspace fails
to complete the job in some given period of time, whatever the condition
triggering the problem.

I hope this explains the problem.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=00917b5c55aeb01322d5ab51af8c025b82959224
[2] http://pastebin.ubuntu.com/14326688/
[3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am57xx-beagle-x15.dts#n738

-- 
Regards,
Nishanth Menon
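
For illustration only: a minimal userspace model of the backup-timeout
decision argued for above, where a forced power off is taken only if the
orderly path has not finished within a fixed budget. This is not the
posted patch and not kernel code. Every name here (backup_shutdown,
backup_shutdown_arm, backup_shutdown_tick, the BACKUP_* actions) and the
millisecond budget are assumptions made up for the sketch. In the kernel
the same decision would presumably sit in a delayed work item or timer
rather than in a polled tick function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is from the posted patch. */
enum backup_action {
	BACKUP_WAIT,		/* orderly shutdown still within its budget */
	BACKUP_STAND_DOWN,	/* orderly shutdown finished, nothing to force */
	BACKUP_FORCE_OFF	/* budget exhausted, force the power off */
};

struct backup_shutdown {
	unsigned int timeout_ms;	/* assumed budget for userspace to finish */
	unsigned int elapsed_ms;	/* time since the critical trip was seen */
	bool orderly_done;		/* set once the orderly path completed */
};

/* Arm the backup path when the critical temperature event is raised. */
static void backup_shutdown_arm(struct backup_shutdown *bs,
				unsigned int timeout_ms)
{
	assert(bs != NULL);
	bs->timeout_ms = timeout_ms;
	bs->elapsed_ms = 0;
	bs->orderly_done = false;
}

/* Called by the orderly path if it actually gets to the end. */
static void backup_shutdown_orderly_done(struct backup_shutdown *bs)
{
	assert(bs != NULL);
	bs->orderly_done = true;
}

/* Advance the model clock by delta_ms and report what the backup should do. */
static enum backup_action backup_shutdown_tick(struct backup_shutdown *bs,
						unsigned int delta_ms)
{
	assert(bs != NULL);

	if (bs->orderly_done)
		return BACKUP_STAND_DOWN;

	/* Saturating add so a huge delta cannot wrap the counter. */
	if (delta_ms > UINT_MAX - bs->elapsed_ms)
		bs->elapsed_ms = UINT_MAX;
	else
		bs->elapsed_ms += delta_ms;

	if (bs->elapsed_ms >= bs->timeout_ms)
		return BACKUP_FORCE_OFF;

	return BACKUP_WAIT;
}
```

The point of the model is that the forced path does not depend on
run_cmd() returning an error: in the log at [2], userspace started
shutting down without failing and the system stayed up, and that is the
case where the elapsed-time check is what fires.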