From: Zhang Rui <rui.zhang@intel.com>
To: Keerthy <j-keerthy@ti.com>, Eduardo Valentin <edubezval@gmail.com>
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-omap@vger.kernel.org, nm@ti.com, t-kristo@ti.com
Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism
Date: Wed, 12 Apr 2017 16:45:44 +0800 [thread overview]
Message-ID: <1491986744.2357.42.camel@intel.com> (raw)
In-Reply-To: <db07d448-0fa9-f582-a323-5edcb9a0b509@ti.com>
On Wed, 2017-04-12 at 14:06 +0530, Keerthy wrote:
>
> On Wednesday 12 April 2017 01:56 PM, Zhang Rui wrote:
> >
> > On Wed, 2017-04-12 at 13:25 +0530, Keerthy wrote:
> > >
> > >
> > > On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote:
> > > >
> > > >
> > > > Keerthy,
> > > >
> > > > On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote:
> > > > > >
> > > > > >
> > > > > > On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hey,
> > > > > > > >
> > > > > > > > On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > off).
> > > > <cut>
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > OK... This seams to me, still a corner case supposed to
> > > > > > > > be
> > > > > > > > fixed at
> > > > > > > > orderly_power_off, not at thermal. But..
> > > > > > > >
> > > > ^^^ Then again, this must be fixed not at thermal core. And re-
> > > > reading
> > > > the history of this thread, this seams to be really something
> > > > broken at
> > > > OMAP/DRA7, as mentioned in previous messages. That is probably
> > > > why
> > > > you
> > > > are pushing for pm_power_off(), which seams to be the one that
> > > > works for
> > > > you, pulling the plug correctly (DRA7).
> > > Zhang/Eduardo,
> > >
> > > OMAP5/DRA7 is one case.
> > >
> > > I believe i this is the root cause of this failure.
> > >
> > > thermal_zone_device_check --> thermal_zone_device_update -->
> > > handle_thermal_trip --> handle_critical_trips -->
> > > orderly_poweroff
> > >
> > > The above sequence happens every 250/500 mS based on the
> > > configuration.
> > > The orderly_poweroff function is getting called every 250/500 mS
> > > and
> > > i
> > > see with a full fledged nfs file system it takes at least 5-10
> > > Seconds
> > > to shutdown and during that time we bombard with orderly_poweroff
> > > calls
> > > multiple times due to the thermal_zone_device_check triggering
> > > periodically.
> > >
> > > To confirm that i made sure that handle_critical_trips calls
> > > orderly_poweroff only once and i no longer see the failure on
> > > DRA72-
> > > EVM
> > > board.
> > >
> > Nice catch!
> Thanks.
>
> >
> >
> > >
> > > So IMHO once we get to handle_critical_trips case where we do
> > > orderly_poweroff we need to do the following:
> > >
> > > 1) Make sure that orderly_poweroff is called only once.
> > agreed.
> >
> > >
> > > 2) Cancel all the scheduled work queues to monitor the
> > > temperature as
> > > we have already reached a point of shutting down the system.
> > >
> > agreed.
> >
> > now I think we've found the root cause of the problem.
> > orderly_poweroff() is not reenterable and it does not have to be.
> > If we're using orderly_poweroff() for emergency power off, we have
> > to
> > use it correctly.
> >
> > will you generate a patch to do this?
> Sure. I will generate a patch to take care of 1) To make sure that
> orderly_poweroff is called only once right away. I have already
> tested.
>
> for 2) Cancel all the scheduled work queues to monitor the
> temperature.
> I will take some more time to make it and test.
>
> Is that okay? Or you want me to send both together?
>
I think you can send patch for step 1 first.
thanks,
rui
> Regards,
> Keerthy
>
> >
> >
> > thanks,
> > rui
> >
> > >
> > > Let me know your thoughts on this.
> > >
> > > Best Regards,
> > > Keerthy
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > However, there is no clean way of detecting such
> > > > > > > > > failure
> > > > > > > > > of
> > > > > > > > > userspace
> > > > > > > > > powering off the system. In such scenarios, it is
> > > > > > > > > necessary for a
> > > > > > > > > backup
> > > > > > > > > workqueue to be able to force a shutdown of the
> > > > > > > > > system
> > > > > > > > > when
> > > > > > > > > orderly
> > > > > > > > > shutdown is not successful after a configurable time
> > > > > > > > > period.
> > > > > > > > >
> > > > > > > > Given that system running hot is a thermal issue, I
> > > > > > > > guess
> > > > > > > > we care
> > > > > > > > more
> > > > > > > > on this matter then..
> > > > > > > Yes!
> > > > > > >
> > > > > > I just read this thread again https://patchwork.kernel.org/
> > > > > > patc
> > > > > > h/802458
> > > > > > 1/ to recall the previous discussion.
> > > > > >
> > > > > > https://patchwork.kernel.org/patch/8149891/
> > > > > > https://patchwork.kernel.org/patch/8149861/
> > > > > > should be the solution made based on Ingo' suggestion,
> > > > > > right?
> > > > > >
> > > > > > And to me, this sounds like the right direction to go,
> > > > > > thermal
> > > > > > does not
> > > > > > need a back up shutdown solution, it just needs a kernel
> > > > > > function call
> > > > > > which guarantees the system can be shutdown/reboot
> > > > > > immediately.
> > > > > >
> > > > > > is there any reason that patch 1/2 is not accepted?
> > > > > Zhang,
> > > > >
> > > > > http://www.serverphorums.com/read.php?12,1400964
> > > > >
> > > > > I got a NAK from Alan and was given this direction on a
> > > > > thermal_poweroff
> > > > > which is more or less what is done in this patch.
> > > > >
> > > > Actually, Alan's suggestion is more for you to define a
> > > > thermal_poweroff() that can be defined per architecture.
> > > >
> > > > Also, please, keep track of your patch versions and also do
> > > > copy
> > > > everybody who has stated their opinion on previous discussions.
> > > > These
> > > > patches must have Ingo, Alan, and RMK copied too. In this way
> > > > we
> > > > avoid
> > > > loosing track of what has been suggested and we also converge
> > > > faster to
> > > > something everybody (or most of us) agree. Next version,
> > > > please,
> > > > fix
> > > > that.
> > > >
> > > >
> > > > To me, thermal core needs a function that simply powers off the
> > > > system.
> > > > No timeouts, delayed works, backups, etc. Simple and straight.
> > > >
> > > > The idea of having a per architecture implementation, as per
> > > > Alan's
> > > > suggestion, makes sense to me too. Having something different
> > > > from
> > > > pm_power_off(), specific to thermal, might also give the
> > > > opportunity to
> > > > save the power off reason.
> > > >
> > > > BR,
> > > >
> > > > Eduardo Valentin
> > > >
next prev parent reply other threads:[~2017-04-12 8:45 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-31 6:30 [PATCH] thermal: core: Add a back up thermal shutdown mechanism Keerthy
2017-03-31 6:30 ` Keerthy
2017-04-11 17:29 ` Eduardo Valentin
2017-04-12 2:49 ` Keerthy
2017-04-12 2:49 ` Keerthy
2017-04-12 3:20 ` Zhang Rui
2017-04-12 3:39 ` Keerthy
2017-04-12 3:39 ` Keerthy
2017-04-12 4:05 ` Eduardo Valentin
2017-04-12 4:18 ` Keerthy
2017-04-12 4:18 ` Keerthy
2017-04-12 7:55 ` Keerthy
2017-04-12 7:55 ` Keerthy
2017-04-12 8:26 ` Zhang Rui
2017-04-12 8:36 ` Keerthy
2017-04-12 8:36 ` Keerthy
2017-04-12 8:45 ` Zhang Rui [this message]
2017-04-12 15:44 ` Eduardo Valentin
2017-04-12 16:16 ` Keerthy
2017-04-12 16:16 ` Keerthy
2017-04-12 16:50 ` Eduardo Valentin
2017-04-12 16:31 ` Grygorii Strashko
2017-04-12 16:31 ` Grygorii Strashko
2017-04-12 16:34 ` Eduardo Valentin
2017-04-12 16:44 ` Keerthy
2017-04-12 16:44 ` Keerthy
2017-04-12 16:54 ` Eduardo Valentin
2017-04-12 17:07 ` Keerthy
2017-04-12 17:07 ` Keerthy
2017-04-12 17:08 ` Grygorii Strashko
2017-04-12 17:08 ` Grygorii Strashko
2017-04-12 17:11 ` Keerthy
2017-04-12 17:11 ` Keerthy
2017-04-12 17:24 ` Eduardo Valentin
2017-04-12 18:43 ` Tero Kristo
2017-04-12 18:43 ` Tero Kristo
2017-04-13 3:50 ` Keerthy
2017-04-13 3:50 ` Keerthy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1491986744.2357.42.camel@intel.com \
--to=rui.zhang@intel.com \
--cc=edubezval@gmail.com \
--cc=j-keerthy@ti.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-omap@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=nm@ti.com \
--cc=t-kristo@ti.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.