From: Eduardo Valentin <edubezval@gmail.com>
To: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Zhang Rui <rui.zhang@intel.com>, Keerthy <j-keerthy@ti.com>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-omap@vger.kernel.org, nm@ti.com, t-kristo@ti.com
Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism
Date: Wed, 12 Apr 2017 09:34:24 -0700
Message-ID: <20170412163422.GA13484@localhost.localdomain>
In-Reply-To: <b565f2c9-fdd7-7525-da91-695f113e631b@ti.com>

Hey,

On Wed, Apr 12, 2017 at 11:31:18AM -0500, Grygorii Strashko wrote:
> 
> 
> On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
> > Hello,
> > 
> ...
> 
> > 
> > I agree. But there is nothing that says it is not reentrant. If you
> > saw something along these lines, can you please share?
> > 
> >>>> will you generate a patch to do this?
> >>> Sure. I will generate a patch to take care of 1) To make sure that
> >>> orderly_poweroff is called only once right away. I have already
> >>> tested.
> >>>
> >>> for 2) Cancel all the scheduled work queues to monitor the
> >>> temperature.
> >>> I will take some more time to make it and test.
> >>>
> >>> Is that okay? Or you want me to send both together?
> >>>
> >> I think you can send patch for step 1 first.
> > 
> > I am happy to see that Keerthy found the problem with his setup and a
> > possible solution. But I have a few concerns here.
> > 
> > 1. If the regular shutdown process takes 10 seconds, that is a time
> > frame thermal should never wait for. orderly_poweroff() calls run_cmd()
> > with the wait flag set. That means, if regular userland shutdown takes
> > 10s, we are waiting for it. Obviously this is not acceptable, especially
> > if you set up the critical trip to be 125C. Now, if you properly size the
> > critical trip to fire before the hotspot really reaches 125C, for 10s (or
> > the time it takes to shut down), then fine. But based on what was
> > described in this thread, his system is waiting 10s on regular shutdown,
> > and his silicon is at an out-of-spec temperature for 10s, which is wrong.
> > 
> > 2. The above scenario is not acceptable in the long run, especially from a
> > reliability perspective. If orderly_poweroff() has a possibility to
> > simply never return (or take too long), I would say the thermal
> > subsystem is using the wrong API.
> > 
> 
> 
> Hm, I do not see that orderly_poweroff() will wait for anything now:
> void orderly_poweroff(bool force)
> {
> 	if (force) /* do not override the pending "true" */
> 		poweroff_force = true;
> 	schedule_work(&poweroff_work); 
> ^^^^^^^ async call. Even here there can be a pretty big delay if the system is under pressure
> }
> 
> 
> static int __orderly_poweroff(bool force)
> {
> 	int ret;
> 
> 	ret = run_cmd(poweroff_cmd);
> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC

Yeah, and that is what I really meant. Sorry for the confusion. The exec
is problematic in his scenario too, given he is running on a very
interesting NFS setup. Yes, UMH_WAIT_EXEC is set:
static int run_cmd(const char *cmd)
{
	char **argv;
	static char *envp[] = {
		"HOME=/",
		"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
		NULL
	};
	int ret;
	argv = argv_split(GFP_KERNEL, cmd, NULL);
	if (argv) {
		ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
		argv_free(argv);
	} else {
		ret = -ENOMEM;
	}

	return ret;
}
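
To illustrate what "wait for the exec, but not for the process" means,
here is a rough userspace analogy. It is only a sketch and not how the
usermode helper is implemented in the kernel: spawn_wait_exec() is a
made-up name, and the close-on-exec pipe is just a common userspace idiom
for learning whether exec succeeded. The caller blocks until the binary
has been loaded and exec'ed (or the exec failed), which is exactly the
part that can be slow when the root filesystem is on NFS.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Userspace analogy only (hypothetical helper, not kernel code): start
 * argv[0] and return once the exec has succeeded or failed, without
 * waiting for the program to finish.
 * Returns 0 if the exec happened, a negative errno value otherwise.
 */
static int spawn_wait_exec(char *const argv[], pid_t *child)
{
	int pipefd[2];
	int err = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(pipefd) < 0)
		return -errno;

	/* The write end is closed automatically when exec succeeds. */
	if (fcntl(pipefd[1], F_SETFD, FD_CLOEXEC) < 0) {
		err = errno;
		close(pipefd[0]);
		close(pipefd[1]);
		return -err;
	}

	pid = fork();
	if (pid < 0) {
		err = errno;
		close(pipefd[0]);
		close(pipefd[1]);
		return -err;
	}

	if (pid == 0) {
		close(pipefd[0]);
		execv(argv[0], argv);
		/* Only reached when exec failed: report errno to the parent. */
		err = errno;
		if (write(pipefd[1], &err, sizeof(err)) < 0) {
			/* Nothing more the child can do about it. */
		}
		_exit(127);
	}

	close(pipefd[1]);

	/* Blocks until exec succeeds (EOF, n == 0) or an errno arrives. */
	do {
		n = read(pipefd[0], &err, sizeof(err));
	} while (n < 0 && errno == EINTR);
	if (n < 0)
		err = errno;
	close(pipefd[0]);

	if (n == 0) {
		/* Exec happened; the program keeps running on its own. */
		if (child)
			*child = pid;
		return 0;
	}

	/* Exec (or the read) failed: reap the child and report the error. */
	waitpid(pid, NULL, 0);
	return -err;
}
```

With this sketch, a missing binary shows up as -ENOENT right away, while
a successful call says nothing about how long the started program will
take to finish, or whether it finishes at all.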


> 
> 	if (ret && force) {
> 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
> 
> 		/*
> 		 * I guess this should try to kick off some daemon to sync and
> 		 * poweroff asap.  Or not even bother syncing if we're doing an
> 		 * emergency shutdown?
> 		 */
> 		emergency_sync();
> 		kernel_power_off();
> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist
> 	}
> 
> 	return ret;
> }
> 
> static bool poweroff_force;
> 
> static void poweroff_work_func(struct work_struct *work)
> {
> 	__orderly_poweroff(poweroff_force);
> }
> 
> As a result, thermal has no control of power off any more after calling orderly_poweroff() and can get the result
> of userspace poweroff binary execution.
> 
> > 
> > If you are going to implement the above two patches, keep in mind:
> > i. At least within the thermal subsystem, you need to take care of all
> > zones that could trigger a shutdown.
> > ii. serializing the calls to orderly_poweroff() seems to be more
> > concerning than cancelling all monitoring.
> > 
> > 
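
To make point ii a bit more concrete, below is a minimal userspace sketch
of the "only the first caller triggers the shutdown" idea. It is not a
patch. thermal_request_shutdown() and fake_orderly_poweroff() are
hypothetical names, C11 atomics stand in for whatever locking the thermal
core would actually use, and the single global flag is an assumption
meant to cover all zones at once (point i).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set once by the first zone that asks for a shutdown. */
static atomic_flag shutdown_requested = ATOMIC_FLAG_INIT;

/* Counts how often the stand-in below ran, for illustration only. */
static int poweroff_calls;

/* Hypothetical stand-in for orderly_poweroff(true). */
static void fake_orderly_poweroff(void)
{
	poweroff_calls++;
}

/*
 * Called by any zone that crosses its critical trip.
 * Returns true only for the caller that actually started the shutdown;
 * every later caller, from any zone, gets false and does nothing.
 */
static bool thermal_request_shutdown(void)
{
	if (atomic_flag_test_and_set(&shutdown_requested))
		return false;

	fake_orderly_poweroff();
	return true;
}
```

Because the flag is shared rather than per zone, a second zone reaching
its critical trip while the first shutdown is in flight does not queue
another poweroff.
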
> 
> -- 
> regards,
> -grygorii
