Re: cpufreq_stats NULL deref on second system suspend

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Stephen Warren <swarren@wwwdotorg.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	cpufreq <cpufreq@vger.kernel.org>
Subject: Re: cpufreq_stats NULL deref on second system suspend
Date: Thu, 12 Sep 2013 11:56:34 +0530	[thread overview]
Message-ID: <52315E9A.3000607@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKohpo=8mqRmmk17D52yiR9w1V4Z+2UzZ2kzABtrxHbYR+sQzw@mail.gmail.com>

On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> Let me fix my mail first.. I was running out of time yesterday and so couldn't
> frame things correctly :)
> 
> On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Okay.. There are two different ways in which cpufreq_add_dev() work
>> currently..
>>
>> Boot cluster (i.e. policy with boot CPU)
>> ---------------
>>
>> Here cpufreq_remove_dev() is never called for boot cpu but all others.
>> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
>>
>> Now policy->cpu contains meaningful cpu at beginning of resume and
>> we don't need to modify that at all.. For all the remaining CPUs we
>> better call cpufreq_add_policy_cpu() rather..
> 
> And this should be done without your patch. Or actually we will simply
> return from this place. Atleast for systems with single cluster, like Tegra.
> 
> policy->related_cpus is still valid after resume and we haven't removed
> policy from the cpufreq_policy_list (though there is a bug which I have
> fixed separately and sent it to you..).. So no change required for a single
> cluster system..
> 
>> Non-boot Cluster
>> ---------------------
>>
>> All CPUs here are removed and at the end policy->cpu contains the last
>> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
>>
>> Now at resume we will add cpu2 first and so need to update policy->cpu
>> to 2..
> 
>> But for all other CPUs in this cluster we return early from
>> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
>> was fixed by call to ->init() for the first cpu of this cluster..
> 
> This was wrong, we need a valid policy->related_cpus field which is always
> valid and so we return early here too, but not for the first cpu of cluster.
> 
>> And so we never reach the line: policy->cpu = cpu;
>>
>> For the first cpu of non-boot cluster we need to call update_policy_cpu()
>> and not for others..
> 
> that's correct, thought I have one more idea.. :)
> 
>> But for the boot cluster if we can call ->init() somehow at resume time,
>> then things would be fairly similar in both cases..
> 
> Not required.. its all working already.. and so Stephen shouldn't need your
> patch for Tegra, but rather my patches that fix other cpufreq bugs..
> 
> Now coming back to the ideas I have...
> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
> exact opposite of suspend in resume?
> 
> We are removing CPUs (leaving the boot cpu) in ascending order and then
> adding them back in same order.. Why?
> 
> Why not remove CPUs in descending order and add in ascending order? Or
> remove in ascending order and add in descending order?
> 

I had the same thought when solving this bug.. We have had similar issues with
CPU hotplug notifiers too: why are they invoked in the same order during both
CPU down and up, instead of reversing the order? I even had a patchset to perform
reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
... but people didn't find that very compelling to have.


> That way policy->cpu will be updated with the right cpu and your patch wouldn't
> be required..
> 
> I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume
> may also be fixed and that looks logically more correct to me..
> 

It does to me too, but I think the reason nobody really bothered is because perhaps
not many other subsystems care about the order in which CPUs are torn down or
brought up; they just need the total number to match.. cpufreq is one exception
as we saw with this bug.


Regards,
Srivatsa S. Bhat

next prev parent reply	other threads:[~2013-09-12  6:30 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren
2013-09-09 20:01 ` Rafael J. Wysocki
2013-09-09 20:01   ` Stephen Warren
2013-09-09 20:24     ` Rafael J. Wysocki
2013-09-09 21:29       ` Stephen Warren
2013-09-09 23:14         ` Rafael J. Wysocki
2013-09-10 20:53           ` Stephen Warren
2013-09-10 22:34             ` Rafael J. Wysocki
2013-09-11 10:21               ` Srivatsa S. Bhat
2013-09-11 10:44                 ` Viresh Kumar
2013-09-11 10:45                   ` Viresh Kumar
2013-09-11 11:14                     ` Srivatsa S. Bhat
2013-09-11 11:59                       ` Viresh Kumar
2013-09-11 13:56                         ` Srivatsa S. Bhat
2013-09-12  5:52                         ` Viresh Kumar
2013-09-12  6:26                           ` Srivatsa S. Bhat [this message]
2013-09-12  6:41                             ` Viresh Kumar
2013-09-12  6:46                               ` Srivatsa S. Bhat
2013-09-12  6:52                                 ` Viresh Kumar
2013-09-12  7:14                                   ` Srivatsa S. Bhat
2013-09-12 15:55                             ` Stephen Warren
2013-09-12 17:26                               ` Srivatsa S. Bhat
2013-09-13  4:26                                 ` Viresh Kumar
2013-09-11 11:10                   ` Srivatsa S. Bhat
2013-09-11 11:15                     ` Viresh Kumar
2013-09-11 11:17                       ` Srivatsa S. Bhat
2013-09-11 11:41                         ` Viresh Kumar
2013-09-11 11:09                 ` Srivatsa S. Bhat
2013-09-11 16:05                 ` Stephen Warren
2013-09-11 18:03                   ` Srivatsa S. Bhat
2013-09-11 18:42                     ` Srivatsa S. Bhat
2013-09-11 19:03                       ` Stephen Warren
2013-09-11 19:46                         ` Srivatsa S. Bhat
2013-09-11 20:07                           ` Stephen Warren
2013-09-11 20:05                             ` Srivatsa S. Bhat
2013-09-12  6:04                           ` Viresh Kumar
2013-09-12  6:00                         ` Viresh Kumar
2013-09-12  5:58                       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52315E9A.3000607@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=cpufreq@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=swarren@wwwdotorg.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).