Re: cpufreq_stats NULL deref on second system suspend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Stephen Warren <swarren@wwwdotorg.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	cpufreq <cpufreq@vger.kernel.org>
Subject: Re: cpufreq_stats NULL deref on second system suspend
Date: Thu, 12 Sep 2013 11:56:34 +0530	[thread overview]
Message-ID: <52315E9A.3000607@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAKohpo=8mqRmmk17D52yiR9w1V4Z+2UzZ2kzABtrxHbYR+sQzw@mail.gmail.com>

On 09/12/2013 11:22 AM, Viresh Kumar wrote:
> Let me fix my mail first.. I was running out of time yesterday and so couldn't
> frame things correctly :)
> 
> On 11 September 2013 17:29, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Okay.. There are two different ways in which cpufreq_add_dev() work
>> currently..
>>
>> Boot cluster (i.e. policy with boot CPU)
>> ---------------
>>
>> Here cpufreq_remove_dev() is never called for boot cpu but all others.
>> And similarly cpufreq_add_dev() is never called for boot cpu but all others.
>>
>> Now policy->cpu contains meaningful cpu at beginning of resume and
>> we don't need to modify that at all.. For all the remaining CPUs we
>> better call cpufreq_add_policy_cpu() rather..
> 
> And this should be done without your patch. Or actually we will simply
> return from this place. Atleast for systems with single cluster, like Tegra.
> 
> policy->related_cpus is still valid after resume and we haven't removed
> policy from the cpufreq_policy_list (though there is a bug which I have
> fixed separately and sent it to you..).. So no change required for a single
> cluster system..
> 
>> Non-boot Cluster
>> ---------------------
>>
>> All CPUs here are removed and at the end policy->cpu contains the last
>> cpu removed.. So, for a cluster with cpu 2 and 3.... it will contain 3..
>>
>> Now at resume we will add cpu2 first and so need to update policy->cpu
>> to 2..
> 
>> But for all other CPUs in this cluster we return early from
>> cpufreq_add_dev() and call cpufreq_add_policy_cpu() as policy->cpus
>> was fixed by call to ->init() for the first cpu of this cluster..
> 
> This was wrong, we need a valid policy->related_cpus field which is always
> valid and so we return early here too, but not for the first cpu of cluster.
> 
>> And so we never reach the line: policy->cpu = cpu;
>>
>> For the first cpu of non-boot cluster we need to call update_policy_cpu()
>> and not for others..
> 
> that's correct, thought I have one more idea.. :)
> 
>> But for the boot cluster if we can call ->init() somehow at resume time,
>> then things would be fairly similar in both cases..
> 
> Not required.. its all working already.. and so Stephen shouldn't need your
> patch for Tegra, but rather my patches that fix other cpufreq bugs..
> 
> Now coming back to the ideas I have...
> Same code will work if hotplug sequence is fixed a bit. Why aren't we doing
> exact opposite of suspend in resume?
> 
> We are removing CPUs (leaving the boot cpu) in ascending order and then
> adding them back in same order.. Why?
> 
> Why not remove CPUs in descending order and add in ascending order? Or
> remove in ascending order and add in descending order?
> 

I had the same thought when solving this bug.. We have had similar issues with
CPU hotplug notifiers too: why are they invoked in the same order during both
CPU down and up, instead of reversing the order? I even had a patchset to perform
reverse-invocation of notifiers.. http://lwn.net/Articles/508072/
... but people didn't find that very compelling to have.


> That way policy->cpu will be updated with the right cpu and your patch wouldn't
> be required..
> 
> I am not saying that this can't be hacked/fixed in cpufreq but suspend/resume
> may also be fixed and that looks logically more correct to me..
> 

It does to me too, but I think the reason nobody really bothered is because perhaps
not many other subsystems care about the order in which CPUs are torn down or
brought up; they just need the total number to match.. cpufreq is one exception
as we saw with this bug.


Regards,
Srivatsa S. Bhat

next prev parent reply	other threads:[~2013-09-12  6:26 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren
2013-09-09 20:01 ` Rafael J. Wysocki
2013-09-09 20:01   ` Stephen Warren
2013-09-09 20:24     ` Rafael J. Wysocki
2013-09-09 21:29       ` Stephen Warren
2013-09-09 23:14         ` Rafael J. Wysocki
2013-09-10 20:53           ` Stephen Warren
2013-09-10 22:34             ` Rafael J. Wysocki
2013-09-11 10:21               ` Srivatsa S. Bhat
2013-09-11 10:44                 ` Viresh Kumar
2013-09-11 10:45                   ` Viresh Kumar
2013-09-11 11:14                     ` Srivatsa S. Bhat
2013-09-11 11:59                       ` Viresh Kumar
2013-09-11 13:56                         ` Srivatsa S. Bhat
2013-09-12  5:52                         ` Viresh Kumar
2013-09-12  6:26                           ` Srivatsa S. Bhat [this message]
2013-09-12  6:41                             ` Viresh Kumar
2013-09-12  6:46                               ` Srivatsa S. Bhat
2013-09-12  6:52                                 ` Viresh Kumar
2013-09-12  7:14                                   ` Srivatsa S. Bhat
2013-09-12 15:55                             ` Stephen Warren
2013-09-12 17:26                               ` Srivatsa S. Bhat
2013-09-13  4:26                                 ` Viresh Kumar
2013-09-11 11:10                   ` Srivatsa S. Bhat
2013-09-11 11:15                     ` Viresh Kumar
2013-09-11 11:17                       ` Srivatsa S. Bhat
2013-09-11 11:41                         ` Viresh Kumar
2013-09-11 11:09                 ` Srivatsa S. Bhat
2013-09-11 16:05                 ` Stephen Warren
2013-09-11 18:03                   ` Srivatsa S. Bhat
2013-09-11 18:42                     ` Srivatsa S. Bhat
2013-09-11 19:03                       ` Stephen Warren
2013-09-11 19:46                         ` Srivatsa S. Bhat
2013-09-11 20:07                           ` Stephen Warren
2013-09-11 20:05                             ` Srivatsa S. Bhat
2013-09-12  6:04                           ` Viresh Kumar
2013-09-12  6:00                         ` Viresh Kumar
2013-09-12  5:58                       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52315E9A.3000607@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=cpufreq@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=swarren@wwwdotorg.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.