All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Stephen Warren <swarren@wwwdotorg.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	cpufreq <cpufreq@vger.kernel.org>
Subject: Re: cpufreq_stats NULL deref on second system suspend
Date: Thu, 12 Sep 2013 01:16:37 +0530	[thread overview]
Message-ID: <5230C89D.7010801@linux.vnet.ibm.com> (raw)
In-Reply-To: <5230BE75.7040307@wwwdotorg.org>

On 09/12/2013 12:33 AM, Stephen Warren wrote:
> On 09/11/2013 12:42 PM, Srivatsa S. Bhat wrote:
> ...
>> OK, I took a second look at the code, and I suspect that applying the
>> second patch might help. So can you try by applying both the patches
>> please[1][2]?
>>
> ...
>> [1]. http://marc.info/?l=linux-kernel&m=137889516210816&w=2
>> [2]. http://marc.info/?l=linux-kernel&m=137889800511940&w=2
> 
> Yes, with both of those patches applies, the problem is solved:-)
> 
> I was going to test the second patch originally, but it sounded like it
> was more of a cleanup rather than a fix for my issue, so I didn't bother
> when I found the problem wasn't solved by patch 1. Sorry!
> 

Well, honestly, even I had intended the second patch as a cleanup and
hadn't asked you to test it ;-) Only when you reported that the first patch
failed to solve your problem, I realized that the second patch was
important too! :-) Thanks for testing!

> For the record, I'm testing on a 2-CPU system, so I'm not sure whether
> your explanation applies; it talks about CPUs 2 and 3 whereas I only
> have CPUs 0 and 1, but perhaps your explanation applies equally to any
> pair of CPUs?
> 

Yes, it applies to any pair of CPUs, as long as the CPU first taken down
is not the policy->cpu. In your case, it applies like this:
IIUC, CPU0 is the boot cpu, and hence it wont be taken offline using hotplug.
So only CPU 1 is taken offline during suspend. And if it is not the policy->cpu,
then it hits the very same bug that I described with the analogy of CPUs 2
and 3.

> For the record, here's the information you requested in the other email:
> 
> # cat /sys/devices/system/cpu/cpu*/cpufreq/related_cpus
> 0 1
> 0 1
> 

Thanks! It would have been more useful to somehow know which was the
policy->cpu. But looking at the problem, certainly CPU0 was the policy->cpu
in your case. Anyway, nevermind, good to know that the problem got solved
by the 2 patches :-) And more importantly, we now fully understand the
problems that can lead to the NULL deref and the solutions, as outlined below:

Problem 1 : The last surviving policy->cpu during suspend might not
be the one which is onlined during resume. So policy->cpu updates can
get missed by the cpufreq-stats code. This is solved by patch 1.

Problem 2 : If a CPU other than the policy->cpu goes down first during
suspend, then we end up spuriously updating the policy->cpu field, making
update_policy_cpu() go crazy. This is solved by patch 2.

Ideally, I think we should fix the weird if/else condition, since *that*
is the real culprit; and retain patch 2 as a cleanup.

Something like this:


From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] cpufreq: Restructure if/else block to avoid unintended behavior

In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.

The code looks like this:

        if (cpu != policy->cpu && !frozen) {
                sysfs_remove_link(&dev->kobj, "cpufreq");
        } else if (cpus > 1) {
		new_cpu = cpufreq_nominate_new_policy_cpu(...);
		...
		update_policy_cpu(..., new_cpu);
	}

The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:

1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/cpufreq/cpufreq.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 62bdb95..247842b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1193,8 +1193,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
 		cpumask_clear_cpu(cpu, policy->cpus);
 	unlock_policy_rwsem_write(cpu);
 
-	if (cpu != policy->cpu && !frozen) {
-		sysfs_remove_link(&dev->kobj, "cpufreq");
+	if (cpu != policy->cpu) {
+		if (!frozen)
+			sysfs_remove_link(&dev->kobj, "cpufreq");
 	} else if (cpus > 1) {
 
 		new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen);




So can you see if patch 1 + this above fix solves your problem as well?
Then we can retain the original patch 2 as a cleanup, after these 2 patches.
This organization also makes the code look better and understandable.

Rafael, I'll post the 3 patches separately after knowing the results from
Stephen. You don't have to bother deciphering the patch ordering just yet ;-)

Regards,
Srivatsa S. Bhat


  reply	other threads:[~2013-09-11 19:46 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-09 19:22 cpufreq_stats NULL deref on second system suspend Stephen Warren
2013-09-09 20:01 ` Rafael J. Wysocki
2013-09-09 20:01   ` Stephen Warren
2013-09-09 20:24     ` Rafael J. Wysocki
2013-09-09 21:29       ` Stephen Warren
2013-09-09 23:14         ` Rafael J. Wysocki
2013-09-10 20:53           ` Stephen Warren
2013-09-10 22:34             ` Rafael J. Wysocki
2013-09-11 10:21               ` Srivatsa S. Bhat
2013-09-11 10:44                 ` Viresh Kumar
2013-09-11 10:45                   ` Viresh Kumar
2013-09-11 11:14                     ` Srivatsa S. Bhat
2013-09-11 11:59                       ` Viresh Kumar
2013-09-11 13:56                         ` Srivatsa S. Bhat
2013-09-12  5:52                         ` Viresh Kumar
2013-09-12  6:26                           ` Srivatsa S. Bhat
2013-09-12  6:41                             ` Viresh Kumar
2013-09-12  6:46                               ` Srivatsa S. Bhat
2013-09-12  6:52                                 ` Viresh Kumar
2013-09-12  7:14                                   ` Srivatsa S. Bhat
2013-09-12 15:55                             ` Stephen Warren
2013-09-12 17:26                               ` Srivatsa S. Bhat
2013-09-13  4:26                                 ` Viresh Kumar
2013-09-11 11:10                   ` Srivatsa S. Bhat
2013-09-11 11:15                     ` Viresh Kumar
2013-09-11 11:17                       ` Srivatsa S. Bhat
2013-09-11 11:41                         ` Viresh Kumar
2013-09-11 11:09                 ` Srivatsa S. Bhat
2013-09-11 16:05                 ` Stephen Warren
2013-09-11 18:03                   ` Srivatsa S. Bhat
2013-09-11 18:42                     ` Srivatsa S. Bhat
2013-09-11 19:03                       ` Stephen Warren
2013-09-11 19:46                         ` Srivatsa S. Bhat [this message]
2013-09-11 20:07                           ` Stephen Warren
2013-09-11 20:05                             ` Srivatsa S. Bhat
2013-09-12  6:04                           ` Viresh Kumar
2013-09-12  6:00                         ` Viresh Kumar
2013-09-12  5:58                       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5230C89D.7010801@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=cpufreq@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=swarren@wwwdotorg.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.