From: Kapileshwar Singh <kapileshwar.singh@arm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Javi Merino <Javi.Merino@arm.com>,
Eduardo Valentin <edubezval@gmail.com>,
Zhang Rui <rui.zhang@intel.com>,
Linux PM list <linux-pm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Punit Agrawal <Punit.Agrawal@arm.com>,
Lina Iyer <lina.iyer@linaro.org>, Mark Brown <broonie@kernel.org>,
Jon Medhurst <tixy@linaro.org>
Subject: Re: [PATCH v3 5/5] thermal: cpu_cooling: update the cpu device when cpufreq updates the policy cpu
Date: Tue, 03 Mar 2015 15:09:48 +0000 [thread overview]
Message-ID: <54F5CEBC.1070303@arm.com> (raw)
In-Reply-To: <CAKohponB3cNXT8p=SqrfpGEPjLpeE=tnzUVOiabV8hEfBMb8rQ@mail.gmail.com>
On 03/03/15 13:07, Viresh Kumar wrote:
> On 3 March 2015 at 17:11, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
>> Yes I indeed tested the case where we cache the device pointer of the CPU for which the OPP's are populated.
>> When this CPU is hotplugged out, it invalidates the device pointer itself. Here are the error we get in dmesg:
>
> What do you mean by 'invalidates the device pointer' ? that cpu_dev is NULL ?
The cpu_dev is not NULL but we get an erroneous OPP back. We found the problem lies in the way we calculate the frequency for the cluster.
>> <3>[67203.216774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
>
> Have you handwritten them ? Why don't they precede with dev_pm_* ??
I have not handwritten them, It was from a Linaro 3.10 based kernel when I first noticed this issue but the same problem exists in mainline.
Apologies for this I sent you an older trace which I had saved when I found the bug. Here is the trace I get from mainline
[ 5680.135339] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.245528] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.355432] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.465521] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.575599] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.685817] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.795556] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.905598] dev_pm_opp_get_voltage: Invalid parameters
>
>>
>> Which happens because:
>>
>> unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
>> {
>> ..
>> tmp_opp = rcu_dereference(opp);
>> if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
>> pr_err("%s: Invalid parameters\n", __func__);
>
> This %s should print routine name ..
>
>> else
>> ..
>>
>> Which happens when
>>
>> opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
>> true);
>>
>> returns a an erroneous or NULL OPP or the opp is unavailable (in the above condition)
>
Update: This returns an erroneous OPP
> Please goto the depth of this thing, as I don't think it should happen.
>
> Over that I was asking you if you have tested the solution Javi gave,
> because OPPs
> wouldn't have been initialized for other CPUs once policy->cpu goes down.
I did test this but we were working with the assumption that OPPs should be populated for all the CPUs and also that OPPs are lost for a hotplugged CPU which I see is not the case.
We have looked at this more closely and found that problem lies in:
freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
which returns a NULL frequency as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.
Regards,
KP
next prev parent reply other threads:[~2015-03-03 15:09 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-02 17:17 [PATCH v3 0/5] Subject: The power allocator thermal governor Javi Merino
2015-03-02 17:17 ` [PATCH v3 1/5] thermal: introduce the Power Allocator governor Javi Merino
2015-03-02 17:17 ` [PATCH v3 2/5] thermal: add trace events to the power allocator governor Javi Merino
2017-03-15 4:26 ` Viresh Kumar
2015-03-02 17:17 ` [PATCH v3 3/5] thermal: export thermal_zone_parameters to sysfs Javi Merino
2015-03-02 17:17 ` [PATCH v3 4/5] Revert "cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications" Javi Merino
2015-03-02 17:17 ` [PATCH v3 5/5] thermal: cpu_cooling: update the cpu device when cpufreq updates the policy cpu Javi Merino
2015-03-03 4:03 ` Viresh Kumar
2015-03-03 10:59 ` Kapileshwar Singh
2015-03-03 11:19 ` Viresh Kumar
2015-03-03 11:41 ` Kapileshwar Singh
2015-03-03 13:07 ` Viresh Kumar
2015-03-03 15:09 ` Kapileshwar Singh [this message]
2015-03-03 15:26 ` Sudeep Holla
2015-03-03 15:30 ` Viresh Kumar
2015-03-03 15:33 ` Sudeep Holla
2015-03-03 15:29 ` Viresh Kumar
2015-03-03 15:34 ` Kapileshwar Singh
2015-03-02 17:28 ` [PATCH v3 0/5] Subject: The power allocator thermal governor Eduardo Valentin
2015-03-02 17:40 ` Javi Merino
2015-03-02 18:47 ` Eduardo Valentin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54F5CEBC.1070303@arm.com \
--to=kapileshwar.singh@arm.com \
--cc=Javi.Merino@arm.com \
--cc=Punit.Agrawal@arm.com \
--cc=broonie@kernel.org \
--cc=edubezval@gmail.com \
--cc=lina.iyer@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=rui.zhang@intel.com \
--cc=tixy@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox