From: Nishanth Menon <nm@ti.com>
To: Dmitry Torokhov <dtor@chromium.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Viresh Kumar <viresh.kumar@linaro.org>,
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,
Geert Uytterhoeven <geert+renesas@glider.be>,
Stefan Wahren <stefan.wahren@i2se.com>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
Date: Wed, 24 Dec 2014 14:37:14 -0600 [thread overview]
Message-ID: <549B23FA.5050100@ti.com> (raw)
In-Reply-To: <CAE_wzQ_djUapN-hwHUttcrfzUiWPHvW9OxV7zqrwidtYh8_FZg@mail.gmail.com>
On 12/24/2014 11:44 AM, Dmitry Torokhov wrote:
> On Wed, Dec 24, 2014 at 9:37 AM, Nishanth Menon <nm@ti.com> wrote:
>> On 12/24/2014 11:31 AM, Dmitry Torokhov wrote:
>>> On Wed, Dec 24, 2014 at 9:16 AM, Nishanth Menon <nm@ti.com> wrote:
>>>> On 12/24/2014 11:09 AM, Dmitry Torokhov wrote:
>>>>> On Wed, Dec 24, 2014 at 8:48 AM, Nishanth Menon <nm@ti.com> wrote:
>>>>>> On 12/16/2014 05:09 PM, Dmitry Torokhov wrote:
>>>>>>> A lot of callers are missing the fact that dev_pm_opp_get_opp_count
>>>>>>> needs to be called under RCU lock. Given that RCU locks can safely be
>>>>>>> nested, instead of providing *_locked() API, let's take RCU lock inside
>>>>>>> dev_pm_opp_get_opp_count() and leave callers as is.
>>>>>>
>>>>>> While it is true that we can safely do nested RCU locks, This also
>>>>>> encourages wrong usage.
>>>>>>
>>>>>> count = dev_pm_opp_get_opp_count(dev)
>>>>>> ^^ point A
>>>>>> array = kzalloc(count * sizeof (*array));
>>>>>> rcu_read_lock();
>>>>>> ^^ point B
>>>>>> .. work down the list and add OPPs..
>>>>>> ...
>>>>>>
>>>>>> Between A and B, we might have had list modification (dynamic OPP
>>>>>> addition or deletion) - which implies that the count is no longer
>>>>>> accurate between point A and B. instead, enforcing callers to have the
>>>>>> responsibility of rcu_lock is exactly what we have to do since the OPP
>>>>>> library has no clue how to enforce pointer or data accuracy.
>>>>>
>>>>> No, you seem to have a misconception that rcu_lock protects you past
>>>>> the point B, but that is also wrong. The only thing rcu "lock"
>>>>> provides is safe traversing the list and guarantee that elements will
>>>>> not disappear while you are referencing them, but list can both
>>>>> contract and expand under you. In that regard code in
>>>>> drivers/cpufreq/cpufreq_opp.c is utterly wrong. If you want to count
>>>>> the list and use number of elements you should be taking a mutex.
>>>>> Luckily all cpufreq drivers at the moment only want to see if OPP
>>>>> table is empty or not, so as a stop-gap we can take rcu_lock
>>>>> automatically as we are getting count. We won't get necessarily
>>>>> accurate result, but at least we will be safe traversing the list.
>>>>
>>>> So, instead of a half solution, lets consider this in the realm of
>>>> dynamic OPPs as well. agreed to the point that we only have safe
>>>> traversal and pointer validity. the real problem however is with
>>>> "dynamic OPPs" (one of the original reasons why i did not add dynamic
>>>> OPPs in the original version was to escape from it's complexity for
>>>> users - anyways.. we are beyond that now). if OPPs can be removed on
>>>> the fly, we need the following:
>>>> a) use OPP notifiers to adequately handle list modification
>>>> b) lock down list modification (and associated APIs) to ensure that
>>>> the original cpufreq /devfreq list is correct.
>>>>
>>>> I still dont see the need to do this half solution.
>>>
>>> The need for half solution at the moment is that you can't safely
>>> travel the lists and may crash on an invalid pointer.
>>
>> So, fix the cpufreq-dt instead of moving the hack inside OPP driver.
>
> I started there, but it is not only cpufreq-dt that got it wrong. I
> considered changing individual drivers (Viresh also suggested adding
> _locked() variant API), but decided patching opp was less invasive for
> now.
True. I had done an audit and cleanup, I think a couple or so years
back and things ofcourse tend to go down the bitrot path without
constant checkups :(
>>> Going forward I think (I mentioned that in my other email) that we
>>> should rework the OPP API so that callers fetch OPP table object for a
>>> device at init/probe time and then use it to get OPPs. This way won't
>>> have to travel two lists any time we want to reference an OPP.
>>>
>>> And instead of relying notifiers, maybe look into using OPP tables
>>> directly in cpufreq drivers instead of converting OPP into static-ish
>>> cpufreq tables.
>>>
>>
>> If you'd like a proper fix for OPP usage, I am all open to see such a
>> proposal that works not just for cpufreq, but also for devfreq as well.
>
> Yeah, let's see what kind of time I have ;)
That would be nice. Thank you.
if you could post the split off the remaining patches from the series
esp patches #1 and #2 w.r.t 3.19-rc1 and repost, it will be nice to
get them merged in as they do look like obvious improvements good to
get in without depending on the remainder of the series which we can
work on meanwhile.
--
Regards,
Nishanth Menon
next prev parent reply other threads:[~2014-12-24 20:37 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-16 23:09 [PATCH 0/4] Allow cpufreq-dt to defer probe if OPP table is not ready Dmitry Torokhov
2014-12-16 23:09 ` [PATCH 1/4] PM / OPP: add some lockdep annotations Dmitry Torokhov
2014-12-17 4:10 ` Viresh Kumar
2014-12-24 16:28 ` Nishanth Menon
2014-12-16 23:09 ` [PATCH 2/4] PM / OPP: fix warning in of_free_opp_table Dmitry Torokhov
2014-12-17 4:28 ` Viresh Kumar
2014-12-24 16:42 ` Nishanth Menon
2014-12-16 23:09 ` [PATCH 3/4] PM / OPP: take RCU lock in dev_pm_opp_get_opp_count Dmitry Torokhov
2014-12-17 4:36 ` Viresh Kumar
2014-12-17 17:28 ` Dmitry Torokhov
2014-12-18 2:11 ` Viresh Kumar
2014-12-17 23:47 ` Paul E. McKenney
2014-12-18 2:11 ` Viresh Kumar
2014-12-24 16:48 ` Nishanth Menon
2014-12-24 17:09 ` Dmitry Torokhov
2014-12-24 17:16 ` Nishanth Menon
2014-12-24 17:31 ` Dmitry Torokhov
2014-12-24 17:37 ` Nishanth Menon
2014-12-24 17:44 ` Dmitry Torokhov
2014-12-24 20:37 ` Nishanth Menon [this message]
2014-12-27 20:34 ` Rafael J. Wysocki
2014-12-16 23:09 ` [PATCH 4/4] cpufreq-dt: defer probing if OPP table is not ready Dmitry Torokhov
2014-12-17 4:37 ` Viresh Kumar
2014-12-24 16:58 ` Nishanth Menon
2014-12-17 4:42 ` [PATCH 0/4] Allow cpufreq-dt to defer probe " Viresh Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=549B23FA.5050100@ti.com \
--to=nm@ti.com \
--cc=dtor@chromium.org \
--cc=geert+renesas@glider.be \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=paul.gortmaker@windriver.com \
--cc=rjw@rjwysocki.net \
--cc=stefan.wahren@i2se.com \
--cc=thomas.petazzoni@free-electrons.com \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).