From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Stephen Boyd <sboyd@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"cpufreq@vger.kernel.org" <cpufreq@vger.kernel.org>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
Kukjin Kim <kgene.kim@samsung.com>
Subject: Re: mutex warning in cpufreq + RFC patch
Date: Sat, 31 Aug 2013 02:59:31 +0200
Message-ID: <41768242.uCD68eUKhK@vostro.rjw.lan>
In-Reply-To: <2143062.PTWIfM6mhl@vostro.rjw.lan>
On Saturday, August 31, 2013 02:55:57 AM Rafael J. Wysocki wrote:
> On Friday, August 30, 2013 05:36:41 PM Stephen Boyd wrote:
> > On 08/29, Viresh Kumar wrote:
> > > On 28 August 2013 22:22, Stephen Boyd <sboyd@codeaurora.org> wrote:
> > > >
> > > > I've applied these patches on top of v3.10
> > > >
> > > > f51e1eb63d9c28cec188337ee656a13be6980cfd (cpufreq: Fix cpufreq regression after suspend/resume)
> > > > aae760ed21cd690fe8a6db9f3a177ad55d7e12ab (cpufreq: Revert commit a66b2e to fix suspend/resume regression)
> > > > e8d05276f236ee6435e78411f62be9714e0b9377 (cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression)
> > > > 2a99859932281ed6c2ecdd988855f8f6838f6743 (cpufreq: Fix cpufreq driver module refcount balance after suspend/resume)
> > > > 419e172145cf6c51d436a8bf4afcd17511f0ff79 (cpufreq: don't leave stale policy pointer in cdbs->cur_policy)
> > > > 95731ebb114c5f0c028459388560fc2a72fe5049 (cpufreq: Fix governor start/stop race condition)
> > > >
> > > > That second to last one causes a NULL pointer exception after the mutex
> > > > warning above because the limits case does
> > > >
> > > > if (policy->max < cpu_cdbs->cur_policy->cur)
> > > >
> > > > and that dereferences a NULL cur_policy pointer.
> > >
> > > I have seen something similar and the error checking patch that
> > > I mentioned earlier came as solution to that only..
> >
> > Yes that patch may reduce the chance of the race condition but I
> > don't believe it removes it entirely. I believe this bug still
> > exists in linux-next. Consider the scenario where CPU1 is going
> > down.
> >
> > __cpufreq_remove_dev()
> >  ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> >   __cpufreq_governor()
> >    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
> >     cpufreq_governor_dbs()
> >      case CPUFREQ_GOV_STOP:
> >       mutex_destroy(&cpu_cdbs->timer_mutex)
> >       cpu_cdbs->cur_policy = NULL;
> >   <PREEMPT>
> > store()
> >  __cpufreq_set_policy()
> >   ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> >    __cpufreq_governor()
> >     policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
> >      case CPUFREQ_GOV_LIMITS:
> >       mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
> >       if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
> >
> > Once we stop the governor I don't see how another thread can't
> > race in and get all the way down into the GOV_LIMITS case. Even
> > if we wanted to lock out that thread with some mutex or semaphore
> > it will have to continue running eventually and so we really need
> > to wait until all the sysfs files are gone before we stop the
> > governor (in the case of the last cpu for the policy) or we need
> > to stop and start the governor while holding the policy semaphore
> > to prevent a race.
> >
> > >
> > > > Are there any fixes that I'm missing? I see that some things are
> > > > changing in linux-next but they don't look like fixes, more like
> > > > optimizations.
> > >
> > > Getting patches over 3.10 would be tricky.. You are two kernel
> > > version back and that's not going to help much.. There are too many
> > > patches in between linux-next and 3.10..
> > >
> > >
> > > I really can't tell you which specific ones to include, as I am lost in them :)
> >
> > That's a problem. 3.10 is the next long term stable kernel and so we need to
> > backport any fixes to 3.10 for the next two years. Hopefully these bugs I'm
> > finding in the 3.10 stable kernel's cpufreq code aren't known issues on
> > 3.11/next.
>
> No, they aren't.
>
> Well, that's the main reason why I've been pushing back against more churn in
> the cpuidle subsystem recently. I think we went too far with changes that
> were not entirely understood and now we're seeing the fallout.
s/cpuidle/cpufreq/
Apparently, I'm already too tired.
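
For reference, the ordering in the quoted trace (GOV_STOP completes, then
GOV_LIMITS runs against the torn-down state) can be replayed in user space.
The sketch below is only an illustration of the second option Stephen
mentions, serializing governor events under one lock and refusing LIMITS
once the governor is stopped. It is not the kernel code and not necessarily
the fix that was merged. All names (struct cpu_dbs, policy_lock,
governor_enabled, gov_start/gov_stop/gov_limits, replay_trace) are
hypothetical stand-ins, and unlike timer_mutex in the trace, the lock here
is assumed to outlive the governor rather than being destroyed in STOP.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the structures in the trace. */
struct policy {
	unsigned int cur;
	unsigned int max;
};

struct cpu_dbs {
	pthread_mutex_t policy_lock;	/* assumed to outlive the governor */
	int governor_enabled;		/* set by START, cleared by STOP */
	struct policy *cur_policy;	/* NULL once the governor is stopped */
};

static void gov_start(struct cpu_dbs *c, struct policy *p)
{
	pthread_mutex_lock(&c->policy_lock);
	c->cur_policy = p;
	c->governor_enabled = 1;
	pthread_mutex_unlock(&c->policy_lock);
}

static void gov_stop(struct cpu_dbs *c)
{
	pthread_mutex_lock(&c->policy_lock);
	c->governor_enabled = 0;
	c->cur_policy = NULL;
	pthread_mutex_unlock(&c->policy_lock);
}

/* Returns 0 on success, -EBUSY if the governor has already been stopped. */
static int gov_limits(struct cpu_dbs *c, unsigned int new_max)
{
	int ret = -EBUSY;

	pthread_mutex_lock(&c->policy_lock);
	if (c->governor_enabled) {
		/* cur_policy is only dereferenced while the governor is live. */
		c->cur_policy->max = new_max;
		if (c->cur_policy->max < c->cur_policy->cur)
			c->cur_policy->cur = c->cur_policy->max;
		ret = 0;
	}
	pthread_mutex_unlock(&c->policy_lock);
	return ret;
}

/* Replays the interleaving from the trace: STOP first, then LIMITS. */
int replay_trace(void)
{
	struct policy p = { .cur = 1000000, .max = 1500000 };
	struct cpu_dbs c = { .governor_enabled = 0, .cur_policy = NULL };
	int ret;

	pthread_mutex_init(&c.policy_lock, NULL);

	gov_start(&c, &p);
	/* While the governor runs, lowering max also clamps cur. */
	assert(gov_limits(&c, 800000) == 0 && p.cur == 800000);

	gov_stop(&c);
	/* The late LIMITS event is rejected instead of touching NULL. */
	ret = gov_limits(&c, 600000);

	pthread_mutex_destroy(&c.policy_lock);
	return ret;
}
```

Calling replay_trace() returns -EBUSY: the LIMITS event that arrives after
STOP takes the same lock, sees governor_enabled cleared, and backs out
before cur_policy is dereferenced.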