Re: mutex warning in cpufreq + RFC patch

From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Stephen Boyd <sboyd@codeaurora.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"cpufreq@vger.kernel.org" <cpufreq@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Kukjin Kim <kgene.kim@samsung.com>
Subject: Re: mutex warning in cpufreq + RFC patch
Date: Sat, 31 Aug 2013 02:59:31 +0200
Message-ID: <41768242.uCD68eUKhK@vostro.rjw.lan>
In-Reply-To: <2143062.PTWIfM6mhl@vostro.rjw.lan>

On Saturday, August 31, 2013 02:55:57 AM Rafael J. Wysocki wrote:
> On Friday, August 30, 2013 05:36:41 PM Stephen Boyd wrote:
> > On 08/29, Viresh Kumar wrote:
> > > On 28 August 2013 22:22, Stephen Boyd <sboyd@codeaurora.org> wrote:
> > > >
> > > > I've applied these patches on top of v3.10
> > > >
> > > > f51e1eb63d9c28cec188337ee656a13be6980cfd (cpufreq: Fix cpufreq regression after suspend/resume)
> > > > aae760ed21cd690fe8a6db9f3a177ad55d7e12ab (cpufreq: Revert commit a66b2e to fix suspend/resume regression)
> > > > e8d05276f236ee6435e78411f62be9714e0b9377 (cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression)
> > > > 2a99859932281ed6c2ecdd988855f8f6838f6743 (cpufreq: Fix cpufreq driver module refcount balance after suspend/resume)
> > > > 419e172145cf6c51d436a8bf4afcd17511f0ff79 (cpufreq: don't leave stale policy pointer in cdbs->cur_policy)
> > > > 95731ebb114c5f0c028459388560fc2a72fe5049 (cpufreq: Fix governor start/stop race condition)
> > > >
> > > > That second-to-last one causes a NULL pointer dereference after the mutex
> > > > warning above, because the limits case does
> > > >
> > > >     if (policy->max < cpu_cdbs->cur_policy->cur)
> > > >
> > > > and that dereferences a NULL cur_policy pointer.
> > > 
> > > I have seen something similar, and the error-checking patch that
> > > I mentioned earlier came as a solution to exactly that.
> > 
> > Yes, that patch may reduce the chance of the race condition but I
> > don't believe it removes it entirely. I believe this bug still
> > exists in linux-next. Consider the scenario where CPU1 is going
> > down.
> > 
> > __cpufreq_remove_dev()
> >  ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> >   __cpufreq_governor()
> >    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
> >     cpufreq_governor_dbs()
> >      case CPUFREQ_GOV_STOP:
> >       mutex_destroy(&cpu_cdbs->timer_mutex);
> >       cpu_cdbs->cur_policy = NULL;
> >   <PREEMPT>
> > store()
> >  __cpufreq_set_policy()
> >   ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> >    __cpufreq_governor()
> >     policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
> >      case CPUFREQ_GOV_LIMITS:
> >       mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
> >        if (policy->max < cpu_cdbs->cur_policy->cur) <-- cur_policy == NULL
> > 
> > Once we stop the governor I don't see how another thread can't
> > race in and get all the way down into the GOV_LIMITS case. Even
> > if we wanted to lock out that thread with some mutex or semaphore
> > it will have to continue running eventually and so we really need
> > to wait until all the sysfs files are gone before we stop the
> > governor (in the case of the last cpu for the policy) or we need
> > to stop and start the governor while holding the policy semaphore
> > to prevent a race.
> > 
> > > 
> > > > Are there any fixes that I'm missing? I see that some things are
> > > > changing in linux-next but they don't look like fixes, more like
> > > > optimizations.
> > > 
> > > Getting patches on top of 3.10 would be tricky. You are two kernel
> > > versions behind, and that's not going to help much. There are too many
> > > patches between 3.10 and linux-next.
> > >
> > > 
> > > I really can't tell you which specific ones to include, as I am lost in them :)
> > 
> > That's a problem. 3.10 is the next long term stable kernel and so we need to
> > backport any fixes to 3.10 for the next two years. Hopefully these bugs I'm
> > finding in the 3.10 stable kernel's cpufreq code aren't known issues on
> > 3.11/next.
> 
> No, they aren't.
> 
> Well, that's the main reason why I've been pushing back against more churn in
> the cpuidle subsystem recently.  I think we went too far with changes that
> were not entirely understood and now we're seeing the fallout.

s/cpuidle/cpufreq/

Apparently, I'm already too tired.
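
For reference, the second option Stephen describes in the quoted text
(stopping the governor while holding the policy lock) can be sketched in
userspace C. This is only an illustration under stated assumptions, not the
kernel code or an actual fix: struct policy, struct cpu_dbs, policy_init(),
gov_start(), gov_stop() and gov_limits() are hypothetical stand-ins, a
pthread mutex stands in for the per-policy semaphore, and the "is the
governor still running?" check with -EBUSY is an assumption added here so
that a late LIMITS request neither locks a destroyed mutex nor dereferences
a NULL cur_policy.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cpufreq_policy. */
struct policy {
	unsigned int cur;	/* current frequency */
	unsigned int max;	/* upper frequency limit */
	pthread_mutex_t lock;	/* stand-in for the per-policy semaphore */
};

/* Hypothetical, simplified stand-in for the per-CPU governor data. */
struct cpu_dbs {
	struct policy *cur_policy;	/* NULL while the governor is stopped */
	pthread_mutex_t timer_mutex;	/* valid only between start and stop */
};

void policy_init(struct policy *policy, unsigned int cur, unsigned int max)
{
	assert(policy != NULL);
	policy->cur = cur;
	policy->max = max;
	pthread_mutex_init(&policy->lock, NULL);
}

/* START: create the timer mutex and publish the policy, under the policy lock. */
void gov_start(struct cpu_dbs *cdbs, struct policy *policy)
{
	assert(cdbs != NULL && policy != NULL);
	pthread_mutex_lock(&policy->lock);
	pthread_mutex_init(&cdbs->timer_mutex, NULL);
	cdbs->cur_policy = policy;
	pthread_mutex_unlock(&policy->lock);
}

/* STOP: tear down under the same lock, so LIMITS never sees a half-stopped state. */
void gov_stop(struct cpu_dbs *cdbs, struct policy *policy)
{
	assert(cdbs != NULL && policy != NULL);
	pthread_mutex_lock(&policy->lock);
	pthread_mutex_destroy(&cdbs->timer_mutex);
	cdbs->cur_policy = NULL;
	pthread_mutex_unlock(&policy->lock);
}

/* LIMITS: refuse to run once the governor has been stopped. */
int gov_limits(struct cpu_dbs *cdbs, struct policy *policy)
{
	int ret = 0;

	assert(cdbs != NULL && policy != NULL);
	pthread_mutex_lock(&policy->lock);
	if (cdbs->cur_policy == NULL) {
		/* Governor already stopped: timer_mutex is gone, bail out. */
		ret = -EBUSY;
		goto out;
	}
	pthread_mutex_lock(&cdbs->timer_mutex);
	if (policy->max < cdbs->cur_policy->cur)
		cdbs->cur_policy->cur = policy->max;	/* clamp to the new limit */
	pthread_mutex_unlock(&cdbs->timer_mutex);
out:
	pthread_mutex_unlock(&policy->lock);
	return ret;
}
```

In this sketch a gov_limits() call that arrives after gov_stop() returns
-EBUSY instead of touching the destroyed mutex. It does not model the other
option mentioned above, waiting for the sysfs files to go away before
stopping the governor.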

Thread overview: 12+ messages
2013-08-28  2:57 mutex warning in cpufreq + RFC patch Stephen Boyd
2013-08-28  6:58 ` Viresh Kumar
2013-08-28 16:52   ` Stephen Boyd
2013-08-29  8:37     ` Viresh Kumar
2013-08-29  8:39       ` Viresh Kumar
2013-08-31  0:36       ` Stephen Boyd
2013-08-31  0:55         ` Rafael J. Wysocki
2013-08-31  0:59           ` Rafael J. Wysocki [this message]
2013-09-01  6:24         ` Viresh Kumar
2013-09-01 13:22           ` Rafael J. Wysocki
2013-09-01 16:21             ` Viresh Kumar
2013-09-03 13:18               ` Srivatsa S. Bhat
