Re: [v3.10 regression] deadlock on cpu hotplug

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michael Wang <wangyun@linux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Borislav Petkov <bp@alien8.de>, Jiri Kosina <jkosina@suse.cz>,
	Tomasz Figa <t.figa@samsung.com>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [v3.10 regression] deadlock on cpu hotplug
Date: Tue, 09 Jul 2013 18:37:25 +0530	[thread overview]
Message-ID: <51DC0B0D.9070201@linux.vnet.ibm.com> (raw)
In-Reply-To: <1754044.EVIH1UZj6p@amdc1032>

On 07/09/2013 05:21 PM, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Tuesday, July 09, 2013 10:15:43 AM Michael Wang wrote:
>> Hi, Bartlomiej
>>
>> On 07/08/2013 11:26 PM, Bartlomiej Zolnierkiewicz wrote:
>> [snip]
>>>
>>> # echo 0 > /sys/devices/system/cpu/cpu3/online
>>> # echo 0 > /sys/devices/system/cpu/cpu2/online
>>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>>> # while true;do echo 1 > /sys/devices/system/cpu/cpu1/online;echo 0 > /sys/devices/system/cpu/cpu1/online;done
>>>
>>> The commit in question (2f7021a8) was merged in v3.10-rc5 as a fix for
>>> commit 031299b ("cpufreq: governors: Avoid unnecessary per cpu timer
>>> interrupts") which was causing a kernel warning to show up.
>>>
>>> Michael/Viresh: do you have some idea how to fix the issue?
>>
>> Thanks for the report :) would you like to take a try
>> on below patch and see whether it solve the issue?
> 
> It doesn't help and unfortunately it just can't help as it only
> addresses lockdep functionality while the issue is not a lockdep
> problem but a genuine locking problem. CPU hot-unplug invokes
> _cpu_down() which calls cpu_hotplug_begin() which in turn takes
> &cpu_hotplug.lock. The lock is then hold during __cpu_notify()
> call. Notifier chain goes up to cpufreq_governor_dbs() which for
> CPUFREQ_GOV_STOP event does gov_cancel_work(). This function
> flushes pending work and waits for it to finish. The all above
> happens in one kernel thread. At the same time the other kernel
> thread is doing the work we are waiting to complete and it also
> happens to do gov_queue_work() which calls get_online_cpus().
> Then the code tries to take &cpu_hotplug.lock which is already
> held by the first thread and deadlocks.
> 

Yeah, exactly!

So I had proposed doing an asynchronous cancel-work or doing the
synchronous cancel-work in the CPU_POST_DEAD phase, where the
cpu_hotplug.lock is not held. See this thread:

http://marc.info/?l=linux-kernel&m=137241212616799&w=2
http://marc.info/?l=linux-pm&m=137242906622537&w=2

But now that I look at commit 2f7021a8 again, I still think we should
revert it and fix the _actual_ root-cause of the bug.

Cpufreq subsystem has enough synchronization to ensure that policy->cpus
always contains online CPUs. And it also has the concept of cancelling
queued work items, *before* that CPU is taken offline.
So, where is the chance that we try to queue work items on offline CPUs?

To answer that question, I was looking at the cpufreq code yesterday
and found something very interesting: the gov_cancel_work() that is
invoked before a CPU goes offline, can actually end up effectively
*NOT* cancelling the queued work item!

The reason is, the per-cpu work items are not just self-queueing (if
that was the case, gov_cancel_work would have been successful without
any doubt), but instead, they can also queue work items on *other* CPUs!

Example from ondemand governor's per-cpu work item:

static void od_dbs_timer(struct work_struct *work)
{
  ...
  bool modify_all = true;
  ...
  gov_queue_work(dbs_data, dbs_info->cdbs.cur_policy, delay, modify_all);
}

So, every per-cpu work item can re-queue the work item on *many other*
CPUs, and not just itself!

So that leads to a race which makes gov_cancel_work() ineffective.
The call to cancel_delayed_work_sync() will cancel all pending work items
on say CPU 3 (which is going down), but immediately after that, say CPU4's
work item fires and queues the work item on CPU4 as well as CPU3. Thus,
gov_cancel_work() _effectively_ didn't do anything useful.

But this still doesn't immediately explain how we can end up trying to
queue work items on offline CPUs (since policy->cpus is supposed to always
contain online cpus only, and this does look correct in the code as well,
at a first glance). But I just wanted to share this finding, in case it
helps us find out the real root-cause.

Also, you might perhaps want to try the (untested) patch shown below, and
see if it resolves your problem. It basically makes work-items requeue
themselves on only their respective CPUs and not others, so that
gov_cancel_work succeeds in its mission. However, I guess the patch is
wrong from a cpufreq perspective, in case cpufreq really depends on the
"requeue-work-on-everybody" model.

Regards,
Srivatsa S. Bhat

------------------------------------------------------------------------

 drivers/cpufreq/cpufreq_conservative.c |    2 +-
 drivers/cpufreq/cpufreq_governor.c     |    2 --
 drivers/cpufreq/cpufreq_ondemand.c     |    2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 0ceb2ef..bbfc1dd 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -120,7 +120,7 @@ static void cs_dbs_timer(struct work_struct *work)
 	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	int delay = delay_for_sampling_rate(cs_tuners->sampling_rate);
-	bool modify_all = true;
+	bool modify_all = false;

 	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
 	if (!need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4645876..ec4baeb 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -137,10 +137,8 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 	if (!all_cpus) {
 		__gov_queue_work(smp_processor_id(), dbs_data, delay);
 	} else {
-		get_online_cpus();
 		for_each_cpu(i, policy->cpus)
 			__gov_queue_work(i, dbs_data, delay);
-		put_online_cpus();
 	}
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 93eb5cb..241ebc0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -230,7 +230,7 @@ static void od_dbs_timer(struct work_struct *work)
 	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay = 0, sample_type = core_dbs_info->sample_type;
-	bool modify_all = true;
+	bool modify_all = false;

 	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
 	if (!need_load_eval(&core_dbs_info->cdbs, od_tuners->sampling_rate)) {

next prev parent reply	other threads:[~2013-07-09 13:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08 15:26 [v3.10 regression] deadlock on cpu hotplug Bartlomiej Zolnierkiewicz
2013-07-09  2:15 ` Michael Wang
2013-07-09 11:51   ` Bartlomiej Zolnierkiewicz
2013-07-09 13:07     ` Srivatsa S. Bhat [this message]
2013-07-10  3:29       ` Michael Wang
2013-07-10  4:12       ` Michael Wang
2013-07-10  5:39         ` Viresh Kumar
2013-07-10  6:04           ` Michael Wang
2013-07-10  6:34             ` Viresh Kumar
2013-07-10  2:40     ` Michael Wang
2013-07-10  8:57       ` Michael Wang

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:0ceb2ef dfblob:bbfc1dd dfblob:4645876 dfblob:ec4baeb
dfblob:93eb5cb dfblob:241ebc0 )
 OR (
bs:"Re: [v3.10 regression] deadlock on cpu hotplug" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51DC0B0D.9070201@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=b.zolnierkie@samsung.com \
    --cc=bp@alien8.de \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=t.figa@samsung.com \
    --cc=viresh.kumar@linaro.org \
    --cc=wangyun@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.