Re: [LOCKDEP] cpufreq: possible circular locking dependency detected

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Wang <wangyun@linux.vnet.ibm.com>
To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@alien8.de>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: Re: [LOCKDEP] cpufreq: possible circular locking dependency detected
Date: Mon, 15 Jul 2013 10:42:35 +0800	[thread overview]
Message-ID: <51E3619B.6060901@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130714114721.GB2178@swordfish>

On 07/14/2013 07:47 PM, Sergey Senozhatsky wrote:
[snip]
> 
> Hello,
> 
> I just realized that lockdep was disabling itself at startup (after recent AMD
> radeon patch set) due to radeon kms error:
> 
> [    4.790019] [drm] Loading CEDAR Microcode
> [    4.790943] r600_cp: Failed to load firmware "radeon/CEDAR_smc.bin"
> [    4.791152] [drm:evergreen_startup] *ERROR* Failed to load firmware!
> [    4.791330] radeon 0000:01:00.0: disabling GPU acceleration
> [    4.792633] INFO: trying to register non-static key.
> [    4.792792] the code is fine but needs lockdep annotation.
> [    4.792953] turning off the locking correctness validator.
> 
> 
> Now, as I fixed radeon kms, I can see:
> 
> [  806.660530] ------------[ cut here ]------------
> [  806.660539] WARNING: CPU: 0 PID: 2389 at arch/x86/kernel/smp.c:124
> native_smp_send_reschedule+0x57/0x60()
> 
> [  806.660572] Workqueue: events od_dbs_timer
> [  806.660575]  0000000000000009 ffff8801531cfbd8 ffffffff816044ee
> 0000000000000000
> [  806.660577]  ffff8801531cfc10 ffffffff8104995d 0000000000000003
> ffff8801531f8000
> [  806.660579]  000000010001ee39 0000000000000003 0000000000000003
> ffff8801531cfc20
> [  806.660580] Call Trace:
> [  806.660587]  [<ffffffff816044ee>] dump_stack+0x4e/0x82
> [  806.660591]  [<ffffffff8104995d>] warn_slowpath_common+0x7d/0xa0
> [  806.660593]  [<ffffffff81049a3a>] warn_slowpath_null+0x1a/0x20
> [  806.660595]  [<ffffffff8102ca07>] native_smp_send_reschedule+0x57/0x60
> [  806.660599]  [<ffffffff81085211>] wake_up_nohz_cpu+0x61/0xb0
> [  806.660603]  [<ffffffff8105cb6d>] add_timer_on+0x8d/0x1e0
> [  806.660607]  [<ffffffff8106cc66>] __queue_delayed_work+0x166/0x1a0
> [  806.660609]  [<ffffffff8106d6a9>] ? try_to_grab_pending+0xd9/0x1a0
> [  806.660611]  [<ffffffff8106d7bf>] mod_delayed_work_on+0x4f/0x90
> [  806.660613]  [<ffffffff8150f436>] gov_queue_work+0x56/0xd0
> [  806.660615]  [<ffffffff8150e740>] od_dbs_timer+0xc0/0x160
> [  806.660617]  [<ffffffff8106dbcd>] process_one_work+0x1cd/0x6a0
> [  806.660619]  [<ffffffff8106db63>] ? process_one_work+0x163/0x6a0
> [  806.660622]  [<ffffffff8106e8d1>] worker_thread+0x121/0x3a0
> [  806.660627]  [<ffffffff810b668d>] ? trace_hardirqs_on+0xd/0x10
> [  806.660629]  [<ffffffff8106e7b0>] ? manage_workers.isra.24+0x2a0/0x2a0
> [  806.660633]  [<ffffffff810760cb>] kthread+0xdb/0xe0
> [  806.660635]  [<ffffffff81075ff0>] ? insert_kthread_work+0x70/0x70
> [  806.660639]  [<ffffffff8160de6c>] ret_from_fork+0x7c/0xb0
> [  806.660640]  [<ffffffff81075ff0>] ? insert_kthread_work+0x70/0x70
> [  806.660642] ---[ end trace 01ae278488a0ad6d ]---

So it back again...

Currently I have some assumptions in my mind:
1. we still failed to stop od_dbs_timer after STOP notify.
2. there is some code else which restart the work after STOP notify.
3. policy->cpus is broken.

I think we need a more detail investigation this time, let's catch the
mouse out ;-)

Regards,
Michael Wang

> 
> 
> The same problem why get/put_online_cpus() has been added to  __gov_queue_work()
> 
> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
> Author: Michael Wang
> 
>     cpufreq: protect 'policy->cpus' from offlining during
>     __gov_queue_work()
> 
> 	-ss
> 
>> Regards,
>> Michael Wang
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index dc9b72e..a64b544 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -178,13 +178,14 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
>>  {
>>  	int i;
>>  
>> +	if (dbs_data->queue_stop)
>> +		return;
>> +
>>  	if (!all_cpus) {
>>  		__gov_queue_work(smp_processor_id(), dbs_data, delay);
>>  	} else {
>> -		get_online_cpus();
>>  		for_each_cpu(i, policy->cpus)
>>  			__gov_queue_work(i, dbs_data, delay);
>> -		put_online_cpus();
>>  	}
>>  }
>>  EXPORT_SYMBOL_GPL(gov_queue_work);
>> @@ -193,12 +194,27 @@ static inline void gov_cancel_work(struct dbs_data *dbs_data,
>>  		struct cpufreq_policy *policy)
>>  {
>>  	struct cpu_dbs_common_info *cdbs;
>> -	int i;
>> +	int i, round = 2;
>>  
>> +	dbs_data->queue_stop = 1;
>> +redo:
>> +	round--;
>>  	for_each_cpu(i, policy->cpus) {
>>  		cdbs = dbs_data->cdata->get_cpu_cdbs(i);
>>  		cancel_delayed_work_sync(&cdbs->work);
>>  	}
>> +
>> +	/*
>> +	 * Since there is no lock to prvent re-queue the
>> +	 * cancelled work, some early cancelled work might
>> +	 * have been queued again by later cancelled work.
>> +	 *
>> +	 * Flush the work again with dbs_data->queue_stop
>> +	 * enabled, this time there will be no survivors.
>> +	 */
>> +	if (round)
>> +		goto redo;
>> +	dbs_data->queue_stop = 0;
>>  }
>>  
>>  /* Will return if we need to evaluate cpu load again or not */
>> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
>> index e16a961..9116135 100644
>> --- a/drivers/cpufreq/cpufreq_governor.h
>> +++ b/drivers/cpufreq/cpufreq_governor.h
>> @@ -213,6 +213,7 @@ struct dbs_data {
>>  	unsigned int min_sampling_rate;
>>  	int usage_count;
>>  	void *tuners;
>> +	int queue_stop;
>>  
>>  	/* dbs_mutex protects dbs_enable in governor start/stop */
>>  	struct mutex mutex;
>>
>>>
>>> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>>>
>>> ---
>>>
>>>  drivers/cpufreq/cpufreq.c          |  5 +----
>>>  drivers/cpufreq/cpufreq_governor.c | 17 +++++++++++------
>>>  drivers/cpufreq/cpufreq_stats.c    |  2 +-
>>>  3 files changed, 13 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index 6a015ad..f8aacf1 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -1943,13 +1943,10 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb,
>>>  		case CPU_ONLINE:
>>>  			cpufreq_add_dev(dev, NULL);
>>>  			break;
>>> -		case CPU_DOWN_PREPARE:
>>> +		case CPU_POST_DEAD:
>>>  		case CPU_UP_CANCELED_FROZEN:
>>>  			__cpufreq_remove_dev(dev, NULL);
>>>  			break;
>>> -		case CPU_DOWN_FAILED:
>>> -			cpufreq_add_dev(dev, NULL);
>>> -			break;
>>>  		}
>>>  	}
>>>  	return NOTIFY_OK;
>>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>>> index 4645876..681d5d6 100644
>>> --- a/drivers/cpufreq/cpufreq_governor.c
>>> +++ b/drivers/cpufreq/cpufreq_governor.c
>>> @@ -125,7 +125,11 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
>>>  		unsigned int delay)
>>>  {
>>>  	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>>> -
>>> +	/* cpu offline might block existing gov_queue_work() user,
>>> +	 * unblocking it after CPU_DEAD and before CPU_POST_DEAD.
>>> +	 * thus potentially we can hit offlined CPU */
>>> +	if (unlikely(cpu_is_offline(cpu)))
>>> +		return;
>>>  	mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
>>>  }
>>>
>>> @@ -133,15 +137,14 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
>>>  		unsigned int delay, bool all_cpus)
>>>  {
>>>  	int i;
>>> -
>>> +	get_online_cpus();
>>>  	if (!all_cpus) {
>>>  		__gov_queue_work(smp_processor_id(), dbs_data, delay);
>>>  	} else {
>>> -		get_online_cpus();
>>>  		for_each_cpu(i, policy->cpus)
>>>  			__gov_queue_work(i, dbs_data, delay);
>>> -		put_online_cpus();
>>>  	}
>>> +	put_online_cpus();
>>>  }
>>>  EXPORT_SYMBOL_GPL(gov_queue_work);
>>>
>>> @@ -354,8 +357,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>>>  		/* Initiate timer time stamp */
>>>  		cpu_cdbs->time_stamp = ktime_get();
>>>
>>> -		gov_queue_work(dbs_data, policy,
>>> -				delay_for_sampling_rate(sampling_rate), true);
>>> +		/* hotplug lock already held */
>>> +		for_each_cpu(j, policy->cpus)
>>> +			__gov_queue_work(j, dbs_data,
>>> +				delay_for_sampling_rate(sampling_rate));
>>>  		break;
>>>
>>>  	case CPUFREQ_GOV_STOP:
>>> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
>>> index cd9e817..833816e 100644
>>> --- a/drivers/cpufreq/cpufreq_stats.c
>>> +++ b/drivers/cpufreq/cpufreq_stats.c
>>> @@ -355,7 +355,7 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,
>>>  	case CPU_DOWN_PREPARE:
>>>  		cpufreq_stats_free_sysfs(cpu);
>>>  		break;
>>> -	case CPU_DEAD:
>>> +	case CPU_POST_DEAD:
>>>  		cpufreq_stats_free_table(cpu);
>>>  		break;
>>>  	case CPU_UP_CANCELED_FROZEN:
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

next prev parent reply	other threads:[~2013-07-15  2:42 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-25 21:15 [LOCKDEP] cpufreq: possible circular locking dependency detected Sergey Senozhatsky
2013-06-28  4:43 ` Viresh Kumar
2013-06-28  7:44   ` [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected) Sergey Senozhatsky
2013-06-28  9:31     ` Srivatsa S. Bhat
2013-06-28 10:04       ` Sergey Senozhatsky
2013-06-28 14:13     ` Srivatsa S. Bhat
2013-06-29  7:35       ` Sergey Senozhatsky
2013-07-01  4:42 ` [LOCKDEP] cpufreq: possible circular locking dependency detected Michael Wang
2013-07-10 23:13   ` Sergey Senozhatsky
2013-07-11  2:43     ` Michael Wang
2013-07-11  8:22       ` Sergey Senozhatsky
2013-07-11  8:47         ` Michael Wang
2013-07-11  8:48           ` Michael Wang
2013-07-11 11:47             ` Bartlomiej Zolnierkiewicz
2013-07-12  2:19               ` Michael Wang
2013-07-11  9:01           ` Sergey Senozhatsky
2013-07-14 11:47       ` Sergey Senozhatsky
2013-07-14 12:06         ` Sergey Senozhatsky
2013-07-15  3:50           ` Michael Wang
2013-07-15  7:52             ` Michael Wang
2013-07-15  8:29               ` Sergey Senozhatsky
2013-07-15 13:19                 ` Srivatsa S. Bhat
2013-07-15 13:32                   ` Srivatsa S. Bhat
2013-07-15 20:49                   ` Peter Wu
2013-07-16  8:29                     ` Srivatsa S. Bhat
2013-07-15 23:20                   ` Sergey Senozhatsky
2013-07-16  8:33                     ` Srivatsa S. Bhat
2013-07-16 10:44                       ` Sergey Senozhatsky
2013-07-16 15:19                         ` Srivatsa S. Bhat
2013-07-16 21:29                           ` Rafael J. Wysocki
2013-07-16  2:19                   ` Michael Wang
2013-07-15  2:42         ` Michael Wang [this message]
2013-07-14 15:56       ` Rafael J. Wysocki
2013-07-15  2:46         ` Michael Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E3619B.6060901@linux.vnet.ibm.com \
    --to=wangyun@linux.vnet.ibm.com \
    --cc=bp@alien8.de \
    --cc=cpufreq@vger.kernel.org \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).