CPUfreq lockdep issue

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* CPUfreq lockdep issue
@ 2016-02-18 11:06 Joonas Lahtinen
  2016-02-18 11:34 ` Viresh Kumar
  2016-02-22  9:10 ` Viresh Kumar
  0 siblings, 2 replies; 7+ messages in thread
From: Joonas Lahtinen @ 2016-02-18 11:06 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, linux-pm; +Cc: Daniel Vetter

Hi,

The Intel P-state driver has a lockdep issue as described below. It
could in theory cause a deadlock if initialization and suspend were to
be performed simultaneously. Conflicting calling paths are as follows:

intel_pstate_init(...)
	...cpufreq_online(...)
		down_write(&policy->rwsem); // Locks policy->rwsem
		...
		cpufreq_init_policy(policy);
			...intel_pstate_hwp_set();
				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
		...
		up_write(&policy->rwsem);

pm_suspend(...)
	...disable_nonboot_cpus()
		_cpu_down()
			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
			__cpu_notify(CPU_DOWN_PREPARE, ...);
				...cpufreq_offline_prepare();
					down_write(&policy->rwsem); // Locks policy->rwsem

Quickly looking at the code, some refactoring has to be done to fix the
issue. I think it would a good idea to document some of the driver
callbacks related to what locks are held etc. in order to avoid future
situations like this.

Because get_online_cpus() is of recursive nature and the way it
currently works, adding wider get_online_cpus() scope up around
cpufreq_online() does not fix the issue because it only momentarily
locks cpu_hotplug.lock and proceeds to do so again at next call.

Moving get_online_cpus() completely away from pstate_hwp_set() and
assuring it is called higher in the call chain might be a viable
solution. Then it could be made sure get_online_cpus() is not called
while policy->rwsem is being held already.

Do you think that would be an appropriate way of fixing it?

This issue arose on SKL platforms on i915 DRM driver continuous
integration system when we enabled lockdep debugging.

Regards, Joonas

FULL TRACE FOR P-STATE INIT

get_online_cpus+0x90/0xa0
? get_online_cpus+0x2d/0xa0
intel_pstate_hwp_set+0x40/0x150
intel_pstate_set_policy+0x135/0x170
cpufreq_set_policy+0xdb/0x3f0
cpufreq_init_policy+0x6a/0xc0
? cpufreq_update_policy+0x1e0/0x1e0
cpufreq_online+0x314/0xad0
cpufreq_add_dev+0x5a/0x80
subsys_interface_register+0xa7/0xf0
? _raw_write_unlock_irqrestore+0x3d/0x60
cpufreq_register_driver+0x147/0x210
intel_pstate_init+0x35b/0x481
? cpufreq_gov_dbs_init+0x12/0x12
do_one_initcall+0xa6/0x1d0
kernel_init_freeable+0x113/0x199
? rest_init+0x140/0x140
kernel_init+0x9/0xe0
ret_from_fork+0x3f/0x70
? rest_init+0x140/0x140

FULL TRACE FOR SUSPEND

lock_acquire+0xc3/0x1d0
? cpufreq_offline_prepare+0x92/0x2d0
down_write+0x3f/0x70
? cpufreq_offline_prepare+0x92/0x2d0
cpufreq_offline_prepare+0x92/0x2d0
cpufreq_cpu_callback+0x45/0x50
notifier_call_chain+0x39/0xa0
__raw_notifier_call_chain+0x9/0x10
_cpu_down+0x9e/0x330
? vprintk_emit+0x2cb/0x610
? vprintk_emit+0x324/0x610
? vprintk_default+0x1a/0x20
disable_nonboot_cpus+0xc0/0x350
suspend_devices_and_enter+0x409/0xb30
pm_suspend+0x53f/0x8c0
state_store+0x77/0xe0
kobj_attr_store+0xf/0x20
sysfs_kf_write+0x40/0x50
kernfs_fop_write+0x13c/0x180
__vfs_write+0x23/0xe0
? percpu_down_read+0x52/0x90
? __sb_start_write+0xb2/0xf0
? __sb_start_write+0xb2/0xf0
vfs_write+0xa2/0x190
? __fget_light+0x6a/0x90
SyS_write+0x44/0xb0
entry_SYSCALL_64_fastpath+0x16/0x73

-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen
@ 2016-02-18 11:34 ` Viresh Kumar
  2016-02-19  8:50   ` Joonas Lahtinen
  2016-02-22  9:10 ` Viresh Kumar
  1 sibling, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2016-02-18 11:34 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter

On 18-02-16, 13:06, Joonas Lahtinen wrote:
> Hi,
> 
> The Intel P-state driver has a lockdep issue as described below. It
> could in theory cause a deadlock if initialization and suspend were to
> be performed simultaneously. Conflicting calling paths are as follows:
> 
> intel_pstate_init(...)
> 	...cpufreq_online(...)
> 		down_write(&policy->rwsem); // Locks policy->rwsem
> 		...
> 		cpufreq_init_policy(policy);
> 			...intel_pstate_hwp_set();
> 				get_online_cpus(); // Temporarily locks cpu_hotplug.lock

Why is this one required?

> 		...
> 		up_write(&policy->rwsem);
> 
> pm_suspend(...)
> 	...disable_nonboot_cpus()
> 		_cpu_down()
> 			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
> 			__cpu_notify(CPU_DOWN_PREPARE, ...);
> 				...cpufreq_offline_prepare();
> 					down_write(&policy->rwsem); // Locks policy->rwsem
> 
> Quickly looking at the code, some refactoring has to be done to fix the
> issue. I think it would a good idea to document some of the driver
> callbacks related to what locks are held etc. in order to avoid future
> situations like this.
> 
> Because get_online_cpus() is of recursive nature and the way it
> currently works, adding wider get_online_cpus() scope up around
> cpufreq_online() does not fix the issue because it only momentarily
> locks cpu_hotplug.lock and proceeds to do so again at next call.
> 
> Moving get_online_cpus() completely away from pstate_hwp_set() and
> assuring it is called higher in the call chain might be a viable
> solution. Then it could be made sure get_online_cpus() is not called
> while policy->rwsem is being held already.

I don't think that will be a good solution. So what you are
essentially saying is, take policy->rwsem after get_online_cpus()
only.

> Do you think that would be an appropriate way of fixing it?

At least I don't. Why do we need to call get_online_cpus()
intel-pstate governor ?

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-18 11:34 ` Viresh Kumar
@ 2016-02-19  8:50   ` Joonas Lahtinen
  2016-02-19  9:17     ` Viresh Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Joonas Lahtinen @ 2016-02-19  8:50 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter

On to, 2016-02-18 at 17:04 +0530, Viresh Kumar wrote:
> On 18-02-16, 13:06, Joonas Lahtinen wrote:
> > Hi,
> > 
> > The Intel P-state driver has a lockdep issue as described below. It
> > could in theory cause a deadlock if initialization and suspend were to
> > be performed simultaneously. Conflicting calling paths are as follows:
> > 
> > intel_pstate_init(...)
> > 	...cpufreq_online(...)
> > 		down_write(&policy->rwsem); // Locks policy->rwsem
> > 		...
> > 		cpufreq_init_policy(policy);
> > 			...intel_pstate_hwp_set();
> > 				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
> 
> Why is this one required?

Otherwise CPUs could be added or removed during the execution of
intel_pstate_hwp_set(), which has the following code:

       	get_online_cpus();
        for_each_online_cpu(cpu) {
		...
		wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
	}
> 
> > 		...
> > 		up_write(&policy->rwsem);
> > 
> > pm_suspend(...)
> > 	...disable_nonboot_cpus()
> > 		_cpu_down()
> > 			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
> > 			__cpu_notify(CPU_DOWN_PREPARE, ...);
> > 				...cpufreq_offline_prepare();
> > 					down_write(&policy->rwsem); // Locks policy->rwsem
> > 
> > Quickly looking at the code, some refactoring has to be done to fix the
> > issue. I think it would a good idea to document some of the driver
> > callbacks related to what locks are held etc. in order to avoid future
> > situations like this.
> > 
> > Because get_online_cpus() is of recursive nature and the way it
> > currently works, adding wider get_online_cpus() scope up around
> > cpufreq_online() does not fix the issue because it only momentarily
> > locks cpu_hotplug.lock and proceeds to do so again at next call.
> > 
> > Moving get_online_cpus() completely away from pstate_hwp_set() and
> > assuring it is called higher in the call chain might be a viable
> > solution. Then it could be made sure get_online_cpus() is not called
> > while policy->rwsem is being held already.
> 
> I don't think that will be a good solution. So what you are
> essentially saying is, take policy->rwsem after get_online_cpus()
> only.

Yes, grab the policy lock only after we've made sure we can apply the
policy to the online CPUs.

> 
> > Do you think that would be an appropriate way of fixing it?
> 
> At least I don't. Why do we need to call get_online_cpus()
> intel-pstate governor ?

See above for the code.

Regards, Joonas

> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-19  8:50   ` Joonas Lahtinen
@ 2016-02-19  9:17     ` Viresh Kumar
  2016-02-19 22:35       ` Srinivas Pandruvada
  0 siblings, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2016-02-19  9:17 UTC (permalink / raw)
  To: Joonas Lahtinen, dirk.j.brandewie, srinivas.pandruvada
  Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter, lenb

Adding the maintainers of the driver in cc now.

On 19-02-16, 10:50, Joonas Lahtinen wrote:
> On to, 2016-02-18 at 17:04 +0530, Viresh Kumar wrote:
> > On 18-02-16, 13:06, Joonas Lahtinen wrote:
> > > Hi,
> > > 
> > > The Intel P-state driver has a lockdep issue as described below. It
> > > could in theory cause a deadlock if initialization and suspend were to
> > > be performed simultaneously. Conflicting calling paths are as follows:
> > > 
> > > intel_pstate_init(...)
> > > 	...cpufreq_online(...)
> > > 		down_write(&policy->rwsem); // Locks policy->rwsem
> > > 		...
> > > 		cpufreq_init_policy(policy);
> > > 			...intel_pstate_hwp_set();
> > > 				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
> > 
> > Why is this one required?
> 
> Otherwise CPUs could be added or removed during the execution of
> intel_pstate_hwp_set(), which has the following code:
> 
>        	get_online_cpus();
>         for_each_online_cpu(cpu) {
> 		...
> 		wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
> 	}
> > 
> > > 		...
> > > 		up_write(&policy->rwsem);
> > > 
> > > pm_suspend(...)
> > > 	...disable_nonboot_cpus()
> > > 		_cpu_down()
> > > 			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
> > > 			__cpu_notify(CPU_DOWN_PREPARE, ...);
> > > 				...cpufreq_offline_prepare();
> > > 					down_write(&policy->rwsem); // Locks policy->rwsem
> > > 
> > > Quickly looking at the code, some refactoring has to be done to fix the
> > > issue. I think it would a good idea to document some of the driver
> > > callbacks related to what locks are held etc. in order to avoid future
> > > situations like this.
> > > 
> > > Because get_online_cpus() is of recursive nature and the way it
> > > currently works, adding wider get_online_cpus() scope up around
> > > cpufreq_online() does not fix the issue because it only momentarily
> > > locks cpu_hotplug.lock and proceeds to do so again at next call.
> > > 
> > > Moving get_online_cpus() completely away from pstate_hwp_set() and
> > > assuring it is called higher in the call chain might be a viable
> > > solution. Then it could be made sure get_online_cpus() is not called
> > > while policy->rwsem is being held already.

Hi Guys,

Joonas reported a lockdep between cpufreq and intel-pstate driver and
we are looking for probable solutions.

I failed to understand why should we run intel_pstate_hwp_set() for
each online CPU, while the frequency is changed only for a group of
CPUs that belong to a policy. Ofcourse intel_pstate_hwp_set() is
required to be run for all CPUs, while the sysfs files are touched.
And so, do we have a problem with below diff?

-- 
viresh

-------------------------8<-------------------------
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d6061be2c542..a3c1788daab2 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -287,7 +287,7 @@ static inline void update_turbo_state(void)
 		 cpu->pstate.max_pstate == cpu->pstate.turbo_pstate);
 }
 
-static void intel_pstate_hwp_set(void)
+static void intel_pstate_hwp_set(const struct cpumask *cpumask)
 {
 	int min, hw_min, max, hw_max, cpu, range, adj_range;
 	u64 value, cap;
@@ -297,9 +297,7 @@ static void intel_pstate_hwp_set(void)
 	hw_max = HWP_HIGHEST_PERF(cap);
 	range = hw_max - hw_min;
 
-	get_online_cpus();
-
-	for_each_online_cpu(cpu) {
+	for_each_cpu(cpu, cpumask) {
 		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
 		adj_range = limits->min_perf_pct * range / 100;
 		min = hw_min + adj_range;
@@ -318,7 +316,12 @@ static void intel_pstate_hwp_set(void)
 		value |= HWP_MAX_PERF(max);
 		wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 	}
+}
 
+static void intel_pstate_hwp_set_online_cpus(void)
+{
+	get_online_cpus();
+	intel_pstate_hwp_set(cpu_online_mask);
 	put_online_cpus();
 }
 
@@ -440,7 +443,7 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
 	limits->no_turbo = clamp_t(int, input, 0, 1);
 
 	if (hwp_active)
-		intel_pstate_hwp_set();
+		intel_pstate_hwp_set_online_cpus();
 
 	return count;
 }
@@ -466,7 +469,7 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
 				  int_tofp(100));
 
 	if (hwp_active)
-		intel_pstate_hwp_set();
+		intel_pstate_hwp_set_online_cpus();
 	return count;
 }
 
@@ -491,7 +494,7 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
 				  int_tofp(100));
 
 	if (hwp_active)
-		intel_pstate_hwp_set();
+		intel_pstate_hwp_set_online_cpus();
 	return count;
 }
 
@@ -1112,7 +1115,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 		pr_debug("intel_pstate: set performance\n");
 		limits = &performance_limits;
 		if (hwp_active)
-			intel_pstate_hwp_set();
+			intel_pstate_hwp_set(policy->cpus);
 		return 0;
 	}
 
@@ -1144,7 +1147,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 				  int_tofp(100));
 
 	if (hwp_active)
-		intel_pstate_hwp_set();
+		intel_pstate_hwp_set(policy->cpus);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-19  9:17     ` Viresh Kumar
@ 2016-02-19 22:35       ` Srinivas Pandruvada
  2016-02-19 23:14         ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Srinivas Pandruvada @ 2016-02-19 22:35 UTC (permalink / raw)
  To: Viresh Kumar, Joonas Lahtinen, dirk.j.brandewie
  Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter, lenb

On Fri, 2016-02-19 at 14:47 +0530, Viresh Kumar wrote:
> Adding the maintainers of the driver in cc now.
[...]
Hi Viresh,
> Hi Guys,
> 
> Joonas reported a lockdep between cpufreq and intel-pstate driver and
> we are looking for probable solutions.
> 
> I failed to understand why should we run intel_pstate_hwp_set() for
> each online CPU, while the frequency is changed only for a group of
> CPUs that belong to a policy. 
Don't need from set_policy interface. 
> Ofcourse intel_pstate_hwp_set() is
> required to be run for all CPUs, while the sysfs files are touched.
> And so, do we have a problem with below diff?
I gave a quick test, the change works fine.

Thanks,
Srinivas


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-19 22:35       ` Srinivas Pandruvada
@ 2016-02-19 23:14         ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2016-02-19 23:14 UTC (permalink / raw)
  To: Srinivas Pandruvada, Viresh Kumar
  Cc: Joonas Lahtinen, Dirk J Brandewie, Rafael J. Wysocki,
	linux-pm@vger.kernel.org, Daniel Vetter, Len Brown

On Fri, Feb 19, 2016 at 11:35 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Fri, 2016-02-19 at 14:47 +0530, Viresh Kumar wrote:
>> Adding the maintainers of the driver in cc now.
> [...]
> Hi Viresh,
>> Hi Guys,
>>
>> Joonas reported a lockdep between cpufreq and intel-pstate driver and
>> we are looking for probable solutions.
>>
>> I failed to understand why should we run intel_pstate_hwp_set() for
>> each online CPU, while the frequency is changed only for a group of
>> CPUs that belong to a policy.
> Don't need from set_policy interface.

>> Ofcourse intel_pstate_hwp_set() is
>> required to be run for all CPUs, while the sysfs files are touched.
>> And so, do we have a problem with below diff?
> I gave a quick test, the change works fine.

Yes, the patch looks correct to me.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: CPUfreq lockdep issue
  2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen
  2016-02-18 11:34 ` Viresh Kumar
@ 2016-02-22  9:10 ` Viresh Kumar
  1 sibling, 0 replies; 7+ messages in thread
From: Viresh Kumar @ 2016-02-22  9:10 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter

On 18-02-16, 13:06, Joonas Lahtinen wrote:
> Hi,
> 
> The Intel P-state driver has a lockdep issue as described below. It
> could in theory cause a deadlock if initialization and suspend were to
> be performed simultaneously. Conflicting calling paths are as follows:

Hi,

I have cc'd you to a patch today, can you please test that and verify
that the problem is gone now ?

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-22  9:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen
2016-02-18 11:34 ` Viresh Kumar
2016-02-19  8:50   ` Joonas Lahtinen
2016-02-19  9:17     ` Viresh Kumar
2016-02-19 22:35       ` Srinivas Pandruvada
2016-02-19 23:14         ` Rafael J. Wysocki
2016-02-22  9:10 ` Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).