* CPUfreq lockdep issue @ 2016-02-18 11:06 Joonas Lahtinen 2016-02-18 11:34 ` Viresh Kumar 2016-02-22 9:10 ` Viresh Kumar 0 siblings, 2 replies; 7+ messages in thread From: Joonas Lahtinen @ 2016-02-18 11:06 UTC (permalink / raw) To: Rafael J. Wysocki, Viresh Kumar, linux-pm; +Cc: Daniel Vetter Hi, The Intel P-state driver has a lockdep issue as described below. It could in theory cause a deadlock if initialization and suspend were to be performed simultaneously. Conflicting calling paths are as follows: intel_pstate_init(...) ...cpufreq_online(...) down_write(&policy->rwsem); // Locks policy->rwsem ... cpufreq_init_policy(policy); ...intel_pstate_hwp_set(); get_online_cpus(); // Temporarily locks cpu_hotplug.lock ... up_write(&policy->rwsem); pm_suspend(...) ...disable_nonboot_cpus() _cpu_down() cpu_hotplug_begin(); // Locks cpu_hotplug.lock __cpu_notify(CPU_DOWN_PREPARE, ...); ...cpufreq_offline_prepare(); down_write(&policy->rwsem); // Locks policy->rwsem Quickly looking at the code, some refactoring has to be done to fix the issue. I think it would a good idea to document some of the driver callbacks related to what locks are held etc. in order to avoid future situations like this. Because get_online_cpus() is of recursive nature and the way it currently works, adding wider get_online_cpus() scope up around cpufreq_online() does not fix the issue because it only momentarily locks cpu_hotplug.lock and proceeds to do so again at next call. Moving get_online_cpus() completely away from pstate_hwp_set() and assuring it is called higher in the call chain might be a viable solution. Then it could be made sure get_online_cpus() is not called while policy->rwsem is being held already. Do you think that would be an appropriate way of fixing it? This issue arose on SKL platforms on i915 DRM driver continuous integration system when we enabled lockdep debugging. Regards, Joonas FULL TRACE FOR P-STATE INIT get_online_cpus+0x90/0xa0 ? get_online_cpus+0x2d/0xa0 intel_pstate_hwp_set+0x40/0x150 intel_pstate_set_policy+0x135/0x170 cpufreq_set_policy+0xdb/0x3f0 cpufreq_init_policy+0x6a/0xc0 ? cpufreq_update_policy+0x1e0/0x1e0 cpufreq_online+0x314/0xad0 cpufreq_add_dev+0x5a/0x80 subsys_interface_register+0xa7/0xf0 ? _raw_write_unlock_irqrestore+0x3d/0x60 cpufreq_register_driver+0x147/0x210 intel_pstate_init+0x35b/0x481 ? cpufreq_gov_dbs_init+0x12/0x12 do_one_initcall+0xa6/0x1d0 kernel_init_freeable+0x113/0x199 ? rest_init+0x140/0x140 kernel_init+0x9/0xe0 ret_from_fork+0x3f/0x70 ? rest_init+0x140/0x140 FULL TRACE FOR SUSPEND lock_acquire+0xc3/0x1d0 ? cpufreq_offline_prepare+0x92/0x2d0 down_write+0x3f/0x70 ? cpufreq_offline_prepare+0x92/0x2d0 cpufreq_offline_prepare+0x92/0x2d0 cpufreq_cpu_callback+0x45/0x50 notifier_call_chain+0x39/0xa0 __raw_notifier_call_chain+0x9/0x10 _cpu_down+0x9e/0x330 ? vprintk_emit+0x2cb/0x610 ? vprintk_emit+0x324/0x610 ? vprintk_default+0x1a/0x20 disable_nonboot_cpus+0xc0/0x350 suspend_devices_and_enter+0x409/0xb30 pm_suspend+0x53f/0x8c0 state_store+0x77/0xe0 kobj_attr_store+0xf/0x20 sysfs_kf_write+0x40/0x50 kernfs_fop_write+0x13c/0x180 __vfs_write+0x23/0xe0 ? percpu_down_read+0x52/0x90 ? __sb_start_write+0xb2/0xf0 ? __sb_start_write+0xb2/0xf0 vfs_write+0xa2/0x190 ? __fget_light+0x6a/0x90 SyS_write+0x44/0xb0 entry_SYSCALL_64_fastpath+0x16/0x73 -- Joonas Lahtinen Open Source Technology Center Intel Corporation ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen @ 2016-02-18 11:34 ` Viresh Kumar 2016-02-19 8:50 ` Joonas Lahtinen 2016-02-22 9:10 ` Viresh Kumar 1 sibling, 1 reply; 7+ messages in thread From: Viresh Kumar @ 2016-02-18 11:34 UTC (permalink / raw) To: Joonas Lahtinen; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter On 18-02-16, 13:06, Joonas Lahtinen wrote: > Hi, > > The Intel P-state driver has a lockdep issue as described below. It > could in theory cause a deadlock if initialization and suspend were to > be performed simultaneously. Conflicting calling paths are as follows: > > intel_pstate_init(...) > ...cpufreq_online(...) > down_write(&policy->rwsem); // Locks policy->rwsem > ... > cpufreq_init_policy(policy); > ...intel_pstate_hwp_set(); > get_online_cpus(); // Temporarily locks cpu_hotplug.lock Why is this one required? > ... > up_write(&policy->rwsem); > > pm_suspend(...) > ...disable_nonboot_cpus() > _cpu_down() > cpu_hotplug_begin(); // Locks cpu_hotplug.lock > __cpu_notify(CPU_DOWN_PREPARE, ...); > ...cpufreq_offline_prepare(); > down_write(&policy->rwsem); // Locks policy->rwsem > > Quickly looking at the code, some refactoring has to be done to fix the > issue. I think it would a good idea to document some of the driver > callbacks related to what locks are held etc. in order to avoid future > situations like this. > > Because get_online_cpus() is of recursive nature and the way it > currently works, adding wider get_online_cpus() scope up around > cpufreq_online() does not fix the issue because it only momentarily > locks cpu_hotplug.lock and proceeds to do so again at next call. > > Moving get_online_cpus() completely away from pstate_hwp_set() and > assuring it is called higher in the call chain might be a viable > solution. Then it could be made sure get_online_cpus() is not called > while policy->rwsem is being held already. I don't think that will be a good solution. So what you are essentially saying is, take policy->rwsem after get_online_cpus() only. > Do you think that would be an appropriate way of fixing it? At least I don't. Why do we need to call get_online_cpus() intel-pstate governor ? -- viresh ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-18 11:34 ` Viresh Kumar @ 2016-02-19 8:50 ` Joonas Lahtinen 2016-02-19 9:17 ` Viresh Kumar 0 siblings, 1 reply; 7+ messages in thread From: Joonas Lahtinen @ 2016-02-19 8:50 UTC (permalink / raw) To: Viresh Kumar; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter On to, 2016-02-18 at 17:04 +0530, Viresh Kumar wrote: > On 18-02-16, 13:06, Joonas Lahtinen wrote: > > Hi, > > > > The Intel P-state driver has a lockdep issue as described below. It > > could in theory cause a deadlock if initialization and suspend were to > > be performed simultaneously. Conflicting calling paths are as follows: > > > > intel_pstate_init(...) > > ...cpufreq_online(...) > > down_write(&policy->rwsem); // Locks policy->rwsem > > ... > > cpufreq_init_policy(policy); > > ...intel_pstate_hwp_set(); > > get_online_cpus(); // Temporarily locks cpu_hotplug.lock > > Why is this one required? Otherwise CPUs could be added or removed during the execution of intel_pstate_hwp_set(), which has the following code: get_online_cpus(); for_each_online_cpu(cpu) { ... wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value); } > > > ... > > up_write(&policy->rwsem); > > > > pm_suspend(...) > > ...disable_nonboot_cpus() > > _cpu_down() > > cpu_hotplug_begin(); // Locks cpu_hotplug.lock > > __cpu_notify(CPU_DOWN_PREPARE, ...); > > ...cpufreq_offline_prepare(); > > down_write(&policy->rwsem); // Locks policy->rwsem > > > > Quickly looking at the code, some refactoring has to be done to fix the > > issue. I think it would a good idea to document some of the driver > > callbacks related to what locks are held etc. in order to avoid future > > situations like this. > > > > Because get_online_cpus() is of recursive nature and the way it > > currently works, adding wider get_online_cpus() scope up around > > cpufreq_online() does not fix the issue because it only momentarily > > locks cpu_hotplug.lock and proceeds to do so again at next call. > > > > Moving get_online_cpus() completely away from pstate_hwp_set() and > > assuring it is called higher in the call chain might be a viable > > solution. Then it could be made sure get_online_cpus() is not called > > while policy->rwsem is being held already. > > I don't think that will be a good solution. So what you are > essentially saying is, take policy->rwsem after get_online_cpus() > only. Yes, grab the policy lock only after we've made sure we can apply the policy to the online CPUs. > > > Do you think that would be an appropriate way of fixing it? > > At least I don't. Why do we need to call get_online_cpus() > intel-pstate governor ? See above for the code. Regards, Joonas > -- Joonas Lahtinen Open Source Technology Center Intel Corporation ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-19 8:50 ` Joonas Lahtinen @ 2016-02-19 9:17 ` Viresh Kumar 2016-02-19 22:35 ` Srinivas Pandruvada 0 siblings, 1 reply; 7+ messages in thread From: Viresh Kumar @ 2016-02-19 9:17 UTC (permalink / raw) To: Joonas Lahtinen, dirk.j.brandewie, srinivas.pandruvada Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter, lenb Adding the maintainers of the driver in cc now. On 19-02-16, 10:50, Joonas Lahtinen wrote: > On to, 2016-02-18 at 17:04 +0530, Viresh Kumar wrote: > > On 18-02-16, 13:06, Joonas Lahtinen wrote: > > > Hi, > > > > > > The Intel P-state driver has a lockdep issue as described below. It > > > could in theory cause a deadlock if initialization and suspend were to > > > be performed simultaneously. Conflicting calling paths are as follows: > > > > > > intel_pstate_init(...) > > > ...cpufreq_online(...) > > > down_write(&policy->rwsem); // Locks policy->rwsem > > > ... > > > cpufreq_init_policy(policy); > > > ...intel_pstate_hwp_set(); > > > get_online_cpus(); // Temporarily locks cpu_hotplug.lock > > > > Why is this one required? > > Otherwise CPUs could be added or removed during the execution of > intel_pstate_hwp_set(), which has the following code: > > get_online_cpus(); > for_each_online_cpu(cpu) { > ... > wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value); > } > > > > > ... > > > up_write(&policy->rwsem); > > > > > > pm_suspend(...) > > > ...disable_nonboot_cpus() > > > _cpu_down() > > > cpu_hotplug_begin(); // Locks cpu_hotplug.lock > > > __cpu_notify(CPU_DOWN_PREPARE, ...); > > > ...cpufreq_offline_prepare(); > > > down_write(&policy->rwsem); // Locks policy->rwsem > > > > > > Quickly looking at the code, some refactoring has to be done to fix the > > > issue. I think it would a good idea to document some of the driver > > > callbacks related to what locks are held etc. in order to avoid future > > > situations like this. > > > > > > Because get_online_cpus() is of recursive nature and the way it > > > currently works, adding wider get_online_cpus() scope up around > > > cpufreq_online() does not fix the issue because it only momentarily > > > locks cpu_hotplug.lock and proceeds to do so again at next call. > > > > > > Moving get_online_cpus() completely away from pstate_hwp_set() and > > > assuring it is called higher in the call chain might be a viable > > > solution. Then it could be made sure get_online_cpus() is not called > > > while policy->rwsem is being held already. Hi Guys, Joonas reported a lockdep between cpufreq and intel-pstate driver and we are looking for probable solutions. I failed to understand why should we run intel_pstate_hwp_set() for each online CPU, while the frequency is changed only for a group of CPUs that belong to a policy. Ofcourse intel_pstate_hwp_set() is required to be run for all CPUs, while the sysfs files are touched. And so, do we have a problem with below diff? -- viresh -------------------------8<------------------------- diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index d6061be2c542..a3c1788daab2 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -287,7 +287,7 @@ static inline void update_turbo_state(void) cpu->pstate.max_pstate == cpu->pstate.turbo_pstate); } -static void intel_pstate_hwp_set(void) +static void intel_pstate_hwp_set(const struct cpumask *cpumask) { int min, hw_min, max, hw_max, cpu, range, adj_range; u64 value, cap; @@ -297,9 +297,7 @@ static void intel_pstate_hwp_set(void) hw_max = HWP_HIGHEST_PERF(cap); range = hw_max - hw_min; - get_online_cpus(); - - for_each_online_cpu(cpu) { + for_each_cpu(cpu, cpumask) { rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value); adj_range = limits->min_perf_pct * range / 100; min = hw_min + adj_range; @@ -318,7 +316,12 @@ static void intel_pstate_hwp_set(void) value |= HWP_MAX_PERF(max); wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value); } +} +static void intel_pstate_hwp_set_online_cpus(void) +{ + get_online_cpus(); + intel_pstate_hwp_set(cpu_online_mask); put_online_cpus(); } @@ -440,7 +443,7 @@ static ssize_t store_no_turbo(struct kobject *a, struct attribute *b, limits->no_turbo = clamp_t(int, input, 0, 1); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -466,7 +469,7 @@ static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b, int_tofp(100)); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -491,7 +494,7 @@ static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b, int_tofp(100)); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set_online_cpus(); return count; } @@ -1112,7 +1115,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) pr_debug("intel_pstate: set performance\n"); limits = &performance_limits; if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set(policy->cpus); return 0; } @@ -1144,7 +1147,7 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) int_tofp(100)); if (hwp_active) - intel_pstate_hwp_set(); + intel_pstate_hwp_set(policy->cpus); return 0; } ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-19 9:17 ` Viresh Kumar @ 2016-02-19 22:35 ` Srinivas Pandruvada 2016-02-19 23:14 ` Rafael J. Wysocki 0 siblings, 1 reply; 7+ messages in thread From: Srinivas Pandruvada @ 2016-02-19 22:35 UTC (permalink / raw) To: Viresh Kumar, Joonas Lahtinen, dirk.j.brandewie Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter, lenb On Fri, 2016-02-19 at 14:47 +0530, Viresh Kumar wrote: > Adding the maintainers of the driver in cc now. [...] Hi Viresh, > Hi Guys, > > Joonas reported a lockdep between cpufreq and intel-pstate driver and > we are looking for probable solutions. > > I failed to understand why should we run intel_pstate_hwp_set() for > each online CPU, while the frequency is changed only for a group of > CPUs that belong to a policy. Don't need from set_policy interface. > Ofcourse intel_pstate_hwp_set() is > required to be run for all CPUs, while the sysfs files are touched. > And so, do we have a problem with below diff? I gave a quick test, the change works fine. Thanks, Srinivas ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-19 22:35 ` Srinivas Pandruvada @ 2016-02-19 23:14 ` Rafael J. Wysocki 0 siblings, 0 replies; 7+ messages in thread From: Rafael J. Wysocki @ 2016-02-19 23:14 UTC (permalink / raw) To: Srinivas Pandruvada, Viresh Kumar Cc: Joonas Lahtinen, Dirk J Brandewie, Rafael J. Wysocki, linux-pm@vger.kernel.org, Daniel Vetter, Len Brown On Fri, Feb 19, 2016 at 11:35 PM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote: > On Fri, 2016-02-19 at 14:47 +0530, Viresh Kumar wrote: >> Adding the maintainers of the driver in cc now. > [...] > Hi Viresh, >> Hi Guys, >> >> Joonas reported a lockdep between cpufreq and intel-pstate driver and >> we are looking for probable solutions. >> >> I failed to understand why should we run intel_pstate_hwp_set() for >> each online CPU, while the frequency is changed only for a group of >> CPUs that belong to a policy. > Don't need from set_policy interface. >> Ofcourse intel_pstate_hwp_set() is >> required to be run for all CPUs, while the sysfs files are touched. >> And so, do we have a problem with below diff? > I gave a quick test, the change works fine. Yes, the patch looks correct to me. Thanks, Rafael ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: CPUfreq lockdep issue 2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen 2016-02-18 11:34 ` Viresh Kumar @ 2016-02-22 9:10 ` Viresh Kumar 1 sibling, 0 replies; 7+ messages in thread From: Viresh Kumar @ 2016-02-22 9:10 UTC (permalink / raw) To: Joonas Lahtinen; +Cc: Rafael J. Wysocki, linux-pm, Daniel Vetter On 18-02-16, 13:06, Joonas Lahtinen wrote: > Hi, > > The Intel P-state driver has a lockdep issue as described below. It > could in theory cause a deadlock if initialization and suspend were to > be performed simultaneously. Conflicting calling paths are as follows: Hi, I have cc'd you to a patch today, can you please test that and verify that the problem is gone now ? -- viresh ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-02-22 9:10 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-18 11:06 CPUfreq lockdep issue Joonas Lahtinen 2016-02-18 11:34 ` Viresh Kumar 2016-02-19 8:50 ` Joonas Lahtinen 2016-02-19 9:17 ` Viresh Kumar 2016-02-19 22:35 ` Srinivas Pandruvada 2016-02-19 23:14 ` Rafael J. Wysocki 2016-02-22 9:10 ` Viresh Kumar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).